Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-18 Thread Rhys Perry
The CTS is buggy: the input_output_float_64_to_16 tests are run even
though they shouldn't be, because they try to use an unadvertised (and
unimplemented) optional feature.
Some of them crash for unrelated reasons, though: load_tess_varyings()
in ac_nir_to_llvm.c doesn't handle 64-bit varyings. So not all of them
would work even if VK_FORMAT_R64_SFLOAT were an implemented vertex
format.
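The CTS point can be made concrete. Before using VK_FORMAT_R64_SFLOAT as a vertex attribute format, a test is expected to query the format's buffer features and skip when the vertex-buffer feature bit is missing. The sketch below models only that decision. In real code the flags would be VkFormatProperties::bufferFeatures from vkGetPhysicalDeviceFormatProperties(). To stay self-contained it defines its own copy of the VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT value (0x40) instead of including the Vulkan headers. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT (0x00000040 in the
 * Vulkan headers), so this sketch builds without them. */
#define SKETCH_FORMAT_FEATURE_VERTEX_BUFFER_BIT 0x00000040u

/* Hypothetical helper: may a test feed this format to the vertex input
 * stage? buffer_features would normally be VkFormatProperties::bufferFeatures
 * as returned by vkGetPhysicalDeviceFormatProperties(). */
static bool vertex_format_usable(uint32_t buffer_features)
{
   return (buffer_features & SKETCH_FORMAT_FEATURE_VERTEX_BUFFER_BIT) != 0;
}
```

A test would report "not supported" instead of running when vertex_format_usable() returns false. That is the behaviour the input_output_float_64_to_16 tests are missing on radv.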

On Mon, 18 Feb 2019 at 08:53, Samuel Pitoiset  wrote:
>
>
> On 2/16/19 1:21 AM, Rhys Perry wrote:
> > This series adds support for:
> > - VK_KHR_shader_float16_int8
> > - VK_AMD_gpu_shader_half_float
> > - VK_AMD_gpu_shader_int16
> > - VK_KHR_8bit_storage
> > on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
> > memory usage and long (or unbounded) compilation times with some CTS
> > tests.
> >
> > It is written against the following patch series:
> > - https://patchwork.freedesktop.org/series/53454/ (v4)
> > - https://patchwork.freedesktop.org/series/53660/ (v1)
> >
> > With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for
> > dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
> > which fails or crashes because of unrelated radv bugs with 64-bit varyings
> > and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
> > though radv does not support it.
>
> test bug?
>
> The two NIR-related patches (22 and 25) should be sent separately,
> otherwise people working on NIR might miss them.
>
> >
> > With LLVM 9, there are no reproducible piglit regressions except for
> > glsl-array-bounds-12.shader_test, because of an LLVM bug that appears when
> > SLP vectorization is enabled.
> >
> > With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for those seen with LLVM 9, plus a couple of tests that fail
> > because of an LLVM bug after the SLP vectorizer and because there is
> > currently no fallback for 16-bit interpolation on LLVM versions before 9.
> >
> > With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for those seen with LLVM 9, plus a couple of tests that fail
> > because of an LLVM bug after the SLP vectorizer.
> >
> > The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
> > with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
> > some shader-db test for a game I can't remember. It also over-vectorizes
> > 32-bit code, which can significantly worsen the quality of the generated
> > code.
> >
> > The 16-bit interpolation patch is marked as WIP because it currently
> > requires intrinsics only available in LLVM 9 and does not have a fallback.
> >
> > A branch on GitHub containing this series can be found at:
> > https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2
> >
> > v2: rebase
> > v2: implement 16-bit interpolation
> > v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
> > v2: run vectorization unconditionally on GFX9 and later
> > v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
> > v2: remove ac_int_of_size()
> > v2: fix 64-bit visit_load_var()
> > v2: mark VK_KHR_8bit_storage as DONE in features.txt
> > v2: mark SLP vectorization patch as WIP
> > v2: fix C++ style comment
> >
> > Rhys Perry (41):
> >radv: bitcast 16-bit outputs to integers
> >radv: ensure export arguments are always float
> >ac: add various helpers for float16/int16/int8
> >ac/nir: implement 8-bit push constant, ssbo and ubo loads
> >ac/nir: implement 8-bit ssbo stores
> >ac/nir: fix 16-bit ssbo stores
> >ac/nir: implement 8-bit nir_load_const_instr
> >ac/nir: implement 8-bit conversions
> >ac/nir: fix 64-bit nir_op_f2f16_rtz
> >ac/nir: make ac_build_clamp work on all bit sizes
> >ac/nir: make ac_build_fract work on all bit sizes
> >ac/nir: make ac_build_isign work on all bit sizes
> >ac/nir: make ac_build_fsign work on all bit sizes
> >ac/nir: make ac_build_fdiv support 16-bit floats
> >ac/nir: implement half-float nir_op_frcp
> >ac/nir: implement half-float nir_op_frsq
> >ac/nir: implement half-float nir_op_ldexp
> >radv: lower 16-bit flrp
> >ac/nir: support half floats in emit_b2f
> >ac/nir: make emit_b2i work on all bit sizes
> >ac/nir: implement 16-bit shifts
> >compiler/nir: add lowering option for 16-bit ffma
> >ac/nir: implement 16-bit ac_build_ddxy
> >ac/nir: implement 8 and 16 bit ac_build_readlane
> >nir: make bitfield_reverse and ifind_msb work with all integers
> >ac/nir: make ac_find_lsb work on all bit sizes
> >ac/nir: make ac_build_umsb work on all bit sizes
> >ac/nir: implement 8 and 16 bit ac_build_imsb
> >ac/nir: make ac_build_bit_count work on all bit sizes
> >ac/nir: make ac_build_bitfield_reverse work on all bit sizes
> >ac/nir: implement 16-bit pack/unpack opcodes
> >ac/nir: add 8-bit types to glsl_base_to_llvm_type
> >ac/nir,radv: create an array of varying 

Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-18 Thread Samuel Pitoiset


On 2/16/19 1:21 AM, Rhys Perry wrote:

This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
memory usage and long (or unbounded) compilation times with some CTS
tests.

It is written against the following patch series:
- https://patchwork.freedesktop.org/series/53454/ (v4)
- https://patchwork.freedesktop.org/series/53660/ (v1)

With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
and VI except for
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
which fails or crashes because of unrelated radv bugs with 64-bit varyings
and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
though radv does not support it.


test bug?

The two NIR-related patches (22 and 25) should be sent separately,
otherwise people working on NIR might miss them.




With LLVM 9, there are no reproducible piglit regressions except for
glsl-array-bounds-12.shader_test, because of an LLVM bug that appears when
SLP vectorization is enabled.

With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those seen with LLVM 9, plus a couple of tests that fail
because of an LLVM bug after the SLP vectorizer and because there is
currently no fallback for 16-bit interpolation on LLVM versions before 9.

With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those seen with LLVM 9, plus a couple of tests that fail
because of an LLVM bug after the SLP vectorizer.

The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
some shader-db test for a game I can't remember. It also over-vectorizes
32-bit code, which can significantly worsen the quality of the generated
code.

The 16-bit interpolation patch is marked as WIP because it currently
requires intrinsics only available in LLVM 9 and does not have a fallback.

A branch on GitHub containing this series can be found at:
https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2

v2: rebase
v2: implement 16-bit interpolation
v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
v2: run vectorization unconditionally on GFX9 and later
v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
v2: remove ac_int_of_size()
v2: fix 64-bit visit_load_var()
v2: mark VK_KHR_8bit_storage as DONE in features.txt
v2: mark SLP vectorization patch as WIP
v2: fix C++ style comment

Rhys Perry (41):
   radv: bitcast 16-bit outputs to integers
   radv: ensure export arguments are always float
   ac: add various helpers for float16/int16/int8
   ac/nir: implement 8-bit push constant, ssbo and ubo loads
   ac/nir: implement 8-bit ssbo stores
   ac/nir: fix 16-bit ssbo stores
   ac/nir: implement 8-bit nir_load_const_instr
   ac/nir: implement 8-bit conversions
   ac/nir: fix 64-bit nir_op_f2f16_rtz
   ac/nir: make ac_build_clamp work on all bit sizes
   ac/nir: make ac_build_fract work on all bit sizes
   ac/nir: make ac_build_isign work on all bit sizes
   ac/nir: make ac_build_fsign work on all bit sizes
   ac/nir: make ac_build_fdiv support 16-bit floats
   ac/nir: implement half-float nir_op_frcp
   ac/nir: implement half-float nir_op_frsq
   ac/nir: implement half-float nir_op_ldexp
   radv: lower 16-bit flrp
   ac/nir: support half floats in emit_b2f
   ac/nir: make emit_b2i work on all bit sizes
   ac/nir: implement 16-bit shifts
   compiler/nir: add lowering option for 16-bit ffma
   ac/nir: implement 16-bit ac_build_ddxy
   ac/nir: implement 8 and 16 bit ac_build_readlane
   nir: make bitfield_reverse and ifind_msb work with all integers
   ac/nir: make ac_find_lsb work on all bit sizes
   ac/nir: make ac_build_umsb work on all bit sizes
   ac/nir: implement 8 and 16 bit ac_build_imsb
   ac/nir: make ac_build_bit_count work on all bit sizes
   ac/nir: make ac_build_bitfield_reverse work on all bit sizes
   ac/nir: implement 16-bit pack/unpack opcodes
   ac/nir: add 8-bit types to glsl_base_to_llvm_type
   ac/nir,radv: create an array of varying output types
   ac/nir: store all outputs as f32
   radv: store all fragment shader inputs as f32
   radv: handle all fragment output types
   WIP: radv,ac: implement 16-bit interpolation
   WIP: ac,radv: run LLVM's SLP vectorizer
   ac/nir: generate better code for nir_op_f2f16_rtz
   ac/nir: have nir_op_f2f16 round to zero
   radv,docs: expose float16, int16 and int8 features and extensions

  docs/features.txt|   2 +-
  src/amd/common/ac_llvm_build.c   | 325 +++
  src/amd/common/ac_llvm_build.h   |  18 +-
  src/amd/common/ac_llvm_util.c|   8 +-
  src/amd/common/ac_nir_to_llvm.c  | 268 +++
  src/amd/common/ac_shader_abi.h   |   1 +
  src/amd/vulkan/radv_device.c |  17 ++
  

[Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-15 Thread Rhys Perry
This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
memory usage and long (or unbounded) compilation times with some CTS
tests.

It is written against the following patch series:
- https://patchwork.freedesktop.org/series/53454/ (v4)
- https://patchwork.freedesktop.org/series/53660/ (v1)

With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
and VI except for
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
which fails or crashes because of unrelated radv bugs with 64-bit varyings
and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
though radv does not support it.

With LLVM 9, there are no reproducible piglit regressions except for
glsl-array-bounds-12.shader_test, because of an LLVM bug that appears when
SLP vectorization is enabled.

With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those seen with LLVM 9, plus a couple of tests that fail
because of an LLVM bug after the SLP vectorizer and because there is
currently no fallback for 16-bit interpolation on LLVM versions before 9.

With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those seen with LLVM 9, plus a couple of tests that fail
because of an LLVM bug after the SLP vectorizer.

The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
some shader-db test for a game I can't remember. It also over-vectorizes
32-bit code, which can significantly worsen the quality of the generated
code.
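Plain C can help explain why vectorizing 16-bit code pays off while vectorizing 32-bit code may not. GFX9 (Vega) has packed-math instructions such as v_pk_add_u16 and v_pk_add_f16. Each one operates on two 16-bit lanes held in a single 32-bit register. The function below is only a software (SWAR) model of such a lane-wise integer add. It is not what LLVM or the hardware does, and its name is hypothetical. A 32-bit value already fills a register. On this hardware, a vector of two 32-bit values formed by SLP therefore has no packed instruction to map to and must be split again. That is one plausible reason for the code-quality regressions described above.

```c
#include <stdint.h>

/* Add two pairs of 16-bit lanes packed in 32-bit words, lane by lane.
 * Each lane wraps modulo 2^16 and no carry crosses into the other lane. */
static uint32_t add_2x16(uint32_t a, uint32_t b)
{
   const uint32_t low15 = 0x7fff7fffu; /* every bit except each lane's MSB */
   const uint32_t msb   = 0x80008000u; /* each lane's MSB */

   /* Adding the low 15 bits of each lane can never carry out of the lane. */
   uint32_t sum = (a & low15) + (b & low15);

   /* Lane MSB = a15 ^ b15 ^ carry-in; the carry-in is already in sum. */
   return sum ^ ((a ^ b) & msb);
}
```

For example, add_2x16(0xffff0001, 0x00010001) gives 0x00000002. The high lane wraps from 0xffff + 1 to 0, and the low lane computes 1 + 1.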

The 16-bit interpolation patch is marked as WIP because it currently
requires intrinsics only available in LLVM 9 and does not have a fallback.
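This cover letter states several version constraints: the series targets VI and newer, half floats are disabled on LLVM 7, and the WIP 16-bit interpolation needs LLVM 9 intrinsics with no fallback. They can be summarised as a small decision function. The sketch below is a hypothetical model of that matrix, not radv's actual code. All type and function names are invented for illustration. It assumes LLVM 7 is the oldest supported version, so "not LLVM 7" is read as "LLVM 8 or newer". On LLVM 8 it reports float16 as enabled but native 16-bit interpolation as unavailable, which matches the LLVM 8 test failures described above.

```c
#include <stdbool.h>

/* Hypothetical GPU generations: VI is GFX8, Vega is GFX9. */
enum sketch_gfx_level { SKETCH_GFX6, SKETCH_GFX7, SKETCH_GFX8, SKETCH_GFX9 };

/* Hypothetical summary of what the series would expose. */
struct sketch_caps {
   bool int16_int8;        /* 16-bit and 8-bit integer arithmetic */
   bool float16;           /* 16-bit float arithmetic */
   bool f16_interp_native; /* 16-bit interpolation via LLVM 9 intrinsics */
};

static struct sketch_caps sketch_compute_caps(enum sketch_gfx_level gfx,
                                              unsigned llvm_major)
{
   struct sketch_caps caps = { false, false, false };

   if (gfx < SKETCH_GFX8)
      return caps; /* the series only targets VI (GFX8) and newer */

   caps.int16_int8 = true;
   /* Half floats are disabled on LLVM 7 because of the memory/compile-time bug. */
   caps.float16 = llvm_major >= 8;
   /* The WIP interpolation patch needs intrinsics only available in LLVM 9. */
   caps.f16_interp_native = llvm_major >= 9;
   return caps;
}
```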

A branch on GitHub containing this series can be found at:
https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2

v2: rebase
v2: implement 16-bit interpolation
v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
v2: run vectorization unconditionally on GFX9 and later
v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
v2: remove ac_int_of_size()
v2: fix 64-bit visit_load_var()
v2: mark VK_KHR_8bit_storage as DONE in features.txt
v2: mark SLP vectorization patch as WIP
v2: fix C++ style comment

Rhys Perry (41):
  radv: bitcast 16-bit outputs to integers
  radv: ensure export arguments are always float
  ac: add various helpers for float16/int16/int8
  ac/nir: implement 8-bit push constant, ssbo and ubo loads
  ac/nir: implement 8-bit ssbo stores
  ac/nir: fix 16-bit ssbo stores
  ac/nir: implement 8-bit nir_load_const_instr
  ac/nir: implement 8-bit conversions
  ac/nir: fix 64-bit nir_op_f2f16_rtz
  ac/nir: make ac_build_clamp work on all bit sizes
  ac/nir: make ac_build_fract work on all bit sizes
  ac/nir: make ac_build_isign work on all bit sizes
  ac/nir: make ac_build_fsign work on all bit sizes
  ac/nir: make ac_build_fdiv support 16-bit floats
  ac/nir: implement half-float nir_op_frcp
  ac/nir: implement half-float nir_op_frsq
  ac/nir: implement half-float nir_op_ldexp
  radv: lower 16-bit flrp
  ac/nir: support half floats in emit_b2f
  ac/nir: make emit_b2i work on all bit sizes
  ac/nir: implement 16-bit shifts
  compiler/nir: add lowering option for 16-bit ffma
  ac/nir: implement 16-bit ac_build_ddxy
  ac/nir: implement 8 and 16 bit ac_build_readlane
  nir: make bitfield_reverse and ifind_msb work with all integers
  ac/nir: make ac_find_lsb work on all bit sizes
  ac/nir: make ac_build_umsb work on all bit sizes
  ac/nir: implement 8 and 16 bit ac_build_imsb
  ac/nir: make ac_build_bit_count work on all bit sizes
  ac/nir: make ac_build_bitfield_reverse work on all bit sizes
  ac/nir: implement 16-bit pack/unpack opcodes
  ac/nir: add 8-bit types to glsl_base_to_llvm_type
  ac/nir,radv: create an array of varying output types
  ac/nir: store all outputs as f32
  radv: store all fragment shader inputs as f32
  radv: handle all fragment output types
  WIP: radv,ac: implement 16-bit interpolation
  WIP: ac,radv: run LLVM's SLP vectorizer
  ac/nir: generate better code for nir_op_f2f16_rtz
  ac/nir: have nir_op_f2f16 round to zero
  radv,docs: expose float16, int16 and int8 features and extensions
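Several patches in the list deal with float32-to-float16 conversion that rounds toward zero: nir_op_f2f16_rtz, and plain nir_op_f2f16 made to round to zero. A software reference model of that rounding can be useful when checking expected results. The sketch below is not the code from the series, which emits LLVM IR, and its name is hypothetical. Returning the quiet NaN 0x7e00 for NaN inputs is an assumption of this sketch. Under IEEE round-toward-zero, a finite value too large for half precision becomes the largest finite half (0x7bff), not infinity.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE-754 binary32 value to binary16 bits, rounding toward zero. */
static uint16_t f32_to_f16_rtz(float f)
{
   uint32_t x;
   memcpy(&x, &f, sizeof(x)); /* reinterpret the float's bits */

   uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
   uint32_t exp = (x >> 23) & 0xffu;
   uint32_t mant = x & 0x7fffffu;

   if (exp == 0xffu) /* Inf stays Inf; any NaN becomes a quiet NaN */
      return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

   int32_t e = (int32_t)exp - 127 + 15; /* rebias the exponent for half */

   if (e >= 31) /* too large: RTZ yields the largest finite half */
      return (uint16_t)(sign | 0x7bffu);

   if (e <= 0) {
      /* Half subnormal or zero. Below 2^-24 everything truncates to zero. */
      if (e < -10)
         return sign;
      mant |= 0x800000u; /* make the implicit leading 1 explicit */
      return (uint16_t)(sign | (mant >> (14 - e))); /* truncate */
   }

   /* Normal half: keep the top 10 mantissa bits, dropping the rest. */
   return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}
```

For example, 1.00146484375f (1 + 2^-10 + 2^-11) converts to 0x3c01 here. Round-to-nearest-even would give 0x3c02.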

 docs/features.txt|   2 +-
 src/amd/common/ac_llvm_build.c   | 325 +++
 src/amd/common/ac_llvm_build.h   |  18 +-
 src/amd/common/ac_llvm_util.c|   8 +-
 src/amd/common/ac_nir_to_llvm.c  | 268 +++
 src/amd/common/ac_shader_abi.h   |   1 +
 src/amd/vulkan/radv_device.c |  17 ++
 src/amd/vulkan/radv_extensions.py|   4 +
 src/amd/vulkan/radv_nir_to_llvm.c| 123 +
 src/amd/vulkan/radv_pipeline.c   |  19 +-
 src/amd/vulkan/radv_shader.c |   4 +
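Several patches in this series make bit operations size-generic, for example "nir: make bitfield_reverse and ifind_msb work with all integers" and the "work on all bit sizes" changes. Below is a small C reference model of the intended semantics, parameterised by bit width. The signed case follows GLSL findMSB: it returns the index of the most significant bit that differs from the sign bit, or -1 for 0 and -1. The function names are hypothetical, and this is not the NIR or ac implementation.

```c
#include <stdint.h>

/* Reverse the low `bits` bits of x (bits = 8, 16, 32 or 64). */
static uint64_t bitfield_reverse_n(uint64_t x, unsigned bits)
{
   uint64_t r = 0;
   for (unsigned i = 0; i < bits; i++) {
      if (x & (UINT64_C(1) << i))
         r |= UINT64_C(1) << (bits - 1 - i);
   }
   return r;
}

/* findMSB for a signed `bits`-wide integer, passed sign-extended to 64 bits.
 * For negative inputs this finds the most significant 0 bit. */
static int ifind_msb_n(int64_t x, unsigned bits)
{
   uint64_t u = (uint64_t)x;
   if (x < 0)
      u = ~u; /* turn "most significant 0" into "most significant 1" */
   for (int i = (int)bits - 1; i >= 0; i--) {
      if (u & (UINT64_C(1) << i))
         return i;
   }
   return -1; /* x was 0 or -1 */
}
```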
 

Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-13 Thread Samuel Pitoiset


On 2/13/19 9:20 PM, Rhys Perry wrote:

Quite a few of the patches aren't specific to a single extension, as
many make code size-generic and some of the extensions overlap in
functionality.
It might still be possible to roughly order the patches by
functionality, but I'm not sure it would be very useful (a possible
order is in the attachment). I didn't look at the actual content of the
patches when creating the attachment; it is from memory and from looking
at the descriptions.
Would you like me to send out a v2 of this series ordered like that?


OK. No, that's fine.

Can you at least rebase and address Marek's feedback? I will review the v2.

Thanks Rhys.



On Tue, 12 Feb 2019 at 17:08, Samuel Pitoiset  wrote:

How about splitting this series into four different parts, one for every
extension? Is this doable without too much trouble?

On 2/12/19 6:02 PM, Rhys Perry wrote:

It currently requires review (and possibly rebasing). Marek Olšák sent
some feedback on a few of the patches, but other than that, it hasn't
gotten much attention.

Also, patch 35 seems to vectorize 32-bit code, which can help or hurt
individual shaders quite a bit and seems to hurt shaders overall. I'm not
yet sure how to solve this without removing it or significantly changing
the result of LLVM's SLP vectorizer.
IIRC, enabling the SLP vectorizer also uncovered a register allocation (RA)
bug in one shader.

I think I'll look into the issues with patch 35 again.

On Tue, 12 Feb 2019 at 16:30, Samuel Pitoiset  wrote:

What's the status of this?

On 12/7/18 6:21 PM, Rhys Perry wrote:

This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are currently disabled on LLVM 7 because of a bug
causing large memory usage and long (or unbounded) compilation times with
some tests.

It depends on the following patch series:
- https://patchwork.freedesktop.org/series/53454/
- https://patchwork.freedesktop.org/series/53602/
- https://patchwork.freedesktop.org/series/53660/

An older version was tested on my Polaris card, but due to hardware issues
I currently can't test the latest version of the series.

deqp-vk has no regressions and none of the newly enabled tests fail.

Rhys Perry (38):
 ac: add various helpers for float16/int16/int8
 ac/nir: implement 8-bit push constant, ssbo and ubo loads
 ac/nir: implement 8-bit ssbo stores
 ac/nir: fix 16-bit ssbo stores
 ac/nir: implement 8-bit nir_load_const_instr
 ac/nir: implement 8-bit conversions
 ac/nir: fix 64-bit nir_op_f2f16_rtz
 ac/nir: make ac_build_clamp work on all bit sizes
 ac/nir: make ac_build_fract work on all bit sizes
 ac/nir: make ac_build_isign work on all bit sizes
 ac/nir: make ac_build_fsign work on all bit sizes
 ac/nir: make ac_build_fdiv support 16-bit floats
 ac/nir: implement half-float nir_op_frcp
 ac/nir: implement half-float nir_op_frsq
 ac/nir: implement half-float nir_op_ldexp
 radv: lower 16-bit flrp
 ac/nir: support half floats in emit_b2f
 ac/nir: make emit_b2i work on all bit sizes
 ac/nir: implement 16-bit shifts
 compiler/nir: add lowering option for 16-bit ffma
 ac/nir: implement 16-bit ac_build_ddxy
 ac/nir: implement 8 and 16 bit ac_build_readlane
 nir: make bitfield_reverse and ifind_msb work with all integers
 ac/nir: make ac_find_lsb work on all bit sizes
 ac/nir: make ac_build_umsb work on all bit sizes
 ac/nir: implement 8 and 16 bit ac_build_imsb
 ac/nir: make ac_build_bit_count work on all bit sizes
 ac/nir: make ac_build_bitfield_reverse work on all bit sizes
 ac/nir: implement 16-bit pack/unpack opcodes
 ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
 ac/nir,radv: create an array of varying output types
 ac/nir: store all outputs as f32
 radv: store all fragment shader inputs as f32
 radv: handle all fragment output types
 ac,radv: run LLVM's SLP vectorizer
 ac/nir: generate better code for nir_op_f2f16_rtz
 ac/nir: have nir_op_f2f16 round to zero
 radv: expose float16, int16 and int8 features and extensions

src/amd/common/ac_llvm_build.c| 355 ++
src/amd/common/ac_llvm_build.h|  22 +-
src/amd/common/ac_llvm_util.c |   9 +-
src/amd/common/ac_llvm_util.h |   1 +
src/amd/common/ac_nir_to_llvm.c   | 258 +++
src/amd/common/ac_shader_abi.h|   1 +
src/amd/vulkan/radv_device.c  |  17 ++
src/amd/vulkan/radv_extensions.py |   4 +
src/amd/vulkan/radv_nir_to_llvm.c |  92 ---
src/amd/vulkan/radv_shader.c  |   7 +
src/broadcom/compiler/nir_to_vir.c|   1 +
src/compiler/nir/nir.h|   1 +
src/compiler/nir/nir_opcodes.py   |   4 +-
src/compiler/nir/nir_opt_algebraic.py |   4 +-
src/gallium/drivers/radeonsi/si_get.c |   1 +
src/gallium/drivers/vc4/vc4_program.c |   1 

Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-13 Thread Rhys Perry
Quite a few of the patches aren't specific to a single extension, as
many make code size-generic and some of the extensions overlap in
functionality.
It might still be possible to roughly order the patches by
functionality, but I'm not sure it would be very useful (a possible
order is in the attachment). I didn't look at the actual content of the
patches when creating the attachment; it is from memory and from looking
at the descriptions.
Would you like me to send out a v2 of this series ordered like that?

On Tue, 12 Feb 2019 at 17:08, Samuel Pitoiset  wrote:
>
> How about splitting this series into four different parts, one for every
> extension? Is this doable without too much trouble?
>
> On 2/12/19 6:02 PM, Rhys Perry wrote:
> > It currently requires review (and possibly rebasing). Marek Olšák sent
> > some feedback on a few of the patches, but other than that, it hasn't
> > gotten much attention.
> >
> > Also, patch 35 seems to vectorize 32-bit code, which can help or hurt
> > individual shaders quite a bit and seems to hurt shaders overall. I'm not
> > yet sure how to solve this without removing it or significantly changing
> > the result of LLVM's SLP vectorizer.
> > IIRC, enabling the SLP vectorizer also uncovered a register allocation (RA)
> > bug in one shader.
> >
> > I think I'll look into the issues with patch 35 again.
> >
> > On Tue, 12 Feb 2019 at 16:30, Samuel Pitoiset  
> > wrote:
> >> What's the status of this?
> >>
> >> On 12/7/18 6:21 PM, Rhys Perry wrote:
> >>> This series adds support for:
> >>> - VK_KHR_shader_float16_int8
> >>> - VK_AMD_gpu_shader_half_float
> >>> - VK_AMD_gpu_shader_int16
> >>> - VK_KHR_8bit_storage
> >>> on VI+. Half floats are currently disabled on LLVM 7 because of a bug
> >>> causing large memory usage and long (or unbounded) compilation times with
> >>> some tests.
> >>>
> >>> It depends on the following patch series:
> >>> - https://patchwork.freedesktop.org/series/53454/
> >>> - https://patchwork.freedesktop.org/series/53602/
> >>> - https://patchwork.freedesktop.org/series/53660/
> >>>
> >>> An older version was tested on my Polaris card, but due to hardware issues
> >>> I currently can't test the latest version of the series.
> >>>
> >>> deqp-vk has no regressions and none of the newly enabled tests fail.
> >>>
> >>> Rhys Perry (38):
> >>> ac: add various helpers for float16/int16/int8
> >>> ac/nir: implement 8-bit push constant, ssbo and ubo loads
> >>> ac/nir: implement 8-bit ssbo stores
> >>> ac/nir: fix 16-bit ssbo stores
> >>> ac/nir: implement 8-bit nir_load_const_instr
> >>> ac/nir: implement 8-bit conversions
> >>> ac/nir: fix 64-bit nir_op_f2f16_rtz
> >>> ac/nir: make ac_build_clamp work on all bit sizes
> >>> ac/nir: make ac_build_fract work on all bit sizes
> >>> ac/nir: make ac_build_isign work on all bit sizes
> >>> ac/nir: make ac_build_fsign work on all bit sizes
> >>> ac/nir: make ac_build_fdiv support 16-bit floats
> >>> ac/nir: implement half-float nir_op_frcp
> >>> ac/nir: implement half-float nir_op_frsq
> >>> ac/nir: implement half-float nir_op_ldexp
> >>> radv: lower 16-bit flrp
> >>> ac/nir: support half floats in emit_b2f
> >>> ac/nir: make emit_b2i work on all bit sizes
> >>> ac/nir: implement 16-bit shifts
> >>> compiler/nir: add lowering option for 16-bit ffma
> >>> ac/nir: implement 16-bit ac_build_ddxy
> >>> ac/nir: implement 8 and 16 bit ac_build_readlane
> >>> nir: make bitfield_reverse and ifind_msb work with all integers
> >>> ac/nir: make ac_find_lsb work on all bit sizes
> >>> ac/nir: make ac_build_umsb work on all bit sizes
> >>> ac/nir: implement 8 and 16 bit ac_build_imsb
> >>> ac/nir: make ac_build_bit_count work on all bit sizes
> >>> ac/nir: make ac_build_bitfield_reverse work on all bit sizes
> >>> ac/nir: implement 16-bit pack/unpack opcodes
> >>> ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
> >>> ac/nir,radv: create an array of varying output types
> >>> ac/nir: store all outputs as f32
> >>> radv: store all fragment shader inputs as f32
> >>> radv: handle all fragment output types
> >>> ac,radv: run LLVM's SLP vectorizer
> >>> ac/nir: generate better code for nir_op_f2f16_rtz
> >>> ac/nir: have nir_op_f2f16 round to zero
> >>> radv: expose float16, int16 and int8 features and extensions
> >>>
> >>>src/amd/common/ac_llvm_build.c| 355 ++
> >>>src/amd/common/ac_llvm_build.h|  22 +-
> >>>src/amd/common/ac_llvm_util.c |   9 +-
> >>>src/amd/common/ac_llvm_util.h |   1 +
> >>>src/amd/common/ac_nir_to_llvm.c   | 258 +++
> >>>src/amd/common/ac_shader_abi.h|   1 +
> >>>src/amd/vulkan/radv_device.c  |  17 ++
> >>>src/amd/vulkan/radv_extensions.py |   4 +
> >>>src/amd/vulkan/radv_nir_to_llvm.c |  92 ---
> >>>src/amd/vulkan/radv_shader.c  |   7 +
> >>>

Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-12 Thread Samuel Pitoiset
How about splitting this series into four different parts, one for every
extension? Is this doable without too much trouble?


On 2/12/19 6:02 PM, Rhys Perry wrote:

It currently requires review (and possibly rebasing). Marek Olšák sent
some feedback on a few of the patches, but other than that, it hasn't
gotten much attention.

Also, patch 35 seems to vectorize 32-bit code, which can help or hurt
individual shaders quite a bit and seems to hurt shaders overall. I'm not
yet sure how to solve this without removing it or significantly changing
the result of LLVM's SLP vectorizer.
IIRC, enabling the SLP vectorizer also uncovered a register allocation (RA)
bug in one shader.

I think I'll look into the issues with patch 35 again.

On Tue, 12 Feb 2019 at 16:30, Samuel Pitoiset  wrote:

What's the status of this?

On 12/7/18 6:21 PM, Rhys Perry wrote:

This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are currently disabled on LLVM 7 because of a bug
causing large memory usage and long (or unbounded) compilation times with
some tests.

It depends on the following patch series:
- https://patchwork.freedesktop.org/series/53454/
- https://patchwork.freedesktop.org/series/53602/
- https://patchwork.freedesktop.org/series/53660/

An older version was tested on my Polaris card, but due to hardware issues
I currently can't test the latest version of the series.

deqp-vk has no regressions and none of the newly enabled tests fail.

Rhys Perry (38):
ac: add various helpers for float16/int16/int8
ac/nir: implement 8-bit push constant, ssbo and ubo loads
ac/nir: implement 8-bit ssbo stores
ac/nir: fix 16-bit ssbo stores
ac/nir: implement 8-bit nir_load_const_instr
ac/nir: implement 8-bit conversions
ac/nir: fix 64-bit nir_op_f2f16_rtz
ac/nir: make ac_build_clamp work on all bit sizes
ac/nir: make ac_build_fract work on all bit sizes
ac/nir: make ac_build_isign work on all bit sizes
ac/nir: make ac_build_fsign work on all bit sizes
ac/nir: make ac_build_fdiv support 16-bit floats
ac/nir: implement half-float nir_op_frcp
ac/nir: implement half-float nir_op_frsq
ac/nir: implement half-float nir_op_ldexp
radv: lower 16-bit flrp
ac/nir: support half floats in emit_b2f
ac/nir: make emit_b2i work on all bit sizes
ac/nir: implement 16-bit shifts
compiler/nir: add lowering option for 16-bit ffma
ac/nir: implement 16-bit ac_build_ddxy
ac/nir: implement 8 and 16 bit ac_build_readlane
nir: make bitfield_reverse and ifind_msb work with all integers
ac/nir: make ac_find_lsb work on all bit sizes
ac/nir: make ac_build_umsb work on all bit sizes
ac/nir: implement 8 and 16 bit ac_build_imsb
ac/nir: make ac_build_bit_count work on all bit sizes
ac/nir: make ac_build_bitfield_reverse work on all bit sizes
ac/nir: implement 16-bit pack/unpack opcodes
ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
ac/nir,radv: create an array of varying output types
ac/nir: store all outputs as f32
radv: store all fragment shader inputs as f32
radv: handle all fragment output types
ac,radv: run LLVM's SLP vectorizer
ac/nir: generate better code for nir_op_f2f16_rtz
ac/nir: have nir_op_f2f16 round to zero
radv: expose float16, int16 and int8 features and extensions

   src/amd/common/ac_llvm_build.c| 355 ++
   src/amd/common/ac_llvm_build.h|  22 +-
   src/amd/common/ac_llvm_util.c |   9 +-
   src/amd/common/ac_llvm_util.h |   1 +
   src/amd/common/ac_nir_to_llvm.c   | 258 +++
   src/amd/common/ac_shader_abi.h|   1 +
   src/amd/vulkan/radv_device.c  |  17 ++
   src/amd/vulkan/radv_extensions.py |   4 +
   src/amd/vulkan/radv_nir_to_llvm.c |  92 ---
   src/amd/vulkan/radv_shader.c  |   7 +
   src/broadcom/compiler/nir_to_vir.c|   1 +
   src/compiler/nir/nir.h|   1 +
   src/compiler/nir/nir_opcodes.py   |   4 +-
   src/compiler/nir/nir_opt_algebraic.py |   4 +-
   src/gallium/drivers/radeonsi/si_get.c |   1 +
   src/gallium/drivers/vc4/vc4_program.c |   1 +
   16 files changed, 516 insertions(+), 262 deletions(-)
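The "16-bit pack/unpack opcodes" patch deals with moving data between one 32-bit value and two 16-bit halves. Below is a minimal C model. It assumes the first component occupies the low 16 bits, as with NIR's pack_32_2x16_split. The function names are hypothetical, and this is not the ac/nir implementation.

```c
#include <stdint.h>

/* Place x in the low half and y in the high half of a 32-bit word. */
static uint32_t pack_32_2x16(uint16_t x, uint16_t y)
{
   return (uint32_t)x | ((uint32_t)y << 16);
}

/* Extract the low (x) and high (y) 16-bit halves again. */
static uint16_t unpack_32_2x16_x(uint32_t v)
{
   return (uint16_t)(v & 0xffffu);
}

static uint16_t unpack_32_2x16_y(uint32_t v)
{
   return (uint16_t)(v >> 16);
}
```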


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-12 Thread Rhys Perry
It currently requires review (and possibly rebasing). Marek Olšák sent
some feedback on a few of the patches, but other than that, it hasn't
gotten much attention.

Also, patch 35 seems to vectorize 32-bit code, which can help or hurt
individual shaders quite a bit and seems to hurt shaders overall. I'm not
yet sure how to solve this without removing it or significantly changing
the result of LLVM's SLP vectorizer.
IIRC, enabling the SLP vectorizer also uncovered a register allocation (RA)
bug in one shader.

I think I'll look into the issues with patch 35 again.

On Tue, 12 Feb 2019 at 16:30, Samuel Pitoiset  wrote:
>
> What's the status of this?
>
> On 12/7/18 6:21 PM, Rhys Perry wrote:
> > This series adds support for:
> > - VK_KHR_shader_float16_int8
> > - VK_AMD_gpu_shader_half_float
> > - VK_AMD_gpu_shader_int16
> > - VK_KHR_8bit_storage
> > on VI+. Half floats are currently disabled on LLVM 7 because of a bug
> > causing large memory usage and long (or unbounded) compilation times with
> > some tests.
> >
> > It depends on the following patch series:
> > - https://patchwork.freedesktop.org/series/53454/
> > - https://patchwork.freedesktop.org/series/53602/
> > - https://patchwork.freedesktop.org/series/53660/
> >
> > An older version was tested on my Polaris card, but due to hardware issues
> > I currently can't test the latest version of the series.
> >
> > deqp-vk has no regressions and none of the newly enabled tests fail.
> >
> > Rhys Perry (38):
> >   ac: add various helpers for float16/int16/int8
> >   ac/nir: implement 8-bit push constant, ssbo and ubo loads
> >   ac/nir: implement 8-bit ssbo stores
> >   ac/nir: fix 16-bit ssbo stores
> >   ac/nir: implement 8-bit nir_load_const_instr
> >   ac/nir: implement 8-bit conversions
> >   ac/nir: fix 64-bit nir_op_f2f16_rtz
> >   ac/nir: make ac_build_clamp work on all bit sizes
> >   ac/nir: make ac_build_fract work on all bit sizes
> >   ac/nir: make ac_build_isign work on all bit sizes
> >   ac/nir: make ac_build_fsign work on all bit sizes
> >   ac/nir: make ac_build_fdiv support 16-bit floats
> >   ac/nir: implement half-float nir_op_frcp
> >   ac/nir: implement half-float nir_op_frsq
> >   ac/nir: implement half-float nir_op_ldexp
> >   radv: lower 16-bit flrp
> >   ac/nir: support half floats in emit_b2f
> >   ac/nir: make emit_b2i work on all bit sizes
> >   ac/nir: implement 16-bit shifts
> >   compiler/nir: add lowering option for 16-bit ffma
> >   ac/nir: implement 16-bit ac_build_ddxy
> >   ac/nir: implement 8 and 16 bit ac_build_readlane
> >   nir: make bitfield_reverse and ifind_msb work with all integers
> >   ac/nir: make ac_find_lsb work on all bit sizes
> >   ac/nir: make ac_build_umsb work on all bit sizes
> >   ac/nir: implement 8 and 16 bit ac_build_imsb
> >   ac/nir: make ac_build_bit_count work on all bit sizes
> >   ac/nir: make ac_build_bitfield_reverse work on all bit sizes
> >   ac/nir: implement 16-bit pack/unpack opcodes
> >   ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
> >   ac/nir,radv: create an array of varying output types
> >   ac/nir: store all outputs as f32
> >   radv: store all fragment shader inputs as f32
> >   radv: handle all fragment output types
> >   ac,radv: run LLVM's SLP vectorizer
> >   ac/nir: generate better code for nir_op_f2f16_rtz
> >   ac/nir: have nir_op_f2f16 round to zero
> >   radv: expose float16, int16 and int8 features and extensions
> >
> >  src/amd/common/ac_llvm_build.c        | 355 ++
> >  src/amd/common/ac_llvm_build.h        |  22 +-
> >  src/amd/common/ac_llvm_util.c         |   9 +-
> >  src/amd/common/ac_llvm_util.h         |   1 +
> >  src/amd/common/ac_nir_to_llvm.c       | 258 +++
> >  src/amd/common/ac_shader_abi.h        |   1 +
> >  src/amd/vulkan/radv_device.c          |  17 ++
> >  src/amd/vulkan/radv_extensions.py     |   4 +
> >  src/amd/vulkan/radv_nir_to_llvm.c     |  92 ---
> >  src/amd/vulkan/radv_shader.c          |   7 +
> >  src/broadcom/compiler/nir_to_vir.c    |   1 +
> >  src/compiler/nir/nir.h                |   1 +
> >  src/compiler/nir/nir_opcodes.py       |   4 +-
> >  src/compiler/nir/nir_opt_algebraic.py |   4 +-
> >  src/gallium/drivers/radeonsi/si_get.c |   1 +
> >  src/gallium/drivers/vc4/vc4_program.c |   1 +
> >  16 files changed, 516 insertions(+), 262 deletions(-)
> >
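Several commits in the quoted list make ac's bit-manipulation helpers (ac_build_umsb, ac_build_imsb, ac_build_bit_count, ac_build_bitfield_reverse) work on 8-bit and 16-bit operands. As a rough reference for what that means, here is a small C model of the results those operations should produce at a given bit width, based on my understanding of the NIR/GLSL semantics: ufind_msb returns the highest set bit or -1, ifind_msb returns the highest bit that differs from the sign bit (-1 for 0 and -1), bitfield_reverse mirrors the bits within the operand width, and bit_count counts only bits inside that width. The helper names umsb, imsb, bit_reverse and bit_count are hypothetical; this is a semantic sketch, not the LLVM IR the series generates.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low `bits` bits (bits must be 1..32). */
static uint32_t low_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return bits == 32 ? 0xffffffffu : (1u << bits) - 1u;
}

/* ufind_msb-style: index of the highest set bit, or -1 if the value is 0. */
static int umsb(uint32_t x, unsigned bits)
{
    x &= low_mask(bits);
    for (int i = (int)bits - 1; i >= 0; i--)
        if (x & (1u << i))
            return i;
    return -1;
}

/* ifind_msb-style: index of the highest bit that differs from the sign bit,
 * or -1 when the value is 0 or -1 at this width. */
static int imsb(uint32_t x, unsigned bits)
{
    uint32_t mask = low_mask(bits);
    x &= mask;
    if (x & (1u << (bits - 1)))   /* negative: search for the highest 0 bit instead */
        x = ~x & mask;
    return umsb(x, bits);
}

/* bitfield_reverse-style: mirror the low `bits` bits. */
static uint32_t bit_reverse(uint32_t x, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++)
        if (x & (1u << i))
            r |= 1u << (bits - 1 - i);
    return r;
}

/* bit_count-style: population count of the low `bits` bits only. */
static unsigned bit_count(uint32_t x, unsigned bits)
{
    unsigned n = 0;
    for (unsigned i = 0; i < bits; i++)
        n += (x >> i) & 1u;
    return n;
}
```

With these definitions bit_reverse(0x01, 8) is 0x80 while bit_reverse(0x01, 16) is 0x8000, and imsb(0x80, 8) is 6 because 0x80 is -128 as an 8-bit value; a helper that silently assumed 32-bit operands would get both wrong.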
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-12 Thread Samuel Pitoiset

What's the status of this?

On 12/7/18 6:21 PM, Rhys Perry wrote:


[Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2018-12-07 Thread Rhys Perry
This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are currently disabled on LLVM 7 because of a bug
causing large memory usage and long (or unbounded) compilation times with
some tests.
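The gating described above (VI and newer only, with half floats additionally switched off when building against LLVM 7) can be summarised as a tiny C sketch. Everything in it is hypothetical: enum gpu_gen, struct arith_features and pick_features are illustrative names, not radv's actual code in radv_device.c, and the sketch encodes only the two rules stated in this cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified GPU generations; only the ordering matters.
 * VI is GFX8 (e.g. Polaris), GFX9 is Vega. */
enum gpu_gen { GEN_SI, GEN_CIK, GEN_VI, GEN_GFX9 };

/* Hypothetical summary of the arithmetic features this series exposes. */
struct arith_features {
    bool float16;
    bool int16;
    bool int8;
};

/* Hypothetical helper: everything requires VI+, and half floats are
 * additionally disabled on LLVM 7 because of the compile-time bug. */
static struct arith_features pick_features(enum gpu_gen gen, unsigned llvm_major)
{
    assert(gen <= GEN_GFX9);
    bool vi_plus = gen >= GEN_VI;
    struct arith_features f = {
        .float16 = vi_plus && llvm_major != 7,
        .int16 = vi_plus,
        .int8 = vi_plus,
    };
    return f;
}
```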

It depends on the following patch series:
- https://patchwork.freedesktop.org/series/53454/
- https://patchwork.freedesktop.org/series/53602/
- https://patchwork.freedesktop.org/series/53660/

An older version was tested on my Polaris card, but due to hardware issues
I currently can't test the latest version of the series.

deqp-vk has no regressions and none of the newly enabled tests fail.

Rhys Perry (38):
  ac: add various helpers for float16/int16/int8
  ac/nir: implement 8-bit push constant, ssbo and ubo loads
  ac/nir: implement 8-bit ssbo stores
  ac/nir: fix 16-bit ssbo stores
  ac/nir: implement 8-bit nir_load_const_instr
  ac/nir: implement 8-bit conversions
  ac/nir: fix 64-bit nir_op_f2f16_rtz
  ac/nir: make ac_build_clamp work on all bit sizes
  ac/nir: make ac_build_fract work on all bit sizes
  ac/nir: make ac_build_isign work on all bit sizes
  ac/nir: make ac_build_fsign work on all bit sizes
  ac/nir: make ac_build_fdiv support 16-bit floats
  ac/nir: implement half-float nir_op_frcp
  ac/nir: implement half-float nir_op_frsq
  ac/nir: implement half-float nir_op_ldexp
  radv: lower 16-bit flrp
  ac/nir: support half floats in emit_b2f
  ac/nir: make emit_b2i work on all bit sizes
  ac/nir: implement 16-bit shifts
  compiler/nir: add lowering option for 16-bit ffma
  ac/nir: implement 16-bit ac_build_ddxy
  ac/nir: implement 8 and 16 bit ac_build_readlane
  nir: make bitfield_reverse and ifind_msb work with all integers
  ac/nir: make ac_find_lsb work on all bit sizes
  ac/nir: make ac_build_umsb work on all bit sizes
  ac/nir: implement 8 and 16 bit ac_build_imsb
  ac/nir: make ac_build_bit_count work on all bit sizes
  ac/nir: make ac_build_bitfield_reverse work on all bit sizes
  ac/nir: implement 16-bit pack/unpack opcodes
  ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
  ac/nir,radv: create an array of varying output types
  ac/nir: store all outputs as f32
  radv: store all fragment shader inputs as f32
  radv: handle all fragment output types
  ac,radv: run LLVM's SLP vectorizer
  ac/nir: generate better code for nir_op_f2f16_rtz
  ac/nir: have nir_op_f2f16 round to zero
  radv: expose float16, int16 and int8 features and extensions
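The last commits change the f32 to f16 conversion: nir_op_f2f16 is made to round toward zero, and nir_op_f2f16_rtz gets better code. As a reference for what round-toward-zero means bit for bit, here is a small software model in C. It only models the IEEE rounding mode; the function name f32_to_f16_rtz is hypothetical, and this is not how ac_nir_to_llvm.c lowers the opcode (it emits LLVM IR instead).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: IEEE binary32 -> binary16 bits, rounding toward zero. */
static uint16_t f32_to_f16_rtz(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);            /* bit-level decode assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);

    uint16_t sign = (uint16_t)((u >> 16) & 0x8000u);
    uint32_t biased_exp = (u >> 23) & 0xffu;
    uint32_t mant = u & 0x7fffffu;

    if (biased_exp == 0xffu)                 /* Inf stays Inf; any NaN becomes a quiet NaN */
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x0200u : 0u));

    int e = (int)biased_exp - 127 + 15;      /* rebias from f32 to f16 exponent */
    if (e >= 0x1f)                           /* too large: RTZ saturates to max finite f16 */
        return (uint16_t)(sign | 0x7bffu);
    if (e <= 0) {                            /* result is an f16 subnormal or zero */
        if (e < -10)
            return sign;                     /* value < 2^-25: truncates to signed zero */
        mant |= 0x800000u;                   /* make the implicit leading 1 explicit */
        return (uint16_t)(sign | (mant >> (14 - e)));  /* truncating shift = RTZ */
    }
    /* Normal range: keep the top 10 mantissa bits, drop the rest (truncation = RTZ). */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}
```

The difference from the default round-to-nearest-even shows up in two places: 1.000732421875 (0.75 of an f16 ulp above 1.0) truncates to 0x3c00 instead of rounding up to 0x3c01, and values at or above 65536 saturate to 65504 (0x7bff) instead of becoming infinity.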

 src/amd/common/ac_llvm_build.c        | 355 ++
 src/amd/common/ac_llvm_build.h        |  22 +-
 src/amd/common/ac_llvm_util.c         |   9 +-
 src/amd/common/ac_llvm_util.h         |   1 +
 src/amd/common/ac_nir_to_llvm.c       | 258 +++
 src/amd/common/ac_shader_abi.h        |   1 +
 src/amd/vulkan/radv_device.c          |  17 ++
 src/amd/vulkan/radv_extensions.py     |   4 +
 src/amd/vulkan/radv_nir_to_llvm.c     |  92 ---
 src/amd/vulkan/radv_shader.c          |   7 +
 src/broadcom/compiler/nir_to_vir.c    |   1 +
 src/compiler/nir/nir.h                |   1 +
 src/compiler/nir/nir_opcodes.py       |   4 +-
 src/compiler/nir/nir_opt_algebraic.py |   4 +-
 src/gallium/drivers/radeonsi/si_get.c |   1 +
 src/gallium/drivers/vc4/vc4_program.c |   1 +
 16 files changed, 516 insertions(+), 262 deletions(-)

