Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
The CTS is buggy: the input_output_float_64_to_16 tests are run even though they shouldn't be, because they try to use an unadvertised (and unimplemented) optional feature.

Some of them crash for unrelated reasons, though: load_tess_varyings() in ac_nir_to_llvm.c doesn't handle 64-bit varyings. So not all of them would work even if VK_FORMAT_R64_SFLOAT were an implemented vertex format.

On Mon, 18 Feb 2019 at 08:53, Samuel Pitoiset wrote:
>
> On 2/16/19 1:21 AM, Rhys Perry wrote:
> > This series adds support for:
> > - VK_KHR_shader_float16_int8
> > - VK_AMD_gpu_shader_half_float
> > - VK_AMD_gpu_shader_int16
> > - VK_KHR_8bit_storage
> > on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
> > memory usage and long (or unbounded) compilation times with some CTS
> > tests.
> >
> > It is written against the following patch series:
> > - https://patchwork.freedesktop.org/series/53454/ (v4)
> > - https://patchwork.freedesktop.org/series/53660/ (v1)
> >
> > With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for
> > dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
> > which fail or crash because of unrelated radv bugs with 64-bit varyings
> > and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
> > though radv does not support it.
>
> test bug?
>
> The two NIR-related patches (22 and 25) should be sent separately,
> otherwise people working on NIR might miss them.
Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
On 2/16/19 1:21 AM, Rhys Perry wrote:
> This series adds support for:
> - VK_KHR_shader_float16_int8
> - VK_AMD_gpu_shader_half_float
> - VK_AMD_gpu_shader_int16
> - VK_KHR_8bit_storage
> on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
> memory usage and long (or unbounded) compilation times with some CTS
> tests.
>
> It is written against the following patch series:
> - https://patchwork.freedesktop.org/series/53454/ (v4)
> - https://patchwork.freedesktop.org/series/53660/ (v1)
>
> With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
> and VI except for
> dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
> which fail or crash because of unrelated radv bugs with 64-bit varyings
> and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
> though radv does not support it.

test bug?

The two NIR-related patches (22 and 25) should be sent separately, otherwise people working on NIR might miss them.
[Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are disabled on LLVM 7 because of a bug causing large memory usage and long (or unbounded) compilation times with some CTS tests.

It is written against the following patch series:
- https://patchwork.freedesktop.org/series/53454/ (v4)
- https://patchwork.freedesktop.org/series/53660/ (v1)

With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega and VI except for
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
which fail or crash because of unrelated radv bugs with 64-bit varyings and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even though radv does not support it.

With LLVM 9, there are no reproducible piglit regressions except for glsl-array-bounds-12.shader_test because of an LLVM bug when SLP vectorization is enabled.

With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega and VI except for those with LLVM 9 and a couple of tests because of an LLVM bug after the SLP vectorizer and with the current lack of fallback for 16-bit interpolation on LLVM versions before LLVM 9.

With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega and VI except for those with LLVM 9 and a couple of tests because of an LLVM bug after the SLP vectorizer.

The SLP vectorization patch is marked as WIP because it exposes LLVM bugs with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and some shader-db test for a game I can't remember. It also over-vectorizes 32-bit code, which can significantly worsen generated code quality.

The 16-bit interpolation patch is marked as WIP because it currently requires intrinsics only available in LLVM 9 and does not have a fallback.

A branch on GitHub containing this series can be found at:
https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2

v2: rebase
v2: implement 16-bit interpolation
v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
v2: run vectorization unconditionally on GFX9 and later
v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
v2: remove ac_int_of_size()
v2: fix 64-bit visit_load_var()
v2: mark VK_KHR_8bit_storage as DONE in features.txt
v2: mark SLP vectorization patch as WIP
v2: fix C++ style comment

Rhys Perry (41):
  radv: bitcast 16-bit outputs to integers
  radv: ensure export arguments are always float
  ac: add various helpers for float16/int16/int8
  ac/nir: implement 8-bit push constant, ssbo and ubo loads
  ac/nir: implement 8-bit ssbo stores
  ac/nir: fix 16-bit ssbo stores
  ac/nir: implement 8-bit nir_load_const_instr
  ac/nir: implement 8-bit conversions
  ac/nir: fix 64-bit nir_op_f2f16_rtz
  ac/nir: make ac_build_clamp work on all bit sizes
  ac/nir: make ac_build_fract work on all bit sizes
  ac/nir: make ac_build_isign work on all bit sizes
  ac/nir: make ac_build_fsign work on all bit sizes
  ac/nir: make ac_build_fdiv support 16-bit floats
  ac/nir: implement half-float nir_op_frcp
  ac/nir: implement half-float nir_op_frsq
  ac/nir: implement half-float nir_op_ldexp
  radv: lower 16-bit flrp
  ac/nir: support half floats in emit_b2f
  ac/nir: make emit_b2i work on all bit sizes
  ac/nir: implement 16-bit shifts
  compiler/nir: add lowering option for 16-bit ffma
  ac/nir: implement 16-bit ac_build_ddxy
  ac/nir: implement 8 and 16 bit ac_build_readlane
  nir: make bitfield_reverse and ifind_msb work with all integers
  ac/nir: make ac_find_lsb work on all bit sizes
  ac/nir: make ac_build_umsb work on all bit sizes
  ac/nir: implement 8 and 16 bit ac_build_imsb
  ac/nir: make ac_build_bit_count work on all bit sizes
  ac/nir: make ac_build_bitfield_reverse work on all bit sizes
  ac/nir: implement 16-bit pack/unpack opcodes
  ac/nir: add 8-bit types to glsl_base_to_llvm_type
  ac/nir,radv: create an array of varying output types
  ac/nir: store all outputs as f32
  radv: store all fragment shader inputs as f32
  radv: handle all fragment output types
  WIP: radv,ac: implement 16-bit interpolation
  WIP: ac,radv: run LLVM's SLP vectorizer
  ac/nir: generate better code for nir_op_f2f16_rtz
  ac/nir: have nir_op_f2f16 round to zero
  radv,docs: expose float16, int16 and int8 features and extensions

 docs/features.txt                 |   2 +-
 src/amd/common/ac_llvm_build.c    | 325 +++
 src/amd/common/ac_llvm_build.h    |  18 +-
 src/amd/common/ac_llvm_util.c     |   8 +-
 src/amd/common/ac_nir_to_llvm.c   | 268 +++
 src/amd/common/ac_shader_abi.h    |   1 +
 src/amd/vulkan/radv_device.c      |  17 ++
 src/amd/vulkan/radv_extensions.py |   4 +
 src/amd/vulkan/radv_nir_to_llvm.c | 123 +
 src/amd/vulkan/radv_pipeline.c    |  19 +-
 src/amd/vulkan/radv_shader.c      |   4 +
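
Two of the patch titles in the list above concern nir_op_f2f16 rounding toward zero. As a purely numerical illustration of what round-toward-zero means for a 32-bit to 16-bit float conversion, here is a minimal stand-alone sketch. It is not code from the series (which emits LLVM IR); the function name f32_to_f16_rtz is hypothetical, and the handling of NaN payloads is an assumption made only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: convert an IEEE 754 binary32
 * value to binary16 bits, discarding (truncating) the extra mantissa bits,
 * i.e. rounding toward zero. */
static uint16_t f32_to_f16_rtz(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));          /* type-pun without UB */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t exp  = (bits >> 23) & 0xffu;     /* biased binary32 exponent */
    uint32_t man  = bits & 0x7fffffu;         /* 23-bit mantissa */

    if (exp == 0xffu) {
        if (man == 0)
            return (uint16_t)(sign | 0x7c00u);            /* +/- infinity */
        /* NaN: set the quiet bit so truncation cannot produce infinity
         * (assumption: payload preservation is not required here). */
        return (uint16_t)(sign | 0x7e00u | (man >> 13));
    }

    int32_t e = (int32_t)exp - 127 + 15;      /* re-bias for binary16 */

    if (e >= 31)
        return (uint16_t)(sign | 0x7bffu);    /* too large: clamp to the
                                                 largest finite half, since
                                                 rounding toward zero never
                                                 reaches infinity */
    if (e <= 0) {
        if (e < -10)
            return sign;                      /* magnitude below 2^-25: zero */
        man |= 0x800000u;                     /* restore the implicit 1 */
        return (uint16_t)(sign | (man >> (14 - e)));  /* half subnormal */
    }

    return (uint16_t)(sign | ((uint32_t)e << 10) | (man >> 13));
}
```

For example, 65520.0f lies between the largest finite half (65504) and the value that would become infinity; under round-to-nearest-even it converts to infinity (0x7c00), whereas this truncating sketch returns 0x7bff. Likewise 1.0009f yields 0x3c00 here rather than 0x3c01.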
Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
On 2/13/19 9:20 PM, Rhys Perry wrote:
> Quite a few of the patches aren't specific to a single extension, as many
> make code size-generic and some of the extensions intersect in
> functionality.
>
> It might still be possible to roughly order the patches by functionality,
> but I'm not sure if it would be very useful (possible order in
> attachment).
>
> I didn't look at the actual content of the patches when creating the
> attachment; this is from memory and from looking at the descriptions.
>
> Would you like me to send out a v2 of this series ordered like that?

Ok. No, that's fine. Can you rebase and address Marek's feedback, at least? I will review the v2.

Thanks, Rhys.
Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
Quite a few of the patches aren't specific to a single extension, as many make code size-generic and some of the extensions intersect in functionality.

It might still be possible to roughly order the patches by functionality, but I'm not sure if it would be very useful (possible order in attachment).

I didn't look at the actual content of the patches when creating the attachment; this is from memory and from looking at the descriptions.

Would you like me to send out a v2 of this series ordered like that?

On Tue, 12 Feb 2019 at 17:08, Samuel Pitoiset wrote:
>
> How about splitting this series into four different parts? One for every
> extension? Is this doable without too much trouble?
Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
How about splitting this series into four different parts? One for every extension? Is this doable without too much trouble?

On 2/12/19 6:02 PM, Rhys Perry wrote:
> It currently requires review (and possibly rebasing). Marek Olšák sent
> some feedback for a few of the patches but other than that, it hasn't
> gotten much attention.
>
> Also patch 35 seems to vectorize 32-bit code, which can help or hurt
> shaders quite a bit and seems to hurt shaders overall. I'm not yet
> sure how to solve this without removing it or changing the result of
> LLVM's SLP vectorizer significantly.
> IIRC enabling the SLP vectorizer also uncovered an RA bug with a shader.
>
> I think I'll look into the issues with patch 35 again.
Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
It currently requires review (and possibly rebasing). Marek Olšák sent some feedback for a few of the patches but other than that, it hasn't gotten much attention.

Also patch 35 seems to vectorize 32-bit code, which can help or hurt shaders quite a bit and seems to hurt shaders overall. I'm not yet sure how to solve this without removing it or changing the result of LLVM's SLP vectorizer significantly.
IIRC enabling the SLP vectorizer also uncovered an RA bug with a shader.

I think I'll look into the issues with patch 35 again.

On Tue, 12 Feb 2019 at 16:30, Samuel Pitoiset wrote:
>
> What's the status of this?
>
> On 12/7/18 6:21 PM, Rhys Perry wrote:
> > This series adds support for:
> > - VK_KHR_shader_float16_int8
> > - VK_AMD_gpu_shader_half_float
> > - VK_AMD_gpu_shader_int16
> > - VK_KHR_8bit_storage
> > on VI+. Half floats are currently disabled on LLVM 7 because of a bug
> > causing large memory usage and long (or unbounded) compilation times with
> > some tests.
> >
> > It depends on the following patch series:
> > - https://patchwork.freedesktop.org/series/53454/
> > - https://patchwork.freedesktop.org/series/53602/
> > - https://patchwork.freedesktop.org/series/53660/
> >
> > An older version was tested on my Polaris card, but due to hardware issues
> > I currently can't test the latest version of the series.
> >
> > deqp-vk has no regressions and none of the newly enabled tests fail.
> >
> > Rhys Perry (38):
> >   ac: add various helpers for float16/int16/int8
> >   ac/nir: implement 8-bit push constant, ssbo and ubo loads
> >   ac/nir: implement 8-bit ssbo stores
> >   ac/nir: fix 16-bit ssbo stores
> >   ac/nir: implement 8-bit nir_load_const_instr
> >   ac/nir: implement 8-bit conversions
> >   ac/nir: fix 64-bit nir_op_f2f16_rtz
> >   ac/nir: make ac_build_clamp work on all bit sizes
> >   ac/nir: make ac_build_fract work on all bit sizes
> >   ac/nir: make ac_build_isign work on all bit sizes
> >   ac/nir: make ac_build_fsign work on all bit sizes
> >   ac/nir: make ac_build_fdiv support 16-bit floats
> >   ac/nir: implement half-float nir_op_frcp
> >   ac/nir: implement half-float nir_op_frsq
> >   ac/nir: implement half-float nir_op_ldexp
> >   radv: lower 16-bit flrp
> >   ac/nir: support half floats in emit_b2f
> >   ac/nir: make emit_b2i work on all bit sizes
> >   ac/nir: implement 16-bit shifts
> >   compiler/nir: add lowering option for 16-bit ffma
> >   ac/nir: implement 16-bit ac_build_ddxy
> >   ac/nir: implement 8 and 16 bit ac_build_readlane
> >   nir: make bitfield_reverse and ifind_msb work with all integers
> >   ac/nir: make ac_find_lsb work on all bit sizes
> >   ac/nir: make ac_build_umsb work on all bit sizes
> >   ac/nir: implement 8 and 16 bit ac_build_imsb
> >   ac/nir: make ac_build_bit_count work on all bit sizes
> >   ac/nir: make ac_build_bitfield_reverse work on all bit sizes
> >   ac/nir: implement 16-bit pack/unpack opcodes
> >   ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
> >   ac/nir,radv: create an array of varying output types
> >   ac/nir: store all outputs as f32
> >   radv: store all fragment shader inputs as f32
> >   radv: handle all fragment output types
> >   ac,radv: run LLVM's SLP vectorizer
> >   ac/nir: generate better code for nir_op_f2f16_rtz
> >   ac/nir: have nir_op_f2f16 round to zero
> >   radv: expose float16, int16 and int8 features and extensions
> >
> >  src/amd/common/ac_llvm_build.c        | 355 ++
> >  src/amd/common/ac_llvm_build.h        |  22 +-
> >  src/amd/common/ac_llvm_util.c         |   9 +-
> >  src/amd/common/ac_llvm_util.h         |   1 +
> >  src/amd/common/ac_nir_to_llvm.c       | 258 +++
> >  src/amd/common/ac_shader_abi.h        |   1 +
> >  src/amd/vulkan/radv_device.c          |  17 ++
> >  src/amd/vulkan/radv_extensions.py     |   4 +
> >  src/amd/vulkan/radv_nir_to_llvm.c     |  92 ---
> >  src/amd/vulkan/radv_shader.c          |   7 +
> >  src/broadcom/compiler/nir_to_vir.c    |   1 +
> >  src/compiler/nir/nir.h                |   1 +
> >  src/compiler/nir/nir_opcodes.py       |   4 +-
> >  src/compiler/nir/nir_opt_algebraic.py |   4 +-
> >  src/gallium/drivers/radeonsi/si_get.c |   1 +
> >  src/gallium/drivers/vc4/vc4_program.c |   1 +
> >  16 files changed, 516 insertions(+), 262 deletions(-)
Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
What's the status of this?

On 12/7/18 6:21 PM, Rhys Perry wrote:
> This series adds support for:
>  - VK_KHR_shader_float16_int8
>  - VK_AMD_gpu_shader_half_float
>  - VK_AMD_gpu_shader_int16
>  - VK_KHR_8bit_storage
> on VI+. Half floats are currently disabled on LLVM 7 because of a bug
> causing large memory usage and long (or unbounded) compilation times with
> some tests.
>
> It depends on the following patch series:
>  - https://patchwork.freedesktop.org/series/53454/
>  - https://patchwork.freedesktop.org/series/53602/
>  - https://patchwork.freedesktop.org/series/53660/
>
> An older version was tested on my Polaris card, but due to hardware issues
> I currently can't test the latest version of the series. deqp-vk has no
> regressions and none of the newly enabled tests fail.
>
> Rhys Perry (38):
>   ac: add various helpers for float16/int16/int8
>   ac/nir: implement 8-bit push constant, ssbo and ubo loads
>   ac/nir: implement 8-bit ssbo stores
>   ac/nir: fix 16-bit ssbo stores
>   ac/nir: implement 8-bit nir_load_const_instr
>   ac/nir: implement 8-bit conversions
>   ac/nir: fix 64-bit nir_op_f2f16_rtz
>   ac/nir: make ac_build_clamp work on all bit sizes
>   ac/nir: make ac_build_fract work on all bit sizes
>   ac/nir: make ac_build_isign work on all bit sizes
>   ac/nir: make ac_build_fsign work on all bit sizes
>   ac/nir: make ac_build_fdiv support 16-bit floats
>   ac/nir: implement half-float nir_op_frcp
>   ac/nir: implement half-float nir_op_frsq
>   ac/nir: implement half-float nir_op_ldexp
>   radv: lower 16-bit flrp
>   ac/nir: support half floats in emit_b2f
>   ac/nir: make emit_b2i work on all bit sizes
>   ac/nir: implement 16-bit shifts
>   compiler/nir: add lowering option for 16-bit ffma
>   ac/nir: implement 16-bit ac_build_ddxy
>   ac/nir: implement 8 and 16 bit ac_build_readlane
>   nir: make bitfield_reverse and ifind_msb work with all integers
>   ac/nir: make ac_find_lsb work on all bit sizes
>   ac/nir: make ac_build_umsb work on all bit sizes
>   ac/nir: implement 8 and 16 bit ac_build_imsb
>   ac/nir: make ac_build_bit_count work on all bit sizes
>   ac/nir: make ac_build_bitfield_reverse work on all bit sizes
>   ac/nir: implement 16-bit pack/unpack opcodes
>   ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
>   ac/nir,radv: create an array of varying output types
>   ac/nir: store all outputs as f32
>   radv: store all fragment shader inputs as f32
>   radv: handle all fragment output types
>   ac,radv: run LLVM's SLP vectorizer
>   ac/nir: generate better code for nir_op_f2f16_rtz
>   ac/nir: have nir_op_f2f16 round to zero
>   radv: expose float16, int16 and int8 features and extensions
>
>  src/amd/common/ac_llvm_build.c        | 355 ++
>  src/amd/common/ac_llvm_build.h        |  22 +-
>  src/amd/common/ac_llvm_util.c         |   9 +-
>  src/amd/common/ac_llvm_util.h         |   1 +
>  src/amd/common/ac_nir_to_llvm.c       | 258 +++
>  src/amd/common/ac_shader_abi.h        |   1 +
>  src/amd/vulkan/radv_device.c          |  17 ++
>  src/amd/vulkan/radv_extensions.py     |   4 +
>  src/amd/vulkan/radv_nir_to_llvm.c     |  92 ---
>  src/amd/vulkan/radv_shader.c          |   7 +
>  src/broadcom/compiler/nir_to_vir.c    |   1 +
>  src/compiler/nir/nir.h                |   1 +
>  src/compiler/nir/nir_opcodes.py       |   4 +-
>  src/compiler/nir/nir_opt_algebraic.py |   4 +-
>  src/gallium/drivers/radeonsi/si_get.c |   1 +
>  src/gallium/drivers/vc4/vc4_program.c |   1 +
>  16 files changed, 516 insertions(+), 262 deletions(-)

[Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
This series adds support for:
 - VK_KHR_shader_float16_int8
 - VK_AMD_gpu_shader_half_float
 - VK_AMD_gpu_shader_int16
 - VK_KHR_8bit_storage
on VI+. Half floats are currently disabled on LLVM 7 because of a bug
causing large memory usage and long (or unbounded) compilation times with
some tests.

It depends on the following patch series:
 - https://patchwork.freedesktop.org/series/53454/
 - https://patchwork.freedesktop.org/series/53602/
 - https://patchwork.freedesktop.org/series/53660/

An older version was tested on my Polaris card, but due to hardware issues
I currently can't test the latest version of the series. deqp-vk has no
regressions and none of the newly enabled tests fail.

Rhys Perry (38):
  ac: add various helpers for float16/int16/int8
  ac/nir: implement 8-bit push constant, ssbo and ubo loads
  ac/nir: implement 8-bit ssbo stores
  ac/nir: fix 16-bit ssbo stores
  ac/nir: implement 8-bit nir_load_const_instr
  ac/nir: implement 8-bit conversions
  ac/nir: fix 64-bit nir_op_f2f16_rtz
  ac/nir: make ac_build_clamp work on all bit sizes
  ac/nir: make ac_build_fract work on all bit sizes
  ac/nir: make ac_build_isign work on all bit sizes
  ac/nir: make ac_build_fsign work on all bit sizes
  ac/nir: make ac_build_fdiv support 16-bit floats
  ac/nir: implement half-float nir_op_frcp
  ac/nir: implement half-float nir_op_frsq
  ac/nir: implement half-float nir_op_ldexp
  radv: lower 16-bit flrp
  ac/nir: support half floats in emit_b2f
  ac/nir: make emit_b2i work on all bit sizes
  ac/nir: implement 16-bit shifts
  compiler/nir: add lowering option for 16-bit ffma
  ac/nir: implement 16-bit ac_build_ddxy
  ac/nir: implement 8 and 16 bit ac_build_readlane
  nir: make bitfield_reverse and ifind_msb work with all integers
  ac/nir: make ac_find_lsb work on all bit sizes
  ac/nir: make ac_build_umsb work on all bit sizes
  ac/nir: implement 8 and 16 bit ac_build_imsb
  ac/nir: make ac_build_bit_count work on all bit sizes
  ac/nir: make ac_build_bitfield_reverse work on all bit sizes
  ac/nir: implement 16-bit pack/unpack opcodes
  ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
  ac/nir,radv: create an array of varying output types
  ac/nir: store all outputs as f32
  radv: store all fragment shader inputs as f32
  radv: handle all fragment output types
  ac,radv: run LLVM's SLP vectorizer
  ac/nir: generate better code for nir_op_f2f16_rtz
  ac/nir: have nir_op_f2f16 round to zero
  radv: expose float16, int16 and int8 features and extensions

 src/amd/common/ac_llvm_build.c        | 355 ++
 src/amd/common/ac_llvm_build.h        |  22 +-
 src/amd/common/ac_llvm_util.c         |   9 +-
 src/amd/common/ac_llvm_util.h         |   1 +
 src/amd/common/ac_nir_to_llvm.c       | 258 +++
 src/amd/common/ac_shader_abi.h        |   1 +
 src/amd/vulkan/radv_device.c          |  17 ++
 src/amd/vulkan/radv_extensions.py     |   4 +
 src/amd/vulkan/radv_nir_to_llvm.c     |  92 ---
 src/amd/vulkan/radv_shader.c          |   7 +
 src/broadcom/compiler/nir_to_vir.c    |   1 +
 src/compiler/nir/nir.h                |   1 +
 src/compiler/nir/nir_opcodes.py       |   4 +-
 src/compiler/nir/nir_opt_algebraic.py |   4 +-
 src/gallium/drivers/radeonsi/si_get.c |   1 +
 src/gallium/drivers/vc4/vc4_program.c |   1 +
 16 files changed, 516 insertions(+), 262 deletions(-)

-- 
2.19.2
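
For readers unfamiliar with what "round to zero" means for the nir_op_f2f16 / nir_op_f2f16_rtz patches in the list above, here is a minimal sketch in plain C of a 32-bit float to half-float conversion that truncates instead of rounding to nearest. It is only an illustration of the rounding mode, not the LLVM IR that ac/nir emits; the function name f32_to_f16_rtz is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert an IEEE-754 binary32 value to binary16 bits,
 * rounding toward zero (the discarded mantissa bits are simply dropped). */
static uint16_t f32_to_f16_rtz(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);               /* reinterpret the float bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xffu;
    uint32_t man  = x & 0x7fffffu;

    if (exp == 0xffu) {
        if (man == 0)
            return (uint16_t)(sign | 0x7c00u);          /* Inf stays Inf */
        /* NaN: set a mantissa bit so the result cannot collapse to Inf. */
        return (uint16_t)(sign | 0x7c00u | 0x0200u | (man >> 13));
    }

    int32_t e = (int32_t)exp - 127 + 15;    /* rebias the exponent for half */

    if (e >= 31)
        return (uint16_t)(sign | 0x7bffu);  /* overflow: largest finite half, not Inf */

    if (e <= 0) {
        /* Result is a half subnormal or zero. */
        if (e < -10)
            return sign;                    /* below the smallest subnormal */
        man |= 0x800000u;                   /* make the implicit leading one explicit */
        return (uint16_t)(sign | (man >> (14 - e)));
    }

    /* Normal case: keep the top 10 mantissa bits, drop the rest. */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (man >> 13));
}
```

The difference from round-to-nearest shows up at the edges: 65520.0f would round up to Inf under round-to-nearest-even, while the truncating version returns 0x7bff (65504, the largest finite half).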