Re: [Freedreno] [PATCH 2/2] freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx
On Sun, Nov 19, 2017 at 2:54 PM, Ilia Mirkinwrote: > Unfortunately Adreno A4xx hardware returns incorrect results with the > GATHER4 opcodes. As a result, we have to lower to 4 individual texture > calls (txl since we have to force lod to 0). We achieve this using > offsets, including on cube maps which normally never have offsets. > > Signed-off-by: Ilia Mirkin > --- > > This pass relies on the hw doing the "right thing", working with nonconst > offsets, and not having the usual limits (since the gather offset will in > effect get offset by another 1). > > It fails two tests out of all the gather ones: > > bin/zero-tex-coord textureGather > tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-textureGatherOffset-uniform-array-offset.shader_test > > We haven't fully investigated why yet, but this is a good start. > > Note that the blob does this differently - they modify the source coordinate. > However this seems unnecessary given that the hw can be made to use the > offsets. > > Also please note that my knowledge of nir is minimal. Please carefully check > that I used the right helpers/etc. This was largely a result of seeing what > doesn't result in assertions. > > docs/features.txt | 4 +- > src/gallium/drivers/freedreno/Makefile.sources | 1 + > src/gallium/drivers/freedreno/freedreno_screen.c | 2 +- > .../drivers/freedreno/ir3/ir3_compiler_nir.c | 7 +- > src/gallium/drivers/freedreno/ir3/ir3_nir.c| 2 + > src/gallium/drivers/freedreno/ir3/ir3_nir.h| 1 + > .../freedreno/ir3/ir3_nir_lower_tg4_to_tex.c | 139 > + > src/gallium/drivers/freedreno/meson.build | 1 + > 8 files changed, 152 insertions(+), 5 deletions(-) > create mode 100644 > src/gallium/drivers/freedreno/ir3/ir3_nir_lower_tg4_to_tex.c > > diff --git a/docs/features.txt b/docs/features.txt > index 633d2593738..99fb1715e0b 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -130,7 +130,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, > radeonsi >GL_ARB_tessellation_shaderDONE (i965/gen7+) >GL_ARB_texture_buffer_object_rgb32DONE (freedreno, > i965/gen6+, llvmpipe, softpipe, swr) >GL_ARB_texture_cube_map_array DONE (i965/gen6+, > nv50, llvmpipe, softpipe) > - GL_ARB_texture_gather DONE > (freedreno/a5xx, i965/gen6+, nv50, llvmpipe, softpipe, swr) > + GL_ARB_texture_gather DONE (freedreno, > i965/gen6+, nv50, llvmpipe, softpipe, swr) >GL_ARB_texture_query_lod DONE (freedreno, > i965, nv50, llvmpipe, softpipe) >GL_ARB_transform_feedback2DONE (i965/gen6+, > nv50, llvmpipe, softpipe, swr) >GL_ARB_transform_feedback3DONE (i965/gen7+, > llvmpipe, softpipe, swr) > @@ -256,7 +256,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, > radeonsi >GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, > nv50, r600, llvmpipe, softpipe) >GL_ARB_texture_storage_multisampleDONE (all drivers > that support GL_ARB_texture_multisample) >GL_ARB_vertex_attrib_binding DONE (all drivers) > - GS5 Enhanced textureGatherDONE (i965/gen7+, > r600) > + GS5 Enhanced textureGatherDONE (freedreno, > i965/gen7+, r600) >GS5 Packing/bitfield/conversion functions DONE (i965/gen6+, > r600) >GL_EXT_shader_integer_mix DONE (all drivers > that support GLSL) > > diff --git a/src/gallium/drivers/freedreno/Makefile.sources > b/src/gallium/drivers/freedreno/Makefile.sources > index b109a5a7a21..40c2eff0455 100644 > --- a/src/gallium/drivers/freedreno/Makefile.sources > +++ b/src/gallium/drivers/freedreno/Makefile.sources > @@ -168,6 +168,7 @@ ir3_SOURCES := \ > ir3/ir3_nir.c \ > ir3/ir3_nir.h \ > ir3/ir3_nir_lower_if_else.c \ > + ir3/ir3_nir_lower_tg4_to_tex.c \ > ir3/ir3_print.c \ > ir3/ir3_ra.c \ > ir3/ir3_sched.c \ > diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c > b/src/gallium/drivers/freedreno/freedreno_screen.c > index e61344fd104..62e4a574b90 100644 > --- a/src/gallium/drivers/freedreno/freedreno_screen.c > +++ b/src/gallium/drivers/freedreno/freedreno_screen.c > @@ -264,7 +264,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum > pipe_cap param) > return 0; > > case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: > - if (is_a5xx(screen)) > + if (is_a4xx(screen) || is_a5xx(screen)) > return 4; > return 0; > > diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c >
[Freedreno] [PATCH 2/2] freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx
Unfortunately Adreno A4xx hardware returns incorrect results with the GATHER4 opcodes. As a result, we have to lower to 4 individual texture calls (txl since we have to force lod to 0). We achieve this using offsets, including on cube maps which normally never have offsets. Signed-off-by: Ilia Mirkin--- This pass relies on the hw doing the "right thing", working with nonconst offsets, and not having the usual limits (since the gather offset will in effect get offset by another 1). It fails two tests out of all the gather ones: bin/zero-tex-coord textureGather tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-textureGatherOffset-uniform-array-offset.shader_test We haven't fully investigated why yet, but this is a good start. Note that the blob does this differently - they modify the source coordinate. However this seems unnecessary given that the hw can be made to use the offsets. Also please note that my knowledge of nir is minimal. Please carefully check that I used the right helpers/etc. This was largely a result of seeing what doesn't result in assertions. docs/features.txt | 4 +- src/gallium/drivers/freedreno/Makefile.sources | 1 + src/gallium/drivers/freedreno/freedreno_screen.c | 2 +- .../drivers/freedreno/ir3/ir3_compiler_nir.c | 7 +- src/gallium/drivers/freedreno/ir3/ir3_nir.c| 2 + src/gallium/drivers/freedreno/ir3/ir3_nir.h| 1 + .../freedreno/ir3/ir3_nir_lower_tg4_to_tex.c | 139 + src/gallium/drivers/freedreno/meson.build | 1 + 8 files changed, 152 insertions(+), 5 deletions(-) create mode 100644 src/gallium/drivers/freedreno/ir3/ir3_nir_lower_tg4_to_tex.c diff --git a/docs/features.txt b/docs/features.txt index 633d2593738..99fb1715e0b 100644 --- a/docs/features.txt +++ b/docs/features.txt @@ -130,7 +130,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi GL_ARB_tessellation_shaderDONE (i965/gen7+) GL_ARB_texture_buffer_object_rgb32DONE (freedreno, i965/gen6+, llvmpipe, softpipe, swr) GL_ARB_texture_cube_map_array DONE (i965/gen6+, nv50, llvmpipe, softpipe) - GL_ARB_texture_gather DONE (freedreno/a5xx, i965/gen6+, nv50, llvmpipe, softpipe, swr) + GL_ARB_texture_gather DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr) GL_ARB_texture_query_lod DONE (freedreno, i965, nv50, llvmpipe, softpipe) GL_ARB_transform_feedback2DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr) GL_ARB_transform_feedback3DONE (i965/gen7+, llvmpipe, softpipe, swr) @@ -256,7 +256,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, nv50, r600, llvmpipe, softpipe) GL_ARB_texture_storage_multisampleDONE (all drivers that support GL_ARB_texture_multisample) GL_ARB_vertex_attrib_binding DONE (all drivers) - GS5 Enhanced textureGatherDONE (i965/gen7+, r600) + GS5 Enhanced textureGatherDONE (freedreno, i965/gen7+, r600) GS5 Packing/bitfield/conversion functions DONE (i965/gen6+, r600) GL_EXT_shader_integer_mix DONE (all drivers that support GLSL) diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources index b109a5a7a21..40c2eff0455 100644 --- a/src/gallium/drivers/freedreno/Makefile.sources +++ b/src/gallium/drivers/freedreno/Makefile.sources @@ -168,6 +168,7 @@ ir3_SOURCES := \ ir3/ir3_nir.c \ ir3/ir3_nir.h \ ir3/ir3_nir_lower_if_else.c \ + ir3/ir3_nir_lower_tg4_to_tex.c \ ir3/ir3_print.c \ ir3/ir3_ra.c \ ir3/ir3_sched.c \ diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index e61344fd104..62e4a574b90 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -264,7 +264,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) return 0; case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: - if (is_a5xx(screen)) + if (is_a4xx(screen) || is_a5xx(screen)) return 4; return 0; diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c index da4aeaa7acb..c97df4f1d63 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c @@ -2399,9 +2399,12 @@ emit_tex(struct ir3_context *ctx, nir_tex_instr *tex)