Re: [Freedreno] [PATCH 2/2] freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx

2017-11-20 Thread Rob Clark
On Sun, Nov 19, 2017 at 2:54 PM, Ilia Mirkin  wrote:
> Unfortunately Adreno A4xx hardware returns incorrect results with the
> GATHER4 opcodes. As a result, we have to lower to 4 individual texture
> calls (txl since we have to force lod to 0). We achieve this using
> offsets, including on cube maps which normally never have offsets.
>
> Signed-off-by: Ilia Mirkin 
> ---
>
> This pass relies on the hw doing the "right thing", working with nonconst
> offsets, and not having the usual limits (since the gather offset will in
> effect get offset by another 1).
>
> It fails two tests out of all the gather ones:
>
> bin/zero-tex-coord textureGather
> tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-textureGatherOffset-uniform-array-offset.shader_test
>
> We haven't fully investigated why yet, but this is a good start.
>
> Note that the blob does this differently - they modify the source coordinate.
> However this seems unnecessary given that the hw can be made to use the
> offsets.
>
> Also please note that my knowledge of nir is minimal. Please carefully check
> that I used the right helpers/etc. This was largely a result of seeing what
> doesn't result in assertions.
>
>  docs/features.txt  |   4 +-
>  src/gallium/drivers/freedreno/Makefile.sources |   1 +
>  src/gallium/drivers/freedreno/freedreno_screen.c   |   2 +-
>  .../drivers/freedreno/ir3/ir3_compiler_nir.c   |   7 +-
>  src/gallium/drivers/freedreno/ir3/ir3_nir.c|   2 +
>  src/gallium/drivers/freedreno/ir3/ir3_nir.h|   1 +
>  .../freedreno/ir3/ir3_nir_lower_tg4_to_tex.c   | 139 
> +
>  src/gallium/drivers/freedreno/meson.build  |   1 +
>  8 files changed, 152 insertions(+), 5 deletions(-)
>  create mode 100644 
> src/gallium/drivers/freedreno/ir3/ir3_nir_lower_tg4_to_tex.c
>
> diff --git a/docs/features.txt b/docs/features.txt
> index 633d2593738..99fb1715e0b 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -130,7 +130,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, 
> radeonsi
>GL_ARB_tessellation_shaderDONE (i965/gen7+)
>GL_ARB_texture_buffer_object_rgb32DONE (freedreno, 
> i965/gen6+, llvmpipe, softpipe, swr)
>GL_ARB_texture_cube_map_array DONE (i965/gen6+, 
> nv50, llvmpipe, softpipe)
> -  GL_ARB_texture_gather DONE 
> (freedreno/a5xx, i965/gen6+, nv50, llvmpipe, softpipe, swr)
> +  GL_ARB_texture_gather DONE (freedreno, 
> i965/gen6+, nv50, llvmpipe, softpipe, swr)
>GL_ARB_texture_query_lod  DONE (freedreno, 
> i965, nv50, llvmpipe, softpipe)
>GL_ARB_transform_feedback2DONE (i965/gen6+, 
> nv50, llvmpipe, softpipe, swr)
>GL_ARB_transform_feedback3DONE (i965/gen7+, 
> llvmpipe, softpipe, swr)
> @@ -256,7 +256,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, 
> radeonsi
>GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, 
> nv50, r600, llvmpipe, softpipe)
>GL_ARB_texture_storage_multisampleDONE (all drivers 
> that support GL_ARB_texture_multisample)
>GL_ARB_vertex_attrib_binding  DONE (all drivers)
> -  GS5 Enhanced textureGatherDONE (i965/gen7+, 
> r600)
> +  GS5 Enhanced textureGatherDONE (freedreno, 
> i965/gen7+, r600)
>GS5 Packing/bitfield/conversion functions DONE (i965/gen6+, 
> r600)
>GL_EXT_shader_integer_mix DONE (all drivers 
> that support GLSL)
>
> diff --git a/src/gallium/drivers/freedreno/Makefile.sources 
> b/src/gallium/drivers/freedreno/Makefile.sources
> index b109a5a7a21..40c2eff0455 100644
> --- a/src/gallium/drivers/freedreno/Makefile.sources
> +++ b/src/gallium/drivers/freedreno/Makefile.sources
> @@ -168,6 +168,7 @@ ir3_SOURCES := \
> ir3/ir3_nir.c \
> ir3/ir3_nir.h \
> ir3/ir3_nir_lower_if_else.c \
> +   ir3/ir3_nir_lower_tg4_to_tex.c \
> ir3/ir3_print.c \
> ir3/ir3_ra.c \
> ir3/ir3_sched.c \
> diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
> b/src/gallium/drivers/freedreno/freedreno_screen.c
> index e61344fd104..62e4a574b90 100644
> --- a/src/gallium/drivers/freedreno/freedreno_screen.c
> +++ b/src/gallium/drivers/freedreno/freedreno_screen.c
> @@ -264,7 +264,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> return 0;
>
> case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
> -   if (is_a5xx(screen))
> +   if (is_a4xx(screen) || is_a5xx(screen))
> return 4;
> return 0;
>
> diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
> 

[Freedreno] [PATCH 2/2] freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx

2017-11-19 Thread Ilia Mirkin
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.

Signed-off-by: Ilia Mirkin 
---

This pass relies on the hw doing the "right thing", working with nonconst
offsets, and not having the usual limits (since the gather offset will in
effect get offset by another 1).

It fails two tests out of all the gather ones:

bin/zero-tex-coord textureGather
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-textureGatherOffset-uniform-array-offset.shader_test

We haven't fully investigated why yet, but this is a good start.

Note that the blob does this differently - they modify the source coordinate.
However this seems unnecessary given that the hw can be made to use the
offsets.

Also please note that my knowledge of nir is minimal. Please carefully check
that I used the right helpers/etc. This was largely a result of seeing what
doesn't result in assertions.

 docs/features.txt  |   4 +-
 src/gallium/drivers/freedreno/Makefile.sources |   1 +
 src/gallium/drivers/freedreno/freedreno_screen.c   |   2 +-
 .../drivers/freedreno/ir3/ir3_compiler_nir.c   |   7 +-
 src/gallium/drivers/freedreno/ir3/ir3_nir.c|   2 +
 src/gallium/drivers/freedreno/ir3/ir3_nir.h|   1 +
 .../freedreno/ir3/ir3_nir_lower_tg4_to_tex.c   | 139 +
 src/gallium/drivers/freedreno/meson.build  |   1 +
 8 files changed, 152 insertions(+), 5 deletions(-)
 create mode 100644 src/gallium/drivers/freedreno/ir3/ir3_nir_lower_tg4_to_tex.c

diff --git a/docs/features.txt b/docs/features.txt
index 633d2593738..99fb1715e0b 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -130,7 +130,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, 
radeonsi
   GL_ARB_tessellation_shaderDONE (i965/gen7+)
   GL_ARB_texture_buffer_object_rgb32DONE (freedreno, 
i965/gen6+, llvmpipe, softpipe, swr)
   GL_ARB_texture_cube_map_array DONE (i965/gen6+, 
nv50, llvmpipe, softpipe)
-  GL_ARB_texture_gather DONE (freedreno/a5xx, 
i965/gen6+, nv50, llvmpipe, softpipe, swr)
+  GL_ARB_texture_gather DONE (freedreno, 
i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_query_lod  DONE (freedreno, i965, 
nv50, llvmpipe, softpipe)
   GL_ARB_transform_feedback2DONE (i965/gen6+, 
nv50, llvmpipe, softpipe, swr)
   GL_ARB_transform_feedback3DONE (i965/gen7+, 
llvmpipe, softpipe, swr)
@@ -256,7 +256,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
   GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, 
nv50, r600, llvmpipe, softpipe)
   GL_ARB_texture_storage_multisampleDONE (all drivers that 
support GL_ARB_texture_multisample)
   GL_ARB_vertex_attrib_binding  DONE (all drivers)
-  GS5 Enhanced textureGatherDONE (i965/gen7+, r600)
+  GS5 Enhanced textureGatherDONE (freedreno, 
i965/gen7+, r600)
   GS5 Packing/bitfield/conversion functions DONE (i965/gen6+, r600)
   GL_EXT_shader_integer_mix DONE (all drivers that 
support GLSL)
 
diff --git a/src/gallium/drivers/freedreno/Makefile.sources 
b/src/gallium/drivers/freedreno/Makefile.sources
index b109a5a7a21..40c2eff0455 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -168,6 +168,7 @@ ir3_SOURCES := \
ir3/ir3_nir.c \
ir3/ir3_nir.h \
ir3/ir3_nir_lower_if_else.c \
+   ir3/ir3_nir_lower_tg4_to_tex.c \
ir3/ir3_print.c \
ir3/ir3_ra.c \
ir3/ir3_sched.c \
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index e61344fd104..62e4a574b90 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -264,7 +264,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
return 0;
 
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
-   if (is_a5xx(screen))
+   if (is_a4xx(screen) || is_a5xx(screen))
return 4;
return 0;
 
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index da4aeaa7acb..c97df4f1d63 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -2399,9 +2399,12 @@ emit_tex(struct ir3_context *ctx, nir_tex_instr *tex)