Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX

2019-06-10 Thread Jonathan Marek

On 6/10/19 10:00 PM, Brian Masney wrote:

On Mon, Jun 10, 2019 at 09:53:25PM -0400, Jonathan Marek wrote:

This error doesn't happen on X11 using the mesa master branch. Instead,
I get the following error on that branch:

../src/gallium/drivers/freedreno/freedreno_batch.c:424:fd_batch_add_dep: 
Assertion `!batch_depends_on(dep, batch)' failed.

Full disclosure though: I rebuilt the mesa package using the
postmarketOS packaging yesterday and it includes a few extra patches for
musl libc.

https://gitlab.com/postmarketOS/pmaports/tree/master/temp/mesa



I don't see anything obvious in those patches that would be related..
but I suspect this type of error is going to be timing related.
(Which could ofc be due to musl or something else)

but a bit surprised debug_assert() is enabled in debug builds.. it
would probably be a "harmless" situation if asserts were not enabled.

(note that I do most of my testing with debug builds with asserts
enabled.. this is the type of thing that I want to see and fix.. but
probably shouldn't matter to end users)


I recompiled the master branch of mesa in pmOS with '-Db_ndebug=true'
and X11 is now working properly on the Nexus 5. glxgears averages about
59.5 FPS. I'll add a bug report with pmOS to have them add that flag to
their mesa build. Fedora added that flag to their builds:
https://bugzilla.redhat.com/show_bug.cgi?id=1692426

19.1.0-rc5 still doesn't work for me due to the original error.

Brian



You probably want '--buildtype=release' instead of '-Db_ndebug=true'


According to:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/docs/meson.html#L321

 -Db_ndebug - This option controls assertions in meson projects. When
 set to false (the default) assertions are enabled, when set to true
 they are disabled. This is unrelated to the buildtype; setting the
 latter to release will not turn off assertions.

Brian



I always thought release == no assertions, I guess meson has different 
ideas. You will want '--buildtype=release' anyway for optimizations, etc.
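
For reference, a minimal sketch of what the flag changes at the C level. It assumes, as far as I know, that Meson's -Db_ndebug=true adds -DNDEBUG to the compiler flags, and it relies on the standard rule that assert() expands to nothing when NDEBUG is defined. The helper name asserts_enabled() is made up for this illustration and is not part of mesa.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether assert() is active in this
 * translation unit. The assignment inside assert() is only evaluated
 * when NDEBUG was not defined before <assert.h> was included. */
static bool
asserts_enabled(void)
{
   bool evaluated = false;
   assert((evaluated = true));
   return evaluated;
}
```

Built without -DNDEBUG this returns true; built with it, the expression is never evaluated and the function returns false. That is consistent with the report above that the fd_batch_add_dep assertion stops aborting once b_ndebug is set to true.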

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX

2019-06-10 Thread Jonathan Marek

On 6/10/19 9:52 PM, Brian Masney wrote:

Hi Rob,

On Mon, Jun 10, 2019 at 05:10:45PM -0700, Rob Clark wrote:

On Mon, Jun 10, 2019 at 3:54 PM Brian Masney  wrote:


On Mon, Jun 10, 2019 at 06:58:30AM -0700, Rob Clark wrote:

On Mon, Jun 10, 2019 at 6:53 AM Rob Clark  wrote:


On Sat, Jun 8, 2019 at 6:08 PM Brian Masney  wrote:


Hi,

I'm trying to get the GPU working using the Freedreno driver (A330) on
the Nexus 5 phone. I'm using kernel 5.2rc3 with some out of tree patches
related to the GPU [1] and mesa 19.1.0-rc5 on postmarketOS. When I run
glxgears, I see the gears show up for a fraction of a second and then
it terminates due to the following error:

-
shader: MESA_SHADER_FRAGMENT
inputs: 1
outputs: 1
uniforms: 0
shared: 0
decl_var uniform INTERP_MODE_NONE sampler2D sampler (0, 0, 0)
decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_VAR0, 0, 0)
decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0, 0, 0)
decl_function main (0 params)

impl main {
 block block_0:
 /* preds: */
 vec1 32 ssa_0 = load_const (0x /* 0.00 */)
 vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (1) /* 
interp_mode=1 */
 vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 
0) /* base=0 */ /* component=0 */  /* in_0 */
 vec1 32 ssa_3 = deref_var &sampler (uniform sampler2D)
 vec2 32 ssa_4 = vec2 ssa_2.x, ssa_2.y
 vec4 32 ssa_5 = tex ssa_3 (texture_deref), ssa_3 (sampler_deref), 
ssa_4 (coord)
Unhandled NIR tex src type: 11


This should be getting lowered somewhere..  and I don't *think* it
should be a3xx specific.


It should be getting lowered in gl_nir_lower_samplers().. which should
be called from mesa/st before the driver even sees this shader.

Could you build mesa from git w/ latest 19.1, I guess this must have
been fixed by now, since other drivers that use nir would hit the same
issue.


This error doesn't happen on X11 using the mesa master branch. Instead,
I get the following error on that branch:

../src/gallium/drivers/freedreno/freedreno_batch.c:424:fd_batch_add_dep: 
Assertion `!batch_depends_on(dep, batch)' failed.

Full disclosure though: I rebuilt the mesa package using the
postmarketOS packaging yesterday and it includes a few extra patches for
musl libc.

https://gitlab.com/postmarketOS/pmaports/tree/master/temp/mesa



I don't see anything obvious in those patches that would be related..
but I suspect this type of error is going to be timing related.
(Which could ofc be due to musl or something else)

but a bit surprised debug_assert() is enabled in debug builds.. it
would probably be a "harmless" situation if asserts were not enabled.

(note that I do most of my testing with debug builds with asserts
enabled.. this is the type of thing that I want to see and fix.. but
probably shouldn't matter to end users)


I recompiled the master branch of mesa in pmOS with '-Db_ndebug=true'
and X11 is now working properly on the Nexus 5. glxgears averages about
59.5 FPS. I'll add a bug report with pmOS to have them add that flag to
their mesa build. Fedora added that flag to their builds:
https://bugzilla.redhat.com/show_bug.cgi?id=1692426

19.1.0-rc5 still doesn't work for me due to the original error.

Brian



You probably want '--buildtype=release' instead of '-Db_ndebug=true'

Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX

2019-06-09 Thread Jonathan Marek

On 6/9/19 8:41 AM, Brian Masney wrote:

On Sat, Jun 08, 2019 at 10:58:11PM -0400, Jonathan Marek wrote:

Hi,

It's possible 19.1 has another issue, I only tested the master branch with
my fix. I would suggest trying 19.0 or the master branch.


The mesa master branch and 19.0.6 both give the following error when
glxgears starts up:

../src/gallium/drivers/freedreno/freedreno_batch.c:424:fd_batch_add_dep: 
Assertion `!batch_depends_on(dep, batch)' failed.



No one is testing freedreno+X11 AFAIK. This would affect all adrenos 
too, not just a3xx. I can look into it at some point, if no one else does.


To test if the GPU works at all you should use kmscube. If that works 
then you can try wayland/weston, or if you really need X11 IIRC 18.1 was 
working with X11.



FYI, I haven't pushed it anywhere but I recently rebased my Nexus 5 patches
from last year (and have been looking at getting call audio working).


Fantastic!

Brian




On 6/8/19 9:08 PM, Brian Masney wrote:

Hi,

I'm trying to get the GPU working using the Freedreno driver (A330) on
the Nexus 5 phone. I'm using kernel 5.2rc3 with some out of tree patches
related to the GPU [1] and mesa 19.1.0-rc5 on postmarketOS. When I run
glxgears, I see the gears show up for a fraction of a second and then
it terminates due to the following error:

-
shader: MESA_SHADER_FRAGMENT
inputs: 1
outputs: 1
uniforms: 0
shared: 0
decl_var uniform INTERP_MODE_NONE sampler2D sampler (0, 0, 0)
decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_VAR0, 0, 0)
decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0, 0, 0)
decl_function main (0 params)

impl main {
  block block_0:
  /* preds: */
  vec1 32 ssa_0 = load_const (0x /* 0.00 */)
  vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (1) /* 
interp_mode=1 */
  vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 
0) /* base=0 */ /* component=0 */  /* in_0 */
  vec1 32 ssa_3 = deref_var &sampler (uniform sampler2D)
  vec2 32 ssa_4 = vec2 ssa_2.x, ssa_2.y
  vec4 32 ssa_5 = tex ssa_3 (texture_deref), ssa_3 (sampler_deref), 
ssa_4 (coord)
Unhandled NIR tex src type: 11


  intrinsic store_output (ssa_5, ssa_0) (0, 15, 0) /* base=0 */ /* 
wrmask=xyzw */ /* component=0 */   /* out_0 */
  /* succs: block_1 */
  block block_1:
}

Assertion failed: !"" (../src/freedreno/ir3/ir3_context.c: ir3_context_error: 
407)
-

I verified that the mesa 19.1.0-rc5 release contains this recent a3xx
fix from Jonathan:
https://gitlab.freedesktop.org/mesa/mesa/commit/1db86d8b62860380c34af77ae62b019ed2376443

Any suggestions?

[1] https://github.com/masneyb/linux/commits/v5.2-rc3-nexus5-gpu-wip
  The GPU specific patches start at Rob's patch 'qcom-scm: add support
  to restore secure config' on that list. I submitted the patches
  below that a few weeks ago to the upstream kernel and I expect
  they'll be merged. Once I have a working GPU, I plan to start
  working on the interconnect support in the kernel for msm8974 so
  that the clock hacks can be dropped.

Brian



Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX

2019-06-08 Thread Jonathan Marek

Hi,

It's possible 19.1 has another issue, I only tested the master branch 
with my fix. I would suggest trying 19.0 or the master branch.


FYI, I haven't pushed it anywhere but I recently rebased my Nexus 5 
patches from last year (and have been looking at getting call audio working).


Jonathan

On 6/8/19 9:08 PM, Brian Masney wrote:

Hi,

I'm trying to get the GPU working using the Freedreno driver (A330) on
the Nexus 5 phone. I'm using kernel 5.2rc3 with some out of tree patches
related to the GPU [1] and mesa 19.1.0-rc5 on postmarketOS. When I run
glxgears, I see the gears show up for a fraction of a second and then
it terminates due to the following error:

-
shader: MESA_SHADER_FRAGMENT
inputs: 1
outputs: 1
uniforms: 0
shared: 0
decl_var uniform INTERP_MODE_NONE sampler2D sampler (0, 0, 0)
decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_VAR0, 0, 0)
decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0, 0, 0)
decl_function main (0 params)

impl main {
 block block_0:
 /* preds: */
 vec1 32 ssa_0 = load_const (0x /* 0.00 */)
 vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (1) /* 
interp_mode=1 */
 vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 
0) /* base=0 */ /* component=0 */  /* in_0 */
 vec1 32 ssa_3 = deref_var &sampler (uniform sampler2D)
 vec2 32 ssa_4 = vec2 ssa_2.x, ssa_2.y
 vec4 32 ssa_5 = tex ssa_3 (texture_deref), ssa_3 (sampler_deref), 
ssa_4 (coord)
Unhandled NIR tex src type: 11


 intrinsic store_output (ssa_5, ssa_0) (0, 15, 0) /* base=0 */ /* 
wrmask=xyzw */ /* component=0 */   /* out_0 */
 /* succs: block_1 */
 block block_1:
}

Assertion failed: !"" (../src/freedreno/ir3/ir3_context.c: ir3_context_error: 
407)
-

I verified that the mesa 19.1.0-rc5 release contains this recent a3xx
fix from Jonathan:
https://gitlab.freedesktop.org/mesa/mesa/commit/1db86d8b62860380c34af77ae62b019ed2376443

Any suggestions?

[1] https://github.com/masneyb/linux/commits/v5.2-rc3-nexus5-gpu-wip
 The GPU specific patches start at Rob's patch 'qcom-scm: add support
 to restore secure config' on that list. I submitted the patches
 below that a few weeks ago to the upstream kernel and I expect
 they'll be merged. Once I have a working GPU, I plan to start
 working on the interconnect support in the kernel for msm8974 so
 that the clock hacks can be dropped.

Brian



Re: [Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true

2019-01-07 Thread Jonathan Marek
There's no updated series yet. This patch will work on its own and the 
issue that was pointed out doesn't affect behavior at all.


On 1/7/19 4:47 PM, Lionel Landwerlin wrote:
I did not, but then I saw that someone pointed out an issue with this particular 
patch.

I can do it tomorrow.
Do you have a link to the updated series?

Thanks,

-
Lionel

On 07/01/2019 16:54, Jonathan Marek wrote:

Hi,

Did you get a chance to try this? If not, I might be able to try it 
myself as I have Intel HW.


On 12/19/18 12:34 PM, Lionel Landwerlin wrote:

Hey Jonathan,

I'm kind of curious as to whether we can have a single expression 
that pretty much generates the same final code (through some of the 
algebraic lowering/optimizations).

I'll give it a try on Intel HW, see what it does.

-
Lionel

On 19/12/2018 16:39, Jonathan Marek wrote:
When ffma is available, we can use a different arrangement of 
constants to

get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7
scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 
ffma.


Signed-off-by: Jonathan Marek 
---
  src/compiler/nir/nir_lower_tex.c | 62 
++--

  1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/src/compiler/nir/nir_lower_tex.c 
b/src/compiler/nir/nir_lower_tex.c

index 6a6b6c41a7..f7c821bb34 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, 
nir_tex_instr *tex,

 nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v,
 nir_ssa_def *a)
  {
-   nir_const_value m[3] = {
-  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
-  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
-  { .f32 = { 1.0f,  2.01723214f,  0.0f,    0.0f } }
-   };
-
-   nir_ssa_def *yuv =
-  nir_vec4(b,
-   nir_fmul(b, nir_imm_float(b, 1.16438356f),
-    nir_fadd(b, y, nir_imm_float(b, -16.0f / 
255.0f))),
-   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, 
-128.0f / 255.0f)), 0),
-   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, 
-128.0f / 255.0f)), 0),

-   nir_imm_float(b, 0.0));
-
-   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[0]));
-   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[1]));
-   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[2]));

-
-   nir_ssa_def *result = nir_vec4(b, red, green, blue, a);
+   nir_ssa_def *result;
+
+
+   if (b->shader->options->fuse_ffma) {
+  nir_const_value m[4] = {
+ { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } },
+ { .f32 = { 0.0f,   -0.39176229f, 2.01723214f, 0.0f } },
+ { .f32 = { 1.59602678f,-0.81296764f, 0.0f, 0.0f } },
+  };
+  static const float y_off = -16.0f * 1.16438356f / 255.0f;
+  static const float sc = 128.0f / 255.0f;
+
+  nir_ssa_def *offset =
+ nir_vec4(b,
+  nir_imm_float(b, y_off - sc * 1.59602678f),
+  nir_imm_float(b, y_off + sc * (0.81296764f + 
0.39176229f)),

+  nir_imm_float(b, y_off - sc * 2.01723214f),
+  a);
+
+  result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]),
+   nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]),
+    nir_ffma(b, v, nir_build_imm(b, 4, 
32, m[2]), offset)));

+   } else {
+  nir_const_value m[3] = {
+ { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
+ { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
+ { .f32 = { 1.0f,  2.01723214f,  0.0f,    0.0f } }
+  };
+
+  nir_ssa_def *yuv =
+ nir_vec4(b,
+  nir_fmul(b, nir_imm_float(b, 1.16438356f),
+   nir_fadd(b, y, nir_imm_float(b, -16.0f / 
255.0f))),
+  nir_channel(b, nir_fadd(b, u, nir_imm_float(b, 
-128.0f / 255.0f)), 0),
+  nir_channel(b, nir_fadd(b, v, nir_imm_float(b, 
-128.0f / 255.0f)), 0),

+  nir_imm_float(b, 0.0));
+
+  nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[0]));
+  nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 
32, m[1]));
+  nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[2]));

+
+  result = nir_vec4(b, red, green, blue, a);
+   }
 nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(result));
  }






Re: [Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true

2019-01-07 Thread Jonathan Marek

Hi,

Did you get a chance to try this? If not, I might be able to try it myself 
as I have Intel HW.


On 12/19/18 12:34 PM, Lionel Landwerlin wrote:

Hey Jonathan,

I'm kind of curious as to whether we can have a single expression that 
pretty much generates the same final code (through some of the algebraic 
lowering/optimizations).

I'll give it a try on Intel HW, see what it does.

-
Lionel

On 19/12/2018 16:39, Jonathan Marek wrote:
When ffma is available, we can use a different arrangement of 
constants to

get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7
scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma.

Signed-off-by: Jonathan Marek 
---
  src/compiler/nir/nir_lower_tex.c | 62 ++--
  1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/src/compiler/nir/nir_lower_tex.c 
b/src/compiler/nir/nir_lower_tex.c

index 6a6b6c41a7..f7c821bb34 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr 
*tex,

 nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v,
 nir_ssa_def *a)
  {
-   nir_const_value m[3] = {
-  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
-  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
-  { .f32 = { 1.0f,  2.01723214f,  0.0f,    0.0f } }
-   };
-
-   nir_ssa_def *yuv =
-  nir_vec4(b,
-   nir_fmul(b, nir_imm_float(b, 1.16438356f),
-    nir_fadd(b, y, nir_imm_float(b, -16.0f / 
255.0f))),
-   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f 
/ 255.0f)), 0),
-   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f 
/ 255.0f)), 0),

-   nir_imm_float(b, 0.0));
-
-   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0]));
-   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[1]));

-   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2]));
-
-   nir_ssa_def *result = nir_vec4(b, red, green, blue, a);
+   nir_ssa_def *result;
+
+
+   if (b->shader->options->fuse_ffma) {
+  nir_const_value m[4] = {
+ { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } },
+ { .f32 = { 0.0f,   -0.39176229f, 2.01723214f, 0.0f } },
+ { .f32 = { 1.59602678f,-0.81296764f, 0.0f,    0.0f } },
+  };
+  static const float y_off = -16.0f * 1.16438356f / 255.0f;
+  static const float sc = 128.0f / 255.0f;
+
+  nir_ssa_def *offset =
+ nir_vec4(b,
+  nir_imm_float(b, y_off - sc * 1.59602678f),
+  nir_imm_float(b, y_off + sc * (0.81296764f + 
0.39176229f)),

+  nir_imm_float(b, y_off - sc * 2.01723214f),
+  a);
+
+  result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]),
+   nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]),
+    nir_ffma(b, v, nir_build_imm(b, 4, 
32, m[2]), offset)));

+   } else {
+  nir_const_value m[3] = {
+ { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
+ { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
+ { .f32 = { 1.0f,  2.01723214f,  0.0f,    0.0f } }
+  };
+
+  nir_ssa_def *yuv =
+ nir_vec4(b,
+  nir_fmul(b, nir_imm_float(b, 1.16438356f),
+   nir_fadd(b, y, nir_imm_float(b, -16.0f / 
255.0f))),
+  nir_channel(b, nir_fadd(b, u, nir_imm_float(b, 
-128.0f / 255.0f)), 0),
+  nir_channel(b, nir_fadd(b, v, nir_imm_float(b, 
-128.0f / 255.0f)), 0),

+  nir_imm_float(b, 0.0));
+
+  nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[0]));
+  nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[1]));
+  nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, 
m[2]));

+
+  result = nir_vec4(b, red, green, blue, a);
+   }
 nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(result));
  }






Re: [Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true

2018-12-20 Thread Jonathan marek


On 12/20/2018 01:28 AM, Nils Wallménius wrote:

On Wed, 19 Dec 2018 at 17:44, Jonathan Marek wrote:


When ffma is available, we can use a different arrangement of constants to
get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7
scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma.

Signed-off-by: Jonathan Marek 
---
  src/compiler/nir/nir_lower_tex.c | 62 ++--
  1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/src/compiler/nir/nir_lower_tex.c
b/src/compiler/nir/nir_lower_tex.c
index 6a6b6c41a7..f7c821bb34 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr
*tex,
 nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v,
 nir_ssa_def *a)
  {
-   nir_const_value m[3] = {
-  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
-  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
-  { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
-   };
-
-   nir_ssa_def *yuv =
-  nir_vec4(b,
-   nir_fmul(b, nir_imm_float(b, 1.16438356f),
-nir_fadd(b, y, nir_imm_float(b, -16.0f /
255.0f))),
-   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f /
255.0f)), 0),
-   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f /
255.0f)), 0),
-   nir_imm_float(b, 0.0));
-
-   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0]));
-   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1]));
-   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2]));
-
-   nir_ssa_def *result = nir_vec4(b, red, green, blue, a);
+   nir_ssa_def *result;
+
+
+   if (b->shader->options->fuse_ffma) {
+  nir_const_value m[4] = {



Drive-by comment, but shouldn't this^ be m[3]?

Regards
Nils



Yes, it should be m[3]. It was originally 4 before alpha was added.


+ { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } },
+ { .f32 = { 0.0f,   -0.39176229f, 2.01723214f, 0.0f } },
+ { .f32 = { 1.59602678f,-0.81296764f, 0.0f,0.0f } },
+  };
+  static const float y_off = -16.0f * 1.16438356f / 255.0f;
+  static const float sc = 128.0f / 255.0f;
+
+  nir_ssa_def *offset =
+ nir_vec4(b,
+  nir_imm_float(b, y_off - sc * 1.59602678f),
+  nir_imm_float(b, y_off + sc * (0.81296764f +
0.39176229f)),
+  nir_imm_float(b, y_off - sc * 2.01723214f),
+  a);
+
+  result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]),
+   nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]),
+nir_ffma(b, v, nir_build_imm(b, 4, 32,
m[2]), offset)));
+   } else {
+  nir_const_value m[3] = {
+ { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
+ { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
+ { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
+  };
+
+  nir_ssa_def *yuv =
+ nir_vec4(b,
+  nir_fmul(b, nir_imm_float(b, 1.16438356f),
+   nir_fadd(b, y, nir_imm_float(b, -16.0f /
255.0f))),
+  nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f
/ 255.0f)), 0),
+  nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f
/ 255.0f)), 0),
+  nir_imm_float(b, 0.0));
+
+  nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0]));
+  nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32,
m[1]));
+  nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32,
m[2]));
+
+  result = nir_vec4(b, red, green, blue, a);
+   }

 nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(result));
  }
--
2.17.1
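
The rearrangement described in the commit message can be checked numerically outside of NIR. The sketch below is a scalar model of the two code paths in the patch, using the same constants. It is not the NIR builder code itself, and the names struct rgb, mul_add, yuv_to_rgb_dot, yuv_to_rgb_ffma and rgb_close are made up for this illustration.

```c
/* Scalar model of the two convert_yuv_to_rgb arrangements in the patch. */
#define KY 1.16438356f

struct rgb { float r, g, b; };

/* Stand-in for a fused multiply-add: a * b + c. */
static float mul_add(float a, float b, float c) { return a * b + c; }

/* Original path: bias y/u/v first, then one dot product per channel. */
static struct rgb
yuv_to_rgb_dot(float y, float u, float v)
{
   float yp = KY * (y - 16.0f / 255.0f);
   float up = u - 128.0f / 255.0f;
   float vp = v - 128.0f / 255.0f;
   struct rgb out = {
      yp + 1.59602678f * vp,
      yp - 0.39176229f * up - 0.81296764f * vp,
      yp + 2.01723214f * up,
   };
   return out;
}

/* fuse_ffma path: the biases are folded into one constant offset per
 * channel, so each channel is a chain of multiply-adds. */
static struct rgb
yuv_to_rgb_ffma(float y, float u, float v)
{
   const float y_off = -16.0f * KY / 255.0f;
   const float sc = 128.0f / 255.0f;
   const float off_r = y_off - sc * 1.59602678f;
   const float off_g = y_off + sc * (0.81296764f + 0.39176229f);
   const float off_b = y_off - sc * 2.01723214f;
   struct rgb out = {
      mul_add(y, KY, mul_add(u, 0.0f, mul_add(v, 1.59602678f, off_r))),
      mul_add(y, KY, mul_add(u, -0.39176229f, mul_add(v, -0.81296764f, off_g))),
      mul_add(y, KY, mul_add(u, 2.01723214f, mul_add(v, 0.0f, off_b))),
   };
   return out;
}

/* True when every channel differs by less than eps. */
static int
rgb_close(struct rgb a, struct rgb b, float eps)
{
   float dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
   return dr < eps && dr > -eps && dg < eps && dg > -eps &&
          db < eps && db > -eps;
}
```

Both functions should agree to within float rounding for any input. In this scalar form the two terms with a zero coefficient could be dropped, leaving 2 + 3 + 2 = 7 multiply-adds, which appears to match the count quoted for freedreno/ir3.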



Re: [Mesa-dev] [PATCH 01/16] glsl/nir: int constants as float for native_integers=false

2018-12-19 Thread Jonathan marek
I haven't encountered such dereference issues, but lowering integers 
later is a good idea (as with bools which are now lowered later).


On 12/19/2018 01:22 PM, Eric Anholt wrote:

Jonathan Marek  writes:


Note: the backend must take care that uniform index is now a float


This makes me think that lowering ints to float should be done near the
end of the compile (followed by maybe an algebraic and a dce).  As is, I
think nir_lower_io() is going to do bad things to dereferences of i/o
arrays.

That said, it looks like this will be fixing way more than it regresses,
so I would go along with it.




Re: [Mesa-dev] [PATCH 08/16] freedreno: a2xx: enable early-Z testing

2018-12-19 Thread Jonathan marek

Hi,

I didn't verify it, but both r600 and a3xx disable earlyZ when alpha 
test is enabled, so this is almost certainly right.


We don't need to worry about the shader writing Z, it is not part of 
OpenGL ES 2.0 and not implemented by the driver (although the hardware 
should allow it).


Why should we need to check if the shader does discards?

On 12/19/2018 01:05 PM, Eric Anholt wrote:

Jonathan Marek  writes:


Enable earlyZ when alpha test is disabled.

Signed-off-by: Jonathan Marek 
---
  src/gallium/drivers/freedreno/a2xx/fd2_zsa.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c
index 64b31b677b..d3c19b4450 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c
@@ -49,7 +49,8 @@ fd2_zsa_state_create(struct pipe_context *pctx,
A2XX_RB_DEPTHCONTROL_ZFUNC(cso->depth.func); /* maps 1:1 */
  
  	if (cso->depth.enabled)

-   so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE;
+   so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE |
+   COND(!cso->alpha.enabled, 
A2XX_RB_DEPTHCONTROL_EARLY_Z_ENABLE);


Why when alpha test is disabled?  Should you also be checking if the
shader does discards?  How about if the shader writes Z, is anything
preventing early Z then?
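
The condition in the hunk above can be modelled in isolation. This is only a sketch: the bit positions are placeholders rather than the real A2XX_RB_DEPTHCONTROL values from the generated register headers, depth_control() is a made-up name, and COND is written out here the way I understand the freedreno helper to behave (value if the condition holds, otherwise 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define Z_ENABLE        (1u << 1)
#define EARLY_Z_ENABLE  (1u << 3)

/* Yields val when cond is true, 0 otherwise. */
#define COND(cond, val) ((cond) ? (val) : 0u)

/* Hypothetical model of the patched logic: early-Z is requested only
 * when depth testing is on and the alpha test is off. */
static uint32_t
depth_control(bool depth_enabled, bool alpha_enabled)
{
   uint32_t bits = 0;

   if (depth_enabled)
      bits |= Z_ENABLE | COND(!alpha_enabled, EARLY_Z_ENABLE);

   return bits;
}
```

Note that this model, like the patch, does not look at whether the fragment shader discards, which is the open question in the review.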




Re: [Mesa-dev] [PATCH 04/16] nir: add nir_lower_bool_to_float

2018-12-19 Thread Jonathan marek

Hi,

No, I did not see that. That version should work for me, although I don't 
like the lowering of nir_op_inot it has, since the backend might have 
something smarter to implement a "fnot" (and ior could also just be an 
fmax instead).
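
To illustrate the point about ior and fmax: with booleans encoded as the floats 0.0 and 1.0, the logic ops have simple float equivalents. This is a sketch of one possible mapping, not what either patch actually emits, and the function names are made up for the example.

```c
/* Booleans are encoded as 0.0f (false) and 1.0f (true). */

/* OR as a float max. */
static float
fbool_or(float a, float b)
{
   return a > b ? a : b;
}

/* AND as a multiply (a float min would work as well). */
static float
fbool_and(float a, float b)
{
   return a * b;
}

/* NOT as 1.0 - x, one candidate for a backend "fnot". */
static float
fbool_not(float a)
{
   return 1.0f - a;
}
```

These only hold as long as every input is exactly 0.0 or 1.0, so a lowering pass using them has to guarantee that invariant.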


On 12/19/2018 12:44 PM, Christian Gmeiner wrote:

On Wed, 19 Dec 2018 at 17:44, Jonathan Marek wrote:


Mainly a copy of nir_lower_bool_to_int32, but with float opcodes.



Hmmm.. are you aware of https://patchwork.freedesktop.org/patch/257867/ and
https://gitlab.freedesktop.org/jekstrand/mesa/commit/cf819c8a3fa99ccedf423ea77cf710dbd852066b
?

I am going to send out a larger patch series with that version of bool
to float during my Christmas break.
Keep in mind that I did not look very closely at your lowering pass.


Signed-off-by: Jonathan Marek 
---
  src/compiler/Makefile.sources  |   1 +
  src/compiler/nir/meson.build   |   3 +-
  src/compiler/nir/nir.h |   1 +
  src/compiler/nir/nir_lower_bool_to_float.c | 165 +
  4 files changed, 169 insertions(+), 1 deletion(-)
  create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index ef47bdb33b..39eaedc658 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -231,6 +231,7 @@ NIR_FILES = \
 nir/nir_lower_atomics_to_ssbo.c \
 nir/nir_lower_bitmap.c \
 nir/nir_lower_bit_size.c \
+   nir/nir_lower_bool_to_float.c \
 nir/nir_lower_bool_to_int32.c \
 nir/nir_lower_clamp_color_outputs.c \
 nir/nir_lower_clip.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index e252f64539..f1016104af 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -114,6 +114,7 @@ files_libnir = files(
'nir_lower_alpha_test.c',
'nir_lower_atomics_to_ssbo.c',
'nir_lower_bitmap.c',
+  'nir_lower_bool_to_float.c',
'nir_lower_bool_to_int32.c',
'nir_lower_clamp_color_outputs.c',
'nir_lower_clip.c',
@@ -248,7 +249,7 @@ if with_tests
include_directories : [inc_common],
dependencies : [dep_thread, idep_gtest, idep_nir],
link_with : libmesa_util,
-),
+),
  suite : ['compiler', 'nir'],
)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 54f9c64a3a..f6d0bdf7ec 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2905,6 +2905,7 @@ void nir_lower_alpha_test(nir_shader *shader, enum 
compare_func func,
bool alpha_to_one);
  bool nir_lower_alu(nir_shader *shader);
  bool nir_lower_alu_to_scalar(nir_shader *shader);
+bool nir_lower_bool_to_float(nir_shader *shader);
  bool nir_lower_bool_to_int32(nir_shader *shader);
  bool nir_lower_load_const_to_scalar(nir_shader *shader);
  bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
diff --git a/src/compiler/nir/nir_lower_bool_to_float.c 
b/src/compiler/nir/nir_lower_bool_to_float.c
new file mode 100644
index 00..2756a1815f
--- /dev/null
+++ b/src/compiler/nir/nir_lower_bool_to_float.c
@@ -0,0 +1,165 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+
+static bool
+assert_ssa_def_is_not_1bit(nir_ssa_def *def, UNUSED void *unused)
+{
+   assert(def->bit_size > 1);
+   return true;
+}
+
+static bool
+rewrite_1bit_ssa_def_to_32bit(nir_ssa_def *def, void *_progress)
+{
+   bool *progress = _progress;
+   if (def->bit_size == 1) {
+  def->bit_size = 32;
+  *progress = true;
+   }
+   return true;
+}
+
+static bool
+lower_alu_instr(nir_alu_instr *alu)
+{
+   const nir_op_info *op_info = &nir_op_infos[alu->op];
+
+   switch (alu->op) {
+   case nir_op_vec2:
+   case nir_op_vec3:
+   case nir_op_vec4:
+  /* Thes

[Mesa-dev] [PATCH 11/16] freedreno: a2xx: use fd_resource_offset for base in emit_texture

2018-12-19 Thread Jonathan Marek
Fixup for the texture update patch.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index ce275a78a6..ac2a02dfae 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -137,7 +137,7 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context 
*ctx,
 
OUT_RING(ring, sampler->tex0 | view->tex0);
if (rsc)
-   OUT_RELOC(ring, rsc->bo, 0, view->tex1, 0);
+   OUT_RELOC(ring, rsc->bo, fd_resource_offset(rsc, 0, 0), 
view->tex1, 0);
else
OUT_RING(ring, 0);
 
-- 
2.17.1



[Mesa-dev] [PATCH 12/16] freedreno: a2xx: sysmem rendering

2018-12-19 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 62 +++
 1 file changed, 62 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index d9aad16b4a..77c8d80055 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -367,6 +367,67 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile)
/* TODO blob driver seems to toss in a CACHE_FLUSH after each DRAW_INDX.. */
 }
 
+static void
+fd2_emit_sysmem_prep(struct fd_batch *batch)
+{
+   struct fd_context *ctx = batch->ctx;
+   struct fd_ringbuffer *ring = batch->gmem;
+   struct pipe_framebuffer_state *pfb = >framebuffer;
+   struct pipe_surface *psurf = pfb->cbufs[0];
+
+   if (!psurf)
+   return;
+
+   struct fd_resource *rsc = fd_resource(psurf->texture);
+   struct fd_resource_slice *slice =
+   fd_resource_slice(rsc, psurf->u.tex.level);
+   uint32_t offset =
+   fd_resource_offset(rsc, psurf->u.tex.level, psurf->u.tex.first_layer);
+
+   assert((slice->pitch & 31) == 0);
+   assert((offset & 0xfff) == 0);
+
+   fd2_emit_restore(ctx, ring);
+
+   OUT_PKT0(ring, REG_A2XX_COHER_SIZE_PM4, 1);
+   OUT_RING(ring, slice->size0);
+   OUT_PKT0(ring, REG_A2XX_COHER_STATUS_PM4, 1);
+   OUT_RING(ring, 0x02000200);
+   OUT_PKT0(ring, REG_A2XX_COHER_BASE_PM4, 1);
+   OUT_RELOCW(ring, rsc->bo, offset, 0, 0);
+
+   OUT_PKT3(ring, CP_WAIT_REG_EQ, 4);
+   OUT_RING(ring, REG_A2XX_COHER_STATUS_PM4);
+   OUT_RING(ring, 0);
+   OUT_RING(ring, 0x8000);
+   OUT_RING(ring, 1);
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_RB_SURFACE_INFO));
+   OUT_RING(ring, A2XX_RB_SURFACE_INFO_SURFACE_PITCH(slice->pitch));
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_INFO));
+   OUT_RELOCW(ring, rsc->bo, offset, A2XX_RB_COLOR_INFO_LINEAR |
+   A2XX_RB_COLOR_INFO_SWAP(fmt2swap(psurf->format)) |
+   A2XX_RB_COLOR_INFO_FORMAT(fd2_pipe2color(psurf->format)), 0);
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_COHER_DEST_BASE_0));
+   OUT_RELOCW(ring, rsc->bo, offset, 0, 0);
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_SCREEN_SCISSOR_TL));
+   OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_TL_WINDOW_OFFSET_DISABLE);
+   OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_BR_X(pfb->width) |
+   A2XX_PA_SC_SCREEN_SCISSOR_BR_Y(pfb->height));
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_WINDOW_OFFSET));
+   OUT_RING(ring, A2XX_PA_SC_WINDOW_OFFSET_X(0) |
+   A2XX_PA_SC_WINDOW_OFFSET_Y(0));
+}
+
 /* before first tile */
 static void
 fd2_emit_tile_init(struct fd_batch *batch)
@@ -440,6 +501,7 @@ fd2_gmem_init(struct pipe_context *pctx)
 {
struct fd_context *ctx = fd_context(pctx);
 
+   ctx->emit_sysmem_prep = fd2_emit_sysmem_prep;
ctx->emit_tile_init = fd2_emit_tile_init;
ctx->emit_tile_prep = fd2_emit_tile_prep;
ctx->emit_tile_mem2gmem = fd2_emit_tile_mem2gmem;
-- 
2.17.1



[Mesa-dev] [PATCH 16/16] freedreno: a2xx: a20x hw binning

2018-12-19 Thread Jonathan Marek
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c |  32 +++-
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c |  52 ++
 src/gallium/drivers/freedreno/a2xx/fd2_emit.h |   3 +-
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 150 ++
 .../drivers/freedreno/a2xx/fd2_program.c  |  11 +-
 .../drivers/freedreno/freedreno_batch.c   |   3 +
 .../drivers/freedreno/freedreno_batch.h   |   7 +
 .../drivers/freedreno/freedreno_draw.h|   3 +
 .../drivers/freedreno/freedreno_gmem.c|  29 +++-
 .../drivers/freedreno/freedreno_gmem.h|   1 +
 .../drivers/freedreno/freedreno_screen.h  |   6 +
 11 files changed, 281 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 4e91267080..d3e440d144 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -75,11 +75,12 @@ emit_vertexbufs(struct fd_context *ctx)
// CONST(20,0) (or CONST(26,0) in soliv_vp)
 
fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements);
+   fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, vtx->num_elements);
 }
 
 static void
 draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
-  struct fd_ringbuffer *ring, unsigned index_offset)
+  struct fd_ringbuffer *ring, unsigned index_offset, bool binning)
 {
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
@@ -119,8 +120,22 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
}
 
+   /* binning shader will take offset from C64 */
+   if (binning && is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0180);
+   OUT_RING(ring, fui(ctx->batch->num_vertices));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   }
+
+   enum pc_di_vis_cull_mode vismode = USE_VISIBILITY;
+   if (binning || info->mode == PIPE_PRIM_POINTS)
+   vismode = IGNORE_VISIBILITY;
+
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
-IGNORE_VISIBILITY, info, index_offset);
+vismode, info, index_offset);
 
if (is_a20x(ctx->screen)) {
/* not sure why this is required, but it fixes some hangs */
@@ -145,6 +160,9 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo,
if (ctx->dirty & FD_DIRTY_VTXBUF)
emit_vertexbufs(ctx);
 
+   if (!(fd_mesa_debug & FD_DBG_NOBIN))
+   fd2_emit_state_binning(ctx, ctx->dirty);
+
fd2_emit_state(ctx, ctx->dirty);
 
/* a2xx can draw only 65535 vertices at once
@@ -166,17 +184,23 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo,
struct pipe_draw_info info = *pinfo;
unsigned count = info.count;
unsigned step = step_tbl[info.mode];
+   unsigned num_vertices = ctx->batch->num_vertices;
 
if (!step)
return false;
 
for (; count + step > 32766; count -= step) {
info.count = MIN2(count, 32766);
-   draw_impl(ctx, &info, ctx->batch->draw, index_offset);
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, &info, ctx->batch->binning, index_offset, true);
info.start += step;
+   ctx->batch->num_vertices += step;
}
+   /* changing this value is a hack, restore it */
+   ctx->batch->num_vertices = num_vertices;
} else {
-   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset);
+   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
}
 
fd_context_all_clean(ctx);
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 9628f26736..7371fa6e8c 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -185,6 +185,58 @@ fd2_emit_vertex_bufs(struct fd_ringbuffer *ring, uint32_t val,
}
 }
 
+void
+fd2_emit_state_binning(struct fd_context *ctx, const enum fd_dirty_3d_state dirty)
+{
+   struct fd2_blend_stateobj *blend = fd2_blend_stateobj(ctx->blend);
+   struct fd_ringbuffer *ring = ctx->batch->binning;
+
+   /* subset of fd2_emit_state needed for hw binning on a20x */
+
+   if (dirty & (FD_DIRTY_PROG | FD_DIRTY_VTXSTATE))
+ 

[Mesa-dev] [PATCH 07/16] freedreno: a2xx: improve REG_A2XX_PA_CL_VTE_CNTL management

2018-12-19 Thread Jonathan Marek
Doesn't change much, but reduces the size of fd2_emit_state

gmem2mem does not need to change the value: no Z clipping on resolve
mem2gmem now needs to restore the common value after rendering

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 20 +--
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 18 +
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 60bc9fad4c..7dcd31cbcb 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -272,16 +272,6 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty)
OUT_RING(ring, fui(ctx->viewport.translate[1]));   /* PA_CL_VPORT_YOFFSET */
OUT_RING(ring, fui(ctx->viewport.scale[2]));   /* PA_CL_VPORT_ZSCALE */
OUT_RING(ring, fui(ctx->viewport.translate[2]));   /* PA_CL_VPORT_ZOFFSET */
-
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL));
-   OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT |
-   A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_Z_SCALE_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_Z_OFFSET_ENA);
}
 
if (dirty & (FD_DIRTY_PROG | FD_DIRTY_VTXSTATE | FD_DIRTY_TEXSTATE)) {
@@ -475,6 +465,16 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring)
OUT_RING(ring, 0x);/* RB_BLEND_GREEN */
OUT_RING(ring, 0x);/* RB_BLEND_BLUE */
OUT_RING(ring, 0x00ff);/* RB_BLEND_ALPHA */
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL));
+   OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT |
+   A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Z_SCALE_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Z_OFFSET_ENA);
 }
 
 static void
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index e98ae7334a..3c54e2c6c0 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -156,14 +156,6 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile)
OUT_RING(ring, xy2d(0, 0));   /* PA_SC_WINDOW_SCISSOR_TL */
OUT_RING(ring, xy2d(pfb->width, pfb->height));/* PA_SC_WINDOW_SCISSOR_BR */
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL));
-   OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT |
-   A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA |
-   A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA);
-
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_CLIP_CNTL));
OUT_RING(ring, 0x);
@@ -350,6 +342,16 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile)
if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_COLOR))
emit_mem2gmem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]);
 
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL));
+   OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT |
+   A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Z_SCALE_ENA |
+   A2XX_PA_CL_VTE_CNTL_VPORT_Z_OFFSET_ENA);
+
/* TODO blob driver seems to toss in a CACHE_FLUSH after each DRAW_INDX.. */
 }
 
-- 
2.17.1



[Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true

2018-12-19 Thread Jonathan Marek
When ffma is available, we can use a different arrangement of constants to
get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7
scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma.

Signed-off-by: Jonathan Marek 
---
 src/compiler/nir/nir_lower_tex.c | 62 ++--
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index 6a6b6c41a7..f7c821bb34 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v,
nir_ssa_def *a)
 {
-   nir_const_value m[3] = {
-  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
-  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
-  { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
-   };
-
-   nir_ssa_def *yuv =
-  nir_vec4(b,
-   nir_fmul(b, nir_imm_float(b, 1.16438356f),
-nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))),
-   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0),
-   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0),
-   nir_imm_float(b, 0.0));
-
-   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0]));
-   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1]));
-   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2]));
-
-   nir_ssa_def *result = nir_vec4(b, red, green, blue, a);
+   nir_ssa_def *result;
+
+
+   if (b->shader->options->fuse_ffma) {
+  nir_const_value m[4] = {
+ { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } },
+ { .f32 = { 0.0f,   -0.39176229f, 2.01723214f, 0.0f } },
+ { .f32 = { 1.59602678f,-0.81296764f, 0.0f,0.0f } },
+  };
+  static const float y_off = -16.0f * 1.16438356f / 255.0f;
+  static const float sc = 128.0f / 255.0f;
+
+  nir_ssa_def *offset =
+ nir_vec4(b,
+  nir_imm_float(b, y_off - sc * 1.59602678f),
+  nir_imm_float(b, y_off + sc * (0.81296764f + 0.39176229f)),
+  nir_imm_float(b, y_off - sc * 2.01723214f),
+  a);
+
+  result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]),
+   nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]),
+nir_ffma(b, v, nir_build_imm(b, 4, 32, m[2]), offset)));
+   } else {
+  nir_const_value m[3] = {
+ { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
+ { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
+ { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
+  };
+
+  nir_ssa_def *yuv =
+ nir_vec4(b,
+  nir_fmul(b, nir_imm_float(b, 1.16438356f),
+   nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))),
+  nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0),
+  nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0),
+  nir_imm_float(b, 0.0));
+
+  nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0]));
+  nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1]));
+  nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2]));
+
+  result = nir_vec4(b, red, green, blue, a);
+   }
 
nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(result));
 }
-- 
2.17.1
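
A minimal sketch, not part of the patch, of why the two paths in convert_yuv_to_rgb agree: the fuse_ffma path only folds the -16/255 and -128/255 input biases into one constant offset vector. The constants are copied from the diff. The function names yuv_to_rgb_dot and yuv_to_rgb_ffma are made up for illustration, and each ffma is modelled as an ordinary multiply followed by an add in Python floats, so rounding will not match real hardware exactly.

```python
# Constants copied from the patch.
Y_SCALE = 1.16438356
V_TO_R = 1.59602678
U_TO_G = -0.39176229
V_TO_G = -0.81296764
U_TO_B = 2.01723214


def yuv_to_rgb_dot(y, u, v):
    """Non-ffma path: bias the inputs first, then one dot product per channel."""
    ys = Y_SCALE * (y - 16.0 / 255.0)
    us = u - 128.0 / 255.0
    vs = v - 128.0 / 255.0
    return (ys + V_TO_R * vs,
            ys + U_TO_G * us + V_TO_G * vs,
            ys + U_TO_B * us)


def yuv_to_rgb_ffma(y, u, v):
    """fuse_ffma path: three chained multiply-adds on top of a constant offset."""
    y_off = -16.0 * Y_SCALE / 255.0
    sc = 128.0 / 255.0
    offset = (y_off - sc * 1.59602678,
              y_off + sc * (0.81296764 + 0.39176229),
              y_off - sc * 2.01723214)
    m0 = (Y_SCALE, Y_SCALE, Y_SCALE)   # multiplied by y
    m1 = (0.0, U_TO_G, U_TO_B)         # multiplied by u
    m2 = (V_TO_R, V_TO_G, 0.0)         # multiplied by v
    # result = ffma(y, m0, ffma(u, m1, ffma(v, m2, offset)))
    return tuple(y * a + (u * b + (v * c + o))
                 for a, b, c, o in zip(m0, m1, m2, offset))


if __name__ == "__main__":
    for yuv in [(0.5, 0.5, 0.5), (1.0, 0.25, 0.75)]:
        print(yuv, yuv_to_rgb_dot(*yuv), yuv_to_rgb_ffma(*yuv))
```

The alpha channel is left out of the model; in the patch it rides along in the fourth component of the offset, since the fourth component of every matrix row is 0.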



[Mesa-dev] [PATCH 05/16] nir: combine fmul and fadd across ffma operations

2018-12-19 Thread Jonathan Marek
This works by moving the fadd up across the ffma operations, so that it
can eventually be combined with a fmul. I'm not sure it works in all
cases, but it works in all the common cases.

This will only affect freedreno since it is the only driver using the
fuse_ffma option.

Example:
matrix * vec4(coord, 1.0)
is compiled as:
fmul, ffma, ffma, fadd
and with this patch:
ffma, ffma, ffma

Signed-off-by: Jonathan Marek 
---
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 506d45e55b..97a6c0d8dc 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -137,6 +137,7 @@ optimizations = [
(('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp64'),
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
(('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
+   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d)), 'options->fuse_ffma'),
 
(('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
(('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
-- 
2.17.1
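
A minimal sketch, not part of the patch, of how the new rule lets the trailing fadd sink through a chain of ffma until it meets the fmul. Expressions are nested tuples in the same spirit as nir_opt_algebraic.py, but rewrite() and ops() are hypothetical helpers, not Mesa's matcher, and they only match when the fmul/ffma is the first operand of the fadd. The starting expression is an assumed shape for one row of matrix * vec4(coord, 1.0) after the existing fadd(fmul) rule has already run.

```python
def rewrite(expr):
    """Apply the two fuse_ffma rules bottom-up to a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name
    expr = (expr[0],) + tuple(rewrite(arg) for arg in expr[1:])
    if expr[0] == "fadd" and isinstance(expr[1], tuple):
        inner, d = expr[1], expr[2]
        if inner[0] == "fmul":
            # existing rule: fadd(fmul(a, b), c) -> ffma(a, b, c)
            return ("ffma", inner[1], inner[2], d)
        if inner[0] == "ffma":
            # rule added by this patch:
            # fadd(ffma(a, b, c), d) -> ffma(a, b, fadd(c, d))
            return ("ffma", inner[1], inner[2],
                    rewrite(("fadd", inner[3], d)))
    return expr


def ops(expr):
    """List the opcode names in evaluation order, innermost first."""
    if not isinstance(expr, tuple):
        return []
    names = []
    for arg in expr[1:]:
        names += ops(arg)
    return names + [expr[0]]


# Assumed input: m0*x, then two ffma, then the "+ m3 * 1.0" left as a fadd.
before = ("fadd",
          ("ffma", "z", "m2", ("ffma", "y", "m1", ("fmul", "x", "m0"))),
          "m3")
after = rewrite(before)

if __name__ == "__main__":
    print(ops(before))
    print(ops(after))
```

The fadd moves one ffma deeper on each application of the new rule, and the existing rule then fuses it with the innermost fmul.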



[Mesa-dev] [PATCH 15/16] freedreno: a2xx: add partial lower_scalar pass for ir2

2018-12-19 Thread Jonathan Marek
Some instructions can only be scalar on a2xx; lower only these to scalar.

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/Makefile.sources|   1 +
 src/gallium/drivers/freedreno/a2xx/ir2_nir.c  |   3 +
 .../freedreno/a2xx/ir2_nir_lower_scalar.c | 174 ++
 .../drivers/freedreno/a2xx/ir2_private.h  |   1 +
 src/gallium/drivers/freedreno/meson.build |   1 +
 5 files changed, 180 insertions(+)
 create mode 100644 src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c

diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources
index f4979953e8..fed5b5bd17 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -73,6 +73,7 @@ a2xx_SOURCES := \
a2xx/ir2_assemble.c \
a2xx/ir2_cp.c \
a2xx/ir2_nir.c \
+   a2xx/ir2_nir_lower_scalar.c \
a2xx/ir2_private.h \
a2xx/ir2_ra.c
 
diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c
index 8162479341..10fa0c765f 100644
--- a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c
+++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c
@@ -1124,6 +1124,9 @@ ir2_nir_compile(struct ir2_context *ctx, bool binning)
 
OPT_V(ctx->nir, nir_lower_bool_to_float);
 
+   /* lower to scalar instructions that can only be scalar on a2xx */
+   OPT_V(ctx->nir, ir2_nir_lower_scalar);
+
OPT_V(ctx->nir, nir_lower_locals_to_regs);
 
OPT_V(ctx->nir, nir_convert_from_ssa, true);
diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c
new file mode 100644
index 00..2b72a86b3e
--- /dev/null
+++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2018 Jonathan Marek 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *Jonathan Marek 
+ */
+
+/* some operations can only be scalar on a2xx:
+ *  rsq, rcp, log2, exp2, cos, sin, sqrt
+ * mostly copy-pasted from nir_lower_alu_to_scalar.c
+ */
+
+#include "ir2_private.h"
+#include "compiler/nir/nir_builder.h"
+
+static void
+nir_alu_ssa_dest_init(nir_alu_instr * instr, unsigned num_components,
+ unsigned bit_size)
+{
+   nir_ssa_dest_init(&instr->instr, &instr->dest.dest, num_components,
+ bit_size, NULL);
+   instr->dest.write_mask = (1 << num_components) - 1;
+}
+
+static void
+lower_reduction(nir_alu_instr * instr, nir_op chan_op, nir_op merge_op,
+   nir_builder * builder)
+{
+   unsigned num_components = nir_op_infos[instr->op].input_sizes[0];
+
+   nir_ssa_def *last = NULL;
+   for (unsigned i = 0; i < num_components; i++) {
+   nir_alu_instr *chan =
+   nir_alu_instr_create(builder->shader, chan_op);
+   nir_alu_ssa_dest_init(chan, 1, instr->dest.dest.ssa.bit_size);
+   nir_alu_src_copy(>src[0], >src[0], chan);
+   chan->src[0].swizzle[0] = chan->src[0].swizzle[i];
+   if (nir_op_infos[chan_op].num_inputs > 1) {
+   assert(nir_op_infos[chan_op].num_inputs == 2);
+   nir_alu_src_copy(>src[1], >src[1], chan);
+   chan->src[1].swizzle[0] = chan->src[1].swizzle[i];
+   }
+   chan->exact = instr->exact;
+
+   nir_builder_instr_insert(builder, >instr);
+
+   if (i == 0) {
+   last = >dest.dest.ssa;
+   } else {
+   last = nir_build_alu(builder, merge_op,
+  

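A minimal sketch, not part of the patch (which is cut off above), of what a partial lower-to-scalar pass does: only the ops named in the patch's comment are split into one instruction per channel, and everything else stays vector. The tuple instruction format and the lower_scalar() helper are made up for illustration and say nothing about how ir2_nir_lower_scalar.c is actually structured.

```python
# Ops listed in the patch's comment as scalar-only on a2xx.
SCALAR_ONLY = {"rsq", "rcp", "log2", "exp2", "cos", "sin", "sqrt"}


def lower_scalar(op, dest, src, num_components):
    """Return the instruction list after lowering one unary ALU op.

    Instructions are (op, dest, sources) tuples; this format is hypothetical.
    """
    if op not in SCALAR_ONLY or num_components == 1:
        return [(op, dest, (src,))]  # left untouched
    chans = "xyzw"[:num_components]
    # One scalar instruction per channel of the source...
    out = [(op, f"{dest}_{c}", (f"{src}.{c}",)) for c in chans]
    # ...then recombine the results into a vector of the original width.
    out.append((f"vec{num_components}", dest,
                tuple(f"{dest}_{c}" for c in chans)))
    return out


if __name__ == "__main__":
    for instr in lower_scalar("rsq", "r1", "r0", 3):
        print(instr)
    print(lower_scalar("floor", "r2", "r0", 4))
```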
[Mesa-dev] [PATCH 14/16] freedreno: a2xx: add ir2 copy propagation

2018-12-19 Thread Jonathan Marek
Two cases:
* replacing srcs which refer to MOV instructions
* replacing MOVs used to write to exports

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/Makefile.sources|   1 +
 src/gallium/drivers/freedreno/a2xx/ir2.c  |   6 +
 src/gallium/drivers/freedreno/a2xx/ir2_cp.c   | 225 ++
 .../drivers/freedreno/a2xx/ir2_private.h  |   3 +
 src/gallium/drivers/freedreno/meson.build |   1 +
 5 files changed, 236 insertions(+)
 create mode 100644 src/gallium/drivers/freedreno/a2xx/ir2_cp.c

diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources
index 8421318081..f4979953e8 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -71,6 +71,7 @@ a2xx_SOURCES := \
a2xx/ir2.c \
a2xx/ir2.h \
a2xx/ir2_assemble.c \
+   a2xx/ir2_cp.c \
a2xx/ir2_nir.c \
a2xx/ir2_private.h \
a2xx/ir2_ra.c
diff --git a/src/gallium/drivers/freedreno/a2xx/ir2.c b/src/gallium/drivers/freedreno/a2xx/ir2.c
index 344f62defe..bc1d7c23b8 100644
--- a/src/gallium/drivers/freedreno/a2xx/ir2.c
+++ b/src/gallium/drivers/freedreno/a2xx/ir2.c
@@ -422,9 +422,15 @@ ir2_compile(struct fd2_shader_stateobj *so, unsigned variant,
/* convert nir to internal representation */
ir2_nir_compile(, binning);
 
+   /* copy propagate srcs */
+   cp_src();
+
/* get ref_counts and kill non-needed instructions */
ra_count_refs();
 
+   /* remove movs used to write outputs */
+   cp_export();
+
/* instruction order.. and vector->scalar conversions */
schedule_instrs();
 
diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_cp.c b/src/gallium/drivers/freedreno/a2xx/ir2_cp.c
new file mode 100644
index 00..fa155887f8
--- /dev/null
+++ b/src/gallium/drivers/freedreno/a2xx/ir2_cp.c
@@ -0,0 +1,225 @@
+/*
+ * Copyright (C) 2018 Jonathan Marek 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *Jonathan Marek 
+ */
+
+#include "ir2_private.h"
+
+static bool is_mov(struct ir2_instr *instr)
+{
+   return instr->type == IR2_ALU && instr->alu.vector_opc == MAXv &&
+   instr->src_count == 1;
+}
+
+static void src_combine(struct ir2_src *src, struct ir2_src b)
+{
+   src->num = b.num;
+   src->type = b.type;
+   src->swizzle = swiz_merge(b.swizzle, src->swizzle);
+   if (!src->abs) /* if we have abs we don't care about previous negate */
+   src->negate ^= b.negate;
+   src->abs |= b.abs;
+}
+
+/* cp_src: replace src regs when they refer to a mov instruction
+ * example:
+ * ALU:  MAXvR7 = C7, C7
+ * ALU:  MULADDv R7 = R7, R10, R0.
+ * becomes:
+ * ALU:  MULADDv R7 = C7, R10, R0.
+ */
+void cp_src(struct ir2_context *ctx)
+{
+   struct ir2_instr *p;
+
+   ir2_foreach_instr(instr, ctx) {
+   ir2_foreach_src(src, instr) {
+   /* loop to replace recursively */
+   do {
+   if (src->type != IR2_SRC_SSA)
+   break;
+
+   p = >instr[src->num];
+   /* don't work across blocks to avoid possible issues */
+   if (p->block_idx != instr->block_idx)
+   break;
+
+   if (!is_mov(p))
+   break;
+
+   /* cant apply abs to const src, const src only for alu */
+   if (p->src[0].type == IR2_SRC_CONST &&
+   (src->abs || instr->type != IR2_A

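A minimal sketch, not part of the patch (which is cut off above), of the modifier handling in src_combine(): when a source reads the result of a mov, the mov's own negate/abs flags are folded into the consuming source. The model keeps only the negate and abs fields (num, type and swizzle are left out), and it assumes the hardware applies abs before negate when reading a source. Src, read() and check_all() are hypothetical names used only here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Src:
    negate: bool = False
    abs: bool = False


def read(src, x):
    """Value seen through a source operand: abs first, then negate (assumed)."""
    if src.abs:
        x = abs(x)
    return -x if src.negate else x


def src_combine(src, b):
    """Fold the mov's source modifiers b into the consuming source src."""
    negate = src.negate
    if not src.abs:  # an outer abs hides whatever sign the mov produced
        negate ^= b.negate
    return Src(negate=negate, abs=src.abs or b.abs)


def check_all():
    """Compare the folded source against reading through both, for all flags."""
    flags = (False, True)
    for sn, sa, bn, ba in product(flags, repeat=4):
        src, b = Src(sn, sa), Src(bn, ba)
        for x in (-2.0, 3.0):
            if read(src_combine(src, b), x) != read(src, read(b, x)):
                return False
    return True


if __name__ == "__main__":
    print(check_all())
```

Under the stated abs-then-negate assumption, reading through the combined source gives the same value as reading through the consumer and the mov in sequence, for every combination of flags.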
[Mesa-dev] [PATCH 10/16] freedreno: a2xx: fix VERTEX_REUSE/DEALLOC on a20x

2018-12-19 Thread Jonathan Marek
On a20x, set VGT_VERTEX_REUSE_BLOCK_CNTL to 2 and don't change it. Small
rearrangement on a220 to reduce the size of draw commands.

Only set DEALLOC_CNTL on a20x because the correct a220 value is not known.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 18 +++---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 16 
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 18 +++---
 3 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 6dac8ca6a9..db8b022f8d 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -85,10 +85,6 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
OUT_RING(ring, info->index_size ? 0 : info->start);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
-
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
@@ -214,9 +210,11 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
OUT_RING(ring, 0);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, 0x028f);
+   if (!is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   OUT_RING(ring, 0x028f);
+   }
 
fd2_program_emit(ring, >solid_prog);
 
@@ -357,6 +355,12 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_RING(ring, CP_REG(REG_A2XX_RB_COPY_CONTROL));
OUT_RING(ring, 0x);
 
+   if (!is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   OUT_RING(ring, 0x003b);
+   }
+
ctx->dirty |= FD_DIRTY_ZSA |
FD_DIRTY_VIEWPORT |
FD_DIRTY_RASTERIZER |
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 7dcd31cbcb..ce275a78a6 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -341,6 +341,18 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring)
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY));
OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16));
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   OUT_RING(ring, 0x0002);
+
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_OUT_DEALLOC_CNTL));
+   OUT_RING(ring, 0x0002);
+   } else {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   OUT_RING(ring, 0x003b);
}
 
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
@@ -368,10 +380,6 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring)
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
OUT_RING(ring, 0x);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, 0x003b);
-
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_SQ_CONTEXT_MISC));
OUT_RING(ring, A2XX_SQ_CONTEXT_MISC_SC_SAMPLE_CNTL(CENTERS_ONLY));
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 8469e827b9..d9aad16b4a 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -131,9 +131,11 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile)
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
OUT_RING(ring, 0);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, 0x028f);
+   if (!is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   OUT_RING(ring, 0x028f);
+   }
 
fd2_program_emit(ring, >solid_prog);
 
@@ -186,6 +188,12 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile)
OUT_PK

[Mesa-dev] [PATCH 09/16] freedreno: a2xx: set viewport in gmem2mem

2018-12-19 Thread Jonathan Marek
Fixes cases where previous viewport values might cause gmem2mem to fail.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 3c54e2c6c0..8469e827b9 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -160,6 +160,14 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile)
OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_CLIP_CNTL));
OUT_RING(ring, 0x);
 
+   /* make sure the rectangle covers the entire screen */
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VPORT_XSCALE));
+   OUT_RING(ring, fui(4096.0));
+   OUT_RING(ring, fui(4096.0));
+   OUT_RING(ring, fui(4096.0));
+   OUT_RING(ring, fui(4096.0));
+
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_MODECONTROL));
OUT_RING(ring, A2XX_RB_MODECONTROL_EDRAM_MODE(EDRAM_COPY));
-- 
2.17.1



[Mesa-dev] [PATCH 01/16] glsl/nir: int constants as float for native_integers=false

2018-12-19 Thread Jonathan Marek
Note: the backend must account for the uniform index now being a float.

Signed-off-by: Jonathan Marek 
---
 src/compiler/glsl/glsl_to_nir.cpp | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp
index c5ba47d9e3..c8a7f3bd6c 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -94,6 +94,8 @@ private:
 
nir_deref_instr *evaluate_deref(ir_instruction *ir);
 
+   nir_constant *constant_copy(ir_constant *ir, void *mem_ctx);
+
/* most recent deref instruction created */
nir_deref_instr *deref;
 
@@ -194,8 +196,8 @@ nir_visitor::evaluate_deref(ir_instruction *ir)
return this->deref;
 }
 
-static nir_constant *
-constant_copy(ir_constant *ir, void *mem_ctx)
+nir_constant *
+nir_visitor::constant_copy(ir_constant *ir, void *mem_ctx)
 {
if (ir == NULL)
   return NULL;
@@ -213,7 +215,10 @@ constant_copy(ir_constant *ir, void *mem_ctx)
   assert(cols == 1);
 
   for (unsigned r = 0; r < rows; r++)
- ret->values[0].u32[r] = ir->value.u[r];
+ if (supports_ints)
+ret->values[0].u32[r] = ir->value.u[r];
+ else
+ret->values[0].f32[r] = ir->value.u[r];
 
   break;
 
@@ -222,7 +227,10 @@ constant_copy(ir_constant *ir, void *mem_ctx)
   assert(cols == 1);
 
   for (unsigned r = 0; r < rows; r++)
- ret->values[0].i32[r] = ir->value.i[r];
+ if (supports_ints)
+ret->values[0].i32[r] = ir->value.i[r];
+ else
+ret->values[0].f32[r] = ir->value.i[r];
 
   break;
 
-- 
2.17.1
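
A minimal sketch, not part of the patch, of what changes in the stored constant when supports_ints is false: the integer is converted to a float value (not bit-cast), so the 32-bit pattern a backend finds in the constant is different. Python's struct module stands in for the C union here, and f32_bits() and constant_bits() are hypothetical helper names.

```python
import struct


def f32_bits(x):
    """Bit pattern of x stored as an IEEE-754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def constant_bits(value, supports_ints):
    """32-bit pattern stored for an integer GLSL constant.

    Mirrors the two branches added in constant_copy(): keep the integer
    as-is, or store its value converted to float.
    """
    if supports_ints:
        return value & 0xFFFFFFFF
    return f32_bits(float(value))


if __name__ == "__main__":
    for v in (0, 1, 3, -1):
        print(v, hex(constant_bits(v, True)), hex(constant_bits(v, False)))
```

A backend that still reads such a constant as an integer, for example a uniform array index, would see the float encoding instead of the small integer it expects, which is the situation the commit message warns about.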



[Mesa-dev] [PATCH 08/16] freedreno: a2xx: enable early-Z testing

2018-12-19 Thread Jonathan Marek
Enable earlyZ when alpha test is disabled.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_zsa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c
index 64b31b677b..d3c19b4450 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c
@@ -49,7 +49,8 @@ fd2_zsa_state_create(struct pipe_context *pctx,
A2XX_RB_DEPTHCONTROL_ZFUNC(cso->depth.func); /* maps 1:1 */
 
if (cso->depth.enabled)
-   so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE;
+   so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE |
+   COND(!cso->alpha.enabled, 
A2XX_RB_DEPTHCONTROL_EARLY_Z_ENABLE);
if (cso->depth.writemask)
so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_WRITE_ENABLE;
 
-- 
2.17.1
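
The patch ORs the early-Z bit into the depth control word only when depth testing is on and alpha test is off. A sketch of that composition follows. The `COND()` macro mirrors the shape of the helper used in the diff; the bit values and the `sketch_` names are illustrative assumptions, not the generated a2xx register definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the COND() helper used in the patch: yields val when the
 * condition holds and 0 otherwise, so it can be OR'ed unconditionally. */
#define COND(cond, val) ((cond) ? (val) : 0)

/* Illustrative bit values only; the real ones come from the generated
 * a2xx register headers. */
#define SKETCH_Z_ENABLE        0x2u
#define SKETCH_Z_WRITE_ENABLE  0x4u
#define SKETCH_EARLY_Z_ENABLE  0x8u

static uint32_t
sketch_depthcontrol(bool depth_enabled, bool depth_write, bool alpha_enabled)
{
   uint32_t reg = 0;

   /* Early-Z is only requested together with the depth test, and only
    * when alpha test is disabled. */
   if (depth_enabled)
      reg |= SKETCH_Z_ENABLE | COND(!alpha_enabled, SKETCH_EARLY_Z_ENABLE);
   if (depth_write)
      reg |= SKETCH_Z_WRITE_ENABLE;

   return reg;
}
```

The likely motivation, not stated in the commit message, is that alpha test can discard a fragment after shading, so the depth test cannot safely run before it.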



[Mesa-dev] [PATCH 03/16] glsl/nir: keep bool types when native_integers=false

2018-12-19 Thread Jonathan Marek
We should now lower bool to float later.

Signed-off-by: Jonathan Marek 
---
 src/compiler/glsl/glsl_to_nir.cpp | 175 --
 1 file changed, 71 insertions(+), 104 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index d88289f682..29702abf75 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -86,6 +86,7 @@ private:
nir_ssa_def *src2, nir_ssa_def *src3);
 
bool supports_ints;
+   bool type_force_float(glsl_base_type type);
 
nir_shader *shader;
nir_function_impl *impl;
@@ -1261,6 +1262,12 @@ nir_visitor::visit(ir_call *ir)
unreachable("glsl_to_nir only handles function calls to intrinsics");
 }
 
+bool
+nir_visitor::type_force_float(glsl_base_type type)
+{
+   return !supports_ints && type != GLSL_TYPE_BOOL;
+}
+
 void
 nir_visitor::visit(ir_assignment *ir)
 {
@@ -1298,7 +1305,8 @@ nir_visitor::visit(ir_assignment *ir)
   for (unsigned i = 0; i < 4; i++) {
  swiz[i] = ir->write_mask & (1 << i) ? component++ : 0;
   }
-  src = nir_swizzle(&b, src, swiz, num_components, !supports_ints);
+  src = nir_swizzle(&b, src, swiz, num_components,
+type_force_float(ir->rhs->type->base_type));
}
 
if (ir->condition) {
@@ -1495,22 +1503,21 @@ nir_visitor::visit(ir_expression *ir)
   srcs[i] = evaluate_rvalue(ir->operands[i]);
 
glsl_base_type types[4];
-   for (unsigned i = 0; i < ir->num_operands; i++)
-  if (supports_ints)
- types[i] = ir->operands[i]->type->base_type;
-  else
+   for (unsigned i = 0; i < ir->num_operands; i++) {
+  types[i] = ir->operands[i]->type->base_type;
+  if (type_force_float(types[i]))
  types[i] = GLSL_TYPE_FLOAT;
+   }
 
-   glsl_base_type out_type;
-   if (supports_ints)
-  out_type = ir->type->base_type;
-   else
+   glsl_base_type out_type = ir->type->base_type;
+   if (type_force_float(out_type))
   out_type = GLSL_TYPE_FLOAT;
 
switch (ir->operation) {
case ir_unop_bit_not: result = nir_inot(&b, srcs[0]); break;
case ir_unop_logic_not:
-  result = supports_ints ? nir_inot(&b, srcs[0]) : nir_fnot(&b, srcs[0]);
+  result = type_is_float(types[0]) ? nir_fnot(&b, srcs[0])
+   : nir_inot(&b, srcs[0]);
   break;
case ir_unop_neg:
   result = type_is_float(types[0]) ? nir_fneg(&b, srcs[0])
@@ -1542,7 +1549,7 @@ nir_visitor::visit(ir_expression *ir)
   result = supports_ints ? nir_u2f32(&b, srcs[0]) : nir_fmov(&b, srcs[0]);
   break;
case ir_unop_b2f:
-  result = supports_ints ? nir_b2f32(&b, srcs[0]) : nir_fmov(&b, srcs[0]);
+  result = nir_b2f32(&b, srcs[0]);
   break;
case ir_unop_f2i:
case ir_unop_f2u:
@@ -1788,16 +1795,16 @@ nir_visitor::visit(ir_expression *ir)
case ir_binop_bit_or: result = nir_ior(&b, srcs[0], srcs[1]); break;
case ir_binop_bit_xor: result = nir_ixor(&b, srcs[0], srcs[1]); break;
case ir_binop_logic_and:
-  result = supports_ints ? nir_iand(&b, srcs[0], srcs[1])
- : nir_fand(&b, srcs[0], srcs[1]);
+  result = type_is_float(types[0]) ? nir_fand(&b, srcs[0], srcs[1])
+   : nir_iand(&b, srcs[0], srcs[1]);
   break;
case ir_binop_logic_or:
-  result = supports_ints ? nir_ior(&b, srcs[0], srcs[1])
- : nir_for(&b, srcs[0], srcs[1]);
+  result = type_is_float(types[0]) ? nir_for(&b, srcs[0], srcs[1])
+   : nir_ior(&b, srcs[0], srcs[1]);
   break;
case ir_binop_logic_xor:
-  result = supports_ints ? nir_ixor(&b, srcs[0], srcs[1])
- : nir_fxor(&b, srcs[0], srcs[1]);
+  result = type_is_float(types[0]) ? nir_fxor(&b, srcs[0], srcs[1])
+   : nir_ixor(&b, srcs[0], srcs[1]);
   break;
case ir_binop_lshift: result = nir_ishl(&b, srcs[0], srcs[1]); break;
case ir_binop_rshift:
@@ -1811,108 +1818,70 @@ nir_visitor::visit(ir_expression *ir)
case ir_binop_carry:  result = nir_uadd_carry(&b, srcs[0], srcs[1]);  break;
case ir_binop_borrow: result = nir_usub_borrow(&b, srcs[0], srcs[1]); break;
case ir_binop_less:
-  if (supports_ints) {
- if (type_is_float(types[0]))
-result = nir_flt(&b, srcs[0], srcs[1]);
- else if (type_is_signed(types[0]))
-result = nir_ilt(&b, srcs[0], srcs[1]);
- else
-result = nir_ult(&b, srcs[0], srcs[1]);
-  } else {
- result = nir_slt(&b, srcs[0], srcs[1]);
-  }
+  if (type_is_float(types[0]))
+ result = nir_flt(&b, srcs[0], srcs[1]);
+  else if (type_is_signed(types[0]))
+ result = nir_ilt(&b, srcs[0], srcs[1]);
+  else
+ result = nir_ult(&b, srcs[0], srcs[1]);
   break;
case ir_binop_gequal:
-  if (

[Mesa-dev] [PATCH 04/16] nir: add nir_lower_bool_to_float

2018-12-19 Thread Jonathan Marek
Mainly a copy of nir_lower_bool_to_int32, but with float opcodes.

Signed-off-by: Jonathan Marek 
---
 src/compiler/Makefile.sources  |   1 +
 src/compiler/nir/meson.build   |   3 +-
 src/compiler/nir/nir.h |   1 +
 src/compiler/nir/nir_lower_bool_to_float.c | 165 +
 4 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index ef47bdb33b..39eaedc658 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -231,6 +231,7 @@ NIR_FILES = \
nir/nir_lower_atomics_to_ssbo.c \
nir/nir_lower_bitmap.c \
nir/nir_lower_bit_size.c \
+   nir/nir_lower_bool_to_float.c \
nir/nir_lower_bool_to_int32.c \
nir/nir_lower_clamp_color_outputs.c \
nir/nir_lower_clip.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index e252f64539..f1016104af 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -114,6 +114,7 @@ files_libnir = files(
   'nir_lower_alpha_test.c',
   'nir_lower_atomics_to_ssbo.c',
   'nir_lower_bitmap.c',
+  'nir_lower_bool_to_float.c',
   'nir_lower_bool_to_int32.c',
   'nir_lower_clamp_color_outputs.c',
   'nir_lower_clip.c',
@@ -248,7 +249,7 @@ if with_tests
   include_directories : [inc_common],
   dependencies : [dep_thread, idep_gtest, idep_nir],
   link_with : libmesa_util,
-), 
+),
 suite : ['compiler', 'nir'],
   )
 
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 54f9c64a3a..f6d0bdf7ec 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2905,6 +2905,7 @@ void nir_lower_alpha_test(nir_shader *shader, enum 
compare_func func,
   bool alpha_to_one);
 bool nir_lower_alu(nir_shader *shader);
 bool nir_lower_alu_to_scalar(nir_shader *shader);
+bool nir_lower_bool_to_float(nir_shader *shader);
 bool nir_lower_bool_to_int32(nir_shader *shader);
 bool nir_lower_load_const_to_scalar(nir_shader *shader);
 bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
diff --git a/src/compiler/nir/nir_lower_bool_to_float.c 
b/src/compiler/nir/nir_lower_bool_to_float.c
new file mode 100644
index 00..2756a1815f
--- /dev/null
+++ b/src/compiler/nir/nir_lower_bool_to_float.c
@@ -0,0 +1,165 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+
+static bool
+assert_ssa_def_is_not_1bit(nir_ssa_def *def, UNUSED void *unused)
+{
+   assert(def->bit_size > 1);
+   return true;
+}
+
+static bool
+rewrite_1bit_ssa_def_to_32bit(nir_ssa_def *def, void *_progress)
+{
+   bool *progress = _progress;
+   if (def->bit_size == 1) {
+  def->bit_size = 32;
+  *progress = true;
+   }
+   return true;
+}
+
+static bool
+lower_alu_instr(nir_alu_instr *alu)
+{
+   const nir_op_info *op_info = &nir_op_infos[alu->op];
+
+   switch (alu->op) {
+   case nir_op_vec2:
+   case nir_op_vec3:
+   case nir_op_vec4:
+  /* These we expect to have booleans but the opcode doesn't change */
+  break;
+
+   case nir_op_b2f32: alu->op = nir_op_fmov; break;
+
+   /* Note: we only expect these 5 opcodes with bools */
+   case nir_op_imov: alu->op = nir_op_fmov; break;
+   case nir_op_inot: alu->op = nir_op_fnot; break;
+   case nir_op_iand: alu->op = nir_op_fand; break;
+   case nir_op_ior: alu->op = nir_op_for; break;
+   case nir_op_ixor: alu->op = nir_op_fxor; break;
+
+   /* We might want a new opcode (for the (x != 0.0) f2b op)  */
+   case nir_op_f2b1: alu->op = nir_op_f2b32; break;
+   case nir_op_i2b1: alu->op = nir_op_f2b32; break;
+
+   case nir_op_flt: alu->op = nir_op_slt; break;
+   case nir_op_fge: alu->op = nir_op_sge
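
The opcode switch above maps integer boolean operations to their float counterparts. As a rough model of what those float opcodes compute, here is a sketch that treats 0.0f as false and any other value as true. The `sketch_` functions are hypothetical helpers written for this note and only approximate the NIR opcodes they are named after:

```c
/* Float booleans: 0.0f is false, anything else is true.
 * Every result is exactly 0.0f or 1.0f. */
static float
sketch_fnot(float a)
{
   return (a == 0.0f) ? 1.0f : 0.0f;
}

static float
sketch_fand(float a, float b)
{
   return (a != 0.0f && b != 0.0f) ? 1.0f : 0.0f;
}

static float
sketch_for(float a, float b)
{
   return (a != 0.0f || b != 0.0f) ? 1.0f : 0.0f;
}

static float
sketch_fxor(float a, float b)
{
   return ((a != 0.0f) != (b != 0.0f)) ? 1.0f : 0.0f;
}

/* Comparisons that produce a float boolean instead of an integer one. */
static float
sketch_slt(float a, float b)
{
   return (a < b) ? 1.0f : 0.0f;
}

static float
sketch_sge(float a, float b)
{
   return (a >= b) ? 1.0f : 0.0f;
}
```

With this representation a comparison result can feed straight into float arithmetic, which is what a backend without integer support needs.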

[Mesa-dev] [PATCH 02/16] glsl/nir: ftrunc for native_integers=false float to int cast

2018-12-19 Thread Jonathan Marek
With native_integers=false, out_type is always GLSL_TYPE_FLOAT, so without
this change a float-to-int cast does not emit an ftrunc.

Signed-off-by: Jonathan Marek 
---
 src/compiler/glsl/glsl_to_nir.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index c8a7f3bd6c..d88289f682 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -1578,6 +1578,13 @@ nir_visitor::visit(ir_expression *ir)
case ir_unop_u2i:
case ir_unop_i642u64:
case ir_unop_u642i64: {
+  if (!supports_ints) {
+ if (ir->operation == ir_unop_f2i || ir->operation == ir_unop_f2u) {
+result = nir_ftrunc(&b, srcs[0]);
+break;
+ }
+  }
+
   nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]);
   nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type);
   result = nir_build_alu(&b, nir_type_conversion_op(src_type, dst_type,
-- 
2.17.1
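
When integers are kept in float registers, a float-to-int cast cannot change the representation; all it can do is drop the fractional part, which is what the ftrunc above expresses. A minimal sketch of that behaviour, where `sketch_ftrunc` is a hypothetical helper and not a NIR function:

```c
#include <stdint.h>

/* Hypothetical helper: truncate toward zero while staying in float.
 * Only meaningful for finite inputs whose truncated value fits in
 * int32_t; outside that range the cast is undefined behavior in C. */
static float
sketch_ftrunc(float x)
{
   return (float)(int32_t)x;
}
```

Truncation rounds toward zero, so -1.7 becomes -1.0 rather than -2.0, matching C-style integer conversion.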



[Mesa-dev] [PATCH 2/4] freedreno: a2xx: add partial lower_scalar pass for ir2

2018-11-19 Thread Jonathan Marek
Some instructions can only be scalar on a2xx; lower only those to scalar.

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/Makefile.sources|   1 +
 src/gallium/drivers/freedreno/a2xx/ir2_nir.c  |   3 +
 .../freedreno/a2xx/ir2_nir_lower_scalar.c | 174 ++
 .../drivers/freedreno/a2xx/ir2_private.h  |   1 +
 src/gallium/drivers/freedreno/meson.build |   1 +
 5 files changed, 180 insertions(+)
 create mode 100644 src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c

diff --git a/src/gallium/drivers/freedreno/Makefile.sources 
b/src/gallium/drivers/freedreno/Makefile.sources
index 9061b26ba7..7f2b8e7b7d 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -87,6 +87,7 @@ a2xx_SOURCES := \
a2xx/instr-a2xx.h \
a2xx/ir2.c \
a2xx/ir2_nir.c \
+   a2xx/ir2_nir_lower_scalar.c \
a2xx/ir2_substitutions.c \
a2xx/ir2_ra.c \
a2xx/ir2_assemble.c \
diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c 
b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c
index f24debf140..12d4ee6653 100644
--- a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c
+++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c
@@ -1112,6 +1112,9 @@ ir2_nir_compile(struct ir2_context *ctx, unsigned variant)
/* postprocess */
OPT_V(ctx->nir, nir_opt_algebraic_late);
 
+   /* lower to scalar instructions that can only be scalar on a2xx */
+   OPT_V(ctx->nir, ir2_nir_lower_scalar);
+
OPT_V(ctx->nir, nir_lower_to_source_mods);
OPT_V(ctx->nir, nir_copy_prop);
OPT_V(ctx->nir, nir_opt_dce);
diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c 
b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c
new file mode 100644
index 00..2b72a86b3e
--- /dev/null
+++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2018 Jonathan Marek 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *Jonathan Marek 
+ */
+
+/* some operations can only be scalar on a2xx:
+ *  rsq, rcp, log2, exp2, cos, sin, sqrt
+ * mostly copy-pasted from nir_lower_alu_to_scalar.c
+ */
+
+#include "ir2_private.h"
+#include "compiler/nir/nir_builder.h"
+
+static void
+nir_alu_ssa_dest_init(nir_alu_instr * instr, unsigned num_components,
+ unsigned bit_size)
+{
+   nir_ssa_dest_init(&instr->instr, &instr->dest.dest, num_components,
+ bit_size, NULL);
+   instr->dest.write_mask = (1 << num_components) - 1;
+}
+
+static void
+lower_reduction(nir_alu_instr * instr, nir_op chan_op, nir_op merge_op,
+   nir_builder * builder)
+{
+   unsigned num_components = nir_op_infos[instr->op].input_sizes[0];
+
+   nir_ssa_def *last = NULL;
+   for (unsigned i = 0; i < num_components; i++) {
+   nir_alu_instr *chan =
+   nir_alu_instr_create(builder->shader, chan_op);
+   nir_alu_ssa_dest_init(chan, 1, instr->dest.dest.ssa.bit_size);
+   nir_alu_src_copy(&chan->src[0], &instr->src[0], chan);
+   chan->src[0].swizzle[0] = chan->src[0].swizzle[i];
+   if (nir_op_infos[chan_op].num_inputs > 1) {
+   assert(nir_op_infos[chan_op].num_inputs == 2);
+   nir_alu_src_copy(&chan->src[1], &instr->src[1], chan);
+   chan->src[1].swizzle[0] = chan->src[1].swizzle[i];
+   }
+   chan->exact = instr->exact;
+
+   nir_builder_instr_insert(builder, &chan->instr);
+
+   if (i == 0) {
+   last = &chan->dest.dest.ssa;
+   } else {
+   last = nir_build_

[Mesa-dev] [PATCH 3/4] freedreno: implement a20x hw binning

2018-11-19 Thread Jonathan Marek
Not in this patch: emitting the hw binning variant and filling the
"draw_patches". That is part of the ir2 patch.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c |  51 ++--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c |   8 +-
 src/gallium/drivers/freedreno/a2xx/fd2_emit.h |   3 +-
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 118 ++
 .../drivers/freedreno/freedreno_gmem.c|  29 +++--
 .../drivers/freedreno/freedreno_gmem.h|   1 +
 6 files changed, 190 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 6945a1dc3d..cab20d0295 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -75,19 +75,29 @@ emit_vertexbufs(struct fd_context *ctx)
// CONST(20,0) (or CONST(26,0) in soliv_vp)
 
fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements);
+   fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, 
vtx->num_elements);
 }
 
 static void
 draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
-  struct fd_ringbuffer *ring, unsigned index_offset)
+  struct fd_ringbuffer *ring, unsigned index_offset,
+  bool binning)
 {
+   enum pc_di_vis_cull_mode vismode;
+
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
OUT_RING(ring, info->index_size ? 0 : info->start);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
+   /* in the binning batch, this value is set once in fd2_emit_tile_init */
+   if (!binning) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL 
write ?
+* if set to 0x3b on a20x, clipping is broken
+*/
+   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
+   }
 
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
@@ -123,8 +133,26 @@ draw_impl(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
}
 
+   /* binning shader will take offset from C64 */
+   if (binning && is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0180);
+   OUT_RING(ring, fui(ctx->batch->num_vertices));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   }
+
+   vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY;
+   /* a22x hw binning not implemented */
+   if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN))
+   vismode = IGNORE_VISIBILITY;
+
+   if (info->mode == PIPE_PRIM_POINTS)
+   vismode = IGNORE_VISIBILITY;
+
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
-IGNORE_VISIBILITY, info, index_offset);
+vismode, info, index_offset);
 
if (is_a20x(ctx->screen)) {
/* not sure why this is required, but it fixes some hangs */
@@ -149,7 +177,8 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
if (ctx->dirty & FD_DIRTY_VTXBUF)
emit_vertexbufs(ctx);
 
-   fd2_emit_state(ctx, ctx->dirty);
+   fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
+   fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty);
 
/* a2xx can draw only 65535 vertices at once
 * on a22x the field in the draw command is 32bits but seems limited too
@@ -170,17 +199,23 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
struct pipe_draw_info info = *pinfo;
unsigned count = info.count;
unsigned step = step_tbl[info.mode];
+   unsigned num_vertices = ctx->batch->num_vertices;
 
if (!step)
return false;
 
for (; count + step > 32766; count -= step) {
info.count = MIN2(count, 32766);
-   draw_impl(ctx, &info, ctx->batch->draw, index_offset);
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, &info, ctx->batch->binning, index_offset, true);
info.start += st
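
The visibility-mode selection in draw_impl() above reduces to a single predicate: visibility data is used only for a20x draw passes, with binning not disabled for debugging, and for non-point primitives. A sketch of that decision as a pure function; the enum and function names are hypothetical stand-ins and the enum values are illustrative:

```c
#include <stdbool.h>

/* Stand-ins for the driver's visibility modes; values are illustrative. */
enum sketch_vis_mode {
   SKETCH_IGNORE_VISIBILITY,
   SKETCH_USE_VISIBILITY,
};

static enum sketch_vis_mode
sketch_pick_vismode(bool binning, bool is_a20x, bool nobin_debug,
                    bool is_points)
{
   /* The binning pass itself, non-a20x hardware, the no-binning debug
    * flag, and point primitives all fall back to ignoring visibility. */
   if (binning || !is_a20x || nobin_debug || is_points)
      return SKETCH_IGNORE_VISIBILITY;

   return SKETCH_USE_VISIBILITY;
}
```

Written this way, it is easy to see that the initial `binning ? IGNORE_VISIBILITY : USE_VISIBILITY` assignment in the patch is already covered by the `binning ||` term of the following condition.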

[Mesa-dev] [PATCH 4/4] freedreno: use MSM_BO_SCANOUT with scanout buffers

2018-11-19 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/drm/freedreno_drmif.h | 1 +
 src/gallium/drivers/freedreno/drm/msm_bo.c  | 3 +++
 src/gallium/drivers/freedreno/freedreno_resource.c  | 4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/drm/freedreno_drmif.h 
b/src/gallium/drivers/freedreno/drm/freedreno_drmif.h
index 6468eac4a0..e12ab970c8 100644
--- a/src/gallium/drivers/freedreno/drm/freedreno_drmif.h
+++ b/src/gallium/drivers/freedreno/drm/freedreno_drmif.h
@@ -63,6 +63,7 @@ enum fd_param_id {
 #define DRM_FREEDRENO_GEM_CACHE_WBACKWA   0x0080
 #define DRM_FREEDRENO_GEM_CACHE_MASK  0x00f0
 #define DRM_FREEDRENO_GEM_GPUREADONLY 0x0100
+#define DRM_FREEDRENO_GEM_SCANOUT 0x0200
 
 /* bo access flags: (keep aligned to MSM_PREP_x) */
 #define DRM_FREEDRENO_PREP_READ   0x01
diff --git a/src/gallium/drivers/freedreno/drm/msm_bo.c 
b/src/gallium/drivers/freedreno/drm/msm_bo.c
index da3315c9ab..d93dfbeab2 100644
--- a/src/gallium/drivers/freedreno/drm/msm_bo.c
+++ b/src/gallium/drivers/freedreno/drm/msm_bo.c
@@ -142,6 +142,9 @@ int msm_bo_new_handle(struct fd_device *dev,
};
int ret;
 
+   if (flags & DRM_FREEDRENO_GEM_SCANOUT)
+   req.flags |= MSM_BO_SCANOUT;
+
ret = drmCommandWriteRead(dev->fd, DRM_MSM_GEM_NEW,
&req, sizeof(req));
if (ret)
diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c 
b/src/gallium/drivers/freedreno/freedreno_resource.c
index 54d7385896..bd7be94c85 100644
--- a/src/gallium/drivers/freedreno/freedreno_resource.c
+++ b/src/gallium/drivers/freedreno/freedreno_resource.c
@@ -99,7 +99,9 @@ realloc_bo(struct fd_resource *rsc, uint32_t size)
 {
struct fd_screen *screen = fd_screen(rsc->base.screen);
uint32_t flags = DRM_FREEDRENO_GEM_CACHE_WCOMBINE |
-   DRM_FREEDRENO_GEM_TYPE_KMEM; /* TODO */
+   DRM_FREEDRENO_GEM_TYPE_KMEM |
+   COND(rsc->base.bind & PIPE_BIND_SCANOUT, 
DRM_FREEDRENO_GEM_SCANOUT);
+   /* TODO other flags? */
 
/* if we start using things other than write-combine,
 * be sure to check for PIPE_RESOURCE_FLAG_MAP_COHERENT
-- 
2.17.1



Re: [Mesa-dev] [PATCH 1/3] nir: add fceil lowering

2018-11-19 Thread Jonathan marek
I don't have push rights, but robclark added this patch to his staging 
branch so I imagine he will push it soon.


On 11/19/2018 03:15 PM, Christian Gmeiner wrote:

On Mon, Nov 12, 2018 at 19:17, Jonathan Marek wrote:


lowers ceil(x) as -floor(-x)

Signed-off-by: Jonathan Marek 


Do you have push rights? As I am interested in this one I would push
it for you if needed.
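
For reference, the identity behind the lowering discussed here, ceil(x) = -floor(-x), can be checked with a small sketch. Both functions are hypothetical helpers written for this note; the floor is hand-rolled so the example needs no math library and is only valid for finite inputs that fit in int32_t:

```c
#include <stdint.h>

/* Hypothetical floor for finite inputs whose value fits in int32_t. */
static float
sketch_ffloor(float x)
{
   float t = (float)(int32_t)x;   /* truncates toward zero */
   return (t > x) ? t - 1.0f : t; /* step down for negative fractions */
}

/* ceil(x) expressed as -floor(-x), as in the commit message. */
static float
sketch_fceil(float x)
{
   return -sketch_ffloor(-x);
}
```

The lowering lets a backend that has a floor instruction but no ceil instruction support both with one negate on each side.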




[Mesa-dev] [PATCH 3/3] nir: combine fmul and fadd across ffma operations

2018-11-19 Thread Jonathan Marek
This works by moving the fadd up across the ffma operations, so that it
can eventually be combined with an fmul. I'm not sure it works in all
cases, but it works in all the common cases.

This will only affect freedreno since it is the only driver using the
fuse_ffma option.

Example:
matrix * vec4(coord, 1.0)
is compiled as:
fmul, ffma, ffma, fadd
and with this patch:
ffma, ffma, ffma

Signed-off-by: Jonathan Marek 
---
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 8f4df891b8..8d7c30e04a 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -133,6 +133,7 @@ optimizations = [
(('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', 
a, b, c), '!options->lower_flrp64'),
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
(('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
+   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d)), 
'options->fuse_ffma'),
 
(('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
(('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
-- 
2.17.1
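
To make the commit message's example concrete, here is a sketch of one row of `matrix * vec4(coord, 1.0)` before and after the new rule. The `sketch_` functions are unfused stand-ins written for this note; a real ffma may round differently, which is why the rule carries the inexact marker `~`:

```c
/* Unfused stand-ins for the NIR opcodes. */
static float sketch_fmul(float a, float b) { return a * b; }
static float sketch_fadd(float a, float b) { return a + b; }
static float sketch_ffma(float a, float b, float c) { return a * b + c; }

/* One row of matrix * vec4(coord, 1.0) before the new rule:
 * fmul, ffma, ffma, fadd. */
static float
sketch_row_before(const float m[4], const float v[3])
{
   float t = sketch_fmul(m[0], v[0]);
   t = sketch_ffma(m[1], v[1], t);
   t = sketch_ffma(m[2], v[2], t);
   return sketch_fadd(t, m[3]);
}

/* After: the trailing fadd has moved up through both ffma operations
 * and merged with the fmul, giving ffma, ffma, ffma. */
static float
sketch_row_after(const float m[4], const float v[3])
{
   float t = sketch_ffma(m[0], v[0], m[3]);
   t = sketch_ffma(m[1], v[1], t);
   return sketch_ffma(m[2], v[2], t);
}
```

With small integer inputs both forms give the same result exactly; with arbitrary floats the reassociated sum can differ in the last bits.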



[Mesa-dev] [PATCH 2/3] glsl/nir: ftrunc for native_integers=false float to int cast

2018-11-19 Thread Jonathan Marek
With native_integers=false, out_type is always GLSL_TYPE_FLOAT, so without
this change a float-to-int cast does not emit an ftrunc.

Since no other conversions are needed, use fmov for the other cases
(there is the f2b case, but the 1-bit bool patches should fix that).

Signed-off-by: Jonathan Marek 
---
 src/compiler/glsl/glsl_to_nir.cpp | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index 22863c072f..5d1ae85924 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -1560,6 +1560,19 @@ nir_visitor::visit(ir_expression *ir)
case ir_unop_u2i:
case ir_unop_i642u64:
case ir_unop_u642i64: {
+  if (!supports_ints) {
+ switch (ir->operation) {
+ case ir_unop_f2i:
+ case ir_unop_f2u:
+result = nir_ftrunc(&b, srcs[0]);
+break;
+ default:
+result = nir_fmov(&b, srcs[0]);
+break;
+ }
+ break;
+  }
+
   nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]);
   nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type);
   result = nir_build_alu(&b, nir_type_conversion_op(src_type, dst_type,
-- 
2.17.1



[Mesa-dev] [PATCH 1/3] glsl/nir: int constants as float for native_integers=false

2018-11-19 Thread Jonathan Marek
Note: the backend must take care that uniform index is now a float

Signed-off-by: Jonathan Marek 
---
 src/compiler/glsl/glsl_to_nir.cpp | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index 0479f8fcfe..22863c072f 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -93,6 +93,8 @@ private:
 
nir_deref_instr *evaluate_deref(ir_instruction *ir);
 
+   nir_constant *constant_copy(ir_constant *ir, void *mem_ctx);
+
/* most recent deref instruction created */
nir_deref_instr *deref;
 
@@ -196,8 +198,8 @@ nir_visitor::evaluate_deref(ir_instruction *ir)
return this->deref;
 }
 
-static nir_constant *
-constant_copy(ir_constant *ir, void *mem_ctx)
+nir_constant *
+nir_visitor::constant_copy(ir_constant *ir, void *mem_ctx)
 {
if (ir == NULL)
   return NULL;
@@ -215,7 +217,10 @@ constant_copy(ir_constant *ir, void *mem_ctx)
   assert(cols == 1);
 
   for (unsigned r = 0; r < rows; r++)
- ret->values[0].u32[r] = ir->value.u[r];
+ if (supports_ints)
+ret->values[0].u32[r] = ir->value.u[r];
+ else
+ret->values[0].f32[r] = ir->value.u[r];
 
   break;
 
@@ -224,7 +229,10 @@ constant_copy(ir_constant *ir, void *mem_ctx)
   assert(cols == 1);
 
   for (unsigned r = 0; r < rows; r++)
- ret->values[0].i32[r] = ir->value.i[r];
+ if (supports_ints)
+ret->values[0].i32[r] = ir->value.i[r];
+ else
+ret->values[0].f32[r] = ir->value.i[r];
 
   break;
 
-- 
2.17.1



[Mesa-dev] [PATCH 4/9] freedreno: a2xx: set VIZ_QUERY_ID on a20x

2018-11-13 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 20bfd06b13..50e2fe13eb 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -339,6 +339,11 @@ fd2_emit_restore(struct fd_context *ctx, struct 
fd_ringbuffer *ring)
A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE |
A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) |
A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3));
+
+   /* not sure why this is required */
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY));
+   OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16));
}
 
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
-- 
2.17.1



[Mesa-dev] [PATCH 7/9] freedreno: use GENERIC instead of TEXCOORD for blit program

2018-11-13 Thread Jonathan Marek
blit_fp uses GENERIC as its input, so blit_vp must output GENERIC as well for the two to link.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/freedreno_program.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_program.c 
b/src/gallium/drivers/freedreno/freedreno_program.c
index e41ac2d922..989ccd1838 100644
--- a/src/gallium/drivers/freedreno/freedreno_program.c
+++ b/src/gallium/drivers/freedreno/freedreno_program.c
@@ -67,7 +67,7 @@ static const char *blit_vp =
"VERT\n"
"DCL IN[0]   \n"
"DCL IN[1]   \n"
-   "DCL OUT[0], TEXCOORD[0] \n"
+   "DCL OUT[0], GENERIC[0]  \n"
"DCL OUT[1], POSITION\n"
"  0: MOV OUT[0], IN[0]  \n"
"  0: MOV OUT[1], IN[1]  \n"
-- 
2.17.1



[Mesa-dev] [PATCH 8/9] freedreno: implement a20x hw binning

2018-11-13 Thread Jonathan Marek
Not in this patch: emitting the hw binning variant and filling the
"draw_patches". That is part of the ir2 patch.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c |  47 +--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c |   8 +-
 src/gallium/drivers/freedreno/a2xx/fd2_emit.h |   3 +-
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 118 ++
 .../drivers/freedreno/freedreno_gmem.c|  29 +++--
 .../drivers/freedreno/freedreno_gmem.h|   1 +
 6 files changed, 187 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 49df1daa59..46c76df807 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -75,11 +75,13 @@ emit_vertexbufs(struct fd_context *ctx)
// CONST(20,0) (or CONST(26,0) in soliv_vp)
 
fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements);
+   fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, 
vtx->num_elements);
 }
 
 static void
 draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
-  struct fd_ringbuffer *ring, unsigned index_offset)
+  struct fd_ringbuffer *ring, unsigned index_offset,
+  bool binning)
 {
enum pc_di_vis_cull_mode vismode;
 
@@ -87,9 +89,15 @@ draw_impl(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
OUT_RING(ring, info->index_size ? 0 : info->start);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
+   /* in the binning batch, this value is set once in fd2_emit_tile_init */
+   if (!binning) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL 
write ?
+* if set to 0x3b on a20x, clipping is broken
+*/
+   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
+   }
 
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
@@ -125,8 +133,26 @@ draw_impl(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
}
 
+   /* binning shader will take offset from C64 */
+   if (binning && is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0180);
+   OUT_RING(ring, fui(ctx->batch->num_vertices));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   }
+
+   vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY;
+   /* a22x hw binning not implemented */
+   if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN))
+   vismode = IGNORE_VISIBILITY;
+
+   if (info->mode == PIPE_PRIM_POINTS)
+   vismode = IGNORE_VISIBILITY;
+
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
-IGNORE_VISIBILITY, info, index_offset);
+vismode, info, index_offset);
 
if (is_a20x(ctx->screen)) {
/* not sure why this is required, but it fixes some hangs */
@@ -152,6 +178,7 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
emit_vertexbufs(ctx);
 
fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
+   fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty);
 
/* a20x can draw only 65535 vertices at once
 * however, using a limit of 32k fixes an unexplained hang
@@ -171,17 +198,23 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
struct pipe_draw_info info = *pinfo;
unsigned count = info.count;
unsigned step = step_tbl[info.mode];
+   unsigned num_vertices = ctx->batch->num_vertices;
 
if (!step)
return false;
 
for (; count + step > 32766; count -= step) {
info.count = MIN2(count, 32766);
-   draw_impl(ctx, &info, ctx->batch->draw, index_offset);
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, &info, ctx->batch->binning, index_offset, true);
info.start += step;
+   ctx->batch->num_vertices += step;

[Mesa-dev] [PATCH 6/9] freedreno: a2xx texture update

2018-11-13 Thread Jonathan Marek
Adds all missing texture related logic. For everything to work it also
needs changes to ir2/fd2_program, which are part of the ir2 update patch.

Note: it needs rnndb update

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/Makefile.sources|  2 +
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 15 +++-
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 26 +--
 .../drivers/freedreno/a2xx/fd2_resource.c | 73 +++
 .../drivers/freedreno/a2xx/fd2_resource.h | 34 +
 .../drivers/freedreno/a2xx/fd2_screen.c   |  6 +-
 .../drivers/freedreno/a2xx/fd2_texture.c  | 67 +++--
 .../drivers/freedreno/a2xx/fd2_texture.h  |  7 +-
 src/gallium/drivers/freedreno/meson.build |  2 +
 9 files changed, 212 insertions(+), 20 deletions(-)
 create mode 100644 src/gallium/drivers/freedreno/a2xx/fd2_resource.c
 create mode 100644 src/gallium/drivers/freedreno/a2xx/fd2_resource.h

diff --git a/src/gallium/drivers/freedreno/Makefile.sources 
b/src/gallium/drivers/freedreno/Makefile.sources
index 8b4d61c988..4d4644f96b 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -76,6 +76,8 @@ a2xx_SOURCES := \
a2xx/fd2_program.h \
a2xx/fd2_rasterizer.c \
a2xx/fd2_rasterizer.h \
+   a2xx/fd2_resource.c \
+   a2xx/fd2_resource.h \
a2xx/fd2_screen.c \
a2xx/fd2_screen.h \
a2xx/fd2_texture.c \
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 50e2fe13eb..60bc9fad4c 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -118,6 +118,7 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context 
*ctx,
static const struct fd2_pipe_sampler_view dummy_view = {};
const struct fd2_sampler_stateobj *sampler;
const struct fd2_pipe_sampler_view *view;
+   struct fd_resource *rsc;
 
if (emitted & (1 << const_idx))
return 0;
@@ -129,19 +130,25 @@ emit_texture(struct fd_ringbuffer *ring, struct 
fd_context *ctx,
fd2_pipe_sampler_view(tex->textures[samp_id]) :
&dummy_view;
 
+   rsc = view->base.texture ? fd_resource(view->base.texture) : NULL;
+
OUT_PKT3(ring, CP_SET_CONSTANT, 7);
OUT_RING(ring, 0x0001 + (0x6 * const_idx));
 
OUT_RING(ring, sampler->tex0 | view->tex0);
-   if (view->base.texture)
-   OUT_RELOC(ring, fd_resource(view->base.texture)->bo, 0, 
view->fmt, 0);
+   if (rsc)
+   OUT_RELOC(ring, rsc->bo, 0, view->tex1, 0);
else
OUT_RING(ring, 0);
 
OUT_RING(ring, view->tex2);
OUT_RING(ring, sampler->tex3 | view->tex3);
-   OUT_RING(ring, sampler->tex4);
-   OUT_RING(ring, sampler->tex5);
+   OUT_RING(ring, sampler->tex4 | view->tex4);
+
+   if (rsc && rsc->base.last_level)
+   OUT_RELOC(ring, rsc->bo, fd_resource_offset(rsc, 1, 0), 
view->tex5, 0);
+   else
+   OUT_RING(ring, view->tex5);
 
return (1 << const_idx);
 }
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 5770d6248e..e98ae7334a 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -66,6 +66,13 @@ emit_gmem2mem_surf(struct fd_batch *batch, uint32_t base,
struct fd_ringbuffer *ring = batch->gmem;
struct fd_resource *rsc = fd_resource(psurf->texture);
uint32_t swap = fmt2swap(psurf->format);
+   struct fd_resource_slice *slice =
+   fd_resource_slice(rsc, psurf->u.tex.level);
+   uint32_t offset =
+   fd_resource_offset(rsc, psurf->u.tex.level, 
psurf->u.tex.first_layer);
+
+   assert((slice->pitch & 31) == 0);
+   assert((offset & 0xfff) == 0);
 
if (!rsc->valid)
return;
@@ -79,8 +86,8 @@ emit_gmem2mem_surf(struct fd_batch *batch, uint32_t base,
OUT_PKT3(ring, CP_SET_CONSTANT, 5);
OUT_RING(ring, CP_REG(REG_A2XX_RB_COPY_CONTROL));
OUT_RING(ring, 0x); /* RB_COPY_CONTROL */
-   OUT_RELOCW(ring, rsc->bo, 0, 0, 0); /* RB_COPY_DEST_BASE */
-   OUT_RING(ring, rsc->slices[0].pitch >> 5); /* RB_COPY_DEST_PITCH */
+   OUT_RELOCW(ring, rsc->bo, offset, 0, 0); /* RB_COPY_DEST_BASE */
+   OUT_RING(ring, slice->pitch >> 5); /* RB_COPY_DEST_PITCH */
OUT_RING(ring,  /* RB_COPY_DEST_INFO */

A2XX_RB_COPY_DEST_INFO_FORMAT(fd2_pipe2color(psurf->format)) |
A2XX_RB_COPY_DEST_INFO_LINEAR |
@@ -189,6 +196,10 @@ emit_mem2gmem_surf(struct fd_batch *bat

[Mesa-dev] [PATCH 2/9] freedreno: a2xx: fix POINT_MINMAX_MAX overflow

2018-11-13 Thread Jonathan Marek
As it stands, it overflows to zero.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c
index f35fddc09f..a81f63b570 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c
@@ -47,7 +47,7 @@ fd2_rasterizer_state_create(struct pipe_context *pctx,
 
if (cso->point_size_per_vertex) {
psize_min = util_get_min_point_size(cso);
-   psize_max = 8192;
+   psize_max = 8192.0 - 0.0625;
} else {
/* Force the point size to be as if the vertex output was 
disabled. */
psize_min = cso->point_size;
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
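
The overflow fixed by PATCH 2/9 can be reproduced with a small sketch. Everything named here is an assumption for illustration: the helper pack_point_max() is hypothetical, and the field layout (16 bits wide, 4 fractional bits, programmed with half the point size) is a guess at the register format, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: unsigned fixed point, 4 fractional bits, 16-bit field.
 * These constants are illustrative, not taken from the register database. */
#define FRAC_SCALE 16u
#define FIELD_MASK 0xffffu

/* Hypothetical packing helper: convert to fixed point, then mask to the field. */
static uint32_t pack_point_max(float half_size)
{
    return ((uint32_t)(half_size * FRAC_SCALE)) & FIELD_MASK;
}
```

Under those assumptions, pack_point_max(8192.0f / 2) is masked down to 0, while pack_point_max((8192.0f - 0.0625f) / 2) gives 0xffff, the largest value the field can hold.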


[Mesa-dev] [PATCH 5/9] a2xx: Compute depth base in gmem correctly

2018-11-13 Thread Jonathan Marek
Note: it needs rnndb update

Signed-off-by: Marek Vasut 
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index c711f8c79a..5770d6248e 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -108,6 +108,7 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct 
fd_tile *tile)
 {
struct fd_context *ctx = batch->ctx;
struct fd2_context *fd2_ctx = fd2_context(ctx);
+   struct fd_gmem_stateobj *gmem = &ctx->gmem;
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = &batch->framebuffer;
 
@@ -170,10 +171,10 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct 
fd_tile *tile)
A2XX_RB_COPY_DEST_OFFSET_Y(tile->yoff));
 
if (batch->resolve & (FD_BUFFER_DEPTH | FD_BUFFER_STENCIL))
-   emit_gmem2mem_surf(batch, tile->bin_w * tile->bin_h, 
pfb->zsbuf);
+   emit_gmem2mem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf);
 
if (batch->resolve & FD_BUFFER_COLOR)
-   emit_gmem2mem_surf(batch, 0, pfb->cbufs[0]);
+   emit_gmem2mem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_MODECONTROL));
@@ -233,6 +234,7 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct 
fd_tile *tile)
 {
struct fd_context *ctx = batch->ctx;
struct fd2_context *fd2_ctx = fd2_context(ctx);
+   struct fd_gmem_stateobj *gmem = &ctx->gmem;
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = &batch->framebuffer;
unsigned bin_w = tile->bin_w;
@@ -331,10 +333,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct 
fd_tile *tile)
OUT_RING(ring, 0x);
 
if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_DEPTH | 
FD_BUFFER_STENCIL))
-   emit_mem2gmem_surf(batch, bin_w * bin_h, pfb->zsbuf);
+   emit_mem2gmem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf);
 
if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_COLOR))
-   emit_mem2gmem_surf(batch, 0, pfb->cbufs[0]);
+   emit_mem2gmem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]);
 
/* TODO blob driver seems to toss in a CACHE_FLUSH after each 
DRAW_INDX.. */
 }
@@ -357,7 +359,7 @@ fd2_emit_tile_init(struct fd_batch *batch)
OUT_RING(ring, gmem->bin_w); /* RB_SURFACE_INFO */
OUT_RING(ring, A2XX_RB_COLOR_INFO_SWAP(fmt2swap(format)) |
A2XX_RB_COLOR_INFO_FORMAT(fd2_pipe2color(format)));
-   reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(align(gmem->bin_w * gmem->bin_h, 
4));
+   reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(gmem->zsbuf_base[0]);
if (pfb->zsbuf)
reg |= 
A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(fd_pipe2depth(pfb->zsbuf->format));
OUT_RING(ring, reg); /* RB_DEPTH_INFO */
-- 
2.17.1



[Mesa-dev] [PATCH 1/9] freedreno: a2xx: fd2_draw update for a20x

2018-11-13 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 97 ---
 .../drivers/freedreno/freedreno_batch.c   |  1 +
 .../drivers/freedreno/freedreno_batch.h   |  1 +
 .../drivers/freedreno/freedreno_draw.c|  2 +
 .../drivers/freedreno/freedreno_draw.h| 24 -
 .../drivers/freedreno/freedreno_util.h|  8 +-
 6 files changed, 114 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index f00bec6efc..49df1daa59 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -77,29 +77,46 @@ emit_vertexbufs(struct fd_context *ctx)
fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements);
 }
 
-static bool
-fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info,
- unsigned index_offset)
+static void
+draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
+  struct fd_ringbuffer *ring, unsigned index_offset)
 {
-   struct fd_ringbuffer *ring = ctx->batch->draw;
-
-   if (ctx->dirty & FD_DIRTY_VTXBUF)
-   emit_vertexbufs(ctx);
-
-   fd2_emit_state(ctx, ctx->dirty);
+   enum pc_di_vis_cull_mode vismode;
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
-   OUT_RING(ring, info->start);
+   OUT_RING(ring, info->index_size ? 0 : info->start);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, 0x003b);
+   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
 
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   if (!is_a20x(ctx->screen)) {
+   if (is_a20x(ctx->screen)) {
+   /* wait for DMA to finish and
+* dummy draw one triangle with indexes 0,0,0.
+* with PRE_FETCH_CULL_ENABLE | GRP_CULL_ENABLE.
+*
+* this workaround is for a HW bug related to DMA alignment:
+* it is necessary for indexed draws and possibly also
+* draws that read binning data
+*/
+   OUT_PKT3(ring, CP_WAIT_REG_EQ, 4);
+   OUT_RING(ring, 0x05d0); /* RBBM_STATUS */
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x1000); /* bit: 12: VGT_BUSY_NO_DMA */
+   OUT_RING(ring, 0x0001);
+
+   OUT_PKT3(ring, CP_DRAW_INDX_BIN, 6);
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x0003c004);
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x0003);
+   OUT_RELOC(ring, 
fd_resource(fd2_context(ctx)->solid_vertexbuf)->bo, 0x80, 0, 0);
+   OUT_RING(ring, 0x0006);
+   } else {
OUT_WFI (ring);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 3);
@@ -111,11 +128,61 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info,
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
 IGNORE_VISIBILITY, info, index_offset);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010));
-   OUT_RING(ring, 0x);
+   if (is_a20x(ctx->screen)) {
+   /* not sure why this is required, but it fixes some hangs */
+   OUT_WFI(ring);
+   } else {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010));
+   OUT_RING(ring, 0x);
+   }
 
emit_cacheflush(ring);
+}
+
+
+static bool
+fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo,
+ unsigned index_offset)
+{
+   if (!ctx->prog.fp || !ctx->prog.vp)
+   return false;
+
+   if (ctx->dirty & FD_DIRTY_VTXBUF)
+   emit_vertexbufs(ctx);
+
+   fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
+
+   /* a20x can draw only 65535 vertices at once
+* however, using a limit of 32k fixes an unexplained hang
+* 32766 works for all primitives
+*/
+   if (is_a20x(ctx->screen) && pinfo->count > 32766) {
+   static const uint16_t step_tbl[PIPE_PRIM_MAX] = {
+   [0 ... PIPE_PRIM_MAX - 1]  = 32766,
+   [PIPE_PRIM_LINE_STRIP] = 32765,
+   [PIPE_PRIM_TRIANGLE_STRIP] = 32764,
+
+   /* needs more work */
+   [PIPE_PRIM_TRIANGLE_FAN]   = 0,
+   [PIPE_PRIM_LINE_LOOP]  = 0,
+   };
+
+   struct pipe_draw_info 

[Mesa-dev] [PATCH 3/9] freedreno: add missing a20x ids

2018-11-13 Thread Jonathan Marek
200: 256KiB GMEM A200 (imx53)
201: 128KiB GMEM A200 (imx51)

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 88d91a9123..a55403804b 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -796,6 +796,8 @@ fd_screen_create(struct fd_device *dev)
 * send a patch ;-)
 */
switch (screen->gpu_id) {
+   case 200:
+   case 201:
case 205:
case 220:
fd2_screen_init(pscreen);
-- 
2.17.1



Re: [Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations

2018-11-13 Thread Jonathan marek
The brw_nir_opt_peephole_ffma pass is only doing what the fuse_ffma 
option already does. It produces the same result as the fuse_ffma 
option, which is not optimal.


This is what I get:
   vec4 32 ssa_7 = fmul ssa_6, ssa_1.
   vec4 32 ssa_8 = ffma ssa_5, ssa_1., ssa_7
   vec4 32 ssa_10 = ffma ssa_9, ssa_1., ssa_8
   vec4 32 ssa_12 = fadd ssa_10, ssa_11
But it would be better optimized as (example with the least rearrangements):
   vec4 32 ssa_7 = ffma ssa_6, ssa_1., ssa_11
   vec4 32 ssa_8 = ffma ssa_5, ssa_1., ssa_7
   vec4 32 ssa_10 = ffma ssa_9, ssa_1., ssa_8

Fusing the fmul and fadd in this case is not obvious. Could this patch 
be OK if it is behind the fuse_ffma option?


On 11/12/2018 02:30 PM, Jason Ekstrand wrote:

In general, you're not supposed to mess around with the precision of fma...
What we do in the Intel drivers is to leave fma split, apply operations,
and then we have a special mul+add fusion pass we run at the end.  Leaving
them split allows for exactly this kind of optimization without mixing up
those FMAs that are supposed to be kept fused and those generated by
mul+add fusion which can be split back apart and re-optimized.

On Mon, Nov 12, 2018 at 12:17 PM Jonathan Marek  wrote:


This works by moving the fadd up across the ffma operations, so that it
can eventually be combined with an fmul. I'm not sure it works in all
cases, but it works in all the common cases.

Example:
 matrix * vec4(coord, 1.0)
is compiled as:
 fmul, ffma, ffma, fadd
and with this patch:
 ffma, ffma, ffma

Signed-off-by: Jonathan Marek 
---
  src/compiler/nir/nir_opt_algebraic.py | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
index 8f4df891b8..82e10731a6 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -133,6 +133,7 @@ optimizations = [
 (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))),
('flrp', a, b, c), '!options->lower_flrp64'),
 (('ffma', a, b, c), ('fadd', ('fmul', a, b), c),
'options->lower_ffma'),
 (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c),
'options->fuse_ffma'),
+   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))),

 (('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b,
c), d)),
 (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
--
2.17.1







[Mesa-dev] [PATCH 3/3] glsl/nir: glsl_to_nir fixes for native_integers=false

2018-11-12 Thread Jonathan Marek
Two parts:
1. for instructions that have a BOOL source, insert b2f so that the
backend can identify the source as a BOOL and perform the conversion from
NIR_TRUE/NIR_FALSE
2. add missing type conversions (out_type is always GLSL_TYPE_FLOAT, so we
are missing some conversion instructions): float to int (ftrunc), and f2b
(which represents the operation that is the opposite of a fnot).

Signed-off-by: Jonathan Marek 
---
 src/compiler/glsl/glsl_to_nir.cpp | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index 0479f8fcfe..565bf588f5 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -1465,9 +1465,13 @@ nir_visitor::visit(ir_expression *ir)
}
 
nir_ssa_def *srcs[4];
-   for (unsigned i = 0; i < ir->num_operands; i++)
+   for (unsigned i = 0; i < ir->num_operands; i++) {
   srcs[i] = evaluate_rvalue(ir->operands[i]);
 
+  if (ir->operands[i]->type->base_type == GLSL_TYPE_BOOL)
+ srcs[i] = nir_b2f(&b, srcs[i]);
+   }
+
glsl_base_type types[4];
for (unsigned i = 0; i < ir->num_operands; i++)
   if (supports_ints)
@@ -1552,6 +1556,23 @@ nir_visitor::visit(ir_expression *ir)
case ir_unop_u2i:
case ir_unop_i642u64:
case ir_unop_u642i64: {
+  if (!supports_ints) {
+ switch (ir->operation) {
+ case ir_unop_f2i:
+ case ir_unop_f2u:
+result = nir_ftrunc(&b, srcs[0]);
+break;
+ case ir_unop_f2b:
+ case ir_unop_i2b:
+result = nir_f2b(&b, srcs[0]);
+break;
+ default:
+result = nir_fmov(&b, srcs[0]);
+break;
+ }
+ break;
+  }
+
   nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]);
   nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type);
   result = nir_build_alu(&b, nir_type_conversion_op(src_type, dst_type,
-- 
2.17.1
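
The conversions described in the commit message above can be sketched in plain C. This assumes a float-only backend where NIR booleans are the usual all-ones/zero values; the SKETCH_ constants and the three helper functions are hypothetical stand-ins, not the NIR builder API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boolean encoding: all bits set for true, zero for false. */
#define SKETCH_NIR_TRUE  (~0u)
#define SKETCH_NIR_FALSE 0u

/* b2f: boolean to float, yielding 1.0 or 0.0. */
static float sketch_b2f(uint32_t b)
{
    return b != SKETCH_NIR_FALSE ? 1.0f : 0.0f;
}

/* f2b: any non-zero float is true. */
static uint32_t sketch_f2b(float f)
{
    return f != 0.0f ? SKETCH_NIR_TRUE : SKETCH_NIR_FALSE;
}

/* f2i without native integers: truncate toward zero, result stays a float.
 * The int cast limits this stand-in to values within int range. */
static float sketch_ftrunc(float f)
{
    return (float)(int)f;
}
```

The point of the sketch is only that, without native integers, every result stays a float and the int and bool conversions become truncation and a compare against zero.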



[Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations

2018-11-12 Thread Jonathan Marek
This works by moving the fadd up across the ffma operations, so that it
can eventually be combined with an fmul. I'm not sure it works in all
cases, but it works in all the common cases.

Example:
matrix * vec4(coord, 1.0)
is compiled as:
fmul, ffma, ffma, fadd
and with this patch:
ffma, ffma, ffma

Signed-off-by: Jonathan Marek 
---
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 8f4df891b8..82e10731a6 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -133,6 +133,7 @@ optimizations = [
(('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))), ('flrp', 
a, b, c), '!options->lower_flrp64'),
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
(('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
+   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))),
 
(('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
(('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
-- 
2.17.1
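
A rough sketch of the rewrite for the matrix * vec4(coord, 1.0) example, written in C rather than NIR. The helper sketch_ffma() is a hypothetical stand-in that computes a*b + c without real fusion, and the two functions only mirror the instruction sequences named in the commit message.

```c
#include <assert.h>

/* Stand-in for ffma: a*b + c (not actually fused in this sketch). */
static float sketch_ffma(float a, float b, float c)
{
    return a * b + c;
}

/* Before: fmul, ffma, ffma, fadd. m[3] is the column multiplied by 1.0. */
static float row_before(const float m[4], const float v[3])
{
    float t = m[0] * v[0];            /* fmul */
    t = sketch_ffma(m[1], v[1], t);   /* ffma */
    t = sketch_ffma(m[2], v[2], t);   /* ffma */
    return t + m[3];                  /* fadd */
}

/* After: the trailing fadd has moved up and merged with the fmul. */
static float row_after(const float m[4], const float v[3])
{
    float t = sketch_ffma(m[0], v[0], m[3]);
    t = sketch_ffma(m[1], v[1], t);
    return sketch_ffma(m[2], v[2], t);
}
```

Both forms agree for inputs where every intermediate is exactly representable. In general the rounding can differ, which is presumably why the pattern is marked inexact with '~'.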



[Mesa-dev] [PATCH 1/3] nir: add fceil lowering

2018-11-12 Thread Jonathan Marek
lowers ceil(x) as -floor(-x)

Signed-off-by: Jonathan Marek 
---
 src/compiler/nir/nir.h| 3 +++
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 2 files changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index dc3c729dee..f9b32a5daf 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2086,6 +2086,9 @@ typedef struct nir_shader_compiler_options {
/** lowers ffract to fsub+ffloor: */
bool lower_ffract;
 
+   /** lowers fceil to fneg+ffloor+fneg: */
+   bool lower_fceil;
+
bool lower_ldexp;
 
bool lower_pack_half_2x16;
diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 8b24daddfd..8f4df891b8 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -124,6 +124,7 @@ optimizations = [
(('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp32'),
(('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp64'),
(('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
+   (('fceil', a), ('fneg', ('ffloor', ('fneg', a))), 'options->lower_fceil'),
(('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', c)))), ('fmul', b, 
('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg', c ))), ('fmul', b, 
c )), ('flrp', a, b, c), '!options->lower_flrp32'),
(('~fadd@64', ('fmul', a, ('fadd', 1.0, ('fneg', c ))), ('fmul', b, 
c )), ('flrp', a, b, c), '!options->lower_flrp64'),
-- 
2.17.1
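
The identity behind the lowering, ceil(x) = -floor(-x), can be checked with a small C sketch. To stay self-contained it uses a hypothetical sketch_ffloor() that is only valid for values within int range; it stands in for NIR's ffloor and is not how a backend would implement it.

```c
#include <assert.h>

/* Stand-in for ffloor, valid only for values that fit in an int. */
static float sketch_ffloor(float x)
{
    int i = (int)x;                    /* truncates toward zero */
    if ((float)i > x)                  /* negative non-integer: step down */
        i -= 1;
    return (float)i;
}

/* fceil lowered as fneg(ffloor(fneg(x))). */
static float sketch_fceil_lowered(float x)
{
    return -sketch_ffloor(-x);
}
```

For example, sketch_fceil_lowered(1.25f) negates to -1.25, floors to -2, and negates back to 2.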



Re: [Mesa-dev] [PATCH 08/11] freedreno: a2xx: split large draws on a20x

2018-10-08 Thread Jonathan marek

Hi,

You're right, it would be easy to do. I'll include it in my next submission.

On 10/08/2018 12:13 AM, Ilia Mirkin wrote:

See my feedback from your earlier submission for how to make this work
on more than triangles. Seems easy enough to just do it.

https://patchwork.freedesktop.org/patch/250192/
On Mon, Oct 8, 2018 at 12:07 AM Jonathan Marek  wrote:


a20x can only draw 65535 vertices at once. this fix only applies to
triangles.

Signed-off-by: Jonathan Marek 
---
  src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +--
  1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 1792505808..7ccbee587f 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
 fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
 fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty);

-   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
-   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   /* a20x can only draw 65535 vertices at once... */
+   if (is_a20x(ctx->screen) && pinfo->count > 0xffff) {
+   struct pipe_draw_info info = *pinfo;
+   unsigned count = info.count;
+   unsigned num_vertices = ctx->batch->num_vertices;
+
+   /* other primitives require more work
+* (triangles works because 0xffff is divisible by 3)
+*/
+   if (info.mode != PIPE_PRIM_TRIANGLES)
+   return false;
+
+   for (; count; ) {
+   info.count = MIN2(count, 0xffff);
+
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, &info, ctx->batch->binning, index_offset, true);
+
+   info.start += 0xffff;
+   ctx->batch->num_vertices += 0xffff;
+   count -= info.count;
+   }
+   /* changing this value is a hack, restore it */
+   ctx->batch->num_vertices = num_vertices;
+   } else {
+   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   }

 fd_context_all_clean(ctx);

--
2.17.1
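
The splitting loop in the quoted patch can be sketched on its own, outside the driver. The function split_draw() and its output arrays are hypothetical; only the 65535-vertex limit comes from the commit message. Because 65535 is a multiple of 3, every full chunk ends on a triangle boundary.

```c
#include <assert.h>

#define SKETCH_MAX_DRAW 0xffffu   /* 65535 vertices per draw, per the commit message */

/* Splits [start, start + count) into chunks of at most SKETCH_MAX_DRAW
 * vertices. Writes up to cap chunks and returns how many were written. */
static unsigned split_draw(unsigned start, unsigned count,
                           unsigned out_start[], unsigned out_count[],
                           unsigned cap)
{
    unsigned n = 0;
    while (count && n < cap) {
        unsigned chunk = count < SKETCH_MAX_DRAW ? count : SKETCH_MAX_DRAW;
        out_start[n] = start;
        out_count[n] = chunk;
        start += chunk;
        count -= chunk;
        n++;
    }
    return n;
}
```

A 150000-vertex triangle list would come out as two draws of 65535 vertices and one of 18930, each a multiple of 3.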




[Mesa-dev] [PATCH 09/11] freedreno: a2xx: set PA_SC_VIZ_QUERY register

2018-10-07 Thread Jonathan Marek
on a20x the GPU will hang if this register is zero

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 90da6a2192..10a8ad586c 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -359,6 +359,10 @@ fd2_emit_restore(struct fd_context *ctx, struct 
fd_ringbuffer *ring)
A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3));
}
 
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY));
+   OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16));
+
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
OUT_RING(ring, 0x0002);
 
-- 
2.17.1



[Mesa-dev] [PATCH 10/11] freedreno: a2xx: start max_reg at -1

2018-10-07 Thread Jonathan Marek
on a220 it makes a difference if the max register # is -1 or 0

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c 
b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
index f8e056e424..3924d11e5a 100644
--- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
+++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
@@ -209,8 +209,8 @@ void* ir2_shader_assemble(struct ir2_shader *shader,
/* bitmask of variables required for exports defined by "export" */
uint32_t export_mask[REG_MASK/32+1] = {};
 
-   unsigned idx, reg_idx;
-   unsigned max_input = 0;
+   int idx, reg_idx;
+   int max_input = -1;
int export_size = -1;
 
for (idx = 0; idx < shader->instr_count; idx++) {
-- 
2.17.1
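
Why the starting value matters can be shown with a minimal sketch. The function max_reg_used() is hypothetical and is not the assembler code; it only illustrates that a signed maximum starting at -1 can tell "no register used" apart from "register 0 used", which an unsigned maximum starting at 0 cannot.

```c
#include <assert.h>

/* Returns the highest register index in regs[0..n), or -1 if n is 0. */
static int max_reg_used(const int regs[], int n)
{
    int max_reg = -1;
    for (int i = 0; i < n; i++)
        if (regs[i] > max_reg)
            max_reg = regs[i];
    return max_reg;
}
```

With an empty input the result is -1, and with only register 0 in use it is 0, so the two cases stay distinct.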



[Mesa-dev] [PATCH 07/11] freedreno: a2xx: a20x hw binning

2018-10-07 Thread Jonathan Marek
adds all the required logic for a20x hw binning to work

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c |  95 
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c |  10 +-
 src/gallium/drivers/freedreno/a2xx/fd2_emit.h |   3 +-
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 107 +-
 .../drivers/freedreno/a2xx/fd2_program.c  |  41 ---
 .../drivers/freedreno/a2xx/fd2_program.h  |   2 +-
 6 files changed, 215 insertions(+), 43 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 6f0535fa2b..1792505808 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -77,31 +77,56 @@ emit_vertexbufs(struct fd_context *ctx)
// CONST(20,0) (or CONST(26,0) in soliv_vp)
 
fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements);
+   fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, 
vtx->num_elements);
 }
 
-static bool
-fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info,
- unsigned index_offset)
+static void
+draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
+  struct fd_ringbuffer *ring, unsigned index_offset,
+  bool binning)
 {
-   struct fd_ringbuffer *ring = ctx->batch->draw;
-
-   if (ctx->dirty & FD_DIRTY_VTXBUF)
-   emit_vertexbufs(ctx);
-
-   fd2_emit_state(ctx, ctx->dirty);
+   enum pc_di_vis_cull_mode vismode;
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
-   OUT_RING(ring, info->start);
+   OUT_RING(ring, info->index_size ? 0 : info->start);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, 0x003b);
+   /* in the binning batch, this value is set once in fd2_emit_tile_init */
+   if (!binning) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL 
write ?
+* if set to 0x3b on a20x, clipping is broken
+*/
+   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
+   }
 
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   if (!is_a20x(ctx->screen)) {
+   if (is_a20x(ctx->screen)) {
+   /* wait for DMA to finish and
+* dummy draw one triangle with indexes 0,0,0.
+* with PRE_FETCH_CULL_ENABLE | GRP_CULL_ENABLE.
+*
+* this workaround is for a HW bug related to DMA alignment:
+* it is necessary for indexed draws and possibly also
+* draws that read binning data
+*/
+   OUT_PKT3(ring, CP_WAIT_REG_EQ, 4);
+   OUT_RING(ring, 0x05d0); /* RBBM_STATUS */
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x1000); /* bit: 12: VGT_BUSY_NO_DMA */
+   OUT_RING(ring, 0x0001);
+
+   OUT_PKT3(ring, CP_DRAW_INDX_BIN, 6);
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x0003c004);
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x0003);
+   OUT_RELOC(ring, 
fd_resource(fd2_context(ctx)->solid_vertexbuf)->bo, 0x80, 0, 0);
+   OUT_RING(ring, 0x0006);
+   } else {
OUT_WFI (ring);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 3);
@@ -110,14 +135,44 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
}
 
+   /* C64 holds offset to use for binning data */
+   if (binning && is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0180);
+   OUT_RING(ring, fui(ctx->batch->num_vertices));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   }
+
+   vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY;
+   /* a22x hw binning not implemented */
+   if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN))
+   vismode = IGNORE_VISIBILITY;
+
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
-IGNORE_VISIBILITY, info, index_offset);
+   vismode, info, index_offset);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010))

[Mesa-dev] [PATCH 11/11] a2xx: Compute depth base in gmem correctly

2018-10-07 Thread Jonathan Marek
From: Marek Vasut 

This fixes "a2xx: Compute depth base in gmem consistently" by using
the already present zsbuf and cbuf bases rather than incorrect hand
crafted calculation.

Without this patch, the following assertion triggers, e.g. with Qt5
on a system with a 480x272 display:

application: 
../../../../../git/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h:699: 
A2XX_RB_DEPTH_INFO_DEPTH_BASE: Assertion `!(val & 0x3ff)' failed.

Signed-off-by: Marek Vasut 
---
 src/gallium/drivers/freedreno/a2xx/a2xx.xml.h |  4 ++--
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 12 +++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h 
b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
index 4a2daca9ed..87c18918f5 100644
--- a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
+++ b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
@@ -682,7 +682,7 @@ static inline uint32_t A2XX_RB_COLOR_INFO_SWAP(uint32_t val)
 static inline uint32_t A2XX_RB_COLOR_INFO_BASE(uint32_t val)
 {
assert(!(val & 0x3ff));
-   return ((val >> 10) << A2XX_RB_COLOR_INFO_BASE__SHIFT) & 
A2XX_RB_COLOR_INFO_BASE__MASK;
+   return ((val >> 12) << A2XX_RB_COLOR_INFO_BASE__SHIFT) & 
A2XX_RB_COLOR_INFO_BASE__MASK;
 }
 
 #define REG_A2XX_RB_DEPTH_INFO 0x2002
@@ -697,7 +697,7 @@ static inline uint32_t A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(enum 
adreno_rb_depth_form
 static inline uint32_t A2XX_RB_DEPTH_INFO_DEPTH_BASE(uint32_t val)
 {
assert(!(val & 0x3ff));
-   return ((val >> 10) << A2XX_RB_DEPTH_INFO_DEPTH_BASE__SHIFT) & 
A2XX_RB_DEPTH_INFO_DEPTH_BASE__MASK;
+   return ((val >> 12) << A2XX_RB_DEPTH_INFO_DEPTH_BASE__SHIFT) & 
A2XX_RB_DEPTH_INFO_DEPTH_BASE__MASK;
 }
 
 #define REG_A2XX_A225_RB_COLOR_INFO3   0x2005
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 7cf5e201fe..cf93d8539c 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -110,6 +110,7 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct 
fd_tile *tile)
 {
struct fd_context *ctx = batch->ctx;
struct fd2_context *fd2_ctx = fd2_context(ctx);
+   struct fd_gmem_stateobj *gmem = >gmem;
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = >framebuffer;
 
@@ -172,10 +173,10 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct 
fd_tile *tile)
A2XX_RB_COPY_DEST_OFFSET_Y(tile->yoff));
 
if (batch->resolve & (FD_BUFFER_DEPTH | FD_BUFFER_STENCIL))
-   emit_gmem2mem_surf(batch, tile->bin_w * tile->bin_h, 
pfb->zsbuf);
+   emit_gmem2mem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf);
 
if (batch->resolve & FD_BUFFER_COLOR)
-   emit_gmem2mem_surf(batch, 0, pfb->cbufs[0]);
+   emit_gmem2mem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_MODECONTROL));
@@ -235,6 +236,7 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct 
fd_tile *tile)
 {
struct fd_context *ctx = batch->ctx;
struct fd2_context *fd2_ctx = fd2_context(ctx);
+   struct fd_gmem_stateobj *gmem = >gmem;
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = >framebuffer;
unsigned bin_w = tile->bin_w;
@@ -333,10 +335,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct 
fd_tile *tile)
OUT_RING(ring, 0x);
 
if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_DEPTH | 
FD_BUFFER_STENCIL))
-   emit_mem2gmem_surf(batch, bin_w * bin_h, pfb->zsbuf);
+   emit_mem2gmem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf);
 
if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_COLOR))
-   emit_mem2gmem_surf(batch, 0, pfb->cbufs[0]);
+   emit_mem2gmem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]);
 
/* TODO blob driver seems to toss in a CACHE_FLUSH after each 
DRAW_INDX.. */
 }
@@ -360,7 +362,7 @@ fd2_emit_tile_init(struct fd_batch *batch)
OUT_RING(ring, gmem->bin_w); /* RB_SURFACE_INFO */
OUT_RING(ring, A2XX_RB_COLOR_INFO_SWAP(fmt2swap(format)) |
A2XX_RB_COLOR_INFO_FORMAT(fd2_pipe2color(format)));
-   reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(align(gmem->bin_w * gmem->bin_h, 
4));
+   reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(gmem->zsbuf_base[0]);
if (pfb->zsbuf)
reg |= 
A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(fd_pipe2depth(pfb->zsbuf->format));
OUT_RING(ring, reg); /* RB_DEPTH_INFO */
-- 
2.17.1
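
A small standalone sketch of why the old calculation could trip the assertion quoted in the commit message. The 96x72 bin size is a hypothetical value chosen for illustration, and align_pot() and depth_base_ok() are made-up helper names, not driver functions. It only shows that 4-alignment does not imply the 0x400 alignment the assert checks; it does not reproduce how the driver computes gmem->zsbuf_base[0].

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a; a must be a power of two. */
static uint32_t
align_pot(uint32_t v, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (v + a - 1) & ~(a - 1);
}

/* Same condition as the assert in A2XX_RB_DEPTH_INFO_DEPTH_BASE(). */
static int
depth_base_ok(uint32_t val)
{
	return !(val & 0x3ff);
}
```

With a hypothetical 96x72 bin, align_pot(96 * 72, 4) is 6912, which has low bits set and fails the check, whereas a value aligned to 0x400 passes.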

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/11] freedreno: implement different pipe configuration for a20x

2018-10-07 Thread Jonathan Marek
this also adds num_vsc_pipes, which represents the number of pipes to use:
this value is useful because using more pipes has a higher cost (on a20x)

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/freedreno_gmem.c| 29 ++-
 .../drivers/freedreno/freedreno_gmem.h|  1 +
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_gmem.c 
b/src/gallium/drivers/freedreno/freedreno_gmem.c
index 668730390c..76f3b5a89e 100644
--- a/src/gallium/drivers/freedreno/freedreno_gmem.c
+++ b/src/gallium/drivers/freedreno/freedreno_gmem.c
@@ -216,12 +216,21 @@ calculate_tiles(struct fd_batch *batch)
 
 #define div_round_up(v, a)  (((v) + (a) - 1) / (a))
/* figure out number of tiles per pipe: */
-   tpp_x = tpp_y = 1;
-   while (div_round_up(nbins_y, tpp_y) > screen->num_vsc_pipes)
-   tpp_y += 2;
-   while ((div_round_up(nbins_y, tpp_y) *
-   div_round_up(nbins_x, tpp_x)) > screen->num_vsc_pipes)
-   tpp_x += 1;
+   if (is_a20x(ctx->screen)) {
+   /* for a20x we want to minimize the number of "pipes"
+* binning data has 3 bits for x/y (8x8) but the edges are used 
to
+* cull off-screen vertices with hw binning, so we have 6x6 
pipes
+*/
+   tpp_x = 6;
+   tpp_y = 6;
+   } else {
+   tpp_x = tpp_y = 1;
+   while (div_round_up(nbins_y, tpp_y) > screen->num_vsc_pipes)
+   tpp_y += 2;
+   while ((div_round_up(nbins_y, tpp_y) *
+   div_round_up(nbins_x, tpp_x)) > 
screen->num_vsc_pipes)
+   tpp_x += 1;
+   }
 
gmem->maxpw = tpp_x;
gmem->maxph = tpp_y;
@@ -248,6 +257,9 @@ calculate_tiles(struct fd_batch *batch)
xoff += tpp_x;
}
 
+   /* number of pipes to use for a20x */
+   gmem->num_vsc_pipes = MAX2(1, i);
+
for (; i < npipes; i++) {
struct fd_vsc_pipe *pipe = >vsc_pipe[i];
pipe->x = pipe->y = pipe->w = pipe->h = 0;
@@ -282,11 +294,12 @@ calculate_tiles(struct fd_batch *batch)
 
/* pipe number: */
p = ((i / tpp_y) * div_round_up(nbins_x, tpp_x)) + (j / 
tpp_x);
+   assert(p < gmem->num_vsc_pipes);
 
/* clip bin width: */
bw = MIN2(bin_w, minx + width - xoff);
-
-   tile->n = tile_n[p]++;
+   tile->n = !is_a20x(ctx->screen) ? tile_n[p]++ :
+   ((i % tpp_y + 1) << 3 | (j % tpp_x + 1));
tile->p = p;
tile->bin_w = bw;
tile->bin_h = bh;
diff --git a/src/gallium/drivers/freedreno/freedreno_gmem.h 
b/src/gallium/drivers/freedreno/freedreno_gmem.h
index 47f52307b6..3959ea18be 100644
--- a/src/gallium/drivers/freedreno/freedreno_gmem.h
+++ b/src/gallium/drivers/freedreno/freedreno_gmem.h
@@ -59,6 +59,7 @@ struct fd_gmem_stateobj {
uint16_t minx, miny;
uint16_t width, height;
uint16_t maxpw, maxph;   /* maximum pipe width/height */
+   uint8_t num_vsc_pipes;   /* number of pipes for a20x */
 };
 
 struct fd_batch;
-- 
2.17.1
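
A minimal sketch of the a20x pipe and tile numbering from the patch above, pulled out into standalone functions. The function names are hypothetical; the arithmetic follows the expressions in calculate_tiles() with tpp_x = tpp_y = 6. The reading that the values 0 and 7 in each 3-bit field are the ones left for the edges is an inference from the "+ 1" offset and the comment in the patch.

```c
#include <assert.h>

/* tiles per pipe in each direction on a20x (tpp_x = tpp_y = 6 in the patch) */
#define A20X_TPP 6u

static unsigned
div_round_up(unsigned v, unsigned a)
{
	assert(a != 0);
	return (v + a - 1) / a;
}

/* Pipe number for the bin at row i, column j. */
static unsigned
a20x_pipe_index(unsigned i, unsigned j, unsigned nbins_x)
{
	return (i / A20X_TPP) * div_round_up(nbins_x, A20X_TPP) + (j / A20X_TPP);
}

/* Position of the bin inside its pipe: 3 bits for y, 3 bits for x,
 * each in the range 1..6.
 */
static unsigned
a20x_tile_slot(unsigned i, unsigned j)
{
	return ((i % A20X_TPP + 1) << 3) | (j % A20X_TPP + 1);
}
```

For example, with 14 bins per row, the bin at row 7, column 13 lands in pipe 5 at slot (2 << 3) | 2.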



[Mesa-dev] [PATCH 08/11] freedreno: a2xx: split large draws on a20x

2018-10-07 Thread Jonathan Marek
a20x can only draw 65535 vertices at once. this fix only applies to
triangles.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 1792505808..7ccbee587f 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty);
 
-   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
-   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   /* a20x can only draw 65535 vertices at once... */
+   if (is_a20x(ctx->screen) && pinfo->count > 0xffff) {
+   struct pipe_draw_info info = *pinfo;
+   unsigned count = info.count;
+   unsigned num_vertices = ctx->batch->num_vertices;
+
+   /* other primitives require more work
+* (triangles works because 0xffff is divisible by 3)
+*/
+   if (info.mode != PIPE_PRIM_TRIANGLES)
+   return false;
+
+   for (; count; ) {
+   info.count = MIN2(count, 0xffff);
+
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, &info, ctx->batch->binning, index_offset, true);
+
+   info.start += 0xffff;
+   ctx->batch->num_vertices += 0xffff;
+   count -= info.count;
+   }
+   /* changing this value is a hack, restore it */
+   ctx->batch->num_vertices = num_vertices;
+   } else {
+   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   }
 
fd_context_all_clean(ctx);
 
-- 
2.17.1
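
A standalone sketch of the splitting idea from the commit message: a draw larger than 65535 vertices is cut into consecutive chunks, and because 65535 is a multiple of 3 a triangle list is never cut in the middle of a triangle. The struct and function names here are hypothetical and the sketch leaves out everything driver-specific (ring buffers, the num_vertices bookkeeping).

```c
#include <assert.h>
#include <stddef.h>

/* Limit taken from the commit message: 65535 vertices per draw. */
#define MAX_DRAW_VERTICES 0xffffu

/* One sub-draw: first vertex and number of vertices. */
struct draw_chunk {
	unsigned start;
	unsigned count;
};

/* Split [start, start + count) into chunks of at most MAX_DRAW_VERTICES.
 * Returns the number of chunks written; 0 if count is 0 or if `out`
 * cannot hold all chunks.
 */
static size_t
split_draw(unsigned start, unsigned count,
           struct draw_chunk *out, size_t max_chunks)
{
	size_t n = 0;

	/* full chunks must hold whole triangles */
	assert(MAX_DRAW_VERTICES % 3 == 0);

	while (count) {
		unsigned c = count < MAX_DRAW_VERTICES ? count : MAX_DRAW_VERTICES;

		if (n == max_chunks)
			return 0;

		out[n].start = start;
		out[n].count = c;
		n++;

		start += c;
		count -= c;
	}
	return n;
}
```

A 150000-vertex draw starting at vertex 10 comes out as three chunks of 65535, 65535 and 18930 vertices.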



[Mesa-dev] [PATCH 06/11] freedreno: a2xx: add fragcoord

2018-10-07 Thread Jonathan Marek
emulated fragcoord. a2xx has *some* hw support but it is not practical

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c| 16 
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 1ce3bc4f82..ab5d16f1a7 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -186,6 +186,7 @@ compile_init(struct fd2_compile_context *ctx, struct 
fd_program_stateobj *prog,
switch (name) {
case TGSI_SEMANTIC_COLOR:
case TGSI_SEMANTIC_GENERIC:
+   case TGSI_SEMANTIC_POSITION:
ctx->num_param++;
break;
default:
@@ -325,6 +326,8 @@ add_dst_reg(struct fd2_compile_context *ctx, struct 
ir2_instruction *alu,
num = ctx->prog->num_exports;
}
} else {
+   /* write to gl_FragCoord.z not possible */
+   assert(ctx->output_export_idx[dst->Index] != 
TGSI_SEMANTIC_POSITION);
num = dst->Index;
}
break;
@@ -1103,6 +1106,7 @@ compile_extra_exports(struct fd2_compile_context *ctx)
 {
struct ir2_shader *shader = ctx->so->ir;
struct ir2_instruction *instr;
+   int fragcoord = ctx->prog->export_linkage[TGSI_SEMANTIC_POSITION];
int position = ctx->num_regs[TGSI_FILE_INPUT] + 1;
unsigned i;
/* XXX hacky way to get new temporaries */
@@ -1122,6 +1126,18 @@ compile_extra_exports(struct fd2_compile_context *ctx)
ir2_reg_create(instr, tmp, "", 0);
ir2_dst_create(instr, tmp + 1, "xyzw", 0);
 
+   if (fragcoord != 0xff) {
+   instr = ir2_instr_create_alu_v(shader, MULADDv);
+   ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST);
+   ir2_reg_create(instr, tmp + 1, "xyzw", 0);
+   ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST);
+   ir2_dst_create(instr, fragcoord, "xyz_", IR2_REG_EXPORT);
+
+   instr = ir2_instr_create_alu_s(shader, MAXs);
+   ir2_reg_create(instr, tmp, "", 0);
+   ir2_dst_create(instr, fragcoord, "___w", IR2_REG_EXPORT);
+   }
+
/* these two instructions could be avoided with constant folding
 * but it would be hard to implement..
 */
-- 
2.17.1
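
A rough sketch of the arithmetic the added instructions appear to perform, written as plain C. It assumes the garbled swizzles select the w component, that C66 acts as the viewport scale and C65 as the viewport offset (the MULADDv operand order suggests this), and it omits the clamping done by RECIP_CLAMP. The vec4 type and function name are hypothetical.

```c
#include <assert.h>

struct vec4 {
	float x, y, z, w;
};

/* position: clip-space position written by the vertex shader
 * scale, offset: viewport parameters (assumed to match C66 and C65)
 */
static struct vec4
emulate_fragcoord(struct vec4 position, struct vec4 scale, struct vec4 offset)
{
	struct vec4 fc;
	float rcp_w;

	assert(position.w != 0.0f);
	rcp_w = 1.0f / position.w;      /* RECIP_CLAMP, without the clamp */

	/* perspective divide (MULv) followed by viewport transform (MULADDv) */
	fc.x = scale.x * (position.x * rcp_w) + offset.x;
	fc.y = scale.y * (position.y * rcp_w) + offset.y;
	fc.z = scale.z * (position.z * rcp_w) + offset.z;

	/* w carries 1/w, exported by the scalar MAXs */
	fc.w = rcp_w;
	return fc;
}
```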



[Mesa-dev] [PATCH 04/11] freedreno: a2xx: map tgsi ids to ir2 ids

2018-10-07 Thread Jonathan Marek
this is for a2xx specific semantics (vertex id) and a basic SSA form

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c | 54 +--
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c  | 45 ++--
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.h  |  9 
 3 files changed, 64 insertions(+), 44 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 12f9a1ce0a..54f0df54da 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -244,8 +244,8 @@ compile_vtx_fetch(struct fd2_compile_context *ctx)
 
ctx->need_sync |= 1 << (i+1);
 
-   ir2_dst_create(instr, i+1, "xyzw", 0);
ir2_reg_create(instr, 0, "x", IR2_REG_INPUT);
+   ir2_dst_create(instr, i+1, "xyzw", 0);
 
if (i == 0)
instr->sync = true;
@@ -421,9 +421,9 @@ add_regs_vector_1(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 1);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_vector_clamp(inst, alu);
 }
 
@@ -434,9 +434,9 @@ add_regs_vector_2(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 2);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
add_src_reg(ctx, alu, >Src[1].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_vector_clamp(inst, alu);
 }
 
@@ -447,10 +447,10 @@ add_regs_vector_3(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 3);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
add_src_reg(ctx, alu, >Src[1].Register);
add_src_reg(ctx, alu, >Src[2].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_vector_clamp(inst, alu);
 }
 
@@ -461,8 +461,8 @@ add_regs_scalar_1(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 1);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_scalar_clamp(inst, alu);
 }
 
@@ -544,17 +544,17 @@ push_predicate(struct fd2_compile_context *ctx, struct 
tgsi_src_register *src)
get_predicate(ctx, _dst, NULL);
 
alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SETNEs);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, src);
+   add_dst_reg(ctx, alu, _dst);
} else {
struct tgsi_src_register pred_src;
 
get_predicate(ctx, _dst, _src);
 
alu = ir2_instr_create_alu_v(ctx->so->ir, MULv);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, _src);
add_src_reg(ctx, alu, src);
+   add_dst_reg(ctx, alu, _dst);
 
// XXX need to make PRED_SETE_PUSHv IR2_PRED_NONE.. but need to 
make
// sure src reg is valid if it was calculated with a predicate
@@ -580,8 +580,8 @@ pop_predicate(struct fd2_compile_context *ctx)
get_predicate(ctx, _dst, _src);
 
alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SET_POPs);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, _src);
+   add_dst_reg(ctx, alu, _dst);
alu->pred = IR2_PRED_NONE;
} else {
/* predicate register no longer needed: */
@@ -648,13 +648,13 @@ translate_pow(struct fd2_compile_context *ctx,
get_internal_temp(ctx, _dst, _src);
 
alu = ir2_instr_create_alu_s(ctx->so->ir, LOG_CLAMP);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, >Src[0].Register);
+   add_dst_reg(ctx, alu, _dst);
 
alu = ir2_instr_create_alu_v(ctx->so->ir, MULv);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, _src);
add_src_reg(ctx, alu, >Src[1].Register);
+   add_dst_reg(ctx, alu, _dst);
 
/* NOTE: some of the instructions, like EXP_IEEE, seem hard-
 * coded to take their input from the w component.
@@ -679,8 +679,8 @@ translate_pow(struct fd2_compile_context *ctx,
}
 
alu = ir2_instr_create_alu_s(ctx->so->ir, EXP_IEEE);
-   add_dst_reg(ctx, alu, >Dst[0].Register);

[Mesa-dev] [PATCH 05/11] freedreno: a2xx: implement a20x binning shader

2018-10-07 Thread Jonathan Marek
writes to position export are mapped to a temp reg, code inserted at the
end of vertex shaders to export the position and compute the memory
exports for hw binning on a20x. C64 is the offset in the binning data,
C65/C66 are viewport parameters, C67+i/C68+i are binning view parameters.
C3+i is the binning data "pointer" - relative_addr=1 (in ir-a2xx) makes
it not interfere with the other shader constants

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c | 72 +--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 14 
 .../drivers/freedreno/a2xx/fd2_program.c  |  6 +-
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c  | 62 +---
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.h  |  4 +-
 5 files changed, 141 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 54f0df54da..1ce3bc4f82 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -294,7 +294,7 @@ get_temp_gpr(struct fd2_compile_context *ctx, int idx)
 {
unsigned num = idx + ctx->num_regs[TGSI_FILE_INPUT];
if (ctx->type == PIPE_SHADER_VERTEX)
-   num++;
+   num += 2; /* vertex fetch input / position temp */
return num;
 }
 
@@ -310,12 +310,19 @@ add_dst_reg(struct fd2_compile_context *ctx, struct 
ir2_instruction *alu,
flags |= IR2_REG_EXPORT;
if (ctx->type == PIPE_SHADER_VERTEX) {
if (dst->Index == ctx->position) {
-   num = 62;
+   /* position needed for fragcoord / a20x hw 
binning
+* write to a temp reg instead
+*/
+   num = ctx->num_regs[TGSI_FILE_INPUT] + 1;
+   flags &= ~IR2_REG_EXPORT;
} else if (dst->Index == ctx->psize) {
num = 63;
} else {
-   num = export_linkage(ctx,
-   
ctx->output_export_idx[dst->Index]);
+   num = ctx->prog->export_linkage[
+   
ctx->output_export_idx[dst->Index]];
+   /* not used by fragment shader - ir-a2xx will 
clean it up */
+   if (num == 0xff)
+   num = ctx->prog->num_exports;
}
} else {
num = dst->Index;
@@ -1091,6 +1098,60 @@ compile_instructions(struct fd2_compile_context *ctx)
}
 }
 
+static void
+compile_extra_exports(struct fd2_compile_context *ctx)
+{
+   struct ir2_shader *shader = ctx->so->ir;
+   struct ir2_instruction *instr;
+   int position = ctx->num_regs[TGSI_FILE_INPUT] + 1;
+   unsigned i;
+   /* XXX hacky way to get new temporaries */
+   unsigned tmp = shader->max_reg + 1;
+
+   instr = ir2_instr_create_alu_v(shader, MAXv);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_dst_create(instr, 62, "xyzw", IR2_REG_EXPORT);
+
+   instr = ir2_instr_create_alu_s(shader, RECIP_CLAMP);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_dst_create(instr, tmp, "___w", 0);
+
+   instr = ir2_instr_create_alu_v(shader, MULv);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_reg_create(instr, tmp, "", 0);
+   ir2_dst_create(instr, tmp + 1, "xyzw", 0);
+
+   /* these two instructions could be avoided with constant folding
+* but it would be hard to implement..
+*/
+   instr = ir2_instr_create_alu_v(shader, MULADDv);
+   ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST);
+   ir2_reg_create(instr, tmp + 1, "xyzw", 0);
+   ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST);
+   ir2_dst_create(instr, tmp + 2, "xyzw", 0);
+
+   instr = ir2_instr_create_alu_v(shader, ADDv);
+   ir2_reg_create(instr, 64, "", IR2_REG_CONST);
+   ir2_reg_create(instr, 15, "", IR2_REG_INPUT);
+   ir2_dst_create(instr, tmp + 3, "x___", 0);
+
+   /* 8 max set in freedreno_screen.. unneeded instrs patched out */
+   for (i = 0; i < 8; i++) {
+   instr = ir2_instr_create_alu_v(shader, MULADDv);
+   ir2_reg_create(instr, 1, "wyww", IR2_REG_CONST);
+   ir2_reg_create(instr, tmp + 3, "", 0);
+   ir2_reg_create(instr, 3 + i, "xyzw", IR2_REG_CONST);
+   

[Mesa-dev] [PATCH 03/11] freedreno: add a20x ids

2018-10-07 Thread Jonathan Marek
the two a20x GPUs tested are a200 in the imx51 and the imx53 (not a205).
the 201 id is used for the imx51 (it only has 128kb gmem as opposed to the
typical 256kb for a200)

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 231e0d4c81..ef722e31fb 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -797,6 +797,8 @@ fd_screen_create(struct fd_device *dev)
 * send a patch ;-)
 */
switch (screen->gpu_id) {
+   case 200:
+   case 201:
case 205:
case 220:
fd2_screen_init(pscreen);
-- 
2.17.1



[Mesa-dev] [PATCH 01/11] freedreno: implement the USE_VISIBILITY case for a20x in fd_draw

2018-10-07 Thread Jonathan Marek
this introduces some tracking of the number of vertices drawn in the
current batch: the draw command needs an offset to the start of the
binning data

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/adreno_pm4.xml.h|  7 +
 .../drivers/freedreno/freedreno_batch.c   |  1 +
 .../drivers/freedreno/freedreno_batch.h   |  1 +
 .../drivers/freedreno/freedreno_draw.c|  2 ++
 .../drivers/freedreno/freedreno_draw.h| 28 +--
 .../drivers/freedreno/freedreno_util.h|  8 --
 6 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/freedreno/adreno_pm4.xml.h 
b/src/gallium/drivers/freedreno/adreno_pm4.xml.h
index 88d1c4e6eb..27bbb1928e 100644
--- a/src/gallium/drivers/freedreno/adreno_pm4.xml.h
+++ b/src/gallium/drivers/freedreno/adreno_pm4.xml.h
@@ -108,6 +108,13 @@ enum pc_di_src_sel {
DI_SRC_SEL_RESERVED = 3,
 };
 
+enum pc_di_face_cull_sel {
+DI_FACE_CULL_NONE = 0,
+DI_FACE_CULL_FETCH = 1,
+DI_FACE_BACKFACE_CULL = 2,
+DI_FACE_FRONTFACE_CULL = 3,
+};
+
 enum pc_di_index_size {
INDEX_SIZE_IGN = 0,
INDEX_SIZE_16_BIT = 0,
diff --git a/src/gallium/drivers/freedreno/freedreno_batch.c 
b/src/gallium/drivers/freedreno/freedreno_batch.c
index a714d97f5c..7ffadea4e0 100644
--- a/src/gallium/drivers/freedreno/freedreno_batch.c
+++ b/src/gallium/drivers/freedreno/freedreno_batch.c
@@ -76,6 +76,7 @@ batch_init(struct fd_batch *batch)
batch->flushed = false;
batch->gmem_reason = 0;
batch->num_draws = 0;
+   batch->num_vertices = 0;
batch->stage = FD_STAGE_NULL;
 
fd_reset_wfi(batch);
diff --git a/src/gallium/drivers/freedreno/freedreno_batch.h 
b/src/gallium/drivers/freedreno/freedreno_batch.h
index 6ff4014ddc..7e6c780aca 100644
--- a/src/gallium/drivers/freedreno/freedreno_batch.h
+++ b/src/gallium/drivers/freedreno/freedreno_batch.h
@@ -124,6 +124,7 @@ struct fd_batch {
FD_GMEM_LOGICOP_ENABLED  = 0x20,
} gmem_reason;
unsigned num_draws;   /* number of draws in current batch */
+   unsigned num_vertices;   /* number of vertices in current batch */
 
/* Track the maximal bounds of the scissor of all the draws within a
 * batch.  Used at the tile rendering step (fd_gmem_render_tiles(),
diff --git a/src/gallium/drivers/freedreno/freedreno_draw.c 
b/src/gallium/drivers/freedreno/freedreno_draw.c
index e130895aac..974a153773 100644
--- a/src/gallium/drivers/freedreno/freedreno_draw.c
+++ b/src/gallium/drivers/freedreno/freedreno_draw.c
@@ -263,6 +263,8 @@ fd_draw_vbo(struct pipe_context *pctx, const struct 
pipe_draw_info *info)
if (ctx->draw_vbo(ctx, info, index_offset))
batch->needs_flush = true;
 
+   batch->num_vertices += info->count;
+
for (i = 0; i < ctx->streamout.num_targets; i++)
ctx->streamout.offsets[i] += info->count;
 
diff --git a/src/gallium/drivers/freedreno/freedreno_draw.h 
b/src/gallium/drivers/freedreno/freedreno_draw.h
index 4a922d9ca3..7f4407a3ae 100644
--- a/src/gallium/drivers/freedreno/freedreno_draw.h
+++ b/src/gallium/drivers/freedreno/freedreno_draw.h
@@ -41,6 +41,7 @@ struct fd_ringbuffer;
 
 void fd_draw_init(struct pipe_context *pctx);
 
+
 static inline void
 fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring,
enum pc_di_primtype primtype,
@@ -75,9 +76,31 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring,
}
 
if (is_a20x(batch->ctx->screen)) {
-   OUT_PKT3(ring, CP_DRAW_INDX, idx_buffer ? 4 : 2);
+   /* a20x has a different draw command for drawing with binning 
data
+* that makes it harder to patch so always use hw binning if 
enabled
+*
+* binning data is 1 byte/vertex (8x8x4 bin position of vertex)
+* base ptr set by the CP_SET_DRAW_INIT_FLAGS command
+*
+* TODO: investigate the faceness_cull_select parameter to see 
how
+* it is used with hw binning to use "faceness" bits
+*/
+   bool bin = (vismode == USE_VISIBILITY);
+   uint32_t draw_initiator = DRAW_A20X(primtype, DI_FACE_CULL_NONE,
+   src_sel, idx_type, bin, bin, count);
+   uint32_t size = 2;
+   if (bin)
+   size += 2;
+   if (idx_buffer)
+   size += 2;
+
+   OUT_PKT3(ring, bin ? CP_DRAW_INDX_BIN : CP_DRAW_INDX, size);
OUT_RING(ring, 0x);
-   OUT_RING(ring, DRAW_A20X(primtype, src_sel, idx_type, vismode, 
count));
+   OUT_RING(ring, draw_initiator);
+   if (bin) {
+   OUT_RING(ring, batch->num_vertices);
+   OUT_RING(ring, count);
+   }
} else
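
A small sketch of the packet-size computation in the hunk above, as a standalone function. The function name is hypothetical, and the comments on what each pair of dwords holds are partly assumptions: the two binning dwords are visible in the patch (batch->num_vertices and count), while the two index-buffer dwords are not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of payload dwords for the a20x draw packet. */
static uint32_t
a20x_draw_packet_dwords(bool bin, bool has_index_buffer)
{
	uint32_t size = 2;       /* leading constant dword + draw initiator */

	if (bin)
		size += 2;       /* binning data offset (num_vertices) + count */
	if (has_index_buffer)
		size += 2;       /* assumed: index buffer address + size */

	assert(size <= 6);
	return size;
}
```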

[Mesa-dev] [PATCH 11/11] freedreno: a2xx: set PA_SC_VIZ_QUERY register

2018-09-17 Thread Jonathan Marek
on a20x the GPU will hang if this register is zero

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index dcb7b6500a..4a1085e676 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -358,6 +358,10 @@ fd2_emit_restore(struct fd_context *ctx, struct 
fd_ringbuffer *ring)
A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3));
}
 
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY));
+   OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16));
+
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
OUT_RING(ring, 0x0002);
 
-- 
2.17.1



[Mesa-dev] [PATCH 10/11] freedreno: a2xx: split large draws on a20x

2018-09-17 Thread Jonathan Marek
a20x can only draw 65535 vertices at once. this fix only applies to
triangles.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 1792505808..7ccbee587f 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty);
 
-   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
-   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   /* a20x can only draw 65535 vertices at once... */
+   if (is_a20x(ctx->screen) && pinfo->count > 0xffff) {
+   struct pipe_draw_info info = *pinfo;
+   unsigned count = info.count;
+   unsigned num_vertices = ctx->batch->num_vertices;
+
+   /* other primitives require more work
+* (triangles works because 0xffff is divisible by 3)
+*/
+   if (info.mode != PIPE_PRIM_TRIANGLES)
+   return false;
+
+   for (; count; ) {
+   info.count = MIN2(count, 0xffff);
+
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, &info, ctx->batch->binning, index_offset, true);
+
+   info.start += 0xffff;
+   ctx->batch->num_vertices += 0xffff;
+   count -= info.count;
+   }
+   /* changing this value is a hack, restore it */
+   ctx->batch->num_vertices = num_vertices;
+   } else {
+   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   }
 
fd_context_all_clean(ctx);
 
-- 
2.17.1



[Mesa-dev] [PATCH 08/11] freedreno: a2xx: add fragcoord

2018-09-17 Thread Jonathan Marek
emulated fragcoord. a2xx has *some* hw support but it is not practical

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c| 16 
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 1ce3bc4f82..ab5d16f1a7 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -186,6 +186,7 @@ compile_init(struct fd2_compile_context *ctx, struct 
fd_program_stateobj *prog,
switch (name) {
case TGSI_SEMANTIC_COLOR:
case TGSI_SEMANTIC_GENERIC:
+   case TGSI_SEMANTIC_POSITION:
ctx->num_param++;
break;
default:
@@ -325,6 +326,8 @@ add_dst_reg(struct fd2_compile_context *ctx, struct 
ir2_instruction *alu,
num = ctx->prog->num_exports;
}
} else {
+   /* write to gl_FragCoord.z not possible */
+   assert(ctx->output_export_idx[dst->Index] != 
TGSI_SEMANTIC_POSITION);
num = dst->Index;
}
break;
@@ -1103,6 +1106,7 @@ compile_extra_exports(struct fd2_compile_context *ctx)
 {
struct ir2_shader *shader = ctx->so->ir;
struct ir2_instruction *instr;
+   int fragcoord = ctx->prog->export_linkage[TGSI_SEMANTIC_POSITION];
int position = ctx->num_regs[TGSI_FILE_INPUT] + 1;
unsigned i;
/* XXX hacky way to get new temporaries */
@@ -1122,6 +1126,18 @@ compile_extra_exports(struct fd2_compile_context *ctx)
ir2_reg_create(instr, tmp, "", 0);
ir2_dst_create(instr, tmp + 1, "xyzw", 0);
 
+   if (fragcoord != 0xff) {
+   instr = ir2_instr_create_alu_v(shader, MULADDv);
+   ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST);
+   ir2_reg_create(instr, tmp + 1, "xyzw", 0);
+   ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST);
+   ir2_dst_create(instr, fragcoord, "xyz_", IR2_REG_EXPORT);
+
+   instr = ir2_instr_create_alu_s(shader, MAXs);
+   ir2_reg_create(instr, tmp, "", 0);
+   ir2_dst_create(instr, fragcoord, "___w", IR2_REG_EXPORT);
+   }
+
/* these two instructions could be avoided with constant folding
 * but it would be hard to implement..
 */
-- 
2.17.1



[Mesa-dev] [PATCH 09/11] freedreno: a2xx: a20x hw binning

2018-09-17 Thread Jonathan Marek
adds all the required logic for a20x hw binning to work

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c |  95 
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c |  10 +-
 src/gallium/drivers/freedreno/a2xx/fd2_emit.h |   3 +-
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 106 +-
 .../drivers/freedreno/a2xx/fd2_program.c  |  41 ---
 .../drivers/freedreno/a2xx/fd2_program.h  |   2 +-
 6 files changed, 214 insertions(+), 43 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 6f0535fa2b..1792505808 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -77,31 +77,56 @@ emit_vertexbufs(struct fd_context *ctx)
// CONST(20,0) (or CONST(26,0) in soliv_vp)
 
fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements);
+   fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, 
vtx->num_elements);
 }
 
-static bool
-fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info,
- unsigned index_offset)
+static void
+draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info,
+  struct fd_ringbuffer *ring, unsigned index_offset,
+  bool binning)
 {
-   struct fd_ringbuffer *ring = ctx->batch->draw;
-
-   if (ctx->dirty & FD_DIRTY_VTXBUF)
-   emit_vertexbufs(ctx);
-
-   fd2_emit_state(ctx, ctx->dirty);
+   enum pc_di_vis_cull_mode vismode;
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET));
-   OUT_RING(ring, info->start);
+   OUT_RING(ring, info->index_size ? 0 : info->start);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
-   OUT_RING(ring, 0x003b);
+   /* in the binning batch, this value is set once in fd2_emit_tile_init */
+   if (!binning) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
+   /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL 
write ?
+* if set to 0x3b on a20x, clipping is broken
+*/
+   OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b);
+   }
 
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   if (!is_a20x(ctx->screen)) {
+   if (is_a20x(ctx->screen)) {
+   /* wait for DMA to finish and
+* dummy draw one triangle with indexes 0,0,0.
+* with PRE_FETCH_CULL_ENABLE | GRP_CULL_ENABLE.
+*
+* this workaround is for a HW bug related to DMA alignment:
+* it is necessary for indexed draws and possibly also
+* draws that read binning data
+*/
+   OUT_PKT3(ring, CP_WAIT_REG_EQ, 4);
+   OUT_RING(ring, 0x05d0); /* RBBM_STATUS */
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x1000); /* bit: 12: VGT_BUSY_NO_DMA */
+   OUT_RING(ring, 0x0001);
+
+   OUT_PKT3(ring, CP_DRAW_INDX_BIN, 6);
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x0003c004);
+   OUT_RING(ring, 0x);
+   OUT_RING(ring, 0x0003);
+   OUT_RELOC(ring, 
fd_resource(fd2_context(ctx)->solid_vertexbuf)->bo, 0x80, 0, 0);
+   OUT_RING(ring, 0x0006);
+   } else {
OUT_WFI (ring);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 3);
@@ -110,14 +135,44 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
}
 
+   /* C64 holds offset to use for binning data */
+   if (binning && is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0180);
+   OUT_RING(ring, fui(ctx->batch->num_vertices));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   OUT_RING(ring, fui(0.0f));
+   }
+
+   vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY;
+   /* a22x hw binning not implemented */
+   if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN))
+   vismode = IGNORE_VISIBILITY;
+
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
-IGNORE_VISIBILITY, info, index_offset);
+   vismode, info, index_offset);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010))

[Mesa-dev] [PATCH 07/11] freedreno: a2xx: implement a20x binning shader

2018-09-17 Thread Jonathan Marek
Writes to the position export are mapped to a temp reg, and code is
inserted at the end of vertex shaders to export the position and compute
the memory exports for hw binning on a20x. C64 is the offset in the
binning data, C65/C66 are viewport parameters, and C67+i/C68+i are
binning view parameters. C3+i is the binning data "pointer";
relative_addr=1 (in ir-a2xx) keeps it from interfering with the other
shader constants.

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c | 72 +--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 14 
 .../drivers/freedreno/a2xx/fd2_program.c  |  6 +-
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c  | 62 +---
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.h  |  4 +-
 5 files changed, 141 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 54f0df54da..1ce3bc4f82 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -294,7 +294,7 @@ get_temp_gpr(struct fd2_compile_context *ctx, int idx)
 {
unsigned num = idx + ctx->num_regs[TGSI_FILE_INPUT];
if (ctx->type == PIPE_SHADER_VERTEX)
-   num++;
+   num += 2; /* vertex fetch input / position temp */
return num;
 }
 
@@ -310,12 +310,19 @@ add_dst_reg(struct fd2_compile_context *ctx, struct 
ir2_instruction *alu,
flags |= IR2_REG_EXPORT;
if (ctx->type == PIPE_SHADER_VERTEX) {
if (dst->Index == ctx->position) {
-   num = 62;
+   /* position needed for fragcoord / a20x hw 
binning
+* write to a temp reg instead
+*/
+   num = ctx->num_regs[TGSI_FILE_INPUT] + 1;
+   flags &= ~IR2_REG_EXPORT;
} else if (dst->Index == ctx->psize) {
num = 63;
} else {
-   num = export_linkage(ctx,
-   
ctx->output_export_idx[dst->Index]);
+   num = ctx->prog->export_linkage[
+   
ctx->output_export_idx[dst->Index]];
+   /* not used by fragment shader - ir-a2xx will 
clean it up */
+   if (num == 0xff)
+   num = ctx->prog->num_exports;
}
} else {
num = dst->Index;
@@ -1091,6 +1098,60 @@ compile_instructions(struct fd2_compile_context *ctx)
}
 }
 
+static void
+compile_extra_exports(struct fd2_compile_context *ctx)
+{
+   struct ir2_shader *shader = ctx->so->ir;
+   struct ir2_instruction *instr;
+   int position = ctx->num_regs[TGSI_FILE_INPUT] + 1;
+   unsigned i;
+   /* XXX hacky way to get new temporaries */
+   unsigned tmp = shader->max_reg + 1;
+
+   instr = ir2_instr_create_alu_v(shader, MAXv);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_dst_create(instr, 62, "xyzw", IR2_REG_EXPORT);
+
+   instr = ir2_instr_create_alu_s(shader, RECIP_CLAMP);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_dst_create(instr, tmp, "___w", 0);
+
+   instr = ir2_instr_create_alu_v(shader, MULv);
+   ir2_reg_create(instr, position, "xyzw", 0);
+   ir2_reg_create(instr, tmp, "", 0);
+   ir2_dst_create(instr, tmp + 1, "xyzw", 0);
+
+   /* these two instructions could be avoided with constant folding
+* but it would be hard to implement..
+*/
+   instr = ir2_instr_create_alu_v(shader, MULADDv);
+   ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST);
+   ir2_reg_create(instr, tmp + 1, "xyzw", 0);
+   ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST);
+   ir2_dst_create(instr, tmp + 2, "xyzw", 0);
+
+   instr = ir2_instr_create_alu_v(shader, ADDv);
+   ir2_reg_create(instr, 64, "", IR2_REG_CONST);
+   ir2_reg_create(instr, 15, "", IR2_REG_INPUT);
+   ir2_dst_create(instr, tmp + 3, "x___", 0);
+
+   /* 8 max set in freedreno_screen.. unneeded instrs patched out */
+   for (i = 0; i < 8; i++) {
+   instr = ir2_instr_create_alu_v(shader, MULADDv);
+   ir2_reg_create(instr, 1, "wyww", IR2_REG_CONST);
+   ir2_reg_create(instr, tmp + 3, "", 0);
+   ir2_reg_create(instr, 3 + i, "xyzw", IR2_REG_CONST);
+   

[Mesa-dev] [PATCH 06/11] freedreno: a2xx: map tgsi ids to ir2 ids

2018-09-17 Thread Jonathan Marek
This is needed for a2xx-specific semantics (vertex id) and a basic SSA form.

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c | 54 +--
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c  | 45 ++--
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.h  |  9 
 3 files changed, 64 insertions(+), 44 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 12f9a1ce0a..54f0df54da 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -244,8 +244,8 @@ compile_vtx_fetch(struct fd2_compile_context *ctx)
 
ctx->need_sync |= 1 << (i+1);
 
-   ir2_dst_create(instr, i+1, "xyzw", 0);
ir2_reg_create(instr, 0, "x", IR2_REG_INPUT);
+   ir2_dst_create(instr, i+1, "xyzw", 0);
 
if (i == 0)
instr->sync = true;
@@ -421,9 +421,9 @@ add_regs_vector_1(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 1);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_vector_clamp(inst, alu);
 }
 
@@ -434,9 +434,9 @@ add_regs_vector_2(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 2);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
add_src_reg(ctx, alu, >Src[1].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_vector_clamp(inst, alu);
 }
 
@@ -447,10 +447,10 @@ add_regs_vector_3(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 3);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
add_src_reg(ctx, alu, >Src[1].Register);
add_src_reg(ctx, alu, >Src[2].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_vector_clamp(inst, alu);
 }
 
@@ -461,8 +461,8 @@ add_regs_scalar_1(struct fd2_compile_context *ctx,
assert(inst->Instruction.NumSrcRegs == 1);
assert(inst->Instruction.NumDstRegs == 1);
 
-   add_dst_reg(ctx, alu, >Dst[0].Register);
add_src_reg(ctx, alu, >Src[0].Register);
+   add_dst_reg(ctx, alu, >Dst[0].Register);
add_scalar_clamp(inst, alu);
 }
 
@@ -544,17 +544,17 @@ push_predicate(struct fd2_compile_context *ctx, struct 
tgsi_src_register *src)
get_predicate(ctx, _dst, NULL);
 
alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SETNEs);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, src);
+   add_dst_reg(ctx, alu, _dst);
} else {
struct tgsi_src_register pred_src;
 
get_predicate(ctx, _dst, _src);
 
alu = ir2_instr_create_alu_v(ctx->so->ir, MULv);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, _src);
add_src_reg(ctx, alu, src);
+   add_dst_reg(ctx, alu, _dst);
 
// XXX need to make PRED_SETE_PUSHv IR2_PRED_NONE.. but need to 
make
// sure src reg is valid if it was calculated with a predicate
@@ -580,8 +580,8 @@ pop_predicate(struct fd2_compile_context *ctx)
get_predicate(ctx, _dst, _src);
 
alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SET_POPs);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, _src);
+   add_dst_reg(ctx, alu, _dst);
alu->pred = IR2_PRED_NONE;
} else {
/* predicate register no longer needed: */
@@ -648,13 +648,13 @@ translate_pow(struct fd2_compile_context *ctx,
get_internal_temp(ctx, _dst, _src);
 
alu = ir2_instr_create_alu_s(ctx->so->ir, LOG_CLAMP);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, >Src[0].Register);
+   add_dst_reg(ctx, alu, _dst);
 
alu = ir2_instr_create_alu_v(ctx->so->ir, MULv);
-   add_dst_reg(ctx, alu, _dst);
add_src_reg(ctx, alu, _src);
add_src_reg(ctx, alu, >Src[1].Register);
+   add_dst_reg(ctx, alu, _dst);
 
/* NOTE: some of the instructions, like EXP_IEEE, seem hard-
 * coded to take their input from the w component.
@@ -679,8 +679,8 @@ translate_pow(struct fd2_compile_context *ctx,
}
 
alu = ir2_instr_create_alu_s(ctx->so->ir, EXP_IEEE);
-   add_dst_reg(ctx, alu, >Dst[0].Register);

[Mesa-dev] [PATCH 05/11] freedreno: add a20x ids

2018-09-17 Thread Jonathan Marek
The two a20x GPUs tested are the a200 in the imx51 and the imx53 (not a205).
The 201 id is used for the imx51, which has only 128kb of gmem as opposed to
the typical 256kb for a200.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index ef88e5b121..c39c140fac 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -782,6 +782,8 @@ fd_screen_create(struct fd_device *dev, struct renderonly 
*ro)
 * send a patch ;-)
 */
switch (screen->gpu_id) {
+   case 200:
+   case 201:
case 205:
case 220:
fd2_screen_init(pscreen);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
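
The id mapping described in the commit message above can be sketched in C as
follows. This is only an illustration: is_a20x_id() and a20x_gmem_size() are
hypothetical helpers, not driver functions, and the 200..209 range for the
a20x family is an assumption. The gmem sizes are the ones quoted in the commit
message (128kb for id 201, 256kb for a200).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: assumes ids 200..209 belong to the a20x family. */
static bool is_a20x_id(unsigned gpu_id)
{
	return gpu_id >= 200 && gpu_id < 210;
}

/* Hypothetical lookup using the sizes from the commit message.
 * Returns 0 for ids the message does not describe.
 */
static uint32_t a20x_gmem_size(unsigned gpu_id)
{
	switch (gpu_id) {
	case 200: return 256 * 1024; /* typical a200 */
	case 201: return 128 * 1024; /* id used for the imx51 */
	default:  return 0;
	}
}
```

With these helpers, id 201 is reported as a20x with 131072 bytes of gmem, while
id 220 falls outside the assumed a20x range.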


[Mesa-dev] [PATCH 02/11] freedreno: implement different pipe configuration for a20x

2018-09-17 Thread Jonathan Marek
This also adds num_vsc_pipe, which holds the number of pipes to use.
The value is useful because using more pipes has a higher cost (on a20x).

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/freedreno_context.h |  1 +
 .../drivers/freedreno/freedreno_gmem.c| 30 ++-
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_context.h 
b/src/gallium/drivers/freedreno/freedreno_context.h
index 58fba99874..e150fdef5a 100644
--- a/src/gallium/drivers/freedreno/freedreno_context.h
+++ b/src/gallium/drivers/freedreno/freedreno_context.h
@@ -260,6 +260,7 @@ struct fd_context {
struct fd_gmem_stateobj gmem;
struct fd_vsc_pipe  vsc_pipe[16];
struct fd_tile  tile[512];
+   unsigned num_vsc_pipe;
 
/* which state objects need to be re-emit'd: */
enum fd_dirty_3d_state dirty;
diff --git a/src/gallium/drivers/freedreno/freedreno_gmem.c 
b/src/gallium/drivers/freedreno/freedreno_gmem.c
index 981ab0cf76..44133a19ab 100644
--- a/src/gallium/drivers/freedreno/freedreno_gmem.c
+++ b/src/gallium/drivers/freedreno/freedreno_gmem.c
@@ -215,12 +215,21 @@ calculate_tiles(struct fd_batch *batch)
 
 #define div_round_up(v, a)  (((v) + (a) - 1) / (a))
/* figure out number of tiles per pipe: */
-   tpp_x = tpp_y = 1;
-   while (div_round_up(nbins_y, tpp_y) > 8)
-   tpp_y += 2;
-   while ((div_round_up(nbins_y, tpp_y) *
-   div_round_up(nbins_x, tpp_x)) > 8)
-   tpp_x += 1;
+   if (is_a20x(ctx->screen)) {
+   /* for a20x we want to minimize the number of "pipes"
+* binning data has 3 bits for x/y (8x8) but the edges are used 
to
+* cull off-screen vertices with hw binning, so we have 6x6 
pipes
+*/
+   tpp_x = 6;
+   tpp_y = 6;
+   } else {
+   tpp_x = tpp_y = 1;
+   while (div_round_up(nbins_y, tpp_y) > 8)
+   tpp_y += 2;
+   while ((div_round_up(nbins_y, tpp_y) *
+   div_round_up(nbins_x, tpp_x)) > 8)
+   tpp_x += 1;
+   }
 
gmem->maxpw = tpp_x;
gmem->maxph = tpp_y;
@@ -246,6 +255,10 @@ calculate_tiles(struct fd_batch *batch)
 
xoff += tpp_x;
}
+   /* number of pipes to use (for a20x)
+* at least 1 pipe is needed
+*/
+   ctx->num_vsc_pipe = i ? i : 1;
 
for (; i < npipes; i++) {
struct fd_vsc_pipe *pipe = >vsc_pipe[i];
@@ -281,11 +294,12 @@ calculate_tiles(struct fd_batch *batch)
 
/* pipe number: */
p = ((i / tpp_y) * div_round_up(nbins_x, tpp_x)) + (j / 
tpp_x);
+   assert(p < ctx->num_vsc_pipe);
 
/* clip bin width: */
bw = MIN2(bin_w, minx + width - xoff);
-
-   tile->n = tile_n[p]++;
+   tile->n = !is_a20x(ctx->screen) ? tile_n[p]++ :
+   ((i % tpp_y + 1) << 3 | (j % tpp_x + 1));
tile->p = p;
tile->bin_w = bw;
tile->bin_h = bh;
-- 
2.17.1
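
The a20x tile numbering in the patch above can be tried out in isolation with
the sketch below. The helper names and the A20X_TPP macro are hypothetical; the
two expressions are copied from the tile->n and pipe number computations in
calculate_tiles(). Reading the 3-bit values 0 and 7 as the "edges" reserved for
off-screen vertices is an interpretation of the comment in the patch, not
something the patch states explicitly.

```c
#include <stdint.h>

#define A20X_TPP 6 /* tiles per pipe in x and y, as set in the patch */

/* Hypothetical helper mirroring the a20x tile->n expression:
 * two 3-bit fields, each holding a value in 1..6.
 */
static uint8_t a20x_tile_n(unsigned i, unsigned j)
{
	unsigned y = i % A20X_TPP + 1;
	unsigned x = j % A20X_TPP + 1;
	return (uint8_t)(y << 3 | x);
}

/* Hypothetical helper mirroring the pipe number expression for tile
 * row i and column j when the frame is nbins_x tiles wide.
 */
static unsigned a20x_pipe_index(unsigned i, unsigned j, unsigned nbins_x)
{
	unsigned pipes_x = (nbins_x + A20X_TPP - 1) / A20X_TPP;
	return (i / A20X_TPP) * pipes_x + (j / A20X_TPP);
}
```

For a frame 10 tiles wide, the tile at row 6, column 7 lands in pipe 3 and gets
n = 0x0a (y field 1, x field 2).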



[Mesa-dev] [PATCH 01/11] freedreno: implement the USE_VISIBILITY case for a20x in fd_draw

2018-09-17 Thread Jonathan Marek
This introduces tracking of the number of vertices drawn in the current
batch: the draw command needs an offset to the start of the binning data.

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/adreno_pm4.xml.h|  7 +
 .../drivers/freedreno/freedreno_batch.c   |  1 +
 .../drivers/freedreno/freedreno_batch.h   |  1 +
 .../drivers/freedreno/freedreno_draw.c|  2 ++
 .../drivers/freedreno/freedreno_draw.h| 28 +--
 .../drivers/freedreno/freedreno_util.h|  8 --
 6 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/freedreno/adreno_pm4.xml.h 
b/src/gallium/drivers/freedreno/adreno_pm4.xml.h
index fe96a1381f..eff0ed9f8e 100644
--- a/src/gallium/drivers/freedreno/adreno_pm4.xml.h
+++ b/src/gallium/drivers/freedreno/adreno_pm4.xml.h
@@ -108,6 +108,13 @@ enum pc_di_src_sel {
DI_SRC_SEL_RESERVED = 3,
 };
 
+enum pc_di_face_cull_sel {
+DI_FACE_CULL_NONE = 0,
+DI_FACE_CULL_FETCH = 1,
+DI_FACE_BACKFACE_CULL = 2,
+DI_FACE_FRONTFACE_CULL = 3,
+};
+
 enum pc_di_index_size {
INDEX_SIZE_IGN = 0,
INDEX_SIZE_16_BIT = 0,
diff --git a/src/gallium/drivers/freedreno/freedreno_batch.c 
b/src/gallium/drivers/freedreno/freedreno_batch.c
index ff8298e82a..ad60c2742c 100644
--- a/src/gallium/drivers/freedreno/freedreno_batch.c
+++ b/src/gallium/drivers/freedreno/freedreno_batch.c
@@ -75,6 +75,7 @@ batch_init(struct fd_batch *batch)
batch->flushed = false;
batch->gmem_reason = 0;
batch->num_draws = 0;
+   batch->num_vertices = 0;
batch->stage = FD_STAGE_NULL;
 
fd_reset_wfi(batch);
diff --git a/src/gallium/drivers/freedreno/freedreno_batch.h 
b/src/gallium/drivers/freedreno/freedreno_batch.h
index 6bb88a6291..67cadd5633 100644
--- a/src/gallium/drivers/freedreno/freedreno_batch.h
+++ b/src/gallium/drivers/freedreno/freedreno_batch.h
@@ -120,6 +120,7 @@ struct fd_batch {
FD_GMEM_LOGICOP_ENABLED  = 0x20,
} gmem_reason;
unsigned num_draws;   /* number of draws in current batch */
+   unsigned num_vertices;   /* number of vertices in current batch */
 
/* Track the maximal bounds of the scissor of all the draws within a
 * batch.  Used at the tile rendering step (fd_gmem_render_tiles(),
diff --git a/src/gallium/drivers/freedreno/freedreno_draw.c 
b/src/gallium/drivers/freedreno/freedreno_draw.c
index f55905e7bf..ee8aab48c3 100644
--- a/src/gallium/drivers/freedreno/freedreno_draw.c
+++ b/src/gallium/drivers/freedreno/freedreno_draw.c
@@ -254,6 +254,8 @@ fd_draw_vbo(struct pipe_context *pctx, const struct 
pipe_draw_info *info)
if (ctx->draw_vbo(ctx, info, index_offset))
batch->needs_flush = true;
 
+   batch->num_vertices += info->count;
+
for (i = 0; i < ctx->streamout.num_targets; i++)
ctx->streamout.offsets[i] += info->count;
 
diff --git a/src/gallium/drivers/freedreno/freedreno_draw.h 
b/src/gallium/drivers/freedreno/freedreno_draw.h
index 4a922d9ca3..7f4407a3ae 100644
--- a/src/gallium/drivers/freedreno/freedreno_draw.h
+++ b/src/gallium/drivers/freedreno/freedreno_draw.h
@@ -41,6 +41,7 @@ struct fd_ringbuffer;
 
 void fd_draw_init(struct pipe_context *pctx);
 
+
 static inline void
 fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring,
enum pc_di_primtype primtype,
@@ -75,9 +76,31 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring,
}
 
if (is_a20x(batch->ctx->screen)) {
-   OUT_PKT3(ring, CP_DRAW_INDX, idx_buffer ? 4 : 2);
+   /* a20x has a different draw command for drawing with binning 
data
+* that makes it harder to patch so always use hw binning if 
enabled
+*
+* binning data is 1 byte/vertex (8x8x4 bin position of 
vertex)
+* base ptr set by the CP_SET_DRAW_INIT_FLAGS command
+*
+* TODO: investigate the faceness_cull_select parameter to see 
how
+* it is used with hw binning to use "faceness" bits
+*/
+   bool bin = (vismode == USE_VISIBILITY);
+   uint32_t draw_initiator = DRAW_A20X(primtype, DI_FACE_CULL_NONE,
+   src_sel, idx_type, bin, bin, count);
+   uint32_t size = 2;
+   if (bin)
+   size += 2;
+   if (idx_buffer)
+   size += 2;
+
+   OUT_PKT3(ring, bin ? CP_DRAW_INDX_BIN : CP_DRAW_INDX, size);
OUT_RING(ring, 0x);
-   OUT_RING(ring, DRAW_A20X(primtype, src_sel, idx_type, vismode, 
count));
+   OUT_RING(ring, draw_initiator);
+   if (bin) {
+   OUT_RING(ring, batch->num_vertices);
+   OUT_RING(ring, count);
+   }
} else
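
The packet sizing and vertex tracking in the (truncated) patch above can be
sketched on their own. The struct and function names below are hypothetical
stand-ins. The size arithmetic follows the patch, and the offset bookkeeping
assumes, as the patch does, that num_vertices is read for the draw before
info->count is added to it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Payload size in dwords of the a20x draw packet, as computed in the patch:
 * 2 base dwords, +2 when drawing with binning data, +2 for an index buffer.
 */
static uint32_t a20x_draw_pkt_size(bool bin, bool has_idx_buffer)
{
	uint32_t size = 2;
	if (bin)
		size += 2;
	if (has_idx_buffer)
		size += 2;
	return size;
}

/* Hypothetical stand-in for the one fd_batch field used here. */
struct batch_sketch {
	unsigned num_vertices; /* vertices drawn so far in this batch */
};

/* Returns the binning-data offset for this draw, then advances the
 * running total by the draw's vertex count.
 */
static unsigned batch_record_draw(struct batch_sketch *batch, unsigned count)
{
	unsigned offset = batch->num_vertices;
	batch->num_vertices += count;
	return offset;
}
```

Because the binning data is 1 byte per vertex, the returned vertex offset is
also the byte offset of the draw's binning data from the base pointer.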

[Mesa-dev] [PATCH 03/11] freedreno: use renderonly scanout

2018-09-17 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/freedreno_resource.c| 57 +--
 .../drivers/freedreno/freedreno_resource.h|  1 +
 .../drivers/freedreno/freedreno_screen.c  | 29 +++---
 .../drivers/freedreno/freedreno_screen.h  | 10 ++--
 .../freedreno/drm/freedreno_drm_public.h  |  4 ++
 .../freedreno/drm/freedreno_drm_winsys.c  | 23 ++--
 6 files changed, 85 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c 
b/src/gallium/drivers/freedreno/freedreno_resource.c
index 344004f696..adfa0f27a7 100644
--- a/src/gallium/drivers/freedreno/freedreno_resource.c
+++ b/src/gallium/drivers/freedreno/freedreno_resource.c
@@ -645,6 +645,9 @@ fd_resource_destroy(struct pipe_screen *pscreen,
fd_bc_invalidate_resource(rsc, true);
if (rsc->bo)
fd_bo_del(rsc->bo);
+   if (rsc->scanout)
+   renderonly_scanout_destroy(rsc->scanout, 
fd_screen(pscreen)->ro);
+
util_range_destroy(>valid_buffer_range);
FREE(rsc);
 }
@@ -657,9 +660,26 @@ fd_resource_get_handle(struct pipe_screen *pscreen,
unsigned usage)
 {
struct fd_resource *rsc = fd_resource(prsc);
-
-   return fd_screen_bo_get_handle(pscreen, rsc->bo,
-   rsc->slices[0].pitch * rsc->cpp, handle);
+   struct renderonly_scanout *scanout = rsc->scanout;
+   struct fd_bo *bo = rsc->bo;
+
+   handle->stride = rsc->slices[0].pitch * rsc->cpp;
+
+   if (handle->type == WINSYS_HANDLE_TYPE_SHARED) {
+   return fd_bo_get_name(bo, >handle) == 0;
+   } else if (handle->type == WINSYS_HANDLE_TYPE_KMS) {
+   if (renderonly_get_handle(scanout, handle)) {
+   return TRUE;
+   } else {
+   handle->handle = fd_bo_handle(bo);
+   return TRUE;
+   }
+   } else if (handle->type == WINSYS_HANDLE_TYPE_FD) {
+   handle->handle = fd_bo_dmabuf(bo);
+   return TRUE;
+   } else {
+   return FALSE;
+   }
 }
 
 static uint32_t
@@ -801,8 +821,8 @@ fd_resource_create(struct pipe_screen *pscreen,
const struct pipe_resource *tmpl)
 {
struct fd_screen *screen = fd_screen(pscreen);
-   struct fd_resource *rsc = CALLOC_STRUCT(fd_resource);
-   struct pipe_resource *prsc = >base;
+   struct fd_resource *rsc;
+   struct pipe_resource *prsc;
enum pipe_format format = tmpl->format;
uint32_t size;
 
@@ -813,6 +833,33 @@ fd_resource_create(struct pipe_screen *pscreen,
tmpl->array_size, tmpl->last_level, tmpl->nr_samples,
tmpl->usage, tmpl->bind, tmpl->flags);
 
+   if (tmpl->bind & PIPE_BIND_SCANOUT) {
+   struct pipe_resource scanout_templat = *tmpl;
+   struct renderonly_scanout *scanout;
+   struct winsys_handle handle;
+
+   scanout = renderonly_scanout_for_resource(_templat,
+   
screen->ro, );
+   if (!scanout)
+   return NULL;
+
+   assert(handle.type == WINSYS_HANDLE_TYPE_FD);
+   // handle.modifier = modifier;
+   scanout_templat.bind &= ~PIPE_BIND_SCANOUT;
+   rsc = fd_resource(pscreen->resource_from_handle(pscreen, 
_templat,
+   
,
+   
PIPE_HANDLE_USAGE_WRITE));
+   close(handle.handle);
+   if (!rsc)
+   return NULL;
+
+   rsc->scanout = scanout;
+   return >base;
+   }
+
+   rsc = CALLOC_STRUCT(fd_resource);
+   prsc = >base;
+
if (!rsc)
return NULL;
 
diff --git a/src/gallium/drivers/freedreno/freedreno_resource.h 
b/src/gallium/drivers/freedreno/freedreno_resource.h
index 2834969110..baad2baa68 100644
--- a/src/gallium/drivers/freedreno/freedreno_resource.h
+++ b/src/gallium/drivers/freedreno/freedreno_resource.h
@@ -65,6 +65,7 @@ struct set;
 
 struct fd_resource {
struct pipe_resource base;
+   struct renderonly_scanout *scanout;
struct fd_bo *bo;
uint32_t cpp;
enum pipe_format internal_format;
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 33f14b8f24..ef88e5b121 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -649,27 +649,6 @@ fd_get_compiler_options(struct pipe_screen *pscreen,
return NULL;
 }
 
-boolean
-fd_scree

[Mesa-dev] [PATCH 04/11] imx: add freedreno

2018-09-17 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 configure.ac|  4 ++--
 src/gallium/targets/dri/target.c|  5 +++-
 src/gallium/winsys/imx/drm/Makefile.am  |  9 +++
 src/gallium/winsys/imx/drm/imx_drm_winsys.c | 26 -
 4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/configure.ac b/configure.ac
index f8bb131cb6..85cd3c1eeb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2835,8 +2835,8 @@ AM_CONDITIONAL(HAVE_SWR_BUILTIN, test 
"x$HAVE_SWR_BUILTIN" = xyes)
 
 dnl We need to validate some needed dependencies for renderonly drivers.
 
-if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes  ; then
-AC_MSG_ERROR([Building with imx requires etnaviv])
+if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_FREEDRENO" != xyes 
-a "x$HAVE_GALLIUM_IMX" = xyes  ; then
+AC_MSG_ERROR([Building with imx requires etnaviv or freedreno])
 fi
 
 if test "x$HAVE_GALLIUM_VC4" != xyes -a "x$HAVE_GALLIUM_PL111" = xyes  ; then
diff --git a/src/gallium/targets/dri/target.c b/src/gallium/targets/dri/target.c
index 835d125f21..ddaca8501a 100644
--- a/src/gallium/targets/dri/target.c
+++ b/src/gallium/targets/dri/target.c
@@ -83,10 +83,13 @@ DEFINE_LOADER_DRM_ENTRYPOINT(pl111)
 #endif
 
 #if defined(GALLIUM_ETNAVIV)
-DEFINE_LOADER_DRM_ENTRYPOINT(imx_drm)
 DEFINE_LOADER_DRM_ENTRYPOINT(etnaviv)
 #endif
 
+#if defined(GALLIUM_IMX)
+DEFINE_LOADER_DRM_ENTRYPOINT(imx_drm)
+#endif
+
 #if defined(GALLIUM_TEGRA)
 DEFINE_LOADER_DRM_ENTRYPOINT(tegra);
 #endif
diff --git a/src/gallium/winsys/imx/drm/Makefile.am 
b/src/gallium/winsys/imx/drm/Makefile.am
index f15b531f81..17068cb300 100644
--- a/src/gallium/winsys/imx/drm/Makefile.am
+++ b/src/gallium/winsys/imx/drm/Makefile.am
@@ -28,8 +28,17 @@ AM_CFLAGS = \
-I$(top_srcdir)/src/gallium/winsys \
$(GALLIUM_WINSYS_CFLAGS)
 
+if HAVE_GALLIUM_ETNAVIV
+AM_CFLAGS += -DGALLIUM_ETNAVIV
+endif
+
+if HAVE_GALLIUM_FREEDRENO
+AM_CFLAGS += -DGALLIUM_FREEDRENO
+endif
+
 noinst_LTLIBRARIES = libimxdrm.la
 
 libimxdrm_la_SOURCES = $(C_SOURCES)
 
 EXTRA_DIST = meson.build
+
diff --git a/src/gallium/winsys/imx/drm/imx_drm_winsys.c 
b/src/gallium/winsys/imx/drm/imx_drm_winsys.c
index 4bd2125031..f8c4abffde 100644
--- a/src/gallium/winsys/imx/drm/imx_drm_winsys.c
+++ b/src/gallium/winsys/imx/drm/imx_drm_winsys.c
@@ -26,6 +26,7 @@
 
 #include "imx_drm_public.h"
 #include "etnaviv/drm/etnaviv_drm_public.h"
+#include "freedreno/drm/freedreno_drm_public.h"
 #include "loader/loader.h"
 #include "renderonly/renderonly.h"
 
@@ -37,15 +38,28 @@ struct pipe_screen *imx_drm_screen_create(int fd)
struct renderonly ro = {
   .create_for_resource = renderonly_create_kms_dumb_buffer_for_resource,
   .kms_fd = fd,
-  .gpu_fd = loader_open_render_node("etnaviv")
};
+   struct pipe_screen *screen;
 
-   if (ro.gpu_fd < 0)
-  return NULL;
+#if defined(GALLIUM_ETNAVIV)
+   ro.gpu_fd = loader_open_render_node("etnaviv");
+   if (ro.gpu_fd >= 0) {
+  screen = etna_drm_screen_create_renderonly();
+  if (screen)
+return screen;
+  close(ro.gpu_fd);
+   }
+#endif
 
-   struct pipe_screen *screen = etna_drm_screen_create_renderonly();
-   if (!screen)
+#if defined(GALLIUM_FREEDRENO)
+   ro.gpu_fd = loader_open_render_node("msm");
+   if (ro.gpu_fd >= 0) {
+  screen = fd_drm_screen_create_renderonly();
+  if (screen)
+return screen;
   close(ro.gpu_fd);
+   }
+#endif
 
-   return screen;
+   return NULL;
 }
-- 
2.17.1
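
The control flow that imx_drm_screen_create() gains in the patch above (try
each GPU driver in turn, return the first screen that can be created, close
the fd otherwise) can be sketched generically. Everything below is
hypothetical scaffolding: the struct, the function pointers, and the stub
functions stand in for loader_open_render_node() and the per-driver
*_drm_screen_create_renderonly() calls, and are not Mesa APIs.

```c
#include <stddef.h>

/* Hypothetical description of one render-only GPU backend. */
struct backend_sketch {
	const char *node_name;               /* e.g. "etnaviv" or "msm" */
	int  (*open_node)(const char *name); /* returns an fd, or < 0 */
	void *(*create_screen)(int fd);      /* returns NULL on failure */
	void (*close_node)(int fd);
};

/* Try the backends in order; the first screen that can be created wins. */
static void *create_first_screen(const struct backend_sketch *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int fd = b[i].open_node(b[i].node_name);
		if (fd < 0)
			continue; /* no such render node: try the next driver */

		void *screen = b[i].create_screen(fd);
		if (screen)
			return screen;

		b[i].close_node(fd); /* do not leak the fd on failure */
	}
	return NULL;
}

/* Stubs for illustration only. */
static int dummy_screen;
static int open_missing(const char *name) { (void)name; return -1; }
static int open_present(const char *name) { (void)name; return 3; }
static void *screen_ok(int fd) { (void)fd; return &dummy_screen; }
static void close_noop(int fd) { (void)fd; }
```

With a table listing an "etnaviv" entry that uses open_missing followed by an
"msm" entry that uses open_present, the second backend supplies the screen.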



[Mesa-dev] [PATCH] freedreno: a2xx: ir2 update

2018-07-24 Thread Jonathan Marek
This patch brings a number of changes to ir2:
-ir2 now generates CF clauses as necessary during assembly. This simplifies
 fd2_program/fd2_compiler and is necessary to implement optimization passes.
-ir2 now has separate vector/scalar instructions. This will make it easier
 to implement scheduling of scalar+vector instructions together. dst_reg
 is also now separate from the src registers instead of sharing a single list.
-ir2 now implements register allocation. This makes it possible to compile
 shaders which have more than 64 TGSI registers.
-ir2 now implements the following optimizations: removal of IN/OUT MOV
 instructions generated by TGSI, and removal of unused instructions when
 some exports are disabled.
-ir2 now allows a full 8-bit index for constants.
-ir2_alloc no longer allocates 4 times too many bytes.

Signed-off-by: Jonathan Marek 
---
 .../drivers/freedreno/a2xx/fd2_compiler.c | 210 ++---
 .../drivers/freedreno/a2xx/fd2_program.c  |  75 +-
 .../drivers/freedreno/a2xx/instr-a2xx.h   |  28 +-
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c  | 734 +++---
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.h  | 113 +--
 5 files changed, 615 insertions(+), 545 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 3ad47f9850..12f9a1ce0a 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -93,9 +93,6 @@ struct fd2_compile_context {
unsigned position, psize;
 
uint64_t need_sync;
-
-   /* current exec CF instruction */
-   struct ir2_cf *cf;
 };
 
 static int
@@ -130,7 +127,6 @@ compile_init(struct fd2_compile_context *ctx, struct 
fd_program_stateobj *prog,
 
ctx->prog = prog;
ctx->so = so;
-   ctx->cf = NULL;
ctx->pred_depth = 0;
 
ret = tgsi_parse_init(>parser, so->tokens);
@@ -236,15 +232,6 @@ compile_free(struct fd2_compile_context *ctx)
tgsi_parse_free(>parser);
 }
 
-static struct ir2_cf *
-next_exec_cf(struct fd2_compile_context *ctx)
-{
-   struct ir2_cf *cf = ctx->cf;
-   if (!cf || cf->exec.instrs_count >= ARRAY_SIZE(ctx->cf->exec.instrs))
-   ctx->cf = cf = ir2_cf_create(ctx->so->ir, EXEC);
-   return cf;
-}
-
 static void
 compile_vtx_fetch(struct fd2_compile_context *ctx)
 {
@@ -252,13 +239,13 @@ compile_vtx_fetch(struct fd2_compile_context *ctx)
int i;
for (i = 0; i < ctx->num_regs[TGSI_FILE_INPUT]; i++) {
struct ir2_instruction *instr = ir2_instr_create(
-   next_exec_cf(ctx), IR2_FETCH);
+   ctx->so->ir, IR2_FETCH);
instr->fetch.opc = VTX_FETCH;
 
ctx->need_sync |= 1 << (i+1);
 
-   ir2_reg_create(instr, i+1, "xyzw", 0);
-   ir2_reg_create(instr, 0, "x", 0);
+   ir2_dst_create(instr, i+1, "xyzw", 0);
+   ir2_reg_create(instr, 0, "x", IR2_REG_INPUT);
 
if (i == 0)
instr->sync = true;
@@ -266,7 +253,6 @@ compile_vtx_fetch(struct fd2_compile_context *ctx)
vfetch_instrs[i] = instr;
}
ctx->so->num_vfetch_instrs = i;
-   ctx->cf = NULL;
 }
 
 /*
@@ -312,7 +298,7 @@ get_temp_gpr(struct fd2_compile_context *ctx, int idx)
return num;
 }
 
-static struct ir2_register *
+static struct ir2_dst_register *
 add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu,
const struct tgsi_dst_register *dst)
 {
@@ -351,10 +337,10 @@ add_dst_reg(struct fd2_compile_context *ctx, struct 
ir2_instruction *alu,
swiz[3] = (dst->WriteMask & TGSI_WRITEMASK_W) ? 'w' : '_';
swiz[4] = '\0';
 
-   return ir2_reg_create(alu, num, swiz, flags);
+   return ir2_dst_create(alu, num, swiz, flags);
 }
 
-static struct ir2_register *
+static struct ir2_src_register *
 add_src_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu,
const struct tgsi_src_register *src)
 {
@@ -373,6 +359,7 @@ add_src_reg(struct fd2_compile_context *ctx, struct 
ir2_instruction *alu,
if (ctx->type == PIPE_SHADER_VERTEX) {
num = src->Index + 1;
} else {
+   flags |= IR2_REG_INPUT;
num = export_linkage(ctx,
ctx->input_export_idx[src->Index]);
}
@@ -415,7 +402,7 @@ static void
 add_vector_clamp(struct tgsi_full_instruction *inst, struct ir2_instruction 
*alu)
 {
if (inst->Instruction.Saturate) {
-   alu->alu.vector_clamp = true;
+   alu->alu_vector.clamp = true;
}
 }
 
@@ -423,7 +410,7 @@ static void
 add_scalar_clamp(struct

[Mesa-dev] [PATCH 5/5] freedreno: a2xx: fix clear color

2018-06-21 Thread Jonathan Marek
The format of the CLEAR_COLOR register doesn't depend on the target format.
This fixes the clear color when rendering to 32-bit RGBA and 16-bit targets.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index ca634d794a..6f0535fa2b 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -135,7 +135,7 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
uint32_t reg, colr = 0;
 
if ((buffers & PIPE_CLEAR_COLOR) && fb->nr_cbufs)
-   colr  = pack_rgba(fb->cbufs[0]->format, color->f);
+   colr = pack_rgba(PIPE_FORMAT_R8G8B8A8_UNORM, color->f);
 
/* emit generic state now: */
fd2_emit_state(ctx, ctx->dirty &
-- 
2.17.1
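
What the fix above relies on is that the clear color is always packed as four
8-bit unorm channels, whatever the render target format is. A minimal sketch of
such packing follows. pack_rgba8_unorm() is a hypothetical stand-in, not Mesa's
pack_rgba(), and it assumes R sits in the lowest byte of the dword.

```c
#include <stdint.h>

/* Convert a float channel to 8-bit unorm, clamping to [0, 1]. */
static uint32_t unorm8(float v)
{
	if (v < 0.0f)
		v = 0.0f;
	if (v > 1.0f)
		v = 1.0f;
	return (uint32_t)(v * 255.0f + 0.5f);
}

/* Hypothetical stand-in: pack RGBA floats with R in the lowest byte. */
static uint32_t pack_rgba8_unorm(const float c[4])
{
	return unorm8(c[0]) |
	       unorm8(c[1]) << 8 |
	       unorm8(c[2]) << 16 |
	       unorm8(c[3]) << 24;
}
```

Under that byte-order assumption, opaque red (1, 0, 0, 1) packs to 0xff0000ff
regardless of whether the framebuffer is a 32-bit or a 16-bit format.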



[Mesa-dev] [PATCH 4/5] freedreno: a2xx: fix crash when freeing context

2018-06-21 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_program.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_program.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_program.c
index 9a77457251..834a7c7fcd 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_program.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_program.c
@@ -54,6 +54,8 @@ create_shader(enum shader_t type)
 static void
 delete_shader(struct fd2_shader_stateobj *so)
 {
+   if (!so)
+   return;
ir2_shader_destroy(so->ir);
free(so->tokens);
free(so->bin);
-- 
2.17.1



[Mesa-dev] [PATCH 1/5] freedreno: a2xx: increase size of the offset field in instr_fetch_vtx_t

2018-06-21 Thread Jonathan Marek
The offset field is 22 bits wide.
At least 11 bits are necessary because MaxVertexAttribRelativeOffset = 2047.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/instr-a2xx.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h 
b/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h
index 0d6e138daf..ac972ed35a 100644
--- a/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h
+++ b/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h
@@ -366,10 +366,8 @@ typedef struct PACKED {
uint8_t pred_select  : 1;
/* dword2: */
uint8_t stride   : 8;
-   /* possibly offset and reserved4 are swapped on a200? */
-   uint8_t offset   : 8;
-   uint8_t reserved4: 8;
-   uint8_t reserved5: 7;
+   uint32_t offset  : 22;
+   uint8_t reserved4: 1;
uint8_t pred_condition   : 1;
 } instr_fetch_vtx_t;
 
-- 
2.17.1



[Mesa-dev] [PATCH 3/5] freedreno: a2xx: fix crash on first clear

2018-06-21 Thread Jonathan Marek
blend can be NULL, so check for that
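
The guard used in this patch can be sketched in isolation. The structs
below (blend_state, zsa_state) are hypothetical stand-ins, not the
driver's own types. Because | binds tighter than ?:, the whole OR
expression is the true branch, so a NULL blend yields 0 rather than the
zsa bits alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's blend and zsa state objects. */
struct blend_state {
    uint32_t rb_colorcontrol;
    uint32_t rb_colormask;
};

struct zsa_state {
    uint32_t rb_colorcontrol;
};

/* Parsed as: blend ? (zsa bits | blend bits) : 0 */
static uint32_t colorcontrol(const struct zsa_state *zsa,
                             const struct blend_state *blend)
{
    return blend ? zsa->rb_colorcontrol | blend->rb_colorcontrol : 0;
}

/* With no blend state bound, fall back to writing all four channels. */
static uint32_t colormask(const struct blend_state *blend)
{
    return blend ? blend->rb_colormask : 0xf;
}
```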

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 4bf41b2c67..dcf7ed10b5 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -295,7 +295,7 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_ZSA)) {
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_COLORCONTROL));
-   OUT_RING(ring, zsa->rb_colorcontrol | blend->rb_colorcontrol);
+   OUT_RING(ring, blend ? zsa->rb_colorcontrol | 
blend->rb_colorcontrol : 0);
}
 
if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_FRAMEBUFFER)) {
@@ -305,13 +305,13 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL));
-   OUT_RING(ring, blend->rb_blendcontrol_alpha |
+   OUT_RING(ring, blend ? blend->rb_blendcontrol_alpha |
COND(has_alpha, blend->rb_blendcontrol_rgb) |
-   COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb));
+   COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb) : 
0);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_MASK));
-   OUT_RING(ring, blend->rb_colormask);
+   OUT_RING(ring, blend ? blend->rb_colormask : 0xf);
}
 
if (dirty & FD_DIRTY_BLEND_COLOR) {
-- 
2.17.1



[Mesa-dev] [PATCH 2/5] freedreno: add a20x

2018-06-21 Thread Jonathan Marek
This patch adds support for a20x, which differs from a220 as follows:
- no VGT_MAX_VTX_INDX register
- no CLEAR_COLOR register
- RB_BC_CONTROL must be set in restore (the GPU hangs without it)
- different CP_DRAW_INDX format

Tested with kmscube and glmark2 scenes, on par with a220.
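
The four differences listed above can be summarized as a small quirk
table. This is only a sketch: the struct, the field names and the
helper names are hypothetical, and the GPU id range used for a20x
(200..209) is an assumption, since the definition of is_a20x() is not
visible in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the a20x/a220 differences named in the list. */
struct a2xx_quirks {
    bool has_vgt_max_vtx_indx;  /* VGT_MAX_VTX_INDX register exists */
    bool has_clear_color_reg;   /* CLEAR_COLOR register exists */
    bool needs_rb_bc_control;   /* RB_BC_CONTROL must be set in restore */
    bool count_in_draw_dword;   /* CP_DRAW_INDX carries count in the draw dword */
};

/* Assumption: a20x parts report a GPU id in the range 200..209. */
static bool is_a20x_id(uint32_t gpu_id)
{
    return gpu_id >= 200 && gpu_id < 210;
}

static struct a2xx_quirks quirks_for(uint32_t gpu_id)
{
    bool a20x = is_a20x_id(gpu_id);
    struct a2xx_quirks q = {
        .has_vgt_max_vtx_indx = !a20x,
        .has_clear_color_reg  = !a20x,
        .needs_rb_bc_control  = a20x,
        .count_in_draw_dword  = a20x,
    };
    return q;
}
```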

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 37 +--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 +
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 22 ++-
 .../drivers/freedreno/freedreno_draw.h| 27 +-
 .../drivers/freedreno/freedreno_screen.c  |  1 +
 .../drivers/freedreno/freedreno_screen.h  |  6 +++
 .../drivers/freedreno/freedreno_util.h| 13 +++
 7 files changed, 85 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 8df1793a35..ca634d794a 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -101,12 +101,14 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   OUT_WFI (ring);
+   if (!is_a20x(ctx->screen)) {
+   OUT_WFI (ring);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
-   OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */
-   OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
+   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
+   OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */
+   OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
+   }
 
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
 IGNORE_VISIBILITY, info, index_offset);
@@ -157,9 +159,18 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR));
-   OUT_RING(ring, colr);
+   if (is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0480);
+   OUT_RING(ring, color->ui[0]);
+   OUT_RING(ring, color->ui[1]);
+   OUT_RING(ring, color->ui[2]);
+   OUT_RING(ring, color->ui[3]);
+   } else {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR));
+   OUT_RING(ring, colr);
+   }
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_A220_RB_LRZ_VSC_CONTROL));
@@ -264,10 +275,12 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_RING(ring, 0x0);
}
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
-   OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */
-   OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */
+   if (!is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
+   OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */
+   OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */
+   }
 
fd_draw(ctx->batch, ring, DI_PT_RECTLIST, IGNORE_VISIBILITY,
DI_SRC_SEL_AUTO_INDEX, 3, 0, INDEX_SIZE_IGN, 0, 0, 
NULL);
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index d749eb0324..4bf41b2c67 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -332,6 +332,16 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
 void
 fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring)
 {
+   if (is_a20x(ctx->screen)) {
+   OUT_PKT0(ring, REG_A2XX_RB_BC_CONTROL, 1);
+   OUT_RING(ring,
+   A2XX_RB_BC_CONTROL_ACCUM_TIMEOUT_SELECT(3) |
+   A2XX_RB_BC_CONTROL_DISABLE_LZ_NULL_ZCMD_DROP |
+   A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE |
+   A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) |
+   A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3));
+   }
+
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
OUT_RING(ring, 0x0002);
 
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 46a7d18ef0..62382995c0 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++

[Mesa-dev] [PATCH] freedreno: add a20x

2018-06-19 Thread Jonathan Marek
This patch adds support for a20x, which differs from a220 as follows:
- no VGT_MAX_VTX_INDX register
- no CLEAR_COLOR register
- RB_BC_CONTROL must be set in restore (the GPU hangs without it)
- different CP_DRAW_INDX format

Tested with kmscube and glmark2 scenes, on par with a220.

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 37 +--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 +
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 22 ++-
 .../drivers/freedreno/freedreno_draw.h| 27 +-
 .../drivers/freedreno/freedreno_screen.c  |  1 +
 .../drivers/freedreno/freedreno_screen.h  |  6 +++
 .../drivers/freedreno/freedreno_util.h| 13 +++
 7 files changed, 85 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 8df1793a35..ca634d794a 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -101,12 +101,14 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info,
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   OUT_WFI (ring);
+   if (!is_a20x(ctx->screen)) {
+   OUT_WFI (ring);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
-   OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */
-   OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
+   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
+   OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */
+   OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */
+   }
 
fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode],
 IGNORE_VISIBILITY, info, index_offset);
@@ -157,9 +159,18 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR));
-   OUT_RING(ring, colr);
+   if (is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0480);
+   OUT_RING(ring, color->ui[0]);
+   OUT_RING(ring, color->ui[1]);
+   OUT_RING(ring, color->ui[2]);
+   OUT_RING(ring, color->ui[3]);
+   } else {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR));
+   OUT_RING(ring, colr);
+   }
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_A220_RB_LRZ_VSC_CONTROL));
@@ -264,10 +275,12 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_RING(ring, 0x0);
}
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
-   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
-   OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */
-   OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */
+   if (!is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 3);
+   OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX));
+   OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */
+   OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */
+   }
 
fd_draw(ctx->batch, ring, DI_PT_RECTLIST, IGNORE_VISIBILITY,
DI_SRC_SEL_AUTO_INDEX, 3, 0, INDEX_SIZE_IGN, 0, 0, 
NULL);
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index a787b71e37..9c765dfd88 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -340,6 +340,16 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
 void
 fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring)
 {
+   if (is_a20x(ctx->screen)) {
+   OUT_PKT0(ring, REG_A2XX_RB_BC_CONTROL, 1);
+   OUT_RING(ring,
+   A2XX_RB_BC_CONTROL_ACCUM_TIMEOUT_SELECT(3) |
+   A2XX_RB_BC_CONTROL_DISABLE_LZ_NULL_ZCMD_DROP |
+   A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE |
+   A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) |
+   A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3));
+   }
+
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
OUT_RING(ring, 0x0002);
 
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 46a7d18ef0..62382995c0 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++

[Mesa-dev] [PATCH 2/2] freedreno: a2xx: fix crash when freeing context

2018-06-19 Thread Jonathan Marek
Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_program.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_program.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_program.c
index 9a77457251..834a7c7fcd 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_program.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_program.c
@@ -54,6 +54,8 @@ create_shader(enum shader_t type)
 static void
 delete_shader(struct fd2_shader_stateobj *so)
 {
+   if (!so)
+   return;
ir2_shader_destroy(so->ir);
free(so->tokens);
free(so->bin);
-- 
2.17.1



[Mesa-dev] [PATCH 1/2] freedreno: a2xx: fix crash on first clear

2018-06-19 Thread Jonathan Marek
blend can be NULL, so check for that

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 9c765dfd88..e36eebf98c 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -303,7 +303,7 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_ZSA)) {
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_COLORCONTROL));
-   OUT_RING(ring, zsa->rb_colorcontrol | blend->rb_colorcontrol);
+   OUT_RING(ring, blend ? zsa->rb_colorcontrol | 
blend->rb_colorcontrol : 0);
}
 
if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_FRAMEBUFFER)) {
@@ -313,13 +313,13 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL));
-   OUT_RING(ring, blend->rb_blendcontrol_alpha |
+   OUT_RING(ring, blend ? blend->rb_blendcontrol_alpha |
COND(has_alpha, blend->rb_blendcontrol_rgb) |
-   COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb));
+   COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb) : 
0);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_MASK));
-   OUT_RING(ring, blend->rb_colormask);
+   OUT_RING(ring, blend ? blend->rb_colormask : 0xf);
}
 
if (dirty & FD_DIRTY_BLEND_COLOR) {
-- 
2.17.1



[Mesa-dev] [PATCH] freedreno: a2xx: fix clear color

2018-06-18 Thread Jonathan Marek
The format of the CLEAR_COLOR register doesn't depend on the target format.
This fixes the clear color when rendering to 32-bit RGBA and 16-bit targets.
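
As a rough illustration of the fixed register format, the sketch below
packs a float RGBA color into one 32-bit R8G8B8A8_UNORM word. The
helpers unorm8() and pack_rgba8_unorm() are hypothetical and stand in
for the driver's pack_rgba(). The byte order (R in the least
significant byte, as on a little-endian host) is an assumption.

```c
#include <stdint.h>

/* Convert a float in [0, 1] to an 8-bit unorm value, clamping and rounding. */
static uint32_t unorm8(float f)
{
    if (!(f > 0.0f))
        return 0;           /* also catches NaN */
    if (f >= 1.0f)
        return 255;
    return (uint32_t)(f * 255.0f + 0.5f);
}

/* Pack RGBA floats with R in bits 0..7 and A in bits 24..31 (assumed order). */
static uint32_t pack_rgba8_unorm(const float rgba[4])
{
    return unorm8(rgba[0]) |
           unorm8(rgba[1]) << 8 |
           unorm8(rgba[2]) << 16 |
           unorm8(rgba[3]) << 24;
}
```

The packed value is the same whatever format the bound color buffer
uses, which is the point of the fix.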

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index c12047628c..2d3c029e57 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -133,7 +133,7 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
uint32_t reg, colr = 0;
 
if ((buffers & PIPE_CLEAR_COLOR) && fb->nr_cbufs)
-   colr  = pack_rgba(fb->cbufs[0]->format, color->f);
+   colr = pack_rgba(PIPE_FORMAT_R8G8B8A8_UNORM, color->f);
 
/* emit generic state for clear now: */
fd2_emit_state_for_clear(ctx);
-- 
2.17.1



[Mesa-dev] [PATCH] freedreno: add initial a20x support

2018-06-18 Thread Jonathan Marek
The bare minimum to get a20x running with kmscube and some glmark2 scenes:
handle the different CP_DRAW_INDX format and the different clear color register.
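
The CP_DRAW_INDX difference can be sketched as follows. The function
build_draw_payload() is hypothetical and writes plain dwords into an
array instead of a ringbuffer; the index buffer address is passed as a
plain value where the driver emits a relocation. The low-16-bit mask is
an assumption inferred from count being shifted into bits 16..31.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill `out` with the CP_DRAW_INDX payload dwords and return how many
 * were written. On a20x the index count lives in the upper half of the
 * draw dword, so the packet is one dword shorter.
 */
static size_t build_draw_payload(bool a20x, uint32_t draw, uint32_t count,
                                 bool indexed, uint32_t idx_addr,
                                 uint32_t idx_size, uint32_t out[5])
{
    size_t n = 0;

    if (a20x) {
        draw &= 0xffff;         /* assumed mask: keep the low 16 bits */
        draw |= count << 16;    /* count replaces the upper 16 bits */
    }

    out[n++] = 0;               /* viz query info */
    out[n++] = draw;
    if (!a20x)
        out[n++] = count;       /* NumIndices as a separate dword */
    if (indexed) {
        out[n++] = idx_addr;
        out[n++] = idx_size;
    }
    return n;
}
```

The returned length matches the cnt value computed in the patch: 3 or 5
dwords on a220, and 2 or 4 on a20x.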

Signed-off-by: Jonathan Marek 
---
 src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 15 +++--
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 ++
 .../drivers/freedreno/freedreno_draw.h| 32 ---
 .../drivers/freedreno/freedreno_screen.c  |  1 +
 .../drivers/freedreno/freedreno_screen.h  |  6 
 5 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index ef9daddfcf..c12047628c 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -155,9 +155,18 @@ fd2_clear(struct fd_context *ctx, unsigned buffers,
OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1);
OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE);
 
-   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
-   OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR));
-   OUT_RING(ring, colr);
+   if (is_a20x(ctx->screen)) {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 5);
+   OUT_RING(ring, 0x0480);
+   OUT_RING(ring, color->ui[0]);
+   OUT_RING(ring, color->ui[1]);
+   OUT_RING(ring, color->ui[2]);
+   OUT_RING(ring, color->ui[3]);
+   } else {
+   OUT_PKT3(ring, CP_SET_CONSTANT, 2);
+   OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR));
+   OUT_RING(ring, colr);
+   }
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_A220_RB_LRZ_VSC_CONTROL));
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 6927fa87fd..2b28bb23a3 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -415,6 +415,16 @@ fd2_emit_state_for_clear(struct fd_context *ctx)
 void
 fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring)
 {
+   if (is_a20x(ctx->screen)) {
+   OUT_PKT0(ring, REG_A2XX_RB_BC_CONTROL, 1);
+   OUT_RING(ring,
+   A2XX_RB_BC_CONTROL_ACCUM_TIMEOUT_SELECT(3) |
+   A2XX_RB_BC_CONTROL_DISABLE_LZ_NULL_ZCMD_DROP |
+   A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE |
+   A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) |
+   A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3));
+   }
+
OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1);
OUT_RING(ring, 0x0002);
 
diff --git a/src/gallium/drivers/freedreno/freedreno_draw.h 
b/src/gallium/drivers/freedreno/freedreno_draw.h
index b293f73b82..ec4b47898d 100644
--- a/src/gallium/drivers/freedreno/freedreno_draw.h
+++ b/src/gallium/drivers/freedreno/freedreno_draw.h
@@ -51,6 +51,8 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring,
uint32_t idx_size, uint32_t idx_offset,
struct pipe_resource *idx_buffer)
 {
+   uint32_t cnt, draw;
+
/* for debug after a lock up, write a unique counter value
 * to scratch7 for each draw, to make it easier to match up
 * register dumps to cmdstream.  The combination of IB
@@ -74,18 +76,26 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring,
OUT_RING(ring, 0);
}
 
-   OUT_PKT3(ring, CP_DRAW_INDX, idx_buffer ? 5 : 3);
-   OUT_RING(ring, 0x00000000);        /* viz query info. */
-   if (vismode == USE_VISIBILITY) {
-   /* leave vis mode blank for now, it will be patched up when
-* we know if we are binning or not
-*/
-   OUT_RINGP(ring, DRAW(primtype, src_sel, idx_type, 0, instances),
-   &batch->draw_patches);
-   } else {
-   OUT_RING(ring, DRAW(primtype, src_sel, idx_type, vismode, 
instances));
+   cnt = idx_buffer ? 5 : 3;
+   draw = DRAW(primtype, src_sel, idx_type, 0, instances);
+
+   if (is_a20x(batch->ctx->screen)) {
+   /* XXX instances field is overwritten */
+   draw &= 0xffff;
+   draw |= count << 16;
+   cnt -= 1;
}
-   OUT_RING(ring, count); /* NumIndices */
+
+   OUT_PKT3(ring, CP_DRAW_INDX, cnt);
+   OUT_RING(ring, 0x00000000);        /* viz query info. */
+   if (vismode == USE_VISIBILITY)
+   OUT_RINGP(ring, draw, &batch->draw_patches);
+   else
+   OUT_RING(ring, draw | DRAW(0, 0, 0, vismode, 0));
+
+   if (!is_a20x(batch->ctx->screen))
+   OUT_RING(ring, count);/* NumIndices */
+
if (idx_buffer) {
OUT_RELOC(ring, fd_resource(idx_buffer)->bo, idx_offset, 0, 0);
OUT_RING (ring, idx_size);
diff --git a/src/gallium/dr