Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX
On 6/10/19 10:00 PM, Brian Masney wrote: On Mon, Jun 10, 2019 at 09:53:25PM -0400, Jonathan Marek wrote: This error doesn't happen on X11 using the mesa master branch. Instead, I get the following error on that branch: ../src/gallium/drivers/freedreno/freedreno_batch.c:424:fd_batch_add_dep: Assertion `!batch_depends_on(dep, batch)' failed. Full disclosure though: I rebuilt the mesa package using the postmarketOS packaging yesterday and it includes a few extra patches for musl libc. https://gitlab.com/postmarketOS/pmaports/tree/master/temp/mesa I don't see anything obvious in those patches that would be related.. but I suspect this type of error is going to be timing related. (Which could ofc be due to musl or something else) but a bit surprised debug_assert() is enabled in debug builds.. it would probably be a "harmless" situation if asserts were not enabled. (note that I do most of my testing with debug builds with asserts enabled.. this is the type of thing that I want to see and fix.. but probably shouldn't matter to end users) I recompiled the master branch of mesa in pmOS with '-Db_ndebug=true' and X11 is now working properly on the Nexus 5. glxgears averages about 59.5 FPS. I'll add a bug report with pmOS to have them add that flag to their mesa build. Fedora added that flag to their builds: https://bugzilla.redhat.com/show_bug.cgi?id=1692426 19.1.0-rc5 still doesn't work for me due to the original error. Brian You probably want '--buildtype=release' instead of '-Db_ndebug=true' According to: https://gitlab.freedesktop.org/mesa/mesa/blob/master/docs/meson.html#L321 -Db_ndebug - This option controls assertions in meson projects. When set to false (the default) assertions are enabled, when set to true they are disabled. This is unrelated to the buildtype; setting the latter to release will not turn off assertions. Brian I always thought release == no assertions, I guess meson has different ideas. You will want '--buildtype=release' anyway for optimizations, etc. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX
On 6/10/19 9:52 PM, Brian Masney wrote: Hi Rob, On Mon, Jun 10, 2019 at 05:10:45PM -0700, Rob Clark wrote: On Mon, Jun 10, 2019 at 3:54 PM Brian Masney wrote: On Mon, Jun 10, 2019 at 06:58:30AM -0700, Rob Clark wrote: On Mon, Jun 10, 2019 at 6:53 AM Rob Clark wrote: On Sat, Jun 8, 2019 at 6:08 PM Brian Masney wrote: Hi, I'm trying to get the GPU working using the Freedreno driver (A330) on the Nexus 5 phone. I'm using kernel 5.2rc3 with some out of tree patches related to the GPU [1] and mesa 19.1.0-rc5 on postmarketOS. When I run glxgears, I see the gears show up for a fraction of a second and then it terminates due to the following error: - shader: MESA_SHADER_FRAGMENT inputs: 1 outputs: 1 uniforms: 0 shared: 0 decl_var uniform INTERP_MODE_NONE sampler2D sampler (0, 0, 0) decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_VAR0, 0, 0) decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0, 0, 0) decl_function main (0 params) impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x /* 0.00 */) vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (1) /* interp_mode=1 */ vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* in_0 */ vec1 32 ssa_3 = deref_var (uniform sampler2D) vec2 32 ssa_4 = vec2 ssa_2.x, ssa_2.y vec4 32 ssa_5 = tex ssa_3 (texture_deref), ssa_3 (sampler_deref), ssa_4 (coord) Unhandled NIR tex src type: 11 This should be getting lowered somewhere.. and I don't *think* it should be a3xx specific. It should be getting lowered in gl_nir_lower_samplers().. which should be called from mesa/st before the driver even sees this shader. Could you build mesa from git w/ latest 19.1, I guess this must have been fixed by now, since other drivers that use nir would hit the same issue. This error doesn't happen on X11 using the mesa master branch. Instead, I get the following error on that branch: ../src/gallium/drivers/freedreno/freedreno_batch.c:424:fd_batch_add_dep: Assertion `!batch_depends_on(dep, batch)' failed. Full disclosure though: I rebuilt the mesa package using the postmarketOS packaging yesterday and it includes a few extra patches for musl libc. https://gitlab.com/postmarketOS/pmaports/tree/master/temp/mesa I don't see anything obvious in those patches that would be related.. but I suspect this type of error is going to be timing related. (Which could ofc be due to musl or something else) but a bit surprised debug_assert() is enabled in debug builds.. it would probably be a "harmless" situation if asserts were not enabled. (note that I do most of my testing with debug builds with asserts enabled.. this is the type of thing that I want to see and fix.. but probably shouldn't matter to end users) I recompiled the master branch of mesa in pmOS with '-Db_ndebug=true' and X11 is now working properly on the Nexus 5. glxgears averages about 59.5 FPS. I'll add a bug report with pmOS to have them add that flag to their mesa build. Fedora added that flag to their builds: https://bugzilla.redhat.com/show_bug.cgi?id=1692426 19.1.0-rc5 still doesn't work for me due to the original error. Brian You probably want '--buildtype=release' instead of '-Db_ndebug=true' ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX
On 6/9/19 8:41 AM, Brian Masney wrote: On Sat, Jun 08, 2019 at 10:58:11PM -0400, Jonathan Marek wrote: Hi, It's possible 19.1 has another issue, I only tested the master branch with my fix. I would suggest trying 19.0 or the master branch. The mesa master branch and 19.0.6 both give the following error when glxgears starts up: ../src/gallium/drivers/freedreno/freedreno_batch.c:424:fd_batch_add_dep: Assertion `!batch_depends_on(dep, batch)' failed. No one is testing freedreno+X11 AFAIK. This would affect all adrenos too, not just a3xx. I can look into it at some point, if no one else does. To test if the GPU works at all you should use kmscube. If that works then you can try wayland/weston, or if you really need X11 IIRC 18.1 was working with X11. FYI, I haven't pushed it anywhere but I recently rebased my Nexus 5 patches from last year (and been looking at getting call audio working). Fantastic! Brian On 6/8/19 9:08 PM, Brian Masney wrote: Hi, I'm trying to get the GPU working using the Freedreno driver (A330) on the Nexus 5 phone. I'm using kernel 5.2rc3 with some out of tree patches related to the GPU [1] and mesa 19.1.0-rc5 on postmarketOS. When I run glxgears, I see the gears show up for a fraction of a second and then it terminates due to the following error: - shader: MESA_SHADER_FRAGMENT inputs: 1 outputs: 1 uniforms: 0 shared: 0 decl_var uniform INTERP_MODE_NONE sampler2D sampler (0, 0, 0) decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_VAR0, 0, 0) decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0, 0, 0) decl_function main (0 params) impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x /* 0.00 */) vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (1) /* interp_mode=1 */ vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* in_0 */ vec1 32 ssa_3 = deref_var (uniform sampler2D) vec2 32 ssa_4 = vec2 ssa_2.x, ssa_2.y vec4 32 ssa_5 = tex ssa_3 (texture_deref), ssa_3 (sampler_deref), ssa_4 (coord) Unhandled NIR tex src type: 11 intrinsic store_output (ssa_5, ssa_0) (0, 15, 0) /* base=0 */ /* wrmask=xyzw */ /* component=0 */ /* out_0 */ /* succs: block_1 */ block block_1: } Assertion failed: !"" (../src/freedreno/ir3/ir3_context.c: ir3_context_error: 407) - I verified that the mesa 19.1.0-rc5 release contains this recent a3xx fix from Jonathan: https://gitlab.freedesktop.org/mesa/mesa/commit/1db86d8b62860380c34af77ae62b019ed2376443 Any suggestions? [1] https://github.com/masneyb/linux/commits/v5.2-rc3-nexus5-gpu-wip The GPU specific patches start at Rob's patch 'qcom-scm: add support to restore secure config' on that list. I submitted the patches below that a few weeks ago to the upstream kernel and I expect they'll be merged. Once I have a working GPU, I plan to start working on the interconnect support in the kernel for msm8974 so that the clock hacks can be dropped. Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] freedreno: 'Unhandled NIR tex src type: 11' on A3XX
Hi, It's possible 19.1 has another issue, I only tested the master branch with my fix. I would suggest trying 19.0 or the master branch. FYI, I haven't pushed it anywhere but I recently rebased my Nexus 5 patches from last year (and been looking at getting call audio working). Jonathan On 6/8/19 9:08 PM, Brian Masney wrote: Hi, I'm trying to get the GPU working using the Freedreno driver (A330) on the Nexus 5 phone. I'm using kernel 5.2rc3 with some out of tree patches related to the GPU [1] and mesa 19.1.0-rc5 on postmarketOS. When I run glxgears, I see the gears show up for a fraction of a second and then it terminates due to the following error: - shader: MESA_SHADER_FRAGMENT inputs: 1 outputs: 1 uniforms: 0 shared: 0 decl_var uniform INTERP_MODE_NONE sampler2D sampler (0, 0, 0) decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_VAR0, 0, 0) decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0, 0, 0) decl_function main (0 params) impl main { block block_0: /* preds: */ vec1 32 ssa_0 = load_const (0x /* 0.00 */) vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (1) /* interp_mode=1 */ vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* in_0 */ vec1 32 ssa_3 = deref_var (uniform sampler2D) vec2 32 ssa_4 = vec2 ssa_2.x, ssa_2.y vec4 32 ssa_5 = tex ssa_3 (texture_deref), ssa_3 (sampler_deref), ssa_4 (coord) Unhandled NIR tex src type: 11 intrinsic store_output (ssa_5, ssa_0) (0, 15, 0) /* base=0 */ /* wrmask=xyzw */ /* component=0 */ /* out_0 */ /* succs: block_1 */ block block_1: } Assertion failed: !"" (../src/freedreno/ir3/ir3_context.c: ir3_context_error: 407) - I verified that the mesa 19.1.0-rc5 release contains this recent a3xx fix from Jonathan: https://gitlab.freedesktop.org/mesa/mesa/commit/1db86d8b62860380c34af77ae62b019ed2376443 Any suggestions? [1] https://github.com/masneyb/linux/commits/v5.2-rc3-nexus5-gpu-wip The GPU specific patches start at Rob's patch 'qcom-scm: add support to restore secure config' on that list. I submitted the patches below that a few weeks ago to the upstream kernel and I expect they'll be merged. Once I have a working GPU, I plan to start working on the interconnect support in the kernel for msm8974 so that the clock hacks can be dropped. Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true
There's no updated series yet. This patch will work on its own and the issue that was pointed out doesn't affect behavior at all. On 1/7/19 4:47 PM, Lionel Landwerlin wrote: I did not but then saw someone pointed out an issue with this particular patch. I can do tomorrow. Do you have link to the updated series? Thanks, - Lionel On 07/01/2019 16:54, Jonathan Marek wrote: Hi, Did you get a chance try this? If not, I might be able to try it myself as I have Intel HW. On 12/19/18 12:34 PM, Lionel Landwerlin wrote: Hey Jonathan, I'm kind of curious as to whether we can have a single expression that pretty much generates the same final code (through some of the algebraic lowering/optimizations). I'll give it a try on Intel HW, see what it does. - Lionel On 19/12/2018 16:39, Jonathan Marek wrote: When ffma is available, we can use a different arrangement of constants to get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7 scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma. Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_lower_tex.c | 62 ++-- 1 file changed, 43 insertions(+), 19 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c index 6a6b6c41a7..f7c821bb34 100644 --- a/src/compiler/nir/nir_lower_tex.c +++ b/src/compiler/nir/nir_lower_tex.c @@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex, nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v, nir_ssa_def *a) { - nir_const_value m[3] = { - { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, - { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, - { .f32 = { 1.0f, 2.01723214f, 0.0f, 0.0f } } - }; - - nir_ssa_def *yuv = - nir_vec4(b, - nir_fmul(b, nir_imm_float(b, 1.16438356f), - nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), - nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_imm_float(b, 0.0)); - - nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); - nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); - nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); - - nir_ssa_def *result = nir_vec4(b, red, green, blue, a); + nir_ssa_def *result; + + + if (b->shader->options->fuse_ffma) { + nir_const_value m[4] = { + { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } }, + { .f32 = { 0.0f, -0.39176229f, 2.01723214f, 0.0f } }, + { .f32 = { 1.59602678f,-0.81296764f, 0.0f, 0.0f } }, + }; + static const float y_off = -16.0f * 1.16438356f / 255.0f; + static const float sc = 128.0f / 255.0f; + + nir_ssa_def *offset = + nir_vec4(b, + nir_imm_float(b, y_off - sc * 1.59602678f), + nir_imm_float(b, y_off + sc * (0.81296764f + 0.39176229f)), + nir_imm_float(b, y_off - sc * 2.01723214f), + a); + + result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]), + nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]), + nir_ffma(b, v, nir_build_imm(b, 4, 32, m[2]), offset))); + } else { + nir_const_value m[3] = { + { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, + { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, + { .f32 = { 1.0f, 2.01723214f, 0.0f, 0.0f } } + }; + + nir_ssa_def *yuv = + nir_vec4(b, + nir_fmul(b, nir_imm_float(b, 1.16438356f), + nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), + nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_imm_float(b, 0.0)); + + nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); + nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); + nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); + + result = nir_vec4(b, red, green, blue, a); + } nir_ssa_def_rewrite_uses(>dest.ssa, nir_src_for_ssa(result)); } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true
Hi, Did you get a chance try this? If not, I might be able to try it myself as I have Intel HW. On 12/19/18 12:34 PM, Lionel Landwerlin wrote: Hey Jonathan, I'm kind of curious as to whether we can have a single expression that pretty much generates the same final code (through some of the algebraic lowering/optimizations). I'll give it a try on Intel HW, see what it does. - Lionel On 19/12/2018 16:39, Jonathan Marek wrote: When ffma is available, we can use a different arrangement of constants to get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7 scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma. Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_lower_tex.c | 62 ++-- 1 file changed, 43 insertions(+), 19 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c index 6a6b6c41a7..f7c821bb34 100644 --- a/src/compiler/nir/nir_lower_tex.c +++ b/src/compiler/nir/nir_lower_tex.c @@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex, nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v, nir_ssa_def *a) { - nir_const_value m[3] = { - { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, - { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, - { .f32 = { 1.0f, 2.01723214f, 0.0f, 0.0f } } - }; - - nir_ssa_def *yuv = - nir_vec4(b, - nir_fmul(b, nir_imm_float(b, 1.16438356f), - nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), - nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_imm_float(b, 0.0)); - - nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); - nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); - nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); - - nir_ssa_def *result = nir_vec4(b, red, green, blue, a); + nir_ssa_def *result; + + + if (b->shader->options->fuse_ffma) { + nir_const_value m[4] = { + { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } }, + { .f32 = { 0.0f, -0.39176229f, 2.01723214f, 0.0f } }, + { .f32 = { 1.59602678f,-0.81296764f, 0.0f, 0.0f } }, + }; + static const float y_off = -16.0f * 1.16438356f / 255.0f; + static const float sc = 128.0f / 255.0f; + + nir_ssa_def *offset = + nir_vec4(b, + nir_imm_float(b, y_off - sc * 1.59602678f), + nir_imm_float(b, y_off + sc * (0.81296764f + 0.39176229f)), + nir_imm_float(b, y_off - sc * 2.01723214f), + a); + + result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]), + nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]), + nir_ffma(b, v, nir_build_imm(b, 4, 32, m[2]), offset))); + } else { + nir_const_value m[3] = { + { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, + { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, + { .f32 = { 1.0f, 2.01723214f, 0.0f, 0.0f } } + }; + + nir_ssa_def *yuv = + nir_vec4(b, + nir_fmul(b, nir_imm_float(b, 1.16438356f), + nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), + nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_imm_float(b, 0.0)); + + nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); + nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); + nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); + + result = nir_vec4(b, red, green, blue, a); + } nir_ssa_def_rewrite_uses(>dest.ssa, nir_src_for_ssa(result)); } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true
On 12/20/2018 01:28 AM, Nils Wallménius wrote: Den ons 19 dec. 2018 17:44 skrev Jonathan Marek : When ffma is available, we can use a different arrangement of constants to get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7 scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma. Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_lower_tex.c | 62 ++-- 1 file changed, 43 insertions(+), 19 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c index 6a6b6c41a7..f7c821bb34 100644 --- a/src/compiler/nir/nir_lower_tex.c +++ b/src/compiler/nir/nir_lower_tex.c @@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex, nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v, nir_ssa_def *a) { - nir_const_value m[3] = { - { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, - { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, - { .f32 = { 1.0f, 2.01723214f, 0.0f,0.0f } } - }; - - nir_ssa_def *yuv = - nir_vec4(b, - nir_fmul(b, nir_imm_float(b, 1.16438356f), -nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), - nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_imm_float(b, 0.0)); - - nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); - nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); - nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); - - nir_ssa_def *result = nir_vec4(b, red, green, blue, a); + nir_ssa_def *result; + + + if (b->shader->options->fuse_ffma) { + nir_const_value m[4] = { Drive-by comment, but shouldn't this^ be m[3]? Regards Nils Yes, it should be m[3]. It was originally 4 before alpha was added. + { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } }, + { .f32 = { 0.0f, -0.39176229f, 2.01723214f, 0.0f } }, + { .f32 = { 1.59602678f,-0.81296764f, 0.0f,0.0f } }, + }; + static const float y_off = -16.0f * 1.16438356f / 255.0f; + static const float sc = 128.0f / 255.0f; + + nir_ssa_def *offset = + nir_vec4(b, + nir_imm_float(b, y_off - sc * 1.59602678f), + nir_imm_float(b, y_off + sc * (0.81296764f + 0.39176229f)), + nir_imm_float(b, y_off - sc * 2.01723214f), + a); + + result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]), + nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]), +nir_ffma(b, v, nir_build_imm(b, 4, 32, m[2]), offset))); + } else { + nir_const_value m[3] = { + { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, + { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, + { .f32 = { 1.0f, 2.01723214f, 0.0f,0.0f } } + }; + + nir_ssa_def *yuv = + nir_vec4(b, + nir_fmul(b, nir_imm_float(b, 1.16438356f), + nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), + nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_imm_float(b, 0.0)); + + nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); + nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); + nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); + + result = nir_vec4(b, red, green, blue, a); + } nir_ssa_def_rewrite_uses(>dest.ssa, nir_src_for_ssa(result)); } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/16] glsl/nir: int constants as float for native_integers=false
I haven't encountered such dereference issues, but lowering integers later is a good idea (as with bools which are now lowered later). On 12/19/2018 01:22 PM, Eric Anholt wrote: Jonathan Marek writes: Note: the backend must take care that uniform index is now a float This makes me think that lowering ints to float should be done near the end of the compile (followed by maybe an algebraic and a dce). As is, I think nir_lower_io() is going to do bad things to dereferences of i/o arrays. That said, it looks like this will be fixing way more than it regresses, so I would go along with it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/16] freedreno: a2xx: enable early-Z testing
Hi, I didn't verify it, but both r600 and a3xx disable earlyZ when alpha test is enabled, so this is almost certainly right. We don't need to worry about the shader writing Z, it is not part of OpenGL ES 2.0 and not implemented by the driver (although the hardware should allow it). Why should we need to check if the shader does discards? On 12/19/2018 01:05 PM, Eric Anholt wrote: Jonathan Marek writes: Enable earlyZ when alpha test is disabled. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_zsa.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c index 64b31b677b..d3c19b4450 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c @@ -49,7 +49,8 @@ fd2_zsa_state_create(struct pipe_context *pctx, A2XX_RB_DEPTHCONTROL_ZFUNC(cso->depth.func); /* maps 1:1 */ if (cso->depth.enabled) - so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE; + so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE | + COND(!cso->alpha.enabled, A2XX_RB_DEPTHCONTROL_EARLY_Z_ENABLE); Why when alpha test is disabled? Should you also be checking if the shader does discards? How about if the shader writes Z, is anything preventing early Z then? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/16] nir: add nir_lower_bool_to_float
Hi, No I did not see that. That version should work for me, although I don't like the lowering of nir_op_inot it has, since the backend might have something smarter to implement a "fnot" (and ior could also just be a fmax instead). On 12/19/2018 12:44 PM, Christian Gmeiner wrote: Am Mi., 19. Dez. 2018 um 17:44 Uhr schrieb Jonathan Marek : Mainly a copy of nir_lower_bool_to_int32, but with float opcodes. Hmmm.. are you aware of https://patchwork.freedesktop.org/patch/257867/ and https://gitlab.freedesktop.org/jekstrand/mesa/commit/cf819c8a3fa99ccedf423ea77cf710dbd852066b ? I am going to send out a lager patch series with that version of bool to float during my christmas break. Keep in mind that I did not looked very closely at your lowering pass. Signed-off-by: Jonathan Marek --- src/compiler/Makefile.sources | 1 + src/compiler/nir/meson.build | 3 +- src/compiler/nir/nir.h | 1 + src/compiler/nir/nir_lower_bool_to_float.c | 165 + 4 files changed, 169 insertions(+), 1 deletion(-) create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources index ef47bdb33b..39eaedc658 100644 --- a/src/compiler/Makefile.sources +++ b/src/compiler/Makefile.sources @@ -231,6 +231,7 @@ NIR_FILES = \ nir/nir_lower_atomics_to_ssbo.c \ nir/nir_lower_bitmap.c \ nir/nir_lower_bit_size.c \ + nir/nir_lower_bool_to_float.c \ nir/nir_lower_bool_to_int32.c \ nir/nir_lower_clamp_color_outputs.c \ nir/nir_lower_clip.c \ diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build index e252f64539..f1016104af 100644 --- a/src/compiler/nir/meson.build +++ b/src/compiler/nir/meson.build @@ -114,6 +114,7 @@ files_libnir = files( 'nir_lower_alpha_test.c', 'nir_lower_atomics_to_ssbo.c', 'nir_lower_bitmap.c', + 'nir_lower_bool_to_float.c', 'nir_lower_bool_to_int32.c', 'nir_lower_clamp_color_outputs.c', 'nir_lower_clip.c', @@ -248,7 +249,7 @@ if with_tests include_directories : [inc_common], dependencies : [dep_thread, idep_gtest, idep_nir], link_with : libmesa_util, -), +), suite : ['compiler', 'nir'], ) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 54f9c64a3a..f6d0bdf7ec 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2905,6 +2905,7 @@ void nir_lower_alpha_test(nir_shader *shader, enum compare_func func, bool alpha_to_one); bool nir_lower_alu(nir_shader *shader); bool nir_lower_alu_to_scalar(nir_shader *shader); +bool nir_lower_bool_to_float(nir_shader *shader); bool nir_lower_bool_to_int32(nir_shader *shader); bool nir_lower_load_const_to_scalar(nir_shader *shader); bool nir_lower_read_invocation_to_scalar(nir_shader *shader); diff --git a/src/compiler/nir/nir_lower_bool_to_float.c b/src/compiler/nir/nir_lower_bool_to_float.c new file mode 100644 index 00..2756a1815f --- /dev/null +++ b/src/compiler/nir/nir_lower_bool_to_float.c @@ -0,0 +1,165 @@ +/* + * Copyright © 2018 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "nir.h" + +static bool +assert_ssa_def_is_not_1bit(nir_ssa_def *def, UNUSED void *unused) +{ + assert(def->bit_size > 1); + return true; +} + +static bool +rewrite_1bit_ssa_def_to_32bit(nir_ssa_def *def, void *_progress) +{ + bool *progress = _progress; + if (def->bit_size == 1) { + def->bit_size = 32; + *progress = true; + } + return true; +} + +static bool +lower_alu_instr(nir_alu_instr *alu) +{ + const nir_op_info *op_info = _op_infos[alu->op]; + + switch (alu->op) { + case nir_op_vec2: + case nir_op_vec3: + case nir_op_vec4: + /* Thes
[Mesa-dev] [PATCH 11/16] freedreno: a2xx: use fd_resource_offset for base in emit_texture
Fixup for the texture update patch. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index ce275a78a6..ac2a02dfae 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -137,7 +137,7 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context *ctx, OUT_RING(ring, sampler->tex0 | view->tex0); if (rsc) - OUT_RELOC(ring, rsc->bo, 0, view->tex1, 0); + OUT_RELOC(ring, rsc->bo, fd_resource_offset(rsc, 0, 0), view->tex1, 0); else OUT_RING(ring, 0); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/16] freedreno: a2xx: sysmem rendering
Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 62 +++ 1 file changed, 62 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index d9aad16b4a..77c8d80055 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -367,6 +367,67 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile) /* TODO blob driver seems to toss in a CACHE_FLUSH after each DRAW_INDX.. */ } +static void +fd2_emit_sysmem_prep(struct fd_batch *batch) +{ + struct fd_context *ctx = batch->ctx; + struct fd_ringbuffer *ring = batch->gmem; + struct pipe_framebuffer_state *pfb = >framebuffer; + struct pipe_surface *psurf = pfb->cbufs[0]; + + if (!psurf) + return; + + struct fd_resource *rsc = fd_resource(psurf->texture); + struct fd_resource_slice *slice = + fd_resource_slice(rsc, psurf->u.tex.level); + uint32_t offset = + fd_resource_offset(rsc, psurf->u.tex.level, psurf->u.tex.first_layer); + + assert((slice->pitch & 31) == 0); + assert((offset & 0xfff) == 0); + + fd2_emit_restore(ctx, ring); + + OUT_PKT0(ring, REG_A2XX_COHER_SIZE_PM4, 1); + OUT_RING(ring, slice->size0); + OUT_PKT0(ring, REG_A2XX_COHER_STATUS_PM4, 1); + OUT_RING(ring, 0x02000200); + OUT_PKT0(ring, REG_A2XX_COHER_BASE_PM4, 1); + OUT_RELOCW(ring, rsc->bo, offset, 0, 0); + + OUT_PKT3(ring, CP_WAIT_REG_EQ, 4); + OUT_RING(ring, REG_A2XX_COHER_STATUS_PM4); + OUT_RING(ring, 0); + OUT_RING(ring, 0x8000); + OUT_RING(ring, 1); + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_RB_SURFACE_INFO)); + OUT_RING(ring, A2XX_RB_SURFACE_INFO_SURFACE_PITCH(slice->pitch)); + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_INFO)); + OUT_RELOCW(ring, rsc->bo, offset, A2XX_RB_COLOR_INFO_LINEAR | + A2XX_RB_COLOR_INFO_SWAP(fmt2swap(psurf->format)) | + A2XX_RB_COLOR_INFO_FORMAT(fd2_pipe2color(psurf->format)), 0); + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_COHER_DEST_BASE_0)); + OUT_RELOCW(ring, rsc->bo, offset, 0, 0); + + OUT_PKT3(ring, CP_SET_CONSTANT, 3); + OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_SCREEN_SCISSOR_TL)); + OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_TL_WINDOW_OFFSET_DISABLE); + OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_BR_X(pfb->width) | + A2XX_PA_SC_SCREEN_SCISSOR_BR_Y(pfb->height)); + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_WINDOW_OFFSET)); + OUT_RING(ring, A2XX_PA_SC_WINDOW_OFFSET_X(0) | + A2XX_PA_SC_WINDOW_OFFSET_Y(0)); +} + /* before first tile */ static void fd2_emit_tile_init(struct fd_batch *batch) @@ -440,6 +501,7 @@ fd2_gmem_init(struct pipe_context *pctx) { struct fd_context *ctx = fd_context(pctx); + ctx->emit_sysmem_prep = fd2_emit_sysmem_prep; ctx->emit_tile_init = fd2_emit_tile_init; ctx->emit_tile_prep = fd2_emit_tile_prep; ctx->emit_tile_mem2gmem = fd2_emit_tile_mem2gmem; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 16/16] freedreno: a2xx: a20x hw binning
--- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 32 +++- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 52 ++ src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 3 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 150 ++ .../drivers/freedreno/a2xx/fd2_program.c | 11 +- .../drivers/freedreno/freedreno_batch.c | 3 + .../drivers/freedreno/freedreno_batch.h | 7 + .../drivers/freedreno/freedreno_draw.h| 3 + .../drivers/freedreno/freedreno_gmem.c| 29 +++- .../drivers/freedreno/freedreno_gmem.h| 1 + .../drivers/freedreno/freedreno_screen.h | 6 + 11 files changed, 281 insertions(+), 16 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 4e91267080..d3e440d144 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -75,11 +75,12 @@ emit_vertexbufs(struct fd_context *ctx) // CONST(20,0) (or CONST(26,0) in soliv_vp) fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements); + fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, vtx->num_elements); } static void draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, - struct fd_ringbuffer *ring, unsigned index_offset) + struct fd_ringbuffer *ring, unsigned index_offset, bool binning) { OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); @@ -119,8 +120,22 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ } + /* binning shader will take offset from C64 */ + if (binning && is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0180); + OUT_RING(ring, fui(ctx->batch->num_vertices)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + } + + enum pc_di_vis_cull_mode vismode = USE_VISIBILITY; + if (binning || info->mode == PIPE_PRIM_POINTS) + vismode = IGNORE_VISIBILITY; + fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], -IGNORE_VISIBILITY, info, index_offset); +vismode, info, index_offset); if (is_a20x(ctx->screen)) { /* not sure why this is required, but it fixes some hangs */ @@ -145,6 +160,9 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, if (ctx->dirty & FD_DIRTY_VTXBUF) emit_vertexbufs(ctx); + if (!(fd_mesa_debug & FD_DBG_NOBIN)) + fd2_emit_state_binning(ctx, ctx->dirty); + fd2_emit_state(ctx, ctx->dirty); /* a2xx can draw only 65535 vertices at once @@ -166,17 +184,23 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, struct pipe_draw_info info = *pinfo; unsigned count = info.count; unsigned step = step_tbl[info.mode]; + unsigned num_vertices = ctx->batch->num_vertices; if (!step) return false; for (; count + step > 32766; count -= step) { info.count = MIN2(count, 32766); - draw_impl(ctx, , ctx->batch->draw, index_offset); + draw_impl(ctx, , ctx->batch->draw, index_offset, false); + draw_impl(ctx, , ctx->batch->binning, index_offset, true); info.start += step; + ctx->batch->num_vertices += step; } + /* changing this value is a hack, restore it */ + ctx->batch->num_vertices = num_vertices; } else { - draw_impl(ctx, pinfo, ctx->batch->draw, index_offset); + draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); + draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); } fd_context_all_clean(ctx); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 9628f26736..7371fa6e8c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -185,6 +185,58 @@ fd2_emit_vertex_bufs(struct fd_ringbuffer *ring, uint32_t val, } } +void +fd2_emit_state_binning(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) +{ + struct fd2_blend_stateobj *blend = fd2_blend_stateobj(ctx->blend); + struct fd_ringbuffer *ring = ctx->batch->binning; + + /* subset of fd2_emit_state needed for hw binning on a20x */ + + if (dirty & (FD_DIRTY_PROG | FD_DIRTY_VTXSTATE)) +
[Mesa-dev] [PATCH 07/16] freedreno: a2xx: improve REG_A2XX_PA_CL_VTE_CNTL management
Doesn't change much, but reduces the size of fd2_emit_state gmem2mem does not need to change the value: no Z clipping on resolve mem2gmem now needs to restore the common value after rendering Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 20 +-- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 18 + 2 files changed, 20 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 60bc9fad4c..7dcd31cbcb 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -272,16 +272,6 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) OUT_RING(ring, fui(ctx->viewport.translate[1])); /* PA_CL_VPORT_YOFFSET */ OUT_RING(ring, fui(ctx->viewport.scale[2])); /* PA_CL_VPORT_ZSCALE */ OUT_RING(ring, fui(ctx->viewport.translate[2])); /* PA_CL_VPORT_ZOFFSET */ - - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL)); - OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT | - A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_Z_SCALE_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_Z_OFFSET_ENA); } if (dirty & (FD_DIRTY_PROG | FD_DIRTY_VTXSTATE | FD_DIRTY_TEXSTATE)) { @@ -475,6 +465,16 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) OUT_RING(ring, 0x);/* RB_BLEND_GREEN */ OUT_RING(ring, 0x);/* RB_BLEND_BLUE */ OUT_RING(ring, 0x00ff);/* RB_BLEND_ALPHA */ + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL)); + OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT | + A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Z_SCALE_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Z_OFFSET_ENA); } static void diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index e98ae7334a..3c54e2c6c0 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -156,14 +156,6 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) OUT_RING(ring, xy2d(0, 0)); /* PA_SC_WINDOW_SCISSOR_TL */ OUT_RING(ring, xy2d(pfb->width, pfb->height));/* PA_SC_WINDOW_SCISSOR_BR */ - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL)); - OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT | - A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA | - A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_CLIP_CNTL)); OUT_RING(ring, 0x); @@ -350,6 +342,16 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile) if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_COLOR)) emit_mem2gmem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]); + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VTE_CNTL)); + OUT_RING(ring, A2XX_PA_CL_VTE_CNTL_VTX_W0_FMT | + A2XX_PA_CL_VTE_CNTL_VPORT_X_SCALE_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_X_OFFSET_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Y_SCALE_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Y_OFFSET_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Z_SCALE_ENA | + A2XX_PA_CL_VTE_CNTL_VPORT_Z_OFFSET_ENA); + /* TODO blob driver seems to toss in a CACHE_FLUSH after each DRAW_INDX.. */ } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/16] nir: improve convert_yuv_to_rgb when fuse_ffma=true
When ffma is available, we can use a different arrangement of constants to get a better result. On freedreno/ir3, this reduces the YUV->RGB to 7 scalar ffma. On freedreno/a2xx, it will allow YUV->RGB to be 3 vec4 ffma. Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_lower_tex.c | 62 ++-- 1 file changed, 43 insertions(+), 19 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c index 6a6b6c41a7..f7c821bb34 100644 --- a/src/compiler/nir/nir_lower_tex.c +++ b/src/compiler/nir/nir_lower_tex.c @@ -342,25 +342,49 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex, nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v, nir_ssa_def *a) { - nir_const_value m[3] = { - { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, - { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, - { .f32 = { 1.0f, 2.01723214f, 0.0f,0.0f } } - }; - - nir_ssa_def *yuv = - nir_vec4(b, - nir_fmul(b, nir_imm_float(b, 1.16438356f), -nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), - nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), - nir_imm_float(b, 0.0)); - - nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); - nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); - nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); - - nir_ssa_def *result = nir_vec4(b, red, green, blue, a); + nir_ssa_def *result; + + + if (b->shader->options->fuse_ffma) { + nir_const_value m[4] = { + { .f32 = { 1.16438356f, 1.16438356f, 1.16438356f, 0.0f } }, + { .f32 = { 0.0f, -0.39176229f, 2.01723214f, 0.0f } }, + { .f32 = { 1.59602678f,-0.81296764f, 0.0f,0.0f } }, + }; + static const float y_off = -16.0f * 1.16438356f / 255.0f; + static const float sc = 128.0f / 255.0f; + + nir_ssa_def *offset = + nir_vec4(b, + nir_imm_float(b, y_off - sc * 1.59602678f), + nir_imm_float(b, y_off + sc * (0.81296764f + 0.39176229f)), + nir_imm_float(b, y_off - sc * 2.01723214f), + a); + + result = nir_ffma(b, y, nir_build_imm(b, 4, 32, m[0]), + nir_ffma(b, u, nir_build_imm(b, 4, 32, m[1]), +nir_ffma(b, v, nir_build_imm(b, 4, 32, m[2]), offset))); + } else { + nir_const_value m[3] = { + { .f32 = { 1.0f, 0.0f, 1.59602678f, 0.0f } }, + { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } }, + { .f32 = { 1.0f, 2.01723214f, 0.0f,0.0f } } + }; + + nir_ssa_def *yuv = + nir_vec4(b, + nir_fmul(b, nir_imm_float(b, 1.16438356f), + nir_fadd(b, y, nir_imm_float(b, -16.0f / 255.0f))), + nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -128.0f / 255.0f)), 0), + nir_imm_float(b, 0.0)); + + nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0])); + nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1])); + nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2])); + + result = nir_vec4(b, red, green, blue, a); + } nir_ssa_def_rewrite_uses(>dest.ssa, nir_src_for_ssa(result)); } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/16] nir: combine fmul and fadd across ffma operations
This works by moving the fadd up across the ffma operations, so that it can eventually can be combined with a fmul. I'm not sure it works in all cases, but it works in all the common cases. This will only affect freedreno since it is the only driver using the fuse_ffma option. Example: matrix * vec4(coord, 1.0) is compiled as: fmul, ffma, ffma, fadd and with this patch: ffma, ffma, ffma Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_opt_algebraic.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 506d45e55b..97a6c0d8dc 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -137,6 +137,7 @@ optimizations = [ (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', a, b, c), '!options->lower_flrp64'), (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'), (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'), + (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d)), 'options->fuse_ffma'), (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d)), (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)), -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 15/16] freedreno: a2xx: add partial lower_scalar pass for ir2
Some instructions can only be scalar on a2xx, lower these only Signed-off-by: Jonathan Marek --- .../drivers/freedreno/Makefile.sources| 1 + src/gallium/drivers/freedreno/a2xx/ir2_nir.c | 3 + .../freedreno/a2xx/ir2_nir_lower_scalar.c | 174 ++ .../drivers/freedreno/a2xx/ir2_private.h | 1 + src/gallium/drivers/freedreno/meson.build | 1 + 5 files changed, 180 insertions(+) create mode 100644 src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources index f4979953e8..fed5b5bd17 100644 --- a/src/gallium/drivers/freedreno/Makefile.sources +++ b/src/gallium/drivers/freedreno/Makefile.sources @@ -73,6 +73,7 @@ a2xx_SOURCES := \ a2xx/ir2_assemble.c \ a2xx/ir2_cp.c \ a2xx/ir2_nir.c \ + a2xx/ir2_nir_lower_scalar.c \ a2xx/ir2_private.h \ a2xx/ir2_ra.c diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c index 8162479341..10fa0c765f 100644 --- a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c +++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c @@ -1124,6 +1124,9 @@ ir2_nir_compile(struct ir2_context *ctx, bool binning) OPT_V(ctx->nir, nir_lower_bool_to_float); + /* lower to scalar instructions that can only be scalar on a2xx */ + OPT_V(ctx->nir, ir2_nir_lower_scalar); + OPT_V(ctx->nir, nir_lower_locals_to_regs); OPT_V(ctx->nir, nir_convert_from_ssa, true); diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c new file mode 100644 index 00..2b72a86b3e --- /dev/null +++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c @@ -0,0 +1,174 @@ +/* + * Copyright (C) 2018 Jonathan Marek + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Authors: + *Jonathan Marek + */ + +/* some operations can only be scalar on a2xx: + * rsq, rcp, log2, exp2, cos, sin, sqrt + * mostly copy-pasted from nir_lower_alu_to_scalar.c + */ + +#include "ir2_private.h" +#include "compiler/nir/nir_builder.h" + +static void +nir_alu_ssa_dest_init(nir_alu_instr * instr, unsigned num_components, + unsigned bit_size) +{ + nir_ssa_dest_init(>instr, >dest.dest, num_components, + bit_size, NULL); + instr->dest.write_mask = (1 << num_components) - 1; +} + +static void +lower_reduction(nir_alu_instr * instr, nir_op chan_op, nir_op merge_op, + nir_builder * builder) +{ + unsigned num_components = nir_op_infos[instr->op].input_sizes[0]; + + nir_ssa_def *last = NULL; + for (unsigned i = 0; i < num_components; i++) { + nir_alu_instr *chan = + nir_alu_instr_create(builder->shader, chan_op); + nir_alu_ssa_dest_init(chan, 1, instr->dest.dest.ssa.bit_size); + nir_alu_src_copy(>src[0], >src[0], chan); + chan->src[0].swizzle[0] = chan->src[0].swizzle[i]; + if (nir_op_infos[chan_op].num_inputs > 1) { + assert(nir_op_infos[chan_op].num_inputs == 2); + nir_alu_src_copy(>src[1], >src[1], chan); + chan->src[1].swizzle[0] = chan->src[1].swizzle[i]; + } + chan->exact = instr->exact; + + nir_builder_instr_insert(builder, >instr); + + if (i == 0) { + last = >dest.dest.ssa; + } else { + last = nir_build_alu(builder, merge_op, +
[Mesa-dev] [PATCH 14/16] freedreno: a2xx: add ir2 copy propagation
Two cases: * replacing srcs which refer to MOV instructions * replacing MOVs used to write to exports Signed-off-by: Jonathan Marek --- .../drivers/freedreno/Makefile.sources| 1 + src/gallium/drivers/freedreno/a2xx/ir2.c | 6 + src/gallium/drivers/freedreno/a2xx/ir2_cp.c | 225 ++ .../drivers/freedreno/a2xx/ir2_private.h | 3 + src/gallium/drivers/freedreno/meson.build | 1 + 5 files changed, 236 insertions(+) create mode 100644 src/gallium/drivers/freedreno/a2xx/ir2_cp.c diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources index 8421318081..f4979953e8 100644 --- a/src/gallium/drivers/freedreno/Makefile.sources +++ b/src/gallium/drivers/freedreno/Makefile.sources @@ -71,6 +71,7 @@ a2xx_SOURCES := \ a2xx/ir2.c \ a2xx/ir2.h \ a2xx/ir2_assemble.c \ + a2xx/ir2_cp.c \ a2xx/ir2_nir.c \ a2xx/ir2_private.h \ a2xx/ir2_ra.c diff --git a/src/gallium/drivers/freedreno/a2xx/ir2.c b/src/gallium/drivers/freedreno/a2xx/ir2.c index 344f62defe..bc1d7c23b8 100644 --- a/src/gallium/drivers/freedreno/a2xx/ir2.c +++ b/src/gallium/drivers/freedreno/a2xx/ir2.c @@ -422,9 +422,15 @@ ir2_compile(struct fd2_shader_stateobj *so, unsigned variant, /* convert nir to internal representation */ ir2_nir_compile(, binning); + /* copy propagate srcs */ + cp_src(); + /* get ref_counts and kill non-needed instructions */ ra_count_refs(); + /* remove movs used to write outputs */ + cp_export(); + /* instruction order.. and vector->scalar conversions */ schedule_instrs(); diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_cp.c b/src/gallium/drivers/freedreno/a2xx/ir2_cp.c new file mode 100644 index 00..fa155887f8 --- /dev/null +++ b/src/gallium/drivers/freedreno/a2xx/ir2_cp.c @@ -0,0 +1,225 @@ +/* + * Copyright (C) 2018 Jonathan Marek + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Authors: + *Jonathan Marek + */ + +#include "ir2_private.h" + +static bool is_mov(struct ir2_instr *instr) +{ + return instr->type == IR2_ALU && instr->alu.vector_opc == MAXv && + instr->src_count == 1; +} + +static void src_combine(struct ir2_src *src, struct ir2_src b) +{ + src->num = b.num; + src->type = b.type; + src->swizzle = swiz_merge(b.swizzle, src->swizzle); + if (!src->abs) /* if we have abs we don't care about previous negate */ + src->negate ^= b.negate; + src->abs |= b.abs; +} + +/* cp_src: replace src regs when they refer to a mov instruction + * example: + * ALU: MAXvR7 = C7, C7 + * ALU: MULADDv R7 = R7, R10, R0. + * becomes: + * ALU: MULADDv R7 = C7, R10, R0. + */ +void cp_src(struct ir2_context *ctx) +{ + struct ir2_instr *p; + + ir2_foreach_instr(instr, ctx) { + ir2_foreach_src(src, instr) { + /* loop to replace recursively */ + do { + if (src->type != IR2_SRC_SSA) + break; + + p = >instr[src->num]; + /* don't work across blocks to avoid possible issues */ + if (p->block_idx != instr->block_idx) + break; + + if (!is_mov(p)) + break; + + /* cant apply abs to const src, const src only for alu */ + if (p->src[0].type == IR2_SRC_CONST && + (src->abs || instr->type != IR2_A
[Mesa-dev] [PATCH 10/16] freedreno: a2xx: fix VERTEX_REUSE/DEALLOC on a20x
On a20x, set VGT_VERTEX_REUSE_BLOCK_CNTL to 2 and don't change it. Small rearrangement on a220 to reduce the size of draw commands. Only set DEALLOC_CNTL on a20x because the correct a220 value is not known. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 18 +++--- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 16 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 18 +++--- 3 files changed, 34 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 6dac8ca6a9..db8b022f8d 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -85,10 +85,6 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); OUT_RING(ring, info->index_size ? 0 : info->start); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); - OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); @@ -214,9 +210,11 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); OUT_RING(ring, 0); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, 0x028f); + if (!is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + OUT_RING(ring, 0x028f); + } fd2_program_emit(ring, >solid_prog); @@ -357,6 +355,12 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_RING(ring, CP_REG(REG_A2XX_RB_COPY_CONTROL)); OUT_RING(ring, 0x); + if (!is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + OUT_RING(ring, 0x003b); + } + ctx->dirty |= FD_DIRTY_ZSA | FD_DIRTY_VIEWPORT | FD_DIRTY_RASTERIZER | diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 7dcd31cbcb..ce275a78a6 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -341,6 +341,18 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY)); OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16)); + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + OUT_RING(ring, 0x0002); + + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_OUT_DEALLOC_CNTL)); + OUT_RING(ring, 0x0002); + } else { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + OUT_RING(ring, 0x003b); } OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); @@ -368,10 +380,6 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); OUT_RING(ring, 0x); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, 0x003b); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_SQ_CONTEXT_MISC)); OUT_RING(ring, A2XX_SQ_CONTEXT_MISC_SC_SAMPLE_CNTL(CENTERS_ONLY)); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index 8469e827b9..d9aad16b4a 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -131,9 +131,11 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); OUT_RING(ring, 0); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, 0x028f); + if (!is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + OUT_RING(ring, 0x028f); + } fd2_program_emit(ring, >solid_prog); @@ -186,6 +188,12 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) OUT_PK
[Mesa-dev] [PATCH 09/16] freedreno: a2xx: set viewport in gmem2mem
Fixes cases where previous viewport values might case gmem2mem to fail. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 8 1 file changed, 8 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index 3c54e2c6c0..8469e827b9 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -160,6 +160,14 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_CLIP_CNTL)); OUT_RING(ring, 0x); + /* make sure the rectangle covers the entire screen */ + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, CP_REG(REG_A2XX_PA_CL_VPORT_XSCALE)); + OUT_RING(ring, fui(4096.0)); + OUT_RING(ring, fui(4096.0)); + OUT_RING(ring, fui(4096.0)); + OUT_RING(ring, fui(4096.0)); + OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_MODECONTROL)); OUT_RING(ring, A2XX_RB_MODECONTROL_EDRAM_MODE(EDRAM_COPY)); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/16] glsl/nir: int constants as float for native_integers=false
Note: the backend must take care that uniform index is now a float Signed-off-by: Jonathan Marek --- src/compiler/glsl/glsl_to_nir.cpp | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index c5ba47d9e3..c8a7f3bd6c 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -94,6 +94,8 @@ private: nir_deref_instr *evaluate_deref(ir_instruction *ir); + nir_constant *constant_copy(ir_constant *ir, void *mem_ctx); + /* most recent deref instruction created */ nir_deref_instr *deref; @@ -194,8 +196,8 @@ nir_visitor::evaluate_deref(ir_instruction *ir) return this->deref; } -static nir_constant * -constant_copy(ir_constant *ir, void *mem_ctx) +nir_constant * +nir_visitor::constant_copy(ir_constant *ir, void *mem_ctx) { if (ir == NULL) return NULL; @@ -213,7 +215,10 @@ constant_copy(ir_constant *ir, void *mem_ctx) assert(cols == 1); for (unsigned r = 0; r < rows; r++) - ret->values[0].u32[r] = ir->value.u[r]; + if (supports_ints) +ret->values[0].u32[r] = ir->value.u[r]; + else +ret->values[0].f32[r] = ir->value.u[r]; break; @@ -222,7 +227,10 @@ constant_copy(ir_constant *ir, void *mem_ctx) assert(cols == 1); for (unsigned r = 0; r < rows; r++) - ret->values[0].i32[r] = ir->value.i[r]; + if (supports_ints) +ret->values[0].i32[r] = ir->value.i[r]; + else +ret->values[0].f32[r] = ir->value.i[r]; break; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/16] freedreno: a2xx: enable early-Z testing
Enable earlyZ when alpha test is disabled. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_zsa.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c index 64b31b677b..d3c19b4450 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_zsa.c @@ -49,7 +49,8 @@ fd2_zsa_state_create(struct pipe_context *pctx, A2XX_RB_DEPTHCONTROL_ZFUNC(cso->depth.func); /* maps 1:1 */ if (cso->depth.enabled) - so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE; + so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_ENABLE | + COND(!cso->alpha.enabled, A2XX_RB_DEPTHCONTROL_EARLY_Z_ENABLE); if (cso->depth.writemask) so->rb_depthcontrol |= A2XX_RB_DEPTHCONTROL_Z_WRITE_ENABLE; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/16] glsl/nir: keep bool types when native_integers=false
We should now lower bool to float later. Signed-off-by: Jonathan Marek --- src/compiler/glsl/glsl_to_nir.cpp | 175 -- 1 file changed, 71 insertions(+), 104 deletions(-) diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index d88289f682..29702abf75 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -86,6 +86,7 @@ private: nir_ssa_def *src2, nir_ssa_def *src3); bool supports_ints; + bool type_force_float(glsl_base_type type); nir_shader *shader; nir_function_impl *impl; @@ -1261,6 +1262,12 @@ nir_visitor::visit(ir_call *ir) unreachable("glsl_to_nir only handles function calls to intrinsics"); } +bool +nir_visitor::type_force_float(glsl_base_type type) +{ + return !supports_ints && type != GLSL_TYPE_BOOL; +} + void nir_visitor::visit(ir_assignment *ir) { @@ -1298,7 +1305,8 @@ nir_visitor::visit(ir_assignment *ir) for (unsigned i = 0; i < 4; i++) { swiz[i] = ir->write_mask & (1 << i) ? component++ : 0; } - src = nir_swizzle(, src, swiz, num_components, !supports_ints); + src = nir_swizzle(, src, swiz, num_components, +type_force_float(ir->rhs->type->base_type)); } if (ir->condition) { @@ -1495,22 +1503,21 @@ nir_visitor::visit(ir_expression *ir) srcs[i] = evaluate_rvalue(ir->operands[i]); glsl_base_type types[4]; - for (unsigned i = 0; i < ir->num_operands; i++) - if (supports_ints) - types[i] = ir->operands[i]->type->base_type; - else + for (unsigned i = 0; i < ir->num_operands; i++) { + types[i] = ir->operands[i]->type->base_type; + if (type_force_float(types[i])) types[i] = GLSL_TYPE_FLOAT; + } - glsl_base_type out_type; - if (supports_ints) - out_type = ir->type->base_type; - else + glsl_base_type out_type = ir->type->base_type; + if (type_force_float(out_type)) out_type = GLSL_TYPE_FLOAT; switch (ir->operation) { case ir_unop_bit_not: result = nir_inot(, srcs[0]); break; case ir_unop_logic_not: - result = supports_ints ? nir_inot(, srcs[0]) : nir_fnot(, srcs[0]); + result = type_is_float(types[0]) ? nir_fnot(, srcs[0]) + : nir_inot(, srcs[0]); break; case ir_unop_neg: result = type_is_float(types[0]) ? nir_fneg(, srcs[0]) @@ -1542,7 +1549,7 @@ nir_visitor::visit(ir_expression *ir) result = supports_ints ? nir_u2f32(, srcs[0]) : nir_fmov(, srcs[0]); break; case ir_unop_b2f: - result = supports_ints ? nir_b2f32(, srcs[0]) : nir_fmov(, srcs[0]); + result = nir_b2f32(, srcs[0]); break; case ir_unop_f2i: case ir_unop_f2u: @@ -1788,16 +1795,16 @@ nir_visitor::visit(ir_expression *ir) case ir_binop_bit_or: result = nir_ior(, srcs[0], srcs[1]); break; case ir_binop_bit_xor: result = nir_ixor(, srcs[0], srcs[1]); break; case ir_binop_logic_and: - result = supports_ints ? nir_iand(, srcs[0], srcs[1]) - : nir_fand(, srcs[0], srcs[1]); + result = type_is_float(types[0]) ? nir_fand(, srcs[0], srcs[1]) + : nir_iand(, srcs[0], srcs[1]); break; case ir_binop_logic_or: - result = supports_ints ? nir_ior(, srcs[0], srcs[1]) - : nir_for(, srcs[0], srcs[1]); + result = type_is_float(types[0]) ? nir_for(, srcs[0], srcs[1]) + : nir_ior(, srcs[0], srcs[1]); break; case ir_binop_logic_xor: - result = supports_ints ? nir_ixor(, srcs[0], srcs[1]) - : nir_fxor(, srcs[0], srcs[1]); + result = type_is_float(types[0]) ? nir_fxor(, srcs[0], srcs[1]) + : nir_ixor(, srcs[0], srcs[1]); break; case ir_binop_lshift: result = nir_ishl(, srcs[0], srcs[1]); break; case ir_binop_rshift: @@ -1811,108 +1818,70 @@ nir_visitor::visit(ir_expression *ir) case ir_binop_carry: result = nir_uadd_carry(, srcs[0], srcs[1]); break; case ir_binop_borrow: result = nir_usub_borrow(, srcs[0], srcs[1]); break; case ir_binop_less: - if (supports_ints) { - if (type_is_float(types[0])) -result = nir_flt(, srcs[0], srcs[1]); - else if (type_is_signed(types[0])) -result = nir_ilt(, srcs[0], srcs[1]); - else -result = nir_ult(, srcs[0], srcs[1]); - } else { - result = nir_slt(, srcs[0], srcs[1]); - } + if (type_is_float(types[0])) + result = nir_flt(, srcs[0], srcs[1]); + else if (type_is_signed(types[0])) + result = nir_ilt(, srcs[0], srcs[1]); + else + result = nir_ult(, srcs[0], srcs[1]); break; case ir_binop_gequal: - if (
[Mesa-dev] [PATCH 04/16] nir: add nir_lower_bool_to_float
Mainly a copy of nir_lower_bool_to_int32, but with float opcodes. Signed-off-by: Jonathan Marek --- src/compiler/Makefile.sources | 1 + src/compiler/nir/meson.build | 3 +- src/compiler/nir/nir.h | 1 + src/compiler/nir/nir_lower_bool_to_float.c | 165 + 4 files changed, 169 insertions(+), 1 deletion(-) create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources index ef47bdb33b..39eaedc658 100644 --- a/src/compiler/Makefile.sources +++ b/src/compiler/Makefile.sources @@ -231,6 +231,7 @@ NIR_FILES = \ nir/nir_lower_atomics_to_ssbo.c \ nir/nir_lower_bitmap.c \ nir/nir_lower_bit_size.c \ + nir/nir_lower_bool_to_float.c \ nir/nir_lower_bool_to_int32.c \ nir/nir_lower_clamp_color_outputs.c \ nir/nir_lower_clip.c \ diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build index e252f64539..f1016104af 100644 --- a/src/compiler/nir/meson.build +++ b/src/compiler/nir/meson.build @@ -114,6 +114,7 @@ files_libnir = files( 'nir_lower_alpha_test.c', 'nir_lower_atomics_to_ssbo.c', 'nir_lower_bitmap.c', + 'nir_lower_bool_to_float.c', 'nir_lower_bool_to_int32.c', 'nir_lower_clamp_color_outputs.c', 'nir_lower_clip.c', @@ -248,7 +249,7 @@ if with_tests include_directories : [inc_common], dependencies : [dep_thread, idep_gtest, idep_nir], link_with : libmesa_util, -), +), suite : ['compiler', 'nir'], ) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 54f9c64a3a..f6d0bdf7ec 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2905,6 +2905,7 @@ void nir_lower_alpha_test(nir_shader *shader, enum compare_func func, bool alpha_to_one); bool nir_lower_alu(nir_shader *shader); bool nir_lower_alu_to_scalar(nir_shader *shader); +bool nir_lower_bool_to_float(nir_shader *shader); bool nir_lower_bool_to_int32(nir_shader *shader); bool nir_lower_load_const_to_scalar(nir_shader *shader); bool nir_lower_read_invocation_to_scalar(nir_shader *shader); diff --git a/src/compiler/nir/nir_lower_bool_to_float.c b/src/compiler/nir/nir_lower_bool_to_float.c new file mode 100644 index 00..2756a1815f --- /dev/null +++ b/src/compiler/nir/nir_lower_bool_to_float.c @@ -0,0 +1,165 @@ +/* + * Copyright © 2018 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "nir.h" + +static bool +assert_ssa_def_is_not_1bit(nir_ssa_def *def, UNUSED void *unused) +{ + assert(def->bit_size > 1); + return true; +} + +static bool +rewrite_1bit_ssa_def_to_32bit(nir_ssa_def *def, void *_progress) +{ + bool *progress = _progress; + if (def->bit_size == 1) { + def->bit_size = 32; + *progress = true; + } + return true; +} + +static bool +lower_alu_instr(nir_alu_instr *alu) +{ + const nir_op_info *op_info = _op_infos[alu->op]; + + switch (alu->op) { + case nir_op_vec2: + case nir_op_vec3: + case nir_op_vec4: + /* These we expect to have booleans but the opcode doesn't change */ + break; + + case nir_op_b2f32: alu->op = nir_op_fmov; break; + + /* Note: we only expect these 5 opcodes with bools */ + case nir_op_imov: alu->op = nir_op_fmov; break; + case nir_op_inot: alu->op = nir_op_fnot; break; + case nir_op_iand: alu->op = nir_op_fand; break; + case nir_op_ior: alu->op = nir_op_for; break; + case nir_op_ixor: alu->op = nir_op_fxor; break; + + /* We might want a new opcode (for the (x != 0.0) f2b op) */ + case nir_op_f2b1: alu->op = nir_op_f2b32; break; + case nir_op_i2b1: alu->op = nir_op_f2b32; break; + + case nir_op_flt: alu->op = nir_op_slt; break; + case nir_op_fge: alu->op = nir_op_sge
[Mesa-dev] [PATCH 02/16] glsl/nir: ftrunc for native_integers=false float to int cast
out_type is always GLSL_TYPE_FLOAT, so we don't get the ftrunc otherwise Signed-off-by: Jonathan Marek --- src/compiler/glsl/glsl_to_nir.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index c8a7f3bd6c..d88289f682 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -1578,6 +1578,13 @@ nir_visitor::visit(ir_expression *ir) case ir_unop_u2i: case ir_unop_i642u64: case ir_unop_u642i64: { + if (!supports_ints) { + if (ir->operation == ir_unop_f2i || ir->operation == ir_unop_f2u) { +result = nir_ftrunc(, srcs[0]); +break; + } + } + nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]); nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type); result = nir_build_alu(, nir_type_conversion_op(src_type, dst_type, -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] freedreno: a2xx: add partial lower_scalar pass for ir2
some instructions can only be scalar on a2xx, lower these only Signed-off-by: Jonathan Marek --- .../drivers/freedreno/Makefile.sources| 1 + src/gallium/drivers/freedreno/a2xx/ir2_nir.c | 3 + .../freedreno/a2xx/ir2_nir_lower_scalar.c | 174 ++ .../drivers/freedreno/a2xx/ir2_private.h | 1 + src/gallium/drivers/freedreno/meson.build | 1 + 5 files changed, 180 insertions(+) create mode 100644 src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources index 9061b26ba7..7f2b8e7b7d 100644 --- a/src/gallium/drivers/freedreno/Makefile.sources +++ b/src/gallium/drivers/freedreno/Makefile.sources @@ -87,6 +87,7 @@ a2xx_SOURCES := \ a2xx/instr-a2xx.h \ a2xx/ir2.c \ a2xx/ir2_nir.c \ + a2xx/ir2_nir_lower_scalar.c \ a2xx/ir2_substitutions.c \ a2xx/ir2_ra.c \ a2xx/ir2_assemble.c \ diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c index f24debf140..12d4ee6653 100644 --- a/src/gallium/drivers/freedreno/a2xx/ir2_nir.c +++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir.c @@ -1112,6 +1112,9 @@ ir2_nir_compile(struct ir2_context *ctx, unsigned variant) /* postprocess */ OPT_V(ctx->nir, nir_opt_algebraic_late); + /* lower to scalar instructions that can only be scalar on a2xx */ + OPT_V(ctx->nir, ir2_nir_lower_scalar); + OPT_V(ctx->nir, nir_lower_to_source_mods); OPT_V(ctx->nir, nir_copy_prop); OPT_V(ctx->nir, nir_opt_dce); diff --git a/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c new file mode 100644 index 00..2b72a86b3e --- /dev/null +++ b/src/gallium/drivers/freedreno/a2xx/ir2_nir_lower_scalar.c @@ -0,0 +1,174 @@ +/* + * Copyright (C) 2018 Jonathan Marek + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Authors: + *Jonathan Marek + */ + +/* some operations can only be scalar on a2xx: + * rsq, rcp, log2, exp2, cos, sin, sqrt + * mostly copy-pasted from nir_lower_alu_to_scalar.c + */ + +#include "ir2_private.h" +#include "compiler/nir/nir_builder.h" + +static void +nir_alu_ssa_dest_init(nir_alu_instr * instr, unsigned num_components, + unsigned bit_size) +{ + nir_ssa_dest_init(>instr, >dest.dest, num_components, + bit_size, NULL); + instr->dest.write_mask = (1 << num_components) - 1; +} + +static void +lower_reduction(nir_alu_instr * instr, nir_op chan_op, nir_op merge_op, + nir_builder * builder) +{ + unsigned num_components = nir_op_infos[instr->op].input_sizes[0]; + + nir_ssa_def *last = NULL; + for (unsigned i = 0; i < num_components; i++) { + nir_alu_instr *chan = + nir_alu_instr_create(builder->shader, chan_op); + nir_alu_ssa_dest_init(chan, 1, instr->dest.dest.ssa.bit_size); + nir_alu_src_copy(>src[0], >src[0], chan); + chan->src[0].swizzle[0] = chan->src[0].swizzle[i]; + if (nir_op_infos[chan_op].num_inputs > 1) { + assert(nir_op_infos[chan_op].num_inputs == 2); + nir_alu_src_copy(>src[1], >src[1], chan); + chan->src[1].swizzle[0] = chan->src[1].swizzle[i]; + } + chan->exact = instr->exact; + + nir_builder_instr_insert(builder, >instr); + + if (i == 0) { + last = >dest.dest.ssa; + } else { + last = nir_build_
[Mesa-dev] [PATCH 3/4] freedreno: implement a20x hw binning
Not in this patch: emitting the hw binning variant and filling the "draw_patches". That is part of the ir2 patch. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 51 ++-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 +- src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 3 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 118 ++ .../drivers/freedreno/freedreno_gmem.c| 29 +++-- .../drivers/freedreno/freedreno_gmem.h| 1 + 6 files changed, 190 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 6945a1dc3d..cab20d0295 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -75,19 +75,29 @@ emit_vertexbufs(struct fd_context *ctx) // CONST(20,0) (or CONST(26,0) in soliv_vp) fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements); + fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, vtx->num_elements); } static void draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, - struct fd_ringbuffer *ring, unsigned index_offset) + struct fd_ringbuffer *ring, unsigned index_offset, + bool binning) { + enum pc_di_vis_cull_mode vismode; + OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); OUT_RING(ring, info->index_size ? 0 : info->start); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); + /* in the binning batch, this value is set once in fd2_emit_tile_init */ + if (!binning) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL write ? +* if set to 0x3b on a20x, clipping is broken +*/ + OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); + } OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); @@ -123,8 +133,26 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ } + /* binning shader will take offset from C64 */ + if (binning && is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0180); + OUT_RING(ring, fui(ctx->batch->num_vertices)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + } + + vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY; + /* a22x hw binning not implemented */ + if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN)) + vismode = IGNORE_VISIBILITY; + + if (info->mode == PIPE_PRIM_POINTS) + vismode = IGNORE_VISIBILITY; + fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], -IGNORE_VISIBILITY, info, index_offset); +vismode, info, index_offset); if (is_a20x(ctx->screen)) { /* not sure why this is required, but it fixes some hangs */ @@ -149,7 +177,8 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, if (ctx->dirty & FD_DIRTY_VTXBUF) emit_vertexbufs(ctx); - fd2_emit_state(ctx, ctx->dirty); + fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty); + fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty); /* a2xx can draw only 65535 vertices at once * on a22x the field in the draw command is 32bits but seems limited too @@ -170,17 +199,23 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, struct pipe_draw_info info = *pinfo; unsigned count = info.count; unsigned step = step_tbl[info.mode]; + unsigned num_vertices = ctx->batch->num_vertices; if (!step) return false; for (; count + step > 32766; count -= step) { info.count = MIN2(count, 32766); - draw_impl(ctx, , ctx->batch->draw, index_offset); + draw_impl(ctx, , ctx->batch->draw, index_offset, false); + draw_impl(ctx, , ctx->batch->binning, index_offset, true); info.start += st
[Mesa-dev] [PATCH 4/4] freedreno: use MSM_BO_SCANOUT with scanout buffers
Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/drm/freedreno_drmif.h | 1 + src/gallium/drivers/freedreno/drm/msm_bo.c | 3 +++ src/gallium/drivers/freedreno/freedreno_resource.c | 4 +++- 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/drm/freedreno_drmif.h b/src/gallium/drivers/freedreno/drm/freedreno_drmif.h index 6468eac4a0..e12ab970c8 100644 --- a/src/gallium/drivers/freedreno/drm/freedreno_drmif.h +++ b/src/gallium/drivers/freedreno/drm/freedreno_drmif.h @@ -63,6 +63,7 @@ enum fd_param_id { #define DRM_FREEDRENO_GEM_CACHE_WBACKWA 0x0080 #define DRM_FREEDRENO_GEM_CACHE_MASK 0x00f0 #define DRM_FREEDRENO_GEM_GPUREADONLY 0x0100 +#define DRM_FREEDRENO_GEM_SCANOUT 0x0200 /* bo access flags: (keep aligned to MSM_PREP_x) */ #define DRM_FREEDRENO_PREP_READ 0x01 diff --git a/src/gallium/drivers/freedreno/drm/msm_bo.c b/src/gallium/drivers/freedreno/drm/msm_bo.c index da3315c9ab..d93dfbeab2 100644 --- a/src/gallium/drivers/freedreno/drm/msm_bo.c +++ b/src/gallium/drivers/freedreno/drm/msm_bo.c @@ -142,6 +142,9 @@ int msm_bo_new_handle(struct fd_device *dev, }; int ret; + if (flags & DRM_FREEDRENO_GEM_SCANOUT) + req.flags |= MSM_BO_SCANOUT; + ret = drmCommandWriteRead(dev->fd, DRM_MSM_GEM_NEW, , sizeof(req)); if (ret) diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c b/src/gallium/drivers/freedreno/freedreno_resource.c index 54d7385896..bd7be94c85 100644 --- a/src/gallium/drivers/freedreno/freedreno_resource.c +++ b/src/gallium/drivers/freedreno/freedreno_resource.c @@ -99,7 +99,9 @@ realloc_bo(struct fd_resource *rsc, uint32_t size) { struct fd_screen *screen = fd_screen(rsc->base.screen); uint32_t flags = DRM_FREEDRENO_GEM_CACHE_WCOMBINE | - DRM_FREEDRENO_GEM_TYPE_KMEM; /* TODO */ + DRM_FREEDRENO_GEM_TYPE_KMEM | + COND(rsc->base.bind & PIPE_BIND_SCANOUT, DRM_FREEDRENO_GEM_SCANOUT); + /* TODO other flags? */ /* if we start using things other than write-combine, * be sure to check for PIPE_RESOURCE_FLAG_MAP_COHERENT -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] nir: add fceil lowering
I don't have push rights, but robclark added this patch to his staging branch so I imagine he will push it soon. On 11/19/2018 03:15 PM, Christian Gmeiner wrote: Am Mo., 12. Nov. 2018 um 19:17 Uhr schrieb Jonathan Marek : lowers ceil(x) as -floor(-x) Signed-off-by: Jonathan Marek Do you have push rights? As I am interested in this one I would push it for you if needed. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] nir: combine fmul and fadd across ffma operations
This works by moving the fadd up across the ffma operations, so that it can eventually can be combined with a fmul. I'm not sure it works in all cases, but it works in all the common cases. This will only affect freedreno since it is the only driver using the fuse_ffma option. Example: matrix * vec4(coord, 1.0) is compiled as: fmul, ffma, ffma, fadd and with this patch: ffma, ffma, ffma Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_opt_algebraic.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 8f4df891b8..8d7c30e04a 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -133,6 +133,7 @@ optimizations = [ (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', a, b, c), '!options->lower_flrp64'), (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'), (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'), + (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d)), 'options->fuse_ffma'), (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d)), (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)), -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] glsl/nir: ftrunc for native_integers=false float to int cast
out_type is always GLSL_TYPE_FLOAT, so we don't get the ftrunc otherwise since there are no other conversions needed, use fmov for the other cases (there is the f2b case, but the 1-bit bool patches should fix that) Signed-off-by: Jonathan Marek --- src/compiler/glsl/glsl_to_nir.cpp | 13 + 1 file changed, 13 insertions(+) diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index 22863c072f..5d1ae85924 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -1560,6 +1560,19 @@ nir_visitor::visit(ir_expression *ir) case ir_unop_u2i: case ir_unop_i642u64: case ir_unop_u642i64: { + if (!supports_ints) { + switch (ir->operation) { + case ir_unop_f2i: + case ir_unop_f2u: +result = nir_ftrunc(, srcs[0]); +break; + default: +result = nir_fmov(, srcs[0]); +break; + } + break; + } + nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]); nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type); result = nir_build_alu(, nir_type_conversion_op(src_type, dst_type, -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] glsl/nir: int constants as float for native_integers=false
Note: the backend must take care that uniform index is now a float Signed-off-by: Jonathan Marek --- src/compiler/glsl/glsl_to_nir.cpp | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index 0479f8fcfe..22863c072f 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -93,6 +93,8 @@ private: nir_deref_instr *evaluate_deref(ir_instruction *ir); + nir_constant *constant_copy(ir_constant *ir, void *mem_ctx); + /* most recent deref instruction created */ nir_deref_instr *deref; @@ -196,8 +198,8 @@ nir_visitor::evaluate_deref(ir_instruction *ir) return this->deref; } -static nir_constant * -constant_copy(ir_constant *ir, void *mem_ctx) +nir_constant * +nir_visitor::constant_copy(ir_constant *ir, void *mem_ctx) { if (ir == NULL) return NULL; @@ -215,7 +217,10 @@ constant_copy(ir_constant *ir, void *mem_ctx) assert(cols == 1); for (unsigned r = 0; r < rows; r++) - ret->values[0].u32[r] = ir->value.u[r]; + if (supports_ints) +ret->values[0].u32[r] = ir->value.u[r]; + else +ret->values[0].f32[r] = ir->value.u[r]; break; @@ -224,7 +229,10 @@ constant_copy(ir_constant *ir, void *mem_ctx) assert(cols == 1); for (unsigned r = 0; r < rows; r++) - ret->values[0].i32[r] = ir->value.i[r]; + if (supports_ints) +ret->values[0].i32[r] = ir->value.i[r]; + else +ret->values[0].f32[r] = ir->value.i[r]; break; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/9] freedreno: a2xx: set VIZ_QUERY_ID on a20x
Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 20bfd06b13..50e2fe13eb 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -339,6 +339,11 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE | A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) | A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3)); + + /* not sure why this is required */ + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY)); + OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16)); } OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/9] freedreno: use GENERIC instead of TEXCOORD for blit program
blip_fp uses GENERIC as input, so blit_vp should match for linking Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/freedreno_program.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/freedreno_program.c b/src/gallium/drivers/freedreno/freedreno_program.c index e41ac2d922..989ccd1838 100644 --- a/src/gallium/drivers/freedreno/freedreno_program.c +++ b/src/gallium/drivers/freedreno/freedreno_program.c @@ -67,7 +67,7 @@ static const char *blit_vp = "VERT\n" "DCL IN[0] \n" "DCL IN[1] \n" - "DCL OUT[0], TEXCOORD[0] \n" + "DCL OUT[0], GENERIC[0] \n" "DCL OUT[1], POSITION\n" " 0: MOV OUT[0], IN[0] \n" " 0: MOV OUT[1], IN[1] \n" -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 8/9] freedreno: implement a20x hw binning
Not in this patch: emitting the hw binning variant and filling the "draw_patches". That is part of the ir2 patch. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 47 +-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 +- src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 3 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 118 ++ .../drivers/freedreno/freedreno_gmem.c| 29 +++-- .../drivers/freedreno/freedreno_gmem.h| 1 + 6 files changed, 187 insertions(+), 19 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 49df1daa59..46c76df807 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -75,11 +75,13 @@ emit_vertexbufs(struct fd_context *ctx) // CONST(20,0) (or CONST(26,0) in soliv_vp) fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements); + fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, vtx->num_elements); } static void draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, - struct fd_ringbuffer *ring, unsigned index_offset) + struct fd_ringbuffer *ring, unsigned index_offset, + bool binning) { enum pc_di_vis_cull_mode vismode; @@ -87,9 +89,15 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); OUT_RING(ring, info->index_size ? 0 : info->start); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); + /* in the binning batch, this value is set once in fd2_emit_tile_init */ + if (!binning) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL write ? +* if set to 0x3b on a20x, clipping is broken +*/ + OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); + } OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); @@ -125,8 +133,26 @@ draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ } + /* binning shader will take offset from C64 */ + if (binning && is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0180); + OUT_RING(ring, fui(ctx->batch->num_vertices)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + } + + vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY; + /* a22x hw binning not implemented */ + if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN)) + vismode = IGNORE_VISIBILITY; + + if (info->mode == PIPE_PRIM_POINTS) + vismode = IGNORE_VISIBILITY; + fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], -IGNORE_VISIBILITY, info, index_offset); +vismode, info, index_offset); if (is_a20x(ctx->screen)) { /* not sure why this is required, but it fixes some hangs */ @@ -152,6 +178,7 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, emit_vertexbufs(ctx); fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty); + fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty); /* a20x can draw only 65535 vertices at once * however, using a limit of 32k fixes an unexplained hang @@ -171,17 +198,23 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, struct pipe_draw_info info = *pinfo; unsigned count = info.count; unsigned step = step_tbl[info.mode]; + unsigned num_vertices = ctx->batch->num_vertices; if (!step) return false; for (; count + step > 32766; count -= step) { info.count = MIN2(count, 32766); - draw_impl(ctx, , ctx->batch->draw, index_offset); + draw_impl(ctx, , ctx->batch->draw, index_offset, false); + draw_impl(ctx, , ctx->batch->binning, index_offset, true); info.start += step; + ctx->batch->num_vertices += step;
[Mesa-dev] [PATCH 6/9] freedreno: a2xx texture update
Adds all missing texture related logic. For everything to work it also needs changes to ir2/fd2_program, which are part of the ir2 update patch. Note: it needs rnndb update Signed-off-by: Jonathan Marek --- .../drivers/freedreno/Makefile.sources| 2 + src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 15 +++- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 26 +-- .../drivers/freedreno/a2xx/fd2_resource.c | 73 +++ .../drivers/freedreno/a2xx/fd2_resource.h | 34 + .../drivers/freedreno/a2xx/fd2_screen.c | 6 +- .../drivers/freedreno/a2xx/fd2_texture.c | 67 +++-- .../drivers/freedreno/a2xx/fd2_texture.h | 7 +- src/gallium/drivers/freedreno/meson.build | 2 + 9 files changed, 212 insertions(+), 20 deletions(-) create mode 100644 src/gallium/drivers/freedreno/a2xx/fd2_resource.c create mode 100644 src/gallium/drivers/freedreno/a2xx/fd2_resource.h diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources index 8b4d61c988..4d4644f96b 100644 --- a/src/gallium/drivers/freedreno/Makefile.sources +++ b/src/gallium/drivers/freedreno/Makefile.sources @@ -76,6 +76,8 @@ a2xx_SOURCES := \ a2xx/fd2_program.h \ a2xx/fd2_rasterizer.c \ a2xx/fd2_rasterizer.h \ + a2xx/fd2_resource.c \ + a2xx/fd2_resource.h \ a2xx/fd2_screen.c \ a2xx/fd2_screen.h \ a2xx/fd2_texture.c \ diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 50e2fe13eb..60bc9fad4c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -118,6 +118,7 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context *ctx, static const struct fd2_pipe_sampler_view dummy_view = {}; const struct fd2_sampler_stateobj *sampler; const struct fd2_pipe_sampler_view *view; + struct fd_resource *rsc; if (emitted & (1 << const_idx)) return 0; @@ -129,19 +130,25 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context *ctx, fd2_pipe_sampler_view(tex->textures[samp_id]) : _view; + rsc = view->base.texture ? fd_resource(view->base.texture) : NULL; + OUT_PKT3(ring, CP_SET_CONSTANT, 7); OUT_RING(ring, 0x0001 + (0x6 * const_idx)); OUT_RING(ring, sampler->tex0 | view->tex0); - if (view->base.texture) - OUT_RELOC(ring, fd_resource(view->base.texture)->bo, 0, view->fmt, 0); + if (rsc) + OUT_RELOC(ring, rsc->bo, 0, view->tex1, 0); else OUT_RING(ring, 0); OUT_RING(ring, view->tex2); OUT_RING(ring, sampler->tex3 | view->tex3); - OUT_RING(ring, sampler->tex4); - OUT_RING(ring, sampler->tex5); + OUT_RING(ring, sampler->tex4 | view->tex4); + + if (rsc && rsc->base.last_level) + OUT_RELOC(ring, rsc->bo, fd_resource_offset(rsc, 1, 0), view->tex5, 0); + else + OUT_RING(ring, view->tex5); return (1 << const_idx); } diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index 5770d6248e..e98ae7334a 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -66,6 +66,13 @@ emit_gmem2mem_surf(struct fd_batch *batch, uint32_t base, struct fd_ringbuffer *ring = batch->gmem; struct fd_resource *rsc = fd_resource(psurf->texture); uint32_t swap = fmt2swap(psurf->format); + struct fd_resource_slice *slice = + fd_resource_slice(rsc, psurf->u.tex.level); + uint32_t offset = + fd_resource_offset(rsc, psurf->u.tex.level, psurf->u.tex.first_layer); + + assert((slice->pitch & 31) == 0); + assert((offset & 0xfff) == 0); if (!rsc->valid) return; @@ -79,8 +86,8 @@ emit_gmem2mem_surf(struct fd_batch *batch, uint32_t base, OUT_PKT3(ring, CP_SET_CONSTANT, 5); OUT_RING(ring, CP_REG(REG_A2XX_RB_COPY_CONTROL)); OUT_RING(ring, 0x); /* RB_COPY_CONTROL */ - OUT_RELOCW(ring, rsc->bo, 0, 0, 0); /* RB_COPY_DEST_BASE */ - OUT_RING(ring, rsc->slices[0].pitch >> 5); /* RB_COPY_DEST_PITCH */ + OUT_RELOCW(ring, rsc->bo, offset, 0, 0); /* RB_COPY_DEST_BASE */ + OUT_RING(ring, slice->pitch >> 5); /* RB_COPY_DEST_PITCH */ OUT_RING(ring, /* RB_COPY_DEST_INFO */ A2XX_RB_COPY_DEST_INFO_FORMAT(fd2_pipe2color(psurf->format)) | A2XX_RB_COPY_DEST_INFO_LINEAR | @@ -189,6 +196,10 @@ emit_mem2gmem_surf(struct fd_batch *bat
[Mesa-dev] [PATCH 2/9] freedreno: a2xx: fix POINT_MINMAX_MAX overflow
As it stands, it overflows to zero. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c b/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c index f35fddc09f..a81f63b570 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c @@ -47,7 +47,7 @@ fd2_rasterizer_state_create(struct pipe_context *pctx, if (cso->point_size_per_vertex) { psize_min = util_get_min_point_size(cso); - psize_max = 8192; + psize_max = 8192.0 - 0.0625; } else { /* Force the point size to be as if the vertex output was disabled. */ psize_min = cso->point_size; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/9] a2xx: Compute depth base in gmem correctly
Note: it needs rnndb update Signed-off-by: Marek Vasut Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index c711f8c79a..5770d6248e 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -108,6 +108,7 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) { struct fd_context *ctx = batch->ctx; struct fd2_context *fd2_ctx = fd2_context(ctx); + struct fd_gmem_stateobj *gmem = >gmem; struct fd_ringbuffer *ring = batch->gmem; struct pipe_framebuffer_state *pfb = >framebuffer; @@ -170,10 +171,10 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) A2XX_RB_COPY_DEST_OFFSET_Y(tile->yoff)); if (batch->resolve & (FD_BUFFER_DEPTH | FD_BUFFER_STENCIL)) - emit_gmem2mem_surf(batch, tile->bin_w * tile->bin_h, pfb->zsbuf); + emit_gmem2mem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf); if (batch->resolve & FD_BUFFER_COLOR) - emit_gmem2mem_surf(batch, 0, pfb->cbufs[0]); + emit_gmem2mem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]); OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_MODECONTROL)); @@ -233,6 +234,7 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile) { struct fd_context *ctx = batch->ctx; struct fd2_context *fd2_ctx = fd2_context(ctx); + struct fd_gmem_stateobj *gmem = >gmem; struct fd_ringbuffer *ring = batch->gmem; struct pipe_framebuffer_state *pfb = >framebuffer; unsigned bin_w = tile->bin_w; @@ -331,10 +333,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile) OUT_RING(ring, 0x); if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_DEPTH | FD_BUFFER_STENCIL)) - emit_mem2gmem_surf(batch, bin_w * bin_h, pfb->zsbuf); + emit_mem2gmem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf); if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_COLOR)) - emit_mem2gmem_surf(batch, 0, pfb->cbufs[0]); + emit_mem2gmem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]); /* TODO blob driver seems to toss in a CACHE_FLUSH after each DRAW_INDX.. */ } @@ -357,7 +359,7 @@ fd2_emit_tile_init(struct fd_batch *batch) OUT_RING(ring, gmem->bin_w); /* RB_SURFACE_INFO */ OUT_RING(ring, A2XX_RB_COLOR_INFO_SWAP(fmt2swap(format)) | A2XX_RB_COLOR_INFO_FORMAT(fd2_pipe2color(format))); - reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(align(gmem->bin_w * gmem->bin_h, 4)); + reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(gmem->zsbuf_base[0]); if (pfb->zsbuf) reg |= A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(fd_pipe2depth(pfb->zsbuf->format)); OUT_RING(ring, reg); /* RB_DEPTH_INFO */ -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/9] freedreno: a2xx: fd2_draw update for a20x
Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 97 --- .../drivers/freedreno/freedreno_batch.c | 1 + .../drivers/freedreno/freedreno_batch.h | 1 + .../drivers/freedreno/freedreno_draw.c| 2 + .../drivers/freedreno/freedreno_draw.h| 24 - .../drivers/freedreno/freedreno_util.h| 8 +- 6 files changed, 114 insertions(+), 19 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index f00bec6efc..49df1daa59 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -77,29 +77,46 @@ emit_vertexbufs(struct fd_context *ctx) fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements); } -static bool -fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, - unsigned index_offset) +static void +draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, + struct fd_ringbuffer *ring, unsigned index_offset) { - struct fd_ringbuffer *ring = ctx->batch->draw; - - if (ctx->dirty & FD_DIRTY_VTXBUF) - emit_vertexbufs(ctx); - - fd2_emit_state(ctx, ctx->dirty); + enum pc_di_vis_cull_mode vismode; OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); - OUT_RING(ring, info->start); + OUT_RING(ring, info->index_size ? 0 : info->start); OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, 0x003b); + OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - if (!is_a20x(ctx->screen)) { + if (is_a20x(ctx->screen)) { + /* wait for DMA to finish and +* dummy draw one triangle with indexes 0,0,0. +* with PRE_FETCH_CULL_ENABLE | GRP_CULL_ENABLE. +* +* this workaround is for a HW bug related to DMA alignment: +* it is necessary for indexed draws and possibly also +* draws that read binning data +*/ + OUT_PKT3(ring, CP_WAIT_REG_EQ, 4); + OUT_RING(ring, 0x05d0); /* RBBM_STATUS */ + OUT_RING(ring, 0x); + OUT_RING(ring, 0x1000); /* bit: 12: VGT_BUSY_NO_DMA */ + OUT_RING(ring, 0x0001); + + OUT_PKT3(ring, CP_DRAW_INDX_BIN, 6); + OUT_RING(ring, 0x); + OUT_RING(ring, 0x0003c004); + OUT_RING(ring, 0x); + OUT_RING(ring, 0x0003); + OUT_RELOC(ring, fd_resource(fd2_context(ctx)->solid_vertexbuf)->bo, 0x80, 0, 0); + OUT_RING(ring, 0x0006); + } else { OUT_WFI (ring); OUT_PKT3(ring, CP_SET_CONSTANT, 3); @@ -111,11 +128,61 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], IGNORE_VISIBILITY, info, index_offset); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010)); - OUT_RING(ring, 0x); + if (is_a20x(ctx->screen)) { + /* not sure why this is required, but it fixes some hangs */ + OUT_WFI(ring); + } else { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010)); + OUT_RING(ring, 0x); + } emit_cacheflush(ring); +} + + +static bool +fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, + unsigned index_offset) +{ + if (!ctx->prog.fp || !ctx->prog.vp) + return false; + + if (ctx->dirty & FD_DIRTY_VTXBUF) + emit_vertexbufs(ctx); + + fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty); + + /* a20x can draw only 65535 vertices at once +* however, using a limit of 32k fixes an unexplained hang +* 32766 works for all primitives +*/ + if (is_a20x(ctx->screen) && pinfo->count > 32766) { + static const uint16_t step_tbl[PIPE_PRIM_MAX] = { + [0 ... PIPE_PRIM_MAX - 1] = 32766, + [PIPE_PRIM_LINE_STRIP] = 32765, + [PIPE_PRIM_TRIANGLE_STRIP] = 32764, + + /* needs more work */ + [PIPE_PRIM_TRIANGLE_FAN] = 0, + [PIPE_PRIM_LINE_LOOP] = 0, + }; + + struct pipe_draw_info
[Mesa-dev] [PATCH 3/9] freedreno: add missing a20x ids
200: 256KiB GMEM A200 (imx53) 201: 128KiB GMEM A200 (imx51) Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 88d91a9123..a55403804b 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -796,6 +796,8 @@ fd_screen_create(struct fd_device *dev) * send a patch ;-) */ switch (screen->gpu_id) { + case 200: + case 201: case 205: case 220: fd2_screen_init(pscreen); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations
The brw_nir_opt_peephole_ffma pass is only doing what the fuse_ffma option already does. It produces the same result as the fuse_ffma option, which is not optimal. This is what I get: vec4 32 ssa_7 = fmul ssa_6, ssa_1. vec4 32 ssa_8 = ffma ssa_5, ssa_1., ssa_7 vec4 32 ssa_10 = ffma ssa_9, ssa_1., ssa_8 vec4 32 ssa_12 = fadd ssa_10, ssa_11 But better optimized as (example with the least rearrangements): vec4 32 ssa_7 = ffma ssa_6, ssa_1., ssa_11 vec4 32 ssa_8 = ffma ssa_5, ssa_1., ssa_7 vec4 32 ssa_10 = ffma ssa_9, ssa_1., ssa_8 Fusing the fmul and fadd in this case is not obvious. Could this patch be OK if it is behind the fuse_ffma option? On 11/12/2018 02:30 PM, Jason Ekstrand wrote: In general, you're not supposed to mess around with the precision of fma... What we do in the Intel drivers is to leave fma split, apply operations, and then we have a special mul+add fusion pass we run at the end. Leaving them split allows for exactly this kind of optimization without mixing up those FMAs that are supposed to be kept fused and those generated by mul+add fusion which can be split back apart and re-optimized. On Mon, Nov 12, 2018 at 12:17 PM Jonathan Marek wrote: This works by moving the fadd up across the ffma operations, so that it can eventually can be combined with a fmul. I'm not sure it works in all cases, but it works in all the common cases. Example: matrix * vec4(coord, 1.0) is compiled as: fmul, ffma, ffma, fadd and with this patch: ffma, ffma, ffma Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_opt_algebraic.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 8f4df891b8..82e10731a6 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -133,6 +133,7 @@ optimizations = [ (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', a, b, c), '!options->lower_flrp64'), (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'), (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'), + (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))), (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d)), (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)), -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] glsl/nir: glsl_to_nir fixes for native_integers=false
Two parts: 1. for intructions that have a BOOL source, insert b2f to so that the backend can identify the source as a BOOL and perform the conversion from NIR_TRUE/NIR_FALSE 2. add missing type conversions (out_type is always GLSL_TYPE_FLOAT, so we are missing some conversion instructions): float to int (ftrunc), and f2b (which represents the operation that is the opposite of a fnot). Signed-off-by: Jonathan Marek --- src/compiler/glsl/glsl_to_nir.cpp | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index 0479f8fcfe..565bf588f5 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -1465,9 +1465,13 @@ nir_visitor::visit(ir_expression *ir) } nir_ssa_def *srcs[4]; - for (unsigned i = 0; i < ir->num_operands; i++) + for (unsigned i = 0; i < ir->num_operands; i++) { srcs[i] = evaluate_rvalue(ir->operands[i]); + if (ir->operands[i]->type->base_type == GLSL_TYPE_BOOL) + srcs[i] = nir_b2f(, srcs[i]); + } + glsl_base_type types[4]; for (unsigned i = 0; i < ir->num_operands; i++) if (supports_ints) @@ -1552,6 +1556,23 @@ nir_visitor::visit(ir_expression *ir) case ir_unop_u2i: case ir_unop_i642u64: case ir_unop_u642i64: { + if (!supports_ints) { + switch (ir->operation) { + case ir_unop_f2i: + case ir_unop_f2u: +result = nir_ftrunc(, srcs[0]); +break; + case ir_unop_f2b: + case ir_unop_i2b: +result = nir_f2b(, srcs[0]); +break; + default: +result = nir_fmov(, srcs[0]); +break; + } + break; + } + nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]); nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type); result = nir_build_alu(, nir_type_conversion_op(src_type, dst_type, -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations
This works by moving the fadd up across the ffma operations, so that it can eventually can be combined with a fmul. I'm not sure it works in all cases, but it works in all the common cases. Example: matrix * vec4(coord, 1.0) is compiled as: fmul, ffma, ffma, fadd and with this patch: ffma, ffma, ffma Signed-off-by: Jonathan Marek --- src/compiler/nir/nir_opt_algebraic.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 8f4df891b8..82e10731a6 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -133,6 +133,7 @@ optimizations = [ (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', a, b, c), '!options->lower_flrp64'), (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'), (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'), + (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))), (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d)), (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)), -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] nir: add fceil lowering
lowers ceil(x) as -floor(-x) Signed-off-by: Jonathan Marek --- src/compiler/nir/nir.h| 3 +++ src/compiler/nir/nir_opt_algebraic.py | 1 + 2 files changed, 4 insertions(+) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index dc3c729dee..f9b32a5daf 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2086,6 +2086,9 @@ typedef struct nir_shader_compiler_options { /** lowers ffract to fsub+ffloor: */ bool lower_ffract; + /** lowers fceil to fneg+ffloor+fneg: */ + bool lower_fceil; + bool lower_ldexp; bool lower_pack_half_2x16; diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 8b24daddfd..8f4df891b8 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -124,6 +124,7 @@ optimizations = [ (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'), (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'), (('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'), + (('fceil', a), ('fneg', ('ffloor', ('fneg', a))), 'options->lower_fceil'), (('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', c, ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'), (('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg', c ))), ('fmul', b, c )), ('flrp', a, b, c), '!options->lower_flrp32'), (('~fadd@64', ('fmul', a, ('fadd', 1.0, ('fneg', c ))), ('fmul', b, c )), ('flrp', a, b, c), '!options->lower_flrp64'), -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/11] freedreno: a2xx: split large draws on a20x
Hi, You're right, it would be easy to do. I'll include it in my next submission. On 10/08/2018 12:13 AM, Ilia Mirkin wrote: See my feedback from your earlier submission for how to make this work on more than triangles. Seems easy enough to just do it. https://patchwork.freedesktop.org/patch/250192/ On Mon, Oct 8, 2018 at 12:07 AM Jonathan Marek wrote: a20x can only draw 65535 vertices at once. this fix only applies to triangles. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 1792505808..7ccbee587f 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty); fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty); - draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); - draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); + /* a20x can only draw 65535 vertices at once... */ + if (is_a20x(ctx->screen) && pinfo->count > 0x) { + struct pipe_draw_info info = *pinfo; + unsigned count = info.count; + unsigned num_vertices = ctx->batch->num_vertices; + + /* other primitives require more work +* (triangles works because 0x is divible by 3) +*/ + if (info.mode != PIPE_PRIM_TRIANGLES) + return false; + + for (; count; ) { + info.count = MIN2(count, 0x); + + draw_impl(ctx, , ctx->batch->draw, index_offset, false); + draw_impl(ctx, , ctx->batch->binning, index_offset, true); + + info.start += 0x; + ctx->batch->num_vertices += 0x; + count -= info.count; + } + /* changing this value is a hack, restore it */ + ctx->batch->num_vertices = num_vertices; + } else { + draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); + draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); + } fd_context_all_clean(ctx); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/11] freedreno: a2xx: set PA_SC_VIZ_QUERY register
on a20x the GPU will hang if this register is zero Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 4 1 file changed, 4 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 90da6a2192..10a8ad586c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -359,6 +359,10 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3)); } + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY)); + OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16)); + OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); OUT_RING(ring, 0x0002); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] freedreno: a2xx: start max_reg at -1
on a220 it makes a difference if the max register # is -1 or 0 Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c index f8e056e424..3924d11e5a 100644 --- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c +++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c @@ -209,8 +209,8 @@ void* ir2_shader_assemble(struct ir2_shader *shader, /* bitmask of variables required for exports defined by "export" */ uint32_t export_mask[REG_MASK/32+1] = {}; - unsigned idx, reg_idx; - unsigned max_input = 0; + int idx, reg_idx; + int max_input = -1; int export_size = -1; for (idx = 0; idx < shader->instr_count; idx++) { -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/11] freedreno: a2xx: a20x hw binning
adds all the required logic for a20x hw binning to work Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 95 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 +- src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 3 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 107 +- .../drivers/freedreno/a2xx/fd2_program.c | 41 --- .../drivers/freedreno/a2xx/fd2_program.h | 2 +- 6 files changed, 215 insertions(+), 43 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 6f0535fa2b..1792505808 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -77,31 +77,56 @@ emit_vertexbufs(struct fd_context *ctx) // CONST(20,0) (or CONST(26,0) in soliv_vp) fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements); + fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, vtx->num_elements); } -static bool -fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, - unsigned index_offset) +static void +draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, + struct fd_ringbuffer *ring, unsigned index_offset, + bool binning) { - struct fd_ringbuffer *ring = ctx->batch->draw; - - if (ctx->dirty & FD_DIRTY_VTXBUF) - emit_vertexbufs(ctx); - - fd2_emit_state(ctx, ctx->dirty); + enum pc_di_vis_cull_mode vismode; OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); - OUT_RING(ring, info->start); + OUT_RING(ring, info->index_size ? 0 : info->start); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, 0x003b); + /* in the binning batch, thid value is set once in fd2_emit_tile_init */ + if (!binning) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL write ? +* if set to 0x3b on a20x, clipping is broken +*/ + OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); + } OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - if (!is_a20x(ctx->screen)) { + if (is_a20x(ctx->screen)) { + /* wait for DMA to finish and +* dummy draw one triangle with indexes 0,0,0. +* with PRE_FETCH_CULL_ENABLE | GRP_CULL_ENABLE. +* +* this workaround is for a HW bug related to DMA alignment: +* it is necessary for indexed draws and possibly also +* draws that read binning data +*/ + OUT_PKT3(ring, CP_WAIT_REG_EQ, 4); + OUT_RING(ring, 0x05d0); /* RBBM_STATUS */ + OUT_RING(ring, 0x); + OUT_RING(ring, 0x1000); /* bit: 12: VGT_BUSY_NO_DMA */ + OUT_RING(ring, 0x0001); + + OUT_PKT3(ring, CP_DRAW_INDX_BIN, 6); + OUT_RING(ring, 0x); + OUT_RING(ring, 0x0003c004); + OUT_RING(ring, 0x); + OUT_RING(ring, 0x0003); + OUT_RELOC(ring, fd_resource(fd2_context(ctx)->solid_vertexbuf)->bo, 0x80, 0, 0); + OUT_RING(ring, 0x0006); + } else { OUT_WFI (ring); OUT_PKT3(ring, CP_SET_CONSTANT, 3); @@ -110,14 +135,44 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ } + /* C64 holds offset to use for binning data */ + if (binning && is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0180); + OUT_RING(ring, fui(ctx->batch->num_vertices)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + } + + vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY; + /* a22x hw binning not implemented */ + if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN)) + vismode = IGNORE_VISIBILITY; + fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], -IGNORE_VISIBILITY, info, index_offset); + vismode, info, index_offset); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010))
[Mesa-dev] [PATCH 11/11] a2xx: Compute depth base in gmem correctly
From: Marek Vasut This fixes "a2xx: Compute depth base in gmem consistently" by using the already present zsbuf and cbuf bases rather than incorrect hand crafted calculation. Without this patch, the following assertion triggers ie. with Qt5 on system with 480x272 display: appliation: ../../../../../git/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h:699: A2XX_RB_DEPTH_INFO_DEPTH_BASE: Assertion `!(val & 0x3ff)' failed. Signed-off-by: Marek Vasut --- src/gallium/drivers/freedreno/a2xx/a2xx.xml.h | 4 ++-- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 12 +++- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h index 4a2daca9ed..87c18918f5 100644 --- a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h +++ b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h @@ -682,7 +682,7 @@ static inline uint32_t A2XX_RB_COLOR_INFO_SWAP(uint32_t val) static inline uint32_t A2XX_RB_COLOR_INFO_BASE(uint32_t val) { assert(!(val & 0x3ff)); - return ((val >> 10) << A2XX_RB_COLOR_INFO_BASE__SHIFT) & A2XX_RB_COLOR_INFO_BASE__MASK; + return ((val >> 12) << A2XX_RB_COLOR_INFO_BASE__SHIFT) & A2XX_RB_COLOR_INFO_BASE__MASK; } #define REG_A2XX_RB_DEPTH_INFO 0x2002 @@ -697,7 +697,7 @@ static inline uint32_t A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(enum adreno_rb_depth_form static inline uint32_t A2XX_RB_DEPTH_INFO_DEPTH_BASE(uint32_t val) { assert(!(val & 0x3ff)); - return ((val >> 10) << A2XX_RB_DEPTH_INFO_DEPTH_BASE__SHIFT) & A2XX_RB_DEPTH_INFO_DEPTH_BASE__MASK; + return ((val >> 12) << A2XX_RB_DEPTH_INFO_DEPTH_BASE__SHIFT) & A2XX_RB_DEPTH_INFO_DEPTH_BASE__MASK; } #define REG_A2XX_A225_RB_COLOR_INFO3 0x2005 diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index 7cf5e201fe..cf93d8539c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c @@ -110,6 +110,7 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) { struct fd_context *ctx = batch->ctx; struct fd2_context *fd2_ctx = fd2_context(ctx); + struct fd_gmem_stateobj *gmem = >gmem; struct fd_ringbuffer *ring = batch->gmem; struct pipe_framebuffer_state *pfb = >framebuffer; @@ -172,10 +173,10 @@ fd2_emit_tile_gmem2mem(struct fd_batch *batch, struct fd_tile *tile) A2XX_RB_COPY_DEST_OFFSET_Y(tile->yoff)); if (batch->resolve & (FD_BUFFER_DEPTH | FD_BUFFER_STENCIL)) - emit_gmem2mem_surf(batch, tile->bin_w * tile->bin_h, pfb->zsbuf); + emit_gmem2mem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf); if (batch->resolve & FD_BUFFER_COLOR) - emit_gmem2mem_surf(batch, 0, pfb->cbufs[0]); + emit_gmem2mem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]); OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_MODECONTROL)); @@ -235,6 +236,7 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile) { struct fd_context *ctx = batch->ctx; struct fd2_context *fd2_ctx = fd2_context(ctx); + struct fd_gmem_stateobj *gmem = >gmem; struct fd_ringbuffer *ring = batch->gmem; struct pipe_framebuffer_state *pfb = >framebuffer; unsigned bin_w = tile->bin_w; @@ -333,10 +335,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile) OUT_RING(ring, 0x); if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_DEPTH | FD_BUFFER_STENCIL)) - emit_mem2gmem_surf(batch, bin_w * bin_h, pfb->zsbuf); + emit_mem2gmem_surf(batch, gmem->zsbuf_base[0], pfb->zsbuf); if (fd_gmem_needs_restore(batch, tile, FD_BUFFER_COLOR)) - emit_mem2gmem_surf(batch, 0, pfb->cbufs[0]); + emit_mem2gmem_surf(batch, gmem->cbuf_base[0], pfb->cbufs[0]); /* TODO blob driver seems to toss in a CACHE_FLUSH after each DRAW_INDX.. */ } @@ -360,7 +362,7 @@ fd2_emit_tile_init(struct fd_batch *batch) OUT_RING(ring, gmem->bin_w); /* RB_SURFACE_INFO */ OUT_RING(ring, A2XX_RB_COLOR_INFO_SWAP(fmt2swap(format)) | A2XX_RB_COLOR_INFO_FORMAT(fd2_pipe2color(format))); - reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(align(gmem->bin_w * gmem->bin_h, 4)); + reg = A2XX_RB_DEPTH_INFO_DEPTH_BASE(gmem->zsbuf_base[0]); if (pfb->zsbuf) reg |= A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(fd_pipe2depth(pfb->zsbuf->format)); OUT_RING(ring, reg); /* RB_DEPTH_INFO */ -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/11] freedreno: implement different pipe configuration for a20x
this also adds a num_vsc_pipe which represents the number of pipes to use: this value is useful because more pipes has a higher cost (on a20x) Signed-off-by: Jonathan Marek --- .../drivers/freedreno/freedreno_gmem.c| 29 ++- .../drivers/freedreno/freedreno_gmem.h| 1 + 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/freedreno/freedreno_gmem.c b/src/gallium/drivers/freedreno/freedreno_gmem.c index 668730390c..76f3b5a89e 100644 --- a/src/gallium/drivers/freedreno/freedreno_gmem.c +++ b/src/gallium/drivers/freedreno/freedreno_gmem.c @@ -216,12 +216,21 @@ calculate_tiles(struct fd_batch *batch) #define div_round_up(v, a) (((v) + (a) - 1) / (a)) /* figure out number of tiles per pipe: */ - tpp_x = tpp_y = 1; - while (div_round_up(nbins_y, tpp_y) > screen->num_vsc_pipes) - tpp_y += 2; - while ((div_round_up(nbins_y, tpp_y) * - div_round_up(nbins_x, tpp_x)) > screen->num_vsc_pipes) - tpp_x += 1; + if (is_a20x(ctx->screen)) { + /* for a20x we want to minimize the number of "pipes" +* binning data has 3 bits for x/y (8x8) but the edges are used to +* cull off-screen vertices with hw binning, so we have 6x6 pipes +*/ + tpp_x = 6; + tpp_y = 6; + } else { + tpp_x = tpp_y = 1; + while (div_round_up(nbins_y, tpp_y) > screen->num_vsc_pipes) + tpp_y += 2; + while ((div_round_up(nbins_y, tpp_y) * + div_round_up(nbins_x, tpp_x)) > screen->num_vsc_pipes) + tpp_x += 1; + } gmem->maxpw = tpp_x; gmem->maxph = tpp_y; @@ -248,6 +257,9 @@ calculate_tiles(struct fd_batch *batch) xoff += tpp_x; } + /* number of pipes to use for a20x */ + gmem->num_vsc_pipes = MAX2(1, i); + for (; i < npipes; i++) { struct fd_vsc_pipe *pipe = >vsc_pipe[i]; pipe->x = pipe->y = pipe->w = pipe->h = 0; @@ -282,11 +294,12 @@ calculate_tiles(struct fd_batch *batch) /* pipe number: */ p = ((i / tpp_y) * div_round_up(nbins_x, tpp_x)) + (j / tpp_x); + assert(p < gmem->num_vsc_pipes); /* clip bin width: */ bw = MIN2(bin_w, minx + width - xoff); - - tile->n = tile_n[p]++; + tile->n = !is_a20x(ctx->screen) ? tile_n[p]++ : + ((i % tpp_y + 1) << 3 | (j % tpp_x + 1)); tile->p = p; tile->bin_w = bw; tile->bin_h = bh; diff --git a/src/gallium/drivers/freedreno/freedreno_gmem.h b/src/gallium/drivers/freedreno/freedreno_gmem.h index 47f52307b6..3959ea18be 100644 --- a/src/gallium/drivers/freedreno/freedreno_gmem.h +++ b/src/gallium/drivers/freedreno/freedreno_gmem.h @@ -59,6 +59,7 @@ struct fd_gmem_stateobj { uint16_t minx, miny; uint16_t width, height; uint16_t maxpw, maxph; /* maximum pipe width/height */ + uint8_t num_vsc_pipes; /* number of pipes for a20x */ }; struct fd_batch; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/11] freedreno: a2xx: split large draws on a20x
a20x can only draw 65535 vertices at once. this fix only applies to triangles. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 1792505808..7ccbee587f 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty); fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty); - draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); - draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); + /* a20x can only draw 65535 vertices at once... */ + if (is_a20x(ctx->screen) && pinfo->count > 0x) { + struct pipe_draw_info info = *pinfo; + unsigned count = info.count; + unsigned num_vertices = ctx->batch->num_vertices; + + /* other primitives require more work +* (triangles works because 0x is divible by 3) +*/ + if (info.mode != PIPE_PRIM_TRIANGLES) + return false; + + for (; count; ) { + info.count = MIN2(count, 0x); + + draw_impl(ctx, , ctx->batch->draw, index_offset, false); + draw_impl(ctx, , ctx->batch->binning, index_offset, true); + + info.start += 0x; + ctx->batch->num_vertices += 0x; + count -= info.count; + } + /* changing this value is a hack, restore it */ + ctx->batch->num_vertices = num_vertices; + } else { + draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); + draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); + } fd_context_all_clean(ctx); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/11] freedreno: a2xx: add fragcoord
emulated fragcoord. a2xx has *some* hw support but it is not practical Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c| 16 1 file changed, 16 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 1ce3bc4f82..ab5d16f1a7 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -186,6 +186,7 @@ compile_init(struct fd2_compile_context *ctx, struct fd_program_stateobj *prog, switch (name) { case TGSI_SEMANTIC_COLOR: case TGSI_SEMANTIC_GENERIC: + case TGSI_SEMANTIC_POSITION: ctx->num_param++; break; default: @@ -325,6 +326,8 @@ add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, num = ctx->prog->num_exports; } } else { + /* write to gl_FragCoord.z not possible */ + assert(ctx->output_export_idx[dst->Index] != TGSI_SEMANTIC_POSITION); num = dst->Index; } break; @@ -1103,6 +1106,7 @@ compile_extra_exports(struct fd2_compile_context *ctx) { struct ir2_shader *shader = ctx->so->ir; struct ir2_instruction *instr; + int fragcoord = ctx->prog->export_linkage[TGSI_SEMANTIC_POSITION]; int position = ctx->num_regs[TGSI_FILE_INPUT] + 1; unsigned i; /* XXX hacky way to get new temporaries */ @@ -1122,6 +1126,18 @@ compile_extra_exports(struct fd2_compile_context *ctx) ir2_reg_create(instr, tmp, "", 0); ir2_dst_create(instr, tmp + 1, "xyzw", 0); + if (fragcoord != 0xff) { + instr = ir2_instr_create_alu_v(shader, MULADDv); + ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST); + ir2_reg_create(instr, tmp + 1, "xyzw", 0); + ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST); + ir2_dst_create(instr, fragcoord, "xyz_", IR2_REG_EXPORT); + + instr = ir2_instr_create_alu_s(shader, MAXs); + ir2_reg_create(instr, tmp, "", 0); + ir2_dst_create(instr, fragcoord, "___w", IR2_REG_EXPORT); + } + /* these two instructions could be avoided with constant folding * but it would be hard to implement.. */ -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/11] freedreno: a2xx: map tgsi ids to ir2 ids
this is for a2xx specific semantics (vertex id) and a basic SSA form Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c | 54 +-- src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 45 ++-- src/gallium/drivers/freedreno/a2xx/ir-a2xx.h | 9 3 files changed, 64 insertions(+), 44 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 12f9a1ce0a..54f0df54da 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -244,8 +244,8 @@ compile_vtx_fetch(struct fd2_compile_context *ctx) ctx->need_sync |= 1 << (i+1); - ir2_dst_create(instr, i+1, "xyzw", 0); ir2_reg_create(instr, 0, "x", IR2_REG_INPUT); + ir2_dst_create(instr, i+1, "xyzw", 0); if (i == 0) instr->sync = true; @@ -421,9 +421,9 @@ add_regs_vector_1(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 1); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); add_src_reg(ctx, alu, >Src[0].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_vector_clamp(inst, alu); } @@ -434,9 +434,9 @@ add_regs_vector_2(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 2); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); add_src_reg(ctx, alu, >Src[1].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_vector_clamp(inst, alu); } @@ -447,10 +447,10 @@ add_regs_vector_3(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 3); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); add_src_reg(ctx, alu, >Src[1].Register); add_src_reg(ctx, alu, >Src[2].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_vector_clamp(inst, alu); } @@ -461,8 +461,8 @@ add_regs_scalar_1(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 1); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_scalar_clamp(inst, alu); } @@ -544,17 +544,17 @@ push_predicate(struct fd2_compile_context *ctx, struct tgsi_src_register *src) get_predicate(ctx, _dst, NULL); alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SETNEs); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, src); + add_dst_reg(ctx, alu, _dst); } else { struct tgsi_src_register pred_src; get_predicate(ctx, _dst, _src); alu = ir2_instr_create_alu_v(ctx->so->ir, MULv); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, _src); add_src_reg(ctx, alu, src); + add_dst_reg(ctx, alu, _dst); // XXX need to make PRED_SETE_PUSHv IR2_PRED_NONE.. but need to make // sure src reg is valid if it was calculated with a predicate @@ -580,8 +580,8 @@ pop_predicate(struct fd2_compile_context *ctx) get_predicate(ctx, _dst, _src); alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SET_POPs); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, _src); + add_dst_reg(ctx, alu, _dst); alu->pred = IR2_PRED_NONE; } else { /* predicate register no longer needed: */ @@ -648,13 +648,13 @@ translate_pow(struct fd2_compile_context *ctx, get_internal_temp(ctx, _dst, _src); alu = ir2_instr_create_alu_s(ctx->so->ir, LOG_CLAMP); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, >Src[0].Register); + add_dst_reg(ctx, alu, _dst); alu = ir2_instr_create_alu_v(ctx->so->ir, MULv); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, _src); add_src_reg(ctx, alu, >Src[1].Register); + add_dst_reg(ctx, alu, _dst); /* NOTE: some of the instructions, like EXP_IEEE, seem hard- * coded to take their input from the w component. @@ -679,8 +679,8 @@ translate_pow(struct fd2_compile_context *ctx, } alu = ir2_instr_create_alu_s(ctx->so->ir, EXP_IEEE); - add_dst_reg(ctx, alu, >Dst[0].Register);
[Mesa-dev] [PATCH 05/11] freedreno: a2xx: implement a20x binning shader
writes to position export are mapped to a temp reg, code inserted at the end of vertex shaders to export the position and compute the memory exports for hw binning on a20x. C64 is the offset in the binning data, C65/C66 are viewport parameters, C67+i/C68+i are binning view parameters. C3+i is the binning data "pointer" - relative_addr=1 (in ir-a2xx) makes it not interfere with the other shader constants Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c | 72 +-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 14 .../drivers/freedreno/a2xx/fd2_program.c | 6 +- src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 62 +--- src/gallium/drivers/freedreno/a2xx/ir-a2xx.h | 4 +- 5 files changed, 141 insertions(+), 17 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 54f0df54da..1ce3bc4f82 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -294,7 +294,7 @@ get_temp_gpr(struct fd2_compile_context *ctx, int idx) { unsigned num = idx + ctx->num_regs[TGSI_FILE_INPUT]; if (ctx->type == PIPE_SHADER_VERTEX) - num++; + num += 2; /* vertex fetch input / position temp */ return num; } @@ -310,12 +310,19 @@ add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, flags |= IR2_REG_EXPORT; if (ctx->type == PIPE_SHADER_VERTEX) { if (dst->Index == ctx->position) { - num = 62; + /* position needed for fragcoord / a20x hw binning +* write to a temp reg instead +*/ + num = ctx->num_regs[TGSI_FILE_INPUT] + 1; + flags &= ~IR2_REG_EXPORT; } else if (dst->Index == ctx->psize) { num = 63; } else { - num = export_linkage(ctx, - ctx->output_export_idx[dst->Index]); + num = ctx->prog->export_linkage[ + ctx->output_export_idx[dst->Index]]; + /* not used by fragment shader - ir-a2xx will clean it up */ + if (num == 0xff) + num = ctx->prog->num_exports; } } else { num = dst->Index; @@ -1091,6 +1098,60 @@ compile_instructions(struct fd2_compile_context *ctx) } } +static void +compile_extra_exports(struct fd2_compile_context *ctx) +{ + struct ir2_shader *shader = ctx->so->ir; + struct ir2_instruction *instr; + int position = ctx->num_regs[TGSI_FILE_INPUT] + 1; + unsigned i; + /* XXX hacky way to get new temporaries */ + unsigned tmp = shader->max_reg + 1; + + instr = ir2_instr_create_alu_v(shader, MAXv); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_dst_create(instr, 62, "xyzw", IR2_REG_EXPORT); + + instr = ir2_instr_create_alu_s(shader, RECIP_CLAMP); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_dst_create(instr, tmp, "___w", 0); + + instr = ir2_instr_create_alu_v(shader, MULv); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_reg_create(instr, tmp, "", 0); + ir2_dst_create(instr, tmp + 1, "xyzw", 0); + + /* these two instructions could be avoided with constant folding +* but it would be hard to implement.. +*/ + instr = ir2_instr_create_alu_v(shader, MULADDv); + ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST); + ir2_reg_create(instr, tmp + 1, "xyzw", 0); + ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST); + ir2_dst_create(instr, tmp + 2, "xyzw", 0); + + instr = ir2_instr_create_alu_v(shader, ADDv); + ir2_reg_create(instr, 64, "", IR2_REG_CONST); + ir2_reg_create(instr, 15, "", IR2_REG_INPUT); + ir2_dst_create(instr, tmp + 3, "x___", 0); + + /* 8 max set in freedreno_screen.. unneeded instrs patched out */ + for (i = 0; i < 8; i++) { + instr = ir2_instr_create_alu_v(shader, MULADDv); + ir2_reg_create(instr, 1, "wyww", IR2_REG_CONST); + ir2_reg_create(instr, tmp + 3, "", 0); + ir2_reg_create(instr, 3 + i, "xyzw", IR2_REG_CONST); +
[Mesa-dev] [PATCH 03/11] freedreno: add a20x ids
the two a20x GPUs tested are a200 in the imx51 and the imx53 (not a205). the 201 id is used for the imx51 (it only has 128kb gmem as opposed to the typical 256kb for a200) Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 231e0d4c81..ef722e31fb 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -797,6 +797,8 @@ fd_screen_create(struct fd_device *dev) * send a patch ;-) */ switch (screen->gpu_id) { + case 200: + case 201: case 205: case 220: fd2_screen_init(pscreen); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] freedreno: implement the USE_VISIBILITY case for a20x in fd_draw
this introduces some tracking of the number of vertices drawn in the current batch: the draw command needs an offset to the start of the binning data Signed-off-by: Jonathan Marek --- .../drivers/freedreno/adreno_pm4.xml.h| 7 + .../drivers/freedreno/freedreno_batch.c | 1 + .../drivers/freedreno/freedreno_batch.h | 1 + .../drivers/freedreno/freedreno_draw.c| 2 ++ .../drivers/freedreno/freedreno_draw.h| 28 +-- .../drivers/freedreno/freedreno_util.h| 8 -- 6 files changed, 42 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/freedreno/adreno_pm4.xml.h b/src/gallium/drivers/freedreno/adreno_pm4.xml.h index 88d1c4e6eb..27bbb1928e 100644 --- a/src/gallium/drivers/freedreno/adreno_pm4.xml.h +++ b/src/gallium/drivers/freedreno/adreno_pm4.xml.h @@ -108,6 +108,13 @@ enum pc_di_src_sel { DI_SRC_SEL_RESERVED = 3, }; +enum pc_di_face_cull_sel { +DI_FACE_CULL_NONE = 0, +DI_FACE_CULL_FETCH = 1, +DI_FACE_BACKFACE_CULL = 2, +DI_FACE_FRONTFACE_CULL = 3, +}; + enum pc_di_index_size { INDEX_SIZE_IGN = 0, INDEX_SIZE_16_BIT = 0, diff --git a/src/gallium/drivers/freedreno/freedreno_batch.c b/src/gallium/drivers/freedreno/freedreno_batch.c index a714d97f5c..7ffadea4e0 100644 --- a/src/gallium/drivers/freedreno/freedreno_batch.c +++ b/src/gallium/drivers/freedreno/freedreno_batch.c @@ -76,6 +76,7 @@ batch_init(struct fd_batch *batch) batch->flushed = false; batch->gmem_reason = 0; batch->num_draws = 0; + batch->num_vertices = 0; batch->stage = FD_STAGE_NULL; fd_reset_wfi(batch); diff --git a/src/gallium/drivers/freedreno/freedreno_batch.h b/src/gallium/drivers/freedreno/freedreno_batch.h index 6ff4014ddc..7e6c780aca 100644 --- a/src/gallium/drivers/freedreno/freedreno_batch.h +++ b/src/gallium/drivers/freedreno/freedreno_batch.h @@ -124,6 +124,7 @@ struct fd_batch { FD_GMEM_LOGICOP_ENABLED = 0x20, } gmem_reason; unsigned num_draws; /* number of draws in current batch */ + unsigned num_vertices; /* number of vertices in current batch */ /* Track the maximal bounds of the scissor of all the draws within a * batch. Used at the tile rendering step (fd_gmem_render_tiles(), diff --git a/src/gallium/drivers/freedreno/freedreno_draw.c b/src/gallium/drivers/freedreno/freedreno_draw.c index e130895aac..974a153773 100644 --- a/src/gallium/drivers/freedreno/freedreno_draw.c +++ b/src/gallium/drivers/freedreno/freedreno_draw.c @@ -263,6 +263,8 @@ fd_draw_vbo(struct pipe_context *pctx, const struct pipe_draw_info *info) if (ctx->draw_vbo(ctx, info, index_offset)) batch->needs_flush = true; + batch->num_vertices += info->count; + for (i = 0; i < ctx->streamout.num_targets; i++) ctx->streamout.offsets[i] += info->count; diff --git a/src/gallium/drivers/freedreno/freedreno_draw.h b/src/gallium/drivers/freedreno/freedreno_draw.h index 4a922d9ca3..7f4407a3ae 100644 --- a/src/gallium/drivers/freedreno/freedreno_draw.h +++ b/src/gallium/drivers/freedreno/freedreno_draw.h @@ -41,6 +41,7 @@ struct fd_ringbuffer; void fd_draw_init(struct pipe_context *pctx); + static inline void fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring, enum pc_di_primtype primtype, @@ -75,9 +76,31 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring, } if (is_a20x(batch->ctx->screen)) { - OUT_PKT3(ring, CP_DRAW_INDX, idx_buffer ? 4 : 2); + /* a20x has a different draw command for drawing with binning data +* that makes it harder to patch so always use hw binning if enabled +* +* binning data is is 1 byte/vertex (8x8x4 bin position of vertex) +* base ptr set by the CP_SET_DRAW_INIT_FLAGS command +* +* TODO: investigate the faceness_cull_select parameter to see how +* it is used with hw binning to use "faceness" bits +*/ + bool bin = (vismode == USE_VISIBILITY); + uint32_t draw_initiator = DRAW_A20X(primtype, DI_FACE_CULL_NONE, + src_sel, idx_type, bin, bin, count); + uint32_t size = 2; + if (bin) + size += 2; + if (idx_buffer) + size += 2; + + OUT_PKT3(ring, bin ? CP_DRAW_INDX_BIN : CP_DRAW_INDX, size); OUT_RING(ring, 0x); - OUT_RING(ring, DRAW_A20X(primtype, src_sel, idx_type, vismode, count)); + OUT_RING(ring, draw_initiator); + if (bin) { + OUT_RING(ring, batch->num_vertices); + OUT_RING(ring, count); + } } else
[Mesa-dev] [PATCH 11/11] freedreno: a2xx: set PA_SC_VIZ_QUERY register
on a20x the GPU will hang if this register is zero Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 4 1 file changed, 4 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index dcb7b6500a..4a1085e676 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -358,6 +358,10 @@ fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3)); } + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_VIZ_QUERY)); + OUT_RING(ring, A2XX_PA_SC_VIZ_QUERY_VIZ_QUERY_ID(16)); + OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); OUT_RING(ring, 0x0002); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] freedreno: a2xx: split large draws on a20x
a20x can only draw 65535 vertices at once. this fix only applies to triangles. Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 1792505808..7ccbee587f 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *pinfo, fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty); fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty); - draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); - draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); + /* a20x can only draw 65535 vertices at once... */ + if (is_a20x(ctx->screen) && pinfo->count > 0x) { + struct pipe_draw_info info = *pinfo; + unsigned count = info.count; + unsigned num_vertices = ctx->batch->num_vertices; + + /* other primitives require more work +* (triangles works because 0x is divible by 3) +*/ + if (info.mode != PIPE_PRIM_TRIANGLES) + return false; + + for (; count; ) { + info.count = MIN2(count, 0x); + + draw_impl(ctx, , ctx->batch->draw, index_offset, false); + draw_impl(ctx, , ctx->batch->binning, index_offset, true); + + info.start += 0x; + ctx->batch->num_vertices += 0x; + count -= info.count; + } + /* changing this value is a hack, restore it */ + ctx->batch->num_vertices = num_vertices; + } else { + draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false); + draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true); + } fd_context_all_clean(ctx); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/11] freedreno: a2xx: add fragcoord
emulated fragcoord. a2xx has *some* hw support but it is not practical Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c| 16 1 file changed, 16 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 1ce3bc4f82..ab5d16f1a7 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -186,6 +186,7 @@ compile_init(struct fd2_compile_context *ctx, struct fd_program_stateobj *prog, switch (name) { case TGSI_SEMANTIC_COLOR: case TGSI_SEMANTIC_GENERIC: + case TGSI_SEMANTIC_POSITION: ctx->num_param++; break; default: @@ -325,6 +326,8 @@ add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, num = ctx->prog->num_exports; } } else { + /* write to gl_FragCoord.z not possible */ + assert(ctx->output_export_idx[dst->Index] != TGSI_SEMANTIC_POSITION); num = dst->Index; } break; @@ -1103,6 +1106,7 @@ compile_extra_exports(struct fd2_compile_context *ctx) { struct ir2_shader *shader = ctx->so->ir; struct ir2_instruction *instr; + int fragcoord = ctx->prog->export_linkage[TGSI_SEMANTIC_POSITION]; int position = ctx->num_regs[TGSI_FILE_INPUT] + 1; unsigned i; /* XXX hacky way to get new temporaries */ @@ -1122,6 +1126,18 @@ compile_extra_exports(struct fd2_compile_context *ctx) ir2_reg_create(instr, tmp, "", 0); ir2_dst_create(instr, tmp + 1, "xyzw", 0); + if (fragcoord != 0xff) { + instr = ir2_instr_create_alu_v(shader, MULADDv); + ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST); + ir2_reg_create(instr, tmp + 1, "xyzw", 0); + ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST); + ir2_dst_create(instr, fragcoord, "xyz_", IR2_REG_EXPORT); + + instr = ir2_instr_create_alu_s(shader, MAXs); + ir2_reg_create(instr, tmp, "", 0); + ir2_dst_create(instr, fragcoord, "___w", IR2_REG_EXPORT); + } + /* these two instructions could be avoided with constant folding * but it would be hard to implement.. */ -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/11] freedreno: a2xx: a20x hw binning
adds all the required logic for a20x hw binning to work Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 95 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 +- src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 3 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 106 +- .../drivers/freedreno/a2xx/fd2_program.c | 41 --- .../drivers/freedreno/a2xx/fd2_program.h | 2 +- 6 files changed, 214 insertions(+), 43 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 6f0535fa2b..1792505808 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -77,31 +77,56 @@ emit_vertexbufs(struct fd_context *ctx) // CONST(20,0) (or CONST(26,0) in soliv_vp) fd2_emit_vertex_bufs(ctx->batch->draw, 0x78, bufs, vtx->num_elements); + fd2_emit_vertex_bufs(ctx->batch->binning, 0x78, bufs, vtx->num_elements); } -static bool -fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, - unsigned index_offset) +static void +draw_impl(struct fd_context *ctx, const struct pipe_draw_info *info, + struct fd_ringbuffer *ring, unsigned index_offset, + bool binning) { - struct fd_ringbuffer *ring = ctx->batch->draw; - - if (ctx->dirty & FD_DIRTY_VTXBUF) - emit_vertexbufs(ctx); - - fd2_emit_state(ctx, ctx->dirty); + enum pc_di_vis_cull_mode vismode; OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_VGT_INDX_OFFSET)); - OUT_RING(ring, info->start); + OUT_RING(ring, info->index_size ? 0 : info->start); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); - OUT_RING(ring, 0x003b); + /* in the binning batch, thid value is set once in fd2_emit_tile_init */ + if (!binning) { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL)); + /* XXX do this for every REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL write ? +* if set to 0x3b on a20x, clipping is broken +*/ + OUT_RING(ring, is_a20x(ctx->screen) ? 0x0002 : 0x003b); + } OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - if (!is_a20x(ctx->screen)) { + if (is_a20x(ctx->screen)) { + /* wait for DMA to finish and +* dummy draw one triangle with indexes 0,0,0. +* with PRE_FETCH_CULL_ENABLE | GRP_CULL_ENABLE. +* +* this workaround is for a HW bug related to DMA alignment: +* it is necessary for indexed draws and possibly also +* draws that read binning data +*/ + OUT_PKT3(ring, CP_WAIT_REG_EQ, 4); + OUT_RING(ring, 0x05d0); /* RBBM_STATUS */ + OUT_RING(ring, 0x); + OUT_RING(ring, 0x1000); /* bit: 12: VGT_BUSY_NO_DMA */ + OUT_RING(ring, 0x0001); + + OUT_PKT3(ring, CP_DRAW_INDX_BIN, 6); + OUT_RING(ring, 0x); + OUT_RING(ring, 0x0003c004); + OUT_RING(ring, 0x); + OUT_RING(ring, 0x0003); + OUT_RELOC(ring, fd_resource(fd2_context(ctx)->solid_vertexbuf)->bo, 0x80, 0, 0); + OUT_RING(ring, 0x0006); + } else { OUT_WFI (ring); OUT_PKT3(ring, CP_SET_CONSTANT, 3); @@ -110,14 +135,44 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ } + /* C64 holds offset to use for binning data */ + if (binning && is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0180); + OUT_RING(ring, fui(ctx->batch->num_vertices)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + OUT_RING(ring, fui(0.0f)); + } + + vismode = binning ? IGNORE_VISIBILITY : USE_VISIBILITY; + /* a22x hw binning not implemented */ + if (binning || !is_a20x(ctx->screen) || (fd_mesa_debug & FD_DBG_NOBIN)) + vismode = IGNORE_VISIBILITY; + fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], -IGNORE_VISIBILITY, info, index_offset); + vismode, info, index_offset); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_UNKNOWN_2010))
[Mesa-dev] [PATCH 07/11] freedreno: a2xx: implement a20x binning shader
writes to position export are mapped to a temp reg, code inserted at the end of vertex shaders to export the position and compute the memory exports for hw binning on a20x. C64 is the offset in the binning data, C65/C66 are viewport parameters, C67+i/C68+i are binning view parameters. C3+i is the binning data "pointer" - relative_addr=1 (in ir-a2xx) makes it not interfere with the other shader constants Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c | 72 +-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 14 .../drivers/freedreno/a2xx/fd2_program.c | 6 +- src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 62 +--- src/gallium/drivers/freedreno/a2xx/ir-a2xx.h | 4 +- 5 files changed, 141 insertions(+), 17 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 54f0df54da..1ce3bc4f82 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -294,7 +294,7 @@ get_temp_gpr(struct fd2_compile_context *ctx, int idx) { unsigned num = idx + ctx->num_regs[TGSI_FILE_INPUT]; if (ctx->type == PIPE_SHADER_VERTEX) - num++; + num += 2; /* vertex fetch input / position temp */ return num; } @@ -310,12 +310,19 @@ add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, flags |= IR2_REG_EXPORT; if (ctx->type == PIPE_SHADER_VERTEX) { if (dst->Index == ctx->position) { - num = 62; + /* position needed for fragcoord / a20x hw binning +* write to a temp reg instead +*/ + num = ctx->num_regs[TGSI_FILE_INPUT] + 1; + flags &= ~IR2_REG_EXPORT; } else if (dst->Index == ctx->psize) { num = 63; } else { - num = export_linkage(ctx, - ctx->output_export_idx[dst->Index]); + num = ctx->prog->export_linkage[ + ctx->output_export_idx[dst->Index]]; + /* not used by fragment shader - ir-a2xx will clean it up */ + if (num == 0xff) + num = ctx->prog->num_exports; } } else { num = dst->Index; @@ -1091,6 +1098,60 @@ compile_instructions(struct fd2_compile_context *ctx) } } +static void +compile_extra_exports(struct fd2_compile_context *ctx) +{ + struct ir2_shader *shader = ctx->so->ir; + struct ir2_instruction *instr; + int position = ctx->num_regs[TGSI_FILE_INPUT] + 1; + unsigned i; + /* XXX hacky way to get new temporaries */ + unsigned tmp = shader->max_reg + 1; + + instr = ir2_instr_create_alu_v(shader, MAXv); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_dst_create(instr, 62, "xyzw", IR2_REG_EXPORT); + + instr = ir2_instr_create_alu_s(shader, RECIP_CLAMP); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_dst_create(instr, tmp, "___w", 0); + + instr = ir2_instr_create_alu_v(shader, MULv); + ir2_reg_create(instr, position, "xyzw", 0); + ir2_reg_create(instr, tmp, "", 0); + ir2_dst_create(instr, tmp + 1, "xyzw", 0); + + /* these two instructions could be avoided with constant folding +* but it would be hard to implement.. +*/ + instr = ir2_instr_create_alu_v(shader, MULADDv); + ir2_reg_create(instr, 66, "xyzw", IR2_REG_CONST); + ir2_reg_create(instr, tmp + 1, "xyzw", 0); + ir2_reg_create(instr, 65, "xyzw", IR2_REG_CONST); + ir2_dst_create(instr, tmp + 2, "xyzw", 0); + + instr = ir2_instr_create_alu_v(shader, ADDv); + ir2_reg_create(instr, 64, "", IR2_REG_CONST); + ir2_reg_create(instr, 15, "", IR2_REG_INPUT); + ir2_dst_create(instr, tmp + 3, "x___", 0); + + /* 8 max set in freedreno_screen.. unneeded instrs patched out */ + for (i = 0; i < 8; i++) { + instr = ir2_instr_create_alu_v(shader, MULADDv); + ir2_reg_create(instr, 1, "wyww", IR2_REG_CONST); + ir2_reg_create(instr, tmp + 3, "", 0); + ir2_reg_create(instr, 3 + i, "xyzw", IR2_REG_CONST); +
[Mesa-dev] [PATCH 06/11] freedreno: a2xx: map tgsi ids to ir2 ids
this is for a2xx specific semantics (vertex id) and a basic SSA form Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c | 54 +-- src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 45 ++-- src/gallium/drivers/freedreno/a2xx/ir-a2xx.h | 9 3 files changed, 64 insertions(+), 44 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 12f9a1ce0a..54f0df54da 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -244,8 +244,8 @@ compile_vtx_fetch(struct fd2_compile_context *ctx) ctx->need_sync |= 1 << (i+1); - ir2_dst_create(instr, i+1, "xyzw", 0); ir2_reg_create(instr, 0, "x", IR2_REG_INPUT); + ir2_dst_create(instr, i+1, "xyzw", 0); if (i == 0) instr->sync = true; @@ -421,9 +421,9 @@ add_regs_vector_1(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 1); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); add_src_reg(ctx, alu, >Src[0].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_vector_clamp(inst, alu); } @@ -434,9 +434,9 @@ add_regs_vector_2(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 2); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); add_src_reg(ctx, alu, >Src[1].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_vector_clamp(inst, alu); } @@ -447,10 +447,10 @@ add_regs_vector_3(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 3); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); add_src_reg(ctx, alu, >Src[1].Register); add_src_reg(ctx, alu, >Src[2].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_vector_clamp(inst, alu); } @@ -461,8 +461,8 @@ add_regs_scalar_1(struct fd2_compile_context *ctx, assert(inst->Instruction.NumSrcRegs == 1); assert(inst->Instruction.NumDstRegs == 1); - add_dst_reg(ctx, alu, >Dst[0].Register); add_src_reg(ctx, alu, >Src[0].Register); + add_dst_reg(ctx, alu, >Dst[0].Register); add_scalar_clamp(inst, alu); } @@ -544,17 +544,17 @@ push_predicate(struct fd2_compile_context *ctx, struct tgsi_src_register *src) get_predicate(ctx, _dst, NULL); alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SETNEs); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, src); + add_dst_reg(ctx, alu, _dst); } else { struct tgsi_src_register pred_src; get_predicate(ctx, _dst, _src); alu = ir2_instr_create_alu_v(ctx->so->ir, MULv); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, _src); add_src_reg(ctx, alu, src); + add_dst_reg(ctx, alu, _dst); // XXX need to make PRED_SETE_PUSHv IR2_PRED_NONE.. but need to make // sure src reg is valid if it was calculated with a predicate @@ -580,8 +580,8 @@ pop_predicate(struct fd2_compile_context *ctx) get_predicate(ctx, _dst, _src); alu = ir2_instr_create_alu_s(ctx->so->ir, PRED_SET_POPs); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, _src); + add_dst_reg(ctx, alu, _dst); alu->pred = IR2_PRED_NONE; } else { /* predicate register no longer needed: */ @@ -648,13 +648,13 @@ translate_pow(struct fd2_compile_context *ctx, get_internal_temp(ctx, _dst, _src); alu = ir2_instr_create_alu_s(ctx->so->ir, LOG_CLAMP); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, >Src[0].Register); + add_dst_reg(ctx, alu, _dst); alu = ir2_instr_create_alu_v(ctx->so->ir, MULv); - add_dst_reg(ctx, alu, _dst); add_src_reg(ctx, alu, _src); add_src_reg(ctx, alu, >Src[1].Register); + add_dst_reg(ctx, alu, _dst); /* NOTE: some of the instructions, like EXP_IEEE, seem hard- * coded to take their input from the w component. @@ -679,8 +679,8 @@ translate_pow(struct fd2_compile_context *ctx, } alu = ir2_instr_create_alu_s(ctx->so->ir, EXP_IEEE); - add_dst_reg(ctx, alu, >Dst[0].Register);
[Mesa-dev] [PATCH 05/11] freedreno: add a20x ids
the two a20x GPUs tested are a200 in the imx51 and the imx53 (not a205). the 201 id is used for the imx51 (it only has 128kb gmem as opposed to the typical 256kb for a200) Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index ef88e5b121..c39c140fac 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -782,6 +782,8 @@ fd_screen_create(struct fd_device *dev, struct renderonly *ro) * send a patch ;-) */ switch (screen->gpu_id) { + case 200: + case 201: case 205: case 220: fd2_screen_init(pscreen); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/11] freedreno: implement different pipe configuration for a20x
this also adds a num_vsc_pipe which represents the number of pipes to use: this value is useful because more pipes has a higher cost (on a20x) Signed-off-by: Jonathan Marek --- .../drivers/freedreno/freedreno_context.h | 1 + .../drivers/freedreno/freedreno_gmem.c| 30 ++- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/freedreno/freedreno_context.h b/src/gallium/drivers/freedreno/freedreno_context.h index 58fba99874..e150fdef5a 100644 --- a/src/gallium/drivers/freedreno/freedreno_context.h +++ b/src/gallium/drivers/freedreno/freedreno_context.h @@ -260,6 +260,7 @@ struct fd_context { struct fd_gmem_stateobj gmem; struct fd_vsc_pipe vsc_pipe[16]; struct fd_tile tile[512]; + unsigned num_vsc_pipe; /* which state objects need to be re-emit'd: */ enum fd_dirty_3d_state dirty; diff --git a/src/gallium/drivers/freedreno/freedreno_gmem.c b/src/gallium/drivers/freedreno/freedreno_gmem.c index 981ab0cf76..44133a19ab 100644 --- a/src/gallium/drivers/freedreno/freedreno_gmem.c +++ b/src/gallium/drivers/freedreno/freedreno_gmem.c @@ -215,12 +215,21 @@ calculate_tiles(struct fd_batch *batch) #define div_round_up(v, a) (((v) + (a) - 1) / (a)) /* figure out number of tiles per pipe: */ - tpp_x = tpp_y = 1; - while (div_round_up(nbins_y, tpp_y) > 8) - tpp_y += 2; - while ((div_round_up(nbins_y, tpp_y) * - div_round_up(nbins_x, tpp_x)) > 8) - tpp_x += 1; + if (is_a20x(ctx->screen)) { + /* for a20x we want to minimize the number of "pipes" +* binning data has 3 bits for x/y (8x8) but the edges are used to +* cull off-screen vertices with hw binning, so we have 6x6 pipes +*/ + tpp_x = 6; + tpp_y = 6; + } else { + tpp_x = tpp_y = 1; + while (div_round_up(nbins_y, tpp_y) > 8) + tpp_y += 2; + while ((div_round_up(nbins_y, tpp_y) * + div_round_up(nbins_x, tpp_x)) > 8) + tpp_x += 1; + } gmem->maxpw = tpp_x; gmem->maxph = tpp_y; @@ -246,6 +255,10 @@ calculate_tiles(struct fd_batch *batch) xoff += tpp_x; } + /* number of pipes to use (for a20x) +* at least 1 pipe is needed +*/ + ctx->num_vsc_pipe = i ? i : 1; for (; i < npipes; i++) { struct fd_vsc_pipe *pipe = >vsc_pipe[i]; @@ -281,11 +294,12 @@ calculate_tiles(struct fd_batch *batch) /* pipe number: */ p = ((i / tpp_y) * div_round_up(nbins_x, tpp_x)) + (j / tpp_x); + assert(p < ctx->num_vsc_pipe); /* clip bin width: */ bw = MIN2(bin_w, minx + width - xoff); - - tile->n = tile_n[p]++; + tile->n = !is_a20x(ctx->screen) ? tile_n[p]++ : + ((i % tpp_y + 1) << 3 | (j % tpp_x + 1)); tile->p = p; tile->bin_w = bw; tile->bin_h = bh; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] freedreno: implement the USE_VISIBILITY case for a20x in fd_draw
this introduces some tracking of the number of vertices drawn in the current batch: the draw command needs an offset to the start of the binning data Signed-off-by: Jonathan Marek --- .../drivers/freedreno/adreno_pm4.xml.h| 7 + .../drivers/freedreno/freedreno_batch.c | 1 + .../drivers/freedreno/freedreno_batch.h | 1 + .../drivers/freedreno/freedreno_draw.c| 2 ++ .../drivers/freedreno/freedreno_draw.h| 28 +-- .../drivers/freedreno/freedreno_util.h| 8 -- 6 files changed, 42 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/freedreno/adreno_pm4.xml.h b/src/gallium/drivers/freedreno/adreno_pm4.xml.h index fe96a1381f..eff0ed9f8e 100644 --- a/src/gallium/drivers/freedreno/adreno_pm4.xml.h +++ b/src/gallium/drivers/freedreno/adreno_pm4.xml.h @@ -108,6 +108,13 @@ enum pc_di_src_sel { DI_SRC_SEL_RESERVED = 3, }; +enum pc_di_face_cull_sel { +DI_FACE_CULL_NONE = 0, +DI_FACE_CULL_FETCH = 1, +DI_FACE_BACKFACE_CULL = 2, +DI_FACE_FRONTFACE_CULL = 3, +}; + enum pc_di_index_size { INDEX_SIZE_IGN = 0, INDEX_SIZE_16_BIT = 0, diff --git a/src/gallium/drivers/freedreno/freedreno_batch.c b/src/gallium/drivers/freedreno/freedreno_batch.c index ff8298e82a..ad60c2742c 100644 --- a/src/gallium/drivers/freedreno/freedreno_batch.c +++ b/src/gallium/drivers/freedreno/freedreno_batch.c @@ -75,6 +75,7 @@ batch_init(struct fd_batch *batch) batch->flushed = false; batch->gmem_reason = 0; batch->num_draws = 0; + batch->num_vertices = 0; batch->stage = FD_STAGE_NULL; fd_reset_wfi(batch); diff --git a/src/gallium/drivers/freedreno/freedreno_batch.h b/src/gallium/drivers/freedreno/freedreno_batch.h index 6bb88a6291..67cadd5633 100644 --- a/src/gallium/drivers/freedreno/freedreno_batch.h +++ b/src/gallium/drivers/freedreno/freedreno_batch.h @@ -120,6 +120,7 @@ struct fd_batch { FD_GMEM_LOGICOP_ENABLED = 0x20, } gmem_reason; unsigned num_draws; /* number of draws in current batch */ + unsigned num_vertices; /* number of vertices in current batch */ /* Track the maximal bounds of the scissor of all the draws within a * batch. Used at the tile rendering step (fd_gmem_render_tiles(), diff --git a/src/gallium/drivers/freedreno/freedreno_draw.c b/src/gallium/drivers/freedreno/freedreno_draw.c index f55905e7bf..ee8aab48c3 100644 --- a/src/gallium/drivers/freedreno/freedreno_draw.c +++ b/src/gallium/drivers/freedreno/freedreno_draw.c @@ -254,6 +254,8 @@ fd_draw_vbo(struct pipe_context *pctx, const struct pipe_draw_info *info) if (ctx->draw_vbo(ctx, info, index_offset)) batch->needs_flush = true; + batch->num_vertices += info->count; + for (i = 0; i < ctx->streamout.num_targets; i++) ctx->streamout.offsets[i] += info->count; diff --git a/src/gallium/drivers/freedreno/freedreno_draw.h b/src/gallium/drivers/freedreno/freedreno_draw.h index 4a922d9ca3..7f4407a3ae 100644 --- a/src/gallium/drivers/freedreno/freedreno_draw.h +++ b/src/gallium/drivers/freedreno/freedreno_draw.h @@ -41,6 +41,7 @@ struct fd_ringbuffer; void fd_draw_init(struct pipe_context *pctx); + static inline void fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring, enum pc_di_primtype primtype, @@ -75,9 +76,31 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring, } if (is_a20x(batch->ctx->screen)) { - OUT_PKT3(ring, CP_DRAW_INDX, idx_buffer ? 4 : 2); + /* a20x has a different draw command for drawing with binning data +* that makes it harder to patch so always use hw binning if enabled +* +* binning data is is 1 byte/vertex (8x8x4 bin position of vertex) +* base ptr set by the CP_SET_DRAW_INIT_FLAGS command +* +* TODO: investigate the faceness_cull_select parameter to see how +* it is used with hw binning to use "faceness" bits +*/ + bool bin = (vismode == USE_VISIBILITY); + uint32_t draw_initiator = DRAW_A20X(primtype, DI_FACE_CULL_NONE, + src_sel, idx_type, bin, bin, count); + uint32_t size = 2; + if (bin) + size += 2; + if (idx_buffer) + size += 2; + + OUT_PKT3(ring, bin ? CP_DRAW_INDX_BIN : CP_DRAW_INDX, size); OUT_RING(ring, 0x); - OUT_RING(ring, DRAW_A20X(primtype, src_sel, idx_type, vismode, count)); + OUT_RING(ring, draw_initiator); + if (bin) { + OUT_RING(ring, batch->num_vertices); + OUT_RING(ring, count); + } } else
[Mesa-dev] [PATCH 03/11] freedreno: use renderonly scanout
Signed-off-by: Jonathan Marek --- .../drivers/freedreno/freedreno_resource.c| 57 +-- .../drivers/freedreno/freedreno_resource.h| 1 + .../drivers/freedreno/freedreno_screen.c | 29 +++--- .../drivers/freedreno/freedreno_screen.h | 10 ++-- .../freedreno/drm/freedreno_drm_public.h | 4 ++ .../freedreno/drm/freedreno_drm_winsys.c | 23 ++-- 6 files changed, 85 insertions(+), 39 deletions(-) diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c b/src/gallium/drivers/freedreno/freedreno_resource.c index 344004f696..adfa0f27a7 100644 --- a/src/gallium/drivers/freedreno/freedreno_resource.c +++ b/src/gallium/drivers/freedreno/freedreno_resource.c @@ -645,6 +645,9 @@ fd_resource_destroy(struct pipe_screen *pscreen, fd_bc_invalidate_resource(rsc, true); if (rsc->bo) fd_bo_del(rsc->bo); + if (rsc->scanout) + renderonly_scanout_destroy(rsc->scanout, fd_screen(pscreen)->ro); + util_range_destroy(>valid_buffer_range); FREE(rsc); } @@ -657,9 +660,26 @@ fd_resource_get_handle(struct pipe_screen *pscreen, unsigned usage) { struct fd_resource *rsc = fd_resource(prsc); - - return fd_screen_bo_get_handle(pscreen, rsc->bo, - rsc->slices[0].pitch * rsc->cpp, handle); + struct renderonly_scanout *scanout = rsc->scanout; + struct fd_bo *bo = rsc->bo; + + handle->stride = rsc->slices[0].pitch * rsc->cpp; + + if (handle->type == WINSYS_HANDLE_TYPE_SHARED) { + return fd_bo_get_name(bo, >handle) == 0; + } else if (handle->type == WINSYS_HANDLE_TYPE_KMS) { + if (renderonly_get_handle(scanout, handle)) { + return TRUE; + } else { + handle->handle = fd_bo_handle(bo); + return TRUE; + } + } else if (handle->type == WINSYS_HANDLE_TYPE_FD) { + handle->handle = fd_bo_dmabuf(bo); + return TRUE; + } else { + return FALSE; + } } static uint32_t @@ -801,8 +821,8 @@ fd_resource_create(struct pipe_screen *pscreen, const struct pipe_resource *tmpl) { struct fd_screen *screen = fd_screen(pscreen); - struct fd_resource *rsc = CALLOC_STRUCT(fd_resource); - struct pipe_resource *prsc = >base; + struct fd_resource *rsc; + struct pipe_resource *prsc; enum pipe_format format = tmpl->format; uint32_t size; @@ -813,6 +833,33 @@ fd_resource_create(struct pipe_screen *pscreen, tmpl->array_size, tmpl->last_level, tmpl->nr_samples, tmpl->usage, tmpl->bind, tmpl->flags); + if (tmpl->bind & PIPE_BIND_SCANOUT) { + struct pipe_resource scanout_templat = *tmpl; + struct renderonly_scanout *scanout; + struct winsys_handle handle; + + scanout = renderonly_scanout_for_resource(_templat, + screen->ro, ); + if (!scanout) + return NULL; + + assert(handle.type == WINSYS_HANDLE_TYPE_FD); + // handle.modifier = modifier; + scanout_templat.bind &= ~PIPE_BIND_SCANOUT; + rsc = fd_resource(pscreen->resource_from_handle(pscreen, _templat, + , + PIPE_HANDLE_USAGE_WRITE)); + close(handle.handle); + if (!rsc) + return NULL; + + rsc->scanout = scanout; + return >base; + } + + rsc = CALLOC_STRUCT(fd_resource); + prsc = >base; + if (!rsc) return NULL; diff --git a/src/gallium/drivers/freedreno/freedreno_resource.h b/src/gallium/drivers/freedreno/freedreno_resource.h index 2834969110..baad2baa68 100644 --- a/src/gallium/drivers/freedreno/freedreno_resource.h +++ b/src/gallium/drivers/freedreno/freedreno_resource.h @@ -65,6 +65,7 @@ struct set; struct fd_resource { struct pipe_resource base; + struct renderonly_scanout *scanout; struct fd_bo *bo; uint32_t cpp; enum pipe_format internal_format; diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 33f14b8f24..ef88e5b121 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -649,27 +649,6 @@ fd_get_compiler_options(struct pipe_screen *pscreen, return NULL; } -boolean -fd_scree
[Mesa-dev] [PATCH 04/11] imx: add freedreno
Signed-off-by: Jonathan Marek --- configure.ac| 4 ++-- src/gallium/targets/dri/target.c| 5 +++- src/gallium/winsys/imx/drm/Makefile.am | 9 +++ src/gallium/winsys/imx/drm/imx_drm_winsys.c | 26 - 4 files changed, 35 insertions(+), 9 deletions(-) diff --git a/configure.ac b/configure.ac index f8bb131cb6..85cd3c1eeb 100644 --- a/configure.ac +++ b/configure.ac @@ -2835,8 +2835,8 @@ AM_CONDITIONAL(HAVE_SWR_BUILTIN, test "x$HAVE_SWR_BUILTIN" = xyes) dnl We need to validate some needed dependencies for renderonly drivers. -if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes ; then -AC_MSG_ERROR([Building with imx requires etnaviv]) +if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_FREEDRENO" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes ; then +AC_MSG_ERROR([Building with imx requires etnaviv or freedreno]) fi if test "x$HAVE_GALLIUM_VC4" != xyes -a "x$HAVE_GALLIUM_PL111" = xyes ; then diff --git a/src/gallium/targets/dri/target.c b/src/gallium/targets/dri/target.c index 835d125f21..ddaca8501a 100644 --- a/src/gallium/targets/dri/target.c +++ b/src/gallium/targets/dri/target.c @@ -83,10 +83,13 @@ DEFINE_LOADER_DRM_ENTRYPOINT(pl111) #endif #if defined(GALLIUM_ETNAVIV) -DEFINE_LOADER_DRM_ENTRYPOINT(imx_drm) DEFINE_LOADER_DRM_ENTRYPOINT(etnaviv) #endif +#if defined(GALLIUM_IMX) +DEFINE_LOADER_DRM_ENTRYPOINT(imx_drm) +#endif + #if defined(GALLIUM_TEGRA) DEFINE_LOADER_DRM_ENTRYPOINT(tegra); #endif diff --git a/src/gallium/winsys/imx/drm/Makefile.am b/src/gallium/winsys/imx/drm/Makefile.am index f15b531f81..17068cb300 100644 --- a/src/gallium/winsys/imx/drm/Makefile.am +++ b/src/gallium/winsys/imx/drm/Makefile.am @@ -28,8 +28,17 @@ AM_CFLAGS = \ -I$(top_srcdir)/src/gallium/winsys \ $(GALLIUM_WINSYS_CFLAGS) +if HAVE_GALLIUM_ETNAVIV +AM_CFLAGS += -DGALLIUM_ETNAVIV +endif + +if HAVE_GALLIUM_FREEDRENO +AM_CFLAGS += -DGALLIUM_FREEDRENO +endif + noinst_LTLIBRARIES = libimxdrm.la libimxdrm_la_SOURCES = $(C_SOURCES) EXTRA_DIST = meson.build + diff --git a/src/gallium/winsys/imx/drm/imx_drm_winsys.c b/src/gallium/winsys/imx/drm/imx_drm_winsys.c index 4bd2125031..f8c4abffde 100644 --- a/src/gallium/winsys/imx/drm/imx_drm_winsys.c +++ b/src/gallium/winsys/imx/drm/imx_drm_winsys.c @@ -26,6 +26,7 @@ #include "imx_drm_public.h" #include "etnaviv/drm/etnaviv_drm_public.h" +#include "freedreno/drm/freedreno_drm_public.h" #include "loader/loader.h" #include "renderonly/renderonly.h" @@ -37,15 +38,28 @@ struct pipe_screen *imx_drm_screen_create(int fd) struct renderonly ro = { .create_for_resource = renderonly_create_kms_dumb_buffer_for_resource, .kms_fd = fd, - .gpu_fd = loader_open_render_node("etnaviv") }; + struct pipe_screen *screen; - if (ro.gpu_fd < 0) - return NULL; +#if defined(GALLIUM_ETNAVIV) + ro.gpu_fd = loader_open_render_node("etnaviv"); + if (ro.gpu_fd >= 0) { + screen = etna_drm_screen_create_renderonly(); + if (screen) +return screen; + close(ro.gpu_fd); + } +#endif - struct pipe_screen *screen = etna_drm_screen_create_renderonly(); - if (!screen) +#if defined(GALLIUM_FREEDRENO) + ro.gpu_fd = loader_open_render_node("msm"); + if (ro.gpu_fd >= 0) { + screen = fd_drm_screen_create_renderonly(); + if (screen) +return screen; close(ro.gpu_fd); + } +#endif - return screen; + return NULL; } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] freedreno: a2xx: ir2 update
this patch brings a number of changes to ir2: -ir2 now generates CF clauses as necessary during assembly. this simplifies fd2_program/fd2_compiler and is necessary to implement optimization passes -ir2 now has separate vector/scalar instructions. this will make it easier to implementing scheduling of scalar+vector instructions together. dst_reg is also now seperate from src registers instead of a single list -ir2 now implements register allocation. this makes it possible to compile shaders which have more than 64 TGSI registers -ir2 now implements the following optimizations: removal of IN/OUT MOV instructions generated by TGSI and removal of unused instructions when some exports are disabled -ir2 now allows full 8-bit index for constants -ir2_alloc no longer allocates 4 times too many bytes Signed-off-by: Jonathan Marek --- .../drivers/freedreno/a2xx/fd2_compiler.c | 210 ++--- .../drivers/freedreno/a2xx/fd2_program.c | 75 +- .../drivers/freedreno/a2xx/instr-a2xx.h | 28 +- src/gallium/drivers/freedreno/a2xx/ir-a2xx.c | 734 +++--- src/gallium/drivers/freedreno/a2xx/ir-a2xx.h | 113 +-- 5 files changed, 615 insertions(+), 545 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c index 3ad47f9850..12f9a1ce0a 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c @@ -93,9 +93,6 @@ struct fd2_compile_context { unsigned position, psize; uint64_t need_sync; - - /* current exec CF instruction */ - struct ir2_cf *cf; }; static int @@ -130,7 +127,6 @@ compile_init(struct fd2_compile_context *ctx, struct fd_program_stateobj *prog, ctx->prog = prog; ctx->so = so; - ctx->cf = NULL; ctx->pred_depth = 0; ret = tgsi_parse_init(>parser, so->tokens); @@ -236,15 +232,6 @@ compile_free(struct fd2_compile_context *ctx) tgsi_parse_free(>parser); } -static struct ir2_cf * -next_exec_cf(struct fd2_compile_context *ctx) -{ - struct ir2_cf *cf = ctx->cf; - if (!cf || cf->exec.instrs_count >= ARRAY_SIZE(ctx->cf->exec.instrs)) - ctx->cf = cf = ir2_cf_create(ctx->so->ir, EXEC); - return cf; -} - static void compile_vtx_fetch(struct fd2_compile_context *ctx) { @@ -252,13 +239,13 @@ compile_vtx_fetch(struct fd2_compile_context *ctx) int i; for (i = 0; i < ctx->num_regs[TGSI_FILE_INPUT]; i++) { struct ir2_instruction *instr = ir2_instr_create( - next_exec_cf(ctx), IR2_FETCH); + ctx->so->ir, IR2_FETCH); instr->fetch.opc = VTX_FETCH; ctx->need_sync |= 1 << (i+1); - ir2_reg_create(instr, i+1, "xyzw", 0); - ir2_reg_create(instr, 0, "x", 0); + ir2_dst_create(instr, i+1, "xyzw", 0); + ir2_reg_create(instr, 0, "x", IR2_REG_INPUT); if (i == 0) instr->sync = true; @@ -266,7 +253,6 @@ compile_vtx_fetch(struct fd2_compile_context *ctx) vfetch_instrs[i] = instr; } ctx->so->num_vfetch_instrs = i; - ctx->cf = NULL; } /* @@ -312,7 +298,7 @@ get_temp_gpr(struct fd2_compile_context *ctx, int idx) return num; } -static struct ir2_register * +static struct ir2_dst_register * add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, const struct tgsi_dst_register *dst) { @@ -351,10 +337,10 @@ add_dst_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, swiz[3] = (dst->WriteMask & TGSI_WRITEMASK_W) ? 'w' : '_'; swiz[4] = '\0'; - return ir2_reg_create(alu, num, swiz, flags); + return ir2_dst_create(alu, num, swiz, flags); } -static struct ir2_register * +static struct ir2_src_register * add_src_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, const struct tgsi_src_register *src) { @@ -373,6 +359,7 @@ add_src_reg(struct fd2_compile_context *ctx, struct ir2_instruction *alu, if (ctx->type == PIPE_SHADER_VERTEX) { num = src->Index + 1; } else { + flags |= IR2_REG_INPUT; num = export_linkage(ctx, ctx->input_export_idx[src->Index]); } @@ -415,7 +402,7 @@ static void add_vector_clamp(struct tgsi_full_instruction *inst, struct ir2_instruction *alu) { if (inst->Instruction.Saturate) { - alu->alu.vector_clamp = true; + alu->alu_vector.clamp = true; } } @@ -423,7 +410,7 @@ static void add_scalar_clamp(struct
[Mesa-dev] [PATCH 5/5] freedreno: a2xx: fix clear color
the format of the CLEAR_COLOR register doesn't depend on the target format this fixes clear color when rendering to 32-bit RGBA and 16-bit targets Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index ca634d794a..6f0535fa2b 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -135,7 +135,7 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, uint32_t reg, colr = 0; if ((buffers & PIPE_CLEAR_COLOR) && fb->nr_cbufs) - colr = pack_rgba(fb->cbufs[0]->format, color->f); + colr = pack_rgba(PIPE_FORMAT_R8G8B8A8_UNORM, color->f); /* emit generic state now: */ fd2_emit_state(ctx, ctx->dirty & -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] freedreno: a2xx: fix crash when freeing context
Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_program.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_program.c b/src/gallium/drivers/freedreno/a2xx/fd2_program.c index 9a77457251..834a7c7fcd 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_program.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_program.c @@ -54,6 +54,8 @@ create_shader(enum shader_t type) static void delete_shader(struct fd2_shader_stateobj *so) { + if (!so) + return; ir2_shader_destroy(so->ir); free(so->tokens); free(so->bin); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/5] freedreno: a2xx: increase size of the offset field in instr_fetch_vtx_t
The offset field is 22 bit large. 11 bits are necessary because MaxVertexAttribRelativeOffset = 2047 Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/instr-a2xx.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h b/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h index 0d6e138daf..ac972ed35a 100644 --- a/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h +++ b/src/gallium/drivers/freedreno/a2xx/instr-a2xx.h @@ -366,10 +366,8 @@ typedef struct PACKED { uint8_t pred_select : 1; /* dword2: */ uint8_t stride : 8; - /* possibly offset and reserved4 are swapped on a200? */ - uint8_t offset : 8; - uint8_t reserved4: 8; - uint8_t reserved5: 7; + uint32_toffset : 22; + uint8_t reserved4: 1; uint8_t pred_condition : 1; } instr_fetch_vtx_t; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/5] freedreno: a2xx: fix crash on first clear
blend can be NULL, so check for that Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 4bf41b2c67..dcf7ed10b5 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -295,7 +295,7 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_ZSA)) { OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_COLORCONTROL)); - OUT_RING(ring, zsa->rb_colorcontrol | blend->rb_colorcontrol); + OUT_RING(ring, blend ? zsa->rb_colorcontrol | blend->rb_colorcontrol : 0); } if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_FRAMEBUFFER)) { @@ -305,13 +305,13 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL)); - OUT_RING(ring, blend->rb_blendcontrol_alpha | + OUT_RING(ring, blend ? blend->rb_blendcontrol_alpha | COND(has_alpha, blend->rb_blendcontrol_rgb) | - COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb)); + COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb) : 0); OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_MASK)); - OUT_RING(ring, blend->rb_colormask); + OUT_RING(ring, blend ? blend->rb_colormask : 0xf); } if (dirty & FD_DIRTY_BLEND_COLOR) { -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/5] freedreno: add a20x
this patch adds support for a20x, which has some differences with a220: -no VGT_MAX_VTX_INDX register -no CLEAR_COLOR register -set RB_BC_CONTROL in restore (hangs without) -different CP_DRAW_INDX format tested with kmscube and glmark2 scenes, on par with a220 Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 37 +-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 + src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 22 ++- .../drivers/freedreno/freedreno_draw.h| 27 +- .../drivers/freedreno/freedreno_screen.c | 1 + .../drivers/freedreno/freedreno_screen.h | 6 +++ .../drivers/freedreno/freedreno_util.h| 13 +++ 7 files changed, 85 insertions(+), 31 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 8df1793a35..ca634d794a 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -101,12 +101,14 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - OUT_WFI (ring); + if (!is_a20x(ctx->screen)) { + OUT_WFI (ring); - OUT_PKT3(ring, CP_SET_CONSTANT, 3); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); - OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */ - OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ + OUT_PKT3(ring, CP_SET_CONSTANT, 3); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); + OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */ + OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ + } fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], IGNORE_VISIBILITY, info, index_offset); @@ -157,9 +159,18 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR)); - OUT_RING(ring, colr); + if (is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0480); + OUT_RING(ring, color->ui[0]); + OUT_RING(ring, color->ui[1]); + OUT_RING(ring, color->ui[2]); + OUT_RING(ring, color->ui[3]); + } else { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR)); + OUT_RING(ring, colr); + } OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_A220_RB_LRZ_VSC_CONTROL)); @@ -264,10 +275,12 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_RING(ring, 0x0); } - OUT_PKT3(ring, CP_SET_CONSTANT, 3); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); - OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */ - OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */ + if (!is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 3); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); + OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */ + OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */ + } fd_draw(ctx->batch, ring, DI_PT_RECTLIST, IGNORE_VISIBILITY, DI_SRC_SEL_AUTO_INDEX, 3, 0, INDEX_SIZE_IGN, 0, 0, NULL); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index d749eb0324..4bf41b2c67 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -332,6 +332,16 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) void fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) { + if (is_a20x(ctx->screen)) { + OUT_PKT0(ring, REG_A2XX_RB_BC_CONTROL, 1); + OUT_RING(ring, + A2XX_RB_BC_CONTROL_ACCUM_TIMEOUT_SELECT(3) | + A2XX_RB_BC_CONTROL_DISABLE_LZ_NULL_ZCMD_DROP | + A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE | + A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) | + A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3)); + } + OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); OUT_RING(ring, 0x0002); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index 46a7d18ef0..62382995c0 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++
[Mesa-dev] [PATCH] freedreno: add a20x
this patch adds support for a20x, which has some differences with a220: -no VGT_MAX_VTX_INDX register -no CLEAR_COLOR register -set RB_BC_CONTROL in restore (hangs without) -different CP_DRAW_INDX format tested with kmscube and glmark2 scenes, on par with a220 Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 37 +-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 + src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 22 ++- .../drivers/freedreno/freedreno_draw.h| 27 +- .../drivers/freedreno/freedreno_screen.c | 1 + .../drivers/freedreno/freedreno_screen.h | 6 +++ .../drivers/freedreno/freedreno_util.h| 13 +++ 7 files changed, 85 insertions(+), 31 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index 8df1793a35..ca634d794a 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -101,12 +101,14 @@ fd2_draw_vbo(struct fd_context *ctx, const struct pipe_draw_info *info, OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - OUT_WFI (ring); + if (!is_a20x(ctx->screen)) { + OUT_WFI (ring); - OUT_PKT3(ring, CP_SET_CONSTANT, 3); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); - OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */ - OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ + OUT_PKT3(ring, CP_SET_CONSTANT, 3); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); + OUT_RING(ring, info->max_index);/* VGT_MAX_VTX_INDX */ + OUT_RING(ring, info->min_index);/* VGT_MIN_VTX_INDX */ + } fd_draw_emit(ctx->batch, ring, ctx->primtypes[info->mode], IGNORE_VISIBILITY, info, index_offset); @@ -157,9 +159,18 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR)); - OUT_RING(ring, colr); + if (is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0480); + OUT_RING(ring, color->ui[0]); + OUT_RING(ring, color->ui[1]); + OUT_RING(ring, color->ui[2]); + OUT_RING(ring, color->ui[3]); + } else { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR)); + OUT_RING(ring, colr); + } OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_A220_RB_LRZ_VSC_CONTROL)); @@ -264,10 +275,12 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_RING(ring, 0x0); } - OUT_PKT3(ring, CP_SET_CONSTANT, 3); - OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); - OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */ - OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */ + if (!is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 3); + OUT_RING(ring, CP_REG(REG_A2XX_VGT_MAX_VTX_INDX)); + OUT_RING(ring, 3); /* VGT_MAX_VTX_INDX */ + OUT_RING(ring, 0); /* VGT_MIN_VTX_INDX */ + } fd_draw(ctx->batch, ring, DI_PT_RECTLIST, IGNORE_VISIBILITY, DI_SRC_SEL_AUTO_INDEX, 3, 0, INDEX_SIZE_IGN, 0, 0, NULL); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index a787b71e37..9c765dfd88 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -340,6 +340,16 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) void fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) { + if (is_a20x(ctx->screen)) { + OUT_PKT0(ring, REG_A2XX_RB_BC_CONTROL, 1); + OUT_RING(ring, + A2XX_RB_BC_CONTROL_ACCUM_TIMEOUT_SELECT(3) | + A2XX_RB_BC_CONTROL_DISABLE_LZ_NULL_ZCMD_DROP | + A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE | + A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) | + A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3)); + } + OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); OUT_RING(ring, 0x0002); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c index 46a7d18ef0..62382995c0 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c +++
[Mesa-dev] [PATCH 2/2] freedreno: a2xx: fix crash when freeing context
Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_program.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_program.c b/src/gallium/drivers/freedreno/a2xx/fd2_program.c index 9a77457251..834a7c7fcd 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_program.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_program.c @@ -54,6 +54,8 @@ create_shader(enum shader_t type) static void delete_shader(struct fd2_shader_stateobj *so) { + if (!so) + return; ir2_shader_destroy(so->ir); free(so->tokens); free(so->bin); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] freedreno: a2xx: fix crash on first clear
blend can be NULL, so check for that Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 9c765dfd88..e36eebf98c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -303,7 +303,7 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_ZSA)) { OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_COLORCONTROL)); - OUT_RING(ring, zsa->rb_colorcontrol | blend->rb_colorcontrol); + OUT_RING(ring, blend ? zsa->rb_colorcontrol | blend->rb_colorcontrol : 0); } if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_FRAMEBUFFER)) { @@ -313,13 +313,13 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL)); - OUT_RING(ring, blend->rb_blendcontrol_alpha | + OUT_RING(ring, blend ? blend->rb_blendcontrol_alpha | COND(has_alpha, blend->rb_blendcontrol_rgb) | - COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb)); + COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb) : 0); OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_MASK)); - OUT_RING(ring, blend->rb_colormask); + OUT_RING(ring, blend ? blend->rb_colormask : 0xf); } if (dirty & FD_DIRTY_BLEND_COLOR) { -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] freedreno: a2xx: fix crash on first clear
blend can be NULL, so check for that Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 9c765dfd88..e36eebf98c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -303,7 +303,7 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_ZSA)) { OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_COLORCONTROL)); - OUT_RING(ring, zsa->rb_colorcontrol | blend->rb_colorcontrol); + OUT_RING(ring, blend ? zsa->rb_colorcontrol | blend->rb_colorcontrol : 0); } if (dirty & (FD_DIRTY_BLEND | FD_DIRTY_FRAMEBUFFER)) { @@ -313,13 +313,13 @@ fd2_emit_state(struct fd_context *ctx, const enum fd_dirty_3d_state dirty) OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL)); - OUT_RING(ring, blend->rb_blendcontrol_alpha | + OUT_RING(ring, blend ? blend->rb_blendcontrol_alpha | COND(has_alpha, blend->rb_blendcontrol_rgb) | - COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb)); + COND(!has_alpha, blend->rb_blendcontrol_no_alpha_rgb) : 0); OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_RB_COLOR_MASK)); - OUT_RING(ring, blend->rb_colormask); + OUT_RING(ring, blend ? blend->rb_colormask : 0xf); } if (dirty & FD_DIRTY_BLEND_COLOR) { -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] freedreno: a2xx: fix clear color
the format of the CLEAR_COLOR register doesn't depend on the target format this fixes clear color when rendering to 32-bit RGBA and 16-bit targets Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index c12047628c..2d3c029e57 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -133,7 +133,7 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, uint32_t reg, colr = 0; if ((buffers & PIPE_CLEAR_COLOR) && fb->nr_cbufs) - colr = pack_rgba(fb->cbufs[0]->format, color->f); + colr = pack_rgba(PIPE_FORMAT_R8G8B8A8_UNORM, color->f); /* emit generic state for clear now: */ fd2_emit_state_for_clear(ctx); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] freedreno: add initial a20x support
the bare minimum to get a20x running with kmscube and some glmark2 scenes: different CP_DRAW_INDX format and the different clear color register Signed-off-by: Jonathan Marek --- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 15 +++-- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 10 ++ .../drivers/freedreno/freedreno_draw.h| 32 --- .../drivers/freedreno/freedreno_screen.c | 1 + .../drivers/freedreno/freedreno_screen.h | 6 5 files changed, 50 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c index ef9daddfcf..c12047628c 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c @@ -155,9 +155,18 @@ fd2_clear(struct fd_context *ctx, unsigned buffers, OUT_PKT0(ring, REG_A2XX_TC_CNTL_STATUS, 1); OUT_RING(ring, A2XX_TC_CNTL_STATUS_L2_INVALIDATE); - OUT_PKT3(ring, CP_SET_CONSTANT, 2); - OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR)); - OUT_RING(ring, colr); + if (is_a20x(ctx->screen)) { + OUT_PKT3(ring, CP_SET_CONSTANT, 5); + OUT_RING(ring, 0x0480); + OUT_RING(ring, color->ui[0]); + OUT_RING(ring, color->ui[1]); + OUT_RING(ring, color->ui[2]); + OUT_RING(ring, color->ui[3]); + } else { + OUT_PKT3(ring, CP_SET_CONSTANT, 2); + OUT_RING(ring, CP_REG(REG_A2XX_CLEAR_COLOR)); + OUT_RING(ring, colr); + } OUT_PKT3(ring, CP_SET_CONSTANT, 2); OUT_RING(ring, CP_REG(REG_A2XX_A220_RB_LRZ_VSC_CONTROL)); diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c index 6927fa87fd..2b28bb23a3 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c @@ -415,6 +415,16 @@ fd2_emit_state_for_clear(struct fd_context *ctx) void fd2_emit_restore(struct fd_context *ctx, struct fd_ringbuffer *ring) { + if (is_a20x(ctx->screen)) { + OUT_PKT0(ring, REG_A2XX_RB_BC_CONTROL, 1); + OUT_RING(ring, + A2XX_RB_BC_CONTROL_ACCUM_TIMEOUT_SELECT(3) | + A2XX_RB_BC_CONTROL_DISABLE_LZ_NULL_ZCMD_DROP | + A2XX_RB_BC_CONTROL_ENABLE_CRC_UPDATE | + A2XX_RB_BC_CONTROL_ACCUM_DATA_FIFO_LIMIT(8) | + A2XX_RB_BC_CONTROL_MEM_EXPORT_TIMEOUT_SELECT(3)); + } + OUT_PKT0(ring, REG_A2XX_TP0_CHICKEN, 1); OUT_RING(ring, 0x0002); diff --git a/src/gallium/drivers/freedreno/freedreno_draw.h b/src/gallium/drivers/freedreno/freedreno_draw.h index b293f73b82..ec4b47898d 100644 --- a/src/gallium/drivers/freedreno/freedreno_draw.h +++ b/src/gallium/drivers/freedreno/freedreno_draw.h @@ -51,6 +51,8 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring, uint32_t idx_size, uint32_t idx_offset, struct pipe_resource *idx_buffer) { + uint32_t cnt, draw; + /* for debug after a lock up, write a unique counter value * to scratch7 for each draw, to make it easier to match up * register dumps to cmdstream. The combination of IB @@ -74,18 +76,26 @@ fd_draw(struct fd_batch *batch, struct fd_ringbuffer *ring, OUT_RING(ring, 0); } - OUT_PKT3(ring, CP_DRAW_INDX, idx_buffer ? 5 : 3); - OUT_RING(ring, 0x);/* viz query info. */ - if (vismode == USE_VISIBILITY) { - /* leave vis mode blank for now, it will be patched up when -* we know if we are binning or not -*/ - OUT_RINGP(ring, DRAW(primtype, src_sel, idx_type, 0, instances), - >draw_patches); - } else { - OUT_RING(ring, DRAW(primtype, src_sel, idx_type, vismode, instances)); + cnt = idx_buffer ? 5 : 3; + draw = DRAW(primtype, src_sel, idx_type, 0, instances); + + if (is_a20x(batch->ctx->screen)) { + /* XXX instances field is overwritten */ + draw &= 0x; + draw |= count << 16; + cnt -= 1; } - OUT_RING(ring, count); /* NumIndices */ + + OUT_PKT3(ring, CP_DRAW_INDX, cnt); + OUT_RING(ring, 0x);/* viz query info. */ + if (vismode == USE_VISIBILITY) + OUT_RINGP(ring, draw, >draw_patches); + else + OUT_RING(ring, draw | DRAW(0, 0, 0, vismode, 0)); + + if (!is_a20x(batch->ctx->screen)) + OUT_RING(ring, count);/* NumIndices */ + if (idx_buffer) { OUT_RELOC(ring, fd_resource(idx_buffer)->bo, idx_offset, 0, 0); OUT_RING (ring, idx_size); diff --git a/src/gallium/dr