[Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop
This patch is based on https://lists.freedesktop.org/archives/mesa-dev/2018-February/185805.html Dave Airlie: "A bunch of CTS tests led me to write tests/shaders/ssa/fs-while-loop-rotate-value.shader_test which r600/sb always fell over on. GCM seems to move some of the copies into other basic blocks, if we don't allow this to happen then it doesn't seem to schedule them badly. Everything I've read on SSA/phi copies say they have to happen in parallel, so keeping them in the same basic block seems like a good way to keep some of that property." This patch differs from the one proposed by Dave in that it only adds the NF_DONT_MOVE flag to copy_move instructions that are created by split_phi* and that are located in loops. Fixes piglit: tests/shaders/ssa/fs-while-loop-rotate-value.shader_test (no regressions in the shader set). It also fixes all failing tests from dEQP-GLES3.functional.shaders.loops.* Signed-off-by: Gert Wollny --- src/gallium/drivers/r600/sb/sb_ra_init.cpp | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp index 985e179452..c557b86871 100644 --- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp +++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp @@ -545,10 +545,13 @@ void ra_split::split_phi_src(container_node *loc, container_node *c, continue; value *t = sh.create_temp_value(); + alu_node* n = sh.create_copy_mov(t, v); + if (loop) + n->flags |= NF_DONT_MOVE; if (loop && id == 0) - loc->insert_before(sh.create_copy_mov(t, v)); + loc->insert_before(n); else - loc->push_back(sh.create_copy_mov(t, v)); + loc->push_back(n); v = t; sh.coal.add_edge(v, d, coalescer::phi_cost); @@ -566,9 +569,10 @@ void ra_split::split_phi_dst(node* loc, container_node *c, bool loop) { value *t = sh.create_temp_value(); node *cp = sh.create_copy_mov(v, t); - if (loop) + if (loop) { + cp->flags |= NF_DONT_MOVE; static_cast(loc)->push_front(cp); - else + } else loc->insert_after(cp); v = t; } -- 2.16.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop
Am Montag, den 19.02.2018, 14:06 +1000 schrieb Dave Airlie: > On 15 February 2018 at 01:26, Gert Wollny > wrote: > > Am Mittwoch, den 14.02.2018, 17:18 +1000 schrieb Dave Airlie: > > > From: Dave Airlie > > > > > > A bunch of CTS tests led me to write > > > tests/shaders/ssa/fs-while-loop-rotate-value.shader_test > > > which r600/sb always fell over on. > > > > > > This patch fixes it, but I'll probably never be 100% sure why. > > I've gone back to this approach, I've spot tested and I don't see > as many regressions. For me most of the assertion failures don't come up anymore, but in total the numer of regressions is still quite high, i.e. 1 assertion failure and 13 normal failures. vs-struct-nonconst-non-opaque-members fails with b/sb_sched.cpp:931:process_fetch: Assertion `f->parent->count() == 1' failed. and the list of normal failures is very similar to you first patch, the only change is that glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr no longer fails, but instead glsl-1.50/execution/geometry/dynamic_input_array_index fails, Best, Gert ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop
On 15 February 2018 at 01:26, Gert Wollny wrote: > Am Mittwoch, den 14.02.2018, 17:18 +1000 schrieb Dave Airlie: >> From: Dave Airlie >> >> A bunch of CTS tests led me to write >> tests/shaders/ssa/fs-while-loop-rotate-value.shader_test >> which r600/sb always fell over on. >> >> This patch fixes it, but I'll probably never be 100% sure why. I've gone back to this approach, I've spot tested and I don't see as many regressions. > > fixes: >shaders/ssa/fs-while-loop-rotate-value >spec/arb_enhanced_layouts/execution/component-layout/ > sso-vs-gs-fs-array-interleave > However this test is broken on r600 even with nosb, it also never clears so if a previous test has the right answer it will always pass. I might try fixing the test then seeing why it's broken on r600. Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop (v3)
From: Dave Airlie A bunch of CTS tests led me to write tests/shaders/ssa/fs-while-loop-rotate-value.shader_test which r600/sb always fell over on. This patch fixes it, but I'll probably never be 100% sure why. Anyways what appears to be happening is when gcm is scheduling the copy_movs used for phis, we have 4 copy_movs in the scheduling queue, one of them releases a new copy_mov and it gets pushed to the the front of the scheduling queue, but for correctness we rely on the previous copy_movs getting emitted first. This places copy_movs in order on the list. So if the list is empty, it puts it at the front, if the list has just copy_movs in it, it goes to the end, otherwise we iterate the list and insert it between the copy_movs and subsequent instructions. I've added the additional check here that if the src[0] of the copy mov has a constraint (which will be true for the (copy) MOV R1.x.2, t10 case but not for MOV t20, cases, then we should schedule those early, but the others late. v2: wrong direction. v3: back to v1 + changes Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/sb/sb_gcm.cpp | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp b/src/gallium/drivers/r600/sb/sb_gcm.cpp index 7776a10fc86..0c715b24dfb 100644 --- a/src/gallium/drivers/r600/sb/sb_gcm.cpp +++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp @@ -637,8 +637,23 @@ void gcm::add_ready(node *n) { sched_queue_id sq = sh.get_queue_id(n); if (n->flags & NF_SCHEDULE_EARLY) bu_ready_early[sq].push_back(n); - else if (sq == SQ_ALU && n->is_copy_mov()) - bu_ready[sq].push_front(n); + else if (sq == SQ_ALU && n->is_copy_mov()) { + if (bu_ready[sq].empty() || n->src[0]->constraint) + bu_ready[sq].push_front(n); + else { + bool inserted = false; + for (sched_queue::iterator I = bu_ready[sq].begin(), E = bu_ready[sq].end(); I != E; I++) { + node *a = *I; + if (!a->is_copy_mov()) { + bu_ready[sq].insert(I, n); + inserted = true; + break; + } + } + if (!inserted) + bu_ready[sq].push_back(n); + } + } else if (n->is_alu_inst()) { alu_node *a = static_cast(n); if (a->bc.op_ptr->flags & AF_PRED && a->dst[2]) { -- 2.14.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop (attempt 2)
Am Freitag, den 16.02.2018, 14:44 +1000 schrieb Dave Airlie: > From: Dave Airlie > > A bunch of CTS tests led me to write > tests/shaders/ssa/fs-while-loop-rotate-value.shader_test > which r600/sb always fell over on. > > GCM seems to move some of the copys into other basic blocks, > if we don't allow this to happen then it doesn't seem to schedule > them badly. > > Everything I've read on SSA/phi copies say they have to happen > in parallel, so keeping them in the same basic block seems like > a good way to keep some of that property. This one fixes fs-while-loop-rotate-value for me (barts) as promised, no longer fixes spec/arb_enhanced_layouts/execution/component-layout/ sso-vs-gs-fs-array-interleave but breaks spec/glsl-1.50/execution/variable-indexing/ vs-output-array-vec2-index-wr-before-gs I don't know whether this is helpful, but when I apply a patch to TGSI split arrays [1], this piglit also passes with your patch. Specifically, since the loops are completely unrolled no indirect access is happening and all arrays can be split. As a result, when using plain master the shaders optimised by sb and when using your patch on top of the array splitting are very similar (mostly differently ordered literals and register indices), however, running master + this patch results in very different optimised shaders, e.g. sb bundles many vfetch operations at the beginning. Best, Gert [1] https://patchwork.freedesktop.org/series/37991/ > > Signed-off-by: Dave Airlie > --- > src/gallium/drivers/r600/sb/sb_shader.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp > b/src/gallium/drivers/r600/sb/sb_shader.cpp > index 321e24ea25..8959b8391d 100644 > --- a/src/gallium/drivers/r600/sb/sb_shader.cpp > +++ b/src/gallium/drivers/r600/sb/sb_shader.cpp > @@ -121,7 +121,7 @@ alu_node* shader::create_copy_mov(value* dst, > value* src, unsigned affcost) { > alu_node *n = create_mov(dst, src); > > dst->assign_source(src); > - n->flags |= NF_COPY_MOV | NF_DONT_HOIST; > + n->flags |= NF_COPY_MOV | NF_DONT_HOIST | NF_DONT_MOVE; > > if (affcost && dst->is_sgpr() && src->is_sgpr()) > coal.add_edge(src, dst, affcost); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop (attempt 2)
From: Dave Airlie A bunch of CTS tests led me to write tests/shaders/ssa/fs-while-loop-rotate-value.shader_test which r600/sb always fell over on. GCM seems to move some of the copys into other basic blocks, if we don't allow this to happen then it doesn't seem to schedule them badly. Everything I've read on SSA/phi copies say they have to happen in parallel, so keeping them in the same basic block seems like a good way to keep some of that property. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/sb/sb_shader.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp index 321e24ea25..8959b8391d 100644 --- a/src/gallium/drivers/r600/sb/sb_shader.cpp +++ b/src/gallium/drivers/r600/sb/sb_shader.cpp @@ -121,7 +121,7 @@ alu_node* shader::create_copy_mov(value* dst, value* src, unsigned affcost) { alu_node *n = create_mov(dst, src); dst->assign_source(src); - n->flags |= NF_COPY_MOV | NF_DONT_HOIST; + n->flags |= NF_COPY_MOV | NF_DONT_HOIST | NF_DONT_MOVE; if (affcost && dst->is_sgpr() && src->is_sgpr()) coal.add_edge(src, dst, affcost); -- 2.14.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop
Am Mittwoch, den 14.02.2018, 17:18 +1000 schrieb Dave Airlie: > From: Dave Airlie > > A bunch of CTS tests led me to write > tests/shaders/ssa/fs-while-loop-rotate-value.shader_test > which r600/sb always fell over on. > > This patch fixes it, but I'll probably never be 100% sure why. Unfortunately it shows quite a number of regressions against (git-b9d2ff05a6), in summary: fixes: shaders/ssa/fs-while-loop-rotate-value spec/arb_enhanced_layouts/execution/component-layout/ sso-vs-gs-fs-array-interleave regressions: crash with sb/sb_sched.cpp:931:process_fetch: Assertion `f->parent->count() == 1' failed: spec/ arb_arrays_of_arrays/execution/sampler/ vs-struct-nonconst-non-opaque-members spec/ arb_gpu_shader5/execution/sampler_array_indexing/ sampler-nonconst-2d sampler-nonconst-2d-array sampler-nonconst-2d-array-grad sampler-nonconst-2d-grad Simply fail: spec/ glsl-1.10/execution/variable-indexing/ vs-output-array-float-index-wr vs-temp-array-mat2-col-rd vs-temp-array-mat2-index-col-rd vs-temp-mat2-col-rd vs-varying-array-mat2-index-col-rd vs-varying-mat2-col-rd glsl-1.20/execution/variable-indexing/ vs-temp-array-mat2-col-rd vs-temp-mat2-col-rd vs-varying-array-mat2-index-col-rd vs-varying-array-mat3-col-row-wr vs-varying-array-mat3-col-wr vs-varying-mat2-col-rd glsl-120/execution/variable-indexing/ vs-varying-array-mat3-col-row-wr vs-varying-array-mat3-col-wr I tried to figure out goes wrong with vs-output-array-float-index-wr, it is the vertex shader (#18 for me), but why it fails eludes me so far. best, Gert > Anyways what appears to be happening is when gcm is scheduling > the copy_movs used for phis, we have 4 copy_movs in the scheduling > queue, one of them releases a new copy_mov and it gets pushed > to the the front of the scheduling queue, but for correctness > we rely on the previous copy_movs getting emitted first. This > places copy_movs in order on the list. > > So if the list is empty, it puts it at the front, if the list > has just copy_movs in it, it goes to the end, otherwise > we iterate the list and insert it between the copy_movs and > subsequent instructions. > > Signed-off-by: Dave Airlie > --- > src/gallium/drivers/r600/sb/sb_gcm.cpp | 19 +-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp > b/src/gallium/drivers/r600/sb/sb_gcm.cpp > index 7776a10fc8..7b4df8bd52 100644 > --- a/src/gallium/drivers/r600/sb/sb_gcm.cpp > +++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp > @@ -637,8 +637,23 @@ void gcm::add_ready(node *n) { > sched_queue_id sq = sh.get_queue_id(n); > if (n->flags & NF_SCHEDULE_EARLY) > bu_ready_early[sq].push_back(n); > - else if (sq == SQ_ALU && n->is_copy_mov()) > - bu_ready[sq].push_front(n); > + else if (sq == SQ_ALU && n->is_copy_mov()) { > + if (bu_ready[sq].empty()) > + bu_ready[sq].push_front(n); > + else { > + bool inserted = false; > + for (sched_queue::iterator I = > bu_ready[sq].begin(), E = bu_ready[sq].end(); I != E; I++) { > + node *a = *I; > + if (!a->is_copy_mov()) { > + bu_ready[sq].insert(I, n); > + inserted = true; > + break; > + } > + } > + if (!inserted) > + bu_ready[sq].push_back(n); > + } > + } > else if (n->is_alu_inst()) { > alu_node *a = static_cast(n); > if (a->bc.op_ptr->flags & AF_PRED && a->dst[2]) { ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600/sb: fix rotated register in while loop
From: Dave Airlie A bunch of CTS tests led me to write tests/shaders/ssa/fs-while-loop-rotate-value.shader_test which r600/sb always fell over on. This patch fixes it, but I'll probably never be 100% sure why. Anyways what appears to be happening is when gcm is scheduling the copy_movs used for phis, we have 4 copy_movs in the scheduling queue, one of them releases a new copy_mov and it gets pushed to the the front of the scheduling queue, but for correctness we rely on the previous copy_movs getting emitted first. This places copy_movs in order on the list. So if the list is empty, it puts it at the front, if the list has just copy_movs in it, it goes to the end, otherwise we iterate the list and insert it between the copy_movs and subsequent instructions. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/sb/sb_gcm.cpp | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp b/src/gallium/drivers/r600/sb/sb_gcm.cpp index 7776a10fc8..7b4df8bd52 100644 --- a/src/gallium/drivers/r600/sb/sb_gcm.cpp +++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp @@ -637,8 +637,23 @@ void gcm::add_ready(node *n) { sched_queue_id sq = sh.get_queue_id(n); if (n->flags & NF_SCHEDULE_EARLY) bu_ready_early[sq].push_back(n); - else if (sq == SQ_ALU && n->is_copy_mov()) - bu_ready[sq].push_front(n); + else if (sq == SQ_ALU && n->is_copy_mov()) { + if (bu_ready[sq].empty()) + bu_ready[sq].push_front(n); + else { + bool inserted = false; + for (sched_queue::iterator I = bu_ready[sq].begin(), E = bu_ready[sq].end(); I != E; I++) { + node *a = *I; + if (!a->is_copy_mov()) { + bu_ready[sq].insert(I, n); + inserted = true; + break; + } + } + if (!inserted) + bu_ready[sq].push_back(n); + } + } else if (n->is_alu_inst()) { alu_node *a = static_cast(n); if (a->bc.op_ptr->flags & AF_PRED && a->dst[2]) { -- 2.14.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev