Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Dec 4, 2014 11:22 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Friday, November 21, 2014 10:23:43 AM Matt Turner wrote: On Tue, Nov 11, 2014 at 9:41 AM, Matt Turner matts...@gmail.com wrote: The rest of our backend optimizations have replaced the need for this since it was written. instructions in affected programs: 30626 - 30564 (-0.20%) Hurts a small number of CSGO shaders by one instruction, but helps even more. Hurts two by a larger number because of something I noticed when I first wrote the SEL peephole: try_replace_with_sel() operates on instructions before we've demoted uniforms to pull constants. So code like var.x = ( -abs(r6.w) = 0.0 ) ? pc[82].x : r9.x; var.y = ( -abs(r6.w) = 0.0 ) ? pc[82].y : r9.y; var.z = ( -abs(r6.w) = 0.0 ) ? pc[82].z : r9.z; var.w = ( -abs(r6.w) = 0.0 ) ? pc[82].w : r9.w; where pc[82] gets demoted to a pull constant, we end up emitting a send(4) instruction to load pc[82] each time, and since they're in different basic blocks because we mishandle the ternary operator in this case we can't combine them. Once we handle this common ternary pattern better the problem will go away. --- Thoughts? I don't know...if I'm reading the above text correctly, this doesn't look compelling. Your argument for deleting this is it's not necessary anymore, but you go on to undermine that by saying that it hurts a few shaders, and even quadrouples the number of pull loads in a few cases... Sure, it'll get fixed if we handle ternary operations better...but we haven't yet...so... First off, how is this different from the sel peephole? Second, at the risk of sounding like a broken record, NIR should fix this for us. That said, I'm a little hesitant to delete whole passes until we know what kind of code it will generate. --Jason I'm pretty confused. Maybe I'm misreading your justification... ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Thu, Dec 4, 2014 at 11:22 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Friday, November 21, 2014 10:23:43 AM Matt Turner wrote: On Tue, Nov 11, 2014 at 9:41 AM, Matt Turner matts...@gmail.com wrote: The rest of our backend optimizations have replaced the need for this since it was written. instructions in affected programs: 30626 - 30564 (-0.20%) Hurts a small number of CSGO shaders by one instruction, but helps even more. Hurts two by a larger number because of something I noticed when I first wrote the SEL peephole: try_replace_with_sel() operates on instructions before we've demoted uniforms to pull constants. So code like var.x = ( -abs(r6.w) = 0.0 ) ? pc[82].x : r9.x; var.y = ( -abs(r6.w) = 0.0 ) ? pc[82].y : r9.y; var.z = ( -abs(r6.w) = 0.0 ) ? pc[82].z : r9.z; var.w = ( -abs(r6.w) = 0.0 ) ? pc[82].w : r9.w; where pc[82] gets demoted to a pull constant, we end up emitting a send(4) instruction to load pc[82] each time, and since they're in different basic blocks because we mishandle the ternary operator in this case we can't combine them. Once we handle this common ternary pattern better the problem will go away. --- Thoughts? I don't know...if I'm reading the above text correctly, this doesn't look compelling. Your argument for deleting this is it's not necessary anymore, but you go on to undermine that by saying that it hurts a few shaders, and even quadrouples the number of pull loads in a few cases... Sure, it'll get fixed if we handle ternary operations better...but we haven't yet...so... I'm pretty confused. Maybe I'm misreading your justification... 52 shaders are helped by removing it (16 by one instruction), and only 17 are hurt. Of those 17, 15 have one extra instruction. The remaining two are the cases I described that this pass (not by design, I think?) is able to handle because demoting to pull constants hasn't happened yet. My claim is that the optimization is now a net loss. ... and that this pass isn't the place the problem in those two shaders should be optimized. For the record, their results are: HURT: shaders/closed/steam/counter-strike-global-offensive/2433.shader_test SIMD8: 668 - 683 (2.25%) HURT: shaders/closed/steam/counter-strike-global-offensive/2508.shader_test SIMD8: 733 - 756 (3.14%) If you don't think removing the pass is worth it, okay. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Fri, Dec 5, 2014 at 7:02 AM, Jason Ekstrand ja...@jlekstrand.net wrote: First off, how is this different from the sel peephole? This happens in the visitor during translation from GLSL IR to FS IR. It only looks for an exact sequence of IF/MOV/ELSE/MOV/ENDIF where the MOVs are to the same destination. The peephole happens in the optimization loop itself. The peephole looks for sequences of MOVs, rather than just a single one. The peephole also emits a MOV if the two sources you're selecting from are identical. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Fri, Dec 5, 2014 at 10:24 AM, Matt Turner matts...@gmail.com wrote: On Fri, Dec 5, 2014 at 7:02 AM, Jason Ekstrand ja...@jlekstrand.net wrote: First off, how is this different from the sel peephole? This happens in the visitor during translation from GLSL IR to FS IR. It only looks for an exact sequence of IF/MOV/ELSE/MOV/ENDIF where the MOVs are to the same destination. The peephole happens in the optimization loop itself. The peephole looks for sequences of MOVs, rather than just a single one. The peephole also emits a MOV if the two sources you're selecting from are identical. So why isn't the peephole picking up on what the other one is? I'm confused as to why we have any shaderdb change at all. Any insight on that? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Fri, Nov 21, 2014 at 10:23 AM, Matt Turner matts...@gmail.com wrote: Thoughts? Ping^2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On 12/04/2014 06:06 PM, Matt Turner wrote: On Fri, Nov 21, 2014 at 10:23 AM, Matt Turner matts...@gmail.com wrote: Thoughts? Ping^2 Seems reasonable enough... any progress on the common ternary pattern better? (I deleted the original message, so I reviewed the patch from the list archive.) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Friday, November 21, 2014 10:23:43 AM Matt Turner wrote: On Tue, Nov 11, 2014 at 9:41 AM, Matt Turner matts...@gmail.com wrote: The rest of our backend optimizations have replaced the need for this since it was written. instructions in affected programs: 30626 - 30564 (-0.20%) Hurts a small number of CSGO shaders by one instruction, but helps even more. Hurts two by a larger number because of something I noticed when I first wrote the SEL peephole: try_replace_with_sel() operates on instructions before we've demoted uniforms to pull constants. So code like var.x = ( -abs(r6.w) = 0.0 ) ? pc[82].x : r9.x; var.y = ( -abs(r6.w) = 0.0 ) ? pc[82].y : r9.y; var.z = ( -abs(r6.w) = 0.0 ) ? pc[82].z : r9.z; var.w = ( -abs(r6.w) = 0.0 ) ? pc[82].w : r9.w; where pc[82] gets demoted to a pull constant, we end up emitting a send(4) instruction to load pc[82] each time, and since they're in different basic blocks because we mishandle the ternary operator in this case we can't combine them. Once we handle this common ternary pattern better the problem will go away. --- Thoughts? I don't know...if I'm reading the above text correctly, this doesn't look compelling. Your argument for deleting this is it's not necessary anymore, but you go on to undermine that by saying that it hurts a few shaders, and even quadrouples the number of pull loads in a few cases... Sure, it'll get fixed if we handle ternary operations better...but we haven't yet...so... I'm pretty confused. Maybe I'm misreading your justification... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
On Tue, Nov 11, 2014 at 9:41 AM, Matt Turner matts...@gmail.com wrote: The rest of our backend optimizations have replaced the need for this since it was written. instructions in affected programs: 30626 - 30564 (-0.20%) Hurts a small number of CSGO shaders by one instruction, but helps even more. Hurts two by a larger number because of something I noticed when I first wrote the SEL peephole: try_replace_with_sel() operates on instructions before we've demoted uniforms to pull constants. So code like var.x = ( -abs(r6.w) = 0.0 ) ? pc[82].x : r9.x; var.y = ( -abs(r6.w) = 0.0 ) ? pc[82].y : r9.y; var.z = ( -abs(r6.w) = 0.0 ) ? pc[82].z : r9.z; var.w = ( -abs(r6.w) = 0.0 ) ? pc[82].w : r9.w; where pc[82] gets demoted to a pull constant, we end up emitting a send(4) instruction to load pc[82] each time, and since they're in different basic blocks because we mishandle the ternary operator in this case we can't combine them. Once we handle this common ternary pattern better the problem will go away. --- Thoughts? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Remove try_replace_with_sel().
The rest of our backend optimizations have replaced the need for this since it was written. instructions in affected programs: 30626 - 30564 (-0.20%) Hurts a small number of CSGO shaders by one instruction, but helps even more. Hurts two by a larger number because of something I noticed when I first wrote the SEL peephole: try_replace_with_sel() operates on instructions before we've demoted uniforms to pull constants. So code like var.x = ( -abs(r6.w) = 0.0 ) ? pc[82].x : r9.x; var.y = ( -abs(r6.w) = 0.0 ) ? pc[82].y : r9.y; var.z = ( -abs(r6.w) = 0.0 ) ? pc[82].z : r9.z; var.w = ( -abs(r6.w) = 0.0 ) ? pc[82].w : r9.w; where pc[82] gets demoted to a pull constant, we end up emitting a send(4) instruction to load pc[82] each time, and since they're in different basic blocks because we mishandle the ternary operator in this case we can't combine them. Once we handle this common ternary pattern better the problem will go away. --- src/mesa/drivers/dri/i965/brw_fs.h | 1 - src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 87 2 files changed, 88 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index d9150c3..1c14d13 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -510,7 +510,6 @@ public: const fs_reg src0, const fs_reg src1); bool try_emit_saturate(ir_expression *ir); bool try_emit_mad(ir_expression *ir); - void try_replace_with_sel(); bool opt_peephole_sel(); bool opt_peephole_predicated_break(); bool opt_saturate_propagation(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 4e1badd..d4f08aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2565,91 +2565,6 @@ fs_visitor::emit_if_gen6(ir_if *ir) emit(IF(this-result, fs_reg(0), BRW_CONDITIONAL_NZ)); } -/** - * Try to replace IF/MOV/ELSE/MOV/ENDIF with SEL. - * - * Many GLSL shaders contain the following pattern: - * - *x = condition ? foo : bar - * - * The compiler emits an ir_if tree for this, since each subexpression might be - * a complex tree that could have side-effects or short-circuit logic. - * - * However, the common case is to simply select one of two constants or - * variable values---which is exactly what SEL is for. In this case, the - * assembly looks like: - * - *(+f0) IF - *MOV dst src0 - *ELSE - *MOV dst src1 - *ENDIF - * - * which can be easily translated into: - * - *(+f0) SEL dst src0 src1 - * - * If src0 is an immediate value, we promote it to a temporary GRF. - */ -void -fs_visitor::try_replace_with_sel() -{ - fs_inst *endif_inst = (fs_inst *) instructions.get_tail(); - assert(endif_inst-opcode == BRW_OPCODE_ENDIF); - - /* Pattern match in reverse: IF, MOV, ELSE, MOV, ENDIF. */ - int opcodes[] = { - BRW_OPCODE_IF, BRW_OPCODE_MOV, BRW_OPCODE_ELSE, BRW_OPCODE_MOV, - }; - - fs_inst *match = (fs_inst *) endif_inst-prev; - for (int i = 0; i 4; i++) { - if (match-is_head_sentinel() || match-opcode != opcodes[4-i-1]) - return; - match = (fs_inst *) match-prev; - } - - /* The opcodes match; it looks like the right sequence of instructions. */ - fs_inst *else_mov = (fs_inst *) endif_inst-prev; - fs_inst *then_mov = (fs_inst *) else_mov-prev-prev; - fs_inst *if_inst = (fs_inst *) then_mov-prev; - - /* Check that the MOVs are the right form. */ - if (then_mov-dst.equals(else_mov-dst) - !then_mov-is_partial_write() - !else_mov-is_partial_write()) { - - /* Remove the matched instructions; we'll emit a SEL to replace them. */ - while (!if_inst-next-is_tail_sentinel()) - if_inst-next-exec_node::remove(); - if_inst-exec_node::remove(); - - /* Only the last source register can be a constant, so if the MOV in - * the then clause uses a constant, we need to put it in a temporary. - */ - fs_reg src0(then_mov-src[0]); - if (src0.file == IMM) { - src0 = fs_reg(this, glsl_type::float_type); - src0.type = then_mov-src[0].type; - emit(MOV(src0, then_mov-src[0])); - } - - fs_inst *sel; - if (if_inst-conditional_mod) { - /* Sandybridge-specific IF with embedded comparison */ - emit(CMP(reg_null_d, if_inst-src[0], if_inst-src[1], - if_inst-conditional_mod)); - sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); - sel-predicate = BRW_PREDICATE_NORMAL; - } else { - /* Separate CMP and IF instructions */ - sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); - sel-predicate = if_inst-predicate; - sel-predicate_inverse = if_inst-predicate_inverse; - } - } -} - void fs_visitor::visit(ir_if *ir) { @@ -2685,8 +2600,6 @@ fs_visitor::visit(ir_if *ir)