Re: [Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads
On 03/05/2018 04:53 PM, Kenneth Graunke wrote: > On Friday, February 23, 2018 3:56:06 PM PST Ian Romanick wrote: >> From: Ian Romanick>> >> No changes on other platforms. >> >> Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) >> total instructions in shared programs: 13059891 -> 13059884 (<.01%) >> instructions in affected programs: 431 -> 424 (-1.62%) >> helped: 7 >> HURT: 0 >> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 >> helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49% >> 95% mean confidence interval for instructions value: -1.00 -1.00 >> 95% mean confidence interval for instructions %-change: -3.39% -0.71% >> Instructions are helped. >> >> total cycles in shared programs: 409260032 -> 409260018 (<.01%) >> cycles in affected programs: 4228 -> 4214 (-0.33%) >> helped: 7 >> HURT: 0 >> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 >> helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28% >> 95% mean confidence interval for cycles value: -2.00 -2.00 >> 95% mean confidence interval for cycles %-change: -1.15% 0.07% >> >> Inconclusive result (%-change mean confidence interval includes 0). >> >> Signed-off-by: Ian Romanick >> --- >> src/intel/compiler/brw_vec4_cse.cpp | 21 + >> 1 file changed, 21 insertions(+) >> >> diff --git a/src/intel/compiler/brw_vec4_cse.cpp >> b/src/intel/compiler/brw_vec4_cse.cpp >> index aa2f127..3302939 100644 >> --- a/src/intel/compiler/brw_vec4_cse.cpp >> +++ b/src/intel/compiler/brw_vec4_cse.cpp >> @@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const >> vec4_instruction *b) >>return xs[0].equals(ys[0]) && >> ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) || >>(xs[2].equals(ys[1]) && xs[1].equals(ys[2]))); >> + } else if (a->opcode == BRW_OPCODE_MOV && >> + xs[0].file == IMM && >> + xs[0].type == BRW_REGISTER_TYPE_VF) { >> + src_reg tmp_x = xs[0]; >> + src_reg tmp_y = ys[0]; >> + >> + /* Smash out the values that are not part of the writemask. Otherwise >> + * the equals and negative_equals operators will fail due to >> mismatches >> + * in unused components. >> + */ >> + static const uint32_t mask[] = { >> + 0x, 0x00ff, 0xff00, 0x, >> + 0x00ff, 0x00ff00ff, 0x0000, 0x00ff, >> + 0xff00, 0xffff, 0xff00ff00, 0xff00, >> + 0x, 0x00ff, 0xff00, 0x >> + }; > > This looks a bit crazy at first glance...it isn't though, it's just a > mapping from WRITEMASK_* to the proper byte masks. But I think it might > be clearer to write it with an explicit formula: > > unsigned ab_writemask = a->dst.writemask & b->dst.writemask; > uint32_t mask = ((ab_writemask & WRITEMASK_X) ? 0x00ff : 0) | > ((ab_writemask & WRITEMASK_Y) ? 0xff00 : 0) | > ((ab_writemask & WRITEMASK_Z) ? 0x00ff : 0) | > ((ab_writemask & WRITEMASK_W) ? 0xff00 : 0); > tmp_x.ud &= mask; > tmp_y.ud &= mask; I was going to complain that this would be bigger / less efficient / something. However, in a release build this is actually 48 bytes smaller. :) > This makes it pretty obvious what's going on. > > Either way (though I prefer mine), > Reviewed-by: Kenneth Graunke > >> + >> + tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask]; >> + tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask]; >> + >> + return tmp_x.equals(tmp_y); >> } else if (!a->is_commutative()) { >>return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && >> xs[2].equals(ys[2]); >> } else { >> > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads
On Friday, February 23, 2018 3:56:06 PM PST Ian Romanick wrote: > From: Ian Romanick> > No changes on other platforms. > > Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) > total instructions in shared programs: 13059891 -> 13059884 (<.01%) > instructions in affected programs: 431 -> 424 (-1.62%) > helped: 7 > HURT: 0 > helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 > helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49% > 95% mean confidence interval for instructions value: -1.00 -1.00 > 95% mean confidence interval for instructions %-change: -3.39% -0.71% > Instructions are helped. > > total cycles in shared programs: 409260032 -> 409260018 (<.01%) > cycles in affected programs: 4228 -> 4214 (-0.33%) > helped: 7 > HURT: 0 > helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 > helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28% > 95% mean confidence interval for cycles value: -2.00 -2.00 > 95% mean confidence interval for cycles %-change: -1.15% 0.07% > > Inconclusive result (%-change mean confidence interval includes 0). > > Signed-off-by: Ian Romanick > --- > src/intel/compiler/brw_vec4_cse.cpp | 21 + > 1 file changed, 21 insertions(+) > > diff --git a/src/intel/compiler/brw_vec4_cse.cpp > b/src/intel/compiler/brw_vec4_cse.cpp > index aa2f127..3302939 100644 > --- a/src/intel/compiler/brw_vec4_cse.cpp > +++ b/src/intel/compiler/brw_vec4_cse.cpp > @@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const > vec4_instruction *b) >return xs[0].equals(ys[0]) && > ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) || >(xs[2].equals(ys[1]) && xs[1].equals(ys[2]))); > + } else if (a->opcode == BRW_OPCODE_MOV && > + xs[0].file == IMM && > + xs[0].type == BRW_REGISTER_TYPE_VF) { > + src_reg tmp_x = xs[0]; > + src_reg tmp_y = ys[0]; > + > + /* Smash out the values that are not part of the writemask. Otherwise > + * the equals and negative_equals operators will fail due to mismatches > + * in unused components. > + */ > + static const uint32_t mask[] = { > + 0x, 0x00ff, 0xff00, 0x, > + 0x00ff, 0x00ff00ff, 0x0000, 0x00ff, > + 0xff00, 0xffff, 0xff00ff00, 0xff00, > + 0x, 0x00ff, 0xff00, 0x > + }; This looks a bit crazy at first glance...it isn't though, it's just a mapping from WRITEMASK_* to the proper byte masks. But I think it might be clearer to write it with an explicit formula: unsigned ab_writemask = a->dst.writemask & b->dst.writemask; uint32_t mask = ((ab_writemask & WRITEMASK_X) ? 0x00ff : 0) | ((ab_writemask & WRITEMASK_Y) ? 0xff00 : 0) | ((ab_writemask & WRITEMASK_Z) ? 0x00ff : 0) | ((ab_writemask & WRITEMASK_W) ? 0xff00 : 0); tmp_x.ud &= mask; tmp_y.ud &= mask; This makes it pretty obvious what's going on. Either way (though I prefer mine), Reviewed-by: Kenneth Graunke > + > + tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask]; > + tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask]; > + > + return tmp_x.equals(tmp_y); > } else if (!a->is_commutative()) { >return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && > xs[2].equals(ys[2]); > } else { > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads
From: Ian RomanickNo changes on other platforms. Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13059891 -> 13059884 (<.01%) instructions in affected programs: 431 -> 424 (-1.62%) helped: 7 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -3.39% -0.71% Instructions are helped. total cycles in shared programs: 409260032 -> 409260018 (<.01%) cycles in affected programs: 4228 -> 4214 (-0.33%) helped: 7 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28% 95% mean confidence interval for cycles value: -2.00 -2.00 95% mean confidence interval for cycles %-change: -1.15% 0.07% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick --- src/intel/compiler/brw_vec4_cse.cpp | 21 + 1 file changed, 21 insertions(+) diff --git a/src/intel/compiler/brw_vec4_cse.cpp b/src/intel/compiler/brw_vec4_cse.cpp index aa2f127..3302939 100644 --- a/src/intel/compiler/brw_vec4_cse.cpp +++ b/src/intel/compiler/brw_vec4_cse.cpp @@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const vec4_instruction *b) return xs[0].equals(ys[0]) && ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) || (xs[2].equals(ys[1]) && xs[1].equals(ys[2]))); + } else if (a->opcode == BRW_OPCODE_MOV && + xs[0].file == IMM && + xs[0].type == BRW_REGISTER_TYPE_VF) { + src_reg tmp_x = xs[0]; + src_reg tmp_y = ys[0]; + + /* Smash out the values that are not part of the writemask. Otherwise + * the equals and negative_equals operators will fail due to mismatches + * in unused components. + */ + static const uint32_t mask[] = { + 0x, 0x00ff, 0xff00, 0x, + 0x00ff, 0x00ff00ff, 0x0000, 0x00ff, + 0xff00, 0xffff, 0xff00ff00, 0xff00, + 0x, 0x00ff, 0xff00, 0x + }; + + tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask]; + tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask]; + + return tmp_x.equals(tmp_y); } else if (!a->is_commutative()) { return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]); } else { -- 2.9.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev