Re: [PATCH] PR target/66144, PowerPC improve vector compare
On Mon, Feb 06, 2017 at 01:28:29PM -0500, Michael Meissner wrote: > Can I check it into the trunk? Yes please. Thanks, Segher > 2017-02-06 Michael Meissner> > PR target/66144 > * config/rs6000/vector.md (vcond): Allow the true and > false values to be constant vectors with all 0 or all 1 bits set. > (vcondu): Likewise. > * config/rs6000/predicates.md (vector_int_reg_or_same_bit): New > predicate. > (fpmask_comparison_operator): Update comment. > (vecint_comparison_operator): New predicate. > * config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Optimize > vector conditionals when the true and false values are constant > vectors with all 0 bits or all 1 bits set. > > [gcc/testsuite] > 2017-02-06 Michael Meissner > > PR target/66144 > * gcc.target/powerpc/pr66144-1.c: New test. > * gcc.target/powerpc/pr66144-2.c: Likewise. > * gcc.target/powerpc/pr66144-3.c: Likewise.
Re: [PATCH] PR target/66144, PowerPC improve vector compare
On Fri, Feb 03, 2017 at 06:07:56PM -0600, Segher Boessenkool wrote: > On Fri, Feb 03, 2017 at 04:25:00PM -0500, Michael Meissner wrote: > > +;; Return 1 if operand is either a vector constant of all 0 bits of a > > vector > > +;; constant of all 1 bits. > > +(define_predicate "vector_int_same_bit" > > + (match_code "const_vector") > > +{ > > + if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT) > > +return 0; > > + > > + else > > +return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode); > > +}) > > This predicate is unused as far as I see? I removed it. Thanks. > > + /* Optimize vec1 == vec2, to know the mask generates -1/0. */ > > + if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT) > > { > > - tmp = op_true; > > - op_true = op_false; > > - op_false = tmp; > > + if (op_true == constant_m1 && op_false == constant_0) > > + { > > + emit_move_insn (dest, mask); > > + return 1; > > + } > > + > > + else if (op_true == constant_0 && op_false == constant_m1) > > + { > > + emit_insn (gen_rtx_SET (dest, gen_rtx_NOT (dest_mode, mask))); > > + return 1; > > + } > > } > > Do you need to test for dest_mode == mask_mode here, like below? I reworked the test, see below. > > + if (op_true == constant_m1 && dest_mode == mask_mode) > > +op_true = mask; > > + else if (!REG_P (op_true) && !SUBREG_P (op_true)) > > +op_true = force_reg (dest_mode, op_true); > > + > > + if (op_false == constant_0 && dest_mode == mask_mode) > > +op_false = mask; > > + else if (!REG_P (op_false) && !SUBREG_P (op_false)) > > +op_false = force_reg (dest_mode, op_false); > > Another thing you could try is, if either op_true or op_false is 0 > or -1, let the result be > (mask & op_true) | (~mask & op_false) > > and let the rest of the optimisers sort it out (it's a single vor/vand > or vorc/vandc, or a vnot, or nothing). A later improvement perhaps. > Or does it already handle all cases now :-) That will be a later optimization if desired. I reran the tests with no regressions. When I reran the spec benchmark in more controlled conditions, I wasn't able to reproduce the gcc speedups like I saw with a quick run. I did see the tonto speedups with these changes. As I said, but perhaps I wasn't clear. I will be in the office on Tuesday, but I will be on vacation starting on Wednesday. I would prefer checking in the changes today (Monday) just in case something needs fixing before I leave. Can I check it into the trunk? [gcc] 2017-02-06 Michael MeissnerPR target/66144 * config/rs6000/vector.md (vcond): Allow the true and false values to be constant vectors with all 0 or all 1 bits set. (vcondu): Likewise. * config/rs6000/predicates.md (vector_int_reg_or_same_bit): New predicate. (fpmask_comparison_operator): Update comment. (vecint_comparison_operator): New predicate. * config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Optimize vector conditionals when the true and false values are constant vectors with all 0 bits or all 1 bits set. [gcc/testsuite] 2017-02-06 Michael Meissner PR target/66144 * gcc.target/powerpc/pr66144-1.c: New test. * gcc.target/powerpc/pr66144-2.c: Likewise. * gcc.target/powerpc/pr66144-3.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 245137) +++ gcc/config/rs6000/predicates.md (working copy) @@ -808,6 +808,21 @@ (define_predicate "all_ones_constant" (and (match_code "const_int,const_double,const_wide_int,const_vector") (match_test "op == CONSTM1_RTX (mode) && !FLOAT_MODE_P (mode)"))) +;; Return 1 if operand is a vector int register or is either a vector constant +;; of all 0 bits of a vector constant of all 1 bits. +(define_predicate "vector_int_reg_or_same_bit" + (match_code "reg,subreg,const_vector") +{ + if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT) +return 0; + + else if (REG_P (op) || SUBREG_P (op)) +return vint_operand (op, mode); + + else +return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode); +}) + ;; Return 1 if operand is 0.0. (define_predicate "zero_fp_constant" (and (match_code "const_double") @@ -1260,8 +1275,8 @@ (define_predicate "scc_rev_comparison_op (and (match_operand 0 "branch_comparison_operator") (match_code "ne,le,ge,leu,geu,ordered"))) -;; Return 1 if OP is a comparison operator suitable for vector/scalar -;; comparisons that generate a -1/0 mask. +;; Return 1 if OP is a comparison operator suitable for floating point +;; vector/scalar comparisons that generate a -1/0 mask.
Re: [PATCH] PR target/66144, PowerPC improve vector compare
On Fri, Feb 03, 2017 at 06:07:56PM -0600, Segher Boessenkool wrote: > On Fri, Feb 03, 2017 at 04:25:00PM -0500, Michael Meissner wrote: > > +;; Return 1 if operand is either a vector constant of all 0 bits of a > > vector > > +;; constant of all 1 bits. > > +(define_predicate "vector_int_same_bit" > > + (match_code "const_vector") > > +{ > > + if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT) > > +return 0; > > + > > + else > > +return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode); > > +}) > > This predicate is unused as far as I see? Right. It was used in my first attempt when I had a peephole2 to eliminate the extra loads. Since I moved the processing to when we create the conditional vector assignment and deleted the peephole2, I missed deleting the predicate. Thanks for catching it. > > + /* Optimize vec1 == vec2, to know the mask generates -1/0. */ > > + if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT) > > { > > - tmp = op_true; > > - op_true = op_false; > > - op_false = tmp; > > + if (op_true == constant_m1 && op_false == constant_0) > > + { > > + emit_move_insn (dest, mask); > > + return 1; > > + } > > + > > + else if (op_true == constant_0 && op_false == constant_m1) > > + { > > + emit_insn (gen_rtx_SET (dest, gen_rtx_NOT (dest_mode, mask))); > > + return 1; > > + } > > } > > Do you need to test for dest_mode == mask_mode here, like below? Yes, because there is support for vcondv4siv4sf and vcondv4sfv4si where the mask is one of V4SI or V4SF and the destination is the other. The dest_mode == mask_mode checks for that, and ignore that case. I.e. something like: static float af[1024], bf[1024], cf[1024]; static int di[1024], ei[1024]; // .. for (i = 0; i < 1024; i++) af[i] = (di[i] == ei[i]) ? bf[i] : cf[i]; > > > + if (op_true == constant_m1 && dest_mode == mask_mode) > > +op_true = mask; > > + else if (!REG_P (op_true) && !SUBREG_P (op_true)) > > +op_true = force_reg (dest_mode, op_true); > > + > > + if (op_false == constant_0 && dest_mode == mask_mode) > > +op_false = mask; > > + else if (!REG_P (op_false) && !SUBREG_P (op_false)) > > +op_false = force_reg (dest_mode, op_false); > > Another thing you could try is, if either op_true or op_false is 0 > or -1, let the result be > (mask & op_true) | (~mask & op_false) > > and let the rest of the optimisers sort it out (it's a single vor/vand > or vorc/vandc, or a vnot, or nothing). A later improvement perhaps. > Or does it already handle all cases now :-) I don't know if it would handle it. > > Okay for trunk with the unused predicate removed, and the dest_mode == > mask_mode thing looked at. Thanks! -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/66144, PowerPC improve vector compare
On Fri, Feb 03, 2017 at 04:25:00PM -0500, Michael Meissner wrote: > +;; Return 1 if operand is either a vector constant of all 0 bits of a vector > +;; constant of all 1 bits. > +(define_predicate "vector_int_same_bit" > + (match_code "const_vector") > +{ > + if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT) > +return 0; > + > + else > +return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode); > +}) This predicate is unused as far as I see? > + /* Optimize vec1 == vec2, to know the mask generates -1/0. */ > + if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT) > { > - tmp = op_true; > - op_true = op_false; > - op_false = tmp; > + if (op_true == constant_m1 && op_false == constant_0) > + { > + emit_move_insn (dest, mask); > + return 1; > + } > + > + else if (op_true == constant_0 && op_false == constant_m1) > + { > + emit_insn (gen_rtx_SET (dest, gen_rtx_NOT (dest_mode, mask))); > + return 1; > + } > } Do you need to test for dest_mode == mask_mode here, like below? > + if (op_true == constant_m1 && dest_mode == mask_mode) > +op_true = mask; > + else if (!REG_P (op_true) && !SUBREG_P (op_true)) > +op_true = force_reg (dest_mode, op_true); > + > + if (op_false == constant_0 && dest_mode == mask_mode) > +op_false = mask; > + else if (!REG_P (op_false) && !SUBREG_P (op_false)) > +op_false = force_reg (dest_mode, op_false); Another thing you could try is, if either op_true or op_false is 0 or -1, let the result be (mask & op_true) | (~mask & op_false) and let the rest of the optimisers sort it out (it's a single vor/vand or vorc/vandc, or a vnot, or nothing). A later improvement perhaps. Or does it already handle all cases now :-) Okay for trunk with the unused predicate removed, and the dest_mode == mask_mode thing looked at. Thanks! Segher
[PATCH] PR target/66144, PowerPC improve vector compare
This patch improves the code generation for: vector char compare_vc (vector char a, vector char b) { return a == b; } Previously it generated: vcmpequb 2,2,3 vspltisw 1,-1 vspltisw 0,0 xxsel 34,32,33,34 and now it generates: vcmpequb 2,2,3 I changed the vector conditional support for integer vectors to allow constant all 0's or all 1's in addition to registers, and it knows that after a vector comparison, the vector element will be all 1's if the comparison was true, and all 0's if the comparison was false. I did the normal bootstrap build and make check. There were no regressions. I built Spec 2006 with the compiler, and I noticed that 15 out of the 29 benchmarks had code changes with this fix (bzip2, gcc, gamess,milc, gromacs, gobmk, dealII, calculix, hmmer, sjeng, libquantum, h264ref, tonto, wrf, sphinx3). I did a quick run (one pass instead of the normal 3) of the benchmarks that had code changes. There were no regressions in performance, and 2 runs that were faster (gcc and tonto) by 3-4%. Can I check this fix into the GCC trunk? Note, I will be on starting on vacation on Wednesday (February 8th), and I will return to work on Friday February 24th. If I don't get the approval by Monday, I will likely hold the patches until I return to work. [gcc] 2017-02-03 Michael MeissnerPR target/66144 * config/rs6000/vector.md (vcond): Allow the true and false values to be constant vectors with all 0 or all 1 bits set. (vcondu): Likewise. * config/rs6000/predicates.md (vector_int_same_bit): New predicate. (vector_int_reg_or_same_bit): Likewise. (fpmask_comparison_operator): Update comment. (vecint_comparison_operator): New predicate. * config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Optimize vector conditionals when the true and false values are constant vectors with all 0 bits or all 1 bits set. [gcc/testsuite] 2017-02-03 Michael Meissner PR target/66144 * gcc.target/powerpc/pr66144-1.c: New test. * gcc.target/powerpc/pr66144-2.c: Likewise. * gcc.target/powerpc/pr66144-3.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/vector.md === --- gcc/config/rs6000/vector.md (revision 245101) +++ gcc/config/rs6000/vector.md (working copy) @@ -390,13 +390,13 @@ (define_expand "vcond" }") (define_expand "vcond" - [(set (match_operand:VEC_I 0 "vint_operand" "") + [(set (match_operand:VEC_I 0 "vint_operand") (if_then_else:VEC_I (match_operator 3 "comparison_operator" -[(match_operand:VEC_I 4 "vint_operand" "") - (match_operand:VEC_I 5 "vint_operand" "")]) -(match_operand:VEC_I 1 "vint_operand" "") -(match_operand:VEC_I 2 "vint_operand" "")))] +[(match_operand:VEC_I 4 "vint_operand") + (match_operand:VEC_I 5 "vint_operand")]) +(match_operand:VEC_I 1 "vector_int_reg_or_same_bit") +(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" " { @@ -446,13 +446,13 @@ (define_expand "vcondv4siv4sf" }") (define_expand "vcondu" - [(set (match_operand:VEC_I 0 "vint_operand" "") + [(set (match_operand:VEC_I 0 "vint_operand") (if_then_else:VEC_I (match_operator 3 "comparison_operator" -[(match_operand:VEC_I 4 "vint_operand" "") - (match_operand:VEC_I 5 "vint_operand" "")]) -(match_operand:VEC_I 1 "vint_operand" "") -(match_operand:VEC_I 2 "vint_operand" "")))] +[(match_operand:VEC_I 4 "vint_operand") + (match_operand:VEC_I 5 "vint_operand")]) +(match_operand:VEC_I 1 "vector_int_reg_or_same_bit") +(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)" " { Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 245101) +++ gcc/config/rs6000/predicates.md (working copy) @@ -808,6 +808,33 @@ (define_predicate "all_ones_constant" (and (match_code "const_int,const_double,const_wide_int,const_vector") (match_test "op == CONSTM1_RTX (mode) && !FLOAT_MODE_P (mode)"))) +;; Return 1 if operand is either a vector constant of all 0 bits of a vector +;; constant of all 1 bits. +(define_predicate "vector_int_same_bit" + (match_code "const_vector") +{ + if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT) +return 0; + + else +return op == CONST0_RTX (mode)