Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Wed, Sep 20, 2023 at 6:52 AM Robin Dapp wrote: > > Hi, > > as described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn > but have code to handle MINUS_EXPR in vectorize_fold_left_reduction. I > suppose that's intentional but it "just works" on riscv and the testsuite > doesn't change when allowing MINUS_EXPR so I went ahead and did that. > > Bootstrapped and regtested on x86 and aarch64. This caused gcc.target/i386/avx512f-reduce-op-1.c testcase to start to fail when testing on a x86_64 that has avx512f (In my case I am using `Intel(R) Xeon(R) D-2166NT CPU @ 2.00GHz`). I reverted the commit to double check it too. The difference in optimized I see is: if (_40 != 3.5e+1) // working vs if (_40 != 6.4e+1) // not working It is test_epi32_ps which is failing with TEST_PS macro and the plus operand that uses TESTOP: TESTOP (add, +, float, ps, 0.0f); \ I have not reduced the testcase any further though. Thanks, Andrew Pinski > > Regards > Robin > > gcc/ChangeLog: > > PR middle-end/111401 > * internal-fn.cc (cond_fn_p): New function. > * internal-fn.h (cond_fn_p): Define. > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > --- > gcc/internal-fn.cc| 38 + > gcc/internal-fn.h | 1 + > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 ++ > .../riscv/rvv/autovec/cond/pr111401.c | 61 > gcc/tree-if-conv.cc | 63 ++-- > gcc/tree-vect-loop.cc | 130 > gcc/tree-vectorizer.h | 2 +- > 7 files changed, 394 insertions(+), 42 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 0fd34359247..77939890f5a 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn) > } > } > > +/* Return true if this CODE describes a conditional (masked) internal_fn. */ > + > +bool > +cond_fn_p (code_helper code) > +{ > + if (!code.is_fn_code ()) > +return false; > + > + if (!internal_fn_p ((combined_fn) code)) > +return false; > + > + internal_fn fn = as_internal_fn ((combined_fn) code); > + switch (fn) > +{ > +#undef DEF_INTERNAL_COND_FN > +#define DEF_INTERNAL_COND_FN(NAME, F, O, T) \ > +case IFN_COND_##NAME:\ > +case IFN_COND_LEN_##NAME:\ > + return true; > +#include "internal-fn.def" > +#undef DEF_INTERNAL_COND_FN > + > +#undef DEF_INTERNAL_SIGNED_COND_FN > +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T) \ > +case IFN_COND_##NAME:\ > +case IFN_COND_LEN_##NAME:\ > + return true; > +#include "internal-fn.def" > +#undef DEF_INTERNAL_SIGNED_COND_FN > + > +default: > + return false; > +} > + > + return false; > +} > + > + > /* Return true if this CODE describes an internal_fn that returns a vector > with > elements twice as wide as the element size of the input vectors. */ > > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h > index 99de13a0199..f1cc9db29c0 100644 > --- a/gcc/internal-fn.h > +++ b/gcc/internal-fn.h > @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn); > extern int first_commutative_argument (internal_fn); > extern bool associative_binary_fn_p (internal_fn); > extern bool widening_fn_p (code_helper); >
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Tue, 31 Oct 2023, Robin Dapp wrote: > >> +int > >> +internal_fn_else_index (internal_fn fn) > > > > The function needs a comment, maybe: > > > > /* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of > > the > >argument that is used when the condition is false. Return -1 otherwise. > > */ > > > > OK for the internal-fn* and tree-if-conv.cc bits (which were the > > parts I commented on earlier). I'll look at cleaning up the > > definition of conditional internal functions separately, so that > > the list of functions isn't necessary. > > Thank you, added the comment (shouldn't have forgotten it in the > first place...). So there's the vectorizer part left that is not > yet OK'd. The vectorizer part is OK. Richard.
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
>> +int >> +internal_fn_else_index (internal_fn fn) > > The function needs a comment, maybe: > > /* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of the >argument that is used when the condition is false. Return -1 otherwise. > */ > > OK for the internal-fn* and tree-if-conv.cc bits (which were the > parts I commented on earlier). I'll look at cleaning up the > definition of conditional internal functions separately, so that > the list of functions isn't necessary. Thank you, added the comment (shouldn't have forgotten it in the first place...). So there's the vectorizer part left that is not yet OK'd. Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Robin Dapp writes: > Changed as suggested. The difference to v5 is thus: > > + if (cond_fn_p) > + { > + gcall *call = dyn_cast (use_stmt); > + unsigned else_pos > + = internal_fn_else_index (internal_fn (op.code)); > + > + for (unsigned int j = 0; j < gimple_call_num_args (call); ++j) > + { > + if (j == else_pos) > + continue; > + if (gimple_call_arg (call, j) == op.ops[opi]) > + cnt++; > + } > + } > + else if (!is_gimple_debug (op_use_stmt) > > as well as internal_fn_else_index. > > Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done, > aarch64 and x86 still running. > > Regards > Robin > > From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001 > From: Robin Dapp > Date: Wed, 13 Sep 2023 22:19:35 +0200 > Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD/COND_OP during > ifcvt and adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > gcc/ChangeLog: > > PR middle-end/111401 > * internal-fn.cc (internal_fn_else_index): New function. > * internal-fn.h (internal_fn_else_index): Define. > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP. > (neutral_op_for_reduction): Return -0 for PLUS. > (check_reduction_path): Don't count else operand in COND_OP. > (vect_is_simple_reduction): Ditto. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_OP handling. > (vectorizable_reduction): Don't count else operand in COND_OP. > (vect_transform_reduction): Add COND_OP handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. > > --- > gcc/internal-fn.cc| 58 ++ > gcc/internal-fn.h | 1 + > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 + > .../riscv/rvv/autovec/cond/pr111401.c | 139 + > .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- > .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- > gcc/tree-if-conv.cc | 49 +++-- > gcc/tree-vect-loop.cc | 193 ++ > gcc/tree-vectorizer.h | 2 +- > 9 files changed, 536 insertions(+), 55 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 61d5a9e4772..018175261b9 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn) > } > } > > +int > +internal_fn_else_index (internal_fn fn) The function needs a comment, maybe: /* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of the argument that is used when the condition is false. Return -1 otherwise. */ OK for the internal-fn* and tree-if-conv.cc bits (which were the parts I commented on earlier). I'll look at cleaning up the definition of conditional internal functions separately, so that the list of functions isn't necessary. Thanks, Richard > +{ > + switch (fn) > +{ > +case IFN_COND_NEG: > +case IFN_COND_NOT: > +case IFN_COND_LEN_NEG: > +case IFN_COND_LEN_NOT: > + return 2; > + > +case IFN_COND_ADD: > +case IFN_COND_SUB: > +case IFN_COND_MUL: > +case IFN_COND_DIV: > +case IFN_COND_MOD: > +case IFN_COND_MIN: > +case IFN_COND_MAX: > +case IFN_COND_FMIN: > +case IFN_COND_FMAX: > +case IFN_COND_AND: > +case IFN_COND_IOR: > +case IFN_COND_XOR: > +case IFN_COND_SHL: > +case IFN_COND_SHR: > +case IFN_COND_LEN_ADD: > +case IFN_COND_LEN_SUB: > +case IFN_COND_LEN_MUL: > +case IFN_COND_LEN_DIV: > +case IFN_COND_LEN_MOD: > +case IFN_COND_LEN_MIN: > +case IFN_COND_LEN_MAX: > +case IFN_COND_LEN_FMIN: > +case IFN_COND_LEN_FMAX: > +case IFN_COND_LEN_AND: > +case IFN_COND_LEN_IOR: > +
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Changed as suggested. The difference to v5 is thus: + if (cond_fn_p) + { + gcall *call = dyn_cast (use_stmt); + unsigned else_pos + = internal_fn_else_index (internal_fn (op.code)); + + for (unsigned int j = 0; j < gimple_call_num_args (call); ++j) + { + if (j == else_pos) + continue; + if (gimple_call_arg (call, j) == op.ops[opi]) + cnt++; + } + } + else if (!is_gimple_debug (op_use_stmt) as well as internal_fn_else_index. Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done, aarch64 and x86 still running. Regards Robin >From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Wed, 13 Sep 2023 22:19:35 +0200 Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction. As described in PR111401 we currently emit a COND and a PLUS expression for conditional reductions. This makes it difficult to combine both into a masked reduction statement later. This patch improves that by directly emitting a COND_ADD/COND_OP during ifcvt and adjusting some vectorizer code to handle it. It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is true. gcc/ChangeLog: PR middle-end/111401 * internal-fn.cc (internal_fn_else_index): New function. * internal-fn.h (internal_fn_else_index): Define. * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP if supported. (predicate_scalar_phi): Add whitespace. * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP. (neutral_op_for_reduction): Return -0 for PLUS. (check_reduction_path): Don't count else operand in COND_OP. (vect_is_simple_reduction): Ditto. (vect_create_epilog_for_reduction): Fix whitespace. (vectorize_fold_left_reduction): Add COND_OP handling. (vectorizable_reduction): Don't count else operand in COND_OP. (vect_transform_reduction): Add COND_OP handling. * tree-vectorizer.h (neutral_op_for_reduction): Add default parameter. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. --- gcc/internal-fn.cc| 58 ++ gcc/internal-fn.h | 1 + .../vect-cond-reduc-in-order-2-signed-zero.c | 141 + .../riscv/rvv/autovec/cond/pr111401.c | 139 + .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- gcc/tree-if-conv.cc | 49 +++-- gcc/tree-vect-loop.cc | 193 ++ gcc/tree-vectorizer.h | 2 +- 9 files changed, 536 insertions(+), 55 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 61d5a9e4772..018175261b9 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn) } } +int +internal_fn_else_index (internal_fn fn) +{ + switch (fn) +{ +case IFN_COND_NEG: +case IFN_COND_NOT: +case IFN_COND_LEN_NEG: +case IFN_COND_LEN_NOT: + return 2; + +case IFN_COND_ADD: +case IFN_COND_SUB: +case IFN_COND_MUL: +case IFN_COND_DIV: +case IFN_COND_MOD: +case IFN_COND_MIN: +case IFN_COND_MAX: +case IFN_COND_FMIN: +case IFN_COND_FMAX: +case IFN_COND_AND: +case IFN_COND_IOR: +case IFN_COND_XOR: +case IFN_COND_SHL: +case IFN_COND_SHR: +case IFN_COND_LEN_ADD: +case IFN_COND_LEN_SUB: +case IFN_COND_LEN_MUL: +case IFN_COND_LEN_DIV: +case IFN_COND_LEN_MOD: +case IFN_COND_LEN_MIN: +case IFN_COND_LEN_MAX: +case IFN_COND_LEN_FMIN: +case IFN_COND_LEN_FMAX: +case IFN_COND_LEN_AND: +case IFN_COND_LEN_IOR: +case IFN_COND_LEN_XOR: +case IFN_COND_LEN_SHL: +case IFN_COND_LEN_SHR: + return 3; + +case IFN_COND_FMA: +case IFN_COND_FMS: +case IFN_COND_FNMA: +case IFN_COND_FNMS: +case IFN_COND_LEN_FMA: +case IFN_COND_LEN_FMS: +case IFN_COND_LEN_FNMA: +case IFN_COND_LEN_FNMS: + return 4; + +default: + return -1; +} + + return -1; +} + /* If FN takes a vector mask argument, return the index of that argument, otherwise return -1. */ diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 99de13a0199..7d72f4db2d0 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Richard Biener writes: > On Thu, 19 Oct 2023, Robin Dapp wrote: > >> Ugh, I didn't push yet because with a rebased trunk I am >> seeing different behavior for some riscv testcases. >> >> A reduction is not recognized because there is yet another >> "double use" occurrence in check_reduction_path. I guess it's >> reasonable to loosen the restriction for conditional operations >> here as well. >> >> The only change to v4 therefore is: >> >> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc >> index ebab1953b9c..64654a55e4c 100644 >> --- a/gcc/tree-vect-loop.cc >> +++ b/gcc/tree-vect-loop.cc >> @@ -4085,7 +4094,15 @@ pop: >> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt >> FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) >> cnt++; >> - if (cnt != 1) >> + >> + bool cond_fn_p = op.code.is_internal_fn () >> + && (conditional_internal_fn_code (internal_fn (*code)) >> + != ERROR_MARK); >> + >> + /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have >> +op1 twice (once as definition, once as else) in the same operation. >> +Allow this. */ >> + if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2)) >> >> Bootstrapped and regtested again on x86, aarch64 and power10. >> Testsuite on riscv unchanged. > > Hmm, why opi == 1 only? I think > > # _1 = PHI <.., _4> > _3 = .COND_ADD (_1, _2, _1); > _4 = .COND_ADD (_3, _5, _3); > > would be fine as well. I think we want to simply ignore the 'else' value > of conditional internal functions. I suppose we have unary, binary > and ternary conditional functions - I miss a internal_fn_else_index, > but I suppose it's always the last one? Yeah, it was always the last one before the introduction of .COND_LEN. I agree internal_fn_else_index would be useful now. Thanks, Richard > > I think a single use on .COND functions is also OK, even when on the > 'else' value only? But maybe that's not too important here. > > Maybe > > gimple *op_use_stmt; > unsigned cnt = 0; > FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi]) > if (.. op_use_stmt is conditional internal function ..) > { > for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j) > if (gimple_call_arg (call, j) == op.ops[opi]) > cnt++; > } > else if (!is_gimple_debug (op_use_stmt) > && (*code != ERROR_MARK > || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt > FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) > cnt++; > > ? > >> Regards >> Robin >> >> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar >> reduction. >> >> As described in PR111401 we currently emit a COND and a PLUS expression >> for conditional reductions. This makes it difficult to combine both >> into a masked reduction statement later. >> This patch improves that by directly emitting a COND_ADD/COND_OP during >> ifcvt and adjusting some vectorizer code to handle it. >> >> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS >> is true. >> >> gcc/ChangeLog: >> >> PR middle-end/111401 >> * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP >> if supported. >> (predicate_scalar_phi): Add whitespace. >> * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP. >> (neutral_op_for_reduction): Return -0 for PLUS. >> (check_reduction_path): Don't count else operand in COND_OP. >> (vect_is_simple_reduction): Ditto. >> (vect_create_epilog_for_reduction): Fix whitespace. >> (vectorize_fold_left_reduction): Add COND_OP handling. >> (vectorizable_reduction): Don't count else operand in COND_OP. >> (vect_transform_reduction): Add COND_OP handling. >> * tree-vectorizer.h (neutral_op_for_reduction): Add default >> parameter. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. >> * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. >> * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. >> * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. >> --- >> .../vect-cond-reduc-in-order-2-signed-zero.c | 141 +++ >> .../riscv/rvv/autovec/cond/pr111401.c | 139 +++ >> .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- >> .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- >> gcc/tree-if-conv.cc | 49 +++-- >> gcc/tree-vect-loop.cc | 168 ++ >> gcc/tree-vectorizer.h | 2 +- >> 7 files changed, 456 insertions(+), 51 deletions(-) >> create mode 100644 >> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c >> create mode 100644 >> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c >> >> diff --git >>
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Thu, 19 Oct 2023, Robin Dapp wrote: > Ugh, I didn't push yet because with a rebased trunk I am > seeing different behavior for some riscv testcases. > > A reduction is not recognized because there is yet another > "double use" occurrence in check_reduction_path. I guess it's > reasonable to loosen the restriction for conditional operations > here as well. > > The only change to v4 therefore is: > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index ebab1953b9c..64654a55e4c 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -4085,7 +4094,15 @@ pop: > || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt > FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) > cnt++; > - if (cnt != 1) > + > + bool cond_fn_p = op.code.is_internal_fn () > + && (conditional_internal_fn_code (internal_fn (*code)) > + != ERROR_MARK); > + > + /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have > +op1 twice (once as definition, once as else) in the same operation. > +Allow this. */ > + if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2)) > > Bootstrapped and regtested again on x86, aarch64 and power10. > Testsuite on riscv unchanged. Hmm, why opi == 1 only? I think # _1 = PHI <.., _4> _3 = .COND_ADD (_1, _2, _1); _4 = .COND_ADD (_3, _5, _3); would be fine as well. I think we want to simply ignore the 'else' value of conditional internal functions. I suppose we have unary, binary and ternary conditional functions - I miss a internal_fn_else_index, but I suppose it's always the last one? I think a single use on .COND functions is also OK, even when on the 'else' value only? But maybe that's not too important here. Maybe gimple *op_use_stmt; unsigned cnt = 0; FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi]) if (.. op_use_stmt is conditional internal function ..) { for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j) if (gimple_call_arg (call, j) == op.ops[opi]) cnt++; } else if (!is_gimple_debug (op_use_stmt) && (*code != ERROR_MARK || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) cnt++; ? > Regards > Robin > > Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD/COND_OP during > ifcvt and adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > gcc/ChangeLog: > > PR middle-end/111401 > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP. > (neutral_op_for_reduction): Return -0 for PLUS. > (check_reduction_path): Don't count else operand in COND_OP. > (vect_is_simple_reduction): Ditto. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_OP handling. > (vectorizable_reduction): Don't count else operand in COND_OP. > (vect_transform_reduction): Add COND_OP handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. > --- > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 +++ > .../riscv/rvv/autovec/cond/pr111401.c | 139 +++ > .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- > .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- > gcc/tree-if-conv.cc | 49 +++-- > gcc/tree-vect-loop.cc | 168 ++ > gcc/tree-vectorizer.h | 2 +- > 7 files changed, 456 insertions(+), 51 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git > a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..7b46e7d8a2a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Ugh, I didn't push yet because with a rebased trunk I am seeing different behavior for some riscv testcases. A reduction is not recognized because there is yet another "double use" occurrence in check_reduction_path. I guess it's reasonable to loosen the restriction for conditional operations here as well. The only change to v4 therefore is: diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index ebab1953b9c..64654a55e4c 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -4085,7 +4094,15 @@ pop: || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) cnt++; - if (cnt != 1) + + bool cond_fn_p = op.code.is_internal_fn () + && (conditional_internal_fn_code (internal_fn (*code)) + != ERROR_MARK); + + /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have +op1 twice (once as definition, once as else) in the same operation. +Allow this. */ + if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2)) Bootstrapped and regtested again on x86, aarch64 and power10. Testsuite on riscv unchanged. Regards Robin Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction. As described in PR111401 we currently emit a COND and a PLUS expression for conditional reductions. This makes it difficult to combine both into a masked reduction statement later. This patch improves that by directly emitting a COND_ADD/COND_OP during ifcvt and adjusting some vectorizer code to handle it. It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is true. gcc/ChangeLog: PR middle-end/111401 * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP if supported. (predicate_scalar_phi): Add whitespace. * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP. (neutral_op_for_reduction): Return -0 for PLUS. (check_reduction_path): Don't count else operand in COND_OP. (vect_is_simple_reduction): Ditto. (vect_create_epilog_for_reduction): Fix whitespace. (vectorize_fold_left_reduction): Add COND_OP handling. (vectorizable_reduction): Don't count else operand in COND_OP. (vect_transform_reduction): Add COND_OP handling. * tree-vectorizer.h (neutral_op_for_reduction): Add default parameter. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. --- .../vect-cond-reduc-in-order-2-signed-zero.c | 141 +++ .../riscv/rvv/autovec/cond/pr111401.c | 139 +++ .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- gcc/tree-if-conv.cc | 49 +++-- gcc/tree-vect-loop.cc | 168 ++ gcc/tree-vectorizer.h | 2 +- 7 files changed, 456 insertions(+), 51 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c new file mode 100644 index 000..7b46e7d8a2a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c @@ -0,0 +1,141 @@ +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ + +#include "tree-vect.h" + +#include + +#define N (VECTOR_BITS * 17) + +double __attribute__ ((noinline, noclone)) +reduc_plus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res += a[i]; + return res; +} + +double __attribute__ ((noinline, noclone, optimize ("0"))) +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res += a[i]; + return res; +} + +double __attribute__ ((noinline, noclone)) +reduc_minus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res -= a[i]; + return res; +} + +double __attribute__ ((noinline, noclone, optimize ("0"))) +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res -= a[i]; + return res; +} + +int
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Wed, 11 Oct 2023, Robin Dapp wrote: > > It wasn't very clear, sorry, but it was the last sentence I was asking > > for clarification on, not the other bits. Why do we want to avoid > > generating a COND_ADD when the operand is a vectorisable call? > > Ah, I see, apologies. Upon thinking about it a bit more (thanks) > I figured this hunk is not necessary. I added it early in the process > in order to keep the current behavior for situations like the following: > > before: > _1 = .FMA (...) > _2 = COND (cond, .FMA, 0.0) > _3 = COND_ADD (true, result, _2, result) > > This we would simplify to: > _2 = COND_FMA (cond, ...) > _3 = COND_ADD (true, result, _2, result) > > with the patch we have: > _1 = .FMA (...) > _2 = .COND_ADD (cond, arg1, _1, arg1) > > Due to differences in expansion we'd end up with a masked > vfmacc ("a += a + b * c") before and now emit an unmasked > vfmadd ("a += a * b + c") and a masked result add. This shouldn't > be worse from a vector spec point of view, so I just changed the > test expectation for now. > > The attached v4 also includes Richi's suggestion for the HONOR... > stuff. > > Bootstrap and regtest unchanged on aarch64, x86 and power10. OK Thanks, Richard. > Regards > Robin > > > From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001 > From: Robin Dapp > Date: Wed, 13 Sep 2023 22:19:35 +0200 > Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > gcc/ChangeLog: > > PR middle-end/111401 > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. > --- > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 > .../riscv/rvv/autovec/cond/pr111401.c | 139 > .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- > .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- > gcc/tree-if-conv.cc | 49 -- > gcc/tree-vect-loop.cc | 156 ++ > gcc/tree-vectorizer.h | 2 +- > 7 files changed, 446 insertions(+), 49 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git > a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..7b46e7d8a2a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-add-options ieee } */ > +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ > + > +#include "tree-vect.h" > + > +#include > + > +#define N (VECTOR_BITS * 17) > + > +double __attribute__ ((noinline, noclone)) > +reduc_plus_double (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0&
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Robin Dapp writes: >> It wasn't very clear, sorry, but it was the last sentence I was asking >> for clarification on, not the other bits. Why do we want to avoid >> generating a COND_ADD when the operand is a vectorisable call? > > Ah, I see, apologies. Upon thinking about it a bit more (thanks) > I figured this hunk is not necessary. I added it early in the process > in order to keep the current behavior for situations like the following: > > before: > _1 = .FMA (...) > _2 = COND (cond, .FMA, 0.0) > _3 = COND_ADD (true, result, _2, result) > > This we would simplify to: > _2 = COND_FMA (cond, ...) > _3 = COND_ADD (true, result, _2, result) > > with the patch we have: > _1 = .FMA (...) > _2 = .COND_ADD (cond, arg1, _1, arg1) > > Due to differences in expansion we'd end up with a masked > vfmacc ("a += a + b * c") before and now emit an unmasked > vfmadd ("a += a * b + c") and a masked result add. This shouldn't > be worse from a vector spec point of view, so I just changed the > test expectation for now. Thanks, sounds good. > The attached v4 also includes Richi's suggestion for the HONOR... > stuff. > > Bootstrap and regtest unchanged on aarch64, x86 and power10. I'm reluctant to comment on the signed zeros/MINUS_EXPR parts, but FWIW, the rest looks good to me. Thanks, Richard > > Regards > Robin > > > From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001 > From: Robin Dapp > Date: Wed, 13 Sep 2023 22:19:35 +0200 > Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > gcc/ChangeLog: > > PR middle-end/111401 > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. > * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. > --- > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 > .../riscv/rvv/autovec/cond/pr111401.c | 139 > .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- > .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- > gcc/tree-if-conv.cc | 49 -- > gcc/tree-vect-loop.cc | 156 ++ > gcc/tree-vectorizer.h | 2 +- > 7 files changed, 446 insertions(+), 49 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git > a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..7b46e7d8a2a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-add-options ieee } */ > +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ > + > +#include "tree-vect.h" > + > +#include > + > +#define N (VECTOR_BITS * 17) > + > +double __attribute__ ((noinline, noclone)) > +reduc_plus_double (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; &g
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> It wasn't very clear, sorry, but it was the last sentence I was asking > for clarification on, not the other bits. Why do we want to avoid > generating a COND_ADD when the operand is a vectorisable call? Ah, I see, apologies. Upon thinking about it a bit more (thanks) I figured this hunk is not necessary. I added it early in the process in order to keep the current behavior for situations like the following: before: _1 = .FMA (...) _2 = COND (cond, .FMA, 0.0) _3 = COND_ADD (true, result, _2, result) This we would simplify to: _2 = COND_FMA (cond, ...) _3 = COND_ADD (true, result, _2, result) with the patch we have: _1 = .FMA (...) _2 = .COND_ADD (cond, arg1, _1, arg1) Due to differences in expansion we'd end up with a masked vfmacc ("a += a + b * c") before and now emit an unmasked vfmadd ("a += a * b + c") and a masked result add. This shouldn't be worse from a vector spec point of view, so I just changed the test expectation for now. The attached v4 also includes Richi's suggestion for the HONOR... stuff. Bootstrap and regtest unchanged on aarch64, x86 and power10. Regards Robin >From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Wed, 13 Sep 2023 22:19:35 +0200 Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar reduction. As described in PR111401 we currently emit a COND and a PLUS expression for conditional reductions. This makes it difficult to combine both into a masked reduction statement later. This patch improves that by directly emitting a COND_ADD during ifcvt and adjusting some vectorizer code to handle it. It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is true. gcc/ChangeLog: PR middle-end/111401 * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD if supported. (predicate_scalar_phi): Add whitespace. * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. (neutral_op_for_reduction): Return -0 for PLUS. (vect_is_simple_reduction): Don't count else operand in COND_ADD. (vect_create_epilog_for_reduction): Fix whitespace. (vectorize_fold_left_reduction): Add COND_ADD handling. (vectorizable_reduction): Don't count else operand in COND_ADD. (vect_transform_reduction): Add COND_ADD handling. * tree-vectorizer.h (neutral_op_for_reduction): Add default parameter. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto. --- .../vect-cond-reduc-in-order-2-signed-zero.c | 141 .../riscv/rvv/autovec/cond/pr111401.c | 139 .../riscv/rvv/autovec/reduc/reduc_call-2.c| 4 +- .../riscv/rvv/autovec/reduc/reduc_call-4.c| 4 +- gcc/tree-if-conv.cc | 49 -- gcc/tree-vect-loop.cc | 156 ++ gcc/tree-vectorizer.h | 2 +- 7 files changed, 446 insertions(+), 49 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c new file mode 100644 index 000..7b46e7d8a2a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c @@ -0,0 +1,141 @@ +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ + +#include "tree-vect.h" + +#include + +#define N (VECTOR_BITS * 17) + +double __attribute__ ((noinline, noclone)) +reduc_plus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res += a[i]; + return res; +} + +double __attribute__ ((noinline, noclone, optimize ("0"))) +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res += a[i]; + return res; +} + +double __attribute__ ((noinline, noclone)) +reduc_minus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res -= a[i]; + return res; +} + +double __attribute__ ((noinline, noclone, optimize ("0"))) +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n) +{ + double res = init; +
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Robin Dapp writes: >> It'd be good to expand on this comment a bit. What kind of COND are you >> anticipating? A COND with the neutral op as the else value, so that the >> PLUS_EXPR (or whatever) can remain unconditional? If so, it would be >> good to sketch briefly how that happens, and why it's better than using >> the conditional PLUS_EXPR. >> >> If that's the reason, perhaps we want a single-use check as well. >> It's possible that OP1 is used elsewhere in the loop body, in a >> context that would prefer a different else value. > > Would something like the following on top work? > > - /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and > - a PLUS_EXPR. Don't do this if the reduction def operand itself is > + /* If possible create a COND_OP instead of a COND_EXPR and an OP_EXPR. > + The COND_OP will have a neutral_op else value. > + > + This allows re-using the mask directly in a masked reduction instead > + of creating a vector merge (or similar) and then an unmasked reduction. > + > + Don't do this if the reduction def operand itself is > a vectorizable call as we can create a COND version of it directly. */ It wasn't very clear, sorry, but it was the last sentence I was asking for clarification on, not the other bits. Why do we want to avoid generating a COND_ADD when the operand is a vectorisable call? Thanks, Richard > >if (ifn != IFN_LAST >&& vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs)) > - && try_cond_op && !swap) > + && use_cond_op && !swap && has_single_use (op1)) > > Regards > Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Mon, 9 Oct 2023, Robin Dapp wrote: > > Hmm, the function is called at transform time so this shouldn't help > > avoiding the ICE. I expected we refuse to vectorize _any_ reduction > > when sign dependent rounding is in effect? OTOH maybe sign-dependent > > rounding is OK but only when we use a unconditional fold-left > > (so a loop mask from fully masking is OK but not an original COND_ADD?). > > So we currently only disable the use of partial vectors > > else if (reduction_type == FOLD_LEFT_REDUCTION > && reduc_fn == IFN_LAST aarch64 probably chokes because reduc_fn is not IFN_LAST. > && FLOAT_TYPE_P (vectype_in) > && HONOR_SIGNED_ZEROS (vectype_in) so with your change we'd support signed zeros correctly. > && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >"can't operate on partial vectors because" >" signed zeros cannot be preserved.\n"); > LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; > > which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block. > > For the fully masked case we continue (and then fail the assertion > on aarch64 at transform time). > > I didn't get why that case is ok, though? We still merge the initial > definition with the identity/neutral op (i.e. possibly -0.0) based on > the loop mask. Is that different to partial masking? I think the main point with my earlier change is that without native support for a fold-left reduction (like on x86) we get ops = mask ? ops : neutral; acc += ops[0]; acc += ops[1]; ... so we wouldn't use a COND_ADD but add neutral elements for masked elements. That's OK for signed zeros after your change (great) but not OK for sign dependent rounding (because we can't decide on the sign of the neutral zero then). For the case of using an internal function, thus direct target support, it should be OK to have sign-dependent rounding if we can use the masked-fold-left reduction op. As we do /* On the first iteration the input is simply the scalar phi result, and for subsequent iterations it is the output of the preceding operation. */ if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST)) { if (mask && len && mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS) new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var, def0, mask, len, bias); else if (mask && mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS) new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var, def0, mask); else new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var, def0); the last case should be able to assert that !HONOR_SIGN_DEPENDENT_ROUNDING (also the reduc_fn == IFN_LAST case). The quoted condition above should change to drop the HONOR_SIGNED_ZEROS condition and the reduc_fn == IFN_LAST should change, maybe to internal_fn_mask_index (reduc_fn) == -1? Richard.
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> Hmm, the function is called at transform time so this shouldn't help > avoiding the ICE. I expected we refuse to vectorize _any_ reduction > when sign dependent rounding is in effect? OTOH maybe sign-dependent > rounding is OK but only when we use a unconditional fold-left > (so a loop mask from fully masking is OK but not an original COND_ADD?). So we currently only disable the use of partial vectors else if (reduction_type == FOLD_LEFT_REDUCTION && reduc_fn == IFN_LAST && FLOAT_TYPE_P (vectype_in) && HONOR_SIGNED_ZEROS (vectype_in) && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "can't operate on partial vectors because" " signed zeros cannot be preserved.\n"); LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block. For the fully masked case we continue (and then fail the assertion on aarch64 at transform time). I didn't get why that case is ok, though? We still merge the initial definition with the identity/neutral op (i.e. possibly -0.0) based on the loop mask. Is that different to partial masking? Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> It'd be good to expand on this comment a bit. What kind of COND are you > anticipating? A COND with the neutral op as the else value, so that the > PLUS_EXPR (or whatever) can remain unconditional? If so, it would be > good to sketch briefly how that happens, and why it's better than using > the conditional PLUS_EXPR. > > If that's the reason, perhaps we want a single-use check as well. > It's possible that OP1 is used elsewhere in the loop body, in a > context that would prefer a different else value. Would something like the following on top work? - /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and - a PLUS_EXPR. Don't do this if the reduction def operand itself is + /* If possible create a COND_OP instead of a COND_EXPR and an OP_EXPR. + The COND_OP will have a neutral_op else value. + + This allows re-using the mask directly in a masked reduction instead + of creating a vector merge (or similar) and then an unmasked reduction. + + Don't do this if the reduction def operand itself is a vectorizable call as we can create a COND version of it directly. */ if (ifn != IFN_LAST && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs)) - && try_cond_op && !swap) + && use_cond_op && !swap && has_single_use (op1)) Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Fri, 6 Oct 2023, Robin Dapp wrote: > > So if you think you got everything correct the patch is OK as-is, > > I just wasn't sure - maybe the neutral_element change deserves > > a comment as to how MINUS_EXPR is handled. > > Heh, I never think I got everything correct ;) > > Added this now: > > static bool > fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn) > { > + /* We support MINUS_EXPR by negating the operand. This also preserves an > + initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 + > + (-0.0) = -0.0. */ > > What I still found is that aarch64 ICEs at the assertion you added > with -frounding-math. Therefore I changed it to: > > - gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)); > + if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"cannot vectorize fold-left reduction > because" > +" signed zeros cannot be preserved.\n"); > + return false; > + } > > No code changes apart from that. Will leave it until Monday and push then > barring any objections. Hmm, the function is called at transform time so this shouldn't help avoiding the ICE. I expected we refuse to vectorize _any_ reduction when sign dependent rounding is in effect? OTOH maybe sign-dependent rounding is OK but only when we use a unconditional fold-left (so a loop mask from fully masking is OK but not an original COND_ADD?). Still the check should be done in vectorizable_reduction, not only during transform (there the assert is proper, if we can distinguish the loop mask vs. the COND_ADD here, otherwise just remove it). Richard. > Thanks for the pointers. > > Regards > Robin > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Robin Dapp writes: > Hi Tamar, > >> The only comment I have is whether you actually need this helper >> function? It looks like all the uses of it are in cases you have, or >> will call conditional_internal_fn_code directly. > removed the cond_fn_p entirely in the attached v3. > > Bootstrapped and regtested on x86_64, aarch64 and power10. > > Regards > Robin > > Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > gcc/ChangeLog: > > PR middle-end/111401 > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. The patch LGTM too FWIW, except... > --- > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 > .../riscv/rvv/autovec/cond/pr111401.c | 139 > gcc/tree-if-conv.cc | 63 ++-- > gcc/tree-vect-loop.cc | 150 ++ > gcc/tree-vectorizer.h | 2 +- > 5 files changed, 451 insertions(+), 44 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git > a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..7b46e7d8a2a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-add-options ieee } */ > +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ > + > +#include "tree-vect.h" > + > +#include > + > +#define N (VECTOR_BITS * 17) > + > +double __attribute__ ((noinline, noclone)) > +reduc_plus_double (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone)) > +reduc_minus_double (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +int __attribute__ ((optimize (1))) > +main () > +{ > + int n = 19; > + double a[N]; > + int cond1[N], cond2[N]; > + > + for (int i = 0; i < N; i++) > +{ > + a[i] = (i * 0.1) * (i & 1 ? 1 : -1); > + cond1[i] = 0; > + cond2[i] = i & 4 ? 1 : 0; > + asm volatile ("" ::: "memory"); > +} > + > + double res1 = reduc_plus_double (a, -0.0, cond1, n); > + double ref1 = reduc_plus_double_ref (a, -
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> So if you think you got everything correct the patch is OK as-is, > I just wasn't sure - maybe the neutral_element change deserves > a comment as to how MINUS_EXPR is handled. Heh, I never think I got everything correct ;) Added this now: static bool fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn) { + /* We support MINUS_EXPR by negating the operand. This also preserves an + initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 + + (-0.0) = -0.0. */ What I still found is that aarch64 ICEs at the assertion you added with -frounding-math. Therefore I changed it to: - gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)); + if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"cannot vectorize fold-left reduction because" +" signed zeros cannot be preserved.\n"); + return false; + } No code changes apart from that. Will leave it until Monday and push then barring any objections. Thanks for the pointers. Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Fri, 6 Oct 2023, Robin Dapp wrote: > > We might need a similar assert > > > > gcc_assert (HONOR_SIGNED_ZEROS (vectype_out) > > && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));? > > erm, obviously not that exact assert but more something like > > if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...) > { > if (dump) > ... > return false; > } > > or so. Yeah, of course the whole point of a fold-left reduction is to _not_ give up without -ffast-math which is why I added the above. I obviously didn't fully verify what happens for an original MINUS_EXPR. I think it's required to give up for -frounding-math, but I think I might have put the code to do that in a generic enough place. For x86 you need --param vect-partial-vector-usage=2 and an AVX512 enabled arch like -march=skylake-avx512 or -march=znver4. I think tranforming - x to + (-x) works for signed zeros. So if you think you got everything correct the patch is OK as-is, I just wasn't sure - maybe the neutral_element change deserves a comment as to how MINUS_EXPR is handled. Richard.
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> We might need a similar assert > > gcc_assert (HONOR_SIGNED_ZEROS (vectype_out) > && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));? erm, obviously not that exact assert but more something like if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...) { if (dump) ... return false; } or so. Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> ... here we probably get PLUS_EXPR for MINUS_EXPR above but IIRC > for MINUS_EXPR the !as_initial case should return positive zero. > > Can you double-check? You're referring to the canonicalization from a - CST to a + -CST so that the neutral op would need to change with it? Argh, good point. >From what I can tell the only difference for MINUS_EXPR is that we negate the reduction operand and then just continue as if it were a PLUS_EXPR (which is the right thing to do also for +-0.0?). At least I didn't observe a canonicalization and we don't call neutral_op_for_reduction in between. What we do have, though, is for the fully-masked case (you added that recently): if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) { vector_identity = build_zero_cst (vectype_out); if (!HONOR_SIGNED_ZEROS (vectype_out)) ; else { gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)); vector_identity = const_unop (NEGATE_EXPR, vectype_out, vector_identity); } } So for /* Handle MINUS by adding the negative. */ if (reduc_fn != IFN_LAST && code == MINUS_EXPR) { tree negated = make_ssa_name (vectype_out); We might need a similar assert gcc_assert (HONOR_SIGNED_ZEROS (vectype_out) && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));? Apart from that the only call with !as_inital is in vect_create_epilog_for_reduction. I just instrumented it with an assert (false) but i386.exp doesn't trigger it at all. Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Thu, 5 Oct 2023, Robin Dapp wrote: > Hi Tamar, > > > The only comment I have is whether you actually need this helper > > function? It looks like all the uses of it are in cases you have, or > > will call conditional_internal_fn_code directly. > removed the cond_fn_p entirely in the attached v3. > > Bootstrapped and regtested on x86_64, aarch64 and power10. Looks good - I only have one question, see below ... > Regards > Robin > > Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > gcc/ChangeLog: > > PR middle-end/111401 > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > --- > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 > .../riscv/rvv/autovec/cond/pr111401.c | 139 > gcc/tree-if-conv.cc | 63 ++-- > gcc/tree-vect-loop.cc | 150 ++ > gcc/tree-vectorizer.h | 2 +- > 5 files changed, 451 insertions(+), 44 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git > a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..7b46e7d8a2a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-add-options ieee } */ > +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ > + > +#include "tree-vect.h" > + > +#include > + > +#define N (VECTOR_BITS * 17) > + > +double __attribute__ ((noinline, noclone)) > +reduc_plus_double (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone)) > +reduc_minus_double (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n) > +{ > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +int __attribute__ ((optimize (1))) > +main () > +{ > + int n = 19; > + double a[N]; > + int cond1[N], cond2[N]; > + > + for (int i = 0; i < N; i++) > +{ > + a[i] = (i * 0.1) * (i & 1 ? 1 : -1); > + cond1[i] = 0; > + cond2[i] = i & 4 ? 1 : 0; > + asm volatile ("" ::: "memory"); > +} > + > + double res1 = reduc_plus_double (a, -0.0, cond1, n); &
RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi Robin, > -Original Message- > From: Robin Dapp > Sent: Thursday, October 5, 2023 3:06 PM > To: Tamar Christina ; gcc-patches patc...@gcc.gnu.org>; Richard Biener > Cc: rdapp@gmail.com > Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > Hi Tamar, > > > The only comment I have is whether you actually need this helper > > function? It looks like all the uses of it are in cases you have, or > > will call conditional_internal_fn_code directly. > removed the cond_fn_p entirely in the attached v3. > > Bootstrapped and regtested on x86_64, aarch64 and power10. > Changes look good to me thanks! I'll leave it up to Richi for final approval. Regards, Tamar > Regards > Robin > > Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both into a > masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is > true. > > gcc/ChangeLog: > > PR middle-end/111401 > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vect_create_epilog_for_reduction): Fix whitespace. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > --- > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 > .../riscv/rvv/autovec/cond/pr111401.c | 139 > gcc/tree-if-conv.cc | 63 ++-- > gcc/tree-vect-loop.cc | 150 ++ > gcc/tree-vectorizer.h | 2 +- > 5 files changed, 451 insertions(+), 44 deletions(-) create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed- > zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..7b46e7d8a2a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-add-options ieee } */ > +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ > + > +#include "tree-vect.h" > + > +#include > + > +#define N (VECTOR_BITS * 17) > + > +double __attribute__ ((noinline, noclone)) reduc_plus_double (double > +*restrict a, double init, int *cond, int n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_plus_double_ref (double *restrict a, double init, int *cond, int > +n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone)) reduc_minus_double (double > +*restrict a, double init, int *cond, int n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_minus_double_ref (double *restrict a, double init, int *cond, int > +n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +int __attribute__ ((optimize (1))) > +main () > +{ > + int n = 19; > + double a[N]; > + int cond1[N], cond2[N]; > + > +
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi Tamar, > The only comment I have is whether you actually need this helper > function? It looks like all the uses of it are in cases you have, or > will call conditional_internal_fn_code directly. removed the cond_fn_p entirely in the attached v3. Bootstrapped and regtested on x86_64, aarch64 and power10. Regards Robin Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar reduction. As described in PR111401 we currently emit a COND and a PLUS expression for conditional reductions. This makes it difficult to combine both into a masked reduction statement later. This patch improves that by directly emitting a COND_ADD during ifcvt and adjusting some vectorizer code to handle it. It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is true. gcc/ChangeLog: PR middle-end/111401 * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD if supported. (predicate_scalar_phi): Add whitespace. * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. (neutral_op_for_reduction): Return -0 for PLUS. (vect_is_simple_reduction): Don't count else operand in COND_ADD. (vect_create_epilog_for_reduction): Fix whitespace. (vectorize_fold_left_reduction): Add COND_ADD handling. (vectorizable_reduction): Don't count else operand in COND_ADD. (vect_transform_reduction): Add COND_ADD handling. * tree-vectorizer.h (neutral_op_for_reduction): Add default parameter. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. --- .../vect-cond-reduc-in-order-2-signed-zero.c | 141 .../riscv/rvv/autovec/cond/pr111401.c | 139 gcc/tree-if-conv.cc | 63 ++-- gcc/tree-vect-loop.cc | 150 ++ gcc/tree-vectorizer.h | 2 +- 5 files changed, 451 insertions(+), 44 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c new file mode 100644 index 000..7b46e7d8a2a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c @@ -0,0 +1,141 @@ +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ + +#include "tree-vect.h" + +#include + +#define N (VECTOR_BITS * 17) + +double __attribute__ ((noinline, noclone)) +reduc_plus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res += a[i]; + return res; +} + +double __attribute__ ((noinline, noclone, optimize ("0"))) +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res += a[i]; + return res; +} + +double __attribute__ ((noinline, noclone)) +reduc_minus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res -= a[i]; + return res; +} + +double __attribute__ ((noinline, noclone, optimize ("0"))) +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i < n; i++) +if (cond[i]) + res -= a[i]; + return res; +} + +int __attribute__ ((optimize (1))) +main () +{ + int n = 19; + double a[N]; + int cond1[N], cond2[N]; + + for (int i = 0; i < N; i++) +{ + a[i] = (i * 0.1) * (i & 1 ? 1 : -1); + cond1[i] = 0; + cond2[i] = i & 4 ? 1 : 0; + asm volatile ("" ::: "memory"); +} + + double res1 = reduc_plus_double (a, -0.0, cond1, n); + double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n); + double res2 = reduc_minus_double (a, -0.0, cond1, n); + double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n); + double res3 = reduc_plus_double (a, -0.0, cond1, n); + double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n); + double res4 = reduc_minus_double (a, -0.0, cond1, n); + double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n); + + if (res1 != ref1 || signbit (res1) != signbit (ref1)) +__builtin_abort (); + if (res2 != ref2 || signbit (res2) != signbit (ref2)) +__builtin_abort (); + if (res3 != ref3 || signbit (res3) != signbit (ref3)) +__builtin_abort (); + if (res4 != ref4 || signbit (res4) != signbi
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Ah, sorry, read your remark incorrectly. Will try again. Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi Tamar, > So in the > > if (slp_node) > { > > Add something like: > > If (is_cond_op) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >"left fold reduction on SLP not supported.\n"); > return false; > } Yes, seems reasonable, added. > The only comment I have is whether you actually need this helper function? > It looks like all the uses of it are in cases you have, or will call > conditional_internal_fn_code > directly. > > e.g. in vect_transform_reduction you can replace it by > > bool cond_fn_p = cond_fn != ERROR_MARK; > > and in > > if (cond_fn_p (orig_code)) > orig_code = conditional_internal_fn_code (internal_fn(orig_code)); > > just > > internal_fn new_fn = conditional_internal_fn_code (internal_fn(orig_code)); > if (new_fn != ERROR_MARK) > orig_code = new_fn; > > which would save the repeated testing of the condition. I see what you mean. One complication is that we want to disambiguate (among others): (1) code = IFN_COND_ADD, cond_fn = IFN_LAST. (new case) (2) code = IFN_MAX, cond_fn = IFN_COND_MAX. (3) code = IFN_SOMETHING, cond_fn = IFN_LAST. So just checking cond_fn is not enough (even if we made get_conditional_internal_fn (IFN_COND_ADD) return IFN_COND_ADD). We need to know if the initial code already was an IFN_COND. It's a bit of a mess but I didn't dare untangling. Well, actually, I tried but made it worse ;) The cond_fn_p check seemed least intrusive to me. Maybe you have another idea? Regards Robin
RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi Robin, > -Original Message- > From: Robin Dapp > Sent: Wednesday, October 4, 2023 8:54 AM > To: Tamar Christina ; gcc-patches patc...@gcc.gnu.org>; Richard Biener > Cc: rdapp@gmail.com > Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > Hi Tamar, > > > I can't approve but hope you don't mind the review, > > Not at all, greatly appreciated. > > I incorporated all your remarks apart from this: > > > Isn't vec_opmask NULL for SLP? You probably need to read it from > > vec_defs for the COND_EXPR > > Above that I gcc_assert (!slp_node) for the IFN_COND case. It doesn't seem to > be hit during testsuite runs. I also didn't manage to create an example that > would trigger it. When "conditionalizing" an SLP fold-left reduction we don't > seem to vectorize for a different reason. > Granted, I didn't look very closely at that reason :) Yeah it looks like it's failing because it can't handle the PHI node reduction for the condition. So that's fine, I think then we should exit from vectorize_fold_left_reduction in that case so we avoid the segfault when we start forcing things through SLP only soon and add single lane SLP support. So in the if (slp_node) { Add something like: If (is_cond_op) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "left fold reduction on SLP not supported.\n"); return false; } > > Bootstrap and testsuite are currently running with the attached v2 on x86, > aarch64 and powerpc. > > Besides, when thinking about which COND_OPs we expect I tried to loosen the > restrictions in if-conv by allowing MAX_EXPR and MIN_EXPR. The emitted > code on riscv looks correct but I hit a bootstrap ICE on x86 so omitted it for > now. > > Regards > Robin > > > Subject: [PATCH v2] ifcvt/vect: Emit COND_ADD for conditional scalar > reduction. > > As described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both into a > masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is > true. > > gcc/ChangeLog: > > PR middle-end/111401 > * internal-fn.cc (cond_fn_p): New function. > * internal-fn.h (cond_fn_p): Define. > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > --- > gcc/internal-fn.cc| 17 +++ > gcc/internal-fn.h | 1 + > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 ++ > .../riscv/rvv/autovec/cond/pr111401.c | 139 + > gcc/tree-if-conv.cc | 63 ++-- > gcc/tree-vect-loop.cc | 129 > gcc/tree-vectorizer.h | 2 +- > 7 files changed, 450 insertions(+), 42 deletions(-) create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index > 61d5a9e4772..9b38dc0cef4 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4245,6 +4245,23 @@ first_commutative_argument (internal_fn fn) > } > } > > +/* Return true if this CODE describes a conditional (masked) > +internal_fn. */ > + > +bool > +cond_fn_p (code_helper code) > +{ > + if (!code.is_fn_code ()) > +return false; > + > + if (!internal_fn_p ((combined_fn) code)) > +return false; > + > + internal_fn fn = as_internal_fn ((combined_fn) code); > + > + return conditional_internal_fn_code (fn) != ERROR_MARK; } > + > + > /* Return true if
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> + gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB); I forgot to add the other IFN_CONDs here before sending. So with - gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB); + gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB + || code == IFN_COND_MUL || code == IFN_COND_AND + || code == IFN_COND_IOR || code == IFN_COND_XOR); on top, bootstrap and testsuites on x86, aarch64 and power10 are unchanged. Regards Robin
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi Tamar, > I can't approve but hope you don't mind the review, Not at all, greatly appreciated. I incorporated all your remarks apart from this: > Isn't vec_opmask NULL for SLP? You probably need to read it from > vec_defs for the COND_EXPR Above that I gcc_assert (!slp_node) for the IFN_COND case. It doesn't seem to be hit during testsuite runs. I also didn't manage to create an example that would trigger it. When "conditionalizing" an SLP fold-left reduction we don't seem to vectorize for a different reason. Granted, I didn't look very closely at that reason :) Bootstrap and testsuite are currently running with the attached v2 on x86, aarch64 and powerpc. Besides, when thinking about which COND_OPs we expect I tried to loosen the restrictions in if-conv by allowing MAX_EXPR and MIN_EXPR. The emitted code on riscv looks correct but I hit a bootstrap ICE on x86 so omitted it for now. Regards Robin Subject: [PATCH v2] ifcvt/vect: Emit COND_ADD for conditional scalar reduction. As described in PR111401 we currently emit a COND and a PLUS expression for conditional reductions. This makes it difficult to combine both into a masked reduction statement later. This patch improves that by directly emitting a COND_ADD during ifcvt and adjusting some vectorizer code to handle it. It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is true. gcc/ChangeLog: PR middle-end/111401 * internal-fn.cc (cond_fn_p): New function. * internal-fn.h (cond_fn_p): Define. * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD if supported. (predicate_scalar_phi): Add whitespace. * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. (neutral_op_for_reduction): Return -0 for PLUS. (vect_is_simple_reduction): Don't count else operand in COND_ADD. (vectorize_fold_left_reduction): Add COND_ADD handling. (vectorizable_reduction): Don't count else operand in COND_ADD. (vect_transform_reduction): Add COND_ADD handling. * tree-vectorizer.h (neutral_op_for_reduction): Add default parameter. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. --- gcc/internal-fn.cc| 17 +++ gcc/internal-fn.h | 1 + .../vect-cond-reduc-in-order-2-signed-zero.c | 141 ++ .../riscv/rvv/autovec/cond/pr111401.c | 139 + gcc/tree-if-conv.cc | 63 ++-- gcc/tree-vect-loop.cc | 129 gcc/tree-vectorizer.h | 2 +- 7 files changed, 450 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 61d5a9e4772..9b38dc0cef4 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4245,6 +4245,23 @@ first_commutative_argument (internal_fn fn) } } +/* Return true if this CODE describes a conditional (masked) internal_fn. */ + +bool +cond_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) +return false; + + if (!internal_fn_p ((combined_fn) code)) +return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + + return conditional_internal_fn_code (fn) != ERROR_MARK; +} + + /* Return true if this CODE describes an internal_fn that returns a vector with elements twice as wide as the element size of the input vectors. */ diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 99de13a0199..f1cc9db29c0 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn); extern int first_commutative_argument (internal_fn); extern bool associative_binary_fn_p (internal_fn); extern bool widening_fn_p (code_helper); +extern bool cond_fn_p (code_helper code); extern bool set_edom_supported_p (void); diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c new file mode 100644 index 000..7b46e7d8a2a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c @@ -0,0 +1,141 @@ +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */ + +#include "tree-vect.h" + +#include + +#define N (VECTOR_BITS * 17) + +double __attribute__ ((noinline, noclone)) +reduc_plus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for
Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Wed, 20 Sep 2023, Robin Dapp wrote: > Hi, > > as described in PR111401 we currently emit a COND and a PLUS expression > for conditional reductions. This makes it difficult to combine both > into a masked reduction statement later. > This patch improves that by directly emitting a COND_ADD during ifcvt and > adjusting some vectorizer code to handle it. > > It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS > is true. > > Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn > but have code to handle MINUS_EXPR in vectorize_fold_left_reduction. I > suppose that's intentional but it "just works" on riscv and the testsuite > doesn't change when allowing MINUS_EXPR so I went ahead and did that. > > Bootstrapped and regtested on x86 and aarch64. I think overall the patch is fine - please address Tamars comments though, those look valid. Thanks, Richard. > Regards > Robin > > gcc/ChangeLog: > > PR middle-end/111401 > * internal-fn.cc (cond_fn_p): New function. > * internal-fn.h (cond_fn_p): Define. > * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD > if supported. > (predicate_scalar_phi): Add whitespace. > * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. > (neutral_op_for_reduction): Return -0 for PLUS. > (vect_is_simple_reduction): Don't count else operand in > COND_ADD. > (vectorize_fold_left_reduction): Add COND_ADD handling. > (vectorizable_reduction): Don't count else operand in COND_ADD. > (vect_transform_reduction): Add COND_ADD handling. > * tree-vectorizer.h (neutral_op_for_reduction): Add default > parameter. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. > * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. > --- > gcc/internal-fn.cc| 38 + > gcc/internal-fn.h | 1 + > .../vect-cond-reduc-in-order-2-signed-zero.c | 141 ++ > .../riscv/rvv/autovec/cond/pr111401.c | 61 > gcc/tree-if-conv.cc | 63 ++-- > gcc/tree-vect-loop.cc | 130 > gcc/tree-vectorizer.h | 2 +- > 7 files changed, 394 insertions(+), 42 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 0fd34359247..77939890f5a 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn) > } > } > > +/* Return true if this CODE describes a conditional (masked) internal_fn. */ > + > +bool > +cond_fn_p (code_helper code) > +{ > + if (!code.is_fn_code ()) > +return false; > + > + if (!internal_fn_p ((combined_fn) code)) > +return false; > + > + internal_fn fn = as_internal_fn ((combined_fn) code); > + switch (fn) > +{ > +#undef DEF_INTERNAL_COND_FN > +#define DEF_INTERNAL_COND_FN(NAME, F, O, T)\ > +case IFN_COND_##NAME: \ > +case IFN_COND_LEN_##NAME: \ > + return true; > +#include "internal-fn.def" > +#undef DEF_INTERNAL_COND_FN > + > +#undef DEF_INTERNAL_SIGNED_COND_FN > +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T) \ > +case IFN_COND_##NAME: \ > +case IFN_COND_LEN_##NAME: \ > + return true; > +#include "internal-fn.def" > +#undef DEF_INTERNAL_SIGNED_COND_FN > + > +default: > + return false; > +} > + > + return false; > +} > + > + > /* Return true if this CODE describes an internal_fn that returns a vector > with > elements twice as wide as the element size of the input vectors. */ > > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h > index 99de13a0199..f1cc9db29c0 100644 > --- a/gcc/internal-fn.h > +++ b/gcc/internal-fn.h > @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn); > extern int first_commutative_argument (internal_fn); > extern bool associative_binary_fn_p (internal_fn); > extern bool widening_fn_p (code_helper); > +extern bool cond_fn_p (code_helper code); > > extern bool set_edom_supported_p (void); > > diff --git > a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..57c600838ee > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* {
RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi, I can't approve but hope you don't mind the review, > +/* Return true if this CODE describes a conditional (masked) > +internal_fn. */ > + > +bool > +cond_fn_p (code_helper code) > +{ > + if (!code.is_fn_code ()) > +return false; > + > + if (!internal_fn_p ((combined_fn) code)) > +return false; > + > + internal_fn fn = as_internal_fn ((combined_fn) code); switch (fn) > +{ > +#undef DEF_INTERNAL_COND_FN > +#define DEF_INTERNAL_COND_FN(NAME, F, O, T)\ > +case IFN_COND_##NAME: \ > +case IFN_COND_LEN_##NAME: \ > + return true; > +#include "internal-fn.def" > +#undef DEF_INTERNAL_COND_FN > + > +#undef DEF_INTERNAL_SIGNED_COND_FN > +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T) \ > +case IFN_COND_##NAME: \ > +case IFN_COND_LEN_##NAME: \ > + return true; > +#include "internal-fn.def" > +#undef DEF_INTERNAL_SIGNED_COND_FN > + > +default: > + return false; > +} > + > + return false; > +} > + > + Could you not use conditional_internal_fn_code for this? Just check result is not ERROR_MARK? > /* Return true if this CODE describes an internal_fn that returns a vector > with > elements twice as wide as the element size of the input vectors. */ > > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index > 99de13a0199..f1cc9db29c0 100644 > --- a/gcc/internal-fn.h > +++ b/gcc/internal-fn.h > @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn); > extern int first_commutative_argument (internal_fn); extern bool > associative_binary_fn_p (internal_fn); extern bool widening_fn_p > (code_helper); > +extern bool cond_fn_p (code_helper code); > > extern bool set_edom_supported_p (void); > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed- > zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > new file mode 100644 > index 000..57c600838ee > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > @@ -0,0 +1,141 @@ > +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-add-options ieee } */ > +/* { dg-additional-options "-std=c99 -fno-fast-math" } */ > + > +#include "tree-vect.h" > + > +#include > + > +#define N (VECTOR_BITS * 17) > + > +double __attribute__ ((noinline, noclone)) reduc_plus_double (double > +*restrict a, double init, int *cond, int n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_plus_double_ref (double *restrict a, double init, int *cond, int > +n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res += a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone)) reduc_minus_double (double > +*restrict a, double init, int *cond, int n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +double __attribute__ ((noinline, noclone, optimize ("0"))) > +reduc_minus_double_ref (double *restrict a, double init, int *cond, int > +n) { > + double res = init; > + for (int i = 0; i < n; i++) > +if (cond[i]) > + res -= a[i]; > + return res; > +} > + > +int __attribute__ ((optimize (1))) > +main () > +{ > + int n = 19; > + double a[N]; > + int cond1[N], cond2[N]; > + > + for (int i = 0; i < N; i++) > +{ > + a[i] = (i * 0.1) * (i & 1 ? 1 : -1); > + cond1[i] = 0; > + cond2[i] = i & 4 ? 1 : 0; > + asm volatile ("" ::: "memory"); > +} > + > + double res1 = reduc_plus_double (a, -0.0, cond1, n); double ref1 = > + reduc_plus_double_ref (a, -0.0, cond1, n); double res2 = > + reduc_minus_double (a, -0.0, cond1, n); double ref2 = > + reduc_minus_double_ref (a, -0.0, cond1, n); double res3 = > + reduc_plus_double (a, -0.0, cond1, n); double ref3 = > + reduc_plus_double_ref (a, -0.0, cond1, n); double res4 = > + reduc_minus_double (a, -0.0, cond1, n); double ref4 = > + reduc_minus_double_ref (a, -0.0, cond1, n); > + > + if (res1 != ref1 || signbit (res1) != signbit (ref1)) > +__builtin_abort (); > + if (res2 != ref2 || signbit (res2) != signbit (ref2)) > +__builtin_abort (); > + if (res3 != ref3 || signbit (res3) != signbit (ref3)) > +__builtin_abort (); > + if (res4 != ref4 || signbit (res4) != signbit (ref4)) > +__builtin_abort (); > + > + res1 = reduc_plus_double (a, 0.0, cond1, n); > + ref1 = reduc_plus_double_ref (a, 0.0, cond1, n); > + res2 = reduc_minus_double (a, 0.0, cond1, n); > + ref2 = reduc_minus_double_ref (a, 0.0, cond1, n); > + res3 =
[PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Hi, as described in PR111401 we currently emit a COND and a PLUS expression for conditional reductions. This makes it difficult to combine both into a masked reduction statement later. This patch improves that by directly emitting a COND_ADD during ifcvt and adjusting some vectorizer code to handle it. It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is true. Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn but have code to handle MINUS_EXPR in vectorize_fold_left_reduction. I suppose that's intentional but it "just works" on riscv and the testsuite doesn't change when allowing MINUS_EXPR so I went ahead and did that. Bootstrapped and regtested on x86 and aarch64. Regards Robin gcc/ChangeLog: PR middle-end/111401 * internal-fn.cc (cond_fn_p): New function. * internal-fn.h (cond_fn_p): Define. * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD if supported. (predicate_scalar_phi): Add whitespace. * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD. (neutral_op_for_reduction): Return -0 for PLUS. (vect_is_simple_reduction): Don't count else operand in COND_ADD. (vectorize_fold_left_reduction): Add COND_ADD handling. (vectorizable_reduction): Don't count else operand in COND_ADD. (vect_transform_reduction): Add COND_ADD handling. * tree-vectorizer.h (neutral_op_for_reduction): Add default parameter. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test. --- gcc/internal-fn.cc| 38 + gcc/internal-fn.h | 1 + .../vect-cond-reduc-in-order-2-signed-zero.c | 141 ++ .../riscv/rvv/autovec/cond/pr111401.c | 61 gcc/tree-if-conv.cc | 63 ++-- gcc/tree-vect-loop.cc | 130 gcc/tree-vectorizer.h | 2 +- 7 files changed, 394 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 0fd34359247..77939890f5a 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn) } } +/* Return true if this CODE describes a conditional (masked) internal_fn. */ + +bool +cond_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) +return false; + + if (!internal_fn_p ((combined_fn) code)) +return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) +{ +#undef DEF_INTERNAL_COND_FN +#define DEF_INTERNAL_COND_FN(NAME, F, O, T) \ +case IFN_COND_##NAME:\ +case IFN_COND_LEN_##NAME:\ + return true; +#include "internal-fn.def" +#undef DEF_INTERNAL_COND_FN + +#undef DEF_INTERNAL_SIGNED_COND_FN +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T) \ +case IFN_COND_##NAME:\ +case IFN_COND_LEN_##NAME:\ + return true; +#include "internal-fn.def" +#undef DEF_INTERNAL_SIGNED_COND_FN + +default: + return false; +} + + return false; +} + + /* Return true if this CODE describes an internal_fn that returns a vector with elements twice as wide as the element size of the input vectors. */ diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 99de13a0199..f1cc9db29c0 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn); extern int first_commutative_argument (internal_fn); extern bool associative_binary_fn_p (internal_fn); extern bool widening_fn_p (code_helper); +extern bool cond_fn_p (code_helper code); extern bool set_edom_supported_p (void); diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c new file mode 100644 index 000..57c600838ee --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c @@ -0,0 +1,141 @@ +/* Make sure a -0 stays -0 when we perform a conditional reduction. */ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-std=c99 -fno-fast-math" } */ + +#include "tree-vect.h" + +#include + +#define N (VECTOR_BITS * 17) + +double __attribute__ ((noinline, noclone)) +reduc_plus_double (double *restrict a, double init, int *cond, int n) +{ + double res = init; + for (int i = 0; i