Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-11-02 Thread Andrew Pinski
On Wed, Sep 20, 2023 at 6:52 AM Robin Dapp  wrote:
>
> Hi,
>
> as described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn
> but have code to handle MINUS_EXPR in vectorize_fold_left_reduction.  I
> suppose that's intentional but it "just works" on riscv and the testsuite
> doesn't change when allowing MINUS_EXPR so I went ahead and did that.
>
> Bootstrapped and regtested on x86 and aarch64.

This caused gcc.target/i386/avx512f-reduce-op-1.c testcase to start to
fail when testing on a x86_64 that has avx512f (In my case I am using
`Intel(R) Xeon(R) D-2166NT CPU @ 2.00GHz`).  I reverted the commit to
double check it too.

The difference in optimized I see is:
  if (_40 != 3.5e+1) // working
vs
  if (_40 != 6.4e+1) // not working

It is test_epi32_ps which is failing with TEST_PS macro and the plus
operand that uses TESTOP:
TESTOP (add, +, float, ps, 0.0f);   \

I have not reduced the testcase any further though.

Thanks,
Andrew Pinski


>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR middle-end/111401
> * internal-fn.cc (cond_fn_p): New function.
> * internal-fn.h (cond_fn_p): Define.
> * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
> if supported.
> (predicate_scalar_phi): Add whitespace.
> * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
> (neutral_op_for_reduction): Return -0 for PLUS.
> (vect_is_simple_reduction): Don't count else operand in
> COND_ADD.
> (vectorize_fold_left_reduction): Add COND_ADD handling.
> (vectorizable_reduction): Don't count else operand in COND_ADD.
> (vect_transform_reduction): Add COND_ADD handling.
> * tree-vectorizer.h (neutral_op_for_reduction): Add default
> parameter.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  gcc/internal-fn.cc|  38 +
>  gcc/internal-fn.h |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++
>  .../riscv/rvv/autovec/cond/pr111401.c |  61 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 130 
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 394 insertions(+), 42 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..77939890f5a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn)
>  }
>  }
>
> +/* Return true if this CODE describes a conditional (masked) internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +  switch (fn)
> +{
> +#undef DEF_INTERNAL_COND_FN
> +#define DEF_INTERNAL_COND_FN(NAME, F, O, T)  \
> +case IFN_COND_##NAME:\
> +case IFN_COND_LEN_##NAME:\
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_COND_FN
> +
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T)   \
> +case IFN_COND_##NAME:\
> +case IFN_COND_LEN_##NAME:\
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +
> +default:
> +  return false;
> +}
> +
> +  return false;
> +}
> +
> +
>  /* Return true if this CODE describes an internal_fn that returns a vector 
> with
> elements twice as wide as the element size of the input vectors.  */
>
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
>  extern int first_commutative_argument (internal_fn);
>  extern bool associative_binary_fn_p (internal_fn);
>  extern bool widening_fn_p (code_helper);
> 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-11-02 Thread Richard Biener
On Tue, 31 Oct 2023, Robin Dapp wrote:

> >> +int
> >> +internal_fn_else_index (internal_fn fn)
> > 
> > The function needs a comment, maybe:
> > 
> > /* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of 
> > the
> >argument that is used when the condition is false.  Return -1 otherwise. 
> >  */
> > 
> > OK for the internal-fn* and tree-if-conv.cc bits (which were the
> > parts I commented on earlier).  I'll look at cleaning up the
> > definition of conditional internal functions separately, so that
> > the list of functions isn't necessary.
> 
> Thank you, added the comment (shouldn't have forgotten it in the
> first place...).  So there's the vectorizer part left that is not
> yet OK'd.  

The vectorizer part is OK.

Richard.


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-31 Thread Robin Dapp
>> +int
>> +internal_fn_else_index (internal_fn fn)
> 
> The function needs a comment, maybe:
> 
> /* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of the
>argument that is used when the condition is false.  Return -1 otherwise.  
> */
> 
> OK for the internal-fn* and tree-if-conv.cc bits (which were the
> parts I commented on earlier).  I'll look at cleaning up the
> definition of conditional internal functions separately, so that
> the list of functions isn't necessary.

Thank you, added the comment (shouldn't have forgotten it in the
first place...).  So there's the vectorizer part left that is not
yet OK'd.  

Regards
 Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-31 Thread Richard Sandiford
Robin Dapp  writes:
> Changed as suggested.  The difference to v5 is thus:
>
> +   if (cond_fn_p)
> + {
> +   gcall *call = dyn_cast (use_stmt);
> +   unsigned else_pos
> + = internal_fn_else_index (internal_fn (op.code));
> +
> +   for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
> + {
> +   if (j == else_pos)
> + continue;
> +   if (gimple_call_arg (call, j) == op.ops[opi])
> + cnt++;
> + }
> + }
> +   else if (!is_gimple_debug (op_use_stmt)
>
> as well as internal_fn_else_index.
>
> Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done,
> aarch64 and x86 still running.
>
> Regards
>  Robin
>
> From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 13 Sep 2023 22:19:35 +0200
> Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction.
>
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD/COND_OP during
> ifcvt and adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> gcc/ChangeLog:
>
>   PR middle-end/111401
>   * internal-fn.cc (internal_fn_else_index): New function.
>   * internal-fn.h (internal_fn_else_index): Define.
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (check_reduction_path): Don't count else operand in COND_OP.
>   (vect_is_simple_reduction): Ditto.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_OP handling.
>   (vectorizable_reduction): Don't count else operand in COND_OP.
>   (vect_transform_reduction): Add COND_OP handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
>
> ---
>  gcc/internal-fn.cc|  58 ++
>  gcc/internal-fn.h |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 +++--
>  gcc/tree-vect-loop.cc | 193 ++
>  gcc/tree-vectorizer.h |   2 +-
>  9 files changed, 536 insertions(+), 55 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..018175261b9 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn)
>  }
>  }
>  
> +int
> +internal_fn_else_index (internal_fn fn)

The function needs a comment, maybe:

/* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of the
   argument that is used when the condition is false.  Return -1 otherwise.  */

OK for the internal-fn* and tree-if-conv.cc bits (which were the
parts I commented on earlier).  I'll look at cleaning up the
definition of conditional internal functions separately, so that
the list of functions isn't necessary.

Thanks,
Richard

> +{
> +  switch (fn)
> +{
> +case IFN_COND_NEG:
> +case IFN_COND_NOT:
> +case IFN_COND_LEN_NEG:
> +case IFN_COND_LEN_NOT:
> +  return 2;
> +
> +case IFN_COND_ADD:
> +case IFN_COND_SUB:
> +case IFN_COND_MUL:
> +case IFN_COND_DIV:
> +case IFN_COND_MOD:
> +case IFN_COND_MIN:
> +case IFN_COND_MAX:
> +case IFN_COND_FMIN:
> +case IFN_COND_FMAX:
> +case IFN_COND_AND:
> +case IFN_COND_IOR:
> +case IFN_COND_XOR:
> +case IFN_COND_SHL:
> +case IFN_COND_SHR:
> +case IFN_COND_LEN_ADD:
> +case IFN_COND_LEN_SUB:
> +case IFN_COND_LEN_MUL:
> +case IFN_COND_LEN_DIV:
> +case IFN_COND_LEN_MOD:
> +case IFN_COND_LEN_MIN:
> +case IFN_COND_LEN_MAX:
> +case IFN_COND_LEN_FMIN:
> +case IFN_COND_LEN_FMAX:
> +case IFN_COND_LEN_AND:
> +case IFN_COND_LEN_IOR:
> +

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-24 Thread Robin Dapp
Changed as suggested.  The difference to v5 is thus:

+ if (cond_fn_p)
+   {
+ gcall *call = dyn_cast (use_stmt);
+ unsigned else_pos
+   = internal_fn_else_index (internal_fn (op.code));
+
+ for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
+   {
+ if (j == else_pos)
+   continue;
+ if (gimple_call_arg (call, j) == op.ops[opi])
+   cnt++;
+   }
+   }
+ else if (!is_gimple_debug (op_use_stmt)

as well as internal_fn_else_index.

Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done,
aarch64 and x86 still running.

Regards
 Robin

>From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 13 Sep 2023 22:19:35 +0200
Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* internal-fn.cc (internal_fn_else_index): New function.
* internal-fn.h (internal_fn_else_index): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
---
 gcc/internal-fn.cc|  58 ++
 gcc/internal-fn.h |   1 +
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +
 .../riscv/rvv/autovec/cond/pr111401.c | 139 +
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
 .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
 gcc/tree-if-conv.cc   |  49 +++--
 gcc/tree-vect-loop.cc | 193 ++
 gcc/tree-vectorizer.h |   2 +-
 9 files changed, 536 insertions(+), 55 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..018175261b9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn)
 }
 }
 
+int
+internal_fn_else_index (internal_fn fn)
+{
+  switch (fn)
+{
+case IFN_COND_NEG:
+case IFN_COND_NOT:
+case IFN_COND_LEN_NEG:
+case IFN_COND_LEN_NOT:
+  return 2;
+
+case IFN_COND_ADD:
+case IFN_COND_SUB:
+case IFN_COND_MUL:
+case IFN_COND_DIV:
+case IFN_COND_MOD:
+case IFN_COND_MIN:
+case IFN_COND_MAX:
+case IFN_COND_FMIN:
+case IFN_COND_FMAX:
+case IFN_COND_AND:
+case IFN_COND_IOR:
+case IFN_COND_XOR:
+case IFN_COND_SHL:
+case IFN_COND_SHR:
+case IFN_COND_LEN_ADD:
+case IFN_COND_LEN_SUB:
+case IFN_COND_LEN_MUL:
+case IFN_COND_LEN_DIV:
+case IFN_COND_LEN_MOD:
+case IFN_COND_LEN_MIN:
+case IFN_COND_LEN_MAX:
+case IFN_COND_LEN_FMIN:
+case IFN_COND_LEN_FMAX:
+case IFN_COND_LEN_AND:
+case IFN_COND_LEN_IOR:
+case IFN_COND_LEN_XOR:
+case IFN_COND_LEN_SHL:
+case IFN_COND_LEN_SHR:
+  return 3;
+
+case IFN_COND_FMA:
+case IFN_COND_FMS:
+case IFN_COND_FNMA:
+case IFN_COND_FNMS:
+case IFN_COND_LEN_FMA:
+case IFN_COND_LEN_FMS:
+case IFN_COND_LEN_FNMA:
+case IFN_COND_LEN_FNMS:
+  return 4;
+
+default:
+  return -1;
+}
+
+  return -1;
+}
+
 /* If FN takes a vector mask argument, return the index of that argument,
otherwise return -1.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..7d72f4db2d0 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-24 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 19 Oct 2023, Robin Dapp wrote:
>
>> Ugh, I didn't push yet because with a rebased trunk I am
>> seeing different behavior for some riscv testcases.
>> 
>> A reduction is not recognized because there is yet another
>> "double use" occurrence in check_reduction_path.  I guess it's
>> reasonable to loosen the restriction for conditional operations
>> here as well.
>> 
>> The only change to v4 therefore is:
>> 
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index ebab1953b9c..64654a55e4c 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -4085,7 +4094,15 @@ pop:
>> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>> cnt++;
>> -  if (cnt != 1)
>> +
>> +  bool cond_fn_p = op.code.is_internal_fn ()
>> +   && (conditional_internal_fn_code (internal_fn (*code))
>> +   != ERROR_MARK);
>> +
>> +  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
>> +op1 twice (once as definition, once as else) in the same operation.
>> +Allow this.  */
>> +  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>> 
>> Bootstrapped and regtested again on x86, aarch64 and power10.
>> Testsuite on riscv unchanged.
>
> Hmm, why opi == 1 only?  I think
>
> # _1 = PHI <.., _4>
>  _3 = .COND_ADD (_1, _2, _1);
>  _4 = .COND_ADD (_3, _5, _3);
>
> would be fine as well.  I think we want to simply ignore the 'else' value
> of conditional internal functions.  I suppose we have unary, binary
> and ternary conditional functions - I miss a internal_fn_else_index,
> but I suppose it's always the last one?

Yeah, it was always the last one before the introduction of .COND_LEN.
I agree internal_fn_else_index would be useful now.

Thanks,
Richard

>
> I think a single use on .COND functions is also OK, even when on the
> 'else' value only?  But maybe that's not too important here.
>
> Maybe
>
>   gimple *op_use_stmt;
>   unsigned cnt = 0;
>   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
> if (.. op_use_stmt is conditional internal function ..)
>   {
> for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
>   if (gimple_call_arg (call, j) == op.ops[opi])
> cnt++;
>   }
> else if (!is_gimple_debug (op_use_stmt)
> && (*code != ERROR_MARK
> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> cnt++;
>
> ?
>
>> Regards
>>  Robin
>> 
>> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar 
>> reduction.
>> 
>> As described in PR111401 we currently emit a COND and a PLUS expression
>> for conditional reductions.  This makes it difficult to combine both
>> into a masked reduction statement later.
>> This patch improves that by directly emitting a COND_ADD/COND_OP during
>> ifcvt and adjusting some vectorizer code to handle it.
>> 
>> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
>> is true.
>> 
>> gcc/ChangeLog:
>> 
>>  PR middle-end/111401
>>  * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>>  if supported.
>>  (predicate_scalar_phi): Add whitespace.
>>  * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>>  (neutral_op_for_reduction): Return -0 for PLUS.
>>  (check_reduction_path): Don't count else operand in COND_OP.
>>  (vect_is_simple_reduction): Ditto.
>>  (vect_create_epilog_for_reduction): Fix whitespace.
>>  (vectorize_fold_left_reduction): Add COND_OP handling.
>>  (vectorizable_reduction): Don't count else operand in COND_OP.
>>  (vect_transform_reduction): Add COND_OP handling.
>>  * tree-vectorizer.h (neutral_op_for_reduction): Add default
>>  parameter.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>>  * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>>  * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>>  * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
>> ---
>>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
>>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
>>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>>  gcc/tree-if-conv.cc   |  49 +++--
>>  gcc/tree-vect-loop.cc | 168 ++
>>  gcc/tree-vectorizer.h |   2 +-
>>  7 files changed, 456 insertions(+), 51 deletions(-)
>>  create mode 100644 
>> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>> 
>> diff --git 
>> 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-23 Thread Richard Biener
On Thu, 19 Oct 2023, Robin Dapp wrote:

> Ugh, I didn't push yet because with a rebased trunk I am
> seeing different behavior for some riscv testcases.
> 
> A reduction is not recognized because there is yet another
> "double use" occurrence in check_reduction_path.  I guess it's
> reasonable to loosen the restriction for conditional operations
> here as well.
> 
> The only change to v4 therefore is:
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ebab1953b9c..64654a55e4c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4085,7 +4094,15 @@ pop:
> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> cnt++;
> -  if (cnt != 1)
> +
> +  bool cond_fn_p = op.code.is_internal_fn ()
> +   && (conditional_internal_fn_code (internal_fn (*code))
> +   != ERROR_MARK);
> +
> +  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +op1 twice (once as definition, once as else) in the same operation.
> +Allow this.  */
> +  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
> 
> Bootstrapped and regtested again on x86, aarch64 and power10.
> Testsuite on riscv unchanged.

Hmm, why opi == 1 only?  I think

# _1 = PHI <.., _4>
 _3 = .COND_ADD (_1, _2, _1);
 _4 = .COND_ADD (_3, _5, _3);

would be fine as well.  I think we want to simply ignore the 'else' value
of conditional internal functions.  I suppose we have unary, binary
and ternary conditional functions - I miss a internal_fn_else_index,
but I suppose it's always the last one?

I think a single use on .COND functions is also OK, even when on the
'else' value only?  But maybe that's not too important here.

Maybe

  gimple *op_use_stmt;
  unsigned cnt = 0;
  FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
if (.. op_use_stmt is conditional internal function ..)
  {
for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
  if (gimple_call_arg (call, j) == op.ops[opi])
cnt++;
  }
else if (!is_gimple_debug (op_use_stmt)
&& (*code != ERROR_MARK
|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
cnt++;

?

> Regards
>  Robin
> 
> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD/COND_OP during
> ifcvt and adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (check_reduction_path): Don't count else operand in COND_OP.
>   (vect_is_simple_reduction): Ditto.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_OP handling.
>   (vectorizable_reduction): Don't count else operand in COND_OP.
>   (vect_transform_reduction): Add COND_OP handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 +++--
>  gcc/tree-vect-loop.cc | 168 ++
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 456 insertions(+), 51 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-19 Thread Robin Dapp
Ugh, I didn't push yet because with a rebased trunk I am
seeing different behavior for some riscv testcases.

A reduction is not recognized because there is yet another
"double use" occurrence in check_reduction_path.  I guess it's
reasonable to loosen the restriction for conditional operations
here as well.

The only change to v4 therefore is:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ebab1953b9c..64654a55e4c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4085,7 +4094,15 @@ pop:
|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
cnt++;
-  if (cnt != 1)
+
+  bool cond_fn_p = op.code.is_internal_fn ()
+   && (conditional_internal_fn_code (internal_fn (*code))
+   != ERROR_MARK);
+
+  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
+op1 twice (once as definition, once as else) in the same operation.
+Allow this.  */
+  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))

Bootstrapped and regtested again on x86, aarch64 and power10.
Testsuite on riscv unchanged.

Regards
 Robin

Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
---
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
 .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
 .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
 gcc/tree-if-conv.cc   |  49 +++--
 gcc/tree-vect-loop.cc | 168 ++
 gcc/tree-vectorizer.h |   2 +-
 7 files changed, 456 insertions(+), 51 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..7b46e7d8a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone))
+reduc_minus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+int 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-12 Thread Richard Biener
On Wed, 11 Oct 2023, Robin Dapp wrote:

> > It wasn't very clear, sorry, but it was the last sentence I was asking
> > for clarification on, not the other bits.  Why do we want to avoid
> > generating a COND_ADD when the operand is a vectorisable call?
> 
> Ah, I see, apologies.  Upon thinking about it a bit more (thanks)
> I figured this hunk is not necessary.  I added it early in the process
> in order to keep the current behavior for situations like the following:
> 
>  before:
>  _1 = .FMA (...)
>  _2 = COND (cond, .FMA, 0.0)
>  _3 = COND_ADD (true, result, _2, result)
> 
>  This we would simplify to:
>  _2 = COND_FMA (cond, ...)
>  _3 = COND_ADD (true, result, _2, result)
> 
>  with the patch we have:
>  _1 = .FMA (...)
>  _2 = .COND_ADD (cond, arg1, _1, arg1)
> 
> Due to differences in expansion we'd end up with a masked
> vfmacc ("a += a + b * c") before and now emit an unmasked
> vfmadd ("a += a * b + c") and a masked result add.  This shouldn't
> be worse from a vector spec point of view, so I just changed the
> test expectation for now.
> 
> The attached v4 also includes Richi's suggestion for the HONOR...
> stuff.
> 
> Bootstrap and regtest unchanged on aarch64, x86 and power10.

OK

Thanks,
Richard.

> Regards
>  Robin
> 
> 
> From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 13 Sep 2023 22:19:35 +0200
> Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 --
>  gcc/tree-vect-loop.cc | 156 ++
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 446 insertions(+), 49 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0&

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-12 Thread Richard Sandiford
Robin Dapp  writes:
>> It wasn't very clear, sorry, but it was the last sentence I was asking
>> for clarification on, not the other bits.  Why do we want to avoid
>> generating a COND_ADD when the operand is a vectorisable call?
>
> Ah, I see, apologies.  Upon thinking about it a bit more (thanks)
> I figured this hunk is not necessary.  I added it early in the process
> in order to keep the current behavior for situations like the following:
>
>  before:
>  _1 = .FMA (...)
>  _2 = COND (cond, .FMA, 0.0)
>  _3 = COND_ADD (true, result, _2, result)
>
>  This we would simplify to:
>  _2 = COND_FMA (cond, ...)
>  _3 = COND_ADD (true, result, _2, result)
>
>  with the patch we have:
>  _1 = .FMA (...)
>  _2 = .COND_ADD (cond, arg1, _1, arg1)
>
> Due to differences in expansion we'd end up with a masked
> vfmacc ("a += a + b * c") before and now emit an unmasked
> vfmadd ("a += a * b + c") and a masked result add.  This shouldn't
> be worse from a vector spec point of view, so I just changed the
> test expectation for now.

Thanks, sounds good.

> The attached v4 also includes Richi's suggestion for the HONOR...
> stuff.
>
> Bootstrap and regtest unchanged on aarch64, x86 and power10.

I'm reluctant to comment on the signed zeros/MINUS_EXPR parts,
but FWIW, the rest looks good to me.

Thanks,
Richard

>
> Regards
>  Robin
>
>
> From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 13 Sep 2023 22:19:35 +0200
> Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
>
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> gcc/ChangeLog:
>
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 --
>  gcc/tree-vect-loop.cc | 156 ++
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 446 insertions(+), 49 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
&g

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-11 Thread Robin Dapp
> It wasn't very clear, sorry, but it was the last sentence I was asking
> for clarification on, not the other bits.  Why do we want to avoid
> generating a COND_ADD when the operand is a vectorisable call?

Ah, I see, apologies.  Upon thinking about it a bit more (thanks)
I figured this hunk is not necessary.  I added it early in the process
in order to keep the current behavior for situations like the following:

 before:
 _1 = .FMA (...)
 _2 = COND (cond, .FMA, 0.0)
 _3 = COND_ADD (true, result, _2, result)

 This we would simplify to:
 _2 = COND_FMA (cond, ...)
 _3 = COND_ADD (true, result, _2, result)

 with the patch we have:
 _1 = .FMA (...)
 _2 = .COND_ADD (cond, arg1, _1, arg1)

Due to differences in expansion we'd end up with a masked
vfmacc ("a += a + b * c") before and now emit an unmasked
vfmadd ("a += a * b + c") and a masked result add.  This shouldn't
be worse from a vector spec point of view, so I just changed the
test expectation for now.

The attached v4 also includes Richi's suggestion for the HONOR...
stuff.

Bootstrap and regtest unchanged on aarch64, x86 and power10.

Regards
 Robin


>From 1752507ce22c22b50b96f889dc0a9c2fc8e50859 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 13 Sep 2023 22:19:35 +0200
Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar
 reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD during ifcvt and
adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
(neutral_op_for_reduction): Return -0 for PLUS.
(vect_is_simple_reduction): Don't count else operand in
COND_ADD.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_ADD handling.
(vectorizable_reduction): Don't count else operand in COND_ADD.
(vect_transform_reduction): Add COND_ADD handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
---
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
 .../riscv/rvv/autovec/cond/pr111401.c | 139 
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
 .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
 gcc/tree-if-conv.cc   |  49 --
 gcc/tree-vect-loop.cc | 156 ++
 gcc/tree-vectorizer.h |   2 +-
 7 files changed, 446 insertions(+), 49 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..7b46e7d8a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone))
+reduc_minus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Richard Sandiford
Robin Dapp  writes:
>> It'd be good to expand on this comment a bit.  What kind of COND are you
>> anticipating?  A COND with the neutral op as the else value, so that the
>> PLUS_EXPR (or whatever) can remain unconditional?  If so, it would be
>> good to sketch briefly how that happens, and why it's better than using
>> the conditional PLUS_EXPR.
>> 
>> If that's the reason, perhaps we want a single-use check as well.
>> It's possible that OP1 is used elsewhere in the loop body, in a
>> context that would prefer a different else value.
>
> Would something like the following on top work?
>
> -  /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and
> - a PLUS_EXPR.  Don't do this if the reduction def operand itself is
> +  /* If possible create a COND_OP instead of a COND_EXPR and an OP_EXPR.
> + The COND_OP will have a neutral_op else value.
> +
> + This allows re-using the mask directly in a masked reduction instead
> + of creating a vector merge (or similar) and then an unmasked reduction.
> +
> + Don't do this if the reduction def operand itself is
>   a vectorizable call as we can create a COND version of it directly.  */

It wasn't very clear, sorry, but it was the last sentence I was asking
for clarification on, not the other bits.  Why do we want to avoid
generating a COND_ADD when the operand is a vectorisable call?

Thanks,
Richard

>
>if (ifn != IFN_LAST
>&& vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
> -  && try_cond_op && !swap)
> +  && use_cond_op && !swap && has_single_use (op1))
>
> Regards
>  Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Robin Dapp wrote:

> > Hmm, the function is called at transform time so this shouldn't help
> > avoiding the ICE.  I expected we refuse to vectorize _any_ reduction
> > when sign dependent rounding is in effect?  OTOH maybe sign-dependent
> > rounding is OK but only when we use a unconditional fold-left
> > (so a loop mask from fully masking is OK but not an original COND_ADD?).
> 
> So we currently only disable the use of partial vectors
> 
>   else if (reduction_type == FOLD_LEFT_REDUCTION
>  && reduc_fn == IFN_LAST

aarch64 probably chokes because reduc_fn is not IFN_LAST.

>  && FLOAT_TYPE_P (vectype_in)
>  && HONOR_SIGNED_ZEROS (vectype_in)

so with your change we'd support signed zeros correctly.

>  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"can't operate on partial vectors because"
>" signed zeros cannot be preserved.\n");
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> 
> which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block.
> 
> For the fully masked case we continue (and then fail the assertion
> on aarch64 at transform time).
> 
> I didn't get why that case is ok, though?  We still merge the initial
> definition with the identity/neutral op (i.e. possibly -0.0) based on
> the loop mask.  Is that different to partial masking?

I think the main point with my earlier change is that without
native support for a fold-left reduction (like on x86) we get

 ops = mask ? ops : neutral;
 acc += ops[0];
 acc += ops[1];
 ...

so we wouldn't use a COND_ADD but add neutral elements for masked
elements.  That's OK for signed zeros after your change (great)
but not OK for sign dependent rounding (because we can't decide on
the sign of the neutral zero then).

For the case of using an internal function, thus direct target support,
it should be OK to have sign-dependent rounding if we can use
the masked-fold-left reduction op.  As we do

  /* On the first iteration the input is simply the scalar phi
 result, and for subsequent iterations it is the output of
 the preceding operation.  */
  if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
  if (mask && len && mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, 
reduc_var,
   def0, mask, len, bias);
  else if (mask && mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, 
reduc_var,
   def0, mask);
  else
new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var,
   def0);

the last case should be able to assert that 
!HONOR_SIGN_DEPENDENT_ROUNDING (also the reduc_fn == IFN_LAST case).

The quoted condition above should change to drop the HONOR_SIGNED_ZEROS
condition and the reduc_fn == IFN_LAST should change, maybe to
internal_fn_mask_index (reduc_fn) == -1?

Richard.


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Robin Dapp
> Hmm, the function is called at transform time so this shouldn't help
> avoiding the ICE.  I expected we refuse to vectorize _any_ reduction
> when sign dependent rounding is in effect?  OTOH maybe sign-dependent
> rounding is OK but only when we use a unconditional fold-left
> (so a loop mask from fully masking is OK but not an original COND_ADD?).

So we currently only disable the use of partial vectors

  else if (reduction_type == FOLD_LEFT_REDUCTION
   && reduc_fn == IFN_LAST
   && FLOAT_TYPE_P (vectype_in)
   && HONOR_SIGNED_ZEROS (vectype_in)
   && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors because"
 " signed zeros cannot be preserved.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;

which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block.

For the fully masked case we continue (and then fail the assertion
on aarch64 at transform time).

I didn't get why that case is ok, though?  We still merge the initial
definition with the identity/neutral op (i.e. possibly -0.0) based on
the loop mask.  Is that different to partial masking?

Regards
 Robin



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Robin Dapp
> It'd be good to expand on this comment a bit.  What kind of COND are you
> anticipating?  A COND with the neutral op as the else value, so that the
> PLUS_EXPR (or whatever) can remain unconditional?  If so, it would be
> good to sketch briefly how that happens, and why it's better than using
> the conditional PLUS_EXPR.
> 
> If that's the reason, perhaps we want a single-use check as well.
> It's possible that OP1 is used elsewhere in the loop body, in a
> context that would prefer a different else value.

Would something like the following on top work?

-  /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and
- a PLUS_EXPR.  Don't do this if the reduction def operand itself is
+  /* If possible create a COND_OP instead of a COND_EXPR and an OP_EXPR.
+ The COND_OP will have a neutral_op else value.
+
+ This allows re-using the mask directly in a masked reduction instead
+ of creating a vector merge (or similar) and then an unmasked reduction.
+
+ Don't do this if the reduction def operand itself is
  a vectorizable call as we can create a COND version of it directly.  */

   if (ifn != IFN_LAST
   && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
-  && try_cond_op && !swap)
+  && use_cond_op && !swap && has_single_use (op1))

Regards
 Robin



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Richard Biener
On Fri, 6 Oct 2023, Robin Dapp wrote:

> > So if you think you got everything correct the patch is OK as-is,
> > I just wasn't sure - maybe the neutral_element change deserves
> > a comment as to how MINUS_EXPR is handled.
> 
> Heh, I never think I got everything correct ;)
> 
> Added this now:
> 
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> +  /* We support MINUS_EXPR by negating the operand.  This also preserves an
> + initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 +
> + (-0.0) = -0.0.  */
> 
> What I still found is that aarch64 ICEs at the assertion you added
> with -frounding-math.  Therefore I changed it to:
> 
> - gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
> + if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot vectorize fold-left reduction 
> because"
> +" signed zeros cannot be preserved.\n");
> + return false;
> +   }
> 
> No code changes apart from that.  Will leave it until Monday and push then
> barring any objections.

Hmm, the function is called at transform time so this shouldn't help
avoiding the ICE.  I expected we refuse to vectorize _any_ reduction
when sign dependent rounding is in effect?  OTOH maybe sign-dependent
rounding is OK but only when we use a unconditional fold-left
(so a loop mask from fully masking is OK but not an original COND_ADD?).

Still the check should be done in vectorizable_reduction, not only
during transform (there the assert is proper, if we can distinguish
the loop mask vs. the COND_ADD here, otherwise just remove it).

Richard.


> Thanks for the pointers.
> 
> Regards
>  Robin
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-08 Thread Richard Sandiford
Robin Dapp  writes:
> Hi Tamar,
>
>> The only comment I have is whether you actually need this helper
>> function? It looks like all the uses of it are in cases you have, or
>> will call conditional_internal_fn_code directly.
> removed the cond_fn_p entirely in the attached v3.
>
> Bootstrapped and regtested on x86_64, aarch64 and power10.
>
> Regards
>  Robin
>
> Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
>
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> gcc/ChangeLog:
>
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.

The patch LGTM too FWIW, except...

> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 150 ++
>  gcc/tree-vectorizer.h |   2 +-
>  5 files changed, 451 insertions(+), 44 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +{
> +  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +  cond1[i] = 0;
> +  cond2[i] = i & 4 ? 1 : 0;
> +  asm volatile ("" ::: "memory");
> +}
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref1 = reduc_plus_double_ref (a, -

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Robin Dapp
> So if you think you got everything correct the patch is OK as-is,
> I just wasn't sure - maybe the neutral_element change deserves
> a comment as to how MINUS_EXPR is handled.

Heh, I never think I got everything correct ;)

Added this now:

 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
+  /* We support MINUS_EXPR by negating the operand.  This also preserves an
+ initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 +
+ (-0.0) = -0.0.  */

What I still found is that aarch64 ICEs at the assertion you added
with -frounding-math.  Therefore I changed it to:

- gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+ if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot vectorize fold-left reduction because"
+" signed zeros cannot be preserved.\n");
+ return false;
+   }

No code changes apart from that.  Will leave it until Monday and push then
barring any objections.

Thanks for the pointers.

Regards
 Robin



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Richard Biener
On Fri, 6 Oct 2023, Robin Dapp wrote:

> > We might need a similar assert
> > 
> >   gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
> >   && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?
> 
> erm, obviously not that exact assert but more something like
> 
> if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...)
>   {
> if (dump)
>   ...
> return false;
>   }
> 
> or so.

Yeah, of course the whole point of a fold-left reduction is to
_not_ give up without -ffast-math which is why I added the above.
I obviously didn't fully verify what happens for an original
MINUS_EXPR.  I think it's required to give up for -frounding-math,
but I think I might have put the code to do that in a generic
enough place.

For x86 you need --param vect-partial-vector-usage=2 and an
AVX512 enabled arch like -march=skylake-avx512 or -march=znver4.

I think tranforming - x to + (-x) works for signed zeros.

So if you think you got everything correct the patch is OK as-is,
I just wasn't sure - maybe the neutral_element change deserves
a comment as to how MINUS_EXPR is handled.

Richard.


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Robin Dapp
> We might need a similar assert
> 
> gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
>   && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?

erm, obviously not that exact assert but more something like

if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...)
  {
if (dump)
  ...
return false;
  }

or so.

Regards
 Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Robin Dapp
> ... here we probably get PLUS_EXPR for MINUS_EXPR above but IIRC
> for MINUS_EXPR the !as_initial case should return positive zero.
> 
> Can you double-check?

You're referring to the canonicalization from a - CST to a + -CST so
that the neutral op would need to change with it?  Argh, good point.

>From what I can tell the only difference for MINUS_EXPR is that we
negate the reduction operand and then just continue as if it were
a PLUS_EXPR (which is the right thing to do also for +-0.0?).
At least I didn't observe a canonicalization and we don't call
neutral_op_for_reduction in between.

What we do have, though, is for the fully-masked case (you added
that recently):

  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
{
  vector_identity = build_zero_cst (vectype_out);
  if (!HONOR_SIGNED_ZEROS (vectype_out))
;
  else
{
  gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
  vector_identity = const_unop (NEGATE_EXPR, vectype_out,
vector_identity);
}
}

So for

  /* Handle MINUS by adding the negative.  */
  if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
{
  tree negated = make_ssa_name (vectype_out);

We might need a similar assert

  gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
  && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?

Apart from that the only call with !as_inital is in 
vect_create_epilog_for_reduction.  I just instrumented it with an
assert (false) but i386.exp doesn't trigger it at all. 

Regards
 Robin



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Robin Dapp wrote:

> Hi Tamar,
> 
> > The only comment I have is whether you actually need this helper
> > function? It looks like all the uses of it are in cases you have, or
> > will call conditional_internal_fn_code directly.
> removed the cond_fn_p entirely in the attached v3.
> 
> Bootstrapped and regtested on x86_64, aarch64 and power10.

Looks good - I only have one question, see below ...

> Regards
>  Robin
> 
> Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 150 ++
>  gcc/tree-vectorizer.h |   2 +-
>  5 files changed, 451 insertions(+), 44 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +{
> +  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +  cond1[i] = 0;
> +  cond2[i] = i & 4 ? 1 : 0;
> +  asm volatile ("" ::: "memory");
> +}
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
&

RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Tamar Christina
Hi Robin,

> -Original Message-
> From: Robin Dapp 
> Sent: Thursday, October 5, 2023 3:06 PM
> To: Tamar Christina ; gcc-patches  patc...@gcc.gnu.org>; Richard Biener 
> Cc: rdapp@gmail.com
> Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar
> reduction.
> 
> Hi Tamar,
> 
> > The only comment I have is whether you actually need this helper
> > function? It looks like all the uses of it are in cases you have, or
> > will call conditional_internal_fn_code directly.
> removed the cond_fn_p entirely in the attached v3.
> 
> Bootstrapped and regtested on x86_64, aarch64 and power10.
> 

Changes look good to me thanks! I'll leave it up to Richi for final approval.

Regards,
Tamar

> Regards
>  Robin
> 
> Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
> reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both into a
> masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is
> true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 150 ++
>  gcc/tree-vectorizer.h |   2 +-
>  5 files changed, 451 insertions(+), 44 deletions(-)  create mode 100644
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-
> zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone)) reduc_plus_double (double
> +*restrict a, double init, int *cond, int n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int
> +n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone)) reduc_minus_double (double
> +*restrict a, double init, int *cond, int n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int
> +n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Robin Dapp
Hi Tamar,

> The only comment I have is whether you actually need this helper
> function? It looks like all the uses of it are in cases you have, or
> will call conditional_internal_fn_code directly.
removed the cond_fn_p entirely in the attached v3.

Bootstrapped and regtested on x86_64, aarch64 and power10.

Regards
 Robin

Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
 reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD during ifcvt and
adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
(neutral_op_for_reduction): Return -0 for PLUS.
(vect_is_simple_reduction): Don't count else operand in
COND_ADD.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_ADD handling.
(vectorizable_reduction): Don't count else operand in COND_ADD.
(vect_transform_reduction): Add COND_ADD handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
---
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
 .../riscv/rvv/autovec/cond/pr111401.c | 139 
 gcc/tree-if-conv.cc   |  63 ++--
 gcc/tree-vect-loop.cc | 150 ++
 gcc/tree-vectorizer.h |   2 +-
 5 files changed, 451 insertions(+), 44 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..7b46e7d8a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone))
+reduc_minus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+  int n = 19;
+  double a[N];
+  int cond1[N], cond2[N];
+
+  for (int i = 0; i < N; i++)
+{
+  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
+  cond1[i] = 0;
+  cond2[i] = i & 4 ? 1 : 0;
+  asm volatile ("" ::: "memory");
+}
+
+  double res1 = reduc_plus_double (a, -0.0, cond1, n);
+  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
+  double res2 = reduc_minus_double (a, -0.0, cond1, n);
+  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
+  double res3 = reduc_plus_double (a, -0.0, cond1, n);
+  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
+  double res4 = reduc_minus_double (a, -0.0, cond1, n);
+  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
+
+  if (res1 != ref1 || signbit (res1) != signbit (ref1))
+__builtin_abort ();
+  if (res2 != ref2 || signbit (res2) != signbit (ref2))
+__builtin_abort ();
+  if (res3 != ref3 || signbit (res3) != signbit (ref3))
+__builtin_abort ();
+  if (res4 != ref4 || signbit (res4) != signbi

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Robin Dapp
Ah, sorry, read your remark incorrectly.  Will try again.

Regards
 Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Robin Dapp
Hi Tamar,

> So in the
> 
>   if (slp_node)
> {
> 
> Add something like:
> 
> If (is_cond_op)
> {
>   if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"left fold reduction on SLP not supported.\n");
>   return false;
> }

Yes, seems reasonable, added.

> The only comment I have is whether you actually need this helper function?
> It looks like all the uses of it are in cases you have, or will call 
> conditional_internal_fn_code
> directly.
> 
> e.g. in vect_transform_reduction you can replace it by 
> 
> bool cond_fn_p = cond_fn != ERROR_MARK;
> 
> and in 
> 
>   if (cond_fn_p (orig_code))
>   orig_code = conditional_internal_fn_code (internal_fn(orig_code));
> 
> just 
> 
> internal_fn new_fn = conditional_internal_fn_code (internal_fn(orig_code));
> if (new_fn != ERROR_MARK)
>   orig_code = new_fn;
> 
> which would save the repeated testing of the condition.

I see what you mean.  One complication is that we want to disambiguate
(among others):

 (1) code = IFN_COND_ADD, cond_fn = IFN_LAST.   (new case)
 (2) code = IFN_MAX, cond_fn = IFN_COND_MAX.
 (3) code = IFN_SOMETHING, cond_fn = IFN_LAST.

So just checking cond_fn is not enough (even if we made
get_conditional_internal_fn (IFN_COND_ADD) return IFN_COND_ADD).
We need to know if the initial code already was an IFN_COND.

It's a bit of a mess but I didn't dare untangling.  Well, actually, I
tried but made it worse ;)  The cond_fn_p check seemed least
intrusive to me.  Maybe you have another idea?

Regards
 Robin


RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-04 Thread Tamar Christina
Hi Robin,

> -Original Message-
> From: Robin Dapp 
> Sent: Wednesday, October 4, 2023 8:54 AM
> To: Tamar Christina ; gcc-patches  patc...@gcc.gnu.org>; Richard Biener 
> Cc: rdapp@gmail.com
> Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar
> reduction.
> 
> Hi Tamar,
> 
> > I can't approve but hope you don't mind the review,
> 
> Not at all, greatly appreciated.
> 
> I incorporated all your remarks apart from this:
> 
> > Isn't vec_opmask NULL for SLP? You probably need to read it from
> > vec_defs for the COND_EXPR
> 
> Above that I gcc_assert (!slp_node) for the IFN_COND case.  It doesn't seem to
> be hit during testsuite runs.  I also didn't manage to create an example that
> would trigger it.  When "conditionalizing" an SLP fold-left reduction we don't
> seem to vectorize for a different reason.
> Granted, I didn't look very closely at that reason :)

Yeah it looks like it's failing because it can't handle the PHI node reduction 
for
the condition.

So that's fine, I think then we should exit from vectorize_fold_left_reduction
in that case so we avoid the segfault when we start forcing things through
SLP only soon and add single lane SLP support.

So in the

  if (slp_node)
{

Add something like:

If (is_cond_op)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "left fold reduction on SLP not supported.\n");
  return false;
}

> 
> Bootstrap and testsuite are currently running with the attached v2 on x86,
> aarch64 and powerpc.
> 
> Besides, when thinking about which COND_OPs we expect I tried to loosen the
> restrictions in if-conv by allowing MAX_EXPR and MIN_EXPR.  The emitted
> code on riscv looks correct but I hit a bootstrap ICE on x86 so omitted it for
> now.
> 
> Regards
>  Robin
> 
> 
> Subject: [PATCH v2] ifcvt/vect: Emit COND_ADD for conditional scalar
> reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both into a
> masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is
> true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * internal-fn.cc (cond_fn_p): New function.
>   * internal-fn.h (cond_fn_p): Define.
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  gcc/internal-fn.cc|  17 +++
>  gcc/internal-fn.h |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 129 
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 450 insertions(+), 42 deletions(-)  create mode 100644
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index
> 61d5a9e4772..9b38dc0cef4 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4245,6 +4245,23 @@ first_commutative_argument (internal_fn fn)
>  }
>  }
> 
> +/* Return true if this CODE describes a conditional (masked)
> +internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +
> +  return conditional_internal_fn_code (fn) != ERROR_MARK; }
> +
> +
>  /* Return true if

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-04 Thread Robin Dapp
> +  gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB);

I forgot to add the other IFN_CONDs here before sending.  So with

-  gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB);
+  gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
+   || code == IFN_COND_MUL || code == IFN_COND_AND
+   || code == IFN_COND_IOR || code == IFN_COND_XOR);

on top, bootstrap and testsuites on x86, aarch64 and power10 are
unchanged.

Regards
 Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-04 Thread Robin Dapp
Hi Tamar,

> I can't approve but hope you don't mind the review,

Not at all, greatly appreciated.

I incorporated all your remarks apart from this:

> Isn't vec_opmask NULL for SLP? You probably need to read it from
> vec_defs for the COND_EXPR

Above that I gcc_assert (!slp_node) for the IFN_COND case.  It doesn't
seem to be hit during testsuite runs.  I also didn't manage to create
an example that would trigger it.  When "conditionalizing" an SLP
fold-left reduction we don't seem to vectorize for a different reason.
Granted, I didn't look very closely at that reason :)

Bootstrap and testsuite are currently running with the attached v2
on x86, aarch64 and powerpc.

Besides, when thinking about which COND_OPs we expect I tried to loosen
the restrictions in if-conv by allowing MAX_EXPR and MIN_EXPR.  The
emitted code on riscv looks correct but I hit a bootstrap ICE on x86
so omitted it for now.

Regards
 Robin


Subject: [PATCH v2] ifcvt/vect: Emit COND_ADD for conditional scalar
 reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD during ifcvt and
adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* internal-fn.cc (cond_fn_p): New function.
* internal-fn.h (cond_fn_p): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
(neutral_op_for_reduction): Return -0 for PLUS.
(vect_is_simple_reduction): Don't count else operand in
COND_ADD.
(vectorize_fold_left_reduction): Add COND_ADD handling.
(vectorizable_reduction): Don't count else operand in COND_ADD.
(vect_transform_reduction): Add COND_ADD handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
---
 gcc/internal-fn.cc|  17 +++
 gcc/internal-fn.h |   1 +
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++
 .../riscv/rvv/autovec/cond/pr111401.c | 139 +
 gcc/tree-if-conv.cc   |  63 ++--
 gcc/tree-vect-loop.cc | 129 
 gcc/tree-vectorizer.h |   2 +-
 7 files changed, 450 insertions(+), 42 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..9b38dc0cef4 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4245,6 +4245,23 @@ first_commutative_argument (internal_fn fn)
 }
 }
 
+/* Return true if this CODE describes a conditional (masked) internal_fn.  */
+
+bool
+cond_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+
+  return conditional_internal_fn_code (fn) != ERROR_MARK;
+}
+
+
 /* Return true if this CODE describes an internal_fn that returns a vector with
elements twice as wide as the element size of the input vectors.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..f1cc9db29c0 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
 extern bool widening_fn_p (code_helper);
+extern bool cond_fn_p (code_helper code);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..7b46e7d8a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-09-27 Thread Richard Biener
On Wed, 20 Sep 2023, Robin Dapp wrote:

> Hi,
> 
> as described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
> Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn
> but have code to handle MINUS_EXPR in vectorize_fold_left_reduction.  I
> suppose that's intentional but it "just works" on riscv and the testsuite
> doesn't change when allowing MINUS_EXPR so I went ahead and did that.
> 
> Bootstrapped and regtested on x86 and aarch64.

I think overall the patch is fine - please address Tamars comments
though, those look valid.

Thanks,
Richard.

> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * internal-fn.cc (cond_fn_p): New function.
>   * internal-fn.h (cond_fn_p): Define.
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  gcc/internal-fn.cc|  38 +
>  gcc/internal-fn.h |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++
>  .../riscv/rvv/autovec/cond/pr111401.c |  61 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 130 
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 394 insertions(+), 42 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..77939890f5a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn)
>  }
>  }
>  
> +/* Return true if this CODE describes a conditional (masked) internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +  switch (fn)
> +{
> +#undef DEF_INTERNAL_COND_FN
> +#define DEF_INTERNAL_COND_FN(NAME, F, O, T)\
> +case IFN_COND_##NAME:  \
> +case IFN_COND_LEN_##NAME:  \
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_COND_FN
> +
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T) \
> +case IFN_COND_##NAME:  \
> +case IFN_COND_LEN_##NAME:  \
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +
> +default:
> +  return false;
> +}
> +
> +  return false;
> +}
> +
> +
>  /* Return true if this CODE describes an internal_fn that returns a vector 
> with
> elements twice as wide as the element size of the input vectors.  */
>  
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
>  extern int first_commutative_argument (internal_fn);
>  extern bool associative_binary_fn_p (internal_fn);
>  extern bool widening_fn_p (code_helper);
> +extern bool cond_fn_p (code_helper code);
>  
>  extern bool set_edom_supported_p (void);
>  
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..57c600838ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { 

RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-09-26 Thread Tamar Christina
Hi,

I can't approve but hope you don't mind the review,

> +/* Return true if this CODE describes a conditional (masked)
> +internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);  switch (fn)
> +{
> +#undef DEF_INTERNAL_COND_FN
> +#define DEF_INTERNAL_COND_FN(NAME, F, O, T)\
> +case IFN_COND_##NAME:  \
> +case IFN_COND_LEN_##NAME:  \
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_COND_FN
> +
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T) \
> +case IFN_COND_##NAME:  \
> +case IFN_COND_LEN_##NAME:  \
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +
> +default:
> +  return false;
> +}
> +
> +  return false;
> +}
> +
> +

Could you not use conditional_internal_fn_code for this? Just check result is 
not ERROR_MARK?

>  /* Return true if this CODE describes an internal_fn that returns a vector 
> with
> elements twice as wide as the element size of the input vectors.  */
> 
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index
> 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
> extern int first_commutative_argument (internal_fn);  extern bool
> associative_binary_fn_p (internal_fn);  extern bool widening_fn_p
> (code_helper);
> +extern bool cond_fn_p (code_helper code);
> 
>  extern bool set_edom_supported_p (void);
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-
> zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..57c600838ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=c99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone)) reduc_plus_double (double
> +*restrict a, double init, int *cond, int n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int
> +n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone)) reduc_minus_double (double
> +*restrict a, double init, int *cond, int n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int
> +n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +{
> +  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +  cond1[i] = 0;
> +  cond2[i] = i & 4 ? 1 : 0;
> +  asm volatile ("" ::: "memory");
> +}
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);  double ref1 =
> + reduc_plus_double_ref (a, -0.0, cond1, n);  double res2 =
> + reduc_minus_double (a, -0.0, cond1, n);  double ref2 =
> + reduc_minus_double_ref (a, -0.0, cond1, n);  double res3 =
> + reduc_plus_double (a, -0.0, cond1, n);  double ref3 =
> + reduc_plus_double_ref (a, -0.0, cond1, n);  double res4 =
> + reduc_minus_double (a, -0.0, cond1, n);  double ref4 =
> + reduc_minus_double_ref (a, -0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +__builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +__builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +__builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +__builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res2 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +  res3 = 

[PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-09-20 Thread Robin Dapp
Hi,

as described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD during ifcvt and
adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn
but have code to handle MINUS_EXPR in vectorize_fold_left_reduction.  I
suppose that's intentional but it "just works" on riscv and the testsuite
doesn't change when allowing MINUS_EXPR so I went ahead and did that.

Bootstrapped and regtested on x86 and aarch64.

Regards
 Robin

gcc/ChangeLog:

PR middle-end/111401
* internal-fn.cc (cond_fn_p): New function.
* internal-fn.h (cond_fn_p): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
(neutral_op_for_reduction): Return -0 for PLUS.
(vect_is_simple_reduction): Don't count else operand in
COND_ADD.
(vectorize_fold_left_reduction): Add COND_ADD handling.
(vectorizable_reduction): Don't count else operand in COND_ADD.
(vect_transform_reduction): Add COND_ADD handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
---
 gcc/internal-fn.cc|  38 +
 gcc/internal-fn.h |   1 +
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++
 .../riscv/rvv/autovec/cond/pr111401.c |  61 
 gcc/tree-if-conv.cc   |  63 ++--
 gcc/tree-vect-loop.cc | 130 
 gcc/tree-vectorizer.h |   2 +-
 7 files changed, 394 insertions(+), 42 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0fd34359247..77939890f5a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn)
 }
 }
 
+/* Return true if this CODE describes a conditional (masked) internal_fn.  */
+
+bool
+cond_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+{
+#undef DEF_INTERNAL_COND_FN
+#define DEF_INTERNAL_COND_FN(NAME, F, O, T)  \
+case IFN_COND_##NAME:\
+case IFN_COND_LEN_##NAME:\
+  return true;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_COND_FN
+
+#undef DEF_INTERNAL_SIGNED_COND_FN
+#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T)   \
+case IFN_COND_##NAME:\
+case IFN_COND_LEN_##NAME:\
+  return true;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_SIGNED_COND_FN
+
+default:
+  return false;
+}
+
+  return false;
+}
+
+
 /* Return true if this CODE describes an internal_fn that returns a vector with
elements twice as wide as the element size of the input vectors.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..f1cc9db29c0 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
 extern bool widening_fn_p (code_helper);
+extern bool cond_fn_p (code_helper code);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..57c600838ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=c99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i