Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-05 Thread Richard Biener
On Sun, 5 Nov 2023, Richard Sandiford wrote:

> Robin Dapp  writes:
> >> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
> >> operands, even if that makes things inconsistent with vcond_mask.
> >> vcond_mask isn't really a good example to follow, since the operand
> >> order is not only inconsistent with the IFN, it's also inconsistent
> >> with the natural if_then_else order.
> >
> > v4 attached with that changed,  match.pd patterns interleaved as well
> > as scratch-handling added and VLS modes removed.  Lehua has since pushed
> > another patch that extends gimple_match_op to 6/7 operands already so
> > that could be removed as well making the patch even smaller now.
> >
> > Testsuite on riscv looks good (apart from the mentioned cond_widen...),
> > still running on aarch64 and x86.  OK if those pass?
> >
> > Regards
> >  Robin
> >
> > Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.
> >
> > In order to prevent simplification of a COND_OP with degenerate mask
> > (CONSTM1_RTX) into just an OP in the presence of length masking this
> > patch introduces a length-masked analog to VEC_COND_EXPR:
> > IFN_VCOND_MASK_LEN.
> >
> > It also adds new match patterns that allow the combination of
> > unconditional unary, binary and ternay operations with the
> > VCOND_MASK_LEN into a conditional operation if the target supports it.
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/111760
> >
> > * config/riscv/autovec.md (vcond_mask_len_): Add
> > expander.
> > * config/riscv/riscv-protos.h (enum insn_type): Add.
> > * config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
> > * doc/md.texi: Add vcond_mask_len.
> > * gimple-match-exports.cc (maybe_resimplify_conditional_op):
> > Create VCOND_MASK_LEN when length masking.
> > * gimple-match.h (gimple_match_op::gimple_match_op): Always
> > initialize len and bias.
> > * internal-fn.cc (vec_cond_mask_len_direct): Add.
> > (direct_vec_cond_mask_len_optab_supported_p): Add.
> > (internal_fn_len_index): Add VCOND_MASK_LEN.
> > (internal_fn_mask_index): Ditto.
> > * internal-fn.def (VCOND_MASK_LEN): New internal function.
> > * match.pd: Combine unconditional unary, binary and ternary
> > operations into the respective COND_LEN operations.
> > * optabs.def (OPTAB_D): Add vcond_mask_len optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
> > riscv_v.
> > ---
> >  gcc/config/riscv/autovec.md   | 26 ++
> >  gcc/config/riscv/riscv-protos.h   |  3 ++
> >  gcc/config/riscv/riscv-v.cc   |  3 +-
> >  gcc/doc/md.texi   |  9 
> >  gcc/gimple-match-exports.cc   | 13 +++--
> >  gcc/gimple-match.h|  6 ++-
> >  gcc/internal-fn.cc|  5 ++
> >  gcc/internal-fn.def   |  2 +
> >  gcc/match.pd  | 51 +++
> >  gcc/optabs.def|  1 +
> >  gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
> >  11 files changed, 114 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index cc4c9596bbf..0a5e4ccb54e 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_"
> >[(set_attr "type" "vector")]
> >  )
> >  
> > +(define_expand "vcond_mask_len_"
> > +  [(match_operand:V 0 "register_operand")
> > +(match_operand: 1 "nonmemory_operand")
> > +(match_operand:V 2 "nonmemory_operand")
> > +(match_operand:V 3 "autovec_else_operand")
> > +(match_operand 4 "autovec_length_operand")
> > +(match_operand 5 "const_0_operand")]
> > +  "TARGET_VECTOR"
> > +  {
> > +if (satisfies_constraint_Wc1 (operands[1]))
> > +  riscv_vector::expand_cond_len_unop (code_for_pred_mov (mode),
> > + operands);
> > +else
> > +  {
> > +   /* The order of then and else is opposite to pred_merge.  */
> > +   rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
> > +operands[1]};
> > +   riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
> > + riscv_vector::MERGE_OP_TU,
> > + ops, operands[4]);
> > +  }
> > +DONE;
> > +  }
> > +  [(set_attr "type" "vector")]
> > +)
> > +
> >  ;; 
> > -
> >  ;;  [BOOL] Select based on masks
> >  ;; 
> > -
> > diff --git a/gcc/config/riscv/riscv-protos.h 
> > b/gcc/config/riscv/riscv-protos.h
> > index a1be731c28e..0d0ee5effea 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ 

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-05 Thread Richard Sandiford
Robin Dapp  writes:
>> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
>> operands, even if that makes things inconsistent with vcond_mask.
>> vcond_mask isn't really a good example to follow, since the operand
>> order is not only inconsistent with the IFN, it's also inconsistent
>> with the natural if_then_else order.
>
> v4 attached with that changed,  match.pd patterns interleaved as well
> as scratch-handling added and VLS modes removed.  Lehua has since pushed
> another patch that extends gimple_match_op to 6/7 operands already so
> that could be removed as well making the patch even smaller now.
>
> Testsuite on riscv looks good (apart from the mentioned cond_widen...),
> still running on aarch64 and x86.  OK if those pass?
>
> Regards
>  Robin
>
> Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (CONSTM1_RTX) into just an OP in the presence of length masking this
> patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.
>
> It also adds new match patterns that allow the combination of
> unconditional unary, binary and ternay operations with the
> VCOND_MASK_LEN into a conditional operation if the target supports it.
>
> gcc/ChangeLog:
>
>   PR tree-optimization/111760
>
>   * config/riscv/autovec.md (vcond_mask_len_): Add
>   expander.
>   * config/riscv/riscv-protos.h (enum insn_type): Add.
>   * config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
>   * doc/md.texi: Add vcond_mask_len.
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Create VCOND_MASK_LEN when length masking.
>   * gimple-match.h (gimple_match_op::gimple_match_op): Always
>   initialize len and bias.
>   * internal-fn.cc (vec_cond_mask_len_direct): Add.
>   (direct_vec_cond_mask_len_optab_supported_p): Add.
>   (internal_fn_len_index): Add VCOND_MASK_LEN.
>   (internal_fn_mask_index): Ditto.
>   * internal-fn.def (VCOND_MASK_LEN): New internal function.
>   * match.pd: Combine unconditional unary, binary and ternary
>   operations into the respective COND_LEN operations.
>   * optabs.def (OPTAB_D): Add vcond_mask_len optab.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
>   riscv_v.
> ---
>  gcc/config/riscv/autovec.md   | 26 ++
>  gcc/config/riscv/riscv-protos.h   |  3 ++
>  gcc/config/riscv/riscv-v.cc   |  3 +-
>  gcc/doc/md.texi   |  9 
>  gcc/gimple-match-exports.cc   | 13 +++--
>  gcc/gimple-match.h|  6 ++-
>  gcc/internal-fn.cc|  5 ++
>  gcc/internal-fn.def   |  2 +
>  gcc/match.pd  | 51 +++
>  gcc/optabs.def|  1 +
>  gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
>  11 files changed, 114 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index cc4c9596bbf..0a5e4ccb54e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_"
>[(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_"
> +  [(match_operand:V 0 "register_operand")
> +(match_operand: 1 "nonmemory_operand")
> +(match_operand:V 2 "nonmemory_operand")
> +(match_operand:V 3 "autovec_else_operand")
> +(match_operand 4 "autovec_length_operand")
> +(match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +if (satisfies_constraint_Wc1 (operands[1]))
> +  riscv_vector::expand_cond_len_unop (code_for_pred_mov (mode),
> +   operands);
> +else
> +  {
> + /* The order of then and else is opposite to pred_merge.  */
> + rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
> +  operands[1]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
> +   riscv_vector::MERGE_OP_TU,
> +   ops, operands[4]);
> +  }
> +DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -
>  ;;  [BOOL] Select based on masks
>  ;; -
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a1be731c28e..0d0ee5effea 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -359,6 +359,9 @@ enum insn_type : unsigned int
>/* For vmerge, no mask operand, no mask policy operand.  */
>MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with TU policy.  */
> +  

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-03 Thread Robin Dapp
> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
> operands, even if that makes things inconsistent with vcond_mask.
> vcond_mask isn't really a good example to follow, since the operand
> order is not only inconsistent with the IFN, it's also inconsistent
> with the natural if_then_else order.

v4 attached with that changed,  match.pd patterns interleaved as well
as scratch-handling added and VLS modes removed.  Lehua has since pushed
another patch that extends gimple_match_op to 6/7 operands already so
that could be removed as well making the patch even smaller now.

Testsuite on riscv looks good (apart from the mentioned cond_widen...),
still running on aarch64 and x86.  OK if those pass?

Regards
 Robin

Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(CONSTM1_RTX) into just an OP in the presence of length masking this
patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.

It also adds new match patterns that allow the combination of
unconditional unary, binary and ternay operations with the
VCOND_MASK_LEN into a conditional operation if the target supports it.

gcc/ChangeLog:

PR tree-optimization/111760

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type): Add.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Always
initialize len and bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_D): Add vcond_mask_len optab.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
riscv_v.
---
 gcc/config/riscv/autovec.md   | 26 ++
 gcc/config/riscv/riscv-protos.h   |  3 ++
 gcc/config/riscv/riscv-v.cc   |  3 +-
 gcc/doc/md.texi   |  9 
 gcc/gimple-match-exports.cc   | 13 +++--
 gcc/gimple-match.h|  6 ++-
 gcc/internal-fn.cc|  5 ++
 gcc/internal-fn.def   |  2 +
 gcc/match.pd  | 51 +++
 gcc/optabs.def|  1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
 11 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cc4c9596bbf..0a5e4ccb54e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_"
+  [(match_operand:V 0 "register_operand")
+(match_operand: 1 "nonmemory_operand")
+(match_operand:V 2 "nonmemory_operand")
+(match_operand:V 3 "autovec_else_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+if (satisfies_constraint_Wc1 (operands[1]))
+  riscv_vector::expand_cond_len_unop (code_for_pred_mov (mode),
+ operands);
+else
+  {
+   /* The order of then and else is opposite to pred_merge.  */
+   rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
+operands[1]};
+   riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
+ riscv_vector::MERGE_OP_TU,
+ ops, operands[4]);
+  }
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -
 ;;  [BOOL] Select based on masks
 ;; -
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a1be731c28e..0d0ee5effea 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -359,6 +359,9 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with TU policy.  */
+  MERGE_OP_TU = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P | TU_POLICY_P,
+
   /* For vm, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = 

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-03 Thread Richard Sandiford
Robin Dapp  writes:
>> Could you explain why a special expansion is needed?  (Sorry if you already
>> have and I missed it, bit overloaded ATM.)  What does it do that is
>> different from what expand_fn_using_insn would do?
>
> All it does (in excess) is shuffle the arguments - vcond_mask_len has the
> mask as third operand similar to vcond_mask while vec_cond has the mask
> first.  I can swap them in the IFN already but when not swapping we will
> either be inconsistent with vec_cond or with vcond_mask.

Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
operands, even if that makes things inconsistent with vcond_mask.
vcond_mask isn't really a good example to follow, since the operand
order is not only inconsistent with the IFN, it's also inconsistent
with the natural if_then_else order.

Thanks,
Richard



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-03 Thread Robin Dapp
> Could you explain why a special expansion is needed?  (Sorry if you already
> have and I missed it, bit overloaded ATM.)  What does it do that is
> different from what expand_fn_using_insn would do?

All it does (in excess) is shuffle the arguments - vcond_mask_len has the
mask as third operand similar to vcond_mask while vec_cond has the mask
first.  I can swap them in the IFN already but when not swapping we will
either be inconsistent with vec_cond or with vcond_mask.

Regards
 Robin


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-02 Thread Richard Sandiford
Robin Dapp  writes:
>> Looks reasonable overall.  The new match patterns are 1:1 the
>> same as the COND_ ones.  That's a bit awkward, but I don't see
>> a good way to "macroize" stuff further there.  Can you at least
>> interleave the COND_LEN_* ones with the other ones instead of
>> putting them all at the end?
>
> Yes, no problem.  It's supposed to be only temporary anyway (FWIW)
> as I didn't manage with the "stripping _LEN" way on the first few tries.
> Still on the todo list but unlikely to be done before stage 1 closes.
>
> I believe Richard "kind of" LGTM'ed the rest minus the spurious
> pattern (which is gone now) but there is still the direct optab change
> that he didn't comment on so I think we should wait for his remarks
> still.

Could you explain why a special expansion is needed?  (Sorry if you already
have and I missed it, bit overloaded ATM.)  What does it do that is
different from what expand_fn_using_insn would do?

Thanks,
Richard



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-02 Thread Robin Dapp
> Looks reasonable overall.  The new match patterns are 1:1 the
> same as the COND_ ones.  That's a bit awkward, but I don't see
> a good way to "macroize" stuff further there.  Can you at least
> interleave the COND_LEN_* ones with the other ones instead of
> putting them all at the end?

Yes, no problem.  It's supposed to be only temporary anyway (FWIW)
as I didn't manage with the "stripping _LEN" way on the first few tries.
Still on the todo list but unlikely to be done before stage 1 closes.

I believe Richard "kind of" LGTM'ed the rest minus the spurious
pattern (which is gone now) but there is still the direct optab change
that he didn't comment on so I think we should wait for his remarks
still.

Regards
 Robin



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-02 Thread Richard Biener
On Thu, 26 Oct 2023, Robin Dapp wrote:

> Ok, next try.  Now without dubious pattern and with direct optab
> but still dedicated expander function.
> 
> This will cause one riscv regression in cond_widen_reduc-2.c that
> we can deal with later.  It is just a missed optimization where
> we do not combine something that we used to because of the
> now-present length masking.
> 
> I'd also like to postpone handling vcond_mask_len simplifications
> via stripping the length and falling back to vec_cond and its fold
> patterns to a later time.  As is, this helps us avoid execution
> failures in at least five test cases.
> 
> Bootstrap et al. running on x86, aarch64 and power10.

Looks reasonable overall.  The new match patterns are 1:1 the
same as the COND_ ones.  That's a bit awkward, but I don't see
a good way to "macroize" stuff further there.  Can you at least
interleave the COND_LEN_* ones with the other ones instead of
putting them all at the end?

Thanks,
Richard.


> Regards
>  Robin
> 
> From 7acdebb5b13b71331621af08da6649fe08476fe8 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 25 Oct 2023 22:19:43 +0200
> Subject: [PATCH v3] internal-fn: Add VCOND_MASK_LEN.
> 
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.
> 
> It also adds new match patterns that allow the combination of
> unconditional unary, binary and ternay operations with the
> VCOND_MASK_LEN into a conditional operation if the target supports it.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/111760
> 
>   * config/riscv/autovec.md (vcond_mask_len_): Add
>   expander.
>   * config/riscv/riscv-protos.h (enum insn_type): Add.
>   * doc/md.texi: Add vcond_mask_len.
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Create VCOND_MASK_LEN when
>   length masking.
>   * gimple-match.h (gimple_match_op::gimple_match_op): Allow
>   matching of 6 and 7 parameters.
>   (gimple_match_op::set_op): Ditto.
>   (gimple_match_op::gimple_match_op): Always initialize len and
>   bias.
>   * internal-fn.cc (vec_cond_mask_len_direct): Add.
>   (expand_vec_cond_mask_len_optab_fn): Add.
>   (direct_vec_cond_mask_len_optab_supported_p): Add.
>   (internal_fn_len_index): Add VCOND_MASK_LEN.
>   (internal_fn_mask_index): Ditto.
>   * internal-fn.def (VCOND_MASK_LEN): New internal function.
>   * match.pd: Combine unconditional unary, binary and ternary
>   operations into the respective COND_LEN operations.
>   * optabs.def (OPTAB_D): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md | 37 
>  gcc/config/riscv/riscv-protos.h |  5 +++
>  gcc/doc/md.texi |  9 
>  gcc/gimple-match-exports.cc | 13 --
>  gcc/gimple-match.h  | 78 -
>  gcc/internal-fn.cc  | 42 ++
>  gcc/internal-fn.def |  2 +
>  gcc/match.pd| 61 ++
>  gcc/optabs.def  |  1 +
>  9 files changed, 243 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 80910ba3cc2..dadb71c1165 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_"
>[(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_"
> +  [(match_operand:V_VLS 0 "register_operand")
> +(match_operand: 3 "nonmemory_operand")
> +(match_operand:V_VLS 1 "nonmemory_operand")
> +(match_operand:V_VLS 2 "autovec_else_operand")
> +(match_operand 4 "autovec_length_operand")
> +(match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +if (satisfies_constraint_Wc1 (operands[3]))
> +  {
> + rtx ops[] = {operands[0], operands[2], operands[1]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> +   riscv_vector::UNARY_OP_TUMA,
> +   ops, operands[4]);
> +  }
> +else if (satisfies_constraint_Wc0 (operands[3]))
> +  {
> + rtx ops[] = {operands[0], operands[2], operands[2]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> +   riscv_vector::UNARY_OP_TUMA,
> +   ops, operands[4]);
> +  }
> +else
> +  {
> + /* The order of vcond_mask is opposite to pred_merge.  */
> + rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
> +  operands[3]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
> +   riscv_vector::MERGE_OP_TUMA,
> +   ops, 

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-26 Thread Robin Dapp
> +(define_expand "vcond_mask_len_"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand: 3 "nonmemory_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "autovec_else_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> 
> I think you should change V_VLS into V since we never apply partial 
> vectorization (predicated by length)
> on VLSmodes.  VLSmodes are the modes used on GNU vector/SLP/SIMD 
> vectorizations.

Right, thanks.  That was likely copy and paste from vcond_mask.
Changed it and re-tested but not going to send another version
before more changes are requested.

Regards
 Robin


Re: Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-26 Thread 钟居哲
+(define_expand "vcond_mask_len_"
+  [(match_operand:V_VLS 0 "register_operand")
+(match_operand: 3 "nonmemory_operand")
+(match_operand:V_VLS 1 "nonmemory_operand")
+(match_operand:V_VLS 2 "autovec_else_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]

I think you should change V_VLS into V since we never apply partial 
vectorization (predicated by length)
on VLSmodes.  VLSmodes are the modes used on GNU vector/SLP/SIMD vectorizations.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-26 22:02
To: richard.sandiford
CC: rdapp.gcc; gcc-patches; rguenther; juzhe.zh...@rivai.ai
Subject: Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
Ok, next try.  Now without dubious pattern and with direct optab
but still dedicated expander function.
 
This will cause one riscv regression in cond_widen_reduc-2.c that
we can deal with later.  It is just a missed optimization where
we do not combine something that we used to because of the
now-present length masking.
 
I'd also like to postpone handling vcond_mask_len simplifications
via stripping the length and falling back to vec_cond and its fold
patterns to a later time.  As is, this helps us avoid execution
failures in at least five test cases.
 
Bootstrap et al. running on x86, aarch64 and power10.
 
Regards
Robin
 
From 7acdebb5b13b71331621af08da6649fe08476fe8 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 25 Oct 2023 22:19:43 +0200
Subject: [PATCH v3] internal-fn: Add VCOND_MASK_LEN.
 
In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.
 
It also adds new match patterns that allow the combination of
unconditional unary, binary and ternay operations with the
VCOND_MASK_LEN into a conditional operation if the target supports it.
 
gcc/ChangeLog:
 
PR tree-optimization/111760
 
* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type): Add.
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when
length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Allow
matching of 6 and 7 parameters.
(gimple_match_op::set_op): Ditto.
(gimple_match_op::gimple_match_op): Always initialize len and
bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(expand_vec_cond_mask_len_optab_fn): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_D): Add vcond_mask_len optab.
---
gcc/config/riscv/autovec.md | 37 
gcc/config/riscv/riscv-protos.h |  5 +++
gcc/doc/md.texi |  9 
gcc/gimple-match-exports.cc | 13 --
gcc/gimple-match.h  | 78 -
gcc/internal-fn.cc  | 42 ++
gcc/internal-fn.def |  2 +
gcc/match.pd| 61 ++
gcc/optabs.def  |  1 +
9 files changed, 243 insertions(+), 5 deletions(-)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..dadb71c1165 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_"
   [(set_attr "type" "vector")]
)
+(define_expand "vcond_mask_len_"
+  [(match_operand:V_VLS 0 "register_operand")
+(match_operand: 3 "nonmemory_operand")
+(match_operand:V_VLS 1 "nonmemory_operand")
+(match_operand:V_VLS 2 "autovec_else_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+if (satisfies_constraint_Wc1 (operands[3]))
+  {
+ rtx ops[] = {operands[0], operands[2], operands[1]};
+ riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
+   riscv_vector::UNARY_OP_TUMA,
+   ops, operands[4]);
+  }
+else if (satisfies_constraint_Wc0 (operands[3]))
+  {
+ rtx ops[] = {operands[0], operands[2], operands[2]};
+ riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
+   riscv_vector::UNARY_OP_TUMA,
+   ops, operands[4]);
+  }
+else
+  {
+ /* The order of vcond_mask is opposite to pred_merge.  */
+ rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
+  operands[3]};
+ riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
+   riscv_vector::MERGE_OP_TUMA,
+   ops, operands[

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-26 Thread Robin Dapp
Ok, next try.  Now without dubious pattern and with direct optab
but still dedicated expander function.

This will cause one riscv regression in cond_widen_reduc-2.c that
we can deal with later.  It is just a missed optimization where
we do not combine something that we used to because of the
now-present length masking.

I'd also like to postpone handling vcond_mask_len simplifications
via stripping the length and falling back to vec_cond and its fold
patterns to a later time.  As is, this helps us avoid execution
failures in at least five test cases.

Bootstrap et al. running on x86, aarch64 and power10.

Regards
 Robin

>From 7acdebb5b13b71331621af08da6649fe08476fe8 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 25 Oct 2023 22:19:43 +0200
Subject: [PATCH v3] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.

It also adds new match patterns that allow the combination of
unconditional unary, binary and ternay operations with the
VCOND_MASK_LEN into a conditional operation if the target supports it.

gcc/ChangeLog:

PR tree-optimization/111760

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type): Add.
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when
length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Allow
matching of 6 and 7 parameters.
(gimple_match_op::set_op): Ditto.
(gimple_match_op::gimple_match_op): Always initialize len and
bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(expand_vec_cond_mask_len_optab_fn): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_D): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md | 37 
 gcc/config/riscv/riscv-protos.h |  5 +++
 gcc/doc/md.texi |  9 
 gcc/gimple-match-exports.cc | 13 --
 gcc/gimple-match.h  | 78 -
 gcc/internal-fn.cc  | 42 ++
 gcc/internal-fn.def |  2 +
 gcc/match.pd| 61 ++
 gcc/optabs.def  |  1 +
 9 files changed, 243 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..dadb71c1165 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_"
+  [(match_operand:V_VLS 0 "register_operand")
+(match_operand: 3 "nonmemory_operand")
+(match_operand:V_VLS 1 "nonmemory_operand")
+(match_operand:V_VLS 2 "autovec_else_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+if (satisfies_constraint_Wc1 (operands[3]))
+  {
+   rtx ops[] = {operands[0], operands[2], operands[1]};
+   riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
+ riscv_vector::UNARY_OP_TUMA,
+ ops, operands[4]);
+  }
+else if (satisfies_constraint_Wc0 (operands[3]))
+  {
+   rtx ops[] = {operands[0], operands[2], operands[2]};
+   riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
+ riscv_vector::UNARY_OP_TUMA,
+ ops, operands[4]);
+  }
+else
+  {
+   /* The order of vcond_mask is opposite to pred_merge.  */
+   rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
+operands[3]};
+   riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
+ riscv_vector::MERGE_OP_TUMA,
+ ops, operands[4]);
+  }
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -
 ;;  [BOOL] Select based on masks
 ;; -
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 668d75043ca..0a54e4ff022 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum 

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-26 Thread Robin Dapp
> Yeah. I think Robin may need this :
> 
> TREE_CODE (else_val) == SSA_NAAME
>   && SSA_NAME_IS_DEFAULT_DEF (else_val)
>   && VAR_P (SSA_NAME_VAR (else_val))
> 
> to differentiate whether the ELSE VALUE is uninitialized SSA or not.

I think we are talking about a different simplification now.
This one we could still add as a match.pd pattern simplifying every
conditional operation with an undefined else value.

I just re-checked - without my pattern that turns
VCOND_MASK_LEN into VEC_COND there is only one additional fail.
(cond_widen_reduc-2.c where we scan for vfwreduc).
I guess I can just change the combine pattern to combine cond
as well as length masking (merge + if_then_else) when the else
value is similar in both.  Then we would avoid my dubious
simplification and still get rid of the execution failures.

Surely Richard is right in that we cannot "unconditionally" fold
away the length but my naive hunch is that we currently never
create situations where this really leads to errors.

Regards
 Robin



Re: Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-25 Thread 钟居哲
Yeah. I think Robin may need this :

TREE_CODE (else_val) == SSA_NAAME
  && SSA_NAME_IS_DEFAULT_DEF (else_val)
  && VAR_P (SSA_NAME_VAR (else_val))
to differentiate whether the ELSE VALUE is uninitialized SSA or not.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-10-26 06:32
To: 钟居哲
CC: gcc-patches; rdapp.gcc; rguenther
Subject: Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
钟居哲  writes:
>>> Which one is right?
> Hi, Richard. Let me explain this situation.
>
> Both situations are possible. It's depending on the 'ELSE' value whether it 
> is unitialized value.
>
> For reduction case:
>
> for (int i = 0; i < n; i++)
>   result += a[i]
>
> The trailing elements should be well-defined, keep the original value. 
> Otherwise, it will cause run-time issue.
>
> For integer DIV operation:
>
> for (int i = 0; i < n; i++)
>   a[i] = a[i] / b[i];
>
> The trailling elements are DON'T care (or undefined), I will use unitialized 
> value in 'ELSE' value.
> Then later 'expand' stage will expand it into "clobber scratch" RTL.
 
OK, in that case it sounds like we're talking about PR110751.
The gimple semantics are that the COND_LEN operates on all lanes
of the mode (and the md.texi documentation should be fixed).
But if the else value is undefined, we can simplify:
 
  IFN_COND_LEN_IOR (mask, a, 0, undef, len, bias)
 
to "a".
 
That's not really a property of COND_LEN though.  The same thing
applies to plain IFN_COND_IOR.  If the fold reduces to a selection
between two values, and one of them is undefined, we can pick the other.
 
(Although we'd need to think a little carefully about that,
c.f. llvm's distinction between undef and poison.)
 
Thanks,
Richard
 


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-25 Thread Richard Sandiford
钟居哲  writes:
>>> Which one is right?
> Hi, Richard. Let me explain this situation.
>
> Both situations are possible. It's depending on the 'ELSE' value whether it 
> is unitialized value.
>
> For reduction case:
>
> for (int i = 0; i < n; i++)
>   result += a[i]
>
> The trailing elements should be well-defined, keep the original value. 
> Otherwise, it will cause run-time issue.
>
> For integer DIV operation:
>
> for (int i = 0; i < n; i++)
>   a[i] = a[i] / b[i];
>
> The trailling elements are DON'T care (or undefined), I will use unitialized 
> value in 'ELSE' value.
> Then later 'expand' stage will expand it into "clobber scratch" RTL.

OK, in that case it sounds like we're talking about PR110751.
The gimple semantics are that the COND_LEN operates on all lanes
of the mode (and the md.texi documentation should be fixed).
But if the else value is undefined, we can simplify:

  IFN_COND_LEN_IOR (mask, a, 0, undef, len, bias)

to "a".

That's not really a property of COND_LEN though.  The same thing
applies to plain IFN_COND_IOR.  If the fold reduces to a selection
between two values, and one of them is undefined, we can pick the other.

(Although we'd need to think a little carefully about that,
c.f. llvm's distinction between undef and poison.)

Thanks,
Richard


[PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-25 Thread 钟居哲

>> Which one is right?
Hi, Richard. Let me explain this situation.

Both situations are possible. It's depending on the 'ELSE' value whether it is 
unitialized value.

For reduction case:

for (int i = 0; i < n; i++)
  result += a[i]

The trailing elements should be well-defined, keep the original value. 
Otherwise, it will cause run-time issue.

For integer DIV operation:

for (int i = 0; i < n; i++)
  a[i] = a[i] / b[i];

The trailling elements are DON'T care (or undefined), I will use unitialized 
value in 'ELSE' value.
Then later 'expand' stage will expand it into "clobber scratch" RTL.

 Thanks.


juzhe.zh...@rivai.ai


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-25 Thread Richard Sandiford
Robin Dapp  writes:
>> At first, this seemed like an odd place to fold away the length.
>> AFAIK the length in res_op is inherited directly from the original
>> operation, and so it isn't any more redundant after the fold than
>> it was before.  But I suppose the reason for doing it here is that
>> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
>> lengths.  Doing that avoids the need to define an IFN_COND_FOO
>> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
>> I think it deserves a comment.
>
> I think, generally, what I want to cover is a more fundamental thing
> - in length-controlled targets the loop length doesn't change
> throughout a loop and what we normally do is load the right length,
> operate on the maximum length (ignoring tail elements) and store
> the right length.
>
> So, whenever the length is constant it was already determined that
> we operate on exactly this length and length masking is not needed.
> Only when the length is variable and not compile-time constant we need
> to use length masking (and therefore the vec_cond simplification becomes
> invalid).  I think we never e.g. operate on the first "half" of a
> vector, leaving the second half unchanged.  As far as I know such access
> patterns are always done with non-length, "conditional" masking.

In that case, I think we need to nail down what the semantics of
these LEN functions actually are.  There seems to be a discrepancy
between the optab documentation and the internal-fn.cc documentation.

The optab documentation says:

for (i = 0; i < ops[5] + ops[6]; i++)
  op0[i] = op1[i] ? op2[i] @var{op} op3[i] : op4[i];

which leaves trailing elements of op0 in an undefined state.
But internal-fn.cc says:

 for (int i = 0; i < NUNITS; i++)
  {
if (i < LEN + BIAS && COND[i])
  LHS[i] = A[i] CODE B[i];
else
  LHS[i] = ELSE[i];
  }

which leaves all lanes in a well-defined state.  Which one is right?

If the first one is right, then it doesn't seem to matter whether
the length is constant or variable.  We can simplify:

  IFN_COND_LEN_IOR (mask, a, 0, b, len, bias)

to:

  VEC_COND_EXPR 

regardless of the values of len and bias.  We wouldn't then need a
VCOND_MASK_LEN after all.

If the second one is right, then we cannot get rid of the length
unless it is known to be equal to the number of lanes, at least
according to gimple semantics.  Any knowledge about which lanes
"exist" is only exposed in target-dependent code (presumably by
the VSETVL pass).

Thanks,
Richard

> Actually the only problematic cases I found were reduction-like loops
> where the reduction operated on full length rather than the "right" one.
> If a tail element is wrong then, obviously the reduction result is also
> wrong.  From a "loop len" point of view a reduction could have a length
> like len_store.  Then the simplification problem would go away.
>
> In the attached version I removed the hunk you mentioned but added a
> match.pd pattern where all constant-length vcond_mask_len are simplified
> to vec_cond.
>
> /* A VCOND_MASK_LEN with a constant length is just a vec_cond for
>our purposes.  */
> (simplify
>  (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
>   (vec_cond @0 @1 @2))
>
> This works for all of the testsuite (and is basically the same
> thing we have been testing all along with the bogus simplification
> still in place).  Is there any way how to formalize the
> requirement?  Or am I totally wrong and this must never be done?
>
>> Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
>> (I realise IFN_COND_MASK isn't, but that's used differently.)
>
> Right, a conversion optab is not necessary - in the expander function
> all we really do is move the condition from position 1 to 3.  Changing
> the order would mean inconsistency with vec_cond.  If that's acceptable
> I can change it and we can use expand_direct_optab_fn.  For now I kept
> the expander function but used a direct optab.
>
> Regards
>  Robin
>
> From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length a simplification now
> does not result int a VEC_COND but rather a VCOND_MASK_LEN.
>
> For cases where the masks is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternay operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length 

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-25 Thread Robin Dapp
> At first, this seemed like an odd place to fold away the length.
> AFAIK the length in res_op is inherited directly from the original
> operation, and so it isn't any more redundant after the fold than
> it was before.  But I suppose the reason for doing it here is that
> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
> lengths.  Doing that avoids the need to define an IFN_COND_FOO
> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
> I think it deserves a comment.

I think, generally, what I want to cover is a more fundamental thing
- in length-controlled targets the loop length doesn't change
throughout a loop and what we normally do is load the right length,
operate on the maximum length (ignoring tail elements) and store
the right length.

So, whenever the length is constant it was already determined that
we operate on exactly this length and length masking is not needed.
Only when the length is variable and not compile-time constant we need
to use length masking (and therefore the vec_cond simplification becomes
invalid).  I think we never e.g. operate on the first "half" of a
vector, leaving the second half unchanged.  As far as I know such access
patterns are always done with non-length, "conditional" masking.

Actually the only problematic cases I found were reduction-like loops
where the reduction operated on full length rather than the "right" one.
If a tail element is wrong then, obviously the reduction result is also
wrong.  From a "loop len" point of view a reduction could have a length
like len_store.  Then the simplification problem would go away.

In the attached version I removed the hunk you mentioned but added a
match.pd pattern where all constant-length vcond_mask_len are simplified
to vec_cond.

/* A VCOND_MASK_LEN with a constant length is just a vec_cond for
   our purposes.  */
(simplify
 (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
  (vec_cond @0 @1 @2))

This works for all of the testsuite (and is basically the same
thing we have been testing all along with the bogus simplification
still in place).  Is there any way how to formalize the
requirement?  Or am I totally wrong and this must never be done?

> Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
> (I realise IFN_COND_MASK isn't, but that's used differently.)

Right, a conversion optab is not necessary - in the expander function
all we really do is move the condition from position 1 to 3.  Changing
the order would mean inconsistency with vec_cond.  If that's acceptable
I can change it and we can use expand_direct_optab_fn.  For now I kept
the expander function but used a direct optab.

Regards
 Robin

>From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result int a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the masks is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternay operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type):
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when
length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Allow
matching of 6 and 7 parameters.
(gimple_match_op::set_op): Ditto.
(gimple_match_op::gimple_match_op): Always initialize len and
bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(expand_vec_cond_mask_len_optab_fn): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md | 37 
 gcc/config/riscv/riscv-protos.h |  5 +++
 gcc/doc/md.texi |  9 
 gcc/gimple-match-exports.cc | 13 --
 gcc/gimple-match.h  | 78 -
 

Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-24 Thread Richard Sandiford
Robin Dapp  writes:
> The attached patch introduces a VCOND_MASK_LEN, helps for the riscv cases
> that were broken before and looks unchanged on x86, aarch64 and power
> bootstrap and testsuites.
>
> I only went with the minimal number of new match.pd patterns and did not
> try stripping the length of a COND_LEN_OP in order to simplify the
> associated COND_OP.
>
> An important part that I'm not sure how to handle properly is -
> when we have a constant immediate length of e.g. 16 and the hardware
> also operates on 16 units, vector length masking is actually
> redundant and the vcond_mask_len can be reduced to a vec_cond.
> For those (if_then_else unsplit) we have a large number of combine
> patterns that fuse instruction which do not correspond to ifns
> (like widening operations but also more complex ones).
>
> Currently I achieve this in a most likely wrong way:
>
>   auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
>   bool full_len = len && known_eq (sz.coeffs[0], ilen);
>   if (!len || full_len)
>  "vec_cond"
>   else
>  "vcond_mask_len"

At first, this seemed like an odd place to fold away the length.
AFAIK the length in res_op is inherited directly from the original
operation, and so it isn't any more redundant after the fold than
it was before.  But I suppose the reason for doing it here is that
we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
lengths.  Doing that avoids the need to define an IFN_COND_FOO
equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
I think it deserves a comment.

But yeah, known_eq (sz.coeffs[0], ilen) doesn't look right.
If the target knows that the length is exactly 16 at runtime,
then it should set GET_MODE_NUNITS to 16.  So I think the length
is only redundant if known_eq (sz, ilen).

The calculation should take the bias into account as well.

Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
(I realise IFN_COND_MASK isn't, but that's used differently.)

Failing that, could the expansion use expand_fn_using_insn?

It generally looks OK to me otherwise FWIW, but it would be nice
to handle the fold programmatically in gimple-match*.cc rather
than having the explicit match.pd patterns.  Richi should review
the match.pd stuff though. ;)  (I didn't really look at it.)

Thanks,
Richard

> Another thing not done in this patch:  For vcond_mask we only expect
> register operands as mask and force to a register.  For a vcond_mask_len
> that results from a simplification with all-one or all-zero mask we
> could allow constant immediate vectors and expand them to simple
> len moves in the backend.
>
> Regards
>  Robin
>
> From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length a simplification now
> does not result int a VEC_COND but rather a VCOND_MASK_LEN.
>
> For cases where the masks is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternay operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
>
> gcc/ChangeLog:
>
>   * config/riscv/autovec.md (vcond_mask_len_): Add
>   expander.
>   * config/riscv/riscv-protos.h (enum insn_type):
>   * doc/md.texi: Add vcond_mask_len.
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Create VCOND_MASK_LEN when
>   length masking.
>   * gimple-match.h (gimple_match_op::gimple_match_op): Allow
>   matching of 6 and 7 parameters.
>   (gimple_match_op::set_op): Ditto.
>   (gimple_match_op::gimple_match_op): Always initialize len and
>   bias.
>   * internal-fn.cc (vec_cond_mask_len_direct): Add.
>   (expand_vec_cond_mask_len_optab_fn): Add.
>   (direct_vec_cond_mask_len_optab_supported_p): Add.
>   (internal_fn_len_index): Add VCOND_MASK_LEN.
>   (internal_fn_mask_index): Ditto.
>   * internal-fn.def (VCOND_MASK_LEN): New internal function.
>   * match.pd: Combine unconditional unary, binary and ternary
>   operations into the respective COND_LEN operations.
>   * optabs.def (O

[PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-23 Thread Robin Dapp
The attached patch introduces a VCOND_MASK_LEN, helps for the riscv cases
that were broken before and looks unchanged on x86, aarch64 and power
bootstrap and testsuites.

I only went with the minimal number of new match.pd patterns and did not
try stripping the length of a COND_LEN_OP in order to simplify the
associated COND_OP.

An important part that I'm not sure how to handle properly is -
when we have a constant immediate length of e.g. 16 and the hardware
also operates on 16 units, vector length masking is actually
redundant and the vcond_mask_len can be reduced to a vec_cond.
For those (if_then_else unsplit) we have a large number of combine
patterns that fuse instruction which do not correspond to ifns
(like widening operations but also more complex ones).

Currently I achieve this in a most likely wrong way:

  auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
  bool full_len = len && known_eq (sz.coeffs[0], ilen);
  if (!len || full_len)
 "vec_cond"
  else
 "vcond_mask_len"

Another thing not done in this patch:  For vcond_mask we only expect
register operands as mask and force to a register.  For a vcond_mask_len
that results from a simplification with all-one or all-zero mask we
could allow constant immediate vectors and expand them to simple
len moves in the backend.

Regards
 Robin

>From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result int a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the masks is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternay operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type):
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when
length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Allow
matching of 6 and 7 parameters.
(gimple_match_op::set_op): Ditto.
(gimple_match_op::gimple_match_op): Always initialize len and
bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(expand_vec_cond_mask_len_optab_fn): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md | 20 +
 gcc/config/riscv/riscv-protos.h |  4 ++
 gcc/doc/md.texi |  9 
 gcc/gimple-match-exports.cc | 20 +++--
 gcc/gimple-match.h  | 78 -
 gcc/internal-fn.cc  | 41 +
 gcc/internal-fn.def |  2 +
 gcc/match.pd| 74 +++
 gcc/optabs.def  |  1 +
 9 files changed, 244 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..27a71bc1ef9 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,26 @@ (define_insn_and_split "vcond_mask_"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_"
+  [(match_operand:V_VLS 0 "register_operand")
+(match_operand: 3 "register_operand")
+(match_operand:V_VLS 1 "nonmemory_operand")
+(match_operand:V_VLS 2 "register_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+rtx ops[] = {operands[0], operands[0], operands[2], operands[1],
+operands[3]};
+riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
+