RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-02 Thread Li, Pan2
> No, The isel would only be for the scalar, The vectorizer will still use the 
> vect_pattern.
> It needs to so we can cost the operation correctly, and in some cases 
> depending on how
> the saturation is described you are unable the vectorize.  The pattern allows 
> us to catch
> these cases and still vectorize.

> But you should be able to use the same match.pd predicate for both the 
> vectorizer pattern
> and isel.

Thanks. Got it, will have a try isel for scalar and remove the simplify in 
match.md.

> Eventually we do need to recognize this variant since:
>
> uint64_t
> add_sat(uint64_t x, uint64_t y) noexcept
> {
> uint64_t z;
> if (!__builtin_add_overflow(x, y, ))
>   return z;
> return -1u;
> }
> 
> Is a valid and common way to do saturation too.

Sure thing, will cover this later.

Pan

-Original Message-
From: Tamar Christina  
Sent: Thursday, May 2, 2024 8:58 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

> > So he was responding for how to do it for the vectorizer and scalar parts.
> > Remember that the goal is not to introduce new gimple IL that can block 
> > other
> optimizations.
> > The vectorizer already introduces new IL (various IFN) but this is fine as 
> > we don't
> track things like ranges for
> > vector instructions.  So we don't loose any information here.
> 
> > Now for the scalar, if we do an early replacement like in match.pd we 
> > prevent a
> lot of other optimizations
> > because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late,
> and so at this point we don't
> > expect many more optimizations to happen, so it's a safe spot to insert 
> > more IL
> with "unknown semantics".
> 
> > Was that your intention Richi?
> 
> Thanks Tamar for clear explanation, does that mean both the scalar and vector 
> will
> go isel approach? If so I may
> misunderstand in previous that it is only for vectorize.

No, The isel would only be for the scalar, The vectorizer will still use the 
vect_pattern.
It needs to so we can cost the operation correctly, and in some cases depending 
on how
the saturation is described you are unable the vectorize.  The pattern allows 
us to catch
these cases and still vectorize.

But you should be able to use the same match.pd predicate for both the 
vectorizer pattern
and isel.

> 
> Understand the point that we would like to put the pattern match late but I 
> may
> have a question here.
> Given SAT_ADD related pattern is sort of complicated, it is possible that the 
> sub-
> expression of SAT_ADD is optimized
> In early pass by others and we can hardly catch the shapes later.
> 
> For example, there is a plus expression in SAT_ADD, and in early pass it may 
> be
> optimized to .ADD_OVERFLOW, and
> then the pattern is quite different to aware of that in later pass.
> 

Yeah, it looks like this transformation is done in widening_mul, which is the 
other
place richi suggested to recognize SAT_ADD.  widening_mul already runs quite
late as well so it's also ok.

If you put it there before the code that transforms the sequence to overflow it
should work.

Eventually we do need to recognize this variant since:

uint64_t
add_sat(uint64_t x, uint64_t y) noexcept
{
uint64_t z;
if (!__builtin_add_overflow(x, y, ))
return z;
return -1u;
}

Is a valid and common way to do saturation too.

But for now, it's fine.

Cheers,
Tamar

> Sorry not sure if my understanding is correct, feel free to correct me.
> 
> Pan
> 
> -Original Message-----
> From: Tamar Christina 
> Sent: Thursday, May 2, 2024 11:26 AM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> > -----Original Message-
> > From: Li, Pan2 
> > Sent: Thursday, May 2, 2024 4:11 AM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > Liu, Hongtao 
> > Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> >
> > Thanks Tamar
> >
> > > Could you also split off the vectorizer change from scalar recog one? 
> > > Typically I
> > would structure a change like this as:
> >
> > > 1. create types/structures + scalar recogn
> > > 2. Vector recog code
> > > 3. Backend changes
> >
> > Sure thing, will rearrange the patch like t

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-02 Thread Tamar Christina
> > So he was responding for how to do it for the vectorizer and scalar parts.
> > Remember that the goal is not to introduce new gimple IL that can block 
> > other
> optimizations.
> > The vectorizer already introduces new IL (various IFN) but this is fine as 
> > we don't
> track things like ranges for
> > vector instructions.  So we don't loose any information here.
> 
> > Now for the scalar, if we do an early replacement like in match.pd we 
> > prevent a
> lot of other optimizations
> > because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late,
> and so at this point we don't
> > expect many more optimizations to happen, so it's a safe spot to insert 
> > more IL
> with "unknown semantics".
> 
> > Was that your intention Richi?
> 
> Thanks Tamar for clear explanation, does that mean both the scalar and vector 
> will
> go isel approach? If so I may
> misunderstand in previous that it is only for vectorize.

No, The isel would only be for the scalar, The vectorizer will still use the 
vect_pattern.
It needs to so we can cost the operation correctly, and in some cases depending 
on how
the saturation is described you are unable the vectorize.  The pattern allows 
us to catch
these cases and still vectorize.

But you should be able to use the same match.pd predicate for both the 
vectorizer pattern
and isel.

> 
> Understand the point that we would like to put the pattern match late but I 
> may
> have a question here.
> Given SAT_ADD related pattern is sort of complicated, it is possible that the 
> sub-
> expression of SAT_ADD is optimized
> In early pass by others and we can hardly catch the shapes later.
> 
> For example, there is a plus expression in SAT_ADD, and in early pass it may 
> be
> optimized to .ADD_OVERFLOW, and
> then the pattern is quite different to aware of that in later pass.
> 

Yeah, it looks like this transformation is done in widening_mul, which is the 
other
place richi suggested to recognize SAT_ADD.  widening_mul already runs quite
late as well so it's also ok.

If you put it there before the code that transforms the sequence to overflow it
should work.

Eventually we do need to recognize this variant since:

uint64_t
add_sat(uint64_t x, uint64_t y) noexcept
{
uint64_t z;
if (!__builtin_add_overflow(x, y, ))
return z;
return -1u;
}

Is a valid and common way to do saturation too.

But for now, it's fine.

Cheers,
Tamar

> Sorry not sure if my understanding is correct, feel free to correct me.
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Thursday, May 2, 2024 11:26 AM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> > -Original Message-
> > From: Li, Pan2 
> > Sent: Thursday, May 2, 2024 4:11 AM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > Liu, Hongtao 
> > Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> >
> > Thanks Tamar
> >
> > > Could you also split off the vectorizer change from scalar recog one? 
> > > Typically I
> > would structure a change like this as:
> >
> > > 1. create types/structures + scalar recogn
> > > 2. Vector recog code
> > > 3. Backend changes
> >
> > Sure thing, will rearrange the patch like this.
> >
> > > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> > version
> > > can set flags/throw exceptions if the saturation happens?
> >
> > I see, will remove that.
> >
> > > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> >
> > > The problem with doing it in match.pd is that it replaces the operations 
> > > quite
> > > early the pipeline. Did I miss an email perhaps? The early replacement 
> > > means
> we
> > > lose optimizations and things such as range calculations etc, since e.g. 
> > > ranger
> > > doesn't know these internal functions.
> >
> > > I think Richi will want this in islet or mult widening but I'll continue 
> > > with
> match.pd
> > > review just in case.
> >
> > If I understand is correct, Richard suggested try vectorizer patterns first 
> > and then
> > possible isel.
> > Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works 
> > well for
> > SAT_ADD.
> > Let's wait 

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-02 Thread Li, Pan2
> So he was responding for how to do it for the vectorizer and scalar parts.
> Remember that the goal is not to introduce new gimple IL that can block other 
> optimizations.
> The vectorizer already introduces new IL (various IFN) but this is fine as we 
> don't track things like ranges for
> vector instructions.  So we don't loose any information here.

> Now for the scalar, if we do an early replacement like in match.pd we prevent 
> a lot of other optimizations
> because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late, 
> and so at this point we don't
> expect many more optimizations to happen, so it's a safe spot to insert more 
> IL with "unknown semantics".

> Was that your intention Richi?

Thanks Tamar for clear explanation, does that mean both the scalar and vector 
will go isel approach? If so I may
misunderstand in previous that it is only for vectorize.

Understand the point that we would like to put the pattern match late but I may 
have a question here.
Given SAT_ADD related pattern is sort of complicated, it is possible that the 
sub-expression of SAT_ADD is optimized
In early pass by others and we can hardly catch the shapes later.

For example, there is a plus expression in SAT_ADD, and in early pass it may be 
optimized to .ADD_OVERFLOW, and
then the pattern is quite different to aware of that in later pass.

Sorry not sure if my understanding is correct, feel free to correct me.

Pan

-Original Message-
From: Tamar Christina  
Sent: Thursday, May 2, 2024 11:26 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

> -Original Message-
> From: Li, Pan2 
> Sent: Thursday, May 2, 2024 4:11 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Thanks Tamar
> 
> > Could you also split off the vectorizer change from scalar recog one? 
> > Typically I
> would structure a change like this as:
> 
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
> 
> Sure thing, will rearrange the patch like this.
> 
> > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> version
> > can set flags/throw exceptions if the saturation happens?
> 
> I see, will remove that.
> 
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> 
> > The problem with doing it in match.pd is that it replaces the operations 
> > quite
> > early the pipeline. Did I miss an email perhaps? The early replacement 
> > means we
> > lose optimizations and things such as range calculations etc, since e.g. 
> > ranger
> > doesn't know these internal functions.
> 
> > I think Richi will want this in islet or mult widening but I'll continue 
> > with match.pd
> > review just in case.
> 
> If I understand is correct, Richard suggested try vectorizer patterns first 
> and then
> possible isel.
> Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works 
> well for
> SAT_ADD.
> Let's wait the confirmation from Richard. Below are the original words from
> previous mail for reference.
> 

I think the comment he made was this

> > Given we have saturating integer alu like below, could you help to coach me 
> > the most reasonable way to represent
> > It in scalar as well as vectorize part? Sorry not familiar with this part 
> > and still dig into how it works...
> 
> As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> the other cases.
>
> As I said, use vectorizer patterns and possibly do instruction
> selection at ISEL/widen_mult time.

So he was responding for how to do it for the vectorizer and scalar parts.
Remember that the goal is not to introduce new gimple IL that can block other 
optimizations.
The vectorizer already introduces new IL (various IFN) but this is fine as we 
don't track things like ranges for
vector instructions.  So we don't loose any information here.

Now for the scalar, if we do an early replacement like in match.pd we prevent a 
lot of other optimizations
because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late, 
and so at this point we don't
expect many more optimizations to happen, so it's a safe spot to insert more IL 
with "unknown semantics".

Was that your intention Richi?

Thanks,
Tamar

> >> As I said, use vectorizer patterns and possibly do instruction
> >> selection at ISEL/widen_mult ti

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-01 Thread Tamar Christina
> -Original Message-
> From: Li, Pan2 
> Sent: Thursday, May 2, 2024 4:11 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Thanks Tamar
> 
> > Could you also split off the vectorizer change from scalar recog one? 
> > Typically I
> would structure a change like this as:
> 
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
> 
> Sure thing, will rearrange the patch like this.
> 
> > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> version
> > can set flags/throw exceptions if the saturation happens?
> 
> I see, will remove that.
> 
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> 
> > The problem with doing it in match.pd is that it replaces the operations 
> > quite
> > early the pipeline. Did I miss an email perhaps? The early replacement 
> > means we
> > lose optimizations and things such as range calculations etc, since e.g. 
> > ranger
> > doesn't know these internal functions.
> 
> > I think Richi will want this in islet or mult widening but I'll continue 
> > with match.pd
> > review just in case.
> 
> If I understand is correct, Richard suggested try vectorizer patterns first 
> and then
> possible isel.
> Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works 
> well for
> SAT_ADD.
> Let's wait the confirmation from Richard. Below are the original words from
> previous mail for reference.
> 

I think the comment he made was this

> > Given we have saturating integer alu like below, could you help to coach me 
> > the most reasonable way to represent
> > It in scalar as well as vectorize part? Sorry not familiar with this part 
> > and still dig into how it works...
> 
> As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> the other cases.
>
> As I said, use vectorizer patterns and possibly do instruction
> selection at ISEL/widen_mult time.

So he was responding for how to do it for the vectorizer and scalar parts.
Remember that the goal is not to introduce new gimple IL that can block other 
optimizations.
The vectorizer already introduces new IL (various IFN) but this is fine as we 
don't track things like ranges for
vector instructions.  So we don't loose any information here.

Now for the scalar, if we do an early replacement like in match.pd we prevent a 
lot of other optimizations
because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late, 
and so at this point we don't
expect many more optimizations to happen, so it's a safe spot to insert more IL 
with "unknown semantics".

Was that your intention Richi?

Thanks,
Tamar

> >> As I said, use vectorizer patterns and possibly do instruction
> >> selection at ISEL/widen_mult time.
> 
> > The optimize checks in the match.pd file are weird as it seems to check if 
> > we have
> > optimizations enabled?
> 
> > We don't typically need to do this.
> 
> Sure, will remove this.
> 
> > The function has only one caller, you should just inline it into the 
> > pattern.
> 
> Sure thing.
> 
> > Once you inline vect_sat_add_build_call you can do the check for
> > vtype here, which is the cheaper check so perform it early.
> 
> Sure thing.
> 
> Thanks again and will send the v4 with all comments addressed, as well as the 
> test
> results.
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Thursday, May 2, 2024 1:06 AM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Hi,
> 
> > From: Pan Li 
> >
> > Update in v3:
> > * Rebase upstream for conflict.
> >
> > Update in v2:
> > * Fix one failure for x86 bootstrap.
> >
> > Original log:
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > The p

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-01 Thread Li, Pan2
Thanks Tamar

> Could you also split off the vectorizer change from scalar recog one? 
> Typically I would structure a change like this as:

> 1. create types/structures + scalar recogn
> 2. Vector recog code
> 3. Backend changes

Sure thing, will rearrange the patch like this.

> Is ECF_NOTHROW correct here? At least on most targets I believe the scalar 
> version
> can set flags/throw exceptions if the saturation happens?

I see, will remove that.

> Hmm I believe Richi mentioned that he wanted the recognition done in isel?

> The problem with doing it in match.pd is that it replaces the operations quite
> early the pipeline. Did I miss an email perhaps? The early replacement means 
> we
> lose optimizations and things such as range calculations etc, since e.g. 
> ranger
> doesn't know these internal functions.

> I think Richi will want this in islet or mult widening but I'll continue with 
> match.pd
> review just in case.

If I understand is correct, Richard suggested try vectorizer patterns first and 
then possible isel.
Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works well 
for SAT_ADD.
Let's wait the confirmation from Richard. Below are the original words from 
previous mail for reference.

>> As I said, use vectorizer patterns and possibly do instruction
>> selection at ISEL/widen_mult time.

> The optimize checks in the match.pd file are weird as it seems to check if we 
> have
> optimizations enabled?

> We don't typically need to do this.

Sure, will remove this.

> The function has only one caller, you should just inline it into the pattern.

Sure thing.

> Once you inline vect_sat_add_build_call you can do the check for
> vtype here, which is the cheaper check so perform it early.

Sure thing.

Thanks again and will send the v4 with all comments addressed, as well as the 
test results.

Pan

-Original Message-
From: Tamar Christina  
Sent: Thursday, May 2, 2024 1:06 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

Hi,

> From: Pan Li 
> 
> Update in v3:
> * Rebase upstream for conflict.
> 
> Update in v2:
> * Fix one failure for x86 bootstrap.
> 
> Original log:
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> The patch also implement the SAT_ADD in the riscv backend as
> the sample for both the scalar and vector.  Given below example:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;succ:   EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;succ:   EXIT
> }
> 
> For vectorize, we leverage the existing vect pattern recog to find
> the pattern similar to scalar and let the vectorizer to perform
> the rest part for standard name usadd3 in vector mode.
> The riscv vector backend have insn "Vector Single-Width Saturating
> Add and Subtract" which can be leveraged when expand the usadd3
> in vector mode.  For example:
> 
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
> 
>   for (i = 0; i < n; i++)
> out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
> 
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (ve

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-01 Thread Tamar Christina
Hi,

> From: Pan Li 
> 
> Update in v3:
> * Rebase upstream for conflict.
> 
> Update in v2:
> * Fix one failure for x86 bootstrap.
> 
> Original log:
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> The patch also implement the SAT_ADD in the riscv backend as
> the sample for both the scalar and vector.  Given below example:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;succ:   EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;succ:   EXIT
> }
> 
> For vectorize, we leverage the existing vect pattern recog to find
> the pattern similar to scalar and let the vectorizer to perform
> the rest part for standard name usadd3 in vector mode.
> The riscv vector backend have insn "Vector Single-Width Saturating
> Add and Subtract" which can be leveraged when expand the usadd3
> in vector mode.  For example:
> 
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
> 
>   for (i = 0; i < n; i++)
> out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
> 
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
> ... }, vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
> 
> vec_sat_add_u64:
>   ...
>   vsetvli a5,a3,e64,m1,ta,ma
>   vle64.v v0,0(a1)
>   vle64.v v1,0(a2)
>   sllia4,a5,3
>   sub a3,a3,a5
>   add a1,a1,a4
>   add a2,a2,a4
>   vadd.vv v1,v0,v1
>   vmsgtu.vv   v0,v0,v1
>   vmerge.vim  v1,v1,-1,v0
>   vse64.v v1,0(a0)
>   ...
> 
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
> 
> vec_sat_add_u64:
>   ...
>   vsetvli a5,a3,e64,m1,ta,ma
>   vle64.v v1,0(a1)
>   vle64.v v2,0(a2)
>   sllia4,a5,3
>   sub a3,a3,a5
>   add a1,a1,a4
>   add a2,a2,a4
>   vsaddu.vv   v1,v1,v2
>   vse64.v v1,0(a0)
>   ...
> 
> To limit the patch size for review, only unsigned version of
> usadd3 are involved here. The signed version will be covered
> in the underlying patch(es).
> 
> The below test suites are passed for this patch.
> * The riscv fully regression tests.
> * The aarch64 fully regression tests.
> * The x86 bootstrap tests.
> * The x86 fully regression tests.
> 
>   PR target/51492
>   PR target/112600
> 
> gcc/ChangeLog:
> 
>   * config/riscv/autovec.md (usadd3): New pattern expand
>   for unsigned SAT_ADD vector.
>   * config/riscv/riscv-protos.h (riscv_expand_usadd): New func
>   decl to expand usadd3 pattern.
>   (expand_vec_usadd): Ditto but for vector.
>   * config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
>   emit the vsadd insn.
>   (expand_vec_usadd): New func impl to expand usadd3 for
>   vector.
>   * config/riscv/riscv.cc (riscv_expand_usadd): New func impl
>   to 

[PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-04-29 Thread pan2 . li
From: Pan Li 

Update in v3:
* Rebase upstream for conflict.

Update in v2:
* Fix one failure for x86 bootstrap.

Original log:

This patch would like to add the middle-end presentation for the
saturation add.  Aka set the result of add to the max when overflow.
It will take the pattern similar as below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.

The patch also implement the SAT_ADD in the riscv backend as
the sample for both the scalar and vector.  Given below example:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;succ:   EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;succ:   EXIT
}

For vectorize, we leverage the existing vect pattern recog to find
the pattern similar to scalar and let the vectorizer to perform
the rest part for standard name usadd3 in vector mode.
The riscv vector backend have insn "Vector Single-Width Saturating
Add and Subtract" which can be leveraged when expand the usadd3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, 
vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  sllia4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv   v0,v0,v1
  vmerge.vim  v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  sllia4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vsaddu.vv   v1,v1,v2
  vse64.v v1,0(a0)
  ...

To limit the patch size for review, only unsigned version of
usadd3 are involved here. The signed version will be covered
in the underlying patch(es).

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The aarch64 fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* config/riscv/autovec.md (usadd3): New pattern expand
for unsigned SAT_ADD vector.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
decl to expand usadd3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
emit the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd3 for
vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
to expand usadd3 for scalar.
* config/riscv/riscv.md (usadd3): New pattern expand
for unsigned SAT_ADD scalar.
* config/riscv/vector.md: Allow VLS mode for vsaddu.
* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
* internal-fn.def (SAT_ADD): Add new signed optab