[PATCH 2/2] testcase: rename pr111303.c to pr111324.c

2023-09-18 Thread Jiufu Guo via Gcc-patches
Hi,

When committing the fix for PR111324, the test case was named pr111303.c
by mistake.  This patch renames it to pr111324.c.

Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr111303.c: Rename to ...
* gcc.dg/tree-ssa/pr111324.c: ... this.
---
 gcc/testsuite/gcc.dg/tree-ssa/{pr111303.c => pr111324.c} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.dg/tree-ssa/{pr111303.c => pr111324.c} (100%)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111324.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/tree-ssa/pr111303.c
rename to gcc/testsuite/gcc.dg/tree-ssa/pr111324.c
-- 
2.25.1



[PATCH 1/2] using overflow_free_p to simplify pattern

2023-09-18 Thread Jiufu Guo via Gcc-patches
Hi,

In r14-3582, an "overflow_free_p" interface was added.
The pattern "(t * 2) / 2" in match.pd can be simplified
by using this interface.
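
As a rough illustration of the property being checked (a sketch only, not GCC's overflow_free_p implementation): a product of two ranges cannot overflow if none of the four corner products overflows. The helper name range_mul_overflow_free and the choice of "int" are assumptions made for this example; __builtin_mul_overflow is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only: return true if x * y cannot
   overflow `int` for any x in [min0, max0] and y in [min1, max1].  The
   extremes of a product over two intervals are reached at the corners,
   so checking the four corner products is sufficient.  */
static bool
range_mul_overflow_free (int min0, int max0, int min1, int max1)
{
  int tmp;

  /* Both ranges must be non-empty.  */
  assert (min0 <= max0 && min1 <= max1);

  /* __builtin_mul_overflow returns true when the product overflows.  */
  return !__builtin_mul_overflow (min0, min1, &tmp)
         && !__builtin_mul_overflow (min0, max1, &tmp)
         && !__builtin_mul_overflow (max0, min1, &tmp)
         && !__builtin_mul_overflow (max0, max1, &tmp);
}
```

With ranges such as [0, 100] for both operands the helper returns true, which corresponds to the case where "(t * 2) / 2" may be folded; once an operand may reach INT_MAX it returns false.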

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* match.pd ((t * 2) / 2): Update to use overflow_free_p.

---
 gcc/match.pd | 37 +++--
 1 file changed, 7 insertions(+), 30 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 87edf0e75c3..8bba7056000 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -926,36 +926,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (TYPE_OVERFLOW_UNDEFINED (type))
 @0
 #if GIMPLE
-(with
- {
-   bool overflowed = true;
-   value_range vr0, vr1;
-   if (INTEGRAL_TYPE_P (type)
-  && get_range_query (cfun)->range_of_expr (vr0, @0)
-  && get_range_query (cfun)->range_of_expr (vr1, @1)
-  && !vr0.varying_p () && !vr0.undefined_p ()
-  && !vr1.varying_p () && !vr1.undefined_p ())
-{
-  wide_int wmin0 = vr0.lower_bound ();
-  wide_int wmax0 = vr0.upper_bound ();
-  wide_int wmin1 = vr1.lower_bound ();
-  wide_int wmax1 = vr1.upper_bound ();
-  /* If the multiplication can't overflow/wrap around, then
- it can be optimized too.  */
-  wi::overflow_type min_ovf, max_ovf;
-  wi::mul (wmin0, wmin1, TYPE_SIGN (type), &min_ovf);
-  wi::mul (wmax0, wmax1, TYPE_SIGN (type), &max_ovf);
-  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
-{
-  wi::mul (wmin0, wmax1, TYPE_SIGN (type), &min_ovf);
-  wi::mul (wmax0, wmin1, TYPE_SIGN (type), &max_ovf);
-  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
-overflowed = false;
-}
-}
- }
-(if (!overflowed)
- @0))
+(with {value_range vr0, vr1;}
+ (if (INTEGRAL_TYPE_P (type)
+ && get_range_query (cfun)->range_of_expr (vr0, @0)
+ && get_range_query (cfun)->range_of_expr (vr1, @1)
+ && !vr0.varying_p () && !vr1.varying_p ()
+ && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
+  @0))
 #endif

 
-- 
2.25.1



Ping [PATCH V4 2/2] rs6000: use mtvsrws to move sf from si p9

2023-09-17 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping.

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> As mentioned in PR108338, on p9, we could use mtvsrws to implement
> the bitcast from SI to SF (or lowpart DI to SF).
>
> For code:
>   *(long long*)buff = di;
>   float f = *(float*)(buff);
>
> "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
> A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".
>
> Compared with the previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623533.html
> "highpart DI-->SF" is split into a separate patch.
>
> Pass bootstrap and regression on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws
>   for P9.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
>
> ---
>  gcc/config/rs6000/rs6000.md | 25 -
>  gcc/testsuite/gcc.target/powerpc/pr108338.c |  6 +++--
>  2 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 8c92cbf976de915136ad5dba24e69a363d21438d..c03e677bca79e8fb1acb276d07d0acfae009f6d8 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8280,13 +8280,26 @@ (define_insn_and_split "movsf_from_si"
>  {
>rtx op0 = operands[0];
>rtx op1 = operands[1];
> -  rtx op2 = operands[2];
> -  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>  
> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +  /* Move lowpart 32-bits from register for SFmode.  */
> +  if (TARGET_P9_VECTOR)
> +{
> +  /* Using mtvsrws;xscvspdpn.  */
> +  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
> +  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}
> +  else
> +{
> +  rtx op2 = operands[2];
> +  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
> +
> +  /* Using ashl;mtvsrd;xscvspdpn.  */
> +  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> +  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}
> +
>DONE;
>  }
>[(set_attr "length"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> index 6db65595343c2407fc32f68f5f52a1f7196c371d..0565e5254ed0a8cc579cf505a3f865426dcf62ae 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr108338.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> @@ -19,9 +19,11 @@ float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
>  
>  /* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
>  /* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
> +/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
>  /* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
> -/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  
>  union di_sf_sf
>  {


Ping [PATCH V4 1/2] rs6000: optimize moving to sf from highpart di

2023-09-17 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping.

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> Currently, the pattern "movsf_from_si2" tries to support moving
> the high part of a DI to SF.
>
> The pattern looks like: XX:SF=bitcast:SF(subreg(YY:DI>>32),0)
> It only accepts "ashiftrt" for ">>", but "lshiftrt" is also ok.
> And the offset of "subreg" is hard-coded to 0, which only works for LE.
>
> "movsf_from_si2" is updated to cover BE for "subreg", and to cover
> the logical shift for ":DI>>32".
>
> Pass bootstrap and regression on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>   PR target/108338
>
> gcc/ChangeLog:
>
>   * config/rs6000/predicates.md (lowpart_subreg_operator): New
>   define_predicate.
>   * config/rs6000/rs6000.md (any_rshift): New code_iterator.
>   (movsf_from_si2): Rename to ...
>   (movsf_from_si2_<code>): ... this.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr108338.c: New test.
>
> ---
>  gcc/config/rs6000/predicates.md |  5 +++
>  gcc/config/rs6000/rs6000.md | 11 +++---
>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 40 +
>  3 files changed, 51 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 3552d908e9d149a30993e3e6568466de537336be..e25b3b4864f681d47e9d5c2eb88bcde0aea6d17b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -2098,3 +2098,8 @@ (define_predicate "macho_pic_address"
>else
>  return false;
>  })
> +
> +(define_predicate "lowpart_subreg_operator"
> +  (and (match_code "subreg")
> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
> + == SUBREG_BYTE (op)")))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 1a9a7b1a47918f39fc91038607f21a8ba9a2e740..8c92cbf976de915136ad5dba24e69a363d21438d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8299,18 +8299,19 @@ (define_insn_and_split "movsf_from_si"
>   "*,  *, p9v,   p8v,   *, *,
>p8v,p8v,   p8v,   *")])
>  
> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
> +
>  ;; For extracting high part element from DImode register like:
>  ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>  ;; split it before reload with "and mask" to avoid generating shift right
>  ;; 32 bit then shift left 32 bit.
> -(define_insn_and_split "movsf_from_si2"
> +(define_insn_and_split "movsf_from_si2_<code>"
>[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>   (unspec:SF
> -  [(subreg:SI
> -(ashiftrt:DI
> +  [(match_operator:SI 3 "lowpart_subreg_operator"
> +[(any_rshift:DI
>   (match_operand:DI 1 "input_operand" "r")
> - (const_int 32))
> -0)]
> + (const_int 32))])]
>UNSPEC_SF_FROM_SI))
>(clobber (match_scratch:DI 2 "=r"))]
>"TARGET_NO_SF_SUBREG"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> new file mode 100644
> index ..6db65595343c2407fc32f68f5f52a1f7196c371d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> @@ -0,0 +1,40 @@
> +// { dg-do run }
> +// { dg-options "-O2 -save-temps" }
> +
> +float __attribute__ ((noipa)) sf_from_di_off0 (long long l)
> +{
> +  char buff[16];
> +  *(long long*)buff = l;
> +  float f = *(float*)(buff);
> +  return f;
> +}
> +
> +float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
> +{
> +  char buff[16];
> +  *(long long*)buff = l;
> +  float f = *(float*)(buff + 4);
> +  return f; 
> +}
> +
> +/* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
> +
> +union di_sf_sf
> +{
> +  struct {float f1; float f2;};
> +  long long l;
> +};
> +
> +int main()
> +{
> +  union di_sf_sf v;
> +  v.f1 = 1.0f;
> +  v.f2 = 2.0f;
> +  if (sf_from_di_off0 (v.l) != 1.0f || sf_from_di_off4 (v.l) != 2.0f )
> +__builtin_abort ();
> +  return 0;
> +}


Ping [PATCH] rs6000: mark tieable between INT and FLOAT

2023-09-17 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping.

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> For PowerPC, some INT modes and FLOAT modes can be marked as tieable,
> for example: DI<->DF.
> Note that SFmode is special: it is only tieable with itself.
>
> I updated the previous patch to be more reasonable:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html
>
> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Mark more tieable
>   modes.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/powerpc/pr102024.C: Updated.
>
> ---
>  gcc/config/rs6000/rs6000.cc | 9 +
>  gcc/testsuite/g++.target/powerpc/pr102024.C | 3 ++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6ac3adcec6b..3cb0186089e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1968,6 +1968,15 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>if (ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
>  return false;
>  
> +  /* SFmode format (IEEE DP) in register would not as required,
> + So SFmode is restrict here.  */
> +  if (GET_MODE_CLASS (mode1) == MODE_FLOAT
> +  && GET_MODE_CLASS (mode2) == MODE_INT)
> +return GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD;
> +  if (GET_MODE_CLASS (mode1) == MODE_INT
> +  && GET_MODE_CLASS (mode2) == MODE_FLOAT)
> +return GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD;
> +
>if (SCALAR_FLOAT_MODE_P (mode1))
>  return SCALAR_FLOAT_MODE_P (mode2);
>if (SCALAR_FLOAT_MODE_P (mode2))
> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
> index 769585052b5..27d2dc5e80b 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> @@ -5,7 +5,8 @@
>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>  // generates a psabi warning and passes arguments in GPRs.
>  
> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 { target has_arch_pwr8 } } }
> +// { dg-final { scan-assembler-times {\mstd\M} 4 { target { ! has_arch_pwr8 } } } }
>  
>  struct a_thing
>  {


Re: [PATCH] Checking undefined_p before using the vr

2023-09-14 Thread Jiufu Guo via Gcc-patches


Hi,

Andrew MacLeod  writes:

> On 9/12/23 21:42, Jiufu Guo wrote:
>> Hi,
>>
>> Richard Biener  writes:
>>
>>> On Thu, 7 Sep 2023, Jiufu Guo wrote:
>>>
>>>> Hi,
>>>>
>>>> As discussed in PR111303:
>>>>
>>>> For pattern "(X + C) / N": "div (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)",
>>>> Even if "X" has value-range and "X + C" does not overflow, "@3" may still
>>>> be undefined. Like below example:
>>>>
>>>> _3 = _2 + -5;
>>>> if (0 != 0)
>>>>   goto <bb 3>; [34.00%]
>>>> else
>>>>   goto <bb 4>; [66.00%]
>>>> ;;  succ:   3
>>>> ;;  4
>>>>
>>>> ;; basic block 3, loop depth 0
>>>> ;;  pred:   2
>>>> _5 = _3 / 5;
>>>> ;;  succ:   4
>>>>
>>>> The whole pattern "(_2 + -5 ) / 5" is in "bb 3", but "bb 3" would be
>>>> unreachable (because "if (0 != 0)" is always false).
>>>> And "get_range_query (cfun)->range_of_expr (vr3, @3)" is checked in
>>>> "bb 3", "range_of_expr" gets an "undefined vr3". Where "@3" is "_5".
>>>>
>>>> So, before using "vr3", it would be safe to check "!vr3.undefined_p ()".
>>>>
>>>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>>>> Is this ok for trunk?
>>> OK, but I wonder why ->range_of_expr () doesn't return false for
>>> undefined_p ()?  While "undefined" technically means we can treat
>>> it as nonnegative_p (or not, maybe but maybe not both), we seem to
>>> not want to do that.  So why expose it at all to ranger users
>>> (yes, internally we in some places want to handle undefined).
>> I guess, currently, it returns true and then lets the user check
>> undefined_p, maybe because it tries to only return false if the
>> type of EXPR is unsupported.
>
> false is returned if no range can be calculated for any reason. The
> most common ones are unsupported types or in some cases, statements
> that are not understood.  FALSE means you cannot use the range being
> passed in.

Thanks a lot for the explanation! "false" means no range is returned:
we cannot use the range argument after the call.

>
>
>> Let "range_of_expr" return false for undefined_p would save checking
>> undefined_p again when using the APIs.
>>
> undefined is a perfectly acceptable range.  It can be used to
> represent either values which has not been initialized, or more
> frequently it identifies values that cannot occur due to
> conflicting/unreachable code.  VARYING means it can be any range,
> UNDEFINED means this is unusable, so treat it accordingly.  Its
> propagated like any other range.

"undefined" means the range is unusable. So, for such a range, it
seems only "undefined_p ()" can be checked, and no other functions
of the range can be called.

I'm thinking that it may be ok to let "range_of_expr" return false
if the "vr" is "undefined_p".  I know this may change the meaning
of "range_of_expr" slightly :) 
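
To make the guard concrete, here is a minimal sketch with a toy range type. It is not GCC's value_range: the names simple_range, range_kind, range_undefined_p, range_nonnegative_p and known_nonnegative are all hypothetical. The assert models an operation that is not valid on an undefined range, which is why the caller checks for "undefined" first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a value range (not GCC's type).  */
enum range_kind { RANGE_UNDEFINED, RANGE_VARYING, RANGE_BOUNDED };

struct simple_range
{
  enum range_kind kind;
  long min, max;   /* Only meaningful when kind == RANGE_BOUNDED.  */
};

static bool
range_undefined_p (const struct simple_range *r)
{
  return r->kind == RANGE_UNDEFINED;
}

/* In this toy model, asking about the sign of an undefined range is a
   caller bug, so it is rejected with an assert.  */
static bool
range_nonnegative_p (const struct simple_range *r)
{
  assert (!range_undefined_p (r));
  return r->kind == RANGE_BOUNDED && r->min >= 0;
}

/* Guarded query: check for "undefined" before using the range, in the
   same spirit as adding "&& !vr.undefined_p ()" to a match.pd condition.  */
static bool
known_nonnegative (const struct simple_range *r)
{
  return !range_undefined_p (r) && range_nonnegative_p (r);
}
```

With the guard in place, an undefined range simply makes the condition false instead of reaching the assert.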

>
> The only reason you are having issues is you are then asking for the
> type of the range, and an undefined range currently has no type, for
> historical reasons.

Yep, thanks for pointing this out!

BR,
Jeff (Jiufu Guo)

>
> Andrew


[PATCH] use local range for one more pattern in match.pd

2023-09-14 Thread Jiufu Guo via Gcc-patches
Hi,

With "get_global_range_query", SSA_NAME_RANGE_INFO can be queried.
With "get_range_query", more context-aware range info is available.
Looking at the implementation of "get_range_query", it returns the
global range query if the function has no local one.

ATTRIBUTE_RETURNS_NONNULL inline range_query *
get_range_query (const struct function *fun)
{
  return (fun && fun->x_range_query) ? fun->x_range_query : &global_ranges;
}

So, using "get_range_query" would cover more cases.
For example, the test case "pr111303.c".
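
As a small sanity check of why the fold is valid in that test case (a sketch; the helper names below are made up for illustration): once the guard has established x <= 100 and y <= 100, the unsigned product cannot wrap, so "(x * y) / y" equals "x" for every nonzero y. Without range info the fold would be wrong, as the wrapping case shows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if "(x * y) / y" yields x; y must be nonzero.  */
static bool
mul_div_roundtrips (unsigned int x, unsigned int y)
{
  assert (y != 0);
  return (x * y) / y == x;
}

/* Count the pairs inside the guarded range (x <= 100, y <= 100, skipping
   y == 0) for which folding "(x * y) / y" to "x" would be wrong.  */
static unsigned int
count_mismatches_in_range (void)
{
  unsigned int bad = 0;
  for (unsigned int x = 0; x <= 100; x++)
    for (unsigned int y = 1; y <= 100; y++)
      if (!mul_div_roundtrips (x, y))
        bad++;
  return bad;
}

/* Outside the guarded range the product may wrap: with x = 2^(N-1) for an
   N-bit unsigned int, x * 2 wraps to 0, so the round trip fails.  */
static bool
wrapping_case_roundtrips (void)
{
  return mul_div_roundtrips (UINT_MAX / 2 + 1, 2);
}
```

count_mismatches_in_range finds no mismatch, while wrapping_case_roundtrips is false; the context-aware range is what tells the two situations apart.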

Bootstrap passes on ppc64{,le} and x86_64.
Is this ok for trunk?


BR,
Jeff (Jiufu Guo)


PR middle-end/111303

gcc/ChangeLog:

* match.pd ((t * 2) / 2): Update pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr111303.c: New test.

---
 gcc/match.pd |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr111303.c | 15 +++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111303.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 693638f8ca0..6bd72ff4d69 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -931,8 +931,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
bool overflowed = true;
value_range vr0, vr1;
if (INTEGRAL_TYPE_P (type)
-  && get_global_range_query ()->range_of_expr (vr0, @0)
-  && get_global_range_query ()->range_of_expr (vr1, @1)
+  && get_range_query (cfun)->range_of_expr (vr0, @0)
+  && get_range_query (cfun)->range_of_expr (vr1, @1)
   && !vr0.varying_p () && !vr0.undefined_p ()
   && !vr1.varying_p () && !vr1.undefined_p ())
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c
new file mode 100644
index 000..b703fe4546d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef unsigned int INT;
+
+INT
+foo (INT x, INT y)
+{
+  if (x > 100 || y > 100)
+return x;
+  return (x * y) / y;
+}
+
+/* { dg-final { scan-tree-dump-times "return x_..D." 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times " / " 0 "optimized"} } */
-- 
2.25.1



Re: [PATCH] Checking undefined_p before using the vr

2023-09-12 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Thu, 7 Sep 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> As discussed in PR111303:
>> 
>> For pattern "(X + C) / N": "div (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)",
>> Even if "X" has value-range and "X + C" does not overflow, "@3" may still
>> be undefined. Like below example:
>> 
>> _3 = _2 + -5;
>> if (0 != 0)
>>   goto <bb 3>; [34.00%]
>> else
>>   goto <bb 4>; [66.00%]
>> ;;  succ:   3
>> ;;  4
>> 
>> ;; basic block 3, loop depth 0
>> ;;  pred:   2
>> _5 = _3 / 5; 
>> ;;  succ:   4
>> 
>> The whole pattern "(_2 + -5 ) / 5" is in "bb 3", but "bb 3" would be
>> unreachable (because "if (0 != 0)" is always false).
>> And "get_range_query (cfun)->range_of_expr (vr3, @3)" is checked in
>> "bb 3", "range_of_expr" gets an "undefined vr3". Where "@3" is "_5".
>> 
>> So, before using "vr3", it would be safe to check "!vr3.undefined_p ()".
>> 
>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>> Is this ok for trunk?
>
> OK, but I wonder why ->range_of_expr () doesn't return false for
> undefined_p ()?  While "undefined" technically means we can treat
> it as nonnegative_p (or not, maybe but maybe not both), we seem to
> not want to do that.  So why expose it at all to ranger users
> (yes, internally we in some places want to handle undefined).

I guess, currently, it returns true and then lets the user check
undefined_p, maybe because it tries to only return false if the
type of EXPR is unsupported.

Letting "range_of_expr" return false for undefined_p would save checking
undefined_p again when using the APIs.

Committed via r14-3913.

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>  PR middle-end/111303
>> 
>> gcc/ChangeLog:
>> 
>>  * match.pd ((X - N * M) / N): Add undefined_p checking.
>>  ((X + N * M) / N): Likewise.
>>  ((X + C) div_rshift N): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/pr111303.c: New test.
>> 
>> ---
>>  gcc/match.pd|  3 +++
>>  gcc/testsuite/gcc.dg/pr111303.c | 11 +++
>>  2 files changed, 14 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr111303.c
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 801edb128f9..e2583ca7960 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -975,6 +975,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> /* "X+(N*M)" doesn't overflow.  */
>> && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
>> && get_range_query (cfun)->range_of_expr (vr4, @4)
>> +   && !vr4.undefined_p ()
>> /* "X+N*M" is not with opposite sign as "X".  */
>> && (TYPE_UNSIGNED (type)
>> || (vr0.nonnegative_p () && vr4.nonnegative_p ())
>> @@ -995,6 +996,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> /* "X - (N*M)" doesn't overflow.  */
>> && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
>> && get_range_query (cfun)->range_of_expr (vr4, @4)
>> +   && !vr4.undefined_p ()
>> /* "X-N*M" is not with opposite sign as "X".  */
>> && (TYPE_UNSIGNED (type)
>> || (vr0.nonnegative_p () && vr4.nonnegative_p ())
>> @@ -1025,6 +1027,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>/* "X+C" doesn't overflow.  */
>>&& range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
>>&& get_range_query (cfun)->range_of_expr (vr3, @3)
>> +  && !vr3.undefined_p ()
>>/* "X+C" and "X" are not of opposite sign.  */
>>&& (TYPE_UNSIGNED (type)
>>|| (vr0.nonnegative_p () && vr3.nonnegative_p ())
>> diff --git a/gcc/testsuite/gcc.dg/pr111303.c b/gcc/testsuite/gcc.dg/pr111303.c
>> new file mode 100644
>> index 000..eaabe55c105
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr111303.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +/* Make sure no ICE. */
>> +unsigned char a;
>> +int b(int c) {
>> +  if (c >= 5000)
>> +return c / 5;
>> +}
>> +void d() { b(a - 5); }
>> +int main() {}
>> 


Re: [PATCH V5 1/4] rs6000: build constant via li;rotldi

2023-09-07 Thread Jiufu Guo via Gcc-patches


Hi,

Gentle ping...

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> If a constant can be rotated to/from a positive or negative value
> that "li" can generate, then "li;rotldi" can be used to build
> the constant.
>
> Compared with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623528.html
> This patch only makes minor changes to the comments according to the
> previous review.
>
> Bootstrap and regtest pass on ppc64{,le}.
>
> Is this ok for trunk?
>
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
>   (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/const-build.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc   | 47 +--
>  .../gcc.target/powerpc/const-build.c  | 57 +++
>  2 files changed, 98 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 42f49e4a56b..acc332acc05 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
>return true;
>  }
>  
> +/* Check if value C can be built by 2 instructions: one is 'li', another is
> +   'rotldi'.
> +
> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> +   is set to the mask operand of rotldi(rldicl), and return true.
> +   Return false otherwise.  */
> +
> +static bool
> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> +HOST_WIDE_INT *mask)
> +{
> +  /* If C or ~C contains at least 49 successive zeros, then C can be rotated
> + to/from a positive or negative value that 'li' is able to load.  */
> +  int n;
> +  if (can_be_rotated_to_lowbits (c, 15, &n)
> +  || can_be_rotated_to_lowbits (~c, 15, &n))
> +{
> +  *mask = HOST_WIDE_INT_M1;
> +  *shift = HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  */
> @@ -10266,15 +10291,14 @@ static void
>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
> +  int shift;
> +  HOST_WIDE_INT mask;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>  
>ud1 = c & 0xffff;
> -  c = c >> 16;
> -  ud2 = c & 0xffff;
> -  c = c >> 16;
> -  ud3 = c & 0xffff;
> -  c = c >> 16;
> -  ud4 = c & 0xffff;
> +  ud2 = (c >> 16) & 0xffff;
> +  ud3 = (c >> 32) & 0xffff;
> +  ud4 = (c >> 48) & 0xffff;
>  
>if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>GEN_INT ((ud2 ^ 0xffff) << 16)));
>  }
> +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
> +
> +  emit_move_insn (temp, GEN_INT (imm));
> +  if (shift != 0)
> + temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
> +  emit_move_insn (dest, temp);
> +}
>else if (ud3 == 0 && ud4 == 0)
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
> new file mode 100644
> index 000..69b37e2bb53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
> @@ -0,0 +1,57 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +/* Verify that two instructions are successfully used to build constants.
> +   One insn is li, another is rotate: rldicl.  */
> +
> +#define NOIPA __attribute__ ((noipa))
> +
> +struct fun
> +{
> +  long long (*f) (void);
> +  long long val;
> +};
> +
> +long long NOIPA
> +li_rotldi_1 (void)
> +{
> +  return 0x75310LL;
> +}
> +
> +long long NOIPA
> +li_rotldi_2 (void)
> +{
> +  return 0x2164LL;
> +}
> +
> +long long NOIPA
> +li_rotldi_3 (void)
> +{
> +  return 0x8531LL;
> +}
> +
> +long long NOIPA
> +li_rotldi_4 (void)
> +{
> +  return 0x2194LL;
> +}
> +
> +struct fun arr[] = {
> +  {li_rotldi_1, 0x75310LL},
> +  {li_rotldi_2, 0x2164LL},
> +  {li_rotldi_3, 0x8531LL},
> +  {li_rotldi_4, 0x2194LL},
> +};
> +
> +/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
> +
> 

[PATCH] Checking undefined_p before using the vr

2023-09-06 Thread Jiufu Guo via Gcc-patches
Hi,

As discussed in PR111303:

For pattern "(X + C) / N": "div (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)",
Even if "X" has value-range and "X + C" does not overflow, "@3" may still
be undefined. Like below example:

_3 = _2 + -5;
if (0 != 0)
  goto <bb 3>; [34.00%]
else
  goto <bb 4>; [66.00%]
;;  succ:   3
;;  4

;; basic block 3, loop depth 0
;;  pred:   2
_5 = _3 / 5; 
;;  succ:   4

The whole pattern "(_2 + -5 ) / 5" is in "bb 3", but "bb 3" would be
unreachable (because "if (0 != 0)" is always false).
And when "get_range_query (cfun)->range_of_expr (vr3, @3)" is evaluated
in "bb 3", "range_of_expr" yields an undefined "vr3". Here "@3" is "_3".

So, before using "vr3", it would be safe to check "!vr3.undefined_p ()".
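
For the new test case in this patch, the unreachable block can be reproduced by hand. The sketch below assumes the usual 8-bit unsigned char, and count_reaching_division is a made-up helper name. It enumerates every value of "a" and counts how often "a - 5 >= 5000" holds, which is the condition guarding the division once "b (a - 5)" is inlined into "d".

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: count the values of an unsigned char `a` for which
   the guard "c >= 5000" in b () is true when called as b (a - 5).  */
static int
count_reaching_division (void)
{
  int count = 0;

  /* Assumption of this sketch: the largest value of a - 5 is well
     below 5000 (true for an 8-bit unsigned char).  */
  assert (UCHAR_MAX - 5 < 5000);

  for (int a = 0; a <= UCHAR_MAX; a++)
    {
      /* `a` is promoted to int, so c lies in [-5, UCHAR_MAX - 5].  */
      int c = (unsigned char) a - 5;
      if (c >= 5000)
        count++;
    }
  return count;
}
```

The count is zero, so the block containing "c / 5" can never execute.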

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

PR middle-end/111303

gcc/ChangeLog:

* match.pd ((X - N * M) / N): Add undefined_p checking.
((X + N * M) / N): Likewise.
((X + C) div_rshift N): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/pr111303.c: New test.

---
 gcc/match.pd|  3 +++
 gcc/testsuite/gcc.dg/pr111303.c | 11 +++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr111303.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 801edb128f9..e2583ca7960 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -975,6 +975,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
/* "X+(N*M)" doesn't overflow.  */
&& range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
&& get_range_query (cfun)->range_of_expr (vr4, @4)
+   && !vr4.undefined_p ()
/* "X+N*M" is not with opposite sign as "X".  */
&& (TYPE_UNSIGNED (type)
   || (vr0.nonnegative_p () && vr4.nonnegative_p ())
@@ -995,6 +996,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
/* "X - (N*M)" doesn't overflow.  */
&& range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
&& get_range_query (cfun)->range_of_expr (vr4, @4)
+   && !vr4.undefined_p ()
/* "X-N*M" is not with opposite sign as "X".  */
&& (TYPE_UNSIGNED (type)
   || (vr0.nonnegative_p () && vr4.nonnegative_p ())
@@ -1025,6 +1027,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* "X+C" doesn't overflow.  */
  && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
  && get_range_query (cfun)->range_of_expr (vr3, @3)
+ && !vr3.undefined_p ()
  /* "X+C" and "X" are not of opposite sign.  */
  && (TYPE_UNSIGNED (type)
  || (vr0.nonnegative_p () && vr3.nonnegative_p ())
diff --git a/gcc/testsuite/gcc.dg/pr111303.c b/gcc/testsuite/gcc.dg/pr111303.c
new file mode 100644
index 000..eaabe55c105
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr111303.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Make sure no ICE. */
+unsigned char a;
+int b(int c) {
+  if (c >= 5000)
+return c / 5;
+}
+void d() { b(a - 5); }
+int main() {}
-- 
2.25.1



Re: [PATCH V6] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-09-03 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 1 Sep 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> Integer expression "(X - N * M) / N" can be optimized to "X / N - M" with
>> the below conditions:
>> 1. There is no wrap/overflow/underflow.
>>wrap/overflow/underflow breaks the arithmetic operation.
>> 2. "X - N * M" and "X" are not of opposite sign.
>>Here, the operation "/" would be "trunc_div", the fractional part is
>>discarded towards zero. If "X - N * M" and "X" are in different signs,
>>then trunc_div discards the fractional parts (of /N) in different
>>directions.
>> 
>> Compared with the previous version:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624801.html
>> this patch adds comments and updates the pattern on "(t + C)" to be
>> tighter.
>> 
>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>> Is this patch ok for trunk?
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>  PR tree-optimization/108757
>> 
>> gcc/ChangeLog:
>> 
>>  * match.pd ((X - N * M) / N): New pattern.
>>  ((X + N * M) / N): New pattern.
>>  ((X + C) div_rshift N): New pattern.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/pr108757-1.c: New test.
>>  * gcc.dg/pr108757-2.c: New test.
>>  * gcc.dg/pr108757.h: New test.
>> 
>> ---
>>  gcc/match.pd  |  78 ++
>>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
>>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
>>  gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
>>  4 files changed, 348 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index fa598d5ca2e470f9cc3b82469e77d743b12f107e..863bc7299cdefc622a7806a4d32e37268c50d453 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -959,6 +959,84 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  #endif
>> 
>>  
>> +#if GIMPLE
>> +(for div (trunc_div exact_div)
>> + /* Simplify (X + M*N) / N -> X / N + M.  */
>> + (simplify
>> +  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
>> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
>> +  (if (INTEGRAL_TYPE_P (type)
>> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
>> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
>> +   /* "N*M" doesn't overflow.  */
>> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
>> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
>> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
>> +   /* "X+(N*M)" doesn't overflow.  */
>> +   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
>> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
>> +   /* "X+N*M" is not with opposite sign as "X".  */
>> +   && (TYPE_UNSIGNED (type)
>> +   || (vr0.nonnegative_p () && vr4.nonnegative_p ())
>> +   || (vr0.nonpositive_p () && vr4.nonpositive_p ())))
>> +  (plus (div @0 @2) @1))))
>> +
>> + /* Simplify (X - M*N) / N -> X / N - M.  */
>> + (simplify
>> +  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
>> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
>> +  (if (INTEGRAL_TYPE_P (type)
>> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
>> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
>> +   /* "N * M" doesn't overflow.  */
>> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
>> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
>> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
>> +   /* "X - (N*M)" doesn't overflow.  */
>> +   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
>> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
>> +   /* "X-N*M" is not with opposite sign as "X".  */
>> +   && (TYPE_UNSIGNED (type)
>> +   || (vr0.nonnegative_p () && vr4.nonnegative_p ())
>> +   || (vr0.nonpositive_p () && vr4.nonpositive_p ())))
>> +  (minus (div @0 @2) @1)))))
>> +
>> +/* Simplify
>> +   (X + C) / N -> X / N + C / N where C is multiple of N.
>> +   (X + C) >> N -> X >> N + C>>N if low N bits of C is 0.  */
>> +(for op (trunc_div exact_div rshift)
>> + (simplify
>> +  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
>> +   (with
>> +{
>> +  wide_int c = wi::to_wide (@1);
>> +  wide_int n = wi::to_wide (@2);
>> +  bool shift = op == RSHIFT_EXPR;
>> +  #define plus_op1(v) (shift ? wi::rshift (v, n, TYPE_SIGN (type)) \
>> + : wi::div_trunc (v, n, TYPE_SIGN (type)))
>> +  #define exact_mod(v) (shift ? wi::ctz (v) >= n.to_shwi () \
>> +  : wi::multiple_of_p (v, n, TYPE_SIGN (type)))
>
> please indent these full left
>
>> +  value_range vr0, vr1, vr3;
>> +}
>> +(if (INTEGRAL_TYPE_P (type)
>> + && get_range_query (cfun)->range_of_expr (vr0, @0))
>> + (if (exact_mod (c)
>> +  && get_range_query (cfun)->range_of_expr (vr1, @1)
>> + 

[PATCH V6] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-09-01 Thread Jiufu Guo via Gcc-patches
Hi,

Integer expression "(X - N * M) / N" can be optimized to "X / N - M" with
the below conditions:
1. There is no wrap/overflow/underflow.
   wrap/overflow/underflow breaks the arithmetic operation.
2. "X - N * M" and "X" are not of opposite sign.
   Here, the operation "/" is "trunc_div": the fractional part is
   discarded towards zero. If "X - N * M" and "X" have different signs,
   then trunc_div discards the fractional parts (of /N) in different
   directions.
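
Condition 2 can be sketched in plain C, where "/" on int truncates toward
zero like trunc_div. This is only an illustration under that assumption;
the helper names div_before and div_after are hypothetical and do not
appear in the patch.

```c
/* Hypothetical helpers for illustration only; not part of the patch.  */

/* The original form: (X - N * M) / N.  */
int div_before (int x, int n, int m)
{
  return (x - n * m) / n;
}

/* The rewritten form: X / N - M.  */
int div_after (int x, int n, int m)
{
  return x / n - m;
}
```

For X = 7, N = 2, M = 1 both forms give 2. For X = 1, N = 2, M = 1,
"X - N * M" is -1 while "X" is 1, so the original form gives 0 and the
rewritten form gives -1; this is the case the sign check rejects.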

Compare the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624801.html
This patch adds comments and updates the pattern on "(t + C)" to be
tighter.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)

PR tree-optimization/108757

gcc/ChangeLog:

* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/match.pd  |  78 ++
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
 4 files changed, 348 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/match.pd b/gcc/match.pd
index fa598d5ca2e470f9cc3b82469e77d743b12f107e..863bc7299cdefc622a7806a4d32e37268c50d453 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -959,6 +959,84 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+#if GIMPLE
+(for div (trunc_div exact_div)
+ /* Simplify (X + M*N) / N -> X / N + M.  */
+ (simplify
+  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
+  (with {value_range vr0, vr1, vr2, vr3, vr4;}
+  (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (vr1, @1)
+   && get_range_query (cfun)->range_of_expr (vr2, @2)
+   /* "N*M" doesn't overflow.  */
+   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
+   && get_range_query (cfun)->range_of_expr (vr0, @0)
+   && get_range_query (cfun)->range_of_expr (vr3, @3)
+   /* "X+(N*M)" doesn't overflow.  */
+   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
+   && get_range_query (cfun)->range_of_expr (vr4, @4)
+   /* "X+N*M" is not with opposite sign as "X".  */
+   && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr4.nonpositive_p ())))
+  (plus (div @0 @2) @1))))
+
+ /* Simplify (X - M*N) / N -> X / N - M.  */
+ (simplify
+  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
+  (with {value_range vr0, vr1, vr2, vr3, vr4;}
+  (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (vr1, @1)
+   && get_range_query (cfun)->range_of_expr (vr2, @2)
+   /* "N * M" doesn't overflow.  */
+   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
+   && get_range_query (cfun)->range_of_expr (vr0, @0)
+   && get_range_query (cfun)->range_of_expr (vr3, @3)
+   /* "X - (N*M)" doesn't overflow.  */
+   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
+   && get_range_query (cfun)->range_of_expr (vr4, @4)
+   /* "X-N*M" is not with opposite sign as "X".  */
+   && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr4.nonpositive_p ())))
+  (minus (div @0 @2) @1)))))
+
+/* Simplify
+   (X + C) / N -> X / N + C / N where C is multiple of N.
+   (X + C) >> N -> X >> N + C>>N if low N bits of C is 0.  */
+(for op (trunc_div exact_div rshift)
+ (simplify
+  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+   (with
+{
+  wide_int c = wi::to_wide (@1);
+  wide_int n = wi::to_wide (@2);
+  bool shift = op == RSHIFT_EXPR;
+  #define plus_op1(v) (shift ? wi::rshift (v, n, TYPE_SIGN (type)) \
+: wi::div_trunc (v, n, TYPE_SIGN (type)))
+  #define exact_mod(v) (shift ? wi::ctz (v) >= n.to_shwi () \
+ : wi::multiple_of_p (v, n, TYPE_SIGN (type)))
+  value_range vr0, vr1, vr3;
+}
+(if (INTEGRAL_TYPE_P (type)
+&& get_range_query (cfun)->range_of_expr (vr0, @0))
+ (if (exact_mod (c)
+ && get_range_query (cfun)->range_of_expr (vr1, @1)
+ /* "X+C" doesn't overflow.  */
+ && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
+ && get_range_query (cfun)->range_of_expr (vr3, @3)
+ /* "X+C" and "X" are not of opposite sign.  */
+ && (TYPE_UNSIGNED (type)
+ || (vr0.nonnegative_p () && vr3.nonnegative_p ())
+ || (vr0.nonpositive_p () && vr3.nonpositive_p ())))
+   (plus (op @0 @2) { 

[PATCH V4 2/2] rs6000: use mtvsrws to move sf from si p9

2023-08-30 Thread Jiufu Guo via Gcc-patches
Hi,

As mentioned in PR108338, on p9, we could use mtvsrws to implement
the bitcast from SI to SF (or lowpart DI to SF).

For code:
  *(long long*)buff = di;
  float f = *(float*)(buff);

"sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".
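
The bitcast in the snippet above can be sketched in portable C with memcpy,
which avoids the aliasing problem of the pointer casts. This is only an
illustration, assuming a 64-bit long long and a 32-bit float; the helper
names sf_from_di_at and di_from_two_sf are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: reinterpret the 4 bytes at byte offset OFF
   of DI as a float, going through a byte buffer as the snippet does.  */
float sf_from_di_at (long long di, size_t off)
{
  unsigned char buff[sizeof di];
  float f;
  memcpy (buff, &di, sizeof di);
  memcpy (&f, buff + off, sizeof f);
  return f;
}

/* Hypothetical helper: place two floats side by side in memory and
   read the 8 bytes back as one long long.  */
long long di_from_two_sf (float f1, float f2)
{
  unsigned char buff[sizeof (long long)];
  long long di;
  memcpy (buff, &f1, sizeof f1);
  memcpy (buff + sizeof f1, &f2, sizeof f2);
  memcpy (&di, buff, sizeof di);
  return di;
}
```

Reading offset 0 corresponds to the lowpart of the DI on little-endian and
to the highpart on big-endian, so which instruction sequence applies
depends on the endianness.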

Compared with the previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623533.html
"highpart DI-->SF" is moved to a separate patch.

Pass bootstrap and regression on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws
for P9.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.

---
 gcc/config/rs6000/rs6000.md | 25 -
 gcc/testsuite/gcc.target/powerpc/pr108338.c |  6 +++--
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8c92cbf976de915136ad5dba24e69a363d21438d..c03e677bca79e8fb1acb276d07d0acfae009f6d8 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8280,13 +8280,26 @@ (define_insn_and_split "movsf_from_si"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
 
-  /* Move SF value to upper 32-bits for xscvspdpn.  */
-  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
-  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
-  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+  /* Move lowpart 32-bits from register for SFmode.  */
+  if (TARGET_P9_VECTOR)
+{
+  /* Using mtvsrws;xscvspdpn.  */
+  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
+  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+  else
+{
+  rtx op2 = operands[2];
+  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+
+  /* Using ashl;mtvsrd;xscvspdpn.  */
+  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+
   DONE;
 }
   [(set_attr "length"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
index 6db65595343c2407fc32f68f5f52a1f7196c371d..0565e5254ed0a8cc579cf505a3f865426dcf62ae 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr108338.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
@@ -19,9 +19,11 @@ float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
 
 /* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
 /* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
-/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
+/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
 /* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
-/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
 
 union di_sf_sf
 {
-- 
2.25.1



[PATCH V4 1/2] rs6000: optimize moving to sf from highpart di

2023-08-30 Thread Jiufu Guo via Gcc-patches
Hi,

Currently, we have the pattern "movsf_from_si2" which was trying
to support moving high part DI to SF.

The pattern looks like: XX:SF=bitcast:SF(subreg(YY:DI>>32),0)
It only accepts the "ashiftrt" for ">>", but "lshiftrt" is also ok.
And the offset of "subreg" is hard-coded to 0, which only works for LE.

"movsf_from_si2" is updated to cover BE for "subreg", and cover
the logical shift for ":DI>>32".
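
Why "lshiftrt" is also ok can be seen in plain C: after shifting a 64-bit
value right by 32, the low 32 bits (the part the lowpart subreg takes) are
the same for both kinds of shift. This is only a sketch with hypothetical
helper names; it assumes that ">>" on a negative signed value is an
arithmetic shift, which ISO C leaves implementation-defined and GCC
documents as such.

```c
#include <stdint.h>

/* Hypothetical helper: low 32 bits after an arithmetic right shift
   (ashiftrt) by 32.  Assumes signed >> is arithmetic, as with GCC.  */
uint32_t high_word_ashiftrt (int64_t di)
{
  return (uint32_t) (di >> 32);
}

/* Hypothetical helper: low 32 bits after a logical right shift
   (lshiftrt) by 32.  */
uint32_t high_word_lshiftrt (int64_t di)
{
  return (uint32_t) ((uint64_t) di >> 32);
}
```

The two shifts differ only in the upper 32 bits of the shifted value
(sign copies versus zeros), and those bits are discarded by the subreg.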

Pass bootstrap and regression on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

PR target/108338

gcc/ChangeLog:

* config/rs6000/predicates.md (lowpart_subreg_operator): New
define_predicate.
* config/rs6000/rs6000.md (any_rshift): New code_iterator.
(movsf_from_si2): Rename to ...
(movsf_from_si2_): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: New test.

---
 gcc/config/rs6000/predicates.md |  5 +++
 gcc/config/rs6000/rs6000.md | 11 +++---
 gcc/testsuite/gcc.target/powerpc/pr108338.c | 40 +
 3 files changed, 51 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 3552d908e9d149a30993e3e6568466de537336be..e25b3b4864f681d47e9d5c2eb88bcde0aea6d17b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -2098,3 +2098,8 @@ (define_predicate "macho_pic_address"
   else
 return false;
 })
+
+(define_predicate "lowpart_subreg_operator"
+  (and (match_code "subreg")
+   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
+   == SUBREG_BYTE (op)")))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1a9a7b1a47918f39fc91038607f21a8ba9a2e740..8c92cbf976de915136ad5dba24e69a363d21438d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8299,18 +8299,19 @@ (define_insn_and_split "movsf_from_si"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
 
+(define_code_iterator any_rshift [ashiftrt lshiftrt])
+
 ;; For extracting high part element from DImode register like:
 ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
 ;; split it before reload with "and mask" to avoid generating shift right
 ;; 32 bit then shift left 32 bit.
-(define_insn_and_split "movsf_from_si2"
+(define_insn_and_split "movsf_from_si2_"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
(unspec:SF
-[(subreg:SI
-  (ashiftrt:DI
+[(match_operator:SI 3 "lowpart_subreg_operator"
+  [(any_rshift:DI
(match_operand:DI 1 "input_operand" "r")
-   (const_int 32))
-  0)]
+   (const_int 32))])]
 UNSPEC_SF_FROM_SI))
   (clobber (match_scratch:DI 2 "=r"))]
   "TARGET_NO_SF_SUBREG"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
new file mode 100644
index 
..6db65595343c2407fc32f68f5f52a1f7196c371d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
@@ -0,0 +1,40 @@
+// { dg-do run }
+// { dg-options "-O2 -save-temps" }
+
+float __attribute__ ((noipa)) sf_from_di_off0 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff);
+  return f;
+}
+
+float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff + 4);
+  return f; 
+}
+
+/* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
+
+union di_sf_sf
+{
+  struct {float f1; float f2;};
+  long long l;
+};
+
+int main()
+{
+  union di_sf_sf v;
+  v.f1 = 1.0f;
+  v.f2 = 2.0f;
+  if (sf_from_di_off0 (v.l) != 1.0f || sf_from_di_off4 (v.l) != 2.0f )
+__builtin_abort ();
+  return 0;
+}
-- 
2.25.1



Re: [PATCH V1 1/2] light expander sra v0

2023-08-30 Thread Jiufu Guo via Gcc-patches


Hi Richard,

Thanks so much for your great review!

Richard Biener  writes:

> On Wed, 23 Aug 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> I just updated the patch.  We could review this one.
>> 
>> Compare with previous patch:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627287.html
>> This version:
>> * Supports bitfield access from one register.
>> * Allow return scalar registers cleaned via constructor.
>> 
>> Bootstrapped and regtested on x86_64-redhat-linux, and
>> powerpc64{,le}-linux-gnu.
>> 
>> Is it ok for trunk?
>
> Some comments inline - not a full review (and sorry for the delay).
>
>> 
>>  PR target/65421
>>  PR target/69143
>> 
>> gcc/ChangeLog:
>> 
>>  * cfgexpand.cc (extract_bit_field): Extern declare.
>>  (struct access): New class.
>>  (struct expand_sra): New class.
>>  (expand_sra::build_access): New member function.
>>  (expand_sra::visit_base): Likewise.
>>  (expand_sra::analyze_default_stmt): Likewise.
>>  (expand_sra::analyze_assign): Likewise.
>>  (expand_sra::add_sra_candidate): Likewise.
>>  (expand_sra::collect_sra_candidates): Likewise.
>>  (expand_sra::valid_scalariable_accesses): Likewise.
>>  (expand_sra::prepare_expander_sra): Likewise.
>>  (expand_sra::expand_sra): Class constructor.
>>  (expand_sra::~expand_sra): Class destructor.
>>  (expand_sra::get_scalarized_rtx): New member function.
>>  (extract_one_reg): New function.
>>  (extract_bitfield): New function.
>>  (expand_sra::scalarize_access): New member function.
>>  (expand_sra::scalarize_accesses): New member function.
>>  (get_scalar_rtx_for_aggregate_expr): New function.
>>  (set_scalar_rtx_for_aggregate_access): New function.
>>  (set_scalar_rtx_for_returns): New function.
>>  (expand_return): Call get_scalar_rtx_for_aggregate_expr.
>>  (expand_debug_expr): Call get_scalar_rtx_for_aggregate_expr.
>>  (pass_expand::execute): Update to use the expand_sra.
>>  * expr.cc (get_scalar_rtx_for_aggregate_expr): Extern declare.
>>  (expand_assignment): Call get_scalar_rtx_for_aggregate_expr.
>>  (expand_expr_real): Call get_scalar_rtx_for_aggregate_expr.
>>  * function.cc (set_scalar_rtx_for_aggregate_access):  Extern declare.
>>  (set_scalar_rtx_for_returns): Extern declare.
>>  (assign_parm_setup_block): Call set_scalar_rtx_for_aggregate_access.
>>  (assign_parms): Call set_scalar_rtx_for_aggregate_access. 
>>  (expand_function_start): Call set_scalar_rtx_for_returns.
>>  * tree-sra.h (struct base_access): New class.
>>  (struct default_analyzer): New class.
>>  (scan_function): New function template.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * g++.target/powerpc/pr102024.C: Updated.
>>  * gcc.target/powerpc/pr108073.c: New test.
>>  * gcc.target/powerpc/pr65421-1.c: New test.
>>  * gcc.target/powerpc/pr65421-2.c: New test.
>> 
>> ---
>>  gcc/cfgexpand.cc | 474 ++-
>>  gcc/expr.cc  |  29 +-
>>  gcc/function.cc  |  28 +-
>>  gcc/tree-sra.h   |  77 +++
>>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
>>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>>  8 files changed, 668 insertions(+), 9 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>> 
>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> index edf292cfbe95ac2711faee7769e839cb4edb0dd3..385b6c781aa2805e7ca40293a0ae84f87e23e0b6 100644
>> --- a/gcc/cfgexpand.cc
>> +++ b/gcc/cfgexpand.cc
>> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "output.h"
>>  #include "builtins.h"
>>  #include "opts.h"
>> +#include "tree-sra.h"
>>  
>>  /* Some systems use __main in a way incompatible with its use in gcc, in these
>> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
>> @@ -97,6 +98,468 @@ static bool defer_stack_allocation (tree, bool);
>>  
>>  static void record_alignment_for_reg_var (unsigned int);
>>  
>> +extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
>> +  machine_mode, machine_mode, bool, rtx *);
>
> belongs in some header

Thanks. Will update.
>
>> +
>> +/* For light SRA in expander about paramaters and returns.  */
>> +struct access : public base_access
>> +{
>> +  /* The rtx for the access: link to incoming/returning register(s).  */
>> +  rtx rtx_val;
>> +};
>> +
>> +typedef struct access *access_p;
>> +
>> +struct expand_sra : public default_analyzer
>> +{
>
> Both 'base_access' and 'default_analyzer' need a more specific
> name I think.  

Re: Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-29 Thread Jiufu Guo via Gcc-patches


Hi Richard,

Thanks a lot for your great comments!

Richard Biener  writes:

> On Tue, 29 Aug 2023, Jiufu Guo wrote:
>
>> 
>> Hi Richard,
>> 
>> Thanks a lot for your quick reply!
>> 
>> Richard Biener  writes:
>> 
>> > On Tue, 29 Aug 2023, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi All!
>> >> 
>> >> "rguenth at gcc dot gnu.org"  writes:
>> >> 
>> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66
>> >> ...
>> >> >
>> >> >
>> >> > At RTL expansion time we store to D.2865 where it's DECL_RTL is r82:TI 
>> >> > so
>> >> > we can hardly fix it there.  Only a later pass could figure each of the
>> >> > insns fully define the reg.
>> >> >
>> >> > Jiufu Guo is working to improve what we choose for DECL_RTL, but for
>> >> > incoming params / outgoing return.  This is a case where we could,
>> >> > with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
>> >> > choose not TImode but something like a (concat:TI reg:DI reg:DI).
>> >> 
>> >> Here is the patch about improving the parameters and returns in
>> >> registers.
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html
>> >> 
>> >> I have a question about how to bind an RTL to a TREE expression.
>> >> In this patch, a map TREE->RTL is used. But it would be better if
>> >> there was a faster way.
>> >> 
>> >> We have DECL_RTL/INCOMING_RTL, but they can only be bound to
>> >> DECL(or PARM). In the above patch, the TREE can be an EXPR
>> >> (e.g. COMPONENT_REF/ARRAY_REF).
>> >> 
>> >> Is there a way to achieve this? Thanks for suggestions!
>> >
>> > No, but we don't need to bind RTL to COMPONENT_REF and friends,
>> > what we want to change is the DECL_RTL of the underlying DECL.
>> 
>> In the above patch, the scalarized rtx for each access of a
>> parameter/return is computed when the parameters are set up, and the
>> "scalarized rtx" is recorded together with its "access expression".
>> When expanding an expression, the patch queries the scalarized rtx.
>> 
>> +  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
>> +  if (x)
>> +return x;
>> 
>> I'm reading "don't need to bind RTL to COMPONENT_REF and friends"
>> and "change is the DECL_RTL of the underlying DECL."
>> This may be doable. The method would be:
>> 1. When the incoming/outgoing registers are determined, we can
>>   check if the parameter/return can be scalarized, **then bind
>>   the registers to DECL_RTL of the parm/ret**.
>> 2. When expanding the expression (e.g. COMPONENT_REF), compute the
>>   scalarized rtx from DECL_RTL of the param/return.
>>   In expand_expr_real_1:
>>   case COMPONENT_REF: ... case ARRAY_REF: if base is parm...
>> 
>> Is my understanding correct?
>
> Yes, that's how it works today.  The target computes DECL_RTL
> for the parameter (could be a BLKmode memory), expansion
> of references first expands the base and gets DECL_RTL
> and then extracts the piece as analyzed via extract_bit_field
> or more direct means.
I see.
>
> As said in the review attempt sent out just now the complication
> is allowing more "complex" DECL_RTL, say a set of possibly
> different sized pseudos rather than a single pseudo or MEM.

Right.  It seems PARALLEL can be used for this purpose.
It can gather several pseudos of the same or different sizes.

> There's support for CONCAT already (for _Complex), some
> rough support for PARALLEL (not sure what it actually supports).

"PARALLEL" is already used for incoming registers and return
registers.  So, it was also used in the light-sra patch.

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> >
>> > Richard.
>> 


Re: Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-29 Thread Jiufu Guo via Gcc-patches


Hi Richard,

Thanks a lot for your quick reply!

Richard Biener  writes:

> On Tue, 29 Aug 2023, Jiufu Guo wrote:
>
>> 
>> Hi All!
>> 
>> "rguenth at gcc dot gnu.org"  writes:
>> 
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66
>> ...
>> >
>> >
>> > At RTL expansion time we store to D.2865 where it's DECL_RTL is r82:TI so
>> > we can hardly fix it there.  Only a later pass could figure each of the
>> > insns fully define the reg.
>> >
>> > Jiufu Guo is working to improve what we choose for DECL_RTL, but for
>> > incoming params / outgoing return.  This is a case where we could,
>> > with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
>> > choose not TImode but something like a (concat:TI reg:DI reg:DI).
>> 
>> Here is the patch about improving the parameters and returns in
>> registers.
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html
>> 
>> I have a question about how to bind an RTL to a TREE expression.
>> In this patch, a map TREE->RTL is used. But it would be better if
>> there was a faster way.
>> 
>> We have DECL_RTL/INCOMING_RTL, but they can only be bound to
>> DECL(or PARM). In the above patch, the TREE can be an EXPR
>> (e.g. COMPONENT_REF/ARRAY_REF).
>> 
>> Is there a way to achieve this? Thanks for suggestions!
>
> No, but we don't need to bind RTL to COMPONENT_REF and friends,
> what we want to change is the DECL_RTL of the underlying DECL.

In the above patch, the scalarized rtx for each access of a
parameter/return is computed when the parameters are set up, and the
"scalarized rtx" is recorded together with its "access expression".
When expanding an expression, the patch queries the scalarized rtx.

+  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
+  if (x)
+return x;

I'm reading "don't need to bind RTL to COMPONENT_REF and friends"
and "change is the DECL_RTL of the underlying DECL."
This may be doable. The method would be:
1. When the incoming/outgoing registers are determined, we can
  check if the parameter/return can be scalarized, **then bind
  the registers to DECL_RTL of the parm/ret**.
2. When expanding the expression (e.g. COMPONENT_REF), compute the
  scalarized rtx from DECL_RTL of the param/return.
  In expand_expr_real_1:
  case COMPONENT_REF: ... case ARRAY_REF: if base is parm...

Is my understanding correct?

BR,
Jeff (Jiufu Guo)

>
> Richard.


Re: Ping^^ [PATCH V5 2/2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-08-29 Thread Jiufu Guo via Gcc-patches


Hi Richard,

Thanks a lot for your review!

Richard Biener  writes:

> On Wed, 23 Aug 2023, guojiufu wrote:
>
>> Hi,
>> 
>> I would like to have a gentle ping...
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> On 2023-08-07 10:45, guojiufu via Gcc-patches wrote:
>> > Hi,
>> > 
>> > Gentle ping...
>> > 
>> > On 2023-07-18 22:05, Jiufu Guo wrote:
>> >> Hi,
>> >> 
>> >> Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
>> >> if there is no wrap/overflow/underflow and "X - N * M" has the same
>> >> sign with "X".
>> >> 
>> >> Compare the previous version:
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
>> >> - APIs: overflow, nonnegative_p and nonpositive_p are moved close
>> >>   to value range.
>> >> - Use above APIs in match.pd.
>> >> 
>> >> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>> >> Is this patch ok for trunk?
>> >> 
>> >> BR,
>> >> Jeff (Jiufu Guo)
>> >> 
>> >>  PR tree-optimization/108757
>> >> 
>> >> gcc/ChangeLog:
>> >> 
>> >>  * match.pd ((X - N * M) / N): New pattern.
>> >>  ((X + N * M) / N): New pattern.
>> >>  ((X + C) div_rshift N): New pattern.
>> >> 
>> >> gcc/testsuite/ChangeLog:
>> >> 
>> >>  * gcc.dg/pr108757-1.c: New test.
>> >>  * gcc.dg/pr108757-2.c: New test.
>> >>  * gcc.dg/pr108757.h: New test.
>> >> 
>> >> ---
>> >>  gcc/match.pd  |  85 +++
>> >>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
>> >>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
>> >>  gcc/testsuite/gcc.dg/pr108757.h   | 233 
>> >> ++
>> >>  4 files changed, 355 insertions(+)
>> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>> >> 
>> >> diff --git a/gcc/match.pd b/gcc/match.pd
>> >> index 8543f777a28..39dbb0567dc 100644
>> >> --- a/gcc/match.pd
>> >> +++ b/gcc/match.pd
>> >> @@ -942,6 +942,91 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >>  #endif
>> >> 
>> >> 
>> >> +#if GIMPLE
>> >> +(for div (trunc_div exact_div)
>> >> + /* Simplify (t + M*N) / N -> t / N + M.  */
>> >> + (simplify
>> >> +  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
>
> The :c on the plus isn't necessary?

":c" is needed, because when the pattern is matched in gimple
passes (e.g. vrp), the statement sequence looks like:
"%_6 = N * M; %_7 = %_6 + t", where "%_6" comes before "t".

Without ":c", the pattern would need to be written as:
(plus@4 (mult:c@3 @1 @2) @0).

>
>> >> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
>> >> +  (if (INTEGRAL_TYPE_P (type)
>> >> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
>> >> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
>> >> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
>
> the multiplication doesn't overflow
Yes, this is checking no overflow on mult.
>
>> >> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
>> >> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
>> >> +   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
>
> the add doesn't overflow
Yes, this is checking no overflow on add.
>
>> >> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
>> >> +   && (TYPE_UNSIGNED (type)
>> >> +|| (vr0.nonnegative_p () && vr4.nonnegative_p ())
>> >> +|| (vr0.nonpositive_p () && vr4.nonpositive_p ())))
>
> I don't know what this checks - the add result and the add first
> argument are not of opposite sign.  Huh.  At least this part
> needs an explaining comment.

Right, it checks that "X-N*M" does not have the opposite sign of "X".

This pattern uses trunc_div, which truncates towards zero.  If
"X-N*M" has a different sign from "X", then "(X-N*M)/N" and "X/N"
discard their fractional parts in different directions.

A comment is needed; I will add one.

>
> Sorry if we hashed this out before, but you can see I forgot
> and it's not obvious.
>
>> >> +  (plus (div @0 @2) @1))))
>> >> +
>> >> + /* Simplify (t - M*N) / N -> t / N - M.  */
>> >> + (simplify
>> >> +  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
>> >> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
>> >> +  (if (INTEGRAL_TYPE_P (type)
>> >> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
>> >> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
>> >> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
>> >> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
>> >> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
>> >> +   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
>> >> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
>> >> +   && (TYPE_UNSIGNED (type)
>> >> +|| (vr0.nonnegative_p () && vr4.nonnegative_p ())
>> >> +|| (vr0.nonpositive_p () && vr4.nonpositive_p ())))
>> >> +  (minus (div @0 @2) @1)))))
>
> looks like exactly the same - if you use a
>
>  (for addsub (plus minus)

I also tried this, but it failed; the reason is similar to the one
for adding ":c" to "plus".
For "plus", the insn sequences would 

Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-28 Thread Jiufu Guo via Gcc-patches


Hi All!

"rguenth at gcc dot gnu.org"  writes:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66
...
>
>
> At RTL expansion time we store to D.2865 where it's DECL_RTL is r82:TI so
> we can hardly fix it there.  Only a later pass could figure each of the
> insns fully define the reg.
>
> Jiufu Guo is working to improve what we choose for DECL_RTL, but for
> incoming params / outgoing return.  This is a case where we could,
> with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
> choose not TImode but something like a (concat:TI reg:DI reg:DI).

Here is the patch about improving the parameters and returns in
registers.
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html

I have a question about how to bind an RTL to a TREE expression.
In this patch, a map TREE->RTL is used. But it would be better if
there was a faster way.

We have DECL_RTL/INCOMING_RTL, but they can only be bound to
DECL(or PARM). In the above patch, the TREE can be an EXPR
(e.g. COMPONENT_REF/ARRAY_REF).

Is there a way to achieve this? Thanks for suggestions!

BR,
Jeff (Jiufu Guo)


[PATCH] rs6000: mark tieable between INT and FLOAT

2023-08-27 Thread Jiufu Guo via Gcc-patches
Hi,

For PowerPC, some INT and FLOAT modes can be marked as tieable,
for example: DI<->DF.
Note that SFmode is special: it is only tieable with itself.
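
As a rough illustration of the rule above, here is a toy model in plain C.
It does not use GCC's machine_mode API; toy_class, toy_mode, TOY_FP_WORD and
toy_int_float_tieable are made-up names, and the 8-byte FP word is an
assumption.  Like the hunk in this patch, it only looks at the size of the
float mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a machine mode: a class plus a size in bytes.
   These names are illustrative only, not GCC's machine_mode API.  */
enum toy_class { TOY_INT, TOY_FLOAT };
struct toy_mode { enum toy_class cls; int size; };

/* Assumption: an FP register word is 8 bytes wide.  */
#define TOY_FP_WORD 8

/* Return true if an INT mode and a FLOAT mode may be tied.  Only the
   size of the FLOAT mode is checked, so a 4-byte float (the SFmode
   analogue) is never tied with an integer mode.  */
bool
toy_int_float_tieable (struct toy_mode m1, struct toy_mode m2)
{
  /* This sketch only covers the mixed INT/FLOAT case.  */
  assert (m1.cls != m2.cls);
  struct toy_mode fl = m1.cls == TOY_FLOAT ? m1 : m2;
  return fl.size == TOY_FP_WORD;
}
```

With this model, an 8-byte int and an 8-byte float are tieable in either
argument order, while any pairing with a 4-byte float is rejected.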

I updated the previous patch to make it more reasonable:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Mark more tieable
modes.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Updated.

---
 gcc/config/rs6000/rs6000.cc | 9 +
 gcc/testsuite/g++.target/powerpc/pr102024.C | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6ac3adcec6b..3cb0186089e 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1968,6 +1968,15 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode 
mode2)
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
 return false;
 
+  /* SFmode format (IEEE DP) in register would not as required,
+ So SFmode is restrict here.  */
+  if (GET_MODE_CLASS (mode1) == MODE_FLOAT
+  && GET_MODE_CLASS (mode2) == MODE_INT)
+return GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD;
+  if (GET_MODE_CLASS (mode1) == MODE_INT
+  && GET_MODE_CLASS (mode2) == MODE_FLOAT)
+return GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD;
+
   if (SCALAR_FLOAT_MODE_P (mode1))
 return SCALAR_FLOAT_MODE_P (mode2);
   if (SCALAR_FLOAT_MODE_P (mode2))
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C 
b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..27d2dc5e80b 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,8 @@
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 { target has_arch_pwr8 } } 
}
+// { dg-final { scan-assembler-times {\mstd\M} 4 { target { ! has_arch_pwr8 } 
} } }
 
 struct a_thing
 {
-- 
2.17.1



[PATCH V5 1/4] rs6000: build constant via li;rotldi

2023-08-23 Thread Jiufu Guo via Gcc-patches
Hi,

If a constant can be rotated to/from a positive or negative value
that "li" can generate, then "li;rotldi" can be used to build
the constant.
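
To make the idea concrete, here is a small standalone sketch in plain C.
It is not the code from the patch: rotl64, fits_li and li_rotldi_candidate
are hypothetical helpers, and the search is a brute-force loop over all 64
rotations rather than the bit-counting check the patch uses.  It assumes
"li" loads a sign-extended 16-bit immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits, 0 <= N < 64.  */
uint64_t
rotl64 (uint64_t x, int n)
{
  assert (n >= 0 && n < 64);
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Hypothetical helper: true if V, viewed as a signed 64-bit value,
   fits the sign-extended 16-bit immediate assumed for 'li'.  */
bool
fits_li (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= -32768 && s <= 32767;
}

/* Brute-force search for IMM and ROT such that IMM fits 'li' and
   rotl64 (IMM, ROT) == C.  Returns false if no rotation works.  */
bool
li_rotldi_candidate (uint64_t c, int64_t *imm, int *rot)
{
  for (int n = 0; n < 64; n++)
    {
      /* Rotating C right by N undoes a left rotation by N.  */
      uint64_t v = rotl64 (c, (64 - n) % 64);
      if (fits_li (v))
	{
	  *imm = (int64_t) v;
	  *rot = n;
	  return true;
	}
    }
  return false;
}
```

For example, 0x7531000000000000 (a value picked for this sketch, not taken
from the testcase) is 0x7531 rotated left by 48, so it qualifies; a constant
such as 0x0123456789ABCDEF has no long enough run of zeros or ones and
does not.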

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623528.html
this patch only makes minor changes to the comments, following the
previous review.

Bootstrap and regtest pass on ppc64{,le}.

Is this ok for trunk?


BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 47 +--
 .../gcc.target/powerpc/const-build.c  | 57 +++
 2 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..acc332acc05 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   'rotldi'.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to the mask operand of rotldi(rldicl), and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* If C or ~C contains at least 49 successive zeros, then C can be rotated
+ to/from a positive or negative value that 'li' is able to load.  */
+  int n;
+  if (can_be_rotated_to_lowbits (c, 15, )
+  || can_be_rotated_to_lowbits (~c, 15, ))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10291,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0x;
-  c = c >> 16;
-  ud2 = c & 0x;
-  c = c >> 16;
-  ud3 = c & 0x;
-  c = c >> 16;
-  ud4 = c & 0x;
+  ud2 = (c >> 16) & 0x;
+  ud3 = (c >> 32) & 0x;
+  ud4 = (c >> 48) & 0x;
 
   if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, , ))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..69b37e2bb53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,1 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Verify that two instructions are successfully used to build constants.
+   One insn is li, another is rotate: rldicl.  */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  __builtin_abort ();
+
+  return 0;
+}
-- 
2.39.3



[PATCH V1 1/2] light expander sra v0

2023-08-22 Thread Jiufu Guo via Gcc-patches


Hi,

I just updated the patch.  Please review this version.

Compared with the previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627287.html
this version:
* Supports bitfield access from one register.
* Allows return scalar registers to be cleaned via a constructor.

Bootstrapped and regtested on x86_64-redhat-linux, and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?


PR target/65421
PR target/69143

gcc/ChangeLog:

* cfgexpand.cc (extract_bit_field): Extern declare.
(struct access): New class.
(struct expand_sra): New class.
(expand_sra::build_access): New member function.
(expand_sra::visit_base): Likewise.
(expand_sra::analyze_default_stmt): Likewise.
(expand_sra::analyze_assign): Likewise.
(expand_sra::add_sra_candidate): Likewise.
(expand_sra::collect_sra_candidates): Likewise.
(expand_sra::valid_scalariable_accesses): Likewise.
(expand_sra::prepare_expander_sra): Likewise.
(expand_sra::expand_sra): Class constructor.
(expand_sra::~expand_sra): Class destructor.
(expand_sra::get_scalarized_rtx): New member function.
(extract_one_reg): New function.
(extract_bitfield): New function.
(expand_sra::scalarize_access): New member function.
(expand_sra::scalarize_accesses): New member function.
(get_scalar_rtx_for_aggregate_expr): New function.
(set_scalar_rtx_for_aggregate_access): New function.
(set_scalar_rtx_for_returns): New function.
(expand_return): Call get_scalar_rtx_for_aggregate_expr.
(expand_debug_expr): Call get_scalar_rtx_for_aggregate_expr.
(pass_expand::execute): Update to use the expand_sra.
* expr.cc (get_scalar_rtx_for_aggregate_expr): Extern declare.
(expand_assignment): Call get_scalar_rtx_for_aggregate_expr.
(expand_expr_real): Call get_scalar_rtx_for_aggregate_expr.
* function.cc (set_scalar_rtx_for_aggregate_access):  Extern declare.
(set_scalar_rtx_for_returns): Extern declare.
(assign_parm_setup_block): Call set_scalar_rtx_for_aggregate_access.
(assign_parms): Call set_scalar_rtx_for_aggregate_access. 
(expand_function_start): Call set_scalar_rtx_for_returns.
* tree-sra.h (struct base_access): New class.
(struct default_analyzer): New class.
(scan_function): New function template.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Updated.
* gcc.target/powerpc/pr108073.c: New test.
* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421-2.c: New test.

---
 gcc/cfgexpand.cc | 474 ++-
 gcc/expr.cc  |  29 +-
 gcc/function.cc  |  28 +-
 gcc/tree-sra.h   |  77 +++
 gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
 8 files changed, 668 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 
edf292cfbe95ac2711faee7769e839cb4edb0dd3..385b6c781aa2805e7ca40293a0ae84f87e23e0b6
 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "builtins.h"
 #include "opts.h"
+#include "tree-sra.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -97,6 +98,468 @@ static bool defer_stack_allocation (tree, bool);
 
 static void record_alignment_for_reg_var (unsigned int);
 
+extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
+ machine_mode, machine_mode, bool, rtx *);
+
+/* For light SRA in expander about paramaters and returns.  */
+struct access : public base_access
+{
+  /* The rtx for the access: link to incoming/returning register(s).  */
+  rtx rtx_val;
+};
+
+typedef struct access *access_p;
+
+struct expand_sra : public default_analyzer
+{
+  expand_sra ();
+  ~expand_sra ();
+
+  /* Now use default APIs, no actions for
+ pre_analyze_stmt, analyze_return.  */
+
+  /* overwrite analyze_default_stmt.  */
+  void analyze_default_stmt (gimple *);
+
+  /* overwrite analyze phi,call,asm .  */
+  void analyze_phi (gphi *stmt) { analyze_default_stmt (stmt); };
+  void analyze_call (gcall *stmt) { analyze_default_stmt (stmt); };
+  void analyze_asm (gasm *stmt) { analyze_default_stmt (stmt); };
+  /* overwrite analyze_assign.  */
+  void analyze_assign (gassign *);
+
+  /* 

Re: [PATCH 1/2] light expander sra v0

2023-08-14 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo  writes:

> Hi,
>
> There are a few PRs about the issues on the struct parameters and
> returns, like PRs 69143/65421/108073.
>
> we could consider introducing a light SRA in the expander to
> handle those parameters and returns in aggregate type, if they
> are passed through registers.  For access to the fields of
> the parameters or returns, the corresponding scalar registers
> can be used.
>
> As discussed:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619884.html
>
> This is an initial patch for the light-expander-sra.

This patch leaves a few places that can be enhanced, e.g.:
- support reverse storage order accesses.
- support accessing fields that occupy only part of a register.
- support mixed vector/TI modes.
- support accesses in call stmts and asm stmts.
- ...
One enhancement I am investigating first: when querying the
scalarized rtx value for a tree expression, a TREE->RTX map is
used.  It may be better to bind the scalar rtx value to the
tree expression directly (like DECL_RTL/INCOMING_RTL).
Then 'get_scalarized_rtx' can be simpler.
But I have not found a suitable field of TREE for this.

Thanks for any suggestions!

BR,
Jeff (Jiufu Guo)

>
> Bootstrapped and regtested on x86_64-redhat-linux, and
> powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
>
> BR,
> Jeff (Jiufu Guo)
>
>
>   PR target/65421
>   PR target/69143
>
> gcc/ChangeLog:
>
>   * cfgexpand.cc (expand_shift): Extern declare.
>   (struct access): New class.
>   (struct expand_sra): New class.
>   (expand_sra::build_access): New member function.
>   (expand_sra::visit_base): Likewise.
>   (expand_sra::analyze_default_stmt): Likewise.
>   (expand_sra::analyze_assign): Likewise.
>   (expand_sra::add_sra_candidate): Likewise.
>   (expand_sra::collect_sra_candidates): Likewise.
>   (expand_sra::valid_scalariable_accesses): Likewise.
>   (expand_sra::prepare_expander_sra): Likewise.
>   (expand_sra::expand_sra): Class constructor.
>   (expand_sra::~expand_sra): Class destructor.
>   (expand_sra::get_scalarized_rtx): New member function.
>   (extract_one_reg): New function.
>   (extract_sub_reg): New function.
>   (expand_sra::scalarize_access): New member function.
>   (expand_sra::scalarize_accesses): New member function.
>   (get_scalar_rtx_for_aggregate_expr): New function.
>   (set_scalar_rtx_for_aggregate_access): New function.
>   (set_scalar_rtx_for_returns): New function.
>   (expand_return): Call get_scalar_rtx_for_aggregate_expr.
>   (expand_debug_expr): Call get_scalar_rtx_for_aggregate_expr.
>   (pass_expand::execute): Update to use the expand_sra.
>   * expr.cc (get_scalar_rtx_for_aggregate_expr): Extern declare.
>   (expand_assignment): Call get_scalar_rtx_for_aggregate_expr.
>   (expand_expr_real): Call get_scalar_rtx_for_aggregate_expr.
>   * function.cc (set_scalar_rtx_for_aggregate_access):  Extern declare.
>   (set_scalar_rtx_for_returns): Extern declare.
>   (assign_parm_setup_block): Call set_scalar_rtx_for_aggregate_access.
>   (assign_parms): Call set_scalar_rtx_for_aggregate_access. 
>   (expand_function_start): Call set_scalar_rtx_for_returns.
>   * tree-sra.h (struct base_access): New class.
>   (struct default_analyzer): New class.
>   (scan_function): New function template.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/powerpc/pr102024.C: Updated.
>   * gcc.target/powerpc/pr108073.c: New test.
>   * gcc.target/powerpc/pr65421-1.c: New test.
>   * gcc.target/powerpc/pr65421-2.c: New test.
>
> ---
>  gcc/cfgexpand.cc | 478 ++-
>  gcc/expr.cc  |  15 +-
>  gcc/function.cc  |  28 +-
>  gcc/tree-sra.h   |  80 +++-
>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>  8 files changed, 660 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 
> edf292cfbe95ac2711faee7769e839cb4edb0dd3..21a09ebac96bbcddc67da73c42f470c6d5f60e6c
>  100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "builtins.h"
>  #include "opts.h"
> +#include "tree-sra.h"
>  
>  /* Some systems use __main in a way incompatible with its use in gcc, in 
> these
> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN 
> to
> @@ -97,6 +98,472 @@ static bool defer_stack_allocation (tree, 

[PATCH 2/2] combine nonconstant_array walker and expander_sra walker

2023-08-13 Thread Jiufu Guo via Gcc-patches
Hi,

In the light-expander-sra, each statement in each basic block of a function
needs to be analyzed, and a similar walk is done when checking for variables
that need to be stored on the stack.

These per-stmt analyses can be combined to improve cache locality.

Bootstrapped and regtested on x86_64-redhat-linux, and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* cfgexpand.cc (discover_nonconstant_array_refs): Deleted.
(struct array_and_sra_walk): New class.
(pass_expand::execute): Call scan_function on array_and_sra_walk.

---
 gcc/cfgexpand.cc | 104 +++
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 
21a09ebac96bbcddc67da73c42f470c6d5f60e6c..dc3ebe45275cc4b1c0873b4c6e5f6cbe2491ab8c
 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -6843,59 +6843,59 @@ avoid_type_punning_on_regs (tree t, bitmap 
forced_stack_vars)
 bitmap_set_bit (forced_stack_vars, DECL_UID (base));
 }
 
-/* RTL expansion is not able to compile array references with variable
-   offsets for arrays stored in single register.  Discover such
-   expressions and mark variables as addressable to avoid this
-   scenario.  */
+/* Beside light-sra, walk stmts to discover expressions of array references
+   with variable offsets for arrays and mark variables as addressable to
+   avoid to be stored in single register. */
 
-static void
-discover_nonconstant_array_refs (bitmap forced_stack_vars)
+struct array_and_sra_walk : public expand_sra
 {
-  basic_block bb;
-  gimple_stmt_iterator gsi;
+  array_and_sra_walk (bitmap map) : wi{}, forced_stack_vars (map)
+  {
+wi.info = forced_stack_vars;
+  };
 
-  walk_stmt_info wi = {};
-  wi.info = forced_stack_vars;
-  FOR_EACH_BB_FN (bb, cfun)
-for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
+  void pre_analyze_stmt (gimple *stmt)
+  {
+expand_sra::pre_analyze_stmt (stmt);
+if (!is_gimple_debug (stmt))
+  walk_gimple_op (stmt, discover_nonconstant_array_refs_r, );
+if (gimple_vdef (stmt))
   {
-   gimple *stmt = gsi_stmt (gsi);
-   if (!is_gimple_debug (stmt))
+   tree t = gimple_get_lhs (stmt);
+   if (t && REFERENCE_CLASS_P (t))
+ avoid_type_punning_on_regs (t, forced_stack_vars);
+  }
+  }
+
+  void analyze_call (gcall *call)
+  {
+expand_sra::analyze_call (call);
+if (gimple_call_internal_p (call))
+  {
+   tree cand = NULL_TREE;
+   switch (gimple_call_internal_fn (call))
  {
-   walk_gimple_op (stmt, discover_nonconstant_array_refs_r, );
-   gcall *call = dyn_cast  (stmt);
-   if (call && gimple_call_internal_p (call))
- {
-   tree cand = NULL_TREE;
-   switch (gimple_call_internal_fn (call))
- {
- case IFN_LOAD_LANES:
-   /* The source must be a MEM.  */
-   cand = gimple_call_arg (call, 0);
-   break;
- case IFN_STORE_LANES:
-   /* The destination must be a MEM.  */
-   cand = gimple_call_lhs (call);
-   break;
- default:
-   break;
- }
-   if (cand)
- cand = get_base_address (cand);
-   if (cand
-   && DECL_P (cand)
-   && use_register_for_decl (cand))
- bitmap_set_bit (forced_stack_vars, DECL_UID (cand));
- }
-   if (gimple_vdef (stmt))
- {
-   tree t = gimple_get_lhs (stmt);
-   if (t && REFERENCE_CLASS_P (t))
- avoid_type_punning_on_regs (t, forced_stack_vars);
- }
+ case IFN_LOAD_LANES:
+   /* The source must be a MEM.  */
+   cand = gimple_call_arg (call, 0);
+   break;
+ case IFN_STORE_LANES:
+   /* The destination must be a MEM.  */
+   cand = gimple_call_lhs (call);
+   break;
+ default:
+   break;
  }
+   if (cand)
+ cand = get_base_address (cand);
+   if (cand && DECL_P (cand) && use_register_for_decl (cand))
+ bitmap_set_bit (forced_stack_vars, DECL_UID (cand));
   }
-}
+  };
+
+  walk_stmt_info wi;
+  bitmap forced_stack_vars;
+};
 
 /* This function sets crtl->args.internal_arg_pointer to a virtual
register if DRAP is needed.  Local register allocator will replace
@@ -7091,12 +7091,12 @@ pass_expand::execute (function *fun)
avoid_deep_ter_for_debug (gsi_stmt (gsi), 0);
 }
 
-  /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE.  */
+  /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE.
+ And scan expressions for possible SRA accesses. */
   auto_bitmap forced_stack_vars;
-  discover_nonconstant_array_refs 

[PATCH 1/2] light expander sra v0

2023-08-13 Thread Jiufu Guo via Gcc-patches
Hi,

There are a few PRs about issues with struct parameters and
returns, such as PRs 69143/65421/108073.

We could consider introducing a light SRA in the expander to
handle parameters and returns of aggregate type when they
are passed through registers.  For accesses to the fields of
the parameters or returns, the corresponding scalar registers
can be used.
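
As a rough sketch of the field-to-register mapping (plain C, not the code in
the patch): assume the aggregate arrives in consecutive 8-byte registers, so
a field at a given byte offset lives in register offset / 8.  The names
WORD_BYTES, reg_slot and field_to_reg are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: each incoming register holds 8 bytes of the aggregate.  */
#define WORD_BYTES 8

/* Illustrative result: which register holds the field, where the field
   starts inside it, and whether the field covers the whole register.  */
struct reg_slot
{
  int index;
  int byte_in_reg;
  bool whole_reg;
};

/* Map a field at byte OFFSET with SIZE bytes onto one register.
   Returns false if the field spans more than one register, which this
   sketch does not try to handle.  */
bool
field_to_reg (int offset, int size, struct reg_slot *out)
{
  assert (offset >= 0 && size > 0);
  int start = offset / WORD_BYTES;
  int end = (offset + size - 1) / WORD_BYTES;
  if (start != end)
    return false;
  out->index = start;
  out->byte_in_reg = offset % WORD_BYTES;
  out->whole_reg = (size == WORD_BYTES);
  return true;
}
```

For a struct like {double a[4]; short s1; short s2; ...}, a[3] (offset 24,
size 8) maps to the whole of register 3, while s2 (offset 34, size 2) maps
to a 2-byte piece of register 4 that would need a shift or extract.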

As discussed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619884.html

This is an initial patch for the light-expander-sra.

Bootstrapped and regtested on x86_64-redhat-linux, and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?


BR,
Jeff (Jiufu Guo)


PR target/65421
PR target/69143

gcc/ChangeLog:

* cfgexpand.cc (expand_shift): Extern declare.
(struct access): New class.
(struct expand_sra): New class.
(expand_sra::build_access): New member function.
(expand_sra::visit_base): Likewise.
(expand_sra::analyze_default_stmt): Likewise.
(expand_sra::analyze_assign): Likewise.
(expand_sra::add_sra_candidate): Likewise.
(expand_sra::collect_sra_candidates): Likewise.
(expand_sra::valid_scalariable_accesses): Likewise.
(expand_sra::prepare_expander_sra): Likewise.
(expand_sra::expand_sra): Class constructor.
(expand_sra::~expand_sra): Class destructor.
(expand_sra::get_scalarized_rtx): New member function.
(extract_one_reg): New function.
(extract_sub_reg): New function.
(expand_sra::scalarize_access): New member function.
(expand_sra::scalarize_accesses): New member function.
(get_scalar_rtx_for_aggregate_expr): New function.
(set_scalar_rtx_for_aggregate_access): New function.
(set_scalar_rtx_for_returns): New function.
(expand_return): Call get_scalar_rtx_for_aggregate_expr.
(expand_debug_expr): Call get_scalar_rtx_for_aggregate_expr.
(pass_expand::execute): Update to use the expand_sra.
* expr.cc (get_scalar_rtx_for_aggregate_expr): Extern declare.
(expand_assignment): Call get_scalar_rtx_for_aggregate_expr.
(expand_expr_real): Call get_scalar_rtx_for_aggregate_expr.
* function.cc (set_scalar_rtx_for_aggregate_access):  Extern declare.
(set_scalar_rtx_for_returns): Extern declare.
(assign_parm_setup_block): Call set_scalar_rtx_for_aggregate_access.
(assign_parms): Call set_scalar_rtx_for_aggregate_access. 
(expand_function_start): Call set_scalar_rtx_for_returns.
* tree-sra.h (struct base_access): New class.
(struct default_analyzer): New class.
(scan_function): New function template.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Updated.
* gcc.target/powerpc/pr108073.c: New test.
* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421-2.c: New test.

---
 gcc/cfgexpand.cc | 478 ++-
 gcc/expr.cc  |  15 +-
 gcc/function.cc  |  28 +-
 gcc/tree-sra.h   |  80 +++-
 gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
 8 files changed, 660 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 
edf292cfbe95ac2711faee7769e839cb4edb0dd3..21a09ebac96bbcddc67da73c42f470c6d5f60e6c
 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "builtins.h"
 #include "opts.h"
+#include "tree-sra.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -97,6 +98,472 @@ static bool defer_stack_allocation (tree, bool);
 
 static void record_alignment_for_reg_var (unsigned int);
 
+extern rtx
+expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
+
+/* For light SRA in expander about paramaters and returns.  */
+struct access : public base_access
+{
+  /* The rtx for the access: link to incoming/returning register(s).  */
+  rtx rtx_val;
+};
+
+typedef struct access *access_p;
+
+struct expand_sra : public default_analyzer
+{
+  expand_sra ();
+  ~expand_sra ();
+
+  /* Now use default APIs, no actions for
+ pre_analyze_stmt, analyze_return.  */
+
+  /* overwrite analyze_default_stmt.  */
+  void analyze_default_stmt (gimple *);
+
+  /* overwrite analyze phi,call,asm .  */
+  void analyze_phi (gphi *stmt) { analyze_default_stmt (stmt); };
+  void analyze_call (gcall 

Re: [RFC] light expander sra for parameters and returns

2023-08-05 Thread Jiufu Guo via Gcc-patches
>> >> +  /* Just need one reg for the correspond access.  */
>> >> +  if (end_index == start_index && left_bits == 0 && right_bits == 0)
>> >> + {
>> >> +   rtx reg = extract_parallel_reg (regs, start_index);
>> >> +   if (GET_MODE (reg) != expr_mode)
>> >> + reg = gen_lowpart (expr_mode, reg);
>> >> +
>> >> +   acc->rtx_val = reg;
>> >> +   continue;
>> >> + }
>> >> +
>> >> +  /* Need to shift to extract a part reg for the access.  */
>> >> +  if (!acc->writing && end_index == start_index)
>> >> + {
>> >> +   rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
>> >> +   acc->rtx_val = extract_sub_reg (orig_reg, left_bits, expr_mode);
>> >> +   if (acc->rtx_val)
>> >> + continue;
>> >> + }
>> >> +
>> >> +  break;
>> >> +}
>> >> +
>> >> +  /* If all access expr(s) are not scalarized,
>> >> + bind/map all expr(tree) to sclarized rtx.  */
>> >> +  if (cur_access_index == n)
>> >> +for (int j = 0; j < n; j++)
>> >> +  {
>> >> + access_p access = (*access_vec)[j];
>> >> + expr_rtx_vec->put (access->expr, access->rtx_val);
>> >> +  }
>> >> +}
>> >> +
>> >> +void
>> >> +set_scalar_rtx_for_returns ()
>> >> +{
>> >> +  tree res = DECL_RESULT (current_function_decl);
>> >> +  gcc_assert (res);
>> >> +  edge_iterator ei;
>> >> +  edge e;
>> >> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> >> +if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src)))
>> >> +  {
>> >> + tree val = gimple_return_retval (r);
>> >> + if (val && VAR_P (val))
>> >> +   set_scalar_rtx_for_aggregate_access (val, DECL_RTL (res));
>> >> +  }
>> >> +}
>> >> +
>> >>  /* Return an expression tree corresponding to the RHS of GIMPLE
>> >> statement STMT.  */
>> >>  
>> >> @@ -3778,7 +4274,8 @@ expand_return (tree retval)
>> >>  
>> >>/* If we are returning the RESULT_DECL, then the value has already
>> >>   been stored into it, so we don't have to do anything special.  */
>> >> -  if (TREE_CODE (retval_rhs) == RESULT_DECL)
>> >> +  if (TREE_CODE (retval_rhs) == RESULT_DECL
>> >> +  || get_scalar_rtx_for_aggregate_expr (retval_rhs))
>> >>  expand_value_return (result_rtl);
>> >>  
>> >>/* If the result is an aggregate that is being returned in one (or 
>> >> more)
>> >> @@ -4422,6 +4919,9 @@ expand_debug_expr (tree exp)
>> >>int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));
>> >>addr_space_t as;
>> >>scalar_int_mode op0_mode, op1_mode, addr_mode;
>> >> +  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
>> >> +  if (x)
>> >> +return NULL_RTX;/* optimized out.  */
>> >>  
>> >>switch (TREE_CODE_CLASS (TREE_CODE (exp)))
>> >>  {
>> >> @@ -6620,6 +7120,8 @@ pass_expand::execute (function *fun)
>> >>   avoid_deep_ter_for_debug (gsi_stmt (gsi), 0);
>> >>  }
>> >>  
>> >> +  prepare_expander_sra ();
>> >> +
>> >>/* Mark arrays indexed with non-constant indices with 
>> >> TREE_ADDRESSABLE.  */
>> >>auto_bitmap forced_stack_vars;
>> >>discover_nonconstant_array_refs (forced_stack_vars);
>> >> @@ -7052,6 +7554,7 @@ pass_expand::execute (function *fun)
>> >>loop_optimizer_finalize ();
>> >>  }
>> >>  
>> >> +  free_expander_sra ();
>> >>timevar_pop (TV_POST_EXPAND);
>> >>  
>> >>return 0;
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index fff09dc9951..d487fe3b53b 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -100,6 +100,7 @@ static void do_tablejump (rtx, machine_mode, rtx, 
>> >> rtx, rtx,
>> >>  static rtx const_vector_from_tree (tree);
>> >>  static tree tree_expr_size (const_tree);
>> >>  static void convert_mode_scalar (rtx, rtx, int);
>> >> +rtx get_scalar_rtx_for_aggregate_expr (tree);
>> >>  
>> >> 

Re: [RFC] light expander sra for parameters and returns

2023-08-03 Thread Jiufu Guo via Gcc-patches
> +
>>  /* A subroutine of assign_parms.  Adjust DATA->ENTRY_RTL such that it's
>> always valid and contiguous.  */
>>  
>> @@ -3115,8 +3118,24 @@ assign_parm_setup_block (struct assign_parm_data_all 
>> *all,
>>emit_move_insn (mem, entry_parm);
>>  }
>>else
>> -move_block_from_reg (REGNO (entry_parm), mem,
>> - size_stored / UNITS_PER_WORD);
>> +{
>> +  int regno = REGNO (entry_parm);
>> +  int nregs = size_stored / UNITS_PER_WORD;
>> +  move_block_from_reg (regno, mem, nregs);
>> +
>> +  rtx *tmps = XALLOCAVEC (rtx, nregs);
>> +  machine_mode mode = word_mode;
>> +  HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant ();
>> +  for (int i = 0; i < nregs; i++)
>> +{
>> +  rtx reg = gen_rtx_REG (mode, regno + i);
>> +  rtx off = GEN_INT (word_size * i);
>> +  tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off);
>> +}
>> +
>> +  rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
>> +      set_scalar_rtx_for_aggregate_access (parm, regs);
>> +}
>>      }
>>else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
>>  {
>> @@ -3716,6 +3735,10 @@ assign_parms (tree fndecl)
>>else
>>  set_decl_incoming_rtl (parm, data.entry_parm, false);
>>  
>> +  rtx incoming = DECL_INCOMING_RTL (parm);
>> +  if (GET_CODE (incoming) == PARALLEL)
>> +set_scalar_rtx_for_aggregate_access (parm, incoming);
>> +
>>assign_parm_adjust_stack_rtl ();
>>  
>>if (assign_parm_setup_block_p ())
>> @@ -5136,6 +5159,7 @@ expand_function_start (tree subr)
>>  {
>>gcc_assert (GET_CODE (hard_reg) == PARALLEL);
>>set_parm_rtl (res, gen_group_rtx (hard_reg));
>> +  set_scalar_rtx_for_returns ();
>>  }
>>  }
>>  
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C 
>> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> index 769585052b5..c8995cae707 100644
>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> @@ -5,7 +5,7 @@
>>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>>  // generates a psabi warning and passes arguments in GPRs.
>>  
>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>>  
>>  struct a_thing
>>  {
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> new file mode 100644
>> index 000..7dd1a4a326a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -save-temps" } */
>> +
>> +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } 
>> DF;
>> +typedef struct SF {float a[4]; int i1; int i2; } SF;
>> +
>> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { 
>> has_arch_ppc64 && has_arch_pwr8 } } } } */
>> +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && 
>> has_arch_pwr8 } } } } */
>> +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && 
>> has_arch_pwr8 } } } } */
>> +short  __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return 
>> a.s2+a.s3;return 0;}
>> +int  __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return 
>> a.i2+a.i1;return 0;}
>> +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 
>> 2)return arg.a[3];else return 0.0;}
>> +float  __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 
>> 2)return arg.a[2]; return 0;}
>> +float  __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 
>> 2)return arg.a[1];return 0;}
>> +
>> +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
>> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
>> +
>> +int main()
>> +{
>> +  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 
>> 4.0
>> +&& foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
>> +__builtin_abort ();
>> +  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
>> +&& foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
>> +__builtin_abort ();
>> +  re

Re: [PATCH V5 1/2] Add overflow API for plus minus mult on range

2023-08-02 Thread Jiufu Guo via Gcc-patches


Hi,

I would like to have a ping on this patch.

BR,
Jeff (Jiufu Guo)


Jiufu Guo  writes:

> Hi,
>
> As discussed in previous reviews, adding overflow APIs to range-op
> would be useful. Those APIs could help to check if overflow happens
> when operating between two 'range's, like: plus, minus, and mult.
>
> Previous discussions are here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html
>
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this patch ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> gcc/ChangeLog:
>
>   * range-op-mixed.h (operator_plus::overflow_free_p): New declare.
>   (operator_minus::overflow_free_p): New declare.
>   (operator_mult::overflow_free_p): New declare.
>   * range-op.cc (range_op_handler::overflow_free_p): New function.
>   (range_operator::overflow_free_p): New default function.
>   (operator_plus::overflow_free_p): New function.
>   (operator_minus::overflow_free_p): New function.
>   (operator_mult::overflow_free_p): New function.
>   * range-op.h (range_op_handler::overflow_free_p): New declare.
>   (range_operator::overflow_free_p): New declare.
>   * value-range.cc (irange::nonnegative_p): New function.
>   (irange::nonpositive_p): New function.
>   * value-range.h (irange::nonnegative_p): New declare.
>   (irange::nonpositive_p): New declare.
>
> ---
>  gcc/range-op-mixed.h |  11 
>  gcc/range-op.cc  | 124 +++
>  gcc/range-op.h   |   5 ++
>  gcc/value-range.cc   |  12 +
>  gcc/value-range.h|   2 +
>  5 files changed, 154 insertions(+)
>
> diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
> index 6944742ecbc..42157ed9061 100644
> --- a/gcc/range-op-mixed.h
> +++ b/gcc/range-op-mixed.h
> @@ -383,6 +383,10 @@ public:
> relation_kind rel) const final override;
>void update_bitmask (irange , const irange ,
>  const irange ) const final override;
> +
> +  virtual bool overflow_free_p (const irange &lh, const irange &rh,
> +				relation_trio = TRIO_VARYING) const;
> +
>  private:
>void wi_fold (irange , tree type, const wide_int _lb,
>   const wide_int _ub, const wide_int _lb,
> @@ -446,6 +450,10 @@ public:
>   relation_kind rel) const final override;
>void update_bitmask (irange , const irange ,
>  const irange ) const final override;
> +
> +  virtual bool overflow_free_p (const irange &lh, const irange &rh,
> +				relation_trio = TRIO_VARYING) const;
> +
>  private:
>void wi_fold (irange , tree type, const wide_int _lb,
>   const wide_int _ub, const wide_int _lb,
> @@ -525,6 +533,9 @@ public:
>   const REAL_VALUE_TYPE _lb, const REAL_VALUE_TYPE _ub,
>   const REAL_VALUE_TYPE _lb, const REAL_VALUE_TYPE _ub,
>   relation_kind kind) const final override;
> +  virtual bool overflow_free_p (const irange &lh, const irange &rh,
> +				relation_trio = TRIO_VARYING) const;
> +
>  };
>  
>  class operator_addr_expr : public range_operator
> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
> index cb584314f4c..632b044331b 100644
> --- a/gcc/range-op.cc
> +++ b/gcc/range-op.cc
> @@ -366,6 +366,22 @@ range_op_handler::op1_op2_relation (const vrange ) 
> const
>  }
>  }
>  
> +bool
> +range_op_handler::overflow_free_p (const vrange &lh,
> +                                   const vrange &rh,
> +                                   relation_trio rel) const
> +{
> +  gcc_checking_assert (m_operator);
> +  switch (dispatch_kind (lh, lh, rh))
> +    {
> +      case RO_III:
> +        return m_operator->overflow_free_p(as_a <irange> (lh),
> +                                           as_a <irange> (rh),
> +                                           rel);
> +      default:
> +        return false;
> +    }
> +}
>  
>  // Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.
>  
> @@ -688,6 +704,13 @@ range_operator::op1_op2_relation_effect (irange 
> _range ATTRIBUTE_UNUSED,
>return false;
>  }
>  
> +bool
> +range_operator::overflow_free_p (const irange &, const irange &,
> +  relation_trio) const
> +{
> +  return false;
> +}
> +
>  // Apply any known bitmask updates based on this operator.
>  
>  void
> @@ -4311,6 +4334,107 @@ range_op_table::initialize_integral_ops ()
>  
>  }
>  
> +bool
> +operator_plus::overflow_free_p (const irange &lh, const irange &rh,
> +                                relation_trio) const
> +{
> +  if (lh.undefined_p () || rh.undefined_p ())
> +return false;
> +
> +  tree type = lh.type ();
> +  if (TYPE_OVERFLOW_UNDEFINED (type))
> +return true;
> +
> +  wi::overflow_type ovf;
> +  signop sgn = TYPE_SIGN (type);
> +  wide_int wmax0 = lh.upper_bound ();
> +  wide_int wmax1 = rh.upper_bound ();
> +  wi::add (wmax0, 

Re: [RFC] light expander sra for parameters and returns

2023-08-01 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Mon, 24 Jul 2023, Jiufu Guo wrote:
>>
>>> 
>>> Hi Martin,
>>> 
>>> I'm not sure about your current opinion on re-using the ipa-sra code
>>> in the light-expander-sra. If there is anything I could contribute,
>>> please let me know.
>>>
...
>>
>> What I was hoping for is shared stmt-level analysis and a shared
>> data structure for the "access"(es) a stmt performs.  Because that
>> can come up handy in multiple places.  The existing SRA data
>> structures could easily embed that subset for example if sharing
>> the whole data structure of [IPA] SRA seems too unwieldly.
>
> Understand.
> The stmt-level analysis and "access" data structure are similar
> between ipa-sra/tree-sra and the expander-sra.
>
> I just updated the patch; this version does not change the behavior of
> the previous version.  It only cleans up and merges some functions.
> The patch is attached.
>
> This version (like tree-sra/ipa-sra) still uses a similar
> "stmt analyze" and "access struct".  These could be extracted as
> shared code.
> I'm thinking of updating the code to use the same "base_access" and
> "walk function".

I'm drafting code for the shared stmt-analyze and access-structure.
The code may look like the draft below.

BR,
Jeff (Jiufu Guo)

---
struct base_access
{
  /* Values returned by get_ref_base_and_extent, indicates the
 OFFSET, SIZE and BASE of the access.  */
  HOST_WIDE_INT offset;
  HOST_WIDE_INT size;
  tree base;

  /* The context expression of this access.  */
  tree expr;

  /* Indicates this is a write access.  */
  bool write : 1;

  /* Indicates if this access is made in reverse storage order.  */
  bool reverse : 1;
};

/* Default template for sra_scan_function.  */

struct default_analyzer
{
  /* Template analyze functions.  */
  void analyze_phi (gphi *){};
  void pre_analyze_stmt (gimple *){};
  void analyze_return (greturn *){};
  void analyze_assign (gassign *){};
  void analyze_call (gcall *){};
  void analyze_asm (gasm *){};
  void analyze_default_stmt (gimple *){};
};

/* Scan function and look for interesting expressions.  */

template <typename analyzer>
void
sra_scan_function (struct function *fun, analyzer &a)
{
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    {
      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
           gsi_next (&gsi))
        a.analyze_phi (gsi.phi ());

      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
           gsi_next (&gsi))
        {
          gimple *stmt = gsi_stmt (gsi);
          a.pre_analyze_stmt (stmt);

          switch (gimple_code (stmt))
            {
            case GIMPLE_RETURN:
              a.analyze_return (as_a <greturn *> (stmt));
              break;

            case GIMPLE_ASSIGN:
              a.analyze_assign (as_a <gassign *> (stmt));
              break;

            case GIMPLE_CALL:
              a.analyze_call (as_a <gcall *> (stmt));
              break;

            case GIMPLE_ASM:
              a.analyze_asm (as_a <gasm *> (stmt));
              break;

            default:
              a.analyze_default_stmt (stmt);
              break;
            }
        }
    }
}


struct access : public base_access
{
  /* The rtx for the access: link to incoming/returning register(s).  */
  rtx rtx_val;
};

struct expand_access_analyzer : public default_analyzer
{
  /* Now use default APIs, no actions for
 pre_analyze_stmt, analyze_return.  */

  /* overwrite analyze_default_stmt.  */
  void analyze_default_stmt (gimple *);

  /* overwrite analyze phi,call,asm .  */
  void analyze_phi (gphi *stmt) { analyze_default_stmt (stmt); };
  void analyze_call (gcall *stmt) { analyze_default_stmt (stmt); };
  void analyze_asm (gasm *stmt) { analyze_default_stmt (stmt); };  

  /* overwrite analyze_assign.  */
  void analyze_assign (gassign *);
};
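
The compile-time dispatch idea in the draft above can be tried outside
GCC.  Below is a minimal standalone sketch, not the patch itself:
stmt_kind, counting_analyzer and scan are hypothetical stand-ins for the
gimple codes, expand_access_analyzer and sra_scan_function.  It only
shows how a derived analyzer hides the default no-op handlers and how
the template walker picks them up without virtual calls.

```cpp
#include <vector>

// Hypothetical statement kinds standing in for gimple codes.
enum class stmt_kind { assign, call, other };

// Default analyzer: every handler is a no-op.
struct default_analyzer
{
  void analyze_assign (stmt_kind) {}
  void analyze_call (stmt_kind) {}
  void analyze_default_stmt (stmt_kind) {}
};

// Hypothetical analyzer that hides some of the default handlers.
struct counting_analyzer : default_analyzer
{
  int assigns = 0;
  int others = 0;

  void analyze_assign (stmt_kind) { ++assigns; }
  void analyze_default_stmt (stmt_kind) { ++others; }
  // Route calls to the default-stmt handler, as the draft does.
  void analyze_call (stmt_kind s) { analyze_default_stmt (s); }
};

// Template walker: handlers are resolved at compile time on the
// static type of A, so no virtual functions are needed.
template <typename analyzer>
void
scan (const std::vector<stmt_kind> &stmts, analyzer &a)
{
  for (stmt_kind s : stmts)
    switch (s)
      {
      case stmt_kind::assign:
        a.analyze_assign (s);
        break;
      case stmt_kind::call:
        a.analyze_call (s);
        break;
      default:
        a.analyze_default_stmt (s);
        break;
      }
}
```

Because the handlers are not virtual, the analyzer must be passed with
its concrete type; passing it as default_analyzer would call the no-op
versions.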


>
>>
>> With a stmt-level API using FOR_EACH_IMM_USE_STMT would still be
>> possible (though RTL expansion pre-walks all stmts anyway).
>
> Yes, I also noticed that "FOR_EACH_IMM_USE_STMT" is not enough.
> For struct parameters, walking the stmts is needed.
>
>
> BR,
> Jeff (Jiufu Guo)
>
> -
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index edf292cfbe9..8c36ad5df79 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -97,6 +97,502 @@ static bool defer_stack_allocation (tree, bool);
>  
>  static void record_alignment_for_reg_var (unsigned int);
>  
> +extern rtx
> +expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
> +
> +/* For light SRA in expander about parameters and returns.  */
> +namespace
> +{
> +
> +struct access
> +{
> +  /* Each

Re: [RFC] light expander sra for parameters and returns

2023-08-01 Thread Jiufu Guo via Gcc-patches
 (DF arg, int flag){if (flag == 2)return 
arg.a[3];else return 0.0;}
+float  __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return 
arg.a[2]; return 0;}
+float  __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return 
arg.a[1];return 0;}
+
+DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
+SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
+
+int main()
+{
+  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
+   && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
+__builtin_abort ();
+  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
+   && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
+__builtin_abort ();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 000..4e1f87f7939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,6 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+
+typedef struct LARGE {double a[4]; int arr[32];} LARGE;
+LARGE foo (LARGE a){return a;}
+/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
new file mode 100644
index 000..8a8e1a0e996
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
@@ -0,0 +1,32 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct FLOATS
+{
+  double a[3];
+} FLOATS;
+
+/* 3 lfd after returns also optimized */
+/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */
+
+/* 3 stfd */
+void st_arg (FLOATS a, FLOATS *p) {*p = a;}
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+
+/* blr */
+FLOATS ret_arg (FLOATS a) {return a;}
+
+typedef struct MIX
+{
+  double a[2];
+  long l;
+} MIX;
+
+/* std 3 param regs to return slot */
+MIX ret_arg1 (MIX a) {return a;}
+/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
+
+/* count insns */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */

>
> Richard.
>
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> 
>> Jiufu Guo via Gcc-patches  writes:
>> 
>> > Hi Martin,
>> >
>> > Jiufu Guo via Gcc-patches  writes:
>> >
>> >> Hi,
>> >>
>> >> Martin Jambor  writes:
>> >>
>> >>> Hi,
>> >>>
>> >>> On Tue, May 30 2023, Richard Biener wrote:
>> >>>> On Mon, 29 May 2023, Jiufu Guo wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>> 
>> >>>>> Previously, I was investigating some struct parameters and returns 
>> >>>>> related
>> >>>>> PRs 69143/65421/108073.
>> >>>>> 
>> >>>>> Investigating the issues case by case, and drafting patches for each of
>> >>>>> them one by one. This would help us to enhance code incrementally.
>> >>>>> While, this way, patches would interact with each other and implement
>> >>>>> different codes for similar issues (because of the different paths in
>> >>>>> gimple/rtl).  We may have a common fix for those issues.
>> >>>>> 
>> >>>>> We know a few other related PRs(such as meta-bug PR101926) exist. For 
>> >>>>> those
>> >>>>> PRs in different targets with different symptoms (and also different 
>> >>>>> root
>> >>>>> cause), I would expect a method could help some of them, but it may
>> >>>>> be hard to handle all of them in one fix.
>> >>>>> 
>> >>>>> With investigation and check discussion for the issues, I remember a
>> >>>>> suggestion from Richard: it would be nice to perform some SRA-like 
>> >>>>> analysis
>> >>>>> for the accesses on the structs (parameter/returns).
>> >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>> >>>>> This may be a 'fairly common method' for those issues. With this idea,
>> >>>>> I drafted a patch as below in this mail.
>> >>>>> 
>> >>>>> I also thought about directly using tree-sra.cc, e.g. enhance it and 
>> >>>>> rerun it
>> >>>>> at the end of GIMPLE passes. While since some issues are introduced 
>> >>>>> inside
>> >>>>> the expander, so below

Re: [RFC] light expander sra for parameters and returns

2023-07-23 Thread Jiufu Guo via Gcc-patches


Hi Martin,

I'm not sure about your current opinion on re-using the ipa-sra code
in the light-expander-sra. If there is anything I could contribute,
please let me know.

I'm also thinking about the differences between the expander-sra, ipa-sra
and tree-sra. 1. For stmt walking, expander-sra has special behavior
for return-stmt, and is also a little special on assign-stmt. And phi
stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
I'm also wondering whether we need a tree structure; it would be useful
when checking overlaps, but it is not used now in the expander-sra.

For ipa-sra and tree-sra, I notice that there is some similar code,
but of course there are differences. It seems the differences are
'intended', for example: 1. when creating an access,
'size != max_size' is acceptable in tree-sra but not in ipa-sra.
2. 'AGGREGATE_TYPE_P' is accepted by ipa-sra in some cases, but
not by tree-sra.
I'm wondering whether those slight differences block re-use of the code
between ipa-sra and tree-sra.

The expander-sra may be lighter; for example, maybe we can use
FOR_EACH_IMM_USE_STMT to check the usage of each parameter, without
needing to walk all the stmts.


BR,
Jeff (Jiufu Guo)


Jiufu Guo via Gcc-patches  writes:

> Hi Martin,
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> Martin Jambor  writes:
>>
>>> Hi,
>>>
>>> On Tue, May 30 2023, Richard Biener wrote:
>>>> On Mon, 29 May 2023, Jiufu Guo wrote:
>>>>
>>>>> Hi,
>>>>> 
>>>>> Previously, I was investigating some struct parameters and returns related
>>>>> PRs 69143/65421/108073.
>>>>> 
>>>>> Investigating the issues case by case, and drafting patches for each of
>>>>> them one by one. This would help us to enhance code incrementally.
>>>>> While, this way, patches would interact with each other and implement
>>>>> different codes for similar issues (because of the different paths in
>>>>> gimple/rtl).  We may have a common fix for those issues.
>>>>> 
>>>>> We know a few other related PRs(such as meta-bug PR101926) exist. For 
>>>>> those
>>>>> PRs in different targets with different symptoms (and also different root
>>>>> cause), I would expect a method could help some of them, but it may
>>>>> be hard to handle all of them in one fix.
>>>>> 
>>>>> With investigation and check discussion for the issues, I remember a
>>>>> suggestion from Richard: it would be nice to perform some SRA-like 
>>>>> analysis
>>>>> for the accesses on the structs (parameter/returns).
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>>>>> This may be a 'fairly common method' for those issues. With this idea,
>>>>> I drafted a patch as below in this mail.
>>>>> 
>>>>> I also thought about directly using tree-sra.cc, e.g. enhance it and 
>>>>> rerun it
>>>>> at the end of GIMPLE passes. While since some issues are introduced inside
>>>>> the expander, so below patch also co-works with other parts of the 
>>>>> expander.
>>>>> And since we already have tree-sra in gimple pass, we only need to take 
>>>>> more
>>>>> care on parameter and return in this patch: other decls could be handled
>>>>> well in tree-sra.
>>>>> 
>>>>> The steps of this patch are:
>>>>> 1. Collect struct type parameters and returns, and then scan the function 
>>>>> to
>>>>> get the accesses on them. And figure out the accesses which would be 
>>>>> profitable
>>>>> to be scalarized (using registers of the parameter/return ). Now, reading 
>>>>> on
>>>>> parameter and writing on returns are checked in the current patch.
>>>>> 2. When/after the scalar registers are determined/expanded for the return 
>>>>> or
>>>>> parameters, compute the corresponding scalar register(s) for each 
>>>>> accesses of
>>>>> the return/parameter, and prepare the scalar RTLs for those accesses.
>>>>> 3. When using/expanding the accesses expression, leverage the 
>>>>> computed/prepared
>>>>> scalars directly.
>>>>> 
>>>>> This patch is tested on ppc64 both LE and BE.
>>>>> To continue, I would ask for comments and suggestions first. And then I 
>>>>> would
>>>>> update/enhance according

[PATCH V5 1/2] Add overflow API for plus minus mult on range

2023-07-18 Thread Jiufu Guo via Gcc-patches
Hi,

As discussed in previous reviews, adding overflow APIs to range-op
would be useful. Those APIs could help to check if overflow happens
when operating between two 'range's, like: plus, minus, and mult.
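
To make the idea concrete, here is a minimal standalone sketch of such a
check for plus.  It is not the code in this patch: the interval struct
and plus_overflow_free_p are hypothetical stand-ins for irange and
operator_plus::overflow_free_p, only signed 64-bit values are modelled,
and TYPE_OVERFLOW_UNDEFINED types are not considered.  It relies on the
GCC/Clang builtin __builtin_add_overflow.

```cpp
#include <cstdint>

// Hypothetical closed interval [lo, hi]; a stand-in for irange.
struct interval
{
  int64_t lo;
  int64_t hi;
};

// Return true if a + b cannot overflow int64_t for any a in LH and
// any b in RH.  Addition is monotonic in both operands, so checking
// the two extreme sums is enough.
bool
plus_overflow_free_p (const interval &lh, const interval &rh)
{
  int64_t tmp;
  if (__builtin_add_overflow (lh.hi, rh.hi, &tmp))
    return false;
  if (__builtin_add_overflow (lh.lo, rh.lo, &tmp))
    return false;
  return true;
}
```

For example, [0, 100] + [0, 100] is overflow free, while
[INT64_MAX - 100, INT64_MAX] + [0, 100] is not, because the sum of the
upper bounds overflows.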

Previous discussions are here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* range-op-mixed.h (operator_plus::overflow_free_p): New declare.
(operator_minus::overflow_free_p): New declare.
(operator_mult::overflow_free_p): New declare.
* range-op.cc (range_op_handler::overflow_free_p): New function.
(range_operator::overflow_free_p): New default function.
(operator_plus::overflow_free_p): New function.
(operator_minus::overflow_free_p): New function.
(operator_mult::overflow_free_p): New function.
* range-op.h (range_op_handler::overflow_free_p): New declare.
(range_operator::overflow_free_p): New declare.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

---
 gcc/range-op-mixed.h |  11 
 gcc/range-op.cc  | 124 +++
 gcc/range-op.h   |   5 ++
 gcc/value-range.cc   |  12 +
 gcc/value-range.h|   2 +
 5 files changed, 154 insertions(+)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 6944742ecbc..42157ed9061 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -383,6 +383,10 @@ public:
  relation_kind rel) const final override;
   void update_bitmask (irange , const irange ,
   const irange ) const final override;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
+
 private:
   void wi_fold (irange , tree type, const wide_int _lb,
const wide_int _ub, const wide_int _lb,
@@ -446,6 +450,10 @@ public:
relation_kind rel) const final override;
   void update_bitmask (irange , const irange ,
   const irange ) const final override;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
+
 private:
   void wi_fold (irange , tree type, const wide_int _lb,
const wide_int _ub, const wide_int _lb,
@@ -525,6 +533,9 @@ public:
const REAL_VALUE_TYPE _lb, const REAL_VALUE_TYPE _ub,
const REAL_VALUE_TYPE _lb, const REAL_VALUE_TYPE _ub,
relation_kind kind) const final override;
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
+
 };
 
 class operator_addr_expr : public range_operator
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index cb584314f4c..632b044331b 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -366,6 +366,22 @@ range_op_handler::op1_op2_relation (const vrange ) 
const
 }
 }
 
+bool
+range_op_handler::overflow_free_p (const vrange &lh,
+                                   const vrange &rh,
+                                   relation_trio rel) const
+{
+  gcc_checking_assert (m_operator);
+  switch (dispatch_kind (lh, lh, rh))
+    {
+      case RO_III:
+        return m_operator->overflow_free_p(as_a <irange> (lh),
+                                           as_a <irange> (rh),
+                                           rel);
+      default:
+        return false;
+    }
+}
 
 // Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.
 
@@ -688,6 +704,13 @@ range_operator::op1_op2_relation_effect (irange _range 
ATTRIBUTE_UNUSED,
   return false;
 }
 
+bool
+range_operator::overflow_free_p (const irange &, const irange &,
+relation_trio) const
+{
+  return false;
+}
+
 // Apply any known bitmask updates based on this operator.
 
 void
@@ -4311,6 +4334,107 @@ range_op_table::initialize_integral_ops ()
 
 }
 
+bool
+operator_plus::overflow_free_p (const irange &lh, const irange &rh,
+                                relation_trio) const
+{
+  if (lh.undefined_p () || rh.undefined_p ())
+    return false;
+
+  tree type = lh.type ();
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+    return true;
+
+  wi::overflow_type ovf;
+  signop sgn = TYPE_SIGN (type);
+  wide_int wmax0 = lh.upper_bound ();
+  wide_int wmax1 = rh.upper_bound ();
+  wi::add (wmax0, wmax1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+    return false;
+
+  if (TYPE_UNSIGNED (type))
+    return true;
+
+  wide_int wmin0 = lh.lower_bound ();
+  wide_int wmin1 = rh.lower_bound ();
+  wi::add (wmin0, wmin1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+    return false;
+
+  return true;
+}
+
+bool

[PATCH V5 2/2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-18 Thread Jiufu Guo via Gcc-patches


Hi,

Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
if there is no wrap/overflow/underflow and "X - N * M" has the same
sign as "X".

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
- APIs: overflow, nonnegative_p and nonpositive_p are moved close
  to value range.
- Use the above APIs in match.pd.
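
A small standalone illustration of why the same-sign condition matters
for truncating division.  This is only a sketch: div_before and
div_after are hypothetical helper names, not part of the patch, and the
inputs are chosen so that no overflow occurs.

```cpp
// The expression before the rewrite: (X - N * M) / N.
long
div_before (long x, long n, long m)
{
  return (x - n * m) / n;
}

// The expression after the rewrite: X / N - M.
long
div_after (long x, long n, long m)
{
  return x / n - m;
}
```

With X = 7, N = 2, M = 3, both forms give 0, since X and X - N * M = 1
are both nonnegative.  With X = 1, N = 2, M = 1, X - N * M = -1 has a
different sign from X: the original form gives -1 / 2 = 0 but the
rewritten form gives 1 / 2 - 1 = -1, so the rewrite is not valid there.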

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)

PR tree-optimization/108757

gcc/ChangeLog:

* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/match.pd  |  85 +++
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
 4 files changed, 355 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..39dbb0567dc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -942,6 +942,91 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+#if GIMPLE
+(for div (trunc_div exact_div)
+ /* Simplify (t + M*N) / N -> t / N + M.  */
+ (simplify
+  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
+  (with {value_range vr0, vr1, vr2, vr3, vr4;}
+  (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (vr1, @1)
+   && get_range_query (cfun)->range_of_expr (vr2, @2)
+   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
+   && get_range_query (cfun)->range_of_expr (vr0, @0)
+   && get_range_query (cfun)->range_of_expr (vr3, @3)
+   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
+   && get_range_query (cfun)->range_of_expr (vr4, @4)
+   && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr4.nonpositive_p ())))
+  (plus (div @0 @2) @1))))
+
+ /* Simplify (t - M*N) / N -> t / N - M.  */
+ (simplify
+  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
+  (with {value_range vr0, vr1, vr2, vr3, vr4;}
+  (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (vr1, @1)
+   && get_range_query (cfun)->range_of_expr (vr2, @2)
+   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
+   && get_range_query (cfun)->range_of_expr (vr0, @0)
+   && get_range_query (cfun)->range_of_expr (vr3, @3)
+   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
+   && get_range_query (cfun)->range_of_expr (vr4, @4)
+   && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr4.nonpositive_p ())))
+  (minus (div @0 @2) @1)))))
+
+/* Simplify
+   (t + C) / N -> t / N + C / N where C is multiple of N.
+   (t + C) >> N -> t >> N + C>>N if low N bits of C is 0.  */
+(for op (trunc_div exact_div rshift)
+ (simplify
+  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+   (with
+{
+  wide_int c = wi::to_wide (@1);
+  wide_int n = wi::to_wide (@2);
+  bool is_rshift = op == RSHIFT_EXPR;
+  bool neg_c = false;
+  bool ok = false;
+  value_range vr0;
+  if (INTEGRAL_TYPE_P (type)
+ && get_range_query (cfun)->range_of_expr (vr0, @0))
+{
+ ok = is_rshift ? wi::ctz (c) >= n.to_shwi ()
+: wi::multiple_of_p (c, n, TYPE_SIGN (type));
+ value_range vr1, vr3;
+ ok = ok && get_range_query (cfun)->range_of_expr (vr1, @1)
+  && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
+  && get_range_query (cfun)->range_of_expr (vr3, @3)
+  && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr3.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr3.nonpositive_p ()));
+
+ /* Try check 'X + C' as 'X - -C' for unsigned.  */
+ if (!ok && TYPE_UNSIGNED (type) && c.sign_mask () < 0)
+   {
+ neg_c = true;
+ c = -c;
+ ok = is_rshift ? wi::ctz (c) >= n.to_shwi ()
+: wi::multiple_of_p (c, n, UNSIGNED);
+ ok = ok && wi::geu_p (vr0.lower_bound (), c);
+   }
+   }
+}
+   (if (ok)
+   (with
+{
+  wide_int m;
+  m = is_rshift ? wi::rshift (c, n, TYPE_SIGN (type))
+   : wi::div_trunc (c, n, TYPE_SIGN (type));
+  m = neg_c ? -m : m;
+}
+   (plus (op @0 @2) { wide_int_to_tree(type, m); }))))))
+#endif
+
 (for op (negate abs)
  /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
  (for coss (COS COSH)
diff --git a/gcc/testsuite/gcc.dg/pr108757-1.c 

Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-18 Thread Jiufu Guo via Gcc-patches


Hi,

Andrew MacLeod  writes:

> On 7/17/23 09:45, Jiufu Guo wrote:
>>
>>>> Should we decide we would like it in general, it wouldn't be hard to add to
>>>> irange.  wi_fold() currently returns null, it could easily return a bool
>>>> indicating if an overflow happened, and wi_fold_in_parts and fold_range
>>>> would simply OR the results all together of the component wi_fold() calls.
>>>> It would require updating/auditing a number of range-op entries and adding
>>>> an overflowed_p() query to irange.
>>> Ah, yeah - the folding APIs would be a good fit I guess.  I was
>>> also looking to have the "new" helpers to be somewhat consistent
>>> with the ranger API.
>>>
>>> So if we had a fold_range overload with either an output argument
>>> or a flag that makes it return false on possible overflow that
>>> would work I guess?  Since we have a virtual class setup we
>>> might be able to provide a default failing method and implement
>>> workers for plus and mult (as needed for this patch) as the need
>>> arises?
>> Thanks for your comments!
>> Here is a concern.  The patterns in match.pd may be supported by
>> 'vrp' passes. At that time, the range info would be computed (via
>> the value-range machinery) and cached for each SSA_NAME. In the
>> patterns, when range_of_expr is called for a capture, the range
>> info is retrieved from the cache, and no need to fold_range again.
>> This means the overflow info may also need to be cached together
>> with other range info.  There may be additional memory and time
>> cost.
>>
>
> I've been thinking about this a little bit, and how to make the info 
> available in a useful way.
>
> I wonder if maybe we just add another entry point  to range-ops that looks a 
> bit like fold_range ..
>
> Attached is an (untested) patch which adds overflow_free_p(op1, op2,
> relation) to range-ops.  It defaults to returning false.  If you want
> to implement it for, say, plus, you'd add to operator_plus in
> range-op.cc something like
>
> operator_plus::overflow_free_p (irange, irange& op2, relation_kind)
> {
>    // stuff you do in plus_without_overflow
> }
>
> I added relation_kind as a param, but you can ignore it.  Maybe it won't
> ever help, but it seems like if we know there is a relation between
> op1 and op2 we might be able to someday determine something else?
> If not, remove it.
>
> Then all you need to do to access it is to go through range_op_handler,
> so for instance:
>
> range_op_handler (PLUS_EXPR).overflow_free_p (op1, op2)
>
> It'll work for all types and all tree codes. The dispatch machinery
> will return false unless both op1 and op2 are integral ranges, and
> then it will invoke the appropriate handler, defaulting to returning
> FALSE.

Very good suggestions! Thanks so much for your great guidance!

>
> I also am not a fan of the get_range  routine.  It would be better to
> generally just call range_of_expr, get the results, then handle
> undefined in the new overflow_free_p() routine and return false. 
> varying should not need anything special since it will trigger the
> overflow when you do the calculation.

The general code in trunk is just as you said: range_of_expr is
used when querying a range for an expr.
I am also aware that a varying range ([min, max]) may be ok if the
range is computed from other ranges, especially if there is no overflow.
For example, '[MAX-100, MAX] - [0, 100]' generates a varying range, but
it would be ok in some cases.
And a varying range will trigger overflow if it takes part in a
calculation, as you said.
So, I agree that varying does not need special handling in the patterns.

>
> The auxiliary routines could go in vr-values.h/cc.  They seem like
> things that simplify_using_ranges could utilize, and when we get to
> integrating simplify_using_ranges better,  what you are doing may end
> up there anyway

Thanks for your suggestion!  Or maybe we could just use the APIs 
in match.pd directly.

>
> Does that work?

I believe this would work!
I will submit a new version of the patch!  Thanks again for your comments!

BR,
Jeff (Jiufu Guo)

>
> Andrew


Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-17 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 14 Jul 2023, Andrew MacLeod wrote:
>
>> 
>> On 7/14/23 09:37, Richard Biener wrote:
>> > On Fri, 14 Jul 2023, Aldy Hernandez wrote:
>> >
>> >> I don't know what you're trying to accomplish here, as I haven't been
>> >> following the PR, but adding all these helper functions to the ranger
>> >> header
>> >> file seems wrong, especially since there's only one use of them. I see
>> >> you're
>> >> tweaking the irange API, adding helper functions to range-op (which is 
>> >> only
>> >> for code dealing with implementing range operators for tree codes), etc
>> >> etc.
>> >>
>> >> If you need these helper functions, I suggest you put them closer to their
>> >> uses (i.e. wherever the match.pd support machinery goes).
>> > Note I suggested the opposite beacuse I thought these kind of helpers
>> > are closer to value-range support than to match.pd.
>> 
>> 
>> probably vr-values.{cc.h} and  the simply_using_ranges paradigm would be the
>> most sensible place to put these kinds of auxiliary routines?
>> 
>> 
>> >
>> > But I take away from your answer that there's nothing close in the
>> > value-range machinery that answers the question whether A op B may
>> > overflow?
>> 
>> we dont track it in ranges themselves.   During calculation of a range we
>> obviously know, but propagating that generally when we rarely care doesn't
>> seem worthwhile.  The very first generation of irange 6 years ago had an
>> overflow_p() flag, but it was removed as not being worth keeping.     easier
>> to simply ask the question when it matters
>> 
>> As the routines show, it pretty easy to figure out when the need arises so I
>> think that should suffice.  At least for now,
>> 
>> Should we decide we would like it in general, it wouldnt be hard to add to
>> irange.  wi_fold() cuurently returns null, it could easily return a bool
>> indicating if an overflow happened, and wi_fold_in_parts and fold_range would
>> simply OR the results all together of the compoent wi_fold() calls.  It would
>> require updating/audfiting  a number of range-op entries and adding an
>> overflowed_p()  query to irange.
>
> Ah, yeah - the folding APIs would be a good fit I guess.  I was
> also looking to have the "new" helpers to be somewhat consistent
> with the ranger API.
>
> So if we had a fold_range overload with either an output argument
> or a flag that makes it return false on possible overflow that
> would work I guess?  Since we have a virtual class setup we
> might be able to provide a default failing method and implement
> workers for plus and mult (as needed for this patch) as the need
> arises?

Thanks for your comments!
Here is a concern.  The patterns in match.pd may rely on range info
from the 'vrp' passes.  By that time, the range info has already been
computed (via the value-range machinery) and cached for each SSA_NAME.
In the patterns, when range_of_expr is called for a capture, the range
info is retrieved from the cache, and there is no need to call
fold_range again.  This means the overflow info may also need to be
cached together with the other range info, which may cost additional
memory and time.

BR,
Jeff (Jiufu Guo)

>
> Thanks,
> Richard.


Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-17 Thread Jiufu Guo via Gcc-patches


Hi Andrew, Aldy and Richard,

Thanks a lot for all your very helpful comments!

Andrew MacLeod  writes:

> On 7/14/23 09:37, Richard Biener wrote:
>> On Fri, 14 Jul 2023, Aldy Hernandez wrote:
>>
>>> I don't know what you're trying to accomplish here, as I haven't been
>>> following the PR, but adding all these helper functions to the ranger header
>>> file seems wrong, especially since there's only one use of them. I see 
>>> you're
>>> tweaking the irange API, adding helper functions to range-op (which is only
>>> for code dealing with implementing range operators for tree codes), etc etc.
>>>
>>> If you need these helper functions, I suggest you put them closer to their
>>> uses (i.e. wherever the match.pd support machinery goes).
>> Note I suggested the opposite beacuse I thought these kind of helpers
>> are closer to value-range support than to match.pd.
>
>
> probably vr-values.{cc.h} and  the simply_using_ranges paradigm would
> be the most sensible place to put these kinds of auxiliary routines?

Thanks! Richard also mentioned this as an example of using VR APIs.
I did not use vr-values.h/cc just because it seems vr-values are not
used/included in match.pd directly yet.

>
>
>>
>> But I take away from your answer that there's nothing close in the
>> value-range machinery that answers the question whether A op B may
>> overflow?
>
> we dont track it in ranges themselves.   During calculation of a range
> we obviously know, but propagating that generally when we rarely care
> doesn't seem worthwhile.  The very first generation of irange 6 years
> ago had an overflow_p() flag, but it was removed as not being worth
> keeping.     easier to simply ask the question when it matters

Right, agree!

>
> As the routines show, it pretty easy to figure out when the need arises so I 
> think that should suffice.  At least for now,
>
> Should we decide we would like it in general, it wouldnt be hard to
> add to irange.  wi_fold() cuurently returns null, it could easily
> return a bool indicating if an overflow happened, and wi_fold_in_parts
> and fold_range would simply OR the results all together of the
> compoent wi_fold() calls.  It would require updating/audfiting  a
> number of range-op entries and adding an overflowed_p()  query to
> irange.

I also tried to add a 'm_ovf' field to irange and record the
overflow info during 'wi_fold' in range-op (e.g. plus/minus/mult).
But, as you said, overflow info is not widely used (it seems that
match.pd covers most of the cases which are using overflow info).
It may not be worth adding a field to every instance of VR, and the
additional cost of maintaining it during VR assign/union_/intersect
may not pay off.
I've attached a patch for this idea, just for reference.
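
As a rough picture of what carrying such a flag means, here is a small
sketch in C.  It is a toy model under my own assumptions: 'ovf_range',
'ovf_range_plus' and 'ovf_range_union' are hypothetical names and do not
correspond to the irange code in the attached patch.  It only shows the
shape of the idea: the flag is set when a bound computation overflows,
and every later operation has to OR the flags of its inputs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy interval that carries a sticky "an overflow happened" flag.
   Hypothetical type for illustration; not GCC's irange.  */
struct ovf_range
{
  int lo;
  int hi;
  bool ovf;
};

/* Add two ranges bound by bound.  If either bound overflows, fall back
   to the full range and set the flag.  */
static struct ovf_range
ovf_range_plus (struct ovf_range a, struct ovf_range b)
{
  struct ovf_range r;
  bool o1 = __builtin_add_overflow (a.lo, b.lo, &r.lo);
  bool o2 = __builtin_add_overflow (a.hi, b.hi, &r.hi);
  r.ovf = a.ovf || b.ovf || o1 || o2;
  if (o1 || o2)
    {
      r.lo = INT_MIN;
      r.hi = INT_MAX;
    }
  return r;
}

/* Union of two ranges: the flag must be maintained here as well.  */
static struct ovf_range
ovf_range_union (struct ovf_range a, struct ovf_range b)
{
  struct ovf_range r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  r.ovf = a.ovf || b.ovf;
  return r;
}
```

Even in this toy form, each operation has extra work to do and each
range instance grows, which is the cost discussed above.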

Currently, what I'm trying to do is find a better place to add
APIs to check the overflow info for match.pd.

vr-range.h/cc would be one choice :)
I noticed Aldy mentioned 'may_overflow_p' in a comment on PR100499.
Where was this API intended to be added?  That may be another choice.

Thanks again for your great comments!

BR,
Jeff (Jiufu Guo)

>
> Andrew

gcc/ChangeLog:

* range-op.cc (value_range_with_overflow): Call set_overflow.
(operator_mult::wi_fold): Call set_overflow.
* value-query.h (get_range): New function.
* value-range-storage.cc (irange_storage::set_irange): Set
ovf info.
(irange_storage::get_irange): Call set_overflow.
* value-range-storage.h (irange_storage): Add m_ovf field.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
(irange::operator=): Maintain ovf info.
(irange::union_): Maintain ovf info.
(irange::intersect): Maintain ovf info.
* value-range.h (irange::irange): Initialize m_ovf.

---
 gcc/range-op.cc| 17 +
 gcc/value-query.h  | 10 ++
 gcc/value-range-storage.cc |  2 ++
 gcc/value-range-storage.h  |  1 +
 gcc/value-range.cc | 29 ++---
 gcc/value-range.h  |  6 ++
 6 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3ab2c665901..02971e8a16a 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -433,6 +433,10 @@ value_range_with_overflow (irange &r, tree type,
   const unsigned int prec = TYPE_PRECISION (type);
   const bool overflow_wraps = TYPE_OVERFLOW_WRAPS (type);
 
+  bool ovf = !TYPE_OVERFLOW_UNDEFINED (type)
+&& (min_ovf != wi::OVF_NONE || max_ovf != wi::OVF_NONE);
+  r.set_overflow (ovf ? wi::OVF_UNKNOWN : wi::OVF_NONE);
+
   // For one bit precision if max != min, then the range covers all
   // values.
   if (prec == 1 && wi::ne_p (wmax, wmin))
@@ -2050,10 +2054,15 @@ operator_mult::wi_fold (irange &r, tree type,
 
   // Sort the 4 products so that min is in prod0 and max is in
   // prod3.
-  widest2_int prod0 = min0 * min1;
-  widest2_int prod1 = min0 * max1;
-  widest2_int prod2 = 

Re: [RFC] light expander sra for parameters and returns

2023-07-13 Thread Jiufu Guo via Gcc-patches


Hi Martin,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Martin Jambor  writes:
>
>> Hi,
>>
>> On Tue, May 30 2023, Richard Biener wrote:
>>> On Mon, 29 May 2023, Jiufu Guo wrote:
>>>
>>>> Hi,
>>>> 
>>>> Previously, I was investigating some struct parameters and returns related
>>>> PRs 69143/65421/108073.
>>>> 
>>>> Investigating the issues case by case, and drafting patches for each of
>>>> them one by one. This would help us to enhance code incrementally.
>>>> While, this way, patches would interact with each other and implement
>>>> different codes for similar issues (because of the different paths in
>>>> gimple/rtl).  We may have a common fix for those issues.
>>>> 
>>>> We know a few other related PRs(such as meta-bug PR101926) exist. For those
>>>> PRs in different targets with different symptoms (and also different root
>>>> cause), I would expect a method could help some of them, but it may
>>>> be hard to handle all of them in one fix.
>>>> 
>>>> With investigation and check discussion for the issues, I remember a
>>>> suggestion from Richard: it would be nice to perform some SRA-like analysis
>>>> for the accesses on the structs (parameter/returns).
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>>>> This may be a 'fairly common method' for those issues. With this idea,
>>>> I drafted a patch as below in this mail.
>>>> 
>>>> I also thought about directly using tree-sra.cc, e.g. enhance it and rerun 
>>>> it
>>>> at the end of GIMPLE passes. While since some issues are introduced inside
>>>> the expander, so below patch also co-works with other parts of the 
>>>> expander.
>>>> And since we already have tree-sra in gimple pass, we only need to take 
>>>> more
>>>> care on parameter and return in this patch: other decls could be handled
>>>> well in tree-sra.
>>>> 
>>>> The steps of this patch are:
>>>> 1. Collect struct type parameters and returns, and then scan the function 
>>>> to
>>>> get the accesses on them. And figure out the accesses which would be 
>>>> profitable
>>>> to be scalarized (using registers of the parameter/return ). Now, reading 
>>>> on
>>>> parameter and writing on returns are checked in the current patch.
>>>> 2. When/after the scalar registers are determined/expanded for the return 
>>>> or
>>>> parameters, compute the corresponding scalar register(s) for each accesses 
>>>> of
>>>> the return/parameter, and prepare the scalar RTLs for those accesses.
>>>> 3. When using/expanding the accesses expression, leverage the 
>>>> computed/prepared
>>>> scalars directly.
>>>> 
>>>> This patch is tested on ppc64 both LE and BE.
>>>> To continue, I would ask for comments and suggestions first. And then I 
>>>> would
>>>> update/enhance accordingly.  Thanks in advance!
>>>
>>> Thanks for working on this - the description above sounds exactly like
>>> what should be done.
>>>
>>> Now - I'd like the code to re-use the access tree data structure from
>>> SRA plus at least the worker creating the accesses from a stmt.
>>

I'm thinking about which part of the code can be re-used from
ipa-sra and tree-sra.
It seems there are some similar concepts between them:
"access with offset/size", "collect and check candidates",
"analyze accesses"...

However, because the purposes are different, the logic and behavior
of the three (ipa-sra, tree-sra, and expander-sra) differ,
even for similar concepts.

Parts with the same behavior and a similar concept may be reusable.  The
list below may cover some of them.
*. allocate and maintain access
   basic access structure: offset, size, reverse
*. type or expr checking
*. disqualify
*. scan and build expr access
*. scan and walk stmts (return/assign/call/asm)
*. collect candidates
*. initialize/deinitialize
*. access dump
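
As a rough illustration of the shared "basic access structure" item, here
is a sketch in C.  Everything in it is hypothetical: 'toy_access' and
'toy_access_regs' are names made up for this mail, loosely modeled on the
offset/size/reverse triple, and are not the access structures of tree-sra
or ipa-sra.  It maps an access to the registers that would cover it,
assuming the aggregate is passed in consecutive registers of equal width.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access record; not GCC's struct access.  */
struct toy_access
{
  unsigned offset;	/* Bit offset from the start of the aggregate.  */
  unsigned size;	/* Size of the access in bits.  */
  bool reverse;		/* True for reverse storage order.  */
};

/* Compute the indexes of the first and last register covering ACC when
   the aggregate is passed in consecutive registers of REG_BITS bits
   each.  Return false for an empty access or a zero register width.  */
static bool
toy_access_regs (struct toy_access acc, unsigned reg_bits,
		 unsigned *first, unsigned *last)
{
  if (acc.size == 0 || reg_bits == 0)
    return false;
  *first = acc.offset / reg_bits;
  *last = (acc.offset + acc.size - 1) / reg_bits;
  return true;
}
```

An access that ends up with first == last can be served from a single
scalar register; one that spans two registers needs more care.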

There are also different behaviors for similar concepts.
For example:
*. Access has grp/queues in tree-sra, access has nonarg in ipa-sra,
and expander-sra does not check access's child/sibling yet.
*. For the same stmt (assign/call), each sra checks different logic.
*. Candidates have different checking logic: ipa-sra checks more stuff.

Does this align with your thoughts?  Thanks for comments!

BR,
Jeff (Jiufu Guo)

> Thanks Martin for your repl

[PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-11 Thread Jiufu Guo via Gcc-patches
Hi,

Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
if there is no wrap/overflow/underflow and "X - N * M" has the same
sign as "X".
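
A small C sketch shows why the sign condition matters.  The function
names 'div_before' and 'div_after' are only for illustration; they are
the two forms of the expression written out by hand.  C integer division
truncates toward zero, so the two forms can differ when the subtraction
crosses zero.

```c
#include <assert.h>

/* The original form: (X - N * M) / N.  */
static int
div_before (int x, int n, int m)
{
  return (x - n * m) / n;
}

/* The simplified form: X / N - M.  */
static int
div_after (int x, int n, int m)
{
  return x / n - m;
}
```

For x = 7, n = 2, m = 1 both forms give 2, since 7 and 7 - 2 have the
same sign.  For x = 1, n = 2, m = 1 the first form gives -1 / 2 = 0 while
the second gives 0 - 1 = -1, because 1 and 1 - 2 have different signs.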

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623028.html
- The APIs for checking overflow of range operations are moved to
other files: range-op and gimple-range.
- The patterns with '(X + C)' for unsigned types are improved.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)


PR tree-optimization/108757

gcc/ChangeLog:

* gimple-range.cc (arith_without_overflow_p): New function.
(same_sign_p): New function.
* gimple-range.h (arith_without_overflow_p): New declare.
(same_sign_p): New declare.
* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.
* range-op.cc (plus_without_overflow_p): New function.
(minus_without_overflow_p): New function.
(mult_without_overflow_p): New function.
* range-op.h (plus_without_overflow_p): New declare.
(minus_without_overflow_p): New declare.
(mult_without_overflow_p): New declare.
* value-query.h (get_range): New function.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/gimple-range.cc   |  50 +++
 gcc/gimple-range.h|   2 +
 gcc/match.pd  |  64 
 gcc/range-op.cc   |  77 ++
 gcc/range-op.h|   4 +
 gcc/value-query.h |  10 ++
 gcc/value-range.cc|  12 ++
 gcc/value-range.h |   2 +
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
 11 files changed, 491 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 01e62d3ff3901143bde33dc73c0debf41d0c0fdd..620fe32e85e5fe3847a933554fc656b2939cf02d 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -926,3 +926,53 @@ assume_query::dump (FILE *f)
 }
   fprintf (f, "--\n");
 }
+
+/* Return true if the operation "X CODE Y" in TYPE does not overflow,
+   underflow or wrap according to value range info, otherwise return false.  */
+
+bool
+arith_without_overflow_p (tree_code code, tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  switch (code)
+{
+case PLUS_EXPR:
+  return plus_without_overflow_p (vr0, vr1, type);
+case MINUS_EXPR:
+  return minus_without_overflow_p (vr0, vr1, type);
+case MULT_EXPR:
+  return mult_without_overflow_p (vr0, vr1, type);
+default:
+  gcc_unreachable ();
+}
+
+  return false;
+}
+
+/* Return true if "X" and "Y" have the same sign or zero.  */
+
+bool
+same_sign_p (tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_UNSIGNED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  return (vr0.nonnegative_p () && vr1.nonnegative_p ())
+|| (vr0.nonpositive_p () && vr1.nonpositive_p ());
+}
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 6587e4923ff44e10826a697ecced237a0ad23c88..84eac87392b642ed3305011415c804f5b319e09f 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -101,5 +101,7 @@ protected:
   gori_compute m_gori;
 };
 
+bool arith_without_overflow_p (tree_code code, tree x, tree y, tree type);
+bool same_sign_p (tree x, tree y, tree type);
 
 #endif // GCC_GIMPLE_RANGE_H
diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28e4f39b2b2a40d0702aed88786bbb3..87e990c5b1ebbd116d7d7efdba62347d3a967cdd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -942,6 +942,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+#if GIMPLE
+(for div (trunc_div exact_div)
+ /* Simplify (t + M*N) / N -> t / N + M.  */
+ (simplify
+  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
+  (if (INTEGRAL_TYPE_P (type)
+   && arith_without_overflow_p (MULT_EXPR, @1, @2, type)
+   && arith_without_overflow_p (PLUS_EXPR, @0, @3, type)
+   && same_sign_p (@0, @4, type))
+  (plus (div @0 @2) @1)))
+
+ /* Simplify (t - M*N) / N -> t / N - M.  */
+ 

Re: [PATCH V3] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-04 Thread Jiufu Guo via Gcc-patches


Hi Richard/Andrew!

Richard Biener  writes:

> On Thu, 29 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Jiufu Guo  writes:
>> 
>> > Hi,
>> >
>> > Integer expression "(X - N * M) / N" can be optimized to "X / N - M" if
>> > there is no wrap/overflow/underflow and "X - N * M" has the same sign
>> > with "X".
>> >
>> > Compare with the previous version:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620896.html
>> > This version changes:
>> > 1. Remove the behavior to convert 'm' to '-m' for unsigned variable.
>> >This kind of case is rare, and it makes the code ambiguous.
>> > 2. Use the 'capture' expression and avoid building new expressions.
>> > 3. Add APIs like get_range and nonpositive/nonnegative.
>> > 4. Refactor patterns in match.pd and function names and signatures.
>> >
>> > While some APIs are still in gimple-fold.cc/h.  Tried to add them
>> > to other files, but did not find a better place.
>> > Thanks for comments/suggestions!
>> 
>> Saving and propagating overflow information in range-op and value-range
>> maybe one idea.  While I'm wondering if this is a better method from
>> the aspect of compiling time and memory usage.
>> As below attached patch, a m_ovf field is added to irange, and maintain
>> it in range-op/value-range-storage.
>
> I don't think we want to store that.  But still we should have a
> way to compute whether the operation in a stmt could overflow, but
> this should be (or already does) exist within the ranger API.
>
> So do not add mult_without_overflow_p and friends to gimple-fold.{cc,h}
> but instead leverage what's available in ranger (or add to it there).
>
> IIRC simplify_using_ranges for example used to perform narrowing
> based on knowledge if the resulting op would overflow or not.
>
> CCing Andrew again here.

I'm also trying to find a good place to define these APIs, but have not
found a perfect one.

If you have any suggestions, please point them out!
Thanks in advance.

*. In the trunk code, there are a few places that care about overflow info:
- simplify_using_ranges is using arith_overflowed_p to check overflow info
  on CST value.
- arith_overflowed_p is defined in fold-const/gimple-fold. It checks overflow
  info via wi::add/sub/mult
  -- fold_builtin_arith_overflow/fold_binary_loc also care about CST.
  -- TREE_OVERFLOW(cst)/TREE_OVERFLOW_P, it is also about CONSTANT_CLASS
 CST_CHECK (NODE)->base.public_flag.

- rewrite_to_defined_overflow and match_arith_overflow are transforming
  signed integer operations to unsigned arithmetic.

- range-op is also using wi::add/sub/mult/neg to check overflow info.
  This is why I was trying to use the overflow info directly.
  If we maintain the overflow info in value-range, we may need to pay the
  cost in range-op/ssa_cache/range-value-store. I'm also wondering if we
  want to do this.

- match.pd: a few patterns in this file are checking overflow info via
  value-range and wi::add/sub/mult. This is mostly what we need.
  Most of the current match.pd patterns inline the value-range and
  overflow info checking.
  I'm wondering if we may want to extract common code from patterns into
  gimple-fold.cc/h (or xx-match-head.cc).


BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
>> index 3ab2c665901..7c287aed8b8 100644
>> --- a/gcc/range-op.cc
>> +++ b/gcc/range-op.cc
>> @@ -261,6 +261,7 @@ range_operator::fold_range (irange , tree type,
>>  relation_trio trio) const
>>  {
>>gcc_checking_assert (r.supports_type_p (type));
>> +  r.set_overflow (lh.with_overflow () || rh.with_overflow ());
>>if (empty_range_varying (r, type, lh, rh))
>>  return true;
>>  
>> @@ -433,6 +434,10 @@ value_range_with_overflow (irange , tree type,
>>const unsigned int prec = TYPE_PRECISION (type);
>>const bool overflow_wraps = TYPE_OVERFLOW_WRAPS (type);
>>  
>> +  if (!TYPE_OVERFLOW_UNDEFINED (type)
>> +  && (min_ovf != wi::OVF_NONE || max_ovf != wi::OVF_NONE))
>> +r.set_overflow (true);
>> +
>>// For one bit precision if max != min, then the range covers all
>>// values.
>>if (prec == 1 && wi::ne_p (wmax, wmin))
>> @@ -2050,10 +2055,15 @@ operator_mult::wi_fold (irange , tree type,
>>  
>>// Sort the 4 products so that min is in prod0 and max is in
>>// prod3.
>> -  widest2_int prod0 = min0 * min1;
>> -  widest2_int prod1 = min0 * max1;
>> -  widest2_int prod2 = max0 * min1;
>> -  widest2_int prod3 = max0 * max1;
>> +  wi::overflow_type ovf1, ovf2, ovf3, ovf4;
>> +  widest2_int prod0 = wi::mul (min0, min1, sign, );
>> +  widest2_int prod1 = wi::mul (min0, max1, sign, );
>> +  widest2_int prod2 = wi::mul (max0, min1, sign, );
>> +  widest2_int prod3 = wi::mul (max0, max1, sign, );
>> +  if (!TYPE_OVERFLOW_UNDEFINED (type)
>> +  && (ovf1 != wi::OVF_NONE || ovf2 != wi::OVF_NONE || ovf3 != 
>> wi::OVF_NONE
>> +  || ovf3 != wi::OVF_NONE))
>> +r.set_overflow (true);
>>  
>>// min0min1 

Re: ping^^^^: [PATCH V2] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-07-04 Thread Jiufu Guo via Gcc-patches


Hi,

I just submitted a new version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623533.html
So, please ignore this ping and check the new version instead.

BR,
Jeff (Jiufu Guo)

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Gentle ping ...
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Gentle ping...
>>
>> Jiufu Guo via Gcc-patches  writes:
>>
>>> Gentle ping...
>>>
>>> Jiufu Guo via Gcc-patches  writes:
>>>
>>>> Hi
>>>>
>>>> I would like to ping this patch for stage1:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612168.html
>>>>
>>>> BR,
>>>> Jeff (Jiufu)
>>>>
>>>> Jiufu Guo  writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> Compare with previous version:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609654.html
>>>>> This patch does not use UNSPEC for insn mtvsrws anymore.  And to handle
>>>>> the subreg better on BE and LE, predicate "lowpart_subreg_operator"
>>>>> is introducted. To help combine pass to match the pattern on high32
>>>>> bit of DI, shiftrt is still used.
>>>>>
>>>>> As mentioned in PR108338, on p9, we could use mtvsrws to implement
>>>>> the conversion from SI#0 to SF (or lowpart DI to SF).
>>>>>
>>>>> For examples:
>>>>>   *(long long*)buff = di;
>>>>>   float f = *(float*)(buff);
>>>>> We generate "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" instead of
>>>>> "mtvsrws 1,3 ; xscvspdpn 1,1".
>>>>>
>>>>> This patch update this, and also enhance the bitcast from highpart
>>>>> DI to SF.
>>>>>
>>>>> Bootstrap and regtests pass on ppc64{,le}.
>>>>> Is this ok for trunk?
>>>>>
>>>>> BR,
>>>>> Jeff (Jiufu)
>>>>>
>>>>>   PR target/108338
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>   * config/rs6000/predicates.md (lowpart_subreg_operator): New
>>>>>   define_predicate.
>>>>>   * config/rs6000/rs6000.md (any_rshift): New code_iterator.
>>>>>   (movsf_from_si2): Rename to...
>>>>>   (movsf_from_si2_): ... this.
>>>>>   (si2sf_mtvsrws): New define_insn.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>   * gcc.target/powerpc/pr108338.c: New test.
>>>>>
>>>>> ---
>>>>>  gcc/config/rs6000/predicates.md |  5 +++
>>>>>  gcc/config/rs6000/rs6000.md | 35 -
>>>>>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 42 +
>>>>>  3 files changed, 73 insertions(+), 9 deletions(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>>>>>
>>>>> diff --git a/gcc/config/rs6000/predicates.md 
>>>>> b/gcc/config/rs6000/predicates.md
>>>>> index 52c65534e51..e57c9d99c6b 100644
>>>>> --- a/gcc/config/rs6000/predicates.md
>>>>> +++ b/gcc/config/rs6000/predicates.md
>>>>> @@ -2064,3 +2064,8 @@ (define_predicate "macho_pic_address"
>>>>>else
>>>>>  return false;
>>>>>  })
>>>>> +
>>>>> +(define_predicate "lowpart_subreg_operator"
>>>>> +  (and (match_code "subreg")
>>>>> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG 
>>>>> (op)))
>>>>> + == SUBREG_BYTE (op)")))
>>>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>>>> index 4a7812fa592..5b4a7f8d801 100644
>>>>> --- a/gcc/config/rs6000/rs6000.md
>>>>> +++ b/gcc/config/rs6000/rs6000.md
>>>>> @@ -7539,6 +7539,14 @@ (define_split
>>>>>UNSPEC_MOVSI_GOT))]
>>>>>"")
>>>>>  
>>>>> +(define_insn "si2sf_mtvsrws"
>>>>> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>>>>> +   (subreg:SF (match_operand:SI 1 "gpc_reg_operand" "r") 0))]
>>>>> +  "TARGET_P9_VECTOR && TARGET_XSCVSPDPN"
>>>>> +  "mtvsrws %x0,%1\n\txsc

[PATCH V3] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-07-04 Thread Jiufu Guo via Gcc-patches
Hi,

As mentioned in PR108338, on p9, we could use mtvsrws to implement
the bitcast from SI#0 to SF (or lowpart DI to SF).

For code:
  *(long long*)buff = di;
  float f = *(float*)(buff);

"sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
But "mtvsrws 1,3 ; xscvspdpn 1,1" would be better.

In other words, the bitcast from lowpart DI (also highpart DI) to SF could
be enhanced.
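
For reference, the two bitcasts can also be written without depending on
byte order.  This is only a sketch under my own assumptions:
'sf_from_low32' and 'sf_from_high32' are illustrative names, not the
functions in the new test, and the code assumes float is a 32-bit IEEE
single.  It selects the low or high 32 bits explicitly and copies them
with memcpy instead of reading through a casted pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the low 32 bits of L as a float.  */
static float
sf_from_low32 (long long l)
{
  uint32_t bits = (uint32_t) (unsigned long long) l;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Reinterpret the high 32 bits of L as a float.  */
static float
sf_from_high32 (long long l)
{
  uint32_t bits = (uint32_t) ((unsigned long long) l >> 32);
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, 0x3f800000 is the bit pattern of 1.0f, so passing
0x3f800000 to sf_from_low32 gives 1.0f, and passing 0x4000000000000000
to sf_from_high32 gives 2.0f.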

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611823.html
This patch does not define a new insn for mtvsrws, but uses existing insns.


Bootstrap and regtests pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)


PR target/108338

gcc/ChangeLog:

* config/rs6000/predicates.md (lowpart_subreg_operator): New
define_predicate.
* config/rs6000/rs6000.md (any_rshift): New code_iterator.
(movsf_from_si): Update to generate mtvsrws for P9.
(movsf_from_si2): Rename to...
(movsf_from_si2_): ... this, and use lowpart_subreg_operator.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: New test.

---
 gcc/config/rs6000/predicates.md |  5 +++
 gcc/config/rs6000/rs6000.md | 34 +++--
 gcc/testsuite/gcc.target/powerpc/pr108338.c | 42 +
 3 files changed, 70 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c061965da07a5832097eeffa6ccf29c..94b948868881b1a96c5653cbc396b81ebb60c74c 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -2101,3 +2101,8 @@ (define_predicate "macho_pic_address"
   else
 return false;
 })
+
+(define_predicate "lowpart_subreg_operator"
+  (and (match_code "subreg")
+   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
+   == SUBREG_BYTE (op)")))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d8ee50e34f85b654988ae0919e152f..1a8f3ff362a3973ec6260a0702fa930679cf66d1 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8215,13 +8215,24 @@ (define_insn_and_split "movsf_from_si"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
 
-  /* Move SF value to upper 32-bits for xscvspdpn.  */
-  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
-  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
-  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+  if (TARGET_P9_VECTOR)
+{
+  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
+  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+  else
+{
+  rtx op2 = operands[2];
+  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+
+  /* Move SF value to upper 32-bits for xscvspdpn.  */
+  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+
   DONE;
 }
   [(set_attr "length"
@@ -8234,18 +8245,19 @@ (define_insn_and_split "movsf_from_si"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
 
+(define_code_iterator any_rshift [ashiftrt lshiftrt])
+
 ;; For extracting high part element from DImode register like:
 ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
 ;; split it before reload with "and mask" to avoid generating shift right
 ;; 32 bit then shift left 32 bit.
-(define_insn_and_split "movsf_from_si2"
+(define_insn_and_split "movsf_from_si2_"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
(unspec:SF
-[(subreg:SI
-  (ashiftrt:DI
+[(match_operator:SI 3 "lowpart_subreg_operator"
+  [(any_rshift:DI
(match_operand:DI 1 "input_operand" "r")
-   (const_int 32))
-  0)]
+   (const_int 32))])]
 UNSPEC_SF_FROM_SI))
   (clobber (match_scratch:DI 2 "=r"))]
   "TARGET_NO_SF_SUBREG"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c 
b/gcc/testsuite/gcc.target/powerpc/pr108338.c
new file mode 100644
index 
..39da7cec535c59ec34d5f6fc4a63b44ef4316976
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
@@ -0,0 +1,42 @@
+// { dg-do run }
+// { dg-options "-O2 -save-temps" }
+
+float __attribute__ ((noipa)) sf_from_di_off0 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff);
+  return f;
+}
+
+float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff + 4);
+  return f; 
+}
+
+/* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
+/* { dg-final { scan-assembler-times 

ping^^^^: [PATCH V2] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-07-03 Thread Jiufu Guo via Gcc-patches


Hi,

Gentle ping ...

Jiufu Guo via Gcc-patches  writes:

> Gentle ping...
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Gentle ping...
>>
>> Jiufu Guo via Gcc-patches  writes:
>>
>>> Hi
>>>
>>> I would like to ping this patch for stage1:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612168.html
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> Jiufu Guo  writes:
>>>
>>>> Hi,
>>>>
>>>> Compare with previous version:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609654.html
>>>> This patch does not use UNSPEC for insn mtvsrws anymore.  And to handle
>>>> the subreg better on BE and LE, predicate "lowpart_subreg_operator"
>>>> is introducted. To help combine pass to match the pattern on high32
>>>> bit of DI, shiftrt is still used.
>>>>
>>>> As mentioned in PR108338, on p9, we could use mtvsrws to implement
>>>> the conversion from SI#0 to SF (or lowpart DI to SF).
>>>>
>>>> For examples:
>>>>   *(long long*)buff = di;
>>>>   float f = *(float*)(buff);
>>>> We generate "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" instead of
>>>> "mtvsrws 1,3 ; xscvspdpn 1,1".
>>>>
>>>> This patch update this, and also enhance the bitcast from highpart
>>>> DI to SF.
>>>>
>>>> Bootstrap and regtests pass on ppc64{,le}.
>>>> Is this ok for trunk?
>>>>
>>>> BR,
>>>> Jeff (Jiufu)
>>>>
>>>>PR target/108338
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>* config/rs6000/predicates.md (lowpart_subreg_operator): New
>>>>define_predicate.
>>>>* config/rs6000/rs6000.md (any_rshift): New code_iterator.
>>>>(movsf_from_si2): Rename to...
>>>>(movsf_from_si2_): ... this.
>>>>(si2sf_mtvsrws): New define_insn.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>* gcc.target/powerpc/pr108338.c: New test.
>>>>
>>>> ---
>>>>  gcc/config/rs6000/predicates.md |  5 +++
>>>>  gcc/config/rs6000/rs6000.md | 35 -
>>>>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 42 +
>>>>  3 files changed, 73 insertions(+), 9 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>>>>
>>>> diff --git a/gcc/config/rs6000/predicates.md 
>>>> b/gcc/config/rs6000/predicates.md
>>>> index 52c65534e51..e57c9d99c6b 100644
>>>> --- a/gcc/config/rs6000/predicates.md
>>>> +++ b/gcc/config/rs6000/predicates.md
>>>> @@ -2064,3 +2064,8 @@ (define_predicate "macho_pic_address"
>>>>else
>>>>  return false;
>>>>  })
>>>> +
>>>> +(define_predicate "lowpart_subreg_operator"
>>>> +  (and (match_code "subreg")
>>>> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG 
>>>> (op)))
>>>> +  == SUBREG_BYTE (op)")))
>>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>>> index 4a7812fa592..5b4a7f8d801 100644
>>>> --- a/gcc/config/rs6000/rs6000.md
>>>> +++ b/gcc/config/rs6000/rs6000.md
>>>> @@ -7539,6 +7539,14 @@ (define_split
>>>> UNSPEC_MOVSI_GOT))]
>>>>"")
>>>>  
>>>> +(define_insn "si2sf_mtvsrws"
>>>> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>>>> +   (subreg:SF (match_operand:SI 1 "gpc_reg_operand" "r") 0))]
>>>> +  "TARGET_P9_VECTOR && TARGET_XSCVSPDPN"
>>>> +  "mtvsrws %x0,%1\n\txscvspdpn %x0,%x0"
>>>> +  [(set_attr "type" "mfvsr")
>>>> +   (set_attr "length" "8")])
>>>> +
>>>>  ;;   MR  LA
>>>>  ;;   LWZ LFIWZX  LXSIWZX
>>>>  ;;   STW STFIWX  STXSIWX
>>>> @@ -8203,10 +8211,18 @@ (define_insn_and_split "movsf_from_si"
>>>>rtx op2 = operands[2];
>>>>rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>>>>  
>>>> -  /* M

[PATCH V4 1/4] rs6000: build constant via li;rotldi

2023-07-03 Thread Jiufu Guo via Gcc-patches
Hi,

If a constant can be obtained by rotating a positive or negative value
that "li" is able to load, then "li;rotldi" can be used to build the constant.
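
As a rough illustration of the idea (a standalone sketch, not the code in
rs6000.cc): the helpers rotl64, li and find_li_rotldi below are hypothetical
names, 'li' is modelled as sign-extending a 16-bit immediate, and the
constant used in the note afterwards is only an assumed example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by R bits, 0 <= R < 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned r)
{
  return r == 0 ? x : (x << r) | (x >> (64 - r));
}

/* Model of 'li': sign-extend a 16-bit immediate to 64 bits.  */
static uint64_t
li (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Hypothetical helper: search for IMM and SHIFT such that
   rotl64 (li (IMM), SHIFT) == C.  Return true if such a pair exists.  */
static bool
find_li_rotldi (uint64_t c, int16_t *imm, unsigned *shift)
{
  for (unsigned s = 0; s < 64; s++)
    {
      /* Undo a left rotation by S, i.e. rotate right by S.  */
      int64_t v = (int64_t) rotl64 (c, (64 - s) % 64);
      if (v >= INT16_MIN && v <= INT16_MAX)
	{
	  *imm = (int16_t) v;
	  *shift = s;
	  return true;
	}
    }
  return false;
}
```

For the assumed value 0xffff8531ffffffff this search finds the immediate
-31439 (0x8531 as a signed 16-bit value) with a rotate count of 32; a value
without 49 successive equal bits, such as 0x0123456789abcdef, is rejected.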

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621961.html
This patch makes only minor changes to the style and comments.

Bootstrap and regtest pass on ppc64{,le}.

Since the previous version was approved with conditions, this version
also explains the concern that was raised.  If there is no objection, I
would like to apply this patch to trunk.


BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 47 +--
 .../gcc.target/powerpc/const-build.c  | 57 +++
 2 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..acc332acc05 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to the mask operand of rotldi(rldicl), and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* If C or ~C contains at least 49 successive zeros, then C can be rotated
+ to/from a positive or negative value that 'li' is able to load.  */
+  int n;
+  if (can_be_rotated_to_lowbits (c, 15, &n)
+  || can_be_rotated_to_lowbits (~c, 15, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10291,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..69b37e2bb53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Verify that two instructions are successfully used to build constants.
+   One insn is li or lis, another is rotate: rldicl, rldicr or rldic.  */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  __builtin_abort ();
+
+  return 0;
+}
-- 
2.39.3



Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-28 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi!
>
> Segher Boessenkool  writes:
>
>> Hi!
>>
>> On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote:
>>> +/* Check if value C can be built by 2 instructions: one is 'li', another is
>>> +   rotldi.
>>> +
>>> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>>> +   is set to -1, and return true.  Return false otherwise.  */
>>
>> Don't say "is set to -1", the point of having this is so you say "is set
>> to the "li" value".  Just like you describe what SHIFT is for.
> Yes, thanks!
>>
>>> +static bool
>>> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>>> +  HOST_WIDE_INT *mask)
>>> +{
>>> +  int n;
>>
>> Put this later, like:
> Thanks!
>>
>>> +  /* Check if C can be rotated to a positive or negative value
>>> +  which 'li' instruction is able to load.  */
>>   int n;
>>> +  if (can_be_rotated_to_lowbits (c, 15, &n)
>>> +  || can_be_rotated_to_lowbits (~c, 15, &n))
>>> +{
>>> +  *mask = HOST_WIDE_INT_M1;
>>> +  *shift = HOST_BITS_PER_WIDE_INT - n;
>>> +  return true;
>>> +}
>>
>> It is tricky to see ~c will always work, since what is really done is -c
>> instead.  Can you just use that here?
>
> Some explanation:
> A negative value that 'li' can load has the form 0b11..11xxx: there are
> 49 leading '1's, and the other 15 trailing bits can be 0 or 1.  After the
> '~' operation, there are 49 leading '0's.
> After such a value is rotated, there are still 49 successive '1's (they
> may wrap around from head to tail).
> For the rotated value, the '~' operation still gives 49 successive '0's.
>
> So, if a value has 49 successive '1's (which may cross head/tail), then
> after the '~' operation it can be rotated into the low 15 bits.
>
> The '-' operation would not be enough, since '-x = ~x + 1' at the bit
> level.  Take the case 'li_rotldi_3' below: 0x8531LL
> (0x8531 rotated left by 32 bits).
> The '~c' is 0x7ace, which can be rotated from 0x7ace (~0x8531).
> But '-c' is 0x7ace0001, and this value is not suitable.
>
>>
>>> @@ -10266,15 +10291,14 @@ static void
>>>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>>>  {
>>>rtx temp;
>>> +  int shift;
>>> +  HOST_WIDE_INT mask;
>>>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>>>  
>>>ud1 = c & 0xffff;
>>> -  c = c >> 16;
>>> -  ud2 = c & 0xffff;
>>> -  c = c >> 16;
>>> -  ud3 = c & 0xffff;
>>> -  c = c >> 16;
>>> -  ud4 = c & 0xffff;
>>> +  ud2 = (c >> 16) & 0xffff;
>>> +  ud3 = (c >> 32) & 0xffff;
>>> +  ud4 = (c >> 48) & 0xffff;
>>>  
>>>if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>>>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>>> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, 
>>> HOST_WIDE_INT c)
>>>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>>>  GEN_INT ((ud2 ^ 0xffff) << 16)));
>>>  }
>>> +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
>>> +{
>>> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>>> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
>>> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>>> +
>>> +  emit_move_insn (temp, GEN_INT (imm));
>>> +  if (shift != 0)
>>> +   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
>>> +  emit_move_insn (dest, temp);
>>> +}
>>
>> If you would rewrite so it isn't such a run-on thing with "else if",
>> instead using early outs, or even some factoring, you could declare the
>> variable used only in a tiny scope in that tiny scope instead.
>
> Yes! Returning early is better in a lot of cases.  I would like
> to follow up with a refactoring patch.
>
>>
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
>>> @@ -0,0 +1,54 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -save-temps" } */
>>> +/* { dg-require-effective-target has_arch_ppc64 } */
>>
>> Please put a tiny comment here saying what this test is *for*?  The file
>> name is a bit o

Re: [PATCH V3] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-28 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo  writes:

> Hi,
>
> Integer expression "(X - N * M) / N" can be optimized to "X / N - M" if
> there is no wrap/overflow/underflow and "X - N * M" has the same sign
> as "X".
>
> Compare with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620896.html
> This version changes:
> 1. Remove the behavior to convert 'm' to '-m' for unsigned variable.
>This kind of case is rare, and it makes the code ambiguous.
> 2. Use the 'capture' expression and avoid building new expressions.
> 3. Add APIs like get_range and nonpositive/nonnegative.
> 4. Refactor patterns in match.pd and function names and signatures.
>
> Some APIs are still in gimple-fold.cc/h.  I tried to add them
> to other files, but did not find a better place.
> Thanks for comments/suggestions!

Saving and propagating overflow information in range-op and value-range
may be one idea, though I wonder whether it is a better method with
respect to compile time and memory usage.
In the patch attached below, an m_ovf field is added to irange and
maintained in range-op/value-range-storage.
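
To make the idea concrete outside of GCC, here is a tiny standalone sketch
(not the irange API): struct range_sketch and mul_ranges are hypothetical
names, the bounds are plain int64_t, and overflow is detected with the
GCC/Clang builtin __builtin_mul_overflow.  Unlike the patch, it does not
consult TYPE_OVERFLOW_UNDEFINED.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of a range that carries an overflow flag.  */
struct range_sketch
{
  int64_t min;
  int64_t max;
  bool ovf;	/* Set if computing this range may have overflowed.  */
};

/* Multiply two ranges: take the min/max of the four corner products and
   propagate the overflow flag from the operands and from the products.  */
static struct range_sketch
mul_ranges (struct range_sketch a, struct range_sketch b)
{
  int64_t p[4];
  bool ovf = a.ovf || b.ovf;

  ovf |= __builtin_mul_overflow (a.min, b.min, &p[0]);
  ovf |= __builtin_mul_overflow (a.min, b.max, &p[1]);
  ovf |= __builtin_mul_overflow (a.max, b.min, &p[2]);
  ovf |= __builtin_mul_overflow (a.max, b.max, &p[3]);

  struct range_sketch r = { p[0], p[0], ovf };
  for (int i = 1; i < 4; i++)
    {
      if (p[i] < r.min)
	r.min = p[i];
      if (p[i] > r.max)
	r.max = p[i];
    }
  return r;
}
```

The cost question raised above shows up even in this toy: every range
grows by one flag, and every fold has to maintain it.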

BR,
Jeff (Jiufu Guo)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3ab2c665901..7c287aed8b8 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -261,6 +261,7 @@ range_operator::fold_range (irange &r, tree type,
relation_trio trio) const
 {
   gcc_checking_assert (r.supports_type_p (type));
+  r.set_overflow (lh.with_overflow () || rh.with_overflow ());
   if (empty_range_varying (r, type, lh, rh))
 return true;
 
@@ -433,6 +434,10 @@ value_range_with_overflow (irange &r, tree type,
   const unsigned int prec = TYPE_PRECISION (type);
   const bool overflow_wraps = TYPE_OVERFLOW_WRAPS (type);
 
+  if (!TYPE_OVERFLOW_UNDEFINED (type)
+  && (min_ovf != wi::OVF_NONE || max_ovf != wi::OVF_NONE))
+r.set_overflow (true);
+
   // For one bit precision if max != min, then the range covers all
   // values.
   if (prec == 1 && wi::ne_p (wmax, wmin))
@@ -2050,10 +2055,15 @@ operator_mult::wi_fold (irange &r, tree type,
 
   // Sort the 4 products so that min is in prod0 and max is in
   // prod3.
-  widest2_int prod0 = min0 * min1;
-  widest2_int prod1 = min0 * max1;
-  widest2_int prod2 = max0 * min1;
-  widest2_int prod3 = max0 * max1;
+  wi::overflow_type ovf1, ovf2, ovf3, ovf4;
+  widest2_int prod0 = wi::mul (min0, min1, sign, &ovf1);
+  widest2_int prod1 = wi::mul (min0, max1, sign, &ovf2);
+  widest2_int prod2 = wi::mul (max0, min1, sign, &ovf3);
+  widest2_int prod3 = wi::mul (max0, max1, sign, &ovf4);
+  if (!TYPE_OVERFLOW_UNDEFINED (type)
+  && (ovf1 != wi::OVF_NONE || ovf2 != wi::OVF_NONE || ovf3 != wi::OVF_NONE
+ || ovf4 != wi::OVF_NONE))
+r.set_overflow (true);
 
   // min0min1 > max0max1
   if (prod0 > prod3)
diff --git a/gcc/value-range-storage.cc b/gcc/value-range-storage.cc
index 2f82739680c..a541c31bde2 100644
--- a/gcc/value-range-storage.cc
+++ b/gcc/value-range-storage.cc
@@ -277,6 +277,7 @@ void
 irange_storage::set_irange (const irange &r)
 {
   gcc_checking_assert (fits_p (r));
+  m_ovf = r.with_overflow ();
 
   if (r.undefined_p ())
 {
@@ -325,6 +326,7 @@ read_wide_int (wide_int ,
 void
 irange_storage::get_irange (irange &r, tree type) const
 {
+  r.set_overflow (m_ovf);
   if (m_kind == VR_UNDEFINED)
 {
   r.set_undefined ();
diff --git a/gcc/value-range-storage.h b/gcc/value-range-storage.h
index 99fb815cdc2..fc19009e566 100644
--- a/gcc/value-range-storage.h
+++ b/gcc/value-range-storage.h
@@ -90,6 +90,7 @@ private:
   unsigned char m_num_ranges;
 
   enum value_range_kind m_kind : 3;
+  bool m_ovf;
 
   // The length of this is m_num_ranges * 2 + 1 to accomodate the nonzero bits.
   HOST_WIDE_INT m_val[1];
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 4dad4666a32..468d48547e1 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -147,6 +147,8 @@ public:
   bool contains_p (const wide_int &) const;
   bool nonnegative_p () const;
   bool nonpositive_p () const;
+  bool with_overflow () const { return m_ovf; }
+  void set_overflow (bool ovf) { m_ovf = ovf;}
 
   // In-place operators.
   virtual bool union_ (const vrange &) override;
@@ -199,6 +201,7 @@ private:
   unsigned char m_max_ranges;
   tree m_type;
   wide_int m_nonzero_mask;
+  bool m_ovf;
 protected:
   wide_int *m_base;
 };
@@ -842,6 +845,7 @@ irange::irange (wide_int *base, unsigned nranges, bool 
resizable)
 {
   m_base = base;
   set_undefined ();
+  m_ovf = false;
 }
 
 // Constructors for int_range<>.
>
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this patch ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>
>   PR tree-optimization/108757
>
> gcc/ChangeLog:
>
>   * gimple-fold.cc (mult_without_overflow_p): New function.
>   (plus_without_overflow_p): New function.
>   (minus_without_overflow_p): New function.
>   (same_sign_p): New function.
>   * gimple-fold.h (mult_without_overflow_p): New declare.
>   (plus_without_overflow_p): New 

[PATCH V3] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-28 Thread Jiufu Guo via Gcc-patches
Hi,

Integer expression "(X - N * M) / N" can be optimized to "X / N - M" if
there is no wrap/overflow/underflow and "X - N * M" has the same sign
as "X".
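
To show why the same-sign condition matters, here is a small standalone
check (only an illustration with concrete values, not part of the patch).
The helper name transform_agrees is hypothetical; it relies on C's
truncating division and assumes n != 0 and inputs small enough that
nothing overflows.

```c
#include <stdbool.h>

/* Hypothetical helper: return true if "(x - n * m) / n" and "x / n - m"
   give the same result for these concrete values.  Assumes n != 0 and
   that none of the intermediate results overflow.  */
static bool
transform_agrees (long x, long n, long m)
{
  return (x - n * m) / n == x / n - m;
}
```

For x = 7, n = 3, m = 2 both forms give 0 (7 and 7 - 6 have the same sign).
For x = 5, n = 3, m = 2 the sign flips: (5 - 6) / 3 truncates to 0, while
5 / 3 - 2 is -1, so the transformation would be wrong there.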

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620896.html
This version changes:
1. Remove the behavior of converting 'm' to '-m' for unsigned variables.
   This kind of case is rare, and it makes the code ambiguous.
2. Use the 'capture' expression and avoid building new expressions.
3. Add APIs like get_range and nonpositive/nonnegative.
4. Refactor patterns in match.pd and function names and signatures.

Some APIs are still in gimple-fold.cc/h.  I tried to add them
to other files, but did not find a better place.
Thanks for comments/suggestions!

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)


PR tree-optimization/108757

gcc/ChangeLog:

* gimple-fold.cc (mult_without_overflow_p): New function.
(plus_without_overflow_p): New function.
(minus_without_overflow_p): New function.
(same_sign_p): New function.
* gimple-fold.h (mult_without_overflow_p): New declare.
(plus_without_overflow_p): New declare.
(minus_without_overflow_p): New declare.
(same_sign_p): New declare.
* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.
* value-query.h (get_range): New function.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/gimple-fold.cc| 132 +
 gcc/gimple-fold.h |   4 +
 gcc/match.pd  |  54 +++
 gcc/value-query.h |  10 ++
 gcc/value-range.cc|  12 ++
 gcc/value-range.h |   2 +
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
 9 files changed, 484 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 581575b65ec..c0703b45c4b 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -9349,3 +9349,135 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int 
depth)
   return false;
 }
 }
+
+/* Return true if "X * Y" cannot overflow on integer TYPE.  */
+
+bool
+mult_without_overflow_p (tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  wi::overflow_type ovf;
+  signop sgn = TYPE_SIGN (type);
+  wide_int wmax0 = vr0.upper_bound ();
+  wide_int wmax1 = vr1.upper_bound ();
+  wi::mul (wmax0, wmax1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  if (TYPE_UNSIGNED (type))
+return true;
+
+  wide_int wmin0 = vr0.lower_bound ();
+  wide_int wmin1 = vr1.lower_bound ();
+  wi::mul (wmin0, wmin1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  wi::mul (wmin0, wmax1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  wi::mul (wmax0, wmin1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  return true;
+}
+
+/* Return true if "X + Y" cannot overflow on integer TYPE.  */
+
+bool
+plus_without_overflow_p (tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  wi::overflow_type ovf;
+  signop sgn = TYPE_SIGN (type);
+  wide_int wmax0 = vr0.upper_bound ();
+  wide_int wmax1 = vr1.upper_bound ();
+  wi::add (wmax0, wmax1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  if (TYPE_UNSIGNED (type))
+return true;
+
+  wide_int wmin0 = vr0.lower_bound ();
+  wide_int wmin1 = vr1.lower_bound ();
+  wi::add (wmin0, wmin1, sgn, &ovf);
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  return true;
+}
+
+/* Return true if "X - Y" cannot overflow on integer TYPE.  */
+
+bool
+minus_without_overflow_p (tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  wi::overflow_type ovf;
+  signop sgn = TYPE_SIGN (type);
+  wide_int wmin0 = vr0.lower_bound ();
+  wide_int wmax1 = 

Re: ping^^: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-06-18 Thread Jiufu Guo via Gcc-patches


Hi!

David Edelsohn  writes:

>  
> On Tue, May 30, 2023 at 11:00 PM Jiufu Guo  wrote:
>
>  Gentle ping...
>
>  Jiufu Guo via Gcc-patches  writes:
>
>  > Gentle ping...
>  >
>  > Jiufu Guo via Gcc-patches  writes:
>  >
>  >> Hi,
>  >>
>  >> I'm thinking that we may enable this patch for stage1, so ping it.
>  >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
>  >>
>  >> BR,
>  >> Jeff (Jiufu)
>  >>
>  >> Jiufu Guo  writes:
>  >>
>  >>> Hi,
>  >>>
>  >>> There is a functionality called const_anchor in cse.cc.  It supports
>  >>> generating new constants by adding small gaps/offsets to an
>  >>> existing constant.  For example:
>  >>>
>  >>> void __attribute__ ((noinline)) foo (long long *a)
>  >>> {
>  >>>   *a++ = 0x2351847027482577LL;
>  >>>   *a++ = 0x2351847027482578LL;
>  >>> }
>  >>> The second constant (0x2351847027482578LL) can be computed by adding '1'
>  >>> to the first constant (0x2351847027482577LL).
>  >>> This is profitable if more than one instruction is needed to build the
>  >>> second constant.
>  >>>
>  >>> * For rs6000, we can enable this functionality, as the instruction
>  >>> 'addi' is just for this when the gap is smaller than 0x8000.
>  >>>
>  >>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
>  >>> one issue:
>  >>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement of function
>  >>> "try_const_anchors".
>  >>>
>  >>> * One potential side effect of this patch:
>  >>> Compared with
>  >>> "r101=0x2351847027482577LL
>  >>> ...
>  >>> r201=0x2351847027482578LL"
>  >>> the new r201 will be "r201=r101+1", so r101 will live longer,
>  >>> which would increase pressure when allocating registers.
>  >>> But I feel this would be acceptable for this const_anchor feature.
>  >>>
>  >>> * With this patch, I checked the performance change on SPEC2017; the
>  >>> change is not significant, since this functionality is not
>  >>> hit on any hot path.  There is runtime variation/noise (e.g. on
>  >>> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
>  >>>
>  >>> With this patch, I also checked the changes in object files (from
>  >>> GCC bootstrap and SPEC).  The significant changes are improvements
>  >>> of the form "addi" vs. "2 or more insns: lis+or.."; the patch also
>  >>> exposes some other optimization opportunities, like combine/jump2.
>  >>> Code to store/load one more register also occurs in a few cases,
>  >>> but it does not impact overall performance.
>  >>>
>  >>> * To refine this patch, some history discussions are referenced:
>  >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
>  >>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
>  >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>  >>>
>  >>>
>  >>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
>  >>> Is this ok for trunk?
>
> Hi, Jiufu
>
> Thanks for developing this patch and your persistence.
>
> The rs6000.cc part of the patch (TARGET_CONST_ANCHOR) is okay for
> Stage 1.  This is approved.

Pushed as r14-1919-g41f42d120c4a66.  Thanks!

BR,
Jeff (Jiufu Guo)

>
> I don't have the authority to approve the change to cse_insn.  Is the 
> cse_insn change a prerequisite?  Will the rs6000 change break or produce wrong
> code without the cse change?  The second part of the patch should be posted 
> separately to the mailing list, with a cc for appropriate maintainers,
> because most maintainers will not be following this specific thread to 
> approve the other part of the patch.
>
> Thanks, David
>  
>  >>>
>  >>>
>  >>> BR,
>  >>> Jeff (Jiufu)
>  >>>
>  >>> gcc/ChangeLog:
>  >>>
>  >>> * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>  >>> * cse.cc (cse_insn): Add guard condition.
>  >>>
>  >>> gcc/testsuite/ChangeLog:
>  >&

Re: [PATCH] Check SCALAR_INT_MODE_P in try_const_anchors

2023-06-18 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 16 Jun 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> The const_anchor in cse.cc supports integer constants only.
>> There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
>> try_const_anchors.
>> 
>> In the latest code, some non-integer modes are used with const int.
>> For example:
>> "(set (mem/c:BLK (xx)) (const_int 0 [0]))" occurs in md files of
>> rs6000, i386, arm, and pa.  For this, the mode may be BLKmode.
>> Pattern "(set (strict_low_part (xx)) (const_int xx))" could
>> be generated in a few ports.  For this, the mode may be VOIDmode.
>> 
>> So, try_const_anchors needs to avoid modes other than
>> SCALAR_INT_MODE.
>> 
>> Some discussions in the previous thread:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html
>> 
>> Bootstrap passes on ppc64{,le} and x86_64.
>> Is this ok for trunk?
>
> OK.

Thanks a lot! Committed via r14-1918-gc0bd79300e8fad.

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> gcc/ChangeLog:
>> 
>>  * cse.cc (try_const_anchors): Check SCALAR_INT_MODE.
>> 
>> ---
>>  gcc/cse.cc | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>> index 2bb63ac4105..ddb76fd281d 100644
>> --- a/gcc/cse.cc
>> +++ b/gcc/cse.cc
>> @@ -1312,11 +1312,10 @@ try_const_anchors (rtx src_const, machine_mode mode)
>>rtx lower_exp = NULL_RTX, upper_exp = NULL_RTX;
>>unsigned lower_old, upper_old;
>>  
>> -  /* CONST_INT is used for CC modes, but we should leave those alone.  */
>> -  if (GET_MODE_CLASS (mode) == MODE_CC)
>> +  /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
>> +  if (!SCALAR_INT_MODE_P (mode))
>>  return NULL_RTX;
>>  
>> -  gcc_assert (SCALAR_INT_MODE_P (mode));
>>if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
>>&upper_base, &upper_offs))
>>  return NULL_RTX;
>> 


Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-18 Thread Jiufu Guo via Gcc-patches


Hi!

Segher Boessenkool  writes:

> Hi!
>
> On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote:
>> +/* Check if value C can be built by 2 instructions: one is 'li', another is
>> +   rotldi.
>> +
>> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>> +   is set to -1, and return true.  Return false otherwise.  */
>
> Don't say "is set to -1", the point of having this is so you say "is set
> to the "li" value".  Just like you describe what SHIFT is for.
Yes, thanks!
>
>> +static bool
>> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>> +   HOST_WIDE_INT *mask)
>> +{
>> +  int n;
>
> Put this later, like:
Thanks!
>
>> +  /* Check if C can be rotated to a positive or negative value
>> +  which 'li' instruction is able to load.  */
>   int n;
>> +  if (can_be_rotated_to_lowbits (c, 15, &n)
>> +  || can_be_rotated_to_lowbits (~c, 15, &n))
>> +{
>> +  *mask = HOST_WIDE_INT_M1;
>> +  *shift = HOST_BITS_PER_WIDE_INT - n;
>> +  return true;
>> +}
>
> It is tricky to see ~c will always work, since what is really done is -c
> instead.  Can you just use that here?

Some explanation:
A negative value that 'li' can load has the form 0b11..11xxx: there are
49 leading '1's, and the other 15 trailing bits can be 0 or 1.  After the
'~' operation, there are 49 leading '0's.
After such a value is rotated, there are still 49 successive '1's (they
may wrap around from head to tail).
For the rotated value, the '~' operation still gives 49 successive '0's.

So, if a value has 49 successive '1's (which may cross head/tail), then
after the '~' operation it can be rotated into the low 15 bits.

The '-' operation would not be enough, since '-x = ~x + 1' at the bit
level.  Take the case 'li_rotldi_3' below: 0x8531LL
(0x8531 rotated left by 32 bits).
The '~c' is 0x7ace, which can be rotated from 0x7ace (~0x8531).
But '-c' is 0x7ace0001, and this value is not suitable.
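
The difference between '~' and '-' can also be checked with a small
standalone program fragment.  This is only a sketch: rotl64 and
fits_low15_after_rotate are hypothetical names (the latter is a brute-force
stand-in for the can_be_rotated_to_lowbits check, not the GCC function),
and 0xffff8531ffffffff in the note afterwards is an assumed example value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by R bits, 0 <= R < 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned r)
{
  return r == 0 ? x : (x << r) | (x >> (64 - r));
}

/* Hypothetical brute-force check: return true if some rotation of C
   leaves all set bits within the low 15 bits, i.e. the upper 49 bits
   are zero.  */
static bool
fits_low15_after_rotate (uint64_t c)
{
  for (unsigned r = 0; r < 64; r++)
    if ((rotl64 (c, r) >> 15) == 0)
      return true;
  return false;
}
```

With c = 0xffff8531ffffffff, ~c is 0x00007ace00000000 and passes the check,
while -c is 0x00007ace00000001: the extra low '1' bit is too far from the
other set bits, so the check fails.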

>
>> @@ -10266,15 +10291,14 @@ static void
>>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>>  {
>>rtx temp;
>> +  int shift;
>> +  HOST_WIDE_INT mask;
>>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>>  
>>ud1 = c & 0xffff;
>> -  c = c >> 16;
>> -  ud2 = c & 0xffff;
>> -  c = c >> 16;
>> -  ud3 = c & 0xffff;
>> -  c = c >> 16;
>> -  ud4 = c & 0xffff;
>> +  ud2 = (c >> 16) & 0xffff;
>> +  ud3 = (c >> 32) & 0xffff;
>> +  ud4 = (c >> 48) & 0xffff;
>>  
>>if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
>> c)
>>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>>   GEN_INT ((ud2 ^ 0xffff) << 16)));
>>  }
>> +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
>> +{
>> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
>> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>> +
>> +  emit_move_insn (temp, GEN_INT (imm));
>> +  if (shift != 0)
>> +temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
>> +  emit_move_insn (dest, temp);
>> +}
>
> If you would rewrite so it isn't such a run-on thing with "else if",
> instead using early outs, or even some factoring, you could declare the
> variable used only in a tiny scope in that tiny scope instead.

Yes! Returning early is better in a lot of cases.  I would like
to follow up with a refactoring patch.

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -save-temps" } */
>> +/* { dg-require-effective-target has_arch_ppc64 } */
>
> Please put a tiny comment here saying what this test is *for*?  The file
> name is a bit of hint already, but you can indicate much more in one or
> two lines :-)

Oh, yes, thanks for pointing this out!

>
> With those adjustments, okay for trunk.  Thanks!
>
> (If -c doesn't work, it needs more explanation).

Sure, some explanation is given above.

BR,
Jeff (Jiufu Guo)

>
>
> Segher


[PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-16 Thread Jiufu Guo via Gcc-patches
Hi,

If a constant can be obtained by rotating a positive or negative value
that "li" is able to load, then "li;rotldi" can be used to build the constant.

Compared with the previous version, the one-line abstraction helpers are
removed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621001.html

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 47 +---
 .../gcc.target/powerpc/const-build.c  | 54 +++
 2 files changed, 95 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..13aafd1360a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to -1, and return true.  Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  int n;
+
+  /* Check if C can be rotated to a positive or negative value
+  which 'li' instruction is able to load.  */
+  if (can_be_rotated_to_lowbits (c, 15, &n)
+  || can_be_rotated_to_lowbits (~c, 15, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10291,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 00000000000..70f095f6bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  __builtin_abort ();
+
+  return 0;
+}
-- 
2.39.3
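
The li;rotldi idea in the patch above can be tried outside GCC with a
small standalone sketch.  This is only an illustration under stated
assumptions: rotates_into_low_bits is a hypothetical brute-force
stand-in for can_be_rotated_to_lowbits (whose real implementation is
not shown in this thread), and rotl64 and li_rotldi_plan are invented
names, not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate V left by N bits (N taken modulo 64).  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Hypothetical brute-force stand-in for can_be_rotated_to_lowbits:
   look for a left-rotate count *ROT after which only the low LOWBITS
   bits of C may be set.  */
static bool
rotates_into_low_bits (uint64_t c, int lowbits, int *rot)
{
  for (int n = 0; n < 64; n++)
    if ((rotl64 (c, n) >> lowbits) == 0)
      {
        *rot = n;
        return true;
      }
  return false;
}

/* Hypothetical helper: if C can be built as
   "li tmp,IMM; rotldi dest,tmp,SHIFT", store IMM (as a 64-bit
   pattern) and SHIFT, and return true.  Checking ~C covers the
   negative 'li' values.  */
static bool
li_rotldi_plan (uint64_t c, uint64_t *imm, int *shift)
{
  int n;

  if (!rotates_into_low_bits (c, 15, &n)
      && !rotates_into_low_bits (~c, 15, &n))
    return false;

  *imm = rotl64 (c, n);          /* value that 'li' would load */
  *shift = (64 - n) & 63;        /* rotate count that restores C */
  return true;
}
```

Under these assumptions, a constant such as (uint64_t) 0x7531 << 48
yields IMM 0x7531 and SHIFT 48, and rotating IMM left by SHIFT gives
the constant back.  Checking both C and ~C in one helper mirrors the
single combined function suggested in review.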



[PATCH] Check SCALAR_INT_MODE_P in try_const_anchors

2023-06-16 Thread Jiufu Guo via Gcc-patches
Hi,

The const_anchor in cse.cc supports integer constants only.
There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
try_const_anchors.

In the latest code, some non-integer modes are used with const_int.
For example:
"(set (mem/c:BLK (xx)) (const_int 0 [0]))" occurs in the md files of
rs6000, i386, arm, and pa.  Here the mode may be BLKmode.
The pattern "(set (strict_low_part (xx)) (const_int xx))" can
be generated on a few ports.  Here the mode may be VOIDmode.

So try_const_anchors needs to reject any mode other than a
scalar integer mode.

Some discussions in the previous thread:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html

Bootstrap  pass on ppc64{,le} and x86_64.
Is this ok for trunk?


BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* cse.cc (try_const_anchors): Check SCALAR_INT_MODE.

---
 gcc/cse.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..ddb76fd281d 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -1312,11 +1312,10 @@ try_const_anchors (rtx src_const, machine_mode mode)
   rtx lower_exp = NULL_RTX, upper_exp = NULL_RTX;
   unsigned lower_old, upper_old;
 
-  /* CONST_INT is used for CC modes, but we should leave those alone.  */
-  if (GET_MODE_CLASS (mode) == MODE_CC)
+  /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
+  if (!SCALAR_INT_MODE_P (mode))
 return NULL_RTX;
 
-  gcc_assert (SCALAR_INT_MODE_P (mode));
   if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
                               &upper_base, &upper_offs))
 return NULL_RTX;
-- 
2.39.3



Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Jiufu Guo via Gcc-patches


Hi,

Segher Boessenkool  writes:

> On Thu, Jun 15, 2023 at 03:00:40PM +0800, Jiufu Guo wrote:
>> >>   This is the existing pattern.  It may be read as an action
>> >>   to clean an unknown-size memory block.
>> >
>> > Including a size zero memory block, yes.  BLKmode was originally to do
>> > things like bcopy (before modern names like memcpy were more usually
>> > used), and those very much need size zero as well.
>> 
>> The size is possible to be zero.  No asm code needs to
>> be generated for "set 'const_int 0' to zero size memory"".
>> stack_tie does not generate any real code.  It seems ok :)
>> 
>> While, it may not be zero size mem.  This may be a concern.
>> This is one reason that I would like to have an unspec_tie.
>
> It very much *can* be a zero size mem, that is perfectly fine for
> mem:BLK.

There is still one concern: how to distinguish stack_tie
from other insns.
For example, the fake pattern below:
(define_insn "xx_cleanmem"
  [(parallel: [(set (mem:BLK (xxx)) (const_int 0))
   (XXX/use "const_int_operand" "n")])]...

To keep such a pattern from being recognized as 'stack_tie',
'unspec_tie' came to mind.

>
>> Another reason is unspec:blk is used by various ports :)
>
> unspec:BLK is undefined.  BLKmode is allowed on mem only.
>
>> >> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
>> >> UNSPEC_TIE".
>> >>   Current patch is using this one.
>> >
>> > What would be the semantics of that?  Just the same as the current stuff
>> > I'd say, or less?  It cannot be more!
>> 
>> The semantic that I trying to achieve is "this is a special
>> insn, not only a normal set to unknown size mem".
>
> What does that *mean*?  "Special instruction"?  What would what code do
> for that?  What would the RTL mean?
>
>> As you explained before on 'unspec:DI', the unspec would
>> just decorate the set_src part: something DI value with
>> machine-specific operation.
>
> An unspec is an operation on its operands, giving some (in this case)
> DImode value.  There is nothing special about that operation, it can be
> optimised like any other, it's just not specified what exactly that
> value is (to the generic compiler, the backend itself can very much
> optimise stuff with it).
>
>> But, since 'tie_operand' is checked for this insn.
>> If 'tie_operand' checks UNSPEC_TIE, then the insn
>> with UNSPEC_TIE is 'a special insn'.  Or interpret
>> the semantic of this insn as: this insn stack_tie
>> indicates "set/operate a zero size block".
>
> tie_operand is a predicate.  The predicate of an insn has to return 1,
> or the insn is not recognised.  You can do the same in insn conditions
> always (in principle anyway).

Thank you very much for your detailed and patient explanation!

BR,
Jeff (Jiufu Guo)

>
>
> Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Jiufu Guo via Gcc-patches


Hi,

Segher Boessenkool  writes:

> Hi!
>
> On Wed, Jun 14, 2023 at 05:18:15PM +0800, Xi Ruoyao wrote:
>> The generic issue here is to fix (not "papering over") the signed
>> overflow, we need to perform the addition in a target machine mode.  We
>> may always use Pmode (IIRC const_anchor was introduced for optimizing
>> some constant addresses), but can we do better?
>
> The main issue is that the machine description generated target code to
> compute some constants, but the sanitizer treats it as if it was user
> code that might do wrong things.
>
>> Should we try addition in both DImode and SImode for a 64-bit capable
>> machine?
>
> Why?  At least on PowerPC there is only one insn, and it is 64 bits.
> The SImode version just ignores all bits other than the low 32 bits, in
> both inputs and output.
>
>> Or should we even try more operations than addition (for eg bit
>> operations like xor or shift)?  Doing so will need to create a new
>> target hook for const anchoring, this is the "complete rework" I meant.
>

Yes!  This would be a different implementation from the current
const_anchor in cse.cc.  In postreload.cc, there is another
implementation, "reload_cse_move2add", which checks all 'add'
instructions from the target.  Both implementations have pros
and cons.

I used the gcc source code as a benchmark and analyzed the relations
between constants (focusing on constants in the same
function or the same basic block).  IIRC, 'add' covers
most of the relations.  A small part of the constants can be built
via other operations (e.g. shift, and, neg, ...).
There may still be some benchmarks that hit other operations
in the hot path.

Indeed, the const_anchor feature could be enhanced to cover
more cases.


BR,
Jeff (Jiufu Guo)

> This might make const anchor useful for way more targets maybe,
> including rs6000, yes :-)
>
>
> Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Jiufu Guo via Gcc-patches


Hi,

Segher Boessenkool  writes:

> Hi!
>
> On Wed, Jun 14, 2023 at 12:06:29PM +0800, Jiufu Guo wrote:
>> Segher Boessenkool  writes:
>> I'm also thinking about other solutions:
>> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>>   This is the existing pattern.  It may be read as an action
>>   to clean an unknown-size memory block.
>
> Including a size zero memory block, yes.  BLKmode was originally to do
> things like bcopy (before modern names like memcpy were more usually
> used), and those very much need size zero as well.

The size may be zero.  No asm code needs to
be generated for "set 'const_int 0' to zero size memory".
stack_tie does not generate any real code, so this seems ok :)

However, it may not be a zero size mem.  This may be a concern,
and it is one reason that I would like to have an unspec_tie.

Another reason is that unspec:blk is used by various ports :)

>
>> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
>> UNSPEC_TIE".
>>   Current patch is using this one.
>
> What would be the semantics of that?  Just the same as the current stuff
> I'd say, or less?  It cannot be more!

The semantics that I am trying to achieve are "this is a special
insn, not only a normal set to unknown size mem".

As you explained before about 'unspec:DI', the unspec would
just decorate the set_src part: some DI value produced by a
machine-specific operation.

But 'tie_operand' is checked for this insn.
If 'tie_operand' checks UNSPEC_TIE, then the insn
with UNSPEC_TIE is 'a special insn'.  Or we could interpret
the semantics of this insn as: the stack_tie insn
indicates "set/operate on a zero size block".

Does this make sense?

>
>> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
>> UNSPEC_TIE".
>>This avoids using BLK on unspec, but using DI.
>
> And is incorrect because of that.
>
>> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
>> UNSPEC_TIE"
>>There is still a mode for the unspec.
>
> It has VOIDmode here, which is incorrect.
>
>> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
>> >> +   && XINT (SET_SRC (set), 1) == UNSPEC_TIE
>> >> +   && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
>> >
>> > This makes it required that the operand of an UNSPEC_TIE unspec is a
>> > const_int 0.  This should be documented somewhere.  Ideally you would
>> > want no operand at all here, but every unspec has an operand.
>> 
>> Right!  Since checked UNSPEC_TIE arleady, we may not need to check
>> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".
>
> Yes.  But we should write down somewhere (in a comment near the unspec
> constant def for example) what the operand is -- so, "operand is usually
> (const_int 0) because we have to put *something* there" or such.  The
> clumsiness of this is enough for me to prefer some other solution
> already ;-)

Thanks a lot for your comments!

BR,
Jeff (Jiufu Guo)

>
>
> Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Sandiford  writes:

> Richard Biener  writes:
>> AFAIU this special instruction is only supposed to prevent
>> code motion (of stack memory accesses?) across this instruction?
>> I'd say a
>>
>>   (may_clobber (mem:BLK (reg:DI 1 1)))
>>
>> might be more to the point?  I've used "may_clobber" which doesn't
>> exist since I'm not sure whether a clobber is considered a kill.
>> The docs say "Represents the storing or possible storing of an 
>> unpredictable..." - what is it? Storing or possible storing?
>
> I'd also understood it to be either.  As in, it is a may-clobber
> that can be used for must-clobber.  Alternatively: the value stored
> is unpredictable, and can therefore be the same as the current value.
>
> I think the main difference between:
>
>   (clobber (mem:BLK …))
>
> and
>
>   (set (mem:BLK …) (unspec:BLK …))
>
> is that the latter must happen for correctness (unless something
> that understands the unspec proves otherwise) whereas a clobber
> can validly be dropped.  So for something like stack_tie, a set
> seems more correct than a clobber.

Thanks a lot for all your helpful comments!

BR,
Jeff (Jiufu Guo)

>
> Thanks,
> Richard


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Wed, 14 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Segher Boessenkool  writes:
>> 
>> > Hi!
>> >
>> > As I said in a reply to the original patch: not okay.  Sorry.
>> 
>> Thanks a lot for your comments!
>> I'm also thinking about other solutions:
>> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>>   This is the existing pattern.  It may be read as an action
>>   to clean an unknown-size memory block.
>> 
>> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
>> UNSPEC_TIE".
>>   Current patch is using this one.
>> 
>> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
>> UNSPEC_TIE".
>>This avoids using BLK on unspec, but using DI.
>
> That gives the MEM a size which means we can interpret the (set ..)
> as killing a specific area of memory, enabling DSE of earlier
> stores.

Oh, thanks!
But with 'unspec:DI', I'm wondering if it means this 'set' would
do something special beyond a pure 'set' to the memory.

BR,
Jeff (Jiufu Guo)

>
> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?
> I'd say a
>
>   (may_clobber (mem:BLK (reg:DI 1 1)))
>
> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it?  Storing or possible storing?
> I suppose stack_tie should be less strict than the documented
> (clobber (mem:BLK (const_int 0))) (clobber all memory).
>
> ?
>
>> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
>> UNSPEC_TIE"
>>There is still a mode for the unspec.
>> 
>> 
>> >
>> > But some comments on this patch:
>> >
>> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
>> >> +   && XINT (SET_SRC (set), 1) == UNSPEC_TIE
>> >> +   && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
>> >
>> > This makes it required that the operand of an UNSPEC_TIE unspec is a
>> > const_int 0.  This should be documented somewhere.  Ideally you would
>> > want no operand at all here, but every unspec has an operand.
>> 
>> Right!  Since checked UNSPEC_TIE arleady, we may not need to check
>> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".
>> 
>> >
>> >> +  RTVEC_ELT (p, i)
>> >> + = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
>> >> + UNSPEC_TIE));
>> >
>> > If it is hard to indent your code, your code is trying to do too much.
>> > Just have an extra temporary?
>> >
>> >   rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), 
>> > UNSPEC_TIE);
>> >   RTVEC_ELT (p, i) = gen_rtx_SET (mem, un);
>> >
>> > That is shorter even, and certainly more readable :-)
>> 
>> Yeap, thanks!
>> 
>> >
>> >> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>> >>operands[4] = gen_frame_mem (Pmode, operands[1]);
>> >>p = rtvec_alloc (1);
>> >>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
>> >> -   const0_rtx);
>> >> +   gen_rtx_UNSPEC (BLKmode,
>> >> +   gen_rtvec (1, const0_rtx),
>> >> +   UNSPEC_TIE));
>> >>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
>> >
>> > I have a hard time to see how this could ever be seen as clearer or more
>> > obvious or anything like that :-(
>> 
>> I was thinking about just invoking gen_stack_tie here.
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> >
>> >
>> > Segher
>> 


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Jiufu Guo via Gcc-patches


Hi,

Segher Boessenkool  writes:

> Hi!
>
> As I said in a reply to the original patch: not okay.  Sorry.

Thanks a lot for your comments!
I'm also thinking about other solutions:
1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
  This is the existing pattern.  It may be read as an action
  to clean an unknown-size memory block.

2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
UNSPEC_TIE".
  Current patch is using this one.

3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
UNSPEC_TIE".
   This avoids using BLK on unspec, but using DI.

4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
UNSPEC_TIE"
   There is still a mode for the unspec.


>
> But some comments on this patch:
>
> On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
>> +  && XINT (SET_SRC (set), 1) == UNSPEC_TIE
>> +  && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
>
> This makes it required that the operand of an UNSPEC_TIE unspec is a
> const_int 0.  This should be documented somewhere.  Ideally you would
> want no operand at all here, but every unspec has an operand.

Right!  Since UNSPEC_TIE is already checked, we may not need to check
the inner operand, i.e. just " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".

>
>> +  RTVEC_ELT (p, i)
>> += gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
>> +UNSPEC_TIE));
>
> If it is hard to indent your code, your code is trying to do too much.
> Just have an extra temporary?
>
>   rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), 
> UNSPEC_TIE);
>   RTVEC_ELT (p, i) = gen_rtx_SET (mem, un);
>
> That is shorter even, and certainly more readable :-)

Yeap, thanks!

>
>> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>>operands[4] = gen_frame_mem (Pmode, operands[1]);
>>p = rtvec_alloc (1);
>>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
>> -  const0_rtx);
>> +  gen_rtx_UNSPEC (BLKmode,
>> +  gen_rtvec (1, const0_rtx),
>> +  UNSPEC_TIE));
>>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
>
> I have a hard time to see how this could ever be seen as clearer or more
> obvious or anything like that :-(

I was thinking about just invoking gen_stack_tie here.

BR,
Jeff (Jiufu Guo)

>
>
> Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Jiufu Guo via Gcc-patches


Hi Segher, David,

David Edelsohn  writes:

> On Tue, Jun 13, 2023 at 2:16 PM Segher Boessenkool
>  wrote:
>>
>> Hi!
>>
>> On Tue, Jun 13, 2023 at 10:15:49AM +0800, Jiufu Guo wrote:
>> > David Edelsohn  writes:
>> > >
>> > > This definitely seems to be a better solution.
>> > >
>> > > The TARGET_CONST_ANCHOR change should not be part of this patch.  Also
>> > > there is no ChangeLog for the patch.
>> >
>> > Thanks a lot for your quick review!! And sorry for the sending this patch
>> > in a hurry.  I would update the patch accordingly.
>>
>> > > This generally looks correct and consistent with other ports. I want
>> > > to give Segher a chance to double check it, if he wishes.
>>
>> The documentation is very clear that the only thing for which you can
>> have BLKmode is "mem".  Not unspec, only "mem".
>>
>> Let's not do this.  The existing code has clear and obvious semantics,
>> which is documented as well -- there is no reason to make it worse in
>> every respect.

Thanks for all your insightful comments!

Yes, but "unspec:BLK" is already very widely used on various ports,
and it seems a few places use BLKmode without strictly following
the documentation :(  That is not a very good thing, but maybe there is
no better solution.

For the existing code "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])":
since it is a set, the operand set_src should be valid for
the mode of the set_dest, while set_src is 'const_int 0'.
This 'set' may also be misread as 'a memory is zeroed' or
'no-op to a mem'.  Using unspec here would just say this is a special
operation instead of a normal 'const_int 0'.

BR,
Jeff (Jiufu Guo)

>
> Segher,
>
> Unfortunately, GCC now is inconsistent and this response is incorrect.
> The documentation is out of date or was ignored and the "facts on the
> ground" contradict your review.
>
> Yes, (const_int 0) is supposed to be a general no-op and BLKmode only
> is supposed to be used for MEM, but other major targets (arm, aarch64,
> riscv, s390) all use unspec:BLK and specifically UNSPEC_TIE.  rs6000
> is the only port that does not follow this convention.  The middle-end
> has adapted to the behavior of all of the other targets, whether that
> conformed to the documentation or not.  The rs6000 port needs to be
> fixed and Jiufu's approach is the correct one, consistent with all
> other targets for stack tie.  If the documentation differs, the
> documentation needs to be updated, not a different approach for the
> rs6000 port.  Jiufu's patch is correct.
>
> Thanks, David


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Jiufu Guo via Gcc-patches
Hi,

Xi Ruoyao  writes:

> On Tue, 2023-06-13 at 20:23 +0800, Jiufu Guo via Gcc-patches wrote:
>
>> Compare with previous version, this addes ChangeLog and removes
>> const_anchor parts.
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621356.html.
>
> [Off topic]
>
> const_anchor is just broken now.  See
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104843 and the thread
> beginning at
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591470.html.  If
> you want to use it for rs6000 I guess you need to fix it first...

Thanks so much for pointing this out.  It seems to be about supporting
negative values, right?

As you say, for 1. "g (0x8123ffff, 0x81240001)", it would be fine.

The generated insns are:
(insn 5 2 6 2 (set (reg:DI 117)
        (const_int -2128347135 [0xffffffff81240001])) "negative.c":5:3 681 {*movdi_internal64}
     (nil))
(insn 6 5 7 2 (set (reg:DI 118)
        (plus:DI (reg:DI 117)
            (const_int -2 [0xfffffffffffffffe]))) "negative.c":5:3 66 {*adddi3}
     (expr_list:REG_EQUAL (const_int -2128347137 [0xffffffff8123ffff])
        (nil)))

While for 2. "g (0x7fffffff, 0x80000001)", the generated rtl insns are:
(insn 5 2 6 2 (set (reg:DI 117)
        (const_int -2147483647 [0xffffffff80000001])) "negative.c":5:3 681 {*movdi_internal64}
     (nil))
(insn 7 6 8 2 (set (reg:DI 3 3)
        (const_int 2147483647 [0x7fffffff])) "negative.c":5:3 681 {*movdi_internal64}
     (nil))

The current const_anchor does not generate something like "r3 = r117 - 2".
But I would lean toward saying it is a limitation of the current implementation:
"0x80000001" and "0x7fffffff" hit different anchors (even though these
two values are 'close' in some respect).

BR,
Jeff (Jiufu Guo)

>
> To me const_anchor needs a complete rework but I don't want to spend my
> time on it.
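
The "different anchors" observation in the message above can be
reproduced with a small standalone sketch.  It assumes an anchor
granularity of 0x8000 (a made-up value here; each target chooses its
own) and uses anchors_for, a hypothetical helper that only imitates
the idea of rounding a constant down and up to a multiple of the
anchor.  It is not the code in cse.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed anchor granularity; must be a power of two.  */
#define ANCHOR ((int64_t) 0x8000)

/* Hypothetical helper: compute the anchors just below and just above N,
   plus the offsets from each anchor back to N.  Return false when N is
   itself a multiple of ANCHOR, since no offset is needed then.  */
static bool
anchors_for (int64_t n, int64_t *lower_base, int64_t *lower_offs,
             int64_t *upper_base, int64_t *upper_offs)
{
  *lower_base = n & ~(ANCHOR - 1);
  if (*lower_base == n)
    return false;

  *upper_base = (n + (ANCHOR - 1)) & ~(ANCHOR - 1);
  *lower_offs = n - *lower_base;
  *upper_offs = n - *upper_base;
  return true;
}
```

With this sketch, the two DImode values from case 1 (-2128347137 and
-2128347135) share the anchor -2128347136, with offsets -1 and 1.  The
two values from case 2 (-2147483647 and 2147483647) have no anchor in
common, because the sign-extended values are far apart as 64-bit
integers even though they are adjacent modulo 2^32.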


Re: [PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-13 Thread Jiufu Guo via Gcc-patches


Hi,

David Edelsohn  writes:

> On Mon, Jun 12, 2023 at 11:30 PM Jiufu Guo  wrote:
>>
>>
>> Hi David,
>>
>> David Edelsohn  writes:
>> > On Wed, Jun 7, 2023 at 9:55 PM Jiufu Guo  wrote:
>> >
>> >  Hi,
>> >
>> >  This patch checks if a constant is possible to be rotated to/from a 
>> > positive
>> >  or negative value from "li". If so, we could use "li;rotldi" to build it.
>> >
>> >  Bootstrap and regtest pass on ppc64{,le}.
>> >  Is this ok for trunk?
>> >
>> >  BR,
>> >  Jeff (Jiufu)
>> >
>> >  gcc/ChangeLog:
>> >
>> >  * config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New 
>> > function.
>> >  (can_be_rotated_to_negative_li): New function.
>> >  (can_be_built_by_li_and_rotldi): New function.
>> >  (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.
>> >
>> >  gcc/testsuite/ChangeLog:
>> >
>> >  * gcc.target/powerpc/const-build.c: New test.
>> >  ---
>> >   gcc/config/rs6000/rs6000.cc   | 64 +--
>> >   .../gcc.target/powerpc/const-build.c  | 54 
>> >   2 files changed, 112 insertions(+), 6 deletions(-)
>> >   create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c
>> >
>> >  diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> >  index 42f49e4a56b..1dd0072350a 100644
>> >  --- a/gcc/config/rs6000/rs6000.cc
>> >  +++ b/gcc/config/rs6000/rs6000.cc
>> >  @@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
>> > return true;
>> >   }
>> >
>> >  +/* Check if C can be rotated to a positive value which 'li' instruction
>> >  +   is able to load.  If so, set *ROT to the number by which C is rotated,
>> >  +   and return true.  Return false otherwise.  */
>> >  +
>> >  +static bool
>> >  +can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
>> >  +{
>> >  +  /* 49 leading zeros and 15 low bits on the positive value
>> >  + generated by 'li' instruction.  */
>> >  +  return can_be_rotated_to_lowbits (c, 15, rot);
>> >  +}
>> >  +
>> >  +/* Like can_be_rotated_to_positive_li, but check the negative value of 
>> > 'li'.  */
>> >  +
>> >  +static bool
>> >  +can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
>> >  +{
>> >  +  return can_be_rotated_to_lowbits (~c, 15, rot);
>> >  +}
>> >  +
>> >  +/* Check if value C can be built by 2 instructions: one is 'li', another 
>> > is
>> >  +   rotldi.
>> >  +
>> >  +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>> >  +   is set to -1, and return true.  Return false otherwise.  */
>> >  +
>> >
>> > I look at this feature and it's good, but I don't fully understand the 
>> > benefit of this level of abstraction.  Ideally all of the above functions 
>> > would
>> > be inlined.  They aren't reused.
>> >
>> >  +static bool
>> >  +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>> >  +  HOST_WIDE_INT *mask)
>> >  +{
>> >  +  int n;
>> >  +  if (can_be_rotated_to_positive_li (c, &n)
>> >  +  || can_be_rotated_to_negative_li (c, &n))
>> >
>> > Why not
>> >
>> > /* Check if C or ~C can be rotated to a positive or negative value
>> > which 'li' instruction is able to load.  */
>> > if (can_be_rotated_to_lowbits (c, 15, &n)
>> > || can_be_rotated_to_lowbits (~c, 15, &n))
>>
>>
>> Thanks a lot for your review!!
>>
>> Your suggestions could also achieve my goal of using a new function:
>> Using "can_be_rotated_to_positive_li" is just trying to get a
>> straightforward name.  Like yours, the code's comments would also
>> make it easy to understand.
>
> I recognize that you are trying to be consistent with the other
> functions that you add in later patches, but it feels like overkill in
Yes :)
> abstraction to me.  Or maybe combine postive_li and negative_li into a
> single function so that the abstraction serves a purpose other than a
> tail call and creating an alias for a specific invocation of
> can_be_rotated_to_lowbits.
Get it.

Thanks for your valuable suggestion!

BR,
Jeff (Jiufu Guo)

>
> Thanks, David
>
>>
>> BR,
>> Jeff (Jiufu Guo)
>> >
>> > ...
>> >
>> > This is a style of software engineering, but it seems overkill to me when 
>> > the function is a single line that tail calls another function.  Am I 
>> > missing
>> > something?
>> >
>> > The rest of this patch looks good.
>> >
>> > Thanks, David
>> >
>> >  +{
>> >  +  *mask = HOST_WIDE_INT_M1;
>> >  +  *shift = HOST_BITS_PER_WIDE_INT - n;
>> >  +  return true;
>> >  +}
>> >  +
>> >  +  return false;
>> >  +}
>> >  +
>> >   /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
>> >  Output insns to set DEST equal to the constant C as a series of
>> >  lis, ori and shl instructions.  */
>> >  @@ -10266,15 +10308,14 @@ static void
>> >   rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>> >   {
>> > rtx temp;
>> >  +  int shift;
>> >  +  HOST_WIDE_INT mask;
>> > HOST_WIDE_INT ud1, ud2, ud3, ud4;
>> >
>> > 

[PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Jiufu Guo via Gcc-patches
Hi,

For stack_tie, currently below insn is generated:
(insn 15 14 16 3 (parallel [
 (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
 (const_int 0 [0]))
 ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
  (nil))

It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".  This may
look like "a memory block is zeroed", while stack_tie is actually
more like a placeholder and does not generate anything.

To avoid potential misunderstanding, "UNSPEC:BLK [(const_int 0)].." could
be used here.

Compared with the previous version, this adds a ChangeLog and removes
the const_anchor parts.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621356.html.

Bootstrap pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/predicates.md (tie_operand): Update to match new
stack_tie pattern.
* config/rs6000/rs6000-logue.cc (rs6000_emit_stack_tie): Update to
use the new stack_tie pattern.
* config/rs6000/rs6000.md (UNSPEC_TIE): New UNSPEC.
(restore_stack_block): Update to use the new stack_tie pattern.
(restore_stack_nonlocal): Likewise.
(stack_tie): Update pattern to use UNSPEC_TIE.
(stack_restore_tie): Likewise.  

---
 gcc/config/rs6000/predicates.md   | 11 +++
 gcc/config/rs6000/rs6000-logue.cc |  4 +++-
 gcc/config/rs6000/rs6000.md   | 14 ++
 4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c0..4748cb37ce8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1854,10 +1854,13 @@ (define_predicate "stmw_operation"
 (define_predicate "tie_operand"
   (match_code "parallel")
 {
-  return (GET_CODE (XVECEXP (op, 0, 0)) == SET
- && MEM_P (XEXP (XVECEXP (op, 0, 0), 0))
- && GET_MODE (XEXP (XVECEXP (op, 0, 0), 0)) == BLKmode
- && XEXP (XVECEXP (op, 0, 0), 1) == const0_rtx);
+  rtx set = XVECEXP (op, 0, 0);
+  return (GET_CODE (set) == SET
+ && MEM_P (SET_DEST (set))
+ && GET_MODE (SET_DEST (set)) == BLKmode
+ && GET_CODE (SET_SRC (set)) == UNSPEC
+ && XINT (SET_SRC (set), 1) == UNSPEC_TIE
+ && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
 })
 
 ;; Match a small code model toc reference (or medium and large
diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index bc6b153b59f..b99f43a8282 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -1463,7 +1463,9 @@ rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
   while (--i >= 0)
 {
   rtx mem = gen_frame_mem (BLKmode, regs[i]);
-  RTVEC_ELT (p, i) = gen_rtx_SET (mem, const0_rtx);
+  RTVEC_ELT (p, i)
+   = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
+   UNSPEC_TIE));
 }
 
   emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..fdcf8347812 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -158,6 +158,7 @@ (define_c_enum "unspec"
UNSPEC_HASHCHK
UNSPEC_XXSPLTIDP_CONST
UNSPEC_XXSPLTIW_CONST
+   UNSPEC_TIE
   ])
 
 ;;
@@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
   operands[4] = gen_frame_mem (Pmode, operands[1]);
   p = rtvec_alloc (1);
   RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
- const0_rtx);
+ gen_rtx_UNSPEC (BLKmode,
+ gen_rtvec (1, const0_rtx),
+ UNSPEC_TIE));
   operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
@@ -10866,7 +10869,9 @@ (define_expand "restore_stack_nonlocal"
   operands[5] = gen_frame_mem (Pmode, operands[3]);
   p = rtvec_alloc (1);
   RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
- const0_rtx);
+ gen_rtx_UNSPEC (BLKmode,
+ gen_rtvec (1, const0_rtx),
+ UNSPEC_TIE));
   operands[6] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
@@ -13898,7 +13903,8 @@ (define_insn "*save_fpregs__r1"
 ; not be moved over loads from or stores to stack memory.
 (define_insn "stack_tie"
   [(match_parallel 0 "tie_operand"
-  [(set (mem:BLK (reg 1)) (const_int 0))])]
+  [(set (mem:BLK (reg 1))
+   (unspec:BLK [(const_int 0)] UNSPEC_TIE))])]
   ""
   ""
   [(set_attr "length" "0")])
@@ -13910,7 +13916,7 @@ (define_insn "stack_restore_tie"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(plus:SI (match_operand:SI 1 "gpc_reg_operand" "r,r")
 (match_operand:SI 2 

Re: [PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-06-13 Thread Jiufu Guo via Gcc-patches


Hi David,

Thanks for your valuable comments!

David Edelsohn  writes:
>  
> On Wed, Jun 7, 2023 at 9:56 PM Jiufu Guo  wrote:
>
>  Hi,
>
>  This patch checks if a constant is possible to be built by "li;rldic".
>  We only need to take care of "negative li", other forms do not need to check.
>  For example, "negative lis" is just a "negative li" with an additional shift.
>
>  Bootstrap and regtest pass on ppc64{,le}.
>  Is this ok for trunk?
>
>  BR,
>  Jeff (Jiufu)
>
>  gcc/ChangeLog:
>
>  * config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New 
> function.
>  (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.
>
> This is okay.
>
> Do you have any measurement of how expensive it is to test all of these 
> additional methods to generate a constant?  How much does this affect the
> compile time?

Yep, thanks for this very good question!
This patch mostly uses bitwise operations and if-conditions,
so it is not expected to be expensive.

I checked some test cases.  For example:
A case with ~1000 constants, most of which hit this feature.
With the patch, compile time is slightly faster:

0m1.985s (without patch) vs. 0m1.874s (with patch)
(Note: rs6000_emit_set_long_const does not show up among the hot
functions in perf, so this small time saving is probably not
directly caused by this feature.)

A case with ~1000 constants, most of which do not hit this feature:
0m2.493s (without patch) vs. 0m2.558s (with patch).

For runtime, with the patch there seems to be no visible
improvement in SPEC2017.  Still, I feel this patch is doing the
right thing: it uses fewer instructions to build the constant.
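
To make "fewer instructions" a bit more concrete, here is a minimal sketch in
plain C that models the two-instruction sequence (load an immediate, then
rotate and mask).  It mirrors how the quoted emission code derives the
immediate from (c | ~mask), but it is only a model, not the GCC
implementation: the names rotl64, fits_li and two_insn_build are hypothetical,
and SHIFT is assumed to be in the range 0..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by N bits.  N is reduced modulo 64 so that
   a count of 0 (or 64) never becomes an undefined shift by 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* True if V is a sign-extended 16-bit immediate, i.e. something a
   single 'li' can load.  Unsigned wrap-around keeps this well defined.  */
static bool
fits_li (uint64_t v)
{
  return v + 0x8000 < 0x10000;
}

/* Model "li tmp, IMM; rotate tmp left by SHIFT; and with MASK".
   The bits that MASK will clear are first filled with ones, then the
   value is rotated right by SHIFT to obtain the immediate.  Returns
   true and stores the immediate if the sequence rebuilds C.  */
static bool
two_insn_build (uint64_t c, unsigned shift, uint64_t mask, uint64_t *imm)
{
  assert (shift < 64);
  uint64_t v = rotl64 (c | ~mask, 64 - shift);	/* rotate right by SHIFT */
  if (!fits_li (v))
    return false;
  *imm = v;
  return (rotl64 (v, shift) & mask) == c;
}
```

For instance, with c = 0x0FFFFFFFFFFFFF5F, shift = 4 and mask =
0x0FFFFFFFFFFFFFFF, the immediate works out to -11 (0xFFFFFFFFFFFFFFF5), so
the constant needs only the load plus one rotate-and-mask instruction.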

BR,
Jeff (Jiufu Guo)

>
> Thanks, David
>
>  
>  
>  gcc/testsuite/ChangeLog:
>
>  * gcc.target/powerpc/const-build.c: Add more tests.
>  ---
>   gcc/config/rs6000/rs6000.cc   | 61 ++-
>   .../gcc.target/powerpc/const-build.c  | 28 +
>   2 files changed, 88 insertions(+), 1 deletion(-)
>
>  diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>  index 2a3fa733b45..cd04b6b5c82 100644
>  --- a/gcc/config/rs6000/rs6000.cc
>  +++ b/gcc/config/rs6000/rs6000.cc
>  @@ -10387,6 +10387,64 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, 
> int *shift,
> return false;
>   }
>
>  +/* Check if value C can be built by 2 instructions: one is 'li', another is
>  +   rldic.
>  +
>  +   If so, *SHIFT is set to the 'shift' operand of rldic; and *MASK is set
>  +   to the mask value about the 'mb' operand of rldic; and return true.
>  +   Return false otherwise.  */
>  +
>  +static bool
>  +can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT 
> *mask)
>  +{
>  +  /* There are 49 successive ones in the negative value of 'li'.  */
>  +  int ones = 49;
>  +
>  +  /* 1..1xx1..1: negative value of li --> 0..01..1xx0..0:
>  + right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
>  +  int tz = ctz_hwi (c);
>  +  int lz = clz_hwi (c);
>  +  int middle_ones = clz_hwi (~(c << lz));
>  +  if (tz + lz + middle_ones >= ones)
>  +{
>  +  *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
>  +  *shift = tz;
>  +  return true;
>  +}
>  +
>  +  /* 1..1xx1..1 --> 1..1xx0..01..1: some 1's(following x's) are cleaned. */
>  +  int leading_ones = clz_hwi (~c);
>  +  int tailing_ones = ctz_hwi (~c);
>  +  int middle_zeros = ctz_hwi (c >> tailing_ones);
>  +  if (leading_ones + tailing_ones + middle_zeros >= ones)
>  +{
>  +  *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
>  +  *shift = tailing_ones + middle_zeros;
>  +  return true;
>  +}
>  +
>  +  /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
>  +  /* Get the position for the first bit of successive 1.
>  + The 24th bit would be in successive 0 or 1.  */
>  +  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
>  +  int pos_first_1 = ((c & (low_mask + 1)) == 0)
>  + ? clz_hwi (c & low_mask)
>  + : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
>  +  middle_ones = clz_hwi (~c << pos_first_1);
>  +  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
>  +  if (pos_first_1 < HOST_BITS_PER_WIDE_INT
>  +  && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
>  +  && middle_ones + middle_zeros >= ones)
>  +{
>  +  *mask = ~(((1ULL << middle_zeros) - 1LL)
>  +   << (HOST_BITS_PER_WIDE_INT - pos_first_1));
>  +  *shift = HOST_BITS_PER_WIDE_INT - pos_first_1 + middle_zeros;
>  +  return true;
>  +}
>  +
>  +  return false;
>  +}
>  +
>   /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
>  Output insns to set DEST equal to the constant C as a series of
>  lis, ori and shl instructions.  */
>  @@ -10435,7 +10493,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
>   }
> else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
>  

Re: [PATCH 3/4] rs6000: build constant via li/lis;rldicl/rldicr

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi,

David Edelsohn  writes:
>  
> On Wed, Jun 7, 2023 at 9:56 PM Jiufu Guo  wrote:
>
>  Hi,
>
>  This patch checks if a constant is possible left/right cleaned on a rotated
>  value from a negative value of "li/lis".  If so, we can build the constant
>  through "li/lis ; rldicl/rldicr".
>
>  Bootstrap and regtest pass on ppc64{,le}.
>  Is this ok for trunk?
>
>  BR,
>  Jeff (Jiufu)
>
>  gcc/ChangeLog:
>
>  * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
>  function.
>  (can_be_built_by_li_lis_and_rldicr): New function.
>  (rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr 
> and
>  can_be_built_by_li_lis_and_rldicl.
>
> This is okay.  See below.
>
> Thanks, David
>
>  
>  
>  gcc/testsuite/ChangeLog:
>
>  * gcc.target/powerpc/const-build.c: Add more tests.
>  ---
>   gcc/config/rs6000/rs6000.cc   | 61 ++-
>   .../gcc.target/powerpc/const-build.c  | 44 +
>   2 files changed, 104 insertions(+), 1 deletion(-)
>
>  diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>  index 03cd9d5e952..2a3fa733b45 100644
>  --- a/gcc/config/rs6000/rs6000.cc
>  +++ b/gcc/config/rs6000/rs6000.cc
>  @@ -10332,6 +10332,61 @@ can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, 
> int *shift,
> return false;
>   }
>
>  +/* Check if value C can be built by 2 instructions: one is 'li or lis',
>  +   another is rldicl.
>  +
>  +   If so, *SHIFT is set to the shift operand of rldicl, and *MASK is set to
>  +   the mask operand of rldicl, and return true.
>  +   Return false otherwise.  */
>  +
>  +static bool
>  +can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
>  +  HOST_WIDE_INT *mask)
>  +{
>  +  /* Leading zeros may be cleaned by rldicl with a mask.  Change leading 
> zeros
>  + to ones and then recheck it.  */
>  +  int lz = clz_hwi (c);
>  +  HOST_WIDE_INT unmask_c
>  += c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
>  +  int n;
>  +  if (can_be_rotated_to_negative_li (unmask_c, &n)
>
> using can_be_rotated_to_lowbits (~unmask_c, 15, &n)
>
> Maybe Segher would want the abstraction, but it seems more wasteful to
> me.

Thanks! I will update the patch accordingly :)
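
For readers following the thread, the "unmask" step that comes before the
rotation check can be sketched in standalone C.  This is only an
illustration under assumptions: fill_leading_zeros and fill_trailing_zeros
are hypothetical names, and __builtin_clzll / __builtin_ctzll are the
GCC/Clang builtins (undefined for a zero argument, hence the asserts).  After
this step, the complement of the result is what would be passed to the
rotation check with 15 low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Fill the leading zeros of C with ones (the rldicl case, where the
   mask later clears those bits again).  C must be nonzero.  */
static uint64_t
fill_leading_zeros (uint64_t c)
{
  assert (c != 0);
  int lz = __builtin_clzll (c);
  /* Guard lz == 0: shifting a 64-bit value by 64 is undefined in C.  */
  return lz == 0 ? c : c | (~UINT64_C (0) << (64 - lz));
}

/* Fill the trailing zeros of C with ones (the rldicr case).
   C must be nonzero, so the shift count is at most 63.  */
static uint64_t
fill_trailing_zeros (uint64_t c)
{
  assert (c != 0);
  int tz = __builtin_ctzll (c);
  return c | ((UINT64_C (1) << tz) - 1);
}
```

The explicit guard for lz == 0 is a choice made for this sketch so that it
stays well defined for every nonzero input.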

BR,
Jeff (Jiufu) Guo

>  
>  +  || can_be_rotated_to_negative_lis (unmask_c, &n))
>  +{
>  +  *mask = HOST_WIDE_INT_M1U >> lz;
>  +  *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
>  +  return true;
>  +}
>  +
>  +  return false;
>  +}
>  +
>  +/* Check if value C can be built by 2 instructions: one is 'li or lis',
>  +   another is rldicr.
>  +
>  +   If so, *SHIFT is set to the shift operand of rldicr, and *MASK is set to
>  +   the mask operand of rldicr, and return true.
>  +   Return false otherwise.  */
>  +
>  +static bool
>  +can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
>  +  HOST_WIDE_INT *mask)
>  +{
>  +  /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing 
> zeros
>  + to ones and then recheck it.  */
>  +  int tz = ctz_hwi (c);
>  +  HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
>  +  int n;
>  +  if (can_be_rotated_to_negative_li (unmask_c, &n)
>  +  || can_be_rotated_to_negative_lis (unmask_c, &n))
>  +{
>  +  *mask = HOST_WIDE_INT_M1U << tz;
>  +  *shift = HOST_BITS_PER_WIDE_INT - n;
>  +  return true;
>  +}
>  +
>  +  return false;
>  +}
>  +
>   /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
>  Output insns to set DEST equal to the constant C as a series of
>  lis, ori and shl instructions.  */
>  @@ -10378,7 +10433,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
> emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>   GEN_INT ((ud2 ^ 0xffff) << 16)));
>   }
>  -  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
>  +  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
>  +  || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
>  +  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
>   {
> temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> unsigned HOST_WIDE_INT imm = (c | ~mask);
>  @@ -10387,6 +10444,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
> emit_move_insn (temp, GEN_INT (imm));
> if (shift != 0)
>  temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
>  +  if (mask != HOST_WIDE_INT_M1)
>  +   temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
> emit_move_insn (dest, temp);
>   }
> else if (ud3 == 0 && ud4 == 0)
>  diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
> b/gcc/testsuite/gcc.target/powerpc/const-build.c
>  index c38a1dd91f2..8c209921d41 100644
>  --- a/gcc/testsuite/gcc.target/powerpc/const-build.c
>  +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
>  @@ -46,6 +46,42 @@ 

Re: [PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi David,

David Edelsohn  writes:
> On Wed, Jun 7, 2023 at 9:55 PM Jiufu Guo  wrote:
>
>  Hi,
>
>  This patch checks if a constant is possible to be rotated to/from a positive
>  or negative value from "li". If so, we could use "li;rotldi" to build it.
>
>  Bootstrap and regtest pass on ppc64{,le}.
>  Is this ok for trunk?
>
>  BR,
>  Jeff (Jiufu)
>
>  gcc/ChangeLog:
>
>  * config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New 
> function.
>  (can_be_rotated_to_negative_li): New function.
>  (can_be_built_by_li_and_rotldi): New function.
>  (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.
>
>  gcc/testsuite/ChangeLog:
>
>  * gcc.target/powerpc/const-build.c: New test.
>  ---
>   gcc/config/rs6000/rs6000.cc   | 64 +--
>   .../gcc.target/powerpc/const-build.c  | 54 
>   2 files changed, 112 insertions(+), 6 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c
>
>  diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>  index 42f49e4a56b..1dd0072350a 100644
>  --- a/gcc/config/rs6000/rs6000.cc
>  +++ b/gcc/config/rs6000/rs6000.cc
>  @@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
> return true;
>   }
>
>  +/* Check if C can be rotated to a positive value which 'li' instruction
>  +   is able to load.  If so, set *ROT to the number by which C is rotated,
>  +   and return true.  Return false otherwise.  */
>  +
>  +static bool
>  +can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
>  +{
>  +  /* 49 leading zeros and 15 low bits on the positive value
>  + generated by 'li' instruction.  */
>  +  return can_be_rotated_to_lowbits (c, 15, rot);
>  +}
>  +
>  +/* Like can_be_rotated_to_positive_li, but check the negative value of 
> 'li'.  */
>  +
>  +static bool
>  +can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
>  +{
>  +  return can_be_rotated_to_lowbits (~c, 15, rot);
>  +}
>  +
>  +/* Check if value C can be built by 2 instructions: one is 'li', another is
>  +   rotldi.
>  +
>  +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>  +   is set to -1, and return true.  Return false otherwise.  */
>  +
>
> I look at this feature and it's good, but I don't fully understand the 
> benefit of this level of abstraction.  Ideally all of the above functions 
> would
> be inlined.  They aren't reused.
>  
>  +static bool
>  +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>  +  HOST_WIDE_INT *mask)
>  +{
>  +  int n;
>  +  if (can_be_rotated_to_positive_li (c, &n)
>  +  || can_be_rotated_to_negative_li (c, &n))
>
> Why not
>
> /* Check if C or ~C can be rotated to a positive or negative value
> which 'li' instruction is able to load.  */
> if (can_be_rotated_to_lowbits (c, 15, &n)
> || can_be_rotated_to_lowbits (~c, 15, &n))

 
Thanks a lot for your review!!

Your suggestion also achieves what I wanted from the new function:
"can_be_rotated_to_positive_li" was only meant to give the check a
straightforward name.  As in your version, a comment in the code
makes it just as easy to understand.
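
As an illustration of what the two one-line wrappers express, here is a
brute-force sketch in plain C.  It is not the GCC implementation of
can_be_rotated_to_lowbits: the names rotl64, rotates_to_lowbits,
rotates_to_positive_li and rotates_to_negative_li are hypothetical, and the
sketch assumes the rotation count refers to a left rotation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by N bits, with N reduced modulo 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Try every left-rotation count and accept the first one that leaves
   only the low LOWBITS bits possibly set.  On success, *ROT is that
   count.  */
static bool
rotates_to_lowbits (uint64_t c, unsigned lowbits, int *rot)
{
  assert (lowbits < 64);
  for (int n = 0; n < 64; n++)
    if ((rotl64 (c, n) >> lowbits) == 0)
      {
	*rot = n;
	return true;
      }
  return false;
}

/* A positive 'li' value has 49 leading zeros and 15 free low bits.  */
static bool
rotates_to_positive_li (uint64_t c, int *rot)
{
  return rotates_to_lowbits (c, 15, rot);
}

/* A negative 'li' value is the complement of a positive one.  */
static bool
rotates_to_negative_li (uint64_t c, int *rot)
{
  return rotates_to_lowbits (~c, 15, rot);
}
```

With or without the wrappers, the check is the same single call; the
wrappers only add a name for "positive li" versus "negative li", which is the
trade-off discussed above.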

BR,
Jeff (Jiufu Guo)
>  
> ...
>
> This is a style of software engineering, but it seems overkill to me when the 
> function is a single line that tail calls another function.  Am I missing
> something?
>
> The rest of this patch looks good.
>
> Thanks, David
>  
>  +{
>  +  *mask = HOST_WIDE_INT_M1;
>  +  *shift = HOST_BITS_PER_WIDE_INT - n;
>  +  return true;
>  +}
>  +
>  +  return false;
>  +}
>  +
>   /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
>  Output insns to set DEST equal to the constant C as a series of
>  lis, ori and shl instructions.  */
>  @@ -10266,15 +10308,14 @@ static void
>   rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>   {
> rtx temp;
>  +  int shift;
>  +  HOST_WIDE_INT mask;
> HOST_WIDE_INT ud1, ud2, ud3, ud4;
>
> ud1 = c & 0xffff;
>  -  c = c >> 16;
>  -  ud2 = c & 0xffff;
>  -  c = c >> 16;
>  -  ud3 = c & 0xffff;
>  -  c = c >> 16;
>  -  ud4 = c & 0xffff;
>  +  ud2 = (c >> 16) & 0xffff;
>  +  ud3 = (c >> 32) & 0xffff;
>  +  ud4 = (c >> 48) & 0xffff;
>
> if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
> || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>  @@ -10305,6 +10346,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
> emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>   GEN_INT ((ud2 ^ 0xffff) << 16)));
>   }
>  +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
>  +{
>  +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>  +  unsigned HOST_WIDE_INT imm = (c | ~mask);
>  +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>  +
>  +  emit_move_insn (temp, GEN_INT (imm));
>  +  if (shift != 

Re: Ping^^: [PATCH V2] extract DF/SF/SI/HI/QI subreg from parameter word on stack

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi,

Jeff Law  writes:

> On 5/10/23 19:20, Jiufu Guo wrote:
>>
>> Hi,
>>
>> I would like to ping:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609396.html
>>
>> We know there are a few issues related to aggregate parameter and
>> returns.  I'm thinking if it is ok for trunk to use this patch to
>> resolve part of those issues.
> It looks like the patch is focused on emitting a load of the object
> from memory into a GPR, then copying the GPR into pseudo (which
> hopefully gets allocated into an FPR).  That would seem to indicate
> the value got flushed to memory at some point.  Presumably because the
> type of the object it not one that we would typically allow in
> registers, except for some special cases for parameter passing and
> return values?

Yes.  The object would be in a different mode from the natural mode
of the register used for the parameter/return value.

>
> If that's the case, then is there any value in finding the flush to
> the stack and just emitting a copy from the GPR into the destination
> pseudo at that point?
>
> Or is it just easier to construct a load from the flushback area and
> let CSE/DCE/DSE clean things up?
Right! The idea of this patch is to construct a load that exposes the
opportunity for the CSE/DCE/DSE passes to optimize the code.

I'm also trying another solution for this issue: something like a
light-weight SRA in the expander for struct parameters.  Sorry for not
withdrawing this patch earlier.

Thank you very much for your comments!!


BR,
Jeff (Jiufu)

>
> Jeff
>
> I


Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi,

Jeff Law  writes:

> On 6/11/23 23:44, Jiufu Guo wrote:
>> Richard Biener  writes:
>>
>>> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>>>

 Hi,

 Richard Biener  writes:

> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>
>>
>> Hi,
>>
>> Richard Biener  writes:
>>
>>> On Fri, 9 Jun 2023, Richard Sandiford wrote:
>>>
 guojiufu  writes:
> Hi,
>
> On 2023-06-09 16:00, Richard Biener wrote:
>> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>>
>>> Hi,
>>>
 ...
>>>
>>> This patch is raised when drafting below one.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>>> try_const_anchors, and hits the assert/ice.
>>>
>>> Boostrap and regtest pass on ppc64{,le} and x86_64.
>>> Is this ok for trunk?
>>
>> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> I suggest to instead fix try_const_anchors to change
>>
>>/* CONST_INT is used for CC modes, but we should leave those 
>> alone.
>> */
>>if (GET_MODE_CLASS (mode) == MODE_CC)
>>  return NULL_RTX;
>>
>>gcc_assert (SCALAR_INT_MODE_P (mode));
>>
>> to
>>
>>/* CONST_INT is used for CC modes, leave any non-scalar-int mode
>> alone.  */
>>if (!SCALAR_INT_MODE_P (mode))
>>  return NULL_RTX;
>>
>
> This is also able to fix this issue.  there is a "Punt on CC modes"
> patch
> to return NULL_RTX in try_const_anchors.
>
>> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> we should have fended this off earlier.  Can you share more complete
>> RTL of that stack_tie?
>
>
> (insn 15 14 16 3 (parallel [
>   (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>   (const_int 0 [0]))
>   ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>(nil))
>
> It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".

 I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] 
 ...)
 would be though.  It's arguably more accurate too, since the effect
 on the stack locations is unspecified rather than predictable.
>>>
>>> powerpc seems to be the only port with a stack_tie that's not
>>> using an UNSPEC RHS.
>> In rs6000.md, it is
>>
>> ; This is to explain that changes to the stack pointer should
>> ; not be moved over loads from or stores to stack memory.
>> (define_insn "stack_tie"
>>[(match_parallel 0 "tie_operand"
>> [(set (mem:BLK (reg 1)) (const_int 0))])]
>>""
>>""
>>[(set_attr "length" "0")])
>>
>> This would be just an placeholder insn, and acts as the comments.
>> UNSPEC_ would works like other targets.  While, I'm wondering
>> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
>> MODEs between SET_DEST and SET_SRC?
>
> I don't think the issue is the mode but the issue is that
> the patter as-is says some memory is zeroed while that's not
> actually true (not specifying a size means we can't really do
> anything with this MEM, but still).  Using an UNSPEC avoids
> implying anything for the stored value.
>
> Of course I think a MEM SET_DEST without a specified size is bougs
> as well, but there's larger precedent for this...

 Thanks for your kindly comments!
 Using "(set (mem:BLK (reg 1)) (const_int 0))" here, may because this
 insn does not generate real thing (not a real store and no asm code),
 may like barrier.

 While I agree that, using UNSPEC may be more clear to avoid mis-reading.
>>>
>>> Btw, another way to avoid the issue in CSE is to make it not process
>>> (aka record anything for optimization) for SET from MEMs with
>>> !MEM_SIZE_KNOWN_P
>>
>> Thanks! Yes, this would make sense.
>> Then, there are two ideas(patches) to handle this issue:
>> Which one would be preferable?  This one (from compiling time aspect)?
>>
>> And maybe, the changes in rs6000 stack_tie through using unspec
>> can be a standalone enhancement besides cse patch.
> I'd tend to lean more towards fixing the rs6000 backend.  It's basically 
> lying to the rest of the compiler and when it presents passes with something 
> like
>
> (set (mem:BLK) (const_int 0))
>
> It's largely inviting the generic bits to treat it like a memory store, when 
> in fact it's something significantly different.
>
> I don't think the CSE patch is wrong or a bad idea, more that it's
> just papering over a problem caused by an odd chunk of RTL created by
> the PPC backend.


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi David,

David Edelsohn  writes:

> Hi, Jiufu
>
> This definitely seems to be a better solution.
>
> The TARGET_CONST_ANCHOR change should not be part of this patch.  Also
> there is no ChangeLog for the patch.

Thanks a lot for your quick review!! And sorry for sending this patch
in a hurry.  I will update the patch accordingly.


BR,
Jeff (Jiufu Guo)

>
> This generally looks correct and consistent with other ports. I want
> to give Segher a chance to double check it, if he wishes.
>
> Thanks David
>
> On Mon, Jun 12, 2023 at 9:19 AM Jiufu Guo  wrote:
>>
>> Hi,
>>
>> For stack_tie, currently below insn is generated:
>> (insn 15 14 16 3 (parallel [
>>  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>>  (const_int 0 [0]))
>>  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>>   (nil))
>>
>> It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".  This maybe
>> looks like "a memory block is zerored", while actually stack_tie
>> may be more like a placeholder, and does not generate any thing.
>>
>> To avoid potential misunderstand, "UNPSEC:BLK [(const_int 0)].." could
>> be used here like other ports.
>>
>> This patch does this.  Bootstrap pass on ppc64{,le}.
>> Is this ok for trunk?
>>
>> BR,
>> Jeff (Jiufu Guo)
>>
>> ---
>>  gcc/config/rs6000/predicates.md   | 11 +++
>>  gcc/config/rs6000/rs6000-logue.cc |  4 +++-
>>  gcc/config/rs6000/rs6000.cc   |  4 
>>  gcc/config/rs6000/rs6000.md   | 14 ++
>>  4 files changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/predicates.md 
>> b/gcc/config/rs6000/predicates.md
>> index a16ee30f0c0..4748cb37ce8 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -1854,10 +1854,13 @@ (define_predicate "stmw_operation"
>>  (define_predicate "tie_operand"
>>(match_code "parallel")
>>  {
>> -  return (GET_CODE (XVECEXP (op, 0, 0)) == SET
>> - && MEM_P (XEXP (XVECEXP (op, 0, 0), 0))
>> - && GET_MODE (XEXP (XVECEXP (op, 0, 0), 0)) == BLKmode
>> - && XEXP (XVECEXP (op, 0, 0), 1) == const0_rtx);
>> +  rtx set = XVECEXP (op, 0, 0);
>> +  return (GET_CODE (set) == SET
>> + && MEM_P (SET_DEST (set))
>> + && GET_MODE (SET_DEST (set)) == BLKmode
>> + && GET_CODE (SET_SRC (set)) == UNSPEC
>> + && XINT (SET_SRC (set), 1) == UNSPEC_TIE
>> + && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
>>  })
>>
>>  ;; Match a small code model toc reference (or medium and large
>> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
>> b/gcc/config/rs6000/rs6000-logue.cc
>> index bc6b153b59f..b99f43a8282 100644
>> --- a/gcc/config/rs6000/rs6000-logue.cc
>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>> @@ -1463,7 +1463,9 @@ rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
>>while (--i >= 0)
>>  {
>>rtx mem = gen_frame_mem (BLKmode, regs[i]);
>> -  RTVEC_ELT (p, i) = gen_rtx_SET (mem, const0_rtx);
>> +  RTVEC_ELT (p, i)
>> +   = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, 
>> const0_rtx),
>> +   UNSPEC_TIE));
>>  }
>>
>>emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index d197c3f3289..0c81ebea711 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>
>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>> +
>> +#undef TARGET_CONST_ANCHOR
>> +#define TARGET_CONST_ANCHOR 0x8000
>> +
>>
>>
>>  /* Processor table.  */
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index b0db8ae508d..fdcf8347812 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -158,6 +158,7 @@ (define_c_enum "unspec"
>> UNSPEC_HASHCHK
>> UNSPEC_XXSPLTIDP_CONST
>> UNSPEC_XXSPLTIW_CONST
>> +   UNSPEC_TIE
>>])
>>
>>  ;;
>> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>>operands[4] = gen_frame_mem (Pmode, operands[1]);
>>p = rtvec_alloc (1);
>>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
>> - const0_rtx);
>> + gen_rtx_UNSPEC (BLKmode,
>> + gen_rtvec (1, const0_rtx),
>> + UNSPEC_TIE));
>>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
>>  })
>>
>> @@ -10866,7 +10869,9 @@ (define_expand "restore_stack_nonlocal"
>>operands[5] = gen_frame_mem (Pmode, operands[3]);
>>p = rtvec_alloc (1);
>>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
>> - const0_rtx);
>> + gen_rtx_UNSPEC (BLKmode,
>> +

[PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-12 Thread Jiufu Guo via Gcc-patches
Hi,

For stack_tie, currently below insn is generated:
(insn 15 14 16 3 (parallel [
 (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
 (const_int 0 [0]))
 ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
  (nil))

It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".  This may
look like "a memory block is zeroed", while stack_tie is actually
more like a placeholder and does not generate anything.

To avoid potential misunderstanding, "UNSPEC:BLK [(const_int 0)].." could
be used here, as in other ports.

This patch does this.  Bootstrap passes on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

---
 gcc/config/rs6000/predicates.md   | 11 +++
 gcc/config/rs6000/rs6000-logue.cc |  4 +++-
 gcc/config/rs6000/rs6000.cc   |  4 
 gcc/config/rs6000/rs6000.md   | 14 ++
 4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c0..4748cb37ce8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1854,10 +1854,13 @@ (define_predicate "stmw_operation"
 (define_predicate "tie_operand"
   (match_code "parallel")
 {
-  return (GET_CODE (XVECEXP (op, 0, 0)) == SET
- && MEM_P (XEXP (XVECEXP (op, 0, 0), 0))
- && GET_MODE (XEXP (XVECEXP (op, 0, 0), 0)) == BLKmode
- && XEXP (XVECEXP (op, 0, 0), 1) == const0_rtx);
+  rtx set = XVECEXP (op, 0, 0);
+  return (GET_CODE (set) == SET
+ && MEM_P (SET_DEST (set))
+ && GET_MODE (SET_DEST (set)) == BLKmode
+ && GET_CODE (SET_SRC (set)) == UNSPEC
+ && XINT (SET_SRC (set), 1) == UNSPEC_TIE
+ && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
 })
 
 ;; Match a small code model toc reference (or medium and large
diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index bc6b153b59f..b99f43a8282 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -1463,7 +1463,9 @@ rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
   while (--i >= 0)
 {
   rtx mem = gen_frame_mem (BLKmode, regs[i]);
-  RTVEC_ELT (p, i) = gen_rtx_SET (mem, const0_rtx);
+  RTVEC_ELT (p, i)
+   = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
+   UNSPEC_TIE));
 }
 
   emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d197c3f3289..0c81ebea711 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1760,6 +1760,10 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 
 #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
 #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
+
+#undef TARGET_CONST_ANCHOR
+#define TARGET_CONST_ANCHOR 0x8000
+
 
 
 /* Processor table.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..fdcf8347812 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -158,6 +158,7 @@ (define_c_enum "unspec"
UNSPEC_HASHCHK
UNSPEC_XXSPLTIDP_CONST
UNSPEC_XXSPLTIW_CONST
+   UNSPEC_TIE
   ])
 
 ;;
@@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
   operands[4] = gen_frame_mem (Pmode, operands[1]);
   p = rtvec_alloc (1);
   RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
- const0_rtx);
+ gen_rtx_UNSPEC (BLKmode,
+ gen_rtvec (1, const0_rtx),
+ UNSPEC_TIE));
   operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
@@ -10866,7 +10869,9 @@ (define_expand "restore_stack_nonlocal"
   operands[5] = gen_frame_mem (Pmode, operands[3]);
   p = rtvec_alloc (1);
   RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
- const0_rtx);
+ gen_rtx_UNSPEC (BLKmode,
+ gen_rtvec (1, const0_rtx),
+ UNSPEC_TIE));
   operands[6] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
@@ -13898,7 +13903,8 @@ (define_insn "*save_fpregs__r1"
 ; not be moved over loads from or stores to stack memory.
 (define_insn "stack_tie"
   [(match_parallel 0 "tie_operand"
-  [(set (mem:BLK (reg 1)) (const_int 0))])]
+  [(set (mem:BLK (reg 1))
+   (unspec:BLK [(const_int 0)] UNSPEC_TIE))])]
   ""
   ""
   [(set_attr "length" "0")])
@@ -13910,7 +13916,7 @@ (define_insn "stack_restore_tie"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(plus:SI (match_operand:SI 1 "gpc_reg_operand" "r,r")
 (match_operand:SI 2 "reg_or_cint_operand" "O,rI")))
-   (set (mem:BLK (scratch)) (const_int 0))]
+   (set (mem:BLK (scratch)) 

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Mon, 12 Jun 2023, Jiufu Guo wrote:
>
>> Richard Biener  writes:
>> 
>> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi,
>> >> 
>> >> Richard Biener  writes:
>> >> 
>> >> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >
>> >> >> 
>> >> >> Hi,
>> >> >> 
>> >> >> Richard Biener  writes:
>> >> >> 
>> >> >> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
>> >> >> >
>> >> >> >> guojiufu  writes:
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> >> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >> >> >> 
>> >> >> >> >>> Hi,
>> >> >> >> >>> 
>> >> ...
>> >> >> >> >>> 
>> >> >> >> >>> This patch is raised when drafting below one.
>> >> >> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >> >> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >> >> >> >>> try_const_anchors, and hits the assert/ice.
>> >> >> >> >>> 
>> >> >> >> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
>> >> >> >> >>> Is this ok for trunk?
>> >> >> >> >> 
>> >> >> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) 
>> >> >> >> >> then
>> >> >> >> >> I suggest to instead fix try_const_anchors to change
>> >> >> >> >> 
>> >> >> >> >>   /* CONST_INT is used for CC modes, but we should leave those 
>> >> >> >> >> alone.  
>> >> >> >> >> */
>> >> >> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> >> >> >> return NULL_RTX;
>> >> >> >> >> 
>> >> >> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> >> >> >> 
>> >> >> >> >> to
>> >> >> >> >> 
>> >> >> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int 
>> >> >> >> >> mode 
>> >> >> >> >> alone.  */
>> >> >> >> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> >> >> >> return NULL_RTX;
>> >> >> >> >> 
>> >> >> >> >
>> >> >> >> > This is also able to fix this issue.  there is a "Punt on CC 
>> >> >> >> > modes" 
>> >> >> >> > patch
>> >> >> >> > to return NULL_RTX in try_const_anchors.
>> >> >> >> >
>> >> >> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and 
>> >> >> >> >> whether
>> >> >> >> >> we should have fended this off earlier.  Can you share more 
>> >> >> >> >> complete
>> >> >> >> >> RTL of that stack_tie?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > (insn 15 14 16 3 (parallel [
>> >> >> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >> >> >> >  (const_int 0 [0]))
>> >> >> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >> >> >> >   (nil))
>> >> >> >> >
>> >> >> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> >> >> >> 
>> >> >> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] 
>> >> >> >> ...)
>> >> >> >> would be though.  It's arguably more accurate too, since the effect
>> >> >> >> on the stack locations is unspecified rather than predictable.
>> >> >> >
>> >> >> > powerpc seems to be the only port with a stack_tie that's not
>> >> >> > using an UNSPEC RHS.
>> >> >> In rs6000.md, it is
>> >> >> 
>> >> >> ; This is to explain that changes to the stack pointer should
>> >> >> ; not be moved over loads from or stores to stack memory.
>> >> >> (define_insn "stack_tie"
>> >> >>   [(match_parallel 0 "tie_operand"
>> >> >>   [(set (mem:BLK (reg 1)) (const_int 0))])]
>> >> >>   ""
>> >> >>   ""
>> >> >>   [(set_attr "length" "0")])
>> >> >> 
>> >> >> This would be just an placeholder insn, and acts as the comments.
>> >> >> UNSPEC_ would works like other targets.  While, I'm wondering
>> >> >> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
>> >> >> MODEs between SET_DEST and SET_SRC?
>> >> >
>> >> > I don't think the issue is the mode but the issue is that
>> >> > the pattern as-is says some memory is zeroed while that's not
>> >> > actually true (not specifying a size means we can't really do
>> >> > anything with this MEM, but still).  Using an UNSPEC avoids
>> >> > implying anything for the stored value.
>> >> >
>> >> > Of course I think a MEM SET_DEST without a specified size is bogus
>> >> > as well, but there's larger precedent for this...
>> >> 
>> >> Thanks for your kindly comments!
>> >> Using "(set (mem:BLK (reg 1)) (const_int 0))" here, may because this
>> >> insn does not generate real thing (not a real store and no asm code),
>> >> may like barrier.
>> >> 
>> >> While I agree that, using UNSPEC may be more clear to avoid mis-reading.
>> >
>> > Btw, another way to avoid the issue in CSE is to make it not process
>> > (aka record anything for optimization) for SET from MEMs with
>> > !MEM_SIZE_KNOWN_P
>> 
>> Thanks! Yes, this would make sense.
>> Then, there are two ideas(patches) to handle this issue:
>> Which one would be preferable?  This one (from compiling time aspect)?
>> 
>> And maybe, the changes in rs6000 stack_tie through using unspec
>> can be a standalone enhancement besides cse patch.
>> 
>> Thanks for comments!
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> 

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-11 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Richard Biener  writes:
>> 
>> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi,
>> >> 
>> >> Richard Biener  writes:
>> >> 
>> >> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
>> >> >
>> >> >> guojiufu  writes:
>> >> >> > Hi,
>> >> >> >
>> >> >> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >> >> 
>> >> >> >>> Hi,
>> >> >> >>> 
>> ...
>> >> >> >>> 
>> >> >> >>> This patch is raised when drafting below one.
>> >> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >> >> >>> try_const_anchors, and hits the assert/ice.
>> >> >> >>> 
>> >> >> >>> Bootstrap and regtest pass on ppc64{,le} and x86_64.
>> >> >> >>> Is this ok for trunk?
>> >> >> >> 
>> >> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> >> >> >> I suggest to instead fix try_const_anchors to change
>> >> >> >> 
>> >> >> >>   /* CONST_INT is used for CC modes, but we should leave those 
>> >> >> >> alone.  
>> >> >> >> */
>> >> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> >> >> return NULL_RTX;
>> >> >> >> 
>> >> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> >> >> 
>> >> >> >> to
>> >> >> >> 
>> >> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode 
>> >> >> >> alone.  */
>> >> >> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> >> >> return NULL_RTX;
>> >> >> >> 
>> >> >> >
>> >> >> > This is also able to fix this issue.  there is a "Punt on CC modes" 
>> >> >> > patch
>> >> >> > to return NULL_RTX in try_const_anchors.
>> >> >> >
>> >> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and 
>> >> >> >> whether
>> >> >> >> we should have fended this off earlier.  Can you share more complete
>> >> >> >> RTL of that stack_tie?
>> >> >> >
>> >> >> >
>> >> >> > (insn 15 14 16 3 (parallel [
>> >> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >> >> >  (const_int 0 [0]))
>> >> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >> >> >   (nil))
>> >> >> >
>> >> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> >> >> 
>> >> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] 
>> >> >> ...)
>> >> >> would be though.  It's arguably more accurate too, since the effect
>> >> >> on the stack locations is unspecified rather than predictable.
>> >> >
>> >> > powerpc seems to be the only port with a stack_tie that's not
>> >> > using an UNSPEC RHS.
>> >> In rs6000.md, it is
>> >> 
>> >> ; This is to explain that changes to the stack pointer should
>> >> ; not be moved over loads from or stores to stack memory.
>> >> (define_insn "stack_tie"
>> >>   [(match_parallel 0 "tie_operand"
>> >>  [(set (mem:BLK (reg 1)) (const_int 0))])]
>> >>   ""
>> >>   ""
>> >>   [(set_attr "length" "0")])
>> >> 
>> >> This would be just an placeholder insn, and acts as the comments.
>> >> UNSPEC_ would works like other targets.  While, I'm wondering
>> >> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
>> >> MODEs between SET_DEST and SET_SRC?
>> >
>> > I don't think the issue is the mode but the issue is that
>> > the pattern as-is says some memory is zeroed while that's not
>> > actually true (not specifying a size means we can't really do
>> > anything with this MEM, but still).  Using an UNSPEC avoids
>> > implying anything for the stored value.
>> >
>> > Of course I think a MEM SET_DEST without a specified size is bogus
>> > as well, but there's larger precedent for this...
>> 
>> Thanks for your kindly comments!
>> Using "(set (mem:BLK (reg 1)) (const_int 0))" here, may because this
>> insn does not generate real thing (not a real store and no asm code),
>> may like barrier.
>> 
>> While I agree that, using UNSPEC may be more clear to avoid mis-reading.
>
> Btw, another way to avoid the issue in CSE is to make it not process
> (aka record anything for optimization) for SET from MEMs with
> !MEM_SIZE_KNOWN_P

Thanks! Yes, this makes sense.
So there are two ideas (patches) for handling this issue.
Which one would be preferable?  This one (from a compile-time aspect)?

And maybe the change to the rs6000 stack_tie to use an unspec
can be a standalone enhancement, separate from the cse patch.

Thanks for comments!

BR,
Jeff (Jiufu Guo)

 patch 1
diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..06ecdadecbc 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4271,6 +4271,8 @@ find_sets_in_insn (rtx_insn *insn, vec *psets)
 someplace else, so it isn't worth cse'ing.  */
   else if (GET_CODE (SET_SRC (x)) == CALL)
;
+  else if (MEM_P (SET_DEST (x)) && !MEM_SIZE_KNOWN_P (SET_DEST (x)))
+   ;
   else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
   && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != 

Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-11 Thread Jiufu Guo via Gcc-patches


Hi,

Thanks for your comments!

Segher Boessenkool  writes:

> Hi!
>
> On Wed, Jun 07, 2023 at 04:21:11PM +0800, Jiufu Guo wrote:
>> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
>> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
>> wrap/overflow/underflow. So, it is valid that "X - N * M" does
>> not cross zero and does not wrap/overflow/underflow.
>
> Is it ever valid semi-generally when N does not divide X?

It is valid only if there is no wrap/overflow/underflow, and the signs
of "X" and "X-N*M" are the same.  Under this condition, N, M and X can
be any value.

>
> Say X=5, N=2, M=3.  Then the original expression evaluates to 0, but the
> new one to -1.  Whenever one of the divisions rounds up and the other
> rounds down you have this problem.
You are right.  Since '/' always rounds towards zero, 'X' and 'X-N*M'
must have the same sign bit.  Otherwise one division rounds up while the
other rounds down, and the transform is invalid.

BR,
Jeff (Jiufu Guo)
>
>
> Segher
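
For reference, the counterexample above (X=5, N=2, M=3) can be checked
with a small standalone C sketch.  The helper names div_before and
div_after are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Original form: (X - N * M) / N, using C's truncating division.  */
static long
div_before (long x, long n, long m)
{
  assert (n != 0);
  return (x - n * m) / n;
}

/* Transformed form: X / N - M.  */
static long
div_after (long x, long n, long m)
{
  assert (n != 0);
  return x / n - m;
}
```

With X=5, N=2, M=3 the two forms disagree (0 versus -1), because X and
X-N*M have different signs.  With X=7, N=2, M=3 both X and X-N*M are
positive and both forms give 0.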


Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-09 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Wed, 7 Jun 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
>> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
>> wrap/overflow/underflow. So, it is valid that "X - N * M" does
>> not cross zero and does not wrap/overflow/underflow.
>> 
>> Compare with previous version:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618796.html
>> 
>> This patch 1. adds the patterns for variable N or M,
>> 2. uses simpler form "(X - N * M) / N" for patterns,
>> 3. adds functions to gimple-fold.h/cc (not gimple-match-head.cc)
>> 4. updates testcases
>> 
>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>> Is this patch ok for trunk?
>
> Comments below.
>
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>  PR tree-optimization/108757
>> 
>> gcc/ChangeLog:
>> 
>>  * gimple-fold.cc (maybe_mult_overflow): New function.
>>  (maybe_plus_overflow): New function.
>>  (maybe_minus_overflow): New function.
>>  (plus_mult_no_ovf_and_keep_sign): New function.
>>  (plus_no_ovf_and_keep_sign): New function.
>>  * gimple-fold.h (maybe_mult_overflow): New declare.
>>  (plus_mult_no_ovf_and_keep_sign): New declare.
>>  (plus_no_ovf_and_keep_sign): New declare.
>>  * match.pd ((X - N * M) / N): New pattern.
>>  ((X + N * M) / N): New pattern.
>>  ((X + C) / N): New pattern.
>>  ((X + C) >> N): New pattern.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/pr108757-1.c: New test.
>>  * gcc.dg/pr108757-2.c: New test.
>>  * gcc.dg/pr108757.h: New test.
>> 
>> ---
>>  gcc/gimple-fold.cc| 161 
>>  gcc/gimple-fold.h |   3 +
>>  gcc/match.pd  |  58 +++
>>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
>>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
>>  gcc/testsuite/gcc.dg/pr108757.h   | 244 ++
>>  6 files changed, 503 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>> 
>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> index 581575b65ec..bb833ae17b3 100644
>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -9349,3 +9349,164 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int depth)
>>return false;
>>  }
>>  }
>> +
>> +/* Return true if "X * Y" may be overflow.  */
>> +
>> +bool
>> +maybe_mult_overflow (value_range &x, value_range &y, signop sgn)
>
> These functions look like some "basic" functionality that should
> be (or maybe already is?  Andrew?) provided by the value-range
> framework.  That means it should not reside in gimple-fold.{cc,h}
> but elsewhere and possibly with an API close to the existing
> value-range stuff.
>
> Andrew?

It would be great to get the overflow info directly from VR :)
In range-op.cc there are already value_range_with_overflow and
value_range_from_overflowed_bounds, which check for overflow.
However, this information does not seem to be recorded.  Maybe it would
help to add a field to VR and an API to query it.

>
>> +{
>> +  wide_int wmin0 = x.lower_bound ();
>> +  wide_int wmax0 = x.upper_bound ();
>> +  wide_int wmin1 = y.lower_bound ();
>> +  wide_int wmax1 = y.upper_bound ();
>> +
>> +  wi::overflow_type min_ovf, max_ovf;
>> +  wi::mul (wmin0, wmin1, sgn, &min_ovf);
>> +  wi::mul (wmax0, wmax1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +{
>> +  wi::mul (wmin0, wmax1, sgn, &min_ovf);
>> +  wi::mul (wmax0, wmin1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +return false;
>> +}
>> +  return true;
>> +}
>> +
>> +/* Return true if "X + Y" may be overflow.  */
>> +
>> +static bool
>> +maybe_plus_overflow (value_range &x, value_range &y, signop sgn)
>> +{
>> +  wide_int wmin0 = x.lower_bound ();
>> +  wide_int wmax0 = x.upper_bound ();
>> +  wide_int wmin1 = y.lower_bound ();
>> +  wide_int wmax1 = y.upper_bound ();
>> +
>> +  wi::overflow_type min_ovf, max_ovf;
>> +  wi::add (wmax0, wmax1, sgn, &max_ovf);
>> +  wi::add (wmin0, wmin1, sgn, &min_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +return false;
>> +
>> +  return true;
>> +}
>> +
>> +/* Return true if "X - Y" may be overflow.  */
>> +
>> +static bool
>> +maybe_minus_overflow (value_range &x, value_range &y, signop sgn)
>> +{
>> +  wide_int wmin0 = x.lower_bound ();
>> +  wide_int wmax0 = x.upper_bound ();
>> +  wide_int wmin1 = y.lower_bound ();
>> +  wide_int wmax1 = y.upper_bound ();
>> +
>> +  wi::overflow_type min_ovf, max_ovf;
>> +  wi::sub (wmin0, wmax1, sgn, &min_ovf);
>> +  wi::sub (wmax0, wmin1, sgn, &max_ovf);
>> +  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
>> +return false;
>> +
>> +  return true;
>> +}
>> +
>> +/* Return true if there is no overflow in the expression.
>> +   And no sign change on the plus/minus for X.
>
> 
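
The corner-based check in the quoted maybe_mult_overflow can be sketched
outside of GCC roughly as follows.  This is an illustration only: it uses
plain int64_t bounds and the GCC/Clang builtin __builtin_mul_overflow in
place of value_range and wi::mul, and the name range_mul_may_overflow is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if X * Y may overflow int64_t for some X in [XMIN, XMAX]
   and Y in [YMIN, YMAX].  The product is linear in each operand, so its
   extremes over the rectangle are reached at the four corners; checking
   those four products is enough.  */
static bool
range_mul_may_overflow (int64_t xmin, int64_t xmax,
			int64_t ymin, int64_t ymax)
{
  assert (xmin <= xmax && ymin <= ymax);
  int64_t r;	/* The product itself is not needed, only the flag.  */
  return __builtin_mul_overflow (xmin, ymin, &r)
	 || __builtin_mul_overflow (xmax, ymax, &r)
	 || __builtin_mul_overflow (xmin, ymax, &r)
	 || __builtin_mul_overflow (xmax, ymin, &r);
}
```

For example, the ranges [0, 10] x [0, 10] are reported as safe, while
[INT64_MAX / 2, INT64_MAX] x [2, 3] are reported as possibly overflowing
because the max * max corner overflows.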

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Richard Biener  writes:
>> 
>> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
>> >
>> >> guojiufu  writes:
>> >> > Hi,
>> >> >
>> >> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >> 
>> >> >>> Hi,
>> >> >>> 
...
>> >> >>> 
>> >> >>> This patch is raised when drafting below one.
>> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >> >>> try_const_anchors, and hits the assert/ice.
>> >> >>> 
>> >> >>> Bootstrap and regtest pass on ppc64{,le} and x86_64.
>> >> >>> Is this ok for trunk?
>> >> >> 
>> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> >> >> I suggest to instead fix try_const_anchors to change
>> >> >> 
>> >> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  */
>> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> >> return NULL_RTX;
>> >> >> 
>> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> >> 
>> >> >> to
>> >> >> 
>> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode alone.  */
>> >> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> >> return NULL_RTX;
>> >> >> 
>> >> >
>> >> > This is also able to fix this issue.  there is a "Punt on CC modes" 
>> >> > patch
>> >> > to return NULL_RTX in try_const_anchors.
>> >> >
>> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> >> >> we should have fended this off earlier.  Can you share more complete
>> >> >> RTL of that stack_tie?
>> >> >
>> >> >
>> >> > (insn 15 14 16 3 (parallel [
>> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >> >  (const_int 0 [0]))
>> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >> >   (nil))
>> >> >
>> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> >> 
>> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
>> >> would be though.  It's arguably more accurate too, since the effect
>> >> on the stack locations is unspecified rather than predictable.
>> >
>> > powerpc seems to be the only port with a stack_tie that's not
>> > using an UNSPEC RHS.
>> In rs6000.md, it is
>> 
>> ; This is to explain that changes to the stack pointer should
>> ; not be moved over loads from or stores to stack memory.
>> (define_insn "stack_tie"
>>   [(match_parallel 0 "tie_operand"
>> [(set (mem:BLK (reg 1)) (const_int 0))])]
>>   ""
>>   ""
>>   [(set_attr "length" "0")])
>> 
>> This would be just an placeholder insn, and acts as the comments.
>> UNSPEC_ would works like other targets.  While, I'm wondering
>> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
>> MODEs between SET_DEST and SET_SRC?
>
> I don't think the issue is the mode but the issue is that
> the pattern as-is says some memory is zeroed while that's not
> actually true (not specifying a size means we can't really do
> anything with this MEM, but still).  Using an UNSPEC avoids
> implying anything for the stored value.
>
> Of course I think a MEM SET_DEST without a specified size is bogus
> as well, but there's larger precedent for this...

Thanks for your kind comments!
"(set (mem:BLK (reg 1)) (const_int 0))" may be used here because this
insn does not generate anything real (not a real store and no asm code);
it acts more like a barrier.

Still, I agree that using UNSPEC would be clearer and avoids misreading.

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> Thanks for comments!
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> >
>> >> Thanks,
>> >> Richard
>> 


Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-09 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 9 Jun 2023, Richard Sandiford wrote:
>
>> guojiufu  writes:
>> > Hi,
>> >
>> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> 
>> >>> Hi,
>> >>> 
>> >>> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P 
>> >>> (mode))"
>> >>> in "try_const_anchors".
>> >>> This assert seems correct because the function try_const_anchors cares
>> >>> about integer values currently, and modes other than SCALAR_INT_MODE_P
>> >>> are not needed to support.
>> >>> 
>> >>> This patch makes sure SCALAR_INT_MODE_P when calling 
>> >>> try_const_anchors.
>> >>> 
>> >>> This patch is raised when drafting below one.
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >>> try_const_anchors, and hits the assert/ice.
>> >>> 
>> >>> Bootstrap and regtest pass on ppc64{,le} and x86_64.
>> >>> Is this ok for trunk?
>> >> 
>> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> >> I suggest to instead fix try_const_anchors to change
>> >> 
>> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  */
>> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> return NULL_RTX;
>> >> 
>> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> 
>> >> to
>> >> 
>> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode alone.  */
>> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> return NULL_RTX;
>> >> 
>> >
>> > This is also able to fix this issue.  there is a "Punt on CC modes" 
>> > patch
>> > to return NULL_RTX in try_const_anchors.
>> >
>> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> >> we should have fended this off earlier.  Can you share more complete
>> >> RTL of that stack_tie?
>> >
>> >
>> > (insn 15 14 16 3 (parallel [
>> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >  (const_int 0 [0]))
>> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >   (nil))
>> >
>> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> 
>> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
>> would be though.  It's arguably more accurate too, since the effect
>> on the stack locations is unspecified rather than predictable.
>
> powerpc seems to be the only port with a stack_tie that's not
> using an UNSPEC RHS.
In rs6000.md, it is

; This is to explain that changes to the stack pointer should
; not be moved over loads from or stores to stack memory.
(define_insn "stack_tie"
  [(match_parallel 0 "tie_operand"
   [(set (mem:BLK (reg 1)) (const_int 0))])]
  ""
  ""
  [(set_attr "length" "0")])

This is just a placeholder insn, and it acts as the comment describes.
An UNSPEC would work, as on other targets.  Still, I'm wondering what
the concern is with "set (mem:BLK (reg 1)) (const_int 0)".
Is it the modes of SET_DEST and SET_SRC?

Thanks for comments!

BR,
Jeff (Jiufu Guo)
>
>> Thanks,
>> Richard


[PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-08 Thread Jiufu Guo via Gcc-patches
Hi,

Checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P (mode))"
in "try_const_anchors".
This assert seems correct because try_const_anchors currently only
cares about integer values, and modes other than SCALAR_INT_MODE_P
do not need to be supported.

This patch checks SCALAR_INT_MODE_P before calling try_const_anchors.

This patch came up while drafting the one below:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
try_const_anchors and hits the assert (ICE).

Bootstrap and regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?


BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* cse.cc (cse_insn): Add SCALAR_INT_MODE_P condition.

---
 gcc/cse.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..f213fa0faf7 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -5003,7 +5003,8 @@
       if (targetm.const_anchor
           && !src_related
           && src_const
-          && GET_CODE (src_const) == CONST_INT)
+          && GET_CODE (src_const) == CONST_INT
+          && SCALAR_INT_MODE_P (mode))
         {
           src_related = try_const_anchors (src_const, mode);
           src_related_is_const_anchor = src_related != NULL_RTX;
-- 
2.39.3



[PATCH 3/4] rs6000: build constant via li/lis;rldicl/rldicr

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be built by rotating a negative
"li/lis" value and then clearing bits on the left or right.  If so, we can
build the constant through "li/lis ; rldicl/rldicr".

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 61 ++-
 .../gcc.target/powerpc/const-build.c  | 44 +
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 03cd9d5e952..2a3fa733b45 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10332,6 +10332,61 @@ can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicl.
+
+   If so, *SHIFT is set to the shift operand of rldicl, and *MASK is set to
+   the mask operand of rldicl, and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
+ to ones and then recheck it.  */
+  int lz = clz_hwi (c);
+  HOST_WIDE_INT unmask_c
+= c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U >> lz;
+  *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicr.
+
+   If so, *SHIFT is set to the shift operand of rldicr, and *MASK is set to
+   the mask operand of rldicr, and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing zeros
+ to ones and then recheck it.  */
+  int tz = ctz_hwi (c);
+  HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U << tz;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10378,7 +10433,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
@@ -10387,6 +10444,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (temp, GEN_INT (imm));
   if (shift != 0)
temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  if (mask != HOST_WIDE_INT_M1)
+   temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
   emit_move_insn (dest, temp);
 }
   else if (ud3 == 0 && ud4 == 0)
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index c38a1dd91f2..8c209921d41 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -46,6 +46,42 @@ lis_rotldi_6 (void)
   return 0x5318LL;
 }
 
+long long NOIPA
+li_rldicl_7 (void)
+{
+  return 0x3ffa1LL;
+}
+
+long long NOIPA
+li_rldicl_8 (void)
+{
+  return 0xff8531LL;
+}
+
+long long NOIPA
+lis_rldicl_9 (void)
+{
+  return 0x00ff8531LL;
+}
+
+long long NOIPA
+li_rldicr_10 (void)
+{
+  return 0x8531fff0LL;
+}
+
+long long NOIPA
+li_rldicr_11 (void)
+{
+  return 0x21f0LL;
+}
+
+long long NOIPA
+lis_rldicr_12 (void)
+{
+  return 0x5310LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -53,9 +89,17 @@ struct fun arr[] = {
   {li_rotldi_4, 0x2194LL},
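
The idea behind can_be_built_by_li_lis_and_rldicl in this patch can be
modelled in standalone C roughly as below.  This is only a sketch under
assumptions: the helper names (rotl64, fill_leading_zeros, li_rldicl) are
hypothetical, uint64_t stands in for HOST_WIDE_INT, and
__builtin_clzll (a GCC/Clang builtin) replaces clz_hwi.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits, 0 <= N < 64.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  assert (n < 64);
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Turn the leading zeros of C into ones, and store in *MASK the mask
   that clears them again.  C must be nonzero.  */
static uint64_t
fill_leading_zeros (uint64_t c, uint64_t *mask)
{
  assert (c != 0);
  int lz = __builtin_clzll (c);
  *mask = ~UINT64_C (0) >> lz;
  return c | ~*mask;
}

/* Model of "li rT,IMM ; rldicl rT,rT,SH,MB": sign-extend the 16-bit
   immediate, rotate left by SH, then clear the top MB bits.  */
static uint64_t
li_rldicl (int16_t imm, unsigned sh, unsigned mb)
{
  assert (mb < 64);
  uint64_t v = (uint64_t) (int64_t) imm;
  return rotl64 (v, sh) & (~UINT64_C (0) >> mb);
}
```

Taking 0x8531ff as an example value (chosen for this sketch, not taken
from the testcase), filling its 40 leading zeros gives
0xffffffffff8531ff, which is the negative "li" value 0xffffffffffff8531
rotated left by 8.  So li_rldicl (-31439, 8, 40) rebuilds 0x8531ff.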
   

[PATCH 2/4] rs6000: build constant via lis;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be rotated to/from a negative
value that "lis" can load.  If so, we can use "lis;rotldi" to build it.
A positive "lis" value does not need to be analyzed: if a constant can
be rotated from a positive "lis" value, it can also be rotated from a
positive "li" value.

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
function.
(can_be_built_by_li_and_rotldi): Rename to ...
(can_be_built_by_li_lis_and_rotldi): ... this function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 42 ---
 .../gcc.target/powerpc/const-build.c  | 16 ++-
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 1dd0072350a..03cd9d5e952 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10278,19 +10278,51 @@ can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
   return can_be_rotated_to_lowbits (~c, 15, rot);
 }
 
-/* Check if value C can be built by 2 instructions: one is 'li', another is
-   rotldi.
+/* Check if C can be rotated to a negative value which 'lis' instruction is
+   able to load: 1..1xx0..0.  If so, set *ROT to the number by which C is
+   rotated, and return true.  Return false otherwise.  */
+
+static bool
+can_be_rotated_to_negative_lis (HOST_WIDE_INT c, int *rot)
+{
+  /* case a. 1..1xxx0..01..1: up to 15 x's, at least 16 0's.  */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (middle_zeros >= 16 && leading_ones + tailing_ones >= 33)
+{
+  *rot = HOST_BITS_PER_WIDE_INT - tailing_ones;
+  return true;
+}
+
+  /* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
+ rotated over the highest bit.  */
+  int pos_one = clz_hwi ((c << 16) >> 16);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
+  int middle_ones = clz_hwi (~(c << pos_one));
+  if (middle_zeros >= 16 && middle_ones >= 33)
+{
+  *rot = pos_one;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rotldi.
 
If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
is set to -1, and return true.  Return false otherwise.  */
 
 static bool
-can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   HOST_WIDE_INT *mask)
 {
   int n;
   if (can_be_rotated_to_positive_li (c, &n)
-  || can_be_rotated_to_negative_li (c, &n))
+  || can_be_rotated_to_negative_li (c, &n)
+  || can_be_rotated_to_negative_lis (c, &n))
 {
   *mask = HOST_WIDE_INT_M1;
   *shift = HOST_BITS_PER_WIDE_INT - n;
@@ -10346,7 +10378,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 70f095f6bf2..c38a1dd91f2 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -34,14 +34,28 @@ li_rotldi_4 (void)
   return 0x2194LL;
 }
 
+long long NOIPA
+lis_rotldi_5 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+lis_rotldi_6 (void)
+{
+  return 0x5318LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
   {li_rotldi_3, 0x8531LL},
   {li_rotldi_4, 0x2194LL},
+  {lis_rotldi_5, 0x8531LL},
+  {lis_rotldi_6, 0x5318LL},
 };
 
-/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 6 } } */
 
 int
 main ()
-- 
2.39.1
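
The remark in this patch's description that a positive "lis" value needs
no separate analysis can be illustrated with a small C model.  This is a
sketch, not GCC code: li_value, lis_value and rotl64 are hypothetical
helpers that only model the values the two instructions load on a 64-bit
target.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits, 0 <= N < 64.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  assert (n < 64);
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Value loaded by "li rT,IMM": the 16-bit immediate, sign-extended.  */
static uint64_t
li_value (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Value loaded by "lis rT,IMM": the immediate shifted left by 16 bits,
   sign-extended.  Multiplying avoids left-shifting a negative value.  */
static uint64_t
lis_value (int16_t imm)
{
  return (uint64_t) ((int64_t) imm * 65536);
}
```

For a non-negative immediate, lis_value (imm) equals
rotl64 (li_value (imm), 16), so any rotation of a positive "lis" value is
already a rotation of a positive "li" value.  For a negative immediate
the two differ (the low 16 bits of the "lis" value are zeros, not ones),
which is why the negative "lis" case gets its own check.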



[PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be built by "li;rldic".
We only need to handle "negative li"; the other forms do not need to be
checked.  For example, "negative lis" is just a "negative li" with an
additional shift.

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 61 ++-
 .../gcc.target/powerpc/const-build.c  | 28 +
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2a3fa733b45..cd04b6b5c82 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10387,6 +10387,64 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rldic.
+
+   If so, *SHIFT is set to the 'shift' operand of rldic; and *MASK is set
+   to the mask value about the 'mb' operand of rldic; and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
+{
+  /* There are 49 successive ones in the negative value of 'li'.  */
+  int ones = 49;
+
+  /* 1..1xx1..1: negative value of li --> 0..01..1xx0..0:
+ right bits are shifted in as 0's, and left 1's (and x's) are cleared.  */
+  int tz = ctz_hwi (c);
+  int lz = clz_hwi (c);
+  int middle_ones = clz_hwi (~(c << lz));
+  if (tz + lz + middle_ones >= ones)
+{
+  *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
+  *shift = tz;
+  return true;
+}
+
+  /* 1..1xx1..1 --> 1..1xx0..01..1: some 1's (following x's) are cleared. */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (leading_ones + tailing_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
+  *shift = tailing_ones + middle_zeros;
+  return true;
+}
+
+  /* xx1..1xx: --> xx0..01..1xx: some 1's (following x's) are cleared. */
+  /* Get the position for the first bit of successive 1.
+ The 24th bit would be in successive 0 or 1.  */
+  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  int pos_first_1 = ((c & (low_mask + 1)) == 0)
+ ? clz_hwi (c & low_mask)
+ : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+  middle_ones = clz_hwi (~c << pos_first_1);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  if (pos_first_1 < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1LL)
+   << (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  *shift = HOST_BITS_PER_WIDE_INT - pos_first_1 + middle_zeros;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10435,7 +10493,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 }
   else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
   || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
-  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask)
+  || can_be_built_by_li_and_rldic (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 8c209921d41..b503ee31c7c 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -82,6 +82,29 @@ lis_rldicr_12 (void)
   return 0x5310LL;
 }
 
+long long NOIPA
+li_rldic_13 (void)
+{
+  return 0x000f8531LL;
+}
+long long NOIPA
+li_rldic_14 (void)
+{
+  return 0x853100ffLL;
+}
+
+long long NOIPA
+li_rldic_15 (void)
+{
+  return 0x8031LL;
+}
+
+long long NOIPA
+li_rldic_16 (void)
+{
+  return 0x8f31LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -95,11 +118,16 @@ struct fun arr[] = {
   {li_rldicr_10, 0x8531fff0LL},
   {li_rldicr_11, 0x21f0LL},
   {lis_rldicr_12, 0x5310LL},
+  {li_rldic_13, 0x000f8531LL},
+  {li_rldic_14, 0x853100ffLL},
+  {li_rldic_15, 0x8031LL},
+  {li_rldic_16, 0x8f31LL}
 };
 
 /* { dg-final { scan-assembler-times 

[PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be rotated to/from a positive
or negative value that "li" can load. If so, we can use "li;rotldi" to build it.
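
As an illustration of the idea only, the check can be sketched by brute
force in C: try every rotate count and see whether the result is a value
"li" can load.  The names below (rotl64, is_li_value, li_rotldi_brute) are
hypothetical and are not the functions added by the patch, which uses
can_be_rotated_to_lowbits instead of a loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits; N is reduced modulo 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* True if V is a value "li" can load: a sign-extended 16-bit immediate.  */
static bool
is_li_value (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= -32768 && s <= 32767;
}

/* Hypothetical brute-force check.  If some "li" value rotated left by
   *ROTL equals C, set *ROTL to the smallest such count and return true.  */
static bool
li_rotldi_brute (uint64_t c, unsigned *rotl)
{
  for (unsigned n = 0; n < 64; n++)
    /* Rotating C right by N is the same as rotating it left by 64 - N.  */
    if (is_li_value (rotl64 (c, 64 - n)))
      {
        *rotl = n;
        return true;
      }
  return false;
}
```

For example, 0x7531000000000000 is 0x7531 rotated left by 48, so the sketch
accepts it, while a constant without a cyclic run of 49 identical bits
(such as 0x0123456789abcdef) is rejected.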

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New function.
(can_be_rotated_to_negative_li): New function.
(can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 64 +--
 .../gcc.target/powerpc/const-build.c  | 54 
 2 files changed, 112 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..1dd0072350a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if C can be rotated to a positive value which 'li' instruction
+   is able to load.  If so, set *ROT to the number by which C is rotated,
+   and return true.  Return false otherwise.  */
+
+static bool
+can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
+{
+  /* 49 leading zeros and 15 low bits on the positive value
+ generated by 'li' instruction.  */
+  return can_be_rotated_to_lowbits (c, 15, rot);
+}
+
+/* Like can_be_rotated_to_positive_li, but check the negative value of 'li'.  */
+
+static bool
+can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
+{
+  return can_be_rotated_to_lowbits (~c, 15, rot);
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to -1, and return true.  Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  int n;
+  if (can_be_rotated_to_positive_li (c, &n)
+  || can_be_rotated_to_negative_li (c, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10308,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10346,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 0000000..70f095f6bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  

[PATCH V2 0/4] rs6000: build constant via li/lis;rldicX

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

These patches contain only minor changes based on the previous version and
its review comments:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620489.html
The wording of the patches in this series is also updated.

For a given constant, it is profitable if we can build it with 2 insns.
This series enables building more constants with 2 insns: one is "li or lis",
the other is "rldicl, rldicr or rldic".
By checking and analyzing the characteristics of the insns "li/lis;rldicX",
all the possible constant values are considered by this series.

Considering the functionality and size, the work is split into 4 patches:
1. Support the constants which can be built by "li;rotldi"
   Both positive and negative values from insn "li" are analyzed.
2. Support the constants which can be built by "lis;rotldi"
   We only need to analyze the negative value from "lis".
   And this patch uses more code to check leading 1s and trailing 0s from "lis".
3. Support the constants which can be built by "li/lis;rldicl/rldicr":
   Leveraging the APIs defined/analyzed in patches 1 and 2,
   this patch checks the characteristics of the mask of "rldicl/rldicr"
   to support more constants.
4. Support the constants which can be built by "li/lis;rldic":
   The mask of "rldic" is relatively complicated; it is analyzed in this
   patch to support more constants.
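
For readers less familiar with these insns, the building blocks named in
the list above can be modelled in a few lines of C.  This is only a sketch
under my own naming (rotl64, li, lis, rotldi, rldicl, rldicr are helper
names chosen here, not code from the series), and it assumes operand values
in the range 0..63.

```c
#include <stdint.h>

/* Rotate X left by N bits, 0 <= N < 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* "li": load a sign-extended 16-bit immediate.  */
static uint64_t
li (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* "lis": load the immediate shifted left by 16, sign-extended.  */
static uint64_t
lis (int16_t imm)
{
  return (uint64_t) ((int64_t) imm * 65536);
}

/* "rotldi": plain rotate left by SH.  */
static uint64_t
rotldi (uint64_t rs, unsigned sh)
{
  return rotl64 (rs, sh);
}

/* "rldicl": rotate left by SH, then clear the MB high-order bits.  */
static uint64_t
rldicl (uint64_t rs, unsigned sh, unsigned mb)
{
  return rotl64 (rs, sh) & (~0ULL >> mb);
}

/* "rldicr": rotate left by SH, then clear the bits to the right of
   IBM bit ME (bit 0 is the MSB).  */
static uint64_t
rldicr (uint64_t rs, unsigned sh, unsigned me)
{
  return rotl64 (rs, sh) & (~0ULL << (63 - me));
}
```

For instance, rotldi (li (0x7531), 48) yields 0x7531000000000000, and
rldicl (li (-1), 0, 8) yields 0x00ffffffffffffff.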

BR,
Jeff (Jiufu)


[PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch tries to optimize "(X - N * M) / N" to "X / N - M".
For C code, "/" rounds towards zero (trunc_div), and "X - N * M" may
wrap/overflow/underflow. So, the transform is valid only if "X - N * M" does
not cross zero and does not wrap/overflow/underflow.
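
A small C check shows why the "does not cross zero" condition matters.
forms_agree is a hypothetical helper written just for this mail, not part
of the patch; it assumes N is non-zero and that the values are small enough
not to overflow long long.

```c
#include <stdbool.h>

/* Hypothetical helper: true if "(X - N * M) / N" and "X / N - M" give the
   same result for these concrete values.  Assumes N != 0 and no overflow.  */
static bool
forms_agree (long long x, long long n, long long m)
{
  return (x - n * m) / n == x / n - m;
}
```

With X = 7, N = 2, M = 1 both forms give 2.  With X = 1, N = 2, M = 1 the
subtraction crosses zero: (1 - 2) / 2 is 0 under truncation, while
1 / 2 - 1 is -1, so the forms differ.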

Compare with previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618796.html

This patch:
1. adds the patterns for variable N or M,
2. uses the simpler form "(X - N * M) / N" for patterns,
3. adds functions to gimple-fold.h/cc (not gimple-match-head.cc),
4. updates testcases.
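
The overflow helpers in item 3 work on value ranges: they test the products
of the range bounds.  The same idea can be sketched in plain C with the
GCC/Clang builtin __builtin_mul_overflow.  range_mult_may_overflow is a
hypothetical name for this sketch; the patch itself uses wi::mul on
wide_int bounds.

```c
#include <stdbool.h>

/* Hypothetical sketch: X lies in [XMIN, XMAX] and Y in [YMIN, YMAX].
   Return true if any product of the range bounds overflows long long.  */
static bool
range_mult_may_overflow (long long xmin, long long xmax,
                         long long ymin, long long ymax)
{
  long long r;
  return __builtin_mul_overflow (xmin, ymin, &r)
         || __builtin_mul_overflow (xmax, ymax, &r)
         || __builtin_mul_overflow (xmin, ymax, &r)
         || __builtin_mul_overflow (xmax, ymin, &r);
}
```

Checking only the four corner products is enough because the extreme
values of a product over two intervals are reached at the interval bounds.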

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?


BR,
Jeff (Jiufu Guo)

PR tree-optimization/108757

gcc/ChangeLog:

* gimple-fold.cc (maybe_mult_overflow): New function.
(maybe_plus_overflow): New function.
(maybe_minus_overflow): New function.
(plus_mult_no_ovf_and_keep_sign): New function.
(plus_no_ovf_and_keep_sign): New function.
* gimple-fold.h (maybe_mult_overflow): New declare.
(plus_mult_no_ovf_and_keep_sign): New declare.
(plus_no_ovf_and_keep_sign): New declare.
* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) / N): New pattern.
((X + C) >> N): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/gimple-fold.cc| 161 
 gcc/gimple-fold.h |   3 +
 gcc/match.pd  |  58 +++
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 244 ++
 6 files changed, 503 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 581575b65ec..bb833ae17b3 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -9349,3 +9349,164 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int depth)
   return false;
 }
 }
+
+/* Return true if "X * Y" may overflow.  */
+
+bool
+maybe_mult_overflow (value_range &x, value_range &y, signop sgn)
+{
+  wide_int wmin0 = x.lower_bound ();
+  wide_int wmax0 = x.upper_bound ();
+  wide_int wmin1 = y.lower_bound ();
+  wide_int wmax1 = y.upper_bound ();
+
+  wi::overflow_type min_ovf, max_ovf;
+  wi::mul (wmin0, wmin1, sgn, &min_ovf);
+  wi::mul (wmax0, wmax1, sgn, &max_ovf);
+  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+{
+  wi::mul (wmin0, wmax1, sgn, &min_ovf);
+  wi::mul (wmax0, wmin1, sgn, &max_ovf);
+  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+   return false;
+}
+  return true;
+}
+
+/* Return true if "X + Y" may overflow.  */
+
+static bool
+maybe_plus_overflow (value_range &x, value_range &y, signop sgn)
+{
+  wide_int wmin0 = x.lower_bound ();
+  wide_int wmax0 = x.upper_bound ();
+  wide_int wmin1 = y.lower_bound ();
+  wide_int wmax1 = y.upper_bound ();
+
+  wi::overflow_type min_ovf, max_ovf;
+  wi::add (wmax0, wmax1, sgn, &max_ovf);
+  wi::add (wmin0, wmin1, sgn, &min_ovf);
+  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+return false;
+
+  return true;
+}
+
+/* Return true if "X - Y" may overflow.  */
+
+static bool
+maybe_minus_overflow (value_range &x, value_range &y, signop sgn)
+{
+  wide_int wmin0 = x.lower_bound ();
+  wide_int wmax0 = x.upper_bound ();
+  wide_int wmin1 = y.lower_bound ();
+  wide_int wmax1 = y.upper_bound ();
+
+  wi::overflow_type min_ovf, max_ovf;
+  wi::sub (wmin0, wmax1, sgn, &min_ovf);
+  wi::sub (wmax0, wmin1, sgn, &max_ovf);
+  if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+return false;
+
+  return true;
+}
+
+/* Return true if there is no overflow in the expression.
+   And no sign change on the plus/minus for X.
+   CODE is PLUS_EXPR, if the expression is "X + N * M".
+   CODE is MINUS_EXPR, if the expression is "X - N * M".
+   TYPE is the integer type of the expressions.  */
+
+bool
+plus_mult_no_ovf_and_keep_sign (tree x, tree m, tree n, tree_code code,
+   tree type)
+{
+  value_range vr0;
+  value_range vr1;
+  value_range vr2;
+
+  if (get_range_query (cfun)->range_of_expr (vr0, x)
+  && get_range_query (cfun)->range_of_expr (vr1, n)
+  && get_range_query (cfun)->range_of_expr (vr2, m) && !vr0.varying_p ()
+  && !vr0.undefined_p () && !vr1.varying_p () && !vr1.undefined_p ()
+  && !vr2.varying_p () && !vr2.undefined_p ())
+{
+  signop sgn = TYPE_SIGN (type);
+  if (!TYPE_OVERFLOW_UNDEFINED (type))
+   {
+ if (maybe_mult_overflow (vr1, vr2, sgn))
+   {
+ m = fold_build1 (NEGATE_EXPR, type, m);
+ if (get_range_query (cfun)->range_of_expr (vr2, m)
+ && !vr2.varying_p () && !vr2.undefined_p ()
+ 

Re: [PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches


Hi David,

David Edelsohn  writes:
>  
> Hi, Jiufu
>   * config/rs6000/rs6000.cc (can_be_rotated_to_possitive_li): New 
> function.
>   (can_be_rotated_to_negative_li): New function.
>   (can_be_built_by_li_and_rotldi): New function.
>   (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.
> In English the word "positive" contains one "s", not two.  Please correct 
> throughout the patches.
> Also a style issue, comments before a function should be followed by a
> blank line.

Sure, I will update accordingly, and check other patches.

>> +/* Check if C can be rotated to a possitive value which 'li' instruction
> positive
>> +   is able to load.  If so, set *ROT to the number by which C is rotated,
>> +   and return true.  Return false otherwise.  */
> Add a blank line here
>> +static bool
>> +can_be_rotated_to_possitive_li (HOST_WIDE_INT c, int *rot)
> positive
>> +{
>> +  /* 49 leading zeros and 15 lowbits on the possitive value
> low bits, positive

Thanks for your careful review! 

>> + generated by 'li' instruction.  */
>> +  return can_be_rotated_to_lowbits (c, 15, rot);
>> +}
>> +/* Check if value C can be built by 2 instructions: one is 'li', another is
>> +   rotldi.
>> +
>> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>> +   is set to -1, and return true.  Return false otherwise.  */
>> +static bool
>> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>> +   HOST_WIDE_INT *mask)
>> +{
>> +  int n;
>> +  if (can_be_rotated_to_possitive_li (c, &n)
>> +  || can_be_rotated_to_negative_li (c, &n))
>> +{
>> +  *mask = HOST_WIDE_INT_M1;
>> +  *shift = HOST_BITS_PER_WIDE_INT - n;
>> +  return true;
>> +}
>> +
>> +  return false;
>> +}
>> +
>>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
>> Output insns to set DEST equal to the constant C as a series of
>> lis, ori and shl instructions.  */
>> @@ -10246,15 +10285,14 @@ static void
>>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>>  {
>>rtx temp;
>> +  int shift;
>> +  HOST_WIDE_INT mask;
>>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>>  
>>ud1 = c & 0xffff;
>> -  c = c >> 16;
>> -  ud2 = c & 0xffff;
>> -  c = c >> 16;
>> -  ud3 = c & 0xffff;
>> -  c = c >> 16;
>> -  ud4 = c & 0xffff;
>> +  ud2 = (c >> 16) & 0xffff;
>> +  ud3 = (c >> 32) & 0xffff;
>> +  ud4 = (c >> 48) & 0xffff;
>>  
>>if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>> @@ -10278,6 +10316,19 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>>   GEN_INT ((ud2 ^ 0xffff) << 16)));
>>  }
>> +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
>> +{
>> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
>> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>> +
>> +  emit_move_insn (temp, GEN_INT (imm));
>> +  if (shift != 0)
>> +temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
>> +  if (mask != HOST_WIDE_INT_M1)
> How is mask != HOST_WIDE_INT_M1? The call to can_be_built_by_li_and_rotldi()
> sets it to that value and it is not modified in the interim statements.

Oh, Thanks for catching this!
Actually this line is shared by these patches.
"if (mask != HOST_WIDE_INT_M1)" is useful with patch [3/4], and it
should be merged into that patch.

Thanks again for your review!


>> +temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
>> +  emit_move_insn (dest, temp);
>> +}
>>else if (ud3 == 0 && ud4 == 0)
>>  {
>>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> Thanks, David


Re: ping^^: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-06-01 Thread Jiufu Guo via Gcc-patches


Hi David,

Thanks!

David Edelsohn  writes:

>  
> On Tue, May 30, 2023 at 11:00 PM Jiufu Guo  wrote:
>
>  Gentle ping...
>
>  Jiufu Guo via Gcc-patches  writes:
>
>  > Gentle ping...
>  >
>  > Jiufu Guo via Gcc-patches  writes:
>  >
>  >> Hi,
>  >>
>  >> I'm thinking that we may enable this patch for stage1, so ping it.
>  >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
>  >>
>  >> BR,
>  >> Jeff (Jiufu)
>  >>
>  >> Jiufu Guo  writes:
>  >>
>  >>> Hi,
>  >>>
>  >>> There is a functionality called const_anchor in cse.cc.  It
>  >>> supports generating new constants by adding small gaps/offsets to
>  >>> existing constants.  For example:
>  >>>
>  >>> void __attribute__ ((noinline)) foo (long long *a)
>  >>> {
>  >>>   *a++ = 0x2351847027482577LL;
>  >>>   *a++ = 0x2351847027482578LL;
>  >>> }
>  >>> The second constant (0x2351847027482578LL) can be computed by adding '1'
>  >>> to the first constant (0x2351847027482577LL).
>  >>> This is profitable if more than one instruction is needed to build the
>  >>> second constant.
>  >>>
>  >>> * For rs6000, we can enable this functionality, as the instruction
>  >>> 'addi' is just for this when gap is smaller than 0x8000.
>  >>>
>  >>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixed
>  >>> one issue. The issue is:
>  >>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement for function
>  >>> "try_const_anchors". 
>  >>>
>  >>> * One potential side effect of this patch:
>  >>> Comparing with
>  >>> "r101=0x2351847027482577LL
>  >>> ...
>  >>> r201=0x2351847027482578LL"
>  >>> The new r201 will be "r201=r101+1", and then r101 will live longer,
>  >>> and would increase pressure when allocating registers.
>  >>> But I feel, this would be acceptable for this const_anchor feature.
>  >>>
>  >>> * With this patch, I checked the performance change on SPEC2017, and
>  >>> the change is not significant, since this functionality is not
>  >>> hit on any hot path. There are runtime fluctuations/noise (e.g. on
>  >>> povray_r/xalancbmk_r/xz_r) that are not caused by the patch.
>  >>>
>  >>> With this patch, I also checked the changes in object files (from
>  >>> GCC bootstrap and SPEC). The significant changes are improvements:
>  >>> "addi" vs. "2 or more insns: lis+or.."; it also exposes some
>  >>> other optimization opportunities, like combine/jump2. Code to
>  >>> store/load one more register also occurs in a few cases,
>  >>> but it does not impact overall performance.
>  >>>
>  >>> * To refine this patch, some history discussions are referenced:
>  >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
>  >>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
>  >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>  >>>
>  >>>
>  >>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
>  >>> Is this ok for trunk?
>
> Hi, Jiufu
>
> Thanks for developing this patch and your persistence.
>
> The rs6000.cc part of the patch (TARGET_CONST_ANCHOR) is okay for
> Stage 1.  This is approved. 
>
> I don't have the authority to approve the change to cse_insn.  Is the
> cse_insn change a prerequisite?  Will the rs6000 change break or
> produce wrong code 
> without the cse change?  The second part of the patch should be posted
> separately to the mailing list, with a cc for appropriate maintainers,
> because most maintainers will not be following this specific thread
> to approve the other part of the patch.

I would extract the cse part as a separate patch.
Yes, the cse part is a prerequisite: the bug could be exposed by the rs6000
part of the change.
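
For reference, the const_anchor idea in the quoted description can be
illustrated with a small C check.  This is only a sketch of the "gap fits
in addi" condition, not how cse.cc implements const_anchor;
reachable_by_addi is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can TARGET be formed from ANCHOR with one "addi",
   whose immediate is a signed 16-bit value?  */
static bool
reachable_by_addi (uint64_t anchor, uint64_t target)
{
  int64_t gap = (int64_t) (target - anchor);
  return gap >= -0x8000 && gap < 0x8000;
}
```

For the two constants in the example, the gap is 1, so the second constant
can be formed from the first with a single addi.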

BR,
Jeff (Jiufu Guo)

>
> Thanks, David
>  
>  >>>
>  >>>
>  >>> BR,
>  >>> Jeff (Jiufu)
>  >>>
>  >>> gcc/ChangeLog:
>  >>>
>  >>> * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>  >>> * cse.cc (

Re: [RFC] light expander sra for parameters and returns

2023-06-01 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Mon, 29 May 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> Previously, I was investigating some struct parameters and returns related
>> PRs 69143/65421/108073.
>> 
>> Investigating the issues case by case, and drafting patches for each of
>> them one by one. This would help us to enhance code incrementally.
>> While, this way, patches would interact with each other and implement
>> different codes for similar issues (because of the different paths in
>> gimple/rtl).  We may have a common fix for those issues.
>> 
>> We know a few other related PRs(such as meta-bug PR101926) exist. For those
>> PRs in different targets with different symptoms (and also different root
>> cause), I would expect a method could help some of them, but it may
>> be hard to handle all of them in one fix.
>> 
>> With investigation and check discussion for the issues, I remember a
>> suggestion from Richard: it would be nice to perform some SRA-like analysis
>> for the accesses on the structs (parameter/returns).
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>> This may be a 'fairly common method' for those issues. With this idea,
>> I drafted a patch as below in this mail.
>> 
>> I also thought about directly using tree-sra.cc, e.g. enhance it and rerun it
>> at the end of GIMPLE passes. While since some issues are introduced inside
>> the expander, so below patch also co-works with other parts of the expander.
>> And since we already have tree-sra in gimple pass, we only need to take more
>> care on parameter and return in this patch: other decls could be handled
>> well in tree-sra.
>> 
>> The steps of this patch are:
>> 1. Collect struct type parameters and returns, and then scan the function to
>> get the accesses on them. And figure out the accesses which would be
>> profitable to be scalarized (using registers of the parameter/return). Now,
>> reading on parameter and writing on returns are checked in the current patch.
>> 2. When/after the scalar registers are determined/expanded for the return or
>> parameters, compute the corresponding scalar register(s) for each access of
>> the return/parameter, and prepare the scalar RTLs for those accesses.
>> 3. When using/expanding the accesses expression, leverage the
>> computed/prepared scalars directly.
>> 
>> This patch is tested on ppc64 both LE and BE.
>> To continue, I would ask for comments and suggestions first. And then I would
>> update/enhance accordingly.  Thanks in advance!
>
> Thanks for working on this - the description above sounds exactly like
> what should be done.
>
> Now - I'd like the code to re-use the access tree data structure from
> SRA plus at least the worker creating the accesses from a stmt.
>
> The RTL expansion code already does a sweep over stmts in
> discover_nonconstant_array_refs which makes sure RTL expansion doesn't
> scalarize (aka assign non-stack) to variables which have accesses
> that would later eventually FAIL to expand when operating on registers.
> That's very much related to the task at hand so we should try to
> at least merge the CFG walks of both (it produces a forced_stack_vars
> bitmap).

Thanks so much for pointing this out! I will check how it works and
how to use the logic for parameters/returns.

BR,
Jeff (Jiufu Guo)

>
> Can you work together with Martin to split out the access tree
> data structure and share it?
>
> I didn't look in detail as of how you make use of the information
> yet.
>
> Thanks,
> Richard.
>
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>> 
>> ---
>>  gcc/cfgexpand.cc | 567 ++-
>>  gcc/expr.cc  |  15 +-
>>  gcc/function.cc  |  26 +-
>>  gcc/opts.cc  |   8 +-
>>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 +
>>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>>  8 files changed, 675 insertions(+), 10 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>> 
>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> index 85a93a547c0..95c29b6b6fe 100644
>> --- a/gcc/cfgexpand.cc
>> +++ b/gcc/cfgexpand.cc
>> @@ -97,6 +97,564 @@ static bool defer_stack_allocation (tree, bool);
>>  
>>  static void record_alignment_for_reg_var (unsigned int);
>>  
>> +/* For light SRA in expander about parameters and returns.  */
>> +namespace {
>> +
>> +struct access
>> +{
>> +  /* Each access on the aggregate is about OFFSET/SIZE and BASE.  */
>> +  HOST_WIDE_INT offset;
>> +  HOST_WIDE_INT size;
>> +  tree base;
>> +  bool writing;
>> +
>> +  /* The context expression of this access.  */
>> +  tree expr;
>> +
>> +  /* The rtx for the access: link to 

Re: [RFC] light expander sra for parameters and returns

2023-06-01 Thread Jiufu Guo via Gcc-patches


Hi,

Martin Jambor  writes:

> Hi,
>
> On Tue, May 30 2023, Richard Biener wrote:
>> On Mon, 29 May 2023, Jiufu Guo wrote:
>>
>>> Hi,
>>> 
>>> Previously, I was investigating some struct parameters and returns related
>>> PRs 69143/65421/108073.
>>> 
>>> Investigating the issues case by case, and drafting patches for each of
>>> them one by one. This would help us to enhance code incrementally.
>>> While, this way, patches would interact with each other and implement
>>> different codes for similar issues (because of the different paths in
>>> gimple/rtl).  We may have a common fix for those issues.
>>> 
>>> We know a few other related PRs(such as meta-bug PR101926) exist. For those
>>> PRs in different targets with different symptoms (and also different root
>>> cause), I would expect a method could help some of them, but it may
>>> be hard to handle all of them in one fix.
>>> 
>>> With investigation and check discussion for the issues, I remember a
>>> suggestion from Richard: it would be nice to perform some SRA-like analysis
>>> for the accesses on the structs (parameter/returns).
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>>> This may be a 'fairly common method' for those issues. With this idea,
>>> I drafted a patch as below in this mail.
>>> 
>>> I also thought about directly using tree-sra.cc, e.g. enhance it and rerun
>>> it at the end of GIMPLE passes. While since some issues are introduced inside
>>> the expander, so below patch also co-works with other parts of the expander.
>>> And since we already have tree-sra in gimple pass, we only need to take more
>>> care on parameter and return in this patch: other decls could be handled
>>> well in tree-sra.
>>> 
>>> The steps of this patch are:
>>> 1. Collect struct type parameters and returns, and then scan the function to
>>> get the accesses on them. And figure out the accesses which would be
>>> profitable to be scalarized (using registers of the parameter/return). Now,
>>> reading on parameter and writing on returns are checked in the current patch.
>>> 2. When/after the scalar registers are determined/expanded for the return or
>>> parameters, compute the corresponding scalar register(s) for each access of
>>> the return/parameter, and prepare the scalar RTLs for those accesses.
>>> 3. When using/expanding the accesses expression, leverage the
>>> computed/prepared scalars directly.
>>> 
>>> This patch is tested on ppc64 both LE and BE.
>>> To continue, I would ask for comments and suggestions first. And then I
>>> would update/enhance accordingly.  Thanks in advance!
>>
>> Thanks for working on this - the description above sounds exactly like
>> what should be done.
>>
>> Now - I'd like the code to re-use the access tree data structure from
>> SRA plus at least the worker creating the accesses from a stmt.
>
Thanks Martin for your reply and thanks for your time!

> I have had a first look at the patch but still need to look into it more
> to understand how it uses the information it gathers.
>
> My plan is to make the access-tree infrastructure of IPA-SRA more
> generic and hopefully usable even for this purpose, rather than the one
> in tree-sra.cc.  But that really builds a tree of accesses, bailing out
> on any partial overlaps, for example, which may not be the right thing
> here since I don't see any tree-building here.

Yeap, both in tree-sra and ipa-sra, there are concepts about
"access" and "scan functions/stmts". In this light-sra, these concepts
are also used. And you may notice that ipa-sra and tree-sra have more
logic than the current 'light-expand-sra'.

Currently, the 'light-expand-sra' takes care of just a few things: reading
from parameters, writing to returns, and disabling sra if address-taken.
As you noticed, the "access" in this patch is not kept in a tree structure;
it is just 'flat' (or say, a map & vector). And overlaps between
accesses are not checked because they are all just reads (for parm).
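
To illustrate what a 'flat' access record means here, below is a minimal C
sketch.  The struct and function names are hypothetical and do not match
the patch; the sketch assumes byte offsets and an aggregate passed in
consecutive registers of reg_bytes bytes each.

```c
#include <stdbool.h>

/* Hypothetical flat access record: no parent/child links, just the
   position of the access inside the aggregate.  */
struct flat_access
{
  long offset;  /* byte offset into the aggregate */
  long size;    /* byte size of the access */
};

/* Assume the aggregate is passed in consecutive REG_BYTES-wide registers.
   If the access lies entirely inside one register, store that register's
   index in *REGNO and return true.  */
static bool
access_in_one_reg (const struct flat_access *a, long reg_bytes, long *regno)
{
  if (a->size <= 0 || a->offset < 0 || reg_bytes <= 0)
    return false;
  long first = a->offset / reg_bytes;
  long last = (a->offset + a->size - 1) / reg_bytes;
  if (first != last)
    return false;
  *regno = first;
  return true;
}
```

A 4-byte read at offset 8 with 8-byte registers maps to register 1; a
4-byte read at offset 6 straddles two registers and is rejected by this
sketch.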

When we take care of more stuff (passing to call arguments, occurring in
memory assignments, occurring in inline asm...), this light-expander-sra
would become more and more like tree-sra and ipa-sra. And it would be good
to leverage more capabilities from tree-sra and ipa-sra. So, I agree that it
would be a great idea to share and reuse the same struct.

> But I still need to
> properly read set_scalar_rtx_for_aggregate_access function in the patch,
> which I plan to do next week.

set_scalar_rtx_for_aggregate_access is another key part of this patch.
Unlike tree-sra/ipa-sra (which create new SSA scalars for each
access), this patch invokes "set_scalar_rtx_for_aggregate_access" to
create an rtx expression for each access. For now, this part may not be
common with tree-sra and ipa-sra.

This function is invoked for each parameter that is of aggregate type
and is passed via registers.
For each access about this parameter, the function creates an rtx
according to the offset/size/mode of the 

Re: [PATCH] Optimized "(X - N * M) / N + M" to "X / N" if valid

2023-05-31 Thread Jiufu Guo via Gcc-patches
Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Wed, 17 May 2023, Jiufu Guo wrote:
>>
>>> Hi,
>>> 
>>> This patch tries to optimize "(X - N * M) / N + M" to "X / N".
>>
>> But if that's valid why not make the transform simpler and transform
>> (X - N * M) / N  to X / N - M instead?
>
> Great catch!
> If "N * M" is not constant, "X / N - M" would be better than
> "(X - N * M) / N".  If "N" and "M" are constants, "(X - N * M) / N" and
> "X / N - M" may be similar, and in that case "X / N - M" should
> also be fine.  I will try to update the patch accordingly.
>
>>
>> You use the same optimize_x_minus_NM_div_N_plus_M validator for
>> the division and shift variants but the overflow rules are different,
>> so I'm not sure that's warranted.  I'd also prefer to not split out
>> the validator to a different file - iff then the appropriate file
>> is fold-const.cc, not gimple-match-head.cc (I see we're a bit
>> inconsistent here, for pure gimple matches gimple-fold.cc would
>> be another place).
>
> Thanks for pointing this out!
> For shift, I guess your concerns are: 1. the right operand may be
> negative, or greater than or equal to the type width;  2. the shifted
> value may be a negative signed value.  These may be UB or shift the sign
> bit.  This patch assumes it is OK to do the transform.  I will do more
> checking to see whether this is really OK, and hope someone can point out
> if it is invalid: "(X - N * M) >> log2(N)" ==> "(X >> log2(N)) - M".
>
> I split out the validator only because it is shared by the division and
> shift variants :).  And it seems gimple-match-head.cc and
> generic-match-head.cc may have been introduced for match.pd, so I put it
> into gimple-match-head.cc.
>
>>
>> Since you use range information why is the transform restricted
>> to constant M?
>
> If M is a variable, the range for "X" is varying_p. I did not find
> a way to get the bounds for "X" (or for "X - N * M") to check that it
> does not wrap.  Any suggestions?

Oh, I may have misunderstood here.
You may mean that M could have a range too, and then we can check whether
"X - N * M" has a valid range or may wrap/overflow.

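A rough sketch of such a check, using plain 64-bit integers in place of
GCC's value_range and wide_int. The function name and parameters are
hypothetical; __builtin_mul_overflow and __builtin_sub_overflow are the
GCC/Clang overflow builtins. It only checks that "X - N * M" cannot wrap
for the given ranges; the separate "does not cross zero" condition for
truncating division is not covered here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true if X - N * M cannot overflow a signed 64-bit type for any
   X in [xmin, xmax] and any M in [mmin, mmax], with N a constant.  */
bool
x_minus_nm_no_overflow (long long xmin, long long xmax, long long n,
                        long long mmin, long long mmax)
{
  long long p0, p1, lo, hi;

  /* N * M is monotonic in M, so its extremes are at the range endpoints.  */
  if (__builtin_mul_overflow (n, mmin, &p0)
      || __builtin_mul_overflow (n, mmax, &p1))
    return false;
  long long pmin = p0 < p1 ? p0 : p1;
  long long pmax = p0 < p1 ? p1 : p0;

  /* The smallest result is xmin - pmax, the largest is xmax - pmin.  */
  if (__builtin_sub_overflow (xmin, pmax, &lo)
      || __builtin_sub_overflow (xmax, pmin, &hi))
    return false;
  return true;
}

void
range_check_examples (void)
{
  /* X in [100, 1000], N = 4, M in [0, 10]: always representable.  */
  assert (x_minus_nm_no_overflow (100, 1000, 4, 0, 10));
  /* X may be LLONG_MIN while N * M may be positive: the subtraction wraps.  */
  assert (!x_minus_nm_no_overflow (LLONG_MIN, 0, 4, 0, 10));
}
```
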
BR,
Jeff (Jiufu Guo)

>
>
> Again, thanks for your great help!
>
> BR,
> Jeff (Jiufu Guo)
>
>>
>> Richard.
>>
>>> As per the discussions in PR108757, we know this transformation is valid
>>> only under some conditions.
>>> For C code, "/" rounds towards zero (trunc_div), and "X - N * M"
>>> may wrap/overflow/underflow. So, the transformation is valid when
>>> "X - N * M" does not cross zero and does not wrap/overflow/underflow.
>>> 
>>> This patch also handles the case when "N" is a power of 2, where
>>> "(X - N * M) / N" is "(X - N * M) >> log2(N)".
>>> 
>>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>>> Is this ok for trunk?
>>> 
>>> BR,
>>> Jeff (Jiufu)
>>> 
>>> PR tree-optimization/108757
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * gimple-match-head.cc (optimize_x_minus_NM_div_N_plus_M): New function.
>>> * match.pd ((X - N * M) / N + M): New pattern.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> * gcc.dg/pr108757-1.c: New test.
>>> * gcc.dg/pr108757-2.c: New test.
>>> * gcc.dg/pr108757.h: New test.
>>> 
>>> ---
>>>  gcc/gimple-match-head.cc  |  54 ++
>>>  gcc/match.pd  |  22 
>>>  gcc/testsuite/gcc.dg/pr108757-1.c |  17 
>>>  gcc/testsuite/gcc.dg/pr108757-2.c |  18 
>>>  gcc/testsuite/gcc.dg/pr108757.h   | 160 ++
>>>  5 files changed, 271 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>>> 
>>> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
>>> index b08cd891a13..680a4cb2fc6 100644
>>> --- a/gcc/gimple-match-head.cc
>>> +++ b/gcc/gimple-match-head.cc
>>> @@ -224,3 +224,57 @@ optimize_successive_divisions_p (tree divisor, tree 
>>> inner_div)
>>>  }
>>>return true;
>>>  }
>>> +
>>> +/* Return true if "(X - N * M) / N + M" can be optimized into "X / N".
>>&

Re: [PATCH] Optimized "(X - N * M) / N + M" to "X / N" if valid

2023-05-31 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Wed, 17 May 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> This patch tries to optimize "(X - N * M) / N + M" to "X / N".
>
> But if that's valid why not make the transform simpler and transform
> (X - N * M) / N  to X / N - M instead?

Great catch!
If "N * M" is not constant, "X / N - M" would be better than
"(X - N * M) / N".  If "N" and "M" are constants, "(X - N * M) / N" and
"X / N - M" may be similar, and in that case "X / N - M" should
also be fine.  I will try to update the patch accordingly.
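
For reference, a small C sketch of why the rewrite from "(X - N * M) / N"
to "X / N - M" needs the validity conditions at all. The helper names lhs
and rhs are only for illustration. With truncating division the two forms
agree when "X - N * M" stays on the same side of zero as X, and can differ
when it crosses zero:

```c
#include <assert.h>

/* Left-hand side of the proposed rewrite.  */
int
lhs (int x, int n, int m)
{
  return (x - n * m) / n;
}

/* Right-hand side of the proposed rewrite.  */
int
rhs (int x, int n, int m)
{
  return x / n - m;
}

void
trunc_div_examples (void)
{
  /* X - N * M stays positive: (11 - 6) / 2 and 11 / 2 - 3 are both 2.  */
  assert (lhs (11, 2, 3) == rhs (11, 2, 3));
  /* X - N * M crosses zero: -1 / 2 truncates to 0, but 1 / 2 - 1 is -1.  */
  assert (lhs (1, 2, 1) == 0);
  assert (rhs (1, 2, 1) == -1);
}
```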

>
> You use the same optimize_x_minus_NM_div_N_plus_M validator for
> the division and shift variants but the overflow rules are different,
> so I'm not sure that's warranted.  I'd also prefer to not split out
> the validator to a different file - iff then the appropriate file
> is fold-const.cc, not gimple-match-head.cc (I see we're a bit
> inconsistent here, for pure gimple matches gimple-fold.cc would
> be another place).

Thanks for pointing this out!
For shift, I guess your concerns are: 1. the right operand may be
negative, or greater than or equal to the type width;  2. the shifted
value may be a negative signed value.  These may be UB or shift the sign
bit.  This patch assumes it is OK to do the transform.  I will do more
checking to see whether this is really OK, and hope someone can point out
if it is invalid: "(X - N * M) >> log2(N)" ==> "(X >> log2(N)) - M".
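
One difference that may be relevant here: an arithmetic right shift rounds
towards negative infinity, not towards zero. Reading the right-hand side
as (X >> log2(N)) - M, the identity then holds even when "X - N * M"
crosses zero, as long as nothing wraps. A small sketch, where floor_shift
is a hypothetical helper written with / and % so that it does not depend
on how the compiler treats >> on negative values:

```c
#include <assert.h>

/* Floor division by 2**k, which is what an arithmetic right shift by K
   computes.  */
int
floor_shift (int x, int k)
{
  int n = 1 << k;
  int q = x / n;
  if (x % n < 0)
    q--;
  return q;
}

void
shift_examples (void)
{
  /* X = 1, N = 2, M = 1 crosses zero, yet both sides are -1.  */
  assert (floor_shift (1 - 2 * 1, 1) == floor_shift (1, 1) - 1);
  /* The identity holds across a small grid with N = 4 (k = 2).  */
  for (int x = -20; x <= 20; x++)
    for (int m = -3; m <= 3; m++)
      assert (floor_shift (x - 4 * m, 2) == floor_shift (x, 2) - m);
}
```

So the "does not cross zero" condition belongs to the truncating division
form only; the no-wrap condition applies to both.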

I split out the validator only because it is shared by the division and
shift variants :).  And it seems gimple-match-head.cc and
generic-match-head.cc may have been introduced for match.pd, so I put it
into gimple-match-head.cc.

>
> Since you use range information why is the transform restricted
> to constant M?

If M is a variable, the range for "X" is varying_p. I did not find
a way to get the bounds for "X" (or for "X - N * M") to check that it
does not wrap.  Any suggestions?


Again, thanks for your great help!

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> As per the discussions in PR108757, we know this transformation is valid
>> only under some conditions.
>> For C code, "/" rounds towards zero (trunc_div), and "X - N * M"
>> may wrap/overflow/underflow. So, the transformation is valid when
>> "X - N * M" does not cross zero and does not wrap/overflow/underflow.
>> 
>> This patch also handles the case when "N" is a power of 2, where
>> "(X - N * M) / N" is "(X - N * M) >> log2(N)".
>> 
>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>>  PR tree-optimization/108757
>> 
>> gcc/ChangeLog:
>> 
>>  * gimple-match-head.cc (optimize_x_minus_NM_div_N_plus_M): New function.
>>  * match.pd ((X - N * M) / N + M): New pattern.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/pr108757-1.c: New test.
>>  * gcc.dg/pr108757-2.c: New test.
>>  * gcc.dg/pr108757.h: New test.
>> 
>> ---
>>  gcc/gimple-match-head.cc  |  54 ++
>>  gcc/match.pd  |  22 
>>  gcc/testsuite/gcc.dg/pr108757-1.c |  17 
>>  gcc/testsuite/gcc.dg/pr108757-2.c |  18 
>>  gcc/testsuite/gcc.dg/pr108757.h   | 160 ++
>>  5 files changed, 271 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>> 
>> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
>> index b08cd891a13..680a4cb2fc6 100644
>> --- a/gcc/gimple-match-head.cc
>> +++ b/gcc/gimple-match-head.cc
>> @@ -224,3 +224,57 @@ optimize_successive_divisions_p (tree divisor, tree 
>> inner_div)
>>  }
>>return true;
>>  }
>> +
>> +/* Return true if "(X - N * M) / N + M" can be optimized into "X / N".
>> +   Otherwise return false.
>> +
>> +   For unsigned,
>> +   If sign bit of M is 0 (clz is 0), valid range is [N*M, MAX].
>> +   If sign bit of M is 1, valid range is [0, MAX - N*(-M)].
>> +
>> +   For signed,
>> +   If N*M > 0, valid range: [MIN+N*M, 0] + [N*M, MAX]
>> +   If N*M < 0, valid range: [MIN, -(-N*M)] + [0, MAX - (-N*M)].  */
>> +
>> +static bool
>> +optimize_x_minus_NM_div_N_plus_M (tree x, wide_int n, wide_int m, tree type)
>> +{
>> +  wide_int max = wi::max_value (type);
>> +  signop sgn = TYPE_SIGN (type);
>> +  wide_int nm;
>> +  wi::overflow_type ovf;
>> +  if (TYPE_UNSIGNED (type) && wi::clz (m) == 0)
>> +nm = wi::mul (n, -m, sgn, &ovf);
>> +  else
>> +nm = wi::mul (n, m, sgn, &ovf);
>> +
>> +  if (ovf != wi::OVF_NONE)
>> +return false;
>> +
>> +  value_range vr0;
>> +  if (!get_range_query (cfun)->range_of_expr (vr0, x) || vr0.varying_p ()
>> +  || vr0.undefined_p ())
>> +return false;
>> +
>> +  wide_int wmin0 = vr0.lower_bound ();
>> +  wide_int wmax0 = vr0.upper_bound ();
>> +  wide_int min = wi::min_value (type);
>> +
>> +  /* unsigned */
>> +  if ((TYPE_UNSIGNED (type)))
>> +/* M > 0 (clz != 0): [N*M, MAX],  M < 0 : [0, MAX-N*(-M)]  */
>> +return wi::clz (m) != 0 ? wi::ge_p (wmin0, nm, sgn)
>> +: wi::le_p 

ping^^: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-05-30 Thread Jiufu Guo via Gcc-patches


Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Gentle ping...
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> I'm thinking that we may enable this patch for stage1, so ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> There is a functionality called const_anchor in cse.cc.  It supports
>>> generating new constants by adding a small gap/offset to an
>>> existing constant.  For example:
>>>
>>> void __attribute__ ((noinline)) foo (long long *a)
>>> {
>>>   *a++ = 0x2351847027482577LL;
>>>   *a++ = 0x2351847027482578LL;
>>> }
>>> The second constant (0x2351847027482578LL) can be computed by adding '1'
>>> to the first constant (0x2351847027482577LL).
>>> This is profitable if more than one instruction is needed to build the
>>> second constant.
>>>
>>> * For rs6000, we can enable this functionality, as the instruction
>>> 'addi' does exactly this when the gap is smaller than 0x8000.
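
A minimal sketch of the "gap is smaller than 0x8000" condition, assuming
only that addi takes a signed 16-bit immediate. The function name
reachable_by_addi is hypothetical, and this is not the anchor computation
that cse.cc performs; it just checks whether one constant can be derived
from another with a single addi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if TARGET can be formed as BASE plus a signed 16-bit
   immediate, the displacement range of a single addi.  */
bool
reachable_by_addi (int64_t base, int64_t target)
{
  int64_t diff;
  if (__builtin_sub_overflow (target, base, &diff))
    return false;
  return diff >= -0x8000 && diff < 0x8000;
}

void
addi_examples (void)
{
  /* The two constants from the example above differ by 1.  */
  assert (reachable_by_addi (0x2351847027482577LL, 0x2351847027482578LL));
  /* A gap of 0x10000 does not fit in the immediate.  */
  assert (!reachable_by_addi (0x2351847027482577LL, 0x2351847027492577LL));
}
```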
>>>
>>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
>>> one issue:
>>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement of the function
>>> "try_const_anchors".
>>>
>>> * One potential side effect of this patch:
>>> Compared with
>>> "r101=0x2351847027482577LL
>>> ...
>>> r201=0x2351847027482578LL"
>>> the new r201 will be "r201=r101+1", so r101 will live longer,
>>> which would increase pressure when allocating registers.
>>> But I feel this would be acceptable for this const_anchor feature.
>>>
>>> * With this patch, I checked the performance change on SPEC2017;
>>> the change is not significant, since this functionality is not
>>> hit on any hot path. There is runtime fluctuation/noise (e.g. on
>>> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
>>>
>>> With this patch, I also checked the changes in object files (from
>>> GCC bootstrap and SPEC). The significant changes are the improvement
>>> of "addi" vs. "2 or more insns: lis+or.."; it also exposes some
>>> other optimization opportunities, like combine/jump2. Code
>>> to store/load one more register also occurs in a few cases,
>>> but it does not impact overall performance.
>>>
>>> * To refine this patch, some earlier discussions were referenced:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
>>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>>>
>>>
>>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
>>> Is this ok for trunk?
>>>
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>>> * cse.cc (cse_insn): Add guard condition.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/const_anchors.c: New test.
>>> * gcc.target/powerpc/try_const_anchors_ice.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc   |  4 
>>>  gcc/cse.cc|  3 ++-
>>>  .../gcc.target/powerpc/const_anchors.c| 20 +++
>>>  .../powerpc/try_const_anchors_ice.c   | 16 +++
>>>  4 files changed, 42 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index d2743f7bce6..80cded6dec1 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
>>> rs6000_attribute_table[] =
>>>  
>>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>>> +
>>> +#undef TARGET_CONST_ANCHOR
>>> +#define TARGET_CONST_ANCHOR 0x8000
>>> +
>>>  
>>>  
>>>  /* Processor table.  */
>>> diff --git a/g

Ping^^^ [PATCH 0/4] rs6000: build constant via li/lis;rldicX

2023-05-30 Thread Jiufu Guo via Gcc-patches


Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> I would like to ping these patches.
> [0/4]
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
> [1/4]
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611287.html
> [2/4]
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611288.html
> [3/4]
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611289.html
> [4/4]
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611290.html
>
> Any suggestions on the code functionality/style, or on how to make
> it easier to review, are welcome. Thanks in advance!
>
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> Gentle ping for these patches:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> For a given constant, it is profitable if we can build it with 2 insns.
>>> This patch enables building more constants with 2 insns: one is "li or
>>> lis", the other is "rldicl, rldicr or rldic".
>>> By checking and analyzing the characteristics of the insns "li/lis;rldicX",
>>> all the possible constant values are considered by this patch.
>>>
>>> Previously, a patch was posted, but it was too large.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
>>> As suggested, I split it into this series.
>>>
>>> Considering the functionality and size, the work is split into 4 patches:
>>> 1. Support the constants which can be built by "li;rotldi".
>>>    Both positive and negative values from insn "li" are analyzed.
>>> 2. Support the constants which can be built by "lis;rotldi".
>>>    We only need to analyze the negative value from "lis".
>>>    This patch uses more code to check the leading 1s and trailing 0s
>>>    from "lis".
>>> 3. Support the constants which can be built by "li/lis;rldicl/rldicr".
>>>    Leveraging the APIs defined/analyzed in patches 1 and 2,
>>>    this patch checks the characteristics of the mask of "rldicl/rldicr"
>>>    to support more constants.
>>> 4. Support the constants which can be built by "li/lis;rldic".
>>>    The mask of "rldic" is relatively complicated; it is analyzed in this
>>>    patch to support more constants.
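
A brute-force sketch of the first case only ("li;rotldi"), assuming that
li loads a sign-extended 16-bit immediate and rotldi rotates a 64-bit
register left. The names rotr64, fits_li and buildable_by_li_rotldi are
hypothetical. The series itself analyzes the bit patterns directly; the
exhaustive loop over all 64 rotations here is a swapped-in simplification
that only illustrates which constants qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate X right by R bits, 0 <= R < 64.  */
static uint64_t
rotr64 (uint64_t x, unsigned r)
{
  return r == 0 ? x : (x >> r) | (x << (64 - r));
}

/* Return true if V is a sign-extended 16-bit value, i.e. bits 15..63 are
   all zero or all one.  This is the range a single li can load.  */
static bool
fits_li (uint64_t v)
{
  uint64_t high = v >> 15;
  return high == 0 || high == (UINT64_MAX >> 15);
}

/* C is buildable if undoing some left rotation leaves a value that li
   can load.  */
bool
buildable_by_li_rotldi (uint64_t c)
{
  for (unsigned r = 0; r < 64; r++)
    if (fits_li (rotr64 (c, r)))
      return true;
  return false;
}

void
li_rotldi_examples (void)
{
  /* 0x1234 rotated left by 8.  */
  assert (buildable_by_li_rotldi (0x123400ULL));
  /* -2 rotated left by 20: all ones except bit 20.  */
  assert (buildable_by_li_rotldi (0xFFFFFFFFFFEFFFFFULL));
  /* No rotation of this value has 49 identical high bits.  */
  assert (!buildable_by_li_rotldi (0x2351847027482577ULL));
}
```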
>>>
>>> BR,
>>> Jeff (Jiufu)


ping^^^: [PATCH V2] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-05-30 Thread Jiufu Guo via Gcc-patches


Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Gentle ping...
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi
>>
>> I would like to ping this patch for stage1:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612168.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> Compared with the previous version:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609654.html
>>> This patch no longer uses UNSPEC for insn mtvsrws.  To handle
>>> the subreg better on BE and LE, the predicate "lowpart_subreg_operator"
>>> is introduced. To help the combine pass match the pattern on the high
>>> 32 bits of DI, shiftrt is still used.
>>>
>>> As mentioned in PR108338, on p9, we could use mtvsrws to implement
>>> the conversion from SI#0 to SF (or lowpart DI to SF).
>>>
>>> For example:
>>>   *(long long*)buff = di;
>>>   float f = *(float*)(buff);
>>> We generate "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" instead of
>>> "mtvsrws 1,3 ; xscvspdpn 1,1".
>>>
>>> This patch updates this, and also enhances the bitcast from highpart
>>> DI to SF.
>>>
>>> Bootstrap and regtests pass on ppc64{,le}.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> PR target/108338
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/predicates.md (lowpart_subreg_operator): New
>>> define_predicate.
>>> * config/rs6000/rs6000.md (any_rshift): New code_iterator.
>>> (movsf_from_si2): Rename to...
>>> (movsf_from_si2_): ... this.
>>> (si2sf_mtvsrws): New define_insn.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/pr108338.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/predicates.md |  5 +++
>>>  gcc/config/rs6000/rs6000.md | 35 -
>>>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 42 +
>>>  3 files changed, 73 insertions(+), 9 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>>>
>>> diff --git a/gcc/config/rs6000/predicates.md 
>>> b/gcc/config/rs6000/predicates.md
>>> index 52c65534e51..e57c9d99c6b 100644
>>> --- a/gcc/config/rs6000/predicates.md
>>> +++ b/gcc/config/rs6000/predicates.md
>>> @@ -2064,3 +2064,8 @@ (define_predicate "macho_pic_address"
>>>else
>>>  return false;
>>>  })
>>> +
>>> +(define_predicate "lowpart_subreg_operator"
>>> +  (and (match_code "subreg")
>>> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG 
>>> (op)))
>>> +   == SUBREG_BYTE (op)")))
>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>> index 4a7812fa592..5b4a7f8d801 100644
>>> --- a/gcc/config/rs6000/rs6000.md
>>> +++ b/gcc/config/rs6000/rs6000.md
>>> @@ -7539,6 +7539,14 @@ (define_split
>>>  UNSPEC_MOVSI_GOT))]
>>>"")
>>>  
>>> +(define_insn "si2sf_mtvsrws"
>>> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>>> +   (subreg:SF (match_operand:SI 1 "gpc_reg_operand" "r") 0))]
>>> +  "TARGET_P9_VECTOR && TARGET_XSCVSPDPN"
>>> +  "mtvsrws %x0,%1\n\txscvspdpn %x0,%x0"
>>> +  [(set_attr "type" "mfvsr")
>>> +   (set_attr "length" "8")])
>>> +
>>>  ;;MR  LA
>>>  ;;LWZ LFIWZX  LXSIWZX
>>>  ;;STW STFIWX  STXSIWX
>>> @@ -8203,10 +8211,18 @@ (define_insn_and_split "movsf_from_si"
>>>rtx op2 = operands[2];
>>>rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>>>  
>>> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
>>> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>>> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>>> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>>> +  if (TARGET_P9_VECTOR)
>>> +{
>>> +  emit_insn (gen_si2sf_mtvsrws (op0, gen_lowpart (SImode, op1_di)));
>>> +}
>>> +  else
>>> +{
>>> +  /* Move SF value to

[RFC] light expander sra for parameters and returns

2023-05-28 Thread Jiufu Guo via Gcc-patches
Hi,

Previously, I was investigating some PRs related to struct parameters and
returns: 69143/65421/108073.

Investigating the issues case by case and drafting a patch for each of
them would help us enhance the code incrementally.
However, this way the patches would interact with each other and implement
different code for similar issues (because of the different paths in
gimple/rtl).  We may be able to have a common fix for those issues.

We know a few other related PRs (such as meta-bug PR101926) exist. Those
PRs are on different targets with different symptoms (and also different
root causes), so I would expect one method to help some of them, but it may
be hard to handle all of them in one fix.

While investigating and checking the discussions of these issues, I recalled
a suggestion from Richard: it would be nice to perform some SRA-like analysis
for the accesses on the structs (parameters/returns).
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
This may be a 'fairly common method' for those issues. With this idea,
I drafted the patch below in this mail.

I also thought about directly using tree-sra.cc, e.g. enhancing it and
rerunning it at the end of the GIMPLE passes. However, since some issues are
introduced inside the expander, the patch below also cooperates with other
parts of the expander. And since we already have tree-sra as a gimple pass,
this patch only needs to take care of parameters and returns: other decls
could be handled well in tree-sra.

The steps of this patch are:
1. Collect struct type parameters and returns, and then scan the function to
get the accesses on them, and figure out the accesses which would be profitable
to scalarize (using the registers of the parameter/return). For now, reads of
parameters and writes to returns are checked in the current patch.
2. When/after the scalar registers are determined/expanded for the return or
parameters, compute the corresponding scalar register(s) for each access of
the return/parameter, and prepare the scalar RTLs for those accesses.
3. When using/expanding the access expressions, leverage the computed/prepared
scalars directly.
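
As an illustration of the kind of source these steps target, here is a
small C sketch with a by-value struct parameter that is only read and a
struct return that is only written. The type and function names are
hypothetical and are not taken from the testcases in this patch; whether
the aggregate is actually passed or returned in registers depends on the
target ABI.

```c
#include <assert.h>

/* Hypothetical aggregate that some 64-bit ABIs pass and return in
   registers.  */
typedef struct
{
  double a;
  double b;
  double c;
  double d;
} quad;

/* Reads a single field of a by-value parameter.  */
double
read_third (quad q)
{
  return q.c;
}

/* Writes the fields of a returned aggregate one by one.  */
quad
make_quad (double base)
{
  quad q;
  q.a = base;
  q.b = base + 1.0;
  q.c = base + 2.0;
  q.d = base + 3.0;
  return q;
}

void
param_return_examples (void)
{
  quad q = make_quad (1.0);
  assert (read_third (q) == 3.0);
}
```

Ideally read_third would use the incoming register that holds q.c
directly, without first storing the whole parameter to the stack.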

This patch is tested on ppc64, both LE and BE.
To continue, I would like to ask for comments and suggestions first, and then
I will update/enhance accordingly.  Thanks in advance!


BR,
Jeff (Jiufu)


---
 gcc/cfgexpand.cc | 567 ++-
 gcc/expr.cc  |  15 +-
 gcc/function.cc  |  26 +-
 gcc/opts.cc  |   8 +-
 gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 +
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
 8 files changed, 675 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 85a93a547c0..95c29b6b6fe 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -97,6 +97,564 @@ static bool defer_stack_allocation (tree, bool);
 
 static void record_alignment_for_reg_var (unsigned int);
 
+/* For light SRA in the expander, for parameters and returns.  */
+namespace {
+
+struct access
+{
+  /* Each access on the aggregate is described by OFFSET/SIZE and BASE.  */
+  HOST_WIDE_INT offset;
+  HOST_WIDE_INT size;
+  tree base;
+  bool writing;
+
+  /* The context expression of this access.  */
+  tree expr;
+
+  /* The rtx for the access: link to incoming/returning register(s).  */
+  rtx rtx_val;
+};
+
+typedef struct access *access_p;
+
+/* Expr (tree) -> Access (access_p) map.  */
+static hash_map<tree, access_p> *expr_access_vec;
+
+/* Base (tree) -> Vector (vec<access_p> *) map.  */
+static hash_map<tree, auto_vec<access_p> > *base_access_vec;
+
+/* Return a vector of pointers to accesses for the variable given in BASE or
+ NULL if there is none.  */
+
+static vec<access_p> *
+get_base_access_vector (tree base)
+{
+  return base_access_vec->get (base);
+}
+
+/* Remove DECL from candidates for SRA.  */
+static void
+disqualify_candidate (tree decl)
+{
+  decl = get_base_address (decl);
+  base_access_vec->remove (decl);
+}
+
+/* Create and insert access for EXPR. Return created access, or NULL if it is
+   not possible.  */
+static struct access *
+create_access (tree expr, bool write)
+{
+  poly_int64 poffset, psize, pmax_size;
+  bool reverse;
+
+  tree base
+= get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size, &reverse);
+
+  if (!DECL_P (base))
+return NULL;
+
+  vec<access_p> *access_vec = get_base_access_vector (base);
+  if (!access_vec)
+return NULL;
+
+  /* TODO: support reverse. */
+  if (reverse)
+{
+  disqualify_candidate (expr);
+  return NULL;
+}
+
+  HOST_WIDE_INT offset, size, max_size;
+  if (!poffset.is_constant (&offset) || !psize.is_constant (&size)
+  || !pmax_size.is_constant (&max_size))
+return NULL;
+

Re: ping: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-05-17 Thread Jiufu Guo via Gcc-patches


Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> I'm thinking that we may enable this patch for stage1, so ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> There is a functionality called const_anchor in cse.cc.  It supports
>> generating new constants by adding a small gap/offset to an
>> existing constant.  For example:
>>
>> void __attribute__ ((noinline)) foo (long long *a)
>> {
>>   *a++ = 0x2351847027482577LL;
>>   *a++ = 0x2351847027482578LL;
>> }
>> The second constant (0x2351847027482578LL) can be computed by adding '1'
>> to the first constant (0x2351847027482577LL).
>> This is profitable if more than one instruction is needed to build the
>> second constant.
>>
>> * For rs6000, we can enable this functionality, as the instruction
>> 'addi' does exactly this when the gap is smaller than 0x8000.
>>
>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
>> one issue:
>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement of the function
>> "try_const_anchors".
>>
>> * One potential side effect of this patch:
>> Compared with
>> "r101=0x2351847027482577LL
>> ...
>> r201=0x2351847027482578LL"
>> the new r201 will be "r201=r101+1", so r101 will live longer,
>> which would increase pressure when allocating registers.
>> But I feel this would be acceptable for this const_anchor feature.
>>
>> * With this patch, I checked the performance change on SPEC2017;
>> the change is not significant, since this functionality is not
>> hit on any hot path. There is runtime fluctuation/noise (e.g. on
>> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
>>
>> With this patch, I also checked the changes in object files (from
>> GCC bootstrap and SPEC). The significant changes are the improvement
>> of "addi" vs. "2 or more insns: lis+or.."; it also exposes some
>> other optimization opportunities, like combine/jump2. Code
>> to store/load one more register also occurs in a few cases,
>> but it does not impact overall performance.
>>
>> * To refine this patch, some earlier discussions were referenced:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>>
>>
>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
>> Is this ok for trunk?
>>
>>
>> BR,
>> Jeff (Jiufu)
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>>  * cse.cc (cse_insn): Add guard condition.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/const_anchors.c: New test.
>>  * gcc.target/powerpc/try_const_anchors_ice.c: New test.
>>
>> ---
>>  gcc/config/rs6000/rs6000.cc   |  4 
>>  gcc/cse.cc|  3 ++-
>>  .../gcc.target/powerpc/const_anchors.c| 20 +++
>>  .../powerpc/try_const_anchors_ice.c   | 16 +++
>>  4 files changed, 42 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index d2743f7bce6..80cded6dec1 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>  
>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>> +
>> +#undef TARGET_CONST_ANCHOR
>> +#define TARGET_CONST_ANCHOR 0x8000
>> +
>>  
>>  
>>  /* Processor table.  */
>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>> index b13afd4ba72..56542b91c1e 100644
>> --- a/gcc/cse.cc
>> +++ b/gcc/cse.cc
>> @@ -5005,7 +5005,8 @@ cse_insn (rtx_insn *insn)
>>if (targetm.const_anchor
>>&& !src_related
>>&& src_const
>> -  && GET_CODE (src_const) == CONST_INT)
>> +  && GET_CODE (src_const) == CONST_INT
>> +  && SCALAR_INT_MODE_P (mode

ping^^^ [PATCH] rs6000: mark tieable between INT and FLOAT

2023-05-17 Thread Jiufu Guo via Gcc-patches


Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> I would ping this patch for stage1:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> Gently Ping:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> While discussing/reviewing patches on the mailing list, we found that
>>> more modes are tieable, e.g. DI<->DF.  After some discussion, I drafted
>>> this patch to mark more modes as tieable.
>>>
>>> Bootstrap and regtest pass on ppc64{,le}.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Mark more tieable
>>> modes.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.target/powerpc/pr102024.C: Updated.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc | 9 +
>>>  gcc/testsuite/g++.target/powerpc/pr102024.C | 3 ++-
>>>  2 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6ac3adcec6b..3cb0186089e 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1968,6 +1968,15 @@ rs6000_modes_tieable_p (machine_mode mode1, 
>>> machine_mode mode2)
>>>if (ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
>>>  return false;
>>>  
>>> +  /* The SFmode format in a register (IEEE DP) would not be as required,
>>> + so SFmode is restricted here.  */
>>> +  if (GET_MODE_CLASS (mode1) == MODE_FLOAT
>>> +  && GET_MODE_CLASS (mode2) == MODE_INT)
>>> +return GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD && mode1 != SFmode;
>>> +  if (GET_MODE_CLASS (mode1) == MODE_INT
>>> +  && GET_MODE_CLASS (mode2) == MODE_FLOAT)
>>> +return GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD && mode2 != SFmode;
>>> +
>>>if (SCALAR_FLOAT_MODE_P (mode1))
>>>  return SCALAR_FLOAT_MODE_P (mode2);
>>>if (SCALAR_FLOAT_MODE_P (mode2))
>>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C 
>>> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> index 769585052b5..27d2dc5e80b 100644
>>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> @@ -5,7 +5,8 @@
>>>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>>>  // generates a psabi warning and passes arguments in GPRs.
>>>  
>>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 { target has_arch_pwr8 
>>> } } }
>>> +// { dg-final { scan-assembler-times {\mstd\M} 4 { target { ! 
>>> has_arch_pwr8 } } } }
>>>  
>>>  struct a_thing
>>>  {


ping^^: [PATCH V2] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-05-17 Thread Jiufu Guo via Gcc-patches
Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi
>
> I would like to ping this patch for stage1:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612168.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> Compare with previous version:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609654.html
>> This patch does not use UNSPEC for insn mtvsrws anymore.  And to handle
>> the subreg better on BE and LE, predicate "lowpart_subreg_operator"
>> is introduced.  To help the combine pass match the pattern on the high
>> 32 bits of DI, shiftrt is still used.
>>
>> As mentioned in PR108338, on p9, we could use mtvsrws to implement
>> the conversion from SI#0 to SF (or lowpart DI to SF).
>>
>> For example:
>>   *(long long*)buff = di;
>>   float f = *(float*)(buff);
>> We generate "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" instead of
>> "mtvsrws 1,3 ; xscvspdpn 1,1".
>>
>> This patch updates this, and also enhances the bitcast from highpart
>> DI to SF.
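
As a rough illustration of the two bitcasts discussed above (lowpart DI to
SF and highpart DI to SF), here is a portable C sketch.  It is not part of
the patch; the helper names are made up for the example, and it assumes
that float is a 32-bit IEEE single.  The pointer cast in the quoted example
picks the low or the high half depending on endianness, while the helpers
below name the half explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a float (assumes a 32-bit IEEE single).  */
static float bits_to_float(uint32_t bits)
{
  float f;
  assert(sizeof f == sizeof bits);
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Bitcast the low 32 bits of a 64-bit value to float.  */
static float lowpart_di_to_sf(uint64_t di)
{
  return bits_to_float((uint32_t)di);
}

/* Bitcast the high 32 bits of a 64-bit value to float.  */
static float highpart_di_to_sf(uint64_t di)
{
  return bits_to_float((uint32_t)(di >> 32));
}
```

With di = 0x3f80000040000000, the high half is the bit pattern of 1.0f and
the low half is the bit pattern of 2.0f.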
>>
>> Bootstrap and regtests pass on ppc64{,le}.
>> Is this ok for trunk?
>>
>> BR,
>> Jeff (Jiufu)
>>
>>  PR target/108338
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/predicates.md (lowpart_subreg_operator): New
>>  define_predicate.
>>  * config/rs6000/rs6000.md (any_rshift): New code_iterator.
>>  (movsf_from_si2): Rename to...
>>  (movsf_from_si2_): ... this.
>>  (si2sf_mtvsrws): New define_insn.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/pr108338.c: New test.
>>
>> ---
>>  gcc/config/rs6000/predicates.md |  5 +++
>>  gcc/config/rs6000/rs6000.md | 35 -
>>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 42 +
>>  3 files changed, 73 insertions(+), 9 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>>
>> diff --git a/gcc/config/rs6000/predicates.md 
>> b/gcc/config/rs6000/predicates.md
>> index 52c65534e51..e57c9d99c6b 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -2064,3 +2064,8 @@ (define_predicate "macho_pic_address"
>>else
>>  return false;
>>  })
>> +
>> +(define_predicate "lowpart_subreg_operator"
>> +  (and (match_code "subreg")
>> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
>> +== SUBREG_BYTE (op)")))
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index 4a7812fa592..5b4a7f8d801 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7539,6 +7539,14 @@ (define_split
>>   UNSPEC_MOVSI_GOT))]
>>"")
>>  
>> +(define_insn "si2sf_mtvsrws"
>> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>> +   (subreg:SF (match_operand:SI 1 "gpc_reg_operand" "r") 0))]
>> +  "TARGET_P9_VECTOR && TARGET_XSCVSPDPN"
>> +  "mtvsrws %x0,%1\n\txscvspdpn %x0,%x0"
>> +  [(set_attr "type" "mfvsr")
>> +   (set_attr "length" "8")])
>> +
>>  ;; MR  LA
>>  ;; LWZ LFIWZX  LXSIWZX
>>  ;; STW STFIWX  STXSIWX
>> @@ -8203,10 +8211,18 @@ (define_insn_and_split "movsf_from_si"
>>rtx op2 = operands[2];
>>rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>>  
>> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
>> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>> +  if (TARGET_P9_VECTOR)
>> +{
>> +  emit_insn (gen_si2sf_mtvsrws (op0, gen_lowpart (SImode, op1_di)));
>> +}
>> +  else
>> +{
>> +  /* Move SF value to upper 32-bits for xscvspdpn.  */
>> +  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> +  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>> +}
>> +
>>DONE;
>>  }
>>[(set_attr "length"
>> @@ -8219,18 +8235,19 @@ (define_insn_and_split "movsf_from_si"
>>  "*,  *, p9v,   p8v,   *, *,
>>   p8v,   

[PATCH] Optimized "(X - N * M) / N + M" to "X / N" if valid

2023-05-17 Thread Jiufu Guo via Gcc-patches
Hi,

This patch tries to optimize "(X - N * M) / N + M" to "X / N".
As per the discussions in PR108757, we know this transformation is valid
only under some conditions.
For C code, "/" rounds towards zero (trunc_div), and "X - N * M"
may wrap/overflow/underflow.  So the transformation is valid only if
"X - N * M" does not cross zero and does not wrap/overflow/underflow.
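
A small C sketch of why the "does not cross zero" condition matters (this
is only an illustration, not code from the patch; the function names are
invented here).  With N = 4 and M = 3, the two forms agree for X = 13 and
X = -3, but differ for X = 5, where X is positive and X - N * M is
negative.

```c
#include <assert.h>

/* The original expression.  The caller must keep x - n * m in range.  */
static int before_opt(int x, int n, int m)
{
  assert(n != 0);
  return (x - n * m) / n + m;
}

/* The simplified expression.  */
static int after_opt(int x, int n)
{
  assert(n != 0);
  return x / n;
}

/* Return 1 if both forms give the same value for these inputs.  */
static int same_result(int x, int n, int m)
{
  return before_opt(x, n, m) == after_opt(x, n);
}
```

For X = 5: (5 - 12) / 4 + 3 is -1 + 3 = 2 because the division truncates
towards zero, while 5 / 4 is 1.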

This patch also handles the case when "N" is a power of 2, where
"(X - N * M) / N" is "(X - N * M) >> log2(N)".
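
The shift form can be sketched the same way for an unsigned type (again
just an illustration with invented names, not the patch itself).  Here the
subtraction wraps when X is smaller than M << n, and the two forms then
disagree.

```c
#include <assert.h>
#include <stdint.h>

/* ((x - (m << n)) >> n) + m, computed in 32-bit unsigned arithmetic.  */
static uint32_t shift_before(uint32_t x, unsigned n, uint32_t m)
{
  assert(n < 32);
  return ((x - (m << n)) >> n) + m;
}

/* The simplified form x >> n.  */
static uint32_t shift_after(uint32_t x, unsigned n)
{
  assert(n < 32);
  return x >> n;
}
```

With n = 2 and m = 3, X = 100 gives 25 in both forms, while X = 5 wraps in
the subtraction and gives 0x40000001 instead of 1.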

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

PR tree-optimization/108757

gcc/ChangeLog:

* gimple-match-head.cc (optimize_x_minus_NM_div_N_plus_M): New function.
* match.pd ((X - N * M) / N + M): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/gimple-match-head.cc  |  54 ++
 gcc/match.pd  |  22 
 gcc/testsuite/gcc.dg/pr108757-1.c |  17 
 gcc/testsuite/gcc.dg/pr108757-2.c |  18 
 gcc/testsuite/gcc.dg/pr108757.h   | 160 ++
 5 files changed, 271 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index b08cd891a13..680a4cb2fc6 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -224,3 +224,57 @@ optimize_successive_divisions_p (tree divisor, tree 
inner_div)
 }
   return true;
 }
+
+/* Return true if "(X - N * M) / N + M" can be optimized into "X / N".
+   Otherwise return false.
+
+   For unsigned,
+   If sign bit of M is 0 (clz is not 0), valid range is [N*M, MAX].
+   If sign bit of M is 1, valid range is [0, MAX - N*(-M)].
+
+   For signed,
+   If N*M > 0, valid range: [MIN+N*M, 0] + [N*M, MAX]
+   If N*M < 0, valid range: [MIN, -(-N*M)] + [0, MAX - (-N*M)].  */
+
+static bool
+optimize_x_minus_NM_div_N_plus_M (tree x, wide_int n, wide_int m, tree type)
+{
+  wide_int max = wi::max_value (type);
+  signop sgn = TYPE_SIGN (type);
+  wide_int nm;
+  wi::overflow_type ovf;
+  if (TYPE_UNSIGNED (type) && wi::clz (m) == 0)
+nm = wi::mul (n, -m, sgn, &ovf);
+  else
+nm = wi::mul (n, m, sgn, &ovf);
+
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  value_range vr0;
+  if (!get_range_query (cfun)->range_of_expr (vr0, x) || vr0.varying_p ()
+  || vr0.undefined_p ())
+return false;
+
+  wide_int wmin0 = vr0.lower_bound ();
+  wide_int wmax0 = vr0.upper_bound ();
+  wide_int min = wi::min_value (type);
+
+  /* unsigned */
+  if ((TYPE_UNSIGNED (type)))
+/* M > 0 (clz != 0): [N*M, MAX],  M < 0 : [0, MAX-N*(-M)]  */
+return wi::clz (m) != 0 ? wi::ge_p (wmin0, nm, sgn)
+   : wi::le_p (wmax0, max - nm, sgn);
+
+  /* signed, N*M > 0 */
+  else if (wi::gt_p (nm, 0, sgn))
+/* [N*M, MAX] or [MIN+N*M, 0] */
+return wi::ge_p (wmin0, nm, sgn)
+  || (wi::ge_p (wmin0, min + nm, sgn) && wi::le_p (wmax0, 0, sgn));
+
+  /* signed, N*M < 0 */
+  /* [MIN, N*M] or [0, MAX + N*M]*/
+  else
+return wi::le_p (wmax0, nm, sgn)
+  || (wi::ge_p (wmin0, 0, sgn) && wi::le_p (wmax0, max - (-nm), sgn));
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index ceae1c34abc..1aaa5530577 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -881,6 +881,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+#if GIMPLE
+/* Simplify ((t + -N*M) / N + M) -> t / N.  */
+(for div (trunc_div exact_div)
+ (simplify
+  (plus (div (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+  (with {wide_int n = wi::to_wide (@2); wide_int m = wi::to_wide (@3);}
+(if (INTEGRAL_TYPE_P (type)
+&& n * m == -wi::to_wide (@1)
+&& optimize_x_minus_NM_div_N_plus_M (@0, n, m, type))
+(div @0 @2)))))
+
+/* Simplify ((t + -(M<<N)) >> N + M) -> t >> N.  */
+(simplify
+ (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+ (with {wide_int n = wi::to_wide (@2); wide_int m = wi::to_wide (@3);}
+   (if (INTEGRAL_TYPE_P (type)
+   && (m << n) == -wi::to_wide (@1)
+   && optimize_x_minus_NM_div_N_plus_M (@0,
+wi::one (TYPE_PRECISION (type)) << n, m, type))
+(rshift @0 @2))))
+#endif
+
 (for op (negate abs)
  /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
  (for coss (COS COSH)
diff --git a/gcc/testsuite/gcc.dg/pr108757-1.c 
b/gcc/testsuite/gcc.dg/pr108757-1.c
new file mode 100644
index 000..349318a7c82
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr108757-1.c
@@ -0,0 +1,17 @@
+/* PR tree-optimization/108757 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <limits.h>
+#define N 4
+#define M 3
+#define GAP 0
+typedef unsigned int UINT;
+typedef int INT;
+#define UMAX UINT_MAX
+#define IMAX INT_MAX
+#define IMIN INT_MIN
+#include "pr108757.h"
+
+/* { dg-final { scan-tree-dump-not " 

Ping^^: [PATCH V2] extract DF/SF/SI/HI/QI subreg from parameter word on stack

2023-05-10 Thread Jiufu Guo via Gcc-patches


Hi,

I would like to ping:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609396.html

We know there are a few issues related to aggregate parameters and
returns.  I'm wondering whether it is OK for trunk to use this patch to
resolve part of those issues.


BR,
Jeff (Jiufu)


Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Gentle ping:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609396.html
>
> Thanks for comments and suggestions!
>
> I'm thinking that we may use these patches to fix some of the issues
> on parm and returns.
>
> Sorry for the late ping for this patch to ask if this is acceptable.
>
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> This patch fixes an issue with parameter access when the parameter
>> has a struct type, is passed through integer registers, and one of
>> its floating-point members is accessed, as in the code below:
>>
>> typedef struct DF {double a[4]; long l; } DF;
>> double foo_df (DF arg){return arg.a[3];}
>>
>> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>> generated.  While instruction "mtvsrd 1, 6" would be enough for
>> this case.
>>
>> This patch updates the behavior when loading floating members of a
>> parameter: if that floating member is stored via an integer register,
>> then load it in integer mode first, and convert it to floating
>> mode.
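
A C-level sketch of that idea, using the DF struct from the example above
(illustration only; the regs array and the helper names are invented here
to stand in for the integer registers, and double is assumed to be 64
bits).  The member is taken from the integer word that already holds it and
reinterpreted as a double, with no extra store and reload of the struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct DF { double a[4]; long l; } DF;

/* Model the four GPRs that carry arg.a[0..3] (hypothetical stand-in).  */
static void pass_in_gprs(const DF *arg, uint64_t regs[4])
{
  assert(sizeof arg->a == 4 * sizeof(uint64_t));
  memcpy(regs, arg->a, sizeof arg->a);
}

/* Reinterpret a 64-bit integer word as a double.  */
static double word_to_double(uint64_t word)
{
  double d;
  assert(sizeof d == sizeof word);
  memcpy(&d, &word, sizeof d);
  return d;
}

/* Counterpart of foo_df: read arg.a[3] directly from its word.  */
static double foo_df_from_regs(const uint64_t regs[4])
{
  return word_to_double(regs[3]);
}
```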
>>
>> Compared with the previous patch:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608872.html
>> The previous version supports conversion from DImode to DF/SF; this
>> version also supports conversion from DImode to SI/HI/QI modes.
>>
>> I also tried to enhance CSE/DSE for this issue.  But because of their
>> limitations (e.g. CSE does not like new pseudos, DSE is not good
>> across blocks), some cases (such as this patch) cannot be handled.
>>
>> Bootstrap and regtest passes on ppc64{,le}.
>> Is this ok for trunk?  Thanks for comments!
>>
>>
>> BR,
>> Jeff (Jiufu)
>>
>>
>>  PR target/108073
>>
>> gcc/ChangeLog:
>>
>>  * expr.cc (extract_subreg_from_loading_word): New function.
>>  (expand_expr_real_1): Call extract_subreg_from_loading_word.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.target/powerpc/pr102024.C: Updated.
>>  * gcc.target/powerpc/pr108073.c: New test.
>>
>> ---
>>  gcc/expr.cc | 76 +
>>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 30 
>>  3 files changed, 107 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>>
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index d9407432ea5..6de4a985c8b 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -10631,6 +10631,69 @@ stmt_is_replaceable_p (gimple *stmt)
>>return false;
>>  }
>>  
>> +/* Return the content of the memory slot SOURCE as MODE.
>> +   SOURCE is based on BASE. BASE is a memory block that is stored via words.
>> +
>> +   To get the content from SOURCE:
>> +   first load the word from the memory which covers the SOURCE slot first;
>> +   next return the word's subreg which offsets to SOURCE slot;
>> +   then convert to MODE as necessary.  */
>> +
>> +static rtx
>> +extract_subreg_from_loading_word (machine_mode mode, rtx source, rtx base)
>> +{
>> +  rtx src_base = XEXP (source, 0);
>> +  poly_uint64 offset = MEM_OFFSET (source);
>> +
>> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
>> +{
>> +  offset += INTVAL (XEXP (src_base, 1));
>> +  src_base = XEXP (src_base, 0);
>> +}
>> +
>> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
>> +return NULL_RTX;
>> +
>> +  /* Subreg(DI,n) -> DF/SF/SI/HI/QI */
>> +  poly_uint64 word_size = GET_MODE_SIZE (word_mode);
>> +  poly_uint64 mode_size = GET_MODE_SIZE (mode);
>> +  poly_uint64 byte_off;
>> +  unsigned int start;
>> +  machine_mode int_mode;
>> +  if (known_ge (word_size, mode_size) && multiple_p (word_size, mode_size)
>>   && int_mode_for_mode (mode).exists (&int_mode)
>>   && can_div_trunc_p (offset, word_size, &start, &byte_off)
>> +  && multiple_p (byte_off, mode_size))
>> +{
>> +  rtx word_mem = copy_rtx (source);
>> +  PUT_MODE (word_mem, word_mode);
>> +  word_mem = adjust_address (word_mem, word_mode, -byte_off);
>> +
&

Re: [PATCH V5] Use reg mode to move sub blocks for parameters and returns

2023-05-09 Thread Jiufu Guo via Gcc-patches


Hi,

Jeff Law  writes:

> On 5/3/23 23:49, guojiufu wrote:
>> Hi,
>>
>> On 2023-05-01 03:00, Jeff Law wrote:
>>> On 3/16/23 21:39, Jiufu Guo wrote:
 Hi,

 When assigning a parameter to a variable, or assigning a variable to
 a return value of struct type, the parameter/return may be passed
 through registers.
 For this kind of case, it would be better to use the natural mode of
 the registers to move the content for the assignment.

 As the example code (like code in PR65421):

 typedef struct SA {double a[3];} A;
 A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
 A ret_arg (A a) {return a;} // just empty fun body
 void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)

 Compared with the previous version:
 https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html
 This version refines the code to eliminate redundant code in the
 subroutine "move_sub_blocks".

 Bootstrap and regtest pass on ppc64{,le}.
 Is this ok for trunk?

>> ...
>>
 diff --git a/gcc/expr.cc b/gcc/expr.cc
 index 15be1c8db99..97a7be9542e 100644
 --- a/gcc/expr.cc
 +++ b/gcc/expr.cc
 @@ -5559,6 +5559,41 @@ mem_ref_refers_to_non_mem_p (tree ref)
     return non_mem_decl_p (base);
   }
   +/* Sub routine of expand_assignment, invoked when assigning from a
 +   parameter or assigning to a return val on struct type which may
 +   be passed through registers.  The mode of register is used to
 +   move the content for the assignment.
 +
 +   This routine generates code for expression FROM which is BLKmode,
 +   and moves the generated content to TO_RTX by sub-blocks in SUB_MODE.  */
 +
 +static void
 +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode)
 +{
 +  gcc_assert (MEM_P (to_rtx));
 +
 +  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
>>> Consider the case of a BLKmode return value.  Isn't TO_RTX in this
>>> case a BLKmode object?
>>
>> Thanks for this question!
>>
>> Yes, the mode of TO_RTX is BLKmode.
>> As we know, when the function returns via registers, the mode of
>> the `return-rtx` could also be BLKmode.  This patch is going to
>> improve these kinds of cases.
>>
>> For example:
>> ```
>> typedef struct FLOATS
>> {
>>    double a[3];
>> } FLOATS;
>> FLOATS ret_arg_pt (FLOATS *a){return *a;}
>> ```
>>
>> D.3952 = *a_2(D); //this patch enhance this assignment
>> return D.3952;
>>
>> The mode is BLKmode for the rtx of `D.3952`, and also for the rtx of
>> "DECL_RESULT(current_function_decl)".  And the DECL_RESULT
>> represents the return registers.
> I didn't think MEM_SIZE worked for BLKmode.  But looking at its
> definition, it's pulling the size out of the attributes rather than
> from the mode.  So I guess there's a reasonable chance it's going to
> work :-)

Thanks for pointing this out!  Yes, a BLKmode rtx may not always be a MEM.
MEM_SIZE is only OK for a MEM once its known size has been computed.
Here MEM_SIZE is fine just because it is a stack rtx corresponding
to the type of the parameter or return, for which the size has been computed.

I updated the patch to resolve the conflicts with the trunk, reran
bootstrap, and prepared a new version of the patch.

This version passes bootstrap and regtest on ppc64{,le} and x86_64.

The major change is that 'move_sub_blocks' only handles the case when
the block can be moved entirely in the same submode, that is, when
(size % sub_size) is 0.  If there is no objection, I will commit the new
version.
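
To make the (size % sub_size) condition concrete, here is a plain C sketch
of the same idea on ordinary memory.  It is not the GCC routine: the
function name and its return convention are made up for this example, and
memcpy stands in for the register-mode moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy SIZE bytes in pieces of SUB_SIZE bytes.  Return 1 on success,
   or 0 (copying nothing) when SIZE is not a multiple of SUB_SIZE.  */
static int copy_by_sub_blocks(void *to, const void *from,
                              size_t size, size_t sub_size)
{
  assert(sub_size != 0);
  if (size % sub_size != 0)
    return 0;

  for (size_t off = 0; off < size; off += sub_size)
    memcpy((char *)to + off, (const char *)from + off, sub_size);
  return 1;
}
```

A struct of three doubles (24 bytes) can be moved as three 8-byte pieces,
while a 20-byte block is rejected for an 8-byte piece size.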

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
for returns.
* expr.cc (move_sub_blocks): New function.
(expand_assignment): Update assignment code about returns/parameters.
* function.cc (assign_parm_setup_block): Update to mark
DECL_REGS_TO_STACK_P for parameter.
* tree-core.h (struct tree_decl_common): Add comment.
* tree.h (DECL_USEDBY_RETURN_P): New define.
(DECL_REGS_TO_STACK_P): New define.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421-2.c: New test.
---
 gcc/cfgexpand.cc | 14 +
 gcc/expr.cc  | 62 
 gcc/function.cc  |  3 +
 gcc/tree-core.h  |  4 +-
 gcc/tree.h   |  9 +++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |  6 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 33 +++
 7 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 1a1b26b1c6c..7b6a2216492 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap 

ping: [PATCH V2] rs6000: Add new patterns rlwinm with mask

2023-04-25 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to ping this for comments.
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611821.html

In this patch, "subreg:DI (x:SI)" is used.  I'm thinking that this
may be a concern, though it may be acceptable for the current code.

BR,
Jeff (Jiufu)

Jiufu Guo  writes:

> Hi,
>
> Compared with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611513.html
> This patch removes the unneeded lowpart_subreg_operand.
>
> For code:
> ```
> u64
> test_rlwinm_lowpart_mask (u32 v)
> {
>   u32 v1 = ((v << N) | (v >> (32 - N))) & 0xf00;
>   return (u64)v1;
> }
> ```
> We generate "rlwinm 3,3,4,4,23; rldicl 3,3,0,32" instead of
> "rlwinm 3,3,4,4,23".
> Here the "rlwinm" already clears the high 32 bits, so "rldicl" is redundant.
>
> Similarly, the code below implements the functionality of "rlwinm".
> ```
> u64
> test_rlwinm_mask (u32 v)
> {
>   u32 v1 = ((v << N) | (v >> (32 - N)));
>   u64 v2 = (u64) v1 | ((u64) v1 << 32);
>   return v2 & 0xe003ULL;
> }
> ```
> We generate
> "rotlwi 3,3,4; sldi 9,3,32; add 3,9,3; rldicl 3,3,35,27; rldicl 3,3,29,0"
> instead of "rlwinm 3,3,4,30,2".
>
> This patch optimizes these two kinds of code to use just one "rlwinm" insn.
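
For readers less familiar with the insn, here is a small C model of
"rlwinm RA,RS,SH,MB,ME" on a 64-bit target.  It is only a sketch based on
my reading of the Power ISA description (rotate the low word, replicate it
into both halves, then AND with MASK(MB+32, ME+32)); the function names are
invented for this example and are not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32).  */
static uint32_t rotl32(uint32_t v, unsigned n)
{
  assert(n < 32);
  return n ? (v << n) | (v >> (32 - n)) : v;
}

/* 32-bit mask with ones in bits mb..me, IBM numbering (bit 0 is the
   most significant bit).  The mask wraps around when mb > me.  */
static uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
  assert(mb < 32 && me < 32);
  uint32_t from_mb = 0xffffffffu >> mb;         /* bits mb..31 */
  uint32_t up_to_me = 0xffffffffu << (31 - me); /* bits 0..me */
  return mb <= me ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Model of the 64-bit result of rlwinm (assumed semantics, see above).  */
static uint64_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
  uint64_t r = rotl32(rs, sh);
  r |= r << 32;                       /* rotated word in both halves */
  uint64_t m = rlwinm_mask(mb, me);
  if (mb > me)
    m |= 0xffffffff00000000ULL;       /* wrapped mask covers the high half */
  return r & m;
}
```

In this model, a non-wrapping mask such as 4,23 leaves the high 32 bits
zero, which is why a following "rldicl 3,3,0,32" adds nothing.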
>
> Bootstrap and regtests pass on ppc64{,le}.
> Is this patch ok for trunk (or next stage1)?
>
>
> BR,
> Jeff (Jiufu)
>
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.md (rlwinm_lowpart_mask): New define_insn.
>   (rlwinm_mask_): New define_insn.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/rlwinm-0.c: Reduce instruction number.
>   * gcc.target/powerpc/rlwinm_3.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000.md | 34 +++
>  gcc/testsuite/gcc.target/powerpc/rlwinm-0.c |  6 +--
>  gcc/testsuite/gcc.target/powerpc/rlwinm_3.c | 47 +
>  3 files changed, 84 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm_3.c
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 4a7812fa592..a7bf80da32e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4325,6 +4325,40 @@ (define_insn "*rotldi3_insert_7"
>[(set_attr "type" "insert")
> (set_attr "size" "64")])
>  
> +(define_insn "rlwinm_lowpart_mask"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> + (and:DI
> +   (subreg:DI
> +   (match_operator:SI 4 "rotate_mask_operator"
> + [(match_operand:SI 1 "gpc_reg_operand" "r")
> +  (match_operand:SI 2 "const_int_operand" "n")]) 0)
> +   (match_operand:DI 3 "const_int_operand" "n")))]
> +  "TARGET_POWERPC64 && (UINTVAL (operands[3]) >> 32) == 0
> +   && rs6000_is_valid_shift_mask (operands[3], operands[4], SImode)"
> +{
> +  return rs6000_insn_for_shift_mask (SImode, operands, false);
> +}
> +  [(set_attr "type" "shift")])
> +
> +(define_insn "rlwinm_mask_"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> + (and:DI
> +   (plus_ior_xor:DI
> +  (ashift:DI
> +(subreg:DI
> +  (match_operator:SI 4 "rotate_mask_operator"
> +[(match_operand:SI 1 "gpc_reg_operand" "r")
> + (match_operand:SI 2 "const_int_operand" "n")]) 0)
> +(const_int 32))
> +  (zero_extend:DI (match_dup 4)))
> +   (match_operand:DI 3 "const_int_operand" "n")))]
> +  "TARGET_POWERPC64
> +   && (UINTVAL (operands[3]) & 0x8001ULL) == 
> 0x8001ULL
> +   && rs6000_is_valid_mask (operands[3], NULL, NULL, SImode)"
> +{
> +  return rs6000_insn_for_shift_mask (SImode, operands, false);
> +}
> +  [(set_attr "type" "shift")])
>  
>  ; This handles the important case of multiple-precision shifts.  There is
>  ; no canonicalization rule for ASHIFT vs. LSHIFTRT, so two patterns.
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c 
> b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> index 4f4fca2d8ef..50ff01e1925 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> @@ -2,12 +2,12 @@
>  /* { dg-options "-O2" } */
>  
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } 
> } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 8164 { target lp64 } } 
> } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 1538 { target lp64 } } 
> } */
>  
>  /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } 
> } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 

ping: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-04-25 Thread Jiufu Guo via Gcc-patches
Hi,

I'm thinking that we may enable this patch for stage1, so ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html

BR,
Jeff (Jiufu)

Jiufu Guo  writes:

> Hi,
>
> There is a functionality called const_anchor in cse.cc.  const_anchor
> supports generating new constants by adding a small gap/offset to an
> existing constant.  For example:
>
> void __attribute__ ((noinline)) foo (long long *a)
> {
>   *a++ = 0x2351847027482577LL;
>   *a++ = 0x2351847027482578LL;
> }
> The second constant (0x2351847027482578LL) can be computed by adding '1'
> to the first constant (0x2351847027482577LL).
> This is profitable if more than one instruction is needed to build the
> second constant.
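
A simplified C sketch of the "small gap" test follows.  It is not the
cse.cc logic; the function name is invented for this example, and the
range is an assumption: a signed 16-bit immediate as taken by 'addi', i.e.
offsets from -0x8000 to 0x7fff.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 and store the offset if TARGET == ANCHOR + offset for some
   offset that fits a signed 16-bit immediate; otherwise return 0.  */
static int reachable_by_addi(uint64_t anchor, uint64_t target, int *offset)
{
  assert(offset != 0);
  uint64_t diff = target - anchor;    /* modular 64-bit difference */

  if (diff <= 0x7fffULL)
    {
      *offset = (int)diff;
      return 1;
    }
  if (diff >= 0xffffffffffff8000ULL)
    {
      *offset = -(int)(0 - diff);     /* magnitude is at most 0x8000 */
      return 1;
    }
  return 0;
}
```

For the two constants in the example, the offset is 1, so the second one
can be formed from the first with a single add.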
>
> * For rs6000, we can enable this functionality, as the instruction
> 'addi' is just for this when the gap is smaller than 0x8000.
>
> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
> one issue.  The issue is:
> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement of function
> "try_const_anchors".
>
> * One potential side effect of this patch:
> Compared with
> "r101=0x2351847027482577LL
> ...
> r201=0x2351847027482578LL"
> the new r201 will be "r201=r101+1", and then r101 will live longer,
> which would increase pressure when allocating registers.
> But I feel this would be acceptable for this const_anchor feature.
>
> * With this patch, I checked the performance change on SPEC2017;
> the performance change is not significant, since this functionality is
> not hit on any hot path.  There are runtime fluctuations/noise (e.g. on
> povray_r/xalancbmk_r/xz_r) that are not caused by the patch.
>
> With this patch, I also checked the changes in object files (from
> GCC bootstrap and SPEC); the significant changes are improvements:
> "addi" vs. "2 or more insns: lis+or.."; it also exposes some
> other optimization opportunities, like combine/jump2.  Code
> to store/load one more register also occurs in a few cases,
> but it does not impact overall performance.
>
> * To refine this patch, some earlier discussions are referenced:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>
>
> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
> Is this ok for trunk?
>
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>   * cse.cc (cse_insn): Add guard condition.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/const_anchors.c: New test.
>   * gcc.target/powerpc/try_const_anchors_ice.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000.cc   |  4 
>  gcc/cse.cc|  3 ++-
>  .../gcc.target/powerpc/const_anchors.c| 20 +++
>  .../powerpc/try_const_anchors_ice.c   | 16 +++
>  4 files changed, 42 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d2743f7bce6..80cded6dec1 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  
>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
> +
> +#undef TARGET_CONST_ANCHOR
> +#define TARGET_CONST_ANCHOR 0x8000
> +
>  
>  
>  /* Processor table.  */
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index b13afd4ba72..56542b91c1e 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -5005,7 +5005,8 @@ cse_insn (rtx_insn *insn)
>if (targetm.const_anchor
> && !src_related
> && src_const
> -   && GET_CODE (src_const) == CONST_INT)
> +   && GET_CODE (src_const) == CONST_INT
> +   && SCALAR_INT_MODE_P (mode))
>   {
> src_related = try_const_anchors (src_const, mode);
> src_related_is_const_anchor = src_related != NULL_RTX;
> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> new file mode 100644
> index 000..39958ff9765
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target has_arch_ppc64 } } */
> +/* { dg-options "-O2" } */
> +
> +#define C1 0x2351847027482577ULL
> +#define C2 0x2351847027482578ULL
> +
> +void __attribute__ ((noinline)) foo (long long *a)
> +{
> +  *a++ = C1;
> +  *a++ = C2;
> +}
> +
> +void __attribute__ ((noinline)) foo1 (long long *a, long long b)
> +{
> +  *a++ = C1;
> +  if (b)
> +*a++ = C2;
> +}
> +
> +/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c 
> 

Ping^^ [PATCH 0/4] rs6000: build constant via li/lis;rldicX

2023-04-25 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to ping these patches.
[0/4]
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
[1/4]
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611287.html
[2/4]
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611288.html
[3/4]
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611289.html
[4/4]
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611290.html

If you have any suggestions on the code functionality/style, or on
making it easier to review, please point them out.  Thanks in advance!


BR,
Jeff (Jiufu)

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Gentle ping for these patches:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
>
> BR,
> Jeff (Jiufu)
>
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> For a given constant, it would be profitable if we can use 2 insns to
>> build it.  This patch enables building more constants through 2 insns:
>> one is "li or lis", another is 'rldicl, rldicr or rldic'.
>> By checking and analyzing the characteristics of the insns "li/lis;rldicX",
>> all the possible constant values are considered by this patch.
>>
>> Previously, a patch was posted, but it was too large.
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
>> As suggested, I split it into this series.
>>
>> Considering the functionality and size, 4 patches are split as below:
>> 1. Support the constants which can be built by "li;rotldi"
>>Both positive and negative values from insn "li" are analyzed.
>> 2. Support the constants which can be built by "lis;rotldi"
>>We only need to analyze the negative value from "lis".
>>And this patch uses more code to check leading 1s and trailing 0s
>>from "lis".
>> 3. Support the constants which can be built by "li/lis;rldicl/rldicr":
>>Leverage the APIs defined/analyzed in patches 1 and 2,
>>this patch checks the characteristics of the mask of "rldicl/rldicr"
>>to support more constants.
>> 4. Support the constants which can be built by "li/lis;rldic":
>>The mask of "rldic" is relatively complicated, it is analyzed in this
>>patch to support more constants.
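
As an illustration of case 1 ("li;rotldi"), here is a brute-force C check.
It is not the code from the series, which analyzes the bit patterns
directly; the function names are invented for this example.  It assumes
that "li" loads a sign-extended 16-bit immediate, so a constant qualifies
if some rotation of it has its upper 49 bits all zero or all one.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by r bits (0 <= r < 64).  */
static uint64_t rotl64(uint64_t v, unsigned r)
{
  assert(r < 64);
  return r ? (v << r) | (v >> (64 - r)) : v;
}

/* Return 1 if C can be built as "li VAL; rotldi ROT", storing the li
   value (as a 64-bit pattern) and the rotate count; otherwise 0.  */
static int can_build_by_li_rotldi(uint64_t c, uint64_t *val, unsigned *rot)
{
  assert(val != 0 && rot != 0);
  for (unsigned r = 0; r < 64; r++)
    {
      /* Rotating left by (64 - r) % 64 undoes a rotate-left by r.  */
      uint64_t v = rotl64(c, (64 - r) % 64);
      if (v <= 0x7fffULL || v >= 0xffffffffffff8000ULL)
        {
          *val = v;
          *rot = r;
          return 1;
        }
    }
  return 0;
}
```

For example, 0x1234000000000000 and 0xffff0fffffffffff qualify, while a
constant with no long run of equal bits, such as 0x2351847027482577, does
not.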
>>
>> BR,
>> Jeff (Jiufu)

