Re: [RFC] RISC-V: Eliminate extension after for *w instructions

2023-06-03 Thread Jeff Law via Gcc-patches




On 5/24/23 17:14, Jivan Hakobyan via Gcc-patches wrote:

> This patch tries to prevent generating unnecessary sign extension
> after *w instructions like "addiw" or "divw".
>
> The main idea of it is to add SUBREG_PROMOTED fields during expanding.
>
> I have tested on SPEC2017; there is no regression.
> Only the gcc.dg/pr30957-1.c test failed.
> To solve that I made some changes in loop-iv.cc, but I am not sure that
> they are suitable.

So this generally looks good and I did some playing around with it over 
the weekend.  It's generally a win, though it can result in performance 
regressions under the "right" circumstances.


The case I was looking at was omnetpp in spec.  I suspect you didn't see 
it because we have a parameter which increases the threshold for when a 
string comparison against a constant string should be inlined.  I 
suspect you aren't using that param.  I inherited that param usage and 
haven't tried to eliminate it yet.


If you compile a test like this:

#include <string.h>
int
foo (char *x)
{
  return strcmp (x, "lowerLayout");
}

With -O2 --param builtin-string-cmp-inline-length=100

You'll see the regression.  This is a reasonable representation of the 
code in omnetpp.


It's actually a fairly interesting little problem.  We end up with 
overlapping lifetimes for the 32 and 64bit results.  The register 
allocator doesn't really try hard to detect the case where it can ignore 
a conflict because the two objects hold the same value.  I remember 
kicking this around with Vlad at least 10 years ago and we concluded 
(based on experience and data at the time) that this case wasn't that 
important to handle.  Anyway...


Another approach to this problem would be to twiddle the strcmp expander 
to work in word_mode, then convert the word mode result to the final 
mode at the end of the sequence.  I need to ponder the semantics of this 
a bit more, but if the semantics are right, it seems like it might be a 
viable solution.
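
To make the word_mode idea a little more concrete, here is a rough C model of an inlined comparison against a constant string. It is only a sketch, not GCC code: the function name is made up, and "long" merely stands in for word_mode. The running difference stays in the wide type and is narrowed to int exactly once, at the end of the sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an inlined strcmp against a constant string.
   The byte difference is kept in a word-sized integer ("long" stands in
   for word_mode) and is converted to int only at the very end.  */
static int
strcmp_const_wordmode (const char *x, const char *cst, size_t cst_len)
{
  long diff = 0;

  assert (x != NULL && cst != NULL);

  /* Compare up to and including the terminating NUL of the constant.
     A shorter X stops the loop at its own NUL, so X is never read past
     its terminator.  */
  for (size_t i = 0; i <= cst_len; i++)
    {
      diff = (long) (unsigned char) x[i] - (long) (unsigned char) cst[i];
      if (diff != 0)
        break;
    }

  /* Single conversion from the word-sized result to the final mode.  */
  return (int) diff;
}
```

Whether the real expander can be restructured this way depends on the semantics questions mentioned above; the model only shows where the single narrowing step would sit.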



I briefly looked at the big improvement in leela (33b instructions, 
roughly 1.7% of the dynamic count removed).  The hope was that if I 
looked at the cases where we improved that they would all be shifts, 
rotates and the like and we could consider a more limited version of 
your patch.  But it was quite clear that the improvement in leela was due 
to better handling of 32bit additions.  So that idea's a non-starter.  I 
didn't look at the big gains in xz (they're smaller in absolute count 
terms, but larger in percentage of instructions removed).
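
For readers less familiar with why 32bit additions matter here, the following is a small C model of the RV64 "addw" behaviour. It is a sketch with made-up helper names, not compiler code: the result of addw is already the sign-extended 32bit sum, so a separate sign extension of that result changes nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits without relying on
   implementation-defined narrowing conversions.  */
static int64_t
sext32 (uint32_t v)
{
  return v >= UINT32_C (0x80000000)
         ? (int64_t) v - INT64_C (0x100000000)
         : (int64_t) v;
}

/* Model of "addw": add the low 32 bits, wrap modulo 2^32, then
   sign-extend the 32-bit result into the 64-bit register.  */
static int64_t
model_addw (int64_t rs1, int64_t rs2)
{
  return sext32 ((uint32_t) ((uint32_t) rs1 + (uint32_t) rs2));
}

/* Model of "sext.w": sign-extend the low 32 bits of a register.  */
static int64_t
model_sext_w (int64_t rs)
{
  return sext32 ((uint32_t) rs);
}

/* The property the patch exploits: extending an addw result is a no-op.  */
static void
check_addw_is_extended (int64_t a, int64_t b)
{
  assert (model_sext_w (model_addw (a, b)) == model_addw (a, b));
}
```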



So I'm going to play a bit more with the expander for comparisons 
against constant strings.  If done correctly it might actually be an 
improvement on other targets as well.


Jeff










gcc/ChangeLog:
 * config/riscv/bitmanip.md (rotrdi3): New pattern.
 (rotrsi3): Likewise.
 (rotlsi3): Likewise.
 * config/riscv/riscv-protos.h (riscv_emit_binary): New function
 declaration
 * config/riscv/riscv.cc (riscv_emit_binary): Removed static
 * config/riscv/riscv.md (addsi3): New pattern
 (subsi3): Likewise.
 (negsi2): Likewise.
 (mulsi3): Likewise.
 (si3): New pattern for any_div.
 (si3): New pattern for any_shift.
 * loop-iv.cc (get_biv_step_1):  Process src of extension when it
PLUS

gcc/testsuite/ChangeLog:
 * testsuite/gcc.target/riscv/shift-and-2.c: New test
 * testsuite/gcc.target/riscv/shift-shift-2.c: New test
 * testsuite/gcc.target/riscv/sign-extend.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test




Re: [PATCH] xtensa: Optimize boolean evaluation or branching when EQ/NE to INT_MIN

2023-06-03 Thread Jeff Law via Gcc-patches




On 6/3/23 17:03, Andrew Pinski via Gcc-patches wrote:

> On Sat, Jun 3, 2023 at 3:53 PM Takayuki 'January June' Suwa via
> Gcc-patches wrote:
>>
>> This patch optimizes both the boolean evaluation of and the branching of
>> EQ/NE against INT_MIN (-2147483648), by taking advantage of the
>> specification that the ABS machine instruction on Xtensa returns INT_MIN
>> iff its input is INT_MIN, and a non-negative value otherwise.
>
> I wonder if this should be a generic expand improvement here.
> You would definitely need to expand both ways and see if one costs
> more than the other.

There are probably some targets where this would be beneficial,
especially those with branch-on-bit capabilities.


Jeff


RE: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

2023-06-03 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Sunday, June 4, 2023 9:01 AM
To: Li, Pan2 
Cc: 钟居哲; gcc-patches; kito.cheng; Wang, Yanzhang
Subject: Re: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

LGTM


Re: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

2023-06-03 Thread Kito Cheng via Gcc-patches
LGTM

Li, Pan2 via Gcc-patches wrote on Sunday, June 4, 2023 at 08:36:

> Great! Thanks Juzhe, and let’s wait for Kito’s approval.
>
> Pan

RE: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

2023-06-03 Thread Li, Pan2 via Gcc-patches
Great! Thanks Juzhe, and let’s wait for Kito’s approval.

Pan

From: 钟居哲 
Sent: Sunday, June 4, 2023 7:36 AM
To: Li, Pan2; gcc-patches
Cc: kito.cheng; Li, Pan2; Wang, Yanzhang
Subject: Re: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

LGTM. I hope the FP16 vector support can be committed soon, since I would
like to wait for it and then start supporting FP16, FP32 and FP64 autovec
together.

Thanks.

juzhe.zh...@rivai.ai


Re: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

2023-06-03 Thread 钟居哲
LGTM. I hope the FP16 vector support can be committed soon, since I would
like to wait for it and then start supporting FP16, FP32 and FP64 autovec
together.

Thanks.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-03 22:37
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill
From: Pan Li 
 
This patch allows the mov and spill operations for the RVV
vfloat16*_t types. The machine modes involved are VNx1HF, VNx2HF,
VNx4HF, VNx8HF, VNx16HF, VNx32HF and VNx64HF.
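
As a reading aid, the conditions that the V iterator hunk further down attaches to each of these HF modes can be restated as a small stand-alone C predicate. This is only a sketch: the function and parameter names are invented, with elen_fp16 and min_vlen standing in for TARGET_VECTOR_ELEN_FP_16 and TARGET_MIN_VLEN, and it covers the V iterator only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the V iterator conditions for VNx<nunits>HF.
   elen_fp16 stands in for TARGET_VECTOR_ELEN_FP_16 and min_vlen for
   TARGET_MIN_VLEN; neither name exists in GCC in this form.  */
static bool
hf_mode_enabled (int nunits, bool elen_fp16, int min_vlen)
{
  assert (min_vlen > 0);

  if (!elen_fp16)
    return false;

  switch (nunits)
    {
    case 1:
      return min_vlen < 128;    /* VNx1HF */
    case 2:
    case 4:
    case 8:
    case 16:
      return true;              /* VNx2HF .. VNx16HF */
    case 32:
      return min_vlen > 32;     /* VNx32HF */
    case 64:
      return min_vlen >= 128;   /* VNx64HF */
    default:
      return false;             /* not one of the modes in this patch */
    }
}
```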
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add the float16 type to DEF_RVV_F_OPS.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/riscv.md: Add vfloat16*_t to attr mode.
* config/riscv/vector-iterators.md: Add vfloat16*_t machine mode
to V, V_WHOLE, V_FRACT, VINDEX, VM, VEL and sew.
* config/riscv/vector.md: Add vfloat16*_t machine mode to sew,
vlmul and ratio.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/mov-14.c: New test.
* gcc.target/riscv/rvv/base/spill-13.c: New test.
---
.../riscv/riscv-vector-builtins-types.def |   7 ++
gcc/config/riscv/riscv.md |   1 +
gcc/config/riscv/vector-iterators.md  |  25 
gcc/config/riscv/vector.md|  35 ++
.../gcc.target/riscv/rvv/base/mov-14.c|  81 +
.../gcc.target/riscv/rvv/base/spill-13.c  | 108 ++
6 files changed, 257 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mov-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-13.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index f7f650f7e95..65716b8c637 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -385,6 +385,13 @@ DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_F_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_F_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m8_t, RVV_REQUIRE_ELEN_FP_16)
+
DEF_RVV_F_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_F_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_F_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index f545874edc1..be960583101 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -175,6 +175,7 @@ (define_attr "mode" 
"unknown,none,QI,HI,SI,DI,TI,HF,SF,DF,TF,
   VNx1HI,VNx2HI,VNx4HI,VNx8HI,VNx16HI,VNx32HI,VNx64HI,
   VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,
   VNx1DI,VNx2DI,VNx4DI,VNx8DI,VNx16DI,
+  VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF,
   VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,
   VNx1DF,VNx2DF,VNx4DF,VNx8DF,VNx16DF,
   VNx2x64QI,VNx2x32QI,VNx3x32QI,VNx4x32QI,
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 937ec3c7f67..5fbaef89566 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -90,6 +90,15 @@ (define_mode_iterator V [
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -427,6 +436,15 @@ (define_mode_iterator V_WHOLE [
   (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF 

Re: [PATCH] xtensa: Optimize boolean evaluation or branching when EQ/NE to INT_MIN

2023-06-03 Thread Andrew Pinski via Gcc-patches
On Sat, Jun 3, 2023 at 3:53 PM Takayuki 'January June' Suwa via
Gcc-patches  wrote:
>
> This patch optimizes both the boolean evaluation of and the branching of
> EQ/NE against INT_MIN (-2147483648), by taking advantage of the
> specification that the ABS machine instruction on Xtensa returns INT_MIN
> iff its input is INT_MIN, and a non-negative value otherwise.

I wonder if this should be a generic expand improvement here.
You would definitely need to expand both ways and see if one costs
more than the other.

Thanks,
Andrew Pinski


[PATCH] xtensa: Optimize boolean evaluation or branching when EQ/NE to INT_MIN

2023-06-03 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch optimizes both the boolean evaluation of and the branching of
EQ/NE against INT_MIN (-2147483648), by taking advantage of the
specification that the ABS machine instruction on Xtensa returns INT_MIN
iff its input is INT_MIN, and a non-negative value otherwise.

/* example */
int test0(int x) {
  return (x == -2147483648);
}
int test1(int x) {
  return (x != -2147483648);
}
extern void foo(void);
void test2(int x) {
  if(x == -2147483648)
    foo();
}
void test3(int x) {
  if(x != -2147483648)
    foo();
}

;; before
test0:
        movi.n  a9, -1
        slli    a9, a9, 31
        add.n   a2, a2, a9
        nsau    a2, a2
        srli    a2, a2, 5
        ret.n
test1:
        movi.n  a9, -1
        slli    a9, a9, 31
        add.n   a9, a2, a9
        movi.n  a2, 1
        moveqz  a2, a9, a9
        ret.n
test2:
        movi.n  a9, -1
        slli    a9, a9, 31
        bne     a2, a9, .L3
        j.l     foo, a9
.L3:
        ret.n
test3:
        movi.n  a9, -1
        slli    a9, a9, 31
        beq     a2, a9, .L5
        j.l     foo, a9
.L5:
        ret.n

;; after
test0:
        abs     a2, a2
        extui   a2, a2, 31, 1
        ret.n
test1:
        abs     a2, a2
        srai    a2, a2, 31
        addi.n  a2, a2, 1
        ret.n
test2:
        abs     a2, a2
        bbci    a2, 31, .L3
        j.l     foo, a9
.L3:
        ret.n
test3:
        abs     a2, a2
        bbsi    a2, 31, .L5
        j.l     foo, a9
.L5:
        ret.n
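
The identity behind the "after" sequences can be checked in portable C. The sketch below is a model with invented helper names, not code from the patch; it mimics ABS in unsigned arithmetic (so that INT_MIN maps to itself without signed overflow) and then mirrors the extui and srai/addi.n sequences of test0 and test1.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ABS on a 32-bit register: negative inputs are negated in
   two's complement, so INT_MIN comes back unchanged with bit 31 set.
   Every other input yields a value with bit 31 clear.  */
static uint32_t
model_abs (int32_t x)
{
  uint32_t u = (uint32_t) x;
  return (u & UINT32_C (0x80000000)) ? (uint32_t) (0u - u) : u;
}

/* x == INT_MIN, as "abs; extui a2, a2, 31, 1" (extract bit 31).  */
static int
eq_int_min (int32_t x)
{
  return (int) (model_abs (x) >> 31);
}

/* x != INT_MIN, as "abs; srai a2, a2, 31; addi.n a2, a2, 1".
   The arithmetic shift gives -1 when bit 31 is set and 0 otherwise.  */
static int
ne_int_min (int32_t x)
{
  int sra = (model_abs (x) >> 31) ? -1 : 0;
  return sra + 1;
}

/* Both sequences must agree with the plain comparisons.  */
static void
check_int_min_identity (int32_t x)
{
  assert (eq_int_min (x) == (x == INT32_MIN));
  assert (ne_int_min (x) == (x != INT32_MIN));
}
```

The branch forms in test2 and test3 rest on the same bit-31 test, taken with bbci/bbsi instead of materialising the boolean.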

gcc/ChangeLog:

* config/xtensa/xtensa.md (*btrue_INT_MIN, *eqne_INT_MIN):
New insn_and_split patterns.
---
 gcc/config/xtensa/xtensa.md | 64 +
 1 file changed, 64 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 87620934bbe..c9790babf75 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1940,6 +1940,37 @@
   (const_int 2)
   (const_int 3)))])
 
+(define_insn_and_split "*btrue_INT_MIN"
+  [(set (pc)
+        (if_then_else (match_operator 2 "boolean_operator"
+                        [(match_operand:SI 0 "register_operand" "r")
+                         (const_int -2147483648)])
+                      (label_ref (match_operand 1 ""))
+                      (pc)))]
+  "TARGET_ABS"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 3)
+        (abs:SI (match_dup 0)))
+   (set (pc)
+        (if_then_else (match_op_dup 2
+                        [(zero_extract:SI (match_dup 3)
+                                          (const_int 1)
+                                          (match_dup 4))
+                         (const_int 0)])
+                      (label_ref (match_dup 1))
+                      (pc)))]
+{
+  operands[3] = gen_reg_rtx (SImode);
+  operands[4] = GEN_INT (BITS_BIG_ENDIAN ? 0 : 31);
+  operands[2] = gen_rtx_fmt_ee (reverse_condition (GET_CODE (operands[2])),
+                                VOIDmode, XEXP (operands[2], 0),
+                                const0_rtx);
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set_attr "length" "6")])
+
 (define_insn "*ubtrue"
   [(set (pc)
(if_then_else (match_operator 3 "ubranch_operator"
@@ -3198,6 +3229,39 @@
(set_attr "mode""SI")
(set_attr "length"  "6")])
 
+(define_insn_and_split "*eqne_INT_MIN"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (match_operator 2 "boolean_operator"
+          [(match_operand:SI 1 "register_operand" "r")
+           (const_int -2147483648)]))]
+  "TARGET_ABS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (abs:SI (match_dup 1)))
+   (set (match_dup 0)
+        (match_op_dup:SI 2
+          [(match_dup 0)
+           (const_int 31)]))
+   (match_dup 3)]
+{
+  enum rtx_code code = GET_CODE (operands[2]);
+  operands[2] = gen_rtx_fmt_ee ((code == EQ) ? LSHIFTRT : ASHIFTRT,
+                                SImode, XEXP (operands[2], 0),
+                                XEXP (operands[2], 1));
+  operands[3] = (code != EQ) ? gen_addsi3 (operands[0],
+                                           operands[0], const1_rtx)
+                             : const0_rtx;
+}
+  [(set_attr "type" "move")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "GET_CODE (operands[2]) == EQ")
+                      (const_int 3)
+                      (if_then_else (match_test "TARGET_DENSITY")
+                                    (const_int 5)
+                                    (const_int 6))))])
+
 (define_peephole2
   [(set (match_operand:SI 0 "register_operand")
(match_operand:SI 6 "reload_operand"))
-- 
2.30.2


[x86 PATCH] Add support for stc, clc and cmc instructions in i386.md

2023-06-03 Thread Roger Sayle

This patch is the latest revision of my patch to add support for the
STC (set carry flag), CLC (clear carry flag) and CMC (complement
carry flag) instructions to the i386 backend, incorporating Uros'
previous feedback.  The significant changes are (i) the inclusion
of CMC, (ii) the use of UNSPEC for the patterns, and (iii) the use of a
new X86_TUNE_SLOW_STC tuning flag to use alternate implementations on
pentium4 (which has a notoriously slow STC) when not optimizing
for size.

An example of the use of the stc instruction is:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
  return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}

which previously generated:
        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret

with this patch now generates:
        stc
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret
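
For reference, the arithmetic that the stc/adcl pair performs for foo can be modelled in portable C. This is only a sketch with invented names (it does not call the real builtin): the carry-in is the constant 1, which is exactly the case where a single stc can set the flag before the adc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of a 32-bit add-with-carry: *sum_out receives
   a + b + carry_in modulo 2^32 and the carry-out (0 or 1) is returned.
   Any nonzero carry_in counts as a set carry flag.  */
static unsigned char
model_addcarry_u32 (unsigned char carry_in, uint32_t a, uint32_t b,
                    uint32_t *sum_out)
{
  assert (sum_out != NULL);

  uint64_t wide = (uint64_t) a + (uint64_t) b + (carry_in != 0);
  *sum_out = (uint32_t) wide;
  return (unsigned char) (wide >> 32);
}

/* Counterpart of foo: the carry-in is known to be 1 at compile time.  */
static unsigned int
model_foo (uint32_t a, uint32_t b, uint32_t *c)
{
  return model_addcarry_u32 (1, a, b, c);
}
```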

An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:
unsigned int bar (unsigned int a, unsigned int b,
  unsigned int c, unsigned int d)
{
  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
  return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}

which previously generated:
        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setnc   %al
        movl    %edi, o1(%rip)
        addb    $-1, %al
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret

and now generates:
        stc
        adcl    %esi, %edi
        cmc
        movl    %edi, o1(%rip)
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret
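
For readers who want to check the arithmetic, here is a portable C model of
what the second example computes.  It is only a sketch: addcarry_u32_model and
bar_model are hypothetical names, and the model assumes that any nonzero
carry-in counts as 1, which is how the generated code treats it.

```c
#include <stdint.h>

/* Hypothetical model of one add-with-carry step:
   *sum = a + b + carry_in, return value is the carry out (0 or 1).  */
static unsigned
addcarry_u32_model (unsigned carry_in, uint32_t a, uint32_t b, uint32_t *sum)
{
  uint64_t wide = (uint64_t) a + b + (carry_in != 0);
  *sum = (uint32_t) wide;
  return (unsigned) (wide >> 32);
}

/* Model of bar: the carry out of the first addition is complemented
   (the job of cmc) before it feeds the second addition.  */
static unsigned
bar_model (uint32_t a, uint32_t b, uint32_t c, uint32_t d,
           uint32_t *o1, uint32_t *o2)
{
  unsigned c1 = addcarry_u32_model (1, a, b, o1);
  return addcarry_u32_model (c1 ^ 1, c, d, o2);
}
```

The "c1 ^ 1" step is the part that no longer needs a setnc/addb round trip
through a register once the carry flag can be complemented in place.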


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-03  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) :
Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_CLC): New UNSPEC for clc.
(UNSPEC_STC): New UNSPEC for stc.
(UNSPEC_CMC): New UNSPEC for cmc.
(*x86_clc): New define_insn.
(*x86_clc_xor): New define_insn for pentium4 without -Os.
(x86_stc): New define_insn.
(define_split): Convert x86_stc into alternate implementation
on pentium4.
(x86_cmc): New define_insn.
(*x86_cmc_1): New define_insn_and_split to recognize cmc pattern.
(*setcc_qi_negqi_ccc_1_): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(*setcc_qi_negqi_ccc_2_): Likewise.
(neg_ccc_1): Renamed from *neg_ccc_1 for gen function.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.

gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.


Thanks,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5d21810..9e02fdd 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13948,8 +13948,6 @@ rdseed_step:
   arg3 = CALL_EXPR_ARG (exp, 3); /* unsigned int *sum_out.  */
 
   op1 = expand_normal (arg0);
-  if (!integer_zerop (arg0))
-   op1 = copy_to_mode_reg (QImode, convert_to_mode (QImode, op1, 1));
 
   op2 = expand_normal (arg1);
   if (!register_operand (op2, mode0))
@@ -13967,7 +13965,7 @@ rdseed_step:
}
 
   op0 = gen_reg_rtx (mode0);
-  if (integer_zerop (arg0))
+  if (op1 == const0_rtx)
{
  /* If arg0 is 0, optimize right away into add or sub
 instruction that sets CCCmode flags.  */
@@ -13977,7 +13975,14 @@ rdseed_step:
   else
{
  /* Generate CF from input operand.  */
- emit_insn (gen_addqi3_cconly_overflow (op1, constm1_rtx));
+ if (!CONST_INT_P (op1))
+   {
+ op1 = convert_to_mode (QImode, op1, 1);
+ op1 = copy_to_mode_reg (QImode, op1);
+ emit_insn (gen_negqi_ccc_1 (op1, op1));
+   }
+ else
+   emit_insn (gen_x86_stc ());
 
  /* Generate instruction that consumes CF.  */
  op1 = gen_rtx_REG (CCCmode, FLAGS_REG);
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index c7439f8..5ac9c78 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -448,6 +448,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD]
 #define TARGET_DEST_FALSE_DEP_FOR_GLC \
ix86_tune_features[X86_TUNE_DEST_FALSE_DEP_FOR_GLC]
+#define TARGET_SLOW_STC 

Re: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

2023-06-03 Thread Maciej W. Rozycki
Hi Thomas,

> Will you, Maciej, please test that this doesn't break your setting?

 Umm, this was implemented for my Western Digital development environment, 
which I don't have access to anymore.  I'll see what I can do, but it may 
be neither easy nor quick.  It was long ago and I don't have a setup 
with multilibs enabled anymore.  Nor do I remember the thorough 
problem analysis I went through that led me to my conclusions.

 I've come across my note, in a reply to Chung-Lin's concerns, about using 
libgomp.exp as a standalone test driver.  Has this been verified somehow 
with your proposed change?

 Also I've skimmed over your change and this has caught my eye:

> diff --git a/libgomp/configure.ac b/libgomp/configure.ac
> index 1aad83a79da..49f7fb0dc82 100644
> --- a/libgomp/configure.ac
> +++ b/libgomp/configure.ac
> @@ -151,22 +151,11 @@ AC_SUBST(enable_static)
>  
>  AM_MAINTAINER_MODE
>  
> -# We optionally test libgomp C++ support, and for that want to use the proper
> -# C++ driver, 'g++' (or 'xg++' for build-tree testing).  Given that build of
> -# target libstdc++-v3 depends on target libgomp (see '../Makefile.def'), we
> -# cannot make build of target libgomp depend on target libstdc++-v3: circular
> -# dependency.  We thus cannot instantiate 'AC_PROG_CXX' here: we'd get
> -# '-funconfigured-libstdc++-v3' (see '../configure.ac').  Therefore, just
> -# capture 'CXX', and we'll fix this up at 'make check' time (see
> -# 'testsuite/lib/libgomp.exp:libgomp_init').
> -AC_SUBST(CXX)
> -
>  # Create a spec file, so that compile/link tests don't fail
>  test -f libgfortran.spec || touch libgfortran.spec
>  FCFLAGS="$FCFLAGS -L."
>  
> -# We need 'gfortran' to compile parts of the library, and test libgomp Fortran
> -# support.
> +# We need gfortran to compile parts of the library
>  # We can't use AC_PROG_FC because it expects a fully working gfortran.

-- missing full stop here, and I suggest to just make all this comment one 
paragraph (I can't imagine why it's not already, as the second sentence is 
clearly a continuation of the first one).

 I think a proper change description would be good too, as otherwise one 
may wonder why you have removed all the stuff above, and what this change 
is about anyway.

  Maciej


Re: [Patch, fortran] PR37336 finalization

2023-06-03 Thread Thomas Koenig via Gcc-patches

Hi Paul,


I want to get something approaching correct finalization to the
distros, which implies 12-branch at present. Hopefully I can do the
same with associate in a month or two's time.


OK by me then.

(I just wanted to be sure that we had this discussion :-)

Best regards

Thomas


Re: [PATCH, committed] Fortran: fix diagnostics for SELECT RANK [PR100607]

2023-06-03 Thread Harald Anlauf via Gcc-patches

Hi Paul,

On 6/3/23 07:48, Paul Richard Thomas via Gcc-patches wrote:

Hi Harald,

It looks good to me. Thanks to you and Steve for the fix. I suggest
that it is such an obvious one that it deserves back-porting.


alright, I'll check how far this makes sense.

Cheers,
Harald


Cheers

Paul

On Fri, 2 Jun 2023 at 19:06, Harald Anlauf via Fortran
 wrote:


Dear all,

I've committed the attached simple patch on behalf of Steve
after discussion in the PR and regtesting on x86_64-pc-linux-gnu.

It fixes a duplicate error message and an ICE.

Pushed as r14-1505-gfae09dfc0e6bf4cfe35d817558827aea78c6426f .

Thanks,
Harald








Re: [Patch, fortran] PR37336 finalization

2023-06-03 Thread Harald Anlauf via Gcc-patches

Hi Paul, all,

On 6/3/23 15:16, Paul Richard Thomas via Gcc-patches wrote:

Hi Thomas,

I want to get something approaching correct finalization to the
distros, which implies 12-branch at present. Hopefully I can do the
same with associate in a month or two's time.


IMHO it is not only distros, but also installations at (scientific)
computing centers with a larger user base and a large software stack.
Migrating to a different major version of gcc/gfortran is not a trivial
task for them.

I'd fully support the idea of backporting the finalization fixes, as
IIUC this on the one hand touches a rather isolated part, and on the
other hand already got quite some testing.  It is also already in the
13-branch (or only mostly?).  Given that 12.3 was released recently
and 12.4 is far away, there'd be sufficient time to fix any fallout.

Regarding the associate fixes, we could get as much of those into 13.2,
which we'd normally expect in just a few months.  As long as spare time
to work on gfortran is limited, I'd rather prefer to get as much fixed
for that release.

(This is not a no: I simply expect that real regression testing for the
associate changes may take more time.)


I am dithering about changing the F2003/08 part of finalization since
the default is 2018 compliance. That said, it does need a change since
the suppression of constructor finalization is also suppressing
finalization of function results within the compilers. I'll do that
first, perhaps?


That sounds like a good idea.

Cheers,
Harald


Cheers

Paul



On Sat, 3 Jun 2023 at 06:50, Thomas Koenig  wrote:


Hi Paul,


I propose to backport
r13-6747-gd7caf313525a46f200d7f5db1ba893f853774aee to 12-branch very
soon.


Is this something that we usually do?

While finalization was basically broken before, some people still used
working subsets (or subsets that were broken, and they adapted or
wrote their code accordingly).

What is the general opinion on that?  I'm undecided.


Before that, I propose to remove the F2003/2008 finalization of
structure and array constructors in 13- and 14-branches. I can see why
it was removed from the standard in a correction to F2008 and think
that it is likely to cause endless confusion and maintenance
complications. However, finalization of function results within
constructors will be retained.


That, I agree with.  Should it be noted somewhere as an intentional
deviation from the standard?

Best regards

 Thomas




--
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein





Re: [patch] Fix PR101188 wrong code from postreload

2023-06-03 Thread Georg-Johann Lay




On 03.06.23 at 17:53, Jeff Law wrote:



On 6/2/23 02:46, Georg-Johann Lay wrote:

There is the following bug in postreload that can be traced back
to v5 at least:

In postreload.cc::reload_cse_move2add() there is a loop over all
insns.  If it encounters a SET, the next insn is analyzed if it
is a single_set.

After next has been analyzed, it continues with

   if (success)
 delete_insn (insn);
   changed |= success;
   insn = next; // This effectively skips analysis of next.
   move2add_record_mode (reg);
   reg_offset[regno]
 = trunc_int_for_mode (added_offset + base_offset,
   mode);
   continue; // for continues with insn = NEXT_INSN (insn).

So it records the effect of next, but not the clobbers that
next might have.  This is a problem if next clobbers a GPR
like it can happen for avr.  What then can happen is that in a
later round, it may use a value from a (partially) clobbered reg.

The patch records the effects of potential clobbers.
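
To make the failure mode concrete, here is a deliberately tiny C model of the
bookkeeping.  It is not the postreload.cc code: the structures and the
record_insn helper are hypothetical, and "known" simply stands for "the pass
believes it knows this register's value".

```c
#include <stdbool.h>

enum { NREGS = 4 };

/* Hypothetical model of an insn that sets one register to a known
   offset and may clobber a second register as a side effect.  */
struct insn_model
{
  int set_reg;
  int set_offset;
  int clobber_reg;   /* -1 if the insn clobbers nothing */
};

/* Hypothetical model of the per-register tracking state.  */
struct reg_state
{
  bool known[NREGS];
  int offset[NREGS];
};

/* Record the effect of INSN.  With NOTE_CLOBBERS false this mimics the
   buggy path: the SET is recorded but the clobber is ignored, so stale
   information about the clobbered register survives.  */
static void
record_insn (struct reg_state *st, const struct insn_model *insn,
             bool note_clobbers)
{
  st->known[insn->set_reg] = true;
  st->offset[insn->set_reg] = insn->set_offset;
  if (note_clobbers && insn->clobber_reg >= 0)
    st->known[insn->clobber_reg] = false;
}
```

If register 1 was tracked before an insn that sets register 0 and clobbers
register 1, the buggy path still reports register 1 as known afterwards, and
that stale entry is what a later round can pick up.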

Bootstrapped and reg-tested on x86_64.  Also tested on avr where
the bug popped up.  The testcase discriminates on avr, and for now
I am not aware of any other target that's affected by the bug.

The change is not intrusive and fixes wrong code, so I'd like
to backport it.

Ok to apply?

Johann

rtl-optimization/101188: Don't bypass clobbers of some insns that are
optimized or are optimization candidates.

gcc/
 PR rtl-optimization/101188
 * postreload.cc (reload_cse_move2add): Record clobbers of next
 insn using move2add_note_store.

gcc/testsuite/
 PR rtl-optimization/101188
 * gcc.c-torture/execute/pr101188.c: New test.

If I understand the code correctly, isn't the core of the problem that 
we "continue" rather than executing the rest of the code in the loop?  In 
particular the continue bypasses this chunk of code:



 for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
    {
  if (REG_NOTE_KIND (note) == REG_INC
  && REG_P (XEXP (note, 0)))
    {
  /* Reset the information about this register.  */
  int regno = REGNO (XEXP (note, 0));
  if (regno < FIRST_PSEUDO_REGISTER)
    {
  move2add_record_mode (XEXP (note, 0));
  reg_mode[regno] = VOIDmode;
    }
    }
    }

  /* There are no REG_INC notes for SP autoinc.  */
  subrtx_var_iterator::array_type array;
  FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (insn), NONCONST)
    {
  rtx mem = *iter;
  if (mem
  && MEM_P (mem)
  && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
    {
  if (XEXP (XEXP (mem, 0), 0) == stack_pointer_rtx)
    reg_mode[STACK_POINTER_REGNUM] = VOIDmode;
    }
    }

  note_stores (insn, move2add_note_store, insn);


The point is that in the continue block, the effect of the insn is
recorded even if !success, it's just the computed effect of the code.

Moreover, "next" is REG = REG + CONST_INT, so there are no REG_INC
notes, no?

Also I don't have any testcases that break other than the one
that has a clobber of a GPR along with the pointer addition.

I tried some "smart" solutions before, but all failed for some
reason, so I resorted to something that fixes the bug, and
*only* fixes the bug, and which has clearly no other side
effects than fixing the bug (I have to do all remote on compile
farm).  If a more elaborate fix is needed that also catches other
PRs, then I would hand this over to a postreload maintainer please.

Of particular importance for your case would be the note_stores call. 
But I could well see other targets needing the search for REG_INC notes 
as well as stack pushes.


If I'm right, then wouldn't it be better to factor that blob of code 
above into its own function, then use it before the "continue" rather 
than implementing a custom scan for CLOBBERS?


I cannot answer that.  Maybe the authors of the code have some ideas.

Johann

It also begs the question if the other case immediately above the code I 
quoted needs similar adjustment.  It doesn't do the insn = next, but it 
does bypass the search for autoinc memory references and the note_stores 
call.


Jeff


[r14-1466 Regression] FAIL: gcc.dg/torture/fp-int-convert-timode.c -O3 -g (test for excess errors) on Linux/x86_64

2023-06-03 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

3635e8c67e13e3da7e1e23a617dd9952218e93e0 is the first bad commit
commit 3635e8c67e13e3da7e1e23a617dd9952218e93e0
Author: Roger Sayle 
Date:   Thu Jun 1 15:10:09 2023 +0100

PR target/109973: CCZmode and CCCmode variants of [v]ptest on x86.

caused

FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error: in as_a, 
at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2  (internal 
compiler error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2  (test for 
excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O3 -g  (internal 
compiler error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O3 -g  (test for 
excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error: in as_a, 
at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O2  (internal compiler 
error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O2  (test for excess 
errors)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O3 -g  (internal 
compiler error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float128-timode.c   -O3 -g  (test for 
excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error: in as_a, 
at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O2  (internal compiler 
error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O2  (test for excess 
errors)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O3 -g  (internal 
compiler error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32-timode.c   -O3 -g  (test for excess 
errors)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error: in as_a, 
at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O2  (internal compiler 
error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O2  (test for excess 
errors)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O3 -g  (internal 
compiler error: in as_a, at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float32x-timode.c   -O3 -g  (test for 
excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float64-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error: in as_a, 
at machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float64-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/fp-int-convert-float64-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: gcc.dg/torture/fp-int-convert-float64-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: 

[x86_64 PATCH] PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV.

2023-06-03 Thread Roger Sayle

This patch fixes PR target/110083, an ICE-on-valid regression exposed by
my recent PTEST improvements (to address PR target/109973).  The latent
bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update
or delete REG_EQUAL notes attached to COMPARE instructions.  As a result
the operands of COMPARE would be mismatched, with the register transformed
to V1TImode, but the immediate operand left as const_wide_int, which is
valid for TImode but not V1TImode.  This remained latent when the STV
conversion converted the mode of the COMPARE to CCmode, with later passes
recognizing the REG_EQUAL note is obviously invalid as the modes didn't
match, but now that we (correctly) preserve the CCZmode on COMPARE, the
mismatched operand modes trigger a sanity checking ICE downstream.

Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare.

Before:
(expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
(const_wide_int 0x8000))

After:
(expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
(const_vector:V1TI [
(const_wide_int 0x8000)
 ]))
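
The note update boils down to wrapping the scalar immediate in a vector
constant of the new mode.  A minimal model in C, assuming 64-bit lanes as a
stand-in for the real machine modes (widen_scalar_const is a hypothetical name
used only for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: turn a scalar immediate into an N-lane vector
   constant.  All-ones becomes an all-ones vector; any other value goes
   into lane 0 and the remaining lanes are zero.  */
static void
widen_scalar_const (uint64_t scalar, uint64_t *lanes, size_t n)
{
  if (scalar == UINT64_MAX)
    {
      for (size_t i = 0; i < n; i++)
        lanes[i] = UINT64_MAX;
      return;
    }

  lanes[0] = scalar;
  for (size_t i = 1; i < n; i++)
    lanes[i] = 0;
}
```

For V1TImode there is only one lane, so the constant keeps its value and only
its mode changes, which matches the Before/After notes shown above.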

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-03  Roger Sayle  

gcc/ChangeLog
PR target/110083
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Update or delete REG_EQUAL notes, converting CONST_INT and
CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.

gcc/testsuite/ChangeLog
PR target/110083
* gcc.target/i386/pr110083.c: New test case.


Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 3417f6b..4a3b07a 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -980,6 +980,39 @@ rtx
 scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 {
   rtx src, tmp;
+
+  /* Handle any REG_EQUAL notes.  */
+  tmp = find_reg_equal_equiv_note (insn);
+  if (tmp)
+{
+  if (GET_CODE (XEXP (tmp, 0)) == COMPARE
+ && GET_MODE (XEXP (tmp, 0)) == CCZmode
+ && REG_P (XEXP (XEXP (tmp, 0), 0)))
+   {
+ rtx *op = &XEXP (XEXP (tmp, 0), 1);
+ if (CONST_SCALAR_INT_P (*op))
+   {
+ if (constm1_operand (*op, GET_MODE (*op)))
+   *op = CONSTM1_RTX (vmode);
+ else
+   {
+ unsigned n = GET_MODE_NUNITS (vmode);
+ rtx *v = XALLOCAVEC (rtx, n);
+ v[0] = *op;
+ for (unsigned i = 1; i < n; ++i)
+   v[i] = const0_rtx;
+ *op = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
+   }
+ tmp = NULL_RTX;
+   }
+ else if (REG_P (*op))
+   tmp = NULL_RTX;
+   }
+
+  if (tmp)
+   remove_note (insn, tmp);
+}
+
   /* Comparison against anything other than zero, requires an XOR.  */
   if (op2 != const0_rtx)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr110083.c b/gcc/testsuite/gcc.target/i386/pr110083.c
new file mode 100644
index 0000000..4b38ca8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110083.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4 -mstv -mno-stackrealign" } */
+typedef int TItype __attribute__ ((mode (TI)));
+typedef unsigned int UTItype __attribute__ ((mode (TI)));
+
+void foo (void)
+{
+  static volatile TItype ivin, ivout;
+  static volatile float fv1, fv2;
+  ivin = ((TItype) (UTItype) ~ (((UTItype) ~ (UTItype) 0) >> 1));
+  fv1 = ((TItype) (UTItype) ~ (((UTItype) ~ (UTItype) 0) >> 1));
+  fv2 = ivin;
+  ivout = fv2;
+  if (ivin != ((TItype) (UTItype) ~ (((UTItype) ~ (UTItype) 0) >> 1))
+  || ((((128) > sizeof (TItype) * 8 - 1)) && ivout != ivin)
+  || ((((128) > sizeof (TItype) * 8 - 1))
+ && ivout !=
+ ((TItype) (UTItype) ~ (((UTItype) ~ (UTItype) 0) >> 1)))
+  || fv1 !=
+  (float) ((TItype) (UTItype) ~ (((UTItype) ~ (UTItype) 0) >> 1))
+  || fv2 !=
+  (float) ((TItype) (UTItype) ~ (((UTItype) ~ (UTItype) 0) >> 1))
+  || fv1 != fv2)
+__builtin_abort ();
+}
+


Re: [PATCH] reload_cse_move2add: Handle trivial single_set:s

2023-06-03 Thread Jeff Law via Gcc-patches




On 5/31/23 09:13, Hans-Peter Nilsson via Gcc-patches wrote:

Tested cris-elf, bootstrapped & checked native
x86_64-pc-linux-gnu for good measure.  Ok to commit?

If it wasn't for there already being an auto_inc_dec pass,
this looks like a good place to put it, considering the
framework data.  (BTW, current auto-inc-dec generation is so
poor that you can replace half of what auto_inc_dec does
with a few peephole2s.)

Actually a better way to do this stuff is to use the PRE/LCM framework. 
Either Muchnick or Morgan discusses it in their book.





brgds, H-P

-- >8 --
The reload_cse_move2add part of "postreload" handled only
insns whose PATTERN was a SET.  That excludes insns that
e.g. clobber a flags register, which it does only for
"simplicity".  The patch extends the "simplicity" to most
single_set insns.  For a subset of those insns there's still
an assumption; that the single_set of a PARALLEL insn is the
first element in the PARALLEL.  If the assumption fails,
it's no biggie; the optimization just isn't performed.
Don't let the name deceive you, this optimization doesn't
hit often, but as often (or as rarely) for LRA as for reload
at least on e.g. cris-elf where the biggest effect was seen
in reducing repeated addresses in copies from fixed-address
arrays, like in gcc.c-torture/compile/pr78694.c.

* postreload.cc (move2add_use_add2_insn): Handle
trivial single_sets.  Rename variable PAT to SET.
(move2add_use_add3_insn, reload_cse_move2add): Similar.

OK
jeff


Re: [PATCH 1/2] [RISC-V] fix cfi issue in save-restore.

2023-06-03 Thread Jeff Law via Gcc-patches




On 6/2/23 04:42, Fei Gao wrote:

This patch fixes a cfi issue introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20

Test code:
char my_getchar();
float getf();
int test_f0()
{
   int s0 = my_getchar();
   float f0 = getf();
   int b = my_getchar();
   return f0+s0+b;
}

cflags: -g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow

before patch:
test_f0:
        ...
        .cfi_startproc
        call    t0,__riscv_save_1
        .cfi_offset 8, -8
        .cfi_offset 1, -4
        .cfi_def_cfa_offset 16
        ...
        addi    sp,sp,-16
        .cfi_def_cfa_offset 32

        ...

        addi    sp,sp,16
        .cfi_def_cfa_offset 0  // issue here
        ...
        tail    __riscv_restore_1
        .cfi_restore 8
        .cfi_restore 1
        .cfi_def_cfa_offset -16 // issue here
        .cfi_endproc

after patch:
test_f0:
        ...
        .cfi_startproc
        call    t0,__riscv_save_1
        .cfi_offset 8, -8
        .cfi_offset 1, -4
        .cfi_def_cfa_offset 16
        ...
        addi    sp,sp,-16
        .cfi_def_cfa_offset 32

        ...

        addi    sp,sp,16
        .cfi_def_cfa_offset 16  // corrected here
        ...
        tail    __riscv_restore_1
        .cfi_restore 8
        .cfi_restore 1
        .cfi_def_cfa_offset 0 // corrected here
        .cfi_endproc
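
The corrected values can be checked by replaying the stack-pointer adjustments.
The sketch below is only a model: the helper names are invented, and it assumes
that __riscv_save_1 lowers sp by 16 and __riscv_restore_1 raises it by 16,
which is what the .cfi_def_cfa_offset 16 after the save call suggests.

```c
/* Hypothetical helper: the CFA offset is CFA - sp, so executing
   "sp += sp_adjust" changes the offset by -sp_adjust.  */
static int
cfa_offset_after_sp_adjust (int cfa_offset, int sp_adjust)
{
  return cfa_offset - sp_adjust;
}

/* Replay the four stack adjustments of test_f0 and store the CFA
   offset that should be announced after each one.  */
static void
replay_test_f0 (int offsets[4])
{
  int cfa = 0;

  cfa = cfa_offset_after_sp_adjust (cfa, -16);  /* call t0,__riscv_save_1 */
  offsets[0] = cfa;
  cfa = cfa_offset_after_sp_adjust (cfa, -16);  /* addi sp,sp,-16 */
  offsets[1] = cfa;
  cfa = cfa_offset_after_sp_adjust (cfa, 16);   /* addi sp,sp,16 */
  offsets[2] = cfa;
  cfa = cfa_offset_after_sp_adjust (cfa, 16);   /* tail __riscv_restore_1 */
  offsets[3] = cfa;
}
```

The replay yields 16, 32, 16 and 0, matching the "after patch" listing; the
"before patch" output was 16 too low in the last two steps.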

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_expand_epilogue): fix cfi issue with 
correct offset.

I fixed a trivial whitespace problem, rewrapped the ChangeLog and pushed 
this patch to the trunk.  Please consider adding a testcase for this bug 
to the testsuite.


jeff


Re: [PATCH] RISC-V: Remove unnecessary md pattern for TARGET_XTHEADCONDMOV

2023-06-03 Thread Jeff Law via Gcc-patches




On 6/1/23 23:56, Die Li wrote:

There are 2 small changes in this patch, but they do not affect the result.

1. Remove unnecessary md pattern for TARGET_XTHEADCONDMOV in thead.md. The
operands[4] in "if_then_else" is always a comparison operation, so the
generated rtl never matches the pattern that is being deleted.

2. Change operands[4] from const0_rtx to operands[1] to maintain rtl
consistency, although only the CODE of operands[4] affects the assembly
output.

Signed-off-by: Die Li

gcc/ChangeLog:

 * config/riscv/thead.md (*th_cond_gpr_mov): Delete.

Thanks.  I've pushed this to the trunk.
jeff


Re: [PATCH] Add more ForEachMacros to clang-format file

2023-06-03 Thread Jeff Law via Gcc-patches




On 6/2/23 07:20, Lehua Ding wrote:

Hi,

This patch adds some missed ForEachMacros to the contrib/clang-format file,
which allows the clang-format tool to format gcc code correctly.

Best,
Lehua

---
  contrib/clang-format | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

Thanks.  I created a ChangeLog entry and pushed this patch to the trunk.

jeff


Re: [patch] Fix PR101188 wrong code from postreload

2023-06-03 Thread Jeff Law via Gcc-patches




On 6/2/23 02:46, Georg-Johann Lay wrote:

There is the following bug in postreload that can be traced back
to v5 at least:

In postreload.cc::reload_cse_move2add() there is a loop over all
insns.  If it encounters a SET, the next insn is analyzed if it
is a single_set.

After next has been analyzed, it continues with

   if (success)
 delete_insn (insn);
   changed |= success;
   insn = next; // This effectively skips analysis of next.
   move2add_record_mode (reg);
   reg_offset[regno]
 = trunc_int_for_mode (added_offset + base_offset,
   mode);
   continue; // for continues with insn = NEXT_INSN (insn).

So it records the effect of next, but not the clobbers that
next might have.  This is a problem if next clobbers a GPR
like it can happen for avr.  What then can happen is that in a
later round, it may use a value from a (partially) clobbered reg.

The patch records the effects of potential clobbers.

Bootstrapped and reg-tested on x86_64.  Also tested on avr where
the bug popped up.  The testcase discriminates on avr, and for now
I am not aware of any other target that's affected by the bug.

The change is not intrusive and fixes wrong code, so I'd like
to backport it.

Ok to apply?

Johann

rtl-optimization/101188: Don't bypass clobbers of some insns that are
optimized or are optimization candidates.

gcc/
 PR rtl-optimization/101188
 * postreload.cc (reload_cse_move2add): Record clobbers of next
 insn using move2add_note_store.

gcc/testsuite/
 PR rtl-optimization/101188
 * gcc.c-torture/execute/pr101188.c: New test.

If I understand the code correctly, isn't the core of the problem that 
we "continue" rather than executing the rest of the code in the loop? 
In particular the continue bypasses this chunk of code:



 for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
{
  if (REG_NOTE_KIND (note) == REG_INC
  && REG_P (XEXP (note, 0)))
{
  /* Reset the information about this register.  */
  int regno = REGNO (XEXP (note, 0));
  if (regno < FIRST_PSEUDO_REGISTER)
{
  move2add_record_mode (XEXP (note, 0));
  reg_mode[regno] = VOIDmode;
}
}
}

  /* There are no REG_INC notes for SP autoinc.  */
  subrtx_var_iterator::array_type array;
  FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (insn), NONCONST)
{
  rtx mem = *iter;
  if (mem
  && MEM_P (mem)
  && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
{
  if (XEXP (XEXP (mem, 0), 0) == stack_pointer_rtx)
reg_mode[STACK_POINTER_REGNUM] = VOIDmode;
}
}

  note_stores (insn, move2add_note_store, insn);


Of particular importance for your case would be the note_stores call. 
But I could well see other targets needing the search for REG_INC notes 
as well as stack pushes.


If I'm right, then wouldn't it be better to factor that blob of code 
above into its own function, then use it before the "continue" rather 
than implementing a custom scan for CLOBBERS?


It also begs the question if the other case immediately above the code I 
quoted needs similar adjustment.  It doesn't do the insn = next, but it 
does bypass the search for autoinc memory references and the note_stores 
call.





Jeff



Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing

2023-06-03 Thread Jeff Law via Gcc-patches




On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote:

This patch adds tracking of current testsuite "tool" and "exp"
to the processing of .sum files.  This avoids aliasing between
tests from different testsuites with same name+description.

E.g., this is necessary for testsuite/c-c++-common, which is run
for both gcc and g++ "tools".

This patch changes manifest format from ...

FAIL: gcc_test
FAIL: g++_test

... to ...

=== gcc tests ===
Running gcc/foo.exp ...
FAIL: gcc_test
=== gcc Summary ==
=== g++ tests ===
Running g++/bar.exp ...
FAIL: g++_test
=== g++ Summary ==
.

The new format uses same formatting as DejaGnu's .sum files
to specify which "tool" and "exp" the test belongs to.
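
The idea of keying results on tool and exp can be sketched in a few lines.
The real script is Python; the C below is only an illustration under that
caveat, and sum_context, sum_line_key and the "tool|exp|test" key format are
all made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsing context: the tool and .exp file currently in
   effect while walking a .sum-style file.  */
struct sum_context
{
  char tool[64];    /* from "=== gcc tests ===" */
  char exp[128];    /* from "Running gcc/foo.exp ..." */
};

/* Process one line.  Returns 1 and writes a "tool|exp|test" key when
   LINE is a FAIL result; returns 0 for header and other lines.  */
static int
sum_line_key (struct sum_context *ctx, const char *line,
              char *key, size_t key_size)
{
  char word[128], kind[16];

  if (sscanf (line, " === %63s %15s", word, kind) == 2)
    {
      /* Only a "tests" header starts a new tool; "Summary" does not.  */
      if (strcmp (kind, "tests") == 0)
        {
          strcpy (ctx->tool, word);
          ctx->exp[0] = '\0';
        }
      return 0;
    }
  if (sscanf (line, "Running %127s", word) == 1)
    {
      strcpy (ctx->exp, word);
      return 0;
    }
  if (strncmp (line, "FAIL: ", 6) == 0)
    {
      snprintf (key, key_size, "%s|%s|%s", ctx->tool, ctx->exp, line + 6);
      return 1;
    }
  return 0;
}
```

With this kind of key, two results that share a name and description but come
from different tools no longer collide.
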
I think the series is fine.  You're not likely to hear from Diego or 
Doug I suspect, I don't think either are involved in GNU stuff anymore.


jeff


Re: [COMMITTED] MAINTAINERS: Add myself as MIPS port maintainer

2023-06-03 Thread Maciej W. Rozycki
On Fri, 2 Jun 2023, YunQiang Su wrote:

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4a7c963914b..c8b787b6e1e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -91,7 +91,7 @@ m68k port   Andreas Schwab  
> 
>  m68k-motorola-sysv port  Philippe De Muyter  
>  mcore port   Nick Clifton
>  microblaze   Michael Eager   
> -mips portMatthew Fortune 
> +mips portYunQiang Su 

 Has Matthew agreed to be removed from the maintainer's post?  Even if so,
then he needs to be moved back to the Write After Approval section, as no
one has deprived him of this right.

  Maciej


[PATCH] RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

2023-06-03 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch allows the mov and spill operations for the RVV
vfloat16*_t types. The machine modes involved are VNx1HF, VNx2HF,
VNx4HF, VNx8HF, VNx16HF, VNx32HF and VNx64HF.

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add the float16 type to DEF_RVV_F_OPS.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/riscv.md: Add vfloat16*_t to attr mode.
* config/riscv/vector-iterators.md: Add vfloat16*_t machine mode
to V, V_WHOLE, V_FRACT, VINDEX, VM, VEL and sew.
* config/riscv/vector.md: Add vfloat16*_t machine mode to sew,
vlmul and ratio.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mov-14.c: New test.
* gcc.target/riscv/rvv/base/spill-13.c: New test.
---
 .../riscv/riscv-vector-builtins-types.def |   7 ++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-iterators.md  |  25 
 gcc/config/riscv/vector.md|  35 ++
 .../gcc.target/riscv/rvv/base/mov-14.c|  81 +
 .../gcc.target/riscv/rvv/base/spill-13.c  | 108 ++
 6 files changed, 257 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mov-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-13.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index f7f650f7e95..65716b8c637 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -385,6 +385,13 @@ DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
 
+DEF_RVV_F_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_F_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_F_OPS (vfloat16m8_t, RVV_REQUIRE_ELEN_FP_16)
+
 DEF_RVV_F_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_F_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
 DEF_RVV_F_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index f545874edc1..be960583101 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -175,6 +175,7 @@ (define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,HF,SF,DF,TF,
   VNx1HI,VNx2HI,VNx4HI,VNx8HI,VNx16HI,VNx32HI,VNx64HI,
   VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,
   VNx1DI,VNx2DI,VNx4DI,VNx8DI,VNx16DI,
+  VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF,
   VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,
   VNx1DF,VNx2DF,VNx4DF,VNx8DF,VNx16DF,
   VNx2x64QI,VNx2x32QI,VNx3x32QI,VNx4x32QI,
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 937ec3c7f67..5fbaef89566 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -90,6 +90,15 @@ (define_mode_iterator V [
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -427,6 +436,15 @@ (define_mode_iterator V_WHOLE [
   (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN 

Re: [PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-03 Thread Jeff Law via Gcc-patches




On 5/24/23 12:55, Jason Merrill via Gcc-patches wrote:

Middle-end folks: any thoughts about how best to make the change described in
the last paragraph below?

Library folks: any thoughts on the changes to __cxa_call_terminate?

-- 8< --

[except.handle]/7 says that when we enter std::terminate due to a throw,
that is considered an active handler.  We already implemented that properly
for the case of not finding a handler (__cxa_throw calls __cxa_begin_catch
before std::terminate) and the case of finding a callsite with no landing
pad (the personality function calls __cxa_call_terminate which calls
__cxa_begin_catch), but for the case of a throw in a try/catch in a noexcept
function, we were emitting a cleanup that calls std::terminate directly
without ever calling __cxa_begin_catch to handle the exception.

A straightforward way to fix this seems to be calling __cxa_call_terminate
instead.  However, that requires exporting it from libstdc++, which we have
not previously done.  Despite the name, it isn't actually part of the ABI
standard.  Nor is __cxa_call_unexpected, as far as I can tell, but that one
is also used by clang.  For this case they use __clang_call_terminate; it
seems reasonable to me for us to stick with __cxa_call_terminate.

I also change __cxa_call_terminate to take void* for simplicity in the front
end (and consistency with __cxa_call_unexpected) but that isn't necessary if
it's undesirable for some reason.

This patch does not fix the issue that representing the noexcept as a
cleanup is wrong, and confuses the handler search; since it looks like a
cleanup in the EH tables, the unwinder keeps looking until it finds the
catch in main(), which it should never have gotten to.  Without the
try/catch in main, the unwinder would reach the end of the stack and say no
handler was found.  The noexcept is a handler, and should be treated as one,
as it is when the landing pad is omitted.

The best fix for that issue seems to me to be to represent an
ERT_MUST_NOT_THROW after an ERT_TRY in an action list as though it were an
ERT_ALLOWED_EXCEPTIONS (since indeed it is an exception-specification).  The
actual code generation shouldn't need to change (apart from the change made
by this patch), only the action table entry.

PR c++/97720

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Add CPTI_CALL_TERMINATE_FN.
(call_terminate_fn): New macro.
* cp-gimplify.cc (gimplify_must_not_throw_expr): Use it.
* except.cc (init_exception_processing): Set it.
(cp_protect_cleanup_actions): Return it.

gcc/ChangeLog:

* tree-eh.cc (lower_resx): Pass the exception pointer to the
failure_decl.
* except.h: Tweak comment.

libstdc++-v3/ChangeLog:

* libsupc++/eh_call.cc (__cxa_call_terminate): Take void*.
* config/abi/pre/gnu.ver: Add it.

gcc/testsuite/ChangeLog:

* g++.dg/eh/terminate2.C: New test.

OK on the middle end bits.  And I'd tend to agree MUST_NOT_THROW is just
a special case of an exception-specification, so if you can make your
proposal work it seems like a reasonable approach.


jeff


Re: [PATCH] libatomic: x86_64: Always try ifunc

2023-06-03 Thread Bernhard Reutner-Fischer via Gcc-patches
On 3 June 2023 15:46:02 CEST, Xi Ruoyao  wrote:

>Unfortunately __builtin_cpu_is performs CPU detection at runtime, not
>at compile time.

Right, you were talking about configure, sorry.


Re: [PATCH] libatomic: x86_64: Always try ifunc

2023-06-03 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-06-03 at 14:53 +0200, Bernhard Reutner-Fischer wrote:
> On 3 June 2023 13:25:32 CEST, Xi Ruoyao via Gcc-patches wrote:
> 
> > There seems to be no good way to check if the CPU is Intel or AMD from
> > the built-in macros (maybe we can check every known model like
> > __skylake, __bdver2, ..., but it will be very error-prone and require
> > an update whenever we add support for a new x86 model).  The best
> > thing we can do seems to be "always try ifunc" here.
> 
> IIRC there is __builtin_cpu_is (after initialisation) -- A couple of
> days ago, we wondered if it would be handy to lower that even in
> Fortran without going through C, so I am pretty sure I'm not making
> that up. ;-)

Unfortunately __builtin_cpu_is performs CPU detection at runtime, not
at compile time.
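
A minimal sketch of that distinction, assuming GCC (or a compatible compiler) on x86: the builtin is an ordinary run-time expression, so it can select a code path in a resolver or an if statement, but it cannot appear in a preprocessor #if or a configure-time macro test. The helper name cpu_vendor is made up for this example.

```c
/* Hypothetical helper: classify the CPU vendor at run time.  The answer
   depends on the machine the program runs on, not on the -march= value
   used to build it, which is why it cannot replace a configure check.  */
static const char *
cpu_vendor (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();	/* Required before __builtin_cpu_is.  */
  if (__builtin_cpu_is ("intel"))
    return "intel";
  if (__builtin_cpu_is ("amd"))
    return "amd";
#endif
  return "other";
}
```

A caller would use it as, for example, puts (cpu_vendor ()); nothing about the result is known while the translation unit is being compiled.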

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [Patch, fortran] PR37336 finalization

2023-06-03 Thread Paul Richard Thomas via Gcc-patches
Hi Thomas,

I want to get something approaching correct finalization to the
distros, which implies 12-branch at present. Hopefully I can do the
same with associate in a month or two's time.

I am dithering about changing the F2003/08 part of finalization since
the default is 2018 compliance. That said, it does need a change since
the suppression of constructor finalization is also suppressing
finalization of function results within constructors. I'll do that
first, perhaps?

Cheers

Paul



On Sat, 3 Jun 2023 at 06:50, Thomas Koenig  wrote:
>
> Hi Paul,
>
> > I propose to backport
> > r13-6747-gd7caf313525a46f200d7f5db1ba893f853774aee to 12-branch very
> > soon.
>
> Is this something that we usually do?
>
> While finalization was basically broken before, some people still used
> working subsets (or subsets that were broken, and they adapted or
> wrote their code accordingly).
>
> What is the general opinion on that?  I'm undecided.
>
> > Before that, I propose to remove the F2003/2008 finalization of
> > structure and array constructors in 13- and 14-branches. I can see why
> > it was removed from the standard in a correction to F2008 and think
> > that it is likely to cause endless confusion and maintenance
> > complications. However, finalization of function results within
> > constructors will be retained.
>
> That, I agree with.  Should it be noted somewhere as an intentional
> deviation from the standard?
>
> Best regards
>
> Thomas
>


--
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] libatomic: x86_64: Always try ifunc

2023-06-03 Thread Bernhard Reutner-Fischer via Gcc-patches
On 3 June 2023 13:25:32 CEST, Xi Ruoyao via Gcc-patches wrote:

>There seems to be no good way to check if the CPU is Intel or AMD from
>the built-in macros (maybe we can check every known model like __skylake,
>__bdver2, ..., but it will be very error-prone and require an update
>whenever we add support for a new x86 model).  The best thing we can
>do seems to be "always try ifunc" here.

IIRC there is __builtin_cpu_is (after initialisation) -- A couple of days ago,
we wondered if it would be handy to lower that even in Fortran without going
through C, so I am pretty sure I'm not making that up. ;-)

Just a thought,


Re: [pushed] Darwin, PPC: Fix struct layout with pragma pack [PR110044].

2023-06-03 Thread Iain Sandoe
Hi Richard,

> On 3 Jun 2023, at 12:20, Richard Biener  wrote:
> 
>> On 02.06.2023 at 21:12, Iain Sandoe via Gcc-patches wrote:
>> 

>> --- 8< ---
>> 
>> This bug was essentially that darwin_rs6000_special_round_type_align()
>> was ignoring externally-imposed capping of field alignment.
>> 
>> 

>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 5b3b8b52e7e..42f49e4a56b 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -8209,7 +8209,8 @@ darwin_rs6000_special_round_type_align (tree type, unsigned int computed,
>>  type = TREE_TYPE (type);
>>  } while (AGGREGATE_TYPE_P (type));
>> 
>> -  if (! AGGREGATE_TYPE_P (type) && type != error_mark_node)
>> +  if (type != error_mark_node && ! AGGREGATE_TYPE_P (type)
>> +  && ! TYPE_PACKED (type) && maximum_field_alignment == 0)
> 
> Just noticed while browsing mail.  'maximum_field_alignment' sounds like
> something that should be factored in when computing align, but as written
> there's no adjustment done instead?  Is there a way to get that to more
> than BITS_PER_UNIT?

I believe it is already correctly factored in (the values of 'computed' and
'specified' supplied to darwin_rs6000_special_round_type_align() take it
into account).  The point of this function is to override the supplied
values under specific conditions (that the first element in the aggregate
is a double or long long).  However, [at least in the de facto Darwin PPC32
ABI] we should not do so if there is a packed pragma in effect (that takes
priority), and omitting that check is the bug being fixed.
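
As a minimal sketch of the behaviour the new tests pin down: under #pragma pack(1) a struct of a char followed by a double must be 9 bytes with alignment 1, regardless of any special rounding for a leading double or long long. The struct names below are made up for the example, and the assumed member layout (char then double) is a guess at what darwin-structs-0.h declares, since that header is not shown here.

```c
#include <stddef.h>

/* Natural layout: size and alignment depend on the target ABI, so no
   exact values are asserted for this one.  */
struct cd_natural { char c; double d; };

#pragma pack(push, 1)
/* Packed layout: the pragma caps field alignment at one byte.  */
struct cd_packed { char c; double d; };
#pragma pack(pop)

/* Same trick as the testcases: a negative array size is a compile-time
   error, so these declarations act as static assertions.  */
int packed_size[sizeof (struct cd_packed) == 9 ? 1 : -1];
int packed_align[__alignof__ (struct cd_packed) == 1 ? 1 : -1];
int packed_offset[offsetof (struct cd_packed, d) == 1 ? 1 : -1];
int natural_not_smaller[sizeof (struct cd_natural)
			>= sizeof (struct cd_packed) ? 1 : -1];
```

If a target hook raised the alignment of the packed struct after the cap was applied, the first two declarations would fail to compile, which is the symptom described in the PR.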

It is a bit unfortunate to be looking at a global from deep in the machinery
(although I did a quick grep and it seems that this would not be easily
fixable - several targets and other places do inspect
maximum_field_alignment).  I suppose we could add a parm indicating the
packed status and/or value.

Part of the motivation for a self-contained and Darwin-local solution is to
allow a backport to 10.x before it closes (since that's the last GCC branch
that can be built with native tools on the earlier boxes).

hopefully, I understood your point?
cheers
Iain




[PATCH] libatomic: x86_64: Always try ifunc

2023-06-03 Thread Xi Ruoyao via Gcc-patches
We used to skip the ifunc check when CX16 is available.  But now we use
CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16
alone is not a sufficient reason not to use ifunc (see PR104688).

This causes a subtle and annoying issue: when GCC is built with a
higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and
the worst (locked) implementation of __atomic_load_16 is always used.

There seems to be no good way to check if the CPU is Intel or AMD from
the built-in macros (maybe we can check every known model like __skylake,
__bdver2, ..., but it will be very error-prone and require an update
whenever we add support for a new x86 model).  The best thing we can
do seems to be "always try ifunc" here.

Bootstrapped and tested on x86_64-linux-gnu.  Ok for trunk?

libatomic/ChangeLog:

* configure.tgt: For x86_64, always set try_ifunc=yes.
---
 libatomic/configure.tgt | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index a92ae9e8309..39dd5686f2e 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -100,9 +100,7 @@ EOF
fi
cat > conftestx.c <

Re: [pushed] Darwin, PPC: Fix struct layout with pragma pack [PR110044].

2023-06-03 Thread Richard Biener via Gcc-patches



> On 02.06.2023 at 21:12, Iain Sandoe via Gcc-patches wrote:
> 
> @David: I am not sure what sets the ABI on AIX (for Darwin, it is effectively
> "whatever the system compiler [Apple gcc-4] does") but from an inspection of
> the code, it seems that (if the platform should honour #pragma pack) a similar
> effect could be present there too.
> 
> Tested on powerpc-apple-darwin9, powerpc64-linux-gnu and on i686 and x86_64
> Darwin.  Checked that the testcases also pass for Apple gcc-4.2.1.
> pushed to trunk, thanks
> Iain
> 
> --- 8< ---
> 
> This bug was essentially that darwin_rs6000_special_round_type_align()
> was ignoring externally-imposed capping of field alignment.
> 
> Signed-off-by: Iain Sandoe 
> 
>PR target/110044
> 
> gcc/ChangeLog:
> 
>* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
>Make sure that we do not have a cap on field alignment before altering
>the struct layout based on the type alignment of the first entry.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/powerpc/darwin-abi-13-0.c: New test.
>* gcc.target/powerpc/darwin-abi-13-1.c: New test.
>* gcc.target/powerpc/darwin-abi-13-2.c: New test.
>* gcc.target/powerpc/darwin-structs-0.h: New test.
> ---
> gcc/config/rs6000/rs6000.cc   |  3 +-
> .../gcc.target/powerpc/darwin-abi-13-0.c  | 23 +++
> .../gcc.target/powerpc/darwin-abi-13-1.c  | 27 +
> .../gcc.target/powerpc/darwin-abi-13-2.c  | 27 +
> .../gcc.target/powerpc/darwin-structs-0.h | 29 +++
> 5 files changed, 108 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/darwin-abi-13-0.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/darwin-abi-13-1.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/darwin-abi-13-2.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/darwin-structs-0.h
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 5b3b8b52e7e..42f49e4a56b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -8209,7 +8209,8 @@ darwin_rs6000_special_round_type_align (tree type, unsigned int computed,
>   type = TREE_TYPE (type);
>   } while (AGGREGATE_TYPE_P (type));
> 
> -  if (! AGGREGATE_TYPE_P (type) && type != error_mark_node)
> +  if (type != error_mark_node && ! AGGREGATE_TYPE_P (type)
> +  && ! TYPE_PACKED (type) && maximum_field_alignment == 0)

Just noticed while browsing mail.  'maximum_field_alignment' sounds like
something that should be factored in when computing align, but as written
there's no adjustment done instead?  Is there a way to get that to more
than BITS_PER_UNIT?

> align = MAX (align, TYPE_ALIGN (type));
> 
>   return align;
> diff --git a/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-0.c b/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-0.c
> new file mode 100644
> index 000..d8d3c63a083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-0.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target ilp32 } */
> +/* { dg-options "-Wno-long-long" } */
> +
> +#include "darwin-structs-0.h"
> +
> +int tcd[sizeof(cd) != 12 ? -1 : 1];
> +int acd[__alignof__(cd) != 4 ? -1 : 1];
> +
> +int sdc[sizeof(dc) != 16 ? -1 : 1];
> +int adc[__alignof__(dc) != 8 ? -1 : 1];
> +
> +int scL[sizeof(cL) != 12 ? -1 : 1];
> +int acL[__alignof__(cL) != 4 ? -1 : 1];
> +
> +int sLc[sizeof(Lc) != 16 ? -1 : 1];
> +int aLc[__alignof__(Lc) != 8 ? -1 : 1];
> +
> +int scD[sizeof(cD) != 32 ? -1 : 1];
> +int acD[__alignof__(cD) != 16 ? -1 : 1];
> +
> +int sDc[sizeof(Dc) != 32 ? -1 : 1];
> +int aDc[__alignof__(Dc) != 16 ? -1 : 1];
> diff --git a/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-1.c b/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-1.c
> new file mode 100644
> index 000..4d888d383fa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile { target powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target ilp32 } */
> +/* { dg-options "-Wno-long-long" } */
> +
> +#pragma pack(push, 1)
> +
> +#include "darwin-structs-0.h"
> +
> +int tcd[sizeof(cd) != 9 ? -1 : 1];
> +int acd[__alignof__(cd) != 1 ? -1 : 1];
> +
> +int sdc[sizeof(dc) != 9 ? -1 : 1];
> +int adc[__alignof__(dc) != 1 ? -1 : 1];
> +
> +int scL[sizeof(cL) != 9 ? -1 : 1];
> +int acL[__alignof__(cL) != 1 ? -1 : 1];
> +
> +int sLc[sizeof(Lc) != 9 ? -1 : 1];
> +int aLc[__alignof__(Lc) != 1 ? -1 : 1];
> +
> +int scD[sizeof(cD) != 17 ? -1 : 1];
> +int acD[__alignof__(cD) != 1 ? -1 : 1];
> +
> +int sDc[sizeof(Dc) != 17 ? -1 : 1];
> +int aDc[__alignof__(Dc) != 1 ? -1 : 1];
> +
> +#pragma pack(pop)
> diff --git a/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-2.c b/gcc/testsuite/gcc.target/powerpc/darwin-abi-13-2.c
> new file mode 100644
> index 000..3bd52c0a8f8
> --- /dev/null
> +++ 

[PATCH] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-03 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch optimizes the boolean evaluation of EQ/NE against zero
by adding two insn_and_split patterns similar to SImode conditional
store:

"eq_zero":
op0 = (op1 == 0) ? 1 : 0;
op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */

"movsicc_ne0_reg_0":
op0 = (op1 != 0) ? op2 : 0;
op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
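
The two SImode rewrites can be checked with a small C sketch. It assumes a count-leading-zeros operation that yields 32 for a zero input, which the ">> 5" form depends on; the portable clz32 helper below stands in for the hardware instruction, and all function names are made up for the example.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value, returning 32 for zero.  This
   stands in for the hardware instruction; C's __builtin_clz is
   undefined for a zero argument, so it is not used here.  */
static unsigned
clz32 (uint32_t x)
{
  unsigned n = 0;

  if (x == 0)
    return 32;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* "eq_zero": reference form and the clz-based form.  clz32 yields 32
   only for zero, and 32 >> 5 == 1 while 0..31 >> 5 == 0.  */
static uint32_t
eq_zero_ref (uint32_t op1)
{
  return op1 == 0 ? 1 : 0;
}

static uint32_t
eq_zero_opt (uint32_t op1)
{
  return clz32 (op1) >> 5;
}

/* Conditional move of OP2 or zero: reference form and the form that
   reuses OP1 as the zero source once it is known to be zero.  */
static uint32_t
movsicc_ref (uint32_t op1, uint32_t op2)
{
  return op1 != 0 ? op2 : 0;
}

static uint32_t
movsicc_opt (uint32_t op1, uint32_t op2)
{
  uint32_t op0 = op2;

  if (op1 == 0)
    op0 = op1;	/* op1 is zero here, so no separate constant is needed.  */
  return op0;
}
```

The second rewrite saves loading a zero constant: the only path that needs zero is the one where op1 already holds it.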

These also work in SFmode by ignoring the sign bit.  Furthermore, a
branch on EQ/NE against zero in SFmode is done in the same manner.

The reasons for this optimization in SFmode are:

  - Only zero values (negative or non-negative) have no 1 bits in either
    the exponent or the mantissa.
  - EQ/NE comparisons involving NaNs produce no signal even if the NaNs
    are signaling.
  - Even if the IEEE 754 single-precision floating-point coprocessor is
    configured (TARGET_HARD_FLOAT is true), a considerable number of
    instructions are still generated:
      1. Load a zero value into an FP register
      2. Possibly, an additional FP move if the comparison target is
         an address register
      3. FP equality check instruction
      4. Read the boolean register containing the result, or a
         conditional branch
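
A minimal C sketch of the integer-only zero test for single-precision values, assuming float is IEEE 754 binary32. It mirrors the sequence in the xtensa_expand_conditional_branch hunk of this patch (reinterpret the value as SImode, add it to itself, compare with zero); the function name sf_is_zero is made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if F is +0.0 or -0.0, using integer operations only.
   Assumes a 32-bit IEEE 754 float.  */
static int
sf_is_zero (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);	/* Reinterpret, like the SImode subreg.  */
  bits += bits;				/* Doubling shifts the sign bit out.  */
  return bits == 0;			/* Only exponent and mantissa remain.  */
}
```

NaNs and denormals have at least one exponent or mantissa bit set, so they compare as non-zero here, matching the result of f == 0.0f without touching the FP unit.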

gcc/ChangeLog:

* config/xtensa/predicates.md (const_float_0_operand):
Rename from obsolete "const_float_1_operand" and change the
constant to compare.
(cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
New.
* config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
Add code for EQ/NE comparison with constant zero in SFmode.
(xtensa_expand_scc): Add code to derive boolean evaluation
of EQ/NE with constant zero for comparison in SFmode.
(xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
zero inside "cbranchsf4" to 0.
* config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
Change "match_operator" and the third "match_operand" to the
ones mentioned above.
(movsicc_ne0_reg_zero, eq_zero): New.
---
 gcc/config/xtensa/predicates.md | 19 ++--
 gcc/config/xtensa/xtensa.cc | 43 ++
 gcc/config/xtensa/xtensa.md | 53 +
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index a3575a68892..d3b49e32fa4 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -155,11 +155,11 @@
&& CONSTANT_P (op)
&& GET_MODE_SIZE (mode) % UNITS_PER_WORD == 0")
 
-;; Accept the floating point constant 1 in the appropriate mode.
-(define_predicate "const_float_1_operand"
+;; Accept the floating point constant 0 in the appropriate mode.
+(define_predicate "const_float_0_operand"
   (match_code "const_double")
 {
-  return real_equal (CONST_DOUBLE_REAL_VALUE (op), );
+  return real_equal (CONST_DOUBLE_REAL_VALUE (op), );
 })
 
 (define_predicate "fpmem_offset_operand"
@@ -179,6 +179,13 @@
   return false;
 })
 
+(define_predicate "cstoresf_cbranchsf_operand"
+  (ior (and (match_test "TARGET_HARD_FLOAT")
+   (match_operand 0 "register_operand"))
+   (and (match_code "const_double")
+   (match_test "real_equal (CONST_DOUBLE_REAL_VALUE (op),
+)"
+
 (define_predicate "branch_operator"
   (match_code "eq,ne,lt,ge"))
 
@@ -197,6 +204,12 @@
 (define_predicate "xtensa_cstoresi_operator"
   (match_code "eq,ne,gt,ge,lt,le"))
 
+(define_predicate "cstoresf_cbranchsf_operator"
+  (ior (and (match_test "TARGET_HARD_FLOAT")
+   (match_operand 0 "comparison_operator"))
+   (and (match_test "!TARGET_HARD_FLOAT")
+   (match_operand 0 "boolean_operator"
+
 (define_predicate "xtensa_shift_per_byte_operator"
   (match_code "ashift,ashiftrt,lshiftrt"))
 
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3b5d25b660a..fefca3b11cd 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -865,6 +865,16 @@ xtensa_expand_conditional_branch (rtx *operands, machine_mode mode)
   switch (mode)
 {
 case E_SFmode:
+  if ((test_code == EQ || test_code == NE)
+ && const_float_0_operand (cmp1, SFmode))
+   {
+ emit_move_insn (cmp1 = gen_reg_rtx (SImode),
+ gen_rtx_SUBREG (SImode, cmp0, 0));
+ emit_insn (gen_addsi3 (cmp1, cmp1, cmp1));
+ cmp = gen_int_relational (test_code, cmp1, const0_rtx);
+ break;
+   }
+
   if (TARGET_HARD_FLOAT)
{
  cmp = gen_float_relational (test_code, cmp0, cmp1);
@@ -996,6 +1006,34 @@ xtensa_expand_scc (rtx operands[4], machine_mode cmp_mode)
   rtx one_tmp, zero_tmp;
   rtx (*gen_fn) (rtx, rtx, rtx, rtx, rtx);
 
+  if (cmp_mode == SFmode)
+{
+  if (const_float_0_operand (operands[3], SFmode))
+ 

Re: [Patch, fortran] PR37336 finalization

2023-06-03 Thread Steve Kargl via Gcc-patches
On Sat, Jun 03, 2023 at 07:50:19AM +0200, Thomas Koenig via Fortran wrote:
> Hi Paul,
> 
> > I propose to backport
> > r13-6747-gd7caf313525a46f200d7f5db1ba893f853774aee to 12-branch very
> > soon.
> 
> Is this something that we usually do?
> 
> While finalization was basically broken before, some people still used
> working subsets (or subsets that were broken, and they adapted or
> wrote their code accordingly).
> 
> What is the general opinion on that?  I'm undecided.
> 

I think a backport that fixes a bug that is a violation
of the Fortran standard is always okay.  A backport of anything
else is up to the discretion of the contributor.  If pault
or you or harald or ... want to backport a patch, after all
these years, I think we should trust their judgement.

-- 
Steve