date:20240412

RE: [PATCH v1] RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P

2024-04-12 Thread Li, Pan2

I just completed the full regression test for rv64imac; there are NO
new failures.

For FP_RETURN, I think the ABI side already imposes restrictions similar to
TARGET_HARD_FLOAT (as it stands, of course). For example, rv64imac cannot be
combined with lp64f/d, so FP_RETURN will be the right register because it is
GP_RETURN when the lp64 ABI is given.

Unfortunately this does not work for the V extension, as there is no V in the
ABI option. How about we refine this part to use TARGET_HARD_FLOAT after gcc-15
opens, as the current implementation looks somewhat fragile and implicit?
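
To make the point above concrete, here is a minimal sketch of the predicate
being discussed. It is a simplified model only: the TargetConfig struct, the
helper names and the register numbers are assumptions for illustration, not
the definitions in gcc/config/riscv, and the real hook checks register ranges
rather than single registers.

```cpp
// Simplified model of the return-register predicate discussed in this thread.
// All names and register numbers are illustrative assumptions.
struct TargetConfig {
  int units_per_fp_arg;  // 0 for a soft-float ABI such as lp64, 8 for lp64d
  bool target_vector;    // true only when the V extension is enabled
};

constexpr unsigned GP_RETURN = 10;     // assumed number for a0
constexpr unsigned FP_ARG_FIRST = 42;  // assumed number for fa0
constexpr unsigned V_RETURN = 104;     // assumed number for v8

// Mirrors: #define FP_RETURN (UNITS_PER_FP_ARG == 0 ? GP_RETURN : FP_ARG_FIRST)
inline unsigned fp_return(const TargetConfig &cfg) {
  return cfg.units_per_fp_arg == 0 ? GP_RETURN : FP_ARG_FIRST;
}

// The FP case is constrained implicitly through the ABI, while the vector
// case needs an explicit TARGET_VECTOR-style guard because no ABI option
// carries that information.
inline bool function_value_regno_p(const TargetConfig &cfg, unsigned regno) {
  if (regno == GP_RETURN)
    return true;
  if (regno == fp_return(cfg))
    return true;
  return cfg.target_vector && regno == V_RETURN;
}
```

With a soft-float ABI the FP register is never reported as a return register
because fp_return collapses to GP_RETURN, whereas the vector register is
rejected only thanks to the explicit guard.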

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Friday, April 12, 2024 6:58 PM
To: Kito Cheng 
Cc: juzhe.zh...@rivai.ai; gcc-patches 
Subject: RE: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
TARGET_FUNCTION_VALUE_REGNO_P

Sure thing. FP_RETURN depends only on ABI_xxx, as shown below:

#define FP_RETURN (UNITS_PER_FP_ARG == 0 ? GP_RETURN : FP_ARG_FIRST)

I added some tests for the rv32/64imac options, but they don't cover every test
case without the f/d extensions; I will give it a try and keep you posted.

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Friday, April 12, 2024 4:56 PM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; gcc-patches 
Subject: Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
TARGET_FUNCTION_VALUE_REGNO_P

Does the FP reg also need to be guarded with TARGET_HARD_FLOAT? Could you try to
compile that case without F?

On Fri, Apr 12, 2024 at 2:19 PM Li, Pan2  wrote:
>
> Committed, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> Sent: Friday, April 12, 2024 2:11 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: kito.cheng ; Li, Pan2 
> Subject: Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
> TARGET_FUNCTION_VALUE_REGNO_P
>
>
>
> LGTM.
>
>
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2024-04-12 14:08
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; Pan Li
>
> Subject: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
> TARGET_FUNCTION_VALUE_REGNO_P
>
> From: Pan Li 
>
>
>
> This patch fixes an ICE in the TARGET_FUNCTION_VALUE_REGNO_P hook
> implementation when vector is not enabled.  The vector regno is
> available if and only if TARGET_VECTOR is true.  The previous
> implementation missed this condition, which resulted in an ICE when
> building with rv64gc and no vector extension.
>
>
>
> PR target/114639
>
>
>
> The below test suite is passed for this patch.
>
>
>
> * The rv64gcv fully regression tests.
>
> * The rv64gc fully regression tests.
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv.cc (riscv_function_value_regno_p): Add
>
> TARGET_VECTOR predicate for V_RETURN regno.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/pr114639-1.c: New test.
>
> * gcc.target/riscv/pr114639-2.c: New test.
>
> * gcc.target/riscv/pr114639-3.c: New test.
>
> * gcc.target/riscv/pr114639-4.c: New test.
>
>
>
> Signed-off-by: Pan Li 
>
> ---
>
> gcc/config/riscv/riscv.cc   |  2 +-
>
> gcc/testsuite/gcc.target/riscv/pr114639-1.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-2.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-3.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-4.c | 11 +++
>
> 5 files changed, 45 insertions(+), 1 deletion(-)
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-4.c
>
>
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 91f017dd52a..e5f00806bb9 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -11008,7 +11008,7 @@ riscv_function_value_regno_p (const unsigned regno)
>    if (FP_RETURN_FIRST <= regno && regno <= FP_RETURN_LAST)
>      return true;
>
> -  if (regno == V_RETURN)
> +  if (TARGET_VECTOR && regno == V_RETURN)
>      return true;
>
>    return false;
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> new file mode 100644
>
> index 000..f41723193a4
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64gc -mabi=lp64d -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> new file mode 100644
>
> index 000..0c402c4b254
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64imac -mabi=lp64 -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
>

Re: [PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Andrew Carlotti

On Fri, Apr 12, 2024 at 06:00:24PM +0100, Andrew Carlotti wrote:
> On Fri, Apr 12, 2024 at 04:49:03PM +0100, Richard Sandiford wrote:
> > Andrew Carlotti  writes:
> > > We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
> > > one will require extending the feature bitmask).  Instead, make the
> > > FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
> > > specified.  On the other hand, we already have a +rcpc flag, so this
> > > dependency can be specified directly.
> > >
> > > The cpunative test needed updating because it used an invalid Features
> > > list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
> > > Without this change, host_detect_local_cpu would return the architecture
> > > string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-option-extensions.def: Add RCPC to
> > >   RCPC3 dependencies.
> > >   * config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
> > >   RCPC3 bit
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
> > >
> > > ---
> > >
> > > Bootstrapped and regression tested on aarch64.  I also verified that the
> > > > atomic-store.c and ldapr-sext.c tests would pass when replacing
> > > > 'armv8.4-a' with 'armv8-a+rcpc3'.
> > >
> > > Ok for master?
> > >
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> > > b/gcc/config/aarch64/aarch64-option-extensions.def
> > > index 
> > > 3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4
> > >  100644
> > > --- a/gcc/config/aarch64/aarch64-option-extensions.def
> > > +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> > > @@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
> > >  
> > >  AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> > >  
> > > -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
> > > +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
> > >  
> > >  AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> > >  
> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > > index 
> > > 45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54
> > >  100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
> > > AARCH64_FL_SM_OFF;
> > >  #define AARCH64_ISA_SHA3(aarch64_isa_flags & AARCH64_FL_SHA3)
> > >  #define AARCH64_ISA_F16FML  (aarch64_isa_flags & 
> > > AARCH64_FL_F16FML)
> > >  #define AARCH64_ISA_RCPC(aarch64_isa_flags & AARCH64_FL_RCPC)
> > > -#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags & 
> > > AARCH64_FL_V8_4A)
> > > +#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags \
> > > + & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))
> > 
> > It looks like the effect of these two changes is that:
> > 
> > * armv9-a+rcpc3+norcpc leaves TARGET_RCPC2 true and TARGET_RCPC and
> >   TARGET_RCPC3 false.
> > 
> > * armv8-a+rcpc3+norcpc correctly leaves all three false.
> > 
> > If we add the RCPC3->RCPC dependency then I think we should also
> > require FL_RCPC alongside FL_V8_4A.  I.e.:
> > 
> > #define AARCH64_ISA_RCPC8_4 (AARCH64_ISA_RCPC \
> >  && (aarch64_isa_flags \
> >  & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3)))
> 
> Good spot! I'll go with the following instead (for formatting reasons), if it
> passes testing:
> 
> > #define AARCH64_ISA_RCPC8_4 ((AARCH64_ISA_RCPC && AARCH_ISA_V8_4A) \
> >                              || (aarch64_isa_flags & AARCH64_FL_RCPC3))

I missed the 64 in AARCH64_ISA_V8_4A.  The corrected version passed testing and
is now merged.
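
For anyone following along, the three formulations of the macro discussed in
this thread can be compared with a small sketch.  The flag bit values and the
function names here are hypothetical (the real AARCH64_FL_* masks differ), and
the armv9-a+rcpc3+norcpc case is modelled as "only the V8_4A bit left set",
which is my reading of Richard's description rather than something taken from
the option-handling code.

```cpp
#include <cstdint>

// Hypothetical flag bits; the real AARCH64_FL_* values differ.
constexpr std::uint64_t FL_RCPC = 1u << 0;
constexpr std::uint64_t FL_V8_4A = 1u << 1;
constexpr std::uint64_t FL_RCPC3 = 1u << 2;

// As first posted: enabled by armv8.4-a or +rcpc3.
inline bool rcpc8_4_posted(std::uint64_t flags) {
  return (flags & (FL_V8_4A | FL_RCPC3)) != 0;
}

// As suggested in review: additionally require the RCPC bit.
inline bool rcpc8_4_review(std::uint64_t flags) {
  return (flags & FL_RCPC) != 0 && (flags & (FL_V8_4A | FL_RCPC3)) != 0;
}

// As reformulated in the follow-up, with the V8_4A name corrected.
inline bool rcpc8_4_followup(std::uint64_t flags) {
  return ((flags & FL_RCPC) != 0 && (flags & FL_V8_4A) != 0)
         || (flags & FL_RCPC3) != 0;
}
```

The review and follow-up forms give the same answer for every flag set in
which RCPC3 implies RCPC, which is what the new dependency is meant to
guarantee; only the originally posted form stays true once +norcpc has cleared
both RCPC bits.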

> > OK with that change, thanks.
> > 
> > Richard
> > 
> > 
> > >  #define AARCH64_ISA_RNG (aarch64_isa_flags & AARCH64_FL_RNG)
> > >  #define AARCH64_ISA_V8_5A   (aarch64_isa_flags & 
> > > AARCH64_FL_V8_5A)
> > >  #define AARCH64_ISA_TME (aarch64_isa_flags & AARCH64_FL_TME)
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 
> > > b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > > index 
> > > 8d3c16a10910af977c560782f9d659c0e51286fd..3c64e00ca3a416ef565bc0b4a5b3e5bd9cfc41bc
> > >  100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > > @@ -1,8 +1,8 @@
> > >  processor: 0
> > >  BogoMIPS : 100.00
> > > -Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
> > > +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc 
> > > ilrcpc lrcpc3
> > >  CPU implementer  : 0xfe
> > >  CPU architecture: 8
> > >  CPU variant  : 0x0
> > >  CPU part : 0xd08
> > > -CPU revision : 2
> > > \ No newline at end of file
> > > +CPU

[PATCH] x86: Allow TImode offsettable memory only with 8-bit constant

2024-04-12 Thread H.J. Lu

The x86 instruction size limit is 15 bytes.  If a NDD instruction has
a segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte,
a 4-byte displacement and a 4-byte immediate, adding an address size
prefix will exceed the size limit.  Change TImode ADD, AND, OR and XOR
to allow offsettable memory only with 8-bit signed integer constant,
which is encoded with a 1-byte immediate, if the address size prefix
is used.
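
The byte budget described above can be tallied with a short sketch.  It
counts only the components enumerated in this message (anything not listed,
such as the opcode byte itself, is deliberately left out), and the constant
and function names are made up for illustration; this is not an encoder.

```cpp
// Byte counts for the components listed in the message above.
constexpr int kMaxInsnBytes = 15;
constexpr int kSegmentPrefix = 1;
constexpr int kOpcodePrefix = 4;
constexpr int kModRM = 1;
constexpr int kSIB = 1;
constexpr int kDisp32 = 4;
constexpr int kAddrSizePrefix = 1;

// Sum of the listed components for a given immediate width, optionally
// including the address size prefix.
constexpr int listed_length(int imm_bytes, bool addr_size_prefix) {
  return kSegmentPrefix + kOpcodePrefix + kModRM + kSIB + kDisp32 + imm_bytes
         + (addr_size_prefix ? kAddrSizePrefix : 0);
}

// A 4-byte immediate already uses the whole budget, so one more prefix byte
// exceeds it; a 1-byte immediate leaves room for the address size prefix.
static_assert(listed_length(4, false) == kMaxInsnBytes, "listed parts sum to 15");
static_assert(listed_length(4, true) > kMaxInsnBytes, "prefix pushes past limit");
static_assert(listed_length(1, true) <= kMaxInsnBytes, "imm8 form still fits");
```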

gcc/

PR target/114696
* config/i386/i386.md (isa): Add apx_ndd_64.
(enabled): Likewise.
(*add3_doubleword): Change rjO to r,ro,jO with 8-bit
signed integer constant and enable jO only for apx_ndd_64.
(*add3_doubleword_cc_overflow_1): Likewise.
(*and3_doubleword): Likewise.
(*3_doubleword): Likewise.

gcc/testsuite/

PR target/114696
* gcc.target/i386/apx-ndd-x32-2a.c: New test.
* gcc.target/i386/apx-ndd-x32-2b.c: Likewise.
* gcc.target/i386/apx-ndd-x32-2c.c: Likewise.
* gcc.target/i386/apx-ndd-x32-2d.c: Likewise.
---
 gcc/config/i386/i386.md   | 36 ++-
 .../gcc.target/i386/apx-ndd-x32-2a.c  | 13 +++
 .../gcc.target/i386/apx-ndd-x32-2b.c  |  6 
 .../gcc.target/i386/apx-ndd-x32-2c.c  |  6 
 .../gcc.target/i386/apx-ndd-x32-2d.c  |  6 
 5 files changed, 50 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-x32-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-x32-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-x32-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-x32-2d.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d4ce3809e6d..adab1ef9e04 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -568,7 +568,7 @@ (define_attr "unit" "integer,i387,sse,mmx,unknown"
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-   x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
+   x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,apx_ndd_64,
sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
@@ -968,6 +968,8 @@ (define_attr "enabled" ""
   (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
 (eq_attr "isa" "apx_ndd")
   (symbol_ref "TARGET_APX_NDD")
+(eq_attr "isa" "apx_ndd_64")
+  (symbol_ref "TARGET_APX_NDD && Pmode == DImode")
 (eq_attr "isa" "vaes_avx512vl")
   (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
 
@@ -6302,10 +6304,10 @@ (define_expand "add3"
 })
 
 (define_insn_and_split "*add3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,,,")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,")
(plus:
- (match_operand: 1 "nonimmediate_operand" "%0,0,ro,rjO,r")
- (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,,r")))
+ (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r,ro,jO,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,,K,,r")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
   "#"
@@ -6344,7 +6346,7 @@ (define_insn_and_split "*add3_doubleword"
   DONE;
 }
 }
-[(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd")])
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd_64,apx_ndd")])
 
 (define_insn_and_split "*add3_doubleword_zext"
   [(set (match_operand: 0 "nonimmediate_operand" "=r,o,,")
@@ -9515,10 +9517,10 @@ (define_insn_and_split 
"*add3_doubleword_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
(compare:CCC
  (plus:
-   (match_operand: 1 "nonimmediate_operand" "%0,0,ro,rjO,r")
-   (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,,o"))
+   (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r,ro,jO,r")
+   (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,,K,,o"))
  (match_dup 1)))
-   (set (match_operand: 0 "nonimmediate_operand" "=ro,r,,,")
+   (set (match_operand: 0 "nonimmediate_operand" "=ro,r,")
(plus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
   "#"
@@ -9560,7 +9562,7 @@ (define_insn_and_split 
"*add3_doubleword_cc_overflow_1"
   else
 operands[6] = gen_rtx_ZERO_EXTEND (mode, operands[5]);
 }
-[(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd")])
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd_64,apx_ndd")])
 
 ;; x == 0 with zero flag test can be done also as x < 1U with carry flag
 ;; test, where the latter is preferrable if we have some carry consuming
@@ -11704,10 +11706,10 @@ (define_expand "and3"
 })
 
 (define_insn_and_split "*and3_doubleword"

Re: [nvptx PATCH] Correct pattern for popcountdi2 insn in nvptx.md.

2024-04-12 Thread Thomas Schwinge

Hi Roger!

On 2023-01-09T13:29:14+, "Roger Sayle"  wrote:
> The result of a POPCOUNT operation in RTL should have the same mode
> as its operand.  This corrects the specification of popcount in
> the nvptx backend, splitting the current generic define_insn into
> two, one for popcountsi2 and the other for popcountdi2 (the latter
> with an explicit truncate).
>
> This patch has been tested on nvptx-none (hosted on x86_64-pc-linux-gnu)
> with make and make -k check with no new failures.  This functionality is
> already tested by gcc.target/nvptx/popc-[123].c.

So I compared '-fdump-rtl-all' and '*.s' of current vs. patched for those
three '*.c' files.  It is expected that I only see '(popcount:SI [DI])'
-> '(truncate:SI (popcount:DI [DI]))', but not any actually observable
change, right?
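
As a sanity check on why I would not expect an observable difference, here is
a small sketch in plain C++ (not RTL): a population count of a 64-bit value is
at most 64, so truncating the 64-bit count to 32 bits loses nothing.  The
function names are made up for illustration and only stand in for the two
patterns.

```cpp
#include <bitset>
#include <cstdint>

// SImode-style popcount: 32-bit operand, 32-bit result.
inline std::uint32_t popcount_si(std::uint32_t x) {
  return static_cast<std::uint32_t>(std::bitset<32>(x).count());
}

// DImode-style popcount: the count is formed at 64-bit width and then
// explicitly truncated, loosely mirroring (truncate:SI (popcount:DI ...)).
inline std::uint32_t popcount_di(std::uint64_t x) {
  std::uint64_t wide = std::bitset<64>(x).count();
  return static_cast<std::uint32_t>(wide);
}
```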

Shouldn't the current erroneous form trigger a '--enable-checking=rtl'
error?

> Ok for mainline?

OK, thanks.


..., and sorry for the great delay!  The chaos that came upon my group
half a year ago, and resulted in having had to switch employers, has not
exactly helped to allow allocating proper time for better learning GCC
back end.  But, fortunately, we've been able to switch employers!


Grüße
 Thomas


> 2023-01-09  Roger Sayle  
>
> gcc/ChangeLog
>   * config/nvptx/nvptx.md (popcount2): Split into...
>   (popcountsi2): define_insn handling SImode popcount.
>   (popcountdi2): define_insn handling DImode popcount, with an
>   explicit truncate:SI to produce an SImode result.
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 740c4de..461540e 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -658,11 +658,18 @@
>DONE;
>  })
>  
> -(define_insn "popcount2"
> +(define_insn "popcountsi2"
>[(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> - (popcount:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
> + (popcount:SI (match_operand:SI 1 "nvptx_register_operand" "R")))]
>""
> -  "%.\\tpopc.b%T1\\t%0, %1;")
> +  "%.\\tpopc.b32\\t%0, %1;")
> +
> +(define_insn "popcountdi2"
> +  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> + (truncate:SI
> +   (popcount:DI (match_operand:DI 1 "nvptx_register_operand" "R"))))]
> +  ""
> +  "%.\\tpopc.b64\\t%0, %1;")
>  
>  ;; Multiplication variants
>

Re: [PATCH] libstdc++: Update some baseline_symbols.txt (x32)

2024-04-12 Thread Jonathan Wakely

On Fri, 12 Apr 2024, 21:51 H.J. Lu,  wrote:

> * config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt:
> Updated.
>

OK thanks


---
>  .../abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt  | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git
> a/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
> b/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
> index dc69c47f4d7..ac11d5dba4d 100644
> ---
> a/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
> +++
> b/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
> @@ -497,6 +497,7 @@ FUNC:_ZNKSt11__timepunctIwE7_M_daysEPPKw@@GLIBCXX_3.4
>  FUNC:_ZNKSt11__timepunctIwE8_M_am_pmEPPKw@@GLIBCXX_3.4
>  FUNC:_ZNKSt11__timepunctIwE9_M_monthsEPPKw@@GLIBCXX_3.4
>  FUNC:_ZNKSt11logic_error4whatEv@@GLIBCXX_3.4
> +FUNC:_ZNKSt12__basic_fileIcE13native_handleEv@@GLIBCXX_3.4.33
>  FUNC:_ZNKSt12__basic_fileIcE7is_openEv@@GLIBCXX_3.4
>
>  
> FUNC:_ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEcvbEv@
> @GLIBCXX_3.4.31
>
>  
> FUNC:_ZNKSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE2EEcvbEv@
> @GLIBCXX_3.4.31
> @@ -3214,6 +3215,7 @@
> FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@
> @GLIBCX
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEjjPKcj@
> @GLIBCXX_3.4.21
>  FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_S_compareEjj@
> @GLIBCXX_3.4.21
>  FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_M_capacityEj@
> @GLIBCXX_3.4.21
>
> +FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_S_allocateERS3_j@
> @GLIBCXX_3.4.32
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcOS3_@
> @GLIBCXX_3.4.23
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcRKS3_@
> @GLIBCXX_3.4.21
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC2EPcOS3_@
> @GLIBCXX_3.4.23
> @@ -3366,6 +3368,7 @@
> FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_disposeEv@
> @GLIBCX
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_replaceEjjPKwj@
> @GLIBCXX_3.4.21
>  FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_S_compareEjj@
> @GLIBCXX_3.4.21
>  FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_M_capacityEj@
> @GLIBCXX_3.4.21
>
> +FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_S_allocateERS3_j@
> @GLIBCXX_3.4.32
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwOS3_@
> @GLIBCXX_3.4.23
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwRKS3_@
> @GLIBCXX_3.4.21
>
>  
> FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC2EPwOS3_@
> @GLIBCXX_3.4.23
> @@ -4531,6 +4534,7 @@ FUNC:__cxa_allocate_exception@@CXXABI_1.3
>  FUNC:__cxa_bad_cast@@CXXABI_1.3
>  FUNC:__cxa_bad_typeid@@CXXABI_1.3
>  FUNC:__cxa_begin_catch@@CXXABI_1.3
> +FUNC:__cxa_call_terminate@@CXXABI_1.3.15
>  FUNC:__cxa_call_unexpected@@CXXABI_1.3
>  FUNC:__cxa_current_exception_type@@CXXABI_1.3
>  FUNC:__cxa_deleted_virtual@@CXXABI_1.3.6
> @@ -4574,6 +4578,7 @@ OBJECT:0:CXXABI_1.3.11
>  OBJECT:0:CXXABI_1.3.12
>  OBJECT:0:CXXABI_1.3.13
>  OBJECT:0:CXXABI_1.3.14
> +OBJECT:0:CXXABI_1.3.15
>  OBJECT:0:CXXABI_1.3.2
>  OBJECT:0:CXXABI_1.3.3
>  OBJECT:0:CXXABI_1.3.4
> @@ -4611,6 +4616,7 @@ OBJECT:0:GLIBCXX_3.4.3
>  OBJECT:0:GLIBCXX_3.4.30
>  OBJECT:0:GLIBCXX_3.4.31
>  OBJECT:0:GLIBCXX_3.4.32
> +OBJECT:0:GLIBCXX_3.4.33
>  OBJECT:0:GLIBCXX_3.4.4
>  OBJECT:0:GLIBCXX_3.4.5
>  OBJECT:0:GLIBCXX_3.4.6
> --
> 2.44.0
>
>

Re: [PATCH v3] c++: ICE with temporary of class type in array DMI [PR109966]

2024-04-12 Thread Marek Polacek

On Fri, Apr 12, 2024 at 04:15:45PM -0400, Jason Merrill wrote:
> On 3/14/24 17:26, Marek Polacek wrote:
> > 
> > In the following patch, I'm taking a different tack.  I believe
> > we ought to use TARGET_EXPR_ELIDING_P.  The gimplify_arg bit I'm
> > talking about below is this:
> > 
> >/* Also strip a TARGET_EXPR that would force an extra copy.  */
> >if (TREE_CODE (*arg_p) == TARGET_EXPR)
> >  {
> >tree init = TARGET_EXPR_INITIAL (*arg_p);
> >if (init
> >&& !VOID_TYPE_P (TREE_TYPE (init)))
> >  *arg_p = init;
> >  }
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?
> > 
> > -- >8 --
> > This ICE started with the fairly complicated r13-765.  We crash in
> > gimplify_var_or_parm_decl because a stray VAR_DECL leaked there.
> > The problem is ultimately that potential_prvalue_result_of wasn't
> > correctly handling arrays and replace_placeholders_for_class_temp_r
> > replaced a PLACEHOLDER_EXPR in a TARGET_EXPR which is used in the
> > context of copy elision.  If I have
> > 
> >M m[2] = { M{""}, M{""} };
> > 
> > then we don't invoke the M(const M&) copy-ctor.
> > 
> > One part of the fix is to use TARGET_EXPR_ELIDING_P rather than
> > potential_prvalue_result_of.  That unfortunately doesn't handle the
> > case like
> > 
> >struct N { N(M); };
> >N arr[2] = { M{""}, M{""} };
> > 
> > because TARGET_EXPRs that initialize a function argument are not
> > marked TARGET_EXPR_ELIDING_P even though gimplify_arg drops such
> > TARGET_EXPRs on the floor.  We can use a pset to avoid replacing
> > placeholders in them.
> > 
> > I made an attempt to use set_target_expr_eliding in
> > convert_for_arg_passing but that regressed constexpr-diag1.C, and does
> > not seem like a prudent change in stage 4 anyway.
> 
> I tried the same thing to see what you mean, and that doesn't look like a
> regression to me, just a different (and more accurate) diagnostic.
> 
> But you're right that this patch is safer, and the other approach can wait
> for stage 1.  Will you queue that up?  In the mean time, this patch is OK.

Yeah, happy to; I've opened 114707 to remember.
 
> > > I just realized this could also check !TARGET_EXPR_ELIDING_P; there's no
> > > point to adding an eliding TARGET_EXPR into the pset.
> 
> ...with this change.

Thanks.

Marek

[PATCH] libstdc++: Update some baseline_symbols.txt (x32)

2024-04-12 Thread H.J. Lu

* config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt:
Updated.
---
 .../abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt  | 6 ++
 1 file changed, 6 insertions(+)

diff --git 
a/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
index dc69c47f4d7..ac11d5dba4d 100644
--- a/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
@@ -497,6 +497,7 @@ FUNC:_ZNKSt11__timepunctIwE7_M_daysEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11__timepunctIwE8_M_am_pmEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11__timepunctIwE9_M_monthsEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11logic_error4whatEv@@GLIBCXX_3.4
+FUNC:_ZNKSt12__basic_fileIcE13native_handleEv@@GLIBCXX_3.4.33
 FUNC:_ZNKSt12__basic_fileIcE7is_openEv@@GLIBCXX_3.4
 
FUNC:_ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEcvbEv@@GLIBCXX_3.4.31
 
FUNC:_ZNKSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE2EEcvbEv@@GLIBCXX_3.4.31
@@ -3214,6 +3215,7 @@ 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@@GLIBCX
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEjjPKcj@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_S_compareEjj@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_M_capacityEj@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_S_allocateERS3_j@@GLIBCXX_3.4.32
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcOS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC2EPcOS3_@@GLIBCXX_3.4.23
@@ -3366,6 +3368,7 @@ 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_disposeEv@@GLIBCX
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_replaceEjjPKwj@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_S_compareEjj@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_M_capacityEj@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_S_allocateERS3_j@@GLIBCXX_3.4.32
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwOS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC2EPwOS3_@@GLIBCXX_3.4.23
@@ -4531,6 +4534,7 @@ FUNC:__cxa_allocate_exception@@CXXABI_1.3
 FUNC:__cxa_bad_cast@@CXXABI_1.3
 FUNC:__cxa_bad_typeid@@CXXABI_1.3
 FUNC:__cxa_begin_catch@@CXXABI_1.3
+FUNC:__cxa_call_terminate@@CXXABI_1.3.15
 FUNC:__cxa_call_unexpected@@CXXABI_1.3
 FUNC:__cxa_current_exception_type@@CXXABI_1.3
 FUNC:__cxa_deleted_virtual@@CXXABI_1.3.6
@@ -4574,6 +4578,7 @@ OBJECT:0:CXXABI_1.3.11
 OBJECT:0:CXXABI_1.3.12
 OBJECT:0:CXXABI_1.3.13
 OBJECT:0:CXXABI_1.3.14
+OBJECT:0:CXXABI_1.3.15
 OBJECT:0:CXXABI_1.3.2
 OBJECT:0:CXXABI_1.3.3
 OBJECT:0:CXXABI_1.3.4
@@ -4611,6 +4616,7 @@ OBJECT:0:GLIBCXX_3.4.3
 OBJECT:0:GLIBCXX_3.4.30
 OBJECT:0:GLIBCXX_3.4.31
 OBJECT:0:GLIBCXX_3.4.32
+OBJECT:0:GLIBCXX_3.4.33
 OBJECT:0:GLIBCXX_3.4.4
 OBJECT:0:GLIBCXX_3.4.5
 OBJECT:0:GLIBCXX_3.4.6
-- 
2.44.0

Re: [PATCH] rs6000: Add OPTION_MASK_POWER8 [PR101865]

2024-04-12 Thread Peter Bergner

On 4/11/24 11:23 PM, Peter Bergner wrote:
> I'll make the changes above, modulo leaving the option name unchanged until
> we hear from Segher on that and report back on the LE and BE testing.

I made all of the requested changes and went with -mpower8-internal since
Segher was fine with that (offline) along with the helpful Warn message.

Testing was clean on both LE and BE, so I pushed the changes.
I'll let things bake on trunk for a bit before pushing the backports.

Thanks!

Peter

Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-04-12 Thread Patrick Palka

On Fri, 12 Apr 2024, Jason Merrill wrote:

> On 3/26/24 09:44, Patrick Palka wrote:
> > On Thu, 7 Mar 2024, Jason Merrill wrote:
> > 
> > > On 1/29/24 17:42, Patrick Palka wrote:
> > > > On Mon, 29 Jan 2024, Patrick Palka wrote:
> > > > 
> > > > > On Fri, 26 Jan 2024, Jason Merrill wrote:
> > > > > 
> > > > > > On 1/26/24 17:11, Jason Merrill wrote:
> > > > > > > On 1/26/24 16:52, Jason Merrill wrote:
> > > > > > > > On 1/25/24 14:18, Patrick Palka wrote:
> > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
> > > > > > > > > look
> > > > > > > > > OK for trunk/13?  This isn't a very satisfactory fix, but at
> > > > > > > > > least
> > > > > > > > > it safely fixes these testcases I guess.  Note that there's
> > > > > > > > > implementation disagreement about the second testcase, GCC
> > > > > > > > > always
> > > > > > > > > accepted it but Clang/MSVC/icc reject it.
> > > > > > > > 
> > > > > > > > > Because of trying to initialize int& from {c}; removing the
> > > > > > > > > extra braces makes it work everywhere.
> > > > > > > > 
> > > > > > > > https://eel.is/c++draft/dcl.init#list-3.10 says that we always
> > > > > > > > generate a
> > > > > > > > prvalue in this case, so perhaps we shouldn't recalculate if the
> > > > > > > > initializer is an init-list?
> > > > > > > 
> > > > > > > ...but it seems bad to silently bind a const int& to a prvalue
> > > > > > > instead
> > > > > > > of
> > > > > > > directly to the reference returned by the operator, as clang does
> > > > > > > if
> > > > > > > we add
> > > > > > > const to the second testcase, so I think there's a defect in the
> > > > > > > standard
> > > > > > > here.
> > > > > > 
> > > > > > Perhaps bullet 3.9 should change to "...its referenced type is
> > > > > > reference-related to E or scalar, ..."
> > > > > > 
> > > > > > > Maybe for now also disable the maybe_valid heuristics in the case
> > > > > > > of
> > > > > > > an
> > > > > > > init-list?
> > > > > > > 
> > > > > > > > The first testcase is special because it's a C-style cast; seems
> > > > > > > > like the
> > > > > > > > maybe_valid = false heuristics should be disabled if c_cast_p.
> > > > > 
> > > > > Thanks a lot for the pointers.  IIUC c_cast_p and
> > > > > LOOKUP_SHORTCUT_BAD_CONVS
> > > > > should already be mutually exclusive, since the latter is set only
> > > > > when
> > > > > computing argument conversions, so it shouldn't be necessary to check
> > > > > c_cast_p.
> > > > > 
> > > > > I suppose we could disable the heuristic for init-lists, but after
> > > > > some
> > > > > digging I noticed that the heuristics were originally in same spot
> > > > > they
> > > > > are now until r5-601-gd02f620dc0bb3b moved them to get checked after
> > > > > the recursive recalculation case in reference_binding, returning a bad
> > > > > conversion instead of NULL.  (Then in r13-1755-g68f37670eff0b872 I
> > > > > moved
> > > > > them back; IIRC that's why I felt confident that moving the checks was
> > > > > safe.)
> > > > > Thus we didn't always accept the second testcase, we only started
> > > > > doing so
> > > > > in
> > > > > GCC 5: https://godbolt.org/z/6nsEW14fh (sorry for missing this and
> > > > > saying
> > > > > we
> > > > > always accepted it)
> > > > > 
> > > > > And indeed the current order of checks seems consistent with that of
> > > > > [dcl.init.ref]/5.  So I wonder if we don't instead want to "complete"
> > > > > the NULL-to-bad-conversion adjustment in r5-601-gd02f620dc0bb3b and
> > > > > do:
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   * call.cc (reference_binding): Set bad_p according to
> > > > >   maybe_valid_p in the recursive case as well.  Remove
> > > > >   redundant gcc_assert.
> > > > > 
> > > > > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > > > > index 9de0d77c423..c4158b2af37 100644
> > > > > --- a/gcc/cp/call.cc
> > > > > +++ b/gcc/cp/call.cc
> > > > > @@ -2033,8 +2033,8 @@ reference_binding (tree rto, tree rfrom, tree
> > > > > expr,
> > > > > bool c_cast_p, int flags,
> > > > >  sflags, complain);
> > > > >   if (!new_second)
> > > > > return bad_direct_conv ? bad_direct_conv : nullptr;
> > > > > + t->bad_p = !maybe_valid_p;
> > > > 
> > > > Oops, that should be |= not =.
> > > > 
> > > > >   conv = merge_conversion_sequences (t, new_second);
> > > > > - gcc_assert (maybe_valid_p || conv->bad_p);
> > > > >   return conv;
> > > > > }
> > > > >}
> > > > > 
> > > > > This'd mean we'd go back to rejecting the second testcase (only the
> > > > > call, not the direct-init, interestingly enough), but that seems to be
> > > > 
> > > > In the second testcase, with the above fix initialize_reference silently
> > > > returns error_mark_node for the

Re: [PATCH v3] c++: ICE with temporary of class type in array DMI [PR109966]

2024-04-12 Thread Jason Merrill


On 3/14/24 17:26, Marek Polacek wrote:


In the following patch, I'm taking a different tack.  I believe
we ought to use TARGET_EXPR_ELIDING_P.  The gimplify_arg bit I'm
talking about below is this:

   /* Also strip a TARGET_EXPR that would force an extra copy.  */
   if (TREE_CODE (*arg_p) == TARGET_EXPR)
 {
   tree init = TARGET_EXPR_INITIAL (*arg_p);
   if (init
   && !VOID_TYPE_P (TREE_TYPE (init)))
 *arg_p = init;
 }

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --
This ICE started with the fairly complicated r13-765.  We crash in
gimplify_var_or_parm_decl because a stray VAR_DECL leaked there.
The problem is ultimately that potential_prvalue_result_of wasn't
correctly handling arrays and replace_placeholders_for_class_temp_r
replaced a PLACEHOLDER_EXPR in a TARGET_EXPR which is used in the
context of copy elision.  If I have

   M m[2] = { M{""}, M{""} };

then we don't invoke the M(const M&) copy-ctor.
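
The elision described above can be observed in a small standalone program. This is only a sketch: the struct M below is a hypothetical stand-in with a copy counter (not the type from the PR testcase), and count_copies_for_array_init is an illustrative name. C++17 guarantees the elision; earlier standards merely permit it.

```cpp
#include <cassert>

// Hypothetical stand-in for the M in the example above, instrumented so
// that every copy-constructor call is counted.
struct M {
  static int copies;
  const char* s;
  M(const char* str) : s(str) {}
  M(const M& other) : s(other.s) { ++copies; }
};
int M::copies = 0;

// Initializes each array element directly from a prvalue M{""} and
// reports how many times M(const M&) ran while doing so.
int count_copies_for_array_init() {
  M::copies = 0;
  M m[2] = { M{""}, M{""} };
  (void)m;  // silence unused-variable warnings
  return M::copies;
}

// Small self-check: no copies are expected when the prvalues are elided.
void check_array_init_elision() {
  assert(count_copies_for_array_init() == 0);
}
```

Each M{""} initializes its array element in place, which is why a TARGET_EXPR used this way must not be treated as a separate temporary.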

One part of the fix is to use TARGET_EXPR_ELIDING_P rather than
potential_prvalue_result_of.  That unfortunately doesn't handle the
case like

   struct N { N(M); };
   N arr[2] = { M{""}, M{""} };

because TARGET_EXPRs that initialize a function argument are not
marked TARGET_EXPR_ELIDING_P even though gimplify_arg drops such
TARGET_EXPRs on the floor.  We can use a pset to avoid replacing
placeholders in them.

I made an attempt to use set_target_expr_eliding in
convert_for_arg_passing but that regressed constexpr-diag1.C, and does
not seem like a prudent change in stage 4 anyway.


I tried the same thing to see what you mean, and that doesn't look like 
a regression to me, just a different (and more accurate) diagnostic.


But you're right that this patch is safer, and the other approach can 
wait for stage 1.  Will you queue that up?  In the meantime, this patch 
is OK.



I just realized this could also check !TARGET_EXPR_ELIDING_P; there's no point
to adding an eliding TARGET_EXPR into the pset.


...with this change.

Jason

Re: [PATCH] c++/modules: Setup aliases imported from modules [PR106820]

2024-04-12 Thread Jason Merrill


On 3/26/24 09:24, Nathaniel Shead wrote:


I wonder if more generally we need to be doing more work when importing
definitions from header units especially to handle all the work that
'make_rtl_for_nonlocal_decl' and 'rest_of_decl_compilation' would have
been performing. 


Can we just call those functions?


But this patch fixes at least one missing step.


OK.


PR c++/106820

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Assemble alias when needed.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr106820_a.H: New test.
* g++.dg/modules/pr106820_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  | 9 +
  gcc/testsuite/g++.dg/modules/pr106820_a.H | 5 +
  gcc/testsuite/g++.dg/modules/pr106820_b.C | 8 
  3 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr106820_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/pr106820_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 8aab9ea0bae..b4e3b38c6fe 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -217,6 +217,7 @@ Classes used:
  #include "dumpfile.h"
  #include "bitmap.h"
  #include "cgraph.h"
+#include "varasm.h"
  #include "tree-iterator.h"
  #include "cpplib.h"
  #include "mkdeps.h"
@@ -8302,6 +8303,14 @@ trees_in::decl_value ()
if (state->is_header ()
  && decl_tls_wrapper_p (decl))
note_vague_linkage_fn (decl);
+
+  /* Setup aliases for the declaration.  */
+  if (tree alias = lookup_attribute ("alias", DECL_ATTRIBUTES (decl)))
+   {
+ alias = TREE_VALUE (TREE_VALUE (alias));
+ alias = get_identifier (TREE_STRING_POINTER (alias));
+ assemble_alias (decl, alias);
+   }
  }
else
  {
diff --git a/gcc/testsuite/g++.dg/modules/pr106820_a.H b/gcc/testsuite/g++.dg/modules/pr106820_a.H
new file mode 100644
index 000..7d32d4e5fc3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr106820_a.H
@@ -0,0 +1,5 @@
+// PR c++/106820
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi {} }
+
+static int __gthrw___pthread_key_create() __attribute__((__weakref__("foo")));
diff --git a/gcc/testsuite/g++.dg/modules/pr106820_b.C b/gcc/testsuite/g++.dg/modules/pr106820_b.C
new file mode 100644
index 000..247fe26e778
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr106820_b.C
@@ -0,0 +1,8 @@
+// PR c++/106820
+// { dg-additional-options "-fmodules-ts" }
+
+import "pr106820_a.H";
+
+int main() {
+  __gthrw___pthread_key_create();
+}

Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-04-12 Thread Jason Merrill


On 3/26/24 09:44, Patrick Palka wrote:

On Thu, 7 Mar 2024, Jason Merrill wrote:


On 1/29/24 17:42, Patrick Palka wrote:

On Mon, 29 Jan 2024, Patrick Palka wrote:


On Fri, 26 Jan 2024, Jason Merrill wrote:


On 1/26/24 17:11, Jason Merrill wrote:

On 1/26/24 16:52, Jason Merrill wrote:

On 1/25/24 14:18, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?  This isn't a very satisfactory fix, but at least
it safely fixes these testcases I guess.  Note that there's
implementation disagreement about the second testcase, GCC always
accepted it but Clang/MSVC/icc reject it.


Because of trying to initialize int& from {c}; removing the extra braces
makes it work everywhere.

https://eel.is/c++draft/dcl.init#list-3.10 says that we always generate a
prvalue in this case, so perhaps we shouldn't recalculate if the initializer
is an init-list?


...but it seems bad to silently bind a const int& to a prvalue instead of
directly to the reference returned by the operator, as clang does if we add
const to the second testcase, so I think there's a defect in the standard
here.


Perhaps bullet 3.9 should change to "...its referenced type is
reference-related to E or scalar, ..."


Maybe for now also disable the maybe_valid heuristics in the case of an
init-list?


The first testcase is special because it's a C-style cast; seems like the
maybe_valid = false heuristics should be disabled if c_cast_p.


Thanks a lot for the pointers.  IIUC c_cast_p and LOOKUP_SHORTCUT_BAD_CONVS
should already be mutually exclusive, since the latter is set only when
computing argument conversions, so it shouldn't be necessary to check
c_cast_p.

I suppose we could disable the heuristic for init-lists, but after some
digging I noticed that the heuristics were originally in the same spot they
are now until r5-601-gd02f620dc0bb3b moved them to get checked after
the recursive recalculation case in reference_binding, returning a bad
conversion instead of NULL.  (Then in r13-1755-g68f37670eff0b872 I moved
them back; IIRC that's why I felt confident that moving the checks was
safe.)
Thus we didn't always accept the second testcase, we only started doing so
in GCC 5: https://godbolt.org/z/6nsEW14fh (sorry for missing this and saying
we always accepted it)

And indeed the current order of checks seems consistent with that of
[dcl.init.ref]/5.  So I wonder if we don't instead want to "complete"
the NULL-to-bad-conversion adjustment in r5-601-gd02f620dc0bb3b and
do:

gcc/cp/ChangeLog:

* call.cc (reference_binding): Set bad_p according to
maybe_valid_p in the recursive case as well.  Remove
redundant gcc_assert.

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 9de0d77c423..c4158b2af37 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2033,8 +2033,8 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
   sflags, complain);
if (!new_second)
  return bad_direct_conv ? bad_direct_conv : nullptr;
+   t->bad_p = !maybe_valid_p;


Oops, that should be |= not =.


Perhaps bullet 3.9 should change to "...its referenced type is
reference-related to E or scalar, ..."

conv = merge_conversion_sequences (t, new_second);
-   gcc_assert (maybe_valid_p || conv->bad_p);
return conv;
  }
   }

This'd mean we'd go back to rejecting the second testcase (only the
call, not the direct-init, interestingly enough), but that seems to be


In the second testcase, with the above fix initialize_reference silently
returns error_mark_node for the direct-init without issuing a
diagnostic, because in the error path convert_like doesn't find anything
wrong with the bad conversion.  So more changes need to be made if we
want to set bad_p in the recursive case of reference_binding it seems;
dunno if that's the path we want to go down?

On the other hand, disabling the badness checks in certain cases seems
to be undesirable as well, since AFAICT their current position is
consistent with [dcl.init.ref]/5?

So I wonder if we should just go with the safest thing at this stage,
which would be the original patch that removes the problematic assert?


I still think the assert is correct, and the problem is that maybe_valid_p is
wrong; these cases turn out to be valid, so maybe_valid_p should be true.


I'm afraid then I don't know how we can statically identify these cases
without actually performing the conversion, in light of the recursion :/
Do you mind taking this PR?  I don't feel well-versed enough with the
reference binding rules to tackle this adequately..


That ended up being a surprisingly deep dive, but I've now checked in 
separate fixes for the two cases.


Jason

Re: [PATCH] Regenerate opt.urls

2024-04-12 Thread Palmer Dabbelt


On Fri, 12 Apr 2024 12:25:42 PDT (-0700), tschwi...@baylibre.com wrote:

Hi!

After having received around a dozen more buildbot notifications...

On 2024-04-10T06:46:04-0700, Palmer Dabbelt  wrote:

On Tue, 09 Apr 2024 07:57:24 PDT (-0700), ishitatsuy...@gmail.com wrote:

Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")

gcc/ChangeLog:
* config/riscv/riscv.opt.urls: Regenerated.
---
 gcc/config/riscv/riscv.opt.urls | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index da31820e234..351f7f0dda2 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)

+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+


Thanks.  I had another one over here 
, 
but let's go with yours -- I think the actual contents are the same, but 
I didn't actually run the regenerate script.  So


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 


..., I've now pushed this to trunk branch in
commit c9500083073ff5e0f5c1c9db92d7ce6e51a62919
"Regenerate opt.urls".


Thanks, and sorry I forgot about this one.




Grüße
 Thomas

[pushed] c++: reference cast, conversion fn [PR113141]

2024-04-12 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The second testcase in 113141 is a separate issue: we first decide that the
conversion is ill-formed, but then when recalculating the special c_cast_p
handling makes us think it's OK.  We don't want that, it should continue to
fall back to the reinterpret_cast interpretation.  And while we're here,
let's warn that we're not using the conversion function.

Note that the standard seems to say that in this case we should
treat (Matrix &) as const_cast<Matrix &>(static_cast<const Matrix &>(X)),
which would use the conversion operator, but that doesn't match existing
practice, so let's resolve that another day.  I've raised this issue with
CWG; at the moment I lean toward never binding a temporary in a C-style cast
to reference type, which would also be a change from existing practice.
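
The two readings contrasted above can be spelled out with explicit casts. This is a hedged sketch, not the committed testcase: the struct A and the helper names are hypothetical, and the conversion function is made public so that both forms compile. It makes no claim about which reading a given compiler picks for the C-style cast itself.

```cpp
#include <cassert>

// Hypothetical standard-layout type: reinterpreting an A as an int reaches
// its first member, while the conversion function returns a different one.
struct A {
  int first = 1;
  int other = 2;
  operator const int&() { return other; }
};

// The reading attributed to the standard: static_cast through the
// conversion function, then const_cast away the const.
int via_conversion(A& a) {
  return const_cast<int&>(static_cast<const int&>(a));
}

// The reinterpret_cast reading: the conversion function is not called.
int via_reinterpret(A& a) {
  return reinterpret_cast<int&>(a);
}

// Small self-check showing that the two readings observe different members.
void check_cast_readings() {
  A a;
  assert(via_conversion(a) == 2);
  assert(via_reinterpret(a) == 1);
}
```

Because the two spellings can yield different objects, a warning when the C-style cast silently takes the reinterpret_cast path seems useful.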

PR c++/113141

gcc/c-family/ChangeLog:

* c.opt: Add -Wcast-user-defined.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wcast-user-defined.

gcc/cp/ChangeLog:

* call.cc (reference_binding): For an invalid cast, warn and don't
recalculate.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/ref12.C: New test.

Co-authored-by: Patrick Palka 
---
 gcc/doc/invoke.texi | 13 +
 gcc/c-family/c.opt  |  4 
 gcc/cp/call.cc  | 12 +++-
 gcc/testsuite/g++.dg/conversion/ref12.C | 20 
 4 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/conversion/ref12.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5d5e70c3033..e3285587e4e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9391,6 +9391,19 @@ In a cast involving pointer to member types this warning warns whenever
 the type cast is changing the pointer to member type.
 This warning is enabled by @option{-Wextra}.
 
+@opindex Wcast-user-defined
+@opindex Wno-cast-user-defined
+@item -Wcast-user-defined
+Warn when a cast to reference type does not involve a user-defined
+conversion that the programmer might expect to be called.
+
+@smallexample
+struct A @{ operator const int&(); @} a;
+auto r = (int&)a; // warning
+@end smallexample
+
+This warning is enabled by default.
+
 @opindex Wwrite-strings
 @opindex Wno-write-strings
 @item -Wwrite-strings
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 56cccf2a67b..848c2fda203 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -514,6 +514,10 @@ Wcast-qual
 C ObjC C++ ObjC++ Var(warn_cast_qual) Warning
 Warn about casts which discard qualifiers.
 
+Wcast-user-defined
+C++ ObjC++ Var(warn_cast_user_defined) Warning Init(1)
+Warn about a cast to reference type that does not use a related user-defined conversion function.
+
 Wcatch-value
 C++ ObjC++ Warning Alias(Wcatch-value=, 1, 0)
 Warn about catch handlers of non-reference type.
diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 15b5647298e..dbdd7c29fe8 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2034,7 +2034,17 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
 recalculate the second conversion sequence.  */
   for (conversion *t = conv; t; t = next_conversion (t))
if (t->kind == ck_user
-   && DECL_CONV_FN_P (t->cand->fn))
+   && c_cast_p && !maybe_valid_p)
+ {
+   if (complain & tf_warning)
+ warning (OPT_Wcast_user_defined,
+  "casting %qT to %qT does not use %qD",
+  from, rto, t->cand->fn);
+   /* Don't let recalculation try to make this valid.  */
+   break;
+ }
+   else if (t->kind == ck_user
+&& DECL_CONV_FN_P (t->cand->fn))
  {
tree ftype = TREE_TYPE (TREE_TYPE (t->cand->fn));
/* A prvalue of non-class type is cv-unqualified.  */
diff --git a/gcc/testsuite/g++.dg/conversion/ref12.C b/gcc/testsuite/g++.dg/conversion/ref12.C
new file mode 100644
index 000..27ed9122769
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/ref12.C
@@ -0,0 +1,20 @@
+// PR c++/113141
+
+struct Matrix { };
+
+struct TPoint3 { private: operator const Matrix(); };
+
+void f(Matrix&);
+
+int main() {
+  TPoint3 X;
+  Matrix& m = (Matrix &)X; // { dg-warning "does not use" }
+  f((Matrix &)X);  // { dg-warning "does not use" }
+}
+
+struct A { private: operator const int&(); } a;
+int  = (int&)a;  // { dg-warning "does not use" }
+
+struct B { B(int); };
+int i;
+B  = (B&)i; // { dg-warning "does not use" }

base-commit: 0fd824d717ca901319864a5eeba4e62b278f8329
-- 
2.44.0

[pushed] c++: reference list-init, conversion fn [PR113141]

2024-04-12 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The original testcase in PR113141 is an instance of CWG1996; the standard
fails to consider conversion functions when initializing a reference
directly from an initializer-list of one element, but then does consider
them when initializing a temporary.  I have a proposed fix for this defect,
which is implemented here.
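
For reference, "binding directly" through a conversion function can be illustrated with the unbraced form, which the thread notes is accepted everywhere. A minimal sketch, assuming a hypothetical ConvToRef with a data member added so the binding is observable; the braced form int& r{c} is exactly what CWG1996 concerns and is left out because acceptance differs between compilers.

```cpp
#include <cassert>

// Hypothetical class with a conversion function to an lvalue reference.
struct ConvToRef {
  int value = 0;
  operator int&() { return value; }
};

// Binds a reference through the conversion function and checks that it
// refers to c.value itself rather than to a temporary copy.
bool binds_directly(ConvToRef& c) {
  int& r = c;  // unbraced initialization: binds to the operator's result
  r = 7;       // writes through to c.value when the binding is direct
  return &r == &c.value && c.value == 7;
}

// Small self-check.
void check_direct_binding() {
  ConvToRef c;
  assert(binds_directly(c));
}
```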

DR 1996
PR c++/113141

gcc/cp/ChangeLog:

* call.cc (reference_binding): Check direct binding from
a single-element list.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-ref1.C: New test.
* g++.dg/cpp0x/initlist-ref2.C: New test.
* g++.dg/cpp0x/initlist-ref3.C: New test.

Co-authored-by: Patrick Palka 
---
 gcc/cp/call.cc | 21 +
 gcc/testsuite/g++.dg/cpp0x/initlist-ref1.C | 16 
 gcc/testsuite/g++.dg/cpp0x/initlist-ref2.C | 10 ++
 gcc/testsuite/g++.dg/cpp0x/initlist-ref3.C | 13 +
 4 files changed, 56 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-ref1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-ref2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-ref3.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 9568b5eb2c4..15b5647298e 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -1596,7 +1596,9 @@ standard_conversion (tree to, tree from, tree expr, bool c_cast_p,
   return conv;
 }
 
-/* Returns nonzero if T1 is reference-related to T2.  */
+/* Returns nonzero if T1 is reference-related to T2.
+
+   This is considered when a reference to T1 is initialized by a T2.  */
 
 bool
 reference_related_p (tree t1, tree t2)
@@ -1757,6 +1759,7 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
 }
 
   bool copy_list_init = false;
+  bool single_list_conv = false;
   if (expr && BRACE_ENCLOSED_INITIALIZER_P (expr))
 {
   maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
@@ -1783,6 +1786,11 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
  from = etype;
  goto skip;
}
+ else if (CLASS_TYPE_P (etype) && TYPE_HAS_CONVERSION (etype))
+   /* CWG1996: jason's proposed drafting adds "or initializing T from E
+  would bind directly".  We check that in the direct binding with
+  conversion code below.  */
+   single_list_conv = true;
}
   /* Otherwise, if T is a reference type, a prvalue temporary of the type
 referenced by T is copy-list-initialized, and the reference is bound
@@ -1907,9 +1915,14 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
  (possibly cv-qualified) object to the (possibly cv-qualified) same
  object type (or a reference to it), to a (possibly cv-qualified) base
  class of that type (or a reference to it) */
-  else if (CLASS_TYPE_P (from) && !related_p
-  && !(flags & LOOKUP_NO_CONVERSION))
+  else if (!related_p
+  && !(flags & LOOKUP_NO_CONVERSION)
+  && (CLASS_TYPE_P (from) || single_list_conv))
 {
+  tree rexpr = expr;
+  if (single_list_conv)
+   rexpr = CONSTRUCTOR_ELT (expr, 0)->value;
+
   /* [dcl.init.ref]
 
 If the initializer expression
@@ -1923,7 +1936,7 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
 
the reference is bound to the lvalue result of the conversion
in the second case.  */
-  z_candidate *cand = build_user_type_conversion_1 (rto, expr, flags,
+  z_candidate *cand = build_user_type_conversion_1 (rto, rexpr, flags,
complain);
   if (cand)
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-ref1.C b/gcc/testsuite/g++.dg/cpp0x/initlist-ref1.C
new file mode 100644
index 000..f893f12dafa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-ref1.C
@@ -0,0 +1,16 @@
+// PR c++/113141
+// { dg-do compile { target c++11 } }
+
+struct ConvToRef {
+  operator int&();
+};
+
+struct A { int& r; };
+
+void f(A);
+
+int main() {
+  ConvToRef c;
+  A a{{c}};
+  f({{c}});
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-ref2.C b/gcc/testsuite/g++.dg/cpp0x/initlist-ref2.C
new file mode 100644
index 000..401d868d820
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-ref2.C
@@ -0,0 +1,10 @@
+// CWG1996
+// { dg-do compile { target c++11 } }
+
+struct S { operator struct D &(); } s;
+D {s};   // OK, direct binding
+
+namespace N1 {
+  struct S { operator volatile struct D &(); } s;
+  const D {s};// { dg-error "invalid user-defined|discards qualifiers" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-ref3.C b/gcc/testsuite/g++.dg/cpp0x/initlist-ref3.C
new file mode 100644
index 000..e2cc1deace5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-ref3.C
@@ -0,0

Re: [PATCH] Regenerate opt.urls

2024-04-12 Thread Thomas Schwinge

Hi!

After having received around a dozen more buildbot notifications...

On 2024-04-10T06:46:04-0700, Palmer Dabbelt  wrote:
> On Tue, 09 Apr 2024 07:57:24 PDT (-0700), ishitatsuy...@gmail.com wrote:
>> Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")
>>
>> gcc/ChangeLog:
>>  * config/riscv/riscv.opt.urls: Regenerated.
>> ---
>>  gcc/config/riscv/riscv.opt.urls | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
>> index da31820e234..351f7f0dda2 100644
>> --- a/gcc/config/riscv/riscv.opt.urls
>> +++ b/gcc/config/riscv/riscv.opt.urls
>> @@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
>>  minline-strlen
>>  UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
>>
>> +; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
>> +
>
> Thanks.  I had another one over here 
> ,
>  
> but let's go with yours -- I think the actual contents are the same, but 
> I didn't actually run the regenerate script.  So
>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 

..., I've now pushed this to trunk branch in
commit c9500083073ff5e0f5c1c9db92d7ce6e51a62919
"Regenerate opt.urls".


Grüße
 Thomas

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-04-12 Thread Jason Merrill


On 4/12/24 14:39, Patrick Palka wrote:

On Fri, 12 Apr 2024, Jason Merrill wrote:


On 4/12/24 13:48, Patrick Palka wrote:

On Fri, 12 Apr 2024, Jason Merrill wrote:


On 4/12/24 10:35, Patrick Palka wrote:

On Wed, 10 Apr 2024, Jason Merrill wrote:


On 4/10/24 14:48, Patrick Palka wrote:

On Tue, 9 Apr 2024, Jason Merrill wrote:


On 3/5/24 10:31, Patrick Palka wrote:

On Tue, 27 Feb 2024, Patrick Palka wrote:

Subject: [PATCH] c++/modules: local type merging [PR99426]

One known missing piece in the modules implementation is merging of a
streamed-in local type (class or enum) with the corresponding in-TU
version of the local type.  This missing piece turns out to cause a
hard-to-reduce use-after-free GC issue due to the entity_ary not being
marked as a GC root (deliberately), and manifests as a serialization
error on stream-in as in PR99426 (see comment #6 for a reduction).  It's
also reproducible on trunk when running the xtreme-header tests without
-fno-module-lazy.

This patch makes us merge such local types according to their position
within the containing function's definition, analogous to how we merge
FIELD_DECLs of a class according to their index in the TYPE_FIELDS
list.

PR c++/99426

gcc/cp/ChangeLog:

* module.cc (merge_kind::MK_local_type): New enumerator.
(merge_kind_name): Update.
(trees_out::chained_decls): Move BLOCK-specific handling
of DECL_LOCAL_DECL_P decls to ...
(trees_out::core_vals) : ... here.  Stream
BLOCK_VARS manually.
(trees_in::core_vals) : Stream BLOCK_VARS
manually.  Handle deduplicated local types.
(trees_out::key_local_type): Define.
(trees_in::key_local_type): Define.
(trees_out::get_merge_kind) : Return
MK_local_type for a local type.
(trees_out::key_mergeable) : Use
key_local_type.
(trees_in::key_mergeable) : Likewise.
(trees_in::is_matching_decl): Be flexible with type mismatches
for local entities.

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 80b63a70a62..d9e34e9a4b9 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6714,7 +6720,37 @@ trees_in::core_vals (tree t)
  case BLOCK:
t->block.locus = state->read_location (*this);
t->block.end_locus = state->read_location (*this);
-  t->block.vars = chained_decls ();
+
+  for (tree *chain = &t->block.vars;;)
+   if (tree decl = tree_node ())
+ {
+   /* For a deduplicated local type or enumerator, chain the
+  duplicate decl instead of the canonical in-TU decl.  Seeing
+  a duplicate here means the containing function whose body
+  we're streaming in is a duplicate too, so we'll end up
+  discarding this BLOCK (and the rest of the duplicate function
+  body) anyway.  */
+   if (is_duplicate (decl))
+ decl = maybe_duplicate (decl);
+   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
+&& TYPE_TEMPLATE_INFO (TREE_TYPE (decl)))
+ {
+   tree tmpl = TYPE_TI_TEMPLATE (TREE_TYPE (decl));
+   if (DECL_TEMPLATE_RESULT (tmpl) == decl && is_duplicate (tmpl))
+ decl = DECL_TEMPLATE_RESULT (maybe_duplicate (tmpl));
+ }


This seems like a lot of generally-applicable code for finding the duplicate,
which other calls to maybe_duplicate/odr_duplicate don't use.  If the template
is a duplicate, why isn't its result?  If there's a good reason for that,
should this template handling go into maybe_duplicate?


Ah yeah, that makes sense.

Some context: IIUC modules treats the TEMPLATE_DECL instead of the
DECL_TEMPLATE_RESULT as the canonical decl, which in turn means we'll
register_duplicate only the TEMPLATE_DECL.  But BLOCK_VARS never contains
a TEMPLATE_DECL, always the DECL_TEMPLATE_RESULT (i.e. a TYPE_DECL),
hence the extra handling.

Given that it's relatively more difficult to get at the TEMPLATE_DECL
from the DECL_TEMPLATE_RESULT rather than vice versa, maybe we should
just register both as duplicates from register_duplicate?  That way
callers can just simply pass the DECL_TEMPLATE_RESULT to maybe_duplicate
and it'll do the right thing.


Sounds good.


@@ -10337,6 +10373,83 @@ trees_in::fn_parms_fini (int tag, tree fn, tree existing, bool is_defn)
  }
  }
+/* Encode into KEY the position of the local type (class or enum)
+   declaration DECL within FN.  The position is encoded as the
+   index of the innermost BLOCK (numbered in BFS order) along with
+   the index within its BLOCK_VARS list.  */


Since we already set DECL_DISCRIMINATOR for mangling, could we use it+name
for the key as well?

We could (and IIUC that'd be more robust to ODR violations), but
wouldn't it mean we'd have to do a linear walk over all BLOCK_VARs of
all BLOCKS in order to find the one with the matching
name+discriminator?  That'd be slower than the current approach
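
The positional key discussed above (BFS index of the containing BLOCK plus index within its BLOCK_VARS) can be sketched on a toy tree. Everything here is an assumption made for illustration: Block, LocalKey, key_local_type and make_example are hypothetical names rather than GCC's data structures, and declarations are looked up by name only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy stand-in for a lexical block: its declarations plus nested blocks.
struct Block {
  std::vector<std::string> vars;
  std::vector<Block> subblocks;
};

// Position of a declaration: which block (numbered in BFS order) and
// which slot within that block's declaration list.
struct LocalKey {
  bool found;
  std::size_t block_ix;
  std::size_t var_ix;
};

// Walks the block tree breadth-first, numbering blocks as they are
// visited, and returns the position of the first declaration called NAME.
LocalKey key_local_type(const Block& outermost, const std::string& name) {
  std::deque<const Block*> pending;
  pending.push_back(&outermost);
  for (std::size_t block_ix = 0; !pending.empty(); ++block_ix) {
    const Block* b = pending.front();
    pending.pop_front();
    for (std::size_t var_ix = 0; var_ix < b->vars.size(); ++var_ix)
      if (b->vars[var_ix] == name)
        return LocalKey{true, block_ix, var_ix};
    for (const Block& sub : b->subblocks)
      pending.push_back(&sub);
  }
  return LocalKey{false, 0, 0};
}

// Builds a small function body: an outer block with two nested blocks,
// the second of which contains one more nested block.
Block make_example() {
  Block innermost{{"D"}, {}};
  Block first{{"B"}, {}};
  Block second{{"x", "C"}, {innermost}};
  return Block{{"a", "A"}, {first, second}};
}

// Small self-check: "C" is the second declaration of the third block visited.
void check_local_key() {
  Block fn = make_example();
  LocalKey c = key_local_type(fn, "C");
  assert(c.found && c.block_ix == 2 && c.var_ix == 1);
}
```

On the reading side the same pair of indices leads straight to the matching declaration, which is the speed argument made above against a name+discriminator search.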

Re: [PATCH] c++: Fix constexpr evaluation of parameters passed by invisible reference [PR111284]

2024-04-12 Thread Jason Merrill


On 3/8/24 03:56, Jakub Jelinek wrote:

Hi!

My r9-6136 changes to make a copy of constexpr function bodies before
genericization modifies it broke the constant evaluation of non-POD
arguments passed by value.
In the callers such arguments are passed as reference to usually a
TARGET_EXPR, but on the callee side until genericization they are just
direct uses of a PARM_DECL with some class type.
In cxx_bind_parameters_in_call I've used convert_from_reference to
pretend it is passed by value and then cxx_eval_constant_expression
is called there and evaluates that as an rvalue, followed by
adjust_temp_type if the types don't match exactly (e.g. const Foo
argument and passing to it reference to Foo TARGET_EXPR).

The reason this doesn't work is that when the TARGET_EXPR in the caller
is constant initialized, this for it is the address of the TARGET_EXPR_SLOT,
but if the code later on pretends the PARM_DECL is just initialized to the
rvalue of the constant evaluation of the TARGET_EXPR, it is as if there
is a bitwise copy of the TARGET_EXPR to the callee, so this in the callee
is then address of the PARM_DECL in the callee.

The following patch attempts to fix that by constexpr evaluation of such
arguments in the caller as an lvalue instead of rvalue, and on the callee
side when seeing such a PARM_DECL, if we want an lvalue, lookup the value
(lvalue) saved in ctx->globals (if any), and if wanting an rvalue,
recursing with vc_prvalue on the looked up value (because it is there
as an lvalue, not rvalue).

adjust_temp_type doesn't work for lvalues of non-scalarish types, for
such types it relies on changing the type of a CONSTRUCTOR, but on the
other side we know what we pass to the argument is addressable, so
the patch on type mismatch takes address of the argument value, casts
to reference to the desired type and dereferences it.
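
The property at stake can be shown with a type that records its own address. This is a run-time sketch only: SelfRef and the helper names are hypothetical, and constexpr is deliberately omitted so that the example makes no claim about any particular compiler's constant evaluator. It shows the behaviour that constant evaluation is expected to reproduce.

```cpp
#include <cassert>

// Hypothetical class with a non-trivial copy constructor, so it is passed
// by invisible reference; each object remembers the address it was
// constructed at.
struct SelfRef {
  const SelfRef* self;
  SelfRef() : self(this) {}
  SelfRef(const SelfRef&) : self(this) {}
  bool points_to_itself() const { return self == this; }
};

// Inside the callee, `this` for the by-value parameter must be the
// address of the parameter object itself.
bool callee_sees_own_address(SelfRef s) {
  return s.points_to_itself();
}

// Passes both a prvalue and a copy of a named object.
bool check_at_run_time() {
  SelfRef named;
  return callee_sees_own_address(SelfRef{}) && callee_sees_own_address(named);
}

// Small self-check.
void check_self_reference() {
  assert(check_at_run_time());
}
```

A bitwise copy of the argument into the parameter would leave self pointing at the caller's temporary, which is the mismatch described above.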

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-08  Jakub Jelinek  

PR c++/111284
* constexpr.cc (cxx_bind_parameters_in_call): For PARM_DECLs with
TREE_ADDRESSABLE types use vc_glvalue rather than vc_prvalue for
cxx_eval_constant_expression and if it doesn't have the same
type as it should, cast the reference type to reference to type
before convert_from_reference and instead of adjust_temp_type
take address of the arg, cast to reference to type and then
convert_from_reference.
(cxx_eval_constant_expression) : For lval case
on parameters with TREE_ADDRESSABLE types lookup result in
ctx->globals if possible.  Otherwise if lookup in ctx->globals
was successful for parameter with TREE_ADDRESSABLE type,
recurse with vc_prvalue on the returned value.

* g++.dg/cpp1z/constexpr-111284.C: New test.
* g++.dg/cpp1y/constexpr-lifetime7.C: Expect one error on a different
line.

--- gcc/cp/constexpr.cc.jj  2024-02-13 10:29:57.979155641 +0100
+++ gcc/cp/constexpr.cc 2024-03-07 19:35:01.032412221 +0100
@@ -1877,13 +1877,21 @@ cxx_bind_parameters_in_call (const const
  x = build_address (x);
}
if (TREE_ADDRESSABLE (type))
-   /* Undo convert_for_arg_passing work here.  */
-   x = convert_from_reference (x);
-  /* Normally we would strip a TARGET_EXPR in an initialization context
-such as this, but here we do the elision differently: we keep the
-TARGET_EXPR, and use its CONSTRUCTOR as the value of the parm.  */
-  arg = cxx_eval_constant_expression (ctx, x, vc_prvalue,
- non_constant_p, overflow_p);
+   {
+ /* Undo convert_for_arg_passing work here.  */
+ if (TYPE_REF_P (TREE_TYPE (x))
+ && !same_type_p (type, TREE_TYPE (TREE_TYPE (x))))
+   x = cp_fold_convert (build_reference_type (type), x);
+ x = convert_from_reference (x);
+ arg = cxx_eval_constant_expression (ctx, x, vc_glvalue,
+ non_constant_p, overflow_p);
+   }
+  else
+   /* Normally we would strip a TARGET_EXPR in an initialization context
+  such as this, but here we do the elision differently: we keep the
+  TARGET_EXPR, and use its CONSTRUCTOR as the value of the parm.  */
+   arg = cxx_eval_constant_expression (ctx, x, vc_prvalue,
+   non_constant_p, overflow_p);


It seems simpler to move the convert_from_reference after the 
cxx_eval_constant_expression rather than duplicate the call to 
cxx_eval_constant_expression.



/* Check we aren't dereferencing a null pointer when calling a non-static
 member function, which is undefined behaviour.  */
if (i == 0 && DECL_OBJECT_MEMBER_FUNCTION_P (fun)
@@ -1909,7 +1917,16 @@ cxx_bind_parameters_in_call (const const
{
  /* Make sure the binding has the same type as the parm.  But
 only for constant args.  */
-

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-04-12 Thread Patrick Palka

On Fri, 12 Apr 2024, Jason Merrill wrote:

> On 4/12/24 13:48, Patrick Palka wrote:
> > On Fri, 12 Apr 2024, Jason Merrill wrote:
> > 
> > > On 4/12/24 10:35, Patrick Palka wrote:
> > > > On Wed, 10 Apr 2024, Jason Merrill wrote:
> > > > 
> > > > > On 4/10/24 14:48, Patrick Palka wrote:
> > > > > > On Tue, 9 Apr 2024, Jason Merrill wrote:
> > > > > > 
> > > > > > > On 3/5/24 10:31, Patrick Palka wrote:
> > > > > > > > On Tue, 27 Feb 2024, Patrick Palka wrote:
> > > > > > > > 
> > > > > > > > Subject: [PATCH] c++/modules: local type merging [PR99426]
> > > > > > > > 
> > > > > > > > One known missing piece in the modules implementation is merging
> > > > > > > > of
> > > > > > > > a
> > > > > > > > streamed-in local type (class or enum) with the corresponding
> > > > > > > > in-TU
> > > > > > > > version of the local type.  This missing piece turns out to
> > > > > > > > cause a
> > > > > > > > hard-to-reduce use-after-free GC issue due to the entity_ary not
> > > > > > > > being
> > > > > > > > marked as a GC root (deliberately), and manifests as a
> > > > > > > > serialization
> > > > > > > > error on stream-in as in PR99426 (see comment #6 for a
> > > > > > > > reduction).
> > > > > > > > It's
> > > > > > > > also reproducible on trunk when running the xtreme-header tests
> > > > > > > > without
> > > > > > > > -fno-module-lazy.
> > > > > > > > 
> > > > > > > > This patch makes us merge such local types according to their
> > > > > > > > position
> > > > > > > > within the containing function's definition, analogous to how we
> > > > > > > > merge
> > > > > > > > FIELD_DECLs of a class according to their index in the
> > > > > > > > TYPE_FIELDS
> > > > > > > > list.
> > > > > > > > 
> > > > > > > > PR c++/99426
> > > > > > > > 
> > > > > > > > gcc/cp/ChangeLog:
> > > > > > > > 
> > > > > > > > * module.cc (merge_kind::MK_local_type): New enumerator.
> > > > > > > > (merge_kind_name): Update.
> > > > > > > > (trees_out::chained_decls): Move BLOCK-specific handling
> > > > > > > > of DECL_LOCAL_DECL_P decls to ...
> > > > > > > > (trees_out::core_vals) : ... here.  Stream
> > > > > > > > BLOCK_VARS manually.
> > > > > > > > (trees_in::core_vals) : Stream BLOCK_VARS
> > > > > > > > manually.  Handle deduplicated local types..
> > > > > > > > (trees_out::key_local_type): Define.
> > > > > > > > (trees_in::key_local_type): Define.
> > > > > > > > (trees_out::get_merge_kind) : Return
> > > > > > > > MK_local_type for a local type.
> > > > > > > > (trees_out::key_mergeable) : Use
> > > > > > > > key_local_type.
> > > > > > > > (trees_in::key_mergeable) : 
> > > > > > > > Likewise.
> > > > > > > > (trees_in::is_matching_decl): Be flexible with type 
> > > > > > > > mismatches
> > > > > > > > for local entities.
> > > > > > > > 
> > > > > > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > > > > > > index 80b63a70a62..d9e34e9a4b9 100644
> > > > > > > > --- a/gcc/cp/module.cc
> > > > > > > > +++ b/gcc/cp/module.cc
> > > > > > > > @@ -6714,7 +6720,37 @@ trees_in::core_vals (tree t)
> > > > > > > >  case BLOCK:
> > > > > > > >t->block.locus = state->read_location (*this);
> > > > > > > >t->block.end_locus = state->read_location (*this);
> > > > > > > > -  t->block.vars = chained_decls ();
> > > > > > > > +
> > > > > > > > +  for (tree *chain = >block.vars;;)
> > > > > > > > +   if (tree decl = tree_node ())
> > > > > > > > + {
> > > > > > > > +   /* For a deduplicated local type or enumerator, 
> > > > > > > > chain the
> > > > > > > > +  duplicate decl instead of the canonical in-TU 
> > > > > > > > decl.
> > > > > > > > Seeing
> > > > > > > > +  a duplicate here means the containing function 
> > > > > > > > whose
> > > > > > > > body
> > > > > > > > +  we're streaming in is a duplicate too, so we'll 
> > > > > > > > end up
> > > > > > > > +  discarding this BLOCK (and the rest of the 
> > > > > > > > duplicate
> > > > > > > > function
> > > > > > > > +  body) anyway.  */
> > > > > > > > +   if (is_duplicate (decl))
> > > > > > > > + decl = maybe_duplicate (decl);
> > > > > > > > +   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
> > > > > > > > +&& TYPE_TEMPLATE_INFO (TREE_TYPE (decl)))
> > > > > > > > + {
> > > > > > > > +   tree tmpl = TYPE_TI_TEMPLATE (TREE_TYPE (decl));
> > > > > > > > +   if (DECL_TEMPLATE_RESULT (tmpl) == decl &&
> > > > > > > > is_duplicate
> > > > > > > > (tmpl))
> > > > > > > > + decl = DECL_TEMPLATE_RESULT (maybe_duplicate
> > > > > > > > (tmpl));
> > > > > > > > + }
> > > > > > > 
> > > > > > > This seems like a lot of generally-applicable code for finding the
> > > > > > > duplicate,
> > > > > > > which other

Re: [PATCH] c++: Fix bogus warnings about ignored annotations [PR114409]

2024-04-12 Thread Jason Merrill


On 3/22/24 04:08, Jakub Jelinek wrote:

Hi!

The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition.
This is because the assumption is that ANNOTATE_EXPR argument is used
immediately in a COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond
wraps the ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer
holds.

The following patch fixes that by adding the TRUTH_NOT_EXPR inside of the
ANNOTATE_EXPR argument if any.
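
To illustrate the idea of negating the condition underneath the annotation wrappers rather than around them, here is a minimal sketch on a toy tree.  The Node/Code types and the function name are hypothetical stand-ins, not GCC's tree API; only the shape of the walk follows the description above.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for GCC trees; these types are hypothetical, not GCC's API.
enum class Code { Leaf, Annotate, Not };

struct Node {
  Code code;
  std::unique_ptr<Node> op0;  // operand of Annotate / Not, null for Leaf
};

using NodePtr = std::unique_ptr<Node>;

inline NodePtr make (Code c, NodePtr op = nullptr)
{
  return NodePtr (new Node{c, std::move (op)});
}

// Negate COND while keeping any Annotate wrappers outermost: walk down
// through the wrappers and wrap only the innermost operand in a Not node.
inline void negate_under_annotations (NodePtr &cond)
{
  NodePtr *p = &cond;
  while ((*p)->code == Code::Annotate)
    p = &(*p)->op0;
  *p = make (Code::Not, std::move (*p));
}
```

With this shape the outermost node of the condition is still the annotation, so a consumer that expects to find the annotation directly in the condition position would continue to see it.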

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK


Note, the PR is mostly about ICE with the annotations used in a template,
this patch doesn't change anything on that and I really don't know what
should be done in that case.

2024-03-22  Jakub Jelinek  

PR c++/114409
* semantics.cc (simplify_loop_decl_cond): Use cp_build_unary_op with
TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather than
ANNOTATE_EXPR itself.

* g++.dg/ext/pr114409.C: New test.

--- gcc/cp/semantics.cc.jj  2024-03-01 17:27:58.862888609 +0100
+++ gcc/cp/semantics.cc 2024-03-21 15:24:57.296857864 +0100
@@ -799,7 +799,11 @@ simplify_loop_decl_cond (tree *cond_p, t
*cond_p = boolean_true_node;
  
if_stmt = begin_if_stmt ();

-  cond = cp_build_unary_op (TRUTH_NOT_EXPR, cond, false, tf_warning_or_error);
+  cond_p = &cond;
+  while (TREE_CODE (*cond_p) == ANNOTATE_EXPR)
+    cond_p = &TREE_OPERAND (*cond_p, 0);
+  *cond_p = cp_build_unary_op (TRUTH_NOT_EXPR, *cond_p, false,
+  tf_warning_or_error);
finish_if_stmt_cond (cond, if_stmt);
finish_break_stmt ();
finish_then_clause (if_stmt);
--- gcc/testsuite/g++.dg/ext/pr114409.C.jj  2024-03-21 15:27:44.077661090 
+0100
+++ gcc/testsuite/g++.dg/ext/pr114409.C 2024-03-21 15:27:15.331039726 +0100
@@ -0,0 +1,22 @@
+// PR c++/114409
+// { dg-do compile }
+// { dg-options "-O2 -Wall" }
+
+void qux (int);
+int foo (int);
+
+void
+bar (int x)
+{
+  #pragma GCC novector
+  while (int y = foo (x))  // { dg-bogus "ignoring loop annotation" }
+qux (y);
+}
+
+void
+baz (int x)
+{
+  #pragma GCC novector
+  for (; int y = foo (x); )// { dg-bogus "ignoring loop annotation" }
+qux (y);
+}

Jakub

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-04-12 Thread Jason Merrill


On 4/12/24 13:48, Patrick Palka wrote:

On Fri, 12 Apr 2024, Jason Merrill wrote:


On 4/12/24 10:35, Patrick Palka wrote:

On Wed, 10 Apr 2024, Jason Merrill wrote:


On 4/10/24 14:48, Patrick Palka wrote:

On Tue, 9 Apr 2024, Jason Merrill wrote:


On 3/5/24 10:31, Patrick Palka wrote:

On Tue, 27 Feb 2024, Patrick Palka wrote:

Subject: [PATCH] c++/modules: local type merging [PR99426]

One known missing piece in the modules implementation is merging of
a
streamed-in local type (class or enum) with the corresponding in-TU
version of the local type.  This missing piece turns out to cause a
hard-to-reduce use-after-free GC issue due to the entity_ary not
being
marked as a GC root (deliberately), and manifests as a serialization
error on stream-in as in PR99426 (see comment #6 for a reduction).
It's
also reproducible on trunk when running the xtreme-header tests
without
-fno-module-lazy.

This patch makes us merge such local types according to their
position
within the containing function's definition, analogous to how we
merge
FIELD_DECLs of a class according to their index in the TYPE_FIELDS
list.

PR c++/99426

gcc/cp/ChangeLog:

* module.cc (merge_kind::MK_local_type): New enumerator.
(merge_kind_name): Update.
(trees_out::chained_decls): Move BLOCK-specific handling
of DECL_LOCAL_DECL_P decls to ...
(trees_out::core_vals) : ... here.  Stream
BLOCK_VARS manually.
(trees_in::core_vals) : Stream BLOCK_VARS
manually.  Handle deduplicated local types..
(trees_out::key_local_type): Define.
(trees_in::key_local_type): Define.
(trees_out::get_merge_kind) : Return
MK_local_type for a local type.
(trees_out::key_mergeable) : Use
key_local_type.
(trees_in::key_mergeable) : Likewise.
(trees_in::is_matching_decl): Be flexible with type mismatches
for local entities.

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 80b63a70a62..d9e34e9a4b9 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6714,7 +6720,37 @@ trees_in::core_vals (tree t)
 case BLOCK:
   t->block.locus = state->read_location (*this);
   t->block.end_locus = state->read_location (*this);
-  t->block.vars = chained_decls ();
+
+  for (tree *chain = &t->block.vars;;)
+   if (tree decl = tree_node ())
+ {
+   /* For a deduplicated local type or enumerator, chain the
+  duplicate decl instead of the canonical in-TU decl.
Seeing
+  a duplicate here means the containing function whose
body
+  we're streaming in is a duplicate too, so we'll end up
+  discarding this BLOCK (and the rest of the duplicate
function
+  body) anyway.  */
+   if (is_duplicate (decl))
+ decl = maybe_duplicate (decl);
+   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
+&& TYPE_TEMPLATE_INFO (TREE_TYPE (decl)))
+ {
+   tree tmpl = TYPE_TI_TEMPLATE (TREE_TYPE (decl));
+   if (DECL_TEMPLATE_RESULT (tmpl) == decl &&
is_duplicate
(tmpl))
+ decl = DECL_TEMPLATE_RESULT (maybe_duplicate
(tmpl));
+ }


This seems like a lot of generally-applicable code for finding the
duplicate,
which other calls to maybe_duplicate/odr_duplicate don't use.  If the
template
is a duplicate, why isn't its result?  If there's a good reason for
that,
should this template handling go into maybe_duplicate?


Ah yeah, that makes sense.

Some context: IIUC modules treats the TEMPLATE_DECL instead of the
DECL_TEMPLATE_RESULT as the canonical decl, which in turn means we'll
register_duplicate only the TEMPLATE_DECL.  But BLOCK_VARS never
contains
a TEMPLATE_DECL, always the DECL_TEMPLATE_RESULT (i.e. a TYPE_DECL),
hence the extra handling.

Given that it's relatively more difficult to get at the TEMPLATE_DECL
from the DECL_TEMPLATE_RESULT rather than vice versa, maybe we should
just register both as duplicates from register_duplicate?  That way
callers can just simply pass the DECL_TEMPLATE_RESULT to maybe_duplicate
and it'll do the right thing.


Sounds good.
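
A minimal sketch of the "register both" idea, assuming a toy Decl type and a plain hash map.  The names register_duplicate_sketch and maybe_duplicate_sketch are hypothetical and only mimic the roles discussed above; the real module.cc functions have different signatures and data structures.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical miniature of a decl: a "TEMPLATE_DECL" points at its result.
struct Decl {
  Decl *template_result = nullptr;  // non-null only on a template decl
};

// Maps an existing (canonical) decl to the duplicate that was streamed in.
using DupMap = std::unordered_map<const Decl *, Decl *>;

// Record DUP as the duplicate of EXISTING.  If both are templates, also
// record the mapping between their results, so that a lookup can start
// from either the template or its result.
inline void register_duplicate_sketch (DupMap &dups, Decl *dup, Decl *existing)
{
  dups[existing] = dup;
  if (existing->template_result && dup->template_result)
    dups[existing->template_result] = dup->template_result;
}

// Return the recorded duplicate of EXISTING, or EXISTING itself if none.
inline Decl *maybe_duplicate_sketch (const DupMap &dups, Decl *existing)
{
  auto it = dups.find (existing);
  return it == dups.end () ? existing : it->second;
}
```

The point of the sketch is only that a caller holding the result decl no longer needs to climb back to the template before asking for the duplicate.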


@@ -10337,6 +10373,83 @@ trees_in::fn_parms_fini (int tag, tree fn,
tree
existing, bool is_defn)
 }
 }
 +/* Encode into KEY the position of the local type (class or
enum)
+   declaration DECL within FN.  The position is encoded as the
+   index of the innermost BLOCK (numbered in BFS order) along with
+   the index within its BLOCK_VARS list.  */


Since we already set DECL_DISCRIMINATOR for mangling, could we use
it+name
for
the key as well?


We could (and IIUC that'd be more robust to ODR violations), but
wouldn't it mean we'd have to do a linear walk over all BLOCK_VARs of
all BLOCKS in order to find the one with the matching
name+discriminator?  That'd be slower than the current approach which
lets us skip to the correct BLOCK and walk only its BLOCK_VARS.


Ah, good point.
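
For reference, the position-based key can be sketched roughly as follows, assuming a toy Block type with integer decl ids.  All names here are hypothetical and the real key_local_type streams its key differently; the sketch only shows why decoding can go straight to one block instead of comparing against every decl.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical miniature of a BLOCK tree: each block lists the ids of the
// decls it declares plus its nested sub-blocks.
struct Block {
  std::vector<int> vars;
  std::vector<Block> subblocks;
};

// Find DECL and return {BFS index of its block, index within that block's
// vars}.  Returns {-1, -1} if DECL is not declared anywhere under ROOT.
inline std::pair<int, int> encode_position (const Block &root, int decl)
{
  std::queue<const Block *> work;
  work.push (&root);
  for (int block_ix = 0; !work.empty (); ++block_ix)
    {
      const Block *b = work.front ();
      work.pop ();
      for (std::size_t i = 0; i < b->vars.size (); ++i)
        if (b->vars[i] == decl)
          return {block_ix, static_cast<int> (i)};
      for (const Block &sub : b->subblocks)
        work.push (&sub);
    }
  return {-1, -1};
}

// Inverse: walk to the block with the given BFS index and index its vars
// directly, without looking at the decls of any other block.  Returns -1
// if the key does not name a decl.
inline int decode_position (const Block &root, std::pair<int, int> key)
{
  std::queue<const Block *> work;
  work.push (&root);
  for (int block_ix = 0; !work.empty (); ++block_ix)
    {
      const Block *b = work.front ();
      work.pop ();
      if (block_ix == key.first)
        {
          if (key.second < 0
              || static_cast<std::size_t> (key.second) >= b->vars.size ())
            return -1;
          return b->vars[key.second];
        }
      for (const Block &sub : b->subblocks)
        work.push (&sub);
    }
  return -1;
}
```

A name+discriminator key would instead need a comparison against each candidate decl in every block, which is the linear walk mentioned above.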

Re: [PATCH 3/3] c++/modules: Propagate hidden flag on decls from partitions

2024-04-12 Thread Jason Merrill


On 4/11/24 20:41, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

While working on some other fixes I noticed that the partition handling
code used the wrong flag to propagate OVL_HIDDEN_P on exported bindings
from partitions. This patch fixes that, and renames the flag to be
clearer.

gcc/cp/ChangeLog:

* name-lookup.cc (walk_module_binding): Use the
partition-specific hidden flag instead of the top-level
decl_hidden.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-16_a.C: New test.
* g++.dg/modules/using-16_b.C: New test.
* g++.dg/modules/using-16_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc | 10 +-
  gcc/testsuite/g++.dg/modules/using-16_a.C | 11 +++
  gcc/testsuite/g++.dg/modules/using-16_b.C | 12 
  gcc/testsuite/g++.dg/modules/using-16_c.C | 11 +++
  4 files changed, 39 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/using-16_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/using-16_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/using-16_c.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 7af7f00e34c..b7746938e1b 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4274,19 +4274,19 @@ walk_module_binding (tree binding, bitmap partitions,
  
  			count += callback (btype, flags, data);

  }
-   bool hidden = STAT_DECL_HIDDEN_P (bind);
+   bool part_hidden = STAT_DECL_HIDDEN_P (bind);
for (ovl_iterator iter (MAYBE_STAT_DECL (STAT_DECL (bind)));
 iter; ++iter)
  {
if (iter.hidden_p ())
- hidden = true;
+ part_hidden = true;
gcc_checking_assert
- (!(hidden && DECL_IS_UNDECLARED_BUILTIN (*iter)));
+ (!(part_hidden && DECL_IS_UNDECLARED_BUILTIN 
(*iter)));
  
  			WMB_Flags flags = WMB_None;

if (maybe_dups)
  flags = WMB_Flags (flags | WMB_Dups);
-   if (decl_hidden)
+   if (part_hidden)
  flags = WMB_Flags (flags | WMB_Hidden);
if (iter.using_p ())
  {
@@ -4295,7 +4295,7 @@ walk_module_binding (tree binding, bitmap partitions,
  flags = WMB_Flags (flags | WMB_Export);
  }
count += callback (*iter, flags, data);
-   hidden = false;
+   part_hidden = false;
  }
  }
  }
diff --git a/gcc/testsuite/g++.dg/modules/using-16_a.C 
b/gcc/testsuite/g++.dg/modules/using-16_a.C
new file mode 100644
index 000..25d8bca5d1c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-16_a.C
@@ -0,0 +1,11 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi M:S }
+
+export module M:S;
+
+namespace foo {
+  // propagate hidden from partitions
+  export struct S {
+friend void f(S);
+  };
+};
diff --git a/gcc/testsuite/g++.dg/modules/using-16_b.C 
b/gcc/testsuite/g++.dg/modules/using-16_b.C
new file mode 100644
index 000..3f704a913f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-16_b.C
@@ -0,0 +1,12 @@
+// { dg-additional-options "-fmodules-ts -Wno-global-module" }
+// { dg-module-cmi M }
+
+module;
+namespace bar {
+  void f(int);
+}
+export module M;
+export import :S;
+namespace foo {
+  export using bar::f;
+}
diff --git a/gcc/testsuite/g++.dg/modules/using-16_c.C 
b/gcc/testsuite/g++.dg/modules/using-16_c.C
new file mode 100644
index 000..5e46cd16013
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-16_c.C
@@ -0,0 +1,11 @@
+// { dg-additional-options "-fmodules-ts" }
+import M;
+
+int main() {
+  // this should be hidden and fail
+  foo::f(foo::S{});  // { dg-error "cannot convert" }
+
+  // but these should be legal
+  foo::f(10);
+  f(foo::S{});
+}

Re: [PATCH 2/3] c++/modules: Propagate using decls from partitions

2024-04-12 Thread Jason Merrill


On 4/11/24 20:40, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The modules code currently neglects to set OVL_USING_P on the dependency
created for a using-decl, which causes it not to remember that the
OVL_EXPORT_P flag had been set on it when emitted from the primary
interface unit. This patch ensures that it occurs.

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Propagate
OVL_USING_P for using-declarations.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-15_a.C: New test.
* g++.dg/modules/using-15_b.C: New test.
* g++.dg/modules/using-15_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  |  4 
  gcc/testsuite/g++.dg/modules/using-15_a.C | 13 +
  gcc/testsuite/g++.dg/modules/using-15_b.C |  5 +
  gcc/testsuite/g++.dg/modules/using-15_c.C |  7 +++
  4 files changed, 29 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/using-15_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/using-15_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/using-15_c.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9d054c4c792..527c9046c67 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12915,10 +12915,12 @@ depset::hash::add_binding_entity (tree decl, 
WMB_Flags flags, void *data_)
/* Ignore NTTP objects.  */
return false;
  
+  bool unscoped_enum_const_p = false;

if (!(flags & WMB_Using) && CP_DECL_CONTEXT (decl) != data->ns)
{
  /* A using that lost its wrapper or an unscoped enum
 constant.  */
+ unscoped_enum_const_p = (TREE_CODE (decl) == CONST_DECL);


How does this interact with C++20 using enum?


  flags = WMB_Flags (flags | WMB_Using);
  if (DECL_MODULE_EXPORT_P (TREE_CODE (decl) == CONST_DECL
? TYPE_NAME (TREE_TYPE (decl))
@@ -12979,6 +12981,8 @@ depset::hash::add_binding_entity (tree decl, WMB_Flags 
flags, void *data_)
if (flags & WMB_Using)
{
  decl = ovl_make (decl, NULL_TREE);
+ if (!unscoped_enum_const_p)
+   OVL_USING_P (decl) = true;
  if (flags & WMB_Export)
OVL_EXPORT_P (decl) = true;
}
diff --git a/gcc/testsuite/g++.dg/modules/using-15_a.C 
b/gcc/testsuite/g++.dg/modules/using-15_a.C
new file mode 100644
index 000..23895bd8c4a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-15_a.C
@@ -0,0 +1,13 @@
+// { dg-additional-options "-fmodules-ts -Wno-global-module" }
+// { dg-module-cmi M:a }
+
+module;
+namespace foo {
+  void a();
+};
+export module M:a;
+
+namespace bar {
+  // propagate usings from partitions
+  export using foo::a;
+};
diff --git a/gcc/testsuite/g++.dg/modules/using-15_b.C 
b/gcc/testsuite/g++.dg/modules/using-15_b.C
new file mode 100644
index 000..a88f86af61f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-15_b.C
@@ -0,0 +1,5 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi M }
+
+export module M;
+export import :a;
diff --git a/gcc/testsuite/g++.dg/modules/using-15_c.C 
b/gcc/testsuite/g++.dg/modules/using-15_c.C
new file mode 100644
index 000..0651efffc91
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-15_c.C
@@ -0,0 +1,7 @@
+// { dg-additional-options "-fmodules-ts" }
+import M;
+
+int main() {
+  bar::a();
+  foo::a();  // { dg-error "not been declared" }
+}

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-04-12 Thread Patrick Palka

On Fri, 12 Apr 2024, Jason Merrill wrote:

> On 4/12/24 10:35, Patrick Palka wrote:
> > On Wed, 10 Apr 2024, Jason Merrill wrote:
> > 
> > > On 4/10/24 14:48, Patrick Palka wrote:
> > > > On Tue, 9 Apr 2024, Jason Merrill wrote:
> > > > 
> > > > > On 3/5/24 10:31, Patrick Palka wrote:
> > > > > > On Tue, 27 Feb 2024, Patrick Palka wrote:
> > > > > > 
> > > > > > Subject: [PATCH] c++/modules: local type merging [PR99426]
> > > > > > 
> > > > > > One known missing piece in the modules implementation is merging of
> > > > > > a
> > > > > > streamed-in local type (class or enum) with the corresponding in-TU
> > > > > > version of the local type.  This missing piece turns out to cause a
> > > > > > hard-to-reduce use-after-free GC issue due to the entity_ary not
> > > > > > being
> > > > > > marked as a GC root (deliberately), and manifests as a serialization
> > > > > > error on stream-in as in PR99426 (see comment #6 for a reduction).
> > > > > > It's
> > > > > > also reproducible on trunk when running the xtreme-header tests
> > > > > > without
> > > > > > -fno-module-lazy.
> > > > > > 
> > > > > > This patch makes us merge such local types according to their
> > > > > > position
> > > > > > within the containing function's definition, analogous to how we
> > > > > > merge
> > > > > > FIELD_DECLs of a class according to their index in the TYPE_FIELDS
> > > > > > list.
> > > > > > 
> > > > > > PR c++/99426
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > * module.cc (merge_kind::MK_local_type): New enumerator.
> > > > > > (merge_kind_name): Update.
> > > > > > (trees_out::chained_decls): Move BLOCK-specific handling
> > > > > > of DECL_LOCAL_DECL_P decls to ...
> > > > > > (trees_out::core_vals) : ... here.  Stream
> > > > > > BLOCK_VARS manually.
> > > > > > (trees_in::core_vals) : Stream BLOCK_VARS
> > > > > > manually.  Handle deduplicated local types..
> > > > > > (trees_out::key_local_type): Define.
> > > > > > (trees_in::key_local_type): Define.
> > > > > > (trees_out::get_merge_kind) : Return
> > > > > > MK_local_type for a local type.
> > > > > > (trees_out::key_mergeable) : Use
> > > > > > key_local_type.
> > > > > > (trees_in::key_mergeable) : Likewise.
> > > > > > (trees_in::is_matching_decl): Be flexible with type mismatches
> > > > > > for local entities.
> > > > > > 
> > > > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > > > > index 80b63a70a62..d9e34e9a4b9 100644
> > > > > > --- a/gcc/cp/module.cc
> > > > > > +++ b/gcc/cp/module.cc
> > > > > > @@ -6714,7 +6720,37 @@ trees_in::core_vals (tree t)
> > > > > > case BLOCK:
> > > > > >   t->block.locus = state->read_location (*this);
> > > > > >   t->block.end_locus = state->read_location (*this);
> > > > > > -  t->block.vars = chained_decls ();
> > > > > > +
> > > > > > +  for (tree *chain = >block.vars;;)
> > > > > > +   if (tree decl = tree_node ())
> > > > > > + {
> > > > > > +   /* For a deduplicated local type or enumerator, chain the
> > > > > > +  duplicate decl instead of the canonical in-TU decl.
> > > > > > Seeing
> > > > > > +  a duplicate here means the containing function whose
> > > > > > body
> > > > > > +  we're streaming in is a duplicate too, so we'll end up
> > > > > > +  discarding this BLOCK (and the rest of the duplicate
> > > > > > function
> > > > > > +  body) anyway.  */
> > > > > > +   if (is_duplicate (decl))
> > > > > > + decl = maybe_duplicate (decl);
> > > > > > +   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
> > > > > > +&& TYPE_TEMPLATE_INFO (TREE_TYPE (decl)))
> > > > > > + {
> > > > > > +   tree tmpl = TYPE_TI_TEMPLATE (TREE_TYPE (decl));
> > > > > > +   if (DECL_TEMPLATE_RESULT (tmpl) == decl &&
> > > > > > is_duplicate
> > > > > > (tmpl))
> > > > > > + decl = DECL_TEMPLATE_RESULT (maybe_duplicate
> > > > > > (tmpl));
> > > > > > + }
> > > > > 
> > > > > This seems like a lot of generally-applicable code for finding the
> > > > > duplicate,
> > > > > which other calls to maybe_duplicate/odr_duplicate don't use.  If the
> > > > > template
> > > > > is a duplicate, why isn't its result?  If there's a good reason for
> > > > > that,
> > > > > should this template handling go into maybe_duplicate?
> > > > 
> > > > Ah yeah, that makes sense.
> > > > 
> > > > Some context: IIUC modules treats the TEMPLATE_DECL instead of the
> > > > DECL_TEMPLATE_RESULT as the canonical decl, which in turn means we'll
> > > > register_duplicate only the TEMPLATE_DECL.  But BLOCK_VARS never
> > > > contains
> > > > a TEMPLATE_DECL, always the DECL_TEMPLATE_RESULT (i.e. a TYPE_DECL),
> > > > hence the extra handling.
> > > > 
> > > > Given that it's relatively more difficult to get at the TEMPLATE_DECL
> > > > from the DECL_TEMPLATE_RESULT rather than vice versa, maybe we should
> > > >

Re: [PATCH 1/3] c++/modules: Only emit exported GMF usings [PR114600]

2024-04-12 Thread Jason Merrill


On 4/11/24 20:40, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

A typo in r14-6978 made us emit too many things. This ensures that we
don't emit using-declarations from the GMF that we don't need to.
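
The difference between the two tests is easy to see in isolation.  The sketch below uses made-up flag values (the real WMB_Flags enumerators may be numbered differently) and hypothetical function names; it only contrasts "either bit set" with "both bits set".

```cpp
#include <cassert>

// Illustrative flag values only; the real WMB_Flags enumerators may differ.
enum Flags : unsigned { F_None = 0, F_Export = 1u << 0, F_Using = 1u << 1 };

// Shape of the old test: true as soon as EITHER flag is set.
inline bool keeps_with_either (unsigned flags)
{
  return (flags & (F_Using | F_Export)) != 0;
}

// Shape of the fixed test: true only when BOTH flags are set.
inline bool keeps_with_both (unsigned flags)
{
  return (flags & F_Using) && (flags & F_Export);
}
```

A non-exported using-declaration (F_Using alone) passes the first predicate but not the second, which matches the "emit too many things" symptom described above.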

PR c++/114600

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Require both
WMB_Using and WMB_Export for GMF entities.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-14.C: New test.

Signed-off-by: Nathaniel Shead 
Co-authored-by: Patrick Palka 
---
  gcc/cp/module.cc|  2 +-
  gcc/testsuite/g++.dg/modules/using-14.C | 14 ++
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/using-14.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4e91fa6e052..9d054c4c792 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12892,7 +12892,7 @@ depset::hash::add_binding_entity (tree decl, WMB_Flags 
flags, void *data_)
inner = DECL_TEMPLATE_RESULT (inner);
  
if ((!DECL_LANG_SPECIFIC (inner) || !DECL_MODULE_PURVIEW_P (inner))

- && !(flags & (WMB_Using | WMB_Export)))
+ && !((flags & WMB_Using) && (flags & WMB_Export)))
/* Ignore global module fragment entities unless explicitly
   exported with a using declaration.  */
return false;
diff --git a/gcc/testsuite/g++.dg/modules/using-14.C 
b/gcc/testsuite/g++.dg/modules/using-14.C
new file mode 100644
index 000..0e15a952de5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-14.C
@@ -0,0 +1,14 @@
+// PR c++/114600
+// { dg-additional-options "-fmodules-ts -Wno-global-module 
-fdump-lang-module" }
+// { dg-module-cmi M }
+
+module;
+namespace std {
+  template<typename T> struct A { int n; };
+  template<typename T> A<T> f();
+  namespace __swappable_details { using std::f; }
+}
+export module M;
+
+// The whole GMF should be discarded here
+// { dg-final { scan-lang-dump "Wrote 0 clusters" module } }

Re: [PATCH] c++: Diagnose or avoid constexpr dtors in classes with virtual bases [PR114426]

2024-04-12 Thread Jason Merrill


On 4/12/24 09:12, Jakub Jelinek wrote:

Hi!

I had another look at this P1 PR today.
You said in the "c++: fix in-charge parm in constexpr" mail back in December
(as well as in the r14-6507 commit message):
"Since a class with vbases can't have constexpr 'tors there isn't actually
a need for an in-charge parameter in a destructor" but the ICE is because
the destructor is marked implicitly constexpr.
https://eel.is/c++draft/dcl.constexpr#3.2 says that a destructor of a class
with virtual bases is not constexpr-suitable, but we were actually
implementing this just for constructors, so clearly my fault from the
https://wg21.link/P0784R7 implementation.  That paper clearly added that
sentence in there and removed a similar sentence just from the constructor case.

So, the following patch makes sure the
   else if (CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fun)))
 {
   ret = false;
   if (complain)
 error ("%q#T has virtual base classes", DECL_CONTEXT (fun));
 }
hunk is done not just for DECL_CONSTRUCTOR_P (fun), but also
DECL_DESTRUCTOR_P (fun) - in that case just for cxx_dialect >= cxx20,
as for cxx_dialect < cxx20 we already set ret = false; and diagnose
a different error, so no need to diagnose two.

Bootstrapped/regtested on x86_64-linux and i686-linux, and checked it fixes
the testcase in a cross to armv7hl-linux-gnueabi, ok for trunk?


OK.


2024-04-12  Jakub Jelinek  

PR c++/114426
* constexpr.cc (is_valid_constexpr_fn): Return false/diagnose with
complain destructors in classes with virtual bases.

* g++.dg/cpp2a/pr114426.C: New test.
* g++.dg/cpp2a/constexpr-dtor16.C: New test.

--- gcc/cp/constexpr.cc.jj  2024-04-09 09:29:04.708521907 +0200
+++ gcc/cp/constexpr.cc 2024-04-12 11:45:08.845476718 +0200
@@ -262,18 +262,15 @@ is_valid_constexpr_fn (tree fun, bool co
inform (DECL_SOURCE_LOCATION (fun),
"lambdas are implicitly %<constexpr%> only in C++17 and later");
  }
-  else if (DECL_DESTRUCTOR_P (fun))
+  else if (DECL_DESTRUCTOR_P (fun) && cxx_dialect < cxx20)
  {
-  if (cxx_dialect < cxx20)
-   {
- ret = false;
- if (complain)
-   error_at (DECL_SOURCE_LOCATION (fun),
- "%<constexpr%> destructors only available"
- " with %<-std=c++20%> or %<-std=gnu++20%>");
-   }
+  ret = false;
+  if (complain)
+   error_at (DECL_SOURCE_LOCATION (fun),
+ "%<constexpr%> destructors only available with "
+ "%<-std=c++20%> or %<-std=gnu++20%>");
  }
-  else if (!DECL_CONSTRUCTOR_P (fun))
+  else if (!DECL_CONSTRUCTOR_P (fun) && !DECL_DESTRUCTOR_P (fun))
  {
tree rettype = TREE_TYPE (TREE_TYPE (fun));
if (!literal_type_p (rettype))
--- gcc/testsuite/g++.dg/cpp2a/pr114426.C.jj2024-04-12 12:05:07.443891700 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/pr114426.C   2024-04-12 12:05:07.443891700 
+0200
@@ -0,0 +1,7 @@
+// PR c++/114426
+// { dg-do compile }
+// { dg-additional-options "-O2" }
+
+struct A { virtual ~A (); };
+struct B : virtual A { virtual void foo () = 0; };
+struct C : B { C () {} };
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C.jj2024-04-12 
12:05:35.398505976 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C   2024-04-12 
12:08:31.771072322 +0200
@@ -0,0 +1,7 @@
+// PR c++/114426
+// { dg-do compile { target c++11 } }
+
+struct A { virtual ~A (); };
+struct B : virtual A { constexpr ~B () {} };
+// { dg-error "'struct B' has virtual base classes" "" { target c++20 } .-1 }
+// { dg-error "'constexpr' destructors only available with" "" { target 
c++17_down } .-2 }

Jakub

Re: [PATCH] c++: templated substitution into lambda-expr, cont [PR114393]

2024-04-12 Thread Jason Merrill


On 4/12/24 09:47, Patrick Palka wrote:


The original PR114393 testcase is unfortunately still not accepted after
r14-9938-g081c1e93d56d35 due to return type deduction confusion when a
lambda-expr is used as a default template argument.

The below reduced testcase demonstrates the bug.  Here, when forming the
dependent specialization b_v we substitute the default argument of F,
a lambda-expr, with _Descriptor=U.  (In this case in_template_context is
true since we're in the context of the template c_v so we don't defer.)
This substitution in turn lowers the level of its auto return type from
2 to 1.  So later, when instantiating c_v we incorrectly
replace the auto with the template argument at level=0, index=0, i.e.
int, instead of going through do_auto_deduction which would yield char.

One way to fix this would be to use a level-less auto to represent a
deduced return type of a lambda, but that might be too invasive of a
change at this stage.


I suspect we want to move to all level-less autos, apart from those that 
imply an actual template parameter.  But agreed, not in stage 4.


The patch is OK.

Jason

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Ajit Agarwal

Hello Alex:

On 12/04/24 8:15 pm, Alex Coplan wrote:
> On 12/04/2024 20:02, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 11/04/24 7:55 pm, Alex Coplan wrote:
>>> On 10/04/2024 23:48, Ajit Agarwal wrote:
 Hello Alex:

 On 10/04/24 7:52 pm, Alex Coplan wrote:
> Hi Ajit,
>
> On 10/04/2024 15:31, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 1:42 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 8:39 pm, Alex Coplan wrote:
> On 09/04/2024 20:01, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 7:29 pm, Alex Coplan wrote:
>>> On 09/04/2024 17:30, Ajit Agarwal wrote:


 On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
> 
>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>> &left_list,
>>  // of accesses.  If we find two sets of adjacent accesses, call
>>  // merge_pairs.
>>  void
>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>> - access_group &group)
>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>> + access_group &group)
>>  {
>>const auto lfs = decode_lfs (encoded_lfs);
>>const unsigned access_size = lfs.size;
>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>> encoded_lfs,
>> access.cand_insns,
>> lfs.load_p,
>> access_size);
>> -  skip_next = access.cand_insns.empty ();
>> +  skip_next = bb_state->cand_insns_empty_p 
>> (access.cand_insns);
>
> As above, why is this needed?

 For rs6000 we want to always return true, as load/store pairs
 that are to be merged with 8/16 16/32 32/64 are occurring for rs6000.
 And we want load/store pairs to be 8/16 32/64. That's why we want
 to always return true for rs6000, to skip pairs as above.
>>>
>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
>>> you have
>>> load/store pair instructions where the two arms of the access are 
>>> storing
>>> operands of different sizes?  Or something else?
>>>
>>> As it stands the logic is to skip the next iteration only if we
>>> exhausted all the candidate insns for the current access.  In the 
>>> case
>>> that we didn't exhaust all such candidates, then the idea is that 
>>> when
>>> access becomes prev_access, we can attempt to use those candidates 
>>> as
>>> the "left-hand side" of a pair in the next iteration since we 
>>> failed to
>>> use them as the "right-hand side" of a pair in the current 
>>> iteration.
>>> I don't see why you wouldn't want that behaviour.  Please can you
>>> explain?
>>>
>>
>> In merge_pair we get the 2 load candidates: one load from 0 offset and
>> other load is from 16th offset. Then in next iteration we get load
>> from 16th offset and other load from 32 offset. In next iteration
>> we get load from 32 offset and other load from 48 offset.
>>
>> For example:
>>
>> Currently we get the load candidates as follows.
>>
>> pairs:
>>
>> load from 0th offset.
>> load from 16th offset.
>>
>> next pairs:
>>
>> load from 16th offset.
>> load from 32nd offset.
>>
>> next pairs:
>>
>> load from 32nd offset
>> load from 48th offset.
>>
>> Instead in rs6000 we should get:
>>
>> pairs:
>>
>> load from 0th offset
>> load from 16th offset.
>>
>> next pairs:
>>
>> load from 32th offset
>> load from 48th offset.
>
> Hmm, so then I guess my question is: why wouldn't you consider merging
> the pair with offsets (16,32) for rs6000?  Is it because you have a
> stricter alignment requirement on the base pair offsets (i.e. they 
> have
> to be a multiple of 32 when the operand size is 16)?  So the pair
> offsets have to be a multiple of the entire pair size rather than a
> single operand size?

 We get load pair at a certain point with (0,16) and other program
 point we get load pair (32, 48).
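
The two candidate sequences described in the exchange above can be compared with a small toy model. This is only a sketch, not the pass's actual logic: `candidate_pairs` and the `always_skip_next` knob are hypothetical names, and the knob merely stands in for the proposed rs6000 behaviour of `cand_insns_empty_p` always returning true.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using offset_pair = std::pair<int, int>;

// Toy model only: walk sorted access offsets and form (previous, current)
// candidate pairs.  ALWAYS_SKIP_NEXT is a hypothetical knob; when set, the
// iteration after each pair is skipped, so pairs never share an access.
std::vector<offset_pair>
candidate_pairs (const std::vector<int> &offsets, bool always_skip_next)
{
  assert (std::is_sorted (offsets.begin (), offsets.end ()));
  std::vector<offset_pair> pairs;
  bool skip_next = false;
  for (std::size_t i = 1; i < offsets.size (); ++i)
    {
      if (skip_next)
        {
          // The previous access was already used as the right-hand side
          // of a pair, so do not reuse it as a left-hand side.
          skip_next = false;
          continue;
        }
      pairs.emplace_back (offsets[i - 1], offsets[i]);
      skip_next = always_skip_next;
    }
  return pairs;
}
```

With offsets 0, 16, 32 and 48, the model yields (0,16), (16,32), (32,48) when the knob is off and (0,16), (32,48) when it is on, which matches the two listings in the message.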

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-04-12 Thread Jason Merrill


On 4/12/24 10:35, Patrick Palka wrote:

On Wed, 10 Apr 2024, Jason Merrill wrote:


On 4/10/24 14:48, Patrick Palka wrote:

On Tue, 9 Apr 2024, Jason Merrill wrote:


On 3/5/24 10:31, Patrick Palka wrote:

On Tue, 27 Feb 2024, Patrick Palka wrote:

Subject: [PATCH] c++/modules: local type merging [PR99426]

One known missing piece in the modules implementation is merging of a
streamed-in local type (class or enum) with the corresponding in-TU
version of the local type.  This missing piece turns out to cause a
hard-to-reduce use-after-free GC issue due to the entity_ary not being
marked as a GC root (deliberately), and manifests as a serialization
error on stream-in as in PR99426 (see comment #6 for a reduction).  It's
also reproducible on trunk when running the xtreme-header tests without
-fno-module-lazy.

This patch makes us merge such local types according to their position
within the containing function's definition, analogous to how we merge
FIELD_DECLs of a class according to their index in the TYPE_FIELDS
list.

PR c++/99426

gcc/cp/ChangeLog:

* module.cc (merge_kind::MK_local_type): New enumerator.
(merge_kind_name): Update.
(trees_out::chained_decls): Move BLOCK-specific handling
of DECL_LOCAL_DECL_P decls to ...
(trees_out::core_vals) : ... here.  Stream
BLOCK_VARS manually.
(trees_in::core_vals) : Stream BLOCK_VARS
manually.  Handle deduplicated local types.
(trees_out::key_local_type): Define.
(trees_in::key_local_type): Define.
(trees_out::get_merge_kind) : Return
MK_local_type for a local type.
(trees_out::key_mergeable) : Use
key_local_type.
(trees_in::key_mergeable) : Likewise.
(trees_in::is_matching_decl): Be flexible with type mismatches
for local entities.

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 80b63a70a62..d9e34e9a4b9 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6714,7 +6720,37 @@ trees_in::core_vals (tree t)
case BLOCK:
  t->block.locus = state->read_location (*this);
  t->block.end_locus = state->read_location (*this);
-  t->block.vars = chained_decls ();
+
+  for (tree *chain = &t->block.vars;;)
+   if (tree decl = tree_node ())
+ {
+   /* For a deduplicated local type or enumerator, chain the
+  duplicate decl instead of the canonical in-TU decl.  Seeing
+  a duplicate here means the containing function whose body
+  we're streaming in is a duplicate too, so we'll end up
+  discarding this BLOCK (and the rest of the duplicate function
+  body) anyway.  */
+   if (is_duplicate (decl))
+ decl = maybe_duplicate (decl);
+   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
+&& TYPE_TEMPLATE_INFO (TREE_TYPE (decl)))
+ {
+   tree tmpl = TYPE_TI_TEMPLATE (TREE_TYPE (decl));
+   if (DECL_TEMPLATE_RESULT (tmpl) == decl && is_duplicate (tmpl))
+ decl = DECL_TEMPLATE_RESULT (maybe_duplicate (tmpl));
+ }


This seems like a lot of generally-applicable code for finding the
duplicate,
which other calls to maybe_duplicate/odr_duplicate don't use.  If the
template
is a duplicate, why isn't its result?  If there's a good reason for that,
should this template handling go into maybe_duplicate?


Ah yeah, that makes sense.

Some context: IIUC modules treats the TEMPLATE_DECL instead of the
DECL_TEMPLATE_RESULT as the canonical decl, which in turn means we'll
register_duplicate only the TEMPLATE_DECL.  But BLOCK_VARS never contains
a TEMPLATE_DECL, always the DECL_TEMPLATE_RESULT (i.e. a TYPE_DECL),
hence the extra handling.

Given that it's relatively more difficult to get at the TEMPLATE_DECL
from the DECL_TEMPLATE_RESULT rather than vice versa, maybe we should
just register both as duplicates from register_duplicate?  That way
callers can just simply pass the DECL_TEMPLATE_RESULT to maybe_duplicate
and it'll do the right thing.


Sounds good.


@@ -10337,6 +10373,83 @@ trees_in::fn_parms_fini (int tag, tree fn, tree
existing, bool is_defn)
}
}
+/* Encode into KEY the position of the local type (class or enum)
+   declaration DECL within FN.  The position is encoded as the
+   index of the innermost BLOCK (numbered in BFS order) along with
+   the index within its BLOCK_VARS list.  */


Since we already set DECL_DISCRIMINATOR for mangling, could we use it+name
for
the key as well?


We could (and IIUC that'd be more robust to ODR violations), but
wouldn't it mean we'd have to do a linear walk over all BLOCK_VARs of
all BLOCKS in order to find the one with the matching
name+discriminator?  That'd be slower than the current approach which
lets us skip to the correct BLOCK and walk only its BLOCK_VARS.
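
To illustrate the trade-off discussed here, the following is a minimal sketch of keying a local declaration by (BFS block index, index within the block's variable list). The `block` struct, `key_of` and `lookup` are hypothetical stand-ins, not the `module.cc` implementation; names are assumed unique in the toy tree.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a BLOCK node: VARS models BLOCK_VARS and
// SUBBLOCKS models the nested blocks.
struct block
{
  std::vector<std::string> vars;
  std::vector<block> subblocks;
};

// Encode: return {index of the containing block in BFS order, index within
// that block's vars}, or {-1, -1} if NAME is not found.
std::pair<int, int>
key_of (const block &root, const std::string &name)
{
  std::queue<const block *> work;
  work.push (&root);
  for (int block_index = 0; !work.empty (); ++block_index)
    {
      const block *b = work.front ();
      work.pop ();
      for (std::size_t i = 0; i < b->vars.size (); ++i)
        if (b->vars[i] == name)
          return {block_index, static_cast<int> (i)};
      for (const block &sub : b->subblocks)
        work.push (&sub);
    }
  return {-1, -1};
}

// Decode: walk to block number BLOCK_INDEX in BFS order and inspect only
// that block's vars, rather than scanning every variable of every block.
const std::string *
lookup (const block &root, int block_index, int var_index)
{
  std::queue<const block *> work;
  work.push (&root);
  for (int n = 0; !work.empty (); ++n)
    {
      const block *b = work.front ();
      work.pop ();
      if (n == block_index)
        return (var_index >= 0
                && var_index < static_cast<int> (b->vars.size ()))
               ? &b->vars[var_index] : nullptr;
      for (const block &sub : b->subblocks)
        work.push (&sub);
    }
  return nullptr;
}
```

The decode side stops at the requested block, which is the property the message cites as the advantage over a name+discriminator search.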


Ah, good point.  How about block number + name instead of the index?


It seems DECL_DISCRIMINATOR is

Re: [PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Andrew Carlotti

On Fri, Apr 12, 2024 at 04:49:03PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
> > one will require extending the feature bitmask).  Instead, make the
> > FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
> > specified.  On the other hand, we already have a +rcpc flag, so this
> > dependency can be specified directly.
> >
> > The cpunative test needed updating because it used an invalid Features
> > list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
> > Without this change, host_detect_local_cpu would return the architecture
> > string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-option-extensions.def: Add RCPC to
> > RCPC3 dependencies.
> > * config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
> > RCPC3 bit
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
> >
> > ---
> >
> > Bootstrapped and regression tested on aarch64.  I also verified that the
> > atomic-store.c and ldapr-sext.c tests would pass when replacing 'armv8.4-a'
> > with 'armv8-a+rcpc3'.
> >
> > Ok for master?
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> > b/gcc/config/aarch64/aarch64-option-extensions.def
> > index 
> > 3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4
> >  100644
> > --- a/gcc/config/aarch64/aarch64-option-extensions.def
> > +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> > @@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
> >  
> >  AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> >  
> > -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
> > +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
> >  
> >  AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> >  
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index 
> > 45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54
> >  100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
> > AARCH64_FL_SM_OFF;
> >  #define AARCH64_ISA_SHA3  (aarch64_isa_flags & AARCH64_FL_SHA3)
> >  #define AARCH64_ISA_F16FML(aarch64_isa_flags & AARCH64_FL_F16FML)
> >  #define AARCH64_ISA_RCPC  (aarch64_isa_flags & AARCH64_FL_RCPC)
> > -#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags & 
> > AARCH64_FL_V8_4A)
> > +#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags \
> > +   & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))
> 
> It looks like the effect of these two changes is that:
> 
> * armv9-a+rcpc3+norcpc leaves TARGET_RCPC2 true and TARGET_RCPC and
>   TARGET_RCPC3 false.
> 
> * armv8-a+rcpc3+norcpc correctly leaves all three false.
> 
> If we add the RCPC3->RCPC dependency then I think we should also
> require FL_RCPC alongside FL_V8_4A.  I.e.:
> 
> #define AARCH64_ISA_RCPC8_4   (AARCH64_ISA_RCPC \
>&& (aarch64_isa_flags \
>& (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3)))

Good spot! I'll go with the following instead (for formatting reasons), if it
passes testing:

#define AARCH64_ISA_RCPC8_4 ((AARCH64_ISA_RCPC && AARCH64_ISA_V8_4A) \
|| (aarch64_isa_flags & AARCH64_FL_RCPC3))

> OK with that change, thanks.
> 
> Richard
> 
> 
> >  #define AARCH64_ISA_RNG   (aarch64_isa_flags & AARCH64_FL_RNG)
> >  #define AARCH64_ISA_V8_5A (aarch64_isa_flags & AARCH64_FL_V8_5A)
> >  #define AARCH64_ISA_TME   (aarch64_isa_flags & AARCH64_FL_TME)
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 
> > b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > index 
> > 8d3c16a10910af977c560782f9d659c0e51286fd..3c64e00ca3a416ef565bc0b4a5b3e5bd9cfc41bc
> >  100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > @@ -1,8 +1,8 @@
> >  processor  : 0
> >  BogoMIPS   : 100.00
> > -Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
> > +Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc 
> > ilrcpc lrcpc3
> >  CPU implementer: 0xfe
> >  CPU architecture: 8
> >  CPU variant: 0x0
> >  CPU part   : 0xd08
> > -CPU revision   : 2
> > \ No newline at end of file
> > +CPU revision   : 2
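
The three forms of AARCH64_ISA_RCPC8_4 discussed in this message can be compared with a small compile-time sketch. The `FL_*` constants and function names below are hypothetical stand-ins for the real AARCH64_FL_* bits, and the sketch assumes (as the review describes) that +norcpc also clears the dependent RCPC3 bit.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the AARCH64_FL_* feature bits; the real values
// are not reproduced here.
constexpr std::uint64_t FL_V8_4A = 1u << 0;
constexpr std::uint64_t FL_RCPC  = 1u << 1;
constexpr std::uint64_t FL_RCPC3 = 1u << 2;

// Form in the posted patch: either bit enables the RCPC2 patterns.
constexpr bool rcpc8_4_posted (std::uint64_t flags)
{
  return (flags & (FL_V8_4A | FL_RCPC3)) != 0;
}

// Form suggested in review: additionally require the RCPC bit.
constexpr bool rcpc8_4_review (std::uint64_t flags)
{
  return (flags & FL_RCPC) != 0 && (flags & (FL_V8_4A | FL_RCPC3)) != 0;
}

// Form proposed in the follow-up: RCPC is checked alongside V8_4A only.
constexpr bool rcpc8_4_followup (std::uint64_t flags)
{
  return ((flags & FL_RCPC) != 0 && (flags & FL_V8_4A) != 0)
         || (flags & FL_RCPC3) != 0;
}

// armv9-a+rcpc3+norcpc under the stated assumption: only the
// architecture-level bit survives.
constexpr std::uint64_t armv9_rcpc3_norcpc = FL_V8_4A;
// armv8-a+rcpc3: both extension bits set, no V8_4A.
constexpr std::uint64_t armv8_rcpc3 = FL_RCPC | FL_RCPC3;

static_assert (rcpc8_4_posted (armv9_rcpc3_norcpc), "posted form stays true");
static_assert (!rcpc8_4_review (armv9_rcpc3_norcpc), "review form is false");
static_assert (!rcpc8_4_followup (armv9_rcpc3_norcpc), "follow-up is false");
static_assert (rcpc8_4_review (armv8_rcpc3) && rcpc8_4_followup (armv8_rcpc3),
               "both accept armv8-a+rcpc3");
```

In this model the review and follow-up forms differ only for a flag set with RCPC3 but without RCPC, which the new RCPC3->RCPC dependency is meant to rule out.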

[pushed] aarch64: Avoid using mismatched ZERO ZA sizes

2024-04-12 Thread Richard Sandiford

The svzero_mask_za intrinsic tried to use the shortest combination
of .b, .h, .s and .d tiles, allowing mixtures of sizes where necessary.
However, Iain S pointed out that LLVM instead requires the tiles to
have the same suffix.  GAS supports both versions, so this patch
generates the LLVM-friendly form.

Tested on aarch64-linux-gnu & pushed.

Please revert the patch if it causes any problems.

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_output_sme_zero_za): Require
all tiles to have the same suffix.

gcc/testsuite/
* gcc.target/aarch64/sme/acle-asm/zero_mask_za.c (zero_mask_za_ab)
(zero_mask_za_d7, zero_mask_za_bf): Expect a list of .d tiles instead
of a mixture.
---
 gcc/config/aarch64/aarch64.cc | 20 +++
 .../aarch64/sme/acle-asm/zero_mask_za.c   |  6 +++---
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a2e3d208d76..1beec94629d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -13210,29 +13210,33 @@ aarch64_output_sme_zero_za (rtx mask)
   /* The last entry in the list has the form "za7.d }", but that's the
  same length as "za7.d, ".  */
   static char buffer[sizeof("zero\t{ ") + sizeof ("za7.d, ") * 8 + 1];
-  unsigned int i = 0;
-  i += snprintf (buffer + i, sizeof (buffer) - i, "zero\t");
-  const char *prefix = "{ ";
   for (auto &tile : tiles)
 {
   unsigned int tile_mask = tile.mask;
   unsigned int tile_index = 0;
+  unsigned int i = snprintf (buffer, sizeof (buffer), "zero\t");
+  const char *prefix = "{ ";
+  auto remaining_mask = mask_val;
   while (tile_mask < 0x100)
{
- if ((mask_val & tile_mask) == tile_mask)
+ if ((remaining_mask & tile_mask) == tile_mask)
{
  i += snprintf (buffer + i, sizeof (buffer) - i, "%sza%d.%c",
 prefix, tile_index, tile.letter);
  prefix = ", ";
- mask_val &= ~tile_mask;
+ remaining_mask &= ~tile_mask;
}
  tile_mask <<= 1;
  tile_index += 1;
}
+  if (remaining_mask == 0)
+   {
+ gcc_assert (i + 3 <= sizeof (buffer));
+ snprintf (buffer + i, sizeof (buffer) - i, " }");
+ return buffer;
+   }
 }
-  gcc_assert (mask_val == 0 && i + 3 <= sizeof (buffer));
-  snprintf (buffer + i, sizeof (buffer) - i, " }");
-  return buffer;
+  gcc_unreachable ();
 }
 
 /* Return size in bits of an arithmetic operand which is shifted/scaled and
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/zero_mask_za.c 
b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/zero_mask_za.c
index 9ce7331ebdd..2ba8f8cc332 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/zero_mask_za.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/zero_mask_za.c
@@ -103,21 +103,21 @@ PROTO (zero_mask_za_aa, void, ()) { svzero_mask_za 
(0xaa); }
 
 /*
 ** zero_mask_za_ab:
-** zero{ za1\.h, za0\.d }
+** zero{ za0\.d, za1\.d, za3\.d, za5\.d, za7\.d }
 ** ret
 */
 PROTO (zero_mask_za_ab, void, ()) { svzero_mask_za (0xab); }
 
 /*
 ** zero_mask_za_d7:
-** zero{ za0\.h, za1\.d, za7\.d }
+** zero{ za0\.d, za1\.d, za2\.d, za4\.d, za6\.d, za7\.d }
 ** ret
 */
 PROTO (zero_mask_za_d7, void, ()) { svzero_mask_za (0xd7); }
 
 /*
 ** zero_mask_za_bf:
-** zero{ za1\.h, za0\.s, za2\.d }
+** zero{ za0\.d, za1\.d, za2\.d, za3\.d, za4\.d, za5\.d, za7\.d }
 ** ret
 */
 PROTO (zero_mask_za_bf, void, ()) { svzero_mask_za (0xbf); }
-- 
2.25.1
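
The selection strategy in the aarch64_output_sme_zero_za hunk above can be sketched outside GCC as follows. This is a simplified illustration, not the committed code: `zero_za_operands` and `tile_kinds` are hypothetical names, the tile-mask table is an assumption (it is not shown in the diff), and the sketch uses `std::string` instead of a static buffer.

```cpp
#include <cassert>
#include <string>

// Assumed table: for each element size, the ZA mask covered by tile 0 and
// the suffix letter.  Shifting the mask left by N gives tile N.
struct tile_kind
{
  unsigned mask;
  char letter;
};

constexpr tile_kind tile_kinds[] = {
  { 0xff, 'b' }, { 0x55, 'h' }, { 0x11, 's' }, { 0x01, 'd' }
};

// Return a ZERO instruction whose tiles all share one suffix, trying the
// widest tiles first and restarting from the full mask for each suffix.
// Precondition (assumed for this sketch): MASK_VAL is a nonzero 8-bit mask.
std::string
zero_za_operands (unsigned mask_val)
{
  assert (mask_val != 0 && mask_val <= 0xff);
  for (const tile_kind &kind : tile_kinds)
    {
      std::string text = "zero\t";
      const char *prefix = "{ ";
      unsigned remaining = mask_val;
      unsigned tile_index = 0;
      for (unsigned tile_mask = kind.mask; tile_mask < 0x100;
           tile_mask <<= 1, ++tile_index)
        if ((remaining & tile_mask) == tile_mask)
          {
            text += prefix;
            text += "za" + std::to_string (tile_index) + "." + kind.letter;
            prefix = ", ";
            remaining &= ~tile_mask;
          }
      // Accept this suffix only if its tiles cover the mask exactly.
      if (remaining == 0)
        return text + " }";
    }
  // Not reached: the single-bit .d tiles can cover any 8-bit mask.
  return std::string ();
}
```

For the mask 0xab no combination of .b, .h or .s tiles covers the mask exactly, so the sketch falls through to the .d list, in line with the updated zero_mask_za_ab expectation.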

Re: [PATCH] docs: Update function multiversioning documentation

2024-04-12 Thread Richard Sandiford

Hi Andrew,

Thanks for doing this.  I think it improves the organisation of the
FMV documentation and adds some details that were previously missing.

I've made some suggestions below, but documentation is subjective
and I realise that not everyone will agree with them.

I've also added Sandra to cc: in case she has time to help with this.
[original patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649071.html]

Andrew Carlotti  writes:
> Add target_version attribute to Common Function Attributes and update
> target and target_clones documentation.  Move shared detail and examples
> to the Function Multiversioning page.  Add target-specific details to
> target-specific pages.
>
> ---
>
> I've built and checked the info and dvi outputs.  Ok for master?
>
> gcc/ChangeLog:
>
>   * doc/extend.texi (Common Function Attributes): Update target
>   and target_clones documentation, and add target_version.
>   (AArch64 Function Attributes): Add ACLE reference and list
>   supported features.
>   (PowerPC Function Attributes): List supported features.
>   (x86 Function Attributes): Mention function multiversioning.
>   (Function Multiversioning): Update, and move shared detail here.
>
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 
> 7b54a241a7bfde03ce86571be9486b30bcea6200..78cc7ad2903b61a06b618b82ba7ad52ed42d944a
>  100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4178,18 +4178,27 @@ and @option{-Wanalyzer-tainted-size}.
>  Multiple target back ends implement the @code{target} attribute
>  to specify that a function is to
>  be compiled with different target options than specified on the
> -command line.  The original target command-line options are ignored.
> -One or more strings can be provided as arguments.
> -Each string consists of one or more comma-separated suffixes to
> -the @code{-m} prefix jointly forming the name of a machine-dependent
> -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> -
> +command line.  One or more strings can be provided as arguments.
> +The attribute may override the original target command-line options, or it 
> may
> +be combined with them in a target-specific manner.

It's hard to tell from this what the conditions for "may" are,
e.g. whether it depends on the arguments, on the back end, or both.
Could you add a bit more text to clarify (even if it's just a forward
reference)?

With that extra text, and perhaps without, I think it's clearer to
say this after...

>  The @code{target} attribute can be used for instance to have a function
>  compiled with a different ISA (instruction set architecture) than the
> -default.  @samp{#pragma GCC target} can be used to specify target-specific
> +default.

...this.  I.e.:

  Multiple target back ends implement [...] command-line.  
  The @code{target} attribute can be used [...] the default.

  

> +
> +@samp{#pragma GCC target} can be used to specify target-specific
>  options for more than one function.  @xref{Function Specific Option Pragmas},
>  for details about the pragma.
>  
> +On x86, the @code{target} attribute can also be used to create multiple
> +versions of a function, compiled with different target-specific options.
> +@xref{Function Multiversioning} for more details.

It might be clearer to put this at the end, since the rest of the section
goes back to talking about the non-FMV usage.  Perhaps the same goes for
the pragma part.

Also, how about saying that, on AArch64, the equivalent functionality
is provided by the target_version attribute?

> +
> +The options supported by the @code{target} attribute are specific to each
> +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function 
> Attributes},
> +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> +for details.
> +
>  For instance, on an x86, you could declare one function with the
>  @code{target("sse4.1,arch=core2")} attribute and another with
>  @code{target("sse4a,arch=amdfam10")}.  This is equivalent to
> @@ -4211,39 +4220,18 @@ multiple options is equivalent to separating the 
> option suffixes with
>  a comma (@samp{,}) within a single string.  Spaces are not permitted
>  within the strings.
>  
> -The options supported are specific to each target; refer to @ref{x86
> -Function Attributes}, @ref{PowerPC Function Attributes},
> -@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> -@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> -for details.
> -
>  @cindex @code{target_clones} function attribute
>  @item target_clones (@var{options})
>  The @code{target_clones} attribute is used to specify that a function
> -be cloned into multiple versions compiled with different target options
> -than specified on the command line.  The supported options and restrictions
> -are the same as for @code{target} attribute.
> -
> -For instance, on an

Re: [PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Richard Sandiford

Andrew Carlotti  writes:
> We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
> one will require extending the feature bitmask).  Instead, make the
> FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
> specified.  On the other hand, we already have a +rcpc flag, so this
> dependency can be specified directly.
>
> The cpunative test needed updating because it used an invalid Features
> list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
> Without this change, host_detect_local_cpu would return the architecture
> string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-option-extensions.def: Add RCPC to
>   RCPC3 dependencies.
>   * config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
>   RCPC3 bit
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
>
> ---
>
> Bootstrapped and regression tested on aarch64.  I also verified that the
> atomic-store.c and ldapr-sext.c tests would pass when replacing 'armv8.4-a'
> with 'armv8-a+rcpc3'.
>
> Ok for master?
>
>
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 
> 3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4
>  100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
>  
>  AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
>  
> -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
> +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
>  
>  AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> 45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
> AARCH64_FL_SM_OFF;
>  #define AARCH64_ISA_SHA3(aarch64_isa_flags & AARCH64_FL_SHA3)
>  #define AARCH64_ISA_F16FML  (aarch64_isa_flags & AARCH64_FL_F16FML)
>  #define AARCH64_ISA_RCPC(aarch64_isa_flags & AARCH64_FL_RCPC)
> -#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags & AARCH64_FL_V8_4A)
> +#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags \
> + & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))

It looks like the effect of these two changes is that:

* armv9-a+rcpc3+norcpc leaves TARGET_RCPC2 true and TARGET_RCPC and
  TARGET_RCPC3 false.

* armv8-a+rcpc3+norcpc correctly leaves all three false.

If we add the RCPC3->RCPC dependency then I think we should also
require FL_RCPC alongside FL_V8_4A.  I.e.:

#define AARCH64_ISA_RCPC8_4 (AARCH64_ISA_RCPC \
 && (aarch64_isa_flags \
 & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3)))

OK with that change, thanks.

Richard


>  #define AARCH64_ISA_RNG (aarch64_isa_flags & AARCH64_FL_RNG)
>  #define AARCH64_ISA_V8_5A   (aarch64_isa_flags & AARCH64_FL_V8_5A)
>  #define AARCH64_ISA_TME (aarch64_isa_flags & AARCH64_FL_TME)
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 
> b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> index 
> 8d3c16a10910af977c560782f9d659c0e51286fd..3c64e00ca3a416ef565bc0b4a5b3e5bd9cfc41bc
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> @@ -1,8 +1,8 @@
>  processor: 0
>  BogoMIPS : 100.00
> -Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
> +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc 
> ilrcpc lrcpc3
>  CPU implementer  : 0xfe
>  CPU architecture: 8
>  CPU variant  : 0x0
>  CPU part : 0xd08
> -CPU revision : 2
> \ No newline at end of file
> +CPU revision : 2

Re: [PATCH] aarch64: Enable +cssc for armv8.9-a

2024-04-12 Thread Richard Sandiford

Andrew Carlotti  writes:
> FEAT_CSSC is mandatory in the architecture from Armv8.9.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-arches.def: Add CSSC to V8_9A
>   dependencies.

OK, thanks.

Richard

>
> ---
>
> Bootstrapped and regression tested on aarch64.  Ok for master?
>
>
> diff --git a/gcc/config/aarch64/aarch64-arches.def 
> b/gcc/config/aarch64/aarch64-arches.def
> index 
> 9bec30e9203bac01155281ef3474846c402bb29e..4634b272e28006b5c6c2d6705a2f1010cbd9ab9b
>  100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -39,7 +39,7 @@ AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 
> 8,  (V8_4A, SB, SSBS
>  AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, 
> I8MM, BF16))
>  AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
>  AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, 
> MOPS))
> -AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A))
> +AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A, 
> CSSC))
>  AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
>  AARCH64_ARCH("armv9-a",   generic_armv9_a,   V9A  , 9,  (V8_5A, 
> SVE2))
>  AARCH64_ARCH("armv9.1-a", generic_armv9_a,   V9_1A, 9,  (V8_6A, V9A))

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Alex Coplan

On 12/04/2024 20:02, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 11/04/24 7:55 pm, Alex Coplan wrote:
> > On 10/04/2024 23:48, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 10/04/2024 15:31, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 10/04/24 1:42 pm, Alex Coplan wrote:
> > Hi Ajit,
> >
> > On 09/04/2024 20:59, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>> On 09/04/2024 20:01, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>  Hello Alex/Richard:
> 
>  All review comments are incorporated.
> >>> 
>  @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>  _list,
>   // of accesses.  If we find two sets of adjacent accesses, call
>   // merge_pairs.
>   void
>  -ldp_bb_info::transform_for_base (int encoded_lfs,
>  - access_group )
>  +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>  + access_group )
>   {
> const auto lfs = decode_lfs (encoded_lfs);
> const unsigned access_size = lfs.size;
>  @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>  encoded_lfs,
>  access.cand_insns,
>  lfs.load_p,
>  access_size);
>  -  skip_next = access.cand_insns.empty ();
>  +  skip_next = bb_state->cand_insns_empty_p 
>  (access.cand_insns);
> >>>
> >>> As above, why is this needed?
> >>
> >> For rs6000 we want to always return true, as load/store pairs
> >> that are to be merged with 8/16 16/32 32/64 are occurring for rs6000.
> >> And we want load/store pairs for 8/16 32/64. That's why we want
> >> to always return true for rs6000, to skip pairs as above.
> >
> > Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> > you have
> > load/store pair instructions where the two arms of the access are 
> > storing
> > operands of different sizes?  Or something else?
> >
> > As it stands the logic is to skip the next iteration only if we
> > exhausted all the candidate insns for the current access.  In the 
> > case
> > that we didn't exhaust all such candidates, then the idea is that 
> > when
> > access becomes prev_access, we can attempt to use those candidates 
> > as
> > the "left-hand side" of a pair in the next iteration since we 
> > failed to
> > use them as the "right-hand side" of a pair in the current 
> > iteration.
> > I don't see why you wouldn't want that behaviour.  Please can you
> > explain?
> >
> 
>  In merge_pair we get the 2 load candidates one load from 0 offset and
>  other load is from 16th offset. Then in next iteration we get load
>  from 16th offset and other load from 32 offset. In next iteration
>  we get load from 32 offset and other load from 48 offset.
> 
>  For example:
> 
>  Currently we get the load candidates as follows.
> 
>  pairs:
> 
>  load from 0th offset.
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 16th offset.
>  load from 32th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> 
>  Instead in rs6000 we should get:
> 
>  pairs:
> 
>  load from 0th offset
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> >>>
> >>> Hmm, so then I guess my question is: why wouldn't you consider merging
> >>> the pair with offsets (16,32) for rs6000?  Is it because you have a
> >>> stricter alignment requirement on the base pair offsets (i.e. they 
> >>> have
> >>> to be a multiple of 32 when the operand size is 16)?  So the pair
> >>> offsets have to be a multiple of the entire pair size rather than a
> >>> single operand size?
> >>
> >> We get load pair at a certain point with (0,16) and other program
> >> point we get load pair (32, 48).
> >>
> >> In current implementation it takes

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-04-12 Thread Patrick Palka

On Wed, 10 Apr 2024, Jason Merrill wrote:

> On 4/10/24 14:48, Patrick Palka wrote:
> > On Tue, 9 Apr 2024, Jason Merrill wrote:
> > 
> > > On 3/5/24 10:31, Patrick Palka wrote:
> > > > On Tue, 27 Feb 2024, Patrick Palka wrote:
> > > > 
> > > > Subject: [PATCH] c++/modules: local type merging [PR99426]
> > > > 
> > > > One known missing piece in the modules implementation is merging of a
> > > > streamed-in local type (class or enum) with the corresponding in-TU
> > > > version of the local type.  This missing piece turns out to cause a
> > > > hard-to-reduce use-after-free GC issue due to the entity_ary not being
> > > > marked as a GC root (deliberately), and manifests as a serialization
> > > > error on stream-in as in PR99426 (see comment #6 for a reduction).  It's
> > > > also reproducible on trunk when running the xtreme-header tests without
> > > > -fno-module-lazy.
> > > > 
> > > > This patch makes us merge such local types according to their position
> > > > within the containing function's definition, analogous to how we merge
> > > > FIELD_DECLs of a class according to their index in the TYPE_FIELDS
> > > > list.
> > > > 
> > > > PR c++/99426
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * module.cc (merge_kind::MK_local_type): New enumerator.
> > > > (merge_kind_name): Update.
> > > > (trees_out::chained_decls): Move BLOCK-specific handling
> > > > of DECL_LOCAL_DECL_P decls to ...
> > > > (trees_out::core_vals) : ... here.  Stream
> > > > BLOCK_VARS manually.
> > > > (trees_in::core_vals) : Stream BLOCK_VARS
> > > > manually.  Handle deduplicated local types.
> > > > (trees_out::key_local_type): Define.
> > > > (trees_in::key_local_type): Define.
> > > > (trees_out::get_merge_kind) : Return
> > > > MK_local_type for a local type.
> > > > (trees_out::key_mergeable) : Use
> > > > key_local_type.
> > > > (trees_in::key_mergeable) : Likewise.
> > > > (trees_in::is_matching_decl): Be flexible with type mismatches
> > > > for local entities.
> > > > 
> > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > > index 80b63a70a62..d9e34e9a4b9 100644
> > > > --- a/gcc/cp/module.cc
> > > > +++ b/gcc/cp/module.cc
> > > > @@ -6714,7 +6720,37 @@ trees_in::core_vals (tree t)
> > > >case BLOCK:
> > > >  t->block.locus = state->read_location (*this);
> > > >  t->block.end_locus = state->read_location (*this);
> > > > -  t->block.vars = chained_decls ();
> > > > +
> > > > +  for (tree *chain = &t->block.vars;;)
> > > > +   if (tree decl = tree_node ())
> > > > + {
> > > > +   /* For a deduplicated local type or enumerator, chain the
> > > > +  duplicate decl instead of the canonical in-TU decl.  
> > > > Seeing
> > > > +  a duplicate here means the containing function whose body
> > > > +  we're streaming in is a duplicate too, so we'll end up
> > > > +  discarding this BLOCK (and the rest of the duplicate 
> > > > function
> > > > +  body) anyway.  */
> > > > +   if (is_duplicate (decl))
> > > > + decl = maybe_duplicate (decl);
> > > > +   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
> > > > +&& TYPE_TEMPLATE_INFO (TREE_TYPE (decl)))
> > > > + {
> > > > +   tree tmpl = TYPE_TI_TEMPLATE (TREE_TYPE (decl));
> > > > +   if (DECL_TEMPLATE_RESULT (tmpl) == decl && is_duplicate
> > > > (tmpl))
> > > > + decl = DECL_TEMPLATE_RESULT (maybe_duplicate (tmpl));
> > > > + }
> > > 
> > > This seems like a lot of generally-applicable code for finding the
> > > duplicate,
> > > which other calls to maybe_duplicate/odr_duplicate don't use.  If the
> > > template
> > > is a duplicate, why isn't its result?  If there's a good reason for that,
> > > should this template handling go into maybe_duplicate?
> > 
> > Ah yeah, that makes sense.
> > 
> > Some context: IIUC modules treats the TEMPLATE_DECL instead of the
> > DECL_TEMPLATE_RESULT as the canonical decl, which in turn means we'll
> > register_duplicate only the TEMPLATE_DECL.  But BLOCK_VARS never contains
> > a TEMPLATE_DECL, always the DECL_TEMPLATE_RESULT (i.e. a TYPE_DECL),
> > hence the extra handling.
> > 
> > Given that it's relatively more difficult to get at the TEMPLATE_DECL
> > from the DECL_TEMPLATE_RESULT rather than vice versa, maybe we should
> > just register both as duplicates from register_duplicate?  That way
> > callers can just simply pass the DECL_TEMPLATE_RESULT to maybe_duplicate
> > and it'll do the right thing.
> 
> Sounds good.
> 
> > > > @@ -10337,6 +10373,83 @@ trees_in::fn_parms_fini (int tag, tree fn, tree
> > > > existing, bool is_defn)
> > > >}
> > > >}
> > > >+/* Encode into KEY the position of the local type (class or enum)
> > > > +   declaration DECL

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Ajit Agarwal

Hello Alex:

On 11/04/24 7:55 pm, Alex Coplan wrote:
> On 10/04/2024 23:48, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 7:52 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 10/04/2024 15:31, Ajit Agarwal wrote:
 Hello Alex:

 On 10/04/24 1:42 pm, Alex Coplan wrote:
> Hi Ajit,
>
> On 09/04/2024 20:59, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 8:39 pm, Alex Coplan wrote:
>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 7:29 pm, Alex Coplan wrote:
> On 09/04/2024 17:30, Ajit Agarwal wrote:
>>
>>
>> On 05/04/24 10:03 pm, Alex Coplan wrote:
>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
 Hello Alex/Richard:

 All review comments are incorporated.
>>> 
 @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
 _list,
  // of accesses.  If we find two sets of adjacent accesses, call
  // merge_pairs.
  void
 -ldp_bb_info::transform_for_base (int encoded_lfs,
 -   access_group )
 +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
 +   access_group )
  {
const auto lfs = decode_lfs (encoded_lfs);
const unsigned access_size = lfs.size;
 @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
 encoded_lfs,
   access.cand_insns,
   lfs.load_p,
   access_size);
 -skip_next = access.cand_insns.empty ();
 +skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
>>>
>>> As above, why is this needed?
>>
>> For rs6000 we always want to return true, because load/store pairs
>> to be merged at 8/16, 16/32, 32/64 are occurring for rs6000,
>> and we want load/store pairs at 8/16 and 32/64 only. That's why we
>> always want to return true for rs6000, to skip pairs as above.
>
> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> you have
> load/store pair instructions where the two arms of the access are 
> storing
> operands of different sizes?  Or something else?
>
> As it stands the logic is to skip the next iteration only if we
> exhausted all the candidate insns for the current access.  In the case
> that we didn't exhaust all such candidates, then the idea is that when
> access becomes prev_access, we can attempt to use those candidates as
> the "left-hand side" of a pair in the next iteration since we failed 
> to
> use them as the "right-hand side" of a pair in the current iteration.
> I don't see why you wouldn't want that behaviour.  Please can you
> explain?
>

 In merge_pair we get the 2 load candidates: one load from offset 0 and
 the other load from offset 16. Then in the next iteration we get a load
 from offset 16 and the other load from offset 32. In the next iteration
 we get a load from offset 32 and the other load from offset 48.

 For example:

 Currently we get the load candidates as follows.

 pairs:

 load from 0th offset.
 load from 16th offset.

 next pairs:

 load from 16th offset.
 load from 32nd offset.

 next pairs:

 load from 32nd offset.
 load from 48th offset.

 Instead in rs6000 we should get:

 pairs:

 load from 0th offset.
 load from 16th offset.

 next pairs:

 load from 32nd offset.
 load from 48th offset.
>>>
>>> Hmm, so then I guess my question is: why wouldn't you consider merging
>>> the pair with offsets (16,32) for rs6000?  Is it because you have a
>>> stricter alignment requirement on the base pair offsets (i.e. they have
>>> to be a multiple of 32 when the operand size is 16)?  So the pair
>>> offsets have to be a multiple of the entire pair size rather than a
>>> single operand size?
>>
>> We get load pair at a certain point with (0,16) and other program
>> point we get load pair (32, 48).
>>
>> In current implementation it takes offsets loads as (0, 16),
>> (16, 32), (32, 48).
>>
>> But in rs6000 we want the load pairs to be merged at different points,
>> as (0,16) and (32,48): for (0,16) we want to replace the loads with an
>> lxvp at offset 0, and for (32,48) with an lxvp at offset 32.
>>
>> In current case it will merge with lxvp with 0 offset and lxvp with
>> 16 offset, then lxvp with 32 offset and lxvp
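
The two pairing behaviours discussed in this thread can be sketched in
isolation.  This is a standalone model, not the pair-fusion pass itself:
pair_offsets, its allow_overlap flag and the flat offset array are
hypothetical names introduced only for illustration.  With allow_overlap
set, an access that was the right-hand side of one candidate pair is reused
as the left-hand side of the next, which is the behaviour Alex describes;
with it cleared, each access appears in at most one pair, which is what
Ajit describes wanting for rs6000.

```c
#include <stddef.h>

/* One candidate pair of load offsets (left-hand and right-hand access).  */
struct offset_pair { int left; int right; };

/* Walk OFFSETS (sorted, N entries) and record adjacent candidate pairs in OUT.
   If ALLOW_OVERLAP is nonzero, the right-hand access of one pair is reused as
   the left-hand access of the next: (0,16), (16,32), (32,48).
   If it is zero, an access is consumed once paired: (0,16), (32,48).
   Returns the number of pairs written; OUT must have room for N - 1 pairs.  */
size_t
pair_offsets (const int *offsets, size_t n, int allow_overlap,
              struct offset_pair *out)
{
  size_t count = 0;
  size_t step = allow_overlap ? 1 : 2;

  for (size_t i = 0; i + 1 < n; i += step)
    {
      out[count].left = offsets[i];
      out[count].right = offsets[i + 1];
      count++;
    }
  return count;
}
```

Calling pair_offsets on the offsets 0, 16, 32, 48 gives three pairs with
overlap allowed and two pairs without, matching the two listings quoted
above.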

[PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Andrew Carlotti

We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
one will require extending the feature bitmask).  Instead, make the
FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
specified.  On the other hand, we already have a +rcpc flag, so this
dependency can be specified directly.

The cpunative test needed updating because it used an invalid Features
list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
Without this change, host_detect_local_cpu would return the architecture
string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Add RCPC to
RCPC3 dependencies.
* config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
RCPC3 bit

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.

---

Bootstrapped and regression tested on aarch64.  I also verified that the
atomic-store.c and ldapr-sext.c tests would pass when replacing 'armv8.4-a'
with 'armv8-a+rcpc3'.

Ok for master?


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
 
 AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
-AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
+AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
 
 AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SHA3  (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML (aarch64_isa_flags & AARCH64_FL_F16FML)
 #define AARCH64_ISA_RCPC  (aarch64_isa_flags & AARCH64_FL_RCPC)
-#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags & AARCH64_FL_V8_4A)
+#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags \
+   & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))
 #define AARCH64_ISA_RNG   (aarch64_isa_flags & AARCH64_FL_RNG)
 #define AARCH64_ISA_V8_5A (aarch64_isa_flags & AARCH64_FL_V8_5A)
 #define AARCH64_ISA_TME   (aarch64_isa_flags & AARCH64_FL_TME)
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
index 
8d3c16a10910af977c560782f9d659c0e51286fd..3c64e00ca3a416ef565bc0b4a5b3e5bd9cfc41bc
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
@@ -1,8 +1,8 @@
 processor  : 0
 BogoMIPS   : 100.00
-Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc 
ilrcpc lrcpc3
 CPU implementer: 0xfe
 CPU architecture: 8
 CPU variant: 0x0
 CPU part   : 0xd08
-CPU revision   : 2
\ No newline at end of file
+CPU revision   : 2
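
A minimal sketch of what the aarch64.h hunk does, assuming made-up bit
values: the FL_* constants and isa_rcpc8_4 below are stand-ins for
illustration, not the real AARCH64_FL_* definitions.  The point is only that
masking with the OR of two flags yields nonzero when either flag is set.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real AARCH64_FL_* values differ.  */
#define FL_V8_4A  (UINT64_C (1) << 0)
#define FL_RCPC   (UINT64_C (1) << 1)
#define FL_RCPC3  (UINT64_C (1) << 2)

/* Mirrors the shape of the new AARCH64_ISA_RCPC8_4 test: nonzero when
   either the armv8.4-a bit or the rcpc3 bit is present in ISA_FLAGS.  */
int
isa_rcpc8_4 (uint64_t isa_flags)
{
  return (isa_flags & (FL_V8_4A | FL_RCPC3)) != 0;
}
```

In this model +rcpc alone does not satisfy the test, while +rcpc3 does even
without the armv8.4-a bit.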

[PATCH] aarch64: Enable +cssc for armv8.9-a

2024-04-12 Thread Andrew Carlotti

FEAT_CSSC is mandatory in the architecture from Armv8.9.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def: Add CSSC to V8_9A
dependencies.

---

Bootstrapped and regression tested on aarch64.  Ok for master?


diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
9bec30e9203bac01155281ef3474846c402bb29e..4634b272e28006b5c6c2d6705a2f1010cbd9ab9b
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -39,7 +39,7 @@ AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 
8,  (V8_4A, SB, SSBS
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
-AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A))
+AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A, CSSC))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
 AARCH64_ARCH("armv9-a",   generic_armv9_a,   V9A  , 9,  (V8_5A, SVE2))
 AARCH64_ARCH("armv9.1-a", generic_armv9_a,   V9_1A, 9,  (V8_6A, V9A))
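
As a toy model of why adding CSSC to the V8_9A list is enough: each entry
names its direct dependencies, and enabling an architecture pulls in the
transitive closure.  The enum, table and function names below are
hypothetical and only imitate the lists shown in the diff; this is not how
GCC's own dependency expansion is written.

```c
#include <stdint.h>

/* Hypothetical feature bits for a toy model of the dependency lists.  */
enum { F_V8_7A, F_MOPS, F_V8_8A, F_CSSC, F_V8_9A, F_COUNT };

#define BIT(f) (UINT32_C (1) << (f))

/* Direct dependencies of each feature, modelled on the lists in the patch:
   V8_8A -> (V8_7A, MOPS) and, after the change, V8_9A -> (V8_8A, CSSC).  */
static const uint32_t direct_deps[F_COUNT] = {
  [F_V8_8A] = BIT (F_V8_7A) | BIT (F_MOPS),
  [F_V8_9A] = BIT (F_V8_8A) | BIT (F_CSSC),
};

/* Return FLAGS plus everything its features depend on, transitively.  */
uint32_t
enable_with_deps (uint32_t flags)
{
  uint32_t prev;
  do
    {
      prev = flags;
      for (int f = 0; f < F_COUNT; f++)
        if (flags & BIT (f))
          flags |= direct_deps[f];
    }
  while (flags != prev);
  return flags;
}
```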

Re: [PATCH] rs6000: Add OPTION_MASK_POWER8 [PR101865]

2024-04-12 Thread Segher Boessenkool

Hi!

On Thu, Apr 11, 2024 at 11:23:02PM -0500, Peter Bergner wrote:
> On 4/11/24 10:31 PM, Kewen.Lin wrote:
> >> +;; This option exists only to create its MASK.  It is not intended for 
> >> users.
> >> +mdo-not-use-this-option
> >> +Target RejectNegative Mask(POWER8) Var(rs6000_isa_flags) WarnRemoved
> >> +
> > 
> > I can understand the given name is to avoid users to use it, but it looks 
> > odd, personally
> > I'm inclined to mpower8 (or even mpower8-internal) even if it's more likely 
> > to be used but
> > it's a bit more meaningful (especially we already have mpower10), 
> > theoretically speaking
> > it's undocumented users shouldn't use it at all.
> 
> Sorry, I should have mentioned this, but I originally had it -mpower8, but 
> given
> it was an option we don't want users to use, Segher mentioned offline to give 
> it
> a name something like the above and not -mpower8.  I kind of like 
> -mpower8-internal
> now that you mention it, but I'd like Segher's input here whether he prefers
> -mdo-not-use-this-option or -mpower8-internal or something else???

-mpower8-internal is fine.  Anyone who thinks this would be a good thing
to use, well, we cannot stop them from hurting themselves I guess.  Esp.
with a nice help text it is fine.

Going forward we need something like this for most ISA levels (we
currently often use the existence of some more-or-less random insn for
this, but we need the same test to actually test for *that* insn, not a
good thing at all).  But we do not want the user to be able to use such
options at all, so we really shouldn't make command line options for it.

So it should not use an option flag at all for this, but something else.
Maybe something new even, we have had problems around this forever, that
suggests we have insufficient abstractions around this :-)

> I'll make the changes above, modulo leaving the option name unchanged until
> we hear from Segher on that and report back on the LE and BE testing.

-mpower8-internal should dissuade users from using it, certainly people
who actually read the documentation as well.  It is unfortunate we need
to tell people to not use tools we provide ourselves, but this is
temporary, right :-)  (Right?!)

Thanks guys,


Segher

Re: [PATCH] docs: Update function multiversioning documentation

2024-04-12 Thread Andrew Carlotti

Resending to CC some relevant reviewers.

I'll remove "memtag", "ssbs" and "ls64" from the AArch64 feature list before
committing, following changes to my recent AArch64 patch series.

On Tue, Apr 09, 2024 at 02:35:48PM +0100, Andrew Carlotti wrote:
> Add target_version attribute to Common Function Attributes and update
> target and target_clones documentation.  Move shared detail and examples
> to the Function Multiversioning page.  Add target-specific details to
> target-specific pages.
> 
> ---
> 
> I've built and checked the info and dvi outputs.  Ok for master?
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi (Common Function Attributes): Update target
>   and target_clones documentation, and add target_version.
>   (AArch64 Function Attributes): Add ACLE reference and list
>   supported features.
>   (PowerPC Function Attributes): List supported features.
>   (x86 Function Attributes): Mention function multiversioning.
>   (Function Multiversioning): Update, and move shared detail here.
> 
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 
> 7b54a241a7bfde03ce86571be9486b30bcea6200..78cc7ad2903b61a06b618b82ba7ad52ed42d944a
>  100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4178,18 +4178,27 @@ and @option{-Wanalyzer-tainted-size}.
>  Multiple target back ends implement the @code{target} attribute
>  to specify that a function is to
>  be compiled with different target options than specified on the
> -command line.  The original target command-line options are ignored.
> -One or more strings can be provided as arguments.
> -Each string consists of one or more comma-separated suffixes to
> -the @code{-m} prefix jointly forming the name of a machine-dependent
> -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> -
> +command line.  One or more strings can be provided as arguments.
> +The attribute may override the original target command-line options, or it 
> may
> +be combined with them in a target-specific manner.
>  The @code{target} attribute can be used for instance to have a function
>  compiled with a different ISA (instruction set architecture) than the
> -default.  @samp{#pragma GCC target} can be used to specify target-specific
> +default.
> +
> +@samp{#pragma GCC target} can be used to specify target-specific
>  options for more than one function.  @xref{Function Specific Option Pragmas},
>  for details about the pragma.
>  
> +On x86, the @code{target} attribute can also be used to create multiple
> +versions of a function, compiled with different target-specific options.
> +@xref{Function Multiversioning} for more details.
> +
> +The options supported by the @code{target} attribute are specific to each
> +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function 
> Attributes},
> +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> +for details.
> +
>  For instance, on an x86, you could declare one function with the
>  @code{target("sse4.1,arch=core2")} attribute and another with
>  @code{target("sse4a,arch=amdfam10")}.  This is equivalent to
> @@ -4211,39 +4220,18 @@ multiple options is equivalent to separating the 
> option suffixes with
>  a comma (@samp{,}) within a single string.  Spaces are not permitted
>  within the strings.
>  
> -The options supported are specific to each target; refer to @ref{x86
> -Function Attributes}, @ref{PowerPC Function Attributes},
> -@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> -@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> -for details.
> -
>  @cindex @code{target_clones} function attribute
>  @item target_clones (@var{options})
>  The @code{target_clones} attribute is used to specify that a function
> -be cloned into multiple versions compiled with different target options
> -than specified on the command line.  The supported options and restrictions
> -are the same as for @code{target} attribute.
> -
> -For instance, on an x86, you could compile a function with
> -@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
> -one compiled with @option{-msse4.1} and another with @option{-mavx}.
> -
> -On a PowerPC, you can compile a function with
> -@code{target_clones("cpu=power9,default")}.  GCC will create two
> -function clones, one compiled with @option{-mcpu=power9} and another
> -with the default options.  GCC must be configured to use GLIBC 2.23 or
> -newer in order to use the @code{target_clones} attribute.
> -
> -It also creates a resolver function (see
> -the @code{ifunc} attribute above) that dynamically selects a clone
> -suitable for current architecture.  The resolver is created only if there
> -is a usage of a function with @code{target_clones} attribute.
> -
> -Note that any subsequent call of a function without @code{target_clone}
> -from a @code{target_clone} caller will not lead to copying
> -(target

[PATCH v9 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-04-12 Thread Qing Zhao

gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds-4.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 5 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..7cd3c6aa5b88 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when class_of_size is 1, i.e, the number of the elements of
+ the object type, return the size.  */
+  if (class_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0))))
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }
   else
return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..b503320628d2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* Test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include <stdlib.h>
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f = (struct foo *) malloc
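
As a rough model of the arithmetic in get_bound_from_access_with_size and
the new branch in ubsan_instrument_bounds (not the tree-building code
itself): a negative counted_by value is treated as zero, the result is
converted to an unsigned size, and the bound handed on is that count minus
one.  The helper names below are hypothetical.

```c
#include <stddef.h>

/* Treat a negative element count as zero, then convert to size_t.  */
size_t
clamp_count (long count)
{
  return count < 0 ? (size_t) 0 : (size_t) count;
}

/* Inclusive upper bound for an index into an array of COUNT elements.
   The subtraction is unsigned, so in this model a count of zero wraps
   around to the largest size_t value.  */
size_t
inclusive_bound (long count)
{
  return clamp_count (count) - 1;
}
```

For a counted_by value of 10 the inclusive bound is 9, so index 10 is the
first one out of range.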

[PATCH v9 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-04-12 Thread Qing Zhao

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  60 ++
 5 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..78f50230e891
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..20103d58ef51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* Test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+We should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say

[PATCH v9 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-04-12 Thread Qing Zhao

to carry the TYPE of the flexible array.

Such information is needed during tree-object-size.cc.

We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
argument to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): Use the type
of the 6th argument for the type of the element.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-6.c: New test.
---
 gcc/c/c-typeck.cc | 11 +++--
 gcc/internal-fn.cc|  2 +
 .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
 gcc/tree-object-size.cc   | 16 ---
 4 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index ff6685c6c4ba..0ea3b75355a4 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2640,7 +2640,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (TYPE_OF_ARRAY *)0))
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2652,6 +2653,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
+   The 6th argument of the call is a constant 0 with the pointer TYPE
+   to the original flexible array type.
+
   */
 static tree
 build_access_with_size_for_counted_by (location_t loc, tree ref,
@@ -2664,12 +2668,13 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
-   result_type, 5,
+   result_type, 6,
array_to_pointer_conversion (loc, ref),
counted_by_ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
-   build_int_cst (integer_type_node, -1));
+   build_int_cst (integer_type_node, -1),
+   build_int_cst (result_type, 0));
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e744080ee670..34e4a4aea534 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  1: read_only
  2: write_only
  3: read_write
+   6th argument: A constant 0 with the pointer TYPE to the original flexible
+ array type.
 
Both the return type and the type of the first argument of this
function have been converted from the incomplete array type to
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
new file mode 100644
index ..65fa01443d95
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
@@ -0,0 +1,46 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size: when the type of the flexible array member
+ * is casting to another type.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+typedef unsigned short u16;
+
+struct info {
+   u16 data_len;
+   char data[] __attribute__((counted_by(data_len)));
+};
+
+struct foo {
+   int a;
+   int b;
+};
+
+static __attribute__((__noinline__))
+struct info *setup ()
+{
+ struct info *p;
+ size_t bytes = 3 * sizeof(struct foo);
+
+ p = (struct info *)malloc (sizeof (struct info) + bytes);
+ p->data_len = bytes;
+
+ return p;
+}
+
+static void
+__attribute__((__noinline__)) report (struct info *p)
+{
+ struct foo *bar = (struct foo *)p->data;
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 1), 1), 16);
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 2), 1), 8);
+}
+
+int main(int argc, char *argv[])
+{
+ struct info *p = setup();
+ report(p);
+ return 0;
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 8de264d1dee2..4c1fa9b555fa 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -762,9 +762,11 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
  1: the number of the elements of the object type;
4th argument
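
The expected values in flex-array-counted-by-6.c follow from simple byte
arithmetic, sketched here as a standalone helper.  The name remaining_bytes
is hypothetical; it models the expected result, not how tree-object-size.cc
computes it.  data_len is 3 * sizeof (struct foo); assuming a 4-byte int
that is 24 bytes, so 16 bytes remain past bar + 1 and 8 past bar + 2.

```c
#include <stddef.h>

/* Same layout as struct foo in the test: two ints.  */
struct foo { int a; int b; };

/* Bytes left in an object of TOTAL bytes when starting OFFSET bytes in;
   zero once the offset reaches or passes the end.  */
size_t
remaining_bytes (size_t total, size_t offset)
{
  return offset < total ? total - offset : 0;
}
```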

[PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-04-12 Thread Qing Zhao

Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c-typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  To make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.
  (initializer_constant_valid_p_1 and output_constant in varasm.cc)

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)

  In addition to "offsetof", the counted_by attribute is also ignored for
  references inside the "typeof" and "alignof" operators.

  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal from clobbering anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
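
Conceptually, the list above describes .ACCESS_WITH_SIZE as a pass-through
wrapper: it keeps the size information next to the reference while the
middle end works on it, and expansion replaces the call with its first
argument.  A plain-C model of that idea follows, with a hypothetical
function name and a simplified argument list (the real internal function
takes more arguments and is not callable from user code):

```c
#include <stddef.h>

/* Toy stand-in for .ACCESS_WITH_SIZE: REF is the pointer to the flexible
   array, COUNT_REF points at the counted_by field.  The extra argument only
   carries information for analyses; the value produced is REF itself.  */
void *
access_with_size_model (void *ref, const size_t *count_ref)
{
  (void) count_ref;  /* Unused at "expansion" time in this model.  */
  return ref;
}
```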

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 128 +-
 gcc/internal-fn.cc|  35 +
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 331 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   ||

[PATCH v9 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-04-12 Thread Qing Zhao

'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field "COUNT" in the
 same structure as the flexible array member.
 GCC may use this information to improve detection of object size
 information for such structures and provide better results in
 compile-time diagnostics and runtime features like the array bound
 sanitizer and the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler reports an error and
 ignores the attribute.

 When the field that represents the number of the elements is assigned a
 negative integer value, the compiler treats the value as zero.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' must be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure the above requirements
 are kept all the time.  Otherwise the compiler reports
 warnings and, at the same time, the results of the array bound
 sanitizer and the '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is that a reference to the
 flexible array member field uses the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' uses 'val1' as the number of the elements
 in 'p->array', and 'ref2' uses 'val2' as the number of elements
 in 'p->array'.
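
As a rough illustration of the two requirements and of the "latest value"
rule, here is a minimal sketch in C.  It is not taken from the patch or its
tests.  The COUNTED_BY macro and the helpers alloc_p and store_checked are
hypothetical, and the bounds check is written by hand so that the example
behaves the same on compilers without the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use counted_by only where the compiler knows it; otherwise the
   macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct P {
  size_t count;
  char other;
  char array[] COUNTED_BY (count);
};

/* Hypothetical helper: allocate room for N elements and initialize
   'count' before 'array' is referenced for the first time.  */
static struct P *
alloc_p (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (char));
  if (p != NULL)
    p->count = n;
  return p;
}

/* Hypothetical helper: store VALUE at IDX only if IDX is within the
   count that is current at the time of the call.  Returns 1 on success
   and 0 if the index is out of bounds.  */
static int
store_checked (struct P *p, size_t idx, char value)
{
  assert (p != NULL);
  if (idx >= p->count)
    return 0;
  p->array[idx] = value;
  return 1;
}
```

Lowering p->count between two calls to store_checked changes which indexes
are accepted, mirroring how 'ref1' and 'ref2' above see 'val1' and 'val2'.
The allocation must still hold at least p->count elements at all times.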

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.
(tagged_types_tu_compatible_p): Check counted_by attribute for
structure type.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
* gcc.dg/flex-array-counted-by-7.c: New test.
* gcc.dg/flex-array-counted-by-8.c: New test.
---
 gcc/c-family/c-attribs.cc |  68 +-
 gcc/c-family/c-common.cc  |  13 ++
 gcc/c-family/c-common.h   |   1 +
 gcc/c/c-decl.cc   |  78 ---
 gcc/c/c-tree.h|   1 +
 gcc/c/c-typeck.cc |  37 -
 gcc/doc/extend.texi   |  68 ++
 .../gcc.dg/flex-array-counted-by-7.c  |   8 ++
 .../gcc.dg/flex-array-counted-by-8.c  | 127 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  62 +
 10 files changed, 442 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-7.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-8.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..39e5824ee7a5 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute

[PATCH v9 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-04-12 Thread Qing Zhao

Hi,

This is the 9th version of the patch.

Compared with the 8th version, the differences are:

updates per Joseph's comments:

1. in C FE, add checking for counted_by attribute for the new multiple
   definitions of the same tag for C23 in the routine
   "tagged_types_tu_compatible_p".
   Add a new testing case flex-array-counted-by-8.c for this.
   This is for Patch 1;

2. two minor typo fixes in c-typeck.cc. 
   This is for Patch 2;

Approval status:

   Patch 2's C FE change has been approved with minor typo fixes (the above 2);
   Patch 4 has been approved; 
   Patch 5's C FE change has been approved;

Review status:

   Patch 3, Patch 2 and Patch 5's middle-end changes have been reviewed by
   Sid, with no issues.
   
More review needed:

   Patch 1's new change to C FE (the above 1);
   Patch 2, 3 and 5's middle-end changes need to be approved
  
The 8th version is here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648559.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648560.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648561.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648562.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648563.html

It is based on the following original proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its 
consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size
  information for every reference to a FAM field;
* In C FE, replace every reference to a FAM field whose TYPE has the
  "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound
  sanitizer, query the size information or ACCESS_MODE information from the
  new internal function;
* When expanding to RTL, replace the internal function with the actual
  reference to the FAM field;
* Some adjustment to ipa alias analysis, and other SSA passes to mitigate the
  impact to the optimizer and code generation.


**The new internal function

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE,
                     ACCESS_MODE, TYPE_OF_REF)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns "REF_TO_OBJ", the same as its 1st argument;

Both the return type and the type of the first argument of this function have
been converted from the incomplete array type to the corresponding pointer type.

The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is
the original incomplete array type.

Please see the following link for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object,
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents
   0: the number of bytes;
   1: the number of the elements of the object type;
4th argument "TYPE_OF_SIZE": A constant 0 with the TYPE of the object
  referenced by REF_TO_SIZE
5th argument "ACCESS_MODE":
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
6th argument "TYPE_OF_REF": A constant 0 with the pointer TYPE to
  the original flexible array type.
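
To make the argument list above easier to follow, here is a small conceptual
model in plain C.  It is only a sketch and not how GCC represents the call
internally: the struct access_with_size, both enums, and the helpers
expand_access_with_size and size_in_bytes are hypothetical names.  The 4th and
6th arguments carry only types, so the model leaves them out.

```c
#include <assert.h>
#include <stddef.h>

/* 3rd argument, CLASS_OF_SIZE: what the referenced size counts.  */
enum class_of_size {
  SIZE_IN_BYTES = 0,
  SIZE_IN_ELEMENTS = 1
};

/* 5th argument, ACCESS_MODE.  */
enum access_mode {
  ACCESS_UNKNOWN = -1,
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Hypothetical stand-in for one call to .ACCESS_WITH_SIZE.  */
struct access_with_size {
  char *ref_to_obj;            /* 1st argument */
  const size_t *ref_to_size;   /* 2nd argument */
  enum class_of_size class_of_size;
  enum access_mode access_mode;
};

/* Models the expansion step: the call yields its first argument.  */
static char *
expand_access_with_size (const struct access_with_size *call)
{
  return call->ref_to_obj;
}

/* Models what a consumer of the size information could compute.
   ELEM_SIZE is a parameter of this sketch; in the compiler it would
   come from the type information.  */
static size_t
size_in_bytes (const struct access_with_size *call, size_t elem_size)
{
  assert (call->ref_to_size != NULL);
  size_t n = *call->ref_to_size;
  return call->class_of_size == SIZE_IN_ELEMENTS ? n * elem_size : n;
}
```

Because the model holds a reference to the size rather than a copy, a later
update of the count is visible to the next query, which is the dependence the
proposal aims to represent.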

** The patch set includes:

1. Provide counted_by attribute to flexible array member field;
  which includes:
  * "counted_by" attribute documentation;
  * C FE handling of the new attribute;
syntax checking, error reporting;
  * testing cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
  which includes:
  * The definition of the new internal function .ACCESS_WITH_SIZE in
    internal-fn.def.
  * C FE converts every reference to a FAM with "counted_by" attribute to a
    call to the internal function .ACCESS_WITH_SIZE.
    (build_component_ref in c-typeck.cc)
    This includes the case when the object is statically allocated and
    initialized.
    To make this work, we should update initializer_constant_valid_p_1 and
    output_constant in varasm.cc to include calls to .ACCESS_WITH_SIZE.

    However, for the reference inside "offsetof", ignore the "counted_by"
    attribute since it's not useful at all.
    (c_parser_postfix_expression in c/c-parser.cc)
    In addition to "offsetof", for the reference inside operator "typeof" and
    "alignof", we ignore counted_by attribute too.
    When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
    replace the call with its first argument.

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
    (expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * Adjust alias analysis to exclude the new internal function from clobbering
    anything.
    (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in
    tree-ssa-alias.cc)

Re: [PATCH] c++: templated substitution into lambda-expr [PR114393]

2024-04-12 Thread Patrick Palka

On Wed, 10 Apr 2024, Jason Merrill wrote:

> On 3/27/24 10:01, Patrick Palka wrote:
> > On Mon, 25 Mar 2024, Patrick Palka wrote:
> > > On Mon, 25 Mar 2024, Patrick Palka wrote:
> > > > 
> > > > The below testcases use a lambda-expr as a template argument and they
> > > > all trip over the below added tsubst_lambda_expr sanity check ultimately
> > > > because current_template_parms is empty, which causes push_template_decl
> > > > to return error_mark_node from the call to begin_lambda_type.  Were it
> > > > not for the sanity check this silent error_mark_node result leads to
> > > > nonsensical errors down the line, or silent breakage.
> > > > 
> > > > In the first testcase, we hit this assert during instantiation of the
> > > > dependent alias template-id c1_t<_Data> from instantiate_template, which
> > > > clears current_template_parms via push_to_top_level.  Similar story for
> > > > the second testcase.  For the third testcase we hit the assert during
> > > > partial instantiation of the member template from
> > > > instantiate_class_template
> > > > which similarly calls push_to_top_level.
> > > > 
> > > > These testcases illustrate that templated substitution into a
> > > > lambda-expr
> > > > is not always possible, in particular when we lost the relevant template
> > > > context.  I experimented with recovering the template context by making
> > > > tsubst_lambda_expr fall back to using scope_chain->prev->template_parms
> > > > if
> > > > current_template_parms is empty which worked but seemed like a hack.  I
> > > > also experimented with preserving the template context by keeping
> > > > current_template_parms set during instantiate_template for a dependent
> > > > specialization which also worked but it's at odds with the fact that we
> > > > cache dependent specializations (and so they should be independent of
> > > > the template context).
> 
> I suspect the problem comes from this bit in type_unification_real:
> 
> >   /* First instatiate in template context, in case we still
> > depend on undeduced template parameters.  */
> >   ++processing_template_decl;
> >   substed = tsubst_template_arg (arg, full_targs, complain,
> >  NULL_TREE);
> >   --processing_template_decl;
> >   if (substed != error_mark_node
> >   && !uses_template_parms (substed))
> > /* We replaced all the tparms, substitute again out of
> > template context.  */
> > substed = NULL_TREE;
> 
> and perhaps we should switch to searching the argument for undeduced template
> parameters rather than doing a trial substitution.

Interesting, you also mentioned that approach would help fix PR101463
(which I need to get back to):
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642340.html

> 
> But the pattern of setting processing_template_decl, substituting, and
> clearing it again is very widespread, so we still want to handle lambdas in
> that context.
> 
> > > > +  if (processing_template_decl && !in_template_context)
> > > > +{
> > > > +  /* Defer templated substitution into a lambda-expr when arguments
> > > > +are dependent or when we lost the necessary template context,
> > > > +which may happen for a lambda-expr used as a template 
> > > > argument.  */
> > > 
> > > And this comment is stale (an earlier version of the patch also deferred
> > > for dependent arguments even when current_template_parms is non-empty,
> > > which I backed out to make the fix as narrow as possible).
> > 
> > FWIW I also experimented with unconditionally deferring templated
> > substitution into a lambda-expr (i.e. iff processing_template_decl)
> > which passed bootstrap+regtest, and turns out to also fix the
> > (non-regression) PR114167.  I didn't analyze the underlying issue
> > very closely though, there might very well be a better way to fix
> > that particular non-regression PR.
> > 
> > One downside of unconditionally deferring is that it'd mean less
> > ahead-of-time checking of uninvoked deeply-nested generic lambdas,
> > e.g.:
> > 
> >int main() {
> >  [](auto x) {
> >[](auto) {
> >  [](auto) { decltype(x)::fail; }; // not diagnosed anymore
> >};
> >  }(0);
> >}
> 
> Hmm, unconditionally deferring would probably also help to resolve issues with
> local classes in generic lambdas.  It might be worth going that way rather
> than continue to grapple with partial substitution problems.
> 
> > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > index 8cf0d5b7a8d..c25bdd283f1 100644
> > > --- a/gcc/cp/pt.cc
> > > +++ b/gcc/cp/pt.cc
> > > @@ -19571,6 +19572,18 @@ tsubst_lambda_expr (tree t, tree args,
> > > tsubst_flags_t complain, tree in_decl)
> > > tree oldfn = lambda_function (t);
> > > in_decl = oldfn;
> > >   +  args = add_extra_args (LAMBDA_EXPR_EXTRA_ARGS (t), args, complain,

[PATCH] c++: templated substitution into lambda-expr, cont [PR114393]

2024-04-12 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

The original PR114393 testcase is unfortunately still not accepted after
r14-9938-g081c1e93d56d35 due to return type deduction confusion when a
lambda-expr is used as a default template argument.

The below reduced testcase demonstrates the bug.  Here, when forming the
dependent specialization b_v we substitute the default argument of F,
a lambda-expr, with _Descriptor=U.  (In this case in_template_context is
true since we're in the context of the template c_v so we don't defer.)
This substitution in turn lowers the level of its auto return type from
2 to 1.  So later, when instantiating c_v we incorrectly
replace the auto with the template argument at level=0, index=0, i.e.
int, instead of going through do_auto_deduction which would yield char.

One way to fix this would be to use a level-less auto to represent a
deduced return type of a lambda, but that might be too invasive of a
change at this stage.

Another way would be to pass tf_partial during dependent substitution
into a default template argument from coerce_template_parms so that the
levels of deeper parameters don't get lowered, but that wouldn't do
the right thing currently due to the tf_partial early exit in the
LAMBDA_EXPR case of tsubst_expr.

Another way, and the approach that this patch takes, is to just defer
all dependent substitution into a lambda-expr, building upon the logic
added in r14-9938-g081c1e93d56d35.  This seems like the right thing to
do in light of the way we build up LAMBDA_EXPR_REGEN_INFO which should
consist only of the concrete template arguments used to regenerate the
lambda.

PR c++/114393

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): Also defer all dependent
substitution into a lambda-expr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ2a.C: New test.
---
 gcc/cp/pt.cc   |  9 +++--
 gcc/testsuite/g++.dg/cpp2a/lambda-targ2a.C | 14 ++
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ2a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ec259ee0fbf..bbd35417f8d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19622,11 +19622,16 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   in_decl = oldfn;
 
   args = add_extra_args (LAMBDA_EXPR_EXTRA_ARGS (t), args, complain, in_decl);
-  if (processing_template_decl && !in_template_context)
+  if (processing_template_decl
+  && (!in_template_context || any_dependent_template_arguments_p (args)))
 {
   /* Defer templated substitution into a lambda-expr if we lost the
 necessary template context.  This may happen for a lambda-expr
-used as a default template argument.  */
+used as a default template argument.
+
+Defer dependent substitution as well so that we don't lower the
+level of a deduced return type or any other auto or template
+parameter.  */
   t = copy_node (t);
   LAMBDA_EXPR_EXTRA_ARGS (t) = NULL_TREE;
   LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ2a.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ2a.C
new file mode 100644
index 000..7136ce79872
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ2a.C
@@ -0,0 +1,14 @@
+// PR c++/114393
+// { dg-do compile { target c++20 } }
+
+template  struct c1 {};
+
+template 
+inline constexpr auto b_v = F;
+
+template 
+inline constexpr auto c_v = b_v;
+
+auto f = c_v;
+using type = decltype(f());
+using type = char;
-- 
2.44.0.568.g436d4e5b14

Re: [PATCH]middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].

2024-04-12 Thread Richard Sandiford

Richard Biener  writes:
> On Fri, 12 Apr 2024, Tamar Christina wrote:
>
>> Hi All,
>> 
>> This is a story all about how the peeling for gaps introduces a bug in the 
>> upper
>> bounds.
>> 
>> Before I go further, I'll first explain how I understand this to work for 
>> loops
>> with a single exit.
>> 
>> When peeling for gaps we peel N < VF iterations to scalar.
>> This happens by removing N iterations from the calculation of niters such 
>> that
>> vect_iters * VF == niters is always false.
>> 
>> In other words, when we exit the vector loop we always fall to the scalar 
>> loop.
>> The loop bounds adjustment guarantees this. Because of this we potentially
>> execute a vector loop iteration less.  That is, if you're at the boundary
>> condition where niters % VF by peeling one or more scalar iterations the 
>> vector
>> loop executes one less.
>> 
>> This is accounted for by the adjustments in vect_transform_loops.  This
>> adjustment happens differently based on whether the vector loop can be
>> partial or not:
>> 
>> Peeling for gaps sets the bias to 0 and then:
>> 
>> when not partial:  we take the floor of (scalar_upper_bound / VF) - 1 to get 
>> the
>> vector latch iteration count.
>> 
>> when loop is partial:  For a single exit this means the loop is masked, we 
>> take
>>the ceil to account for the fact that the loop can 
>> handle
>> the final partial iteration using masking.
>> 
>> Note that there's no difference between ceil and floor on the boundary 
>> condition.
>> There is a difference however when you're slightly above it. i.e. if scalar
>> iterates 14 times and VF = 4 and we peel 1 iteration for gaps.
>> 
>> The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations. and in 
>> effect
>> the partial iteration is ignored and it's done as scalar.
>> 
>> This is fine because the niters modification has capped the vector iteration 
>> at
>> 2.  So that when we reduce the induction values you end up entering the 
>> scalar
>> code with ind_var.2 = ind_var.1 + 2 * VF.
>> 
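
The two roundings described above can be checked with a few lines of C.  This
is only a sketch of the arithmetic, assuming a bias of 0.  The helper names
latch_count_floor and latch_count_ceil are hypothetical and do not correspond
to functions in the vectorizer.

```c
#include <assert.h>

/* Latch iteration count when the loop is not partial:
   floor (scalar_upper_bound / vf) - 1.  */
static unsigned
latch_count_floor (unsigned scalar_upper_bound, unsigned vf)
{
  assert (vf != 0 && scalar_upper_bound >= vf);
  return scalar_upper_bound / vf - 1;
}

/* Latch iteration count when the loop is partial (masked):
   ceil (scalar_upper_bound / vf) - 1.  */
static unsigned
latch_count_ceil (unsigned scalar_upper_bound, unsigned vf)
{
  assert (vf != 0 && scalar_upper_bound != 0);
  return (scalar_upper_bound + vf - 1) / vf - 1;
}
```

With 14 scalar iterations, one of them peeled for gaps, and VF = 4, the floor
form gives 13 / 4 - 1 = 2 while the ceil form gives 3.  On the boundary,
for example 12 and VF = 4, both forms give 2.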
>> Now let's look at early breaks.  To make it easier I'll focus on the specific
>> testcase:
>> 
>> char buffer[64];
>> 
>> __attribute__ ((noipa))
>> buff_t *copy (buff_t *first, buff_t *last)
>> {
>>   char *buffer_ptr = buffer;
>>   char *const buffer_end = &buffer[SZ-1];
>>   int store_size = sizeof(first->Val);
>>   while (first != last && (buffer_ptr + store_size) <= buffer_end)
>> {
>>   const char *value_data = (const char *)(&first->Val);
>>   __builtin_memcpy(buffer_ptr, value_data, store_size);
>>   buffer_ptr += store_size;
>>   ++first;
>> }
>> 
>>   if (first == last)
>> return 0;
>> 
>>   return first;
>> }
>> 
>> Here the first, early exit is on the condition:
>> 
>>   (buffer_ptr + store_size) <= buffer_end
>> 
>> and the main exit is on condition:
>> 
>>   first != last
>> 
>> This is important, as this bug only manifests itself when the first exit has 
>> a
>> known constant iteration count that's lower than the latch exit count.
>> 
>> because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up processing 
>> 16
>> bytes per iteration.  So the exit has a known bounds of 8 + 1.
>> 
>> The vectorizer correctly analyzes this:
>> 
>> Statement (exit)if (ivtmp_21 != 0)
>>  is executed at most 8 (bounded by 8) + 1 times in loop 1.
>> 
>> and as a consequence the IV is bound by 9:
>> 
>>   # vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
>>   ...
>>   vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 
>> 18446744073709551615, 18446744073709551615, 18446744073709551615 };
>>   mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
>>   if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
>> goto ; [88.89%]
>>   else
>> goto ; [11.11%]
>> 
>> The important bits are these:
>> 
>> In this example the value of last - first = 416.
>> 
>> the calculated vector iteration count, is:
>> 
>> x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27
>> 
>> the bounds generated, adjusting for gaps:
>> 
>>x == (((x - 1) >> 2) << 2)
>> 
>> which means we'll always fall through to the scalar code. as intended.
>> 
>> Here are two key things to note:
>> 
>> 1. In this loop, the early exit will always be the one taken.  When it's 
>> taken
>>we enter the scalar loop with the correct induction value to apply the gap
>>peeling.
>> 
>> 2. If the main exit is taken, the induction values assumes you've finished 
>> all
>>vector iterations.  i.e. it assumes you have completed 24 iterations, as 
>> we
>>treat the main exit the same for normal loop vect and early break when not
>>PEELED.
>>This means the induction value is adjusted to ind_var.2 = ind_var.1 + 24 
>> * VF;
>> 
>> So what's going wrong.  The vectorizer's codegen is correct and efficient,
>> however when we adjust the upper bounds, that code knows that the loops upper
>> bound is based on the early exit. i.e. 8 latch iterations. or in other

Re: [PATCH] s390: testsuite: Xfail range-sincos.c and vrp-float-abs-1.c

2024-04-12 Thread Andreas Krebbel

On 4/12/24 10:16, Stefan Schulze Frielinghaus wrote:
> As mentioned in PR114678 those failures will be fixed by
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
> For GCC 14 just xfail them which should be reverted once the patch is
> applied.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/range-sincos.c: Xfail for s390.
>   * gcc.dg/tree-ssa/vrp-float-abs-1.c: Ditto.
> ---
>  Ok for mainline?

Ok, thanks!

Andreas

> 
>  gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c| 2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
> index 337f9cda02f..35b38c3c914 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
> @@ -40,4 +40,4 @@ stool (double x)
>  link_error ();
>  }
>  
> -// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { 
> *-*-linux* } && { glibc } } } } }
> +// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { 
> *-*-linux* } && { glibc } } xfail s390*-*-* } } } xfail: PR114678
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
> index 4b7b75833e0..a814a973963 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
> @@ -14,4 +14,4 @@ foo (double x, double y)
>  }
>  }
>  
> -// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
> +// { dg-final { scan-tree-dump-not "link_error" "evrp" { xfail s390*-*-* } } 
> } xfail: PR114678

[PATCH] c++: Diagnose or avoid constexpr dtors in classes with virtual bases [PR114426]

2024-04-12 Thread Jakub Jelinek

Hi!

I had another look at this P1 PR today.
You said in the "c++: fix in-charge parm in constexpr" mail back in December
(as well as in the r14-6507 commit message):
"Since a class with vbases can't have constexpr 'tors there isn't actually
a need for an in-charge parameter in a destructor" but the ICE is because
the destructor is marked implicitly constexpr.
https://eel.is/c++draft/dcl.constexpr#3.2 says that a destructor of a class
with virtual bases is not constexpr-suitable, but we were actually
implementing this just for constructors, so clearly my fault from the
https://wg21.link/P0784R7 implementation.  That paper clearly added that
sentence in there and removed a similar sentence just from the constructor case.

So, the following patch makes sure the
  else if (CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fun)))
{
  ret = false;
  if (complain)
error ("%q#T has virtual base classes", DECL_CONTEXT (fun));
}
hunk is done not just for DECL_CONSTRUCTOR_P (fun), but also
DECL_DESTRUCTOR_P (fun) - in that case just for cxx_dialect >= cxx20,
as for cxx_dialect < cxx20 we already set ret = false; and diagnose
a different error, so no need to diagnose two.

Bootstrapped/regtested on x86_64-linux and i686-linux, and checked it fixes
the testcase in a cross to armv7hl-linux-gnueabi, ok for trunk?

2024-04-12  Jakub Jelinek  

PR c++/114426
* constexpr.cc (is_valid_constexpr_fn): Return false/diagnose with
complain destructors in classes with virtual bases.

* g++.dg/cpp2a/pr114426.C: New test.
* g++.dg/cpp2a/constexpr-dtor16.C: New test.

--- gcc/cp/constexpr.cc.jj  2024-04-09 09:29:04.708521907 +0200
+++ gcc/cp/constexpr.cc 2024-04-12 11:45:08.845476718 +0200
@@ -262,18 +262,15 @@ is_valid_constexpr_fn (tree fun, bool co
inform (DECL_SOURCE_LOCATION (fun),
"lambdas are implicitly %<constexpr%> only in C++17 and later");
 }
-  else if (DECL_DESTRUCTOR_P (fun))
+  else if (DECL_DESTRUCTOR_P (fun) && cxx_dialect < cxx20)
 {
-  if (cxx_dialect < cxx20)
-   {
- ret = false;
- if (complain)
-   error_at (DECL_SOURCE_LOCATION (fun),
- "%<constexpr%> destructors only available"
- " with %<-std=c++20%> or %<-std=gnu++20%>");
-   }
+  ret = false;
+  if (complain)
+   error_at (DECL_SOURCE_LOCATION (fun),
+ "%<constexpr%> destructors only available with "
+ "%<-std=c++20%> or %<-std=gnu++20%>");
 }
-  else if (!DECL_CONSTRUCTOR_P (fun))
+  else if (!DECL_CONSTRUCTOR_P (fun) && !DECL_DESTRUCTOR_P (fun))
 {
   tree rettype = TREE_TYPE (TREE_TYPE (fun));
   if (!literal_type_p (rettype))
--- gcc/testsuite/g++.dg/cpp2a/pr114426.C.jj2024-04-12 12:05:07.443891700 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/pr114426.C   2024-04-12 12:05:07.443891700 
+0200
@@ -0,0 +1,7 @@
+// PR c++/114426
+// { dg-do compile }
+// { dg-additional-options "-O2" }
+
+struct A { virtual ~A (); };
+struct B : virtual A { virtual void foo () = 0; };
+struct C : B { C () {} };
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C.jj2024-04-12 
12:05:35.398505976 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C   2024-04-12 
12:08:31.771072322 +0200
@@ -0,0 +1,7 @@
+// PR c++/114426
+// { dg-do compile { target c++11 } }
+
+struct A { virtual ~A (); };
+struct B : virtual A { constexpr ~B () {} };
+// { dg-error "'struct B' has virtual base classes" "" { target c++20 } .-1 }
+// { dg-error "'constexpr' destructors only available with" "" { target 
c++17_down } .-2 }

Jakub

Re: [PATCH 0/1] libgfortran: Fix compilation of gf_vsnprintf

2024-04-12 Thread FX Coudert

> Gentle ping. If this looks good, can someone commit to main (I don't have 
> commit privileges). This is also something that could be considered for 
> stable, since it's been around for many years.

Thanks for the patch. Pushed as 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3bd3ca05b519b99b5ea570c10fd80737cd4c6c49

FX

Re: [PATCH]middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].

2024-04-12 Thread Richard Biener

On Fri, 12 Apr 2024, Tamar Christina wrote:

> Hi All,
> 
> This is a story all about how the peeling for gaps introduces a bug in the 
> upper
> bounds.
> 
> Before I go further, I'll first explain how I understand this to work for 
> loops
> with a single exit.
> 
> When peeling for gaps we peel N < VF iterations to scalar.
> This happens by removing N iterations from the calculation of niters such that
> vect_iters * VF == niters is always false.
> 
> In other words, when we exit the vector loop we always fall to the scalar 
> loop.
> The loop bounds adjustment guarantees this. Because of this we potentially
> execute a vector loop iteration less.  That is, if you're at the boundary
> condition where niters % VF by peeling one or more scalar iterations the 
> vector
> loop executes one less.
> 
> This is accounted for by the adjustments in vect_transform_loops.  This
> adjustment happens differently based on whether the vector loop can be
> partial or not:
> 
> Peeling for gaps sets the bias to 0 and then:
> 
> when not partial:  we take the floor of (scalar_upper_bound / VF) - 1 to get 
> the
>  vector latch iteration count.
> 
> when loop is partial:  For a single exit this means the loop is masked, we 
> take
>the ceil to account for the fact that the loop can 
> handle
>  the final partial iteration using masking.
> 
> Note that there's no difference between ceil an floor on the boundary 
> condition.
> There is a difference however when you're slightly above it. i.e. if scalar
> iterates 14 times and VF = 4 and we peel 1 iteration for gaps.
> 
> The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations. and in effect
> the partial iteration is ignored and it's done as scalar.
> 
> This is fine because the niters modification has capped the vector iteration 
> at
> 2.  So that when we reduce the induction values you end up entering the scalar
> code with ind_var.2 = ind_var.1 + 2 * VF.
> 
> Now lets look at early breaks.  To make it esier I'll focus on the specific
> testcase:
> 
> char buffer[64];
> 
> __attribute__ ((noipa))
> buff_t *copy (buff_t *first, buff_t *last)
> {
>   char *buffer_ptr = buffer;
>   char *const buffer_end = &buffer[SZ-1];
>   int store_size = sizeof(first->Val);
>   while (first != last && (buffer_ptr + store_size) <= buffer_end)
>     {
>       const char *value_data = (const char *)(&first->Val);
>       __builtin_memcpy(buffer_ptr, value_data, store_size);
>       buffer_ptr += store_size;
>       ++first;
>     }
> 
>   if (first == last)
>     return 0;
> 
>   return first;
> }
> 
> Here the first, early exit is on the condition:
> 
>   (buffer_ptr + store_size) <= buffer_end
> 
> and the main exit is on condition:
> 
>   first != last
> 
> This is important, as this bug only manifests itself when the first exit has a
> known constant iteration count that's lower than the latch exit count.
> 
> Because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up processing 16
> bytes per iteration.  So the exit has a known bound of 8 + 1.
> 
> The vectorizer correctly analyzes this:
> 
> Statement (exit)if (ivtmp_21 != 0)
>  is executed at most 8 (bounded by 8) + 1 times in loop 1.
> 
> and as a consequence the IV is bound by 9:
> 
>   # vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
>   ...
>   vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 
> 18446744073709551615, 18446744073709551615, 18446744073709551615 };
>   mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
>   if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
> goto ; [88.89%]
>   else
> goto ; [11.11%]
> 
> The important bits are these:
> 
> In this example the value of last - first = 416.
> 
> The calculated vector iteration count is:
> 
> x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27
> 
> The bounds generated, adjusting for gaps:
> 
>    x == (((x - 1) >> 2) << 2)
> 
> which means we'll always fall through to the scalar code, as intended.
> 
> Here are two key things to note:
> 
> 1. In this loop, the early exit will always be the one taken.  When it's taken
>    we enter the scalar loop with the correct induction value to apply the gap
>    peeling.
> 
> 2. If the main exit is taken, the induction values assume you've finished all
>    vector iterations, i.e. it assumes you have completed 24 iterations, as we
>    treat the main exit the same for normal loop vect and early break when not
>    PEELED.
>    This means the induction value is adjusted to ind_var.2 = ind_var.1 + 24 * VF;
> 
> So what's going wrong?  The vectorizer's codegen is correct and efficient;
> however, when we adjust the upper bounds, that code knows that the loop's upper
> bound is based on the early exit, i.e. 8 latch iterations.  In other words,
> it thinks the loop iterates once.
> 
> This is incorrect as the vector loop iterates twice, as it has set up the
> induction value such that it exits at the early exit.   So it in effect

Re: [PATCH v2] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-04-12 Thread Richard Sandiford

Alex Coplan  writes:
> This is a v2 because I accidentally sent a WIP version of the patch last
> time round which used replace_equiv_address instead of
> replace_equiv_address_nv; that caused some ICEs (pointed out by the
> Linaro CI) since pair addressing modes aren't a subset of the addresses
> that are accepted by memory_operand for a given mode.
>
> This patch should otherwise be identical to v1.  Bootstrapped/regtested
> on aarch64-linux-gnu (indeed this is the patch I actually tested last
> time), is this version also OK for GCC 15?

OK, thanks.  Sorry for missing this in the first review.

Richard

> Thanks,
> Alex
>
> --- >8 ---
>
> The ldp/stp fusion pass can change the base of an access so that the two
> accesses end up using a common base register.  So far we have been using
> adjust_address_nv to do this, but this means that we don't preserve
> other properties of the mem we're replacing.  It seems better to use
> replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
> mem whose address we're changing.
>
> The PR shows that by adjusting the other mem we lose alignment
> information about the original access and therefore end up rejecting an
> otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
> This patch fixes that by using replace_equiv_address_nv instead.
>
> Notably this is the same approach as taken by
> aarch64_check_consecutive_mems when a change of base is required, so
> this at least makes things more consistent between the ldp fusion pass
> and the peepholes.
>
> gcc/ChangeLog:
>
>   PR target/114674
>   * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
>   Use replace_equiv_address_nv on a change of base instead of
>   adjust_address_nv on the other access.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/114674
>   * gcc.target/aarch64/pr114674.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 365dcf48b22..d07d79df06c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -1730,11 +1730,11 @@ ldp_bb_info::fuse_pair (bool load_p,
>   adjust_amt *= -1;
>  
>rtx change_reg = XEXP (change_pat, !load_p);
> -  machine_mode mode_for_mem = GET_MODE (change_mem);
>rtx effective_base = drop_writeback (base_mem);
> -  rtx new_mem = adjust_address_nv (effective_base,
> -mode_for_mem,
> -adjust_amt);
> +  rtx adjusted_addr = plus_constant (Pmode,
> +  XEXP (effective_base, 0),
> +  adjust_amt);
> +  rtx new_mem = replace_equiv_address_nv (change_mem, adjusted_addr);
>rtx new_set = load_p
>   ? gen_rtx_SET (change_reg, new_mem)
>   : gen_rtx_SET (new_mem, change_reg);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c b/gcc/testsuite/gcc.target/aarch64/pr114674.c
> new file mode 100644
> index 000..944784fd008
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
> +typedef struct {
> + unsigned int f1;
> + unsigned int f2;
> +} test_struct;
> +
> +static test_struct ts = {
> + 123, 456
> +};
> +
> +void foo(void)
> +{
> + ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
> + ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
> +}
> +/* { dg-final { scan-assembler-times "stp" 1 } } */

RE: [PATCH v1] RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P

2024-04-12 Thread Li, Pan2

Sure thing. FP_RETURN depends only on the ABI (ABI_xxx), as in the definition below:

#define FP_RETURN (UNITS_PER_FP_ARG == 0 ? GP_RETURN : FP_ARG_FIRST)

I added some tests for the rv32/64imac options, but they don't cover all test
cases without the f/d extensions. I will give it a try and keep you posted.
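
As a rough illustration of the point about FP_RETURN (a Python model, not GCC source): when the ABI passes no FP arguments, FP_RETURN collapses onto GP_RETURN, so the FP check cannot accept a register the ABI does not use, whereas the vector check has no ABI switch and needs the explicit TARGET_VECTOR guard. The register numbers and the two-register width of each return range are placeholders assumed for the sketch, and the lowercase function names are made up here.

```python
# Placeholder register numbers, assumed for illustration only.
GP_RETURN = 10
FP_ARG_FIRST = 42
V_RETURN = 104


def fp_return(units_per_fp_arg):
    """Model of: UNITS_PER_FP_ARG == 0 ? GP_RETURN : FP_ARG_FIRST."""
    return GP_RETURN if units_per_fp_arg == 0 else FP_ARG_FIRST


def function_value_regno_p(regno, units_per_fp_arg, target_vector):
    """Model of the patched predicate; range width of 2 is an assumption."""
    if GP_RETURN <= regno <= GP_RETURN + 1:
        return True
    fp_first = fp_return(units_per_fp_arg)
    if fp_first <= regno <= fp_first + 1:
        return True
    # The guard added by the patch: no vector return register without V.
    if target_vector and regno == V_RETURN:
        return True
    return False


# Soft-float ABI (lp64/ilp32): the FP range aliases the GP range.
print(function_value_regno_p(FP_ARG_FIRST, 0, False))  # False
# Hard-float ABI (lp64d/ilp32d): the FP return register is accepted.
print(function_value_regno_p(FP_ARG_FIRST, 8, False))  # True
# Vector return register is only accepted when the V extension is enabled.
print(function_value_regno_p(V_RETURN, 8, False))      # False
print(function_value_regno_p(V_RETURN, 8, True))       # True
```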

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Friday, April 12, 2024 4:56 PM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; gcc-patches 
Subject: Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
TARGET_FUNCTION_VALUE_REGNO_P

Does the FP reg also need to be guarded with TARGET_HARD_FLOAT? Could you try
to compile that case without F?

On Fri, Apr 12, 2024 at 2:19 PM Li, Pan2  wrote:
>
> Committed, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> Sent: Friday, April 12, 2024 2:11 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: kito.cheng ; Li, Pan2 
> Subject: Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
> TARGET_FUNCTION_VALUE_REGNO_P
>
>
>
> LGTM。
>
>
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2024-04-12 14:08
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; Pan Li
>
> Subject: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
> TARGET_FUNCTION_VALUE_REGNO_P
>
> From: Pan Li 
>
>
>
> This patch would like to fix one ICE when vector is not enabled
>
> in hook TARGET_FUNCTION_VALUE_REGNO_P implementation.  The vector
>
> regno is available if and only if the TARGET_VECTOR is true.  The
>
> previous implement missed this condition and then result in ICE
>
> when rv64gc build option without vector.
>
>
>
> PR target/114639
>
>
>
> The below test suite is passed for this patch.
>
>
>
> * The rv64gcv fully regression tests.
>
> * The rv64gc fully regression tests.
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv.cc (riscv_function_value_regno_p): Add
>
> TARGET_VECTOR predicate for V_RETURN regno.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/pr114639-1.c: New test.
>
> * gcc.target/riscv/pr114639-2.c: New test.
>
> * gcc.target/riscv/pr114639-3.c: New test.
>
> * gcc.target/riscv/pr114639-4.c: New test.
>
>
>
> Signed-off-by: Pan Li 
>
> ---
>
> gcc/config/riscv/riscv.cc   |  2 +-
>
> gcc/testsuite/gcc.target/riscv/pr114639-1.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-2.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-3.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-4.c | 11 +++
>
> 5 files changed, 45 insertions(+), 1 deletion(-)
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-4.c
>
>
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>
> index 91f017dd52a..e5f00806bb9 100644
>
> --- a/gcc/config/riscv/riscv.cc
>
> +++ b/gcc/config/riscv/riscv.cc
>
> @@ -11008,7 +11008,7 @@ riscv_function_value_regno_p (const unsigned regno)
>
>if (FP_RETURN_FIRST <= regno && regno <= FP_RETURN_LAST)
>
>  return true;
>
> -  if (regno == V_RETURN)
>
> +  if (TARGET_VECTOR && regno == V_RETURN)
>
>  return true;
>
>return false;
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> new file mode 100644
>
> index 000..f41723193a4
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64gc -mabi=lp64d -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> new file mode 100644
>
> index 000..0c402c4b254
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64imac -mabi=lp64 -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-3.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> new file mode 100644
>
> index 000..ffb0d6d162d
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv32gc -mabi=ilp32d -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-4.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-4.c
>
> new file mode 100644
>
> index 000..a6e229101ef
>
> ---

[committed] RISC-V: Fix Werror=sign-compare in riscv_validate_vector_type

2024-04-12 Thread pan2 . li

From: Pan Li 

This patch fixes a -Werror=sign-compare failure like the one below:

gcc/config/riscv/riscv.cc: In function ‘void
riscv_validate_vector_type(const_tree, const char*)’:
gcc/config/riscv/riscv.cc:5614:23: error: comparison of integer
expressions of different signedness: ‘int’ and ‘unsigned int’
[-Werror=sign-compare]
 5614 |   if (TARGET_MIN_VLEN < required_min_vlen)

TARGET_MIN_VLEN is *int* by default, but the required_min_vlen
returned from riscv_vector_required_min_vlen is **unsigned**.  Thus,
adjust the related functions and the variables that hold their results
to int type to avoid this kind of -Werror failure.
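
For readers wondering why the warning exists at all, the following Python sketch models what C does when an `int` is compared with an `unsigned`: the signed operand is converted to unsigned first. It assumes a 32-bit `int`, and the helper names are made up for this illustration. TARGET_MIN_VLEN is not expected to be negative, so this shows the general hazard the warning flags, not a misbehaviour observed in riscv.cc.

```python
UINT_BITS = 32  # assumption: 32-bit int and unsigned


def as_unsigned(value):
    """Value an int has after C's conversion to unsigned (modulo 2**32)."""
    return value % (1 << UINT_BITS)


def c_less_mixed(signed_lhs, unsigned_rhs):
    """Model of `int < unsigned`: the int operand is converted first."""
    return as_unsigned(signed_lhs) < unsigned_rhs


print(-1 < 128)               # True: ordinary integer comparison
print(c_less_mixed(-1, 128))  # False: -1 becomes 4294967295 before comparing
print(c_less_mixed(64, 128))  # True: non-negative values behave as expected
```

Making both sides `int`, as the patch does, keeps the comparison in a single signedness and silences the warning.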

The test suite below passes with this patch.
* The rv64gcv full regression tests.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_vector_float_type_p): Take int
as the return value instead of unsigned.
(riscv_vector_element_bitsize): Ditto.
(riscv_vector_required_min_vlen): Ditto.
(riscv_validate_vector_type): Take int type for local variable(s).

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e5f00806bb9..74445bc977c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5499,7 +5499,7 @@ riscv_vector_float_type_p (const_tree type)
   return strstr (name, "vfloat") != NULL;
 }
 
-static unsigned
+static int
 riscv_vector_element_bitsize (const_tree type)
 {
   machine_mode mode = TYPE_MODE (type);
@@ -5523,7 +5523,7 @@ riscv_vector_element_bitsize (const_tree type)
   gcc_unreachable ();
 }
 
-static unsigned
+static int
 riscv_vector_required_min_vlen (const_tree type)
 {
   machine_mode mode = TYPE_MODE (type);
@@ -5531,7 +5531,7 @@ riscv_vector_required_min_vlen (const_tree type)
   if (riscv_v_ext_mode_p (mode))
 return TARGET_MIN_VLEN;
 
-  unsigned element_bitsize = riscv_vector_element_bitsize (type);
+  int element_bitsize = riscv_vector_element_bitsize (type);
   const char *name = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)));
 
   if (strstr (name, "bool64") != NULL)
@@ -5569,7 +5569,7 @@ riscv_validate_vector_type (const_tree type, const char *hint)
   return;
 }
 
-  unsigned element_bitsize = riscv_vector_element_bitsize (type);
+  int element_bitsize = riscv_vector_element_bitsize (type);
   bool int_type_p = riscv_vector_int_type_p (type);
 
   if (int_type_p && element_bitsize == 64
@@ -5609,7 +5609,7 @@ riscv_validate_vector_type (const_tree type, const char *hint)
   return;
 }
 
-  unsigned required_min_vlen = riscv_vector_required_min_vlen (type);
+  int required_min_vlen = riscv_vector_required_min_vlen (type);
 
   if (TARGET_MIN_VLEN < required_min_vlen)
 {
-- 
2.34.1

Re: [PATCH 0/1] libgfortran: Fix compilation of gf_vsnprintf

2024-04-12 Thread McInerney, Ian S

Gentle ping. If this looks good, can someone commit to main (I don't have 
commit privileges). This is also something that could be considered for stable, 
since it's been around for many years.

-Ian




From: McInerney, Ian S 
Sent: Thursday, April 4, 2024 4:16 PM
To: fort...@gcc.gnu.org ; gcc-patches@gcc.gnu.org 

Cc: McInerney, Ian S 
Subject: [PATCH 0/1] libgfortran: Fix compilation of gf_vsnprintf
 
The fallback function (gf_vsnprintf) to provide a vsnprintf function
if the system library doesn't have one would not compile due to the
variable name for the string's destination buffer not being updated
after the refactor in 2018 in edaaef601d0d6d263fba87b42d6d04c99dd23dba.

This updates the internal logic of gf_vsnprintf to now use the str
variable defined in the function signature.

I am not actually sure what configurations are using this fallback, since
it was added in 2018 and no patches have been made to fix this compilation
error. Testing this also isn't straightforward, and I had to do a bit of a
hack to get it to use the codepath to show the compilation error:
 1) Configure and build as normal to generate the config.h header
 2) Modify config.h directly to undefine HAVE_VSNPRINTF
 3) Directly call the libgfortran compilation step

Ian McInerney (1):
  libgfortran: Fix compilation of gf_vsnprintf

 libgfortran/runtime/error.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)


base-commit: b7bd2ec73d66f7487bc8842b24daecaa802a72e6
--
2.43.0

[PATCH]middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].

2024-04-12 Thread Tamar Christina

Hi All,

This is a story all about how the peeling for gaps introduces a bug in the upper
bounds.

Before I go further, I'll first explain how I understand this to work for loops
with a single exit.

When peeling for gaps we peel N < VF iterations to scalar.
This happens by removing N iterations from the calculation of niters such that
vect_iters * VF == niters is always false.

In other words, when we exit the vector loop we always fall to the scalar loop.
The loop bounds adjustment guarantees this. Because of this we potentially
execute one vector loop iteration less.  That is, if you're at the boundary
condition where niters % VF == 0, then by peeling one or more scalar iterations
the vector loop executes one iteration less.

This is accounted for by the adjustments in vect_transform_loops.  This
adjustment happens differently based on whether the vector loop can be
partial or not:

Peeling for gaps sets the bias to 0 and then:

when not partial:  we take the floor of (scalar_upper_bound / VF) - 1 to get the
   vector latch iteration count.

when loop is partial:  For a single exit this means the loop is masked, we take
   the ceil to account for the fact that the loop can handle
   the final partial iteration using masking.

Note that there's no difference between ceil and floor on the boundary condition.
There is a difference however when you're slightly above it, i.e. if scalar
iterates 14 times and VF = 4 and we peel 1 iteration for gaps.

The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations, and in effect
the partial iteration is ignored and it's done as scalar.

This is fine because the niters modification has capped the vector iteration at
2.  So that when we reduce the induction values you end up entering the scalar
code with ind_var.2 = ind_var.1 + 2 * VF.
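
The single-exit arithmetic above can be checked with a small sketch. The Python below is only an illustrative model of the two rounding modes described in the text, not vectorizer code; the helper name `vector_latch_iters` and its parameters are made up for the example, and subtracting 1 in the ceil case as well is my reading of the description.

```python
def vector_latch_iters(scalar_iters, vf, peeled, partial, bias=0):
    """Hypothetical model of the latch-count adjustment described above.

    scalar_iters: iterations of the scalar loop
    vf:           vectorization factor
    peeled:       scalar iterations peeled for gaps
    partial:      True if the vector loop can run a partial (masked) iteration
    bias:         peeling for gaps sets this to 0
    """
    upper = scalar_iters - peeled + bias
    if partial:
        # ceil: the masked loop also executes the final partial iteration
        return -(-upper // vf) - 1
    # floor: the leftover elements are left to the scalar loop
    return upper // vf - 1


# Example from the text: 14 scalar iterations, VF = 4, 1 iteration peeled.
print(vector_latch_iters(14, 4, 1, partial=False))  # 2
print(vector_latch_iters(14, 4, 1, partial=True))   # 3
# On the boundary (13 - 1 = 12 is a multiple of 4) both roundings agree.
print(vector_latch_iters(13, 4, 1, partial=False),
      vector_latch_iters(13, 4, 1, partial=True))   # 2 2
```

The floor case reproduces the ((13 + 0) / 4) - 1 == 2 figure quoted above; the two modes only diverge when the adjusted count is not a multiple of VF.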

Now let's look at early breaks.  To make it easier I'll focus on the specific
testcase:

char buffer[64];

__attribute__ ((noipa))
buff_t *copy (buff_t *first, buff_t *last)
{
  char *buffer_ptr = buffer;
  char *const buffer_end = &buffer[SZ-1];
  int store_size = sizeof(first->Val);
  while (first != last && (buffer_ptr + store_size) <= buffer_end)
    {
      const char *value_data = (const char *)(&first->Val);
      __builtin_memcpy(buffer_ptr, value_data, store_size);
      buffer_ptr += store_size;
      ++first;
    }

  if (first == last)
    return 0;

  return first;
}

Here the first, early exit is on the condition:

  (buffer_ptr + store_size) <= buffer_end

and the main exit is on condition:

  first != last

This is important, as this bug only manifests itself when the first exit has a
known constant iteration count that's lower than the latch exit count.

Because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up processing 16
bytes per iteration.  So the exit has a known bound of 8 + 1.

The vectorizer correctly analyzes this:

Statement (exit)if (ivtmp_21 != 0)
 is executed at most 8 (bounded by 8) + 1 times in loop 1.

and as a consequence the IV is bound by 9:

  # vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
  ...
  vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 
18446744073709551615, 18446744073709551615, 18446744073709551615 };
  mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
  if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
goto ; [88.89%]
  else
goto ; [11.11%]

The important bits are these:

In this example the value of last - first = 416.

The calculated vector iteration count is:

x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27

The bounds generated, adjusting for gaps:

   x == (((x - 1) >> 2) << 2)

which means we'll always fall through to the scalar code, as intended.

Here are two key things to note:

1. In this loop, the early exit will always be the one taken.  When it's taken
   we enter the scalar loop with the correct induction value to apply the gap
   peeling.

2. If the main exit is taken, the induction values assume you've finished all
   vector iterations, i.e. it assumes you have completed 24 iterations, as we
   treat the main exit the same for normal loop vect and early break when not
   PEELED.
   This means the induction value is adjusted to ind_var.2 = ind_var.1 + 24 * VF;
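
As a quick sanity check of the gap-adjusted bound and the 24 iterations mentioned in point 2, here is a minimal Python sketch. It only models the expression ((x - 1) >> 2) << 2 from the text; the function name and the `log2_vf` parameter are invented for the illustration.

```python
def gap_adjusted_vector_iters(x, log2_vf=2):
    """Round x - 1 down to a multiple of VF (VF = 4 when log2_vf = 2)."""
    return ((x - 1) >> log2_vf) << log2_vf


# The adjusted count never equals x, so between 1 and VF iterations are
# always left over for the scalar loop.
for x in range(1, 1000):
    adjusted = gap_adjusted_vector_iters(x)
    assert adjusted != x
    assert 1 <= x - adjusted <= 4

print(gap_adjusted_vector_iters(27))       # 24
print(27 - gap_adjusted_vector_iters(27))  # 3
```

With x = 27 from the example, the adjusted count is 24, which is the figure the main exit assumes when it updates the induction value.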

So what's going wrong?  The vectorizer's codegen is correct and efficient;
however, when we adjust the upper bounds, that code knows that the loop's upper
bound is based on the early exit, i.e. 8 latch iterations.  In other words,
it thinks the loop iterates once.

This is incorrect as the vector loop iterates twice, as it has set up the
induction value such that it exits at the early exit.  So in effect it iterates
2.5 times.

Because the upper bound is incorrect, when we unroll it now exits from the main
exit which uses the incorrect induction value.

So there are three ways to fix this:

1.  If we take the position that the main exit should support both premature
exits and final exits then vect_update_ivs_after_vectorizer

Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P

2024-04-12 Thread Kito Cheng

Does the FP reg also need to be guarded with TARGET_HARD_FLOAT? Could you try
to compile that case without F?

On Fri, Apr 12, 2024 at 2:19 PM Li, Pan2  wrote:
>
> Committed, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> Sent: Friday, April 12, 2024 2:11 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: kito.cheng ; Li, Pan2 
> Subject: Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
> TARGET_FUNCTION_VALUE_REGNO_P
>
>
>
> LGTM。
>
>
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2024-04-12 14:08
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; Pan Li
>
> Subject: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
> TARGET_FUNCTION_VALUE_REGNO_P
>
> From: Pan Li 
>
>
>
> This patch would like to fix one ICE when vector is not enabled
>
> in hook TARGET_FUNCTION_VALUE_REGNO_P implementation.  The vector
>
> regno is available if and only if the TARGET_VECTOR is true.  The
>
> previous implement missed this condition and then result in ICE
>
> when rv64gc build option without vector.
>
>
>
> PR target/114639
>
>
>
> The below test suite is passed for this patch.
>
>
>
> * The rv64gcv fully regression tests.
>
> * The rv64gc fully regression tests.
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv.cc (riscv_function_value_regno_p): Add
>
> TARGET_VECTOR predicate for V_RETURN regno.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/pr114639-1.c: New test.
>
> * gcc.target/riscv/pr114639-2.c: New test.
>
> * gcc.target/riscv/pr114639-3.c: New test.
>
> * gcc.target/riscv/pr114639-4.c: New test.
>
>
>
> Signed-off-by: Pan Li 
>
> ---
>
> gcc/config/riscv/riscv.cc   |  2 +-
>
> gcc/testsuite/gcc.target/riscv/pr114639-1.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-2.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-3.c | 11 +++
>
> gcc/testsuite/gcc.target/riscv/pr114639-4.c | 11 +++
>
> 5 files changed, 45 insertions(+), 1 deletion(-)
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-4.c
>
>
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>
> index 91f017dd52a..e5f00806bb9 100644
>
> --- a/gcc/config/riscv/riscv.cc
>
> +++ b/gcc/config/riscv/riscv.cc
>
> @@ -11008,7 +11008,7 @@ riscv_function_value_regno_p (const unsigned regno)
>
>if (FP_RETURN_FIRST <= regno && regno <= FP_RETURN_LAST)
>
>  return true;
>
> -  if (regno == V_RETURN)
>
> +  if (TARGET_VECTOR && regno == V_RETURN)
>
>  return true;
>
>return false;
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> new file mode 100644
>
> index 000..f41723193a4
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64gc -mabi=lp64d -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> new file mode 100644
>
> index 000..0c402c4b254
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv64imac -mabi=lp64 -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-3.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> new file mode 100644
>
> index 000..ffb0d6d162d
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-3.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv32gc -mabi=ilp32d -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-4.c 
> b/gcc/testsuite/gcc.target/riscv/pr114639-4.c
>
> new file mode 100644
>
> index 000..a6e229101ef
>
> --- /dev/null
>
> +++ b/gcc/testsuite/gcc.target/riscv/pr114639-4.c
>
> @@ -0,0 +1,11 @@
>
> +/* Test that we do not have ice when compile */
>
> +/* { dg-do compile } */
>
> +/* { dg-options "-march=rv32imac -mabi=ilp32 -std=gnu89 -O3" } */
>
> +
>
> +g (a, b) {}
>
> +
>
> +f (xx)
>
> + void* xx;
>
> +{
>
> +  __builtin_apply ((void*)g, xx, 200);
>
> +}
>
> --
>
> 2.34.1
>
>
>
>

Re: [gcc-wwwdocs PATCH] Uncomment MCore part title

2024-04-12 Thread Gerald Pfeifer

On Fri, 12 Apr 2024, Haochen Jiang wrote:
> Uncomment that and commit as obvious.

Thank you!

Gerald

Re: [PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Martin Jambor

Hi,

On Fri, Apr 12 2024, Filip Kastl wrote:
> On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
>> Hi!
>> 
>> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
>> > contrib/check-params-in-docs.py is a script that checks that all
>> > options reported with ./gcc/xgcc -Bgcc --help=param are in
>> > gcc/doc/invoke.texi and vice versa.
>> 
>> Eh, first time I'm hearing about this one!

It's running as part of our internal buildbot that Martin Liška set up.

I must admit I did want to spend the minimum time necessary to fix the
failure and did not realize Filip was looking at it too until I committed
my simple fix...

>> 
>> (a) Shouldn't this be running as part of the GCC build process?
>> 
>> > gcn-preferred-vectorization-factor is in the manual but normally not
>> > reported by --help, probably because I do not have gcn offload
>> > configured.
>> 
>> No, because you've not been building GCC for GCN target.  ;-P
>> 
>> > This patch makes the script silent about this particular
>> > fact.
>> 
>> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
>> to how that's done for "skip aarch64 params"?
>> 
>> (c) ..., and shouldn't we likewise skip any "x86" ones?
>> 
>> (d) ..., or in fact any target specific ones, following after the generic
>> section?  (Easily achieved with a special marker in
>> 'gcc/doc/invoke.texi', just before:
>> 
>> The following choices of @var{name} are available on AArch64 targets:
>> 
>> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
>> accordingly?
>
> Hi,
>
> I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
> chose to do it differently since target-specific params in both invoke.texi
> and --help=params have to be ignored.
>
> The downside of this patch is that the script won't complain if someone adds a
> target-specific param and doesn't document it.
>
> What do you think?

...and this is clearly much better.  Thanks!

Martin

>
> Cheers,
> Filip
>
> -- 8< --
>
> contrib/check-params-in-docs.py is a script that checks that all options
> reported with gcc --help=params are in gcc/doc/invoke.texi and vice
> versa.
> gcc/doc/invoke.texi lists target-specific params but gcc --help=params
> doesn't.  This meant that the script would mistakenly complain about
> params missing from --help=params.  Previously, the script was just set
> to ignore aarch64 and gcn params which solved this issue only for x86.
> This patch sets the script to ignore all target-specific params.
>
> contrib/ChangeLog:
>
>   * check-params-in-docs.py: Ignore target specific params.
>
> Signed-off-by: Filip Kastl 
> ---
>  contrib/check-params-in-docs.py | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index f7879dd8e08..ccdb8d72169 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -38,6 +38,9 @@ def get_param_tuple(line):
>  description = line[i:].strip()
>  return (name, description)
>  
> +def target_specific(param):
> +return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
> +
>  
>  parser = argparse.ArgumentParser()
>  parser.add_argument('texi_file')
> @@ -45,13 +48,16 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
> -params = {}
> +ignored = {'logical-op-non-short-circuit'}
> +help_params = {}
>  
>  for line in open(args.params_output).readlines():
>  if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
>  r = get_param_tuple(line)
> -params[r[0]] = r[1]
> +help_params[r[0]] = r[1]
> +
> +# Skip target-specific params
> +help_params = [x for x in help_params.keys() if not target_specific(x)]
>  
>  # Find section in .texi manual with parameters
>  texi = ([x.strip() for x in open(args.texi_file).readlines()])
> @@ -66,14 +72,13 @@ for line in texi:
>  texi_params.append(line[len(token):])
>  break
>  
> -# skip digits
> +# Skip digits
>  texi_params = [x for x in texi_params if not x[0].isdigit()]
> -# skip aarch64 params
> -texi_params = [x for x in texi_params if not x.startswith('aarch64')]
> -sorted_params = sorted(texi_params)
> +# Skip target-specific params
> +texi_params = [x for x in texi_params if not target_specific(x)]
>  
>  texi_set = set(texi_params) - ignored
> -params_set = set(params.keys()) - ignored
> +params_set = set(help_params) - ignored
>  
>  success = True
>  extra = texi_set - params_set
> -- 
> 2.43.1
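
To make the effect of the patch concrete, here is a small self-contained sketch of the filtering step. The `target_specific` predicate is copied from the diff; the parameter lists are hypothetical stand-ins for the parsed `--help=params` output and the names scraped from invoke.texi (only `gcn-preferred-vectorization-factor` and `logical-op-non-short-circuit` come from the thread).

```python
def target_specific(param):
    # Same predicate as in the patch: the first dash-separated component
    # names the target.
    return param.split('-')[0] in ('aarch64', 'gcn', 'x86')


# Hypothetical inputs for illustration.
help_params = ['generic-example-param', 'x86-example-param']
texi_params = ['generic-example-param', 'aarch64-example-param',
               'gcn-preferred-vectorization-factor', 'x86-example-param']
ignored = {'logical-op-non-short-circuit'}

# Drop target-specific names on both sides before comparing.
help_set = {p for p in help_params if not target_specific(p)} - ignored
texi_set = {p for p in texi_params if not target_specific(p)} - ignored

print(sorted(texi_set - help_set))  # documented but not reported: []
print(sorted(help_set - texi_set))  # reported but not documented: []
```

Note that the same filter also hides `x86-example-param` if it were missing from the manual, which is the downside Filip mentions: undocumented target-specific params are no longer flagged.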

Re: [PATCH] tree-cfg: Make the verifier returns_twice message translatable

2024-04-12 Thread Richard Biener




> Am 12.04.2024 um 09:58 schrieb Jakub Jelinek :
> 
> Hi!
> 
> While translation of the verifier messages is questionable, that case is
> something that ideally should never happen except to gcc developers
> and so presumably English should be fine, we use error etc. APIs and
> those imply translations and some translators translate it.
> The following patch adjusts the code such that we don't emit
> appel returns_twice est not first dans le bloc de base 33
> in French (i.e. 2 English words in the middle of a French message).
> Similarly Swedish or Ukrainian.
> Note, the German translator did differentiate between these verifier
> messages vs. normal user facing and translated it to:
> "Interner Fehler: returns_twice call is %s in basic block %d"
> so just a German prefix before English message.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-04-12  Jakub Jelinek  
> 
>* tree-cfg.cc (gimple_verify_flow_info): Make the misplaced
>returns_twice diagnostics translatable.
> 
> --- gcc/tree-cfg.cc.jj	2024-04-10 10:19:04.237471564 +0200
> +++ gcc/tree-cfg.cc	2024-04-11 17:18:57.962672110 +0200
> @@ -5818,7 +5818,7 @@ gimple_verify_flow_info (void)
>  if (gimple_code (stmt) == GIMPLE_CALL
>  && gimple_call_flags (stmt) & ECF_RETURNS_TWICE)
>{
> -  const char *misplaced = NULL;
> +  bool misplaced = false;
>  /* TM is an exception: it points abnormal edges just after the
> call that starts a transaction, i.e. it must end the BB.  */
>  if (gimple_call_builtin_p (stmt, BUILT_IN_TM_START))
> @@ -5826,18 +5826,23 @@ gimple_verify_flow_info (void)
>  if (single_succ_p (bb)
>  && bb_has_abnormal_pred (single_succ (bb))
>  && !gsi_one_nondebug_before_end_p (gsi))
> -misplaced = "not last";
> +{
> +  error ("returns_twice call is not last in basic block "
> + "%d", bb->index);
> +  misplaced = true;
> +}
>}
>  else
>{
> -  if (seen_nondebug_stmt
> -  && bb_has_abnormal_pred (bb))
> -misplaced = "not first";
> +  if (seen_nondebug_stmt && bb_has_abnormal_pred (bb))
> +{
> +  error ("returns_twice call is not first in basic block "
> + "%d", bb->index);
> +  misplaced = true;
> +}
>}
>  if (misplaced)
>{
> -  error ("returns_twice call is %s in basic block %d",
> - misplaced, bb->index);
>  print_gimple_stmt (stderr, stmt, 0, TDF_SLIM);
>  err = true;
>}
> 
>Jakub
>

[gcc-wwwdocs PATCH] Uncomment MCore part title

2024-04-12 Thread Haochen Jiang

Hi all,

While checking the GCC 14 documentation, I found that the title of the MCore
part was left commented out, which caused its documentation to be mixed in
with x86.

Uncomment it and commit as obvious.

Thx,
Haochen

---
 htdocs/gcc-14/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 14301157..8ac08e9a 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -828,7 +828,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
 
 
-
+MCore
 
   Bitfields are now signed by default per GCC policy.  If you need 
bitfields
 to be unsigned, use -funsigned-bitfields.
-- 
2.31.1

Re: [PATCH] Limit special asan/ubsan/bitint returns_twice handling to calls in bbs with abnormal pred [PR114687]

2024-04-12 Thread Richard Biener




> Am 12.04.2024 um 09:50 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The tree-cfg.cc verifier only diagnoses returns_twice calls preceded
> by non-label/debug stmts if it is in a bb with abnormal predecessor.
> The following testcase shows that if a user lies in the attributes
> (a function which never returns can't be pure, and can't return
> twice when it doesn't ever return at all), when we figure it out,
> we can remove the abnormal edges to the "returns_twice" call and perhaps
> whole .ABNORMAL_DISPATCHER etc.
> edge_before_returns_twice_call then ICEs because it can't find such
> an edge.
> 
> The following patch limits the special handling to calls in bbs where
> the verifier requires that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-04-12  Jakub Jelinek  
> 
>PR sanitizer/114687
>* gimple-iterator.cc (gsi_safe_insert_before): Only use
>edge_before_returns_twice_call if bb_has_abnormal_pred.
>(gsi_safe_insert_seq_before): Likewise.
>* gimple-lower-bitint.cc (bitint_large_huge::lower_call): Only
>push to m_returns_twice_calls if bb_has_abnormal_pred.
> 
>* gcc.dg/asan/pr114687.c: New test.
> 
> --- gcc/gimple-iterator.cc.jj	2024-03-14 09:57:09.024966285 +0100
> +++ gcc/gimple-iterator.cc	2024-04-11 17:05:06.267081433 +0200
> @@ -1049,7 +1049,8 @@ gsi_safe_insert_before (gimple_stmt_iter
>   gimple *stmt = gsi_stmt (*iter);
>   if (stmt
>   && is_gimple_call (stmt)
> -  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0
> +  && bb_has_abnormal_pred (gsi_bb (*iter)))
> {
>   edge e = edge_before_returns_twice_call (gsi_bb (*iter));
>   basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
> @@ -1072,7 +1073,8 @@ gsi_safe_insert_seq_before (gimple_stmt_
>   gimple *stmt = gsi_stmt (*iter);
>   if (stmt
>   && is_gimple_call (stmt)
> -  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0
> +  && bb_has_abnormal_pred (gsi_bb (*iter)))
> {
>   edge e = edge_before_returns_twice_call (gsi_bb (*iter));
>   gimple *f = gimple_seq_first_stmt (seq);
> --- gcc/gimple-lower-bitint.cc.jj	2024-04-09 09:28:21.261123664 +0200
> +++ gcc/gimple-lower-bitint.cc	2024-04-11 17:06:58.033548199 +0200
> @@ -5320,7 +5320,7 @@ bitint_large_huge::lower_call (tree obj,
>  arg = make_ssa_name (TREE_TYPE (arg));
>  gimple *g = gimple_build_assign (arg, v);
>  gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> -  if (returns_twice)
> +  if (returns_twice && bb_has_abnormal_pred (gimple_bb (stmt)))
>{
>  m_returns_twice_calls.safe_push (stmt);
>  returns_twice = false;
> --- gcc/testsuite/gcc.dg/asan/pr114687.c.jj	2024-04-11 17:09:54.518127165 +0200
> +++ gcc/testsuite/gcc.dg/asan/pr114687.c	2024-04-11 17:09:22.699563654 +0200
> @@ -0,0 +1,22 @@
> +/* PR sanitizer/114687 */
> +/* { dg-do compile } */
> +
> +int a;
> +int foo (int);
> +
> +__attribute__((pure, returns_twice)) int
> +bar (void)
> +{
> +  a = 1;
> +  while (a)
> +a = 2;
> +  return a;
> +}
> +
> +int
> +baz (void)
> +{
> +  int d = bar ();
> +  foo (d);
> +  return 0;
> +}
> 
>Jakub
>

[PATCH v2] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-04-12 Thread Alex Coplan

This is a v2 because I accidentally sent a WIP version of the patch last
time round which used replace_equiv_address instead of
replace_equiv_address_nv; that caused some ICEs (pointed out by the
Linaro CI) since pair addressing modes aren't a subset of the addresses
that are accepted by memory_operand for a given mode.

This patch should otherwise be identical to v1.  Bootstrapped/regtested
on aarch64-linux-gnu (indeed this is the patch I actually tested last
time), is this version also OK for GCC 15?

Thanks,
Alex

--- >8 ---

The ldp/stp fusion pass can change the base of an access so that the two
accesses end up using a common base register.  So far we have been using
adjust_address_nv to do this, but this means that we don't preserve
other properties of the mem we're replacing.  It seems better to use
replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
mem whose address we're changing.

The PR shows that by adjusting the other mem we lose alignment
information about the original access and therefore end up rejecting an
otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
This patch fixes that by using replace_equiv_address_nv instead.

Notably this is the same approach as taken by
aarch64_check_consecutive_mems when a change of base is required, so
this at least makes things more consistent between the ldp fusion pass
and the peepholes.

gcc/ChangeLog:

PR target/114674
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Use replace_equiv_address_nv on a change of base instead of
adjust_address_nv on the other access.

gcc/testsuite/ChangeLog:

PR target/114674
* gcc.target/aarch64/pr114674.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..d07d79df06c 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1730,11 +1730,11 @@ ldp_bb_info::fuse_pair (bool load_p,
adjust_amt *= -1;
 
   rtx change_reg = XEXP (change_pat, !load_p);
-  machine_mode mode_for_mem = GET_MODE (change_mem);
   rtx effective_base = drop_writeback (base_mem);
-  rtx new_mem = adjust_address_nv (effective_base,
-  mode_for_mem,
-  adjust_amt);
+  rtx adjusted_addr = plus_constant (Pmode,
+XEXP (effective_base, 0),
+adjust_amt);
+  rtx new_mem = replace_equiv_address_nv (change_mem, adjusted_addr);
   rtx new_set = load_p
? gen_rtx_SET (change_reg, new_mem)
: gen_rtx_SET (new_mem, change_reg);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c b/gcc/testsuite/gcc.target/aarch64/pr114674.c
new file mode 100644
index 000..944784fd008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
+typedef struct {
+   unsigned int f1;
+   unsigned int f2;
+} test_struct;
+
+static test_struct ts = {
+   123, 456
+};
+
+void foo(void)
+{
+   ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
+   ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
+}
+/* { dg-final { scan-assembler-times "stp" 1 } } */

[PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-04-12 Thread HAO CHEN GUI

Hi,
  This patch implements optab_isnormal for SF/DF/TFmode using the rs6000 test
data class instructions.

  This patch relies on the earlier patch which adds optab_isnormal:
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Implement optab_isnormal for SFmode, DFmode and TFmode

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal<mode>2): New expand for SFmode and
DFmode.
(isnormal<mode>2): New expand for TFmode.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index a6c72ae33b0..d1c9ef5447c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5357,6 +5357,30 @@ (define_expand "isfinite2"
   DONE;
 })

+(define_expand "isnormal<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdc<sd>p (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdcqp_<mode> (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+

 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..a0d848497b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..d591073d281
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */

[PATCH] s390: testsuite: Xfail range-sincos.c and vrp-float-abs-1.c

2024-04-12 Thread Stefan Schulze Frielinghaus

As mentioned in PR114678 those failures will be fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
For GCC 14 just xfail them which should be reverted once the patch is
applied.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/range-sincos.c: Xfail for s390.
* gcc.dg/tree-ssa/vrp-float-abs-1.c: Ditto.
---
 Ok for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
index 337f9cda02f..35b38c3c914 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
@@ -40,4 +40,4 @@ stool (double x)
 link_error ();
 }
 
-// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } } } }
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } xfail s390*-*-* } } } xfail: PR114678
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
index 4b7b75833e0..a814a973963 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
@@ -14,4 +14,4 @@ foo (double x, double y)
 }
 }
 
-// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { xfail s390*-*-* } } } xfail: PR114678
-- 
2.43.0

[PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-04-12 Thread HAO CHEN GUI

Hi,
  This patch adds an optab for __builtin_isnormal. The normal check can be
implemented on rs6000 by a single instruction. An optab is needed so that
the builtin can be expanded to that specific instruction sequence.

  The subsequent patches will implement the expand on rs6000.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for next stage-1?

Thanks
Gui Haochen
ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 3174f52ebe8..defb39de95f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")

[PATCH] tree-cfg: Make the verifier returns_twice message translatable

2024-04-12 Thread Jakub Jelinek

Hi!

While translating the verifier messages is questionable (this case is
something that ideally should never happen to anyone except gcc developers,
so presumably English would be fine), we use error etc. APIs, and
those imply translation, and some translators do translate them.
The following patch adjusts the code such that we don't emit
appel returns_twice est not first dans le bloc de base 33
in French (i.e. 2 English words in the middle of a French message).
Similarly for Swedish or Ukrainian.
Note, the German translator did differentiate between these verifier
messages and normal user-facing ones and translated it to:
"Interner Fehler: returns_twice call is %s in basic block %d"
so just a German prefix before the English message.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-04-12  Jakub Jelinek  

* tree-cfg.cc (gimple_verify_flow_info): Make the misplaced
returns_twice diagnostics translatable.

--- gcc/tree-cfg.cc.jj  2024-04-10 10:19:04.237471564 +0200
+++ gcc/tree-cfg.cc 2024-04-11 17:18:57.962672110 +0200
@@ -5818,7 +5818,7 @@ gimple_verify_flow_info (void)
  if (gimple_code (stmt) == GIMPLE_CALL
  && gimple_call_flags (stmt) & ECF_RETURNS_TWICE)
{
- const char *misplaced = NULL;
+ bool misplaced = false;
  /* TM is an exception: it points abnormal edges just after the
 call that starts a transaction, i.e. it must end the BB.  */
  if (gimple_call_builtin_p (stmt, BUILT_IN_TM_START))
@@ -5826,18 +5826,23 @@ gimple_verify_flow_info (void)
  if (single_succ_p (bb)
  && bb_has_abnormal_pred (single_succ (bb))
  && !gsi_one_nondebug_before_end_p (gsi))
-   misplaced = "not last";
+   {
+ error ("returns_twice call is not last in basic block "
+"%d", bb->index);
+ misplaced = true;
+   }
}
  else
{
- if (seen_nondebug_stmt
- && bb_has_abnormal_pred (bb))
-   misplaced = "not first";
+ if (seen_nondebug_stmt && bb_has_abnormal_pred (bb))
+   {
+ error ("returns_twice call is not first in basic block "
+"%d", bb->index);
+ misplaced = true;
+   }
}
  if (misplaced)
{
- error ("returns_twice call is %s in basic block %d",
-misplaced, bb->index);
  print_gimple_stmt (stderr, stmt, 0, TDF_SLIM);
  err = true;
}

Jakub
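
The problem described above (a translated format string with an untranslated fragment substituted into it) can be reproduced with a toy message catalog. The sketch below is only an illustration in Python: the catalog dictionary and the `translate` helper are hypothetical stand-ins for the real gettext machinery, and the second French string is an invented example translation, not taken from the actual fr.po.

```python
# Hypothetical catalog: msgid -> translated format string.
FRENCH = {
    # Old-style msgid with a %s hole; translators never see what fills it.
    "returns_twice call is %s in basic block %d":
        "appel returns_twice est %s dans le bloc de base %d",
    # New-style msgid: the whole sentence is one translatable unit.
    # (Invented translation, for illustration only.)
    "returns_twice call is not first in basic block %d":
        "l'appel returns_twice n'est pas le premier dans le bloc de base %d",
}


def translate(catalog, msgid):
    """Look up msgid, falling back to the English text when untranslated."""
    return catalog.get(msgid, msgid)


def old_message(catalog, bb_index):
    """Before the patch: an English fragment is spliced into the translation."""
    misplaced = "not first"  # plain string, invisible to translators
    fmt = translate(catalog, "returns_twice call is %s in basic block %d")
    return fmt % (misplaced, bb_index)


def new_message(catalog, bb_index):
    """After the patch: each complete diagnostic is its own msgid."""
    fmt = translate(catalog, "returns_twice call is not first in basic block %d")
    return fmt % bb_index


if __name__ == "__main__":
    print(old_message(FRENCH, 33))
    print(new_message(FRENCH, 33))
```

The first function yields the mixed-language text quoted in the message; the second cannot, because no untranslated fragment is ever substituted.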

[PATCH] Limit special asan/ubsan/bitint returns_twice handling to calls in bbs with abnormal pred [PR114687]

2024-04-12 Thread Jakub Jelinek

Hi!

The tree-cfg.cc verifier only diagnoses returns_twice calls preceded
by non-label/debug stmts if it is in a bb with abnormal predecessor.
The following testcase shows that if a user lies in the attributes
(a function which never returns can't be pure, and can't return
twice when it doesn't ever return at all), when we figure it out,
we can remove the abnormal edges to the "returns_twice" call and perhaps
whole .ABNORMAL_DISPATCHER etc.
edge_before_returns_twice_call then ICEs because it can't find such
an edge.

The following patch limits the special handling to calls in bbs where
the verifier requires that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-04-12  Jakub Jelinek  

PR sanitizer/114687
* gimple-iterator.cc (gsi_safe_insert_before): Only use
edge_before_returns_twice_call if bb_has_abnormal_pred.
(gsi_safe_insert_seq_before): Likewise.
* gimple-lower-bitint.cc (bitint_large_huge::lower_call): Only
push to m_returns_twice_calls if bb_has_abnormal_pred.

* gcc.dg/asan/pr114687.c: New test.

--- gcc/gimple-iterator.cc.jj   2024-03-14 09:57:09.024966285 +0100
+++ gcc/gimple-iterator.cc  2024-04-11 17:05:06.267081433 +0200
@@ -1049,7 +1049,8 @@ gsi_safe_insert_before (gimple_stmt_iter
   gimple *stmt = gsi_stmt (*iter);
   if (stmt
   && is_gimple_call (stmt)
-  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
+  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0
+  && bb_has_abnormal_pred (gsi_bb (*iter)))
 {
   edge e = edge_before_returns_twice_call (gsi_bb (*iter));
   basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
@@ -1072,7 +1073,8 @@ gsi_safe_insert_seq_before (gimple_stmt_
   gimple *stmt = gsi_stmt (*iter);
   if (stmt
   && is_gimple_call (stmt)
-  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
+  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0
+  && bb_has_abnormal_pred (gsi_bb (*iter)))
 {
   edge e = edge_before_returns_twice_call (gsi_bb (*iter));
   gimple *f = gimple_seq_first_stmt (seq);
--- gcc/gimple-lower-bitint.cc.jj   2024-04-09 09:28:21.261123664 +0200
+++ gcc/gimple-lower-bitint.cc  2024-04-11 17:06:58.033548199 +0200
@@ -5320,7 +5320,7 @@ bitint_large_huge::lower_call (tree obj,
  arg = make_ssa_name (TREE_TYPE (arg));
  gimple *g = gimple_build_assign (arg, v);
  gsi_insert_before (&gsi, g, GSI_SAME_STMT);
- if (returns_twice)
+ if (returns_twice && bb_has_abnormal_pred (gimple_bb (stmt)))
{
  m_returns_twice_calls.safe_push (stmt);
  returns_twice = false;
--- gcc/testsuite/gcc.dg/asan/pr114687.c.jj	2024-04-11 17:09:54.518127165 +0200
+++ gcc/testsuite/gcc.dg/asan/pr114687.c	2024-04-11 17:09:22.699563654 +0200
@@ -0,0 +1,22 @@
+/* PR sanitizer/114687 */
+/* { dg-do compile } */
+
+int a;
+int foo (int);
+
+__attribute__((pure, returns_twice)) int
+bar (void)
+{
+  a = 1;
+  while (a)
+a = 2;
+  return a;
+}
+
+int
+baz (void)
+{
+  int d = bar ();
+  foo (d);
+  return 0;
+}

Jakub

Re: [PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Thomas Schwinge

Hi!

On 2024-04-12T09:08:13+0200, Filip Kastl  wrote:
> On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
>> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
>> > contrib/check-params-in-docs.py is a script that checks that all
>> > options reported with ./gcc/xgcc -Bgcc --help=param are in
>> > gcc/doc/invoke.texi and vice versa.
>> 
>> Eh, first time I'm hearing about this one!
>> 
>> (a) Shouldn't this be running as part of the GCC build process?
>> 
>> > gcn-preferred-vectorization-factor is in the manual but normally not
>> > reported by --help, probably because I do not have gcn offload
>> > configured.
>> 
>> No, because you've not been building GCC for GCN target.  ;-P
>> 
>> > This patch makes the script silent about this particular
>> > fact.
>> 
>> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
>> to how that's done for "skip aarch64 params"?
>> 
>> (c) ..., and shouldn't we likewise skip any "x86" ones?
>> 
>> (d) ..., or in fact any target specific ones, following after the generic
>> section?  (Easily achieved with a special marker in
>> 'gcc/doc/invoke.texi', just before:
>> 
>> The following choices of @var{name} are available on AArch64 targets:
>> 
>> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
>> accordingly?

> I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
> chose to do it differently since target-specific params in both invoke.texi
> and --help=params have to be ignored.

Right, I realized that after I had sent my email...

> The downside of this patch is that the script won't complain if someone adds a
> target-specific param and doesn't document it.

Yes, but that's a pre-existing problem -- unless you happened to be
targeting some x86 variant.  The target-specific '--param's will have to
be handled differently.

> What do you think?

Looks like a good incremental improvement to me, thanks!


Grüße
 Thomas


> contrib/check-params-in-docs.py is a script that checks that all options
> reported with gcc --help=params are in gcc/doc/invoke.texi and vice
> versa.
> gcc/doc/invoke.texi lists target-specific params but gcc --help=params
> doesn't.  This meant that the script would mistakenly complain about
> params missing from --help=params.  Previously, the script was just set
> to ignore aarch64 and gcn params which solved this issue only for x86.
> This patch sets the script to ignore all target-specific params.
>
> contrib/ChangeLog:
>
>   * check-params-in-docs.py: Ignore target specific params.
>
> Signed-off-by: Filip Kastl 
> ---
>  contrib/check-params-in-docs.py | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index f7879dd8e08..ccdb8d72169 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -38,6 +38,9 @@ def get_param_tuple(line):
>  description = line[i:].strip()
>  return (name, description)
>  
> +def target_specific(param):
> +    return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
> +
>  
>  parser = argparse.ArgumentParser()
>  parser.add_argument('texi_file')
> @@ -45,13 +48,16 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
> -params = {}
> +ignored = {'logical-op-non-short-circuit'}
> +help_params = {}
>  
>  for line in open(args.params_output).readlines():
>      if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
>          r = get_param_tuple(line)
> -        params[r[0]] = r[1]
> +        help_params[r[0]] = r[1]
> +
> +# Skip target-specific params
> +help_params = [x for x in help_params.keys() if not target_specific(x)]
>  
>  # Find section in .texi manual with parameters
>  texi = ([x.strip() for x in open(args.texi_file).readlines()])
> @@ -66,14 +72,13 @@ for line in texi:
>  texi_params.append(line[len(token):])
>  break
>  
> -# skip digits
> +# Skip digits
>  texi_params = [x for x in texi_params if not x[0].isdigit()]
> -# skip aarch64 params
> -texi_params = [x for x in texi_params if not x.startswith('aarch64')]
> -sorted_params = sorted(texi_params)
> +# Skip target-specific params
> +texi_params = [x for x in texi_params if not target_specific(x)]
>  
>  texi_set = set(texi_params) - ignored
> -params_set = set(params.keys()) - ignored
> +params_set = set(help_params) - ignored
>  
>  success = True
>  extra = texi_set - params_set
> -- 
> 2.43.1
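
The filtering approach in the quoted patch can be tried in isolation. The following is a small self-contained sketch, not the script itself: `target_specific` follows the patch, while `compare` and the sample lists are hypothetical and only stand in for the parsing of invoke.texi and the --help=params output.

```python
# Prefixes treated as target-specific, as in the quoted patch.
TARGET_PREFIXES = ('aarch64', 'gcn', 'x86')


def target_specific(param):
    """True if the first dash-separated component names a known target."""
    return param.split('-')[0] in TARGET_PREFIXES


def compare(texi_params, help_params, ignored=frozenset()):
    """Return (only_in_texi, only_in_help) after dropping target params.

    Hypothetical helper: the real script prints the differences instead.
    """
    texi_set = {p for p in texi_params if not target_specific(p)} - ignored
    help_set = {p for p in help_params if not target_specific(p)} - ignored
    return sorted(texi_set - help_set), sorted(help_set - texi_set)


if __name__ == "__main__":
    # Sample data for illustration; 'made-up-param' is not a real GCC param.
    texi = ['max-inline-insns-single', 'gcn-preferred-vectorization-factor',
            'aarch64-sve-compare-costs', 'logical-op-non-short-circuit']
    help_out = ['max-inline-insns-single', 'x86-stlf-window-ninsns',
                'made-up-param']
    print(compare(texi, help_out, ignored={'logical-op-non-short-circuit'}))
```

Because the filter is applied to both inputs, a target-specific param that is documented but not reported (or the reverse) no longer shows up as a mismatch, which is also why undocumented target-specific params go unnoticed, as discussed above.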

Re: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-12 Thread Richard Biener

On Fri, Apr 12, 2024 at 1:25 AM Andrew Pinski (QUIC)
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, April 11, 2024 2:31 AM
> > To: Andrew Pinski (QUIC) 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]
> >
> > On Thu, Apr 11, 2024 at 10:43 AM Andrew Pinski
> >  wrote:
> > >
> > > The issue here is that the `a?~t:t` pattern assumed (maybe correctly)
> > > that a here was always going to be an unsigned boolean type. This fixes
> > > the problem in both patterns to cast the operand to boolean type first.
> > >
> > > I should note that VRP keeps wanting to produce `a ==
> > > 0?1:-2` from `((int)a) ^ 1`, which is a bit odd and is partly the cause
> > > of the issue; there seems to be some disconnect on what the
> > > canonical form should be. That will be something to look at for GCC 15.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > PR tree-optimization/114666
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd (`!a?b:c`): Cast `a` to boolean type for cond for
> > > gimple.
> > > (`a?~t:t`): Cast `a` to boolean type before casting it
> > > to the type.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.c-torture/execute/bitfld-signed1-1.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/match.pd| 10 +++---
> > >  .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
> > >  2 files changed, 20 insertions(+), 3 deletions(-)  create mode 100644
> > > gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > 15a1e7350d4..ffc928b656a 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -5895,7 +5895,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   /* !A ? B : C -> A ? C : B.  */
> > >   (simplify
> > >(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> > > -  (cnd @0 @2 @1)))
> > > +  /* For gimple, make sure the operand to COND is a boolean type,
> > > + truth_valued_p will match 1bit integers too. */  (if (GIMPLE &&
> > > + cnd == COND_EXPR)
> > > +   (cnd (convert:boolean_type_node @0) @2 @1)
> > > +   (cnd @0 @2 @1
> >
> > This looks "wrong" for GENERIC still?
>
> I tried without the GIMPLE check and ran into the testcase
> gcc.dg/torture/builtins-isinf-sign-1.c failing, because the extra convert was
> blocking seeing that both sides of an equality were the same (I didn't look
> into it further than that). So I decided to limit it to GIMPLE only.
>
> > But this is not really part of the fix but deciding we should not have
> > signed:1 as cond operand?  I'll note that truth_valued_p allows signed:1.
> >
> > Maybe as minimal surgery add a TYPE_UNSIGNED (TREE_TYPE (@0)) check here
> > instead?
>
> That might work, let me try.
>
> >
> > >  /* abs/negative simplifications moved from
> > fold_cond_expr_with_comparison.
> > >
> > > @@ -7099,8 +7103,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > && (!wascmp || TYPE_PRECISION (type) == 1))
> > > (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
> > > || TYPE_PRECISION (type) == 1)
> > > -(bit_xor (convert:type @0) @2)
> > > -(bit_xor (negate (convert:type @0)) @2)
> > > +(bit_xor (convert:type (convert:boolean_type_node @0)) @2)
> > > +(bit_xor (negate (convert:type (convert:boolean_type_node @0)))
> > > + @2)
> > >  #endif
> >
> > This looks OK, but then testing TYPE_UNSIGNED (TREE_TYPE (@0)) might be
> > better?
> >
>
> Let me do that just like the other pattern.
>
> > Does this all just go downhill from what VRP creates?  That is, would IL
> > checking have had a chance detecting it if we say signed:1 are not valid as
> > condition?
>
> Yes. So what VRP produces in the testcase is:
> `_2 == 0 ? 1 : -2u` (where _2 is the signed 1bit integer).
> Now maybe the COND_EXPR should be the canonical form for constants (but that
> is for a different patch I think; I added it to the list of things I should
> look into for GCC 15).

Ah OK, so the !A ? B : C -> A ? C : B transform turns the
"proper" conditional into an improper one (if we want to restrict it).
And then the other pattern matches doing the wrong transform.

> >
> > That said, the latter pattern definitely needs guarding/adjustment, I'm not
> > sure the former is wrong?  Semantically [VEC_]COND_EXPR is op0 != 0 ? ... : ...
>
> I forgot to mention that to fix the bug only one of the 2 hunks is needed.
>
> >
> > Richard.
> >
> > >  /* Simplify pointer equality compares using PTA.  */ diff --git
> > > a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > > b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > > new file mode 100644
> > > index 000..b0ff120ea51
> > > --- /dev/null
> > > +++

Re: [PATCH v2] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-12 Thread Richard Biener

On Fri, Apr 12, 2024 at 6:53 AM Andrew Pinski  wrote:
>
> The problem is that the `!a?b:c` pattern will create a COND_EXPR with a 1-bit
> signed integer condition, which breaks patterns like `a?~t:t`. This patch
> rejects a signed operand in both patterns.
>
> Note for GCC 15, I am going to look at the canonicalization of `a?~t:t` where
> t is a constant, since I think keeping it a COND_EXPR might be more canonical
> and is what VRP produces from the same IR; if anything, expand should handle
> which one is better.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/114666
>
> gcc/ChangeLog:
>
> * match.pd (`!a?b:c`): Reject signed types for the condition.
> (`a?~t:t`): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/bitfld-signed1-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd|  6 +-
>  .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 15a1e7350d4..d401e7503e6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5895,7 +5895,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* !A ? B : C -> A ? C : B.  */
>   (simplify
>(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> -  (cnd @0 @2 @1)))
> +  /* For CONDs, don't handle signed values here. */
> +  (if (cnd == VEC_COND_EXPR
> +   || TYPE_UNSIGNED (TREE_TYPE (@0)))
> +   (cnd @0 @2 @1
>
>  /* abs/negative simplifications moved from fold_cond_expr_with_comparison.
>
> @@ -7095,6 +7098,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond @0 @1 @2)
>   (with { bool wascmp; }
>(if (INTEGRAL_TYPE_P (type)
> +   && TYPE_UNSIGNED (TREE_TYPE (@0))
> && bitwise_inverted_equal_p (@1, @2, wascmp)
> && (!wascmp || TYPE_PRECISION (type) == 1))
> (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
> diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> new file mode 100644
> index 000..b0ff120ea51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/114666 */
> +/* We used to miscompile this to be always aborting
> +   due to the use of the signed 1bit into the COND_EXPR. */
> +
> +struct {
> +  signed a : 1;
> +} b = {-1};
> +char c;
> +int main()
> +{
> +  if ((b.a ^ 1UL) < 3)
> +__builtin_abort();
> +}
> --
> 2.43.0
>
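
For readers following PR114666, here is a minimal C sketch of why a signed 1-bit condition is a problem. It assumes that the `a?~t:t` simplification ends up in the `(-(a)) ^ t` form; that is an assumption made for illustration, and all names below are hypothetical rather than taken from match.pd. A signed 1-bit field holds 0 or -1, so its "true" value is -1 rather than 1, and negating it no longer gives an all-ones mask.

```c
#include <assert.h>

/* A signed 1-bit bit-field can only hold 0 and -1.  */
struct one_bit
{
  signed int a : 1;
};

/* Read back the "true" value stored in a signed 1-bit field.  */
int
signed_one_bit_true (void)
{
  struct one_bit b = { -1 };
  return b.a;
}

/* Reference semantics of the conditional: a ? ~t : t.  */
unsigned long
select_ref (long a, unsigned long t)
{
  return a ? ~t : t;
}

/* Folded form (-(a)) ^ t.  It matches select_ref only when A is 0 or 1,
   because -(1) is all ones while -(-1) is just 1.  */
unsigned long
select_folded (long a, unsigned long t)
{
  return (-(unsigned long) a) ^ t;
}
```

With `a` equal to 0 or 1 the two functions agree. With `a` equal to -1, the value a signed 1-bit field uses for "true", they differ, which is consistent with the patch rejecting signed conditions.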

Re: [PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-04-12 Thread Thomas Schwinge

Hi Chung-Lin!

On 2024-04-11T22:08:47+0800, Chung-Lin Tang  wrote:
> On 2024/3/15 7:24 PM, Thomas Schwinge wrote:
>> -  if (n->refcount != REFCOUNT_INFINITY)
>> +  if (n->refcount != REFCOUNT_INFINITY
>> +  && n->refcount != REFCOUNT_ACC_MAP_DATA)
>>  n->refcount--;
>>n->dynamic_refcount--;
>>  }
>>  
>> +  /* Mappings created by 'acc_map_data' may only be deleted by
>> + 'acc_unmap_data'.  */
>> +  if (n->refcount == REFCOUNT_ACC_MAP_DATA
>> +  && n->dynamic_refcount == 0)
>> +n->dynamic_refcount = 1;
>> +
>>if (n->refcount == 0)
>>  {
>>bool copyout = (kind == GOMP_MAP_FROM
>> 
>> ..., which really should have the same semantics?  No strong opinion on
>> which of the two variants you now chose.
>
> My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will
> be lighter on any branch predictors (faster performing overall)

Eh, OK...

> so I will
> stick with my version here.


>>>> It's not clear to me why you need this handling -- instead of just
>>>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
>>>> early 'return'?
>>>>
>>>> Per my understanding, this code is for OpenACC only exercised for
>>>> structured data regions, and it seems strange (unnecessary?) to adjust
>>>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
>>>> missing anything?
>>>
>>> No, that is not true. It goes through almost everything through 
>>> gomp_map_vars_existing/_internal.
>>> This is what happens when you acc_create/acc_copyin on a mapping created by 
>>> acc_map_data.

I still don't follow.  If you 'acc_map_data' something, and then
'acc_create' the same memory region, then that's handled, with
'dynamic_refcount', via 'acc_create' -> 'goacc_enter_datum' ->
'goacc_map_var_existing', all in 'libgomp/oacc-mem.c'.  Agree?
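
A minimal C sketch of the two ways of treating an 'acc_map_data' mapping in the increment path that are being compared in this thread: redirecting the increment to 'dynamic_refcount', or returning early as for 'REFCOUNT_INFINITY'. The struct, the sentinel values, and the function names are assumptions made for this sketch; the real 'gomp_increment_refcount' also takes a 'refcount_set' and handles struct-element mappings, both of which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel values for this sketch; libgomp defines its own.  */
#define SKETCH_REFCOUNT_INFINITY     (~(uintptr_t) 0)
#define SKETCH_REFCOUNT_ACC_MAP_DATA (~(uintptr_t) 1)

/* Stand-in for the two counters of a mapping key.  */
struct sketch_key
{
  uintptr_t refcount;
  uintptr_t dynamic_refcount;
};

/* Variant 1: an 'acc_map_data' mapping bumps 'dynamic_refcount' instead
   of the structured 'refcount'.  */
void
sketch_increment_redirect (struct sketch_key *k)
{
  if (k == NULL || k->refcount == SKETCH_REFCOUNT_INFINITY)
    return;

  uintptr_t *refcount_ptr = &k->refcount;
  if (k->refcount == SKETCH_REFCOUNT_ACC_MAP_DATA)
    refcount_ptr = &k->dynamic_refcount;
  *refcount_ptr += 1;
}

/* Variant 2: an 'acc_map_data' mapping is left alone, like an
   infinite-refcount mapping.  */
void
sketch_increment_early_return (struct sketch_key *k)
{
  if (k == NULL
      || k->refcount == SKETCH_REFCOUNT_INFINITY
      || k->refcount == SKETCH_REFCOUNT_ACC_MAP_DATA)
    return;
  k->refcount += 1;
}

/* Apply one increment to a fresh 'acc_map_data' mapping and report the
   resulting 'dynamic_refcount'.  */
uintptr_t
sketch_dynamic_after (int use_redirect)
{
  struct sketch_key k = { SKETCH_REFCOUNT_ACC_MAP_DATA, 0 };
  if (use_redirect)
    sketch_increment_redirect (&k);
  else
    sketch_increment_early_return (&k);
  return k.dynamic_refcount;
}
```

In this sketch both variants leave the structured 'refcount' at its sentinel value; they differ only in whether 'dynamic_refcount' changes.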

>> But I don't understand what you foresee breaking with the following (on
>> top of your v2):
>> 
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr 
>> *devicep, void *devptr)
>>  static inline void
>>  gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set)
>>  {
>> -  if (k == NULL || k->refcount == REFCOUNT_INFINITY)
>> +  if (k == NULL
>> +  || k->refcount == REFCOUNT_INFINITY
>> +  || k->refcount == REFCOUNT_ACC_MAP_DATA)
>>  return;
>>  
>>    uintptr_t *refcount_ptr = &k->refcount;
>>  
>> -  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
>> -    refcount_ptr = &k->dynamic_refcount;
>> -  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>> +  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>      refcount_ptr = &k->structelem_refcount;
> ...
>> Can you please show a test case?

That is, a test case where the 'libgomp/target.c:gomp_increment_refcount'
etc. handling is relevant.  Those test cases:

> I have re-tested the patch *without* the gomp_increment/decrement_refcount changes,
> and have these regressions (just to demonstrate what is affected):
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test

... are cases where we 'acc_map_data' something, and then invoke an
OpenACC compute construct with a data clause for the same memory region...

> Now, I have also re-tested your version (aka, just break early and return when
> k->refcount == REFCOUNT_ACC_MAP_DATA).
> And for the record, that also works (no regressions).
>
> However, I strongly suggest we use my version here where we adjust the dynamic_refcount

..., and it's confusing to me why such an OpenACC compute construct (which
is to use the structured reference counter) should then use the

[PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Filip Kastl

On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
> Hi!
> 
> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
> > contrib/check-params-in-docs.py is a script that checks that all
> > options reported with ./gcc/xgcc -Bgcc --help=param are in
> > gcc/doc/invoke.texi and vice versa.
> 
> Eh, first time I'm hearing about this one!
> 
> (a) Shouldn't this be running as part of the GCC build process?
> 
> > gcn-preferred-vectorization-factor is in the manual but normally not
> > reported by --help, probably because I do not have gcn offload
> > configured.
> 
> No, because you've not been building GCC for GCN target.  ;-P
> 
> > This patch makes the script silent about this particular
> > fact.
> 
> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
> to how that's done for "skip aarch64 params"?
> 
> (c) ..., and shouldn't we likewise skip any "x86" ones?
> 
> (d) ..., or in fact any target specific ones, following after the generic
> section?  (Easily achieved with a special marker in
> 'gcc/doc/invoke.texi', just before:
> 
> The following choices of @var{name} are available on AArch64 targets:
> 
> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
> accordingly?

Hi,

I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
chose to do it differently since target-specific params in both invoke.texi and
--help=params have to be ignored.

The downside of this patch is that the script won't complain if someone adds a
target-specific param and doesn't document it.

What do you think?

Cheers,
Filip

-- 8< --

contrib/check-params-in-docs.py is a script that checks that all options
reported with gcc --help=params are in gcc/doc/invoke.texi and vice
versa.
gcc/doc/invoke.texi lists target-specific params but gcc --help=params
doesn't.  This meant that the script would mistakenly complain about
params missing from --help=params.  Previously, the script was just set
to ignore aarch64 and gcn params which solved this issue only for x86.
This patch sets the script to ignore all target-specific params.

contrib/ChangeLog:

* check-params-in-docs.py: Ignore target specific params.

Signed-off-by: Filip Kastl 
---
 contrib/check-params-in-docs.py | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index f7879dd8e08..ccdb8d72169 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -38,6 +38,9 @@ def get_param_tuple(line):
     description = line[i:].strip()
     return (name, description)
 
+def target_specific(param):
+    return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
+
 
 parser = argparse.ArgumentParser()
 parser.add_argument('texi_file')
@@ -45,13 +48,16 @@ parser.add_argument('params_output')
 
 args = parser.parse_args()
 
-ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
-params = {}
+ignored = {'logical-op-non-short-circuit'}
+help_params = {}
 
 for line in open(args.params_output).readlines():
     if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
         r = get_param_tuple(line)
-        params[r[0]] = r[1]
+        help_params[r[0]] = r[1]
+
+# Skip target-specific params
+help_params = [x for x in help_params.keys() if not target_specific(x)]
 
 # Find section in .texi manual with parameters
 texi = ([x.strip() for x in open(args.texi_file).readlines()])
@@ -66,14 +72,13 @@ for line in texi:
 texi_params.append(line[len(token):])
 break
 
-# skip digits
+# Skip digits
 texi_params = [x for x in texi_params if not x[0].isdigit()]
-# skip aarch64 params
-texi_params = [x for x in texi_params if not x.startswith('aarch64')]
-sorted_params = sorted(texi_params)
+# Skip target-specific params
+texi_params = [x for x in texi_params if not target_specific(x)]
 
 texi_set = set(texi_params) - ignored
-params_set = set(params.keys()) - ignored
+params_set = set(help_params) - ignored
 
 success = True
 extra = texi_set - params_set
-- 
2.43.1

RE: [PATCH v1] RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P

2024-04-12 Thread Li, Pan2

Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, April 12, 2024 2:11 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in 
TARGET_FUNCTION_VALUE_REGNO_P

LGTM。


juzhe.zh...@rivai.ai


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-04-12 Thread Richard Biener

On Thu, 11 Apr 2024, Thomas Schwinge wrote:

> Hi Chung-Lin, Richard!
> 
> From me just a few mechanical pieces, see below.  Richard, are you able
> to again comment on Chung-Lin's general strategy, as I'm not at all
> familiar with those parts of the code?

I've queued all stage1 material and will only be able to look at it
slowly after we have released.

> On 2024-04-03T19:50:55+0800, Chung-Lin Tang wrote:
> > On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>>
> >>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >>> flag.
> >>>
> >>> The actual optimization then is done in this second patch.  Chung-Lin
> >>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >>> I don't have much experience with most of the following generic code, so
> >>> would appreciate a helping hand, whether that conceptually makes sense as
> >>> well as from the implementation point of view:
> >
> > First of all, I have removed all of the gimplify-stage scanning and setting of
> > DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to
> > gimplify.cc now).
> >
> > I remember this code was an artifact of earlier attempts to allow struct-member
> > pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but that failed anyway.
> > I think the omp_data_* member accesses when building the child-function-side
> > receiver_refs are blocking points-to analysis from working (I didn't try digging deeper).
> >
> > Also, during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map
> > clause decl reference building, so hoping that the variables "happen to be" single-use and
> > relaying DECL_POINTS_TO_READONLY into SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be
> > a little risky.
> >
> > However, for firstprivate pointers processed during omp-low, it appears to be somewhat
> > different (see the description below).
> >
> >> No, I don't think you can use that flag on non-default-defs, nor
> >> preserve it on copying.  So it also doesn't nicely extend to DECLs as done
> >> by the patch.  We currently _only_ use it for incoming parameters.  When
> >> used on arbitrary code you can get to, for example,
> >> 
> >> ptr1(points-to-readony-memory) = >x;
> >> ... access via ptr1 ...
> >> ptr2 = >x;
> >> ... access via ptr2 ...
> >> 
> >> where both are your OMP regions, differently constrained (the constraint is
> >> on the code in the region, _not_ on the actual protections of the pointed-to
> >> data, much like for the Fortran case).  But now CSE comes along and happily
> >> replaces all ptr2 with ptr2 in the second region and ... oops!
> >
> > Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the
> > second region"?
> >
> > That doesn't happen, because during omp-lower/expand, OMP target regions (which is all
> > that this currently applies to) are separated into different individual child functions.
> >
> > (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower, when
> > for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load
> > of this pointer.)
> >
> >   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> >   {
> > foo (a, a[8]);
> > r = a[8];
> >   }
> >   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> >   {
> > foo (a, a[12]);
> > r = a[12];
> >   }
> >
> > After omp-expand (before SSA):
> >
> > __attribute__((oacc parallel, omp target entrypoint, noclone))
> > void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> > {
> >  ...
> >:
> >   D.2962 = .omp_data_i->D.2947;
> >   a.8 = D.2962;
> >   r.1 = (*a.8)[12];
> >   foo (a.8, r.1);
> >   r.1 = (*a.8)[12];
> >   D.2965 = .omp_data_i->r;
> >   *D.2965 = r.1;
> >   return;
> > }
> >
> > __attribute__((oacc parallel, omp target entrypoint, noclone))
> > void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> > {
> >   ...
> >:
> >   D.2968 = .omp_data_i->D.2939;
> >   a.4 = D.2968;
> >   r.0 = (*a.4)[8];
> >   foo (a.4, r.0);
> >   r.0 = (*a.4)[8];
> >   D.2971 = .omp_data_i->r;
> >   *D.2971 = r.0;
> >   return;
> > }
> >
> > So actually, the creation of DECL_POINTS_TO_READONLY and its relaying to
> > SSA_NAME_POINTS_TO_READONLY_MEMORY here is quite similar to a default-def
> > for a PARM_DECL, at least conceptually.
> >
> > (If offloading was structured significantly differently, say if child functions
> > were separated much earlier before omp-lowering, then this readonly modifier might
> > possibly be a direct application of 'r' in the "fn spec" attribute.)
> >
> > Other changes since first version of patch include:
> > 1) update of C/C++ FE changes to new style in c-family/c-omp.cc
> > 2) merging of two if cases in fortran/trans-openmp.cc like

[PATCH-3] Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double [PR97786]

2024-04-12 Thread HAO CHEN GUI

Hi,
  This patch folds builtin_isfinite on IBM long double to builtin_isfinite on
double type. The former patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html
implemented the DFmode isfinite_optab.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double

For IBM long double, INF and NaN are encoded in the high-order double value
only, so builtin_isfinite on IBM long double can be folded to
builtin_isfinite on double.  As the former patch implemented the DFmode
isfinite_optab, this patch converts builtin_isfinite on IBM long double to
builtin_isfinite on double if the DFmode isfinite_optab exists.
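
As a rough illustration of the reasoning above, an IBM long double can be modelled as a pair of doubles. The struct and function names are hypothetical and this is not the GCC folding code; it only sketches the idea that checking the high-order double is enough.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of an IBM long double: a high-order and a
   low-order double.  */
struct sketch_ibm128
{
  double hi;
  double lo;
};

/* Build a model value from its two halves.  */
struct sketch_ibm128
sketch_ibm128_make (double hi, double lo)
{
  struct sketch_ibm128 x = { hi, lo };
  return x;
}

/* INF and NaN live in the high-order double only, so finiteness of the
   pair is decided by the high-order half alone.  */
bool
sketch_ibm128_isfinite (struct sketch_ibm128 x)
{
  return isfinite (x.hi);
}
```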

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double
isfinite call to double isfinite call when DFmode isfinite_optab
exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-6.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 5262aa01660..3174f52ebe8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9605,6 +9605,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree 
fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+   tree const isfinite_fn = builtin_decl_explicit (BUILT_IN_ISFINITE);
+   if (interclass_mathfn_icode (arg, isfinite_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isfinite_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (&r, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-6.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c
new file mode 100644
index 000..c86c765651d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler {\mxststdcdp\M} } } */

Re: [PATCH v1] RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P

2024-04-12 Thread juzhe.zh...@rivai.ai

LGTM。



juzhe.zh...@rivai.ai
 

[PATCH v1] RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P

2024-04-12 Thread pan2 . li

From: Pan Li 

This patch fixes an ICE in the implementation of the
TARGET_FUNCTION_VALUE_REGNO_P hook when vector is not enabled.  The
vector regno is available if and only if TARGET_VECTOR is true.  The
previous implementation missed this condition, which resulted in an ICE
when building with rv64gc (that is, without the vector extension).
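
The shape of the fix can be sketched in plain C as follows.  This is only
an illustration: the register numbers are made up, the GP check is assumed
rather than taken from this patch, and the 'target_vector' parameter stands
in for the TARGET_VECTOR macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up register numbers for illustration; the real values are
   defined by the RISC-V backend headers.  */
enum
{
  SK_GP_RETURN_FIRST = 10,
  SK_GP_RETURN_LAST = 11,
  SK_FP_RETURN_FIRST = 42,
  SK_FP_RETURN_LAST = 43,
  SK_V_RETURN = 104
};

/* Return true if REGNO can hold a function return value.  The vector
   return register only counts when the vector extension is enabled.  */
bool
sketch_function_value_regno_p (unsigned regno, bool target_vector)
{
  if (SK_GP_RETURN_FIRST <= regno && regno <= SK_GP_RETURN_LAST)
    return true;

  if (SK_FP_RETURN_FIRST <= regno && regno <= SK_FP_RETURN_LAST)
    return true;

  if (target_vector && regno == SK_V_RETURN)
    return true;

  return false;
}
```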

PR target/114639

The following test suites pass with this patch.

* The rv64gcv full regression tests.
* The rv64gc full regression tests.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_function_value_regno_p): Add
TARGET_VECTOR predicate for V_RETURN regno.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr114639-1.c: New test.
* gcc.target/riscv/pr114639-2.c: New test.
* gcc.target/riscv/pr114639-3.c: New test.
* gcc.target/riscv/pr114639-4.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc   |  2 +-
 gcc/testsuite/gcc.target/riscv/pr114639-1.c | 11 +++
 gcc/testsuite/gcc.target/riscv/pr114639-2.c | 11 +++
 gcc/testsuite/gcc.target/riscv/pr114639-3.c | 11 +++
 gcc/testsuite/gcc.target/riscv/pr114639-4.c | 11 +++
 5 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr114639-4.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 91f017dd52a..e5f00806bb9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11008,7 +11008,7 @@ riscv_function_value_regno_p (const unsigned regno)
   if (FP_RETURN_FIRST <= regno && regno <= FP_RETURN_LAST)
 return true;
 
-  if (regno == V_RETURN)
+  if (TARGET_VECTOR && regno == V_RETURN)
 return true;
 
   return false;
diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-1.c 
b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
new file mode 100644
index 000..f41723193a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr114639-1.c
@@ -0,0 +1,11 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -std=gnu89 -O3" } */
+
+g (a, b) {}
+
+f (xx)
+ void* xx;
+{
+  __builtin_apply ((void*)g, xx, 200);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-2.c 
b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
new file mode 100644
index 000..0c402c4b254
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr114639-2.c
@@ -0,0 +1,11 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imac -mabi=lp64 -std=gnu89 -O3" } */
+
+g (a, b) {}
+
+f (xx)
+ void* xx;
+{
+  __builtin_apply ((void*)g, xx, 200);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-3.c 
b/gcc/testsuite/gcc.target/riscv/pr114639-3.c
new file mode 100644
index 000..ffb0d6d162d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr114639-3.c
@@ -0,0 +1,11 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -std=gnu89 -O3" } */
+
+g (a, b) {}
+
+f (xx)
+ void* xx;
+{
+  __builtin_apply ((void*)g, xx, 200);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr114639-4.c 
b/gcc/testsuite/gcc.target/riscv/pr114639-4.c
new file mode 100644
index 000..a6e229101ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr114639-4.c
@@ -0,0 +1,11 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imac -mabi=ilp32 -std=gnu89 -O3" } */
+
+g (a, b) {}
+
+f (xx)
+ void* xx;
+{
+  __builtin_apply ((void*)g, xx, 200);
+}
-- 
2.34.1

Re: [r14-9912 Regression] FAIL: gcc.dg/guality/pr54693-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 21 z == 30 - 3 * i on Linux/x86_64

2024-04-12 Thread Richard Biener

On Fri, 12 Apr 2024, haochen.jiang wrote:

> On Linux/x86_64,
> 
> c7e8a8d814229fd6fc4c16c2452f15dddc613479 is the first bad commit
> commit c7e8a8d814229fd6fc4c16c2452f15dddc613479
> Author: Richard Biener 
> Date:   Thu Apr 11 11:08:07 2024 +0200
> 
> tree-optimization/109596 - wrong debug stmt move by copyheader
> 
> caused
> 
> FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  -DPREVENT_OPTIMIZATION  line 34 c 
> == [0]
> FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  -DPREVENT_OPTIMIZATION  line 39 c 
> == [0]
> FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 x == 10 - i
> FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 y == 20 - 2 * i
> FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 z == 30 - 3 * i

Just FYI these are the FAILs as they were present before the regression
this change fixed.

[PATCH-2, rs6000] Implement optab_isfinite for SFmode, DFmode and TFmode [PR97786]

2024-04-12 Thread HAO CHEN GUI

Hi,
  This patch implements optab_isfinite for SF/DF/TFmode using the rs6000 test
data class instructions.
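
  A rough C model of the approach, for illustration only.  It assumes the
data class mask 0x70 selects NaN, +Infinity and -Infinity (my reading of
the mask bits; treat the bit assignments as an assumption), and all names
are hypothetical.  The result of the class test is then flipped with an
XOR against 1, as the expanders in this patch do.

```c
#include <assert.h>
#include <math.h>

/* Assumed data class mask bits for this sketch.  */
enum
{
  SK_DCMX_NAN = 0x40,
  SK_DCMX_POS_INF = 0x20,
  SK_DCMX_NEG_INF = 0x10
};

/* Model of a "test data class" operation: return 1 if X belongs to any
   class selected by MASK, else 0.  Only the NaN and infinity classes
   are modelled here.  */
int
sketch_test_data_class (double x, int mask)
{
  int cls = 0;
  if (isnan (x))
    cls = SK_DCMX_NAN;
  else if (isinf (x))
    cls = signbit (x) ? SK_DCMX_NEG_INF : SK_DCMX_POS_INF;
  return (cls & mask) != 0;
}

/* isfinite is the inverse of "is NaN or infinity".  */
int
sketch_isfinite (double x)
{
  return sketch_test_data_class (x, 0x70) ^ 1;
}
```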

  This patch relies on the former patch, which adds optab_isfinite.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Implement optab_isfinite for SFmode, DFmode and TFmode

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite<mode>2): New expand for SFmode and
DFmode.
(isfinite<mode>2): New expand for TFmode.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f0cc02f7e7b..a6c72ae33b0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5333,6 +5333,31 @@ (define_expand "isinf2"
   DONE;
 })

+(define_expand "isfinite<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdc<sd>p (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdcqp_<mode> (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_<mode>"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 000..55b5ff507b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 000..5b5a89681fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
