date:20240305

[PATCH v2] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-05 Thread pan2 . li

From: Pan Li 

Update in v2:
* Cleanup some unused code.
* Fix some typo of commit log.

Original log:

This patch would like to introduce one new gcc attribute for RVV.
This attribute is used to define fixed-length variants of one
existing sizeless RVV types.

This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
one args should be the integer constant and its' value is terminated
by the LMUL and the vector register bits in zvl*b.  For example:

typedef vint32m2_t fixed_vint32m2_t __attribute__((riscv_rvv_vector_bits(128)));

The above type define is valid when -march=rv64gc_zve64d_zvl64b
(aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
-march=rv64gcv_zvl128b similar to below.

"error: invalid RVV vector size '128', expected size is '256' based on
LMUL of type and '-mrvv-vector-bits=zvl'"

For the vint*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -

For the vfloat*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, -

For the vbool*_t types only below operations are allowed except
the CMP and ALU. The CMP and ALU operations on vbool*_t is not
well defined currently.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.

For the vint*x*m*_t tuple types are not suppored in this patch
which is compatible with clang.

This patch passed the below testsuites.
* The riscv fully regression tests.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
New static func to take care of the RVV types decorated by
the attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  87 +-
 .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
 .../riscv/rvv/base/riscv_rvv_vector_bits-2.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-3.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-4.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-5.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-6.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-7.c  |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits-8.c  |  75 
 .../riscv/rvv/base/riscv_rvv_vector_bits-9.c  |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits.h| 108 ++
 14 files changed, 599 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc

Re: [PATCH v1] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-05 Thread juzhe.zh...@rivai.ai

Thanks for support it.
I leave this patch review to kito who is much more familiar with it than me.

CCing more folks who may be interested at this stuff.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-03-06 14:38
To: gcc-patches
CC: juzhe.zhong; kito.cheng; yanzhang.wang; Pan Li
Subject: [PATCH v1] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for 
RVV
From: Pan Li 
 
This patch would like to introduce one new gcc attribute for RVV.
This attribute is used to define fixed-length variants of one
existing sizeless RVV types.
 
This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
one args should be the integer constant and its' value is terminated
by the LMUL and the vector register bits in zvl*b.  For example:
 
typedef vint32m2_t fixed_vint32m2_t __attribute__((riscv_rvv_vector_bits(128)));
 
The above type define is invalid when -march=rv64gc_zve64d_zvl64b
(aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
-march=rv64gcv_zvl128b similar to below.
 
"error: invalid RVV vector size '128', expected size is '256' based on
LMUL of type and '-mrvv-vector-bits=zvl'"
 
For the vint*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
 
For the vfloat*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, -
 
For the vbool*_t types only below operations are allowed except
the CMP and ALU. The CMP and ALU operations on vbool*_t is not
well defined currently.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
 
For the vint*x*m*_t tuple types are not suppored in this patch
which is compatible with clang.
 
This patch passed the below testsuites.
* The riscv fully regression tests.
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
New static func to take care of the RVV types decorated by
the attributes.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv.cc |  88 +-
.../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
.../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
.../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
.../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
.../riscv/rvv/base/riscv_rvv_vector_bits-2.c  |   6 +
.../riscv/rvv/base/riscv_rvv_vector_bits-3.c  |   6 +
.../riscv/rvv/base/riscv_rvv_vector_bits-4.c  |   6 +
.../riscv/rvv/base/riscv_rvv_vector_bits-5.c  |   6 +
.../riscv/rvv/base/riscv_rvv_vector_bits-6.c  |   6 +
.../riscv/rvv/base/riscv_rvv_vector_bits-7.c  |  76 
.../riscv/rvv/base/riscv_rvv_vector_bits-8.c  |  75 
.../riscv/rvv/base/riscv_rvv_vector_bits-9.c  |  76 
.../riscv/rvv/base/riscv_rvv_vector_bits.h| 108 ++
14 files changed, 600 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c
create mode

[PATCH v1] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-05 Thread pan2 . li

From: Pan Li 

This patch would like to introduce one new gcc attribute for RVV.
This attribute is used to define fixed-length variants of one
existing sizeless RVV types.

This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
one args should be the integer constant and its' value is terminated
by the LMUL and the vector register bits in zvl*b.  For example:

typedef vint32m2_t fixed_vint32m2_t __attribute__((riscv_rvv_vector_bits(128)));

The above type define is invalid when -march=rv64gc_zve64d_zvl64b
(aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
-march=rv64gcv_zvl128b similar to below.

"error: invalid RVV vector size '128', expected size is '256' based on
LMUL of type and '-mrvv-vector-bits=zvl'"

For the vint*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -

For the vfloat*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, -

For the vbool*_t types only below operations are allowed except
the CMP and ALU. The CMP and ALU operations on vbool*_t is not
well defined currently.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.

For the vint*x*m*_t tuple types are not suppored in this patch
which is compatible with clang.

This patch passed the below testsuites.
* The riscv fully regression tests.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
New static func to take care of the RVV types decorated by
the attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  88 +-
 .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
 .../riscv/rvv/base/riscv_rvv_vector_bits-2.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-3.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-4.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-5.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-6.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-7.c  |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits-8.c  |  75 
 .../riscv/rvv/base/riscv_rvv_vector_bits-9.c  |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits.h| 108 ++
 14 files changed, 600 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 56cd8d2c23f..fdbaf1633ac 100644
--- a/gcc/config/riscv/riscv.cc
+++

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-05 Thread Steve Kargl

On Tue, Mar 05, 2024 at 08:06:10PM -0800, Jerry D wrote:
> On 3/5/24 1:51 PM, Harald Anlauf wrote:
> > Hi Jerry,
> > 
> > on further thought, do we sanitize 'child_iomsg'?
> > We pass it to snprintf as format.
> > 
> > Wouldn't a strncpy be sufficient?
> > 
> > Harald
> > 
> > 
> 
> Just to be safe I will bump char message[IOMSG_LEN] to char
> message[IOMSG_LEN + 1]
> 
> This is like a C string vs a Fortran string length situation. snprintf
> guarantees we don't exceed the child_iomsg_len and null terminates it.
> 
> I added 1 to:
>  child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg) + 1
> 

string_len_trim substracts 1 from the passed in argument.

gfc_charlen_type
string_len_trim (gfc_charlen_type len, const CHARTYPE *s)
{
  if (len <= 0)
return 0;

  const size_t long_len = sizeof (unsigned long);

  size_t i = len - 1;


Does this account for the NULL?

-- 
steve

Re: [PATCH] arm: Support -mfdpic for more targets

2024-03-05 Thread Fangrui Song

On Fri, Feb 23, 2024 at 7:33 PM Fangrui Song  wrote:
>
> From: Fangrui Song 
>
> Targets that are not arm*-*-uclinuxfdpiceabi can use -S -mfdpic, but -c
> -mfdpic does not pass --fdpic to gas.  This is an unnecessary
> restriction.  Just define the ASM_SPEC in bpabi.h.
>
> Additionally, use armelf[b]_linux_fdpiceabi emulations for -mfdpic in
> linux-eabi.h.  This will allow a future musl fdpic port to use the
> desired BFD emulation.
>
> gcc/ChangeLog:
>
> * config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): Transform -mfdpic.
> * config/arm/linux-eabi.h (TARGET_FDPIC_LINKER_EMULATION): Define.
> (SUBTARGET_EXTRA_LINK_SPEC): Use TARGET_FDPIC_LINKER_EMULATION
> if -mfdpic.
> ---
>  gcc/config/arm/bpabi.h  | 2 +-
>  gcc/config/arm/linux-eabi.h | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
> index 7a279f3ed3c..6778be1a8bf 100644
> --- a/gcc/config/arm/bpabi.h
> +++ b/gcc/config/arm/bpabi.h
> @@ -55,7 +55,7 @@
>  #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*"\
>"|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
>
> -#define TARGET_FDPIC_ASM_SPEC ""
> +#define TARGET_FDPIC_ASM_SPEC "%{mfdpic: --fdpic}"
>
>  #define BE8_LINK_SPEC  \
>"%{!r:%{!mbe32:%:be8_linkopt(%{mlittle-endian:little}"   \
> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
> index eef791f6a02..0c5c58e4928 100644
> --- a/gcc/config/arm/linux-eabi.h
> +++ b/gcc/config/arm/linux-eabi.h
> @@ -46,12 +46,15 @@
>  #undef  TARGET_LINKER_EMULATION
>  #if TARGET_BIG_ENDIAN_DEFAULT
>  #define TARGET_LINKER_EMULATION "armelfb_linux_eabi"
> +#define TARGET_FDPIC_LINKER_EMULATION "armelfb_linux_fdpiceabi"
>  #else
>  #define TARGET_LINKER_EMULATION "armelf_linux_eabi"
> +#define TARGET_FDPIC_LINKER_EMULATION "armelf_linux_fdpiceabi"
>  #endif
>
>  #undef  SUBTARGET_EXTRA_LINK_SPEC
> -#define SUBTARGET_EXTRA_LINK_SPEC " -m " TARGET_LINKER_EMULATION
> +#define SUBTARGET_EXTRA_LINK_SPEC " -m %{mfdpic: " \
> +  TARGET_FDPIC_LINKER_EMULATION ";:" TARGET_LINKER_EMULATION "}"
>
>  /* GNU/Linux on ARM currently supports three dynamic linkers:
> - ld-linux.so.2 - for the legacy ABI
> --
> 2.44.0.rc1.240.g4c46232300-goog
>

Ping:)


-- 
宋方睿

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-05 Thread Jerry D


On 3/5/24 1:51 PM, Harald Anlauf wrote:

Hi Jerry,

on further thought, do we sanitize 'child_iomsg'?
We pass it to snprintf as format.

Wouldn't a strncpy be sufficient?

Harald




Just to be safe I will bump char message[IOMSG_LEN] to char 
message[IOMSG_LEN + 1]


This is like a C string vs a Fortran string length situation. snprintf 
guarantees we don't exceed the child_iomsg_len and null terminates it.


I added 1 to:
 child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg) + 1

Because snprintf was chopping off the last character of the fortran 
string to put the null in. (zero based vs one based char array). I test 
this with a very long string which exceeded the length and then reduced 
it until I could see the desired end.


I have not tried running a test case with sanitize. I did check with 
valgrind.  I will try the sanitize flags to see if we get a problem.  If 
not will push.


Thanks for comments,

Jerry -


On 3/5/24 22:37, Harald Anlauf wrote:

Hi Jerry,

I think there is the risk of buffer overrun in the following places:

+ char message[IOMSG_LEN];
+ child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg)
+ 1;
   free_line (dtp);
   snprintf (message, child_iomsg_len, child_iomsg);
   generate_error (>common, dtp->u.p.child_saved_iostat,

plus several more.  Wouldn't it be better to increase the size of
message by one?

Thanks,
Harald


On 3/5/24 04:15, Jerry D wrote:

On 3/1/24 11:24 AM, rep.dot@gmail.com wrote:

Hi Jerry and Steve,

On 29 February 2024 19:28:19 CET, Jerry D  wrote:

On 2/29/24 10:13 AM, Steve Kargl wrote:

On Thu, Feb 29, 2024 at 09:36:43AM -0800, Jerry D wrote:

On 2/29/24 1:47 AM, Bernhard Reutner-Fischer wrote:


And, just for my own education, the length limitation of iomsg to
255
chars is not backed by the standard AFAICS, right? It's just our
STRERR_MAXSZ?


Yes, its what we have had for a long lone time. Once you throw an
error
things get very processor dependent. I found MSGLEN set to 100 and
IOMSG_len
to 256. Nothing magic about it.



There is no restriction on the length for the iomsg-variable
that receives the generated error message.  In fact, if the
iomsg-variable has a deferred-length type parameter, then
(re)-allocation to the exact length is expected.

    F2023

    12.11.6 IOMSG= specifier

    If an error, end-of-file, or end-of-record condition occurs 
during
    execution of an input/output statement, iomsg-variable is 
assigned

    an explanatory message, as if by intrinsic assignment. If no such
    condition occurs, the definition status and value of
iomsg-variable
    are unchanged.
   character(len=23) emsg
read(fd,*,iomsg=emsg)

Here, the generated iomsg is either truncated to a length of 23
or padded with blanks to a length of 23.

character(len=:), allocatable :: emsg
read(fd,*,iomsg=emsg)

Here, emsg should have the length of whatever error message was
generated.
   HTH



Well, currently, if someone uses a larger string than 256 we are
going to chop it off.

Do we want to process this differently now?


Yes. There is some odd hunk about discrepancy of passed len and
actual len afterwards in 22-007-r1, IIRC. Didn't look closely though.


--- snip ---

Attached is the revised patch using the already available
string_len_trim function.

This hunk is only executed if a user has not passed an iostat or iomsg
variable in the parent I/O statement and an error is triggered which
terminates execution of the program. In this case, the iomsg string is
provided in the usual error message in a "processor defined" way.

(F2023):

12.6.4.8.3 Executing defined input/output data transfers
---
11 If the iostat argument of the defined input/output procedure has a
nonzero value when that procedure returns, and the processor therefore
terminates execution of the program as described in 12.11, the
processor shall make the value of the iomsg argument available in a
processor-dependent manner.
---

OK for trunk?

Regards,

Jerry

Re: [PATCH v2] c++: Fix template deduction for conversion operators with xobj parameters [PR113629]

2024-03-05 Thread Jason Merrill


On 3/5/24 22:46, Nathaniel Shead wrote:

On Tue, Mar 05, 2024 at 06:19:07PM -0500, Jason Merrill wrote:

On 3/5/24 17:47, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Use DEDUCE_CALL for xobj
parameters of conversion functions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/pt.cc  | 15 +-
   .../g++.dg/cpp23/explicit-obj-conv-op.C   | 49 +++
   2 files changed, 63 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..632437d3424 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23281,6 +23281,10 @@ type_unification_real (tree tparms,
in TARGS.  */
 NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs) = NULL_TREE;
+  bool is_xobj_conv_fn
+= (strict == DEDUCE_CONV
+   && DECL_XOBJ_MEMBER_FUNCTION_P (TREE_TYPE (tparms)));
+
again:
 parms = xparms;
 args = xargs;
@@ -23312,10 +23316,17 @@ type_unification_real (tree tparms,
   parameter pack is a non-deduced context.  */
continue;
+  /* For explicit object parameters, unification should behave like
+normal function calls, even for conversion functions.  This
+corresponds to the second (that is, last) argument.  */
+  unification_kind_t kind = strict;
+  if (is_xobj_conv_fn && ia > 0)


Is it necessary to check the xobj flag?  Or can this just be

  if (strict == DEDUCE_CONV && ia > 0)

?

Jason



I restricted it to xobj to be conservative, but I think you're right
that it's not necessary: there's nothing special about xobj here apart
from this being a new circumstance where we might actually need to unify
the object parameter.

Here's a new version of the patch. Bootstrapped and regtested (so far
only dg.exp) on x86_64-pc-linux-gnu, OK for trunk if full regtest
completes successfully?


OK.


-- >8 --

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Only use DEDUCE_CONV for the
return type of a conversion function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc  | 12 -
  .../g++.dg/cpp23/explicit-obj-conv-op.C   | 49 +++
  2 files changed, 60 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..a6e6c804130 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23312,10 +23312,18 @@ type_unification_real (tree tparms,
   parameter pack is a non-deduced context.  */
continue;
  
+  /* [temp.deduct.conv] only applies to the deduction of the return

+type, which is always the first argument here.  Other arguments
+(notably, explicit object parameters) should undergo normal
+call-like unification.  */
+  unification_kind_t kind = strict;
+  if (strict == DEDUCE_CONV && ia > 0)
+   kind = DEDUCE_CALL;
+
arg = args[ia];
++ia;
  
-  if (unify_one_argument (tparms, full_targs, parm, arg, subr, strict,

+  if (unify_one_argument (tparms, full_targs, parm, arg, subr, kind,
  explain_p))
return 1;
  }
@@ -23324,6 +23332,8 @@ type_unification_real (tree tparms,
&& parms != void_list_node
&& TREE_CODE (TREE_VALUE (parms)) == TYPE_PACK_EXPANSION)
  {
+  gcc_assert (strict != DEDUCE_CONV);
+
/* Unify the remaining arguments with the pack expansion type.  */
tree argvec;
tree parmvec = make_tree_vec (1);
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C 
b/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
new file mode 100644
index 000..a6ae4ea1dda
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
@@ -0,0 +1,49 @@
+// PR c++/113629
+// { dg-do compile { target c++23 } }
+
+template  constexpr bool is_lvalue = false;
+template  constexpr bool is_lvalue = true;
+
+struct A {
+  constexpr operator bool(this

[PATCH v2] c++: Fix template deduction for conversion operators with xobj parameters [PR113629]

2024-03-05 Thread Nathaniel Shead

On Tue, Mar 05, 2024 at 06:19:07PM -0500, Jason Merrill wrote:
> On 3/5/24 17:47, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > -- >8 --
> > 
> > Unification for conversion operators (DEDUCE_CONV) doesn't perform
> > transformations like handling forwarding references. This is correct in
> > general, but not for xobj parameters, which should be handled "normally"
> > for the purposes of deduction: [temp.deduct.conv] only applies to the
> > return type of the conversion function.
> > 
> > PR c++/113629
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (type_unification_real): Use DEDUCE_CALL for xobj
> > parameters of conversion functions.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp23/explicit-obj-conv-op.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/pt.cc  | 15 +-
> >   .../g++.dg/cpp23/explicit-obj-conv-op.C   | 49 +++
> >   2 files changed, 63 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index c4bc54a8fdb..632437d3424 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -23281,6 +23281,10 @@ type_unification_real (tree tparms,
> >in TARGS.  */
> > NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs) = NULL_TREE;
> > +  bool is_xobj_conv_fn
> > += (strict == DEDUCE_CONV
> > +   && DECL_XOBJ_MEMBER_FUNCTION_P (TREE_TYPE (tparms)));
> > +
> >again:
> > parms = xparms;
> > args = xargs;
> > @@ -23312,10 +23316,17 @@ type_unification_real (tree tparms,
> >parameter pack is a non-deduced context.  */
> > continue;
> > +  /* For explicit object parameters, unification should behave like
> > +normal function calls, even for conversion functions.  This
> > +corresponds to the second (that is, last) argument.  */
> > +  unification_kind_t kind = strict;
> > +  if (is_xobj_conv_fn && ia > 0)
> 
> Is it necessary to check the xobj flag?  Or can this just be
> 
>  if (strict == DEDUCE_CONV && ia > 0)
> 
> ?
> 
> Jason
> 

I restricted it to xobj to be conservative, but I think you're right
that it's not necessary: there's nothing special about xobj here apart
from this being a new circumstance where we might actually need to unify
the object parameter.

Here's a new version of the patch. Bootstrapped and regtested (so far
only dg.exp) on x86_64-pc-linux-gnu, OK for trunk if full regtest
completes successfully?

-- >8 --

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Only use DEDUCE_CONV for the
return type of a conversion function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/pt.cc  | 12 -
 .../g++.dg/cpp23/explicit-obj-conv-op.C   | 49 +++
 2 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..a6e6c804130 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23312,10 +23312,18 @@ type_unification_real (tree tparms,
   parameter pack is a non-deduced context.  */
continue;
 
+  /* [temp.deduct.conv] only applies to the deduction of the return
+type, which is always the first argument here.  Other arguments
+(notably, explicit object parameters) should undergo normal
+call-like unification.  */
+  unification_kind_t kind = strict;
+  if (strict == DEDUCE_CONV && ia > 0)
+   kind = DEDUCE_CALL;
+
   arg = args[ia];
   ++ia;
 
-  if (unify_one_argument (tparms, full_targs, parm, arg, subr, strict,
+  if (unify_one_argument (tparms, full_targs, parm, arg, subr, kind,
  explain_p))
return 1;
 }
@@ -23324,6 +23332,8 @@ type_unification_real (tree tparms,
   && parms != void_list_node
   && TREE_CODE (TREE_VALUE (parms)) == TYPE_PACK_EXPANSION)
 {
+  gcc_assert (strict != DEDUCE_CONV);
+
   /* Unify the remaining arguments with the pack expansion type.  */
   tree argvec;
   tree parmvec = make_tree_vec (1);
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C 
b/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
new file mode 100644
index 000..a6ae4ea1dda
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
@@ -0,0 +1,49 @@
+// PR c++/113629
+// { dg-do compile { target c++23 } }
+
+template  constexpr bool

[PATCH] c++/modules: Prevent emission of really-extern vtables in importers [PR114229]

2024-03-05 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, reading a variable definition always marks that decl as
DECL_NOT_REALLY_EXTERN, with anything else imported still being
considered external. This is not sufficient for vtables, however; for an
extern template, a vtable may be generated (and its definition emitted)
but nonetheless the vtable should only be emitted in the TU where that
template is actually instantiated.

While playing around with various settings of DECL_EXTERNAL I also
noticed that we mark variables not declared 'inline' as external, which
makes sense (there's a definition in another TU, while for vague linkage
variables we'll be importing the definition too), but we also do so for
'static constexpr' members that aren't directly marked 'inline'. This
fixes that for consistency, if nothing else (I wasn't able to cause an
actual issue exploiting this currently), but may become relevant with
https://github.com/itanium-cxx-abi/cxx-abi/issues/170.

PR c++/114229

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Count 'static constexpr'
vars also as inline.
(trees_out::write_var_def): Stream DECL_NOT_REALLY_EXTERN for
vtables.
(trees_in::read_var_def): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/virt-2_c.C:
* g++.dg/modules/virt-3_a.C: New test.
* g++.dg/modules/virt-3_b.C: New test.
* g++.dg/modules/virt-3_c.C: New test.
* g++.dg/modules/virt-3_d.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc| 15 +--
 gcc/testsuite/g++.dg/modules/virt-2_c.C | 14 +-
 gcc/testsuite/g++.dg/modules/virt-3_a.C |  9 +
 gcc/testsuite/g++.dg/modules/virt-3_b.C |  6 ++
 gcc/testsuite/g++.dg/modules/virt-3_c.C |  3 +++
 gcc/testsuite/g++.dg/modules/virt-3_d.C |  7 +++
 6 files changed, 47 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_d.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 67f132d28d7..771e56245dc 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5418,7 +5418,7 @@ trees_out::core_bools (tree t)
  && !(TREE_STATIC (t)
   && DECL_FUNCTION_SCOPE_P (t)
   && DECL_DECLARED_INLINE_P (DECL_CONTEXT (t)))
- && !DECL_VAR_DECLARED_INLINE_P (t))
+ && !DECL_INLINE_VAR_P (t))
is_external = true;
  break;
 
@@ -11799,6 +11799,12 @@ trees_out::write_var_def (tree decl)
}
   tree_node (dyn_init);
 }
+
+  /* For vtables we need to know if they're actually extern or not,
+ even if we get a definition; for other kinds of variables we
+ can assume that if we have a definition they can be emitted.  */
+  if (streaming_p () && VAR_P (decl) && DECL_VTABLE_OR_VTT_P (decl))
+u (DECL_NOT_REALLY_EXTERN (decl));
 }
 
 void
@@ -11816,6 +11822,11 @@ trees_in::read_var_def (tree decl, tree maybe_template)
   tree dyn_init = init ? NULL_TREE : tree_node ();
   unused -= vtable;
 
+  /* Assume for most vars that if we have a definition it's not extern.  */
+  bool not_really_extern = true;
+  if (vtable)
+not_really_extern = u ();
+
   if (get_overrun ())
 return false;
 
@@ -11826,7 +11837,7 @@ trees_in::read_var_def (tree decl, tree maybe_template)
   if (installing)
 {
   if (DECL_EXTERNAL (decl))
-   DECL_NOT_REALLY_EXTERN (decl) = true;
+   DECL_NOT_REALLY_EXTERN (decl) = not_really_extern;
   if (VAR_P (decl))
{
  DECL_INITIALIZED_P (decl) = true;
diff --git a/gcc/testsuite/g++.dg/modules/virt-2_c.C 
b/gcc/testsuite/g++.dg/modules/virt-2_c.C
index 7b3eeebe508..8969cb04911 100644
--- a/gcc/testsuite/g++.dg/modules/virt-2_c.C
+++ b/gcc/testsuite/g++.dg/modules/virt-2_c.C
@@ -9,8 +9,12 @@ int Foo ()
   return !(Visit () == 0);
 }
 
-// We do emit Visitor vtable
-// andl also we do emit rtti here
-// { dg-final { scan-assembler {_ZTVW3foo7Visitor:} } }
-// { dg-final { scan-assembler {_ZTIW3foo7Visitor:} } }
-// { dg-final { scan-assembler {_ZTSW3foo7Visitor:} } }
+// Again, we do not emit Visitor vtable
+// but we do emit rtti here
+
+// but see https://github.com/itanium-cxx-abi/cxx-abi/issues/170:
+// we should only emit RTTI in virt-2_a.C, alongside the vtable
+
+// { dg-final { scan-assembler-not {_ZTVW3foo7Visitor:} } }
+// { dg-final { scan-assembler-not {_ZTIW3foo7Visitor:} { xfail *-*-* } } }
+// { dg-final { scan-assembler-not {_ZTSW3foo7Visitor:} { xfail *-*-* } } }
diff --git a/gcc/testsuite/g++.dg/modules/virt-3_a.C 
b/gcc/testsuite/g++.dg/modules/virt-3_a.C
new file mode 100644
index 000..a7eae7f9d35
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/virt-3_a.C
@@ -0,0 +1,9 @@
+// PR

Re: [PATCH] MAINTAINERS: Add myself to write after approval

2024-03-05 Thread juzhe.zh...@rivai.ai

Hi, han.

I think you can commit this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/646931.html 
RISC-V: Refactor expand_vec_cmp
It's an NFC patch that I approved.


juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-04 14:51
To: gcc-patches
CC: juzhe.zhong; kito.cheng
Subject: [PATCH] MAINTAINERS: Add myself to write after approval
ChangeLog:
 
* MAINTAINERS: Add myself
 
Signed-off-by: demin.han 
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
 
diff --git a/MAINTAINERS b/MAINTAINERS
index b01fab16061..a681518d704 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -448,6 +448,7 @@ Wei Guozhi 
Vineet Gupta 
Naveen H.S 
Mostafa Hagog 
+Demin Han 
Jivan Hakobyan 
Andrew Haley 
Frederik Harwath 
-- 
2.43.2

[PATCH] LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

2024-03-05 Thread Yang Yujie

gcc/ChangeLog:

* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.
---
 gcc/config.gcc   |  3 +++
 gcc/config/loongarch/linux.h |  4 +++-
 gcc/config/loongarch/musl.h  | 23 +++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/loongarch/musl.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a1480b72c46..3293be16699 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2538,6 +2538,9 @@ riscv*-*-freebsd*)
 
 loongarch*-*-linux*)
tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h 
${tm_file}"
+case ${target} in
+ *-linux-musl*) tm_file="${tm_file} loongarch/musl.h"
+   esac
tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h 
loongarch/loongarch-driver.h"
extra_options="${extra_options} linux-android.opt"
tmake_file="${tmake_file} loongarch/t-multilib loongarch/t-linux"
diff --git a/gcc/config/loongarch/linux.h b/gcc/config/loongarch/linux.h
index 17d9f87537b..40d9ba6d405 100644
--- a/gcc/config/loongarch/linux.h
+++ b/gcc/config/loongarch/linux.h
@@ -21,7 +21,9 @@ along with GCC; see the file COPYING3.  If not see
  * This ensures that a compiler configured with --disable-multilib
  * can work in a multilib environment.  */
 
-#if defined(LA_DISABLE_MULTILIB) && defined(LA_DISABLE_MULTIARCH)
+#if !defined(LA_DEFAULT_TARGET_MUSL) \
+  && defined(LA_DISABLE_MULTILIB) \
+  && defined(LA_DISABLE_MULTIARCH)
 
   #if DEFAULT_ABI_BASE == ABI_BASE_LP64D
 #define ABI_LIBDIR "lib64"
diff --git a/gcc/config/loongarch/musl.h b/gcc/config/loongarch/musl.h
new file mode 100644
index 000..fa43bc86606
--- /dev/null
+++ b/gcc/config/loongarch/musl.h
@@ -0,0 +1,23 @@
+/* Definitions for MUSL C library support.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+
+#ifndef LA_DEFAULT_TARGET_MUSL
+#define LA_DEFAULT_TARGET_MUSL
+#endif
-- 
2.43.0

Re: [PATCH] c++/modules: befriending template from current class scope

2024-03-05 Thread Jason Merrill


On 2/26/24 15:52, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


-- >8 --

Here the TEMPLATE_DECL representing the template friend declaration for
B has class scope since B has class scope, but get_merge_kind assumes
all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P TEMPLATE_DECL have namespace
scope and wrongly returns MK_named instead of MK_local_friend.

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind) :
Accomodate class-scope DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL.  Merge IDENTIFIER_ANON_P branches.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-7.h: New test.
* g++.dg/modules/friend-7_a.H: New test.
* g++.dg/modules/friend-7_b.C: New test.
---
  gcc/cp/module.cc  | 19 +--
  gcc/testsuite/g++.dg/modules/friend-7.h   |  5 +
  gcc/testsuite/g++.dg/modules/friend-7_a.H |  3 +++
  gcc/testsuite/g++.dg/modules/friend-7_b.C |  5 +
  4 files changed, 22 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/friend-7.h
  create mode 100644 gcc/testsuite/g++.dg/modules/friend-7_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/friend-7_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 106af7bdb3e..fa91c6ff9cb 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -10491,21 +10491,20 @@ trees_out::get_merge_kind (tree decl, depset *dep)
break;
  }
  
-	if (RECORD_OR_UNION_TYPE_P (ctx))

+   if (TREE_CODE (decl) == TEMPLATE_DECL
+   && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl))
  {
-   if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
- mk = MK_field;
+   mk = MK_local_friend;
break;
  }
  
-	if (TREE_CODE (decl) == TEMPLATE_DECL

-   && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl))
- mk = MK_local_friend;
-   else if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
+   if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
  {
-   if (DECL_IMPLICIT_TYPEDEF_P (decl)
-   && UNSCOPED_ENUM_P (TREE_TYPE (decl))
-   && TYPE_VALUES (TREE_TYPE (decl)))
+   if (RECORD_OR_UNION_TYPE_P (ctx))
+ mk = MK_field;
+   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
+&& UNSCOPED_ENUM_P (TREE_TYPE (decl))
+&& TYPE_VALUES (TREE_TYPE (decl)))
  /* Keyed by first enum value, and underlying type.  */
  mk = MK_enum;
else
diff --git a/gcc/testsuite/g++.dg/modules/friend-7.h 
b/gcc/testsuite/g++.dg/modules/friend-7.h
new file mode 100644
index 000..c0f00394f3b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/friend-7.h
@@ -0,0 +1,5 @@
+template
+struct A {
+  template struct B { };
+  template friend struct B;
+};
diff --git a/gcc/testsuite/g++.dg/modules/friend-7_a.H 
b/gcc/testsuite/g++.dg/modules/friend-7_a.H
new file mode 100644
index 000..e750e4c7d8d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/friend-7_a.H
@@ -0,0 +1,3 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+#include "friend-7.h"
diff --git a/gcc/testsuite/g++.dg/modules/friend-7_b.C 
b/gcc/testsuite/g++.dg/modules/friend-7_b.C
new file mode 100644
index 000..d90b685d89d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/friend-7_b.C
@@ -0,0 +1,5 @@
+// { dg-additional-options "-fmodules-ts" }
+#include "friend-7.h"
+import "friend-7_a.H";
+
+A a;

Re: [PATCH] c++: ICE with noexcept and local specialization [PR114114]

2024-03-05 Thread Jason Merrill


On 3/5/24 15:56, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
Here we ICE because we call register_local_specialization while
local_specializations is null, so

   local_specializations->put ();

crashes on null this.  It's null since maybe_instantiate_noexcept calls
push_to_top_level which creates a new scope.  Normally, I would have
guessed that we need a new local_specialization_stack.  But here we're
dealing with an operand of a noexcept, which is an unevaluated operand,
and those aren't registered in the hash map.  maybe_instantiate_noexcept
wasn't signalling that it's substituting an unevaluated operand though.

PR c++/114114

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept84.C: New test.
---
  gcc/cp/pt.cc|  6 +
  gcc/testsuite/g++.dg/cpp0x/noexcept84.C | 32 +
  2 files changed, 38 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept84.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..11f7d33c766 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26869,10 +26869,16 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  if (orig_fn)
++processing_template_decl;
  
+	  ++cp_unevaluated_operand;

+ ++c_inhibit_evaluation_warnings;
+ ++cp_noexcept_operand;
  /* Do deferred instantiation of the noexcept-specifier.  */
  noex = tsubst_expr (DEFERRED_NOEXCEPT_PATTERN (noex),
  DEFERRED_NOEXCEPT_ARGS (noex),
  tf_warning_or_error, fn);
+ --cp_unevaluated_operand;
+ --c_inhibit_evaluation_warnings;
+ --cp_noexcept_operand;
  
  	  /* Build up the noexcept-specification.  */

  spec = build_noexcept_spec (noex, tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept84.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept84.C
new file mode 100644
index 000..06f33264f77
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept84.C
@@ -0,0 +1,32 @@
+// PR c++/114114
+// { dg-do compile { target c++11 } }
+
+template
+constexpr void
+test ()
+{
+  constexpr bool is_yes = B;
+  struct S {
+constexpr S() noexcept(is_yes) { }
+  };
+  S s;
+}
+
+constexpr bool foo() { return true; }
+
+template
+constexpr void
+test2 ()
+{
+  constexpr T (*pfn)() = 
+  struct S {
+constexpr S() noexcept(pfn()) { }
+  };
+  S s;
+}
+
+int main()
+{
+  test();
+  test2();
+}

base-commit: 8776468d9e57ace5f832c1368243a6dbce9984d5

Re: [PATCH] c++: Fix template deduction for conversion operators with xobj parameters [PR113629]

2024-03-05 Thread Jason Merrill


On 3/5/24 17:47, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Use DEDUCE_CALL for xobj
parameters of conversion functions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc  | 15 +-
  .../g++.dg/cpp23/explicit-obj-conv-op.C   | 49 +++
  2 files changed, 63 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..632437d3424 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23281,6 +23281,10 @@ type_unification_real (tree tparms,
   in TARGS.  */
NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs) = NULL_TREE;
  
+  bool is_xobj_conv_fn

+= (strict == DEDUCE_CONV
+   && DECL_XOBJ_MEMBER_FUNCTION_P (TREE_TYPE (tparms)));
+
   again:
parms = xparms;
args = xargs;
@@ -23312,10 +23316,17 @@ type_unification_real (tree tparms,
   parameter pack is a non-deduced context.  */
continue;
  
+  /* For explicit object parameters, unification should behave like

+normal function calls, even for conversion functions.  This
+corresponds to the second (that is, last) argument.  */
+  unification_kind_t kind = strict;
+  if (is_xobj_conv_fn && ia > 0)


Is it necessary to check the xobj flag?  Or can this just be

 if (strict == DEDUCE_CONV && ia > 0)

?

Jason

[PATCH] c++: Fix template deduction for conversion operators with xobj parameters [PR113629]

2024-03-05 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Use DEDUCE_CALL for xobj
parameters of conversion functions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/pt.cc  | 15 +-
 .../g++.dg/cpp23/explicit-obj-conv-op.C   | 49 +++
 2 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..632437d3424 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23281,6 +23281,10 @@ type_unification_real (tree tparms,
  in TARGS.  */
   NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs) = NULL_TREE;
 
+  bool is_xobj_conv_fn
+= (strict == DEDUCE_CONV
+   && DECL_XOBJ_MEMBER_FUNCTION_P (TREE_TYPE (tparms)));
+
  again:
   parms = xparms;
   args = xargs;
@@ -23312,10 +23316,17 @@ type_unification_real (tree tparms,
   parameter pack is a non-deduced context.  */
continue;
 
+  /* For explicit object parameters, unification should behave like
+normal function calls, even for conversion functions.  This
+corresponds to the second (that is, last) argument.  */
+  unification_kind_t kind = strict;
+  if (is_xobj_conv_fn && ia > 0)
+   kind = DEDUCE_CALL;
+
   arg = args[ia];
   ++ia;
 
-  if (unify_one_argument (tparms, full_targs, parm, arg, subr, strict,
+  if (unify_one_argument (tparms, full_targs, parm, arg, subr, kind,
  explain_p))
return 1;
 }
@@ -23324,6 +23335,8 @@ type_unification_real (tree tparms,
   && parms != void_list_node
   && TREE_CODE (TREE_VALUE (parms)) == TYPE_PACK_EXPANSION)
 {
+  gcc_assert (strict != DEDUCE_CONV);
+
   /* Unify the remaining arguments with the pack expansion type.  */
   tree argvec;
   tree parmvec = make_tree_vec (1);
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C 
b/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
new file mode 100644
index 000..a6ae4ea1dda
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-conv-op.C
@@ -0,0 +1,49 @@
+// PR c++/113629
+// { dg-do compile { target c++23 } }
+
+template  constexpr bool is_lvalue = false;
+template  constexpr bool is_lvalue = true;
+
+struct A {
+  constexpr operator bool(this auto&& self) {
+return is_lvalue;
+  }
+};
+
+constexpr A a;
+static_assert(static_cast(a));
+static_assert((bool)a);
+static_assert(!static_cast(A{}));
+static_assert(!(bool)A{});
+
+struct B : A {};
+
+constexpr B b;
+static_assert(static_cast(b));
+static_assert((bool)b);
+static_assert(!static_cast(B{}));
+static_assert(!(bool)B{});
+
+struct C {
+  template 
+  explicit constexpr operator R(this T&&) {
+return is_lvalue;
+  }
+};
+
+constexpr C c;
+static_assert(static_cast(c));
+static_assert((bool)c);
+static_assert(!static_cast(C{}));
+static_assert(!(bool)C{});
+
+struct D {
+  explicit constexpr operator bool(this const D&) { return true; }
+  explicit constexpr operator bool(this const D&&) { return false; }
+};
+
+constexpr D d;
+static_assert(static_cast(d));
+static_assert((bool)d);
+static_assert(!static_cast(D{}));
+static_assert(!(bool)D{});
-- 
2.43.2

Re: [PATCH v7] C, ObjC: Add -Wunterminated-string-initialization

2024-03-05 Thread Sandra Loosemore


On 3/5/24 13:33, Alejandro Colomar wrote:

Warn about the following:

 char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

 char  *log_levels[]   = { "info", "warning", "err" };
vs.
 char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

 char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Since Wc++-compat now includes this warning, the test has to be modified
to expect the text of the new warning too, in .


The documentation parts of the patch are OK.

-Sandra

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-05 Thread Harald Anlauf


Hi Jerry,

on further thought, do we sanitize 'child_iomsg'?
We pass it to snprintf as format.

Wouldn't a strncpy be sufficient?

Harald


On 3/5/24 22:37, Harald Anlauf wrote:

Hi Jerry,

I think there is the risk of buffer overrun in the following places:

+ char message[IOMSG_LEN];
+ child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg)
+ 1;
   free_line (dtp);
   snprintf (message, child_iomsg_len, child_iomsg);
   generate_error (>common, dtp->u.p.child_saved_iostat,

plus several more.  Wouldn't it be better to increase the size of
message by one?

Thanks,
Harald


On 3/5/24 04:15, Jerry D wrote:

On 3/1/24 11:24 AM, rep.dot@gmail.com wrote:

Hi Jerry and Steve,

On 29 February 2024 19:28:19 CET, Jerry D  wrote:

On 2/29/24 10:13 AM, Steve Kargl wrote:

On Thu, Feb 29, 2024 at 09:36:43AM -0800, Jerry D wrote:

On 2/29/24 1:47 AM, Bernhard Reutner-Fischer wrote:


And, just for my own education, the length limitation of iomsg to
255
chars is not backed by the standard AFAICS, right? It's just our
STRERR_MAXSZ?


Yes, its what we have had for a long lone time. Once you throw an
error
things get very processor dependent. I found MSGLEN set to 100 and
IOMSG_len
to 256. Nothing magic about it.



There is no restriction on the length for the iomsg-variable
that receives the generated error message.  In fact, if the
iomsg-variable has a deferred-length type parameter, then
(re)-allocation to the exact length is expected.

    F2023

    12.11.6 IOMSG= specifier

    If an error, end-of-file, or end-of-record condition occurs during
    execution of an input/output statement, iomsg-variable is assigned
    an explanatory message, as if by intrinsic assignment. If no such
    condition occurs, the definition status and value of
iomsg-variable
    are unchanged.
   character(len=23) emsg
read(fd,*,iomsg=emsg)

Here, the generated iomsg is either truncated to a length of 23
or padded with blanks to a length of 23.

character(len=:), allocatable :: emsg
read(fd,*,iomsg=emsg)

Here, emsg should have the length of whatever error message was
generated.
   HTH



Well, currently, if someone uses a larger string than 256 we are
going to chop it off.

Do we want to process this differently now?


Yes. There is some odd hunk about discrepancy of passed len and
actual len afterwards in 22-007-r1, IIRC. Didn't look closely though.


--- snip ---

Attached is the revised patch using the already available
string_len_trim function.

This hunk is only executed if a user has not passed an iostat or iomsg
variable in the parent I/O statement and an error is triggered which
terminates execution of the program. In this case, the iomsg string is
provided in the usual error message in a "processor defined" way.

(F2023):

12.6.4.8.3 Executing defined input/output data transfers
---
11 If the iostat argument of the defined input/output procedure has a
nonzero value when that procedure returns, and the processor therefore
terminates execution of the program as described in 12.11, the
processor shall make the value of the iomsg argument available in a
processor-dependent manner.
---

OK for trunk?

Regards,

Jerry

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-03-05 Thread H.J. Lu

On Thu, Feb 29, 2024 at 7:11 AM H.J. Lu  wrote:
>
> On Thu, Feb 29, 2024 at 7:06 AM Jan Hubicka  wrote:
> >
> > > > I am worried about scenario where ifunc selector calls function foo
> > > > defined locally and foo is also used from other places possibly in hot
> > > > loops.
> > > > >
> > > > > > So it is not really reliable fix (though I guess it will work a lot 
> > > > > > of
> > > > > > common code).  I wonder what would be alternatives.  In GCC 
> > > > > > generated
> > > > > > profling code we use TLS only for undirect call profiling (so there 
> > > > > > is
> > > > > > no need to turn off rest of profiling).  I wonder if there is any 
> > > > > > chance
> > > > > > to not make it seffault when it is done before TLS is set up?
> > > > >
> > > > > IFUNC selector should make minimum external calls, none is preferred.
> > > >
> > > > Edge porfiling only inserts (atomic) 64bit increments of counters.
> > > > If target supports these operations inline, no external calls will be
> > > > done.
> > > >
> > > > Indirect call profiling inserts the problematic TLS variable (to track
> > > > caller-callee pairs). Value profiling also inserts various additional
> > > > external calls to counters.
> > > >
> > > > I am perfectly fine with disabling instrumentation for ifunc selectors
> > > > and functions only reachable from them, but I am worried about calles
> > > > used also from non-ifunc path.
> > >
> > > Programmers need to understand not to do it.
> >
> > It would help to have this documented. Should we warn when ifunc
> > resolver calls external function, comdat of function reachable from
> > non-ifunc code?
>
> That will be nice.
>
> > >
> > > > For example selector implemented in C++ may do some string handling to
> > > > match CPU name and propagation will disable profiling for std::string
> > >
> > > On x86, they should use CPUID, not string functions.
> > >
> > > > member functions (which may not be effective if comdat section is
> > > > prevailed from other translation unit).
> > >
> > > String functions may lead to external function calls which is dangerous.
> > >
> > > > > Any external calls may lead to issues at run-time.  It is a very bad 
> > > > > idea
> > > > > to profile IFUNC selector via external function call.
> > > >
> > > > Looking at https://sourceware.org/glibc/wiki/GNU_IFUNC
> > > > there are other limitations on ifunc except for profiling, such as
> > > > -fstack-protector-all.  So perhaps your propagation can be used to
> > > > disable those features as well.
> > >
> > > So, it may not be tree-profile specific.  Where should these 2 bits
> > > be added?
> >
> > If we want to disable other transforms too, then I think having a bit in
> > cgraph_node for reachability from ifunc resolver makes sense.
> > I would still do the cycle detection using on-side hash_map to avoid
> > polution of the global datastructure.
> >
>
> I will see what I can do.
>
>

The v2 patch is at

https://patchwork.sourceware.org/project/gcc/list/?series=31627

-- 
H.J.

[PATCH v2] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-03-05 Thread H.J. Lu

We can't instrument an IFUNC resolver nor its callees as it may require
TLS which hasn't been set up yet when the dynamic linker is resolving
IFUNC symbols.

Add an IFUNC resolver caller marker to cgraph_node and set it if the
function is called by an IFUNC resolver.  Update tree_profiling to skip
functions called by IFUNC resolver.

Tested with profiledbootstrap on Fedora 39/x86-64.

gcc/ChangeLog:

PR tree-optimization/114115
* cgraph.h (symtab_node): Add check_ifunc_callee_symtab_nodes.
(cgraph_node): Add called_by_ifunc_resolver.
* cgraphunit.cc (symbol_table::compile): Call
symtab_node::check_ifunc_callee_symtab_nodes.
* symtab.cc (check_ifunc_resolver): New.
(ifunc_ref_map): Likewise.
(is_caller_ifunc_resolver): Likewise.
(symtab_node::check_ifunc_callee_symtab_nodes): Likewise.
* tree-profile.cc (tree_profiling): Do not instrument an IFUNC
resolver nor its callees.

gcc/testsuite/ChangeLog:

PR tree-optimization/114115
* gcc.dg/pr114115.c: New test.
---
 gcc/cgraph.h|  6 +++
 gcc/cgraphunit.cc   |  2 +
 gcc/symtab.cc   | 89 +
 gcc/testsuite/gcc.dg/pr114115.c | 24 +
 gcc/tree-profile.cc |  4 ++
 5 files changed, 125 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr114115.c

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 47f35e8078d..a8c3224802c 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -479,6 +479,9 @@ public:
  Return NULL if there's no such node.  */
   static symtab_node *get_for_asmname (const_tree asmname);
 
+  /* Check symbol table for callees of IFUNC resolvers.  */
+  static void check_ifunc_callee_symtab_nodes (void);
+
   /* Verify symbol table for internal consistency.  */
   static DEBUG_FUNCTION void verify_symtab_nodes (void);
 
@@ -896,6 +899,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public 
symtab_node
   redefined_extern_inline (false), tm_may_enter_irr (false),
   ipcp_clone (false), declare_variant_alt (false),
   calls_declare_variant_alt (false), gc_candidate (false),
+  called_by_ifunc_resolver (false),
   m_uid (uid), m_summary_id (-1)
   {}
 
@@ -1495,6 +1499,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
  is set for local SIMD clones when they are created and cleared if the
  vectorizer uses them.  */
   unsigned gc_candidate : 1;
+  /* Set if the function is called by an IFUNC resolver.  */
+  unsigned called_by_ifunc_resolver : 1;
 
 private:
   /* Unique id of the node.  */
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index d200166f7e9..2bd0289ffba 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -2317,6 +2317,8 @@ symbol_table::compile (void)
 
   symtab_node::checking_verify_symtab_nodes ();
 
+  symtab_node::check_ifunc_callee_symtab_nodes ();
+
   timevar_push (TV_CGRAPHOPT);
   if (pre_ipa_mem_report)
 dump_memory_report ("Memory consumption before IPA");
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index 4c7e3c135ca..3256133891d 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -1369,6 +1369,95 @@ symtab_node::verify (void)
   timevar_pop (TV_CGRAPH_VERIFY);
 }
 
+/* Return true and set *DATA to true if NODE is an ifunc resolver.  */
+
+static bool
+check_ifunc_resolver (cgraph_node *node, void *data)
+{
+  if (node->ifunc_resolver)
+{
+  bool *is_ifunc_resolver = (bool *) data;
+  *is_ifunc_resolver = true;
+  return true;
+}
+  return false;
+}
+
+static auto_bitmap ifunc_ref_map;
+
+/* Return true if any caller of NODE is an ifunc resolver.  */
+
+static bool
+is_caller_ifunc_resolver (cgraph_node *node)
+{
+  bool is_ifunc_resolver = false;
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+{
+  /* Return true if caller is known to be an IFUNC resolver.  */
+  if (e->caller->called_by_ifunc_resolver)
+   return true;
+
+  /* Check for recursive call.  */
+  if (e->caller == node)
+   continue;
+
+  /* Skip if it has been visited.  */
+  unsigned int uid = e->caller->get_uid ();
+  if (bitmap_bit_p (ifunc_ref_map, uid))
+   continue;
+  bitmap_set_bit (ifunc_ref_map, uid);
+
+  if (is_caller_ifunc_resolver (e->caller))
+   {
+ /* Return true if caller is an IFUNC resolver.  */
+ e->caller->called_by_ifunc_resolver = true;
+ return true;
+   }
+
+  /* Check if caller's alias is an IFUNC resolver.  */
+  e->caller->call_for_symbol_and_aliases (check_ifunc_resolver,
+ _ifunc_resolver,
+ true);
+  if (is_ifunc_resolver)
+   {
+ /* Return true if caller's alias is an IFUNC resolver.  */
+ e->caller->called_by_ifunc_resolver = true;
+ return true;
+   }
+}
+
+  return false;
+}
+
+/* Check symbol table for

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-05 Thread Harald Anlauf


Hi Jerry,

I think there is the risk of buffer overrun in the following places:

+ char message[IOMSG_LEN];
+ child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg) 
+ 1;

  free_line (dtp);
  snprintf (message, child_iomsg_len, child_iomsg);
  generate_error (>common, dtp->u.p.child_saved_iostat,

plus several more.  Wouldn't it be better to increase the size of 
message by one?


Thanks,
Harald


On 3/5/24 04:15, Jerry D wrote:

On 3/1/24 11:24 AM, rep.dot@gmail.com wrote:

Hi Jerry and Steve,

On 29 February 2024 19:28:19 CET, Jerry D  wrote:

On 2/29/24 10:13 AM, Steve Kargl wrote:

On Thu, Feb 29, 2024 at 09:36:43AM -0800, Jerry D wrote:

On 2/29/24 1:47 AM, Bernhard Reutner-Fischer wrote:


And, just for my own education, the length limitation of iomsg to 255
chars is not backed by the standard AFAICS, right? It's just our
STRERR_MAXSZ?


Yes, its what we have had for a long lone time. Once you throw an 
error
things get very processor dependent. I found MSGLEN set to 100 and 
IOMSG_len

to 256. Nothing magic about it.



There is no restriction on the length for the iomsg-variable
that receives the generated error message.  In fact, if the
iomsg-variable has a deferred-length type parameter, then
(re)-allocation to the exact length is expected.

    F2023

    12.11.6 IOMSG= specifier

    If an error, end-of-file, or end-of-record condition occurs during
    execution of an input/output statement, iomsg-variable is assigned
    an explanatory message, as if by intrinsic assignment. If no such
    condition occurs, the definition status and value of iomsg-variable
    are unchanged.
   character(len=23) emsg
read(fd,*,iomsg=emsg)

Here, the generated iomsg is either truncated to a length of 23
or padded with blanks to a length of 23.

character(len=:), allocatable :: emsg
read(fd,*,iomsg=emsg)

Here, emsg should have the length of whatever error message was
generated.
   HTH



Well, currently, if someone uses a larger string than 256 we are 
going to chop it off.


Do we want to process this differently now?


Yes. There is some odd hunk about discrepancy of passed len and actual 
len afterwards in 22-007-r1, IIRC. Didn't look closely though.



--- snip ---

Attached is the revised patch using the already available 
string_len_trim function.


This hunk is only executed if a user has not passed an iostat or iomsg 
variable in the parent I/O statement and an error is triggered which 
terminates execution of the program. In this case, the iomsg string is 
provided in the usual error message in a "processor defined" way.


(F2023):

12.6.4.8.3 Executing defined input/output data transfers
---
11 If the iostat argument of the defined input/output procedure has a 
nonzero value when that procedure returns, and the processor therefore 
terminates execution of the program as described in 12.11, the processor 
shall make the value of the iomsg argument available in a 
processor-dependent manner.

---

OK for trunk?

Regards,

Jerry

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-05 Thread rep . dot . nop

On 5 March 2024 04:15:12 CET, Jerry D  wrote:

>
>Attached is the revised patch using the already available string_len_trim 
>function.
>
>This hunk is only executed if a user has not passed an iostat or iomsg 
>variable in the parent I/O statement and an error is triggered which 
>terminates execution of the program. In this case, the iomsg string is 
>provided in the usual error message in a "processor defined" way.
>
>(F2023):
>
>12.6.4.8.3 Executing defined input/output data transfers
>---
>11 If the iostat argument of the defined input/output procedure has a nonzero 
>value when that procedure returns, and the processor therefore terminates 
>execution of the program as described in 12.11, the processor shall make the 
>value of the iomsg argument available in a processor-dependent manner.
>---
>
>OK for trunk?

LGTM.
thanks!

[PATCH] Fortran: error recovery while simplifying expressions [PR103707,PR106987]

2024-03-05 Thread Harald Anlauf

Dear all,

error recovery on arithmetic errors during simplification has bugged
me for a long time, especially since the occurence of ICEs depended
on whether -frange-check is specified or not, whether array ctors
were involved, etc.

I've now come up with the attached patch that classifies the arithmetic
result codes into "hard" and "soft" errors.

A "soft" error means that it is an overflow or other exception (e.g. NaN)
that is ignored with -fno-range-check.  After the patch, a soft error
will not stop simplification (a hard one will), and error status will be
passed along.

I took this opportunity to change the emitted error for division by zero
for real and complex division dependent on whether the numerator is
regular or not.  This makes e.g. (0.)/0 a NaN and now says so, in
accordance with some other brands.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Other comments?

Thanks,
Harald

From d9b87bea6af77fbc794e1f21cfecb0468c68cb72 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 5 Mar 2024 21:54:26 +0100
Subject: [PATCH] Fortran: error recovery while simplifying expressions
 [PR103707,PR106987]

When an exception is encountered during simplification of arithmetic
expressions, the result may depend on whether range-checking is active
(-frange-check) or not.  However, the code path in the front-end should
stay the same for "soft" errors for which the exception is triggered by the
check, while "hard" errors should always terminate the simplification, so
that error recovery is independent of the flag.  Separation of arithmetic
error codes into "hard" and "soft" errors shall be done consistently via
is_hard_arith_error().

	PR fortran/103707
	PR fortran/106987

gcc/fortran/ChangeLog:

	* arith.cc (is_hard_arith_error): New helper function to determine
	whether an arithmetic error is "hard" or not.
	(check_result): Use it.
	(gfc_arith_divide): Set "Division by zero" only for regular
	numerators of real and complex divisions.
	(reduce_unary): Use is_hard_arith_error to determine whether a hard
	or (recoverable) soft error was encountered.  Terminate immediately
	on hard error, otherwise remember code of first soft error.
	(reduce_binary_ac): Likewise.
	(reduce_binary_ca): Likewise.
	(reduce_binary_aa): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr99350.f90:
	* gfortran.dg/arithmetic_overflow_3.f90: New test.
---
 gcc/fortran/arith.cc  | 134 --
 .../gfortran.dg/arithmetic_overflow_3.f90 |  48 +++
 gcc/testsuite/gfortran.dg/pr99350.f90 |   2 +-
 3 files changed, 143 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/arithmetic_overflow_3.f90

diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc
index d17d1aaa1d9..b373c25e5e1 100644
--- a/gcc/fortran/arith.cc
+++ b/gcc/fortran/arith.cc
@@ -130,6 +130,30 @@ gfc_arith_error (arith code)
 }


+/* Check if a certain arithmetic error code is severe enough to prevent
+   further simplification, as opposed to errors thrown by the range check
+   (e.g. overflow) or arithmetic exceptions that are tolerated with
+   -fno-range-check.  */
+
+static bool
+is_hard_arith_error (arith code)
+{
+  switch (code)
+{
+case ARITH_OK:
+case ARITH_OVERFLOW:
+case ARITH_UNDERFLOW:
+case ARITH_NAN:
+case ARITH_DIV0:
+case ARITH_ASYMMETRIC:
+  return false;
+
+default:
+  return true;
+}
+}
+
+
 /* Get things ready to do math.  */

 void
@@ -579,10 +603,10 @@ check_result (arith rc, gfc_expr *x, gfc_expr *r, gfc_expr **rp)
   val = ARITH_OK;
 }

-  if (val == ARITH_OK || val == ARITH_OVERFLOW)
-*rp = r;
-  else
+  if (is_hard_arith_error (val))
 gfc_free_expr (r);
+  else
+*rp = r;

   return val;
 }
@@ -792,23 +816,26 @@ gfc_arith_divide (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   break;

 case BT_REAL:
-  if (mpfr_sgn (op2->value.real) == 0 && flag_range_check == 1)
-	{
-	  rc = ARITH_DIV0;
-	  break;
-	}
+  /* Set "Division by zero" only for regular numerator.  */
+  if (flag_range_check == 1
+	  && mpfr_zero_p (op2->value.real)
+	  && mpfr_regular_p (op1->value.real))
+	rc = ARITH_DIV0;

   mpfr_div (result->value.real, op1->value.real, op2->value.real,
 	   GFC_RND_MODE);
   break;

 case BT_COMPLEX:
-  if (mpc_cmp_si_si (op2->value.complex, 0, 0) == 0
-	  && flag_range_check == 1)
-	{
-	  rc = ARITH_DIV0;
-	  break;
-	}
+  /* Set "Division by zero" only for regular numerator.  */
+  if (flag_range_check == 1
+	  && mpfr_zero_p (mpc_realref (op2->value.complex))
+	  && mpfr_zero_p (mpc_imagref (op2->value.complex))
+	  && ((mpfr_regular_p (mpc_realref (op1->value.complex))
+	   && mpfr_number_p (mpc_imagref (op1->value.complex)))
+	  || (mpfr_regular_p (mpc_imagref (op1->value.complex))
+		  && mpfr_number_p (mpc_realref (op1->value.complex)
+	rc = ARITH_DIV0;

   gfc_set_model (mpc_realref (op1->value.complex));
   if

[PATCH] c++: ICE with noexcept and local specialization [PR114114]

2024-03-05 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we ICE because we call register_local_specialization while
local_specializations is null, so

  local_specializations->put ();

crashes on null this.  It's null since maybe_instantiate_noexcept calls
push_to_top_level which creates a new scope.  Normally, I would have
guessed that we need a new local_specialization_stack.  But here we're
dealing with an operand of a noexcept, which is an unevaluated operand,
and those aren't registered in the hash map.  maybe_instantiate_noexcept
wasn't signalling that it's substituting an unevaluated operand though.

PR c++/114114

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept84.C: New test.
---
 gcc/cp/pt.cc|  6 +
 gcc/testsuite/g++.dg/cpp0x/noexcept84.C | 32 +
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept84.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..11f7d33c766 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26869,10 +26869,16 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  if (orig_fn)
++processing_template_decl;
 
+ ++cp_unevaluated_operand;
+ ++c_inhibit_evaluation_warnings;
+ ++cp_noexcept_operand;
  /* Do deferred instantiation of the noexcept-specifier.  */
  noex = tsubst_expr (DEFERRED_NOEXCEPT_PATTERN (noex),
  DEFERRED_NOEXCEPT_ARGS (noex),
  tf_warning_or_error, fn);
+ --cp_unevaluated_operand;
+ --c_inhibit_evaluation_warnings;
+ --cp_noexcept_operand;
 
  /* Build up the noexcept-specification.  */
  spec = build_noexcept_spec (noex, tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept84.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept84.C
new file mode 100644
index 000..06f33264f77
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept84.C
@@ -0,0 +1,32 @@
+// PR c++/114114
+// { dg-do compile { target c++11 } }
+
+template
+constexpr void
+test ()
+{
+  constexpr bool is_yes = B;
+  struct S {
+constexpr S() noexcept(is_yes) { }
+  };
+  S s;
+}
+
+constexpr bool foo() { return true; }
+
+template
+constexpr void
+test2 ()
+{
+  constexpr T (*pfn)() = 
+  struct S {
+constexpr S() noexcept(pfn()) { }
+  };
+  S s;
+}
+
+int main()
+{
+  test();
+  test2();
+}

base-commit: 8776468d9e57ace5f832c1368243a6dbce9984d5
-- 
2.44.0

[PATCH v7] C, ObjC: Add -Wunterminated-string-initialization

2024-03-05 Thread Alejandro Colomar

Warn about the following:

char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

char  *log_levels[]   = { "info", "warning", "err" };
vs.
char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Since Wc++-compat now includes this warning, the test has to be modified
to expect the text of the new warning too, in .

Link: 
Link: 
Link: 

Acked-by: Doug McIlroy 
Cc: "G. Branden Robinson" 
Cc: Ralph Corderoy 
Cc: Dave Kemper 
Cc: Larry McVoy 
Cc: Andrew Pinski 
Cc: Jonathan Wakely 
Cc: Andrew Clayton 
Cc: Martin Uecker 
Cc: David Malcolm 
Cc: Mike Stump 
Cc: Joseph Myers 
Cc: Sandra Loosemore 
Signed-off-by: Alejandro Colomar 
---
Range-diff against v6:
1:  e8fd975bde7 ! 1:  c0f3ffcca7a C, ObjC: Add 
-Wunterminated-string-initialization
@@ gcc/doc/invoke.texi: arithmetic that may yield out of bounds values. 
This warnin
  
 +@opindex Wunterminated-string-initialization
 +@opindex Wno-unterminated-string-initialization
-+@item -Wunterminated-string-initialization
++@item -Wunterminated-string-initialization @r{(C and Objective-C only)}
 +Warn about character arrays
 +initialized as unterminated character sequences
 +with a string literal.
@@ gcc/doc/invoke.texi: arithmetic that may yield out of bounds values. 
This warnin
 +char arr[3] = "foo";
 +@end smallexample
 +
-+@option{-Wunterminated-string-initialization} is enabled by 
@option{-Wextra}.
++This warning is enabled by @option{-Wextra} and @option{-Wc++-compat}.
++In C++, such initializations are an error.
 +
  @opindex Warray-compare
  @opindex Wno-array-compare

 gcc/c-family/c.opt|  4 
 gcc/c/c-typeck.cc |  6 +++---
 gcc/doc/invoke.texi   | 20 ++-
 gcc/testsuite/gcc.dg/Wcxx-compat-14.c |  2 +-
 .../Wunterminated-string-initialization.c |  6 ++
 5 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44b9c862c14..3837021747b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1407,6 +1407,10 @@ Wunsuffixed-float-constants
 C ObjC Var(warn_unsuffixed_float_constants) Warning
 Warn about unsuffixed float constants.
 
+Wunterminated-string-initialization
+C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C 
ObjC,Wextra || Wc++-compat)
+Warn about character arrays initialized as unterminated character sequences 
with a string literal.
+
 Wunused
 C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wall)
 ; documented in common.opt
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e55e887da14..7df9de819ed 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -8399,11 +8399,11 @@ digest_init (location_t init_loc, tree type, tree init, 
tree origtype,
pedwarn_init (init_loc, 0,
  ("initializer-string for array of %qT "
   "is too long"), typ1);
- else if (warn_cxx_compat
+ else if (warn_unterminated_string_initialization
   && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
-   warning_at (init_loc, OPT_Wc___compat,
+   warning_at (init_loc, OPT_Wunterminated_string_initialization,
("initializer-string for array of %qT "
-"is too long for C++"), typ1);
+"is too long"), typ1);
  if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
{
  unsigned

Re: [PATCH v6] C, ObjC: Add -Wunterminated-string-initialization

2024-03-05 Thread Alejandro Colomar

On Tue, Mar 05, 2024 at 09:20:42PM +0100, Alejandro Colomar wrote:
> Hi!
> 
> v6:
> -  Small wording fix in c.opt
> -  Document the option in invoke.texi
> 
> I tried again, but didn't find much alphabetic order  in there, so put
> it where Mike suggested, after -Warray-bounds=n.
> 
> Have a lovely night!
> Alex
> 
[...]
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 146b40414b0..f81df4de934 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -410,7 +410,9 @@ Objective-C and Objective-C++ Dialects}.
>  -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs
>  -Wtrivial-auto-var-init -Wtsan -Wtype-limits  -Wundef
>  -Wuninitialized  -Wunknown-pragmas
> --Wunsuffixed-float-constants  -Wunused
> +-Wunsuffixed-float-constants
> +-Wunterminated-string-initialization
> +-Wunused
>  -Wunused-but-set-parameter  -Wunused-but-set-variable
>  -Wunused-const-variable  -Wunused-const-variable=@var{n}
>  -Wunused-function  -Wunused-label  -Wunused-local-typedefs
> @@ -6264,6 +6266,7 @@ name is still supported, but the newer name is more 
> descriptive.)
>  -Wredundant-move @r{(only for C++)}
>  -Wtype-limits
>  -Wuninitialized
> +-Wunterminated-string-initialization
>  -Wshift-negative-value @r{(in C++11 to C++17 and in C99 and newer)}
>  -Wunused-parameter @r{(only with} @option{-Wunused} @r{or} 
> @option{-Wall}@r{)}
>  -Wunused-but-set-parameter @r{(only with} @option{-Wunused} @r{or} 
> @option{-Wall}@r{)}}
> @@ -8281,6 +8284,20 @@ arithmetic that may yield out of bounds values. This 
> warning level may
>  give a larger number of false positives and is deactivated by default.
>  @end table
>  
> +@opindex Wunterminated-string-initialization
> +@opindex Wno-unterminated-string-initialization
> +@item -Wunterminated-string-initialization
> +Warn about character arrays
> +initialized as unterminated character sequences
> +with a string literal.
> +For example:
> +
> +@smallexample
> +char arr[3] = "foo";
> +@end smallexample
> +
> +@option{-Wunterminated-string-initialization} is enabled by @option{-Wextra}.

Oops, I should also mention -Wc++-compat here.


-- 

Looking for a remote C programming job at the moment.


signature.asc
Description: PGP signature

Re: CI for "Option handling: add documentation URLs"

2024-03-05 Thread Mark Wielaard

On Tue, Mar 05, 2024 at 08:34:31AM -0500, David Malcolm wrote:
> > I committed that patch, but was not fast enough actually enabling the
> > buildbot and missed another fixlet needed first.
> > 
> > OK, to push the attached regeneration patch?
> 
> Yes

Thanks, pushed. And now also pushed the builder patch (attached) to
enable it in the CI autoregen checker. It already ran without finding
any issues.

https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen

[PATCH v6] C, ObjC: Add -Wunterminated-string-initialization

2024-03-05 Thread Alejandro Colomar

Warn about the following:

char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

char  *log_levels[]   = { "info", "warning", "err" };
vs.
char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Since Wc++-compat now includes this warning, the test has to be modified
to expect the text of the new warning too, in .

Link: 
Link: 
Link: 

Acked-by: Doug McIlroy 
Cc: "G. Branden Robinson" 
Cc: Ralph Corderoy 
Cc: Dave Kemper 
Cc: Larry McVoy 
Cc: Andrew Pinski 
Cc: Jonathan Wakely 
Cc: Andrew Clayton 
Cc: Martin Uecker 
Cc: David Malcolm 
Cc: Mike Stump 
Cc: Joseph Myers 
Cc: Sandra Loosemore 
Signed-off-by: Alejandro Colomar 
---

Hi!

v6:
-  Small wording fix in c.opt
-  Document the option in invoke.texi

I tried again, but didn't find much alphabetic order  in there, so put
it where Mike suggested, after -Warray-bounds=n.

Have a lovely night!
Alex


Range-diff against v5:
1:  d98d1fec176 ! 1:  e8fd975bde7 C, ObjC: Add 
-Wunterminated-string-initialization
@@ gcc/c-family/c.opt: Wunsuffixed-float-constants
  
 +Wunterminated-string-initialization
 +C ObjC Var(warn_unterminated_string_initialization) Warning 
LangEnabledBy(C ObjC,Wextra || Wc++-compat)
-+Warn about character arrays initialized as unterminated character 
sequences by a string literal.
++Warn about character arrays initialized as unterminated character 
sequences with a string literal.
 +
  Wunused
  C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wall)
@@ gcc/c/c-typeck.cc: digest_init (location_t init_loc, tree type, tree 
init, tree
{
  unsigned HOST_WIDE_INT size
 
+ ## gcc/doc/invoke.texi ##
+@@ gcc/doc/invoke.texi: Objective-C and Objective-C++ Dialects}.
+ -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs
+ -Wtrivial-auto-var-init -Wtsan -Wtype-limits  -Wundef
+ -Wuninitialized  -Wunknown-pragmas
+--Wunsuffixed-float-constants  -Wunused
++-Wunsuffixed-float-constants
++-Wunterminated-string-initialization
++-Wunused
+ -Wunused-but-set-parameter  -Wunused-but-set-variable
+ -Wunused-const-variable  -Wunused-const-variable=@var{n}
+ -Wunused-function  -Wunused-label  -Wunused-local-typedefs
+@@ gcc/doc/invoke.texi: name is still supported, but the newer name is 
more descriptive.)
+ -Wredundant-move @r{(only for C++)}
+ -Wtype-limits
+ -Wuninitialized
++-Wunterminated-string-initialization
+ -Wshift-negative-value @r{(in C++11 to C++17 and in C99 and newer)}
+ -Wunused-parameter @r{(only with} @option{-Wunused} @r{or} 
@option{-Wall}@r{)}
+ -Wunused-but-set-parameter @r{(only with} @option{-Wunused} @r{or} 
@option{-Wall}@r{)}}
+@@ gcc/doc/invoke.texi: arithmetic that may yield out of bounds values. 
This warning level may
+ give a larger number of false positives and is deactivated by default.
+ @end table
+ 
++@opindex Wunterminated-string-initialization
++@opindex Wno-unterminated-string-initialization
++@item -Wunterminated-string-initialization
++Warn about character arrays
++initialized as unterminated character sequences
++with a string literal.
++For example:
++
++@smallexample
++char arr[3] = "foo";
++@end smallexample
++
++@option{-Wunterminated-string-initialization} is enabled by 
@option{-Wextra}.
++
+ @opindex Warray-compare
+ @opindex Wno-array-compare
+ @item -Warray-compare
+
  ## gcc/testsuite/gcc.dg/Wcxx-compat-14.c ##
 @@
  /* { dg-options "-Wc++-compat" } */

 gcc/c-family/c.opt|  4 
 gcc/c/c-typeck.cc

Re: [PATCH] asan: Handle poly-int sizes in ASAN_MARK [PR97696]

2024-03-05 Thread Jakub Jelinek

On Tue, Mar 05, 2024 at 07:49:21PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Tue, Mar 05, 2024 at 06:30:40PM +, Richard Sandiford wrote:
> >> (1) Keep the test where it is, taking advantage of the current SVE
> >> handling in aarch64-sve.exp, and add:
> >> 
> >>   /* { dg-skip-if "" { no_fsanitize_address } } */
> >
> > I'd go with this.  asan/ directory for test would be needed for dg-do run
> > tests obviously, because then we need the test driver to add appropriate
> > options to find the library etc.
> 
> Thanks, now pushed with that change.
> 
> What do you think about backports, after a baking-in period?

Looks backportable to me.

Jakub

Re: [PATCHv2] fwprop: Avoid volatile defines to be propagated

2024-03-05 Thread Richard Sandiford

HAO CHEN GUI  writes:
> Hi,
>   This patch tries to fix a potential problem which is raised by the patch
> for PR111267. The volatile asm operand tries to be propagated to a single
> set insn with the patch for PR111267. The volatile asm operand might be
> executed for multiple times if the define insn isn't eliminated after
> propagation. Now set_src_cost comparison might reject such propagation.
> But it has the chance to be taken after replacing set_src_cost with insn
> cost. Actually I found the problem in testing my patch which replacing
> set_src_cost with insn_cost in fwprop pass.
>
>   Compared to the last version, the check volatile_insn_p is replaced with
> volatile_refs_p in order to check volatile memory reference also.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646482.html
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?

OK, thanks.  I'm not sure this fixes a known regression, but IMO the
barrier should be lower for things that fix loss of volatility, since
it's usually so hard to observe the effect in a determinstic way.

Richard

>
> Thanks
> Gui Haochen
>
> ChangeLog
> fwprop: Avoid volatile defines to be propagated
>
> The patch for PR111267 (commit id 86de9b66480b710202a2898cf513db105d8c432f)
> which introduces an exception for propagation on single set insn.  The
> propagation which might not be profitable (checked by profitable_p) is still
> allowed to be propagated to single set insn.  It has a potential problem
> that a volatile operand might be propagated to a single set insn.  If the
> define insn is not eliminated after propagation, the volatile operand will
> be executed for multiple times.  This patch fixes the problem by skipping
> volatile set source rtx in propagation.
>
> gcc/
>   * fwprop.cc (forward_propagate_into): Return false for volatile set
>   source rtx.
>
> gcc/testsuite/
>   * gcc.target/powerpc/fwprop-1.c: New.
>
> patch.diff
> diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
> index 7872609b336..cb6fd6700ca 100644
> --- a/gcc/fwprop.cc
> +++ b/gcc/fwprop.cc
> @@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only 
> = false)
>
>rtx dest = SET_DEST (def_set);
>rtx src = SET_SRC (def_set);
> +  if (volatile_refs_p (src))
> +return false;
>
>/* Allow propagations into a loop only for reg-to-reg copies, since
>   replacing one register by another shouldn't increase the cost.
> diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c 
> b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
> new file mode 100644
> index 000..07b207f980c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
> +/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
> +
> +/* Verify that volatile asm operands doesn't be propagated.  */
> +long long foo ()
> +{
> +  long long res;
> +  __asm__ __volatile__(
> +""
> +  : "=r" (res)
> +  :
> +  : "memory");
> +  return res;
> +}

Re: [PATCH] asan: Handle poly-int sizes in ASAN_MARK [PR97696]

2024-03-05 Thread Richard Sandiford

Jakub Jelinek  writes:
> On Tue, Mar 05, 2024 at 06:30:40PM +, Richard Sandiford wrote:
>> (1) Keep the test where it is, taking advantage of the current SVE
>> handling in aarch64-sve.exp, and add:
>> 
>>   /* { dg-skip-if "" { no_fsanitize_address } } */
>
> I'd go with this.  asan/ directory for test would be needed for dg-do run
> tests obviously, because then we need the test driver to add appropriate
> options to find the library etc.

Thanks, now pushed with that change.

What do you think about backports, after a baking-in period?

Richard

Re: [PATCH] doc: Fix docs for -dD regarding predefined macros

2024-03-05 Thread Jonathan Wakely

On Tue, 5 Mar 2024 at 18:31, Joseph Myers wrote:
>
> On Tue, 5 Mar 2024, Jakub Jelinek wrote:
>
> > I can't bisect that far, supposedly predefined macros weren't included back
> > in 1996 when this was written but maybe it changed in 1999 or even earlier.
>
> It looks like this changed in 3.0 (so probably with the move to cpplib to
> provide the default preprocessor implementation), but I don't know why.


Ah, thanks for narrowing it down.

I'll push the doc change tomorrow.

Re: [PATCH] asan: Handle poly-int sizes in ASAN_MARK [PR97696]

2024-03-05 Thread Jakub Jelinek

On Tue, Mar 05, 2024 at 06:30:40PM +, Richard Sandiford wrote:
> (1) Keep the test where it is, taking advantage of the current SVE
> handling in aarch64-sve.exp, and add:
> 
>   /* { dg-skip-if "" { no_fsanitize_address } } */

I'd go with this.  asan/ directory for test would be needed for dg-do run
tests obviously, because then we need the test driver to add appropriate
options to find the library etc.

Jakub

Re: [PATCH] doc: Fix docs for -dD regarding predefined macros

2024-03-05 Thread Joseph Myers

On Tue, 5 Mar 2024, Jakub Jelinek wrote:

> I can't bisect that far, supposedly predefined macros weren't included back
> in 1996 when this was written but maybe it changed in 1999 or even earlier.

It looks like this changed in 3.0 (so probably with the move to cpplib to 
provide the default preprocessor implementation), but I don't know why.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] asan: Handle poly-int sizes in ASAN_MARK [PR97696]

2024-03-05 Thread Richard Sandiford

Jakub Jelinek  writes:
> On Tue, Mar 05, 2024 at 06:03:41PM +, Richard Sandiford wrote:
>> This patch makes the expansion of IFN_ASAN_MARK let through
>> poly-int-sized objects.  The expansion itself was already generic
>> enough, but the tests for the fast path were too strict.
>> 
>> Bootstrapped & regression tested on aarch64-linux-gnu.  Is this OK
>> for trunk now, or should it wait for GCC 15?  I'm not sure that it's
>> technically a regression, in the sense that we previously accepted the
>> testcase, but rejecting with an ICE is arguably worse than "sorry, can't
>> do that".  And as noted in the PR, this bug is breaking numpy builds.
>> 
>> Richard
>> 
>> 
>> gcc/
>>  PR sanitizer/97696
>>  * asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.
>> 
>> gcc/testsuite/
>>  PR sanitizer/97696
>>  * gcc.target/aarch64/sve/pr97696.c: New test.
>
> Ok for trunk now.

Thanks.  (And thanks for the quick review.)

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr97696.c
>> @@ -0,0 +1,28 @@
>> +/* { dg-options "-fsanitize=address -fsanitize-address-use-after-scope" } */
>
> Though I'd say this test should require sanitize_address affective target.
> E.g. libsanitizer (sure, not actually used by the test) is only supported
> on aarch64*-*-linux*, not e.g. on darwin nor freebsd nor fuchsia etc.

Yeah, I'd wondered about that.  But fsanitize_address is only available
in the asan.exp framework (or something else that includes asan-dg.exp).
And like you say, the test doesn't specifically need the library to
be available.

I guess the options are:

(1) Keep the test where it is, taking advantage of the current SVE
handling in aarch64-sve.exp, and add:

  /* { dg-skip-if "" { no_fsanitize_address } } */

(2) Move the test to gcc.dg/asan/, make it conditional on aarch64*-*-*,
and add:

  #pragma GCC target "+sve"

Any preference?

Actually running the test would require both libsanitizer support and
aarch64_sve_hw.  Assembling it would need an assembler that understands SVE.

Richard

Re: [PATCH] asan: Handle poly-int sizes in ASAN_MARK [PR97696]

2024-03-05 Thread Jakub Jelinek

On Tue, Mar 05, 2024 at 06:03:41PM +, Richard Sandiford wrote:
> This patch makes the expansion of IFN_ASAN_MARK let through
> poly-int-sized objects.  The expansion itself was already generic
> enough, but the tests for the fast path were too strict.
> 
> Bootstrapped & regression tested on aarch64-linux-gnu.  Is this OK
> for trunk now, or should it wait for GCC 15?  I'm not sure that it's
> technically a regression, in the sense that we previously accepted the
> testcase, but rejecting with an ICE is arguably worse than "sorry, can't
> do that".  And as noted in the PR, this bug is breaking numpy builds.
> 
> Richard
> 
> 
> gcc/
>   PR sanitizer/97696
>   * asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.
> 
> gcc/testsuite/
>   PR sanitizer/97696
>   * gcc.target/aarch64/sve/pr97696.c: New test.

Ok for trunk now.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr97696.c
> @@ -0,0 +1,28 @@
> +/* { dg-options "-fsanitize=address -fsanitize-address-use-after-scope" } */

Though I'd say this test should require sanitize_address affective target.
E.g. libsanitizer (sure, not actually used by the test) is only supported
on aarch64*-*-linux*, not e.g. on darwin nor freebsd nor fuchsia etc.

Jakub

[PATCH] asan: Handle poly-int sizes in ASAN_MARK [PR97696]

2024-03-05 Thread Richard Sandiford

This patch makes the expansion of IFN_ASAN_MARK let through
poly-int-sized objects.  The expansion itself was already generic
enough, but the tests for the fast path were too strict.

Bootstrapped & regression tested on aarch64-linux-gnu.  Is this OK
for trunk now, or should it wait for GCC 15?  I'm not sure that it's
technically a regression, in the sense that we previously accepted the
testcase, but rejecting with an ICE is arguably worse than "sorry, can't
do that".  And as noted in the PR, this bug is breaking numpy builds.

Richard


gcc/
PR sanitizer/97696
* asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.

gcc/testsuite/
PR sanitizer/97696
* gcc.target/aarch64/sve/pr97696.c: New test.
---
 gcc/asan.cc   |  9 +++---
 .../gcc.target/aarch64/sve/pr97696.c  | 28 +++
 2 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97696.c

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 0fd7dd1f3ed..d621ec9c323 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -3795,9 +3795,7 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
 }
   tree len = gimple_call_arg (g, 2);
 
-  gcc_assert (tree_fits_shwi_p (len));
-  unsigned HOST_WIDE_INT size_in_bytes = tree_to_shwi (len);
-  gcc_assert (size_in_bytes);
+  gcc_assert (poly_int_tree_p (len));
 
   g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
   NOP_EXPR, base);
@@ -3806,9 +3804,10 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
   tree base_addr = gimple_assign_lhs (g);
 
   /* Generate direct emission if size_in_bytes is small.  */
-  if (size_in_bytes
-  <= (unsigned)param_use_after_scope_direct_emission_threshold)
+  unsigned threshold = param_use_after_scope_direct_emission_threshold;
+  if (tree_fits_uhwi_p (len) && tree_to_uhwi (len) <= threshold)
 {
+  unsigned HOST_WIDE_INT size_in_bytes = tree_to_uhwi (len);
   const unsigned HOST_WIDE_INT shadow_size
= shadow_mem_size (size_in_bytes);
   const unsigned int shadow_align
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr97696.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr97696.c
new file mode 100644
index 000..f533d9efc02
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr97696.c
@@ -0,0 +1,28 @@
+/* { dg-options "-fsanitize=address -fsanitize-address-use-after-scope" } */
+
+#include 
+
+__attribute__((noinline, noclone)) int
+foo (char *a)
+{
+  int i, j = 0;
+  asm volatile ("" : "+r" (a) : : "memory");
+  for (i = 0; i < 12; i++)
+j += a[i];
+  return j;
+}
+
+int
+main ()
+{
+  int i, j = 0;
+  for (i = 0; i < 4; i++)
+{
+  char a[12];
+  __SVInt8_t freq;
+  __builtin_bcmp (, a, 10);
+  __builtin_memset (a, 0, sizeof (a));
+  j += foo (a);
+}
+  return j;
+}
-- 
2.25.1

[pushed] aarch64: Remove SME2.1 forms of LUTI2/4

2024-03-05 Thread Richard Sandiford

I was over-eager when adding support for strided SME2 instructions
and accidentally included forms of LUTI2 and LUTI4 that are only
available with SME2.1, not SME2.  This patch removes them for now.
We're planning to add proper support for SME2.1 in the GCC 15
timeframe.

Sorry for the blunder :(

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.md (stride_type): Remove luti_consecutive
and luti_strided.
* config/aarch64/aarch64-sme.md
(@aarch64_sme_lut): Remove stride_type attribute.
(@aarch64_sme_lut_strided2): Delete.
(@aarch64_sme_lut_strided4): Likewise.
* config/aarch64/aarch64-early-ra.cc (is_stride_candidate)
(early_ra::maybe_convert_to_strided_access): Remove support for
strided LUTI2 and LUTI4.

gcc/testsuite/
* gcc.target/aarch64/sme/strided_1.c (test5): Remove.
---
 gcc/config/aarch64/aarch64-early-ra.cc| 20 +-
 gcc/config/aarch64/aarch64-sme.md | 70 ---
 gcc/config/aarch64/aarch64.md |  3 +-
 .../gcc.target/aarch64/sme/strided_1.c| 55 ---
 4 files changed, 3 insertions(+), 145 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc 
b/gcc/config/aarch64/aarch64-early-ra.cc
index 8530b0ae41e..1e2c823cb2e 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -1060,8 +1060,7 @@ is_stride_candidate (rtx_insn *insn)
 return false;
 
   auto stride_type = get_attr_stride_type (insn);
-  return (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE
- || stride_type == STRIDE_TYPE_LD1_CONSECUTIVE
+  return (stride_type == STRIDE_TYPE_LD1_CONSECUTIVE
  || stride_type == STRIDE_TYPE_ST1_CONSECUTIVE);
 }
 
@@ -3212,8 +3211,7 @@ early_ra::maybe_convert_to_strided_access (rtx_insn *insn)
   auto stride_type = get_attr_stride_type (insn);
   rtx pat = PATTERN (insn);
   rtx op;
-  if (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE
-  || stride_type == STRIDE_TYPE_LD1_CONSECUTIVE)
+  if (stride_type == STRIDE_TYPE_LD1_CONSECUTIVE)
 op = SET_DEST (pat);
   else if (stride_type == STRIDE_TYPE_ST1_CONSECUTIVE)
 op = XVECEXP (SET_SRC (pat), 0, 1);
@@ -3263,20 +3261,6 @@ early_ra::maybe_convert_to_strided_access (rtx_insn 
*insn)
   XVECEXP (SET_SRC (pat), 0, XVECLEN (SET_SRC (pat), 0) - 1)
= *recog_data.dup_loc[0];
 }
-  else if (stride_type == STRIDE_TYPE_LUTI_CONSECUTIVE)
-{
-  auto bits = INTVAL (XVECEXP (SET_SRC (pat), 0, 4));
-  if (range.count == 2)
-   pat = gen_aarch64_sme_lut_strided2 (bits, single_mode,
-   regs[0], regs[1],
-   recog_data.operand[1],
-   recog_data.operand[2]);
-  else
-   pat = gen_aarch64_sme_lut_strided4 (bits, single_mode,
-   regs[0], regs[1], regs[2], regs[3],
-   recog_data.operand[1],
-   recog_data.operand[2]);
-}
   else
 gcc_unreachable ();
   PATTERN (insn) = pat;
diff --git a/gcc/config/aarch64/aarch64-sme.md 
b/gcc/config/aarch64/aarch64-sme.md
index c95d4aa696c..78ad2fc699f 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -1939,74 +1939,4 @@ (define_insn "@aarch64_sme_lut"
   "TARGET_STREAMING_SME2
&& !( == 4 &&  == 4 &&  == 8)"
   "luti\t%0, zt0, %1[%2]"
-  [(set_attr "stride_type" "luti_consecutive")]
-)
-
-(define_insn "@aarch64_sme_lut_strided2"
-  [(set (match_operand:SVE_FULL_BHS 0 "aarch64_simd_register" "=Uwd")
-   (unspec:SVE_FULL_BHS
- [(reg:V8DI ZT0_REGNUM)
-  (reg:DI SME_STATE_REGNUM)
-  (match_operand:VNx16QI 2 "register_operand" "w")
-  (match_operand:DI 3 "const_int_operand")
-  (const_int LUTI_BITS)
-  (const_int 0)]
- UNSPEC_SME_LUTI))
-   (set (match_operand:SVE_FULL_BHS 1 "aarch64_simd_register" "=w")
-   (unspec:SVE_FULL_BHS
- [(reg:V8DI ZT0_REGNUM)
-  (reg:DI SME_STATE_REGNUM)
-  (match_dup 2)
-  (match_dup 3)
-  (const_int LUTI_BITS)
-  (const_int 1)]
- UNSPEC_SME_LUTI))]
-  "TARGET_STREAMING_SME2
-   && aarch64_strided_registers_p (operands, 2, 8)"
-  "luti\t{%0., %1.}, zt0, %2[%3]"
-  [(set_attr "stride_type" "luti_strided")]
-)
-
-(define_insn "@aarch64_sme_lut_strided4"
-  [(set (match_operand:SVE_FULL_BHS 0 "aarch64_simd_register" "=Uwt")
-   (unspec:SVE_FULL_BHS
- [(reg:V8DI ZT0_REGNUM)
-  (reg:DI SME_STATE_REGNUM)
-  (match_operand:VNx16QI 4 "register_operand" "w")
-  (match_operand:DI 5 "const_int_operand")
-  (const_int LUTI_BITS)
-  (const_int 0)]
- UNSPEC_SME_LUTI))
-   (set (match_operand:SVE_FULL_BHS 1 "aarch64_simd_register" "=w")
-   (unspec:SVE_FULL_BHS
-

Re: [PATCH, V3] ctf: fix incorrect CTF for multi-dimensional array types

2024-03-05 Thread David Faust




On 3/5/24 00:47, Indu Bhagat wrote:
> From: Cupertino Miranda 
> 
> [Changes from V2]
>   - Fixed aarch64 new FAILs reported by Linaro CI.
>   - Fixed typos and other nits pointed out in V2.
> [End of changes from V2]

OK, thanks.

> 
> PR debug/114186
> 
> DWARF DIEs of type DW_TAG_subrange_type are linked together to represent
> the information about the subsequent dimensions.  The CTF processing was
> so far working through them in the opposite (incorrect) order.
> 
> While fixing the issue, refactor the code a bit for readability.
> 
> co-authored-By: Indu Bhagat 
> 
> gcc/
>   PR debug/114186
>   * dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array ()
>   in the correct order of the dimensions.
> (gen_ctf_subrange_type): Refactor out handling of
>   DW_TAG_subrange_type DIE to here.
> 
> gcc/testsuite/
>   PR debug/114186
>   * gcc.dg/debug/ctf/ctf-array-6.c: Add test.
> ---
> 
> Testing notes:
>  - Linaro CI reported three new FAILs introduced by ctf-array-6.c due to
>presence of char '#' on aarch64 where the ASM_COMMENT_START differs.
>Fixed and regression tested on aarch64.
>  - Regression tested on x86_64-linux-gnu default target.
>  - Regression tested for target bpf-unknown-none (btf.exp, ctf.exp, bpf.exp).
>  - Kernel build with -gctf shows healthier CTF types for arrays.
> 
> ---
>  gcc/dwarf2ctf.cc | 158 +--
>  gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c |  14 ++
>  2 files changed, 89 insertions(+), 83 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c
> 
> diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
> index dca86edfffa9..77d6bf896893 100644
> --- a/gcc/dwarf2ctf.cc
> +++ b/gcc/dwarf2ctf.cc
> @@ -349,105 +349,97 @@ gen_ctf_pointer_type (ctf_container_ref ctfc, 
> dw_die_ref ptr_type)
>return ptr_type_id;
>  }
>  
> -/* Generate CTF for an array type.  */
> +/* Recursively generate CTF for array dimensions starting at DIE C (of type
> +   DW_TAG_subrange_type) until DIE LAST (of type DW_TAG_subrange_type) is
> +   reached.  ARRAY_ELEMS_TYPE_ID is base type for the array.  */
>  
>  static ctf_id_t
> -gen_ctf_array_type (ctf_container_ref ctfc, dw_die_ref array_type)
> +gen_ctf_subrange_type (ctf_container_ref ctfc, ctf_id_t array_elems_type_id,
> +dw_die_ref c, dw_die_ref last)
>  {
> -  dw_die_ref c;
> -  ctf_id_t array_elems_type_id = CTF_NULL_TYPEID;
> +  ctf_arinfo_t arinfo;
> +  ctf_id_t array_node_type_id = CTF_NULL_TYPEID;
>  
> -  int vector_type_p = get_AT_flag (array_type, DW_AT_GNU_vector);
> -  if (vector_type_p)
> -return array_elems_type_id;
> +  dw_attr_node *upper_bound_at;
> +  dw_die_ref array_index_type;
> +  uint32_t array_num_elements;
>  
> -  dw_die_ref array_elems_type = ctf_get_AT_type (array_type);
> +  if (dw_get_die_tag (c) == DW_TAG_subrange_type)
> +{
> +  /* When DW_AT_upper_bound is used to specify the size of an
> +  array in DWARF, it is usually an unsigned constant
> +  specifying the upper bound index of the array.  However,
> +  for unsized arrays, such as foo[] or bar[0],
> +  DW_AT_upper_bound is a signed integer constant
> +  instead.  */
> +
> +  upper_bound_at = get_AT (c, DW_AT_upper_bound);
> +  if (upper_bound_at
> +   && AT_class (upper_bound_at) == dw_val_class_unsigned_const)
> + /* This is the upper bound index.  */
> + array_num_elements = get_AT_unsigned (c, DW_AT_upper_bound) + 1;
> +  else if (get_AT (c, DW_AT_count))
> + array_num_elements = get_AT_unsigned (c, DW_AT_count);
> +  else
> + {
> +   /* This is a VLA of some kind.  */
> +   array_num_elements = 0;
> + }
> +}
> +  else
> +gcc_unreachable ();
>  
> -  /* First, register the type of the array elements if needed.  */
> -  array_elems_type_id = gen_ctf_type (ctfc, array_elems_type);
> +  /* Ok, mount and register the array type.  Note how the array
> + type we register here is the type of the elements in
> + subsequent "dimensions", if there are any.  */
> +  arinfo.ctr_nelems = array_num_elements;
>  
> -  /* DWARF array types pretend C supports multi-dimensional arrays.
> - So for the type int[N][M], the array type DIE contains two
> - subrange_type children, the first with upper bound N-1 and the
> - second with upper bound M-1.
> +  array_index_type = ctf_get_AT_type (c);
> +  arinfo.ctr_index = gen_ctf_type (ctfc, array_index_type);
>  
> - CTF, on the other hand, just encodes each array type in its own
> - array type CTF struct.  Therefore we have to iterate on the
> - children and create all the needed types.  */
> +  if (c == last)
> +arinfo.ctr_contents = array_elems_type_id;
> +  else
> +arinfo.ctr_contents = gen_ctf_subrange_type (ctfc, array_elems_type_id,
> +  dw_get_die_sib (c), last);
>  
> -  c = dw_get_die_child (array_type);
> -  gcc_assert

[PATCH] arm: check for low register before applying peephole [PR113510]

2024-03-05 Thread Richard Earnshaw


For thumb1, when using a peephole to fuse

mov reg, #const
add reg, reg, SP

into

add reg, SP, #const

we must first check that reg is a low register, otherwise we will ICE
when trying to recognize the resulting insn.

gcc/ChangeLog:

PR target/113510
* config/arm/thumb1.md (peephole2 to fuse mov imm/add SP): Use
low_register_operand.
---

This appears to have gone latent again, but checked against the known
failing version.

 gcc/config/arm/thumb1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 14d6df580af..d7074b43f60 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -113,7 +113,7 @@ (define_insn_and_split "*thumb1_addsi3"
 ;; Reloading and elimination of the frame pointer can
 ;; sometimes cause this optimization to be missed.
 (define_peephole2
-  [(set (match_operand:SI 0 "arm_general_register_operand" "")
+  [(set (match_operand:SI 0 "low_register_operand" "")
 	(match_operand:SI 1 "const_int_operand" ""))
(set (match_dup 0)
 	(plus:SI (match_dup 0) (reg:SI SP_REGNUM)))]

Re: [C++ coroutines] Initial implementation pushed to master.

2024-03-05 Thread H.J. Lu

On Sat, Jan 18, 2020 at 4:54 AM Iain Sandoe  wrote:
>
> Hi,
>
> Thanks to:
>
>* the reviewers, the code was definitely improved by your reviews.
>
>* those folks who tested the branch and/or compiler explorer
>  instance and reported problems with reproducers.
>
>   * WG21 colleagues, especially Lewis and Gor for valuable input
> and discussions on the design.
>
> = TL;DR:
>
> * This is not enabled by default (even for -std=c++2a), it needs -fcoroutines.
>
> * Like all the C++20 support, it is experimental, perhaps more experimental
>   than some other pieces because wording is still being amended.
>
> * The FE/ME tests are run for ALL targets; in principle this should be target-
>   agnostic, if we see fails then that is probably interesting input for the 
> ABI
>  panel.
>
>  * I regstrapped on 64b LE and BE platforms and a 32b LE host with no observed
>   issues or regressions.
>
>  * it’s just slightly too big to send uncompressed so attached as a bz2.
>
>  * commit is r10-6063-g49789fd08
>
> thanks again to all those who helped,
> Iain
>
> ==  The full covering note:
>
> This is the squashed version of the first 6 patches that were split to
> facilitate review.
>
> The changes to libiberty (7th patch) to support demangling the co_await
> operator stand alone and are applied separately.
>
> The patch series is an initial implementation of a coroutine feature,
> expected to be standardised in C++20.
>
> Standardisation status (and potential impact on this implementation)
> 
>
> The facility was accepted into the working draft for C++20 by WG21 in
> February 2019.  During following WG21 meetings, design and national body
> comments have been reviewed, with no significant change resulting.
>
> The current GCC implementation is against n4835 [1].
>
> At this stage, the remaining potential for change comes from:
>
> * Areas of national body comments that were not resolved in the version we
>   have worked to:
>   (a) handling of the situation where aligned allocation is available.
>   (b) handling of the situation where a user wants coroutines, but does not
>   want exceptions (e.g. a GPU).
>
> * Agreed changes that have not yet been worded in a draft standard that we
>   have worked to.
>
> It is not expected that the resolution to these can produce any major
> change at this phase of the standardisation process.  Such changes should be
> limited to the coroutine-specific code.
>
> ABI
> ---
>
> The various compiler developers 'vendors' have discussed a minimal ABI to
> allow one implementation to call coroutines compiled by another.
>
> This amounts to:
>
> 1. The layout of a public portion of the coroutine frame.
>
>  Coroutines need to preserve state across suspension points, the storage for
>  this is called a "coroutine frame".
>
>  The ABI mandates that pointers into the coroutine frame point to an area
>  begining with two function pointers (to the resume and destroy functions
>  described below); these are immediately followed by the "promise object"
>  described in the standard.
>
>  This is sufficient that the builtins can take a coroutine frame pointer and
>  determine the address of the promise (or call the resume/destroy functions).
>
> 2. A number of compiler builtins that the standard library might use.
>
>   These are implemented by this patch series.
>
> 3. This introduces a new operator 'co_await' the mangling for which is also
> agreed between vendors (and has an issue filed for that against the upstream
> c++abi).  Demangling for this is added to libiberty in a separate patch.
>
> The ABI has currently no target-specific content (a given psABI might elect
> to mandate alignment, but the common ABI does not do this).
>
> Standard Library impact
> ---
>
> The current implementations require addition of only a single header to
> the standard library (no change to the runtime).  This header is part of
> the patch.
>
> GCC Implementation outline
> --
>
> The standard's design for coroutines does not decorate the definition of
> a coroutine in any way, so that a function is only known to be a coroutine
> when one of the keywords (co_await, co_yield, co_return) is encountered.
>
> This means that we cannot special-case such functions from the outset, but
> must process them differently when they are finalised - which we do from
> "finish_function ()".
>
> At a high level, this design of coroutine produces four pieces from the
> original user's function:
>
>   1. A coroutine state frame (taking the logical place of the activation
>  record for a regular function).  One item stored in that state is the
>  index of the current suspend point.
>   2. A "ramp" function
>  This is what the user calls to construct the coroutine frame and start
>  the coroutine execution.  This will return some object representing the
>  coroutine's eventual

Re: [PATCH] doc: Fix docs for -dD regarding predefined macros

2024-03-05 Thread Jakub Jelinek

On Tue, Mar 05, 2024 at 04:16:00PM +, Jonathan Wakely wrote:
> OK for trunk?
> 
> Or am I missing something and the docs are right? (sometimes? always?)
> 
> 
> -- >8 --
> 
> The manual has always claimed that -dD differs from -dM by not
> outputting predefined macros, but that's untrue. It has been untrue
> since at least GCC 3.2 and probably even older.
> 
> gcc/ChangeLog:
> 
>   * doc/cppopts.texi: Remove incorrect claim about -dD not
>   outputting predefined macros.

LGTM.

I can't bisect that far, supposedly predefined macros weren't included back
in 1996 when this was written but maybe it changed in 1999 or even earlier.

Jakub

[PATCH] doc: Fix docs for -dD regarding predefined macros

2024-03-05 Thread Jonathan Wakely

OK for trunk?

Or am I missing something and the docs are right? (sometimes? always?)


-- >8 --

The manual has always claimed that -dD differs from -dM by not
outputting predefined macros, but that's untrue. It has been untrue
since at least GCC 3.2 and probably even older.

gcc/ChangeLog:

* doc/cppopts.texi: Remove incorrect claim about -dD not
outputting predefined macros.
---
 gcc/doc/cppopts.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/cppopts.texi b/gcc/doc/cppopts.texi
index fa8f3d88c89..5b5b0848ae8 100644
--- a/gcc/doc/cppopts.texi
+++ b/gcc/doc/cppopts.texi
@@ -524,8 +524,7 @@ interpreted as a synonym for @option{-fdump-rtl-mach}.
 
 @opindex dD
 @item -dD
-Like @option{-dM} except in two respects: it does @emph{not} include the
-predefined macros, and it outputs @emph{both} the @samp{#define}
+Like @option{-dM} except that it outputs @emph{both} the @samp{#define}
 directives and the result of preprocessing.  Both kinds of output go to
 the standard output file.
 
-- 
2.43.2

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-05 Thread Richard Biener

On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  wrote:
>
> On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
>  wrote:
> >
> > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > >
> > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > equivalence does not hold for types other than signed integers.  When
> > > it comes to conditions, the former was translated to a combination of
> > > sub and test, whereas the latter was translated to a single cmp.
> > > Thus, this optimization pass tries to optimize the former to the
> > > latter.
> > >
> > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > as defined behavior, specifically, wrapping around according to two's
> > > complement arithmetic.  This has implications for optimizations that
> > > rely on the standard behavior of signed integers, where overflow is
> > > undefined.  Consider the example given:
> > >
> > > long long llmax = __LONG_LONG_MAX__;
> > > long long llmin = -llmax - 1;
> > >
> > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > for a `long long`, this calculation overflows in a defined manner
> > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > produces a negative value due to two's complement wraparound.
> > > Therefore, `llmax - llmin < 0` is true.
> > >
> > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > enabled.  This is why this optimization pass is disabled under
> > > `-fwrapv`.
> > >
> > > This optimization pass must run before the Jump Threading pass and the
> > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > >
> > > (1)
> > >   int diff = x - y;
> > >   if (diff > 0)
> > > foo();
> > >   if (diff < 0)
> > > bar();
> > >
> > > The second condition would be converted to diff != 0 in the VRP pass
> > > because we know the postcondition of the first condition is diff <= 0,
> > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > after this VRP, we get:
> > >
> > > (2)
> > >   int diff = x - y;
> > >   if (x > y)
> > > foo();
> > >   if (diff != 0)
> > > bar();
> > >
> > > This generates sub and test for the second condition and cmp for the
> > > first condition. However, if we apply this pass beforehand, we simply
> > > get:
> > >
> > > (3)
> > >   int diff = x - y;
> > >   if (x > y)
> > > foo();
> > >   if (x < y)
> > > bar();
> > >
> > > In this code, diff will be eliminated as a dead code, and sub and test
> > > will not be generated, which is more efficient.
> > >
> > > For the Jump Threading pass, without this optimization pass, (1) and
> > > (3) above are recognized as different, which prevents TCO.
> > >
> > > PR middle-end/113680
> >
> > This shouldn't be done as a new optimization pass.  It fits either
> > the explicit code present in the forwprop pass or a new match.pd
> > pattern.  There's possible interaction with x - y value being used
> > elsewhere and thus exposing a CSE opportunity as well as
> > a comparison against zero being possibly implemented by
> > a flag setting subtraction instruction.
> >
>
> Thank you so much for your review!  Although the forwprop pass runs
> multiple times, we might not need to apply this optimization multiple
> times.  Would it be acceptable to add such optimization?  More
> generally, I would like to know how to determine where to put
> optimization in the future.

This kind of pattern matching expression simplification is best
addressed by patterns in match.pd though historically the forwprop
pass still catches cases not in match.pd in its
forward_propagate_into_comparison_1 (and callers).

> FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
>
> > Our VN pass has some tricks to anticipate CSE opportunities
> > like this, but it's not done "properly" in the way of anticipating
> > both forms during PRE.
> >
> > I'll note we have
> >
> >  /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
> >  (for cmp (ne ltgt)
> >
> > and similar which might be confused by canonicalizing to A != B.
>
> I will investigate and update my patch (after my final exam ends...)!
>
> > I'm also surprised we don't already have the pattern you add.
>
> Hmm, so am I.

It looks like we do it for equality compares which can also handle
types where overflow is undefined.  -fdump-tree-all-folding

Re: [PATCH v2] testsuite, arm: Fix testcase arm/pr112337.c to check for the options first

2024-03-05 Thread Richard Earnshaw (lists)

On 19/02/2024 10:11, Saurabh Jha wrote:
> 
> On 2/9/2024 2:57 PM, Richard Earnshaw (lists) wrote:
>> On 30/01/2024 17:07, Saurabh Jha wrote:
>>> Hey,
>>>
>>> Previously, this test was added to fix this bug: 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337. However, it did not 
>>> check the compilation options before using them, leading to errors.
>>>
>>> This patch fixes the test by first checking whether it can use the options 
>>> before using them.
>>>
>>> Tested for arm-none-eabi and found no regressions. The output of check-gcc 
>>> with RUNTESTFLAGS="arm.exp=*" changed like this:
>>>
>>> Before:
>>> # of expected passes  5963
>>> # of unexpected failures  64
>>>
>>> After:
>>> # of expected passes  5964
>>> # of unexpected failures  63
>>>
>>> Ok for master?
>>>
>>> Regards,
>>> Saurabh
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>  * gcc.target/arm/pr112337.c: Check whether we can use the 
>>> compilation options before using them.
>> My apologies for missing this earlier.  It didn't show up in patchwork. 
>> That's most likely because the attachment is a binary blob instead of 
>> text/plain.  That also means that the Linaro CI system hasn't seen this 
>> patch either.  Please can you fix your mailer to add plain text patch files.
>>
>> -/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp -mfloat-abi=hard" } 
>> */
>> +/* { dg-require-effective-target arm_hard_ok } */
>> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
>> +/* { dg-options "-O2 -mfloat-abi=hard" } */
>> +/* { dg-add-options arm_v8_1m_mve } */
>>
>> This is moving in the right direction, but it adds more than necessary now: 
>> checking for, and adding -mfloat-abi=hard is not necessary any more as 
>> arm_v8_1m_mve_ok will work out what float-abi flags are needed to make the 
>> options work. (What's more, it will prevent the test from running if the 
>> base configuration of the compiler is incompatible with the hard float ABI, 
>> which is more than we need.).
>>
>> So please can you re-spin removing the hard-float check and removing that 
>> from dg-options.
>>
>> Thanks,
>> R.
> 
> Hi Richard,
> 
> Agreed with your comments. Please find the patch with the suggested changes 
> attached.
> 
> Regards,
> 
> Saurabh
> 


Thanks, I've pushed this.  Next time, please can you put the commit message 
inside the patch, so that I can apply things automatically.  Eg: 

>From 1c92c94074449929f40cea99a6450bcde3aec12f Mon Sep 17 00:00:00 2001
From: Saurabh Jha 
Date: Tue, 30 Jan 2024 15:03:36 +
Subject: [PATCH] Fix testcase pr112337.c to check the options [PR112337]

gcc.target/arm/pr112337.c was failing to validate that adding MVE options
was compatible with the test environment, so add the missing checks.

gcc/testsuite/ChangeLog:

PR target/112337
* gcc.target/arm/pr112337.c: Check for, then use the right MVE
options.

---
 gcc/testsuite/gcc.target/arm/pr112337.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr112337.c 
b/gcc/testsuite/gcc.target/arm/pr112337.c

...

Re: [PATCH] c++/modules: befriending template from current class scope

2024-03-05 Thread Patrick Palka

On Mon, 26 Feb 2024, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?

Ping.

> 
> -- >8 --
> 
> Here the TEMPLATE_DECL representing the template friend declaration for
> B has class scope since B has class scope, but get_merge_kind assumes
> all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P TEMPLATE_DECL have namespace
> scope and wrongly returns MK_named instead of MK_local_friend.
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (trees_out::get_merge_kind) :
>   Accomodate class-scope DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
>   TEMPLATE_DECL.  Merge IDENTIFIER_ANON_P branches.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/friend-7.h: New test.
>   * g++.dg/modules/friend-7_a.H: New test.
>   * g++.dg/modules/friend-7_b.C: New test.
> ---
>  gcc/cp/module.cc  | 19 +--
>  gcc/testsuite/g++.dg/modules/friend-7.h   |  5 +
>  gcc/testsuite/g++.dg/modules/friend-7_a.H |  3 +++
>  gcc/testsuite/g++.dg/modules/friend-7_b.C |  5 +
>  4 files changed, 22 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/friend-7.h
>  create mode 100644 gcc/testsuite/g++.dg/modules/friend-7_a.H
>  create mode 100644 gcc/testsuite/g++.dg/modules/friend-7_b.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 106af7bdb3e..fa91c6ff9cb 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -10491,21 +10491,20 @@ trees_out::get_merge_kind (tree decl, depset *dep)
>   break;
> }
>  
> - if (RECORD_OR_UNION_TYPE_P (ctx))
> + if (TREE_CODE (decl) == TEMPLATE_DECL
> + && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl))
> {
> - if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
> -   mk = MK_field;
> + mk = MK_local_friend;
>   break;
> }
>  
> - if (TREE_CODE (decl) == TEMPLATE_DECL
> - && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl))
> -   mk = MK_local_friend;
> - else if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
> + if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
> {
> - if (DECL_IMPLICIT_TYPEDEF_P (decl)
> - && UNSCOPED_ENUM_P (TREE_TYPE (decl))
> - && TYPE_VALUES (TREE_TYPE (decl)))
> + if (RECORD_OR_UNION_TYPE_P (ctx))
> +   mk = MK_field;
> + else if (DECL_IMPLICIT_TYPEDEF_P (decl)
> +  && UNSCOPED_ENUM_P (TREE_TYPE (decl))
> +  && TYPE_VALUES (TREE_TYPE (decl)))
> /* Keyed by first enum value, and underlying type.  */
> mk = MK_enum;
>   else
> diff --git a/gcc/testsuite/g++.dg/modules/friend-7.h 
> b/gcc/testsuite/g++.dg/modules/friend-7.h
> new file mode 100644
> index 000..c0f00394f3b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/friend-7.h
> @@ -0,0 +1,5 @@
> +template
> +struct A {
> +  template struct B { };
> +  template friend struct B;
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/friend-7_a.H 
> b/gcc/testsuite/g++.dg/modules/friend-7_a.H
> new file mode 100644
> index 000..e750e4c7d8d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/friend-7_a.H
> @@ -0,0 +1,3 @@
> +// { dg-additional-options "-fmodule-header" }
> +// { dg-module-cmi {} }
> +#include "friend-7.h"
> diff --git a/gcc/testsuite/g++.dg/modules/friend-7_b.C 
> b/gcc/testsuite/g++.dg/modules/friend-7_b.C
> new file mode 100644
> index 000..d90b685d89d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/friend-7_b.C
> @@ -0,0 +1,5 @@
> +// { dg-additional-options "-fmodules-ts" }
> +#include "friend-7.h"
> +import "friend-7_a.H";
> +
> +A a;
> -- 
> 2.44.0.rc1.15.g4fc51f00ef
> 
>

Re: [PATCH] c++/modules: local class merging [PR99426]

2024-03-05 Thread Patrick Palka

On Tue, 27 Feb 2024, Patrick Palka wrote:

> On Mon, 26 Feb 2024, Patrick Palka wrote:
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this approach
> > look reasonable?
> > 
> > -- >8 --
> > 
> > One known missing piece in the modules implementation is merging of a
> > streamed-in local class with the corresponding in-TU version of the
> > local class.  This missing piece turns out to cause a hard-to-reduce
> > use-after-free GC issue due to the entity_ary not being marked as a GC
> > root (deliberately), and manifests as a serialization error on stream-in
> > as in PR99426 (see comment #6 for a reduction).  It's also reproducible
> > on trunk when running the xtreme-header tests without -fno-module-lazy.
> > 
> > This patch makes us merge such local classes according to their position
> > within the containing function's definition, similar to how we merge
> > FIELD_DECLs of a class according to their index in the TYPE_FIELDS
> > list.
> > 
> > PR c++/99426
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (merge_kind::MK_local_class): New enumerator.
> > (merge_kind_name): Update.
> > (trees_out::chained_decls): Move BLOCK-specific handling
> > of DECL_LOCAL_DECL_P decls to ...
> > (trees_out::core_vals) : ... here.  Stream
> > BLOCK_VARS manually.
> > (trees_in::core_vals) : Stream BLOCK_VARS
> > manually.  Handle deduplicated local classes.
> > (trees_out::key_local_class): Define.
> > (trees_in::key_local_class): Define.
> > (trees_out::get_merge_kind) : Return
> > MK_local_class for a local class.
> > (trees_out::key_mergeable) : Use
> > key_local_class.
> > (trees_in::key_mergeable) : Likewise.
> > (trees_in::is_matching_decl): Be flexible with type mismatches
> > for local entities.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/xtreme-header-7_a.H: New test.
> > * g++.dg/modules/xtreme-header-7_b.C: New test.
> 
> > ---
> >  gcc/cp/module.cc  | 167 +++---
> >  .../g++.dg/modules/xtreme-header-7_a.H|   4 +
> >  .../g++.dg/modules/xtreme-header-7_b.C|   6 +
> >  3 files changed, 149 insertions(+), 28 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/modules/xtreme-header-7_a.H
> >  create mode 100644 gcc/testsuite/g++.dg/modules/xtreme-header-7_b.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index fa91c6ff9cb..f77f73a59ed 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -2771,6 +2771,7 @@ enum merge_kind
> >  
> >MK_enum, /* Found by CTX, & 1stMemberNAME.  */
> >MK_keyed, /* Found by key & index.  */
> > +  MK_local_class, /* Found by CTX, index.  */
> >  
> >MK_friend_spec,  /* Like named, but has a tmpl & args too.  */
> >MK_local_friend, /* Found by CTX, index.  */
> > @@ -2799,7 +2800,7 @@ static char const *const merge_kind_name[MK_hwm] =
> >  "unique", "named", "field", "vtable",  /* 0...3  */
> >  "asbase", "partial", "enum", "attached",   /* 4...7  */
> >  
> > -"friend spec", "local friend", NULL, NULL,  /* 8...11 */
> > +"local class", "friend spec", "local friend", NULL,  /* 8...11 */
> >  NULL, NULL, NULL, NULL,
> >  
> >  "type spec", "type tmpl spec", /* 16,17 type (template).  */
> > @@ -2928,6 +2929,7 @@ public:
> >unsigned binfo_mergeable (tree *);
> >  
> >  private:
> > +  tree key_local_class (const merge_key&, tree);
> >uintptr_t *find_duplicate (tree existing);
> >void register_duplicate (tree decl, tree existing);
> >/* Mark as an already diagnosed bad duplicate.  */
> > @@ -3086,6 +3088,7 @@ public:
> >void binfo_mergeable (tree binfo);
> >  
> >  private:
> > +  void key_local_class (merge_key&, tree, tree);
> >bool decl_node (tree, walk_kind ref);
> >void type_node (tree);
> >void tree_value (tree);
> > @@ -4952,18 +4955,7 @@ void
> >  trees_out::chained_decls (tree decls)
> >  {
> >for (; decls; decls = DECL_CHAIN (decls))
> > -{
> > -  if (VAR_OR_FUNCTION_DECL_P (decls)
> > - && DECL_LOCAL_DECL_P (decls))
> > -   {
> > - /* Make sure this is the first encounter, and mark for
> > -walk-by-value.  */
> > - gcc_checking_assert (!TREE_VISITED (decls)
> > -  && !DECL_TEMPLATE_INFO (decls));
> > - mark_by_value (decls);
> > -   }
> > -  tree_node (decls);
> > -}
> > +tree_node (decls);
> >tree_node (NULL_TREE);
> >  }
> >  
> > @@ -6204,7 +6196,21 @@ trees_out::core_vals (tree t)
> >  
> >/* DECL_LOCAL_DECL_P decls are first encountered here and
> >   streamed by value.  */
> > -  chained_decls (t->block.vars);
> > +  for (tree decls = t->block.vars; decls; decls = DECL_CHAIN (decls))
> > +   {
> > + if (VAR_OR_FUNCTION_DECL_P (decls)
> > + && DECL_LOCAL_DECL_P (decls))
> > +   {
> > + /* Make sure this is the first encounter, and mark for
> > +walk-by-value.

Frontend access to target features (was Re: [PATCH] libgccjit: Add ability to get CPU features)

2024-03-05 Thread David Malcolm

On Thu, 2023-11-09 at 19:33 -0500, Antoni Boucher wrote:
> Hi.
> See answers below.
> 
> On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:
> > On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:
> > > Hi.
> > > This patch adds support for getting the CPU features in libgccjit
> > > (bug
> > > 112466)
> > > 
> > > There's a TODO in the test:
> > > I'm not sure how to test that gcc_jit_target_info_arch returns
> > > the
> > > correct value since it is dependant on the CPU.
> > > Any idea on how to improve this?
> > > 
> > > Also, I created a CStringHash to be able to have a
> > > std::unordered_set. Is there any built-in way of
> > > doing
> > > this?
> > 
> > Thanks for the patch.
> > 
> > Some high-level questions:
> > 
> > Is this specifically about detecting capabilities of the host that
> > libgccjit is currently running on? or how the target was configured
> > when libgccjit was built?
> 
> I'm less sure about this part. I'll need to do more tests.
> 
> > 
> > One of the benefits of libgccjit is that, in theory, we support all
> > of
> > the targets that GCC already supports.  Does this patch change
> > that,
> > or
> > is this more about giving client code the ability to determine
> > capabilities of the specific host being compiled for?
> 
> This should not change that. If it does, this is a bug.
> 
> > 
> > I'm nervous about having per-target jit code.  Presumably there's a
> > reason that we can't reuse existing target logic here - can you
> > please
> > describe what the problem is.  I see that the ChangeLog has:
> > 
> > > * config/i386/i386-jit.cc: New file.
> > 
> > where i386-jit.cc has almost 200 lines of nontrivial code.  Where
> > did
> > this come from?  Did you base it on existing code in our source
> > tree,
> > making modifications to fit the new internal API, or did you write
> > it
> > from scratch?  In either case, how onerous would this be for other
> > targets?
> 
> This was mostly copied from the same code done for the Rust and D
> frontends.
> See this commit and the following:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1c06fd9723453dd2b2ec306684cb806dc2b4fbb
> The equivalent to i386-jit.cc is there:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=22e3557e2d52f129f2bbfdc98688b945dba28dc9

[CCing Iain and Arthur re those patches; for reference, the patch being
discussed is attached to :
https://gcc.gnu.org/pipermail/jit/2024q1/001792.html ]

One of my concerns about this patch is that we seem to be gaining code
that's per-(frontend x config) which seems to be copied and pasted with
a search and replace, which could lead to an M*N explosion.

Is there any real difference between the per-config code for the
different frontends, or should there be a general "enumerate all
features of the target" hook that's independent of the frontend? (but
perhaps calls into it).

Am I right in thinking that (rustc with default LLVM backend) has some
set of feature strings that both (rustc with rustc_codegen_gcc) and
gccrs are trying to emulate?  If so, is it presumably a goal that
libgccjit gives identical results to gccrs?  If so, would it be crazy
for libgccjit to consume e.g. config/i386/i386-rust.cc ?

Dave

> 
> > 
> > I'm not at expert at target hooks (or at the i386 backend), so if
> > we
> > do
> > go with this approach I'd want someone else to review those parts
> > of
> > the patch.
> > 
> > Have you verified that GCC builds with this patch with jit *not*
> > enabled in the enabled languages?
> 
> I will do.
> 
> > 
> > [...snip...]
> > 
> > A nitpick:
> > 
> > > +.. function:: const char * \
> > > +  gcc_jit_target_info_arch (gcc_jit_target_info
> > > *info)
> > > +
> > > +   Get the architecture of the currently running CPU.
> > 
> > What does this string look like?
> > How long does the pointer remain valid?
> 
> It's the march string, like "znver2", for instance.
> It remains valid until we free the gcc_jit_target_info object.
> 
> > 
> > Thanks again; hope the above makes sense
> > Dave
> > 
>

Re: [PATCH] c++: Don't set DECL_CONTEXT to nested template-template parameters [PR98881]

2024-03-05 Thread Patrick Palka

On Tue, 5 Mar 2024, Nathaniel Shead wrote:

> On Mon, Mar 04, 2024 at 10:07:33PM -0500, Patrick Palka wrote:
> > On Tue, 5 Mar 2024, Nathaniel Shead wrote:
> > 
> > > On Mon, Mar 04, 2024 at 09:26:00PM -0500, Patrick Palka wrote:
> > > > On Tue, 5 Mar 2024, Nathaniel Shead wrote:
> > > > 
> > > > > On Mon, Mar 04, 2024 at 07:14:54PM -0500, Patrick Palka wrote:
> > > > > > On Sat, 2 Mar 2024, Nathaniel Shead wrote:
> > > > > > 
> > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > 
> > > > > > > When streaming in a nested template-template parameter as in the
> > > > > > > attached testcase, we end up reaching the containing 
> > > > > > > template-template
> > > > > > > parameter in 'tpl_parms_fini'. We should not set the DECL_CONTEXT 
> > > > > > > to
> > > > > > > this (nested) template-template parameter, as it should already 
> > > > > > > be the
> > > > > > > struct that the outer template-template parameter is declared on.
> > > > > > > 
> > > > > > >   PR c++/98881
> > > > > > > 
> > > > > > > gcc/cp/ChangeLog:
> > > > > > > 
> > > > > > >   * module.cc (trees_out::tpl_parms_fini): Clarify logic purely
> > > > > > >   for checking purposes. Don't consider a template template
> > > > > > >   parameter as the owning template.
> > > > > > >   (trees_in::tpl_parms_fini): Don't consider a template template
> > > > > > >   parameter as the owning template.
> > > > > > > 
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > 
> > > > > > >   * g++.dg/modules/tpl-tpl-parm-3_a.H: New test.
> > > > > > >   * g++.dg/modules/tpl-tpl-parm-3_b.C: New test.
> > > > > > > 
> > > > > > > Signed-off-by: Nathaniel Shead 
> > > > > > > ---
> > > > > > >  gcc/cp/module.cc| 17 
> > > > > > > -
> > > > > > >  gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H | 11 +++
> > > > > > >  gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_b.C | 13 
> > > > > > > +
> > > > > > >  3 files changed, 36 insertions(+), 5 deletions(-)
> > > > > > >  create mode 100644 
> > > > > > > gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H
> > > > > > >  create mode 100644 
> > > > > > > gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_b.C
> > > > > > > 
> > > > > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > > > > > index 67f132d28d7..5663d01ed9c 100644
> > > > > > > --- a/gcc/cp/module.cc
> > > > > > > +++ b/gcc/cp/module.cc
> > > > > > > @@ -10126,10 +10126,14 @@ trees_out::tpl_parms_fini (tree tmpl, 
> > > > > > > unsigned tpl_levels)
> > > > > > > tree dflt = TREE_PURPOSE (parm);
> > > > > > > tree_node (dflt);
> > > > > > >  
> > > > > > > -   if (streaming_p ())
> > > > > > > +   if (CHECKING_P && streaming_p ())
> > > > > > >   {
> > > > > > > +   /* Sanity check that the DECL_CONTEXT we'll infer when
> > > > > > > +  streaming in is correct.  */
> > > > > > > tree decl = TREE_VALUE (parm);
> > > > > > > -   if (TREE_CODE (decl) == TEMPLATE_DECL)
> > > > > > > +   if (TREE_CODE (decl) == TEMPLATE_DECL
> > > > > > > +   /* A template template parm is not the owning 
> > > > > > > template.  */
> > > > > > > +   && !DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))
> > > > > > >   {
> > > > > > > tree ctx = DECL_CONTEXT (decl);
> > > > > > > tree inner = DECL_TEMPLATE_RESULT (decl);
> > > > > > > @@ -10164,8 +10168,13 @@ trees_in::tpl_parms_fini (tree tmpl, 
> > > > > > > unsigned tpl_levels)
> > > > > > >   return false;
> > > > > > > TREE_PURPOSE (parm) = dflt;
> > > > > > >  
> > > > > > > +   /* Original template template parms have a context
> > > > > > > +  of their owning template.  Reduced ones do not.
> > > > > > > +  But if TMPL is itself a template template parm
> > > > > > > +  then it cannot be the owning template.  */
> > > > > > > tree decl = TREE_VALUE (parm);
> > > > > > > -   if (TREE_CODE (decl) == TEMPLATE_DECL)
> > > > > > > +   if (TREE_CODE (decl) == TEMPLATE_DECL
> > > > > > > +   && !DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))
> > > > > > 
> > > > > > IIUC a TEMPLATE_DECL inside a template parameter list always 
> > > > > > represents
> > > > > > a template template parm, so won't this effectively disable the
> > > > > > DECL_CONTEXT setting logic?
> > > > > 
> > > > > This is only when 'tmpl' (i.e. the containing TEMPLATE_DECL that we're
> > > > > streaming) is itself a template template parm.
> > > > 
> > > > D'oh, makes sense.
> > > > 
> > > > > 
> > > > > > >   {
> > > > > > > tree inner = DECL_TEMPLATE_RESULT (decl);
> > > > > > > tree tpi = (TREE_CODE (inner) == TYPE_DECL
> > > > > > > @@ -10173,8 +10182,6 @@ trees_in::tpl_parms_fini (tree tmpl, 
> > > > > > > unsigned tpl_levels)
> > > > > > > : DECL_INITIAL (inner));
> > > > > > > bool original = (TEMPLATE_PARM_LEVEL (tpi)
> > > > > > >  ==

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-03-05 Thread Jeff Law





On 2/6/24 04:37, Richard Biener wrote:

The following avoids accessing out-of-bound vector elements when
native encoding a boolean vector with sub-BITS_PER_UNIT precision
elements.  The error was basing the number of elements to extract
on the rounded up total byte size involved and the patch bases
everything on the total number of elements to extract instead.

As a side-effect this now consistently results in zeros in the
padding of the last encoded byte which also avoids the failure
mode seen in PR113576.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR middle-end/113576
* fold-const.cc (native_encode_vector_part): Avoid accessing
out-of-bound elements.

OK.
jeff

[patch,avr,applied] Add two RTL peepholes.

2024-03-05 Thread Georg-Johann Lay


Register alloc may expand a 3-operand arithmetic X = Y o CST as
   X = CST
   X o= Y
where it may be better to instead:
   X = Y
   X o= CST

Johann

--

AVR: Add two RTL peepholes.

Register alloc may expand a 3-operand arithmetic X = Y o CST as
   X = CST
   X o= Y
where it may be better to instead:
   X = Y
   X o= CST
because 1) the first insn may use MOVW for "X = Y", and 2) the
operation may be more efficient when performed with a constant,
for example when ADIW or SBIW can be used, or some bytes of
the constant are 0x00 or 0xff.

gcc/
* config/avr/avr.md: Add two RTL peepholes for PLUS, IOR and AND
in HI, PSI, SI that swap operation order from "X = CST, X o= Y"
to "X = Y, X o= CST".diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 6bdf4682fab..bc8a59c956c 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -932,6 +932,55 @@ (define_peephole2 ; movw_r
 operands[5] = gen_rtx_REG (HImode, REGNO (operands[3]));
   })
 
+
+;; Register alloc may expand a 3-operand arithmetic X = Y o CST as
+;;X = CST
+;;X o= Y
+;; where it may be better to instead:
+;;X = Y
+;;X o= CST
+;; because 1) the first insn may use MOVW for "X = Y", and 2) the
+;; operation may be more efficient when performed with a constant,
+;; for example when ADIW or SBIW can be used, or some bytes of
+;; the constant are 0x00 or 0xff.
+(define_peephole2
+  [(parallel [(set (match_operand:HISI 0 "d_register_operand")
+   (match_operand:HISI 1 "const_int_operand"))
+  (clobber (reg:CC REG_CC))])
+   (parallel [(set (match_dup 0)
+   (piaop:HISI (match_dup 0)
+   (match_operand:HISI 2 "register_operand")))
+  (clobber (scratch:QI))
+  (clobber (reg:CC REG_CC))])]
+  "! reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(parallel [(set (match_dup 0)
+   (match_dup 2))
+  (clobber (reg:CC REG_CC))])
+   (parallel [(set (match_dup 0)
+   (piaop:HISI (match_dup 0)
+   (match_dup 1)))
+  (clobber (scratch:QI))
+  (clobber (reg:CC REG_CC))])])
+
+;; Same, but just for plus:HI without a scratch:QI.
+(define_peephole2
+  [(parallel [(set (match_operand:HI 0 "d_register_operand")
+   (match_operand:HI 1 "const_int_operand"))
+  (clobber (reg:CC REG_CC))])
+   (parallel [(set (match_dup 0)
+   (plus:HI (match_dup 0)
+(match_operand:HI 2 "register_operand")))
+  (clobber (reg:CC REG_CC))])]
+  "! reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(parallel [(set (match_dup 0)
+   (match_dup 2))
+  (clobber (reg:CC REG_CC))])
+   (parallel [(set (match_dup 0)
+   (plus:HI (match_dup 0)
+(match_dup 1)))
+  (clobber (reg:CC REG_CC))])])
+
+
 ;; For LPM loads from AS1 we split
 ;;R = *Z
 ;; to
@@ -1644,9 +1693,9 @@ (define_insn_and_split "*addhi3_sp"
   [(set_attr "length" "6")
(set_attr "adjust_len" "addto_sp")])
 
-;; "*addhi3"
-;; "*addhq3" "*adduhq3"
-;; "*addha3" "*adduha3"
+;; "*addhi3_split"
+;; "*addhq3_split"  "*adduhq3_split"
+;; "*addha3_split"  "*adduha3_split"
 (define_insn_and_split "*add3_split"
   [(set (match_operand:ALL2 0 "register_operand"   "=??r,d,!w,d")
 (plus:ALL2 (match_operand:ALL2 1 "register_operand"  "%0,0,0 ,0")
@@ -1661,6 +1710,9 @@ (define_insn_and_split "*add3_split"
   ""
   [(set_attr "isa" "*,*,adiw,*")])
 
+;; "*addhi3"
+;; "*addhq3"  "*adduhq3"
+;; "*addha3"  "*adduha3"
 (define_insn "*add3"
   [(set (match_operand:ALL2 0 "register_operand"   "=??r,d,!w,d")
 (plus:ALL2 (match_operand:ALL2 1 "register_operand"  "%0,0,0 ,0")
@@ -1732,6 +1784,9 @@ (define_insn_and_split "add3_clobber"
   (clobber (match_dup 3))
   (clobber (reg:CC REG_CC))])])
 
+;; "*addhi3_clobber"
+;; "*addhq3_clobber"  "*adduhq3_clobber"
+;; "*addha3_clobber"  "*adduha3_clobber"
 (define_insn "*add3_clobber"
   [(set (match_operand:ALL2 0 "register_operand""=!w,d,r")
 (plus:ALL2 (match_operand:ALL2 1 "register_operand"  "%0,0,0")

Re: CI for "Option handling: add documentation URLs"

2024-03-05 Thread David Malcolm

On Tue, 2024-03-05 at 13:06 +0100, Mark Wielaard wrote:
> Hi,
> 
> On Mon, 2024-03-04 at 08:48 -0500, David Malcolm wrote:
> > > I have now regenerated the patch to also include the new avr
> > > mfuse-
> > > add change. It would be nice to get this committed so we can turn
> > > on the
> > > automatic checker.
> > 
> > Please go ahead with that.
> 
> I committed that patch, but was not fast enough actually enabling the
> buildbot and missed another fixlet needed first.
> 
> OK, to push the attached regeneration patch?

Yes

Thanks
Dave

[PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-05 Thread Xi Ruoyao

Loops on named vector register are not vectorized (see comment 11 of
PR113622), so the these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
---

Tested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 --
 gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 ++
 gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 --
 gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 --
 4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c 
b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 
-fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #define F double
 #define I long long
 
 #include "vfcmp-f.c"
 
-/* { dg-final { scan-assembler 
"compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n"
 } } */
+/*
+** compare_quiet_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_quiet_not_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.cune.d(\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_signaling_greater:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.slt.d (\$vr[0-9]+),\2,\1
+** vst \3,\$r6,0
+**

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-05 Thread Ken Matsui

On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
 wrote:
>
> On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> >
> > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > equivalence does not hold for types other than signed integers.  When
> > it comes to conditions, the former was translated to a combination of
> > sub and test, whereas the latter was translated to a single cmp.
> > Thus, this optimization pass tries to optimize the former to the
> > latter.
> >
> > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > as defined behavior, specifically, wrapping around according to two's
> > complement arithmetic.  This has implications for optimizations that
> > rely on the standard behavior of signed integers, where overflow is
> > undefined.  Consider the example given:
> >
> > long long llmax = __LONG_LONG_MAX__;
> > long long llmin = -llmax - 1;
> >
> > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > for a `long long`, this calculation overflows in a defined manner
> > (wrapping around), which under `-fwrapv` is a legal operation that
> > produces a negative value due to two's complement wraparound.
> > Therefore, `llmax - llmin < 0` is true.
> >
> > However, the direct comparison `llmax < llmin` is false since `llmax`
> > is the maximum possible value and `llmin` is the minimum.  Hence,
> > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > enabled.  This is why this optimization pass is disabled under
> > `-fwrapv`.
> >
> > This optimization pass must run before the Jump Threading pass and the
> > VRP pass, as it may modify conditions. For example, in the VRP pass:
> >
> > (1)
> >   int diff = x - y;
> >   if (diff > 0)
> > foo();
> >   if (diff < 0)
> > bar();
> >
> > The second condition would be converted to diff != 0 in the VRP pass
> > because we know the postcondition of the first condition is diff <= 0,
> > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > after this VRP, we get:
> >
> > (2)
> >   int diff = x - y;
> >   if (x > y)
> > foo();
> >   if (diff != 0)
> > bar();
> >
> > This generates sub and test for the second condition and cmp for the
> > first condition. However, if we apply this pass beforehand, we simply
> > get:
> >
> > (3)
> >   int diff = x - y;
> >   if (x > y)
> > foo();
> >   if (x < y)
> > bar();
> >
> > In this code, diff will be eliminated as a dead code, and sub and test
> > will not be generated, which is more efficient.
> >
> > For the Jump Threading pass, without this optimization pass, (1) and
> > (3) above are recognized as different, which prevents TCO.
> >
> > PR middle-end/113680
>
> This shouldn't be done as a new optimization pass.  It fits either
> the explicit code present in the forwprop pass or a new match.pd
> pattern.  There's possible interaction with x - y value being used
> elsewhere and thus exposing a CSE opportunity as well as
> a comparison against zero being possibly implemented by
> a flag setting subtraction instruction.
>

Thank you so much for your review!  Although the forwprop pass runs
multiple times, we might not need to apply this optimization multiple
times.  Would it be acceptable to add such optimization?  More
generally, I would like to know how to determine where to put
optimization in the future.

FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse

> Our VN pass has some tricks to anticipate CSE opportunities
> like this, but it's not done "properly" in the way of anticipating
> both forms during PRE.
>
> I'll note we have
>
>  /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
>  (for cmp (ne ltgt)
>
> and similar which might be confused by canonicalizing to A != B.

I will investigate and update my patch (after my final exam ends...)!

> I'm also surprised we don't already have the pattern you add.

Hmm, so am I.  I saw that this optimization sometimes messes up VRP
while sometimes helping it (as I described in my comment).  I also
need to research this.

>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > * common.opt: Define ftree-cmp
> > * doc/invoke.texi: Document ftree-cmp.
> > * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > * passes.def (pass_cmp): New optimization pass.
> > * timevar.def (TV_TREE_CMP): New variable for timing.
> > * tree-pass.h (make_pass_cmp): New declaration.
> > * tree-ssa-cmp.cc: New

Re: [PATCH v2] LoongArch: Allow s9 as a register alias

2024-03-05 Thread chenglulu




在 2024/3/5 下午7:50, Xi Ruoyao 写道:

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

v1 -> v2: Add a test case.

Ok for trunk?

Ok. Thanks!


  gcc/config/loongarch/loongarch.h   | 1 +
  gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++
  2 files changed, 4 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
{ "t8",   20 + GP_REG_FIRST },\
{ "x",21 + GP_REG_FIRST },\
{ "fp",   22 + GP_REG_FIRST },\
+  { "s9",22 + GP_REG_FIRST },\
{ "s0",   23 + GP_REG_FIRST },\
{ "s1",   24 + GP_REG_FIRST },\
{ "s2",   25 + GP_REG_FIRST },\
diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
new file mode 100644
index 000..d2e3b80f83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
+register long fp asm("fp"); /* { dg-warning "register of 'fp' used for multiple 
global register variables" } */

Re: CI for "Option handling: add documentation URLs"

2024-03-05 Thread Mark Wielaard

Hi,

On Mon, 2024-03-04 at 08:48 -0500, David Malcolm wrote:
> > I have now regenerated the patch to also include the new avr mfuse-
> > add change. It would be nice to get this committed so we can turn on the
> > automatic checker.
> 
> Please go ahead with that.

I committed that patch, but was not fast enough actually enabling the
buildbot and missed another fixlet needed first.

OK, to push the attached regeneration patch?

Thanks,

Mark
From e5c2b9983d7c09e5a21fa587dc9cd03d53d67a23 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Tue, 5 Mar 2024 13:01:08 +0100
Subject: [PATCH] Regenerate c.opt.urls

Fixes: 08edf85f747b ("c++/modules: relax diagnostic about GMF contents")

gcc/c-family/ChangeLog:

	* c.opt.urls: Regenerate.
---
 gcc/c-family/c.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index 9f97dc61a778..631719863a5e 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -403,6 +403,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wformat)
 Wframe-address
 UrlSuffix(gcc/Warning-Options.html#index-Wframe-address)
 
+Wglobal-module
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wglobal-module)
+
 Wif-not-aligned
 UrlSuffix(gcc/Warning-Options.html#index-Wif-not-aligned)
 
-- 
2.44.0

[PATCH v2] LoongArch: Allow s9 as a register alias

2024-03-05 Thread Xi Ruoyao

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

v1 -> v2: Add a test case.

Ok for trunk?

 gcc/config/loongarch/loongarch.h   | 1 +
 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++
 2 files changed, 4 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
new file mode 100644
index 000..d2e3b80f83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
+register long fp asm("fp"); /* { dg-warning "register of 'fp' used for 
multiple global register variables" } */
-- 
2.44.0

[PATCH v3] testsuite: Add a test case for negating FP vectors containing zeros

2024-03-05 Thread Xi Ruoyao

Recently I've fixed two wrong FP vector negate implementation which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

- v1 -> v2: Remove { dg-do run } which may cause SIGILL.
- v2 -> v3: Add -fno-associative-math to fix an excessive warning on
  arm.

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..21fa00cfa15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fno-associative-math -fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0

[patch,avr,applied] Improve output of insn "*insv.any_shift.".

2024-03-05 Thread Georg-Johann Lay


Applied Roger's proposed improvements with some changes:

Lengthy code is more convenient in avr.cc than in an insn
output function, and it makes it easy to work out the exact
instruction length.  Moreover, the code can handle shifts
with offset zero (cases of *and3 insns).

Passed with no new regressions on ATmega128.

Applied as https://gcc.gnu.org/r14-9317

Johann

--

AVR: Improve output of insn "*insv.any_shift._split".

The instructions printed by insn "*insv.any_shift._split" were
sub-optimal.  The code to print the improved output is lengthy and
performed by new function avr_out_insv.  As it turns out, the function
can also handle shift-offsets of zero, which is "*andhi3", "*andpsi3"
and "*andsi3".  Thus, these tree insns get a new 3-operand alternative
where the 3rd operand is an exact power of 2.

gcc/
* config/avr/avr-protos.h (avr_out_insv): New proto.
* config/avr/avr.cc (avr_out_insv): New function.
(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
* config/avr/avr.md (define_attr "adjust_len") Add insv.
(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3):
Add constraint alternative where the 3rd operand is a power
of 2, and the source register may differ from the destination.
(*insv.any_shift._split): Call avr_out_insv to output
instructions.  Set attr "length" to "insv".
* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.

gcc/testsuite/
* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
* gcc.target/avr/torture/insv-anyshift-si.c: New test.
commit 49a1a340ea0eef681f23b6861f3cdb6840aadd99
Author: Roger Sayle 
Date:   Tue Mar 5 11:06:17 2024 +0100

AVR: Improve output of insn "*insv.any_shift._split".

The instructions printed by insn "*insv.any_shift._split" were
sub-optimal.  The code to print the improved output is lengthy and
performed by new function avr_out_insv.  As it turns out, the function
can also handle shift-offsets of zero, which is "*andhi3", "*andpsi3"
and "*andsi3".  Thus, these tree insns get a new 3-operand alternative
where the 3rd operand is an exact power of 2.

gcc/
* config/avr/avr-protos.h (avr_out_insv): New proto.
* config/avr/avr.cc (avr_out_insv): New function.
(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
* config/avr/avr.md (define_attr "adjust_len") Add insv.
(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3):
Add constraint alternative where the 3rd operand is a power
of 2, and the source register may differ from the destination.
(*insv.any_shift._split): Call avr_out_insv to output
instructions.  Set attr "length" to "insv".
* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.

gcc/testsuite/
* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
* gcc.target/avr/torture/insv-anyshift-si.c: New test.

diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 3e19409d636..bb680312117 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -58,6 +58,7 @@ extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
 extern const char *avr_out_insert_notbit (rtx_insn *, rtx*, int*);
+extern const char *avr_out_insv (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr_not (rtx_insn *, rtx*, int*);
 extern const char *avr_out_plus_set_ZN (rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c8b2b504e3f..36995e05cbe 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -9795,6 +9795,178 @@ avr_out_insert_notbit (rtx_insn *insn, rtx op[], int *plen)
 }
 
 
+/* Output instructions for  XOP[0] = (XOP[1]  XOP[2]) & XOP[3]  where
+   -  XOP[0] and XOP[1] have the same mode which is one of: QI, HI, PSI, SI.
+   -  XOP[3] is an exact const_int power of 2.
+   -  XOP[2] and XOP[3] are const_int.
+   -   is any of: ASHIFT, LSHIFTRT, ASHIFTRT.
+   -  The result depends on XOP[1].
+   or  XOP[0] = XOP[1] & XOP[2]  where
+   -  XOP[0] and XOP[1] have the same mode which is one of: HI, PSI, SI.
+   -  XOP[2] is an exact const_int power of 2.
+   Returns "".
+   PLEN != 0: Set *PLEN to the code length in words.  Don't output anything.
+   PLEN == 0: Output instructions.  */
+
+const char*
+avr_out_insv (rtx_insn *insn, rtx xop[], int *plen)
+{
+  machine_mode mode = GET_MODE (xop[0]);
+  int n_bytes = GET_MODE_SIZE (mode);
+  rtx xsrc = SET_SRC (single_set (insn));
+
+  gcc_assert (AND == GET_CODE (xsrc));
+
+  rtx xop2 = xop[2];
+  rtx xop3 = xop[3];
+
+

[PATCH] tree-optimization/114231 - use patterns for BB SLP discovery root stmts

2024-03-05 Thread Richard Biener

The following makes sure to use recognized patterns when vectorizing
roots during BB SLP discovery.  We need to apply those late since
during root discovery we've not yet done pattern recognition.
All parts of the vectorizer assume patterns get used, for the testcase
we mix this up when doing live lane computation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114231
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
processing a BB SLP root.

* gcc.dg/vect/pr114231.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr114231.c | 12 
 gcc/tree-vect-slp.cc |  4 
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr114231.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114231.c 
b/gcc/testsuite/gcc.dg/vect/pr114231.c
new file mode 100644
index 000..5e3a8103918
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114231.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+void f(long*);
+int ff[2];
+void f2(long, long, unsigned long);
+void k(unsigned long x, unsigned long y)
+{
+  long t = x >> ff[0];
+  long t1 = ff[1];
+  unsigned long t2 = y >> ff[0];
+  f2(t1, t+t2, t2);
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e8844044e88..734ec4f4284 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3634,6 +3634,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
   for (unsigned i = 0; i < bb_vinfo->roots.length (); ++i)
{
  vect_location = bb_vinfo->roots[i].roots[0]->stmt;
+ /* Apply patterns.  */
+ for (unsigned j = 0; j < bb_vinfo->roots[i].stmts.length (); ++j)
+   bb_vinfo->roots[i].stmts[j]
+ = vect_stmt_to_vectorize (bb_vinfo->roots[i].stmts[j]);
  if (vect_build_slp_instance (bb_vinfo, bb_vinfo->roots[i].kind,
   bb_vinfo->roots[i].stmts,
   bb_vinfo->roots[i].roots,
-- 
2.35.3

RE: RE:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-03-05 Thread Demin Han

OK, I will solve the comparison operation first and then do some check over 
other operations.



Regards,

Demin


From: juzhe.zh...@rivai.ai 
Sent: 2024年3月5日 17:02
To: Demin Han ; gcc-patches 

Cc: kito.cheng ; pan2.li ; jeffreyalaw 
; Robin Dapp ; richard.sandiford 

Subject: Re: RE:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of 
vec and imm

Yes. I think we are lacking some combine patterns to do all vector-scalar 
combinations.

If you are interested at this topic, you can do some investigations on that (I 
believe currently no body works on it for now).
I bet we should add some patterns for late-combine PASS for example:

(set (plus : (vec_duplicate) (reg)))


juzhe.zh...@rivai.ai

From: Demin Han
Date: 2024-03-05 16:40
To: 钟居哲; 
gcc-patches
CC: kito.cheng; Li, 
Pan2; jeffreyalaw; 
Robin Dapp; 
richard.sandiford
Subject: RE: Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of 
vec and imm
Hi,

I applied the mentioned last_combine 
patch(https://patchwork.ozlabs.org/project/gcc/patch/mptbka7em9w@arm.com/).
And did some initial tests.

Found that:

1.  Float vector-scalar and vector-imm are OK

2.  Integer vector-scalar is OK

3.  Integer vector-imm(e.g. a[i] > 16) is not OK.

When reaches last_combine pass, vec_duplicate(0x10) form is still kept, but no 
pattern match this now,

because  all scalar patterns  have “register_operand” predication.


I think MD file or expand function of rvv need to change for this situation.

Regards,
Demin

Re: [PATCH v2] testsuite, arm: Fix testcase arm/pr112337.c to check for the options first

2024-03-05 Thread Saurabh Jha


Ping

On 2/19/2024 10:11 AM, Saurabh Jha wrote:


On 2/9/2024 2:57 PM, Richard Earnshaw (lists) wrote:

On 30/01/2024 17:07, Saurabh Jha wrote:

Hey,

Previously, this test was added to fix this bug: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337. However, it did 
not check the compilation options before using them, leading to errors.


This patch fixes the test by first checking whether it can use the 
options before using them.


Tested for arm-none-eabi and found no regressions. The output of 
check-gcc with RUNTESTFLAGS="arm.exp=*" changed like this:


Before:
# of expected passes  5963
# of unexpected failures  64

After:
# of expected passes  5964
# of unexpected failures  63

Ok for master?

Regards,
Saurabh

gcc/testsuite/ChangeLog:

 * gcc.target/arm/pr112337.c: Check whether we can use the 
compilation options before using them.
My apologies for missing this earlier.  It didn't show up in 
patchwork. That's most likely because the attachment is a binary blob 
instead of text/plain.  That also means that the Linaro CI system 
hasn't seen this patch either.  Please can you fix your mailer to add 
plain text patch files.


-/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp 
-mfloat-abi=hard" } */

+/* { dg-require-effective-target arm_hard_ok } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-O2 -mfloat-abi=hard" } */
+/* { dg-add-options arm_v8_1m_mve } */

This is moving in the right direction, but it adds more than 
necessary now: checking for, and adding -mfloat-abi=hard is not 
necessary any more as arm_v8_1m_mve_ok will work out what float-abi 
flags are needed to make the options work. (What's more, it will 
prevent the test from running if the base configuration of the 
compiler is incompatible with the hard float ABI, which is more than 
we need.).


So please can you re-spin removing the hard-float check and removing 
that from dg-options.


Thanks,
R.


Hi Richard,

Agreed with your comments. Please find the patch with the suggested 
changes attached.


Regards,

Saurabh

Re: [PATCH v2] LoongArch: Fix inconsistent description in *sge_

2024-03-05 Thread Xi Ruoyao

On Tue, 2024-03-05 at 16:05 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
> 
> ---
> Update in v2:
>     Remove useless support for op[1] is const_imm12_operand.
> 
> ---
>  gcc/config/loongarch/loongarch.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..e35a001e0ed 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3360,7 +3360,7 @@ (define_insn "*sge_"
>   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
>    (const_int 1)))]
>    ""
> -  "slti\t%0,%.,%1"
> +  "slt\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])

Hmm, this define_insn seems never really used or it would generate
something like "sltui $r4,$r0,$r4" and trigger an assembler failure. 
The generic path seems already converting "x >= 1" to "x > 0".

So it seems we should just remove this define_insn?


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

RE: [PATCH v1] RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]

2024-03-05 Thread Li, Pan2

Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Tuesday, March 5, 2024 5:15 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; Wang, Yanzhang 
; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Cleanup unused code in riscv_v_adjust_bytesize 
[NFC]

LGTM. Thanks for clean up.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-03-05 16:59
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; 
yanzhang.wang; Pan Li
Subject: [PATCH v1] RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]
From: Pan Li mailto:pan2...@intel.com>>

Cleanup mode_size related code which is not used anymore. Below tests are
passed for this patch.

* The RVV fully regresssion test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_adjust_bytesize): Cleanup unused
mode_size related code.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>
---
gcc/config/riscv/riscv.cc | 4 
1 file changed, 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 56cd8d2c23f..691d967de29 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1527,10 +1527,6 @@ riscv_v_adjust_bytesize (machine_mode mode, int scale)
return BYTES_PER_RISCV_VECTOR;
   poly_int64 nunits = GET_MODE_NUNITS (mode);
-  poly_int64 mode_size = GET_MODE_SIZE (mode);
-
-  if (maybe_eq (mode_size, (uint16_t) -1))
- mode_size = riscv_vector_chunks * scale;
   if (nunits.coeffs[0] > 8)
return exact_div (nunits, 8);
--
2.34.1

Re: [PATCH v1] RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]

2024-03-05 Thread juzhe.zh...@rivai.ai

LGTM. Thanks for clean up.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-03-05 16:59
To: gcc-patches
CC: juzhe.zhong; kito.cheng; yanzhang.wang; Pan Li
Subject: [PATCH v1] RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]
From: Pan Li 
 
Cleanup mode_size related code which is not used anymore. Below tests are
passed for this patch.
 
* The RVV fully regresssion test.
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_v_adjust_bytesize): Cleanup unused
mode_size related code.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv.cc | 4 
1 file changed, 4 deletions(-)
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 56cd8d2c23f..691d967de29 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1527,10 +1527,6 @@ riscv_v_adjust_bytesize (machine_mode mode, int scale)
return BYTES_PER_RISCV_VECTOR;
   poly_int64 nunits = GET_MODE_NUNITS (mode);
-  poly_int64 mode_size = GET_MODE_SIZE (mode);
-
-  if (maybe_eq (mode_size, (uint16_t) -1))
- mode_size = riscv_vector_chunks * scale;
   if (nunits.coeffs[0] > 8)
return exact_div (nunits, 8);
-- 
2.34.1

Re: RE:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-03-05 Thread juzhe.zh...@rivai.ai

Yes. I think we are lacking some combine patterns to do all vector-scalar 
combinations.

If you are interested at this topic, you can do some investigations on that (I 
believe currently no body works on it for now).
I bet we should add some patterns for late-combine PASS for example:

(set (plus : (vec_duplicate) (reg))) 



juzhe.zh...@rivai.ai
 
From: Demin Han
Date: 2024-03-05 16:40
To: 钟居哲; gcc-patches
CC: kito.cheng; Li, Pan2; jeffreyalaw; Robin Dapp; richard.sandiford
Subject: RE: Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of 
vec and imm
Hi,
 
I applied the mentioned last_combine 
patch(https://patchwork.ozlabs.org/project/gcc/patch/mptbka7em9w@arm.com/).
And did some initial tests. 
 
Found that:
1.  Float vector-scalar and vector-imm are OK
2.  Integer vector-scalar is OK
3.  Integer vector-imm(e.g. a[i] > 16) is not OK.
When reaches last_combine pass, vec_duplicate(0x10) form is still kept, but no 
pattern match this now, 
because  all scalar patterns  have “register_operand” predication. 
 
I think MD file or expand function of rvv need to change for this situation.
 
Regards,
Demin

[PATCH v1] RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]

2024-03-05 Thread pan2 . li

From: Pan Li 

Cleanup mode_size related code which is not used anymore. Below tests are
passed for this patch.

* The RVV fully regresssion test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_adjust_bytesize): Cleanup unused
mode_size related code.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 56cd8d2c23f..691d967de29 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1527,10 +1527,6 @@ riscv_v_adjust_bytesize (machine_mode mode, int scale)
return BYTES_PER_RISCV_VECTOR;
 
   poly_int64 nunits = GET_MODE_NUNITS (mode);
-  poly_int64 mode_size = GET_MODE_SIZE (mode);
-
-  if (maybe_eq (mode_size, (uint16_t) -1))
-   mode_size = riscv_vector_chunks * scale;
 
   if (nunits.coeffs[0] > 8)
return exact_div (nunits, 8);
-- 
2.34.1

Re: [PATCH] bitint: Handle BIT_FIELD_REF lowering [PR114157]

2024-03-05 Thread Richard Biener

On Tue, 5 Mar 2024, Jakub Jelinek wrote:

> On Tue, Mar 05, 2024 at 09:27:22AM +0100, Richard Biener wrote:
> > On Tue, 5 Mar 2024, Jakub Jelinek wrote:
> > > The following patch adds support for BIT_FIELD_REF lowering with
> > > large/huge _BitInt lhs.  BIT_FIELD_REF requires mode argument first
> > > operand, so the operand shouldn't be any huge _BitInt.
> > > If we only access limbs from inside of BIT_FIELD_REF using constant
> > > indexes, we can just create a new BIT_FIELD_REF to extract the limb,
> > > but if we need to use variable index in a loop, I'm afraid we need
> > > to spill it into memory, which is what the following patch does.
> > 
> > :/
> > 
> > If it's only ever "small" _BitInt and we'd want to optimize we could
> > fully unroll the loop at code generation time and thus avoid the
> > variable indices?  You could also lower the BIT_FIELD_REF to
> > variable shifts & masking I suppose.
> 
> Not really sure if one can have some of the SVE/RISCV modes in there,
> that couldn't be small anymore.  But otherwise yes, likely right now at most
> 64 byte vectors aka 512 bits.  Now, if it is say extraction of _BitInt(448)
> out of it (so that it isn't just VCE instead), that would still mean
> e.g. on ia32 unrolling the loop with 7 iterations handling 2 limbs each.
> 14 is already huge I'm afraid especially when it can be hidden somewhere in
> the middle of a large expression which is all mergeable.
> But more importantly, currently there are simple rules, large _BitInt
> implies straight line code, huge _BitInt implies a loop and the loop handles
> just 2 limbs (for other operations just 1 limb) per iteration.  Changing
> that depending on what trees are somewhere used would be a nightmare.
> The idea was that if it is worth unrolling, unroller can unroll it later
> and at that point I'd think e.g. FRE would optimize away the temporary
> memory.

Yeah, I would also guess FRE would optimize it though the question is
whether the unroller heuristic anticipates it or the loop is small
enough.  I guess we can worry when it shows to be a problem.

> For variable shifts/masking I'd need some type in which I can do it.

Ah, sure ... OTOH somehow RTL expansion manages to do it ;)

> Sure, perhaps if the inner operand is a vector I could use some non-constant
> permutations or similar.

If the extraction is byte aligned sure, maybe if the extraction is
from a single limb then it can be lowered without an extra temporary.

Richard.

[PATCH, V3] ctf: fix incorrect CTF for multi-dimensional array types

2024-03-05 Thread Indu Bhagat

From: Cupertino Miranda 

[Changes from V2]
  - Fixed aarch64 new FAILs reported by Linaro CI.
  - Fixed typos and other nits pointed out in V2.
[End of changes from V2]

PR debug/114186

DWARF DIEs of type DW_TAG_subrange_type are linked together to represent
the information about the subsequent dimensions.  The CTF processing was
so far working through them in the opposite (incorrect) order.

While fixing the issue, refactor the code a bit for readability.

co-authored-By: Indu Bhagat 

gcc/
PR debug/114186
* dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array ()
in the correct order of the dimensions.
(gen_ctf_subrange_type): Refactor out handling of
DW_TAG_subrange_type DIE to here.

gcc/testsuite/
PR debug/114186
* gcc.dg/debug/ctf/ctf-array-6.c: Add test.
---

Testing notes:
 - Linaro CI reported three new FAILs introduced by ctf-array-6.c due to
   presence of char '#' on aarch64 where the ASM_COMMENT_START differs.
   Fixed and regression tested on aarch64.
 - Regression tested on x86_64-linux-gnu default target.
 - Regression tested for target bpf-unknown-none (btf.exp, ctf.exp, bpf.exp).
 - Kernel build with -gctf shows healthier CTF types for arrays.

---
 gcc/dwarf2ctf.cc | 158 +--
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c |  14 ++
 2 files changed, 89 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c

diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index dca86edfffa9..77d6bf896893 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -349,105 +349,97 @@ gen_ctf_pointer_type (ctf_container_ref ctfc, dw_die_ref 
ptr_type)
   return ptr_type_id;
 }
 
-/* Generate CTF for an array type.  */
+/* Recursively generate CTF for array dimensions starting at DIE C (of type
+   DW_TAG_subrange_type) until DIE LAST (of type DW_TAG_subrange_type) is
+   reached.  ARRAY_ELEMS_TYPE_ID is base type for the array.  */
 
 static ctf_id_t
-gen_ctf_array_type (ctf_container_ref ctfc, dw_die_ref array_type)
+gen_ctf_subrange_type (ctf_container_ref ctfc, ctf_id_t array_elems_type_id,
+  dw_die_ref c, dw_die_ref last)
 {
-  dw_die_ref c;
-  ctf_id_t array_elems_type_id = CTF_NULL_TYPEID;
+  ctf_arinfo_t arinfo;
+  ctf_id_t array_node_type_id = CTF_NULL_TYPEID;
 
-  int vector_type_p = get_AT_flag (array_type, DW_AT_GNU_vector);
-  if (vector_type_p)
-return array_elems_type_id;
+  dw_attr_node *upper_bound_at;
+  dw_die_ref array_index_type;
+  uint32_t array_num_elements;
 
-  dw_die_ref array_elems_type = ctf_get_AT_type (array_type);
+  if (dw_get_die_tag (c) == DW_TAG_subrange_type)
+{
+  /* When DW_AT_upper_bound is used to specify the size of an
+array in DWARF, it is usually an unsigned constant
+specifying the upper bound index of the array.  However,
+for unsized arrays, such as foo[] or bar[0],
+DW_AT_upper_bound is a signed integer constant
+instead.  */
+
+  upper_bound_at = get_AT (c, DW_AT_upper_bound);
+  if (upper_bound_at
+ && AT_class (upper_bound_at) == dw_val_class_unsigned_const)
+   /* This is the upper bound index.  */
+   array_num_elements = get_AT_unsigned (c, DW_AT_upper_bound) + 1;
+  else if (get_AT (c, DW_AT_count))
+   array_num_elements = get_AT_unsigned (c, DW_AT_count);
+  else
+   {
+ /* This is a VLA of some kind.  */
+ array_num_elements = 0;
+   }
+}
+  else
+gcc_unreachable ();
 
-  /* First, register the type of the array elements if needed.  */
-  array_elems_type_id = gen_ctf_type (ctfc, array_elems_type);
+  /* Ok, mount and register the array type.  Note how the array
+ type we register here is the type of the elements in
+ subsequent "dimensions", if there are any.  */
+  arinfo.ctr_nelems = array_num_elements;
 
-  /* DWARF array types pretend C supports multi-dimensional arrays.
- So for the type int[N][M], the array type DIE contains two
- subrange_type children, the first with upper bound N-1 and the
- second with upper bound M-1.
+  array_index_type = ctf_get_AT_type (c);
+  arinfo.ctr_index = gen_ctf_type (ctfc, array_index_type);
 
- CTF, on the other hand, just encodes each array type in its own
- array type CTF struct.  Therefore we have to iterate on the
- children and create all the needed types.  */
+  if (c == last)
+arinfo.ctr_contents = array_elems_type_id;
+  else
+arinfo.ctr_contents = gen_ctf_subrange_type (ctfc, array_elems_type_id,
+dw_get_die_sib (c), last);
 
-  c = dw_get_die_child (array_type);
-  gcc_assert (c);
-  do
-{
-  ctf_arinfo_t arinfo;
-  dw_die_ref array_index_type;
-  uint32_t array_num_elements;
+  if (!ctf_type_exists (ctfc, c, _node_type_id))
+array_node_type_id = ctf_add_array (ctfc, CTF_ADD_ROOT, , c);
 
-  c = dw_get_die_sib

Re: [PATCH] lower-subreg: Fix ROTATE handling [PR114211]

2024-03-05 Thread Jakub Jelinek

On Tue, Mar 05, 2024 at 09:29:38AM +0100, Richard Biener wrote:
> I wonder if we need to care about extra temporaries on RTL before
> RA, thus whether always using a temporary would be OK?

I'd still need to do the resolve_reg_p check, otherwise if it is e.g. a
memory, the copying to temporary would be significantly harder because
src isn't a CONCATN in that case.
I think the reg_overlap_mentioned_p is worth it, because the normal
case is that there is no overlap, usually expansion forces distinct
pseudos.  If this was more common problem, it wouldn't take so long
to discover it and wouldn't need -fno-tree-coalesce-vars.
And the extra pseudo could result in different RA decisions.

Jakub

Re: [PATCH] bitint: Handle BIT_FIELD_REF lowering [PR114157]

2024-03-05 Thread Jakub Jelinek

On Tue, Mar 05, 2024 at 09:27:22AM +0100, Richard Biener wrote:
> On Tue, 5 Mar 2024, Jakub Jelinek wrote:
> > The following patch adds support for BIT_FIELD_REF lowering with
> > large/huge _BitInt lhs.  BIT_FIELD_REF requires mode argument first
> > operand, so the operand shouldn't be any huge _BitInt.
> > If we only access limbs from inside of BIT_FIELD_REF using constant
> > indexes, we can just create a new BIT_FIELD_REF to extract the limb,
> > but if we need to use variable index in a loop, I'm afraid we need
> > to spill it into memory, which is what the following patch does.
> 
> :/
> 
> If it's only ever "small" _BitInt and we'd want to optimize we could
> fully unroll the loop at code generation time and thus avoid the
> variable indices?  You could also lower the BIT_FIELD_REF to
> variable shifts & masking I suppose.

Not really sure if one can have some of the SVE/RISCV modes in there,
that couldn't be small anymore.  But otherwise yes, likely right now at most
64 byte vectors aka 512 bits.  Now, if it is say extraction of _BitInt(448)
out of it (so that it isn't just VCE instead), that would still mean
e.g. on ia32 unrolling the loop with 7 iterations handling 2 limbs each.
14 is already huge I'm afraid especially when it can be hidden somewhere in
the middle of a large expression which is all mergeable.
But more importantly, currently there are simple rules, large _BitInt
implies straight line code, huge _BitInt implies a loop and the loop handles
just 2 limbs (for other operations just 1 limb) per iteration.  Changing
that depending on what trees are somewhere used would be a nightmare.
The idea was that if it is worth unrolling, unroller can unroll it later
and at that point I'd think e.g. FRE would optimize away the temporary
memory.

For variable shifts/masking I'd need some type in which I can do it.
Sure, perhaps if the inner operand is a vector I could use some non-constant
permutations or similar.

Jakub

Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-03-05 Thread Richard Biener

On Tue, Mar 5, 2024 at 8:09 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > I do wonder what the existing usadd patterns with integer vector modes
> > in various targets do?
> > Those define_insn will at least not end up in the optab set I guess,
> > so they must end up
> > being either unused or used by explicit gen_* (via intrinsic
> > functions?) or by combine?
>
> For usadd with vector modes, I think the backend like RISC-V try to leverage 
> instructions
> like Vector Single-Width Saturating Add(aka vsaddu.vv/x/i).
>
> > I think simply changing gen_*_fixed_libfunc to gen_int_libfunc won't
> > work.  Since there's
> > no libgcc support I'd leave it as gen_*_fixed_libfunc thus no library
> > fallback for integers?
>
> Change to gen_int_libfunc follows other int optabs. I am not sure if it will 
> hit the standard name usaddm3 for vector mode.
> But the happy path for scalar modes works up to a point, please help to 
> correct me if any misunderstanding.

gen_int_libfunc will no longer make it emit libcalls for fixed point
modes, so this can't be correct
and there's no libgcc implementation for integer mode saturating ops,
so it's pointless to emit calls
to them.

> #0  riscv_expand_usadd (dest=0x76a8c7c8, x=0x76a8c798, 
> y=0x76a8c7b0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:10662
> #1  0x029f142a in gen_usaddsi3 (operand0=0x76a8c7c8, 
> operand1=0x76a8c798, operand2=0x76a8c7b0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.md:3848
> #2  0x01751e60 in insn_gen_fn::operator() rtx_def*> (this=0x4910e70 ) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/recog.h:441
> #3  0x0180f553 in maybe_gen_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> ops=0x7fffd2c0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8232
> #4  0x0180fa42 in maybe_expand_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> ops=0x7fffd2c0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8275
> #5  0x0180fade in expand_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> ops=0x7fffd2c0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8306
> #6  0x015cebdc in expand_fn_using_insn (stmt=0x76a36480, 
> icode=CODE_FOR_usaddsi3, noutputs=1, ninputs=2) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.cc:254
> #7  0x015de146 in expand_direct_optab_fn (fn=IFN_SAT_ADD, 
> stmt=0x76a36480, optab=usadd_optab, nargs=2) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.cc:3818
> #8  0x015e3610 in expand_SAT_ADD (fn=IFN_SAT_ADD, 
> stmt=0x76a36480) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.def:278
> #9  0x015e65b6 in expand_internal_call (fn=IFN_SAT_ADD, 
> stmt=0x76a36480) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.cc:4914
> #10 0x015e65e5 in expand_internal_call (stmt=0x76a36480) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.cc:4922
> #11 0x01248c8f in expand_call_stmt (stmt=0x76a36480) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:2771
> #12 0x0124d392 in expand_gimple_stmt_1 (stmt=0x76a36480) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:3932
> #13 0x0124d9aa in expand_gimple_stmt (stmt=0x76a36480) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:4077
> #14 0x0124dac4 in expand_gimple_tailcall (bb=0x76dddae0, 
> stmt=0x76a36480, can_fallthru=0x7fffd800) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:4123
> #15 0x0125636b in expand_gimple_basic_block (bb=0x76dddae0, 
> disable_tail_calls=false) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:6107
> #16 0x01258a1a in (anonymous namespace)::pass_expand::execute 
> (this=0x556d180, fun=0x76a7f2e0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:6872
> #17 0x01873565 in execute_one_pass (pass=0x556d180) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/passes.cc:2646
> #18 0x01873948 in execute_pass_list_1 (pass=0x556d180) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/passes.cc:2755
> #19 0x018739d6 in execute_pass_list (fn=0x76a7f2e0, 
> pass=0x5568870) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/passes.cc:2766
> #20 0x012bc975 in cgraph_node::expand (this=0x76c2c880) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cgraphunit.cc:1845
> #21

RE: Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-03-05 Thread Demin Han

Hi,

I applied the mentioned last_combine 
patch(https://patchwork.ozlabs.org/project/gcc/patch/mptbka7em9w@arm.com/).
And did some initial tests.

Found that:

1.  Float vector-scalar and vector-imm are OK

2.  Integer vector-scalar is OK

3.  Integer vector-imm(e.g. a[i] > 16) is not OK.

When reaches last_combine pass, vec_duplicate(0x10) form is still kept, but no 
pattern match this now,

because  all scalar patterns  have “register_operand” predication.


I think MD file or expand function of rvv need to change for this situation.

Regards,
Demin

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-05 Thread Richard Biener

On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
>
> (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> equivalent to y CMP x.  As reported in PR middle-end/113680, this
> equivalence does not hold for types other than signed integers.  When
> it comes to conditions, the former was translated to a combination of
> sub and test, whereas the latter was translated to a single cmp.
> Thus, this optimization pass tries to optimize the former to the
> latter.
>
> When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> as defined behavior, specifically, wrapping around according to two's
> complement arithmetic.  This has implications for optimizations that
> rely on the standard behavior of signed integers, where overflow is
> undefined.  Consider the example given:
>
> long long llmax = __LONG_LONG_MAX__;
> long long llmin = -llmax - 1;
>
> Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> for a `long long`, this calculation overflows in a defined manner
> (wrapping around), which under `-fwrapv` is a legal operation that
> produces a negative value due to two's complement wraparound.
> Therefore, `llmax - llmin < 0` is true.
>
> However, the direct comparison `llmax < llmin` is false since `llmax`
> is the maximum possible value and `llmin` is the minimum.  Hence,
> optimizations that rely on the equivalence of `(x - y) CMP 0` to
> `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> enabled.  This is why this optimization pass is disabled under
> `-fwrapv`.
>
> This optimization pass must run before the Jump Threading pass and the
> VRP pass, as it may modify conditions. For example, in the VRP pass:
>
> (1)
>   int diff = x - y;
>   if (diff > 0)
> foo();
>   if (diff < 0)
> bar();
>
> The second condition would be converted to diff != 0 in the VRP pass
> because we know the postcondition of the first condition is diff <= 0,
> and then diff != 0 is cheaper than diff < 0. If we apply this pass
> after this VRP, we get:
>
> (2)
>   int diff = x - y;
>   if (x > y)
> foo();
>   if (diff != 0)
> bar();
>
> This generates sub and test for the second condition and cmp for the
> first condition. However, if we apply this pass beforehand, we simply
> get:
>
> (3)
>   int diff = x - y;
>   if (x > y)
> foo();
>   if (x < y)
> bar();
>
> In this code, diff will be eliminated as a dead code, and sub and test
> will not be generated, which is more efficient.
>
> For the Jump Threading pass, without this optimization pass, (1) and
> (3) above are recognized as different, which prevents TCO.
>
> PR middle-end/113680

This shouldn't be done as a new optimization pass.  It fits either
the explicit code present in the forwprop pass or a new match.pd
pattern.  There's possible interaction with x - y value being used
elsewhere and thus exposing a CSE opportunity as well as
a comparison against zero being possibly implemented by
a flag setting subtraction instruction.

Our VN pass has some tricks to anticipate CSE opportunities
like this, but it's not done "properly" in the way of anticipating
both forms during PRE.

I'll note we have

 /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
 (for cmp (ne ltgt)

and similar which might be confused by canonicalizing to A != B.
I'm also surprised we don't already have the pattern you add.

Richard.

> gcc/ChangeLog:
>
> * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> * common.opt: Define ftree-cmp
> * doc/invoke.texi: Document ftree-cmp.
> * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> * passes.def (pass_cmp): New optimization pass.
> * timevar.def (TV_TREE_CMP): New variable for timing.
> * tree-pass.h (make_pass_cmp): New declaration.
> * tree-ssa-cmp.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr113680.c: New test.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/Makefile.in |   1 +
>  gcc/common.opt  |   4 +
>  gcc/doc/invoke.texi |  11 +-
>  gcc/opts.cc |   1 +
>  gcc/passes.def  |   3 +
>  gcc/testsuite/gcc.dg/pr113680.c |  47 ++
>  gcc/timevar.def |   1 +
>  gcc/tree-pass.h |   1 +
>  gcc/tree-ssa-cmp.cc | 262 
>  9 files changed, 330 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
>  create mode 100644 gcc/tree-ssa-cmp.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a74761b7ab3..935b80b6947 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1731,6 +1731,7 @@ OBJS = \

Re: [PATCH] lower-subreg: Fix ROTATE handling [PR114211]

2024-03-05 Thread Richard Biener

On Tue, 5 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase, we have
> (insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
> (rotate:TI (reg/v:TI 106 [ h ])
> (const_int 64 [0x40]))) "pr114211.c":8:5 1042 
> {rotl64ti2_doubleword}
>  (nil))
> before subreg1 and the pass decides to use
> (reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
> register pair instead of (reg/v:TI 106 [ h ]).
> resolve_operand_for_swap_move_operator implements it by pretending it is
> an assignment from
> (concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
> to
> (concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
> The problem is that if the rotate argument is the same as destination or
> if there is even an overlap between the first half of the destination with
> second half of the source we emit incorrect code, because the store to
> (reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
> move.  THe following patch detects that case and uses a temporary pseudo
> to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

I wonder if we need to care about extra temporaries on RTL before
RA, thus whether always using a temporary would be OK?

Thanks,
Richard.

> 2024-03-05  Jakub Jelinek  
> 
>   PR rtl-optimization/114211
>   * lower-subreg.cc (resolve_simple_move): For double-word
>   rotates by BITS_PER_WORD if there is overlap between source
>   and destination use a temporary.
> 
>   * gcc.dg/pr114211.c: New test.
> 
> --- gcc/lower-subreg.cc.jj2024-01-03 11:51:33.713700906 +0100
> +++ gcc/lower-subreg.cc   2024-03-04 20:29:13.911428988 +0100
> @@ -927,6 +927,21 @@ resolve_simple_move (rtx set, rtx_insn *
>SRC's operator.  */
> dest = resolve_operand_for_swap_move_operator (dest);
> src = src_op;
> +   if (resolve_reg_p (src))
> + {
> +   gcc_assert (GET_CODE (src) == CONCATN);
> +   if (reg_overlap_mentioned_p (XVECEXP (dest, 0, 0),
> +XVECEXP (src, 0, 1)))
> + {
> +   /* If there is overlap betwee the first half of the
> +  destination and what will be stored to the second one,
> +  use a temporary pseudo.  See PR114211.  */
> +   rtx tem = gen_reg_rtx (GET_MODE (XVECEXP (src, 0, 1)));
> +   emit_move_insn (tem, XVECEXP (src, 0, 1));
> +   src = copy_rtx (src);
> +   XVECEXP (src, 0, 1) = tem;
> + }
> + }
>   }
>else if (resolve_reg_p (src_op))
>   {
> --- gcc/testsuite/gcc.dg/pr114211.c.jj2024-03-04 20:37:58.735339443 
> +0100
> +++ gcc/testsuite/gcc.dg/pr114211.c   2024-03-04 20:37:33.78077 +0100
> @@ -0,0 +1,23 @@
> +/* PR rtl-optimization/114211 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O -fno-tree-coalesce-vars -Wno-psabi" } */
> +
> +typedef unsigned __int128 V __attribute__((__vector_size__ (16)));
> +unsigned int u;
> +V v;
> +
> +V
> +foo (unsigned __int128 h)
> +{
> +  h = h << 64 | h >> 64;
> +  h *= ~u;
> +  return h + v;
> +}
> +
> +int
> +main ()
> +{
> +  V x = foo (1);
> +  if (x[0] != (unsigned __int128) 0x << 64)
> +__builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] bitint: Handle BIT_FIELD_REF lowering [PR114157]

2024-03-05 Thread Richard Biener

On Tue, 5 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds support for BIT_FIELD_REF lowering with
> large/huge _BitInt lhs.  BIT_FIELD_REF requires mode argument first
> operand, so the operand shouldn't be any huge _BitInt.
> If we only access limbs from inside of BIT_FIELD_REF using constant
> indexes, we can just create a new BIT_FIELD_REF to extract the limb,
> but if we need to use variable index in a loop, I'm afraid we need
> to spill it into memory, which is what the following patch does.

:/

If it's only ever "small" _BitInt and we'd want to optimize we could
fully unroll the loop at code generation time and thus avoid the
variable indices?  You could also lower the BIT_FIELD_REF to
variable shifts & masking I suppose.

Not sure if it's worth the trouble though.

> If there is some bitwise type for the extraction, it extracts just
> what we need and not more than that, otherwise it spills the whole
> first argument of BIT_FIELD_REF and uses MEM_REF with an offset
> with VIEW_CONVERT_EXPR around it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-03-05  Jakub Jelinek  
> 
>   PR middle-end/114157
>   * gimple-lower-bitint.cc: Include stor-layout.h.
>   (mergeable_op): Return true for BIT_FIELD_REF.
>   (struct bitint_large_huge): Declare handle_bit_field_ref method.
>   (bitint_large_huge::handle_bit_field_ref): New method.
>   (bitint_large_huge::handle_stmt): Use it for BIT_FIELD_REF.
> 
>   * gcc.dg/bitint-98.c: New test.
>   * gcc.target/i386/avx2-pr114157.c: New test.
>   * gcc.target/i386/avx512f-pr114157.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-03-04 11:14:57.450288563 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-04 18:51:06.833008534 +0100
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfgcleanup.h"
>  #include "tree-switch-conversion.h"
>  #include "ubsan.h"
> +#include "stor-layout.h"
>  #include "gimple-lower-bitint.h"
>  
>  /* Split BITINT_TYPE precisions in 4 categories.  Small _BitInt, where
> @@ -212,6 +213,7 @@ mergeable_op (gimple *stmt)
>  case BIT_NOT_EXPR:
>  case SSA_NAME:
>  case INTEGER_CST:
> +case BIT_FIELD_REF:
>return true;
>  case LSHIFT_EXPR:
>{
> @@ -435,6 +437,7 @@ struct bitint_large_huge
>tree handle_plus_minus (tree_code, tree, tree, tree);
>tree handle_lshift (tree, tree, tree);
>tree handle_cast (tree, tree, tree);
> +  tree handle_bit_field_ref (tree, tree);
>tree handle_load (gimple *, tree);
>tree handle_stmt (gimple *, tree);
>tree handle_operand_addr (tree, gimple *, int *, int *);
> @@ -1685,6 +1688,86 @@ bitint_large_huge::handle_cast (tree lhs
>return NULL_TREE;
>  }
>  
> +/* Helper function for handle_stmt method, handle a BIT_FIELD_REF.  */
> +
> +tree
> +bitint_large_huge::handle_bit_field_ref (tree op, tree idx)
> +{
> +  if (tree_fits_uhwi_p (idx))
> +{
> +  if (m_first)
> + m_data.safe_push (NULL);
> +  ++m_data_cnt;
> +  unsigned HOST_WIDE_INT sz = tree_to_uhwi (TYPE_SIZE (m_limb_type));
> +  tree bfr = build3 (BIT_FIELD_REF, m_limb_type,
> +  TREE_OPERAND (op, 0),
> +  TYPE_SIZE (m_limb_type),
> +  size_binop (PLUS_EXPR, TREE_OPERAND (op, 2),
> +  bitsize_int (tree_to_uhwi (idx) * sz)));
> +  tree r = make_ssa_name (m_limb_type);
> +  gimple *g = gimple_build_assign (r, bfr);
> +  insert_before (g);
> +  tree type = limb_access_type (TREE_TYPE (op), idx);
> +  if (!useless_type_conversion_p (type, m_limb_type))
> + r = add_cast (type, r);
> +  return r;
> +}
> +  tree var;
> +  if (m_first)
> +{
> +  unsigned HOST_WIDE_INT sz = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (op)));
> +  machine_mode mode;
> +  tree type, bfr;
> +  if (bitwise_mode_for_size (sz).exists ()
> +   && known_eq (GET_MODE_BITSIZE (mode), sz))
> + type = bitwise_type_for_mode (mode);
> +  else
> + {
> +   mode = VOIDmode;
> +   type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (op, 0)));
> + }
> +  if (TYPE_ALIGN (type) < TYPE_ALIGN (TREE_TYPE (op)))
> + type = build_aligned_type (type, TYPE_ALIGN (TREE_TYPE (op)));
> +  var = create_tmp_var (type);
> +  TREE_ADDRESSABLE (var) = 1;
> +  gimple *g;
> +  if (mode != VOIDmode)
> + {
> +   bfr = build3 (BIT_FIELD_REF, type, TREE_OPERAND (op, 0),
> + TYPE_SIZE (type), TREE_OPERAND (op, 2));
> +   g = gimple_build_assign (make_ssa_name (type),
> +BIT_FIELD_REF, bfr);
> +   gimple_set_location (g, m_loc);
> +   gsi_insert_after (_init_gsi, g, GSI_NEW_STMT);
> +   bfr = gimple_assign_lhs (g);
> + }
> +  else
> + bfr = TREE_OPERAND (op, 0);
> +  g = gimple_build_assign (var, bfr);
> +

[PATCH v2] LoongArch: Fix inconsistent description in *sge_

2024-03-05 Thread Guo Jie

The constraint of op[1] is inconsistent with the output template.

gcc/ChangeLog:

* config/loongarch/loongarch.md
(define_insn "*sge_"): Fix inconsistency
error.

---
Update in v2:
Remove useless support for op[1] is const_imm12_operand.

---
 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index f3b5c641fce..e35a001e0ed 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3360,7 +3360,7 @@ (define_insn "*sge_"
(any_ge:GPR (match_operand:X 1 "register_operand" "r")
 (const_int 1)))]
   ""
-  "slti\t%0,%.,%1"
+  "slt\t%0,%.,%1"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
-- 
2.20.1

76 matches

Mail list logo