date:20240321

[PATCH] Another ICE after conflicting types of redeclaration [PR109619]

2024-03-21 Thread Andrew Pinski

This another one of these ICE after error issues with the
gimplifier and a fallout from r12-3278-g823685221de986af.
This case happens when we are trying to fold memcpy/memmove.
There is already code to try to catch ERROR_MARKs as arguments
to the builtins so just need to change them to use error_operand_p
which checks the type of the expression to see if it was an error mark
also.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR c/109619
* builtins.cc (fold_builtin_1): Use error_operand_p
instead of checking against ERROR_MARK.
(fold_builtin_2): Likewise.
(fold_builtin_3): Likewise.

gcc/testsuite/ChangeLog:

PR c/109619
* gcc.dg/redecl-26.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/builtins.cc  | 12 ++--
 gcc/testsuite/gcc.dg/redecl-26.c | 14 ++
 2 files changed, 20 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/redecl-26.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eda8bea9c4b..bb74b5cbcd6 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10461,7 +10461,7 @@ fold_builtin_1 (location_t loc, tree expr, tree fndecl, 
tree arg0)
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
 
-  if (TREE_CODE (arg0) == ERROR_MARK)
+  if (error_operand_p (arg0))
 return NULL_TREE;
 
   if (tree ret = fold_const_call (as_combined_fn (fcode), type, arg0))
@@ -10601,8 +10601,8 @@ fold_builtin_2 (location_t loc, tree expr, tree fndecl, 
tree arg0, tree arg1)
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
 
-  if (TREE_CODE (arg0) == ERROR_MARK
-  || TREE_CODE (arg1) == ERROR_MARK)
+  if (error_operand_p (arg0)
+  || error_operand_p (arg1))
 return NULL_TREE;
 
   if (tree ret = fold_const_call (as_combined_fn (fcode), type, arg0, arg1))
@@ -10693,9 +10693,9 @@ fold_builtin_3 (location_t loc, tree fndecl,
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
 
-  if (TREE_CODE (arg0) == ERROR_MARK
-  || TREE_CODE (arg1) == ERROR_MARK
-  || TREE_CODE (arg2) == ERROR_MARK)
+  if (error_operand_p (arg0)
+  || error_operand_p (arg1)
+  || error_operand_p (arg2))
 return NULL_TREE;
 
   if (tree ret = fold_const_call (as_combined_fn (fcode), type,
diff --git a/gcc/testsuite/gcc.dg/redecl-26.c b/gcc/testsuite/gcc.dg/redecl-26.c
new file mode 100644
index 000..5f8889c4c39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-26.c
@@ -0,0 +1,14 @@
+/* We used to ICE while folding memcpy and memmove.
+   PR c/109619. */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int *a1, *a2;
+
+void foo(__SIZE_TYPE__ a3) /* { dg-note "" }  */
+{
+  __builtin_memcpy(a1, a2, a3);
+  __builtin_memmove(a1, a2, a3);
+  int *a3; /* { dg-error "redeclared as different kind of symbol" } */
+}
+
-- 
2.43.0

Re: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for XTheadVector

2024-03-21 Thread Bruce Hoult

> The effect is demonstrated by a new test case that shows
that the by-pieces framework now emits `sb` instructions
instead of triggering an ICE

So these small memset() now don't use RVV at all if xtheadvector is enabled?

I don't have evidence whether the use of RVV (whether V or
xtheadvector) for these memsets is a win or not, but the treatment
should probably be consistent.

I don't know why RVV 1.0 uses a fractional LMUL at all here. It would
work perfectly well with LMUL=1 and just setting vl to the appropriate
length (which is always less than 16 bytes). Use of fractional LMUL
doesn't save any resources.

On Fri, Mar 22, 2024 at 12:46 PM Christoph Müllner
 wrote:
>
> The expansion of `memset` (via expand_builtin_memset_args())
> uses clear_by_pieces() and store_by_pieces() to avoid calls
> to the C runtime. To check if a type can be used for that purpose
> the function by_pieces_mode_supported_p() tests if a `mov` and
> a `vec_duplicate` INSN can be expaned by the backend.
>
> The `vec_duplicate` expansion takes arguments of type `V_VLS`.
> The `mov` expansions take arguments of type `V`, `VB`, `VT`,
> `VLS_AVL_IMM`, and `VLS_AVL_REG`. Some of these types (in fact
> not types but type iterators) include fractional LMUL types.
> E.g. `V_VLS` includes `V`, which includes `VI`, which includes
> `RVVMF2QI`.
>
> This results in an attempt to use fractional LMUL-types for
> the `memset` expansion resulting in an ICE for XTheadVector,
> because that extension cannot handle fractional LMULs.
>
> This patch addresses this issue by splitting the definition
> of the `VI` mode itereator into `VI_NOFRAC` (without fractional
> LMUL types) and `VI_FRAC` (only fractional LMUL types).
> Further, it defines `V_VLS` such, that `VI_FRAC` types are only
> included if XTheadVector is not enabled.
>
> The effect is demonstrated by a new test case that shows
> that the by-pieces framework now emits `sb` instructions
> instead of triggering an ICE.
>
> Signed-off-by: Christoph Müllner 
>
> PR 114194
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Split VI into VI_FRAC and 
> VI_NOFRAC.
> Only include VI_NOFRAC in V_VLS without TARGET_XTHEADVECTOR.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/xtheadvector/pr114194.c: New test.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/config/riscv/vector-iterators.md  | 19 +--
>  .../riscv/rvv/xtheadvector/pr114194.c | 56 +++
>  2 files changed, 69 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index c2ea7e8b10a..a24e1bf078f 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -108,17 +108,24 @@ (define_c_enum "unspecv" [
>UNSPECV_FRM_RESTORE_EXIT
>  ])
>
> -(define_mode_iterator VI [
> -  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
> -
> -  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> -
> -  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
> +;; Subset of VI with fractional LMUL types
> +(define_mode_iterator VI_FRAC [
> +  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
> +  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> +  (RVVMF2SI "TARGET_MIN_VLEN > 32")
> +])
>
> +;; Subset of VI with non-fractional LMUL types
> +(define_mode_iterator VI_NOFRAC [
> +  RVVM8QI RVVM4QI RVVM2QI RVVM1QI
> +  RVVM8HI RVVM4HI RVVM2HI RVVM1HI
> +  RVVM8SI RVVM4SI RVVM2SI RVVM1SI
>(RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
>(RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
>  ])
>
> +(define_mode_iterator VI [ VI_NOFRAC (VI_FRAC "!TARGET_XTHEADVECTOR") ])
> +
>  ;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
>  ;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
>  ;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> new file mode 100644
> index 000..fc2d1349425
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_xtheadvector" { target { rv32 } } } */
> +/* { dg-options "-march=rv64gc_xtheadvector" { target { rv64 } } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** foo0_1:
> +** sb\tzero,0([a-x0-9]+)
> +** ret
> +*/
> +void foo0_1 (void *p)
> +{
> +  __builtin_memset (p, 0, 1);
> +}
> +
> +/*
> +** foo0_7:
> +** sb\tzero,0([a-x0-9]+)
> +** sb\tzero,1([a-x0-9]+)
> +** sb\tzero,2([a-x0-9]+)
> +** sb\tzero,3([a-x0-9]+)
> +** sb\tzero,4([a-x0-9]+)
> +** sb\tzero,5([a-x0-9]+)
> +**

[committed] Fix RISC-V missing stack tie

2024-03-21 Thread Jeff Law



As some of you know, Raphael has been working on stack-clash support for 
the RISC-V port.  A little while ago Florian reached out to us with an 
issue where glibc was failing its smoke test due to referencing an 
unallocated stack slot.


Without diving into the code in detail I (incorrectly) concluded it was 
a problem with the fallback of using Ada's stack-check paths due to not 
having stack-clash support.


Once enough stack-clash bits were ready I had Raphael review the code 
generated for Florian's test and we concluded the the original case from 
Florian was just wrong irrespective of stack clash/stack check.  While 
Raphael's stack-clash work will indirectly fix Florian's case, it really 
should also work without stack-clash.


In particular this code was called out by valgrind:


0003cb5e :
__GI___realpath():
   3cb5e:   81010113addisp,sp,-2032
   3cb62:   7d313423sd  s3,1992(sp)
   3cb66:   79fdlui s3,0xf
   3cb68:   7e813023sd  s0,2016(sp)
   3cb6c:   7c913c23sd  s1,2008(sp)
   3cb70:   7f010413addis0,sp,2032
   3cb74:   35098793addia5,s3,848 # f350 
<__libc_initial+0xffe8946a>
   3cb78:   74fdlui s1,0xf
   3cb7a:   008789b3add s3,a5,s0
   3cb7e:   f9048793addia5,s1,-112 # ef90 
<__libc_initial+0xffe890aa>
   3cb82:   008784b3add s1,a5,s0
   3cb86:   77fdlui a5,0xf
   3cb88:   7d413023sd  s4,1984(sp)
   3cb8c:   7b513c23sd  s5,1976(sp)
   3cb90:   7e113423sd  ra,2024(sp)
   3cb94:   7d213823sd  s2,2000(sp)
   3cb98:   7b613823sd  s6,1968(sp)
   3cb9c:   7b713423sd  s7,1960(sp)
   3cba0:   7b813023sd  s8,1952(sp)
   3cba4:   79913c23sd  s9,1944(sp)
   3cba8:   79a13823sd  s10,1936(sp)
   3cbac:   79b13423sd  s11,1928(sp)
   3cbb0:   34878793addia5,a5,840 # f348 
<__libc_initial+0xffe89462>
   3cbb4:   4713li  a4,1024
   3cbb8:   00132a17auipc   s4,0x132
   3cbbc:   ae0a3a03ld  s4,-1312(s4) # 16e698 
<__stack_chk_guard>
   3cbc0:   01098893addia7,s3,16
   3cbc4:   42098693addia3,s3,1056
   3cbc8:   b8040a93addis5,s0,-1152
   3cbcc:   97a2add a5,a5,s0
   3cbce:   000a3603ld  a2,0(s4)
   3cbd2:   f8c43423sd  a2,-120(s0)
   3cbd6:   4601li  a2,0
   3cbd8:   3d14b023sd  a7,960(s1)
   3cbdc:   3ce4b423sd  a4,968(s1)
   3cbe0:   7cd4b823sd  a3,2000(s1)
   3cbe4:   7ce4bc23sd  a4,2008(s1)
   3cbe8:   b7543823sd  s5,-1168(s0)
   3cbec:   b6e43c23sd  a4,-1160(s0)
   3cbf0:   e38csd  a1,0(a5)
   3cbf2:   b0010113addisp,sp,-1280


In particular note the store at 0x3cbd8.  That's hitting (s1 + 960). If 
you chase the values around, you'll find it's a bit more than 1k into 
unallocated stack space.  It's also worth noting the final stack 
adjustment at 0x3cbf2.


While I haven't reproduced Florian's code exactly, I was able to get 
reasonably close and verify my suspicion that everything was fine before 
sched2 and incorrect after sched2.  It was also obvious at that point 
what had gone wrong -- we were missing a stack tie after the final stack 
pointer adjustment.


This patch adds the missing stack tie.

While not technically a regression, I shudder at the thought of chasing 
one of these issues down again in the wild.  Been there, done that.


Regression tested on rv64gc.  Verified the scheduler no longer mucked up 
realpath by hand.  Pushing to the trunk.


Jeff

commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239
Author: Jeff Law 
Date:   Thu Mar 21 20:41:59 2024 -0600

[committed] Fix RISC-V missing stack tie

As some of you know, Raphael has been working on stack-clash support for the
RISC-V port.  A little while ago Florian reached out to us with an issue 
where
glibc was failing its smoke test due to referencing an unallocated stack 
slot.

Without diving into the code in detail I (incorrectly) concluded it was a
problem with the fallback of using Ada's stack-check paths due to not having
stack-clash support.

Once enough stack-clash bits were ready I had Raphael review the code 
generated

RE: [PATCH v2] RISC-V: Bugfix ICE for attribute((target("arch=+v"))

2024-03-21 Thread Li, Pan2

Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, March 22, 2024 10:24 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v2] RISC-V: Bugfix ICE for __attribute__((target("arch=+v"))

LGTM, thanks :)

On Fri, Mar 22, 2024 at 9:13 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix one ICE for __attribute__((target("arch=+v"))
> and likewise extension(s). Given we have sample code as below:
>
> void __attribute__((target("arch=+v")))
> test_2 (int *a, int *b, int *out, unsigned count)
> {
>   unsigned i;
>   for (i = 0; i < count; i++)
>out[i] = a[i] + b[i];
> }
>
> It will have ICE when build with -march=rv64gc -O3.
>
> test.c: In function ‘test_2’:
> test.c:4:1: internal compiler error: Floating point exception
> 4 | {
>   | ^
> 0x1a5891b crash_signal
> .../__RISC-V_BUILD__/../gcc/toplev.cc:319
> 0x7f0a7884251f ???
> ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
> 0x1f51ba4 riscv_hard_regno_nregs
> .../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
> 0x1967bb9 init_reg_modes_target()
> .../__RISC-V_BUILD__/../gcc/reginfo.cc:471
> 0x13fc029 init_emit_regs()
> .../__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
> 0x1a5b83d target_reinit()
> .../__RISC-V_BUILD__/../gcc/toplev.cc:1936
> 0x35e374d save_target_globals()
> .../__RISC-V_BUILD__/../gcc/target-globals.cc:92
> 0x35e381f save_target_globals_default_opts()
> .../__RISC-V_BUILD__/../gcc/target-globals.cc:122
> 0x1f544cc riscv_save_restore_target_globals(tree_node*)
> .../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
> 0x1f55c36 riscv_set_current_function
> ...
>
> There are two reasons for this ICE.
> 1. The implied extension(s) of v are not well handled and the
>TARGET_MIN_VLEN is 0 which is not reinitialized.  Then the
>size / TARGET_MIN_VLEN will have DivideByZero.
> 2. The machine modes of the vector types will be vary after
>the v extension is introduced.
>
> This patch passed below testsuite:
> 1. The riscv fully regression test.
>
> PR target/114352
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (riscv_subset_list::parse):
> Replace implied, combine and check to func finalize.
> (riscv_subset_list::finalize): New func impl to take care of
> implied, combine ext and related checks.
> * config/riscv/riscv-subset.h: Add func decl for finalize.
> * config/riscv/riscv-target-attr.cc 
> (riscv_target_attr_parser::parse_arch):
> Finalize the ext before return succeed.
> * config/riscv/riscv.cc (riscv_set_current_function): Reinit the
> machine mode before when set cur function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114352-1.c: New test.
> * gcc.target/riscv/rvv/base/pr114352-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 31 ++
>  gcc/config/riscv/riscv-subset.h   |  2 +
>  gcc/config/riscv/riscv-target-attr.cc |  2 +
>  gcc/config/riscv/riscv.cc |  4 ++
>  .../gcc.target/riscv/rvv/base/pr114352-1.c| 58 +++
>  .../gcc.target/riscv/rvv/base/pr114352-2.c| 27 +
>  6 files changed, 114 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-2.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 440127a2af0..15d44245b3c 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -1428,16 +1428,7 @@ riscv_subset_list::parse (const char *arch, location_t 
> loc)
>if (p == NULL)
>  goto fail;
>
> -  for (itr = subset_list->m_head; itr != NULL; itr = itr->next)
> -{
> -  subset_list->handle_implied_ext (itr->name.c_str ());
> -}
> -
> -  /* Make sure all implied extensions are included. */
> -  gcc_assert (subset_list->check_implied_ext ());
> -
> -  subset_list->handle_combine_ext ();
> -  subset_list->check_conflict_ext ();
> +  subset_list->finalize ();
>
>return subset_list;
>
> @@ -1467,6 +1458,26 @@ riscv_subset_list::set_loc (location_t loc)
>m_loc = loc;
>  }
>
> +/* Make sure the implied or combined extension is included after add
> +   a new std extension to subset list or likewise.  For exmaple as below,
> +
> +   void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
> +
> +   The implied zvl128b and zve64d of the std v should be included.  */
> +void
> +riscv_subset_list::finalize ()
> +{
> +  riscv_subset_t *subset;
> +
> +  for (subset = m_head; subset != NULL; subset = subset->next)
> +handle_implied_ext (subset->name.c_str ());
> +
> +  gcc_assert (check_implied_ext ());
> +
> +  handle_combine_ext ();
> +

Re: [PATCH v2] RISC-V: Bugfix ICE for attribute((target("arch=+v"))

2024-03-21 Thread Kito Cheng

LGTM, thanks :)

On Fri, Mar 22, 2024 at 9:13 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix one ICE for __attribute__((target("arch=+v"))
> and likewise extension(s). Given we have sample code as below:
>
> void __attribute__((target("arch=+v")))
> test_2 (int *a, int *b, int *out, unsigned count)
> {
>   unsigned i;
>   for (i = 0; i < count; i++)
>out[i] = a[i] + b[i];
> }
>
> It will have ICE when build with -march=rv64gc -O3.
>
> test.c: In function ‘test_2’:
> test.c:4:1: internal compiler error: Floating point exception
> 4 | {
>   | ^
> 0x1a5891b crash_signal
> .../__RISC-V_BUILD__/../gcc/toplev.cc:319
> 0x7f0a7884251f ???
> ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
> 0x1f51ba4 riscv_hard_regno_nregs
> .../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
> 0x1967bb9 init_reg_modes_target()
> .../__RISC-V_BUILD__/../gcc/reginfo.cc:471
> 0x13fc029 init_emit_regs()
> .../__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
> 0x1a5b83d target_reinit()
> .../__RISC-V_BUILD__/../gcc/toplev.cc:1936
> 0x35e374d save_target_globals()
> .../__RISC-V_BUILD__/../gcc/target-globals.cc:92
> 0x35e381f save_target_globals_default_opts()
> .../__RISC-V_BUILD__/../gcc/target-globals.cc:122
> 0x1f544cc riscv_save_restore_target_globals(tree_node*)
> .../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
> 0x1f55c36 riscv_set_current_function
> ...
>
> There are two reasons for this ICE.
> 1. The implied extension(s) of v are not well handled and the
>TARGET_MIN_VLEN is 0 which is not reinitialized.  Then the
>size / TARGET_MIN_VLEN will have DivideByZero.
> 2. The machine modes of the vector types will be vary after
>the v extension is introduced.
>
> This patch passed below testsuite:
> 1. The riscv fully regression test.
>
> PR target/114352
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (riscv_subset_list::parse):
> Replace implied, combine and check to func finalize.
> (riscv_subset_list::finalize): New func impl to take care of
> implied, combine ext and related checks.
> * config/riscv/riscv-subset.h: Add func decl for finalize.
> * config/riscv/riscv-target-attr.cc 
> (riscv_target_attr_parser::parse_arch):
> Finalize the ext before return succeed.
> * config/riscv/riscv.cc (riscv_set_current_function): Reinit the
> machine mode before when set cur function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114352-1.c: New test.
> * gcc.target/riscv/rvv/base/pr114352-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 31 ++
>  gcc/config/riscv/riscv-subset.h   |  2 +
>  gcc/config/riscv/riscv-target-attr.cc |  2 +
>  gcc/config/riscv/riscv.cc |  4 ++
>  .../gcc.target/riscv/rvv/base/pr114352-1.c| 58 +++
>  .../gcc.target/riscv/rvv/base/pr114352-2.c| 27 +
>  6 files changed, 114 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-2.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 440127a2af0..15d44245b3c 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -1428,16 +1428,7 @@ riscv_subset_list::parse (const char *arch, location_t 
> loc)
>if (p == NULL)
>  goto fail;
>
> -  for (itr = subset_list->m_head; itr != NULL; itr = itr->next)
> -{
> -  subset_list->handle_implied_ext (itr->name.c_str ());
> -}
> -
> -  /* Make sure all implied extensions are included. */
> -  gcc_assert (subset_list->check_implied_ext ());
> -
> -  subset_list->handle_combine_ext ();
> -  subset_list->check_conflict_ext ();
> +  subset_list->finalize ();
>
>return subset_list;
>
> @@ -1467,6 +1458,26 @@ riscv_subset_list::set_loc (location_t loc)
>m_loc = loc;
>  }
>
> +/* Make sure the implied or combined extension is included after add
> +   a new std extension to subset list or likewise.  For exmaple as below,
> +
> +   void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
> +
> +   The implied zvl128b and zve64d of the std v should be included.  */
> +void
> +riscv_subset_list::finalize ()
> +{
> +  riscv_subset_t *subset;
> +
> +  for (subset = m_head; subset != NULL; subset = subset->next)
> +handle_implied_ext (subset->name.c_str ());
> +
> +  gcc_assert (check_implied_ext ());
> +
> +  handle_combine_ext ();
> +  check_conflict_ext ();
> +}
> +
>  /* Return the current arch string.  */
>
>  std::string
> diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
> index ae849e2a302..ec979040e8c 100644
> --- a/gcc/config/riscv/riscv-subset.h
> +++

[PATCH] Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.

2024-03-21 Thread liuhongt

Also fixed a typo in the testcase.

Commit as an obvious fix.

gcc/testsuite/ChangeLog:

PR tree-optimization/114396
* gcc.target/i386/pr114396.c: Move to...
* gcc.c-torture/execute/pr114396.c: ...here.
---
 .../{gcc.target/i386 => gcc.c-torture/execute}/pr114396.c   | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
 rename gcc/testsuite/{gcc.target/i386 => gcc.c-torture/execute}/pr114396.c 
(92%)

diff --git a/gcc/testsuite/gcc.target/i386/pr114396.c 
b/gcc/testsuite/gcc.c-torture/execute/pr114396.c
similarity index 92%
rename from gcc/testsuite/gcc.target/i386/pr114396.c
rename to gcc/testsuite/gcc.c-torture/execute/pr114396.c
index 4c4015f871f..baf90eafabf 100644
--- a/gcc/testsuite/gcc.target/i386/pr114396.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr114396.c
@@ -1,5 +1,5 @@
-/* { dg-do run } */
-/* { dg-options "-O1 -fwrapv -fno-vect-cost-model" } */
+/* PR tree-optimization/114396 */
+/* { dg-additional-options "-fwrapv -fno-vect-cost-model" } */
 
 short a = 0xF;
 short b[16];
@@ -88,7 +88,7 @@ int main() {
 
   exp = foo1 (a);
   res = foo1_o3 (a);
-  if (uexp != ures)
+  if (exp != res)
 __builtin_abort ();
 
   uexp = foou (a);
-- 
2.31.1

Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-21 Thread Andrew Pinski

On Thu, Mar 21, 2024, 17:20 Kaz Kylheku  wrote:

> On 2024-03-20 16:34, rep.dot@gmail.com wrote:
> > On 19 March 2024 18:27:13 CET, Kaz Kylheku  wrote:
> >>On 2024-03-18 00:30, Jonathan Wakely wrote:
> >>> I don't have an opinion on the implementation, or the proposal itself,
> >>> except that the implementation seems susprisingly simple, which is
> >>> nice.
> >>
> >>Hi Jonathan,
> >>
> >>Here is an updated patch.
> >>
> >>It rebased cleanly over more than newer 16000 commits, suggesting
> >>that the area in the cpp code is "still waters", which is good.
> >>
> >>I made the documentation change not to recommend using #if, but
> >>#ifdef.
> >>
> >>I got rid of the ChangeLog changes, and also tried to pay more
> >>attention to the log message format, where the ChangeLog pieces
> >>are specified.
> >>
> >>In the first test case, I had to adjust the expected warning text
> >>for two lines.
> >>
> >
> > Please forgive the bike shedding, but __EXP_COUNTER__ would lead me into
> thinking about exponents or thereabouts.
> > __MACRO_EXPANSION_COUNTER__ is more what your patch is about, IMHO?
> Maybe you could come up with a more descriptive name, please?
> >
> > And, while I can see what could possibly be done with that, I'm not
> really convinced that it would be a wise idea to (unilaterally) support
> that idea. Don't you think that this would encourage producing more
> spaghetti code?
> >
> > Just curious about real world motivating examples I guess.
> > cheers
>
> Hi, (Bernhard?)
>
> Concerns about naming are very important; not bike shedding at all.
> I changed the patch to use __EXPANSION_NUMBER__. I didn't include MACRO
> because I hope it's clear that in preprocessing, we are expanding
> macros. The parent symbol is now called __PARENT_EXPANSION_NUMBER__.
>
> I dropped the COUNTER terminology because the existing __COUNTER__
> is a symbol whose value changes each time it is mentioned,
> These symbols are not like that; they capture a fixed value in
> a scope and behave like ordinary macros.
>
> In doing the renaming, I noticed that from the beginning I've already
> been calling the internal value in the macro context macro->exp_number,
> because it's not a counter.
>
> The focus of this feature isn't to enable some new "earth-shattering"
> techniques, but to improve certain situations in existing macros.
>
> For instance, suppose we have a macro that expands to some block
> of code in which there is an internal goto. If we have it
>
>   #define MAC(...) { ... goto _label; ... __label: ; }
>
> then this cannot be used twice in the same function; labels have
> function scope.



In this case why can't you use gcc's already extension of defining a local
label?
https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Local-Labels.html

This extension has been around for over 20 years specifically for that use
case.

Thanks,
Andrew



If we make it
>
>   #define MAC(...) { ... goto CAT(__label, __LINE__); ... CAT(__label,
> __LINE__): ; }
>
> we now can use MAC two or more times in the same function, but not in
> the same line of code.
>
> With __EXPANSION_NUMBER__ it is doable. Given this program:
>
>   #define xcat(A, B) A ## B
>   #define cat(A, B) xcat(A, B)
>   #define lab(PREFIX) cat(PREFIX, __PARENT_EXPANSION_NUMBER__)
>
>   #define MAC { goto lab(foo); /*...*/ lab(foo): ; }
>
>   MAC MAC MAC
>
> We get the preprocessed output (with -E):
>
>   { goto foo3; foo3: ; } { goto foo10; foo10: ; } { goto foo17; foo17: ; }
>
> There are issues with relying on __LINE__ to produce different values
> when it is referenced in code generated by a macro.
>
> The following program prints the same value 12 three times; even though
> PRINT seems to be referenced on different physical lines in the
> PRINT3 macro replacement text. __LINE__ references the line where
> the top-level expansion of PRINT3 occurs, not where PRINT occurs.
>
> #include 
>
> #define PRINT (printf("%d\n", __LINE__))
>
> #define PRINT3 do { \
>   PRINT;\
>   PRINT;\
>   PRINT;\
> } while (0)
>
> int main()
> {
>   PRINT3;
>   return 0;
> }
>

Re: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for XTheadVector

2024-03-21 Thread juzhe.zh...@rivai.ai

LGTM.

juzhe.zh...@rivai.ai

From: Christoph Müllner
Date: 2024-03-22 07:45
To: gcc-patches; Kito Cheng; Palmer Dabbelt; Andrew Waterman; Philipp Tomsich; 
Camel Coder; Bruce Hoult; Juzhe-Zhong; Jun Sha; Xianmiao Qu; Jin Ma
CC: Christoph Müllner
Subject: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for 
XTheadVector
The expansion of `memset` (via expand_builtin_memset_args())
uses clear_by_pieces() and store_by_pieces() to avoid calls
to the C runtime. To check if a type can be used for that purpose
the function by_pieces_mode_supported_p() tests if a `mov` and
a `vec_duplicate` INSN can be expaned by the backend.

The `vec_duplicate` expansion takes arguments of type `V_VLS`.
The `mov` expansions take arguments of type `V`, `VB`, `VT`,
`VLS_AVL_IMM`, and `VLS_AVL_REG`. Some of these types (in fact
not types but type iterators) include fractional LMUL types.
E.g. `V_VLS` includes `V`, which includes `VI`, which includes
`RVVMF2QI`.

This results in an attempt to use fractional LMUL-types for
the `memset` expansion resulting in an ICE for XTheadVector,
because that extension cannot handle fractional LMULs.

This patch addresses this issue by splitting the definition
of the `VI` mode itereator into `VI_NOFRAC` (without fractional
LMUL types) and `VI_FRAC` (only fractional LMUL types).
Further, it defines `V_VLS` such, that `VI_FRAC` types are only
included if XTheadVector is not enabled.

The effect is demonstrated by a new test case that shows
that the by-pieces framework now emits `sb` instructions
instead of triggering an ICE.

Signed-off-by: Christoph Müllner 

PR 114194

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Split VI into VI_FRAC and VI_NOFRAC.
Only include VI_NOFRAC in V_VLS without TARGET_XTHEADVECTOR.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr114194.c: New test.

Signed-off-by: Christoph Müllner 
---
gcc/config/riscv/vector-iterators.md  | 19 +--
.../riscv/rvv/xtheadvector/pr114194.c | 56 +++
2 files changed, 69 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index c2ea7e8b10a..a24e1bf078f 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -108,17 +108,24 @@ (define_c_enum "unspecv" [
   UNSPECV_FRM_RESTORE_EXIT
])
-(define_mode_iterator VI [
-  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")
-
-  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
+;; Subset of VI with fractional LMUL types
+(define_mode_iterator VI_FRAC [
+  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
+  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
+  (RVVMF2SI "TARGET_MIN_VLEN > 32")
+])
+;; Subset of VI with non-fractional LMUL types
+(define_mode_iterator VI_NOFRAC [
+  RVVM8QI RVVM4QI RVVM2QI RVVM1QI
+  RVVM8HI RVVM4HI RVVM2HI RVVM1HI
+  RVVM8SI RVVM4SI RVVM2SI RVVM1SI
   (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
   (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
])
+(define_mode_iterator VI [ VI_NOFRAC (VI_FRAC "!TARGET_XTHEADVECTOR") ])
+
;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
new file mode 100644
index 000..fc2d1349425
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xtheadvector" { target { rv64 } } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** foo0_1:
+** sb\tzero,0([a-x0-9]+)
+** ret
+*/
+void foo0_1 (void *p)
+{
+  __builtin_memset (p, 0, 1);
+}
+
+/*
+** foo0_7:
+** sb\tzero,0([a-x0-9]+)
+** sb\tzero,1([a-x0-9]+)
+** sb\tzero,2([a-x0-9]+)
+** sb\tzero,3([a-x0-9]+)
+** sb\tzero,4([a-x0-9]+)
+** sb\tzero,5([a-x0-9]+)
+** sb\tzero,6([a-x0-9]+)
+** ret
+*/
+void foo0_7 (void *p)
+{
+  __builtin_memset (p, 0, 7);
+}
+
+/*
+** foo1_1:
+** li\t[a-x0-9]+,1
+** sb\t[a-x0-9]+,0([a-x0-9]+)
+** ret
+*/
+void foo1_1 (void *p)
+{
+  __builtin_memset (p, 1, 1);
+}
+
+/*
+** foo1_5:
+** li\t[a-x0-9]+,1
+** sb\t[a-x0-9]+,0([a-x0-9]+)
+** sb\t[a-x0-9]+,1([a-x0-9]+)
+** sb\t[a-x0-9]+,2([a-x0-9]+)
+** sb\t[a-x0-9]+,3([a-x0-9]+)
+** sb\t[a-x0-9]+,4([a-x0-9]+)
+** ret
+*/
+void foo1_5 (void *p)
+{
+  __builtin_memset (p, 1, 5);
+}
-- 
2.44.0

RE: [PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-03-21 Thread Li, Pan2

Sorry for disturbing, kindly ping for this ICE.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, March 12, 2024 10:09 AM
To: Jeff Law ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Wang, Yanzhang ; Liu, Hongtao 
Subject: RE: [PATCH v2] DSE: Bugfix ICE after allow vector type in 
get_stored_val

Hi Jeff,

Is there any suggestion(s) for how to fix this ICE in the reasonable approach? 
Thanks a lot.

Pan

-Original Message-
From: Li, Pan2 
Sent: Tuesday, March 5, 2024 2:23 PM
To: Jeff Law ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Wang, Yanzhang ; Liu, Hongtao 
Subject: RE: [PATCH v2] DSE: Bugfix ICE after allow vector type in 
get_stored_val

Thanks Jeff for comments.

> But in the case of a vector modes, we can usually reinterpret the 
> underlying bits in whatever mode we want and do any of the usual 
> operations on those bits.

Yes, I think that is why we can allow vector mode in get_stored_val if my 
understanding is correct.
And then the different modes will return by gen_low_part. Unfortunately, there 
are some modes
 (less than a vector bit size like V2SF, V2QI for vlen=128) are considered as 
invalid by validate_subreg, 
and return NULL_RTX result in the final ICE.

Thus, consider stage 4 I wonder if this is a acceptable fix, aka find some 
where to filter-out the invalid
modes before goes to gen_low_part.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, March 4, 2024 6:47 AM
To: Robin Dapp ; Li, Pan2 ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Wang, Yanzhang ; Liu, Hongtao 
Subject: Re: [PATCH v2] DSE: Bugfix ICE after allow vector type in 
get_stored_val

On 2/29/24 06:28, Robin Dapp wrote:
> On 2/29/24 02:38, Li, Pan2 wrote:
>>> So it's going to check if V2SF can be tied to DI and V4QI with SI.  I
>>> suspect those are going to fail for RISC-V as those aren't tieable.
>>
>> Yes, you are right. Different REG_CLASS are not allowed to be tieable in 
>> RISC-V.
>>
>> static bool
>> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>> {
>>/* We don't allow different REG_CLASS modes tieable since it
>>   will cause ICE in register allocation (RA).
>>   E.g. V2SI and DI are not tieable.  */
>>if (riscv_v_ext_mode_p (mode1) != riscv_v_ext_mode_p (mode2))
>>  return false;
>>return (mode1 == mode2
>>|| !(GET_MODE_CLASS (mode1) == MODE_FLOAT
>> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
>> }
> 
> Yes, but what we set tieable is e.g. V4QI and V2SF.
But in the case of a vector modes, we can usually reinterpret the 
underlying bits in whatever mode we want and do any of the usual 
operations on those bits.

In my mind that's fundamentally different than the int vs fp case.  If 
we have an integer value in an FP register, we can't really operate on 
the value in any sensible way without first copying it over to the 
integer register file and vice-versa.

Jeff

[PATCH v2] RISC-V: Bugfix ICE for attribute((target("arch=+v"))

2024-03-21 Thread pan2 . li

From: Pan Li 

This patch would like to fix one ICE for __attribute__((target("arch=+v"))
and likewise extension(s). Given we have sample code as below:

void __attribute__((target("arch=+v")))
test_2 (int *a, int *b, int *out, unsigned count)
{
  unsigned i;
  for (i = 0; i < count; i++)
   out[i] = a[i] + b[i];
}

It will have ICE when build with -march=rv64gc -O3.

test.c: In function ‘test_2’:
test.c:4:1: internal compiler error: Floating point exception
4 | {
  | ^
0x1a5891b crash_signal
.../__RISC-V_BUILD__/../gcc/toplev.cc:319
0x7f0a7884251f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x1f51ba4 riscv_hard_regno_nregs
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
0x1967bb9 init_reg_modes_target()
.../__RISC-V_BUILD__/../gcc/reginfo.cc:471
0x13fc029 init_emit_regs()
.../__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
0x1a5b83d target_reinit()
.../__RISC-V_BUILD__/../gcc/toplev.cc:1936
0x35e374d save_target_globals()
.../__RISC-V_BUILD__/../gcc/target-globals.cc:92
0x35e381f save_target_globals_default_opts()
.../__RISC-V_BUILD__/../gcc/target-globals.cc:122
0x1f544cc riscv_save_restore_target_globals(tree_node*)
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
0x1f55c36 riscv_set_current_function
...

There are two reasons for this ICE.
1. The implied extension(s) of v are not well handled and the
   TARGET_MIN_VLEN is 0 which is not reinitialized.  Then the
   size / TARGET_MIN_VLEN will have DivideByZero.
2. The machine modes of the vector types will be vary after
   the v extension is introduced.

This patch passed below testsuite:
1. The riscv fully regression test.

PR target/114352

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::parse):
Replace implied, combine and check to func finalize.
(riscv_subset_list::finalize): New func impl to take care of
implied, combine ext and related checks.
* config/riscv/riscv-subset.h: Add func decl for finalize.
* config/riscv/riscv-target-attr.cc 
(riscv_target_attr_parser::parse_arch):
Finalize the ext before return succeed.
* config/riscv/riscv.cc (riscv_set_current_function): Reinit the
machine mode before when set cur function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr114352-1.c: New test.
* gcc.target/riscv/rvv/base/pr114352-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/common/config/riscv/riscv-common.cc   | 31 ++
 gcc/config/riscv/riscv-subset.h   |  2 +
 gcc/config/riscv/riscv-target-attr.cc |  2 +
 gcc/config/riscv/riscv.cc |  4 ++
 .../gcc.target/riscv/rvv/base/pr114352-1.c| 58 +++
 .../gcc.target/riscv/rvv/base/pr114352-2.c| 27 +
 6 files changed, 114 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-2.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 440127a2af0..15d44245b3c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1428,16 +1428,7 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   if (p == NULL)
 goto fail;
 
-  for (itr = subset_list->m_head; itr != NULL; itr = itr->next)
-{
-  subset_list->handle_implied_ext (itr->name.c_str ());
-}
-
-  /* Make sure all implied extensions are included. */
-  gcc_assert (subset_list->check_implied_ext ());
-
-  subset_list->handle_combine_ext ();
-  subset_list->check_conflict_ext ();
+  subset_list->finalize ();
 
   return subset_list;
 
@@ -1467,6 +1458,26 @@ riscv_subset_list::set_loc (location_t loc)
   m_loc = loc;
 }
 
+/* Make sure the implied or combined extension is included after add
+   a new std extension to subset list or likewise.  For exmaple as below,
+
+   void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
+
+   The implied zvl128b and zve64d of the std v should be included.  */
+void
+riscv_subset_list::finalize ()
+{
+  riscv_subset_t *subset;
+
+  for (subset = m_head; subset != NULL; subset = subset->next)
+handle_implied_ext (subset->name.c_str ());
+
+  gcc_assert (check_implied_ext ());
+
+  handle_combine_ext ();
+  check_conflict_ext ();
+}
+
 /* Return the current arch string.  */
 
 std::string
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index ae849e2a302..ec979040e8c 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -105,6 +105,8 @@ public:
   int match_score (riscv_subset_list *) const;
 
   void set_loc (location_t);
+
+  void finalize ();
 };
 
 extern const riscv_subset_list *riscv_current_subset_list (void);
diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index

Re: scheduler queue flush

2024-03-21 Thread Vineet Gupta




On 3/21/24 12:56, Jeff Law wrote:
>
> On 3/21/24 11:19 AM, Vineet Gupta wrote:
>
>>> So if we go back to Robin's observation that scheduling dramatically
>>> increases the instruction count, perhaps we try a run with
>>> -fno-schedule-insns -fno-schedule-insns2 and see how the instruction
>>> counts compare.
>> Oh yeah ! Robin hinted to this in Tues patchworks meeting too
>>
>> default  : 2,565,319,368,591
>> 128  : 2,509,741,035,068
>> 256  : 2,527,817,813,612
>> no-sched{,2}: 1,295,520,567,376
> Now we're getting somewhere.  That's in line with expectations.
>
> I would strongly suspect it's -fno-schedule-insns rather than 
> -fno-schedule-insns2.  The former turns off scheduling before register 
> allocation, the second turns it off after register allocation.  So if 
> our theory about spilling is correct, then it must be the first since 
> the second won't affect register allocation.   While I can speculate 
> about other potential scheduler impacts, spilling due to sched1's 
> actions is by far the most likely.

As always you are absolutely right, just doing -fno-schedule-insns gets
almost the same as last row above.

> Given the magnitude here, I would bet we can see this pretty clearly if 
> you've got function level or block level count data for those runs.  I'd 
> start with that, ideally narrowing things down to a function or hot loop 
> within a function which shows a huge delta.

Alright, on it.

Thx,
-Vineet

> From that we can then look at the IRA and LRA dumps and correlate what 
> we see there with the before/after scheduling dumps to see how we've 
> lengthened lifetimes in critical locations.
>
> I'd probably start with the IRA dump.  It's going to have annotations in 
> its dump output like "Potential Spill" which may guide us.  In simplest 
> terms a pseudo is trivially allocatable when it has fewer neighbors in 
> the conflict graph than available hard registers.  If it has more 
> neighbors in the conflict graph than available hard registers, then it's 
> potentially going to be spilled -- we can't know during this phase of 
> allocation.
>
> As we pop registers off the coloring stack, some neighbors of the pseudo 
> in question may end up allocated into the same hard register.  That can 
> sometimes result in a hard register being available.  It might be easier 
> to see with a graph
>
>  a--b--c
> |
> d
>
> Where a..d are pseudo registers.  If two pseudos are connected by an 
> edge, then they have overlapping lifetimes and can't be allocated to the 
> same hard register.  So as we can see b conflicts with a, c & d.  If we 
> only have two hard registers, then b is not trivially colorable and will 
> be marked as a potential spill.
>
> During coloring we may end up allocating a, c & d to the same hard 
> register (they don't conflict, so its safe).  If that happens, then 
> there would be a register available for b.
>
> Anyway, that should explain why b would be marked as a potential spill 
> and how it might end up getting a hard register anyway.
>
> The hope is we can see the potential spills increasing.  At which point 
> we can walk backwards to sched1 and dive into its scheduling decisions.
>
> Jeff

Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-21 Thread Kaz Kylheku

On 2024-03-20 16:34, rep.dot@gmail.com wrote:
> On 19 March 2024 18:27:13 CET, Kaz Kylheku  wrote:
>>On 2024-03-18 00:30, Jonathan Wakely wrote:
>>> I don't have an opinion on the implementation, or the proposal itself,
>>> except that the implementation seems susprisingly simple, which is
>>> nice.
>>
>>Hi Jonathan,
>>
>>Here is an updated patch.
>>
>>It rebased cleanly over more than newer 16000 commits, suggesting
>>that the area in the cpp code is "still waters", which is good.
>>
>>I made the documentation change not to recommend using #if, but
>>#ifdef.
>>
>>I got rid of the ChangeLog changes, and also tried to pay more
>>attention to the log message format, where the ChangeLog pieces
>>are specified.
>>
>>In the first test case, I had to adjust the expected warning text
>>for two lines.
>>
> 
> Please forgive the bike shedding, but __EXP_COUNTER__ would lead me into 
> thinking about exponents or thereabouts.
> __MACRO_EXPANSION_COUNTER__ is more what your patch is about, IMHO? Maybe you 
> could come up with a more descriptive name, please?
> 
> And, while I can see what could possibly be done with that, I'm not really 
> convinced that it would be a wise idea to (unilaterally) support that idea. 
> Don't you think that this would encourage producing more spaghetti code?
> 
> Just curious about real world motivating examples I guess.
> cheers

Hi, (Bernhard?)

Concerns about naming are very important; not bike shedding at all.
I changed the patch to use __EXPANSION_NUMBER__. I didn't include MACRO
because I hope it's clear that in preprocessing, we are expanding
macros. The parent symbol is now called __PARENT_EXPANSION_NUMBER__.

I dropped the COUNTER terminology because the existing __COUNTER__
is a symbol whose value changes each time it is mentioned,
These symbols are not like that; they capture a fixed value in
a scope and behave like ordinary macros.

In doing the renaming, I noticed that from the beginning I've already
been calling the internal value in the macro context macro->exp_number,
because it's not a counter.

The focus of this feature isn't to enable some new "earth-shattering"
techniques, but to improve certain situations in existing macros.

For instance, suppose we have a macro that expands to some block
of code in which there is an internal goto. If we have it

  #define MAC(...) { ... goto _label; ... __label: ; }

then this cannot be used twice in the same function; labels have
function scope. If we make it

  #define MAC(...) { ... goto CAT(__label, __LINE__); ... CAT(__label, 
__LINE__): ; }

we now can use MAC two or more times in the same function, but not in
the same line of code.

With __EXPANSION_NUMBER__ it is doable. Given this program:

  #define xcat(A, B) A ## B
  #define cat(A, B) xcat(A, B)
  #define lab(PREFIX) cat(PREFIX, __PARENT_EXPANSION_NUMBER__)

  #define MAC { goto lab(foo); /*...*/ lab(foo): ; }

  MAC MAC MAC

We get the preprocessed output (with -E):

  { goto foo3; foo3: ; } { goto foo10; foo10: ; } { goto foo17; foo17: ; }

There are issues with relying on __LINE__ to produce different values
when it is referenced in code generated by a macro.

The following program prints the same value 12 three times; even though
PRINT seems to be referenced on different physical lines in the
PRINT3 macro replacement text. __LINE__ references the line where
the top-level expansion of PRINT3 occurs, not where PRINT occurs.

#include 

#define PRINT (printf("%d\n", __LINE__))

#define PRINT3 do { \
  PRINT;\
  PRINT;\
  PRINT;\
} while (0)

int main()
{
  PRINT3;
  return 0;
}
From 4aded10c4171f9a9a361bb8986d721357f1fc2c8 Mon Sep 17 00:00:00 2001
From: Kaz Kylheku 
Date: Wed, 20 Apr 2022 01:15:24 -0700
Subject: [PATCH] cpp: new built-in __EXPANSION_NUMBER__

This change introduces a pair of related macros __EXPANSION_NUMBER__ and
__PARENT_EXPANSION_NUMBER__.  These macros access integer values which
enumerate macro expansions.  They can be used for the purposes of
obtaining, unique identifiers (within the scope of a translation unit),
as a replacement for unreliable hacks based on __LINE__.

Outside of macro expansions, these macros epand to 1, so they are easy
to test for in portable code that needs to fall back on another
approach, perhaps involving __LINE__.

libcpp/ChangeLog:

	* libcpp/include/cpplib.h (struct cpp_macro): New members of
	type long long: expansion_number and parent_expansion_number.
	These members are used to attach the current expansion_counter
	value, and the parent context's copy of it to a new macro
	expansion.  The special macros then just access these.
	(enum cpp_builtin_type): New enumeration members,
	BT_EXPANSION_NUMBER and BT_PARENT_EXPANSION_NUMBER.

	* libcpp/macro.cc (expansion_counter): New static variable for
	counting expansions.  There is an existing one,
	num_expanded_macros_counter, but that has its own purpose and is
	incremented in a specific place.  Plus it uses a narrower
	integer

[PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for XTheadVector

2024-03-21 Thread Christoph Müllner

The expansion of `memset` (via expand_builtin_memset_args())
uses clear_by_pieces() and store_by_pieces() to avoid calls
to the C runtime. To check if a type can be used for that purpose
the function by_pieces_mode_supported_p() tests if a `mov` and
a `vec_duplicate` INSN can be expaned by the backend.

The `vec_duplicate` expansion takes arguments of type `V_VLS`.
The `mov` expansions take arguments of type `V`, `VB`, `VT`,
`VLS_AVL_IMM`, and `VLS_AVL_REG`. Some of these types (in fact
not types but type iterators) include fractional LMUL types.
E.g. `V_VLS` includes `V`, which includes `VI`, which includes
`RVVMF2QI`.

This results in an attempt to use fractional LMUL-types for
the `memset` expansion resulting in an ICE for XTheadVector,
because that extension cannot handle fractional LMULs.

This patch addresses this issue by splitting the definition
of the `VI` mode itereator into `VI_NOFRAC` (without fractional
LMUL types) and `VI_FRAC` (only fractional LMUL types).
Further, it defines `V_VLS` such, that `VI_FRAC` types are only
included if XTheadVector is not enabled.

The effect is demonstrated by a new test case that shows
that the by-pieces framework now emits `sb` instructions
instead of triggering an ICE.

Signed-off-by: Christoph Müllner 

PR 114194

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Split VI into VI_FRAC and VI_NOFRAC.
Only include VI_NOFRAC in V_VLS without TARGET_XTHEADVECTOR.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr114194.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/vector-iterators.md  | 19 +--
 .../riscv/rvv/xtheadvector/pr114194.c | 56 +++
 2 files changed, 69 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index c2ea7e8b10a..a24e1bf078f 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -108,17 +108,24 @@ (define_c_enum "unspecv" [
   UNSPECV_FRM_RESTORE_EXIT
 ])
 
-(define_mode_iterator VI [
-  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")
-
-  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
-  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
+;; Subset of VI with fractional LMUL types
+(define_mode_iterator VI_FRAC [
+  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
+  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
+  (RVVMF2SI "TARGET_MIN_VLEN > 32")
+])
 
+;; Subset of VI with non-fractional LMUL types
+(define_mode_iterator VI_NOFRAC [
+  RVVM8QI RVVM4QI RVVM2QI RVVM1QI
+  RVVM8HI RVVM4HI RVVM2HI RVVM1HI
+  RVVM8SI RVVM4SI RVVM2SI RVVM1SI
   (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
   (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
 ])
 
+(define_mode_iterator VI [ VI_NOFRAC (VI_FRAC "!TARGET_XTHEADVECTOR") ])
+
 ;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
 ;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
 ;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
new file mode 100644
index 000..fc2d1349425
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xtheadvector" { target { rv64 } } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** foo0_1:
+** sb\tzero,0([a-x0-9]+)
+** ret
+*/
+void foo0_1 (void *p)
+{
+  __builtin_memset (p, 0, 1);
+}
+
+/*
+** foo0_7:
+** sb\tzero,0([a-x0-9]+)
+** sb\tzero,1([a-x0-9]+)
+** sb\tzero,2([a-x0-9]+)
+** sb\tzero,3([a-x0-9]+)
+** sb\tzero,4([a-x0-9]+)
+** sb\tzero,5([a-x0-9]+)
+** sb\tzero,6([a-x0-9]+)
+** ret
+*/
+void foo0_7 (void *p)
+{
+  __builtin_memset (p, 0, 7);
+}
+
+/*
+** foo1_1:
+** li\t[a-x0-9]+,1
+** sb\t[a-x0-9]+,0([a-x0-9]+)
+** ret
+*/
+void foo1_1 (void *p)
+{
+  __builtin_memset (p, 1, 1);
+}
+
+/*
+** foo1_5:
+** li\t[a-x0-9]+,1
+** sb\t[a-x0-9]+,0([a-x0-9]+)
+** sb\t[a-x0-9]+,1([a-x0-9]+)
+** sb\t[a-x0-9]+,2([a-x0-9]+)
+** sb\t[a-x0-9]+,3([a-x0-9]+)
+** sb\t[a-x0-9]+,4([a-x0-9]+)
+** ret
+*/
+void foo1_5 (void *p)
+{
+  __builtin_memset (p, 1, 5);
+}
-- 
2.44.0

[PATCH] xtensa: Add supplementary split pattern for "*addsubx"

2024-03-21 Thread Takayuki 'January June' Suwa

int test(int a) {
   return a * 4 + 3;
}

In the example above, since Xtensa has instructions to add register value
scaled by 2, 4 or 8 (and corresponding define_insns), we would expect them
to be used but not, because it is transformed before reaching the RTL
generation pass as below:

int test(int a) {
   return (a + 7500) * 4;
}

Fortunately, the RTL combination pass tries a splitting pattern that matches
the first example, so it is easy to solve by defining that pattern.

gcc/ChangeLog:

* config/xtensa/xtensa.md: Add new split pattern described above.
---
 gcc/config/xtensa/xtensa.md | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 5cdf4dffe70..fbe40ec671a 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -194,6 +194,20 @@
(set_attr "mode""SI")
(set_attr "length"  "3")])
 
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+   (plus:SI (ashift:SI (match_operand:SI 1 "register_operand")
+   (match_operand:SI 3 "addsubx_operand"))
+(match_operand:SI 2 "const_int_operand")))]
+  "TARGET_ADDX && can_create_pseudo_p ()"
+  [(set (match_dup 0)
+   (plus:SI (ashift:SI (match_dup 1)
+   (match_dup 3))
+(match_dup 2)))]
+{
+  operands[2] = force_reg (SImode, operands[2]);
+})
+
 (define_expand "adddi3"
   [(set (match_operand:DI 0 "register_operand")
(plus:DI (match_operand:DI 1 "register_operand")
-- 
2.39.2

[pushed] analyzer: fix ignored constraints involving casts [PR113619]

2024-03-21 Thread David Malcolm

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9600-g7a5a4a4467b2e1.

gcc/analyzer/ChangeLog:
PR analyzer/113619
* region-model.cc (region_model::eval_condition): Fix
cast-handling from r14-3632-ge7b267444045c5 so that if those give
an unknown result, we continue trying the constraint manager.

gcc/testsuite/ChangeLog:
PR analyzer/113619
* c-c++-common/analyzer/taint-divisor-pr113619.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  | 24 ++-
 .../analyzer/taint-divisor-pr113619.c | 29 +++
 2 files changed, 46 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/taint-divisor-pr113619.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index c3a4ec7bcfc5..902b887fc074 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4704,17 +4704,27 @@ region_model::eval_condition (const svalue *lhs,
 if (lhs_un_op && CONVERT_EXPR_CODE_P (lhs_un_op->get_op ())
&& rhs_un_op && CONVERT_EXPR_CODE_P (rhs_un_op->get_op ())
&& lhs_type == rhs_type)
-  return eval_condition (lhs_un_op->get_arg (),
-op,
-rhs_un_op->get_arg ());
-
+  {
+   tristate res = eval_condition (lhs_un_op->get_arg (),
+  op,
+  rhs_un_op->get_arg ());
+   if (res.is_known ())
+ return res;
+  }
 else if (lhs_un_op && CONVERT_EXPR_CODE_P (lhs_un_op->get_op ())
 && lhs_type == rhs_type)
-  return eval_condition (lhs_un_op->get_arg (), op, rhs);
-
+  {
+   tristate res = eval_condition (lhs_un_op->get_arg (), op, rhs);
+   if (res.is_known ())
+ return res;
+  }
 else if (rhs_un_op && CONVERT_EXPR_CODE_P (rhs_un_op->get_op ())
 && lhs_type == rhs_type)
-  return eval_condition (lhs, op, rhs_un_op->get_arg ());
+  {
+   tristate res = eval_condition (lhs, op, rhs_un_op->get_arg ());
+   if (res.is_known ())
+ return res;
+  }
   }
 
   /* Otherwise, try constraints.
diff --git a/gcc/testsuite/c-c++-common/analyzer/taint-divisor-pr113619.c 
b/gcc/testsuite/c-c++-common/analyzer/taint-divisor-pr113619.c
new file mode 100644
index ..15c881247ce7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/taint-divisor-pr113619.c
@@ -0,0 +1,29 @@
+/* Reduced from false positive in Linux kernel's fs/ceph/ioctl.c: */
+
+__extension__ typedef unsigned long long __u64;
+
+struct ceph_ioctl_layout
+{
+  __u64 stripe_unit, object_size;
+};
+static long
+__validate_layout(struct ceph_ioctl_layout* l)
+{
+  if ((l->object_size & ~(~(((1UL) << 12) - 1))) ||
+  (l->stripe_unit & ~(~(((1UL) << 12) - 1))) ||
+  ((unsigned)l->stripe_unit != 0 &&
+   ((unsigned)l->object_size % (unsigned)l->stripe_unit))) /* { dg-bogus 
"use of attacker-controlled value 'l.stripe_unit' as divisor without checking 
for zero" "PR analyzer/113619" } */
+return -22;
+  return 0;
+}
+
+long
+__attribute__((tainted_args))
+ceph_ioctl_set_layout_policy(struct ceph_ioctl_layout l)
+{
+  int err;
+  err = __validate_layout();
+  if (err)
+return err;
+  return err;
+}
-- 
2.26.3

Re: [PATCH] testsuite: i386: Skip gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c etc. with Solaris as [PR114150]

2024-03-21 Thread Uros Bizjak

On Thu, Mar 21, 2024 at 10:26 AM Rainer Orth
 wrote:
>
> Two avx512cd tests FAIL to assemble with the Solaris/x86 assembler:
>
> FAIL: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c compilation failed 
> to produce executable
> FAIL: gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c compilation failed 
> to produce executable
>
> Excess errors:
> Assembler: avx512cd-vpbroadcastmb2q-2.c
> "/var/tmp//ccs_9lod.s", line 42 : Invalid instruction argument
> Near line: "vpbroadcastmb2q %k0, %zmm0"
>
> Assembler: avx512cd-vpbroadcastmw2d-2.c
> "/var/tmp//ccevT6Rd.s", line 35 : Invalid instruction argument
> Near line: "vpbroadcastmw2d %k0, %zmm0"
>
> This seems to be an as bug, but given that this rarely if ever gets any
> fixes these days, this test just skips the affected tests.
>
> Adjuststing check_effective_target_avx512cd instead doesn't seem
> sensible since it would disable quite a number of working tests.
>
> Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.
>
> Ok for trunk?

OK, looks obvious to me.

Thanks,
Uros.

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-03-19  Rainer Orth  
>
> gcc/testsuite:
> PR target/114150
> * gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c: Skip on
> Solaris/x86 with as.
> * gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c: Likewise.
>

Re: [PATCH v2] c++: direct-init of an array of class type [PR59465]

2024-03-21 Thread Jason Merrill


On 3/21/24 16:48, Marek Polacek wrote:

On Wed, Mar 20, 2024 at 09:21:02PM -0400, Jason Merrill wrote:

On 3/1/24 19:58, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?  I don't
claim that this has to go to 14 though.

-- >8 --
...from another array in a mem-initializer should not be accepted.

We already reject

struct string {} a[1];
string x[1](a);

but

struct pair {
  string s[1];
  pair() : s(a) {}
};

is wrongly accepted.

It started to be accepted with r0-110915-ga034826198b771:

which was supposed to be a cleanup, not a deliberate change to start
accepting the code.  The build_vec_init_expr code was added in r165976:
.

It appears that we do the magic copy array when we have a defaulted
constructor and we generate code for its mem-initializer which
initializes an array.  I also see that we go that path for compound
literals.  So when initializing an array member, we can limit building
up a VEC_INIT_EXPR to those special cases.

PR c++/59465

gcc/cp/ChangeLog:

* init.cc (can_init_array_with_p): New.
(perform_member_init): Check it.

gcc/testsuite/ChangeLog:

* g++.dg/init/array62.C: New test.
* g++.dg/init/array63.C: New test.
* g++.dg/init/array64.C: New test.
---
   gcc/cp/init.cc  | 27 ++-
   gcc/testsuite/g++.dg/init/array62.C | 19 +++
   gcc/testsuite/g++.dg/init/array63.C | 13 +
   gcc/testsuite/g++.dg/init/array64.C | 22 ++
   4 files changed, 80 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/init/array62.C
   create mode 100644 gcc/testsuite/g++.dg/init/array63.C
   create mode 100644 gcc/testsuite/g++.dg/init/array64.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index d2586fad86b..fb8c0e521fb 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -934,6 +934,31 @@ find_uninit_fields (tree *t, hash_set 
*uninitialized, tree member)
   }
   }
+/* Return true if it's OK to initialize an array from INIT.  Mere mortals
+   can't copy arrays, but the compiler can do so with a VEC_INIT_EXPR in
+   certain cases.  */
+
+static bool
+can_init_array_with_p (tree init)
+{
+  if (!init)
+return true;
+
+  /* We're called from synthesize_method, and we're processing the
+ mem-initializers of a constructor.  */
+  if (DECL_DEFAULTED_FN (current_function_decl))
+return true;
+  /* As an extension, we allow copying from a compound literal.  */
+  else if (TREE_CODE (init) == TARGET_EXPR)
+{
+  init = TARGET_EXPR_INITIAL (init);
+  if (TREE_CODE (init) == CONSTRUCTOR)
+   return CONSTRUCTOR_C99_COMPOUND_LITERAL (init);
+}
+
+  return false;
+}
+
   /* Initialize MEMBER, a FIELD_DECL, with INIT, a TREE_LIST of
  arguments.  If TREE_LIST is void_type_node, an empty initializer
  list was given; if NULL_TREE no initializer was given.  UNINITIALIZED
@@ -1085,7 +1110,7 @@ perform_member_init (tree member, tree init, hash_set 
)
 else if (type_build_ctor_call (type)
   || (init && CLASS_TYPE_P (strip_array_types (type
   {
-  if (TREE_CODE (type) == ARRAY_TYPE)
+  if (TREE_CODE (type) == ARRAY_TYPE && can_init_array_with_p (init))
{
  if (init == NULL_TREE
  || same_type_ignoring_top_level_qualifiers_p (type,


It seems like these last two existing lines also fall under "init is
suitable to initialize type", so let's fold them into the new function.


Sounds good.  Here it is:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
...from another array in a mem-initializer should not be accepted.

We already reject

   struct string {} a[1];
   string x[1](a);

but

   struct pair {
 string s[1];
 pair() : s(a) {}
   };

is wrongly accepted.

It started to be accepted with r0-110915-ga034826198b771:

which was supposed to be a cleanup, not a deliberate change to start
accepting the code.  The build_vec_init_expr code was added in r165976:
.

It appears that we do the magic copy array when we have a defaulted
constructor and we generate code for its mem-initializer which
initializes an array.  I also see that we go that path for compound
literals.  So when initializing an array member, we can limit building
up a VEC_INIT_EXPR to those special cases.

PR c++/59465

gcc/cp/ChangeLog:

* init.cc (can_init_array_with_p): New.
(perform_member_init): Check it.

gcc/testsuite/ChangeLog:

* g++.dg/init/array62.C: New test.
* g++.dg/init/array63.C: New test.
* g++.dg/init/array64.C: New test.
---
  gcc/cp/init.cc  | 31

Re: [PATCH] c++: ICE with noexcept and local specialization, again [PR114349]

2024-03-21 Thread Jason Merrill


On 3/21/24 17:01, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
we're dealing with a noexcept-spec there, not a noexcept-expr, so
setting cp_noexcept_operand et al is incorrect.  Back to the drawing
board then.

To fix noexcept84.C, we should probably avoid doing push_to_top_level
in certain cases.  Patrick suggested checking:

   const bool push_to_top = current_function_decl != fn;

which works, but I'm not sure I follow the logic there.  I also came
up with

   const bool push_to_top = !decl_function_context (fn);

which also works.  But ultimately I went with !DECL_TEMPLATE_INSTANTIATED;
if DECL_TEMPLATE_INSTANTIATED is set, we've already pushed to top level
if it was necessary in instantiate_body.


This sort of thing is what maybe_push_to_top_level is for, does that 
also work?



This also fixes c++/114349, introduced by r14-9339.

PR c++/114349

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Don't push_to_top_level if
fn has already been instantiated.  Don't save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept85.C: New test.
* g++.dg/cpp0x/noexcept86.C: New test.
---
  gcc/cp/pt.cc| 17 ++---
  gcc/testsuite/g++.dg/cpp0x/noexcept85.C | 33 +
  gcc/testsuite/g++.dg/cpp0x/noexcept86.C | 25 +++
  3 files changed, 66 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept85.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept86.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8cf0d5b7a8d..b7ebd93ac7d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26855,7 +26855,12 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
}
else if (push_tinst_level (fn))
{
- push_to_top_level ();
+ /* If we've already instantiated FN, there's no need to push to
+top level (as instantiate_body already pushed to top level if
+needed).  */
+ const bool push_to_top = !DECL_TEMPLATE_INSTANTIATED (fn);
+ if (push_to_top)
+   push_to_top_level ();
  push_access_scope (fn);
  push_deferring_access_checks (dk_no_deferred);
  input_location = DECL_SOURCE_LOCATION (fn);
@@ -26878,17 +26883,10 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  if (orig_fn)
++processing_template_decl;
  
-	  ++cp_unevaluated_operand;

- ++c_inhibit_evaluation_warnings;
- ++cp_noexcept_operand;
  /* Do deferred instantiation of the noexcept-specifier.  */
  noex = tsubst_expr (DEFERRED_NOEXCEPT_PATTERN (noex),
  DEFERRED_NOEXCEPT_ARGS (noex),
  tf_warning_or_error, fn);
- --cp_unevaluated_operand;
- --c_inhibit_evaluation_warnings;
- --cp_noexcept_operand;
-
  /* Build up the noexcept-specification.  */
  spec = build_noexcept_spec (noex, tf_warning_or_error);
  
@@ -26898,7 +26896,8 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)

  pop_deferring_access_checks ();
  pop_access_scope (fn);
  pop_tinst_level ();
- pop_from_top_level ();
+ if (push_to_top)
+   pop_from_top_level ();
}
else
spec = noexcept_false_spec;
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept85.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept85.C
new file mode 100644
index 000..b415bb46bc9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept85.C
@@ -0,0 +1,33 @@
+// PR c++/114349
+// { dg-do compile { target c++11 } }
+
+using A = struct {};
+template  class, typename, typename>
+using B = A;
+template 
+using C = typename T::D;
+struct E {
+  using D = B;
+};
+template  constexpr bool foo (A) { return false; }
+template  struct F {
+  using G = T;
+  using H = E;
+  F(const F &);
+  void operator=(F) noexcept(foo  (H::D{}));
+};
+template 
+using I = F;
+template 
+using J = I;
+struct K {
+  typedef J L;
+  L k;
+  K();
+};
+struct M {
+  bool bar () const;
+  K::L m;
+};
+K n;
+bool M::bar () const { n.k = m; return true; }
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept86.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept86.C
new file mode 100644
index 000..2d040c090f5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept86.C
@@ -0,0 +1,25 @@
+// PR c++/114349
+// { dg-do compile { target c++14 } }
+
+struct B
+{
+  int i;
+};
+
+template 
+void
+goo ()
+{
+  constexpr bool is_yes = BA;
+  struct C
+  {
+static auto g(B b) noexcept(is_yes) { }
+  };
+  C::g({});
+}
+
+void
+x ()
+{
+  goo();
+}

base-commit: 48d49200510198cafcab55601cd8e5f8eb541f01

Re: [PATCH] c++: ICE with noexcept and local specialization [PR114114]

2024-03-21 Thread Marek Polacek

On Fri, Mar 15, 2024 at 12:12:49PM -0400, Patrick Palka wrote:
> On Fri, 15 Mar 2024, Marek Polacek wrote:
> 
> > On Fri, Mar 15, 2024 at 10:35:07AM -0400, Patrick Palka wrote:
> > > On Tue, 5 Mar 2024, Marek Polacek wrote:
> > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > Here we ICE because we call register_local_specialization while
> > > > local_specializations is null, so
> > > > 
> > > >   local_specializations->put ();
> > > > 
> > > > crashes on null this.  It's null since maybe_instantiate_noexcept calls
> > > > push_to_top_level which creates a new scope.  Normally, I would have
> > > > guessed that we need a new local_specialization_stack.  But here we're
> > > > dealing with an operand of a noexcept, which is an unevaluated operand,
> > > > and those aren't registered in the hash map.  maybe_instantiate_noexcept
> > > > wasn't signalling that it's substituting an unevaluated operand though.
> > > 
> > > It thought it was noexcept-exprs rather than noexcept-specs that are
> > > unevaluated contexts?
> > 
> > Yes, sigh.  It would have to be noexcept(noexcept(x)).  I was looking at
> > cp_parser_unary_expression/RID_NOEXCEPT but that's a noexcept-expr.  So
> > what can we do here, set a new local_specialization_stack?  That wasn't
> > that straightforward when I tried.  Or maybe just
> 
> Maybe we can avoid doing push_to_top_level (which clears
> local_specializations) from maybe_instantiate_noexcept if
> current_function_decl == fn?

Thanks, I agree that not doing push_to_top_level in the first place
is a better fix.  I just sent a patch that does that.
 
> Relatedly I wonder if we can avoid calling regenerate_decl_from_template
> for local class member functions since they can't be redeclared?

Good point.  I've tried the below, but that breaks a lot of contracts tests.
I have not pursued it further than that.

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a7ba8b5af92..5352453a5d3 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26623,6 +26623,12 @@ regenerate_decl_from_template (tree decl, tree tmpl, 
tree args)
   if (DECL_UNIQUE_FRIEND_P (decl))
goto done;
 
+  /* [class.mem.general]/5 says that a member shall not be declared twice
+in the member-specification (unless it's a nested class or member class
+template or an enumeration).  */
+  if (DECL_CLASS_SCOPE_P (decl))
+   goto done;
+
   /* Use the source location of the definition.  */
   DECL_SOURCE_LOCATION (decl) = DECL_SOURCE_LOCATION (tmpl);
 

Marek

[PATCH] c++: ICE with noexcept and local specialization, again [PR114349]

2024-03-21 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
we're dealing with a noexcept-spec there, not a noexcept-expr, so
setting cp_noexcept_operand et al is incorrect.  Back to the drawing
board then.

To fix noexcept84.C, we should probably avoid doing push_to_top_level
in certain cases.  Patrick suggested checking:

  const bool push_to_top = current_function_decl != fn;

which works, but I'm not sure I follow the logic there.  I also came
up with

  const bool push_to_top = !decl_function_context (fn);

which also works.  But ultimately I went with !DECL_TEMPLATE_INSTANTIATED;
if DECL_TEMPLATE_INSTANTIATED is set, we've already pushed to top level
if it was necessary in instantiate_body.

This also fixes c++/114349, introduced by r14-9339.

PR c++/114349

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Don't push_to_top_level if
fn has already been instantiated.  Don't save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept85.C: New test.
* g++.dg/cpp0x/noexcept86.C: New test.
---
 gcc/cp/pt.cc| 17 ++---
 gcc/testsuite/g++.dg/cpp0x/noexcept85.C | 33 +
 gcc/testsuite/g++.dg/cpp0x/noexcept86.C | 25 +++
 3 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept85.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept86.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8cf0d5b7a8d..b7ebd93ac7d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26855,7 +26855,12 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
}
   else if (push_tinst_level (fn))
{
- push_to_top_level ();
+ /* If we've already instantiated FN, there's no need to push to
+top level (as instantiate_body already pushed to top level if
+needed).  */
+ const bool push_to_top = !DECL_TEMPLATE_INSTANTIATED (fn);
+ if (push_to_top)
+   push_to_top_level ();
  push_access_scope (fn);
  push_deferring_access_checks (dk_no_deferred);
  input_location = DECL_SOURCE_LOCATION (fn);
@@ -26878,17 +26883,10 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  if (orig_fn)
++processing_template_decl;
 
- ++cp_unevaluated_operand;
- ++c_inhibit_evaluation_warnings;
- ++cp_noexcept_operand;
  /* Do deferred instantiation of the noexcept-specifier.  */
  noex = tsubst_expr (DEFERRED_NOEXCEPT_PATTERN (noex),
  DEFERRED_NOEXCEPT_ARGS (noex),
  tf_warning_or_error, fn);
- --cp_unevaluated_operand;
- --c_inhibit_evaluation_warnings;
- --cp_noexcept_operand;
-
  /* Build up the noexcept-specification.  */
  spec = build_noexcept_spec (noex, tf_warning_or_error);
 
@@ -26898,7 +26896,8 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  pop_deferring_access_checks ();
  pop_access_scope (fn);
  pop_tinst_level ();
- pop_from_top_level ();
+ if (push_to_top)
+   pop_from_top_level ();
}
   else
spec = noexcept_false_spec;
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept85.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept85.C
new file mode 100644
index 000..b415bb46bc9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept85.C
@@ -0,0 +1,33 @@
+// PR c++/114349
+// { dg-do compile { target c++11 } }
+
+using A = struct {};
+template  class, typename, typename>
+using B = A;
+template 
+using C = typename T::D;
+struct E {
+  using D = B;
+};
+template  constexpr bool foo (A) { return false; }
+template  struct F {
+  using G = T;
+  using H = E;
+  F(const F &);
+  void operator=(F) noexcept(foo  (H::D{}));
+};
+template 
+using I = F;
+template 
+using J = I;
+struct K {
+  typedef J L;
+  L k;
+  K();
+};
+struct M {
+  bool bar () const;
+  K::L m;
+};
+K n;
+bool M::bar () const { n.k = m; return true; }
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept86.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept86.C
new file mode 100644
index 000..2d040c090f5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept86.C
@@ -0,0 +1,25 @@
+// PR c++/114349
+// { dg-do compile { target c++14 } }
+
+struct B
+{
+  int i;
+};
+
+template 
+void
+goo ()
+{
+  constexpr bool is_yes = BA;
+  struct C
+  {
+static auto g(B b) noexcept(is_yes) { }
+  };
+  C::g({});
+}
+
+void
+x ()
+{
+  goo();
+}

base-commit: 48d49200510198cafcab55601cd8e5f8eb541f01
-- 
2.44.0

Re: [PATCH] fortran: Ignore use statements on error [PR107426]

2024-03-21 Thread Harald Anlauf


Hi Mikael,

this looks all good to me.  I wouldn't mind the minor side-effects of
better error recovery, as you are (successfully) trying hard to keep
the namespaces sane.  So OK for mainline.

Thanks for the patch!

Harald


On 3/21/24 17:27, Mikael Morin wrote:

Hello,

here is a fix for an ICE caused by dangling pointers to ISO_C_BINDING's
C_PTR symbol in the global intrinsic symbol for C_LOC.
I tried to fix it by making the intrinsic symbol use its own copy of
C_PTR, but it regressed heavily.

Instead, I propose this which is based on a patch I attached to the PR
one year ago.  It's sufficient to remove the access to freed memory.

However, an underlying problem remains that successive use-associations
of ISO_C_BINDING's symbols in different scopes cause the return type
of the C_LOC global intrinsic symbol to be set to the C_PTR from each
scope successively, with the last one "winning".  Not very pretty.

Anyway, there are two changed messages in the testsuite as a side-effect
of the proposed change.  I regard them as acceptable, albeit slightly worse.
No regression otherwise on x86_64-pc-linux-gnu.
Ok for 14 master?

Mikael

-- >8 --

This fixes an access to freed memory on the testcase from the PR.
The problem comes from an invalid subroutine statement in an interface,
which is ignored and causes the following statements forming the procedure
body to be rejected.  One of them use-associates the intrinsic ISO_C_BINDING
module, which imports new symbols in a namespace that is freed at the time
the statement is rejected.  However, this creates dangling pointers as
ISO_C_BINDING is special and its import creates a reference to the imported
C_PTR symbol in the return type of the global intrinsic symbol for C_LOC
(see the function create_intrinsic_function).

This change saves and restores the list of use statements, so that rejected
use statements are removed before they have a chance to be applied to the
current namespace and create dangling pointers.

PR fortran/107426

gcc/fortran/ChangeLog:

* gfortran.h (gfc_save_module_list, gfc_restore_old_module_list):
New declarations.
* module.cc (old_module_list_tail): New global variable.
(gfc_save_module_list, gfc_restore_old_module_list): New functions.
(gfc_use_modules): Set module_list and old_module_list_tail.
* parse.cc (next_statement): Save module_list before doing any work.
(reject_statement): Restore module_list to its saved value.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr89943_3.f90: Update error pattern.
* gfortran.dg/pr89943_4.f90: Likewise.
* gfortran.dg/use_31.f90: New test.
---
  gcc/fortran/gfortran.h  |  2 ++
  gcc/fortran/module.cc   | 31 +
  gcc/fortran/parse.cc|  4 
  gcc/testsuite/gfortran.dg/pr89943_3.f90 |  2 +-
  gcc/testsuite/gfortran.dg/pr89943_4.f90 |  2 +-
  gcc/testsuite/gfortran.dg/use_31.f90| 25 
  6 files changed, 64 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/use_31.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index c7039730fad..fec7b53ff1a 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3926,6 +3926,8 @@ void gfc_module_done_2 (void);
  void gfc_dump_module (const char *, int);
  bool gfc_check_symbol_access (gfc_symbol *);
  void gfc_free_use_stmts (gfc_use_list *);
+void gfc_save_module_list ();
+void gfc_restore_old_module_list ();
  const char *gfc_dt_lower_string (const char *);
  const char *gfc_dt_upper_string (const char *);

diff --git a/gcc/fortran/module.cc b/gcc/fortran/module.cc
index d1de53cbdb4..c565b84d61b 100644
--- a/gcc/fortran/module.cc
+++ b/gcc/fortran/module.cc
@@ -195,7 +195,12 @@ static const char *module_name;
  /* The name of the .smod file that the submodule will write to.  */
  static const char *submodule_name;

+/* The list of use statements to apply to the current namespace
+   before parsing the non-use statements.  */
  static gfc_use_list *module_list;
+/* The end of the MODULE_LIST list above at the time the recognition
+   of the current statement started.  */
+static gfc_use_list **old_module_list_tail;

  /* If we're reading an intrinsic module, this is its ID.  */
  static intmod_id current_intmod;
@@ -7561,6 +7566,8 @@ gfc_use_modules (void)
gfc_use_module (module_list);
free (module_list);
  }
+  module_list = NULL;
+  old_module_list_tail = _list;
gfc_rename_list = NULL;
  }

@@ -7584,6 +7591,30 @@ gfc_free_use_stmts (gfc_use_list *use_stmts)
  }


+/* Remember the end of the MODULE_LIST list, so that the list can be restored
+   to its previous state if the current statement is erroneous.  */
+
+void
+gfc_save_module_list ()
+{
+  gfc_use_list **tail = _list;
+  while (*tail != NULL)
+tail = &(*tail)->next;
+  old_module_list_tail = tail;
+}
+
+
+/* Restore the MODULE_LIST list to its

[PATCH v2] c++: direct-init of an array of class type [PR59465]

2024-03-21 Thread Marek Polacek

On Wed, Mar 20, 2024 at 09:21:02PM -0400, Jason Merrill wrote:
> On 3/1/24 19:58, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?  I don't
> > claim that this has to go to 14 though.
> > 
> > -- >8 --
> > ...from another array in a mem-initializer should not be accepted.
> > 
> > We already reject
> > 
> >struct string {} a[1];
> >string x[1](a);
> > 
> > but
> > 
> >struct pair {
> >  string s[1];
> >  pair() : s(a) {}
> >};
> > 
> > is wrongly accepted.
> > 
> > It started to be accepted with r0-110915-ga034826198b771:
> > 
> > which was supposed to be a cleanup, not a deliberate change to start
> > accepting the code.  The build_vec_init_expr code was added in r165976:
> > .
> > 
> > It appears that we do the magic copy array when we have a defaulted
> > constructor and we generate code for its mem-initializer which
> > initializes an array.  I also see that we go that path for compound
> > literals.  So when initializing an array member, we can limit building
> > up a VEC_INIT_EXPR to those special cases.
> > 
> > PR c++/59465
> > 
> > gcc/cp/ChangeLog:
> > 
> > * init.cc (can_init_array_with_p): New.
> > (perform_member_init): Check it.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/init/array62.C: New test.
> > * g++.dg/init/array63.C: New test.
> > * g++.dg/init/array64.C: New test.
> > ---
> >   gcc/cp/init.cc  | 27 ++-
> >   gcc/testsuite/g++.dg/init/array62.C | 19 +++
> >   gcc/testsuite/g++.dg/init/array63.C | 13 +
> >   gcc/testsuite/g++.dg/init/array64.C | 22 ++
> >   4 files changed, 80 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/init/array62.C
> >   create mode 100644 gcc/testsuite/g++.dg/init/array63.C
> >   create mode 100644 gcc/testsuite/g++.dg/init/array64.C
> > 
> > diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> > index d2586fad86b..fb8c0e521fb 100644
> > --- a/gcc/cp/init.cc
> > +++ b/gcc/cp/init.cc
> > @@ -934,6 +934,31 @@ find_uninit_fields (tree *t, hash_set 
> > *uninitialized, tree member)
> >   }
> >   }
> > +/* Return true if it's OK to initialize an array from INIT.  Mere mortals
> > +   can't copy arrays, but the compiler can do so with a VEC_INIT_EXPR in
> > +   certain cases.  */
> > +
> > +static bool
> > +can_init_array_with_p (tree init)
> > +{
> > +  if (!init)
> > +return true;
> > +
> > +  /* We're called from synthesize_method, and we're processing the
> > + mem-initializers of a constructor.  */
> > +  if (DECL_DEFAULTED_FN (current_function_decl))
> > +return true;
> > +  /* As an extension, we allow copying from a compound literal.  */
> > +  else if (TREE_CODE (init) == TARGET_EXPR)
> > +{
> > +  init = TARGET_EXPR_INITIAL (init);
> > +  if (TREE_CODE (init) == CONSTRUCTOR)
> > +   return CONSTRUCTOR_C99_COMPOUND_LITERAL (init);
> > +}
> > +
> > +  return false;
> > +}
> > +
> >   /* Initialize MEMBER, a FIELD_DECL, with INIT, a TREE_LIST of
> >  arguments.  If TREE_LIST is void_type_node, an empty initializer
> >  list was given; if NULL_TREE no initializer was given.  UNINITIALIZED
> > @@ -1085,7 +1110,7 @@ perform_member_init (tree member, tree init, 
> > hash_set )
> > else if (type_build_ctor_call (type)
> >|| (init && CLASS_TYPE_P (strip_array_types (type
> >   {
> > -  if (TREE_CODE (type) == ARRAY_TYPE)
> > +  if (TREE_CODE (type) == ARRAY_TYPE && can_init_array_with_p (init))
> > {
> >   if (init == NULL_TREE
> >   || same_type_ignoring_top_level_qualifiers_p (type,
> 
> It seems like these last two existing lines also fall under "init is
> suitable to initialize type", so let's fold them into the new function.

Sounds good.  Here it is:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
...from another array in a mem-initializer should not be accepted.

We already reject

  struct string {} a[1];
  string x[1](a);

but

  struct pair {
string s[1];
pair() : s(a) {}
  };

is wrongly accepted.

It started to be accepted with r0-110915-ga034826198b771:

which was supposed to be a cleanup, not a deliberate change to start
accepting the code.  The build_vec_init_expr code was added in r165976:
.

It appears that we do the magic copy array when we have a defaulted
constructor and we generate code for its mem-initializer which
initializes an array.  I also see that we go that path for compound
literals.  So when initializing an array member, we can limit building
up a VEC_INIT_EXPR to those special cases.

PR c++/59465

gcc/cp/ChangeLog:

* init.cc

Re: scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-21 Thread Jeff Law





On 3/21/24 11:19 AM, Vineet Gupta wrote:



So if we go back to Robin's observation that scheduling dramatically
increases the instruction count, perhaps we try a run with
-fno-schedule-insns -fno-schedule-insns2 and see how the instruction
counts compare.


Oh yeah ! Robin hinted to this in Tues patchworks meeting too

default : 2,565,319,368,591
128 : 2,509,741,035,068
256 : 2,527,817,813,612
no-sched{,2}: 1,295,520,567,376

Now we're getting somewhere.  That's in line with expectations.

I would strongly suspect it's -fno-schedule-insns rather than 
-fno-schedule-insns2.  The former turns off scheduling before register 
allocation, the second turns it off after register allocation.  So if 
our theory about spilling is correct, then it must be the first since 
the second won't affect register allocation.   While I can speculate 
about other potential scheduler impacts, spilling due to sched1's 
actions is by far the most likely.


Given the magnitude here, I would bet we can see this pretty clearly if 
you've got function level or block level count data for those runs.  I'd 
start with that, ideally narrowing things down to a function or hot loop 
within a function which shows a huge delta.


From that we can then look at the IRA and LRA dumps and correlate what 
we see there with the before/after scheduling dumps to see how we've 
lengthened lifetimes in critical locations.


I'd probably start with the IRA dump.  It's going to have annotations in 
its dump output like "Potential Spill" which may guide us.  In simplest 
terms a pseudo is trivially allocatable when it has fewer neighbors in 
the conflict graph than available hard registers.  If it has more 
neighbors in the conflict graph than available hard registers, then it's 
potentially going to be spilled -- we can't know during this phase of 
allocation.


As we pop registers off the coloring stack, some neighbors of the pseudo 
in question may end up allocated into the same hard register.  That can 
sometimes result in a hard register being available.  It might be easier 
to see with a graph


a--b--c
   |
   d

Where a..d are pseudo registers.  If two pseudos are connected by an 
edge, then they have overlapping lifetimes and can't be allocated to the 
same hard register.  So as we can see b conflicts with a, c & d.  If we 
only have two hard registers, then b is not trivially colorable and will 
be marked as a potential spill.


During coloring we may end up allocating a, c & d to the same hard 
register (they don't conflict, so its safe).  If that happens, then 
there would be a register available for b.


Anyway, that should explain why b would be marked as a potential spill 
and how it might end up getting a hard register anyway.


The hope is we can see the potential spills increasing.  At which point 
we can walk backwards to sched1 and dive into its scheduling decisions.


Jeff

Re: scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-21 Thread Vineet Gupta




On 3/21/24 07:45, Jeff Law wrote:
 The first patch is the main change which improves SPEC cactu by 10%.
>>> Just to confirm.  Yup, 10% reduction in icounts and about a 3.5%
>>> improvement in cycles on our target.  Which is great!
>>>
>>> This also makes me wonder if cactu is the benchmark that was sensitive
>>> to flushing the pending queue in the scheduler.  Jivan's data would tend
>>> to indicate that is the case as several routines seem to flush the
>>> pending queue often.  In particular:
>>>
>>> ML_BSSN_RHS_Body
>>> ML_BSSN_Advect_Body
>>> ML_BSSN_constraints_Body
>>>
>>> All have a high number of dynamic instructions as well as lots of
>>> flushes of the pending queue.
>>>
>>> Vineet, you might want to look and see if cranking up the
>>> max-pending-list-length parameter helps drive down spilling.   I think
>>> it's default value is 32 insns.  I've seen it cranked up to 128 and 256
>>> insns without significant ill effects on compile time.
>>>
>>> My recollection (it's been like 3 years) of the key loop was that it had
>>> a few hundred instructions and we'd flush the pending list about 50
>>> cycles into the loop as there just wasn't enough issue bandwidth to the
>>> FP units to dispatch all the FP instructions as their inputs became
>>> ready.  So you'd be looking for flushes in a big loop.
>> Here are the results for Cactu on top of the new splitter changes:
>>
>> default  : 2,565,319,368,591
>> 128  : 2,509,741,035,068
>> 256  : 2,527,817,813,612
>>
>> I've haven't probed deeper in generated code itself but likely to be
>> helping with spilling
> Actually, I read that as not important for this issue.  While it is 50b 
> instructions, I would be looking for something that had perhaps an order 
> of magnitude bigger impact.Ultimately I think it means we still 
> don't have a good handle on what's causing the spilling.  Oh well.
>
> So if we go back to Robin's observation that scheduling dramatically 
> increases the instruction count, perhaps we try a run with 
> -fno-schedule-insns -fno-schedule-insns2 and see how the instruction 
> counts compare.

Oh yeah ! Robin hinted to this in Tues patchworks meeting too

default : 2,565,319,368,591
128 : 2,509,741,035,068
256 : 2,527,817,813,612
no-sched{,2}: 1,295,520,567,376

-Vineet

Re: [PATCH] s390: testsuite: Fix abs-4.c

2024-03-21 Thread Andreas Krebbel

On 3/21/24 15:41, Stefan Schulze Frielinghaus wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/abs-4.c: On s390 we also have a copysign optab
>   for long double.  Thus, scan 3 instead of 2 times for it.
> ---
>  Ok for mainline?

Ok. Thanks!

Andreas

[PATCH] RISC-V: Require a extension for ztso testcases with atomic insns

2024-03-21 Thread Patrick O'Neill

Use dg_add_options riscv_a to add atomic extension when running compile
tests on non-a targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add
  dg_add_options riscv_a
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-5.c| 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-1.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-2.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-3.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-4.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-5.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-6.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-7.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c  | 1 +
 17 files changed, 17 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
index 65a4351025d..a9edc33ff39 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
index 03da6b04de0..ad843402bcc 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
index 695306e9d6f..bdae5bb83a6 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
index e7e5ac7cc88..815a72f1e56 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g"

[PATCH] fortran: Ignore use statements on error [PR107426]

2024-03-21 Thread Mikael Morin

Hello,

here is a fix for an ICE caused by dangling pointers to ISO_C_BINDING's
C_PTR symbol in the global intrinsic symbol for C_LOC.
I tried to fix it by making the intrinsic symbol use its own copy of
C_PTR, but it regressed heavily.

Instead, I propose this which is based on a patch I attached to the PR
one year ago.  It's sufficient to remove the access to freed memory.

However, an underlying problem remains that successive use-associations
of ISO_C_BINDING's symbols in different scopes cause the return type
of the C_LOC global intrinsic symbol to be set to the C_PTR from each
scope successively, with the last one "winning".  Not very pretty.

Anyway, there are two changed messages in the testsuite as a side-effect
of the proposed change.  I regard them as acceptable, albeit slightly worse.
No regression otherwise on x86_64-pc-linux-gnu.
Ok for 14 master?

Mikael

-- >8 --

This fixes an access to freed memory on the testcase from the PR.
The problem comes from an invalid subroutine statement in an interface,
which is ignored and causes the following statements forming the procedure
body to be rejected.  One of them use-associates the intrinsic ISO_C_BINDING
module, which imports new symbols in a namespace that is freed at the time
the statement is rejected.  However, this creates dangling pointers as
ISO_C_BINDING is special and its import creates a reference to the imported
C_PTR symbol in the return type of the global intrinsic symbol for C_LOC
(see the function create_intrinsic_function).

This change saves and restores the list of use statements, so that rejected
use statements are removed before they have a chance to be applied to the
current namespace and create dangling pointers.

PR fortran/107426

gcc/fortran/ChangeLog:

* gfortran.h (gfc_save_module_list, gfc_restore_old_module_list):
New declarations.
* module.cc (old_module_list_tail): New global variable.
(gfc_save_module_list, gfc_restore_old_module_list): New functions.
(gfc_use_modules): Set module_list and old_module_list_tail.
* parse.cc (next_statement): Save module_list before doing any work.
(reject_statement): Restore module_list to its saved value.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr89943_3.f90: Update error pattern.
* gfortran.dg/pr89943_4.f90: Likewise.
* gfortran.dg/use_31.f90: New test.
---
 gcc/fortran/gfortran.h  |  2 ++
 gcc/fortran/module.cc   | 31 +
 gcc/fortran/parse.cc|  4 
 gcc/testsuite/gfortran.dg/pr89943_3.f90 |  2 +-
 gcc/testsuite/gfortran.dg/pr89943_4.f90 |  2 +-
 gcc/testsuite/gfortran.dg/use_31.f90| 25 
 6 files changed, 64 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/use_31.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index c7039730fad..fec7b53ff1a 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3926,6 +3926,8 @@ void gfc_module_done_2 (void);
 void gfc_dump_module (const char *, int);
 bool gfc_check_symbol_access (gfc_symbol *);
 void gfc_free_use_stmts (gfc_use_list *);
+void gfc_save_module_list ();
+void gfc_restore_old_module_list ();
 const char *gfc_dt_lower_string (const char *);
 const char *gfc_dt_upper_string (const char *);
 
diff --git a/gcc/fortran/module.cc b/gcc/fortran/module.cc
index d1de53cbdb4..c565b84d61b 100644
--- a/gcc/fortran/module.cc
+++ b/gcc/fortran/module.cc
@@ -195,7 +195,12 @@ static const char *module_name;
 /* The name of the .smod file that the submodule will write to.  */
 static const char *submodule_name;
 
+/* The list of use statements to apply to the current namespace
+   before parsing the non-use statements.  */
 static gfc_use_list *module_list;
+/* The end of the MODULE_LIST list above at the time the recognition
+   of the current statement started.  */
+static gfc_use_list **old_module_list_tail;
 
 /* If we're reading an intrinsic module, this is its ID.  */
 static intmod_id current_intmod;
@@ -7561,6 +7566,8 @@ gfc_use_modules (void)
   gfc_use_module (module_list);
   free (module_list);
 }
+  module_list = NULL;
+  old_module_list_tail = _list;
   gfc_rename_list = NULL;
 }
 
@@ -7584,6 +7591,30 @@ gfc_free_use_stmts (gfc_use_list *use_stmts)
 }
 
 
+/* Remember the end of the MODULE_LIST list, so that the list can be restored
+   to its previous state if the current statement is erroneous.  */
+
+void
+gfc_save_module_list ()
+{
+  gfc_use_list **tail = _list;
+  while (*tail != NULL)
+tail = &(*tail)->next;
+  old_module_list_tail = tail;
+}
+
+
+/* Restore the MODULE_LIST list to its previous value and free the use
+   statements that are no longer part of the list.  */
+
+void
+gfc_restore_old_module_list ()
+{
+  gfc_free_use_stmts (*old_module_list_tail);
+  *old_module_list_tail = NULL;
+}
+
+
 void
 gfc_module_init_2 (void)
 {
diff --git a/gcc/fortran/parse.cc

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs


On 21/03/2024 15:18, Richard Biener wrote:

On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs  wrote:


My previous patch to fix this problem with xor was rejected because we
want to fix these issues only at the point of use.  That patch produced
slightly better code, in this example, but this works too

These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).

The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps.

OK for mainline?

Andrew

gcc/ChangeLog:

 * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
 bitmaps.
 * internal-fn.cc (expand_fn_using_insn): Likewise.
 (add_mask_and_len_args): Likewise.
---
  gcc/dojump.cc  | 16 
  gcc/internal-fn.cc | 26 ++
  2 files changed, 42 insertions(+)

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..8df86957e83 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code 
code, int unsignedp,
 }
 }

+  if (val
+ && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+ && SCALAR_INT_MODE_P (mode))
+   {
+ auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+ if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+   {
+ op0 = expand_binop (mode, and_optab, op0,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ op1 = expand_binop (mode, and_optab, op1,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+   }
+   }
+


Can we then remove the same code from do_compare_and_jump before the call to
do_compare_rtx_and_jump?


It's called from do_jump.


 I'll note that we don't pass 'val' there and
'val' is unfortunately
not documented - what's it supposed to be?  I think I placed the original fix in
do_compare_and_jump because we have the full into available there.  So
what's the
do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 'val'
looks fragile)


"val" is the tree expression from which the rtx op0 was expanded. It's 
optional, but it's used in emit_cmp_and_jump_insns to determine whether 
the target supports tbranch (according to a comment).


I think it would be safe to remove your code as that path does path 
"treeop0" to "val".


WDYT?


The other hunks below are OK.


Thanks.

Andrew


Thanks,
Richard.


emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
if_true_label, prob);
  }
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
unsigned int noutputs,
&& SSA_NAME_IS_DEFAULT_DEF (rhs)
&& VAR_P (SSA_NAME_VAR (rhs)))
 create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
+  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+  && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+   TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+   {
+ /* Ensure that the vector bitmasks do not have excess bits.  */
+ int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+ rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ create_input_operand ([opno], tmp, TYPE_MODE (rhs_type));
+   }
else
 create_input_operand ([opno], rhs_rtx, TYPE_MODE (rhs_type));
opno += 1;
@@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int 
opno, gcall *stmt)
  {
tree mask = gimple_call_arg (stmt, mask_index);
rtx mask_rtx = expand_normal (mask);
+
+  tree mask_type = TREE_TYPE (mask);
+  if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+ && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
+ && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
+  TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
+   {
+ /* Ensure that the vector bitmasks do not have excess bits.  */
+ int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
+ mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
+

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Richard Biener

On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs  wrote:
>
> My previous patch to fix this problem with xor was rejected because we
> want to fix these issues only at the point of use.  That patch produced
> slightly better code, in this example, but this works too
>
> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> configured to use V32 instead of V64 (I plan to do this for RDNA devices).
>
> The problem was that a "not" operation on the mask inadvertently enabled
> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
> when calling internal functions (in this case COND_MINUS), when doing masked
> loads and stores, and when doing conditional jumps.
>
> OK for mainline?
>
> Andrew
>
> gcc/ChangeLog:
>
> * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
> bitmaps.
> * internal-fn.cc (expand_fn_using_insn): Likewise.
> (add_mask_and_len_args): Likewise.
> ---
>  gcc/dojump.cc  | 16 
>  gcc/internal-fn.cc | 26 ++
>  2 files changed, 42 insertions(+)
>
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 88600cb42d3..8df86957e83 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum 
> rtx_code code, int unsignedp,
> }
> }
>
> +  if (val
> + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> + && SCALAR_INT_MODE_P (mode))
> +   {
> + auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
> + if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> +   {
> + op0 = expand_binop (mode, and_optab, op0,
> + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> + NULL_RTX, true, OPTAB_WIDEN);
> + op1 = expand_binop (mode, and_optab, op1,
> + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> + NULL_RTX, true, OPTAB_WIDEN);
> +   }
> +   }
> +

Can we then remove the same code from do_compare_and_jump before the call to
do_compare_rtx_and_jump?  I'll note that we don't pass 'val' there and
'val' is unfortunately
not documented - what's it supposed to be?  I think I placed the original fix in
do_compare_and_jump because we have the full into available there.  So
what's the
do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 'val'
looks fragile)

The other hunks below are OK.

Thanks,
Richard.

>emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>if_true_label, prob);
>  }
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index fcf47c7fa12..5269f0ac528 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>&& SSA_NAME_IS_DEFAULT_DEF (rhs)
>&& VAR_P (SSA_NAME_VAR (rhs)))
> create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
> +  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> +  && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> +  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> +   TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
> +   {
> + /* Ensure that the vector bitmasks do not have excess bits.  */
> + int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
> + rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
> + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> + NULL_RTX, true, OPTAB_WIDEN);
> + create_input_operand ([opno], tmp, TYPE_MODE (rhs_type));
> +   }
>else
> create_input_operand ([opno], rhs_rtx, TYPE_MODE (rhs_type));
>opno += 1;
> @@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int 
> opno, gcall *stmt)
>  {
>tree mask = gimple_call_arg (stmt, mask_index);
>rtx mask_rtx = expand_normal (mask);
> +
> +  tree mask_type = TREE_TYPE (mask);
> +  if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> + && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
> + && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
> +  TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
> +   {
> + /* Ensure that the vector bitmasks do not have excess bits.  */
> + int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
> + mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
> +  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +  NULL_RTX, true, OPTAB_WIDEN);
> +   }
> +
>create_input_operand ([opno++], mask_rtx,
> TYPE_MODE

Re: scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-21 Thread Jeff Law





On 3/21/24 8:36 AM, Vineet Gupta wrote:



On 3/18/24 21:41, Jeff Law wrote:


On 3/16/24 11:35 AM, Vineet Gupta wrote:

Hi,

This set of patches (for gcc-15) help improve stack/array accesses
by improving constant materialization. Details are in respective
patches.

The first patch is the main change which improves SPEC cactu by 10%.

Just to confirm.  Yup, 10% reduction in icounts and about a 3.5%
improvement in cycles on our target.  Which is great!

This also makes me wonder if cactu is the benchmark that was sensitive
to flushing the pending queue in the scheduler.  Jivan's data would tend
to indicate that is the case as several routines seem to flush the
pending queue often.  In particular:

ML_BSSN_RHS_Body
ML_BSSN_Advect_Body
ML_BSSN_constraints_Body

All have a high number of dynamic instructions as well as lots of
flushes of the pending queue.

Vineet, you might want to look and see if cranking up the
max-pending-list-length parameter helps drive down spilling.   I think
it's default value is 32 insns.  I've seen it cranked up to 128 and 256
insns without significant ill effects on compile time.

My recollection (it's been like 3 years) of the key loop was that it had
a few hundred instructions and we'd flush the pending list about 50
cycles into the loop as there just wasn't enough issue bandwidth to the
FP units to dispatch all the FP instructions as their inputs became
ready.  So you'd be looking for flushes in a big loop.


Here are the results for Cactu on top of the new splitter changes:

default : 2,565,319,368,591
128 : 2,509,741,035,068
256 : 2,527,817,813,612

I've haven't probed deeper in generated code itself but likely to be
helping with spilling
Actually, I read that as not important for this issue.  While it is 50b 
instructions, I would be looking for something that had perhaps an order 
of magnitude bigger impact.Ultimately I think it means we still 
don't have a good handle on what's causing the spilling.  Oh well.


So if we go back to Robin's observation that scheduling dramatically 
increases the instruction count, perhaps we try a run with 
-fno-schedule-insns -fno-schedule-insns2 and see how the instruction 
counts compare.



Jeff

[PATCH] s390: testsuite: Fix abs-4.c

2024-03-21 Thread Stefan Schulze Frielinghaus

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/abs-4.c: On s390 we also have a copysign optab
for long double.  Thus, scan 3 instead of 2 times for it.
---
 Ok for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/abs-4.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
index 80fa448df12..4144d1cd954 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
@@ -10,8 +10,9 @@ long double abs_ld(long double x) { return 
__builtin_signbit(x) ? x : -x; }
 
 /* __builtin_signbit(x) ? x : -x. Should be convert into - ABS_EXP */
 /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
-/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized" { target 
ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times "= -" 1 "optimized" { target ifn_copysign 
} } } */
-/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized" { target 
ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized" { target { 
ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= -" 1 "optimized" { target { 
ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized" { target { 
ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 3 "optimized" { target { 
ifn_copysign && s390*-*-* } } } } */
 /* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized" { target { ! 
ifn_copysign } } } } */
 /* { dg-final { scan-tree-dump-times "= -" 3 "optimized" { target { ! 
ifn_copysign } } } } */
-- 
2.43.0

scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-21 Thread Vineet Gupta




On 3/18/24 21:41, Jeff Law wrote:
>
> On 3/16/24 11:35 AM, Vineet Gupta wrote:
>> Hi,
>>
>> This set of patches (for gcc-15) help improve stack/array accesses
>> by improving constant materialization. Details are in respective
>> patches.
>>
>> The first patch is the main change which improves SPEC cactu by 10%.
> Just to confirm.  Yup, 10% reduction in icounts and about a 3.5% 
> improvement in cycles on our target.  Which is great!
>
> This also makes me wonder if cactu is the benchmark that was sensitive 
> to flushing the pending queue in the scheduler.  Jivan's data would tend 
> to indicate that is the case as several routines seem to flush the 
> pending queue often.  In particular:
>
> ML_BSSN_RHS_Body
> ML_BSSN_Advect_Body
> ML_BSSN_constraints_Body
>
> All have a high number of dynamic instructions as well as lots of 
> flushes of the pending queue.
>
> Vineet, you might want to look and see if cranking up the 
> max-pending-list-length parameter helps drive down spilling.   I think 
> it's default value is 32 insns.  I've seen it cranked up to 128 and 256 
> insns without significant ill effects on compile time.
>
> My recollection (it's been like 3 years) of the key loop was that it had 
> a few hundred instructions and we'd flush the pending list about 50 
> cycles into the loop as there just wasn't enough issue bandwidth to the 
> FP units to dispatch all the FP instructions as their inputs became 
> ready.  So you'd be looking for flushes in a big loop.

Here are the results for Cactu on top of the new splitter changes:

default : 2,565,319,368,591
128 : 2,509,741,035,068
256 : 2,527,817,813,612

I've haven't probed deeper in generated code itself but likely to be
helping with spilling

-Vineet

[PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs

My previous patch to fix this problem with xor was rejected because we
want to fix these issues only at the point of use.  That patch produced
slightly better code, in this example, but this works too

These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).

The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps.

OK for mainline?

Andrew

gcc/ChangeLog:

* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
bitmaps.
* internal-fn.cc (expand_fn_using_insn): Likewise.
(add_mask_and_len_args): Likewise.
---
 gcc/dojump.cc  | 16 
 gcc/internal-fn.cc | 26 ++
 2 files changed, 42 insertions(+)

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..8df86957e83 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code 
code, int unsignedp,
}
}
 
+  if (val
+ && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+ && SCALAR_INT_MODE_P (mode))
+   {
+ auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+ if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+   {
+ op0 = expand_binop (mode, and_optab, op0,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ op1 = expand_binop (mode, and_optab, op1,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+   }
+   }
+
   emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
   if_true_label, prob);
 }
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
unsigned int noutputs,
   && SSA_NAME_IS_DEFAULT_DEF (rhs)
   && VAR_P (SSA_NAME_VAR (rhs)))
create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
+  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+  && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+   TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+   {
+ /* Ensure that the vector bitmasks do not have excess bits.  */
+ int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+ rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ create_input_operand ([opno], tmp, TYPE_MODE (rhs_type));
+   }
   else
create_input_operand ([opno], rhs_rtx, TYPE_MODE (rhs_type));
   opno += 1;
@@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int 
opno, gcall *stmt)
 {
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx mask_rtx = expand_normal (mask);
+
+  tree mask_type = TREE_TYPE (mask);
+  if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+ && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
+ && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
+  TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
+   {
+ /* Ensure that the vector bitmasks do not have excess bits.  */
+ int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
+ mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
+  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+  NULL_RTX, true, OPTAB_WIDEN);
+   }
+
   create_input_operand ([opno++], mask_rtx,
TYPE_MODE (TREE_TYPE (mask)));
 }
-- 
2.41.0

Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-21 Thread Kito Cheng

> > &, ^, | has supported on clang, so I think we should support that as well
>
> Looks gcc lack of such operation right now, so mark the TYPE_INDIVISIBLE_P 
> (type) = 0 as aarch64 did.
> I have a try but I am afraid we need separated patch to take care of it for 
> risk control consideration.

Yeah, agree, that's defer this part to GCC 15 :)

RE: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-21 Thread Li, Pan2

> The result of comparison should be vbool* rather than v[u]int*.
> The result of comparison should be vbool* rather than vfloat*,
> otherwise all 1 is not really meanful for floating point value.

> But I know clang generates the same strange/wrong code here...

I see, will update the test cases and double check about it in v4.

> &, ^, | has supported on clang, so I think we should support that as well

Looks gcc lack of such operation right now, so mark the TYPE_INDIVISIBLE_P 
(type) = 0 as aarch64 did.
I have a try but I am afraid we need separated patch to take care of it for 
risk control consideration.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, March 21, 2024 9:25 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; rdapp@gmail.com; vine...@rivosinc.com; 
pal...@rivosinc.com
Subject: Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits 
for RVV

> For the vint*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=

The result of comparison should be vbool* rather than v[u]int*.

> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> For the vfloat*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=

The result of comparison should be vbool* rather than vfloat*,
otherwise all 1 is not really meanful for floating point value.

But I know clang generates the same strange/wrong code here...

> * ALU: +, -, *, /, -
>
> For the vbool*_t types only below operations are allowed except
> the CMP and ALU. The CMP and ALU operations on vbool*_t is not
> well defined currently.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.

&, ^, | has supported on clang, so I think we should support that as well

RE: [PATCH v1] RISC-V: Bugfix ICE for attribute((target("arch=+v"))

2024-03-21 Thread Li, Pan2

Thanks Kito, will send v2 for this change.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, March 21, 2024 8:39 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Bugfix ICE for __attribute__((target("arch=+v"))

> +
> +  /* Make sure the implied or combined extension is included after add
> + a new std extension to subset list.  For exmaple as below,
> +
> + void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
> +
> + The implied zvl128b and zve64d of the std v should be included.  */
> +  handle_implied_ext (p);
> +  handle_combine_ext ();
> +  check_conflict_ext ();

Extract those 3 function calls to a public function
riscv_subset_list::finalize(),
and then call that at riscv_target_attr_parser::parse_arch rather than here.

> +
> +  return end_of_ext;
>  }

RE: [PATCH v1] RISC-V: Bugfix function target attribute pollution

2024-03-21 Thread Li, Pan2

Thanks Kito, will commit it after the ICE fix.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, March 21, 2024 8:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Bugfix function target attribute pollution

LGTM, thanks :)

On Wed, Mar 20, 2024 at 2:07 PM  wrote:
>
> From: Pan Li 
>
> This patch depends on below ICE fix.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647915.html
>
> The function target attribute should be on a per-function basis.
> For example, we have 3 function as below:
>
> void test_1 () {}
>
> void __attribute__((target("arch=+v"))) test_2 () {}
>
> void __attribute__((target("arch=+zfh"))) test_3 () {}
>
> void test_4 () {}
>
> The scope of the target attribute should not extend the function body.
> Aka, test_3 cannot have the 'v' extension, as well as the test_4
> cannot have both the 'v' and 'zfh' extension.
>
> Unfortunately, for now the test_4 is able to leverage the 'v' and
> the 'zfh' extension which is incorrect.  This patch would like to
> fix the sticking attribute by introduce the commandline subset_list.
> When parse_arch, we always clone from the cmdline_subset_list instead
> of the current_subset_list.
>
> Meanwhile, we correct the print information about arch like below.
>
> .option arch, rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zbb1p0
>
> The riscv_declare_function_name hook is always after the hook
> riscv_process_target_attr.  Thus, we introduce one hash_map to record
> the 1:1 mapping from fndel to its' subset_list in advance.  And later
> the riscv_declare_function_name is able to get the right information
> about the arch.
>
> Below test are passed for this patch
> * The riscv fully regression test.
>
> PR target/114352
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (struct riscv_func_target_info):
> New struct for func decl and target name.
> (struct riscv_func_target_hasher): New hasher for hash table mapping
> from the fn_decl to fn_target_name.
> (riscv_func_decl_hash): New func to compute the hash for fn_decl.
> (riscv_func_target_hasher::hash): New func to impl hash interface.
> (riscv_func_target_hasher::equal): New func to impl equal interface.
> (riscv_cmdline_subset_list): New static var for cmdline subset list.
> (riscv_func_target_table_lazy_init): New func to lazy init the func
> target hash table.
> (riscv_func_target_get): New func to get target name from hash table.
> (riscv_func_target_put): New func to put target name into hash table.
> (riscv_func_target_remove_and_destory): New func to remove target
> info from the hash table and destory it.
> (riscv_parse_arch_string): Set the static var cmdline_subset_list.
> * config/riscv/riscv-subset.h (riscv_cmdline_subset_list): New static
> var for cmdline subset list.
> (riscv_func_target_get): New func decl.
> (riscv_func_target_put): Ditto.
> (riscv_func_target_remove_and_destory): Ditto.
> * config/riscv/riscv-target-attr.cc 
> (riscv_target_attr_parser::parse_arch):
> Take cmdline_subset_list instead of current_subset_list when clone.
> (riscv_process_target_attr): Record the func target info to hash 
> table.
> (riscv_option_valid_attribute_p): Add new arg tree fndel.
> * config/riscv/riscv.cc (riscv_declare_function_name): Consume the
> func target info and print the arch message.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114352-3.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 105 +++-
>  gcc/config/riscv/riscv-subset.h   |   4 +
>  gcc/config/riscv/riscv-target-attr.cc |  18 ++-
>  gcc/config/riscv/riscv.cc |   7 +-
>  .../gcc.target/riscv/rvv/base/pr114352-3.c| 113 ++
>  5 files changed, 240 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index d32bf147eca..76ec9bf846c 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -425,11 +425,108 @@ bool riscv_subset_list::parse_failed = false;
>
>  static riscv_subset_list *current_subset_list = NULL;
>
> +static riscv_subset_list *cmdline_subset_list = NULL;
> +
> +struct riscv_func_target_info
> +{
> +  tree fn_decl;
> +  std::string fn_target_name;
> +
> +  riscv_func_target_info (const tree , const std::string _name)
> +: fn_decl (decl), fn_target_name (target_name)
> +  {
> +  }
> +};
> +
> +struct riscv_func_target_hasher : nofree_ptr_hash riscv_func_target_info>
> +{
> +  typedef tree compare_type;
> +
> +  static hashval_t hash (value_type);
>

Re: [PATCH] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-21 Thread Xi Ruoyao

On Thu, 2024-03-21 at 10:14 +0800, Jie Mei wrote:
> diff --git a/gcc/testsuite/gcc.target/mips/mips-minmax.c 
> b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> new file mode 100644
> index 000..2d234ac4b1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mhard-float -ffinite-math-only -march=mips32r6" } */

You may want to add fmin3 and fmax3 in addition to
smin3 and smax3 so it will work without -ffinite-math-only.

‘fminM3’, ‘fmaxM3’
 IEEE-conformant minimum and maximum operations.  If one operand is
 a quiet ‘NaN’, then the other operand is returned.  If both
 operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned.  In the
 case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an
 invalid floating point exception is raised and a quiet ‘NaN’ is
 returned.

And the MIPS 6.06 manual says:

Numbers are preferred to NaNs: if one input is a NaN, but not both, the
value of the numeric input is returned. If both are NaNs, the NaN in fs
is returned.

for MAX.fmt and MIN.fmt, so they matches fmin3 and fmax3.

> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-21 Thread Andrew Stubbs

CUmode "on" is the setting for compatibility with GCN and CDNA devices.

Committed to mainline.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (ASM_SPEC): Pass -mattr=+cumode.
---
 gcc/config/gcn/gcn-hsa.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 9cf181f52a4..c75256dbac3 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -107,6 +107,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
  "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
+ "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
 #define LIB_SPEC  "-lc"
-- 
2.41.0

[commmitted] amdgcn: Comment correction

2024-03-21 Thread Andrew Stubbs

The location of the marker was changed, but the comment wasn't updated.
Fixed now.

Committed to mainline

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_builtin_1): Comment correction.
---
 gcc/config/gcn/gcn.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index bc076d1120d..fca001811e5 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4932,8 +4932,8 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx 
/*subtarget */ ,
   }
 case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
   {
-   /* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
-  whether it was the first call.  */
+   /* Stash a marker in the unused upper 16 bits of QUEUE_PTR_ARG to
+  indicate whether it was the first call.  */
rtx result = gen_reg_rtx (BImode);
emit_move_insn (result, const0_rtx);
if (cfun->machine->args.reg[QUEUE_PTR_ARG] >= 0)
-- 
2.41.0

Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-21 Thread Kito Cheng

Hi Stefan:

I admit that's kinda bad practice here, the spec should appear before
implementation or at least come together, however we have long
discussion on the RISC-V gcc community on this, and we give a waiver
for this feature due to the clang compatibility, and this features
also used on some project, so we would like to moving this forward
even it's incomplete.

For the spec part, we are working in progress and will put the draft soon.

On Fri, Mar 15, 2024 at 9:46 AM Li, Pan2  wrote:
>
> > Shouldn't a major user-facing change like this be discussed in a PR against
> > https://github.com/riscv-non-isa/riscv-c-api-doc/ or
> > https://github.com/riscv-non-isa/rvv-intrinsic-doc before or concurrent with
> > compiler implementation?
>
> I think Kito is working on the spec doc already.
>
> Hi Kito
> Could you please help to correct me the behavior of the riscv_rvv_vector_bits 
> attribute?
> Sort of details and I suspect there is something missing, or different 
> behavior compared with clang side.
>
> Pan
>
> -Original Message-
> From: Stefan O'Rear 
> Sent: Tuesday, March 12, 2024 9:25 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; Kito Cheng ; Wang, Yanzhang 
> ; rdapp@gmail.com; Vineet Gupta 
> ; Palmer Dabbelt 
> Subject: Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits 
> for RVV
>
> On Tue, Mar 12, 2024, at 2:15 AM, pan2...@intel.com wrote:
> > From: Pan Li 
> >
> > Update in v3:
> > * Add pre-defined __riscv_v_fixed_vlen when zvl.
> >
> > Update in v2:
> > * Cleanup some unused code.
> > * Fix some typo of commit log.
> >
> > Original log:
> >
> > This patch would like to introduce one new gcc attribute for RVV.
> > This attribute is used to define fixed-length variants of one
> > existing sizeless RVV types.
> >
> > This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
> > one args should be the integer constant and its' value is terminated
> > by the LMUL and the vector register bits in zvl*b.  For example:
> >
> > typedef vint32m2_t fixed_vint32m2_t 
> > __attribute__((riscv_rvv_vector_bits(128)));
> >
> > The above type define is valid when -march=rv64gc_zve64d_zvl64b
> > (aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
> > -march=rv64gcv_zvl128b similar to below.
> >
> > "error: invalid RVV vector size '128', expected size is '256' based on
> > LMUL of type and '-mrvv-vector-bits=zvl'"
> >
> > Meanwhile, a pre-define macro __riscv_v_fixed_vlen is introduced to
> > represent the fixed vlen in a RVV vector register.
>
> Shouldn't a major user-facing change like this be discussed in a PR against
> https://github.com/riscv-non-isa/riscv-c-api-doc/ or
> https://github.com/riscv-non-isa/rvv-intrinsic-doc before or concurrent with
> compiler implementation?
>
> -s
>
> > For the vint*m*_t below operations are allowed.
> > * The sizeof.
> > * The global variable(s).
> > * The element of union and struct.
> > * The cast to other equalities.
> > * CMP: >, <, ==, !=, <=, >=
> > * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
> >
> > For the vfloat*m*_t below operations are allowed.
> > * The sizeof.
> > * The global variable(s).
> > * The element of union and struct.
> > * The cast to other equalities.
> > * CMP: >, <, ==, !=, <=, >=
> > * ALU: +, -, *, /, -
> >
> > For the vbool*_t types only below operations are allowed except
> > the CMP and ALU. The CMP and ALU operations on vbool*_t is not
> > well defined currently.
> > * The sizeof.
> > * The global variable(s).
> > * The element of union and struct.
> > * The cast to other equalities.
> >
> > For the vint*x*m*_t tuple types are not suppored in this patch
> > which is compatible with clang.
> >
> > This patch passed the below testsuites.
> > * The riscv fully regression tests.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add pre-define
> >   macro __riscv_v_fixed_vlen when zvl.
> >   * config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
> >   New static func to take care of the RVV types decorated by
> >   the attributes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-13.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-14.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-15.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-16.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-17.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
> >   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
> >   *

Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-21 Thread Kito Cheng

> For the vint*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=

The result of comparison should be vbool* rather than v[u]int*.

> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> For the vfloat*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=

The result of comparison should be vbool* rather than vfloat*,
otherwise all 1 is not really meanful for floating point value.

But I know clang generates the same strange/wrong code here...

> * ALU: +, -, *, /, -
>
> For the vbool*_t types only below operations are allowed except
> the CMP and ALU. The CMP and ALU operations on vbool*_t is not
> well defined currently.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.

&, ^, | has supported on clang, so I think we should support that as well

[committed] amdgcn: Clean up device memory in gcn-run

2024-03-21 Thread Andrew Stubbs

There are some stability issues in the ROC runtime or drivers when we
run too many tests in quick succession.  I was hoping this patch might
fix it, but no; still good to fix the omissions though.

Committed to mainline.

gcc/ChangeLog:

* config/gcn/gcn-run.cc (main): Add an hsa_memory_free calls for each
device_malloc call.
---
 gcc/config/gcn/gcn-run.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-run.cc b/gcc/config/gcn/gcn-run.cc
index d45ff3e6c2ba..2f3ed2d41d2f 100644
--- a/gcc/config/gcn/gcn-run.cc
+++ b/gcc/config/gcn/gcn-run.cc
@@ -755,7 +755,13 @@ main (int argc, char *argv[])
 
   /* Clean shut down.  */
   XHSA (hsa_fns.hsa_memory_free_fn (kernargs),
-   "Clean up device memory");
+   "Clean up device kernargs memory");
+  XHSA (hsa_fns.hsa_memory_free_fn (args),
+   "Clean up device args memory");
+  XHSA (hsa_fns.hsa_memory_free_fn (heap),
+   "Clean up device heap memory");
+  XHSA (hsa_fns.hsa_memory_free_fn (stack),
+   "Clean up device stack memory");
   XHSA (hsa_fns.hsa_executable_destroy_fn (executable),
"Clean up GCN executable");
   XHSA (hsa_fns.hsa_queue_destroy_fn (queue),
-- 
2.41.0

Re: [PATCH, v3] Fortran: improve array component description in runtime error message [PR30802]

2024-03-21 Thread Mikael Morin


Le 20/03/2024 à 21:24, Harald Anlauf a écrit :

Hi Mikael, all,

here's now the third version of the patch that implements the following
scheme:

On 3/15/24 20:29, Mikael Morin wrote:

Le 15/03/2024 à 18:26, Harald Anlauf a écrit :

OK, that sounds interesting.  To clarify the options:

- for ordinary array x it would stay 'x'

- when z is a DT scalar, and z%x is the array in question, use 'z%x'
   (here z...%x would look strange to me)


Yes, the ellipsis would look strange to me as well.


- when z is a DT array, and x some component further down, 'z...%x'


This case also applies when z is a DT scalar and x is more than one
level deep.


I would rather not make the error message text vary too much to avoid
to run into issues with translation.  Would it be fine with you to have

... dimension 1 of array 'z...%x' above array bound ...

only?


OK, let's drop "component".


Anything else?


No, I think you covered everything.


I've created a new helper function that centralizes the generation of
the abbreviated name of the array (component) and use it to simplify
related code in multiple places.  If we change our mind how a bounds
violation error message should look like, it will be easier to adjust
in the future.

Is this OK for 14-mainline?


Yes, thanks.


Thanks,
Harald

[PATCH v2] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-03-21 Thread jeevitha

Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

PTImode assists in generating even/odd register pairs on 128 bits. When the 
user 
specifies PTImode as an attribute, it breaks because there is no internal type 
to handle this mode. To address this, we have created a tree node with dummy 
type
to handle PTImode. We are not documenting this dummy type since users are not
allowed to use this type externally.

2024-03-21  Jeevitha Palanisamy  

gcc/
PR target/110411
* config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
RS6000_BTI_INTPTI.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
for PTImode type.

gcc/testsuite/
PR target/106895
* gcc.target/powerpc/pr106895.c: New testcase.

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 6698274031b..f553c72779e 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -756,6 +756,15 @@ rs6000_init_builtins (void)
   else
 ieee128_float_type_node = NULL_TREE;
 
+  /* PTImode to get even/odd register pairs.  */
+  intPTI_type_internal_node = make_node(INTEGER_TYPE);
+  TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode);
+  layout_type (intPTI_type_internal_node);
+  SET_TYPE_MODE (intPTI_type_internal_node, PTImode);
+  t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST);
+  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
+ "__dummypti");
+
   /* Vector pair and vector quad support.  */
   vector_pair_type_node = make_node (OPAQUE_TYPE);
   SET_TYPE_MODE (vector_pair_type_node, OOmode);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 68bc45d65ba..b6078077b20 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2302,6 +2302,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_ptr_vector_quad,
   RS6000_BTI_ptr_long_long,
   RS6000_BTI_ptr_long_long_unsigned,
+  RS6000_BTI_INTPTI,
   RS6000_BTI_MAX
 };
 
@@ -2346,6 +2347,7 @@ enum rs6000_builtin_type_index
 #define uintDI_type_internal_node   
(rs6000_builtin_types[RS6000_BTI_UINTDI])
 #define intTI_type_internal_node
(rs6000_builtin_types[RS6000_BTI_INTTI])
 #define uintTI_type_internal_node   
(rs6000_builtin_types[RS6000_BTI_UINTTI])
+#define intPTI_type_internal_node   
(rs6000_builtin_types[RS6000_BTI_INTPTI])
 #define float_type_internal_node
(rs6000_builtin_types[RS6000_BTI_float])
 #define double_type_internal_node   
(rs6000_builtin_types[RS6000_BTI_double])
 #define long_double_type_internal_node  
(rs6000_builtin_types[RS6000_BTI_long_double])
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106895.c 
b/gcc/testsuite/gcc.target/powerpc/pr106895.c
new file mode 100644
index 000..56547b7fa9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
@@ -0,0 +1,15 @@
+/* PR target/106895 */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2" } */
+
+/* Verify the following generates even/odd register pairs.  */
+
+typedef __int128 pti __attribute__((mode(PTI)));
+
+void
+set128 (pti val, pti *mem)
+{
+asm("stq %1,%0" : "=m"(*mem) : "r"(val));
+}
+
+/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */

Re: [PATCH v1] RISC-V: Bugfix ICE for attribute((target("arch=+v"))

2024-03-21 Thread Kito Cheng

> +
> +  /* Make sure the implied or combined extension is included after add
> + a new std extension to subset list.  For exmaple as below,
> +
> + void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
> +
> + The implied zvl128b and zve64d of the std v should be included.  */
> +  handle_implied_ext (p);
> +  handle_combine_ext ();
> +  check_conflict_ext ();

Extract those 3 function calls to a public function
riscv_subset_list::finalize(),
and then call that at riscv_target_attr_parser::parse_arch rather than here.

> +
> +  return end_of_ext;
>  }

Re: [PATCH v2] RISC-V: Introduce option -mrvv-max-lmul for RVV autovec

2024-03-21 Thread Kito Cheng

LGTM too :)

On Mon, Mar 18, 2024 at 11:35 PM Robin Dapp  wrote:
>
> LGTM as well.
>
> Regards
>  Robin
>

Re: [PATCH v1] RISC-V: Bugfix function target attribute pollution

2024-03-21 Thread Kito Cheng

LGTM, thanks :)

On Wed, Mar 20, 2024 at 2:07 PM  wrote:
>
> From: Pan Li 
>
> This patch depends on below ICE fix.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647915.html
>
> The function target attribute should be on a per-function basis.
> For example, we have 3 function as below:
>
> void test_1 () {}
>
> void __attribute__((target("arch=+v"))) test_2 () {}
>
> void __attribute__((target("arch=+zfh"))) test_3 () {}
>
> void test_4 () {}
>
> The scope of the target attribute should not extend the function body.
> Aka, test_3 cannot have the 'v' extension, as well as the test_4
> cannot have both the 'v' and 'zfh' extension.
>
> Unfortunately, for now the test_4 is able to leverage the 'v' and
> the 'zfh' extension which is incorrect.  This patch would like to
> fix the sticking attribute by introduce the commandline subset_list.
> When parse_arch, we always clone from the cmdline_subset_list instead
> of the current_subset_list.
>
> Meanwhile, we correct the print information about arch like below.
>
> .option arch, rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zbb1p0
>
> The riscv_declare_function_name hook is always after the hook
> riscv_process_target_attr.  Thus, we introduce one hash_map to record
> the 1:1 mapping from fndel to its' subset_list in advance.  And later
> the riscv_declare_function_name is able to get the right information
> about the arch.
>
> Below test are passed for this patch
> * The riscv fully regression test.
>
> PR target/114352
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (struct riscv_func_target_info):
> New struct for func decl and target name.
> (struct riscv_func_target_hasher): New hasher for hash table mapping
> from the fn_decl to fn_target_name.
> (riscv_func_decl_hash): New func to compute the hash for fn_decl.
> (riscv_func_target_hasher::hash): New func to impl hash interface.
> (riscv_func_target_hasher::equal): New func to impl equal interface.
> (riscv_cmdline_subset_list): New static var for cmdline subset list.
> (riscv_func_target_table_lazy_init): New func to lazy init the func
> target hash table.
> (riscv_func_target_get): New func to get target name from hash table.
> (riscv_func_target_put): New func to put target name into hash table.
> (riscv_func_target_remove_and_destory): New func to remove target
> info from the hash table and destory it.
> (riscv_parse_arch_string): Set the static var cmdline_subset_list.
> * config/riscv/riscv-subset.h (riscv_cmdline_subset_list): New static
> var for cmdline subset list.
> (riscv_func_target_get): New func decl.
> (riscv_func_target_put): Ditto.
> (riscv_func_target_remove_and_destory): Ditto.
> * config/riscv/riscv-target-attr.cc 
> (riscv_target_attr_parser::parse_arch):
> Take cmdline_subset_list instead of current_subset_list when clone.
> (riscv_process_target_attr): Record the func target info to hash 
> table.
> (riscv_option_valid_attribute_p): Add new arg tree fndel.
> * config/riscv/riscv.cc (riscv_declare_function_name): Consume the
> func target info and print the arch message.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114352-3.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 105 +++-
>  gcc/config/riscv/riscv-subset.h   |   4 +
>  gcc/config/riscv/riscv-target-attr.cc |  18 ++-
>  gcc/config/riscv/riscv.cc |   7 +-
>  .../gcc.target/riscv/rvv/base/pr114352-3.c| 113 ++
>  5 files changed, 240 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index d32bf147eca..76ec9bf846c 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -425,11 +425,108 @@ bool riscv_subset_list::parse_failed = false;
>
>  static riscv_subset_list *current_subset_list = NULL;
>
> +static riscv_subset_list *cmdline_subset_list = NULL;
> +
> +struct riscv_func_target_info
> +{
> +  tree fn_decl;
> +  std::string fn_target_name;
> +
> +  riscv_func_target_info (const tree , const std::string _name)
> +: fn_decl (decl), fn_target_name (target_name)
> +  {
> +  }
> +};
> +
> +struct riscv_func_target_hasher : nofree_ptr_hash riscv_func_target_info>
> +{
> +  typedef tree compare_type;
> +
> +  static hashval_t hash (value_type);
> +  static bool equal (value_type, const compare_type &);
> +};
> +
> +static hash_table *func_target_table = NULL;
> +
> +static inline hashval_t riscv_func_decl_hash (tree fn_decl)
> +{
> +  inchash::hash h;
> +
> +  h.add_ptr (fn_decl);
> +
> +  return h.end ();
> +}
> +
> +inline hashval_t
>

New effective-target 'asm_goto_with_outputs' (was: [PATCH] testsuite: Fix up lra effective target)

2024-03-21 Thread Thomas Schwinge

Hi!

On 2024-02-16T10:48:53-0800, Mike Stump  wrote:
> On Feb 16, 2024, at 2:16 AM, Jakub Jelinek  wrote:
>> 
>> There is one special case, NVPTX, which is a TARGET_NO_REGISTER_ALLOCATION
>> target.  I think claiming for it that it is a lra target is strange (even
>> though it effectively returns true for targetm.lra_p ()), unsure if it
>> supports asm goto with outputs or not, if it does and we want to test it,
>> perhaps we should introduce asm_goto_outputs effective target and use
>> lra || nvptx-*-* for that?
>
> Since the port people have to maintain that code in general, I usually leave 
> it to them to try and select a cheap, maintainable way to manage it.
>
> If people want to pave the way, I'd tend to defer to them, having thought 
> about more than I.

Here I am.  ;-)

After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:

-PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
+ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob 
pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target 
lra } "

Etc.

That is, the current effective-target 'lra' is not suitable for nvptx --
which, I suppose, is OK, given that nvptx neither uses LRA nor doesn't
use LRA.  ;-) (Therefore, effective-target 'lra' shouldn't get used in
test cases that are active for nvptx.)

However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:

PASS: gcc.dg/pr107385.c execution test

I'm attaching "[WIP] New effective-target 'asm_goto_with_outputs'", which
does address the effective-target check for nvptx, and otherwise does
's%lra%asm_goto_with_outputs'.  (I have not yet actually merged
'check_effective_target_lra' into
'check_effective_target_asm_goto_with_outputs'.)

I have verified that all current effective-target 'lra' test cases
actually use 'asm goto' with outputs, there is just one exception:
'gcc.dg/pr110079.c' (see

"bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto 
[PR110079]",

"ICE with -freorder-blocks-and-partition and inline-asm goto").  That
test case, 'gcc.dg/pr110079.c', currently uses 'target lra', and uses
'asm goto' -- but not with outputs, so is 'asm_goto_with_outputs' not
really applicable?  The test case does PASS for nvptx target (but I've
not verified what it's actually doing/testing).  How to handle that one?


Grüße
 Thomas


>From d9f8faaa5026bb970b3246235eb22bf9b5e9fe3a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Mar 2024 16:04:11 +0100
Subject: [PATCH] [WIP] New effective-target 'asm_goto_with_outputs'

After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:

-PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
+ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "

Etc.

However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:

PASS: gcc.dg/pr107385.c execution test

TODO gcc/testsuite/gcc.dg/pr110079.c
doesn't using 'asm_goto' with outputs, but is PASS for nvptx, and would ERROR for 'target lra'.
---
 gcc/doc/sourcebuild.texi| 3 +++
 gcc/testsuite/gcc.c-torture/compile/asmgoto-2.c | 2 +-
 gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c | 2 +-
 gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c | 3 +--
 gcc/testsuite/gcc.c-torture/compile/pr98096.c   | 2 +-
 gcc/testsuite/gcc.dg/pr100590.c | 2 +-
 gcc/testsuite/gcc.dg/pr107385.c | 2 +-
 gcc/testsuite/gcc.dg/pr108095.c | 2 +-
 gcc/testsuite/gcc.dg/pr110079.c | 2 +-
 gcc/testsuite/gcc.dg/pr97954.c  | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100329.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100398.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100519.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr110422.c | 2 +-
 gcc/testsuite/lib/target-supports.exp   | 9 +
 15 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b56b9c39733..a176a3c864f 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2863,6 +2863,9 @@ Target supports weak undefined symbols
 @item R_flag_in_section
 Target supports the 'R' flag in .section directive in assembly inputs.
 
+@item asm_goto_with_outputs
+Target supports 'asm goto' with outputs.
+
 @item automatic_stack_alignment
 Target supports automatic stack alignment.
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-2.c b/gcc/testsuite/gcc.c-torture/compile/asmgoto-2.c
index 43e597bc59f..234c90e5295 100644
--- a/gcc/testsuite/gcc.c-torture/compile/asmgoto-2.c
+++

Re: GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-03-21 Thread Andrew Stubbs


On 21/03/2024 10:41, Thomas Schwinge wrote:

Hi!

On 2024-01-12T15:02:35+0100, I wrote:

OK to push the attached
"GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"?


Ping.  (Or is that not what you'd expect to see for GCN?  I haven't
checked the actual back end code...)


Sorry, I missed this during the transition.

I think early break just means conditional/masked vectors, so it should 
work.


OK to commit.


("The relevant test cases are all-PASS with just [two] exceptions, to be
looked into individually, later on."  I'm not currently planning to look
into that.)


(One of those actually going to be fixed by a different patch to be
posted in a moment.)


Nice. :)

Andrew

GCN: Enable effective-target 'vect_hw_misalign'

2024-03-21 Thread Thomas Schwinge

Hi!

OK to push the attached
"GCN: Enable effective-target 'vect_hw_misalign'"?  (Or is that not what
you'd expect to see for GCN?  I haven't checked the actual back end
code...)


Grüße
 Thomas


>From dad0686e179e9395408a39ccfbf760bc30acffc9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Mar 2024 23:52:26 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_hw_misalign'

... as made apparent by commit 4e1fcf44bdc582e71408175d75e025f5be8b0e55
"testsuite: vect: Require vect_hw_misalign in gcc.dg/vect/vect-cost-model-1.c etc. [PR98238]"
causing:

 PASS: gcc.dg/vect/vect-cost-model-1.c (test for excess errors)
-PASS: gcc.dg/vect/vect-cost-model-1.c scan-tree-dump vect "LOOP VECTORIZED"

 PASS: gcc.dg/vect/vect-cost-model-3.c (test for excess errors)
-PASS: gcc.dg/vect/vect-cost-model-3.c scan-tree-dump vect "LOOP VECTORIZED"

 PASS: gcc.dg/vect/vect-cost-model-5.c (test for excess errors)
-PASS: gcc.dg/vect/vect-cost-model-5.c scan-tree-dump vect "LOOP VECTORIZED"

..., and similarly commit ffd47fb63ddc024db847daa07f8ae27fffdfcb28
"testsuite: Fix pr113431.c FAIL on sparc* [PR113431]" causing:

 PASS: gcc.dg/vect/pr113431.c (test for excess errors)
 PASS: gcc.dg/vect/pr113431.c execution test
-PASS: gcc.dg/vect/pr113431.c scan-tree-dump-times slp1 "optimized: basic block part vectorized" 2

..., which this commit all restores, and also enables a good number of further
FAIL -> PASS, UNSUPPORTED -> PASS, etc. progressions.  There are also a small
number of regressions, mostly in the SLP area apparently:

 PASS: gcc.dg/vect/bb-slp-layout-12.c (test for excess errors)
+XPASS: gcc.dg/vect/bb-slp-layout-12.c scan-tree-dump-not slp1 "duplicating permutation node"
+XFAIL: gcc.dg/vect/bb-slp-layout-12.c scan-tree-dump-times slp1 "add new stmt: [^\\n\\r]* = VEC_PERM_EXPR" 3

 PASS: gcc.dg/vect/bb-slp-layout-6.c (test for excess errors)
+FAIL: gcc.dg/vect/bb-slp-layout-6.c scan-tree-dump slp2 "absorbing input layouts"

 PASS: gcc.dg/vect/pr97428.c (test for excess errors)
 PASS: gcc.dg/vect/pr97428.c scan-tree-dump vect "Detected interleaving load of size 8"
 PASS: gcc.dg/vect/pr97428.c scan-tree-dump vect "Detected interleaving store of size 16"
 PASS: gcc.dg/vect/pr97428.c scan-tree-dump-not vect "gap of 6 elements"
-XFAIL: gcc.dg/vect/pr97428.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
+FAIL: gcc.dg/vect/pr97428.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

 PASS: gcc.dg/vect/vect-33.c (test for excess errors)
+FAIL: gcc.dg/vect/vect-33.c scan-tree-dump vect "Vectorizing an unaligned access"
 PASS: gcc.dg/vect/vect-33.c scan-tree-dump-not optimized "Invalid sum"
 PASS: gcc.dg/vect/vect-33.c scan-tree-dump-times vect "vectorized 1 loops" 1

..., so some further conditionalizing etc. seems necessary.  These seem to
mostly appear next to pre-existing similar FAILs in related test cases.
(Overall, way more PASS than FAIL.)

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_hw_misalign): Enable for GCN.
	(check_effective_target_vect_element_align): Adjust.
---
 gcc/testsuite/lib/target-supports.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 302781e91de..2291a673d53 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8309,7 +8309,8 @@ proc check_effective_target_vect_hw_misalign { } {
 	 || ([istarget s390*-*-*]
 		 && [check_effective_target_s390_vx])
 	 || ([istarget riscv*-*-*])
-	 || ([istarget loongarch*-*-*]) } {
+	 || ([istarget loongarch*-*-*])
+	 || [istarget amdgcn*-*-*] } {
 	  return 1
 	}
 	if { [istarget arm*-*-*]
@@ -8873,8 +8874,7 @@ proc check_effective_target_vect_element_align { } {
 return [check_cached_effective_target_indexed vect_element_align {
   expr { ([istarget arm*-*-*]
 	  && ![check_effective_target_arm_vect_no_misalign])
-	 || [check_effective_target_vect_hw_misalign]
-	 || [istarget amdgcn-*-*] }}]
+	 || [check_effective_target_vect_hw_misalign] }}]
 }
 
 # Return 1 if we expect to see unaligned accesses in at least some
-- 
2.34.1

GCN: Enable effective-target 'vect_long_mult'

2024-03-21 Thread Thomas Schwinge

Hi!

OK to push the attached "GCN: Enable effective-target 'vect_long_mult'"?
(Or is that not what you'd expect to see for GCN?  I haven't checked the
actual back end code...)


Grüße
 Thomas


>From e0e58dfc350581ed0519420ad02adcc01e645eae Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Mar 2024 23:56:58 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_long_mult'

... as made apparent by commit bfd6b36f08021f023e0e9223f5aea315b74a5c56
"testsuite/vect: Fix pr25413a.c expectations [PR109705]" causing:

 PASS: gcc.dg/vect/pr25413a.c (test for excess errors)
 PASS: gcc.dg/vect/pr25413a.c execution test
-PASS: gcc.dg/vect/pr25413a.c scan-tree-dump-times vect "vectorized 2 loops" 1
+FAIL: gcc.dg/vect/pr25413a.c scan-tree-dump-times vect "vectorized 1 loops" 1

..., which this commit resolves.

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_long_mult):
	Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 2291a673d53..452b36ff927 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9056,7 +9056,8 @@ proc check_effective_target_vect_long_mult { } {
 	 || ([istarget riscv*-*-*]
 	  && [check_effective_target_riscv_v])
 	 || ([istarget loongarch*-*-*]
-	  && [check_effective_target_loongarch_sx]) } {
+	  && [check_effective_target_loongarch_sx])
+	 || [istarget amdgcn-*-*] } {
 	set answer 1
 } else {
 	set answer 0
-- 
2.34.1

GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-03-21 Thread Thomas Schwinge

Hi!

On 2024-01-12T15:02:35+0100, I wrote:
> OK to push the attached
> "GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"?

Ping.  (Or is that not what you'd expect to see for GCN?  I haven't
checked the actual back end code...)


> ("The relevant test cases are all-PASS with just [two] exceptions, to be
> looked into individually, later on."  I'm not currently planning to look
> into that.)

(One of those actually going to be fixed by a different patch to be
posted in a moment.)


Grüße
 Thomas


>From 3193614c4f9a8032e85a4da87bde8055aeee7d7b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 9 Jan 2024 10:25:48 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_early_break',
 'vect_early_break_hw'

Via XPASSing test cases after commit a657c7e3518fcfc796f223d47385cad5e97dc9a5
"testsuite: un-xfail TSVC loops that check for exit control flow vectorization":

PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s332.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s481.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s482.c scan-tree-dump vect "vectorized 1 loops"

..., it became apparent that GCN, too, does support vectorization of loops with
early breaks.  The relevant test cases are all-PASS with just the following
exceptions, to be looked into individually, later on:

PASS: gcc.dg/vect/vect-early-break_25.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1

PASS: gcc.dg/vect/vect-early-break_56.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_56.c execution test
XPASS: gcc.dg/vect/vect-early-break_56.c scan-tree-dump-times vect "vectorized 2 loops" 2

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_early_break)
	(check_effective_target_vect_early_break_hw): Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d1add894f..497c46de4cb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4071,6 +4071,7 @@ proc check_effective_target_vect_early_break { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_ok]
 	|| [check_effective_target_sse4]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
@@ -4085,6 +4086,7 @@ proc check_effective_target_vect_early_break_hw { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_hw]
 	|| [check_sse4_hw_available]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
-- 
2.34.1

Re: [PATCH] tree-optimization/111736 - avoid address sanitizing of __seg_gs

2024-03-21 Thread Richard Biener

On Thu, 21 Mar 2024, Jakub Jelinek wrote:

> On Thu, Mar 21, 2024 at 10:50:04AM +0100, Richard Biener wrote:
> > Fixed and pushed.  I suppose for address-spaces nested within the
> > generic address space we could instrument the address converted to
> > the generic address space value.
> 
> Unlike TLS, we don't know if address-spaces are nested within the generic
> address space and how to map the as address to a generic as address.

In theory there's hooks for this (TARGET_ADDR_SPACE_SUBSET_P) and
ADDR_SPACE_CONVERT_EXPR to do the conversion.  But hardly worth it
I guess.

Richard.

[PING][PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-21 Thread Evgeny Karpov

On Thursday, March 7, 2024 9:48 PM
Evgeny Karpov wrote:

> > Changes from v1 to v2:
> > Adjust the target name to aarch64-*-mingw* to exclude the big-endian
> > target from support.
> > Exclude 64-bit ISA.

> > Update commit descriptions to follow standard style.
> > Rebase from 4th March 2024.
> 
> Hello,
> 
> It looks like the initial feedback has been addressed.
> While unit testing for the x86_64-w64-mingw32 target is still in progress, the
> first 4 patches do not obviously change other targets, including 
> aarch64-linux-
> gnu.
> Could they be merged once stage 1 starts, or could it be done even now?
> Thanks!
> 
> Regards,
> Evgeny

Hello,

The original question was about merging the first 4 patches from the series. 
Since then, we have obtained regression tests 
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648086.html .
It appears that it would be safe to merge the entire series without breaking 
other targets.

Regards,
Evgeny

Re: [PATCH] testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147]

2024-03-21 Thread Richard Biener

On Thu, 21 Mar 2024, Rainer Orth wrote:

> gcc.dg/vect/bb-slp-32.c currently XPASSes on 32 and 64-bit Solaris/SPARC:
> 
> XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects  scan-tree-dump slp2 
> "vectorization is not profitable"
> XPASS: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not 
> profitable"
> 
> At least on SPARC, the current xfail can simply go, but I'm highly
> uncertain if this is right in general.
> 
> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
> 
> Ok for trunk?

The condition was made for the case where vectorization fails even when
not considering costing.  But given we now do

  p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);

that condition doesn't make sense anymore (forgot to update it in my
r11-6715-gb36c9cd09472c8 change).

In principle the testcase should be profitable to vectorize with
the SLP reduction support now (and we'd vectorize it that way).
But we fail to apply SLP node CSE when merging the SLP instance
into a common subgraph, so we over-estimate cost (and perform
double code generation that's later CSEd).

That it's still not profitable on x86_64 for me is a quite narrow loss:

  Vector cost: 144
  Scalar cost: 140

So ideally we'd key the FAIL on .REDUC_PLUS not being available for
V4SImode but then we also try V2SImode where the reduction isn't
recognized.  So the testcase wouldn't work well for targets comparing
cost.

I'd say we remove the dg-final completely for now.  I filed PR114413
about the costing/CSE issue above.

Richard.

Re: [PATCH] tree-optimization/111736 - avoid address sanitizing of __seg_gs

2024-03-21 Thread Jakub Jelinek

On Thu, Mar 21, 2024 at 10:50:04AM +0100, Richard Biener wrote:
> Fixed and pushed.  I suppose for address-spaces nested within the
> generic address space we could instrument the address converted to
> the generic address space value.

Unlike TLS, we don't know if address-spaces are nested within the generic
address space and how to map the as address to a generic as address.

Jakub

Re: [PATCH] tree-optimization/111736 - avoid address sanitizing of __seg_gs

2024-03-21 Thread Richard Biener

On Thu, 21 Mar 2024, Jakub Jelinek wrote:

> On Thu, Mar 21, 2024 at 10:25:24AM +0100, Richard Biener wrote:
> > The following more thoroughly avoids address sanitizing accesses
> > to non-generic address-spaces.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR tree-optimization/111736
> > * asan.cc (instrument_derefs): Do not instrument accesses
> > to non-generic address-spaces.
> > 
> > * gcc.target/i386/pr111736.c: New testcase.
> > ---
> >  gcc/asan.cc  |  4 
> >  gcc/testsuite/gcc.target/i386/pr111736.c | 23 +++
> >  2 files changed, 27 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr111736.c
> > 
> > diff --git a/gcc/asan.cc b/gcc/asan.cc
> > index cfe83106460..04caf8802e2 100644
> > --- a/gcc/asan.cc
> > +++ b/gcc/asan.cc
> > @@ -2755,6 +2755,10 @@ instrument_derefs (gimple_stmt_iterator *iter, tree 
> > t,
> >if (VAR_P (inner) && DECL_HARD_REGISTER (inner))
> >  return;
> >  
> > +  /* Accesses to non-generic address-spaces are not handled.  */
> 
> I'd say s/are not handled/should not be instrumented/

Fixed and pushed.  I suppose for address-spaces nested within the
generic address space we could instrument the address converted to
the generic address space value.

Richard.

> > +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (TREE_TYPE (inner
> > +return;
> > +
> 
> Otherwise LGTM.
> 
> >poly_int64 decl_size;
> >if ((VAR_P (inner)
> > || (TREE_CODE (inner) == RESULT_DECL
> > diff --git a/gcc/testsuite/gcc.target/i386/pr111736.c 
> > b/gcc/testsuite/gcc.target/i386/pr111736.c
> > new file mode 100644
> > index 000..231fdd07e80
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr111736.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fsanitize=address" } */
> > +
> > +int __seg_gs m;
> > +
> > +int foo (void)
> > +{
> > +  return m;
> > +}
> > +
> > +extern int  __seg_gs n;
> > +
> > +int bar (void)
> > +{
> > +  return n;
> > +}
> > +
> > +int baz (int __seg_gs *o)
> > +{
> > +  return *o;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "asan_report_load" } } */
> > -- 
> > 2.35.3
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147]

2024-03-21 Thread Rainer Orth

gcc.dg/vect/bb-slp-32.c currently XPASSes on 32 and 64-bit Solaris/SPARC:

XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects  scan-tree-dump slp2 
"vectorization is not profitable"
XPASS: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not 
profitable"

At least on SPARC, the current xfail can simply go, but I'm highly
uncertain if this is right in general.

Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-03-19  Rainer Orth  

gcc/testsuite:
PR tree-optimization/96147
* gcc.dg/vect/bb-slp-32.c (scan-tree-dump): Remove xfail.

# HG changeset patch
# Parent  b3b6fa4472bc1f2b170e2b736852ec93bae94480
testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147]

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
@@ -25,4 +25,4 @@ int foo (int *p, int a, int b)
   return sum;
 }
 
-/* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" { xfail  { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" } } */

Re: [PATCH] tree-optimization/111736 - avoid address sanitizing of __seg_gs

2024-03-21 Thread Jakub Jelinek

On Thu, Mar 21, 2024 at 10:25:24AM +0100, Richard Biener wrote:
> The following more thoroughly avoids address sanitizing accesses
> to non-generic address-spaces.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/111736
>   * asan.cc (instrument_derefs): Do not instrument accesses
>   to non-generic address-spaces.
> 
>   * gcc.target/i386/pr111736.c: New testcase.
> ---
>  gcc/asan.cc  |  4 
>  gcc/testsuite/gcc.target/i386/pr111736.c | 23 +++
>  2 files changed, 27 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111736.c
> 
> diff --git a/gcc/asan.cc b/gcc/asan.cc
> index cfe83106460..04caf8802e2 100644
> --- a/gcc/asan.cc
> +++ b/gcc/asan.cc
> @@ -2755,6 +2755,10 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
>if (VAR_P (inner) && DECL_HARD_REGISTER (inner))
>  return;
>  
> +  /* Accesses to non-generic address-spaces are not handled.  */

I'd say s/are not handled/should not be instrumented/

> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (TREE_TYPE (inner
> +return;
> +

Otherwise LGTM.

>poly_int64 decl_size;
>if ((VAR_P (inner)
> || (TREE_CODE (inner) == RESULT_DECL
> diff --git a/gcc/testsuite/gcc.target/i386/pr111736.c 
> b/gcc/testsuite/gcc.target/i386/pr111736.c
> new file mode 100644
> index 000..231fdd07e80
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr111736.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsanitize=address" } */
> +
> +int __seg_gs m;
> +
> +int foo (void)
> +{
> +  return m;
> +}
> +
> +extern int  __seg_gs n;
> +
> +int bar (void)
> +{
> +  return n;
> +}
> +
> +int baz (int __seg_gs *o)
> +{
> +  return *o;
> +}
> +
> +/* { dg-final { scan-assembler-not "asan_report_load" } } */
> -- 
> 2.35.3

Jakub

[PATCH] testsuite: i386: Skip gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c etc. with Solaris as [PR114150]

2024-03-21 Thread Rainer Orth

Two avx512cd tests FAIL to assemble with the Solaris/x86 assembler:

FAIL: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c (test for excess errors)
UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c compilation failed to 
produce executable
FAIL: gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c (test for excess errors)
UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c compilation failed to 
produce executable

Excess errors: 
Assembler: avx512cd-vpbroadcastmb2q-2.c
"/var/tmp//ccs_9lod.s", line 42 : Invalid instruction argument
Near line: "vpbroadcastmb2q %k0, %zmm0"

Assembler: avx512cd-vpbroadcastmw2d-2.c
"/var/tmp//ccevT6Rd.s", line 35 : Invalid instruction argument
Near line: "vpbroadcastmw2d %k0, %zmm0"

This seems to be an as bug, but given that this rarely if ever gets any
fixes these days, this test just skips the affected tests.

Adjuststing check_effective_target_avx512cd instead doesn't seem
sensible since it would disable quite a number of working tests.

Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-03-19  Rainer Orth  

gcc/testsuite:
PR target/114150
* gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c: Skip on
Solaris/x86 with as.
* gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c: Likewise.

# HG changeset patch
# Parent  23c541e5d902f9a56abba0367e1fb8326f350d16
testsuite: i386: xfail gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c etc. with Solaris as [PR114150]

diff --git a/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c b/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c
--- a/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512cd" } */
 /* { dg-require-effective-target avx512cd } */
+/* { dg-skip-if "PR target/114150" { *-*-solaris2.* && { ! gas } } } */
 
 #define HAVE_512
 #define AVX512CD
diff --git a/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c b/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c
--- a/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512cd" } */
 /* { dg-require-effective-target avx512cd } */
+/* { dg-skip-if "PR target/114150" { *-*-solaris2.* && { ! gas } } } */
 
 #define HAVE_512
 #define AVX512CD

[PATCH] tree-optimization/111736 - avoid address sanitizing of __seg_gs

2024-03-21 Thread Richard Biener

The following more thoroughly avoids address sanitizing accesses
to non-generic address-spaces.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/111736
* asan.cc (instrument_derefs): Do not instrument accesses
to non-generic address-spaces.

* gcc.target/i386/pr111736.c: New testcase.
---
 gcc/asan.cc  |  4 
 gcc/testsuite/gcc.target/i386/pr111736.c | 23 +++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111736.c

diff --git a/gcc/asan.cc b/gcc/asan.cc
index cfe83106460..04caf8802e2 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -2755,6 +2755,10 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
   if (VAR_P (inner) && DECL_HARD_REGISTER (inner))
 return;
 
+  /* Accesses to non-generic address-spaces are not handled.  */
+  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (TREE_TYPE (inner
+return;
+
   poly_int64 decl_size;
   if ((VAR_P (inner)
|| (TREE_CODE (inner) == RESULT_DECL
diff --git a/gcc/testsuite/gcc.target/i386/pr111736.c 
b/gcc/testsuite/gcc.target/i386/pr111736.c
new file mode 100644
index 000..231fdd07e80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111736.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=address" } */
+
+int __seg_gs m;
+
+int foo (void)
+{
+  return m;
+}
+
+extern int  __seg_gs n;
+
+int bar (void)
+{
+  return n;
+}
+
+int baz (int __seg_gs *o)
+{
+  return *o;
+}
+
+/* { dg-final { scan-assembler-not "asan_report_load" } } */
-- 
2.35.3

Re: [PATCH] Fix runtime error for nonlinear iv vectorization(step_mult).

2024-03-21 Thread Richard Biener

On Thu, Mar 21, 2024 at 9:35 AM liuhongt  wrote:
>
> wi::from_mpz doesn't take a sign argument, we want it to be wrapped
> instead of saturation, so pass utype and true to it, and it fixes the
> bug.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport to gcc13?

OK for both.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/114396
> * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Pass utype
> and true to wi::from_mpz.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr114396.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr114396.c | 105 +++
>  gcc/tree-vect-loop.cc|   2 +-
>  2 files changed, 106 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114396.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr114396.c 
> b/gcc/testsuite/gcc.target/i386/pr114396.c
> new file mode 100644
> index 000..4c4015f871f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114396.c
> @@ -0,0 +1,105 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fwrapv -fno-vect-cost-model" } */
> +
> +short a = 0xF;
> +short b[16];
> +unsigned short ua = 0xF;
> +unsigned short ub[16];
> +
> +short
> +__attribute__((noipa))
> +foo (short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +b[e] = a *= 5;
> +  return a;
> +}
> +
> +short
> +__attribute__((noipa))
> +foo1 (short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +b[e] = a *= -5;
> +  return a;
> +}
> +
> +unsigned short
> +__attribute__((noipa))
> +foou (unsigned short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +ub[e] = a *= -5;
> +  return a;
> +}
> +
> +unsigned short
> +__attribute__((noipa))
> +foou1 (unsigned short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +ub[e] = a *= 5;
> +  return a;
> +}
> +
> +short
> +__attribute__((noipa,optimize("O3")))
> +foo_o3 (short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +b[e] = a *= 5;
> +  return a;
> +}
> +
> +short
> +__attribute__((noipa,optimize("O3")))
> +foo1_o3 (short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +b[e] = a *= -5;
> +  return a;
> +}
> +
> +unsigned short
> +__attribute__((noipa,optimize("O3")))
> +foou_o3 (unsigned short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +ub[e] = a *= -5;
> +  return a;
> +}
> +
> +unsigned short
> +__attribute__((noipa,optimize("O3")))
> +foou1_o3 (unsigned short a)
> +{
> +  for (int e = 0; e < 9; e += 1)
> +ub[e] = a *= 5;
> +  return a;
> +}
> +
> +int main() {
> +  unsigned short uexp, ures;
> +  short exp, res;
> +  exp = foo (a);
> +  res = foo_o3 (a);
> +  if (exp != res)
> +__builtin_abort ();
> +
> +  exp = foo1 (a);
> +  res = foo1_o3 (a);
> +  if (uexp != ures)
> +__builtin_abort ();
> +
> +  uexp = foou (a);
> +  ures = foou_o3 (a);
> +  if (uexp != ures)
> +__builtin_abort ();
> +
> +  uexp = foou1 (a);
> +  ures = foou1_o3 (a);
> +  if (uexp != ures)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 4375ebdcb49..2921a9e6aa1 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init (gimple_seq* stmts, tree 
> init_expr,
> wi::to_mpz (skipn, exp, UNSIGNED);
> mpz_ui_pow_ui (mod, 2, TYPE_PRECISION (type));
> mpz_powm (res, base, exp, mod);
> -   begin = wi::from_mpz (type, res, TYPE_SIGN (type));
> +   begin = wi::from_mpz (utype, res, true);
> tree mult_expr = wide_int_to_tree (utype, begin);
> init_expr = gimple_build (stmts, MULT_EXPR, utype,
>   init_expr, mult_expr);
> --
> 2.31.1
>

Re: [PATCH] libgcc: Fix up bitint division [PR114397]

2024-03-21 Thread Richard Biener

On Thu, 21 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The Knuth's division algorithm relies on the number of dividend limbs
> to be greater ore equal to number of divisor limbs, which is why
> I've added a special case for un < vn at the start of __divmodbitint4.
> Unfortunately, my assumption that it then implies abs(v) > abs(u) and
> so quotient must be 0 and remainder same as dividend is incorrect.
> This is because this check is done before negation of the operands.
> While bitint_reduce_prec reduces precision from clearly useless limbs,
> the problematic case is when the dividend is unsigned or non-negative
> and divisor is negative.  We can have limbs (from MS to LS):
> dividend: 0   M   ?...
> divisor:  -1  -N  ?...
> where M has most significant bit set and M >= N (if M == N then it
> also the following limbs matter) and the most significant limbs can
> be even partial.  In this case, the quotient should be -1 rather than
> 0.  bitint_reduce_prec will reduce the precision of the dividend so
> that M is the most significant limb, but can't reduce precision of the
> divisor to more than having the -1 as most significant limb, because
> -N doesn't have the most significant bit set.
> 
> The following patch fixes it by detecting this problematic case in the
> un < vn handling, and instead of assuming q is 0 and r is u will
> decrease vn by 1 because it knows the later code will negate the divisor
> and it can be then expressed after negation in one fewer limbs.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-03-21  Jakub Jelinek  
> 
>   PR libgcc/114397
>   * libgcc2.c (__divmodbitint4): Don't assume un < vn always means
>   abs(v) > abs(u), check for a special case of un + 1 == vn where
>   u is non-negative and v negative and after v's negation vn could
>   be reduced by 1.
> 
>   * gcc.dg/torture/bitint-65.c: New test.
> 
> --- libgcc/libgcc2.c.jj   2024-03-15 19:04:27.0 +0100
> +++ libgcc/libgcc2.c  2024-03-20 18:23:51.956879521 +0100
> @@ -1707,44 +1707,67 @@ __divmodbitint4 (UBILtype *q, SItype qpr
>USItype vp = avprec % W_TYPE_SIZE;
>if (__builtin_expect (un < vn, 0))
>  {
> -  /* If abs(v) > abs(u), then q is 0 and r is u.  */
> -  if (q)
> - __builtin_memset (q, 0, qn * sizeof (UWtype));
> -  if (r == NULL)
> - return;
> -#if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
> -  r += rn - 1;
> -  u += un - 1;
> -#endif
> -  if (up)
> - --un;
> -  if (rn < un)
> - un = rn;
> -  for (rn -= un; un; --un)
> +  /* If abs(v) > abs(u), then q is 0 and r is u.
> +  Unfortunately un < vn doesn't always mean abs(v) > abs(u).
> +  If uprec > 0 and vprec < 0 and vn == un + 1, if the
> +  top limb of v is all ones and the second most significant
> +  limb has most significant bit clear, then just decrease
> +  vn/avprec/vp and continue, after negation both numbers
> +  will have the same number of limbs.  */
> +  if (un + 1 == vn
> +   && uprec >= 0
> +   && vprec < 0
> +   && ((v[BITINT_END (0, vn - 1)] | (vp ? ((UWtype) -1 << vp) : 0))
> +   == (UWtype) -1)
> +   && (Wtype) v[BITINT_END (1, vn - 2)] >= 0)
>   {
> -   *r = *u;
> -   r += BITINT_INC;
> -   u += BITINT_INC;
> +   vp = 0;
> +   --vn;
> +#if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
> +   ++v;
> +#endif
>   }
> -  if (!rn)
> - return;
> -  if (up)
> +  else
>   {
> -   if (uprec > 0)
> - *r = *u & (((UWtype) 1 << up) - 1);
> -   else
> - *r = *u | ((UWtype) -1 << up);
> -   r += BITINT_INC;
> -   if (!--rn)
> +   /* q is 0 and r is u.  */
> +   if (q)
> + __builtin_memset (q, 0, qn * sizeof (UWtype));
> +   if (r == NULL)
>   return;
> +#if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
> +   r += rn - 1;
> +   u += un - 1;
> +#endif
> +   if (up)
> + --un;
> +   if (rn < un)
> + un = rn;
> +   for (rn -= un; un; --un)
> + {
> +   *r = *u;
> +   r += BITINT_INC;
> +   u += BITINT_INC;
> + }
> +   if (!rn)
> + return;
> +   if (up)
> + {
> +   if (uprec > 0)
> + *r = *u & (((UWtype) 1 << up) - 1);
> +   else
> + *r = *u | ((UWtype) -1 << up);
> +   r += BITINT_INC;
> +   if (!--rn)
> + return;
> + }
> +   UWtype c = uprec < 0 ? (UWtype) -1 : (UWtype) 0;
> +   for (; rn; --rn)
> + {
> +   *r = c;
> +   r += BITINT_INC;
> + }
> +   return;
>   }
> -  UWtype c = uprec < 0 ? (UWtype) -1 : (UWtype) 0;
> -  for (; rn; --rn)
> - {
> -   *r = c;
> -   r += BITINT_INC;
> - }
> -  return;
>  }
>USItype qn2 = un - vn + 1;
>if (qn >= qn2)
> ---

[PATCH] Fix runtime error for nonlinear iv vectorization(step_mult).

2024-03-21 Thread liuhongt

wi::from_mpz doesn't take a sign argument, we want it to be wrapped
instead of saturation, so pass utype and true to it, and it fixes the
bug.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk and backport to gcc13?

gcc/ChangeLog:

PR tree-optimization/114396
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Pass utype
and true to wi::from_mpz.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114396.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr114396.c | 105 +++
 gcc/tree-vect-loop.cc|   2 +-
 2 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114396.c

diff --git a/gcc/testsuite/gcc.target/i386/pr114396.c 
b/gcc/testsuite/gcc.target/i386/pr114396.c
new file mode 100644
index 000..4c4015f871f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114396.c
@@ -0,0 +1,105 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -fwrapv -fno-vect-cost-model" } */
+
+short a = 0xF;
+short b[16];
+unsigned short ua = 0xF;
+unsigned short ub[16];
+
+short
+__attribute__((noipa))
+foo (short a)
+{
+  for (int e = 0; e < 9; e += 1)
+b[e] = a *= 5;
+  return a;
+}
+
+short
+__attribute__((noipa))
+foo1 (short a)
+{
+  for (int e = 0; e < 9; e += 1)
+b[e] = a *= -5;
+  return a;
+}
+
+unsigned short
+__attribute__((noipa))
+foou (unsigned short a)
+{
+  for (int e = 0; e < 9; e += 1)
+ub[e] = a *= -5;
+  return a;
+}
+
+unsigned short
+__attribute__((noipa))
+foou1 (unsigned short a)
+{
+  for (int e = 0; e < 9; e += 1)
+ub[e] = a *= 5;
+  return a;
+}
+
+short
+__attribute__((noipa,optimize("O3")))
+foo_o3 (short a)
+{
+  for (int e = 0; e < 9; e += 1)
+b[e] = a *= 5;
+  return a;
+}
+
+short
+__attribute__((noipa,optimize("O3")))
+foo1_o3 (short a)
+{
+  for (int e = 0; e < 9; e += 1)
+b[e] = a *= -5;
+  return a;
+}
+
+unsigned short
+__attribute__((noipa,optimize("O3")))
+foou_o3 (unsigned short a)
+{
+  for (int e = 0; e < 9; e += 1)
+ub[e] = a *= -5;
+  return a;
+}
+
+unsigned short
+__attribute__((noipa,optimize("O3")))
+foou1_o3 (unsigned short a)
+{
+  for (int e = 0; e < 9; e += 1)
+ub[e] = a *= 5;
+  return a;
+}
+
+int main() {
+  unsigned short uexp, ures;
+  short exp, res;
+  exp = foo (a);
+  res = foo_o3 (a);
+  if (exp != res)
+__builtin_abort ();
+
+  exp = foo1 (a);
+  res = foo1_o3 (a);
+  if (uexp != ures)
+__builtin_abort ();
+
+  uexp = foou (a);
+  ures = foou_o3 (a);
+  if (uexp != ures)
+__builtin_abort ();
+
+  uexp = foou1 (a);
+  ures = foou1_o3 (a);
+  if (uexp != ures)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..2921a9e6aa1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init (gimple_seq* stmts, tree 
init_expr,
wi::to_mpz (skipn, exp, UNSIGNED);
mpz_ui_pow_ui (mod, 2, TYPE_PRECISION (type));
mpz_powm (res, base, exp, mod);
-   begin = wi::from_mpz (type, res, TYPE_SIGN (type));
+   begin = wi::from_mpz (utype, res, true);
tree mult_expr = wide_int_to_tree (utype, begin);
init_expr = gimple_build (stmts, MULT_EXPR, utype,
  init_expr, mult_expr);
-- 
2.31.1

[PATCH] libgcc: Fix up bitint division [PR114397]

2024-03-21 Thread Jakub Jelinek

Hi!

The Knuth's division algorithm relies on the number of dividend limbs
to be greater ore equal to number of divisor limbs, which is why
I've added a special case for un < vn at the start of __divmodbitint4.
Unfortunately, my assumption that it then implies abs(v) > abs(u) and
so quotient must be 0 and remainder same as dividend is incorrect.
This is because this check is done before negation of the operands.
While bitint_reduce_prec reduces precision from clearly useless limbs,
the problematic case is when the dividend is unsigned or non-negative
and divisor is negative.  We can have limbs (from MS to LS):
dividend:   0   M   ?...
divisor:-1  -N  ?...
where M has most significant bit set and M >= N (if M == N then it
also the following limbs matter) and the most significant limbs can
be even partial.  In this case, the quotient should be -1 rather than
0.  bitint_reduce_prec will reduce the precision of the dividend so
that M is the most significant limb, but can't reduce precision of the
divisor to more than having the -1 as most significant limb, because
-N doesn't have the most significant bit set.

The following patch fixes it by detecting this problematic case in the
un < vn handling, and instead of assuming q is 0 and r is u will
decrease vn by 1 because it knows the later code will negate the divisor
and it can be then expressed after negation in one fewer limbs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-21  Jakub Jelinek  

PR libgcc/114397
* libgcc2.c (__divmodbitint4): Don't assume un < vn always means
abs(v) > abs(u), check for a special case of un + 1 == vn where
u is non-negative and v negative and after v's negation vn could
be reduced by 1.

* gcc.dg/torture/bitint-65.c: New test.

--- libgcc/libgcc2.c.jj 2024-03-15 19:04:27.0 +0100
+++ libgcc/libgcc2.c2024-03-20 18:23:51.956879521 +0100
@@ -1707,44 +1707,67 @@ __divmodbitint4 (UBILtype *q, SItype qpr
   USItype vp = avprec % W_TYPE_SIZE;
   if (__builtin_expect (un < vn, 0))
 {
-  /* If abs(v) > abs(u), then q is 0 and r is u.  */
-  if (q)
-   __builtin_memset (q, 0, qn * sizeof (UWtype));
-  if (r == NULL)
-   return;
-#if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
-  r += rn - 1;
-  u += un - 1;
-#endif
-  if (up)
-   --un;
-  if (rn < un)
-   un = rn;
-  for (rn -= un; un; --un)
+  /* If abs(v) > abs(u), then q is 0 and r is u.
+Unfortunately un < vn doesn't always mean abs(v) > abs(u).
+If uprec > 0 and vprec < 0 and vn == un + 1, if the
+top limb of v is all ones and the second most significant
+limb has most significant bit clear, then just decrease
+vn/avprec/vp and continue, after negation both numbers
+will have the same number of limbs.  */
+  if (un + 1 == vn
+ && uprec >= 0
+ && vprec < 0
+ && ((v[BITINT_END (0, vn - 1)] | (vp ? ((UWtype) -1 << vp) : 0))
+ == (UWtype) -1)
+ && (Wtype) v[BITINT_END (1, vn - 2)] >= 0)
{
- *r = *u;
- r += BITINT_INC;
- u += BITINT_INC;
+ vp = 0;
+ --vn;
+#if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
+ ++v;
+#endif
}
-  if (!rn)
-   return;
-  if (up)
+  else
{
- if (uprec > 0)
-   *r = *u & (((UWtype) 1 << up) - 1);
- else
-   *r = *u | ((UWtype) -1 << up);
- r += BITINT_INC;
- if (!--rn)
+ /* q is 0 and r is u.  */
+ if (q)
+   __builtin_memset (q, 0, qn * sizeof (UWtype));
+ if (r == NULL)
return;
+#if __LIBGCC_BITINT_ORDER__ == __ORDER_BIG_ENDIAN__
+ r += rn - 1;
+ u += un - 1;
+#endif
+ if (up)
+   --un;
+ if (rn < un)
+   un = rn;
+ for (rn -= un; un; --un)
+   {
+ *r = *u;
+ r += BITINT_INC;
+ u += BITINT_INC;
+   }
+ if (!rn)
+   return;
+ if (up)
+   {
+ if (uprec > 0)
+   *r = *u & (((UWtype) 1 << up) - 1);
+ else
+   *r = *u | ((UWtype) -1 << up);
+ r += BITINT_INC;
+ if (!--rn)
+   return;
+   }
+ UWtype c = uprec < 0 ? (UWtype) -1 : (UWtype) 0;
+ for (; rn; --rn)
+   {
+ *r = c;
+ r += BITINT_INC;
+   }
+ return;
}
-  UWtype c = uprec < 0 ? (UWtype) -1 : (UWtype) 0;
-  for (; rn; --rn)
-   {
- *r = c;
- r += BITINT_INC;
-   }
-  return;
 }
   USItype qn2 = un - vn + 1;
   if (qn >= qn2)
--- gcc/testsuite/gcc.dg/torture/bitint-65.c.jj 2024-03-20 18:41:38.026311007 
+0100
+++ gcc/testsuite/gcc.dg/torture/bitint-65.c2024-03-20 18:40:18.604397871 
+0100
@@ -0,0 +1,44 @@
+/* PR

Re: [PATCH] analyzer: Bail out on function pointer for -Wanalyzer-allocation-size

2024-03-21 Thread Stefan Schulze Frielinghaus

On Tue, Mar 19, 2024 at 12:38:34PM -0400, David Malcolm wrote:
> On Tue, 2024-03-19 at 16:10 +0100, Stefan Schulze Frielinghaus wrote:
> > On s390 pr94688.c is failing due to excess error
> > 
> > pr94688.c:6:5: warning: allocated buffer size is not a multiple of
> > the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > 
> > This is because on s390 functions are by default aligned to an 8-byte
> > boundary and during function type construction size is set to
> > function
> > boundary.  Thus, for the assignment
> > 
> > a.0_1 = (void (*) ()) 
> > 
> > we have that the right-hand side is pointing to a 4-byte memory
> > region
> > whereas the size of the function pointer is 8 byte and a warning is
> > emitted.
> 
> FWIW the test case in question is a regression test for an ICE seen in
> the GCC 10 implementation of the analyzer, which was fixed by the big
> rewrite in r11-2694-g808f4dfeb3a95f.
> 
> So the code in the test doesn't make a great deal of sense.
> 
> > 
> > I could follow and skip this test as done in PR112705, or we could
> > bail
> > out early in the analyzer for function pointers.  My intuition so far
> > is that -Wanalyzer-allocation-size shouldn't care about function
> > pointer.  Therefore, I went for bailing out early.  If you believe
> > this
> > is wrong I can still go by skipping this test on s390.  Any thoughts?
> 
> I tried imagining a situation where we're analyzing a function
> generated at run-time, but it strikes me that the buffer allocated for
> such a function can be of arbitrary size.  So -Wanalyzer-allocation-
> size is meaningless for functions.
> 
> There's probably a case for checking for mismatches between pointers to
> code vs pointers to data (e.g. alignments, Harvard architecture
> machines, etc), but -Wanalyzer-allocation-size doesn't do that.
> 
> So I think your patch is correct.
> 
> OK to push it if it passes bootstrap testing.

Bootstrapped and regtested on x64 and s390x.

Thanks,
Stefan

> 
> Thanks
> Dave
> 
> > ---
> >  gcc/analyzer/region-model.cc | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> > model.cc
> > index f079d1fb37e..1b43443d168 100644
> > --- a/gcc/analyzer/region-model.cc
> > +++ b/gcc/analyzer/region-model.cc
> > @@ -3514,6 +3514,10 @@ region_model::check_region_size (const region
> > *lhs_reg, const svalue *rhs_sval,
> >    || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
> >  return;
> >  
> > +  /* Bail out early on function pointers.  */
> > +  if (TREE_CODE (pointee_type) == FUNCTION_TYPE)
> > +    return;
> > +
> >    /* Bail out early on pointers to structs where we can
> >   not deduce whether the buffer size is compatible.  */
> >    bool is_struct = RECORD_OR_UNION_TYPE_P (pointee_type);
>

Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]

2024-03-21 Thread Jonathan Wakely

On Wed, 20 Mar 2024 at 18:11, François Dumont wrote:
>
> As proposed below I also updated gcc-13 branch.

Great, thanks.

>
>  libstdc++: [_GLIBCXX_DEBUG] Define __cpp_lib_null_iterators
>
>  _GLIBCXX_DEBUG has now fully N3344 compliant iterator checks, we
> can define
>  __cpp_lib_null_iterators macros like the normal mode.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/std/iterator (__cpp_lib_null_iterators): Define
> regardless of
>  _GLIBCXX_DEBUG.
>  * include/std/version (__cpp_lib_null_iterators): Likewise.
>
> François
>
>
> On 20/03/2024 10:02, Jonathan Wakely wrote:
> > On Wed, 20 Mar 2024 at 05:59, François Dumont wrote:
> >> Thanks to you doc:
> >>
> >>   libstdc++: [_GLIBCXX_DEBUG] Define __[glibcxx,cpp_lib]_null_iterators
> >>
> >>   _GLIBCXX_DEBUG has now fully N3344 compliant iterator checks, we
> >> can define
> >>   __glibcxx_null_iterators and __cpp_lib_null_iterators macros like
> >> the normal
> >>   mode.
> >>
> >>   libstdc++-v3/ChangeLog:
> >>
> >>   * version.def (null_iterators): Remove extra_cond.
> >>   * version.h: Regenerate.
> >>
> >> Ok to commit ?
> > Please don't bother talking about __glibcxx_null_iterators in the
> > commit message, that's an implementation detail that always mirrors
> > the standard-defined __cpp_lib_null_iterators one. The first line of
> > the commit will be much easier to read without that.
> >
> > OK with that change, thanks.
> >
> >> I already noticed that GCC 13 has no version.h file so no backport 
> >> question.
> > It has no version.h but it still has the macros:
> >
> > include/std/iterator:# define __cpp_lib_null_iterators 201304L
> > include/std/version:# define __cpp_lib_null_iterators 201304L
> >
> > Those definitions can be made to not depend on _GLIBCXX_DEBUG.
> >

68 matches

Mail list logo