Re: [PATCH] c/c++: Tweak warning for 'always_inline function might not be inlinable'

2024-01-22 Thread Richard Biener
On Mon, Jan 22, 2024 at 3:27 PM Hans-Peter Nilsson  wrote:
>
> > From: Richard Biener 
> > Date: Mon, 22 Jan 2024 08:33:47 +0100
>
> > > -   "%<always_inline%> function might not be inlinable");
> > > +   "%<always_inline%> function is not always inlined"
> > > +   " unless also declared %<inline%>");
> >
> > I don't like the "is not always inlined", maybe simply reword to
> >
> >   "%<always_inline%> function might not be inlinable"
> >   " unless also declared %<inline%>"
> >
> > ?
>
> Sure.  Though it's a small nuance to which I don't actually
> agree, I'll go along with almost anything as long as the
> "...declared inline" augmentation is there :-)
>
> Also, I can see that keeping closer to the original wording
> as you suggest can be preferable to some.
>
> I assume by your reply that the patch is ok with that change
> but will wait another 72 hours for "native speakers" to have
> a say.

Yeah, it's OK with me with that change.  I CCed Honza in case
he has anything to add on the factual side.

Richard.

>
> Thanks!
>
> brgds, H-P
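
For readers skimming the thread, here is a minimal sketch of the case the reworded warning is about. The function names are hypothetical and not taken from the patch; the sketch only assumes that GCC emits the -Wattributes diagnostic when always_inline is used without the inline keyword, and that adding inline, as the new wording suggests, avoids it.

```c
/* Hypothetical example: the names add_one and call_add_one are
   illustrative only.  */

/* Declared both 'inline' and 'always_inline', which is the combination
   the reworded warning recommends.  Dropping the 'inline' keyword here
   is what would trigger the -Wattributes diagnostic discussed above.  */
static inline __attribute__ ((always_inline)) int
add_one (int x)
{
  return x + 1;
}

/* An ordinary caller; add_one is expected to be inlined into it.  */
int
call_add_one (int x)
{
  return add_one (x);
}
```

Compiling a variant without the inline keyword with -Wattributes should show the warning text under discussion.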


[PATCH] testsuite: require libc sym for -shared

2024-01-22 Thread Alexandre Oliva


Targets whose binutils support -shared, but that don't have a shared
libc, and that can't add PDC (non-PIC) to shared libraries, may
succeed at the effective target test for -shared, because it brings
nothing from libc, but tests that rely on -shared and that use bits
from libc, such as g++.dg/lto/pr108772, fail despite requiring the
shared effective target.

Extend the effective target test to bring malloc() from libc, that's
likely to be present in libc and bring a substantial amount of code if
no shared libc is available.

Regstrapped on x86_64-linux-gnu, also tested on aarch64-elf with gcc-13,
where the problem was observed.  Ok to install?


for  gcc/testsuite/ChangeLog

* lib/target-supports.exp (check_effective_target_shared):
Check for a static-only libc.
---
 gcc/testsuite/lib/target-supports.exp |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 73360cd3a0d55..213dad355a6a5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1391,10 +1391,16 @@ proc check_effective_target_aarch64_tlsle32 { } {
 proc check_effective_target_shared { } {
 # Note that M68K has a multilib that supports -fpic but not
 # -fPIC, so we need to check both.  We test with a program that
-# requires GOT references.
+# requires GOT references, and with a libc symbol that would
+# bring in significant parts of a static-only libc.  Absent a
+# shared libc, this would make -shared tests fail, so we don't
+# want to enable the shared effective target then.
 return [check_no_compiler_messages shared executable {
+   #include 
extern int foo (void); extern int bar;
-   int baz (void) { return foo () + bar; }
+   char *baz (void) {
+   return foo () + (char*) malloc (bar);
+   }
 } "-shared -fpic"]
 }
 

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] testsuite: no dfp run without dfprt

2024-01-22 Thread Alexandre Oliva


newlib-src/libc/include/sys/fenv.h doesn't define the FE_* macros that
libgcc expects to enable decimal float support.  Only after newlib is
configured and built does an overriding header that defines those
macros become available in objdir//newlib/targ-include/, but
by then, libgcc has already been built without dfp and libbid.

This has exposed a number of tests that attempt to link dfp programs
without requiring a dfprt effective target.

dfp.exp already skips if dfp support is missing altogether, and sets
the default to compile rather than run if dfp support is present in
the compiler but missing in the runtime libraries.

However, some of the dfp tests override the default without requiring
dfprt.  Drop the overriders where reasonable, and add the explicit
requirement elsewhere.

Regstrapped on x86_64-linux-gnu; also tested on aarch64-elf with gcc-13,
where the problem was observed.  Ok to install?


for  gcc/testsuite/ChangeLog

* c-c++-common/dfp/pr36800.c: Drop dg-do overrider.
* c-c++-common/dfp/pr39034.c: Likewise.
* c-c++-common/dfp/pr39035.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.
* gcc.dg/dfp/builtin-tgmath-dfp.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-4.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-5.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-6.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-7.c: Likewise.
* gcc.dg/dfp/pr108068.c: Likewise.
* gcc.dg/dfp/pr97439.c: Likewise.
* g++.dg/compat/decimal/pass-1_main.C: Require dfprt.
* g++.dg/compat/decimal/pass-2_main.C: Likewise.
* g++.dg/compat/decimal/pass-3_main.C: Likewise.
* g++.dg/compat/decimal/pass-4_main.C: Likewise.
* g++.dg/compat/decimal/pass-5_main.C: Likewise.
* g++.dg/compat/decimal/pass-6_main.C: Likewise.
* g++.dg/compat/decimal/return-1_main.C: Likewise.
* g++.dg/compat/decimal/return-2_main.C: Likewise.
* g++.dg/compat/decimal/return-3_main.C: Likewise.
* g++.dg/compat/decimal/return-4_main.C: Likewise.
* g++.dg/compat/decimal/return-5_main.C: Likewise.
* g++.dg/compat/decimal/return-6_main.C: Likewise.
* g++.dg/eh/dfp-1.C: Likewise.
* g++.dg/eh/dfp-2.C: Likewise.
* g++.dg/eh/dfp-saves-aarch64.C: Likewise.
* gcc.c-torture/execute/pr80692.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.
---
 gcc/testsuite/c-c++-common/dfp/pr36800.c   |2 --
 gcc/testsuite/c-c++-common/dfp/pr39034.c   |1 -
 gcc/testsuite/c-c++-common/dfp/pr39035.c   |1 -
 gcc/testsuite/g++.dg/compat/decimal/pass-1_main.C  |1 +
 gcc/testsuite/g++.dg/compat/decimal/pass-2_main.C  |1 +
 gcc/testsuite/g++.dg/compat/decimal/pass-3_main.C  |1 +
 gcc/testsuite/g++.dg/compat/decimal/pass-4_main.C  |1 +
 gcc/testsuite/g++.dg/compat/decimal/pass-5_main.C  |1 +
 gcc/testsuite/g++.dg/compat/decimal/pass-6_main.C  |1 +
 .../g++.dg/compat/decimal/return-1_main.C  |1 +
 .../g++.dg/compat/decimal/return-2_main.C  |1 +
 .../g++.dg/compat/decimal/return-3_main.C  |1 +
 .../g++.dg/compat/decimal/return-4_main.C  |1 +
 .../g++.dg/compat/decimal/return-5_main.C  |1 +
 .../g++.dg/compat/decimal/return-6_main.C  |1 +
 gcc/testsuite/g++.dg/eh/dfp-1.C|1 +
 gcc/testsuite/g++.dg/eh/dfp-2.C|1 +
 gcc/testsuite/g++.dg/eh/dfp-saves-aarch64.C|1 +
 gcc/testsuite/gcc.c-torture/execute/pr80692.c  |1 +
 .../gcc.dg/dfp/bid-non-canonical-d128-1.c  |2 +-
 .../gcc.dg/dfp/bid-non-canonical-d128-2.c  |2 +-
 .../gcc.dg/dfp/bid-non-canonical-d128-3.c  |2 +-
 .../gcc.dg/dfp/bid-non-canonical-d128-4.c  |2 +-
 gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d32-1.c |1 -
 gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d32-2.c |1 -
 gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d64-1.c |1 -
 gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d64-2.c |1 -
 gcc/testsuite/gcc.dg/dfp/builtin-snan-1.c  |1 -
 gcc/testsuite/gcc.dg/dfp/builtin-tgmath-dfp.c  |1 -
 gcc/testsuite/gcc.dg/dfp/c23-float-dfp-4.c |1 -
 gcc/testsuite/gcc.dg/dfp/c23-float-dfp-5.c |1 -
 gcc/testsuite/gcc.dg/dfp/c23-float-dfp-6.c |1 -
 gcc/testsuite/gcc.dg/dfp/c23-float-dfp-7.c |1 -
 gcc/testsuite/gcc.dg/dfp/pr108068.c|1 -
 gcc/testsuite/gcc.dg/dfp/pr97439.c |1 -
 35 files changed, 20 insertions(+), 20 deletions(-)

diff 

[PATCH] aarch64: enforce lane checking for intrinsics

2024-01-22 Thread Alexandre Oliva


Calling arm_neon.h functions that take lanes as arguments may fail to
report malformed values if the intrinsic happens to be optimized away,
e.g. because it is pure or const and the result is unused.

Adding __AARCH64_LANE_CHECK calls to the always_inline functions would
duplicate errors in case the intrinsics are not optimized away; using
another preprocessor macro to call either the intrinsic or
__builtin_aarch64_im_lane_boundsi moves the error messages to the
arm_neon.h header, and may add warnings if we fall off the end of the
functions; duplicating the code to avoid the undesirable effect of the
macros doesn't seem appealing; separating the checking from alternate
no-error-checking core/pure (invisible?) intrinsics in e.g. folding of
non-const/pure (user-callable) intrinsics seems ugly and risky.

So I propose dropping the pure/const attribute from the intrinsics and
builtin declarations, so that gimple passes won't optimize them away.
After expand (when errors are detected and reported), we get plain
insns rather than calls, and those are dropped if the outputs are
unused.  It's not ideal, it could be improved, but it's safe enough
for this stage.

Regstrapped on x86_64-linux-gnu, along with other patches; also tested
on aarch64-elf with gcc-13.  This addresses the issue first reported at
.
Ok to install?


for  gcc/ChangeLog

* config/aarch64/aarch64-builtins.cc (aarch64_get_attributes):
Add lane_check parm, to rule out pure and const.
(aarch64_init_simd_intrinsics): Pass lane_check if any arg has
lane index qualifiers.
(aarch64_init_simd_builtin_functions): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc |   24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 9b23b6b8c33f1..1268deea28e6c 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1258,11 +1258,12 @@ aarch64_add_attribute (const char *name, tree attrs)
 /* Return the appropriate attributes for a function that has
flags F and mode MODE.  */
 static tree
-aarch64_get_attributes (unsigned int f, machine_mode mode)
+aarch64_get_attributes (unsigned int f, machine_mode mode,
+   bool lane_check = false)
 {
   tree attrs = NULL_TREE;
 
-  if (!aarch64_modifies_global_state_p (f, mode))
+  if (!lane_check && !aarch64_modifies_global_state_p (f, mode))
 {
   if (aarch64_reads_global_state_p (f, mode))
attrs = aarch64_add_attribute ("pure", attrs);
@@ -1318,6 +1319,7 @@ aarch64_init_simd_intrinsics (void)
 
   tree return_type = void_type_node;
   tree args = void_list_node;
+  bool lane_check = false;
 
   for (int op_num = d->op_count - 1; op_num >= 0; op_num--)
{
@@ -1330,10 +1332,17 @@ aarch64_init_simd_intrinsics (void)
return_type = eltype;
  else
args = tree_cons (NULL_TREE, eltype, args);
+
+ if (qualifiers & (qualifier_lane_index
+   | qualifier_struct_load_store_lane_index
+   | qualifier_lane_pair_index
+   | qualifier_lane_quadtup_index))
+   lane_check = true;
}
 
   tree ftype = build_function_type (return_type, args);
-  tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0]);
+  tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0],
+  lane_check);
   unsigned int code
  = (d->fcode << AARCH64_BUILTIN_SHIFT | AARCH64_BUILTIN_GENERAL);
   tree fndecl = simulate_builtin_function_decl (input_location, d->name,
@@ -1400,6 +1409,7 @@ aarch64_init_simd_builtin_functions (bool 
called_from_pragma)
  || (!called_from_pragma && struct_mode_args > 0))
continue;
 
+  bool lane_check = false;
   /* Build a function type directly from the insn_data for this
 builtin.  The build_function_type () function takes care of
 removing duplicates for us.  */
@@ -1435,6 +1445,12 @@ aarch64_init_simd_builtin_functions (bool 
called_from_pragma)
return_type = eltype;
  else
args = tree_cons (NULL_TREE, eltype, args);
+
+ if (qualifiers & (qualifier_lane_index
+   | qualifier_struct_load_store_lane_index
+   | qualifier_lane_pair_index
+   | qualifier_lane_quadtup_index))
+   lane_check = true;
}
 
   ftype = build_function_type (return_type, args);
@@ -1448,7 +1464,7 @@ aarch64_init_simd_builtin_functions (bool 
called_from_pragma)
snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s",
  d->name);
 
-  tree attrs = aarch64_get_attributes (d->flags, d->mode);
+  tree attrs = 

Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-22 Thread Richard Biener
On Mon, 22 Jan 2024, Jeff Law wrote:

> 
> 
> On 1/15/24 06:34, Richard Biener wrote:
> > When the x86 backend generates code for cpymem with the rep_8byte
> > strategy for the 8 byte aligned main rep movq it needs to compute
> > an adjusted pointer to the source after doing a prologue aligning
> > the destination.  It computes that via
> > 
> >src_ptr + (dest_ptr - orig_dest_ptr)
> > 
> > which is perfectly fine.  On RTL this is then
> > 
> >  8: r134:DI=const(`g'+0x44)
> >  9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
> >REG_UNUSED flags:CC
> > 56: r129:DI=const(`g'+0x4c)
> > 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
> >REG_UNUSED flags:CC
> >REG_EQUAL const(`g'+0x4c)&0xfff8
> > 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
> >REG_DEAD r134:DI
> >REG_UNUSED flags:CC
> >REG_EQUAL const(`g'+0x44)-r129:DI
> > 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
> >REG_DEAD r133:DI
> >REG_UNUSED flags:CC
> > 
> > but as written find_base_term happily picks the first candidate
> > it finds for the MINUS which means it picks const(`g') rather
> > than the correct frame:DI.  The way find_base_term (but also
> > the unfixed find_base_value used by init_alias_analysis to
> > initialize REG_BASE_VALUE) performs pointer analysis isn't
> > sound.  The following restricts the handling of multi-operand
> > operations to the case where we know only one can be a pointer.
> > 
> > This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
> > RTL PRE (I've opened PR113395 for this).  A more drastic patch,
> > removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
> > regressing (so testsuite coverage is bad).  I've looked at
> > gcc.dg/tree-ssa tests and mostly scheduling changes are present,
> > the cc1plus .text size is only 230 bytes worse.  With this
> > less drastic patch below most scheduling changes are gone.
> > 
> > x86_64 might not be the very best target to test for impact, but
> > test coverage on other targets is unlikely to be very much better.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> > with 2/2).  Jeff, can you maybe throw this on your tester?
> > Jakub, you did the PR64025 fix which was for a similar issue.
> No issues across the cross compilers with those two patches.

Thanks, pushed.  I'm probably going to revert when bigger issues
appear (and hopefully we'd get some test coverage then).

Richard.
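
To make the quoted pointer computation concrete, here is a rough C-level sketch of what the cpymem prologue does. The helper names are hypothetical and the code is not taken from the x86 backend; it only illustrates why the result of src_ptr + (dest_ptr - orig_dest_ptr) must be treated as based on the source object even though two destination pointers appear in the expression.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a destination pointer up to the next
   8-byte boundary, as a copy prologue would.  */
unsigned char *
align_dest_8 (unsigned char *dest)
{
  return (unsigned char *) (((uintptr_t) dest + 7) & ~(uintptr_t) 7);
}

/* src_ptr + (dest_ptr - orig_dest_ptr): the result is based on SRC,
   although the expression also mentions two pointers into the
   destination.  The difference of the two is a plain integer.  */
const unsigned char *
adjust_src (const unsigned char *src, const unsigned char *dest,
            const unsigned char *orig_dest)
{
  return src + (dest - orig_dest);
}
```

A base-term analysis that picks the first pointer-looking operand of the subtraction would conclude the result points into the destination, which is the unsoundness described in the quoted text.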


[PATCH] debug/112718 - reset all type units with -ffat-lto-objects

2024-01-22 Thread Richard Biener
When mixing -flto, -ffat-lto-objects and -fdebug-types-section we
fail to reset all type units after early output, resulting in an
ICE when attempting to add what are then duplicate sibling attributes.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR debug/112718
* dwarf2out.cc (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.

* gcc.dg/debug/pr112718.c: New testcase.
---
 gcc/dwarf2out.cc  | 12 
 gcc/testsuite/gcc.dg/debug/pr112718.c | 12 
 2 files changed, 12 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/pr112718.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 1c994bb8b9b..0b8a3002292 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -32276,24 +32276,12 @@ dwarf2out_finish (const char *filename)
   reset_dies (comp_unit_die ());
   for (limbo_die_node *node = cu_die_list; node; node = node->next)
reset_dies (node->die);
-
-  hash_table comdat_type_table (100);
   for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next)
{
- comdat_type_node **slot
- = comdat_type_table.find_slot (ctnode, INSERT);
-
- /* Don't reset types twice.  */
- if (*slot != HTAB_EMPTY_ENTRY)
-   continue;
-
  /* Remove the pointer to the line table.  */
  remove_AT (ctnode->root_die, DW_AT_stmt_list);
-
  if (debug_info_level >= DINFO_LEVEL_TERSE)
reset_dies (ctnode->root_die);
-
- *slot = ctnode;
}
 
   /* Reset die CU symbol so we don't output it twice.  */
diff --git a/gcc/testsuite/gcc.dg/debug/pr112718.c 
b/gcc/testsuite/gcc.dg/debug/pr112718.c
new file mode 100644
index 000..ff80ca5a298
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/pr112718.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-g -fdebug-types-section -flto -ffat-lto-objects" } */
+
+struct {
+  int h;
+  unsigned char data[20 + 24 * 6];
+} _EC_X9_62_PRIME_192V2;
+struct {
+  int h;
+  unsigned char data[20 + 24 * 6];
+} _EC_X9_62_PRIME_192V3;
-- 
2.35.3


[PATCH 1/2] Adjust hwasan testcase for x86 target.

2024-01-22 Thread liuhongt
There are two cases:
1. hwasan-poison-optimisation.c is supposed to scan for a call to
__hwasan_tag_mismatch4, but x86 uses a different mnemonic (call) from
aarch64 (bl), so adjust the testcase to scan for either call or bl.

2. alloca-outside-caught.c and vararray-outside-caught.c are supposed
to scan for mismatched tags and expect the tag corresponding to the
out-of-bounds memory to be 00.  On x86, however, the contiguous stack
memory is occupied by another local variable/array that is assigned a
different tag, so there is still a mismatch, just not against 00.
Adjust the testcases to scan for XX/XX instead of XX/00.

Ok for trunk?
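
As a sanity check of the first adjustment, the alternation can be tried outside DejaGnu. The sketch below is an assumption-laden stand-in: it uses POSIX regcomp, which has no (?:...) or \s, so the pattern is rewritten as an equivalent extended regular expression; the function name is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains a direct call to __hwasan_tag_mismatch4
   using either the aarch64 (bl) or the x86 (call) mnemonic, else 0.
   The pattern is a POSIX ERE rewrite of the Tcl pattern in the test.  */
int
calls_tag_mismatch (const char *line)
{
  regex_t re;
  int found;

  if (regcomp (&re, "(bl|call)[[:space:]]*__hwasan_tag_mismatch4",
               REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

Feeding it a line of assembler output for each target is enough to confirm that both mnemonics are accepted and that other branch instructions are not.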

gcc/testsuite/ChangeLog:

* c-c++-common/hwasan/alloca-outside-caught.c: Adjust
testcase.
* c-c++-common/hwasan/hwasan-poison-optimisation.c: Ditto.
* c-c++-common/hwasan/vararray-outside-caught.c: Ditto.
---
 gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c  | 2 +-
 gcc/testsuite/c-c++-common/hwasan/hwasan-poison-optimisation.c | 2 +-
 gcc/testsuite/c-c++-common/hwasan/vararray-outside-caught.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c 
b/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
index 6f3825bee7c..f31484a2613 100644
--- a/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
+++ b/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
@@ -20,6 +20,6 @@ main ()
 }
 
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
-/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/00.* \\(ptr/mem\\) in thread T0.*" } */
+/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* 
\\(ptr/mem\\) in thread T0.*" } */
 /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } */
 /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
diff --git a/gcc/testsuite/c-c++-common/hwasan/hwasan-poison-optimisation.c 
b/gcc/testsuite/c-c++-common/hwasan/hwasan-poison-optimisation.c
index 2d6bab4c578..48cf88744eb 100644
--- a/gcc/testsuite/c-c++-common/hwasan/hwasan-poison-optimisation.c
+++ b/gcc/testsuite/c-c++-common/hwasan/hwasan-poison-optimisation.c
@@ -22,7 +22,7 @@ main ()
 }
 
 /* { dg-final { scan-tree-dump-times "ASAN_POISON" 1 "asan1" }  } */
-/* { dg-final { scan-assembler-times "bl\\s*__hwasan_tag_mismatch4" 1 } } */
+/* { dg-final { scan-assembler-times "(?:bl|call)\\s*__hwasan_tag_mismatch4" 1 
} } */
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
 /* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/00 \\(ptr/mem\\) in thread T0.*" } */
 /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } */
diff --git a/gcc/testsuite/c-c++-common/hwasan/vararray-outside-caught.c 
b/gcc/testsuite/c-c++-common/hwasan/vararray-outside-caught.c
index 35a344def42..743a894ede9 100644
--- a/gcc/testsuite/c-c++-common/hwasan/vararray-outside-caught.c
+++ b/gcc/testsuite/c-c++-common/hwasan/vararray-outside-caught.c
@@ -17,6 +17,6 @@ main ()
 }
 
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
-/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/00 \\(ptr/mem\\) in thread T0.*" } */
+/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\].*\\(ptr/mem\\)
 in thread T0.*" } */
 /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } */
 /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
-- 
2.31.1



[PATCH 2/2] [x86] Enable -mlam=u57 by default when compiled with -fsanitize=hwaddress.

2024-01-22 Thread liuhongt
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_option_override_internal):
Enable -mlam=u57 by default when compiled with
-fsanitize=hwaddress.
---
 gcc/config/i386/i386-options.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index b6f634e9a32..e66a58ed926 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2189,6 +2189,15 @@ ix86_option_override_internal (bool main_args_p,
   && opts->x_ix86_abi != DEFAULT_ABI)
 error ("%<-mabi=%s%> not supported with %<-fsanitize=thread%>", abi_name);
 
+  /* Hwasan is supported with lam_u57 only.  */
+  if (opts->x_flag_sanitize & SANITIZE_HWADDRESS)
+{
+  if (ix86_lam_type == lam_u48)
+   warning (0, "%<-mlam=u48%> is not compatible with Hardware-assisted "
+"AddressSanitizer, override to %<-mlam=u57%>");
+  ix86_lam_type = lam_u57;
+}
+
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
  option.  */
-- 
2.31.1
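
The override in the hunk above can be modeled in isolation as follows. This is only a sketch: the enumerators lam_u48 and lam_u57 appear in the patch, while lam_none, the function name and the warn out-parameter are assumptions made for illustration and are not GCC's actual interfaces.

```c
/* Simplified stand-ins for the GCC internals; only the names lam_u48
   and lam_u57 come from the patch, lam_none is assumed.  */
enum lam_type { lam_none, lam_u48, lam_u57 };

/* Return the LAM mode to use.  *WARN is set to 1 when an explicit
   -mlam=u48 request had to be overridden, and to 0 otherwise.  */
enum lam_type
resolve_lam_type (enum lam_type requested, int hwasan_enabled, int *warn)
{
  *warn = 0;
  if (!hwasan_enabled)
    return requested;
  /* Hwasan works with lam_u57 only, so u48 is overridden loudly and
     every other setting is overridden silently.  */
  if (requested == lam_u48)
    *warn = 1;
  return lam_u57;
}
```

The point of the model is that lam_u57 is forced whenever hwasan is on, and the warning is reserved for the one case where the user asked for something incompatible.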



Re: [pushed][PATCH v1] LoongArch: doc:Combined with the content of target-supports.exp, add the attribute description related to LoongArch.

2024-01-22 Thread chenglulu

Pushed to r14-8344.

On 2024/1/17 at 9:24 AM, chenxiaolong wrote:

gcc/ChangeLog:

* doc/sourcebuild.texi: Add attributes for keywords.
---
  gcc/doc/sourcebuild.texi | 20 
  1 file changed, 20 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 8082100a3c9..6c33237ac78 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2352,6 +2352,26 @@ AArch64 target that is able to generate and execute 
armv8.3-a FJCVTZS
  instruction.
  @end table
  
+@subsubsection LoongArch specific attributes

+
+@table @code
+@item loongarch_sx
+LoongArch target that generates instructions for SX.
+
+@item loongarch_asx
+LoongArch target that generates instructions for ASX.
+
+@item loongarch_sx_hw
+LoongArch target that is able to generate and execute SX code.
+
+@item loongarch_asx_hw
+LoongArch target that is able to generate and execute ASX code.
+
+@item loongarch_call36_support
+LoongArch binutils supports call36 relocation.
+
+@end table
+
  @subsubsection MIPS-specific attributes
  
  @table @code




Re: [PATCH] c++: Don't ICE for unknown parameter to constexpr'd switch-statement, PR113545

2024-01-22 Thread Hans-Peter Nilsson
> Date: Mon, 22 Jan 2024 14:33:59 -0500
> From: Marek Polacek 

> The problem seems to be more about conversion so 
> g++.dg/conversion/reinterpret5.C
> or g++.dg/cpp0x/constexpr-reinterpret3.C seems more appropriate.
> 
> > @@ -0,0 +1,49 @@
> 
> Please add
> 
> PR c++/113545

> > +  unsigned const char c = swbar(reinterpret_cast<__UINTPTR_TYPE__>());
> > +  xyzzy(c);
> > +  unsigned const char d = ifbar(reinterpret_cast<__UINTPTR_TYPE__>());
> 
> I suppose we should also test a C-style cast (which leads to a 
> reinterpret_cast
> in this case).
> 
> Maybe check we get an error when c/d are constexpr (that used to ICE).

Like this?  Not sure about the value of that variant, but here goes.

I checked that these behave as expected (xfail as ICE properly) without the
previously posted patch to cp/constexpr.cc and XPASS with it applied.

Ok to commit?

-- >8 --
Subject: [PATCH] c++: testcases for PR113545 (constexpr with switch and
 passing non-constexpr parameter)

gcc/testsuite:
PR c++/113545
* g++.dg/cpp0x/constexpr-reinterpret3.C,
g++.dg/cpp0x/constexpr-reinterpret4.C: New tests.
---
 .../g++.dg/cpp0x/constexpr-reinterpret3.C | 55 +++
 .../g++.dg/cpp0x/constexpr-reinterpret4.C | 54 ++
 2 files changed, 109 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret4.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret3.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret3.C
new file mode 100644
index ..319cc5e8bee9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret3.C
@@ -0,0 +1,55 @@
+// PR c++/113545
+// { dg-do run { target c++11 } }
+// { dg-ice "PR112545 - constexpr function with switch called for 
reinterpret_cast" }
+
+char foo;
+
+// This one caught a call to gcc_unreachable in
+// cp/constexpr.cc:label_matches, when passed a convert_expr from the
+// cast in the call.
+constexpr unsigned char swbar(__UINTPTR_TYPE__ baz)
+{
+  switch (baz)
+{
+case 13:
+  return 11;
+case 14:
+  return 78;
+case 2048:
+  return 13;
+default:
+  return 42;
+}
+}
+
+// For reference, the equivalent* if-statements.
+constexpr unsigned char ifbar(__UINTPTR_TYPE__ baz)
+{
+  if (baz == 13)
+return 11;
+  else if (baz == 14)
+return 78;
+  else if (baz == 2048)
+return 13;
+  else
+return 42;
+}
+
+__attribute__ ((__noipa__))
+void xyzzy(int x)
+{
+  if (x != 42)
+__builtin_abort ();
+}
+
+int main()
+{
+  unsigned const char c = swbar(reinterpret_cast<__UINTPTR_TYPE__>());
+  xyzzy(c);
+  unsigned const char d = ifbar(reinterpret_cast<__UINTPTR_TYPE__>());
+  xyzzy(d);
+  unsigned const char e = swbar((__UINTPTR_TYPE__) );
+  xyzzy(e);
+  unsigned const char f = ifbar((__UINTPTR_TYPE__) );
+  xyzzy(f);
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret4.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret4.C
new file mode 100644
index ..4d0fdf2c0a78
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-reinterpret4.C
@@ -0,0 +1,54 @@
+// PR c++/113545
+// { dg-do compile { target c++11 } }
+
+char foo;
+
+// This one caught a call to gcc_unreachable in
+// cp/constexpr.cc:label_matches, when passed a convert_expr from the
+// cast in the call.
+constexpr unsigned char swbar(__UINTPTR_TYPE__ baz)
+{
+  switch (baz)
+{
+case 13:
+  return 11;
+case 14:
+  return 78;
+case 2048:
+  return 13;
+default:
+  return 42;
+}
+}
+
+// For reference, the equivalent* if-statements.
+constexpr unsigned char ifbar(__UINTPTR_TYPE__ baz)
+{
+  if (baz == 13)
+return 11;
+  else if (baz == 14)
+return 78;
+  else if (baz == 2048)
+return 13;
+  else
+return 42;
+}
+
+__attribute__ ((__noipa__))
+void xyzzy(int x)
+{
+  if (x != 42)
+__builtin_abort ();
+}
+
+int main()
+{
+  unsigned constexpr char c = swbar(reinterpret_cast<__UINTPTR_TYPE__>()); 
// { dg-error "conversion from pointer type" }
+  xyzzy(c);
+  unsigned constexpr char d = ifbar(reinterpret_cast<__UINTPTR_TYPE__>()); 
// { dg-error "conversion from pointer type" }
+  xyzzy(d);
+  unsigned constexpr char e = swbar((__UINTPTR_TYPE__) ); // { dg-error 
"conversion from pointer type" }
+  xyzzy(e);
+  unsigned constexpr char f = ifbar((__UINTPTR_TYPE__) ); // { dg-error 
"conversion from pointer type" }
+  xyzzy(f);
+}
-- 
2.30.2



Re: [PATCH v2] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-22 Thread Jeff Law




On 1/21/24 23:12, Monk Chiang wrote:

Since match.pd transforms (zero_one == 0) ? y : z  y
into ((typeof(y))zero_one * z)  y, add splitters to recognize
this expression and generate SFB instructions.

gcc/ChangeLog:
PR target/113095
* config/riscv/sfb.md: New splitters to rewrite single bit
sign extension as the condition to SFB instructions.

gcc/testsuite/ChangeLog:
 * gcc.target/riscv/sfb.c: New test.
* gcc.target/riscv/pr113095.c: New test.
So the 113095 test is going to fail to link on rv64 causing a testsuite 
failure.  I would suggest it have these dg-options lines instead of the 
one you provided:


/* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series" 
{ target { rv32 } } } */
/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -mtune=sifive-7-series" { 
target { rv64 } } } */



A similar change is not strictly needed for the new sfb.c test since it 
only does a compile (but not a link) test.


You still didn't indicate what testing was done for this patch. 
Standard practice is to build the compiler and run the testsuite with 
and without your change and verify there are no regressions.  Ideally 
new tests should pass as well.


I made the change above locally to pr113095.c to fix those failures on 
rv64.   So this is OK with the adjustment to the dg-options line in the 
new pr113095 test.


Jeff
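
For anyone wondering why the match.pd rewrite quoted above is value-preserving, here is a small sketch of the two forms. The binary operator is elided in the quoted message, so '+' is used purely as an assumed example of an operator for which 0 is an identity; the function names are hypothetical.

```c
/* The conditional form before the match.pd rewrite.  ZERO_ONE must be
   0 or 1.  The '+' operator is an assumption; the operator is elided
   in the message above.  */
unsigned int
cond_form (unsigned int zero_one, unsigned int y, unsigned int z)
{
  return (zero_one == 0) ? y : z + y;
}

/* The multiplied form after the rewrite: when ZERO_ONE is 0 the
   product vanishes and only Y remains; when it is 1 the product is Z.  */
unsigned int
mult_form (unsigned int zero_one, unsigned int y, unsigned int z)
{
  return (zero_one * z) + y;
}
```

The splitters discussed in this thread go the other way at the RTL level, recovering a conditional form that suits short forward branches.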



[PATCH] LoongArch: Disable TLS type symbols from generating non-zero offsets.

2024-01-22 Thread Lulu Cheng
TLS gd, ld and ie type symbols generate corresponding GOT entries,
so non-zero offsets cannot be generated for them.
Taking the address of a TLS le type symbol+addend is not implemented in
binutils, so non-zero offsets are not generated here for the time being.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For symbols of type tls, non-zero Offset is not generated.
---
 gcc/config/loongarch/loongarch.cc | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 82467474288..f2ce1f6906d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1924,11 +1924,7 @@ loongarch_symbolic_constant_p (rtx x, enum 
loongarch_symbol_type *symbol_type)
   x = UNSPEC_ADDRESS (x);
 }
   else if (SYMBOL_REF_P (x) || LABEL_REF_P (x))
-{
-  *symbol_type = loongarch_classify_symbol (x);
-  if (*symbol_type == SYMBOL_TLS)
-   return true;
-}
+*symbol_type = loongarch_classify_symbol (x);
   else
 return false;
 
@@ -1939,17 +1935,21 @@ loongarch_symbolic_constant_p (rtx x, enum 
loongarch_symbol_type *symbol_type)
  relocations.  */
   switch (*symbol_type)
 {
-case SYMBOL_TLS_IE:
-case SYMBOL_TLS_LE:
-case SYMBOL_TLSGD:
-case SYMBOL_TLSLDM:
 case SYMBOL_PCREL:
 case SYMBOL_PCREL64:
   /* GAS rejects offsets outside the range [-2^31, 2^31-1].  */
   return sext_hwi (INTVAL (offset), 32) == INTVAL (offset);
 
+/* The following symbol types do not allow non-zero offsets.  */
 case SYMBOL_GOT_DISP:
+case SYMBOL_TLS_IE:
+case SYMBOL_TLSGD:
+case SYMBOL_TLSLDM:
 case SYMBOL_TLS:
+/* From an implementation perspective, tls_le symbols are allowed to
+   have non-zero offsets, but currently binutils has not added support,
+   so the generation of non-zero offsets is prohibited here.  */
+case SYMBOL_TLS_LE:
   return false;
 }
   gcc_unreachable ();
-- 
2.39.3
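
The resulting policy of the switch in the second hunk can be summarized with a small standalone model. Only the enumerator names are taken from the patch; the function name, the plain integer offset and the enum layout are assumptions for illustration, and the model covers just the non-zero-offset decision shown in the hunk.

```c
#include <stdint.h>

/* Enumerator names follow the patch; the numbering is arbitrary.  */
enum symbol_type
{
  SYMBOL_GOT_DISP, SYMBOL_PCREL, SYMBOL_PCREL64, SYMBOL_TLS,
  SYMBOL_TLS_IE, SYMBOL_TLS_LE, SYMBOL_TLSGD, SYMBOL_TLSLDM
};

/* Return 1 if a non-zero OFFSET may be folded into a symbol of TYPE.  */
int
nonzero_offset_ok (enum symbol_type type, int64_t offset)
{
  switch (type)
    {
    case SYMBOL_PCREL:
    case SYMBOL_PCREL64:
      /* The offset must fit in a signed 32-bit value.  */
      return offset >= INT32_MIN && offset <= INT32_MAX;

    case SYMBOL_GOT_DISP:
    case SYMBOL_TLS_IE:
    case SYMBOL_TLSGD:
    case SYMBOL_TLSLDM:
    case SYMBOL_TLS:
    case SYMBOL_TLS_LE:
      /* GOT-based and TLS symbols never take a non-zero offset.  */
      return 0;
    }
  return 0;
}
```

After the patch, the pc-relative types are the only ones left that accept an offset, and only when it fits in 32 bits.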



Re: [PATCH] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-22 Thread chenglulu

LGTM!

Thanks!

On 2024/1/23 at 2:42 AM, Xi Ruoyao wrote:

Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
(loongarch_call_tls_get_addr): Do not split symbols of
SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
EXPLICIT_RELOCS_AUTO.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
for la.tls.ld and la.tls.gd.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.cc| 9 -
  .../loongarch/explicit-relocs-auto-tls-ld-gd.c   | 3 ++-
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 82467474288..58df0b5637d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1970,11 +1970,10 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type 
type)
  {
case SYMBOL_TLS_IE:
case SYMBOL_TLS_LE:
-  case SYMBOL_TLSGD:
-  case SYMBOL_TLSLDM:
case SYMBOL_PCREL64:
-   /* The linker don't know how to relax TLS accesses or 64-bit
-  pc-relative accesses.  */
+   /* TLS IE cannot be relaxed.  TLS LE relaxation does not require
+  using the assembly macro.  The linker does not relax 64-bit
+  pc-relative accesses as at now.  */
return true;
case SYMBOL_GOT_DISP:
/* The linker don't know how to relax GOT accesses in extreme
@@ -2789,7 +2788,7 @@ loongarch_call_tls_get_addr (rtx sym, enum 
loongarch_symbol_type type, rtx v0)
  
start_sequence ();
  
-  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+  if (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
  {
/* Split tls symbol to high and low.  */
rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
index 957ff98df62..ca55fcfc53e 100644
--- a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
@@ -6,4 +6,5 @@ extern __thread int b __attribute__((visibility("default")));
  
  int test() { return a + b; }
  
-/* { dg-final { scan-assembler-not "la.tls" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.ld" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.gd" { target tls_native } } } */




Re: [PATCH] c++: Don't ICE for unknown parameter to constexpr'd switch-statement, PR113545

2024-01-22 Thread Hans-Peter Nilsson
> Date: Mon, 22 Jan 2024 14:33:59 -0500
> From: Marek Polacek 

Oh, there was more...  Also, I think I misinterpreted your
reply as meaning that the test-case is ice-on-invalid and I
wrongly replied in agreement to that misinterpretation. :)

(For others at a comparable level of (lack of) C++ knowledge
to me: my reading of
https://en.cppreference.com/w/cpp/language/constexpr is that
a constexpr function can be validly called with an
expression that isn't "constexpr" (compile-time computable,
immediately computable, core constant expressions or
whatever the term), and the test-case is a valid example (of
two such invocations).  So, an expression calling such a
function is only truly "constexpr" with the "right"
parameters.  I know this isn't C++ 102, but many of us
hacking gcc aren't originally c++ hackers; that was just
happenstance.  I was about to write "aren't C++ hackers" but
then again, C++ happened to gcc, and c++11 at that.)
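
A minimal sketch of that point, assuming C++14 or later (for the switch in a constexpr function). This is a made-up example, not the PR113545 test case; classify and classify_address are hypothetical names.

```cpp
#include <cstdint>

// A constexpr function: usable in constant expressions, but also callable
// with arguments that are only known at run time.
constexpr unsigned char classify(std::uintptr_t v)
{
  switch (v & 3)
    {
    case 0: return 10;
    case 1: return 11;
    default: return 12;
    }
}

// Constant-expression use: the argument is a literal, so the call can be
// evaluated at compile time.
static_assert(classify(4) == 10, "evaluated at compile time");

// Run-time use: reinterpret_cast is not allowed in a constant expression,
// so this call is simply evaluated at run time.  `c` is const, not
// constexpr, which is what keeps the declaration valid.
inline unsigned char classify_address(const void *p)
{
  const unsigned char c = classify(reinterpret_cast<std::uintptr_t>(p));
  return c;
}
```

Declaring `c` as constexpr instead of const would require the initializer to be a constant expression, and the compiler should then reject it with an error.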

> On Mon, Jan 22, 2024 at 06:02:32PM +0100, Hans-Peter Nilsson wrote:

> The problem seems to be more about conversion so 
> g++.dg/conversion/reinterpret5.C
> or g++.dg/cpp0x/constexpr-reinterpret3.C seems more appropriate.

I briefly considered one of the cpp[0-9a-z]* subdirectories
but found no rule.

Isn't constexpr c++11 and therefore cpp0x isn't a good match
(contrary to the many constexpr tests therein)?

What *is* the actual rule for putting a test in
g++.dg/cpp0x, cpp1x and cpp1y (et al)?
(I STFW but found nothing.)

> > +++ b/gcc/testsuite/g++.dg/expr/pr113545.C
> > @@ -0,0 +1,49 @@
> 
> Please add
> 
> PR c++/113545

Certainly if the test is to change name and even if not
("git grep" wouldn't catch the file name).  Will do.

> > +  unsigned const char c = swbar(reinterpret_cast<__UINTPTR_TYPE__>());
> > +  xyzzy(c);
> > +  unsigned const char d = ifbar(reinterpret_cast<__UINTPTR_TYPE__>());
> 
> I suppose we should also test a C-style cast (which leads to a 
> reinterpret_cast
> in this case).
> 
> Maybe check we get an error when c/d are constexpr (that used to ICE).

But the expressions aren't declared constexpr, just const
(as in "here 'const' means run-time evaluation due to the
weirdness that is C++").

...oh, I see what you mean, these are valid, but you suggest
adding tests declared constexpr to check that they get a
matching error (not ICE :) !

Thanks again for the review, I think I'll at least re-work
the test-case into two separate ones.

brgds, H-P


Go patch committed: Don't pass iota value to lowering pass

2024-01-22 Thread Ian Lance Taylor
This patch to the Go frontend stops passing the iota value to the
lowering pass.  It is no longer used.  The iota value is now handled
in the determine-types pass.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
896148947b9ff4845c8bc334f8eff30f91ff3c9a
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index b41ac99f7a8..c2a6032ae80 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-61b29a99dadf33c48a0a063f50f61e877fb419b8
+ddf3758e4a45ca2816fb68f3e4224501a3c4c438
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index a09d33b868e..51ff0206129 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -1007,7 +1007,7 @@ Expression::make_type(Type* type, Location location)
 
 Expression*
 Var_expression::do_lower(Gogo* gogo, Named_object* function,
-Statement_inserter* inserter, int)
+Statement_inserter* inserter)
 {
   if (this->variable_->is_variable())
 {
@@ -1158,7 +1158,7 @@ Enclosed_var_expression::do_traverse(Traverse*)
 
 Expression*
 Enclosed_var_expression::do_lower(Gogo* gogo, Named_object* function,
- Statement_inserter* inserter, int)
+ Statement_inserter* inserter)
 {
   gogo->lower_expression(function, inserter, &this->reference_);
   return this;
@@ -2097,7 +2097,7 @@ Unknown_expression::do_is_addressable() const
 // Lower a reference to an unknown name.
 
 Expression*
-Unknown_expression::do_lower(Gogo*, Named_object*, Statement_inserter*, int)
+Unknown_expression::do_lower(Gogo*, Named_object*, Statement_inserter*)
 {
   if (this->is_error_expression())
 return Expression::make_error(this->location());
@@ -3642,7 +3642,7 @@ Const_expression::do_is_zero_value() const
 // predeclared constant iota into an integer value.
 
 Expression*
-Const_expression::do_lower(Gogo* gogo, Named_object*, Statement_inserter*, int)
+Const_expression::do_lower(Gogo* gogo, Named_object*, Statement_inserter*)
 {
   Location loc = this->location();
 
@@ -4120,7 +4120,7 @@ class Iota_expression : public Parser_expression
   { }
 
   Expression*
-  do_lower(Gogo*, Named_object*, Statement_inserter*, int)
+  do_lower(Gogo*, Named_object*, Statement_inserter*)
   { go_unreachable(); }
 
   // There should only ever be one of these.
@@ -4171,7 +4171,7 @@ Type_conversion_expression::do_type()
 
 Expression*
 Type_conversion_expression::do_lower(Gogo* gogo, Named_object*,
-Statement_inserter* inserter, int)
+Statement_inserter* inserter)
 {
   Type* type = this->type_;
   Expression* val = this->expr_;
@@ -4997,7 +4997,7 @@ Unary_expression::check_operand_address_taken(Gogo*)
 // instead.
 
 Expression*
-Unary_expression::do_lower(Gogo* gogo, Named_object*, Statement_inserter*, int)
+Unary_expression::do_lower(Gogo* gogo, Named_object*, Statement_inserter*)
 {
   Location loc = this->location();
 
@@ -6677,7 +6677,7 @@ Binary_expression::eval_complex(Operator op, const 
Numeric_constant* left_nc,
 
 Expression*
 Binary_expression::do_lower(Gogo* gogo, Named_object*,
-   Statement_inserter* inserter, int)
+   Statement_inserter* inserter)
 {
   Location location = this->location();
 
@@ -8955,7 +8955,7 @@ class Selector_expression : public Parser_expression
   do_issue_nil_check();
 
   Expression*
-  do_lower(Gogo*, Named_object*, Statement_inserter*, int);
+  do_lower(Gogo*, Named_object*, Statement_inserter*);
 
   Expression*
   do_copy()
@@ -9030,7 +9030,7 @@ Selector_expression::do_issue_nil_check()
 // Lower a selector expression to the resolved value.
 
 Expression*
-Selector_expression::do_lower(Gogo*, Named_object*, Statement_inserter*, int)
+Selector_expression::do_lower(Gogo*, Named_object*, Statement_inserter*)
 {
   if (this->is_error_expression() || this->resolved_ == NULL)
 return Expression::make_error(this->location());
@@ -9360,7 +9360,7 @@ Builtin_call_expression::do_set_recover_arg(Expression* 
arg)
 
 Expression*
 Builtin_call_expression::do_lower(Gogo* gogo, Named_object* function,
- Statement_inserter* inserter, int)
+ Statement_inserter* inserter)
 {
   if (this->is_error_expression())
 return this;
@@ -12564,7 +12564,7 @@ Call_expression::do_discarding_value()
 
 Expression*
 Call_expression::do_lower(Gogo* gogo, Named_object*,
- Statement_inserter* inserter, int)
+ Statement_inserter* inserter)
 {
   if (this->lowered_ != NULL)
 return this->lowered_;
@@ -14836,7 +14836,7 @@ Index_expression::do_issue_nil_check()
 // expression into an array index, a string index, or a map index.
 
 

Re: [PATCH] c++: Don't ICE for unknown parameter to constexpr'd switch-statement, PR113545

2024-01-22 Thread Hans-Peter Nilsson
> Date: Mon, 22 Jan 2024 14:33:59 -0500
> From: Marek Polacek 

> On Mon, Jan 22, 2024 at 06:02:32PM +0100, Hans-Peter Nilsson wrote:
> > I don't really know whether this is the right way to treat
> > CONVERT_EXPR as below, but...  Regtested native
> > x86_64-linux-gnu.  Ok to commit?
> 
> Thanks for taking a look at this problem.

Honestly, it's more like posting a patch is more effective
than just opening a PR. ;)  And I was curious about that
constexpr thing that usually works, but blew up here.

> > brgds, H-P
> > 
> > -- >8 --
> > That gcc_unreachable at the default-label seems to be over
> > the top.  It seems more correct to just say "that's not
> > constant" to whatever's not known (to be constant), when
> > looking for matches in switch-statements.
> 
> Unfortunately this doesn't seem correct to me; I don't think we
> should have gotten that far.

The gcc_unreachable was indeed a clue in that direction.

>  It appears that we lose track of
> the reinterpret_cast, which is not allowed in a constant expression:
> .

B...b..but clang allows it... (and I have no clue about the
finer --or admittedly even coarser-- details of C++) ...and
not-my-code, just "porting" it.

Seriously though, thanks for the reference.  Also, maybe
something to consider for -fpermissive, if this changes to a
more graceful error path.

> cp_convert -> ... -> convert_to_integer_1 gives us a CONVERT_EXPR
> but we only set REINTERPRET_CAST_P on NOP_EXPRs:
> 
>   expr = cp_convert (type, expr, complain);
>   if (TREE_CODE (expr) == NOP_EXPR)
> /* Mark any nop_expr that created as a reintepret_cast.  */
> REINTERPRET_CAST_P (expr) = true;
> 
> so when evaluating baz we get (long unsigned int) , which
> passes verify_constant.
>  
> I don't have a good suggestion yet, sorry.

Thanks for the review!

> > With this patch, the code generated for the (inlined) call to
> > ifbar equals that to swbar, except for the comparisons being
> > in another order.
> > 
> > gcc/cp:
> > PR c++/113545
> > * constexpr.cc (label_matches): Replace call to_unreachable with
> 
> "to gcc_unreachable"

Oops!

> > return false.
> 
> More like with "break" but that's not important.

(Well, *effectively* return false.  I'd change to something
like "Replace call to gcc_unreachable with arrangement to
return false" if this were to go anywhere.)

> > gcc/testsuite:
> > * g++.dg/expr/pr113545.C: New text.
> 
> "test"

Gosh, horrible typos, thanks.

brgds, H-P


[COMMITTED, obvious] Sort warning options in c-family/c.opt.

2024-01-22 Thread Sandra Loosemore
In spite of the plea "Please try to keep this file in ASCII collating
order" at the top of the file, the sorting of the entries for the
various -Wfoo options had increased in entropy.  This made it hard to
correlate them against e.g. the list of options enabled by -Wall in
the manual (see PR90463).  This patch is at least a step in the right
direction to restore order to the file.

I confirmed that no lines were added or removed by these changes, and
that the output of "gcc -Q --help=warnings" is unchanged both with and
without -Wall added to the command.
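
As an illustration of what "ASCII collating order" means for the option names, a check along these lines could be used. This is only a sketch: in_ascii_order is a hypothetical helper, not part of GCC, and it does a plain byte-wise comparison, so it does not model special cases such as a Wno- form kept next to its positive form.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true when the names are in ASCII (byte-wise) order.
// std::string's operator< compares character by character as unsigned
// bytes and does not depend on the locale.
inline bool in_ascii_order(const std::vector<std::string> &names)
{
  return std::is_sorted(names.begin(), names.end());
}
```

For example, Waddress, Waddress-of-packed-member, Warith-conversion, Warray-bounds= passes the check, while Warray-parameter=, Wzero-length-bounds, Wassign-intercept does not.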

gcc/c-family/ChangeLog
* c.opt: Improve sorting of warning options.
---
 gcc/c-family/c.opt | 488 ++---
 1 file changed, 244 insertions(+), 244 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 314bd17004f..9c0a28092fc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -303,6 +303,10 @@ Waddress
 C ObjC C++ ObjC++ Var(warn_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn about suspicious uses of memory addresses.
 
+Waddress-of-packed-member
+C ObjC C++ ObjC++ Var(warn_address_of_packed_member) Init(1) Warning
+Warn when the address of packed member of struct or union is taken.
+
 Enum
 Name(warn_aligned_new_level) Type(int) UnknownError(argument %qs to 
%<-Waligned-new%> not recognized)
 
@@ -358,6 +362,10 @@ Wno-alloca-larger-than
 C ObjC C++ LTO ObjC++ Alias(Walloca-larger-than=,18446744073709551615EiB,none) 
Warning
 Disable Walloca-larger-than= warning.  Equivalent to 
Walloca-larger-than= or larger.
 
+Warith-conversion
+C ObjC C++ ObjC++ Var(warn_arith_conv) Warning
+Warn if conversion of the result of arithmetic might change the value even 
though converting the operands cannot.
+
 Warray-bounds=
 LangEnabledBy(C ObjC C++ LTO ObjC++,Wall,1,0)
 ; in common.opt
@@ -374,10 +382,6 @@ Warray-parameter=
 C ObjC C++ ObjC++ Joined RejectNegative UInteger Var(warn_array_parameter) 
IntegerRange(0, 2) LangEnabledBy(C ObjC C++ ObjC++,Wall, 2, 0) Warning
 Warn about mismatched declarations of array parameters and unsafe accesses to 
them.
 
-Wzero-length-bounds
-C ObjC C++ ObjC++ Var(warn_zero_length_bounds) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
-Warn about accesses to interior zero-length array members.
-
 Wassign-intercept
 ObjC ObjC++ Var(warn_assign_intercept) Warning
 Warn whenever an Objective-C assignment is being intercepted by the garbage 
collector.
@@ -421,10 +425,6 @@ Wbool-operation
 C ObjC C++ ObjC++ Var(warn_bool_op) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn about certain operations on boolean expressions.
 
-Wframe-address
-C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
-Warn when __builtin_frame_address or __builtin_return_address is used unsafely.
-
 Wbuiltin-declaration-mismatch
 C ObjC C++ ObjC++ Var(warn_builtin_declaration_mismatch) Init(1) Warning
 Warn when a built-in function is declared with the wrong signature.
@@ -534,6 +534,14 @@ Wchkp
 C ObjC C++ ObjC++ Warning WarnRemoved
 Removed in GCC 9.  This switch has no effect.
 
+Wclass-conversion
+C++ ObjC++ Var(warn_class_conversion) Init(1) Warning
+Warn when a conversion function will never be called due to the type it 
converts to.
+
+Wclass-memaccess
+C++ ObjC++ Var(warn_class_memaccess) Warning LangEnabledBy(C++ ObjC++, Wall)
+Warn for unsafe raw memory writes to objects of class types.
+
 Wclobbered
 C ObjC C++ ObjC++ Var(warn_clobbered) Warning EnabledBy(Wextra)
 Warn about variables that might be changed by \"longjmp\" or \"vfork\".
@@ -650,6 +658,14 @@ Wdiv-by-zero
 C ObjC C++ ObjC++ Var(warn_div_by_zero) Init(1) Warning
 Warn about compile-time integer division by zero.
 
+Wdouble-promotion
+C ObjC C++ ObjC++ Var(warn_double_promotion) Warning
+Warn about implicit conversions from \"float\" to \"double\".
+
+Wduplicate-decl-specifier
+C ObjC Var(warn_duplicate_decl_specifier) Warning LangEnabledBy(C ObjC,Wall)
+Warn when a declaration has duplicate const, volatile, restrict or _Atomic 
specifier.
+
 Wduplicated-branches
 C ObjC C++ ObjC++ Var(warn_duplicated_branches) Init(0) Warning
 Warn about duplicated branches in if-else statements.
@@ -662,6 +678,10 @@ Weffc++
 C++ ObjC++ Var(warn_ecpp) Warning
 Warn about violations of Effective C++ style rules.
 
+Welaborated-enum-base
+C++ ObjC++ Var(warn_elaborated_enum_base) Warning Init(1)
+Warn if an additional enum-base is used in an elaborated-type-specifier.
+
 Wempty-body
 C ObjC C++ ObjC++ Var(warn_empty_body) Warning EnabledBy(Wextra)
 Warn about an empty body in an if or else statement.
@@ -694,6 +714,10 @@ Wexceptions
 C++ ObjC++ Var(warn_exceptions) Init(1) Warning
 Warn when an exception handler is shadowed by another handler.
 
+Wexpansion-to-defined
+C ObjC C++ ObjC++ CPP(warn_expansion_to_defined) 
CppReason(CPP_W_EXPANSION_TO_DEFINED) Var(cpp_warn_expansion_to_defined) 
Init(0) Warning EnabledBy(Wextra || Wpedantic)
+Warn if \"defined\" is used outside #if.
+
 

[COMMITTED] Correct lists of options enabled by -Wall and -Wextra [PR90463]

2024-01-22 Thread Sandra Loosemore
gcc/ChangeLog
PR c++/90463
* doc/invoke.texi (Warning Options): Correct lists of options
enabled by -Wall and -Wextra by checking against common.opt
and c-family/c.opt.
---
 gcc/doc/invoke.texi | 79 -
 1 file changed, 50 insertions(+), 29 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 278c931b6a3..676e7ef03d1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6202,27 +6202,37 @@ Options} and @ref{Objective-C and Objective-C++ Dialect 
Options}.
 @option{-Wall} turns on the following warning flags:
 
 @gccoptlist{-Waddress
+-Waligned-new @r{(C++ and Objective-C++ only)}
 -Warray-bounds=1 @r{(only with} @option{-O2}@r{)}
 -Warray-compare
--Warray-parameter=2 @r{(C and Objective-C only)}
+-Warray-parameter=2
 -Wbool-compare
 -Wbool-operation
--Wc++11-compat  -Wc++14-compat
+-Wc++11-compat  -Wc++14-compat  -Wc++17-compat  -Wc++20-compat
 -Wcatch-value @r{(C++ and Objective-C++ only)}
 -Wchar-subscripts
+-Wclass-memaccess @r{(C++ and Objective-C++ only)}
 -Wcomment
+-Wdangling-else
 -Wdangling-pointer=2
+-Wdelete-non-virtual-dtor @r{(C++ and Objective-C++ only)}
 -Wduplicate-decl-specifier @r{(C and Objective-C only)}
 -Wenum-compare @r{(in C/ObjC; this is on by default in C++)}
 -Wenum-int-mismatch @r{(C and Objective-C only)}
--Wformat
--Wformat-overflow
--Wformat-truncation
--Wint-in-bool-context
+-Wformat=1
+-Wformat-contains-nul
+-Wformat-diag
+-Wformat-extra-args
+-Wformat-overflow=1
+-Wformat-truncation=1
+-Wformat-zero-length
+-Wframe-address
 -Wimplicit @r{(C and Objective-C only)}
--Wimplicit-int @r{(C and Objective-C only)}
 -Wimplicit-function-declaration @r{(C and Objective-C only)}
--Winit-self @r{(only for C++)}
+-Wimplicit-int @r{(C and Objective-C only)}
+-Winfinite-recursion
+-Winit-self @r{(C++ and Objective-C++ only)}
+-Wint-in-bool-context
 -Wlogical-not-parentheses
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)}
 -Wmaybe-uninitialized
@@ -6230,24 +6240,26 @@ Options} and @ref{Objective-C and Objective-C++ Dialect 
Options}.
 -Wmemset-transposed-args
 -Wmisleading-indentation @r{(only for C/C++)}
 -Wmismatched-dealloc
--Wmismatched-new-delete @r{(only for C/C++)}
+-Wmismatched-new-delete @r{(C++ and Objective-C++ only)}
 -Wmissing-attributes
 -Wmissing-braces @r{(only for C/ObjC)}
 -Wmultistatement-macros
--Wnarrowing @r{(only for C++)}
+-Wnarrowing  @r{(C++ and Objective-C++ only)}
 -Wnonnull
 -Wnonnull-compare
--Wopenmp-simd
+-Wopenmp-simd @r{(C and C++ only)}
+-Woverloaded-virtual=1 @r{(C++ and Objective-C++ only)}
+-Wpacked-not-aligned
 -Wparentheses
--Wpessimizing-move @r{(only for C++)}
--Wpointer-sign
--Wrange-loop-construct @r{(only for C++)}
--Wreorder
+-Wpessimizing-move @r{(C++ and Objective-C++ only)}
+-Wpointer-sign @r{(only for C/ObjC)}
+-Wrange-loop-construct @r{(C++ and Objective-C++ only)}
+-Wreorder @r{(C++ and Objective-C++ only)}
 -Wrestrict
 -Wreturn-type
--Wself-move @r{(only for C++)}
+-Wself-move @r{(C++ and Objective-C++ only)}
 -Wsequence-point
--Wsign-compare @r{(only in C++)}
+-Wsign-compare @r{(C++ and Objective-C++ only)}
 -Wsizeof-array-div
 -Wsizeof-pointer-div
 -Wsizeof-pointer-memaccess
@@ -6258,12 +6270,16 @@ Options} and @ref{Objective-C and Objective-C++ Dialect 
Options}.
 -Wtrigraphs
 -Wuninitialized
 -Wunknown-pragmas
+-Wunused
+-Wunused-but-set-variable
+-Wunused-const-variable=1 @r{(only for C/ObjC)}
 -Wunused-function
 -Wunused-label
+-Wunused-local-typedefs
 -Wunused-value
 -Wunused-variable
 -Wuse-after-free=2
--Wvla-parameter @r{(C and Objective-C only)}
+-Wvla-parameter
 -Wvolatile-register-var
 -Wzero-length-bounds}
 
@@ -6283,27 +6299,32 @@ This enables some extra warning flags that are not 
enabled by
 @option{-Wall}. (This option used to be called @option{-W}.  The older
 name is still supported, but the newer name is more descriptive.)
 
-@gccoptlist{-Wclobbered
+@gccoptlist{-Wabsolute-value @r{(only for C/ObjC)}
+-Walloc-size
+-Wcalloc-transposed-args
 -Wcast-function-type
--Wdeprecated-copy @r{(C++ only)}
+-Wclobbered
+-Wdeprecated-copy @r{(C++ and Objective-C++ only)}
 -Wempty-body
--Wenum-conversion @r{(C only)}
--Wignored-qualifiers
+-Wenum-conversion @r{(only for C/ObjC)}
+-Wexpansion-to-defined
+-Wignored-qualifiers  @r{(only for C/C++)}
 -Wimplicit-fallthrough=3
+-Wmaybe-uninitialized
 -Wmissing-field-initializers
--Wmissing-parameter-type @r{(C only)}
--Wold-style-declaration @r{(C only)}
--Woverride-init
--Wsign-compare @r{(C only)}
+-Wmissing-parameter-type @r{(C/ObjC only)}
+-Wold-style-declaration @r{(C/ObjC only)}
+-Woverride-init @r{(C/ObjC only)}
+-Wredundant-move @r{(C++ and Objective-C++ only)}
+-Wshift-negative-value @r{(in C++11 to C++17 and in C99 and newer)}
+-Wsign-compare @r{(C++ and Objective-C++ only)}
+-Wsized-deallocation @r{(C++ and Objective-C++ only)}
 -Wstring-compare
--Wredundant-move @r{(only for C++)}
 -Wtype-limits
 -Wuninitialized
--Wshift-negative-value @r{(in C++11 to 

[PATCH] Revert "Pass GUILE down to subdirectories"

2024-01-22 Thread Tom Tromey
This reverts commit b7e5a29602143b53267efcd9c8d5ecc78cd5a62f.

This patch caused problems for some users when building gdb, because
it would cause 'guild' to be invoked with the wrong version of guile.
On the whole it seems simpler to just back this out.

* Makefile.in: Rebuild.
* Makefile.tpl (BASE_EXPORTS): Remove GUILE.
(GUILE): Remove.
* Makefile.def (flags_to_pass): Remove GUILE.
---
 Makefile.def | 1 -
 Makefile.in  | 8 ++--
 Makefile.tpl | 7 ++-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..c8c80af3657 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -312,7 +312,6 @@ flags_to_pass = { flag= GNATBIND ; };
 flags_to_pass = { flag= GNATMAKE ; };
 flags_to_pass = { flag= GDC ; };
 flags_to_pass = { flag= GDCFLAGS ; };
-flags_to_pass = { flag= GUILE ; };
 
 // Target tools
 flags_to_pass = { flag= AR_FOR_TARGET ; };
diff --git a/Makefile.in b/Makefile.in
index edb0c8a9a42..245dd610b53 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -3,7 +3,7 @@
 #
 # Makefile for directory with subdirs to build.
 #   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 
2011, 2023
+#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 
2011
 #   Free Software Foundation
 #
 # This file is free software; you can redistribute it and/or modify
@@ -143,8 +143,7 @@ BASE_EXPORTS = \
M4="$(M4)"; export M4; \
SED="$(SED)"; export SED; \
AWK="$(AWK)"; export AWK; \
-   MAKEINFO="$(MAKEINFO)"; export MAKEINFO; \
-   GUILE="$(GUILE)"; export GUILE;
+   MAKEINFO="$(MAKEINFO)"; export MAKEINFO;
 
 # This is the list of variables to export in the environment when
 # configuring subdirectories for the build system.
@@ -452,8 +451,6 @@ GM2FLAGS = $(CFLAGS)
 
 PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
 
-GUILE = guile
-
 # Pass additional PGO and LTO compiler options to the PGO build.
 BUILD_CFLAGS = $(PGO_BUILD_CFLAGS) $(PGO_BUILD_LTO_CFLAGS)
 override CFLAGS += $(BUILD_CFLAGS)
@@ -886,7 +883,6 @@ BASE_FLAGS_TO_PASS = \
"GNATMAKE=$(GNATMAKE)" \
"GDC=$(GDC)" \
"GDCFLAGS=$(GDCFLAGS)" \
-   "GUILE=$(GUILE)" \
"AR_FOR_TARGET=$(AR_FOR_TARGET)" \
"AS_FOR_TARGET=$(AS_FOR_TARGET)" \
"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
diff --git a/Makefile.tpl b/Makefile.tpl
index adbcbdd1d57..6e22adecd2f 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -6,7 +6,7 @@ in
 #
 # Makefile for directory with subdirs to build.
 #   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 
2011, 2023
+#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 
2011
 #   Free Software Foundation
 #
 # This file is free software; you can redistribute it and/or modify
@@ -146,8 +146,7 @@ BASE_EXPORTS = \
M4="$(M4)"; export M4; \
SED="$(SED)"; export SED; \
AWK="$(AWK)"; export AWK; \
-   MAKEINFO="$(MAKEINFO)"; export MAKEINFO; \
-   GUILE="$(GUILE)"; export GUILE;
+   MAKEINFO="$(MAKEINFO)"; export MAKEINFO;
 
 # This is the list of variables to export in the environment when
 # configuring subdirectories for the build system.
@@ -455,8 +454,6 @@ GM2FLAGS = $(CFLAGS)
 
 PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
 
-GUILE = guile
-
 # Pass additional PGO and LTO compiler options to the PGO build.
 BUILD_CFLAGS = $(PGO_BUILD_CFLAGS) $(PGO_BUILD_LTO_CFLAGS)
 override CFLAGS += $(BUILD_CFLAGS)
-- 
2.43.0



Re: [PATCH] Pass GUILE down to subdirectories

2024-01-22 Thread Tom Tromey
Eric> I mean, I've been trying to figure out how to re-run cgen myself, to
Eric> regenerate some cgen-erated files in libopcodes to fix some compiler
Eric> warnings in them, but it's pretty hard to do so; I'd really appreciate
Eric> it if the whole process of regenerating files with cgen could be made
Eric> easy and well-documented and understandable...

Yeah.  Unfortunately cgen seems fairly unmaintained and guile still
seems a bit clunky to use somehow (or maybe it's packaged strangely, I
don't know).

After some patches of mine to let cgen use the guile compiler, it
requires guile 3.0 (at least for me, guile 2 dies while trying to run
it).  So what I do is:

* cd binutils-gdb
* ln -s /path/to/cgen
* configure a new directory with --enable-cgen-maint
* make GUILE=guile3.0

Tom


[PATCH] c++: avoid -Wdangling-reference for std::span-like classes [PR110358]

2024-01-22 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Real-world experience shows that -Wdangling-reference triggers for
user-defined std::span-like classes a lot.  We can easily avoid that
by considering classes like

template<typename T>
struct Span {
  T* data_;
  std::size_t len_;
};

to be std::span-like, and not warning for them.
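
To make the motivation concrete, here is a small sketch of why a reference obtained through a temporary of such a class need not dangle. The names get and sum_ends and the static storage are hypothetical and chosen only for illustration; they are not taken from the new tests.

```cpp
#include <cstddef>

// A std::span-like class: a class template with a pointer member followed
// by an integral member and nothing else.  It does not own the elements.
template<typename T>
struct Span {
  T* data_;
  std::size_t len_;

  T& front() const { return data_[0]; }
  T& back() const { return data_[len_ - 1]; }
};

// Hypothetical source of Spans; the storage outlives every Span returned.
inline Span<int> get()
{
  static int storage[3] = {1, 2, 3};
  return Span<int>{storage, 3};
}

inline int sum_ends()
{
  // The temporary Span dies at the end of each full expression, but the
  // references point into `storage`, which is still alive, so nothing
  // dangles here.
  const int& a = get().front();
  const int& b = get().back();
  return a + b;
}
```

Here the references are bound through a temporary Span, which is the pattern that -Wdangling-reference has been flagging, even though the referenced ints live in storage that outlasts the temporary.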

PR c++/110358
PR c++/109640

gcc/cp/ChangeLog:

* call.cc (span_like_class_p): New.
(do_warn_dangling_reference): Use it.

gcc/ChangeLog:

* doc/invoke.texi: Update -Wdangling-reference documentation.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference18.C: New test.
* g++.dg/warn/Wdangling-reference19.C: New test.
* g++.dg/warn/Wdangling-reference20.C: New test.
---
 gcc/cp/call.cc| 38 -
 gcc/doc/invoke.texi   | 15 +++
 .../g++.dg/warn/Wdangling-reference18.C   | 24 +++
 .../g++.dg/warn/Wdangling-reference19.C   | 25 +++
 .../g++.dg/warn/Wdangling-reference20.C   | 42 +++
 5 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference19.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference20.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 77f51bacce3..d6bdb3cc9bd 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14082,6 +14082,40 @@ reference_like_class_p (tree ctype)
   return false;
 }
 
+/* Return true if class TYPE looks like std::span: it's a class template
+   and has a T* member followed by a field of integral type.  For example,
+
+template<typename T>
+struct Span {
+  T* data_;
+  std::size len_;
+};
+
+   is considered std::span-like.  */
+
+static bool
+span_like_class_p (tree type)
+{
+  if (!NON_UNION_CLASS_TYPE_P (type)
+  || !CLASSTYPE_TEMPLATE_INSTANTIATION (type))
+return false;
+
+  tree args = CLASSTYPE_TI_ARGS (type);
+  if (TREE_VEC_LENGTH (args) != 1)
+return false;
+
+  tree f = next_aggregate_field (TYPE_FIELDS (type));
+  if (f && TYPE_PTR_P (TREE_TYPE (f)))
+{
+  f = next_aggregate_field (DECL_CHAIN (f));
+  if (f && INTEGRAL_TYPE_P (TREE_TYPE (f))
+ && !next_aggregate_field (DECL_CHAIN (f)))
+   return true;
+}
+
+  return false;
+}
+
 /* Helper for maybe_warn_dangling_reference to find a problematic CALL_EXPR
that initializes the LHS (and at least one of its arguments represents
a temporary, as outlined in maybe_warn_dangling_reference), or NULL_TREE
@@ -14126,7 +14160,9 @@ do_warn_dangling_reference (tree expr, bool arg_p)
   tree type = TREE_TYPE (e);
   /* If the temporary represents a lambda, we don't really know
 what's going on here.  */
-  if (!reference_like_class_p (type) && !LAMBDA_TYPE_P (type))
+  if (!reference_like_class_p (type)
+ && !LAMBDA_TYPE_P (type)
+ && !span_like_class_p (type))
return expr;
 }
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 278c931b6a3..509779c8fd8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3914,6 +3914,21 @@ where @code{std::minmax} returns @code{std::pair}, and
 both references dangle after the end of the full expression that contains
 the call to @code{std::minmax}.
 
+The warning does not warn for @code{std::span}-like classes.  We consider
+classes of the form:
+
+@smallexample
+template<typename T>
+struct Span @{
+  T* data_;
+  std::size len_;
+@};
+@end smallexample
+
+as @code{std::span}-like; that is, the class is a class template that
+has a pointer data member followed by an integral data member, and does
+not have any other data members.
+
 This warning is enabled by @option{-Wall}.
 
 @opindex Wdelete-non-virtual-dtor
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference18.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
new file mode 100644
index 000..e088c177769
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
@@ -0,0 +1,24 @@
+// PR c++/110358
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wdangling-reference" }
+// Don't warn for std::span-like classes.
+
+template <typename T>
+struct Span {
+T* data_;
+int len_;
+
+[[nodiscard]] constexpr auto operator[](int n) const noexcept -> T& { 
return data_[n]; }
+[[nodiscard]] constexpr auto front() const noexcept -> T& { return 
data_[0]; }
+[[nodiscard]] constexpr auto back() const noexcept -> T& { return 
data_[len_ - 1]; }
+};
+
+auto get() -> Span<int>;
+
+auto f() -> int {
+int const& a = get().front(); // { dg-bogus "dangling reference" }
+int const& b = get().back();  // { dg-bogus "dangling reference" }
+int const& c = get()[0];  // { dg-bogus "dangling reference" }
+
+return a + b + c;
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference19.C 

Re: [PATCH 3/3] aarch64: Fix up debug uses in ldp/stp pass [PR113089]

2024-01-22 Thread Alex Coplan
On 22/01/2024 17:09, Richard Sandiford wrote:
> Sorry for the earlier review comment about debug insns.  I hadn't
> looked far enough into the queue to see this patch.
> 
> Alex Coplan  writes:
> > As the PR shows, we were missing code to update debug uses in the
> > load/store pair fusion pass.  This patch fixes that.
> >
> > Note that this patch depends on the following patch to create new uses
> > in RTL-SSA, submitted as part of the fixes for PR113070:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642919.html
> >
> > The patch tries to give a complete treatment of the debug uses that will
> > be affected by the changes we make, and in particular makes an effort to
> > preserve debug info where possible, e.g. when re-ordering an update of
> > a base register by a constant over a debug use of that register.  When
> > re-ordering loads over a debug use of a transfer register, we reset the
> > debug insn.  Likewise when re-ordering stores over debug uses of mem.
> >
> > While doing this I noticed that try_promote_writeback used a strange
> > choice of move_range for the pair insn, in that it chose the previous
> > nondebug insn instead of the insn itself.  Since the insn is being
> > changed, these move ranges are equivalent (at least in terms of nondebug
> > insn placement as far as RTL-SSA is concerned), but I think it is more
> > natural to choose the pair insn itself.  This is needed to avoid
> > incorrectly updating some debug uses.
> >
> > Notes on testing:
> >  - The series was bootstrapped/regtested on top of the fixes for
> >PR113070 and PR113356.  It seemed to make more sense to test with
> >correct use/def info, and as mentioned above, this patch depends on
> >one of the PR113070 patches.
> >  - I also ran the testsuite with -g -funroll-loops -mearly-ldp-fusion
> >-mlate-ldp-fusion to try and flush out more issues, and worked
> >through some examples where writeback updates were triggered to
> >make sure it was doing the right thing.
> >  - The patches also survived an LTO+PGO bootstrap with
> >--enable-languages=all (with the passes enabled).
> >
> > Bootstrapped/regtested as a series on aarch64-linux-gnu (with/without
> > the pass enabled).  OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > PR target/113089
> > * config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New.
> > (fixup_debug_use): New.
> > (fixup_debug_uses_trailing_add): New.
> > (fixup_debug_uses): New. Use it ...
> > (ldp_bb_info::fuse_pair): ... here.
> > (try_promote_writeback): Call fixup_debug_uses_trailing_add to
> > fix up debug uses of the base register that are affected by
> > folding in the trailing add insn.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/113089
> > * gcc.c-torture/compile/pr113089.c: New test.
> > ---
> >  gcc/config/aarch64/aarch64-ldp-fusion.cc  | 332 +-
> >  .../gcc.c-torture/compile/pr113089.c  |  26 ++
> >  2 files changed, 351 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113089.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > index 4d7fd72c6b1..fd0278e7acf 100644
> > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > @@ -1342,6 +1342,309 @@ ldp_bb_info::track_tombstone (int uid)
> >  gcc_unreachable (); // Bit should have changed.
> >  }
> >  
> > +// Reset the debug insn containing USE (the debug insn has been
> > +// optimized away).
> > +static void
> > +reset_debug_use (use_info *use)
> > +{
> > +  auto use_insn = use->insn ();
> > +  auto use_rtl = use_insn->rtl ();
> > +  insn_change change (use_insn);
> > +  change.new_uses = {};
> > +  INSN_VAR_LOCATION_LOC (use_rtl) = gen_rtx_UNKNOWN_VAR_LOC ();
> > +  crtl->ssa->change_insn (change);
> > +}
> > +
> > +// USE is a debug use that needs updating because DEF (a def of the same
> > +// register) is being re-ordered over it.  If BASE is non-null, then DEF
> > +// is an update of the register BASE by a constant, given by WB_OFFSET,
> > +// and we can preserve debug info by accounting for the change in side
> > +// effects.
> > +static void
> > +fixup_debug_use (obstack_watermark ,
> > +use_info *use,
> > +def_info *def,
> > +rtx base,
> > +poly_int64 wb_offset)
> > +{
> > +  auto use_insn = use->insn ();
> > +  if (base)
> > +{
> > +  auto use_rtl = use_insn->rtl ();
> > +  insn_change change (use_insn);
> > +
> > +  gcc_checking_assert (REG_P (base) && use->regno () == REGNO (base));
> > +  change.new_uses = check_remove_regno_access (attempt,
> > +  change.new_uses,
> > +  use->regno ());
> > +
> > +  // The effect of the writeback is to add WB_OFFSET to BASE.  If
> > +  // 

Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-22 Thread Alex Coplan
On 22/01/2024 15:59, Richard Sandiford wrote:
> Alex Coplan  writes:
> > As the PR shows (specifically #c7) we are missing updating uses of mem
> > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > patch fixes that.
> >
> > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > own alias analysis and so can re-order stores over other accesses when
> > we deem this is safe.  If neither store can be re-purposed (moved into
> > the required position to form the stp while respecting the RTL-SSA
> > constraints), then we turn both the candidate stores into "tombstone"
> > insns (logically delete them) and insert a new stp insn.
> >
> > As it stands, we implement the insert case separately (after dealing
> > with the candidate stores) in fuse_pair by inserting into the middle of
> > the vector of changes.  This is OK when we only have to insert one
> > change, but with this fix we would need to insert the change for the new
> > stp plus multiple changes to fix up uses of mem (note the number of
> > fix-ups is naturally bounded by the alias limit param to prevent
> > quadratic behaviour).  If we kept the code structured as is and inserted
> > into the middle of the vector, that would lead to repeated moving of
> > elements in the vector which seems inefficient.  The structure of the
> > code would also be a little unwieldy.
> >
> > To improve on that situation, this patch introduces a helper class,
> > stp_change_builder, which implements a state machine that helps to build
> > the required changes directly in program order.  That state machine is
> > responsible for deciding what changes need to be made in what order, and
> > the code in fuse_pair then simply follows those steps.
> >
> > Together with the fix in the previous patch for installing new defs
> > correctly in RTL-SSA, this fixes PR113070.
> >
> > We take the opportunity to rename the function decide_stp_strategy to
> > try_repurpose_store, as that seems more descriptive of what it actually
> > does, since stp_change_builder is now responsible for the overall change
> > strategy.
> >
> > Bootstrapped/regtested as a series with/without the passes enabled on
> > aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > PR target/113070
> > * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > (decide_stp_strategy): Rename to ...
> > (try_repurpose_store): ... this.
> > (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > construct stp changes.  Fix up uses when inserting new stp insns.
> > ---
> >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
> >  1 file changed, 194 insertions(+), 54 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > index 689a8c884bd..703cfb1228c 100644
> > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> >return range;
> >  }
> >  
> > +// Class that implements a state machine for building the changes needed to form
> > +// a store pair instruction.  This allows us to easily build the changes in
> > +// program order, as required by rtl-ssa.
> > +struct stp_change_builder
> > +{
> > +  enum class state
> > +  {
> > +FIRST,
> > +INSERT,
> > +FIXUP_USE,
> > +LAST,
> > +DONE
> > +  };
> > +
> > +  enum class action
> > +  {
> > +TOMBSTONE,
> > +CHANGE,
> > +INSERT,
> > +FIXUP_USE
> > +  };
> > +
> > +  struct change
> > +  {
> > +action type;
> > +insn_info *insn;
> > +  };
> > +
> > +  bool done () const { return m_state == state::DONE; }
> > +
> > +  stp_change_builder (insn_info *insns[2],
> > + insn_info *repurpose,
> > + insn_info *dest)
> > +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> 
> Just to make sure I understand: is it the case that
> 
>   *insns[0] <= *dest <= *insns[1]
> 
> ?

Yes, that is my understanding.  I thought about asserting it somewhere in
stp_change_builder, but it seemed a bit gratuitous.
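
For readers following along, here is a minimal, self-contained sketch of the
idea: plan the steps in program order and assert the ordering invariant up
front.  It is not the code from the patch.  Insn, Action, Step and
plan_stp_changes are hypothetical names, program order is modelled by a plain
integer, and the FIXUP_USE steps are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an instruction; `pos` models program order.
struct Insn { int pos; };

enum class Action { Tombstone, Change, Insert };

struct Step { Action action; const Insn *insn; };

// Plan the steps for forming a store pair, in program order.  `repurpose`
// is the candidate store that can become the pair (or nullptr if neither
// can); `dest` marks where a new pair is inserted and is only used when
// nothing is repurposed.
std::vector<Step>
plan_stp_changes (const Insn *first, const Insn *second,
                  const Insn *repurpose, const Insn *dest)
{
  // The invariant discussed above: first <= dest <= second.
  assert (!dest || (first->pos <= dest->pos && dest->pos <= second->pos));

  std::vector<Step> steps;
  steps.push_back ({ first == repurpose ? Action::Change : Action::Tombstone,
                     first });
  if (!repurpose)
    steps.push_back ({ Action::Insert, dest });
  steps.push_back ({ second == repurpose ? Action::Change : Action::Tombstone,
                     second });
  return steps;
}
```

Because the steps come out already sorted by position, a caller can append
them to its list of changes one by one instead of inserting into the middle.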

> 
> > +
> > +  change get_change () const
> > +  {
> > +switch (m_state)
> > +  {
> > +  case state::FIRST:
> > +   return {
> > + m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > + m_insns[0]
> > +   };
> > +  case state::LAST:
> > +   return {
> > + m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > + m_insns[1]
> > +   };
> > +  case state::INSERT:
> > +   return { action::INSERT, m_dest };
> > +  case state::FIXUP_USE:
> > +   return { action::FIXUP_USE, m_use->insn () };
> > +  case state::DONE:
> > +   break;
> > +  }
> > +
> > +

Re: [PATCH 3/4] rtl-ssa: Ensure new defs get inserted [PR113070]

2024-01-22 Thread Alex Coplan
On 22/01/2024 13:49, Richard Sandiford wrote:
> Alex Coplan  writes:
> > In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to
> > RTL-SSA for inserting new insns, which included support for users
> > creating new defs.
> >
> > However, I missed that apply_changes_to_insn needed updating to ensure
> > that the new defs actually got inserted into the main def chain.  This
> > meant that when the aarch64 ldp/stp pass inserted a new stp insn, the
> > stp would just get skipped over during subsequent alias analysis, as its
> > def never got inserted into the memory def chain.  This (unsurprisingly)
> > led to wrong code.
> >
> > This patch fixes the issue by ensuring new user-created defs get
> > inserted.  I would have preferred to have used a flag internal to the
> > defs instead of a separate data structure to keep track of them, but since
> > machine_mode increased to 16 bits we're already at 64 bits in access_info,
> > and we can't really reuse m_is_temp as the logic in finalize_new_accesses
> > requires it to get cleared.
> >
> > Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > PR target/113070
> > * rtl-ssa.h: Include hash-set.h.
> > * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add
> > new_sets parameter and use it to keep track of new user-created sets.
> > (function_info::apply_changes_to_insn): Also call add_def on new sets.
> > (function_info::change_insns): Add hash_set to keep track of new
> > user-created defs.  Plumb it through.
> > * rtl-ssa/functions.h: Add hash_set parameter to finalize_new_accesses and
> > apply_changes_to_insn.
> > ---
> >  gcc/rtl-ssa.h   |  1 +
> >  gcc/rtl-ssa/changes.cc  | 28 +---
> >  gcc/rtl-ssa/functions.h |  6 --
> >  3 files changed, 26 insertions(+), 9 deletions(-)
> >
> > diff --git a/gcc/rtl-ssa.h b/gcc/rtl-ssa.h
> > index f0cf656f5ac..17337639ae8 100644
> > --- a/gcc/rtl-ssa.h
> > +++ b/gcc/rtl-ssa.h
> > @@ -50,6 +50,7 @@
> >  #include "mux-utils.h"
> >  #include "rtlanal.h"
> >  #include "cfgbuild.h"
> > +#include "hash-set.h"
> >  
> >  // Provides the global crtl->ssa.
> >  #include "memmodel.h"
> > diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> > index ce51d6ccd8d..6119ec3535b 100644
> > --- a/gcc/rtl-ssa/changes.cc
> > +++ b/gcc/rtl-ssa/changes.cc
> > @@ -429,7 +429,8 @@ update_insn_in_place (insn_change )
> >  // POS gives the final position of INSN, which hasn't yet been moved into
> >  // place.
> 
> The new parameter should be documented.  How about:
> 
>   // place.  NEW_SETS contains the new set_infos that are being added as part
>   // of this change (as opposed to being moved or repurposed from existing
>   // instructions).

That comment looks appropriate for apply_changes_to_insn, where NEW_SETS has
already been populated, but doesn't seem accurate for finalize_new_accesses.
How about:

  // place.  Keep track of any newly-created set_infos being added as
  // part of this change by adding them to NEW_SETS.

for finalize_new_accesses?  OK with that change (and using your suggestion for
apply_changes_to_insn)?

Thanks,
Alex
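
To make the two roles of NEW_SETS concrete, here is a small sketch under
simplifying assumptions.  It is not the RTL-SSA code: Def, Function,
record_new_def and link_new_defs are hypothetical names, and
std::unordered_set stands in for GCC's hash_set.  The first phase fills the
set, the second phase only reads it.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a definition (set_info in the real code).
struct Def { int resource; };

struct Function
{
  std::vector<Def *> def_chain;  // models the main def chain

  // Phase 1 (cf. finalize_new_accesses): remember each newly created def.
  // Seeing the same def twice would indicate a bug.
  void record_new_def (Def *perm_def, std::unordered_set<Def *> &new_sets)
  {
    bool inserted = new_sets.insert (perm_def).second;
    assert (inserted && "duplicate new def");
    (void) inserted;
  }

  // Phase 2 (cf. apply_changes_to_insn): NEW_SETS is already populated;
  // link only the defs it contains into the chain.
  void link_new_defs (const std::vector<Def *> &defs,
                      const std::unordered_set<Def *> &new_sets)
  {
    for (Def *def : defs)
      if (new_sets.count (def))
        def_chain.push_back (def);
  }
};
```

Note that std::unordered_set::insert reports whether the element was newly
inserted, which is the opposite polarity to the new_sets.add check in the
patch.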

> 
> 
> >  void
> > -function_info::finalize_new_accesses (insn_change , insn_info *pos)
> > +function_info::finalize_new_accesses (insn_change , insn_info *pos,
> > + hash_set _sets)
> >  {
> >insn_info *insn = change.insn ();
> >  
> > @@ -465,6 +466,12 @@ function_info::finalize_new_accesses (insn_change 
> > , insn_info *pos)
> > // later in case we see a second write to the same resource.
> > def_info *perm_def = allocate (change.insn (),
> >  def->resource ());
> > +
> > +   // Keep track of the new set so we remember to add it to the
> > +   // def chain later.
> > +   if (new_sets.add (perm_def))
> > + gcc_unreachable (); // We shouldn't see duplicates here.
> > +
> > def->set_last_def (perm_def);
> > def = perm_def;
> >   }
> > @@ -647,7 +654,8 @@ function_info::finalize_new_accesses (insn_change 
> > , insn_info *pos)
> >  // Copy information from CHANGE to its underlying insn_info, given that
> >  // the insn_info has already been placed appropriately.
> 
> Similarly here.
> 
> OK with those changes, thanks.
> 
> Richard
> 
> >  void
> > -function_info::apply_changes_to_insn (insn_change )
> > +function_info::apply_changes_to_insn (insn_change ,
> > + hash_set _sets)
> >  {
> >insn_info *insn = change.insn ();
> >if (change.is_deletion ())
> > @@ -659,10 +667,11 @@ function_info::apply_changes_to_insn (insn_change 
> > )
> >// Copy the cost.
> >insn->set_cost (change.new_cost);
> >  
> > -  // Add all clobbers.  Sets and call clobbers never move relative to
> > -  // other definitions, so 

Re: [PATCH 2/4] rtl-ssa: Support for creating new uses [PR113070]

2024-01-22 Thread Alex Coplan
On 22/01/2024 13:45, Richard Sandiford wrote:
> Alex Coplan  writes:
> > This exposes an interface for users to create new uses in RTL-SSA.
> > This is needed for updating uses after inserting a new store pair insn
> > in the aarch64 load/store pair fusion pass.
> >
> > gcc/ChangeLog:
> >
> > PR target/113070
> > * rtl-ssa/accesses.cc (function_info::create_use): New.
> > * rtl-ssa/changes.cc (function_info::finalize_new_accesses):
> > Handle temporary uses, ensure new uses end up referring to
> > permanent defs.
> > * rtl-ssa/functions.h (function_info::create_use): Declare.
> > ---
> >  gcc/rtl-ssa/accesses.cc | 10 ++
> >  gcc/rtl-ssa/changes.cc  | 24 +++-
> >  gcc/rtl-ssa/functions.h |  5 +
> >  3 files changed, 34 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
> > index ce4a8b8dc00..3f1304fc5bf 100644
> > --- a/gcc/rtl-ssa/accesses.cc
> > +++ b/gcc/rtl-ssa/accesses.cc
> > @@ -1466,6 +1466,16 @@ function_info::create_set (obstack_watermark 
> > ,
> >return set;
> >  }
> >  
> > +use_info *
> > +function_info::create_use (obstack_watermark ,
> > +  insn_info *insn,
> > +  set_info *set)
> > +{
> > +  auto use = change_alloc (watermark, insn, set->resource (), 
> > set);
> > +  use->m_is_temp = true;
> > +  return use;
> > +}
> > +
> >  // Return true if ACCESS1 can represent ACCESS2 and if ACCESS2 can
> >  // represent ACCESS1.
> >  static bool
> > diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> > index e538b637848..ce51d6ccd8d 100644
> > --- a/gcc/rtl-ssa/changes.cc
> > +++ b/gcc/rtl-ssa/changes.cc
> > @@ -538,7 +538,9 @@ function_info::finalize_new_accesses (insn_change 
> > , insn_info *pos)
> >unsigned int i = 0;
> >for (use_info *use : change.new_uses)
> >  {
> > -  if (!use->m_has_been_superceded)
> > +  if (use->m_is_temp)
> > +   use->m_has_been_superceded = true;
> > +  else if (!use->m_has_been_superceded)
> > {
> 
> Is this part necessary for correctness, or is it just a compile-time
> optimisation?  We already have temporary uses via make_uses_available,
> and in principle, it's possible to reuse the uses for multiple changes
> within the same group.  E.g. when replacing A with B in multiple
> instructions, it's OK for the associated insn changes to refer to
> A's uses directly, or to uses created for A by make_uses_available.
> 
> So IMO it'd be better to drop this hunk if we can.

Yeah, I agree it's just a compile-time optimisation and shouldn't be
needed for correctness.  I initially thought it might save on memory,
but IIUC the memory allocated with allocate_temp will get freed when we
return from finalize_new_accesses anyway.

So I'll drop that hunk and re-test the series, thanks.

Alex

> 
> >   use = allocate_temp (insn, use->resource (), use->def ());
> >   use->m_has_been_superceded = true;
> > @@ -609,15 +611,27 @@ function_info::finalize_new_accesses (insn_change 
> > , insn_info *pos)
> >   m_temp_uses[i] = use = allocate (*use);
> >   use->m_is_temp = false;
> >   set_info *def = use->def ();
> > - // Handle cases in which the value was previously not used
> > - // within the block.
> > - if (def && def->m_is_temp)
> > + if (!def || !def->m_is_temp)
> > +   continue;
> > +
> > + if (auto phi = dyn_cast (def))
> > {
> > - phi_info *phi = as_a (def);
> > + // Handle cases in which the value was previously not used
> > + // within the block.
> >   gcc_assert (phi->is_degenerate ());
> >   phi = create_degenerate_phi (phi->ebb (), phi->input_value (0));
> >   use->set_def (phi);
> > }
> > + else
> > +   {
> > + // The temporary def may also be a set added with this change, in
> > + // which case the permanent set is stored in the last_def link,
> > + // and we need to update the use to refer to the permanent set.
> > + gcc_assert (is_a (def));
> > + auto perm_set = as_a (def->last_def ());
> > + gcc_assert (!perm_set->is_temporary ());
> > + use->set_def (perm_set);
> > +   }
> > }
> >  }
> >  
> > diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
> > index 58d0b50ea83..962180e27d6 100644
> > --- a/gcc/rtl-ssa/functions.h
> > +++ b/gcc/rtl-ssa/functions.h
> > @@ -73,6 +73,11 @@ public:
> > insn_info *insn,
> > resource_info resource);
> >  
> > +  // Create a temporary use.
> 
> How about something like:
> 
>   // Create a temporary use of SET as part of a change to INSN.
>   // SET can be a pre-existing definition or one that is being created
>   // as part of the same change group.
> 
> (Feel free to tweak the wording.)
> 
> OK with those changes, thanks.
> 
> Richard
> 
> > +  use_info *create_use (obstack_watermark ,
> > +   insn_info *insn,
> > 

[pushed] c++: extend Wdangling-reference17.C [PR109642]

2024-01-22 Thread Marek Polacek
Tested x86_64-pc-linux-gnu, applying to trunk.

-- >8 --
This patch extends g++.dg/warn/Wdangling-reference17.C with code
from PR109642.  I'm not creating a new test because this one
already #includes the required headers.

PR c++/109642

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference17.C: Additional testing.
---
 gcc/testsuite/g++.dg/warn/Wdangling-reference17.C | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference17.C b/gcc/testsuite/g++.dg/warn/Wdangling-reference17.C
index 223698422c2..8cda5dea444 100644
--- a/gcc/testsuite/g++.dg/warn/Wdangling-reference17.C
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference17.C
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 int main()
 {
@@ -12,4 +13,15 @@ int main()
 {
   (void) i;
 }
+
+  // From c++/109642.
+  const auto vec = std::vector{ 1, 2, 3 };
+  const auto s = std::span{vec};
+
+  for ([[maybe_unused]] auto _ : s | std::views::take(2)) { }
+
+  for ([[maybe_unused]] auto _ : vec | std::views::take(2)) { }
+
+  const auto s_view = s | std::views::take(2);
+  for ([[maybe_unused]] auto _ : s_view) { }
 }

base-commit: bc77c035c45bb224790b1c03d06a64c8a1cc51c5
-- 
2.43.0



[PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive

2024-01-22 Thread Kwok Cheung Yeung

Hi

This patch adds support for the indirect clause on the OpenMP 'declare 
target' directive in Fortran. As with the C and C++ front-ends, this 
applies the 'omp declare target indirect' attribute on affected function 
declarations. The C test cases have also been translated to Fortran 
where appropriate.


Okay for mainline?

Thanks

Kwok

From 545bdb2c8ab9a43e79c7a3a2992bd9edc7d08a6f Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Thu, 11 Jan 2024 19:52:53 +
Subject: [PATCH 2/2] openmp, fortran: Add Fortran support for indirect clause
 on the declare target directive

2024-01-19  Kwok Cheung Yeung  

gcc/fortran/
* dump-parse-tree.cc (show_attr): Handle omp_declare_target_indirect
attribute.
* f95-lang.cc (gfc_gnu_attributes): Add entry for 'omp declare
target indirect'.
* gfortran.h (symbol_attribute): Add omp_declare_target_indirect
field.
(struct gfc_omp_clauses): Add indirect field.
* openmp.cc (omp_mask2): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_clauses): Match indirect clause.
(OMP_DECLARE_TARGET_CLAUSES): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_declare_target): Check omp_device_type and apply
omp_declare_target_indirect attribute to symbol if indirect clause
active.
* trans-decl.cc (add_attributes_to_decl): Add 'omp declare target
indirect' attribute if symbol has indirect attribute set.

gcc/testsuite/
* gfortran.dg/gomp/declare-target-indirect-1.f90: New.
* gfortran.dg/gomp/declare-target-indirect-2.f90: New.

libgomp/
* testsuite/libgomp.fortran/declare-target-indirect-1.f90: New.
* testsuite/libgomp.fortran/declare-target-indirect-2.f90: New.
---
 gcc/fortran/dump-parse-tree.cc|  2 +
 gcc/fortran/f95-lang.cc   |  2 +
 gcc/fortran/gfortran.h|  3 +-
 gcc/fortran/openmp.cc | 45 +-
 gcc/fortran/trans-decl.cc |  4 ++
 .../gomp/declare-target-indirect-1.f90| 58 +++
 .../gomp/declare-target-indirect-2.f90| 25 
 .../declare-target-indirect-1.f90 | 39 +
 .../declare-target-indirect-2.f90 | 53 +
 9 files changed, 229 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 1563b810b98..7b154eb3ca7 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -914,6 +914,8 @@ show_attr (symbol_attribute *attr, const char * module)
 fputs (" OMP-DECLARE-TARGET", dumpfile);
   if (attr->omp_declare_target_link)
 fputs (" OMP-DECLARE-TARGET-LINK", dumpfile);
+  if (attr->omp_declare_target_indirect)
+fputs (" OMP-DECLARE-TARGET-INDIRECT", dumpfile);
   if (attr->elemental)
 fputs (" ELEMENTAL", dumpfile);
   if (attr->pure)
diff --git a/gcc/fortran/f95-lang.cc b/gcc/fortran/f95-lang.cc
index 358cb17fce2..67fda27aa3e 100644
--- a/gcc/fortran/f95-lang.cc
+++ b/gcc/fortran/f95-lang.cc
@@ -96,6 +96,8 @@ static const attribute_spec gfc_gnu_attributes[] =
 gfc_handle_omp_declare_target_attribute, NULL },
   { "omp declare target link", 0, 0, true,  false, false, false,
 gfc_handle_omp_declare_target_attribute, NULL },
+  { "omp declare target indirect", 0, 0, true,  false, false, false,
+gfc_handle_omp_declare_target_attribute, NULL },
   { "oacc function", 0, -1, true,  false, false, false,
 gfc_handle_omp_declare_target_attribute, NULL },
 };
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index fd73e4ce431..fd843a3241d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -999,6 +999,7 @@ typedef struct
   /* Mentioned in OMP DECLARE TARGET.  */
   unsigned omp_declare_target:1;
   unsigned omp_declare_target_link:1;
+  unsigned omp_declare_target_indirect:1;
   ENUM_BITFIELD (gfc_omp_device_type) omp_device_type:2;
   unsigned omp_allocate:1;
 
@@ -1584,7 +1585,7 @@ typedef struct gfc_omp_clauses
   unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
   unsigned non_rectangular:1, order_concurrent:1;
   unsigned contains_teams_construct:1, target_first_st_is_teams:1;
-  unsigned contained_in_target_construct:1;
+  unsigned contained_in_target_construct:1, indirect:1;
   ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
   ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
   ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..d1c5c323c54 100644
--- a/gcc/fortran/openmp.cc

[PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-01-22 Thread Kwok Cheung Yeung

Hi

There was a bug in the declare-target-indirect-2.c libgomp testcase 
(testing indirect calls in offloaded target regions, spread over 
multiple teams/threads) that due to an errant fallthrough in a switch 
statement resulted in only one indirect function ever getting called:


switch (i % 3)
  {
case 0: fn_ptr[i] =   // Missing break
case 1: fn_ptr[i] =   // Missing break
case 2: fn_ptr[i] = 
  }
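
The effect of the missing breaks can be reproduced in isolation.  The sketch
below is not the testcase itself: fn_a, fn_b, fn_c, pick_buggy and pick_fixed
are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical functions standing in for the testcase's indirect targets.
int fn_a () { return 0; }
int fn_b () { return 1; }
int fn_c () { return 2; }

using fn_t = int (*) ();

// Buggy variant: without break, every case runs on into the last
// assignment, so the result is always fn_c.
fn_t pick_buggy (int i)
{
  fn_t fn = nullptr;
  switch (i % 3)
    {
    case 0: fn = fn_a;  // falls through
    case 1: fn = fn_b;  // falls through
    case 2: fn = fn_c;
    }
  return fn;
}

// Fixed variant: each case selects its own function.
fn_t pick_fixed (int i)
{
  fn_t fn = nullptr;
  switch (i % 3)
    {
    case 0: fn = fn_a; break;
    case 1: fn = fn_b; break;
    case 2: fn = fn_c; break;
    }
  return fn;
}
```

With the buggy variant only one function is ever selected, so a test built
on it exercises a single lookup entry no matter how many threads run.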

However, when the missing break statements are added, the testcase fails 
with an invalid memory access. Upon investigation, this is due to the 
use of a splay-tree as the lookup structure for indirect addresses, as 
the splay-tree moves frequently accessed elements closer to the root 
node and so needs locking when used from multiple threads. However, this 
would end up partially serialising all the threads and kill performance. 
I have switched the lookup structure from a splay tree to a hashtab 
instead to avoid locking during lookup.


I have also tidied up the initialisation of the lookup table by calling 
it only from the first thread of the first team, instead of redundantly 
calling it from every thread and only having the first one reached do 
the initialisation. This removes the need for locking during initialisation.
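
As a rough illustration of why a hash table suits this access pattern, here
is a sketch that is not the libgomp implementation.  addr_map, build_map and
lookup are hypothetical names, std::unordered_map stands in for libgomp's
hashtab, and returning 0 for an unknown address is a choice of this sketch
only.  The table is built once up front and every later lookup goes through
a const reference, so a lookup cannot restructure the table the way a
splay-tree lookup does.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host -> target address table.
using addr_map = std::unordered_map<std::uintptr_t, std::uintptr_t>;

// Built once, by a single caller, before any lookups start.
addr_map
build_map (const std::vector<std::pair<std::uintptr_t, std::uintptr_t>> &pairs)
{
  return addr_map (pairs.begin (), pairs.end ());
}

// Read-only lookup: the const reference guarantees the table is not
// modified.  Returns 0 if HOST is unknown (an assumption of this sketch).
std::uintptr_t
lookup (const addr_map &map, std::uintptr_t host)
{
  auto it = map.find (host);
  return it == map.end () ? 0 : it->second;
}
```

Since lookups never write, concurrent readers need no lock as long as the
build step has completed before the first lookup.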


Tested with offloading to NVPTX and GCN with a x86_64 host. Okay for master?

Thanks

Kwok

From 721ec33bec2fddc7ee37e227358e36fec923f8da Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 17 Jan 2024 16:53:40 +
Subject: [PATCH 1/2] openmp: Change to using a hashtab to lookup offload
 target addresses for indirect function calls

A splay-tree was previously used to lookup equivalent target addresses
for a given host address on offload targets. However, as splay-trees can
modify their structure on lookup, they are not suitable for concurrent
access from separate teams/threads without some form of locking.  This
patch changes the lookup data structure to a hashtab instead, which does
not have these issues.

The call to build_indirect_map to initialize the data structure is now
called from just the first thread of the first team to avoid redundant
calls to this function.

2024-01-19  Kwok Cheung Yeung  

libgomp/
* config/accel/target-indirect.c: Include string.h and hashtab.h.
Remove include of splay-tree.h.
(splay_tree_prefix, splay_tree_c): Delete.
(struct indirect_map_t): New.
(hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
(GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier.
(USE_SPLAY_TREE_LOOKUP): Rename to...
(USE_HASHTAB_LOOKUP): ..this.
(indirect_map, indirect_array): Delete.
(indirect_htab): New.
(build_indirect_map): Remove locking.  Build indirect map using
hashtab.
(GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
address.
* config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
from first thread of first team only.
* config/nvptx/team.c (gomp_nvptx_main): Likewise.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
Add missing break statements.
---
 libgomp/config/accel/target-indirect.c| 75 +++
 libgomp/config/gcn/team.c |  7 +-
 libgomp/config/nvptx/team.c   |  9 ++-
 .../declare-target-indirect-2.c   | 14 ++--
 4 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
index c60fd547cb6..6dad85076d6 100644
--- a/libgomp/config/accel/target-indirect.c
+++ b/libgomp/config/accel/target-indirect.c
@@ -25,22 +25,43 @@
.  */
 
 #include 
+#include 
 #include "libgomp.h"
 
-#define splay_tree_prefix indirect
-#define splay_tree_c
-#include "splay-tree.h"
+struct indirect_map_t
+{
+  void *host_addr;
+  void *target_addr;
+};
+
+typedef struct indirect_map_t *hash_entry_type;
+
+static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
+static inline void htab_free (void *ptr) { free (ptr); }
+
+#include "hashtab.h"
+
+static inline hashval_t
+htab_hash (hash_entry_type element)
+{
+  return hash_pointer (element->host_addr);
+}
 
-volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
+static inline bool
+htab_eq (hash_entry_type x, hash_entry_type y)
+{
+  return x->host_addr == y->host_addr;
+}
+
+void **GOMP_INDIRECT_ADDR_MAP = NULL;
 
 /* Use a splay tree to lookup the target address instead of using a
linear search.  */
-#define USE_SPLAY_TREE_LOOKUP
+#define USE_HASHTAB_LOOKUP
 
-#ifdef USE_SPLAY_TREE_LOOKUP
+#ifdef USE_HASHTAB_LOOKUP
 
-static struct indirect_splay_tree_s indirect_map;
-static indirect_splay_tree_node indirect_array = NULL;
+static htab_t indirect_htab = NULL;
 
 /* Build the splay tree used for host->target address lookups.  */
 
@@ -48,37 +69,29 @@ void
 build_indirect_map (void)
 {
   size_t 

[patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

2024-01-22 Thread Tobias Burnus

Testing showed that the libgomp.c/target-52.c failed with:

libgomp: cuCtxGetDevice error: unknown cuda error
libgomp: device finalization failed

This testcase uses OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD=mandatory,
and those env vars matter, i.e. it only fails if dg-set-target-env-var is
honored.

If both env vars are set, the device initialization occurs earlier as
OMP_DEFAULT_DEVICE is shown due to the display-env env var and its value
(when target-offload-var is 'mandatory') might be either
'omp_invalid_device' or '0'.

It turned out that this had an effect on device finalization, which caused
CUDA to stop earlier than expected. This patch now handles this case
gracefully. For details, see the commit log message in the attached patch
and/or the PR.

Comments, remarks, suggestions? Does this look sensible? (I would like to
see some acknowledgement by someone who feels more comfortable with CUDA
than me.)

Tobias

plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

The following issue was found when running libgomp.c/target-52.c with
nvptx offloading when the dg-set-target-env-var was honored. The issue
occurred for both -foffload=disable and with offloading configured when
an nvidia device is available.

At the end of the program, the offloading parts are shutdown via two means:
The callback registered via 'atexit (gomp_target_fini)' and - via code
generated in mkoffload, the '__attribute__((destructor)) fini' function
that calls GOMP_offload_unregister_ver.

In normal processing, first gomp_target_fini is called - which then sets
GOMP_DEVICE_FINALIZED for the device - and later GOMP_offload_unregister_ver,
but that's then a no-op because the state is GOMP_DEVICE_FINALIZED.
If both OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD="mandatory" are set,
the call omp_display_env already invokes gomp_init_targets_once, i.e. it
occurs earlier than usual and is invoked via __attribute__((constructor))
initialize_env.

For some unknown reasons, while this does not have an effect on the
order of the called plugin functions for initialization, it changes the
order of function calls for shutting down. Namely, when the two environment
variables are set, GOMP_offload_unregister_ver is called now before
gomp_target_fini. - And it seems as if CUDA regards a call to cuModuleUnload
(or unloading the last module?) as indication that the device context should
be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
CUDA_ERROR_DEINITIALIZED.

As the previous code in nvptx_attach_host_thread_to_device wasn't expecting
that result, it called
  GOMP_PLUGIN_error ("cuCtxGetDevice error: %s", cuda_error (r));
causing a fatal error of the program.

This commit now handles CUDA_ERROR_DEINITIALIZED in a special way such
that GOMP_OFFLOAD_fini_device just works.
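
The shape of the change can be sketched without any CUDA dependency.  The
code below is illustrative only: DriverStatus, attach_thread_to_device,
fini_device and load_image are hypothetical stand-ins, not the plugin's
functions or the real CUDA enumerators.  It shows a three-way return value
and one way callers might be adapted to it.

```cpp
#include <cassert>

// Hypothetical status codes modelling the driver results mentioned above;
// these are not the real CUDA enumerators.
enum class DriverStatus { Success, Deinitialized, OtherError };

// Returns 1 on success, 0 on error, and -1 if the driver has already been
// shut down.
int attach_thread_to_device (DriverStatus status)
{
  if (status == DriverStatus::Deinitialized)
    return -1;
  return status == DriverStatus::Success ? 1 : 0;
}

// Finalization: if the driver is already gone there is nothing left to
// clean up, so this is treated as success rather than as a fatal error.
bool fini_device (DriverStatus status)
{
  int r = attach_thread_to_device (status);
  if (r == -1)
    return true;
  if (r == 0)
    return false;
  // ... release device resources here ...
  return true;
}

// Any other caller has to compare against 1 explicitly, because -1 would
// count as "true" in a plain boolean test.
bool load_image (DriverStatus status)
{
  return attach_thread_to_device (status) == 1;
}
```

The explicit comparison in load_image is the part that is easy to miss when
a bool return type becomes an int: a leftover `if (!attach (...))` check
would silently accept the -1 case.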

When reading the code, the following was observed in addition:
When gomp_fini_device is called, it invokes goacc_fini_asyncqueues
to ensure that the queue is emptied.  It seems to make sense to do
likewise for GOMP_offload_unregister_ver, which this commit does in
addition.

libgomp/ChangeLog:

	PR libgomp/113513
	* target.c (GOMP_offload_unregister_ver): Call goacc_fini_asyncqueues
	before invoking GOMP_offload_unregister_ver.
	* plugin/plugin-nvptx.c (nvptx_attach_host_thread_to_device): Change
	return type to int and return -1 for CUDA_ERROR_DEINITIALIZED.
	(GOMP_OFFLOAD_fini_device): Handle the latter gracefully.
	(nvptx_init, GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_alloc,
	GOMP_OFFLOAD_host2dev, GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d, GOMP_OFFLOAD_openacc_async_host2dev,
	GOMP_OFFLOAD_openacc_async_dev2host): Update for return-type change.

Signed-off-by: Tobias Burnus 

 libgomp/plugin/plugin-nvptx.c | 41 +
 libgomp/target.c  |  7 +--
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index c04c3acd679..dccbae44abd 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -382,9 +382,11 @@ nvptx_init (void)
 }
 
 /* Select the N'th PTX device for the current host thread.  The device must
-   have been previously opened before calling this function.  */
+   have been previously opened before calling this function.
+   Returns 1 if successful, 0 if an error occurred, and -1 for
+   CUDA_ERROR_DEINITIALIZED.  */
 
-static bool
+static int
 nvptx_attach_host_thread_to_device (int n)
 {
   CUdevice dev;
@@ -393,15 +395,17 @@ nvptx_attach_host_thread_to_device (int n)
   CUcontext thd_ctx;
 
   r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev);
+  if (r == CUDA_ERROR_DEINITIALIZED)
+return -1;
   if (r == CUDA_ERROR_NOT_PERMITTED)
 {
   /* Assume we're in a CUDA callback, just return true.  */
-  return true;
+  return 1;
 }
   if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
 {
   GOMP_PLUGIN_error ("cuCtxGetDevice 

Re: [PATCH] c++: Don't ICE for unknown parameter to constexpr'd switch-statement, PR113545

2024-01-22 Thread Marek Polacek
On Mon, Jan 22, 2024 at 06:02:32PM +0100, Hans-Peter Nilsson wrote:
> I don't really know whether this is the right way to treat
> CONVERT_EXPR as below, but...  Regtested native
> x86_64-linux-gnu.  Ok to commit?

Thanks for taking a look at this problem.
 
> brgds, H-P
> 
> -- >8 --
> That gcc_unreachable at the default-label seems to be over
> the top.  It seems more correct to just say "that's not
> constant" to whatever's not known (to be constant), when
> looking for matches in switch-statements.

Unfortunately this doesn't seem correct to me; I don't think we
should have gotten that far.  It appears that we lose track of
the reinterpret_cast, which is not allowed in a constant expression:
.

cp_convert -> ... -> convert_to_integer_1 gives us a CONVERT_EXPR
but we only set REINTERPRET_CAST_P on NOP_EXPRs:

  expr = cp_convert (type, expr, complain);
  if (TREE_CODE (expr) == NOP_EXPR)
/* Mark any nop_expr that created as a reintepret_cast.  */
REINTERPRET_CAST_P (expr) = true;

so when evaluating baz we get (long unsigned int) &foo, which
passes verify_constant.
 
I don't have a good suggestion yet, sorry.
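
For reference, a minimal sketch of the distinction at play, assuming C++14 or later. The names swbar and foo follow the testcase; classify is a hypothetical helper. With a literal argument the call is a constant expression, while a value obtained through reinterpret_cast can only be used at run time. The cast is deliberately done on a function parameter here so the sketch does not depend on how a given compiler folds the cast.

```cpp
#include <cstdint>

char foo;

constexpr unsigned char swbar(std::uintptr_t baz)
{
  switch (baz)
    {
    case 13:
      return 11;
    case 14:
      return 78;
    case 2048:
      return 13;
    default:
      return 42;
    }
}

// A literal argument makes the call usable in a constant expression.
static_assert(swbar(13) == 11, "matched case label");
static_assert(swbar(7) == 42, "default label");

// The result of a reinterpret_cast is not a constant expression, so this
// call has to be evaluated at run time.
unsigned char classify(const void *p)
{
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  return swbar(addr);
}
```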

> With this patch, the code generated for the (inlined) call to
> ifbar equals that to swbar, except for the comparisons being
> in another order.
> 
> gcc/cp:
>   PR c++/113545
>   * constexpr.cc (label_matches): Replace call to_unreachable with

"to gcc_unreachable"

>   return false.

More like with "break" but that's not important.
 
> gcc/testsuite:
>   * g++.dg/expr/pr113545.C: New text.

"test"

> ---
>  gcc/cp/constexpr.cc |  3 +-
>  gcc/testsuite/g++.dg/expr/pr113545.C | 49 +
>  2 files changed, 51 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/expr/pr113545.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index 6350fe154085..30caf3322fff 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -6922,7 +6922,8 @@ label_matches (const constexpr_ctx *ctx, tree 
> *jump_target, tree stmt)
>break;
>  
>  default:
> -  gcc_unreachable ();
> +  /* Something else, like CONVERT_EXPR.  Unknown whether it matches.  */
> +  break;
>  }
>return false;
>  }
> diff --git a/gcc/testsuite/g++.dg/expr/pr113545.C 
> b/gcc/testsuite/g++.dg/expr/pr113545.C
> new file mode 100644
> index ..914ffdeb8e16
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/expr/pr113545.C

The problem seems to be more about conversion so 
g++.dg/conversion/reinterpret5.C
or g++.dg/cpp0x/constexpr-reinterpret3.C seems more appropriate.

> @@ -0,0 +1,49 @@

Please add

PR c++/113545

> +// { dg-do run { target c++11 } }
> +
> +char foo;
> +
> +// This one caught a call to gcc_unreachable in
> +// cp/constexpr.cc:label_matches, when passed a convert_expr from the
> +// cast in the call.
> +constexpr unsigned char swbar(__UINTPTR_TYPE__ baz)
> +{
> +  switch (baz)
> +{
> +case 13:
> +  return 11;
> +case 14:
> +  return 78;
> +case 2048:
> +  return 13;
> +default:
> +  return 42;
> +}
> +}
> +
> +// For reference, the equivalent* if-statements.
> +constexpr unsigned char ifbar(__UINTPTR_TYPE__ baz)
> +{
> +  if (baz == 13)
> +return 11;
> +  else if (baz == 14)
> +return 78;
> +  else if (baz == 2048)
> +return 13;
> +  else
> +return 42;
> +}
> +
> +__attribute__ ((__noipa__))
> +void xyzzy(int x)
> +{
> +  if (x != 42)
> +__builtin_abort ();
> +}
> +
> +int main()
> +{
> +  unsigned const char c = swbar(reinterpret_cast<__UINTPTR_TYPE__>(&foo));
> +  xyzzy(c);
> +  unsigned const char d = ifbar(reinterpret_cast<__UINTPTR_TYPE__>(&foo));

I suppose we should also test a C-style cast (which leads to a reinterpret_cast
in this case).

Maybe check we get an error when c/d are constexpr (that used to ICE).

> +  xyzzy(d);
> +}
> -- 
> 2.30.2
> 

Marek



[committed] Add -gno-strict-dwarf to dg-options in various btf enum tests

2024-01-22 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

Add -gno-strict-dwarf to dg-options in various btf enum tests

The -gno-strict-dwarf option is needed to ensure enum signedness
is added to type_die.

2024-01-22  John David Anglin  

gcc/testsuite/ChangeLog:

PR debug/113382
* gcc.dg/debug/btf/btf-bitfields-3.c: Add -gno-strict-dwarf
option to dg-options.
* gcc.dg/debug/btf/btf-enum-1.c: Likewise.
* gcc.dg/debug/btf/btf-enum-small.c: Likewise.
* gcc.dg/debug/btf/btf-enum64-1.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-3.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-3.c
index 78b8b7d49ad..08622b771e6 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-3.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-3.c
@@ -14,7 +14,7 @@
*/
 
 /* { dg-do compile } */
-/* { dg-options "-O0 -gbtf -dA" } */
+/* { dg-options "-O0 -gbtf -gno-strict-dwarf -dA" } */
 
 /* Enum with 4 members.  */
 /* { dg-final { scan-assembler-times "\[\t \]0x604\[\t 
\]+\[^\n\]*btt_info" 1 } } */
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-enum-1.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-1.c
index 021ce0345e4..7873c8837a0 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-enum-1.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-1.c
@@ -1,7 +1,7 @@
 /* Test BTF generation for enums.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O0 -gbtf -fno-short-enums -dA" } */
+/* { dg-options "-O0 -gbtf -gno-strict-dwarf -fno-short-enums -dA" } */
 
 /* { dg-final { scan-assembler-times "\[\t \]0x604\[\t 
\]+\[^\n\]*btt_info" 1 } } */
 /* { dg-final { scan-assembler-times "\[\t \]0x8603\[\t 
\]+\[^\n\]*btt_info" 1 } } */
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
index eb8a1bd2c43..ccc92c92ba9 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-enum-small.c
@@ -1,7 +1,7 @@
 /* Test BTF generation for small enums.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O2 -gbtf -dA" } */
+/* { dg-options "-O2 -gbtf -gno-strict-dwarf -dA" } */
 
 /* { dg-final { scan-assembler-not "bte_value_lo32" } } */
 /* { dg-final { scan-assembler-not "bte_value_hi32" } } */
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-enum64-1.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-enum64-1.c
index 5d1487c1183..3ba885af17f 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-enum64-1.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-enum64-1.c
@@ -1,7 +1,7 @@
 /* Test BTF generation for 64 bits enums.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O0 -gbtf -dA" } */
+/* { dg-options "-O0 -gbtf -gno-strict-dwarf -dA" } */
 
 /* { dg-final { scan-assembler-times "\[\t \].size\[\t \]_?myenum1,\[\t \]8" 1 
} } */
 /* { dg-final { scan-assembler-times "\[\t \].size\[\t \]_?myenum2,\[\t \]8" 1 
} } */




Re: [PATCH] aarch64: Don't assert recog success in ldp/stp pass [PR113114]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> Hi,
>
> The PR shows two different cases where try_promote_writeback produces an
> RTL pattern which isn't recognized.  Currently this leads to an ICE, as
> we assert recog success, but I think it's better just to back out of the
> changes gracefully if recog fails (as we do in the main fuse_pair case).
>
> In theory since we check the ranges here recog shouldn't fail (which is
> why I had the assert in the first place), but the PR shows an edge case
> in the patterns where if we form a pre-writeback pair where the
> writeback offset is exactly -S, where S is the size in bytes of one
> transfer register, we fail to match the expected pattern as the patterns
> look explicitly for plus operands in the mems.  I think fixing this
> would require adding at least four new special-case patterns to
> aarch64.md for what doesn't seem to be a particularly useful variant of
> the insns.  Even if we were to do that, I think it would be GCC 15
> material, and it's better to just punt for GCC 14.
>
> The ILP32 case in the PR is a bit different, as that shows us trying to
> combine a pair with DImode base register operands in the mems together
> with an SImode trailing update of the base register.  This leads to us
> forming an RTL pattern which references the base register in both SImode
> and DImode, which also fails to recog.  Again, I think it's best just to
> take the missed optimization for now.  If we really want to make this
> (try_promote_writeback) work for ILP32, we can try to do it for GCC 15.
>
> Bootstrapped/regtested on aarch64-linux-gnu (with/without passes
> enabled), OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113114
>   * config/aarch64/aarch64-ldp-fusion.cc (try_promote_writeback):
>   Don't assert recog success, just punt if the writeback pair
>   isn't recognized.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/113114
>   * gcc.c-torture/compile/pr113114.c: New test.
>   * gcc.target/aarch64/pr113114.c: New test.

OK, thanks.

Richard
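
A generic sketch of the "punt instead of assert" shape, for readers following along. Everything here is hypothetical (ChangeGroup, try_fuse and the recog callback are illustrative names, not rtl-ssa interfaces): a tentative change is made, validated, and rolled back if validation fails, so the caller just sees a missed optimization.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical miniature of a change group: tentative edits that can be
// rolled back as a unit.
struct ChangeGroup
{
  std::vector<int> &insns;
  std::vector<int> saved;
  explicit ChangeGroup(std::vector<int> &v) : insns(v), saved(v) {}
  void cancel() { insns = saved; }
};

// Try to fuse element I with its successor.  RECOG stands in for pattern
// recognition; if it rejects the fused form, restore the original
// sequence and report failure instead of aborting.
bool try_fuse(std::vector<int> &insns, std::size_t i,
              const std::function<bool(int)> &recog)
{
  if (i + 1 >= insns.size())
    return false;

  ChangeGroup changes(insns);
  int fused = insns[i] + insns[i + 1];
  insns[i] = fused;
  insns.erase(insns.begin() + static_cast<std::ptrdiff_t>(i) + 1);

  if (!recog(fused))
    {
      changes.cancel();   // Back out gracefully.
      return false;
    }
  return true;
}
```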

>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..19142153f41 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -2672,7 +2672,15 @@ try_promote_writeback (insn_info *insn)
>for (unsigned i = 0; i < ARRAY_SIZE (changes); i++)
>  gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], 
> is_changing));
>  
> -  gcc_assert (rtl_ssa::recog_ignoring (attempt, pair_change, is_changing));
> +  if (!rtl_ssa::recog_ignoring (attempt, pair_change, is_changing))
> +{
> +  if (dump_file)
> + fprintf (dump_file, "i%d: recog failed on wb pair, bailing out\n",
> +  insn->uid ());
> +  cancel_changes (0);
> +  return;
> +}
> +
>gcc_assert (crtl->ssa->verify_insn_changes (changes));
>confirm_change_group ();
>crtl->ssa->change_insns (changes);
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113114.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr113114.c
> new file mode 100644
> index 000..978e594eb3d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr113114.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-funroll-loops" } */
> +float val[128];
> +float x;
> +void bar() {
> +  int i = 55;
> +  for (; i >= 0; --i)
> +x += val[i];
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr113114.c 
> b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> new file mode 100644
> index 000..5b0383c2435
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mabi=ilp32 -O -mearly-ldp-fusion -mlate-ldp-fusion" } */
> +void foo_n(double *a) {
> +  int i = 1;
> +  for (; i < (int)foo_n; i++)
> +a[i] = a[i - 1] + a[i + 1] * a[i];
> +}


[PATCH] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-22 Thread Xi Ruoyao
Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
(loongarch_call_tls_get_addr): Do not split symbols of
SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
EXPLICIT_RELOCS_AUTO.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
for la.tls.ld and la.tls.gd.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?
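
A simplified sketch of the decision this patch changes, for illustration only. The enums are hypothetical mirrors of EXPLICIT_RELOCS_* and the symbol types, every symbol type not shown in the diff is collapsed into Other, and the early return for the non-auto modes is an assumption about the surrounding function rather than something visible in the hunk.

```cpp
// Hypothetical mirrors of the option values and symbol types.
enum class ExplicitRelocs { None, Always, Auto };
enum class SymbolType { TlsIe, TlsLe, TlsGd, TlsLdm, Pcrel64, Other };

// Returns true if explicit relocation operators should be emitted,
// false if the assembler macro (e.g. la.tls.ld / la.tls.gd) is used.
bool explicit_relocs_p(ExplicitRelocs mode, SymbolType type)
{
  // Assumption: outside auto mode the answer does not depend on the type.
  if (mode != ExplicitRelocs::Auto)
    return mode == ExplicitRelocs::Always;

  switch (type)
    {
    case SymbolType::TlsIe:
    case SymbolType::TlsLe:
    case SymbolType::Pcrel64:
      return true;
    default:
      // TLS LD/GD now land here: relaxation needs the assembler macro.
      return false;
    }
}
```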

 gcc/config/loongarch/loongarch.cc| 9 -
 .../loongarch/explicit-relocs-auto-tls-ld-gd.c   | 3 ++-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 82467474288..58df0b5637d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1970,11 +1970,10 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type 
type)
 {
   case SYMBOL_TLS_IE:
   case SYMBOL_TLS_LE:
-  case SYMBOL_TLSGD:
-  case SYMBOL_TLSLDM:
   case SYMBOL_PCREL64:
-   /* The linker don't know how to relax TLS accesses or 64-bit
-  pc-relative accesses.  */
+   /* TLS IE cannot be relaxed.  TLS LE relaxation does not require
+  using the assembly macro.  The linker does not relax 64-bit
+  pc-relative accesses as at now.  */
return true;
   case SYMBOL_GOT_DISP:
/* The linker don't know how to relax GOT accesses in extreme
@@ -2789,7 +2788,7 @@ loongarch_call_tls_get_addr (rtx sym, enum 
loongarch_symbol_type type, rtx v0)
 
   start_sequence ();
 
-  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+  if (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
 {
   /* Split tls symbol to high and low.  */
   rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
index 957ff98df62..ca55fcfc53e 100644
--- a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
@@ -6,4 +6,5 @@ extern __thread int b __attribute__((visibility("default")));
 
 int test() { return a + b; }
 
-/* { dg-final { scan-assembler-not "la.tls" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.ld" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.gd" { target tls_native } } } */
-- 
2.43.0



Re: [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions

2024-01-22 Thread H.J. Lu
On Mon, Jan 22, 2024 at 8:58 AM Jan Hubicka  wrote:
>
> > I compared GCC master branch bootstrap and test times on a slow machine
> > with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
> > with the backported patch.  The performance data isn't precise since the
> > measurements were done on different days with different GCC sources under
> > different 6.6 kernel versions.
> >
> > GCC master branch build time in seconds:
> >
> > before           after            improvement
> > 30043.75user     30013.16user     0%
> > 1274.85system    1243.72system    2.4%
> >
> > GCC master branch test time in seconds (new tests added):
> >
> > before           after            improvement
> > 216035.90user    216547.51user    0%
> > 27365.51system   26658.54system   2.6%
>
> It is interesting - the system time difference comes from smaller
> binary?  Is the difference any significant?

I think it comes from do_exit.

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
endbr64
call   
push   %r15
push   %r14
push   %r13
push   %r12
mov%rdi,%r12
push   %rbp
push   %rbx
mov%gs:0x0,%rbx
sub$0x28,%rsp
mov%gs:0x28,%rax
mov%rax,0x20(%rsp)
xor%eax,%eax
call   *0x0(%rip)# 
test   $0x2,%ah
je 

to

do_exit:
endbr64
call   
sub$0x28,%rsp
mov%rdi,%r12
mov%gs:0x28,%rax
mov%rax,0x20(%rsp)
xor%eax,%eax
mov%gs:0x0,%rbx
call   *0x0(%rip)# 
test   $0x2,%ah
je 

do_exit is called by every process when it exits.
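
As a small illustration of the kind of function the ChangeLog entry describes (noreturn and nothrow), here is a sketch with hypothetical names. Whether a given compiler actually omits the callee-saved register saves in fatal depends on the compiler version and target; the sketch only shows the source-level shape.

```cpp
#include <cstdlib>

// Never returns and never throws.  Since control never comes back to the
// caller, the caller's callee-saved registers never need restoring.
[[noreturn]] void fatal() noexcept
{
  std::abort();
}

// Ordinary caller: the noreturn call sits on the cold error path.
int checked_div(int num, int den)
{
  if (den == 0)
    fatal();
  return num / den;
}
```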

> >
> > gcc/
> >
> >   PR target/38534
> >   * config/i386/i386-options.cc (ix86_set_func_type): Don't
> >   save and restore callee saved registers for a noreturn function
> >   with nothrow or compiled with -fno-exceptions.
>
> In general this looks like good thing to do.  I wonder if that is not
> something middle-end should understand for all targets.
> Also I wonder about asynchronous stack unwinding.  If we want to unwind
> stack from interrupt then we may need some registers to be saved (like
> base pointer).

It is compatible with -fasynchronous-unwind-tables.  From glibc test
debug/tst-longjmp_chk:

Starting program:
/export/build/gnu/tools-build/glibc-cet/build-x86_64-linux/debug/tst-longjmp_chk
--direct
warning: Unable to find libthread_db matching inferior's thread
library, thread debugging will not be available.

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=, signo=signo@entry=6,
no_tid=no_tid@entry=0) at pthread_kill.c:44
44   return INTERNAL_SYSCALL_ERROR_P (ret) ?
INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=,
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x55294a4b in __pthread_kill_internal (signo=6,
threadid=) at pthread_kill.c:78
#2  0x5523da1a in __GI_raise (sig=sig@entry=6)
at ../sysdeps/posix/raise.c:26
#3  0x552248b3 in __GI_abort () at abort.c:79
#4  0x55225a7e in __libc_message_impl (
fmt=fmt@entry=0x553b7171 "*** %s ***: terminated\n")
at ../sysdeps/posix/libc_fatal.c:132
#5  0x55324517 in __GI___fortify_fail (msg=)
at fortify_fail.c:24
#6  0x55323411 in longjmp_chk ()
at ../sysdeps/x86_64/__longjmp.S:57
#7  0x55324d6d in __GI___longjmp_chk (
env=env@entry=0xa200 , val=val@entry=1)
at ../setjmp/longjmp.c:41
#8  0x6a00 in do_test () at tst-longjmp_chk.c:70
#9  0x7388 in support_test_main (argc=1431675392,
argv=0x7fffdd30, config=0x1, config@entry=0x7fffdbe0)
at support_test_main.c:413
#10 0x673f in main (argc=, argv=)
at ../support/test-driver.c:170
(gdb)

abort is a noreturn function:

extern void abort (void) __THROW __attribute__ ((__noreturn__));

Callee-saved registers aren't saved:

Dump of assembler code for function __GI_abort:
   0x552247de <+0>: endbr64
   0x552247e2 <+4>: sub$0xa8,%rsp
   0x552247e9 <+11>: lea0x1d1540(%rip),%rbx#
0x553f5d30 
   0x552247f0 <+18>: mov%fs:0x28,%rax
   0x552247f9 <+27>: mov%rax,0x98(%rsp)
   0x55224801 <+35>: xor%eax,%eax
   0x55224803 <+37>: mov%fs:0x10,%rbp
   0x5522480c <+46>: cmp%rbp,0x1d1525(%rip)#
0x553f5d38 
   0x55224813 <+53>: je 0x55224833 <__GI_abort+85>
   0x55224815 <+55>: mov$0x1,%edx
   0x5522481a <+60>: lock cmpxchg %edx,0x1d150e(%rip)#
0x553f5d30 
   0x55224822 <+68>: je 0x5522482c <__GI_abort+78>
   0x55224824 <+70>: mov%rbx,%rdi


> Honza
> >
> > gcc/testsuite/
> >
> >   PR target/38534
> >  

[PATCH] c++: ambiguous member lookup for rewritten cands [PR113529]

2024-01-22 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?

-- >8 --

Here we handle the operator expression u < v inconsistently: in a SFINAE
context (such as a requires-expression) we accept it, and in a
non-SFINAE context we reject it with

  error: request for member 'operator<=>' is ambiguous

as per [class.member.lookup]/6.  This inconsistency is ultimately
because we neglect to propagate error_mark_node after recursing in
add_operator_candidates, fixed like so.
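
For background, a minimal sketch of how an ambiguous member lookup is expected to behave in a SFINAE context. This uses plain member functions and a hypothetical has_f detector rather than the operator<=> case from the PR: the ambiguity makes the expression invalid, so the detector quietly selects the fallback instead of accepting the expression.

```cpp
#include <type_traits>
#include <utility>

struct A { void f(); };
struct B { void f(); };
struct C : A, B { };   // Lookup of 'f' in C finds A::f and B::f: ambiguous.

// Fallback: chosen when the expression below is invalid for T.
template<class T, class = void>
struct has_f : std::false_type { };

// Chosen only when 't.f()' is a valid expression for an lvalue of type T.
template<class T>
struct has_f<T, decltype(void(std::declval<T&>().f()))> : std::true_type { };

static_assert(has_f<A>::value, "unambiguous lookup is accepted");
static_assert(!has_f<C>::value, "ambiguous lookup is a substitution failure");
```

Outside a SFINAE context the same call on a C object is a hard error, which is the consistency the patch restores for the rewritten-candidate lookup.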

PR c++/113529

gcc/cp/ChangeLog:

* call.cc (add_operator_candidates): Propagate error_mark_node
result after recursing to find rewritten candidates.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-sfinae3.C: New test.
---
 gcc/cp/call.cc| 11 +++---
 .../g++.dg/cpp2a/spaceship-sfinae3.C  | 22 +++
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae3.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 1f5ff417c81..183bb8c1348 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -7007,9 +7007,14 @@ add_operator_candidates (z_candidate **candidates,
{
  flags |= LOOKUP_REWRITTEN;
  if (rewrite_code != code)
-   /* Add rewritten candidates in same order.  */
-   add_operator_candidates (candidates, rewrite_code, ERROR_MARK,
-arglist, lookups, flags, complain);
+   {
+ /* Add rewritten candidates in same order.  */
+ tree r = add_operator_candidates (candidates, rewrite_code,
+   ERROR_MARK, arglist, lookups,
+   flags, complain);
+ if (r == error_mark_node)
+   return error_mark_node;
+   }
 
  z_candidate *save_cand = *candidates;
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae3.C 
b/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae3.C
new file mode 100644
index 000..ca159260ec7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae3.C
@@ -0,0 +1,22 @@
+// PR c++/113529
+// { dg-do compile { target c++20 } }
+
+#include <compare>
+
+struct A {
+  auto operator<=>(const A&) const = default;
+  bool operator<(const A&) const;
+};
+struct B {
+  auto operator<=>(const B&) const = default;
+};
+struct C : A, B { };
+
+
+template<class T>
+void f(T u, T v) {
+  static_assert(!requires { u < v; });
+  u < v; // { dg-error "request for member 'operator<=>' is ambiguous" }
+}
+
+template void f(C, C);
-- 
2.43.0.386.ge02ecfcc53



Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2024-01-22 Thread Jan Hubicka
> Hi,
> 
> PR 108007 is another manifestation where we rely on DCE to clean-up
> after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
> can leave behind statements which are fed uninitialized values and
> trap, even though their results are themselves never used.
> 
> I have already fixed this for unused parameters in callees, this bug
> shows that almost the same thing can happen for removed returns, on
> the side of callers.  This means that the issue has to be fixed
> elsewhere, in call redirection.  This patch adds a function which
> looks for (and through, using a work-list) uses of operations fed
> specific SSA names and removes them all.
> 
> That would have been easy if it wasn't for debug statements during
> tree-inline (from which call redirection is also invoked).  Debug
> statements are decoupled from the rest at this point and iterating
> over uses of SSAs does not bring them up.  During tree-inline they are
> handled especially at the end, I assume in order to make sure that
> relative ordering of UIDs are the same with and without debug info.
> 
> This means that during tree-inline we need to make a hash of killed
> SSAs, that we already have in copy_body_data, available to the
> function making the purging.  So the patch duly does also that, making
> the interface slightly ugly.  Moreover, all newly unused SSA names
> need to be freed and as PR 112616 showed, it must be done in a defined
> order, which is what newly added ipa_release_ssas_in_hash does.
> 
> The only difference from the patch which has already been approved in
> September but which I later had to revert is (one function name and)
> that SSAs that are to be released are first put into an auto_vec and
> sorted according to their version number to avoid issues like PR 112616.
> 
> The patch has passed bootstrap, LTO-bootstrap and profiled-LTO-bootstrap
> and testing on x86_64-linux, bootstrap, LTO-bootstrap and testing on
> ppc64le-linux and bootstrap and LTO-bootstrap on Aarch64, testsuite
> there is still running, OK if it passes?
> 
> Thanks
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2024-01-12  Martin Jambor  
> 
>   PR ipa/108007
>   PR ipa/112616
>   * cgraph.h (cgraph_edge): Add a parameter to
>   redirect_call_stmt_to_callee.
>   * ipa-param-manipulation.h (ipa_param_adjustments): Add a
>   parameter to modify_call.
>   (ipa_release_ssas_in_hash): Declare.
>   * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
>   parameter killed_ssas, pass it to padjs->modify_call.
>   * ipa-param-manipulation.cc (purge_all_uses): New function.
>   (ipa_param_adjustments::modify_call): New parameter killed_ssas.
>   Instead of substituting uses, invoke purge_all_uses.  If
>   hash of killed SSAs has not been provided, create a temporary one
>   and release SSAs that have been added to it.
>   (compare_ssa_versions): New function.
>   (ipa_release_ssas_in_hash): Likewise.
>   * tree-inline.cc (redirect_all_calls): Create
>   id->killed_new_ssa_names earlier, pass it to edge redirection,
>   adjust a comment.
>   (copy_body): Release SSAs in id->killed_new_ssa_names.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-01-15  Martin Jambor  
> 
>   PR ipa/108007
>   PR ipa/112616
>   * gcc.dg/ipa/pr108007.c: New test.
>   * gcc.dg/ipa/pr112616.c: Likewise.
> +/* Remove all statements that use NAME directly or indirectly.  KILLED_SSAS
> +   contains the SSA_NAMEs that are already being or have been processed and 
> new
> +   ones need to be added to it.  The function only has to process situations
> +   handled by ssa_name_only_returned_p in ipa-sra.cc with the exception that 
> it
> +   can assume it must never reach a use in a return statement.  */
> +
> +static void
> +purge_all_uses (tree name, hash_set<tree> *killed_ssas)

I think simple_dce_from_worklist does pretty much the same but expects
a bitmap instead of a hash set.

It checks for some cases where a statement is not removable, but these
should all pass if we declared it dead.

The patch looks OK to me, except that keeping this walk in one place may
be nice. 

Honza
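
A toy sketch of the work-list walk being discussed, under loudly stated assumptions: values are plain integers standing in for SSA version numbers, UseMap is a hypothetical def-use table, and including the seed in the killed set is a choice made for this sketch. It collects everything transitively computed from the seed and returns it sorted by version, the deterministic release order the patch description mentions.

```cpp
#include <algorithm>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical def-use table: uses[n] lists the values computed from n.
using UseMap = std::unordered_map<int, std::vector<int>>;

// Collect NAME and every value transitively computed from it into KILLED,
// then return the killed values sorted by version number so that they can
// be released in a defined order.
std::vector<int> purge_all_uses(int name, const UseMap &uses,
                                std::unordered_set<int> &killed)
{
  std::vector<int> worklist{name};
  killed.insert(name);

  while (!worklist.empty())
    {
      int cur = worklist.back();
      worklist.pop_back();

      auto it = uses.find(cur);
      if (it == uses.end())
        continue;

      for (int user : it->second)
        if (killed.insert(user).second)   // First time we see this value.
          worklist.push_back(user);
    }

  std::vector<int> order(killed.begin(), killed.end());
  std::sort(order.begin(), order.end());
  return order;
}
```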
> +{
> +  imm_use_iterator imm_iter;
> +  gimple *stmt;
> +  auto_vec<tree> worklist;
> +
> +  worklist.safe_push (name);
> +  while (!worklist.is_empty ())
> +{
> +  tree cur_name = worklist.pop ();
> +  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, cur_name)
> + {
> +   if (gimple_debug_bind_p (stmt))
> + {
> +   /* When running within tree-inline, we will never end up here but
> +  adding the SSAs to killed_ssas will do the trick in this case
> +  and the respective debug statements will get reset. */
> +   gimple_debug_bind_reset_value (stmt);
> +   update_stmt (stmt);
> +   continue;
> + }
> +
> +   tree lhs = NULL_TREE;
> +   if (is_gimple_assign (stmt))
> + lhs = gimple_assign_lhs (stmt);
> +   else if (gimple_code 

[PATCH v3 1/1] RISC-V: Add support for XCVbitmanip extension in CV32E40P

2024-01-22 Thread Mary Bennett
Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add XCVbitmanip.
* config/riscv/constraints.md: Likewise.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/predicates.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* doc/extend.texi: Add XCVbitmanip builtin documentation.
* doc/sourcebuild.texi: Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bitmanip-compile-bclr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bclrr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bitrev.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bset.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bsetr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-clb.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-cnt.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extract.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractu.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractur.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-ff1.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-fl1.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-insert.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-insertr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-ror.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bclr.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bitrev.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bset.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-extract.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-extractu.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-insert.c: New test.
* lib/target-supports.exp: Add proc for the XCVbitmanip extension.
---
 gcc/common/config/riscv/riscv-common.cc   |   2 +
 gcc/config/riscv/constraints.md   |  16 ++
 gcc/config/riscv/corev.def|  13 ++
 gcc/config/riscv/corev.md | 184 ++
 gcc/config/riscv/predicates.md|  16 ++
 gcc/config/riscv/riscv-builtins.cc|   1 +
 gcc/config/riscv/riscv-ftypes.def |   5 +
 gcc/config/riscv/riscv.cc |  13 ++
 gcc/config/riscv/riscv.opt|   2 +
 gcc/doc/extend.texi   |  53 +
 gcc/doc/sourcebuild.texi  |   3 +
 .../riscv/cv-bitmanip-compile-bclr.c  |  27 +++
 .../riscv/cv-bitmanip-compile-bclrr.c |  18 ++
 .../riscv/cv-bitmanip-compile-bitrev.c|  30 +++
 .../riscv/cv-bitmanip-compile-bset.c  |  27 +++
 .../riscv/cv-bitmanip-compile-bsetr.c |  18 ++
 .../riscv/cv-bitmanip-compile-clb.c   |  18 ++
 .../riscv/cv-bitmanip-compile-cnt.c   |  18 ++
 .../riscv/cv-bitmanip-compile-extract.c   |  27 +++
 .../riscv/cv-bitmanip-compile-extractr.c  |  18 ++
 .../riscv/cv-bitmanip-compile-extractu.c  |  27 +++
 .../riscv/cv-bitmanip-compile-extractur.c |  18 ++
 .../riscv/cv-bitmanip-compile-ff1.c   |  18 ++
 .../riscv/cv-bitmanip-compile-fl1.c   |  18 ++
 .../riscv/cv-bitmanip-compile-insert.c|  24 +++
 .../riscv/cv-bitmanip-compile-insertr.c   |  18 ++
 .../riscv/cv-bitmanip-compile-ror.c   |  18 ++
 .../riscv/cv-bitmanip-fail-compile-bclr.c |  25 +++
 .../riscv/cv-bitmanip-fail-compile-bitrev.c   |  23 +++
 .../riscv/cv-bitmanip-fail-compile-bset.c |  25 +++
 .../riscv/cv-bitmanip-fail-compile-extract.c  |  25 +++
 .../riscv/cv-bitmanip-fail-compile-extractu.c |  25 +++
 .../riscv/cv-bitmanip-fail-compile-insert.c   |  25 +++
 gcc/testsuite/lib/target-supports.exp |  13 ++
 34 files changed, 811 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bclr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bclrr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bitrev.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bset.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bsetr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-clb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-cnt.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-extract.c
 create mode 100644 

[PATCH v3 0/1] RISC-V: Support CORE-V XCVBITMANIP extension

2024-01-22 Thread Mary Bennett
v2 -> v3:
 * Updated rtl for cnt, ff1, fl1, bclr, bset, insert and extract[u].
 * cv.bitrev requires groups of bits to reverse order. bitreverse does not
   support this.
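
To make the difference from a plain bit reverse concrete, here is a hedged sketch of reversing the order of fixed-width groups of bits. It is not claimed to match the exact cv.bitrev operand encoding (the specification linked below is the reference); reverse_groups and its parameter are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the order of GROUP_BITS-wide groups in a 32-bit word while
// keeping the bits inside each group in place.  With GROUP_BITS == 1
// this degenerates to an ordinary bit reverse.
std::uint32_t reverse_groups(std::uint32_t x, unsigned group_bits)
{
  assert(group_bits != 0 && 32 % group_bits == 0);
  if (group_bits == 32)
    return x;   // One group only; also avoids an out-of-range shift.

  const std::uint32_t mask = (std::uint32_t{1} << group_bits) - 1;
  std::uint32_t result = 0;
  for (unsigned shift = 0; shift < 32; shift += group_bits)
    {
      // Move the lowest remaining group of X to the low end of RESULT.
      result = (result << group_bits) | (x & mask);
      x >>= group_bits;
    }
  return result;
}
```

A plain bit reverse corresponds to the group_bits == 1 case only, which is why a generic bitreverse operation cannot express the wider groupings.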

This patch series presents the comprehensive implementation of the BITMANIP
extension for CORE-V.

Tested with riscv-gnu-toolchain on binutils, ld, gas and gcc testsuites to
ensure its correctness and compatibility with the existing codebase.
However, your input, reviews, and suggestions are invaluable in making this
extension even more robust.

The CORE-V builtins are described in the specification [1] and work can be
found in the OpenHW group's Github repository [2].

[1] 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

[2] github.com/openhwgroup/corev-gcc

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 

RISC-V: Add support for XCVbitmanip extension in CV32E40P

 gcc/common/config/riscv/riscv-common.cc   |   2 +
 gcc/config/riscv/constraints.md   |  16 ++
 gcc/config/riscv/corev.def|  13 ++
 gcc/config/riscv/corev.md | 184 ++
 gcc/config/riscv/predicates.md|  16 ++
 gcc/config/riscv/riscv-builtins.cc|   1 +
 gcc/config/riscv/riscv-ftypes.def |   5 +
 gcc/config/riscv/riscv.cc |  13 ++
 gcc/config/riscv/riscv.opt|   2 +
 gcc/doc/extend.texi   |  53 +
 gcc/doc/sourcebuild.texi  |   3 +
 .../riscv/cv-bitmanip-compile-bclr.c  |  27 +++
 .../riscv/cv-bitmanip-compile-bclrr.c |  18 ++
 .../riscv/cv-bitmanip-compile-bitrev.c|  30 +++
 .../riscv/cv-bitmanip-compile-bset.c  |  27 +++
 .../riscv/cv-bitmanip-compile-bsetr.c |  18 ++
 .../riscv/cv-bitmanip-compile-clb.c   |  18 ++
 .../riscv/cv-bitmanip-compile-cnt.c   |  18 ++
 .../riscv/cv-bitmanip-compile-extract.c   |  27 +++
 .../riscv/cv-bitmanip-compile-extractr.c  |  18 ++
 .../riscv/cv-bitmanip-compile-extractu.c  |  27 +++
 .../riscv/cv-bitmanip-compile-extractur.c |  18 ++
 .../riscv/cv-bitmanip-compile-ff1.c   |  18 ++
 .../riscv/cv-bitmanip-compile-fl1.c   |  18 ++
 .../riscv/cv-bitmanip-compile-insert.c|  24 +++
 .../riscv/cv-bitmanip-compile-insertr.c   |  18 ++
 .../riscv/cv-bitmanip-compile-ror.c   |  18 ++
 .../riscv/cv-bitmanip-fail-compile-bclr.c |  25 +++
 .../riscv/cv-bitmanip-fail-compile-bitrev.c   |  23 +++
 .../riscv/cv-bitmanip-fail-compile-bset.c |  25 +++
 .../riscv/cv-bitmanip-fail-compile-extract.c  |  25 +++
 .../riscv/cv-bitmanip-fail-compile-extractu.c |  25 +++
 .../riscv/cv-bitmanip-fail-compile-insert.c   |  25 +++
 gcc/testsuite/lib/target-supports.exp |  13 ++
 34 files changed, 811 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bclr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bclrr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bitrev.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bset.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-bsetr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-clb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-cnt.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-extract.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-extractr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-extractu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-extractur.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-ff1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-fl1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-insert.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-insertr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-compile-ror.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-fail-compile-bclr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-fail-compile-bitrev.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-fail-compile-bset.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-fail-compile-extract.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-fail-compile-extractu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bitmanip-fail-compile-insert.c

-- 
2.34.1



Re: [PATCH 3/3] aarch64: Fix up debug uses in ldp/stp pass [PR113089]

2024-01-22 Thread Richard Sandiford
Sorry for the earlier review comment about debug insns.  I hadn't
looked far enough into the queue to see this patch.

Alex Coplan  writes:
> As the PR shows, we were missing code to update debug uses in the
> load/store pair fusion pass.  This patch fixes that.
>
> Note that this patch depends on the following patch to create new uses
> in RTL-SSA, submitted as part of the fixes for PR113070:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642919.html
>
> The patch tries to give a complete treatment of the debug uses that will
> be affected by the changes we make, and in particular makes an effort to
> preserve debug info where possible, e.g. when re-ordering an update of
> a base register by a constant over a debug use of that register.  When
> re-ordering loads over a debug use of a transfer register, we reset the
> debug insn.  Likewise when re-ordering stores over debug uses of mem.
>
> While doing this I noticed that try_promote_writeback used a strange
> choice of move_range for the pair insn, in that it chose the previous
> nondebug insn instead of the insn itself.  Since the insn is being
> changed, these move ranges are equivalent (at least in terms of nondebug
> insn placement as far as RTL-SSA is concerned), but I think it is more
> natural to choose the pair insn itself.  This is needed to avoid
> incorrectly updating some debug uses.
>
> Notes on testing:
>  - The series was bootstrapped/regtested on top of the fixes for
>PR113070 and PR113356.  It seemed to make more sense to test with
>correct use/def info, and as mentioned above, this patch depends on
>one of the PR113070 patches.
>  - I also ran the testsuite with -g -funroll-loops -mearly-ldp-fusion
>-mlate-ldp-fusion to try and flush out more issues, and worked
>through some examples where writeback updates were triggered to
>make sure it was doing the right thing.
>  - The patches also survived an LTO+PGO bootstrap with
>--enable-languages=all (with the passes enabled).
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu (with/without
> the pass enabled).  OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113089
>   * config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New.
>   (fixup_debug_use): New.
>   (fixup_debug_uses_trailing_add): New.
>   (fixup_debug_uses): New. Use it ...
>   (ldp_bb_info::fuse_pair): ... here.
>   (try_promote_writeback): Call fixup_debug_uses_trailing_add to
>   fix up debug uses of the base register that are affected by
>   folding in the trailing add insn.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/113089
>   * gcc.c-torture/compile/pr113089.c: New test.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc  | 332 +-
>  .../gcc.c-torture/compile/pr113089.c  |  26 ++
>  2 files changed, 351 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113089.c
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 4d7fd72c6b1..fd0278e7acf 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -1342,6 +1342,309 @@ ldp_bb_info::track_tombstone (int uid)
>  gcc_unreachable (); // Bit should have changed.
>  }
>  
> +// Reset the debug insn containing USE (the debug insn has been
> +// optimized away).
> +static void
> +reset_debug_use (use_info *use)
> +{
> +  auto use_insn = use->insn ();
> +  auto use_rtl = use_insn->rtl ();
> +  insn_change change (use_insn);
> +  change.new_uses = {};
> +  INSN_VAR_LOCATION_LOC (use_rtl) = gen_rtx_UNKNOWN_VAR_LOC ();
> +  crtl->ssa->change_insn (change);
> +}
> +
> +// USE is a debug use that needs updating because DEF (a def of the same
> +// register) is being re-ordered over it.  If BASE is non-null, then DEF
> +// is an update of the register BASE by a constant, given by WB_OFFSET,
> +// and we can preserve debug info by accounting for the change in side
> +// effects.
> +static void
> +fixup_debug_use (obstack_watermark &attempt,
> +  use_info *use,
> +  def_info *def,
> +  rtx base,
> +  poly_int64 wb_offset)
> +{
> +  auto use_insn = use->insn ();
> +  if (base)
> +{
> +  auto use_rtl = use_insn->rtl ();
> +  insn_change change (use_insn);
> +
> +  gcc_checking_assert (REG_P (base) && use->regno () == REGNO (base));
> +  change.new_uses = check_remove_regno_access (attempt,
> +change.new_uses,
> +use->regno ());
> +
> +  // The effect of the writeback is to add WB_OFFSET to BASE.  If
> +  // we're re-ordering DEF below USE, then we update USE by adding
> +  // WB_OFFSET to it.  Otherwise, if we're re-ordering DEF above
> +  // USE, we update USE by undoing the effect of the writeback
> +  // (subtracting WB_OFFSET).

Re: [PATCH] ipa-cp: Fix check for exceeding param_ipa_cp_value_list_size (PR 113490)

2024-01-22 Thread Jan Hubicka
> Hi,
> 
> When the check for exceeding the param_ipa_cp_value_list_size limit was
> modified to be ignored when generating values from self-recursive
> calls, it should have been changed from "equal to" to "equal to or
> greater than".  This omission manifests itself as PR 113490.
> 
> When I examined the condition I also noticed that the parameter should
> come from the callee rather than the caller, since the value list is
> associated with the former and not the latter.  In practice the limit
> is of course very likely to be the same, but I fixed this aspect of
> the condition too.  I briefly audited all other uses of opt_for_fn in
> ipa-cp.cc and all the others looked OK.
> 
> Bootstrapped and tested on x86_64-linux.  OK for master?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2024-01-19  Martin Jambor  
> 
>   PR ipa/113490
>   * ipa-cp.cc (ipcp_lattice::add_value): Bail out if value
>   count is equal or greater than the limit.  Use the limit from the
>   callee.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-01-19  Martin Jambor  
> 
>   PR ipa/113490
>   * gcc.dg/ipa/pr113490.c: New test.
OK,
thanks!
Honza
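
For readers following along, the reason the equality test stops working can
be sketched with a toy model.  This is only an illustration under stated
assumptions: value_list_model, add_value and length_after_demo are
hypothetical names, not code from ipa-cp.cc, and the limit of 2 is
arbitrary.  Once one path is allowed to push the list past the limit, an
"==" test never fires again, while ">=" still does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model of a per-parameter value list.
// None of these names come from ipa-cp.cc.
struct value_list_model
{
  std::vector<int> values;
  std::size_t limit;   // stands in for param_ipa_cp_value_list_size
  bool use_ge;         // false: old "==" test, true: fixed ">=" test

  // Return true if V was recorded, false if the list was considered full.
  bool add_value (int v, bool from_self_recursive_call)
  {
    if (!from_self_recursive_call)
      {
        bool full = use_ge ? values.size () >= limit
                           : values.size () == limit;
        if (full)
          return false;
      }
    values.push_back (v);
    return true;
  }
};

// Add two ordinary values, one self-recursive value that bypasses the
// limit, then one more ordinary value; return the final list length.
std::size_t length_after_demo (bool use_ge)
{
  value_list_model l { {}, 2, use_ge };
  l.add_value (1, false);
  l.add_value (2, false);
  l.add_value (3, true);    // allowed past the limit: size is now 3
  l.add_value (4, false);   // "==" sees 3 != 2 and accepts; ">=" rejects
  return l.values.size ();
}

// Self-check of the two behaviours described above.
void check_limit_demo ()
{
  assert (length_after_demo (false) == 4);  // old test: list keeps growing
  assert (length_after_demo (true) == 3);   // fixed test: growth stops
}
```

With the old test the list ends up with four entries; with the fixed test
the fourth value is rejected.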


[PATCH] c++: Don't ICE for unknown parameter to constexpr'd switch-statement, PR113545

2024-01-22 Thread Hans-Peter Nilsson
I don't really know whether this is the right way to treat
CONVERT_EXPR as below, but...  Regtested native
x86_64-linux-gnu.  Ok to commit?

brgds, H-P

-- >8 --
That gcc_unreachable at the default-label seems to be over
the top.  It seems more correct to just say "that's not
constant" to whatever's not known (to be constant), when
looking for matches in switch-statements.

With this patch, the code generated for the (inlined) call to
ifbar equals that for swbar, except for the comparisons being
in another order.

gcc/cp:
PR c++/113545
* constexpr.cc (label_matches): Replace call to gcc_unreachable with
return false.

gcc/testsuite:
* g++.dg/expr/pr113545.C: New test.
---
 gcc/cp/constexpr.cc |  3 +-
 gcc/testsuite/g++.dg/expr/pr113545.C | 49 +
 2 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/expr/pr113545.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6350fe154085..30caf3322fff 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -6922,7 +6922,8 @@ label_matches (const constexpr_ctx *ctx, tree *jump_target, tree stmt)
   break;
 
 default:
-  gcc_unreachable ();
+  /* Something else, like CONVERT_EXPR.  Unknown whether it matches.  */
+  break;
 }
   return false;
 }
diff --git a/gcc/testsuite/g++.dg/expr/pr113545.C b/gcc/testsuite/g++.dg/expr/pr113545.C
new file mode 100644
index ..914ffdeb8e16
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/pr113545.C
@@ -0,0 +1,49 @@
+// { dg-do run { target c++11 } }
+
+char foo;
+
+// This one caught a call to gcc_unreachable in
+// cp/constexpr.cc:label_matches, when passed a convert_expr from the
+// cast in the call.
+constexpr unsigned char swbar(__UINTPTR_TYPE__ baz)
+{
+  switch (baz)
+{
+case 13:
+  return 11;
+case 14:
+  return 78;
+case 2048:
+  return 13;
+default:
+  return 42;
+}
+}
+
+// For reference, the equivalent* if-statements.
+constexpr unsigned char ifbar(__UINTPTR_TYPE__ baz)
+{
+  if (baz == 13)
+return 11;
+  else if (baz == 14)
+return 78;
+  else if (baz == 2048)
+return 13;
+  else
+return 42;
+}
+
+__attribute__ ((__noipa__))
+void xyzzy(int x)
+{
+  if (x != 42)
+__builtin_abort ();
+}
+
+int main()
+{
+  unsigned const char c = swbar(reinterpret_cast<__UINTPTR_TYPE__>(&foo));
+  xyzzy(c);
+  unsigned const char d = ifbar(reinterpret_cast<__UINTPTR_TYPE__>(&foo));
+  xyzzy(d);
+}
-- 
2.30.2
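
The idea behind the change ("whatever is not known to be constant does not
match") can be sketched outside the compiler.  The following is a
simplified model, not GCC code: expr_kind, expr_model and
case_label_matches are hypothetical names, and the real label_matches
works on trees and jump targets rather than on a struct like this.

```cpp
// Hypothetical stand-ins for tree codes; not GCC's real representation.
enum class expr_kind { integer_cst, convert_expr };

struct expr_model
{
  expr_kind kind;
  long value;   // only meaningful for integer_cst
};

// Return true only if SELECTOR is a known constant equal to LABEL.
// Anything that is not known to be constant simply does not match,
// instead of being treated as an impossible case.
constexpr bool
case_label_matches (const expr_model &selector, long label)
{
  return selector.kind == expr_kind::integer_cst && selector.value == label;
}

static_assert (case_label_matches (expr_model { expr_kind::integer_cst, 13 }, 13),
               "a constant selector matches its label");
static_assert (!case_label_matches (expr_model { expr_kind::convert_expr, 13 }, 13),
               "a non-constant selector never matches");
```

In this model a selector of an unexpected kind just yields "no match", so
the caller can conclude that the switch is not a constant expression.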



Re: [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions

2024-01-22 Thread Jan Hubicka
> I compared GCC master branch bootstrap and test times on a slow machine
> with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
> with the backported patch.  The performance data isn't precise since the
> measurements were done on different days with different GCC sources under
> different 6.6 kernel versions.
> 
> GCC master branch build time in seconds:
> 
> before          after           improvement
> 30043.75user    30013.16user    0%
> 1274.85system   1243.72system   2.4%
> 
> GCC master branch test time in seconds (new tests added):
> 
> before          after           improvement
> 216035.90user   216547.51user   0
> 27365.51system  26658.54system  2.6%

Interesting.  Does the system time difference come from a smaller
binary?  Is the difference significant?
> 
> gcc/
> 
>   PR target/38534
>   * config/i386/i386-options.cc (ix86_set_func_type): Don't
>   save and restore callee saved registers for a noreturn function
>   with nothrow or compiled with -fno-exceptions.

In general this looks like a good thing to do.  I wonder whether this is
something the middle-end should understand for all targets.
I also wonder about asynchronous stack unwinding.  If we want to unwind
the stack from an interrupt, then we may need some registers to be saved
(like the base pointer).

Honza
> 
> gcc/testsuite/
> 
>   PR target/38534
>   * gcc.target/i386/pr38534-1.c: New file.
>   * gcc.target/i386/pr38534-2.c: Likewise.
>   * gcc.target/i386/pr38534-3.c: Likewise.
>   * gcc.target/i386/pr38534-4.c: Likewise.
>   * gcc.target/i386/stack-check-17.c: Updated.
> ---
>  gcc/config/i386/i386-options.cc   | 16 ++--
>  gcc/testsuite/gcc.target/i386/pr38534-1.c | 26 +++
>  gcc/testsuite/gcc.target/i386/pr38534-2.c | 18 +
>  gcc/testsuite/gcc.target/i386/pr38534-3.c | 19 ++
>  gcc/testsuite/gcc.target/i386/pr38534-4.c | 18 +
>  .../gcc.target/i386/stack-check-17.c  | 19 +-
>  6 files changed, 102 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c
> 
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 0cdea30599e..f965568947c 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3371,9 +3371,21 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
>  static void
>  ix86_set_func_type (tree fndecl)
>  {
> +  /* No need to save and restore callee-saved registers for a noreturn
> + function with nothrow or compiled with -fno-exceptions.
> +
> + NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
> + function.  The local-pure-const pass turns an interrupt function
> + into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
> + the local-pure-const pass is run after ix86_set_func_type is called.
> + When the local-pure-const pass is enabled for LTO, the interrupt
> + function is marked as noreturn in the IR output, which leads the
> + incompatible attribute error in LTO1.  */
>bool has_no_callee_saved_registers
> -= lookup_attribute ("no_callee_saved_registers",
> - TYPE_ATTRIBUTES (TREE_TYPE (fndecl)));
> += (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> + && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
> +   || lookup_attribute ("no_callee_saved_registers",
> + TYPE_ATTRIBUTES (TREE_TYPE (fndecl))));
>  
>if (cfun->machine->func_type == TYPE_UNKNOWN)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/pr38534-1.c b/gcc/testsuite/gcc.target/i386/pr38534-1.c
> new file mode 100644
> index 000..9297959e759
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr38534-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
> +
> +#define ARRAY_SIZE 256
> +
> +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
> +extern int value (int, int, int)
> +#ifndef __x86_64__
> +__attribute__ ((regparm(3)))
> +#endif
> +;
> +
> +void
> +__attribute__((noreturn))
> +no_return_to_caller (void)
> +{
> +  unsigned i, j, k;
> +  for (i = ARRAY_SIZE; i > 0; --i)
> +for (j = ARRAY_SIZE; j > 0; --j)
> +  for (k = ARRAY_SIZE; k > 0; --k)
> + array[i - 1][j - 1][k - 1] = value (i, j, k);
> +  while (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "push" } } */
> +/* { dg-final { scan-assembler-not "pop" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr38534-2.c b/gcc/testsuite/gcc.target/i386/pr38534-2.c
> new file mode 100644
> index 000..1fb01363273
> --- 

Re: HELP: Questions on unshare_expr

2024-01-22 Thread Qing Zhao
One update: last Friday I merged all my patches for counted-by support
(including the patch that works around the LTO issue) with the latest trunk,
bootstrapped and ran the testsuite, and everything was fine.

Today, when I disabled the patch that works around the LTO issue, surprisingly
I could no longer reproduce the LTO issue with the latest trunk plus my
counted-by support patches.  That is, without the LTO workaround, everything
works just fine with the latest GCC.

I suspect that some change in the latest GCC “fixed” (or hid) the issue.
Qing

> On Jan 22, 2024, at 9:52 AM, Qing Zhao  wrote:
> 
> 
> 
>> On Jan 22, 2024, at 2:40 AM, Richard Biener  
>> wrote:
>> 
>> On Fri, Jan 19, 2024 at 5:26 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Jan 19, 2024, at 4:30 AM, Richard Biener  
 wrote:
 
 On Thu, Jan 18, 2024 at 3:46 PM Qing Zhao  wrote:
> 
> 
> 
>> On Jan 17, 2024, at 1:43 AM, Richard Biener  
>> wrote:
>> 
>> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>>  wrote:
>>> 
>>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
 
 
 
> On Jan 15, 2024, at 4:31 AM, Richard Biener 
>  wrote:
> 
>> All my questions for unshare_expr relate to an LTO bug that I am
>> currently stuck with when using .ACCESS_WITH_SIZE in the bounds
>> sanitizer (only with -flto; without -flto there is no issue):
>> 
>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>> during IPA pass: modref
>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not 
>> supported in LTO streams
>> 0x14c3993 lto_write_tree
>>../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>> 0x14c3aeb lto_output_tree_1
>> 
>> And the value of the tree node that triggered the ICE is:
>> (gdb) call debug_tree(expr)
>> 
>> nothrow
>> def_stmt
>> version:13 in-free-list>
>> 
>> Is there any good way to debug LTO bug?
> 
> This happens usually when you have a VLA type and its type fields are 
> not
> properly gimplified which usually happens because the frontend fails 
> to
> insert a gimplification point for it (a DECL_EXPR).
 
 I found an old gcc bug
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
 ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
 r11-3303-g6450f07388f9fe57
 
 Which is very similar to the bug I am having right now.
 
 After further study, I suspect that the issue I am having right now
 with the LTO streaming also relates to “unshare_expr”, “save_expr”, and
 the combination of the two; I suspect that the current gcc cannot handle
 this combination correctly for my case.
 
 My testing case is:
 
 #include <stdlib.h>
 void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, int m)
 {
 struct foo {
int n;
int p[][n2][n1] __attribute__((counted_by(n)));
 } *f;
 
 f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
 f->n = m;
 f->p[m][n2][n1]=1;
 return;
 }
 
 int main(int argc, char *argv[])
 {
 setup_and_test_vla (10, 11, 20);
 return 0;
 }
 
 Failed with
 my_gcc -Os -fsanitize=bounds -flto
 
 If changing either n1 or n2 to a constant, the testing passed.
 If deleting -flto, the testing passed too.
 
 I double checked my code per the suggestions provided by you and Jakub 
 in this
 email thread, and I think the code should be fine.
 
 The code is following:
 
 =
 /* Instrument array bounds for INDIRECT_REFs whose pointers are
    POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE.  We create special
    builtins that get expanded in the sanopt pass, and make an array
    dimension of it.  ARRAY is the pointer to the base of the array,
    which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to the
    beginning of array.
    Return NULL_TREE if no instrumentation is emitted.  */

 tree
 ubsan_instrument_bounds_indirect_ref (location_t loc, tree array, tree *offset)
 {
   if (!is_access_with_size_p (array))
     return NULL_TREE;
   tree bound = get_bound_from_access_with_size (array);
   /* The type of the call to .ACCESS_WITH_SIZE is a pointer type to
      the element of the array.  */
   tree element_size = TYPE_SIZE_UNIT (TREE_TYPE 

Re: [PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-22 Thread Richard Earnshaw (lists)
On 22/01/2024 12:18, Matthieu Longo wrote:
> rev16 pattern was not recognised anymore as a change in the bswap tree
> pass was introducing a new GIMPLE form, not recognized by the assembly
> final transformation pass.
> 
> More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933
> 
> gcc/ChangeLog:
> 
>     PR target/108933
>     * config/arm/arm.md (*arm_rev16si2_alt3): new pattern to convert
>   a bswap + rotate by 16 bits into rev16

ChangeLog entries need to be written as sentences, so start with a capital 
letter and end with a full stop; continuation lines should start in column 8 
(one hard tab, don't use spaces).  But in this case, "New pattern." is 
sufficient.

> 
> gcc/testsuite/ChangeLog:
> 
>     PR target/108933
>     * gcc.target/arm/rev16.c: Moved to...
>     * gcc.target/arm/rev16_1.c: ...here.
>     * gcc.target/arm/rev16_2.c: New test to check that rev16 is
>   emitted.


+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,r"))
+ (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+

Unfortunately, this is insufficient.  When generating Arm or Thumb2 code (but 
not thumb1) we also have to handle conditional execution: we need to have '%?' 
in the output template at the point where a condition code might be needed.  
That means we need separate output templates for all three alternatives (as we 
need a 16-bit variant for thumb2 that's conditional and a 16-bit for thumb1 
that isn't).  See the output of arm_rev16 for a guide of what is really needed.

I note that the arm_rev16si2_alt1, and arm_rev16si2_alt2 patterns are incorrect 
in this regard as well; that will need fixing.

I also see that arm_rev16si2 currently expands to the alt1 variant above; given 
that the preferred canonical form would now appear to use bswap + rotate, we 
should change that as well.  In fact, we can merge your new pattern with the 
expand entirely and eliminate the need to call gen_arm_rev16si2_alt1.  
Something like:

(define_insn "arm_rev16si2"
  [(set (match_operand:SI 0 "s_register_operand")
(rotate:SI (bswap:SI (match_operand:SI 1 "s_register_operand"))
           (const_int 16)))]
  "arm_arch6"
  "@
  rev16...
  ...


R.
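
To make the equivalence that the bswap + rotate form relies on concrete: a
byte swap of a 32-bit word followed by a rotate by 16 bits gives the same
result as swapping the two bytes within each halfword, which is what rev16
computes.  The following is only a sketch; the function names are made up
for illustration, and __builtin_bswap32 is assumed to be available (GCC and
Clang provide it).

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value by 16 bits (left and right rotates coincide here).
static inline uint32_t
rotate16 (uint32_t x)
{
  return (x << 16) | (x >> 16);
}

// Reference semantics of rev16: swap the two bytes in each 16-bit half.
static inline uint32_t
rev16_reference (uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

// The form discussed in the review: (rotate (bswap x) 16).
static inline uint32_t
rev16_bswap_rotate (uint32_t x)
{
  return rotate16 (__builtin_bswap32 (x));
}

// Check that both forms agree on a few sample values.
inline void
check_rev16_forms ()
{
  const uint32_t samples[] = { 0u, 0x11223344u, 0xdeadbeefu, 0xffffffffu,
                               0x000000ffu, 0x80000001u };
  for (uint32_t x : samples)
    assert (rev16_reference (x) == rev16_bswap_rotate (x));
}
```

For example, 0x11223344 byte-swaps to 0x44332211, and rotating that by 16
bits gives 0x22114433, the same value as swapping the bytes inside 0x1122
and 0x3344 separately.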



Re: [PATCH 2/3] aarch64: Re-parent trailing nondebug base reg uses [PR113089]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> While working on PR113089, I realised we were missing code to re-parent
> trailing nondebug uses of the base register in the case of cancelling
> writeback in the load/store pair pass.  This patch fixes that.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu (with/without
> the pass enabled), OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113089
>   * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
>   Update trailing nondebug uses of the base register in the case
>   of cancelling writeback.

OK, thanks.  I suppose in future it would be good to have a way of
stitching together the use lists of two consecutive defs, since that
should be possible in constant time.  But that's just a possible future
enhancement.  The update as it stands should still be linear, since all
updated uses would come before the next writeback candidate involving
the same base register.

Richard

> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 24 
>  1 file changed, 24 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 70b75c668ce..4d7fd72c6b1 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -1693,6 +1693,30 @@ ldp_bb_info::fuse_pair (bool load_p,
>  
>if (trailing_add)
>  changes.safe_push (make_delete (trailing_add));
> +  else if ((writeback & 2) && !writeback_effect)
> +{
> +  // The second insn initially had writeback but now the pair does not,
> +  // need to update any nondebug uses of the base register def in the
> +  // second insn.  We'll take care of debug uses later.
> +  auto def = find_access (insns[1]->defs (), base_regno);
> +  gcc_assert (def);
> +  auto set = dyn_cast<set_info *> (def);
> +  if (set && set->has_nondebug_uses ())
> + {
> +   auto orig_use = find_access (insns[0]->uses (), base_regno);
> +   for (auto use : set->nondebug_insn_uses ())
> + {
> +   auto change = make_change (use->insn ());
> +   change->new_uses = check_remove_regno_access (attempt,
> + change->new_uses,
> + base_regno);
> +   change->new_uses = insert_access (attempt,
> + orig_use,
> + change->new_uses);
> +   changes.safe_push (change);
> + }
> + }
> +}
>  
>auto is_changing = insn_is_changing (changes);
>for (unsigned i = 0; i < changes.length (); i++)


Re: [PATCH 2/2] find_base_value part

2024-01-22 Thread Jeff Law




On 1/15/24 06:34, Richard Biener wrote:

The following adjusts find_base_value in the same way that
find_base_term was adjusted for PR113255.

* alias.cc (known_base_value_p): Remove.
(find_base_value): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

OK.
jeff


Re: [PATCH 1/3] rtl-ssa: Provide easier access to debug uses [PR113089]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> This patch adds some accessors to set_info and use_info to make it
> easier to get at and iterate through uses in debug insns.
>
> It is used by the aarch64 load/store pair fusion pass in a subsequent
> patch to fix PR113089, i.e. to update debug uses in the pass.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu (with/without
> the load/store pair pass enabled), OK for trunk?
>
> gcc/ChangeLog:
>
>   PR target/113089
>   * rtl-ssa/accesses.h (use_info::next_debug_insn_use): New.
>   (debug_insn_use_iterator): New.
>   (set_info::first_debug_insn_use): New.
>   (set_info::debug_insn_uses): New.
>   * rtl-ssa/member-fns.inl (use_info::next_debug_insn_use): New.
>   (set_info::first_debug_insn_use): New.
>   (set_info::debug_insn_uses): New.

OK, thanks.

Richard

> ---
>  gcc/rtl-ssa/accesses.h | 13 +
>  gcc/rtl-ssa/member-fns.inl | 29 +
>  2 files changed, 42 insertions(+)
>
> diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
> index 6a3ecd32848..c57b8a8b7b5 100644
> --- a/gcc/rtl-ssa/accesses.h
> +++ b/gcc/rtl-ssa/accesses.h
> @@ -357,6 +357,10 @@ public:
>//next_use () && next_use ()->is_in_any_insn () ? next_use () : nullptr
>use_info *next_any_insn_use () const;
>  
> +  // Return the next use by a debug instruction, or null if none.
> +  // This is only valid if is_in_debug_insn ().
> +  use_info *next_debug_insn_use () const;
> +
>// Return the previous use by a phi node in the list, or null if none.
>//
>// This is only valid if is_in_phi ().  It is equivalent to:
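> @@ -458,6 +462,8 @@ using reverse_use_iterator = list_iterator<use_info, &use_info::prev_use>;
>  // of use in the same definition.
>  using nondebug_insn_use_iterator
>= list_iterator<use_info, &use_info::next_nondebug_insn_use>;
> +using debug_insn_use_iterator
> +  = list_iterator<use_info, &use_info::next_debug_insn_use>;
>  using any_insn_use_iterator
>= list_iterator<use_info, &use_info::next_any_insn_use>;
>  using phi_use_iterator = list_iterator<use_info, &use_info::prev_phi_use>;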
> @@ -680,6 +686,10 @@ public:
>use_info *first_nondebug_insn_use () const;
>use_info *last_nondebug_insn_use () const;
>  
> +  // Return the first use of the set by debug instructions, or null if
> +  // there is no such use.
> +  use_info *first_debug_insn_use () const;
> +
>// Return the first use of the set by any kind of instruction, or null
>// if there are no such uses.  The uses are in the order described above.
>use_info *first_any_insn_use () const;
> @@ -731,6 +741,9 @@ public:
>// List the uses of the set by nondebug instructions, in reverse postorder.
>iterator_range<nondebug_insn_use_iterator> nondebug_insn_uses () const;
>  
> +  // List the uses of the set by debug instructions, in reverse postorder.
> +  iterator_range<debug_insn_use_iterator> debug_insn_uses () const;
> +
>// Return nondebug_insn_uses () in reverse order.
>iterator_range<reverse_use_iterator> reverse_nondebug_insn_uses () const;
>  
> diff --git a/gcc/rtl-ssa/member-fns.inl b/gcc/rtl-ssa/member-fns.inl
> index 8e1c17ced95..e4825ad2a18 100644
> --- a/gcc/rtl-ssa/member-fns.inl
> +++ b/gcc/rtl-ssa/member-fns.inl
> @@ -119,6 +119,15 @@ use_info::next_any_insn_use () const
>return nullptr;
>  }
>  
> +inline use_info *
> +use_info::next_debug_insn_use () const
> +{
> +  if (auto use = next_use ())
> +if (use->is_in_debug_insn ())
> +  return use;
> +  return nullptr;
> +}
> +
>  inline use_info *
>  use_info::prev_phi_use () const
>  {
> @@ -212,6 +221,20 @@ set_info::last_nondebug_insn_use () const
>return nullptr;
>  }
>  
> +inline use_info *
> +set_info::first_debug_insn_use () const
> +{
> +  use_info *use;
> +  if (has_nondebug_insn_uses ())
> +use = last_nondebug_insn_use ()->next_use ();
> +  else
> +use = first_use ();
> +
> +  if (use && use->is_in_debug_insn ())
> +return use;
> +  return nullptr;
> +}
> +
>  inline use_info *
>  set_info::first_any_insn_use () const
>  {
> @@ -310,6 +333,12 @@ set_info::nondebug_insn_uses () const
>return { first_nondebug_insn_use (), nullptr };
>  }
>  
> +inline iterator_range<debug_insn_use_iterator>
> +set_info::debug_insn_uses () const
> +{
> +  return { first_debug_insn_use (), nullptr };
> +}
> +
>  inline iterator_range<reverse_use_iterator>
>  set_info::reverse_nondebug_insn_uses () const
>  {
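
The new accessors rely on the uses of a set being kept in one list, with
the debug-insn uses following the nondebug-insn uses.  A stand-alone sketch
of that walk is below.  It is a toy model only: use_node, first_debug_use,
next_debug_use and count_debug_uses_in are hypothetical names, phi uses
coming last is an assumption of the model, and the sketch scans linearly
for the first debug use where the patch steps directly from the last
nondebug use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a use list; not the real rtl-ssa data structures.
enum class use_kind { nondebug_insn, debug_insn, phi };

struct use_node
{
  use_kind kind;
  use_node *next;
};

// Return the next use if it is a debug-insn use, else null.
const use_node *
next_debug_use (const use_node *u)
{
  const use_node *n = u->next;
  return (n && n->kind == use_kind::debug_insn) ? n : nullptr;
}

// Skip the leading nondebug uses and return the first debug use, if any.
const use_node *
first_debug_use (const use_node *head)
{
  const use_node *u = head;
  while (u && u->kind == use_kind::nondebug_insn)
    u = u->next;
  return (u && u->kind == use_kind::debug_insn) ? u : nullptr;
}

// Build a list with the given kinds (in order) and count its debug uses.
int
count_debug_uses_in (const std::vector<use_kind> &kinds)
{
  std::vector<use_node> nodes;
  nodes.reserve (kinds.size ());
  for (use_kind k : kinds)
    nodes.push_back (use_node { k, nullptr });
  for (std::size_t i = 0; i + 1 < nodes.size (); ++i)
    nodes[i].next = &nodes[i + 1];

  int n = 0;
  const use_node *head = nodes.empty () ? nullptr : &nodes[0];
  for (const use_node *u = first_debug_use (head); u; u = next_debug_use (u))
    ++n;
  return n;
}

// Self-check: two debug uses sit between the nondebug uses and the phi use.
void
check_debug_use_walk ()
{
  assert (count_debug_uses_in ({ use_kind::nondebug_insn,
                                 use_kind::nondebug_insn,
                                 use_kind::debug_insn,
                                 use_kind::debug_insn,
                                 use_kind::phi }) == 2);
}
```

Because the walk stops at the first use that is not in a debug insn, a
null end point is enough to terminate the range, as in debug_insn_uses ().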


Re: [PATCH] aarch64: Don't record hazards against paired insns [PR113356]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> Hi,
>
> For the testcase in the PR, we try to pair insns where the first has
> writeback and the second uses the updated base register.  This causes us
> to record a hazard against the second insn, thus narrowing the move
> range away from the end of the BB.
>
> However, it isn't meaningful to record hazards against the other insn
> in the pair, as this doesn't change which pairs can be formed, and also
> doesn't change where the pair is formed (from the perspective of
> nondebug insns).
>
> To see why this is the case, consider the two cases:
>
>  - Suppose we are finding hazards for insns[0].  If we record a hazard
>against insns[1], then range.last becomes
>insns[1]->prev_nondebug_insn (), but note that this is equivalent to
>inserting after insns[1] (since insns[1] is being changed).
>  - Now consider finding hazards for insns[1].  Suppose we record
>insns[0] as a hazard.  Then we set range.first = insns[0], which is a
>no-op.
>
> As such, it seems better to never record hazards against the other insn
> in the pair, as we check whether the insns themselves are suitable for
> combination separately (e.g. for ldp checking that they use distinct
> transfer registers).  Avoiding unnecessarily narrowing the move range
> avoids unnecessarily re-ordering over debug insns.
>
> This should also mean that we can only narrow the move range away from
> the end of the BB in the case that we record a hazard for insns[0]
> against insns[1]->prev_nondebug_insn () or earlier.  This means that for
> the non-call-exceptions case, either the move range includes insns[1],
> or we reject the pair (thus the assert tripped in the PR should always
> hold).
>
> Bootstrapped/regtested on aarch64-linux-gnu with/without ldp passes
> enabled on top of the PR113070 fixes, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113356
>   * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::try_fuse_pair):
>   Don't record hazards against the opposite insn in the pair.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/113356
>   * gcc.target/aarch64/pr113356.C: New test.

OK, thanks.

Richard
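
The move-range argument in the quoted description can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the pass's code: an insn is modelled as its integer position in the basic block, there are no debug insns (so prev_nondebug_insn () becomes position - 1), and the names move_range, record_hazard_for_first and record_hazard_for_second are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: an insn is just its position in the basic block, and
   there are no debug insns, so "prev_nondebug_insn" is position - 1.
   All names here are hypothetical, not taken from the ldp fusion pass.  */
struct move_range
{
  int first;
  int last;
};

/* Record HAZARD while looking for hazards for insns[0]: the end of the
   range is clamped to the insn before the hazard.  If SKIP_PARTNER is
   set, a hazard against PARTNER (insns[1]) is ignored.  */
static void
record_hazard_for_first (struct move_range *r, int hazard, int partner,
                         bool skip_partner)
{
  assert (r->first <= r->last);
  if (skip_partner && hazard == partner)
    return;
  if (hazard - 1 < r->last)
    r->last = hazard - 1;
}

/* Record HAZARD while looking for hazards for insns[1]: the start of
   the range is clamped to the hazard itself.  If SKIP_PARTNER is set,
   a hazard against PARTNER (insns[0]) is ignored.  */
static void
record_hazard_for_second (struct move_range *r, int hazard, int partner,
                          bool skip_partner)
{
  assert (r->first <= r->last);
  if (skip_partner && hazard == partner)
    return;
  if (hazard > r->first)
    r->first = hazard;
}
```

With insns[0] at position 10 and insns[1] at position 20, recording the partner as a hazard for insns[0] narrows the end of the range to 19, while skipping it leaves the range untouched.  Recording insns[0] as a hazard for insns[1] never changes the start of the range in this model.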


Re: [Patch] xfail libgomp.c/declare-variant-4-{fiji,gfx803}.c

2024-01-22 Thread Andrew Stubbs
On Fri, 19 Jan 2024 at 18:27, Tobias Burnus  wrote:

> The problem is as described at
> https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa
>
> "Note that support for Fiji devices has been removed in ROCm 4.0 and
> support in LLVM is deprecated and will be removed in LLVM 18."
>
> Therefore, GCC is no longer built with Fiji (gfx803) support by default,
> and the -march=fiji testcases now fail as the -lgomp multilib for
> Fiji is not available. (That is: they fail, unless Fiji support has been
> enabled manually.)
>
> Andrew mentioned that there is a PR about this, but I couldn't find it.
> If someone can, I am happy to add it to the changelog.
>
> OK for mainline?
>

OK. There's probably a bikeshed to paint here, but the tests are destined
to get deleted, so whatever.

Andrew


Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> As the PR shows (specifically #c7) we are missing updating uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass.  This
> patch fixes that.
>
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe.  If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
>
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes.  This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour).  If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient.  The structure of the
> code would also be a little unwieldy.
>
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order.  That state machine is
> responsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
>
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
>
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
>
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113070
>   * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
>   (decide_stp_strategy): Rename to ...
>   (try_repurpose_store): ... this.
>   (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
>   construct stp changes.  Fix up uses when inserting new stp insns.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
>  1 file changed, 194 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
>return range;
>  }
>  
> +// Class that implements a state machine for building the changes needed to form
> +// a store pair instruction.  This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> +  enum class state
> +  {
> +FIRST,
> +INSERT,
> +FIXUP_USE,
> +LAST,
> +DONE
> +  };
> +
> +  enum class action
> +  {
> +TOMBSTONE,
> +CHANGE,
> +INSERT,
> +FIXUP_USE
> +  };
> +
> +  struct change
> +  {
> +action type;
> +insn_info *insn;
> +  };
> +
> +  bool done () const { return m_state == state::DONE; }
> +
> +  stp_change_builder (insn_info *insns[2],
> +   insn_info *repurpose,
> +   insn_info *dest)
> +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}

Just to make sure I understand: is it the case that

  *insns[0] <= *dest <= *insns[1]

?
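
For readers following the review, the idea of stepping through the changes in program order can be sketched in plain C.  This is a simplified illustration only: the state names follow the quoted patch, but the transitions out of INSERT and FIXUP_USE are an assumption (the quoted advance () is cut off), and fixups_left is a hypothetical stand-in for walking the uses of mem.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified sketch of a change-ordering state machine.  The state names
   follow the quoted patch; the transitions out of INSERT and FIXUP_USE
   are an assumption, since the quoted advance () is cut off.  */
enum stp_state { ST_FIRST, ST_INSERT, ST_FIXUP_USE, ST_LAST, ST_DONE };

struct stp_builder
{
  enum stp_state state;
  bool repurpose;      /* One of the candidate stores is reused.  */
  int fixups_left;     /* Hypothetical count of mem uses to fix up.  */
};

static void
stp_builder_init (struct stp_builder *b, bool repurpose, int fixups)
{
  assert (fixups >= 0);
  b->state = ST_FIRST;
  b->repurpose = repurpose;
  b->fixups_left = fixups;
}

/* Move to the next step, visiting changes in program order.  */
static void
stp_builder_advance (struct stp_builder *b)
{
  switch (b->state)
    {
    case ST_FIRST:
      b->state = b->repurpose ? ST_LAST : ST_INSERT;
      break;
    case ST_INSERT:
      b->state = b->fixups_left > 0 ? ST_FIXUP_USE : ST_LAST;
      break;
    case ST_FIXUP_USE:
      if (--b->fixups_left == 0)
        b->state = ST_LAST;
      break;
    case ST_LAST:
      b->state = ST_DONE;
      break;
    case ST_DONE:
      break;
    }
}

/* Count the steps a driver loop would process before reaching DONE.  */
static int
stp_builder_count_steps (bool repurpose, int fixups)
{
  struct stp_builder b;
  int steps = 0;
  stp_builder_init (&b, repurpose, fixups);
  while (b.state != ST_DONE)
    {
      steps++;
      stp_builder_advance (&b);
    }
  return steps;
}
```

A driver in the style of fuse_pair would loop until the builder is done and act on each step in turn, so the changes come out already sorted and nothing has to be inserted into the middle of a vector.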

> +
> +  change get_change () const
> +  {
> +switch (m_state)
> +  {
> +  case state::FIRST:
> + return {
> +   m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +   m_insns[0]
> + };
> +  case state::LAST:
> + return {
> +   m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +   m_insns[1]
> + };
> +  case state::INSERT:
> + return { action::INSERT, m_dest };
> +  case state::FIXUP_USE:
> + return { action::FIXUP_USE, m_use->insn () };
> +  case state::DONE:
> + break;
> +  }
> +
> +gcc_unreachable ();
> +  }
> +
> +  // Transition to the next state.
> +  void advance ()
> +  {
> +switch (m_state)
> +  {
> +  case state::FIRST:
> + if (m_repurpose)
> +   m_state = state::LAST;
> + else
> +   m_state = state::INSERT;
> + break;
> +  case state::INSERT:
> +  {
> + def_info *def = memory_access (m_insns[0]->defs ());
> + while 

Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-22 Thread Jeff Law

On 1/22/24 00:45, Richard Biener wrote:
> On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay wrote:
>> On 18.01.24 at 20:54, Roger Sayle wrote:
>>> This patch tweaks RTL expansion of multi-word shifts and rotates to use
>>> PLUS rather than IOR for disjunctive operations.  During expansion of
>>> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
>>> where the constants C1 and C2 guarantee that bits don't overlap.
>>> Hence the IOR can be performed by any any_or_plus operation, such as
>>> IOR, XOR or PLUS; for word-size operations where carry chains aren't
>>> an issue these should all be equally fast (single-cycle) instructions.
>>> The benefit of this change is that targets with shift-and-add insns,
>>> like x86's lea, can benefit from the LSHIFT-ADD form.
>>>
>>> An example of a backend that benefits is ARC, which is demonstrated
>>> by these two simple functions:
>>
>> But there are also back-ends where this is bad.
>>
>> The reason is that with ORI, the back-end needs only to operate on
>> those sub-words where the sub-mask is non-zero.  But for PLUS this
>> is not the case because the back-end does not know that intermediate
>> carry will be zero.  Hence, with PLUS, more instructions are needed.
>> An example is AVR, but maybe many more targets with multi-word operations
>> are affected in a bad way.
>>
>> Take for example the case with 2 words and a value of 1.
>>
>> LO |= 1
>> HI |= 0
>>
>> can be optimized to
>>
>> LO |= 1
>>
>> but for addition this is not the case:
>>
>> LO += 1
>> HI +=c 0 ;; Does not know that always carry = 0.
>
> I wonder if the PLUS can be done on the lowpart only to make this
> detail obvious?

In theory, yes.  This class of problems has often been punted to the
target expanders (far from ideal).

I still suspect the way forward here is to have the exp* code query one
or more target properties to guide IOR vs PLUS selection.


Jeff
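
As a small illustration of why IOR and PLUS are interchangeable in the shift expansion discussed in this thread, here is a hedged C sketch of the high word of a double-word left shift.  The function names are hypothetical and the code models the arithmetic only, not what any backend emits.

```c
#include <assert.h>
#include <stdint.h>

/* High word of a 64-bit left shift by N built from two 32-bit words,
   for 0 < N < 32.  The two partial results occupy disjoint bits.  */
static uint32_t
shl_hi_ior (uint32_t hi, uint32_t lo, unsigned n)
{
  assert (n > 0 && n < 32);
  return (hi << n) | (lo >> (32 - n));
}

/* Same computation with PLUS: no bit is set in both operands, so no
   carry is ever generated and the result is identical.  */
static uint32_t
shl_hi_plus (uint32_t hi, uint32_t lo, unsigned n)
{
  assert (n > 0 && n < 32);
  return (hi << n) + (lo >> (32 - n));
}
```

The equality only says the two forms compute the same value.  It does not address the cost concern raised above: a target that splits the word further cannot tell from a PLUS that the carry between sub-words is always zero.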



Re: [PATCH 1/2] x86: Add no_callee_saved_registers function attribute

2024-01-22 Thread H.J. Lu
On Sun, Jan 21, 2024 at 8:03 PM Hongtao Liu  wrote:
>
> On Sat, Jan 20, 2024 at 10:30 PM H.J. Lu  wrote:
> >
> > When an interrupt handler is implemented by an assembly stub which does:
> >
> > 1. Save all registers.
> > 2. Call a C function.
> > 3. Restore all registers.
> > 4. Return from interrupt.
> >
> > it is completely unnecessary to save and restore any registers in the C
> > function called by the assembly stub, even if they would normally be
> > callee-saved.
> >
> > Add no_callee_saved_registers function attribute, which is complementary
> > to no_caller_saved_registers function attribute, to mark a function which
> > doesn't have any callee-saved registers.  Such a function won't save and
> > restore any registers.  Classify function call-saved register handling
> > type with:
> >
> > 1. Default call-saved registers.
> > 2. No caller-saved registers with no_caller_saved_registers attribute.
> > 3. No callee-saved registers with no_callee_saved_registers attribute.
> >
> > Disallow sibcall if callee is a no_callee_saved_registers function
> > and caller isn't a no_callee_saved_registers function.  Otherwise,
> > callee-saved registers won't be preserved.
> >
> > After a no_callee_saved_registers function is called, all registers may
> > be clobbered.  If the calling function isn't a no_callee_saved_registers
> > function, we need to preserve all registers which aren't used by function
> > calls.
> >
> > gcc/
> >
> > PR target/103503
> > PR target/113312
> > * config/i386/i386-expand.cc (ix86_expand_call): Set
> > call_no_callee_saved_registers to true when calling function
> > with no_callee_saved_registers attribute.  Replace
> > no_caller_saved_registers check with call_saved_registers check.
> > * config/i386/i386-options.cc (ix86_set_func_type): Set
> > call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
> > noreturn function.  Disallow no_callee_saved_registers with
> > interrupt or no_caller_saved_registers attributes together.
> > (ix86_set_current_function): Replace no_caller_saved_registers
> > check with call_saved_registers check.
> > (ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
> > (ix86_handle_call_saved_registers_attribute): This.
> > (ix86_gnu_attributes): Add
> > ix86_handle_call_saved_registers_attribute.
> > * config/i386/i386.cc (ix86_conditional_register_usage): Replace
> > no_caller_saved_registers check with call_saved_registers check.
> > (ix86_function_ok_for_sibcall): Don't allow callee with
> > no_callee_saved_registers attribute when the calling function
> > has callee-saved registers.
> > (ix86_comp_type_attributes): Also check
> > no_callee_saved_registers.
> > (ix86_epilogue_uses): Replace no_caller_saved_registers check
> > with call_saved_registers check.
> > (ix86_hard_regno_scratch_ok): Likewise.
> > (ix86_save_reg): Replace no_caller_saved_registers check with
> > call_saved_registers check.  Don't save any registers for
> > TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
> > TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
> > no_callee_saved_registers attribute is called.
> > (find_drap_reg): Replace no_caller_saved_registers check with
> > call_saved_registers check.
> > * config/i386/i386.h (call_saved_registers_type): New enum.
> > (machine_function): Replace no_caller_saved_registers with
> > call_saved_registers.  Add call_no_callee_saved_registers.
> > * doc/extend.texi: Document no_callee_saved_registers attribute.
> >
> > gcc/testsuite/
> >
> > PR target/103503
> > PR target/113312
> > * gcc.dg/torture/no-callee-saved-run-1a.c: New file.
> > * gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
> > * gcc.target/i386/no-callee-saved-1.c: Likewise.
> > * gcc.target/i386/no-callee-saved-2.c: Likewise.
> > * gcc.target/i386/no-callee-saved-3.c: Likewise.
> > * gcc.target/i386/no-callee-saved-4.c: Likewise.
> > * gcc.target/i386/no-callee-saved-5.c: Likewise.
> > * gcc.target/i386/no-callee-saved-6.c: Likewise.
> > * gcc.target/i386/no-callee-saved-7.c: Likewise.
> > * gcc.target/i386/no-callee-saved-8.c: Likewise.
> > * gcc.target/i386/no-callee-saved-9.c: Likewise.
> > * gcc.target/i386/no-callee-saved-10.c: Likewise.
> > * gcc.target/i386/no-callee-saved-11.c: Likewise.
> > * gcc.target/i386/no-callee-saved-12.c: Likewise.
> > * gcc.target/i386/no-callee-saved-13.c: Likewise.
> > * gcc.target/i386/no-callee-saved-14.c: Likewise.
> > * gcc.target/i386/no-callee-saved-15.c: Likewise.
> > * gcc.target/i386/no-callee-saved-16.c: Likewise.
> > * 

Re: [PATCH] arm: Fix parsecpu.awk for aliases [PR113030]

2024-01-22 Thread Richard Earnshaw (lists)
On 21/01/2024 07:29, Andrew Pinski wrote:
> So the problem here is the 2 functions check_cpu and check_arch use
> the wrong variable to check if an alias is valid for that cpu/arch.
> check_cpu uses cpu_optaliases instead of cpu_opt_alias. cpu_optaliases
> is an array indexed by the cpuname that contains all of the valid aliases
> for that cpu, but cpu_opt_alias is a doubly indexed array, indexed
> by cpuname and the alias, which records what that alias stands for.
> A similar thing happens for check_arch and arch_optaliases vs arch_opt_alias.
> 
> Tested by running:
> ```
> awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+simd" config/arm/arm-cpus.in
> awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon" config/arm/arm-cpus.in
> awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon-vfpv3" config/arm/arm-cpus.in
> ```
> None of these commands returns an error.
> 
> gcc/ChangeLog:
> 
>   PR target/113030
>   * config/arm/parsecpu.awk (check_cpu): Use cpu_opt_alias
>   instead of cpu_optaliases.
>   (check_arch): Use arch_opt_alias instead of arch_optaliases.

OK

Thanks,

R.

> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/arm/parsecpu.awk | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
> index ddd4f3b440a..384462bdb5b 100644
> --- a/gcc/config/arm/parsecpu.awk
> +++ b/gcc/config/arm/parsecpu.awk
> @@ -529,7 +529,7 @@ function check_cpu (name) {
>  
>  for (n = 2; n <= exts; n++) {
>   if (!((cpu_name, extensions[n]) in cpu_opt_remove)  \
> - && !((cpu_name, extensions[n]) in cpu_optaliases)) {
> + && !((cpu_name, extensions[n]) in cpu_opt_alias)) {
>   return "error"
>   }
>  }
> @@ -552,7 +552,7 @@ function check_arch (name) {
>  
>  for (n = 2; n <= exts; n++) {
>   if (!((extensions[1], extensions[n]) in arch_opt_remove)\
> - && !((extensions[1], extensions[n]) in arch_optaliases)) {
> + && !((extensions[1], extensions[n]) in arch_opt_alias)) {
>   return "error"
>   }
>  }



[PATCH v2 1/2] x86: Add no_callee_saved_registers function attribute

2024-01-22 Thread H.J. Lu
When an interrupt handler is implemented by an assembly stub which does:

1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers.  Such a function won't save and
restore any registers.  Classify function call-saved register handling
type with:

1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.

Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function.  Otherwise,
callee-saved registers won't be preserved.

After a no_callee_saved_registers function is called, all registers may
be clobbered.  If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
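
A minimal sketch of how the C function called by such an assembly stub might be declared.  The macro NO_CALLEE_SAVED and the function handle_irq are hypothetical names, and the feature test is there so that the file still builds with compilers that do not provide the attribute; the assembly stub itself is not shown.

```c
#include <assert.h>

/* Use the attribute only where the compiler reports support for it;
   elsewhere the macro expands to nothing and the function is ordinary.
   NO_CALLEE_SAVED and handle_irq are hypothetical names.  */
#if defined (__has_attribute)
# if __has_attribute (no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__ ((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED
#endif

static unsigned long irq_count;

/* C body of an interrupt handler.  In the scenario described above an
   assembly stub (not shown) saves and restores every register around
   the call, so this function need not preserve any of them.  */
NO_CALLEE_SAVED unsigned long
handle_irq (unsigned long vector)
{
  assert (vector < 256);
  irq_count += vector;
  return irq_count;
}
```

When such a function is called from ordinary C code instead of from the stub, the description above implies that the caller has to assume all registers may be clobbered across the call.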

gcc/

PR target/103503
PR target/113312
* config/i386/i386-expand.cc (ix86_expand_call): Set
call_no_callee_saved_registers to true when calling function
with no_callee_saved_registers attribute.  Replace
no_caller_saved_registers check with call_saved_registers check.
Clobber all registers that are not used by the callee with
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Set
call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
noreturn function.  Disallow no_callee_saved_registers with
interrupt or no_caller_saved_registers attributes together.
(ix86_set_current_function): Replace no_caller_saved_registers
check with call_saved_registers check.
(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
(ix86_handle_call_saved_registers_attribute): This.
(ix86_gnu_attributes): Add
ix86_handle_call_saved_registers_attribute.
* config/i386/i386.cc (ix86_conditional_register_usage): Replace
no_caller_saved_registers check with call_saved_registers check.
(ix86_function_ok_for_sibcall): Don't allow callee with
no_callee_saved_registers attribute when the calling function
has callee-saved registers.
(ix86_comp_type_attributes): Also check
no_callee_saved_registers.
(ix86_epilogue_uses): Replace no_caller_saved_registers check
with call_saved_registers check.
(ix86_hard_regno_scratch_ok): Likewise.
(ix86_save_reg): Replace no_caller_saved_registers check with
call_saved_registers check.  Don't save any registers for
TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
no_callee_saved_registers attribute is called.
(find_drap_reg): Replace no_caller_saved_registers check with
call_saved_registers check.
* config/i386/i386.h (call_saved_registers_type): New enum.
(machine_function): Replace no_caller_saved_registers with
call_saved_registers.  Add call_no_callee_saved_registers.
* doc/extend.texi: Document no_callee_saved_registers attribute.

gcc/testsuite/

PR target/103503
PR target/113312
* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
* gcc.target/i386/no-callee-saved-1.c: Likewise.
* gcc.target/i386/no-callee-saved-2.c: Likewise.
* gcc.target/i386/no-callee-saved-3.c: Likewise.
* gcc.target/i386/no-callee-saved-4.c: Likewise.
* gcc.target/i386/no-callee-saved-5.c: Likewise.
* gcc.target/i386/no-callee-saved-6.c: Likewise.
* gcc.target/i386/no-callee-saved-7.c: Likewise.
* gcc.target/i386/no-callee-saved-8.c: Likewise.
* gcc.target/i386/no-callee-saved-9.c: Likewise.
* gcc.target/i386/no-callee-saved-10.c: Likewise.
* gcc.target/i386/no-callee-saved-11.c: Likewise.
* gcc.target/i386/no-callee-saved-12.c: Likewise.
* gcc.target/i386/no-callee-saved-13.c: Likewise.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-16.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/no-callee-saved-18.c: Likewise.
---
 gcc/config/i386/i386-expand.cc| 58 +--
 gcc/config/i386/i386-options.cc   | 49 +
 gcc/config/i386/i386.cc   | 70 +++
 gcc/config/i386/i386.h| 20 +-
 

[PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions

2024-01-22 Thread H.J. Lu
There is no need to save callee-saved registers in noreturn functions
that are nothrow or are compiled with -fno-exceptions.  We can treat them
the same as functions with the no_callee_saved_registers attribute.

Adjust stack-check-17.c for noreturn function which no longer saves any
registers.

With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
        endbr64
        push   %r15
        push   %r14
        mov    %rcx,%r14
        push   %r13
        push   %r12
        push   %rbp
        mov    %esi,%ebp
        push   %rbx
        mov    %rdx,%rbx
        sub    $0x28,%rsp
        mov    %rdi,(%rsp)
        mov    %fs:0x28,%rax
        mov    %rax,0x18(%rsp)
        xor    %eax,%eax
        test   %r9,%r9

to

__libc_start_main:
        endbr64
        sub    $0x28,%rsp
        mov    %esi,%ebp
        mov    %rdx,%rbx
        mov    %rcx,%r14
        mov    %rdi,(%rsp)
        mov    %fs:0x28,%rax
        mov    %rax,0x18(%rsp)
        xor    %eax,%eax
        test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        #
        test   $0x2,%ah
        je

to

do_exit:
        endbr64
        call
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        #
        test   $0x2,%ah
        je

I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before            after             improvement
30043.75user      30013.16user      0%
1274.85system     1243.72system     2.4%

GCC master branch test time in seconds (new tests added):

before            after             improvement
216035.90user     216547.51user     0%
27365.51system    26658.54system    2.6%

gcc/

PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Don't
save and restore callee saved registers for a noreturn function
with nothrow or compiled with -fno-exceptions.

gcc/testsuite/

PR target/38534
* gcc.target/i386/pr38534-1.c: New file.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/stack-check-17.c: Updated.
---
 gcc/config/i386/i386-options.cc   | 16 ++--
 gcc/testsuite/gcc.target/i386/pr38534-1.c | 26 +++
 gcc/testsuite/gcc.target/i386/pr38534-2.c | 18 +
 gcc/testsuite/gcc.target/i386/pr38534-3.c | 19 ++
 gcc/testsuite/gcc.target/i386/pr38534-4.c | 18 +
 .../gcc.target/i386/stack-check-17.c  | 19 +-
 6 files changed, 102 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 0cdea30599e..f965568947c 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3371,9 +3371,21 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
 static void
 ix86_set_func_type (tree fndecl)
 {
+  /* No need to save and restore callee-saved registers for a noreturn
+ function with nothrow or compiled with -fno-exceptions.
+
+ NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
+ function.  The local-pure-const pass turns an interrupt function
+ into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
+ the local-pure-const pass is run after ix86_set_func_type is called.
+ When the local-pure-const pass is enabled for LTO, the interrupt
+ function is marked as noreturn in the IR output, which leads the
+ incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-= lookup_attribute ("no_callee_saved_registers",
-   TYPE_ATTRIBUTES (TREE_TYPE (fndecl)));
+= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
+   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
+   || lookup_attribute 

[PATCH v2 0/2] x86: Don't save callee-saved registers if not needed

2024-01-22 Thread H.J. Lu
Changes in v2:

1. Rebase against commit f9df00340e3
2. Don't add redundant clobbered_registers check in ix86_expand_call.

In some cases, there is no need to save callee-saved registers:

1. If a noreturn function is nothrow or is compiled with -fno-exceptions,
it can skip saving callee-saved registers.

2. When an interrupt handler is implemented by an assembly stub which does:

  1. Save all registers.
  2. Call a C function.
  3. Restore all registers.
  4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

This patch set adds no_callee_saved_registers function attribute, which
is complementary to no_caller_saved_registers function attribute, to
classify x86 backend call-saved register handling type with

  1. Default call-saved registers.
  2. No caller-saved registers with no_caller_saved_registers attribute.
  3. No callee-saved registers with no_callee_saved_registers attribute.

Functions of the no callee-saved registers type won't save callee-saved
registers.  If a noreturn function is nothrow or is compiled with
-fno-exceptions, it is
classified as the no callee-saved registers type.
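
The classification described above, together with the sibcall restriction from patch 1/2, can be modelled in a few lines of C.  This is a sketch under stated assumptions, not the backend's code: the enumerators and the functions classify and sibcall_ok are hypothetical names, the inputs are plain booleans instead of tree queries, and only the single sibcall rule mentioned in the patch description is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the three call-saved register handling types listed above.
   The enumerator and function names are hypothetical.  */
enum call_saved_kind
{
  KIND_DEFAULT,            /* Default call-saved registers.  */
  KIND_NO_CALLER_SAVED,    /* no_caller_saved_registers attribute.  */
  KIND_NO_CALLEE_SAVED     /* no_callee_saved_registers attribute.  */
};

/* Classify a function from a few boolean properties.  A noreturn
   function that is nothrow, or is built without exception support,
   is treated like one carrying no_callee_saved_registers.  */
static enum call_saved_kind
classify (bool attr_no_callee_saved, bool attr_no_caller_saved,
          bool noreturn_p, bool nothrow_p, bool exceptions_enabled)
{
  bool no_callee_saved
    = (attr_no_callee_saved
       || (noreturn_p && (nothrow_p || !exceptions_enabled)));

  /* The combination with no_caller_saved_registers is not covered by
     this sketch.  */
  assert (!(no_callee_saved && attr_no_caller_saved));

  if (no_callee_saved)
    return KIND_NO_CALLEE_SAVED;
  if (attr_no_caller_saved)
    return KIND_NO_CALLER_SAVED;
  return KIND_DEFAULT;
}

/* Only the rule from the patch description: a sibcall to a
   no-callee-saved function is rejected unless the caller is one too.  */
static bool
sibcall_ok (enum call_saved_kind caller, enum call_saved_kind callee)
{
  return !(callee == KIND_NO_CALLEE_SAVED
           && caller != KIND_NO_CALLEE_SAVED);
}
```

The reason for the sibcall rule is visible in the model: a default caller promises its own caller that callee-saved registers survive, and a tail call into a function that clobbers them would break that promise.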

With these changes, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
        endbr64
        push   %r15
        push   %r14
        mov    %rcx,%r14
        push   %r13
        push   %r12
        push   %rbp
        mov    %esi,%ebp
        push   %rbx
        mov    %rdx,%rbx
        sub    $0x28,%rsp
        mov    %rdi,(%rsp)
        mov    %fs:0x28,%rax
        mov    %rax,0x18(%rsp)
        xor    %eax,%eax
        test   %r9,%r9

to

__libc_start_main:
        endbr64
        sub    $0x28,%rsp
        mov    %esi,%ebp
        mov    %rdx,%rbx
        mov    %rcx,%r14
        mov    %rdi,(%rsp)
        mov    %fs:0x28,%rax
        mov    %rax,0x18(%rsp)
        xor    %eax,%eax
        test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        #
        test   $0x2,%ah
        je

to

do_exit:
        endbr64
        call
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        #
        test   $0x2,%ah
        je

I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before            after             improvement
30043.75user      30013.16user      0%
1274.85system     1243.72system     2.4%

GCC master branch test time in seconds (new tests added):

before            after             improvement
216035.90user     216547.51user     0%
27365.51system    26658.54system    2.6%

Backported to GCC 13 to rebuild system glibc and kernel on Fedora 39.
Systems perform normally.

H.J. Lu (2):
  x86: Add no_callee_saved_registers function attribute
  x86: Don't save callee-saved registers in noreturn functions

 gcc/config/i386/i386-expand.cc| 58 +--
 gcc/config/i386/i386-options.cc   | 61 
 gcc/config/i386/i386.cc   | 70 +++
 gcc/config/i386/i386.h| 20 +-
 gcc/doc/extend.texi   |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 ++
 .../gcc.dg/torture/no-callee-saved-run-1b.c   | 59 
 .../gcc.target/i386/no-callee-saved-1.c   | 30 
 .../gcc.target/i386/no-callee-saved-10.c  | 46 
 .../gcc.target/i386/no-callee-saved-11.c  | 11 +++
 .../gcc.target/i386/no-callee-saved-12.c  | 10 +++
 .../gcc.target/i386/no-callee-saved-13.c  | 16 +
 .../gcc.target/i386/no-callee-saved-14.c  | 16 +
 .../gcc.target/i386/no-callee-saved-15.c  | 17 +
 .../gcc.target/i386/no-callee-saved-16.c  | 16 +
 .../gcc.target/i386/no-callee-saved-17.c  | 16 +
 .../gcc.target/i386/no-callee-saved-18.c  | 51 ++
 .../gcc.target/i386/no-callee-saved-2.c   | 30 
 .../gcc.target/i386/no-callee-saved-3.c   |  8 +++
 .../gcc.target/i386/no-callee-saved-4.c   |  8 +++
 .../gcc.target/i386/no-callee-saved-5.c   | 11 +++
 

Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-22 Thread Jeff Law

On 1/15/24 06:34, Richard Biener wrote:
> When the x86 backend generates code for cpymem with the rep_8byte
> strategy for the 8 byte aligned main rep movq it needs to compute
> an adjusted pointer to the source after doing a prologue aligning
> the destination.  It computes that via
>
>    src_ptr + (dest_ptr - orig_dest_ptr)
>
> which is perfectly fine.  On RTL this is then
>
>  8: r134:DI=const(`g'+0x44)
>  9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
>       REG_UNUSED flags:CC
> 56: r129:DI=const(`g'+0x4c)
> 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
>       REG_UNUSED flags:CC
>       REG_EQUAL const(`g'+0x4c)&0xfff8
> 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
>       REG_DEAD r134:DI
>       REG_UNUSED flags:CC
>       REG_EQUAL const(`g'+0x44)-r129:DI
> 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
>       REG_DEAD r133:DI
>       REG_UNUSED flags:CC
>
> but as written find_base_term happily picks the first candidate
> it finds for the MINUS which means it picks const(`g') rather
> than the correct frame:DI.  The way find_base_term (but also
> the unfixed find_base_value used by init_alias_analysis to
> initialize REG_BASE_VALUE) performs pointer analysis isn't
> sound.  The following restricts the handling of multi-operand
> operations to the case where we know only one operand can be a pointer.
>
> This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
> RTL PRE (I've opened PR113395 for this).  A more drastic patch,
> removing base_alias_check, results in only gcc.dg/guality/pr41447-1.c
> regressing (so testsuite coverage is bad).  I've looked at
> gcc.dg/tree-ssa tests and mostly scheduling changes are present,
> the cc1plus .text size is only 230 bytes worse.  With the
> less drastic patch below most scheduling changes are gone.
>
> x86_64 might not be the very best target to test for impact, but
> test coverage on other targets is unlikely to be very much better.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> with 2/2).  Jeff, can you maybe throw this on your tester?
> Jakub, you did the PR64025 fix which was for a similar issue.

No issues across the cross compilers with those two patches.

Jeff
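
The pointer computation at the heart of this report can be written out in C.  This is only an illustrative sketch with hypothetical function names (align_dest_8, adjust_src); it shows the source-level arithmetic, not the RTL the backend generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance DEST to the next 8-byte boundary, as a copy prologue would.  */
static char *
align_dest_8 (char *dest)
{
  uintptr_t misalign = (uintptr_t) dest & 7;
  return misalign ? dest + (8 - misalign) : dest;
}

/* Source pointer matching an adjusted destination:
   src_ptr + (dest_ptr - orig_dest_ptr).  The difference of the two
   destination addresses is a plain byte count, not a pointer, so the
   result still points into the source object.  */
static const char *
adjust_src (const char *src, const char *dest, const char *orig_dest)
{
  ptrdiff_t skipped = dest - orig_dest;
  assert (skipped >= 0 && skipped < 8);
  return src + skipped;
}
```

An analysis that takes the first address-like operand of the subtraction as the base of the whole expression would conclude that the result is based on the destination, which is the kind of wrong answer described in the quoted message.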


Re: HELP: Questions on unshare_expr

2024-01-22 Thread Qing Zhao


> On Jan 22, 2024, at 2:40 AM, Richard Biener  
> wrote:
> 
> On Fri, Jan 19, 2024 at 5:26 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jan 19, 2024, at 4:30 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Thu, Jan 18, 2024 at 3:46 PM Qing Zhao  wrote:
 
 
 
> On Jan 17, 2024, at 1:43 AM, Richard Biener  
> wrote:
> 
> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>  wrote:
>> 
>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Jan 15, 2024, at 4:31 AM, Richard Biener 
  wrote:
 
> All my questions for unshare_expr relate to an LTO bug that I am
> currently stuck with when using .ACCESS_WITH_SIZE in the bound sanitizer
> (only with -flto; without -flto, no issue):
> 
> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
> during IPA pass: modref
> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in LTO streams
> 0x14c3993 lto_write_tree
> ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
> 0x14c3aeb lto_output_tree_1
> 
> And the value of the tree node that triggered the ICE is:
> (gdb) call debug_tree(expr)
> 
> nothrow
> def_stmt
> version:13 in-free-list>
> 
> Is there any good way to debug an LTO bug?
 
 This happens usually when you have a VLA type and its type fields are not
 properly gimplified, which usually happens because the frontend fails to
 insert a gimplification point for it (a DECL_EXPR).
>>> 
>>> I found an old gcc bug
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>>> ICE: tree code ‘ssa_name’ is not supported in LTO streams since r11-3303-g6450f07388f9fe57
>>> 
>>> Which is very similar to the bug I am having right now.
>>> 
>>> After further study, I suspect that the issue I am having right now with
>>> the LTO streaming also relates to “unshare_expr”, “save_expr”, and the
>>> combination of these two; I suspect that the current gcc cannot handle
>>> the combination of these two correctly for my case.
>>> 
>>> My testing case is:
>>> 
>>> #include <stdlib.h>
>>> void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, 
>>> int m)
>>> {
>>> struct foo {
>>> int n;
>>> int p[][n2][n1] __attribute__((counted_by(n)));
>>> } *f;
>>> 
>>> f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
>>> f->n = m;
>>> f->p[m][n2][n1]=1;
>>> return;
>>> }
>>> 
>>> int main(int argc, char *argv[])
>>> {
>>> setup_and_test_vla (10, 11, 20);
>>> return 0;
>>> }
>>> 
>>> Failed with
>>> my_gcc -Os -fsanitize=bounds -flto
>>> 
>>> If changing either n1 or n2 to a constant, the testing passed.
>>> If deleting -flto, the testing passed too.
>>> 
>>> I double checked my code per the suggestions provided by you and Jakub 
>>> in this
>>> email thread, and I think the code should be fine.
>>> 
>>> The code is as follows:
>>> 
>>> =
>>> /* Instrument array bounds for INDIRECT_REFs whose pointers are
>>>    POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE. We create special
>>>    builtins that get expanded in the sanopt pass, and make an array
>>>    dimension of it.  ARRAY is the pointer to the base of the array,
>>>    which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to the
>>>    beginning of array.
>>>    Return NULL_TREE if no instrumentation is emitted.  */
>>>
>>> tree
>>> ubsan_instrument_bounds_indirect_ref (location_t loc, tree array,
>>>                                       tree *offset)
>>> {
>>>   if (!is_access_with_size_p (array))
>>>     return NULL_TREE;
>>>   tree bound = get_bound_from_access_with_size (array);
>>>   /* The type of the call to .ACCESS_WITH_SIZE is a pointer type to
>>>      the element of the array.  */
>>>   tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array)));
>>>   gcc_assert (bound);
>>>
>>>   /* Given the offset, and the size of each element, the index can be
>>>      computed as: offset/element_size.  */
>>>   *offset = save_expr (*offset);
>>>   tree index = fold_build2 (EXACT_DIV_EXPR,
>>>                             sizetype, *offset,
>>>                             unshare_expr (element_size));
>>>   /* Create a "(T *) 0" tree node to describe the original array type.
>>>      We get the original array type from the first argument of the call to
>>>      .ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, num_bytes, -1).
>>>
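
For illustration only, the index computation described in the quoted comment (offset divided by element size) can be sketched outside the compiler. The helper names below are hypothetical and are not GCC functions; the sketch only mirrors the arithmetic for an element of type int[n2][n1], as in the test case above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not GCC code): size in bytes of one int[n2][n1]
// element of the flexible array member in the quoted test case.
std::size_t element_size_for(std::size_t n1, std::size_t n2)
{
  return n2 * n1 * sizeof(int);
}

// Hypothetical helper (not GCC code): recover the array index from a byte
// offset, as the quoted comment describes (offset / element_size).
std::size_t index_from_offset(std::size_t offset, std::size_t element_size)
{
  // EXACT_DIV_EXPR is a division whose remainder is known to be zero, so
  // the sketch checks that property explicitly.
  assert(element_size != 0 && offset % element_size == 0);
  return offset / element_size;
}
```

With n1 = 10, n2 = 11 and m = 20, as in the test case, the offset of p[m] from the start of the array is m elements, so the recovered index is m.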

Re: [PATCH] c/c++: Tweak warning for 'always_inline function might not be inlinable'

2024-01-22 Thread Hans-Peter Nilsson
> From: Richard Biener 
> Date: Mon, 22 Jan 2024 08:33:47 +0100

> > -   "%<always_inline%> function might not be inlinable");
> > +   "%<always_inline%> function is not always inlined"
> > +   " unless also declared %<inline%>");
> 
> I don't like the "is not always inlined", maybe simply reword to
> 
>   "%<always_inline%> function might not be inlinable"
>   " unless also declared %<inline%>"
> 
> ?

Sure.  Though it's a small nuance to which I don't actually
agree, I'll go along with almost anything as long as the
"...declared inline" augmentation is there :-)

Also, I can see that keeping closer to the original wording
as you suggest can be preferable to some.

I assume by your reply that the patch is ok with that change
but will wait another 72 hours for "native speakers" to have
a say.

Thanks!

brgds, H-P
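
For readers unfamiliar with the situation the warning describes, here is a minimal sketch using the GNU attribute syntax. The function names are made up for illustration; the snippet shows the declaration form that the reworded message points users towards, with the warned-about form only described in a comment.

```cpp
#include <cassert>

// Declared both always_inline and inline: the combination that the
// reworded warning text recommends.
__attribute__((always_inline)) inline int add_one(int x)
{
  return x + 1;
}

// Dropping the inline keyword, i.e.
//   __attribute__((always_inline)) int add_one(int x) { ... }
// is the kind of declaration the warning is about.

// A caller of the always_inline function.
int use_add_one()
{
  int r = add_one(41);
  assert(r == 42);
  return r;
}
```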


Re: Ping: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-22 Thread Richard Biener
On Mon, Jan 22, 2024 at 2:24 PM Richard Sandiford
 wrote:
>
> Ping for the expr/cfgexpand bits
>
> Richard Sandiford  writes:
> > Andrew Pinski  writes:
> >> Ccmp is not used if the result of the and/ior is used by both
> >> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> >> here by using ccmp in this case.
> >> Two changes are required: first, we need to allow the outer statement's
> >> result to be used more than once.
> >> The second change is that during the expansion of the gimple, we need
> >> to try using ccmp. This is needed because we don't expand the ssa
> >> name of the lhs but rather expand directly from the gimple.
> >>
> >> A small note on the ccmp_4.c testcase, we should be able to get slightly
> >> better than with this patch but it is one extra instruction compared to
> >> before.
> >>
> >> Diff from v1:
> >> * v2: Split out expand_gimple_assign_ssa so that we only need to handle
> >> promotion once. Add ccmp_5.c testcase which was suggested. Change comment
> >> on ccmp_candidate_p.
> >
> > I meant more that we should split out the gassign handling in
> > expand_expr_real_1, since we're effectively making cfgexpand follow
> > it more closely.  What do you think about the attached version?
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> >
> > OK for the expr/cfgexpand bits?

OK.

Richard.

> > Thanks,
> > Richard
> >
> > 
> >
> > Ccmp is not used if the result of the and/ior is used by both
> > a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> > here by using ccmp in this case.
> > Two changes are required: first, we need to allow the outer statement's
> > result to be used more than once.
> > The second change is that during the expansion of the gimple, we need
> > to try using ccmp. This is needed because we don't expand the ssa
> > name of the lhs but rather expand directly from the gimple.
> >
> > A small note on the ccmp_4.c testcase, we should be able to get slightly
> > better than with this patch but it is one extra instruction compared to
> > before.
> >
> >   PR target/100942
> >
> > gcc/ChangeLog:
> >
> >   * ccmp.cc (ccmp_candidate_p): Add outer argument.
> >   Allow if the outer is true and the lhs is used more
> >   than once.
> >   (expand_ccmp_expr): Update call to ccmp_candidate_p.
> >   * expr.h (expand_expr_real_gassign): Declare.
> >   * expr.cc (expand_expr_real_gassign): New function, split out from...
> >   (expand_expr_real_1): ...here.
> >   * cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/ccmp_3.c: New test.
> >   * gcc.target/aarch64/ccmp_4.c: New test.
> >   * gcc.target/aarch64/ccmp_5.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > Co-authored-by: Richard Sandiford 
> > ---
> >  gcc/ccmp.cc   |  12 +--
> >  gcc/cfgexpand.cc  |  31 ++-
> >  gcc/expr.cc   | 103 --
> >  gcc/expr.h|   3 +
> >  gcc/testsuite/gcc.target/aarch64/ccmp_3.c |  20 +
> >  gcc/testsuite/gcc.target/aarch64/ccmp_4.c |  35 
> >  gcc/testsuite/gcc.target/aarch64/ccmp_5.c |  20 +
> >  7 files changed, 149 insertions(+), 75 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_5.c
> >
> > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> > index 09d6b5595a4..7cb525addf4 100644
> > --- a/gcc/ccmp.cc
> > +++ b/gcc/ccmp.cc
> > @@ -90,9 +90,10 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
> > If all checks OK in expand_ccmp_expr, it emits insns in prep_seq, then
> > insns in gen_seq.  */
> >
> > -/* Check whether G is a potential conditional compare candidate.  */
> > +/* Check whether G is a potential conditional compare candidate; OUTER is true if
> > +   G is the outer most AND/IOR.  */
> >  static bool
> > -ccmp_candidate_p (gimple *g)
> > +ccmp_candidate_p (gimple *g, bool outer = false)
> >  {
> >tree lhs, op0, op1;
> >gimple *gs0, *gs1;
> > @@ -109,8 +110,9 @@ ccmp_candidate_p (gimple *g)
> >lhs = gimple_assign_lhs (g);
> >op0 = gimple_assign_rhs1 (g);
> >op1 = gimple_assign_rhs2 (g);
> > -  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
> > -  || !has_single_use (lhs))
> > +  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
> > +return false;
> > +  if (!outer && !has_single_use (lhs))
> >  return false;
> >
> >bb = gimple_bb (g);
> > @@ -284,7 +286,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode)
> >rtx_insn *last;
> >rtx tmp;
> >
> > -  if (!ccmp_candidate_p (g))
> > +  if (!ccmp_candidate_p (g, true))
> >  return NULL_RTX;
> >
> >last = get_last_insn ();
> > diff --git a/gcc/cfgexpand.cc 
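
To make the commit message concrete, a function of roughly the following shape has an AND whose result feeds both an assignment and a condition. This is an illustrative sketch only: it is not the content of ccmp_3.c, the function name is made up, and whether a given compiler emits ccmp for it is not verified here.

```cpp
#include <cassert>

// Hypothetical example: the result of the AND is stored through OUT (an
// assignment) and also tested by the if (a condition), so it has two uses.
int both_uses(int a, int b, int *out)
{
  assert(out != nullptr);
  int t = (a == 0) & (b > 1);
  *out = t;
  if (t)
    return 10;
  return 20;
}
```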

Re: [PATCH 3/4] rtl-ssa: Ensure new defs get inserted [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to
> RTL-SSA for inserting new insns, which included support for users
> creating new defs.
>
> However, I missed that apply_changes_to_insn needed updating to ensure
> that the new defs actually got inserted into the main def chain.  This
> meant that when the aarch64 ldp/stp pass inserted a new stp insn, the
> stp would just get skipped over during subsequent alias analysis, as its
> def never got inserted into the memory def chain.  This (unsurprisingly)
> led to wrong code.
>
> This patch fixes the issue by ensuring new user-created defs get
> inserted.  I would have preferred to have used a flag internal to the
> defs instead of a separate data structure to keep track of them, but since
> machine_mode increased to 16 bits we're already at 64 bits in access_info,
> and we can't really reuse m_is_temp as the logic in finalize_new_accesses
> requires it to get cleared.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113070
>   * rtl-ssa.h: Include hash-set.h.
>   * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add
>   new_sets parameter and use it to keep track of new user-created sets.
>   (function_info::apply_changes_to_insn): Also call add_def on new sets.
>   (function_info::change_insns): Add hash_set to keep track of new
>   user-created defs.  Plumb it through.
>   * rtl-ssa/functions.h: Add hash_set parameter to finalize_new_accesses
>   and apply_changes_to_insn.
> ---
>  gcc/rtl-ssa.h   |  1 +
>  gcc/rtl-ssa/changes.cc  | 28 +---
>  gcc/rtl-ssa/functions.h |  6 --
>  3 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/rtl-ssa.h b/gcc/rtl-ssa.h
> index f0cf656f5ac..17337639ae8 100644
> --- a/gcc/rtl-ssa.h
> +++ b/gcc/rtl-ssa.h
> @@ -50,6 +50,7 @@
>  #include "mux-utils.h"
>  #include "rtlanal.h"
>  #include "cfgbuild.h"
> +#include "hash-set.h"
>  
>  // Provides the global crtl->ssa.
>  #include "memmodel.h"
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index ce51d6ccd8d..6119ec3535b 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -429,7 +429,8 @@ update_insn_in_place (insn_change &change)
>  // POS gives the final position of INSN, which hasn't yet been moved into
>  // place.

The new parameter should be documented.  How about:

  // place.  NEW_SETS contains the new set_infos that are being added as part
  // of this change (as opposed to being moved or repurposed from existing
  // instructions).


>  void
> -function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
> +function_info::finalize_new_accesses (insn_change &change, insn_info *pos,
> +                                      hash_set<def_info *> &new_sets)
>  {
>insn_info *insn = change.insn ();
>  
> @@ -465,6 +466,12 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
>   // later in case we see a second write to the same resource.
>   def_info *perm_def = allocate<set_info> (change.insn (),
>                                            def->resource ());
> +
> + // Keep track of the new set so we remember to add it to the
> + // def chain later.
> + if (new_sets.add (perm_def))
> +   gcc_unreachable (); // We shouldn't see duplicates here.
> +
>   def->set_last_def (perm_def);
>   def = perm_def;
> }
> @@ -647,7 +654,8 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
>  // Copy information from CHANGE to its underlying insn_info, given that
>  // the insn_info has already been placed appropriately.

Similarly here.

OK with those changes, thanks.

Richard

>  void
> -function_info::apply_changes_to_insn (insn_change &change)
> +function_info::apply_changes_to_insn (insn_change &change,
> +                                      hash_set<def_info *> &new_sets)
>  {
>insn_info *insn = change.insn ();
>if (change.is_deletion ())
> @@ -659,10 +667,11 @@ function_info::apply_changes_to_insn (insn_change &change)
>// Copy the cost.
>insn->set_cost (change.new_cost);
>  
> -  // Add all clobbers.  Sets and call clobbers never move relative to
> -  // other definitions, so are OK as-is.
> +  // Add all clobbers and newly-created sets.  Existing sets and call
> +  // clobbers never move relative to other definitions, so are OK as-is.
>for (def_info *def : change.new_defs)
> -if (is_a<clobber_info *> (def) && !def->is_call_clobber ())
> +if ((is_a<clobber_info *> (def) && !def->is_call_clobber ())
> + || (is_a<set_info *> (def) && new_sets.contains (def)))
>add_def (def);
>  
>// Add all uses, now that their position is final.
> @@ -793,6 +802,10 @@ function_info::change_insns (array_slice<insn_change *> changes)
>placeholders[i] = placeholder;
>  }
>  
> +  // We need to keep track of newly-added sets as these need 
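
As a rough, self-contained model of what the patch does (not rtl-ssa code), the bookkeeping can be sketched with std::unordered_set standing in for hash_set. The toy_def type and both helper names are hypothetical; only the shape of the condition follows the quoted hunk.

```cpp
#include <cassert>
#include <unordered_set>

// Toy stand-in for rtl-ssa's def_info hierarchy; hypothetical.
struct toy_def
{
  bool is_clobber;       // false means the def is a set
  bool is_call_clobber;
};

// Record DEF as newly created by the current change group.  Seeing the
// same def twice would be a logic error, which the patch expresses with
// gcc_unreachable ().
void record_new_set(std::unordered_set<const toy_def *> &new_sets,
                    const toy_def *def)
{
  bool inserted = new_sets.insert(def).second;
  assert(inserted);
  (void) inserted;
}

// Shape of the patched condition: add clobbers other than call clobbers,
// plus any set that was newly created by this change group.
bool needs_add_def(const toy_def *def,
                   const std::unordered_set<const toy_def *> &new_sets)
{
  if (def->is_clobber)
    return !def->is_call_clobber;
  return new_sets.count(def) != 0;
}
```

In this model an existing set is left alone, which matches the comment in the hunk that existing sets never move relative to other definitions.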

Re: [committed] libstdc++: Fix std::format for floating-point chrono::time_point [PR113500]

2024-01-22 Thread Jonathan Wakely
On Mon, 22 Jan 2024 at 09:51, Jonathan Wakely  wrote:
>
> On Sun, 21 Jan 2024 at 22:27, Jonathan Wakely wrote:
> > --- a/libstdc++-v3/testsuite/std/time/clock/file/io.cc
> > +++ b/libstdc++-v3/testsuite/std/time/clock/file/io.cc
> > @@ -17,6 +17,23 @@ test_ostream()
> >VERIFY( ss1.str() == ss2.str() );
> >  }
> >
> > +void
> > +test_format()
> > +{
> > +  using namespace std::chrono;
> > +  auto t = file_clock::now();
> > +
> > +  auto s = std::format("{}", t);
> > +  std::ostringstream ss;
> > +  ss << t;
> > +  VERIFY( s == ss.str() );
> > +
> > +  // PR libstdc++/113500
> > +  auto ft = clock_cast<file_clock>(sys_days(2024y/January/21)) + 0ms + 2.5s;
> > +  s = std::format("{}", ft);
> > +  VERIFY( s == "2024-01-17 00:00:02.500");
>
> Well obviously that should be 2024-01-21 and I should add a call to
> test_format() in main() so it actually runs.

And I did the same in the gps/io.cc test!

I'll push this fix for that one:

--- a/libstdc++-v3/testsuite/std/time/clock/gps/io.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/gps/io.cc
@@ -42,7 +42,7 @@ test_format()

  // PR libstdc++/113500
  s = std::format("{}", gt + 150ms + 10.5s);
-  VERIFY( s == "2000-01-01 00:00:35.650" );
+  VERIFY( s == "2000-01-01 00:00:23.650" );
}

void
@@ -65,5 +65,6 @@ test_parse()
int main()
{
  test_ostream();
+  test_format();
  test_parse();
}
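
As a sanity check on the corrected string, the seconds field can be reproduced with plain duration arithmetic. This sketch assumes (the definition is not shown in the quoted hunk) that gt is 2000-01-01 00:00:00 UTC converted to gps_clock, where GPS time was 13 s ahead of UTC; the function name is made up for illustration.

```cpp
#include <cassert>
#include <chrono>

// Milliseconds past midnight expected in the formatted GPS time, under the
// assumption stated above about how gt was constructed.
double expected_gps_millis()
{
  using namespace std::chrono_literals;
  // Assumed 13 s GPS-UTC offset, plus the two addends from the test.
  auto total = 13s + 150ms + 10.5s;
  return std::chrono::duration<double, std::milli>(total).count();
}
```

Under that assumption the result is 23650 ms, which corresponds to the 00:00:23.650 in the corrected VERIFY line.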



Re: [PATCH 2/4] rtl-ssa: Support for creating new uses [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> This exposes an interface for users to create new uses in RTL-SSA.
> This is needed for updating uses after inserting a new store pair insn
> in the aarch64 load/store pair fusion pass.
>
> gcc/ChangeLog:
>
>   PR target/113070
>   * rtl-ssa/accesses.cc (function_info::create_use): New.
>   * rtl-ssa/changes.cc (function_info::finalize_new_accesses):
>   Handle temporary uses, ensure new uses end up referring to
>   permanent defs.
>   * rtl-ssa/functions.h (function_info::create_use): Declare.
> ---
>  gcc/rtl-ssa/accesses.cc | 10 ++
>  gcc/rtl-ssa/changes.cc  | 24 +++-
>  gcc/rtl-ssa/functions.h |  5 +
>  3 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
> index ce4a8b8dc00..3f1304fc5bf 100644
> --- a/gcc/rtl-ssa/accesses.cc
> +++ b/gcc/rtl-ssa/accesses.cc
> @@ -1466,6 +1466,16 @@ function_info::create_set (obstack_watermark &watermark,
>return set;
>  }
>  
> +use_info *
> +function_info::create_use (obstack_watermark &watermark,
> +                           insn_info *insn,
> +                           set_info *set)
> +{
> +  auto use = change_alloc<use_info> (watermark, insn, set->resource (), set);
> +  use->m_is_temp = true;
> +  return use;
> +}
> +
>  // Return true if ACCESS1 can represent ACCESS2 and if ACCESS2 can
>  // represent ACCESS1.
>  static bool
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index e538b637848..ce51d6ccd8d 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -538,7 +538,9 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
>unsigned int i = 0;
>for (use_info *use : change.new_uses)
>  {
> -  if (!use->m_has_been_superceded)
> +  if (use->m_is_temp)
> + use->m_has_been_superceded = true;
> +  else if (!use->m_has_been_superceded)
>   {

Is this part necessary for correctness, or is it just a compile-time
optimisation?  We already have temporary uses via make_uses_available,
and in principle, it's possible to reuse the uses for multiple changes
within the same group.  E.g. when replacing A with B in multiple
instructions, it's OK for the associated insn changes to refer to
A's uses directly, or to uses created for A by make_uses_available.

So IMO it'd be better to drop this hunk if we can.

> use = allocate_temp<use_info> (insn, use->resource (), use->def ());
> use->m_has_been_superceded = true;
> @@ -609,15 +611,27 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
> m_temp_uses[i] = use = allocate<use_info> (*use);
> use->m_is_temp = false;
> set_info *def = use->def ();
> -   // Handle cases in which the value was previously not used
> -   // within the block.
> -   if (def && def->m_is_temp)
> +   if (!def || !def->m_is_temp)
> + continue;
> +
> +   if (auto phi = dyn_cast<phi_info *> (def))
>   {
> -   phi_info *phi = as_a<phi_info *> (def);
> +   // Handle cases in which the value was previously not used
> +   // within the block.
> gcc_assert (phi->is_degenerate ());
> phi = create_degenerate_phi (phi->ebb (), phi->input_value (0));
> use->set_def (phi);
>   }
> +   else
> + {
> +   // The temporary def may also be a set added with this change, in
> +   // which case the permanent set is stored in the last_def link,
> +   // and we need to update the use to refer to the permanent set.
> +   gcc_assert (is_a<set_info *> (def));
> +   auto perm_set = as_a<set_info *> (def->last_def ());
> +   gcc_assert (!perm_set->is_temporary ());
> +   use->set_def (perm_set);
> + }
>   }
>  }
>  
> diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
> index 58d0b50ea83..962180e27d6 100644
> --- a/gcc/rtl-ssa/functions.h
> +++ b/gcc/rtl-ssa/functions.h
> @@ -73,6 +73,11 @@ public:
>   insn_info *insn,
>   resource_info resource);
>  
> +  // Create a temporary use.

How about something like:

  // Create a temporary use of SET as part of a change to INSN.
  // SET can be a pre-existing definition or one that is being created
  // as part of the same change group.

(Feel free to tweak the wording.)

OK with those changes, thanks.

Richard

> +  use_info *create_use (obstack_watermark &watermark,
> + insn_info *insn,
> + set_info *set);
> +
>// Create a temporary insn with code INSN_CODE and pattern PAT.
>    insn_info *create_insn (obstack_watermark &watermark,
> rtx_code insn_code,


Re: [PATCH] RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.x

2024-01-22 Thread Robin Dapp
LGTM.

Regards
 Robin


Re: [PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

2024-01-22 Thread Robin Dapp
> No, we didn't undo the optimization.
> 
> We just disallow move pattern for (set (reg) (VL_REGNUM)).

Ah, what I referred to was the opposite direction.  We allow
(subreg:V8QI (reg:DI ...)) which is not touched by this patch.

Then it is OK.

Regards
 Robin


Re: [PATCH v5 1/1] RISC-V: Add support for XCVbi extension in CV32E40P

2024-01-22 Thread Mary Bennett


On 09/01/2024 18:43, Jeff Law wrote:
>
> On 1/8/24 06:14, Mary Bennett wrote:
>> Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md
>>
>> Contributors:
>>    Mary Bennett
>>    Nandni Jamnadas
>>    Pietra Ferreira
>>    Charlie Keaney
>>    Jessica Mills
>>    Craig Blackmore
>>    Simon Cook
>>    Jeremy Bennett
>>    Helene Chelin
>>
>> gcc/ChangeLog:
>> * common/config/riscv/riscv-common.cc: Create XCVbi extension
>>   support.
>> * config/riscv/riscv.opt: Likewise.
>> * config/riscv/corev.md: Implement cv_branch pattern
>>   for cv.beqimm and cv.bneimm.
>> * config/riscv/riscv.md: Add CORE-V branch immediate to RISC-V
>>   branch instruction pattern.
>> * config/riscv/constraints.md: Implement constraints
>>   cv_bi_s5 - signed 5-bit immediate.
>> * config/riscv/predicates.md: Implement predicate
>>   const_int5s_operand - signed 5 bit immediate.
>> * doc/sourcebuild.texi: Add XCVbi documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
>> * gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
>> * gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
>> * gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
>> * lib/target-supports.exp: Add proc for XCVbi.
>
> Assuming this has gone through a testing cycle, this is fine for the
> trunk.
>
> Thanks,
> jeff


This patch passes regression. Are there any other changes required 
before it can be merged?



Kind regards,

Mary
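
For context on the "signed 5-bit immediate" mentioned in the ChangeLog above, the range check can be sketched as follows. Both functions are hypothetical illustrations and are not GCC's predicate or any of the new tests; whether a compiler selects an immediate-branch instruction for the second function is not verified here.

```cpp
#include <cassert>

// Hypothetical helper: true if V fits in a signed 5-bit immediate,
// i.e. lies in the range [-16, 15].
constexpr bool fits_simm5(long v)
{
  return v >= -16 && v <= 15;
}

// A comparison against a small constant feeding a branch, the general
// kind of source that a branch-with-immediate pattern targets.
int classify(int x)
{
  if (x == 7)   // 7 fits in a signed 5-bit immediate
    return 1;
  return 0;
}
```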





Re: [PATCH 1/4] rtl-ssa: Run finalize_new_accesses forwards [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> The next patch in this series exposes an interface for creating new uses
> in RTL-SSA.  The intent is that new user-created uses can consume new
> user-created defs in the same change group.  This is so that we can
> correctly update uses of memory when inserting a new store pair insn in
> the aarch64 load/store pair fusion pass (the affected uses need to
> consume the new store pair insn).
>
> As it stands, finalize_new_accesses is called as part of the backwards
> insn placement loop within change_insns, but if we want new uses to be
> able to depend on new defs in the same change group, we need
> finalize_new_accesses to be called on earlier insns first.  This is so
> that when we process temporary uses and turn them into permanent uses,
> we can follow the last_def link on the temporary def to ensure we end up
> with a permanent use consuming a permanent def.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113070
>   * rtl-ssa/changes.cc (function_info::change_insns): Split out the call
>   to finalize_new_accesses from the backwards placement loop, run it
>   forwards in a separate loop.

OK, thanks.

Richard

> ---
>  gcc/rtl-ssa/changes.cc | 21 -
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index 2fac45ae885..e538b637848 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -775,15 +775,26 @@ function_info::change_insns (array_slice<insn_change *> changes)
> placeholder = add_placeholder_after (after);
> following_insn = placeholder;
>   }
> -
> -   // Finalize the new list of accesses for the change.  Don't install
> -   // them yet, so that we still have access to the old lists below.
> -   finalize_new_accesses (change,
> -  placeholder ? placeholder : insn);
>   }
>placeholders[i] = placeholder;
>  }
>  
> +  // Finalize the new list of accesses for each change.  Don't install them yet,
> +  // so that we still have access to the old lists below.
> +  //
> +  // Note that we do this forwards instead of in the backwards loop above so
> +  // that any new defs being inserted are processed before new uses of those
> +  // defs, so that the (initially) temporary uses referring to temporary defs
> +  // can be easily updated to become permanent uses referring to permanent defs.
> +  for (unsigned i = 0; i < changes.size (); i++)
> +{
> +  insn_change &change = *changes[i];
> +  insn_info *placeholder = placeholders[i];
> +  if (!change.is_deletion ())
> + finalize_new_accesses (change,
> +placeholder ? placeholder : change.insn ());
> +}
> +
>// Remove all definitions that are no longer needed.  After the above,
>// the only uses of such definitions should be dead phis and now-redundant
>// live-out uses.
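
The ordering argument in the patch description can be modelled with a toy example. This is not rtl-ssa code: the toy_change type and the finalize function are hypothetical, and they only capture the idea that a use can be pointed at a permanent def when that def has already been finalized.

```cpp
#include <cassert>
#include <vector>

// Toy model: each change either creates a temporary def or uses the
// temporary def created by another change (identified by index).
struct toy_change
{
  int uses_def_of = -1;      // index of the change whose def is used, or -1
  bool creates_def = false;
  bool has_perm_def = false; // set once the temporary def is made permanent
  bool use_resolved = false; // set if the use found a permanent def
};

// Finalize changes in the order given by ORDER.  A use is resolved only
// if the def it refers to has already been made permanent.
void finalize(std::vector<toy_change> &changes, const std::vector<int> &order)
{
  for (int i : order)
    {
      assert(i >= 0 && i < static_cast<int>(changes.size()));
      toy_change &c = changes[i];
      if (c.creates_def)
        c.has_perm_def = true;
      if (c.uses_def_of >= 0)
        c.use_resolved = changes[c.uses_def_of].has_perm_def;
    }
}
```

Processing the def-creating change first (forwards) resolves the use; processing the using change first (backwards) does not, which is the situation the separate forward loop avoids.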


Ping: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-22 Thread Richard Sandiford
Ping for the expr/cfgexpand bits

Richard Sandiford  writes:
> Andrew Pinski  writes:
>> Ccmp is not used if the result of the and/ior is used by both
>> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
>> here by using ccmp in this case.
>> Two changes are required: first, we need to allow the outer statement's
>> result to be used more than once.
>> The second change is that during the expansion of the gimple, we need
>> to try using ccmp. This is needed because we don't expand the ssa
>> name of the lhs but rather expand directly from the gimple.
>>
>> A small note on the ccmp_4.c testcase, we should be able to get slightly
>> better than with this patch but it is one extra instruction compared to
>> before.
>>
>> Diff from v1:
>> * v2: Split out expand_gimple_assign_ssa so that we only need to handle
>> promotion once. Add ccmp_5.c testcase which was suggested. Change comment
>> on ccmp_candidate_p.
>
> I meant more that we should split out the gassign handling in
> expand_expr_real_1, since we're effectively making cfgexpand follow
> it more closely.  What do you think about the attached version?
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> OK for the expr/cfgexpand bits?
>
> Thanks,
> Richard
>
> 
>
> Ccmp is not used if the result of the and/ior is used by both
> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> here by using ccmp in this case.
> Two changes are required: first, we need to allow the outer statement's
> result to be used more than once.
> The second change is that during the expansion of the gimple, we need
> to try using ccmp. This is needed because we don't expand the ssa
> name of the lhs but rather expand directly from the gimple.
>
> A small note on the ccmp_4.c testcase, we should be able to get slightly
> better than with this patch but it is one extra instruction compared to
> before.
>
>   PR target/100942
>
> gcc/ChangeLog:
>
>   * ccmp.cc (ccmp_candidate_p): Add outer argument.
>   Allow if the outer is true and the lhs is used more
>   than once.
>   (expand_ccmp_expr): Update call to ccmp_candidate_p.
>   * expr.h (expand_expr_real_gassign): Declare.
>   * expr.cc (expand_expr_real_gassign): New function, split out from...
>   (expand_expr_real_1): ...here.
>   * cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/ccmp_3.c: New test.
>   * gcc.target/aarch64/ccmp_4.c: New test.
>   * gcc.target/aarch64/ccmp_5.c: New test.
>
> Signed-off-by: Andrew Pinski 
> Co-authored-by: Richard Sandiford 
> ---
>  gcc/ccmp.cc   |  12 +--
>  gcc/cfgexpand.cc  |  31 ++-
>  gcc/expr.cc   | 103 --
>  gcc/expr.h|   3 +
>  gcc/testsuite/gcc.target/aarch64/ccmp_3.c |  20 +
>  gcc/testsuite/gcc.target/aarch64/ccmp_4.c |  35 
>  gcc/testsuite/gcc.target/aarch64/ccmp_5.c |  20 +
>  7 files changed, 149 insertions(+), 75 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_5.c
>
> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> index 09d6b5595a4..7cb525addf4 100644
> --- a/gcc/ccmp.cc
> +++ b/gcc/ccmp.cc
> @@ -90,9 +90,10 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
> If all checks OK in expand_ccmp_expr, it emits insns in prep_seq, then
> insns in gen_seq.  */
>  
> -/* Check whether G is a potential conditional compare candidate.  */
> -/* Check whether G is a potential conditional compare candidate.  */
> +/* Check whether G is a potential conditional compare candidate; OUTER is true if
> +   G is the outer most AND/IOR.  */
>  static bool
> -ccmp_candidate_p (gimple *g)
> +ccmp_candidate_p (gimple *g, bool outer = false)
>  {
>tree lhs, op0, op1;
>gimple *gs0, *gs1;
> @@ -109,8 +110,9 @@ ccmp_candidate_p (gimple *g)
>lhs = gimple_assign_lhs (g);
>op0 = gimple_assign_rhs1 (g);
>op1 = gimple_assign_rhs2 (g);
> -  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
> -  || !has_single_use (lhs))
> +  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
> +return false;
> +  if (!outer && !has_single_use (lhs))
>  return false;
>  
>bb = gimple_bb (g);
> @@ -284,7 +286,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode)
>rtx_insn *last;
>rtx tmp;
>  
> -  if (!ccmp_candidate_p (g))
> +  if (!ccmp_candidate_p (g, true))
>  return NULL_RTX;
>  
>last = get_last_insn ();
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 1db22f0a1a3..381ed2c82d7 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -3971,37 +3971,18 @@ expand_gimple_stmt_1 (gimple *stmt)
> {
>   rtx target, temp;
>   bool nontemporal = gimple_assign_nontemporal_move_p (assign_stmt);
> - struct 

[PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-22 Thread Matthieu Longo

The rev16 pattern was no longer recognised because a change in the bswap tree
pass introduced a new GIMPLE form that the final assembly transformation
pass does not recognise.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

gcc/ChangeLog:

PR target/108933
* config/arm/arm.md (*arm_rev16si2_alt3): new pattern to convert
  a bswap + rotate by 16 bits into rev16

gcc/testsuite/ChangeLog:

PR target/108933
* gcc.target/arm/rev16.c: Moved to...
* gcc.target/arm/rev16_1.c: ...here.
* gcc.target/arm/rev16_2.c: New test to check that rev16 is
  emitted.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4a98f2d7b6251da940806b26d4c310a7f7af927b..330c0a5ce2926439760466746a68fd5f6c5b1b09 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12601,6 +12601,18 @@
(set_attr "type" "rev")]
 )
 
+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,r"))
+ (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+
 (define_expand "arm_rev16si2"
   [(set (match_operand:SI 0 "s_register_operand")
(bswap:SI (match_operand:SI 1 "s_register_operand")))]
diff --git a/gcc/testsuite/gcc.target/arm/rev16.c b/gcc/testsuite/gcc.target/arm/rev16_1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/arm/rev16.c
rename to gcc/testsuite/gcc.target/arm/rev16_1.c
diff --git a/gcc/testsuite/gcc.target/arm/rev16_2.c b/gcc/testsuite/gcc.target/arm/rev16_2.c
new file mode 100644
index ..90213f9b49f45340ced4f29c31446971dbc88ecd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/rev16_2.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2" } */
+/* { dg-do compile } */
+
+typedef unsigned int __u32;
+
+__u32
+__rev16_32_alt (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
+ | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
+}
+
+__u32
+__rev16_32 (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
+ | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
+}
+
+/* { dg-final { scan-assembler-times {rev16\tr[0-9]+, r[0-9]+} 2 } } */
\ No newline at end of file
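
The equivalence the new pattern relies on, that a byte swap followed by a 16-bit rotate gives the same result as swapping the bytes within each halfword, can be checked with portable code. The helper names below are made up for this sketch; bswap32 is a hand-written stand-in for a byte-swap builtin.

```cpp
#include <cassert>
#include <cstdint>

// Swap the two bytes inside each 16-bit half, as in the rev16_2.c test.
constexpr std::uint32_t rev16_32(std::uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

// Portable byte swap of a 32-bit value.
constexpr std::uint32_t bswap32(std::uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Rotate a 32-bit value by 16 bits (left and right rotation coincide).
constexpr std::uint32_t rotate16(std::uint32_t x)
{
  return (x << 16) | (x >> 16);
}

static_assert(rev16_32(0xAABBCCDDu) == 0xBBAADDCCu,
              "bytes swap within each halfword");
static_assert(rotate16(bswap32(0xAABBCCDDu)) == rev16_32(0xAABBCCDDu),
              "bswap followed by a 16-bit rotate matches rev16");
```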


Re: [PATCH] fold-const: Fold larger VIEW_CONVERT_EXPRs [PR113462]

2024-01-22 Thread Richard Biener
On Mon, 22 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> On Mon, Jan 22, 2024 at 11:56:11AM +0100, Richard Biener wrote:
> > > I guess the || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT
> > > conditions make no sense, all we care is whether it fits in the buffer
> > > or not.
> > > But then there is
> > > fold_view_convert_expr
> > > (and other spots) which use
> > >   /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
> > >   unsigned char buffer[128];
> > > or something similar.
> > > Perhaps we could use XALLOCAVEC there instead (or use it only for the
> > > larger sizes and keep the static buffer for the common case).
> > 
> > Well, yes.  V_C_E folding could do this but then the native_encode_expr
> > API could also allow lazy allocation or so.
> 
> native_encode_expr can't reallocate, it has to fill in whatever buffer it
> has been called with, it can be in the middle of something else etc.
> 
> The following patch is what I meant, I think having some upper bound is
> desirable so that we don't spend too much time trying to VCE fold 2GB
> structures (after all, the APIs also use int for lengths) and similar and passed
> make check-gcc check-g++ -j32 -k GCC_TEST_RUN_EXPENSIVE=1 
> RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c 
> builtin-stdc-bit-*.c pr112566-2.c pr112511.c' dg-torture.exp=*bitint* 
> dfp.exp=*bitint*"
> (my usual quick test for bitint related changes).  Ok for trunk if it passes
> full bootstrap/regtest?

OK.

Thanks,
Richard.

> 2024-01-22  Jakub Jelinek  
> 
>   PR tree-optimization/113462
>   * fold-const.cc (native_interpret_int): Don't punt if total_bytes
>   is larger than HOST_BITS_PER_DOUBLE_INT / BITS_PER_UNIT.
>   (fold_view_convert_expr): Use XALLOCAVEC buffers for types with
>   sizes between 129 and 8192 bytes.
> 
> --- gcc/fold-const.cc.jj  2024-01-12 10:07:58.202851544 +0100
> +++ gcc/fold-const.cc 2024-01-22 12:09:05.116253393 +0100
> @@ -8773,8 +8773,7 @@ native_interpret_int (tree type, const u
>else
>  total_bytes = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (type));
>  
> -  if (total_bytes > len
> -  || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT)
> +  if (total_bytes > len)
>  return NULL_TREE;
>  
>wide_int result = wi::from_buffer (ptr, total_bytes);
> @@ -9329,9 +9328,10 @@ fold_view_convert_vector_encoding (tree
>  static tree
>  fold_view_convert_expr (tree type, tree expr)
>  {
> -  /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
>unsigned char buffer[128];
> +  unsigned char *buf;
>int len;
> +  HOST_WIDE_INT l;
>  
>/* Check that the host and target are sane.  */
>if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
> @@ -9341,11 +9341,23 @@ fold_view_convert_expr (tree type, tree
>  if (tree res = fold_view_convert_vector_encoding (type, expr))
>return res;
>  
> -  len = native_encode_expr (expr, buffer, sizeof (buffer));
> +  l = int_size_in_bytes (type);
> +  if (l > (int) sizeof (buffer)
> +  && l <= WIDE_INT_MAX_PRECISION / BITS_PER_UNIT)
> +{
> +  buf = XALLOCAVEC (unsigned char, l);
> +  len = l;
> +}
> +  else
> +{
> +  buf = buffer;
> +  len = sizeof (buffer);
> +}
> +  len = native_encode_expr (expr, buf, len);
>if (len == 0)
>  return NULL_TREE;
>  
> -  return native_interpret_expr (type, buffer, len);
> +  return native_interpret_expr (type, buf, len);
>  }
>  
>  /* Build an expression for the address of T.  Folds away INDIRECT_REF
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] fold-const: Fold larger VIEW_CONVERT_EXPRs [PR113462]

2024-01-22 Thread Jakub Jelinek
Hi!

On Mon, Jan 22, 2024 at 11:56:11AM +0100, Richard Biener wrote:
> > I guess the || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT
> > conditions make no sense, all we care is whether it fits in the buffer
> > or not.
> > But then there is
> > fold_view_convert_expr
> > (and other spots) which use
> >   /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
> >   unsigned char buffer[128];
> > or something similar.
> > Perhaps we could use XALLOCAVEC there instead (or use it only for the
> > larger sizes and keep the static buffer for the common case).
> 
> Well, yes.  V_C_E folding could do this but then the native_encode_expr
> API could also allow lazy allocation or so.

native_encode_expr can't reallocate, it has to fill in whatever buffer it
has been called with, it can be in the middle of something else etc.

The following patch is what I meant, I think having some upper bound is
desirable so that we don't spend too much time trying to VCE fold 2GB
structures (after all, the APIs also use int for lengths) and similar and passed
make check-gcc check-g++ -j32 -k GCC_TEST_RUN_EXPENSIVE=1 
RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c 
builtin-stdc-bit-*.c pr112566-2.c pr112511.c' dg-torture.exp=*bitint* 
dfp.exp=*bitint*"
(my usual quick test for bitint related changes).  Ok for trunk if it passes
full bootstrap/regtest?

2024-01-22  Jakub Jelinek  

PR tree-optimization/113462
* fold-const.cc (native_interpret_int): Don't punt if total_bytes
is larger than HOST_BITS_PER_DOUBLE_INT / BITS_PER_UNIT.
(fold_view_convert_expr): Use XALLOCAVEC buffers for types with
sizes between 129 and 8192 bytes.

--- gcc/fold-const.cc.jj2024-01-12 10:07:58.202851544 +0100
+++ gcc/fold-const.cc   2024-01-22 12:09:05.116253393 +0100
@@ -8773,8 +8773,7 @@ native_interpret_int (tree type, const u
   else
 total_bytes = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (type));
 
-  if (total_bytes > len
-  || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT)
+  if (total_bytes > len)
 return NULL_TREE;
 
   wide_int result = wi::from_buffer (ptr, total_bytes);
@@ -9329,9 +9328,10 @@ fold_view_convert_vector_encoding (tree
 static tree
 fold_view_convert_expr (tree type, tree expr)
 {
-  /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
   unsigned char buffer[128];
+  unsigned char *buf;
   int len;
+  HOST_WIDE_INT l;
 
   /* Check that the host and target are sane.  */
   if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
@@ -9341,11 +9341,23 @@ fold_view_convert_expr (tree type, tree
 if (tree res = fold_view_convert_vector_encoding (type, expr))
   return res;
 
-  len = native_encode_expr (expr, buffer, sizeof (buffer));
+  l = int_size_in_bytes (type);
+  if (l > (int) sizeof (buffer)
+  && l <= WIDE_INT_MAX_PRECISION / BITS_PER_UNIT)
+{
+  buf = XALLOCAVEC (unsigned char, l);
+  len = l;
+}
+  else
+{
+  buf = buffer;
+  len = sizeof (buffer);
+}
+  len = native_encode_expr (expr, buf, len);
   if (len == 0)
 return NULL_TREE;
 
-  return native_interpret_expr (type, buffer, len);
+  return native_interpret_expr (type, buf, len);
 }
 
 /* Build an expression for the address of T.  Folds away INDIRECT_REF


Jakub
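
The buffer choice in the fold_view_convert_expr hunk above can be modelled
outside of GCC.  The following is only a sketch under stated assumptions: the
128-byte and 8192-byte limits are taken from the patch and its ChangeLog,
while scratch_len, SMALL_BUF and MAX_BUF are made-up names for illustration.
The function only computes how large a scratch buffer the folder would use;
it does no allocation itself.

```c
#include <assert.h>
#include <stddef.h>

enum
{
  SMALL_BUF = 128,  /* Size of the fixed on-stack buffer.  */
  MAX_BUF = 8192    /* Upper bound per the ChangeLog entry.  */
};

/* Return the scratch buffer length used for a type of TYPE_SIZE bytes.
   A negative TYPE_SIZE models a type whose size is not a known constant.
   Sizes in (SMALL_BUF, MAX_BUF] get a buffer of exactly that size;
   everything else keeps the fixed buffer, so oversized values simply
   fail to encode and are not folded.  */
size_t
scratch_len (long type_size)
{
  if (type_size > SMALL_BUF && type_size <= MAX_BUF)
    return (size_t) type_size;
  return SMALL_BUF;
}

/* Quick self-check of the boundaries.  */
void
check_scratch_len (void)
{
  assert (scratch_len (SMALL_BUF) == SMALL_BUF);
  assert (scratch_len (SMALL_BUF + 1) == SMALL_BUF + 1);
  assert (scratch_len (MAX_BUF + 1) == SMALL_BUF);
}
```

Keeping the fixed buffer for the common case avoids a dynamic stack
allocation on most calls, and the upper bound keeps very large aggregates
from being encoded at all.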



Re: Re: [PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

2024-01-22 Thread juzhe.zh...@rivai.ai
No, we didn't undo the optimization.

We just disallow the move pattern for (set (reg) (VL_REGNUM)).




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-22 19:25
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f
 
Hi Juzhe,
 
in principle this seems ok to me but I wonder about:
 
> We shouldn't worry about subreg:...VL_REGNUM since it's impossible
> that we can have such situation,
 
I think we allow this in legitimize_move for situations like
(subreg:SI (reg:V4QI)).  That was not added for correctness but
optimization - are we sure we don't undo this optimization with
that change?
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

2024-01-22 Thread Robin Dapp


Hi Juzhe,

in principle this seems ok to me but I wonder about:

> We shouldn't worry about subreg:...VL_REGNUM since it's impossible
> that we can have such situation,

I think we allow this in legitimize_move for situations like
(subreg:SI (reg:V4QI)).  That was not added for correctness but
optimization - are we sure we don't undo this optimization with
that change?

Regards
 Robin



Re: [PATCH v3] c++/modules: Fix handling of extern templates in modules [PR112820]

2024-01-22 Thread Nathaniel Shead
On Wed, Jan 17, 2024 at 10:51:16AM -0500, Jason Merrill wrote:
> On 1/17/24 01:33, Nathaniel Shead wrote:
> > On Mon, Jan 15, 2024 at 06:10:55PM -0500, Jason Merrill wrote:
> > > Under what circumstances does it make sense for CLASSTYPE_INTERFACE_ONLY
> > > to be set in the context of modules, anyway?  We probably want to
> > > propagate it for things in the global module so that various libstdc++
> > > explicit instantiations work the same with import std.
> > > 
> > > For a class imported from a named module, this ties into the earlier
> > > discussion about vtables and inlines that hasn't resolved yet in the ABI
> > > committee.  But it's certainly significantly interface-like.  And I would
> > > expect maybe_suppress_debug_info to suppress the debug info for such a
> > > class on the assumption that the module unit has the needed debug info.
> > > 
> > > Jason
> > > 
> > 
> > Here's another approach for this patch. This still only fixes the
> > specific issues in the PR, I think vtable handling etc. should wait till
> > stage 1 because it involves a lot of messing around in decl2.cc.
> > 
> > As mentioned in the commit message, after thinking more about it I don't
> > think we (in general) want to propagate CLASSTYPE_INTERFACE_ONLY, even
> > for declarations in the GMF. This makes sense to me because typically it
> > can only be accurately determined at the end of the TU, which we haven't
> > yet arrived at after importing. For instance, for a polymorphic class in
> > the GMF without a key method, that we import from a module and then
> > proceed to define the key method later on in this TU.
> 
> That sounds right for a module implementation unit or the GMF.
> 
> > Bootstrapped and partially regtested on x86_64-pc-linux-gnu (so far only
> > modules.exp): OK for trunk if full regtesting passes?
> 
> Please add a reference to ABI issue 170
> (https://github.com/itanium-cxx-abi/cxx-abi/issues/170).  OK with that
> change if Nathan doesn't have any further comments this week.
> 

Thanks. Here's what I'll push tomorrow unless I hear otherwise.

-- >8 --

Currently, extern templates are detected by looking for the
DECL_EXTERNAL flag on a TYPE_DECL. However, this is incorrect:
TYPE_DECLs don't actually set this flag, and it happens to work by
coincidence due to TYPE_DECL_SUPPRESS_DEBUG happening to use the same
underlying bit. This however causes issues with other TYPE_DECLs that
also happen to have suppressed debug information.

Instead, this patch reworks the logic so CLASSTYPE_INTERFACE_ONLY is
always emitted into the module BMI and can then be used to check for an
extern template correctly.

Otherwise, for other declarations we always want to redetermine this:
even for declarations from the GMF, we may change our mind on whether to
import or export depending on decisions made later in the TU after
importing so we shouldn't decide this now, or necessarily reuse what the
module we'd imported had decided.

Some of this may need to change in the future to account for
https://github.com/itanium-cxx-abi/cxx-abi/issues/170.
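
To illustrate the aliasing problem described above, here is a toy model (not
GCC code) in which two accessor macros read the same storage bit.  The struct
layout and the field name shared_flag are hypothetical, and interface_only is
kept on the decl purely for brevity; in GCC it lives on the class type.

```c
#include <assert.h>
#include <stdbool.h>

enum decl_code { VAR_DECL, TYPE_DECL };

struct decl
{
  enum decl_code code;
  unsigned shared_flag : 1;     /* Hypothetical name for the shared bit.  */
  unsigned interface_only : 1;  /* Stand-in for CLASSTYPE_INTERFACE_ONLY.  */
};

/* Both accessors read the same bit, so one aliases the other.  */
#define DECL_EXTERNAL(D) ((D)->shared_flag)
#define TYPE_DECL_SUPPRESS_DEBUG(D) ((D)->shared_flag)

/* Old-style check: on a TYPE_DECL this really reads "suppress debug".  */
bool
looks_extern_old (const struct decl *d)
{
  assert (d->code == TYPE_DECL);
  return DECL_EXTERNAL (d);
}

/* New-style check: uses a flag that means what it says.  */
bool
looks_extern_new (const struct decl *d)
{
  assert (d->code == TYPE_DECL);
  return d->interface_only;
}
```

A type whose debug info was suppressed for an unrelated reason is misread as
an extern template by the old check, while the dedicated flag is unaffected.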

PR c++/112820
PR c++/102607

gcc/cp/ChangeLog:

* module.cc (trees_out::lang_type_bools): Write interface_only
and interface_unknown.
(trees_in::lang_type_bools): Read the above flags.
(trees_in::decl_value): Reset CLASSTYPE_INTERFACE_* except for
extern templates.
(trees_in::read_class_def): Remove buggy extern template
handling.

gcc/testsuite/ChangeLog:

* g++.dg/modules/debug-2_a.C: New test.
* g++.dg/modules/debug-2_b.C: New test.
* g++.dg/modules/debug-2_c.C: New test.
* g++.dg/modules/debug-3_a.C: New test.
* g++.dg/modules/debug-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 37 ++--
 gcc/testsuite/g++.dg/modules/debug-2_a.C |  9 ++
 gcc/testsuite/g++.dg/modules/debug-2_b.C |  8 +
 gcc/testsuite/g++.dg/modules/debug-2_c.C |  9 ++
 gcc/testsuite/g++.dg/modules/debug-3_a.C |  8 +
 gcc/testsuite/g++.dg/modules/debug-3_b.C |  9 ++
 6 files changed, 64 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 8db662c0267..70785493561 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5806,10 +5806,8 @@ trees_out::lang_type_bools (tree t)
 
   WB ((lang->gets_delete >> 0) & 1);
   WB ((lang->gets_delete >> 1) & 1);
-  // Interfaceness is recalculated upon reading.  May have to revisit?
-  // How do dllexport and dllimport interact across a module?
-  // lang->interface_only
-  // lang->interface_unknown
+  WB (lang->interface_only);
+  

RE: [PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is enabled

2024-01-22 Thread Matthieu Longo
Hi Richard,

The conditions for __ARM_BF16_FORMAT_ALTERNATIVE are redundant, so I wanted to 
simplify it.
TARGET_BF16_SIMD implies TARGET_BF16_FP to be true as well.

#define TARGET_BF16_FP (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP5 \
&& arm_arch8_2 && arm_arch_bf16)
#define TARGET_BF16_SIMD (TARGET_NEON && TARGET_VFP5 \
  && arm_arch8_2 && arm_arch_bf16)
#define TARGET_NEON \
  (TARGET_32BIT && TARGET_HARD_FLOAT\
   && bitmap_bit_p (arm_active_target.isa, isa_bit_neon))

Please let me know if you agree on the simplification.
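
The implication can be checked mechanically.  Below is a small sketch that
models the three macros quoted above as plain boolean functions and
enumerates every combination of the underlying conditions; struct arm_target
and the function names are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The six independent conditions the quoted macros depend on.  */
struct arm_target
{
  bool is_32bit, hard_float, vfp5, arch8_2, arch_bf16, isa_neon;
};

bool
target_neon (struct arm_target t)
{
  return t.is_32bit && t.hard_float && t.isa_neon;
}

bool
target_bf16_fp (struct arm_target t)
{
  return t.is_32bit && t.hard_float && t.vfp5 && t.arch8_2 && t.arch_bf16;
}

bool
target_bf16_simd (struct arm_target t)
{
  return target_neon (t) && t.vfp5 && t.arch8_2 && t.arch_bf16;
}

/* Count the combinations where "FP || SIMD" differs from plain "FP".  */
int
count_counterexamples (void)
{
  int bad = 0;
  for (unsigned bits = 0; bits < 64; bits++)
    {
      struct arm_target t = { bits & 1, bits & 2, bits & 4,
                              bits & 8, bits & 16, bits & 32 };
      bool fp = target_bf16_fp (t);
      bool simd = target_bf16_simd (t);
      if ((fp || simd) != fp)
        bad++;
    }
  return bad;
}

/* SIMD implies FP, so the simplified condition is equivalent.  */
void
check_simplification (void)
{
  assert (count_counterexamples () == 0);
}
```

No combination distinguishes the two conditions, which is why the shorter
one can replace the longer one.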

Regards,
Matthieu

-Original Message-
From: Richard Earnshaw  
Sent: Wednesday, January 10, 2024 3:26 PM
To: Matthieu Longo ; gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw ; Kyrylo Tkachov 

Subject: Re: [PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is 
enabled



On 08/01/2024 17:21, Matthieu Longo wrote:
> Hi,
> 
> Arm GCC backend does not define __ARM_FEATURE_BF16 when +bf16 is 
> specified (via -march option, or target pragma) whereas it is supposed 
> to be tested before including arm_bf16.h (as specified in ACLE document:
> https://arm-software.github.io/acle/main/acle.html#arm_bf16h).
> 
> gcc/ChangeLog:
> 
>      * config/arm/arm-c.cc (arm_cpu_builtins): define
> __ARM_FEATURE_BF16
>      * config/arm/arm.h: define TARGET_BF16
> 
> Ok for master ?
> 
> Matthieu
index
2e181bf7f36bab1209d5358e65d9513541683632..21ca22ac71119eda4ff01709aa95002ca13b1813
100644
--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -425,12 +425,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   arm_arch_cde_coproc);

def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16", TARGET_BF16);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+ TARGET_BF16_FP);
def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
  TARGET_BF16_FP);
def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
  TARGET_BF16_SIMD);
-  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
- TARGET_BF16_FP || TARGET_BF16_SIMD);

Why is the definition of __ARM_BF16_FORMAT_ALTERNATIVE changed?  And why is 
there no explanation of that change?  It doesn't seem directly related to $subject.

R.

  }

  void


Re: [PATCH] lower-bitint: Handle INTEGER_CST rhs1 in handle_cast [PR113462]

2024-01-22 Thread Richard Biener
On Mon, 22 Jan 2024, Jakub Jelinek wrote:

> On Mon, Jan 22, 2024 at 11:27:52AM +0100, Richard Biener wrote:
> > We run into
> > 
> > static tree
> > native_interpret_int (tree type, const unsigned char *ptr, int len)
> > { 
> > ...
> >   if (total_bytes > len
> >   || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT)
> > return NULL_TREE;
> > 
> > OTOH using a V_C_E to "truncate" a _BitInt looks wrong?  OTOH the
> > check doesn't really handle native_encode_expr using the "proper"
> > wide_int encoding however that's exactly handled.  So it might be
> > a pre-existing issue that's only uncovered by large _BitInts
> > (__int128 might show similar issues?)
> 
> I guess the || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT
> conditions make no sense, all we care is whether it fits in the buffer
> or not.
> But then there is
> fold_view_convert_expr
> (and other spots) which use
>   /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
>   unsigned char buffer[128];
> or something similar.
> Perhaps we could use XALLOCAVEC there instead (or use it only for the
> larger sizes and keep the static buffer for the common case).

Well, yes.  V_C_E folding could do this but then the native_encode_expr
API could also allow lazy allocation or so.

>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] lower-bitint: Handle INTEGER_CST rhs1 in handle_cast [PR113462]

2024-01-22 Thread Jakub Jelinek
On Mon, Jan 22, 2024 at 11:27:52AM +0100, Richard Biener wrote:
> We run into
> 
> static tree
> native_interpret_int (tree type, const unsigned char *ptr, int len)
> { 
> ...
>   if (total_bytes > len
>   || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT)
> return NULL_TREE;
> 
> OTOH using a V_C_E to "truncate" a _BitInt looks wrong?  OTOH the
> check doesn't really handle native_encode_expr using the "proper"
> wide_int encoding however that's exactly handled.  So it might be
> a pre-existing issue that's only uncovered by large _BitInts
> (__int128 might show similar issues?)

I guess the || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT
conditions make no sense, all we care is whether it fits in the buffer
or not.
But then there is
fold_view_convert_expr
(and other spots) which use
  /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
  unsigned char buffer[128];
or something similar.
Perhaps we could use XALLOCAVEC there instead (or use it only for the
larger sizes and keep the static buffer for the common case).

Jakub



Re: [committed] libstdc++: Fix std::format for floating-point chrono::time_point [PR113500]

2024-01-22 Thread Jonathan Wakely
On Sun, 21 Jan 2024 at 22:27, Jonathan Wakely wrote:
> --- a/libstdc++-v3/testsuite/std/time/clock/file/io.cc
> +++ b/libstdc++-v3/testsuite/std/time/clock/file/io.cc
> @@ -17,6 +17,23 @@ test_ostream()
>VERIFY( ss1.str() == ss2.str() );
>  }
>
> +void
> +test_format()
> +{
> +  using namespace std::chrono;
> +  auto t = file_clock::now();
> +
> +  auto s = std::format("{}", t);
> +  std::ostringstream ss;
> +  ss << t;
> +  VERIFY( s == ss.str() );
> +
> +  // PR libstdc++/113500
> +  auto ft = clock_cast<file_clock>(sys_days(2024y/January/21)) + 0ms + 2.5s;
> +  s = std::format("{}", ft);
> +  VERIFY( s == "2024-01-17 00:00:02.500");

Well obviously that should be 2024-01-21 and I should add a call to
test_format() in main() so it actually runs.

I'll push this fix:

--- a/libstdc++-v3/testsuite/std/time/clock/file/io.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/file/io.cc
@@ -31,7 +31,7 @@ test_format()
  // PR libstdc++/113500
  auto ft = clock_cast<file_clock>(sys_days(2024y/January/21)) + 0ms + 2.5s;
  s = std::format("{}", ft);
-  VERIFY( s == "2024-01-17 00:00:02.500");
+  VERIFY( s == "2024-01-21 00:00:02.500");
}

void
@@ -54,5 +54,6 @@ test_parse()
int main()
{
  test_ostream();
+  test_format();
  test_parse();
}



[PATCH] RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

2024-01-22 Thread Juzhe-Zhong
This patch fixes the recent regression:

FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors)

due to commit 86de9b66480b710202a2898cf513db105d8c432f.

The root cause is that register_operand and reg_or_subregno are inconsistent,
so we hit the assertion failure.

We don't need to worry about subreg:...VL_REGNUM, since that situation cannot
occur: we only have (set (reg) (reg:VL_REGNUM)), which generates the "csrr vl"
ASM for first-fault load instructions (vleff).
So using REG_P and REGNO is safe and robust.

Since we don't allow VL_REGNUM to take part in register allocation and we have
no such constraint, we always use the following pattern to generate the
"csrr vl" ASM:

(define_insn "read_vlsi"
  [(set (match_operand:SI 0 "register_operand" "=r")
(reg:SI VL_REGNUM))]
  "TARGET_VECTOR"
  "csrr\t%0,vl"
  [(set_attr "type" "rdvl")
   (set_attr "mode" "SI")])

So the check in riscv.md is there to keep this situation from falling into the
move pattern in riscv.md.

Tested on both RV32/RV64 
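
As a rough illustration of why a REG_P/REGNO check is safer here than a
helper that strips a subreg and then asserts, consider this toy model.  It is
not GCC code: struct rtx, the enum values and the VL_REGNUM value 66 are all
made up, and strip_then_regno merely imitates the shape of a
strip-then-assert helper by returning false where the real one would abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rtx_kind { RTX_REG, RTX_SUBREG, RTX_OTHER };

struct rtx
{
  enum rtx_kind kind;
  unsigned regno;           /* Meaningful only for RTX_REG.  */
  const struct rtx *inner;  /* Meaningful only for RTX_SUBREG.  */
};

enum { VL_REGNUM = 66 };    /* Hypothetical register number.  */

/* Guarded check: safe on any operand, true only for a plain VL register.  */
bool
is_vl_reg (const struct rtx *x)
{
  assert (x != NULL);
  return x->kind == RTX_REG && x->regno == VL_REGNUM;
}

/* Model of a helper that strips one subreg and then insists on a register.
   Returns false where such a helper would hit its assertion.  */
bool
strip_then_regno (const struct rtx *x, unsigned *regno)
{
  assert (x != NULL && regno != NULL);
  if (x->kind == RTX_SUBREG)
    x = x->inner;
  if (x->kind != RTX_REG)
    return false;
  *regno = x->regno;
  return true;
}
```

An operand that is a subreg of something other than a register trips the
second helper, whereas the guarded check just answers "not VL".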

[PATCH] Refactor exit PHI handling in vectorizer epilogue peeling

2024-01-22 Thread Richard Biener
This refactors the handling of PHIs in between the main and the
epilogue loop.  Instead of trying to handle the multiple exit
and original single exit case together, the following separates
these cases, resulting in much easier to understand code.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Separate single and multi-exit case when creating PHIs between
the main and epilogue.
---
 gcc/tree-vect-loop-manip.cc | 135 
 1 file changed, 74 insertions(+), 61 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index eacbc022549..873a28d7c56 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1682,77 +1682,60 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
}
 
  /* Create the merge PHI nodes in new_preheader and populate the
-arguments for the main exit.  */
- for (auto gsi_from = gsi_start_phis (loop->header),
-  gsi_to = gsi_start_phis (new_loop->header);
-  !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-  gsi_next (&gsi_from), gsi_next (&gsi_to))
+arguments for the exits.  */
+ if (multiple_exits_p)
{
- gimple *from_phi = gsi_stmt (gsi_from);
- gimple *to_phi = gsi_stmt (gsi_to);
- tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-   loop_latch_edge (loop));
-
- /* Check if we've already created a new phi node during edge
-redirection.  If we have, only propagate the value
-downwards in case there is no merge block.  */
- tree *res;
- if ((res = new_phi_args.get (new_arg)))
+ for (auto gsi_from = gsi_start_phis (loop->header),
+  gsi_to = gsi_start_phis (new_loop->header);
+  !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+  gsi_next (&gsi_from), gsi_next (&gsi_to))
{
- if (multiple_exits_p)
-   new_arg = *res;
+ gimple *from_phi = gsi_stmt (gsi_from);
+ gimple *to_phi = gsi_stmt (gsi_to);
+
+ /* When the vector loop is peeled then we need to use the
+value at start of the loop, otherwise the main loop exit
+should use the final iter value.  */
+ tree new_arg;
+ if (peeled_iters)
+   new_arg = gimple_phi_result (from_phi);
  else
-   {
- adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
- continue;
-   }
-   }
- /* If we have multiple exits and the vector loop is peeled then we
-need to use the value at start of loop.  If we're looking at
-virtual operands we have to keep the original link.   Virtual
-operands don't all become the same because we'll corrupt the
-vUSE chains among others.  */
- if (peeled_iters)
-   {
- tree tmp_arg = gimple_phi_result (from_phi);
- /* Similar to the single exit case, If we have an existing
-LCSSA variable thread through the original value otherwise
-skip it and directly use the final value.  */
- if ((res = new_phi_args.get (tmp_arg)))
+   new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+loop_latch_edge (loop));
+
+ /* Check if we've already created a new phi node during edge
+redirection and re-use it if so.  Otherwise create a
+LC PHI node to feed the merge PHI.  */
+ tree *res;
+ if (virtual_operand_p (new_arg))
+   /* Use the existing virtual LC SSA from exit block.  */
+   new_arg = gimple_phi_result
+   (get_virtual_phi (main_loop_exit_block));
+ else if ((res = new_phi_args.get (new_arg)))
new_arg = *res;
- else if (!virtual_operand_p (new_arg))
-   new_arg = tmp_arg;
-   }
-
- tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
- gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
-
- /* Otherwise, main loop exit should use the final iter value.  */
- if (multiple_exits_p)
-   {
- /* Create a LC PHI if it doesn't already exist.  */
- if (!virtual_operand_p (new_arg) && !res)
+ else
{
+ /* Create the LC PHI node for the 

[PING] [PATCH] get source line for diagnostic from preprocessed file / PR preprocessor/79106

2024-01-22 Thread Bader, Lucas
Hello,

as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79106 is still open from what I 
can tell, I wanted to gently bump this patch I provided a while back. It was 
earmarked for gcc-11 in 
https://gcc.gnu.org/pipermail/gcc-patches/2020-January/539201.html but did not 
make it into the release.

Original submission: 
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg01113.html

Please let me know if something is missing.

Best
Lucas

-Original Message-
From: Bader, Lucas 
Sent: Monday, 16 December 2019 12:19
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] get source line for diagnostic from preprocessed file / PR 
preprocessor/79106

Hello,

within a compile cluster, only the preprocessed output of GCC is transferred to
remote nodes for compilation.  When GCC produces advanced diagnostics (with
-fdiagnostics-show-caret), e.g. prints out the affected source line and fixit
hints, it attempts to read the source file again, even when compiling a
preprocessed file (-fpreprocessed).  This leads to wrong diagnostics when
building with a compile cluster, or, more generally, when changing or deleting
the original source file.

This patch attempts to alter the behavior by implementing a
location_get_source_line_preprocessed function that can be used in
diagnostic-show-locus.c in case a preprocessed file is compiled.
There was some previous discussion on this behavior on PR preprocessor/79106.
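
To make the idea concrete, here is a standalone sketch of how a source line
could be recovered from preprocessed text by following linemarkers of the
form # N "file".  It is not the code from the patch: the function name, its
signature and the fixed buffer sizes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Search preprocessed text PP for line WANT_LINE of FILE, using
   linemarkers to track which original line each text line came from.
   On success copy the line (without newline) into OUT and return 1;
   return 0 if the line is not present in the preprocessed text.  */
int
find_line_in_preprocessed (const char *pp, const char *file, long want_line,
                           char *out, size_t out_size)
{
  long cur_line = 0;
  int in_file = 0;

  assert (out_size > 0);

  for (const char *p = pp; *p != '\0'; )
    {
      const char *eol = strchr (p, '\n');
      size_t len = eol ? (size_t) (eol - p) : strlen (p);

      /* Work on a bounded copy so parsing never crosses the line end.  */
      char line[512];
      size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
      memcpy (line, p, n);
      line[n] = '\0';

      long marker_line;
      char marker_file[256];
      if (sscanf (line, "# %ld \"%255[^\"]\"", &marker_line, marker_file) == 2)
        {
          /* A linemarker describes the line that follows it.  */
          in_file = strcmp (marker_file, file) == 0;
          cur_line = marker_line;
        }
      else
        {
          if (in_file && cur_line == want_line)
            {
              size_t m = len < out_size - 1 ? len : out_size - 1;
              memcpy (out, p, m);
              out[m] = '\0';
              return 1;
            }
          cur_line++;
        }

      p = eol ? eol + 1 : p + len;
    }
  return 0;
}
```

Lines that the preprocessor dropped have no counterpart in the output, so a
lookup for them fails instead of returning unrelated text.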

This is my first patch to GCC, so in case something is wrong with the format, 
please let me know.

Best regards
Lucas



Re: [PATCH] testsuite, jit: Stabilize error output.

2024-01-22 Thread Iain Sandoe
gentle ping,
with the increasing use of CI, it seems an idea to tackle this sooner rather 
than later.
thanks
Iain

> On 16 Jan 2024, at 11:12, Iain Sandoe  wrote:
> 
> Tested on x86_64, i686 Darwin, x86_64 Linux,
> OK for trunk? When?
> thanks
> Iain
> 
> --- 8< ---
> 
> Currently when a test fails, we print out a lot of information,
> this includes items that are not stable between invocations (e.g.
> the PID for the executable).  That makes automated comparisons
> between test runs flag any persistent fails as new ones each time
> which is not usually what is wanted.
> 
> This patch amends the error output to drop the variable portion
> of the message and retain items that should only change if the
> failure mode changes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * jit.dg/jit.exp: Filter error output to remove per-run
>   variable content.
> 
> Signed-off-by: Iain Sandoe 
> ---
> gcc/testsuite/jit.dg/jit.exp | 21 +++--
> 1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
> index 286cfa8192a..893ff5f6dd0 100644
> --- a/gcc/testsuite/jit.dg/jit.exp
> +++ b/gcc/testsuite/jit.dg/jit.exp
> @@ -94,25 +94,34 @@ proc parse_valgrind_logfile {name logfile} {
> # unexpected exits.
> 
> proc verify_exit_status { executable wres } {
> -lassign $wres pid spawnid os_error_flag value
> +set extra [lassign $wres pid spawnid os_error_flag value]
> verbose "pid: $pid" 3
> verbose "spawnid: $spawnid" 3
> verbose "os_error_flag: $os_error_flag" 3
> verbose "value: $value" 3
> 
> # Detect segfaults etc:
> -if { [llength $wres] > 4 } {
> - if { [lindex $wres 4] == "CHILDKILLED" } {
> - fail "$executable killed: $wres"
> +set len [llength $extra]
> +if { $len >= 1 } {
> + if { [lindex $extra 0] == "CHILDKILLED" } {
> + set reason "Unknown Reason"
> + set detail "No Details"
> + if { $len >= 2 } {
> + set reason [lindex $extra 1]
> + if { $len >= 3 } {
> + set detail [lindex $extra 2]
> + }
> + }
> + fail "$executable killed: $reason $detail"
>   return
>   }
> }
> if { $os_error_flag != 0 } {
> - fail "$executable: OS error: $wres"
> + fail "$executable: OS error: $os_error_flag $extra"
>   return
> }
> if { $value != 0 } {
> - fail "$executable: non-zero exit code: $wres"
> + fail "$executable: non-zero exit code: $value $extra"
>   return
> }
> pass "$executable exited cleanly"
> -- 
> 2.39.2 (Apple Git-143)
> 
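
The message-building logic of the Tcl change above can be restated in C for
readers who do not read Tcl.  This is a sketch only: format_killed and its
parameters are invented names, and EXTRA stands for whatever elements follow
the first four fields of the wait result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If EXTRA (N_EXTRA elements) starts with "CHILDKILLED", write a failure
   message into OUT that contains only the reason and detail strings, with
   fixed placeholders when they are missing, and return 1.  Otherwise
   return 0.  Nothing that varies per run, such as a PID, is included.  */
int
format_killed (const char *exe, const char *const *extra, size_t n_extra,
               char *out, size_t out_size)
{
  assert (exe != NULL && out != NULL && out_size > 0);

  if (n_extra < 1 || strcmp (extra[0], "CHILDKILLED") != 0)
    return 0;

  const char *reason = n_extra >= 2 ? extra[1] : "Unknown Reason";
  const char *detail = n_extra >= 3 ? extra[2] : "No Details";
  snprintf (out, out_size, "%s killed: %s %s", exe, reason, detail);
  return 1;
}
```

Two runs that fail the same way then produce identical text, so a comparison
tool no longer reports a persistent failure as a new one.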



Re: [PATCH] jit, Darwin: Implement library exports list.

2024-01-22 Thread Iain Sandoe
gentle ping,
this fixes quite a few of the new jit fails on darwin.
thanks
Iain

> On 16 Jan 2024, at 11:10, Iain Sandoe  wrote:
> 
> Tested on x86_64, i686 Darwin and x86_64 Linux,
> OK for trunk? when ?
> thanks,
> Iain
> 
> --- 8< ---
> 
> Currently, we have no exports list for libgccjit, which means that
> all symbols are exported, including those from libstdc++ which is
> linked statically into the lib.  This causes failures when the
> shared libstdc++ is used but some c++ symbols are satisfied from
> libgccjit.
> 
> This implements an export file for Darwin (which is currently
> manually created by cross-checking libgccjit.map).  Ideally we'd
> script this, at some point.  Update libtool current and age to
> reflect the current ABI version (we are not bumping the SO name
> at this stage).
> 
> This fixes a number of new failures in jit testing.
> 
> gcc/jit/ChangeLog:
> 
>   * Make-lang.in: Implement exports list, and use a shared
>   libgcc.
>   * libgccjit.exp: New file.
> 
> Signed-off-by: Iain Sandoe 
> ---
> gcc/jit/Make-lang.in  |  38 ---
> gcc/jit/libgccjit.exp | 229 ++
> 2 files changed, 251 insertions(+), 16 deletions(-)
> create mode 100644 gcc/jit/libgccjit.exp
> 
> diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
> index b1f0ce73e12..52dc2c24908 100644
> --- a/gcc/jit/Make-lang.in
> +++ b/gcc/jit/Make-lang.in
> @@ -55,7 +55,10 @@ else
> 
> ifneq (,$(findstring darwin,$(host)))
> 
> -LIBGCCJIT_AGE = 1
> +LIBGCCJIT_CURRENT = 26
> +LIBGCCJIT_REVISION = 0
> +LIBGCCJIT_AGE = 26
> +LIBGCCJIT_COMPAT = 0
> LIBGCCJIT_BASENAME = libgccjit
> 
> LIBGCCJIT_SONAME = \
> @@ -63,15 +66,15 @@ LIBGCCJIT_SONAME = \
> LIBGCCJIT_FILENAME = $(LIBGCCJIT_BASENAME).$(LIBGCCJIT_VERSION_NUM).dylib
> LIBGCCJIT_LINKER_NAME = $(LIBGCCJIT_BASENAME).dylib
> 
> -# Conditionalize the use of the LD_VERSION_SCRIPT_OPTION and
> -# LD_SONAME_OPTION depending if configure found them, using $(if)
> -# We have to define a COMMA here, otherwise the commas in the "true"
> -# result are treated as separators by the $(if).
> -COMMA := ,
> +# Darwin does not have a version script option. Exported symbols are controlled
> +# by the following, and library versioning is done using libtool.
> LIBGCCJIT_VERSION_SCRIPT_OPTION = \
> - $(if $(LD_VERSION_SCRIPT_OPTION),\
> -   -Wl$(COMMA)$(LD_VERSION_SCRIPT_OPTION)$(COMMA)$(srcdir)/jit/libgccjit.map)
> +  -Wl,-exported_symbols_list,$(srcdir)/jit/libgccjit.exp
> 
> +# Conditionalize the use of  LD_SONAME_OPTION on configure finding it, using
> +# $(if).  We have to define a COMMA here, otherwise the commas in the "true"
> +# result are treated as separators by the $(if).
> +COMMA := ,
> LIBGCCJIT_SONAME_OPTION = \
>   $(if $(LD_SONAME_OPTION), \
>-Wl$(COMMA)$(LD_SONAME_OPTION)$(COMMA)$(LIBGCCJIT_SONAME))
> @@ -143,15 +146,18 @@ ifneq (,$(findstring mingw,$(target)))
> # Create import library
> LIBGCCJIT_EXTRA_OPTS = -Wl,--out-implib,$(LIBGCCJIT_IMPORT_LIB)
> else
> -
> ifneq (,$(findstring darwin,$(host)))
> -# TODO : Construct a Darwin-style symbol export file.
> -LIBGCCJIT_EXTRA_OPTS = -Wl,-compatibility_version,$(LIBGCCJIT_VERSION_NUM) \
> - -Wl,-current_version,$(LIBGCCJIT_VERSION_NUM).$(LIBGCCJIT_MINOR_NUM).$(LIBGCCJIT_AGE) \
> - $(LIBGCCJIT_VERSION_SCRIPT_OPTION) \
> - $(LIBGCCJIT_SONAME_OPTION)
> +LIBGCCJIT_VERS = $(LIBGCCJIT_CURRENT).$(LIBGCCJIT_REVISION).$(LIBGCCJIT_AGE)
> +LIBGCCJIT_EXTRA_OPTS = -Wl,-current_version,$(LIBGCCJIT_VERS) \
> + -Wl,-compatibility_version,$(LIBGCCJIT_COMPAT) \
> +  $(LIBGCCJIT_VERSION_SCRIPT_OPTION) $(LIBGCCJIT_SONAME_OPTION)
> +# Use the default (shared) libgcc.
> +JIT_LDFLAGS = $(filter-out -static-libgcc, $(LDFLAGS))
> +ifeq (,$(findstring darwin8,$(host)))
> +JIT_LDFLAGS += -Wl,-rpath,@loader_path
> +endif
> else
> -
> +JIT_LDFLAGS = $(LDFLAGS)
> LIBGCCJIT_EXTRA_OPTS = $(LIBGCCJIT_VERSION_SCRIPT_OPTION) \
>   $(LIBGCCJIT_SONAME_OPTION)
> endif
> @@ -170,7 +176,7 @@ $(LIBGCCJIT_FILENAME): $(jit_OBJS) \
>   $(LIBDEPS) $(srcdir)/jit/libgccjit.map \
>   $(EXTRA_GCC_OBJS_EXCLUSIVE) $(jit.prev)
>   @$(call LINK_PROGRESS,$(INDEX.jit),start)
> - +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ -shared \
> + +$(LLINKER) $(ALL_LINKERFLAGS) $(JIT_LDFLAGS) -o $@ -shared \
>$(jit_OBJS) libbackend.a libcommon-target.a libcommon.a \
>$(CPPLIB) $(LIBDECNUMBER) $(EXTRA_GCC_LIBS) $(LIBS) $(BACKENDLIBS) \
>$(EXTRA_GCC_OBJS_EXCLUSIVE) \
> diff --git a/gcc/jit/libgccjit.exp b/gcc/jit/libgccjit.exp
> new file mode 100644
> index 000..0829503a53e
> --- /dev/null
> +++ b/gcc/jit/libgccjit.exp
> @@ -0,0 +1,229 @@
> +# Linker export list for Darwin libgccjit.dylib
> +
> +#   Contributed by Iain Sandoe .
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by
> +# the Free