Re: [pushed][PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu



在 2024/3/9 上午9:48, chenglulu 写道:

Pushed to r14-9407.

Cherry picked to r13-8413 and r12-10200.


在 2024/3/7 上午9:12, Lulu Cheng 写道:
If the hardware does not support LAMCAS, atomic_compare_and_swapsi 
needs to be
implemented through "ll.w+sc.w". In the implementation of the 
instruction sequence,

it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 
32-bit
and 64-bit, the two operand registers that need to be compared are 
symbolically
extended, and one of the operand registers is obtained from memory 
through the
"ll.w" instruction, which can ensure that the symbolic expansion is 
carried out.
However, the value of the other operand register is not guaranteed to 
be the

value of the sign extension.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.
---
  gcc/config/loongarch/sync.md  | 46 ++-
  .../g++.target/loongarch/atomic-cas-int.C | 32 +
  2 files changed, 67 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..d41c2d26811 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong"
 (clobber (match_scratch:GPR 5 "="))]
    ""
  {
-  return "1:\\n\\t"
- "ll.\\t%0,%1\\n\\t"
- "bne\\t%0,%z2,2f\\n\\t"
- "or%i3\\t%5,$zero,%3\\n\\t"
- "sc.\\t%5,%1\\n\\t"
- "beqz\\t%5,1b\\n\\t"
- "b\\t3f\\n\\t"
- "2:\\n\\t"
- "%G4\\n\\t"
- "3:\\n\\t";
+  output_asm_insn ("1:", operands);
+  output_asm_insn ("ll.\t%0,%1", operands);
+
+  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and 
higher, the
+ return value of the val_without_const_folding will not be 
truncated and

+ will be passed directly to the function compare_exchange_strong.
+ However, the instruction 'bne' does not distinguish between 
32-bit and
+ 64-bit operations.  so if the upper 32 bits of the register are 
not
+ extended by the 32nd bit symbol, then the comparison may not be 
valid

+ here.  This will affect the result of the operation.  */
+
+  if (TARGET_64BIT && REG_P (operands[2])
+  && GET_MODE (operands[2]) == SImode)
+    {
+  output_asm_insn ("addi.w\t%5,%2,0", operands);
+  output_asm_insn ("bne\t%0,%5,2f", operands);
+    }
+  else
+    output_asm_insn ("bne\t%0,%z2,2f", operands);
+
+  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
+  output_asm_insn ("sc.\t%5,%1", operands);
+  output_asm_insn ("beqz\t%5,1b", operands);
+  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("2:", operands);
+  output_asm_insn ("%G4", operands);
+  output_asm_insn ("3:", operands);
+
+  return "";
  }
-  [(set (attr "length") (const_int 28))])
+  [(set (attr "length")
+ (if_then_else
+    (and (match_test "GET_MODE (operands[2]) == SImode")
+ (match_test "REG_P (operands[2])"))
+    (const_int 32)
+    (const_int 28)))])
    (define_insn "atomic_cas_value_strong_amcas"
    [(set (match_operand:QHWD 0 "register_operand" "=")
diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C 
b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

new file mode 100644
index 000..830ce48267a
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+__attribute__ ((noinline)) long
+val_without_const_folding (long val)
+{
+  return val;
+}
+
+int
+main ()
+{
+  int oldval = 0xaa;
+  int newval = 0xbb;
+  std::atomic amo;
+
+  amo.store (oldval);
+
+  long longval = val_without_const_folding (0xff80 + 
oldval);

+  oldval = static_cast (longval);
+
+  amo.compare_exchange_strong (oldval, newval);
+
+  if (newval != amo.load (std::memory_order_relaxed))
+    __builtin_abort ();
+
+  return 0;
+}
+




[PATCH v2 2/2] bugzilla: remove `gcc-bugs@` mailing list address

2024-03-08 Thread Ben Boeckel
Bugzilla is preferred today. Use a URL that gives context about
gathering information prior to actually filing a bug at Bugzilla.

ChangeLog:

* config-ml.in: Replace gcc-bugs@ with bug reporting link.
* symlink-tree: Replace gcc-bugs@ with bug reporting link.

fixincludes/ChangeLog:

* README: Replace gcc-bugs@ with bug reporting link.

gcc/testsuite/ChangeLog:

* lib/file-format.exp: Replace gcc-bugs@ with bug reporting link.

libcpp/ChangeLog:

* configure: Regenerate.
* configure.ac: Replace gcc-bugs@ with bug reporting link.

libdecnumber/ChangeLog:

* configure: Regenerate.
* configure.ac: Replace gcc-bugs@ with bug reporting link.

Signed-off-by: Ben Boeckel 
---
v1 -> v2:
- Use `https://gcc.gnu.org/bugs/ instead of a direct Bugzilla link
- Regenerate `configure` scripts

---
 config-ml.in  |  2 +-
 fixincludes/README|  4 ++--
 gcc/testsuite/lib/file-format.exp |  4 ++--
 libcpp/configure  | 22 +++---
 libcpp/configure.ac   |  2 +-
 libdecnumber/configure| 22 +++---
 libdecnumber/configure.ac |  2 +-
 symlink-tree  |  2 +-
 8 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/config-ml.in b/config-ml.in
index 6aff74410c0..dea86f7b3fe 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -24,7 +24,7 @@
 # configuration script generated by Autoconf, you may include it under
 # the same distribution terms that you use for the rest of that program.
 #
-# Please report bugs to 
+# Please report bugs to 
 # and send patches to .
 
 # It is advisable to support a few --enable/--disable options to let the
diff --git a/fixincludes/README b/fixincludes/README
index 98480165d10..28c9cfee194 100644
--- a/fixincludes/README
+++ b/fixincludes/README
@@ -6,8 +6,8 @@ If you are having some problem with a system header that is 
either
 broken by the manufacturer, or is broken by the fixinclude process,
 then you will need to alter or add information to the include fix
 definitions file, ``inclhack.def''.  Please also send relevant
-information to gcc-b...@gcc.gnu.org, gcc-patches@gcc.gnu.org and,
-please, to me:  bk...@gnu.org.
+information to https://gcc.gnu.org/bugs/, gcc-patches@gcc.gnu.org
+and, please, to me:  bk...@gnu.org.
 
 To make your fix, you will need to do several things:
 
diff --git a/gcc/testsuite/lib/file-format.exp 
b/gcc/testsuite/lib/file-format.exp
index 0670bda9c81..1a6904a9923 100644
--- a/gcc/testsuite/lib/file-format.exp
+++ b/gcc/testsuite/lib/file-format.exp
@@ -14,8 +14,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-# Please email any bugs, comments, and/or additions to this file to:
-# gcc-b...@gcc.gnu.org
+# Please report any bugs, comments, and/or additions to this file to:
+# https://gcc.gnu.org/bugs/
 
 # This file defines a proc for determining the file format in use by the
 # target.  This is useful for tests that are only supported by certain file
diff --git a/libcpp/configure b/libcpp/configure
index 8a38c0546e3..3904a66e190 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -2,7 +2,7 @@
 # Guess values for system-dependent variables and create Makefiles.
 # Generated by GNU Autoconf 2.69 for cpplib  .
 #
-# Report bugs to .
+# Report bugs to .
 #
 #
 # Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@@ -267,10 +267,10 @@ fi
 $as_echo "$0: be upgraded to zsh 4.3.4 or later."
   else
 $as_echo "$0: Please tell bug-autoc...@gnu.org and
-$0: gcc-b...@gcc.gnu.org about your system, including any
-$0: error possibly output before this message. Then install
-$0: a modern shell, or manually run the script under such a
-$0: shell if you do have one."
+$0: https://gcc.gnu.org/bugs/ about your system, including
+$0: any error possibly output before this message. Then
+$0: install a modern shell, or manually run the script
+$0: under such a shell if you do have one."
   fi
   exit 1
 fi
@@ -582,7 +582,7 @@ PACKAGE_NAME='cpplib'
 PACKAGE_TARNAME='cpplib'
 PACKAGE_VERSION=' '
 PACKAGE_STRING='cpplib  '
-PACKAGE_BUGREPORT='gcc-b...@gcc.gnu.org'
+PACKAGE_BUGREPORT='https://gcc.gnu.org/bugs/'
 PACKAGE_URL=''
 
 ac_unique_file="ucnid.h"
@@ -1424,7 +1424,7 @@ Some influential environment variables:
 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
 
-Report bugs to .
+Report bugs to .
 _ACEOF
 ac_status=$?
 fi
@@ -1684,9 +1684,9 @@ $as_echo "$as_me: WARNING: $2: see the Autoconf 
documentation" >&2;}
 $as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be Compiled\"" 
>&2;}
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the 
compiler's result" >&5
 $as_echo "$as_me: WARNING: $2: proceeding with the 

[PATCH v2 1/2] email: fix patch email addresses

2024-03-08 Thread Ben Boeckel
ChangeLog:

* config-ml.in: Update patch email address.
* symlink-tree: Update patch email address.

Signed-off-by: Ben Boeckel 
---
v1 -> v2:
- add Signed-off-by

---
 config-ml.in | 2 +-
 symlink-tree | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/config-ml.in b/config-ml.in
index 68854a4f16c..6aff74410c0 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -25,7 +25,7 @@
 # the same distribution terms that you use for the rest of that program.
 #
 # Please report bugs to 
-# and send patches to .
+# and send patches to .
 
 # It is advisable to support a few --enable/--disable options to let the
 # user select which libraries s/he really wants.
diff --git a/symlink-tree b/symlink-tree
index a9d50831b88..5cb95ba66aa 100755
--- a/symlink-tree
+++ b/symlink-tree
@@ -24,7 +24,7 @@
 # the same distribution terms that you use for the rest of that program.
 #
 # Please report bugs to 
-# and send patches to .
+# and send patches to .
 
 # Syntax: symlink-tree srcdir "ignore1 ignore2 ..."
 #

base-commit: 6fe63013e3292a45288461b7efa9d52e0ac234dc
-- 
2.44.0



Re: libbacktrace patch committed: Don't assume compressed section aligned

2024-03-08 Thread H.J. Lu
On Fri, Mar 8, 2024 at 2:48 PM Fangrui Song  wrote:
>
> On ELF64, it looks like BFD uses 8-byte alignment for compressed
> `.debug_*` sections while gold/lld/mold use 1-byte alignment. I do not
> know how the Solaris linker sets the alignment.
>
> The specification's wording makes me confused whether it really
> requires 8-byte alignment, even if a non-packed `Elf64_Chdr` surely
> requires 8.

Since compressed sections begin with a compression header
structure that identifies the compression algorithm, compressed
sections must be aligned to the alignment of the compression
header.  I don't think there is any ambiguity here.

> > The sh_size and sh_addralign fields of the section header for a compressed 
> > section reflect the requirements of the compressed section.
>
> There are many `.debug_*` sections. So avoiding some alignment padding
> seems a very natural extension (a DWARF v5 -gsplit-dwarf relocatable
> file has ~10 `.debug_*` sections), even if the specification doesn't
> allow it with a very strict interpretation...
>
> (Off-topic: I wonder whether ELF control structures should use
> unaligned LEB128 more. REL/RELA can naturally be replaced with a
> LEB128 one similar to wasm.)
>
> On Fri, Mar 8, 2024 at 1:57 PM Ian Lance Taylor  wrote:
> >
> > Reportedly when lld compresses debug sections, it fails to set the
> > alignment of the compressed section such that the compressed header
> > can be read directly.  To me this seems like a bug in lld.  However,
> > libbacktrace needs to work around it.  This patch, originally by the
> > GitHub user ubyte, does that.  Bootstrapped and tested on
> > x86_64-pc-linux-gnu.  Committed to mainline.
> >
> > Ian
> >
> > * elf.c (elf_uncompress_chdr): Don't assume compressed section is
> > aligned.
>
>
>
> --
> 宋方睿



-- 
H.J.


[PATCH wwwdocs 1/1] [RESEND] gcc-14: document P1689R5 scanning output support

2024-03-08 Thread Ben Boeckel
---
 htdocs/gcc-14/changes.html | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 7278f753..b506eeb1 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -112,6 +112,17 @@ a work-in-progress.
   
 
   
+  C++ module scanning for named modules is now available:
+
+  https://wg21.link/P1689R5;>P1689R5, Format for
+  describing dependencies of source files.
+  
+  The -fdeps-format=, -fdeps-file=, and
+  -fdeps=target= flags may be used to generate P1689 output
+  (the p1689r5 format is the only available format today).
+  
+
+  
 
 
 Runtime Library (libstdc++)
-- 
2.42.0



Re: [pushed][PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu

Pushed to r14-9407.

在 2024/3/7 上午9:12, Lulu Cheng 写道:

If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction 
sequence,
it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared are symbolically
extended, and one of the operand registers is obtained from memory through the
"ll.w" instruction, which can ensure that the symbolic expansion is carried out.
However, the value of the other operand register is not guaranteed to be the
value of the sign extension.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.
---
  gcc/config/loongarch/sync.md  | 46 ++-
  .../g++.target/loongarch/atomic-cas-int.C | 32 +
  2 files changed, 67 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..d41c2d26811 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong"
 (clobber (match_scratch:GPR 5 "="))]
""
  {
-  return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
-"bne\\t%0,%z2,2f\\n\\t"
-"or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
-"beqz\\t%5,1b\\n\\t"
-"b\\t3f\\n\\t"
-"2:\\n\\t"
-"%G4\\n\\t"
-"3:\\n\\t";
+  output_asm_insn ("1:", operands);
+  output_asm_insn ("ll.\t%0,%1", operands);
+
+  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
+ return value of the val_without_const_folding will not be truncated and
+ will be passed directly to the function compare_exchange_strong.
+ However, the instruction 'bne' does not distinguish between 32-bit and
+ 64-bit operations.  so if the upper 32 bits of the register are not
+ extended by the 32nd bit symbol, then the comparison may not be valid
+ here.  This will affect the result of the operation.  */
+
+  if (TARGET_64BIT && REG_P (operands[2])
+  && GET_MODE (operands[2]) == SImode)
+{
+  output_asm_insn ("addi.w\t%5,%2,0", operands);
+  output_asm_insn ("bne\t%0,%5,2f", operands);
+}
+  else
+output_asm_insn ("bne\t%0,%z2,2f", operands);
+
+  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
+  output_asm_insn ("sc.\t%5,%1", operands);
+  output_asm_insn ("beqz\t%5,1b", operands);
+  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("2:", operands);
+  output_asm_insn ("%G4", operands);
+  output_asm_insn ("3:", operands);
+
+  return "";
  }
-  [(set (attr "length") (const_int 28))])
+  [(set (attr "length")
+ (if_then_else
+   (and (match_test "GET_MODE (operands[2]) == SImode")
+(match_test "REG_P (operands[2])"))
+   (const_int 32)
+   (const_int 28)))])
  
  (define_insn "atomic_cas_value_strong_amcas"

[(set (match_operand:QHWD 0 "register_operand" "=")
diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C 
b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
new file mode 100644
index 000..830ce48267a
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+__attribute__ ((noinline)) long
+val_without_const_folding (long val)
+{
+  return val;
+}
+
+int
+main ()
+{
+  int oldval = 0xaa;
+  int newval = 0xbb;
+  std::atomic amo;
+
+  amo.store (oldval);
+
+  long longval = val_without_const_folding (0xff80 + oldval);
+  oldval = static_cast (longval);
+
+  amo.compare_exchange_strong (oldval, newval);
+
+  if (newval != amo.load (std::memory_order_relaxed))
+__builtin_abort ();
+
+  return 0;
+}
+




Re: [pushed][PATCH] LoongArch: testsuite: Add compilation options to the regname-fp-s9.c.

2024-03-08 Thread chenglulu

Pushed to r14-9408.

在 2024/3/7 上午9:50, Lulu Cheng 写道:

When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors',
regname-s9-fp.c will test to fail. To solve this problem, add the compilation
option '-Wno-pedantic -std=gnu90' to this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/regname-fp-s9.c: Add compilation option
'-Wno-pedantic -std=gnu90'.
---
  gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
index d2e3b80f83c..77a74f1f667 100644
--- a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -1,3 +1,4 @@
  /* { dg-do compile } */
+/* { dg-additional-options "-Wno-pedantic -std=gnu90" } */
  register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
  register long fp asm("fp"); /* { dg-warning "register of 'fp' used for multiple 
global register variables" } */




Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-08 Thread chenglulu



在 2024/3/8 下午2:22, Xi Ruoyao 写道:

On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:

在 2024/3/7 下午8:52, Xi Ruoyao 写道:

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:

I wanted to do this at first, but it didn't work out.

But then I thought about it, and there are two benefits to putting it in
the middle of ll/sc:

1. If there is an operation that uses the $r4 register after this atomic
operation, another

register is required to store $r4.

2. ll.w requires long cycles, so putting an addi.w command after ll.w
won't make a difference.

So based on the above, I didn't try again, but directly made a
modification like a patch.

Ah, the explanation makes sense to me.  Ok with the original patch then.

Ok.Thank you so much!:-)



[committed] libstdc++: Do not require a time-of-day when parsing sys_days [PR114240]

2024-03-08 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

When parsing a std::chrono::sys_days (or a sys_time with an even longer
period) we should not require a time-of-day to be present in the input,
because we can't represent that in the result type anyway.

Rather than trying to decide which specializations should require a
time-of-date and which should not, follow the direction of Howard
Hinnant's date library, which allows extracting a sys_time of any period
from input that only contains a date, defaulting the time-of-day part to
00:00:00. This seems consistent with the intent of the standard, which
says it's an error "If the parse fails to decode a valid date" (i.e., it
doesn't care about decoding a valid time, only a date).

libstdc++-v3/ChangeLog:

PR libstdc++/114240
* include/bits/chrono_io.h (_Parser::operator()): Assume
hours(0) for a time_point, so that a time is not required
to be present.
* testsuite/std/time/parse/114240.cc: New test.
---
 libstdc++-v3/include/bits/chrono_io.h | 12 ++-
 .../testsuite/std/time/parse/114240.cc| 36 +++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/parse/114240.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index eaa36b8a074..b9eb3d2be53 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -3157,6 +3157,16 @@ namespace __detail
  minutes __tz_offset = __bad_min;
  basic_string<_CharT, _Traits> __tz_abbr;
 
+ if ((_M_need & _ChronoParts::_TimeOfDay)
+   && (_M_need & _ChronoParts::_Year))
+   {
+ // For time_points assume "00:00:00" is implicitly present,
+ // so we don't fail to parse if it's not (PR libstdc++/114240).
+ // We will still fail to parse if there's no year+month+day.
+ __h = hours(0);
+ __parts = _ChronoParts::_TimeOfDay;
+   }
+
  // bool __is_neg = false; // TODO: how is this handled for parsing?
 
  _CharT __mod{}; // One of 'E' or 'O' or nul.
@@ -4098,7 +4108,7 @@ namespace __detail
  const bool __need_wday = _M_need & _ChronoParts::_Weekday;
 
  // Whether the caller wants _M_sys_days and _M_time.
- // Only true for time_points.
+ // Only true for durations and time_points.
  const bool __need_time = _M_need & _ChronoParts::_TimeOfDay;
 
  if (__need_wday && __wday != __bad_wday)
diff --git a/libstdc++-v3/testsuite/std/time/parse/114240.cc 
b/libstdc++-v3/testsuite/std/time/parse/114240.cc
new file mode 100644
index 000..46310efd09a
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/parse/114240.cc
@@ -0,0 +1,36 @@
+// { dg-do run { target c++20 } }
+
+// PR libstdc++/114240 sys_days not being parsed with only a date in the stream
+
+#include 
+#include 
+#include 
+
+template
+void
+test_parse_date_only()
+{
+  using namespace std::chrono;
+
+  using CDays = time_point;
+  CDays td;
+  std::istringstream is("2024-03-05");
+  VERIFY( is >> parse("%Y-%m-%d ", td) );
+  if constexpr (std::is_same_v)
+VERIFY( td == static_cast>(2024y/March/5) );
+  else
+  {
+auto tp = clock_cast(sys_days(2024y/March/5));
+VERIFY( td == time_point_cast(tp) );
+  }
+}
+
+int main()
+{
+  test_parse_date_only();
+  test_parse_date_only();
+  test_parse_date_only();
+  test_parse_date_only();
+  test_parse_date_only();
+  test_parse_date_only();
+}
-- 
2.43.2



[committed] libstdc++: Fix parsing of leap seconds as chrono::utc_time [PR114279]

2024-03-08 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Implementing all chrono::from_stream overloads in terms of
chrono::sys_time meant that a leap second time like 23:59:60.001 cannot
be parsed, because that cannot be represented in a sys_time.

The fix to support parsing leap seconds as utc_time is to convert the
parsed date to utc_time and then add the parsed time to that,
which allows the result to land in a leap second, rather than doing all
the arithmetic with sys_time which doesn't have leap seconds.

For local_time we also allow %S to parse a 60s value, because doing
otherwise might disallow some valid uses. We can't know all use cases
users have for treating times as local_time.

For all other clocks, we can reject times that have 60 or 60.nnn as the
seconds part, because that cannot occur in a valid UNIX, GPS, or TAI
time. Since our chrono::file_clock uses sys_time, it can't occur for
that clock either.

In order to support this a new _M_is_leap_second member is needed in the
_Parser type. This can be added at the end, where most targets currently
have padding bytes. Similar to what I did recently for formatter _Spec
structs, we can also reserve additional padding bits for future
expansion.

This also fixes bugs in the from_stream overloads for utc_time,
tai_time, gps_time, and file_time, which were not using time_point_cast
to explicitly convert to the result type. That's needed because the
result type might have lower precision than the value returned from
from_sys or from_utc, which has a precision no lower than seconds.

libstdc++-v3/ChangeLog:

PR libstdc++/114279
* include/bits/chrono_io.h (_Parser::_M_is_leap_second): New
data member.
(_Parser::_M_reserved): Reserve padding bits for future use.
(_Parser::operator()): Set _M_is_leap_second if %S reads 60s.
(from_stream): Only allow _M_is_leap_second for utc_time and
local_time. Adjust arithmetic for utc_time so that leap seconds
are preserved. Use time_point_cast to convert to a possibly
lower-precision result type.
* testsuite/std/time/parse.cc: Move to ...
* testsuite/std/time/parse/parse.cc: ... here.
* testsuite/std/time/parse/114279.cc: New test.
---
 libstdc++-v3/include/bits/chrono_io.h | 74 ---
 .../testsuite/std/time/parse/114279.cc| 53 +
 .../testsuite/std/time/{ => parse}/parse.cc   |  0
 3 files changed, 115 insertions(+), 12 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/parse/114279.cc
 rename libstdc++-v3/testsuite/std/time/{ => parse}/parse.cc (100%)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 412e8b83fb7..eaa36b8a074 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2164,6 +2164,8 @@ namespace __detail
   year_month_day _M_ymd{};
   weekday _M_wd{};
   __format::_ChronoParts _M_need;
+  unsigned _M_is_leap_second : 1 {};
+  unsigned _M_reserved : 15 {};
 
   template
basic_istream<_CharT, _Traits>&
@@ -2742,8 +2744,13 @@ namespace __detail
   __detail::_Parser_t<_Duration> __p(__need);
   if (__p(__is, __fmt, __abbrev, __offset))
{
- auto __st = __p._M_sys_days + __p._M_time - *__offset;
- __tp = chrono::time_point_cast<_Duration>(__st);
+ if (__p._M_is_leap_second)
+   __is.setstate(ios_base::failbit);
+ else
+   {
+ auto __st = __p._M_sys_days + __p._M_time - *__offset;
+ __tp = chrono::time_point_cast<_Duration>(__st);
+   }
}
   return __is;
 }
@@ -2765,9 +2772,21 @@ namespace __detail
basic_string<_CharT, _Traits, _Alloc>* __abbrev = nullptr,
minutes* __offset = nullptr)
 {
-  sys_time<_Duration> __st;
-  if (chrono::from_stream(__is, __fmt, __st, __abbrev, __offset))
-   __tp = utc_clock::from_sys(__st);
+  minutes __off{};
+  if (!__offset)
+   __offset = &__off;
+  using __format::_ChronoParts;
+  auto __need = _ChronoParts::_Year | _ChronoParts::_Month
+   | _ChronoParts::_Day | _ChronoParts::_TimeOfDay;
+  __detail::_Parser_t<_Duration> __p(__need);
+  if (__p(__is, __fmt, __abbrev, __offset))
+   {
+ // Converting to utc_time before adding _M_time is necessary for
+ // "23:59:60" to correctly produce a time within a leap second.
+ auto __ut = utc_clock::from_sys(__p._M_sys_days) + __p._M_time
+   - *__offset;
+ __tp = chrono::time_point_cast<_Duration>(__ut);
+   }
   return __is;
 }
 
@@ -2788,9 +2807,24 @@ namespace __detail
basic_string<_CharT, _Traits, _Alloc>* __abbrev = nullptr,
minutes* __offset = nullptr)
 {
-  utc_time<_Duration> __ut;
-  if (chrono::from_stream(__is, __fmt, __ut, __abbrev, __offset))
-   

Re: [PATCH v2] c++: Check module attachment instead of just purview when necessary [PR112631]

2024-03-08 Thread Nathaniel Shead
On Fri, Mar 08, 2024 at 10:19:52AM -0500, Jason Merrill wrote:
> On 3/7/24 21:55, Nathaniel Shead wrote:
> > On Mon, Nov 27, 2023 at 03:59:39PM +1100, Nathaniel Shead wrote:
> > > On Thu, Nov 23, 2023 at 03:03:37PM -0500, Nathan Sidwell wrote:
> > > > On 11/20/23 04:47, Nathaniel Shead wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
> > > > > access.
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > Block-scope declarations of functions or extern values are not allowed
> > > > > when attached to a named module. Similarly, class member functions are
> > > > > not inline if attached to a named module. However, in both these cases
> > > > > we currently only check if the declaration is within the module 
> > > > > purview;
> > > > > it is possible for such a declaration to occur within the module 
> > > > > purview
> > > > > but not be attached to a named module (e.g. in an 'extern "C++"' 
> > > > > block).
> > > > > This patch makes the required adjustments.
> > > > 
> > > > 
> > > > Ah I'd been puzzling over the default inlinedness of  member-fns of
> > > > block-scope structs.  Could you augment the testcase to make sure that's
> > > > right too?
> > > > 
> > > > Something like:
> > > > 
> > > > // dg-module-do link
> > > > export module Mod;
> > > > 
> > > > export auto Get () {
> > > >struct X { void Fn () {} };
> > > >return X();
> > > > }
> > > > 
> > > > 
> > > > ///
> > > > import Mod
> > > > void Frob () { Get().Fn(); }
> > > > 
> > > 
> > > I gave this a try and it indeed doesn't work correctly; 'Fn' needs to be
> > > marked 'inline' for this to link (whether or not 'Get' itself is
> > > inline). I've tried tracing the code to work out what's going on but
> > > I've been struggling to work out how all the different flags (e.g.
> > > TREE_PUBLIC, TREE_EXTERNAL, TREE_COMDAT, DECL_NOT_REALLY_EXTERN)
> > > interact, which flags we want to be set where, and where the decision of
> > > what function definitions to emit is actually made.
> > > 
> > > I did find that doing 'mark_used(decl)' on all member functions in
> > > block-scope structs seems to work however, but I wonder if that's maybe
> > > too aggressive or if there's something else we should be doing?
> > 
> > I got around to looking at this again, here's an updated version of this
> > patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > (I'm not sure if 'start_preparsed_function' is the right place to be
> > putting this kind of logic or if it should instead be going in
> > 'grokfndecl', e.g. decl.cc:10761 where the rules for making local
> > functions have no linkage are initially determined, but I found this
> > easier to implement: happy to move around though if preferred.)
> > 
> > -- >8 --
> > 
> > Block-scope declarations of functions or extern values are not allowed
> > when attached to a named module. Similarly, class member functions are
> > not inline if attached to a named module. However, in both these cases
> > we currently only check if the declaration is within the module purview;
> > it is possible for such a declaration to occur within the module purview
> > but not be attached to a named module (e.g. in an 'extern "C++"' block).
> > This patch makes the required adjustments.
> > 
> > While implementing this we discovered that block-scope local functions
> > are not correctly emitted, causing link failures; this patch also
> > corrects some assumptions here and ensures that they are emitted when
> > needed.
> > 
> > PR c++/112631
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (named_module_attach_p): New function.
> > * decl.cc (start_decl): Check for attachment not purview.
> > (grokmethod): Likewise.
> 
> These changes are OK; the others I want to consider more.
> 

Thanks, I can commit this as a separate commit if you prefer?

> > +export auto n_n() {
> > +  internal();
> > +  struct X { void f() { internal(); } };
> > +  return X{};
> 
> Hmm, is this not a prohibited exposure?  Seems like X has no linkage because
> it's at block scope, and the deduced return type names it.
> 
> I know we try to support this "voldemort" pattern, but is that actually
> correct?
> 
> Jason
> 

I had similar doubts, but this is not an especially uncommon pattern in
the wild either. I also asked some other people for their thoughts and
got told:

  "no linkage" doesn't mean it's ill-formed to name it in other scopes.
  It means a declaration in another scope cannot correspond to it

And that the wording in [basic.link] p2.4 is imprecise. (Apparently they
were going to raise a core issue about this too, I think?)

As for whether it's an exposure, looking at [basic.link] p15, the entity
'X' doesn't actually appear to be TU-local: it doesn't have a name with
internal linkage (no linkage is different) and is not declared or
introduced within the definition of a TU-local entity (n_n is not
TU-local). So I think this example is OK, but this example is not:

  

Re: libbacktrace patch committed: Don't assume compressed section aligned

2024-03-08 Thread Fangrui Song
On ELF64, it looks like BFD uses 8-byte alignment for compressed
`.debug_*` sections while gold/lld/mold use 1-byte alignment. I do not
know how the Solaris linker sets the alignment.

The specification's wording makes me confused whether it really
requires 8-byte alignment, even if a non-packed `Elf64_Chdr` surely
requires 8.

> The sh_size and sh_addralign fields of the section header for a compressed 
> section reflect the requirements of the compressed section.

There are many `.debug_*` sections. So avoiding some alignment padding
seems a very natural extension (a DWARF v5 -gsplit-dwarf relocatable
file has ~10 `.debug_*` sections), even if the specification doesn't
allow it with a very strict interpretation...

(Off-topic: I wonder whether ELF control structures should use
unaligned LEB128 more. REL/RELA can naturally be replaced with a
LEB128 one similar to wasm.)

On Fri, Mar 8, 2024 at 1:57 PM Ian Lance Taylor  wrote:
>
> Reportedly when lld compresses debug sections, it fails to set the
> alignment of the compressed section such that the compressed header
> can be read directly.  To me this seems like a bug in lld.  However,
> libbacktrace needs to work around it.  This patch, originally by the
> GitHub user ubyte, does that.  Bootstrapped and tested on
> x86_64-pc-linux-gnu.  Committed to mainline.
>
> Ian
>
> * elf.c (elf_uncompress_chdr): Don't assume compressed section is
> aligned.



-- 
宋方睿


libbacktrace patch committed: Don't assume compressed section aligned

2024-03-08 Thread Ian Lance Taylor
Reportedly when lld compresses debug sections, it fails to set the
alignment of the compressed section such that the compressed header
can be read directly.  To me this seems like a bug in lld.  However,
libbacktrace needs to work around it.  This patch, originally by the
GitHub user ubyte, does that.  Bootstrapped and tested on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

* elf.c (elf_uncompress_chdr): Don't assume compressed section is
aligned.
5825bd0e0d0040126e78269e56c9b9f533e2a520
diff --git a/libbacktrace/elf.c b/libbacktrace/elf.c
index 7841c86cd9c..3cd87020b03 100644
--- a/libbacktrace/elf.c
+++ b/libbacktrace/elf.c
@@ -5076,7 +5076,7 @@ elf_uncompress_chdr (struct backtrace_state *state,
 backtrace_error_callback error_callback, void *data,
 unsigned char **uncompressed, size_t *uncompressed_size)
 {
-  const b_elf_chdr *chdr;
+  b_elf_chdr chdr;
   char *alc;
   size_t alc_len;
   unsigned char *po;
@@ -5088,27 +5088,30 @@ elf_uncompress_chdr (struct backtrace_state *state,
   if (compressed_size < sizeof (b_elf_chdr))
 return 1;
 
-  chdr = (const b_elf_chdr *) compressed;
+  /* The lld linker can misalign a compressed section, so we can't safely read
+ the fields directly as we can for other ELF sections.  See
+ https://github.com/ianlancetaylor/libbacktrace/pull/120.  */
+  memcpy (, compressed, sizeof (b_elf_chdr));
 
   alc = NULL;
   alc_len = 0;
-  if (*uncompressed != NULL && *uncompressed_size >= chdr->ch_size)
+  if (*uncompressed != NULL && *uncompressed_size >= chdr.ch_size)
 po = *uncompressed;
   else
 {
-  alc_len = chdr->ch_size;
+  alc_len = chdr.ch_size;
   alc = backtrace_alloc (state, alc_len, error_callback, data);
   if (alc == NULL)
return 0;
   po = (unsigned char *) alc;
 }
 
-  switch (chdr->ch_type)
+  switch (chdr.ch_type)
 {
 case ELFCOMPRESS_ZLIB:
   if (!elf_zlib_inflate_and_verify (compressed + sizeof (b_elf_chdr),
compressed_size - sizeof (b_elf_chdr),
-   zdebug_table, po, chdr->ch_size))
+   zdebug_table, po, chdr.ch_size))
goto skip;
   break;
 
@@ -5116,7 +5119,7 @@ elf_uncompress_chdr (struct backtrace_state *state,
   if (!elf_zstd_decompress (compressed + sizeof (b_elf_chdr),
compressed_size - sizeof (b_elf_chdr),
(unsigned char *)zdebug_table, po,
-   chdr->ch_size))
+   chdr.ch_size))
goto skip;
   break;
 
@@ -5126,7 +5129,7 @@ elf_uncompress_chdr (struct backtrace_state *state,
 }
 
   *uncompressed = po;
-  *uncompressed_size = chdr->ch_size;
+  *uncompressed_size = chdr.ch_size;
 
   return 1;
 
@@ -6876,8 +6879,8 @@ elf_add (struct backtrace_state *state, const char 
*filename, int descriptor,
}
 }
 
-  // A debuginfo file may not have a useful .opd section, but we can use the
-  // one from the original executable.
+  /* A debuginfo file may not have a useful .opd section, but we can use the
+ one from the original executable.  */
   if (opd == NULL)
 opd = caller_opd;
 


[pushed][PR113790][LRA]: Fixing LRA ICE on riscv64

2024-03-08 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113790

The patch was successfully bootstrapped and tested on x86-64,ppc64le, 
and aarch64.


commit cebbaa2a84586a7345837f74a53b7a0263bf29ee
Author: Vladimir N. Makarov 
Date:   Fri Mar 8 14:48:33 2024 -0500

[PR113790][LRA]: Fixing LRA ICE on riscv64

  LRA failed to consider all insn alternatives when non-reload pseudo
did not get a hard register.  This resulted in failure to generate
code by LRA.  The patch fixes this problem.

gcc/ChangeLog:

PR target/113790
* lra-assigns.cc (assign_by_spills): Set up all_spilled_pseudos
for non-reload pseudo too.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d1b2b35ffc9..7dfa6f70941 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1430,13 +1430,19 @@ assign_by_spills (void)
 	hard_regno = spill_for (regno, _spilled_pseudos, iter == 1);
 	  if (hard_regno < 0)
 	{
-	  if (reload_p) {
-		/* Put unassigned reload pseudo first in the
-		   array.  */
-		regno2 = sorted_pseudos[nfails];
-		sorted_pseudos[nfails++] = regno;
-		sorted_pseudos[i] = regno2;
-	  }
+	  if (reload_p)
+		{
+		  /* Put unassigned reload pseudo first in the array.  */
+		  regno2 = sorted_pseudos[nfails];
+		  sorted_pseudos[nfails++] = regno;
+		  sorted_pseudos[i] = regno2;
+		}
+	  else
+		{
+		  /* Consider all alternatives on the next constraint
+		 subpass.  */
+		  bitmap_set_bit (_spilled_pseudos, regno);
+		}
 	}
 	  else
 	{


Re: [PATCH] gomp: testsuite: improve compatibility of bad-array-section-3.c [PR113428]

2024-03-08 Thread Richard Sandiford
Richard Earnshaw  writes:
> This test generates different warnings on ilp32 targets because the size
> of an integer matches the size of a pointer.  Avoid this by using
> signed char.
>
> gcc/testsuite:
>
>   PR testsuite/113428
>   * gcc.dg/gomp/bad-array-section-c-3.c: Use signed char instead
>   of int.
> ---
>
> I think this fixes the issues seen on ilp32 machines, without substantially
> changing what the test does, but a second set of eyes wouldn't hurt.
>
>  gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

It looks good to me FWIW, but I'd feel more comfortable with an OK
from Jakub.

OK midday Monday UK time if no-one objects by then.

Richard

>
> diff --git a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c 
> b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
> index 8be15ced8c0..431af71c422 100644
> --- a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
> +++ b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
> @@ -1,15 +1,15 @@
>  /* { dg-do compile } */
>  
>  struct S {
> -  int *ptr;
> +  signed char *ptr;
>  };
>  
>  int main()
>  {
> -  int arr[20];
> +  signed char arr[20];
>  
>/* Reject array section in compound initialiser.  */
> -#pragma omp target map( (struct S) { .ptr = (int *) arr[5:5] } )
> +#pragma omp target map( (struct S) { .ptr = (signed char *) arr[5:5] } )
>  /* { dg-error {expected '\]' before ':' token} "" { target *-*-* } .-1 } */
>  /* { dg-warning {cast to pointer from integer of different size} "" { target 
> *-*-* } .-2 } */
>  /* { dg-message {sorry, unimplemented: unsupported map expression} "" { 
> target *-*-* } .-3 } */


Re: [PATCH] Revert "Pass GUILE down to subdirectories"

2024-03-08 Thread Tom Tromey
> "Andrew" == Andrew Burgess  writes:

Andrew> After once again forgetting to add GUILE=guile2.2 to my GDB build I was
Andrew> thinking about this issue again.

Andrew> Given that GDB has a --with-guile=... configure option, and that our
Andrew> configure scripts try to identify a matching version of guild and a
Andrew> shared library to link GDB against, I wondered why we don't just force
Andrew> the use of a matching version of guile.

Andrew> I guess I'm suggesting that for building GDB's guile components we
Andrew> should respect either the --with-guile=... configure option, or what the
Andrew> configure script auto-detected, and should not be picking up any build
Andrew> time GUILE=... flag -- setting GUILE=... isn't going to change the
Andrew> version of guild used, nor is it going to change the shared library that
Andrew> GDB links against, so just changing the guile version seems like a
Andrew> recipe for incompatibility problems.

Andrew> Below is a patch that forces GDB to compile our guile scripts using a
Andrew> suitable version of guile.  This could be applied irrespective of
Andrew> whether you revert b7e5a29602143b53267efcd9c8d5ecc78cd5a62f or not.

Andrew> What do you think?

I have no issue with this.  If it helps you, you should do it.

FWIW I was waiting for a response from you before reverting this change.
https://sourceware.org/pipermail/gdb-patches/2024-February/206507.html


Overall I think that we really should revert my change.

cgen isn't run commonly enough to warrant the change breaking literally
anything else.  And, part of this is on cgen for not coming with any
kind of script for running it ... actually looking under the hood has
really soured me on cgen entirely.

The guild #! prologue also seems like an obvious bug to me.  However
it's perhaps too late to fix this in any useful way.

Tom


Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-08 Thread Ken Matsui
On Thu, Mar 7, 2024 at 10:49 PM Richard Biener
 wrote:
>
> On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui  wrote:
> >
> > On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
> >  wrote:
> > >
> > > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  
> > > wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > > > > >
> > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > > equivalence does not hold for types other than signed integers.  
> > > > > > When
> > > > > > it comes to conditions, the former was translated to a combination 
> > > > > > of
> > > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > > latter.
> > > > > >
> > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed 
> > > > > > integers
> > > > > > as defined behavior, specifically, wrapping around according to 
> > > > > > two's
> > > > > > complement arithmetic.  This has implications for optimizations that
> > > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > > undefined.  Consider the example given:
> > > > > >
> > > > > > long long llmax = __LONG_LONG_MAX__;
> > > > > > long long llmin = -llmax - 1;
> > > > > >
> > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, 
> > > > > > which
> > > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum 
> > > > > > value
> > > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > > produces a negative value due to two's complement wraparound.
> > > > > > Therefore, `llmax - llmin < 0` is true.
> > > > > >
> > > > > > However, the direct comparison `llmax < llmin` is false since 
> > > > > > `llmax`
> > > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` 
> > > > > > is
> > > > > > enabled.  This is why this optimization pass is disabled under
> > > > > > `-fwrapv`.
> > > > > >
> > > > > > This optimization pass must run before the Jump Threading pass and 
> > > > > > the
> > > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > >
> > > > > > (1)
> > > > > >   int diff = x - y;
> > > > > >   if (diff > 0)
> > > > > > foo();
> > > > > >   if (diff < 0)
> > > > > > bar();
> > > > > >
> > > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > > because we know the postcondition of the first condition is diff <= 
> > > > > > 0,
> > > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > > after this VRP, we get:
> > > > > >
> > > > > > (2)
> > > > > >   int diff = x - y;
> > > > > >   if (x > y)
> > > > > > foo();
> > > > > >   if (diff != 0)
> > > > > > bar();
> > > > > >
> > > > > > This generates sub and test for the second condition and cmp for the
> > > > > > first condition. However, if we apply this pass beforehand, we 
> > > > > > simply
> > > > > > get:
> > > > > >
> > > > > > (3)
> > > > > >   int diff = x - y;
> > > > > >   if (x > y)
> > > > > > foo();
> > > > > >   if (x < y)
> > > > > > bar();
> > > > > >
> > > > > > In this code, diff will be eliminated as a dead code, and sub and 
> > > > > > test
> > > > > > will not be generated, which is more efficient.
> > > > > >
> > > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > > (3) above are recognized as different, which prevents TCO.
> > > > > >
> > > > > > PR middle-end/113680
> > > > >
> > > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > > the explicit code present in the forwprop pass or a new match.pd
> > > > > pattern.  There's possible interaction with x - y value being used
> > > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > > a comparison against zero being possibly implemented by
> > > > > a flag setting subtraction instruction.
> > > > >
> > > >
> > > > Thank you so much for your review!  Although the forwprop pass runs
> > > > multiple times, we might not need to apply this optimization multiple
> > > > times.  Would it be acceptable to add such optimization?  More
> > > > generally, I would like to know how to determine where to put
> > > > optimization in the future.
> > >
> > > This kind of pattern matching 

Re: [PATCH 01/11] gcc/doc/extend.texi: Sort built-in traits alphabetically

2024-03-08 Thread Ken Matsui
On Fri, Mar 8, 2024 at 8:38 AM Patrick Palka  wrote:
>
> Hi Ken,
>
> This patch series LGTM, thanks for these documentation improvements.

Thank you for your time to review!  I actually have some changes to
these patches, so I will re-send those to you once they are ready.

>
> On Fri, 1 Mar 2024, Ken Matsui wrote:
>
> > This patch sorts built-in traits alphabetically for better codebase
> > consistency and easier future integration of changes.
> >
> > gcc/ChangeLog:
> >
> >   * doc/extend.texi (Type Traits): Sort built-in traits
> >   alphabetically.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/doc/extend.texi | 62 ++---
> >  1 file changed, 31 insertions(+), 31 deletions(-)
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index f679c81acf2..b13f9d6f934 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -29499,15 +29499,6 @@ Requires: @var{type} shall be a complete type, 
> > (possibly cv-qualified)
> >  @code{void}, or an array of unknown bound.
> >  @enddefbuiltin
> >
> > -@defbuiltin{bool __has_nothrow_copy (@var{type})}
> > -If @code{__has_trivial_copy (type)} is @code{true} then the trait is
> > -@code{true}, else if @var{type} is a cv-qualified class or union type
> > -with copy constructors that are known not to throw an exception then
> > -the trait is @code{true}, else it is @code{false}.
> > -Requires: @var{type} shall be a complete type, (possibly cv-qualified)
> > -@code{void}, or an array of unknown bound.
> > -@enddefbuiltin
> > -
> >  @defbuiltin{bool __has_nothrow_constructor (@var{type})}
> >  If @code{__has_trivial_constructor (type)} is @code{true} then the trait
> >  is @code{true}, else if @var{type} is a cv class or union type (or array
> > @@ -29517,6 +29508,15 @@ Requires: @var{type} shall be a complete type, 
> > (possibly cv-qualified)
> >  @code{void}, or an array of unknown bound.
> >  @enddefbuiltin
> >
> > +@defbuiltin{bool __has_nothrow_copy (@var{type})}
> > +If @code{__has_trivial_copy (type)} is @code{true} then the trait is
> > +@code{true}, else if @var{type} is a cv-qualified class or union type
> > +with copy constructors that are known not to throw an exception then
> > +the trait is @code{true}, else it is @code{false}.
> > +Requires: @var{type} shall be a complete type, (possibly cv-qualified)
> > +@code{void}, or an array of unknown bound.
> > +@enddefbuiltin
> > +
> >  @defbuiltin{bool __has_trivial_assign (@var{type})}
> >  If @var{type} is @code{const}- qualified or is a reference type then
> >  the trait is @code{false}.  Otherwise if @code{__is_trivial (type)} is
> > @@ -29527,15 +29527,6 @@ Requires: @var{type} shall be a complete type, 
> > (possibly cv-qualified)
> >  @code{void}, or an array of unknown bound.
> >  @enddefbuiltin
> >
> > -@defbuiltin{bool __has_trivial_copy (@var{type})}
> > -If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
> > -type then the trait is @code{true}, else if @var{type} is a cv class
> > -or union type with a trivial copy constructor ([class.copy]) then the trait
> > -is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
> > -a complete type, (possibly cv-qualified) @code{void}, or an array of 
> > unknown
> > -bound.
> > -@enddefbuiltin
> > -
> >  @defbuiltin{bool __has_trivial_constructor (@var{type})}
> >  If @code{__is_trivial (type)} is @code{true} then the trait is @code{true},
> >  else if @var{type} is a cv-qualified class or union type (or array thereof)
> > @@ -29545,6 +29536,15 @@ Requires: @var{type} shall be a complete type, 
> > (possibly cv-qualified)
> >  @code{void}, or an array of unknown bound.
> >  @enddefbuiltin
> >
> > +@defbuiltin{bool __has_trivial_copy (@var{type})}
> > +If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
> > +type then the trait is @code{true}, else if @var{type} is a cv class
> > +or union type with a trivial copy constructor ([class.copy]) then the trait
> > +is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
> > +a complete type, (possibly cv-qualified) @code{void}, or an array of 
> > unknown
> > +bound.
> > +@enddefbuiltin
> > +
> >  @defbuiltin{bool __has_trivial_destructor (@var{type})}
> >  If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference 
> > type
> >  then the trait is @code{true}, else if @var{type} is a cv class or union
> > @@ -29560,6 +29560,13 @@ If @var{type} is a class type with a virtual 
> > destructor
> >  Requires: If @var{type} is a non-union class type, it shall be a complete 
> > type.
> >  @enddefbuiltin
> >
> > +@defbuiltin{bool __integer_pack (@var{length})}
> > +When used as the pattern of a pack expansion within a template
> > +definition, expands to a template argument pack containing integers
> > +from @code{0} to @code{@var{length}-1}.  This is provided for
> > +efficient implementation of @code{std::make_integer_sequence}.
> > +@enddefbuiltin
> > +

Re: [PATCH v14 23/26] c++: Implement __is_invocable built-in trait

2024-03-08 Thread Ken Matsui
On Fri, Mar 8, 2024 at 9:17 AM Patrick Palka  wrote:
>
> On Wed, 28 Feb 2024, Ken Matsui wrote:
>
> > This patch implements built-in trait for std::is_invocable.
> >
> > gcc/cp/ChangeLog:
> >
> >   * cp-trait.def: Define __is_invocable.
> >   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
> >   * semantics.cc (trait_expr_value): Likewise.
> >   (finish_trait_expr): Likewise.
> >   * cp-tree.h (build_invoke): New function.
> >   * method.cc (build_invoke): New function.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
> >   * g++.dg/ext/is_invocable1.C: New test.
> >   * g++.dg/ext/is_invocable2.C: New test.
> >   * g++.dg/ext/is_invocable3.C: New test.
> >   * g++.dg/ext/is_invocable4.C: New test.
>
> Thanks, this looks great!  This generic build_invoke function could be
> used for invoke_result etc as well, and it could also cache the built-up
> call across __is_invocable and __is_nothrow_invocable checks on the same
> arguments (which is a common pattern in the standard library).  LGTM

Thank you!!!  Yes, I will also work on those features!

>
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/cp/constraint.cc |   6 +
> >  gcc/cp/cp-trait.def  |   1 +
> >  gcc/cp/cp-tree.h |   2 +
> >  gcc/cp/method.cc | 132 +
> >  gcc/cp/semantics.cc  |   4 +
> >  gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
> >  gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
> >  gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
> >  gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
> >  gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
> >  10 files changed, 720 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 23ea66d9c12..c87b126fdb1 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3791,6 +3791,12 @@ diagnose_trait_expr (tree expr, tree args)
> >  case CPTK_IS_FUNCTION:
> >inform (loc, "  %qT is not a function", t1);
> >break;
> > +case CPTK_IS_INVOCABLE:
> > +  if (!t2)
> > +inform (loc, "  %qT is not invocable", t1);
> > +  else
> > +inform (loc, "  %qT is not invocable by %qE", t1, t2);
> > +  break;
> >  case CPTK_IS_LAYOUT_COMPATIBLE:
> >inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
> >break;
> > diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> > index 85056c8140b..6cb2b55f4ea 100644
> > --- a/gcc/cp/cp-trait.def
> > +++ b/gcc/cp/cp-trait.def
> > @@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
> >  DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
> >  DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
> >  DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
> > +DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
> >  DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
> >  DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
> >  DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 
> > 1)
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 334c11396c2..261d3a71faa 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -7334,6 +7334,8 @@ extern tree get_copy_assign 
> > (tree);
> >  extern tree get_default_ctor (tree);
> >  extern tree get_dtor (tree, tsubst_flags_t);
> >  extern tree build_stub_object(tree);
> > +extern tree build_invoke (tree, const_tree,
> > +  tsubst_flags_t);
> >  extern tree strip_inheriting_ctors   (tree);
> >  extern tree inherited_ctor_binfo (tree);
> >  extern bool base_ctor_omit_inherited_parms   (tree);
> > diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
> > index 98c10e6a8b5..953f1bed6fc 100644
> > --- a/gcc/cp/method.cc
> > +++ b/gcc/cp/method.cc
> > @@ -1928,6 +1928,138 @@ build_trait_object (tree type)
> >return build_stub_object (type);
> >  }
> >
> > +/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  
> > If the
> > +   given is not invocable, returns error_mark_node.  */
> > +
> > +tree
> > +build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
> > +{
> > +  if (fn_type == error_mark_node || arg_types == error_mark_node)
> > +return error_mark_node;
> > +
> > +  gcc_assert (TYPE_P (fn_type));
> > +  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
> > +
> > +  /* Access check is required to determine if the given is invocable.  */
> > +  

Re: [PATCH v14 26/26] libstdc++: Optimize std::is_nothrow_invocable compilation performance

2024-03-08 Thread Ken Matsui
On Fri, Mar 8, 2024 at 9:22 AM Patrick Palka  wrote:
>
> On Wed, 28 Feb 2024, Ken Matsui wrote:
>
> > This patch optimizes the compilation performance of
> > std::is_nothrow_invocable by dispatching to the new
> > __is_nothrow_invocable built-in trait.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/std/type_traits (is_nothrow_invocable): Use
> >   __is_nothrow_invocable built-in trait.
> >   * testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc:
> >   Handle the new error from __is_nothrow_invocable.
> >   * testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc:
> >   Likewise.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  libstdc++-v3/include/std/type_traits  | 4 
> >  .../20_util/is_nothrow_invocable/incomplete_args_neg.cc   | 1 +
> >  .../testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc  | 1 +
> >  3 files changed, 6 insertions(+)
> >
> > diff --git a/libstdc++-v3/include/std/type_traits 
> > b/libstdc++-v3/include/std/type_traits
> > index 9af233bcc75..093d85a51a8 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -3265,8 +3265,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >/// std::is_nothrow_invocable
> >template
> >  struct is_nothrow_invocable
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_nothrow_invocable)
> > +: public __bool_constant<__is_nothrow_invocable(_Fn, _ArgTypes...)>
> > +#else
> >  : __and_<__is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, void>,
> >__call_is_nothrow_<_Fn, _ArgTypes...>>::type
> > +#endif
> >  {
> >
> > static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
> >   "_Fn must be a complete class or an unbounded array");
> > diff --git 
> > a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> >  
> > b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> > index 3c225883eaf..3f8542dd366 100644
> > --- 
> > a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> > +++ 
> > b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> > @@ -18,6 +18,7 @@
> >  // .
> >
> >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> > +// { dg-prune-output "invalid use of incomplete type" }
>
> Is the error coming somewhere from the new build_invoke function?  That'd
> be surprising since we pass tf_none to it, which should suppress such
> errors.  (You can probably break on cxx_incomplete_type_diagnostic to
> find out where it's coming from.)

Hi Patrick,

Thank you for your review!

This error comes from cxx_incomplete_type_diagnostic in
check_trait_type, i.e., before the new build_invoke function.  As we
discussed previously, it would be better if we could produce
diagnostics that are in harmony with libstdc++.

>
> >
> >  #include 
> >
> > diff --git 
> > a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc 
> > b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
> > index 5a728bfa03b..d3bdf08448b 100644
> > --- a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
> > +++ b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
> > @@ -18,6 +18,7 @@
> >  // .
> >
> >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> > +// { dg-prune-output "invalid use of incomplete type" }
> >
> >  #include 
> >
> > --
> > 2.44.0
> >
> >
>


Re: [PATCH v2] bpf: add size threshold for inlining mem builtins

2024-03-08 Thread Jose E. Marchesi


Hi Faust.
OK.  Thanks!

> [Changes from v1:
>  - Error if threshold is exceeded instead of silently emitting libcall
>  - Update test accordingly
>  - Expand documentation to explain this behavior  ]
>
> BPF cannot fall back on library calls to implement memmove, memcpy and
> memset, so we attempt to expand these inline always if possible.
> However, this inline expansion was being attempted even for excessively
> large operations, which could result in gcc consuming huge amounts of
> memory and hanging.
>
> Add a size threshold in the BPF backend below which to always expand
> these operations inline, and introduce an option
> -minline-memops-threshold= to control the threshold. Defaults to
> 1024 bytes.
>
> gcc/
>
>   * config/bpf/bpf.cc (bpf_expand_cpymem, bpf_expand_setmem): Do
>   not attempt inline expansion if size is above threshold.
>   * config/bpf/bpf.opt (-minline-memops-threshold): New option.
>   * doc/invoke.texi (eBPF Options) <-minline-memops-threshold>:
>   Document.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/inline-memops-threshold-1.c: New test.
>   * gcc.target/bpf/inline-memops-threshold-2.c: New test.
> ---
>  gcc/config/bpf/bpf.cc | 26 ++-
>  gcc/config/bpf/bpf.opt|  4 +++
>  gcc/doc/invoke.texi   | 11 +++-
>  .../bpf/inline-memops-threshold-1.c   | 15 +++
>  .../bpf/inline-memops-threshold-2.c   | 11 
>  5 files changed, 65 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 0e33f4347ba..8365cd9fcb1 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -1244,9 +1244,9 @@ bool
>  bpf_expand_cpymem (rtx *operands, bool is_move)
>  {
>/* Size must be constant for this expansion to work.  */
> +  const char *name = is_move ? "memmove" : "memcpy";
>if (!CONST_INT_P (operands[2]))
>  {
> -  const char *name = is_move ? "memmove" : "memcpy";
>if (flag_building_libgcc)
>   warning (0, "could not inline call to %<__builtin_%s%>: "
>"size must be constant", name);
> @@ -1275,6 +1275,18 @@ bpf_expand_cpymem (rtx *operands, bool is_move)
>gcc_unreachable ();
>  }
>  
> +  /* For sizes above threshold, always use a libcall.  */
> +  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
> +{
> +  if (flag_building_libgcc)
> + warning (0, "could not inline call to %<__builtin_%s%>: "
> +  "too many bytes, use -minline-memops-threshold", name);
> +  else
> + error ("could not inline call to %<__builtin_%s%>: "
> +"too many bytes, use -minline-memops-threshold", name);
> +  return false;
> +}
> +
>unsigned iters = size_bytes >> ceil_log2 (align);
>unsigned remainder = size_bytes & (align - 1);
>  
> @@ -1347,6 +1359,18 @@ bpf_expand_setmem (rtx *operands)
>gcc_unreachable ();
>  }
>  
> +  /* For sizes above threshold, always use a libcall.  */
> +  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
> +{
> +  if (flag_building_libgcc)
> + warning (0, "could not inline call to %<__builtin_memset%>: "
> +  "too many bytes, use -minline-memops-threshold");
> +  else
> + error ("could not inline call to %<__builtin_memset%>: "
> +"too many bytes, use -minline-memops-threshold");
> +  return false;
> +}
> +
>unsigned iters = size_bytes >> ceil_log2 (align);
>unsigned remainder = size_bytes & (align - 1);
>unsigned inc = GET_MODE_SIZE (mode);
> diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
> index acfddebdad7..541ebe4dfc4 100644
> --- a/gcc/config/bpf/bpf.opt
> +++ b/gcc/config/bpf/bpf.opt
> @@ -108,3 +108,7 @@ Enum(asm_dialect) String(normal) Value(ASM_NORMAL)
>  
>  EnumValue
>  Enum(asm_dialect) String(pseudoc) Value(ASM_PSEUDOC)
> +
> +minline-memops-threshold=
> +Target RejectNegative Joined UInteger Var(bpf_inline_memops_threshold) 
> Init(1024)
> +-minline-memops-threshold= Maximum size of memset/memmove/memcpy to 
> inline, larger sizes will use a library call.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c0d604a2c5c..85c938d4a14 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -971,7 +971,7 @@ Objective-C and Objective-C++ Dialects}.
>  @gccoptlist{-mbig-endian -mlittle-endian
>  -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
>  -mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -msmov -mcpu=@var{version}
> --masm=@var{dialect}}
> +-masm=@var{dialect} -minline-memops-threshold=@var{bytes}}
>  
>  @emph{FR30 Options}
>  @gccoptlist{-msmall-model  -mno-lsim}
> @@ -25701,6 +25701,15 @@ Outputs pseudo-c assembly dialect.
>  
>  @end table
>  

Re: [PATCH v14 26/26] libstdc++: Optimize std::is_nothrow_invocable compilation performance

2024-03-08 Thread Patrick Palka
On Wed, 28 Feb 2024, Ken Matsui wrote:

> This patch optimizes the compilation performance of
> std::is_nothrow_invocable by dispatching to the new
> __is_nothrow_invocable built-in trait.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/type_traits (is_nothrow_invocable): Use
>   __is_nothrow_invocable built-in trait.
>   * testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc:
>   Handle the new error from __is_nothrow_invocable.
>   * testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc:
>   Likewise.
> 
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits  | 4 
>  .../20_util/is_nothrow_invocable/incomplete_args_neg.cc   | 1 +
>  .../testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc  | 1 +
>  3 files changed, 6 insertions(+)
> 
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 9af233bcc75..093d85a51a8 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -3265,8 +3265,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>/// std::is_nothrow_invocable
>template
>  struct is_nothrow_invocable
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_nothrow_invocable)
> +: public __bool_constant<__is_nothrow_invocable(_Fn, _ArgTypes...)>
> +#else
>  : __and_<__is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, void>,
>__call_is_nothrow_<_Fn, _ArgTypes...>>::type
> +#endif
>  {
>static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
>   "_Fn must be a complete class or an unbounded array");
> diff --git 
> a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc 
> b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> index 3c225883eaf..3f8542dd366 100644
> --- 
> a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> +++ 
> b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc
> @@ -18,6 +18,7 @@
>  // .
>  
>  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> +// { dg-prune-output "invalid use of incomplete type" }

Is the error coming somewhere from the new build_invoke function?  That'd
be surprising since we pass tf_none to it, which should suppress such
errors.  (You can probably break on cxx_incomplete_type_diagnostic to
find out where it's coming from.)

>  
>  #include 
>  
> diff --git 
> a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc 
> b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
> index 5a728bfa03b..d3bdf08448b 100644
> --- a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
> +++ b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
> @@ -18,6 +18,7 @@
>  // .
>  
>  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> +// { dg-prune-output "invalid use of incomplete type" }
>  
>  #include 
>  
> -- 
> 2.44.0
> 
> 



[PATCH v2] bpf: add size threshold for inlining mem builtins

2024-03-08 Thread David Faust
[Changes from v1:
 - Error if threshold is exceeded instead of silently emitting libcall
 - Update test accordingly
 - Expand documentation to explain this behavior  ]

BPF cannot fall back on library calls to implement memmove, memcpy and
memset, so we attempt to expand these inline always if possible.
However, this inline expansion was being attempted even for excessively
large operations, which could result in gcc consuming huge amounts of
memory and hanging.

Add a size threshold in the BPF backend below which to always expand
these operations inline, and introduce an option
-minline-memops-threshold= to control the threshold. Defaults to
1024 bytes.

gcc/

* config/bpf/bpf.cc (bpf_expand_cpymem, bpf_expand_setmem): Do
not attempt inline expansion if size is above threshold.
* config/bpf/bpf.opt (-minline-memops-threshold): New option.
* doc/invoke.texi (eBPF Options) <-minline-memops-threshold>:
Document.

gcc/testsuite/

* gcc.target/bpf/inline-memops-threshold-1.c: New test.
* gcc.target/bpf/inline-memops-threshold-2.c: New test.
---
 gcc/config/bpf/bpf.cc | 26 ++-
 gcc/config/bpf/bpf.opt|  4 +++
 gcc/doc/invoke.texi   | 11 +++-
 .../bpf/inline-memops-threshold-1.c   | 15 +++
 .../bpf/inline-memops-threshold-2.c   | 11 
 5 files changed, 65 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 0e33f4347ba..8365cd9fcb1 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -1244,9 +1244,9 @@ bool
 bpf_expand_cpymem (rtx *operands, bool is_move)
 {
   /* Size must be constant for this expansion to work.  */
+  const char *name = is_move ? "memmove" : "memcpy";
   if (!CONST_INT_P (operands[2]))
 {
-  const char *name = is_move ? "memmove" : "memcpy";
   if (flag_building_libgcc)
warning (0, "could not inline call to %<__builtin_%s%>: "
 "size must be constant", name);
@@ -1275,6 +1275,18 @@ bpf_expand_cpymem (rtx *operands, bool is_move)
   gcc_unreachable ();
 }
 
+  /* For sizes above threshold, always use a libcall.  */
+  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
+{
+  if (flag_building_libgcc)
+   warning (0, "could not inline call to %<__builtin_%s%>: "
+"too many bytes, use -minline-memops-threshold", name);
+  else
+   error ("could not inline call to %<__builtin_%s%>: "
+  "too many bytes, use -minline-memops-threshold", name);
+  return false;
+}
+
   unsigned iters = size_bytes >> ceil_log2 (align);
   unsigned remainder = size_bytes & (align - 1);
 
@@ -1347,6 +1359,18 @@ bpf_expand_setmem (rtx *operands)
   gcc_unreachable ();
 }
 
+  /* For sizes above threshold, always use a libcall.  */
+  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
+{
+  if (flag_building_libgcc)
+   warning (0, "could not inline call to %<__builtin_memset%>: "
+"too many bytes, use -minline-memops-threshold");
+  else
+   error ("could not inline call to %<__builtin_memset%>: "
+  "too many bytes, use -minline-memops-threshold");
+  return false;
+}
+
   unsigned iters = size_bytes >> ceil_log2 (align);
   unsigned remainder = size_bytes & (align - 1);
   unsigned inc = GET_MODE_SIZE (mode);
diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index acfddebdad7..541ebe4dfc4 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -108,3 +108,7 @@ Enum(asm_dialect) String(normal) Value(ASM_NORMAL)
 
 EnumValue
 Enum(asm_dialect) String(pseudoc) Value(ASM_PSEUDOC)
+
+minline-memops-threshold=
+Target RejectNegative Joined UInteger Var(bpf_inline_memops_threshold) 
Init(1024)
+-minline-memops-threshold= Maximum size of memset/memmove/memcpy to 
inline, larger sizes will use a library call.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c0d604a2c5c..85c938d4a14 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -971,7 +971,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-mbig-endian -mlittle-endian
 -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
 -mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -msmov -mcpu=@var{version}
--masm=@var{dialect}}
+-masm=@var{dialect} -minline-memops-threshold=@var{bytes}}
 
 @emph{FR30 Options}
 @gccoptlist{-msmall-model  -mno-lsim}
@@ -25701,6 +25701,15 @@ Outputs pseudo-c assembly dialect.
 
 @end table
 
+@opindex -minline-memops-threshold
+@item -minline-memops-threshold=@var{bytes}
+Specifies a size threshold in bytes at or below which memmove, memcpy
+and memset shall always be expanded inline.  Operations dealing with
+sizes 

Re: [PATCH v14 23/26] c++: Implement __is_invocable built-in trait

2024-03-08 Thread Patrick Palka
On Wed, 28 Feb 2024, Ken Matsui wrote:

> This patch implements built-in trait for std::is_invocable.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-trait.def: Define __is_invocable.
>   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
>   * semantics.cc (trait_expr_value): Likewise.
>   (finish_trait_expr): Likewise.
>   * cp-tree.h (build_invoke): New function.
>   * method.cc (build_invoke): New function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
>   * g++.dg/ext/is_invocable1.C: New test.
>   * g++.dg/ext/is_invocable2.C: New test.
>   * g++.dg/ext/is_invocable3.C: New test.
>   * g++.dg/ext/is_invocable4.C: New test.

Thanks, this looks great!  This generic build_invoke function could be
used for invoke_result etc as well, and it could also cache the built-up
call across __is_invocable and __is_nothrow_invocable checks on the same
arguments (which is a common pattern in the standard library).  LGTM

> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |   6 +
>  gcc/cp/cp-trait.def  |   1 +
>  gcc/cp/cp-tree.h |   2 +
>  gcc/cp/method.cc | 132 +
>  gcc/cp/semantics.cc  |   4 +
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
>  gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
>  gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
>  gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
>  gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
>  10 files changed, 720 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 23ea66d9c12..c87b126fdb1 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3791,6 +3791,12 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_FUNCTION:
>inform (loc, "  %qT is not a function", t1);
>break;
> +case CPTK_IS_INVOCABLE:
> +  if (!t2)
> +inform (loc, "  %qT is not invocable", t1);
> +  else
> +inform (loc, "  %qT is not invocable by %qE", t1, t2);
> +  break;
>  case CPTK_IS_LAYOUT_COMPATIBLE:
>inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
>break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 85056c8140b..6cb2b55f4ea 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
>  DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
>  DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
>  DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
> +DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
>  DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
>  DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
>  DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 334c11396c2..261d3a71faa 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7334,6 +7334,8 @@ extern tree get_copy_assign (tree);
>  extern tree get_default_ctor (tree);
>  extern tree get_dtor (tree, tsubst_flags_t);
>  extern tree build_stub_object(tree);
> +extern tree build_invoke (tree, const_tree,
> +  tsubst_flags_t);
>  extern tree strip_inheriting_ctors   (tree);
>  extern tree inherited_ctor_binfo (tree);
>  extern bool base_ctor_omit_inherited_parms   (tree);
> diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
> index 98c10e6a8b5..953f1bed6fc 100644
> --- a/gcc/cp/method.cc
> +++ b/gcc/cp/method.cc
> @@ -1928,6 +1928,138 @@ build_trait_object (tree type)
>return build_stub_object (type);
>  }
>  
> +/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If 
> the
> +   given is not invocable, returns error_mark_node.  */
> +
> +tree
> +build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
> +{
> +  if (fn_type == error_mark_node || arg_types == error_mark_node)
> +return error_mark_node;
> +
> +  gcc_assert (TYPE_P (fn_type));
> +  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
> +
> +  /* Access check is required to determine if the given is invocable.  */
> +  deferring_access_check_sentinel acs (dk_no_deferred);
> +
> +  /* INVOKE is an unevaluated context.  */
> +  cp_unevaluated cp_uneval_guard;
> +
> +  bool is_ptrdatamem;
> +  bool is_ptrmemfunc;
> +  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
> +{
> +  tree deref_fn_type = TREE_TYPE (fn_type);
> +  is_ptrdatamem = 

[COMMITTED] arm: testsuite: tweak bics_3.c [PR113542]

2024-03-08 Thread Richard Earnshaw

This test was too simple, which meant that the compiler was sometimes
able to find a better optimization of the code than using a BICS
instruction.  Fix this by changing the test slightly to produce a
sequence where BICS should always be the preferred solution.

gcc/testsuite:
PR target/113542
* gcc.target/arm/bics_3.c: Adjust code to something which should
always result in BICS.
---
 gcc/testsuite/gcc.target/arm/bics_3.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c b/gcc/testsuite/gcc.target/arm/bics_3.c
index e056b264e15..4d6938948a1 100644
--- a/gcc/testsuite/gcc.target/arm/bics_3.c
+++ b/gcc/testsuite/gcc.target/arm/bics_3.c
@@ -2,13 +2,11 @@
 /* { dg-options "-O2 --save-temps -fno-inline" } */
 /* { dg-require-effective-target arm32 } */
 
-extern void abort (void);
-
 int
 bics_si_test (int a, int b)
 {
-  if (a & ~b)
-return 1;
+  if ((a & ~b) >= 0)
+return 3;
   else
 return 0;
 }
@@ -16,8 +14,8 @@ bics_si_test (int a, int b)
 int
 bics_si_test2 (int a, int b)
 {
-  if (a & ~ (b << 2))
-return 1;
+  if ((a & ~ (b << 2)) >= 0)
+return 3;
   else
 return 0;
 }
@@ -28,13 +26,12 @@ main (void)
   int a = 5;
   int b = 5;
   int c = 20;
-  if (bics_si_test (a, b))
-abort ();
-  if (bics_si_test2 (c, b))
-abort ();
+  if (bics_si_test (a, b) != 3)
+__builtin_abort ();
+  if (bics_si_test2 (c, b) != 3)
+__builtin_abort ();
   return 0;
 }
 
 /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */
-


[PATCH] c++: explicit inst of template method not generated [PR110323]

2024-03-08 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Consider

  constexpr int VAL = 1;
  struct foo {
  template 
  void bar(typename std::conditional::type arg) { }
  };
  template void foo::bar<1>(int arg);

where we since r11-291 fail to emit the code for the explicit
instantiation.  That's because cp_walk_subtrees/TYPENAME_TYPE now
walks TYPE_CONTEXT ('conditional' here) as well, and in a template
finds the B==VAL template argument.  VAL is constexpr, which implies const,
which in the global scope implies static.  constrain_visibility_for_template
then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC.
Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't
emit any code.

I thought the fix would be some ODR-esque check to not consider
constexpr variables/fns that are used just for their value.  But
it turned out to be tricky.  For instance, we can't skip
determine_visibility in a template; we can't even skip it for value-dep
expressions.  For example, no-linkage-expr1.C has

  using P = struct {}*;
  template 
  void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to
ferret out the anonymous type.  When instantiating, it's already gone.

The best I could come up with is to disregard _DECL in min_vis_expr_r
in a template while still checking type_visibility, even in a template.

PR c++/110323

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r) : Do nothing in a template.

gcc/testsuite/ChangeLog:

* g++.dg/template/explicit-instantiation6.C: New test.
* g++.dg/template/explicit-instantiation7.C: New test.
---
 gcc/cp/decl2.cc   |  6 ++-
 .../g++.dg/template/explicit-instantiation6.C | 53 +++
 .../g++.dg/template/explicit-instantiation7.C | 22 
 3 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/explicit-instantiation6.C
 create mode 100644 gcc/testsuite/g++.dg/template/explicit-instantiation7.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 6c9fd415d40..3e035a7bf9f 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2718,7 +2718,11 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void 
*data)
   /* Fall through.  */
 case VAR_DECL:
 case FUNCTION_DECL:
-  if (! TREE_PUBLIC (t))
+  if (processing_template_decl)
+   /* In a template, we can't trust _DECLs, either.  It's possible
+  they won't be ODR-used, and we could wrongly think the linkage
+  is internal (PR110323).  */;
+  else if (! TREE_PUBLIC (t))
tpvis = VISIBILITY_ANON;
   else
tpvis = DECL_VISIBILITY (t);
diff --git a/gcc/testsuite/g++.dg/template/explicit-instantiation6.C 
b/gcc/testsuite/g++.dg/template/explicit-instantiation6.C
new file mode 100644
index 000..399c7d72756
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/explicit-instantiation6.C
@@ -0,0 +1,53 @@
+// PR c++/110323
+// { dg-do compile { target c++14 } }
+
+template
+struct conditional { using type = T; };
+
+template
+struct conditional { using type = F; };
+
+constexpr int VAL = 1;
+
+static constexpr int getval () { return 1; }
+
+template
+constexpr int TVAL = 1;
+
+static struct S {
+  constexpr operator bool() { return true; }
+} s;
+
+struct foo {
+template 
+void bar(typename conditional::type arg) { }
+
+template 
+void baz(typename conditional::type arg) { }
+
+template 
+void qux(typename conditional, int, float>::type arg) { }
+
+template 
+void lox(typename conditional::type arg) { }
+
+template 
+void sox(typename conditional::type arg) 
{ }
+
+template 
+void nim(typename conditional::type arg) { }
+};
+
+template void foo::bar<1>(int arg);
+template void foo::baz<1>(int arg);
+template void foo::qux<1>(int arg);
+template void foo::lox<1>(int arg);
+template void foo::sox<1>(int arg);
+template void foo::nim<1>(int arg);
+
+// { dg-final { scan-assembler 
"_ZN3foo3barILi1EEEvN11conditionalIXeqT_L_ZL3VALEEifE4typeE" } }
+// { dg-final { scan-assembler 
"_ZN3foo3bazILi1EEEvN11conditionalIXeqT_clL_ZL6getvalvEEEifE4typeE" } }
+// { dg-final { scan-assembler 
"_ZN3foo3quxILi1EEEvN11conditionalIXeqT_L_Z4TVALIiEEEifE4typeE" } }
+// { dg-final { scan-assembler 
"_ZN3foo3loxILi1EEEvN11conditionalIXeqT_L_ZL1sEEifE4typeE" } }
+// { dg-final { scan-assembler 
"_ZN3foo3soxILi1EEEvN11conditionalIXeqT_nxL_ZL3VALEEifE4typeE" } }
+// { dg-final { scan-assembler 
"_ZN3foo3nimILi1EEEvN11conditionalIXneT_szL_ZL3VALEEifE4typeE" } }
diff --git a/gcc/testsuite/g++.dg/template/explicit-instantiation7.C 
b/gcc/testsuite/g++.dg/template/explicit-instantiation7.C
new file mode 100644
index 000..9a870e808fa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/explicit-instantiation7.C
@@ -0,0 +1,22 @@
+// PR c++/110323
+// { dg-do compile { target c++11 } }
+
+using P = struct { }*;
+using N = struct A { }*;
+
+template

Re: [PATCH] contrib/gcc-changelog/git_check_commit.py: Implement --num-commits

2024-03-08 Thread Patrick Palka
On Wed, 28 Feb 2024, Ken Matsui wrote:

> This patch implements a --num-commits (-n) flag for shorthand for
> the range of hash~N..hash commits.
> 
> contrib/ChangeLog:
> 
>   * gcc-changelog/git_check_commit.py: Implement --num-commits.

LGTM

> 
> Signed-off-by: Ken Matsui 
> ---
>  contrib/gcc-changelog/git_check_commit.py | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/contrib/gcc-changelog/git_check_commit.py 
> b/contrib/gcc-changelog/git_check_commit.py
> index 8cca9f439a5..22e032e8b38 100755
> --- a/contrib/gcc-changelog/git_check_commit.py
> +++ b/contrib/gcc-changelog/git_check_commit.py
> @@ -22,6 +22,12 @@ import argparse
>  
>  from git_repository import parse_git_revisions
>  
> +def nonzero_uint(value):
> +ivalue = int(value)
> +if ivalue <= 0:
> +raise argparse.ArgumentTypeError('%s is not a non-zero positive 
> integer' % value)
> +return ivalue
> +
>  parser = argparse.ArgumentParser(description='Check git ChangeLog format '
>   'of a commit')
>  parser.add_argument('revisions', default='HEAD', nargs='?',
> @@ -33,8 +39,17 @@ parser.add_argument('-p', '--print-changelog', 
> action='store_true',
>  help='Print final changelog entires')
>  parser.add_argument('-v', '--verbose', action='store_true',
>  help='Print verbose information')
> +parser.add_argument('-n', '--num-commits', type=nonzero_uint, default=1,
> +help='Number of commits to check (i.e. shorthand for '
> +'hash~N..hash)')
>  args = parser.parse_args()
>  
> +if args.num_commits > 1:
> +if '..' in args.revisions:
> +print('ERR: --num-commits and range of revisions are mutually 
> exclusive')
> +exit(1)
> +args.revisions = '{0}~{1}..{0}'.format(args.revisions, args.num_commits)
> +
>  retval = 0
>  for git_commit in parse_git_revisions(args.git_path, args.revisions):
>  res = 'OK' if git_commit.success else 'FAILED'
> -- 
> 2.44.0
> 
> 



Re: [PATCH 01/11] gcc/doc/extend.texi: Sort built-in traits alphabetically

2024-03-08 Thread Patrick Palka
Hi Ken,

This patch series LGTM, thanks for these documentation improvements.

On Fri, 1 Mar 2024, Ken Matsui wrote:

> This patch sorts built-in traits alphabetically for better codebase
> consistency and easier future integration of changes.
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi (Type Traits): Sort built-in traits
>   alphabetically.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/doc/extend.texi | 62 ++---
>  1 file changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index f679c81acf2..b13f9d6f934 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -29499,15 +29499,6 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>  
> -@defbuiltin{bool __has_nothrow_copy (@var{type})}
> -If @code{__has_trivial_copy (type)} is @code{true} then the trait is
> -@code{true}, else if @var{type} is a cv-qualified class or union type
> -with copy constructors that are known not to throw an exception then
> -the trait is @code{true}, else it is @code{false}.
> -Requires: @var{type} shall be a complete type, (possibly cv-qualified)
> -@code{void}, or an array of unknown bound.
> -@enddefbuiltin
> -
>  @defbuiltin{bool __has_nothrow_constructor (@var{type})}
>  If @code{__has_trivial_constructor (type)} is @code{true} then the trait
>  is @code{true}, else if @var{type} is a cv class or union type (or array
> @@ -29517,6 +29508,15 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>  
> +@defbuiltin{bool __has_nothrow_copy (@var{type})}
> +If @code{__has_trivial_copy (type)} is @code{true} then the trait is
> +@code{true}, else if @var{type} is a cv-qualified class or union type
> +with copy constructors that are known not to throw an exception then
> +the trait is @code{true}, else it is @code{false}.
> +Requires: @var{type} shall be a complete type, (possibly cv-qualified)
> +@code{void}, or an array of unknown bound.
> +@enddefbuiltin
> +
>  @defbuiltin{bool __has_trivial_assign (@var{type})}
>  If @var{type} is @code{const}- qualified or is a reference type then
>  the trait is @code{false}.  Otherwise if @code{__is_trivial (type)} is
> @@ -29527,15 +29527,6 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>  
> -@defbuiltin{bool __has_trivial_copy (@var{type})}
> -If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
> -type then the trait is @code{true}, else if @var{type} is a cv class
> -or union type with a trivial copy constructor ([class.copy]) then the trait
> -is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
> -a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
> -bound.
> -@enddefbuiltin
> -
>  @defbuiltin{bool __has_trivial_constructor (@var{type})}
>  If @code{__is_trivial (type)} is @code{true} then the trait is @code{true},
>  else if @var{type} is a cv-qualified class or union type (or array thereof)
> @@ -29545,6 +29536,15 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>  
> +@defbuiltin{bool __has_trivial_copy (@var{type})}
> +If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
> +type then the trait is @code{true}, else if @var{type} is a cv class
> +or union type with a trivial copy constructor ([class.copy]) then the trait
> +is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
> +a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
> +bound.
> +@enddefbuiltin
> +
>  @defbuiltin{bool __has_trivial_destructor (@var{type})}
>  If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference 
> type
>  then the trait is @code{true}, else if @var{type} is a cv class or union
> @@ -29560,6 +29560,13 @@ If @var{type} is a class type with a virtual 
> destructor
>  Requires: If @var{type} is a non-union class type, it shall be a complete 
> type.
>  @enddefbuiltin
>  
> +@defbuiltin{bool __integer_pack (@var{length})}
> +When used as the pattern of a pack expansion within a template
> +definition, expands to a template argument pack containing integers
> +from @code{0} to @code{@var{length}-1}.  This is provided for
> +efficient implementation of @code{std::make_integer_sequence}.
> +@enddefbuiltin
> +
>  @defbuiltin{bool __is_abstract (@var{type})}
>  If @var{type} is an abstract class ([class.abstract]) then the trait
>  is @code{true}, else it is @code{false}.
> @@ -29589,12 +29596,6 @@ If @var{type} is a cv-qualified class type, and not 
> a union type
>  ([basic.compound]) the trait is @code{true}, else it is @code{false}.
>  @enddefbuiltin
>  
> -@c FIXME Commented out for GCC 13, 

Re: Fix 'char' initialization, copy, check in 'libgomp.oacc-fortran/acc-memcpy.f90'

2024-03-08 Thread Tobias Burnus

Hi Thomas,

Am 08.03.24 um 12:15 schrieb Thomas Schwinge:

OK to push
"Fix 'char' initialization, copy, check in 
'libgomp.oacc-fortran/acc-memcpy.f90'",
see attached?


OK.

I think there was some remaining code around the problem that 
HUGE(1_int8) = 127 and '-128_int8' is invalid because in Fortran, that's 
'unary_minus(128_int8)', which is not valid as 128 exceeds HUGE(1_int8),


Which the remaining bits code tried to solve (i.e. -127:127 vs. 
-128:127) but seemingly failed to do so consistently.


Thanks!

Tobias


Re: nvptx: 'cuDeviceGetCount' failure is fatal

2024-03-08 Thread Thomas Schwinge
Hi Tobias!

On 2024-03-07T15:28:21+0100, Tobias Burnus  wrote:
> Thomas Schwinge wrote:
>> OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?
>
> I think the real question is: what does a 'cuDeviceGetCount' fail mean?

Internally to the CUDA stack: the error codes that you've cited below.
Per the state we're in when calling 'cuDeviceGetCount', we only expect
'CUDA_SUCCESS'.  Therefore, in our actual use: anything else means a
fatal condition that we don't attempt to recover from, like for most of
all other device access failures.

> Does it mean a serious error – or could it just be a permissions issue 
> such that the user has no device access but otherwise is fine?

As you can see, we've done a 'cuInit' right before, so in case there was
any permission issue (or similar), that's already settled (in whichever
way) by the time we do the 'cuDeviceGetCount'.

> Because if it is, e.g., a permission problem – just returning '0' (no 
> devices) would seem to be the proper solution.
>
> But if it is expected to be always something serious, well, then a fatal 
> error makes more sense.

ACK; pushed in commit 37078f241a22c45db6380c5e9a79b4d08054bb3d.


Grüße
 Thomas


> The possible exit codes are:
>
> CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, 
> CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
>
> which does not really help.
>
> My impression is that 0 is usually returned if something goes wrong 
> (e.g. with permissions) such that an error is a real exception. But all 
> three choices seem to make about equally sense: either host fallback 
> (with 0 or -1) or a fatal error.
>
> Tobias


Re: [PATCH v2] c++: Check module attachment instead of just purview when necessary [PR112631]

2024-03-08 Thread Jason Merrill

On 3/7/24 21:55, Nathaniel Shead wrote:

On Mon, Nov 27, 2023 at 03:59:39PM +1100, Nathaniel Shead wrote:

On Thu, Nov 23, 2023 at 03:03:37PM -0500, Nathan Sidwell wrote:

On 11/20/23 04:47, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

Block-scope declarations of functions or extern values are not allowed
when attached to a named module. Similarly, class member functions are
not inline if attached to a named module. However, in both these cases
we currently only check if the declaration is within the module purview;
it is possible for such a declaration to occur within the module purview
but not be attached to a named module (e.g. in an 'extern "C++"' block).
This patch makes the required adjustments.



Ah I'd been puzzling over the default inlinedness of  member-fns of
block-scope structs.  Could you augment the testcase to make sure that's
right too?

Something like:

// dg-module-do link
export module Mod;

export auto Get () {
   struct X { void Fn () {} };
   return X();
}


///
import Mod
void Frob () { Get().Fn(); }



I gave this a try and it indeed doesn't work correctly; 'Fn' needs to be
marked 'inline' for this to link (whether or not 'Get' itself is
inline). I've tried tracing the code to work out what's going on but
I've been struggling to work out how all the different flags (e.g.
TREE_PUBLIC, TREE_EXTERNAL, TREE_COMDAT, DECL_NOT_REALLY_EXTERN)
interact, which flags we want to be set where, and where the decision of
what function definitions to emit is actually made.

I did find that doing 'mark_used(decl)' on all member functions in
block-scope structs seems to work however, but I wonder if that's maybe
too aggressive or if there's something else we should be doing?


I got around to looking at this again, here's an updated version of this
patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

(I'm not sure if 'start_preparsed_function' is the right place to be
putting this kind of logic or if it should instead be going in
'grokfndecl', e.g. decl.cc:10761 where the rules for making local
functions have no linkage are initially determined, but I found this
easier to implement: happy to move around though if preferred.)

-- >8 --

Block-scope declarations of functions or extern values are not allowed
when attached to a named module. Similarly, class member functions are
not inline if attached to a named module. However, in both these cases
we currently only check if the declaration is within the module purview;
it is possible for such a declaration to occur within the module purview
but not be attached to a named module (e.g. in an 'extern "C++"' block).
This patch makes the required adjustments.

While implementing this we discovered that block-scope local functions
are not correctly emitted, causing link failures; this patch also
corrects some assumptions here and ensures that they are emitted when
needed.

PR c++/112631

gcc/cp/ChangeLog:

* cp-tree.h (named_module_attach_p): New function.
* decl.cc (start_decl): Check for attachment not purview.
(grokmethod): Likewise.


These changes are OK; the others I want to consider more.


+export auto n_n() {
+  internal();
+  struct X { void f() { internal(); } };
+  return X{};


Hmm, is this not a prohibited exposure?  Seems like X has no linkage 
because it's at block scope, and the deduced return type names it.


I know we try to support this "voldemort" pattern, but is that actually 
correct?


Jason



Re: [PATCH v2] contrib: Improve dg-extract-results.sh's Python detection

2024-03-08 Thread Jeff Law




On 3/7/24 07:27, Sam James wrote:

'python' on some systems (e.g. SLES 15) might be Python 2. Prefer python3,
then python, then python2 (as the script still tries to work there).

contrib/ChangeLog:

 * dg-extract-results.sh: Check for python3 before python. Check for 
python2 last.
OK.  And given that the shell version is just fundamentally broken for 
parallel checks, consider a patch to remove it pre-approved.


Jeff



[COMMITTED] ARM: Fix builtin-bswap-1.c test [PR113915]

2024-03-08 Thread Wilco Dijkstra
On Thumb-2 the use of CBZ blocks conditional execution, so change the
test to compare with a non-zero value.

gcc/testsuite/ChangeLog:
PR target/113915
* gcc.target/arm/builtin-bswap.x: Fix test to avoid emitting CBZ.

---

diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap.x 
b/gcc/testsuite/gcc.target/arm/builtin-bswap.x
index 
c96dbe6329c4dc648fd0bcc972ad494c7d6dc6e5..dc8f910e0007a67ae5cb5100c98101c7b199b5ca
 100644
--- a/gcc/testsuite/gcc.target/arm/builtin-bswap.x
+++ b/gcc/testsuite/gcc.target/arm/builtin-bswap.x
@@ -10,7 +10,7 @@ extern short foos16 (short);
 short swaps16_cond (short x, int y)
 {
   short z = x;
-  if (y)
+  if (y != 2)
 z = __builtin_bswap16 (x);
   return foos16 (z);
 }
@@ -27,7 +27,7 @@ extern unsigned short foou16 (unsigned short);
 unsigned short swapu16_cond (unsigned short x, int y)
 {
   unsigned short z = x;
-  if (y)
+  if (y != 2)
 z = __builtin_bswap16 (x);
   return foou16 (z);
 }
@@ -43,7 +43,7 @@ extern int foos32 (int);
 int swaps32_cond (int x, int y)
 {
   int z = x;
-  if (y)
+  if (y != 2)
 z = __builtin_bswap32 (x);
   return foos32 (z);
 }
@@ -60,7 +60,7 @@ extern unsigned int foou32 (unsigned int);
 unsigned int swapsu2 (unsigned int x, int y)
 {
   int z = x;
-  if (y)
+  if (y != 2)
 z = __builtin_bswap32 (x);
   return foou32 (z);
 }



[PATCH] ipa: Fix C++ member ptr indirect inlining (PR 114254, PR 108802)

2024-03-08 Thread Martin Jambor
Hi,

Even though we have had code to handle creation of indirect call graph
edges (so that these calls can than be made direct as part of IPA-CP
and inlining and eventually also inlined) for C++ member pointers for
many years, it turns out that it does not work for lambdas and that it
has been severely broken since GCC 10 when the base class has virtual
functions.

Lambdas don't work because the code cannot work with structures
representing member function pointers because they are passed by
reference instead by value and the code was not ready for that.

The presence of virtual methods broke thinks because at some point C++
FE got clever and stopped emitting the check for virtual methods when
the base class does not have any and that in turn made our existing
testcases not test the necessary pattern matching code.  The pattern
matcher had a small bug which did not matter before
r10-917-g3b47da42de621c but did afterwards.

This patch changes the pattern matcher to match both of these cases.

Special thanks to the Linaro automated checker of patches which
reported that the earlier version of my PR 108802 fix was not working
on Aarch64 which in turn made me discover PR 114254.

The patch has passed bootstrap and testing on x86_64-linux,
aarch64-linux and ppc64-linux and I also LTO bootstrap on x86_64-linux.

I understand we have been living with these deficiencies for a while now
but both are technically regressions.  If Honza agrees (and manages to
review the patch quickly), I'm fine with pushing them to master now but
I can also wait until the next stage 1.

Thanks,

Martin


gcc/ChangeLog:

2024-03-06  Martin Jambor  

PR ipa/108802
PR ipa/114254
* ipa-prop.cc (ipa_get_stmt_member_ptr_load_param): Fix case looking
at COMPONENT_REFs directly from a PARM_DECL, also recognize loads from
a pointer parameter.
(ipa_analyze_indirect_call_uses): Also recognize loads from a pointer
parameter, also recognize the case when pfn pointer is loaded in its
own BB.

gcc/testsuite/ChangeLog:

2024-03-06  Martin Jambor  

PR ipa/108802
PR ipa/114254
* g++.dg/ipa/iinline-4.C: New test.
* g++.dg/ipa/pr108802.C: Likewise.
---
 gcc/ipa-prop.cc  | 110 +++
 gcc/testsuite/g++.dg/ipa/iinline-4.C |  61 +++
 gcc/testsuite/g++.dg/ipa/pr108802.C  |  14 
 3 files changed, 154 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/iinline-4.C
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr108802.C

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index e22c4f78405..e8e4918d5a8 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2500,7 +2500,9 @@ static tree
 ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool use_delta,
HOST_WIDE_INT *offset_p)
 {
-  tree rhs, rec, ref_field, ref_offset, fld, ptr_field, delta_field;
+  tree rhs, fld, ptr_field, delta_field;
+  tree ref_field = NULL_TREE;
+  tree ref_offset = NULL_TREE;
 
   if (!gimple_assign_single_p (stmt))
 return NULL_TREE;
@@ -2511,35 +2513,53 @@ ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool 
use_delta,
   ref_field = TREE_OPERAND (rhs, 1);
   rhs = TREE_OPERAND (rhs, 0);
 }
+
+  if (TREE_CODE (rhs) == MEM_REF)
+{
+  ref_offset = TREE_OPERAND (rhs, 1);
+  if (ref_field && integer_nonzerop (ref_offset))
+   return NULL_TREE;
+}
+  else if (!ref_field)
+return NULL_TREE;
+
+  if (TREE_CODE (rhs) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME
+  && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (rhs, 0)))
+{
+  rhs = TREE_OPERAND (rhs, 0);
+  if (TREE_CODE (SSA_NAME_VAR (rhs)) != PARM_DECL
+ || !type_like_member_ptr_p (TREE_TYPE (TREE_TYPE (rhs)), _field,
+ _field))
+   return NULL_TREE;
+}
   else
-ref_field = NULL_TREE;
-  if (TREE_CODE (rhs) != MEM_REF)
-return NULL_TREE;
-  rec = TREE_OPERAND (rhs, 0);
-  if (TREE_CODE (rec) != ADDR_EXPR)
-return NULL_TREE;
-  rec = TREE_OPERAND (rec, 0);
-  if (TREE_CODE (rec) != PARM_DECL
-  || !type_like_member_ptr_p (TREE_TYPE (rec), _field, _field))
-return NULL_TREE;
-  ref_offset = TREE_OPERAND (rhs, 1);
+{
+  if (TREE_CODE (rhs) == MEM_REF
+ && TREE_CODE (TREE_OPERAND (rhs, 0)) == ADDR_EXPR)
+   rhs = TREE_OPERAND (TREE_OPERAND (rhs, 0), 0);
+  if (TREE_CODE (rhs) != PARM_DECL
+ || !type_like_member_ptr_p (TREE_TYPE (rhs), _field,
+ _field))
+   return NULL_TREE;
+}
 
   if (use_delta)
 fld = delta_field;
   else
 fld = ptr_field;
-  if (offset_p)
-*offset_p = int_bit_position (fld);
 
   if (ref_field)
 {
-  if (integer_nonzerop (ref_offset))
+  if (ref_field != fld)
return NULL_TREE;
-  return ref_field == fld ? rec : NULL_TREE;
 }
-  else
-return 

Re: [PATCH v1] VECT: Bugfix ICE for vectorizable_store when both len and mask

2024-03-08 Thread Richard Biener
On Fri, Mar 8, 2024 at 2:59 PM Richard Biener
 wrote:
>
> On Fri, Mar 8, 2024 at 1:04 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to fix one ICE in vectorizable_store for both the
> > loop_masks and loop_lens.  The ICE looks like below with "-march=rv64gcv 
> > -O3".
> >
> > during GIMPLE pass: vect
> > test.c: In function ‘d’:
> > test.c:6:6: internal compiler error: in vectorizable_store, at
> > tree-vect-stmts.cc:8691
> > 6 | void d() {
> >   |  ^
> > 0x37a6f2f vectorizable_store
> > .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
> > 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
> > _slp_tree*, _slp_instance*, vec*)
> > .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
> > 0x1db5dca vect_analyze_loop_operations
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
> > 0x1db885b vect_analyze_loop_2
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
> > 0x1dba029 vect_analyze_loop_1
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
> > 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
> > 0x1e389d1 try_vectorize_loop_1
> > .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
> > 0x1e38f3d try_vectorize_loop
> > .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
> > 0x1e39230 execute
> > .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298
> >
> > Given the masks and the lens cannot be enabled simultanously when loop is
> > using partial vectors.  Thus, we need to ensure the one is disabled when we
> > would like to record the other in check_load_store_for_partial_vectors.  For
> > example, when we try to record loop len, we need to check if the loop mask
> > is disabled or not.
>
> I don't think you can rely on LOOP_VINFO_FULLY_WITH_LENGTH_P during
> analysis.  Instead how we tried to set up things is that we never even try
> both and there is (was?) code to reject partial vector usage when we end
> up recording both lens and masks.

That is, a fix along what you do would have been to split
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P into
_WITH_LENGTH and _WITH_MASKS, make sure to record both when
a stmt can handle both so in the end we'll have a choice.  Currently
if we end up with both mask and len we don't know whether all stmts
support lens or masks or just some.

But we simply assumed on RISCV you'd never end up with unsupported len
but supported mask I guess.

Richard.

> That said, the assert you run into should be only asserted during transform,
> not during analysis.  It possibly was before Robins costing reorg?
>
> Richard.
>
> > Below testsuites are passed for this patch:
> > * The x86 bootstrap tests.
> > * The x86 fully regression tests.
> > * The aarch64 fully regression tests.
> > * The riscv fully regressison tests.
> >
> > PR target/114195
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Add
> > loop mask/len check before recording as they are mutual exclusion.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/base/pr114195-1.c: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  .../gcc.target/riscv/rvv/base/pr114195-1.c| 15 +++
> >  gcc/tree-vect-stmts.cc| 26 ++-
> >  2 files changed, 35 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> > new file mode 100644
> > index 000..b0c9d5b81b8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> > @@ -0,0 +1,15 @@
> > +/* Test that we do not have ice when compile */
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> > +
> > +long a, b;
> > +extern short c[];
> > +
> > +void d() {
> > +  for (int e = 0; e < 35; e += 2) {
> > +a = ({ a < 0 ? a : 0; });
> > +b = ({ b < 0 ? b : 0; });
> > +
> > +c[e] = 0;
> > +  }
> > +}
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 14a3ffb5f02..624947ed271 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -1502,6 +1502,8 @@ check_load_store_for_partial_vectors (loop_vec_info 
> > loop_vinfo, tree vectype,
> >   gather_scatter_info *gs_info,
> >   tree scalar_mask)
> >  {
> > +  gcc_assert (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo));
> > +
> >/* Invariant loads need no special support.  */
> >if (memory_access_type == VMAT_INVARIANT)
> >  return;
> > @@ -1521,9 +1523,17 @@ check_load_store_for_partial_vectors (loop_vec_info 
> > loop_vinfo, tree vectype,
> >internal_fn ifn
> > = (is_load ? vect_load_lanes_supported (vectype, 

Re: [PATCH v1] VECT: Bugfix ICE for vectorizable_store when both len and mask

2024-03-08 Thread Richard Biener
On Fri, Mar 8, 2024 at 1:04 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix one ICE in vectorizable_store for both the
> loop_masks and loop_lens.  The ICE looks like below with "-march=rv64gcv -O3".
>
> during GIMPLE pass: vect
> test.c: In function ‘d’:
> test.c:6:6: internal compiler error: in vectorizable_store, at
> tree-vect-stmts.cc:8691
> 6 | void d() {
>   |  ^
> 0x37a6f2f vectorizable_store
> .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
> 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
> _slp_tree*, _slp_instance*, vec*)
> .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
> 0x1db5dca vect_analyze_loop_operations
> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
> 0x1db885b vect_analyze_loop_2
> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
> 0x1dba029 vect_analyze_loop_1
> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
> 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
> 0x1e389d1 try_vectorize_loop_1
> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
> 0x1e38f3d try_vectorize_loop
> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
> 0x1e39230 execute
> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298
>
> Given the masks and the lens cannot be enabled simultanously when loop is
> using partial vectors.  Thus, we need to ensure the one is disabled when we
> would like to record the other in check_load_store_for_partial_vectors.  For
> example, when we try to record loop len, we need to check if the loop mask
> is disabled or not.

I don't think you can rely on LOOP_VINFO_FULLY_WITH_LENGTH_P during
analysis.  Instead how we tried to set up things is that we never even try
both and there is (was?) code to reject partial vector usage when we end
up recording both lens and masks.

That said, the assert you run into should be only asserted during transform,
not during analysis.  It possibly was before Robins costing reorg?

Richard.

> Below testsuites are passed for this patch:
> * The x86 bootstrap tests.
> * The x86 fully regression tests.
> * The aarch64 fully regression tests.
> * The riscv fully regressison tests.
>
> PR target/114195
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Add
> loop mask/len check before recording as they are mutual exclusion.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114195-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../gcc.target/riscv/rvv/base/pr114195-1.c| 15 +++
>  gcc/tree-vect-stmts.cc| 26 ++-
>  2 files changed, 35 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> new file mode 100644
> index 000..b0c9d5b81b8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> @@ -0,0 +1,15 @@
> +/* Test that we do not have ice when compile */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +long a, b;
> +extern short c[];
> +
> +void d() {
> +  for (int e = 0; e < 35; e += 2) {
> +a = ({ a < 0 ? a : 0; });
> +b = ({ b < 0 ? b : 0; });
> +
> +c[e] = 0;
> +  }
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 14a3ffb5f02..624947ed271 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1502,6 +1502,8 @@ check_load_store_for_partial_vectors (loop_vec_info 
> loop_vinfo, tree vectype,
>   gather_scatter_info *gs_info,
>   tree scalar_mask)
>  {
> +  gcc_assert (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo));
> +
>/* Invariant loads need no special support.  */
>if (memory_access_type == VMAT_INVARIANT)
>  return;
> @@ -1521,9 +1523,17 @@ check_load_store_for_partial_vectors (loop_vec_info 
> loop_vinfo, tree vectype,
>internal_fn ifn
> = (is_load ? vect_load_lanes_supported (vectype, group_size, true)
>: vect_store_lanes_supported (vectype, group_size, true));
> -  if (ifn == IFN_MASK_LEN_LOAD_LANES || ifn == IFN_MASK_LEN_STORE_LANES)
> +
> +  /* When the loop_vinfo using partial vector,  we cannot enable both
> +the fully mask and length simultaneously.  Thus, make sure the
> +other one is disabled when record one of them.
> +The same as other place for both the vect_record_loop_len and
> +vect_record_loop_mask.  */
> +  if ((ifn == IFN_MASK_LEN_LOAD_LANES || ifn == IFN_MASK_LEN_STORE_LANES)
> +   && !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> -  else if (ifn 

Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-03-08 Thread Thomas Schwinge
Hi!

On 2024-01-29T17:48:47+, Kwok Cheung Yeung  wrote:
> A splay-tree was previously used to lookup equivalent target addresses
> for a given host address on offload targets. However, as splay-trees can
> modify their structure on lookup, they are not suitable for concurrent
> access from separate teams/threads without some form of locking.

Heh.  ,-)

> This
> patch changes the lookup data structure to a hashtab instead, which does
> not have these issues.

(I've not looked into which data structure is most suitable here; not my
area of expertise.)

> The call to build_indirect_map to initialize the data structure is now
> called from just the first thread of the first team to avoid redundant
> calls to this function.

ACK, and also you've removed a number of 'volatile's, as I had questioned
earlier.  It remains open the question when to do the initialization, and
how to react to dynamic device image load and unload, and possibly other
(but not many?) raised during review.

I cannot formally approve this patch, but it seems a good incremental
step forward to me: per my testing so far,
(a) 'libgomp.c-c++-common/declare-target-indirect-2.c' is all-PASS,
with 'warning: this statement may fall through' resolved, and
(b) for 'libgomp.fortran/declare-target-indirect-2.f90': no more timeouts
(applies to nvptx only), and all-PASS execution test (both GCN, nvptx):

PASS: libgomp.fortran/declare-target-indirect-2.f90   -O0  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O0  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O0  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O1  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O1  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O1  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O2  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O2  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O2  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
(test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O3 -g  (test for 
excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O3 -g  
execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O3 -g  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -Os  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -Os  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -Os  
execution test

(Of course, the patch now needs un-XFAILing of
'libgomp.fortran/declare-target-indirect-2.f90' merged in.)


Grüße
 Thomas


>   libgomp/
>   * config/accel/target-indirect.c: Include string.h and hashtab.h.
>   Remove include of splay-tree.h.  Update comments.
>   (splay_tree_prefix, splay_tree_c): Delete.
>   (struct indirect_map_t): New.
>   (hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
>   (GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier.
>   (USE_SPLAY_TREE_LOOKUP): Rename to...
>   (USE_HASHTAB_LOOKUP): ..this.
>   (indirect_map, indirect_array): Delete.
>   (indirect_htab): New.
>   (build_indirect_map): Remove locking.  Build indirect map using
>   hashtab.
>   (GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
>   address.
>   (GOMP_target_map_indirect_ptr): Remove volatile qualifier.
>   * config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
>   from first thread of first team only.
>   * config/nvptx/team.c (gomp_nvptx_main): Likewise.
>   * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
>   Add missing break statements.
> ---
>  libgomp/config/accel/target-indirect.c| 83 ++-
>  libgomp/config/gcn/team.c |  7 +-
>  libgomp/config/nvptx/team.c   |  9 +-
>  .../declare-target-indirect-2.c   | 14 ++--
>  4 files changed, 63 insertions(+), 50 deletions(-)
>
> diff --git a/libgomp/config/accel/target-indirect.c 
> b/libgomp/config/accel/target-indirect.c
> index c60fd547cb6..cfef1ddbc49 

Re: GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)

2024-03-08 Thread Andrew Stubbs

On 08/03/2024 10:16, Thomas Schwinge wrote:

Hi!

So, attached here is now a different patch
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable 
(non-shared memory system)",
that takes a different approach re clarifying the two orthogonal aspects
that the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable controls:
(a) the *original* meaning via 'HSA_SUPPRESS_HOST_FALLBACK', and
(b) the *additional*/*new* meaning to report as fatal certain errors
during device probing.

As you requested, (b) remains as it is (with just the diagnostic message
clarified).  Re (a):

On 2024-03-07T14:37:10+0100, I wrote:

On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:

Thomas Schwinge wrote:

[...] libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' [...]

[...] originates in the libgomp HSA plugin, where the idea was -- in my
understanding -- that you wouldn't have device code available for all
'fn_ptr's, and in that case transparently (shared-memory system!) do
host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin (see above).
However, is it really still applicable there; don't we assume that we're
generating device code for all relevant functions?



And, one step back: how is (the original meaning of)
'suppress_host_fallback = false' even supposed to work on non-shared
memory systems as currently implemented by the libgomp GCN plugin?



[...] this whole concept of dynamic plugin-level
host-fallback execution being in conflict with our current non-shared
memory system configurations?


I therefore suggest to get rid of (a).

OK to push?


I wasn't aware that things could be broken when fallback-suppression 
*wasn't* set. I agree that we don't need that "feature".


As far as I knew this feature was merely an older implementation of the 
now-standard OMP_TARGET_OFFLOAD=mandatory with the additional advantage 
that we could make it do whatever we want for our test and debug needs 
(i.e. no target independent "smarts").


This patch looks good, thanks.

Andrew


[patch,avr,applied] Add an insn combine pattern for offset computation.

2024-03-08 Thread Georg-Johann Lay

Computing  uint16_t += 2 * uint8_t  can occur when an offset
into a 16-bit array is computed.  Without this pattern is costs
six instructions: A move (1), a zero-extend (1), a shift (2) and
an addition (2).  With this pattern it costs 4.

Johann

--

AVR: Add an insn combine pattern for offset computation.

Computing  uint16_t += 2 * uint8_t  can occur when an offset
into a 16-bit array is computed.  Without this pattern is costs
six instructions: A move (1), a zero-extend (1), a shift (2) and
an addition (2).  With this pattern it costs 4.

gcc/
* config/avr/avr.md (*addhi3_zero_extend.ashift1): New pattern.
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Compute its cost.diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index b87ae6a256d..1fa4b557f5d 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -12513,6 +12513,17 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
   return true;
 
 case PLUS:
+  // uint16_t += 2 * uint8_t;
+  if (mode == HImode
+	  && GET_CODE (XEXP (x, 0)) == ASHIFT
+	  && REG_P (XEXP (x, 1))
+	  && XEXP (XEXP (x, 0), 1) == const1_rtx
+	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND)
+	{
+	  *total = COSTS_N_INSNS (4);
+	  return true;
+	}
+
   if (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
 	  && REG_P (XEXP (x, 1)))
 	{
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index bc8a59c956c..52b6cff4a8b 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -1630,6 +1630,39 @@ (define_insn "*addhi3_zero_extend.const"
   "subi %A0,%n2\;sbc %B0,%B0"
   [(set_attr "length" "2")])
 
+
+;; Occurs when computing offsets into 16-bit arrays.
+;; Saves up to 2 instructions.
+(define_insn_and_split "*addhi3_zero_extend.ashift1.split"
+  [(set (match_operand:HI 0 "register_operand""=r")
+(plus:HI (ashift:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))
+(const_int 1))
+ (match_operand:HI 2 "register_operand""0")))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0)
+   (plus:HI (ashift:HI (zero_extend:HI (match_dup 1))
+   (const_int 1))
+(match_dup 2)))
+  (clobber (reg:CC REG_CC))])])
+
+(define_insn "*addhi3_zero_extend.ashift1"
+  [(set (match_operand:HI 0 "register_operand""=r")
+(plus:HI (ashift:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))
+(const_int 1))
+ (match_operand:HI 2 "register_operand""0")))
+   (clobber (reg:CC REG_CC))]
+  "reload_completed"
+  {
+return reg_overlap_mentioned_p (operands[1], operands[0])
+  ? "mov __tmp_reg__,%1\;add %A0,__tmp_reg__\;adc %B0,__zero_reg__\;add %A0,__tmp_reg__\;adc %B0,__zero_reg__"
+  : "add %A0,%1\;adc %B0,__zero_reg__\;add %A0,%1\;adc %B0,__zero_reg__";
+  }
+  [(set (attr "length")
+(symbol_ref ("4 + reg_overlap_mentioned_p (operands[1], operands[0])")))])
+
+
 (define_insn_and_split "*usum_widenqihi3_split"
   [(set (match_operand:HI 0 "register_operand"  "=r")
 (plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))


[PATCH] tree-optimization/114269 - 434.zeusmp regression after SCEV analysis fix

2024-03-08 Thread Richard Biener
The following addresses a performance regression caused by the recent
SCEV analysis fix with regard to folding multiplications and undefined
behavior on overflow.  We do not handle (T) { a, +, b } * c but can
treat sign-conversions from unsigned by performing the multiplication
in the unsigned type.  That's what we already do for additions (but
that misses one case that turns out important).

This fixes the 434.zeusmp regression for me.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/114269
* tree-chrec.cc (chrec_fold_plus_1): Handle sign-conversions
in the third CASE_CONVERT case as well.
(chrec_fold_multiply): Handle sign-conversions from unsigned
by performing the operation in the unsigned type.
---
 gcc/tree-chrec.cc | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index 2e6c7356d3b..7cd0ebc1010 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -325,6 +325,22 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
: build_int_cst_type (type, -1)));
 
CASE_CONVERT:
+ {
+   /* We can strip sign-conversions to signed by performing the
+  operation in unsigned.  */
+   tree optype = TREE_TYPE (TREE_OPERAND (op1, 0));
+   if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (optype)
+   && tree_nop_conversion_p (type, optype)
+   && TYPE_UNSIGNED (optype))
+ return chrec_convert (type,
+   chrec_fold_plus_1 (code, optype,
+  chrec_convert (optype,
+ op0, 
NULL),
+  TREE_OPERAND (op1, 0)),
+   NULL);
+ }
+
  if (tree_contains_chrecs (op1, NULL))
return chrec_dont_know;
  /* FALLTHRU */
@@ -424,6 +440,22 @@ chrec_fold_multiply (tree type,
  return chrec_fold_multiply_poly_poly (type, op0, op1);
 
CASE_CONVERT:
+ {
+   /* We can strip sign-conversions to signed by performing the
+  operation in unsigned.  */
+   tree optype = TREE_TYPE (TREE_OPERAND (op1, 0));
+   if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (optype)
+   && tree_nop_conversion_p (type, optype)
+   && TYPE_UNSIGNED (optype))
+ return chrec_convert (type,
+   chrec_fold_multiply (optype,
+chrec_convert (optype,
+   op0, 
NULL),
+TREE_OPERAND (op1, 0)),
+   NULL);
+ }
+
  if (tree_contains_chrecs (op1, NULL))
return chrec_dont_know;
  /* FALLTHRU */
@@ -474,6 +506,22 @@ chrec_fold_multiply (tree type,
}
 
 CASE_CONVERT:
+  {
+   /* We can strip sign-conversions to signed by performing the
+  operation in unsigned.  */
+   tree optype = TREE_TYPE (TREE_OPERAND (op0, 0));
+   if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (optype)
+   && tree_nop_conversion_p (type, optype)
+   && TYPE_UNSIGNED (optype))
+ return chrec_convert (type,
+   chrec_fold_multiply (optype,
+TREE_OPERAND (op0, 0),
+chrec_convert (optype,
+   op1, NULL)),
+   NULL);
+  }
+
   if (tree_contains_chrecs (op0, NULL))
return chrec_dont_know;
   /* FALLTHRU */
-- 
2.35.3


Re: [PATCH] testsuite: Fix up pr113617 test for darwin [PR113617]

2024-03-08 Thread Richard Biener
On Fri, 8 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The test attempts to link a shared library, and apparently Darwin doesn't
> allow by default for shared libraries to contain undefined symbols.
> 
> The following patch just adds dummy definitions for the symbols, so that
> the library no longer has any undefined symbols at least in my linux
> testing.
> Furthermore, for target { !shared } targets (like darwin until the it is
> fixed in target-supports.exp), because we then link a program rather than
> shared library, the patch also adds a dummy main definition so that it
> can link.
> 
> Regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-03-08  Jakub Jelinek  
> 
>   PR rtl-optimization/113617
>   * g++.dg/other/pr113617.C: Define -DSHARED when linking with -shared.
>   * g++.dg/other/pr113617-aux.cc: Add definitions for used methods and
>   templates not defined elsewhere.
> 
> --- gcc/testsuite/g++.dg/other/pr113617.C.jj  2024-02-26 17:54:57.054857411 
> +0100
> +++ gcc/testsuite/g++.dg/other/pr113617.C 2024-03-07 15:57:40.510796110 
> +0100
> @@ -2,7 +2,7 @@
>  // { dg-do link { target c++11 } }
>  // { dg-options "-O2" }
>  // { dg-additional-options "-fPIC" { target fpic } } */
> -// { dg-additional-options "-shared" { target shared } } */
> +// { dg-additional-options "-shared -DSHARED" { target shared } } */
>  // { dg-additional-sources pr113617-aux.cc }
>  
>  #include "pr113617.h"
> --- gcc/testsuite/g++.dg/other/pr113617-aux.cc.jj 2024-02-26 
> 17:54:57.054857411 +0100
> +++ gcc/testsuite/g++.dg/other/pr113617-aux.cc2024-03-07 
> 15:58:05.473448950 +0100
> @@ -7,3 +7,42 @@ void qux() {
>A a;
>a.foo(0, 0);
>  }
> +
> +namespace R {
> +template<>
> +Y >::AI
> +Y >::operator->()
> +{
> +  return AI();
> +}
> +template<>
> +Y >::AI
> +Y >::operator->()
> +{
> +  return AI();
> +}
> +}
> +
> +N1::N2::N3::AB ab;
> +
> +N1::N2::N3::AB &
> +N1::N2::N3::AB::bleh()
> +{
> +  return ab;
> +}
> +
> +N1::N2::N3::AC::AC(int)
> +{
> +}
> +
> +void
> +N1::N2::N3::AC::m1(R::S)
> +{
> +}
> +
> +#ifndef SHARED
> +int
> +main()
> +{
> +}
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] fix PowerPC < 7 w/ Altivec not to default to power7

2024-03-08 Thread Rene Rebe
This might not be the best timing -short before a major release-,
however, Sam just commented on the bug I filled years ago [1], so here
we go:

Glibc uses .machine to determine assembler optimizations to use.
However, since reworking the rs6000 .machine output selection in
commit e154242724b084380e3221df7c08fcdbd8460674 22 May 2019, G5 as
well as Cell, and even power4 w/ -maltivec currently resulted in
power7. Mask _ALTIVEC away as the .machine selection already did for
GFX and GPOPT.

powerpc64-t2-linux-gnu-gcc  test.c -S -o - -mcpu=G5
.file   "test.c"
.machine power7
.abiversion 2
.section".text"
.ident  "GCC: (GNU) 10.2.0"
.section.note.GNU-stack,"",@progbits

We ship this in T2/Linux [2] since 2020 and it is tested on G5, Cell
and Power8.

Signed-of-by: René Rebe 

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97367
[2] https://t2sde.org

--- gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc.vanilla  2021-04-25 
22:57:16.964223106 +0200
+++ gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc  2021-04-25 
22:57:27.193223841 +0200
@@ -5765,7 +5765,7 @@
   HOST_WIDE_INT flags = rs6000_isa_flags;
 
   /* Disable the flags that should never influence the .machine selection.  */
-  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
OPTION_MASK_ISEL);
+  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
OPTION_MASK_ALTIVEC | OPTION_MASK_ISEL);
 
   if ((flags & (ISA_3_1_MASKS_SERVER & ~ISA_3_0_MASKS_SERVER)) != 0)
 return "power10";

-- 
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
  https://exactcode.com | https://t2sde.org | https://rene.rebe.de


Fix 'char' initialization, copy, check in 'libgomp.oacc-fortran/acc-memcpy.f90' (was: [patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*})

2024-03-08 Thread Thomas Schwinge
Hi Tobias!

On 2024-02-19T22:36:51+0100, Tobias Burnus  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90

OK to push
"Fix 'char' initialization, copy, check in 
'libgomp.oacc-fortran/acc-memcpy.f90'",
see attached?


Grüße
 Thomas


> @@ -0,0 +1,47 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +! based on libgomp.oacc-c-c++-common/lib-60.c
> +
> +program main
> +  use openacc
> +  use iso_fortran_env
> +  use iso_c_binding
> +  implicit none (type, external)
> +  integer(int8), allocatable :: char(:)
> +  type(c_ptr) :: dptr
> +  integer(c_intptr_t) :: i
> +  integer(int8) :: j
> +
> +  allocate(char(-128:127))
> +  do i = -128, 127
> +char(j) = int (j, int8)
> +  end do
> +
> +  dptr = acc_malloc (256_c_size_t)
> +  call acc_memcpy_to_device (dptr, char, 255_c_size_t)
> +
> +  do i = 0, 255
> +if (acc_is_present (transfer (transfer(char, i) + i, dptr), 1)) &
> +  stop 1
> +  end do
> +
> +  char = 0_int8
> +
> +  call acc_memcpy_from_device (char, dptr, 256_c_size_t)
> +
> +  do i = -128, 127
> +char(i) = int (j, int8)
> +if (char(i) /= j) &
> +  stop 2
> +  end do
> +
> +  do i = 0, 255
> +if (acc_is_present (transfer (transfer(char, i) + i, dptr), 1)) &
> +  stop 3
> +  end do
> +
> +  call acc_free (dptr)
> +
> +  deallocate (char)
> +end


>From 7ea60a544353fa9ff0760e11db53332195eebad4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 6 Mar 2024 23:18:08 +0100
Subject: [PATCH] Fix 'char' initialization, copy, check in
 'libgomp.oacc-fortran/acc-memcpy.f90'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Our dear friend '-Wuninitialized' reported:

[...]/libgomp.oacc-fortran/acc-memcpy.f90:18:27:

   18 | char(j) = int (j, int8)
  |   ^
Warning: ‘j’ may be used uninitialized [-Wmaybe-uninitialized]
[...]/libgomp.oacc-fortran/acc-memcpy.f90:14:20:

   14 |   integer(int8) :: j
  |^
note: ‘j’ was declared here

..., but actually there were other issues.

	libgomp/
	* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: Fix 'char'
	initialization, copy, check.
---
 libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90 | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
index 670dc50ff07..844d08a4661 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
@@ -11,15 +11,14 @@ program main
   integer(int8), allocatable :: char(:)
   type(c_ptr) :: dptr
   integer(c_intptr_t) :: i
-  integer(int8) :: j
 
   allocate(char(-128:127))
   do i = -128, 127
-char(j) = int (j, int8)
+char(i) = int (i, int8)
   end do
 
   dptr = acc_malloc (256_c_size_t)
-  call acc_memcpy_to_device (dptr, char, 255_c_size_t)
+  call acc_memcpy_to_device (dptr, char, 256_c_size_t)
 
   do i = 0, 255
 if (acc_is_present (transfer (transfer(char, i) + i, dptr), 1)) &
@@ -31,8 +30,7 @@ program main
   call acc_memcpy_from_device (char, dptr, 256_c_size_t)
 
   do i = -128, 127
-char(i) = int (j, int8)
-if (char(i) /= j) &
+if (char(i) /= i) &
   stop 2
   end do
 
-- 
2.34.1



Re: [PATCH] bb-reorder: Fix assertion

2024-03-08 Thread Richard Biener
On Fri, 8 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> When touching bb-reorder yesterday, I've noticed the checking assert
> doesn't actually check what it meant to.
> Because asm_noperands returns >= 0 for inline asm patterns (in that case
> number of input+output+label operands, so asm goto has at least one)
> and -1 if it isn't inline asm.
> 
> The following patch fixes the assertion to actually check that it is
> asm goto.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-03-08  Jakub Jelinek  
> 
>   * bb-reorder.cc (fix_up_fall_thru_edges): Fix up checking assert,
>   asm_noperands < 0 means it is not asm goto too.
> 
> --- gcc/bb-reorder.cc.jj  2024-03-07 10:06:56.086285875 +0100
> +++ gcc/bb-reorder.cc 2024-03-07 10:11:44.745261926 +0100
> @@ -2024,7 +2024,8 @@ fix_up_fall_thru_edges (void)
>See PR108596.  */
> rtx_insn *j = BB_END (cur_bb);
> gcc_checking_assert (JUMP_P (j)
> -&& asm_noperands (PATTERN (j)));
> +&& (asm_noperands (PATTERN (j))
> +> 0));
> edge e2 = find_edge (cur_bb, e->dest);
> if (e2)
>   e2->flags |= EDGE_CROSSING;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing)

2024-03-08 Thread Thomas Schwinge
Hi!

On 2024-02-21T13:34:01+0100, I wrote:
> On 2024-02-01T15:49:02+0100, Richard Biener  wrote:
>> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>>> [...] what I
>>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  [...]
>
>>> [...] execution test FAILs.  Not all FAILs appear all the time [...]
>
> What disturbs the testing a lot is, that the GPU may get into a bad
> state, upon which any use either fails with a
> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
> 'libhsa-runtime64.so.1'...

So, there's a "fun" aspect: if we run into
'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (or other errors; and similar in the
libgomp nvptx plugin) during libgomp GCN plugin device probing, then it's
not fatal, but instead silently disables the libgomp plugin/device, thus
typically silently resorting to host-fallback execution.  That's not
helpful behavior in my opinion, so I propose the attached
"GCN, nvptx: Errors during device probing are fatal".  OK to push?

(That's also the behavior that's implemented in both the GCN and nvptx
target 'run' tools.)


Grüße
 Thomas


>From 0dc72089dccc10d3b55096ade5fc4d72de6cb96f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 14:42:07 +0100
Subject: [PATCH] GCN, nvptx: Errors during device probing are fatal

Currently, we silently disable libgomp GCN and nvptx plugins/devices in
presence of certain error conditions during device probing, thus typically
silently resorting to host-fallback execution.  Make such errors fatal, similar
as for any other device access later on, so that we early and reliably notice
when things go wrong.  (Keep just two cases non-fatal: (a) libgomp GCN or nvptx
plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not,
and (b) those are available, but the corresponding devices are not.)

This resolves the issue that we've got execution test cases unexpectedly
PASSing, despite:

libgomp: GCN fatal error: Run-time could not be initialized
Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

..., and therefore they were not offloaded to the GCN device, but ran in
host-fallback execution mode.  What happend in that scenario is that in
'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', but it wasn't fatal, but just
silently disabled the libgomp plugin/device.

Especially "entertaining" were cases where such unintended host-fallback
execution happened during effective-target checks like
'offload_device_available' (host-fallback execution there meaning: no offload
device available), but actual test cases then were running with an offload
device available, and therefore mis-configured.

	include/
	* cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'.
	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Add and handle
	'bool probe' parameter.  Adjust all users; errors during device
	probing are fatal.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from
	'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.
---
 include/cuda/cuda.h   |  1 +
 libgomp/plugin/plugin-gcn.c   | 14 --
 libgomp/plugin/plugin-nvptx.c |  4 +++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 114aba4e074..0dca4b3a5c0 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -57,6 +57,7 @@ typedef enum {
   CUDA_ERROR_OUT_OF_MEMORY = 2,
   CUDA_ERROR_NOT_INITIALIZED = 3,
   CUDA_ERROR_DEINITIALIZED = 4,
+  CUDA_ERROR_NO_DEVICE = 100,
   CUDA_ERROR_INVALID_CONTEXT = 201,
   CUDA_ERROR_INVALID_HANDLE = 400,
   CUDA_ERROR_NOT_FOUND = 500,
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7e141a85f31..2bea9157e9d 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1511,10 +1511,12 @@ assign_agent_ids (hsa_agent_t agent, void *data)
 }
 
 /* Initialize hsa_context if it has not already been done.
-   Return TRUE on success.  */
+   If !PROBE: returns TRUE on success.
+   If PROBE: returns TRUE on success or if the plugin/device shall be silently
+   ignored, and otherwise emits an error and returns FALSE.  */
 
 static bool
-init_hsa_context (void)
+init_hsa_context (bool probe)
 {
   hsa_status_t status;
   int agent_index = 0;
@@ -1529,7 +1531,7 @@ init_hsa_context (void)
 	GOMP_PLUGIN_fatal ("%s\n", msg);
   else
 	GCN_WARNING ("%s\n", msg);
-  return false;
+  return probe ? true : false;
 }
   status = hsa_fns.hsa_init_fn ();
   if (status != HSA_STATUS_SUCCESS)
@@ -3321,8 +3323,8 @@ GOMP_OFFLOAD_version (void)
 int
 GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 {
-  if (!init_hsa_context ())
-return 0;
+  if (!init_hsa_context (true))
+exit (EXIT_FAILURE);
   /* Return -1 if no 

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-08 Thread Thomas Schwinge
Hi!

On 2024-03-07T15:07:32+0100, Tobias Burnus  wrote:
> first, I have the feeling we talk about (more or less) the same code 
> region and use the same words – but we talk about rather different 
> things. Thus, you confuse me (and possibly Andrew) – and my reply 
> confuses you.

That, indeed, is my impression, too.  :-/

And actually the biggest confusion seems to be that both you would like
'GCN_SUPPRESS_HOST_FALLBACK' to mean something else than
'HSA_SUPPRESS_HOST_FALLBACK' originally meant.

Hopefully the
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable 
(non-shared memory system)"
does clarify that.


Just to close this out, let's try again for the other discussion items:

> Thomas Schwinge wrote:
>> On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:
>>> Thomas Schwinge wrote:
 First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
 is also not really desirable.
>> External users probably don't, but certainly all our internal testing is
>> setting it,
>
> First, I doubt it

'git grep --cached GCN_SUPPRESS_HOST_FALLBACK' in our internal scripts is
your friend.

> secondly, if it were true, it was broken for the 
> last 5 years or so as we definitely did not notice fails due to not 
> working offload devices. – Neither for AMD GCN nor ...

You're saying that 'GCN_SUPPRESS_HOST_FALLBACK=1' doesn't report as fatal
certain errors during device probing?  That's not what the code as well
as my experience says.

>> and also implicitly all nvptx offloading testing: simply by
>> means of having ["no" missing here -- sorry!] such knob in the libgomp nvptx 
>> plugin.
>
> I did see it at some places set for AMD but I do not see any 
> nvptx-specific environment variable which permits to do the same.

Right, that was confusing: there was a "no" missing in that sentence --
sorry!

> However:
>>   That is, the
>> libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
>> (the original meaning of) that flag
>
> I think that's one of the problems here – you talk about 
> suppress_host_fallback (implicit, original meaning), while I talk about 
> the GCN_SUPPRESS_HOST_FALLBACK environment variable.

The 'suppress_host_fallback' internal variable directly corresponds to
the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable.

> Besides all the talk about suppress_host_fallback, 
> 'init_hsa_runtime_functions' is not fatal' of the subject line seems to 
> be something to be considered (beyond the patches you already suggested).

I'll next submit "GCN, nvptx: Errors during device probing are fatal".

>>> If I run on my Linux system the system compiler with nvptx + gcn suppost
>>> installed, I get (with a nvptx permission problem):
>>>
>>> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
>>>
>>> libgomp: GCN host fallback has been suppressed
>>>
>>> And exit code = 1. The same result with '-foffload=disable' or with
>>> '-foffload=nvptx-none'.
>> I can't tell if that's what you expect to see there, or not?
>
> Well, obviously

In this discussion thread here, nothing was obvious to my anymore...  ;-|

> not that I get this error by default – and as your 
> wording indicated that the internal variable will be always true

That always-'true' suggestion was only for the *original* meaning of the
variable: the use in 'GOMP_OFFLOAD_can_run'.

> – and 
> not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I 
> worry that I would get the error any time.

That was exactly the point of my patch in this thread: to get rid of the
*additional*/*new* behavior that the libgomp GCN plugin derives from
'GCN_SUPPRESS_HOST_FALLBACK', different from what
'HSA_SUPPRESS_HOST_FALLBACK' originally meant.

However, I now understand that Andrew would like to keep that *new*
behavior.

>> (For avoidance of doubt: I'm expecting silent host-fallback execution in
>> case that libgomp GCN and/or nvptx plugins are available, but no
>> corresponding devices.  That's what my patch achieves.)
>
> I concur that the silent host fallback should happen by default (unless 
> env vars tell otherwise) - at least when either no code was generated 
> for the device (e.g. -foffload=disable) or when the vendor runtime 
> library is not available or no device (be it no hardware or no permission).
>
> That's the current behavior and if that remains, my main concern evaporates.

ACK, thanks.


Grüße
 Thomas


>>> If we want to remove it, we can make it always false - but I am strongly
>>> against making it always true.
>> I'm confused.  So you want the GCN and nvptx plugins to behave
>> differently in that regard?
> No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.
>>> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
>>> prevent the host fallback, but don't break somewhat common systems.
>> That's an orthogonal concept?
>
> No – It's the same concept of the main use of the 
> GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time 
> error instead of 

GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system) (was: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is

2024-03-08 Thread Thomas Schwinge
Hi!

So, attached here is now a different patch
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable 
(non-shared memory system)",
that takes a different approach re clarifying the two orthogonal aspects
that the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable controls:
(a) the *original* meaning via 'HSA_SUPPRESS_HOST_FALLBACK', and
(b) the *additional*/*new* meaning to report as fatal certain errors
during device probing.

As you requested, (b) remains as it is (with just the diagnostic message
clarified).  Re (a):

On 2024-03-07T14:37:10+0100, I wrote:
> On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:
>> Thomas Schwinge wrote:
>>> [...] libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' [...]
>>>
>>> [...] originates in the libgomp HSA plugin, where the idea was -- in my
>>> understanding -- that you wouldn't have device code available for all
>>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>>> you'd get those diagnosed.
>>>
>>> This has then been copied into the libgomp GCN plugin (see above).
>>> However, is it really still applicable there; don't we assume that we're
>>> generating device code for all relevant functions?

> And, one step back: how is (the original meaning of)
> 'suppress_host_fallback = false' even supposed to work on non-shared
> memory systems as currently implemented by the libgomp GCN plugin?

> [...] this whole concept of dynamic plugin-level
> host-fallback execution being in conflict with our current non-shared
> memory system configurations?

I therefore suggest to get rid of (a).

OK to push?


Grüße
 Thomas


>From 2a188021ca70fc1956ba78707fdec9dcca4f734d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 15:51:54 +0100
Subject: [PATCH] GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK'
 isn't applicable (non-shared memory system)

'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the
libgomp HSA plugin, where the idea was -- in my understanding -- that you
wouldn't have device code available for all functions that may be called, and
in that case transparently (shared memory system!) do host-fallback execution.
Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin as
'GCN_SUPPRESS_HOST_FALLBACK'.  However, the original meaning isn't applicable
for the libgomp GCN plugin anymore: we assume that we're generating device code
for all relevant functions, and we're implementing a non-shared memory system,
where we cannot transparently do host-fallback execution for individual
functions.

However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to
enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically
loaded; keep that meaning.

	libgomp/
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_can_run): Don't consider
	'GCN_SUPPRESS_HOST_FALLBACK' anymore (assume always-'true').
	(init_hsa_context): Adjust 'GCN_SUPPRESS_HOST_FALLBACK' error
	message.
---
 libgomp/plugin/plugin-gcn.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 4b7ab5e83c5..7e141a85f31 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1524,9 +1524,11 @@ init_hsa_context (void)
   init_environment_variables ();
   if (!init_hsa_runtime_functions ())
 {
-  GCN_WARNING ("Run-time could not be dynamically opened\n");
+  const char *msg = "Run-time could not be dynamically opened";
   if (suppress_host_fallback)
-	GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+	GOMP_PLUGIN_fatal ("%s\n", msg);
+  else
+	GCN_WARNING ("%s\n", msg);
   return false;
 }
   status = hsa_fns.hsa_init_fn ();
@@ -3855,15 +3857,9 @@ GOMP_OFFLOAD_can_run (void *fn_ptr)
 
   init_kernel (kernel);
   if (kernel->initialization_failed)
-goto failure;
+GOMP_PLUGIN_fatal ("kernel initialization failed");
 
   return true;
-
-failure:
-  if (suppress_host_fallback)
-GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
-  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
-  return false;
 }
 
 /* Allocate memory on device N.  */
-- 
2.34.1



Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-08 Thread Matthias Kretz
Hi,

I applied and did extended testing on x86_64 (no regressions) and aarch64 
using qemu testing SVE 256, 512, and 1024. Looks good!

While going through the applied patch I noticed a few style issues that I 
simply turned into a patch (attached).

A few comments inline. Sorry for not seeing these before.

On Friday, 9 February 2024 15:28:10 CET Srinivas Yadav Singanaboina wrote:
> diff --git a/libstdc++-v3/include/experimental/bits/simd.h
> b/libstdc++-v3/include/experimental/bits/simd.h index
> 90523ea57dc..d274cd740fe 100644
> --- a/libstdc++-v3/include/experimental/bits/simd.h
> +++ b/libstdc++-v3/include/experimental/bits/simd.h
> @@ -39,12 +39,16 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #if _GLIBCXX_SIMD_X86INTRIN
>  #include 
>  #elif _GLIBCXX_SIMD_HAVE_NEON
>  #include 
>  #endif
> +#if _GLIBCXX_SIMD_HAVE_SVE
> +#include 
> +#endif
> 
>  /** @ingroup ts_simd
>   * @{
> @@ -83,6 +87,12 @@ using __m512d [[__gnu__::__vector_size__(64)]] = double;
>  using __m512i [[__gnu__::__vector_size__(64)]] = long long;
>  #endif
> 
> +#if _GLIBCXX_SIMD_HAVE_SVE
> +constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS /
> 8; +#else
> +constexpr inline int __sve_vectorized_size_bytes = 0;
> +#endif
> +
>  namespace simd_abi {
>  // simd_abi forward declarations {{{
>  // implementation details:
> @@ -108,6 +118,9 @@ template 
>  template 
>struct _VecBltnBtmsk;
> 
> +template 
> +  struct _SveAbi;
> +
>  template 
>using _VecN = _VecBuiltin;
> 
> @@ -123,6 +136,9 @@ template 
>  template 
>using _Neon = _VecBuiltin<_UsedBytes>;
> 
> +template 
> +  using _Sve = _SveAbi<_UsedBytes, __sve_vectorized_size_bytes>;
> +
>  // implementation-defined:
>  using __sse = _Sse<>;
>  using __avx = _Avx<>;
> @@ -130,6 +146,7 @@ using __avx512 = _Avx512<>;
>  using __neon = _Neon<>;
>  using __neon128 = _Neon<16>;
>  using __neon64 = _Neon<8>;
> +using __sve = _Sve<>;
> 
>  // standard:
>  template 
> @@ -250,6 +267,8 @@ constexpr inline bool __support_neon_float =
>false;
>  #endif
> 
> +constexpr inline bool __have_sve = _GLIBCXX_SIMD_HAVE_SVE;
> +
>  #ifdef _ARCH_PWR10
>  constexpr inline bool __have_power10vec = true;
>  #else
> @@ -356,12 +375,13 @@ namespace __detail
> 
>| (__have_avx512vnni << 27)
>| (__have_avx512vpopcntdq<< 28)
>| (__have_avx512vp2intersect << 29);
> 
> -else if constexpr (__have_neon)
> +else if constexpr (__have_neon || __have_sve)
>return __have_neon
> 
>  | (__have_neon_a32 << 1)
>  | (__have_neon_a64 << 2)
>  | (__have_neon_a64 << 2)
> 
> -| (__support_neon_float << 3);
> +| (__support_neon_float << 3)
> + | (__have_sve << 4);

This is not enough. This should list all feature flags that might have a 
(significant enough) influence on code-gen in inline functions (that are not 
always_inline). AFAIU at least __ARM_FEATURE_SVE2 is necessary. But I assume 
__ARM_FEATURE_SVE2_BITPERM, __ARM_FEATURE_SVE_BITS, 
__ARM_FEATURE_SVE_MATMUL_INT8, and __ARM_FEATURE_SVE_VECTOR_OPERATORS are also 
relevant. Maybe more?

> [...]
bits/simd.h:

>  // fall back to fixed_size only if scalar and native ABIs don't match
>  template 
>struct __deduce_fixed_size_fallback {};
> 
> +template 
> +  struct __no_sve_deduce_fixed_size_fallback {};
> +
>  template 
>struct __deduce_fixed_size_fallback<_Tp, _Np,
>  enable_if_t::template _S_is_valid_v<_Tp>>>
>{ using type = simd_abi::fixed_size<_Np>; };
> 
> +template 
> +  struct __no_sve_deduce_fixed_size_fallback<_Tp, _Np,
> +enable_if_t::template _S_is_valid_v<_Tp>>>
> +  { using type = simd_abi::fixed_size<_Np>; };
> +
>  template 
>struct __deduce_impl : public __deduce_fixed_size_fallback<_Tp, _Np> {};
> 
> +template 
> +  struct __no_sve_deduce_impl : public
> __no_sve_deduce_fixed_size_fallback<_Tp, _Np> {};

I believe you don't need __no_sve_deduce_fixed_size_fallback. Simply derive 
__no_sve_deduce_impl from __deduce_fixed_size_fallback. No?


> diff --git a/libstdc++-v3/include/experimental/bits/simd_converter.h
> b/libstdc++-v3/include/experimental/bits/simd_converter.h index
> 3160e251632..b233d2c70a5 100644
> --- a/libstdc++-v3/include/experimental/bits/simd_converter.h
> +++ b/libstdc++-v3/include/experimental/bits/simd_converter.h
> @@ -28,6 +28,18 @@
>  #if __cplusplus >= 201703L
> 
>  _GLIBCXX_SIMD_BEGIN_NAMESPACE
> +
> +template 
> +_Ret __converter_fallback(_Arg __a)
> +  {
> +  _Ret __ret{};
> +  __execute_n_times<_Np>(
> +  [&](auto __i) _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA {
> +__ret._M_set(__i, static_cast<_To>(__a[__i]));
> +});
> +  return __ret;
> +  }
> +
>  // _SimdConverter scalar -> scalar {{{
>  template 
>struct _SimdConverter<_From, simd_abi::scalar, _To, simd_abi::scalar,
> @@ -56,14 +68,16 @@ template 
>};
> 
>  // }}}
> -// _SimdConverter "native 1" -> "native 2" {{{
> +// _SimdConverter "native 

[PING^2] Re: [PATCH] analyzer: deal with -fshort-enums

2024-03-08 Thread Torbjorn SVENSSON

Ping!

Kind regards,
Torbjörn

On 2024-02-22 09:51, Torbjorn SVENSSON wrote:

Ping!

Kind regards,
Torbjörn

On 2024-02-07 17:21, Torbjorn SVENSSON wrote:

Hi,

Is it okay to backport 3cbab07b08d2f3a3ed34b6ec12e67727c59d285c to 
releases/gcc-13?


Without this backport, I see these failures on arm-none-eabi:

FAIL: gcc.dg/analyzer/switch-enum-1.c  (test for bogus messages, line 26)
FAIL: gcc.dg/analyzer/switch-enum-1.c  (test for bogus messages, line 44)
FAIL: gcc.dg/analyzer/switch-enum-2.c  (test for bogus messages, line 34)
FAIL: gcc.dg/analyzer/switch-enum-2.c  (test for bogus messages, line 52)
FAIL: gcc.dg/analyzer/torture/switch-enum-pr105273-doom-p_floor.c   
-O0   (test for bogus messages, line 82)
FAIL: gcc.dg/analyzer/torture/switch-enum-pr105273-doom-p_maputl.c   
-O0    (test for bogus messages, line 83)


Kind regards,
Torbjörn


On 2023-12-06 23:22, David Malcolm wrote:

On Wed, 2023-12-06 at 02:31 -0300, Alexandre Oliva wrote:

On Nov 22, 2023, Alexandre Oliva  wrote:


Ah, nice, that's a great idea, I wish I'd thought of that!  Will
do.


Sorry it took me so long, here it is.  I added two tests, so that,
regardless of the defaults, we get both circumstances tested, without
repetition.

Regstrapped on x86_64-linux-gnu.  Also tested on arm-eabi.  Ok to
install?


Thanks for the updated patch.

Looks good to me.

Dave




analyzer: deal with -fshort-enums

On platforms that enable -fshort-enums by default, various switch-
enum
analyzer tests fail, because apply_constraints_for_gswitch doesn't
expect the integral promotion type cast.  I've arranged for the code
to cope with those casts.


for  gcc/analyzer/ChangeLog

 * region-model.cc (has_nondefault_case_for_value_p): Take
 enumerate type as a parameter.
 (region_model::apply_constraints_for_gswitch): Cope with
 integral promotion type casts.

for  gcc/testsuite/ChangeLog

 * gcc.dg/analyzer/switch-short-enum-1.c: New.
 * gcc.dg/analyzer/switch-no-short-enum-1.c: New.
---
  gcc/analyzer/region-model.cc   |   27 +++-
  .../gcc.dg/analyzer/switch-no-short-enum-1.c   |  141

  .../gcc.dg/analyzer/switch-short-enum-1.c  |  140

  3 files changed, 304 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch-no-short-
enum-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch-short-enum-
1.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
model.cc
index 2157ad2578b85..6a7a8bc9f4884 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5387,10 +5387,10 @@ has_nondefault_case_for_value_p (const
gswitch *switch_stmt, tree int_cst)
 has nondefault cases handling all values in the enum.  */
  static bool
-has_nondefault_cases_for_all_enum_values_p (const gswitch
*switch_stmt)
+has_nondefault_cases_for_all_enum_values_p (const gswitch
*switch_stmt,
+   tree type)
  {
    gcc_assert (switch_stmt);
-  tree type = TREE_TYPE (gimple_switch_index (switch_stmt));
    gcc_assert (TREE_CODE (type) == ENUMERAL_TYPE);
    for (tree enum_val_iter = TYPE_VALUES (type);
@@ -5426,6 +5426,23 @@ apply_constraints_for_gswitch (const
switch_cfg_superedge ,
  {
    tree index  = gimple_switch_index (switch_stmt);
    const svalue *index_sval = get_rvalue (index, ctxt);
+  bool check_index_type = true;
+
+  /* With -fshort-enum, there may be a type cast.  */
+  if (ctxt && index_sval->get_kind () == SK_UNARYOP
+  && TREE_CODE (index_sval->get_type ()) == INTEGER_TYPE)
+    {
+  const unaryop_svalue *unaryop = as_a 
(index_sval);
+  if (unaryop->get_op () == NOP_EXPR
+ && is_a  (unaryop->get_arg ()))
+   if (const initial_svalue *initvalop = (as_a 
+  (unaryop->get_arg
(
+ if (TREE_CODE (initvalop->get_type ()) == ENUMERAL_TYPE)
+   {
+ index_sval = initvalop;
+ check_index_type = false;
+   }
+    }
    /* If we're switching based on an enum type, assume that the user
is only
   working with values from the enum.  Hence if this is an
@@ -5437,12 +5454,14 @@ apply_constraints_for_gswitch (const
switch_cfg_superedge ,
    ctxt
    /* Must be an enum value.  */
    && index_sval->get_type ()
-  && TREE_CODE (TREE_TYPE (index)) == ENUMERAL_TYPE
+  && (!check_index_type
+ || TREE_CODE (TREE_TYPE (index)) == ENUMERAL_TYPE)
    && TREE_CODE (index_sval->get_type ()) == ENUMERAL_TYPE
    /* If we have a constant, then we can check it directly.  */
    && index_sval->get_kind () != SK_CONSTANT
    && edge.implicitly_created_default_p ()
-  && has_nondefault_cases_for_all_enum_values_p (switch_stmt)
+  && has_nondefault_cases_for_all_enum_values_p (switch_stmt,
+    index_sval-

get_type ())

    /* Don't do this 

[PATCH] testsuite/108355 - make gcc.dg/tree-ssa/ssa-fre-104.c properly XFAIL

2024-03-08 Thread Richard Biener
The testcase only XFAILs on targets where int has an alignment
of sizeof(int).  Align the respective array this way to make it
XFAIL consistenlty.

Tested on x86_64-unknown-linux-gnu and cris-elf.  Pushed.

PR testsuite/108355
* gcc.dg/tree-ssa/ssa-fre-104.c: Align e.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-104.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-104.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-104.c
index 425c32dd93c..52756bb7e40 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-104.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-104.c
@@ -8,7 +8,7 @@ int d;
 void bar25_(void);
 void foo(void);
 int main() {
-  int e[][1] = {0, 0, 0, 0, 0, 1};
+  int __attribute__((aligned(sizeof(int e[][1] = {0, 0, 0, 0, 0, 1};
   for (;;) {
 bar25_();
 /* We should optimistically treat a == 0 because of the bounds of
-- 
2.35.3


[PATCH] c++: Fix constexpr evaluation of parameters passed by invisible reference [PR111284]

2024-03-08 Thread Jakub Jelinek
Hi!

My r9-6136 changes to make a copy of constexpr function bodies before
genericization modifies it broke the constant evaluation of non-POD
arguments passed by value.
In the callers such arguments are passed as reference to usually a
TARGET_EXPR, but on the callee side until genericization they are just
direct uses of a PARM_DECL with some class type.
In cxx_bind_parameters_in_call I've used convert_from_reference to
pretend it is passed by value and then cxx_eval_constant_expression
is called there and evaluates that as an rvalue, followed by
adjust_temp_type if the types don't match exactly (e.g. const Foo
argument and passing to it reference to Foo TARGET_EXPR).

The reason this doesn't work is that when the TARGET_EXPR in the caller
is constant initialized, this for it is the address of the TARGET_EXPR_SLOT,
but if the code later on pretends the PARM_DECL is just initialized to the
rvalue of the constant evaluation of the TARGET_EXPR, it is as if there
is a bitwise copy of the TARGET_EXPR to the callee, so this in the callee
is then address of the PARM_DECL in the callee.

The following patch attempts to fix that by constexpr evaluation of such
arguments in the caller as an lvalue instead of rvalue, and on the callee
side when seeing such a PARM_DECL, if we want an lvalue, lookup the value
(lvalue) saved in ctx->globals (if any), and if wanting an rvalue,
recursing with vc_prvalue on the looked up value (because it is there
as an lvalue, nor rvalue).

adjust_temp_type doesn't work for lvalues of non-scalarish types, for
such types it relies on changing the type of a CONSTRUCTOR, but on the
other side we know what we pass to the argument is addressable, so
the patch on type mismatch takes address of the argument value, casts
to reference to the desired type and dereferences it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-08  Jakub Jelinek  

PR c++/111284
* constexpr.cc (cxx_bind_parameters_in_call): For PARM_DECLs with
TREE_ADDRESSABLE types use vc_glvalue rather than vc_prvalue for
cxx_eval_constant_expression and if it doesn't have the same
type as it should, cast the reference type to reference to type
before convert_from_reference and instead of adjust_temp_type
take address of the arg, cast to reference to type and then
convert_from_reference.
(cxx_eval_constant_expression) : For lval case
on parameters with TREE_ADDRESSABLE types lookup result in
ctx->globals if possible.  Otherwise if lookup in ctx->globals
was successful for parameter with TREE_ADDRESSABLE type,
recurse with vc_prvalue on the returned value.

* g++.dg/cpp1z/constexpr-111284.C: New test.
* g++.dg/cpp1y/constexpr-lifetime7.C: Expect one error on a different
line.

--- gcc/cp/constexpr.cc.jj  2024-02-13 10:29:57.979155641 +0100
+++ gcc/cp/constexpr.cc 2024-03-07 19:35:01.032412221 +0100
@@ -1877,13 +1877,21 @@ cxx_bind_parameters_in_call (const const
  x = build_address (x);
}
   if (TREE_ADDRESSABLE (type))
-   /* Undo convert_for_arg_passing work here.  */
-   x = convert_from_reference (x);
-  /* Normally we would strip a TARGET_EXPR in an initialization context
-such as this, but here we do the elision differently: we keep the
-TARGET_EXPR, and use its CONSTRUCTOR as the value of the parm.  */
-  arg = cxx_eval_constant_expression (ctx, x, vc_prvalue,
- non_constant_p, overflow_p);
+   {
+ /* Undo convert_for_arg_passing work here.  */
+ if (TYPE_REF_P (TREE_TYPE (x))
+ && !same_type_p (type, TREE_TYPE (TREE_TYPE (x
+   x = cp_fold_convert (build_reference_type (type), x);
+ x = convert_from_reference (x);
+ arg = cxx_eval_constant_expression (ctx, x, vc_glvalue,
+ non_constant_p, overflow_p);
+   }
+  else
+   /* Normally we would strip a TARGET_EXPR in an initialization context
+  such as this, but here we do the elision differently: we keep the
+  TARGET_EXPR, and use its CONSTRUCTOR as the value of the parm.  */
+   arg = cxx_eval_constant_expression (ctx, x, vc_prvalue,
+   non_constant_p, overflow_p);
   /* Check we aren't dereferencing a null pointer when calling a non-static
 member function, which is undefined behaviour.  */
   if (i == 0 && DECL_OBJECT_MEMBER_FUNCTION_P (fun)
@@ -1909,7 +1917,16 @@ cxx_bind_parameters_in_call (const const
{
  /* Make sure the binding has the same type as the parm.  But
 only for constant args.  */
- if (!TYPE_REF_P (type))
+ if (TREE_ADDRESSABLE (type))
+   {
+ if (!same_type_p (type, TREE_TYPE (arg)))
+   {
+ arg = build_fold_addr_expr (arg);
+   

[PATCH] testsuite: Fix up pr113617 test for darwin [PR113617]

2024-03-08 Thread Jakub Jelinek
Hi!

The test attempts to link a shared library, and apparently Darwin doesn't
allow by default for shared libraries to contain undefined symbols.

The following patch just adds dummy definitions for the symbols, so that
the library no longer has any undefined symbols at least in my linux
testing.
Furthermore, for target { !shared } targets (like darwin until the it is
fixed in target-supports.exp), because we then link a program rather than
shared library, the patch also adds a dummy main definition so that it
can link.

Regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-08  Jakub Jelinek  

PR rtl-optimization/113617
* g++.dg/other/pr113617.C: Define -DSHARED when linking with -shared.
* g++.dg/other/pr113617-aux.cc: Add definitions for used methods and
templates not defined elsewhere.

--- gcc/testsuite/g++.dg/other/pr113617.C.jj2024-02-26 17:54:57.054857411 
+0100
+++ gcc/testsuite/g++.dg/other/pr113617.C   2024-03-07 15:57:40.510796110 
+0100
@@ -2,7 +2,7 @@
 // { dg-do link { target c++11 } }
 // { dg-options "-O2" }
 // { dg-additional-options "-fPIC" { target fpic } } */
-// { dg-additional-options "-shared" { target shared } } */
+// { dg-additional-options "-shared -DSHARED" { target shared } } */
 // { dg-additional-sources pr113617-aux.cc }
 
 #include "pr113617.h"
--- gcc/testsuite/g++.dg/other/pr113617-aux.cc.jj   2024-02-26 
17:54:57.054857411 +0100
+++ gcc/testsuite/g++.dg/other/pr113617-aux.cc  2024-03-07 15:58:05.473448950 
+0100
@@ -7,3 +7,42 @@ void qux() {
   A a;
   a.foo(0, 0);
 }
+
+namespace R {
+template<>
+Y >::AI
+Y >::operator->()
+{
+  return AI();
+}
+template<>
+Y >::AI
+Y >::operator->()
+{
+  return AI();
+}
+}
+
+N1::N2::N3::AB ab;
+
+N1::N2::N3::AB &
+N1::N2::N3::AB::bleh()
+{
+  return ab;
+}
+
+N1::N2::N3::AC::AC(int)
+{
+}
+
+void
+N1::N2::N3::AC::m1(R::S)
+{
+}
+
+#ifndef SHARED
+int
+main()
+{
+}
+#endif

Jakub



[PATCH] bb-reorder: Fix assertion

2024-03-08 Thread Jakub Jelinek
Hi!

When touching bb-reorder yesterday, I've noticed the checking assert
doesn't actually check what it meant to.
Because asm_noperands returns >= 0 for inline asm patterns (in that case
number of input+output+label operands, so asm goto has at least one)
and -1 if it isn't inline asm.

The following patch fixes the assertion to actually check that it is
asm goto.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-08  Jakub Jelinek  

* bb-reorder.cc (fix_up_fall_thru_edges): Fix up checking assert,
asm_noperands < 0 means it is not asm goto too.

--- gcc/bb-reorder.cc.jj2024-03-07 10:06:56.086285875 +0100
+++ gcc/bb-reorder.cc   2024-03-07 10:11:44.745261926 +0100
@@ -2024,7 +2024,8 @@ fix_up_fall_thru_edges (void)
 See PR108596.  */
  rtx_insn *j = BB_END (cur_bb);
  gcc_checking_assert (JUMP_P (j)
-  && asm_noperands (PATTERN (j)));
+  && (asm_noperands (PATTERN (j))
+  > 0));
  edge e2 = find_edge (cur_bb, e->dest);
  if (e2)
e2->flags |= EDGE_CROSSING;

Jakub



Re: [PATCH] bpf: add size threshold for inlining mem builtins

2024-03-08 Thread Jose E. Marchesi


Hi Faust.

> BPF cannot fall back on library calls to implement memmove, memcpy and
> memset, so we attempt to expand these inline always if possible.
> However, this inline expansion was being attempted even for excessively
> large operations, which could result in gcc consuming huge amounts of
> memory and hanging.
>
> Add a size threshold in the BPF backend below which to always expand
> these operations inline, and introduce an option
> -minline-memops-threshold= to control the threshold. Defaults to
> 1024 bytes.
>
> Tested on x86_64-linux-gnu host for bpf-unknown-none target.
> Fixes hang in test gcc.c-torture/compile/20050622-1.c for BPF, which
> returns to (correctly) failing due to exceeding the eBPF stack limit.
>
> gcc/
>
>   * config/bpf/bpf.cc (bpf_expand_cpymem, bpf_expand_setmem): Do
>   not attempt inline expansion if size is above threshold.
>   * config/bpf/bpf.opt (-minline-memops-threshold): New option.
>   * doc/invoke.texi (eBPF Options) <-minline-memops-threshold>:
>   Document.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/inline-memops-threshold-1.c: New test.
>   * gcc.target/bpf/inline-memops-threshold-2.c: New test.
> ---
>  gcc/config/bpf/bpf.cc |  8 
>  gcc/config/bpf/bpf.opt|  4 
>  gcc/doc/invoke.texi   |  9 -
>  .../gcc.target/bpf/inline-memops-threshold-1.c| 15 +++
>  .../gcc.target/bpf/inline-memops-threshold-2.c| 14 ++
>  5 files changed, 49 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 0e33f4347ba..3f3dcb0a46b 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -1275,6 +1275,10 @@ bpf_expand_cpymem (rtx *operands, bool is_move)
>gcc_unreachable ();
>  }
>  
> +  /* For sizes above threshold, always use a libcall.  */
> +  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
> +return false;

Shouldn't we emit warning or error in these cases, like in the other
situations that prevent inlining?  Something like:

  if (flag_building_libgcc)
warning (0, "could not inline call to %<__builtin_%s%>: "
 "too many bytes, use -minline-memops-threshold", name);
  else
error ("could not inline call to %<__builtin_%s%>: "
   "too many bytes, use -minline-memops-threshold", name);

> +
>unsigned iters = size_bytes >> ceil_log2 (align);
>unsigned remainder = size_bytes & (align - 1);
>  
> @@ -1347,6 +1351,10 @@ bpf_expand_setmem (rtx *operands)
>gcc_unreachable ();
>  }
>  
> +  /* For sizes above threshold, always use a libcall.  */
> +  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
> +return false;
> +
>unsigned iters = size_bytes >> ceil_log2 (align);
>unsigned remainder = size_bytes & (align - 1);
>unsigned inc = GET_MODE_SIZE (mode);
> diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
> index acfddebdad7..541ebe4dfc4 100644
> --- a/gcc/config/bpf/bpf.opt
> +++ b/gcc/config/bpf/bpf.opt
> @@ -108,3 +108,7 @@ Enum(asm_dialect) String(normal) Value(ASM_NORMAL)
>  
>  EnumValue
>  Enum(asm_dialect) String(pseudoc) Value(ASM_PSEUDOC)
> +
> +minline-memops-threshold=
> +Target RejectNegative Joined UInteger Var(bpf_inline_memops_threshold) 
> Init(1024)
> +-minline-memops-threshold= Maximum size of memset/memmove/memcpy to 
> inline, larger sizes will use a library call.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2390d478121..7a965631123 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -971,7 +971,7 @@ Objective-C and Objective-C++ Dialects}.
>  @gccoptlist{-mbig-endian -mlittle-endian
>  -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
>  -mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -msmov -mcpu=@var{version}
> --masm=@var{dialect}}
> +-masm=@var{dialect} -minline-memops-threshold=@var{bytes}}
>  
>  @emph{FR30 Options}
>  @gccoptlist{-msmall-model  -mno-lsim}
> @@ -25700,6 +25700,13 @@ Outputs pseudo-c assembly dialect.
>  
>  @end table
>  
> +@opindex -minline-memops-threshold
> +@item -minline-memops-threshold=@var{bytes}
> +Specifies a size threshold in bytes at or below which memmove, memcpy
> +and memset shall always be expanded inline.  Operations dealing with
> +sizes larger than this threshold will be implemented using a library
> +call instead of being expanded inline.  The default is @samp{1024}.

"[...] would have to be implemented using a library call instead of
 being expanded inline, but since BPF doesn't allow libcalls exceeding
 this threshold results in a compile-time error."

> +
>  @end table
>  
>  @node FR30 Options
> diff --git a/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c 
> 

[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-08 Thread HAO CHEN GUI
Hi,
  This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
lshiftrt with an out AND. It matches a DImode rotate and mask insert on
rs6000.

Trying 2 -> 7:
2: r122:DI=r129:DI
  REG_DEAD r129:DI
7: r125:SI=r122:DI#0 0>>0x1f
  REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0x])))

This conversion blocks the further combination which combines to a SImode
rotate and mask insert insn.

Trying 9, 7 -> 10:
9: r127:SI=r130:DI#0&0xfffe
  REG_DEAD r130:DI
7: r125:SI#0=r129:DI 0>>0x1f&0x
  REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
  REG_DEAD r125:SI
  REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffe]))
(subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffe]))
(subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0x])) 0)))

  The root cause of the issue is if it's necessary to do the widen mode for
lshiftrt when the target already has shiftrt for narrow mode and its cost
is not high. My former patch tried to fix the problem but not accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns of SImode rotate and mask insert. It actually does reversed
things and narrow the mode for lshiftrt so that it can matches the SImode
rotate and mask insert.

  The case "rlwimi-2.c" is fixed and restore the corresponding number of
insns to original ones.

  Compared with last version, the main change is to remove changes for a
testcase which was already fixed in another patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
lshiftrt with an AND.  The new pattern matches rotate and mask insert on
rs6000.  Thus it blocks the pattern to be further combined to a SImode rotate
and mask insert pattern.  This patch fixes the problem by adding two subreg
pattern for SImode rotate and mask insert patterns.

gcc/
PR target/93738
* config/rs6000/rs6000.md (*rotlsi3_insert_subreg): New.
(*rotlsi3_insert_4_subreg): New.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instructions.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..996d0740faf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_subreg"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (ior:SI (and:SI
+(match_operator:SI 8 "lowpart_subreg_operator"
+ [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+   [(match_operand:DI 1 "gpc_reg_operand" "r")
+(match_operand:SI 2 "const_int_operand" "n")])
+  (match_operand:DI 3 "const_int_operand" "n"))])
+(match_operand:SI 5 "const_int_operand" "n"))
+   (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+   (match_operand:SI 7 "const_int_operand" "n"]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0x
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+(match_dup 2))
+   (match_dup 5))
+   (and:SI (match_dup 6)
+   (match_dup 7]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
+}
+  [(set_attr "type" "insert")])
+
 (define_insn "*rotl3_insert_2"
   [(set 

Re: [PATCH] bpf: testsuite: fix unresolved test in memset-1.c

2024-03-08 Thread Jose E. Marchesi


Hi David.
OK.  Thanks.

> The test was trying to do too much by both checking for an error, and
> checking the resulting assembly. Of course, due to the error no asm was
> produced, so the scan-asm went unresolved. Split it into two separate
> tests to fix the issue.
>
> Tested on x86_64-linux-gnu host for bpf-unknown-none target.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/memset-1.c: Move error test case to...
>   * gcc.target/bpf/memset-2.c: ... here. New test.
> ---
>  gcc/testsuite/gcc.target/bpf/memset-1.c |  8 
>  gcc/testsuite/gcc.target/bpf/memset-2.c | 22 ++
>  2 files changed, 22 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/memset-2.c
>
> diff --git a/gcc/testsuite/gcc.target/bpf/memset-1.c 
> b/gcc/testsuite/gcc.target/bpf/memset-1.c
> index 9e9f8eff028..7c4768c6e73 100644
> --- a/gcc/testsuite/gcc.target/bpf/memset-1.c
> +++ b/gcc/testsuite/gcc.target/bpf/memset-1.c
> @@ -28,12 +28,4 @@ set_large (struct context *ctx)
>__builtin_memset (dest, 0xfe, 130);
>  }
>  
> -void
> -set_variable (struct context *ctx)
> -{
> -  void *data = (void *)(long)ctx->data;
> -  char *dest = data;
> -  __builtin_memset (dest, 0xbc, ctx->data_meta); /* { dg-error "could not 
> inline call" } */
> -}
> -
>  /* { dg-final { scan-assembler-times "call" 0 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/memset-2.c 
> b/gcc/testsuite/gcc.target/bpf/memset-2.c
> new file mode 100644
> index 000..0602a1a277c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/memset-2.c
> @@ -0,0 +1,22 @@
> +/* Test that we error if memset cannot be expanded inline.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +struct context {
> + unsigned int data;
> + unsigned int data_end;
> + unsigned int data_meta;
> + unsigned int ingress;
> + unsigned int queue_index;
> + unsigned int egress;
> +};
> +
> +
> +void
> +set_variable (struct context *ctx)
> +{
> +  void *data = (void *)(long)ctx->data;
> +  char *dest = data;
> +  __builtin_memset (dest, 0xbc, ctx->data_meta); /* { dg-error "could not 
> inline call" } */
> +}