[pushed] wwwdocs: readings: Move shared-ptr.com to https

2024-03-23 Thread Gerald Pfeifer
Pushed.

Gerald
---
 htdocs/readings.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/readings.html b/htdocs/readings.html
index e4e68909..ee77d969 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -283,7 +283,7 @@ names.
   Manufacturer: Renesas, various licensees.
   CPUs include: SH1, SH2, SH2-DSP, SH3, SH3-DSP, SH4, SH4A, SH5 series.
   https://www.renesas.com/us/en/products/microcontrollers-microprocessors/other-mcus-mpus/superh-risc-engine-family-mcus;>Renesas
 SuperH Processors
-  http://shared-ptr.com/sh_insns.html;>SuperH Instruction Set 
Summary
+  https://shared-ptr.com/sh_insns.html;>SuperH Instruction Set 
Summary
   GDB includes a simulator.
  
  
-- 
2.44.0


[pushed] wwwdocs: index: Avoid redirect for FOSDEM 2024 link

2024-03-23 Thread Gerald Pfeifer
Pushed.

Gerald
---
 htdocs/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/index.html b/htdocs/index.html
index 90f2a838..909cae75 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -55,7 +55,7 @@ mission statement.
 News
 
 
-https://inbox.sourceware.org/gcc/36fadb0549c3dca716eb3b923d66a11be2c67a61.ca...@redhat.com;>GCC
 developer room at FOSDEM 2024: Call for Participation open
+https://inbox.sourceware.org/gcc/36fadb0549c3dca716eb3b923d66a11be2c67a61.ca...@redhat.com/;>GCC
 developer room at FOSDEM 2024: Call for Participation open
 [2023-11-20] wwwdocs:
 FOSDEM 2024: Brussels, Belgium, February 3-4 2024
 
-- 
2.44.0


[PATCH] libcpp: Fix _Pragma("GCC system_header") [PR114436]

2024-03-23 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114436

This is a small fix for the issue mentioned in the PR that _Pragma("GCC
system_header") does not work completely. I believe it was always the case
since _Pragma() support was first added. bootstrap + regtested all languages
on x86-64 Linux. Is it OK now or in stage 1 please? Thanks!

-Lewis

-- >8 --

_Pragma("GCC system_header") currently takes effect only partially. It does
succeed in updating the line_map, so that checks like in_system_header_at()
return correctly, but it does not update pfile->buffer->sysp.  One result is
that a subsequent #include does not set up the system header state properly
for the newly included file, as pointed out in the PR. Fix by propagating
the new system header state back to the buffer after processing the pragma.

libcpp/ChangeLog:

PR preprocessor/114436
* directives.cc (destringize_and_run): If the _Pragma changed the
buffer system header state (e.g. because it was _Pragma("GCC
system_header"), propagate that change back to the actual buffer too.

gcc/testsuite/ChangeLog:

PR preprocessor/114436
* c-c++-common/cpp/pragma-system-header-1.h: New test.
* c-c++-common/cpp/pragma-system-header-2.h: New test.
* c-c++-common/cpp/pragma-system-header.c: New test.
---
 libcpp/directives.cc  | 11 ---
 .../c-c++-common/cpp/pragma-system-header-1.h |  1 +
 .../c-c++-common/cpp/pragma-system-header-2.h |  5 +
 gcc/testsuite/c-c++-common/cpp/pragma-system-header.c |  3 +++
 4 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-system-header-1.h
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-system-header-2.h
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-system-header.c

diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 479f8c716e8..bbaf9db6af0 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1919,9 +1919,11 @@ destringize_and_run (cpp_reader *pfile, const cpp_string 
*in,
  until we've read all of the tokens that we want.  */
   cpp_push_buffer (pfile, (const uchar *) result, dest - result,
   /* from_stage3 */ true);
-  /* ??? Antique Disgusting Hack.  What does this do?  */
-  if (pfile->buffer->prev)
-pfile->buffer->file = pfile->buffer->prev->file;
+
+  /* This is needed for _Pragma("once") and _Pragma("GCC system_header") to 
work
+ properly.  */
+  pfile->buffer->file = pfile->buffer->prev->file;
+  pfile->buffer->sysp = pfile->buffer->prev->sysp;
 
   start_directive (pfile);
   _cpp_clean_line (pfile);
@@ -1986,6 +1988,9 @@ destringize_and_run (cpp_reader *pfile, const cpp_string 
*in,
 
   /* Finish inlining run_directive.  */
   pfile->buffer->file = NULL;
+  /* If the system header state changed due to #pragma GCC system_header, then
+ make that applicable to the real buffer too.  */
+  pfile->buffer->prev->sysp = pfile->buffer->sysp;
   _cpp_pop_buffer (pfile);
 
   /* Reset the old macro state before ...  */
diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-system-header-1.h 
b/gcc/testsuite/c-c++-common/cpp/pragma-system-header-1.h
new file mode 100644
index 000..bd9ff0cb138
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-system-header-1.h
@@ -0,0 +1 @@
+#pragma GCC warning "this warning should not be output (1)"
diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-system-header-2.h 
b/gcc/testsuite/c-c++-common/cpp/pragma-system-header-2.h
new file mode 100644
index 000..a62d9e2685a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-system-header-2.h
@@ -0,0 +1,5 @@
+_Pragma("GCC system_header")
+#include "pragma-system-header-1.h"
+#pragma GCC warning "this warning should not be output (2)"
+_Pragma("unknown")
+#include "pragma-system-header-1.h"
diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-system-header.c 
b/gcc/testsuite/c-c++-common/cpp/pragma-system-header.c
new file mode 100644
index 000..fdea12009e1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-system-header.c
@@ -0,0 +1,3 @@
+#include "pragma-system-header-2.h" /* { dg-bogus "this warning should not be 
output" } */
+/* { dg-do preprocess } */
+/* PR preprocessor/114436 */


Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal



On 23/03/24 9:33 pm, Peter Bergner wrote:
> On 3/23/24 4:33 AM, Ajit Agarwal wrote:
 -  else if (align_words < GP_ARG_NUM_REG)
 +  else if (align_words < GP_ARG_NUM_REG
 + || (cum->hidden_string_length
 + && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>> {
>>>   if (TARGET_32BIT && TARGET_POWERPC64)
>>> return rs6000_mixed_function_arg (mode, type, align_words);
>>>
>>>   return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>> }
>>>   else
>>> return NULL_RTX;
>>>
>>> The old code for the unused hidden parameter (which was the 9th param) would
>>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>>> see a copy of r11 into a pseudo like we do for the other param regs.
>>> Is that a problem? Given it's an unused parameter, it'll probably get 
>>> deleted
>>> as dead code, but could it cause any issues?  What if we have more than one
>>> unused hidden parameter and we return r12 and r13 which have specific uses
>>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>>> Have you verified what the callee RTL looks like after expand for these
>>> unused hidden parameters?  Is there a rtx we can return that isn't a 
>>> NULL_RTX
>>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>>> which might lead to some rtl being generated?  Would a (const_int 0) or
>>> something else work?
>>>
>>>
>> For the above use case it will return 
>>
>> (reg:DI 5 %r5) and below check entry_parm = 
>> (reg:DI 5 %r5) and the following check will not return TRUE and hence
>>parameter save area will not be allocated.
> 
> Why r5?!?!   The 8th (integer) param would return r10, so I'd assume if
> the next param was a hidden param, then it'd get the next gpr, so r11.
> How does it jump back to r5 which may have been used by the 3rd param?
> 
> 
My mistake its r11 only for hidden param.
> 
> 
> 
>> It will not generate any rtx in the callee rtl code but it just used to
>> check whether to allocate parameter save area or not when number of args > 8.
>>
>> /* If there is no incoming register, we need a stack.  */
>>   entry_parm = rs6000_function_arg (args_so_far, arg);
>>   if (entry_parm == NULL)
>> return true;
>>
>>   /* Likewise if we need to pass both in registers and on the stack.  */
>>   if (GET_CODE (entry_parm) == PARALLEL
>>   && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>> return true;
> 
> Yes, this code in rs6000_parm_needs_stack() uses the rs6000_function_arg()
> return value as a boolean to tell us whether a parameter save area is required
> so what we return is unimportant other than to know it's not NULL_RTX.
> 
> I'm more concerned about the use of the target hook targetm.calls.function_arg
> used in the generic parts of the compiler.  What will that code do differently
> now that we return a reg rtx rather than NULL_RTX?  Might that code use
> the reg rtx to emit something?  I'd feel better if you could verify what
> happens in that code when we return a reg rtx for that 9th hidden param which
> isn't really being passed in a register.
> 

As per my understanding and debugging openBLAS code testcase I see that reg_rtx 
returned inside the below IF condition is used for check whether paramter save 
area is needed or not. 

In the generic code where targetm.calls.function_arg is called 
in calls.cc returned rtx is used for PARALLEL case so that we can
check if we need to pass both in registers and stack then they emit
store with respect to return rtx. If we identify that we need only
registers for argument then it emits nothing.

Thanks & Regards
Ajit
> 
> Peter
> 
> 


Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Peter Bergner
On 3/23/24 4:33 AM, Ajit Agarwal wrote:
>>> -  else if (align_words < GP_ARG_NUM_REG)
>>> +  else if (align_words < GP_ARG_NUM_REG
>>> +  || (cum->hidden_string_length
>>> +  && cum->actual_parm_length <= GP_ARG_NUM_REG))
>> {
>>   if (TARGET_32BIT && TARGET_POWERPC64)
>> return rs6000_mixed_function_arg (mode, type, align_words);
>>
>>   return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>> }
>>   else
>> return NULL_RTX;
>>
>> The old code for the unused hidden parameter (which was the 9th param) would
>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>> see a copy of r11 into a pseudo like we do for the other param regs.
>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>> as dead code, but could it cause any issues?  What if we have more than one
>> unused hidden parameter and we return r12 and r13 which have specific uses
>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>> Have you verified what the callee RTL looks like after expand for these
>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>> which might lead to some rtl being generated?  Would a (const_int 0) or
>> something else work?
>>
>>
> For the above use case it will return 
> 
> (reg:DI 5 %r5) and below check entry_parm = 
> (reg:DI 5 %r5) and the following check will not return TRUE and hence
>parameter save area will not be allocated.

Why r5?!?!   The 8th (integer) param would return r10, so I'd assume if
the next param was a hidden param, then it'd get the next gpr, so r11.
How does it jump back to r5 which may have been used by the 3rd param?





> It will not generate any rtx in the callee rtl code but it just used to
> check whether to allocate parameter save area or not when number of args > 8.
> 
> /* If there is no incoming register, we need a stack.  */
>   entry_parm = rs6000_function_arg (args_so_far, arg);
>   if (entry_parm == NULL)
> return true;
> 
>   /* Likewise if we need to pass both in registers and on the stack.  */
>   if (GET_CODE (entry_parm) == PARALLEL
>   && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
> return true;

Yes, this code in rs6000_parm_needs_stack() uses the rs6000_function_arg()
return value as a boolean to tell us whether a parameter save area is required
so what we return is unimportant other than to know it's not NULL_RTX.

I'm more concerned about the use of the target hook targetm.calls.function_arg
used in the generic parts of the compiler.  What will that code do differently
now that we return a reg rtx rather than NULL_RTX?  Might that code use
the reg rtx to emit something?  I'd feel better if you could verify what
happens in that code when we return a reg rtx for that 9th hidden param which
isn't really being passed in a register.


Peter




[PATCH 0/2] A few minor fixes in

2024-03-23 Thread Arsen Arsenović
From: Arsen Arsenović 

Afternoon!

These couple of patches fix two minor issues in the generator
implementation that were informally reported a while ago.

Tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA, have a lovely day!

Arsen Arsenović (2):
  libstdc++: fix _V badname in 
  libstdc++: fix generator iterator operator* return type

 libstdc++-v3/include/std/generator| 22 ++--
 .../range_generators/iter_deref_return.cc | 34 +++
 2 files changed, 46 insertions(+), 10 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/24_iterators/range_generators/iter_deref_return.cc

-- 
2.44.0



[PATCH 1/2] libstdc++: fix _V badname in

2024-03-23 Thread Arsen Arsenović
libstdc++-v3/ChangeLog:

* include/std/generator: Fix _V badname.
---
 libstdc++-v3/include/std/generator | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/generator 
b/libstdc++-v3/include/std/generator
index 87983ee5e7c6..2d1dcced1e57 100644
--- a/libstdc++-v3/include/std/generator
+++ b/libstdc++-v3/include/std/generator
@@ -76,14 +76,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @headerfile generator
* @since C++23
*/
-  template
+  template
 class generator;
 
   /// @cond undocumented
   namespace __gen
   {
 /// _Reference type for a generator whose reference (first argument) and
-/// value (second argument) types are _Ref and _V.
+/// value (second argument) types are _Ref and _Val.
 template
 using _Reference_t = __conditional_t,
 _Ref&&, _Ref>;
@@ -642,14 +642,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   } // namespace __gen
   /// @endcond
 
-  template
+  template
 class generator
-  : public ranges::view_interface>
+  : public ranges::view_interface>
 {
-  using _Value = __conditional_t, remove_cvref_t<_Ref>, _V>;
+  using _Value = __conditional_t,
+remove_cvref_t<_Ref>,
+_Val>;
   static_assert(__gen::_Cv_unqualified_object<_Value>,
"Generator value must be a cv-unqualified object type");
-  using _Reference = __gen::_Reference_t<_Ref, _V>;
+  using _Reference = __gen::_Reference_t<_Ref, _Val>;
   static_assert(is_reference_v<_Reference>
|| (__gen::_Cv_unqualified_object<_Reference>
&& copy_constructible<_Reference>),
@@ -737,8 +739,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool _M_began = false;
 };
 
-  template
-struct generator<_Ref, _V, _Alloc>::_Iterator
+  template
+struct generator<_Ref, _Val, _Alloc>::_Iterator
 {
   using value_type = _Value;
   using difference_type = ptrdiff_t;
-- 
2.44.0



[PATCH 2/2] libstdc++: fix generator iterator operator* return type

2024-03-23 Thread Arsen Arsenović
Per the standard, the return type of a generators ranges iterator op*
should be the reference type rather than the yielded type.

The yielded type was used here by mistake.

libstdc++-v3/ChangeLog:

* include/std/generator (generator::_Iterator::operator*): Fix
return type.
* testsuite/24_iterators/range_generators/iter_deref_return.cc:
New test.
---
 libstdc++-v3/include/std/generator|  4 +--
 .../range_generators/iter_deref_return.cc | 34 +++
 2 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/24_iterators/range_generators/iter_deref_return.cc

diff --git a/libstdc++-v3/include/std/generator 
b/libstdc++-v3/include/std/generator
index 2d1dcced1e57..789016b5a883 100644
--- a/libstdc++-v3/include/std/generator
+++ b/libstdc++-v3/include/std/generator
@@ -773,12 +773,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator++(int)
   { this->operator++(); }
 
-  yielded
+  _Reference
   operator*()
const noexcept(is_nothrow_move_constructible_v<_Reference>)
   {
auto& __p = this->_M_coro.promise();
-   return static_cast(*__p._M_value());
+   return static_cast<_Reference>(*__p._M_value());
   }
 
 private:
diff --git 
a/libstdc++-v3/testsuite/24_iterators/range_generators/iter_deref_return.cc 
b/libstdc++-v3/testsuite/24_iterators/range_generators/iter_deref_return.cc
new file mode 100644
index ..7bdbf4d489ec
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/range_generators/iter_deref_return.cc
@@ -0,0 +1,34 @@
+// { dg-do compile { target c++23 } }
+// Copyright (C) 2023-2024 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+#include 
+
+// Check that the return type of iterator::operator* is the reference type.
+// Pre-op* return type fix, this'd have resulted in a op* return type of const
+// bool&.
+
+std::generator
+foo();
+
+static_assert(std::is_same_v);
+static_assert(std::is_same_v);
-- 
2.44.0



Re: [PATCH gcc 1/3] Move GNU/Hurd startfile spec from config/i386/gnu.h to config/gnu.h

2024-03-23 Thread Sergey Bugaev
On Wed, Mar 20, 2024 at 10:20 PM Thomas Schwinge  wrote:
> Hi!

Hi Thomas,

> Sergey, great work on aarch64 GNU/Hurd!  (... where these GCC bits
> clearly were the less complicated part...)  ;-)

thanks! (and indeed they were :)

> Please re-submit with ChangeLog updates added to the Git commit logs; see
>  ->
> , and/or 'git log'
> for guidance.  You may use
> 'contrib/gcc-changelog/git_check_commit.py --print-changelog' to verify.

Done so, and git_check_commit.py seems happy with my attempt. I
rebased (fixing a trivial merge conflict) and posted v2, please take a
look.

Sergey


[PATCH v2 3/3] libgcc: Add basic support for aarch64-gnu (GNU/Hurd on AArch64)

2024-03-23 Thread Sergey Bugaev
There is currently no unwinding implementation.

libgcc/ChangeLog:

* config.host: Recognize aarch64*-*-gnu* hosts.
* config/aarch64/gnu-unwind.h: New file.
* config/aarch64/heap-trampoline.c
(allocate_trampoline_page): Support GNU/Hurd.

Signed-off-by: Sergey Bugaev 
---
 libgcc/config.host  |  9 +++
 libgcc/config/aarch64/gnu-unwind.h  | 36 +
 libgcc/config/aarch64/heap-trampoline.c |  4 +--
 3 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/aarch64/gnu-unwind.h

diff --git a/libgcc/config.host b/libgcc/config.host
index 59a42d3a0..e75a7af64 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -448,6 +448,15 @@ aarch64*-*-linux*)
tmake_file="${tmake_file} t-dfprules"
tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
;;
+aarch64*-*-gnu*)
+   extra_parts="$extra_parts crtfastmath.o"
+   md_unwind_header=aarch64/gnu-unwind.h
+   tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
+   tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+   tmake_file="${tmake_file} t-dfprules"
+   tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
+   ;;
 aarch64*-*-vxworks7*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/aarch64-unwind.h
diff --git a/libgcc/config/aarch64/gnu-unwind.h 
b/libgcc/config/aarch64/gnu-unwind.h
new file mode 100644
index 0..d9e485a18
--- /dev/null
+++ b/libgcc/config/aarch64/gnu-unwind.h
@@ -0,0 +1,36 @@
+/* DWARF2 EH unwinding support for GNU Hurd: aarch64.
+   Copyright (C) 2020-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* Always include AArch64 unwinder header file.  */
+#include "config/aarch64/aarch64-unwind.h"
+
+#ifndef inhibit_libc
+
+#include 
+
+/*
+ * TODO: support for aarch64 needs to be implemented.
+ */
+
+#endif /* ifndef inhibit_libc */
diff --git a/libgcc/config/aarch64/heap-trampoline.c 
b/libgcc/config/aarch64/heap-trampoline.c
index 885df629d..26957a3ee 100644
--- a/libgcc/config/aarch64/heap-trampoline.c
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -29,7 +29,7 @@ void *allocate_trampoline_page (void);
 void __gcc_nested_func_ptr_created (void *chain, void *func, void *dst);
 void __gcc_nested_func_ptr_deleted (void);
 
-#if defined(__linux__)
+#if defined(__linux__) || defined(__gnu_hurd__)
 static const unsigned char aarch64_trampoline_insns[6][4] = {
   {0x5f, 0x24, 0x03, 0xd5}, /* hint34 */
   {0xb1, 0x00, 0x00, 0x58}, /* ldr x17, .+20 */
@@ -82,7 +82,7 @@ allocate_trampoline_page (void)
 {
   void *page;
 
-#if defined(__linux__)
+#if defined(__linux__) || defined(__gnu_hurd__)
   page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
   MAP_ANON | MAP_PRIVATE, 0, 0);
 #elif __APPLE__
-- 
2.44.0



[PATCH v2 1/3] Move GNU/Hurd startfile spec from config/i386/gnu.h to config/gnu.h

2024-03-23 Thread Sergey Bugaev
Since it's not i386-specific; this makes it possible to reuse it for other
architectures.

Also, add a warning for the case gnu.h is specified before gnu-user.h, which
would cause gnu-user's version of the spec to override gnu's, and not the other
way around as it's intended. The i?86-gnu target currently specifies them in
the right order, but it's easy to accidentally put them in a wrong order.

gcc/Changelog:

* config/i386/gnu.h: Move GNU/Hurd STARTFILE_SPEC from here...
* config/gnu.h: ...to here.

Signed-off-by: Sergey Bugaev 
---
 gcc/config/gnu.h  | 16 
 gcc/config/i386/gnu.h | 11 ---
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/gcc/config/gnu.h b/gcc/config/gnu.h
index ac99f7605..e2a33baf0 100644
--- a/gcc/config/gnu.h
+++ b/gcc/config/gnu.h
@@ -31,3 +31,19 @@ along with GCC.  If not, see .
builtin_assert ("system=unix"); \
builtin_assert ("system=posix");\
 } while (0)
+
+
+#ifndef GNU_USER_TARGET_STARTFILE_SPEC
+# warning This file should be included after gnu-user.h, to override its 
STARTFILE_SPEC
+#endif
+
+#undef STARTFILE_SPEC
+#if defined HAVE_LD_PIE
+#define STARTFILE_SPEC \
+  "%{!shared: 
%{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;"
 PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC 
":crtbeginS.o%s;:crtbegin.o%s}"
+#else
+#define STARTFILE_SPEC \
+  "%{!shared: 
%{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
+   crti.o%s %{static:crtbeginT.o%s;shared:crtbeginS.o%s;:crtbegin.o%s}"
+#endif
diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
index 3f42714d1..af1d55887 100644
--- a/gcc/config/i386/gnu.h
+++ b/gcc/config/i386/gnu.h
@@ -24,17 +24,6 @@ along with GCC.  If not, see .
 #undef GNU_USER_DYNAMIC_LINKER
 #define GNU_USER_DYNAMIC_LINKER "/lib/ld.so"
 
-#undef STARTFILE_SPEC
-#if defined HAVE_LD_PIE
-#define STARTFILE_SPEC \
-  "%{!shared: 
%{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;"
 PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC 
":crtbeginS.o%s;:crtbegin.o%s}"
-#else
-#define STARTFILE_SPEC \
-  "%{!shared: 
%{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared:crtbeginS.o%s;:crtbegin.o%s}"
-#endif
-
 #ifdef TARGET_LIBC_PROVIDES_SSP
 
 /* i386 glibc provides __stack_chk_guard in %gs:0x14.  */
-- 
2.44.0



[PATCH v2 2/3] aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64)

2024-03-23 Thread Sergey Bugaev
Coupled with a corresponding binutils patch, this produces a toolchain that can
sucessfully build working binaries targeting aarch64-gnu.

gcc/Changelog:

* config.gcc: Recognize aarch64*-*-gnu* targets.
* config/aarch64/aarch64-gnu.h: New file.

Signed-off-by: Sergey Bugaev 
---
 gcc/config.gcc   |  6 +++
 gcc/config/aarch64/aarch64-gnu.h | 68 
 2 files changed, 74 insertions(+)
 create mode 100644 gcc/config/aarch64/aarch64-gnu.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 87a5c92b6..9d935164c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1264,6 +1264,12 @@ aarch64*-*-linux*)
done
TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
;;
+aarch64*-*-gnu*)
+tm_file="${tm_file} elfos.h gnu-user.h gnu.h glibc-stdint.h"
+tm_file="${tm_file} aarch64/aarch64-elf.h aarch64/aarch64-errata.h 
aarch64/aarch64-gnu.h"
+tmake_file="${tmake_file} aarch64/t-aarch64"
+tm_defines="${tm_defines}  TARGET_DEFAULT_ASYNC_UNWIND_TABLES=1"
+   ;;
 aarch64*-wrs-vxworks*)
 tm_file="${tm_file} elfos.h aarch64/aarch64-elf.h"
 tm_file="${tm_file} vx-common.h vxworks.h aarch64/aarch64-vxworks.h"
diff --git a/gcc/config/aarch64/aarch64-gnu.h b/gcc/config/aarch64/aarch64-gnu.h
new file mode 100644
index 0..ee5494034
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-gnu.h
@@ -0,0 +1,68 @@
+/* Definitions for AArch64 running GNU/Hurd.
+   Copyright (C) 2009-2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#ifndef GCC_AARCH64_GNU_H
+#define GCC_AARCH64_GNU_H
+
+#define GNU_USER_DYNAMIC_LINKER 
"/lib/ld-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
+
+#define CPP_SPEC "%{pthread:-D_REENTRANT}"
+
+#define GNU_TARGET_LINK_SPEC  "%{h*}   \
+   %{static:-Bstatic}  \
+   %{shared:-shared}   \
+   %{symbolic:-Bsymbolic}  \
+   %{!static:%{!static-pie:\
+ %{rdynamic:-export-dynamic}   \
+ %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} \
+   %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
+   -X  \
+   %{mbig-endian:-EB} %{mlittle-endian:-EL} \
+   -maarch64gnu%{mabi=ilp32:32}%{mbig-endian:b}"
+
+
+#define LINK_SPEC GNU_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
+
+#define GNU_USER_TARGET_MATHFILE_SPEC \
+  "%{Ofast|ffast-math|funsafe-math-optimizations:%{!shared:crtfastmath.o%s}}"
+
+#undef ENDFILE_SPEC
+#define ENDFILE_SPEC   \
+  GNU_USER_TARGET_MATHFILE_SPEC " " \
+  GNU_USER_TARGET_ENDFILE_SPEC
+
+#define TARGET_OS_CPP_BUILTINS()   \
+  do   \
+{  \
+   GNU_USER_TARGET_OS_CPP_BUILTINS();  \
+}  \
+  while (0)
+
+#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack
+
+/* Uninitialized common symbols in non-PIE executables, even with
+   strong definitions in dependent shared libraries, will resolve
+   to COPY relocated symbol in the executable.  See PR65780.  */
+#undef TARGET_BINDS_LOCAL_P
+#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
+
+/* Define this to be nonzero if static stack checking is supported.  */
+#define STACK_CHECK_STATIC_BUILTIN 1
+
+#endif  /* GCC_AARCH64_GNU_H */
-- 
2.44.0



Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
Hello Peter:

Sent version-3 of the patch addressing below review comments.

Thanks & Regards
Ajit

On 23/03/24 3:03 pm, Ajit Agarwal wrote:
> Hello Peter:
> 
> On 23/03/24 10:07 am, Peter Bergner wrote:
>> On 3/22/24 5:15 AM, Ajit Agarwal wrote:
>>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>>> the parameters passed to OpenBLAS functions. FlexiBLAS
>>> basically provides a BLAS interface where each function
>>> is a stub that forwards the arguments to a real BLAS lib,
>>> like OpenBLAS.
>>>
>>> Fixes the corruption of caller frame checking number of
>>> arguments is less than equal to GP_ARG_NUM_REG (8)
>>> excluding hidden unused DECLS.
>>
>> I think the git log entry commentary could be a little more descriptive
>> of what the problem is. How about something like the following?
>>
>>   When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
>>   stack frame when calling OpenBLAS functions.  This was caused by the
>>   FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
>>   number of function parameters in the callee due to hidden Fortran
>>   parameters. This can cause problems when the callee believes the caller
>>   has allocated a parameter save area when the caller has not done so.
>>   That means any writes by the callee into the non-existent parameter save
>>   area will corrupt the caller stack frame.
>>
>>   The workaround implemented here, is for the callee to determine whether
>>   the caller has allocated a parameter save area or not, by ignoring any
>>   unused hidden parameters when counting the number of parameters.
>>
>>
> I will address this change in the new version of the patch.
>>
>>> PR rtk-optimization/100799
>>
>> s/rtk/rtl/
>>
>>
>>
> I will address this in new version of the patch.
>>> * config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
>>> generate parameter save area if number of arguments passed
>>> less than equal to GP_ARG_NUM_REG (8) excluding hidden
>>> parameter.
>>
>> The callee doesn't generate or allocate the parameter save area, the
>> caller does.  The code here is for the callee trying to determine
>> whether the caller has done so.  How about saying the following instead?
>>
>>   Don't assume a parameter save area has been allocated if the number of
>>   formal parameters, excluding unused hidden parameters, is less than or
>>   equal to GP_ARG_NUM_REG (8).
>>
>>
> 
> I will incorporate this change in new version of the patch.
> 
>>
>>
>>> (init_cumulative_args): Check for hidden parameter in fortran
>>> routine and set the flag hidden_string_length and actual
>>> parameter passed excluding hidden unused DECLS.
>>
>> Check for unused hidden Fortran parameters and set hidden_string_length
>> and actual_parm_length.
>>
>>
> 
> I will address this change in new version of the patch.
> 
>  
>>> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
>>> + than it actually has and doesn't expect the parameter save area on the
>>> + caller side because of that while the callee expects it and the callee
>>> + actually stores something in the parameter save area, it corrupts
>>> + whatever is in the caller stack frame at that location.  */
>>
>> The wrapper/caller is the one that allocates the parameter save area, so
>> saying "...doesn't expect the parameter save area on the caller side..."
>> doesn't make sense, since it knows whether it allocated it or not.
>> How about saying something like the following instead?
>>
>>   Check whether this function contains any unused hidden parameters and
>>   record how many there are for use in rs6000_function_arg() to determine
>>   whether its callers have allocated a parameter save area or not.
>>   See PR100799 for details.
>>
>>
> 
> I will incorporate this change in new version of the patch.
> 
>>
>>> +  unsigned int num_args = 0;
>>> +  unsigned int hidden_length = 0;
>>> +
>>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>>> +   arg; arg = DECL_CHAIN (arg))
>>> +{
>>> +  num_args++;
>>> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
>>> +   {
>>> + tree parmdef = ssa_default_def (cfun, arg);
>>> + if (parmdef == NULL || has_zero_uses (parmdef))
>>> +   {
>>> + cum->hidden_string_length = 1;
>>> + hidden_length++;
>>> +   }
>>> +   }
>>> +   }
>>> +
>>> +  cum->actual_parm_length = num_args - hidden_length;
>>
>> This code looks fine, but do we really need two new fields in rs6000_args?
>> Can't we just get along with only cum->actual_parm_length by modifying
>> the rs6000_function_arg() change from:
>>
>>> +  else if (align_words < GP_ARG_NUM_REG
>>> +  || (cum->hidden_string_length
>>> +  && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>
>> to:
>>
>> +  else if (align_words < GP_ARG_NUM_REG
>> +   || cum->actual_parm_length <= GP_ARG_NUM_REG)
>>
>> ???
>>
> 
> Yes we can do that. I will address this in new 

[PATCH v3] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
Hello All:

When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
stack frame when calling OpenBLAS functions.  This was caused by the
FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
number of function parameters in the callee due to hidden Fortran
parameters. This can cause problems when the callee believes the caller
has allocated a parameter save area when the caller has not done so.
That means any writes by the callee into the non-existent parameter save
area will corrupt the caller stack frame.

The workaround implemented here, is for the callee to determine whether
the caller has allocated a parameter save area or not, by ignoring any
unused hidden parameters when counting the number of parameters.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Stackoverflow in optimized code on PPC [PR100799]

When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
stack frame when calling OpenBLAS functions.  This was caused by the
FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
number of function parameters in the callee due to hidden Fortran
parameters. This can cause problems when the callee believes the caller
has allocated a parameter save area when the caller has not done so.
That means any writes by the callee into the non-existent parameter save
area will corrupt the caller stack frame.

The workaround implemented here, is for the callee to determine whether
the caller has allocated a parameter save area or not, by ignoring any
unused hidden parameters when counting the number of parameters.

2024-03-23  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR rtl-optimization/100799
* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
assume a parameter save area has been allocated if the number of
formal parameters, excluding unused hidden parameters, is less
than or equal to GP_ARG_NUM_REG (8).
(init_cumulative_args): Check for unused hidden Fortran
parameters and set hidden_string_length and actual_parm_length.
* config/rs6000/rs6000.h (rs6000_args): Add new field
hidden_string_length and actual_parm_length.
---
 gcc/config/rs6000/rs6000-call.cc | 38 ++--
 gcc/config/rs6000/rs6000.h   |  4 
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..656735aebaf 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -64,7 +64,7 @@
 #include "ppc-auxv.h"
 #include "targhooks.h"
 #include "opts.h"
-
+#include "tree-dfa.h"
 #include "rs6000-internal.h"
 
 #ifndef TARGET_PROFILE_KERNEL
@@ -584,6 +584,32 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
   if (incoming || cum->prototype)
 cum->nargs_prototype = n_named_args;
 
+  /* When the buggy C/C++ wrappers call the function with fewer arguments
+ than it actually has. Check whether this function contains any unused
+ hidden parameters and record how many there are for use in
+ rs6000_function_arg() to determine whether its callers
+ have allocated a parameter save area or not. See PR100799 for
+ details.  */
+  unsigned int num_args = 0;
+  unsigned int hidden_length = 0;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+   arg; arg = DECL_CHAIN (arg))
+{
+  num_args++;
+  if (DECL_HIDDEN_STRING_LENGTH (arg))
+   {
+ tree parmdef = ssa_default_def (cfun, arg);
+ if (parmdef == NULL || has_zero_uses (parmdef))
+   {
+ cum->hidden_string_length = 1;
+ hidden_length++;
+   }
+   }
+   }
+
+  cum->actual_parm_length = num_args - hidden_length;
+
   /* Check for a longcall attribute.  */
   if ((!fntype && rs6000_default_long_calls)
   || (fntype
@@ -1857,7 +1883,15 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
function_arg_info )
 
  return rs6000_finish_function_arg (mode, rvec, k);
}
-  else if (align_words < GP_ARG_NUM_REG)
+ /* When the buggy C/C++ wrappers call the function with fewer arguments
+   than it actually has. Check whether this function contains any unused
+   hidden parameters and record how many there are for use in
+   rs6000_function_arg() to determine whether its callers
+   have allocated a parameter save area or not. See PR100799 for
+   details.  */
+  else if (align_words < GP_ARG_NUM_REG
+  || (cum->hidden_string_length
+  && cum->actual_parm_length <= GP_ARG_NUM_REG))
{
  if (TARGET_32BIT && TARGET_POWERPC64)
return rs6000_mixed_function_arg (mode, type, align_words);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 68bc45d65ba..a8f91301852 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1490,6 +1490,10 @@ 

[pushed] analyzer: fix ICE and false positive with -Wanalyzer-deref-before-check [PR114408]

2024-03-23 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9646-g80a0cb37456c49.

gcc/analyzer/ChangeLog:
PR analyzer/114408
* engine.cc (impl_run_checkers): Free up any dominance info that
we may have created.
* kf.cc (class kf_ubsan_handler): New.
(register_sanitizer_builtins): New.
(register_known_functions): Call register_sanitizer_builtins.

gcc/testsuite/ChangeLog:
PR analyzer/114408
* c-c++-common/analyzer/deref-before-check-pr114408.c: New test.
* c-c++-common/ubsan/analyzer-ice-pr114408.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc|  7 ++
 gcc/analyzer/kf.cc| 22 +++
 .../analyzer/deref-before-check-pr114408.c| 22 +++
 .../ubsan/analyzer-ice-pr114408.c |  9 
 4 files changed, 60 insertions(+)
 create mode 100644 
gcc/testsuite/c-c++-common/analyzer/deref-before-check-pr114408.c
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/analyzer-ice-pr114408.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index ad310b4d8731..e0dc0e66e88c 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -6251,6 +6251,13 @@ impl_run_checkers (logger *logger)
 eng.get_model_manager ()->dump_untracked_regions ();
 
   delete purge_map;
+
+  /* Free up any dominance info that we may have created.  */
+  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+{
+  function *fun = node->get_fun ();
+  free_dominance_info (fun, CDI_DOMINATORS);
+}
 }
 
 /* Handle -fdump-analyzer and -fdump-analyzer-stderr.  */
diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index d197ccbd0f05..6931f07bd75e 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -2198,6 +2198,27 @@ register_atomic_builtins (known_function_manager )
   make_unique (BIT_IOR_EXPR));
 }
 
+/* Handle calls to the various __builtin___ubsan_handle_*.
+   These can return, but continuing after such a return
+   isn't likely to be interesting to the user of the analyzer.
+   Hence we terminate the analysis path at one of these calls.  */
+
+class kf_ubsan_handler : public internal_known_function
+{
+  void impl_call_post (const call_details ) const final override
+  {
+if (cd.get_ctxt ())
+  cd.get_ctxt ()->terminate_path ();
+  }
+};
+
+static void
+register_sanitizer_builtins (known_function_manager )
+{
+  kfm.add (BUILT_IN_UBSAN_HANDLE_NONNULL_ARG,
+  make_unique ());
+}
+
 /* Populate KFM with instances of known functions supported by the core of the
analyzer (as opposed to plugins).  */
 
@@ -2224,6 +2245,7 @@ register_known_functions (known_function_manager ,
 kfm.add (BUILT_IN_STACK_SAVE, make_unique ());
 
 register_atomic_builtins (kfm);
+register_sanitizer_builtins (kfm);
 register_varargs_builtins (kfm);
   }
 
diff --git a/gcc/testsuite/c-c++-common/analyzer/deref-before-check-pr114408.c 
b/gcc/testsuite/c-c++-common/analyzer/deref-before-check-pr114408.c
new file mode 100644
index ..d55720271d0f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/deref-before-check-pr114408.c
@@ -0,0 +1,22 @@
+extern void unknown_returns (const char *p);
+extern void unknown_noreturn (const char *p) __attribute__((__noreturn__));
+
+void test_1 (const char *p)
+{
+  if (p)
+unknown_returns (p);
+  __builtin_strcmp ("a", p); /* { dg-message "pointer 'p' is dereferenced 
here" "" { target c } } */
+  if (p) /* { dg-warning "check of 'p' for NULL after already dereferencing 
it" "" { target c } } */
+unknown_returns (p);
+  __builtin_strcmp ("a", p);  
+}
+
+void test_2 (const char *p)
+{
+  if (p)
+unknown_noreturn (p);
+  __builtin_strcmp ("a", p);
+  if (p) /* { dg-bogus "check of 'p' for NULL after already dereferencing it" 
} */
+unknown_noreturn (p);
+  __builtin_strcmp ("a", p);  
+}
diff --git a/gcc/testsuite/c-c++-common/ubsan/analyzer-ice-pr114408.c 
b/gcc/testsuite/c-c++-common/ubsan/analyzer-ice-pr114408.c
new file mode 100644
index ..55f918726eed
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/analyzer-ice-pr114408.c
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer -fsanitize=undefined" } */
+
+int main(){}
+
+int HMAP_unset_copy(const char *key) {
+return __builtin_strcmp("a", key) + __builtin_strcmp("a", key);
+}
-- 
2.26.3



[committed] hppa: Fix LO_SUM DLTIND14R address support in PRINT_OPERAND_ADDRESS

2024-03-23 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk.

Dave
---

hppa: Fix LO_SUM DLTIND14R address support in PRINT_OPERAND_ADDRESS 

This bug was hidden since LO_SUM DLTIND14R addresses are normally
handled by the A constraint in the move patterns.

2024-03-23  John David Anglin  

gcc/ChangeLog:

* config/pa/pa.cc (pa_output_global_address): Handle
UNSPEC_DLTIND14R addresses.
* config/pa/pa.h (PRINT_OPERAND_ADDRESS): Output "RT'" for
UNSPEC_DLTIND14R address.

diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc
index d7666103de8..f9b1906efb4 100644
--- a/gcc/config/pa/pa.cc
+++ b/gcc/config/pa/pa.cc
@@ -5784,7 +5784,12 @@ pa_output_global_address (FILE *file, rtx x, int 
round_constant)
   if (GET_CODE (x) == HIGH)
 x = XEXP (x, 0);
 
-  if (GET_CODE (x) == SYMBOL_REF && read_only_operand (x, VOIDmode))
+  if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_DLTIND14R)
+{
+  x = XVECEXP (x, 0, 0);
+  output_addr_const (file, x);
+}
+  else if (GET_CODE (x) == SYMBOL_REF && read_only_operand (x, VOIDmode))
 output_addr_const (file, x);
   else if (GET_CODE (x) == SYMBOL_REF && !flag_pic)
 {
diff --git a/gcc/config/pa/pa.h b/gcc/config/pa/pa.h
index 403f16c5cb5..127a0d1966d 100644
--- a/gcc/config/pa/pa.h
+++ b/gcc/config/pa/pa.h
@@ -1247,12 +1247,15 @@ do {
 \
   reg_names [REGNO (XEXP (addr, 0))]); \
   break;   \
 case LO_SUM:   \
-  if (!symbolic_operand (XEXP (addr, 1), VOIDmode))
\
+  if (GET_CODE (XEXP (addr, 1)) == UNSPEC  \
+ && XINT (XEXP (addr, 1), 1) == UNSPEC_DLTIND14R)  \
+   fputs ("RT'", FILE);\
+  else if (!symbolic_operand (XEXP (addr, 1), VOIDmode))   \
fputs ("R'", FILE); \
   else if (flag_pic == 0)  \
fputs ("RR'", FILE);\
   else \
-   fputs ("RT'", FILE);\
+   gcc_unreachable (); \
   pa_output_global_address (FILE, XEXP (addr, 1), 0);  \
   fputs ("(", FILE);   \
   output_operand (XEXP (addr, 0), 0);  \


signature.asc
Description: PGP signature


Re: No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-23 Thread Дилян Палаузов
Hello,

Can the build experts say what needs to be changed?  The dependencies I added 
are missing in the build configuration (@if gcc-bootstrap).

I cannot say if libbacktrace should or should not be a bootstrap=true module.

If there are no better patches, I kindly ask to integrate this patch with a 
comment, indicating the quality of the patch.  The change proposed by me does 
fix PR106472.

Kind regards
  Дилян

-Original Message-
From: Jakub Jelinek 
Reply-To: Jakub Jelinek 
To: Дилян Палаузов , Paolo Bonzini 
, Nathanael Nerode , Alexandre Oliva 
, Ralf Wildenhues , Ian Lance 
Taylor 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: No rule to make target '../libbacktrace/libbacktrace.la', needed 
by 'libgo.la'. [PR106472]
Date: 03/13/2024 10:13:37 AM

On Wed, Mar 13, 2024 at 07:37:26AM +0100, Дилян Палаузов wrote:
> Non-parallel build can fail, depending on the ./configure parameters -
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106472 .
> 
> The change below does fix the problem.

CCing build system maintainers and the Go maintainer.

While the first Makefile.tpl hunk looks obviously ok, the others look
completely wrong to me.
There is nothing special about libgo vs. libbacktrace/libatomic
compared to any other target library which is not bootstrapped vs. any
of its dependencies which are in the bootstrapped set.
So, Makefile.tpl shouldn't hardcode such dependencies.

The
all-target-libgo: maybe-all-target-libbacktrace
all-target-libgo: maybe-all-target-libatomic
dependencies which are in Makefile.in already are I believe intentionally
guarded with
@unless gcc-bootstrap
because when bootstrapping, there are the
all-target-libgo: stage_current
configure-target-libgo: stage_last
dependencies instead plus there is always
all-target-libgo: configure-target-libgo
and stage_last should I believe ensure that everything is bootstrapped,
gcc as well as the bootstrapped target libraries like libbacktrace or
libatomic.
Now, if those are built only sometimes depending on configured languages
- I see
grep 'lib\(backtrace\|atomic\)' gcc/*/config-lang.in 
gcc/ada/gcc-interface/config-lang.in 
gcc/d/config-lang.in:phobos_target_deps="target-zlib target-libbacktrace"
gcc/fortran/config-lang.in:target_libs="target-libgfortran target-libbacktrace"
gcc/go/config-lang.in:target_libs="target-libgo target-libffi 
target-libbacktrace"
then perhaps Makefile.def should know that it is not a bootstrap=true module
target_modules = { module= libbacktrace; bootstrap=true; };
unconditionally and arrange for the dependencies between non-bootstrap
target modules and these maybe ones to be emitted even if gcc-bootstrap.

> I do not understand the build system to say, that this is the best approach,
> so if there are questions I might or might not be able to answer them.
> 
> I tried different things, this worked on the releases/gcc-13 branch.  On the
> master branch last weekend the problem was that stage2 and stage3 results
> are not equal, so I have not verified this change there.  depend= in
> Makefile.def seem to have only effect if bootstrapping is involved and
> gcc/go/config-lang.in does not have boot_language=yes .  The lines below are
> present in the Makefile.in:@unless gcc-bootstrap snippet.  Actually I think
> ./configure --enable-languages=all and then serial build work, because this
> implied D and it does imply bootstrapping for libbacktrace and libatomic.  I
> also do not want to invest much more time on this.
> 
> I do not know, if 2×`maybe-`  is necessary.
> 
> 
> diff --git a/Makefile.in b/Makefile.in
> index 06a9398e172..236e5cda942 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -66481,6 +66481,7 @@ configure-target-libgfortran:
> maybe-all-target-libquadmath
> 
> 
>  @if gcc-bootstrap
> +all-target-libgo: maybe-all-target-libbacktrace maybe-all-target-libatomic
>  configure-gnattools: stage_last
>  configure-libcc1: stage_last
>  configure-c++tools: stage_last
> diff --git a/Makefile.tpl b/Makefile.tpl
> index dfbd74b68f8..98160c7626b 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -1952,7 +1952,7 @@ configure-target-[+module+]: maybe-all-gcc[+
>     (define dep-maybe (lambda ()
>    (if (exist? "hard") "" "maybe-")))
> 
> -   ;; dep-kind returns returns "prebootstrap" for configure or build
> +   ;; dep-kind returns "prebootstrap" for configure or build
>     ;; dependencies of bootstrapped modules on a build module
>     ;; (e.g. all-gcc on all-build-bison); "normal" if the dependency is
>     ;; on an "install" target, or if the dependence module is not
> @@ -2017,6 +2017,7 @@ configure-target-[+module+]: maybe-all-gcc[+
>  [+ ESAC +][+ ENDFOR dependencies +]
> 
>  @if gcc-bootstrap
> +all-target-libgo: maybe-all-target-libbacktrace maybe-all-target-libatomic
>  [+ FOR dependencies +][+ CASE (dep-kind) +]
>  [+ == "postbootstrap" +][+ (make-postboot-dep) +][+ ESAC +][+
>  ENDFOR dependencies +]@endif gcc-bootstrap
> 

Jakub




[committed] libstdc++: Disable std::formatter specializations (LWG 3944)

2024-03-23 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

-- >8 --

This was just approved in Tokyo as a DR for C++23. It doesn't affect us
yet, because we don't implement the __cpp_lib_format_ranges features. We
can add the disabled specializations and add a testcase now though.

libstdc++-v3/ChangeLog:

* include/std/format (formatter): Disable specializations that
would allow sequences of narrow characters to be formatted as
wchar_t without conversion, as per LWG 3944.
* testsuite/std/format/formatter/lwg3944.cc: New test.
---
 libstdc++-v3/include/std/format   | 23 ++
 .../testsuite/std/format/formatter/lwg3944.cc | 31 +++
 2 files changed, 54 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/std/format/formatter/lwg3944.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 613016d1a10..22dcb5f24bd 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2478,6 +2478,29 @@ namespace __format
 };
   /// @}
 
+#if defined _GLIBCXX_USE_WCHAR_T && __cpp_lib_format_ranges
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3944. Formatters converting sequences of char to sequences of wchar_t
+
+  namespace __format { struct __disabled; }
+
+  // std::formatter<__disabled, C> uses the primary template, which is 
disabled.
+  template<>
+struct formatter
+: private formatter<__format::__disabled, wchar_t> { };
+  template<>
+struct formatter
+: private formatter<__format::__disabled, wchar_t> { };
+  template
+struct formatter
+: private formatter<__format::__disabled, wchar_t> { };
+  template
+struct formatter, wchar_t>
+: private formatter<__format::__disabled, wchar_t> { };
+  template
+struct formatter, wchar_t>
+: private formatter<__format::__disabled, wchar_t> { };
+#endif
 
 /// @cond undocumented
 namespace __format
diff --git a/libstdc++-v3/testsuite/std/format/formatter/lwg3944.cc 
b/libstdc++-v3/testsuite/std/format/formatter/lwg3944.cc
new file mode 100644
index 000..ff5f075bcc8
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/formatter/lwg3944.cc
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++20 } }
+// { dg-options "-Wno-unused-result" }
+
+// LWG 3944. Formatters converting sequences of char to sequences of wchar_t
+
+#include 
+
+void test_lwg3944()
+{
+  // Ill-formed in C++20 and C++23
+  const char* cstr = "hello";
+  char* str = const_cast(cstr);
+  std::format(L"{}", str); // { dg-error "here" }
+  std::format(L"{}",cstr); // { dg-error "here" }
+
+  // Ill-formed in C++20
+  // In C++23 they give L"['h', 'e', 'l', 'l', 'o']"
+  std::format(L"{}", "hello"); // { dg-error "here" }
+  std::format(L"{}", std::string_view("hello")); // { dg-error "here" }
+  std::format(L"{}", std::string("hello")); // { dg-error "here" }
+#ifdef __cpp_lib_format_ranges
+  // LWG 3944 does not change this, it's still valid.
+  std::format(L"{}", std::vector{'h', 'e', 'l', 'l', 'o'});
+#endif
+}
+
+// { dg-error "std::formatter must be specialized" "" { target *-*-* } 0 }
+// { dg-prune-output "use of deleted function" }
+// { dg-prune-output "no matching function" }
+// { dg-prune-output "has no member named 'parse'" }
+// { dg-prune-output "not a constant expression" }
-- 
2.44.0



[committed] libstdc++: Add __is_in_place_index_v helper and use it in

2024-03-23 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

-- >8 --

We already have __is_in_place_type_v for in_place_type_t so adding an
equivalent for in_place_index_t allows us avoid a class template
instantiation for the __not_in_place_tag constraint on the most
commonly-used std::variant::variant(T&&) constructor.

For in_place_type_t we also have a __is_in_place_type class template
defined in terms of the variable template, but that isn't actually used
anywhere. I'm not adding an equivalent for the new variable template,
because that wouldn't be used either.

For GCC 15 we should remove the unused __is_in_place_tag and
__is_in_place_type class templates.

libstdc++-v3/ChangeLog:

* include/bits/utility.h (__is_in_place_index_v): New variable
template.
* include/std/variant (__not_in_place_tag): Define in terms of
variable templates not a class template.
---
 libstdc++-v3/include/bits/utility.h | 6 ++
 libstdc++-v3/include/std/variant| 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/utility.h 
b/libstdc++-v3/include/bits/utility.h
index 2a741bf7000..9f3b99231b3 100644
--- a/libstdc++-v3/include/bits/utility.h
+++ b/libstdc++-v3/include/bits/utility.h
@@ -223,6 +223,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 using __is_in_place_type = bool_constant<__is_in_place_type_v<_Tp>>;
 
+  template
+inline constexpr bool __is_in_place_index_v = false;
+
+  template
+inline constexpr bool __is_in_place_index_v> = true;
+
 #endif // C++17
 
 #if _GLIBCXX_USE_BUILTIN_TRAIT(__type_pack_element)
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 4b9002e0917..f79d95db7a8 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1414,7 +1414,8 @@ namespace __variant
 
   template
static constexpr bool __not_in_place_tag
- = !__is_in_place_tag<__remove_cvref_t<_Tp>>::value;
+ = !__is_in_place_type_v<__remove_cvref_t<_Tp>>
+ && !__is_in_place_index_v<__remove_cvref_t<_Tp>>;
 
 public:
 #if __cpp_concepts
-- 
2.44.0



[committed] libstdc++: Use std::type_identity_t in as per LWG 3950 [PR114400]

2024-03-23 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

-- >8 --

The difference between __type_identity_t and std::type_identity_t is
observable, as demonstrated in the PR. Nobody in LWG seems to think this
an example we should really care about, but it seems easy and harmless
to change this.

libstdc++-v3/ChangeLog:

PR libstdc++/114400
* include/std/string_view (operator==): Use std::type_identity_t
in C++20 instead of our own __type_identity_t.
---
 libstdc++-v3/include/std/string_view | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index e30a9c1768e..a7c5a126461 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -602,15 +602,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // deduction and the other argument gets implicitly converted to the deduced
   // type (see N3766).
 
+#if __cpp_lib_three_way_comparison
   template
 [[nodiscard]]
 constexpr bool
 operator==(basic_string_view<_CharT, _Traits> __x,
-   __type_identity_t> __y)
+  type_identity_t> __y)
 noexcept
 { return __x.size() == __y.size() && __x.compare(__y) == 0; }
 
-#if __cpp_lib_three_way_comparison
   template
 [[nodiscard]]
 constexpr auto
@@ -620,6 +620,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 -> decltype(__detail::__char_traits_cmp_cat<_Traits>(0))
 { return __detail::__char_traits_cmp_cat<_Traits>(__x.compare(__y)); }
 #else
+  template
+[[nodiscard]]
+constexpr bool
+operator==(basic_string_view<_CharT, _Traits> __x,
+  __type_identity_t> __y)
+noexcept
+{ return __x.size() == __y.size() && __x.compare(__y) == 0; }
+
   template
 [[nodiscard]]
 constexpr bool
-- 
2.44.0



Re: [PATCH] bitint: Fix bitfield loads in handle_cast [PR114433]

2024-03-23 Thread Richard Biener



> Am 23.03.2024 um 08:59 schrieb Jakub Jelinek :
> 
> Hi!
> 
> We ICE on the following testcase, because handle_cast was incorrectly
> testing !m_first to see whether it should use m_data[m_bitfld_load + 1]
> or fresh SSA_NAME for a PHI result.
> Now, m_first is in the routine sometimes temporarily cleared in between
> doing prepare_data_in_out and the !m_first check and only before returning
> restored from the save_first copy.
> Without this patch, we try to use the same SSA_NAME (_12 here) in 2
> different PHI results which is obviously invalid IL and ICEs very quickly.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-03-23  Jakub Jelinek  
> 
>PR tree-optimization/114433
>* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): For
>m_bitfld_load check save_first rather than m_first.
> 
>* gcc.dg/torture/bitint-68.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj2024-03-22 14:54:41.767972684 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-22 17:13:43.702249666 +0100
> @@ -1506,7 +1506,7 @@ bitint_large_huge::handle_cast (tree lhs
>  if (m_bitfld_load)
>{
>  tree t4;
> -  if (!m_first)
> +  if (!save_first)
>t4 = m_data[m_bitfld_load + 1];
>  else
>t4 = make_ssa_name (m_limb_type);
> --- gcc/testsuite/gcc.dg/torture/bitint-68.c.jj2024-03-22 
> 17:26:29.841713520 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-68.c2024-03-22 17:26:51.941409603 
> +0100
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/114433 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 511
> +struct S { int : 31; _BitInt(511) b : 300; } s;
> +
> +__attribute__((noipa)) _BitInt(511)
> +foo (void)
> +{
> +  return s.b << 1;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 511
> +  s.b = 
> 642460398785925402356009598661384732715767737595497615767135001949421105426024498988100867wb;
> +  if (foo () != ((_BitInt(511)) 
> 642460398785925402356009598661384732715767737595497615767135001949421105426024498988100867wb)
>  << 1)
> +__builtin_abort ();
> +  s.b = 
> 2655156766298562299560755420298083843774074962786295887660222690220887wb;
> +  if (foo () != ((_BitInt(511)) 
> 2655156766298562299560755420298083843774074962786295887660222690220887wb) << 
> 1)
> +__builtin_abort ();
> +#endif
> +}
> 
>Jakub
> 


Re: [PATCH] bitint: Handle complex types in build_bitint_stmt_ssa_conflicts [PR114425]

2024-03-23 Thread Richard Biener



> Am 23.03.2024 um 08:56 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The task of the build_bitint_stmt_ssa_conflicts hook for
> tree-ssa-coalesce.cc next to special casing the
> multiplication/division/modulo is to ignore statements with
> large/huge _BitInt lhs which isn't in names bitmap and on the
> other side pretend all uses of the stmt are used in a later stmt
> (single user of that SSA_NAME or perhaps single user of lhs of
> the single user etc.) where the lowering will actually emit the
> code.
> 
> Unfortunately the function wasn't handling COMPLEX_TYPE of the large/huge
> BITINT_TYPE, while the FE doesn't really support such types, they are
> used under the hood for __builtin_{add,sub,mul}_overflow{,_p}, they are
> also present or absent from the names bitmap and should be treated the same.
> 
> Without this patch, the operands of .ADD_OVERFLOW were incorrectly pretended
> to be used right in that call statement rather than on the cast stmt from
> IMAGPART_EXPR of .ADD_OVERFLOW return value to some integral type.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-03-23  Jakub Jelinek  
> 
>PR tree-optimization/114425
>* gimple-lower-bitint.cc (build_bitint_stmt_ssa_conflicts): Handle
>_Complex large/huge _BitInt types like the large/huge _BitInt types.
> 
>* gcc.dg/torture/bitint-67.c: New test.
>
> --- gcc/gimple-lower-bitint.cc.jj2024-03-22 09:21:28.350087044 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-22 14:54:41.767972684 +0100
> @@ -5902,20 +5902,25 @@ build_bitint_stmt_ssa_conflicts (gimple
>   if (is_gimple_assign (stmt))
> {
>   lhs = gimple_assign_lhs (stmt);
> -  if (TREE_CODE (lhs) == SSA_NAME
> -  && TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
> -  && bitint_precision_kind (TREE_TYPE (lhs)) >= bitint_prec_large)
> +  if (TREE_CODE (lhs) == SSA_NAME)
>{
> -  if (!bitmap_bit_p (names, SSA_NAME_VERSION (lhs)))
> -return;
> -  switch (gimple_assign_rhs_code (stmt))
> +  tree type = TREE_TYPE (lhs);
> +  if (TREE_CODE (type) == COMPLEX_TYPE)
> +type = TREE_TYPE (type);
> +  if (TREE_CODE (type) == BITINT_TYPE
> +  && bitint_precision_kind (type) >= bitint_prec_large)
>{
> -case MULT_EXPR:
> -case TRUNC_DIV_EXPR:
> -case TRUNC_MOD_EXPR:
> -  muldiv_p = true;
> -default:
> -  break;
> +  if (!bitmap_bit_p (names, SSA_NAME_VERSION (lhs)))
> +return;
> +  switch (gimple_assign_rhs_code (stmt))
> +{
> +case MULT_EXPR:
> +case TRUNC_DIV_EXPR:
> +case TRUNC_MOD_EXPR:
> +  muldiv_p = true;
> +default:
> +  break;
> +}
>}
>}
> }
> @@ -5947,27 +5952,37 @@ build_bitint_stmt_ssa_conflicts (gimple
> 
>   auto_vec worklist;
>   FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
> -if (TREE_CODE (TREE_TYPE (var)) == BITINT_TYPE
> -&& bitint_precision_kind (TREE_TYPE (var)) >= bitint_prec_large)
> -  {
> -if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
> -  use (live, var);
> -else
> -  worklist.safe_push (var);
> -  }
> +{
> +  tree type = TREE_TYPE (var);
> +  if (TREE_CODE (type) == COMPLEX_TYPE)
> +type = TREE_TYPE (type);
> +  if (TREE_CODE (type) == BITINT_TYPE
> +  && bitint_precision_kind (type) >= bitint_prec_large)
> +{
> +  if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
> +use (live, var);
> +  else
> +worklist.safe_push (var);
> +}
> +}
> 
>   while (worklist.length () > 0)
> {
>   tree s = worklist.pop ();
>   FOR_EACH_SSA_TREE_OPERAND (var, SSA_NAME_DEF_STMT (s), iter, SSA_OP_USE)
> -if (TREE_CODE (TREE_TYPE (var)) == BITINT_TYPE
> -&& bitint_precision_kind (TREE_TYPE (var)) >= bitint_prec_large)
> -  {
> -if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
> -  use (live, var);
> -else
> -  worklist.safe_push (var);
> -  }
> +{
> +  tree type = TREE_TYPE (var);
> +  if (TREE_CODE (type) == COMPLEX_TYPE)
> +type = TREE_TYPE (type);
> +  if (TREE_CODE (type) == BITINT_TYPE
> +  && bitint_precision_kind (type) >= bitint_prec_large)
> +{
> +  if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
> +use (live, var);
> +  else
> +worklist.safe_push (var);
> +}
> +}
> }
> 
>   if (muldiv_p)
> --- gcc/testsuite/gcc.dg/torture/bitint-67.c.jj2024-03-22 
> 14:53:52.929642342 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-67.c2024-03-22 14:54:29.205144953 
> +0100
> @@ -0,0 +1,29 @@
> +/* PR tree-optimization/114425 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if 

Re: [PATCH] predcom: Punt for steps which aren't multiples of access size [PR111683]

2024-03-23 Thread Richard Biener



> Am 23.03.2024 um 08:49 schrieb Jakub Jelinek :
> 
> Hi!
> 
> On the following testcases, there is no overlap between data references
> within a single iteration, but the data references have size which is twice
> as large as the step, which means the data references overlap with the next
> iteration which predcom doesn't take into account.
> As discussed in the PR, even if the reference size is smaller than step,
> if step isn't a multiple of the reference size, there could be overlaps with
> some other iteration later on.
> The initial version of the patch regressed (test still passed, but predcom
> didn't optimize anymore) pr71083.c which has a packed char, short structure
> and was reading/writing the short 2 bytes in there with step 3.
> The following patch deals with that by retrying for COMPONENT_REFs also the
> aggregate sizes etc., so that it then compares 3 bytes against step 3.
> In make check-gcc/check-g++ this patch I believe affects code generation
> for only the 2 new testcases according to statistics I've gathered.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Thanks,
Richard 

> 2024-03-23  Jakub Jelinek  
> 
>PR middle-end/111683
>* tree-predcom.cc (pcom_worker::suitable_component_p): If has_write
>and comp_step is RS_NONZERO, return false if any reference in the
>component doesn't have DR_STEP a multiple of access size.
> 
>* gcc.dg/pr111683-1.c: New test.
>* gcc.dg/pr111683-2.c: New test.
> 
> --- gcc/tree-predcom.cc.jj2024-03-22 09:19:27.700756950 +0100
> +++ gcc/tree-predcom.cc2024-03-22 14:01:21.758978338 +0100
> @@ -1102,8 +1102,39 @@ pcom_worker::suitable_component_p (struc
>   gcc_assert (ok);
>   first->offset = 0;
> 
> -  for (i = 1; comp->refs.iterate (i, ); i++)
> +  FOR_EACH_VEC_ELT (comp->refs, i, a)
> {
> +  if (has_write && comp->comp_step == RS_NONZERO)
> +{
> +  /* Punt for non-invariant references where step isn't a multiple
> + of reference size.  If step is smaller than reference size,
> + it overlaps the access in next iteration, if step is larger,
> + but not multiple of the access size, there could be overlap
> + in some later iteration.  There might be more latent issues
> + about this in predcom or data reference analysis.  If the
> + reference is a COMPONENT_REF, also check if step isn't a
> + multiple of the containg aggregate size.  See PR111683.  */
> +  tree ref = DR_REF (a->ref);
> +  tree step = DR_STEP (a->ref);
> +  if (TREE_CODE (ref) == COMPONENT_REF
> +  && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
> +ref = TREE_OPERAND (ref, 0);
> +  do
> +{
> +  tree sz = TYPE_SIZE_UNIT (TREE_TYPE (ref));
> +  if (TREE_CODE (sz) != INTEGER_CST)
> +return false;
> +  if (wi::multiple_of_p (wi::to_offset (step),
> + wi::to_offset (sz), SIGNED))
> +break;
> +  if (TREE_CODE (ref) != COMPONENT_REF)
> +return false;
> +  ref = TREE_OPERAND (ref, 0);
> +}
> +  while (1);
> +}
> +  if (i == 0)
> +continue;
>   /* Polynomial offsets are no use, since we need to know the
> gap between iteration numbers at compile time.  */
>   poly_widest_int offset;
> --- gcc/testsuite/gcc.dg/pr111683-1.c.jj2024-03-22 11:14:29.292908760 
> +0100
> +++ gcc/testsuite/gcc.dg/pr111683-1.c2024-03-22 11:14:29.292908760 +0100
> @@ -0,0 +1,22 @@
> +/* PR middle-end/111683 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +long long b[6] = { 3, 4, 5, 6, 7, 8 }, c[16];
> +long long d[9] = { 3, 7, 12, 18, 22, 26, 21, 15, 8 };
> +typedef long long U __attribute__ ((vector_size(16), may_alias, aligned(1)));
> +typedef long long V __attribute__ ((vector_size(16), may_alias));
> +
> +int
> +main ()
> +{
> +  for (int f = 0; f < 6; f++)
> +{
> +  *(U *) [f] = *(U *) [f] + (V) { b[f], b[f] };
> +  *(U *) [f + 2] = *(U *) [f + 2] + (V) { b[f], b[f] };
> +}
> +  for (int f = 0; f < 9; f++)
> +if (c[f] != d[f])
> +  __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/pr111683-2.c.jj2024-03-22 11:14:29.292908760 
> +0100
> +++ gcc/testsuite/gcc.dg/pr111683-2.c2024-03-22 11:14:29.292908760 +0100
> @@ -0,0 +1,27 @@
> +/* PR middle-end/111683 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +int b[6] = { 3, 4, 5, 6, 7, 8 }, c[12];
> +int d[16] = { 0, 1, 3, 6, 10, 14, 12, 9, 5, 0, 0, 0 };
> +
> +int
> +main ()
> +{
> +  int i;
> +  if (sizeof (int) * 2 != sizeof (long long))
> +return 0;
> +  for (i = 0; i < 6; i++)
> +{
> +  long long a;
> +  __builtin_memcpy (, [i], sizeof (a));
> +  a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
> +  __builtin_memcpy ([i], , sizeof (a));
> +  __builtin_memcpy (, [i + 2], sizeof (a));
> +  a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
> +  

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
Hello Peter:

On 23/03/24 10:07 am, Peter Bergner wrote:
> On 3/22/24 5:15 AM, Ajit Agarwal wrote:
>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>> the parameters passed to OpenBLAS functions. FlexiBLAS
>> basically provides a BLAS interface where each function
>> is a stub that forwards the arguments to a real BLAS lib,
>> like OpenBLAS.
>>
>> Fixes the corruption of caller frame checking number of
>> arguments is less than equal to GP_ARG_NUM_REG (8)
>> excluding hidden unused DECLS.
> 
> I think the git log entry commentary could be a little more descriptive
> of what the problem is. How about something like the following?
> 
>   When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
>   stack frame when calling OpenBLAS functions.  This was caused by the
>   FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
>   number of function parameters in the callee due to hidden Fortran
>   parameters. This can cause problems when the callee believes the caller
>   has allocated a parameter save area when the caller has not done so.
>   That means any writes by the callee into the non-existent parameter save
>   area will corrupt the caller stack frame.
> 
>   The workaround implemented here, is for the callee to determine whether
>   the caller has allocated a parameter save area or not, by ignoring any
>   unused hidden parameters when counting the number of parameters.
> 
> 
I will address this change in the new version of the patch.
> 
>>  PR rtk-optimization/100799
> 
> s/rtk/rtl/
> 
> 
> 
I will address this in new version of the patch.
>>  * config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
>>  generate parameter save area if number of arguments passed
>>  less than equal to GP_ARG_NUM_REG (8) excluding hidden
>>  parameter.
> 
> The callee doesn't generate or allocate the parameter save area, the
> caller does.  The code here is for the callee trying to determine
> whether the caller has done so.  How about saying the following instead?
> 
>   Don't assume a parameter save area has been allocated if the number of
>   formal parameters, excluding unused hidden parameters, is less than or
>   equal to GP_ARG_NUM_REG (8).
> 
> 

I will incorporate this change in new version of the patch.

> 
> 
>>  (init_cumulative_args): Check for hidden parameter in fortran
>>  routine and set the flag hidden_string_length and actual
>>  parameter passed excluding hidden unused DECLS.
> 
> Check for unused hidden Fortran parameters and set hidden_string_length
> and actual_parm_length.
> 
>

I will address this change in new version of the patch.

 
>> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
>> + than it actually has and doesn't expect the parameter save area on the
>> + caller side because of that while the callee expects it and the callee
>> + actually stores something in the parameter save area, it corrupts
>> + whatever is in the caller stack frame at that location.  */
> 
> The wrapper/caller is the one that allocates the parameter save area, so
> saying "...doesn't expect the parameter save area on the caller side..."
> doesn't make sense, since it knows whether it allocated it or not.
> How about saying something like the following instead?
> 
>   Check whether this function contains any unused hidden parameters and
>   record how many there are for use in rs6000_function_arg() to determine
>   whether its callers have allocated a parameter save area or not.
>   See PR100799 for details.
> 
> 

I will incorporate this change in new version of the patch.

> 
>> +  unsigned int num_args = 0;
>> +  unsigned int hidden_length = 0;
>> +
>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>> +   arg; arg = DECL_CHAIN (arg))
>> +{
>> +  num_args++;
>> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
>> +{
>> +  tree parmdef = ssa_default_def (cfun, arg);
>> +  if (parmdef == NULL || has_zero_uses (parmdef))
>> +{
>> +  cum->hidden_string_length = 1;
>> +  hidden_length++;
>> +}
>> +}
>> +   }
>> +
>> +  cum->actual_parm_length = num_args - hidden_length;
> 
> This code looks fine, but do we really need two new fields in rs6000_args?
> Can't we just get along with only cum->actual_parm_length by modifying
> the rs6000_function_arg() change from:
> 
>> +  else if (align_words < GP_ARG_NUM_REG
>> +   || (cum->hidden_string_length
>> +   && cum->actual_parm_length <= GP_ARG_NUM_REG))
> 
> to:
> 
> +  else if (align_words < GP_ARG_NUM_REG
> +|| cum->actual_parm_length <= GP_ARG_NUM_REG)
> 
> ???
>

Yes we can do that. I will address this in new version of the patch.

 
> That said, I have a further comment below on what happens here when 
> align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG.
> 
>

If we exceed the align_words >= GP_ARG_NUM_REG then there could 

[PATCH] bitint: Fix bitfield loads in handle_cast [PR114433]

2024-03-23 Thread Jakub Jelinek
Hi!

We ICE on the following testcase, because handle_cast was incorrectly
testing !m_first to see whether it should use m_data[m_bitfld_load + 1]
or fresh SSA_NAME for a PHI result.
Now, m_first is in the routine sometimes temporarily cleared in between
doing prepare_data_in_out and the !m_first check and only before returning
restored from the save_first copy.
Without this patch, we try to use the same SSA_NAME (_12 here) in 2
different PHI results which is obviously invalid IL and ICEs very quickly.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-23  Jakub Jelinek  

PR tree-optimization/114433
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): For
m_bitfld_load check save_first rather than m_first.

* gcc.dg/torture/bitint-68.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-22 14:54:41.767972684 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-22 17:13:43.702249666 +0100
@@ -1506,7 +1506,7 @@ bitint_large_huge::handle_cast (tree lhs
  if (m_bitfld_load)
{
  tree t4;
- if (!m_first)
+ if (!save_first)
t4 = m_data[m_bitfld_load + 1];
  else
t4 = make_ssa_name (m_limb_type);
--- gcc/testsuite/gcc.dg/torture/bitint-68.c.jj 2024-03-22 17:26:29.841713520 
+0100
+++ gcc/testsuite/gcc.dg/torture/bitint-68.c2024-03-22 17:26:51.941409603 
+0100
@@ -0,0 +1,28 @@
+/* PR tree-optimization/114433 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 511
+struct S { int : 31; _BitInt(511) b : 300; } s;
+
+__attribute__((noipa)) _BitInt(511)
+foo (void) 
+{
+  return s.b << 1;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 511
+  s.b = 
642460398785925402356009598661384732715767737595497615767135001949421105426024498988100867wb;
+  if (foo () != ((_BitInt(511)) 
642460398785925402356009598661384732715767737595497615767135001949421105426024498988100867wb)
 << 1)
+__builtin_abort ();
+  s.b = 
2655156766298562299560755420298083843774074962786295887660222690220887wb;
+  if (foo () != ((_BitInt(511)) 
2655156766298562299560755420298083843774074962786295887660222690220887wb) << 1)
+__builtin_abort ();
+#endif
+}

Jakub



[PATCH] bitint: Handle complex types in build_bitint_stmt_ssa_conflicts [PR114425]

2024-03-23 Thread Jakub Jelinek
Hi!

The task of the build_bitint_stmt_ssa_conflicts hook for
tree-ssa-coalesce.cc next to special casing the
multiplication/division/modulo is to ignore statements with
large/huge _BitInt lhs which isn't in names bitmap and on the
other side pretend all uses of the stmt are used in a later stmt
(single user of that SSA_NAME or perhaps single user of lhs of
the single user etc.) where the lowering will actually emit the
code.

Unfortunately the function wasn't handling COMPLEX_TYPE of the large/huge
BITINT_TYPE, while the FE doesn't really support such types, they are
used under the hood for __builtin_{add,sub,mul}_overflow{,_p}, they are
also present or absent from the names bitmap and should be treated the same.

Without this patch, the operands of .ADD_OVERFLOW were incorrectly pretended
to be used right in that call statement rather than on the cast stmt from
IMAGPART_EXPR of .ADD_OVERFLOW return value to some integral type.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-23  Jakub Jelinek  

PR tree-optimization/114425
* gimple-lower-bitint.cc (build_bitint_stmt_ssa_conflicts): Handle
_Complex large/huge _BitInt types like the large/huge _BitInt types.

* gcc.dg/torture/bitint-67.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-22 09:21:28.350087044 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-22 14:54:41.767972684 +0100
@@ -5902,20 +5902,25 @@ build_bitint_stmt_ssa_conflicts (gimple
   if (is_gimple_assign (stmt))
 {
   lhs = gimple_assign_lhs (stmt);
-  if (TREE_CODE (lhs) == SSA_NAME
- && TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
- && bitint_precision_kind (TREE_TYPE (lhs)) >= bitint_prec_large)
+  if (TREE_CODE (lhs) == SSA_NAME)
{
- if (!bitmap_bit_p (names, SSA_NAME_VERSION (lhs)))
-   return;
- switch (gimple_assign_rhs_code (stmt))
+ tree type = TREE_TYPE (lhs);
+ if (TREE_CODE (type) == COMPLEX_TYPE)
+   type = TREE_TYPE (type);
+ if (TREE_CODE (type) == BITINT_TYPE
+ && bitint_precision_kind (type) >= bitint_prec_large)
{
-   case MULT_EXPR:
-   case TRUNC_DIV_EXPR:
-   case TRUNC_MOD_EXPR:
- muldiv_p = true;
-   default:
- break;
+ if (!bitmap_bit_p (names, SSA_NAME_VERSION (lhs)))
+   return;
+ switch (gimple_assign_rhs_code (stmt))
+   {
+   case MULT_EXPR:
+   case TRUNC_DIV_EXPR:
+   case TRUNC_MOD_EXPR:
+ muldiv_p = true;
+   default:
+ break;
+   }
}
}
 }
@@ -5947,27 +5952,37 @@ build_bitint_stmt_ssa_conflicts (gimple
 
   auto_vec worklist;
   FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
-if (TREE_CODE (TREE_TYPE (var)) == BITINT_TYPE
-   && bitint_precision_kind (TREE_TYPE (var)) >= bitint_prec_large)
-  {
-   if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
- use (live, var);
-   else
- worklist.safe_push (var);
-  }
+{
+  tree type = TREE_TYPE (var);
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+   type = TREE_TYPE (type);
+  if (TREE_CODE (type) == BITINT_TYPE
+ && bitint_precision_kind (type) >= bitint_prec_large)
+   {
+ if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
+   use (live, var);
+ else
+   worklist.safe_push (var);
+   }
+}
 
   while (worklist.length () > 0)
 {
   tree s = worklist.pop ();
   FOR_EACH_SSA_TREE_OPERAND (var, SSA_NAME_DEF_STMT (s), iter, SSA_OP_USE)
-   if (TREE_CODE (TREE_TYPE (var)) == BITINT_TYPE
-   && bitint_precision_kind (TREE_TYPE (var)) >= bitint_prec_large)
- {
-   if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
- use (live, var);
-   else
- worklist.safe_push (var);
- }
+   {
+ tree type = TREE_TYPE (var);
+ if (TREE_CODE (type) == COMPLEX_TYPE)
+   type = TREE_TYPE (type);
+ if (TREE_CODE (type) == BITINT_TYPE
+ && bitint_precision_kind (type) >= bitint_prec_large)
+   {
+ if (bitmap_bit_p (names, SSA_NAME_VERSION (var)))
+   use (live, var);
+ else
+   worklist.safe_push (var);
+   }
+   }
 }
 
   if (muldiv_p)
--- gcc/testsuite/gcc.dg/torture/bitint-67.c.jj 2024-03-22 14:53:52.929642342 
+0100
+++ gcc/testsuite/gcc.dg/torture/bitint-67.c2024-03-22 14:54:29.205144953 
+0100
@@ -0,0 +1,29 @@
+/* PR tree-optimization/114425 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 2000
+_BitInt(8) a;

[PATCH] predcom: Punt for steps which aren't multiples of access size [PR111683]

2024-03-23 Thread Jakub Jelinek
Hi!

On the following testcases, there is no overlap between data references
within a single iteration, but the data references have size which is twice
as large as the step, which means the data references overlap with the next
iteration which predcom doesn't take into account.
As discussed in the PR, even if the reference size is smaller than step,
if step isn't a multiple of the reference size, there could be overlaps with
some other iteration later on.
The initial version of the patch regressed (test still passed, but predcom
didn't optimize anymore) pr71083.c which has a packed char, short structure
and was reading/writing the short 2 bytes in there with step 3.
The following patch deals with that by retrying for COMPONENT_REFs also the
aggregate sizes etc., so that it then compares 3 bytes against step 3.
In make check-gcc/check-g++ this patch I believe affects code generation
for only the 2 new testcases according to statistics I've gathered.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-23  Jakub Jelinek  

PR middle-end/111683
* tree-predcom.cc (pcom_worker::suitable_component_p): If has_write
and comp_step is RS_NONZERO, return false if any reference in the
component doesn't have DR_STEP a multiple of access size.

* gcc.dg/pr111683-1.c: New test.
* gcc.dg/pr111683-2.c: New test.

--- gcc/tree-predcom.cc.jj  2024-03-22 09:19:27.700756950 +0100
+++ gcc/tree-predcom.cc 2024-03-22 14:01:21.758978338 +0100
@@ -1102,8 +1102,39 @@ pcom_worker::suitable_component_p (struc
   gcc_assert (ok);
   first->offset = 0;
 
-  for (i = 1; comp->refs.iterate (i, ); i++)
+  FOR_EACH_VEC_ELT (comp->refs, i, a)
 {
+  if (has_write && comp->comp_step == RS_NONZERO)
+   {
+ /* Punt for non-invariant references where step isn't a multiple
+of reference size.  If step is smaller than reference size,
+it overlaps the access in next iteration, if step is larger,
+but not multiple of the access size, there could be overlap
+in some later iteration.  There might be more latent issues
+about this in predcom or data reference analysis.  If the
+reference is a COMPONENT_REF, also check if step isn't a
+multiple of the containg aggregate size.  See PR111683.  */
+ tree ref = DR_REF (a->ref);
+ tree step = DR_STEP (a->ref);
+ if (TREE_CODE (ref) == COMPONENT_REF
+ && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
+   ref = TREE_OPERAND (ref, 0);
+ do
+   {
+ tree sz = TYPE_SIZE_UNIT (TREE_TYPE (ref));
+ if (TREE_CODE (sz) != INTEGER_CST)
+   return false;
+ if (wi::multiple_of_p (wi::to_offset (step),
+wi::to_offset (sz), SIGNED))
+   break;
+ if (TREE_CODE (ref) != COMPONENT_REF)
+   return false;
+ ref = TREE_OPERAND (ref, 0);
+   }
+ while (1);
+   }
+  if (i == 0)
+   continue;
   /* Polynomial offsets are no use, since we need to know the
 gap between iteration numbers at compile time.  */
   poly_widest_int offset;
--- gcc/testsuite/gcc.dg/pr111683-1.c.jj2024-03-22 11:14:29.292908760 
+0100
+++ gcc/testsuite/gcc.dg/pr111683-1.c   2024-03-22 11:14:29.292908760 +0100
@@ -0,0 +1,22 @@
+/* PR middle-end/111683 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+long long b[6] = { 3, 4, 5, 6, 7, 8 }, c[16];
+long long d[9] = { 3, 7, 12, 18, 22, 26, 21, 15, 8 };
+typedef long long U __attribute__ ((vector_size(16), may_alias, aligned(1)));
+typedef long long V __attribute__ ((vector_size(16), may_alias));
+
+int
+main ()
+{
+  for (int f = 0; f < 6; f++)
+{
+  *(U *) [f] = *(U *) [f] + (V) { b[f], b[f] };
+  *(U *) [f + 2] = *(U *) [f + 2] + (V) { b[f], b[f] };
+}
+  for (int f = 0; f < 9; f++)
+if (c[f] != d[f])
+  __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/pr111683-2.c.jj2024-03-22 11:14:29.292908760 
+0100
+++ gcc/testsuite/gcc.dg/pr111683-2.c   2024-03-22 11:14:29.292908760 +0100
@@ -0,0 +1,27 @@
+/* PR middle-end/111683 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int b[6] = { 3, 4, 5, 6, 7, 8 }, c[12];
+int d[16] = { 0, 1, 3, 6, 10, 14, 12, 9, 5, 0, 0, 0 };
+
+int
+main ()
+{
+  int i;
+  if (sizeof (int) * 2 != sizeof (long long))
+return 0;
+  for (i = 0; i < 6; i++)
+{
+  long long a;
+  __builtin_memcpy (, [i], sizeof (a));
+  a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
+  __builtin_memcpy ([i], , sizeof (a));
+  __builtin_memcpy (, [i + 2], sizeof (a));
+  a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
+  __builtin_memcpy ([i + 2], , sizeof (a));
+}
+  if (__builtin_memcmp ([0], [0], sizeof (c)))
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-23 Thread Jeff Law




On 3/19/24 2:05 PM, Vineet Gupta wrote:



On 3/19/24 06:10, Jeff Law wrote:

On 3/19/24 12:48 AM, Andrew Waterman wrote:

On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta  wrote:

On 3/16/24 13:21, Jeff Law wrote:

|   59944:add s0,sp,2047  <
|   59948:mv  a2,a0
|   5994c:mv  a3,a1
|   59950:mv  a0,sp
|   59954:li  a4,1
|   59958:lui a1,0x1
|   5995c:add s0,s0,1 <---
|   59960:jal 59a3c

SP here becomes unaligned, even if transitively which is undesirable as
well as incorrect:
- ABI requires stack to be 8 byte aligned
- asm code looks weird and unexpected
- to the user it might falsely seem like a compiler bug even when not,
  specially when staring at asm for debugging unrelated issue.
It's not ideal, but I think it's still ABI compliant as-is.  If it
wasn't, then I suspect things like virtual origins in Ada couldn't be
made ABI compliant.

To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
I'd still like to avoid it as I'm sure someone will complain about it.


With the patch, we get following correct code instead:

| ..
| 59944: add s0,sp,2032
| ..
| 5995c: add s0,s0,16

Alternately you could tighten the positive side of the range of the
splitter from patch 1/3 so that you could always use 2032 rather than
2047 on the first addi.   ie instead of allowing 2048..4094, allow
2048..4064.

2033..4064 vs. 2048..4094

Yeah I was a bit split about this as well. Since you are OK with either,
I'll keep them as-is and perhaps add this observation to commitlog.

There's a subset of embedded use cases where an interrupt service
routine continues on the same stack as the interrupted thread,
requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
and with no important data on the stack at an address below sp).

Although not all use cases care about this property, it seems more
straightforward to maintain the invariant everywhere, rather than
selectively enforce it.

Just to be clear, the changes don't misalign the stack pointer at all.
They merely have the potential to create *another* pointer into the
stack which may or may not be aligned.  Which is totally normal, it's no
different than taking the address of a char on the stack.


Right I never saw any sp,sp,2047 getting generated - not even in the
first version of patch which lacked any filtering of stack regs via
riscv_reg_frame_related () and obviously didn't have the stack variant
of splitter. I don't know if that is just being lucky and not enough
testing exposure (I only spot checked buildroot libc, vmlinux) or
something somewhere enforces that.

However given that misaligned pointer off of stack is a non-issue, I
think we can do the following:

1. keep just one splitter with 2047 based predicates and constraint (and
not 2032) for both stack-related and general regs.
2. gate the splitter on only operands[0] being not stack related
(currently it checks for either [0] or [1]) - this allows the prominent
case where SP is simply a src, and avoids when any potential shenanigans
to SP itself.

Sounds reasonable to me.

Jeff



Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-03-23 Thread Jeff Law




On 3/18/24 6:07 PM, Vineet Gupta wrote:



Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?
There are parts of combine that cost based on the number of insns.  This 
is primarily to keep combine from taking an insane amount of time.


So when we have a little white lie like mvconst_internal we muck up that 
costing aspect of the combiner.  That in turn stops the combiner from 
doing certain combinations.


As a concrete example consider this:

unsigned long long w32mem(unsigned long long w32)
{
return w32 & ~(1U << 0);
}


Right now we use a bseti+addi to construct the constant, then do the 
logical and.   Prior to the combiner it looks like this:



(insn 7 3 8 2 (set (reg:DI 137) 
(const_int 4294967294 [0xfffe])) "j.c":3:16 206 {*mvconst_internal}

 (nil))
(insn 8 7 13 2 (set (reg:DI 136 [ _2 ])
(and:DI (reg/v:DI 135 [ w32 ])
(reg:DI 137))) "j.c":3:16 102 {*anddi3}
 (expr_list:REG_DEAD (reg:DI 137)
(expr_list:REG_DEAD (reg/v:DI 135 [ w32 ])
(expr_list:REG_EQUAL (and:DI (reg/v:DI 135 [ w32 ])
(const_int 4294967294 [0xfffe]))
(nil)


The combiner will refuse to match a splitter where the number of 
incoming insns matches the number of resulting insns.  So to match this 
we'd have to write another define_insn_and_split.




If we didn't have mvconst_internal, then we'd see something like this:


(insn 6 3 7 2 (set (reg:DI 138)
(const_int 4294967296 [0x1])) "j.c":3:16 -1
 (nil))
(insn 7 6 8 2 (set (reg:DI 137)
(plus:DI (reg:DI 138)
(const_int -2 [0xfffe]))) "j.c":3:16 -1
 (expr_list:REG_EQUAL (const_int 4294967294 [0xfffe])
(nil)))
(insn 8 7 9 2 (set (reg:DI 136 [ _2 ])
(and:DI (reg/v:DI 135 [ w32 ])
(reg:DI 137))) "j.c":3:16 -1
 (nil))


Which we could match with a define_split which would generate RTL for 
bclri+zext.w.



Note that I suspect there's a handful of these kinds of sequences for 
logical ops where the immediate doesn't fit a simm12.



Of course the point of the white lie is to expose complex constant 
synthesis in  away that the combiner can use to simplify things.  It's 
definitely a tradeoff either way.  What's been rattling around a little 
bit would be to narrow the set of constants allowed by mvconst_internal 
to those which require 3 or more to synthesize.  THe idea being cases 
like you're looking where we can use addi+add rather than lui+addi+add 
would be rejected by mvconst_internal, but the more complex constants 
that push combine over the 4 insn limit would be allowed by 
mvconst_internal.





Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?
I don't recall follow-ups which used the result of mvconst_internal in 
subsequent combiner patterns, but it should be fairly simple to search 
for them.


We'd need to look back at the original bug which resulted in creating 
the mvconst_internal pattern.  My recollection is it had a fairly 
complex constant and we were unable to see enough insns to do the 
necessary simplification to fix that case.


I bet whatever goes on inside perlbench,  mcf and x264 (guessing based 
on instruction counts in your message) + the original bug report will 
provide reasonable testcases for evaluating reversion + adding patches 
to avoid the regressions.




So why use "P" for your mode iterator?  I would have expected GPR since
I think this works for both SI and DI mode as-is.


My intent at least was to have this work for either of rv64/rv32 and in
either of those environments, work for both SI or DI - certainly
rv64+{SI,DI}.
To that end I debated between GPR, X and P.
It seems GPR only supports DI if TARGET_64BIT.
But I could well be wrong about above requirements or the subtle
differences in these.
I wouldn't worry about GPR only support DI for TARGET_64BIT.  I don't 
think we generally expose DImode patterns for TARGET_32BIT.





For the output template "#" for alternative 0 and the add instruction
for alternative 1 is probably better.  That way it's clear to everyone
that alternative 0 is always meant to be split.


OK.


Don't you need some kind of condition on the split to avoid splitting
when you've got a register for operands[2]?


Won't the predicate "match_code" guarantee that ?

   (define_predicate "const_two_s12"
      (match_code "const_int")
    {
      return SUM_OF_TWO_S12 (INTVAL (op), false);
    })

You're right.  Missed the match_code.  Sorry for the bad steer.



Do I need to build gcc in a certain way to expose such errors - I wasn't
able to trigger such an error for the entire testsuite or SPEC build.
There's a distinct RTL