date:20240322

[PATCH] Another ICE after conflicting types of redeclaration [PR110682]

2024-03-22 Thread Andrew Pinski

This another one of these ICE after error issues with the
gimplifier and a fallout from r12-3278-g823685221de986af.
The problem here is that STRIP_USELESS_TYPE_CONVERSION will
leave around a NON_LVALUE_EXPR which is an error mark node.
Since the gimplifier assumes non-lvalue expressions has been
removed, there was an ICE.

This fixes the issue by checking if there is a NON_LVALUE_EXPR
and that has an error operand, we handle it as the same as if
it was an error operand.

gcc/ChangeLog:

PR c/110682
* gimplify.cc (gimplify_expr): Add check if there is
a non-lvalue with an error operand.

gcc/testsuite/ChangeLog:

PR c/110682
* gcc.dg/redecl-27.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimplify.cc  |  6 +-
 gcc/testsuite/gcc.dg/redecl-27.c | 14 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/redecl-27.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index d64bbf3ffbd..001b4af68b9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -17686,7 +17686,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
   save_expr = *expr_p;
 
   /* Die, die, die, my darling.  */
-  if (error_operand_p (save_expr))
+  if (error_operand_p (save_expr)
+ /* The above strip useless type conversion might not strip out
+a conversion from an error so handle that case here.  */
+ || (TREE_CODE (save_expr) == NON_LVALUE_EXPR
+ && error_operand_p (TREE_OPERAND (save_expr, 0
{
  ret = GS_ERROR;
  break;
diff --git a/gcc/testsuite/gcc.dg/redecl-27.c b/gcc/testsuite/gcc.dg/redecl-27.c
new file mode 100644
index 000..93f577e64ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-27.c
@@ -0,0 +1,14 @@
+/* We used to ICE while gimplifying the body of f
+   due to a NON_LVALUE_EXPR still being there.
+   PR c/110682*/
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+struct a {
+  const signed char b;
+};
+
+void f(volatile struct a *c) { /* { dg-note "" } */
+  c - 0 % c->b;
+  struct a c = {1}; /* { dg-error "redeclared as different kind of symbol" } */
+}
-- 
2.43.0

RE: [PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-03-22 Thread Li, Pan2

Thanks Jeff for comments.

> As Richi noted using validate_subreg here isn't great.  Does it work to 
> factor out this code from extract_low_bits
>
>>   if (!int_mode_for_mode (src_mode).exists (_int_mode)
>>   || !int_mode_for_mode (mode).exists (_mode))
>> return NULL_RTX;
>> 
>>   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>> return NULL_RTX;
>>   if (!targetm.modes_tieable_p (int_mode, mode))
>> return NULL_RTX;

> And use that in the condition (and in extract_low_bits rather than 
> duplicating the code)?

It can solve the ICE but will forbid all vector modes goes gen_lowpart.
Actually only the vector mode size is less than reg nature size will trigger 
the ICE.
Thus, how about just add one more condition before goes to gen_lowpart as below?

Feel free to correct me if any misunderstandings. !

diff --git a/gcc/dse.cc b/gcc/dse.cc
index edc7a1dfecf..258d2ccc299 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1946,7 +1946,9 @@ get_stored_val (store_info *store_info, machine_mode 
read_mode,
 copy_rtx (store_info->const_rhs));
   else if (VECTOR_MODE_P (read_mode) && VECTOR_MODE_P (store_mode)
 && known_le (GET_MODE_BITSIZE (read_mode), GET_MODE_BITSIZE (store_mode))
-&& targetm.modes_tieable_p (read_mode, store_mode))
+&& targetm.modes_tieable_p (read_mode, store_mode)
+/* It's invalid in validate_subreg if read_mode size is < reg natural.  */
+&& known_ge (GET_MODE_SIZE (read_mode), REGMODE_NATURAL_SIZE (read_mode)))
 read_reg = gen_lowpart (read_mode, copy_rtx (store_info->rhs));
   else
 read_reg = extract_low_bits (read_mode, store_mode,

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, March 23, 2024 2:54 AM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Wang, Yanzhang ; Liu, Hongtao 
Subject: Re: [PATCH v2] DSE: Bugfix ICE after allow vector type in 
get_stored_val



On 3/4/24 11:22 PM, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> But in the case of a vector modes, we can usually reinterpret the
>> underlying bits in whatever mode we want and do any of the usual
>> operations on those bits.
> 
> Yes, I think that is why we can allow vector mode in get_stored_val if my 
> understanding is correct.
> And then the different modes will return by gen_low_part. Unfortunately, 
> there are some modes
>   (less than a vector bit size like V2SF, V2QI for vlen=128) are considered 
> as invalid by validate_subreg,
> and return NULL_RTX result in the final ICE.
That doesn't make a lot of sense to me.  Even for vlen=128 I would have 
expected that we can still use a subreg to access low bits.  After all 
we might have had a V16QI vector and done a reduction of some sort 
storing the result in the first element and we have to be able to 
extract that result and move it around.

I'm not real keen on a target workaround.  While extremely safe, I 
wouldn't be surprised if other ports could trigger the ICE and we'd end 
up patching up multiple targets for what is, IMHO, a more generic issue.

As Richi noted using validate_subreg here isn't great.  Does it work to 
factor out this code from extract_low_bits:


>   if (!int_mode_for_mode (src_mode).exists (_int_mode)
>   || !int_mode_for_mode (mode).exists (_mode))
> return NULL_RTX;
> 
>   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
> return NULL_RTX;
>   if (!targetm.modes_tieable_p (int_mode, mode))
> return NULL_RTX;

And use that in the condition (and in extract_low_bits rather than 
duplicating the code)?

jeff

ps.  No need to apologize for the pings.  This completely fell off my radar.

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Peter Bergner

On 3/22/24 5:15 AM, Ajit Agarwal wrote:
> When using FlexiBLAS with OpenBLAS we noticed corruption of
> the parameters passed to OpenBLAS functions. FlexiBLAS
> basically provides a BLAS interface where each function
> is a stub that forwards the arguments to a real BLAS lib,
> like OpenBLAS.
> 
> Fixes the corruption of caller frame checking number of
> arguments is less than equal to GP_ARG_NUM_REG (8)
> excluding hidden unused DECLS.

I think the git log entry commentary could be a little more descriptive
of what the problem is. How about something like the following?

  When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
  stack frame when calling OpenBLAS functions.  This was caused by the
  FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
  number of function parameters in the callee due to hidden Fortran
  parameters. This can cause problems when the callee believes the caller
  has allocated a parameter save area when the caller has not done so.
  That means any writes by the callee into the non-existent parameter save
  area will corrupt the caller stack frame.

  The workaround implemented here, is for the callee to determine whether
  the caller has allocated a parameter save area or not, by ignoring any
  unused hidden parameters when counting the number of parameters.

>   PR rtk-optimization/100799

s/rtk/rtl/

>   * config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
>   generate parameter save area if number of arguments passed
>   less than equal to GP_ARG_NUM_REG (8) excluding hidden
>   parameter.

The callee doesn't generate or allocate the parameter save area, the
caller does.  The code here is for the callee trying to determine
whether the caller has done so.  How about saying the following instead?

  Don't assume a parameter save area has been allocated if the number of
  formal parameters, excluding unused hidden parameters, is less than or
  equal to GP_ARG_NUM_REG (8).

>   (init_cumulative_args): Check for hidden parameter in fortran
>   routine and set the flag hidden_string_length and actual
>   parameter passed excluding hidden unused DECLS.

Check for unused hidden Fortran parameters and set hidden_string_length
and actual_parm_length.

> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
> + than it actually has and doesn't expect the parameter save area on the
> + caller side because of that while the callee expects it and the callee
> + actually stores something in the parameter save area, it corrupts
> + whatever is in the caller stack frame at that location.  */

The wrapper/caller is the one that allocates the parameter save area, so
saying "...doesn't expect the parameter save area on the caller side..."
doesn't make sense, since it knows whether it allocated it or not.
How about saying something like the following instead?

  Check whether this function contains any unused hidden parameters and
  record how many there are for use in rs6000_function_arg() to determine
  whether its callers have allocated a parameter save area or not.
  See PR100799 for details.

> +  unsigned int num_args = 0;
> +  unsigned int hidden_length = 0;
> +
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +   arg; arg = DECL_CHAIN (arg))
> +{
> +  num_args++;
> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
> + {
> +   tree parmdef = ssa_default_def (cfun, arg);
> +   if (parmdef == NULL || has_zero_uses (parmdef))
> + {
> +   cum->hidden_string_length = 1;
> +   hidden_length++;
> + }
> + }
> +   }
> +
> +  cum->actual_parm_length = num_args - hidden_length;

This code looks fine, but do we really need two new fields in rs6000_args?
Can't we just get along with only cum->actual_parm_length by modifying
the rs6000_function_arg() change from:

> +  else if (align_words < GP_ARG_NUM_REG
> +|| (cum->hidden_string_length
> +&& cum->actual_parm_length <= GP_ARG_NUM_REG))

to:

+  else if (align_words < GP_ARG_NUM_REG
+  || cum->actual_parm_length <= GP_ARG_NUM_REG)

???

That said, I have a further comment below on what happens here when 
align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG.

> + /* When the buggy C/C++ wrappers call the function with fewer arguments
> + than it actually has and doesn't expect the parameter save area on the
> + caller side because of that while the callee expects it and the callee
> + actually stores something in the parameter save area, it corrupts
> + whatever is in the caller stack frame at that location.  */

Same comment as before, so same problem with the comment, but the following
change...

> -  else if (align_words < GP_ARG_NUM_REG)
> +  else if (align_words < GP_ARG_NUM_REG
> +|| (cum->hidden_string_length
> +&& cum->actual_parm_length

Re: [PATCH] xtensa: Add supplementary split pattern for "*addsubx"

2024-03-22 Thread Max Filippov

On Thu, Mar 21, 2024 at 4:36 PM Takayuki 'January June' Suwa
 wrote:
>
> int test(int a) {
>return a * 4 + 3;
> }
>
> In the example above, since Xtensa has instructions to add register value
> scaled by 2, 4 or 8 (and corresponding define_insns), we would expect them
> to be used but not, because it is transformed before reaching the RTL
> generation pass as below:
>
> int test(int a) {
>return (a + 7500) * 4;
> }
>
> Fortunately, the RTL combination pass tries a splitting pattern that matches
> the first example, so it is easy to solve by defining that pattern.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md: Add new split pattern described above.
> ---
>  gcc/config/xtensa/xtensa.md | 14 ++
>  1 file changed, 14 insertions(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max

[PATCH 2/2] libstdc++: Replace stacktrace effective target with feature test

2024-03-22 Thread Jonathan Wakely

And this replaces an existing custom dg-require- directive with a use of
the new one that checks for a standard feature test macro. I didn't see
any other existing dg-require-xxx directives that can be replaced like
this.

-- >8 --

Remove the dejagnu code for checking whether std::stacktrace is supported
and just use the new dg-require-cpp-feature-test directive to check for
__cpp_lib_stacktrace instead.

libstdc++-v3/ChangeLog:

* testsuite/19_diagnostics/stacktrace/current.cc: Check for
__cpp_lib_stacktrace instead of check for stacktrace ET.
* testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/synopsis.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/version.cc: Likewise.
* testsuite/23_containers/vector/debug/assign4_backtrace_neg.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_effective_target_stacktrace):
Remove.
---
 .../testsuite/19_diagnostics/stacktrace/current.cc| 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc | 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc  | 2 +-
 .../testsuite/19_diagnostics/stacktrace/output.cc | 2 +-
 .../testsuite/19_diagnostics/stacktrace/stacktrace.cc | 2 +-
 .../testsuite/19_diagnostics/stacktrace/synopsis.cc   | 2 +-
 .../testsuite/19_diagnostics/stacktrace/version.cc| 2 +-
 .../23_containers/vector/debug/assign4_backtrace_neg.cc   | 2 +-
 libstdc++-v3/testsuite/lib/libstdc++.exp  | 8 
 9 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
index a27836d27af..b1af5f74fb2 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
@@ -1,6 +1,6 @@
 // { dg-options "-lstdc++exp" }
 // { dg-do run { target c++23 } }
-// { dg-require-effective-target stacktrace }
+// { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
index ab016d56400..bb348ebef8f 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
@@ -1,6 +1,6 @@
 // { dg-options "-lstdc++exp" }
 // { dg-do run { target c++23 } }
-// { dg-require-effective-target stacktrace }
+// { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
 #include 
 #include "testsuite_hooks.h"
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
index 21705098ff0..2176596ae5c 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
@@ -1,6 +1,6 @@
 // { dg-options "-lstdc++exp" }
 // { dg-do run { target c++23 } }
-// { dg-require-effective-target stacktrace }
+// { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
index 67f1e0cebaf..e27aea1f508 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
@@ -1,6 +1,6 @@
 // { dg-options "-lstdc++exp" }
 // { dg-do run { target c++23 } }
-// { dg-require-effective-target stacktrace }
+// { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 // { dg-add-options no_pch }
 
 #include 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc
index 5dfa76951df..070c4157471 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc
@@ -1,6 +1,6 @@
 // { dg-options "-lstdc++exp" }
 // { dg-do run { target c++23 } }
-// { dg-require-effective-target stacktrace }
+// { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
 #include 
 #include "testsuite_allocator.h"
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc
index 9e775b86ac9..b99d382ec26 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile { target c++23 } }
-// { dg-require-effective-target stacktrace }
+// { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 // { dg-require-normal-namespace "" }
 // { dg-add-options no_pch }
 
diff

[PATCH 1/2] libstdc++: Add dg-require-cpp-feature-test to test feature test macros

2024-03-22 Thread Jonathan Wakely

Thoughts? There are only a few uses for this presently, but I can see it
being useful often in future. The library exposes which features it
supports in a standardized way, so we can use those in tests to skip
tests for features that aren't available on all targets.

The obvious downside is that it becomes harder to notice if a particular
feature is missing on all targets, because we don't get FAILs we just
skip all tests as UNSUPPORTED. And the checks for whether 
correctly defines the macro become redundant, because the test won't
even get run if it doesn't. But we won't use this dg-require for many
tests, only the ones where support is target-dependent because it relies
on something non-standard or not available on all targets (like
nl_langinfo_l or libbacktrace).

-- >8 -

This adds a new dejagnu directive which can be used to make a test
depend on a feature test macro such as __cpp_lib_text_encoding. This is
mroe flexible than writing a new dg-require-xxx for each feature.

libstdc++-v3/ChangeLog:

* testsuite/lib/dg-options.exp (dg-require-cpp-feature-test):
New proc.
* testsuite/lib/libstdc++.exp (check_v3_target_cpp_feature_test):
New proc.
* testsuite/std/text_encoding/cons.cc: Use new directive to skip
the test if the __cpp_lib_text_encoding feature test macro is
not defined.
* testsuite/std/text_encoding/requirements.cc: Likewise.
---
 libstdc++-v3/testsuite/lib/dg-options.exp |  9 +
 libstdc++-v3/testsuite/lib/libstdc++.exp  | 15 +++
 libstdc++-v3/testsuite/std/text_encoding/cons.cc  |  1 +
 .../testsuite/std/text_encoding/requirements.cc   |  3 ++-
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp 
b/libstdc++-v3/testsuite/lib/dg-options.exp
index 00ca678a53a..802bfc0b492 100644
--- a/libstdc++-v3/testsuite/lib/dg-options.exp
+++ b/libstdc++-v3/testsuite/lib/dg-options.exp
@@ -277,6 +277,15 @@ proc dg-require-target-fs-lwt { args } {
 return
 }
 
+proc dg-require-cpp-feature-test { n args } {
+if { ![ check_v3_target_cpp_feature_test $args ] } {
+   upvar dg-do-what dg-do-what
+   set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+   return
+}
+return
+}
+
 proc add_options_for_no_pch { flags } {
 # Remove any inclusion of bits/stdc++.h from the options.
 regsub -all -- "-include bits/stdc...h" $flags "" flags
diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 7466fb51c58..2b31c950826 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1134,6 +1134,21 @@ proc v3_check_preprocessor_condition { name cond { inc 
"" } } {
 return [v3_try_preprocess name $code $flags]
 }
 
+# Return 1 if libstdc++ defines macro for the current target, 0 otherwise.
+proc check_v3_target_cpp_feature_test { cond } {
+global cxxflags
+set cxxflags_saved $cxxflags
+# Use the latest standard, so that all feature test macros are defined.
+# We need to do it here, because this check happens before v3-dg-runtest
+# runs its loop to test multiple times with different -std options.
+# This should be updated when a new -std is added.
+set cxxflags "$cxxflags -std=gnu++26"
+set inc "#include "
+set result [v3_check_preprocessor_condition cpp_feature_test "$cond" $inc]
+set cxxflags $cxxflags_saved
+return $result
+}
+
 # Return 1 if Debug Mode is active, 0 otherwise.
 proc check_v3_target_debug_mode { } {
 global cxxflags
diff --git a/libstdc++-v3/testsuite/std/text_encoding/cons.cc 
b/libstdc++-v3/testsuite/std/text_encoding/cons.cc
index 8fcc2ec8c3b..4196e32ea8b 100644
--- a/libstdc++-v3/testsuite/std/text_encoding/cons.cc
+++ b/libstdc++-v3/testsuite/std/text_encoding/cons.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++26 } }
+// { dg-require-cpp-feature-test "__cpp_lib_text_encoding" }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/std/text_encoding/requirements.cc 
b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc
index a1d5d6baee1..3889b250688 100644
--- a/libstdc++-v3/testsuite/std/text_encoding/requirements.cc
+++ b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++26 } }
+// { dg-require-cpp-feature-test __cpp_lib_text_encoding }
 // { dg-add-options no_pch }
 
 #include

[committed] libstdc++: Destroy allocators in re-inserted container nodes [PR114401]

2024-03-22 Thread Jonathan Wakely

Tested aarch64-linux. Pushed to trunk.

This should be backported to all branches, as the failure to destroy the
allocators in the re-inserted nodes results in potential resource leaks.

-- >8 --

The allocator objects in container node handles were not being destroyed
after the node was re-inserted into a container. They are stored in a
union and so need to be explicitly destroyed when the node becomes
empty. The containers were zeroing the node handle's pointer, which
makes it empty, causing the handle's destructor to think there's nothign
to clean up.

Add a new member function to the node handle which destroys the
allocator and zeros the pointer. Change the containers to call that
instead of just changing the pointer manually.

We can also remove the _M_empty member of the union which is not
necessary.

libstdc++-v3/ChangeLog:

PR libstdc++/114401
* include/bits/hashtable.h (_Hashtable::_M_reinsert_node): Call
release() on node handle instead of just zeroing its pointer.
(_Hashtable::_M_reinsert_node_multi): Likewise.
(_Hashtable::_M_merge_unique): Likewise.
(_Hashtable::_M_merge_multi): Likewise.
* include/bits/node_handle.h (_Node_handle_common::release()):
New member function.
(_Node_handle_common::_Optional_alloc::_M_empty): Remove
unnecessary union member.
(_Node_handle_common): Declare _Hashtable as a friend.
* include/bits/stl_tree.h (_Rb_tree::_M_reinsert_node_unique):
Call release() on node handle instead of just zeroing its
pointer.
(_Rb_tree::_M_reinsert_node_equal): Likewise.
(_Rb_tree::_M_reinsert_node_hint_unique): Likewise.
(_Rb_tree::_M_reinsert_node_hint_equal): Likewise.
* testsuite/23_containers/multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/set/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_multiset/modifiers/114401.cc: New 
test.
* testsuite/23_containers/unordered_set/modifiers/114401.cc: New test.
---
 libstdc++-v3/include/bits/hashtable.h |  12 +-
 libstdc++-v3/include/bits/node_handle.h   |  19 ++-
 libstdc++-v3/include/bits/stl_tree.h  |  12 +-
 .../multiset/modifiers/114401.cc  | 125 +
 .../23_containers/set/modifiers/114401.cc | 125 +
 .../unordered_multiset/modifiers/114401.cc| 126 ++
 .../unordered_set/modifiers/114401.cc | 126 ++
 7 files changed, 530 insertions(+), 15 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/multiset/modifiers/114401.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/set/modifiers/114401.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/unordered_multiset/modifiers/114401.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/114401.cc

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index c3ef7a0a3d5..cd3e1ac297c 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1036,7 +1036,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // DR 1189.
   // reserve, if present, comes from _Rehash_base.
 
-#if __cplusplus > 201402L
+#if __glibcxx_node_extract // >= C++17
   /// Re-insert an extracted node into a container with unique keys.
   insert_return_type
   _M_reinsert_node(node_type&& __nh)
@@ -1078,7 +1078,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  {
__ret.position
  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
-   __nh._M_ptr = nullptr;
+   __nh.release();
__ret.inserted = true;
  }
  }
@@ -1098,7 +1098,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
auto __code = this->_M_hash_code(__k);
auto __ret
  = _M_insert_multi_node(__hint._M_cur, __code, __nh._M_ptr);
-   __nh._M_ptr = nullptr;
+   __nh.release();
return __ret;
   }
 
@@ -1200,7 +1200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  auto __nh = __src.extract(__pos);
  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
- __nh._M_ptr = nullptr;
+ __nh.release();
  __n_elt = 1;
}
  else if (__n_elt != 1)
@@ -1227,10 +1227,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
= _M_src_hash_code(__src.hash_function(), __k, *__pos._M_cur);
  auto __nh = __src.extract(__pos);
  __hint = _M_insert_multi_node(__hint, __code, __nh._M_ptr)._M_cur;
- __nh._M_ptr = nullptr;
+ __nh.release();
}
}
-#endif // C++17
+#endif // C++17 __glibcxx_node_extract
 
 private:
   // Helper rehash method used when keys are unique.
diff --git a/libstdc++-v3/include/bits/node_handle.h

Re: [PATCH] libstdc++: Constrain std::vector default constructor [PR113841]

2024-03-22 Thread Jonathan Wakely

Pushed to trunk. Backport to gcc-13 needed too, as the changes to use
concepts for std::pair constructors are on that branch.

On Tue, 19 Mar 2024 at 15:59, Jonathan Wakely  wrote:
>
> This fixes the problem in the PR, which is revealed by the new
> concept-based constraints on std::pair constructors in C++20 mode. That
> makes this a C++20 regression, as the PR's example compiles with C++17.
>
> We need something similar for std::basic_string too, which I'll do
> later.
>
> Any comments?
>
> Tested aarch64-linux.
>
> -- >8 --
>
> This is needed to avoid errors outside the immediate context when
> evaluating is_default_constructible_v> when A is not
> default constructible.
>
> To avoid diagnostic regressions for 23_containers/vector/48101_neg.cc we
> need to make the std::allocator partial specializations default
> constructible, which they probably should have been anyway.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/113841
> * include/bits/allocator.h (allocator): Add default
> constructor to partial specializations for cv-qualified types.
> * include/bits/stl_vector.h (_Vector_impl::_Vector_impl()):
> Constrain so that it's only present if the allocator is default
> constructible.
> * include/bits/stl_bvector.h (_Bvector_impl::_Bvector_impl()):
> Likewise.
> * testsuite/23_containers/vector/cons/113841.cc: New test.
> ---
>  libstdc++-v3/include/bits/allocator.h |  3 ++
>  libstdc++-v3/include/bits/stl_bvector.h   |  3 ++
>  libstdc++-v3/include/bits/stl_vector.h|  3 ++
>  .../23_containers/vector/cons/113841.cc   | 34 +++
>  4 files changed, 43 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/23_containers/vector/cons/113841.cc
>
> diff --git a/libstdc++-v3/include/bits/allocator.h 
> b/libstdc++-v3/include/bits/allocator.h
> index ff4f5b9137b..9e75b37fce7 100644
> --- a/libstdc++-v3/include/bits/allocator.h
> +++ b/libstdc++-v3/include/bits/allocator.h
> @@ -254,6 +254,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  {
>  public:
>typedef _Tp value_type;
> +  allocator() { }
>template allocator(const allocator<_Up>&) { }
>  };
>
> @@ -262,6 +263,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  {
>  public:
>typedef _Tp value_type;
> +  allocator() { }
>template allocator(const allocator<_Up>&) { }
>  };
>
> @@ -270,6 +272,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  {
>  public:
>typedef _Tp value_type;
> +  allocator() { }
>template allocator(const allocator<_Up>&) { }
>  };
>/// @endcond
> diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
> b/libstdc++-v3/include/bits/stl_bvector.h
> index a3343d95b36..d567e26f4e4 100644
> --- a/libstdc++-v3/include/bits/stl_bvector.h
> +++ b/libstdc++-v3/include/bits/stl_bvector.h
> @@ -593,6 +593,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> _GLIBCXX20_CONSTEXPR
> _Bvector_impl() _GLIBCXX_NOEXCEPT_IF(
>   is_nothrow_default_constructible<_Bit_alloc_type>::value)
> +#if __cpp_concepts
> +   requires is_default_constructible_v<_Bit_alloc_type>
> +#endif
> : _Bit_alloc_type()
> { }
>
> diff --git a/libstdc++-v3/include/bits/stl_vector.h 
> b/libstdc++-v3/include/bits/stl_vector.h
> index a8d387f40a1..31169711a48 100644
> --- a/libstdc++-v3/include/bits/stl_vector.h
> +++ b/libstdc++-v3/include/bits/stl_vector.h
> @@ -135,6 +135,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> _GLIBCXX20_CONSTEXPR
> _Vector_impl() _GLIBCXX_NOEXCEPT_IF(
> is_nothrow_default_constructible<_Tp_alloc_type>::value)
> +#if __cpp_lib_concepts
> +   requires is_default_constructible_v<_Tp_alloc_type>
> +#endif
> : _Tp_alloc_type()
> { }
>
> diff --git a/libstdc++-v3/testsuite/23_containers/vector/cons/113841.cc 
> b/libstdc++-v3/testsuite/23_containers/vector/cons/113841.cc
> new file mode 100644
> index 000..a7721d27f79
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/vector/cons/113841.cc
> @@ -0,0 +1,34 @@
> +// { dg-do compile { target c++20 } }
> +
> +#include 
> +
> +template
> +struct Alloc
> +{
> +  using value_type = T;
> +
> +  Alloc(int) { } // not default constructible
> +
> +  template Alloc(const Alloc&) { }
> +
> +  T* allocate(std::size_t n) { return std::allocator().allocate(n); }
> +  void deallocate(T* p, std::size_t n) { std::allocator().deallocate(p, 
> n); }
> +};
> +
> +template struct wrap { T t; };
> +
> +template void do_adl(T&) { }
> +
> +void test_pr113841()
> +{
> +  using test_type = std::vector>;
> +  std::pair>* h = nullptr;
> +  do_adl(h);
> +}
> +
> +void test_pr113841_bool()
> +{
> +  using test_type = std::vector>;
> +  std::pair>* h = nullptr;
> +  do_adl(h);
> +}
> --
> 2.44.0
>

[committed] libstdc++: Reorder feature test macro definitions

2024-03-22 Thread Jonathan Wakely

Tested aarch64-linux. Pushed to trunk.

-- >8 --

Put the C++23 generator and tuple_like ones before the C++26 ones.

libstdc++-v3/ChangeLog:

* include/bits/version.def (generator, tuple_like): Move earlier
in the file.
* include/bits/version.h: Regenerate.
---
 libstdc++-v3/include/bits/version.def | 34 +++
 libstdc++-v3/include/bits/version.h   | 40 +--
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 26e62c6a9b2..5ad44941bff 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1637,6 +1637,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = generator;
+  values = {
+v = 202207;
+cxxmin = 23;
+extra_cond = "__glibcxx_coroutine";
+  };
+};
+
 ftms = {
   name = ios_noreplace;
   values = {
@@ -1718,6 +1727,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = tuple_like;
+  values = {
+v = 202207;
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = unreachable;
   values = {
@@ -1771,23 +1788,6 @@ ftms = {
   };
 };
 
-ftms = {
-  name = generator;
-  values = {
-v = 202207;
-cxxmin = 23;
-extra_cond = "__glibcxx_coroutine";
-  };
-};
-
-ftms = {
-  name = tuple_like;
-  values = {
-v = 202207;
-cxxmin = 23;
-  };
-};
-
 // Standard test specifications.
 stds[97] = ">= 199711L";
 stds[03] = ">= 199711L";
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 23c8c09ab4b..460a3e0116a 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1823,6 +1823,16 @@
 #endif /* !defined(__cpp_lib_forward_like) && 
defined(__glibcxx_want_forward_like) */
 #undef __glibcxx_want_forward_like
 
+#if !defined(__cpp_lib_generator)
+# if (__cplusplus >= 202100L) && (__glibcxx_coroutine)
+#  define __glibcxx_generator 202207L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_generator)
+#   define __cpp_lib_generator 202207L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_generator) && defined(__glibcxx_want_generator) */
+#undef __glibcxx_want_generator
+
 #if !defined(__cpp_lib_ios_noreplace)
 # if (__cplusplus >= 202100L) && _GLIBCXX_HOSTED
 #  define __glibcxx_ios_noreplace 202207L
@@ -1913,6 +1923,16 @@
 #endif /* !defined(__cpp_lib_to_underlying) && 
defined(__glibcxx_want_to_underlying) */
 #undef __glibcxx_want_to_underlying
 
+#if !defined(__cpp_lib_tuple_like)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_tuple_like 202207L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_tuple_like)
+#   define __cpp_lib_tuple_like 202207L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_tuple_like) && defined(__glibcxx_want_tuple_like) 
*/
+#undef __glibcxx_want_tuple_like
+
 #if !defined(__cpp_lib_unreachable)
 # if (__cplusplus >= 202100L)
 #  define __glibcxx_unreachable 202202L
@@ -1973,24 +1993,4 @@
 #endif /* !defined(__cpp_lib_to_string) && defined(__glibcxx_want_to_string) */
 #undef __glibcxx_want_to_string
 
-#if !defined(__cpp_lib_generator)
-# if (__cplusplus >= 202100L) && (__glibcxx_coroutine)
-#  define __glibcxx_generator 202207L
-#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_generator)
-#   define __cpp_lib_generator 202207L
-#  endif
-# endif
-#endif /* !defined(__cpp_lib_generator) && defined(__glibcxx_want_generator) */
-#undef __glibcxx_want_generator
-
-#if !defined(__cpp_lib_tuple_like)
-# if (__cplusplus >= 202100L)
-#  define __glibcxx_tuple_like 202207L
-#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_tuple_like)
-#   define __cpp_lib_tuple_like 202207L
-#  endif
-# endif
-#endif /* !defined(__cpp_lib_tuple_like) && defined(__glibcxx_want_tuple_like) 
*/
-#undef __glibcxx_want_tuple_like
-
 #undef __glibcxx_want_all
-- 
2.44.0

[committed] libstdc++: Use feature test macros in

2024-03-22 Thread Jonathan Wakely

Tested aarch64-linux. Pushed to trunk.

-- >8 --

The preprocessor checks for __cplusplus in  should
use the appropriate feature test macros instead of __cplusplus, namely
__glibcxx_raw_memory_algorithms and __cpp_constexpr_dynamic_alloc.

For the latter, we want to check the compiler macro not the library's
__cpp_lib_constexpr_dynamic_alloc, because the latter is not defined for
freestanding but std::construct_at needs to be.

libstdc++-v3/ChangeLog:

* include/bits/stl_construct.h (destroy_at, construct_at): Guard
with feature test macros instead of just __cplusplus.
---
 libstdc++-v3/include/bits/stl_construct.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_construct.h 
b/libstdc++-v3/include/bits/stl_construct.h
index 7c394072b50..dc08fb7ea33 100644
--- a/libstdc++-v3/include/bits/stl_construct.h
+++ b/libstdc++-v3/include/bits/stl_construct.h
@@ -74,7 +74,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-#if __cplusplus >= 201703L
+#if __glibcxx_raw_memory_algorithms // >= C++17
   template 
 _GLIBCXX20_CONSTEXPR inline void
 destroy_at(_Tp* __location)
@@ -88,7 +88,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__location->~_Tp();
 }
 
-#if __cplusplus >= 202002L
+#if __cpp_constexpr_dynamic_alloc // >= C++20
   template
 constexpr auto
 construct_at(_Tp* __location, _Args&&... __args)
@@ -108,7 +108,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline void
 _Construct(_Tp* __p, _Args&&... __args)
 {
-#if __cplusplus >= 202002L
+#if __cpp_constexpr_dynamic_alloc // >= C++20
   if (std::__is_constant_evaluated())
{
  // Allow std::_Construct to be used in constant expressions.
@@ -145,7 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX14_CONSTEXPR inline void
 _Destroy(_Tp* __pointer)
 {
-#if __cplusplus > 201703L
+#if __cpp_constexpr_dynamic_alloc // >= C++20
   std::destroy_at(__pointer);
 #else
   __pointer->~_Tp();
@@ -188,7 +188,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(is_destructible<_Value_type>::value,
"value type is destructible");
 #endif
-#if __cplusplus >= 202002L
+#if __cpp_constexpr_dynamic_alloc // >= C++20
   if (std::__is_constant_evaluated())
return std::_Destroy_aux::__destroy(__first, __last);
 #endif
@@ -237,7 +237,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(is_destructible<_Value_type>::value,
"value type is destructible");
 #endif
-#if __cplusplus >= 202002L
+#if __cpp_constexpr_dynamic_alloc // >= C++20
   if (std::__is_constant_evaluated())
return std::_Destroy_n_aux::__destroy_n(__first, __count);
 #endif
@@ -245,7 +245,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__destroy_n(__first, __count);
 }
 
-#if __cplusplus >= 201703L
+#if __glibcxx_raw_memory_algorithms // >= C++17
   template 
 _GLIBCXX20_CONSTEXPR inline void
 destroy(_ForwardIterator __first, _ForwardIterator __last)
-- 
2.44.0

[committed] libstdc++: Replace std::result_of with __invoke_result_t [PR114394]

2024-03-22 Thread Jonathan Wakely

Tested aarch64-linux. Pushed to trunk.

-- >8 --

Replace std::result_of with std::invoke_result, as specified in the
standard since C++17, to avoid deprecated warnings for std::result_of.

We don't have __invoke_result_t in C++11 mode, so add it as an alias
template for __invoke_result<>::type (which is what std::result_of uses
as its base class, so there's no change in functionality).

This fixes warnings given by Clang 18.

libstdc++-v3/ChangeLog:

PR libstdc++/114394
* include/std/functional (bind): Use __invoke_result_t instead
of result_of::type.
* include/std/type_traits (__invoke_result_t): New alias
template.
* testsuite/20_util/bind/ref_neg.cc: Adjust prune pattern.
---
 libstdc++-v3/include/std/functional| 2 +-
 libstdc++-v3/include/std/type_traits   | 4 
 libstdc++-v3/testsuite/20_util/bind/ref_neg.cc | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index e02be00abe5..766558b3ce0 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -556,7 +556,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
using _Res_type_impl
- = typename result_of< _Fn&(_Mu_type<_BArgs, _CallArgs>&&...) >::type;
+ = __invoke_result_t<_Fn&, _Mu_type<_BArgs, _CallArgs>&&...>;
 
   template
using _Res_type = _Res_type_impl<_Functor, _CallArgs, _Bound_args...>;
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 21402fd8c13..b441bf9908f 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2664,6 +2664,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Functor, _ArgTypes...
   >::type
 { };
+
+  // __invoke_result_t (std::invoke_result_t for C++11)
+  template
+using __invoke_result_t = typename __invoke_result<_Fn, _Args...>::type;
   /// @endcond
 
   template
diff --git a/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc 
b/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc
index 4a1ed8dda5f..2db9fa8276a 100644
--- a/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc
@@ -50,7 +50,7 @@ void test02()
 
 // Ignore the reasons for deduction/substitution failure in the headers.
 // Arrange for the match to work on installed trees as well as build trees.
-// { dg-prune-output "no type named 'type' in 'struct std::result_of" }
+// { dg-prune-output "no type named 'type' in 'struct std::__invoke_result" }
 
 int main()
 {
-- 
2.44.0

[PATCH v2] c++: ICE with noexcept and local specialization, again [PR114349]

2024-03-22 Thread Marek Polacek

On Thu, Mar 21, 2024 at 05:27:37PM -0400, Jason Merrill wrote:
> On 3/21/24 17:01, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
> > we're dealing with a noexcept-spec there, not a noexcept-expr, so
> > setting cp_noexcept_operand et al is incorrect.  Back to the drawing
> > board then.
> > 
> > To fix noexcept84.C, we should probably avoid doing push_to_top_level
> > in certain cases.  Patrick suggested checking:
> > 
> >const bool push_to_top = current_function_decl != fn;
> > 
> > which works, but I'm not sure I follow the logic there.  I also came
> > up with
> > 
> >const bool push_to_top = !decl_function_context (fn);
> > 
> > which also works.  But ultimately I went with !DECL_TEMPLATE_INSTANTIATED;
> > if DECL_TEMPLATE_INSTANTIATED is set, we've already pushed to top level
> > if it was necessary in instantiate_body.
> 
> This sort of thing is what maybe_push_to_top_level is for, does that also
> work?

Sadly -- and I should have mentioned that -- no.  maybe_push_to_top_level asks:

  bool push_to_top
= !(current_function_decl
   && !LAMBDA_FUNCTION_P (d)
   && decl_function_context (d) == current_function_decl);

here both d and current_function_decl are test()::S::S(), and
decl_function_context (d) is test().  (current_function_decl was
set to test()::S::S() by an earlier push_access_scope call.)

But I want it to work, and I think using maybe_ would be a way nicer
fix.  So what if we don't push to top level if decl_function_context
is non-null?  I had to add the LAMBDA_TYPE_P check though: it looks
that we always have to push to top level for lambdas, but sometimes
we get a lambda's TYPE_DECL, and LAMBDA_FUNCTION_P doesn't catch
that.  An example is lambda-nested4.C.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
we're dealing with a noexcept-spec there, not a noexcept-expr, so
setting cp_noexcept_operand et al is incorrect.  Back to the drawing
board then.

To fix noexcept84.C, we should probably avoid doing push_to_top_level
in certain cases.  maybe_push_to_top_level didn't work here as-is, so
I changed it to not push to top level if decl_function_context is
non-null, when we are not dealing with a lambda.

This also fixes c++/114349, introduced by r14-9339.

PR c++/114349

gcc/cp/ChangeLog:

* name-lookup.cc (maybe_push_to_top_level): For a non-lambda,
don't push to top level if decl_function_context is non-null.
* pt.cc (maybe_instantiate_noexcept): Use maybe_push_to_top_level.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept85.C: New test.
* g++.dg/cpp0x/noexcept86.C: New test.
---
 gcc/cp/name-lookup.cc   | 12 ++---
 gcc/cp/pt.cc| 11 ++---
 gcc/testsuite/g++.dg/cpp0x/noexcept85.C | 33 +
 gcc/testsuite/g++.dg/cpp0x/noexcept86.C | 25 +++
 4 files changed, 68 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept85.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept86.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index dce4caf8981..4b2b27bdd0d 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -8664,10 +8664,14 @@ maybe_push_to_top_level (tree d)
 {
   /* Push if D isn't function-local, or is a lambda function, for which name
  resolution is already done.  */
-  bool push_to_top
-= !(current_function_decl
-   && !LAMBDA_FUNCTION_P (d)
-   && decl_function_context (d) == current_function_decl);
+  const bool push_to_top
+= (LAMBDA_FUNCTION_P (d)
+   || (TREE_CODE (d) == TYPE_DECL
+  && TREE_TYPE (d)
+  && LAMBDA_TYPE_P (TREE_TYPE (d)))
+   || !current_function_decl
+   || (!decl_function_context (d)
+  && decl_function_context (d) != current_function_decl));
 
   if (push_to_top)
 push_to_top_level ();
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8cf0d5b7a8d..7b00a8615d2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26855,7 +26855,7 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
}
   else if (push_tinst_level (fn))
{
- push_to_top_level ();
+ const bool push_to_top = maybe_push_to_top_level (fn);
  push_access_scope (fn);
  push_deferring_access_checks (dk_no_deferred);
  input_location = DECL_SOURCE_LOCATION (fn);
@@ -26878,17 +26878,10 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  if (orig_fn)
++processing_template_decl;
 
- ++cp_unevaluated_operand;
- ++c_inhibit_evaluation_warnings;
- ++cp_noexcept_operand;
  /* Do deferred instantiation of the noexcept-specifier.  */
  noex = tsubst_expr (DEFERRED_NOEXCEPT_PATTERN (noex),

[PATCH] libgcc: arm: fix build for FDPIC target

2024-03-22 Thread Max Filippov

libgcc/
* unwind-arm-common.inc (__gnu_personality_sigframe_fdpic): Cast
last argument of _Unwind_VRS_Set to void *.
---
 libgcc/unwind-arm-common.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-arm-common.inc b/libgcc/unwind-arm-common.inc
index 5453f38186b5..576f7e93e8a8 100644
--- a/libgcc/unwind-arm-common.inc
+++ b/libgcc/unwind-arm-common.inc
@@ -248,7 +248,7 @@ __gnu_personality_sigframe_fdpic (_Unwind_State state,
  + ARM_SIGCONTEXT_R0;
 /* Restore regs saved on stack by the kernel.  */
 for (i = 0; i < 16; i++)
-   _Unwind_VRS_Set (context, _UVRSC_CORE, i, _UVRSD_UINT32, sp + 4 * i);
+   _Unwind_VRS_Set (context, _UVRSC_CORE, i, _UVRSD_UINT32, (void *)(sp + 
4 * i));
 
 return _URC_CONTINUE_UNWIND;
 }
-- 
2.39.2

Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-03-22 Thread Jeff Law





On 3/1/24 8:07 AM, Robin Dapp wrote:

+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_4;
+  const int segment_permute_8;

Why do we only have 2/4/8, I think we should have 2/3/4/5/6/7/8


No idea why I posted that (wrong) version, I used it for
some testing locally.  Attached is the proper version, still
called it v3...

Regards
  Robin

Subject: [PATCH v3] RISC-V: Add initial cost handling for segment
  loads/stores.

This patch makes segment loads and stores more expensive.  It adds
segment_permute_2 as well as 3 to 8 cost fields to the common vector
costs and adds handling to adjust_stmt_cost.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_permute cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_permute_[2-8] to 1.
So where do we stand with this?  Juzhe asked it to be rebased, but I 
don't see a rebased version in my inbox and I don't see anything that 
looks like this on the trunk.


jeff

Re: New effective-target 'asm_goto_with_outputs'

2024-03-22 Thread Jeff Law





On 3/22/24 12:24 PM, Jakub Jelinek wrote:

On Fri, Mar 22, 2024 at 12:17:03PM -0600, Jeff Law wrote:

I'd just make target_lra return false for nvptx rather than creating a new


The lra effective target currently though doesn't check if asm goto can have
outputs, but rather if the target is using lra.
Right.  It's not 100% precise as we lose one testcase for nvptx.  THat's 
a tradeoff I'd be willing to make.





selector -- I'm not aware of any features other than asm goto that LRA
provides that aren't supported reload.

Or perhaps rename the selector entirely to target_asm_goto?


In that case we should just test if asm goto with outputs is allowed
in a cached snippet, rather than testing if there is LRA in the ra dumps.
I won't lose any sleep with that approach, I just don't see that it adds 
a lot of value.


jeff

Re: [PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-03-22 Thread Jeff Law





On 3/4/24 11:22 PM, Li, Pan2 wrote:

Thanks Jeff for comments.


But in the case of a vector modes, we can usually reinterpret the
underlying bits in whatever mode we want and do any of the usual
operations on those bits.


Yes, I think that is why we can allow vector mode in get_stored_val if my 
understanding is correct.
And then the different modes will return by gen_low_part. Unfortunately, there 
are some modes
  (less than a vector bit size like V2SF, V2QI for vlen=128) are considered as 
invalid by validate_subreg,
and return NULL_RTX result in the final ICE.
That doesn't make a lot of sense to me.  Even for vlen=128 I would have 
expected that we can still use a subreg to access low bits.  After all 
we might have had a V16QI vector and done a reduction of some sort 
storing the result in the first element and we have to be able to 
extract that result and move it around.


I'm not real keen on a target workaround.  While extremely safe, I 
wouldn't be surprised if other ports could trigger the ICE and we'd end 
up patching up multiple targets for what is, IMHO, a more generic issue.


As Richi noted using validate_subreg here isn't great.  Does it work to 
factor out this code from extract_low_bits:




  if (!int_mode_for_mode (src_mode).exists (_int_mode)
  || !int_mode_for_mode (mode).exists (_mode))
return NULL_RTX;

  if (!targetm.modes_tieable_p (src_int_mode, src_mode))
return NULL_RTX;
  if (!targetm.modes_tieable_p (int_mode, mode))
return NULL_RTX;


And use that in the condition (and in extract_low_bits rather than 
duplicating the code)?


jeff

ps.  No need to apologize for the pings.  This completely fell off my radar.

Re: New effective-target 'asm_goto_with_outputs'

2024-03-22 Thread Jakub Jelinek

On Fri, Mar 22, 2024 at 12:17:03PM -0600, Jeff Law wrote:
> I'd just make target_lra return false for nvptx rather than creating a new

The lra effective target currently though doesn't check if asm goto can have
outputs, but rather if the target is using lra.

> selector -- I'm not aware of any features other than asm goto that LRA
> provides that aren't supported reload.
> 
> Or perhaps rename the selector entirely to target_asm_goto?

In that case we should just test if asm goto with outputs is allowed
in a cached snippet, rather than testing if there is LRA in the ra dumps.

Jakub

Re: New effective-target 'asm_goto_with_outputs'

2024-03-22 Thread Jeff Law





On 3/21/24 5:20 AM, Thomas Schwinge wrote:

Hi!

On 2024-02-16T10:48:53-0800, Mike Stump  wrote:

On Feb 16, 2024, at 2:16 AM, Jakub Jelinek  wrote:


There is one special case, NVPTX, which is a TARGET_NO_REGISTER_ALLOCATION
target.  I think claiming for it that it is a lra target is strange (even
though it effectively returns true for targetm.lra_p ()), unsure if it
supports asm goto with outputs or not, if it does and we want to test it,
perhaps we should introduce asm_goto_outputs effective target and use
lra || nvptx-*-* for that?


Since the port people have to maintain that code in general, I usually leave it 
to them to try and select a cheap, maintainable way to manage it.

If people want to pave the way, I'd tend to defer to them, having thought about 
more than I.


Here I am.  ;-)

After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:

 -PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
 +ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob pattern 
"lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "

Etc.

That is, the current effective-target 'lra' is not suitable for nvptx --
which, I suppose, is OK, given that nvptx neither uses LRA nor doesn't
use LRA.  ;-) (Therefore, effective-target 'lra' shouldn't get used in
test cases that are active for nvptx.)

However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:

 PASS: gcc.dg/pr107385.c execution test

I'm attaching "[WIP] New effective-target 'asm_goto_with_outputs'", which
does address the effective-target check for nvptx, and otherwise does
's%lra%asm_goto_with_outputs'.  (I have not yet actually merged
'check_effective_target_lra' into
'check_effective_target_asm_goto_with_outputs'.)

I have verified that all current effective-target 'lra' test cases
actually use 'asm goto' with outputs, there is just one exception:
'gcc.dg/pr110079.c' (see

"bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto 
[PR110079]",

"ICE with -freorder-blocks-and-partition and inline-asm goto").  That
test case, 'gcc.dg/pr110079.c', currently uses 'target lra', and uses
'asm goto' -- but not with outputs, so is 'asm_goto_with_outputs' not
really applicable?  The test case does PASS for nvptx target (but I've
not verified what it's actually doing/testing).  How to handle that one?
I'd just make target_lra return false for nvptx rather than creating a 
new selector -- I'm not aware of any features other than asm goto that 
LRA provides that aren't supported reload.


Or perhaps rename the selector entirely to target_asm_goto?

jeff

Re: [PATCH] [tree-prof] skip if errors were seen [PR113681]

2024-03-22 Thread Jeff Law





On 3/9/24 2:11 AM, Alexandre Oliva wrote:


ipa_tree_profile asserts that the symtab is in IPA_SSA state, but we
don't reach that state and ICE if e.g. ipa-strub passes report errors.
Skip this pass if errors were seen.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/113681
* tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
seen_errors.

for  gcc/testsuite/ChangeLog

PR tree-optimization/113681
* c-c++-common/strub-pr113681.c: New.
So I've really never dug into strub, but this would seem to imply that 
an error from strub is non-fatal?Are we going to end up having to 
make a similar change to gate most passes if strub let's things to 
forward rather than causing a graceful exit?


jeff

[Committed] RISC-V: Require a extension for ztso testcases with atomic insns

2024-03-22 Thread Patrick O'Neill




On 3/22/24 07:22, Palmer Dabbelt wrote:

On Thu, 21 Mar 2024 10:00:24 PDT (-0700), Patrick O'Neill wrote:

Use dg_add_options riscv_a to add atomic extension when running compile
tests on non-a targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add
  dg_add_options riscv_a
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill 


Presumably these trip up on the non-A targets that Edwin's just adding 
to the

testers?  They'd also trip up anyone running newlib/mulilib tests.

Either way they look right to me, so

Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

Thanks!


Committed - Thanks!

And yes, this was in response to the rv32/64imc_* targets that Edwin was 
adding to ci.


Patrick

[PATCH] Fortran: no size check passing NULL() without MOLD argument [PR55978]

2024-03-22 Thread Harald Anlauf

Dear all,

here's a simple and obvious patch for a rejects-valid case when
we pass a NULL() actual to an optional dummy for variants where
there is no MOLD argument and it is also not required.

The testcase is an extended version of PR55978 comment#16
and cross-checked with Intel and NAG.

Regtested on x86_64-pc-linux-gnu.

I intend to commit soon unless there are objections.

Thanks,
Harald

From e92244c5539a537cff338b781d15acd58d4c86f1 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 22 Mar 2024 18:17:15 +0100
Subject: [PATCH] Fortran: no size check passing NULL() without MOLD argument
 [PR55978]

gcc/fortran/ChangeLog:

	PR fortran/55978
	* interface.cc (gfc_compare_actual_formal): Skip size check for
	NULL() actual without MOLD argument.

gcc/testsuite/ChangeLog:

	PR fortran/55978
	* gfortran.dg/null_actual_5.f90: New test.
---
 gcc/fortran/interface.cc|  4 ++
 gcc/testsuite/gfortran.dg/null_actual_5.f90 | 76 +
 2 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/null_actual_5.f90

diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 64b90550be2..7b86a338bc1 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -3439,6 +3439,10 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
   if (f->sym->ts.type == BT_CLASS)
 	goto skip_size_check;

+  /* Skip size check for NULL() actual without MOLD argument.  */
+  if (a->expr->expr_type == EXPR_NULL && a->expr->ts.type == BT_UNKNOWN)
+	goto skip_size_check;
+
   actual_size = get_expr_storage_size (a->expr);
   formal_size = get_sym_storage_size (f->sym);
   if (actual_size != 0 && actual_size < formal_size
diff --git a/gcc/testsuite/gfortran.dg/null_actual_5.f90 b/gcc/testsuite/gfortran.dg/null_actual_5.f90
new file mode 100644
index 000..1198715b7c8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/null_actual_5.f90
@@ -0,0 +1,76 @@
+! { dg-do compile }
+! PR fortran/55978
+!
+! Passing of NULL() with and without MOLD as actual argument
+!
+! Testcase derived from pr55978 comment#16
+
+program pr55978_c16
+  implicit none
+
+  integer, pointer   :: p(:)
+  integer, allocatable   :: a(:)
+  character(10), pointer :: c
+  character(10), pointer :: cp(:)
+
+  type t
+integer, pointer :: p(:)
+integer, allocatable :: a(:)
+  end type
+
+  type(t) :: d
+
+  ! (1) pointer
+  p => null()
+  call sub (p)
+
+  ! (2) allocatable
+  call sub (a)
+  call sub (d%a)
+
+  ! (3) pointer component
+  d%p => null ()
+  call sub (d%p)
+
+  ! (4) NULL
+  call sub (null (a))   ! OK
+  call sub (null (p))   ! OK
+  call sub (null (d%a)) ! OK
+  call sub (null (d%p)) ! OK
+  call sub (null ())! was erroneously rejected with:
+  ! Actual argument contains too few elements for dummy argument 'x' (1/4)
+
+  call bla (null(c))
+  call bla (null()) ! was erroneously rejected with:
+  ! Actual argument contains too few elements for dummy argument 'x' (1/10)
+
+  call foo (null(cp))
+  call foo (null())
+
+  call bar (null(cp))
+  call bar (null()) ! was erroneously rejected with:
+  ! Actual argument contains too few elements for dummy argument 'x' (1/70)
+
+contains
+
+  subroutine sub(x)
+integer, intent(in), optional :: x(4)
+if (present (x)) stop 1
+  end
+
+  subroutine bla(x)
+character(len=10), intent(in), optional :: x
+if (present (x)) stop 2
+  end
+
+  subroutine foo(x)
+character(len=10), intent(in), optional :: x(:)
+if (present (x)) stop 3
+  end
+
+  subroutine bar(x)
+character(len=10), intent(in), optional :: x(7)
+if (present (x)) stop 4
+  end
+
+end
--
2.35.3

Re: [PATCH] Revert "Pass GUILE down to subdirectories"

2024-03-22 Thread Tom Tromey

> "Andrew" == Andrew Burgess  writes:

Andrew> Thanks, that would be great, and would certainly fix the build problems
Andrew> I see.

I'm going to check it in to binutils-gdb in a minute.

For those reading on gcc-patches, please consider this a ping of the
patch.

thanks,
Tom

Re: scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-22 Thread Vineet Gupta




On 3/22/24 05:29, Jeff Law wrote:
>> Another option is to enable -fsched-pressure which should help with
>> this issue.
> In theory we're already using that by default -- it's part of what makes 
> me so curious to understand what's going on.

We are actually using it in practice :-)
Its the default for RISC-V port since Aug of last year.

-Vineet

[wwwdocs, committed] gcc-14: amdgcn: Add gfx1103

2024-03-22 Thread Andrew Stubbs

I added a note about gfx1103 to the existing text for gfx1100.

Andrew

---
 htdocs/gcc-14/changes.html | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index d88fbc96..880b9195 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -343,11 +343,11 @@ a work-in-progress.
 AMD Radeon (GCN)
 
 
-  Initial support for the AMD Radeon gfx1030 (RDNA2) and
-gfx1100 (RDNA3) devices has been added. LLVM 15+ (assembler
-and linker) is Initial support for the AMD Radeon gfx1030 (RDNA2),
+gfx1100 and gfx1103 (RDNA3) devices has been
+added. LLVM 15+ (assembler and linker) is https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa;>required
-to support gfx1100.
+to support GFX11.
   Improved register usage and performance on CDNA Instinct MI100
 and MI200 series devices.
   The default device architecture is now gfx900 (Vega).
-- 
2.41.0

[patch,avr,applied] Adjust message for SIGNAL and INTERRUPT usage

2024-03-22 Thread Georg-Johann Lay


Applied this patchlet for a more precise diagnostic.

Johann

--

AVR: Adjust message for SIGNAL and INTERRUPT usage

gcc/
* config/avr/avr.cc (avr_set_current_function): Adjust diagnostic
for deprecated SIGNAL and INTERRUPT usage without respective header.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 12c59668b4c..4a5a921107b 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -1495,14 +1495,20 @@ avr_set_current_function (tree decl)
   // Common problem is using "ISR" without first including 
avr/interrupt.h.

   const char *name = IDENTIFIER_POINTER (DECL_NAME (decl));
   name = default_strip_name_encoding (name);
-  if (strcmp ("ISR", name) == 0
-  || strcmp ("INTERRUPT", name) == 0
-  || strcmp ("SIGNAL", name) == 0)
+  if (strcmp ("ISR", name) == 0)
 {
   warning_at (loc, OPT_Wmisspelled_isr, "%qs is a reserved identifier"
  " in AVR-LibC.  Consider %<#include %>"
  " before using the %qs macro", name, name);
 }
+  if (strcmp ("INTERRUPT", name) == 0
+  || strcmp ("SIGNAL", name) == 0)
+{
+  warning_at (loc, OPT_Wmisspelled_isr, "%qs is a deprecated 
identifier"

+ " in AVR-LibC.  Consider %<#include %>"
+ " or %<#include %>"
+ " before using the %qs macro", name, name);
+}
 #endif // AVR-LibC naming conventions

   /* Don't print the above diagnostics more than once.  */

[committed] amdgcn: Adjust GFX10/GFX11 cache coherency

2024-03-22 Thread Andrew Stubbs

The RDNA devices have different cache architectures to the CDNA devices, and
the differences go deeper than just the assembler mnemonics, so we
probably need to generate different code to maintain coherency across
the whole device.

I believe this patch is correct according to the documentation in the LLVM
AMDGPU user guide (the ISA manual is less instructive), but I hadn't observed
any real problems before (or after).

Committed to mainline.

Andrew

gcc/ChangeLog:

* config/gcn/gcn.md (*memory_barrier): Split into RDNA and !RDNA.
(atomic_load): Adjust RDNA cache settings.
(atomic_store): Likewise.
(atomic_exchange): Likewise.
---
 gcc/config/gcn/gcn.md | 86 +++
 1 file changed, 55 insertions(+), 31 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 3b51453aaca..574c2f87e8c 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1960,11 +1960,19 @@
 (define_insn "*memory_barrier"
   [(set (match_operand:BLK 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
-  ""
-  "{buffer_wbinvl1_vol|buffer_gl0_inv}"
+  "!TARGET_RDNA2_PLUS"
+  "buffer_wbinvl1_vol"
   [(set_attr "type" "mubuf")
(set_attr "length" "4")])
 
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0)
+   (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+  "TARGET_RDNA2_PLUS"
+  "buffer_gl1_inv\;buffer_gl0_inv"
+  [(set_attr "type" "mult")
+   (set_attr "length" "8")])
+
 ; FIXME: These patterns have been disabled as they do not seem to work
 ; reliably - they can cause hangs or incorrect results.
 ; TODO: flush caches according to memory model
@@ -2094,9 +2102,13 @@
  case 0:
return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)";
  case 1:
-   return "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0";
+   return (TARGET_RDNA2 /* Not GFX11.  */
+   ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0"
+   : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0");
  case 2:
-   return "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)";
+   return (TARGET_RDNA2 /* Not GFX11.  */
+   ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)"
+   : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)");
  }
break;
   case MEMMODEL_CONSUME:
@@ -2108,15 +2120,21 @@
return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)\;"
   "s_dcache_wb_vol";
  case 1:
-   return (TARGET_RDNA2_PLUS
+   return (TARGET_RDNA2
+   ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0\;"
+ "buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
? "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
- "buffer_gl0_inv"
+ "buffer_gl1_inv\;buffer_gl0_inv"
: "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
  "buffer_wbinvl1_vol");
  case 2:
-   return (TARGET_RDNA2_PLUS
+   return (TARGET_RDNA2
+   ? "global_load%o0\t%0, %A1%O1 glc 
dlc\;s_waitcnt\tvmcnt(0)\;"
+ "buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
? "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
- "buffer_gl0_inv"
+ "buffer_gl1_inv\;buffer_gl0_inv"
: "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
  "buffer_wbinvl1_vol");
  }
@@ -2130,15 +2148,21 @@
return "s_dcache_wb_vol\;s_load%o0\t%0, %A1 glc\;"
   "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
  case 1:
-   return (TARGET_RDNA2_PLUS
-   ? "buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
- "s_waitcnt\t0\;buffer_gl0_inv"
+   return (TARGET_RDNA2
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 
glc dlc\;"
+ "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 
glc\;"
+ "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
: "buffer_wbinvl1_vol\;flat_load%o0\t%0, %A1%O1 glc\;"
  "s_waitcnt\t0\;buffer_wbinvl1_vol");
  case 2:
-   return (TARGET_RDNA2_PLUS
-   ? "buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;"
- "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
+   return (TARGET_RDNA2
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, 
%A1%O1 glc dlc\;"
+ "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0,

[committed] amdgcn: Prefer V32 on RDNA devices

2024-03-22 Thread Andrew Stubbs

This patch alters the default (preferred) vector size to 32 on RDNA devices to
better match the actual hardware.  64-lane vectors will continue to be
used where they are hard-coded (such as function prologues).

We run these devices in wavefrontsize64 for compatibility, but they actually
only have 32-lane vectors, natively.  If the upper part of a V64 is masked
off (as it is in V32) then RDNA devices will skip execution of the upper part
for most operations, so this adjustment shouldn't leave too much performance on
the table.  One exception is memory instructions, so full wavefrontsize32
support would be better.

The advantage is that we avoid the missing V64 operations (such as permute and
vec_extract).

Committed to mainline.

Andrew

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 on
RDNA devices.
---
 gcc/config/gcn/gcn.cc | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 498146dcde9..efb73af50c4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode)
 static machine_mode
 gcn_vectorize_preferred_simd_mode (scalar_mode mode)
 {
+  /* RDNA devices have 32-lane vectors with limited support for 64-bit vectors
+ (in particular, permute operations are only available for cases that don't
+ span the 32-lane boundary).
+
+ From the RDNA3 manual: "Hardware may choose to skip either half if the
+ EXEC mask for that half is all zeros...". This means that preferring
+ 32-lanes is a good stop-gap until we have proper wave32 support.  */
+  if (TARGET_RDNA2_PLUS)
+switch (mode)
+  {
+  case E_QImode:
+   return V32QImode;
+  case E_HImode:
+   return V32HImode;
+  case E_SImode:
+   return V32SImode;
+  case E_DImode:
+   return V32DImode;
+  case E_SFmode:
+   return V32SFmode;
+  case E_DFmode:
+   return V32DFmode;
+  default:
+   return word_mode;
+  }
+
   switch (mode)
 {
 case E_QImode:
-- 
2.41.0

[pushed] analyzer: look through casts in taint sanitization [PR112974, PR112975]

2024-03-22 Thread David Malcolm

PR analyzer/112974 and PR analyzer/112975 record false positives
from the analyzer's taint detection where sanitization of the form

  if (VALUE CMP VALUE-OF-WIDER-TYPE)

happens, but wasn't being "noticed" by the taint checker, due to the
test being:

  (WIDER_TYPE)VALUE CMP VALUE-OF-WIDER-TYPE

at the gimple level, and thus taint_state_machine recording
sanitization of (WIDER_TYPE)VALUE, but not of VALUE.

Fix by stripping casts in taint_state_machine::on_condition so that
the state machine records sanitization of the underlying value.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9625-gc6cf5789135236.

gcc/analyzer/ChangeLog:
PR analyzer/112974
PR analyzer/112975
* sm-taint.cc (taint_state_machine::on_condition): Strip away
casts before considering LHS and RHS, to increase the chance of
detecting places where sanitization of a value may have happened.

gcc/testsuite/ChangeLog:
PR analyzer/112974
PR analyzer/112975
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
taint-pr112974.c and taint-pr112975.c to analyzer_kernel_plugin.c.
* gcc.dg/plugin/taint-pr112974.c: New test.
* gcc.dg/plugin/taint-pr112975.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-taint.cc |  8 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp   |  2 +
 gcc/testsuite/gcc.dg/plugin/taint-pr112974.c | 59 
 gcc/testsuite/gcc.dg/plugin/taint-pr112975.c | 53 ++
 4 files changed, 122 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/taint-pr112974.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/taint-pr112975.c

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index c873c9ebd333..1d1e208fdf49 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -1109,6 +1109,14 @@ taint_state_machine::on_condition (sm_context *sm_ctxt,
   return;
 }
 
+  /* Strip away casts before considering LHS and RHS, to increase the
+ chance of detecting places where sanitization of a value may have
+ happened.  */
+  if (const svalue *inner = lhs->maybe_undo_cast ())
+lhs = inner;
+  if (const svalue *inner = rhs->maybe_undo_cast ())
+rhs = inner;
+
   // TODO
   switch (op)
 {
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp 
b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index c26dda1f324b..933f9a5850bc 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -172,6 +172,8 @@ set plugin_test_list [list \
  taint-pr112850-too-complex.c \
  taint-pr112850-unsanitized.c \
  taint-pr112927.c \
+ taint-pr112974.c \
+ taint-pr112975.c \
  taint-pr112977.c } \
 { analyzer_cpython_plugin.c \
  cpython-plugin-test-no-Python-h.c \
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-pr112974.c 
b/gcc/testsuite/gcc.dg/plugin/taint-pr112974.c
new file mode 100644
index ..1af505326c78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/taint-pr112974.c
@@ -0,0 +1,59 @@
+/* Reduced from false positive in Linux kernel in
+   drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-effective-target analyzer } */
+
+typedef unsigned char __u8;
+typedef unsigned short __u16;
+extern unsigned int __max_logical_packages;
+extern unsigned long
+copy_from_user(void* to, const void* from, unsigned long n);
+extern unsigned long
+copy_to_user(void* to, const void* from, unsigned long n);
+struct isst_tpmi_instance_count
+{
+  __u8 socket_id;
+  __u8 count;
+  __u16 valid_mask;
+};
+struct tpmi_per_power_domain_info
+{
+  void* sst_base;
+};
+struct tpmi_sst_struct
+{
+  int number_of_power_domains;
+  struct tpmi_per_power_domain_info* power_domain_info;
+};
+struct tpmi_sst_common_struct
+{
+  int max_index;
+  struct tpmi_sst_struct** sst_inst;
+};
+static struct tpmi_sst_common_struct isst_common;
+int
+isst_if_get_tpmi_instance_count(void* argp)
+{
+  struct isst_tpmi_instance_count tpmi_inst;
+  struct tpmi_sst_struct* sst_inst;
+  int i;
+  if (copy_from_user(_inst, argp, sizeof(tpmi_inst)))
+return -14;
+  if (tpmi_inst.socket_id >= (__max_logical_packages))
+return -22;
+  tpmi_inst.count =
+isst_common.sst_inst[tpmi_inst.socket_id]->number_of_power_domains; /* { 
dg-bogus "use of attacker-controlled value as offset without upper-bounds 
checking" } */
+  sst_inst = isst_common.sst_inst[tpmi_inst.socket_id];
+  tpmi_inst.valid_mask = 0;
+  for (i = 0; i < sst_inst->number_of_power_domains; ++i) {
+struct tpmi_per_power_domain_info* pd_info;
+pd_info = _inst->power_domain_info[i];
+if (pd_info->sst_base)
+  tpmi_inst.valid_mask |= 1UL))) << (i));
+  }
+  if (copy_to_user(argp, _inst, sizeof(tpmi_inst)))
+return -14;
+

[pushed] analyzer: add SARIF property bags to taint diagnostics

2024-03-22 Thread David Malcolm

Another followup to r14-6057-g12b67d1e13b3cf to make it easier to debug
the analyzer.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9624-gd475a4571ef310.

gcc/analyzer/ChangeLog:
* sm-taint.cc: Include "diagnostic-format-sarif.h".
(bounds_to_str): New.
(taint_diagnostic::maybe_add_sarif_properties): New.
(tainted_offset::tainted_offset): Add "offset" param.
(tainted_offset::maybe_add_sarif_properties): New.
(tainted_offset::m_offset): New.
(region_model::check_region_for_taint): Pass offset to
tainted_offset ctor.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-taint.cc | 50 +---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index bbf683f82efc..c873c9ebd333 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "analyzer/program-state.h"
 #include "analyzer/pending-diagnostic.h"
 #include "analyzer/constraint-manager.h"
+#include "diagnostic-format-sarif.h"
 
 #if ENABLE_ANALYZER
 
@@ -71,6 +72,22 @@ enum bounds
   BOUNDS_LOWER
 };
 
+static const char *
+bounds_to_str (enum bounds b)
+{
+  switch (b)
+{
+default:
+  gcc_unreachable ();
+case BOUNDS_NONE:
+  return "BOUNDS_NONE";
+case BOUNDS_UPPER:
+  return "BOUNDS_UPPER";
+case BOUNDS_LOWER:
+  return "BOUNDS_LOWER";
+}
+}
+
 /* An experimental state machine, for tracking "taint": unsanitized uses
of data potentially under an attacker's control.  */
 
@@ -193,6 +210,17 @@ public:
 return diagnostic_event::meaning ();
   }
 
+  void maybe_add_sarif_properties (sarif_object _obj)
+const override
+  {
+sarif_property_bag  = result_obj.get_or_create_properties ();
+#define PROPERTY_PREFIX "gcc/analyzer/taint_diagnostic/"
+props.set (PROPERTY_PREFIX "arg", tree_to_json (m_arg));
+props.set_string (PROPERTY_PREFIX "has_bounds",
+ bounds_to_str (m_has_bounds));
+#undef PROPERTY_PREFIX
+  }
+
 protected:
   const taint_state_machine _sm;
   tree m_arg;
@@ -315,8 +343,10 @@ class tainted_offset : public taint_diagnostic
 {
 public:
   tainted_offset (const taint_state_machine , tree arg,
-  enum bounds has_bounds)
-  : taint_diagnostic (sm, arg, has_bounds)
+ enum bounds has_bounds,
+ const svalue *offset)
+  : taint_diagnostic (sm, arg, has_bounds),
+m_offset (offset)
   {}
 
   const char *get_kind () const final override { return "tainted_offset"; }
@@ -409,6 +439,19 @@ public:
 " checking");
}
   }
+
+  void maybe_add_sarif_properties (sarif_object _obj)
+const final override
+  {
+taint_diagnostic::maybe_add_sarif_properties (result_obj);
+sarif_property_bag  = result_obj.get_or_create_properties ();
+#define PROPERTY_PREFIX "gcc/analyzer/tainted_offset/"
+props.set (PROPERTY_PREFIX "offset", m_offset->to_json ());
+#undef PROPERTY_PREFIX
+  }
+
+private:
+  const svalue *m_offset;
 };
 
 /* Concrete taint_diagnostic subclass for reporting attacker-controlled
@@ -1554,7 +1597,8 @@ region_model::check_region_for_taint (const region *reg,
if (taint_sm.get_taint (state, effective_type, ))
  {
tree arg = get_representative_tree (offset);
-   ctxt->warn (make_unique (taint_sm, arg, b));
+   ctxt->warn (make_unique (taint_sm, arg, b,
+offset));
  }
  }
  break;
-- 
2.26.3

[committed] amdgcn: Add gfx1103 target

2024-03-22 Thread Andrew Stubbs

This patch adds support for the gfx1103 RDNA3 APU integrated graphics
devices.  The ROCm documentation warns that these may not be supported,
but it seems to work at least partially.

This device should be considered "Experimental" at this point, although
so far it seems to be at least as functional as gfx1100.

Committed to mainline.

Andrew

gcc/ChangeLog:

* config.gcc (amdgcn): Add gfx1103 entries.
* config/gcn/gcn-hsa.h (NO_XNACK): Likewise.
(gcn_local_sym_hash): Likewise.
* config/gcn/gcn-opts.h (enum processor_type): Likewise.
(TARGET_GFX1103): New macro.
* config/gcn/gcn.cc (gcn_option_override): Handle gfx1103.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
(gcn_hsa_declare_function_name): Use TARGET_RDNA3, not just gfx1100.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1103__.
* config/gcn/gcn.opt: Add gfx1103.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1103): New.
(main): Handle gfx1103.
* config/gcn/t-omp-device: Add gfx1103 isa.
* doc/install.texi (amdgcn): Add gfx1103.
* doc/invoke.texi (-march): Likewise.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1103.
(gcn_gfx1103_s): New.
(isa_hsa_name): Handle gfx1103.
(isa_code): Likewise.
(max_isa_vgprs): Likewise.
---
 gcc/config.gcc  |  4 ++--
 gcc/config/gcn/gcn-hsa.h|  6 +++---
 gcc/config/gcn/gcn-opts.h   |  4 +++-
 gcc/config/gcn/gcn.cc   | 14 --
 gcc/config/gcn/gcn.h|  2 ++
 gcc/config/gcn/gcn.opt  |  3 +++
 gcc/config/gcn/mkoffload.cc |  5 +
 gcc/config/gcn/t-omp-device |  2 +-
 gcc/doc/install.texi| 13 +++--
 gcc/doc/invoke.texi |  3 +++
 libgomp/plugin/plugin-gcn.c | 10 +-
 11 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 040afabd9ec..87a5c92b6e3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4560,7 +4560,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 
| gfx1100)
+   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 
| gfx1100 | gfx1103)
# OK
;;
*)
@@ -4576,7 +4576,7 @@ case "${target}" in
TM_MULTILIB_CONFIG=
;;
xdefault | xyes)
-   TM_MULTILIB_CONFIG=`echo 
"gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100" | sed 
"s/${with_arch},\?//;s/,$//"`
+   TM_MULTILIB_CONFIG=`echo 
"gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1103" | sed 
"s/${with_arch},\?//;s/,$//"`
;;
*)
TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index c75256dbac3..ac32b8a328f 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -90,7 +90,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
the ELF flags (e_flags) of that generated file must be identical to those
generated by the compiler.  */
 
-#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;" \
+#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;march=gfx1103:;" \
 /* These match the defaults set in gcn.cc.  */ \
 
"!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
 #define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
@@ -106,8 +106,8 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" ABI_VERSION_SPEC "} " \
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
- "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
- "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
+ 
"%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+wavefrontsize64} " \
+ "%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+cumode} 
" \
  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
 #define LIB_SPEC  "-lc"
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 6be2c9204fa..285746f7f4d 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -26,7 +26,8 @@ enum processor_type
   PROCESSOR_GFX908,
   PROCESSOR_GFX90a,
   PROCESSOR_GFX1030,
-  PROCESSOR_GFX1100
+  PROCESSOR_GFX1100,
+  PROCESSOR_GFX1103
 };
 
 #define TARGET_FIJI (gcn_arch == PROCESSOR_FIJI)
@@ -36,6 +37,7 @@ enum processor_type
 #define TARGET_GFX90a (gcn_arch == PROCESSOR_GFX90a)
 #define TARGET_GFX1030 (gcn_arch == PROCESSOR_GFX1030)
 #define TARGET_GFX1100 (gcn_arch

Re: [PATCH] handle unwind tables that are embedded within unwinding code, [PR111731]

2024-03-22 Thread Thomas Neumann


libgcc/ChangeLog:
 PR libgcc/111731
 * unwind-dw2-fde.c: Split unwind ranges if they contain the
 unwind table.
And what I'd suggest is committing to the trunk now, then waiting a week 
or two before backporting to gcc-13.


I will do that, thanks for looking at the patch.

Best

Thomas

Re: [PATCH] RISC-V: Require a extension for ztso testcases with atomic insns

2024-03-22 Thread Palmer Dabbelt


On Thu, 21 Mar 2024 10:00:24 PDT (-0700), Patrick O'Neill wrote:

Use dg_add_options riscv_a to add atomic extension when running compile
tests on non-a targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add
  dg_add_options riscv_a
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c| 1 +
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-5.c| 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-1.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-2.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-3.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-4.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-5.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-6.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-7.c | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c  | 1 +
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c  | 1 +
 17 files changed, 17 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
index 65a4351025d..a9edc33ff39 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
index 03da6b04de0..ad843402bcc 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
index 695306e9d6f..bdae5bb83a6 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
index e7e5ac7cc88..815a72f1e56 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* {

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-22 Thread Andrew Stubbs


On 22/03/2024 08:43, Richard Biener wrote:


  I'll note that we don't pass 'val' there and
'val' is unfortunately
not documented - what's it supposed to be?  I think I placed the original fix in
do_compare_and_jump because we have the full into available there.  So
what's the
do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 'val'
looks fragile)


"val" is the tree expression from which the rtx op0 was expanded. It's
optional, but it's used in emit_cmp_and_jump_insns to determine whether
the target supports tbranch (according to a comment).

I think it would be safe to remove your code as that path does path
"treeop0" to "val".

WDYT?


Looks like a bit of a mess, but yes, I think that sounds good.


Thanks, here's what I pushed.

Andrew
vect: more oversized bitmask fixups

These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).

The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps (some cases were already
handled).

gcc/ChangeLog:

	* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
	bitmasks.
	(do_compare_and_jump): Remove now-redundant similar code.
	* internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector
	bitmasks.
	(add_mask_and_len_args): Likewise.

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..5f74b696b41 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
 	}
 	}
 
+  /* For boolean vectors with less than mode precision
+	 make sure to fill padding with consistent values.  */
+  if (val
+	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+	  && SCALAR_INT_MODE_P (mode))
+	{
+	  auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+	  if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+	{
+	  op0 = expand_binop (mode, and_optab, op0,
+  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+  NULL_RTX, true, OPTAB_WIDEN);
+	  op1 = expand_binop (mode, and_optab, op1,
+  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+  NULL_RTX, true, OPTAB_WIDEN);
+	}
+	}
+
   emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
 			   if_true_label, prob);
 }
@@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
   machine_mode mode;
   int unsignedp;
   enum rtx_code code;
-  unsigned HOST_WIDE_INT nunits;
 
   /* Don't crash if the comparison was erroneous.  */
   op0 = expand_normal (treeop0);
@@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
   emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1));
   op1 = new_op1;
 }
-  /* For boolean vectors with less than mode precision
- make sure to fill padding with consistent values.  */
-  else if (VECTOR_BOOLEAN_TYPE_P (type)
-	   && SCALAR_INT_MODE_P (mode)
-	   && TYPE_VECTOR_SUBPARTS (type).is_constant ()
-	   && maybe_ne (GET_MODE_PRECISION (mode), nunits))
-{
-  gcc_assert (code == EQ || code == NE);
-  op0 = expand_binop (mode, and_optab, op0,
-			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
-			  true, OPTAB_WIDEN);
-  op1 = expand_binop (mode, and_optab, op1,
-			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
-			  true, OPTAB_WIDEN);
-}
 
   do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
 			   ((mode == BLKmode)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
 	   && SSA_NAME_IS_DEFAULT_DEF (rhs)
 	   && VAR_P (SSA_NAME_VAR (rhs)))
 	create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
+  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+	   && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+	   && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+			TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+	{
+	  /* Ensure that the vector bitmasks do not have excess bits.  */
+	  int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+	  rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+  NULL_RTX, true, OPTAB_WIDEN);
+	  create_input_operand ([opno], tmp, TYPE_MODE (rhs_type));
+	}
   else
 	create_input_operand ([opno], rhs_rtx, TYPE_MODE (rhs_type));
   opno += 1;
@@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
 {
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx

Re: [PATCH] c-family, c++: Handle EXCESS_PRECISION_EXPR in pretty printers

2024-03-22 Thread Joseph Myers

On Fri, 22 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> I've noticed that the c-c++-common/gomp/depobj-3.c test FAILs on i686-linux:
> PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 17 (test for 
> warnings, line 15)
> FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 39 (test for 
> warnings, line 37)
> PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 43 (test for errors, 
> line 41)
> PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  (test for warnings, line 45)
> FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17 (test for excess errors)
> Excess errors:
> /home/jakub/src/gcc/gcc/testsuite/c-c++-common/gomp/depobj-3.c:37:38: 
> warning: the 'destroy' expression ''excess_precision_expr' not supported by 
> dump_expr' should be the same as the 'depobj' argument 
> 'obj' [-Wopenmp]
> The following patch replaces that 'excess_precision_expr' not supported by 
> dump_expr
> with (float)(((long double)a) + (long double)5)
> Still ugly and doesn't actually fix the FAIL (will deal with that
> incrementally), but at least valid C/C++ and shows the excess precision
> handling in action.
> 
> Ok for trunk if this passes bootstrap/regtest?
> 
> 2024-03-22  Jakub Jelinek  
> 
> gcc/c/
>   * c-pretty-print.cc (pp_c_cast_expression,
>   c_pretty_printer::expression): Handle EXCESS_PRECISION_EXPR like
>   NOP_EXPR.

The c-pretty-print.cc changes are OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] handle unwind tables that are embedded within unwinding code, [PR111731]

2024-03-22 Thread Jeff Law





On 3/15/24 4:29 AM, Thomas Neumann wrote:

Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
Given that this is a regression, is this okay for gcc 13 and mainline?

The unwinding mechanism registers both the code range and the unwind
table itself within a b-tree lookup structure. That data structure
assumes that is consists of non-overlappping intervals. This
becomes a problem if the unwinding table is embedded within the
code itself, as now the intervals do overlap.

To fix this problem we now keep the unwind tables in a separate
b-tree, which prevents the overlap.

libgcc/ChangeLog:
 PR libgcc/111731
 * unwind-dw2-fde.c: Split unwind ranges if they contain the
 unwind table.
And what I'd suggest is committing to the trunk now, then waiting a week 
or two before backporting to gcc-13.


jeff

Re: [PATCH] handle unwind tables that are embedded within unwinding code, [PR111731]

2024-03-22 Thread Jeff Law





On 3/15/24 4:29 AM, Thomas Neumann wrote:

Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
Given that this is a regression, is this okay for gcc 13 and mainline?

The unwinding mechanism registers both the code range and the unwind
table itself within a b-tree lookup structure. That data structure
assumes that is consists of non-overlappping intervals. This
becomes a problem if the unwinding table is embedded within the
code itself, as now the intervals do overlap.

To fix this problem we now keep the unwind tables in a separate
b-tree, which prevents the overlap.

libgcc/ChangeLog:
 PR libgcc/111731
 * unwind-dw2-fde.c: Split unwind ranges if they contain the
 unwind table.

I'll go ahead and give the final OK here :-)

Jeff

Re: [committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-22 Thread Andrew Stubbs


On 22/03/2024 11:56, Thomas Schwinge wrote:

Hi Andrew!

On 2024-03-21T13:39:53+, Andrew Stubbs  wrote:

CUmode "on" is the setting for compatibility with GCN and CDNA devices.



--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -107,6 +107,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
  "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
+ "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
  "-filetype=obj"


Is this just general housekeeping, or should I be seeing any kind of
change in the GCN target '-march=gfx1100' test results?  (I'm not.)


I'm pretty sure cumode is the default, but defaults can change and now 
we're future-proof. The option doesn't change the ELF flags at all.


The opposite of cumode allows more than 16 wavefronts in a workgroup, 
but they can't physically share a single LDS memory so it would break 
OpenACC broadcasting and reductions, and OpenMP libgomp team metadata. 
Also "cgroup" low-latency memory allocation.


Andrew

Re: scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-22 Thread Jeff Law





On 3/22/24 2:47 AM, Richard Biener wrote:

On Thu, Mar 21, 2024 at 8:56 PM Jeff Law  wrote:




On 3/21/24 11:19 AM, Vineet Gupta wrote:



So if we go back to Robin's observation that scheduling dramatically
increases the instruction count, perhaps we try a run with
-fno-schedule-insns -fno-schedule-insns2 and see how the instruction
counts compare.


Oh yeah ! Robin hinted to this in Tues patchworks meeting too

default   : 2,565,319,368,591
128   : 2,509,741,035,068
256   : 2,527,817,813,612
no-sched{,2}: 1,295,520,567,376

Now we're getting somewhere.  That's in line with expectations.

I would strongly suspect it's -fno-schedule-insns rather than
-fno-schedule-insns2.  The former turns off scheduling before register
allocation, the second turns it off after register allocation.  So if
our theory about spilling is correct, then it must be the first since
the second won't affect register allocation.   While I can speculate
about other potential scheduler impacts, spilling due to sched1's
actions is by far the most likely.


Another option is to enable -fsched-pressure which should help with
this issue.
In theory we're already using that by default -- it's part of what makes 
me so curious to understand what's going on.


jeff

Re: [committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-22 Thread Thomas Schwinge

Hi Andrew!

On 2024-03-21T13:39:53+, Andrew Stubbs  wrote:
> CUmode "on" is the setting for compatibility with GCN and CDNA devices.

> --- a/gcc/config/gcn/gcn-hsa.h
> +++ b/gcc/config/gcn/gcn-hsa.h
> @@ -107,6 +107,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
> "%{" NO_XNACK XNACKOPT "} " \
> "%{" NO_SRAM_ECC SRAMOPT "} " \
> "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
> +   "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
> "-filetype=obj"

Is this just general housekeeping, or should I be seeing any kind of
change in the GCN target '-march=gfx1100' test results?  (I'm not.)


Grüße
 Thomas

RE: [PATCH v4] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-22 Thread Li, Pan2

Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, March 22, 2024 6:06 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; rdapp@gmail.com; vine...@rivosinc.com; 
pal...@rivosinc.com
Subject: Re: [PATCH v4] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits 
for RVV

LGTM, thanks :)

On Fri, Mar 22, 2024 at 2:55 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to introduce one new gcc attribute for RVV.
> This attribute is used to define fixed-length variants of one
> existing sizeless RVV types.
>
> This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
> one args should be the integer constant and its' value is terminated
> by the LMUL and the vector register bits in zvl*b.  For example:
>
> typedef vint32m2_t fixed_vint32m2_t 
> __attribute__((riscv_rvv_vector_bits(128)));
>
> The above type define is valid when -march=rv64gc_zve64d_zvl64b
> (aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
> -march=rv64gcv_zvl128b similar to below.
>
> "error: invalid RVV vector size '128', expected size is '256' based on
> LMUL of type and '-mrvv-vector-bits=zvl'"
>
> Meanwhile, a pre-define macro __riscv_v_fixed_vlen is introduced to
> represent the fixed vlen in a RVV vector register.
>
> For the vint*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> The CMP will return vint*m*_t the same as aarch64 sve. For example:
> typedef vint32m1_t fixed_vint32m1_t 
> __attribute__((riscv_rvv_vector_bits(128)));
> fixed_vint32m1_t less_than (fixed_vint32m1_t a, fixed_vint32m1_t b)
> {
>   return a < b;
> }
>
> For the vfloat*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, -
>
> The CMP will return vfloat*m*_t the same as aarch64 sve. For example:
> typedef vfloat32m1_t fixed_vfloat32m1_t 
> __attribute__((riscv_rvv_vector_bits(128)));
> fixed_vfloat32m1_t less_than (fixed_vfloat32m1_t a, fixed_vfloat32m1_t b)
> {
>   return a < b;
> }
>
> For the vbool*_t types only below operations are allowed except
> the CMP and ALU. The CMP and ALU operations on vbool*_t is not
> well defined currently.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
>
> For the vint*x*m*_t tuple types are not suppored in this patch which is
> compatible with clang.
>
> This patch passed the below testsuites.
> * The riscv fully regression tests.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add pre-define
> macro __riscv_v_fixed_vlen when zvl.
> * config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
> New static func to take care of the RVV types decorated by
> the attributes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-13.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-14.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-15.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-16.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-17.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-18.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-c.cc   |   3 +
>  gcc/config/riscv/riscv.cc |  87 +-
>  .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-13.c |  10

Re: [PATCH v1] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Ajit Agarwal

Hello Jakub:

Thanks for review. Addressed below review comments and sent
version 2 of the patch for review.

Thanks & Regards
Ajit

On 22/03/24 3:06 pm, Jakub Jelinek wrote:
> On Fri, Mar 22, 2024 at 02:55:43PM +0530, Ajit Agarwal wrote:
>> rs6000: Stackoverflow in optimized code on PPC [PR100799]
>>
>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>> the parameters passed to OpenBLAS functions. FlexiBLAS
>> basically provides a BLAS interface where each function
>> is a stub that forwards the arguments to a real BLAS lib,
>> like OpenBLAS.
>>
>> Fixes the corruption of caller frame checking number of
>> arguments is less than equal to GP_ARG_NUM_REG (8)
>> excluding hidden unused DECLS.
> 
> Looks mostly good to me except some comment nits, but I'll defer
> the actual ack to the rs6000 maintainers.
> 
>> +  /* Workaround buggy C/C++ wrappers around Fortran routines with
>> + character(len=constant) arguments if the hidden string length arguments
>> + are passed on the stack; if the callers forget to pass those arguments,
>> + attempting to tail call in such routines leads to stack corruption.
> 
> I thought it isn't just tail calls, even normal calls.
> When the buggy C/C++ wrappers call the function with fewer arguments
> than it actually has and doesn't expect the parameter save area on the
> caller side because of that while the callee expects it and the callee
> actually stores something in the parameter save area, it corrupts whatever
> is in the caller stack frame at that location.
> 
>> + Avoid return stack space for parameters <= 8 excluding hidden string
>> + length argument is passed (partially or fully) on the stack in the
>> + caller and the callee needs to pass any arguments on the stack.  */
>> +  unsigned int num_args = 0;
>> +  unsigned int hidden_length = 0;
>> +
>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>> +   arg; arg = DECL_CHAIN (arg))
>> +{
>> +  num_args++;
>> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
>> +{
>> +  tree parmdef = ssa_default_def (cfun, arg);
>> +  if (parmdef == NULL || has_zero_uses (parmdef))
>> +{
>> +  cum->hidden_string_length = 1;
>> +  hidden_length++;
>> +}
>> +}
>> +   }
>> +
>> +  cum->actual_parm_length = num_args - hidden_length;
>> +
>>/* Check for a longcall attribute.  */
>>if ((!fntype && rs6000_default_long_calls)
>>|| (fntype
>> @@ -1857,7 +1884,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
>> function_arg_info )
>>  
>>return rs6000_finish_function_arg (mode, rvec, k);
>>  }
>> -  else if (align_words < GP_ARG_NUM_REG)
>> + /* Workaround buggy C/C++ wrappers around Fortran routines with
>> +character(len=constant) arguments if the hidden string length arguments
>> +are passed on the stack; if the callers forget to pass those arguments,
>> +attempting to tail call in such routines leads to stack corruption.
>> +Avoid return stack space for parameters <= 8 excluding hidden string
>> +length argument is passed (partially or fully) on the stack in the
>> +caller and the callee needs to pass any arguments on the stack.  */
>> +  else if (align_words < GP_ARG_NUM_REG
>> +   || (cum->hidden_string_length
>> +   && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>  {
>>if (TARGET_32BIT && TARGET_POWERPC64)
>>  return rs6000_mixed_function_arg (mode, type, align_words);
>> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
>> index 68bc45d65ba..a1d3ed00b14 100644
>> --- a/gcc/config/rs6000/rs6000.h
>> +++ b/gcc/config/rs6000/rs6000.h
>> @@ -1490,6 +1490,14 @@ typedef struct rs6000_args
>>int named;/* false for varargs params */
>>int escapes;  /* if function visible outside tu */
>>int libcall;  /* If this is a compiler generated 
>> call.  */
>> +  /* Actual parameter length ignoring hidden paramter.
> 
> s/paramter/parameter/
> 
>> + This is done to C++ wrapper calling fortran module
>> + which has hidden parameter that are not used.  */
>> +  unsigned int actual_parm_length;
>> +  /* Hidden parameters while calling C++ wrapper to fortran
>> + module. Set if there is hidden parameter in fortran
>> + module while called C++ wrapper.  */
> 
> modules in Fortran are something completely different.
> You should IMHO talk about procedures instead of modules
> in both of the above comments (multiple times even).
> 
>> +  unsigned int hidden_string_length : 1;
>>  } CUMULATIVE_ARGS;
>>  
>>  /* Initialize a variable CUM of type CUMULATIVE_ARGS
>> -- 
>> 2.39.3
> 
>   Jakub
>

[PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Ajit Agarwal

Hello All:

This is version-2 of the patch with review comments addressed.

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

Bootstrapped and regtested for powerpc64-linux.gnu.

Thanks & Regards
Ajit


rs6000: Stackoverflow in optimized code on PPC [PR100799]

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

2024-03-22  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR rtk-optimization/100799
* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
generate parameter save area if number of arguments passed
less than equal to GP_ARG_NUM_REG (8) excluding hidden
parameter.
(init_cumulative_args): Check for hidden parameter in fortran
routine and set the flag hidden_string_length and actual
parameter passed excluding hidden unused DECLS.
* config/rs6000/rs6000.h (rs6000_args): Add new field
hidden_string_length and actual_parm_length.
---
 gcc/config/rs6000/rs6000-call.cc | 36 ++--
 gcc/config/rs6000/rs6000.h   |  7 +++
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..fd823c66ea2 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -64,7 +64,7 @@
 #include "ppc-auxv.h"
 #include "targhooks.h"
 #include "opts.h"
-
+#include "tree-dfa.h"
 #include "rs6000-internal.h"
 
 #ifndef TARGET_PROFILE_KERNEL
@@ -584,6 +584,31 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
   if (incoming || cum->prototype)
 cum->nargs_prototype = n_named_args;
 
+  /* When the buggy C/C++ wrappers call the function with fewer arguments
+ than it actually has and doesn't expect the parameter save area on the
+ caller side because of that while the callee expects it and the callee
+ actually stores something in the parameter save area, it corrupts
+ whatever is in the caller stack frame at that location.  */
+  unsigned int num_args = 0;
+  unsigned int hidden_length = 0;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+   arg; arg = DECL_CHAIN (arg))
+{
+  num_args++;
+  if (DECL_HIDDEN_STRING_LENGTH (arg))
+   {
+ tree parmdef = ssa_default_def (cfun, arg);
+ if (parmdef == NULL || has_zero_uses (parmdef))
+   {
+ cum->hidden_string_length = 1;
+ hidden_length++;
+   }
+   }
+   }
+
+  cum->actual_parm_length = num_args - hidden_length;
+
   /* Check for a longcall attribute.  */
   if ((!fntype && rs6000_default_long_calls)
   || (fntype
@@ -1857,7 +1882,14 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
function_arg_info )
 
  return rs6000_finish_function_arg (mode, rvec, k);
}
-  else if (align_words < GP_ARG_NUM_REG)
+ /* When the buggy C/C++ wrappers call the function with fewer arguments
+   than it actually has and doesn't expect the parameter save area on the
+   caller side because of that while the callee expects it and the callee
+   actually stores something in the parameter save area, it corrupts
+   whatever is in the caller stack frame at that location.  */
+  else if (align_words < GP_ARG_NUM_REG
+  || (cum->hidden_string_length
+  && cum->actual_parm_length <= GP_ARG_NUM_REG))
{
  if (TARGET_32BIT && TARGET_POWERPC64)
return rs6000_mixed_function_arg (mode, type, align_words);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 68bc45d65ba..60f23f33879 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1490,6 +1490,13 @@ typedef struct rs6000_args
   int named;   /* false for varargs params */
   int escapes; /* if function visible outside tu */
   int libcall; /* If this is a compiler generated call.  */
+  /* Actual parameter length ignoring hidden parameter.
+ This is done to C++ wrapper calling fortran procedures
+ which has hidden parameter that are not used.  */
+  unsigned int actual_parm_length;
+  /* Set if there is hidden parameters while calling C++ wrapper to
+ fortran procedure.  */
+  unsigned int hidden_string_length : 1;
 } CUMULATIVE_ARGS;
 
 /* Initialize a

Re: [PATCH v4] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-22 Thread Kito Cheng

LGTM, thanks :)

On Fri, Mar 22, 2024 at 2:55 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to introduce one new gcc attribute for RVV.
> This attribute is used to define fixed-length variants of one
> existing sizeless RVV types.
>
> This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
> one args should be the integer constant and its' value is terminated
> by the LMUL and the vector register bits in zvl*b.  For example:
>
> typedef vint32m2_t fixed_vint32m2_t 
> __attribute__((riscv_rvv_vector_bits(128)));
>
> The above type define is valid when -march=rv64gc_zve64d_zvl64b
> (aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
> -march=rv64gcv_zvl128b similar to below.
>
> "error: invalid RVV vector size '128', expected size is '256' based on
> LMUL of type and '-mrvv-vector-bits=zvl'"
>
> Meanwhile, a pre-define macro __riscv_v_fixed_vlen is introduced to
> represent the fixed vlen in a RVV vector register.
>
> For the vint*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> The CMP will return vint*m*_t the same as aarch64 sve. For example:
> typedef vint32m1_t fixed_vint32m1_t 
> __attribute__((riscv_rvv_vector_bits(128)));
> fixed_vint32m1_t less_than (fixed_vint32m1_t a, fixed_vint32m1_t b)
> {
>   return a < b;
> }
>
> For the vfloat*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, -
>
> The CMP will return vfloat*m*_t the same as aarch64 sve. For example:
> typedef vfloat32m1_t fixed_vfloat32m1_t 
> __attribute__((riscv_rvv_vector_bits(128)));
> fixed_vfloat32m1_t less_than (fixed_vfloat32m1_t a, fixed_vfloat32m1_t b)
> {
>   return a < b;
> }
>
> For the vbool*_t types only below operations are allowed except
> the CMP and ALU. The CMP and ALU operations on vbool*_t is not
> well defined currently.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
>
> For the vint*x*m*_t tuple types are not suppored in this patch which is
> compatible with clang.
>
> This patch passed the below testsuites.
> * The riscv fully regression tests.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add pre-define
> macro __riscv_v_fixed_vlen when zvl.
> * config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
> New static func to take care of the RVV types decorated by
> the attributes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-13.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-14.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-15.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-16.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-17.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-18.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
> * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-c.cc   |   3 +
>  gcc/config/riscv/riscv.cc |  87 +-
>  .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-13.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-14.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-15.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-16.c |  11 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-17.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-18.c |  45 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-2.c

Re: [PATCH] s390: testsuite: Fix backprop-6.c

2024-03-22 Thread Andreas Krebbel

On 3/22/24 10:49, Stefan Schulze Frielinghaus wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/backprop-6.c: On s390 we also have a copysign
>   optab for long double.  Thus, scan 3 instead of 2 times for it.
> ---
>  OK for mainline?

Ok. Thanks!

Andreas

[PATCH] s390: testsuite: Fix backprop-6.c

2024-03-22 Thread Stefan Schulze Frielinghaus

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/backprop-6.c: On s390 we also have a copysign
optab for long double.  Thus, scan 3 instead of 2 times for it.
---
 OK for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index 4087ba93018..dbde681e383 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -27,8 +27,9 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" { 
target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 
"backprop" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 
"backprop" { target ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" { 
target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 
"backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 
"backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 3 
"backprop" { target { ifn_copysign && s390*-*-* } } } } */
 /* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" { 
target { ! ifn_copysign } } } } */
 /* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 
"backprop" { target { ! ifn_copysign } } } } */
-- 
2.43.0

[committed] testsuite: Fix up depobj-3.c test on i686-linux [PR112724]

2024-03-22 Thread Jakub Jelinek

Hi!

While I've posted a patch to handle EXCESS_PRECISION_EXPR in C/C++
pretty printing, still we'd need to handle
(a + (float)5)
and
(float)(((long double)a) + (long double)5)
and possibly
(float)(((double)a) + (double)5)
too for s390?, so the following patch just uses -fexcess-precision=fast,
so that the expression is always the same.

Tested on x86_64-linux -m32/-m64, committed to trunk.

2024-03-22  Jakub Jelinek  

PR c++/112724
* c-c++-common/gomp/depobj-3.c: Add -fexcess-precision=fast as
dg-additional-options.

--- gcc/testsuite/c-c++-common/gomp/depobj-3.c.jj   2023-12-01 
08:10:43.769309565 +0100
+++ gcc/testsuite/c-c++-common/gomp/depobj-3.c  2024-03-22 10:12:17.680085428 
+0100
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fexcess-precision=fast" } */
+
 typedef struct __attribute__((__aligned__ (sizeof (void * omp_depend_t {
   char __omp_depend_t__[2 * sizeof (void *)];
 } omp_depend_t;

Jakub

Re: [PATCH v1] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Jakub Jelinek

On Fri, Mar 22, 2024 at 02:55:43PM +0530, Ajit Agarwal wrote:
> rs6000: Stackoverflow in optimized code on PPC [PR100799]
> 
> When using FlexiBLAS with OpenBLAS we noticed corruption of
> the parameters passed to OpenBLAS functions. FlexiBLAS
> basically provides a BLAS interface where each function
> is a stub that forwards the arguments to a real BLAS lib,
> like OpenBLAS.
> 
> Fixes the corruption of caller frame checking number of
> arguments is less than equal to GP_ARG_NUM_REG (8)
> excluding hidden unused DECLS.

Looks mostly good to me except some comment nits, but I'll defer
the actual ack to the rs6000 maintainers.

> +  /* Workaround buggy C/C++ wrappers around Fortran routines with
> + character(len=constant) arguments if the hidden string length arguments
> + are passed on the stack; if the callers forget to pass those arguments,
> + attempting to tail call in such routines leads to stack corruption.

I thought it isn't just tail calls, even normal calls.
When the buggy C/C++ wrappers call the function with fewer arguments
than it actually has and doesn't expect the parameter save area on the
caller side because of that while the callee expects it and the callee
actually stores something in the parameter save area, it corrupts whatever
is in the caller stack frame at that location.

> + Avoid return stack space for parameters <= 8 excluding hidden string
> + length argument is passed (partially or fully) on the stack in the
> + caller and the callee needs to pass any arguments on the stack.  */
> +  unsigned int num_args = 0;
> +  unsigned int hidden_length = 0;
> +
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +   arg; arg = DECL_CHAIN (arg))
> +{
> +  num_args++;
> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
> + {
> +   tree parmdef = ssa_default_def (cfun, arg);
> +   if (parmdef == NULL || has_zero_uses (parmdef))
> + {
> +   cum->hidden_string_length = 1;
> +   hidden_length++;
> + }
> + }
> +   }
> +
> +  cum->actual_parm_length = num_args - hidden_length;
> +
>/* Check for a longcall attribute.  */
>if ((!fntype && rs6000_default_long_calls)
>|| (fntype
> @@ -1857,7 +1884,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
> function_arg_info )
>  
> return rs6000_finish_function_arg (mode, rvec, k);
>   }
> -  else if (align_words < GP_ARG_NUM_REG)
> + /* Workaround buggy C/C++ wrappers around Fortran routines with
> + character(len=constant) arguments if the hidden string length arguments
> + are passed on the stack; if the callers forget to pass those arguments,
> + attempting to tail call in such routines leads to stack corruption.
> + Avoid return stack space for parameters <= 8 excluding hidden string
> + length argument is passed (partially or fully) on the stack in the
> + caller and the callee needs to pass any arguments on the stack.  */
> +  else if (align_words < GP_ARG_NUM_REG
> +|| (cum->hidden_string_length
> +&& cum->actual_parm_length <= GP_ARG_NUM_REG))
>   {
> if (TARGET_32BIT && TARGET_POWERPC64)
>   return rs6000_mixed_function_arg (mode, type, align_words);
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 68bc45d65ba..a1d3ed00b14 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -1490,6 +1490,14 @@ typedef struct rs6000_args
>int named; /* false for varargs params */
>int escapes;   /* if function visible outside tu */
>int libcall;   /* If this is a compiler generated 
> call.  */
> +  /* Actual parameter length ignoring hidden paramter.

s/paramter/parameter/

> + This is done to C++ wrapper calling fortran module
> + which has hidden parameter that are not used.  */
> +  unsigned int actual_parm_length;
> +  /* Hidden parameters while calling C++ wrapper to fortran
> + module. Set if there is hidden parameter in fortran
> + module while called C++ wrapper.  */

modules in Fortran are something completely different.
You should IMHO talk about procedures instead of modules
in both of the above comments (multiple times even).

> +  unsigned int hidden_string_length : 1;
>  } CUMULATIVE_ARGS;
>  
>  /* Initialize a variable CUM of type CUMULATIVE_ARGS
> -- 
> 2.39.3

Jakub

Re: [PATCH] rs6000: Stackoverflow in optimized code on PPC (PR100799)

2024-03-22 Thread Ajit Agarwal

Hello Jakub:

Addressed the below comments and sent version 1 of the patch
for review.

Thanks & Regards
Ajit

On 22/03/24 1:15 pm, Jakub Jelinek wrote:
> On Fri, Mar 22, 2024 at 01:00:21PM +0530, Ajit Agarwal wrote:
>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>> the parameters passed to OpenBLAS functions. FlexiBLAS
>> basically provides a BLAS interface where each function
>> is a stub that forwards the arguments to a real BLAS lib,
>> like OpenBLAS.
>>
>> Fixes the corruption of caller frame checking number of
>> arguments is less than equal to GP_ARG_NUM_REG (8)
>> excluding hidden unused DECLS.
> 
> Thanks for working on this.
> 
>> 2024-03-22  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> PR rtk-optimization/100799
>> * config/rs600/rs600-calls.cc (rs6000_function_arg): Don't
> 
> These 2 lines are 8 space indented rather than tab.
> 
>>  generate parameter save area if number of arguments passed
>>  less than equal to GP_ARG_NUM_REG (8) excluding hidden
>>  paramter.
>>  * function.cc (assign_parms_initialize_all): Check for hidden
>>  parameter in fortran code and set the flag hidden_string_length
>>  and actual paramter passed excluding hidden unused DECLS.
> 
> s/paramter/parameter/
> 
>>  * function.h: Add new field hidden_string_length and
>>  actual_parm_length in function structure.
> 
> Why do you need to change generic code for something that will only be
> used by a single target?
> I mean, why don't you add the extra members in rs6000.h (struct rs6000_args)
> and initialize them in rs6000-call.cc (init_cumulative_args) -
> the function.cc function you've modified is the only one which uses
> INIT_CUMULATIVE_INCOMING_ARGS and in that case init_cumulative_args is
> called with incoming == true, so move the stuff from function.cc there.
> 
>> --- a/gcc/config/rs6000/rs6000-call.cc
>> +++ b/gcc/config/rs6000/rs6000-call.cc
>> @@ -1857,7 +1857,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
>> function_arg_info )
>>  
>>return rs6000_finish_function_arg (mode, rvec, k);
>>  }
>> -  else if (align_words < GP_ARG_NUM_REG)
>> + /* Workaround buggy C/C++ wrappers around Fortran routines with
>> +character(len=constant) arguments if the hidden string length arguments
>> +are passed on the stack; if the callers forget to pass those arguments,
>> +attempting to tail call in such routines leads to stack corruption.
>> +Avoid return stack space for parameters <= 8 excluding hidden string
>> +length argument is passed (partially or fully) on the stack in the
>> +caller and the callee needs to pass any arguments on the stack.  */
>> +  else if (align_words < GP_ARG_NUM_REG
>> +   || (cfun->hidden_string_length
>> +   && cfun->actual_parm_length <= GP_ARG_NUM_REG))
>>  {
>>if (TARGET_32BIT && TARGET_POWERPC64)
>>  return rs6000_mixed_function_arg (mode, type, align_words);
>> diff --git a/gcc/function.cc b/gcc/function.cc
>> index 3cef6c17bce..1318564b466 100644
>> --- a/gcc/function.cc
>> +++ b/gcc/function.cc
>> @@ -2326,6 +2326,32 @@ assign_parms_initialize_all (struct 
>> assign_parm_data_all *all)
>>  #endif
>>all->args_so_far = pack_cumulative_args (>args_so_far_v);
>>  
>> +  unsigned int num_args = 0;
>> +  unsigned int hidden_length = 0;
>> +
>> +  /* Workaround buggy C/C++ wrappers around Fortran routines with
>> + character(len=constant) arguments if the hidden string length arguments
>> + are passed on the stack; if the callers forget to pass those arguments,
>> + attempting to tail call in such routines leads to stack corruption.
>> + Avoid return stack space for parameters <= 8 excluding hidden string
>> + length argument is passed (partially or fully) on the stack in the
>> + caller and the callee needs to pass any arguments on the stack.  */
>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>> +   arg; arg = DECL_CHAIN (arg))
>> +{
>> +  num_args++;
>> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
>> +{
>> +  tree parmdef = ssa_default_def (cfun, arg);
>> +  if (parmdef == NULL || has_zero_uses (parmdef))
>> +{
>> +  cfun->hidden_string_length = 1;
>> +  hidden_length++;
>> +}
>> +}
>> +   }
>> +
>> +  cfun->actual_parm_length = num_args - hidden_length;
>>  #ifdef INCOMING_REG_PARM_STACK_SPACE
>>all->reg_parm_stack_space
>>  = INCOMING_REG_PARM_STACK_SPACE (current_function_decl);
>> diff --git a/gcc/function.h b/gcc/function.h
>> index 19e15bd63b0..5984f0007c2 100644
>> --- a/gcc/function.h
>> +++ b/gcc/function.h
>> @@ -346,6 +346,11 @@ struct GTY(()) function {
>>/* Last assigned dependence info clique.  */
>>unsigned short last_clique;
>>  
>> +  /* Actual parameter length ignoring hidden paramter.
>> + This is done to C++ wrapper calling fortran module
>> + which has hidden parameter that are not used.

[PATCH v1] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Ajit Agarwal

Hello Jakub:

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Stackoverflow in optimized code on PPC [PR100799]

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

2024-03-22  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR rtk-optimization/100799
* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
generate parameter save area if number of arguments passed
less than equal to GP_ARG_NUM_REG (8) excluding hidden
parameter.
(init_cumulative_args): Check for hidden parameter in fortran
routine and set the flag hidden_string_length and actual
parameter passed excluding hidden unused DECLS.
* config/rs6000/rs6000.h (rs6000_args): Add new field
hidden_string_length and actual_parm_length.
---
 gcc/config/rs6000/rs6000-call.cc | 40 ++--
 gcc/config/rs6000/rs6000.h   |  8 +++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..2620ce16943 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -64,7 +64,7 @@
 #include "ppc-auxv.h"
 #include "targhooks.h"
 #include "opts.h"
-
+#include "tree-dfa.h"
 #include "rs6000-internal.h"
 
 #ifndef TARGET_PROFILE_KERNEL
@@ -584,6 +584,33 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
   if (incoming || cum->prototype)
 cum->nargs_prototype = n_named_args;
 
+  /* Workaround buggy C/C++ wrappers around Fortran routines with
+ character(len=constant) arguments if the hidden string length arguments
+ are passed on the stack; if the callers forget to pass those arguments,
+ attempting to tail call in such routines leads to stack corruption.
+ Avoid return stack space for parameters <= 8 excluding hidden string
+ length argument is passed (partially or fully) on the stack in the
+ caller and the callee needs to pass any arguments on the stack.  */
+  unsigned int num_args = 0;
+  unsigned int hidden_length = 0;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+   arg; arg = DECL_CHAIN (arg))
+{
+  num_args++;
+  if (DECL_HIDDEN_STRING_LENGTH (arg))
+   {
+ tree parmdef = ssa_default_def (cfun, arg);
+ if (parmdef == NULL || has_zero_uses (parmdef))
+   {
+ cum->hidden_string_length = 1;
+ hidden_length++;
+   }
+   }
+   }
+
+  cum->actual_parm_length = num_args - hidden_length;
+
   /* Check for a longcall attribute.  */
   if ((!fntype && rs6000_default_long_calls)
   || (fntype
@@ -1857,7 +1884,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
function_arg_info )
 
  return rs6000_finish_function_arg (mode, rvec, k);
}
-  else if (align_words < GP_ARG_NUM_REG)
+ /* Workaround buggy C/C++ wrappers around Fortran routines with
+   character(len=constant) arguments if the hidden string length arguments
+   are passed on the stack; if the callers forget to pass those arguments,
+   attempting to tail call in such routines leads to stack corruption.
+   Avoid return stack space for parameters <= 8 excluding hidden string
+   length argument is passed (partially or fully) on the stack in the
+   caller and the callee needs to pass any arguments on the stack.  */
+  else if (align_words < GP_ARG_NUM_REG
+  || (cum->hidden_string_length
+  && cum->actual_parm_length <= GP_ARG_NUM_REG))
{
  if (TARGET_32BIT && TARGET_POWERPC64)
return rs6000_mixed_function_arg (mode, type, align_words);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 68bc45d65ba..a1d3ed00b14 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1490,6 +1490,14 @@ typedef struct rs6000_args
   int named;   /* false for varargs params */
   int escapes; /* if function visible outside tu */
   int libcall; /* If this is a compiler generated call.  */
+  /* Actual parameter length ignoring hidden paramter.
+ This is done to C++ wrapper calling fortran module
+ which has hidden parameter

Re: [PATCH] c-family, c++: Handle EXCESS_PRECISION_EXPR in pretty printers

2024-03-22 Thread Rainer Orth

Hi Jakub,

> I've noticed that the c-c++-common/gomp/depobj-3.c test FAILs on i686-linux:
> PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 17 (test for 
> warnings, line 15)
> FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 39 (test for 
> warnings, line 37)
> PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 43 (test for errors, 
> line 41)
> PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  (test for warnings, line 45)
> FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17 (test for excess errors)
> Excess errors:
> /home/jakub/src/gcc/gcc/testsuite/c-c++-common/gomp/depobj-3.c:37:38: 
> warning: the 'destroy' expression ''excess_precision_expr' not supported by 
> dump_expr' should be the same as the 'depobj' argument 
> 'obj' [-Wopenmp]
> The following patch replaces that 'excess_precision_expr' not supported by 
> dump_expr
> with (float)(((long double)a) + (long double)5)

this is PR c++/112724.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH] c-family, c++: Handle EXCESS_PRECISION_EXPR in pretty printers

2024-03-22 Thread Jakub Jelinek

Hi!

I've noticed that the c-c++-common/gomp/depobj-3.c test FAILs on i686-linux:
PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 17 (test for warnings, 
line 15)
FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 39 (test for warnings, 
line 37)
PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 43 (test for errors, 
line 41)
PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  (test for warnings, line 45)
FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17 (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/c-c++-common/gomp/depobj-3.c:37:38: warning: 
the 'destroy' expression ''excess_precision_expr' not supported by 
dump_expr' should be the same as the 'depobj' argument 'obj' 
[-Wopenmp]
The following patch replaces that 'excess_precision_expr' not supported by 
dump_expr
with (float)(((long double)a) + (long double)5)
Still ugly and doesn't actually fix the FAIL (will deal with that
incrementally), but at least valid C/C++ and shows the excess precision
handling in action.

Ok for trunk if this passes bootstrap/regtest?

2024-03-22  Jakub Jelinek  

gcc/c/
* c-pretty-print.cc (pp_c_cast_expression,
c_pretty_printer::expression): Handle EXCESS_PRECISION_EXPR like
NOP_EXPR.
gcc/cp/
* error.cc (dump_expr): Handle EXCESS_PRECISION_EXPR like NOP_EXPR.

--- gcc/c-family/c-pretty-print.cc.jj   2024-01-12 10:07:57.744858004 +0100
+++ gcc/c-family/c-pretty-print.cc  2024-03-22 09:58:56.640001991 +0100
@@ -2327,6 +2327,7 @@ pp_c_cast_expression (c_pretty_printer *
 case FIX_TRUNC_EXPR:
 CASE_CONVERT:
 case VIEW_CONVERT_EXPR:
+case EXCESS_PRECISION_EXPR:
   if (!location_wrapper_p (e))
pp_c_type_cast (pp, TREE_TYPE (e));
   pp_c_cast_expression (pp, TREE_OPERAND (e, 0));
@@ -2753,6 +2754,7 @@ c_pretty_printer::expression (tree e)
 case FIX_TRUNC_EXPR:
 CASE_CONVERT:
 case VIEW_CONVERT_EXPR:
+case EXCESS_PRECISION_EXPR:
   pp_c_cast_expression (this, e);
   break;
 
--- gcc/cp/error.cc.jj  2024-01-20 12:32:34.157939870 +0100
+++ gcc/cp/error.cc 2024-03-22 10:00:38.259610171 +0100
@@ -2662,6 +2662,7 @@ dump_expr (cxx_pretty_printer *pp, tree
 CASE_CONVERT:
 case IMPLICIT_CONV_EXPR:
 case VIEW_CONVERT_EXPR:
+case EXCESS_PRECISION_EXPR:
   {
tree op = TREE_OPERAND (t, 0);
 

Jakub

Re: [PATCH] testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147]

2024-03-22 Thread Rainer Orth

Hi Richard,

> On Thu, 21 Mar 2024, Rainer Orth wrote:
>
>> gcc.dg/vect/bb-slp-32.c currently XPASSes on 32 and 64-bit Solaris/SPARC:
>> 
>> XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump
>> slp2 "vectorization is not profitable"
>> XPASS: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not
>> profitable"
>> 
>> At least on SPARC, the current xfail can simply go, but I'm highly
>> uncertain if this is right in general.
>> 
>> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
>> 
>> Ok for trunk?
>
> The condition was made for the case where vectorization fails even when
> not considering costing.  But given we now do
>
>   p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);
>
> that condition doesn't make sense anymore (forgot to update it in my
> r11-6715-gb36c9cd09472c8 change).
>
> In principle the testcase should be profitable to vectorize with
> the SLP reduction support now (and we'd vectorize it that way).
> But we fail to apply SLP node CSE when merging the SLP instance
> into a common subgraph, so we over-estimate cost (and perform
> double code generation that's later CSEd).
>
> That it's still not profitable on x86_64 for me is a quite narrow loss:
>
>   Vector cost: 144
>   Scalar cost: 140
>
> So ideally we'd key the FAIL on .REDUC_PLUS not being available for
> V4SImode but then we also try V2SImode where the reduction isn't
> recognized.  So the testcase wouldn't work well for targets comparing
> cost.
>
> I'd say we remove the dg-final completely for now.  I filed PR114413
> about the costing/CSE issue above.

Thanks.  This is what I committed after re-testing.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-03-19  Rainer Orth  

gcc/testsuite:
PR tree-optimization/96147
* gcc.dg/vect/bb-slp-32.c (dg-final): Remove.

# HG changeset patch
# Parent  b3b6fa4472bc1f2b170e2b736852ec93bae94480
testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147]

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
@@ -24,5 +24,3 @@ int foo (int *p, int a, int b)
   bar (x);
   return sum;
 }
-
-/* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" { xfail  { vect_no_align && { ! vect_hw_misalign } } } } } */

Re: [PATCH] Another ICE after conflicting types of redeclaration [PR109619]

2024-03-22 Thread Richard Biener

On Fri, Mar 22, 2024 at 5:20 AM Andrew Pinski  wrote:
>
> This another one of these ICE after error issues with the
> gimplifier and a fallout from r12-3278-g823685221de986af.
> This case happens when we are trying to fold memcpy/memmove.
> There is already code to try to catch ERROR_MARKs as arguments
> to the builtins so just need to change them to use error_operand_p
> which checks the type of the expression to see if it was an error mark
> also.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> PR c/109619
> * builtins.cc (fold_builtin_1): Use error_operand_p
> instead of checking against ERROR_MARK.
> (fold_builtin_2): Likewise.
> (fold_builtin_3): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR c/109619
> * gcc.dg/redecl-26.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/builtins.cc  | 12 ++--
>  gcc/testsuite/gcc.dg/redecl-26.c | 14 ++
>  2 files changed, 20 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/redecl-26.c
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index eda8bea9c4b..bb74b5cbcd6 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -10461,7 +10461,7 @@ fold_builtin_1 (location_t loc, tree expr, tree 
> fndecl, tree arg0)
>tree type = TREE_TYPE (TREE_TYPE (fndecl));
>enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
>
> -  if (TREE_CODE (arg0) == ERROR_MARK)
> +  if (error_operand_p (arg0))
>  return NULL_TREE;
>
>if (tree ret = fold_const_call (as_combined_fn (fcode), type, arg0))
> @@ -10601,8 +10601,8 @@ fold_builtin_2 (location_t loc, tree expr, tree 
> fndecl, tree arg0, tree arg1)
>tree type = TREE_TYPE (TREE_TYPE (fndecl));
>enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
>
> -  if (TREE_CODE (arg0) == ERROR_MARK
> -  || TREE_CODE (arg1) == ERROR_MARK)
> +  if (error_operand_p (arg0)
> +  || error_operand_p (arg1))
>  return NULL_TREE;
>
>if (tree ret = fold_const_call (as_combined_fn (fcode), type, arg0, arg1))
> @@ -10693,9 +10693,9 @@ fold_builtin_3 (location_t loc, tree fndecl,
>tree type = TREE_TYPE (TREE_TYPE (fndecl));
>enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
>
> -  if (TREE_CODE (arg0) == ERROR_MARK
> -  || TREE_CODE (arg1) == ERROR_MARK
> -  || TREE_CODE (arg2) == ERROR_MARK)
> +  if (error_operand_p (arg0)
> +  || error_operand_p (arg1)
> +  || error_operand_p (arg2))
>  return NULL_TREE;
>
>if (tree ret = fold_const_call (as_combined_fn (fcode), type,
> diff --git a/gcc/testsuite/gcc.dg/redecl-26.c 
> b/gcc/testsuite/gcc.dg/redecl-26.c
> new file mode 100644
> index 000..5f8889c4c39
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/redecl-26.c
> @@ -0,0 +1,14 @@
> +/* We used to ICE while folding memcpy and memmove.
> +   PR c/109619. */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int *a1, *a2;
> +
> +void foo(__SIZE_TYPE__ a3) /* { dg-note "" }  */
> +{
> +  __builtin_memcpy(a1, a2, a3);
> +  __builtin_memmove(a1, a2, a3);
> +  int *a3; /* { dg-error "redeclared as different kind of symbol" } */
> +}
> +
> --
> 2.43.0
>

Re: scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak)

2024-03-22 Thread Richard Biener

On Thu, Mar 21, 2024 at 8:56 PM Jeff Law  wrote:
>
>
>
> On 3/21/24 11:19 AM, Vineet Gupta wrote:
>
> >>
> >> So if we go back to Robin's observation that scheduling dramatically
> >> increases the instruction count, perhaps we try a run with
> >> -fno-schedule-insns -fno-schedule-insns2 and see how the instruction
> >> counts compare.
> >
> > Oh yeah ! Robin hinted to this in Tues patchworks meeting too
> >
> > default   : 2,565,319,368,591
> > 128   : 2,509,741,035,068
> > 256   : 2,527,817,813,612
> > no-sched{,2}: 1,295,520,567,376
> Now we're getting somewhere.  That's in line with expectations.
>
> I would strongly suspect it's -fno-schedule-insns rather than
> -fno-schedule-insns2.  The former turns off scheduling before register
> allocation, the second turns it off after register allocation.  So if
> our theory about spilling is correct, then it must be the first since
> the second won't affect register allocation.   While I can speculate
> about other potential scheduler impacts, spilling due to sched1's
> actions is by far the most likely.

Another option is to enable -fsched-pressure which should help with
this issue.

> Given the magnitude here, I would bet we can see this pretty clearly if
> you've got function level or block level count data for those runs.  I'd
> start with that, ideally narrowing things down to a function or hot loop
> within a function which shows a huge delta.
>
>  From that we can then look at the IRA and LRA dumps and correlate what
> we see there with the before/after scheduling dumps to see how we've
> lengthened lifetimes in critical locations.
>
> I'd probably start with the IRA dump.  It's going to have annotations in
> its dump output like "Potential Spill" which may guide us.  In simplest
> terms a pseudo is trivially allocatable when it has fewer neighbors in
> the conflict graph than available hard registers.  If it has more
> neighbors in the conflict graph than available hard registers, then it's
> potentially going to be spilled -- we can't know during this phase of
> allocation.
>
> As we pop registers off the coloring stack, some neighbors of the pseudo
> in question may end up allocated into the same hard register.  That can
> sometimes result in a hard register being available.  It might be easier
> to see with a graph
>
>  a--b--c
> |
> d
>
> Where a..d are pseudo registers.  If two pseudos are connected by an
> edge, then they have overlapping lifetimes and can't be allocated to the
> same hard register.  So as we can see b conflicts with a, c & d.  If we
> only have two hard registers, then b is not trivially colorable and will
> be marked as a potential spill.
>
> During coloring we may end up allocating a, c & d to the same hard
> register (they don't conflict, so its safe).  If that happens, then
> there would be a register available for b.
>
> Anyway, that should explain why b would be marked as a potential spill
> and how it might end up getting a hard register anyway.
>
> The hope is we can see the potential spills increasing.  At which point
> we can walk backwards to sched1 and dive into its scheduling decisions.
>
> Jeff

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-22 Thread Richard Biener

On Thu, Mar 21, 2024 at 5:07 PM Andrew Stubbs  wrote:
>
> On 21/03/2024 15:18, Richard Biener wrote:
> > On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs  wrote:
> >>
> >> My previous patch to fix this problem with xor was rejected because we
> >> want to fix these issues only at the point of use.  That patch produced
> >> slightly better code, in this example, but this works too
> >>
> >> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> >> configured to use V32 instead of V64 (I plan to do this for RDNA devices).
> >>
> >> The problem was that a "not" operation on the mask inadvertently enabled
> >> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the 
> >> mask
> >> when calling internal functions (in this case COND_MINUS), when doing 
> >> masked
> >> loads and stores, and when doing conditional jumps.
> >>
> >> OK for mainline?
> >>
> >> Andrew
> >>
> >> gcc/ChangeLog:
> >>
> >>  * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
> >>  bitmaps.
> >>  * internal-fn.cc (expand_fn_using_insn): Likewise.
> >>  (add_mask_and_len_args): Likewise.
> >> ---
> >>   gcc/dojump.cc  | 16 
> >>   gcc/internal-fn.cc | 26 ++
> >>   2 files changed, 42 insertions(+)
> >>
> >> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> >> index 88600cb42d3..8df86957e83 100644
> >> --- a/gcc/dojump.cc
> >> +++ b/gcc/dojump.cc
> >> @@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum 
> >> rtx_code code, int unsignedp,
> >>  }
> >>  }
> >>
> >> +  if (val
> >> + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> >> + && SCALAR_INT_MODE_P (mode))
> >> +   {
> >> + auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant 
> >> ();
> >> + if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> >> +   {
> >> + op0 = expand_binop (mode, and_optab, op0,
> >> + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 
> >> 1),
> >> + NULL_RTX, true, OPTAB_WIDEN);
> >> + op1 = expand_binop (mode, and_optab, op1,
> >> + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 
> >> 1),
> >> + NULL_RTX, true, OPTAB_WIDEN);
> >> +   }
> >> +   }
> >> +
> >
> > Can we then remove the same code from do_compare_and_jump before the call to
> > do_compare_rtx_and_jump?
>
> It's called from do_jump.
>
> >  I'll note that we don't pass 'val' there and
> > 'val' is unfortunately
> > not documented - what's it supposed to be?  I think I placed the original 
> > fix in
> > do_compare_and_jump because we have the full into available there.  So
> > what's the
> > do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 
> > 'val'
> > looks fragile)
>
> "val" is the tree expression from which the rtx op0 was expanded. It's
> optional, but it's used in emit_cmp_and_jump_insns to determine whether
> the target supports tbranch (according to a comment).
>
> I think it would be safe to remove your code as that path does path
> "treeop0" to "val".
>
> WDYT?

Looks like a bit of a mess, but yes, I think that sounds good.

Thanks,
Richard.

> > The other hunks below are OK.
>
> Thanks.
>
> Andrew
>
> > Thanks,
> > Richard.
> >
> >> emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, 
> >> val,
> >> if_true_label, prob);
> >>   }
> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >> index fcf47c7fa12..5269f0ac528 100644
> >> --- a/gcc/internal-fn.cc
> >> +++ b/gcc/internal-fn.cc
> >> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> >> unsigned int noutputs,
> >> && SSA_NAME_IS_DEFAULT_DEF (rhs)
> >> && VAR_P (SSA_NAME_VAR (rhs)))
> >>  create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
> >> +  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> >> +  && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> >> +  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> >> +   TYPE_VECTOR_SUBPARTS (rhs_type).to_constant 
> >> ()))
> >> +   {
> >> + /* Ensure that the vector bitmasks do not have excess bits.  */
> >> + int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
> >> + rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
> >> + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 
> >> 1),
> >> + NULL_RTX, true, OPTAB_WIDEN);
> >> + create_input_operand ([opno], tmp, TYPE_MODE (rhs_type));
> >> +   }
> >> else
> >>  create_input_operand ([opno], rhs_rtx, TYPE_MODE (rhs_type));
> >> opno += 1;
> >> @@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned 
> >> int opno,

[PATCH] c++: Fix bogus warnings about ignored annotations [PR114409]

2024-03-22 Thread Jakub Jelinek

Hi!

The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition.
This is because the assumption is that ANNOTATE_EXPR argument is used
immediately in a COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond
wraps the ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer
holds.

The following patch fixes that by adding the TRUTH_NOT_EXPR inside of the
ANNOTATE_EXPR argument if any.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, the PR is mostly about ICE with the annotations used in a template,
this patch doesn't change anything on that and I really don't know what
should be done in that case.

2024-03-22  Jakub Jelinek  

PR c++/114409
* semantics.cc (simplify_loop_decl_cond): Use cp_build_unary_op with
TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather than
ANNOTATE_EXPR itself.

* g++.dg/ext/pr114409.C: New test.

--- gcc/cp/semantics.cc.jj  2024-03-01 17:27:58.862888609 +0100
+++ gcc/cp/semantics.cc 2024-03-21 15:24:57.296857864 +0100
@@ -799,7 +799,11 @@ simplify_loop_decl_cond (tree *cond_p, t
   *cond_p = boolean_true_node;
 
   if_stmt = begin_if_stmt ();
-  cond = cp_build_unary_op (TRUTH_NOT_EXPR, cond, false, tf_warning_or_error);
+  cond_p = 
+  while (TREE_CODE (*cond_p) == ANNOTATE_EXPR)
+cond_p = _OPERAND (*cond_p, 0);
+  *cond_p = cp_build_unary_op (TRUTH_NOT_EXPR, *cond_p, false,
+  tf_warning_or_error);
   finish_if_stmt_cond (cond, if_stmt);
   finish_break_stmt ();
   finish_then_clause (if_stmt);
--- gcc/testsuite/g++.dg/ext/pr114409.C.jj  2024-03-21 15:27:44.077661090 
+0100
+++ gcc/testsuite/g++.dg/ext/pr114409.C 2024-03-21 15:27:15.331039726 +0100
@@ -0,0 +1,22 @@
+// PR c++/114409
+// { dg-do compile }
+// { dg-options "-O2 -Wall" }
+
+void qux (int);
+int foo (int);
+
+void
+bar (int x)
+{
+  #pragma GCC novector
+  while (int y = foo (x))  // { dg-bogus "ignoring loop annotation" }
+qux (y);
+}
+
+void
+baz (int x)
+{
+  #pragma GCC novector
+  for (; int y = foo (x); )// { dg-bogus "ignoring loop annotation" }
+qux (y);
+}

Jakub

Re: [PATCH] ubsan: Don't -fsanitize=null instrument __seg_fs/gs pointers [PR111736]

2024-03-22 Thread Richard Biener

On Fri, 22 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> On x86 and avr some address spaces allow 0 pointers (on avr actually
> even generic as, but libsanitizer isn't ported to it and
> I'm not convinced we should completely kill -fsanitize=null in that
> case).
> The following patch makes sure those aren't diagnosed for -fsanitize=null,
> though they are still sanitized for -fsanitize=alignment.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-03-22  Jakub Jelinek  
> 
>   PR sanitizer/111736
>   * ubsan.cc (ubsan_expand_null_ifn, instrument_mem_ref): Avoid
>   SANITIZE_NULL instrumentation for non-generic address spaces
>   for which targetm.addr_space.zero_address_valid (as) is true.
> 
>   * gcc.dg/ubsan/pr111736.c: New test.
> 
> --- gcc/ubsan.cc.jj   2024-03-13 09:16:37.791885010 +0100
> +++ gcc/ubsan.cc  2024-03-22 08:11:50.093131678 +0100
> @@ -858,6 +858,13 @@ ubsan_expand_null_ifn (gimple_stmt_itera
>   }
>  }
>check_null = sanitize_flags_p (SANITIZE_NULL);
> +  if (check_null && POINTER_TYPE_P (TREE_TYPE (ptr)))
> +{
> +  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (ptr)));
> +  if (!ADDR_SPACE_GENERIC_P (as)
> +   && targetm.addr_space.zero_address_valid (as))
> + check_null = false;
> +}
>  
>if (check_align == NULL_TREE && !check_null)
>  {
> @@ -1447,8 +1454,15 @@ instrument_mem_ref (tree mem, tree base,
>if (align <= 1)
>   align = 0;
>  }
> -  if (align == 0 && !sanitize_flags_p (SANITIZE_NULL))
> -return;
> +  if (align == 0)
> +{
> +  if (!sanitize_flags_p (SANITIZE_NULL))
> + return;
> +  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (base));
> +  if (!ADDR_SPACE_GENERIC_P (as)
> +   && targetm.addr_space.zero_address_valid (as))
> + return;
> +}
>tree t = TREE_OPERAND (base, 0);
>if (!POINTER_TYPE_P (TREE_TYPE (t)))
>  return;
> --- gcc/testsuite/gcc.dg/ubsan/pr111736.c.jj  2024-03-21 13:50:49.482348296 
> +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr111736.c 2024-03-21 13:53:33.789091054 
> +0100
> @@ -0,0 +1,23 @@
> +/* PR sanitizer/111736 */
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fsanitize=null,alignment -fdump-tree-optimized 
> -ffat-lto-objects" } */
> +/* { dg-final { scan-tree-dump-times "__ubsan_handle_type_mismatch" 1 
> "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "p_\[0-9]*.D. \[=!]= 0" "optimized" } } */
> +
> +#ifdef __x86_64__
> +#define SEG __seg_fs
> +#else
> +#define SEG __seg_gs
> +#endif
> +
> +int
> +foo (int SEG *p, int *q)
> +{
> +  return *p;
> +}
> +
> +__attribute__((no_sanitize("alignment"))) int
> +bar (int SEG *p, int *q)
> +{
> +  return *p;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] bitint: Some bitint store fixes [PR114405]

2024-03-22 Thread Richard Biener

On Fri, 22 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch fixes some bugs in the handling of stores to large/huge
> _BitInt bitfields.
> 
> In the first 2 hunks we are processing the most significant limb of the
> actual type (not necessarily limb in the storage), and so we know it is
> either partial or full limb, so [1, limb_prec] bits rather than
> [0, limb_prec - 1] bits as the code actually assumed.  So, those 2
> spots are fixed by making sure if tprec is a multiple of limb_prec we
> actually use limb_prec bits rather than 0.  Otherwise, it e.g. happily
> could create and use 0 precision INTEGER_TYPE even when it actually
> should have processed 64 bits, or for non-zero bo_bit could handle just
> say 1 bit rather than 64 bits plus 1 bit in the last hunk spot.
> 
> In the last hunk we are dealing with the extra bits in the last storage
> limb, and the code was e.g. happily creating 65 bit precision INTEGER_TYPE,
> even when we really should use 1 bit precision in that case.  Also, it
> used a wrong offset in that case.
> 
> The large testcase covers all these cases.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-03-22  Jakub Jelinek  
> 
>   PR tree-optimization/114405
>   * gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
>   Set rprec to limb_prec rather than 0 if tprec is divisible by
>   limb_prec.  In the last bf_cur handling, set rprec to (tprec + bo_bit)
>   % limb_prec rather than tprec % limb_prec and use just rprec instead
>   of rprec + bo_bit.  For build_bit_field_ref offset, divide
>   (tprec + bo_bit) by limb_prec rather than just tprec.
> 
> * gcc.dg/bitint-103.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-03-20 15:03:30.868068343 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-21 12:51:26.728296098 +0100
> @@ -2737,7 +2737,7 @@ bitint_large_huge::lower_mergeable_stmt
> && tree_fits_uhwi_p (idx))
>   {
> unsigned int tprec = TYPE_PRECISION (type);
> -   unsigned int rprec = tprec % limb_prec;
> +   unsigned int rprec = (tprec - 1) % limb_prec + 1;
> if (rprec + bo_bit < (unsigned) limb_prec)
>   {
> tree ftype
> @@ -2882,7 +2882,7 @@ bitint_large_huge::lower_mergeable_stmt
> if (nlhs && i == cnt - 1)
>   {
> unsigned int tprec = TYPE_PRECISION (type);
> -   unsigned int rprec = tprec % limb_prec;
> +   unsigned int rprec = (tprec - 1) % limb_prec + 1;
> if (rprec + bo_bit < (unsigned) limb_prec)
>   {
> tree ftype
> @@ -2934,11 +2934,11 @@ bitint_large_huge::lower_mergeable_stmt
>if (bf_cur != NULL_TREE)
>  {
>unsigned int tprec = TYPE_PRECISION (type);
> -  unsigned int rprec = tprec % limb_prec;
> -  tree ftype = build_nonstandard_integer_type (rprec + bo_bit, 1);
> +  unsigned int rprec = (tprec + bo_bit) % limb_prec;
> +  tree ftype = build_nonstandard_integer_type (rprec, 1);
>tree bfr = build_bit_field_ref (ftype, unshare_expr (nlhs),
> -   rprec + bo_bit,
> -   (bo_idx + tprec / limb_prec)
> +   rprec,
> +   (bo_idx + (tprec + bo_bit) / limb_prec)
> * limb_prec);
>rhs1 = bf_cur;
>if (bf_cur != ext)
> --- gcc/testsuite/gcc.dg/torture/bitint-66.c.jj   2024-03-21 
> 11:53:00.790647163 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-66.c  2024-03-21 11:52:29.296082298 
> +0100
> @@ -0,0 +1,187 @@
> +/* PR tree-optimization/114405 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 22658
> +struct S1 { unsigned _BitInt(22592) b : 22592; } s1;
> +struct S2 { unsigned _BitInt(22656) b : 22656; } s2;
> +struct S3 { unsigned _BitInt(22656) a : 1; unsigned _BitInt(22656) b : 
> 22592; } s3;
> +struct S4 { unsigned _BitInt(22720) a : 1; unsigned _BitInt(22720) b : 
> 22656; } s4;
> +struct S5 { unsigned _BitInt(22656) a : 63; unsigned _BitInt(22656) b : 
> 22592; } s5;
> +struct S6 { unsigned _BitInt(22720) a : 63; unsigned _BitInt(22720) b : 
> 22656; } s6;
> +struct S7 { unsigned _BitInt(22656) a : 63; unsigned _BitInt(22656) b : 
> 22593; } s7;
> +struct S8 { unsigned _BitInt(22720) a : 63; unsigned _BitInt(22720) b : 
> 22657; } s8;
> +struct S9 { unsigned _BitInt(22720) a : 63; unsigned _BitInt(22720) b : 
> 22594; } s9;
> +struct S10 { unsigned _BitInt(22784) a : 63; unsigned _BitInt(22784) b : 
> 22658; } s10;
> +
> +void
> +f1 ()
> +{
> +  s1.b -= 1;
> +}
> +
> +void
> +f2 ()
> +{
> +  s2.b -= 2;
> +}
> +
> +void
>

[PATCH] ubsan: Don't -fsanitize=null instrument __seg_fs/gs pointers [PR111736]

2024-03-22 Thread Jakub Jelinek

Hi!

On x86 and avr some address spaces allow 0 pointers (on avr actually
even generic as, but libsanitizer isn't ported to it and
I'm not convinced we should completely kill -fsanitize=null in that
case).
The following patch makes sure those aren't diagnosed for -fsanitize=null,
though they are still sanitized for -fsanitize=alignment.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-22  Jakub Jelinek  

PR sanitizer/111736
* ubsan.cc (ubsan_expand_null_ifn, instrument_mem_ref): Avoid
SANITIZE_NULL instrumentation for non-generic address spaces
for which targetm.addr_space.zero_address_valid (as) is true.

* gcc.dg/ubsan/pr111736.c: New test.

--- gcc/ubsan.cc.jj 2024-03-13 09:16:37.791885010 +0100
+++ gcc/ubsan.cc2024-03-22 08:11:50.093131678 +0100
@@ -858,6 +858,13 @@ ubsan_expand_null_ifn (gimple_stmt_itera
}
 }
   check_null = sanitize_flags_p (SANITIZE_NULL);
+  if (check_null && POINTER_TYPE_P (TREE_TYPE (ptr)))
+{
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (ptr)));
+  if (!ADDR_SPACE_GENERIC_P (as)
+ && targetm.addr_space.zero_address_valid (as))
+   check_null = false;
+}
 
   if (check_align == NULL_TREE && !check_null)
 {
@@ -1447,8 +1454,15 @@ instrument_mem_ref (tree mem, tree base,
   if (align <= 1)
align = 0;
 }
-  if (align == 0 && !sanitize_flags_p (SANITIZE_NULL))
-return;
+  if (align == 0)
+{
+  if (!sanitize_flags_p (SANITIZE_NULL))
+   return;
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (base));
+  if (!ADDR_SPACE_GENERIC_P (as)
+ && targetm.addr_space.zero_address_valid (as))
+   return;
+}
   tree t = TREE_OPERAND (base, 0);
   if (!POINTER_TYPE_P (TREE_TYPE (t)))
 return;
--- gcc/testsuite/gcc.dg/ubsan/pr111736.c.jj2024-03-21 13:50:49.482348296 
+0100
+++ gcc/testsuite/gcc.dg/ubsan/pr111736.c   2024-03-21 13:53:33.789091054 
+0100
@@ -0,0 +1,23 @@
+/* PR sanitizer/111736 */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fsanitize=null,alignment -fdump-tree-optimized 
-ffat-lto-objects" } */
+/* { dg-final { scan-tree-dump-times "__ubsan_handle_type_mismatch" 1 
"optimized" } } */
+/* { dg-final { scan-tree-dump-not "p_\[0-9]*.D. \[=!]= 0" "optimized" } } */
+
+#ifdef __x86_64__
+#define SEG __seg_fs
+#else
+#define SEG __seg_gs
+#endif
+
+int
+foo (int SEG *p, int *q)
+{
+  return *p;
+}
+
+__attribute__((no_sanitize("alignment"))) int
+bar (int SEG *p, int *q)
+{
+  return *p;
+}

Jakub

[PATCH] LoongArch: Remove unused code and add sign/zero-extend for vpickve2gr.d

2024-03-22 Thread Jiahao Xu

For machines that satisfy ISA_HAS_LSX && !TARGET_64BIT, we will not support 
them now
and in the future, so this patch removes these unused code.

This patch also adds sign/zero-extend operations to vpickve2gr.d to match 
the actual
instruction behavior, and integrates the template definition of vpickve2gr.

gcc/ChangeLog:

* config/loongarch/lasx.md: Remove unused code.
* config/loongarch/loongarch-protos.h (loongarch_split_lsx_copy_d): 
Remove.
(loongarch_split_lsx_insert_d): Ditto.
(loongarch_split_lsx_fill_d): Ditto.
* config/loongarch/loongarch.cc (loongarch_split_lsx_copy_d): Ditto.
(loongarch_split_lsx_insert_d): Ditto.
(loongarch_split_lsx_fill_d): Ditto.
* config/loongarch/lsx.md (lsx_vpickve2gr_): Redefine.
(lsx_vpickve2gr_du): Remove.
(lsx_vpickve2gr_): Ditto.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2fa5e46c8e8..7bd61f8ed5b 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -572,12 +572,7 @@ (define_insn "lasx_xvinsgr2vr_"
  (match_operand 3 "const__operand" "")))]
   "ISA_HAS_LASX"
 {
-#if 0
-  if (!TARGET_64BIT && (mode == V4DImode || mode == V4DFmode))
-return "#";
-  else
-#endif
-return "xvinsgr2vr.\t%u0,%z1,%y3";
+  return "xvinsgr2vr.\t%u0,%z1,%y3";
 }
   [(set_attr "type" "simd_insert")
(set_attr "mode" "")])
@@ -1446,10 +1441,7 @@ (define_insn "lasx_xvreplgr2vr_"
   if (which_alternative == 1)
 return "xvldi.b\t%u0,0" ;
 
-  if (!TARGET_64BIT && (mode == V2DImode || mode == V2DFmode))
-return "#";
-  else
-return "xvreplgr2vr.\t%u0,%z1";
+  return "xvreplgr2vr.\t%u0,%z1";
 }
   [(set_attr "type" "simd_fill")
(set_attr "mode" "")
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index e3ed2b912a5..e238d795a73 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -89,9 +89,6 @@ extern void loongarch_split_128bit_move (rtx, rtx);
 extern bool loongarch_split_128bit_move_p (rtx, rtx);
 extern void loongarch_split_256bit_move (rtx, rtx);
 extern bool loongarch_split_256bit_move_p (rtx, rtx);
-extern void loongarch_split_lsx_copy_d (rtx, rtx, rtx, rtx (*)(rtx, rtx, rtx));
-extern void loongarch_split_lsx_insert_d (rtx, rtx, rtx, rtx);
-extern void loongarch_split_lsx_fill_d (rtx, rtx);
 extern const char *loongarch_output_move (rtx, rtx);
 #ifdef RTX_CODE
 extern void loongarch_expand_scc (rtx *);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 030957db4e7..34850a0fc64 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4759,82 +4759,6 @@ loongarch_split_256bit_move (rtx dest, rtx src)
 }
 }
 
-
-/* Split a COPY_S.D with operands DEST, SRC and INDEX.  GEN is a function
-   used to generate subregs.  */
-
-void
-loongarch_split_lsx_copy_d (rtx dest, rtx src, rtx index,
-   rtx (*gen_fn)(rtx, rtx, rtx))
-{
-  gcc_assert ((GET_MODE (src) == V2DImode && GET_MODE (dest) == DImode)
- || (GET_MODE (src) == V2DFmode && GET_MODE (dest) == DFmode));
-
-  /* Note that low is always from the lower index, and high is always
- from the higher index.  */
-  rtx low = loongarch_subword (dest, false);
-  rtx high = loongarch_subword (dest, true);
-  rtx new_src = simplify_gen_subreg (V4SImode, src, GET_MODE (src), 0);
-
-  emit_insn (gen_fn (low, new_src, GEN_INT (INTVAL (index) * 2)));
-  emit_insn (gen_fn (high, new_src, GEN_INT (INTVAL (index) * 2 + 1)));
-}
-
-/* Split a INSERT.D with operand DEST, SRC1.INDEX and SRC2.  */
-
-void
-loongarch_split_lsx_insert_d (rtx dest, rtx src1, rtx index, rtx src2)
-{
-  int i;
-  gcc_assert (GET_MODE (dest) == GET_MODE (src1));
-  gcc_assert ((GET_MODE (dest) == V2DImode
-  && (GET_MODE (src2) == DImode || src2 == const0_rtx))
- || (GET_MODE (dest) == V2DFmode && GET_MODE (src2) == DFmode));
-
-  /* Note that low is always from the lower index, and high is always
- from the higher index.  */
-  rtx low = loongarch_subword (src2, false);
-  rtx high = loongarch_subword (src2, true);
-  rtx new_dest = simplify_gen_subreg (V4SImode, dest, GET_MODE (dest), 0);
-  rtx new_src1 = simplify_gen_subreg (V4SImode, src1, GET_MODE (src1), 0);
-  i = exact_log2 (INTVAL (index));
-  gcc_assert (i != -1);
-
-  emit_insn (gen_lsx_vinsgr2vr_w (new_dest, low, new_src1,
- GEN_INT (1 << (i * 2;
-  emit_insn (gen_lsx_vinsgr2vr_w (new_dest, high, new_dest,
- GEN_INT (1 << (i * 2 + 1;
-}
-
-/* Split FILL.D.  */
-
-void
-loongarch_split_lsx_fill_d (rtx dest, rtx src)
-{
-  gcc_assert ((GET_MODE (dest) == V2DImode
-  && (GET_MODE (src) == DImode || src == const0_rtx))
- || (GET_MODE (dest) == V2DFmode && GET_MODE (src) == DFmode));
-
-  /* Note that low is always from the

[PATCH] bitint: Some bitint store fixes [PR114405]

2024-03-22 Thread Jakub Jelinek

Hi!

The following patch fixes some bugs in the handling of stores to large/huge
_BitInt bitfields.

In the first 2 hunks we are processing the most significant limb of the
actual type (not necessarily limb in the storage), and so we know it is
either partial or full limb, so [1, limb_prec] bits rather than
[0, limb_prec - 1] bits as the code actually assumed.  So, those 2
spots are fixed by making sure if tprec is a multiple of limb_prec we
actually use limb_prec bits rather than 0.  Otherwise, it e.g. happily
could create and use 0 precision INTEGER_TYPE even when it actually
should have processed 64 bits, or for non-zero bo_bit could handle just
say 1 bit rather than 64 bits plus 1 bit in the last hunk spot.

In the last hunk we are dealing with the extra bits in the last storage
limb, and the code was e.g. happily creating 65 bit precision INTEGER_TYPE,
even when we really should use 1 bit precision in that case.  Also, it
used a wrong offset in that case.

The large testcase covers all these cases.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-22  Jakub Jelinek  

PR tree-optimization/114405
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
Set rprec to limb_prec rather than 0 if tprec is divisible by
limb_prec.  In the last bf_cur handling, set rprec to (tprec + bo_bit)
% limb_prec rather than tprec % limb_prec and use just rprec instead
of rprec + bo_bit.  For build_bit_field_ref offset, divide
(tprec + bo_bit) by limb_prec rather than just tprec.

* gcc.dg/bitint-103.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-20 15:03:30.868068343 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-21 12:51:26.728296098 +0100
@@ -2737,7 +2737,7 @@ bitint_large_huge::lower_mergeable_stmt
  && tree_fits_uhwi_p (idx))
{
  unsigned int tprec = TYPE_PRECISION (type);
- unsigned int rprec = tprec % limb_prec;
+ unsigned int rprec = (tprec - 1) % limb_prec + 1;
  if (rprec + bo_bit < (unsigned) limb_prec)
{
  tree ftype
@@ -2882,7 +2882,7 @@ bitint_large_huge::lower_mergeable_stmt
  if (nlhs && i == cnt - 1)
{
  unsigned int tprec = TYPE_PRECISION (type);
- unsigned int rprec = tprec % limb_prec;
+ unsigned int rprec = (tprec - 1) % limb_prec + 1;
  if (rprec + bo_bit < (unsigned) limb_prec)
{
  tree ftype
@@ -2934,11 +2934,11 @@ bitint_large_huge::lower_mergeable_stmt
   if (bf_cur != NULL_TREE)
 {
   unsigned int tprec = TYPE_PRECISION (type);
-  unsigned int rprec = tprec % limb_prec;
-  tree ftype = build_nonstandard_integer_type (rprec + bo_bit, 1);
+  unsigned int rprec = (tprec + bo_bit) % limb_prec;
+  tree ftype = build_nonstandard_integer_type (rprec, 1);
   tree bfr = build_bit_field_ref (ftype, unshare_expr (nlhs),
- rprec + bo_bit,
- (bo_idx + tprec / limb_prec)
+ rprec,
+ (bo_idx + (tprec + bo_bit) / limb_prec)
  * limb_prec);
   rhs1 = bf_cur;
   if (bf_cur != ext)
--- gcc/testsuite/gcc.dg/torture/bitint-66.c.jj 2024-03-21 11:53:00.790647163 
+0100
+++ gcc/testsuite/gcc.dg/torture/bitint-66.c2024-03-21 11:52:29.296082298 
+0100
@@ -0,0 +1,187 @@
+/* PR tree-optimization/114405 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 22658
+struct S1 { unsigned _BitInt(22592) b : 22592; } s1;
+struct S2 { unsigned _BitInt(22656) b : 22656; } s2;
+struct S3 { unsigned _BitInt(22656) a : 1; unsigned _BitInt(22656) b : 22592; 
} s3;
+struct S4 { unsigned _BitInt(22720) a : 1; unsigned _BitInt(22720) b : 22656; 
} s4;
+struct S5 { unsigned _BitInt(22656) a : 63; unsigned _BitInt(22656) b : 22592; 
} s5;
+struct S6 { unsigned _BitInt(22720) a : 63; unsigned _BitInt(22720) b : 22656; 
} s6;
+struct S7 { unsigned _BitInt(22656) a : 63; unsigned _BitInt(22656) b : 22593; 
} s7;
+struct S8 { unsigned _BitInt(22720) a : 63; unsigned _BitInt(22720) b : 22657; 
} s8;
+struct S9 { unsigned _BitInt(22720) a : 63; unsigned _BitInt(22720) b : 22594; 
} s9;
+struct S10 { unsigned _BitInt(22784) a : 63; unsigned _BitInt(22784) b : 
22658; } s10;
+
+void
+f1 ()
+{
+  s1.b -= 1;
+}
+
+void
+f2 ()
+{
+  s2.b -= 2;
+}
+
+void
+f3 ()
+{
+  s3.b -= 3;
+}
+
+void
+f4 ()
+{
+  s4.b -= 4;
+}
+
+void
+f5 ()
+{
+  s5.b -= 5;
+}
+
+void
+f6 ()
+{
+  s6.b -= 6;
+}
+
+void
+f7 ()
+{
+  s7.b -= 7;
+}
+
+void
+f8 ()
+{
+  s8.b -= 8;
+}
+
+void
+f9 ()
+{
+  s9.b -= 9;
+}
+
+void

Re: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for XTheadVector

2024-03-22 Thread Christoph Müllner

On Fri, Mar 22, 2024 at 2:18 AM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM.

Pushed.
Thanks!

>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Christoph Müllner
> Date: 2024-03-22 07:45
> To: gcc-patches; Kito Cheng; Palmer Dabbelt; Andrew Waterman; Philipp 
> Tomsich; Camel Coder; Bruce Hoult; Juzhe-Zhong; Jun Sha; Xianmiao Qu; Jin Ma
> CC: Christoph Müllner
> Subject: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for 
> XTheadVector
> The expansion of `memset` (via expand_builtin_memset_args())
> uses clear_by_pieces() and store_by_pieces() to avoid calls
> to the C runtime. To check if a type can be used for that purpose
> the function by_pieces_mode_supported_p() tests if a `mov` and
> a `vec_duplicate` INSN can be expaned by the backend.
>
> The `vec_duplicate` expansion takes arguments of type `V_VLS`.
> The `mov` expansions take arguments of type `V`, `VB`, `VT`,
> `VLS_AVL_IMM`, and `VLS_AVL_REG`. Some of these types (in fact
> not types but type iterators) include fractional LMUL types.
> E.g. `V_VLS` includes `V`, which includes `VI`, which includes
> `RVVMF2QI`.
>
> This results in an attempt to use fractional LMUL-types for
> the `memset` expansion resulting in an ICE for XTheadVector,
> because that extension cannot handle fractional LMULs.
>
> This patch addresses this issue by splitting the definition
> of the `VI` mode itereator into `VI_NOFRAC` (without fractional
> LMUL types) and `VI_FRAC` (only fractional LMUL types).
> Further, it defines `V_VLS` such, that `VI_FRAC` types are only
> included if XTheadVector is not enabled.
>
> The effect is demonstrated by a new test case that shows
> that the by-pieces framework now emits `sb` instructions
> instead of triggering an ICE.
>
> Signed-off-by: Christoph Müllner 
>
> PR 114194
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Split VI into VI_FRAC and VI_NOFRAC.
> Only include VI_NOFRAC in V_VLS without TARGET_XTHEADVECTOR.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/xtheadvector/pr114194.c: New test.
>
> Signed-off-by: Christoph Müllner 
> ---
> gcc/config/riscv/vector-iterators.md  | 19 +--
> .../riscv/rvv/xtheadvector/pr114194.c | 56 +++
> 2 files changed, 69 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index c2ea7e8b10a..a24e1bf078f 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -108,17 +108,24 @@ (define_c_enum "unspecv" [
>UNSPECV_FRM_RESTORE_EXIT
> ])
> -(define_mode_iterator VI [
> -  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
> -
> -  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> -
> -  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
> +;; Subset of VI with fractional LMUL types
> +(define_mode_iterator VI_FRAC [
> +  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
> +  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> +  (RVVMF2SI "TARGET_MIN_VLEN > 32")
> +])
> +;; Subset of VI with non-fractional LMUL types
> +(define_mode_iterator VI_NOFRAC [
> +  RVVM8QI RVVM4QI RVVM2QI RVVM1QI
> +  RVVM8HI RVVM4HI RVVM2HI RVVM1HI
> +  RVVM8SI RVVM4SI RVVM2SI RVVM1SI
>(RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
>(RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> ])
> +(define_mode_iterator VI [ VI_NOFRAC (VI_FRAC "!TARGET_XTHEADVECTOR") ])
> +
> ;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
> ;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
> ;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> new file mode 100644
> index 000..fc2d1349425
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_xtheadvector" { target { rv32 } } } */
> +/* { dg-options "-march=rv64gc_xtheadvector" { target { rv64 } } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** foo0_1:
> +** sb\tzero,0([a-x0-9]+)
> +** ret
> +*/
> +void foo0_1 (void *p)
> +{
> +  __builtin_memset (p, 0, 1);
> +}
> +
> +/*
> +** foo0_7:
> +** sb\tzero,0([a-x0-9]+)
> +** sb\tzero,1([a-x0-9]+)
> +** sb\tzero,2([a-x0-9]+)
> +** sb\tzero,3([a-x0-9]+)
> +** sb\tzero,4([a-x0-9]+)
> +** sb\tzero,5([a-x0-9]+)
> +** sb\tzero,6([a-x0-9]+)
> +** ret
> +*/
> +void foo0_7 (void *p)
> +{
> +  __builtin_memset (p, 0, 7);
> +}
> +
> +/*
> +** foo1_1:
> +** li\t[a-x0-9]+,1
> +** sb\t[a-x0-9]+,0([a-x0-9]+)
> +** ret
> +*/
> +void foo1_1 (void *p)
> +{
> +  __builtin_memset (p, 1, 1);
> +}
> +
> +/*
> +** foo1_5:
> +**

Re: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for XTheadVector

2024-03-22 Thread Christoph Müllner

On Fri, Mar 22, 2024 at 4:43 AM Bruce Hoult  wrote:
>
> > The effect is demonstrated by a new test case that shows
> that the by-pieces framework now emits `sb` instructions
> instead of triggering an ICE
>
> So these small memset() now don't use RVV at all if xtheadvector is enabled?

Yes, but not directly.
The patch just prevents fractional LMUL modes from being considered
for XTheadVector.
That's necessary because further lowering memory moves with a
fractional LMUL mode
cannot be done for XTheadVector (that's the reason for the ICE).

> I don't have evidence whether the use of RVV (whether V or
> xtheadvector) for these memsets is a win or not, but the treatment
> should probably be consistent.
>
> I don't know why RVV 1.0 uses a fractional LMUL at all here. It would
> work perfectly well with LMUL=1 and just setting vl to the appropriate
> length (which is always less than 16 bytes). Use of fractional LMUL
> doesn't save any resources.

The compiler can consider fractional LMUL values for expansion for RVV,
but that does not mean it will be used in the emitted instruction sequence.
Details like cost model and data alignment also matter.

During testing, I observed that RVV and XTheadVector will both emit sequences
of 'sd' for short memsets with known length, known data to set,
and unknown alignment of the data to be written.
However, I have not excessively tested using all possible tuning parameters,
as my primary goal was to eliminate the reason for the ICE with XTheadVector.

>
> On Fri, Mar 22, 2024 at 12:46 PM Christoph Müllner
>  wrote:
> >
> > The expansion of `memset` (via expand_builtin_memset_args())
> > uses clear_by_pieces() and store_by_pieces() to avoid calls
> > to the C runtime. To check if a type can be used for that purpose
> > the function by_pieces_mode_supported_p() tests if a `mov` and
> > a `vec_duplicate` INSN can be expaned by the backend.
> >
> > The `vec_duplicate` expansion takes arguments of type `V_VLS`.
> > The `mov` expansions take arguments of type `V`, `VB`, `VT`,
> > `VLS_AVL_IMM`, and `VLS_AVL_REG`. Some of these types (in fact
> > not types but type iterators) include fractional LMUL types.
> > E.g. `V_VLS` includes `V`, which includes `VI`, which includes
> > `RVVMF2QI`.
> >
> > This results in an attempt to use fractional LMUL-types for
> > the `memset` expansion resulting in an ICE for XTheadVector,
> > because that extension cannot handle fractional LMULs.
> >
> > This patch addresses this issue by splitting the definition
> > of the `VI` mode itereator into `VI_NOFRAC` (without fractional
> > LMUL types) and `VI_FRAC` (only fractional LMUL types).
> > Further, it defines `V_VLS` such, that `VI_FRAC` types are only
> > included if XTheadVector is not enabled.
> >
> > The effect is demonstrated by a new test case that shows
> > that the by-pieces framework now emits `sb` instructions
> > instead of triggering an ICE.
> >
> > Signed-off-by: Christoph Müllner 
> >
> > PR 114194
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/vector-iterators.md: Split VI into VI_FRAC and 
> > VI_NOFRAC.
> > Only include VI_NOFRAC in V_VLS without TARGET_XTHEADVECTOR.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/xtheadvector/pr114194.c: New test.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/config/riscv/vector-iterators.md  | 19 +--
> >  .../riscv/rvv/xtheadvector/pr114194.c | 56 +++
> >  2 files changed, 69 insertions(+), 6 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> >
> > diff --git a/gcc/config/riscv/vector-iterators.md 
> > b/gcc/config/riscv/vector-iterators.md
> > index c2ea7e8b10a..a24e1bf078f 100644
> > --- a/gcc/config/riscv/vector-iterators.md
> > +++ b/gcc/config/riscv/vector-iterators.md
> > @@ -108,17 +108,24 @@ (define_c_enum "unspecv" [
> >UNSPECV_FRM_RESTORE_EXIT
> >  ])
> >
> > -(define_mode_iterator VI [
> > -  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> > "TARGET_MIN_VLEN > 32")
> > -
> > -  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 
> > 32")
> > -
> > -  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
> > +;; Subset of VI with fractional LMUL types
> > +(define_mode_iterator VI_FRAC [
> > +  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
> > +  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> > +  (RVVMF2SI "TARGET_MIN_VLEN > 32")
> > +])
> >
> > +;; Subset of VI with non-fractional LMUL types
> > +(define_mode_iterator VI_NOFRAC [
> > +  RVVM8QI RVVM4QI RVVM2QI RVVM1QI
> > +  RVVM8HI RVVM4HI RVVM2HI RVVM1HI
> > +  RVVM8SI RVVM4SI RVVM2SI RVVM1SI
> >(RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
> >(RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> >  ])
> >
> > +(define_mode_iterator VI [ VI_NOFRAC (VI_FRAC "!TARGET_XTHEADVECTOR") ])
> > +
> >  ;; This iterator is the same as above but with

[r13-8480 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-03-22 Thread haochen.jiang

On Linux/x86_64,

a9a425df628ab80374cc6a132d39e470bc78c8bc is the first bad commit
commit a9a425df628ab80374cc6a132d39e470bc78c8bc
Author: Richard Biener 
Date:   Fri Feb 23 16:06:05 2024 +0100

middle-end/114070 - folding breaking VEC_COND expansion

caused

FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/gcc-13/releases/gcc-13/r13-8480/usr
 --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/andnot-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/andnot-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

Re: [PATCH] rs6000: Stackoverflow in optimized code on PPC (PR100799)

2024-03-22 Thread Jakub Jelinek

On Fri, Mar 22, 2024 at 01:00:21PM +0530, Ajit Agarwal wrote:
> When using FlexiBLAS with OpenBLAS we noticed corruption of
> the parameters passed to OpenBLAS functions. FlexiBLAS
> basically provides a BLAS interface where each function
> is a stub that forwards the arguments to a real BLAS lib,
> like OpenBLAS.
> 
> Fixes the corruption of caller frame checking number of
> arguments is less than equal to GP_ARG_NUM_REG (8)
> excluding hidden unused DECLS.

Thanks for working on this.

> 2024-03-22  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
> PR rtk-optimization/100799
> * config/rs600/rs600-calls.cc (rs6000_function_arg): Don't

These 2 lines are 8 space indented rather than tab.

>   generate parameter save area if number of arguments passed
>   less than equal to GP_ARG_NUM_REG (8) excluding hidden
>   paramter.
>   * function.cc (assign_parms_initialize_all): Check for hidden
>   parameter in fortran code and set the flag hidden_string_length
>   and actual paramter passed excluding hidden unused DECLS.

s/paramter/parameter/

>   * function.h: Add new field hidden_string_length and
>   actual_parm_length in function structure.

Why do you need to change generic code for something that will only be
used by a single target?
I mean, why don't you add the extra members in rs6000.h (struct rs6000_args)
and initialize them in rs6000-call.cc (init_cumulative_args) -
the function.cc function you've modified is the only one which uses
INIT_CUMULATIVE_INCOMING_ARGS and in that case init_cumulative_args is
called with incoming == true, so move the stuff from function.cc there.

> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -1857,7 +1857,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
> function_arg_info )
>  
> return rs6000_finish_function_arg (mode, rvec, k);
>   }
> -  else if (align_words < GP_ARG_NUM_REG)
> + /* Workaround buggy C/C++ wrappers around Fortran routines with
> + character(len=constant) arguments if the hidden string length arguments
> + are passed on the stack; if the callers forget to pass those arguments,
> + attempting to tail call in such routines leads to stack corruption.
> + Avoid return stack space for parameters <= 8 excluding hidden string
> + length argument is passed (partially or fully) on the stack in the
> + caller and the callee needs to pass any arguments on the stack.  */
> +  else if (align_words < GP_ARG_NUM_REG
> +|| (cfun->hidden_string_length
> +&& cfun->actual_parm_length <= GP_ARG_NUM_REG))
>   {
> if (TARGET_32BIT && TARGET_POWERPC64)
>   return rs6000_mixed_function_arg (mode, type, align_words);
> diff --git a/gcc/function.cc b/gcc/function.cc
> index 3cef6c17bce..1318564b466 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -2326,6 +2326,32 @@ assign_parms_initialize_all (struct 
> assign_parm_data_all *all)
>  #endif
>all->args_so_far = pack_cumulative_args (>args_so_far_v);
>  
> +  unsigned int num_args = 0;
> +  unsigned int hidden_length = 0;
> +
> +  /* Workaround buggy C/C++ wrappers around Fortran routines with
> + character(len=constant) arguments if the hidden string length arguments
> + are passed on the stack; if the callers forget to pass those arguments,
> + attempting to tail call in such routines leads to stack corruption.
> + Avoid return stack space for parameters <= 8 excluding hidden string
> + length argument is passed (partially or fully) on the stack in the
> + caller and the callee needs to pass any arguments on the stack.  */
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +   arg; arg = DECL_CHAIN (arg))
> +{
> +  num_args++;
> +  if (DECL_HIDDEN_STRING_LENGTH (arg))
> + {
> +   tree parmdef = ssa_default_def (cfun, arg);
> +   if (parmdef == NULL || has_zero_uses (parmdef))
> + {
> +   cfun->hidden_string_length = 1;
> +   hidden_length++;
> + }
> + }
> +   }
> +
> +  cfun->actual_parm_length = num_args - hidden_length;
>  #ifdef INCOMING_REG_PARM_STACK_SPACE
>all->reg_parm_stack_space
>  = INCOMING_REG_PARM_STACK_SPACE (current_function_decl);
> diff --git a/gcc/function.h b/gcc/function.h
> index 19e15bd63b0..5984f0007c2 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -346,6 +346,11 @@ struct GTY(()) function {
>/* Last assigned dependence info clique.  */
>unsigned short last_clique;
>  
> +  /* Actual parameter length ignoring hidden paramter.
> + This is done to C++ wrapper calling fortran module
> + which has hidden parameter that are not used.  */
> +  unsigned int actual_parm_length;
> +
>/* Collected bit flags.  */
>  
>/* Number of units of general registers that need saving in stdarg
> @@ -442,6 +447,11 @@ struct GTY(()) function {
>/* Set for artificial function created for

[PATCH] rs6000: Stackoverflow in optimized code on PPC (PR100799)

2024-03-22 Thread Ajit Agarwal

Hello All:


When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Stackoverflow in optimized code on PPC (PR100799)

When using FlexiBLAS with OpenBLAS we noticed corruption of
the parameters passed to OpenBLAS functions. FlexiBLAS
basically provides a BLAS interface where each function
is a stub that forwards the arguments to a real BLAS lib,
like OpenBLAS.

Fixes the corruption of caller frame checking number of
arguments is less than equal to GP_ARG_NUM_REG (8)
excluding hidden unused DECLS.

2024-03-22  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR rtk-optimization/100799
* config/rs600/rs600-calls.cc (rs6000_function_arg): Don't
generate parameter save area if number of arguments passed
less than equal to GP_ARG_NUM_REG (8) excluding hidden
paramter.
* function.cc (assign_parms_initialize_all): Check for hidden
parameter in fortran code and set the flag hidden_string_length
and actual paramter passed excluding hidden unused DECLS.
* function.h: Add new field hidden_string_length and
actual_parm_length in function structure.
---
 gcc/config/rs6000/rs6000-call.cc | 11 ++-
 gcc/function.cc  | 26 ++
 gcc/function.h   | 10 ++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..8e6e3de6804 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -1857,7 +1857,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const 
function_arg_info )
 
  return rs6000_finish_function_arg (mode, rvec, k);
}
-  else if (align_words < GP_ARG_NUM_REG)
+ /* Workaround buggy C/C++ wrappers around Fortran routines with
+   character(len=constant) arguments if the hidden string length arguments
+   are passed on the stack; if the callers forget to pass those arguments,
+   attempting to tail call in such routines leads to stack corruption.
+   Avoid return stack space for parameters <= 8 excluding hidden string
+   length argument is passed (partially or fully) on the stack in the
+   caller and the callee needs to pass any arguments on the stack.  */
+  else if (align_words < GP_ARG_NUM_REG
+  || (cfun->hidden_string_length
+  && cfun->actual_parm_length <= GP_ARG_NUM_REG))
{
  if (TARGET_32BIT && TARGET_POWERPC64)
return rs6000_mixed_function_arg (mode, type, align_words);
diff --git a/gcc/function.cc b/gcc/function.cc
index 3cef6c17bce..1318564b466 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2326,6 +2326,32 @@ assign_parms_initialize_all (struct assign_parm_data_all 
*all)
 #endif
   all->args_so_far = pack_cumulative_args (>args_so_far_v);
 
+  unsigned int num_args = 0;
+  unsigned int hidden_length = 0;
+
+  /* Workaround buggy C/C++ wrappers around Fortran routines with
+ character(len=constant) arguments if the hidden string length arguments
+ are passed on the stack; if the callers forget to pass those arguments,
+ attempting to tail call in such routines leads to stack corruption.
+ Avoid return stack space for parameters <= 8 excluding hidden string
+ length argument is passed (partially or fully) on the stack in the
+ caller and the callee needs to pass any arguments on the stack.  */
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+   arg; arg = DECL_CHAIN (arg))
+{
+  num_args++;
+  if (DECL_HIDDEN_STRING_LENGTH (arg))
+   {
+ tree parmdef = ssa_default_def (cfun, arg);
+ if (parmdef == NULL || has_zero_uses (parmdef))
+   {
+ cfun->hidden_string_length = 1;
+ hidden_length++;
+   }
+   }
+   }
+
+  cfun->actual_parm_length = num_args - hidden_length;
 #ifdef INCOMING_REG_PARM_STACK_SPACE
   all->reg_parm_stack_space
 = INCOMING_REG_PARM_STACK_SPACE (current_function_decl);
diff --git a/gcc/function.h b/gcc/function.h
index 19e15bd63b0..5984f0007c2 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -346,6 +346,11 @@ struct GTY(()) function {
   /* Last assigned dependence info clique.  */
   unsigned short last_clique;
 
+  /* Actual parameter length ignoring hidden paramter.
+ This is done to C++ wrapper calling fortran module
+ which has hidden parameter that are not used.  */
+  unsigned int actual_parm_length;
+
   /* Collected bit flags.  */
 
   /* Number of units of general registers that

[PATCH v4] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-22 Thread pan2 . li

From: Pan Li 

This patch would like to introduce one new gcc attribute for RVV.
This attribute is used to define fixed-length variants of one
existing sizeless RVV types.

This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
one args should be the integer constant and its' value is terminated
by the LMUL and the vector register bits in zvl*b.  For example:

typedef vint32m2_t fixed_vint32m2_t __attribute__((riscv_rvv_vector_bits(128)));

The above type define is valid when -march=rv64gc_zve64d_zvl64b
(aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
-march=rv64gcv_zvl128b similar to below.

"error: invalid RVV vector size '128', expected size is '256' based on
LMUL of type and '-mrvv-vector-bits=zvl'"

Meanwhile, a pre-define macro __riscv_v_fixed_vlen is introduced to
represent the fixed vlen in a RVV vector register.

For the vint*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -

The CMP will return vint*m*_t the same as aarch64 sve. For example:
typedef vint32m1_t fixed_vint32m1_t __attribute__((riscv_rvv_vector_bits(128)));
fixed_vint32m1_t less_than (fixed_vint32m1_t a, fixed_vint32m1_t b)
{
  return a < b;
}

For the vfloat*m*_t below operations are allowed.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.
* CMP: >, <, ==, !=, <=, >=
* ALU: +, -, *, /, -

The CMP will return vfloat*m*_t the same as aarch64 sve. For example:
typedef vfloat32m1_t fixed_vfloat32m1_t 
__attribute__((riscv_rvv_vector_bits(128)));
fixed_vfloat32m1_t less_than (fixed_vfloat32m1_t a, fixed_vfloat32m1_t b)
{
  return a < b;
}

For the vbool*_t types only below operations are allowed except
the CMP and ALU. The CMP and ALU operations on vbool*_t is not
well defined currently.
* The sizeof.
* The global variable(s).
* The element of union and struct.
* The cast to other equalities.

For the vint*x*m*_t tuple types are not suppored in this patch which is
compatible with clang.

This patch passed the below testsuites.
* The riscv fully regression tests.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add pre-define
macro __riscv_v_fixed_vlen when zvl.
* config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
New static func to take care of the RVV types decorated by
the attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-13.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-14.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-15.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-16.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-17.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-18.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
* gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |   3 +
 gcc/config/riscv/riscv.cc |  87 +-
 .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
 .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
 .../riscv/rvv/base/riscv_rvv_vector_bits-13.c |  10 ++
 .../riscv/rvv/base/riscv_rvv_vector_bits-14.c |  10 ++
 .../riscv/rvv/base/riscv_rvv_vector_bits-15.c |  10 ++
 .../riscv/rvv/base/riscv_rvv_vector_bits-16.c |  11 ++
 .../riscv/rvv/base/riscv_rvv_vector_bits-17.c |  10 ++
 .../riscv/rvv/base/riscv_rvv_vector_bits-18.c |  45 
 .../riscv/rvv/base/riscv_rvv_vector_bits-2.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-3.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-4.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-5.c  |   6 +
 .../riscv/rvv/base/riscv_rvv_vector_bits-6.c  |   6 +

Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-22 Thread Kaz Kylheku

On 2024-03-21 18:40, Andrew Pinski wrote:

On Thu, Mar 21, 2024, 17:20 Kaz Kylheku  wrote: For instance, 
suppose we have a macro that expands to some block
of code in which there is an internal goto. If we have it

  #define MAC(...) { ... goto _label; ... __label: ; }

then this cannot be used twice in the same function; labels have
function scope. 

In this case why can't you use gcc's already extension of defining a local 
label? 
https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Local-Labels.html 

This extension has been around for over 20 years specifically for that use 
case. 

Yes. For that case, local labels are a nice solution. 

It's just an example of the sort of thing for which it may be useful 
for a macro to be able to invent different identifiers in different 
invocations. 

The GNU preprocessor is used for multiple languages, and is also exposed 
as an independent utility that can be used to process anything that has 
a sufficiently C-like token structure. 

Since local labels are intended for macros, they are not subject to 
diagnosis by -Wshadow.  In the ordinary namespace for 
variables/functions/typedefs, there is no such concession. Macros 
that reuse identifiers in nested scopes will trigger nuisance warnings 
from -Wshadow.

68 matches

Mail list logo