Re: [PATCH] RISC-V: Fix SEW64 of vrsub.vx runtime fail in RV32 system

2023-04-04 Thread Jeff Law via Gcc-patches




On 4/2/23 22:18, juzhe.zh...@rivai.ai wrote:
OK, would you mind telling me some tool can help for formatting of .MD 
file?

Since clang-format only works on .cc/.h file.
No idea.  I don't use clang-format.  But I've also had 30+ years in this 
codebase, so most of its coding conventions are ones that I've 
internalized.  When I do something outside GCC I struggle mightily to 
conform to other conventions.


Given how often this construct seems to appear, perhaps move it into a 
helper in the .cc file.  That would seem to reduce code duplication and 
allow you to use clang-format.






I have asked many people how to format .MD file but I didn't get the answer.
So that's why you can see formatting of .MD file is not good. Unlike 
.cc/.h files.

Can you force clang-format to treat it like C++ code?

jeff


Re: [PATCH] Less warnings for parameters declared as arrays [PR98541, PR98536]

2023-04-04 Thread Jeff Law via Gcc-patches




On 4/3/23 13:34, Martin Uecker via Gcc-patches wrote:



With the relatively new warnings (11..) affecting VLA bounds,
I now get a lot of false positives with -Wall. In general, I find
the new warnings very useful, but they seem a bit too
aggressive and some minor tweaks are needed, otherwise they are
too noisy.  This patch suggests two changes:

1. For VLA bounds non-null is implied only when 'static' is
used (similar to clang) and not already when a bound > 0 is
specified:

int foo(int n, char buf[static n]);

foo(10, 0); // warning with 'static' but not without.


(It also seems problematic to require a size of 0 to indicate
that the pointer may be null, because 0 is not allowed in
ISO C as a size. It is also inconsistent to how arrays with
static bound behave.)

There seems to be agreement about this change in PR98541.
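
To make the distinction in point 1 concrete, here is a minimal sketch in C. The function names are hypothetical and not taken from the patch or the PR; the sketch only illustrates the contract the proposal relies on, namely that 'static' promises a non-null buffer of at least n elements while a plain bound does not.

```c
#include <assert.h>
#include <stddef.h>

/* 'static' makes the bound a promise: BUF must point to at least N
   elements, so a null argument can reasonably be diagnosed.  */
size_t
count_zeros_static (int n, const char buf[static n])
{
  assert (n > 0);
  size_t zeros = 0;
  for (int i = 0; i < n; i++)
    if (buf[i] == 0)
      zeros++;
  return zeros;
}

/* Without 'static' the bound is only documentation, so this version
   treats a null BUF as "nothing to count" instead of dereferencing it.  */
size_t
count_zeros_plain (int n, const char buf[n])
{
  if (buf == NULL)
    return 0;
  return count_zeros_static (n, buf);
}
```

Under the proposed behaviour, a call such as count_zeros_plain (4, NULL) would no longer be flagged, while count_zeros_static (4, NULL) still would be.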


2. GCC always warns when the number of unspecified
bounds is different between two declarations:

int foo(int n, char buf[*]);
int foo(int n, char buf[n]);

or

int foo(int n, char buf[n]);
int foo(int n, char buf[*]);

But the first version is useful if the size expression
can not be specified in a header (e.g. because it uses
a macro or variable not available there) and there is
currently no easy way to avoid this.  The warning for
both cases was by design,  but I suggest to limit the
warning to the second case.

Note that the logic currently applied by GCC is too
simplistic anyway, as GCC does not warn for

int foo(int x, int y, double m[*][y]);
int foo(int x, int y, double m[x][*]);

because the number of specified / unspecified bounds
is the same.  So I suggest to go with the attached
patch now and add more precise warnings later
if there is more experience with these warnings
in general and if this then still seems desirable.
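
As an illustration of the case in point 2 that the patch stops warning about, here is a small sketch in C with a hypothetical function name. The first declaration plays the role of a header prototype that cannot spell out the bound; the definition then specifies it. GCC releases without this change may still emit a -Wvla-parameter warning for this pair when -Wall is used.

```c
#include <assert.h>

/* Declaration as it might appear in a header: the bound is left
   unspecified with [*] because the size expression is not usable there.  */
int sum_values (int n, const int values[*]);

/* Definition: the bound is now spelled out, so the later declaration
   specifies more bounds than the earlier one.  */
int
sum_values (int n, const int values[n])
{
  assert (n >= 0);
  int total = 0;
  for (int i = 0; i < n; i++)
    total += values[i];
  return total;
}
```

The reverse order (a specified bound first, then [*]) is the case where the warning would be kept.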


Martin


 Less warnings for parameters declared as arrays [PR98541, PR98536]
 
 To avoid false positives, tune the warnings for parameters declared
 as arrays with size expressions.  Only warn about null arguments with
 'static'.  Also do not warn when more bounds are specified in the new
 declaration than before.
 
 PR c/98541

 PR c/98536
 
 c-family/

 * c-warn.cc (warn_parm_array_mismatch): Do not warn if more
 bounds are specified.
 
 gcc/

 * gimple-ssa-warn-access.cc
   (pass_waccess::maybe_check_access_sizes): For VLA bounds
 in parameters, only warn about null pointers with 'static'.
 
 gcc/testsuite:

 * gcc.dg/Wnonnull-4: Adapt test.
 * gcc.dg/Wstringop-overflow-40.c: Adapt test.
 * gcc.dg/Wvla-parameter-4.c: Adapt test.
 * gcc.dg/attr-access-2.c: Adapt test.

Neither appears to be a regression.  Seems like it should defer to gcc-14.
jeff


Re: [PATCH v2] RISC-V: Add Z*inx imcompatible check in gcc.

2023-04-04 Thread Jeff Law via Gcc-patches




On 4/3/23 19:46, Hans-Peter Nilsson wrote:

On Tue, 28 Mar 2023, Jiawei wrote:


+  // Zfinx is conflict with float extensions.
+  if (TARGET_ZFINX && TARGET_HARD_FLOAT)
+error ("z*inx is conflict with float extensions");
+


While I'm not a native English speaker, "is conflict with"
doesn't sound grammatically correct.  Perhaps "conflicts with"
or "is in conflict with"?

"conflicts with" is better.

Jeff



Re: [PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

2023-04-04 Thread Jeff Law via Gcc-patches




On 4/3/23 23:13, liuhongt via Gcc-patches wrote:

There's a potential performance issue when the backend returns some
unreasonable value for a mode which can never be allocated with the
reg class.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk(or GCC14 stage1)?

gcc/ChangeLog:

PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
Not a regression *and* changing register allocation.  This seems like it 
should defer to gcc-14.


jeff


Re: [PATCH] RISC-V: Fix PR109399 VSETVL PASS bug

2023-04-04 Thread Jeff Law via Gcc-patches




On 4/4/23 02:46, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This patch fixes bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109399

 PR 109399

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc 
(pass_vsetvl::compute_local_backward_infos): Update user vsetvl in local demand 
fusion.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/vsetvl/pr109399.c: New test.

Thanks.  Installed.

I noted that change_vsetvl_insn does not have a function comment.  Can 
you please add one?  Perhaps something like this:


/* INSN is either a vector configuration insn, or an insn with a vtype
   that is immediately preceded by a vector configuration insn.  In
   both cases, change the vector configuration insn to utilize the
   vector configuration state in INFO.  */



Re: [PATCH] RISC-V: Fix typo

2023-04-04 Thread Jeff Law via Gcc-patches




On 4/4/23 01:49, Li Xu wrote:

gcc/ChangeLog:

 * config/riscv/riscv-vector-builtins.def: Fix typo.
 * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Ditto.
 * config/riscv/vector-iterators.md: Ditto.

Thanks.  Installed on the trunk.
jeff


[committed] doc: md.texi (Including Patterns): Fix page break

2023-04-04 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious.  See also the previous discussion
regarding my define_split doc patch.
-- >8 --
The line-break in the example looked odd, even more so with
a page-break in the middle of it, due to recently added text
in preceding pages.

* doc/md.texi (Including Patterns): Fix page break.
---
 gcc/doc/md.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 970af5e34710..07bf8bdebffb 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8963,8 +8963,7 @@ It looks like:
 
 @smallexample
 
-(include
-  @var{pathname})
+(include @var{pathname})
 @end smallexample
 
 For example:
-- 
2.30.2



[PATCH 1/4] libstdc++: Harmonize <version> and other headers

2023-04-04 Thread Arsen Arsenović via Gcc-patches
Due to recent, large changes in libstdc++, the feature test macros
declared in <version> got out of sync with the other headers that
possibly declare them.
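
A rough sketch of why keeping the definition blocks identical helps, written in C with hypothetical macro names (MYLIB_FEATURE_MEMORY is not a real libstdc++ macro, and __STDC_VERSION__ stands in for __cplusplus). Redefining a macro with exactly the same replacement list is permitted, so two headers that carry the same block never conflict regardless of include order; a mismatch in value is what produces a redefinition diagnostic.

```c
#include <assert.h>

/* "First header": hypothetical stand-in for one header's definition.  */
#if __STDC_VERSION__ > 201710L
# define MYLIB_FEATURE_MEMORY 202202L
#elif __STDC_VERSION__ >= 199901L
# define MYLIB_FEATURE_MEMORY 201811L
#else
# define MYLIB_FEATURE_MEMORY 0L
#endif

/* "Second header": the same block, token for token.  Because the
   replacement list is identical, this redefinition is benign.  */
#if __STDC_VERSION__ > 201710L
# define MYLIB_FEATURE_MEMORY 202202L
#elif __STDC_VERSION__ >= 199901L
# define MYLIB_FEATURE_MEMORY 201811L
#else
# define MYLIB_FEATURE_MEMORY 0L
#endif

/* Return nonzero if the feature level is at least WANTED.  */
int
mylib_has_feature_memory (long wanted)
{
  assert (wanted > 0L);
  return MYLIB_FEATURE_MEMORY >= wanted;
}
```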

libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h (__cpp_lib_constexpr_memory):
Synchronize the definition block with...
* include/bits/ptr_traits.h (__cpp_lib_constexpr_memory):
... this one here.  Also define the 202202L value, rather than
leaving it up to purely unique_ptr.h, so that the value is
synchronized across all headers.
(__gnu_debug::_Safe_iterator_base): Move into new conditional
block.
* include/std/memory (__cpp_lib_atomic_value_initialization):
Define on freestanding under the same conditions as in
atomic_base.h.
* include/std/version (__cpp_lib_robust_nonmodifying_seq_ops):
Also define on freestanding.
(__cpp_lib_to_chars): Ditto.
(__cpp_lib_gcd): Ditto.
(__cpp_lib_gcd_lcm): Ditto.
(__cpp_lib_raw_memory_algorithms): Ditto.
(__cpp_lib_array_constexpr): Ditto.
(__cpp_lib_nonmember_container_access): Ditto.
(__cpp_lib_clamp): Ditto.
(__cpp_lib_constexpr_char_traits): Ditto.
(__cpp_lib_constexpr_string): Ditto.
(__cpp_lib_sample): Ditto.
(__cpp_lib_lcm): Ditto.
(__cpp_lib_constexpr_iterator): Ditto.
(__cpp_lib_constexpr_char_traits): Ditto.
(__cpp_lib_interpolate): Ditto.
(__cpp_lib_constexpr_utility): Ditto.
(__cpp_lib_shift): Ditto.
(__cpp_lib_ranges): Ditto.
(__cpp_lib_move_iterator_concept): Ditto.
(__cpp_lib_constexpr_numeric): Ditto.
(__cpp_lib_constexpr_functional): Ditto.
(__cpp_lib_constexpr_algorithms): Ditto.
(__cpp_lib_constexpr_tuple): Ditto.
(__cpp_lib_constexpr_memory): Ditto.
---
Evening,

This patchset is a replacement to and extension of the one presented at
https://inbox.sourceware.org/libstdc++/20230309222626.4008373-1-ar...@aarsen.me/
... that has been rebased as to include newer additions, and extended to
cover some regressions that seem to have occurred recently in
freestanding mode.

Tested on x86_64-pc-linux-gnu.

OK for trunk?

Thanks in advance, have a lovely night.

 libstdc++-v3/include/bits/ptr_traits.h | 13 ++--
 libstdc++-v3/include/bits/unique_ptr.h | 11 ++--
 libstdc++-v3/include/std/memory|  6 ++
 libstdc++-v3/include/std/version   | 85 ++
 4 files changed, 66 insertions(+), 49 deletions(-)

diff --git a/libstdc++-v3/include/bits/ptr_traits.h 
b/libstdc++-v3/include/bits/ptr_traits.h
index dc42a743c96..f6cc6b65f93 100644
--- a/libstdc++-v3/include/bits/ptr_traits.h
+++ b/libstdc++-v3/include/bits/ptr_traits.h
@@ -34,12 +34,15 @@
 
 #include 
 
+/* Duplicate definition with unique_ptr.h.  */
+#if __cplusplus > 202002L && defined(__cpp_constexpr_dynamic_alloc)
+# define __cpp_lib_constexpr_memory 202202L
+#elif __cplusplus > 201703L
+# include 
+# define __cpp_lib_constexpr_memory 201811L
+#endif
+
 #if __cplusplus > 201703L
-#include 
-# ifndef __cpp_lib_constexpr_memory
-// Defined to a newer value in bits/unique_ptr.h for C++23
-#  define __cpp_lib_constexpr_memory 201811L
-# endif
 namespace __gnu_debug { struct _Safe_iterator_base; }
 #endif
 
diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index c8daff41865..f0c6d2383b4 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -43,12 +43,11 @@
 # endif
 #endif
 
-#if __cplusplus > 202002L && __cpp_constexpr_dynamic_alloc
-# if __cpp_lib_constexpr_memory < 202202L
-// Defined with older value in bits/ptr_traits.h for C++20
-#  undef __cpp_lib_constexpr_memory
-#  define __cpp_lib_constexpr_memory 202202L
-# endif
+/* Duplicate definition with ptr_traits.h.  */
+#if __cplusplus > 202002L && defined(__cpp_constexpr_dynamic_alloc)
+# define __cpp_lib_constexpr_memory 202202L
+#elif __cplusplus > 201703L
+# define __cpp_lib_constexpr_memory 201811L
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index 341f9857730..85c36d67ee1 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -91,6 +91,12 @@
 #  include 
 #endif
 
+/* As a hack, we declare __cpp_lib_atomic_value_initialization here even though
+   we don't include the bit that actually declares it, for consistency.  */
+#if !defined(__cpp_lib_atomic_value_initialization) && __cplusplus >= 202002L
+# define __cpp_lib_atomic_value_initialization 201911L
+#endif
+
 #if __cplusplus >= 201103L && __cplusplus <= 202002L && _GLIBCXX_HOSTED
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index a19c39c6cdd..0239fcea813 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -85,6 +85,12 

[PATCH 3/4] libstdc++: Downgrade DEBUG to ASSERTIONS when !HOSTED

2023-04-04 Thread Arsen Arsenović via Gcc-patches
Supporting the debug mode in freestanding is a non-trivial job, so
instead, as a best-effort, enable assertions, which are light and easy.
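
The downgrade can be sketched with the preprocessor alone. The following C fragment uses hypothetical macros (MYLIB_DEBUG and MYLIB_ASSERTIONS stand in for _GLIBCXX_DEBUG and _GLIBCXX_ASSERTIONS) and assumes the user has requested the debug mode; only __STDC_HOSTED__ is a real, standard macro.

```c
/* Hypothetical stand-ins for _GLIBCXX_DEBUG and _GLIBCXX_ASSERTIONS;
   here the user has requested the heavyweight debug mode.  */
#define MYLIB_DEBUG 1

/* __STDC_HOSTED__ is 0 for a freestanding implementation.  In that
   case drop the debug mode and fall back to plain assertions.  */
#if defined(MYLIB_DEBUG) && !__STDC_HOSTED__
# undef MYLIB_DEBUG
# define MYLIB_ASSERTIONS 1
#endif

/* Report which mode survived: 2 = debug, 1 = assertions only, 0 = none.  */
int
mylib_check_level (void)
{
#if defined(MYLIB_DEBUG)
  return 2;
#elif defined(MYLIB_ASSERTIONS)
  return 1;
#else
  return 0;
#endif
}
```

Built hosted, mylib_check_level () returns 2; built with -ffreestanding it returns 1, which mirrors the best-effort behaviour described above.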

libstdc++-v3/ChangeLog:

* include/bits/c++config: When __STDC_HOSTED__ is zero,
disable _GLIBCXX_DEBUG and, if it was set, enable
_GLIBCXX_ASSERTIONS.
* testsuite/lib/libstdc++.exp (check_v3_target_debug_mode):
Include  when determining whether debug is
set, in order to inherit the logic from above
---
 libstdc++-v3/include/bits/c++config  | 7 +++
 libstdc++-v3/testsuite/lib/libstdc++.exp | 1 +
 2 files changed, 8 insertions(+)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 71f2401402f..13892787e09 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -397,6 +397,13 @@ _GLIBCXX_END_NAMESPACE_VERSION
 # define _GLIBCXX_END_INLINE_ABI_NAMESPACE(X)   } // inline namespace X
 #endif
 
+// In the case that we don't have a hosted environment, we can't provide the
+// debugging mode.  Instead, we do our best and downgrade to assertions.
+#if defined(_GLIBCXX_DEBUG) && !__STDC_HOSTED__
+#undef _GLIBCXX_DEBUG
+#define _GLIBCXX_ASSERTIONS 1
+#endif
+
 // Inline namespaces for special modes: debug, parallel.
 #if defined(_GLIBCXX_DEBUG) || defined(_GLIBCXX_PARALLEL)
 namespace std
diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 98512c973fb..490abd108fa 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1007,6 +1007,7 @@ proc check_v3_target_debug_mode { } {
 global cxxflags
 return [check_v3_target_prop_cached et_debug_mode {
set code "
+   #include 
#if ! defined _GLIBCXX_DEBUG
# error no debug mode
#endif
-- 
2.40.0



[PATCH 4/4] libstdc++: Fix some freestanding test failures

2023-04-04 Thread Arsen Arsenović via Gcc-patches
At some point,  was added to the non-hosted bit of the C++17
block, which induced failures in many tests.

In addition, some tests also lacked a dg-require-effective-target hosted
tag.

libstdc++-v3/ChangeLog:

* include/precompiled/stdc++.h (C++17): Don't double-include
, once with wrong conditions.
* testsuite/18_support/96817.cc: Require hosted.
* testsuite/18_support/bad_exception/59392.cc: Ditto.
* testsuite/20_util/scoped_allocator/108952.cc: Ditto.
* testsuite/20_util/uses_allocator/lwg3527.cc: Ditto.
* testsuite/29_atomics/atomic/operators/pointer_partial_void.cc:
Ditto.
---
 libstdc++-v3/include/precompiled/stdc++.h| 1 -
 libstdc++-v3/testsuite/18_support/96817.cc   | 1 +
 libstdc++-v3/testsuite/18_support/bad_exception/59392.cc | 1 +
 libstdc++-v3/testsuite/20_util/scoped_allocator/108952.cc| 1 +
 libstdc++-v3/testsuite/20_util/uses_allocator/lwg3527.cc | 1 +
 .../29_atomics/atomic/operators/pointer_partial_void.cc  | 1 +
 6 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
b/libstdc++-v3/include/precompiled/stdc++.h
index bc011986b53..176ad79ff3c 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -75,7 +75,6 @@
 
 #if __cplusplus >= 201703L
 #include 
-#include 
 // #include 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/18_support/96817.cc 
b/libstdc++-v3/testsuite/18_support/96817.cc
index 70938812bd8..073fc337e8f 100644
--- a/libstdc++-v3/testsuite/18_support/96817.cc
+++ b/libstdc++-v3/testsuite/18_support/96817.cc
@@ -17,6 +17,7 @@
 
 // { dg-do run }
 // { dg-additional-options "-pthread" { target pthread } }
+// { dg-require-effective-target hosted }
 
 // Static init cannot detect recursion for gthreads targets without futexes
 // (and the futex case can only detect it if __libc_single_threaded==true).
diff --git a/libstdc++-v3/testsuite/18_support/bad_exception/59392.cc 
b/libstdc++-v3/testsuite/18_support/bad_exception/59392.cc
index ac64e6eddb2..ae972d0535d 100644
--- a/libstdc++-v3/testsuite/18_support/bad_exception/59392.cc
+++ b/libstdc++-v3/testsuite/18_support/bad_exception/59392.cc
@@ -17,6 +17,7 @@
 
 // { dg-options "-Wno-deprecated" }
 // { dg-do run { target c++14_down } }
+// { dg-require-effective-target hosted }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/scoped_allocator/108952.cc 
b/libstdc++-v3/testsuite/20_util/scoped_allocator/108952.cc
index a6b9c67498c..9342f453bf4 100644
--- a/libstdc++-v3/testsuite/20_util/scoped_allocator/108952.cc
+++ b/libstdc++-v3/testsuite/20_util/scoped_allocator/108952.cc
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++11 } }
+// { dg-require-effective-target hosted }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3527.cc 
b/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3527.cc
index ae377f4b5a3..c5a7d513b31 100644
--- a/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3527.cc
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3527.cc
@@ -1,5 +1,6 @@
 // { dg-options "-std=gnu++20" }
 // { dg-do compile { target c++20 } }
+// { dg-require-effective-target hosted }
 
 #include 
 
diff --git 
a/libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
index ddb63233a64..e5d221ed15a 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
@@ -1,5 +1,6 @@
 // { dg-do run { target { c++11_only || c++14_only } } }
 // { dg-require-atomic-builtins "" }
+// { dg-require-effective-target hosted }
 
 // Copyright (C) 2012-2023 Free Software Foundation, Inc.
 //
-- 
2.40.0



[PATCH 2/4] libstdc++: Add a test for FTM redefinitions

2023-04-04 Thread Arsen Arsenović via Gcc-patches
This test detects redefinitions by compiling stdc++.h with
-Wsystem-headers.  Thanks Patrick Palka for the suggestion.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/versionconflict.cc: New test.
---
 libstdc++-v3/testsuite/17_intro/versionconflict.cc | 6 ++
 1 file changed, 6 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/17_intro/versionconflict.cc

diff --git a/libstdc++-v3/testsuite/17_intro/versionconflict.cc 
b/libstdc++-v3/testsuite/17_intro/versionconflict.cc
new file mode 100644
index 000..4191c7a2b08
--- /dev/null
+++ b/libstdc++-v3/testsuite/17_intro/versionconflict.cc
@@ -0,0 +1,6 @@
+// { dg-do preprocess }
+// { dg-additional-options "-Wsystem-headers -Werror" }
+
+// Test for redefinitions of FTMs using bits/stdc++.h.
+#include 
+#include 
-- 
2.40.0



Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-04-04 Thread Lewis Hyatt via Gcc-patches
May I please ping this one?
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html

Thanks!

-Lewis

On Thu, Mar 2, 2023 at 6:21 PM Lewis Hyatt  wrote:
>
> The PR complains that we do not handle UTF-8 in the suffix for a user-defined
> literal, such as:
>
> bool operator ""_π (unsigned long long);
>
> In fact we don't handle any extended identifier characters there, whether
> UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
> the "" tokens is included, since then the identifier is lexed in the "normal"
> way as its own token. But when it is lexed as part of the string token, this
> is handled in lex_string() with a one-off loop that is not aware of extended
> characters.
>
> This patch fixes it by adding a new function scan_cur_identifier() that can be
> used to lex an identifier while in the middle of lexing another token.
>
> BTW, the other place that has been mis-lexing identifiers is
> lex_identifier_intern(), which is used to implement #pragma push_macro
> and #pragma pop_macro. This does not support extended characters either.
> I will add that in a subsequent patch, because it can't directly reuse the
> new function, but rather needs to lex from a string instead of a cpp_buffer.
>
> With scan_cur_identifier(), we do also correctly warn about bidi and
> normalization issues in the extended identifiers comprising the suffix.
>
> libcpp/ChangeLog:
>
> PR preprocessor/103902
> * lex.cc (identifier_diagnostics_on_lex): New function refactoring
> some common code.
> (lex_identifier_intern): Use the new function.
> (lex_identifier): Don't run identifier diagnostics here, rather let
> the call site do it when needed.
> (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
> accordingly.
> (struct scan_id_result): New struct.
> (scan_cur_identifier): New function.
> (create_literal2): New function.
> (lit_accum::create_literal2): New function.
> (is_macro): Folded into new function...
> (maybe_ignore_udl_macro_suffix): ...here.
> (is_macro_not_literal_suffix): Folded likewise.
> (lex_raw_string): Handle UTF-8 in UDL suffix via scan_cur_identifier 
> ().
> (lex_string): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR preprocessor/103902
> * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
> * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
> * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
> * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
> ---
>
> Notes:
> Hello-
>
> This is the updated version of the patch, incorporating feedback from 
> Jakub
> and Jason, most recently discussed here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html
>
> Please let me know how it looks? It is simpler than before with the new
> approach. Thanks!
>
> One thing to note. As Jason clarified for me, a usage like this:
>
>  #pragma GCC poison _x
> const char * operator "" _x (const char *, unsigned long);
>
> The space between the "" and the _x is currently allowed but will be
> deprecated in C++23. GCC currently will complain about the poisoned use of
> _x in this case, and this patch, which is just focused on handling UTF-8
> properly, does not change this. But it seems that it would be correct
> not to apply poison in this case. I can try to follow up with a patch to 
> do
> so, if it seems worthwhile? Given the syntax is deprecated, maybe it's not
> worth it...
>
> For the time being, this patch does add a testcase for the above and 
> xfails
> it. For the case where no space is present, which is the part touched by 
> the
> present patch, existing behavior is preserved correctly and no diagnostics
> such as poison are issued for the UDL suffix. (Contrary to v1 of this
> patch.)
>
> Thanks! bootstrap + regtested all languages on x86-64 Linux with
> no regressions.
>
> -Lewis
>
>  .../g++.dg/cpp0x/udlit-extended-id-1.C|  68 
>  .../g++.dg/cpp0x/udlit-extended-id-2.C|   6 +
>  .../g++.dg/cpp0x/udlit-extended-id-3.C|  15 +
>  .../g++.dg/cpp0x/udlit-extended-id-4.C|  14 +
>  libcpp/lex.cc | 382 ++
>  5 files changed, 317 insertions(+), 168 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-2.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-4.C
>
> diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C 
> b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
> new file mode 100644
> index 000..411d4fdd0ba
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
> @@ -0,0 +1,68 @@
> +// { dg-do run { target 

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-04-04 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Mon, 13 Mar 2023 at 13:03, Richard Biener  wrote:
>> On GIMPLE it would be
>>
>>  _1 = { a, ... }; // (a)
>>  _2 = { _1, ... }; // (b)
>>
>> but I'm not sure if (b), a VL CTOR of fixed len(?) sub-vectors is
>> possible?  But at least a CTOR of vectors is what we use to
>> concat vectors.
>>
>> With the recent relaxing of VEC_PERM inputs it's also possible to
>> express (b) with a VEC_PERM:
>>
>>  _2 = VEC_PERM <_1, _1, { 0, 1, 2, 3, 0, 1, 2, 3, ... }>
>>
>> but again I'm not sure if that repeating 0, 1, 2, 3 is expressible
>> for VL vectors (maybe we'd allow "wrapping" here, I'm not sure).
>>
> Hi,
> Thanks for the suggestions and sorry for late response in turn.
> The attached patch tries to fix the issue by explicitly constructing a CTOR
> from svdupq's arguments and then using VEC_PERM_EXPR with VL mask
> having encoded elements {0, 1, ... nargs-1},
> npatterns == nargs, and nelts_per_pattern == 1, to replicate the base vector.
>
> So for example, for the above case,
> svint32_t f_32(int32x4_t x)
> {
>   return svdupq_s32 (x[0], x[1], x[2], x[3]);
> }
>
> forwprop1 lowers it to:
>   svint32_t _6;
>   vector(4) int _8;
>   :
>   _1 = BIT_FIELD_REF ;
>   _2 = BIT_FIELD_REF ;
>   _3 = BIT_FIELD_REF ;
>   _4 = BIT_FIELD_REF ;
>   _8 = {_1, _2, _3, _4};
>   _6 = VEC_PERM_EXPR <_8, _8, { 0, 1, 2, 3, ... }>;
>   return _6;
>
> which is then eventually optimized to:
>   svint32_t _6;
>[local count: 1073741824]:
>   _6 = VEC_PERM_EXPR ;
>   return _6;
>
> code-gen:
> f_32:
> dup z0.q, z0.q[0]
> ret

Nice!

> Does it look OK ?
>
> Thanks,
> Prathamesh
>> Richard.
>>
>> > We're planning to implement the ACLE's Neon-SVE bridge:
>> > https://github.com/ARM-software/acle/blob/main/main/acle.md#neon-sve-bridge
>> > and so we'll need (b) to implement the svdup_neonq functions.
>> >
>> > Thanks,
>> > Richard
>> >
>>
>> --
>> Richard Biener 
>> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
>> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
>> HRB 36809 (AG Nuernberg)
>
> [SVE] Fold svld1rq to VEC_PERM_EXPR if elements are not constant.
>
> gcc/ChangeLog:
>   * config/aarch64/aarch64-sve-builtins-base.cc
>   (svdupq_impl::fold_nonconst_dupq): New method.
>   (svdupq_impl::fold): Call fold_nonconst_dupq.
>
> gcc/testsuite/ChangeLog:
>   * gcc.target/aarch64/sve/acle/general/dupq_11.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index cd9cace3c9b..3de79060619 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -817,6 +817,62 @@ public:
>  
>  class svdupq_impl : public quiet
>  {
> +private:
> +  gimple *
> +  fold_nonconst_dupq (gimple_folder , unsigned factor) const
> +  {
> +/* Lower lhs = svdupq (arg0, arg1, ..., argN} into:
> +   tmp = {arg0, arg1, ..., arg}
> +   lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
> +
> +/* TODO: Revisit to handle factor by padding zeros.  */
> +if (factor > 1)
> +  return NULL;

Isn't the key thing here predicate vs. vector rather than factor == 1 vs.
factor != 1?  Do we generate good code for b8, where factor should be 1?

> +
> +if (BYTES_BIG_ENDIAN)
> +  return NULL;
> +
> +tree lhs = gimple_call_lhs (f.call);
> +if (TREE_CODE (lhs) != SSA_NAME)
> +  return NULL;

Why is this check needed?

> +tree lhs_type = TREE_TYPE (lhs);
> +tree elt_type = TREE_TYPE (lhs_type);
> +scalar_mode elt_mode = GET_MODE_INNER (TYPE_MODE (elt_type));

Aren't we already dealing with a scalar type here?  I'd have expected
SCALAR_TYPE_MODE rather than GET_MODE_INNER (TYPE_MODE ...).

> +machine_mode vq_mode = aarch64_vq_mode (elt_mode).require ();
> +tree vq_type = build_vector_type_for_mode (elt_type, vq_mode);
> +
> +unsigned nargs = gimple_call_num_args (f.call);
> +vec *v;
> +vec_alloc (v, nargs);
> +for (unsigned i = 0; i < nargs; i++)
> +  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, gimple_call_arg (f.call, i));
> +tree vec = build_constructor (vq_type, v);
> +
> +tree access_type
> +  = build_aligned_type (vq_type, TYPE_ALIGN (elt_type));

Nit: seems to fit on one line.  But do we need this?  We're not accessing
memory, so I'd have expected vq_type to be OK as-is.

> +tree tmp = make_ssa_name_fn (cfun, access_type, 0);
> +gimple *g = gimple_build_assign (tmp, vec);
> +
> +gimple_seq stmts = NULL;
> +gimple_seq_add_stmt_without_update (, g);
> +
> +int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();

Looks like we should be able to use nargs instead of source_nelts.

Thanks,
Richard

> +poly_uint64 lhs_len = TYPE_VECTOR_SUBPARTS (lhs_type);
> +vec_perm_builder sel (lhs_len, source_nelts, 1);
> +for (int i = 0; i < source_nelts; i++)
> +  sel.quick_push (i);
> +
> +

Re: [PATCHv4] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2023-04-04 Thread Richard Sandiford via Gcc-patches
Andrew Pinski via Gcc-patches  writes:
> The problem here is that aarch64_expand_setmem does not change the alignment
> for strict alignment case.
> This is version 4 of the fix, major changes from the last version is fixing
> the way store pairs are handled which allows handling of storing 2 SI mode
> at a time.
> This also adds a testcase to show a case with -mstrict-align we can do
> the store word pair stores.

Heh.  The patch seems to be getting more complicated. :-)

> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
>   PR target/103100
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_gen_store_pair):
>   Add support for SImode.
>   (aarch64_set_one_block_and_progress_pointer):
>   Add use_pair argument and rewrite and simplifying the
>   code.
>   (aarch64_can_use_pair_load_stores): New function.
>   (aarch64_expand_setmem): Rewrite mode selection to
>   better handle strict alignment and non ld/stp pair case.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/memset-strict-align-1.c: Update test.
>   Reduce the size down to 207 and make s1 global and aligned
>   to 16 bytes.
>   * gcc.target/aarch64/memset-strict-align-2.c: New test.
>   * gcc.target/aarch64/memset-strict-align-3.c: New test.
> ---
>  gcc/config/aarch64/aarch64.cc | 136 ++
>  .../aarch64/memset-strict-align-1.c   |  19 ++-
>  .../aarch64/memset-strict-align-2.c   |  14 ++
>  .../aarch64/memset-strict-align-3.c   |  15 ++
>  4 files changed, 113 insertions(+), 71 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-3.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5c40b6ed22a..3eaf9bd608a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -8850,6 +8850,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, 
> rtx reg1, rtx mem2,
>  {
>switch (mode)
>  {
> +case E_SImode:
> +  return gen_store_pair_sw_sisi (mem1, reg1, mem2, reg2);
> +
>  case E_DImode:
>return gen_store_pair_dw_didi (mem1, reg1, mem2, reg2);
>  
> @@ -24896,42 +24899,49 @@ aarch64_expand_cpymem (rtx *operands)
> SRC is a register we have created with the duplicated value to be set.  */
>  static void
>  aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
> - machine_mode mode)
> + machine_mode mode, bool use_pairs)

It would be good to update the comment, since this no longer matches
the aarch64_copy_one_block_and_progress_pointers interface very closely.

>  {
> +  rtx reg = src;
>/* If we are copying 128bits or 256bits, we can do that straight from
>   the SIMD register we prepared.  */
> -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> -{
> -  mode = GET_MODE (src);
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, mode, 0);
> -  /* Emit the memset.  */
> -  emit_insn (aarch64_gen_store_pair (mode, *dst, src,
> -  aarch64_progress_pointer (*dst), src));
> -
> -  /* Move the pointers forward.  */
> -  *dst = aarch64_move_pointer (*dst, 32);
> -  return;
> -}
>if (known_eq (GET_MODE_BITSIZE (mode), 128))
> -{
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, GET_MODE (src), 0);
> -  /* Emit the memset.  */
> -  emit_move_insn (*dst, src);
> -  /* Move the pointers forward.  */
> -  *dst = aarch64_move_pointer (*dst, 16);
> -  return;
> -}
> -  /* For copying less, we have to extract the right amount from src.  */
> -  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
> +mode = GET_MODE(src);

Nit: space before "(".

> +  else
> +/* For copying less, we have to extract the right amount from src.  */
> +reg = lowpart_subreg (mode, src, GET_MODE (src));
>  
>/* "Cast" the *dst to the correct mode.  */
>*dst = adjust_address (*dst, mode, 0);
>/* Emit the memset.  */
> -  emit_move_insn (*dst, reg);
> +  if (use_pairs)
> +emit_insn (aarch64_gen_store_pair (mode, *dst, reg,
> +aarch64_progress_pointer (*dst),
> +reg));
> +  else
> +emit_move_insn (*dst, reg);
> +
>/* Move the pointer forward.  */
>*dst = aarch64_progress_pointer (*dst);
> +  if (use_pairs)
> +*dst = aarch64_progress_pointer (*dst);
> +}
> +
> +/* Returns true if size can be used as a store/load pair.
> +   This is a helper function for aarch64_expand_setmem and others. */
> +static bool
> +aarch64_can_use_pair_load_stores (unsigned HOST_WIDE_INT size)
> +{
> +  /* For DI and SI modes, we can use store pairs.  */
> +  if (size == 

[PATCH] Add -fsarif-time-report [PR109361]

2023-04-04 Thread David Malcolm via Gcc-patches
Richi, Jakub: I can probably self-approve this, but it's technically a
new feature.  OK if I push this to trunk in stage 4?  I believe it's
low risk, and is very useful for benchmarking -fanalyzer.


This patch adds support for embedding profiling information about the
compiler itself into the SARIF output.

In an earlier version of this patch I extended -ftime-report so that
as well as writing to stderr, it would embed the information in any
SARIF output.  This turned out to be awkward to use, in that I found
myself needing to get the data in JSON form without also having it
emitted on stderr (which was affecting the output of the build).

Hence this version of the patch adds a new -fsarif-time-report option,
similar to the existing -ftime-report, for requesting that GCC profile
itself using the timevar machinery.

Specifically, if -fsarif-time-report is specified, the timing
information will be captured (as if -ftime-report were specified), and
will be embedded in JSON form within any SARIF output as a
"gcc/timeReport" property within a property bag of the "invocation"
object.

Here's an example of the output:

  "invocations": [
  {
  "executionSuccessful": true,
  "toolExecutionNotifications": [],
  "properties": {
  "gcc/timeReport": {
  "timevars": [
  {
  "name": "phase setup",
  "elapsed": {
  "user": 0.04,
  "sys": 0,
  "wall": 0.04,
  "ggc_mem": 1863472
  }
  },

  [...snip...]

  {
  "name": "analyzer: processing worklist",
  "elapsed": {
  "user": 0.06,
  "sys": 0,
  "wall": 0.06,
  "ggc_mem": 48
  }
  },
  {
  "name": "analyzer: emitting diagnostics",
  "elapsed": {
  "user": 0.01,
  "sys": 0,
  "wall": 0.01,
  "ggc_mem": 0
  }
  },
  {
  "name": "TOTAL",
  "elapsed": {
  "user": 0.21,
  "sys": 0.03,
  "wall": 0.24,
  "ggc_mem": 3368736
  }
  }
  ],
  "CHECKING_P": true,
  "flag_checking": true
  }
  }
  }
  ]

I have successfully used this in my analyzer integration tests to get
timing information about which source files get slowed down by the
analyzer.  I've validated the generated .sarif files against the SARIF
schema.

The documentation notes that the precise output format is subject
to change.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR analyzer/109361
* common.opt (fsarif-time-report): New option.
* diagnostic-client-data-hooks.h (class sarif_object): New forward
decl.
(diagnostic_client_data_hooks::add_sarif_invocation_properties):
New vfunc.
* diagnostic-format-sarif.cc: Include "diagnostic-format-sarif.h".
(class sarif_invocation): Inherit from sarif_object rather
than json::object.
(class sarif_result): Likewise.
(class sarif_ice_notification): Likewise.
(sarif_object::get_or_create_properties): New.
(sarif_invocation::prepare_to_flush): Add "context" param.  Use it
to call the context's add_sarif_invocation_properties hook.
(sarif_builder::flush_to_file): Pass m_context to
sarif_invocation::prepare_to_flush.
* diagnostic-format-sarif.h: New header.
* doc/invoke.texi (Developer Options): Add -fsarif-time-report.
* timevar.cc: Include "json.h".
(timer::named_items::m_hash_map): Split out type into...
(timer::named_items::hash_map_t): ...this new typedef.
(timer::named_items::make_json): New function.
(timevar_diff): New function.
(make_json_for_timevar_time_def): New function.
(timer::timevar_def::make_json): New function.
(timer::make_json): New function.
* timevar.h (class json::value): New forward decl.
(timer::make_json): New decl.
(timer::timevar_def::make_json): New decl.
* toplev.cc (toplev::~toplev): Guard the call to timer::print so
that it doesn't happen on just -fsarif-time-report.
(toplev::start_timevars): Add sarif_time_report to the flags that
can lead to 

Re: [PATCH] ree: Improvement of ree pass for rs6000 target.

2023-04-04 Thread Segher Boessenkool
On Tue, Apr 04, 2023 at 05:26:26PM +0200, Jakub Jelinek wrote:
> On Tue, Apr 04, 2023 at 10:19:23AM -0500, Segher Boessenkool wrote:
> > > > +/* Enable -free for zero extension and sign extension 
> > > > elimination.*/
> > > > +{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> > > 
> > > I believe the options should be sorted by the OPT_LEVEL* they are given.
> > 
> > If that is true, that rule is violated all over the place already.  It
> > doesn't make much sense anyway, the OPT_LEVEL* have no complete ordering
> > at all.  But, yeah, -O2 stuff after the -O1 stuff makes sense, and we do
> > have such a partial ordering now.
> 
> At least default_options_table sorts stuff like that.  Sure, the larger
> the table it is, the more it is important to be able to see clearly what
> each level enables.

And ours is 16 lines including whitespace.  But yeah, and the
presentation can be improved other ways as well.

The default_options_table has in order:
/* -O1 and -Og optimizations.  */
/* -O1 (and not -Og) optimizations.  */
/* -O2 and -Os optimizations.  */
/* -O2 and above optimizations, but not -Os or -Og.  */
/* -O3 and -Os optimizations.  */
/* -O3 optimizations.  */
/* -O3 parameters.  */
/* -Ofast adds optimizations to -O3.  */
(and no OPT_LEVELS_ALL at all, the most common for us and probably
most other targets.  Logically these come first, if ordering by opt
level).

Either way, yes we should have some grouping.  Not the same as the
above, but something that does make some sense :-)


Segher


Re: [PATCH] ree: Improvement of ree pass for rs6000 target.

2023-04-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 04, 2023 at 10:19:23AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Apr 04, 2023 at 01:53:46PM +0200, Jakub Jelinek wrote:
> > On Tue, Apr 04, 2023 at 05:02:35PM +0530, Ajit Agarwal via Gcc-patches 
> > wrote:
> > > --- a/gcc/common/config/rs6000/rs6000-common.cc
> > > +++ b/gcc/common/config/rs6000/rs6000-common.cc
> > > @@ -30,6 +30,8 @@
> > >  /* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
> > >  static const struct default_options rs6000_option_optimization_table[] =
> > >{
> > > +/* Enable -free for zero extension and sign extension elimination.*/
> > > +{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> > 
> > I believe the options should be sorted by the OPT_LEVEL* they are given.
> 
> If that is true, that rule is violated all over the place already.  It
> doesn't make much sense anyway, the OPT_LEVEL* have no complete ordering
> at all.  But, yeah, -O2 stuff after the -O1 stuff makes sense, and we do
> have such a partial ordering now.

At least default_options_table sorts stuff like that.  Sure, the larger
the table it is, the more it is important to be able to see clearly what
each level enables.

Jakub



Re: [PATCH] ree: Improvement of ree pass for rs6000 target.

2023-04-04 Thread Segher Boessenkool
Hi!

On Tue, Apr 04, 2023 at 01:53:46PM +0200, Jakub Jelinek wrote:
> On Tue, Apr 04, 2023 at 05:02:35PM +0530, Ajit Agarwal via Gcc-patches wrote:
> > --- a/gcc/common/config/rs6000/rs6000-common.cc
> > +++ b/gcc/common/config/rs6000/rs6000-common.cc
> > @@ -30,6 +30,8 @@
> >  /* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
> >  static const struct default_options rs6000_option_optimization_table[] =
> >{
> > +/* Enable -free for zero extension and sign extension elimination.*/
> > +{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> 
> I believe the options should be sorted by the OPT_LEVEL* they are given.

If that is true, that rule is violated all over the place already.  It
doesn't make much sense anyway, the OPT_LEVEL* have no complete ordering
at all.  But, yeah, -O2 stuff after the -O1 stuff makes sense, and we do
have such a partial ordering now.


Segher


Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2023-04-04 Thread Thomas Schwinge
Hi!

During GCC/OpenMP/nvptx reverse offload investigations, about how to
replace the problematic global 'GOMP_REV_OFFLOAD_VAR', I may have found
something re:

On 2022-08-26T11:07:28+0200, Tobias Burnus  wrote:
> Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.

> On the host side, the last address is checked - if fn_addr != NULL,
> it passes all arguments on to the generic (target.c) gomp_target_rev
> to do the actual offloading.
>
> CUDA does lockup when trying to copy data from the currently running
> stream; hence, a new stream is generated to do the memory copying.

> Future work for nvptx:
> * Adjust 'sleep', possibly [...]
>   to do shorter sleeps than usleep(1)?

... this busy loop.

Current 'libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_run':

[...]
  if (reverse_offload)
CUDA_CALL_ASSERT (cuStreamCreate, &copy_stream, CU_STREAM_NON_BLOCKING);
  r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
 32, threads, 1, 0, NULL, NULL, config);
  if (r != CUDA_SUCCESS)
GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
  if (reverse_offload)
while (true)
  {
r = CUDA_CALL_NOCHECK (cuStreamQuery, NULL);
if (r == CUDA_SUCCESS)
  break;
if (r == CUDA_ERROR_LAUNCH_FAILED)
  GOMP_PLUGIN_fatal ("cuStreamQuery error: %s %s\n", cuda_error (r),
 maybe_abort_msg);
else if (r != CUDA_ERROR_NOT_READY)
  GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r));

if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0)
  {
struct rev_offload *rev_data = ptx_dev->rev_data;
GOMP_PLUGIN_target_rev (rev_data->fn, rev_data->mapnum,
rev_data->addrs, rev_data->sizes,
rev_data->kinds, rev_data->dev_num,
rev_off_dev_to_host_cpy,
rev_off_host_to_dev_cpy, copy_stream);
CUDA_CALL_ASSERT (cuStreamSynchronize, copy_stream);
__atomic_store_n (&rev_data->fn, 0, __ATOMIC_RELEASE);
  }
usleep (1);
  }
  else
r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
  if (reverse_offload)
CUDA_CALL_ASSERT (cuStreamDestroy, copy_stream);
[...]

Instead of this 'while (true)', 'usleep (1)' loop, shouldn't we be able
to use "Stream Memory Operations", which allow one to "Wait on a memory
location", "until the given condition on the memory is satisfied"?

For reference, current 'libgomp/config/nvptx/target.c:GOMP_target_ext':

[...]
  GOMP_REV_OFFLOAD_VAR->mapnum = mapnum;
  GOMP_REV_OFFLOAD_VAR->addrs = (uint64_t) hostaddrs;
  GOMP_REV_OFFLOAD_VAR->sizes = (uint64_t) sizes;
  GOMP_REV_OFFLOAD_VAR->kinds = (uint64_t) kinds;
  GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;

  /* Set 'fn' to trigger processing on the host; wait for completion,
 which is flagged by setting 'fn' back to 0 on the host.  */
  uint64_t addr_struct_fn = (uint64_t) &GOMP_REV_OFFLOAD_VAR->fn;
#if __PTX_SM__ >= 700
  asm volatile ("st.global.release.sys.u64 [%0], %1;"
: : "r"(addr_struct_fn), "r" (fn) : "memory");
#else
  __sync_synchronize ();  /* membar.sys */
  asm volatile ("st.volatile.global.u64 [%0], %1;"
: : "r"(addr_struct_fn), "r" (fn) : "memory");
#endif

#if __PTX_SM__ >= 700
  uint64_t fn2;
  do
{
  asm volatile ("ld.acquire.sys.global.u64 %0, [%1];"
: "=r" (fn2) : "r" (addr_struct_fn) : "memory");
}
  while (fn2 != 0);
#else
  /* ld.global.u64 %r64,[__gomp_rev_offload_var];
 ld.u64 %r36,[%r64];
 membar.sys;  */
  while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_ACQUIRE) != 0)
;  /* spin  */
#endif
[...]
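
As an aside, the handshake that both excerpts implement can be modelled
on the host alone with C11 atomics.  This is only a sketch under stated
assumptions: 'struct mailbox', 'device_post', 'device_done' and
'host_poll_once' are hypothetical names, not libgomp or CUDA APIs, and
the 'result' computation merely stands in for the
GOMP_PLUGIN_target_rev call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared 'rev_offload' data.  */
struct mailbox
{
  _Atomic uint64_t fn;  /* Nonzero: request pending; 0: idle or done.  */
  uint64_t arg;         /* Payload, written before 'fn' is published.  */
  uint64_t result;      /* Written by the host before it clears 'fn'.  */
};

/* "Device" side: write the payload, then publish 'fn' with release
   semantics so the host sees the payload once it sees 'fn'.  */
static void
device_post (struct mailbox *mb, uint64_t fn, uint64_t arg)
{
  assert (fn != 0);  /* 0 is reserved for "no request".  */
  mb->arg = arg;
  atomic_store_explicit (&mb->fn, fn, memory_order_release);
}

/* "Device" side: true once the host has reset 'fn' back to 0.  */
static bool
device_done (struct mailbox *mb)
{
  return atomic_load_explicit (&mb->fn, memory_order_acquire) == 0;
}

/* "Host" side: one iteration of the polling loop.  Returns true if a
   pending request was handled.  */
static bool
host_poll_once (struct mailbox *mb)
{
  uint64_t fn = atomic_load_explicit (&mb->fn, memory_order_acquire);
  if (fn == 0)
    return false;
  /* Placeholder for the real work done on behalf of the device.  */
  mb->result = fn + mb->arg;
  /* Signal completion; pairs with the acquire load in device_done.  */
  atomic_store_explicit (&mb->fn, 0, memory_order_release);
  return true;
}
```

A stream memory wait would, if it works as hoped, replace the repeated
'host_poll_once' calls with a single blocking wait for 'fn' to become
nonzero; the release/acquire pairing would stay the same.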


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [wwwdocs] Add Ada's GCC13 changelog entry

2023-04-04 Thread Marc Poulhiès via Gcc-patches


Fernando Oleo Blanco via Gcc-patches  writes:

> Hi all,
>
> a bit belated but just like last year, I've made a patch for the Ada
> entry in the changelog. You can find the patch attached to this email.
>
> If I have forgotten anything relevant or if I have done something
> incorrectly, please, say so.

Hello Fernando,

I've applied/pushed your change.

Thanks!
Marc


Re: [PATCH] range-op-float: Fix reverse ops of comparisons [PR109386]

2023-04-04 Thread Aldy Hernandez via Gcc-patches
OK.

On Mon, Apr 3, 2023, 21:54 Jakub Jelinek  wrote:

> Hi!
>
> I've missed one of my recent range-op-float.cc changes (likely the
> r13-6967 one) caused
> FAIL: libphobos.phobos/std/math/algebraic.d execution test
> FAIL: libphobos.phobos_shared/std/math/algebraic.d execution test
> regressions, distilled into a C testcase below.
>
> In the testcase, we have
> !(u >= v)
> condition where both u and v are results of fabs*, which guards
> t1 = u u<= __FLT_MAX__;
> and
> t2 = v u<= __FLT_MAX__;
> comparisons.  From false u >= v where u and v have [0.0, +Inf] NAN
> ranges we (incorrectly) deduce that one of them is
> [nextafterf (0.0, 1.0), +Inf] NAN and the other is
> [0.0, nextafterf (+Inf, 0.0)] NAN and from that deduce that
> one of the comparisons is always true, because UNLE_EXPR with the maximum
> representable number are false only if the value is +Inf and our ranges
> tell that is not possible.
>
> The bug is that the u >= v comparison determines a sensible range only when
> it is true - we then know neither operand can be NAN and it behaves
> correctly.  But when the comparison is false, our current code gives
> sensible answers only if the other op can't be NAN.  If it can be NAN,
> whenever it is NAN, the comparison is always false regardless of the other
> value, so the other value needs to be VARYING.
> Now, foperator_unordered_lt::op1_range etc. had code to deal with that
> for op?.known_isnan (), but as the testcase shows, it is enough if it
> may be a NAN at runtime to make it VARYING.
>
> So, the following patch changes, for all those BRS_FALSE cases of the
> normal non-equality comparisons, the
> if (opOTHER.known_isnan ()) r.set_varying (type);
> check to do it also if maybe_isnan ().
>
> For the unordered or ... comparisons, it is similar for BRS_TRUE.  Those
> comparisons are true whenever either operand is NAN or, if neither is
> NAN, whenever the corresponding normal comparison is true.  So, if such
> a comparison is true and the other operand might be a NAN, we can't
> tell (VARYING); if it is false, the current handling is correct.
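
To make the NaN semantics described above concrete, here is a small
standalone C sketch.  The helper names are made up for illustration and
it assumes IEEE semantics (no -ffast-math); it is not taken from the
patch or from the new testcase.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Ordered comparison: false whenever either operand is a NaN.  */
static bool
ordered_ge (float u, float v)
{
  return u >= v;
}

/* "Unordered or less-equal", spelled with the C99 isgreater macro:
   true if either operand is a NaN, or if u <= v.  */
static bool
unordered_le (float u, float v)
{
  return !isgreater (u, v);
}

/* Illustrative checks only.  */
static void
check_nan_cases (void)
{
  /* u >= v is false with u == +Inf as long as v is a NaN, so a false
     u >= v says nothing about u when v may be a NaN.  */
  assert (!ordered_ge (INFINITY, NAN));
  assert (!ordered_ge (NAN, 0.0f));
  /* u u<= FLT_MAX is false only for u == +Inf.  */
  assert (unordered_le (NAN, FLT_MAX));
  assert (unordered_le (FLT_MAX, FLT_MAX));
  assert (!unordered_le (INFINITY, FLT_MAX));
}
```

The first check is the interesting one: it is the case the old
BRS_FALSE handling ruled out when the other operand was only possibly,
rather than definitely, a NaN.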
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, fixes those 2
> D testcases and the newly added one.  Ok for trunk?
>
> 2023-04-03  Jakub Jelinek  
>
> PR tree-optimization/109386
> * range-op-float.cc (foperator_lt::op1_range, foperator_lt::op2_range,
> foperator_le::op1_range, foperator_le::op2_range,
> foperator_gt::op1_range, foperator_gt::op2_range,
> foperator_ge::op1_range, foperator_ge::op2_range): Make r varying for
> BRS_FALSE case even if the other op is maybe_isnan, not just
> known_isnan.
> (foperator_unordered_lt::op1_range, foperator_unordered_lt::op2_range,
> foperator_unordered_le::op1_range, foperator_unordered_le::op2_range,
> foperator_unordered_gt::op1_range, foperator_unordered_gt::op2_range,
> foperator_unordered_ge::op1_range, foperator_unordered_ge::op2_range):
> Make r varying for BRS_TRUE case even if the other op is maybe_isnan,
> not just known_isnan.
>
> * gcc.c-torture/execute/ieee/pr109386.c: New test.
>
> --- gcc/range-op-float.cc.jj2023-04-03 10:42:54.0 +0200
> +++ gcc/range-op-float.cc   2023-04-03 13:31:01.163216123 +0200
> @@ -889,7 +889,7 @@ foperator_lt::op1_range (frange &r,
>
>  case BRS_FALSE:
>// On the FALSE side of x < NAN, we know nothing about x.
> -  if (op2.known_isnan ())
> +  if (op2.known_isnan () || op2.maybe_isnan ())
> r.set_varying (type);
>else
> build_ge (r, type, op2);
> @@ -926,7 +926,7 @@ foperator_lt::op2_range (frange &r,
>
>  case BRS_FALSE:
>// On the FALSE side of NAN < x, we know nothing about x.
> -  if (op1.known_isnan ())
> +  if (op1.known_isnan () || op1.maybe_isnan ())
> r.set_varying (type);
>else
> build_le (r, type, op1);
> @@ -1005,7 +1005,7 @@ foperator_le::op1_range (frange &r,
>
>  case BRS_FALSE:
>// On the FALSE side of x <= NAN, we know nothing about x.
> -  if (op2.known_isnan ())
> +  if (op2.known_isnan () || op2.maybe_isnan ())
> r.set_varying (type);
>else
> build_gt (r, type, op2);
> @@ -1038,7 +1038,7 @@ foperator_le::op2_range (frange &r,
>
>  case BRS_FALSE:
>// On the FALSE side of NAN <= x, we know nothing about x.
> -  if (op1.known_isnan ())
> +  if (op1.known_isnan () || op1.maybe_isnan ())
> r.set_varying (type);
>else if (op1.undefined_p ())
> return false;
> @@ -1122,7 +1122,7 @@ foperator_gt::op1_range (frange &r,
>
>  case BRS_FALSE:
>// On the FALSE side of x > NAN, we know nothing about x.
> -  if (op2.known_isnan ())
> +  if (op2.known_isnan () || op2.maybe_isnan ())
> r.set_varying (type);
>else if (op2.undefined_p ())
> return false;
> @@ -1161,7 +1161,7 @@ foperator_gt::op2_range (frange &r,
>
>  case BRS_FALSE:
>   

Fwd: [V6][PATCH 2/2] Update documentation to clarify a GCC extension

2023-04-04 Thread Qing Zhao via Gcc-patches
Ping….

Qing

Begin forwarded message:

From: Qing Zhao <qing.z...@oracle.com>
Subject: [PATCH 2/2] Update documentation to clarify a GCC extension
Date: March 28, 2023 at 11:49:44 AM EDT
To: ja...@redhat.com, 
jos...@codesourcery.com
Cc: richard.guent...@gmail.com, 
keesc...@chromium.org, 
siddh...@gotplt.org, 
gcc-patches@gcc.gnu.org, Qing Zhao <qing.z...@oracle.com>

on a structure with a C99 flexible array member being nested in
another structure. (PR77650)

"GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.

There are two situations:

  * A structure or a union with a C99 flexible array member is the last
field of another structure, for example:

 struct flex  { int length; char data[]; };
 union union_flex { int others; struct flex f; };

 struct out_flex_struct { int m; struct flex flex_data; };
 struct out_flex_union { int n; union union_flex flex_data; };

In the above, both 'out_flex_struct.flex_data.data[]' and
'out_flex_union.flex_data.f.data[]' are considered as flexible
arrays too.

  * A structure or a union with a C99 flexible array member is the
middle field of another structure, for example:

 struct flex  { int length; char data[]; };

 struct mid_flex { int m; struct flex flex_data; int n; };

In the above, 'mid_flex.flex_data.data[]' has undefined behavior.
Compilers do not handle such cases consistently.  Any code relying on
such a case should be modified to ensure that flexible array members
only end up at the ends of structures.

Please use warning option '-Wflex-array-member-not-at-end' to
identify all such cases in the source code and modify them.  This
warning will be on by default starting from GCC 14.
"
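
For illustration, the first situation (flexible array member at the end,
reached through a nested structure) could be used as in the sketch
below.  'make_out_flex' is a hypothetical helper that is not part of
the patch, and the nesting itself relies on the GNU extension described
above, so a pedantic ISO C build may diagnose it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct flex { int length; char data[]; };

/* GNU extension: 'flex_data' is the last field, so its trailing array
   is still treated as a flexible array.  */
struct out_flex_struct { int m; struct flex flex_data; };

/* By contrast, this layout is what -Wflex-array-member-not-at-end is
   meant to flag, so it is only shown as a comment:
     struct mid_flex { int m; struct flex flex_data; int n; };  */

/* Hypothetical helper: allocate an out_flex_struct with room for LEN
   bytes of trailing data and copy BYTES into it.  The caller frees the
   result.  */
static struct out_flex_struct *
make_out_flex (int m, const char *bytes, int len)
{
  assert (len >= 0);
  struct out_flex_struct *p = malloc (sizeof *p + (size_t) len);
  if (!p)
    return NULL;
  p->m = m;
  p->flex_data.length = len;
  memcpy (p->flex_data.data, bytes, (size_t) len);
  return p;
}
```

Because the flexible array is the very last thing in the outer
structure, 'sizeof *p + len' is enough storage; with a field after
'flex_data' the trailing data would overlap that field.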

gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.
---
gcc/c-family/c.opt|  5 +++
gcc/c/c-decl.cc   |  9 
gcc/doc/extend.texi   | 45 ++-
.../gcc.dg/variable-sized-type-flex-array.c   | 31 +
4 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..c26d9801b63 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -737,6 +737,11 @@ Wformat-truncation=
C ObjC C++ LTO ObjC++ Joined RejectNegative UInteger Var(warn_format_trunc) 
Warning LangEnabledBy(C ObjC C++ LTO ObjC++,Wformat=, warn_format >= 1, 0) 
IntegerRange(0, 2)
Warn about calls to snprintf and similar functions that truncate output.

+Wflex-array-member-not-at-end
+C C++ Var(warn_flex_array_member_not_at_end) Warning
+Warn when a structure containing a C99 flexible array member as the last
+field is not at the end of another structure.
+
Wif-not-aligned
C ObjC C++ ObjC++ Var(warn_if_not_aligned) Init(1) Warning
Warn when the field in a struct is not aligned.
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 14c54809b9d..92304fd9c8f 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9269,6 +9269,15 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
TYPE_INCLUDE_FLEXARRAY (t)
 = is_last_field && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x));

+  if (warn_flex_array_member_not_at_end
+  && !is_last_field
+  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (x))
+  && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x)))
+ warning_at (DECL_SOURCE_LOCATION (x),
+OPT_Wflex_array_member_not_at_end,
+"structure containing a flexible array member"
+" is not at the end of another structure");
+
  if (DECL_NAME (x)
 || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3adb67aa47a..ef46423339e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1748,7 +1748,50 @@ Flexible array members may only appear as the last 
member of a
A structure containing a flexible array member, or a union containing
such a structure (possibly recursively), may not be a member of a
structure or an element of an array.  (However, these uses are
-permitted by GCC as extensions.)
+permitted by GCC as extensions, see details below.)
+@end itemize
+
+GCC extension 

Fwd: [V6][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-04-04 Thread Qing Zhao via Gcc-patches
Ping…

Qing

Begin forwarded message:

From: Qing Zhao <qing.z...@oracle.com>
Subject: [V6][PATCH 1/2] Handle component_ref to a structre/union field 
including flexible array member [PR101832]
Date: March 28, 2023 at 11:49:43 AM EDT
To: ja...@redhat.com, 
jos...@codesourcery.com
Cc: richard.guent...@gmail.com, 
keesc...@chromium.org, 
siddh...@gotplt.org, 
gcc-patches@gcc.gnu.org, Qing Zhao <qing.z...@oracle.com>

the C front-end has been approved by Joseph.

Jakub, could you please review the middle-end part of the changes in this patch?

The major change is in tree-object-size.cc (addr_object_size),
to use the new TYPE_INCLUDE_FLEXARRAY info.

This patch is to fix
PR101832 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832),
and is needed for Linux Kernel security.  It would be better to get it into GCC13.

Thanks a lot!

Qing

==

GCC extension accepts the case when a struct with a flexible array member
is embedded into another struct or union (possibly recursively).
__builtin_object_size should treat such struct as flexible size per
-fstrict-flex-arrays.

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDE_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDE_FLEXARRAY): New macro TYPE_INCLUDE_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
---
gcc/c/c-decl.cc                                     |  11 ++
gcc/lto/lto-common.cc                               |   5 +-
gcc/print-tree.cc                                   |   5 +
.../gcc.dg/builtin-object-size-pr101832.c           | 134 ++
gcc/tree-core.h                                     |   2 +
gcc/tree-object-size.cc                             |  23 ++-
gcc/tree-streamer-in.cc                             |   5 +-
gcc/tree-streamer-out.cc                            |   5 +-
gcc/tree.h                                          |   7 +-
9 files changed, 192 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e537d33f398..14c54809b9d 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9258,6 +9258,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
  /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
  DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);

+  /* Set TYPE_INCLUDE_FLEXARRAY for the context of x, t.
+ when x is an array and is the last field.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+ TYPE_INCLUDE_FLEXARRAY (t)
+  = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
+  /* Recursively set TYPE_INCLUDE_FLEXARRAY for the context of x, t
+ when x is an union or record and is the last field.  */
+  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
+ TYPE_INCLUDE_FLEXARRAY (t)
+  = is_last_field && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x));
+
  if (DECL_NAME (x)
 || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 882dd8971a4..9dde7118266 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1275,7 +1275,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
  if (AGGREGATE_TYPE_P (t1))
compare_values (TYPE_TYPELESS_STORAGE);
  compare_values (TYPE_EMPTY_P);
-  compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (FUNC_OR_METHOD_TYPE_P (t1))
+ compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if 

RE: [PATCH] aarch64: update ampere1 vectorization cost

2023-04-04 Thread Kyrylo Tkachov via Gcc-patches
Hi Philipp,

> -Original Message-
> From: Philipp Tomsich 
> Sent: Monday, April 3, 2023 12:46 PM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> ; Tamar Christina
> ; Manolis Tsamis 
> Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> 
> Kyrill,
> 
> We reran on GCC12 and GCC11, reproducing the same improvements (e.g.,
> on fotonik3d) that prompted the changes.
> I'll apply the backports later this week, unless you have any further
> concerns…

Ok, thanks for checking.
Kyrill

> 
> Thanks,
> Philipp.
> 
> 
> On Mon, 27 Mar 2023 at 11:24, Kyrylo Tkachov 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Philipp Tomsich 
> > > Sent: Monday, March 27, 2023 9:50 AM
> > > To: Kyrylo Tkachov 
> > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > > ; Tamar Christina
> > > ; Manolis Tsamis
> 
> > > Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> > >
> > > On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov
> 
> > > wrote:
> > > >
> > > > Hi Philipp,
> > > >
> > > > > -Original Message-
> > > > > From: Gcc-patches  > > > > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > > > > Tomsich
> > > > > Sent: Monday, March 27, 2023 8:47 AM
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Cc: Richard Sandiford ; Tamar
> Christina
> > > > > ; Philipp Tomsich
> > > ;
> > > > > Manolis Tsamis 
> > > > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > > > >
> > > > > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > > > > > prior to exhaustive testing of vectorizable workloads against
> > > > > > hardware.
> > > > >
> > > > > Adjust the vector costs to achieve the best results and more closely
> > > > > match the underlying hardware.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > > > >
> > > > > Co-Authored-By: Manolis Tsamis 
> > > > >
> > > > > Signed-off-by: Philipp Tomsich 
> > > > > ---
> > > > > We would like to get this into GCC 13 to avoid having to backport at
> > > > > the start of the next cycle.
> > > > >
> > > >
> > > > Given this affects only the ampere1 costs that sounds fine to me and
> > > > fairly low risk, you are being trusted that these costs are actually
> > > > desirable and properly validated on the hardware involved.
> > > >
> > > > > OK for backports?
> > > >
> > > > This is ok for trunk (GCC 13). Do you also want to backport this to
> > > > other branches?
> > >
> > > Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> > > I would like to backport to those as well.
> >
> > Ok then, though you may want to run the benchmarks on the branches as
> > well to make sure the costs give the expected benefit there as well.
> > Thanks,
> > Kyrill
> >
> > >
> > > Thanks,
> > > Philipp.
> > >
> > > > Thanks,
> > > > Kyrill
> > > >
> > > > >
> > > > >  gcc/config/aarch64/aarch64.cc | 12 ++--
> > > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64.cc
> > > b/gcc/config/aarch64/aarch64.cc
> > > > > index b27f4354031..661fff65cea 100644
> > > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > > > thunderx3t110_vector_cost =
> > > > >
> > > > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > > > >  {
> > > > > -  3, /* int_stmt_cost  */
> > > > > +  1, /* int_stmt_cost  */
> > > > >3, /* fp_stmt_cost  */
> > > > >0, /* ld2_st2_permute_cost  */
> > > > >0, /* ld3_st3_permute_cost  */
> > > > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > > > ampere1_advsimd_vector_cost =
> > > > >8, /* store_elt_extra_cost  */
> > > > >6, /* vec_to_scalar_cost  */
> > > > >7, /* scalar_to_vec_cost  */
> > > > > -  5, /* align_load_cost  */
> > > > > -  5, /* unalign_load_cost  */
> > > > > -  2, /* unalign_store_cost  */
> > > > > -  2  /* store_cost  */
> > > > > +  4, /* align_load_cost  */
> > > > > +  4, /* unalign_load_cost  */
> > > > > +  1, /* unalign_store_cost  */
> > > > > +  1  /* store_cost  */
> > > > >  };
> > > > >
> > > > >  /* Ampere-1 costs for vector insn classes.  */
> > > > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > > > >  {
> > > > >1, /* scalar_int_stmt_cost  */
> > > > > -  1, /* scalar_fp_stmt_cost  */
> > > > > +  3, /* scalar_fp_stmt_cost  */
> > > > >4, /* scalar_load_cost  */
> > > > >1, /* scalar_store_cost  */
> > > > >1, /* cond_taken_branch_cost  */
> > > > > --
> > > > > 2.34.1
> > > >


Re: arm: Fix MVE vcreate definition

2023-04-04 Thread Stamatis Markianos-Wright via Gcc-patches



On 29/03/2023 13:16, Kyrylo Tkachov wrote:

-Original Message-
From: Stam Markianos-Wright
Sent: Wednesday, March 29, 2023 11:50 AM
To:gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov
Subject: arm: Fix MVE vcreate definition

Hi all,

I just found a bug that goes back to the initial merge of
the MVE backend: The vcreate intrinsic has had its vector
lanes mixed up, compared to what was intended (as per
the ACLE definition). This is also a discrepancy with clang:
https://godbolt.org/z/4n93e5aqj

This patch simply switches the operands around and
makes the tests more specific on the input registers
(I do not touch the output Q regs as they vary based
on softfp/hardfp or the input registers when the input
is a constant, since, in that case, a single register
is loaded with a constant and then the same register is
used twice as "vmov q0[2], q0[0], r2, r2" and the reg
num might also not always be guaranteed).
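
For reference, the intended lane layout can be modelled in plain C,
without any MVE hardware or arm_mve.h.  This is a sketch under the
assumption that the ACLE intent is for the first argument to supply the
low 64 bits of the vector and the second the high 64 bits;
'model_u32x4', 'model_vcreateq_u32' and 'model_lane' are made-up names
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of a 128-bit vector of four 32-bit lanes.  */
typedef struct { uint32_t lane[4]; } model_u32x4;

/* Model of the assumed vcreateq_u32 layout: A fills lanes 0 and 1
   (low 64 bits), B fills lanes 2 and 3 (high 64 bits).  */
static model_u32x4
model_vcreateq_u32 (uint64_t a, uint64_t b)
{
  model_u32x4 r;
  r.lane[0] = (uint32_t) a;
  r.lane[1] = (uint32_t) (a >> 32);
  r.lane[2] = (uint32_t) b;
  r.lane[3] = (uint32_t) (b >> 32);
  return r;
}

/* Bounds-checked lane accessor for the model.  */
static uint32_t
model_lane (model_u32x4 v, unsigned i)
{
  assert (i < 4);
  return v.lane[i];
}
```

With the operands mixed up, the two halves would trade places, which is
the kind of difference the tightened register checks in the tests are
meant to catch.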

No regressions on MVE testsuite configurations or in
the CMSIS-NN testsuite.

Ok for trunk? (Despite this being late in Stage 4, sorry
about that!)

Ok, since this is a wrong-code fix.

Thanks, applied as:
3f0ca7a3e4431534bff3b8eb73709cc822e489b0.

This needs backports as well, right?

Indeed! I'm building up a larger list of commits that we're hoping
to backport, so I will include this on that list.


Thanks,
Kyrill


Thanks,
Stamatis Markianos-Wright

gcc/ChangeLog:

      * config/arm/mve.md (mve_vcvtq_n_to_f_): Swap
operands.
    (mve_vcreateq_f): Swap operands.

gcc/testsuite/ChangeLog:

      * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_s16.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Tighten test.
      * gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Tighten test.


Re: [PATCH] ree: Improvement of ree pass for rs6000 target.

2023-04-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 04, 2023 at 05:02:35PM +0530, Ajit Agarwal via Gcc-patches wrote:
> This patch eliminates unnecessary redundant extension within basic and across 
> basic blocks. For rs6000 target we see
> redundant zero and sign extension and done to improve ree pass to eliminate 
> such redundant zero and sign extension.

Just random comments, will defer the rest to whomever reviews it for stage1.
>   testsuite/g++.target/powerpc/zext-elim.C: New testcase.
>   testsuite/g++.target/powerpc/sext-elim.C: New testcase.

Missing * and space before the filenames, plus testsuite has a separate
ChangeLog, so it should be
gcc/testsuite/ChangeLog
* g++.target/powerpc/zext-elim.C: New testcase.
etc.
> --- a/gcc/common/config/rs6000/rs6000-common.cc
> +++ b/gcc/common/config/rs6000/rs6000-common.cc
> @@ -30,6 +30,8 @@
>  /* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
>  static const struct default_options rs6000_option_optimization_table[] =
>{
> +/* Enable -free for zero extension and sign extension elimination.*/
> +{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },

I believe the options should be sorted by the OPT_LEVEL* they are given.

>  /* Split multi-word types early.  */
>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
> @@ -38,11 +40,9 @@ static const struct default_options 
> rs6000_option_optimization_table[] =
> loops at -O2 and above by default.  */
>  { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>  { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
> -
>  /* -frename-registers leads to non-optimal codegen and performance
> on rs6000, turn it off by default.  */
>  { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> -
>  /* Double growth factor to counter reduced min jump length.  */
>  { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 },
>  { OPT_LEVELS_NONE, 0, NULL, 0 }

Why?

> + rtx set = XEXP (SET_SRC(body), 0);

Space missing before (body
The indentation is also incorrect, it should be always in multiples
of 2, the above is 5 columns.

> +
> + if (REG_P(set) && GET_MODE(SET_DEST(body))
> +  == GET_MODE(set))

Similarly.  Furthermore, the formatting is wrong.  In this case, there is no
reason why
  if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
wouldn't fit on one line, but if it would be longer, either one would use
  if (whatever
  && whatever_else)
or
  if (whatever && (whatever_else
   == something))
or similar.
> +  && GET_CODE (SET_DEST (expr)) == REG

Use REG_P (SET_DEST (expr))

> +  && GET_CODE (SET_SRC (expr))  == IF_THEN_ELSE
> +  && GET_CODE (XEXP (SET_SRC (expr), 1)) == REG
> +  && GET_CODE (XEXP (SET_SRC (expr), 2)) == REG)

Similarly.

> +{
> +  *reg1 = XEXP (SET_SRC (expr), 1);
> +  *reg2 = XEXP (SET_SRC (expr), 2);
> +  return true;
> +}

> @@ -319,9 +440,8 @@ combine_set_extension (ext_cand *cand, rtx_insn 
> *curr_insn, rtx *orig_set)
>  {
>rtx orig_src = SET_SRC (*orig_set);
>machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
> -  rtx new_set;
> +  rtx new_set = NULL_RTX;
>rtx cand_pat = single_set (cand->insn);
> -

Why this random whitespace change?

> @@ -734,8 +1006,7 @@ merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, 
> ext_state *state)
> return true;
>   }
>  }
> -
> -  return false;
> +return false;
>  }

The old code was properly formatted, this one isn't.

>  /* Given SRC, which should be one or more extensions of a REG, strip
> @@ -744,7 +1015,8 @@ merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, 
> ext_state *state)
>  static inline rtx
>  get_extended_src_reg (rtx src)
>  {
> -  while (GET_CODE (src) == SIGN_EXTEND || GET_CODE (src) == ZERO_EXTEND)
> +  while (GET_CODE (src) == SIGN_EXTEND || GET_CODE (src) == ZERO_EXTEND
> + || insn_is_zext_p(src))

If a condition doesn't fit into one line, then the &&/|| terms should be
split into one term on one line, so
  while (GET_CODE (src) == SIGN_EXTEND
 || GET_CODE (src) == ZERO_EXTEND
 || insn_is_zext_p (src))
and note again missing space.
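
The formatting rules in this review can be seen together in a small
self-contained example. The enum and functions below are hypothetical
stand-ins, not GCC's RTL API; only the layout is the point: a space
before every opening parenthesis, indentation in steps of two columns,
and a condition that does not fit on one line broken before each ||
with one term per line.

```cpp
#include <cassert>

// Hypothetical stand-ins for RTL codes; not GCC's real definitions.
enum toy_code { TOY_REG, TOY_SIGN_EXTEND, TOY_ZERO_EXTEND, TOY_AND_MASK };

struct toy_expr
{
  toy_code code;
  const toy_expr *operand;
};

/* Return true if E is one of the extension-like codes.  The condition is
   split with one term per line and the operator at the start of the line.  */

static bool
is_extension_p (const toy_expr *e)
{
  return (e->code == TOY_SIGN_EXTEND
          || e->code == TOY_ZERO_EXTEND
          || e->code == TOY_AND_MASK);
}

/* Strip any number of extensions from SRC and return the inner register.  */

static const toy_expr *
strip_extensions (const toy_expr *src)
{
  while (is_extension_p (src))
    src = src->operand;
  assert (src->code == TOY_REG);
  return src;
}
```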

>  src = XEXP (src, 0);
>gcc_assert (REG_P (src));
>return src;
> @@ -767,10 +1039,48 @@ combine_reaching_defs (ext_cand *cand, const_rtx 
> set_pat, ext_state *state)
>int i;
>int defs_ix;
>bool outcome;
> -

Again, why?  Don't do this.
>state->defs_list.truncate (0);
>state->copies_list.truncate (0);
>  
> +  if (cand->code == ZERO_EXTEND)
> +{
> +  rtx orig_src = XEXP (SET_SRC (cand->expr),0);
> +  if (!get_defs (cand->insn, orig_src, NULL))
> + {
> +   if (GET_MODE (orig_src) == QImode
> +   && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +   && !FUNCTION_VALUE_REGNO_P (REGNO (orig_src)))
> + {
> +   if 

Re: Patch ping Re: [PATCH] ipa: Avoid another ICE when dealing with type-incompatibilities (PR 108959)

2023-04-04 Thread Jan Hubicka via Gcc-patches
> On Thu, Mar 23, 2023 at 11:09:19AM +0100, Martin Jambor wrote:
> > Hi,
> > 
> > PR 108959 shows one more example where undefined code with type
> > incompatible accesses to stuff passed in parameters can cause an ICE
> > because we try to create a VIEW_CONVERT_EXPR of mismatching sizes:
> > 
> > 1. IPA-CP tries to push one type from above,
> > 
> > 2. IPA-SRA (correctly) decides the parameter is useless because it is
> >only used to construct an argument to another function which does not
> >use it and so the formal parameter should be removed,
> > 
> > 3. but the code reconciling IPA-CP and IPA-SRA transformations still
> >wants to perform the IPA-CP and overrides the built-in DCE of
> >useless statements and tries to stuff constants into them
> >instead, constants of a type with mismatching type and size.
> > 
> > This patch avoids the situation in IPA-SRA by purging the IPA-CP
> > results from any "aggregate" constants that are passed in parameters
> > which are detected to be useless.  It also removes IPA value range and
> > bits information associated with removed parameters stored in the same
> > structure so that the useless information is not streamed later on.
> > 
> > Bootstrapped and LTO-bootstrapped and tested on x86_64-linux.  OK for
> > trunk?
> > 
> > gcc/ChangeLog:
> > 
> > 2023-03-22  Martin Jambor  
> > 
> > PR ipa/108959
> > * ipa-sra.cc (zap_useless_ipcp_results): New function.
> > (process_isra_node_results): Call it.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2023-03-17  Martin Jambor  
> > 
> > PR ipa/108959
> > * gcc.dg/ipa/pr108959.c: New test.
OK,
thanks!
Honza


[PATCH] ree: Improvement of ree pass for rs6000 target.

2023-04-04 Thread Ajit Agarwal via Gcc-patches
Hello All:

This patch eliminates redundant extensions within and across basic
blocks. For the rs6000 target we see redundant zero and sign extensions,
so the ree pass is improved to eliminate them.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


ree: Improvement of ree pass for rs6000 target.

Eliminate unnecessary redundant extension within basic
and across basic blocks. For rs6000 target we see
redundant zero and sign extension and done to improve
ree pass to eliminate such redundant zero and sign
extension.

2023-04-04  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
elimination across and within basic blocks.
(def_arith_p): New function to check definition has arithmetic
operation.
(combine_set_extension): Modification to incorporate AND
and current zero_extend and sign_extend instruction.
(combine_reaching_defs): Add zero_extend and sign_extend.
Add FUNCTION_ARG_REGNO_P abi interfaces calls and
FUNCTION_VALUE_REGNO_P support.
(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
zero_extend sign_extend and AND instruction.
(insn_is_zext_p): New function.
(add_removable_extension): Add FUNCTION_ARG_REGNO_P abi
interface calls.
* common/config/rs6000/rs6000-common.cc: Add REE pass as a
default rs6000 target pass for O2 and above.
testsuite/g++.target/powerpc/zext-elim.C: New testcase.
testsuite/g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/common/config/rs6000/rs6000-common.cc|   4 +-
 gcc/ree.cc   | 662 ++-
 gcc/testsuite/g++.target/powerpc/sext-elim.C |  18 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C |  30 +
 4 files changed, 563 insertions(+), 151 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/common/config/rs6000/rs6000-common.cc 
b/gcc/common/config/rs6000/rs6000-common.cc
index 2140c442ba9..a9f518478a4 100644
--- a/gcc/common/config/rs6000/rs6000-common.cc
+++ b/gcc/common/config/rs6000/rs6000-common.cc
@@ -30,6 +30,8 @@
 /* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
 static const struct default_options rs6000_option_optimization_table[] =
   {
+/* Enable -free for zero extension and sign extension elimination.*/
+{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
 /* Split multi-word types early.  */
 { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
 /* Enable -fsched-pressure for first pass instruction scheduling.  */
@@ -38,11 +40,9 @@ static const struct default_options 
rs6000_option_optimization_table[] =
loops at -O2 and above by default.  */
 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
-
 /* -frename-registers leads to non-optimal codegen and performance
on rs6000, turn it off by default.  */
 { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
-
 /* Double growth factor to counter reduced min jump length.  */
 { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
diff --git a/gcc/ree.cc b/gcc/ree.cc
index 413aec7c8eb..038bb71baaf 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,102 @@ struct ext_cand
 
 static int max_insn_uid;
 
+/* Get all the reaching definitions of an instruction.  The definitions are
+   desired for REG used in INSN.  Return the definition list or NULL if a
+   definition is missing.  If DEST is non-NULL, additionally push the INSN
+   of the definitions onto DEST.  */
+
+static struct df_link *
+get_defs (rtx_insn *insn, rtx reg, vec<rtx_insn *> *dest)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+{
+  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+   return NULL;
+  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+   break;
+}
+
+  if (use == NULL)
+return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+{
+  /* Problem getting some definition for this instruction.  */
+  if (ref_link->ref == NULL)
+   return NULL;
+  if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+   return NULL;
+  /* As global regs are assumed to be defined at each function call
+dataflow can report a call_insn as being a definition of REG.
+But we can't do anything with that in this pass so proceed only
+if the instruction really sets REG in a way that can be deduced
+from the RTL structure.  */
+  if (global_regs[REGNO (reg)]
+ && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+   

[committed] libstdc++: Fix outdated docs about demangling exception messages

2023-04-04 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

-- >8 --

The string returned by std::bad_exception::what() hasn't been a mangled
name since PR libstdc++/14493 was fixed for GCC 4.2.0, so remove the
docs showing how to demangle it.
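
For reference, demangling a typeid name still works the way the
remaining part of that example shows. A minimal sketch, assuming a
compiler that ships <cxxabi.h> (GCC or Clang with the Itanium C++ ABI);
the wrapper name demangle is mine and is not part of the manual:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

// Demangle an Itanium-ABI mangled name.  Returns the input unchanged if
// demangling fails.
static std::string
demangle (const char *mangled)
{
  int status = 0;
  char *realname = abi::__cxa_demangle (mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && realname) ? realname : mangled;
  std::free (realname);  // the buffer comes from malloc, so free() it
  return result;
}
```

A typical call is demangle (typeid (obj).name ()); the std::string
return value means callers do not have to remember the free().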

libstdc++-v3/ChangeLog:

* doc/xml/manual/extensions.xml: Remove std::bad_exception from
example program.
* doc/html/manual/ext_demangling.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/ext_demangling.html | 13 ++---
 libstdc++-v3/doc/xml/manual/extensions.xml   | 13 ++---
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml 
b/libstdc++-v3/doc/xml/manual/extensions.xml
index 86e92beffd3..196b55d8347 100644
--- a/libstdc++-v3/doc/xml/manual/extensions.xml
+++ b/libstdc++-v3/doc/xml/manual/extensions.xml
@@ -534,14 +534,6 @@ int main()
   int status;
   char   *realname;
 
-  // exception classes not in stdexcept, thrown by the implementation
-  // instead of the user
-  std::bad_exception  e;
-  realname = abi::__cxa_demangle(e.what(), 0, 0, &status);
-  std::cout << e.what() << "\t=> " << realname << "\t: " << status << '\n';
-  free(realname);
-
-
   // typeid
   bar<empty,17>  u;
   const std::type_info  &ti = typeid(u);
@@ -559,7 +551,6 @@ int main()
 


-  St13bad_exception   => std::bad_exception   : 0
   3barI5emptyLi17EE   => bar<empty, 17>   : 0


@@ -568,8 +559,8 @@ int main()
  The demangler interface is described in the source documentation
  linked to above.  It is actually written in C, so you don't need to
  be writing C++ in order to demangle C++.  (That also means we have to
- use crummy memory management facilities, so don't forget to free()
- the returned char array.)
+ use crummy memory management facilities, so don't forget to
+ free() the returned char array.)

 
 
-- 
2.39.2



Patch ping Re: [PATCH] ipa: Avoid another ICE when dealing with type-incompatibilities (PR 108959)

2023-04-04 Thread Jakub Jelinek via Gcc-patches
Hi!

Honza, could you please have a look?

This is one of the few remaining P1s.

On Thu, Mar 23, 2023 at 11:09:19AM +0100, Martin Jambor wrote:
> Hi,
> 
> PR 108959 shows one more example where undefined code with type
> incompatible accesses to stuff passed in parameters can cause an ICE
> because we try to create a VIEW_CONVERT_EXPR of mismatching sizes:
> 
> 1. IPA-CP tries to push one type from above,
> 
> 2. IPA-SRA (correctly) decides the parameter is useless because it is
>only used to construct an argument to another function which does not
>use it and so the formal parameter should be removed,
> 
> 3. but the code reconciling IPA-CP and IPA-SRA transformations still
>wants to perform the IPA-CP and overrides the built-in DCE of
>useless statements and tries to stuff constants into them
>instead, constants of a type with mismatching type and size.
> 
> This patch avoids the situation in IPA-SRA by purging the IPA-CP
> results from any "aggregate" constants that are passed in parameters
> which are detected to be useless.  It also removes IPA value range and
> bits information associated with removed parameters stored in the same
> structure so that the useless information is not streamed later on.
> 
> Bootstrapped and LTO-bootstrapped and tested on x86_64-linux.  OK for
> trunk?
> 
> gcc/ChangeLog:
> 
> 2023-03-22  Martin Jambor  
> 
>   PR ipa/108959
>   * ipa-sra.cc (zap_useless_ipcp_results): New function.
>   (process_isra_node_results): Call it.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-03-17  Martin Jambor  
> 
>   PR ipa/108959
>   * gcc.dg/ipa/pr108959.c: New test.
> ---
>  gcc/ipa-sra.cc  | 66 +
>  gcc/testsuite/gcc.dg/ipa/pr108959.c | 22 ++
>  2 files changed, 88 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108959.c
> 
> diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
> index 3de7d426b7e..7b8260bc9e1 100644
> --- a/gcc/ipa-sra.cc
> +++ b/gcc/ipa-sra.cc
> @@ -4028,6 +4028,70 @@ mark_callers_calls_comdat_local (struct cgraph_node 
> *node, void *)
>return false;
>  }
>  
> +/* Remove any IPA-CP results stored in TS that are associated with removed
> +   parameters as marked in IFS. */
> +
> +static void
> +zap_useless_ipcp_results (const isra_func_summary *ifs, ipcp_transformation 
> *ts)
> +{
> +  unsigned ts_len = vec_safe_length (ts->m_agg_values);
> +
> +  if (ts_len == 0)
> +return;
> +
> +  bool removed_item = false;
> +  unsigned dst_index = 0;
> +
> +  for (unsigned i = 0; i < ts_len; i++)
> +{
> +  ipa_argagg_value *v = &(*ts->m_agg_values)[i];
> +  const isra_param_desc *desc = &(*ifs->m_parameters)[v->index];
> +
> +  if (!desc->locally_unused)
> + {
> +   if (removed_item)
> + (*ts->m_agg_values)[dst_index] = *v;
> +   dst_index++;
> + }
> +  else
> + removed_item = true;
> +}
> +  if (dst_index == 0)
> +{
> +  ggc_free (ts->m_agg_values);
> +  ts->m_agg_values = NULL;
> +}
> +  else if (removed_item)
> +ts->m_agg_values->truncate (dst_index);
> +
> +  bool useful_bits = false;
> +  unsigned count = vec_safe_length (ts->bits);
> +  for (unsigned i = 0; i < count; i++)
> +if ((*ts->bits)[i])
> +{
> +  const isra_param_desc *desc = &(*ifs->m_parameters)[i];
> +  if (desc->locally_unused)
> + (*ts->bits)[i] = NULL;
> +  else
> + useful_bits = true;
> +}
> +  if (!useful_bits)
> +ts->bits = NULL;
> +
> +  bool useful_vr = false;
> +  count = vec_safe_length (ts->m_vr);
> +  for (unsigned i = 0; i < count; i++)
> +if ((*ts->m_vr)[i].known)
> +  {
> + const isra_param_desc *desc = &(*ifs->m_parameters)[i];
> + if (desc->locally_unused)
> +   (*ts->m_vr)[i].known = false;
> + else
> +   useful_vr = true;
> +  }
> +  if (!useful_vr)
> +ts->m_vr = NULL;
> +}
>  
>  /* Do final processing of results of IPA propagation regarding NODE, clone it
> if appropriate.  */
> @@ -4080,6 +4144,8 @@ process_isra_node_results (cgraph_node *node,
>  }
>  
>ipcp_transformation *ipcp_ts = ipcp_get_transformation_summary (node);
> +  if (ipcp_ts)
> +zap_useless_ipcp_results (ifs, ipcp_ts);
>vec<ipa_adjusted_param, va_gc> *new_params = NULL;
>if (ipa_param_adjustments *old_adjustments
>= cinfo ? cinfo->param_adjustments : NULL)
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr108959.c 
> b/gcc/testsuite/gcc.dg/ipa/pr108959.c
> new file mode 100644
> index 000..cd1f88658ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr108959.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +union U2 {
> +  long f0;
> +  int f1;
> +};
> +int g_16;
> +int g_70[20];
> +static int func_61(int) {
> +  for (;;)
> +g_70[g_16] = 4;
> +}
> +static int func_43(int *p_44)
> +{
> +  func_61(*p_44);
> +}
> +int main() {
> +  union U2 l_38 = {9};
> +  int *l_49 = (int *) &l_38;
> +  func_43(l_49);
> +}
> -- 
> 2.40.0

Jakub
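
The first loop of zap_useless_ipcp_results in the patch quoted above is
an in-place, order-preserving compaction. A stand-alone sketch of that
idea, using std::vector instead of GCC's vec and hypothetical types
(toy_value, a plain vector<bool> of "unused" flags), might look like
this:

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified record: INDEX names the parameter a value
// belongs to.
struct toy_value
{
  unsigned index;
  int payload;
};

// Drop every value whose parameter is marked unused, keeping the order of
// the survivors.  Returns the number of values kept.
static unsigned
zap_unused (std::vector<toy_value> &values, const std::vector<bool> &unused)
{
  bool removed_item = false;
  unsigned dst_index = 0;

  for (unsigned i = 0; i < values.size (); i++)
    {
      assert (values[i].index < unused.size ());
      if (!unused[values[i].index])
        {
          // Only copy once a gap has opened up; before that, element I
          // is already in its final position.
          if (removed_item)
            values[dst_index] = values[i];
          dst_index++;
        }
      else
        removed_item = true;
    }
  values.resize (dst_index);
  return dst_index;
}
```

The removed_item flag only avoids self-assignments while nothing has
been dropped yet; the result is the same without it.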

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-04 Thread Jan Hubicka via Gcc-patches
> On Tue, Apr 04, 2023 at 01:21:40AM +0200, Jan Hubicka via Gcc-patches wrote:
> > It is however really side case and I am worried about dropping
> > pure/const from builtin declarations...
> 
> Yeah, that can certainly break stuff.  See e.g. the recently fixed
> ICE when memcmp wasn't pure in PR109258.

Yep, I think it is better to poke at this in stage1 (it is a can of
worms).  Clearly we have a conflict here: if memcmp is implemented locally
one can construct a testcase where the profile would be rejected at
-fprofile-use time if the const flag is not cleared :(.  But it should be
a rare thing in practice.

Honza
>
>   Jakub
> 


[committed] amdgcn: Add 64-bit vector not

2023-04-04 Thread Andrew Stubbs

I've committed this patch to add a missing vector operator on amdgcn.

The architecture doesn't have a 64-bit not instruction so we didn't have 
an insn for it, but the vectorizer didn't like that and caused the 
v64df_pow function to use 2MB of stack frame. This is a problem when you 
typically have over 3000 threads and only want to allocate 32k of stack 
space each!
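
The pattern in the patch below splits the 64-bit operation into two
32-bit halves after reload. The arithmetic behind that is simple enough
to show in scalar C++; this is only an illustration with a made-up
function name, not how the backend is written:

```cpp
#include <cassert>
#include <cstdint>

// Compute ~X using only 32-bit NOT operations on the two halves of X.
static uint64_t
not64_via_halves (uint64_t x)
{
  uint32_t lo = static_cast<uint32_t> (x);        // part 0
  uint32_t hi = static_cast<uint32_t> (x >> 32);  // part 1
  uint32_t not_lo = ~lo;
  uint32_t not_hi = ~hi;
  return (static_cast<uint64_t> (not_hi) << 32) | not_lo;
}
```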


Andrew

amdgcn: Add 64-bit vector not

gcc/ChangeLog:

* config/gcn/gcn-valu.md (one_cmpl<mode>2): New.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 44d107145db..c0b43fcfb64 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2791,6 +2791,23 @@ (define_expand "neg<mode>2"
 DONE;
   })
 
+(define_insn_and_split "one_cmpl<mode>2"
+  [(set (match_operand:V_DI 0 "register_operand"  "=   v")
+(not:V_DI
+  (match_operand:V_DI 1 "gcn_alu_operand" "vSvDB")))]
+  ""
+  "#"
+  "reload_completed"
+  [(set (match_dup 3) (not:<VnSI> (match_dup 5)))
+   (set (match_dup 4) (not:<VnSI> (match_dup 6)))]
+  {
+operands[3] = gcn_operand_part (<MODE>mode, operands[0], 0);
+operands[4] = gcn_operand_part (<MODE>mode, operands[0], 1);
+operands[5] = gcn_operand_part (<MODE>mode, operands[1], 0);
+operands[6] = gcn_operand_part (<MODE>mode, operands[1], 1);
+  }
+  [(set_attr "type" "mult")])
+
 ;; }}}
 ;; {{{ FP binops - special cases
 


Re: [PATCH] riscv: Fix bootstrap [PR109384]

2023-04-04 Thread Kito Cheng via Gcc-patches
ok, thanks!

On Tue, Apr 4, 2023 at 5:01 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following patch unbreaks riscv bootstrap, where it previously failed
> on -Werror=format-diag warning promoted to error.
>
> Ok for trunk?
>
> Or shall it say e.g.
> "%<-march=%s%>: %<zfinx%> extension conflicts with %<f%>"
> ?
> Or say if the current condition is true, do
> const char *ext = "zfinx";
> if (subset_list->lookup ("zdinx"))
>   ext = "zdinx";
> else if (subset_list->lookup ("zhinx"))
>   ext = "zhinx";
> else if (subset_list->lookup ("zhinxmin"))
>   ext = "zhinxmin";
> and
> "%<-march=%s%>: %qs extension conflicts with %<f%>", arch, ext
> ?  Or do similar check for which extension to print against it,
> const char *ext = "zfinx";
> const char *ext2 = "f";
> if (subset_list->lookup ("zdinx"))
>   {
> ext = "zdinx";
> if (subset_list->lookup ("d"))
>   ext2 = "d";
>   }
> else if (subset_list->lookup ("zhinx"))
>   {
> ext = "zhinx";
> if (subset_list->lookup ("zfh"))
>   ext2 = "zfh";
>   }
> else if (subset_list->lookup ("zhinxmin"))
>   {
> ext = "zhinxmin";
> if (subset_list->lookup ("zfhmin"))
>   ext2 = "zfhmin";
>   }
> "%<-march=%s%>: %qs extension conflicts with %qs", arch, ext, ext2
> ?
>
> 2023-04-04  Jakub Jelinek  
>
> PR target/109384
> * common/config/riscv/riscv-common.cc (riscv_subset_list::parse):
> Reword diagnostics about zfinx conflict with f, formatting fixes.
>
> * gcc.target/riscv/arch-19.c: Expect a different message about zfinx
> vs. f conflict.
>
> --- gcc/common/config/riscv/riscv-common.cc.jj  2023-04-04 10:46:33.473871184 
> +0200
> +++ gcc/common/config/riscv/riscv-common.cc 2023-04-04 10:41:22.014477456 
> +0200
> @@ -1153,10 +1153,9 @@ riscv_subset_list::parse (const char *ar
>
>subset_list->handle_combine_ext ();
>
> -  if (subset_list->lookup("zfinx") && subset_list->lookup("f"))
> -   error_at (loc,
> -   "%<-march=%s%>: z*inx is conflict with float extensions",
> -   arch);
> +  if (subset_list->lookup ("zfinx") && subset_list->lookup ("f"))
> +error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
> +  "extensions", arch);
>
>return subset_list;
>
> --- gcc/testsuite/gcc.target/riscv/arch-19.c.jj 2023-03-29 22:37:11.465651690 
> +0200
> +++ gcc/testsuite/gcc.target/riscv/arch-19.c2023-04-04 10:45:50.734503089 
> +0200
> @@ -1,4 +1,4 @@
>  /* { dg-do compile } */
>  /* { dg-options "-march=rv64if_zfinx -mabi=lp64" } */
>  int foo() {}
> -/* { dg-error "'-march=rv64if_zfinx': z\\*inx is conflict with float 
> extensions" "" { target *-*-* } 0 } */
> +/* { dg-error "'-march=rv64if_zfinx': z\\*inx conflicts with floating-point 
> extensions" "" { target *-*-* } 0 } */
>
> Jakub
>
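
The last variant floated in the quoted mail, which names both
conflicting extensions, can be tried outside the compiler. A sketch
under stated assumptions: std::set stands in for riscv_subset_list,
conflicting_extensions is a hypothetical helper name, and the caller is
assumed to have checked already that zfinx and f are both present.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Pick the pair of extension names to report.  EXTS is a stand-in for the
// parsed subset list.
static std::pair<std::string, std::string>
conflicting_extensions (const std::set<std::string> &exts)
{
  std::string ext = "zfinx";
  std::string ext2 = "f";
  if (exts.count ("zdinx"))
    {
      ext = "zdinx";
      if (exts.count ("d"))
        ext2 = "d";
    }
  else if (exts.count ("zhinx"))
    {
      ext = "zhinx";
      if (exts.count ("zfh"))
        ext2 = "zfh";
    }
  else if (exts.count ("zhinxmin"))
    {
      ext = "zhinxmin";
      if (exts.count ("zfhmin"))
        ext2 = "zfhmin";
    }
  return std::make_pair (ext, ext2);
}
```

The two names would then be passed as the %qs arguments of the last
proposed message.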


Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates

2023-04-04 Thread Lulu Cheng



On 2023/4/4 at 4:40 PM, Xi Ruoyao wrote:

On Tue, 2023-04-04 at 16:00 +0800, Xi Ruoyao via Gcc-patches wrote:

On Tue, 2023-04-04 at 11:01 +0800, Lulu Cheng wrote:

/* snip */


+unsigned long f10 (unsigned long x) { return x - 0x8000l * 2; }
+unsigned long f11 (unsigned long x) { return x - 0x8000l * 2; }

  These two test cases are duplicates.

/* snip */


+unsigned int g10 (unsigned int x) { return x - 0x8000l * 2; }
+unsigned int g11 (unsigned int x) { return x - 0x8000l * 2; }

Ditto.

I'll fix them in V2.


I found that with this patch applied, the test cases
gcc.target/loongarch/stack-check-cfa-1.c and
gcc.target/loongarch/stack-check-cfa-2.c fail.
Although the tests fail, the generated assembly code is better, and there is no 
problem with its logic. I haven't checked the reason for 
this yet.

Looks like the change hides PR109035 (like -fpie) for some reason. (But
I still don't understand the root cause of PR109035 anyway.)

V2 sent with test cases fixed.

/* snip */


Technically there should be "addu16i.d $r3,$r3,-1" in the prologue and
"addu16i.d $r3,$r3,2" in the epilogue, so we can avoid using r14/r13.
I'll try modifying loongarch_expand_{pro,epi}logue for this.

Will do this later because I'm too stupid to understand
loongarch_first_stack_step function quickly :).


Thank you very much!



Re: [GCC14 PATCH v2] LoongArch: Optimize additions with immediates

2023-04-04 Thread Lulu Cheng



On 2023/4/4 at 4:38 PM, Xi Ruoyao wrote:

1. Use addu16i.d for TARGET_64BIT and suitable immediates.
2. Split one addition with immediate into two addu16i.d or addi.{d/w}
instructions if possible.  This avoids using a temporary register
without increasing the instruction count.
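
The splitting idea in item 2 can be sketched in plain C++. The function
names below are hypothetical and only loosely follow the patch. The
sketch covers two cases, addi + addi and addu16i.d + addi. It leaves
out the pair of addu16i.d instructions and restricts itself to
constants in the 32-bit range.

```cpp
#include <cassert>
#include <cstdint>

// True if V fits the signed 12-bit immediate of addi.{w,d}.
static bool
imm12_p (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

// True if V can be added with one addu16i.d: the low 16 bits are zero and
// the remaining part fits in a signed 16-bit field.
static bool
addu16i_p (int64_t v)
{
  return v % 65536 == 0 && v / 65536 >= -32768 && v / 65536 <= 32767;
}

// Try to split V into two immediates that each fit one add instruction.
// Returns false if one instruction already suffices or no split is found.
static bool
split_plus_constant (int64_t v, int64_t *first, int64_t *second)
{
  // Keep the sketch to 32-bit-range constants so no step can overflow.
  if (v < INT32_MIN || v > INT32_MAX)
    return false;
  if (imm12_p (v) || addu16i_p (v))
    return false;

  // Two addi instructions, e.g. 4000 = 2047 + 1953.
  if (v >= -4096 && v <= 4094)
    {
      *first = v > 0 ? 2047 : -2048;
      *second = v - *first;
      return true;
    }

  // addu16i.d + addi: the low 16 bits, read as a signed value, must fit
  // in 12 bits, and the rest must be a valid addu16i.d operand.
  int64_t lo = ((v % 65536) + 65536) % 65536;
  if (lo >= 32768)
    lo -= 65536;
  if (imm12_p (lo) && addu16i_p (v - lo))
    {
      *first = v - lo;
      *second = lo;
      return true;
    }
  return false;
}
```

Either way the register is updated by two dependent adds and no scratch
register is needed to materialize the constant.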

Inspired by https://reviews.llvm.org/D143710 and
https://reviews.llvm.org/D147222.

Stack check CFA tests are updated because this change somehow hides
PR109035, causing a smaller stack frame.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for GCC 14?


I think there is no problem; it can be integrated into GCC 14.

Thanks!


gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_addu16i_imm12_operand_p): New function prototype.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.cc
(loongarch_addu16i_imm12_operand_p): New function.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.h (ADDU16I_OPERAND): New macro.
(DUAL_IMM12_OPERAND): Likewise.
(DUAL_ADDU16I_OPERAND): Likewise.
* config/loongarch/constraints.md (La, Lb, Lc, Ld, Le): New
constraint.
* config/loongarch/predicates.md (const_dual_imm12_operand): New
predicate.
(const_addu16i_operand): Likewise.
(const_addu16i_imm12_di_operand): Likewise.
(const_addu16i_imm12_si_operand): Likewise.
(plus_di_operand): Likewise.
(plus_si_operand): Likewise.
(plus_si_extend_operand): Likewise.
* config/loongarch/loongarch.md (add<mode>3): Convert to
define_insn_and_split.  Use plus_<mode>_operand predicate
instead of arith_operand.  Add alternatives for La, Lb, Lc, Ld,
and Le constraints.
(*addsi3_extended): Convert to define_insn_and_split.  Use
plus_si_extend_operand instead of arith_operand.  Add
alternatives for La and Le alternatives.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/add-const.c: New test.
* gcc.target/loongarch/stack-check-cfa-1.c: Adjust for stack
frame size change.
* gcc.target/loongarch/stack-check-cfa-2.c: Likewise.
---
  gcc/config/loongarch/constraints.md   | 46 -
  gcc/config/loongarch/loongarch-protos.h   |  2 +
  gcc/config/loongarch/loongarch.cc | 44 +
  gcc/config/loongarch/loongarch.h  | 19 ++
  gcc/config/loongarch/loongarch.md | 66 +++
  gcc/config/loongarch/predicates.md| 36 ++
  .../gcc.target/loongarch/add-const.c  | 45 +
  .../gcc.target/loongarch/stack-check-cfa-1.c  |  2 +-
  .../gcc.target/loongarch/stack-check-cfa-2.c  |  2 +-
  9 files changed, 246 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/add-const.c

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index cb7fa688ceb..7a38cd07ae9 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -60,7 +60,22 @@
  ;; "I" "A signed 12-bit constant (for arithmetic instructions)."
  ;; "J" "Integer zero."
  ;; "K" "An unsigned 12-bit constant (for logic instructions)."
-;; "L" <-unused
+;; "L" -
+;; "La"
+;;  "A signed constant in [-4096, 2048) or (2047, 4094]."
+;; "Lb"
+;;  "A signed 32-bit constant and low 16-bit is zero, which can be
+;;   added onto a register with addu16i.d.  It matches nothing if
+;;   the addu16i.d instruction is not available."
+;; "Lc"
+;;  "A signed 64-bit constant can be expressed as Lb + I, but not a
+;;   single Lb or I."
+;; "Ld"
+;;  "A signed 64-bit constant can be expressed as Lb + Lb, but not a
+;;   single Lb."
+;; "Le"
+;;  "A signed 32-bit constant can be expressed as Lb + I, but not a
+;;   single Lb or I."
  ;; "M" <-unused
  ;; "N" <-unused
  ;; "O" <-unused
@@ -170,6 +185,35 @@ (define_constraint "K"
(and (match_code "const_int")
 (match_test "IMM12_OPERAND_UNSIGNED (ival)")))
  
+(define_constraint "La"

+  "A signed constant in [-4096, 2048) or (2047, 4094]."
+  (and (match_code "const_int")
+   (match_test "DUAL_IMM12_OPERAND (ival)")))
+
+(define_constraint "Lb"
+  "A signed 32-bit constant and low 16-bit is zero, which can be added
+   onto a register with addu16i.d."
+  (and (match_code "const_int")
+   (match_test "ADDU16I_OPERAND (ival)")))
+
+(define_constraint "Lc"
+  "A signed 64-bit constant can be expressed as Lb + I, but not a single Lb
+   or I."
+  (and (match_code "const_int")
+   (match_test "loongarch_addu16i_imm12_operand_p (ival, DImode)")))
+
+(define_constraint "Ld"
+  "A signed 64-bit constant can be expressed as Lb + Lb, but not a single
+   Lb."
+  (and (match_code "const_int")
+   (match_test "DUAL_ADDU16I_OPERAND (ival)")))
+
+(define_constraint "Le"
+  "A signed 32-bit constant can be expressed as Lb + I, 

Re: [Patch] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

2023-04-04 Thread Tom de Vries via Gcc-patches

On 4/4/23 11:02, Thomas Schwinge wrote:

Hi!

Are we going to install such a work-around?



Hi,

LGTM.

Thanks,
- Tom



Regards
  Thomas


On 2022-12-19T13:04:43+0100, I wrote:

Hi!

On 2022-12-16T17:19:00+0100, Tobias Burnus  wrote:

Seems to be a CUDA JIT issue


A Nvidia Driver JIT issue, more precisely.  ;-)


which is fixed by adding a dummy procedure.


Gah...  :-|


Lightly tested with 4 systems at hand, where 2 failed before.


I'm happy to confirm that indeed this does resolve the issue for all
configurations that I reported in 
"OpenMP/nvptx reverse offload execution test FAILs".


As I said on IRC, #gcc, 2022-12-16:


[...] we're unlikely to reverse-engineer the exact version/conditions
where this got fixed, so don't have a useful means for versioning the
workaround.  Fortunately, it doesn't "cost" anything really.  (In
contrast to some other GCC/nvptx back end workarounds, as I
understand.)



Regards
  Thomas



One had 10.2 and
the other had some ancient CUDA where 'nvptx-smi' did not print a CUDA version
and requires -mptx=3.1.
(I did check that offloading indeed happened and no hostfallback was done.)

OK for mainline?

Tobias




nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if a translation does not contain any executable code. It
works with CUDA 11.1.  The code of this commit is about reverse offload;
having NULL values disables the side of reverse offload during image load.

Solution is the same as found by Thomas for a related issue: Adding a dummy
procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge 

gcc/
  PR libgomp/108098

  * config/nvptx/mkoffload.cc (process): Emit dummy procedure
  alongside reverse-offload function table to prevent NULL values
  of the function addresses.

---
  gcc/config/nvptx/mkoffload.cc | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 5d89ba8..8306aa0 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
  fputc (sm_ver2[i], out);
fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");

+  /* WORKAROUND - see PR 108098
+ It seems as if older CUDA JIT compiler optimizes the function pointers
+ in offload_func_table to NULL, which can be prevented by adding a
+ dummy procedure. With CUDA 11.1, it seems to work fine without
+ workaround while CUDA 10.2 as some ancient version have need the
+ workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+ restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+ PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+ PTX ISA 7.1.  */
+  fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+  fprintf (out, "\t\".func __dummy$func ( )\"\n");
+  fprintf (out, "\t\"{\"\n");
+  fprintf (out, "\t\"}\"\n");
+
size_t fidx = 0;
for (id = func_ids; id; id = id->next)
  {

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
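
The comment in the patch above notes that emitting the dummy procedure could be restricted to older targets via 'if (sm_ver2[0] < 8 && version2[0] < 7)', while the patch itself emits it unconditionally. A minimal standalone sketch of that conditional variant follows. The names needs_dummy_func and dummy_func_text are hypothetical and not part of mkoffload.cc, and comparing only the leading character of each version string is an assumption taken from the patch comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not in mkoffload.cc): decide whether the workaround
// should be emitted, following the condition quoted in the patch comment.
// sm_ver is e.g. "35" or "80", ptx_version is e.g. "6.3" or "7.1".
// Only the leading character is compared, so a three-digit sm value such
// as "100" would be misclassified; this is a limitation of the sketch.
static bool needs_dummy_func(const std::string &sm_ver,
                             const std::string &ptx_version)
{
  if (sm_ver.empty() || ptx_version.empty())
    return true;  // Unknown target: keep the workaround, it costs nothing.
  return sm_ver[0] < '8' && ptx_version[0] < '7';
}

// The text written by the four fprintf calls in the patch: a declaration
// and an empty definition of __dummy$func, each wrapped in a C string
// literal because mkoffload generates C source that embeds the PTX.
static std::string dummy_func_text()
{
  return "\n\t\".func __dummy$func ( );\"\n"
         "\t\".func __dummy$func ( )\"\n"
         "\t\"{\"\n"
         "\t\"}\"\n";
}
```

As Thomas notes in the thread, the exact driver version that fixed the JIT behaviour is unknown, which is the argument for keeping the emission unconditional rather than using a predicate like this one.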




Re: [Patch] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

2023-04-04 Thread Thomas Schwinge
Hi!

Are we going to install such a work-around?


Grüße
 Thomas


On 2022-12-19T13:04:43+0100, I wrote:
> Hi!
>
> On 2022-12-16T17:19:00+0100, Tobias Burnus  wrote:
>> Seems to be a CUDA JIT issue
>
> A Nvidia Driver JIT issue, more precisely.  ;-)
>
>> which is fixed by adding a dummy procedure.
>
> Gah...  :-|
>
>> Lightly tested with 4 systems at hand, where 2 failed before.
>
> I'm happy to confirm that indeed this does resolve the issue for all
> configurations that I reported in 
> "OpenMP/nvptx reverse offload execution test FAILs".
>
>
> As I said on IRC, #gcc, 2022-12-16:
>
>> [...] we're unlikely to reverse-engineer the exact version/conditions
>> where this got fixed, so don't have a useful means for versioning the
>> workaround.  Fortunately, it doesn't "cost" anything really.  (In
>> contrast to some other GCC/nvptx back end workarounds, as I
>> understand.)
>
>
> Grüße
>  Thomas
>
>
>> One had 10.2 and
>> the other had some ancient CUDA where 'nvptx-smi' did not print a CUDA 
>> version
>> and requires -mptx=3.1.
>> (I did check that offloading indeed happened and no hostfallback was done.)
>>
>> OK for mainline?
>>
>> Tobias
>
>
>> nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
>>
>> Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
>> variables by NULL if a translation does not contain any executable code. It
>> works with CUDA 11.1.  The code of this commit is about reverse offload;
>> having NULL values disables the side of reverse offload during image load.
>>
>> Solution is the same as found by Thomas for a related issue: Adding a dummy
>> procedure. Cf. the PR of this issue and Thomas' patch
>> "nvptx: Support global constructors/destructors via 'collect2'"
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html
>>
>> As that approach also works here:
>>
>> Co-authored-by: Thomas Schwinge 
>>
>> gcc/
>>  PR libgomp/108098
>>
>>  * config/nvptx/mkoffload.cc (process): Emit dummy procedure
>>  alongside reverse-offload function table to prevent NULL values
>>  of the function addresses.
>>
>> ---
>>  gcc/config/nvptx/mkoffload.cc | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
>> index 5d89ba8..8306aa0 100644
>> --- a/gcc/config/nvptx/mkoffload.cc
>> +++ b/gcc/config/nvptx/mkoffload.cc
>> @@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
>>  fputc (sm_ver2[i], out);
>>fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");
>>
>> +  /* WORKAROUND - see PR 108098
>> + It seems as if older CUDA JIT compiler optimizes the function pointers
>> + in offload_func_table to NULL, which can be prevented by adding a
>> + dummy procedure. With CUDA 11.1, it seems to work fine without
>> + workaround while CUDA 10.2 as some ancient version have need the
>> + workaround. Assuming CUDA 11.0 fixes it, emitting it could be
>> + restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
>> + PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
>> + PTX ISA 7.1.  */
>> +  fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
>> +  fprintf (out, "\t\".func __dummy$func ( )\"\n");
>> +  fprintf (out, "\t\"{\"\n");
>> +  fprintf (out, "\t\"}\"\n");
>> +
>>size_t fidx = 0;
>>for (id = func_ids; id; id = id->next)
>>  {


[PATCH] riscv: Fix bootstrap [PR109384]

2023-04-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch unbreaks riscv bootstrap, where it previously failed
on -Werror=format-diag warning promoted to error.

Ok for trunk?

Or shall it say e.g.
"%<-march=%s%>: %<zfinx%> extension conflicts with %<f%>"
?
Or say if the current condition is true, do
  const char *ext = "zfinx";
  if (subset_list->lookup ("zdinx"))
    ext = "zdinx";
  else if (subset_list->lookup ("zhinx"))
    ext = "zhinx";
  else if (subset_list->lookup ("zhinxmin"))
    ext = "zhinxmin";
and
"%<-march=%s%>: %qs extension conflicts with %<f%>", arch, ext
?  Or do a similar check for which extension to print against it,
  const char *ext = "zfinx";
  const char *ext2 = "f";
  if (subset_list->lookup ("zdinx"))
    {
      ext = "zdinx";
      if (subset_list->lookup ("d"))
        ext2 = "d";
    }
  else if (subset_list->lookup ("zhinx"))
    {
      ext = "zhinx";
      if (subset_list->lookup ("zfh"))
        ext2 = "zfh";
    }
  else if (subset_list->lookup ("zhinxmin"))
    {
      ext = "zhinxmin";
      if (subset_list->lookup ("zfhmin"))
        ext2 = "zfhmin";
    }
"%<-march=%s%>: %qs extension conflicts with %qs", arch, ext, ext2
?

2023-04-04  Jakub Jelinek  

PR target/109384
* common/config/riscv/riscv-common.cc (riscv_subset_list::parse):
Reword diagnostics about zfinx conflict with f, formatting fixes.

* gcc.target/riscv/arch-19.c: Expect a different message about zfinx
vs. f conflict.

--- gcc/common/config/riscv/riscv-common.cc.jj  2023-04-04 10:46:33.473871184 +0200
+++ gcc/common/config/riscv/riscv-common.cc  2023-04-04 10:41:22.014477456 +0200
@@ -1153,10 +1153,9 @@ riscv_subset_list::parse (const char *ar
 
   subset_list->handle_combine_ext ();
 
-  if (subset_list->lookup("zfinx") && subset_list->lookup("f"))
-   error_at (loc,
-   "%<-march=%s%>: z*inx is conflict with float extensions",
-   arch);
+  if (subset_list->lookup ("zfinx") && subset_list->lookup ("f"))
+error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
+  "extensions", arch);
 
   return subset_list;
 
--- gcc/testsuite/gcc.target/riscv/arch-19.c.jj  2023-03-29 22:37:11.465651690 +0200
+++ gcc/testsuite/gcc.target/riscv/arch-19.c  2023-04-04 10:45:50.734503089 +0200
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64if_zfinx -mabi=lp64" } */
 int foo() {}
-/* { dg-error "'-march=rv64if_zfinx': z\\*inx is conflict with float extensions" "" { target *-*-* } 0 } */
+/* { dg-error "'-march=rv64if_zfinx': z\\*inx conflicts with floating-point extensions" "" { target *-*-* } 0 } */

Jakub
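
The third alternative proposed above, which picks both extension names to print, can be sketched as standalone code. This is only an illustration: subset_list_model is a hypothetical stand-in for riscv_subset_list that models nothing but lookup, and conflicting_exts is a made-up name for the selection logic quoted in the mail.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for riscv_subset_list; only lookup () is modelled.
struct subset_list_model
{
  std::set<std::string> exts;
  bool lookup (const char *name) const { return exts.count (name) != 0; }
};

// Choose the z*inx extension and the floating-point extension to name in
// the diagnostic, mirroring the if/else chain from the mail.  The defaults
// correspond to the condition that triggers the error (zfinx together
// with f).
static std::pair<std::string, std::string>
conflicting_exts (const subset_list_model &s)
{
  const char *ext = "zfinx";
  const char *ext2 = "f";
  if (s.lookup ("zdinx"))
    {
      ext = "zdinx";
      if (s.lookup ("d"))
        ext2 = "d";
    }
  else if (s.lookup ("zhinx"))
    {
      ext = "zhinx";
      if (s.lookup ("zfh"))
        ext2 = "zfh";
    }
  else if (s.lookup ("zhinxmin"))
    {
      ext = "zhinxmin";
      if (s.lookup ("zfhmin"))
        ext2 = "zfhmin";
    }
  return std::make_pair (std::string (ext), std::string (ext2));
}
```

The resulting pair would feed the two %qs directives of the last format string in the mail; the patch as posted keeps the simpler fixed wording instead.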



[PATCH] RISC-V: Fix PR109399 VSETVL PASS bug

2023-04-04 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch fixes the following bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109399

PR 109399

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::compute_local_backward_infos): Update user vsetvl in local
demand fusion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109399.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc   |  8 +++-
 .../gcc.target/riscv/rvv/vsetvl/pr109399.c | 14 ++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109399.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 4d043c0645b..7e8a5376705 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2715,7 +2715,13 @@ pass_vsetvl::compute_local_backward_infos (const bb_info *bb)
  if (!(propagate_avl_across_demands_p (change, info)
&& !reg_available_p (insn, change))
  && change.compatible_p (info))
-   info = change.merge (info);
+   {
+ info = change.merge (info);
+ /* Fix PR109399, we should update user vsetvl instruction
+if there is a change in demand fusion.  */
+ if (vsetvl_insn_p (insn->rtl ()))
+   change_vsetvl_insn (insn, info);
+   }
}
  change = info;
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109399.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109399.c
new file mode 100644
index 000..b3abad7a8bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109399.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+void foo(void *in1, void *in2, void *in3, void *out, size_t n) {
+  size_t vl = __riscv_vsetvlmax_e32m1();
+  vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
+  vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
+  vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
+  __riscv_vse32_v_i32m1(out, c, vl);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*tu,\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-- 
2.36.3



Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates

2023-04-04 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-04-04 at 16:00 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Tue, 2023-04-04 at 11:01 +0800, Lulu Cheng wrote:
> 
> /* snip */
> 
> > > +unsigned long f10 (unsigned long x) { return x - 0x8000l * 2; }
> > > +unsigned long f11 (unsigned long x) { return x - 0x8000l * 2; }
> >  These two test cases are duplicates.
> 
> /* snip */
> 
> > 
> > > +unsigned int g10 (unsigned int x) { return x - 0x8000l * 2; }
> > > +unsigned int g11 (unsigned int x) { return x - 0x8000l * 2; }
> > Ditto.
> 
> I'll fix them in V2.
> 
> > I found that with this patch applied, the tests
> > gcc.target/loongarch/stack-check-cfa-1.c and
> > gcc.target/loongarch/stack-check-cfa-2.c fail.
> > Although the tests fail, the generated assembly code is better, and there
> > is no problem with the logic of the assembly code. I haven't checked the
> > reason for this yet.
> 
> Looks like the change hides PR109035 (like -fpie) for some reason. (But
> I still don't understand the root cause of PR109035 anyway.)

V2 sent with test cases fixed.

/* snip */

> Technically there should be "addu16i.d $r3,$r3,-1" in the prologue and
> "addu16i.d $r3,$r3,2" in the epilogue, so we can avoid using r14/r13.
> I'll try modifying loongarch_expand_{pro,epi}logue for this.

Will do this later because I'm too stupid to understand
loongarch_first_stack_step function quickly :).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[GCC14 PATCH v2] LoongArch: Optimize additions with immediates

2023-04-04 Thread Xi Ruoyao via Gcc-patches
1. Use addu16i.d for TARGET_64BIT and suitable immediates.
2. Split one addition with immediate into two addu16i.d or addi.{d/w}
   instructions if possible.  This avoids using a temporary register
   without increasing the instruction count.

Inspired by https://reviews.llvm.org/D143710 and
https://reviews.llvm.org/D147222.
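
The two points above can be illustrated with a small standalone model of the immediate split. This is a sketch under stated assumptions and not the code from the patch: fits_imm12, fits_addu16i, split_result and split_add_imm are hypothetical names, the La range is read as [-4096, -2048) or (2047, 4094] (what two signed 12-bit immediates cover beyond a single one), and the distinction between 32-bit and 64-bit modes (Lc versus Le) as well as the TARGET_64BIT requirement for addu16i.d are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Signed 12-bit immediate, as accepted by addi.{w,d}: [-2048, 2047].
static bool fits_imm12 (int64_t v) { return v >= -2048 && v <= 2047; }

// Immediate that a single addu16i.d can add: a signed 16-bit value
// shifted left by 16, i.e. low 16 bits zero and within signed 32 bits.
static bool fits_addu16i (int64_t v)
{
  return (v & 0xffff) == 0 && v >= INT32_MIN && v <= INT32_MAX;
}

// Result of the hypothetical split: ok is false when no two-instruction
// split applies (including when one instruction already suffices).
struct split_result
{
  bool ok;
  int64_t first;
  int64_t second;
};

static split_result split_add_imm (int64_t v)
{
  const split_result none = { false, 0, 0 };
  const int64_t limit = static_cast<int64_t> (1) << 32;

  // Nothing outside this window can be covered by two instructions; the
  // check also keeps the subtractions below free of overflow.
  if (v < -limit || v > limit)
    return none;
  // A single addi or addu16i.d is enough, so no split is needed.
  if (fits_imm12 (v) || fits_addu16i (v))
    return none;

  // Two addi instructions (the "La" case in this model).
  if (v >= -4096 && v <= 4094)
    {
      int64_t first = v < 0 ? -2048 : 2047;
      split_result r = { true, first, v - first };
      return r;
    }

  // Two addu16i.d instructions (the "Ld" case in this model).
  if ((v & 0xffff) == 0)
    {
      int64_t first = v < 0 ? INT32_MIN : 0x7fff0000;
      if (!fits_addu16i (v - first))
        return none;
      split_result r = { true, first, v - first };
      return r;
    }

  // One addu16i.d plus one addi (the "Lc"/"Le" cases in this model):
  // sign-extend the low 16 bits and require them to fit in 12 bits.
  int64_t lo = ((v & 0xffff) ^ 0x8000) - 0x8000;
  if (fits_imm12 (lo) && fits_addu16i (v - lo))
    {
      split_result r = { true, v - lo, lo };
      return r;
    }
  return none;
}
```

For example, in this model 4000 splits into 2047 + 1953 (two addi), 0x10001 into 0x10000 + 1 (addu16i.d then addi), and 0xfffe0000 into 0x7fff0000 + 0x7fff0000 (two addu16i.d), while 0x12345 has no two-instruction form and would still need a temporary register.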

Stack check CFA tests are updated because this change somehow hides
PR109035, causing a smaller stack frame.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for GCC 14?

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_addu16i_imm12_operand_p): New function prototype.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.cc
(loongarch_addu16i_imm12_operand_p): New function.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.h (ADDU16I_OPERAND): New macro.
(DUAL_IMM12_OPERAND): Likewise.
(DUAL_ADDU16I_OPERAND): Likewise.
* config/loongarch/constraints.md (La, Lb, Lc, Ld, Le): New
constraints.
* config/loongarch/predicates.md (const_dual_imm12_operand): New
predicate.
(const_addu16i_operand): Likewise.
(const_addu16i_imm12_di_operand): Likewise.
(const_addu16i_imm12_si_operand): Likewise.
(plus_di_operand): Likewise.
(plus_si_operand): Likewise.
(plus_si_extend_operand): Likewise.
* config/loongarch/loongarch.md (add<mode>3): Convert to
define_insn_and_split.  Use plus_<mode>_operand predicate
instead of arith_operand.  Add alternatives for La, Lb, Lc, Ld,
and Le constraints.
(*addsi3_extended): Convert to define_insn_and_split.  Use
plus_si_extend_operand instead of arith_operand.  Add
alternatives for La and Le constraints.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/add-const.c: New test.
* gcc.target/loongarch/stack-check-cfa-1.c: Adjust for stack
frame size change.
* gcc.target/loongarch/stack-check-cfa-2.c: Likewise.
---
 gcc/config/loongarch/constraints.md   | 46 -
 gcc/config/loongarch/loongarch-protos.h   |  2 +
 gcc/config/loongarch/loongarch.cc | 44 +
 gcc/config/loongarch/loongarch.h  | 19 ++
 gcc/config/loongarch/loongarch.md | 66 +++
 gcc/config/loongarch/predicates.md| 36 ++
 .../gcc.target/loongarch/add-const.c  | 45 +
 .../gcc.target/loongarch/stack-check-cfa-1.c  |  2 +-
 .../gcc.target/loongarch/stack-check-cfa-2.c  |  2 +-
 9 files changed, 246 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/add-const.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index cb7fa688ceb..7a38cd07ae9 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -60,7 +60,22 @@
 ;; "I" "A signed 12-bit constant (for arithmetic instructions)."
 ;; "J" "Integer zero."
 ;; "K" "An unsigned 12-bit constant (for logic instructions)."
-;; "L" <-unused
+;; "L" -
+;; "La"
+;;  "A signed constant in [-4096, 2048) or (2047, 4094]."
+;; "Lb"
+;;  "A signed 32-bit constant and low 16-bit is zero, which can be
+;;   added onto a register with addu16i.d.  It matches nothing if
+;;   the addu16i.d instruction is not available."
+;; "Lc"
+;;  "A signed 64-bit constant can be expressed as Lb + I, but not a
+;;   single Lb or I."
+;; "Ld"
+;;  "A signed 64-bit constant can be expressed as Lb + Lb, but not a
+;;   single Lb."
+;; "Le"
+;;  "A signed 32-bit constant can be expressed as Lb + I, but not a
+;;   single Lb or I."
 ;; "M" <-unused
 ;; "N" <-unused
 ;; "O" <-unused
@@ -170,6 +185,35 @@ (define_constraint "K"
   (and (match_code "const_int")
(match_test "IMM12_OPERAND_UNSIGNED (ival)")))
 
+(define_constraint "La"
+  "A signed constant in [-4096, 2048) or (2047, 4094]."
+  (and (match_code "const_int")
+   (match_test "DUAL_IMM12_OPERAND (ival)")))
+
+(define_constraint "Lb"
+  "A signed 32-bit constant and low 16-bit is zero, which can be added
+   onto a register with addu16i.d."
+  (and (match_code "const_int")
+   (match_test "ADDU16I_OPERAND (ival)")))
+
+(define_constraint "Lc"
+  "A signed 64-bit constant can be expressed as Lb + I, but not a single Lb
+   or I."
+  (and (match_code "const_int")
+   (match_test "loongarch_addu16i_imm12_operand_p (ival, DImode)")))
+
+(define_constraint "Ld"
+  "A signed 64-bit constant can be expressed as Lb + Lb, but not a single
+   Lb."
+  (and (match_code "const_int")
+   (match_test "DUAL_ADDU16I_OPERAND (ival)")))
+
+(define_constraint "Le"
+  "A signed 32-bit constant can be expressed as Lb + I, but not a single Lb
+   or I."
+  (and (match_code "const_int")
+   (match_test "loongarch_addu16i_imm12_operand_p (ival, 

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 04, 2023 at 01:21:40AM +0200, Jan Hubicka via Gcc-patches wrote:
> It is however really side case and I am worried about dropping
> pure/const from builtin declarations...

Yeah, that can certainly break stuff.  See e.g. the recently fixed
ICE when memcmp wasn't pure in PR109258.

Jakub



[COMMITTED] config: -pthread shouldn't link with -lpthread on Solaris

2023-04-04 Thread Rainer Orth
libpthread has been folded into libc since Solaris 10 and replaced by a
filter on libc.  Linking with libpthread thus only creates unnecessary
runtime overhead.

This patch thus removes linking with -lpthread if -pthread/-pthreads is
specified, thus getting rid of the libpthread dependency in libatomic,
libgdruntime, libgomp, libgphobos, and libitm.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11 (both Solaris 11.3 and 11.4).

Committed to trunk.

There are more instances of this issue: both libsanitizer and libgo
unnecessarily link with -lpthread, either unconditionally or due to a
configure test which doesn't check if the library is actually needed.
This can be fixed by consistently using AX_PTHREAD from
config/ax_pthread.m4, but such a fix affects all targets and is clearly
not stage 4 material.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2023-04-03  Rainer Orth  

gcc:
* config/sol2.h (LIB_SPEC): Don't link with -lpthread.

# HG changeset patch
# Parent  5e543e5a54a480b50f3f8534837cb5ec7ad96a07
config: -pthread shouldn't link with -lpthread on Solaris

diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -161,7 +161,6 @@ along with GCC; see the file COPYING3.  
 #undef LIB_SPEC
 #define LIB_SPEC \
   "%{!symbolic:\
- %{pthreads|pthread:-lpthread} \
  %{p|pg:-ldl} -lc}"
 
 #ifndef CROSS_DIRECTORY_STRUCTURE


Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates

2023-04-04 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-04-04 at 11:01 +0800, Lulu Cheng wrote:

/* snip */

> > +unsigned long f10 (unsigned long x) { return x - 0x8000l * 2; }
> > +unsigned long f11 (unsigned long x) { return x - 0x8000l * 2; }
>  These two test cases are duplicates.

/* snip */

> 
> > +unsigned int g10 (unsigned int x) { return x - 0x8000l * 2; }
> > +unsigned int g11 (unsigned int x) { return x - 0x8000l * 2; }
> Ditto.

I'll fix them in V2.

> I found that with this patch applied, the tests
> gcc.target/loongarch/stack-check-cfa-1.c and
> gcc.target/loongarch/stack-check-cfa-2.c fail.
> Although the tests fail, the generated assembly code is better, and there is
> no problem with the logic of the assembly code. I haven't checked the reason
> for this yet.

Looks like the change hides PR109035 (like -fpie) for some reason. (But
I still don't understand the root cause of PR109035 anyway.)

f_test:
.LFB0 = .
        .cfi_startproc
        lu12i.w $r14,65536>>12          # 0x10000
        sub.d   $r3,$r3,$r14
        st.d    $r0,$r3,0
        sub.d   $r3,$r3,$r14
        st.d    $r0,$r3,0
        .cfi_def_cfa_offset 131072
        lu12i.w $r13,131072>>12         # 0x20000
        or      $r12,$r3,$r0
        ldx.b   $r4,$r12,$r4
        add.d   $r3,$r3,$r13
        .cfi_def_cfa_offset 0
        jr      $r1
        .cfi_endproc

Technically there should be "addu16i.d $r3,$r3,-1" in the prologue and
"addu16i.d $r3,$r3,2" in the epilogue, so we can avoid using r14/r13.
I'll try modifying loongarch_expand_{pro,epi}logue for this.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] RISC-V: Fix typo

2023-04-04 Thread Li Xu
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.def: Fix typo.
* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Ditto.
* config/riscv/vector-iterators.md: Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.def | 3 +--
 gcc/config/riscv/riscv.cc  | 4 ++--
 gcc/config/riscv/vector-iterators.md   | 4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.def b/gcc/config/riscv/riscv-vector-builtins.def
index 2d527f76f0a..563ad355342 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -65,8 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_RVV_BASE_TYPE(NAME, TYPE)
 #endif
 
-/* Use "DEF_RVV_TYPE_INDEX" macro to define RVV function types.
-   The 'NAME' will be concatenated into intrinsic function name.  */
+/* Use "DEF_RVV_TYPE_INDEX" macro to define RVV function types.  */
 #ifndef DEF_RVV_TYPE_INDEX
 #define DEF_RVV_TYPE_INDEX(VECTOR, MASK, SIGNED, UNSIGNED, EEW8_INDEX, 
EEW16_INDEX, \
  EEW32_INDEX, EEW64_INDEX, SHIFT, DOUBLE_TRUNC,   \
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 76eee4a55e9..5f542932d13 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7048,8 +7048,8 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
  int *offset)
 {
   /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
- 1. TARGET_MIN_VLEN == 32, olynomial invariant 1 == (VLENB / 4) - 1.
- 2. TARGET_MIN_VLEN > 32, olynomial invariant 1 == (VLENB / 8) - 1.
+ 1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
+ 2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
   *factor = riscv_bytes_per_vector_chunk;
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 34e486e48ca..194e9b8f57f 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -727,7 +727,7 @@
   (VNx1QI "vnx4hi") (VNx2QI "vnx4hi") (VNx4QI "vnx4hi")
   (VNx8QI "vnx4hi") (VNx16QI "vnx4hi") (VNx32QI "vnx4hi") (VNx64QI "vnx4hi")
   (VNx1HI "vnx2si") (VNx2HI "vnx2si") (VNx4HI "vnx2si")
-  (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2SI")
+  (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2si")
   (VNx1SI "vnx2di") (VNx2SI "vnx2di") (VNx4SI "vnx2di")
   (VNx8SI "vnx2di") (VNx16SI "vnx2di")
   (VNx1SF "vnx1df") (VNx2SF "vnx1df")
@@ -738,7 +738,7 @@
   (VNx1QI "vnx2hi") (VNx2QI "vnx2hi") (VNx4QI "vnx2hi")
   (VNx8QI "vnx2hi") (VNx16QI "vnx2hi") (VNx32QI "vnx2hi")
   (VNx1HI "vnx1si") (VNx2HI "vnx1si") (VNx4HI "vnx1si")
-  (VNx8HI "vnx1si") (VNx16HI "vnx1SI")
+  (VNx8HI "vnx1si") (VNx16HI "vnx1si")
 ])
 
 (define_mode_attr VDEMOTE [
-- 
2.17.1