Re: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled

2024-03-09 Thread Richard Biener



> On 10.03.2024 at 04:14, pan2...@intel.com wrote:
> 
> From: Pan Li 
> 
> This patch would like to fix one ICE in vectorizable_store when both the
> loop_masks and loop_lens are enabled.  The ICE looks like below when building
> with "-march=rv64gcv -O3".
> 
> during GIMPLE pass: vect
> test.c: In function ‘d’:
> test.c:6:6: internal compiler error: in vectorizable_store, at
> tree-vect-stmts.cc:8691
>6 | void d() {
>  |  ^
> 0x37a6f2f vectorizable_store
>.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
> 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
> _slp_tree*, _slp_instance*, vec*)
>.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
> 0x1db5dca vect_analyze_loop_operations
>.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
> 0x1db885b vect_analyze_loop_2
>.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
> 0x1dba029 vect_analyze_loop_1
>.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
> 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
>.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
> 0x1e389d1 try_vectorize_loop_1
>.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
> 0x1e38f3d try_vectorize_loop
>.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
> 0x1e39230 execute
>.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298
> 
> There are two ways to reach the vectorizable LD/ST functions: one is the
> analysis and the other is the transform.  We cannot have both the lens and
> the masks enabled during transform, but it is valid during analysis.  Given
> that the transform doesn't require cost_vec, we can enable the assert only
> when cost_vec is NULL.
> 
> Below testsuites are passed for this patch:
> * The x86 bootstrap tests.
> * The x86 full regression tests.
> * The aarch64 full regression tests.
> * The riscv full regression tests.

Ok

Thanks,
Richard 

> gcc/ChangeLog:
> 
>* tree-vect-stmts.cc (vectorizable_store): Enable the assert
>during transform process.
>(vectorizable_load): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/riscv/rvv/base/pr114195-1.c: New test.
> 
> Signed-off-by: Pan Li 
> ---
> .../gcc.target/riscv/rvv/base/pr114195-1.c | 15 +++
> gcc/tree-vect-stmts.cc | 18 ++
> 2 files changed, 29 insertions(+), 4 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> new file mode 100644
> index 000..a67b847112b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> @@ -0,0 +1,15 @@
> +/* Test that we do not ICE when compiling.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +long a, b;
> +extern short c[];
> +
> +void d() {
> +  for (int e = 0; e < 35; e += 2) {
> +a = ({ a < 0 ? a : 0; });
> +b = ({ b < 0 ? b : 0; });
> +
> +c[e] = 0;
> +  }
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 14a3ffb5f02..e8617439a48 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8697,8 +8697,13 @@ vectorizable_store (vec_info *vinfo,
>? _VINFO_LENS (loop_vinfo)
>: NULL);
> 
> -  /* Shouldn't go with length-based approach if fully masked.  */
> -  gcc_assert (!loop_lens || !loop_masks);
> +  /* Both vect_transform_stmt and vect_analyze_stmt reach here, but with a
> + difference: we cannot enable both the lens and the masks during
> + transform, while it is allowed during analysis.
> + Shouldn't go with length-based approach if fully masked.  */
> +  if (cost_vec == NULL)
> +/* The cost_vec is NULL during transform.  */
> +gcc_assert ((!loop_lens || !loop_masks));
> 
>   /* Targets with store-lane instructions must not require explicit
>  realignment.  vect_supportable_dr_alignment always returns either
> @@ -10577,8 +10582,13 @@ vectorizable_load (vec_info *vinfo,
>? _VINFO_LENS (loop_vinfo)
>: NULL);
> 
> -  /* Shouldn't go with length-based approach if fully masked.  */
> -  gcc_assert (!loop_lens || !loop_masks);
> +  /* Both vect_transform_stmt and vect_analyze_stmt reach here, but with a
> + difference: we cannot enable both the lens and the masks during
> + transform, while it is allowed during analysis.
> + Shouldn't go with length-based approach if fully masked.  */
> +  if (cost_vec == NULL)
> +/* The cost_vec is NULL during transform.  */
> +gcc_assert ((!loop_lens || !loop_masks));
> 
>   /* Targets with store-lane instructions must not require explicit
>  realignment.  vect_supportable_dr_alignment always returns either
> --
> 2.34.1
> 


Reverting recent adjustment to expected output of sh port tests

2024-03-09 Thread Jeff Law


With Jakub's twiddle to the forward propagation pass, the assembly code 
for pr59533-1.c has returned to its previous state.  Thus I've reverted 
my patch which adjusted the expected output.


Jeff

commit 6f7d000fcacef31a6947f95021e445c846170f92
Author: jlaw 
Date:   Sat Mar 9 21:33:47 2024 -0700

Revert "[committed] Adjust expectations for pr59533-1.c"

This reverts commit 7e16f819ff413c48702f9087b62eaac39a060a14.

diff --git a/gcc/testsuite/gcc.target/sh/pr59533-1.c 
b/gcc/testsuite/gcc.target/sh/pr59533-1.c
index 859b8e2d24c..b0469859df5 100644
--- a/gcc/testsuite/gcc.target/sh/pr59533-1.c
+++ b/gcc/testsuite/gcc.target/sh/pr59533-1.c
@@ -2,15 +2,15 @@
 /* { dg-do compile }  */
 /* { dg-options "-O1" } */
 
-/* { dg-final { scan-assembler-times "shll" 3 } }  */
+/* { dg-final { scan-assembler-times "shll" 1 } }  */
 /* { dg-final { scan-assembler-times "movt" 5 } }  */
 /* { dg-final { scan-assembler-times "rotcl" 1 } }  */
 /* { dg-final { scan-assembler-times "and" 3 } }  */
 /* { dg-final { scan-assembler-times "extu.b" 5 } }  */
 
-/* { dg-final { scan-assembler-times "cmp/pz" 25 { target { ! sh2a } } } }  */
-/* { dg-final { scan-assembler-times "addc" 6 { target { ! sh2a } } } }  */
-/* { dg-final { scan-assembler-times "subc" 14 { target { ! sh2a } } } }  */
+/* { dg-final { scan-assembler-times "cmp/pz" 27 { target { ! sh2a } } } }  */
+/* { dg-final { scan-assembler-times "addc" 4 { target { ! sh2a } } } }  */
+/* { dg-final { scan-assembler-times "subc" 16 { target { ! sh2a } } } }  */
 
 /* { dg-final { scan-assembler-times "cmp/pz" 25 { target { sh2a } } } }  */
 /* { dg-final { scan-assembler-times "addc" 6 { target { sh2a } } } }  */


[committed] [target/102250] Document python requirement for risc-v

2024-03-09 Thread Jeff Law
Not sure why nobody's taken care of this yet.  Under certain 
circumstances python may be needed if you're building a RISC-V compiler.


Here's what I've checked in.  Happy to adjust if folks want to wordsmith 
it further.


Jeff

commit 7c8f0a79a7e1e42f846ddbca14b98b47ddcfd178
Author: jlaw 
Date:   Sat Mar 9 20:11:39 2024 -0700

[committed] [target/102250] Document python requirement for risc-v

PR target/102250
gcc/

* doc/install.texi: Document need for python when building
RISC-V compilers.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 173233096d1..e3650e0c4f4 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -253,6 +253,11 @@ name of the package depends on your distro) or you must 
build GCC as a
 @option{--disable-multilib}.  Otherwise, you may encounter an error such as
 @samp{fatal error: gnu/stubs-32.h: No such file}
 
+@item Python
+If you configure a RISC-V compiler with the option @option{--with-arch} and
+the specified architecture string is non-canonical, then you will need
+@command{python} installed on the build system.
+
 @item @anchor{GNAT-prerequisite}GNAT
 
 In order to build GNAT, the Ada compiler, you need a working GNAT


[PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled

2024-03-09 Thread pan2 . li
From: Pan Li 

This patch would like to fix one ICE in vectorizable_store when both the
loop_masks and loop_lens are enabled.  The ICE looks like below when building
with "-march=rv64gcv -O3".

during GIMPLE pass: vect
test.c: In function ‘d’:
test.c:6:6: internal compiler error: in vectorizable_store, at
tree-vect-stmts.cc:8691
6 | void d() {
  |  ^
0x37a6f2f vectorizable_store
.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
_slp_tree*, _slp_instance*, vec*)
.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
0x1db5dca vect_analyze_loop_operations
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
0x1db885b vect_analyze_loop_2
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
0x1dba029 vect_analyze_loop_1
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
0x1e389d1 try_vectorize_loop_1
.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
0x1e38f3d try_vectorize_loop
.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
0x1e39230 execute
.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298

There are two ways to reach the vectorizable LD/ST functions: one is the
analysis and the other is the transform.  We cannot have both the lens and
the masks enabled during transform, but it is valid during analysis.  Given
that the transform doesn't require cost_vec, we can enable the assert only
when cost_vec is NULL.

Below testsuites are passed for this patch:
* The x86 bootstrap tests.
* The x86 full regression tests.
* The aarch64 full regression tests.
* The riscv full regression tests.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Enable the assert
during transform process.
(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr114195-1.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/base/pr114195-1.c | 15 +++
 gcc/tree-vect-stmts.cc | 18 ++
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
new file mode 100644
index 000..a67b847112b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
@@ -0,0 +1,15 @@
+/* Test that we do not ICE when compiling.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
+
+long a, b;
+extern short c[];
+
+void d() {
+  for (int e = 0; e < 35; e += 2) {
+a = ({ a < 0 ? a : 0; });
+b = ({ b < 0 ? b : 0; });
+
+c[e] = 0;
+  }
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 14a3ffb5f02..e8617439a48 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8697,8 +8697,13 @@ vectorizable_store (vec_info *vinfo,
? _VINFO_LENS (loop_vinfo)
: NULL);
 
-  /* Shouldn't go with length-based approach if fully masked.  */
-  gcc_assert (!loop_lens || !loop_masks);
+  /* Both vect_transform_stmt and vect_analyze_stmt reach here, but with a
+ difference: we cannot enable both the lens and the masks during
+ transform, while it is allowed during analysis.
+ Shouldn't go with length-based approach if fully masked.  */
+  if (cost_vec == NULL)
+/* The cost_vec is NULL during transform.  */
+gcc_assert ((!loop_lens || !loop_masks));
 
   /* Targets with store-lane instructions must not require explicit
  realignment.  vect_supportable_dr_alignment always returns either
@@ -10577,8 +10582,13 @@ vectorizable_load (vec_info *vinfo,
? _VINFO_LENS (loop_vinfo)
: NULL);
 
-  /* Shouldn't go with length-based approach if fully masked.  */
-  gcc_assert (!loop_lens || !loop_masks);
+  /* Both vect_transform_stmt and vect_analyze_stmt reach here, but with a
+ difference: we cannot enable both the lens and the masks during
+ transform, while it is allowed during analysis.
+ Shouldn't go with length-based approach if fully masked.  */
+  if (cost_vec == NULL)
+/* The cost_vec is NULL during transform.  */
+gcc_assert ((!loop_lens || !loop_masks));
 
   /* Targets with store-lane instructions must not require explicit
  realignment.  vect_supportable_dr_alignment always returns either
-- 
2.34.1



[committed] [PR target/111362] Fix compare-debug issue with mode switching

2024-03-09 Thread Jeff Law


The issue here is that the code we emit for mode switching can change when -g
is added to the command line.  This is caused by processing debug notes
occurring after a call which is the last real statement in a basic block.


Without -g the CALL_INSN is literally the last insn in the block and the 
loop exits.  If mode switching after the call is needed, it'll be 
handled as we process outgoing edges.


With -g the loop iterates again and in the processing of the node the 
backend signals that a mode switch is necessary.


I pondered fixing this in the target, but the better fix is to ignore 
the debug notes in the insn stream.


I did a cursory review of some of the other compare-debug failures, but 
did not immediately see others which would likely be fixed by this 
change.  Sigh.


Anyway, bootstrapped and regression tested on x86.  Regression tested on 
rv64 as well.


Pushing to the trunk.

Jeff

commit 50531b6d400945793a1d549e6ee941d989319d42
Author: jlaw 
Date:   Sat Mar 9 19:27:32 2024 -0700

[committed] [PR target/111362] Fix compare-debug issue with mode switching

The issue here is that the code we emit for mode-switching can change when -g is
added to the command line.  This is caused by processing debug notes 
occurring
after a call which is the last real statement in a basic block.

Without -g the CALL_INSN is literally the last insn in the block and the 
loop
exits.  If mode switching after the call is needed, it'll be handled as we
process outgoing edges.

With -g the loop iterates again and in the processing of the node the 
backend
signals that a mode switch is necessary.

I pondered fixing this in the target, but the better fix is to ignore the 
debug
notes in the insn stream.

I did a cursory review of some of the other compare-debug failures, but did 
not
immediately see others which would likely be fixed by this change.  Sigh.

Anyway, bootstrapped and regression tested on x86.  Regression tested on 
rv64
as well.

PR target/111362
gcc/
* mode-switching.cc (optimize_mode_switching): Only process
NONDEBUG insns.

gcc/testsuite

* gcc.target/riscv/compare-debug-1.c: New test.
* gcc.target/riscv/compare-debug-2.c: New test.

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 583929184ce..a145b77397d 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -959,7 +959,7 @@ optimize_mode_switching (void)
 
  FOR_BB_INSNS (bb, insn)
{
- if (INSN_P (insn))
+ if (NONDEBUG_INSN_P (insn))
{
  int mode = targetm.mode_switching.needed (e, insn, live_now);
  rtx link;
diff --git a/gcc/testsuite/gcc.target/riscv/compare-debug-1.c 
b/gcc/testsuite/gcc.target/riscv/compare-debug-1.c
new file mode 100644
index 000..d65bb287b9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/compare-debug-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-ch --param=max-completely-peel-times=0 
-march=rv64iv -mabi=lp64d -fcompare-debug" } */
+
+
+void
+foo(void) {
+  for (unsigned i = 0; i < sizeof(foo); i++)
+__builtin_printf("%d", i);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/compare-debug-2.c 
b/gcc/testsuite/gcc.target/riscv/compare-debug-2.c
new file mode 100644
index 000..d87758475e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/compare-debug-2.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-ch --param=max-completely-peel-times=0 
-march=rv64iv -mabi=lp64d -fno-dce -fschedule-insns -fcompare-debug" } */
+#include "compare-debug-1.c"


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2024-03-09 Thread Jonathan Wakely
On Sat, 9 Mar 2024 at 12:18, Jonathan Wakely  wrote:

>
>
>
> +template
>> +  __wait_result_type
>> +  __wait_for(const __platform_wait_t* __addr, __wait_args __args,
>> +const chrono::duration<_Rep, _Period>& __rtime) noexcept
>> +{
>> +  if (!__rtime.count())
>> +   // no rtime supplied, just spin a bit
>> +   return __detail::__wait_impl(__addr, __args |
>> __wait_flags::__spin_only);
>>
>
> This should set __do_spin | __spin_only if the latter no longer implies
> the former.
>
>
>>
>> +enum class __wait_flags : uint32_t
>>  {
>>
>
>
>> +   __abi_version = 0,
>> +   __proxy_wait = 1,
>> +   __track_contention = 2,
>> +   __do_spin = 4,
>> +   __spin_only = 8 | __do_spin, // implies __do_spin
>>
>
> This should be just 8 and not also imply __do_spin.
>


Alternatively, we could have:

   __spin_only = 4,
   __spin_then_wait = 8,
   __do_spin = __spin_only | __spin_then_wait,

Then testing (flags & do_spin) would be true if either of the others is
set, but (flags & spin_only) would do the right thing.

But if we have spin_then_wait in the default flags, a caller that wants to
use spin_only has to clear the spin_then_wait flag, otherwise there are two
mutually exclusive flags set at once.

I think I prefer:

   __do_spin = 4,
   __spin_only = 8, // Ignored unless __do_spin is also set.

as this is simpler for callers.


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2024-03-09 Thread Jonathan Wakely
On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely  wrote:

> From: Thomas Rodgers 
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.
>
>
> -- >8 --
>
> This represents a major refactoring of the previous atomic::wait
> and atomic::notify implementation detail. The aim of this change
> is to simplify the implementation details and position the resulting
> implementation so that much of the current header-only detail
> can be moved into the shared library, while also accounting for
> anticipated changes to wait/notify functionality for C++26.
>
> The previous implementation implemented spin logic in terms of
> the types __default_spin_policy, __timed_backoff_spin_policy, and
> the free function __atomic_spin. These are replaced in favor of
> two new free functions; __spin_impl and __spin_until_impl. These
> currently inline free functions are expected to be moved into the
> libstdc++ shared library in a future commit.
>
> The previous implementation derived untimed and timed wait
> implementation detail from __detail::__waiter_pool_base. This
> is-a relationship is removed in the new version and the previous
> implementation detail is renamed to reflect this change. The
> static _S_for member has been renamed as well to indicate that it
> returns the __waiter_pool_impl entry in the static 'side table'
> for a given awaited address.
>
> This new implementation replaces all of the non-templated waiting
> detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
> __bare_wait with the __wait_impl free function, and the supporting
> __wait_flags enum and __wait_args struct. This currently inline free
> function is expected to be moved into the libstdc++ shared library
> in a future commit.
>
> This new implementation replaces all of the non-templated notifying
> detail of __waiter_base, __waiter_pool, and __waiter with the
> __notify_impl free function. This currently inline free function
> is expected to be moved into the libstdc++ shared library in a
> future commit.
>
> The __atomic_wait_address template function is updated to account
> for the above changes and to support the expected C++26 change to
> pass the most recent observed value to the caller supplied predicate.
>
> A new non-templated __atomic_wait_address_v free function is added
> that only works for atomic types that operate only on __platform_wait_t
> and requires the caller to supply a memory order. This is intended
> to be the simplest code path for such types.
>
> The __atomic_wait_address_v template function is now implemented in
> terms of new __atomic_wait_address template and continues to accept
> a user supplied "value function" to retrieve the current value of
> the atomic.
>
> The __atomic_notify_address template function is updated to account
> for the above changes.
>
> The template __platform_wait_until_impl is renamed to
> __wait_clock_t. The previous __platform_wait_until template is deleted
> and the functionality previously provided is moved to the new template
> function __wait_until. A similar change is made to the
> __cond_wait_until_impl/__cond_wait_until implementation.
>
> This new implementation similarly replaces all of the non-templated
> waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
> the new __wait_until_impl free function. This currently inline free
> function is expected to be moved into the libstdc++ shared library
> in a future commit.
>
> This implementation replaces all templated waiting functions that
> manage clock conversion as well as relative waiting (wait_for) with
> the new template functions __wait_until and __wait_for.
>
> Similarly the previous implementation detail for the various
> __atomic_wait_address_Xxx templates is adjusted to account for the
> implementation changes outlined above.
>
> All of the "bare wait" versions of __atomic_wait_Xxx have been removed
> and replaced with a defaulted boolean __bare_wait parameter on the
> new version of these templates.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_timed_wait.h:
> (__detail::__platform_wait_until_impl): Rename to
> __platform_wait_until.
> (__detail::__platform_wait_until): Remove previous
> definition.
> (__detail::__cond_wait_until_impl): Rename to
> __cond_wait_until.
> (__detail::__cond_wait_until): Remove previous
> definition.
> (__detail::__spin_until_impl): New function.
> (__detail::__wait_until_impl): New function.
> (__detail::__wait_until): New function.
> (__detail::__wait_for): New function.
> (__detail::__timed_waiter_pool): Remove type.
> 

Re: [PATCH] bitint: Avoid rewriting large/huge _BitInt vars into SSA after bitint lowering [PR114278]

2024-03-09 Thread Richard Biener



> On 09.03.2024 at 09:28, Jakub Jelinek wrote:
> 
> Hi!
> 
> The following testcase ICEs, because update-address-taken subpass of
> fre5 rewrites
>  _BitInt(128) b;
>  vector(16) unsigned char _3;
> 
>   [local count: 1073741824]:
>  _3 = MEM  [(char * {ref-all})p_2(D)];
>  MEM  [(char * {ref-all})] = _3;
>  b ={v} {CLOBBER(eos)};
> to
>  _BitInt(128) b;
>  vector(16) unsigned char _3;
> 
>   [local count: 1073741824]:
>  _3 = MEM  [(char * {ref-all})p_2(D)];
>  b_5 = VIEW_CONVERT_EXPR<_BitInt(128)>(_3);
> but we can't have large/huge _BitInt vars in SSA form after the bitint
> lowering except for function arguments loaded from memory, as expansion
> isn't able to deal with those, it relies on bitint lowering to lower
> those operations.
> The following patch fixes that by not clearing TREE_ADDRESSABLE for
> large/huge _BitInt vars after bitint lowering, such that we don't
> rewrite them into SSA form.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 

Ideally we'd clear TREE_ADDRESSABLE but set DECL_NOT_GIMPLE_REG; I think the
analysis where we check the base would be a more appropriate place to enforce
that.

Richard 

> 2024-03-09  Jakub Jelinek  
> 
>PR tree-optimization/114278
>* tree-ssa.cc (maybe_optimize_var): Punt on large/huge _BitInt
>vars after bitint lowering.
> 
>* gcc.dg/bitint-99.c: New test.
> 
> --- gcc/tree-ssa.cc.jj2024-01-03 11:51:39.902615009 +0100
> +++ gcc/tree-ssa.cc2024-03-08 14:24:11.844821915 +0100
> @@ -1753,7 +1753,11 @@ maybe_optimize_var (tree var, bitmap add
>   /* Global Variables, result decls cannot be changed.  */
>   if (is_global_var (var)
>   || TREE_CODE (var) == RESULT_DECL
> -  || bitmap_bit_p (addresses_taken, DECL_UID (var)))
> +  || bitmap_bit_p (addresses_taken, DECL_UID (var))
> +  || (TREE_CODE (TREE_TYPE (var)) == BITINT_TYPE
> +  /* Don't change large/huge _BitInt vars after _BitInt lowering.  */
> +  && (cfun->curr_properties & PROP_gimple_lbitint) != 0
> +  && TYPE_PRECISION (TREE_TYPE (var)) > MAX_FIXED_MODE_SIZE))
> return;
> 
>   bool maybe_reg = false;
> --- gcc/testsuite/gcc.dg/bitint-99.c.jj2024-03-08 14:26:17.658069942 +0100
> +++ gcc/testsuite/gcc.dg/bitint-99.c2024-03-08 14:25:36.292645965 +0100
> @@ -0,0 +1,26 @@
> +/* PR tree-optimization/114278 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2 -fno-tree-dce -fno-tree-dse -fno-tree-ccp" } */
> +/* { dg-additional-options "-mavx2" { target i?86-*-* x86_64-*-* } } */
> +
> +void
> +foo (void *p)
> +{
> +  _BitInt(64) b = *(_BitInt(64) *) __builtin_memmove (, p, sizeof 
> (_BitInt(64)));
> +}
> +
> +#if __BITINT_MAXWIDTH__ >= 128
> +void
> +bar (void *p)
> +{
> +  _BitInt(128) b = *(_BitInt(128) *) __builtin_memmove (, p, sizeof 
> (_BitInt(128)));
> +}
> +#endif
> +
> +#if __BITINT_MAXWIDTH__ >= 256
> +void
> +baz (void *p)
> +{
> +  _BitInt(256) b = *(_BitInt(256) *) __builtin_memmove (, p, sizeof 
> (_BitInt(256)));
> +}
> +#endif
> 
>Jakub
> 


Re: [PATCH] fwprop: Restore previous behavior for forward propagation of RTL with MEMs [PR114284]

2024-03-09 Thread Richard Biener



> On 09.03.2024 at 09:36, Jakub Jelinek wrote:
> 
> Hi!
> 
> Before the recent PR111267 r14-8319 fwprop changes, fwprop would never try
> to propagate what was not considered PROFITABLE, where the profitable part
> actually was partly about profitability, partly about very good reasons
> not to actually propagate and partly for cases where propagation is
> completely incorrect.
> In particular, classify_result has:
>  /* Allow (subreg (mem)) -> (mem) simplifications with the following
> exceptions:
> 1) Propagating (mem)s into multiple uses is not profitable.
> 2) Propagating (mem)s across EBBs may not be profitable if the source EBB
>runs less frequently.
> 3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
> 4) Creating new (mem/v)s is not correct, since DCE will not remove the old
>ones.  */
>  if (single_use_p
>  && single_ebb_p
>  && SUBREG_P (old_rtx)
>  && !paradoxical_subreg_p (old_rtx)
>  && MEM_P (new_rtx)
>  && !MEM_VOLATILE_P (new_rtx))
>return PROFITABLE;
> and didn't mark any other MEM_P (new_rtx) or rtxes which contain
> a MEM in its subrtxes as PROFITABLE.  Now, since r14-8319 profitable_p
> method has been renamed to likely_profitable_p and has just a minor role.
> Now, rule 4) above is something that isn't about profitability, but about
> correct behavior, if you propagate mem/v, the code is miscompiled.
> This particular case has been fixed elsewhere by Haochen in r14-9379.
> But I think even 1) and 2), and maybe 3), are a strong "don't do it": don't
> rely solely on rtx costs.  Increasing the number of loads of the same
> memory, even when cached, is undesirable, and canceling load hoisting can
> be undesirable as well.
> 
> So, the following patch restores the previous behavior if src contains any
> MEMs: in that case likely_profitable_p () is taken as the old profitable_p ()
> as a requirement rather than just a hint.  For propagation of something
> which doesn't load from memory this keeps the r14-8319 behavior.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-03-09  Jakub Jelinek  
> 
>PR target/114284
>* fwprop.cc (try_fwprop_subst_pattern): Don't propagate
>src containing MEMs unless prop.likely_profitable_p ().
> 
> --- gcc/fwprop.cc.jj2024-03-08 09:07:29.371626376 +0100
> +++ gcc/fwprop.cc2024-03-08 16:18:16.125853619 +0100
> @@ -451,6 +451,7 @@ try_fwprop_subst_pattern (obstack_waterm
> 
>   if (!prop.likely_profitable_p ()
>   && (prop.changed_mem_p ()
> +  || contains_mem_rtx_p (src)
>  || use_insn->is_asm ()
>  || !single_set (use_rtl)))
> {
> 
>Jakub
> 


RE: [PATCH v1] VECT: Bugfix ICE for vectorizable_store when both len and mask

2024-03-09 Thread Li, Pan2
Thanks Richard for comments.

> That said, the assert you run into should be only asserted during transform,
> not during analysis.
Good to learn that the assertion is only valid during transform. I guess we
have almost the same case in vectorizable_load. I will try allowing the
assertion only during transform, check for regressions, and send a v2.

> It possibly was before Robin's costing reorg?
Sorry, I am not sure which of Robin's commits that was.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, March 8, 2024 10:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Wang, 
Yanzhang ; rdapp@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v1] VECT: Bugfix ICE for vectorizable_store when both len 
and mask

On Fri, Mar 8, 2024 at 2:59 PM Richard Biener
 wrote:
>
> On Fri, Mar 8, 2024 at 1:04 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to fix one ICE in vectorizable_store for both the
> > loop_masks and loop_lens.  The ICE looks like below with "-march=rv64gcv 
> > -O3".
> >
> > during GIMPLE pass: vect
> > test.c: In function ‘d’:
> > test.c:6:6: internal compiler error: in vectorizable_store, at
> > tree-vect-stmts.cc:8691
> > 6 | void d() {
> >   |  ^
> > 0x37a6f2f vectorizable_store
> > .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
> > 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
> > _slp_tree*, _slp_instance*, vec*)
> > .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
> > 0x1db5dca vect_analyze_loop_operations
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
> > 0x1db885b vect_analyze_loop_2
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
> > 0x1dba029 vect_analyze_loop_1
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
> > 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
> > .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
> > 0x1e389d1 try_vectorize_loop_1
> > .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
> > 0x1e38f3d try_vectorize_loop
> > .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
> > 0x1e39230 execute
> > .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298
> >
> > Given that the masks and the lens cannot be enabled simultaneously when the
> > loop is using partial vectors, we need to ensure that one is disabled when
> > we record the other in check_load_store_for_partial_vectors.  For example,
> > when we try to record the loop len, we need to check whether the loop mask
> > is disabled.
>
> I don't think you can rely on LOOP_VINFO_FULLY_WITH_LENGTH_P during
> analysis.  Instead how we tried to set up things is that we never even try
> both and there is (was?) code to reject partial vector usage when we end
> up recording both lens and masks.

That is, a fix along what you do would have been to split
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P into
_WITH_LENGTH and _WITH_MASKS, make sure to record both when
a stmt can handle both so in the end we'll have a choice.  Currently
if we end up with both mask and len we don't know whether all stmts
support lens or masks or just some.

But we simply assumed on RISCV you'd never end up with unsupported len
but supported mask I guess.

Richard.

> That said, the assert you run into should be only asserted during transform,
> not during analysis.  It possibly was before Robin's costing reorg?
>
> Richard.
>
> > Below testsuites are passed for this patch:
> > * The x86 bootstrap tests.
> > * The x86 full regression tests.
> > * The aarch64 full regression tests.
> > * The riscv full regression tests.
> >
> > PR target/114195
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Add
> > loop mask/len check before recording as they are mutual exclusion.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/base/pr114195-1.c: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  .../gcc.target/riscv/rvv/base/pr114195-1.c| 15 +++
> >  gcc/tree-vect-stmts.cc| 26 ++-
> >  2 files changed, 35 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> > new file mode 100644
> > index 000..b0c9d5b81b8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
> > @@ -0,0 +1,15 @@
> > +/* Test that we do not ICE when compiling.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> > +
> > +long a, b;
> > +extern short c[];
> > +
> > +void d() {
> > +  for (int e = 0; e < 35; e += 2) {
> > +a = ({ a < 0 ? a : 0; });
> > +b = ({ b < 0 ? b : 0; });
> > +
> > +c[e] = 0;
> > +  }
> > +}
> > diff --git 
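
The invariant discussed in this thread can be sketched in a few lines of
plain C (hypothetical names and a deliberately simplified model, not the
actual GCC data structures): during analysis (cost_vec non-NULL) both
mechanisms may transiently appear enabled, but at transform time a loop
using partial vectors must be controlled by masks or by lens, never both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant loop_vinfo state.  */
struct loop_model
{
  bool uses_masks;
  bool uses_lens;
};

/* Record a loop len only if no loop mask is in use, mirroring the
   mutual-exclusion check discussed for
   check_load_store_for_partial_vectors.  */
static bool
try_record_len (struct loop_model *lm)
{
  if (lm->uses_masks)
    return false;   /* The other mechanism is already in use.  */
  lm->uses_lens = true;
  return true;
}

/* The assert in vectorizable_store is only valid during transform;
   during analysis (cost_vec != NULL) both flags may be set.  */
static void
check_phase_invariant (const struct loop_model *lm, const void *cost_vec)
{
  if (cost_vec == NULL)   /* Transform phase.  */
    assert (!(lm->uses_masks && lm->uses_lens));
}
```

This is only a model of the control flow; the real code works on
loop_vinfo flags and the stmt analysis/transform split in
tree-vect-stmts.cc.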

Re: [PATCH] testsuite: xfail test for arm

2024-03-09 Thread Andrew Pinski
On Sat, Mar 9, 2024 at 1:07 AM Torbjörn SVENSSON
 wrote:
>
> I don't know if this affects other targets than arm-none-eabi, so I
> used arm-*-*. If you think it should be *-*-* or some other target
> selector, please let me know what to use instead.
>
> Ok for releases/gcc-13?

Most likely this should be short_enums instead of arm*-*-* (I think the old
arm non-eabi didn't use short enums), since the fix
r14-6517-gb7e4a4c626e applies when -fshort-enums is used.
Also, if you are adding a dg-bogus to the branch, it might make sense
to do the same on trunk (obviously without the xfail part).
It also makes sense to add a reference to r14-6517-gb7e4a4c626e to the
dg-bogus in the source too.

Thanks,
Andrew Pinski

>
> --
>
> On arm-none-eabi, the test case fails with
> .../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: 
> converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct 
> connection' pointer (alignment 4) may result in an unaligned pointer value 
> [-Waddress-of-packed-member]
>
> The error was fixed in basepoints/gcc-14-6517-gb7e4a4c626e, but it
> was considered to be a too big change to be backported and thus, the
> failing test is marked xfail in GCC13.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
> Added dg-bogus with xfail on offending line for arm-*-*.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  .../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git 
> a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
>  
> b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
> index 2a9c715c32c..461d5f1199c 100644
> --- 
> a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
> +++ 
> b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
> @@ -60,7 +60,7 @@ static inline enum obj_type obj_type(const enum obj_type *t)
>  }
>  static inline struct connection *__objt_conn(enum obj_type *t)
>  {
> - return ((struct connection *)(((void *)(t)) - ((long)&((struct connection 
> *)0)->obj_type)));
> + return ((struct connection *)(((void *)(t)) - ((long)&((struct connection 
> *)0)->obj_type))); /* { dg-bogus "may result in an unaligned pointer value" 
> "" { xfail arm-*-* } } */
>  }
>  static inline struct connection *objt_conn(enum obj_type *t)
>  {
> --
> 2.25.1
>


[PATCH] [tree-prof] skip if errors were seen [PR113681]

2024-03-09 Thread Alexandre Oliva


ipa_tree_profile asserts that the symtab is in IPA_SSA state, but we
don't reach that state and ICE if e.g. ipa-strub passes report errors.
Skip this pass if errors were seen.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/113681
* tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
seen_errors.

for  gcc/testsuite/ChangeLog

PR tree-optimization/113681
* c-c++-common/strub-pr113681.c: New.
---
 gcc/testsuite/c-c++-common/strub-pr113681.c |   22 ++
 gcc/tree-profile.cc |3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/strub-pr113681.c

diff --git a/gcc/testsuite/c-c++-common/strub-pr113681.c 
b/gcc/testsuite/c-c++-common/strub-pr113681.c
new file mode 100644
index 0..3ef9017b2eb70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/strub-pr113681.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=relaxed -fbranch-probabilities" } */
+/* { dg-require-effective-target strub } */
+
+/* Same as torture/strub-inlineable1.c, but with -fbranch-probabilities, to
+   check that IPA tree-profiling won't ICE.  It would when we refrained from
+   running passes that would take it to IPA_SSA, but ran the pass that asserted
+   for IPA_SSA.  */
+
+inline void __attribute__ ((strub ("internal"), always_inline))
+inl_int_ali (void)
+{
+  /* No internal wrapper, so this body ALWAYS gets inlined,
+ but it cannot be called from non-strub contexts.  */
+}
+
+void
+bat (void)
+{
+  /* Not allowed, not a strub context.  */
+  inl_int_ali (); /* { dg-error "context" } */
+}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index aed13e2b1bc30..d2bdbe3a08c2f 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -999,7 +999,8 @@ pass_ipa_tree_profile::gate (function *)
  disabled.  */
   return (!in_lto_p && !flag_auto_profile
  && (flag_branch_probabilities || flag_test_coverage
- || profile_arc_flag));
+ || profile_arc_flag)
+ && !seen_error ());
 }
 
 } // anon namespace


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] [strub] improve handling of indirected volatile parms [PR112938]

2024-03-09 Thread Alexandre Oliva


The earlier patch for PR112938 arranged for volatile parms to be made
indirect in internal strub wrapped bodies.

The first problem that remained, the more evident one, was that the
indirected parameter remained volatile despite the indirection, but it
wasn't regimplified, so indirecting it yielded malformed gimple.

Regimplifying turned out not to be needed.  The best course of action
was to drop the volatility from the by-reference parm, that was being
unexpectedly inherited from the original volatile parm.

That exposed another problem: the dereferences would then lose their
volatile status, so we had to bring volatile back to them.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Drop volatility from
indirected parm.
(maybe_make_indirect): Restore volatility in dereferences.

for  gcc/testsuite/ChangeLog

PR middle-end/112938
* g++.dg/strub-internal-pr112938.cc: New.
---
 gcc/ipa-strub.cc|7 +++
 gcc/testsuite/g++.dg/strub-internal-pr112938.cc |   12 
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/strub-internal-pr112938.cc

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
index dff94222351ad..8fa7bdf530023 100644
--- a/gcc/ipa-strub.cc
+++ b/gcc/ipa-strub.cc
@@ -1940,6 +1940,9 @@ maybe_make_indirect (indirect_parms_t &indirect_parms, tree op, int *rec)
  TREE_TYPE (TREE_TYPE (op)),
  op,
  build_int_cst (TREE_TYPE (op), 0));
+ if (TYPE_VOLATILE (TREE_TYPE (TREE_TYPE (op)))
+ && !TREE_THIS_VOLATILE (ret))
+   TREE_SIDE_EFFECTS (ret) = TREE_THIS_VOLATILE (ret) = 1;
  return ret;
}
 }
@@ -2894,6 +2897,10 @@ pass_ipa_strub::execute (function *)
 probably drop the TREE_ADDRESSABLE and keep the TRUE.  */
  tree ref_type = build_ref_type_for (nparm);
 
+ if (TREE_THIS_VOLATILE (nparm)
+ && TYPE_VOLATILE (TREE_TYPE (nparm))
+ && !TYPE_VOLATILE (ref_type))
+   TREE_SIDE_EFFECTS (nparm) = TREE_THIS_VOLATILE (nparm) = 0;
  DECL_ARG_TYPE (nparm) = TREE_TYPE (nparm) = ref_type;
  relayout_decl (nparm);
  TREE_ADDRESSABLE (nparm) = 0;
diff --git a/gcc/testsuite/g++.dg/strub-internal-pr112938.cc 
b/gcc/testsuite/g++.dg/strub-internal-pr112938.cc
new file mode 100644
index 0..5a74becc2697e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/strub-internal-pr112938.cc
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-optimized -O2" } */
+/* { dg-require-effective-target strub } */
+
+bool __attribute__ ((__strub__ ("internal")))
+f(bool i, volatile bool j)
+{
+  return (i ^ j) == j;
+}
+
+/* Check for two dereferences of the indirected volatile j parm.  */
+/* { dg-final { scan-tree-dump-times {={v} \*j_[0-9][0-9]*(D)} 2 "optimized" } } */
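
At the source level, the transformation described above roughly
corresponds to the following hand-written illustration (this is not
compiler output, and the names are made up): the wrapped body receives an
ordinary, non-volatile pointer, but it points to volatile storage, so each
dereference is still a volatile access and the two loads of j survive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hand-written analogue of the internal strub wrapper for
   f(bool i, volatile bool j): the parameter itself is a plain pointer
   (volatility dropped from the parm), while the pointed-to type stays
   volatile, so every dereference remains a volatile load.  */
static bool
f_wrapped_body (bool i, volatile bool *j)
{
  return (i ^ *j) == *j;   /* Two volatile loads of *j.  */
}

/* The wrapper spills the by-value argument and passes its address.  */
static bool
f (bool i, bool j)
{
  volatile bool j_copy = j;
  return f_wrapped_body (i, &j_copy);
}
```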

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] testsuite: xfail test for arm

2024-03-09 Thread Torbjörn SVENSSON
I don't know if this affects other targets than arm-none-eabi, so I
used arm-*-*. If you think it should be *-*-* or some other target
selector, please let me know what to use instead.

Ok for releases/gcc-13?

--

On arm-none-eabi, the test case fails with
.../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: 
converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct 
connection' pointer (alignment 4) may result in an unaligned pointer value 
[-Waddress-of-packed-member]

The error was fixed in basepoints/gcc-14-6517-gb7e4a4c626e, but it
was considered to be a too big change to be backported and thus, the
failing test is marked xfail in GCC13.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Added dg-bogus with xfail on offending line for arm-*-*.

Signed-off-by: Torbjörn SVENSSON 
---
 .../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
 
b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
index 2a9c715c32c..461d5f1199c 100644
--- 
a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
+++ 
b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
@@ -60,7 +60,7 @@ static inline enum obj_type obj_type(const enum obj_type *t)
 }
 static inline struct connection *__objt_conn(enum obj_type *t)
 {
- return ((struct connection *)(((void *)(t)) - ((long)&((struct connection 
*)0)->obj_type)));
+ return ((struct connection *)(((void *)(t)) - ((long)&((struct connection 
*)0)->obj_type))); /* { dg-bogus "may result in an unaligned pointer value" "" 
{ xfail arm-*-* } } */
 }
 static inline struct connection *objt_conn(enum obj_type *t)
 {
-- 
2.25.1



[patch,avr,applied] Add some more cost computation

2024-03-09 Thread Georg-Johann Lay

This adds cost computation for some insn combiner patterns
and improves a few other nits.

Johann

--

AVR: Add cost computation for some insn combine patterns.

gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Determine cost for
usum_widenqihi and add_zero_extend1.
[MINUS]: Determine costs for udiff_widenqihi, sub+zero_extend,
sub+sign_extend.
* config/avr/avr.md (*addhi3.sign_extend1, *subhi3.sign_extend2):
Compute exact insn lengths.
(*usum_widenqihi3): Allow input operands to commute.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 1fa4b557f5d..00fce8da15f 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -12524,10 +12524,25 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
 	  return true;
 	}
 
+  // *usum_widenqihi
+  if (mode == HImode
+	  && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND)
+	{
+	  *total = COSTS_N_INSNS (3);
+	  return true;
+	}
+
   if (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
 	  && REG_P (XEXP (x, 1)))
 	{
-	  *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) - 1);
+	  *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+	  return true;
+	}
+  if (REG_P (XEXP (x, 0))
+	  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND)
+	{
+	  *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
 	  return true;
 	}
 
@@ -12610,6 +12625,29 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
   return true;
 
 case MINUS:
+  // *udiff_widenqihi
+  if (mode == HImode
+	  && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND)
+	{
+	  *total = COSTS_N_INSNS (2);
+	  return true;
+	}
+  // *sub3_zero_extend1
+  if (REG_P (XEXP (x, 0))
+	  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND)
+	{
+	  *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+	  return true;
+	}
+  // *sub3.sign_extend2
+  if (REG_P (XEXP (x, 0))
+	  && GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
+	{
+	  *total = COSTS_N_INSNS (2 + GET_MODE_SIZE (mode));
+	  return true;
+	}
+
   if (AVR_HAVE_MUL
 	  && QImode == mode
 	  && register_operand (XEXP (x, 0), QImode)
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 52b6cff4a8b..59ec724f7da 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -1588,12 +1588,10 @@ (define_insn_and_split "*addhi3.sign_extend1_split"
   ""
   "#"
   "&& reload_completed"
-  [(parallel
-  [(set (match_dup 0)
-(plus:HI
-  (sign_extend:HI (match_dup 1))
-  (match_dup 2)))
-   (clobber (reg:CC REG_CC))])])
+  [(parallel [(set (match_dup 0)
+   (plus:HI (sign_extend:HI (match_dup 1))
+(match_dup 2)))
+  (clobber (reg:CC REG_CC))])])
 
 
 (define_insn "*addhi3.sign_extend1"
@@ -1607,7 +1605,8 @@ (define_insn "*addhi3.sign_extend1"
   ? "mov __tmp_reg__,%1\;add %A0,%1\;adc %B0,__zero_reg__\;sbrc __tmp_reg__,7\;dec %B0"
   : "add %A0,%1\;adc %B0,__zero_reg__\;sbrc %1,7\;dec %B0";
   }
-  [(set_attr "length" "5")])
+  [(set (attr "length")
+(symbol_ref ("4 + reg_overlap_mentioned_p (operands[0], operands[1])")))])
 
 (define_insn_and_split "*addhi3_zero_extend.const_split"
   [(set (match_operand:HI 0 "register_operand" "=d")
@@ -1665,7 +1664,7 @@ (define_insn "*addhi3_zero_extend.ashift1"
 
 (define_insn_and_split "*usum_widenqihi3_split"
   [(set (match_operand:HI 0 "register_operand"  "=r")
-(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))
+(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "%0"))
 (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"))))]
   ""
   "#"
@@ -1678,7 +1677,7 @@ (define_insn_and_split "*usum_widenqihi3_split"
 
 (define_insn "*usum_widenqihi3"
   [(set (match_operand:HI 0 "register_operand"  "=r")
-(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))
+(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "%0"))
  (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"
(clobber (reg:CC REG_CC))]
   "reload_completed"
@@ -2186,7 +2185,8 @@ (define_insn "*subhi3.sign_extend2"
   ? "mov __tmp_reg__,%2\;sub %A0,%2\;sbc %B0,__zero_reg__\;sbrc __tmp_reg__,7\;inc %B0"
   : "sub %A0,%2\;sbc %B0,__zero_reg__\;sbrc %2,7\;inc %B0";
   }
-  [(set_attr "length" "5")])
+  [(set (attr "length")
+(symbol_ref ("4 + reg_overlap_mentioned_p (operands[0], operands[2])")))])
 
 ;; "subsi3"
 ;; "subsq3" "subusq3"


[committed] i386: Regenerate i386.opt.urls

2024-03-09 Thread Jakub Jelinek
Hi!

When I've added the -mnoreturn-no-callee-saved-registers option
to i386.opt, I forgot to regenerate i386.opt.urls and Mark's
CI kindly reminded me of that.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
verified
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mnoreturn-no-callee-saved-registers
works, committed to trunk as obvious.

2024-03-09  Jakub Jelinek  

* config/i386/i386.opt.urls: Regenerate.

--- gcc/config/i386/i386.opt.urls.jj2024-03-04 19:20:27.060201697 +0100
+++ gcc/config/i386/i386.opt.urls   2024-03-08 18:32:20.527162487 +0100
@@ -184,6 +184,9 @@ UrlSuffix(gcc/x86-Options.html#index-mmo
 mstore-max=
 UrlSuffix(gcc/x86-Options.html#index-mstore-max)
 
+mnoreturn-no-callee-saved-registers
+UrlSuffix(gcc/x86-Options.html#index-mnoreturn-no-callee-saved-registers)
+
 m32
 UrlSuffix(gcc/x86-Options.html#index-m32-2)
 

Jakub



[PATCH] fwprop: Restore previous behavior for forward propagation of RTL with MEMs [PR114284]

2024-03-09 Thread Jakub Jelinek
Hi!

Before the recent PR111267 r14-8319 fwprop changes, fwprop would never try
to propagate what was not considered PROFITABLE, where the profitable part
actually was partly about profitability, partly about very good reasons
not to actually propagate and partly for cases where propagation is
completely incorrect.
In particular, classify_result has:
  /* Allow (subreg (mem)) -> (mem) simplifications with the following
 exceptions:
 1) Propagating (mem)s into multiple uses is not profitable.
 2) Propagating (mem)s across EBBs may not be profitable if the source EBB
runs less frequently.
 3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
 4) Creating new (mem/v)s is not correct, since DCE will not remove the old
ones.  */
  if (single_use_p
  && single_ebb_p
  && SUBREG_P (old_rtx)
  && !paradoxical_subreg_p (old_rtx)
  && MEM_P (new_rtx)
  && !MEM_VOLATILE_P (new_rtx))
return PROFITABLE;
and didn't mark as PROFITABLE any other MEM_P (new_rtx), or rtxes which
contain a MEM in their subrtxes.  Since r14-8319, the profitable_p
method has been renamed to likely_profitable_p and has just a minor role.
Now, rule 4) above is something that isn't about profitability, but about
correct behavior, if you propagate mem/v, the code is miscompiled.
This particular case has been fixed elsewhere by Haochen in r14-9379.
But I think even the 1) and 2) and maybe 3) are a strong don't do it,
don't rely solely on rtx costs, increasing the number of loads of the same
memory, even when cached, is undesirable, canceling load hoisting can
be undesirable as well.

So, the following patch restores the previous behavior when src contains any
MEMs: in that case likely_profitable_p () is taken as a requirement rather
than just a hint, as the old profitable_p () was.  For propagation of
something which doesn't load from memory this keeps the r14-8319 behavior.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-09  Jakub Jelinek  

PR target/114284
* fwprop.cc (try_fwprop_subst_pattern): Don't propagate
src containing MEMs unless prop.likely_profitable_p ().

--- gcc/fwprop.cc.jj2024-03-08 09:07:29.371626376 +0100
+++ gcc/fwprop.cc   2024-03-08 16:18:16.125853619 +0100
@@ -451,6 +451,7 @@ try_fwprop_subst_pattern (obstack_waterm
 
   if (!prop.likely_profitable_p ()
   && (prop.changed_mem_p ()
+ || contains_mem_rtx_p (src)
  || use_insn->is_asm ()
  || !single_set (use_rtl)))
 {
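
The effect of the added condition can be modeled with a small boolean
predicate (hypothetical names; a sketch of the guard's logic, not the real
try_fwprop_subst_pattern): when the source contains a MEM,
likely_profitable_p becomes a hard requirement instead of a hint.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the rejection guard after the patch: propagation is
   rejected when it is not likely profitable AND any of the hard
   conditions hold, now including "src contains a MEM".  */
static bool
would_reject (bool likely_profitable, bool changed_mem_p,
	      bool src_contains_mem, bool is_asm, bool is_single_set)
{
  return !likely_profitable
	 && (changed_mem_p
	     || src_contains_mem   /* New condition from the patch.  */
	     || is_asm
	     || !is_single_set);
}
```

A MEM-containing src that is not likely profitable is now rejected, while
non-MEM propagation keeps the r14-8319 behavior.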

Jakub



[PATCH] bitint: Avoid rewriting large/huge _BitInt vars into SSA after bitint lowering [PR114278]

2024-03-09 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because update-address-taken subpass of
fre5 rewrites
  _BitInt(128) b;
  vector(16) unsigned char _3;

  <bb 2> [local count: 1073741824]:
  _3 = MEM <vector(16) unsigned char> [(char * {ref-all})p_2(D)];
  MEM <vector(16) unsigned char> [(char * {ref-all})&b] = _3;
  b ={v} {CLOBBER(eos)};
to
  _BitInt(128) b;
  vector(16) unsigned char _3;

  <bb 2> [local count: 1073741824]:
  _3 = MEM <vector(16) unsigned char> [(char * {ref-all})p_2(D)];
  b_5 = VIEW_CONVERT_EXPR<_BitInt(128)>(_3);
but we can't have large/huge _BitInt vars in SSA form after the bitint
lowering except for function arguments loaded from memory, as expansion
isn't able to deal with those, it relies on bitint lowering to lower
those operations.
The following patch fixes that by not clearing TREE_ADDRESSABLE for
large/huge _BitInt vars after bitint lowering, such that we don't
rewrite them into SSA form.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-09  Jakub Jelinek  

PR tree-optimization/114278
* tree-ssa.cc (maybe_optimize_var): Punt on large/huge _BitInt
vars after bitint lowering.

* gcc.dg/bitint-99.c: New test.

--- gcc/tree-ssa.cc.jj  2024-01-03 11:51:39.902615009 +0100
+++ gcc/tree-ssa.cc 2024-03-08 14:24:11.844821915 +0100
@@ -1753,7 +1753,11 @@ maybe_optimize_var (tree var, bitmap add
   /* Global Variables, result decls cannot be changed.  */
   if (is_global_var (var)
   || TREE_CODE (var) == RESULT_DECL
-  || bitmap_bit_p (addresses_taken, DECL_UID (var)))
+  || bitmap_bit_p (addresses_taken, DECL_UID (var))
+  || (TREE_CODE (TREE_TYPE (var)) == BITINT_TYPE
+ /* Don't change large/huge _BitInt vars after _BitInt lowering.  */
+ && (cfun->curr_properties & PROP_gimple_lbitint) != 0
+ && TYPE_PRECISION (TREE_TYPE (var)) > MAX_FIXED_MODE_SIZE))
 return;
 
   bool maybe_reg = false;
--- gcc/testsuite/gcc.dg/bitint-99.c.jj 2024-03-08 14:26:17.658069942 +0100
+++ gcc/testsuite/gcc.dg/bitint-99.c2024-03-08 14:25:36.292645965 +0100
@@ -0,0 +1,26 @@
+/* PR tree-optimization/114278 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2 -fno-tree-dce -fno-tree-dse -fno-tree-ccp" } */
+/* { dg-additional-options "-mavx2" { target i?86-*-* x86_64-*-* } } */
+
+void
+foo (void *p)
+{
+  _BitInt(64) b = *(_BitInt(64) *) __builtin_memmove (&b, p, sizeof (_BitInt(64)));
+}
+
+#if __BITINT_MAXWIDTH__ >= 128
+void
+bar (void *p)
+{
+  _BitInt(128) b = *(_BitInt(128) *) __builtin_memmove (&b, p, sizeof (_BitInt(128)));
+}
+#endif
+
+#if __BITINT_MAXWIDTH__ >= 256
+void
+baz (void *p)
+{
+  _BitInt(256) b = *(_BitInt(256) *) __builtin_memmove (&b, p, sizeof (_BitInt(256)));
+}
+#endif
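
The new bail-out in maybe_optimize_var amounts to one extra predicate,
which can be sketched as follows (hypothetical names; the 64 stands in for
the target's MAX_FIXED_MODE_SIZE purely for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the target's MAX_FIXED_MODE_SIZE.  */
#define MODEL_MAX_FIXED_MODE_SIZE 64

/* Sketch of the new check: after bitint lowering, a large/huge _BitInt
   variable must stay addressable and must not be rewritten into SSA
   form, because expansion cannot lower operations on it anymore.  */
static bool
keep_out_of_ssa (bool is_bitint, bool after_bitint_lowering, int precision)
{
  return is_bitint
	 && after_bitint_lowering
	 && precision > MODEL_MAX_FIXED_MODE_SIZE;
}
```

Small _BitInts, or large ones before lowering, remain eligible for the
usual update-address-taken rewrite.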

Jakub