cpymem for RISCV with v extension

2023-07-17 Thread Joern Rennecke
As discussed on last week's patch call, this patch uses either a
straight copy or an opaque pattern that emits the loop as assembly to
optimize cpymem for the 'v' extension.
I used Ju-Zhe Zhong's patch - starting in git with:

Author: zhongjuzhe <66454988+zhongju...@users.noreply.github.com>
Date:   Mon Mar 21 14:20:42 2022 +0800

  PR for RVV support using splitted small chunks (#334)

as a starting point, even though not all that much of the original code remains.

Regression tested on x86_64-pc-linux-gnu X
riscv-sim

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f

riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
2023-07-12  Ju-Zhe Zhong 
Joern Rennecke  

* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
Declare.
* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
New function.
* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
* config/riscv/vector.md (@cpymem_straight):
New define_insn patterns.
(@cpymem_loop): Likewise.
(@cpymem_loop_fast): Likewise.

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 16fb8dabca0..40965a00681 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -301,6 +301,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (rtx *);
+bool expand_block_move (rtx, rtx, rtx);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b4884a30872..e61110fa3ad 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "predict.h"
 
 using namespace riscv_vector;
 
@@ -2164,6 +2165,191 @@ expand_tuple_move (rtx *ops)
 }
 }
 
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dest_in, rtx src_in, rtx length_in)
+{
+  /*
+memcpy:
+   mv a3, a0   # Copy destination
+loop:
+   vsetvli t0, a2, e8, m8, ta, ma  # Vectors of 8b
+   vle8.v v0, (a1) # Load bytes
+   add a1, a1, t0  # Bump pointer
+   sub a2, a2, t0  # Decrement count
+   vse8.v v0, (a3) # Store bytes
+   add a3, a3, t0  # Bump pointer
+   bnez a2, loop   # Any more?
+   ret # Return
+  */
+  if (!TARGET_VECTOR)
+return false;
+  HOST_WIDE_INT potential_ew
+= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dest_in)), BITS_PER_WORD)
+   / BITS_PER_UNIT);
+  machine_mode vmode = VOIDmode;
+  bool need_loop = true;
+  bool size_p = optimize_function_for_size_p (cfun);
+  rtx src, dst;
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec;
+  rtx length_rtx = length_in;
+
+  if (CONST_INT_P (length_in))
+{
+  HOST_WIDE_INT length = INTVAL (length_in);
+
+/* By using LMUL=8, we can copy as many bytes in one go as there
+   are bits in a vector register.  If the entire block thus fits,
+   we don't need a loop.  */
+if (length <= TARGET_MIN_VLEN)
+  {
+   need_loop = false;
+
+   /* If a single scalar load / store pair can do the job, leave it
+  to the scalar code to do that.  */
+
+   if (pow2p_hwi (length) && length <= potential_ew)
+ return false;
+  }
+
+  /* Find the vector mode to use.  Using the largest possible element
+size is likely to give smaller constants, thus potentially
+reducing code size.  However, if we need a loop, we need to update
+the pointers, and that is more complicated with a larger element
+size, unless we use an immediate, which prevents us from dynamically
+using the largest transfer size that the hart supports.  And then,
+unless we know the *exact* vector size of the hart, we'd need
+multiple 

Re: [PATCH v3] Introduce attribute reverse_alias

2023-07-17 Thread Alexandre Oliva via Gcc-patches
Hello, Nathan,

On Jul 15, 2023, Nathan Sidwell  wrote:

> Not commenting on the semantics, but the name seems unfortunate (hello
> bikeshed).

Yeah, it's a bit challenging to express the concept, when the notion of
"alias" is kind of symmetric between decl and target, but the
previously-implemented extension attaches it to decl rather than to
target.  I tried "extra alias" before, but that didn't fly either.

Maybe I should give up and just recommend the use of asm ("name")
instead of allowing alternative names (AKA aliases, in the dictionary
sense; oh, the irony) to be introduced for a decl?  Maybe that would be
simpler and enough to sidestep the problem of varying mangled names when
trying to import into Ada (or defining C++ aliases for) C++ symbols that
use standard types in signatures that are not fundamental types, such as
size_t.  That they mangle differently depending on what size_t is
typedef'ed to makes for some undesirable inconvenience, which this
attribute attempts to alleviate.

> The documentation starts with 'attribute causes @var{name}
> to be emitted as an alias to the definition'.  So not emitting a
> 'reverse alias', whatever that might be.

It's reverse in that it doesn't alias another declaration, as in the
preexisting meaning of the alias attribute, it adds an alias for the
present declaration.

> It doesn't seem to mention how reverse alias differs from 'alias'.
> Why would 'alias' not DTRT?

contrast:

int foo();
int __attribute__ ((alias ("foo"))) bar();

static_assert (&foo == &bar); // ok

with:

int __attribute__ ((reverse_alias ("bar"))) foo();

static_assert (&foo == &bar); // error, bar is not a C++ symbol

int __attribute__ ((alias ("bar"))) baz(); // ok

static_assert (&foo == &baz); // ok

asm (".quad bar"); // ok, even in other TUs
asm (".quad foo"); // not necessarily ok, foo's symbol may be mangled
asm (".quad baz"); // not necessarily ok, baz's symbol may be mangled

> Is it emitting an additional symbol -- ie, something like 'altname'.

Yup.  Is there precedent for this attribute name elsewhere?  I think it
could work.

> Is that symbol known in the current TU, or other TUs?

Only in the assembly/linker name space, not in any C++ namespace.

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Committed: Tighten regexps in gcc.target/riscv/_Float16-zhinx-1.c .

2023-07-17 Thread Joern Rennecke
Committed as obvious.
commit 6bab2772dbc42ce7a1b29b03ae84e6e434e23c4e
Author: Joern Rennecke 
Date:   Tue Jul 18 04:28:55 2023 +0100

Tighten regexps in gcc.target/riscv/_Float16-zhinx-1.c .

The original "mv" regexp would match
.ascii  "\254\254\375\002e2N6\013\231,\354NDmvVP0]\304\312F!biZ\025\211"
in the .gnu.lto_foo1.0.32528183c9deec41 section.

gcc/testsuite/
* gcc.target/riscv/_Float16-zhinx-1.c: Tighten regexps.

diff --git a/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c 
b/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c
index 90172b57e05..67826171bfb 100644
--- a/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c
+++ b/gcc/testsuite/gcc.target/riscv/_Float16-zhinx-1.c
@@ -6,5 +6,5 @@ _Float16 foo1 (_Float16 a, _Float16 b)
 return b;
 }
 
-/* { dg-final { scan-assembler-not "fmv.h" } } */
-/* { dg-final { scan-assembler-times "mv" 1 } } */
+/* { dg-final { scan-assembler-not {\mfmv\.h\M} } } */
+/* { dg-final { scan-assembler-times {\mmv\M} 1 } } */


Re: Fw: [PATCH V2] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread Lehua Ding
Committed to the trunk, thanks Richard and Juzhe.


1. Bootstrap and regression tests pass on the i386 target (by Pan).
2. No new failed test cases on the AArch64 target.


Best,
Lehua



RE: [PATCH v2] RISC-V: Fix RVV frm run test failure on RV32

2023-07-17 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Tuesday, July 18, 2023 10:53 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Fix RVV frm run test failure on RV32

LGTM.


juzhe.zh...@rivai.ai





Re: [PATCH v2] RISC-V: Fix RVV frm run test failure on RV32

2023-07-17 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 


RE: [PATCH v1] RISC-V: Fix RVV frm run test failure on RV32

2023-07-17 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe, addressed conflict and passed RV32/RV64 tests with below PATCH v2.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624723.html

Pan

From: juzhe.zh...@rivai.ai 
Sent: Tuesday, July 18, 2023 9:09 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Fix RVV frm run test failure on RV32

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-07-14 21:20
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Fix RVV frm run test failure on RV32
From: Pan Li <pan2...@intel.com>

Refine the run test case to avoid interactive checking in RV32, by
separating each check into a different function.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: Fix failure
on RV32.
---
.../riscv/rvv/base/float-point-frm-run-1.c| 58 ++-
1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
index 210c49c5e8d..1d90b4f50d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
@@ -5,6 +5,24 @@
#include 
#include 
+#define DEFINE_TEST_FRM_FUNC(FRM) \
+vfloat32m1_t __attribute__ ((noinline)) \
+test_float_point_frm_run_##FRM (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) \
+{ \
+  vfloat32m1_t result; \
+ \
+  set_frm (0); \
+ \
+  result = __riscv_vfadd_vv_f32m1_rm (op1, result, FRM, vl); \
+ \
+  assert_equal (FRM, get_frm (), "The value of frm should be " #FRM "."); \
+ \
+  return result; \
+}
+
+#define CALL_TEST_FUNC(FRM, op1, op2, vl) \
+  test_float_point_frm_run_##FRM (op1, op2, vl)
+
static int
get_frm ()
{
@@ -31,40 +49,22 @@ set_frm (int frm)
   );
}
-static inline void
+void __attribute__ ((noinline)) \
assert_equal (int a, int b, char *message)
{
   if (a != b)
 {
-  printf (message);
+  fprintf (stdout, message);
+  fflush (stdout);
   __builtin_abort ();
 }
}
-vfloat32m1_t __attribute__ ((noinline))
-test_float_point_frm_run (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
-{
-  set_frm (0);
-
-  vfloat32m1_t result;
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
-  assert_equal (1, get_frm (), "The value of frm register should be 1.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 2, vl);
-  assert_equal (2, get_frm (), "The value of frm register should be 2.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
-  assert_equal (3, get_frm (), "The value of frm register should be 3.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 4, vl);
-  assert_equal (4, get_frm (), "The value of frm register should be 4.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 0, vl);
-  assert_equal (0, get_frm (), "The value of frm register should be 0.");
-
-  return result;
-}
+DEFINE_TEST_FRM_FUNC (0)
+DEFINE_TEST_FRM_FUNC (1)
+DEFINE_TEST_FRM_FUNC (2)
+DEFINE_TEST_FRM_FUNC (3)
+DEFINE_TEST_FRM_FUNC (4)
int
main ()
@@ -73,7 +73,11 @@ main ()
   vfloat32m1_t op1;
   vfloat32m1_t op2;
-  test_float_point_frm_run (op1, op2, vl);
+  CALL_TEST_FUNC (0, op1, op2, vl);
+  CALL_TEST_FUNC (1, op1, op2, vl);
+  CALL_TEST_FUNC (2, op1, op2, vl);
+  CALL_TEST_FUNC (3, op1, op2, vl);
+  CALL_TEST_FUNC (4, op1, op2, vl);
   return 0;
}
--
2.34.1




[PATCH v2] RISC-V: Fix RVV frm run test failure on RV32

2023-07-17 Thread Pan Li via Gcc-patches
From: Pan Li 

Refine the run test case to avoid interactive checking in RV32, by
separating each check into a different function.

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: Fix run failure.
---
 .../riscv/rvv/base/float-point-frm-run-1.c| 59 +++
 1 file changed, 36 insertions(+), 23 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
index 245ce7d1fc0..1b2789a924b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
@@ -5,6 +5,24 @@
 #include 
 #include 
 
+#define DEFINE_TEST_FUNC(FRM) \
+vfloat32m1_t __attribute__ ((noinline)) \
+test_float_point_frm_run_##FRM (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) \
+{ \
+  vfloat32m1_t result; \
+ \
+  set_frm (0); \
+ \
+  result = __riscv_vfadd_vv_f32m1_rm (op1, result, FRM, vl); \
+ \
+  assert_equal (FRM, get_frm (), "The value of frm should be " #FRM "."); \
+ \
+  return result; \
+}
+
+#define RUN_TEST_FUNC(FRM, op1, op2, vl) \
+  test_float_point_frm_run_##FRM (op1, op2, vl)
+
 static int
 get_frm ()
 {
@@ -41,28 +59,11 @@ assert_equal (int a, int b, char *message)
 }
 }
 
-vfloat32m1_t __attribute__ ((noinline))
-test_float_point_frm_run (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
-{
-  vfloat32m1_t result;
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
-  assert_equal (1, get_frm (), "The value of frm register should be 1.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 2, vl);
-  assert_equal (2, get_frm (), "The value of frm register should be 2.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
-  assert_equal (3, get_frm (), "The value of frm register should be 3.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 4, vl);
-  assert_equal (4, get_frm (), "The value of frm register should be 4.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 0, vl);
-  assert_equal (0, get_frm (), "The value of frm register should be 0.");
-
-  return result;
-}
+DEFINE_TEST_FUNC (0)
+DEFINE_TEST_FUNC (1)
+DEFINE_TEST_FUNC (2)
+DEFINE_TEST_FUNC (3)
+DEFINE_TEST_FUNC (4)
 
 int
 main ()
@@ -72,8 +73,20 @@ main ()
   vfloat32m1_t op2;
 
   set_frm (4);
-  test_float_point_frm_run (op1, op2, vl);
 
+  RUN_TEST_FUNC (0, op1, op2, vl);
+  assert_equal (4, get_frm (), "The value of frm register should be 4.");
+
+  RUN_TEST_FUNC (1, op1, op2, vl);
+  assert_equal (4, get_frm (), "The value of frm register should be 4.");
+
+  RUN_TEST_FUNC (2, op1, op2, vl);
+  assert_equal (4, get_frm (), "The value of frm register should be 4.");
+
+  RUN_TEST_FUNC (3, op1, op2, vl);
+  assert_equal (4, get_frm (), "The value of frm register should be 4.");
+
+  RUN_TEST_FUNC (4, op1, op2, vl);
   assert_equal (4, get_frm (), "The value of frm register should be 4.");
 
   return 0;
-- 
2.34.1



RE: [PATCH v1] RISC-V: Support basic floating-point dynamic rounding mode

2023-07-17 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff and Juzhe.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, July 18, 2023 12:59 AM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Wang, Yanzhang ; kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support basic floating-point dynamic rounding 
mode



On 7/16/23 19:02, juzhe.zh...@rivai.ai wrote:
> LGTM
And as of today, that's all we need ;-)

Thanks,

Jeff


Re: [PATCH v1] RISC-V: Fix RVV frm run test failure on RV32

2023-07-17 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
 
 


[PATCH] RISC-V: Enable SLP un-order reduction

2023-07-17 Thread Juzhe-Zhong
This patch enables SLP un-ordered reduction auto-vectorization.

Consider the following case:

int __attribute__((noipa))
add_loop (int *x, int n, int res)
{
  for (int i = 0; i < n; ++i)
{
  res += x[i * 2];
  res += x[i * 2 + 1];
}
  return res;
}

--param riscv-autovec-preference=scalable -fopt-info-vec-missed:
:4:21: missed: couldn't vectorize loop
:4:21: missed: unsupported SLP instances

After this patch:

add_loop:
        ble     a1,zero,.L5
        csrr    a6,vlenb
        srli    a4,a6,2
        slli    a1,a1,1
        neg     a7,a4
        vsetvli t1,zero,e32,m1,ta,ma
        vmv.v.i v2,0
        vslide1up.vx v1,v2,a2   ---> generated by VEC_SHL_INSERT
.L4:
        mv      a3,a1
        mv      a5,a1
        bleu    a1,a4,.L3
        mv      a5,a4
.L3:
        vsetvli zero,a5,e32,m1,tu,ma
        add     a1,a1,a7
        vle32.v v2,0(a0)
        add     a0,a0,a6
        vadd.vv v1,v1,v2
        bgtu    a3,a4,.L4
        vsetivli zero,1,e32,m1,ta,ma
        vmv.v.i v2,0
        vsetvli t1,zero,e32,m1,ta,ma
        vredsum.vs v1,v1,v2
        vmv.x.s a0,v1
        ret
.L5:
        mv      a0,a2
        ret



gcc/ChangeLog:

* config/riscv/autovec.md (vec_shl_insert_<mode>): New patterns.
* config/riscv/riscv-v.cc (shuffle_compress_patterns): Fix bugs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-9.c: New test.

---
 gcc/config/riscv/autovec.md   |  32 +++
 gcc/config/riscv/riscv-v.cc   |   4 +
 .../riscv/rvv/autovec/reduc/reduc-5.c |  88 
 .../riscv/rvv/autovec/reduc/reduc-6.c |   6 +
 .../riscv/rvv/autovec/reduc/reduc-7.c |  88 
 .../riscv/rvv/autovec/reduc/reduc-8.c |  16 ++
 .../riscv/rvv/autovec/reduc/reduc-9.c |  16 ++
 .../riscv/rvv/autovec/reduc/reduc_run-5.c |  61 ++
 .../riscv/rvv/autovec/reduc/reduc_run-6.c |   6 +
 .../riscv/rvv/autovec/reduc/reduc_run-7.c | 188 ++
 .../riscv/rvv/autovec/reduc/reduc_run-8.c |  22 ++
 .../riscv/rvv/autovec/reduc/reduc_run-9.c |  22 ++
 12 files changed, 549 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-9.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8cdec75bacf..a85821ada9c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1692,3 +1692,35 @@
   riscv_vector::expand_reduction (SMIN, operands, f);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;; ---- [INT,FP] Initialize from individual elements
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vslide1up.vx/vfslide1up.vf
+;; -------------------------------------------------------------------------
+
+;; Slide an RVV vector left and insert a scalar into element 0.
+(define_expand "vec_shl_insert_"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand: 2 "reg_or_0_operand")]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred_slide (UNSPEC_VSLIDE1UP, mode);
+  rtx ops[] = {operands[0], RVV_VUNDEF (mode), operands[1], operands[2]};
+  riscv_vector::emit_vlmax_slide_insn (icode, ops);
+  DONE;
+})
+
+(define_expand "vec_shl_insert_"
+  [(match_operand:VF 0 "register_operand")
+   (match_operand:VF 1 "register_operand")
+   (match_operand: 2 "register_operand")]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred_slide (UNSPEC_VFSLIDE1UP, mode);
+  rtx ops[] = {operands[0], RVV_VUNDEF (mode), operands[1], 

[FYI] c++: check for trying to cache non-constant expressions

2023-07-17 Thread Jason Merrill via Gcc-patches
Just for reference, not applying.

-- 8< --

This is what I used to check for what non-constant results we were
previously caching.  After the previous two patches it's just address
of a local and a partially-uninitialized COMPLEX_EXPR.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Add checking.
---
 gcc/cp/constexpr.cc | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..ed4e6a3acf9 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3206,6 +3206,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
  cacheable = false;
  break;
}
+ /* Also don't cache a call that returns a deallocated pointer.  */
+ if (cacheable && CHECKING_P
+ && (cp_walk_tree_without_duplicates
+ (&result, find_heap_var_refs, NULL)))
+   cacheable = false;
}
 
/* Rewrite all occurrences of the function's RESULT_DECL with the
@@ -3215,10 +3220,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
&& !is_empty_class (TREE_TYPE (res)))
 if (replace_decl (&result, res, ctx->object))
cacheable = false;
-
- /* Only cache a permitted result of a constant expression.  */
- if (cacheable && !reduced_constant_expression_p (result))
-   cacheable = false;
}
   else
/* Couldn't get a function copy to evaluate.  */
@@ -3230,6 +3231,18 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
result = error_mark_node;
   else if (!result)
result = void_node;
+
+  /* Only cache a permitted result of a constant expression.  */
+  if (cacheable && !reduced_constant_expression_p (result))
+   {
+ cacheable = false;
+ if (CHECKING_P
+ && result != void_node && result != error_mark_node
+ && !(TREE_CODE (result) == CONSTRUCTOR
+  && CONSTRUCTOR_NO_CLEARING (result)))
+   internal_error ("caching %qE", result);
+   }
+
   if (entry)
entry->result = cacheable ? result : error_mark_node;
 }

base-commit: c7ac1de5f5c561b2d90c084a638c232d322d54e6
-- 
2.39.3



Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-17 Thread Kees Cook via Gcc-patches
On Mon, Jul 17, 2023 at 09:17:48PM +, Qing Zhao wrote:
> 
> > On Jul 13, 2023, at 4:31 PM, Kees Cook  wrote:
> > 
> > In the bug, the problem is that "p" isn't known to be allocated, if I'm
> > reading that correctly?
> 
> I think that the major point in PR109557 
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):
> 
> for the following pointer p.3_1, 
> 
> p.3_1 = p;
> _2 = __builtin_object_size (p.3_1, 0);
> 
> Question: why the size of p.3_1 cannot use the TYPE_SIZE of the pointee of p 
> when the TYPE_SIZE can be determined at compile time?
> 
> Answer:  From just knowing the type of the pointee of p, the compiler cannot 
> determine the size of the object.  

Why is that? "p" points to "struct P", which has a fixed size. There
must be an assumption somewhere that a pointer is allocated, otherwise
__bos would almost never work?

> Therefore the bug has been closed. 
> 
> In your following testing 5:
> 
> > I'm not sure this is a reasonable behavior, but
> > let me get into the specific test, which looks like this:
> > 
> > TEST(counted_by_seen_by_bdos)
> > {
> >struct annotated *p;
> >int index = MAX_INDEX + unconst;
> > 
> >p = alloc_annotated(index);
> > 
> >REPORT_SIZE(p->array);
> > /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
> >/* Check array size alone. */
> > /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX);
> > /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->foo * 
> > sizeof(*p->array));
> >/* Check check entire object size. */
> > /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
> > /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->foo 
> > * sizeof(*p->array));
> > }
> > 
> > Test 5 should pass as well, since, again, p can be examined. Passing p
> > to __bdos implies it is allocated and the __counted_by annotation can be
> > examined.
> 
> Since the call to the routine "alloc_annotated" cannot be inlined, GCC does 
> not see any allocation calls for the pointer p.

Correct.

> At the same time, due to the same reason as PR109986, GCC cannot determine 
> the size of the object by just knowing the TYPE_SIZE
> of the pointee of p.  

So the difference between test 3 and test 5 is that "p" is explicitly
dereferenced to find "array", and therefore an assumption is established
that "p" is allocated?

> So, this is exactly the same issue as PR109557.  It’s an existing behavior 
> per the current __builtin_object_size algorithm.
> I am still not very sure whether the situation in PR109557 can be improved or 
> not, but either way, it’s a separate issue.

Okay, so the issue is one of object allocation visibility (or
assumptions therein)?

> Please see the new testing case I added for PR109557, you will see that the 
> above case 5 is a similar case as the new testing case in PR109557.

I will ponder this a bit more to see if I can come up with a useful
test case to replace the part from "test 5" above.

> > 
> > If "return p->array[index];" would be caught by the sanitizer, then
> > it follows that __builtin_dynamic_object_size(p, 1) must also know the
> > size. i.e. both must examine "p" and its trailing flexible array and
> > __counted_by annotation.
> > 
> >> 
> >> 2. The common issue for  the failed testing 3, 4, 9, 10 is:
> >> 
> >> for the following annotated structure: 
> >> 
> >> 
> >> struct annotated {
> >>unsigned long flags;
> >>size_t foo;
> >>int array[] __attribute__((counted_by (foo)));
> >> };
> >> 
> >> 
> >> struct annotated *p;
> >> int index = 16;
> >> 
> >> p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real 
> >> size 
> >> 
> >> p->foo = index + 2;  // p->foo was set by a different value than the real 
> >> size of p->array as in test 9 and 10
> > 
> > Right, tests 9 and 10 are checking that the _smallest_ possible value of
> > the array is used. (There are two sources of information: the allocation
> > size and the size calculated by counted_by. The smaller of the two
> > should be used when both are available.)
> 
> The counted_by attribute is used to annotate a flexible array member with how
> many elements it will have.
> However, if this information cannot accurately reflect the real number of
> elements allocated for the array,
> what's the purpose of such information?

For example, imagine code that allocates space for 100 elements since
the common case is that the number of elements will grow over time.
Elements are added as it goes. For example:

struct grows {
int alloc_count;
int valid_count;
struct element item[] __counted_by(valid_count);
} *p;

void something(void)
{
p = malloc(sizeof(*p) + sizeof(*p->item) * 100);
p->alloc_count = 100;
p->valid_count = 0;

/* this loop doesn't check that we don't go over 100. */
while (items_to_copy) {
struct element *item_ptr = get_next_item();
/* 

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-17 Thread Richard Sandiford via Gcc-patches
Michael Matz via Gcc-patches  writes:
> Hello,
>
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore).  This starts to hurt some uses, as it means that
> as soon as you have a call (say to memmove/memcpy, even if
> implicit as libcall) in a loop that manipulates floating point
> or vector data you get saves/restores around those calls.
>
> But in reality many functions can be written such that they only need
> to clobber a subset of the 16 XMM registers (or do the save/restore
> themselves in the codepaths that need them, hello memcpy again).
> So we want to introduce a way to specify this, via an ABI attribute
> that basically says "doesn't clobber the high XMM regs".
>
> I've opted to do only the obvious: do something special only for
> xmm8 to xmm15, without a way to specify the clobber set in more detail.
> I think such half/half split is reasonable, and as I don't want to
> change the argument passing anyway (whose regs are always clobbered)
> there isn't that much wiggle room anyway.
>
> I chose to make it possible to write function definitions with that
> attribute with GCC adding the necessary callee save/restore code in
> the xlogue itself.  Carefully note that this is only possible for
> the SSE2 registers, as other parts of them would need instructions
> that are only optional.  When a function doesn't contain calls to
> unknown functions we can be a bit more lenient: we can make it so that
> GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> necessary.  If a function contains calls then GCC can't know which
> parts of the XMM regset is clobbered by that, it may be parts
> which don't even exist yet (say until avx2048 comes out), so we must
> restrict ourself to only save/restore the SSE2 parts and then of course
> can only claim to not clobber those parts.
>
> To that end I introduce actually two related attributes (for naming
> see below):
> * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> * noanysseclobber: claims (and ensures) that nothing of any of the
>   registers overlapping xmm8-15 is clobbered (not even future, as of
>   yet unknown, parts)
>
> Ensuring the first is simple: potentially add saves/restore in xlogue
> (e.g. when xmm8 is either used explicitly or implicitly by a call).
> Ensuring the second comes with more: we must also ensure that no
> functions are called that don't guarantee the same thing (in addition
> to just removing all xmm8-15 parts altogether from the available
> registers).
>
> See also the added testcases for what I intended to support.
>
> I chose to use the new target-independent function-abi facility for
> this.  I need some adjustments in generic code:
> * the "default_abi" is actually more like a "current" abi: it happily
>   changes its contents according to conditional_register_usage,
>   and other code assumes that such changes do propagate.
>   But if that conditional_reg_usage is actually done because the current
>   function is of a different ABI, then we must not change default_abi.
> * in insn_callee_abi we do look at a potential fndecl for a call
>   insn (only set when -fipa-ra), but that doesn't work for calls through
>   pointers and (as said) is optional.  So, also always look at the
>   called functions type (it's always recorded in the MEM_EXPR for
>   non-libcalls), before asking the target.
>   (The function-abi accessors working on trees were already doing that,
>   it's just the RTL accessor that missed this)
> [...]
> diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc
> index 2ab9b2c5649..efbe114218c 100644
> --- a/gcc/function-abi.cc
> +++ b/gcc/function-abi.cc
> @@ -42,6 +42,26 @@ void
>  predefined_function_abi::initialize (unsigned int id,
>const_hard_reg_set full_reg_clobbers)
>  {
> +  /* Don't reinitialize an ABI struct.  We might be called from reinit_regs
> + from the target's conditional_register_usage hook which might depend
> + on cfun and might have changed the global register sets according
> + to that function's ABI already.  That's not the default ABI anymore.
> +
> + XXX only avoid this if we're reinitializing the default ABI, and the
> + current function is _not_ of the default ABI.  That's for
> + backward compatibility where some backends modify the regsets with
> + the exception that those changes are then reflected also in the default
> + ABI (which rather is then the "current" ABI).  E.g. x86_64 with the
> + ms_abi vs sysv attribute.  They aren't reflected by separate ABI
> + structs, but handled differently.  The "default" ABI hence changes
> + back and forth (and is expected to!) between a ms_abi and a sysv
> + function.  */

The default ABI is also the eh_edge_abi, and so describes the set of
registers that are preserved or clobbered across EH edges.  If changing
between ms_abi and sysv changes the "default" 

Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

2023-07-17 Thread Richard Sandiford via Gcc-patches
Manolis Tsamis  writes:
> noce_convert_multiple_sets has been introduced and extended over time to 
> handle
> if conversion for blocks with multiple sets. Currently this is focused on
> register moves and rejects any sort of arithmetic operations.
>
> This series is an extension to allow more sequences to take part in if
> conversion. The first patch is a required change to emit correct code and the
> second patch whitelists a larger number of operations through
> bb_ok_for_noce_convert_multiple_sets.
>
> For targets that have a rich selection of conditional instructions,
> like aarch64, I have seen an ~5x increase of profitable if conversions for
> multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> benchmarks and I have not seen performance regressions on either x64 or
> aarch64.

Interesting results.  Are you free to say which target you used for aarch64?

If I've understood the cost heuristics correctly, we'll allow a "predictable"
branch to be replaced by up to 5 simple conditional instructions and an
"unpredictable" branch to be replaced by up to 10 simple conditional
instructions.  That seems pretty high.  And I'm not sure how well we
guess predictability in the absence of real profile information.

So my gut instinct was that the limitations of the current code might
be saving us from overly generous limits.  It sounds from your results
like that might not be the case though.

Still, if it does turn out to be the case in future, I agree we should
fix the costs rather than hamstring the code.

> Some samples that previously resulted in a branch but now better use these
> instructions can be seen in the provided test case.
>
> Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> failing with an ICE but I believe that it's not an issue of this change.
> force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> is provided through emit_conditional_move.

I guess that needs to be fixed first though.  (Thanks for checking both
targets.)

My main comments on the series are:

(1) It isn't obvious which operations should be included in the list
in patch 2 and which shouldn't.  Also, the patch only checks the
outermost operation, and so it allows the inner rtxes to be
arbitrarily complex.

Because of that, it might be better to remove the condition
altogether and just rely on the other routines to do costing and
correctness checks.

(2) Don't you also need to update the "rewiring" mechanism, to cope
with cases where the then block has something like:

  if (a == 0) {
    a = b op c;    ->    a' = a == 0 ? b op c : a;
    d = a op b;    ->    d = a == 0 ? a' op b : d;
  }                      a = a'

At the moment the code only handles regs and subregs, whereas IIUC
it should now iterate over all the regs in the SET_SRC.  And I suppose
that creates the need for multiple possible rewirings in the same insn,
so that it isn't a simple insn -> index mapping any more.

Thanks,
Richard

>
>
> Changes in v2:
> - Change "conditional moves" to "conditional instructions"
> in bb_ok_for_noce_convert_multiple_sets's comment.
>
> Manolis Tsamis (2):
>   ifcvt: handle sequences that clobber flags in
> noce_convert_multiple_sets
>   ifcvt: Allow more operations in multiple set if conversion
>
>  gcc/ifcvt.cc  | 109 ++
>  .../aarch64/ifcvt_multiple_sets_arithm.c  |  67 +++
>  2 files changed, 127 insertions(+), 49 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c


Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-07-17 Thread Patrick Palka via Gcc-patches
On Fri, 14 Jul 2023, Jason Merrill wrote:

> On 7/14/23 14:07, Patrick Palka wrote:
> > On Thu, 13 Jul 2023, Jason Merrill wrote:
> > 
> > > On 7/13/23 11:48, Patrick Palka wrote:
> > > > On Wed, 28 Jun 2023, Patrick Palka wrote:
> > > > 
> > > > > On Wed, Jun 28, 2023 at 11:50 AM Jason Merrill 
> > > > > wrote:
> > > > > > 
> > > > > > On 6/23/23 12:23, Patrick Palka wrote:
> > > > > > > On Fri, 23 Jun 2023, Jason Merrill wrote:
> > > > > > > 
> > > > > > > > On 6/21/23 13:19, Patrick Palka wrote:
> > > > > > > > > When stepping through the variable/alias template
> > > > > > > > > specialization
> > > > > > > > > code
> > > > > > > > > paths, I noticed we perform template argument coercion twice:
> > > > > > > > > first from
> > > > > > > > > instantiate_alias_template / finish_template_variable and
> > > > > > > > > again
> > > > > > > > > from
> > > > > > > > > tsubst_decl (during instantiate_template).  It should suffice
> > > > > > > > > to
> > > > > > > > > perform
> > > > > > > > > coercion once.
> > > > > > > > > 
> > > > > > > > > To that end patch elides this second coercion from tsubst_decl
> > > > > > > > > when
> > > > > > > > > possible.  We can't get rid of it completely because we don't
> > > > > > > > > always
> > > > > > > > > specialize a variable template from finish_template_variable:
> > > > > > > > > we
> > > > > > > > > could
> > > > > > > > > also be doing so directly from instantiate_template during
> > > > > > > > > variable
> > > > > > > > > template partial specialization selection, in which case the
> > > > > > > > > coercion
> > > > > > > > > from tsubst_decl would be the first and only coercion.
> > > > > > > > 
> > > > > > > > Perhaps we should be coercing in lookup_template_variable rather
> > > > > > > > than
> > > > > > > > finish_template_variable?
> > > > > > > 
> > > > > > > Ah yes, there's a patch for that at
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :)
> > > > > > 
> > > > > > So after that patch, can we get rid of the second coercion
> > > > > > completely?
> > > > > 
> > > > > On second thought it should be possible to get rid of it, if we
> > > > > rearrange things to always pass the primary arguments to tsubst_decl,
> > > > > and perform partial specialization selection from there instead of
> > > > > instantiate_template.  Let me try...
> > > > 
> > > > Like so?  Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > 
> > > > -- >8 --
> > > > 
> > > > When stepping through the variable/alias template specialization code
> > > > paths, I noticed we perform template argument coercion twice: first from
> > > > instantiate_alias_template / finish_template_variable and again from
> > > > tsubst_decl (during instantiate_template).  It'd be good to avoid this
> > > > redundant coercion.
> > > > 
> > > > It turns out that this coercion could be safely elided whenever
> > > > specializing a primary variable/alias template, because we can rely on
> > > > lookup_template_variable and instantiate_alias_template to already have
> > > > coerced the arguments.
> > > > 
> > > > The other situation to consider is when fully specializing a partial
> > > > variable template specialization (from instantiate_template), in which
> > > > case the passed 'args' are the (already coerced) arguments relative to
> > > > the partial template and 'argvec', the result of substitution into
> > > > DECL_TI_ARGS, are the (uncoerced) arguments relative to the primary
> > > > template, so coercion is still necessary.  We can still avoid this
> > > > coercion however if we always pass the primary variable template to
> > > > tsubst_decl from instantiate_template, and instead perform partial
> > > > specialization selection directly from tsubst_decl.  This patch
> > > > implements this approach.
> > > 
> > > The relationship between instantiate_template and tsubst_decl is pretty
> > > tangled.  We use the former to substitute (often deduced) template
> > > arguments
> > > into a template, and the latter to substitute template arguments into a
> > > use of
> > > a template...and also to implement the former.
> > > 
> > > For substitution of uses of a template, we expect to need to coerce the
> > > arguments after substitution.  But we avoid this issue for variable
> > > templates
> > > by keeping them as TEMPLATE_ID_EXPR until substitution time, so if we see
> > > a
> > > VAR_DECL in tsubst_decl it's either a non-template variable or under
> > > instantiate_template.
> > 
> > FWIW it seems we could also be in tsubst_decl for a VAR_DECL if
> > 
> >* we're partially instantiating a class-scope variable template
> >  during instantiation of the class
> 
> Hmm, why don't partial instantiations stay as TEMPLATE_ID_EXPR?

Whoops, I accidentally omitted a crucial word.  The situation is when
partially instantiating a class-scope variable template _declaration_,
e.g. for

  template<class T>
  struct A {
    template<class U> static int v;
  };

  template struct A<int>;

we call 

[RFC v2] RISC-V: Add Ztso atomic mappings

2023-07-17 Thread Patrick O'Neill
The RISC-V Ztso extension currently has no effect on generated code.
With the additional ordering constraints guaranteed by Ztso, we can emit
more optimized atomic mappings than the RVWMO mappings.

This PR defines a strengthened version of Andrea Parri's proposed Ztso mappings 
("Proposed Mapping") [1]. The changes were discussed by Andrea Parri and Hans 
Boehm on the GCC mailing list and are required in order to be compatible with 
the RVWMO ABI [2].

This change corresponds to the Ztso psABI proposal[3].

[1] https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
[2] https://inbox.sourceware.org/gcc-patches/ZFV8pNAstwrF2qBb@andrea/T/#t
[3] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391

gcc/ChangeLog:

2023-07-17  Patrick O'Neill  

* common/config/riscv/riscv-common.cc: Add Ztso and mark Ztso as
dependent on 'a' extension.
* config/riscv/riscv-opts.h (MASK_ZTSO): New mask.
(TARGET_ZTSO): New target.
* config/riscv/riscv.cc (riscv_memmodel_needs_amo_acquire): Add
Ztso case.
(riscv_memmodel_needs_amo_release): Add Ztso case.
(riscv_print_operand): Add Ztso case for LR/SC annotations.
* config/riscv/riscv.md: Import sync-rvwmo.md and sync-ztso.md.
* config/riscv/riscv.opt: Add Ztso target variable.
* config/riscv/sync.md (mem_thread_fence_1): Expand to RVWMO or
Ztso specific insn.
(atomic_load): Expand to RVWMO or Ztso specific insn.
(atomic_store): Expand to RVWMO or Ztso specific insn.
* config/riscv/sync-rvwmo.md: New file. Separate out RVWMO
specific load/store/fence mappings.
* config/riscv/sync-ztso.md: New file. Separate out Ztso
specific load/store/fence mappings.

gcc/testsuite/ChangeLog:

2023-07-17  Patrick O'Neill  

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-1.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-2.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-3.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-4.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-5.c: New test.
* gcc.target/riscv/amo-table-ztso-load-1.c: New test.
* gcc.target/riscv/amo-table-ztso-load-2.c: New test.
* gcc.target/riscv/amo-table-ztso-load-3.c: New test.
* gcc.target/riscv/amo-table-ztso-store-1.c: New test.
* gcc.target/riscv/amo-table-ztso-store-2.c: New test.
* gcc.target/riscv/amo-table-ztso-store-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/riscv-opts.h |   4 +
 gcc/config/riscv/riscv.cc |  20 +++-
 gcc/config/riscv/riscv.md |   2 +
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/sync-rvwmo.md|  96 
 gcc/config/riscv/sync-ztso.md |  80 +
 gcc/config/riscv/sync.md  | 107 ++
 .../riscv/amo-table-ztso-amo-add-1.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-2.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-3.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-4.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-5.c  |  15 +++
 .../riscv/amo-table-ztso-compare-exchange-1.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-2.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-3.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-4.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-5.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-6.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-7.c |  10 ++
 .../gcc.target/riscv/amo-table-ztso-fence-1.c |  14 +++
 

[pushed] extend.texi: index __auto_type

2023-07-17 Thread Arsen Arsenović via Gcc-patches
gcc/ChangeLog:

* doc/extend.texi: Add @cindex on __auto_type.
---
Pushed as obvious.

 gcc/doc/extend.texi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 093bd97ba4d..ec9ffa3c86e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -843,6 +843,7 @@ Thus, @code{array (pointer (char), 4)} is the type of 
arrays of 4
 pointers to @code{char}.
 @end itemize
 
+@cindex @code{__auto_type} in GNU C
 In GNU C, but not GNU C++, you may also declare the type of a variable
 as @code{__auto_type}.  In that case, the declaration must declare
 only one variable, whose declarator must just be an identifier, the
-- 
2.41.0



[PATCH RFA (fold)] c++: constexpr bit_cast with empty field

2023-07-17 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, OK for trunk?

-- 8< --

The change to only cache constexpr calls that are
reduced_constant_expression_p tripped on bit-cast3.C, which failed that
predicate due to the presence of an empty field in the result of
native_interpret_aggregate, which reduced_constant_expression_p rejects to
avoid confusing output_constructor.

This patch proposes to skip such fields in native_interpret_aggregate, since
they aren't actually involved in the value representation.

gcc/ChangeLog:

* fold-const.cc (native_interpret_aggregate): Skip empty fields.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_bit_cast): Check that the result of
native_interpret_aggregate doesn't need more evaluation.
---
 gcc/cp/constexpr.cc | 9 +
 gcc/fold-const.cc   | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 9d85c3be5cc..6e8f1c2b61e 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1440,6 +1440,8 @@ enum value_cat {
 
 static tree cxx_eval_constant_expression (const constexpr_ctx *, tree,
  value_cat, bool *, bool *, tree * = 
NULL);
+static tree cxx_eval_bare_aggregate (const constexpr_ctx *, tree,
+value_cat, bool *, bool *);
 static tree cxx_fold_indirect_ref (const constexpr_ctx *, location_t, tree, 
tree,
   bool * = NULL);
 static tree find_heap_var_refs (tree *, int *, void *);
@@ -4803,6 +4805,13 @@ cxx_eval_bit_cast (const constexpr_ctx *ctx, tree t, 
bool *non_constant_p,
{
  clear_type_padding_in_mask (TREE_TYPE (t), mask);
  clear_uchar_or_std_byte_in_mask (loc, r, mask);
+ if (CHECKING_P)
+   {
+ tree e = cxx_eval_bare_aggregate (ctx, r, vc_prvalue,
+   non_constant_p, overflow_p);
+ gcc_checking_assert (e == r);
+ r = e;
+   }
}
 }
 
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index a02ede79fed..db8f7de5680 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8935,7 +8935,8 @@ native_interpret_aggregate (tree type, const unsigned 
char *ptr, int off,
 return NULL_TREE;
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 {
-  if (TREE_CODE (field) != FIELD_DECL || DECL_PADDING_P (field))
+  if (TREE_CODE (field) != FIELD_DECL || DECL_PADDING_P (field)
+ || integer_zerop (DECL_SIZE (field)))
continue;
   tree fld = field;
   HOST_WIDE_INT bitoff = 0, pos = 0, sz = 0;

base-commit: caabf0973a4e9a26421c94d540e3e20051e93e77
-- 
2.39.3



[pushed] c++: only cache constexpr calls that are constant exprs

2023-07-17 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In reviewing Nathaniel's patch for PR70331, it occurred to me that instead
of looking for various specific problematic things in the result of a
constexpr call to decide whether to cache it, we should use
reduced_constant_expression_p.

The change to that function is to avoid crashing on uninitialized objects of
non-class type.

In a trial version of this patch I checked to see what cases this stopped
caching; some were instances of partially-initialized return values, which
seem fine to not cache.  Some were returning pointers to expiring local
variables, which we definitely want not to cache.  And one was bit-cast3.C,
which will be handled in a follow-up patch.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Only cache
reduced_constant_expression_p results.
(reduced_constant_expression_p): Handle CONSTRUCTOR of scalar type.
(cxx_eval_constant_expression): Fold vectors here.
(cxx_eval_bare_aggregate): Not here.
---
 gcc/cp/constexpr.cc | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index c6f323ebf43..9d85c3be5cc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3033,7 +3033,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 }
   else
 {
-  bool cacheable = true;
+  bool cacheable = !!entry;
   if (result && result != error_mark_node)
/* OK */;
   else if (!DECL_SAVED_TREE (fun))
@@ -3185,7 +3185,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 for the constexpr evaluation and should not be cached.
 It is fine if the call allocates something and deallocates it
 too.  */
- if (entry
+ if (cacheable
  && (save_heap_alloc_count != ctx->global->heap_vars.length ()
  || (save_heap_dealloc_count
  != ctx->global->heap_dealloc_count)))
@@ -3204,10 +3204,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  cacheable = false;
  break;
}
- /* Also don't cache a call that returns a deallocated pointer.  */
- if (cacheable && (cp_walk_tree_without_duplicates
-   (&result, find_heap_var_refs, NULL)))
-   cacheable = false;
}
 
/* Rewrite all occurrences of the function's RESULT_DECL with the
@@ -3217,6 +3213,10 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
&& !is_empty_class (TREE_TYPE (res)))
  if (replace_decl (&result, res, ctx->object))
cacheable = false;
+
+ /* Only cache a permitted result of a constant expression.  */
+ if (cacheable && !reduced_constant_expression_p (result))
+   cacheable = false;
}
   else
/* Couldn't get a function copy to evaluate.  */
@@ -3268,8 +3268,9 @@ reduced_constant_expression_p (tree t)
 case CONSTRUCTOR:
   /* And we need to handle PTRMEM_CST wrapped in a CONSTRUCTOR.  */
   tree field;
-  if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
-   /* An initialized vector would have a VECTOR_CST.  */
+  if (!AGGREGATE_TYPE_P (TREE_TYPE (t)))
+   /* A constant vector would be folded to VECTOR_CST.
+  A CONSTRUCTOR of scalar type means uninitialized.  */
return false;
   if (CONSTRUCTOR_NO_CLEARING (t))
{

base-commit: caabf0973a4e9a26421c94d540e3e20051e93e77
-- 
2.39.3



Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-17 Thread Qing Zhao via Gcc-patches


> On Jul 13, 2023, at 4:31 PM, Kees Cook  wrote:
> 
> In the bug, the problem is that "p" isn't known to be allocated, if I'm
> reading that correctly?

I think that the major point in PR109557 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):

for the following pointer p.3_1, 

p.3_1 = p;
_2 = __builtin_object_size (p.3_1, 0);

Question: why the size of p.3_1 cannot use the TYPE_SIZE of the pointee of p 
when the TYPE_SIZE can be determined at compile time?

Answer:  From just knowing the type of the pointee of p, the compiler cannot 
determine the size of the object.  

Therefore the bug has been closed. 

In your following testing 5:

> I'm not sure this is a reasonable behavior, but
> let me get into the specific test, which looks like this:
> 
> TEST(counted_by_seen_by_bdos)
> {
>struct annotated *p;
>int index = MAX_INDEX + unconst;
> 
>p = alloc_annotated(index);
> 
>REPORT_SIZE(p->array);
> /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
>/* Check array size alone. */
> /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX);
> /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->foo * 
> sizeof(*p->array));
>/* Check check entire object size. */
> /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
> /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->foo * 
> sizeof(*p->array));
> }
> 
> Test 5 should pass as well, since, again, p can be examined. Passing p
> to __bdos implies it is allocated and the __counted_by annotation can be
> examined.

Since the call to the routine "alloc_annotated" cannot be inlined, GCC does not 
see any allocation calls for the pointer p.
At the same time, due to the same reason as PR109986, GCC cannot determine the 
size of the object by just knowing the TYPE_SIZE
of the pointee of p.  

So, this is exactly the same issue as PR109557.  It’s an existing behavior per 
the current __builtin_object_size algorithm.
I am still not very sure whether the situation in PR109557 can be improved or 
not, but either way, it’s a separate issue.

Please see the new test case I added for PR109557; case 5 above is similar
to that new test case.

> 
> If "return p->array[index];" would be caught by the sanitizer, then
> it follows that __builtin_dynamic_object_size(p, 1) must also know the
> size. i.e. both must examine "p" and its trailing flexible array and
> __counted_by annotation.
> 
>> 
>> 2. The common issue for  the failed testing 3, 4, 9, 10 is:
>> 
>> for the following annotated structure: 
>> 
>> 
>> struct annotated {
>>unsigned long flags;
>>size_t foo;
>>int array[] __attribute__((counted_by (foo)));
>> };
>> 
>> 
>> struct annotated *p;
>> int index = 16;
>> 
>> p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real size 
>> 
>> p->foo = index + 2;  // p->foo was set by a different value than the real 
>> size of p->array as in test 9 and 10
> 
> Right, tests 9 and 10 are checking that the _smallest_ possible value of
> the array is used. (There are two sources of information: the allocation
> size and the size calculated by counted_by. The smaller of the two
> should be used when both are available.)

The counted_by attribute annotates a flexible array member with the number
of elements it will have.  However, if this information cannot accurately
reflect the real number of elements allocated for the array, what is the
purpose of such information?

>> or
>> p->foo was not set to any value as in test 3 and 4
> 
> For tests 3 and 4, yes, this was my mistake. I have fixed up the tests
> now. Bill noticed the same issue. Sorry for the think-o!
> 
>> 
>> 
>> i.e, the value of p->foo is NOT synced with the number of elements allocated 
>> for the array p->array.  
>> 
>> I think that this should be considered as an user error, and the 
>> documentation of the attribute should include
>> this requirement.  (In the LLVM’s RFC, such requirement was included in the 
>> programing model: 
>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
>> 
>> We can add a new warning option -Wcounted-by to report such user error if 
>> needed.
>> 
>> What’s your opinion on this?
> 
> I think it is correct to allow mismatch between allocation and
> counted_by as long as only the least of the two is used.

What do you mean by “only the least of the two is used”?

> This may be
> desirable in a few situations. One example would be a large allocation
> that is slowly filled up by the program.

So, for such a situation, whenever the allocation is filled up, the field
that holds the “counted_by” value should be increased at the same time;
then the “counted_by” value always stays in sync with the real allocation.
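A minimal sketch of that discipline: allocate and set the counter in one place, so the two can never diverge.  (alloc_annotated here is an illustrative helper, and the attribute is guarded so the sketch also compiles on compilers without counted_by support.)

```c
#include <assert.h>
#include <stdlib.h>

/* Use the attribute only when the compiler understands it.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(field) __attribute__((counted_by(field)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(field)
#endif

struct annotated {
  unsigned long flags;
  size_t foo;                     /* element count of 'array' */
  int array[] COUNTED_BY(foo);
};

/* Allocate and set the counter together: p->foo always matches the
   number of elements really allocated.  */
static struct annotated *
alloc_annotated (size_t count)
{
  struct annotated *p = malloc (sizeof *p + count * sizeof p->array[0]);
  if (p)
    p->foo = count;
  return p;
}
```

Growing the buffer would follow the same rule: realloc and update p->foo in the same step.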
> I.e. the counted_by member is
> slowly increased during runtime (but not beyond the true 

[committed] combine: Change return type of predicate functions from int to bool

2023-07-17 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool.
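The mechanical pattern of the change, sketched outside combine.cc (illustrative code, not the actual functions): int-valued predicates that returned 0/1 become bool predicates returning false/true, which documents the contract in the signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the conversion a predicate would look like:
     static int contains_muldiv (rtx x) { ... return 1; }
   After it, the same logic with a self-documenting bool type.
   (The rtx code is stood in for by a plain int here.)  */
static bool
is_mul_or_div (int code)
{
  return code == '*' || code == '/';
}
```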

gcc/ChangeLog:

* combine.cc (struct reg_stat_type): Change last_set_invalid to bool.
(cant_combine_insn_p): Change return type from int to bool and adjust
function body accordingly.
(can_combine_p): Ditto.
(combinable_i3pat): Ditto.  Change "i1_not_in_src" and "i0_not_in_src"
function arguments from int to bool.
(contains_muldiv): Change return type from int to bool and adjust
function body accordingly.
(try_combine): Ditto. Change "new_direct_jump" pointer function
argument from int to bool.  Change "substed_i2", "substed_i1",
"substed_i0", "added_sets_0", "added_sets_1", "added_sets_2",
"i2dest_in_i2src", "i1dest_in_i1src", "i2dest_in_i1src",
"i0dest_in_i0src", "i1dest_in_i0src", "i2dest_in_i0src",
"i2dest_killed", "i1dest_killed", "i0dest_killed", "i1_feeds_i2_n",
"i0_feeds_i2_n", "i0_feeds_i1_n", "i3_subst_into_i2", "have_mult",
"swap_i2i3", "split_i2i3" and "changed_i3_dest" variables
from int to bool.
(subst): Change "in_dest", "in_cond" and "unique_copy" function
arguments from int to bool.
(combine_simplify_rtx): Change "in_dest" and "in_cond" function
arguments from int to bool.
(make_extraction): Change "unsignedp", "in_dest" and "in_compare"
function argument from int to bool.
(force_int_to_mode): Change "just_select" function argument
from int to bool.  Change "next_select" variable to bool.
(rtx_equal_for_field_assignment_p): Change return type from
int to bool and adjust function body accordingly.
(merge_outer_ops): Ditto.  Change "pcomp_p" pointer function
argument from int to bool.
(get_last_value_validate): Change return type from int to bool
and adjust function body accordingly.
(reg_dead_at_p): Ditto.
(reg_bitfield_target_p): Ditto.
(combine_instructions): Ditto.  Change "new_direct_jump"
variable to bool.
(can_combine_p): Change return type from int to bool
and adjust function body accordingly.
(likely_spilled_retval_p): Ditto.
(can_change_dest_mode): Change "added_sets" function argument
from int to bool.
(find_split_point): Change "unsignedp" variable to bool.
(simplify_if_then_else): Change "comparison_p" and "swapped"
variables to bool.
(simplify_set): Change "other_changed" variable to bool.
(expand_compound_operation): Change "unsignedp" variable to bool.
(force_to_mode): Change "just_select" function argument
from int to bool.  Change "next_select" variable to bool.
(extended_count): Change "unsignedp" function argument to bool.
(simplify_shift_const_1): Change "complement_p" variable to bool.
(simplify_comparison): Change "changed" variable to bool.
(rest_of_handle_combine): Change return type to void.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 304c020ec79..d9161b257e8 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -156,7 +156,7 @@ struct reg_stat_type {
register was assigned
  last_set_table_tick   records the value of label_tick when a
value using the register is assigned
- last_set_invalid  set to nonzero when it is not valid
+ last_set_invalid  set to true when it is not valid
to use the value of this register in some
register's value
 
@@ -202,11 +202,11 @@ struct reg_stat_type {
   char last_set_sign_bit_copies;
   ENUM_BITFIELD(machine_mode)  last_set_mode : MACHINE_MODE_BITSIZE;
 
-  /* Set nonzero if references to register n in expressions should not be
+  /* Set to true if references to register n in expressions should not be
  used.  last_set_invalid is set nonzero when this register is being
  assigned to and last_set_table_tick == label_tick.  */
 
-  char last_set_invalid;
+  bool last_set_invalid;
 
   /* Some registers that are set more than once and used in more than one
  basic block are nevertheless always set in similar ways.  For example,
@@ -416,35 +416,36 @@ static void do_SUBST_INT (int *, int);
 static void init_reg_last (void);
 static void setup_incoming_promotions (rtx_insn *);
 static void set_nonzero_bits_and_sign_copies (rtx, const_rtx, void *);
-static int cant_combine_insn_p (rtx_insn *);
-static int can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *,
- rtx_insn *, rtx_insn *, rtx *, rtx *);
-static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int, rtx 
*);
-static int contains_muldiv (rtx);
+static bool cant_combine_insn_p (rtx_insn *);
+static bool can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *,
+  rtx_insn *, rtx_insn *, rtx *, rtx *);
+static bool 

Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math

2023-07-17 Thread Andrew Pinski via Gcc-patches
On Mon, Jul 17, 2023 at 2:30 AM Richard Biener  wrote:
>
> On Fri, 14 Jul 2023, Andrew Pinski wrote:
>
> > On Thu, Jul 13, 2023 at 2:54?AM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > The following makes sure that FP x > y ? x : y style max/min operations
> > > are if-converted at the GIMPLE level.  While we can neither match
> > > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
> > > than the ternary ?: operation we can make sure to maintain this form
> > > as a COND_EXPR so backends have the chance to match this to instructions
> > > their ISA offers.
> > >
> > > The patch does this in phiopt where we recognize min/max and instead
> > > of giving up when we have to honor NaNs we alter the generated code
> > > to a COND_EXPR.
> > >
> > > This resolves PR88540 and we can then SLP vectorize the min operation
> > > for its testcase.  It also resolves part of the regressions observed
> > > with the change matching bit-inserts of bit-field-refs to vec_perm.
> > >
> > > Expansion from a COND_EXPR rather than from compare-and-branch
> > > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> > > by producing extra moves while the corresponding min/max operations
> > > are now already synthesized by RTL expansion, register selection
> > > isn't optimal.  This can be also provoked without this change by
> > > altering the operand order in the source.
> > >
> > > It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> > > condition which makes RTL expansion no longer produce the min/max
> > > directly and code generation is obfuscated enough to confuse
> > > RTL if-conversion.
> > >
> > > It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> > > variant isn't if-converted and ix86_expand_fp_movcc doesn't
> > > match directly (the FP constants get expanded twice).  A fix
> > > could be in emit_conditional_move where both prepare_cmp_insn
> > > and emit_conditional_move_1 force the constants to (different)
> > > registers.
> > >
> > > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > PR tree-optimization/88540
> > > * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
> > > with NaNs but handle the simple case by if-converting to a
> > > COND_EXPR.
> >
> > One thing which I was thinking about adding to phiopt is having the
> > last pass do the conversion to COND_EXPR if the target supports a
> > conditional move for that expression. That should fix this one right?
> > This was one of things I was working towards with the moving to use
> > match-and-simplify too.
>
> Note the if-conversion has to happen before BB SLP but the last
> phiopt is too late for this (yes, BB SLP could also be enhanced
> to handle conditionals and do if-conversion on-the-fly).  For
> BB SLP there's also usually jump threading making a mess of
> same condition chain of if-convertible ops ...

Oh, I didn't think about that. I was thinking more of PR 110170 and PR
106952 when I saw this patch rather than thinking of SLP vectorizer
related stuff.
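For reference, the IEEE semantics difference at the root of this is visible in plain C: `x > y ? x : y` is not a commutative max once a NaN appears, which is why the form can only be preserved as a COND_EXPR rather than matched to MAX_EXPR or .FMAX (illustrative example):

```c
#include <assert.h>
#include <math.h>

/* The ternary form: when x is NaN, 'x > y' is false and y is chosen;
   when y is NaN, the NaN itself is returned.  So swapping the operands
   changes the result, unlike a true IEEE max.  */
static double
cond_max (double x, double y)
{
  return x > y ? x : y;
}
```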

>
> As for the min + max case that regresses due
> to CSE (gcc.target/i386/pr110170.c) I wonder whether pre-expanding
>
>  _1 = _2 < _3;
>  _4 = _1 ? _2 : _3;
>  _5 = _1 ? _3 : _2;
>
> to something more clever would be appropriate anyway.  We could
> adjust this to either duplicate _1 or expand the COND_EXPRs back
> to a single CFG diamond.  I suppose force-duplicating non-vector
> compares of COND_EXPRs to make TER work again would fix similar
> regressions we might already observe (but I'm not aware of many
> COND_EXPR generators).

Oh yes you had already recorded as PR 105715 too.

Thanks,
Andrew Pinski


>
> Richard.
>
> > Thanks,
> > Andrew
> >
> > >
> > > * gcc.target/i386/pr88540.c: New testcase.
> > > * gcc.target/i386/pr54855-12.c: Adjust.
> > > * gcc.target/i386/pr54855-13.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr88540.c| 10 ++
> > >  gcc/tree-ssa-phiopt.cc | 21 -
> > >  4 files changed, 28 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c 
> > > b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > > index 2f8af392c83..09e8ab8ae39 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-options "-O2 -mavx512fp16" } */
> > > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> > >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> > >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } 
> > > } 

[PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement
rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was changed
to REPLACE_ELT_V along with the associated define_mode_attr.  Renamed
VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.  Fixed the double test in vec-replace-word-runnable_1.c
to be consistent with the other tests.  Removed the "dg-do link" from both
tests.  Put in an explicit cast in test vec-replace-word-runnable_2.c to
eliminate the need for the -flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the
-flax-vector-conversions gcc command-line option.  Since the other tests do not
require this option, I felt that the new test needed to be in a
separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut-and-paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  
   It was intended that vec_replace_unaligned always specify output
   vectors as having type vector unsigned char, to emphasize that 
   elements are potentially misaligned by this built-in function.  
   This patch corrects the misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 



rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
of type unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
* config/rs6000/rs6000-c.cc (find_instance): Add case
RS6000_OVLD_VEC_REPLACE_UN.
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.  Rename VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
VREPLACE_UN_DI, 

[PATCH 1/2] rs6000, add argument to function find_instance

2023-07-17 Thread Carl Love via Gcc-patches


GCC maintainers:

The rs6000 function find_instance assumes that it is called for built-ins
with only two arguments.  There is no checking for the actual number of
arguments used in the built-in.  This patch adds an additional parameter to
the function call containing the number of arguments in the built-in.  The
function will now do the needed checks for all of the arguments.
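The shape of the generalized check can be sketched outside the rs6000 code (illustrative names and types; the real function walks TYPE_ARG_TYPES and calls rs6000_builtin_type_compatible on each pair):

```c
#include <assert.h>
#include <stdbool.h>

/* Generalized check: instead of testing exactly arguments 0 and 1,
   walk all nargs parameter/argument pairs and require each to match.
   Plain ints stand in for the tree type nodes.  */
static bool
args_compatible_p (const int *parm_types, const int *arg_types, int nargs)
{
  for (int i = 0; i < nargs; i++)
    if (parm_types[i] != arg_types[i])  /* stand-in for the type check */
      return false;
  return true;
}
```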

This fix is needed for the next patch in the series that fixes the
vec_replace_unaligned built-in.c test.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 



rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in with
only two arguments.  This patch extends the function by adding a parameter
specifying the number of built-in arguments to check.

gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
specifies the number of built-in arguments to check.
(altivec_resolve_overloaded_builtin): Update calls to find_instance
to pass the number of built-in arguments to be checked.
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..350987b851b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1679,7 +1679,7 @@ tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
   rs6000_gen_builtins instance_code,
   rs6000_gen_builtins fcode,
-  tree *types, tree *args)
+  tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
 *instance = (*instance)->next;
@@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin, ovlddata 
**instance,
   if (!inst->fntype)
 return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  tree parmtype;
+  int args_compatible = true;
 
-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-  && rs6000_builtin_type_compatible (types[1], parmtype1))
+  for (int i = 0; i bifid, false) != error_mark_node
  && rs6000_builtin_is_supported (inst->bifid))
{
  tree ret_type = TREE_TYPE (inst->fntype);
- return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+ return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 inst->bifid, fcode);
}
   else
@@ -1921,7 +1932,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  instance_code = RS6000_BIF_CMPB_32;
 
tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
@@ -1958,7 +1969,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  }
 
tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
-- 
2.37.2




[PATCH 0/2] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches


GCC maintainers:

In the process of fixing the powerpc/vec-replace-word-runnable.c test I
found there is an existing issue with function find_instance in rs6000-
c.cc.  Per the review comments from Kewen in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html

The fix for function find_instance was put into a separate patch
followed by a patch for the vec-replace-word-runnable.c test fixes.

The two patches have been tested on Power 10 LE with no regression
failures.

   Carl



Re: [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches
On Thu, 2023-07-13 at 17:41 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/8 04:18, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> 
> Nice, I have some comments inlined below.
> 
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and past error.  The documentation was fixed
> > in:
> > 
> >commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned
> > was
> >implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >intended that vec_replace_unaligned always specify output
> > vectors as having
> >type vector unsigned char, to emphasize that elements are
> > potentially
> >misaligned by this built-in function.  This patch corrects
> > the
> >misimplementation.
> > 
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > --
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > unsigned char, as specified in gcc/doc/extend.texi.
> 
> Maybe "be with type vector unsigned char"?

Changed to 

  The first argument of the vec_replace_unaligned built-in should
always be of type unsigned char, 

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def: Rename
> > __builtin_altivec_vreplace_un_uv2di as
> > __builtin_altivec_vreplace_un_udi
> > __builtin_altivec_vreplace_un_uv4si as
> > __builtin_altivec_vreplace_un_usi
> > __builtin_altivec_vreplace_un_v2df as
> > __builtin_altivec_vreplace_un_df
> > __builtin_altivec_vreplace_un_v2di as
> > __builtin_altivec_vreplace_un_di
> > __builtin_altivec_vreplace_un_v4sf as
> > __builtin_altivec_vreplace_un_sf
> > __builtin_altivec_vreplace_un_v4si as
> > __builtin_altivec_vreplace_un_si.
> > Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI
> > as
> > VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
> > VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
> > VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
> > Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
> > vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
> > vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
> > vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
> > * config/rs6000/rs6000-c.cc (find_instance): Add new argument
> > nargs.  Add nargs check.  Extend function to handle three
> > arguments.
> > (altivec_resolve_overloaded_builtin): Add new argument nargs to
> > function calls.  Add 

Re: [PATCH] c++: constrained surrogate calls [PR110535]

2023-07-17 Thread Jason Merrill via Gcc-patches

On 7/12/23 11:54, Patrick Palka wrote:

On Wed, 12 Jul 2023, Patrick Palka wrote:


We're not checking constraints of pointer/reference-to-function conversion
functions during overload resolution, which causes us to ICE on the first
testcase and incorrectly reject the second testcase.


Er, I noticed [over.call.object] doesn't exactly say that surrogate
call functions inherit the constraints of the corresponding conversion
function, but I reckon that's the intent?


I also assume so.  OK.



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

PR c++/110535

gcc/cp/ChangeLog:

* call.cc (add_conv_candidate): Check constraints.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-surrogate1.C: New test.
* g++.dg/cpp2a/concepts-surrogate2.C: New test.
---
  gcc/cp/call.cc   |  8 
  gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C | 12 
  gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C | 14 ++
  3 files changed, 34 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 15a3d6f2a1f..81935b83908 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2588,6 +2588,14 @@ add_conv_candidate (struct z_candidate **candidates, 
tree fn, tree obj,
if (*candidates && (*candidates)->fn == totype)
  return NULL;
  
+  if (!constraints_satisfied_p (fn))

+{
+  reason = constraint_failure ();
+  viable = 0;
+  return add_candidate (candidates, fn, obj, arglist, len, convs,
+   access_path, conversion_path, viable, reason, 
flags);
+}
+
for (i = 0; i < len; ++i)
  {
tree arg, argtype, convert_type = NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
new file mode 100644
index 000..e8481a31656
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
@@ -0,0 +1,12 @@
+// PR c++/110535
+// { dg-do compile { target c++20 } }
+
+using F = int(int);
+
+template
+struct A {
+ operator F*() requires B;
+};
+
+int i = A{}(0);  // OK
+int j = A{}(0); // { dg-error "no match" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
new file mode 100644
index 000..8bf8364beb7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
@@ -0,0 +1,14 @@
+// PR c++/110535
+// { dg-do compile { target c++20 } }
+
+using F = int(int);
+using G = long(int);
+
+template
+struct A {
+ operator F&() requires B;
+ operator G&() requires (!B);
+};
+
+int i = A{}(0);  // { dg-bogus "ambiguous" }
+int j = A{}(0); // { dg-bogus "ambiguous" }
--
2.41.0.327.gaa9166bcc0








Re: [PATCH] c++: non-standalone surrogate call template

2023-07-17 Thread Jason Merrill via Gcc-patches

On 7/12/23 14:47, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  There might be an existing PR for this issue but Bugzilla search
seems to be timing out for me currently.


OK.


-- >8 --

I noticed we were accidentally preventing ourselves from considering
a pointer/reference-to-function conversion function template if it's
not the first conversion function that's considered, which for the
testcase below resulted in us accepting the B call but not the A call
despite the only difference between A and B being the order of member
declarations.  This patch fixes this so that the outcome of overload
resolution doesn't arbitrarily depend on declaration order in this
situation.

gcc/cp/ChangeLog:

* call.cc (add_template_conv_candidate): Don't check for
non-empty 'candidates' here.
(build_op_call): Check it here, before we've considered any
conversion functions.

gcc/testsuite/ChangeLog:

* g++.dg/overload/conv-op5.C: New test.
---
  gcc/cp/call.cc   | 24 ++--
  gcc/testsuite/g++.dg/overload/conv-op5.C | 18 ++
  2 files changed, 32 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/overload/conv-op5.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 81935b83908..119063979fa 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -3709,12 +3709,6 @@ add_template_conv_candidate (struct z_candidate 
**candidates, tree tmpl,
 tree return_type, tree access_path,
 tree conversion_path, tsubst_flags_t complain)
  {
-  /* Making this work broke PR 71117 and 85118, so until the committee resolves
- core issue 2189, let's disable this candidate if there are any call
- operators.  */
-  if (*candidates)
-return NULL;
-
return
  add_template_candidate_real (candidates, tmpl, NULL_TREE, NULL_TREE,
 NULL_TREE, arglist, return_type, access_path,
@@ -5290,6 +5284,8 @@ build_op_call (tree obj, vec **args, 
tsubst_flags_t complain)
  LOOKUP_NORMAL, , complain);
  }
  
+  bool any_call_ops = candidates != nullptr;

+
convs = lookup_conversions (type);
  
for (; convs; convs = TREE_CHAIN (convs))

@@ -5306,10 +5302,18 @@ build_op_call (tree obj, vec **args, 
tsubst_flags_t complain)
  continue;
  
  	if (TREE_CODE (fn) == TEMPLATE_DECL)

- add_template_conv_candidate
-   (, fn, obj, *args, totype,
-/*access_path=*/NULL_TREE,
-/*conversion_path=*/NULL_TREE, complain);
+ {
+   /* Making this work broke PR 71117 and 85118, so until the
+  committee resolves core issue 2189, let's disable this
+  candidate if there are any call operators.  */
+   if (any_call_ops)
+ continue;
+
+   add_template_conv_candidate
+ (, fn, obj, *args, totype,
+  /*access_path=*/NULL_TREE,
+  /*conversion_path=*/NULL_TREE, complain);
+ }
else
  add_conv_candidate (, fn, obj,
  *args, /*conversion_path=*/NULL_TREE,
diff --git a/gcc/testsuite/g++.dg/overload/conv-op5.C 
b/gcc/testsuite/g++.dg/overload/conv-op5.C
new file mode 100644
index 000..b7724908b62
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/conv-op5.C
@@ -0,0 +1,18 @@
+// { dg-do compile { target c++11 } }
+
+template<class T> using F = int(*)(T);
+using G = int(*)(int*);
+
+struct A {
+  template<class T> operator F<T>();  // #1
+  operator G() = delete; // #2
+};
+
+int i = A{}(0); // selects #1
+
+struct B {
+  operator G() = delete; // #2
+  template<class T> operator F<T>();  // #1
+};
+
+int j = B{}(0); // selects #1




Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-17 Thread Andrew MacLeod via Gcc-patches


On 7/17/23 09:45, Jiufu Guo wrote:



Should we decide we would like it in general, it wouldn't be hard to add to
irange.  wi_fold() currently returns null; it could easily return a bool
indicating if an overflow happened, and wi_fold_in_parts and fold_range would
simply OR together the results of the component wi_fold() calls.  It would
require updating/auditing a number of range-op entries and adding an
overflowed_p() query to irange.

Ah, yeah - the folding APIs would be a good fit I guess.  I was
also looking to have the "new" helpers to be somewhat consistent
with the ranger API.

So if we had a fold_range overload with either an output argument
or a flag that makes it return false on possible overflow that
would work I guess?  Since we have a virtual class setup we
might be able to provide a default failing method and implement
workers for plus and mult (as needed for this patch) as the need
arises?

Thanks for your comments!
Here is a concern.  The patterns in match.pd may be supported by
'vrp' passes. At that time, the range info would be computed (via
the value-range machinery) and cached for each SSA_NAME. In the
patterns, when range_of_expr is called for a capture, the range
info is retrieved from the cache, and no need to fold_range again.
This means the overflow info may also need to be cached together
with other range info.  There may be additional memory and time
cost.



I've been thinking about this a little bit, and how to make the info 
available in a useful way.


I wonder if maybe we just add another entry point to range-ops that 
looks a bit like fold_range...


  Attached is an (untested) patch which adds overflow_free_p(op1, op2, 
relation) to range-ops.  It defaults to returning false.  If you want 
to implement it for say plus, you'd add to operator_plus in 
range-ops.cc something like


operator_plus::overflow_free_p (irange& op1, irange& op2, relation_kind)
{
   // stuff you do in plus_without_overflow
}

I added relation_kind as a param, but you can ignore it.  Maybe it won't 
ever help, but it seems like if we know there is a relation between op1 
and op2 we might be able to someday determine something else?  If 
not, remove it.


Then all you need to do to access it is to go through range_op_handler, 
so for instance:


range_op_handler (PLUS_EXPR).overflow_free_p (op1, op2)

It'll work for all types and all tree codes.  The dispatch machinery will 
return false unless both op1 and op2 are integral ranges, and then it 
will invoke the appropriate handler, defaulting to returning FALSE.


I also am not a fan of the get_range routine.  It would be better to 
generally just call range_of_expr, get the results, then handle 
undefined in the new overflow_free_p() routine and return false.  
Varying should not need anything special since it will trigger the 
overflow when you do the calculation.


The auxiliary routines could go in vr-values.h/cc.  They seem like 
things that simplify_using_ranges could utilize, and when we get to 
integrating simplify_using_ranges better, what you are doing may end up 
there anyway.


Does that work?

Andrew
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index d1c735ee6aa..f2a863db286 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -366,6 +366,24 @@ range_op_handler::op1_op2_relation (const vrange &lhs) const
 }
 }
 
+bool
+range_op_handler::overflow_free_p (const vrange &lh,
+   const vrange &rh,
+   relation_trio rel) const
+{
+  gcc_checking_assert (m_operator);
+  switch (dispatch_kind (lh, lh, rh))
+{
+  case RO_III:
+	return m_operator->overflow_free_p (as_a <irange> (lh),
+	   as_a <irange> (rh),
+	   rel);
+  default:
+	return false;
+}
+}
+
+
 
 // Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.
 
@@ -688,6 +706,13 @@ range_operator::op1_op2_relation_effect (irange &lhs_range ATTRIBUTE_UNUSED,
   return false;
 }
 
+bool
+range_operator::overflow_free_p (const irange &, const irange &,
+ relation_trio) const
+{
+  return false;
+}
+
 // Apply any known bitmask updates based on this operator.
 
 void
diff --git a/gcc/range-op.h b/gcc/range-op.h
index af94c2756a7..db3b03f28a5 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -147,6 +147,9 @@ public:
 
   virtual relation_kind op1_op2_relation (const irange &lhs) const;
   virtual relation_kind op1_op2_relation (const frange &lhs) const;
+
+  virtual bool overflow_free_p (const irange &, const irange &,
+				relation_trio = TRIO_VARYING) const;
 protected:
   // Perform an integral operation between 2 sub-ranges and return it.
   virtual void wi_fold (irange , tree type,
@@ -214,6 +217,8 @@ public:
   const vrange ,
   relation_kind = VREL_VARYING) const;
   relation_kind op1_op2_relation (const vrange &lhs) const;
+  bool overflow_free_p (const vrange &lh, const vrange &rh,
+			relation_trio = TRIO_VARYING) const;
 protected:
   unsigned dispatch_kind (const vrange &lhs, const vrange &op1,
 			  const vrange& op2) const;


Re: [PATCH v1] RISC-V: Support basic floating-point dynamic rounding mode

2023-07-17 Thread Jeff Law via Gcc-patches




On 7/16/23 19:02, juzhe.zh...@rivai.ai wrote:

LGTM

And as of today, that's all we need ;-)

Thanks,

Jeff


Re: [PATCH v3 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-17 Thread Jason Merrill via Gcc-patches

On 7/16/23 09:47, Nathaniel Shead wrote:

On Fri, Jul 14, 2023 at 11:16:58AM -0400, Jason Merrill wrote:


What if, instead of removing the variable from one hash table and adding it
to another, we change the value to, say, void_node?


I have another patch I'm working on after this which does seem to
require the overlapping tables to properly catch uses of aggregates
while they are still being constructed (i.e. before their lifetime has
begun), as part of PR c++/109518. In that case the 'values' map contains
the CONSTRUCTOR node for the aggregate, but it also needs to be in
'outside_lifetime'. I could also explore solving this another way
however if you prefer.


I'd think to handle this with a flag on the CONSTRUCTOR to indicate that 
members with no value are out of lifetime (so, a stronger version of 
CONSTRUCTOR_NO_CLEARING that just indicates uninitialized).  Currently 
all the TREE_LANG_FLAG_* are occupied on CONSTRUCTOR, but there seem to 
be plenty of spare bits to add a TREE_LANG_FLAG_7.


It might make sense to access those two flags with accessor functions so 
they stay aligned.



(I also have vague dreams of at some point making this a map to the
location that the object was destroyed for more context in the error
messages, but I'm not yet sure if that's feasible or will actually be
all that helpful so I'm happy to forgo that.)


Hmm, that sounds convenient for debugging, but affected cases would also 
be straightforward to debug by adding a run-time call, so I'm skeptical 
it would be worth the overhead for successful compiles.


Jason



Re: PR82943 - Suggested patch to fix

2023-07-17 Thread Alexander Westbrooks via Gcc-patches
Hello,

I wanted to follow up on this, and ask what the next steps would be to
incorporate this patch?

Thanks,

Alexander Westbrooks


On Thu, Jun 29, 2023 at 10:38 PM Alexander Westbrooks 
wrote:

> Hello,
>
> I have finished my testing, and updated my patch and relevant Changelogs.
> I added 4 new tests and all the existing tests in the current testsuite
> for gfortran passed or failed as expected. Do I need to attach the test
> results here?
>
> The platform I tested on was a Docker container running in Docker Desktop,
> running the "mcr.microsoft.com/devcontainers/universal:2-linux" image.
>
> I also made sure that my code changes followed the coding standards.
> Please let me know if there is anything else that I need to do. I don't
> have write-access to the repository.
>
> Thanks,
>
> Alexander
>
> On Wed, Jun 28, 2023 at 4:14 PM Harald Anlauf  wrote:
>
>> Hi Alex,
>>
>> welcome to the gfortran community.  It is great that you are trying
>> to get actively involved.
>>
>> You already did quite a few things right: patches shall be sent to
>> the gcc-patches ML, but Fortran reviewers usually notice them only
>> where they are copied to the fortran ML.
>>
>> There are some general recommendations on the formatting of C code,
>> like indentation, of the patches, and of the commit log entries.
>>
>> Regarding coding standards, see https://www.gnu.org/prep/standards/ .
>>
>> Regarding testcases, a recommendation is to have a look at
>> existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
>> decide if the testcase shall test the compile-time or run-time
>> behaviour, and add the necessary dejagnu directives.
>>
>> You should also verify if your patch passes regression testing.
>> For changes to gfortran, it is usually sufficient to run
>>
>> make check-fortran -j <N>
>>
>> where <N> is the number of parallel tests.
>> You would need to report also the platform where you tested on.
>>
>> There is also a legal issue to consider before non-trivial patches can
>> be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal
>>
>> If your patch is accepted and if you do not have write-access to the
>> repository, one of the maintainers will likely take care of it.
>> If you become a regular contributor, you will probably want to consider
>> getting write access.
>>
>> Cheers,
>> Harald
>>
>>
>>
>> On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:
>> > Hello,
>> >
>> > I am new to the GFortran community. Over the past two weeks I created a
>> > patch that should fix PR82943 for GFortran. I have attached it to this
>> > email. The patch allows the code below to compile successfully. I am
>> > working on creating test cases next, but I am new to the process so it
>> may
>> > take me some time. After I make test cases, do I email them to you as
>> well?
>> > Do I need to make a pull-request on github in order to get the patch
>> > reviewed?
>> >
>> > Thank you,
>> >
>> > Alexander Westbrooks
>> >
>> > module testmod
>> >
>> >  public :: foo
>> >
>> >  type, public :: tough_lvl_0(a, b)
>> >  integer, kind :: a = 1
>> >  integer, len :: b
>> >  contains
>> >  procedure :: foo
>> >  end type
>> >
>> >  type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
>> >  integer, len :: c
>> >  contains
>> >  procedure :: bar
>> >  end type
>> >
>> >  type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
>> >  integer, len :: d
>> >  contains
>> >  procedure :: foobar
>> >  end type
>> >
>> > contains
>> >  subroutine foo(this)
>> >  class(tough_lvl_0(1,*)), intent(inout) :: this
>> >  end subroutine
>> >
>> >  subroutine bar(this)
>> >  class(tough_lvl_1(1,*,*)), intent(inout) :: this
>> >  end subroutine
>> >
>> >  subroutine foobar(this)
>> >  class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
>> >  end subroutine
>> >
>> > end module
>> >
>> > PROGRAM testprogram
>> >  USE testmod
>> >
>> >  TYPE(tough_lvl_0(1,5)) :: test_pdt_0
>> >  TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
>> >  TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2
>> >
>> >  CALL test_pdt_0%foo()
>> >
>> >  CALL test_pdt_1%foo()
>> >  CALL test_pdt_1%bar()
>> >
>> >  CALL test_pdt_2%foo()
>> >  CALL test_pdt_2%bar()
>> >  CALL test_pdt_2%foobar()
>> >
>> >
>> > END PROGRAM testprogram
>>
>>


[PATCH] s390: Optimize vec_cmpge followed by vec_sel

2023-07-17 Thread Juergen Christ via Gcc-patches
A vec_cmpge produces a negation.  Replace this negation by swapping the two
selection choices of a vec_sel based on the result of the vec_cmpge.

Bootstrapped and regression tested on s390x.

gcc/ChangeLog:

* config/s390/vx-builtins.md: New vsel pattern.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-cmpge.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/vx-builtins.md | 11 +++
 .../gcc.target/s390/vector/vec-cmpge.c | 18 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index f4248c55d4ec..0ce3ff6ef4a6 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -530,6 +530,17 @@
   "vsel\t%v0,%1,%2,%3"
   [(set_attr "op_type" "VRR")])
 
+(define_insn "vsel_swapped<mode>"
+  [(set (match_operand:V_HW_FT   0 "register_operand" "=v")
+	(ior:V_HW_FT
+	 (and:V_HW_FT (not:V_HW_FT (match_operand:V_HW_FT 3 "register_operand" "v"))
+		      (match_operand:V_HW_FT 1 "register_operand" "v"))
+	 (and:V_HW_FT (match_dup 3)
+		      (match_operand:V_HW_FT 2 "register_operand" "v"))))]
+  "TARGET_VX"
+  "vsel\t%v0,%2,%1,%3"
+  [(set_attr "op_type" "VRR")])
+
 
 ; Vector sign extend to doubleword
 
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c 
b/gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c
new file mode 100644
index ..eb188690ae41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c
@@ -0,0 +1,18 @@
+/* Check that vec_sel absorbs a negation generated by vec_cmpge.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
+
+typedef __attribute__((vector_size(16))) unsigned char uv16qi;
+
+#include <vecintrin.h>
+
+void f(char *res, uv16qi ctrl)
+{
+  uv16qi a = vec_splat_u8(0xfe);
+  uv16qi b = vec_splat_u8(0x80);
+  uv16qi mask = vec_cmpge(ctrl, b);
+  *(uv16qi *)res = vec_sel(a, b, mask);
+}
+
+/* { dg-final { scan-assembler-not "vno\t" } } */
-- 
2.39.3



Re: [PATCH V2] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard.
>
> RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc)
> These share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc)
>
> When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS 
> (inserted after RA) ICE:
> rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
>13 | }
>   | ^
> 0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl.h:3186
> 0x1407616 wider_subreg_mode(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl.h:3252
> 0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
> 0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, 
> rtl_ssa::set_info**, bitmap_head*)
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
> 0x2a2c142 rtl_ssa::function_info::simplify_phis()
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
> 0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
> 0x1cebab9 pass_vsetvl::init()
> ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
> 0x1cec150 pass_vsetvl::execute(function*)
> ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
>
> The reason is that we have V32QImode (size = [32,0]) which is the mode set as 
> regno_reg_rtx[97]
> When the PHI input def comes from ENTRY BLOCK (index =0), the def->mode () = 
> V32QImode.
> But the phi_mode = VNx2QI for example (I use VLA mode intrinsics to write the 
> code).
> Then combine_modes report ICE.
>
> gcc/ChangeLog:
>
> * rtl-ssa/internals.inl: Fix when mode1 and mode2 are not ordred.

OK if it passes testing.

Thanks,
Richard

> ---
>  gcc/rtl-ssa/internals.inl | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/rtl-ssa/internals.inl b/gcc/rtl-ssa/internals.inl
> index 0a61811289d..e49297c12b3 100644
> --- a/gcc/rtl-ssa/internals.inl
> +++ b/gcc/rtl-ssa/internals.inl
> @@ -673,6 +673,9 @@ combine_modes (machine_mode mode1, machine_mode mode2)
>if (mode2 == E_BLKmode)
>  return mode1;
>  
> +  if (!ordered_p (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2)))
> +return BLKmode;
> +
>return wider_subreg_mode (mode1, mode2);
>  }


Re: Re: [PATCH] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread 钟居哲
Thanks so much.
It works!

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624694.html 
Is it OK?



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-17 22:31
To: Juzhe-Zhong
CC: gcc-patches
Subject: Re: [PATCH] RTL_SSA: Relax PHI_MODE in phi_setup
Juzhe-Zhong  writes:
> Hi, Richard.
>
> RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc)
> These share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc)
>
> When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS 
> (inserted after RA) ICE:
> rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
>13 | }
>   | ^
> 0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl.h:3186
> 0x1407616 wider_subreg_mode(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl.h:3252
> 0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
> 0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, 
> rtl_ssa::set_info**, bitmap_head*)
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
> 0x2a2c142 rtl_ssa::function_info::simplify_phis()
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
> 0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
> 0x1cebab9 pass_vsetvl::init()
> ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
> 0x1cec150 pass_vsetvl::execute(function*)
> ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
>
> The reason is that we have V32QImode (size = [32,0]) which is the mode set as 
> regno_reg_rtx[97]
> When the PHI input def comes from ENTRY BLOCK (index =0), the def->mode () = 
> V32QImode.
> But the phi_mode = VNx2QI for example (I use VLA mode intrinsics to write the 
> code).
> Then combine_modes report ICE.
>
> In this situation, I relax it and let it use phi_mode directly.
 
The idea is that phi_mode must be:
 
(a) big enough to store all possible inputs without losing significant bits
(b) something that occupies the right number of registers
 
I think the patch loses property (a).
 
I suppose it would be difficult to find a "real" mode that is known to
contain both V32QI and VNx2QI without losing property (b).
 
There is some support for using BLKmode as a wildcard mode for registers.
Does it work if you add:
 
  if (!ordered_p (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2)))
return BLKmode;
 
before the call to wider_subreg_mode in combine_modes?
 
Thanks,
Richard
 
>
> Is it correct ?
>
> Thanks.
>
> gcc/ChangeLog:
>
> * rtl-ssa/functions.cc (function_info::simplify_phi_setup): Relax 
> combine in PHI setup.
>
> ---
>  gcc/rtl-ssa/functions.cc | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
> index c35d25dbf8f..0793598ab1d 100644
> --- a/gcc/rtl-ssa/functions.cc
> +++ b/gcc/rtl-ssa/functions.cc
> @@ -143,7 +143,19 @@ function_info::simplify_phi_setup (phi_info *phi, 
> set_info **assumed_values,
>// If the input has a known mode (i.e. not BLKmode), make sure
>// that the phi's mode is at least as large.
>if (def)
> - phi_mode = combine_modes (phi_mode, def->mode ());
> + {
> +   /* For target like RISC-V, it applies both variable-length
> +  and fixed-length to the same REG_CLASS.
> +
> +  It will cause ICE for these 2 following cases:
> +1. phi_mode: variable-length.
> +   def->mode (): fixed-length.
> +2. phi_mode: fixed-length.
> +   def->mode (): variable-length.  */
> +   if (!(GET_MODE_SIZE (phi_mode).is_constant ()
> + ^ GET_MODE_SIZE (def->mode ()).is_constant ()))
> + phi_mode = combine_modes (phi_mode, def->mode ());
> + }
>  }
>if (phi->mode () != phi_mode)
>  phi->set_mode (phi_mode);
 


[PATCH V2] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard.

RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc)
These share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc)

When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS 
(inserted after RA) ICE:
rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
   13 | }
  | ^
0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl.h:3186
0x1407616 wider_subreg_mode(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl.h:3252
0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, 
rtl_ssa::set_info**, bitmap_head*)
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
0x2a2c142 rtl_ssa::function_info::simplify_phis()
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
0x1cebab9 pass_vsetvl::init()
../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
0x1cec150 pass_vsetvl::execute(function*)
../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

The reason is that we have V32QImode (size = [32,0]) which is the mode set as 
regno_reg_rtx[97]
When the PHI input def comes from ENTRY BLOCK (index =0), the def->mode () = 
V32QImode.
But the phi_mode = VNx2QI for example (I use VLA mode intrinsics to write the 
code).
Then combine_modes report ICE.

gcc/ChangeLog:

* rtl-ssa/internals.inl: Fix when mode1 and mode2 are not ordred.

---
 gcc/rtl-ssa/internals.inl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/rtl-ssa/internals.inl b/gcc/rtl-ssa/internals.inl
index 0a61811289d..e49297c12b3 100644
--- a/gcc/rtl-ssa/internals.inl
+++ b/gcc/rtl-ssa/internals.inl
@@ -673,6 +673,9 @@ combine_modes (machine_mode mode1, machine_mode mode2)
   if (mode2 == E_BLKmode)
 return mode1;
 
+  if (!ordered_p (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2)))
+return BLKmode;
+
   return wider_subreg_mode (mode1, mode2);
 }
 
-- 
2.36.1



Re: [PATCH] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong  writes:
> Hi, Richard.
>
> RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc)
> These share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc)
>
> When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS 
> (inserted after RA) ICE:
> rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
>13 | }
>   | ^
> 0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl.h:3186
> 0x1407616 wider_subreg_mode(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl.h:3252
> 0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
> ../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
> 0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, 
> rtl_ssa::set_info**, bitmap_head*)
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
> 0x2a2c142 rtl_ssa::function_info::simplify_phis()
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
> 0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
> ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
> 0x1cebab9 pass_vsetvl::init()
> ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
> 0x1cec150 pass_vsetvl::execute(function*)
> ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
>
> The reason is that we have V32QImode (size = [32,0]) which is the mode set as 
> regno_reg_rtx[97]
> When the PHI input def comes from ENTRY BLOCK (index =0), the def->mode () = 
> V32QImode.
> But the phi_mode = VNx2QI for example (I use VLA mode intrinsics to write the 
> code).
> Then combine_modes report ICE.
>
> In this situation, I relax it and let it use phi_mode directly.

The idea is that phi_mode must be:

(a) big enough to store all possible inputs without losing significant bits
(b) something that occupies the right number of registers

I think the patch loses property (a).

I suppose it would be difficult to find a "real" mode that is known to
contain both V32QI and VNx2QI without losing property (b).

There is some support for using BLKmode as a wildcard mode for registers.
Does it work if you add:

  if (!ordered_p (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2)))
return BLKmode;

before the call to wider_subreg_mode in combine_modes?

Thanks,
Richard

>
> Is it correct ?
>
> Thanks.
>
> gcc/ChangeLog:
>
> * rtl-ssa/functions.cc (function_info::simplify_phi_setup): Relax 
> combine in PHI setup.
>
> ---
>  gcc/rtl-ssa/functions.cc | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
> index c35d25dbf8f..0793598ab1d 100644
> --- a/gcc/rtl-ssa/functions.cc
> +++ b/gcc/rtl-ssa/functions.cc
> @@ -143,7 +143,19 @@ function_info::simplify_phi_setup (phi_info *phi, 
> set_info **assumed_values,
>// If the input has a known mode (i.e. not BLKmode), make sure
>// that the phi's mode is at least as large.
>if (def)
> - phi_mode = combine_modes (phi_mode, def->mode ());
> + {
> +   /* For target like RISC-V, it applies both variable-length
> +  and fixed-length to the same REG_CLASS.
> +
> +  It will cause ICE for these 2 following cases:
> +1. phi_mode: variable-length.
> +   def->mode (): fixed-length.
> +2. phi_mode: fixed-length.
> +   def->mode (): variable-length.  */
> +   if (!(GET_MODE_SIZE (phi_mode).is_constant ()
> + ^ GET_MODE_SIZE (def->mode ()).is_constant ()))
> + phi_mode = combine_modes (phi_mode, def->mode ());
> + }
>  }
>if (phi->mode () != phi_mode)
>  phi->set_mode (phi_mode);


Re: [PATCH] Include insn-opinit.h in PLUGIN_H [PR110610]

2023-07-17 Thread Jeff Law via Gcc-patches




On 7/17/23 05:55, Andre Vieira (lists) wrote:



On 11/07/2023 23:28, Jeff Law wrote:



On 7/11/23 04:37, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This patch fixes PR110610 by including OPTABS_H in the INTERNAL_FN_H 
list, as insn-opinit.h is now required by internal-fn.h. This will 
lead to insn-opinit.h, among the other OPTABS_H header files, being 
installed in the plugin directory.


Bootstrapped aarch64-unknown-linux-gnu.

@Jakub: could you check to see if it also addresses PR 110284?


gcc/ChangeLog:

 PR 110610
 * Makefile.in (INTERNAL_FN_H): Add OPTABS_H.
Why use OPTABS_H here?  Isn't the new dependency just on insn-opinit.h 
and insn-codes.h and neither of those #include other headers do they?





Yeah, there was no particular reason other than I just felt the Makefile 
structure sort of lend itself that way. I checked genopinit.cc and it 
seems insn-opinit.h doesn't include any other header files, only the 
sources do, so I've changed the patch to only add insn-opinit.h to 
INTERNAL_FN_H.


---

This patch fixes PR110610 by including insn-opinit.h in the 
INTERNAL_FN_H list, as insn-opinit.h is now required by internal-fn.h. 
This will lead to insn-opinit.h, among the other OPTABS_H header files, 
being installed in the plugin directory.


Bootstrapped aarch64-unknown-linux-gnu.

gcc/ChangeLog:
     PR 110610
     * Makefile.in (INTERNAL_FN_H): Add insn-opinit.h.

OK
jeff


RE: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-17 Thread Tamar Christina via Gcc-patches
I think Andrew is listed as maintainer for tree-ssa, or maybe it's on one of 
the Richards' lists?

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Philipp
> Tomsich
> Sent: Tuesday, July 11, 2023 7:51 AM
> To: Jakub Jelinek 
> Cc: gcc-patches@gcc.gnu.org; Di Zhao OS
> 
> Subject: Re: [PATCH v2] tree-optimization/110279- Check for nested FMA
> chains in reassoc
> 
> Jakub,
> 
> it looks like you did a lot of work on reassoc in the past — could you have a
> quick look and comment?
> 
> Thanks,
> Philipp.
> 
> 
> On Tue, 11 Jul 2023 at 04:59, Di Zhao OS
>  wrote:
> >
> > Attached is an updated version of the patch.
> >
> > Based on Philipp's review, some changes:
> >
> > 1. Defined new enum fma_state to describe the state of FMA candidates
> >for a list of operands. (Since the tests seems simple after the
> >change, I didn't add predicates on it.) 2. Changed return type of
> > convert_mult_to_fma_1 and convert_mult_to_fma
> >to tree, to remove the in/out parameter.
> > 3. Added description of return value values of rank_ops_for_fma.
> >
> > ---
> > gcc/ChangeLog:
> >
> > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new
> parameter
> > check_only_p. Changed return type to tree.
> > (struct fma_transformation_info): Moved to header.
> > (class fma_deferring_state): Moved to header.
> > (convert_mult_to_fma): Added new parameter check_only_p. Changed
> > return type to tree.
> > * tree-ssa-math-opts.h (struct fma_transformation_info): Moved from
> .cc.
> > (class fma_deferring_state): Moved from .cc.
> > (convert_mult_to_fma): Add function decl.
> > * tree-ssa-reassoc.cc (enum fma_state): Defined new enum to describe
> > the state of FMA candidates for a list of operands.
> > (rewrite_expr_tree_parallel): Changed boolean parameter to enum 
> > type.
> > (rank_ops_for_fma): Return enum fma_state.
> > (reassociate_bb): Avoid rewriting to parallel if nested FMAs are 
> > found.
> >
> > Thanks,
> > Di Zhao
> >
> >


Re: [PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-17 Thread Jeff Law via Gcc-patches




On 7/17/23 08:20, Juzhe-Zhong wrote:

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_option_override): Add sorry check.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/base/zvl-unimplemented-1.c: New test.
 * gcc.target/riscv/rvv/base/zvl-unimplemented-2.c: New test.

OK
jeff


[PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-17 Thread Juzhe-Zhong
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_option_override): Add sorry check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvl-unimplemented-1.c: New test.
* gcc.target/riscv/rvv/base/zvl-unimplemented-2.c: New test.

---
 gcc/config/riscv/riscv.cc | 8 
 .../gcc.target/riscv/rvv/base/zvl-unimplemented-1.c   | 4 
 .../gcc.target/riscv/rvv/base/zvl-unimplemented-2.c   | 4 
 3 files changed, 16 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6ed735d6983..82e7c27b057 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6672,6 +6672,14 @@ riscv_option_override (void)
   riscv_stack_protector_guard_offset = offs;
 }
 
+  /* FIXME: We don't allow TARGET_MIN_VLEN > 4096 since the datatypes of
+ both GET_MODE_SIZE and GET_MODE_BITSIZE are poly_uint16.
+
+ We can only allow TARGET_MIN_VLEN * 8 (LMUL) < 65535.  */
+  if (TARGET_MIN_VLEN > 4096)
+    sorry (
+      "Current RISC-V GCC can not support VLEN > 4096bit for 'V' Extension");
+
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c
new file mode 100644
index 000..03f67035ca4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl8192b -mabi=lp64d --param 
riscv-autovec-preference=fixed-vlmax" } */
+
+void foo () {} // { dg-excess-errors "sorry, unimplemented: Current RISC-V GCC 
can not support VLEN > 4096bit for 'V' Extension" }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c
new file mode 100644
index 000..075112f2f81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl8192b -mabi=lp64d --param riscv-autovec-preference=scalable" } */
+
+void foo () {} // { dg-excess-errors "sorry, unimplemented: Current RISC-V GCC can not support VLEN > 4096bit for 'V' Extension" }
-- 
2.36.1
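The arithmetic behind the 4096-bit cap in the riscv.cc hunk above can be sketched as follows (a toy check, not GCC code; the 65535 limit comes from the poly_uint16 coefficients mentioned in the FIXME comment):

```cpp
#include <cassert>
#include <cstdint>

// GET_MODE_SIZE/GET_MODE_BITSIZE use 16-bit poly coefficients, and the
// largest LMUL=8 mode needs VLEN * 8 bits, hence the "< 65535" requirement.
static bool vlen_representable (unsigned vlen_bits)
{
  return (uint64_t) vlen_bits * 8 < 65535;
}
```

A VLEN of 4096 bits needs 32768 bits for LMUL=8 and still fits; 8192 bits would need 65536 and overflows the coefficient.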




Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Richard Sandiford via Gcc-patches
Jason Merrill  writes:
> On Sun, Jul 16, 2023 at 6:50 AM Richard Sandiford 
> wrote:
>
>> Jakub Jelinek  writes:
>> > On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via
>> Gcc-patches wrote:
>> >> Summary: We'd like to be able to specify some attributes using
>> >> keywords, rather than the traditional __attribute__ or [[...]]
>> >> syntax.  Would that be OK?
>> >
>> > Will defer to C/C++ maintainers, but as you mentioned, there are many
>> > attributes which really can't be ignored and change behavior
>> significantly.
>> > vector_size is one of those, mode attribute another,
>> > no_unique_address another one (changes ABI in various cases),
>> > the OpenMP attributes (omp::directive, omp::sequence) can change
>> > behavior if -fopenmp, etc.
>> > One can easily error with
>> > #ifdef __has_cpp_attribute
>> > #if !__has_cpp_attribute (arm::whatever)
>> > #error arm::whatever attribute unsupported
>> > #endif
>> > #else
>> > #error __has_cpp_attribute unsupported
>> > #endif
>>
>> Yeah.  It's easy to detect whether a particular ACLE feature is supported,
>> since there are predefined macros for each one.  But IMO it's a failing
>> if we have to recommend that any compilation that uses arm::foo should
>> also have:
>>
>> #ifndef __ARM_FEATURE_FOO
>> #error arm::foo not supported
>> #endif
>>
>> It ought to be the compiler's job to diagnose its limitations, rather
>> than the user's.
>>
>> The feature macros are more for conditional usage of features, where
>> there's a fallback available.
>>
>> I suppose we could say that users have to include a particular header
>> file before using an attribute, and use a pragma in that header file to
>> tell the compiler to enable the attribute.  But then there would need to
>> be a separate header file for each distinct set of attributes (in terms
>> of historical timeline), which would get ugly.  I'm not sure that it's
>> better than using keywords, or even whether it's better than predefining
>> the "keyword" as a macro that expands to a compiler-internal attribute.
>>
>
> With a combination of those approaches it can be a single header:
>
> #ifdef __ARM_FEATURE_FOO
> #define __arm_foo [[arm::foo]]
> // else use of __arm_foo will fail
> #endif

If we did that, would it be a defined part of the interface that
__arm_foo expands to exactly arm::foo, rather than to an obfuscated
or compiler-dependent attribute?

In other words, would it be a case of providing both the attribute
and the macro, and leaving users to choose whether they use the
attribute directly (and run the risk of miscompilation) or whether
they use the macros, based on their risk appetite?  If so, the risk of
miscompilation is mostly borne by the people who build the deployed code
rather than the people who initially wrote it.

If instead we say that the expansion of the macros is compiler-dependent
and that the macros must always be used, then I'm not sure the header
file provides a better interface than predefining the macros in the
compiler (which was the fallback option if the keywords were rejected).

But the diagnostics using these macros would be worse than diagnostics
based on keywords, not least because the diagnostics about invalid
use of the macros (from compilers that understood them) would refer
to the underlying attribute rather than the macro.

Thanks,
Richard


Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 17 Jul 2023, Richard Sandiford via Gcc-patches wrote:

> >> There are some existing attributes that similarly affect semantics
> >> in ways that cannot be ignored.  vector_size is one obvious example.
> >> But that doesn't make it a good thing. :)
> >...
> > If you had added __arm(bar(args)) instead of __arm_bar(args) you would only
> > need one additional keyword - we could set aside a similar one for each
> > target then.  I realize that double-nesting of arguments might prove a bit
> > challenging but still.
> 
> Yeah, that would work.

So, essentially you want unignorable attributes, right?  Then implement 
exactly that: add one new keyword "__known_attribute__" (invent a better 
name, maybe :) ), semantics exactly as with __attribute__ (including using 
the same underlying lists in our data structures), with only one single 
deviation: instead of the warning you give an error for unhandled 
attributes.  Done.

(Old compilers will barf on the unknown new keyword, new compilers will 
error on unknown values used within such attribute list)
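The proposed deviation can be modeled in a few lines (a toy sketch of the semantics, not compiler code; the names are invented):

```cpp
#include <cassert>
#include <set>
#include <string>

enum attr_result { ATTR_OK, ATTR_WARN, ATTR_ERROR };

// __attribute__ merely warns about names it does not know; the proposed
// keyword would turn that same unknown-name lookup into a hard error.
static attr_result
handle_attribute (const std::set<std::string> &known,
                  const std::string &name, bool must_be_known)
{
  if (known.count (name))
    return ATTR_OK;
  return must_be_known ? ATTR_ERROR : ATTR_WARN;
}
```

Everything else — parsing, the underlying attribute lists — would be shared with the existing __attribute__ path.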

> > In any case I also think that attributes are what you want and their 
> > ugliness/issues are not worse than the ugliness/issues of the keyword 
> > approach IMHO.
> 
> I guess the ugliness of keywords is the double underscore? What are the 
> issues with the keyword approach though?

There are _always_ problems with new keywords, the more new keywords the 
more problems :-)  Is the keyword context-sensitive or not?  What about 
existing system headers that use it right now?  Is it recognized in 
free-standing or not?  Is it only recognized for some targets?  Is it 
recognized only for certain configurations of the target?

So, let's define one new mechanism, for all targets, all configs, and all 
language standards.  Let's use __attribute__ with a twist :)


Ciao,
Michael.


Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.

2023-07-17 Thread Alexander Monakov


On Mon, 17 Jul 2023, Richard Biener wrote:

> > > > > OK.   Btw, while I didn't spot this during review I would appreciate
> > > > > if the code could use vec.[q]sort, this should work with a lambda as
> > > > > well I think.
> > > >
> > > > That was my first use, but that hits
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469
> > >
> > > That is not hitting PR 99469 but rather it means your comparison is not
> > > correct for an (unstable) sort.
> > > That is qsort comparator should have this relationship `f(a,b) == !f(b, 
> > > a)` and
> > > `f(a,a)` should also return false.
> >
> > I'm using the standard std::pair comparator which indicates that f(a,a) is 
> > true,
> > https://en.cppreference.com/w/cpp/utility/pair/operator_cmp
> >
> > > If you are running into this for qsort here, you will most likely run 
> > > into issues
> > > with std::sort later on too.
> >
> > Don't see why or how. It needs to have a consistent relationship which 
> > std::pair
> > maintains.  So why would using the standard tuple comparator with a standard
> > std::sort cause problem?
> 
> At least for
> 
>  return left.second < right.second;
> 
> f(a,a) doesn't hold.  Note qsort can end up comparing an element to
> itself (not sure if GCCs implementation now can).

(it cannot but that is not important here)

Tamar, while std::sort receives a "less-than" comparison predicate, qsort
needs a tri-state comparator that returns a negative value for "less-than"
relation, positive for "greater-than", and zero when the operands compare equal.

Passing output of std::pair::operator< straight to qsort is not correct,
and qsort_chk catches that mistake at runtime.
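The contract difference can be illustrated with a toy element type (hypothetical code, not from the patch under review):

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

typedef std::pair<int, int> elem;

// Wrong for qsort: a boolean "less-than" predicate only yields 0 or 1,
// so qsort never sees the required negative "less-than" result.
static int bad_cmp (const void *pa, const void *pb)
{
  const elem *a = (const elem *) pa, *b = (const elem *) pb;
  return *a < *b;  // 0 or 1 -- violates the tri-state contract
}

// Correct: negative / zero / positive, with cmp (x, x) == 0.
static int good_cmp (const void *pa, const void *pb)
{
  const elem *a = (const elem *) pa, *b = (const elem *) pb;
  if (a->second != b->second)
    return a->second < b->second ? -1 : 1;
  return 0;
}
```

Passing bad_cmp to qsort compiles fine, which is why the mistake only shows up at runtime via qsort_chk.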

std::sort is not a stable sort and therefore can cause code generation
differences by swapping around elements that are not bitwise-identical
but "equal" according to the comparator. This is the main reason for
preferring our internal qsort, which yields the same results on all platforms.

Let me also note that #include <algorithm> is pretty heavy-weight, so
I'd suggest avoiding it to keep from needlessly increasing bootstrap times.

Alexander


Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-17 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 14 Jul 2023, Andrew MacLeod wrote:
>
>> 
>> On 7/14/23 09:37, Richard Biener wrote:
>> > On Fri, 14 Jul 2023, Aldy Hernandez wrote:
>> >
>> >> I don't know what you're trying to accomplish here, as I haven't been
>> >> following the PR, but adding all these helper functions to the ranger
>> >> header
>> >> file seems wrong, especially since there's only one use of them. I see
>> >> you're
>> >> tweaking the irange API, adding helper functions to range-op (which is 
>> >> only
>> >> for code dealing with implementing range operators for tree codes), etc
>> >> etc.
>> >>
>> >> If you need these helper functions, I suggest you put them closer to their
>> >> uses (i.e. wherever the match.pd support machinery goes).
>> > Note I suggested the opposite beacuse I thought these kind of helpers
>> > are closer to value-range support than to match.pd.
>> 
>> 
>> probably vr-values.{cc,h} and the simplify_using_ranges paradigm would be the
>> most sensible place to put these kinds of auxiliary routines?
>> 
>> 
>> >
>> > But I take away from your answer that there's nothing close in the
>> > value-range machinery that answers the question whether A op B may
>> > overflow?
>> 
>> we don't track it in ranges themselves.  During calculation of a range we
>> obviously know, but propagating that generally when we rarely care doesn't
>> seem worthwhile.  The very first generation of irange 6 years ago had an
>> overflow_p() flag, but it was removed as not being worth keeping.  Easier
>> to simply ask the question when it matters.
>> 
>> As the routines show, it's pretty easy to figure out when the need arises, so I
>> think that should suffice.  At least for now.
>> 
>> Should we decide we would like it in general, it wouldn't be hard to add to
>> irange.  wi_fold() currently returns null; it could easily return a bool
>> indicating if an overflow happened, and wi_fold_in_parts and fold_range would
>> simply OR together the results of the component wi_fold() calls.  It would
>> require updating/auditing a number of range-op entries and adding an
>> overflowed_p() query to irange.
>
> Ah, yeah - the folding APIs would be a good fit I guess.  I was
> also looking to have the "new" helpers to be somewhat consistent
> with the ranger API.
>
> So if we had a fold_range overload with either an output argument
> or a flag that makes it return false on possible overflow that
> would work I guess?  Since we have a virtual class setup we
> might be able to provide a default failing method and implement
> workers for plus and mult (as needed for this patch) as the need
> arises?

Thanks for your comments!
Here is a concern.  The patterns in match.pd may be applied during the
'vrp' passes.  At that time, the range info will have been computed (via
the value-range machinery) and cached for each SSA_NAME.  In the
patterns, when range_of_expr is called for a capture, the range
info is retrieved from the cache, with no need to call fold_range again.
This means the overflow info may also need to be cached together
with the other range info, which may add memory and compile-time
cost.

BR,
Jeff (Jiufu Guo)

>
> Thanks,
> Richard.


Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Jason Merrill via Gcc-patches
On Sun, Jul 16, 2023 at 6:50 AM Richard Sandiford 
wrote:

> Jakub Jelinek  writes:
> > On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via
> Gcc-patches wrote:
> >> Summary: We'd like to be able to specify some attributes using
> >> keywords, rather than the traditional __attribute__ or [[...]]
> >> syntax.  Would that be OK?
> >
> > Will defer to C/C++ maintainers, but as you mentioned, there are many
> > attributes which really can't be ignored and change behavior
> significantly.
> > vector_size is one of those, mode attribute another,
> > no_unique_address another one (changes ABI in various cases),
> > the OpenMP attributes (omp::directive, omp::sequence) can change
> > behavior if -fopenmp, etc.
> > One can easily error with
> > #ifdef __has_cpp_attribute
> > #if !__has_cpp_attribute (arm::whatever)
> > #error arm::whatever attribute unsupported
> > #endif
> > #else
> > #error __has_cpp_attribute unsupported
> > #endif
>
> Yeah.  It's easy to detect whether a particular ACLE feature is supported,
> since there are predefined macros for each one.  But IMO it's a failing
> if we have to recommend that any compilation that uses arm::foo should
> also have:
>
> #ifndef __ARM_FEATURE_FOO
> #error arm::foo not supported
> #endif
>
> It ought to be the compiler's job to diagnose its limitations, rather
> than the user's.
>
> The feature macros are more for conditional usage of features, where
> there's a fallback available.
>
> I suppose we could say that users have to include a particular header
> file before using an attribute, and use a pragma in that header file to
> tell the compiler to enable the attribute.  But then there would need to
> be a separate header file for each distinct set of attributes (in terms
> of historical timeline), which would get ugly.  I'm not sure that it's
> better than using keywords, or even whether it's better than predefining
> the "keyword" as a macro that expands to a compiler-internal attribute.
>

With a combination of those approaches it can be a single header:

#ifdef __ARM_FEATURE_FOO
#define __arm_foo [[arm::foo]]
// else use of __arm_foo will fail
#endif


[committed] OpenMP/Fortran: Parsing support for 'uses_allocators'

2023-07-17 Thread Tobias Burnus

Committed the attached patch as r14-2582-g89d0f082b3c95f.

This is about OpenMP's uses_allocators clause to the 'target' directive.

Using the clause with predefined allocators as list arguments is
required if those allocators are used in a target region - unless
there is an 'omp requires dynamic_allocators' in the compilation unit.

While the latter is a no-op (the requirement is fulfilled by all devices),
we still had to handle that no-op when using 'uses_allocators', which this
commit does.

However, uses_allocators also permits defining new allocators; for
those, this commit stops after parsing and resolving with a
'sorry, unimplemented'.

Support for the latter will be added together with the C/C++ support
by a re-diffed/updated version of Chung-Lin's patch at
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html

(See thread for pending review issues; the C++ member var issue
is https://gcc.gnu.org/PR110347 )

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 89d0f082b3c95f68d116d4480126e3ab7fb7f36b
Author: Tobias Burnus 
Date:   Mon Jul 17 15:13:44 2023 +0200

OpenMP/Fortran: Parsing support for 'uses_allocators'

The 'uses_allocators' clause to the 'target' construct accepts predefined
allocators and can also be used to define a new allocator for a target region.
As predefined allocators in GCC do not require special handling, those can and
are ignored after parsing, such that this feature now works. On the other hand,
defining a new allocator will fail for now with a 'sorry, unimplemented'.

Note that both the OpenMP 5.0/5.1 and 5.2 syntax for uses_allocators
is supported by this commit.

2023-07-17  Tobias Burnus  
Chung-Lin Tang  

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_namelist, show_omp_clauses): Dump
uses_allocators clause.
* gfortran.h (gfc_free_omp_namelist): Add memspace_sym to u union
and traits_sym to u2 union.
(OMP_LIST_USES_ALLOCATORS): New enum value.
(gfc_free_omp_namelist): Add 'bool free_mem_traits_space' arg.
* match.cc (gfc_free_omp_namelist): Likewise.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list,
gfc_match_omp_to_link, gfc_match_omp_doacross_sink,
gfc_match_omp_clause_reduction, gfc_match_omp_allocate,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Likewise. Parse uses_allocators clause.
(gfc_match_omp_clause_uses_allocators): New.
(enum omp_mask2): Add new OMP_CLAUSE_USES_ALLOCATORS.
(OMP_TARGET_CLAUSES): Accept it.
(resolve_omp_clauses): Resolve uses_allocators clause
* st.cc (gfc_free_statement): Update gfc_free_omp_namelist call.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle
OMP_LIST_USES_ALLOCATORS; fail with sorry unless predefined allocator.
(gfc_split_omp_clauses): Handle uses_allocators.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/uses_allocators_1.f90: New test.
* testsuite/libgomp.fortran/uses_allocators_2.f90: New test.

Co-authored-by: Chung-Lin Tang 
---
 gcc/fortran/dump-parse-tree.cc |  24 +++
 gcc/fortran/gfortran.h |   5 +-
 gcc/fortran/match.cc   |   7 +-
 gcc/fortran/openmp.cc  | 194 +++--
 gcc/fortran/st.cc  |   2 +-
 gcc/fortran/trans-openmp.cc|  11 ++
 .../libgomp.fortran/uses_allocators_1.f90  | 168 ++
 .../libgomp.fortran/uses_allocators_2.f90  |  99 +++
 8 files changed, 491 insertions(+), 19 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index effcebe9325..68122e3e6fd 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1497,6 +1497,29 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 	  case OMP_LINEAR_UVAL: fputs ("uval(", dumpfile); break;
 	  default: break;
 	  }
+  else if (list_type == OMP_LIST_USES_ALLOCATORS)
+	{
+	  if (n->u.memspace_sym)
+	{
+	  fputs ("memspace(", dumpfile);
+	  fputs (n->sym->name, dumpfile);
+	  fputc (')', dumpfile);
+	}
+	  if (n->u.memspace_sym && n->u2.traits_sym)
+	fputc (',', dumpfile);
+	  if (n->u2.traits_sym)
+	{
+	  fputs ("traits(", dumpfile);
+	  fputs (n->u2.traits_sym->name, dumpfile);
+	  fputc (')', dumpfile);
+	}
+	  if (n->u.memspace_sym || n->u2.traits_sym)
+	fputc (':', dumpfile);
+	  fputs (n->sym->name, dumpfile);
+	  

Re: Fix optimize_mask_stores profile update

2023-07-17 Thread Richard Biener via Gcc-patches



> Am 17.07.2023 um 14:38 schrieb Jan Hubicka :
> 
> 
>> 
>>> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
>>>  wrote:
>>> 
>>> Hi,
>>> While looking into sphinx3 regression I noticed that vectorizer produces
>>> BBs with overall probability count 120%.  This patch fixes it.
>>> Richi, I don't know how to create a testcase, but having one would
>>> be nice.
>>> 
>>> Bootstrapped/regtested x86_64-linux, committed last night (sorry for
>>> late email)
>> 
>> This should trigger with sth like
>> 
>>  for (i)
>>if (cond[i])
>>  out[i] = 1.;
>> 
>> so a masked store and then using AVX2+.  ISTR we disable AVX masked
>> stores on zen (but not AVX512).
> 
> OK, let me see if I can get a testcase out of that.
>>>   efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
>>>   /* Put STORE_BB to likely part.  */
>>>   efalse->probability = profile_probability::unlikely ();
>>> +  e->probability = efalse->probability.invert ();
>>>   store_bb->count = efalse->count ();
>> 
>> isn't the count also wrong?  Or rather efalse should be likely().   We're
>> testing doing
>> 
>>  if (!mask all zeros)
>>masked-store
>> 
>> because a masked store with all zero mask can end up invoking COW page fault
>> handling multiple times (because it doesn't actually write).
> 
> Hmm, I only fixed the profile, efalse was already set to unlikely, but
> indeed I think it should be likely. Maybe we can compute some bound on
> actual probability by knowing if(cond[i]) probability.
> If the loop always does factor many ones or zeros, the probability would
> remain the same.
> If that is p and they are all independent, the outcome would be
> (1-p)^factor
> 
> so we know the conditional should be in range (1-p)^factor ... (1-p),
> right?

Yes.  I think the heuristic was added for the case of bigger ranges with
all 0/1.  For a purely random one you wouldn’t expect all zeros ever in
practice.  Maybe the probability was also set with that special case in
mind (which is of course broken).

Richard 

> Honza
> 
>> 
>> Note -Ofast allows store data races and thus does RMW instead of a masked 
>> store.
>> 
>>>   make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
>>>   if (dom_info_available_p (CDI_DOMINATORS))


[PATCH] Read global value/mask in IPA.

2023-07-17 Thread Aldy Hernandez via Gcc-patches
Instead of reading the known zero bits in IPA, read the value/mask
pair which is available.

There is a slight change of behavior here.  I have removed the check
for SSA_NAME, as the ranger can calculate the range and value/mask for
INTEGER_CST.  This simplifies the code a bit, since there's no special
casing when setting the jfunc bits.  The default range for VR is
undefined, so I think it's safe just to check for undefined_p().
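For reference, a value/mask pair encodes partial bit knowledge: bits set in the mask are unknown, and where the mask is clear the corresponding bit of the value is known.  A toy sketch of those semantics (illustrative only, not GCC's irange_bitmask API):

```cpp
#include <cassert>
#include <cstdint>

// Toy value/mask pair: mask bits are unknown; where mask is 0, the
// corresponding bit of value is known.
struct bit_info {
  uint64_t value, mask;
};

// Bits known to be zero (what the old jump-function code tracked).
static uint64_t known_zero_bits (bit_info b) { return ~b.value & ~b.mask; }

// Bits that may be nonzero: unknown bits plus bits known to be one.
static uint64_t maybe_nonzero_bits (bit_info b) { return b.value | b.mask; }

// A constant is the degenerate case: no unknown bits at all.
static bit_info from_constant (uint64_t c) { return { c, 0 }; }
```

This is why the SSA_NAME special case can go away: an INTEGER_CST is just a value/mask pair with an all-zero mask.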

OK?

gcc/ChangeLog:

* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global
value/mask.
---
 gcc/ipa-prop.cc | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 5d790ff1265..4f6ed7b89bd 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
}
   else
{
- if (TREE_CODE (arg) == SSA_NAME
- && param_type
+ if (param_type
  && Value_Range::supports_type_p (TREE_TYPE (arg))
  && Value_Range::supports_type_p (param_type)
  && irange::supports_p (TREE_TYPE (arg))
@@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
gcc_assert (!jfunc->m_vr);
}
 
-  if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
- && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
+  if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ())
{
- if (TREE_CODE (arg) == SSA_NAME)
-   ipa_set_jfunc_bits (jfunc, 0,
-   widest_int::from (get_nonzero_bits (arg),
- TYPE_SIGN (TREE_TYPE (arg))));
- else
-   ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0);
+ irange &r = as_a <irange> (vr);
+ irange_bitmask bm = r.get_bitmask ();
+ signop sign = TYPE_SIGN (TREE_TYPE (arg));
+ ipa_set_jfunc_bits (jfunc,
+ widest_int::from (bm.value (), sign),
+ widest_int::from (bm.mask (), sign));
}
   else if (POINTER_TYPE_P (TREE_TYPE (arg)))
{
-- 
2.40.1



RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-17 Thread Richard Biener via Gcc-patches
On Mon, 17 Jul 2023, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, July 14, 2023 2:35 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Thu, 13 Jul 2023, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Thursday, July 13, 2023 6:31 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ;
> > j...@ventanamicro.com
> > > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > > updates for early break.
> > > >
> > > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > > The rewrite also naturally takes into account multiple exits and so 
> > > > > it didn't
> > > > > make sense to split them off.
> > > > >
> > > > > For the purposes of peeling the only change for multiple exits is 
> > > > > that the
> > > > > secondary exits are all wired to the start of the new loop preheader 
> > > > > when
> > > > doing
> > > > > epilogue peeling.
> > > > >
> > > > > When doing prologue peeling the CFG is kept in tact.
> > > > >
> > > > > For both epilogue and prologue peeling we wire through between the
> > two
> > > > loops any
> > > > > PHI nodes that escape the first loop into the second loop if 
> > > > > flow_loops is
> > > > > specified.  The reason for this conditionality is because
> > > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 
> > > > > ways:
> > > > >   - prologue peeling
> > > > >   - epilogue peeling
> > > > >   - loop distribution
> > > > >
> > > > > for the last case the loops should remain independent, and so not be
> > > > connected.
> > > > > Because of this propagation of only used phi nodes get_current_def can
> > be
> > > > used
> > > > > to easily find the previous definitions.  However live statements 
> > > > > that are
> > > > > not used inside the loop itself are not propagated (since if unused, 
> > > > > the
> > > > moment
> > > > > we add the guard in between the two loops the value across the bypass
> > edge
> > > > can
> > > > > be wrong if the loop has been peeled.)
> > > > >
> > > > > This is dealt with easily enough in find_guard_arg.
> > > > >
> > > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> > tree,
> > > > the
> > > > > moment we add the guard block we will change the dominators again.  To
> > > > deal with
> > > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> > blocks
> > > > to
> > > > > update without having to recompute the list of blocks to update again.
> > > > >
> > > > > When multiple exits and doing epilogue peeling we will also 
> > > > > temporarily
> > have
> > > > an
> > > > > incorrect VUSES chain for the secondary exits as it anticipates the 
> > > > > final
> > result
> > > > > after the VDEFs have been moved.  This will thus be corrected once the
> > code
> > > > > motion is applied.
> > > > >
> > > > > Lastly by doing things this way we can remove the helper functions 
> > > > > that
> > > > > previously did lock step iterations to update things as it went along.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > >
> > > > Not sure if I get through all of this in one go - so be prepared that
> > > > the rest of the review follows another day.
> > >
> > > No worries, I appreciate the reviews!
> > > Just giving some quick replies for when you continue.
> > 
> > Continueing.
> > 
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops 
> > > > > =
> > > > false.
> > > > >   * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when 
> > > > > exit==null.
> > > > >   * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > > additional
> > > > >   assert.
> > > > >   (vect_set_loop_condition_normal): Skip modifying loop IV for 
> > > > > multiple
> > > > >   exits.
> > > > >   (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > > > peeling.
> > > > >   (slpeel_can_duplicate_loop_p): Likewise.
> > > > >   (vect_update_ivs_after_vectorizer): Don't enter this...
> > > > >   (vect_update_ivs_after_early_break): ...but instead enter here.
> > > > >   (find_guard_arg): Update for new peeling code.
> > > > >   (slpeel_update_phi_nodes_for_loops): Remove.
> > > > >   (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > > > checks.
> > > > >   (slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > >   (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > >   * tree-vect-loop.cc 

[PATCH] RTL_SSA: Relax PHI_MODE in phi_setup

2023-07-17 Thread Juzhe-Zhong
Hi, Richard.

The RISC-V port needs to add a bunch of VLS modes (V16QI, V32QI, V64QI, etc.),
which share the same REG_CLASS with the VLA modes (VNx16QI, VNx32QI, etc.).

When I add those VLS modes, the RTL_SSA initialization in the VSETVL pass
(inserted after RA) ICEs:
rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
   13 | }
  | ^
0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl.h:3186
0x1407616 wider_subreg_mode(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl.h:3252
0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, 
rtl_ssa::set_info**, bitmap_head*)
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
0x2a2c142 rtl_ssa::function_info::simplify_phis()
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
0x1cebab9 pass_vsetvl::init()
../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
0x1cec150 pass_vsetvl::execute(function*)
../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

The reason is that we have V32QImode (size = [32,0]), which is the mode set as
regno_reg_rtx[97].
When the PHI input def comes from the ENTRY block (index = 0), def->mode () is
V32QImode, but phi_mode is e.g. VNx2QI (I wrote the test with VLA-mode
intrinsics).  combine_modes then reports an ICE.

In this situation, I relax it and let it use phi_mode directly.
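The unordered size comparison behind the ICE can be modeled with a toy version of poly-int sizes (the coefficients below are illustrative):

```cpp
#include <cassert>

// Toy poly-int size: c0 + c1 * N for a runtime indeterminate N >= 0.
// A fixed-length mode like V32QI has a constant size {32, 0}; a
// variable-length mode like VNx2QI has a size such as {2, 2}.
struct poly_size { int c0, c1; };

// a <= b holds for every N >= 0 iff it holds coefficient-wise; when
// neither direction holds, the sizes are unordered, and helpers such as
// partial_subreg_p, which assume an ordering, assert.
static bool known_le (poly_size a, poly_size b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}
```

Skipping combine_modes when exactly one of the two sizes is constant sidesteps precisely these unordered pairs.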

Is it correct ?

Thanks.

gcc/ChangeLog:

* rtl-ssa/functions.cc (function_info::simplify_phi_setup): Relax 
combine in PHI setup.

---
 gcc/rtl-ssa/functions.cc | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
index c35d25dbf8f..0793598ab1d 100644
--- a/gcc/rtl-ssa/functions.cc
+++ b/gcc/rtl-ssa/functions.cc
@@ -143,7 +143,19 @@ function_info::simplify_phi_setup (phi_info *phi, set_info 
**assumed_values,
   // If the input has a known mode (i.e. not BLKmode), make sure
   // that the phi's mode is at least as large.
   if (def)
-   phi_mode = combine_modes (phi_mode, def->mode ());
+   {
+ /* For target like RISC-V, it applies both variable-length
+and fixed-length to the same REG_CLASS.
+
+It will cause ICE for these 2 following cases:
+  1. phi_mode: variable-length.
+ def->mode (): fixed-length.
+  2. phi_mode: fixed-length.
+ def->mode (): variable-length.  */
+ if (!(GET_MODE_SIZE (phi_mode).is_constant ()
+   ^ GET_MODE_SIZE (def->mode ()).is_constant ()))
+   phi_mode = combine_modes (phi_mode, def->mode ());
+   }
 }
   if (phi->mode () != phi_mode)
 phi->set_mode (phi_mode);
-- 
2.36.1



Re: Fix optimize_mask_stores profile update

2023-07-17 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%.  This patch fixes it.
> > Richi, I don't know how to create a testcase, but having one would
> > be nice.
> >
> > Bootstrapped/regtested x86_64-linux, committed last night (sorry for
> > late email)
> 
> This should trigger with sth like
> 
>   for (i)
> if (cond[i])
>   out[i] = 1.;
> 
> so a masked store and then using AVX2+.  ISTR we disable AVX masked
> stores on zen (but not AVX512).

OK, let me see if I can get a testcase out of that.
> >efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
> >/* Put STORE_BB to likely part.  */
> >efalse->probability = profile_probability::unlikely ();
> > +  e->probability = efalse->probability.invert ();
> >store_bb->count = efalse->count ();
> 
> isn't the count also wrong?  Or rather efalse should be likely().   We're
> testing doing
> 
>   if (!mask all zeros)
> masked-store
> 
> because a masked store with all zero mask can end up invoking COW page fault
> handling multiple times (because it doesn't actually write).

Hmm, I only fixed the profile, efalse was already set to unlikely, but
indeed I think it should be likely. Maybe we can compute some bound on
actual probability by knowing if(cond[i]) probability.
If the loop always does factor many ones or zeros, the probability would
remain the same.
If that is p and they are all independent, the outcome would be
(1-p)^factor

so we know the conditional should be in range (1-p)^factor ... (1-p),
right?

Honza

> 
> Note -Ofast allows store data races and thus does RMW instead of a masked 
> store.
> 
> >make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
> >if (dom_info_available_p (CDI_DOMINATORS))
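Honza's bound above — the all-zero-mask probability lies between (1-p)^factor for independent lanes and (1-p) for fully correlated ones — can be checked numerically with a toy calculation (not GCC code):

```cpp
#include <cassert>
#include <cmath>

// With independent per-lane probability p that cond[i] is true and
// vectorization factor `factor`, the mask is all-zero with probability
// (1-p)^factor, so the masked store executes with probability
// 1 - (1-p)^factor; with fully correlated lanes it executes with
// probability p.  The true probability lies between the two.
static double store_taken_independent (double p, int factor)
{
  return 1.0 - std::pow (1.0 - p, factor);
}
```

For any p, the independent-lane probability is bounded below by p, so making efalse unlikely() (and the store likely) is the conservative choice.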


[committed] Restore bootstrap by removing unused variable in tree-ssa-loop-ivcanon.cc

2023-07-17 Thread Martin Jambor
Hi,

This restores bootstrap by removing the variable causing:

  /home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc: In function ‘bool 
try_peel_loop(loop*, edge, tree, bool, long int)’:
  /home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc:1170:17: error: 
variable ‘entry_count’ set but not used [-Werror=unused-but-set-variable]
   1170 |   profile_count entry_count = profile_count::zero ();
| ^~~
  cc1plus: all warnings being treated as errors

ACKed by Honza in a chat, passed a bootstrap on x86_64-linux, committed.

Thanks,

Martin


gcc/ChangeLog:

2023-07-17  Martin Jambor  

* tree-ssa-loop-ivcanon.cc (try_peel_loop): Remove unused variable
entry_count.
---
 gcc/tree-ssa-loop-ivcanon.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bdb738af7a8..a895e8e65be 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1167,7 +1167,6 @@ try_peel_loop (class loop *loop,
   loop->num, (int) npeel);
 }
   adjust_loop_info_after_peeling (loop, npeel, true);
-  profile_count entry_count = profile_count::zero ();
 
   bitmap_set_bit (peeled_loops, loop->num);
   return true;
-- 
2.41.0



Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-17 Thread Richard Biener via Gcc-patches
On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod  wrote:
>
> On 7/13/23 4:05 PM, Richard Biener wrote:
> > On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod  
> > wrote:
> >>
> >> On 7/3/23 1:31 PM, Richard Biener wrote:
> >>> On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  
> >>> wrote:
> 
>  On 6/29/23 6:55 PM, Richard Biener wrote:
> > On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  
> > wrote:
> >>
> >>
> >>
> >>
> >>
> >> From: Richard Biener 
> >> Date: Tuesday, June 27, 2023 at 12:58 PM
> >> To: Tejas Belagod 
> >> Cc: gcc-patches@gcc.gnu.org 
> >> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >>
> >> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  
> >> wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: Richard Biener 
> >>> Date: Monday, June 26, 2023 at 2:23 PM
> >>> To: Tejas Belagod 
> >>> Cc: gcc-patches@gcc.gnu.org 
> >>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >>>
> >>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
> >>>  wrote:
> 
>  Hi,
> 
>  Packed Boolean Vectors
>  --
> 
>  I'd like to propose a feature addition to GNU Vector extensions to 
>  add packed
>  boolean vectors (PBV).  This has been discussed in the past here[1] 
>  and a variant has
>  been implemented in Clang recently[2].
> 
>  With predication features being added to vector architectures (SVE, 
>  MVE, AVX),
>  it is a useful feature to have to model predication on targets.  
>  This could
>  find its use in intrinsics or just used as is as a GNU vector 
>  extension being
>  mapped to underlying target features.  For example, the packed 
>  boolean vector
>  could directly map to a predicate register on SVE.
> 
>  Also, this new packed boolean type GNU extension can be used with 
>  SVE ACLE
>  intrinsics to replace a fixed-length svbool_t.
> 
>  Here are a few options to represent the packed boolean vector type.
> >>>
> >>> The GIMPLE frontend uses a new 'vector_mask' attribute:
> >>>
> >>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> >>> typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> it gets you a vector type that's the appropriate (dependent on the
> >>> target) vector
> >>> mask type for the vector data type (v8si in this case).
> >>>
> >>>
> >>>
> >>> Thanks Richard.
> >>>
> >>> Having had a quick look at the implementation, it does seem to tick 
> >>> the boxes.
> >>>
> >>> I must admit I haven't dug deep, but if the target hook allows the 
> >>> mask to be
> >>>
> >>> defined in way that is target-friendly (and I don't know how much 
> >>> effort it will
> >>>
> >>> be to migrate the attribute to more front-ends), it should do the job 
> >>> nicely.
> >>>
> >>> Let me go back and dig a bit deeper and get back with questions if 
> >>> any.
> >>
> >>
> >> Let me add that the advantage of this is the compiler doesn't need
> >> to support weird explicitly laid out packed boolean vectors that do
> >> not match what the target supports and the user doesn't need to know
> >> what the target supports (and thus have an #ifdef maze around 
> >> explicitly
> >> specified layouts).
> >>
> >> Sorry for the delayed response – I spent a day experimenting with 
> >> vector_mask.
> >>
> >>
> >>
> >> Yeah, this is what option 4 in the RFC is trying to achieve – be 
> >> portable enough
> >>
> >> to avoid having to sprinkle the code with ifdefs.
> >>
> >>
> >> It does remove some flexibility though, for example with -mavx512f 
> >> -mavx512vl
> >> you'll get AVX512 style masks for V4SImode data vectors but of course 
> >> the
> >> target sill supports SSE2/AVX2 style masks as well, but those would 
> >> not be
> >> available as "packed boolean vectors", though they are of course in 
> >> fact
> >> equal to V4SImode data vectors with -1 or 0 values, so in this 
> >> particular
> >> case it might not matter.
> >>
> >> That said, the vector_mask attribute will get you V4SImode vectors with
> >> signed boolean elements of 32 bits for V4SImode data vectors with
> >> SSE2/AVX2.
> >>
> >>
> >>
> >> This sounds very much like what the scenario would be with NEON vs 
> >> SVE. Coming to think
> >>
> >> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied 
> >> by the ‘base’ vector type
> >>
> >> and a ‘w’ specified for the type.
> >>
> >>
> >>
> >> Given its current implementation, 

[RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-07-17 Thread Prathamesh Kulkarni via Gcc-patches
Hi Richard,
This is a reworking of the patch to extend fold_vec_perm to handle VLA vectors.
The attached patch unifies handling of VLS and VLA vector_csts, while
using fallback code
for ctors.

For VLS vector, the patch ignores underlying encoding, and
uses npatterns = nelts, and nelts_per_pattern = 1.

For VLA patterns, if sel has a stepped sequence, then it
only chooses elements from a particular pattern of a particular
input vector.

To make things simpler, the patch imposes following constraints:
(a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
(b) The step size for a stepped sequence is a power of 2, and
  multiple of npatterns of chosen input vector.
(c) Runtime vector length of sel is a multiple of sel_npatterns.
 So, we don't handle sel.length = 2 + 2x and npatterns = 4.

Eg:
op0, op1: npatterns = 2, nelts_per_pattern = 3
op0_len = op1_len = 16 + 16x.
sel = { 0, 0, 2, 0, 4, 0, ... }
npatterns = 2, nelts_per_pattern = 3.

For pattern {0, 2, 4, ...}
Let,
a1 = 2
S = step size = 2

Let Esel denote number of elements per pattern in sel at runtime.
Esel = (16 + 16x) / npatterns_sel
= (16 + 16x) / 2
= (8 + 8x)

So, last element of pattern:
ae = a1 + (Esel - 2) * S
 = 2 + (8 + 8x - 2) * 2
 = 14 + 16x

a1 /trunc arg0_len = 2 / (16 + 16x) = 0
ae /trunc arg0_len = (14 + 16x) / (16 + 16x) = 0
Since both are equal with quotient = 0, we select elements from op0.

Since step size (S) is a multiple of npatterns(op0), we select
all elements from same pattern of op0.

res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
   = max (2, max (2, 2))
   = 2

res_nelts_per_pattern = max (op0_nelts_per_pattern,
max (op1_nelts_per_pattern,
 sel_nelts_per_pattern))
= max (3, max (3, 3))
= 3

So res has encoding with npatterns = 2, nelts_per_pattern = 3.
res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }

Unfortunately, this results in an issue for poly_int_cst index:
For example,
op0, op1: npatterns = 1, nelts_per_pattern = 3
op0_len = op1_len = 4 + 4x

sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1

In this case,
a1 = 5 + 4x
S = (6 + 4x) - (5 + 4x) = 1
Esel = 4 + 4x

ae = a1 + (Esel - 2) * S
 = (5 + 4x) + (4 + 4x - 2) * 1
 = 7 + 8x

IIUC, 7 + 8x will always be the index of the last element of op1 ?
if x = 0, len = 4, 7 + 8x = 7
if x = 1, len = 8, 7 + 8x = 15, etc.
So the stepped sequence will always choose elements
from op1 regardless of the vector length for the above case ?

However,
ae /trunc op0_len
= (7 + 8x) / (4 + 4x)
which is not defined because 7/4 != 8/4
and we return NULL_TREE, but I suppose the expected result would be:
res: { op1[0], op1[1], op1[2], ... } ?
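The index arithmetic in the two examples above can be checked with a small numeric model (my own sketch; it evaluates the polynomial indices for concrete runtime values of x rather than using GCC's poly_int machinery):

```python
def eval_poly(p, x):
    # p = (c0, c1) represents the poly_int c0 + c1*x.
    return p[0] + p[1] * x

def runtime_trunc_div(num, den, max_x=8):
    # Truncating quotient of two poly_ints, evaluated over a range of
    # runtime x values; returns the quotient if it is the same for every
    # x, else None.
    qs = {eval_poly(num, x) // eval_poly(den, x) for x in range(max_x)}
    return qs.pop() if len(qs) == 1 else None

# First example: a1 = 2, ae = 14 + 16x, arg0_len = 16 + 16x.
# Both quotients are 0 for every x, so all elements come from op0.
q_a1 = runtime_trunc_div((2, 0), (16, 16))
q_ae = runtime_trunc_div((14, 16), (16, 16))

# Problematic case: ae = 7 + 8x, op0_len = 4 + 4x.  At runtime the
# quotient is always 1 (op1), but the component-wise test 7/4 != 8/4
# cannot prove that, so fold currently gives up and returns NULL_TREE.
q_bad = runtime_trunc_div((7, 8), (4, 4))
```

This matches the observation in the mail: the runtime quotient of (7 + 8x) / (4 + 4x) is constant even though it is not "defined" component-wise.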

The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
and on x86_64-unknown-linux-gnu.
I would be grateful for suggestions on how to proceed.

Thanks,
Prathamesh
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index a02ede79fed..8028b3e8e9a 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec-perm-indices.h"
 #include "asan.h"
 #include "gimple-range.h"
+#include 
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "print-tree.h"
 
 /* Nonzero if we are folding constants inside an initializer or a C++
manifestly-constant-evaluated context; zero otherwise.
@@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, tree expr)
 static bool
 vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
 {
-  unsigned HOST_WIDE_INT i, nunits;
+  unsigned HOST_WIDE_INT i;
 
-  if (TREE_CODE (arg) == VECTOR_CST
-  && VECTOR_CST_NELTS (arg).is_constant ())
-{
-  for (i = 0; i < nunits; ++i)
-   elts[i] = VECTOR_CST_ELT (arg, i);
-}
-  else if (TREE_CODE (arg) == CONSTRUCTOR)
+  if (TREE_CODE (arg) == CONSTRUCTOR)
 {
   constructor_elt *elt;
 
@@ -10519,6 +10517,230 @@ vec_cst_ctor_to_array (tree arg, unsigned int nelts, 
tree *elts)
   return true;
 }
 
+/* Return a vector with (NPATTERNS, NELTS_PER_PATTERN) encoding.  */
+
+static tree
+vector_cst_reshape (tree vec, unsigned npatterns, unsigned nelts_per_pattern)
+{
+  gcc_assert (pow2p_hwi (npatterns));
+
+  if (VECTOR_CST_NPATTERNS (vec) == npatterns
+  && VECTOR_CST_NELTS_PER_PATTERN (vec) == nelts_per_pattern)
+return vec;
+
+  tree v = make_vector (exact_log2 (npatterns), nelts_per_pattern);
+  TREE_TYPE (v) = TREE_TYPE (vec);
+
+  unsigned nelts = npatterns * nelts_per_pattern;
+  for (unsigned i = 0; i < nelts; i++)
+VECTOR_CST_ENCODED_ELT(v, i) = vector_cst_elt (vec, i);
+  return v;
+}
+
+/* Helper routine for fold_vec_perm_vla to check if ARG is a suitable
+   operand for VLA vec_perm folding. If arg is VLS, then set
+   

Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.

2023-07-17 Thread Richard Biener via Gcc-patches
On Mon, Jul 17, 2023 at 9:35 AM Tamar Christina  wrote:
>
> > On Mon, Jul 17, 2023 at 12:21 AM Tamar Christina via Gcc-patches  wrote:
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, July 17, 2023 7:19 AM
> > > > To: Roger Sayle 
> > > > Cc: gcc-patches@gcc.gnu.org; Tamar Christina
> > > > 
> > > > Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-
> > conv.cc.
> > > >
> > > > On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle
> > > > 
> > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5
> > > > > as
> > > > >
> > > > > the host compiler.  Ok for mainline?  [I might be missing
> > > > > something]
> > > >
> > > > OK.   Btw, while I didn't spot this during review I would appreciate
> > > > if the code could use vec.[q]sort, this should work with a lambda as
> > > > well I think.
> > >
> > > That was my first use, but that hits
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469
> >
> > That is not hitting PR 99469 but rather it means your comparison is not
> > correct for an (unstable) sort.
> > That is, a qsort comparator should have this relationship `f(a,b) == !f(b, a)` 
> > and
> > `f(a,a)` should also return false.
>
> I'm using the standard std::pair comparator which indicates that f(a,a) is 
> true,
> https://en.cppreference.com/w/cpp/utility/pair/operator_cmp
>
> > If you are running into this for qsort here, you will most likely run into 
> > issues
> > with std::sort later on too.
>
> Don't see why or how. It needs to have a consistent relationship which 
> std::pair
> maintains.  So why would using the standard tuple comparator with a standard
> std::sort cause problem?

At least for

 return left.second < right.second;

f(a,a) doesn't hold.  Note qsort can end up comparing an element to
itself (not sure
if GCC's implementation now can).
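A tiny model (my own illustration, in Python rather than the C++ under discussion) of the properties such a boolean "less" predicate must satisfy, and why comparing only the second pair member is still a valid strict weak order:

```python
def check_strict_weak_order(less, xs):
    # Requirements qsort/std::sort place on a boolean predicate:
    # irreflexivity: f(a, a) is false for every a;
    # asymmetry:     f(a, b) and f(b, a) are never both true.
    for a in xs:
        assert not less(a, a)
        for b in xs:
            assert not (less(a, b) and less(b, a))

pairs = [(0, 1), (1, 1), (2, 3)]
# Comparing only .second passes both checks: (0,1) and (1,1) are merely
# equivalent, so their relative order after sorting is unspecified --
# the "unstable sort" aspect raised in this thread.
check_strict_weak_order(lambda l, r: l[1] < r[1], pairs)
```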

Richard.

> Thanks,
> Tamar
>
> >
> > Thanks,
> > Andrew
> >
> > >
> > > Regards,
> > > Tamar


Re: [PATCH] Include insn-opinit.h in PLUGIN_H [PR110610]

2023-07-17 Thread Andre Vieira (lists) via Gcc-patches



On 11/07/2023 23:28, Jeff Law wrote:



On 7/11/23 04:37, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This patch fixes PR110610 by including OPTABS_H in the INTERNAL_FN_H 
list, as insn-opinit.h is now required by internal-fn.h. This will 
lead to insn-opinit.h, among the other OPTABS_H header files, being 
installed in the plugin directory.


Bootstrapped aarch64-unknown-linux-gnu.

@Jakub: could you check to see if it also addresses PR 110284?


gcc/ChangeLog:

 PR 110610
 * Makefile.in (INTERNAL_FN_H): Add OPTABS_H.
Why use OPTABS_H here?  Isn't the new dependency just on insn-opinit.h 
and insn-codes.h and neither of those #include other headers do they?





Yeah, there was no particular reason other than I just felt the Makefile 
structure sort of lends itself that way. I checked genopinit.cc and it 
seems insn-opinit.h doesn't include any other header files, only the 
sources do, so I've changed the patch to only add insn-opinit.h to 
INTERNAL_FN_H.


---

This patch fixes PR110610 by including insn-opinit.h in the 
INTERNAL_FN_H list, as insn-opinit.h is now required by internal-fn.h. 
This will lead to insn-opinit.h, among the other OPTABS_H header files, 
being installed in the plugin directory.


Bootstrapped aarch64-unknown-linux-gnu.

gcc/ChangeLog:
PR 110610
* Makefile.in (INTERNAL_FN_H): Add insn-opinit.h.diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 
c478ec852013eae65b9f3ec0a443e023c7d8b452..683774ad446d545362644d2dbdc37723eea55bc3
 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -976,7 +976,7 @@ READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
gtm-builtins.def sanitizer.def
 INTERNAL_FN_DEF = internal-fn.def
-INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
+INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF) insn-opinit.h
 TREE_CORE_H = tree-core.h $(CORETYPES_H) all-tree.def tree.def \
c-family/c-common.def $(lang_tree_files) \
$(BUILTINS_DEF) $(INPUT_H) statistics.h \


Re: [PATCH] Export value/mask known bits from CCP.

2023-07-17 Thread Richard Biener via Gcc-patches
On Mon, Jul 17, 2023 at 9:57 AM Aldy Hernandez via Gcc-patches
 wrote:
>
> Currently CCP throws away the known 1 bits because VRP and irange have
> traditionally only had a way of tracking known 0s (set_nonzero_bits).
> With the ability to keep all the known bits in the irange, we can now
> save this between passes.
>
> OK?

OK.

> gcc/ChangeLog:
>
> * tree-ssa-ccp.cc (ccp_finalize): Export value/mask known bits.
> ---
>  gcc/tree-ssa-ccp.cc | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 0d0f02a8442..64d5fa81334 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -1020,11 +1020,9 @@ ccp_finalize (bool nonzero_p)
>else
> {
>   unsigned int precision = TYPE_PRECISION (TREE_TYPE (val->value));
> - wide_int nonzero_bits
> -   = (wide_int::from (val->mask, precision, UNSIGNED)
> -  | wi::to_wide (val->value));
> - nonzero_bits &= get_nonzero_bits (name);
> - set_nonzero_bits (name, nonzero_bits);
> + wide_int value = wi::to_wide (val->value);
> + wide_int mask = wide_int::from (val->mask, precision, UNSIGNED);
> + set_bitmask (name, value, mask);
> }
>  }
>
> --
> 2.40.1
>
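For reference, a small model of the value/mask representation being preserved here (my own sketch; it assumes the usual CCP convention that a set mask bit means the bit is unknown/VARYING, and a clear mask bit means the bit equals the corresponding bit of value):

```python
def ccp_bits(value, mask, precision):
    full = (1 << precision) - 1
    known1 = value & ~mask & full     # bits known to be one
    known0 = ~(value | mask) & full   # bits known to be zero
    nonzero = (value | mask) & full   # all the old set_nonzero_bits kept:
                                      # bits that may possibly be one
    return known0, known1, nonzero

# value = 0b1010, mask = 0b0100: bit 2 is unknown, the rest are known.
k0, k1, nz = ccp_bits(0b1010, 0b0100, 4)
```

The old code collapsed the lattice to `mask | value` (the possibly-nonzero bits), discarding `known1`; exporting value and mask directly via set_bitmask keeps both halves.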


Re: [PATCH] RISC-V: Ensure all implied extensions are included[PR110696]

2023-07-17 Thread Lehua Ding
Committed to the trunk, thank you.


--Original--
From: "KitoCheng"

Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-17 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 17, 2023 at 10:28 AM Hongtao Liu  wrote:
>
> I'd like to ping for this patch (only patch 1/2, for patch 2/2, I
> think that may not be necessary).
>
> On Mon, May 15, 2023 at 9:20 AM Hongtao Liu  wrote:
> >
> > ping.
> >
> > On Fri, Apr 21, 2023 at 9:55 PM liuhongt  wrote:
> > >
> > > > > +  if (!TARGET_SSE2)
> > > > > +{
> > > > > +  if (c_dialect_cxx ()
> > > > > +   && cxx_dialect > cxx20)
> > > >
> > > > Formatting, both conditions are short, so just put them on one line.
> > > Changed.
> > >
> > > > But for the C++23 macros, more importantly I think we really should
> > > > also in ix86_target_macros_internal add
> > > >   if (c_dialect_cxx ()
> > > >   && cxx_dialect > cxx20
> > > >   && (isa_flag & OPTION_MASK_ISA_SSE2))
> > > > {
> > > >   def_or_undef (parse_in, "__STDCPP_FLOAT16_T__");
> > > >   def_or_undef (parse_in, "__STDCPP_BFLOAT16_T__");
> > > > }
> > > > plus associated libstdc++ changes.  It can be done incrementally though.
> > > Added in PATCH 2/2
> > >
> > > > > +  if (flag_building_libgcc)
> > > > > + {
> > > > > +   /* libbid uses __LIBGCC_HAS_HF_MODE__ and 
> > > > > __LIBGCC_HAS_BF_MODE__
> > > > > +  to check backend support of _Float16 and __bf16 type.  */
> > > >
> > > > That is actually the case only for HFmode, but not for BFmode right now.
> > > > So, we need further work.  One is to add the BFmode support in there,
> > > > and another one is make sure the _Float16 <-> _Decimal* and __bf16 <->
> > > > _Decimal* conversions are compiled in also if not -msse2 by default.
> > > > One way to do that is wrap the HF and BF mode related functions on x86
> > > > #ifndef __SSE2__ into the pragmas like intrin headers use (but then
> > > > perhaps we don't need to undef this stuff here), another is not provide
> > > > the hf/bf support in that case from the TUs where they are provided now,
> > > > but from a different one which would be compiled with -msse2.
> > > Add CFLAGS-_hf_to_sd.c += -msse2, similar for other files in libbid, just 
> > > like
> > > we did before for HFtype softfp. Then no need to undef libgcc macros.
> > >
> > > > >/* We allowed the user to turn off SSE for kernel mode.  Don't 
> > > > > crash if
> > > > >   some less clueful developer tries to use floating-point anyway. 
> > > > >  */
> > > > > -  if (needed_sseregs && !TARGET_SSE)
> > > > > +  if (needed_sseregs
> > > > > +  && (!TARGET_SSE
> > > > > +   || (VALID_SSE2_TYPE_MODE (mode)
> > > > > +   && !TARGET_SSE2)))
> > > >
> > > > Formatting, no need to split this up that much.
> > > >   if (needed_sseregs
> > > >   && (!TARGET_SSE
> > > >   || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > > or even better
> > > >   if (needed_sseregs
> > > >   && (!TARGET_SSE || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > > will do it.
> > > Changed.
> > >
> > > > Instead of this, just use
> > > >   if (!float16_type_node)
> > > > {
> > > >   float16_type_node = ix86_float16_type_node;
> > > >   callback (float16_type_node);
> > > >   float16_type_node = NULL_TREE;
> > > > }
> > > >   if (!bfloat16_type_node)
> > > > {
> > > >   bfloat16_type_node = ix86_bf16_type_node;
> > > >   callback (bfloat16_type_node);
> > > >   bfloat16_type_node = NULL_TREE;
> > > > }
> > > Changed.
> > >
> > >
> > > > > +static const char *
> > > > > +ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> > > > > +{
> > > > > +  if (element_mode (fromtype) != element_mode (totype))
> > > > > +{
> > > > > +  /* Do not allow conversions to/from BFmode/HFmode scalar types
> > > > > +  when TARGET_SSE2 is not available.  */
> > > > > +  if ((TYPE_MODE (fromtype) == BFmode
> > > > > +|| TYPE_MODE (fromtype) == HFmode)
> > > > > +   && !TARGET_SSE2)
> > > >
> > > > First of all, not really sure if this should be purely about scalar
> > > > modes, not also complex and vector modes involving those inner modes.
> > > > Because complex or vector modes with BF/HF elements will be without
> > > > TARGET_SSE2 for sure lowered into scalar code and that can't be handled
> > > > either.
> > > > So if (!TARGET_SSE2 && GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode)
> > > > or even better
> > > > if (!TARGET_SSE2 && element_mode (fromtype) == BFmode)
> > > > ?
> > > > Or even better remember the 2 modes above into machine_mode temporaries
> > > > and just use those in the != comparison and for the checks?
> > > >
> > > > Also, I think it is weird to tell user %<__bf16%> or %<_Float16%> when
> > > > we know which one it is.  Just return separate messages?
> > > Changed.
> > >
> > > > > +  /* Reject all single-operand operations on BFmode/HFmode except 
> > > > > for &
> > > > > + when TARGET_SSE2 is not available.  */
> > > > > +  if ((element_mode (type) == BFmode || element_mode (type) == 
> > > > > HFmode)
> 

Re: [PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.

2023-07-17 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 17, 2023 at 8:44 AM Hongtao Liu  wrote:
>
> Ping.
>
> On Tue, Jul 11, 2023 at 5:16 PM liuhongt via Gcc-patches
>  wrote:
> >
> > Similar like we did for CMPXCHG, but extended to all
> > ix86_comparison_int_operator since CMPCCXADD set EFLAGS exactly same
> > as CMP.
> >
> > When operand order in CMP insn is same as that in CMPCCXADD,
> > CMP insn can be eliminated directly.
> >
> > When operand order is swapped in CMP insn, only optimize
> > cmpccxadd + cmpl + jcc/setcc to cmpccxadd + jcc/setcc when FLAGS_REG is dead
> > after jcc/setcc plus adjusting code for jcc/setcc.
> >
> > gcc/ChangeLog:
> >
> > PR target/110591
> > * config/i386/sync.md (cmpccxadd_): Adjust the pattern
> > to explicitly set FLAGS_REG like *cmp_1, also add extra
> > 3 define_peephole2 after the pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr110591.c: New test.
> > * gcc.target/i386/pr110591-2.c: New test.

LGTM.

Thanks,
Uros.

> > ---
> >  gcc/config/i386/sync.md| 160 -
> >  gcc/testsuite/gcc.target/i386/pr110591-2.c |  90 
> >  gcc/testsuite/gcc.target/i386/pr110591.c   |  66 +
> >  3 files changed, 315 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr110591-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr110591.c
> >
> > diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
> > index e1fa1504deb..e84226cf895 100644
> > --- a/gcc/config/i386/sync.md
> > +++ b/gcc/config/i386/sync.md
> > @@ -1093,7 +1093,9 @@ (define_insn "cmpccxadd_"
> >   UNSPECV_CMPCCXADD))
> > (set (match_dup 1)
> > (unspec_volatile:SWI48x [(const_int 0)] UNSPECV_CMPCCXADD))
> > -   (clobber (reg:CC FLAGS_REG))]
> > +   (set (reg:CC FLAGS_REG)
> > +   (compare:CC (match_dup 1)
> > +   (match_dup 2)))]
> >"TARGET_CMPCCXADD && TARGET_64BIT"
> >  {
> >char buf[128];
> > @@ -1105,3 +1107,159 @@ (define_insn "cmpccxadd_"
> >output_asm_insn (buf, operands);
> >return "";
> >  })
> > +
> > +(define_peephole2
> > +  [(set (match_operand:SWI48x 0 "register_operand")
> > +   (match_operand:SWI48x 1 "x86_64_general_operand"))
> > +   (parallel [(set (match_dup 0)
> > +  (unspec_volatile:SWI48x
> > +[(match_operand:SWI48x 2 "memory_operand")
> > + (match_dup 0)
> > + (match_operand:SWI48x 3 "register_operand")
> > + (match_operand:SI 4 "const_int_operand")]
> > +UNSPECV_CMPCCXADD))
> > + (set (match_dup 2)
> > +  (unspec_volatile:SWI48x [(const_int 0)] 
> > UNSPECV_CMPCCXADD))
> > + (set (reg:CC FLAGS_REG)
> > +  (compare:CC (match_dup 2)
> > +  (match_dup 0)))])
> > +   (set (reg FLAGS_REG)
> > +   (compare (match_operand:SWI48x 5 "register_operand")
> > +(match_operand:SWI48x 6 "x86_64_general_operand")))]
> > +  "TARGET_CMPCCXADD && TARGET_64BIT
> > +   && rtx_equal_p (operands[0], operands[5])
> > +   && rtx_equal_p (operands[1], operands[6])"
> > +  [(set (match_dup 0)
> > +   (match_dup 1))
> > +   (parallel [(set (match_dup 0)
> > +  (unspec_volatile:SWI48x
> > +[(match_dup 2)
> > + (match_dup 0)
> > + (match_dup 3)
> > + (match_dup 4)]
> > +UNSPECV_CMPCCXADD))
> > + (set (match_dup 2)
> > +  (unspec_volatile:SWI48x [(const_int 0)] 
> > UNSPECV_CMPCCXADD))
> > + (set (reg:CC FLAGS_REG)
> > +  (compare:CC (match_dup 2)
> > +  (match_dup 0)))])
> > +   (set (match_dup 7)
> > +   (match_op_dup 8
> > + [(match_dup 9) (const_int 0)]))])
> > +
> > +(define_peephole2
> > +  [(set (match_operand:SWI48x 0 "register_operand")
> > +   (match_operand:SWI48x 1 "x86_64_general_operand"))
> > +   (parallel [(set (match_dup 0)
> > +  (unspec_volatile:SWI48x
> > +[(match_operand:SWI48x 2 "memory_operand")
> > + (match_dup 0)
> > + (match_operand:SWI48x 3 "register_operand")
> > + (match_operand:SI 4 "const_int_operand")]
> > +UNSPECV_CMPCCXADD))
> > + (set (match_dup 2)
> > +  (unspec_volatile:SWI48x [(const_int 0)] 
> > UNSPECV_CMPCCXADD))
> > + (set (reg:CC FLAGS_REG)
> > +  (compare:CC (match_dup 2)
> > +  (match_dup 0)))])
> > +   (set (reg FLAGS_REG)
> > +   (compare (match_operand:SWI48x 5 "register_operand")
> > +(match_operand:SWI48x 6 "x86_64_general_operand")))
> > +   (set (match_operand:QI 7 "nonimmediate_operand")
> > +   (match_operator:QI 8 

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-17 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it 
> > > > didn't
> > > > make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that 
> > > > the
> > > > secondary exits are all wired to the start of the new loop preheader 
> > > > when
> > > doing
> > > > epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept in tact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the
> two
> > > loops any
> > > > PHI nodes that escape the first loop into the second loop if flow_loops 
> > > > is
> > > > specified.  The reason for this conditionality is because
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 
> > > > ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > connected.
> > > > Because of this propagation of only used phi nodes get_current_def can
> be
> > > used
> > > > to easily find the previous definitions.  However live statements that 
> > > > are
> > > > not used inside the loop itself are not propagated (since if unused, the
> > > moment
> > > > we add the guard in between the two loops the value across the bypass
> edge
> > > can
> > > > be wrong if the loop has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> tree,
> > > the
> > > > moment we add the guard block we will change the dominators again.  To
> > > deal with
> > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> blocks
> > > to
> > > > update without having to recompute the list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> have
> > > an
> > > > incorrect VUSES chain for the secondary exits as it anticipates the 
> > > > final
> result
> > > > after the VDEFs have been moved.  This will thus be corrected once the
> code
> > > > motion is applied.
> > > >
> > > > Lastly by doing things this way we can remove the helper functions that
> > > > previously did lock step iterations to update things as it went along.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
> Continueing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops 
> > > > =
> > > false.
> > > > * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when 
> > > > exit==null.
> > > > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > additional
> > > > assert.
> > > > (vect_set_loop_condition_normal): Skip modifying loop IV for 
> > > > multiple
> > > > exits.
> > > > (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > > peeling.
> > > > (slpeel_can_duplicate_loop_p): Likewise.
> > > > (vect_update_ivs_after_vectorizer): Don't enter this...
> > > > (vect_update_ivs_after_early_break): ...but instead enter here.
> > > > (find_guard_arg): Update for new peeling code.
> > > > (slpeel_update_phi_nodes_for_loops): Remove.
> > > > (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > > checks.
> > > > (slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > > non_break_control_flow and early_breaks.
> > > > (vect_need_peeling_or_partial_vectors_p): Force partial vector 
> > > > if
> > > > multiple exits and VLA.
> > > > (vect_analyze_loop_form): Support inner loop multiple exits.

Re: Fix optimize_mask_stores profile update

2023-07-17 Thread Richard Biener via Gcc-patches
On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> While looking into sphinx3 regression I noticed that vectorizer produces
> BBs with overall probability count 120%.  This patch fixes it.
> Richi, I don't know how to create a testcase, but having one would
> be nice.
>
> Bootstrapped/regtested x86_64-linux, committed last night (sorry for
> late email)

This should trigger with sth like

  for (i)
if (cond[i])
  out[i] = 1.;

so a masked store and then using AVX2+.  ISTR we disable AVX masked
stores on zen (but not AVX512).

Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/110649
> * tree-vect-loop.cc (optimize_mask_stores): Set correctly
> probability of the if-then-else construct.
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 7d917bfd72c..b44fb9c7712 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -11680,6 +11679,7 @@ optimize_mask_stores (class loop *loop)
>efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
>/* Put STORE_BB to likely part.  */
>efalse->probability = profile_probability::unlikely ();
> +  e->probability = efalse->probability.invert ();
>store_bb->count = efalse->count ();

isn't the count also wrong?  Or rather efalse should be likely().   We're
testing doing

  if (!mask all zeros)
masked-store

because a masked store with all zero mask can end up invoking COW page fault
handling multiple times (because it doesn't actually write).

Note -Ofast allows store data races and thus does RMW instead of a masked store.
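A small executable model of the two store forms (my own sketch, not GCC code):

```python
def masked_store(out, mask, val):
    # True masked store: write only the active lanes.
    for i, m in enumerate(mask):
        if m:
            out[i] = val

def rmw_blend_store(out, mask, val):
    # RMW form (valid when store data races are allowed, e.g. -Ofast):
    # read every lane, blend, and write every lane back unconditionally.
    for i, m in enumerate(mask):
        out[i] = val if m else out[i]

a = [0.0, 2.0, 0.0, 4.0]
b = list(a)
mask = [False, True, False, True]
masked_store(a, mask, 1.0)
rmw_blend_store(b, mask, 1.0)
```

Single-threaded the results are identical; the RMW form additionally writes the inactive lanes (the data race -Ofast permits) and thus sidesteps the repeated COW page-fault problem an all-zero-mask masked store can hit.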

>make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
>if (dom_info_available_p (CDI_DOMINATORS))


Avoid double profile udpate in try_peel_loop

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi,
try_peel_loop uses gimple_duplicate_loop_body_to_header_edge, which subtracts
the profile from the original loop.  However, it then tries to scale the
profile in a wrong way (it forces the header count to be the entry count).

This eliminates two profile misupdates in the internal loop of sphinx3.

gcc/ChangeLog:

PR middle-end/110649
* tree-ssa-loop-ivcanon.cc (try_peel_loop): Avoid double profile update.

diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index 0117dbfc91b..bdb738af7a8 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1152,6 +1152,7 @@ try_peel_loop (class loop *loop,
 }
   if (may_be_zero)
 bitmap_clear_bit (wont_exit, 1);
+
   if (!gimple_duplicate_loop_body_to_header_edge (
loop, loop_preheader_edge (loop), npeel, wont_exit, exit,
_to_remove, DLTHE_FLAG_UPDATE_FREQ))
@@ -1168,18 +1169,6 @@ try_peel_loop (class loop *loop,
   adjust_loop_info_after_peeling (loop, npeel, true);
   profile_count entry_count = profile_count::zero ();
 
-  edge e;
-  edge_iterator ei;
-  FOR_EACH_EDGE (e, ei, loop->header->preds)
-if (e->src != loop->latch)
-  {
-   if (e->src->count.initialized_p ())
- entry_count += e->src->count;
-   gcc_assert (!flow_bb_inside_loop_p (loop, e->src));
-  }
-  profile_probability p;
-  p = entry_count.probability_in (loop->header->count);
-  scale_loop_profile (loop, p, -1);
   bitmap_set_bit (peeled_loops, loop->num);
   return true;
 }


Fix profile update in scale_profile_for_vect_loop

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi,
when vectorizing 4 times, we sometimes do
  for
<4x vectorized body>
  for
<2x vectorized body>
  for
<1x vectorized body>

Here the last two fors, handling the epilogue, never iterate.
Currently the vectorizer thinks that the middle for iterates twice.
This turns out to be caused by scale_profile_for_vect_loop using
niter_for_unrolled_loop.

At that time we know the epilogue will iterate at most 2 times,
but niter_for_unrolled_loop does not know that the last iteration
will be taken by the epilogue-of-epilogue and thus it thinks
that the loop may iterate once and exit in the middle of the second
iteration.

We already do the correct job updating the niter bounds; this is
just an ordering issue.  This patch makes us first update
the bounds and then do the updating of the loop.  I re-implemented
the function more correctly and precisely.

The loop reducing the iteration factor for overly flat profiles is a bit funny,
but the only other method I can think of is to compute an sreal scale, which
would have similar overhead, I think.
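That guard can be modeled in isolation (profile counts reduced to plain integers for illustration): when the profile is unreliable and the header count falls between the entry count and entry * vf, VF is halved until scaling can no longer push the entry count below the header count.

```cpp
#include <cassert>

// Standalone model of the VF-reduction loop from the patch; profile
// counts are plain integers here instead of profile_count.
unsigned reduce_vf (unsigned vf, long header_count, long entry_count)
{
  while (vf > 1
         && header_count > entry_count
         && header_count < entry_count * (long) vf)
    vf /= 2;
  return vf;
}
```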

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR middle-end/110649
* tree-vect-loop.cc (scale_profile_for_vect_loop):
(vect_transform_loop):
(optimize_mask_stores):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10842,31 +10842,30 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
gimple_stmt_iterator *gsi,
 static void
 scale_profile_for_vect_loop (class loop *loop, unsigned vf)
 {
-  edge preheader = loop_preheader_edge (loop);
-  /* Reduce loop iterations by the vectorization factor.  */
-  gcov_type new_est_niter = niter_for_unrolled_loop (loop, vf);
-  profile_count freq_h = loop->header->count, freq_e = preheader->count ();
-
-  if (freq_h.nonzero_p ())
-{
-  profile_probability p;
-
-  /* Avoid dropping loop body profile counter to 0 because of zero count
-in loop's preheader.  */
-  if (!(freq_e == profile_count::zero ()))
-freq_e = freq_e.force_nonzero ();
-  p = (freq_e * (new_est_niter + 1)).probability_in (freq_h);
-  scale_loop_frequencies (loop, p);
-}
-
+  /* Loop body executes VF fewer times and exit increases VF times.  */
   edge exit_e = single_exit (loop);
-  exit_e->probability = profile_probability::always () / (new_est_niter + 1);
-
-  edge exit_l = single_pred_edge (loop->latch);
-  profile_probability prob = exit_l->probability;
-  exit_l->probability = exit_e->probability.invert ();
-  if (prob.initialized_p () && exit_l->probability.initialized_p ())
-scale_bbs_frequencies (>latch, 1, exit_l->probability / prob);
+  profile_count entry_count = loop_preheader_edge (loop)->count ();
+
+  /* If we have unreliable loop profile avoid dropping entry
+ count bellow header count.  This can happen since loops
+ has unrealistically low trip counts.  */
+  while (vf > 1
+&& loop->header->count > entry_count
+&& loop->header->count < entry_count * vf)
+vf /= 2;
+
+  if (entry_count.nonzero_p ())
+set_edge_probability_and_rescale_others
+   (exit_e,
+entry_count.probability_in (loop->header->count / vf));
+  /* Avoid producing very large exit probability when we do not have
+ sensible profile.  */
+  else if (exit_e->probability < profile_probability::always () / (vf * 2))
+set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
+  loop->latch->count = single_pred_edge (loop->latch)->count ();
+
+  scale_loop_profile (loop, profile_probability::always () / vf,
+ get_likely_max_loop_iterations_int (loop));
 }
 
 /* For a vectorized stmt DEF_STMT_INFO adjust all vectorized PHI
@@ -11476,7 +11475,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   niters_vector_mult_vf, !niters_no_overflow);
 
   unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
-  scale_profile_for_vect_loop (loop, assumed_vf);
 
   /* True if the final iteration might not handle a full vector's
  worth of scalar iterations.  */
@@ -11547,6 +11545,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
  assumed_vf) - 1
 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
   assumed_vf) - 1);
+  scale_profile_for_vect_loop (loop, assumed_vf);
 
   if (dump_enabled_p ())
 {


Fix optimize_mask_stores profile update

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi,
While looking into the sphinx3 regression I noticed that the vectorizer produces
BBs with an overall probability count of 120%.  This patch fixes it.
Richi, I don't know how to create a testcase, but having one would
be nice.

Bootstrapped/regtested x86_64-linux, committed last night (sorry for
late email)

gcc/ChangeLog:

PR tree-optimization/110649
* tree-vect-loop.cc (optimize_mask_stores): Set correctly
probability of the if-then-else construct.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11680,6 +11679,7 @@ optimize_mask_stores (class loop *loop)
   efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
   /* Put STORE_BB to likely part.  */
   efalse->probability = profile_probability::unlikely ();
+  e->probability = efalse->probability.invert ();
   store_bb->count = efalse->count ();
   make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
   if (dom_info_available_p (CDI_DOMINATORS))


Re: [PATCH] RISC-V: Ensure all implied extensions are included[PR110696]

2023-07-17 Thread Kito Cheng via Gcc-patches
LGTM, thanks for the patch :)

On Mon, Jul 17, 2023 at 5:53 PM Lehua Ding  wrote:
>
> Hi,
>
> This patch fix target/PR110696, recursively add all implied extensions.
>
> Best,
> Lehua
>
> PR target/110696
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc 
> (riscv_subset_list::handle_implied_ext): recur add all implied extensions.
> (riscv_subset_list::check_implied_ext): Add new method.
> (riscv_subset_list::parse): Call checker check_implied_ext.
> * config/riscv/riscv-subset.h: Add new method.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/attribute-20.c: New test.
> * gcc.target/riscv/pr110696.c: New test.
>
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 33 +--
>  gcc/config/riscv/riscv-subset.h   |  3 +-
>  gcc/testsuite/gcc.target/riscv/attribute-20.c |  7 
>  gcc/testsuite/gcc.target/riscv/pr110696.c |  7 
>  4 files changed, 46 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-20.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110696.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 28c8f0c1489..19075c0b241 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -949,14 +949,14 @@ riscv_subset_list::parse_std_ext (const char *p)
>
>  /* Check any implied extensions for EXT.  */
>  void
> -riscv_subset_list::handle_implied_ext (riscv_subset_t *ext)
> +riscv_subset_list::handle_implied_ext (const char *ext)
>  {
>const riscv_implied_info_t *implied_info;
>for (implied_info = _implied_info[0];
> implied_info->ext;
> ++implied_info)
>  {
> -  if (strcmp (ext->name.c_str (), implied_info->ext) != 0)
> +  if (strcmp (ext, implied_info->ext) != 0)
> continue;
>
>/* Skip if implied extension already present.  */
> @@ -966,6 +966,9 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t 
> *ext)
>/* Version of implied extension will get from current ISA spec
>  version.  */
>add (implied_info->implied_ext, true);
> +
> +  /* Recursively add implied extension by implied_info->implied_ext.  */
> +  handle_implied_ext (implied_info->implied_ext);
>  }
>
>/* For RISC-V ISA version 2.2 or earlier version, zicsr and zifence is
> @@ -980,6 +983,27 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t 
> *ext)
>  }
>  }
>
> +/* Check that all implied extensions are included.  */
> +bool
> +riscv_subset_list::check_implied_ext ()
> +{
> +  riscv_subset_t *itr;
> +  for (itr = m_head; itr != NULL; itr = itr->next)
> +{
> +  const riscv_implied_info_t *implied_info;
> +  for (implied_info = _implied_info[0]; implied_info->ext;
> +  ++implied_info)
> +   {
> + if (strcmp (itr->name.c_str(), implied_info->ext) != 0)
> +   continue;
> +
> + if (!lookup (implied_info->implied_ext))
> +   return false;
> +   }
> +}
> +  return true;
> +}
> +
>  /* Check any combine extensions for EXT.  */
>  void
>  riscv_subset_list::handle_combine_ext ()
> @@ -1194,9 +1218,12 @@ riscv_subset_list::parse (const char *arch, location_t 
> loc)
>
>for (itr = subset_list->m_head; itr != NULL; itr = itr->next)
>  {
> -  subset_list->handle_implied_ext (itr);
> +  subset_list->handle_implied_ext (itr->name.c_str ());
>  }
>
> +  /* Make sure all implied extensions are included. */
> +  gcc_assert (subset_list->check_implied_ext ());
> +
>subset_list->handle_combine_ext ();
>
>if (subset_list->lookup ("zfinx") && subset_list->lookup ("f"))
> diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
> index 92e4fb31692..84a7a82db63 100644
> --- a/gcc/config/riscv/riscv-subset.h
> +++ b/gcc/config/riscv/riscv-subset.h
> @@ -67,7 +67,8 @@ private:
>const char *parse_multiletter_ext (const char *, const char *,
>  const char *);
>
> -  void handle_implied_ext (riscv_subset_t *);
> +  void handle_implied_ext (const char *);
> +  bool check_implied_ext ();
>void handle_combine_ext ();
>
>  public:
> diff --git a/gcc/testsuite/gcc.target/riscv/attribute-20.c 
> b/gcc/testsuite/gcc.target/riscv/attribute-20.c
> new file mode 100644
> index 000..f7d0b29b71c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/attribute-20.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl65536b -mabi=lp64d" } */
> +int foo()
> +{
> +}
> +
> +/* { dg-final { scan-assembler ".attribute arch, 
> \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl16384b1p0_zvl2048b1p0_zvl256b1p0_zvl32768b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0_zvl65536b1p0_zvl8192b1p0\""
>  } } */
> diff --git 

RE: [PATCH V2] RISC-V: Support non-SLP unordered reduction

2023-07-17 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Monday, July 17, 2023 5:33 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; pal...@dabbelt.com; 
pal...@rivosinc.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH V2] RISC-V: Support non-SLP unordered reduction

LGTM, thanks :)

On Mon, Jul 17, 2023 at 4:20 PM Juzhe-Zhong  wrote:
>
> This patch add reduc_*_scal to support reduction auto-vectorization.
>
> Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.
>
> Consider this following case:
> int __attribute__((noipa))
> and_loop (int32_t * __restrict x,
> int32_t n, int res)
> {
>   for (int i = 0; i < n; ++i)
> res &= x[i];
>   return res;
> }
>
> ASM:
> and_loop:
> ble a1,zero,.L4
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.i v1,-1
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma   > MUST BE "TU".
> sllia4,a5,2
> sub a1,a1,a5
> vle32.v v2,0(a0)
> add a0,a0,a4
> vand.vv v1,v2,v1
> bne a1,zero,.L3
> vsetivlizero,1,e32,m1,ta,ma
> vmv.v.i v2,-1
> vsetvli a3,zero,e32,m1,ta,ma
> vredand.vs  v1,v1,v2
> vmv.x.s a5,v1
> and a0,a2,a5
> ret
> .L4:
> mv  a0,a2
> ret
>
> Fix bug of VSETVL PASS which is caused by reduction testcase.
>
> SLP reduction and floating-point in-order reduction are not supported yet.
>
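A scalar model of what the loop above computes (my own sketch, with the lane count fixed at 4 for illustration): per-lane partial AND accumulators whose inactive tail lanes keep their previous value — the "tu" tail-undisturbed policy in the asm — followed by a final cross-lane reduction, matching the vand.vv/vredand.vs sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Scalar model of the strip-mined AND reduction: partial results are
// kept per lane (4 lanes here for illustration), inactive tail lanes
// keep their previous value (the "tu" policy), and a final cross-lane
// AND produces the scalar result, like vredand.vs in the quoted asm.
int32_t and_loop_model (const int32_t *x, std::size_t n, int32_t res)
{
  int32_t acc[4] = {-1, -1, -1, -1};   // vmv.v.i v1,-1
  for (std::size_t i = 0; i < n; ++i)
    acc[i % 4] &= x[i];                // vand.vv on the active lanes
  int32_t r = -1;                      // start value for the reduction
  for (int lane = 0; lane < 4; ++lane)
    r &= acc[lane];                    // vredand.vs
  return res & r;
}
```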
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (reduc_plus_scal_): New pattern.
> (reduc_smax_scal_): Ditto.
> (reduc_umax_scal_): Ditto.
> (reduc_smin_scal_): Ditto.
> (reduc_umin_scal_): Ditto.
> (reduc_and_scal_): Ditto.
> (reduc_ior_scal_): Ditto.
> (reduc_xor_scal_): Ditto.
> * config/riscv/riscv-protos.h (enum insn_type): Add reduction.
> (expand_reduction): New function.
> * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
> (emit_vlmax_fp_reduction_insn): Ditto.
> (get_m1_mode): Ditto.
> (expand_cond_len_binop): Fix name.
> (expand_reduction): New function
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix VSETVL BUG.
> (validate_change_or_fail): New function.
> (change_insn): Fix VSETVL BUG.
> (change_vsetvl_insn): Ditto.
> (pass_vsetvl::backward_demand_fusion): Ditto.
> (pass_vsetvl::df_post_optimization): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 138 ++
>  gcc/config/riscv/riscv-protos.h   |   2 +
>  gcc/config/riscv/riscv-v.cc   |  84 ++-
>  gcc/config/riscv/riscv-vsetvl.cc  |  57 ++--
>  .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
>  .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
>  .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
>  .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
>  .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
>  .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
>  .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
>  .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
>  13 files changed, 887 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 64a41bd7101..8cdec75bacf 100644
> --- a/gcc/config/riscv/autovec.md
> +++ 

[PATCH] RISC-V: Ensure all implied extensions are included[PR110696]

2023-07-17 Thread Lehua Ding
Hi,

This patch fixes target/PR110696 by recursively adding all implied extensions.

Best,
Lehua
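
The recursion the patch adds can be sketched standalone like this (the implication table below is a toy illustration, not the real riscv_implied_info contents): enabling an extension adds what it implies, then recurses on each newly added extension so transitive implications are also included.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of the recursive implied-extension closure: when an
// extension is enabled, add its implied extensions and recurse on
// each one that was not already present.
using ImpliedTable = std::map<std::string, std::vector<std::string>>;

void handle_implied_ext (const std::string &ext, const ImpliedTable &table,
                         std::set<std::string> &enabled)
{
  auto it = table.find (ext);
  if (it == table.end ())
    return;
  for (const std::string &implied : it->second)
    if (enabled.insert (implied).second)             // skip if present
      handle_implied_ext (implied, table, enabled);  // recurse
}
```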

PR target/110696

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc 
(riscv_subset_list::handle_implied_ext): Recursively add all implied extensions.
(riscv_subset_list::check_implied_ext): Add new method.
(riscv_subset_list::parse): Call checker check_implied_ext.
* config/riscv/riscv-subset.h: Add new method.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-20.c: New test.
* gcc.target/riscv/pr110696.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc   | 33 +--
 gcc/config/riscv/riscv-subset.h   |  3 +-
 gcc/testsuite/gcc.target/riscv/attribute-20.c |  7 
 gcc/testsuite/gcc.target/riscv/pr110696.c |  7 
 4 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110696.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 28c8f0c1489..19075c0b241 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -949,14 +949,14 @@ riscv_subset_list::parse_std_ext (const char *p)
 
 /* Check any implied extensions for EXT.  */
 void
-riscv_subset_list::handle_implied_ext (riscv_subset_t *ext)
+riscv_subset_list::handle_implied_ext (const char *ext)
 {
   const riscv_implied_info_t *implied_info;
   for (implied_info = _implied_info[0];
implied_info->ext;
++implied_info)
 {
-  if (strcmp (ext->name.c_str (), implied_info->ext) != 0)
+  if (strcmp (ext, implied_info->ext) != 0)
continue;
 
   /* Skip if implied extension already present.  */
@@ -966,6 +966,9 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t *ext)
   /* Version of implied extension will get from current ISA spec
 version.  */
   add (implied_info->implied_ext, true);
+
+  /* Recursively add implied extension by implied_info->implied_ext.  */
+  handle_implied_ext (implied_info->implied_ext);
 }
 
   /* For RISC-V ISA version 2.2 or earlier version, zicsr and zifence is
@@ -980,6 +983,27 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t *ext)
 }
 }
 
+/* Check that all implied extensions are included.  */
+bool
+riscv_subset_list::check_implied_ext ()
+{
+  riscv_subset_t *itr;
+  for (itr = m_head; itr != NULL; itr = itr->next)
+{
+  const riscv_implied_info_t *implied_info;
+  for (implied_info = _implied_info[0]; implied_info->ext;
+  ++implied_info)
+   {
+ if (strcmp (itr->name.c_str(), implied_info->ext) != 0)
+   continue;
+
+ if (!lookup (implied_info->implied_ext))
+   return false;
+   }
+}
+  return true;
+}
+
 /* Check any combine extensions for EXT.  */
 void
 riscv_subset_list::handle_combine_ext ()
@@ -1194,9 +1218,12 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
 
   for (itr = subset_list->m_head; itr != NULL; itr = itr->next)
 {
-  subset_list->handle_implied_ext (itr);
+  subset_list->handle_implied_ext (itr->name.c_str ());
 }
 
+  /* Make sure all implied extensions are included. */
+  gcc_assert (subset_list->check_implied_ext ());
+
   subset_list->handle_combine_ext ();
 
   if (subset_list->lookup ("zfinx") && subset_list->lookup ("f"))
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index 92e4fb31692..84a7a82db63 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -67,7 +67,8 @@ private:
   const char *parse_multiletter_ext (const char *, const char *,
 const char *);
 
-  void handle_implied_ext (riscv_subset_t *);
+  void handle_implied_ext (const char *);
+  bool check_implied_ext ();
   void handle_combine_ext ();
 
 public:
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-20.c 
b/gcc/testsuite/gcc.target/riscv/attribute-20.c
new file mode 100644
index 000..f7d0b29b71c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-20.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl65536b -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl16384b1p0_zvl2048b1p0_zvl256b1p0_zvl32768b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0_zvl65536b1p0_zvl8192b1p0\""
 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr110696.c 
b/gcc/testsuite/gcc.target/riscv/pr110696.c
new file mode 100644
index 000..a630f04e74f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110696.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl4096b -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-final { scan-assembler 

Re: [PATCH] Export value/mask known bits from IPA.

2023-07-17 Thread Martin Jambor
Hi Aldy,

On Mon, Jul 17 2023, Aldy Hernandez wrote:
> Currently IPA throws away the known 1 bits because VRP and irange have
> traditionally only had a way of tracking known 0s (set_nonzero_bits).
> With the ability to keep all the known bits in the irange, we can now
> save this between passes.
>
> OK?
>
> gcc/ChangeLog:
>
>   * ipa-prop.cc (ipcp_update_bits): Export value/mask known bits.

OK, thanks.

Martin
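
For context, the value/mask pair that set_bitmask preserves can be modeled like this (my own sketch of the bitmask convention, where a mask bit of 1 means "unknown"):

```cpp
#include <cassert>
#include <cstdint>

// Model of a value/mask known-bits pair: MASK bit 1 = bit unknown,
// MASK bit 0 = bit known, with VALUE supplying the known bit values.
// set_nonzero_bits could only record known zeros, so the known ones
// (value & ~mask) were previously lost between passes.
struct bit_info_model
{
  uint64_t value;
  uint64_t mask;
  uint64_t known_ones ()  const { return value & ~mask; }
  uint64_t known_zeros () const { return ~value & ~mask; }
};
```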


> ---
>  gcc/ipa-prop.cc | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index d2b998f8af5..5d790ff1265 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -5853,10 +5853,9 @@ ipcp_update_bits (struct cgraph_node *node, 
> ipcp_transformation *ts)
>   {
> unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef));
> signop sgn = TYPE_SIGN (TREE_TYPE (ddef));
> -
> -   wide_int nonzero_bits = wide_int::from (bits[i]->mask, prec, UNSIGNED)
> -   | wide_int::from (bits[i]->value, prec, sgn);
> -   set_nonzero_bits (ddef, nonzero_bits);
> +   wide_int mask = wide_int::from (bits[i]->mask, prec, UNSIGNED);
> +   wide_int value = wide_int::from (bits[i]->value, prec, sgn);
> +   set_bitmask (ddef, value, mask);
>   }
>else
>   {
> -- 
> 2.40.1


Re: [PATCH V2] RISC-V: Support non-SLP unordered reduction

2023-07-17 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Mon, Jul 17, 2023 at 4:20 PM Juzhe-Zhong  wrote:
>
> This patch add reduc_*_scal to support reduction auto-vectorization.
>
> Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.
>
> Consider this following case:
> int __attribute__((noipa))
> and_loop (int32_t * __restrict x,
> int32_t n, int res)
> {
>   for (int i = 0; i < n; ++i)
> res &= x[i];
>   return res;
> }
>
> ASM:
> and_loop:
> ble a1,zero,.L4
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.i v1,-1
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma   > MUST BE "TU".
> sllia4,a5,2
> sub a1,a1,a5
> vle32.v v2,0(a0)
> add a0,a0,a4
> vand.vv v1,v2,v1
> bne a1,zero,.L3
> vsetivlizero,1,e32,m1,ta,ma
> vmv.v.i v2,-1
> vsetvli a3,zero,e32,m1,ta,ma
> vredand.vs  v1,v1,v2
> vmv.x.s a5,v1
> and a0,a2,a5
> ret
> .L4:
> mv  a0,a2
> ret
>
> Fix bug of VSETVL PASS which is caused by reduction testcase.
>
> SLP reduction and floating-point in-order reduction are not supported yet.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (reduc_plus_scal_): New pattern.
> (reduc_smax_scal_): Ditto.
> (reduc_umax_scal_): Ditto.
> (reduc_smin_scal_): Ditto.
> (reduc_umin_scal_): Ditto.
> (reduc_and_scal_): Ditto.
> (reduc_ior_scal_): Ditto.
> (reduc_xor_scal_): Ditto.
> * config/riscv/riscv-protos.h (enum insn_type): Add reduction.
> (expand_reduction): New function.
> * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
> (emit_vlmax_fp_reduction_insn): Ditto.
> (get_m1_mode): Ditto.
> (expand_cond_len_binop): Fix name.
> (expand_reduction): New function
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix VSETVL BUG.
> (validate_change_or_fail): New function.
> (change_insn): Fix VSETVL BUG.
> (change_vsetvl_insn): Ditto.
> (pass_vsetvl::backward_demand_fusion): Ditto.
> (pass_vsetvl::df_post_optimization): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 138 ++
>  gcc/config/riscv/riscv-protos.h   |   2 +
>  gcc/config/riscv/riscv-v.cc   |  84 ++-
>  gcc/config/riscv/riscv-vsetvl.cc  |  57 ++--
>  .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
>  .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
>  .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
>  .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
>  .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
>  .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
>  .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
>  .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
>  13 files changed, 887 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 64a41bd7101..8cdec75bacf 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1554,3 +1554,141 @@
>riscv_vector::expand_cond_len_ternop (icode, operands);
>DONE;
>  })
> +
> +;; =
> +;; == Reductions
> +;; =
> +
> +;; -
> 

Re: [PATCH] riscv: Fix warning in riscv_regno_ok_for_index_p

2023-07-17 Thread Kito Cheng via Gcc-patches
pushed, thanks :)

On Mon, Jul 17, 2023 at 4:59 PM Christoph Muellner
 wrote:
>
> From: Christoph Müllner 
>
> The variable `regno` is currently not used in riscv_regno_ok_for_index_p(),
> which triggers a compiler warning. Let's address this.
>
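The fix relies on a standard C++ idiom: a parameter left unnamed in the definition keeps the function's signature but cannot trigger -Wunused-parameter. A minimal illustration (the demo function name is mine):

```cpp
#include <cassert>

// Leaving the parameter unnamed keeps the int argument in the
// signature while silencing -Wunused-parameter, mirroring the patch.
int regno_ok_for_index_demo (int /* regno */)
{
  return 0;
}
```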
> Fixes: 423604278ed5 ("riscv: Prepare backend for index registers")
>
> Reported-by: Juzhe Zhong 
> Reported-by: Andreas Schwab 
> Signed-off-by: Christoph Müllner 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_regno_ok_for_index_p):
> Remove parameter name from declaration of unused parameter.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/config/riscv/riscv.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 6ed735d6983..ae3c034e76e 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -861,7 +861,7 @@ riscv_index_reg_class ()
> but extensions might support that.  */
>
>  int
> -riscv_regno_ok_for_index_p (int regno)
> +riscv_regno_ok_for_index_p (int)
>  {
>return 0;
>  }
> --
> 2.41.0
>


Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math

2023-07-17 Thread Richard Biener via Gcc-patches
On Fri, 14 Jul 2023, Andrew Pinski wrote:

> On Thu, Jul 13, 2023 at 2:54 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > The following makes sure that FP x > y ? x : y style max/min operations
> > are if-converted at the GIMPLE level.  While we can neither match
> > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
> > than the ternary ?: operation we can make sure to maintain this form
> > as a COND_EXPR so backends have the chance to match this to instructions
> > their ISA offers.
> >
> > The patch does this in phiopt where we recognize min/max and instead
> > of giving up when we have to honor NaNs we alter the generated code
> > to a COND_EXPR.
> >
> > This resolves PR88540 and we can then SLP vectorize the min operation
> > for its testcase.  It also resolves part of the regressions observed
> > with the change matching bit-inserts of bit-field-refs to vec_perm.
> >
> > Expansion from a COND_EXPR rather than from compare-and-branch
> > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> > by producing extra moves while the corresponding min/max operations
> > are now already synthesized by RTL expansion, register selection
> > isn't optimal.  This can be also provoked without this change by
> > altering the operand order in the source.
> >
> > It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> > condition which makes RTL expansion no longer produce the min/max
> > directly and code generation is obfuscated enough to confuse
> > RTL if-conversion.
> >
> > It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> > variant isn't if-converted and ix86_expand_fp_movcc doesn't
> > match directly (the FP constants get expanded twice).  A fix
> > could be in emit_conditional_move where both prepare_cmp_insn
> > and emit_conditional_move_1 force the constants to (different)
> > registers.
> >
> > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > PR tree-optimization/88540
> > * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
> > with NaNs but handle the simple case by if-converting to a
> > COND_EXPR.
> 
> One thing which I was thinking about adding to phiopt is having the
> last pass do the conversion to COND_EXPR if the target supports a
> conditional move for that expression. That should fix this one right?
> This was one of things I was working towards with the moving to use
> match-and-simplify too.

Note the if-conversion has to happen before BB SLP, but the last
phiopt pass is too late for this (yes, BB SLP could also be enhanced
to handle conditionals and do if-conversion on-the-fly).  For
BB SLP there's also usually jump threading making a mess of the
same-condition chain of if-convertible ops ...
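
To make the IEEE point from the quoted description concrete (my own example): x > y ? x : y is not fmax, because an unordered comparison is false, so a NaN in y propagates through the ternary while fmax discards it — which is why neither MAX_EXPR nor .FMAX can match.

```cpp
#include <cassert>
#include <cmath>

// The ternary form the patch keeps as a COND_EXPR: when y is NaN the
// comparison x > y is false, so the NaN is returned -- unlike fmax,
// which returns the non-NaN operand.
double ternary_max (double x, double y)
{
  return x > y ? x : y;
}
```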

As for the min + max case that regresses due 
to CSE (gcc.target/i386/pr110170.c) I wonder whether pre-expanding

 _1 = _2 < _3;
 _4 = _1 ? _2 : _3;
 _5 = _1 ? _3 : _2;

to something more clever would be appropriate anyway.  We could
adjust this to either duplicate _1 or expand the COND_EXPRs back
to a single CFG diamond.  I suppose force-duplicating non-vector
compares of COND_EXPRs to make TER work again would fix similar
regressions we might already observe (but I'm not aware of many
COND_EXPR generators).

Richard.

> Thanks,
> Andrew
> 
> >
> > * gcc.target/i386/pr88540.c: New testcase.
> > * gcc.target/i386/pr54855-12.c: Adjust.
> > * gcc.target/i386/pr54855-13.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr88540.c| 10 ++
> >  gcc/tree-ssa-phiopt.cc | 21 -
> >  4 files changed, 28 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c 
> > b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > index 2f8af392c83..09e8ab8ae39 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mavx512fp16" } */
> > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } 
> > } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c 
> > b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > index 87b4f459a5a..a4f25066f81 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mavx512fp16" } */
> > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times 

Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-17 Thread Richard Biener via Gcc-patches
On Fri, 14 Jul 2023, Andrew MacLeod wrote:

> 
> On 7/14/23 09:37, Richard Biener wrote:
> > On Fri, 14 Jul 2023, Aldy Hernandez wrote:
> >
> >> I don't know what you're trying to accomplish here, as I haven't been
> >> following the PR, but adding all these helper functions to the ranger
> >> header
> >> file seems wrong, especially since there's only one use of them. I see
> >> you're
> >> tweaking the irange API, adding helper functions to range-op (which is only
> >> for code dealing with implementing range operators for tree codes), etc
> >> etc.
> >>
> >> If you need these helper functions, I suggest you put them closer to their
> >> uses (i.e. wherever the match.pd support machinery goes).
> > Note I suggested the opposite because I thought these kinds of helpers
> > are closer to value-range support than to match.pd.
> 
> 
> probably vr-values.{cc,h} and the simplify_using_ranges paradigm would be the
> most sensible place to put these kinds of auxiliary routines?
> 
> 
> >
> > But I take away from your answer that there's nothing close in the
> > value-range machinery that answers the question whether A op B may
> > overflow?
> 
> we don't track it in ranges themselves.  During calculation of a range we
> obviously know, but propagating that generally when we rarely care doesn't
> seem worthwhile.  The very first generation of irange 6 years ago had an
> overflow_p() flag, but it was removed as not being worth keeping.  It is
> easier to simply ask the question when it matters.
> 
> As the routines show, it's pretty easy to figure out when the need arises,
> so I think that should suffice, at least for now.
> 
> Should we decide we would like it in general, it wouldn't be hard to add to
> irange.  wi_fold() currently returns null; it could easily return a bool
> indicating if an overflow happened, and wi_fold_in_parts and fold_range would
> simply OR together the results of the component wi_fold() calls.  It would
> require updating/auditing a number of range-op entries and adding an
> overflowed_p() query to irange.

Ah, yeah - the folding APIs would be a good fit I guess.  I was
also looking to have the "new" helpers to be somewhat consistent
with the ranger API.

So if we had a fold_range overload with either an output argument
or a flag that makes it return false on possible overflow that
would work I guess?  Since we have a virtual class setup we
might be able to provide a default failing method and implement
workers for plus and mult (as needed for this patch) as the need
arises?
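To make the question under discussion concrete, here is a hedged sketch (plain C, not the GCC ranger API) of what a "may A + B overflow?" query over two value ranges computes; names and signature are illustrative only:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Given the bounds of two int value ranges, report whether a + b can
   overflow for some a in [lo1, hi1] and b in [lo2, hi2].  Widening to
   long long keeps the bound arithmetic itself from overflowing.  */
bool range_plus_may_overflow (int lo1, int hi1, int lo2, int hi2)
{
  long long lo = (long long) lo1 + lo2;  /* smallest possible sum */
  long long hi = (long long) hi1 + hi2;  /* largest possible sum */
  return lo < INT_MIN || hi > INT_MAX;
}
```

The folding-API variant discussed above would compute the same information as a by-product of wi_fold() rather than as a separate query.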

Thanks,
Richard.


Re: Loop-ch improvements, part 3

2023-07-17 Thread Richard Biener via Gcc-patches
On Fri, 14 Jul 2023, Jan Hubicka wrote:

> Hi,
> loop-ch currently does analysis using ranger for all loops to identify
> candidates and then follows with a phase where headers are duplicated
> (which breaks SSA and ranger).  The second stage does more analysis (to
> see how many BBs we want to duplicate) but can't use ranger and thus
> misses information about static conditionals.
> 
> This patch pushes all analysis into the first stage.  We record how many
> BBs to duplicate and the second stage just duplicates as it is told.
> This also makes it possible to extend the range query to basic blocks
> that are not headers.  This is easy to do, since we already do a
> path-specific query, so we only need to extend the path by the headers
> we decided to duplicate earlier.
> 
> This makes it possible to track situations where an exit is always
> false in the first iteration even for tests not in the original loop
> header.  Doing so lets us update the profile better and do better
> heuristics.  In particular I changed the logic as follows:
>   1) should_duplicate_loop_header_p counts the size of the duplicated region.
>  When we know that a given conditional will be constant true or constant
>  false either in the duplicated region (by range query) or in the loop
>  body after duplication (since it is loop invariant), we do not account
>  it to code size costs
>   2) we don't need to account for loop-invariant computations that will be
>  duplicated, as they will become fully invariant
>  (maybe we want to have some cap for register pressure eventually?)
>   3) optimize_size logic is now different.  Originally we started duplicating
>  iff the first conditional was known to be true by ranger query, but then
>  we used the same limits as for -O2.
> 
>  I now simply lower the limits to 0.  This means that every conditional
>  in the duplicated sequence must be either loop invariant or constant when
>  duplicated; we only duplicate statements computing loop invariants,
>  and those we account as 0 size anyway.
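For readers unfamiliar with the pass being tuned here, a hand-written sketch (illustrative C, not the pass's actual output) of what loop header copying does to a loop like the test() example in this mail:

```c
#include <assert.h>

/* Header copying peels one copy of the header test before the loop,
   turning the body into a do-while.  Inside the duplicated header,
   i == 0 is known, which is the path-specific fact ranger exploits.  */
int count_iterations (int n)
{
  int calls = 0;
  /* Original loop: for (int i = 0; n && i < 10; i++) calls++;  */
  int i = 0;
  if (n && i < 10)        /* duplicated header: first-iteration test */
    do
      {
        calls++;          /* loop body */
        i++;
      }
    while (n && i < 10);  /* in-loop copy of the exit condition */
  return calls;
}
```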
> 
> This makes the code IMO more streamlined (and hopefully will let us merge
> this with the loop peeling logic), but makes little difference in practice.
> The problem is that in loop:
> 
> void test2();
> void test(int n)
> {
>   for (int i = 0; n && i < 10; i++)
> test2();
> }
> 
> We produce:
>[local count: 1073741824 freq: 9.090909]:
>   # i_4 = PHI <0(2), i_9(3)>
>   _1 = n_7(D) != 0;
>   _2 = i_4 <= 9;
>   _3 = _1 & _2;
>   if (_3 != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
> and do not understand that the final conditional is a combination of a
> conditional that is always true in the first iteration and a conditional
> that is loop invariant.
> 
> This is also the case of
> void test2();
> void test(int n)
> {
>   for (int i = 0; n; i++)
> {
>   if (i > 10)
> break;
>   test2();
> }
> }
> Which we turn to the earlier case in ifcombine.
> 
> With ifcombine disabled things however work as expected.  This is something
> I plan to handle incrementally.  However, extending the loop-ch and peeling
> passes to understand such combined conditionals is still not good enough: at
> the time ifcombine merged the two conditionals we lost profile information on
> how often n is 0, so we can't recover the correct profile or know the
> expected number of iterations after the transform.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Thanks,
Richard.

> Honza
> 
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready
>   for queries not in headers.
>   (static_loop_exit): Add basic block parameter; update use of
>   edge_range_query.
>   (should_duplicate_loop_header_p): Add ranger and static_exits
>   parameters.  Do not account statements that will be optimized
>   out after duplication in the overall size.  Add ranger query to
>   find static exits.
>   (update_profile_after_ch): Take static_exits as set instead of
>   single eliminated_edge.
>   (ch_base::copy_headers): Do all analysis in the first pass;
>   remember invariant_exits and static_exits.
> 
> diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> index 24e7fbc805a..e0139cb432c 100644
> --- a/gcc/tree-ssa-loop-ch.cc
> +++ b/gcc/tree-ssa-loop-ch.cc
> @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3.  If not see
> the range of the solved conditional in R.  */
>  
>  static void
> -edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> +edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger)
>  {
> -  auto_vec<basic_block> path (2);
> -  path.safe_push (e->dest);
> -  path.safe_push (e->src);
> +  auto_vec<basic_block> path;
> +  for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src)
> +    path.safe_push (bb);
> +  path.safe_push (loop->header);
> +  path.safe_push (loop_preheader_edge (loop)->src);
>path_range_query query 

[PATCH 9/9] Native complex operation: Experimental support in x86 backend

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Add experimental support for native complex operation handling in
the x86 backend.  For now it only supports add, sub, mul, conj, neg
and mov in SCmode (complex float).  Performance gains are still marginal
on this target because there are no dedicated instructions to speed up
complex operations, apart from some SIMD tricks.
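For reference, the SCmode (_Complex float) operations the patch wires up, shown with plain C semantics (illustrative only; this is what the new patterns must compute, not code from the patch):

```c
#include <assert.h>
#include <complex.h>

/* Exercises mul, conj, neg, sub and add on _Complex float in one
   expression: conj(a*b) + (-a) - b.  */
float complex scmode_demo (float complex a, float complex b)
{
  return conjf (a * b) + (-a) - b;
}
```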

gcc/ChangeLog:

* config/i386/i386.cc (classify_argument): Align complex
elements to the whole size, not the size of the parts.
(ix86_return_in_memory): Handle complex modes like a scalar
with the same size.
(ix86_class_max_nregs): Likewise.
(ix86_hard_regno_nregs): Likewise.
(function_value_ms_64): Add case for SCmode.
(ix86_build_const_vector): Likewise.
(ix86_build_signbit_mask): Likewise.
(x86_gen_rtx_complex): New: Implement the gen_rtx_complex
hook; use registers of complex modes to represent complex
elements in rtl.
(x86_read_complex_part): New: Implement the read_complex_part
hook; handle registers of complex modes.
(x86_write_complex_part): New: Implement the write_complex_part
hook; handle registers of complex modes.
* config/i386/i386.h: Add SCmode in several predicates.
* config/i386/sse.md: Add patterns for some complex operations in
SCmode.  This includes movsc, addsc3, subsc3, negsc2, mulsc3,
and conjsc2.
---
 gcc/config/i386/i386.cc | 296 +++-
 gcc/config/i386/i386.h  |  11 +-
 gcc/config/i386/sse.md  | 144 +++
 3 files changed, 440 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f0d6167e667..a65ac92a4a9 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -2339,8 +2339,8 @@ classify_argument (machine_mode mode, const_tree type,
mode_alignment = 128;
   else if (mode == XCmode)
mode_alignment = 256;
-  if (COMPLEX_MODE_P (mode))
-   mode_alignment /= 2;
+  /*if (COMPLEX_MODE_P (mode))
+   mode_alignment /= 2;*/
   /* Misaligned fields are always returned in memory.  */
   if (bit_offset % mode_alignment)
return 0;
@@ -3007,6 +3007,7 @@ pass_in_reg:
 case E_V4BFmode:
 case E_V2SImode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V1TImode:
 case E_V1DImode:
   if (!type || !AGGREGATE_TYPE_P (type))
@@ -3257,6 +3258,7 @@ pass_in_reg:
 case E_V4BFmode:
 case E_V2SImode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V1TImode:
 case E_V1DImode:
   if (!type || !AGGREGATE_TYPE_P (type))
@@ -4158,8 +4160,8 @@ function_value_ms_64 (machine_mode orig_mode, 
machine_mode mode,
  && !INTEGRAL_TYPE_P (valtype)
  && !VECTOR_FLOAT_TYPE_P (valtype))
break;
- if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
- && !COMPLEX_MODE_P (mode))
+ if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode)))
+// && !COMPLEX_MODE_P (mode))
regno = FIRST_SSE_REG;
  break;
case 8:
@@ -4266,7 +4268,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype 
ATTRIBUTE_UNUSED)
   || INTEGRAL_TYPE_P (type)
   || VECTOR_FLOAT_TYPE_P (type))
  && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
- && !COMPLEX_MODE_P (mode)
+ //&& !COMPLEX_MODE_P (mode)
  && (GET_MODE_SIZE (mode) == 16 || size == 16))
return false;
 
@@ -15722,6 +15724,7 @@ ix86_build_const_vector (machine_mode mode, bool vect, 
rtx value)
 case E_V8SFmode:
 case E_V4SFmode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V8DFmode:
 case E_V4DFmode:
 case E_V2DFmode:
@@ -15770,6 +15773,7 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, 
bool invert)
 case E_V8SFmode:
 case E_V4SFmode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V2SImode:
   vec_mode = mode;
   imode = SImode;
@@ -19821,7 +19825,8 @@ ix86_class_max_nregs (reg_class_t rclass, machine_mode 
mode)
   else
 {
   if (COMPLEX_MODE_P (mode))
-   return 2;
+   return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
+   //return 2;
   else
return 1;
 }
@@ -20157,7 +20162,8 @@ ix86_hard_regno_nregs (unsigned int regno, machine_mode 
mode)
   return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
 }
   if (COMPLEX_MODE_P (mode))
-return 2;
+return 1;
+//return 2;
   /* Register pair for mask registers.  */
   if (mode == P2QImode || mode == P2HImode)
 return 2;
@@ -23613,6 +23619,273 @@ ix86_preferred_simd_mode (scalar_mode mode)
 }
 }
 
+static rtx
+x86_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
+{
+  machine_mode imode = GET_MODE_INNER (mode);
+
+  if ((real_part == imag_part) && (real_part == CONST0_RTX (imode)))
+{
+  if (CONST_DOUBLE_P (real_part))
+   return const_double_from_real_value 

Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Richard Biener via Gcc-patches
On Mon, Jul 17, 2023 at 10:21 AM Richard Sandiford wrote:
>
> Richard Biener writes:
> > On Fri, Jul 14, 2023 at 5:58 PM Richard Sandiford via Gcc-patches wrote:
> >>
> >> Summary: We'd like to be able to specify some attributes using
> >> keywords, rather than the traditional __attribute__ or [[...]]
> >> syntax.  Would that be OK?
> >>
> >> In more detail:
> >>
> >> We'd like to add some new target-specific attributes for Arm SME.
> >> These attributes affect semantics and code generation and so they
> >> can't simply be ignored.
> >>
> >> Traditionally we've done this kind of thing by adding GNU attributes,
> >> via TARGET_ATTRIBUTE_TABLE in GCC's case.  The problem is that both
> >> GCC and Clang have traditionally only warned about unrecognised GNU
> >> attributes, rather than raising an error.  Older compilers might
> >> therefore be able to look past some uses of the new attributes and
> >> still produce object code, even though that object code is almost
> >> certainly going to be wrong.  (The compilers will also emit a default-on
> >> warning, but that might go unnoticed when building a big project.)
> >>
> >> There are some existing attributes that similarly affect semantics
> >> in ways that cannot be ignored.  vector_size is one obvious example.
> >> But that doesn't make it a good thing. :)
> >>
> >> Also, C++ says this for standard [[...]] attributes:
> >>
> >>   For an attribute-token (including an attribute-scoped-token)
> >>   not specified in this document, the behavior is implementation-defined;
> >>   any such attribute-token that is not recognized by the implementation
> >>   is ignored.
> >>
> >> which doubles down on the idea that attributes should not be used
> >> for necessary semantic information.
> >>
> >> One of the attributes we'd like to add provides a new way of compiling
> >> existing code.  The attribute doesn't require SME to be available;
> >> it just says that the code must be compiled so that it can run in either
> >> of two modes.  This is probably the most dangerous attribute of the set,
> >> since compilers that ignore it would just produce normal code.  That
> >> code might work in some test scenarios, but it would fail in others.
> >>
> >> The feeling from the Clang community was therefore that these SME
> >> attributes should use keywords instead, so that the keywords trigger
> >> an error with older compilers.
> >>
> >> However, it seemed wrong to define new SME-specific grammar rules,
> >> since the underlying problem is pretty generic.  We therefore
> >> proposed having a type of keyword that can appear exactly where
> >> a standard [[...]] attribute can appear and that appertains to
> >> exactly what a standard [[...]] attribute would appertain to.
> >> No divergence or cherry-picking is allowed.
> >>
> >> For example:
> >>
> >>   [[arm::foo]]
> >>
> >> would become:
> >>
> >>   __arm_foo
> >>
> >> and:
> >>
> >>   [[arm::bar(args)]]
> >>
> >> would become:
> >>
> >>   __arm_bar(args)
> >>
> >> It wouldn't be possible to retrofit arguments to a keyword that
> >> previously didn't take arguments, since that could lead to parsing
> >> ambiguities.  So when a keyword is first added, a binding decision
> >> would need to be made whether the keyword always takes arguments
> >> or is always standalone.
> >>
> >> For that reason, empty argument lists are allowed for keywords,
> >> even though they're not allowed for [[...]] attributes.
> >>
> >> The argument-less version was accepted into Clang, and I have a follow-on
> >> patch for handling arguments.  Would the same thing be OK for GCC,
> >> in both the C and C++ frontends?
> >>
> >> The patch below is a proof of concept for the C frontend.  It doesn't
> >> bootstrap due to warnings about uninitialised fields.  And it doesn't
> >> have tests.  But I did test it locally with various combinations of
> >> attribute_spec and it seemed to work as expected.
> >>
> >> The impact on the C frontend seems to be pretty small.  It looks like
> >> the impact on the C++ frontend would be a bit bigger, but not much.
> >>
> >> The patch contains a logically unrelated change: c-common.h set aside
> >> 16 keywords for address spaces, but of the in-tree ports, the maximum
> >> number of keywords used is 6 (for amdgcn).  The patch therefore changes
> >> the limit to 8 and uses 8 keywords for the new attributes.  This keeps
> >> the number of reserved ids <= 256.
> >
> > If you had added __arm(bar(args)) instead of __arm_bar(args) you would only
> > need one additional keyword - we could set aside a similar one for each
> > target then.  I realize that double-nesting of arguments might prove a bit
> > challenging but still.
>
> Yeah, that would work.
>
> > In any case I also think that attributes are what you want and their
> > ugliness/issues are not worse than the ugliness/issues of the keyword
> > approach IMHO.
>
> I guess the ugliness of keywords is the double underscore?
> What are the issues with the keyword approach 

[PATCH 7/9] Native complex operations: Vectorization of native complex operations

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Add vectors of complex types to vectorize native operations.  Because
the vectorizer was designed to work with scalar elements, several functions
and target hooks have to be adapted or duplicated to support complex types.
After that, the vectorization of native complex operations follows exactly
the same flow as scalar operations.
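The kind of loop this series lets the vectorizer handle natively is elementwise complex arithmetic kept in complex (vector) modes end to end, instead of being lowered to interleaved real/imag scalar operations. A minimal sketch (illustrative only, not from the patch):

```c
#include <assert.h>
#include <complex.h>

/* Complex AXPY: y[i] += alpha * x[i].  With native complex vector
   modes, the multiply-add stays a complex operation per lane.  */
void caxpy (int n, float complex alpha,
            const float complex *x, float complex *y)
{
  for (int i = 0; i < n; i++)
    y[i] += alpha * x[i];
}
```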

gcc/ChangeLog:

* target.def: Add preferred_simd_mode_complex and
related_mode_complex by duplicating their scalar counterparts.
* targhooks.h: Add default_preferred_simd_mode_complex and
default_vectorize_related_mode_complex.
* targhooks.cc (default_preferred_simd_mode_complex): New:
Default implementation of preferred_simd_mode_complex.
(default_vectorize_related_mode_complex): New: Default
implementation of related_mode_complex.
* doc/tm.texi: Document
TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
and TARGET_VECTORIZE_RELATED_MODE_COMPLEX.
* doc/tm.texi.in: Add TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
and TARGET_VECTORIZE_RELATED_MODE_COMPLEX.
* emit-rtl.cc (init_emit_once): Add the zero constant for vectors
of complex modes.
* genmodes.cc (vector_class): Add case for vectors of complex.
(complete_mode): Likewise.
(make_complex_modes): Likewise.
* gensupport.cc (match_pattern): Likewise.
* machmode.h: Add vectors of complex in predicates and redefine
mode_for_vector and related_vector_mode for complex types.
* mode-classes.def: Add MODE_VECTOR_COMPLEX_INT and
MODE_VECTOR_COMPLEX_FLOAT classes.
* simplify-rtx.cc (simplify_context::simplify_binary_operation):
FIXME: do not simplify binary operations with complex vector
modes.
* stor-layout.cc (mode_for_vector): Adapt for complex modes
using sub-functions calling a common one.
(related_vector_mode): Implement the function for complex modes.
* tree-vect-generic.cc (type_for_widest_vector_mode): Add
cases for complex modes.
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
Adapt for complex modes.
* tree.cc (build_vector_type_for_mode): Add cases for complex
modes.
---
 gcc/doc/tm.texi  | 31 
 gcc/doc/tm.texi.in   |  4 
 gcc/emit-rtl.cc  | 10 
 gcc/genmodes.cc  |  8 +++
 gcc/gensupport.cc|  3 +++
 gcc/machmode.h   | 19 +++
 gcc/mode-classes.def |  2 ++
 gcc/simplify-rtx.cc  |  4 
 gcc/stor-layout.cc   | 43 +
 gcc/target.def   | 39 ++
 gcc/targhooks.cc | 29 ++
 gcc/targhooks.h  |  4 
 gcc/tree-vect-generic.cc |  4 
 gcc/tree-vect-stmts.cc   | 52 +++-
 gcc/tree.cc  |  2 ++
 15 files changed, 230 insertions(+), 24 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b73147aea9f..955a1f983d0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6229,6 +6229,13 @@ equal to @code{word_mode}, because the vectorizer can do 
some
 transformations even in absence of specialized @acronym{SIMD} hardware.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode 
TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX (complex_mode @var{mode})
+This hook should return the preferred mode for vectorizing complex
+mode @var{mode}.  The default is
+equal to @code{word_mode}, because the vectorizer can do some
+transformations even in absence of specialized @acronym{SIMD} hardware.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_SPLIT_REDUCTION 
(machine_mode)
 This hook should return the preferred mode to split the final reduction
 step on @var{mode} to.  The reduction is then carried out reducing upper
@@ -6291,6 +6298,30 @@ requested mode, returning a mode with the same size as 
@var{vector_mode}
 when @var{nunits} is zero.  This is the correct behavior for most targets.
 @end deftypefn
 
+@deftypefn {Target Hook} opt_machine_mode 
TARGET_VECTORIZE_RELATED_MODE_COMPLEX (machine_mode @var{vector_mode}, 
complex_mode @var{element_mode}, poly_uint64 @var{nunits})
+If a piece of code is using vector mode @var{vector_mode} and also wants
+to operate on elements of mode @var{element_mode}, return the vector mode
+it should use for those elements.  If @var{nunits} is nonzero, ensure that
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector
+size pairs the most naturally with @var{vector_mode}.  Return an empty
+@code{opt_machine_mode} if there is no supported vector mode with the
+required properties.
+
+There is no prescribed way of handling the case in which @var{nunits}
+is zero.  One common choice is to pick a vector mode with the same size
+as @var{vector_mode}; this is the natural choice if the target has a
+fixed vector size.  Another option is to 

[PATCH 5/9] Native complex operations: Add the conjugate op in optabs

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Add an optab and rtl operation for the conjugate, called conj,
to expand CONJ_EXPR.
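The semantics being wired up: CONJ_EXPR negates the imaginary part of its operand, and with the new optab it can expand to a single conj pattern instead of component-wise code. A plain-C illustration (not from the patch):

```c
#include <assert.h>
#include <complex.h>

/* conjf(a + bi) == a - bi; the gimple for the return is CONJ_EXPR <z>.  */
float complex do_conj (float complex z)
{
  return conjf (z);
}
```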

gcc/ChangeLog:

* rtl.def: Add a conj operation in rtl.
* optabs.def: Add a conj optab.
* optabs-tree.cc (optab_for_tree_code): Use the
conj_optab to convert a CONJ_EXPR.
* expr.cc (expand_expr_real_2): Add a case to expand
native CONJ_EXPR.
(expand_expr_real_1): Likewise.
---
 gcc/expr.cc| 17 -
 gcc/optabs-tree.cc |  3 +++
 gcc/optabs.def |  3 +++
 gcc/rtl.def|  3 +++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index e94de8a05b5..be153be0b71 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10498,6 +10498,18 @@ expand_expr_real_2 (sepops ops, rtx target, 
machine_mode tmode,
return dst;
   }
 
+case CONJ_EXPR:
+  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
+  if (modifier == EXPAND_STACK_PARM)
+   target = 0;
+  temp = expand_unop (mode,
+ optab_for_tree_code (CONJ_EXPR, type,
+  optab_default),
+ op0, target, 0);
+  gcc_assert (temp);
+  return REDUCE_BIT_FIELD (temp);
+
+
 default:
   gcc_unreachable ();
 }
@@ -12064,6 +12076,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
   op0 = expand_normal (treeop0);
   return read_complex_part (op0, IMAG_P);
 
+case CONJ_EXPR:
+  op0 = expand_normal (treeop0);
+  return op0;
+
 case RETURN_EXPR:
 case LABEL_EXPR:
 case GOTO_EXPR:
@@ -12087,7 +12103,6 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
 case VA_ARG_EXPR:
 case BIND_EXPR:
 case INIT_EXPR:
-case CONJ_EXPR:
 case COMPOUND_EXPR:
 case PREINCREMENT_EXPR:
 case PREDECREMENT_EXPR:
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index e6ae15939d3..c646b3667d4 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -271,6 +271,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
return TYPE_UNSIGNED (type) ? usneg_optab : ssneg_optab;
   return trapv ? negv_optab : neg_optab;
 
+case CONJ_EXPR:
+  return conj_optab;
+
 case ABS_EXPR:
   return trapv ? absv_optab : abs_optab;
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3dae228fba6..31475c8afcc 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -160,6 +160,9 @@ OPTAB_NL(umax_optab, "umax$I$a3", UMAX, "umax", '3', 
gen_int_libfunc)
 OPTAB_NL(neg_optab, "neg$P$a2", NEG, "neg", '2', gen_int_fp_fixed_libfunc)
 OPTAB_NX(neg_optab, "neg$F$a2")
 OPTAB_NX(neg_optab, "neg$Q$a2")
+OPTAB_NL(conj_optab, "conj$P$a2", CONJ, "conj", '2', gen_int_fp_fixed_libfunc)
+OPTAB_NX(conj_optab, "conj$F$a2")
+OPTAB_NX(conj_optab, "conj$Q$a2")
 OPTAB_VL(negv_optab, "negv$I$a2", NEG, "neg", '2', gen_intv_fp_libfunc)
 OPTAB_VX(negv_optab, "neg$F$a2")
 OPTAB_NL(ssneg_optab, "ssneg$Q$a2", SS_NEG, "ssneg", '2', 
gen_signed_fixed_libfunc)
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 88e2b198503..4280f727286 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -460,6 +460,9 @@ DEF_RTL_EXPR(MINUS, "minus", "ee", RTX_BIN_ARITH)
 /* Minus operand 0.  */
 DEF_RTL_EXPR(NEG, "neg", "e", RTX_UNARY)
 
+/* Conj operand 0 */
+DEF_RTL_EXPR(CONJ, "conj", "e", RTX_UNARY)
+
 DEF_RTL_EXPR(MULT, "mult", "ee", RTX_COMM_ARITH)
 
 /* Multiplication with signed saturation */
-- 
2.17.1







[PATCH 6/9] Native complex operations: Update how complex rotations are handled

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Catch complex rotations by 90° and 270° in fold-const.cc as before,
but now convert them into the new COMPLEX_ROT90 and COMPLEX_ROT270
internal functions.  Also add crot90 and crot270 optabs to expose these
operations to the backends, and conditionally lower
COMPLEX_ROT90/COMPLEX_ROT270 by checking whether crot90/crot270 patterns
are present in the backend.  Finally, convert a + crot90/270(b) into
cadd90/270(a, b) in a similar way to FMAs.
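The rotations in question are just multiplications by the imaginary unit: rotating z = a + bi by 90° gives -b + ai, i.e. z * I. A plain-C illustration of what fold-const.cc now recognizes and turns into a COMPLEX_ROT90 call (sketch only, and only valid when NaNs and signed zeros need not be honored):

```c
#include <assert.h>
#include <complex.h>

/* (a + bi) * I = -b + ai: a 90-degree rotation in the complex plane.  */
float complex rot90 (float complex z)
{
  return z * I;
}
```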

gcc/ChangeLog:

* internal-fn.def: Add COMPLEX_ROT90 and COMPLEX_ROT270.
* fold-const.cc (fold_binary_loc): Update the folding of
complex rotations to generate calls to COMPLEX_ROT90 and
COMPLEX_ROT270.
* optabs.def: Add crot90/crot270 optabs.
* tree-complex.cc (init_dont_simulate_again): Catch calls
to COMPLEX_ROT90 and COMPLEX_ROT270.
(expand_complex_rotation): Conditionally lower complex
rotations if no pattern is present in the backend.
(expand_complex_operations_1): Likewise.
(convert_crot): Likewise.
* tree-ssa-math-opts.cc (convert_crot_1): Catch complex
rotations with additions in a similar way to FMAs.
(math_opts_dom_walker::after_dom_children): Call convert_crot
if a COMPLEX_ROT90 or COMPLEX_ROT270 is identified.
---
 gcc/fold-const.cc | 115 ++---
 gcc/internal-fn.def   |   2 +
 gcc/optabs.def|   2 +
 gcc/tree-complex.cc   |  79 ++-
 gcc/tree-ssa-math-opts.cc | 129 ++
 5 files changed, 302 insertions(+), 25 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index a02ede79fed..f1224b6a548 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -11609,30 +11609,6 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
}
   else
{
- /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
-This is not the same for NaNs or if signed zeros are
-involved.  */
- if (!HONOR_NANS (arg0)
- && !HONOR_SIGNED_ZEROS (arg0)
- && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
- && TREE_CODE (arg1) == COMPLEX_CST
- && real_zerop (TREE_REALPART (arg1)))
-   {
- tree rtype = TREE_TYPE (TREE_TYPE (arg0));
- if (real_onep (TREE_IMAGPART (arg1)))
-   return
- fold_build2_loc (loc, COMPLEX_EXPR, type,
-  negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
-rtype, arg0)),
-  fold_build1_loc (loc, REALPART_EXPR, rtype, 
arg0));
- else if (real_minus_onep (TREE_IMAGPART (arg1)))
-   return
- fold_build2_loc (loc, COMPLEX_EXPR, type,
-  fold_build1_loc (loc, IMAGPART_EXPR, rtype, 
arg0),
-  negate_expr (fold_build1_loc (loc, REALPART_EXPR,
-rtype, arg0)));
-   }
-
  /* Optimize z * conj(z) for floating point complex numbers.
 Guarded by flag_unsafe_math_optimizations as non-finite
 imaginary components don't produce scalar results.  */
@@ -11645,6 +11621,97 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
return fold_mult_zconjz (loc, type, arg0);
}
+
+  /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
+This is not the same for NaNs or if signed zeros are
+involved.  */
+  if (!HONOR_NANS (arg0)
+ && !HONOR_SIGNED_ZEROS (arg0)
+ && TREE_CODE (arg1) == COMPLEX_CST
+ && (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
+ && real_zerop (TREE_REALPART (arg1
+   {
+ if (real_onep (TREE_IMAGPART (arg1)))
+   {
+ tree rtype = TREE_TYPE (TREE_TYPE (arg0));
+ tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type,
+negate_expr (fold_build1_loc 
(loc, IMAGPART_EXPR,
+  
rtype, arg0)),
+ fold_build1_loc (loc, REALPART_EXPR, rtype, arg0));
+ if (cplx_build && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != 
NEGATE_EXPR)
+   return cplx_build;
+
+ if ((TREE_CODE (arg0) == COMPLEX_EXPR) && real_zerop 
(TREE_OPERAND (arg0, 1)))
+   return fold_build2_loc (loc, COMPLEX_EXPR, type,
+   TREE_OPERAND (arg0, 1), TREE_OPERAND 
(arg0, 0));
+
+ if (TREE_CODE (arg0) == CALL_EXPR)
+   {
+ if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT90)
+   return negate_expr (CALL_EXPR_ARG (arg0, 0));
+ else if (CALL_EXPR_IFN (arg0) == 

[PATCH 8/9] Native complex operations: Add explicit vector of complex

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Allow the creation and usage of builtin vectors of complex types
in C, using __attribute__ ((vector_size ())).
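For illustration, the kind of declaration this enables. Note this is a hypothetical sketch: it is only accepted by a compiler with this series applied; stock compilers reject vector_size on complex element types.

```c
/* A GNU vector of four _Complex float elements (4 * 8 = 32 bytes).  */
typedef _Complex float vcf4 __attribute__ ((vector_size (32)));

/* Elementwise complex addition over the whole vector.  */
vcf4 vadd (vcf4 a, vcf4 b) { return a + b; }
```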

gcc/c-family/ChangeLog:

* c-attribs.cc (vector_mode_valid_p): Add cases for
vectors of complex.
(handle_mode_attribute): Likewise.
(type_valid_for_vector_size): Likewise.
* c-common.cc (c_common_type_for_mode): Likewise.
(vector_types_compatible_elements_p): Likewise.

gcc/ChangeLog:

* fold-const.cc (fold_binary_loc): Likewise

gcc/c/ChangeLog:

* c-typeck.cc (build_unary_op): Likewise
---
 gcc/c-family/c-attribs.cc | 12 ++--
 gcc/c-family/c-common.cc  | 20 +++-
 gcc/c/c-typeck.cc |  8 ++--
 gcc/fold-const.cc |  1 +
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..d4de85160c1 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2019,6 +2019,8 @@ vector_mode_valid_p (machine_mode mode)
   /* Doh!  What's going on?  */
   if (mclass != MODE_VECTOR_INT
   && mclass != MODE_VECTOR_FLOAT
+  && mclass != MODE_VECTOR_COMPLEX_INT
+  && mclass != MODE_VECTOR_COMPLEX_FLOAT
   && mclass != MODE_VECTOR_FRACT
   && mclass != MODE_VECTOR_UFRACT
   && mclass != MODE_VECTOR_ACCUM
@@ -2125,6 +2127,8 @@ handle_mode_attribute (tree *node, tree name, tree args,
 
case MODE_VECTOR_INT:
case MODE_VECTOR_FLOAT:
+   case MODE_VECTOR_COMPLEX_INT:
+   case MODE_VECTOR_COMPLEX_FLOAT:
case MODE_VECTOR_FRACT:
case MODE_VECTOR_UFRACT:
case MODE_VECTOR_ACCUM:
@@ -4361,9 +4365,13 @@ type_valid_for_vector_size (tree type, tree atname, tree 
args,
 
   if ((!INTEGRAL_TYPE_P (type)
&& !SCALAR_FLOAT_TYPE_P (type)
+   && !COMPLEX_INTEGER_TYPE_P (type)
+   && !COMPLEX_FLOAT_TYPE_P (type)
&& !FIXED_POINT_TYPE_P (type))
-  || (!SCALAR_FLOAT_MODE_P (orig_mode)
- && GET_MODE_CLASS (orig_mode) != MODE_INT
+  || ((!SCALAR_FLOAT_MODE_P (orig_mode)
+  && GET_MODE_CLASS (orig_mode) != MODE_INT)
+ && (!COMPLEX_FLOAT_MODE_P (orig_mode)
+ && GET_MODE_CLASS (orig_mode) != MODE_COMPLEX_INT)
  && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode))
   || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))
   || TREE_CODE (type) == BOOLEAN_TYPE)
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 6ab63dae997..9574c074d26 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2430,7 +2430,23 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
  : make_signed_type (precision));
 }
 
-  if (COMPLEX_MODE_P (mode))
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
+  && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+{
+  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+   GET_MODE_NUNITS (mode));
+  tree bool_type = build_nonstandard_boolean_type (elem_bits);
+  return build_vector_type_for_mode (bool_type, mode);
+}
+  else if (VECTOR_MODE_P (mode)
+  && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+{
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+  tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
+  if (inner_type != NULL_TREE)
+   return build_vector_type_for_mode (inner_type, mode);
+}
+  else if (COMPLEX_MODE_P (mode))
 {
   machine_mode inner_mode;
   tree inner_type;
@@ -8104,9 +8120,11 @@ vector_types_compatible_elements_p (tree t1, tree t2)
 
   gcc_assert ((INTEGRAL_TYPE_P (t1)
   || c1 == REAL_TYPE
+  || c1 == COMPLEX_TYPE
   || c1 == FIXED_POINT_TYPE)
  && (INTEGRAL_TYPE_P (t2)
  || c2 == REAL_TYPE
+ || c2 == COMPLEX_TYPE
  || c2 == FIXED_POINT_TYPE));
 
   t1 = c_common_signed_type (t1);
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..68a9646cf5b 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -4584,7 +4584,9 @@ build_unary_op (location_t location, enum tree_code code, 
tree xarg,
   /* ~ works on integer types and non float vectors. */
   if (typecode == INTEGER_TYPE
  || (gnu_vector_type_p (TREE_TYPE (arg))
- && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg
+ && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))
+ && !COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+ && !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg)
{
  tree e = arg;
 
@@ -4607,7 +4609,9 @@ build_unary_op (location_t location, enum tree_code code, 
tree xarg,
  if (!noconvert)
arg = default_conversion (arg);
}
-  else if (typecode == COMPLEX_TYPE)
+  else if (typecode == COMPLEX_TYPE
+ || COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+ || 

[PATCH 4/9] Native complex operations: Allow native complex regs and ops in rtl

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Support registers of complex types in rtl. Also adapt the functions
called during the expand pass to support native complex operations.

gcc/ChangeLog:

* explow.cc (trunc_int_for_mode): Allow complex int modes
* expr.cc (emit_move_complex_parts): Move both parts at the
same time if it is supported by the backend
(emit_move_complex): Do not move via integer if no corresponding
integer mode exists. For complex floats, relax the constraint on the
number of registers for targets with pairs of registers, and
use native moves if it is supported by the backend.
(expand_expr_real_2): Move both parts at the same time if it
is supported by the backend
(expand_expr_real_1): Update the expand of complex constants
(const_vector_from_tree): Add the expand of both parts of a
complex constant
* real.h: Update FLOAT_MODE_FORMAT
* machmode.h: Add COMPLEX_INT_MODE_P and COMPLEX_FLOAT_MODE_P
predicates
* optabs-libfuncs.cc (gen_int_libfunc): Add support for
complex modes
(gen_intv_fp_libfunc): Likewise
* recog.cc (general_operand): Likewise
---
 gcc/explow.cc  |  2 +-
 gcc/expr.cc| 84 --
 gcc/machmode.h |  6 +++
 gcc/optabs-libfuncs.cc | 29 ---
 gcc/real.h |  3 +-
 gcc/recog.cc   |  1 +
 6 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index 6424c0802f0..48572a40eab 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -56,7 +56,7 @@ trunc_int_for_mode (HOST_WIDE_INT c, machine_mode mode)
   int width = GET_MODE_PRECISION (smode);
 
   /* You want to truncate to a _what_?  */
-  gcc_assert (SCALAR_INT_MODE_P (mode));
+  gcc_assert (SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode));
 
   /* Canonicalize BImode to 0 and STORE_FLAG_VALUE.  */
   if (smode == BImode)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index e1a0892b4d9..e94de8a05b5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3847,8 +3847,14 @@ emit_move_complex_parts (rtx x, rtx y)
   && REG_P (x) && !reg_overlap_mentioned_p (x, y))
 emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
-  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+  machine_mode mode = GET_MODE (x);
+  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
+write_complex_part (x, read_complex_part (y, BOTH_P), BOTH_P, false);
+  else
+{
+  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
+  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+}
 
   return get_last_insn ();
 }
@@ -3868,14 +3874,14 @@ emit_move_complex (machine_mode mode, rtx x, rtx y)
 
   /* See if we can coerce the target into moving both values at once, except
  for floating point where we favor moving as parts if this is easy.  */
-  if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
+  scalar_int_mode imode;
+  if (!int_mode_for_mode (mode).exists ())
+try_int = false;
+  else if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
   && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing
-  && !(REG_P (x)
-  && HARD_REGISTER_P (x)
-  && REG_NREGS (x) == 1)
-  && !(REG_P (y)
-  && HARD_REGISTER_P (y)
-  && REG_NREGS (y) == 1))
+  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+  && !(REG_P (x) && HARD_REGISTER_P (x))
+  && !(REG_P (y) && HARD_REGISTER_P (y)))
 try_int = false;
   /* Not possible if the values are inherently not adjacent.  */
   else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT)
@@ -10246,9 +10252,14 @@ expand_expr_real_2 (sepops ops, rtx target, 
machine_mode tmode,
break;
  }
 
-  /* Move the real (op0) and imaginary (op1) parts to their location.  */
-  write_complex_part (target, op0, REAL_P, true);
-  write_complex_part (target, op1, IMAG_P, false);
+  if ((op0 == op1) && (GET_CODE (op0) == CONST_VECTOR))
+   write_complex_part (target, op0, BOTH_P, false);
+  else
+   {
+ /* Move the real (op0) and imaginary (op1) parts to their location.  
*/
+ write_complex_part (target, op0, REAL_P, true);
+ write_complex_part (target, op1, IMAG_P, false);
+   }
 
   return target;
 
@@ -11001,6 +11012,51 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
 
  return original_target;
}
+  else if (original_target && (GET_CODE (original_target) == REG)
+  &&
+  ((GET_MODE_CLASS (GET_MODE (original_target)) ==
+MODE_COMPLEX_INT)
+   || (GET_MODE_CLASS (GET_MODE (original_target)) ==
+   MODE_COMPLEX_FLOAT)))
+   {
+ mode = TYPE_MODE (TREE_TYPE (exp));
+
+ /* Move both parts at the same time if possible */
+ if 

[PATCH 3/9] Native complex operations: Add gen_rtx_complex hook

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Add a new target hook for complex element creation during
the expand pass, called gen_rtx_complex. The default implementation
calls gen_rtx_CONCAT like before. Then calls to gen_rtx_CONCAT for
complex handling are replaced by calls to targetm.gen_rtx_complex.

gcc/ChangeLog:

* target.def: Add gen_rtx_complex target hook
* targhooks.cc (default_gen_rtx_complex): New: Default
implementation for gen_rtx_complex
* targhooks.h: Add default_gen_rtx_complex
* doc/tm.texi: Document TARGET_GEN_RTX_COMPLEX
* doc/tm.texi.in: Add TARGET_GEN_RTX_COMPLEX
* emit-rtl.cc (gen_reg_rtx): Replace call to
gen_rtx_CONCAT by call to gen_rtx_complex
(init_emit_once): Likewise
* expmed.cc (flip_storage_order): Likewise
* optabs.cc (expand_doubleword_mod): Likewise
---
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/emit-rtl.cc| 26 +-
 gcc/expmed.cc  |  2 +-
 gcc/optabs.cc  | 12 +++-
 gcc/target.def | 10 ++
 gcc/targhooks.cc   | 27 +++
 gcc/targhooks.h|  2 ++
 8 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 87997b76338..b73147aea9f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4605,6 +4605,12 @@ to return a nonzero value when it is required, the 
compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GEN_RTX_COMPLEX (machine_mode @var{mode}, 
rtx @var{real_part}, rtx @var{imag_part})
+This hook should return an rtx representing a complex value of mode 
@var{mode} built from @var{real_part} and @var{imag_part}.
+  If both arguments are @code{NULL}, create them as registers.
+ The default is @code{gen_rtx_CONCAT}.
+@end deftypefn
+
 @deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, 
complex_part_t @var{part})
 This hook should return the rtx representing the specified @var{part} of the 
complex given by @var{cplx}.
   @var{part} can be the real part, the imaginary part, or both of them.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index efbf972e6a7..dd39e450903 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3390,6 +3390,8 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_GEN_RTX_COMPLEX
+
 @hook TARGET_READ_COMPLEX_PART
 
 @hook TARGET_WRITE_COMPLEX_PART
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f6276a2d0b6..22012bfea13 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -1190,19 +1190,7 @@ gen_reg_rtx (machine_mode mode)
   if (generating_concat_p
   && (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
  || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT))
-{
-  /* For complex modes, don't make a single pseudo.
-Instead, make a CONCAT of two pseudos.
-This allows noncontiguous allocation of the real and imaginary parts,
-which makes much better code.  Besides, allocating DCmode
-pseudos overstrains reload on some machines like the 386.  */
-  rtx realpart, imagpart;
-  machine_mode partmode = GET_MODE_INNER (mode);
-
-  realpart = gen_reg_rtx (partmode);
-  imagpart = gen_reg_rtx (partmode);
-  return gen_rtx_CONCAT (mode, realpart, imagpart);
-}
+return targetm.gen_rtx_complex (mode, NULL, NULL);
 
   /* Do not call gen_reg_rtx with uninitialized crtl.  */
   gcc_assert (crtl->emit.regno_pointer_align_length);
@@ -6274,14 +6262,18 @@ init_emit_once (void)
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_INT)
 {
-  rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-  const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+  machine_mode imode = GET_MODE_INNER (mode);
+  rtx inner = const_tiny_rtx[0][(int) imode];
+  const_tiny_rtx[0][(int) mode] =
+   targetm.gen_rtx_complex (mode, inner, inner);
 }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_FLOAT)
 {
-  rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-  const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+  machine_mode imode = GET_MODE_INNER (mode);
+  rtx inner = const_tiny_rtx[0][(int) imode];
+  const_tiny_rtx[0][(int) mode] =
+   targetm.gen_rtx_complex (mode, inner, inner);
 }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 2f787cc28f9..8a18161827b 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -400,7 +400,7 @@ flip_storage_order (machine_mode mode, rtx x)
   real = flip_storage_order (GET_MODE_INNER (mode), real);
   imag = flip_storage_order (GET_MODE_INNER (mode), imag);
 
-  return gen_rtx_CONCAT (mode, real, imag);
+  return targetm.gen_rtx_complex (mode, real, imag);
 }
 
   if (UNLIKELY (reverse_storage_order_supported < 0))
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 4e9f58f8060..18900e8113e 

[PATCH 2/9] Native complex operations: Move functions to hooks

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Move read_complex_part and write_complex_part to target hooks. Their
signatures also change because the type of the part argument is now
complex_part_t. Calls to these functions are updated accordingly.

gcc/ChangeLog:

* target.def: Define hooks for read_complex_part and
write_complex_part
* targhooks.cc (default_read_complex_part): New: default
implementation of read_complex_part
(default_write_complex_part): New: default implementation
of write_complex_part
* targhooks.h: Add default_read_complex_part and
default_write_complex_part
* doc/tm.texi: Document the new TARGET_READ_COMPLEX_PART
and TARGET_WRITE_COMPLEX_PART hooks
* doc/tm.texi.in: Add TARGET_READ_COMPLEX_PART and
TARGET_WRITE_COMPLEX_PART
* expr.cc
(write_complex_part): Call TARGET_WRITE_COMPLEX_PART hook
(read_complex_part): Call TARGET_READ_COMPLEX_PART hook
* expr.h: Update function signatures of read_complex_part
and write_complex_part
* builtins.cc (expand_ifn_atomic_compare_exchange_into_call):
Update calls to read_complex_part and write_complex_part
(expand_ifn_atomic_compare_exchange): Likewise
* expmed.cc (flip_storage_order): Likewise
(clear_storage_hints): Likewise
(emit_move_complex_push): Likewise
(emit_move_complex_parts): Likewise
(expand_assignment): Likewise
(expand_expr_real_2): Likewise
(expand_expr_real_1): Likewise
(const_vector_from_tree): Likewise
* internal-fn.cc (expand_arith_set_overflow): Likewise
(expand_arith_overflow_result_store): Likewise
(expand_addsub_overflow): Likewise
(expand_neg_overflow): Likewise
(expand_mul_overflow): Likewise
(expand_arith_overflow): Likewise
(expand_UADDC): Likewise
---
 gcc/builtins.cc|   8 +--
 gcc/doc/tm.texi|  10 +++
 gcc/doc/tm.texi.in |   4 ++
 gcc/expmed.cc  |   4 +-
 gcc/expr.cc| 164 +
 gcc/expr.h |   5 +-
 gcc/internal-fn.cc |  20 +++---
 gcc/target.def |  18 +
 gcc/targhooks.cc   | 139 ++
 gcc/targhooks.h|   5 ++
 10 files changed, 224 insertions(+), 153 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 6dff5214ff8..37da6bcae6f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6347,8 +6347,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall 
*call, machine_mode mode)
   if (GET_MODE (boolret) != mode)
boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
   x = force_reg (mode, x);
-  write_complex_part (target, boolret, true, true);
-  write_complex_part (target, x, false, false);
+  write_complex_part (target, boolret, IMAG_P, true);
+  write_complex_part (target, x, REAL_P, false);
 }
 }
 
@@ -6403,8 +6403,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   if (GET_MODE (boolret) != mode)
boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-  write_complex_part (target, boolret, true, true);
-  write_complex_part (target, oldval, false, false);
+  write_complex_part (target, boolret, IMAG_P, true);
+  write_complex_part (target, oldval, REAL_P, false);
 }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae..87997b76338 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4605,6 +4605,16 @@ to return a nonzero value when it is required, the 
compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, 
complex_part_t @var{part})
+This hook should return the rtx representing the specified @var{part} of the 
complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx 
@var{val}, complex_part_t @var{part}, bool @var{undefined_p})
+This hook should move the rtx value given by @var{val} to the specified 
@var{part} of the complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..efbf972e6a7 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3390,6 +3390,10 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_READ_COMPLEX_PART
+
+@hook TARGET_WRITE_COMPLEX_PART
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 

[PATCH 1/9] Native complex operations: Conditional lowering

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Allow the cplxlower pass to identify if an operation does not need
to be lowered through optabs. In this case, lowering is not performed.
The cplxlower pass now has to handle a mix of lowered and non-lowered
operations. A quick access to both parts of a complex constant is
also implemented.

gcc/lto/ChangeLog:

* lto-common.cc (compare_tree_sccs_1): Handle both parts of a
  complex constant

gcc/ChangeLog:

* coretypes.h: Add enum for complex parts
* gensupport.cc (match_pattern): Add complex types
* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle both
parts of a complex constant
(hash_tree): Likewise
* tree-complex.cc (get_component_var): Support handling of
both parts of a complex
(get_component_ssa_name): Likewise
(set_component_ssa_name): Likewise
(extract_component): Likewise
(update_complex_components): Likewise
(update_complex_components_on_edge): Likewise
(update_complex_assignment): Likewise
(update_phi_components): Likewise
(expand_complex_move): Likewise
(expand_complex_asm): Update with complex_part_t
(complex_component_cst_p): New: check if a complex
component is a constant
(target_native_complex_operation): New: Check if complex
operation is supported natively by the backend, through
the optab
(expand_complex_operations_1): Conditionally lower ops
(tree_lower_complex): Support handling of both parts of
 a complex
* tree-core.h (struct GTY): Add field for both parts of
the tree_complex struct
* tree-streamer-in.cc (lto_input_ts_complex_tree_pointers):
Handle both parts of a complex constant
* tree-streamer-out.cc (write_ts_complex_tree_pointers):
Likewise
* tree.cc (build_complex): Likewise
* tree.h (class auto_suppress_location_wrappers):
(type_has_mode_precision_p): Add special case for complex
---
 gcc/coretypes.h  |   9 +
 gcc/gensupport.cc|   2 +
 gcc/lto-streamer-out.cc  |   2 +
 gcc/lto/lto-common.cc|   2 +
 gcc/tree-complex.cc  | 434 +--
 gcc/tree-core.h  |   1 +
 gcc/tree-streamer-in.cc  |   1 +
 gcc/tree-streamer-out.cc |   1 +
 gcc/tree.cc  |   8 +
 gcc/tree.h   |  15 +-
 10 files changed, 363 insertions(+), 112 deletions(-)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..a000c104b53 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -443,6 +443,15 @@ enum optimize_size_level
   OPTIMIZE_SIZE_MAX
 };
 
+/* Part of a complex number.  */
+
+typedef enum
+{
+  REAL_P = 0,
+  IMAG_P = 1,
+  BOTH_P = 2
+} complex_part_t;
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
is a pointer to a pointer, the second either NULL if the pointer to
pointer points into a GC object or the actual pointer address if
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 959d1d9c83c..9aa2ba69fcd 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3746,9 +3746,11 @@ match_pattern (optab_pattern *p, const char *name, const 
char *pat)
break;
if (*p == 0
&& (! force_int || mode_class[i] == MODE_INT
+   || mode_class[i] == MODE_COMPLEX_INT
|| mode_class[i] == MODE_VECTOR_INT)
&& (! force_partial_int
|| mode_class[i] == MODE_INT
+   || mode_class[i] == MODE_COMPLEX_INT
|| mode_class[i] == MODE_PARTIAL_INT
|| mode_class[i] == MODE_VECTOR_INT)
&& (! force_float
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ffa8954022..38c48e44867 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -985,6 +985,7 @@ DFS::DFS_write_tree_body (struct output_block *ob,
 {
   DFS_follow_tree_edge (TREE_REALPART (expr));
   DFS_follow_tree_edge (TREE_IMAGPART (expr));
+  DFS_follow_tree_edge (TREE_COMPLEX_BOTH_PARTS (expr));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
@@ -1417,6 +1418,7 @@ hash_tree (struct streamer_tree_cache_d *cache, 
hash_map *map,
 {
   visit (TREE_REALPART (t));
   visit (TREE_IMAGPART (t));
+  visit (TREE_COMPLEX_BOTH_PARTS (t));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 703e665b698..f647ee62f9e 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1408,6 +1408,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
 {
   compare_tree_edges (TREE_REALPART (t1), TREE_REALPART (t2));
   compare_tree_edges (TREE_IMAGPART (t1), TREE_IMAGPART (t2));
+  compare_tree_edges (TREE_COMPLEX_BOTH_PARTS (t1),
+ TREE_COMPLEX_BOTH_PARTS (t2));
 }
 
   if 

[PATCH 0/9] Native complex operations

2023-07-17 Thread Sylvain Noiry via Gcc-patches
Hi,

I have recently started a discussion about exposing complex operations directly
to the backends, to better exploit ISA with complex instructions. The title of 
the original message is "[RFC] Exposing complex numbers to target backends" [1].

This message starts a series of 9 patches of the implementation that I've done. 
8 patches are about generic code, split by features. The last one is an 
experimental update of the x86 backend which exploits the newly exposed 
complex operations.

My original work was on the KVX backend from Kalray, where the ISA has complex
instructions, so I obtained huge performance gains, on par with code 
which uses builtins. On x86 there are gains without -ffast-math because fewer
calls to helpers are performed, but gains are marginal with -ffast-math due to
the lack of complex instructions.

[1] https://gcc.gnu.org/pipermail/gcc/2023-July/241998.html

Summary of the 9 patches:
  1/9: Conditional lowering of complex operations using the backend + update 
   on the TREE complex constants
  2/9: Move of read_complex_part and write_complex_part to target hooks to let
   the backend decide
  3/9: Add a gen_rtx_complex target hook to let the backend use its preferred 
   representation of complex in rtl
  4/9: Support and optimize the use of classical registers to represent complex
  5/9: Expose the conjugate operation down to the backend
  6/9: Expose and optimize complex rotations using internal functions and 
   conditional lowering
  7/9: Allow the vectorizer to work on complex type like it does on scalars
  8/9: Add explicit vectors of complex. This remains optional
  9/9: Experimental update on the x86 backend to exploit some of the previous
   features

The following sections explain the features added by each patch and 
illustrate them with examples on KVX, because that backend supports all the
new features. All examples are compiled with -O2 -ffast-math.

Patches 1 to 4 are required to have the minimal set of features which allows a
backend to exploit native complex operations.

PATCH 1/9: 
  - Change the TREE complex constants by adding a new field called "both" in 
the tree_complex struct, which holds a vector of the real and imaginary 
parts. This makes the handling of constants during the cplxlower and expand
passes easier. Any change to one part will also affect the vector, 
so very few changes are needed elsewhere.
  - Check in the optab for a complex pattern for almost all operations in the 
cplxlower pass. The lowering is done only if an optab code is found. Some 
conditions on the presence of constants in the operands were also added, which 
can be a subject of discussion.
  - Add a complex component for both parts in the cplxlower pass. When an 
operation is lowered, the both part is recreated using a COMPLEX_EXPR. 
When an operation is kept non-lowered, real and imaginary parts are 
extracted using REALPART_EXPR and IMAGPART_EXPR.

PATCH 2/9:
  - Move the inner implementation of read_complex_part and write_complex_part 
to target hooks. This allows each backend to have its own implementation, 
while the default ones are almost the same as before. Going back to 
standard functions may be a point to discuss if no incompatible changes are 
made by the target to the default implementation.
  - Change the signatures of read_complex_part and write_complex_part to allow 
requesting both parts at once. This affects all the calls to these functions.

PATCH 3/9:
  - Add a new target hook to replace gen_rtx_CONCAT when a complex element 
needs to be created. The default implementation uses gen_rtx_CONCAT, but 
the KVX implementation simply creates a register with a complex type. 
A previous attempt was to deal with generating_concat_p in gen_rtx_reg, 
but no good solution was found.

PATCH 4/9:
  - Adapt and optimize for the use of native complex operations in rtl, 
as well as registers of complex types. After this patch, it's now possible 
to re-implement the three new hooks and write some complex patterns. 
  
  Considering the following example:
  
_Complex float mul(_Complex float a, _Complex float b)
{ 
  return a * b;
}

  Previously, the generated code was:
mul:
copyw $r3 = $r0
extfz $r5 = $r0, 32+32-1, 32 ; extract imag part
;;  # (end cycle 0)
fmulw $r4 = $r3, $r1 ; float mul
;;  # (end cycle 1)
fmulw $r2 = $r5, $r1 ; float mul
extfz $r1 = $r1, 32+32-1, 32 ; extract imag part
;;  # (end cycle 2)  
ffmsw $r4 = $r5, $r1 ; float FMS
;;  # (end cycle 5)
ffmaw $r2 = $r3, $r1 ; float FMA
;;  # (end cycle 6)
insf $r0 = $r4, 32+0-1, 0; insert real part
;;  # (end cycle 9)
insf $r0 = $r2, 32+32-1, 32  ; insert imag part
ret
;;  # (end cycle 10)

  The 

[PATCH] riscv: Fix warning in riscv_regno_ok_for_index_p

2023-07-17 Thread Christoph Muellner
From: Christoph Müllner 

The variable `regno` is currently not used in riscv_regno_ok_for_index_p(),
which triggers a compiler warning. Let's address this.

Fixes: 423604278ed5 ("riscv: Prepare backend for index registers")

Reported-by: Juzhe Zhong 
Reported-by: Andreas Schwab 
Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_regno_ok_for_index_p):
Remove parameter name from declaration of unused parameter.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6ed735d6983..ae3c034e76e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -861,7 +861,7 @@ riscv_index_reg_class ()
but extensions might support that.  */
 
 int
-riscv_regno_ok_for_index_p (int regno)
+riscv_regno_ok_for_index_p (int)
 {
   return 0;
 }
-- 
2.41.0



Re: [PATCH] vect: Initialize new_temp to avoid false positive warning [PR110652]

2023-07-17 Thread Kewen.Lin via Gcc-patches
on 2023/7/17 14:39, Richard Biener wrote:
> On Mon, Jul 17, 2023 at 4:22 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR110652 and its duplicate PRs show, there could be one
>> build error
>>
>>   error: 'new_temp' may be used uninitialized
>>
>> for some build configurations.  It's a false positive warning
>> (or error at -Werror), but in order to make the build succeed,
>> this patch is to initialize the reported variable 'new_temp'
>> as NULL_TREE.
>>
>> Confirmed this patch fixed the reported issue in PR110652
>> (with the same configuration).
>>
>> Is it ok for trunk?
> 
> OK.

Thanks Richi, pushed as r14-2560.

BR,
Kewen


Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-17 Thread Hongtao Liu via Gcc-patches
I'd like to ping for this patch (only patch 1/2; for patch 2/2, I
think it may not be necessary).

On Mon, May 15, 2023 at 9:20 AM Hongtao Liu  wrote:
>
> ping.
>
> On Fri, Apr 21, 2023 at 9:55 PM liuhongt  wrote:
> >
> > > > +  if (!TARGET_SSE2)
> > > > +{
> > > > +  if (c_dialect_cxx ()
> > > > +   && cxx_dialect > cxx20)
> > >
> > > Formatting, both conditions are short, so just put them on one line.
> > Changed.
> >
> > > But for the C++23 macros, more importantly I think we really should
> > > also in ix86_target_macros_internal add
> > >   if (c_dialect_cxx ()
> > >   && cxx_dialect > cxx20
> > >   && (isa_flag & OPTION_MASK_ISA_SSE2))
> > > {
> > >   def_or_undef (parse_in, "__STDCPP_FLOAT16_T__");
> > >   def_or_undef (parse_in, "__STDCPP_BFLOAT16_T__");
> > > }
> > > plus associated libstdc++ changes.  It can be done incrementally though.
> > Added in PATCH 2/2
> >
> > > > +  if (flag_building_libgcc)
> > > > + {
> > > > +   /* libbid uses __LIBGCC_HAS_HF_MODE__ and __LIBGCC_HAS_BF_MODE__
> > > > +  to check backend support of _Float16 and __bf16 type.  */
> > >
> > > That is actually the case only for HFmode, but not for BFmode right now.
> > > So, we need further work.  One is to add the BFmode support in there,
> > > and another one is make sure the _Float16 <-> _Decimal* and __bf16 <->
> > > _Decimal* conversions are compiled in also if not -msse2 by default.
> > > One way to do that is wrap the HF and BF mode related functions on x86
> > > #ifndef __SSE2__ into the pragmas like intrin headers use (but then
> > > perhaps we don't need to undef this stuff here), another is not provide
> > > the hf/bf support in that case from the TUs where they are provided now,
> > > but from a different one which would be compiled with -msse2.
> > Add CFLAGS-_hf_to_sd.c += -msse2, similar for other files in libbid, just 
> > like
> > we did before for HFtype softfp. Then no need to undef libgcc macros.
> >
> > > >/* We allowed the user to turn off SSE for kernel mode.  Don't crash 
> > > > if
> > > >   some less clueful developer tries to use floating-point anyway.  
> > > > */
> > > > -  if (needed_sseregs && !TARGET_SSE)
> > > > +  if (needed_sseregs
> > > > +  && (!TARGET_SSE
> > > > +   || (VALID_SSE2_TYPE_MODE (mode)
> > > > +   && !TARGET_SSE2)))
> > >
> > > Formatting, no need to split this up that much.
> > >   if (needed_sseregs
> > >   && (!TARGET_SSE
> > >   || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > or even better
> > >   if (needed_sseregs
> > >   && (!TARGET_SSE || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > will do it.
> > Changed.
> >
> > > Instead of this, just use
> > >   if (!float16_type_node)
> > > {
> > >   float16_type_node = ix86_float16_type_node;
> > >   callback (float16_type_node);
> > >   float16_type_node = NULL_TREE;
> > > }
> > >   if (!bfloat16_type_node)
> > > {
> > >   bfloat16_type_node = ix86_bf16_type_node;
> > >   callback (bfloat16_type_node);
> > >   bfloat16_type_node = NULL_TREE;
> > > }
> > Changed.
> >
> >
> > > > +static const char *
> > > > +ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> > > > +{
> > > > +  if (element_mode (fromtype) != element_mode (totype))
> > > > +{
> > > > +  /* Do no allow conversions to/from BFmode/HFmode scalar types
> > > > +  when TARGET_SSE2 is not available.  */
> > > > +  if ((TYPE_MODE (fromtype) == BFmode
> > > > +|| TYPE_MODE (fromtype) == HFmode)
> > > > +   && !TARGET_SSE2)
> > >
> > > First of all, not really sure if this should be purely about scalar
> > > modes, not also complex and vector modes involving those inner modes.
> > > Because complex or vector modes with BF/HF elements will be without
> > > TARGET_SSE2 for sure lowered into scalar code and that can't be handled
> > > either.
> > > So if (!TARGET_SSE2 && GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode)
> > > or even better
> > > if (!TARGET_SSE2 && element_mode (fromtype) == BFmode)
> > > ?
> > > Or even better remember the 2 modes above into machine_mode temporaries
> > > and just use those in the != comparison and for the checks?
> > >
> > > Also, I think it is weird to tell user %<__bf16%> or %<_Float16%> when
> > > we know which one it is.  Just return separate messages?
> > Changed.
> >
> > > > +  /* Reject all single-operand operations on BFmode/HFmode except for &
> > > > + when TARGET_SSE2 is not available.  */
> > > > +  if ((element_mode (type) == BFmode || element_mode (type) == HFmode)
> > > > +  && !TARGET_SSE2 && op != ADDR_EXPR)
> > > > +return N_("operation not permitted on type %<__bf16%> "
> > > > +   "or %<_Float16%> without option %<-msse2%>");
> > >
> > > Similarly.  Also, check !TARGET_SSE2 first as inexpensive one.
> > Changed.
> >
> >
> > Bootstrapped and regtested 

Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-17 Thread juzhe.zh...@rivai.ai
Address comment.

V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624638.html 

I added:

+/* Change insn and assert that the change always happens.  */
+static void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
as you suggested.

Could you take a look again?


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-17 15:00
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
> @@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
>  void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
>  void emit_scalar_move_insn (unsigned, rtx *);
>  void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
> +//void emit_vlmax_reduction_insn (unsigned, rtx *);
 
Plz drop this.
 
 
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 586dc8e5379..97a9dad8a77 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const 
> vl_vtype_info &info, rtx vl)
>  }
>
>  static rtx
> -gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
> +gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
> +   rtx vl = NULL_RTX)
>  {
>rtx new_pat;
>vl_vtype_info new_info = info;
> @@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info 
> &info)
>if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
>  {
>rtx dest = get_vl (rinsn);
 
rtx dest = vl ? vl : get_vl (rinsn);
 
> -  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
> +  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
 
and keep dest here.
 
>  }
>else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
>  new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
 
Should we handle vl is non-null case in else-if and else case?
Add `assert (vl == NULL_RTX)` if not handle.
 
> @@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
>print_rtl_single (dump_file, PATTERN (rinsn));
>  }
>
> -  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  gcc_assert (change_p);
 
I think we could create a wrapper for validate_change to make sure
that return true, and also use that wrapper for all other call sites?
 
e.g.
validate_change_or_fail?
 


Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Fri, Jul 14, 2023 at 5:58 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> Summary: We'd like to be able to specify some attributes using
>> keywords, rather than the traditional __attribute__ or [[...]]
>> syntax.  Would that be OK?
>>
>> In more detail:
>>
>> We'd like to add some new target-specific attributes for Arm SME.
>> These attributes affect semantics and code generation and so they
>> can't simply be ignored.
>>
>> Traditionally we've done this kind of thing by adding GNU attributes,
>> via TARGET_ATTRIBUTE_TABLE in GCC's case.  The problem is that both
>> GCC and Clang have traditionally only warned about unrecognised GNU
>> attributes, rather than raising an error.  Older compilers might
>> therefore be able to look past some uses of the new attributes and
>> still produce object code, even though that object code is almost
>> certainly going to be wrong.  (The compilers will also emit a default-on
>> warning, but that might go unnoticed when building a big project.)
>>
>> There are some existing attributes that similarly affect semantics
>> in ways that cannot be ignored.  vector_size is one obvious example.
>> But that doesn't make it a good thing. :)
>>
>> Also, C++ says this for standard [[...]] attributes:
>>
>>   For an attribute-token (including an attribute-scoped-token)
>>   not specified in this document, the behavior is implementation-defined;
>>   any such attribute-token that is not recognized by the implementation
>>   is ignored.
>>
>> which doubles down on the idea that attributes should not be used
>> for necessary semantic information.
>>
>> One of the attributes we'd like to add provides a new way of compiling
>> existing code.  The attribute doesn't require SME to be available;
>> it just says that the code must be compiled so that it can run in either
>> of two modes.  This is probably the most dangerous attribute of the set,
>> since compilers that ignore it would just produce normal code.  That
>> code might work in some test scenarios, but it would fail in others.
>>
>> The feeling from the Clang community was therefore that these SME
>> attributes should use keywords instead, so that the keywords trigger
>> an error with older compilers.
>>
>> However, it seemed wrong to define new SME-specific grammar rules,
>> since the underlying problem is pretty generic.  We therefore
>> proposed having a type of keyword that can appear exactly where
>> a standard [[...]] attribute can appear and that appertains to
>> exactly what a standard [[...]] attribute would appertain to.
>> No divergence or cherry-picking is allowed.
>>
>> For example:
>>
>>   [[arm::foo]]
>>
>> would become:
>>
>>   __arm_foo
>>
>> and:
>>
>>   [[arm::bar(args)]]
>>
>> would become:
>>
>>   __arm_bar(args)
>>
>> It wouldn't be possible to retrofit arguments to a keyword that
>> previously didn't take arguments, since that could lead to parsing
>> ambiguities.  So when a keyword is first added, a binding decision
>> would need to be made whether the keyword always takes arguments
>> or is always standalone.
>>
>> For that reason, empty argument lists are allowed for keywords,
>> even though they're not allowed for [[...]] attributes.
>>
>> The argument-less version was accepted into Clang, and I have a follow-on
>> patch for handling arguments.  Would the same thing be OK for GCC,
>> in both the C and C++ frontends?
>>
>> The patch below is a proof of concept for the C frontend.  It doesn't
>> bootstrap due to warnings about uninitialised fields.  And it doesn't
>> have tests.  But I did test it locally with various combinations of
>> attribute_spec and it seemed to work as expected.
>>
>> The impact on the C frontend seems to be pretty small.  It looks like
>> the impact on the C++ frontend would be a bit bigger, but not much.
>>
>> The patch contains a logically unrelated change: c-common.h set aside
>> 16 keywords for address spaces, but of the in-tree ports, the maximum
>> number of keywords used is 6 (for amdgcn).  The patch therefore changes
>> the limit to 8 and uses 8 keywords for the new attributes.  This keeps
>> the number of reserved ids <= 256.
>
> If you had added __arm(bar(args)) instead of __arm_bar(args) you would only
> need one additional keyword - we could set aside a similar one for each
> target then.  I realize that double-nesting of arguments might prove a bit
> challenging but still.

Yeah, that would work.

> In any case I also think that attributes are what you want and their
> ugliness/issues are not worse than the ugliness/issues of the keyword
> approach IMHO.

I guess the ugliness of keywords is the double underscore?
What are the issues with the keyword approach though?

If it's two underscores vs miscompilation then it's not obvious
that two underscores should lose.

Richard


[PATCH V2] RISC-V: Support non-SLP unordered reduction

2023-07-17 Thread Juzhe-Zhong
This patch add reduc_*_scal to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider this following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x, int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
res &= x[i];
  return res;
}

ASM:
and_loop:
ble a1,zero,.L4
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.i v1,-1
.L3:
vsetvli a5,a1,e32,m1,tu,ma   -> MUST BE "TU".
slli    a4,a5,2
sub     a1,a1,a5
vle32.v v2,0(a0)
add     a0,a0,a4
vand.vv v1,v2,v1
bne     a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,-1
vsetvli a3,zero,e32,m1,ta,ma
vredand.vs  v1,v1,v2
vmv.x.s a5,v1
and a0,a2,a5
ret
.L4:
mv  a0,a2
ret
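Why the strip-mined loop must use a tail-undisturbed ("tu") policy can be modelled in plain C: the accumulator register holds VLMAX lanes, the last iteration may update only vl < VLMAX of them, and the final vredand.vs reduces the whole register, so the inactive tail lanes must keep their previous values (here the AND identity, -1). A scalar sketch, with an arbitrary VLMAX of 4 chosen purely for illustration:

```c
#include <stdint.h>

#define VLMAX 4  /* illustrative lane count, not a real vlen */

static int32_t
and_loop_model (const int32_t *x, int32_t n, int32_t res)
{
  int32_t acc[VLMAX];

  /* vmv.v.i v1,-1: seed every lane with the AND identity.  */
  for (int i = 0; i < VLMAX; i++)
    acc[i] = -1;

  for (int32_t done = 0; done < n;)
    {
      /* vsetvli a5,a1,...,tu,ma: active length of this chunk.  */
      int32_t vl = n - done < VLMAX ? n - done : VLMAX;

      /* vand.vv with tail undisturbed: lanes [vl..VLMAX) are NOT
         touched, so they keep their previously accumulated values.  */
      for (int i = 0; i < vl; i++)
        acc[i] &= x[done + i];
      done += vl;
    }

  /* vredand.vs v1,v1,v2: reduce the whole register.  A tail-agnostic
     policy above could have clobbered the inactive lanes and broken
     this reduction.  */
  int32_t red = -1;
  for (int i = 0; i < VLMAX; i++)
    red &= acc[i];

  return res & red;
}
```

With a tail-agnostic ("ta") policy the unupdated lanes of the final chunk would be unspecified, and the whole-register reduction would be wrong, which is what the "MUST BE TU" annotation in the asm above points at.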

Fix bug of VSETVL PASS which is caused by reduction testcase.

SLP reduction and floating-point in-order reduction are not supported yet.

gcc/ChangeLog:

* config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_and_scal_<mode>): Ditto.
(reduc_ior_scal_<mode>): Ditto.
(reduc_xor_scal_<mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add reduction.
(expand_reduction): New function.
* config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
(emit_vlmax_fp_reduction_insn): Ditto.
(get_m1_mode): Ditto.
(expand_cond_len_binop): Fix name.
(expand_reduction): New function.
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix VSETVL BUG.
(validate_change_or_fail): New function.
(change_insn): Fix VSETVL BUG.
(change_vsetvl_insn): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.
(pass_vsetvl::df_post_optimization): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
* gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 138 ++
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   |  84 ++-
 gcc/config/riscv/riscv-vsetvl.cc  |  57 ++--
 .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
 .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
 .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
 .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
 .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 13 files changed, 887 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 64a41bd7101..8cdec75bacf 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1554,3 +1554,141 @@
   riscv_vector::expand_cond_len_ternop (icode, operands);
   DONE;
 })
+
+;; =========================================================================
+;; == Reductions
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;;  [INT] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; 

[PATCH] tree-optimization/110669 - bogus matching of loop bitop

2023-07-17 Thread Richard Biener via Gcc-patches
The matching code lacked a check that we end up with a PHI node
in the loop header.  This caused us to match a random PHI argument
now caught by the extra PHI_ARG_DEF_FROM_EDGE checking.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110669
* tree-scalar-evolution.cc (analyze_and_compute_bitop_with_inv_effect):
Check we matched a header PHI.

* gcc.dg/torture/pr110669.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110669.c | 15 +++
 gcc/tree-scalar-evolution.cc|  1 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110669.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110669.c 
b/gcc/testsuite/gcc.dg/torture/pr110669.c
new file mode 100644
index 000..b0a9ea448f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110669.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int g_29, func_47_p_48, func_47_p_51_l_129;
+void func_47_p_51()
+{
+  for (;;)
+{
+  func_47_p_51_l_129 = 0;
+  for (; func_47_p_51_l_129 <= 1; func_47_p_51_l_129 += 1)
+   {
+ short *l_160 = (short *)(__UINTPTR_TYPE__)(func_47_p_48 || *l_160);
+ *l_160 &= g_29;
+   }
+}
+}
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index ba47a684f4b..2abe8fa0b90 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3674,6 +3674,7 @@ analyze_and_compute_bitop_with_inv_effect (class loop* 
loop, tree phidef,
   if (TREE_CODE (match_op[1]) != SSA_NAME
   || !expr_invariant_in_loop_p (loop, match_op[0])
   || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[1])))
+  || gimple_bb (header_phi) != loop->header
   || gimple_phi_num_args (header_phi) != 2)
 return NULL_TREE;
 
-- 
2.35.3


[PATCH] Use substituted GDCFLAGS

2023-07-17 Thread Andreas Schwab via Gcc-patches
Use the substituted value for GDCFLAGS instead of hardcoding $(CFLAGS) so
that the subdir configure scripts use the configured value.

* configure.ac (GDCFLAGS): Set default from ${CFLAGS}.
* configure: Regenerate.
* Makefile.in (GDCFLAGS): Substitute @GDCFLAGS@.
---
 Makefile.in  | 2 +-
 configure| 1 +
 configure.ac | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile.in b/Makefile.in
index 04307ca561b..144bccd2603 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -444,7 +444,7 @@ LIBCFLAGS = $(CFLAGS)
 CXXFLAGS = @CXXFLAGS@
 LIBCXXFLAGS = $(CXXFLAGS) -fno-implicit-templates
 GOCFLAGS = $(CFLAGS)
-GDCFLAGS = $(CFLAGS)
+GDCFLAGS = @GDCFLAGS@
 GM2FLAGS = $(CFLAGS)
 
 # Pass additional PGO and LTO compiler options to the PGO build.
diff --git a/configure b/configure
index 0d3f5c6455d..3269da9829f 100755
--- a/configure
+++ b/configure
@@ -12947,6 +12947,7 @@ fi
 
 
 
+GDCFLAGS=${GDCFLAGS-${CFLAGS}}
 
 # Target tools.
 
diff --git a/configure.ac b/configure.ac
index dddab2a56d8..d07a0fa7698 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3662,6 +3662,7 @@ AC_SUBST(CFLAGS)
 AC_SUBST(CXXFLAGS)
 AC_SUBST(GDC)
 AC_SUBST(GDCFLAGS)
+GDCFLAGS=${GDCFLAGS-${CFLAGS}}
 
 # Target tools.
 AC_ARG_WITH([build-time-tools], 
-- 
2.41.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [gcc r14-2455] riscv: Prepare backend for index registers

2023-07-17 Thread Christoph Müllner
On Mon, Jul 17, 2023 at 9:44 AM Andreas Schwab  wrote:
>
> On Jul 17 2023, Christoph Müllner wrote:
>
> > My host compiler is: gcc version 13.1.1 20230614 (Red Hat 13.1.1-4) (GCC)
>
> Too old.

Ok understood.

Thanks,
Christoph

>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: [gcc r14-2455] riscv: Prepare backend for index registers

2023-07-17 Thread Christoph Müllner
On Mon, Jul 17, 2023 at 9:31 AM Andrew Pinski  wrote:
>
> On Sun, Jul 16, 2023 at 11:49 PM Christoph Müllner
>  wrote:
> >
> > On Fri, Jul 14, 2023 at 12:28 PM Andreas Schwab  
> > wrote:
> > >
> > > Why didn't you test that?
> >
> > Thanks for reporting, and sorry for introducing this warning.
> >
> > I test all patches before sending them.
> > In the case of RISC-V backend patches, I build a 2-stage
> > cross-toolchain and run all regression tests for RV32 and RV64 (using
> > QEMU).
> > Testing is done with and without patches applied to identify regressions.
> >
> > The build process shows a lot of warnings. Therefore I did not
> > investigate finding a way to use -Werror.
> > This means that looking for compiler warnings is a manual step, and I
> > might miss one while scrolling through the logs.
>
> If you are building a cross compiler, and want to clean up warnings,
> first build a native compiler and then build the cross using that.

Ok, will adjust my workflow accordingly.

> Also, it may be worth finding a way to do native bootstraps on riscv
> for testing patches, rather than doing just cross builds when testing
> backend patches.
> The GCC testsuite helps, but a bootstrap is more likely to find
> backend issues and such.

Yes, using the patch-under-testing to build a toolchain can identify
issues that the testsuite can't find. I did that a couple of times in a
QEMU environment, but I prefer the cross-toolchain approach because
it is faster. For patches that have a bigger impact, I test the toolchain
with SPEC CPU 2017.

Thanks,
Christoph

>
> Thanks,
> Andrew
>
> >
> > Sorry for the inconvenience,
> > Christoph
> >
> >
> > >
> > > ../../gcc/config/riscv/riscv.cc: In function 'int 
> > > riscv_regno_ok_for_index_p(int)':
> > > ../../gcc/config/riscv/riscv.cc:864:33: error: unused parameter 'regno' 
> > > [-Werror=unused-parameter]
> > >   864 | riscv_regno_ok_for_index_p (int regno)
> > >   | ^
> > > cc1plus: all warnings being treated as errors
> > > make[3]: *** [Makefile:2499: riscv.o] Error 1
> > >
> > > --
> > > Andreas Schwab, sch...@linux-m68k.org
> > > GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> > > "And now for something completely different."


[PATCH] Export value/mask known bits from IPA.

2023-07-17 Thread Aldy Hernandez via Gcc-patches
Currently IPA throws away the known 1 bits because VRP and irange have
traditionally only had a way of tracking known 0s (set_nonzero_bits).
With the ability to keep all the known bits in the irange, we can now
save this between passes.

OK?

gcc/ChangeLog:

* ipa-prop.cc (ipcp_update_bits): Export value/mask known bits.
---
 gcc/ipa-prop.cc | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index d2b998f8af5..5d790ff1265 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5853,10 +5853,9 @@ ipcp_update_bits (struct cgraph_node *node, 
ipcp_transformation *ts)
{
  unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef));
  signop sgn = TYPE_SIGN (TREE_TYPE (ddef));
-
- wide_int nonzero_bits = wide_int::from (bits[i]->mask, prec, UNSIGNED)
- | wide_int::from (bits[i]->value, prec, sgn);
- set_nonzero_bits (ddef, nonzero_bits);
+ wide_int mask = wide_int::from (bits[i]->mask, prec, UNSIGNED);
+ wide_int value = wide_int::from (bits[i]->value, prec, sgn);
+ set_bitmask (ddef, value, mask);
}
   else
{
-- 
2.40.1
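For reference, the value/mask pair preserved here follows CCP's bit-lattice convention: a 1 bit in mask means the bit is unknown, and value supplies the values of the known bits. The old code folded the pair into a single nonzero-bits mask, which keeps the may-be-1 information but drops the known-to-be-1 bits. A small model (struct and helper names are illustrative, not GCC's):

```c
#include <stdint.h>

/* Illustrative model of the CCP-style bit lattice.  */
struct bit_lattice
{
  uint64_t value;  /* bit values, meaningful where mask is 0 */
  uint64_t mask;   /* 1 = bit unknown, 0 = bit known */
};

/* All bits that may possibly be 1: every unknown bit plus every
   known-1 bit.  This is the only information set_nonzero_bits ()
   could record, and matches the old value | mask computation.  */
static uint64_t
maybe_nonzero_bits (struct bit_lattice b)
{
  return b.value | b.mask;
}

/* Bits known to be 1: known (mask 0) with value 1.  This is what the
   old code threw away and what keeping the full pair preserves.  */
static uint64_t
known_one_bits (struct bit_lattice b)
{
  return b.value & ~b.mask;
}
```

For example, value = 0b1010 with mask = 0b0100 means bits 1 and 3 are known 1, bit 2 is unknown, and the rest are known 0; collapsing to nonzero bits (0b1110) loses the fact that bits 1 and 3 are definitely set.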



[PATCH] Export value/mask known bits from CCP.

2023-07-17 Thread Aldy Hernandez via Gcc-patches
Currently CCP throws away the known 1 bits because VRP and irange have
traditionally only had a way of tracking known 0s (set_nonzero_bits).
With the ability to keep all the known bits in the irange, we can now
save this between passes.

OK?

gcc/ChangeLog:

* tree-ssa-ccp.cc (ccp_finalize): Export value/mask known bits.
---
 gcc/tree-ssa-ccp.cc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 0d0f02a8442..64d5fa81334 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1020,11 +1020,9 @@ ccp_finalize (bool nonzero_p)
   else
{
  unsigned int precision = TYPE_PRECISION (TREE_TYPE (val->value));
- wide_int nonzero_bits
-   = (wide_int::from (val->mask, precision, UNSIGNED)
-  | wi::to_wide (val->value));
- nonzero_bits &= get_nonzero_bits (name);
- set_nonzero_bits (name, nonzero_bits);
+ wide_int value = wi::to_wide (val->value);
+ wide_int mask = wide_int::from (val->mask, precision, UNSIGNED);
+ set_bitmask (name, value, mask);
}
 }
 
-- 
2.40.1



RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-17 Thread Roger Sayle


> From: Jiang, Haochen 
> Sent: 17 July 2023 02:50
> 
> > From: Jiang, Haochen
> > Sent: Friday, July 14, 2023 10:50 AM
> >
> > > The recent change in TImode parameter passing on x86_64 results in
> > > the FAIL of pr91681-1.c.  The issue is that with the extra
> > > flexibility, the combine pass is now spoilt for choice between using
> > > either the *add3_doubleword_concat or the
> > > *add3_doubleword_zext patterns, when one operand is a *concat and
> the other is a zero_extend.
> > > The solution proposed below is provide an
> > > *add3_doubleword_concat_zext define_insn_and_split, that can
> > > benefit both from the register allocation of *concat, and still
> > > avoid the xor normally required by zero extension.
> > >
> > > I'm investigating a follow-up refinement to improve register
> > > allocation further by avoiding the early clobber in the =, and
> > > handling (custom) reloads explicitly, but this piece resolves the
> > > testcase
> > failure.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check, both with and without
> > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-11  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR target/91681
> > > * config/i386/i386.md (*add3_doubleword_concat_zext): New
> > > define_insn_and_split derived from
*add3_doubleword_concat
> > > and *add3_doubleword_zext.
> >
> > Hi Roger,
> >
> > This commit currently changed the codegen of testcase p443644-2.c from:
> 
> Oops, a typo, I mean pr43644-2.c.
> 
> Haochen

I'm working on a fix and hope to have this resolved soon (unfortunately
fixing things in a post-reload splitter isn't working out due to reload's
choices, so the solution will likely be a peephole2).

The problem is that pr91681-1.c and pr43644-2.c can't both PASS (as
written)!  The operation x = y + 0 can be generated as either
"mov y,x; add $0,x" or as "xor x,x; add y,x".  pr91681-1.c checks there
isn't an xor, pr43644-2.c checks there isn't a mov.  Doh!  As the author
of both these test cases, I've painted myself into a corner.

The solution is that add $0,x should be generated (optimal) when y is
already in x, and "xor x,x; add y,x" used otherwise (as this is shorter
than "mov y,x; add $0,x", both sequences being approximately equal
performance-wise).

> > movq    %rdx, %rax
> > xorl    %edx, %edx
> > addq    %rdi, %rax
> > adcq    %rsi, %rdx
> > to:
> > movq    %rdx, %rcx
> > movq    %rdi, %rax
> > movq    %rsi, %rdx
> > addq    %rcx, %rax
> > adcq    $0, %rdx
> >
> > which causes the testcase fail under -m64.
> > Is this within your expectation?

You're right that the original (using xor) is better for pr43644-2.c's
test case:
unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x+y; }
but the closely related (swapping the argument order):
unsigned __int128 bar(unsigned long long y, unsigned __int128 x) { return x+y; }
is better using "adcq $0" than having a superfluous xor.
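Both instruction sequences implement the same double-word addition. A scalar model of how the compiler splits the 128-bit add into 64-bit halves with a carry (addq then adcq) may make the trade-off concrete; the struct and function names here are illustrative only:

```c
#include <stdint.h>

/* Model of x86_64 double-word addition: a 128-bit value is a lo/hi
   pair of 64-bit halves.  */
struct u128_model
{
  uint64_t lo, hi;
};

static struct u128_model
add_u128_u64_model (struct u128_model x, uint64_t y)
{
  struct u128_model r;

  /* addq: low halves add, possibly wrapping.  */
  r.lo = x.lo + y;

  /* adcq: the carry out of the low addition feeds the high half.
     The zero-extended operand contributes 0 to the high half, which
     is exactly why the choice between "xor" (materialise the 0 in a
     register) and "adcq $0" (add an immediate 0 in place) arises.  */
  r.hi = x.hi + (r.lo < x.lo);
  return r;
}
```

Whichever of the two sequences is picked, the carry propagation is the same; the difference is only in how the zero high half of the extended operand is supplied.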

Executive summary: This FAIL isn't serious.  I'll silence it soon.

> > BRs,
> > Haochen
> >
> > >
> > >
> > > Thanks,
> > > Roger
> > > --



