Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-02-03 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 3 Feb 2023 at 20:47, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Fri, 3 Feb 2023 at 07:10, Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Thu, 2 Feb 2023 at 20:50, Richard Sandiford
> >>  wrote:
> >> >
> >> > Prathamesh Kulkarni  writes:
> >> > >> >> > I have attached a patch that extends the transform if one half
> >> > >> >> > is dup and the other is a set of constants.
> >> > >> >> > For eg:
> >> > >> >> > int8x16_t f(int8_t x)
> >> > >> >> > {
> >> > >> >> >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4, x, 5, x, 6, x, 7, 
> >> > >> >> > x, 8 };
> >> > >> >> > }
> >> > >> >> >
> >> > >> >> > code-gen trunk:
> >> > >> >> > f:
> >> > >> >> > adrp    x1, .LC0
> >> > >> >> > ldr     q0, [x1, #:lo12:.LC0]
> >> > >> >> > ins     v0.b[0], w0
> >> > >> >> > ins     v0.b[2], w0
> >> > >> >> > ins     v0.b[4], w0
> >> > >> >> > ins     v0.b[6], w0
> >> > >> >> > ins     v0.b[8], w0
> >> > >> >> > ins     v0.b[10], w0
> >> > >> >> > ins     v0.b[12], w0
> >> > >> >> > ins     v0.b[14], w0
> >> > >> >> > ret
> >> > >> >> >
> >> > >> >> > code-gen with patch:
> >> > >> >> > f:
> >> > >> >> > dup     v0.16b, w0
> >> > >> >> > adrp    x0, .LC0
> >> > >> >> > ldr     q1, [x0, #:lo12:.LC0]
> >> > >> >> > zip1    v0.16b, v0.16b, v1.16b
> >> > >> >> > ret
> >> > >> >> >
> >> > >> >> > Bootstrapped+tested on aarch64-linux-gnu.
> >> > >> >> > Does it look OK ?
> >> > >> >>
> >> > >> >> Looks like a nice improvement.  It'll need to wait for GCC 14
> >> > >> >> now though.
> >> > >> >>
> >> > >> >> However, rather than handle this case specially, I think we should
> >> > >> >> instead take a divide-and-conquer approach: split the initialiser
> >> > >> >> into even and odd elements, find the best way of loading each part,
> >> > >> >> then compare the cost of these sequences + ZIP with the cost of the
> >> > >> >> fallback code (the code later in aarch64_expand_vector_init).
> >> > >> >>
> >> > >> >> For example, doing that would allow:
> >> > >> >>
> >> > >> >>   { x, y, 0, y, 0, y, 0, y, 0, y }
> >> > >> >>
> >> > >> >> to be loaded more easily, even though the even elements aren't
> >> > >> >> wholly constant.
> >> > >> > Hi Richard,
> >> > >> > I have attached a prototype patch based on the above approach.
> >> > >> > It subsumes specializing for the above {x, y, x, y, x, y, x, y} case
> >> > >> > by generating the same sequence, thus I removed that hunk, and it
> >> > >> > improves the following cases:
> >> > >> >
> >> > >> > (a)
> >> > >> > int8x16_t f_s16(int8_t x)
> >> > >> > {
> >> > >> >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4,
> >> > >> >  x, 5, x, 6, x, 7, x, 8 };
> >> > >> > }
> >> > >> >
> >> > >> > code-gen trunk:
> >> > >> > f_s16:
> >> > >> > adrp    x1, .LC0
> >> > >> > ldr     q0, [x1, #:lo12:.LC0]
> >> > >> > ins     v0.b[0], w0
> >> > >> > ins     v0.b[2], w0
> >> > >> > ins     v0.b[4], w0
> >> > >> > ins     v0.b[6], w0
> >> > >> > ins     v0.b[8], w0
> >> > >> > ins     v0.b[10], w0
> >> > >> > ins     v0.b[12], w0
> >> > >> > ins     v0.b[14], w0
> >> > >> > ret
> >> > >> >
> >> > >> > code-gen with patch:
> >> > >> > f_s16:
> >> > >> > dup     v0.16b, w0
> >> > >> > adrp    x0, .LC0
> >> > >> > ldr     q1, [x0, #:lo12:.LC0]
> >> > >> > zip1    v0.16b, v0.16b, v1.16b
> >> > >> > ret
> >> > >> >
> >> > >> > (b)
> >> > >> > int8x16_t f_s16(int8_t x, int8_t y)
> >> > >> > {
> >> > >> >   return (int8x16_t) { x, y, 1, y, 2, y, 3, y,
> >> > >> > 4, y, 5, y, 6, y, 7, y };
> >> > >> > }
> >> > >> >
> >> > >> > code-gen trunk:
> >> > >> > f_s16:
> >> > >> > adrp    x2, .LC0
> >> > >> > ldr     q0, [x2, #:lo12:.LC0]
> >> > >> > ins     v0.b[0], w0
> >> > >> > ins     v0.b[1], w1
> >> > >> > ins     v0.b[3], w1
> >> > >> > ins     v0.b[5], w1
> >> > >> > ins     v0.b[7], w1
> >> > >> > ins     v0.b[9], w1
> >> > >> > ins     v0.b[11], w1
> >> > >> > ins     v0.b[13], w1
> >> > >> > ins     v0.b[15], w1
> >> > >> > ret
> >> > >> >
> >> > >> > code-gen patch:
> >> > >> > f_s16:
> >> > >> > adrp    x2, .LC0
> >> > >> > dup     v1.16b, w1
> >> > >> > ldr     q0, [x2, #:lo12:.LC0]
> >> > >> > ins     v0.b[0], w0
> >> > >> > zip1    v0.16b, v0.16b, v1.16b
> >> > >> > ret
> >> > >>
> >> > >> Nice.
> >> > >>
> >> > >> > There are a couple of issues I have come across:
> >> > >> > (1) Choosing element to pad vector.
> >> > >> > For eg, if we are initializing a vector, say { x, y, 0, y, 1, y, 2, y },
> >> > >> > with mode V8HI.
> >> > >> 
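
A standalone ACLE sketch of the dup + zip1 trick discussed above (illustrative
only, not part of the patch): vzip1q_s8 interleaves the low eight lanes of its
two operands, which is exactly what the expected dup/ldr/zip1 code-gen does for
case (a).

#include <arm_neon.h>

/* Builds (int8x16_t) { x, 1, x, 2, x, 3, x, 4, x, 5, x, 6, x, 7, x, 8 }
   from a dup of x and a constant vector.  */
int8x16_t
f_zip (int8_t x)
{
  int8x16_t evens = vdupq_n_s8 (x);  /* x in every lane */
  const int8x16_t odds = { 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0 };
  return vzip1q_s8 (evens, odds);    /* x,1,x,2,...,x,8 */
}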

[PATCH] libstdc++: Avoid use of naked int32_t in unseq_backend_simd.h, PR108672

2023-02-03 Thread Hans-Peter Nilsson via Gcc-patches
Tested cris-elf and native x86_64-pc-linux-gnu.
Ok to commit?

 8< 
The use of a "naked" int32_t (i.e. without a fitting #include:
stdint.h or cstdint or inttypes.h or an equivalent internal header),
in libstdc++-v3/include/pstl/unseq_backend_simd.h, caused an error for
cris-elf and apparently pru-elf and I guess all "newlib targets".
(Unfortunately, there's a lack of other *-elf targets in recent months
of gcc-testresults archives.)

This does not manifest on e.g. native x86_64-pc-linux-gnu, because
there, a definition is included as an effect of including stdlib.h in
cstdlib (following the trace in native xtreme-header-2_a.ii with
glibc-2.31-13+deb11u5).  Maybe better than chasing the right #includes
is to directly use the built-in type, like so:

libstdc++-v3:

PR libstdc++/108672
* include/pstl/unseq_backend_simd.h (__simd_or): Use __INT32_TYPE__
instead of int32_t.
---
 libstdc++-v3/include/pstl/unseq_backend_simd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/pstl/unseq_backend_simd.h 
b/libstdc++-v3/include/pstl/unseq_backend_simd.h
index a05de39f7576..f6265f5c16e5 100644
--- a/libstdc++-v3/include/pstl/unseq_backend_simd.h
+++ b/libstdc++-v3/include/pstl/unseq_backend_simd.h
@@ -74,7 +74,7 @@ __simd_or(_Index __first, _DifferenceType __n, _Pred __pred) noexcept
 const _Index __last = __first + __n;
 while (__last != __first)
 {
-int32_t __flag = 1;
+__INT32_TYPE__ __flag = 1;
 _PSTL_PRAGMA_SIMD_REDUCTION(& : __flag)
 for (_DifferenceType __i = 0; __i < __block_size; ++__i)
 if (__pred(*(__first + __i)))
-- 
2.30.2
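
As an aside, __INT32_TYPE__ needs no header at all: it is one of the type
macros GCC predefines for every target (a quick illustrative check is
gcc -dM -E - < /dev/null | grep __INT32_TYPE__; the expansion varies by
target).  So the same spelling works in any translation unit, e.g.:

/* No #include needed; GCC predefines __INT32_TYPE__ (typically int).  */
__INT32_TYPE__ flag = 1;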



[PATCH] RISC-V: Fix VSETVL PASS bug in exception handling

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::compute_probabilities): 
Skip exit block.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/exception-1.C: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 10 +--
 .../g++.target/riscv/rvv/base/exception-1.C   | 29 +++
 2 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/exception-1.C

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ef5b74c58d2..8e6063ae83b 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3492,8 +3492,15 @@ pass_vsetvl::compute_probabilities (void)
   basic_block cfg_bb = bb->cfg_bb ();
  auto &curr_prob
= m_vector_manager->vector_block_infos[cfg_bb->index].probability;
+
+  /* GCC assumes the entry block (bb 0) is always executed,
+so set its probability to "always".  */
   if (ENTRY_BLOCK_PTR_FOR_FN (cfun) == cfg_bb)
curr_prob = profile_probability::always ();
+  /* Exit block (bb 1) is the block we don't need to process.  */
+  if (EXIT_BLOCK_PTR_FOR_FN (cfun) == cfg_bb)
+   continue;
+
   gcc_assert (curr_prob.initialized_p ());
   FOR_EACH_EDGE (e, ei, cfg_bb->succs)
{
@@ -3507,9 +3514,6 @@ pass_vsetvl::compute_probabilities (void)
new_prob += curr_prob * e->probability;
}
 }
-  auto &exit_block
-= m_vector_manager->vector_block_infos[EXIT_BLOCK_PTR_FOR_FN (cfun)->index];
-  exit_block.probability = profile_probability::always ();
 }
 
 /* Lazy vsetvl insertion for optimize > 0. */
diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/exception-1.C 
b/gcc/testsuite/g++.target/riscv/rvv/base/exception-1.C
new file mode 100644
index 000..5f5247bce46
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/exception-1.C
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include <stdio.h>
+#include "riscv_vector.h"
+#include 
+void __attribute__((noinline)) foo(int arr[4]) {
+printf("%d %d %d %d\n", arr[0], arr[1], arr[2], arr[3]);
+}
+
+void __attribute__((noinline)) test() {
+// Initialization with 2 memsets leads to spilling of the zero-splat value
+vint32m1_t a;
+int arr1[4] = {};
+foo(arr1);
+int arr2[4] = {};
+foo(arr2);
+asm volatile ("# %0" : "+vr" (a));
+throw int();
+}
+
+int main() {
+try {
+   test();
+} catch (...) {
+   printf("hello\n");
+};
+return 0;
+}
-- 
2.36.1



[PATCH] RISC-V: Add vneg.v C++ API tests

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/vneg_v-1.C: New test.
* g++.target/riscv/rvv/base/vneg_v-2.C: New test.
* g++.target/riscv/rvv/base/vneg_v-3.C: New test.
* g++.target/riscv/rvv/base/vneg_v_mu-1.C: New test.
* g++.target/riscv/rvv/base/vneg_v_mu-2.C: New test.
* g++.target/riscv/rvv/base/vneg_v_mu-3.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tu-1.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tu-2.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tu-3.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tum-1.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tum-2.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tum-3.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tumu-1.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tumu-2.C: New test.
* g++.target/riscv/rvv/base/vneg_v_tumu-3.C: New test.

---
 .../g++.target/riscv/rvv/base/vneg_v-1.C  | 314 ++
 .../g++.target/riscv/rvv/base/vneg_v-2.C  | 314 ++
 .../g++.target/riscv/rvv/base/vneg_v-3.C  | 314 ++
 .../g++.target/riscv/rvv/base/vneg_v_mu-1.C   | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_mu-2.C   | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_mu-3.C   | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tu-1.C   | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tu-2.C   | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tu-3.C   | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tum-1.C  | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tum-2.C  | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tum-3.C  | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tumu-1.C | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tumu-2.C | 160 +
 .../g++.target/riscv/rvv/base/vneg_v_tumu-3.C | 160 +
 15 files changed, 2862 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_mu-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_mu-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_mu-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tu-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tu-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tu-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tum-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tum-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tum-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tumu-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tumu-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vneg_v_tumu-3.C

diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/vneg_v-1.C 
b/gcc/testsuite/g++.target/riscv/rvv/base/vneg_v-1.C
new file mode 100644
index 000..2d135e0bc0d
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/vneg_v-1.C
@@ -0,0 +1,314 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+vint8mf8_t test___riscv_vneg(vint8mf8_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint8mf4_t test___riscv_vneg(vint8mf4_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint8mf2_t test___riscv_vneg(vint8mf2_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint8m1_t test___riscv_vneg(vint8m1_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint8m2_t test___riscv_vneg(vint8m2_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint8m4_t test___riscv_vneg(vint8m4_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint8m8_t test___riscv_vneg(vint8m8_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint16mf4_t test___riscv_vneg(vint16mf4_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint16mf2_t test___riscv_vneg(vint16mf2_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint16m1_t test___riscv_vneg(vint16m1_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint16m2_t test___riscv_vneg(vint16m2_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint16m4_t test___riscv_vneg(vint16m4_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint16m8_t test___riscv_vneg(vint16m8_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint32mf2_t test___riscv_vneg(vint32mf2_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+
+vint32m1_t test___riscv_vneg(vint32m1_t op1,size_t vl)
+{
+return __riscv_vneg(op1,vl);
+}
+
+

[PATCH] RISC-V: Add vnot.v C++ API tests

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/vnot_v-1.C: New test.
* g++.target/riscv/rvv/base/vnot_v-2.C: New test.
* g++.target/riscv/rvv/base/vnot_v-3.C: New test.
* g++.target/riscv/rvv/base/vnot_v_mu-1.C: New test.
* g++.target/riscv/rvv/base/vnot_v_mu-2.C: New test.
* g++.target/riscv/rvv/base/vnot_v_mu-3.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tu-1.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tu-2.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tu-3.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tum-1.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tum-2.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tum-3.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tumu-1.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tumu-2.C: New test.
* g++.target/riscv/rvv/base/vnot_v_tumu-3.C: New test.

---
 .../g++.target/riscv/rvv/base/vnot_v-1.C  | 314 ++
 .../g++.target/riscv/rvv/base/vnot_v-2.C  | 314 ++
 .../g++.target/riscv/rvv/base/vnot_v-3.C  | 314 ++
 .../g++.target/riscv/rvv/base/vnot_v_mu-1.C   | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_mu-2.C   | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_mu-3.C   | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tu-1.C   | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tu-2.C   | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tu-3.C   | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tum-1.C  | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tum-2.C  | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tum-3.C  | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tumu-1.C | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tumu-2.C | 160 +
 .../g++.target/riscv/rvv/base/vnot_v_tumu-3.C | 160 +
 15 files changed, 2862 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_mu-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_mu-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_mu-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tu-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tu-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tu-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tum-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tum-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tum-3.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tumu-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tumu-2.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vnot_v_tumu-3.C

diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/vnot_v-1.C 
b/gcc/testsuite/g++.target/riscv/rvv/base/vnot_v-1.C
new file mode 100644
index 000..23e6f92c8c9
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/vnot_v-1.C
@@ -0,0 +1,314 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+vint8mf8_t test___riscv_vnot(vint8mf8_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint8mf4_t test___riscv_vnot(vint8mf4_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint8mf2_t test___riscv_vnot(vint8mf2_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint8m1_t test___riscv_vnot(vint8m1_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint8m2_t test___riscv_vnot(vint8m2_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint8m4_t test___riscv_vnot(vint8m4_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint8m8_t test___riscv_vnot(vint8m8_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint16mf4_t test___riscv_vnot(vint16mf4_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint16mf2_t test___riscv_vnot(vint16mf2_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint16m1_t test___riscv_vnot(vint16m1_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint16m2_t test___riscv_vnot(vint16m2_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint16m4_t test___riscv_vnot(vint16m4_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint16m8_t test___riscv_vnot(vint16m8_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint32mf2_t test___riscv_vnot(vint32mf2_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+
+vint32m1_t test___riscv_vnot(vint32m1_t op1,size_t vl)
+{
+return __riscv_vnot(op1,vl);
+}
+
+

[PATCH] RISC-V: Add unary constraint tests.

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: New test.

---
 .../riscv/rvv/base/unop_v_constraint-1.c  | 132 ++
 1 file changed, 132 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/unop_v_constraint-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/unop_v_constraint-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/unop_v_constraint-1.c
new file mode 100644
index 000..1266784fd8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/unop_v_constraint-1.c
@@ -0,0 +1,132 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+#include "riscv_vector.h"
+
+/*
+** f1:
+** vsetivli\tzero,4,e32,m1,tu,ma
+** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
+** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vse32\.v\tv[0-9]+,0\([a-x0-9]+\)
+** ret
+*/
+void f1 (void * in, void *out)
+{
+vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in, 4);
+vint32m1_t v3 = __riscv_vneg_v_i32m1 (v2, 4);
+vint32m1_t v4 = __riscv_vneg_v_i32m1_tu (v3, v2, 4);
+__riscv_vse32_v_i32m1 (out, v4, 4);
+}
+
+/*
+** f2:
+** vsetvli\t[a-x0-9]+,zero,e8,mf4,ta,ma
+** vlm.v\tv[0-9]+,0\([a-x0-9]+\)
+** vsetivli\tzero,4,e32,m1,ta,ma
+** vle32.v\tv[0-9]+,0\([a-x0-9]+\),v0.t
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vneg\.v\tv[1-9][0-9]?,\s*v[0-9]+,\s*v0.t
+** vse32.v\tv[0-9]+,0\([a-x0-9]+\)
+** ret
+*/
+void f2 (void * in, void *out)
+{
+vbool32_t mask = *(vbool32_t*)in;
+asm volatile ("":::"memory");
+vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+vint32m1_t v2 = __riscv_vle32_v_i32m1_m (mask, in, 4);
+vint32m1_t v3 = __riscv_vneg_v_i32m1 (v2, 4);
+vint32m1_t v4 = __riscv_vneg_v_i32m1_m (mask, v3, 4);
+__riscv_vse32_v_i32m1 (out, v4, 4);
+}
+
+/*
+** f3:
+** vsetvli\t[a-x0-9]+,zero,e8,mf4,ta,ma
+** vlm.v\tv[0-9]+,0\([a-x0-9]+\)
+** vsetivli\tzero,4,e32,m1,tu,mu
+** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
+** vle32.v\tv[0-9]+,0\([a-x0-9]+\),v0.t
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vneg\.v\tv[1-9][0-9]?,\s*v[0-9]+,\s*v0.t
+** vse32.v\tv[0-9]+,0\([a-x0-9]+\)
+** ret
+*/
+void f3 (void * in, void *out)
+{
+vbool32_t mask = *(vbool32_t*)in;
+asm volatile ("":::"memory");
+vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+vint32m1_t v2 = __riscv_vle32_v_i32m1_tumu (mask, v, in, 4);
+vint32m1_t v3 = __riscv_vneg_v_i32m1 (v2, 4);
+vint32m1_t v4 = __riscv_vneg_v_i32m1_tumu (mask, v3, v2, 4);
+__riscv_vse32_v_i32m1 (out, v4, 4);
+}
+
+/*
+** f4:
+** vsetivli\tzero,4,e8,mf8,tu,ma
+** vle8\.v\tv[0-9]+,0\([a-x0-9]+\)
+** vle8\.v\tv[0-9]+,0\([a-x0-9]+\)
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vse8\.v\tv[0-9]+,0\([a-x0-9]+\)
+** ret
+*/
+void f4 (void * in, void *out)
+{
+vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, 4);
+vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tu (v, in, 4);
+vint8mf8_t v3 = __riscv_vneg_v_i8mf8 (v2, 4);
+vint8mf8_t v4 = __riscv_vneg_v_i8mf8_tu (v3, v2, 4);
+__riscv_vse8_v_i8mf8 (out, v4, 4);
+}
+
+/*
+** f5:
+** vsetvli\t[a-x0-9]+,zero,e8,mf8,ta,ma
+** vlm.v\tv[0-9]+,0\([a-x0-9]+\)
+** vsetivli\tzero,4,e8,mf8,ta,ma
+** vle8.v\tv[0-9]+,0\([a-x0-9]+\),v0.t
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vneg\.v\tv[1-9][0-9]?,\s*v[0-9]+,\s*v0.t
+** vse8.v\tv[0-9]+,0\([a-x0-9]+\)
+** ret
+*/
+void f5 (void * in, void *out)
+{
+vbool64_t mask = *(vbool64_t*)in;
+asm volatile ("":::"memory");
+vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, 4);
+vint8mf8_t v2 = __riscv_vle8_v_i8mf8_m (mask, in, 4);
+vint8mf8_t v3 = __riscv_vneg_v_i8mf8 (v2, 4);
+vint8mf8_t v4 = __riscv_vneg_v_i8mf8_m (mask, v3, 4);
+__riscv_vse8_v_i8mf8 (out, v4, 4);
+}
+
+/*
+** f6:
+** vsetvli\t[a-x0-9]+,zero,e8,mf8,ta,ma
+** vlm.v\tv[0-9]+,0\([a-x0-9]+\)
+** vsetivli\tzero,4,e8,mf8,tu,mu
+** vle8\.v\tv[0-9]+,0\([a-x0-9]+\)
+** vle8.v\tv[0-9]+,0\([a-x0-9]+\),v0.t
+** vneg\.v\tv[0-9]+,\s*v[0-9]+
+** vneg\.v\tv[1-9][0-9]?,\s*v[0-9]+,\s*v0.t
+** vse8.v\tv[0-9]+,0\([a-x0-9]+\)
+** ret
+*/
+void f6 (void * in, void *out)
+{
+vbool64_t mask = *(vbool64_t*)in;
+asm volatile ("":::"memory");
+vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, 4);
+vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in, 4);
+vint8mf8_t v3 = __riscv_vneg_v_i8mf8 (v2, 4);
+vint8mf8_t v4 = __riscv_vneg_v_i8mf8_tumu (mask, v3, v2, 4);
+__riscv_vse8_v_i8mf8 (out, v4, 4);
+}
-- 
2.36.1



[PATCH] RISC-V: Add vneg.v C/C++ API tests

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vneg_v-1.c: New test.
* gcc.target/riscv/rvv/base/vneg_v-2.c: New test.
* gcc.target/riscv/rvv/base/vneg_v-3.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_m-1.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_m-2.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_m-3.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_mu-1.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_mu-2.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_mu-3.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tu-1.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tu-2.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tu-3.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tum-1.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tum-2.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tum-3.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tumu-1.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tumu-2.c: New test.
* gcc.target/riscv/rvv/base/vneg_v_tumu-3.c: New test.

---
 .../gcc.target/riscv/rvv/base/vneg_v-1.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v-2.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v-3.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_m-1.c| 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_m-2.c| 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_m-3.c| 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_mu-1.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_mu-2.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_mu-3.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tu-1.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tu-2.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tu-3.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tum-1.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tum-2.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tum-3.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tumu-1.c | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tumu-2.c | 160 ++
 .../gcc.target/riscv/rvv/base/vneg_v_tumu-3.c | 160 ++
 18 files changed, 2880 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_m-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_m-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_m-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_mu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_mu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_mu-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tu-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tum-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tum-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tum-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tumu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tumu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v_tumu-3.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v-1.c
new file mode 100644
index 000..e573a26e258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vneg_v-1.c
@@ -0,0 +1,160 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+vint8mf8_t test___riscv_vneg_v_i8mf8(vint8mf8_t op1,size_t vl)
+{
+return __riscv_vneg_v_i8mf8(op1,vl);
+}
+
+
+vint8mf4_t test___riscv_vneg_v_i8mf4(vint8mf4_t op1,size_t vl)
+{
+return __riscv_vneg_v_i8mf4(op1,vl);
+}
+
+
+vint8mf2_t test___riscv_vneg_v_i8mf2(vint8mf2_t op1,size_t vl)
+{
+return __riscv_vneg_v_i8mf2(op1,vl);
+}
+
+
+vint8m1_t test___riscv_vneg_v_i8m1(vint8m1_t op1,size_t vl)
+{
+return __riscv_vneg_v_i8m1(op1,vl);
+}
+
+
+vint8m2_t test___riscv_vneg_v_i8m2(vint8m2_t op1,size_t vl)
+{
+return __riscv_vneg_v_i8m2(op1,vl);
+}
+
+
+vint8m4_t test___riscv_vneg_v_i8m4(vint8m4_t op1,size_t vl)
+{
+return __riscv_vneg_v_i8m4(op1,vl);
+}
+
+
+vint8m8_t test___riscv_vneg_v_i8m8(vint8m8_t op1,size_t vl)
+{
+return 

[PATCH] RISC-V: Add vnot.v C API tests

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vnot_v-1.c: New test.
* gcc.target/riscv/rvv/base/vnot_v-2.c: New test.
* gcc.target/riscv/rvv/base/vnot_v-3.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_m-1.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_m-2.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_m-3.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_mu-1.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_mu-2.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_mu-3.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tu-1.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tu-2.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tu-3.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tum-1.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tum-2.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tum-3.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tumu-1.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tumu-2.c: New test.
* gcc.target/riscv/rvv/base/vnot_v_tumu-3.c: New test.

---
 .../gcc.target/riscv/rvv/base/vnot_v-1.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v-2.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v-3.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_m-1.c| 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_m-2.c| 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_m-3.c| 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_mu-1.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_mu-2.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_mu-3.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tu-1.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tu-2.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tu-3.c   | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tum-1.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tum-2.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tum-3.c  | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tumu-1.c | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tumu-2.c | 160 ++
 .../gcc.target/riscv/rvv/base/vnot_v_tumu-3.c | 160 ++
 18 files changed, 2880 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_m-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_m-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_m-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_mu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_mu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_mu-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tu-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tum-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tum-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tum-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tumu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tumu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v_tumu-3.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v-1.c
new file mode 100644
index 000..82bd70e5fda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vnot_v-1.c
@@ -0,0 +1,160 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+vint8mf8_t test___riscv_vnot_v_i8mf8(vint8mf8_t op1,size_t vl)
+{
+return __riscv_vnot_v_i8mf8(op1,vl);
+}
+
+
+vint8mf4_t test___riscv_vnot_v_i8mf4(vint8mf4_t op1,size_t vl)
+{
+return __riscv_vnot_v_i8mf4(op1,vl);
+}
+
+
+vint8mf2_t test___riscv_vnot_v_i8mf2(vint8mf2_t op1,size_t vl)
+{
+return __riscv_vnot_v_i8mf2(op1,vl);
+}
+
+
+vint8m1_t test___riscv_vnot_v_i8m1(vint8m1_t op1,size_t vl)
+{
+return __riscv_vnot_v_i8m1(op1,vl);
+}
+
+
+vint8m2_t test___riscv_vnot_v_i8m2(vint8m2_t op1,size_t vl)
+{
+return __riscv_vnot_v_i8m2(op1,vl);
+}
+
+
+vint8m4_t test___riscv_vnot_v_i8m4(vint8m4_t op1,size_t vl)
+{
+return __riscv_vnot_v_i8m4(op1,vl);
+}
+
+
+vint8m8_t test___riscv_vnot_v_i8m8(vint8m8_t op1,size_t vl)
+{
+return 

[PATCH] RISC-V: Add unary C/C++ API support

2023-02-03 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/iterators.md: Add neg and not.
* config/riscv/riscv-vector-builtins-bases.cc (class unop): New class.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vadd): Rename binop 
into alu.
(vsub): Ditto.
(vand): Ditto.
(vor): Ditto.
(vxor): Ditto.
(vsll): Ditto.
(vsra): Ditto.
(vsrl): Ditto.
(vmin): Ditto.
(vmax): Ditto.
(vminu): Ditto.
(vmaxu): Ditto.
(vmul): Ditto.
(vdiv): Ditto.
(vrem): Ditto.
(vdivu): Ditto.
(vremu): Ditto.
(vrsub): Ditto.
(vneg): Ditto.
(vnot): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct binop_def): 
Ditto.
(struct alu_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc: Support unary C/C/++.
* config/riscv/vector-iterators.md: New iterator.
* config/riscv/vector.md (@pred_<optab><mode>): New pattern.

---
 gcc/config/riscv/iterators.md |  8 ++-
 .../riscv/riscv-vector-builtins-bases.cc  | 15 
 .../riscv/riscv-vector-builtins-bases.h   |  2 +
 .../riscv/riscv-vector-builtins-functions.def | 72 ++-
 .../riscv/riscv-vector-builtins-shapes.cc |  6 +-
 .../riscv/riscv-vector-builtins-shapes.h  |  2 +-
 gcc/config/riscv/riscv-vector-builtins.cc | 12 
 gcc/config/riscv/vector-iterators.md  |  2 +
 gcc/config/riscv/vector.md| 30 
 9 files changed, 108 insertions(+), 41 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 9561403419b..6013f58db6e 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -201,7 +201,9 @@
 (smax "smax")
 (umin "umin")
 (umax "umax")
-(mult "mul")])
+(mult "mul")
+(not "one_cmpl")
+(neg "neg")])
 
 ;;  code attributes
 (define_code_attr or_optab [(ior "ior")
@@ -224,7 +226,9 @@
(smax "max")
(umin "minu")
(umax "maxu")
-   (mult "mul")])
+   (mult "mul")
+   (not "not")
+   (neg "neg")])
 
 ; atomics code attribute
 (define_code_attr atomic_optab
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 0d54694398d..0d86bbcd6b1 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -186,6 +186,17 @@ public:
   }
 };
 
+/* Implements vneg/vnot.  */
+template<rtx_code CODE>
+class unop : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+    return e.use_exact_insn (code_for_pred (CODE, e.vector_mode ()));
+  }
+};
+
 static CONSTEXPR const vsetvl vsetvl_obj;
 static CONSTEXPR const vsetvl vsetvlmax_obj;
 static CONSTEXPR const loadstore vle_obj;
@@ -228,6 +239,8 @@ static CONSTEXPR const binop<DIV> vdiv_obj;
 static CONSTEXPR const binop<MOD> vrem_obj;
 static CONSTEXPR const binop<UDIV> vdivu_obj;
 static CONSTEXPR const binop<UMOD> vremu_obj;
+static CONSTEXPR const unop<NEG> vneg_obj;
+static CONSTEXPR const unop<NOT> vnot_obj;
 
 /* Declare the function base NAME, pointing it to an instance
of class _obj.  */
@@ -276,5 +289,7 @@ BASE (vdiv)
 BASE (vrem)
 BASE (vdivu)
 BASE (vremu)
+BASE (vneg)
+BASE (vnot)
 
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index a8b65dee6fc..72ee25655b2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -66,6 +66,8 @@ extern const function_base *const vdiv;
 extern const function_base *const vrem;
 extern const function_base *const vdivu;
 extern const function_base *const vremu;
+extern const function_base *const vneg;
+extern const function_base *const vnot;
 }
 
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index d5df5c3d433..b94e780e916 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -63,40 +63,42 @@ DEF_RVV_FUNCTION (vsoxei16, indexed_loadstore, 
none_m_preds, all_v_scalar_ptr_ui
 DEF_RVV_FUNCTION (vsoxei32, indexed_loadstore, none_m_preds, 
all_v_scalar_ptr_uint32_index_ops)
 DEF_RVV_FUNCTION (vsoxei64, indexed_loadstore, none_m_preds, 
all_v_scalar_ptr_uint64_index_ops)
 /* 11. Vector Integer Arithmetic Instructions.  */
-DEF_RVV_FUNCTION (vadd, binop, 

[pushed] libstdc++: Adjust link to pdftex

2023-02-03 Thread Gerald Pfeifer
Pushed.

Gerald

libstdc++-v3/ChangeLog:

* doc/xml/manual/documentation_hacking.xml: Adjust link to pdftex.
* doc/html/manual/documentation_hacking.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/documentation_hacking.html | 4 ++--
 libstdc++-v3/doc/xml/manual/documentation_hacking.xml   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/documentation_hacking.html 
b/libstdc++-v3/doc/html/manual/documentation_hacking.html
index 4e415c32389..7766f133bea 100644
--- a/libstdc++-v3/doc/html/manual/documentation_hacking.html
+++ b/libstdc++-v3/doc/html/manual/documentation_hacking.html
@@ -124,7 +124,7 @@
graphs, the
<a class="link" href="http://www.graphviz.org" target="_top">Graphviz</a> package
will need to be installed. For PDF
-   output, <a class="link" href="http://www.tug.org/applications/pdftex/" target="_top">
+   output, <a class="link" href="https://tug.org/applications/pdftex/" target="_top">
pdflatex is required as well as a number of TeX packages
such as texlive-xtab and
texlive-tocloft.
@@ -560,4 +560,4 @@ make XSL_STYLE_DIR="/usr/share/xml/docbook/stylesheet/nwal
   Prev??Up??NextAppendix??B.??
   Porting and Maintenance
   
-??Home??Porting to New Hardware or Operating 
Systems
\ No newline at end of file
+??Home??Porting to New Hardware or Operating 
Systems
diff --git a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml 
b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
index 8a1cba92506..20f96ed7205 100644
--- a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
+++ b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
@@ -273,7 +273,7 @@
graphs, the
<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.graphviz.org">Graphviz</link> package
will need to be installed. For PDF
-   output, <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.tug.org/applications/pdftex/">
+   output, <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://tug.org/applications/pdftex/">
pdflatex is required as well as a number of TeX packages
such as texlive-xtab and
texlive-tocloft.
-- 
2.39.1


Re: [PATCH 2/2] Documentation Update.

2023-02-03 Thread Qing Zhao via Gcc-patches
Okay, thanks all for the comments and suggestions.

Based on the discussion so far, I have the following plan for resolving this 
issue:

In GCC13:

1. Add documentation in extend.texi to include all the following 3 cases as GCC
extensions:

Case 1: The structure with a flexible array member is the last field of another
structure, for example:

struct flex { int length; char data[]; };
struct out_flex { int m; struct flex flex_data; };

In the above, flex_data.data[] is considered as a flexible array too.

Case 2: The structure with a flexible array member is a field of a union,
for example:

struct flex1 { int length1; char data1[]; };
struct flex2 { int length2; char data2[]; };
union out_flex { struct flex1 flex_data1; struct flex2 flex_data2; };

In the above, flex_data1.data1[] and flex_data2.data2[] are considered
flexible arrays too.

Case 3: The structure with a flexible array member is a middle field of
another structure, for example:

struct flex { int length; char data[]; };
struct out_flex { int m; struct flex flex_data; int n; };

In the above, flex_data.data[] is allowed to extend flexibly into the
padding, e.g. up to 4 elements.

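A small illustrative program (not part of the proposal) that prints the layout,
so the relation between flex_data.data[] and the trailing field can be
inspected on a given target:

#include <stdio.h>
#include <stddef.h>

struct flex  { int length; char data[]; };
struct out_flex { int m; struct flex flex_data; int n; };

int main (void)
{
  /* On a typical ABI, data starts at or near the offset of n, so elements
     stored in it land in padding or on top of n -- hence the caveat below.  */
  printf ("offsetof (out_flex, flex_data.data) = %zu\n",
  offsetof (struct out_flex, flex_data.data));
  printf ("offsetof (out_flex, n) = %zu\n", offsetof (struct out_flex, n));
  printf ("sizeof (out_flex) = %zu\n", sizeof (struct out_flex));
  return 0;
}
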
However, relying on space in struct padding is bad programming practice;
compilers do not handle such an extension consistently, and any code relying
on this behavior should be modified to ensure that flexible array members
only end up at the ends of structures.

Please use the warning option -Wgnu-variable-sized-type-not-at-end (to be
consistent with Clang) to identify all such cases in the source code and
modify them. This extension will be deprecated from GCC in the next release.

2. Add a new warning option -Wgnu-variable-sized-type-not-at-end to warn about
such usage.

In GCC14:

1. Include this new warning -Wgnu-variable-sized-type-not-at-end in -Wall.
2. Deprecate this extension from GCC. (Or delay this to the next release?)


Let me know if you have any comments or suggestions.

thanks.

Qing



> On Feb 3, 2023, at 3:55 PM, Joseph Myers  wrote:
> 
> On Thu, 2 Feb 2023, Siddhesh Poyarekar wrote:
> 
>> I dug into this on the glibc end and it looks like this commit:
>> 
>> commit 63fb8f9aa9d19f85599afe4b849b567aefd70a36
>> Author: Zack Weinberg 
>> Date:   Mon Feb 5 14:13:41 2018 -0500
>> 
>>Post-cleanup 2: minimize _G_config.h.
>> 
>> ripped all of that gunk out.  AFAICT there's no use of struct __gconv_info
>> anywhere else in the code.
>> 
>> I reckon it is safe to say now that glibc no longer needs this misfeature.
> 
> It would be worth testing whether any change warns anywhere else in glibc 
> (not necessarily in installed headers).  And to have fixincludes for the 
> installed _G_config.h from old glibc if we start rejecting such code.
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



[PATCH 8/8] Add saturating subtract built-ins.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch adds support for a saturating subtract built-in function that may be
added to a future PowerPC processor.  Note, if it is added, the name of the
built-in function may change before GCC 13 is released.  If the name changes,
we will submit a patch changing the name.
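
For illustration only, plain C with the semantics a signed saturating subtract
conventionally has; this is an assumption about __builtin_saturate_subtract32,
since the patch text does not spell the semantics out:

#include <stdint.h>

/* Presumed behavior of __builtin_saturate_subtract32: clamp the infinitely
   precise difference to [INT32_MIN, INT32_MAX].  */
static int32_t
sat_sub32 (int32_t a, int32_t b)
{
  int64_t r = (int64_t) a - (int64_t) b;
  if (r > INT32_MAX)
    return INT32_MAX;
  if (r < INT32_MIN)
    return INT32_MIN;
  return (int32_t) r;
}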

I also added support for providing dense math built-in functions, even though
at present, we have not added any new built-in functions for dense math.  It is
likely we will want to add new dense math built-in functions as the dense math
support is fleshed out.

I tested this patch on a little endian power10 system with long double using
the traditional IBM double-double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first patch), can I check this
patch into the master branch for GCC 13?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
for flagging invalid use of future built-in functions.
(rs6000_builtin_is_supported): Add support for future built-in
functions.
* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
built-in function for -mcpu=future.
(__builtin_saturate_subtract64): Likewise.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
for -mcpu=future built-ins.
(stanza_map): Likewise.
(enable_string): Likewise.
(struct attrinfo): Likewise.
(parse_bif_attrs): Likewise.
(write_decls): Likewise.
* config/rs6000/rs6000.md (sat_sub3): Add saturating subtract
built-in insn declarations.
(sat_sub3_dot): Likewise.
(sat_sub3_dot2): Likewise.
* doc/extend.texi (Future PowerPC built-ins): New section.

gcc/testsuite/

* gcc.target/powerpc/subfus-1.c: New test.
* gcc.target/powerpc/subfus-2.c: Likewise.
---
 gcc/config/rs6000/rs6000-builtin.cc | 17 ++
 gcc/config/rs6000/rs6000-builtins.def   | 11 
 gcc/config/rs6000/rs6000-gen-builtins.cc| 35 ++--
 gcc/config/rs6000/rs6000.md | 60 +
 gcc/doc/extend.texi | 24 +
 gcc/testsuite/gcc.target/powerpc/subfus-1.c | 32 +++
 gcc/testsuite/gcc.target/powerpc/subfus-2.c | 32 +++
 gcc/testsuite/lib/target-supports.exp   | 16 +-
 8 files changed, 220 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/subfus-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/subfus-2.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index d971cf90e51..b9b0b2d52d0 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -139,6 +139,17 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins fncode)
 case ENB_MMA:
   error ("%qs requires the %qs option", name, "-mmma");
   break;
+case ENB_FUTURE:
+  error ("%qs requires the %qs option", name, "-mcpu=future");
+  break;
+case ENB_FUTURE_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=future", "-m64", "-mpowerpc64");
+  break;
+case ENB_DM:
+  error ("%qs requires the %qs or %qs options", name, "-mcpu=future",
+"-mdense-math");
+  break;
 default:
 case ENB_ALWAYS:
   gcc_unreachable ();
@@ -194,6 +205,12 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
   return TARGET_HTM;
 case ENB_MMA:
   return TARGET_MMA;
+case ENB_FUTURE:
+  return TARGET_FUTURE;
+case ENB_FUTURE_64:
+  return TARGET_FUTURE && TARGET_POWERPC64;
+case ENB_DM:
+  return TARGET_DENSE_MATH;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index e0d9f5adc97..8b73e994558 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -139,6 +139,8 @@
 ;   endian   Needs special handling for endianness
 ;   ibmldRestrict usage to the case when TFmode is IBM-128
 ;   ibm128   Restrict usage to the case where __ibm128 is supported or if ibmld
+;   future   Restrict usage to future instructions
+;   dm   Restrict usage to dense math
 ;
 ; Each attribute corresponds to extra processing required when
 ; the built-in is expanded.  All such special processing should
@@ -4108,3 +4110,12 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
 STXVP nothing {mma,pair}
+
+[future]
+  const signed int __builtin_saturate_subtract32 (signed int, signed int);
+  SAT_SUBSI sat_subsi3 {}
+
+[future-64]
+  const signed long __builtin_saturate_subtract64 (signed long, signed long);
+  SAT_SUBDI sat_subdi3 {}
+
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.cc 

[PATCH 7/8] Support load/store vector with right length.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch adds support for new instructions that may be added to the PowerPC
architecture in the future to enhance the load and store vector with length
instructions.

The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use
since the count for the number of bytes must be in the top 8 bits of the GPR
register, instead of the bottom 8 bits.  This meant that code generating these
instructions typically had to do a shift left by 56 bits to get the count into
the right position.  In a future version of the PowerPC architecture, new
variants of these instructions might be added that expect the count to be in
the bottom 8 bits of the GPR register.  These patches add this support to GCC
if the user uses the -mcpu=future option.

I discovered that the code in rs6000-string.cc to generate the ISA 3.1 lxvl/stxvl
and future lxvll/stxvll instructions would generate these instructions on 32-bit.
However, the patterns for these instructions are only defined on 64-bit systems,
so I added a check for 64-bit support before generating the instructions.

I tested this patch on a little endian power10 system with long double using
the traditional IBM double-double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first patch), can I check this
patch into the master branch for GCC 13?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/rs6000-string.cc (expand_block_move): Do generate lxvl
and stxvl on 32-bit.
* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
the shift count automatically used in the insn.
(lxvrl): New insn for -mcpu=future.
(lxvrll): Likewise.
(stxvl): If -mcpu=future, generate the stxvl with the shift count
automatically used in the insn.
(stxvrl): New insn for -mcpu=future.
(stxvrll): Likewise.

gcc/testsuite/

* gcc.target/powerpc/lxvrl.c: New test.
* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
New effective target.
---
 gcc/config/rs6000/rs6000-string.cc   |   1 +
 gcc/config/rs6000/vsx.md | 122 +++
 gcc/testsuite/gcc.target/powerpc/lxvrl.c |  32 ++
 gcc/testsuite/lib/target-supports.exp|  16 ++-
 4 files changed, 148 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/lxvrl.c

diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 75e6f8803a5..9b2f1b83b22 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2811,6 +2811,7 @@ expand_block_move (rtx operands[], bool might_overlap)
  gen_func.mov = gen_vsx_movv2di_64bit;
}
   else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
+  && TARGET_POWERPC64
   && TARGET_POWER10 && bytes < 16
   && orig_bytes > 16
   && !(bytes == 1 || bytes == 2
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..1ab8dc373c0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5582,20 +5582,32 @@ (define_expand "first_mismatch_or_eos_index_"
   DONE;
 })
 
-;; Load VSX Vector with Length
+;; Load VSX Vector with Length.  If we have lxvrl, we don't have to do an
+;; explicit shift left into a pseudo.
 (define_expand "lxvl"
-  [(set (match_dup 3)
-(ashift:DI (match_operand:DI 2 "register_operand")
-   (const_int 56)))
-   (set (match_operand:V16QI 0 "vsx_register_operand")
-   (unspec:V16QI
-[(match_operand:DI 1 "gpc_reg_operand")
-  (mem:V16QI (match_dup 1))
- (match_dup 3)]
-UNSPEC_LXVL))]
+  [(use (match_operand:V16QI 0 "vsx_register_operand"))
+   (use (match_operand:DI 1 "gpc_reg_operand"))
+   (use (match_operand:DI 2 "gpc_reg_operand"))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
-  operands[3] = gen_reg_rtx (DImode);
+  rtx shift_len = gen_rtx_ASHIFT (DImode, operands[2], GEN_INT (56));
+  rtx len;
+
+  if (TARGET_FUTURE)
+len = shift_len;
+  else
+{
+  len = gen_reg_rtx (DImode);
+  emit_insn (gen_rtx_SET (len, shift_len));
+}
+
+  rtx dest = operands[0];
+  rtx addr = operands[1];
+  rtx mem = gen_rtx_MEM (V16QImode, addr);
+  rtvec rv = gen_rtvec (3, addr, mem, len);
+  rtx lxvl = gen_rtx_UNSPEC (V16QImode, rv, UNSPEC_LXVL);
+  emit_insn (gen_rtx_SET (dest, lxvl));
+  DONE;
 })
 
 (define_insn "*lxvl"
@@ -5619,6 +5631,34 @@ (define_insn "lxvll"
   "lxvll %x0,%1,%2"
   [(set_attr "type" "vecload")])
 
+;; For lxvrl and lxvrll, use the combiner to eliminate the shift.  The
+;; define_expand for lxvl will already incorporate the shift in generating the
+;; insn.  The lxvll built-in function required the user to have already done
+;; the shift.  Defining lxvrll this way will optimize cases where the user has
+;; done the shift immediately before the 

[PATCH 6/8] PowerPC: Add support for 1,024 bit DMR registers.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
DMRs.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
registers.  The 'wD' constraint added in previous patches is used for these
registers.  I added support to do load and store of DMRs via the VSX registers,
since there are no load/store dense math instructions.  I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
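
A minimal usage sketch of the new keyword (assuming, per the description above,
that copies go through VSX registers since there are no load/store dense math
instructions):

/* Requires -mcpu=future with this patch applied; __dmr is the new 1,024-bit
   opaque type that maps onto a dense math register.  */
void
copy_dmr (__dmr *dst, __dmr *src)
{
  *dst = *src;  /* expands to VSX loads/stores plus DMR insert/extract */
}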

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

Note this patch requires the patch posted on February 2nd, 2023 to bump up the
precision size to 16 bits.  To get this into GCC 13, I will have to revise this
patch.

| Date: Thu, 2 Feb 2023 12:38:30 -0500
| Subject: [PATCH] Bump up precision size to 16 bits.
| Message-ID: 
| https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611198.html

There were no regressions with doing bootstrap builds and running the regression
tests, provided the above patch for the precision size has been installed:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.
---
 gcc/config/rs6000/mma.md  | 152 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  13 ++
 gcc/config/rs6000/rs6000-call.cc  |  13 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 125 ++
 gcc/config/rs6000/rs6000.h|   7 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 
 7 files changed, 345 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 411e2345291..0233c7b304a 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -92,6 +92,11 @@ (define_c_enum "unspec"
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC

[PATCH 4/8] PowerPC: Switch to dense math names for all MMA operations

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
(avvi4i4i8_dm): Likewise.
(vvi4i4i2_dm): Likewise.
(avvi4i4i2_dm): Likewise.
(vvi4i4_dm): Likewise.
(avvi4i4_dm): Likewise.
(pvi4i2_dm): Likewise.
(apvi4i2_dm): Likewise.
(vvi4i4i4_dm): Likewise.
(avvi4i4i4_dm): Likewise.
(mma_): Add support for running on DMF systems, generating the dense
math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.
---
 gcc/config/rs6000/mma.md  |  98 +++--
 .../gcc.target/powerpc/dm-double-test.c   | 194 ++
 gcc/testsuite/lib/target-supports.exp |  19 ++
 3 files changed, 299 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 9e3feb3ea54..411e2345291 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -227,13 +227,22 @@ (define_int_attr apv  [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp")
 
 (define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
 
+(define_int_attr vvi4i4i8_dm   [(UNSPEC_MMA_PMXVI4GER8 "pmdmxvi4ger8")])
+
 (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "pmxvi4ger8pp")])
 
+(define_int_attr avvi4i4i8_dm  [(UNSPEC_MMA_PMXVI4GER8PP   "pmdmxvi4ger8pp")])
+
 (define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2    "pmxvi16ger2")
                             (UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
                             (UNSPEC_MMA_PMXVF16GER2    "pmxvf16ger2")
                             (UNSPEC_MMA_PMXVBF16GER2   "pmxvbf16ger2")])
 
+(define_int_attr vvi4i4i2_dm   [(UNSPEC_MMA_PMXVI16GER2    "pmdmxvi16ger2")
+                                (UNSPEC_MMA_PMXVI16GER2S   "pmdmxvi16ger2s")
+                                (UNSPEC_MMA_PMXVF16GER2    "pmdmxvf16ger2")
+                                (UNSPEC_MMA_PMXVBF16GER2   "pmdmxvbf16ger2")])
+
 (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
                             (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
                             (UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
@@ -245,25 +254,54 @@ (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
                             (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
                             (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
 
+(define_int_attr avvi4i4i2_dm  [(UNSPEC_MMA_PMXVI16GER2PP  "pmdmxvi16ger2pp")
+                                (UNSPEC_MMA_PMXVI16GER2SPP "pmdmxvi16ger2spp")
+                                (UNSPEC_MMA_PMXVF16GER2PP  "pmdmxvf16ger2pp")
+                                (UNSPEC_MMA_PMXVF16GER2PN  "pmdmxvf16ger2pn")
+                                (UNSPEC_MMA_PMXVF16GER2NP  "pmdmxvf16ger2np")
+                                (UNSPEC_MMA_PMXVF16GER2NN  "pmdmxvf16ger2nn")
+                                (UNSPEC_MMA_PMXVBF16GER2PP "pmdmxvbf16ger2pp")
+                                (UNSPEC_MMA_PMXVBF16GER2PN "pmdmxvbf16ger2pn")
+                                (UNSPEC_MMA_PMXVBF16GER2NP "pmdmxvbf16ger2np")
+

[PATCH 3/8] PowerPC: Make MMA insns support DMR registers.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
instruction names are used.

A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
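
As a hedged sketch (only the macro name is taken from this patch), user code
can key off __PPC_DMR__ to know where the accumulators live:

#ifdef __PPC_DMR__
static const char *mma_acc_regs = "separate dense math registers (DMRs)";
#else
static const char *mma_acc_regs = "VSX/FPR registers 0..31";
#endif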

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
mma_<acc> for dense math and non dense math.
(mma_<acc> insn): Restrict to non dense math.
(mma_xxsetaccz): Convert to define_expand to handle non dense math and
dense math.
(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
dense math.
(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
(mma_): Add support for dense math.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__PPC_DMR__ if we have dense math instructions.
* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
dense math and only FPRs if not dense math.
(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
prime the DMR registers or the xxmfacc instruction to de-prime
the registers if we have dense math register support.
---
 gcc/config/rs6000/mma.md  | 247 +-
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 59ca6835f7c..9e3feb3ea54 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -552,190 +552,249 @@ (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_<acc>"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_<acc>"
+  [(set (match_operand:XO 0 "register_operand")
+   (unspec:XO [(match_operand:XO 1 "register_operand")]
+  MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_<acc>"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   " %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+  DONE;
+}
+})
+
+(define_insn "*mma_xxsetaccz_vsx"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+   (unspec_volatile:XO [(const_int 0)]
+   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+   (unspec:XO [(const_int 0)]
+  UNSPECV_MMA_XXSETACCZ))]
+  

[PATCH 2/8] PowerPC: Add support for accumulators in DMR registers.

2023-02-03 Thread Michael Meissner via Gcc-patches
The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1.  In ISA 3.1, these accumulators overlapped with the VSX
vector registers 0..31, but logically the accumulator registers were separate
from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
the accumulator registers may not overlap with the FPR registers.  This patch
adds the support for dense math registers as separate registers.

These changes are preliminary.  They are expected to change over time.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with the VSX vector registers 0..31.  If both MMA and
dense math are selected (i.e. -mcpu=future), the wD constraint will only allow
dense math registers.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then the %A output modifier converts the VSX register
number to the accumulator number, by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm, but instead use the MMA built-in
functions;

2)  If you do need to write extended asm, change the "d" constraints
targeting accumulators to use "wD" (see the sketch after this list);

3)  Only use the built-in zero, assemble and disassemble functions to
create and move data between vector quad types and dense math
accumulators.  I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz
instructions directly in the extended asm code.  The reason is these
instructions assume there is a 1-to-1 correspondence between 4 adjacent
FPR registers and an accumulator that overlaps with those registers.
With accumulators now being separate registers, there no longer is a
1-to-1 correspondence.
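
A minimal extended asm sketch of rules 1-3 (the asm text is illustrative and
assumes the %A/wD behavior described in this note; it is not code from the
patch):

typedef __vector unsigned char vec_t;

void
i16_ger_acc (__vector_quad *out, vec_t a, vec_t b)
{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);      /* rule 3: zero via the built-in */
  __asm__ ("xvi16ger2pp %A0,%x1,%x2"   /* %A prints the accumulator number */
           : "+wD" (acc)               /* rule 2: "wD" instead of "d" */
           : "wa" (a), "wa" (b));
  *out = acc;
}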

It is possible that the mangling for DMRs and the GDB register numbers may
change in the future.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/constraints.md (wD constraint): New constraint.
* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
(movxo): Convert into define_expand.
(movxo_vsx): Version of movxo where accumulators overlap with VSX vector
registers 0..31.
(movxo_dm): Version of movxo that supports separate dense math
accumulators.
(mma_assemble_acc): Add dense math support to define_expand.
(mma_assemble_acc_vsx): Rename from mma_assemble_acc, and restrict it to
non dense math systems.
(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
(mma_disassemble_acc): Add dense math support to define_expand.
(mma_disassemble_acc_vsx): Rename from mma_disassemble_acc, and restrict
it to non dense math systems.
(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.

[PATCH 1/8] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch enables generating load and store vector pair instructions for
certain memory copy operations when -mcpu=future is used.  In testing on
power10, it was determined that using these instructions was problematical
in a few cases, so we disabled generating them by default.  This patch
re-enables generating these instructions if -mcpu=future is used.
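
For example (a sketch based on this description, not a test from the patch),
a fixed 32-byte copy such as the following is the kind of block operation
that -mcpu=future may now expand with a load/store vector pair (lxvp/stxvp):

void
copy32 (char *restrict dst, const char *restrict src)
{
  /* 32 bytes: expected to become one lxvp + one stxvp with -mcpu=future.  */
  __builtin_memcpy (dst, src, 32);
}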

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
-mblock-ops-vector-pair.
(POWERPC_MASKS): Likewise.
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index deb4ea1c980..b9a4d9ad76e 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -88,6 +88,7 @@
 
 /* Flags for a potential future processor that may or may not be delivered.  */
 #define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER                  \
+                            | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
                             | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -125,6 +126,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS      (OPTION_MASK_ALTIVEC                   \
+                            | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
                             | OPTION_MASK_CMPB                     \
                             | OPTION_MASK_CRYPTO                   \
                             | OPTION_MASK_DFP                      \
-- 
2.39.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 1/2] c++: make manifestly_const_eval tri-state

2023-02-03 Thread Patrick Palka via Gcc-patches
On Mon, 30 Jan 2023, Jason Merrill wrote:

> On 1/27/23 17:02, Patrick Palka wrote:
> > This patch turns the manifestly_const_eval flag used by the constexpr
> > machinery into a tri-state enum so that we're able to express wanting
> > to fold __builtin_is_constant_evaluated to false via late speculative
> > constexpr evaluation.  Of all the entry points to constexpr evaluation
> > only maybe_constant_value is changed to take a tri-state value; the
> > others continue to take bool.  The subsequent patch will use this to fold
> > the builtin to false when called from cp_fold_function.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (constexpr_call::manifestly_const_eval): Give
> > it type int instead of bool.
> > (constexpr_ctx::manifestly_const_eval): Give it type mce_value
> > instead of bool.
> > (cxx_eval_builtin_function_call): Adjust after making
> > manifestly_const_eval tri-state.
> > (cxx_eval_call_expression): Likewise.
> > (cxx_eval_binary_expression): Likewise.
> > (cxx_eval_conditional_expression): Likewise.
> > (cxx_eval_constant_expression): Likewise.
> > (cxx_eval_outermost_constant_expr): Likewise.
> > (cxx_constant_value): Likewise.
> > (cxx_constant_dtor): Likewise.
> > (maybe_constant_value): Give manifestly_const_eval parameter
> > type mce_value instead of bool and adjust accordingly.
> > (fold_non_dependent_expr_template): Adjust call
> > to cxx_eval_outermost_constant_expr.
> > (fold_non_dependent_expr): Likewise.
> > (maybe_constant_init_1): Likewise.
> > * constraint.cc (satisfy_atom): Adjust call to
> > maybe_constant_value.
> > * cp-tree.h (enum class mce_value): Define.
> > (maybe_constant_value): Adjust manifestly_const_eval parameter
> > type and default argument.
> > * decl.cc (compute_array_index_type_loc): Adjust call to
> > maybe_constant_value.
> > * pt.cc (convert_nontype_argument): Likewise.
> > ---
> >   gcc/cp/constexpr.cc  | 61 
> >   gcc/cp/constraint.cc |  3 +--
> >   gcc/cp/cp-tree.h | 18 -
> >   gcc/cp/decl.cc   |  2 +-
> >   gcc/cp/pt.cc |  6 ++---
> >   5 files changed, 54 insertions(+), 36 deletions(-)
> > 
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index be99bec17e7..34662198903 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -1119,8 +1119,8 @@ struct GTY((for_user)) constexpr_call {
> > /* The hash of this call; we remember it here to avoid having to
> >recalculate it when expanding the hash table.  */
> > hashval_t hash;
> > -  /* Whether __builtin_is_constant_evaluated() should evaluate to true.  */
> > -  bool manifestly_const_eval;
> > +  /* The raw value of constexpr_ctx::manifestly_const_eval.  */
> > +  int manifestly_const_eval;
> 
> Why not mce_value?

gengtype complained about 'mce_value' being an unknown type here
(constexpr_call is gengtype-enabled).  Ah, but it looks like using
'enum mce_value' makes gengtype happy.

> 
> >   };
> > struct constexpr_call_hasher : ggc_ptr_hash
> > @@ -1248,7 +1248,7 @@ struct constexpr_ctx {
> >trying harder to get a constant value.  */
> > bool strict;
> > /* Whether __builtin_is_constant_evaluated () should be true.  */
> > -  bool manifestly_const_eval;
> > +  mce_value manifestly_const_eval;
> >   };
> > /* This internal flag controls whether we should avoid doing anything
> > during
> > @@ -1463,7 +1463,7 @@ cxx_eval_builtin_function_call (const constexpr_ctx
> > *ctx, tree t, tree fun,
> > /* If we aren't requiring a constant expression, defer
> > __builtin_constant_p
> >in a constexpr function until we have values for the parameters.  */
> > if (bi_const_p
> > -  && !ctx->manifestly_const_eval
> > +  && ctx->manifestly_const_eval == mce_unknown
> > && current_function_decl
> > && DECL_DECLARED_CONSTEXPR_P (current_function_decl))
> >   {
> > @@ -1479,12 +1479,13 @@ cxx_eval_builtin_function_call (const constexpr_ctx
> > *ctx, tree t, tree fun,
> > if (fndecl_built_in_p (fun, CP_BUILT_IN_IS_CONSTANT_EVALUATED,
> >  BUILT_IN_FRONTEND))
> >   {
> > -  if (!ctx->manifestly_const_eval)
> > +  if (ctx->manifestly_const_eval == mce_unknown)
> > {
> >   *non_constant_p = true;
> >   return t;
> > }
> > -  return boolean_true_node;
> > +  return constant_boolean_node (ctx->manifestly_const_eval == mce_true,
> > +   boolean_type_node);
> >   }
> >   if (fndecl_built_in_p (fun, CP_BUILT_IN_SOURCE_LOCATION,
> > BUILT_IN_FRONTEND))
> > @@ -1591,7 +1592,7 @@ cxx_eval_builtin_function_call (const constexpr_ctx
> > *ctx, tree t, tree fun,
> >   }
> >   bool save_ffbcp = force_folding_builtin_constant_p;
> > -  force_folding_builtin_constant_p |= ctx->manifestly_const_eval;
> > +  

[PATCH 1/8] PowerPC: Add -mcpu=future.

2023-02-03 Thread Michael Meissner via Gcc-patches
These patches implement support for potential future PowerPC cpus.  At this
time, features enabled with -mcpu=future may or may not be in actual PowerPCs
that will be delivered in the future.

This patch adds support for the -mcpu=future and -mtune=future options.
If you use -mcpu=future, the macro __ARCH_PWR_FUTURE__ is defined, and the
assembler .machine directive "future" is used.  Future patches in this
series will add support for new instructions that may be present in future
PowerPC processors.
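
As a small sketch (only the macro name comes from this patch; the parallel to
the existing __ARCH_PWR10__ convention is an assumption):

#ifdef __ARCH_PWR_FUTURE__
int have_future_isa (void) { return 1; }  /* compiled with -mcpu=future */
#else
int have_future_isa (void) { return 0; }
#endif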

At the moment, we do not have any differences in tuning between power10 and
future.  It is anticipated that we may change the tuning characteristics for
-mtune=future at a later time.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
* https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/power10.md (power10-load): Temporarily treat
-mcpu=future the same as -mcpu=power10.
(power10-fused-load): Likewise.
(power10-prefixed-load): Likewise.
(power10-prefixed-load): Likewise.
(power10-load-update): Likewise.
(power10-fpload-double): Likewise.
(power10-fpload-double): Likewise.
(power10-prefixed-fpload-double): Likewise.
(power10-prefixed-fpload-double): Likewise.
(power10-fpload-update-double): Likewise.
(power10-fpload-single): Likewise.
(power10-fpload-update-single): Likewise.
(power10-vecload): Likewise.
(power10-vecload-pair): Likewise.
(power10-store): Likewise.
(power10-fused-store): Likewise.
(power10-prefixed-store): Likewise.
(power10-prefixed-store): Likewise.
(power10-store-update): Likewise.
(power10-vecstore-pair): Likewise.
(power10-larx): Likewise.
(power10-lq): Likewise.
(power10-stcx): Likewise.
(power10-stq): Likewise.
(power10-sync): Likewise.
(power10-sync): Likewise.
(power10-alu): Likewise.
(power10-fused_alu): Likewise.
(power10-paddi): Likewise.
(power10-rot): Likewise.
(power10-rot-compare): Likewise.
(power10-alu2): Likewise.
(power10-cmp): Likewise.
(power10-two): Likewise.
(power10-three): Likewise.
(power10-mul): Likewise.
(power10-mul-compare): Likewise.
(power10-div): Likewise.
(power10-div-compare): Likewise.
(power10-crlogical): Likewise.
(power10-mfcrf): Likewise.
(power10-mfcr): Likewise.
(power10-mtcr): Likewise.
(power10-mtjmpr): Likewise.
(power10-mfjmpr): Likewise.
(power10-mfjmpr): Likewise.
(power10-fpsimple): Likewise.
(power10-fp): Likewise.
(power10-fpcompare): Likewise.
(power10-sdiv): Likewise.
(power10-ddiv): Likewise.
(power10-sqrt): Likewise.
(power10-dsqrt): Likewise.
(power10-vec-2cyc): Likewise.
(power10-fused-vec): Likewise.
(power10-veccmp): Likewise.
(power10-vecsimple): Likewise.
(power10-vecnormal): Likewise.
(power10-qp): Likewise.
(power10-vecperm): Likewise.
(power10-vecperm-compare): Likewise.
(power10-prefixed-vecperm): Likewise.
(power10-veccomplex): Likewise.
(power10-vecfdiv): Likewise.
(power10-vecdiv): Likewise.
(power10-qpdiv): Likewise.
(power10-qpmul): Likewise.
(power10-mtvsr): Likewise.
(power10-mfvsr): Likewise.
(power10-mfvsr): Likewise.
(power10-branch): Likewise.
(power10-fused-branch): Likewise.
(power10-crypto): Likewise.
(power10-htm): Likewise.
(power10-htm): Likewise.
(power10-dfp): Likewise.
(power10-dfpq): Likewise.
(power10-mma): Likewise.
(power10-prefixed-mma): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__ARCH_PWR_FUTURE__ if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
(POWERPC_MASKS): Add -mcpu=future.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: 

[PATCH 0/8] PowerPC future support for Dense Math

2023-02-03 Thread Michael Meissner via Gcc-patches
These patches were originally posted on November 10th.  Segher has asked that I
repost them.  These patches are somewhat changed since the original posting to
address some of the comments.

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

In the first patch (adding -mcpu=future), I have taken out the code of making
-mtune=future act as -mtune=power10.  Instead I went through all of the places
that look at the tuning (mostly in power10.md and rs6000.cc), and added future
as an option.  Obviously at a later time, we will provide a separate tuning
file for future (or whatever the new name will be if the instructions are added
officially).  But for now, it will suffice.

In patch #3, I fixed the opcode for clearing a dense math register that Peter
had noticed.  I was using the name based on the existing clear instruction,
instead of the new instruction.

In patch #6, I fixed the code, relying on the changes for setting the precision
field to 16 bits.  Since that patch will not be able to go into GCC 13 at
present, we might skip that support for now.  The important thing for existing
users of the MMA code is the support for accumulators being in the separate
dense math registers rather than overlapping the FPRs; that does need to go
in, and we can probably delay the 1,024-bit register support, or implement it
in a different fashion.

In the insn names, I tried to switch to using _vsx instead of _fpr for the
existing MMA support instructions.  I also tried to clear up the comments to
specify ISA 3.1 instead of power10 when talking about the existing MMA
support.

The following is from the original posting (slightly modified):

This patch is very preliminary support for a potential new feature to the
PowerPC that extends the current power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR register.
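
A minimal sketch with the documented ISA 3.1 MMA built-ins makes the two copy
directions concrete (mapping them onto prime/de-prime is an interpretation of
the paragraph above):

typedef __vector unsigned char vec_t;

void
round_trip (vec_t v[4])
{
  __vector_quad acc;
  /* "Prime" direction: the accumulator becomes a copy of the 4 vectors.  */
  __builtin_mma_assemble_acc (&acc, v[0], v[1], v[2], v[3]);
  /* "De-prime" direction: the accumulator content is copied back out.  */
  __builtin_mma_disassemble_acc (v, &acc);
}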

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be done to support any dense math features other than
doing data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 8 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3)  The third patch enables storing 512-bit accumulators in DMRs.  This patch
enables the register allocation, but it does not move the existing MMA to use
these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, and move a DMR
register back to a VSX register.  [As I mentioned above, at the moment, this patch
is problematical as is]

7) The seventh patch is not DMR related.  It adds support for variants of the
load/store vector with length instruction that may be added in future PowerPC
processors.  These variants eliminate having to shift the byte length left by
56 bits (see the sketch after this list).

8) The eighth patch is also not DMR related.  It adds support for a saturating
subtract operation that may be added to future PowerPC processors.
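
For item 7, a hedged sketch of the shift being eliminated (the generated-code
comment reflects current power9/power10 output and is an assumption):

#include <stddef.h>
#include <altivec.h>

vector unsigned char
load_n_bytes (unsigned char *p, size_t n)
{
  /* Today: sldi rT,n,56 ; lxvl vR,p,rT.  The proposed variants would take
     the length directly and drop the sldi.  */
  return vec_xl_len (p, n);
}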

In terms of changes, we now use the wD constraint for accumulators.  If you
compile with -mcpu=power10, the wD constraint will match the equivalent VSX
register (0..31) that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

This patch also modifies the print_operand %A 

Re: [PATCH 2/2] c++: speculative constexpr and is_constant_evaluated [PR108243]

2023-02-03 Thread Patrick Palka via Gcc-patches
On Fri, 3 Feb 2023, Patrick Palka wrote:

> On Mon, 30 Jan 2023, Jason Merrill wrote:
> 
> > On 1/27/23 17:02, Patrick Palka wrote:
> > > This PR illustrates that __builtin_is_constant_evaluated currently acts
> > > as an optimization barrier for our speculative constexpr evaluation,
> > > since we don't want to prematurely fold the builtin to false if the
> > > expression in question would be later manifestly constant evaluated (in
> > > which case it must be folded to true).
> > > 
> > > This patch fixes this by permitting __builtin_is_constant_evaluated
> > > to get folded as false during cp_fold_function, since at that point
> > > we're sure we're doing manifestly constant evaluation.  To that end
> > > we add a flags parameter to cp_fold that controls what mce_value the
> > > CALL_EXPR case passes to maybe_constant_value.
> > > 
> > > bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > trunk?
> > > 
> > >   PR c++/108243
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * cp-gimplify.cc (enum fold_flags): Define.
> > >   (cp_fold_data::genericize): Replace this data member with ...
> > >   (cp_fold_data::fold_flags): ... this.
> > >   (cp_fold_r): Adjust cp_fold_data use and cp_fold calls.
> > >   (cp_fold_function): Likewise.
> > >   (cp_fold_maybe_rvalue): Likewise.
> > >   (cp_fully_fold_init): Likewise.
> > >   (cp_fold): Add fold_flags parameter.  Don't cache if flags
> > >   isn't empty.
> > >   <CALL_EXPR>: Pass mce_false to maybe_constant_value
> > >   if ff_genericize is set.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/opt/pr108243.C: New test.
> > > ---
> > >   gcc/cp/cp-gimplify.cc   | 76 ++---
> > >   gcc/testsuite/g++.dg/opt/pr108243.C | 29 +++
> > >   2 files changed, 76 insertions(+), 29 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/opt/pr108243.C
> > > 
> > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > index a35cedd05cc..d023a63768f 100644
> > > --- a/gcc/cp/cp-gimplify.cc
> > > +++ b/gcc/cp/cp-gimplify.cc
> > > @@ -43,12 +43,20 @@ along with GCC; see the file COPYING3.  If not see
> > >   #include "omp-general.h"
> > >   #include "opts.h"
> > >   +/* Flags for cp_fold and cp_fold_r.  */
> > > +
> > > +enum fold_flags {
> > > +  ff_none = 0,
> > > +  /* Whether we're being called from cp_fold_function.  */
> > > +  ff_genericize = 1 << 0,
> > > +};
> > > +
> > >   /* Forward declarations.  */
> > > static tree cp_genericize_r (tree *, int *, void *);
> > >   static tree cp_fold_r (tree *, int *, void *);
> > >   static void cp_genericize_tree (tree*, bool);
> > > -static tree cp_fold (tree);
> > > +static tree cp_fold (tree, fold_flags);
> > > /* Genericize a TRY_BLOCK.  */
> > >   @@ -996,9 +1004,8 @@ struct cp_genericize_data
> > >   struct cp_fold_data
> > >   {
> > >   hash_set<tree> pset;
> > > -  bool genericize; // called from cp_fold_function?
> > > -
> > > -  cp_fold_data (bool g): genericize (g) {}
> > > +  fold_flags flags;
> > > +  cp_fold_data (fold_flags flags): flags (flags) {}
> > >   };
> > > static tree
> > > @@ -1039,7 +1046,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > > *data_)
> > > break;
> > >   }
> > >   -  *stmt_p = stmt = cp_fold (*stmt_p);
> > > +  *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
> > >   if (data->pset.add (stmt))
> > >   {
> > > @@ -1119,12 +1126,12 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > > *data_)
> > >here rather than in cp_genericize to avoid problems with the
> > > invisible
> > >reference transition.  */
> > >   case INIT_EXPR:
> > > -  if (data->genericize)
> > > +  if (data->flags & ff_genericize)
> > >   cp_genericize_init_expr (stmt_p);
> > > break;
> > > case TARGET_EXPR:
> > > -  if (data->genericize)
> > > +  if (data->flags & ff_genericize)
> > >   cp_genericize_target_expr (stmt_p);
> > >   /* Folding might replace e.g. a COND_EXPR with a TARGET_EXPR; in
> > > @@ -1157,7 +1164,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > > *data_)
> > >   void
> > >   cp_fold_function (tree fndecl)
> > >   {
> > > -  cp_fold_data data (/*genericize*/true);
> > > +  cp_fold_data data (ff_genericize);
> > cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, &data, NULL);
> > >   }
> > >   @@ -2375,7 +2382,7 @@ cp_fold_maybe_rvalue (tree x, bool rval)
> > >   {
> > > while (true)
> > >   {
> > > -  x = cp_fold (x);
> > > +  x = cp_fold (x, ff_none);
> > > if (rval)
> > >   x = mark_rvalue_use (x);
> > > if (rval && DECL_P (x)
> > > @@ -2434,7 +2441,7 @@ cp_fully_fold_init (tree x)
> > > if (processing_template_decl)
> > >   return x;
> > > x = cp_fully_fold (x);
> > > -  cp_fold_data data (/*genericize*/false);
> > > +  cp_fold_data data (ff_none);
> > > cp_walk_tree (&x, cp_fold_r, &data, NULL);
> > > return x;
> > >   }
> > > @@ 

Re: [PATCH 2/2] Documentation Update.

2023-02-03 Thread Joseph Myers
On Thu, 2 Feb 2023, Siddhesh Poyarekar wrote:

> I dug into this on the glibc end and it looks like this commit:
> 
> commit 63fb8f9aa9d19f85599afe4b849b567aefd70a36
> Author: Zack Weinberg 
> Date:   Mon Feb 5 14:13:41 2018 -0500
> 
> Post-cleanup 2: minimize _G_config.h.
> 
> ripped all of that gunk out.  AFAICT there's no use of struct __gconv_info
> anywhere else in the code.
> 
> I reckon it is safe to say now that glibc no longer needs this misfeature.

It would be worth testing whether any change warns anywhere else in glibc 
(not necessarily in installed headers).  And to have fixincludes for the 
installed _G_config.h from old glibc if we start rejecting such code.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2] c++: speculative constexpr and is_constant_evaluated [PR108243]

2023-02-03 Thread Patrick Palka via Gcc-patches
On Mon, 30 Jan 2023, Jason Merrill wrote:

> On 1/27/23 17:02, Patrick Palka wrote:
> > This PR illustrates that __builtin_is_constant_evaluated currently acts
> > as an optimization barrier for our speculative constexpr evaluation,
> > since we don't want to prematurely fold the builtin to false if the
> > expression in question would be later manifestly constant evaluated (in
> > which case it must be folded to true).
> > 
> > This patch fixes this by permitting __builtin_is_constant_evaluated
> > to get folded as false during cp_fold_function, since at that point
> > we're sure we're doing manifestly constant evaluation.  To that end
> > we add a flags parameter to cp_fold that controls what mce_value the
> > CALL_EXPR case passes to maybe_constant_value.
> > 
> > bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > PR c++/108243
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (enum fold_flags): Define.
> > (cp_fold_data::genericize): Replace this data member with ...
> > (cp_fold_data::fold_flags): ... this.
> > (cp_fold_r): Adjust cp_fold_data use and cp_fold calls.
> > (cp_fold_function): Likewise.
> > (cp_fold_maybe_rvalue): Likewise.
> > (cp_fully_fold_init): Likewise.
> > (cp_fold): Add fold_flags parameter.  Don't cache if flags
> > isn't empty.
> > <CALL_EXPR>: Pass mce_false to maybe_constant_value
> > if ff_genericize is set.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/opt/pr108243.C: New test.
> > ---
> >   gcc/cp/cp-gimplify.cc   | 76 ++---
> >   gcc/testsuite/g++.dg/opt/pr108243.C | 29 +++
> >   2 files changed, 76 insertions(+), 29 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/opt/pr108243.C
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index a35cedd05cc..d023a63768f 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -43,12 +43,20 @@ along with GCC; see the file COPYING3.  If not see
> >   #include "omp-general.h"
> >   #include "opts.h"
> >   +/* Flags for cp_fold and cp_fold_r.  */
> > +
> > +enum fold_flags {
> > +  ff_none = 0,
> > +  /* Whether we're being called from cp_fold_function.  */
> > +  ff_genericize = 1 << 0,
> > +};
> > +
> >   /* Forward declarations.  */
> > static tree cp_genericize_r (tree *, int *, void *);
> >   static tree cp_fold_r (tree *, int *, void *);
> >   static void cp_genericize_tree (tree*, bool);
> > -static tree cp_fold (tree);
> > +static tree cp_fold (tree, fold_flags);
> > /* Genericize a TRY_BLOCK.  */
> >   @@ -996,9 +1004,8 @@ struct cp_genericize_data
> >   struct cp_fold_data
> >   {
> >   hash_set<tree> pset;
> > -  bool genericize; // called from cp_fold_function?
> > -
> > -  cp_fold_data (bool g): genericize (g) {}
> > +  fold_flags flags;
> > +  cp_fold_data (fold_flags flags): flags (flags) {}
> >   };
> > static tree
> > @@ -1039,7 +1046,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > *data_)
> > break;
> >   }
> >   -  *stmt_p = stmt = cp_fold (*stmt_p);
> > +  *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
> >   if (data->pset.add (stmt))
> >   {
> > @@ -1119,12 +1126,12 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > *data_)
> >  here rather than in cp_genericize to avoid problems with the
> > invisible
> >  reference transition.  */
> >   case INIT_EXPR:
> > -  if (data->genericize)
> > +  if (data->flags & ff_genericize)
> > cp_genericize_init_expr (stmt_p);
> > break;
> > case TARGET_EXPR:
> > -  if (data->genericize)
> > +  if (data->flags & ff_genericize)
> > cp_genericize_target_expr (stmt_p);
> >   /* Folding might replace e.g. a COND_EXPR with a TARGET_EXPR; in
> > @@ -1157,7 +1164,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > *data_)
> >   void
> >   cp_fold_function (tree fndecl)
> >   {
> > -  cp_fold_data data (/*genericize*/true);
> > +  cp_fold_data data (ff_genericize);
> > cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, &data, NULL);
> >   }
> >   @@ -2375,7 +2382,7 @@ cp_fold_maybe_rvalue (tree x, bool rval)
> >   {
> > while (true)
> >   {
> > -  x = cp_fold (x);
> > +  x = cp_fold (x, ff_none);
> > if (rval)
> > x = mark_rvalue_use (x);
> > if (rval && DECL_P (x)
> > @@ -2434,7 +2441,7 @@ cp_fully_fold_init (tree x)
> > if (processing_template_decl)
> >   return x;
> > x = cp_fully_fold (x);
> > -  cp_fold_data data (/*genericize*/false);
> > +  cp_fold_data data (ff_none);
> > cp_walk_tree (&x, cp_fold_r, &data, NULL);
> > return x;
> >   }
> > @@ -2469,7 +2476,7 @@ clear_fold_cache (void)
> >   Function returns X or its folded variant.  */
> > static tree
> > -cp_fold (tree x)
> > +cp_fold (tree x, fold_flags flags)
> >   {
> > tree op0, op1, op2, op3;
> > tree org_x = x, r = NULL_TREE;
> > @@ -2490,8 +2497,11 @@ cp_fold (tree x)

Re: [PATCH] range-op: Handle op?.undefined_p () in op[12]_range of comparisons [PR108647]

2023-02-03 Thread Aldy Hernandez via Gcc-patches

LGTM

On 2/3/23 19:59, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, we ICE because lhs is singleton [0, 0]
or [1, 1] but op2 (or in other cases op1) is undefined and op?.*_bound ()
ICEs on those because there are no pairs for UNDEFINED.

The following patch makes us set r to varying or return false in those
cases.

Included is a version of the patch I've bootstrapped/regtested successfully
on x86_64-linux and i686-linux, attached is a slight modification more
consistent with the range-op-float.cc patch.

Ok for trunk (and which one)?

2023-02-03  Jakub Jelinek  

PR tree-optimization/108647
* range-op.cc (operator_equal::op1_range,
operator_not_equal::op1_range): Don't test op2 bound
equality if op2.undefined_p (), instead set_varying.
(operator_lt::op1_range, operator_le::op1_range,
operator_gt::op1_range, operator_ge::op1_range): Return false if
op2.undefined_p ().
(operator_lt::op2_range, operator_le::op2_range,
operator_gt::op2_range, operator_ge::op2_range): Return false if
op1.undefined_p ().

* g++.dg/torture/pr108647.C: New test.

--- gcc/range-op.cc.jj  2023-02-03 10:51:40.699003658 +0100
+++ gcc/range-op.cc 2023-02-03 17:26:02.204429931 +0100
@@ -642,7 +642,8 @@ operator_equal::op1_range (irange , tr
  case BRS_FALSE:
// If the result is false, the only time we know anything is
// if OP2 is a constant.
-  if (wi::eq_p (op2.lower_bound(), op2.upper_bound()))
+  if (!op2.undefined_p ()
+ && wi::eq_p (op2.lower_bound(), op2.upper_bound()))
{
  r = op2;
  r.invert ();
@@ -755,7 +756,8 @@ operator_not_equal::op1_range (irange 
  case BRS_TRUE:
// If the result is true, the only time we know anything is if
// OP2 is a constant.
-  if (wi::eq_p (op2.lower_bound(), op2.upper_bound()))
+  if (!op2.undefined_p ()
+ && wi::eq_p (op2.lower_bound(), op2.upper_bound()))
{
  r = op2;
  r.invert ();
@@ -920,6 +922,9 @@ operator_lt::op1_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op2.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -942,6 +947,9 @@ operator_lt::op2_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op1.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -1031,6 +1039,9 @@ operator_le::op1_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op2.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -1053,6 +1064,9 @@ operator_le::op2_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op1.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -1141,6 +1155,9 @@ operator_gt::op1_range (irange , tree
const irange , const irange ,
relation_trio) const
  {
+  if (op2.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -1163,6 +1180,9 @@ operator_gt::op2_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op1.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -1252,6 +1272,9 @@ operator_ge::op1_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op2.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
@@ -1274,6 +1297,9 @@ operator_ge::op2_range (irange , tree
const irange ,
relation_trio) const
  {
+  if (op1.undefined_p ())
+return false;
+
switch (get_bool_state (r, lhs, type))
  {
  case BRS_TRUE:
--- gcc/testsuite/g++.dg/torture/pr108647.C.jj  2023-02-03 16:36:18.347255058 +0100
+++ gcc/testsuite/g++.dg/torture/pr108647.C     2023-02-03 16:32:16.338811259 +0100
@@ -0,0 +1,25 @@
+// PR tree-optimization/108647
+// { dg-do compile }
+
+bool a;
+int b, c;
+
+inline const bool &
+foo (bool &f, const bool &e)
+{
+  return f < e ? f : e;
+}
+
+void
+bar (signed char e, bool *f, bool *h, bool *g)
+{
+  for (;;)
+if (g)
+  for (signed char j = 0; j < 6;
+  j += ((f[0] & c ? g[0] : int(0 >= e))
+? 0 : foo (g[0], g[0] > h[0]) + 1))
+   {
+ a = 0;
+ b = 0;
+   }
+}

Jakub




Re: [PATCH] fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]

2023-02-03 Thread Steve Kargl via Gcc-patches
On Fri, Feb 03, 2023 at 08:03:36PM +0100, Jakub Jelinek via Fortran wrote:
> Hi!
> 
> The first testcase in the PR (which I haven't included in the patch because
> it is unclear to me if it is supposed to be valid or not) ICEs since extra
> hash table checking has been added recently.  The problem is that
> gfc_trans_use_stmts does
>   tree *slot = entry->decls->find_slot_with_hash (rent->use_name, 
> hash,
>   INSERT);
>   if (*slot == NULL)
> and later on doesn't store anything into *slot and continues.  Another spot
> a few lines later correctly clears the slot if it decides not to use the
> slot, so the following patch does the same.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 

Yes.

See my comment in the PR about the testcases being invalid Fortran.

-- 
Steve


[PATCH] range-op: Handle op?.undefined_p () in op[12]_range of comparisons [PR108647]

2023-02-03 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, we ICE because lhs is singleton [0, 0]
or [1, 1] but op2 (or in other cases op1) is undefined and op?.*_bound ()
ICEs on those because there are no pairs for UNDEFINED.

The following patch makes us set r to varying or return false in those
cases.

Included is a version of the patch I've bootstrapped/regtested successfully
on x86_64-linux and i686-linux, attached is a slight modification more
consistent with the range-op-float.cc patch.

Ok for trunk (and which one)?

2023-02-03  Jakub Jelinek  

PR tree-optimization/108647
* range-op.cc (operator_equal::op1_range,
operator_not_equal::op1_range): Don't test op2 bound
equality if op2.undefined_p (), instead set_varying.
(operator_lt::op1_range, operator_le::op1_range,
operator_gt::op1_range, operator_ge::op1_range): Return false if
op2.undefined_p ().
(operator_lt::op2_range, operator_le::op2_range,
operator_gt::op2_range, operator_ge::op2_range): Return false if
op1.undefined_p ().

* g++.dg/torture/pr108647.C: New test.

--- gcc/range-op.cc.jj  2023-02-03 10:51:40.699003658 +0100
+++ gcc/range-op.cc 2023-02-03 17:26:02.204429931 +0100
@@ -642,7 +642,8 @@ operator_equal::op1_range (irange , tr
 case BRS_FALSE:
   // If the result is false, the only time we know anything is
   // if OP2 is a constant.
-  if (wi::eq_p (op2.lower_bound(), op2.upper_bound()))
+  if (!op2.undefined_p ()
+ && wi::eq_p (op2.lower_bound(), op2.upper_bound()))
{
  r = op2;
  r.invert ();
@@ -755,7 +756,8 @@ operator_not_equal::op1_range (irange 
 case BRS_TRUE:
   // If the result is true, the only time we know anything is if
   // OP2 is a constant.
-  if (wi::eq_p (op2.lower_bound(), op2.upper_bound()))
+  if (!op2.undefined_p ()
+ && wi::eq_p (op2.lower_bound(), op2.upper_bound()))
{
  r = op2;
  r.invert ();
@@ -920,6 +922,9 @@ operator_lt::op1_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op2.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -942,6 +947,9 @@ operator_lt::op2_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op1.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -1031,6 +1039,9 @@ operator_le::op1_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op2.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -1053,6 +1064,9 @@ operator_le::op2_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op1.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -1141,6 +1155,9 @@ operator_gt::op1_range (irange , tree
const irange , const irange ,
relation_trio) const
 {
+  if (op2.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -1163,6 +1180,9 @@ operator_gt::op2_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op1.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -1252,6 +1272,9 @@ operator_ge::op1_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op2.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
@@ -1274,6 +1297,9 @@ operator_ge::op2_range (irange , tree
const irange ,
relation_trio) const
 {
+  if (op1.undefined_p ())
+return false;
+
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
--- gcc/testsuite/g++.dg/torture/pr108647.C.jj  2023-02-03 16:36:18.347255058 +0100
+++ gcc/testsuite/g++.dg/torture/pr108647.C     2023-02-03 16:32:16.338811259 +0100
@@ -0,0 +1,25 @@
+// PR tree-optimization/108647
+// { dg-do compile }
+
+bool a;
+int b, c;
+
+inline const bool &
+foo (bool &f, const bool &e)
+{
+  return f < e ? f : e;
+}
+
+void
+bar (signed char e, bool *f, bool *h, bool *g)
+{
+  for (;;)
+if (g)
+  for (signed char j = 0; j < 6;
+  j += ((f[0] & c ? g[0] : int(0 >= e))
+? 0 : foo (g[0], g[0] > h[0]) + 1))
+   {
+ a = 0;
+ b = 0;
+   }
+}

Jakub

[PATCH] fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]

2023-02-03 Thread Jakub Jelinek via Gcc-patches
Hi!

The first testcase in the PR (which I haven't included in the patch because
it is unclear to me if it is supposed to be valid or not) ICEs since extra
hash table checking has been added recently.  The problem is that
gfc_trans_use_stmts does
  tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash,
  INSERT);
  if (*slot == NULL)
and later on doesn't store anything into *slot and continues.  Another spot
a few lines later correctly clears the slot if it decides not to use the
slot, so the following patch does the same.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-02-03  Jakub Jelinek  

PR fortran/108451
* trans-decl.cc (gfc_trans_use_stmts): Call clear_slot before
doing continue.

--- gcc/fortran/trans-decl.cc.jj2023-01-16 11:52:16.146733136 +0100
+++ gcc/fortran/trans-decl.cc   2023-02-03 14:41:40.503322954 +0100
@@ -5350,7 +5350,11 @@ gfc_trans_use_stmts (gfc_namespace * ns)
  /* Sometimes, generic interfaces wind up being over-ruled by a
 local symbol (see PR41062).  */
  if (!st->n.sym->attr.use_assoc)
-   continue;
+   {
+ *slot = error_mark_node;
+ entry->decls->clear_slot (slot);
+ continue;
+   }
 
  if (st->n.sym->backend_decl
  && DECL_P (st->n.sym->backend_decl)

Jakub



[PATCH] ubsan: Fix up another spot that should have been BUILT_IN_UNREACHABLE_TRAPS [PR108655]

2023-02-03 Thread Jakub Jelinek via Gcc-patches
Hi!

We ICE on the following testcase, because ivcanon calls
gimple_build_builtin_unreachable but doesn't expect it would need vops.
BUILT_IN_UNREACHABLE_TRAP I've introduced yesterday doesn't need
vops and should be used in that case instead of BUILT_IN_TRAP which
needs them.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-02-03  Jakub Jelinek  

PR tree-optimization/108655
* ubsan.cc (sanitize_unreachable_fn): For -funreachable-traps
or -fsanitize=unreachable -fsanitize-trap=unreachable return
BUILT_IN_UNREACHABLE_TRAP decl rather than BUILT_IN_TRAP.

* gcc.dg/pr108655.c: New test.

--- gcc/ubsan.cc.jj 2023-01-02 09:32:38.393053992 +0100
+++ gcc/ubsan.cc2023-02-03 11:40:47.047399386 +0100
@@ -649,7 +649,7 @@ sanitize_unreachable_fn (tree *data, loc
   ? (flag_sanitize_trap & SANITIZE_UNREACHABLE)
   : flag_unreachable_traps)
 {
-  fn = builtin_decl_explicit (BUILT_IN_TRAP);
+  fn = builtin_decl_explicit (BUILT_IN_UNREACHABLE_TRAP);
   *data = NULL_TREE;
 }
   else if (san)
--- gcc/testsuite/gcc.dg/pr108655.c.jj  2023-02-03 11:46:39.533190031 +0100
+++ gcc/testsuite/gcc.dg/pr108655.c 2023-02-03 11:46:28.272356439 +0100
@@ -0,0 +1,15 @@
+/* PR tree-optimization/108655 */
+/* { dg-do compile } */
+/* { dg-options "-w -O1 -funreachable-traps" } */
+
+void
+foo (void)
+{
+  int i, j;
+  for (; i;)
+;
+  for (; i < 6;)
+for (j = 0; j < 6; ++j)
+  i += j;
+  __builtin_trap ();
+}

Jakub



Re: [PATCH] range-ops: Handle undefined ranges in frange op[12]_range [PR108647]

2023-02-03 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 03, 2023 at 07:09:18PM +0100, Aldy Hernandez wrote:
> This patch gracefully handles undefined operand ranges for the floating
> point op[12]_range operators.  This is very low risk, as we would have
> ICEd otherwise.
> 
> We don't have a testcase that ICEs for floating point ranges, but it's
> only a matter of time.  Besides, this dovetails nicely with the integer
> versions Jakub is testing.

LGTM (even bootstrapped/regtested this successfully on i686-linux).

Jakub



Re: [PATCH] [PR tree-optimization/18639] Compare nonzero bits in irange with widest_int.

2023-02-03 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 03, 2023 at 11:23:28AM -0500, Andrew MacLeod wrote:
> 
> On 2/3/23 04:16, Jakub Jelinek wrote:
> > On Fri, Feb 03, 2023 at 09:50:43AM +0100, Aldy Hernandez wrote:
> > > [PR tree-optimization/18639] Compare nonzero bits in irange with 
> > > widest_int.
> > 0 missing in the bug number in the subject line, though the current
> > recommended formatting of the subject is I think:
> > value-range: Compare nonzero bits in irange with widest_int [PR108639]
> >  PR 108639/tree-optimization
> > 
> > Reversed component and number
> > 
> > > --- a/gcc/value-range.cc
> > > +++ b/gcc/value-range.cc
> > > @@ -1259,7 +1259,10 @@ irange::legacy_equal_p (const irange ) const
> > >  other.tree_lower_bound (0))
> > > && vrp_operand_equal_p (tree_upper_bound (0),
> > > other.tree_upper_bound (0))
> > > -   && get_nonzero_bits () == other.get_nonzero_bits ());
> > > +   && (widest_int::from (get_nonzero_bits (),
> > > + TYPE_SIGN (type ()))
> > > +   == widest_int::from (other.get_nonzero_bits (),
> > > +TYPE_SIGN (other.type ();
> > >   }
> > >   bool
> > > @@ -1294,7 +1297,11 @@ irange::operator== (const irange ) const
> > > || !operand_equal_p (ub, ub_other, 0))
> > >   return false;
> > >   }
> > > -  return get_nonzero_bits () == other.get_nonzero_bits ();
> > > +  widest_int nz1 = widest_int::from (get_nonzero_bits (),
> > > +  TYPE_SIGN (type ()));
> > > +  widest_int nz2 = widest_int::from (other.get_nonzero_bits (),
> > > +  TYPE_SIGN (other.type ()));
> > > +  return nz1 == nz2;
> > >   }
> > While the above avoids the ICE (and would be certainly correct for
> > the bounds, depending on the sign of their type sign or zero extended
> > to widest int), but is the above what we want for non-zero bits
> > to be considered equal?  The wide_ints (which ought to have precision
> > of the corresponding type) don't represent normal numbers but bitmasks
> > (0 - this bit is known to be zero, 1 - nothing is known about this bit).
> > So, if there are different precisions and the narrower value has 0
> > in the MSB of the bitmask (so MSB is known to be zero), the above requires
> > for equality that in the other range all upper bits are known to be zero
> > too for both signed and unsigned.  That is ok.  Similarly for MSB set
> > if TYPE_SIGN of the narrower is unsigned, the MSB value is unknown, but we
> > require on the wider to have all the upper bits cleared.  But for signed
> > narrower type with MSB set, i.e. it is unknown if it is positive or
> > negative, the above requires that all the above bits are unknown too.
> > And that is the case I'm not sure about, whether in that case the
> > upper bits of the wider wide_int should be checked at all.
> > Though, perhaps from the POV of nonzero bits derived from the sign-extended
> > values in the ranges sign bit copies (so all above bits 1) is what one would
> > get, so maybe it is ok.  Just food for thought.
> > 
> if the bits match exactly along with everything else, then we can be sure
> the ranges are truly equal.  If for some reason the numbers are all the same
> but the non-zero bits don't compare equal,  then I can't think of what harm
> it could cause to compare unequal..  Worst case is we dont perform some
> optimization in this extremely rare scenario of differing precisions.  And
> in fact they could actually be unequal...
> 
> So I suspect this is fine...

Ok then.

Jakub



Re: [wwwdocs] document modula-2 in gcc-13/changes.html (and index.html)

2023-02-03 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 03, 2023 at 02:52:43PM +, Gaius Mulley via Gcc-patches wrote:
> 
> Hello,
> 
> The following patch provides a summary of the modula-2 front end
> and also contains links to the online modula-2 documentation in
> index.html.
> 
> [I'm just about to git push fixes so that modula-2 builds html, info and
>  pdf documentation into the standard directories.]

IMHO it should go also into the News section on the gcc.gnu.org page.
If you look into https://gcc.gnu.org/news.html which contains older news,
it contains also e.g. D addition entry.

Jakub



Re: [pushed] libstdc++: Tweak link to ABIcheck project

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Fri, 3 Feb 2023, 19:20 Gerald Pfeifer wrote:

> libstdc++-v3/ChangeLog:
>
> * doc/xml/manual/abi.xml: Tweak link to ABIcheck project.
>


We should probably link to libabigail instead, or as well.


* doc/html/manual/abi.html: Regenerate.
> ---
>  libstdc++-v3/doc/html/manual/abi.html | 2 +-
>  libstdc++-v3/doc/xml/manual/abi.xml   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/doc/html/manual/abi.html
> b/libstdc++-v3/doc/html/manual/abi.html
> index 2781aeda30d..a9f7f04fa0c 100644
> --- a/libstdc++-v3/doc/html/manual/abi.html
> +++ b/libstdc++-v3/doc/html/manual/abi.html
> @@ -527,7 +527,7 @@ gcc test.c -g -O2 -L. -lone -ltwo
> /usr/lib/libstdc++.so.5 /usr/lib/libstdc++.so.
>  
>  <a class="link" href="http://gcc.gnu.org/PR19664" target="_top">19664</a>:
> libstdc++ headers should have pop/push of the visibility around the
> declarations
>  Bibliography [biblio.abicheck]
> -   <a class="link" href="http://abicheck.sourceforge.net"
> target="_top">
> +   <a class="link" href="https://abicheck.sourceforge.net"
> target="_top">
>   ABIcheck
> </a>
>   . [biblio.cxxabi]
> diff --git a/libstdc++-v3/doc/xml/manual/abi.xml
> b/libstdc++-v3/doc/xml/manual/abi.xml
> index 3cca1fd3b38..3a3cbd3346c 100644
> --- a/libstdc++-v3/doc/xml/manual/abi.xml
> +++ b/libstdc++-v3/doc/xml/manual/abi.xml
> @@ -1130,7 +1130,7 @@ gcc test.c -g -O2 -L. -lone -ltwo
> /usr/lib/libstdc++.so.5 /usr/lib/libstdc++.so.
>  
>
> <link xmlns:xlink="http://www.w3.org/1999/xlink"
> - xlink:href="http://abicheck.sourceforge.net">
> + xlink:href="https://abicheck.sourceforge.net">
>   ABIcheck
> 
>
> --
> 2.39.1
>


[pushed] wwwdocs: news/profiledriven: Move citeseerx.ist.psu.edu to https

2023-02-03 Thread Gerald Pfeifer
Business as usual.

Pushed.
Gerald

---
 htdocs/news/profiledriven.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/news/profiledriven.html b/htdocs/news/profiledriven.html
index 13a61ec0..cac172b4 100644
--- a/htdocs/news/profiledriven.html
+++ b/htdocs/news/profiledriven.html
@@ -271,7 +271,7 @@ href="../benchmarks/">our benchmarks page.
 Frequency and Program Profile Analysis; Wu and Larus; MICRO-27.
 
 [3] wwwdocs:
-<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.7180">Design 
and Analysis of Profile-Based Optimization in Compaq's Compilation Tools for 
Alpha</a>;
+<a href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.7180">Design 
and Analysis of Profile-Based Optimization in Compaq's Compilation Tools for 
Alpha</a>;
Journal of Instruction-Level Parallelism 3 (2000) 1-25
 
 [4] wwwdocs:
@@ -287,7 +287,7 @@ Frequency and Program Profile Analysis; Wu and Larus; 
MICRO-27.
Cliff Young, David S. Johnson, David R. Karger, Michael D. Smith, ACM 
1997
 
 [6] wwwdocs:
-<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2235">Software 
Trace Cache</a>;
+<a href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2235">Software
 Trace Cache</a>;
International Conference on Supercomputing, 1999
 
 [7] wwwdocs:
-- 
2.39.1


[pushed] libstdc++: Tweak link to ABIcheck project

2023-02-03 Thread Gerald Pfeifer
libstdc++-v3/ChangeLog:

* doc/xml/manual/abi.xml: Tweak link to ABIcheck project.
* doc/html/manual/abi.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/abi.html | 2 +-
 libstdc++-v3/doc/xml/manual/abi.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/abi.html 
b/libstdc++-v3/doc/html/manual/abi.html
index 2781aeda30d..a9f7f04fa0c 100644
--- a/libstdc++-v3/doc/html/manual/abi.html
+++ b/libstdc++-v3/doc/html/manual/abi.html
@@ -527,7 +527,7 @@ gcc test.c -g -O2 -L. -lone -ltwo /usr/lib/libstdc++.so.5 
/usr/lib/libstdc++.so.
 
 <a class="link" href="http://gcc.gnu.org/PR19664" target="_top">19664</a>: 
libstdc++ headers should have pop/push of the visibility around the 
declarations
 Bibliography [biblio.abicheck] 
-   <a class="link" href="http://abicheck.sourceforge.net" target="_top">
+   <a class="link" href="https://abicheck.sourceforge.net" target="_top">
  ABIcheck

   . [biblio.cxxabi] 
diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index 3cca1fd3b38..3a3cbd3346c 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -1130,7 +1130,7 @@ gcc test.c -g -O2 -L. -lone -ltwo /usr/lib/libstdc++.so.5 
/usr/lib/libstdc++.so.
 
   
<link xmlns:xlink="http://www.w3.org/1999/xlink"
- xlink:href="http://abicheck.sourceforge.net">
+ xlink:href="https://abicheck.sourceforge.net">
  ABIcheck

   
-- 
2.39.1


Re: [wwwdocs] document modula-2 in gcc-13/changes.html (and index.html)

2023-02-03 Thread Gerald Pfeifer
On Fri, 3 Feb 2023, Gaius Mulley wrote:
> The following patch provides a summary of the modula-2 front end
> and also contains links to the online modula-2 documentation in
> index.html.

> +Modula-2
> +
> +  Support for the language Modula-2 has been added.  The dialects
> +  supported are PIM2, PIM3, PIM4 and ISO/IEC 10514-1.  Also included
> +  are a complete set of ISO/IEC 10514-1 libraries and PIM
> +libraries.

I wonder whether we can make this a bit more active. 

Maybe something like "This includes support for the ... dialects, a 
complete set of ...and ..."?

> +  https://gcc.gnu.org/onlinedocs/m2/Compiler-options.html;>
> +  Compiler options.

Maybe put this in parentheses since it's not an update as such and 
relates more to the previous item?

> +  Linking has been redesigned.

What are we saying here? I.e., what is the change we are announcing? As a 
user, what might I notice? Why do I care?


The above are questions to possibly improve this for our users. Please 
adjust as you see fit, or push as is, if you prefer.


On a somewhat related note: This is definitely big enough to warrant an 
entry in the News section on our main page. :-)  Do you want to propose
something?

Gerald


Re: [PATCH] Improve RTL CSE hash table hash usage

2023-02-03 Thread Richard Biener via Gcc-patches



> Am 03.02.2023 um 16:55 schrieb Richard Sandiford via Gcc-patches 
> :
> 
> Richard Biener via Gcc-patches  writes:
 Am 03.02.2023 um 15:20 schrieb Richard Sandiford via Gcc-patches 
 :
>>> 
>>> Richard Biener via Gcc-patches  writes:
 The RTL CSE hash table has a fixed number of buckets (32) each
 with a linked list of entries with the same hash value.  The
 actual hash values are computed using hash_rtx which uses adds
 for mixing and adds the rtx CODE as CODE << 7 (apart from some
 exceptions such as MEM).  The unsigned int typed hash value
 is then simply truncated for the actual lookup into the fixed
 size table which means that usually CODE is simply lost.
 
 The following improves this truncation by first mixing in more
 bits using xor.  It does not change the actual hash function
 since that's used outside of CSE as well.
 
 An alternative would be to bump the fixed number of buckets,
 say to 256 which would retain the LSB of CODE or to 8192 which
 can capture all 6 bits required for the last CODE.
 
 As the comment in CSE says, there's invalidate_memory and
 flush_hash_table done possibly frequently and those at least
 need to walk all slots, so when the hash table is mostly empty
 enlarging it will be a loss.  Still there should be more
 regular lookups by hash, so less collisions should pay off
 as well.
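
To make the truncation concrete, here is a minimal sketch (not from the
patch; it just restates the constants quoted above, with HASH_SHIFT == 5
for the 32-bucket table):

/* Illustration only: before the change the CODE term cannot reach the
   bucket index; the xor folds bits 5..9 back into the low 5 bits.  */
#define SKETCH_HASH_SHIFT 5
#define SKETCH_HASH_MASK ((1u << SKETCH_HASH_SHIFT) - 1)

static unsigned
sketch_bucket_old (unsigned code_term)  /* code_term = (unsigned) CODE << 7 */
{
  return code_term & SKETCH_HASH_MASK;  /* always 0: only bits 7+ are set */
}

static unsigned
sketch_bucket_new (unsigned h)
{
  return (h ^ (h >> SKETCH_HASH_SHIFT)) & SKETCH_HASH_MASK;
}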
>>> 
>>> Going purely from this description and without having looked
>>> at the code properly, would it be possible to link all current
>>> values together, not just those with the same hash?  And would
>>> that help?  It looks like the list is already doubly-linked,
>>> and there's spare room to store a "start of new hash" marker.
>> 
>> We already do have equivalent values linked, but I’m not sure that’s what 
>> you are suggesting.
> 
> I was thinking of linking every active value in the table together,
> but with entries for the same hash being consecutive.  That way, things
> like invalidate_memory can just walk the list and ignore the hash table.

Ah, yeah.  Even better might be a generation count for memory like there’s one 
(but only for a subset of cases?!) for pseudos.  That would avoid the walking 
altogether.
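
As a purely illustrative sketch of such a generation count (invented
names, not existing cse.cc code):

/* Invalidate all memory-referencing entries in O(1) by bumping a
   generation instead of walking every slot.  */
static unsigned mem_generation;

struct elt_with_gen
{
  unsigned mem_gen;   /* generation when the entry was recorded */
  int refs_memory;    /* does the entry mention a MEM?  */
};

static inline int
elt_stale_p (const struct elt_with_gen *e)
{
  return e->refs_memory && e->mem_gen != mem_generation;
}

static inline void
invalidate_memory_by_generation (void)
{
  ++mem_generation;   /* all memory-referencing entries go stale lazily */
}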

>> Those should also have the same hash value, so both lists are somewhat 
>> redundant and we might be able to save some storage here by making this a 
>> list of lists of same hash and value list?
> 
> I thought the value-equality list was to establish that (e.g.)
> (reg R1) and (reg R2) are known to have the same value, despite
> being different expressions with different hash values.

I’d have to double check; I was just cursorily sweeping over the code after 
being pointed to it from a profile of a testcase.  Most of cse.cc dates back 
to the initial source revision …

Richard 

> But I suppose if we reused an existing hash table structure (with its
> own mechanism for handling collisions), it would make sense to use the
> equivalent-value list to join everything together, rather than the
> same-hash list.  Again, there could be a marker to establish the start
> of a new equivalent-value sublist.
> 
> Thanks,
> Richard
>> 
>>> 
>>> Thanks,
>>> Richard
>>> 
 Without enlarging the table a better hash function is unlikely
 going to make a big difference, simple statistics on the
 number of collisions at insertion time shows a reduction of
 around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
 at the expense of reducing the average table fill by 10%
 (all of this stats from looking just at fold-const.i at -O2).
 Increasing HASH_SHIFT more leaves the table even more sparse
 likely showing that hash_rtx uses add for mixing which is
 quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
 collisions.
 
 Experimenting with using inchash instead of adds for the
 mixing does not improve things when looking at the HASH_SHIFT
 bumped by 2 numbers.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu.
 
 Any opinions?
 
   * cse.cc (HASH): Turn into inline function and mix
   in another HASH_SHIFT bits.
   (SAFE_HASH): Likewise.
 ---
 gcc/cse.cc | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)
 
 diff --git a/gcc/cse.cc b/gcc/cse.cc
 index 37afc88b439..4777e559b86 100644
 --- a/gcc/cse.cc
 +++ b/gcc/cse.cc
 @@ -420,20 +420,6 @@ struct table_elt
 #define HASH_SIZE(1 << HASH_SHIFT)
 #define HASH_MASK(HASH_SIZE - 1)
 
 -/* Compute hash code of X in mode M.  Special-case case where X is a 
 pseudo
 -   register (hard registers may require `do_not_record' to be set).  */
 -
 -#define HASH(X, M)\
 - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER\
 -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))\
 -  : canon_hash (X, M)) & HASH_MASK)
 -

Re: [PATCH v2] c++: wrong error with constexpr array and value-init [PR108158]

2023-02-03 Thread Marek Polacek via Gcc-patches
On Fri, Feb 03, 2023 at 01:53:48PM -0500, Jason Merrill wrote:
> On 2/3/23 13:08, Marek Polacek wrote:
> > On Thu, Feb 02, 2023 at 05:29:44PM -0500, Jason Merrill wrote:
> > > On 1/30/23 21:35, Marek Polacek wrote:
> > > > In this test case, we find ourselves evaluating 't' which is
> > > > ((const struct carray *) this)->data_[VIEW_CONVERT_EXPR<int>(index)]
> > > > in cxx_eval_array_reference.  ctx->object is non-null, a RESULT_DECL, so
> > > > we replace it with 't':
> > > > 
> > > > new_ctx.object = t; // result_decl replaced
> > > > 
> > > > and then we go to cxx_eval_constant_expression to evaluate an
> > > > AGGR_INIT_EXPR, where we end up evaluating an INIT_EXPR (which is in the
> > > > body of the constructor for seed_or_index):
> > > > 
> > > > ((struct seed_or_index *) this)->value_ = NON_LVALUE_EXPR <0>
> > > > 
> > > > whereupon in cxx_eval_store_expression we go to the probe loop
> > > > where the 'this' is evaluated to
> > > > 
> > > > ze_set.tables_.first_table_.data_[0]
> > > > 
> > > > so the 'object' is ze_set, but that isn't in ctx->global->get_value_ptr
> > > > so we fail with a bogus error.  ze_set is not there because it comes
> > > > from a different constexpr context (it's not in cv_cache either).
> > > > 
> > > > The problem started with r12-2304 where I added the new_ctx.object
> > > > replacement.  That was to prevent a type mismatch: the type of 't'
> > > > and ctx.object were different.
> > > > 
> > > > It seems clear that we shouldn't have replaced ctx.object here.
> > > > The cxx_eval_array_reference I mentioned earlier is called from
> > > > cxx_eval_store_expression:
> > > >   6257   init = cxx_eval_constant_expression (&new_ctx, init, vc_prvalue,
> > > >   6258                                        non_constant_p, overflow_p);
> > > > which already created a new context, whose .object we should be
> > > > using unless, for instance, INIT contained a.b and we're evaluating
> > > > the 'a' part, which I think was the case for r12-2304; in that case
> > > > ctx.object has to be something different.
> > > > 
> > > > A relatively safe fix should be to check the types before replacing
> > > > ctx.object, as in the below.
> > > 
> > > Agreed.  I'm trying to understand when the replacement could ever make
> > > sense, since 't' is not the target, it's the initializer.  The replacement
> > > comes from Patrick's fix for 98295, but that testcase no longer hits that
> > > code (likely due to changes in empty class handling).
> > > 
> > > If you add a gcc_checking_assert (false) to the replacement, does anything
> > > trip it?
> > 
> > It would trip in constexpr-101371.C, added in r12-2304.  BUT, and I would
> > have sworn that it ICEd when I tried, it's not necessary anymore.  So it
> > looks like we can simply remove the new_ctx.object line.  At least for
> > trunk, maybe 12 too.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> OK, thanks.  Let's go with your original patch for 11/12.

Will do, thanks.  I think I'll wait for a few days before backporting.

Marek



Re: [PATCH v2] c++: wrong error with constexpr array and value-init [PR108158]

2023-02-03 Thread Jason Merrill via Gcc-patches

On 2/3/23 13:08, Marek Polacek wrote:

On Thu, Feb 02, 2023 at 05:29:44PM -0500, Jason Merrill wrote:

On 1/30/23 21:35, Marek Polacek wrote:

In this test case, we find ourselves evaluating 't' which is
((const struct carray *) this)->data_[VIEW_CONVERT_EXPR<int>(index)]
in cxx_eval_array_reference.  ctx->object is non-null, a RESULT_DECL, so
we replace it with 't':

new_ctx.object = t; // result_decl replaced

and then we go to cxx_eval_constant_expression to evaluate an
AGGR_INIT_EXPR, where we end up evaluating an INIT_EXPR (which is in the
body of the constructor for seed_or_index):

((struct seed_or_index *) this)->value_ = NON_LVALUE_EXPR <0>

whereupon in cxx_eval_store_expression we go to the probe loop
where the 'this' is evaluated to

ze_set.tables_.first_table_.data_[0]

so the 'object' is ze_set, but that isn't in ctx->global->get_value_ptr
so we fail with a bogus error.  ze_set is not there because it comes
from a different constexpr context (it's not in cv_cache either).

The problem started with r12-2304 where I added the new_ctx.object
replacement.  That was to prevent a type mismatch: the type of 't'
and ctx.object were different.

It seems clear that we shouldn't have replaced ctx.object here.
The cxx_eval_array_reference I mentioned earlier is called from
cxx_eval_store_expression:
   6257   init = cxx_eval_constant_expression (&new_ctx, init, vc_prvalue,
   6258                                        non_constant_p, overflow_p);
which already created a new context, whose .object we should be
using unless, for instance, INIT contained a.b and we're evaluating
the 'a' part, which I think was the case for r12-2304; in that case
ctx.object has to be something different.

A relatively safe fix should be to check the types before replacing
ctx.object, as in the below.
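
(Schematically, such a type check might look like the sketch below;
same_type_ignoring_top_level_qualifiers_p is the usual C++ front-end
predicate, though the exact v1 hunk may differ:

  /* Sketch: only replace ctx->object when the types disagree, which
     was the original motivation for the replacement.  */
  if (!same_type_ignoring_top_level_qualifiers_p
	 (TREE_TYPE (t), TREE_TYPE (ctx->object)))
    new_ctx.object = t;

The v2 patch below removes the replacement entirely instead.)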


Agreed.  I'm trying to understand when the replacement could ever make
sense, since 't' is not the target, it's the initializer.  The replacement
comes from Patrick's fix for 98295, but that testcase no longer hits that
code (likely due to changes in empty class handling).

If you add a gcc_checking_assert (false) to the replacement, does anything
trip it?


It would trip in constexpr-101371.C, added in r12-2304.  BUT, and I would
have sworn that it ICEd when I tried, it's not necessary anymore.  So it
looks like we can simply remove the new_ctx.object line.  At least for
trunk, maybe 12 too.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.  Let's go with your original patch for 11/12.


-- >8 --
In this test case, we find ourselves evaluating 't' which is
((const struct carray *) this)->data_[VIEW_CONVERT_EXPR<int>(index)]
in cxx_eval_array_reference.  ctx->object is non-null, a RESULT_DECL, so
we replace it with 't':

   new_ctx.object = t; // result_decl replaced

and then we go to cxx_eval_constant_expression to evaluate an
AGGR_INIT_EXPR, where we end up evaluating an INIT_EXPR (which is in the
body of the constructor for seed_or_index):

   ((struct seed_or_index *) this)->value_ = NON_LVALUE_EXPR <0>

whereupon in cxx_eval_store_expression we go to the probe loop
where the 'this' is evaluated to

   ze_set.tables_.first_table_.data_[0]

so the 'object' is ze_set, but that isn't in ctx->global->get_value_ptr
so we fail with a bogus error.  ze_set is not there because it comes
from a different constexpr context (it's not in cv_cache either).

The problem started with r12-2304 where I added the new_ctx.object
replacement.  That was to prevent a type mismatch: the type of 't'
and ctx.object were different.

It seems clear that we shouldn't have replaced ctx.object here.
The cxx_eval_array_reference I mentioned earlier is called from
cxx_eval_store_expression:
  6257   init = cxx_eval_constant_expression (&new_ctx, init, vc_prvalue,
  6258                                        non_constant_p, overflow_p);
which already created a new context, whose .object we should be
using unless, for instance, INIT contained a.b and we're evaluating
the 'a' part, which I think was the case for r12-2304; in that case
ctx.object has to be something different.

It no longer seems necessary to replace new_ctx.object (likely due to
changes in empty class handling).

PR c++/108158

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Don't replace
new_ctx.object.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-108158.C: New test.
---
  gcc/cp/constexpr.cc   |  4 ---
  gcc/testsuite/g++.dg/cpp1y/constexpr-108158.C | 32 +++
  2 files changed, 32 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-108158.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 5b31f9c27d1..564766c8a00 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4301,10 +4301,6 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
if (!SCALAR_TYPE_P (elem_type))
  {

[pushed] c++: Add fixed test [PR101071]

2023-02-03 Thread Marek Polacek via Gcc-patches
As a happy accident, this was fixed by the recent r13-2978.

PR c++/101071

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic-alias8.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/variadic-alias8.C | 95 
 1 file changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-alias8.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-alias8.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic-alias8.C
new file mode 100644
index 000..1d317ef8438
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-alias8.C
@@ -0,0 +1,95 @@
+// PR c++/101071
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fno-elide-constructors -O2" }
+
+// Like variadic-alias2.C, just different options.
+
+template
+struct list {};
+
+struct nil;
+
+
+
+template
+struct number {
+  constexpr /*implicit*/ operator int() const { return n; }
+  using type = number;
+};
+
+using false_ = number<0>;
+using true_ = number<1>;
+
+static_assert(!false_{}, "");
+static_assert(true_{}, "");
+
+template using numbers = list...>;
+
+
+
+template
+struct less_impl;
+
+template
+struct less_impl, number>
+  : number<(lhs < rhs)> {};
+
+template using less = typename less_impl::type;
+
+
+
+template
+struct sum_impl {
+  static_assert(sizeof...(vs) == 0, "see specialization");
+  using type = v0;
+};
+
+template
+struct sum_impl, number, vs...>
+  : sum_impl, vs...> {};
+
+template using sum = typename sum_impl::type;
+
+
+
+template
+struct conditional_impl {
+  static_assert(num{}, "see specialization");
+
+  template
+  using type = T;
+};
+
+template<>
+struct conditional_impl {
+  template
+  using type = F;
+};
+
+template
+using conditional = typename conditional_impl::template type;
+
+
+
+template
+struct min_filter_impl;
+
+template
+struct min_filter_impl> {
+  template
+  using count_better_mins = sum...>;
+
+  using type = list, nil, nums>...>;
+};
+
+template using min_filter = typename min_filter_impl::type;
+
+
+
+void test_min_filter() {
+  using computed = min_filter>;
+  using expected = list, nil, number<2>>;
+  (void)(computed{} = expected{});// compiles for identical types
+}
+
+int main() {}

base-commit: f0065f207cf19cd960b33d961472c6d69514336f
-- 
2.39.1



Re: [PATCH] PR tree-optimization/107570 - Reset SCEV after folding in VRP.

2023-02-03 Thread Richard Biener via Gcc-patches



> Am 03.02.2023 um 16:54 schrieb Andrew MacLeod :
> 
> 
>> On 2/2/23 07:22, Richard Biener wrote:
>>> On Wed, Feb 1, 2023 at 7:12 PM Andrew MacLeod via Gcc-patches
>>>  wrote:
>>> We can reset SCEV after we fold, then SCEV's cache shouldn't have
>>> anything in it when we go to remove ssa-names in remove_unreachable().
>>> 
>>> We were resetting it later sometimes if we were processing the array
>>> bounds warning, so I removed that call and just always reset it now.
>>> 
>>> Bootstraps on x86_64-pc-linux-gnu. Testing running. Assuming no
>>> regressions,  OK for trunk?
>> +
>> +  // SCEV needs to be reset for array bounds, and we do not wish to trigger
>> +  // any SCEV lookups when removing unreachable globals, so reset it here.
>> +  scev_reset ();
>> 
>> the comment suggests that SCEV queries (aka analyze_scalar_evolution)
>> won't return anything after a scev_reset ().  That's not true - instead what
>> it does is nuke the SCEV cache.  That's necessary when you
>> release SSA names or alter the CFG and you want to avoid followup
>> SCEV queries to pick up stale data.
>> 
>> So if remove_and_update_globals performs SCEV queries and eventually
>> releases SSA names you cannot remove the second call to scev_reset.
>> 
>> But yes, it's probably substitute_and_fold_engine::substitute_and_fold
>> itself that should do a
>> 
>>   if (scev_initialized_p ())
>> scev_reset ();
>> 
>> possibly only in the case it released an SSA name, or removed an
>> edge (but that's maybe premature optimization).
>> 
>> Richard.
>> 
>> 
> Mmm, yeah, that's not what I meant to imply.  We can simply reset it before 
> trying to process the unreachable globals, like so.  Maybe we visit the S&F 
> engine next release... It seems more intrusive than this.
> 
> How about this patch?  testing underway.

Lgtm

Richard 
> Andrew
> 
> <0001-Reset-SCEV-before-removing-unreachable-globals.patch>


[PATCH] range-ops: Handle undefined ranges in frange op[12]_range [PR108647]

2023-02-03 Thread Aldy Hernandez via Gcc-patches
This patch gracefully handles undefined operand ranges for the floating
point op[12]_range operators.  This is very low risk, as we would have
ICEd otherwise.

We don't have a testcase that ICEs for floating point ranges, but it's
only a matter of time.  Besides, this dovetails nicely with the integer
versions Jakub is testing.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

PR tree-optimization/108647
* range-op-float.cc (foperator_lt::op1_range): Handle undefined ranges.
(foperator_lt::op2_range): Same.
(foperator_le::op1_range): Same.
(foperator_le::op2_range): Same.
(foperator_gt::op1_range): Same.
(foperator_gt::op2_range): Same.
(foperator_ge::op1_range): Same.
(foperator_ge::op2_range): Same.
(foperator_unordered_lt::op1_range): Same.
(foperator_unordered_lt::op2_range): Same.
(foperator_unordered_le::op1_range): Same.
(foperator_unordered_le::op2_range): Same.
(foperator_unordered_gt::op1_range): Same.
(foperator_unordered_gt::op2_range): Same.
(foperator_unordered_ge::op1_range): Same.
(foperator_unordered_ge::op2_range): Same.
---
 gcc/range-op-float.cc | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 2db83aeb2fc..ff42b95de4f 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -866,6 +866,8 @@ foperator_lt::op1_range (frange ,
   // The TRUE side of x < NAN is unreachable.
   if (op2.known_isnan ())
r.set_undefined ();
+  else if (op2.undefined_p ())
+   return false;
   else if (build_lt (r, type, op2))
{
  r.clear_nan ();
@@ -901,6 +903,8 @@ foperator_lt::op2_range (frange ,
   // The TRUE side of NAN < x is unreachable.
   if (op1.known_isnan ())
r.set_undefined ();
+  else if (op1.undefined_p ())
+   return false;
   else if (build_gt (r, type, op1))
{
  r.clear_nan ();
@@ -982,6 +986,8 @@ foperator_le::op1_range (frange ,
   // The TRUE side of x <= NAN is unreachable.
   if (op2.known_isnan ())
r.set_undefined ();
+  else if (op2.undefined_p ())
+   return false;
   else if (build_le (r, type, op2))
r.clear_nan ();
   break;
@@ -1013,6 +1019,8 @@ foperator_le::op2_range (frange ,
   // The TRUE side of NAN <= x is unreachable.
   if (op1.known_isnan ())
r.set_undefined ();
+  else if (op1.undefined_p ())
+   return false;
   else if (build_ge (r, type, op1))
r.clear_nan ();
   break;
@@ -1021,6 +1029,8 @@ foperator_le::op2_range (frange ,
   // On the FALSE side of NAN <= x, we know nothing about x.
   if (op1.known_isnan ())
r.set_varying (type);
+  else if (op1.undefined_p ())
+   return false;
   else
build_lt (r, type, op1);
   break;
@@ -1090,6 +1100,8 @@ foperator_gt::op1_range (frange ,
   // The TRUE side of x > NAN is unreachable.
   if (op2.known_isnan ())
r.set_undefined ();
+  else if (op2.undefined_p ())
+   return false;
   else if (build_gt (r, type, op2))
{
  r.clear_nan ();
@@ -1102,6 +1114,8 @@ foperator_gt::op1_range (frange ,
   // On the FALSE side of x > NAN, we know nothing about x.
   if (op2.known_isnan ())
r.set_varying (type);
+  else if (op2.undefined_p ())
+   return false;
   else
build_le (r, type, op2);
   break;
@@ -1125,6 +1139,8 @@ foperator_gt::op2_range (frange ,
   // The TRUE side of NAN > x is unreachable.
   if (op1.known_isnan ())
r.set_undefined ();
+  else if (op1.undefined_p ())
+   return false;
   else if (build_lt (r, type, op1))
{
  r.clear_nan ();
@@ -1137,6 +1153,8 @@ foperator_gt::op2_range (frange ,
   // On The FALSE side of NAN > x, we know nothing about x.
   if (op1.known_isnan ())
r.set_varying (type);
+  else if (op1.undefined_p ())
+   return false;
   else
build_ge (r, type, op1);
   break;
@@ -1206,6 +1224,8 @@ foperator_ge::op1_range (frange ,
   // The TRUE side of x >= NAN is unreachable.
   if (op2.known_isnan ())
r.set_undefined ();
+  else if (op2.undefined_p ())
+   return false;
   else if (build_ge (r, type, op2))
r.clear_nan ();
   break;
@@ -1214,6 +1234,8 @@ foperator_ge::op1_range (frange ,
   // On the FALSE side of x >= NAN, we know nothing about x.
   if (op2.known_isnan ())
r.set_varying (type);
+  else if (op2.undefined_p ())
+   return false;
   else
build_lt (r, type, op2);
   break;
@@ -1236,6 +1258,8 @@ foperator_ge::op2_range (frange , tree type,
   // The TRUE side of NAN >= x is unreachable.
   if (op1.known_isnan ())
r.set_undefined ();
+  else if (op1.undefined_p ())
+   return false;
   

[PATCH v2] c++: wrong error with constexpr array and value-init [PR108158]

2023-02-03 Thread Marek Polacek via Gcc-patches
On Thu, Feb 02, 2023 at 05:29:44PM -0500, Jason Merrill wrote:
> On 1/30/23 21:35, Marek Polacek wrote:
> > In this test case, we find ourselves evaluating 't' which is
> > ((const struct carray *) this)->data_[VIEW_CONVERT_EXPR<int>(index)]
> > in cxx_eval_array_reference.  ctx->object is non-null, a RESULT_DECL, so
> > we replace it with 't':
> > 
> >new_ctx.object = t; // result_decl replaced
> > 
> > and then we go to cxx_eval_constant_expression to evaluate an
> > AGGR_INIT_EXPR, where we end up evaluating an INIT_EXPR (which is in the
> > body of the constructor for seed_or_index):
> > 
> >((struct seed_or_index *) this)->value_ = NON_LVALUE_EXPR <0>
> > 
> > whereupon in cxx_eval_store_expression we go to the probe loop
> > where the 'this' is evaluated to
> > 
> >ze_set.tables_.first_table_.data_[0]
> > 
> > so the 'object' is ze_set, but that isn't in ctx->global->get_value_ptr
> > so we fail with a bogus error.  ze_set is not there because it comes
> > from a different constexpr context (it's not in cv_cache either).
> > 
> > The problem started with r12-2304 where I added the new_ctx.object
> > replacement.  That was to prevent a type mismatch: the type of 't'
> > and ctx.object were different.
> > 
> > It seems clear that we shouldn't have replaced ctx.object here.
> > The cxx_eval_array_reference I mentioned earlier is called from
> > cxx_eval_store_expression:
> >   6257   init = cxx_eval_constant_expression (&new_ctx, init, vc_prvalue,
> >   6258                                        non_constant_p, overflow_p);
> > which already created a new context, whose .object we should be
> > using unless, for instance, INIT contained a.b and we're evaluating
> > the 'a' part, which I think was the case for r12-2304; in that case
> > ctx.object has to be something different.
> > 
> > A relatively safe fix should be to check the types before replacing
> > ctx.object, as in the below.
> 
> Agreed.  I'm trying to understand when the replacement could ever make
> sense, since 't' is not the target, it's the initializer.  The replacement
> comes from Patrick's fix for 98295, but that testcase no longer hits that
> code (likely due to changes in empty class handling).
> 
> If you add a gcc_checking_assert (false) to the replacement, does anything
> trip it?

It would trip in constexpr-101371.C, added in r12-2304.  BUT, and I would
have sworn that it ICEd when I tried, it's not necessary anymore.  So it
looks like we can simply remove the new_ctx.object line.  At least for
trunk, maybe 12 too.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In this test case, we find ourselves evaluating 't' which is
((const struct carray *) this)->data_[VIEW_CONVERT_EXPR<int>(index)]
in cxx_eval_array_reference.  ctx->object is non-null, a RESULT_DECL, so
we replace it with 't':

  new_ctx.object = t; // result_decl replaced

and then we go to cxx_eval_constant_expression to evaluate an
AGGR_INIT_EXPR, where we end up evaluating an INIT_EXPR (which is in the
body of the constructor for seed_or_index):

  ((struct seed_or_index *) this)->value_ = NON_LVALUE_EXPR <0>

whereupon in cxx_eval_store_expression we go to the probe loop
where the 'this' is evaluated to

  ze_set.tables_.first_table_.data_[0]

so the 'object' is ze_set, but that isn't in ctx->global->get_value_ptr
so we fail with a bogus error.  ze_set is not there because it comes
from a different constexpr context (it's not in cv_cache either).

The problem started with r12-2304 where I added the new_ctx.object
replacement.  That was to prevent a type mismatch: the type of 't'
and ctx.object were different.

It seems clear that we shouldn't have replaced ctx.object here.
The cxx_eval_array_reference I mentioned earlier is called from
cxx_eval_store_expression:
 6257   init = cxx_eval_constant_expression (&new_ctx, init, vc_prvalue,
 6258                                        non_constant_p, overflow_p);
which already created a new context, whose .object we should be
using unless, for instance, INIT contained a.b and we're evaluating
the 'a' part, which I think was the case for r12-2304; in that case
ctx.object has to be something different.

It no longer seems necessary to replace new_ctx.object (likely due to
changes in empty class handling).

PR c++/108158

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Don't replace
new_ctx.object.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-108158.C: New test.
---
 gcc/cp/constexpr.cc   |  4 ---
 gcc/testsuite/g++.dg/cpp1y/constexpr-108158.C | 32 +++
 2 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-108158.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 5b31f9c27d1..564766c8a00 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4301,10 +4301,6 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 

[PATCH] LoongArch: Generate bytepick.[wd] for suitable bit operation pattern

2023-02-03 Thread Xi Ruoyao via Gcc-patches
We can use bytepick.[wd] for

a << (8 * x) | b >> (8 * (sizeof(a) - x))

while a and b are uint32_t or uint64_t.  This is useful for some cases,
for example:
https://sourceware.org/pipermail/libc-alpha/2023-February/145203.html
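
As a concrete instance of the pattern (illustration only; with x == 2 on
a 64-bit value, two bytes come from b and six from a, which this patch
turns into a single bytepick.d with immediate 2):

#include <stdint.h>

uint64_t
pick2 (uint64_t a, uint64_t b)
{
  /* a << (8 * 2) | b >> (8 * (8 - 2))  */
  return (a << 16) | (b >> 48);
}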

Bootstrapped and regtested on loongarch64-linux-gnu.
Ok for trunk (now or GCC 14 stage 1)?

gcc/ChangeLog:

* config/loongarch/loongarch.md (bytepick_w_ashift_amount):
New define_int_iterator.
(bytepick_d_ashift_amount): Likewise.
(bytepick_imm): New define_int_attr.
(bytepick_w_lshiftrt_amount): Likewise.
(bytepick_d_lshiftrt_amount): Likewise.
(bytepick_w_): New define_insn template.
(bytepick_w__extend): Likewise.
(bytepick_d_): Likewise.
(bytepick_w): Remove unused define_insn.
(bytepick_d): Likewise.
(UNSPEC_BYTEPICK_W): Remove unused unspec.
(UNSPEC_BYTEPICK_D): Likewise.
* config/loongarch/predicates.md (const_0_to_3_operand):
Remove unused define_predicate.
(const_0_to_7_operand): Likewise.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/bytepick.C: New test.
---
 gcc/config/loongarch/loongarch.md | 60 ++-
 gcc/config/loongarch/predicates.md|  8 ---
 gcc/testsuite/g++.target/loongarch/bytepick.C | 32 ++
 3 files changed, 77 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/bytepick.C

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 0884ec09dfb..3509c3c21c1 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -48,8 +48,6 @@ (define_c_enum "unspec" [
   UNSPEC_EH_RETURN
 
   ;; Bit operation
-  UNSPEC_BYTEPICK_W
-  UNSPEC_BYTEPICK_D
   UNSPEC_BITREV_4B
   UNSPEC_BITREV_8B
 
@@ -544,6 +542,27 @@ (define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
  (UNSPEC_FTINTRM "0")
  (UNSPEC_FTINTRP "0")])
 
+;; Iterator and attributes for bytepick.d
+(define_int_iterator bytepick_w_ashift_amount [8 16 24])
+(define_int_attr bytepick_w_lshiftrt_amount [(8 "24")
+(16 "16")
+(24 "8")])
+(define_int_iterator bytepick_d_ashift_amount [8 16 24 32 40 48 56])
+(define_int_attr bytepick_d_lshiftrt_amount [(8 "56")
+(16 "48")
+(24 "40")
+(32 "32")
+(40 "24")
+(48 "16")
+(56 "8")])
+(define_int_attr bytepick_imm [(8 "1")
+(16 "2")
+(24 "3")
+(32 "4")
+(40 "5")
+(48 "6")
+(56 "7")])
+
 ;;
 ;;  
 ;;
@@ -3364,24 +3383,35 @@ (define_insn "fclass_"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
-(define_insn "bytepick_w"
+(define_insn "bytepick_w_"
   [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:SI 1 "register_operand" "r")
-   (match_operand:SI 2 "register_operand" "r")
-   (match_operand:SI 3 "const_0_to_3_operand" "n")]
-   UNSPEC_BYTEPICK_W))]
+   (ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r")
+ (const_int ))
+   (ashift (match_operand:SI 2 "register_operand" "r")
+   (const_int bytepick_w_ashift_amount]
   ""
-  "bytepick.w\t%0,%1,%2,%z3"
+  "bytepick.w\t%0,%1,%2,"
   [(set_attr "mode" "SI")])
 
-(define_insn "bytepick_d"
+(define_insn "bytepick_w__extend"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "r")
-   (match_operand:DI 2 "register_operand" "r")
-   (match_operand:DI 3 "const_0_to_7_operand" "n")]
-   UNSPEC_BYTEPICK_D))]
-  ""
-  "bytepick.d\t%0,%1,%2,%z3"
+   (sign_extend:DI
+ (ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r")
+   (const_int ))
+ (ashift (match_operand:SI 2 "register_operand" "r")
+ (const_int bytepick_w_ashift_amount)]
+  "TARGET_64BIT"
+  "bytepick.w\t%0,%1,%2,"
+  [(set_attr "mode" "SI")])
+
+(define_insn "bytepick_d_"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (ior:DI (lshiftrt (match_operand:DI 1 "register_operand" "r")
+ (const_int ))
+   (ashift (match_operand:DI 2 "register_operand" "r")
+   (const_int bytepick_d_ashift_amount]
+  "TARGET_64BIT"
+  "bytepick.d\t%0,%1,%2,"
   

Re: [PATCH] ipa: Avoid invalid gimple when IPA-CP and IPA-SRA disagree on types (108384)

2023-02-03 Thread Bernhard Reutner-Fischer via Gcc-patches
On 3 February 2023 12:35:32 CET, Richard Biener via Gcc-patches wrote:
>
>I think it's OK as-is given this explanation.
>

s/derefernce/dereference/

thanks,


Re: [PATCH] [PR tree-optimization/18639] Compare nonzero bits in irange with widest_int.

2023-02-03 Thread Andrew MacLeod via Gcc-patches



On 2/3/23 04:16, Jakub Jelinek wrote:

On Fri, Feb 03, 2023 at 09:50:43AM +0100, Aldy Hernandez wrote:

[PR tree-optimization/18639] Compare nonzero bits in irange with widest_int.

0 missing in the bug number in the subject line, though the current
recommended formatting of the subject is I think:
value-range: Compare nonzero bits in irange with widest_int [PR108639]
   
 PR 108639/tree-optimization


Reversed component and number


--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1259,7 +1259,10 @@ irange::legacy_equal_p (const irange ) const
   other.tree_lower_bound (0))
  && vrp_operand_equal_p (tree_upper_bound (0),
  other.tree_upper_bound (0))
- && get_nonzero_bits () == other.get_nonzero_bits ());
+ && (widest_int::from (get_nonzero_bits (),
+   TYPE_SIGN (type ()))
+ == widest_int::from (other.get_nonzero_bits (),
+  TYPE_SIGN (other.type ();
  }
  
  bool

@@ -1294,7 +1297,11 @@ irange::operator== (const irange ) const
  || !operand_equal_p (ub, ub_other, 0))
return false;
  }
-  return get_nonzero_bits () == other.get_nonzero_bits ();
+  widest_int nz1 = widest_int::from (get_nonzero_bits (),
+TYPE_SIGN (type ()));
+  widest_int nz2 = widest_int::from (other.get_nonzero_bits (),
+TYPE_SIGN (other.type ()));
+  return nz1 == nz2;
  }

While the above avoids the ICE (and would certainly be correct for
the bounds, which are sign- or zero-extended to widest int depending
on the sign of their type), is the above what we want for non-zero bits
to be considered equal?  The wide_ints (which ought to have the precision
of the corresponding type) don't represent normal numbers but bitmasks
(0 - this bit is known to be zero, 1 - nothing is known about this bit).
So, if there are different precisions and the narrower value has 0
in the MSB of the bitmask (so MSB is known to be zero), the above requires
for equality that in the other range all upper bits are known to be zero
too for both signed and unsigned.  That is ok.  Similarly, for MSB set:
if TYPE_SIGN of the narrower is unsigned, the MSB value is unknown, but we
require the wider to have all the upper bits cleared.  But for a signed
narrower type with MSB set, i.e. it is unknown if it is positive or
negative, the above requires that all the upper bits are unknown too.
And that is the case I'm not sure about, whether in that case the
upper bits of the wider wide_int should be checked at all.
Though, perhaps from the POV of nonzero bits derived from the sign-extended
values in the ranges, sign bit copies (so all upper bits 1) is what one would
get, so maybe it is ok.  Just food for thought.
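
A small sketch of the extension behaviour in question (illustrative,
using the wide-int API; not part of the patch):

  /* 8-bit nonzero-bits mask with the MSB set.  */
  wide_int mask = wi::uhwi (0x80, 8);

  /* Signed narrower type: sign extension makes all upper bits 1,
     i.e. "nothing known" about every upper bit.  */
  widest_int nz_s = widest_int::from (mask, SIGNED);

  /* Unsigned narrower type: zero extension makes all upper bits 0,
     i.e. "known zero", so equality then demands the wider mask have
     its upper bits clear as well.  */
  widest_int nz_u = widest_int::from (mask, UNSIGNED);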

If the bits match exactly along with everything else, then we can be 
sure the ranges are truly equal.  If for some reason the numbers are all 
the same but the non-zero bits don't compare equal, then I can't think 
of what harm it could cause to compare unequal.  Worst case is we don't 
perform some optimization in this extremely rare scenario of differing 
precisions.  And in fact they could actually be unequal...


So I suspect this is fine...

Andrew




Re: [PATCH 2/2] Documentation Update.

2023-02-03 Thread Jeff Law via Gcc-patches




On 2/2/23 10:05, Kees Cook via Gcc-patches wrote:



Right -- this can lead (at least) to type confusion and other problems
too. We've been trying to remove all of these overlaps in the Linux
kernel. I mention it the "Overlapping composite structure members"
section at https://people.kernel.org/kees/bounded-flexible-arrays-in-c
Good.  We found several of these when Martin S was doing his work on the 
diagnostics a few years back.  It wasn't hard to see what the intent 
was, but it struck me as a poor (ab)use of the feature, and I wouldn't 
be surprised if compilers differ in their semantics around this stuff.
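
The sort of layout in question looks like this made-up example (GCC has
historically accepted it as an extension):

struct inner
{
  int count;
  int data[];        /* flexible array member */
};

struct outer
{
  struct inner hdr;  /* FAM struct embedded mid-structure */
  int tail[16];      /* hdr.data[0] and tail[0] share storage */
};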


jeff


Re: [PATCH] PR tree-optimization/107570 - Reset SCEV after folding in VRP.

2023-02-03 Thread Andrew MacLeod via Gcc-patches


On 2/2/23 07:22, Richard Biener wrote:

On Wed, Feb 1, 2023 at 7:12 PM Andrew MacLeod via Gcc-patches
 wrote:

We can reset SCEV after we fold, then SCEV's cache shouldn't have
anything in it when we go to remove ssa-names in remove_unreachable().

We were resetting it later sometimes if we were processing the array
bounds warning, so I removed that call and just always reset it now.

Bootstraps on x86_64-pc-linux-gnu. Testing running. Assuming no
regressions,  OK for trunk?

+
+  // SCEV needs to be reset for array bounds, and we do not wish to trigger
+  // any SCEV lookups when removing unreachable globals, so reset it here.
+  scev_reset ();

the comment suggests that SCEV queries (aka analyze_scalar_evolution)
won't return anything after a scev_reset ().  That's not true - instead what
it does is nuke the SCEV cache.  That's necessary when you
release SSA names or alter the CFG and you want to avoid followup
SCEV queries to pick up stale data.

So if remove_and_update_globals performs SCEV queries and eventually
releases SSA names you cannot remove the second call to scev_reset.

But yes, it's probably substitute_and_fold_engine::substitute_and_fold
itself that should do a

   if (scev_initialized_p ())
 scev_reset ();

possibly only in the case it released an SSA name, or removed an
edge (but that's maybe premature optimization).

Richard.


Mmm, yeah, that's not what I meant to imply.  We can simply reset it 
before trying to process the unreachable globals, like so.  Maybe we 
visit the S&F engine next release... It seems more intrusive than this.


How about this patch?  testing underway.

Andrew

From e58ebd111e7b3b8047fb93396c204bd703926802 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 1 Feb 2023 11:46:18 -0500
Subject: [PATCH] Reset SCEV before removing unreachable globals.

SCEV should be reset in VRP before trying to remove unreachable globals
to avoid triggering issues with its cache.

	PR tree-optimization/107570
	gcc/
	* tree-vrp.cc (remove_and_update_globals): Reset SCEV.

	gcc/testsuite/
	* gcc.dg/pr107570.c: New.
---
 gcc/testsuite/gcc.dg/pr107570.c | 25 +
 gcc/tree-vrp.cc |  4 
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr107570.c

diff --git a/gcc/testsuite/gcc.dg/pr107570.c b/gcc/testsuite/gcc.dg/pr107570.c
new file mode 100644
index 000..ba5b535a867
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107570.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+long int n;
+
+void
+foo (int *p, int x)
+{
+  for (;;)
+{
+  for (*p = 0; *p < 1; ++*p)
+{
+  n += *p < 0;
+  if (n < x)
+{
+  while (x < 1)
+++x;
+
+  __builtin_unreachable ();
+}
+}
+
  p = &x;
+}
+}
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 3c431760a16..95547e5419b 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -121,6 +121,10 @@ remove_unreachable::remove_and_update_globals (bool final_p)
   if (m_list.length () == 0)
 return false;
 
+  // Ensure the cache in SCEV has been cleared before processing
+  // globals to be removed.
+  scev_reset ();
+
   bool change = false;
   tree name;
   unsigned i;
-- 
2.39.0



Re: [PATCH] Improve RTL CSE hash table hash usage

2023-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
>> Am 03.02.2023 um 15:20 schrieb Richard Sandiford via Gcc-patches 
>> :
>> 
>> Richard Biener via Gcc-patches  writes:
>>> The RTL CSE hash table has a fixed number of buckets (32) each
>>> with a linked list of entries with the same hash value.  The
>>> actual hash values are computed using hash_rtx which uses adds
>>> for mixing and adds the rtx CODE as CODE << 7 (apart from some
>>> exceptions such as MEM).  The unsigned int typed hash value
>>> is then simply truncated for the actual lookup into the fixed
>>> size table which means that usually CODE is simply lost.
>>> 
>>> The following improves this truncation by first mixing in more
>>> bits using xor.  It does not change the actual hash function
>>> since that's used outside of CSE as well.
>>> 
>>> An alternative would be to bump the fixed number of buckets,
>>> say to 256 which would retain the LSB of CODE or to 8192 which
>>> can capture all 6 bits required for the last CODE.
>>> 
>>> As the comment in CSE says, there's invalidate_memory and
>>> flush_hash_table done possibly frequently and those at least
>>> need to walk all slots, so when the hash table is mostly empty
>>> enlarging it will be a loss.  Still there should be more
>>> regular lookups by hash, so less collisions should pay off
>>> as well.
>> 
>> Going purely from this description and without having looked
>> at the code properly, would it be possible to link all current
>> values together, not just those with the same hash?  And would
>> that help?  It looks like the list is already doubly-linked,
>> and there's spare room to store a "start of new hash" marker.
>
> We already do have equivalent values linked, but I’m not sure that’s what you 
> are suggesting.

I was thinking of linking every active value in the table together,
but with entries for the same hash being consecutive.  That way, things
like invalidate_memory can just walk the list and ignore the hash table.
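
Schematically, something like this (invented names; the real table_elt
in cse.cc differs):

/* One global doubly-linked list of all live entries, kept grouped so
   that entries with the same hash are consecutive.  */
struct elt
{
  struct elt *next;   /* next live entry, any hash */
  struct elt *prev;
  unsigned hash;
  int first_of_hash;  /* "start of new hash" marker */
};

/* invalidate_memory-style walks can then ignore the bucket array.  */
static void
walk_all (struct elt *head, void (*fn) (struct elt *))
{
  for (struct elt *e = head; e; e = e->next)
    fn (e);
}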

> Those should also have the same hash value, so both lists are somewhat 
> redundant and we might be able to save some storage here by making this a 
> list of lists of same hash and value list?

I thought the value-equality list was to establish that (e.g.)
(reg R1) and (reg R2) are known to have the same value, despite
being different expressions with different hash values.

But I suppose if we reused an existing hash table structure (with its
own mechanism for handling collisions), it would make sense to use the
equivalent-value list to join everything together, rather than the
same-hash list.  Again, there could be a marker to establish the start
of a new equivalent-value sublist.

Thanks,
Richard
>
>> 
>> Thanks,
>> Richard
>> 
>>> Without enlarging the table a better hash function is unlikely
>>> going to make a big difference, simple statistics on the
>>> number of collisions at insertion time shows a reduction of
>>> around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
>>> at the expense of reducing the average table fill by 10%
>>> (all of this stats from looking just at fold-const.i at -O2).
>>> Increasing HASH_SHIFT more leaves the table even more sparse
>>> likely showing that hash_rtx uses add for mixing which is
>>> quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
>>> collisions.
>>> 
>>> Experimenting with using inchash instead of adds for the
>>> mixing does not improve things when looking at the HASH_SHIFT
>>> bumped by 2 numbers.
>>> 
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>>> 
>>> Any opinions?
>>> 
>>>* cse.cc (HASH): Turn into inline function and mix
>>>in another HASH_SHIFT bits.
>>>(SAFE_HASH): Likewise.
>>> ---
>>> gcc/cse.cc | 37 +++--
>>> 1 file changed, 23 insertions(+), 14 deletions(-)
>>> 
>>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>>> index 37afc88b439..4777e559b86 100644
>>> --- a/gcc/cse.cc
>>> +++ b/gcc/cse.cc
>>> @@ -420,20 +420,6 @@ struct table_elt
>>> #define HASH_SIZE(1 << HASH_SHIFT)
>>> #define HASH_MASK(HASH_SIZE - 1)
>>> 
>>> -/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
>>> -   register (hard registers may require `do_not_record' to be set).  */
>>> -
>>> -#define HASH(X, M)\
>>> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER\
>>> -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))\
>>> -  : canon_hash (X, M)) & HASH_MASK)
>>> -
>>> -/* Like HASH, but without side-effects.  */
>>> -#define SAFE_HASH(X, M)\
>>> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER\
>>> -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))\
>>> -  : safe_hash (X, M)) & HASH_MASK)
>>> -
>>> /* Determine whether register number N is considered a fixed register for 
>>> the
>>>purpose of approximating register costs.
>>>It is desirable to replace other regs with fixed regs, to reduce need for
>>> @@ -586,6 +572,29 @@ static machine_mode cse_cc_succs (basic_block, 
>>> basic_block, rtx, 

Re: [PATCH] Improve RTL CSE hash table hash usage

2023-02-03 Thread Richard Biener via Gcc-patches



> Am 03.02.2023 um 15:20 schrieb Richard Sandiford via Gcc-patches 
> :
> 
> Richard Biener via Gcc-patches  writes:
>> The RTL CSE hash table has a fixed number of buckets (32) each
>> with a linked list of entries with the same hash value.  The
>> actual hash values are computed using hash_rtx which uses adds
>> for mixing and adds the rtx CODE as CODE << 7 (apart from some
>> exceptions such as MEM).  The unsigned int typed hash value
>> is then simply truncated for the actual lookup into the fixed
>> size table which means that usually CODE is simply lost.
>> 
>> The following improves this truncation by first mixing in more
>> bits using xor.  It does not change the actual hash function
>> since that's used outside of CSE as well.
>> 
>> An alternative would be to bump the fixed number of buckets,
>> say to 256 which would retain the LSB of CODE or to 8192 which
>> can capture all 6 bits required for the last CODE.
>> 
>> As the comment in CSE says, there's invalidate_memory and
>> flush_hash_table done possibly frequently and those at least
>> need to walk all slots, so when the hash table is mostly empty
>> enlarging it will be a loss.  Still there should be more
>> regular lookups by hash, so less collisions should pay off
>> as well.
> 
> Going purely from this description and without having looked
> at the code properly, would it be possible to link all current
> values together, not just those with the same hash?  And would
> that help?  It looks like the list is already doubly-linked,
> and there's spare room to store a "start of new hash" marker.

We already do have equivalent values linked, but I’m not sure that’s what you 
are suggesting.  Those should also have the same hash value, so both lists are 
somewhat redundant and we might be able to save some storage here by making 
this a list of lists of same hash and value list?

> 
> Thanks,
> Richard
> 
>> Without enlarging the table a better hash function is unlikely
>> going to make a big difference, simple statistics on the
>> number of collisions at insertion time shows a reduction of
>> around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
>> at the expense of reducing the average table fill by 10%
>> (all of this stats from looking just at fold-const.i at -O2).
>> Increasing HASH_SHIFT more leaves the table even more sparse
>> likely showing that hash_rtx uses add for mixing which is
>> quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
>> collisions.
>> 
>> Experimenting with using inchash instead of adds for the
>> mixing does not improve things when looking at the HASH_SHIFT
>> bumped by 2 numbers.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> Any opinions?
>> 
>>* cse.cc (HASH): Turn into inline function and mix
>>in another HASH_SHIFT bits.
>>(SAFE_HASH): Likewise.
>> ---
>> gcc/cse.cc | 37 +++--
>> 1 file changed, 23 insertions(+), 14 deletions(-)
>> 
>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>> index 37afc88b439..4777e559b86 100644
>> --- a/gcc/cse.cc
>> +++ b/gcc/cse.cc
>> @@ -420,20 +420,6 @@ struct table_elt
>> #define HASH_SIZE(1 << HASH_SHIFT)
>> #define HASH_MASK(HASH_SIZE - 1)
>> 
>> -/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
>> -   register (hard registers may require `do_not_record' to be set).  */
>> -
>> -#define HASH(X, M)\
>> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER\
>> -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))\
>> -  : canon_hash (X, M)) & HASH_MASK)
>> -
>> -/* Like HASH, but without side-effects.  */
>> -#define SAFE_HASH(X, M)\
>> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER\
>> -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))\
>> -  : safe_hash (X, M)) & HASH_MASK)
>> -
>> /* Determine whether register number N is considered a fixed register for the
>>purpose of approximating register costs.
>>It is desirable to replace other regs with fixed regs, to reduce need for
>> @@ -586,6 +572,29 @@ static machine_mode cse_cc_succs (basic_block, 
>> basic_block, rtx, rtx,
>> 
>> static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
>> 
>> +/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
>> +   register (hard registers may require `do_not_record' to be set).  */
>> +
>> +static inline unsigned
>> +HASH (rtx x, machine_mode mode)
>> +{
>> +  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
>> +? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
>> +: canon_hash (x, mode));
>> +  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
>> +}
>> +
>> +/* Like HASH, but without side-effects.  */
>> +
>> +static inline unsigned
>> +SAFE_HASH (rtx x, machine_mode mode)
>> +{
>> +  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
>> +? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
>> +: safe_hash (x, mode));
>> +  return (h ^ 

Re: [PATCH] arm: [MVE] Add missing length=8 attribute

2023-02-03 Thread Richard Earnshaw via Gcc-patches




On 01/02/2023 09:46, Christophe Lyon via Gcc-patches wrote:

I have noticed that the "length" "8" attribute is missing in a few
patterns in mve.md.

gcc/
* config/arm/mve.md (mve_vabavq_p_): Add length
attribute.
(mve_vqshluq_m_n_s): Likewise.
(mve_vshlq_m_): Likewise.
(mve_vsriq_m_n_): Likewise.
(mve_vsubq_m_): Likewise.
---


OK

R.


Re: [PATCH] arm: Fix warning in libgcc/config/arm/pr-support.c

2023-02-03 Thread Richard Earnshaw via Gcc-patches




On 01/02/2023 09:46, Christophe Lyon via Gcc-patches wrote:

I have noticed some warnings when building GCC for arm-eabi:
pr-support.c:110:7: warning: variable ‘set_pac_sp’ set but not used 
[-Wunused-but-set-variable]
pr-support.c:109:7: warning: variable ‘set_pac’ set but not used 
[-Wunused-but-set-variable]

This small patch avoids them by defining these two variables under
TARGET_HAVE_PACBTI, like the code which actually uses them.
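
So the definitions end up shaped like the sketch below (the exact types
used in pr-support.c may differ):

#if defined(TARGET_HAVE_PACBTI)
  /* Only defined when the code that sets and uses them is compiled.  */
  _uw set_pac = 0;
  _uw set_pac_sp = 0;
#endif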

libgcc/
* config/arm/pr-support.c (__gnu_unwind_execute): Use
TARGET_HAVE_PACBTI to define set_pac and set_pac_sp.


OK

R.


Re: [PATCH] MIPS: use arch_32/64 instead of default_mips_arch

2023-02-03 Thread YunQiang Su
Xi Ruoyao via Gcc-patches wrote on Fri, Feb 3, 2023 at 22:35:
>
> On Fri, 2023-02-03 at 14:08 +, Richard Sandiford via Gcc-patches
> wrote:
> > > Do you mean that the "wrong" format is quite interesting?
> > > Yes, though this format is never used at all.
> >
> > My point was that there is nothing wrong in principle with creating
> > an o32 executable that has a 64-bit rather than a 32-bit ISA (since
> > the
> > 64-bit ISAs are pure extensions of 32-bit ISAs).  Doing that is even
> > useful in some cases.  For example, MIPS4+O32 is a useful combination,
> > even though MIPS4 is a 64-bit ISA.  Same for Octeon3+O32, etc.
> >
> > So is the linker behaviour really correct?  Doesn't it mean that
> > Octeon3 O32 binaries are link-incompatible with MIPS32 o32 binaries?
>
> On gcc230:
>
> xry111@gcc230:~$ cat a.c
> int a() { return 42; }
> xry111@gcc230:~$ cat b.c
> extern int a(void);
> int main() { return a() ^ 42; }
> xry111@gcc230:~$ cc a.c -mabi=32 -march=mips32r2 -c
> xry111@gcc230:~$ cc b.c -mabi=32 -march=mips64r2 -c
> xry111@gcc230:~$ cc a.o b.o
> xry111@gcc230:~$ ./a.out && echo ok
> ok
> xry111@gcc230:~$
> xry111@gcc230:~$ ld -version
> GNU ld (GNU Binutils for Debian) 2.31.1
> Copyright (C) 2018 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public License version 3 or (at your option) a later version.
> This program has absolutely no warranty.
>
> So I'd consider the issue a GNU ld regression if it suddenly stops to
> behave like this.

Sorry, I made a mistake: r2 works well while r6 does not.
root@bookworm-mips64r6el:~# gcc -mabi=32 -mips64r6 -O3 -c yy.c
root@bookworm-mips64r6el:~# gcc -mabi=32 -mips32r6 -O3 -c xx.c
root@bookworm-mips64r6el:~# gcc -shared -o xx.so yy.o xx.o
/usr/bin/ld: yy.o: ABI is incompatible with that of the selected emulation
/usr/bin/ld: failed to merge target specific data of file yy.o
/usr/bin/ld: xx.o: ABI is incompatible with that of the selected emulation
/usr/bin/ld: failed to merge target specific data of file xx.o
collect2: error: ld returned 1 exit status

So, it seems that I need to find what happens in gnu ld for r6.

> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-02-03 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Fri, 3 Feb 2023 at 07:10, Prathamesh Kulkarni
>  wrote:
>>
>> On Thu, 2 Feb 2023 at 20:50, Richard Sandiford
>>  wrote:
>> >
>> > Prathamesh Kulkarni  writes:
>> > >> >> > I have attached a patch that extends the transform if one half is 
>> > >> >> > dup
>> > >> >> > and other is set of constants.
>> > >> >> > For eg:
>> > >> >> > int8x16_t f(int8_t x)
>> > >> >> > {
>> > >> >> >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4, x, 5, x, 6, x, 7, 
>> > >> >> > x, 8 };
>> > >> >> > }
>> > >> >> >
>> > >> >> > code-gen trunk:
>> > >> >> > f:
>> > >> >> > adrpx1, .LC0
>> > >> >> > ldr q0, [x1, #:lo12:.LC0]
>> > >> >> > ins v0.b[0], w0
>> > >> >> > ins v0.b[2], w0
>> > >> >> > ins v0.b[4], w0
>> > >> >> > ins v0.b[6], w0
>> > >> >> > ins v0.b[8], w0
>> > >> >> > ins v0.b[10], w0
>> > >> >> > ins v0.b[12], w0
>> > >> >> > ins v0.b[14], w0
>> > >> >> > ret
>> > >> >> >
>> > >> >> > code-gen with patch:
>> > >> >> > f:
>> > >> >> > dup v0.16b, w0
>> > >> >> > adrpx0, .LC0
>> > >> >> > ldr q1, [x0, #:lo12:.LC0]
>> > >> >> > zip1v0.16b, v0.16b, v1.16b
>> > >> >> > ret
>> > >> >> >
>> > >> >> > Bootstrapped+tested on aarch64-linux-gnu.
>> > >> >> > Does it look OK ?
>> > >> >>
>> > >> >> Looks like a nice improvement.  It'll need to wait for GCC 14 now 
>> > >> >> though.
>> > >> >>
>> > >> >> However, rather than handle this case specially, I think we should 
>> > >> >> instead
>> > >> >> take a divide-and-conquer approach: split the initialiser into even 
>> > >> >> and
>> > >> >> odd elements, find the best way of loading each part, then compare 
>> > >> >> the
>> > >> >> cost of these sequences + ZIP with the cost of the fallback code 
>> > >> >> (the code
>> > >> >> later in aarch64_expand_vector_init).
>> > >> >>
>> > >> >> For example, doing that would allow:
>> > >> >>
>> > >> >>   { x, y, 0, y, 0, y, 0, y, 0, y }
>> > >> >>
>> > >> >> to be loaded more easily, even though the even elements aren't wholly
>> > >> >> constant.
>> > >> > Hi Richard,
>> > >> > I have attached a prototype patch based on the above approach.
>> > >> > It subsumes specializing for above {x, y, x, y, x, y, x, y} case by 
>> > >> > generating
>> > >> > same sequence, thus I removed that hunk, and improves the following 
>> > >> > cases:
>> > >> >
>> > >> > (a)
>> > >> > int8x16_t f_s16(int8_t x)
>> > >> > {
>> > >> >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4,
>> > >> >  x, 5, x, 6, x, 7, x, 8 };
>> > >> > }
>> > >> >
>> > >> > code-gen trunk:
>> > >> > f_s16:
>> > >> > adrpx1, .LC0
>> > >> > ldr q0, [x1, #:lo12:.LC0]
>> > >> > ins v0.b[0], w0
>> > >> > ins v0.b[2], w0
>> > >> > ins v0.b[4], w0
>> > >> > ins v0.b[6], w0
>> > >> > ins v0.b[8], w0
>> > >> > ins v0.b[10], w0
>> > >> > ins v0.b[12], w0
>> > >> > ins v0.b[14], w0
>> > >> > ret
>> > >> >
>> > >> > code-gen with patch:
>> > >> > f_s16:
>> > >> > dup v0.16b, w0
>> > >> > adrpx0, .LC0
>> > >> > ldr q1, [x0, #:lo12:.LC0]
>> > >> > zip1v0.16b, v0.16b, v1.16b
>> > >> > ret
>> > >> >
>> > >> > (b)
>> > >> > int8x16_t f_s16(int8_t x, int8_t y)
>> > >> > {
>> > >> >   return (int8x16_t) { x, y, 1, y, 2, y, 3, y,
>> > >> > 4, y, 5, y, 6, y, 7, y };
>> > >> > }
>> > >> >
>> > >> > code-gen trunk:
>> > >> > f_s16:
>> > >> > adrpx2, .LC0
>> > >> > ldr q0, [x2, #:lo12:.LC0]
>> > >> > ins v0.b[0], w0
>> > >> > ins v0.b[1], w1
>> > >> > ins v0.b[3], w1
>> > >> > ins v0.b[5], w1
>> > >> > ins v0.b[7], w1
>> > >> > ins v0.b[9], w1
>> > >> > ins v0.b[11], w1
>> > >> > ins v0.b[13], w1
>> > >> > ins v0.b[15], w1
>> > >> > ret
>> > >> >
>> > >> > code-gen patch:
>> > >> > f_s16:
>> > >> > adrpx2, .LC0
>> > >> > dup v1.16b, w1
>> > >> > ldr q0, [x2, #:lo12:.LC0]
>> > >> > ins v0.b[0], w0
>> > >> > zip1v0.16b, v0.16b, v1.16b
>> > >> > ret
>> > >>
>> > >> Nice.
>> > >>
>> > >> > There are a couple of issues I have come across:
>> > >> > (1) Choosing element to pad vector.
>> > >> > For eg, if we are initializing a vector say { x, y, 0, y, 1, y, 2, y 
>> > >> > }
>> > >> > with mode V8HI.
>> > >> > We split it into { x, 0, 1, 2 } and { y, y, y, y}
>> > >> > However since the mode is V8HI, we would need to pad the above split 
>> > >> > vectors
>> > >> > with 4 more elements to match up to vector length.
>> > >> > For {x, 0, 1, 2} using any constant is the obvious choice while for 
>> > >> > {y, y, y, y}
>> > >> > using 'y' is the 

Re: [PATCH v5 0/5] P1689R5 support

2023-02-03 Thread Ben Boeckel via Gcc-patches
On Fri, Feb 03, 2023 at 09:10:21 +0000, Jonathan Wakely wrote:
> On Fri, 3 Feb 2023 at 08:58, Jonathan Wakely wrote:
> > On Fri, 3 Feb 2023, 04:09 Andrew Pinski via Gcc,  wrote:
> >> On Wed, Jan 25, 2023 at 1:07 PM Ben Boeckel via Fortran
> >>  wrote:
> >> > This patch series adds initial support for ISO C++'s [P1689R5][], a
> >> > format for describing C++ module requirements and provisions based on
> >> > the source code. This is required because compiling C++ with modules is
> >> > not embarrassingly parallel and needs to be ordered to ensure that
> >> > `import some_module;` can be satisfied in time by making sure that any
> >> > TU with `export import some_module;` is compiled first.
> >>
> >> I like how folks are complaining that GCC outputs POSIX makefile
> >> syntax in GCC's dependency files, which are supposed to be in POSIX
> >> makefile syntax.
> >> It seems rather that the build tools people like to use no longer
> >> understand POSIX makefile syntax.
> >> Also I am not a fan of JSON; it is too verbose for no benefit. Maybe it
> >> is time to go back to standardizing a new POSIX makefile syntax rather
> >> than changing C++ here.

I'm not complaining that dependency files are in POSIX (or even
POSIX-to-be) syntax. The information requires a bit more structure than
some variable assignments, and I don't expect anything that reads them
to start interpreting `VAR_$(DEREF)=` and the behaviors of `:=` versus
`=` assignment in order to extract this reliably.

> > That would take a decade or more. It's too late for POSIX 202x and
> > the pace that POSIX agrees on makefile features is incredibly slow.
> 
> Also, name+=value is *not* POSIX make syntax today, that's an
> extension. That's why the tools don't always support it.
> So I don't think it's true that GCC's dependency files are in POSIX syntax.
> 
> POSIX 202x does add support for it, but it will take some time for it
> to be supported everywhere.

Additionally, while the *syntax* might be supported, encoding all of
P1689 in it would require additional work (e.g., key/value variable
assignments or something). Batch scanning would also be…interesting.
Also note that the imported modules' location cannot be known before
scanning in general, so all you get are "logical names" that you need a
collator to link up with other scan results anyways. Tools such as
`make` and `ninja` cannot know, in general, how to do this linking
between arbitrary targets (e.g., there may be a debug and release build
of the same module in the graph and knowing which to use requires
higher-level info about the entire build graph; modules may also be
considered "private" and not accessible everywhere and therefore should
also not be hooked up across different target boundaries).
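
(For concreteness, a heavily abbreviated sketch of the information a
P1689 file carries; the field names follow P1689R5, the values are
made up:)

{
  "version": 1,
  "revision": 0,
  "rules": [
    {
      "primary-output": "foo.o",
      "provides": [ { "logical-name": "some_module", "is-interface": true } ],
      "requires": [ { "logical-name": "other_module" } ]
    }
  ]
}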

While the `CXX_MODULES +=` approach can work for simple cases (a
pseudo-implicit build), it is quite insufficient for the general case.

--Ben


Re: [PATCH 2/2] Documentation Update.

2023-02-03 Thread Qing Zhao via Gcc-patches



> On Feb 2, 2023, at 11:25 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-02-02 03:33, Richard Biener wrote:
>> looking at PR77650 what seems missing there is the semantics of this
>> extension as expected/required by the glibc use.  comment#5 seems
>> to suggest that for my example above its expected that
>> Y.x.data[0] aliases Y.end?!  There must be a better way to write
>> the glibc code and IMHO it would be best to deprecate this extension.
>> Definitely the middle-end wouldn't consider this aliasing for
>> my example - maybe it "works" when wrapped inside a union but
>> then for sure only when the union is visible in all accesses ...
>> typedef union
>> {
>>   struct __gconv_info __cd;
>>   struct
>>   {
>> struct __gconv_info __cd;
>> struct __gconv_step_data __data;
>>   } __combined;
>> } _G_iconv_t;
>> could be written as
>> typedef union
>> {
>>   struct __gconv_info __cd;
>>   char __dummy[sizeof(struct __gconv_info) + sizeof(struct
>> __gconv_step_data)];
>> } _G_iconv_t;
>> in case the intent is to provide a complete type with space for
>> a single __gconv_step_data.
> 
> I dug into this on the glibc end and it looks like this commit:
> 
> commit 63fb8f9aa9d19f85599afe4b849b567aefd70a36
> Author: Zack Weinberg 
> Date:   Mon Feb 5 14:13:41 2018 -0500
> 
>Post-cleanup 2: minimize _G_config.h.
> 
> ripped all of that gunk out.  AFAICT there's no use of struct __gconv_info 
> anywhere else in the code.
> 
> I reckon it is safe to say now that glibc no longer needs this misfeature.

Thanks a lot for the info.

Looks like it’s good time to start deprecating this misfeature from GCC.

Qing

> 
> Sid



[wwwdocs] document modula-2 in gcc-13/changes.html (and index.html)

2023-02-03 Thread Gaius Mulley via Gcc-patches


Hello,

The following patch provides a summary of the modula-2 front end
and also contains links to the online modula-2 documentation in
index.html.

[I'm just about to git push fixes so that modula-2 builds html, info and
 pdf documentation into the standard directories.]

regards,
Gaius




diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 9ecd115c..fa13369f 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -283,6 +283,18 @@ a work-in-progress.
 
 
 
+Modula-2
+
+  Support for the language Modula-2 has been added.  The dialects
+  supported are PIM2, PIM3, PIM4 and ISO/IEC 10514-1.  Also included
+  are a complete set of ISO/IEC 10514-1 libraries and PIM
+libraries.
+  <a href="https://gcc.gnu.org/onlinedocs/m2/Compiler-options.html">
+  Compiler options</a>.
+  The <* noreturn *> attribute is supported.
+  Linking has been redesigned.
+
+
 
 
 
diff --git a/htdocs/onlinedocs/index.html b/htdocs/onlinedocs/index.html
index 343ff9f5..27a8a505 100644
--- a/htdocs/onlinedocs/index.html
+++ b/htdocs/onlinedocs/index.html
@@ -1647,6 +1647,12 @@ existing release.
href="https://gcc.gnu.org/onlinedocs/gdc.ps.gz;>PostScript or https://gcc.gnu.org/onlinedocs/gdc-html.tar.gz;>an
HTML tarball)
+<li><a href="https://gcc.gnu.org/onlinedocs/m2/">GNU M2 Manual</a> (<a
+   href="https://gcc.gnu.org/onlinedocs/m2.pdf">also in PDF</a> or <a
+   href="https://gcc.gnu.org/onlinedocs/m2.ps.gz">PostScript</a> or <a
+   href="https://gcc.gnu.org/onlinedocs/m2-html.tar.gz">an
+   HTML tarball</a>)</li>
<li><a href="https://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
Multi Processing Runtime Library Manual</a> (<a
href="https://gcc.gnu.org/onlinedocs/libgomp.pdf">also in


Re: [PATCH 1/2] libstdc++: Normalise _GLIBCXX20_DEPRECATED macro

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Wed, 28 Dec 2022 at 14:28, Nathaniel Shead via Libstdc++
 wrote:
>
> These two patches implement P1413 (deprecate std::aligned_storage and
> std::aligned_union) for C++23. Tested on x86_64-linux.
>
> -- >8 --
>
> Updates _GLIBCXX20_DEPRECATED to be defined and behave the same as the
> versions for other standards (e.g. _GLIBCXX17_DEPRECATED).
>
> libstdc++-v3/ChangeLog:
>
> * doc/doxygen/user.cfg.in (PREDEFINED): Update macros.
> * include/bits/c++config (_GLIBCXX20_DEPRECATED): Make
> consistent with other 'deprecated' macros.
> * include/std/type_traits (is_pod, is_pod_v): Use
> _GLIBCXX20_DEPRECATED_SUGGEST instead.
>
> Signed-off-by: Nathaniel Shead 
> ---
>  libstdc++-v3/doc/doxygen/user.cfg.in | 4 ++--
>  libstdc++-v3/include/bits/c++config  | 6 +++---
>  libstdc++-v3/include/std/type_traits | 4 ++--
>  3 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
> b/libstdc++-v3/doc/doxygen/user.cfg.in
> index 834ad9e4fd5..fc46e722529 100644
> --- a/libstdc++-v3/doc/doxygen/user.cfg.in
> +++ b/libstdc++-v3/doc/doxygen/user.cfg.in
> @@ -2394,8 +2394,8 @@ PREDEFINED = __cplusplus=202002L \
>   "_GLIBCXX11_DEPRECATED_SUGGEST(E)= " \
>   "_GLIBCXX17_DEPRECATED= " \
>   "_GLIBCXX17_DEPRECATED_SUGGEST(E)= " \
> - "_GLIBCXX20_DEPRECATED(E)= " \
> - "_GLIBCXX20_DEPRECATED(E)= " \
> + "_GLIBCXX20_DEPRECATED= " \
> + "_GLIBCXX20_DEPRECATED_SUGGEST(E)= " \

Oops, good catch, that should definitely be fixed.

>   _GLIBCXX17_INLINE=inline \
>   _GLIBCXX_CHRONO_INT64_T=int64_t \
>   _GLIBCXX_DEFAULT_ABI_TAG \
> diff --git a/libstdc++-v3/include/bits/c++config 
> b/libstdc++-v3/include/bits/c++config
> index 50406066afe..d2b0cfa15ce 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -84,7 +84,7 @@
>  //   _GLIBCXX14_DEPRECATED_SUGGEST( string-literal )
>  //   _GLIBCXX17_DEPRECATED
>  //   _GLIBCXX17_DEPRECATED_SUGGEST( string-literal )
> -//   _GLIBCXX20_DEPRECATED( string-literal )
> +//   _GLIBCXX20_DEPRECATED
>  //   _GLIBCXX20_DEPRECATED_SUGGEST( string-literal )
>  #ifndef _GLIBCXX_USE_DEPRECATED
>  # define _GLIBCXX_USE_DEPRECATED 1
> @@ -124,10 +124,10 @@
>  #endif
>
>  #if defined(__DEPRECATED) && (__cplusplus >= 202002L)
> -# define _GLIBCXX20_DEPRECATED(MSG) [[deprecated(MSG)]]
> +# define _GLIBCXX20_DEPRECATED [[__deprecated__]]
>  # define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
>  #else
> -# define _GLIBCXX20_DEPRECATED(MSG)
> +# define _GLIBCXX20_DEPRECATED

I think this inconsistency was actually deliberate...

>  # define _GLIBCXX20_DEPRECATED_SUGGEST(ALT)
>  #endif
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 5dc9e1b2921..2f4d4bb8d4d 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -815,7 +815,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Could use is_standard_layout && is_trivial instead of the builtin.
>template
>  struct
> -_GLIBCXX20_DEPRECATED("use is_standard_layout && is_trivial instead")
> +_GLIBCXX20_DEPRECATED_SUGGEST("is_standard_layout && is_trivial")

This doesn't quite work. The SUGGEST macro will enclose its argument
in single quotes, so here we get:

is_pod is deprecated, use 'is_standard_layout && is_trivial' instead.

That makes it look like 'is_standard_layout && is_trivial' is a piece
of valid code, but it's not. It would need to use the variable
templates not the class templates, and would need template argument
lists:

is_pod is deprecated, use 'is_standard_layout_v<T> && is_trivial_v<T>' instead.

Then we have the question of whether it's OK to use 'T' here when that
is neither the template parameter in the <type_traits> header (that's
'_Tp') nor the user's code (that will be some class type that we can't
know in the attribute's string-literal).

So I think my preference would be to leave _GLIBCXX20_DEPRECATED as
is, and always require a message. If possible, we should always say
something user-friendly, e.g. for std::aligned_storage we could use
something like:

_GLIBCXX23_DEPRECATED("use an aligned array of bytes instead")

We don't want to use the SUGGEST macro here because we don't want the
single quotes. But I don't see an obvious message we could use for
std::aligned_union ... so because I'm too lazy to spend any longer
thinking about it, I think I'll just merge both your patches. Thanks
for contributing them!



Re: [PATCH] minor optimization bug in basic_string move assignment

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Wed, 25 Jan 2023 at 18:38, François Dumont  wrote:
>
> Let's submit a proper patch proposal then.
>
> The occasion for me to ask if there is any reason for the COW string not
> being C++11 allocator compliant? Just lack of interest?

Mostly lack of interest, but also I don't really want to "encourage"
the use of the old string by investing lots of maintenance effort into
it. If you want new features like C++11 Allocators and
resize_and_overwrite etc then you should use the new type.

I don't remember if there were any actual blockers that made it
difficult to support stateful allocators in the COW string. I might
have written something about it in mails to the list when I was adding
the SSO string, but I don't remember now.

Anyway, for this patch ...

>
> I wanted to consider it to get rid of the __gnu_debug::_Safe_container
> _IsCxx11AllocatorAware template parameter.
>
>  libstdc++: Optimize basic_string move assignment
>
>  Since resolution of Issue 2593 [1] we can consider that equal
> allocators
>  before the propagate-on-move-assignment operations will still be equal
>  afterward.
>
>  So we can extend the optimization of transferring the storage of the
> move-to
>  instance to the move-from one, which is currently limited to always-equal
>  allocators.
>
>  [1] https://cplusplus.github.io/LWG/issue2593
>
>  libstdc++-v3/ChangeLog:
>
>  * include/bits/basic_string.h (operator=(basic_string&&)):
> Transfer move-to
>  storage to the move-from instance when allocators are equal.
>  *
> testsuite/21_strings/basic_string/allocator/char/move_assign.cc (test04):
>  New test case.
>
> Tested under linux x86_64, ok to commit ?

OK for trunk, thanks!

+Reviewed-by: Jonathan Wakely 


> > On Wed, 4 Jan 2023 at 18:21, François Dumont via Libstdc++
> >  wrote:
> >> On 04/01/23 00:11, waffl3x via Libstdc++ wrote:
> >>> Example: https://godbolt.org/z/sKhGqG1qK
> >>> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/basic_string.h;hb=HEAD#l880
> >>> When move assigning to a basic_string, the allocated memory of the moved 
> >>> into string is stored into the source string instead of deallocating it, 
> >>> a good optimization when everything is compatible. However in the case of 
> >>> a stateful allocator (is_always_equal evaluating as false) this 
> >>> optimization is never taken. Unless there is some reason I can't think of 
> >>> that makes equal stateful allocators incompatible here, I believe the if 
> >>> statement on line 880 of basic_string.h should also compare the equality 
> >>> of each strings allocator. The first condition in the function seems to 
> >>> indicate to me that this scenario was being considered and just forgotten 
> >>> about, as the memory doesn't get deallocated immediately if the two 
> >>> allocators are equal. I'll note that because of how everything is 
> >>> handled, this doesn't result in a leak so this bug is still only a minor 
> >>> missed optimization.
> >>>
> >>> mailto:libstd...@gcc.gnu.org
> >> Hmmm, I don't know, at least it is not as simple as you present it.
> >>
> >> You cannot add a check on allocator equality as you are proposing
> >> because it is too late. __str allocator might have already been
> >> propagated to *this on the previous call to std::__alloc_on_move. Note
> >> that current check is done only if
> >> !_Alloc_traits::_S_propagate_on_move_assign().
> >>
> >> This patch might do the job but I wonder if equal allocators can become
>> un-equal after the propagate-on-move-assignment?
> > Since https://cplusplus.github.io/LWG/issue2593 they can't. But I
> > think when I wrote that code, they could do, which is probably why the
> > optimization wasn't done.
> >



Re: [PATCH] MIPS: use arch_32/64 instead of default_mips_arch

2023-02-03 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-02-03 at 14:08 +, Richard Sandiford via Gcc-patches
wrote:
> > Do you mean that the "wrong" format is quite interesting?
> > Yes, While this format is never used at all.
> 
> My point was that there is nothing wrong in principle with creating
> an o32 executable that has a 64-bit rather than a 32-bit ISA (since
> the
> 64-bit ISAs are pure extensions of 32-bit ISAs).  Doing that is even
> useful in some cases.  For example, MIPS4+O32 is a useful combination,
> even though MIPS4 is a 64-bit ISA.  Same for Octeon3+O32, etc.
> 
> So is the linker behaviour really correct?  Doesn't it mean that
> Octeon3 O32 binaries are link-incompatible with MIPS32 o32 binaries?

On gcc230:

xry111@gcc230:~$ cat a.c
int a() { return 42; }
xry111@gcc230:~$ cat b.c
extern int a(void);
int main() { return a() ^ 42; }
xry111@gcc230:~$ cc a.c -mabi=32 -march=mips32r2 -c
xry111@gcc230:~$ cc b.c -mabi=32 -march=mips64r2 -c
xry111@gcc230:~$ cc a.o b.o
xry111@gcc230:~$ ./a.out && echo ok
ok
xry111@gcc230:~$ 
xry111@gcc230:~$ ld -version
GNU ld (GNU Binutils for Debian) 2.31.1
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

So I'd consider the issue a GNU ld regression if it suddenly stops
behaving like this.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Improve RTL CSE hash table hash usage

2023-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> The RTL CSE hash table has a fixed number of buckets (32) each
> with a linked list of entries with the same hash value.  The
> actual hash values are computed using hash_rtx which uses adds
> for mixing and adds the rtx CODE as CODE << 7 (apart from some
> exceptions such as MEM).  The unsigned int typed hash value
> is then simply truncated for the actual lookup into the fixed
> size table which means that usually CODE is simply lost.
>
> The following improves this truncation by first mixing in more
> bits using xor.  It does not change the actual hash function
> since that's used outside of CSE as well.
>
> An alternative would be to bump the fixed number of buckets,
> say to 256 which would retain the LSB of CODE or to 8192 which
> can capture all 6 bits required for the last CODE.
>
> As the comment in CSE says, there's invalidate_memory and
> flush_hash_table done possibly frequently and those at least
> need to walk all slots, so when the hash table is mostly empty
> enlarging it will be a loss.  Still there should be more
> regular lookups by hash, so fewer collisions should pay off
> as well.

Going purely from this description and without having looked
at the code properly, would it be possible to link all current
values together, not just those with the same hash?  And would
that help?  It looks like the list is already doubly-linked,
and there's spare room to store a "start of new hash" marker.

Thanks,
Richard

> Without enlarging the table, a better hash function is unlikely
> to make a big difference; simple statistics on the
> number of collisions at insertion time show a reduction of
> around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
> at the expense of reducing the average table fill by 10%
> (all of these stats are from looking just at fold-const.i at -O2).
> Increasing HASH_SHIFT more leaves the table even more sparse,
> likely showing that hash_rtx uses add for mixing, which is
> quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
> collisions.
>
> Experimenting with using inchash instead of adds for the
> mixing does not improve things when looking at the HASH_SHIFT
> bumped by 2 numbers.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Any opinions?
>
>   * cse.cc (HASH): Turn into inline function and mix
>   in another HASH_SHIFT bits.
>   (SAFE_HASH): Likewise.
> ---
>  gcc/cse.cc | 37 +++--
>  1 file changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 37afc88b439..4777e559b86 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -420,20 +420,6 @@ struct table_elt
>  #define HASH_SIZE(1 << HASH_SHIFT)
>  #define HASH_MASK(HASH_SIZE - 1)
>  
> -/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
> -   register (hard registers may require `do_not_record' to be set).  */
> -
> -#define HASH(X, M)   \
> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER   \
> -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X))) \
> -  : canon_hash (X, M)) & HASH_MASK)
> -
> -/* Like HASH, but without side-effects.  */
> -#define SAFE_HASH(X, M)  \
> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER   \
> -  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X))) \
> -  : safe_hash (X, M)) & HASH_MASK)
> -
>  /* Determine whether register number N is considered a fixed register for the
> purpose of approximating register costs.
> It is desirable to replace other regs with fixed regs, to reduce need for
> @@ -586,6 +572,29 @@ static machine_mode cse_cc_succs (basic_block, 
> basic_block, rtx, rtx,
>  
>  static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
>  
> +/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
> +   register (hard registers may require `do_not_record' to be set).  */
> +
> +static inline unsigned
> +HASH (rtx x, machine_mode mode)
> +{
> +  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
> + ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
> + : canon_hash (x, mode));
> +  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
> +}
> +
> +/* Like HASH, but without side-effects.  */
> +
> +static inline unsigned
> +SAFE_HASH (rtx x, machine_mode mode)
> +{
> +  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
> + ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
> + : safe_hash (x, mode));
> +  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
> +}
> +
>  /* Nonzero if X has the form (PLUS frame-pointer integer).  */
>  
>  static bool


Re: [PATCH] MIPS: use arch_32/64 instead of default_mips_arch

2023-02-03 Thread Richard Sandiford via Gcc-patches
YunQiang Su  writes:
Richard Sandiford via Gcc-patches wrote on Fri, 3 Feb 2023 at 20:29:
>>
>> YunQiang Su  writes:
>> > The value of default_mips_arch will be always used for -march by default,
>> > no matter what value is given to -mabi.
>> > It will produce abnormal elf file like:
>> >  ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV)
>>
>> Is that really wrong though?  There's nothing in principle that
>> prevents a 64-bit ISA being used with a 32-bit ABI, even in the
>> object file's metadata.
>>
>
> To make sure that there is no misunderstanding.
> The "wrong" format is
>  ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV)
>   ^^
> and the "correct" O32 ABI file is
>  ELF 32-bit LSB relocatable, MIPS, MIPS32 rel2 version 1 (SYSV)
>   ^^
> and the linker refuses to interlink them together.
>
> Do you mean that the "wrong" format is quite interesting?
> Yes, though this format is never actually used.

My point was that there is nothing wrong in principle with creating
an o32 executable that has a 64-bit rather than a 32-bit ISA (since the
64-bit ISAs are pure extensions of 32-bit ISAs).  Doing that is even
useful in some cases.  For example, MIPS4+O32 is a useful combination,
even though MIPS4 is a 64-bit ISA.  Same for Octeon3+O32, etc.

So is the linker behaviour really correct?  Doesn't it mean that
Octeon3 O32 binaries are link-incompatible with MIPS32 o32 binaries?

>> > So we use with_arch_32 and with_arch_64 instead of default_mips_arch
>> > for all mipsisa[32,64]rN triples.
>>
>> I agree there's no benefit to using a stock MIPS64rN ISA over
>> a stock MIPS32rN ISA with a 32-bit ABI, and the patch is only
>> changing those cases.  But things are different when using
>> (say) MIPS4 with a 32-bit ABI, or a 64-bit processor that has
>> proprietary extensions.
>>
>> And, for example, a mips-linux-gnu toolchain would (IIRC) require
>> an -march as well as an -mabi in order to generate 64-bit code.
>> There would be no implicit selection of a new -march.
>>
>
> In fact, no: if we wish to use the default -march GCC was configured
> with, we can pass -mabi=64 alone.
>
> $ mipsel-linux-gnu-gcc -mabi=64 -c xx.c && file xx.o
> xx.o: ELF 64-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
> not stripped

Ah, OK, thanks for the correction.  I obviously misremembered.

> There is, however, a problem:
> $ mipsel-linux-gnu-gcc -mabi=32 -march=mips64r2 -c xx.c && file xx.o
> xx.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
> not stripped
>  ^^
>> I'm not opposed to the patch.  I just think we should be clear
>> about the underlying principle.  If it's just that all MIPS32/64rN
>> toolchains should behave in the same way (like the sde and mti ones
>> do), then the patch looks good.  But I don't think we should create
>> a general principle that -mabi determines/changes/downgrades -march.
>>
>
> In fact, I prefer what x86 does now:
> $ gcc -m32 -march=haswell -c -O3 xx.c && file xx.o
> xx.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
>
> But MIPS behaves like this:
> $ mipsel-linux-gnu-gcc -mabi=32 -march=octeon -c yy.c && file yy.o
> yy.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
> not stripped
>  ^^
> $ mipsel-linux-gnu-gcc -mabi=32 -march=mips64r2 -c yy.c && file yy.o
> yy.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
> not stripped
>  ^^
>
> I hope I can fix this problem for MIPS, although I have no idea how to
> do so yet.

The current MIPS behaviour is what I'd expect though, given the
command lines.

file doesn't tell the full story.  The ELF flags should also
distinguish between the -march=octeon and -march=mips64r2 cases.

Thanks,
Richard

>
>> Thanks,
>> Richard
>>
>> >
>> > gcc/ChangeLog:
>> >   * config.gcc: use with_arch_32 and with_arch_64 instead of
>> >   default_mips_arch for mipsisa[32,64]rN triples.
>> > ---
>> >  gcc/config.gcc | 21 ++---
>> >  1 file changed, 14 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/gcc/config.gcc b/gcc/config.gcc
>> > index f0958e1c959..0b6d093d847 100644
>> > --- a/gcc/config.gcc
>> > +++ b/gcc/config.gcc
>> > @@ -2518,13 +2518,16 @@ mips*-*-linux*)   # 
>> > Linux MIPS, either endian.
>> >   extra_options="${extra_options} linux-android.opt"
>> >   case ${target} in
>> >   mipsisa32r6*)
>> > - default_mips_arch=mips32r6
>> > + with_arch_32="mips32r6"
>> > + with_arch_64="mips64r6"
>> >   ;;
>> >   mipsisa32r2*)
>> > - 

[PATCH] Improve RTL CSE hash table hash usage

2023-02-03 Thread Richard Biener via Gcc-patches
The RTL CSE hash table has a fixed number of buckets (32) each
with a linked list of entries with the same hash value.  The
actual hash values are computed using hash_rtx which uses adds
for mixing and adds the rtx CODE as CODE << 7 (apart from some
exceptions such as MEM).  The unsigned int typed hash value
is then simply truncated for the actual lookup into the fixed
size table which means that usually CODE is simply lost.

The following improves this truncation by first mixing in more
bits using xor.  It does not change the actual hash function
since that's used outside of CSE as well.

An alternative would be to bump the fixed number of buckets,
say to 256 which would retain the LSB of CODE or to 8192 which
can capture all 6 bits required for the last CODE.

As the comment in CSE says, there's invalidate_memory and
flush_hash_table done possibly frequently and those at least
need to walk all slots, so when the hash table is mostly empty
enlarging it will be a loss.  Still there should be more
regular lookups by hash, so fewer collisions should pay off
as well.
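
(The mixing step is the classic xor-fold; as a stand-alone sketch of
what the patch does with HASH_SHIFT and HASH_MASK:)

/* Fold the bits above the index width back into the low bits before
   masking, so the bucket index depends on more of the 32-bit hash.  */
static inline unsigned
fold_hash (unsigned h, unsigned shift)
{
  return (h ^ (h >> shift)) & ((1u << shift) - 1);
}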

Without enlarging the table, a better hash function is unlikely
to make a big difference; simple statistics on the
number of collisions at insertion time show a reduction of
around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
at the expense of reducing the average table fill by 10%
(all of these stats are from looking just at fold-const.i at -O2).
Increasing HASH_SHIFT more leaves the table even more sparse,
likely showing that hash_rtx uses add for mixing, which is
quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
collisions.

Experimenting with using inchash instead of adds for the
mixing does not improve things when looking at the HASH_SHIFT
bumped by 2 numbers.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any opinions?

* cse.cc (HASH): Turn into inline function and mix
in another HASH_SHIFT bits.
(SAFE_HASH): Likewise.
---
 gcc/cse.cc | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 37afc88b439..4777e559b86 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -420,20 +420,6 @@ struct table_elt
 #define HASH_SIZE  (1 << HASH_SHIFT)
 #define HASH_MASK  (HASH_SIZE - 1)
 
-/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
-   register (hard registers may require `do_not_record' to be set).  */
-
-#define HASH(X, M) \
- ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER \
-  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))   \
-  : canon_hash (X, M)) & HASH_MASK)
-
-/* Like HASH, but without side-effects.  */
-#define SAFE_HASH(X, M)\
- ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER \
-  ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))   \
-  : safe_hash (X, M)) & HASH_MASK)
-
 /* Determine whether register number N is considered a fixed register for the
purpose of approximating register costs.
It is desirable to replace other regs with fixed regs, to reduce need for
@@ -586,6 +572,29 @@ static machine_mode cse_cc_succs (basic_block, 
basic_block, rtx, rtx,
 
 static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
 
+/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
+   register (hard registers may require `do_not_record' to be set).  */
+
+static inline unsigned
+HASH (rtx x, machine_mode mode)
+{
+  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
+   ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
+   : canon_hash (x, mode));
+  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
+}
+
+/* Like HASH, but without side-effects.  */
+
+static inline unsigned
+SAFE_HASH (rtx x, machine_mode mode)
+{
+  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
+   ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
+   : safe_hash (x, mode));
+  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
+}
+
 /* Nonzero if X has the form (PLUS frame-pointer integer).  */
 
 static bool
-- 
2.35.3


[PATCH] openmp: Add support for 'present' modifier

2023-02-03 Thread Kwok Cheung Yeung

Hello

This patch implements support for the OpenMP 5.1 'present' modifier in 
C, C++ and Fortran. 'present' can be used in the 'map' clause for the 
'target', 'target data', 'target enter data' and 'target exit data' 
constructs, and the 'to'/'from' clauses of 'target update'. It can be 
used in conjunction with other modifiers too (currently only 'always' on 
map clauses).


It can also be used in defaultmap, which applies 'present, alloc' to the 
default clauses.


It behaves similarly to the OpenACC 'present' clause, and causes a 
fatal runtime error when the referenced data is not already present in 
device memory. Similarly to the OpenACC error message, the error is 
expressed in terms of the equivalent OpenMP function !omp_target_is_present.
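
(A minimal usage sketch -- the function and names are made up; the data
must already be mapped, here by 'target enter data', or the runtime
reports a fatal error.  defaultmap(present: aggregate) works
analogously for implicitly mapped variables:)

void
f (double *a, int n)
{
#pragma omp target enter data map(alloc: a[0:n])

  /* OK: a[0:n] is present on the device; 'always' additionally
     forces the copy even though the data is already mapped.  */
#pragma omp target map(present, always, to: a[0:n])
  for (int i = 0; i < n; i++)
    a[i] += 1.0;

#pragma omp target exit data map(release: a[0:n])
}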


Regarding the representation of the map kind - the bit space is getting 
a bit crowded. I have made bit 7 (GOMP_MAP_FLAG_FORCE) into another 
special bit (GOMP_MAP_FLAG_SPECIAL_5), and redefined GOMP_MAP_FLAG_FORCE 
to be GOMP_MAP_FLAG_SPECIAL_5 with no other special flags set. The 
'present' modifier is represented by setting GOMP_MAP_FLAG_SPECIAL_5 | 
GOMP_MAP_FLAG_SPECIAL_0 - this does not interfere with 'always' 
(GOMP_MAP_FLAG_SPECIAL_2) or 'implicit' (GOMP_MAP_FLAG_SPECIAL_3 | 
GOMP_MAP_FLAG_SPECIAL_4) which is used by clauses generated by defaultmap.


During gimplification of defaultmap, the present defaultmap is 
represented by setting GOVD_MAP_FORCE_PRESENT (as that is presently only 
used in OpenACC and has a similar meaning). GOVD_MAP_ALLOC_ONLY will be 
added, and this is eventually lowered to a GOMP_MAP_PRESENT_ALLOC map 
kind for the default clauses.


Bootstrapped on x86-64, no regressions in GCC testsuite, libgomp tested 
with x86-64 (no offloading), AMD GCN and NVPTX offloading. This is too 
late for GCC 13 now, but will this be okay for GCC 14?


Thanks

Kwok

From ba9368f88514a27f374d84e53e36ce36fa9ac5bc Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 3 Feb 2023 13:04:21 +
Subject: [PATCH] openmp: Add support for the 'present' modifier

This implements support for the OpenMP 5.1 'present' modifier, which can be
used in map clauses in the 'target', 'target data', 'target data enter' and
'target data exit' constructs, and in the 'to' and 'from' clauses of the
'target update' construct.  It is also supported in defaultmap.

The modifier triggers a fatal runtime error if the data specified by the
clause is not already present on the target device.  It can also be combined
with 'always' in map clauses.

2023-02-01  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_variable_list): Set default motion
modifier.
(c_parser_omp_var_list_parens): Add new parameter with default.  Parse
'present' motion modifier and apply.
(c_parser_omp_clause_defaultmap): Parse 'present' in defaultmap.
(c_parser_omp_clause_map): Parse 'present' modifier in map clauses.
(c_parser_omp_clause_to): Allow use of 'present' in variable list.
(c_parser_omp_clause_from): Likewise.
(c_parser_omp_target_data): Allow map clauses with 'present'
modifiers.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.

gcc/cp/
* parser.cc (cp_parser_omp_var_list_no_open): Add new parameter with
default.  Parse 'present' motion modifier and apply.
(cp_parser_omp_clause_defaultmap): Parse 'present' in defaultmap.
(cp_parser_omp_clause_map): Parse 'present' modifier in map clauses.
(cp_parser_omp_all_clauses): Allow use of 'present' in 'to' and 'from'
clauses.
(cp_parser_omp_target_data): Allow map clauses with 'present'
modifiers.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
* semantics.cc (finish_omp_target): Accept map clauses with 'present'
modifiers.

gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add entries with 'present'
modifiers.
(enum gfc_omp_motion_modifier): New.
(struct gfc_omp_namelist): Add motion_modifier field.
* openmp.cc (gfc_match_omp_variable_list): Add new parameter with
default.  Parse 'present' motion modifier and apply.
(gfc_match_omp_clauses): Parse 'present' in defaultmap, 'from'
clauses, 'map' clauses and 'to' clauses.
(resolve_omp_clauses): Allow 'present' modifiers on 'target',
'target data', 'target enter' and 'target exit' directives.
* trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers
to tree node for 'map', 'to' and 'from' clauses.  Apply 'present' for
defaultmap.

gcc/
* gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag
and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag
set.
(omp_get_attachment): Handle map 

Re: [PATCH] MIPS: use arch_32/64 instead of default_mips_arch

2023-02-03 Thread YunQiang Su
Richard Sandiford via Gcc-patches wrote on Fri, 3 Feb 2023 at 20:29:
>
> YunQiang Su  writes:
> > The value of default_mips_arch will be always used for -march by default,
> > no matter what value is given to -mabi.
> > It will produce abnormal elf file like:
> >  ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV)
>
> Is that really wrong though?  There's nothing in principle that
> prevents a 64-bit ISA being used with a 32-bit ABI, even in the
> object file's metadata.
>

To make sure that there is no misunderstanding.
The "wrong" format is
 ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV)
  ^^
and the "correct" O32 ABI file is
 ELF 32-bit LSB relocatable, MIPS, MIPS32 rel2 version 1 (SYSV)
  ^^
and the linker refuses to interlink them together.

Do you mean that the "wrong" format is quite interesting?
Yes, though this format is never actually used.

> > So we use with_arch_32 and with_arch_64 instead of default_mips_arch
> > for all mipsisa[32,64]rN triples.
>
> I agree there's no benefit to using a stock MIPS64rN ISA over
> a stock MIPS32rN ISA with a 32-bit ABI, and the patch is only
> changing those cases.  But things are different when using
> (say) MIPS4 with a 32-bit ABI, or a 64-bit processor that has
> proprietary extensions.
>
> And, for example, a mips-linux-gnu toolchain would (IIRC) require
> an -march as well as an -mabi in order to generate 64-bit code.
> There would be no implicit selection of a new -march.
>

In fact, no: if we wish to use the default -march GCC was configured
with, we can pass -mabi=64 alone.

$ mipsel-linux-gnu-gcc -mabi=64 -c xx.c && file xx.o
xx.o: ELF 64-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
not stripped

There is, however, a problem:
$ mipsel-linux-gnu-gcc -mabi=32 -march=mips64r2 -c xx.c && file xx.o
xx.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
not stripped
 ^^
> I'm not opposed to the patch.  I just think we should be clear
> about the underlying principle.  If it's just that all MIPS32/64rN
> toolchains should behave in the same way (like the sde and mti ones
> do), then the patch looks good.  But I don't think we should create
> a general principle that -mabi determines/changes/downgrades -march.
>

In fact, I prefer what x86 does now:
$ gcc -m32 -march=haswell -c -O3 xx.c && file xx.o
xx.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

But MIPS behaves like this:
$ mipsel-linux-gnu-gcc -mabi=32 -march=octeon -c yy.c && file yy.o
yy.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
not stripped
 ^^
$ mipsel-linux-gnu-gcc -mabi=32 -march=mips64r2 -c yy.c && file yy.o
yy.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
not stripped
 ^^

I hope I can fix this problem for MIPS, although I have no idea how to
do so yet.
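
(Side note: readelf decodes the MIPS-specific e_flags that file only
summarizes, which is where the octeon/mips64r2 distinction shows up;
the flag values below are illustrative:)

$ readelf -h yy.o | grep Flags
  Flags:  0x80001100, noreorder, pic, cpic, mips64r2
$ readelf -h zz.o | grep Flags
  Flags:  0x8000a100, noreorder, pic, cpic, octeon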

> Thanks,
> Richard
>
> >
> > gcc/ChangeLog:
> >   * config.gcc: use with_arch_32 and with_arch_64 instead of
> >   default_mips_arch for mipsisa[32,64]rN triples.
> > ---
> >  gcc/config.gcc | 21 ++---
> >  1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index f0958e1c959..0b6d093d847 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -2518,13 +2518,16 @@ mips*-*-linux*)   # 
> > Linux MIPS, either endian.
> >   extra_options="${extra_options} linux-android.opt"
> >   case ${target} in
> >   mipsisa32r6*)
> > - default_mips_arch=mips32r6
> > + with_arch_32="mips32r6"
> > + with_arch_64="mips64r6"
> >   ;;
> >   mipsisa32r2*)
> > - default_mips_arch=mips32r2
> > + with_arch_32="mips32r2"
> > + with_arch_64="mips64r2"
> >   ;;
> >   mipsisa32*)
> > - default_mips_arch=mips32
> > + with_arch_32="mips32"
> > + with_arch_64="mips64"
> >   ;;
> >   mips64el-st-linux-gnu)
> >   default_mips_abi=n32
> > @@ -2540,22 +2543,26 @@ mips*-*-linux*)   # 
> > Linux MIPS, either endian.
> >   ;;
> >   mipsisa64r6*-*-linux-gnuabi64)
> >   default_mips_abi=64
> > - default_mips_arch=mips64r6
> > + with_arch_32="mips32r6"
> > + with_arch_64="mips64r6"
> >   enable_mips_multilibs="yes"
> >   ;;
> >   

Re: [PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-02-03 Thread Qing Zhao via Gcc-patches



> On Feb 3, 2023, at 2:49 AM, Richard Biener  wrote:
> 
> On Thu, 2 Feb 2023, Qing Zhao wrote:
> 
>> 
>> 
>>> On Feb 2, 2023, at 8:54 AM, Richard Biener  wrote:
>>> 
>>> On Thu, 2 Feb 2023, Qing Zhao wrote:
>>> 
 
 
> 
> [...]
> 
 +  return flexible_size_type_p (TREE_TYPE (last));
>>> 
>>> For types with many members this can become quite slow (IIRC we had
>>> bugs about similar walks of all fields in types), and this function
>>> looks like it's invoked multiple times on the same type per TU.
>>> 
>>> In principle the property is fixed at the time we lay out a record
>>> type, so we might want to compute it at that time and record the
>>> result.
>> 
>> You mean in FE? 
> 
> Yes, either in the frontend or in the middle-ends layout_type.
> 
>> Yes, that?s better and cleaner.
>> 
>> I will add one more field in the TYPE structure to record this 
>> information and check this field during middle end.
>> 
>> I had the same thought in the beginning, but not sure whether adding a 
>> new field in IR is necessary or not, other places in middle end might 
>> not use this new field.
> 
> It might be interesting to search for other code walking all fields of
> a type to determine this or similar info.
 
There is one which is defined in tree.cc but is only referenced in 
 c/c-decl.cc:
 
 /* Determine whether TYPE is a structure with a flexible array member,
  or a union containing such a structure (possibly recursively).  */
 flexible_array_type_p
 
 However, this routine is a little different than the one I tried to add:
 
In the current routine 'flexible_array_type_p', only one level of nesting in 
the structure is accepted; multiple levels of nesting are not permitted.
 
 So, my question is:  shall we accept multiple nesting in structure? i.e.
>>> 
>>> If we don't reject the testcase with an error, then yes.
>> 
>> GCC currently accepts multiple levels of nesting in structures without 
>> error.  So we will continue to accept such an extension as long as the 
>> flex array is at the end of the structure. At the same time, for the 
>> case where the flex array is in the middle of the structure, we will 
>> issue additional warnings now to discourage such usage, and deprecate 
>> this case in a future release.
>> 
>> Does this sound reasonable? 
> 
> Please don't mix several issues - I think the flex array in the
> middle of a structure is separate and we shouldn't report that
> as flexible_array_type_p or flexible_size_type_p since the size
> of the containing structure is not variable.
Agreed on this.
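
(Concretely, reusing 'struct A' from the example quoted further down --
'struct mid' here is hypothetical:)

struct mid
{
  struct A a;   /* struct with a flexible array member, not at the end */
  int tail;     /* sizeof (struct mid) stays fixed; a.data would
                   overlap tail if it were actually used */
};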

My major question here is (for the documentation change, sorry for mixing this 
thread with the documentation change): do we need to document this case 
together with the case in which a struct with a flex array is embedded into 
another structure? (As a GCC extension?)
> 
> For diagnostic purposes the intended use case is to treat
> a pointer to a structure that appears to have a fixed size
> but has (recursive) a member with a flexible array at the end
> as having variable size.  Just the same as array_at_struct_end_p
> treats this for the case of accesses involving such a type.

Yes. 
> 
> For the middle position case that's not the case.
Yes. 

Thanks.

Qing
> 
> Richard.
> 
>> Qing
>>> 
 struct A {
 int n;
 char data[];/* Content following header */
 };
 
 struct B {
 int m;
 struct A a;
 };
 
 struct C {
 int q;
 struct B b;
 };
 
 Qing
> 
>> thanks.
>> 
>> Qing
>> 
>>> 
 +  return false;
 +case UNION_TYPE:
 +  for (x = TYPE_FIELDS (type); x != NULL_TREE; x = DECL_CHAIN (x))
 +  {
 +if (TREE_CODE (x) == FIELD_DECL
 +&& flexible_array_type_p (TREE_TYPE (x)))
 +  return true;
 +  }
 +  return false;
 +default:
 +  return false;
 +  }
 +}
 +
 /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR.
 OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
 If unknown, return size_unknown (object_size_type).  */
 @@ -633,45 +669,68 @@ addr_object_size (struct object_size_info *osi, 
 const_tree ptr,
v = NULL_TREE;
break;
  case COMPONENT_REF:
 -  if (TREE_CODE (TREE_TYPE (v)) != ARRAY_TYPE)
 +  /* When the ref is not to an array, a record or a 
 union, it
 + will not have flexible size, compute the object 
 size
 + directly.  */
 +  if ((TREE_CODE (TREE_TYPE (v)) != ARRAY_TYPE)
 +  && (TREE_CODE (TREE_TYPE (v)) 

[PATCH] Speedup cse_insn

2023-02-03 Thread Richard Biener via Gcc-patches
When cse_insn prunes src{,_folded,_eqv_here,_related} with the
equivalence set in the *_same_value chain it also searches for
an equivalence to the destination of the instruction with

  /* This is the same as the destination of the insns, we want
 to prefer it.  Copy it to src_related.  The code below will
 then give it a negative cost.  */
  if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
src_related = p->exp;

this picks up the last such equivalence and in particular any
later duplicate will be pruned by the preceding

  else if (src_related && GET_CODE (src_related) == code
   && rtx_equal_p (src_related, p->exp))
src_related = 0;

first.  This wastes cycles doing extra rtx_equal_p checks.  The
following instead searches for the first destination equivalence
separately in this loop and delays using src_related for it until
we are about to process that, avoiding another redundant rtx_equal_p
check.

I came here because of a testcase with very large equivalence
lists and cse_insn compile time.  The patch below doesn't speed
it up significantly since there's no equivalence on the destination.

In theory this opens the possibility to track dest_related
separately, avoiding the implicit pruning of any previous
value in src_related.  As is, the change should be a no-op for
code generation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for
stage1.

* cse.cc (cse_insn): Track an equivalence to the destination
separately and delay using src_related for it.
---
 gcc/cse.cc | 51 +++
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 8fbda4ecc86..543cb1fe36f 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4614,6 +4614,7 @@ cse_insn (rtx_insn *insn)
   rtx src_eqv_here;
   rtx src_const = 0;
   rtx src_related = 0;
+  rtx dest_related = 0;
   bool src_related_is_const_anchor = false;
   struct table_elt *src_const_elt = 0;
   int src_cost = MAX_COST;
@@ -5085,10 +5086,11 @@ cse_insn (rtx_insn *insn)
src_related = 0;
 
  /* This is the same as the destination of the insns, we want
-to prefer it.  Copy it to src_related.  The code below will
-then give it a negative cost.  */
- if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
-   src_related = p->exp;
+to prefer it.  The code below will then give it a negative
+cost.  */
+ if (!dest_related
+ && GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
+   dest_related = p->exp;
}
 
   /* Find the cheapest valid equivalent, trying all the available
@@ -5130,27 +5132,28 @@ cse_insn (rtx_insn *insn)
}
}
 
-  if (src_related)
+  if (dest_related)
{
- if (rtx_equal_p (src_related, dest))
-   src_related_cost = src_related_regcost = -1;
- else
-   {
- src_related_cost = COST (src_related, mode);
- src_related_regcost = approx_reg_cost (src_related);
-
- /* If a const-anchor is used to synthesize a constant that
-normally requires multiple instructions then slightly prefer
-it over the original sequence.  These instructions are likely
-to become redundant now.  We can't compare against the cost
-of src_eqv_here because, on MIPS for example, multi-insn
-constants have zero cost; they are assumed to be hoisted from
-loops.  */
- if (src_related_is_const_anchor
- && src_related_cost == src_cost
- && src_eqv_here)
-   src_related_cost--;
-   }
+ src_related_cost = src_related_regcost = -1;
+ /* Handle it as src_related.  */
+ src_related = dest_related;
+   }
+  else if (src_related)
+   {
+ src_related_cost = COST (src_related, mode);
+ src_related_regcost = approx_reg_cost (src_related);
+
+ /* If a const-anchor is used to synthesize a constant that
+normally requires multiple instructions then slightly prefer
+it over the original sequence.  These instructions are likely
+to become redundant now.  We can't compare against the cost
+of src_eqv_here because, on MIPS for example, multi-insn
+constants have zero cost; they are assumed to be hoisted from
+loops.  */
+ if (src_related_is_const_anchor
+ && src_related_cost == src_cost
+ && src_eqv_here)
+   src_related_cost--;
}
 
   /* If this was an indirect jump insn, a known label will really be
-- 
2.35.3


Re: [PATCHv2] libstdc++: Mark pieces of gnu-linux/os_support.h linux-specific

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Fri, 7 Oct 2022 at 21:45, Samuel Thibault wrote:
>
> This is notably needed because in glibc 2.34, the move of pthread functions
> into libc.so happened for Linux only, not GNU/Hurd.
>
> The pthread_self() function can also always be used fine as it is on
> GNU/Hurd.

Sorry for the delay, I'm going to push this to trunk today.


>
> libstdc++-v3/ChangeLog:
>
> * config/os/gnu-linux/os_defines.h [!__linux__]
>   (_GLIBCXX_NATIVE_THREAD_ID, _GLIBCXX_GTHREAD_USE_WEAK): Do not 
> define.
>
> diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h 
> b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> index c0caa21a013..4de93d752e1 100644
> --- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
> +++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> @@ -49,22 +49,24 @@
>  // version dynamically in case it has changed since libstdc++ was configured.
>  #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23)
>
> -#if __GLIBC_PREREQ(2, 27)
> -// Since glibc 2.27 pthread_self() is usable without linking to libpthread.
> -# define _GLIBCXX_NATIVE_THREAD_ID pthread_self()
> -#else
> +#ifdef __linux__
> +# if __GLIBC_PREREQ(2, 27)
> +// Since glibc 2.27 Linux' pthread_self() is usable without linking to 
> libpthread.
> +#  define _GLIBCXX_NATIVE_THREAD_ID pthread_self()
> +# else
>  // Before then it was in libc.so.6 but not libc.a, and always returns 0,
>  // which breaks the invariant this_thread::get_id() != thread::id{}.
>  // So only use it if we know the libpthread version is available.
>  // Otherwise use (__gthread_t)1 as the ID of the main (and only) thread.
> -# define _GLIBCXX_NATIVE_THREAD_ID \
> -  (__gthread_active_p() ? __gthread_self() : (__gthread_t)1)
> -#endif
> +#  define _GLIBCXX_NATIVE_THREAD_ID \
> +   (__gthread_active_p() ? __gthread_self() : (__gthread_t)1)
> +# endif
>
> -#if __GLIBC_PREREQ(2, 34)
> -// Since glibc 2.34 all pthreads functions are usable without linking to
> +# if __GLIBC_PREREQ(2, 34)
> +// Since glibc 2.34 all Linux pthreads functions are usable without linking 
> to
>  // libpthread.
> -# define _GLIBCXX_GTHREAD_USE_WEAK 0
> +#  define _GLIBCXX_GTHREAD_USE_WEAK 0
> +# endif
>  #endif
>
>  #endif
>



Re: [PATCH] MIPS: use arch_32/64 instead of default_mips_arch

2023-02-03 Thread Richard Sandiford via Gcc-patches
YunQiang Su  writes:
> The value of default_mips_arch will be always used for -march by default,
> no matter what value is given to -mabi.
> It will produce abnormal elf file like:
>  ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV)

Is that really wrong though?  There's nothing in principle that
prevents a 64-bit ISA being used with a 32-bit ABI, even in the
object file's metadata.

> So we use with_arch_32 and with_arch_64 instead of default_mips_arch
> for all mipsisa[32,64]rN triples.

I agree there's no benefit to using a stock MIPS64rN ISA over
a stock MIPS32rN ISA with a 32-bit ABI, and the patch is only
changing those cases.  But things are different when using
(say) MIPS4 with a 32-bit ABI, or a 64-bit processor that has
proprietary extensions.

And, for example, a mips-linux-gnu toolchain would (IIRC) require
an -march as well as an -mabi in order to generate 64-bit code.
There would be no implicit selection of a new -march.

I'm not opposed to the patch.  I just think we should be clear
about the underlying principle.  If it's just that all MIPS32/64rN
toolchains should behave in the same way (like the sde and mti ones
do), then the patch looks good.  But I don't think we should create
a general principle that -mabi determines/changes/downgrades -march.

Thanks,
Richard

>
> gcc/ChangeLog:
>   * config.gcc: use with_arch_32 and with_arch_64 instead of
>   default_mips_arch for mipsisa[32,64]rN triples.
> ---
>  gcc/config.gcc | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0958e1c959..0b6d093d847 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2518,13 +2518,16 @@ mips*-*-linux*)   # Linux 
> MIPS, either endian.
>   extra_options="${extra_options} linux-android.opt"
>   case ${target} in
>   mipsisa32r6*)
> - default_mips_arch=mips32r6
> + with_arch_32="mips32r6"
> + with_arch_64="mips64r6"
>   ;;
>   mipsisa32r2*)
> - default_mips_arch=mips32r2
> + with_arch_32="mips32r2"
> + with_arch_64="mips64r2"
>   ;;
>   mipsisa32*)
> - default_mips_arch=mips32
> + with_arch_32="mips32"
> + with_arch_64="mips64"
>   ;;
>   mips64el-st-linux-gnu)
>   default_mips_abi=n32
> @@ -2540,22 +2543,26 @@ mips*-*-linux*)   # Linux 
> MIPS, either endian.
>   ;;
>   mipsisa64r6*-*-linux-gnuabi64)
>   default_mips_abi=64
> - default_mips_arch=mips64r6
> + with_arch_32="mips32r6"
> + with_arch_64="mips64r6"
>   enable_mips_multilibs="yes"
>   ;;
>   mipsisa64r6*-*-linux*)
>   default_mips_abi=n32
> - default_mips_arch=mips64r6
> + with_arch_32="mips32r6"
> + with_arch_64="mips64r6"
>   enable_mips_multilibs="yes"
>   ;;
>   mipsisa64r2*-*-linux-gnuabi64)
>   default_mips_abi=64
> - default_mips_arch=mips64r2
> + with_arch_32="mips32r2"
> + with_arch_64="mips64r2"
>   enable_mips_multilibs="yes"
>   ;;
>   mipsisa64r2*-*-linux*)
>   default_mips_abi=n32
> - default_mips_arch=mips64r2
> + with_arch_32="mips32r2"
> + with_arch_64="mips64r2"
>   enable_mips_multilibs="yes"
>   ;;
>   mips64*-*-linux-gnuabi64 | mipsisa64*-*-linux-gnuabi64)


Re: [PATCH] ipa: Avoid invalid gimple when IPA-CP and IPA-SRA disagree on types (108384)

2023-02-03 Thread Richard Biener via Gcc-patches
On Fri, Feb 3, 2023 at 10:40 AM Martin Jambor  wrote:
>
> On Fri, Feb 03 2023, Richard Biener wrote:
> > On Thu, Feb 2, 2023 at 5:20 PM Martin Jambor  wrote:
> >>
> >> Hi,
> >>
> >> when the compiled program contains type mismatches between callers and
> >> callees when it comes to a parameter, IPA-CP can try to propagate one
> >> constant from callers while IPA-SRA may try to split a parameter
> >> expecting a value of a different size on the same offset.  This then
> >> currently leads to creation of a VIEW_CONVERT_EXPR with mismatching
> >> type sizes of LHS and RHS which is correctly flagged by the GIMPLE
> >> verifier as invalid.
> >>
> >> It seems that the best course of action is to try and avoid the
> >> situation altogether and so this patch adds a check to IPA-SRA that
> >> peeks into the result of IPA-CP and when it sees a value on the same
> >> offset but with a mismatching size, it just decides to leave that
> >> particular parameter be.
> >>
> >> Bootstrapped and tested on x86_64-linux, OK for master?
> >
> > OK.  I suppose there are guards elsewhere that never lets a
> > non-UHWI size type (like variable size or poly-int-size) through
> > any of the SRA or CP lattices?
>
> SRA tracks its accesses in simple integers so yes for that part.
>
> As far as IPA-CP is concerned... all the values tracked conform to
> is_gimple_ip_invariant, so are either ADDR_EXPRs of a global variable or
> is_gimple_constant's.  So its size should never be variable and I hope
> also never a complex poly-int.  If you think it would be better, I can
> of course add the check.

I think it's OK as-is given this explanation.

Richard.

> Thanks,
>
> Martin
>
>
> >> gcc/ChangeLog:
> >>
> >> 2023-02-02  Martin Jambor  
> >>
> >> PR ipa/108384
> >> * ipa-sra.cc (push_param_adjustments_for_index): Remove a size check
> >> when comparing to an IPA-CP value.
> >> (dump_list_of_param_indices): New function.
> >> (adjust_parameter_descriptions): Check for mismatching IPA-CP values.
> >> Dump removed candidates using dump_list_of_param_indices.
> >> * ipa-param-manipulation.cc
> >> (ipa_param_body_adjustments::modify_expression): Add assert checking
> >> sizes of a VIEW_CONVERT_EXPR will match.
> >> (ipa_param_body_adjustments::modify_assignment): Likewise.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2023-02-02  Martin Jambor  
> >>
> >> PR ipa/108384
> >> * gcc.dg/ipa/pr108384.c: New test.
> >> ---


Re: [PATCH] libstdc++: testsuite: async.cc early timeout

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Thu, 5 May 2022 at 07:56, Alexandre Oliva via Libstdc++
 wrote:
>
>
> The async call and future variable initialization may take a while to
> complete on uniprocessors, especially if the async call and other
> unrelated processes run before context switches back to the main
> thread.
>
> Taking steady_begin only then sometimes causes the 11*100ms in the
> slow clock, counted from before the async call, to not be enough for
> the measured wait to last 1s in the steady clock.  I've seen it fall
> short of 1s by as little as a third of a tenth of a second in some
> cases, but in one surprisingly extreme case the elapsed wait time got
> only up to 216.7ms.
>
> Initializing both timestamps next to each other, before the async
> call, appears to avoid the problem entirely.  I've renamed the
> variable moved out of the block so as to avoid name hiding in the
> subsequent block, that has another steady_begin variable.
>
> The second wait fails a lot less frequently, but the 2s limit has been
> exceeded, so I'm bumping up the max sleep to ~4s, and the tolerance to
> 3s.
>
>
> I wasn't sure about whether to leave the added outputs that I put in to
> confirm the failure modes.  Please let me know in case they're
> undesirable, and I'll take them out.
>
> Regstrapped on x86_64-linux-gnu, ppc64le-linux-gnu, and also tested on
> ppc- and ppc64-vx7r2.  Ok to install?

Hi Alex,

This one slipped through the cracks, sorry.

Leaving the outputs seems useful in this case. For timing-sensitive
tests like this it's useful to have the output for exactly how long it
took when there's a FAIL in the logs.

The patch is OK for trunk now (and should still apply cleanly).



>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/30_threads/async/async.cc (test04): Initialize
> steady_start, renamed from steady_begin, next to slow_start.
> Increase tolerance for final wait.
> ---
>  libstdc++-v3/testsuite/30_threads/async/async.cc |   17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
> b/libstdc++-v3/testsuite/30_threads/async/async.cc
> index 38943ff1a9a5e..a36e1aee8bdef 100644
> --- a/libstdc++-v3/testsuite/30_threads/async/async.cc
> +++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
> @@ -20,6 +20,7 @@
>  // with this library; see the file COPYING3.  If not see
>  // <http://www.gnu.org/licenses/>.
>
> +#include <iostream>
>
>  #include 
>  #include 
> @@ -133,6 +134,7 @@ void test04()
>  {
>using namespace std::chrono;
>
> +  auto const steady_start = steady_clock::now();
>auto const slow_start = slow_clock::now();
>future<void> f1 = async(launch::async, []() {
>std::this_thread::sleep_for(std::chrono::seconds(2));
> @@ -140,21 +142,26 @@ void test04()
>
>// Wait for ~1s
>{
> -auto const steady_begin = steady_clock::now();
>  auto const status = f1.wait_until(slow_start + milliseconds(100));
>  VERIFY(status == std::future_status::timeout);
> -auto const elapsed = steady_clock::now() - steady_begin;
> +auto const elapsed = steady_clock::now() - steady_start;
> +if (elapsed < seconds(1))
> +  std::cout << elapsed.count () << "ns < 1s" << std::endl;
>  VERIFY(elapsed >= seconds(1));
>  VERIFY(elapsed < seconds(2));
>}
>
> -  // Wait for up to ~2s more
> +  // Wait for up to ~4s more, but since the async sleep completes, the
> +  // actual wait may be shorter than 1s.  Tolerate 3s because 2s
> +  // hasn't been enough in some extreme cases.
>{
>  auto const steady_begin = steady_clock::now();
> -auto const status = f1.wait_until(slow_start + milliseconds(300));
> +auto const status = f1.wait_until(slow_start + milliseconds(500));
>  VERIFY(status == std::future_status::ready);
>  auto const elapsed = steady_clock::now() - steady_begin;
> -VERIFY(elapsed < seconds(2));
> +if (elapsed >= seconds(3))
> +  std::cout << elapsed.count () << "ns > 2s" << std::endl;
> +VERIFY(elapsed < seconds(3));
>}
>  }
>
>
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
>



Re: [Patch] libgomp: Fix reverse offload issues

2023-02-03 Thread Tobias Burnus

Now committed as obvious as
r13-5680-g0b1ce70a813b98ef2893779d14ad6c90c5d06a71.

I improved the wording in the commit comment a bit, compared to previous
attachment and I have verified that those features work with AMDGCN* and
without offloading.

Tobias

(* it seems as if there is still another issue with mapping, this time
for array-descriptor variables that do not exist on the host, i.e. that
have to be fully mapped to the host. I will look into this today or Monday.)

On 02.02.23 15:13, Tobias Burnus wrote:

Found when testing AMD GCN offloading, the second issue came up with
libgomp.fortran/reverse-offload-5.f90. (But oddly not with nvptx.)

While the first one (new test: libgomp.fortran/reverse-offload-6.f90)
came up when debugging the issue.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße
201, 80634 München; Gesellschaft mit beschränkter Haftung;
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft:
München; Registergericht München, HRB 106955

commit 0b1ce70a813b98ef2893779d14ad6c90c5d06a71
Author: Tobias Burnus 
Date:   Fri Feb 3 11:31:53 2023 +0100

libgomp: Fix reverse offload issues

If there is nothing to map, skip the mapping and avoid attempting to
copy 0 bytes from addrs, sizes and kinds.

Additionally, it could happen that a non-allocated address was deallocated,
such as a pointer set, leading to a free for the actual data.

libgomp/
* target.c (gomp_target_rev): Handle mapnum == 0 and avoid
freeing not allocated memory.
* testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
---
 libgomp/target.c   |  8 +++---
 .../libgomp.fortran/reverse-offload-6.f90  | 32 ++
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index b16ee761a95..c1682caea13 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -3324,7 +3324,7 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
 gomp_fatal ("Cannot find reverse-offload function");
   void (*host_fn)() = (void (*)()) n->k->host_start;
 
-  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+  if ((devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) || mapnum == 0)
 {
   devaddrs = (uint64_t *) (uintptr_t) devaddrs_ptr;
   sizes = (uint64_t *) (uintptr_t) sizes_ptr;
@@ -3402,7 +3402,7 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
 	  }
 }
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) && mapnum > 0)
 {
   size_t j, struct_cpy = 0;
   splay_tree_key n2;
@@ -3638,7 +3638,7 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
 
   host_fn (devaddrs);
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) && mapnum > 0)
 {
   uint64_t struct_cpy = 0;
   bool clean_struct = false;
@@ -3680,7 +3680,7 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
 	  clean_struct = true;
 	  struct_cpy = sizes[i];
 	}
-	  else if (cdata[i].aligned)
+	  else if (!cdata[i].present && cdata[i].aligned)
 	gomp_aligned_free ((void *) (uintptr_t) devaddrs[i]);
 	  else if (!cdata[i].present)
 	free ((void *) (uintptr_t) devaddrs[i]);
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-6.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-6.f90
new file mode 100644
index 000..04866edbba7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-6.f90
@@ -0,0 +1,32 @@
+!
+! Ensure that a mapping with no argument works
+!
+
+module m
+  implicit none (type, external)
+  integer :: x = 32
+  integer :: dev_num2 = -1
+contains
+subroutine  foo()
+  use omp_lib, only: omp_get_device_num
+  x = x + 10
+  dev_num2 = omp_get_device_num()
+end
+end module m
+
+use m
+use omp_lib
+!$omp requires reverse_offload
+implicit none (type, external)
+integer :: dev_num = -1
+!$omp target map(from:dev_num)
+  dev_num = omp_get_device_num()
+  ! This calls GOMP_target_ext with number of maps = 0
+  !$omp target device(ancestor:1)
+call foo
+  !$omp end target
+!$omp end target
+
+if (omp_get_num_devices() > 0 .and.  dev_num2 == dev_num) stop 1
+if (x /= 42) stop 2
+end


[PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-02-03 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be built by "li;rldic".
We only need to take care of a negative "li"; other forms need no check.
For example, a negative "lis" is just a negative "li" with an additional shift.

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk or next stage1?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

---
 gcc/config/rs6000/rs6000.cc   | 60 ++-
 .../gcc.target/powerpc/const-build.c  | 28 +
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 025abaa436e..59b4e422058 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10361,6 +10361,63 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rldic.
+
+   If so, *SHIFT is set to the 'shift' operand of rldic; and *MASK is set
+   to the mask value about the 'mb' operand of rldic; and return true.
+   Return false otherwise.  */
+static bool
+can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
+{
+  /* There are 49 successive ones in the negative value of 'li'.  */
+  int ones = 49;
+
+  /* 1..1xx1..1: negative value of li --> 0..01..1xx0..0:
+ right bits are shifted to 0's, and left 1's (and x's) are cleaned.  */
+  int tz = ctz_hwi (c);
+  int lz = clz_hwi (c);
+  int middle_ones = clz_hwi (~(c << lz));
+  if (tz + lz + middle_ones >= ones)
+{
+  *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
+  *shift = tz;
+  return true;
+}
+
+  /* 1..1xx1..1 --> 1..1xx0..01..1: some 1's (following x's) are cleaned. */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (leading_ones + tailing_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
+  *shift = tailing_ones + middle_zeros;
+  return true;
+}
+
+  /* xx1..1xx: --> xx0..01..1xx: some 1's (following x's) are cleaned. */
+  /* Get the position of the first bit of the successive 1s.
+ The 24th bit would be in successive 0 or 1.  */
+  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  int pos_first_1 = ((c & (low_mask + 1)) == 0)
+ ? clz_hwi (c & low_mask)
+ : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+  middle_ones = clz_hwi (~c << pos_first_1);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  if (pos_first_1 < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1LL)
+   << (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  *shift = HOST_BITS_PER_WIDE_INT - pos_first_1 + middle_zeros;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10402,7 +10459,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 }
   else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
   || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
-  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask)
+  || can_be_built_by_li_and_rldic (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 8c209921d41..b503ee31c7c 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -82,6 +82,29 @@ lis_rldicr_12 (void)
   return 0x5310LL;
 }
 
+long long NOIPA
+li_rldic_13 (void)
+{
+  return 0x000f8531LL;
+}
+long long NOIPA
+li_rldic_14 (void)
+{
+  return 0x853100ffLL;
+}
+
+long long NOIPA
+li_rldic_15 (void)
+{
+  return 0x8031LL;
+}
+
+long long NOIPA
+li_rldic_16 (void)
+{
+  return 0x8f31LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -95,11 +118,16 @@ struct fun arr[] = {
   {li_rldicr_10, 0x8531fff0LL},
   {li_rldicr_11, 0x21f0LL},
   {lis_rldicr_12, 0x5310LL},
+  {li_rldic_13, 0x000f8531LL},
+  {li_rldic_14, 0x853100ffLL},
+  {li_rldic_15, 0x8031LL},
+  {li_rldic_16, 0x8f31LL}
 };
 
 /* { dg-final { 

[PATCH 2/4] rs6000: build constant via lis;rotldi

2023-02-03 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be rotated to/from a negative
value of "lis".  If so, we can use "lis;rotldi" to build it.
The positive values of "lis" do not need to be analyzed, because if a
constant can be rotated from a positive value of "lis", it can also be
rotated from a positive value of "li".
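A small self-contained sketch of "case a" of the new check (hypothetical
constant, not taken from the testcase):

  #include <cstdio>
  #include <cstdint>

  int main ()
  {
    // 25 leading 1s, 15 x-bits, a 16-bit run of 0s, 8 trailing 1s.
    const uint64_t c = 0xFFFFFF85310000FFULL;
    int trailing_ones = __builtin_ctzll (~c);                 // 8
    int leading_ones  = __builtin_clzll (~c);                 // 25
    int middle_zeros  = __builtin_ctzll (c >> trailing_ones); // 16
    // The negative-lis shape needs >= 16 zeros and >= 33 ones in total:
    bool ok = middle_zeros >= 16 && leading_ones + trailing_ones >= 33;
    // Rotating c left by 64 - trailing_ones (56) yields 0xFFFFFFFF85310000,
    // i.e. the value of "lis 9,0x8531"; one rotldi then rebuilds c.
    std::printf ("%d\n", ok);  // prints 1
    return 0;
  }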

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk or next stage1?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
function.
(can_be_built_by_li_and_rotldi): Rename to ...
(can_be_built_by_li_lis_and_rotldi): ... this function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

---
 gcc/config/rs6000/rs6000.cc   | 41 ---
 .../gcc.target/powerpc/const-build.c  | 16 +++-
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 82aba051c55..dcbd5820a52 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10256,18 +10256,49 @@ can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
   return can_be_rotated_to_lowbits (~c, 15, rot);
 }
 
-/* Check if value C can be built by 2 instructions: one is 'li', another is
-   rotldi.
+/* Check if C can be rotated to a negative value which 'lis' instruction is
+   able to load: 1..1xx0..0.  If so, set *ROT to the number by which C is
+   rotated, and return true.  Return false otherwise.  */
+static bool
+can_be_rotated_to_negative_lis (HOST_WIDE_INT c, int *rot)
+{
+  /* case a. 1..1xxx0..01..1: up to 15 x's, at least 16 0's.  */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (middle_zeros >= 16 && leading_ones + tailing_ones >= 33)
+{
+  *rot = HOST_BITS_PER_WIDE_INT - tailing_ones;
+  return true;
+}
+
+  /* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
+ rotated over highest bit.  */
+  int pos_one = clz_hwi ((c << 16) >> 16);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
+  int middle_ones = clz_hwi (~(c << pos_one));
+  if (middle_zeros >= 16 && middle_ones >= 33)
+{
+  *rot = pos_one;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rotldi.
 
If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
is set to -1, and return true.  Return false otherwise.  */
 static bool
-can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   HOST_WIDE_INT *mask)
 {
   int n;
   if (can_be_rotated_to_possitive_li (c, &n)
-  || can_be_rotated_to_negative_li (c, &n))
+  || can_be_rotated_to_negative_li (c, &n)
+  || can_be_rotated_to_negative_lis (c, &n))
 {
   *mask = HOST_WIDE_INT_M1;
   *shift = HOST_BITS_PER_WIDE_INT - n;
@@ -10316,7 +10347,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 70f095f6bf2..c38a1dd91f2 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -34,14 +34,28 @@ li_rotldi_4 (void)
   return 0x2194LL;
 }
 
+long long NOIPA
+lis_rotldi_5 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+lis_rotldi_6 (void)
+{
+  return 0x5318LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
   {li_rotldi_3, 0x8531LL},
   {li_rotldi_4, 0x2194LL},
+  {lis_rotldi_5, 0x8531LL},
+  {lis_rotldi_6, 0x5318LL},
 };
 
-/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 6 } } */
 
 int
 main ()
-- 
2.17.1



[PATCH 3/4] rs6000: build constant via li/lis;rldicl/rldicr

2023-02-03 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be formed by clearing the left/right
bits of a rotated negative value of "li/lis".  If so, we can build the
constant through "li/lis ; rldicl/rldicr".

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk or next stage1?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

---
 gcc/config/rs6000/rs6000.cc   | 57 ++-
 .../gcc.target/powerpc/const-build.c  | 44 ++
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index dcbd5820a52..025abaa436e 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10308,6 +10308,59 @@ can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicl.
+
+   If so, *SHIFT is set to the shift operand of rldicl, and *MASK is set to
+   the mask operand of rldicl, and return true.
+   Return false otherwise.  */
+static bool
+can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Leading zeros may be cleaned by rldicl with mask.  Change leading zeros
+ to ones and then recheck it.  */
+  int lz = clz_hwi (c);
+  HOST_WIDE_INT unmask_c
+= c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U >> lz;
+  *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicr.
+
+   If so, *SHIFT is set to the shift operand of rldicr, and *MASK is set to
+   the mask operand of rldicr, and return true.
+   Return false otherwise.  */
+static bool
+can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Trailing zeros may be cleaned by rldicr with mask.  Change trailing zeros
+ to ones and then recheck it.  */
+  int tz = ctz_hwi (c);
+  HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U << tz;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10347,7 +10400,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
-  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index c38a1dd91f2..8c209921d41 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -46,6 +46,42 @@ lis_rotldi_6 (void)
   return 0x5318LL;
 }
 
+long long NOIPA
+li_rldicl_7 (void)
+{
+  return 0x3ffa1LL;
+}
+
+long long NOIPA
+li_rldicl_8 (void)
+{
+  return 0xff8531LL;
+}
+
+long long NOIPA
+lis_rldicl_9 (void)
+{
+  return 0x00ff8531LL;
+}
+
+long long NOIPA
+li_rldicr_10 (void)
+{
+  return 0x8531fff0LL;
+}
+
+long long NOIPA
+li_rldicr_11 (void)
+{
+  return 0x21f0LL;
+}
+
+long long NOIPA
+lis_rldicr_12 (void)
+{
+  return 0x5310LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -53,9 +89,17 @@ struct fun arr[] = {
   {li_rotldi_4, 0x2194LL},
   {lis_rotldi_5, 0x8531LL},
   {lis_rotldi_6, 0x5318LL},
+  {li_rldicl_7, 0x3ffa1LL},
+  {li_rldicl_8, 0xff8531LL},
+  {lis_rldicl_9, 0x00ff8531LL},
+  {li_rldicr_10, 0x8531fff0LL},
+  {li_rldicr_11, 0x21f0LL},
+  {lis_rldicr_12, 0x5310LL},
 };
 
 /* { dg-final { scan-assembler-times {\mrotldi\M} 6 } } */
+/* { 

[PATCH 0/4] rs6000: build constant via li/lis;rldicX

2023-02-03 Thread Jiufu Guo via Gcc-patches
Hi,

For a given constant, it would be profitable if we can use 2 insns to build it.
This series enables building more constants through 2 insns: one is "li" or
"lis", the other is "rldicl", "rldicr" or "rldic".
By checking and analyzing the characteristics of the insns "li/lis;rldicX",
all the possible constant values are considered by this series.

Previously, a patch is posted, but it is too large.
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
As suggested, I split it into this series.

Considering the functionality and size, 4 patches are split as below:
1. Support the constants which can be built by "li;rotldi"
   Both positive and negative values from insn "li" are analyzed.
2. Support the constants which can be built by "lis;rotldi"
   We only need to analyze the negative value from "lis".
   And this patch uses more code to check leading 1s and trailing 0s from "lis".
3. Support the constants which can be built by "li/lis;rldicl/rldicr":
   Leverage the APIs defined/analyzed in patches 1 and 2,
   this patch checks the characteristics of the mask of "rldicl/rldicr"
   to support more constants.
4. Support the constants which can be built by "li/lis;rldic":
   The mask of "rldic" is relatively complicated, it is analyzed in this
   patch to support more constants.

BR,
Jeff (Jiufu)


[PATCH 1/4] rs6000: build constant via li;rotldi

2023-02-03 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be rotated to/from a positive
or negative value of "li".  If so, we can use "li;rotldi" to build it.
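A minimal standalone sketch (hypothetical constant; it mirrors the
rotate-to-lowbits idea rather than the GCC helpers themselves):

  #include <cstdio>
  #include <cstdint>

  static uint64_t rotl64 (uint64_t v, int n)
  { n &= 63; return n ? (v << n) | (v >> (64 - n)) : v; }

  int main ()
  {
    const uint64_t c = 0x1234000000000000ULL; // assumed example value
    // Rotating c left by 16 yields 0x1234, which fits the 15-bit
    // positive "li" range, so: li 9,0x1234 ; rotldi 9,9,48.
    uint64_t imm = rotl64 (c, 16);
    bool fits_li = imm < (1ULL << 15);
    uint64_t rebuilt = rotl64 (imm, 64 - 16);      // the rotldi step
    std::printf ("%d\n", fits_li && rebuilt == c); // prints 1
    return 0;
  }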

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk or next stage1?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_possitive_li): New 
function.
(can_be_rotated_to_negative_li): New function.
(can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.

---
 gcc/config/rs6000/rs6000.cc   | 63 +--
 .../gcc.target/powerpc/const-build.c  | 54 
 2 files changed, 111 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6ac3adcec6b..82aba051c55 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10238,6 +10238,45 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if C can be rotated to a positive value which the 'li' instruction
+   is able to load.  If so, set *ROT to the number by which C is rotated,
+   and return true.  Return false otherwise.  */
+static bool
+can_be_rotated_to_possitive_li (HOST_WIDE_INT c, int *rot)
+{
+  /* 49 leading zeros and 15 lowbits on the positive value
+ generated by 'li' instruction.  */
+  return can_be_rotated_to_lowbits (c, 15, rot);
+}
+
+/* Like can_be_rotated_to_possitive_li, but check negative value of 'li'.  */
+static bool
+can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
+{
+  return can_be_rotated_to_lowbits (~c, 15, rot);
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to -1, and return true.  Return false otherwise.  */
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  int n;
+  if (can_be_rotated_to_possitive_li (c, &n)
+  || can_be_rotated_to_negative_li (c, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10246,15 +10285,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10278,6 +10316,19 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  if (mask != HOST_WIDE_INT_M1)
+   temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..70f095f6bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main 

[PATCH v1] RISC-V: Change the generation mode of ADJUST_SP_RTX from gen_insn to gen_SET.

2023-02-03 Thread Jin Ma via Gcc-patches
The gen_insn method is used to generate ADJUST_SP_RTX here, which carries a
potential risk:

When the architecture adds pre-processing to `define_insn "adddi3"`, such as
a `define_expand "adddi3"`, gen_expand will be called automatically here,
causing the pattern to be emitted directly; the insn then enters the REG_NOTE
for `DWARF` instead of the pattern.

The following erroneous REG_NOTE occurred:
error: invalid rtl sharing found in the insn:
(insn 19 3 20 2 (parallel [
...
])
(expr_list:REG_CFA_ADJUST_CFA
(insn 18 0 0 (set (reg/f:DI 2 sp)
(plus:DI (reg/f:DI 2 sp)
(const_int -16 [0xfffffffffffffff0]))) -1
(nil

In fact, the correct one should be the following:
(insn 19 3 20 2 (parallel [
...
])
(expr_list:REG_CFA_ADJUST_CFA
(set (reg/f:DI 2 sp)
(plus:DI (reg/f:DI 2 sp)
(const_int -16 [0xfffffffffffffff0])

Following the treatment of arm and other architectures, it is more reasonable
to use gen_rtx_SET here.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): Change
gen_add3_insn to gen_rtx_SET.
(riscv_adjust_libcall_cfi_epilogue): Likewise.
---
 gcc/config/riscv/riscv.cc | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3b7804b7501..c9c6e53c6d0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5054,8 +5054,9 @@ riscv_adjust_libcall_cfi_prologue ()
   }
 
   /* Debug info for adjust sp.  */
-  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
-stack_pointer_rtx, GEN_INT (-saved_size));
+  adjust_sp_rtx =
+gen_rtx_SET (stack_pointer_rtx,
+gen_rtx_PLUS (GET_MODE (stack_pointer_rtx), stack_pointer_rtx, GEN_INT (-saved_size)));
   dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
  dwarf);
   return dwarf;
@@ -5176,8 +5177,9 @@ riscv_adjust_libcall_cfi_epilogue ()
   int saved_size = cfun->machine->frame.save_libcall_adjustment;
 
   /* Debug info for adjust sp.  */
-  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
-stack_pointer_rtx, GEN_INT (saved_size));
+  adjust_sp_rtx =
+gen_rtx_SET (stack_pointer_rtx,
+gen_rtx_PLUS (GET_MODE (stack_pointer_rtx), stack_pointer_rtx, GEN_INT (saved_size)));
   dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
  dwarf);
 
-- 
2.17.1



Re: [PATCH] ipa: Avoid invalid gimple when IPA-CP and IPA-SRA disagree on types (108384)

2023-02-03 Thread Martin Jambor
On Fri, Feb 03 2023, Richard Biener wrote:
> On Thu, Feb 2, 2023 at 5:20 PM Martin Jambor  wrote:
>>
>> Hi,
>>
>> when the compiled program contains type mismatches between callers and
>> callees when it comes to a parameter, IPA-CP can try to propagate one
>> constant from callers while IPA-SRA may try to split a parameter
>> expecting a value of a different size on the same offset.  This then
>> currently leads to creation of a VIEW_CONVERT_EXPR with mismatching
>> type sizes of LHS and RHS which is correctly flagged by the GIMPLE
>> verifier as invalid.
>>
>> It seems that the best course of action is to try and avoid the
>> situation altogether and so this patch adds a check to IPA-SRA that
>> peeks into the result of IPA-CP and when it sees a value on the same
>> offset but with a mismatching size, it just decides to leave that
>> particular parameter be.
>>
>> Bootstrapped and tested on x86_64-linux, OK for master?
>
> OK.  I suppose there are guards elsewhere that never lets a
> non-UHWI size type (like variable size or poly-int-size) through
> any of the SRA or CP lattices?

SRA tracks its accesses in simple integers so yes for that part.

As far as IPA-CP is concerned... all the values tracked conform to
is_gimple_ip_invariant, so are either ADDR_EXPRs of a global variable or
is_gimple_constant's.  So its size should never be variable and I hope
also never a complex poly-int.  If you think it would be better, I can
of course add the check.

Thanks,

Martin


>> gcc/ChangeLog:
>>
>> 2023-02-02  Martin Jambor  
>>
>> PR ipa/108384
>> * ipa-sra.cc (push_param_adjustments_for_index): Remove a size check
>> when comparing to an IPA-CP value.
>> (dump_list_of_param_indices): New function.
>> (adjust_parameter_descriptions): Check for mismatching IPA-CP values.
>> Dump removed candidates using dump_list_of_param_indices.
>> * ipa-param-manipulation.cc
>> (ipa_param_body_adjustments::modify_expression): Add assert checking
>> sizes of a VIEW_CONVERT_EXPR will match.
>> (ipa_param_body_adjustments::modify_assignment): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2023-02-02  Martin Jambor  
>>
>> PR ipa/108384
>> * gcc.dg/ipa/pr108384.c: New test.
>> ---


Re: [PATCH] [PR tree-optimization/18639] Compare nonzero bits in irange with widest_int.

2023-02-03 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 03, 2023 at 09:50:43AM +0100, Aldy Hernandez wrote:
> [PR tree-optimization/18639] Compare nonzero bits in irange with widest_int.

0 missing in the bug number in the subject line, though the current
recommended formatting of the subject is I think:
value-range: Compare nonzero bits in irange with widest_int [PR108639]


  
PR 108639/tree-optimization

Reversed component and number

> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -1259,7 +1259,10 @@ irange::legacy_equal_p (const irange &other) const
>  other.tree_lower_bound (0))
> && vrp_operand_equal_p (tree_upper_bound (0),
> other.tree_upper_bound (0))
> -   && get_nonzero_bits () == other.get_nonzero_bits ());
> +   && (widest_int::from (get_nonzero_bits (),
> + TYPE_SIGN (type ()))
> +   == widest_int::from (other.get_nonzero_bits (),
> +TYPE_SIGN (other.type ()))));
>  }
>  
>  bool
> @@ -1294,7 +1297,11 @@ irange::operator== (const irange &other) const
> || !operand_equal_p (ub, ub_other, 0))
>   return false;
>  }
> -  return get_nonzero_bits () == other.get_nonzero_bits ();
> +  widest_int nz1 = widest_int::from (get_nonzero_bits (),
> +  TYPE_SIGN (type ()));
> +  widest_int nz2 = widest_int::from (other.get_nonzero_bits (),
> +  TYPE_SIGN (other.type ()));
> +  return nz1 == nz2;
>  }

While the above avoids the ICE (and would certainly be correct for
the bounds, sign- or zero-extended to widest int depending on the sign
of their type), is the above what we want for non-zero bits
to be considered equal?  The wide_ints (which ought to have the precision
of the corresponding type) don't represent normal numbers but bitmasks
(0 - this bit is known to be zero, 1 - nothing is known about this bit).
So, if there are different precisions and the narrower value has 0
in the MSB of the bitmask (so MSB is known to be zero), the above requires
for equality that in the other range all upper bits are known to be zero
too for both signed and unsigned.  That is ok.  Similarly for MSB set
if TYPE_SIGN of the narrower is unsigned, the MSB value is unknown, but we
require the wider one to have all the upper bits cleared.  But for a signed
narrower type with MSB set, i.e. it is unknown if it is positive or
negative, the above requires that all the above bits are unknown too.
And that is the case I'm not sure about, whether in that case the
upper bits of the wider wide_int should be checked at all.
Though, perhaps from the POV of nonzero bits derived from the sign-extended
values in the ranges, sign bit copies (so all upper bits 1) are what one would
get, so maybe it is ok.  Just food for thought.
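A plain-integer illustration of the sign question (hypothetical 16-bit mask
value; ordinary integers standing in for the wide-int bitmasks):

  #include <cstdio>
  #include <cstdint>

  int main ()
  {
    uint16_t nz16 = 0x8000;  // 16-bit mask, MSB (sign bit) unknown
    int64_t as_signed   = (int16_t) nz16; // 0xFFFFFFFFFFFF8000
    int64_t as_unsigned = nz16;           // 0x0000000000008000
    // Sign extension marks every upper bit "unknown"; zero extension
    // marks them "known zero".  Whether those upper bits should take
    // part in the equality check is exactly the open question above.
    std::printf ("%016llx %016llx\n",
                 (unsigned long long) as_signed,
                 (unsigned long long) as_unsigned);
    return 0;
  }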

As for retesting, if you have done full bootstrap/regtest with the patch
without the testcases in it, it should be more than enough to test just
make check-gcc \
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} compile.exp=pr10863*.c'
You don't really need to rerun all tests just for it.

Jakub



[PATCH] ifcvt: Fix regression in aarch64/fcsel_1.c

2023-02-03 Thread Richard Sandiford via Gcc-patches
aarch64/fcsel_1.c contains:

double
f_2 (double a, double b, double c, double d)
{
  if (a > b)
return c;
  else
return d;
}

which started failing in the GCC 12 timeframe.  When it passed,
the RTL had the form:

[A]
  (set (reg ret) (reg c))
  (set (pc) (if_then_else (gt ...) (label_ref ret) (pc)))
edge to ret, fallthru to else
else:
  (set (reg ret) (reg d))
fallthru to ret
ret:
  ...exit...

i.e. a branch around.  Now the RTL has form:

[B]
  (set (reg ret) (reg d))
  (set (pc) (if_then_else (gt ...) (label_ref then) (pc)))
edge to then, fallthru to ret
ret:
  ...exit...

then:
  (set (reg ret) (reg c))
edge to ret

i.e. a branch out.

Both are valid, of course, and there's no easy way to predict
which we'll get.  But ifcvt canonicalises its representation on:

  if (cond) goto fallthru else goto non-fallthru

That is, it canonicalises on the branch-around case for half-diamonds.
It therefore wants to invert the comparison in [B] to get:

  if (...) goto ret else goto then

But that isn't possible for strict FP gt, so the optimisation fails.
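A tiny example of why strict FP gt has no inverse comparison (values are of
course arbitrary):

  #include <cmath>
  #include <cstdio>

  int main ()
  {
    double a = NAN, b = 1.0;
    // Both are false for a NaN operand: the logical inverse of "gt"
    // is "unle", not "le", so the branch condition cannot be flipped.
    std::printf ("%d %d\n", a > b, a <= b);  // prints "0 0"
    return 0;
  }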

Canonicalising on the branch-around case seems like the wrong choice for
half diamonds.  The natural way of expressing a conditional branch is
for the label_ref to be the "then" destination and pc to be the "else"
destination.  And the natural choice of condition seems to be the one
under which extra stuff *is* done, rather than the one under which extra
stuff *isn't* done.  But that decision goes back at least 20 years and
it doesn't seem like a good idea to change it in stage 4.

This patch instead allows the internal structure to store the
condition in inverted form.  For simplicity it handles only
conditional moves, which is the one case that is needed
to fix the known regression.  (There are probably unknown
regressions too, but still.)

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* ifcvt.h (noce_if_info::cond_inverted): New field.
* ifcvt.cc (cond_move_convert_if_block): Swap the then and else
values when cond_inverted is true.
(noce_find_if_block): Allow the condition to be inverted when
handling conditional moves.
---
 gcc/ifcvt.cc | 31 +++
 gcc/ifcvt.h  |  8 
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 008796838f7..63ef42b3c34 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -4253,6 +4253,9 @@ cond_move_convert_if_block (struct noce_if_info *if_infop,
e = dest;
}
 
+  if (if_infop->cond_inverted)
+   std::swap (t, e);
+
   target = noce_emit_cmove (if_infop, dest, code, cond_arg0, cond_arg1,
t, e);
   if (!target)
@@ -4405,7 +4408,6 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   basic_block then_bb, else_bb, join_bb;
   bool then_else_reversed = false;
   rtx_insn *jump;
-  rtx cond;
   rtx_insn *cond_earliest;
   struct noce_if_info if_info;
   bool speed_p = optimize_bb_for_speed_p (test_bb);
@@ -4481,25 +4483,28 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if (! onlyjump_p (jump))
 return FALSE;
 
-  /* If this is not a standard conditional jump, we can't parse it.  */
-  cond = noce_get_condition (jump, &cond_earliest, then_else_reversed);
-  if (!cond)
-return FALSE;
-
-  /* We must be comparing objects whose modes imply the size.  */
-  if (GET_MODE (XEXP (cond, 0)) == BLKmode)
-return FALSE;
-
   /* Initialize an IF_INFO struct to pass around.  */
   memset (&if_info, 0, sizeof if_info);
   if_info.test_bb = test_bb;
   if_info.then_bb = then_bb;
   if_info.else_bb = else_bb;
   if_info.join_bb = join_bb;
-  if_info.cond = cond;
+  if_info.cond = noce_get_condition (jump, &cond_earliest,
+then_else_reversed);
   rtx_insn *rev_cond_earliest;
   if_info.rev_cond = noce_get_condition (jump, &rev_cond_earliest,
 !then_else_reversed);
+  if (!if_info.cond && !if_info.rev_cond)
+return FALSE;
+  if (!if_info.cond)
+{
+  std::swap (if_info.cond, if_info.rev_cond);
+  std::swap (cond_earliest, rev_cond_earliest);
+  if_info.cond_inverted = true;
+}
+  /* We must be comparing objects whose modes imply the size.  */
+  if (GET_MODE (XEXP (if_info.cond, 0)) == BLKmode)
+return FALSE;
   gcc_assert (if_info.rev_cond == NULL_RTX
  || rev_cond_earliest == cond_earliest);
   if_info.cond_earliest = cond_earliest;
@@ -4518,7 +4523,9 @@ noce_find_if_block (basic_block test_bb, edge then_edge, 
edge else_edge,
 
   /* Do the real work.  */
 
-  if (noce_process_if_block (&if_info))
+  /* ??? noce_process_if_block has not yet been updated to handle
+ inverted conditions.  */
+  if (!if_info.cond_inverted && noce_process_if_block (&if_info))
 return TRUE;
 
   if (HAVE_conditional_move
diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h
index 

Re: [PATCH v5 0/5] P1689R5 support

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Fri, 3 Feb 2023 at 08:58, Jonathan Wakely wrote:
>
>
>
> On Fri, 3 Feb 2023, 04:09 Andrew Pinski via Gcc,  wrote:
>>
>> On Wed, Jan 25, 2023 at 1:07 PM Ben Boeckel via Fortran
>>  wrote:
>> >
>> > Hi,
>> >
>> > This patch series adds initial support for ISO C++'s [P1689R5][], a
>> > format for describing C++ module requirements and provisions based on
>> > the source code. This is required because compiling C++ with modules is
>> not embarrassingly parallel and needs to be ordered to ensure that
>> > `import some_module;` can be satisfied in time by making sure that any
>> > TU with `export import some_module;` is compiled first.
>>
>>
>> I like how folks are complaining that GCC outputs POSIX makefile
>> syntax from GCC's dependency files which are supposed to be in POSIX
>> Makefile syntax.
>> It seems rather that the build tools people like to use no longer
>> understand POSIX makefile syntax.
>> Also I am not a fan of json, it is too verbose for no use. Maybe it is
>> time to go back to standardizing a new POSIX makefile syntax rather
>> than changing C++ here.
>
>
>
> That would take a decade or more. It's too late for POSIX 202x and the pace 
> that POSIX agrees on makefile features is incredibly slow.

Also, name+=value is *not* POSIX make syntax today, that's an
extension. That's why the tools don't always support it.
So I don't think it's true that GCC's dependency files are in POSIX syntax.

POSIX 202x does add support for it, but it will take some time for it
to be supported everywhere.
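As a concrete illustration of the ordering constraint described in the quoted
series introduction (hypothetical file names; any names exporting and
importing a module would do):

  // a.cppm
  export module some_module;
  export int answer () { return 42; }

  // b.cpp -- cannot be compiled until a.cppm's module interface has
  // been built, which is what the P1689 dependency scan lets a build
  // system discover up front.
  import some_module;
  int main () { return answer (); }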


[PATCH] MIPS: use arch_32/64 instead of default_mips_arch

2023-02-03 Thread YunQiang Su
The value of default_mips_arch will always be used for -march by default,
no matter what value is given to -mabi.
It will produce abnormal elf file like:
 ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV)

So we use with_arch_32 and with_arch_64 instead of default_mips_arch
for all mipsisa[32,64]rN triples.

gcc/ChangeLog:
* config.gcc: use with_arch_32 and with_arch_64 instead of
default_mips_arch for mipsisa[32,64]rN triples.
---
 gcc/config.gcc | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0958e1c959..0b6d093d847 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2518,13 +2518,16 @@ mips*-*-linux*) # Linux MIPS, either endian.
extra_options="${extra_options} linux-android.opt"
case ${target} in
mipsisa32r6*)
-   default_mips_arch=mips32r6
+   with_arch_32="mips32r6"
+   with_arch_64="mips64r6"
;;
mipsisa32r2*)
-   default_mips_arch=mips32r2
+   with_arch_32="mips32r2"
+   with_arch_64="mips64r2"
;;
mipsisa32*)
-   default_mips_arch=mips32
+   with_arch_32="mips32"
+   with_arch_64="mips64"
;;
mips64el-st-linux-gnu)
default_mips_abi=n32
@@ -2540,22 +2543,26 @@ mips*-*-linux*) # Linux MIPS, either endian.
;;
mipsisa64r6*-*-linux-gnuabi64)
default_mips_abi=64
-   default_mips_arch=mips64r6
+   with_arch_32="mips32r6"
+   with_arch_64="mips64r6"
enable_mips_multilibs="yes"
;;
mipsisa64r6*-*-linux*)
default_mips_abi=n32
-   default_mips_arch=mips64r6
+   with_arch_32="mips32r6"
+   with_arch_64="mips64r6"
enable_mips_multilibs="yes"
;;
mipsisa64r2*-*-linux-gnuabi64)
default_mips_abi=64
-   default_mips_arch=mips64r2
+   with_arch_32="mips32r2"
+   with_arch_64="mips64r2"
enable_mips_multilibs="yes"
;;
mipsisa64r2*-*-linux*)
default_mips_abi=n32
-   default_mips_arch=mips64r2
+   with_arch_32="mips32r2"
+   with_arch_64="mips64r2"
enable_mips_multilibs="yes"
;;
mips64*-*-linux-gnuabi64 | mipsisa64*-*-linux-gnuabi64)
-- 
2.30.2



Re: [PATCH v5 0/5] P1689R5 support

2023-02-03 Thread Jonathan Wakely via Gcc-patches
On Fri, 3 Feb 2023, 04:09 Andrew Pinski via Gcc,  wrote:

> On Wed, Jan 25, 2023 at 1:07 PM Ben Boeckel via Fortran
>  wrote:
> >
> > Hi,
> >
> > This patch series adds initial support for ISO C++'s [P1689R5][], a
> > format for describing C++ module requirements and provisions based on
> > the source code. This is required because compiling C++ with modules is
> > not embarrassingly parallel and needs to be ordered to ensure that
> > `import some_module;` can be satisfied in time by making sure that any
> > TU with `export import some_module;` is compiled first.
>
>
> I like how folks are complaining that GCC outputs POSIX makefile
> syntax from GCC's dependency files which are supposed to be in POSIX
> Makefile syntax.
> It seems rather that the build tools people like to use no longer
> understand POSIX makefile syntax.
> Also I am not a fan of json, it is too verbose for no use. Maybe it is
> time to go back to standardizing a new POSIX makefile syntax rather
> than changing C++ here.
>


That would take a decade or more. It's too late for POSIX 202x and the pace
that POSIX agrees on makefile features is incredibly slow.


[PING] [PATCH 3/3] RISC-V: make the stack manipulation codes more readable.

2023-02-03 Thread Fei Gao
Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

>gcc/ChangeLog:
>
>    * config/riscv/riscv.cc (riscv_first_stack_step): Make the code more
>    readable.
>    (riscv_expand_epilogue): Likewise.
>---
> gcc/config/riscv/riscv.cc | 17 ++---
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index a50f2303032..95da08ffb3b 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -4926,8 +4926,11 @@ riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_siz
>   if (SMALL_OPERAND (remaining_const_size))
> return remaining_const_size;
>
>+  poly_int64 callee_saved_first_step =
>+    remaining_size - frame->frame_pointer_offset;
>+  gcc_assert(callee_saved_first_step.is_constant ());
>   HOST_WIDE_INT min_first_step =
>-    riscv_stack_align ((remaining_size - frame->frame_pointer_offset).to_constant());
>+    riscv_stack_align (callee_saved_first_step.to_constant ());
>   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
>   HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
>   gcc_assert (min_first_step <= max_first_step);
>@@ -4935,7 +4938,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_siz
>   /* As an optimization, use the least-significant bits of the total frame
>  size, so that the second adjustment step is just LUI + ADD.  */
>   if (!SMALL_OPERAND (min_second_step)
>-  && remaining_const_size % IMM_REACH < IMM_REACH / 2
>+  && remaining_const_size % IMM_REACH <= max_first_step
>   && remaining_const_size % IMM_REACH >= min_first_step)
> return remaining_const_size % IMM_REACH;
>
>@@ -5129,14 +5132,14 @@ riscv_adjust_libcall_cfi_epilogue ()
> void
> riscv_expand_epilogue (int style)
> {
>-  /* Split the frame into two.  STEP1 is the amount of stack we should
>- deallocate before restoring the registers.  STEP2 is the amount we
>- should deallocate afterwards.
>+  /* Split the frame into 3 steps. STEP1 is the amount of stack we should
>+ deallocate before restoring the registers. STEP2 is the amount we
>+ should deallocate afterwards including the callee saved regs. STEP3
>+ is the amount deallocated by save-restore libcall.
>
>  Start off by assuming that no registers need to be restored.  */
>   struct riscv_frame_info *frame = &cfun->machine->frame;
>   unsigned mask = frame->mask;
>-  poly_int64 step1 = frame->total_size;
>   HOST_WIDE_INT step2 = 0;
>   bool use_restore_libcall = ((style == NORMAL_RETURN)
>   && riscv_use_save_libcall (frame));
>@@ -5223,7 +5226,7 @@ riscv_expand_epilogue (int style)
>   if (use_restore_libcall)
> frame->mask = mask; /* Undo the above fib.  */
>
>-  step1 -= step2 + libcall_size;
>+  poly_int64 step1 = frame->total_size - step2 - libcall_size;
>
>   /* Set TARGET to BASE + STEP1.  */
>   if (known_gt (step1, 0))
>--
>2.17.1

[PING] [PATCH 2/3] RISC-V: optimize stack manipulation in save-restore

2023-02-03 Thread Fei Gao

Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

>The stack that save-restore reserves is not well accumulated into the stack
>allocation and deallocation.
>This patch allows fewer instructions to be used in stack allocation and
>deallocation when save-restore is enabled.
>
>before patch:
>  bar:
>    call   t0,__riscv_save_4
>    addi   sp,sp,-64
>    ...
>    li t0,-12288
>    addi   t0,t0,-1968 # optimized out after patch
>    add    sp,sp,t0 # prologue
>    ...
>    li t0,12288 # epilogue
>    addi   t0,t0,2000 # optimized out after patch
>    add    sp,sp,t0
>    ...
>    addi   sp,sp,32
>    tail   __riscv_restore_4
>
>after patch:
>  bar:
>    call   t0,__riscv_save_4
>    addi   sp,sp,-2032
>    ...
>    li t0,-12288
>    add    sp,sp,t0 # prologue
>    ...
>    li t0,12288 # epilogue
>    add    sp,sp,t0
>    ...
>    addi   sp,sp,2032
>    tail   __riscv_restore_4
>
>gcc/ChangeLog:
>
>    * config/riscv/riscv.cc (riscv_expand_prologue): Consider save-restore
>    in stack allocation.
>    (riscv_expand_epilogue): Consider save-restore in stack deallocation.
>
>gcc/testsuite/ChangeLog:
>
>    * gcc.target/riscv/stack_save_restore.c: New test.
>---
> gcc/config/riscv/riscv.cc | 50 ++-
> .../gcc.target/riscv/stack_save_restore.c | 40 +++
> 2 files changed, 66 insertions(+), 24 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index f0bbcd6d6be..a50f2303032 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -5010,12 +5010,12 @@ void
> riscv_expand_prologue (void)
> {
>   struct riscv_frame_info *frame = &cfun->machine->frame;
>-  poly_int64 size = frame->total_size;
>+  poly_int64 remaining_size = frame->total_size;
>   unsigned mask = frame->mask;
>   rtx insn;
>
>   if (flag_stack_usage_info)
>-    current_function_static_stack_size = constant_lower_bound (size);
>+    current_function_static_stack_size = constant_lower_bound 
>(remaining_size);
>
>   if (cfun->machine->naked_p)
> return;
>@@ -5026,7 +5026,7 @@ riscv_expand_prologue (void)
>   rtx dwarf = NULL_RTX;
>   dwarf = riscv_adjust_libcall_cfi_prologue ();
>
>-  size -= frame->save_libcall_adjustment;
>+  remaining_size -= frame->save_libcall_adjustment;
>   insn = emit_insn (riscv_gen_gpr_save_insn (frame));
>   frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>
>@@ -5037,16 +5037,14 @@ riscv_expand_prologue (void)
>   /* Save the registers.  */
>   if ((frame->mask | frame->fmask) != 0)
> {
>-  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
>-  if (size.is_constant ())
>-  step1 = MIN (size.to_constant(), step1);
>+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
>
>   insn = gen_add3_insn (stack_pointer_rtx,
>     stack_pointer_rtx,
>     GEN_INT (-step1));
>   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>-  size -= step1;
>-  riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>+  remaining_size -= step1;
>+  riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, false);
> }
>
>   frame->mask = mask; /* Undo the above fib.  */
>@@ -5055,29 +5053,29 @@ riscv_expand_prologue (void)
>   if (frame_pointer_needed)
> {
>   insn = gen_add3_insn (hard_frame_pointer_rtx, stack_pointer_rtx,
>-      GEN_INT ((frame->hard_frame_pointer_offset - size).to_constant ()));
>+      GEN_INT ((frame->hard_frame_pointer_offset - remaining_size).to_constant ()));
>   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>
>   riscv_emit_stack_tie ();
> }
>
>   /* Allocate the rest of the frame.  */
>-  if (known_gt (size, 0))
>+  if (known_gt (remaining_size, 0))
> {
>   /* Two step adjustment:
> 1.scalable frame. 2.constant frame.  */
>   poly_int64 scalable_frame (0, 0);
>-  if (!size.is_constant ())
>+  if (!remaining_size.is_constant ())
> {
>   /* First for scalable frame.  */
>-    poly_int64 scalable_frame = size;
>-    scalable_frame.coeffs[0] = size.coeffs[1];
>+    poly_int64 scalable_frame = remaining_size;
>+    scalable_frame.coeffs[0] = remaining_size.coeffs[1];
>   riscv_v_adjust_scalable_frame (stack_pointer_rtx, scalable_frame, false);
>-    size -= scalable_frame;
>+    remaining_size -= scalable_frame;
> }
>
>   /* Second step for constant frame.  */
>-  HOST_WIDE_INT constant_frame = size.to_constant ();
>+  HOST_WIDE_INT constant_frame = remaining_size.to_constant ();
>   if (constant_frame == 0)
> return;
>
>@@ -5142,6 +5140,8 @@ riscv_expand_epilogue (int style)
>   HOST_WIDE_INT 

[PING][PATCH 1/3] RISC-V: add a new parameter in riscv_first_stack_step.

2023-02-03 Thread Fei Gao

Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

>The frame->total_size to remaining_size conversion is done as an independent
>patch without functionality change, as per the review comment.
>
>gcc/ChangeLog:
>
>    * config/riscv/riscv.cc (riscv_first_stack_step): Add a new function
>    parameter remaining_size.
>    (riscv_compute_frame_info): Adapt to the new riscv_first_stack_step
>    interface.
>    (riscv_expand_prologue): Likewise.
>    (riscv_expand_epilogue): Likewise.
>---
> gcc/config/riscv/riscv.cc | 48 +++
> 1 file changed, 24 insertions(+), 24 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index 05bdba5ab4d..f0bbcd6d6be 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask)
>    They decrease stack_pointer_rtx but leave frame_pointer_rtx and
>    hard_frame_pointer_rtx unchanged.  */
>
>-static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame);
>+static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size);
>
> /* Handle stack align for poly_int.  */
> static poly_int64
>@@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void)
>  save/restore t0.  We check for this before clearing the frame struct.  */
>   if (cfun->machine->interrupt_handler_p)
> {
>-  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
>   if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
> interrupt_save_prologue_temp = true;
> }
>@@ -4913,45 +4913,45 @@ riscv_restore_reg (rtx reg, rtx mem)
>    without adding extra instructions.  */
>
> static HOST_WIDE_INT
>-riscv_first_stack_step (struct riscv_frame_info *frame)
>+riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size)
> {
>-  HOST_WIDE_INT frame_total_constant_size;
>-  if (!frame->total_size.is_constant ())
>-    frame_total_constant_size
>-  = riscv_stack_align (frame->total_size.coeffs[0])
>-  - riscv_stack_align (frame->total_size.coeffs[1]);
>+  HOST_WIDE_INT remaining_const_size;
>+  if (!remaining_size.is_constant ())
>+    remaining_const_size
>+  = riscv_stack_align (remaining_size.coeffs[0])
>+    - riscv_stack_align (remaining_size.coeffs[1]);
>   else
>-    frame_total_constant_size = frame->total_size.to_constant ();
>+    remaining_const_size = remaining_size.to_constant ();
>
>-  if (SMALL_OPERAND (frame_total_constant_size))
>-    return frame_total_constant_size;
>+  if (SMALL_OPERAND (remaining_const_size))
>+    return remaining_const_size;
>
>   HOST_WIDE_INT min_first_step =
>-    RISCV_STACK_ALIGN ((frame->total_size - frame->frame_pointer_offset).to_constant());
>+    riscv_stack_align ((remaining_size - frame->frame_pointer_offset).to_constant());
>   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
>-  HOST_WIDE_INT min_second_step = frame_total_constant_size - max_first_step;
>+  HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
>   gcc_assert (min_first_step <= max_first_step);
>
>   /* As an optimization, use the least-significant bits of the total frame
>  size, so that the second adjustment step is just LUI + ADD.  */
>   if (!SMALL_OPERAND (min_second_step)
>-  && frame_total_constant_size % IMM_REACH < IMM_REACH / 2
>-  && frame_total_constant_size % IMM_REACH >= min_first_step)
>-    return frame_total_constant_size % IMM_REACH;
>+  && remaining_const_size % IMM_REACH < IMM_REACH / 2
>+  && remaining_const_size % IMM_REACH >= min_first_step)
>+    return remaining_const_size % IMM_REACH;
>
>   if (TARGET_RVC)
> {
>   /* If we need two subtracts, and one is small enough to allow compressed
>-  loads and stores, then put that one first.  */
>+ loads and stores, then put that one first.  */
>   if (IN_RANGE (min_second_step, 0,
>-      (TARGET_64BIT ? SDSP_REACH : SWSP_REACH)))
>-  return MAX (min_second_step, min_first_step);
>+    (TARGET_64BIT ? SDSP_REACH : SWSP_REACH)))
>+   return MAX (min_second_step, min_first_step);
>
>   /* If we need LUI + ADDI + ADD for the second adjustment step, then start
>-  with the minimum first step, so that we can get compressed loads and
>-  stores.  */
>+ with the minimum first step, so that we can get compressed loads and
>+ stores.  */
>   else if (!SMALL_OPERAND (min_second_step))
>-  return min_first_step;
>+   return min_first_step;
> }
>
>   return max_first_step;
>@@ -5037,7 +5037,7 @@ riscv_expand_prologue (void)
>   /* Save the registers.  */
>   if ((frame->mask | 
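
For illustration, the least-significant-bits optimization in
riscv_first_stack_step can be modelled standalone.  This is a simplified
sketch, not the GCC function (it ignores frame_pointer_offset and the RVC
cases); the constants mirror RISC-V's signed 12-bit I-type immediate.

#include <cstdio>

const long IMM_REACH = 0x1000;   /* 2^12, span of an I-type immediate */

static bool
small_operand (long x)           /* fits a signed 12-bit immediate? */
{
  return x >= -2048 && x < 2048;
}

static long
first_stack_step (long size)
{
  if (small_operand (size))
    return size;                 /* a single ADDI suffices */

  long max_first_step = IMM_REACH / 2 - 16;   /* 16 = stack alignment */
  long min_second_step = size - max_first_step;

  /* The trick from the hunk above: take size % IMM_REACH first, so the
     second step is a multiple of IMM_REACH and needs only LUI + ADD.  */
  if (!small_operand (min_second_step)
      && size % IMM_REACH < IMM_REACH / 2)
    return size % IMM_REACH;

  return max_first_step;
}

int
main ()
{
  long size = 0x12345;
  long step1 = first_stack_step (size);
  std::printf ("step1=%ld step2=0x%lx\n", step1, size - step1);
  /* prints step1=837 step2=0x12000: the remainder is LUI-friendly.  */
  return 0;
}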

[PING] [PATCH 0/3] RISC-V: optimize stack manipulation in save-restore

2023-02-03 Thread Fei Gao

Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

On 2022-12-01 18:03  Fei Gao  wrote:
>
>The patches allow less instructions to be used in stack allocation
>and deallocation if save-restore enabled, and also make the stack
>manipulation codes more readable.
>
>Fei Gao (3):
>  RISC-V: add a new parameter in riscv_first_stack_step.
>  RISC-V: optimize stack manipulation in save-restore
>  RISC-V: make the stack manipulation codes more readable.
>
> gcc/config/riscv/riscv.cc | 105 +-
> .../gcc.target/riscv/stack_save_restore.c |  40 +++
> 2 files changed, 95 insertions(+), 50 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>
>--
>2.17.1

[PATCH] [PR tree-optimization/108639] Compare nonzero bits in irange with widest_int.

2023-02-03 Thread Aldy Hernandez via Gcc-patches
The problem here is we are trying to compare two ranges with different
precisions and the == operator in wide_int is complaining.

Interestingly, the problem is not the nonzero bits, but the fact that
the entire ranges have different precisions.  The reason we don't ICE
when comparing the sub-ranges, is because the code in
irange::operator== works on trees, and tree_int_cst_equal is
promoting the comparison to a widest int:

  if (TREE_CODE (t1) == INTEGER_CST
  && TREE_CODE (t2) == INTEGER_CST
  && wi::to_widest (t1) == wi::to_widest (t2))
return 1;

This is why we don't see the ICE until the nonzero bits comparison is
done on wide ints.  I think we should maintain the current equality
behavior, and follow suit in the nonzero bit comparison.

I have also fixed the legacy equality code, even though technically
nonzero bits shouldn't appear in legacy.  But better safe than sorry.
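
The failure mode and the shape of the fix are easy to reproduce with a toy
fixed-precision type.  The sketch below is self-contained and only mirrors
what widest_int::from does: extend each operand out of its stated precision
into one common wider type, after which == is well-defined even when the
operands' precisions differ.

#include <cstdint>
#include <iostream>

struct toy_wide_int         /* stand-in for wide_int: value + precision */
{
  std::uint64_t val;
  unsigned precision;       /* in bits */
  bool is_signed;
};

/* Mirror of widest_int::from: promote to a common 64-bit precision,
   sign-extending if the value's type is signed.  */
static std::int64_t
to_widest (const toy_wide_int &x)
{
  if (x.precision >= 64)
    return (std::int64_t) x.val;
  std::uint64_t sign = (std::uint64_t) 1 << (x.precision - 1);
  std::uint64_t trunc = x.val & (((std::uint64_t) 1 << x.precision) - 1);
  return x.is_signed ? (std::int64_t) ((trunc ^ sign) - sign)
                     : (std::int64_t) trunc;
}

int
main ()
{
  toy_wide_int a{0xFF, 32, true};   /* nonzero bits of a 32-bit range */
  toy_wide_int b{0xFF, 64, true};   /* nonzero bits of a 64-bit range */
  /* Comparing a.val/b.val at mismatched precisions is what tripped the
     assert in wide_int's operator==; promoting first is always safe.  */
  std::cout << (to_widest (a) == to_widest (b)) << '\n';   /* prints 1 */
  return 0;
}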

PR tree-optimization/108639

Re-running tests with Jakub's testcases for both PR108638 and PR108639.

OK pending tests?

gcc/ChangeLog:

* value-range.cc (irange::legacy_equal_p): Compare nonzero bits as
widest_int.
(irange::operator==): Same.
---
 gcc/testsuite/gcc.c-torture/compile/pr108638.c | 12 
 gcc/testsuite/gcc.c-torture/compile/pr108639.c | 11 +++
 gcc/value-range.cc | 11 +--
 3 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr108638.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr108639.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr108638.c 
b/gcc/testsuite/gcc.c-torture/compile/pr108638.c
new file mode 100644
index 000..755c151a09a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr108638.c
@@ -0,0 +1,12 @@
+/* PR tree-optimization/108638 */
+
+long long a;
+int b;
+
+void
+foo (void)
+{
+  for (a = 0; a < __SIZEOF_LONG_LONG__ * __CHAR_BIT__; a++)
+if (b)
+  b |= a << a;
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr108639.c 
b/gcc/testsuite/gcc.c-torture/compile/pr108639.c
new file mode 100644
index 000..ed826cc2f5a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr108639.c
@@ -0,0 +1,11 @@
+/* PR tree-optimization/108639 */
+
+long long a;
+
+int
+main ()
+{
+  a = a ? 0 || 0 % 0 : 0;
+  a = a << a;
+  return 0;
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 26f6f26b01a..a535337c47a 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1259,7 +1259,10 @@ irange::legacy_equal_p (const irange &other) const
   other.tree_lower_bound (0))
  && vrp_operand_equal_p (tree_upper_bound (0),
  other.tree_upper_bound (0))
- && get_nonzero_bits () == other.get_nonzero_bits ());
+          && (widest_int::from (get_nonzero_bits (),
+                                TYPE_SIGN (type ()))
+              == widest_int::from (other.get_nonzero_bits (),
+                                   TYPE_SIGN (other.type ()))));
 }
 
 bool
@@ -1294,7 +1297,11 @@ irange::operator== (const irange &other) const
  || !operand_equal_p (ub, ub_other, 0))
return false;
 }
-  return get_nonzero_bits () == other.get_nonzero_bits ();
+  widest_int nz1 = widest_int::from (get_nonzero_bits (),
+                                     TYPE_SIGN (type ()));
+  widest_int nz2 = widest_int::from (other.get_nonzero_bits (),
+                                     TYPE_SIGN (other.type ()));
+  return nz1 == nz2;
 }
 
 /* Return TRUE if this is a symbolic range.  */
-- 
2.39.1



Re: [PATCH] tree: Use comdat tree_code_{type,length} even for C++11/14 [PR108634]

2023-02-03 Thread Jakub Jelinek via Gcc-patches
On Thu, Feb 02, 2023 at 03:30:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Tested in non-bootstrapped build with both -std=gnu++17 and -std=gnu++11,
> ok for trunk if it passes full bootstrap/regtest?

Bootstrapped/regtested successfully on x86_64-linux and i686-linux
(gcc 12 as stage1 compiler) and on powerpc64le-linux (gcc 4.8.5 as
stage1 compiler, testing the other paths).

> 2023-02-02  Jakub Jelinek  
> 
>   PR plugins/108634
>   * tree-core.h (tree_code_type, tree_code_length): For C++11 or
>   C++14, don't declare as extern const arrays.
>   (tree_code_type_tmpl, tree_code_length_tmpl): New types with
>   static constexpr member arrays for C++11 or C++14.
>   * tree.h (TREE_CODE_CLASS): For C++11 or C++14 use
>   tree_code_type_tmpl <0>::tree_code_type instead of tree_code_type.
>   (TREE_CODE_LENGTH): For C++11 or C++14 use
>   tree_code_length_tmpl <0>::tree_code_length instead of
>   tree_code_length.
>   * tree.cc (tree_code_type, tree_code_length): Remove.

Jakub
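
For readers following along, the trick the quoted ChangeLog describes
already works in C++11.  A minimal, self-contained sketch (illustrative
array contents; the real tables live in tree-core.h): a constexpr static
data member of a class template is instantiated with vague (comdat)
linkage in every translation unit that uses it, so the linker merges the
copies, mimicking a C++17 inline variable.

template <int dummy>
struct tree_code_type_tmpl
{
  static constexpr char tree_code_type[] = { 0, 1, 2 };
};

/* Out-of-line definition, required before C++17; still comdat.  */
template <int dummy>
constexpr char tree_code_type_tmpl<dummy>::tree_code_type[];

/* Usage mirroring the TREE_CODE_CLASS change in tree.h.  */
#define TREE_CODE_TYPE(code) tree_code_type_tmpl<0>::tree_code_type[code]

int
main ()
{
  return TREE_CODE_TYPE (1) == 1 ? 0 : 1;
}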