[PATCH v2] LoongArch: testsuite: Fix failures in gen-vect-{2,25}.c.

2024-01-12 Thread chenxiaolong
1. Added dg-do compile on LoongArch.
   When binutils does not support the vector instruction sets, an error occurs
   because the assembler does not recognize vector instructions.

2. Added the "-mlsx" option for vectorization on LoongArch.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/gen-vect-2.c: Added detection of compilation
behavior and "-mlsx" option on LoongArch.
* gcc.dg/tree-ssa/gen-vect-25.c: Ditto.
---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c  | 2 ++
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
index b84f3184427..a35999a172a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
@@ -1,6 +1,8 @@
 /* { dg-do run { target vect_cmdline_needed } } */
+/* { dg-do compile { target { loongarch_sx && {! loongarch_sx_hw } } } } */
 /* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
 /* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
index 18fe1aa1502..9f14a54c413 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
@@ -1,6 +1,8 @@
 /* { dg-do run { target vect_cmdline_needed } } */
+/* { dg-do compile { target { loongarch_sx && {! loongarch_sx_hw } } } } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */
 
 #include 
 
-- 
2.20.1



[PATCH v2] LoongArch: testsuite: Added additional vectorization "-mlsx" option.

2024-01-12 Thread chenxiaolong
gcc/testsuite/ChangeLog:

* gcc.dg/pr104992.c: Added additional "-mlsx" compilation options.
* gcc.dg/signbit-2.c: Ditto.
* gcc.dg/tree-ssa/scev-16.c: Ditto.
* gfortran.dg/graphite/vect-pr40979.f90: Ditto.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
---
 gcc/testsuite/gcc.dg/pr104992.c| 1 +
 gcc/testsuite/gcc.dg/signbit-2.c   | 1 +
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c| 1 +
 gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90| 1 +
 gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
 5 files changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
index 82f8c75559c..a77992fa491 100644
--- a/gcc/testsuite/gcc.dg/pr104992.c
+++ b/gcc/testsuite/gcc.dg/pr104992.c
@@ -1,6 +1,7 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
+/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
 
 #define vector __attribute__((vector_size(4*sizeof(int
 
diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index 62bb4047d74..5511bb78149 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
 /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
+/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
 /* { dg-skip-if "no fallback for MVE" { arm_mve } } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
index 120f40c0b6c..06cfbbcfae5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */
 
 int A[1024 * 2];
 
diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
index a42290948c4..6f2ad1166a4 100644
--- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
@@ -1,6 +1,7 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_double }
 ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && ilp32 } } }
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
 
 module mqc_m
 integer, parameter, private :: longreal = selected_real_kind(15,90)
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 08965cc5e20..97b88821731 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -2,6 +2,7 @@
 ! { dg-require-effective-target vect_double }
 ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
 ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } }
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
 ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
 
 *** RESID COMPUTES THE RESIDUAL:  R = V - AU
-- 
2.20.1



Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-12 Thread chenglulu



On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote:

On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:


I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
we need a target hook to tell the generic code that
UNSPEC_LA_PCREL_64_PART{1,2} are just wrappers around symbols, or we'll
see millions of lines of messages like

../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
UNSPEC_LA_PCREL_64_PART1 (42) found in variable location

I built GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't reproduced the
problem you mentioned.

     $ ../configure --host=loongarch64-linux-gnu --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
     --with-arch=loongarch64 --with-abi=lp64d --enable-tls --enable-languages=c,c++,fortran,lto --enable-plugin \
     --disable-multilib --disable-host-shared --enable-bootstrap --enable-checking=release
     $ make BOOT_FLAGS="-mcmodel=extreme"

What did I do wrong?:-(

BOOT_CFLAGS, not BOOT_FLAGS :).


This is so strange. My compilation here stopped due to syntax problems,
and I still haven't reproduced the information you mentioned about
UNSPEC_LA_PCREL_64_PART1.





[PATCH] LoongArch: Assign the '/u' attribute to the mem to which the global offset table belongs.

2024-01-12 Thread Lulu Cheng
gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_split_symbol):
Assign the '/u' attribute to the mem.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/got-load.C: New test.
---
 gcc/config/loongarch/loongarch.cc |  5 +
 gcc/testsuite/g++.target/loongarch/got-load.C | 19 +++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/loongarch/got-load.C

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3b8559bfdc8..82467474288 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3202,6 +3202,11 @@ loongarch_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out)
  rtx mem = gen_rtx_MEM (Pmode, low);
  *low_out = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, mem),
 UNSPEC_LOAD_FROM_GOT);
+
+ /* Nonzero in a mem, if the memory is statically allocated and
+read-only.  A common example of the latter is a shared library's
+global offset table.  */
+ MEM_READONLY_P (mem) = 1;
}
 
  break;
diff --git a/gcc/testsuite/g++.target/loongarch/got-load.C b/gcc/testsuite/g++.target/loongarch/got-load.C
new file mode 100644
index 000..20924c73942
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/got-load.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal -fdump-rtl-expand" } */
+/* { dg-final { scan-rtl-dump-times "mem/u" 2 "expand" } } */
+
+#include 
+
+using namespace std;
+
+int lr[15][2];
+
+void
+test(void)
+{
+  int n;
+
+  cin >> n;
+  for (int i = 0; i < n; ++i)
+cin >> lr[i][0] >> lr[i][1];
+}
-- 
2.39.3



[COMMITTED] Add a few testcases for fixed missed optimization regressions

2024-01-12 Thread Andrew Pinski
Adds a few new testcases for some missed optimization regressions.
The analysis on how each should be optimized is in the testcases
themselves (and in the bug report).

Committed as obvious after running the testsuite to make sure they pass.

PR tree-optimization/107823
PR tree-optimization/110768
PR tree-optimization/110941
PR tree-optimization/110450
PR tree-optimization/110841

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-thread-22.c: New test.
* gcc.dg/tree-ssa/vrp-loop-1.c: New test.
* gcc.dg/tree-ssa/vrp-loop-2.c: New test.
* gcc.dg/tree-ssa/vrp-unreachable-1.c: New test.
* gcc.dg/tree-ssa/vrp-unreachable-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-22.c | 23 +
 gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-1.c| 34 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-2.c| 33 ++
 .../gcc.dg/tree-ssa/vrp-unreachable-1.c   | 26 ++
 .../gcc.dg/tree-ssa/vrp-unreachable-2.c   | 29 
 5 files changed, 145 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-22.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable-2.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-22.c
new file mode 100644
index 000..f605009d8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-22.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-optimized" } */
+/* PR tree-optimization/107823 */
+/* With jump threading across the loop header,
+   we should figure out that b is always 0 and remove
+   the call to foo.  */
+
+int a;
+void bar64_(void);
+void foo();
+int main() {
+  signed char b = a = 6;
+  for (; a; a = 0) {
+bar64_();
+b = 0;
+  }
+  if (b <= 0)
+;
+  else
+foo();
+}
+
+/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
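
The reasoning in the test's comment can be checked by hand. The following is a minimal C++ sketch, not part of the patch, that mirrors the control flow of ssa-thread-22.c with counters standing in for the external calls. The names `thread22_result` and `simulate_thread22` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Illustrative only: counters replace the calls to bar64_() and foo().
struct thread22_result {
  int bar_calls;  // how many times the loop body ran
  int foo_calls;  // how many times the guarded call would run
  int final_b;    // value of b when the final test is reached
};

inline thread22_result simulate_thread22()
{
  thread22_result r = {0, 0, 0};
  int a;
  signed char b = a = 6;    // b starts at 6 and a is nonzero
  for (; a; a = 0)          // entered exactly once, then a becomes 0
    {
      ++r.bar_calls;        // stands in for bar64_()
      b = 0;
    }
  assert(a == 0);           // the loop can only exit with a == 0
  if (!(b <= 0))            // same condition as the else branch in the test
    ++r.foo_calls;          // stands in for foo()
  r.final_b = b;
  return r;
}
```

Because `a` is nonzero on entry, the body always runs and `b` is 0 at the final test, which is the fact jump threading across the loop header is expected to discover.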
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-1.c
new file mode 100644
index 000..09de8924308
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-optimized" } */
+/* PR tree-optimization/110768 */
+/* The call to foo should be able to be removed.
+   The branch to unreachable is unreachable, as
+   VRP (ranger) figures out that c there can only
+   be -20409 or 0.  Before r14-5109-ga291237b628f41
+   ranger could not figure that out.  */
+   
+
+void foo(void);
+static int a, b;
+int main() {
+{
+short c = 45127;
+signed char d;
+b = 0;
+for (; b <= 3; b++) {
+if (b) continue;
+d = 0;
+for (; d <= 100; d++) {
+if (!(((c) >= -20409) && ((c) <= 1))) {
+__builtin_unreachable();
+}
+if (~(0 == a) & 1) return b;
+c = 0;
+for (; c <= 0; c++) a = 3;
+}
+}
+foo();
+}
+}
+
+/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
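
The value -20409 in the comment comes from the initializer `short c = 45127`, which does not fit in a 16-bit short. The sketch below, assuming a 16-bit `short` as on common GCC targets, does the wrap-around by hand so the arithmetic is visible. `wrap_to_int16`, `guard_holds` and `check_initial_c` are hypothetical helper names, not anything from the patch.

```cpp
#include <cassert>

// Reduce a value modulo 2^16 into the signed 16-bit range [-32768, 32767].
// GCC performs this reduction when an out-of-range constant initializes a
// 16-bit short (the conversion is implementation-defined before C++20).
constexpr int wrap_to_int16(long value)
{
  return static_cast<int>(((value % 65536) + 65536 + 32768) % 65536 - 32768);
}

// The range test that guards __builtin_unreachable() in vrp-loop-1.c.
constexpr bool guard_holds(int c)
{
  return c >= -20409 && c <= 1;
}

static_assert(wrap_to_int16(45127) == -20409, "45127 - 65536 == -20409");

inline void check_initial_c()
{
  const int c = wrap_to_int16(45127);
  assert(guard_holds(c));   // the initial value sits on the lower bound
}
```

So the initial value of `c` lies exactly on the lower bound of the guarded range, which is why the range test can be folded once the range is known.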
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-2.c
new file mode 100644
index 000..7438c55aaef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-loop-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* PR tree-optimization/110941 */
+/* The call to foo should be able to be removed,
+   VRP should figure out `(c >= 2 && c <= 26)`
+   is always true.  */  
+
+static int a;
+void foo(void);
+void bar349_(void);
+void bar363_(void);
+void bar275_(void);
+int main() {
+  {
+{
+  short b = 26;
+  for (; b >= 1; b = b - 4) {
+if (b >= 2 && b <= 26)
+  bar275_();
+if (a)
+  bar363_();
+if (a)
+  bar349_();
+int c = b;
+if (!(c >= 2 && c <= 26))
+  foo();
+  }
+}
+a = 0;
+  }
+}
+
+/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
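
To see why `(c >= 2 && c <= 26)` is always true, it is enough to enumerate the values `b` takes. This is a small C++ sketch under that reading of the test; `loop2_values` and `all_in_range` are hypothetical helpers introduced only here.

```cpp
#include <cassert>
#include <vector>

// Collect every value b takes in the loop header of vrp-loop-2.c.
inline std::vector<int> loop2_values()
{
  std::vector<int> seen;
  for (short b = 26; b >= 1; b = static_cast<short>(b - 4))
    seen.push_back(b);
  return seen;
}

// True when every element of v lies in the closed interval [lo, hi].
inline bool all_in_range(const std::vector<int>& v, int lo, int hi)
{
  for (int x : v)
    if (x < lo || x > hi)
      return false;
  return true;
}

inline void check_loop2()
{
  assert(all_in_range(loop2_values(), 2, 26));
}
```

The loop visits 26, 22, 18, 14, 10, 6 and 2, then stops because the next value would be -2. Every visited value is inside [2, 26], so the branch that calls foo is dead.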
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable-1.c
new file mode 100644
index 000..76ef5017577
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/110450 */
+/* the ranger should be able to figure out that based on the
+   unreachable part of d not being zero, *b is also never 0.
+*/
+
+
+void foo(void);
+static int a = 1;
+static int *b = , *c = 
+static short d, e;
+static signed char f = 11;
+static signed char(g)(signed char h, int i) { return h << i; }
+int 

[PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF [GCC 14 regression]

2024-01-12 Thread Juzhe-Zhong
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)

GCC 13.2.0:

lui a5,%hi(a)
li  a4,19
sb  a4,%lo(a)(a5)
li  a0,0
ret

Trunk GCC:

vsetvli a5,zero,e8,mf2,ta,ma
li  a4,-32768
vid.v   v1
vsetvli zero,zero,e16,m1,ta,ma
addiw   a4,a4,104
vmv.v.i v3,15
lui a1,%hi(a)
li  a0,19
vsetvli zero,zero,e8,mf2,ta,ma
vadd.vi v1,v1,1
sb  a0,%lo(a)(a1)
vsetvli zero,zero,e16,m1,ta,ma
vzext.vf2   v2,v1
vmv.v.x v1,a4
vminu.vv v2,v2,v3
vsrl.vv v1,v1,v2
vslidedown.vi   v1,v1,17
vmv.x.s a0,v1
snez a0,a0
ret

The root cause is that we vectorize the code inefficiently because we don't
cost len when NITERS < VF.
Following the loop-control costing used by mask targets and rs6000 fixes the regression.

Tested with no regressions. OK for trunk?

PR target/113281

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::adjust_vect_cost_per_loop): New function.
(costs::finish_cost): Adjust cost.
* config/riscv/riscv-vector-costs.h: New function.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 61 +++
 gcc/config/riscv/riscv-vector-costs.h |  2 +
 .../vect/costmodel/riscv/rvv/pr113281-3.c | 18 ++
 .../vect/costmodel/riscv/rvv/pr113281-4.c | 18 ++
 4 files changed, 99 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 1c3708f23a0..9c0b9a874de 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1110,9 +1110,70 @@ costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   return record_stmt_cost (stmt_info, where, count * stmt_cost);
 }
 
+/* For some target specific vectorization cost which can't be handled per stmt,
+   we check the requisite conditions and adjust the vectorization cost
+   accordingly if satisfied.  One typical example is to model and adjust
+   loop_len cost for known_lt (NITERS, VF).  */
+
+void
+costs::adjust_vect_cost_per_loop (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+  && !LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && m_num_vector_iterations == 1
+  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+  LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+{
+  /* In middle-end loop vectorizer, we don't count the loop_len cost in
+vect_estimate_min_profitable_iters when NITERS < VF, that is, we only
+count cost of len that we need to iterate loop more than once with VF
+(m_num_vector_iterations > 1).  It's correct for most of the cases:
+
+E.g. VF = [4, 4]
+  for (int i = 0; i < 3; i ++)
+a[i] += b[i];
+
+We don't need to cost MIN_EXPR or SELECT_VL for the case above.
+
+However, for some inefficient vectorized cases, it does use MIN_EXPR
+to generate len.
+
+E.g. VF = [256, 256]
+
+Loop body:
+  # loop_len_110 = PHI <18(2), _119(11)>
+  ...
+  _117 = MIN_EXPR ;
+  _118 = 18 - _117;
+  _119 = MIN_EXPR <_118, POLY_INT_CST [256, 256]>;
+  ...
+
+Epilogue:
+  ...
+  _112 = .VEC_EXTRACT (vect_patt_27.14_109, _111);
+
+We cost 1 unconditionally for this situation like other targets which
+apply mask as the loop control.  */
+  rgroup_controls *rgc;
+  unsigned int num_vectors_m1;
+  unsigned int body_stmts = 0;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+   if (rgc->type)
+ body_stmts += num_vectors_m1 + 1;
+
+  add_stmt_cost (body_stmts, scalar_stmt, NULL, NULL, NULL_TREE, 0,
+vect_body);
+}
+}
+
 void
 costs::finish_cost (const vector_costs *scalar_costs)
 {
+  if (loop_vec_info loop_vinfo = dyn_cast (m_vinfo))
+{
+  adjust_vect_cost_per_loop (loop_vinfo);
+}
   vector_costs::finish_cost (scalar_costs);
 }
 
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index 9bf041bb65c..3defd45fd4c 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -101,6 +101,8 @@ private:
  V_REGS spills according to the analysis.  */
   bool m_has_unexpected_spills_p = false;
   void record_potential_unexpected_spills (loop_vec_info);
+
+  void adjust_vect_cost_per_loop (loop_vec_info);
 };
 
 } // namespace 
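
The bookkeeping in adjust_vect_cost_per_loop can be modelled outside the compiler. The following C++ sketch is a simplified stand-in under stated assumptions, not GCC code: `len_rgroup` replaces an entry of LOOP_VINFO_LENS and only records whether the entry is in use (the patch tests `rgc->type`), and `loop_len`, `vector_iterations` and `extra_len_stmts` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one length rgroup; in_use models "rgc->type is set".
struct len_rgroup {
  bool in_use;
};

// Length processed by one vector iteration: MIN (remaining, VF).
inline unsigned loop_len(unsigned remaining, unsigned vf)
{
  assert(vf > 0);
  return std::min(remaining, vf);
}

// Number of vector iterations needed for niters scalar iterations.
inline unsigned vector_iterations(unsigned niters, unsigned vf)
{
  unsigned iters = 0;
  for (unsigned remaining = niters; remaining > 0;
       remaining -= loop_len(remaining, vf))
    ++iters;
  return iters;
}

// Mirrors the accumulation in the patch: the entry at index i controls
// i + 1 vectors, so each used entry adds i + 1 statements to the body cost.
inline unsigned extra_len_stmts(const std::vector<len_rgroup>& lens)
{
  unsigned body_stmts = 0;
  for (unsigned i = 0; i < lens.size(); ++i)
    if (lens[i].in_use)
      body_stmts += i + 1;
  return body_stmts;
}
```

With NITERS = 18 and VF = 256, as in the example from the comment, `vector_iterations` returns 1, which is the single-iteration case the patch targets. A loop with one used rgroup then gets one extra statement of cost.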

[PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-01-12 Thread Thiago Jung Bauermann
Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into
permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration
into a permerror") these tests fail with errors such as:

  FAIL: gcc.target/arm/pr59858.c (test for excess errors)
  FAIL: gcc.target/arm/pr65647.c (test for excess errors)
  FAIL: gcc.target/arm/pr65710.c (test for excess errors)
  FAIL: gcc.target/arm/pr97969.c (test for excess errors)

Here's one example of the excess errors:

  FAIL: gcc.target/arm/pr65647.c (test for excess errors)
  Excess errors:
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: initialization of 'int' from 'struct S1 *' makes integer from pointer without a cast [-Wint-conversion]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: implicit declaration of function 'fn3'; did you mean 'fn2'? [-Wimplicit-function-declaration]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: implicit declaration of function 'fn5'; did you mean 'fn4'? [-Wimplicit-function-declaration]
  /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: implicit declaration of function 'fn6'; did you mean 'fn4'? [-Wimplicit-function-declaration]

PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE.
PR target/65647 and PR target/97969 test for a compilation infinite loop.

Therefore, add -fpermissive so that the tests behave as they did previously.
Tested on armv8l-linux-gnueabihf.

gcc/testsuite/ChangeLog:
* gcc.target/arm/pr59858.c: Add -fpermissive.
* gcc.target/arm/pr65647.c: Likewise.
* gcc.target/arm/pr65710.c: Likewise.
* gcc.target/arm/pr97969.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/pr59858.c | 2 +-
 gcc/testsuite/gcc.target/arm/pr65647.c | 2 +-
 gcc/testsuite/gcc.target/arm/pr65710.c | 2 +-
 gcc/testsuite/gcc.target/arm/pr97969.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c
index 3360b48e8586..9336edfce277 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w" } */
+/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w -fpermissive" } */
 /* { dg-require-effective-target fpic } */
 /* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
 /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c b/gcc/testsuite/gcc.target/arm/pr65647.c
index 26b4e399f6be..3cbf6b804ec0 100644
--- a/gcc/testsuite/gcc.target/arm/pr65647.c
+++ b/gcc/testsuite/gcc.target/arm/pr65647.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_arch_v6m_ok } */
 /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */
-/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft" } */
+/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft -fpermissive" } */
 
 a, b, c, e, g = , h, i = 7, l = 1, m, n, o, q = , r, s = , u, w = 9, x,
   y = 6, z, t6 = 7, t8, t9 = 1, t11 = 5, t12 = , t13 = 3, t15,
diff --git a/gcc/testsuite/gcc.target/arm/pr65710.c b/gcc/testsuite/gcc.target/arm/pr65710.c
index 103ce1d45f77..4cbf7817af7e 100644
--- a/gcc/testsuite/gcc.target/arm/pr65710.c
+++ b/gcc/testsuite/gcc.target/arm/pr65710.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */
-/* { dg-options "-mthumb -O2 -mfloat-abi=soft -w" } */
+/* { dg-options "-mthumb -O2 

[r14-7199 Regression] FAIL: gcc.dg/vect/vect-early-break_100-pr113287.c (test for excess errors) on Linux/x86_64

2024-01-12 Thread haochen.jiang
On Linux/x86_64,

d14ef0987de2f6f2dac64f4f0f068b929078a01d is the first bad commit
commit d14ef0987de2f6f2dac64f4f0f068b929078a01d
Author: Tamar Christina 
Date:   Fri Jan 12 15:27:45 2024 +

testsuite: Make bitint early vect test more accurate

caused

FAIL: gcc.dg/vect/vect-early-break_100-pr113287.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-early-break_100-pr113287.c (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-7199/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_100-pr113287.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_100-pr113287.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-12 Thread Patrick Palka
On Fri, 12 Jan 2024, Jonathan Wakely wrote:

> On Fri, 12 Jan 2024 at 18:33, Patrick Palka  wrote:
> >
> > On Fri, 12 Jan 2024, Jonathan Wakely wrote:
> >
> > > On Fri, 12 Jan 2024 at 17:55, Patrick Palka  wrote:
> > > >
> > > > On Thu, 11 Jan 2024, Jonathan Wakely wrote:
> > > >
> > > > > I'd like to commit this to trunk for GCC 14. Please take a look.
> > > > >
> > > > > -- >8 --
> > > > >
> > > > > > This is the last part of PR libstdc++/108822 implementing P2255R2, which
> > > > > > makes it ill-formed to create a std::tuple that would bind a reference
> > > > > > to a temporary.
> > > > > >
> > > > > > The dangling checks are implemented as deleted constructors for C++20
> > > > > > and higher, and as Debug Mode static assertions in the constructor body
> > > > > > for older standards. This is similar to the r13-6084-g916ce577ad109b
> > > > > > changes for std::pair.
> > > > > >
> > > > > > As part of this change, I've reimplemented most of std::tuple for C++20,
> > > > > > making use of concepts to replace the enable_if constraints, and using
> > > > > > conditional explicit to avoid duplicating most constructors. We could
> > > > > > use conditional explicit for the C++11 implementation too (with pragmas
> > > > > > to disable the -Wc++17-extensions warnings), but that should be done as
> > > > > > a stage 1 change for GCC 15 rather than now.
> > > > > >
> > > > > > The partial specialization for std::tuple is no longer used for
> > > > > > C++20 (or more precisely, for a C++20 compiler that supports concepts
> > > > > > and conditional explicit). The additional constructors and assignment
> > > > > > operators that take std::pair arguments have been added to the C++20
> > > > > > implementation of the primary template, with sizeof...(_Elements)==2
> > > > > > constraints. This avoids reimplementing all the other constructors in
> > > > > > the std::tuple partial specialization to use concepts. This way
> > > > > > we avoid four implementations of every constructor and only have three!
> > > > > > (The primary template has an implementation of each constructor for
> > > > > > C++11 and another for C++20, and the tuple specialization has an
> > > > > > implementation of each for C++11, so that's three for each constructor.)
> > > > > >
> > > > > > In order to make the constraints more efficient on the C++20 version of
> > > > > > the default constructor I've also added a variable template for the
> > > > > > __is_implicitly_default_constructible trait, implemented using concepts.
> > > > >
> > > > > libstdc++-v3/ChangeLog:
> > > > >
> > > > >   PR libstdc++/108822
> > > > >   * include/std/tuple (tuple): Add checks for dangling references.
> > > > >   Reimplement constraints and constant expressions using C++20
> > > > >   features.
> > > > >   * include/std/type_traits [C++20]
> > > > >   (__is_implicitly_default_constructible_v): Define.
> > > > >   (__is_implicitly_default_constructible): Use variable template.
> > > > >   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> > > > > ---
> > > > > >  libstdc++-v3/include/std/tuple| 1021 -
> > > > > >  libstdc++-v3/include/std/type_traits  |   11 +
> > > > > >  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
> > > > > >  3 files changed, 841 insertions(+), 296 deletions(-)
> > > > > >  create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> > > > > >
> > > > > > diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
> > > > > index 50e11843757..cd05b638923 100644
> > > > > --- a/libstdc++-v3/include/std/tuple
> > > > > +++ b/libstdc++-v3/include/std/tuple
> > > > > @@ -752,11 +752,467 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > >template
> > > > >  class tuple : public _Tuple_impl<0, _Elements...>
> > > > >  {
> > > > > -  typedef _Tuple_impl<0, _Elements...> _Inherited;
> > > > > +  using _Inherited = _Tuple_impl<0, _Elements...>;
> > > > >
> > > > >template
> > > > >   using _TCC = _TupleConstraints<_Cond, _Elements...>;
> > > >
> > > > I guess this should be moved into the #else branch if it's not used in
> > > > the new impl.
> > >
> > > Ah yes, I left them there until I was sure I wouldn't need them ...
> > > then didn't move them when I didn't need them.
> > >
> > > >
> > > > >
> > > > > +#if __cpp_concepts && __cpp_conditional_explicit // >= C++20
> > > > > +  template
> > > > > + static consteval bool
> > > > > + __constructible()
> > > > > + {
> > > > > +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> > > > > + return (is_constructible_v<_Elements, _UTypes> && ...);
> > > >
> > > > IIUC this (and all the other new constraints) won't short-circuit like
> > > > the old versions do :/ Not sure how much that matters?
> > >
> > > Yeah, I thought about that, but 
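
The short-circuiting concern raised above can be illustrated in isolation. The sketch below is not taken from the patch; it only contrasts a fold over `&&` with std::conjunction, using an incomplete type to show which operands get inspected. `never_defined` and `fold_all` are hypothetical names, and the code assumes C++17 or later.

```cpp
#include <cassert>
#include <type_traits>

// Declared but never defined: naming never_defined::value is a hard error.
struct never_defined;

// std::conjunction stops at the first false operand, so the incomplete
// type after std::false_type is never inspected.
constexpr bool lazy_result =
  std::conjunction_v<std::false_type, never_defined>;
static_assert(!lazy_result);

// A fold over && short-circuits evaluation, but every Traits::value in the
// pack expansion must still be well-formed, so every operand is instantiated.
template<typename... Traits>
constexpr bool fold_all()
{
  return (Traits::value && ...);
}

static_assert(fold_all<std::true_type, std::is_integral<int>>());
static_assert(!fold_all<std::is_pointer<int>, std::true_type>());
// fold_all<std::false_type, never_defined>() would fail to compile.

inline void check_conjunction()
{
  assert(!lazy_result);
}
```

Whether the extra instantiations matter in practice for std::tuple is the open question in the thread; the sketch only shows that the two forms differ in what they require of later operands.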

Re: [PATCH] libstdc++: Make PSTL algorithms accept C++20 iterators [PR110512]

2024-01-12 Thread Jonathan Wakely
On Fri, 12 Jan 2024 at 18:11, Patrick Palka  wrote:
>
> On Thu, 11 Jan 2024, Jonathan Wakely wrote:
>
> > Tested x86_64-linux and aarch64-linux, with TBB 2020.3 only.
> >
> > Reviews requested.
> >
> > -- >8 --
> >
> > This is a step towards implementing the C++23 change P2408R5, "Ranges
> > iterators as inputs to non-Ranges algorithms". C++20 random access
> > iterators which do not meet the Cpp17RandomAccessIterator requirements
> > will now be recognized by the PSTL algorithms.
>
> IIUC P2408R5 only relaxes the iterator requirements on non-mutating
> algorithms, but presumably this patch relaxes the requirements for all
> parallel algorithms?  Perhaps that's safe here, not sure..

I think that technically it's UB to pass non-Cpp17Iterators to those
algos (they're not constrained, they just say the arguments have to
meet the requirements). So I think allowing previously ill-formed
programs to compile when using types that satisfy
std::random_access_iterator but don't meet the
Cpp17RandomAccessIterator reqs is allowed.

However, that's not all this patch allows. Dispatching to the RA code
for types that do meet the Cpp17ForwardIterator requirements but not
the Cpp17RandomAccessIterator reqs would be a semantic change for code
that was already valid and already compiled. I think it's a good
change though? I'm not certain.
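
As a concrete case of the distinction discussed above, the iterator of std::ranges::iota_view models the C++20 std::random_access_iterator concept while its classic category tag is only std::input_iterator_tag. The sketch below is independent of the PSTL patch and makes no claim about how the patch dispatches; `has_cpp17_random_access_tag` and `iota_iterator` are hypothetical names. The C++20 checks are guarded so that the block still compiles as C++17.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#if __has_include(<ranges>)
# include <ranges>
#endif

// True when the classic category tag of It says "random access" (or stronger).
template<typename It>
constexpr bool has_cpp17_random_access_tag()
{
  return std::is_base_of_v<std::random_access_iterator_tag,
                           typename std::iterator_traits<It>::iterator_category>;
}

#if defined(__cpp_lib_ranges)
// Only checked when the library provides C++20 ranges.
using iota_iterator = std::ranges::iterator_t<std::ranges::iota_view<int, int>>;
static_assert(std::random_access_iterator<iota_iterator>);
static_assert(!has_cpp17_random_access_tag<iota_iterator>());
static_assert(std::is_same_v<std::iterator_traits<iota_iterator>::iterator_category,
                             std::input_iterator_tag>);
#endif

inline void check_classic_tags()
{
  assert(has_cpp17_random_access_tag<int*>());
  assert(!has_cpp17_random_access_tag<std::list<int>::iterator>());
}
```

Code that dispatches purely on `iterator_category` would treat the iota iterator as an input iterator, whereas code that checks the C++20 concept would treat it as random access. That gap is what the reply above is weighing.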



Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-12 Thread Jonathan Wakely
On Fri, 12 Jan 2024 at 18:33, Patrick Palka  wrote:
>
> On Fri, 12 Jan 2024, Jonathan Wakely wrote:
>
> > On Fri, 12 Jan 2024 at 17:55, Patrick Palka  wrote:
> > >
> > > On Thu, 11 Jan 2024, Jonathan Wakely wrote:
> > >
> > > > I'd like to commit this to trunk for GCC 14. Please take a look.
> > > >
> > > > -- >8 --
> > > >
> > > > This is the last part of PR libstdc++/108822 implementing P2255R2, which
> > > > makes it ill-formed to create a std::tuple that would bind a reference
> > > > to a temporary.
> > > >
> > > > The dangling checks are implemented as deleted constructors for C++20
> > > > and higher, and as Debug Mode static assertions in the constructor body
> > > > for older standards. This is similar to the r13-6084-g916ce577ad109b
> > > > changes for std::pair.
> > > >
> > > > As part of this change, I've reimplemented most of std::tuple for C++20,
> > > > making use of concepts to replace the enable_if constraints, and using
> > > > conditional explicit to avoid duplicating most constructors. We could
> > > > use conditional explicit for the C++11 implementation too (with pragmas
> > > > > to disable the -Wc++17-extensions warnings), but that should be done as
> > > > a stage 1 change for GCC 15 rather than now.
> > > >
> > > > The partial specialization for std::tuple is no longer used for
> > > > C++20 (or more precisely, for a C++20 compiler that supports concepts
> > > > and conditional explicit). The additional constructors and assignment
> > > > operators that take std::pair arguments have been added to the C++20
> > > > implementation of the primary template, with sizeof...(_Elements)==2
> > > > constraints. This avoids reimplementing all the other constructors in
> > > > the std::tuple partial specialization to use concepts. This way
> > > > we avoid four implementations of every constructor and only have three!
> > > > (The primary template has an implementation of each constructor for
> > > > C++11 and another for C++20, and the tuple specialization has an
> > > > implementation of each for C++11, so that's three for each constructor.)
> > > >
> > > > In order to make the constraints more efficient on the C++20 version of
> > > > the default constructor I've also added a variable template for the
> > > > __is_implicitly_default_constructible trait, implemented using concepts.
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > >   PR libstdc++/108822
> > > >   * include/std/tuple (tuple): Add checks for dangling references.
> > > >   Reimplement constraints and constant expressions using C++20
> > > >   features.
> > > >   * include/std/type_traits [C++20]
> > > >   (__is_implicitly_default_constructible_v): Define.
> > > >   (__is_implicitly_default_constructible): Use variable template.
> > > >   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> > > > ---
> > > >  libstdc++-v3/include/std/tuple| 1021 -
> > > >  libstdc++-v3/include/std/type_traits  |   11 +
> > > >  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
> > > >  3 files changed, 841 insertions(+), 296 deletions(-)
> > > >  create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> > > >
> > > > diff --git a/libstdc++-v3/include/std/tuple 
> > > > b/libstdc++-v3/include/std/tuple
> > > > index 50e11843757..cd05b638923 100644
> > > > --- a/libstdc++-v3/include/std/tuple
> > > > +++ b/libstdc++-v3/include/std/tuple
> > > > @@ -752,11 +752,467 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >template
> > > >  class tuple : public _Tuple_impl<0, _Elements...>
> > > >  {
> > > > -  typedef _Tuple_impl<0, _Elements...> _Inherited;
> > > > +  using _Inherited = _Tuple_impl<0, _Elements...>;
> > > >
> > > >template
> > > >   using _TCC = _TupleConstraints<_Cond, _Elements...>;
> > >
> > > I guess this should be moved into the #else branch if it's not used in
> > > the new impl.
> >
> > Ah yes, I left them there until I was sure I wouldn't need them ...
> > then didn't move them when I didn't need them.
> >
> > >
> > > >
> > > > +#if __cpp_concepts && __cpp_conditional_explicit // >= C++20
> > > > +  template
> > > > + static consteval bool
> > > > + __constructible()
> > > > + {
> > > > +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> > > > + return (is_constructible_v<_Elements, _UTypes> && ...);
> > >
> > > IIUC this (and all the other new constraints) won't short-circuit like
> > > the old versions do :/ Not sure how much that matters?
> >
> > Yeah, I thought about that, but we have efficient built-ins for these
> > traits now, so I think it's probably OK?
>
> Performance wise agreed, though I suppose removing the short circuiting
> could break existing (though not necessarily valid) code that relied
> on it to prevent an ill-formed template instantiation.  It seems
> the standard https://eel.is/c++draft/tuple uses conjunction_v in some
> constraints, and 
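
As a rough illustration of the short-circuiting concern (a sketch only; has_nested_type, Widget and the two variable templates are hypothetical names, not anything in the patch): std::conjunction stops instantiating traits after the first false one, whereas a plain && expression names every operand and so instantiates all of them.

```cpp
#include <type_traits>

// Hypothetical trait: instantiating it for a T without a nested `type`
// is a hard error, not a substitution failure.
template<typename T>
struct has_nested_type
  : std::bool_constant<(sizeof(typename T::type) >= 1)> {};

struct Widget { using type = long; };

// std::conjunction only looks at has_nested_type<T>::value when
// std::is_class<T>::value is true.
template<typename T>
constexpr bool checked_with_conjunction
  = std::conjunction_v<std::is_class<T>, has_nested_type<T>>;

// A plain && (or a fold over &&) names has_nested_type<T>::value
// unconditionally, so the trait is instantiated even when the left
// operand is false.
template<typename T>
constexpr bool checked_with_and
  = std::is_class_v<T> && has_nested_type<T>::value;

static_assert(checked_with_conjunction<Widget>);
static_assert(checked_with_and<Widget>);

// Fine: the second trait is never instantiated for int.
static_assert(!checked_with_conjunction<int>);

// Would be a hard error if uncommented, because has_nested_type<int>
// must be instantiated:
// static_assert(!checked_with_and<int>);
```

Code that relied on the first form to guard an ill-formed instantiation is what could stop compiling.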

[PATCH] libstdc++: Update tzdata to 2023d

2024-01-12 Thread Jonathan Wakely
It would be good to update the bundled tzdata for GCC 14.1 and 13.3

Tested x86_64-linux.

Any objections?

-- >8 --

Import the new 2023d tzdata.zi file.

libstdc++-v3/ChangeLog:

* src/c++20/tzdata.zi: Import new file from 2023d release.
---
 libstdc++-v3/src/c++20/tzdata.zi | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdata.zi b/libstdc++-v3/src/c++20/tzdata.zi
index b522e395326..4e01359010c 100644
--- a/libstdc++-v3/src/c++20/tzdata.zi
+++ b/libstdc++-v3/src/c++20/tzdata.zi
@@ -1,4 +1,4 @@
-# version 2023c
+# version 2023d
 # This zic input file is in the public domain.
 R d 1916 o - Jun 14 23s 1 S
 R d 1916 1919 - O Su>=1 23s 0 -
@@ -394,7 +394,12 @@ Z Antarctica/Casey 0 - -00 1969
 8 - +08 2019 O 4 3
 11 - +11 2020 Mar 8 3
 8 - +08 2020 O 4 0:1
-11 - +11
+11 - +11 2021 Mar 14
+8 - +08 2021 O 3 0:1
+11 - +11 2022 Mar 13
+8 - +08 2022 O 2 0:1
+11 - +11 2023 Mar 9 3
+8 - +08
 Z Antarctica/Davis 0 - -00 1957 Ja 13
 7 - +07 1964 N
 0 - -00 1969 F
@@ -410,6 +415,11 @@ R Tr 2005 ma - Mar lastSu 1u 2 +02
 R Tr 2004 ma - O lastSu 1u 0 +00
 Z Antarctica/Troll 0 - -00 2005 F 12
 0 Tr %s
+Z Antarctica/Vostok 0 - -00 1957 D 16
+7 - +07 1994 F
+0 - -00 1994 N
+7 - +07 2023 D 18 2
+5 - +05
 Z Antarctica/Rothera 0 - -00 1976 D
 -3 - -03
 Z Asia/Kabul 4:36:48 - LMT 1890
@@ -1050,13 +1060,13 @@ R P 2070 o - O 4 2 0 -
 R P 2071 o - S 19 2 0 -
 R P 2072 o - S 10 2 0 -
 R P 2072 o - O 15 2 1 S
+R P 2072 ma - O Sa<=30 2 0 -
 R P 2073 o - S 2 2 0 -
 R P 2073 o - O 7 2 1 S
 R P 2074 o - Au 18 2 0 -
 R P 2074 o - S 29 2 1 S
 R P 2075 o - Au 10 2 0 -
 R P 2075 o - S 14 2 1 S
-R P 2075 ma - O Sa<=30 2 0 -
 R P 2076 o - Jul 25 2 0 -
 R P 2076 o - S 5 2 1 S
 R P 2077 o - Jul 17 2 0 -
@@ -1831,10 +1841,12 @@ Z America/Danmarkshavn -1:14:40 - LMT 1916 Jul 28
 Z America/Scoresbysund -1:27:52 - LMT 1916 Jul 28
 -2 - -02 1980 Ap 6 2
 -2 c -02/-01 1981 Mar 29
--1 E -01/+00
+-1 E -01/+00 2024 Mar 31
+-2 E -02/-01
 Z America/Nuuk -3:26:56 - LMT 1916 Jul 28
 -3 - -03 1980 Ap 6 2
--3 E -03/-02 2023 O 29 1u
+-3 E -03/-02 2023 Mar 26 1u
+-2 - -02 2023 O 29 1u
 -2 E -02/-01
 Z America/Thule -4:35:8 - LMT 1916 Jul 28
 -4 Th A%sT
@@ -4185,7 +4197,6 @@ L America/Puerto_Rico America/Tortola
 L Pacific/Port_Moresby Antarctica/DumontDUrville
 L Pacific/Auckland Antarctica/McMurdo
 L Asia/Riyadh Antarctica/Syowa
-L Asia/Urumqi Antarctica/Vostok
 L Europe/Berlin Arctic/Longyearbyen
 L Asia/Riyadh Asia/Aden
 L Asia/Qatar Asia/Bahrain
-- 
2.43.0



[WIP] libstdc++: Implement C++26 std::text_encoding [PR113318]

2024-01-12 Thread Jonathan Wakely
Here's a partial patch for PR libstdc++/113318 to implement another
C++26 feature: https://wg21.link/p1885r12

I'm writing the rest of the tests, but thought I would post it now for
comments on the general approach.

The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID. The iterator should
never be able to access out-of-bounds, it always points to an element of
the static array (even when not dereferenceable) and always returns
something when dereferenced (even when not dereferenceable). In the
language being proposed for C++26, it erroneously returns "" when a
non-dereferenceable iterator is dereferenced (and aborts with assertions
enabled).  Incrementing/decrementing past the last/first element in the
view is idempotent (erroneously). That's the idea anyway ... there might
be bugs in the implementation. My thinking is that since those iterators
refer to a global array that never goes out of scope, there's no reason
they should ever produce undefined behaviour or indeterminate values.
They should either have well-defined behaviour, or abort. The overhead
of ensuring those properties is pretty low, so seems worth it.
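
A minimal sketch of that idea, to make it easier to comment on. This is not the code in the patch: the table contents, the sentinel entries and all the names (sketch::Entry, AliasIter, aliases_begin, alias_after) are assumptions made up for this example.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {

struct Entry { int id; std::string_view name; };

// Made-up excerpt. Entries with the same ID are adjacent, and a sentinel
// with ID 0 sits at each end so an iterator can always point at an element.
inline constexpr Entry table[] = {
  {0, ""},
  {3, "US-ASCII"}, {3, "ASCII"}, {3, "ANSI_X3.4-1968"},
  {106, "UTF-8"}, {106, "csUTF8"},
  {0, ""},
};
inline constexpr std::size_t table_size = sizeof(table) / sizeof(table[0]);

class AliasIter {
  const Entry* cur_;  // always an element of table, never the front sentinel
  int id_;            // ID whose aliases are being viewed (never 0)
public:
  constexpr AliasIter(const Entry* cur, int id) : cur_(cur), id_(id) {}

  // Returns "" instead of reading a foreign entry when not dereferenceable.
  constexpr std::string_view operator*() const
  { return cur_->id == id_ ? cur_->name : std::string_view(); }

  // Stops moving once it has left the run of matching IDs.
  constexpr AliasIter& operator++()
  { if (cur_->id == id_) ++cur_; return *this; }

  // Stops moving at the first element of the run.
  constexpr AliasIter& operator--()
  { if ((cur_ - 1)->id == id_) --cur_; return *this; }
};

// First alias for ID, or a non-dereferenceable iterator if there is none.
constexpr AliasIter aliases_begin(int id) {
  const Entry* end_sentinel = table + (table_size - 1);
  if (id <= 0)
    return AliasIter(end_sentinel, -1);
  for (const Entry* p = table + 1; p != end_sentinel; ++p)
    if (p->id == id)
      return AliasIter(p, id);
  return AliasIter(end_sentinel, id);
}

// Helper for experiments: step forwards, then backwards, then dereference.
constexpr std::string_view
alias_after(int id, int increments, int decrements = 0) {
  AliasIter it = aliases_begin(id);
  for (int i = 0; i < increments; ++i) ++it;
  for (int i = 0; i < decrements; ++i) --it;
  return *it;
}

} // namespace sketch

static_assert(sketch::alias_after(106, 1) == "csUTF8");
static_assert(sketch::alias_after(106, 100).empty());  // saturates, no overrun
```

In this sketch the past-the-end position for one ID is simply the first entry of the next run (or the back sentinel), which is why every operation stays inside the array.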

Comments welcome while I write the rest of the tests (and no doubt find
problems with the code)

-- >8 --

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/c++26/Makefile.am: Add text_encoding.cc.
* src/c++26/Makefile.in: Regenerate.
* src/c++26/text_encoding.cc: New file.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
---
 libstdc++-v3/acinclude.m4 |  28 +
 libstdc++-v3/config.h.in  |   3 +
 libstdc++-v3/configure|  54 ++
 libstdc++-v3/configure.ac |   3 +
 libstdc++-v3/include/Makefile.am  |   2 +
 libstdc++-v3/include/Makefile.in  |   2 +
 libstdc++-v3/include/bits/locale_classes.h|  14 +
 .../include/bits/text_encoding-data.h | 907 ++
 libstdc++-v3/include/bits/unicode.h   | 157 ++-
 libstdc++-v3/include/bits/version.def |  10 +
 libstdc++-v3/include/bits/version.h   |  13 +-
 libstdc++-v3/include/std/text_encoding| 688 +
 libstdc++-v3/python/libstdcxx/v6/printers.py  |  17 +
 .../scripts/gen_text_encoding_data.py |  61 ++
 libstdc++-v3/src/c++26/Makefile.am|   2 +-
 libstdc++-v3/src/c++26/Makefile.in|   4 +-
 libstdc++-v3/src/c++26/text_encoding.cc   |  70 ++
 .../ext/unicode/charset_alias_match.cc|  18 +
 .../std/text_encoding/requirements.cc |  31 +
 19 files changed, 2079 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/text_encoding-data.h
 create mode 100644 libstdc++-v3/include/std/text_encoding
 create mode 100755 libstdc++-v3/scripts/gen_text_encoding_data.py
 create mode 100644 libstdc++-v3/src/c++26/text_encoding.cc
 create mode 100644 libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc
 create mode 100644 libstdc++-v3/testsuite/std/text_encoding/requirements.cc

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index aa2cc4af52b..f9ba7ef744b 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5821,6 +5821,34 @@ AC_LANG_SAVE
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl Check whether the dependencies for std::text_encoding are available.
+dnl
+dnl Defines:
+dnl   _GLIBCXX_USE_NL_LANGINFO_L if nl_langinfo_l is in <langinfo.h>.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_TEXT_ENCODING], [
+AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+
+  AC_MSG_CHECKING([whether nl_langinfo_l is defined in <langinfo.h>])
+  AC_TRY_COMPILE([
+  #include <locale.h>
+  #include <langinfo.h>
+  ],[
+locale_t loc = newlocale(LC_ALL_MASK, "", (locale_t)0);
+const char* enc = nl_langinfo_l(CODESET, loc);
+freelocale(loc);
+  ], [ac_nl_langinfo_l=yes], [ac_nl_langinfo_l=no])
+  AC_MSG_RESULT($ac_nl_langinfo_l)
+  if test "$ac_nl_langinfo_l" = yes; then
+

Re: [PATCH v2 1/2] Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL

2024-01-12 Thread Jan-Benedict Glaw
On Tue, 2024-01-02 20:41:37 +0100, Ilya Leoshkevich  wrote:
> diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
> index ac566efcf19..92d00bf922f 100644
> --- a/gcc/config/ia64/ia64.cc
> +++ b/gcc/config/ia64/ia64.cc
> @@ -3886,8 +3886,7 @@ ia64_expand_prologue (void)
>  /* Output the textual info surrounding the prologue.  */
>  
>  void
> -ia64_start_function (FILE *file, const char *fnname,
> -  tree decl ATTRIBUTE_UNUSED)
> +ia64_start_function (FILE *file, const char *fnname, tree decl)
>  {
>  #if TARGET_ABI_OPEN_VMS
>vms_start_function (fnname);
> @@ -3896,7 +3895,7 @@ ia64_start_function (FILE *file, const char *fnname,
>fputs ("\t.proc ", file);
>assemble_name (file, fnname);
>fputc ('\n', file);
> -  ASM_OUTPUT_LABEL (file, fnname);
> +  ASM_OUTPUT_FUNCTION_LABEL (file, fnname, decl);
>  }
>  
>  /* Called after register allocation to add any instructions needed for the

With this change I get a new warning (turned into an error because I
configure with --enable-werror-always), cf.
http://toolchain.lug-owl.de/laminar/log/gcc-ia64-elf/48 :

[all 2024-01-12 16:32:32] 
/var/lib/laminar/run/gcc-ia64-elf/48/local-toolchain-install/bin/g++  -fno-PIE 
-c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -fno-PIE -I. -I. 
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  
-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o ia64.o -MT ia64.o -MMD 
-MP -MF ./.deps/ia64.TPo ../../gcc/gcc/config/ia64/ia64.cc
[all 2024-01-12 16:32:34] ../../gcc/gcc/config/ia64/ia64.cc: In function 'void ia64_start_function(FILE*, const char*, tree)':
[all 2024-01-12 16:32:34] ../../gcc/gcc/config/ia64/ia64.cc:3889:59: error: unused parameter 'decl' [-Werror=unused-parameter]
[all 2024-01-12 16:32:34]  3889 | ia64_start_function (FILE *file, const char *fnname, tree decl)
[all 2024-01-12 16:32:34]   |   ~^~~~
[all 2024-01-12 16:32:49] cc1plus: all warnings being treated as errors
[all 2024-01-12 16:32:49] make[1]: *** [Makefile:2555: ia64.o] Error 1


So keeping the ATTRIBUTE_UNUSED seems like a good fix; alternatively, update the
ASM_OUTPUT_FUNCTION_LABEL macro so that it always "uses" its last argument.
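
Roughly what I mean by the two options, as a stand-alone sketch. The SKETCH_* macros and AsmOut are hypothetical stand-ins, not the real target macros, and [[maybe_unused]] plays the role of ATTRIBUTE_UNUSED here:

```cpp
// Hypothetical stand-in for the assembler output stream.
struct AsmOut { int labels = 0; };

// Hypothetical stand-in for a plain label macro.
#define SKETCH_OUTPUT_LABEL(out, name) \
  ((out).labels += ((name) != nullptr))

// A definition that drops DECL: after preprocessing, a caller that only
// passes its parameter to this macro never uses that parameter.
#define SKETCH_FUNCTION_LABEL_DROPS_DECL(out, name, decl) \
  SKETCH_OUTPUT_LABEL(out, name)

// A definition that always "uses" its last argument.
#define SKETCH_FUNCTION_LABEL_USES_DECL(out, name, decl) \
  ((void) (decl), SKETCH_OUTPUT_LABEL(out, name))

// Option 1: keep the parameter marked as possibly unused.
constexpr void
start_function_marked(AsmOut& out, const char* name,
                      [[maybe_unused]] int decl)
{
  SKETCH_FUNCTION_LABEL_DROPS_DECL(out, name, decl);
}

// Option 2: no attribute needed, because the macro evaluates DECL.
constexpr void
start_function_plain(AsmOut& out, const char* name, int decl)
{
  SKETCH_FUNCTION_LABEL_USES_DECL(out, name, decl);
}

// Both variants emit exactly one label.
constexpr int labels_emitted(bool use_marked)
{
  AsmOut out{};
  if (use_marked)
    start_function_marked(out, "fn", 42);
  else
    start_function_plain(out, "fn", 42);
  return out.labels;
}

static_assert(labels_emitted(true) == 1);
static_assert(labels_emitted(false) == 1);
```

Either variant should build cleanly with -Wunused-parameter; removing the attribute from the first one while keeping the DROPS_DECL macro is the combination that warns.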

MfG, JBG



Re: [PATCH 2/2] libstdc++: Implement C++23 std::bind_back from P2387R3 [PR108827]

2024-01-12 Thread Jonathan Wakely
On Fri, 12 Jan 2024 at 20:10, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK
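
For anyone following along, the semantics being added are roughly "append the bound arguments after the call arguments". A minimal C++17 sketch of just that behaviour follows; bind_back_sketch is a made-up helper, it is not the _Bind_back in the patch, and it ignores the value-category forwarding of the bound arguments that the real thing does.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper showing only the argument order:
//   bind_back_sketch(f, bound...)(call...)  calls  f(call..., bound...)
template<typename F, typename... Bound>
constexpr auto bind_back_sketch(F f, Bound... bound)
{
  return [f = std::move(f), tup = std::make_tuple(std::move(bound)...)]
         (auto&&... call_args) -> decltype(auto)
  {
    // Unpack the stored tuple after the forwarded call arguments.
    return std::apply(
      [&](const auto&... b) -> decltype(auto) {
        return f(std::forward<decltype(call_args)>(call_args)..., b...);
      },
      tup);
  };
}

constexpr int subtract(int a, int b) { return a - b; }
constexpr int digits(int a, int b, int c) { return a * 100 + b * 10 + c; }

// The bound 1 becomes the last argument: subtract(10, 1).
static_assert(bind_back_sketch(subtract, 1)(10) == 9);
// Bound arguments keep their relative order: digits(1, 2, 3).
static_assert(bind_back_sketch(digits, 2, 3)(1) == 123);
```

With a C++23 library that defines __cpp_lib_bind_back, std::bind_back(subtract, 1)(10) should give the same result.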


>
> PR libstdc++/108827
> PR libstdc++/111327
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/version.def (bind_back): Define.
> * include/bits/version.h: Regenerate.
> * include/std/functional (_Bind_back): Define for C++23.
> (bind_back): Likewise.
> * testsuite/20_util/function_objects/bind_back/1.cc: New test
> (adapted from corresponding bind_front test).
> * testsuite/20_util/function_objects/bind_back/111327.cc: Likewise.
> ---
>  libstdc++-v3/include/bits/version.def |   9 +
>  libstdc++-v3/include/bits/version.h   | 221 +-
>  libstdc++-v3/include/std/functional   |  71 ++
>  .../20_util/function_objects/bind_back/1.cc   | 178 ++
>  .../function_objects/bind_back/111327.cc  |  42 
>  5 files changed, 416 insertions(+), 105 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/function_objects/bind_back/1.cc
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/function_objects/bind_back/111327.cc
>
> diff --git a/libstdc++-v3/include/bits/version.def 
> b/libstdc++-v3/include/bits/version.def
> index 7c7ba066161..21cdc65121b 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -766,6 +766,15 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = bind_back;
> +  values = {
> +v = 202202;
> +cxxmin = 23;
> +extra_cond = "__cpp_explicit_this_parameter";
> +  };
> +};
> +
>  ftms = {
>name = starts_ends_with;
>values = {
> diff --git a/libstdc++-v3/include/bits/version.h 
> b/libstdc++-v3/include/bits/version.h
> index 65d5164347e..f8dd16416a4 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -937,6 +937,17 @@
>  #undef __glibcxx_want_bind_front
>
>  // from version.def line 770
> +#if !defined(__cpp_lib_bind_back)
> +# if (__cplusplus >= 202100L) && (__cpp_explicit_this_parameter)
> +#  define __glibcxx_bind_back 202202L
> +#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_bind_back)
> +#   define __cpp_lib_bind_back 202202L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_bind_back) && defined(__glibcxx_want_bind_back) 
> */
> +#undef __glibcxx_want_bind_back
> +
> +// from version.def line 779
>  #if !defined(__cpp_lib_starts_ends_with)
>  # if (__cplusplus >= 202002L)
>  #  define __glibcxx_starts_ends_with 201711L
> @@ -947,7 +958,7 @@
>  #endif /* !defined(__cpp_lib_starts_ends_with) && 
> defined(__glibcxx_want_starts_ends_with) */
>  #undef __glibcxx_want_starts_ends_with
>
> -// from version.def line 778
> +// from version.def line 787
>  #if !defined(__cpp_lib_bit_cast)
>  # if (__cplusplus >= 202002L) && (__has_builtin(__builtin_bit_cast))
>  #  define __glibcxx_bit_cast 201806L
> @@ -958,7 +969,7 @@
>  #endif /* !defined(__cpp_lib_bit_cast) && defined(__glibcxx_want_bit_cast) */
>  #undef __glibcxx_want_bit_cast
>
> -// from version.def line 787
> +// from version.def line 796
>  #if !defined(__cpp_lib_bitops)
>  # if (__cplusplus >= 202002L)
>  #  define __glibcxx_bitops 201907L
> @@ -969,7 +980,7 @@
>  #endif /* !defined(__cpp_lib_bitops) && defined(__glibcxx_want_bitops) */
>  #undef __glibcxx_want_bitops
>
> -// from version.def line 795
> +// from version.def line 804
>  #if !defined(__cpp_lib_bounded_array_traits)
>  # if (__cplusplus >= 202002L)
>  #  define __glibcxx_bounded_array_traits 201902L
> @@ -980,7 +991,7 @@
>  #endif /* !defined(__cpp_lib_bounded_array_traits) && 
> defined(__glibcxx_want_bounded_array_traits) */
>  #undef __glibcxx_want_bounded_array_traits
>
> -// from version.def line 803
> +// from version.def line 812
>  #if !defined(__cpp_lib_concepts)
>  # if (__cplusplus >= 202002L) && (__cpp_concepts >= 201907L)
>  #  define __glibcxx_concepts 202002L
> @@ -991,7 +1002,7 @@
>  #endif /* !defined(__cpp_lib_concepts) && defined(__glibcxx_want_concepts) */
>  #undef __glibcxx_want_concepts
>
> -// from version.def line 813
> +// from version.def line 822
>  #if !defined(__cpp_lib_optional)
>  # if (__cplusplus >= 202100L) && (__glibcxx_concepts)
>  #  define __glibcxx_optional 202110L
> @@ -1012,7 +1023,7 @@
>  #endif /* !defined(__cpp_lib_optional) && defined(__glibcxx_want_optional) */
>  #undef __glibcxx_want_optional
>
> -// from version.def line 830
> +// from version.def line 839
>  #if !defined(__cpp_lib_destroying_delete)
>  # if (__cplusplus >= 202002L) && (__cpp_impl_destroying_delete)
>  #  define __glibcxx_destroying_delete 201806L
> @@ -1023,7 +1034,7 @@
>  #endif /* !defined(__cpp_lib_destroying_delete) && 
> defined(__glibcxx_want_destroying_delete) */
>  #undef __glibcxx_want_destroying_delete
>
> -// from version.def line 839
> +// from version.def line 848
>  #if !defined(__cpp_lib_constexpr_string_view)
>  # if (__cplusplus >= 202002L)
>  #  

Re: [PATCH 1/2] libstdc++: Use C++23 deducing this in std::bind_front

2024-01-12 Thread Jonathan Wakely
On Fri, 12 Jan 2024 at 20:09, Patrick Palka  wrote:
>
> This simplifies the operator() of _Bind_front using C++23 deducing
> this, allowing us to condense multiple nearly identical operator()
> overloads into one.
>
> In passing I think we can remove _Bind_front's defaulted special member
> declarations and just let the compiler implicitly generate them for us.

OK
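
A small sketch of why deducing this condenses the overloads. Probe and Cat are hypothetical names; the #else branch is the pre-C++23 shape with one overload per cv/ref combination, and it is what gets compiled when __cpp_explicit_this_parameter is not defined.

```cpp
#include <type_traits>
#include <utility>

enum class Cat { lvalue, const_lvalue, rvalue, const_rvalue };

struct Probe {
#if __cpp_explicit_this_parameter
  // One template covers all four cases: Self is deduced from the
  // value category and constness of the object expression.
  template<typename Self>
  constexpr Cat category(this Self&& self) noexcept
  {
    static_cast<void>(self);
    using Bare = std::remove_reference_t<Self>;
    if constexpr (std::is_lvalue_reference_v<Self>)
      return std::is_const_v<Bare> ? Cat::const_lvalue : Cat::lvalue;
    else
      return std::is_const_v<Bare> ? Cat::const_rvalue : Cat::rvalue;
  }
#else
  // Without deducing this, each combination needs its own overload.
  constexpr Cat category() & noexcept { return Cat::lvalue; }
  constexpr Cat category() const & noexcept { return Cat::const_lvalue; }
  constexpr Cat category() && noexcept { return Cat::rvalue; }
  constexpr Cat category() const && noexcept { return Cat::const_rvalue; }
#endif
};

constexpr Cat on_lvalue() { Probe p{}; return p.category(); }
constexpr Cat on_const_lvalue() { const Probe p{}; return p.category(); }
constexpr Cat on_rvalue() { return Probe{}.category(); }
constexpr Cat on_const_rvalue() { const Probe p{}; return std::move(p).category(); }

// Both implementations agree on every case.
static_assert(on_lvalue() == Cat::lvalue);
static_assert(on_const_lvalue() == Cat::const_lvalue);
static_assert(on_rvalue() == Cat::rvalue);
static_assert(on_const_rvalue() == Cat::const_rvalue);
```

In _Bind_front the deduced Self is then used to forward the callable and bound arguments with the matching category, which is what the __like_t uses in the patch are for.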


>
> libstdc++-v3/ChangeLog:
>
> * include/std/functional (_Bind_front): Remove =default special
> member function declarations.
> (_Bind_front::operator()): Implement using C++23 deducing this
> when available.
> * testsuite/20_util/function_objects/bind_front/111327.cc:
> Adjust testcase to expect better errors in C++23 mode.
> ---
>  libstdc++-v3/include/std/functional   | 20 +--
>  .../function_objects/bind_front/111327.cc | 14 +++--
>  2 files changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/functional 
> b/libstdc++-v3/include/std/functional
> index 8d50a730889..190cea612bb 100644
> --- a/libstdc++-v3/include/std/functional
> +++ b/libstdc++-v3/include/std/functional
> @@ -934,12 +934,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   _M_bound_args(std::forward<_Args>(__args)...)
> { static_assert(sizeof...(_Args) == sizeof...(_BoundArgs)); }
>
> -  _Bind_front(const _Bind_front&) = default;
> -  _Bind_front(_Bind_front&&) = default;
> -  _Bind_front& operator=(const _Bind_front&) = default;
> -  _Bind_front& operator=(_Bind_front&&) = default;
> -  ~_Bind_front() = default;
> -
> +#if __cpp_explicit_this_parameter
> +  template
> +   constexpr
> +   invoke_result_t<__like_t<_Self, _Fd>, __like_t<_Self, _BoundArgs>..., 
> _CallArgs...>
> +   operator()(this _Self&& __self, _CallArgs&&... __call_args)
> +   noexcept(is_nothrow_invocable_v<__like_t<_Self, _Fd>,
> +   __like_t<_Self, _BoundArgs>...,
> +   _CallArgs...>)
> +   {
> + return _S_call(std::forward<_Self>(__self), _BoundIndices(),
> +std::forward<_CallArgs>(__call_args)...);
> +   }
> +#else
>template
> requires true
> constexpr
> @@ -997,6 +1004,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>template
> void operator()(_CallArgs&&...) const && = delete;
> +#endif
>
>  private:
>using _BoundIndices = index_sequence_for<_BoundArgs...>;
> diff --git 
> a/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc 
> b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
> index 43b56ca4378..5fe0a83baec 100644
> --- a/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
> +++ b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
> @@ -17,24 +17,26 @@ struct G {
>
>  int main() {
>auto f0 = std::bind_front(F{});
> -  f0(); // { dg-error "deleted" }
> +  f0(); // { dg-error "deleted|no match" }
>std::move(f0)();
>std::as_const(f0)();
>std::move(std::as_const(f0))();
>
>auto g0 = std::bind_front(G{});
> -  g0(); // { dg-error "deleted" }
> -  std::move(g0)(); // { dg-error "deleted" }
> +  g0(); // { dg-error "deleted|no match" }
> +  std::move(g0)(); // { dg-error "deleted|no match" }
>std::move(std::as_const(g0))();
>
>auto f1 = std::bind_front(F{}, 42);
> -  f1(); // { dg-error "deleted" }
> +  f1(); // { dg-error "deleted|no match" }
>std::move(f1)();
>std::as_const(f1)();
>std::move(std::as_const(f1))();
>
>auto g1 = std::bind_front(G{}, 42);
> -  g1(); // { dg-error "deleted" }
> -  std::move(g1)(); // { dg-error "deleted" }
> +  g1(); // { dg-error "deleted|no match" }
> +  std::move(g1)(); // { dg-error "deleted|no match" }
>std::move(std::as_const(g1))();
>  }
> +
> +// { dg-error "no type named 'type' in 'struct std::invoke_result" "" { 
> target c++23 } 0 }
> --
> 2.43.0.283.ga54a84b333
>



RE: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-12 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, January 12, 2024 4:26 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is
> used twice [PR100942]
> 
> Andrew Pinski  writes:
> > Ccmp is not used if the result of the and/ior is used by both
> > a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> > here by using ccmp in this case.
> > Two changes are required: first, we need to allow the outer statement's
> > result to be used more than once.
> > The second change is that during the expansion of the gimple, we need
> > to try using ccmp. This is needed because we don't expand the SSA
> > name of the lhs but rather expand directly from the gimple.
> >
> > A small note on the ccmp_4.c testcase, we should be able to get slightly
> > better than with this patch but it is one extra instruction compared to
> > before.
> >
> > Diff from v1:
> > * v2: Split out expand_gimple_assign_ssa so that we only need to handle
> > promotion once. Add ccmp_5.c testcase which was suggested. Change comment
> > on ccmp_candidate_p.
> 
> I meant more that we should split out the gassign handling in
> expand_expr_real_1, since we're effectively making cfgexpand follow
> it more closely.  What do you think about the attached version?
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> OK for the expr/cfgexpand bits?

Oh, that is what I originally thought you wanted, but I was not 100% sure, so I
just moved it out in one place.  Anyway, thanks for taking care of the change.

Thanks,
Andrew Pinski

> 
> Thanks,
> Richard
> 
> 
> 
> Ccmp is not used if the result of the and/ior is used by both
> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> here by using ccmp in this case.
> Two changes are required: first, we need to allow the outer statement's
> result to be used more than once.
> The second change is that during the expansion of the gimple, we need
> to try using ccmp. This is needed because we don't expand the SSA
> name of the lhs but rather expand directly from the gimple.
> 
> A small note on the ccmp_4.c testcase, we should be able to get slightly
> better than with this patch but it is one extra instruction compared to
> before.
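
For reference, the source pattern in question looks roughly like the following. This is a hypothetical function written for this mail, not one of the new ccmp_*.c tests, and the GIMPLE names in the comments only describe the general shape.

```cpp
// The result of the outer AND is used twice: once to control a branch
// and once as a plain value.
constexpr int and_used_twice(int a, int b, int& branch_hits)
{
  bool c = (a == 0) && (b == 2);  // outer AND of two comparisons
  if (c)                          // use 1: the condition (GIMPLE_COND)
    ++branch_hits;
  return c;                       // use 2: the value (GIMPLE_ASSIGN)
}

// Helper that reports how often the branch was taken for one call.
constexpr int hits_for(int a, int b)
{
  int hits = 0;
  and_used_twice(a, b, hits);
  return hits;
}

// Helper that reports the returned value for one call.
constexpr int value_for(int a, int b)
{
  int hits = 0;
  return and_used_twice(a, b, hits);
}

static_assert(value_for(0, 2) == 1 && hits_for(0, 2) == 1);
static_assert(value_for(0, 3) == 0 && hits_for(0, 3) == 0);
static_assert(value_for(1, 2) == 0 && hits_for(1, 2) == 0);
```

Because c has two uses, the old has_single_use check rejected it as a ccmp candidate; allowing that for the outermost AND/IOR is the first of the two changes.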
> 
>   PR target/100942
> 
> gcc/ChangeLog:
> 
>   * ccmp.cc (ccmp_candidate_p): Add outer argument.
>   Allow if the outer is true and the lhs is used more
>   than once.
>   (expand_ccmp_expr): Update call to ccmp_candidate_p.
>   * expr.h (expand_expr_real_gassign): Declare.
>   * expr.cc (expand_expr_real_gassign): New function, split out from...
>   (expand_expr_real_1): ...here.
>   * cfgexpand.cc (expand_gimple_stmt_1): Use
> expand_expr_real_gassign.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/ccmp_3.c: New test.
>   * gcc.target/aarch64/ccmp_4.c: New test.
>   * gcc.target/aarch64/ccmp_5.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> Co-authored-by: Richard Sandiford 
> ---
>  gcc/ccmp.cc   |  12 +--
>  gcc/cfgexpand.cc  |  31 ++-
>  gcc/expr.cc   | 103 --
>  gcc/expr.h|   3 +
>  gcc/testsuite/gcc.target/aarch64/ccmp_3.c |  20 +
>  gcc/testsuite/gcc.target/aarch64/ccmp_4.c |  35 
>  gcc/testsuite/gcc.target/aarch64/ccmp_5.c |  20 +
>  7 files changed, 149 insertions(+), 75 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_5.c
> 
> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> index 09d6b5595a4..7cb525addf4 100644
> --- a/gcc/ccmp.cc
> +++ b/gcc/ccmp.cc
> @@ -90,9 +90,10 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
> If all checks OK in expand_ccmp_expr, it emits insns in prep_seq, then
> insns in gen_seq.  */
> 
> -/* Check whether G is a potential conditional compare candidate.  */
> +/* Check whether G is a potential conditional compare candidate; OUTER is true if
> +   G is the outer most AND/IOR.  */
>  static bool
> -ccmp_candidate_p (gimple *g)
> +ccmp_candidate_p (gimple *g, bool outer = false)
>  {
>tree lhs, op0, op1;
>gimple *gs0, *gs1;
> @@ -109,8 +110,9 @@ ccmp_candidate_p (gimple *g)
>lhs = gimple_assign_lhs (g);
>op0 = gimple_assign_rhs1 (g);
>op1 = gimple_assign_rhs2 (g);
> -  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
> -  || !has_single_use (lhs))
> +  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
> +return false;
> +  if (!outer && !has_single_use (lhs))
>  return false;
> 
>bb = gimple_bb (g);
> @@ -284,7 +286,7 @@ expand_ccmp_expr (gimple *g, machine_mode
> mode)
>rtx_insn *last;
>rtx tmp;
> 
> -  if (!ccmp_candidate_p (g))
> +  if (!ccmp_candidate_p (g, 

[PATCH 2/2] libstdc++: Implement C++23 std::bind_back from P2387R3 [PR108827]

2024-01-12 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR libstdc++/108827
PR libstdc++/111327

libstdc++-v3/ChangeLog:

* include/bits/version.def (bind_back): Define.
* include/bits/version.h: Regenerate.
* include/std/functional (_Bind_back): Define for C++23.
(bind_back): Likewise.
* testsuite/20_util/function_objects/bind_back/1.cc: New test
(adapted from corresponding bind_front test).
* testsuite/20_util/function_objects/bind_back/111327.cc: Likewise.
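
For readers unfamiliar with the semantics being added: bind_back(f, bound...)
returns a call wrapper that appends the bound arguments after the call-time
arguments. The following is only a minimal C++17 sketch of that idea, not the
_Bind_back from this patch; BindBackSketch, sketch_bind_back and Minus are
hypothetical names, and the sketch ignores value-category forwarding of the
wrapper itself.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the call wrapper; NOT the libstdc++ _Bind_back.
template<typename F, typename... Bound>
struct BindBackSketch
{
  F fn;
  std::tuple<Bound...> bound;

  // Expand the stored tuple after the call-time arguments.
  template<std::size_t... I, typename... CallArgs>
  constexpr decltype(auto)
  call(std::index_sequence<I...>, CallArgs&&... call_args) const
  {
    return fn(std::forward<CallArgs>(call_args)..., std::get<I>(bound)...);
  }

  template<typename... CallArgs>
  constexpr decltype(auto)
  operator()(CallArgs&&... call_args) const
  {
    return call(std::index_sequence_for<Bound...>{},
                std::forward<CallArgs>(call_args)...);
  }
};

// Hypothetical factory: copies the callable and the bound arguments.
template<typename F, typename... Bound>
constexpr BindBackSketch<F, Bound...>
sketch_bind_back(F f, Bound... bound)
{
  return {std::move(f), std::tuple<Bound...>(std::move(bound)...)};
}

// Small function object used only to make the argument order visible.
struct Minus
{
  constexpr int operator()(int a, int b) const { return a - b; }
};
```

With this sketch, sketch_bind_back(Minus{}, 3)(10) calls Minus{}(10, 3), so
the bound 3 ends up as the last argument.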
---
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   | 221 +-
 libstdc++-v3/include/std/functional   |  71 ++
 .../20_util/function_objects/bind_back/1.cc   | 178 ++
 .../function_objects/bind_back/111327.cc  |  42 
 5 files changed, 416 insertions(+), 105 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/function_objects/bind_back/1.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/function_objects/bind_back/111327.cc

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 7c7ba066161..21cdc65121b 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -766,6 +766,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = bind_back;
+  values = {
+v = 202202;
+cxxmin = 23;
+extra_cond = "__cpp_explicit_this_parameter";
+  };
+};
+
 ftms = {
   name = starts_ends_with;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 65d5164347e..f8dd16416a4 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -937,6 +937,17 @@
 #undef __glibcxx_want_bind_front
 
 // from version.def line 770
+#if !defined(__cpp_lib_bind_back)
+# if (__cplusplus >= 202100L) && (__cpp_explicit_this_parameter)
+#  define __glibcxx_bind_back 202202L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_bind_back)
+#   define __cpp_lib_bind_back 202202L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_bind_back) && defined(__glibcxx_want_bind_back) */
+#undef __glibcxx_want_bind_back
+
+// from version.def line 779
 #if !defined(__cpp_lib_starts_ends_with)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_starts_ends_with 201711L
@@ -947,7 +958,7 @@
 #endif /* !defined(__cpp_lib_starts_ends_with) && 
defined(__glibcxx_want_starts_ends_with) */
 #undef __glibcxx_want_starts_ends_with
 
-// from version.def line 778
+// from version.def line 787
 #if !defined(__cpp_lib_bit_cast)
 # if (__cplusplus >= 202002L) && (__has_builtin(__builtin_bit_cast))
 #  define __glibcxx_bit_cast 201806L
@@ -958,7 +969,7 @@
 #endif /* !defined(__cpp_lib_bit_cast) && defined(__glibcxx_want_bit_cast) */
 #undef __glibcxx_want_bit_cast
 
-// from version.def line 787
+// from version.def line 796
 #if !defined(__cpp_lib_bitops)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_bitops 201907L
@@ -969,7 +980,7 @@
 #endif /* !defined(__cpp_lib_bitops) && defined(__glibcxx_want_bitops) */
 #undef __glibcxx_want_bitops
 
-// from version.def line 795
+// from version.def line 804
 #if !defined(__cpp_lib_bounded_array_traits)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_bounded_array_traits 201902L
@@ -980,7 +991,7 @@
 #endif /* !defined(__cpp_lib_bounded_array_traits) && 
defined(__glibcxx_want_bounded_array_traits) */
 #undef __glibcxx_want_bounded_array_traits
 
-// from version.def line 803
+// from version.def line 812
 #if !defined(__cpp_lib_concepts)
 # if (__cplusplus >= 202002L) && (__cpp_concepts >= 201907L)
 #  define __glibcxx_concepts 202002L
@@ -991,7 +1002,7 @@
 #endif /* !defined(__cpp_lib_concepts) && defined(__glibcxx_want_concepts) */
 #undef __glibcxx_want_concepts
 
-// from version.def line 813
+// from version.def line 822
 #if !defined(__cpp_lib_optional)
 # if (__cplusplus >= 202100L) && (__glibcxx_concepts)
 #  define __glibcxx_optional 202110L
@@ -1012,7 +1023,7 @@
 #endif /* !defined(__cpp_lib_optional) && defined(__glibcxx_want_optional) */
 #undef __glibcxx_want_optional
 
-// from version.def line 830
+// from version.def line 839
 #if !defined(__cpp_lib_destroying_delete)
 # if (__cplusplus >= 202002L) && (__cpp_impl_destroying_delete)
 #  define __glibcxx_destroying_delete 201806L
@@ -1023,7 +1034,7 @@
 #endif /* !defined(__cpp_lib_destroying_delete) && 
defined(__glibcxx_want_destroying_delete) */
 #undef __glibcxx_want_destroying_delete
 
-// from version.def line 839
+// from version.def line 848
 #if !defined(__cpp_lib_constexpr_string_view)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_constexpr_string_view 201811L
@@ -1034,7 +1045,7 @@
 #endif /* !defined(__cpp_lib_constexpr_string_view) && 
defined(__glibcxx_want_constexpr_string_view) */
 #undef __glibcxx_want_constexpr_string_view
 
-// from version.def line 847
+// from version.def line 856
 #if !defined(__cpp_lib_endian)
 # 

[PATCH 1/2] libstdc++: Use C++23 deducing this in std::bind_front

2024-01-12 Thread Patrick Palka
This simplifies the operator() of _Bind_front using C++23 deducing
this, allowing us to condense multiple nearly identical operator()
overloads into one.

In passing I think we can remove _Bind_front's defaulted special member
declarations and just let the compiler implicitly generate them for us.
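
As a rough illustration of the condensing described above (a sketch only, not
the _Bind_front code): Probe below is a hypothetical call wrapper that reports
the value category it was called through. With C++23 deducing this, a single
template covers all four cv/ref combinations; otherwise four ref-qualified
overloads are needed. The #if mirrors the feature-test macro used in the
patch.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical names throughout; this is not libstdc++ code.
enum class Cat { lvalue, const_lvalue, rvalue, const_rvalue };

struct Probe
{
#if __cpp_explicit_this_parameter
  // C++23: Self deduces the cv-qualification and value category of the object.
  template<typename Self>
    constexpr Cat
    operator()(this Self&&)
    {
      constexpr bool is_const = std::is_const_v<std::remove_reference_t<Self>>;
      if constexpr (std::is_lvalue_reference_v<Self>)
        return is_const ? Cat::const_lvalue : Cat::lvalue;
      else
        return is_const ? Cat::const_rvalue : Cat::rvalue;
    }
#else
  // Pre-C++23: one overload per cv/ref combination.
  constexpr Cat operator()() & { return Cat::lvalue; }
  constexpr Cat operator()() const & { return Cat::const_lvalue; }
  constexpr Cat operator()() && { return Cat::rvalue; }
  constexpr Cat operator()() const && { return Cat::const_rvalue; }
#endif
};

// Helpers that call the wrapper through each value category.
constexpr Cat call_lvalue() { Probe p{}; return p(); }
constexpr Cat call_const_lvalue() { const Probe p{}; return p(); }
constexpr Cat call_rvalue() { return Probe{}(); }
constexpr Cat call_const_rvalue() { const Probe p{}; return std::move(p)(); }
```

Both branches are intended to give the same answers, which is the property
that lets the real operator() overloads be merged.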

libstdc++-v3/ChangeLog:

* include/std/functional (_Bind_front): Remove =default special
member function declarations.
(_Bind_front::operator()): Implement using C++23 deducing this
when available.
* testsuite/20_util/function_objects/bind_front/111327.cc:
Adjust testcase to expect better errors in C++23 mode.
---
 libstdc++-v3/include/std/functional   | 20 +--
 .../function_objects/bind_front/111327.cc | 14 +++--
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 8d50a730889..190cea612bb 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -934,12 +934,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_bound_args(std::forward<_Args>(__args)...)
{ static_assert(sizeof...(_Args) == sizeof...(_BoundArgs)); }
 
-  _Bind_front(const _Bind_front&) = default;
-  _Bind_front(_Bind_front&&) = default;
-  _Bind_front& operator=(const _Bind_front&) = default;
-  _Bind_front& operator=(_Bind_front&&) = default;
-  ~_Bind_front() = default;
-
+#if __cpp_explicit_this_parameter
+  template
+   constexpr
+   invoke_result_t<__like_t<_Self, _Fd>, __like_t<_Self, _BoundArgs>..., 
_CallArgs...>
+   operator()(this _Self&& __self, _CallArgs&&... __call_args)
+   noexcept(is_nothrow_invocable_v<__like_t<_Self, _Fd>,
+   __like_t<_Self, _BoundArgs>...,
+   _CallArgs...>)
+   {
+ return _S_call(std::forward<_Self>(__self), _BoundIndices(),
+std::forward<_CallArgs>(__call_args)...);
+   }
+#else
   template
requires true
constexpr
@@ -997,6 +1004,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
void operator()(_CallArgs&&...) const && = delete;
+#endif
 
 private:
   using _BoundIndices = index_sequence_for<_BoundArgs...>;
diff --git 
a/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
index 43b56ca4378..5fe0a83baec 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
@@ -17,24 +17,26 @@ struct G {
 
 int main() {
   auto f0 = std::bind_front(F{});
-  f0(); // { dg-error "deleted" }
+  f0(); // { dg-error "deleted|no match" }
   std::move(f0)();
   std::as_const(f0)();
   std::move(std::as_const(f0))();
 
   auto g0 = std::bind_front(G{});
-  g0(); // { dg-error "deleted" }
-  std::move(g0)(); // { dg-error "deleted" }
+  g0(); // { dg-error "deleted|no match" }
+  std::move(g0)(); // { dg-error "deleted|no match" }
   std::move(std::as_const(g0))();
 
   auto f1 = std::bind_front(F{}, 42);
-  f1(); // { dg-error "deleted" }
+  f1(); // { dg-error "deleted|no match" }
   std::move(f1)();
   std::as_const(f1)();
   std::move(std::as_const(f1))();
 
   auto g1 = std::bind_front(G{}, 42);
-  g1(); // { dg-error "deleted" }
-  std::move(g1)(); // { dg-error "deleted" }
+  g1(); // { dg-error "deleted|no match" }
+  std::move(g1)(); // { dg-error "deleted|no match" }
   std::move(std::as_const(g1))();
 }
+
+// { dg-error "no type named 'type' in 'struct std::invoke_result" "" { target 
c++23 } 0 }
-- 
2.43.0.283.ga54a84b333



[PATCH, v2] Fortran: annotations for DO CONCURRENT loops [PR113305]

2024-01-12 Thread Harald Anlauf

Hi Bernhard,

On 1/12/24 10:44, Bernhard Reutner-Fischer wrote:

On Wed, 10 Jan 2024 23:24:22 +0100
Harald Anlauf  wrote:


diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 82f388c05f8..88502c1e3f0 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2926,6 +2926,10 @@ gfc_dt;
  typedef struct gfc_forall_iterator
  {
gfc_expr *var, *start, *end, *stride;
+  unsigned short unroll;
+  bool ivdep;
+  bool vector;
+  bool novector;
struct gfc_forall_iterator *next;
  }

[]

diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
index a718dce237f..59a9cf99f9b 100644
--- a/gcc/fortran/trans-stmt.cc
+++ b/gcc/fortran/trans-stmt.cc
@@ -41,6 +41,10 @@ typedef struct iter_info
tree start;
tree end;
tree step;
+  unsigned short unroll;
+  bool ivdep;
+  bool vector;
+  bool novector;
struct iter_info *next;
  }


Given that we already have in gfortran.h


typedef struct
{
   gfc_expr *var, *start, *end, *step;
   unsigned short unroll;
   bool ivdep;
   bool vector;
   bool novector;
}
gfc_iterator;


would it make sense to break out these loop annotation flags into their
own struct, say gfc_iterator_flags, and use pointers to those flags
instead?


I've created a struct gfc_loop_annot and use that directly
as I think using pointers to it is probably not very efficient.
Well, the struct is smaller than a pointer on a 64-bit system...
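
To make the size argument concrete, here is a stand-alone sketch (assuming a
typical ABI with a 2-byte unsigned short and 1-byte bool). The field layout
follows gfc_loop_annot from the patch, but loop_annot_sketch, iterator_sketch
and copy_iterator are hypothetical names, and the int member is only a
placeholder for the gfc_expr pointers.

```cpp
// Same fields as gfc_loop_annot in the patch; renamed to mark it as a sketch.
struct loop_annot_sketch
{
  unsigned short unroll;
  bool ivdep;
  bool vector;
  bool novector;
};

// On common ABIs this is 6 bytes (5 bytes of data plus padding), which is
// less than an 8-byte pointer on a 64-bit system.
static_assert(sizeof(loop_annot_sketch) < 8,
              "annotation struct should be smaller than a 64-bit pointer");

struct iterator_sketch
{
  int var;                  // placeholder for the gfc_expr pointers
  loop_annot_sketch annot;  // embedded by value, as in the patch
};

// Mirrors the shape of the gfc_copy_iterator change: one struct assignment
// replaces four member-by-member copies.
constexpr iterator_sketch
copy_iterator(const iterator_sketch& src)
{
  iterator_sketch dest{};
  dest.var = src.var;
  dest.annot = src.annot;
  return dest;
}
```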


LGTM otherwise.
Thanks for the patch!


Thanks for the review!

If there are no further comments, I'll commit the attached version
soon.

Harald

From 31d8957a95455663577a0e60109679d56aac234d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 12 Jan 2024 19:51:11 +0100
Subject: [PATCH] Fortran: annotations for DO CONCURRENT loops [PR113305]

gcc/fortran/ChangeLog:

	PR fortran/113305
	* gfortran.h (gfc_loop_annot): New.
	(gfc_iterator, gfc_forall_iterator): Use for annotation control.
	* array.cc (gfc_copy_iterator): Adjust.
	* gfortran.texi: Document annotations IVDEP, UNROLL n, VECTOR,
	NOVECTOR as applied to DO CONCURRENT.
	* parse.cc (parse_do_block): Parse annotations IVDEP, UNROLL n,
	VECTOR, NOVECTOR as applied to DO CONCURRENT.  Apply UNROLL only to
	first loop control variable.
	* trans-stmt.cc (iter_info): Use gfc_loop_annot.
	(gfc_trans_simple_do): Adjust.
	(gfc_trans_forall_loop): Annotate loops with IVDEP, UNROLL n,
	VECTOR, NOVECTOR as needed for DO CONCURRENT.
	(gfc_trans_forall_1): Handle loop annotations.

gcc/testsuite/ChangeLog:

	PR fortran/113305
	* gfortran.dg/do_concurrent_7.f90: New test.
---
 gcc/fortran/array.cc  |  5 +-
 gcc/fortran/gfortran.h| 11 -
 gcc/fortran/gfortran.texi | 12 +
 gcc/fortran/parse.cc  | 34 --
 gcc/fortran/trans-stmt.cc | 46 ++-
 gcc/testsuite/gfortran.dg/do_concurrent_7.f90 | 26 +++
 6 files changed, 113 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/do_concurrent_7.f90

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index 19456baf103..81fa99d219f 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -2308,10 +2308,7 @@ gfc_copy_iterator (gfc_iterator *src)
   dest->start = gfc_copy_expr (src->start);
   dest->end = gfc_copy_expr (src->end);
   dest->step = gfc_copy_expr (src->step);
-  dest->unroll = src->unroll;
-  dest->ivdep = src->ivdep;
-  dest->vector = src->vector;
-  dest->novector = src->novector;
+  dest->annot = src->annot;
 
   return dest;
 }
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 82f388c05f8..fd73e4ce431 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2830,14 +2830,22 @@ gfc_case;
 #define gfc_get_case() XCNEW (gfc_case)
 
 
+/* Annotations for loop constructs.  */
 typedef struct
 {
-  gfc_expr *var, *start, *end, *step;
   unsigned short unroll;
   bool ivdep;
   bool vector;
   bool novector;
 }
+gfc_loop_annot;
+
+
+typedef struct
+{
+  gfc_expr *var, *start, *end, *step;
+  gfc_loop_annot annot;
+}
 gfc_iterator;
 
 #define gfc_get_iterator() XCNEW (gfc_iterator)
@@ -2926,6 +2934,7 @@ gfc_dt;
 typedef struct gfc_forall_iterator
 {
   gfc_expr *var, *start, *end, *stride;
+  gfc_loop_annot annot;
   struct gfc_forall_iterator *next;
 }
 gfc_forall_iterator;
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 5615fee2897..371666dcbb6 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -3262,6 +3262,9 @@ It must be placed immediately before a @code{DO} loop and applies only to the
 loop that follows.  N is an integer constant specifying the unrolling factor.
 The values of 0 and 1 block any unrolling of the loop.
 
+For @code{DO CONCURRENT} constructs the unrolling specification applies
+only to the first loop control variable.
+
 
 @node BUILTIN directive
 @subsection BUILTIN directive
@@ -3300,6 +3303,9 @@ whether a particular loop is vectorizable 

Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-12 Thread Patrick Palka
On Fri, 12 Jan 2024, Jonathan Wakely wrote:

> On Fri, 12 Jan 2024 at 17:55, Patrick Palka  wrote:
> >
> > On Thu, 11 Jan 2024, Jonathan Wakely wrote:
> >
> > > I'd like to commit this to trunk for GCC 14. Please take a look.
> > >
> > > -- >8 --
> > >
> > > This is the last part of PR libstdc++/108822 implementing P2255R2, which
> > > makes it ill-formed to create a std::tuple that would bind a reference
> > > to a temporary.
> > >
> > > The dangling checks are implemented as deleted constructors for C++20
> > > and higher, and as Debug Mode static assertions in the constructor body
> > > for older standards. This is similar to the r13-6084-g916ce577ad109b
> > > changes for std::pair.
> > >
> > > As part of this change, I've reimplemented most of std::tuple for C++20,
> > > making use of concepts to replace the enable_if constraints, and using
> > > conditional explicit to avoid duplicating most constructors. We could
> > > use conditional explicit for the C++11 implementation too (with pragmas
> > > > to disable the -Wc++17-extensions warnings), but that should be done as
> > > a stage 1 change for GCC 15 rather than now.
> > >
> > > > The partial specialization for std::tuple<T1, T2> is no longer used for
> > > C++20 (or more precisely, for a C++20 compiler that supports concepts
> > > and conditional explicit). The additional constructors and assignment
> > > operators that take std::pair arguments have been added to the C++20
> > > implementation of the primary template, with sizeof...(_Elements)==2
> > > constraints. This avoids reimplementing all the other constructors in
> > > the std::tuple partial specialization to use concepts. This way
> > > we avoid four implementations of every constructor and only have three!
> > > (The primary template has an implementation of each constructor for
> > > C++11 and another for C++20, and the tuple specialization has an
> > > implementation of each for C++11, so that's three for each constructor.)
> > >
> > > In order to make the constraints more efficient on the C++20 version of
> > > the default constructor I've also added a variable template for the
> > > __is_implicitly_default_constructible trait, implemented using concepts.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   PR libstdc++/108822
> > >   * include/std/tuple (tuple): Add checks for dangling references.
> > >   Reimplement constraints and constant expressions using C++20
> > >   features.
> > >   * include/std/type_traits [C++20]
> > >   (__is_implicitly_default_constructible_v): Define.
> > >   (__is_implicitly_default_constructible): Use variable template.
> > >   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> > > ---
> > >  libstdc++-v3/include/std/tuple| 1021 -
> > >  libstdc++-v3/include/std/type_traits  |   11 +
> > >  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
> > >  3 files changed, 841 insertions(+), 296 deletions(-)
> > >  create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> > >
> > > diff --git a/libstdc++-v3/include/std/tuple 
> > > b/libstdc++-v3/include/std/tuple
> > > index 50e11843757..cd05b638923 100644
> > > --- a/libstdc++-v3/include/std/tuple
> > > +++ b/libstdc++-v3/include/std/tuple
> > > @@ -752,11 +752,467 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >template
> > >  class tuple : public _Tuple_impl<0, _Elements...>
> > >  {
> > > -  typedef _Tuple_impl<0, _Elements...> _Inherited;
> > > +  using _Inherited = _Tuple_impl<0, _Elements...>;
> > >
> > >template
> > >   using _TCC = _TupleConstraints<_Cond, _Elements...>;
> >
> > I guess this should be moved into the #else branch if it's not used in
> > the new impl.
> 
> Ah yes, I left them there until I was sure I wouldn't need them ...
> then didn't move them when I didn't need them.
> 
> >
> > >
> > > +#if __cpp_concepts && __cpp_conditional_explicit // >= C++20
> > > +  template
> > > + static consteval bool
> > > + __constructible()
> > > + {
> > > +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> > > + return (is_constructible_v<_Elements, _UTypes> && ...);
> >
> > IIUC this (and all the other new constraints) won't short-circuit like
> > the old versions do :/ Not sure how much that matters?
> 
> Yeah, I thought about that, but we have efficient built-ins for these
> traits now, so I think it's probably OK?

Performance-wise agreed, though I suppose removing the short circuiting
could break existing (though not necessarily valid) code that relied
on it to prevent an ill-formed template instantiation.  It seems
the standard https://eel.is/c++draft/tuple uses conjunction_v in some
constraints, and fold-expressions in others, implying short circuiting
in some cases but not others?

> 
> If not we could go back to sharing the _TupleConstraints implementations.

IMHO I'd be more comfortable with that.
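
To illustrate the short-circuiting concern, a minimal sketch follows (Fragile
is a hypothetical trait, not anything in libstdc++): std::conjunction stops
instantiating operands after the first false one, whereas a fold over && has
to instantiate every operand's ::value before the expression is evaluated.

```cpp
#include <type_traits>

// Hypothetical trait whose instantiation is a hard error for void.
template<typename T>
struct Fragile
{
  static_assert(!std::is_void_v<T>, "Fragile<void> must not be instantiated");
  static constexpr bool value = true;
};

// conjunction: Fragile<void> is named but never instantiated, because the
// first operand is already false.
constexpr bool short_circuit
  = std::conjunction_v<std::false_type, Fragile<void>>;

// Fold expression: every Ts::value is instantiated, so
// all_of_fold<std::false_type, Fragile<void>> would trip the static_assert
// in Fragile even though the result would be false.
template<typename... Ts>
constexpr bool all_of_fold = (Ts::value && ...);
```

Code that (perhaps unintentionally) relied on the conjunction behaviour to
avoid an ill-formed instantiation is the case that would stop compiling.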



Re: [PATCH] libstdc++: Make PSTL algorithms accept C++20 iterators [PR110512]

2024-01-12 Thread Patrick Palka
On Thu, 11 Jan 2024, Jonathan Wakely wrote:

> Tested x86_64-linux and aarch64-linux, with TBB 2020.3 only.
> 
> Reviews requested.
> 
> -- >8 --
> 
> This is a step towards implementing the C++23 change P2408R5, "Ranges
> iterators as inputs to non-Ranges algorithms". C++20 random access
> iterators which do not meet the Cpp17RandomAccessIterator requirements
> will now be recognized by the PSTL algorithms.

IIUC P2408R5 only relaxes the iterator requirements on non-mutating
algorithms, but presumably this patch relaxes the requirements for all
parallel algorithms?  Perhaps that's safe here, not sure..

Besides that LGTM.
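
For reference, the check being changed has roughly this shape. The sketch
below uses only standard names (has_ra_category, is_random_access_iter and
all_random_access are hypothetical, not the __pstl::__internal ones) and
assumes every iterator type provides iterator_traits<>::iterator_category.
When the library concepts are available, an iterator also qualifies if it
models std::random_access_iterator; otherwise only the category tag is
consulted.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>
#if __has_include(<version>)
# include <version>
#endif

// Classic check: iterator_category derives from random_access_iterator_tag.
template<typename Iter>
constexpr bool has_ra_category
  = std::is_base_of_v<std::random_access_iterator_tag,
                      typename std::iterator_traits<Iter>::iterator_category>;

#ifdef __cpp_lib_concepts
// C++20: additionally accept iterators that model the concept.
template<typename Iter>
constexpr bool is_random_access_iter
  = has_ra_category<Iter> || std::random_access_iterator<Iter>;
#else
template<typename Iter>
constexpr bool is_random_access_iter = has_ra_category<Iter>;
#endif

// True only if every iterator type in the pack qualifies.
template<typename... Iters>
constexpr bool all_random_access
  = (is_random_access_iter<std::remove_cv_t<std::remove_reference_t<Iters>>>
     && ...);
```

Note that the sketch applies the relaxed check uniformly, which is exactly the
point raised above about mutating versus non-mutating algorithms.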

> 
> We can also optimize the C++17 implementation by using std::__or_, and
> use std::__remove_cvref_t and std::__iter_category_t for readability.
> This diverges from the upstream PSTL, but since libc++ is no longer
> using that upstream (so we're the only consumer of this code) I think
> it's reasonable to use libstdc++ extensions in localized places like
> this. Rebasing this small header on upstream should not be difficult.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/110512
>   * include/pstl/execution_impl.h (__are_random_access_iterators):
>   Recognize C++20 random access iterators, and use more efficient
>   implementations.
>   * testsuite/25_algorithms/pstl/110512.cc: New test.
> ---
>  libstdc++-v3/include/pstl/execution_impl.h| 21 ++---
>  .../testsuite/25_algorithms/pstl/110512.cc| 31 +++
>  2 files changed, 47 insertions(+), 5 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/pstl/110512.cc
> 
> diff --git a/libstdc++-v3/include/pstl/execution_impl.h 
> b/libstdc++-v3/include/pstl/execution_impl.h
> index 64f6cc4357a..c84061848b9 100644
> --- a/libstdc++-v3/include/pstl/execution_impl.h
> +++ b/libstdc++-v3/include/pstl/execution_impl.h
> @@ -19,13 +19,24 @@ namespace __pstl
>  {
>  namespace __internal
>  {
> -
> -template 
> -using __are_iterators_of = std::conjunction<
> -std::is_base_of<_IteratorTag, typename 
> std::iterator_traits>::iterator_category>...>;
> +#if __glibcxx_concepts
> +template
> +  concept __is_random_access_iter
> += std::is_base_of_v + std::__iter_category_t<_Iter>>
> +  || std::random_access_iterator<_Iter>;
>  
>  template 
> -using __are_random_access_iterators = 
> __are_iterators_of;
> +  using __are_random_access_iterators
> += 
> std::bool_constant<(__is_random_access_iter>
>  && ...)>;
> +#else
> +template 
> +using __are_random_access_iterators
> += std::__and_<
> + std::is_base_of + 
> std::__iter_category_t>>...
> +  >;
> +#endif
>  
>  struct __serial_backend_tag
>  {
> diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/110512.cc 
> b/libstdc++-v3/testsuite/25_algorithms/pstl/110512.cc
> new file mode 100644
> index 000..188c7c915e5
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/25_algorithms/pstl/110512.cc
> @@ -0,0 +1,31 @@
> +// { dg-do compile { target c++17 } }
> +
> +// Bug 110512 - C++20 random access iterators run sequentially with PSTL
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +using InputIter = __gnu_test::input_iterator_wrapper;
> +using FwdIter = __gnu_test::forward_iterator_wrapper;
> +using RAIter = __gnu_test::random_access_iterator_wrapper;
> +
> +template
> +constexpr bool all_random_access
> +  = __pstl::__internal::__are_random_access_iterators::value;
> +
> +using __pstl::__internal::__are_random_access_iterators;
> +static_assert( all_random_access );
> +static_assert( all_random_access );
> +static_assert( ! all_random_access );
> +static_assert( ! all_random_access );
> +
> +#if __cpp_lib_ranges
> +using IotaIter = std::ranges::iterator_t>;
> +static_assert( std::random_access_iterator );
> +static_assert( all_random_access );
> +static_assert( all_random_access );
> +static_assert( all_random_access );
> +static_assert( ! all_random_access );
> +#endif
> -- 
> 2.43.0
> 
> 



[PATCH V3 3/4] RISC-V: Use default cost model for insn scheduling

2024-01-12 Thread Edwin Lu
Use default cost model scheduling on these test cases. All these tests
introduce scan dump failures with -mtune generic-ooo. Since the vector
cost models are the same across all three tunes, some of the tests
in PR113249 will be fixed with this patch series.

39 additional unique testsuite failures (scan dumps) will still be present.
I don't know how optimal the new output is compared to the old. Should I update
the testcase expected output to match the new scan dumps?

PR target/113249

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-1.C: use default scheduling
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-12.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-16.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-17.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-19.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-21.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-23.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-25.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-27.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-29.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-31.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-33.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-35.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-4.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-40.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-44.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-50.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-56.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-62.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-68.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-74.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-79.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-8.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-84.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-90.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-96.c: ditto
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-30.c: ditto
* gcc.target/riscv/rvv/base/pr108185-1.c: ditto
* gcc.target/riscv/rvv/base/pr108185-2.c: ditto
* gcc.target/riscv/rvv/base/pr108185-3.c: ditto
* gcc.target/riscv/rvv/base/pr108185-4.c: ditto
* gcc.target/riscv/rvv/base/pr108185-5.c: ditto
* gcc.target/riscv/rvv/base/pr108185-6.c: ditto
* gcc.target/riscv/rvv/base/pr108185-7.c: ditto
* gcc.target/riscv/rvv/base/shift_vx_constraint-1.c: ditto
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-28.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-29.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-32.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-33.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-17.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-18.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-19.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-11.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-12.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-4.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-5.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-6.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-7.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-8.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-9.c: ditto
* gfortran.dg/vect/vect-8.f90: ditto

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/g++.target/riscv/rvv/base/bug-1.C | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-102.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-108.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-114.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-119.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-12.c  | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-16.c  | 2 ++
 

[PATCH V3 4/4] RISC-V: Enable assert for insn_has_dfa_reservation

2024-01-12 Thread Edwin Lu
Enable the assert that every typed instruction is associated with a
DFA reservation.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): enable assert

Signed-off-by: Edwin Lu 
---
V2:
- No changes
V3:
- No changes
---
 gcc/config/riscv/riscv.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ee1a57b321d..c428d3e4e58 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8215,9 +8215,11 @@ riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, 
int more)
 
   /* If we ever encounter an insn without an insn reservation, trip
  an assert so we can find and fix this problem.  */
-#if 0
+  if (! insn_has_dfa_reservation_p (insn)) {
+print_rtl(stderr, insn);
+fprintf(stderr, "%d", get_attr_type (insn));
+  }
   gcc_assert (insn_has_dfa_reservation_p (insn));
-#endif
 
   return more - 1;
 }
-- 
2.34.1



[PATCH V3 2/4] RISC-V: Add vector related pipelines

2024-01-12 Thread Edwin Lu
Create a new generic vector pipeline file common to all CPU tunes.
Move all vector-related pipelines from generic-ooo to generic-vector-ooo.
Create new vector-crypto-related insn reservations. Add a temporary attribute
for making changes to the vector cost model.

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo): Move reservation
(generic_ooo_vec_load): ditto
(generic_ooo_vec_store): ditto
(generic_ooo_vec_loadstore_seg): ditto
(generic_ooo_vec_alu): ditto
(generic_ooo_vec_fcmp): ditto
(generic_ooo_vec_imul): ditto
(generic_ooo_vec_fadd): ditto
(generic_ooo_vec_fmul): ditto
(generic_ooo_crypto): ditto
(generic_ooo_perm): ditto
(generic_ooo_vec_reduction): ditto
(generic_ooo_vec_ordered_reduction): ditto
(generic_ooo_vec_idiv): ditto
(generic_ooo_vec_float_divsqrt): ditto
(generic_ooo_vec_mask): ditto
(generic_ooo_vec_vesetvl): ditto
(generic_ooo_vec_setrm): ditto
(generic_ooo_vec_readlen): ditto
* config/riscv/riscv.md (no): add temporary attribute
* config/riscv/generic-vector-ooo.md: to here

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
V2:
- Remove unnecessary syntax changes in generic-ooo
- Add new vector crypto reservations and types to
  pipelines
V3:
- Move all vector pipelines into separate file which defines all ooo vector
  reservations.
- Add temporary attribute while cost model changes.
---
 gcc/config/riscv/generic-ooo.md| 125 ---
 gcc/config/riscv/generic-vector-ooo.md | 165 +
 gcc/config/riscv/riscv.md  |   5 +
 3 files changed, 170 insertions(+), 125 deletions(-)
 create mode 100644 gcc/config/riscv/generic-vector-ooo.md

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
index ef8cb96daf4..40e5104cde1 100644
--- a/gcc/config/riscv/generic-ooo.md
+++ b/gcc/config/riscv/generic-ooo.md
@@ -48,9 +48,6 @@ (define_automaton "generic_ooo")
 ;; Integer/float issue queues.
 (define_cpu_unit "issue0,issue1,issue2,issue3,issue4" "generic_ooo")
 
-;; Separate issue queue for vector instructions.
-(define_cpu_unit "generic_ooo_vxu_issue" "generic_ooo")
-
 ;; Integer/float execution units.
 (define_cpu_unit "ixu0,ixu1,ixu2,ixu3" "generic_ooo")
 (define_cpu_unit "fxu0,fxu1" "generic_ooo")
@@ -58,12 +55,6 @@ (define_cpu_unit "fxu0,fxu1" "generic_ooo")
 ;; Integer subunit for division.
 (define_cpu_unit "generic_ooo_div" "generic_ooo")
 
-;; Vector execution unit.
-(define_cpu_unit "generic_ooo_vxu_alu" "generic_ooo")
-
-;; Vector subunit that does mult/div/sqrt.
-(define_cpu_unit "generic_ooo_vxu_multicycle" "generic_ooo")
-
 ;; Shortcuts
 (define_reservation "generic_ooo_issue" "issue0|issue1|issue2|issue3|issue4")
 (define_reservation "generic_ooo_ixu_alu" "ixu0|ixu1|ixu2|ixu3")
@@ -92,25 +83,6 @@ (define_insn_reservation "generic_ooo_float_store" 6
(eq_attr "type" "fpstore"))
   "generic_ooo_issue,generic_ooo_fxu")
 
-;; Vector load/store
-(define_insn_reservation "generic_ooo_vec_load" 6
-  (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vlde,vldm,vlds,vldux,vldox,vldff,vldr"))
-  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
-
-(define_insn_reservation "generic_ooo_vec_store" 6
-  (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vste,vstm,vsts,vstux,vstox,vstr"))
-  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
-
-;; Vector segment loads/stores.
-(define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
-  (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,\
-   vssegte,vssegts,vssegtux,vssegtox"))
-  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
-
-
 ;; Generic integer instructions.
 (define_insn_reservation "generic_ooo_alu" 1
   (and (eq_attr "tune" "generic_ooo")
@@ -191,103 +163,6 @@ (define_insn_reservation "generic_ooo_popcount" 2
(eq_attr "type" "cpop,clmul"))
   "generic_ooo_issue,generic_ooo_ixu_alu")
 
-;; Regular vector operations and integer comparisons.
-(define_insn_reservation "generic_ooo_vec_alu" 3
-  (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
-   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
-  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
-
-;; Vector float comparison, conversion etc.
-(define_insn_reservation "generic_ooo_vec_fcmp" 3
-  (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,\
-   vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,\
-   vfncvtftoi,vfncvtftof"))
-  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
-
-;; Vector integer multiplication.
-(define_insn_reservation "generic_ooo_vec_imul" 4
-  (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" 

[PATCH V3 1/4] RISC-V: Add non-vector types to dfa pipelines

2024-01-12 Thread Edwin Lu
This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
(generic_fmul_half): ditto
* config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types
* config/riscv/sifive-7.md (sifive_7_hfma): Add reservation
(sifive_7_popcount): ditto
* config/riscv/vector.md: change rdfrm to fmove
* config/riscv/zc.md: change pushpop to load/store

Signed-off-by: Edwin Lu 
---
V2:
- Add insn reservations for HF fmul
- Remove/adjust insn types
V3:
- No changes
---
 gcc/config/riscv/generic-ooo.md | 15 +-
 gcc/config/riscv/generic.md | 20 +--
 gcc/config/riscv/riscv.md   | 18 +++
 gcc/config/riscv/sifive-7.md| 17 +-
 gcc/config/riscv/vector.md  |  2 +-
 gcc/config/riscv/zc.md  | 96 -
 6 files changed, 102 insertions(+), 66 deletions(-)

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
index 421a7bb929d..ef8cb96daf4 100644
--- a/gcc/config/riscv/generic-ooo.md
+++ b/gcc/config/riscv/generic-ooo.md
@@ -115,9 +115,20 @@ (define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
 (define_insn_reservation "generic_ooo_alu" 1
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
-   move,bitmanip,min,max,minu,maxu,clz,ctz"))
+   move,bitmanip,rotate,min,max,minu,maxu,clz,ctz,atomic,\
+   condmove,mvpair,zicond"))
   "generic_ooo_issue,generic_ooo_ixu_alu")
 
+(define_insn_reservation "generic_ooo_sfb_alu" 2
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "sfb_alu"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Branch instructions
+(define_insn_reservation "generic_ooo_branch" 1
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "branch,jump,call,jalr,ret,trap"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
 
 ;; Float move, convert and compare.
 (define_insn_reservation "generic_ooo_float_move" 3
@@ -184,7 +195,7 @@ (define_insn_reservation "generic_ooo_popcount" 2
 (define_insn_reservation "generic_ooo_vec_alu" 3
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" "vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
-   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov"))
+   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector float comparison, conversion etc.
diff --git a/gcc/config/riscv/generic.md b/gcc/config/riscv/generic.md
index b99ae345bb3..45986cfea89 100644
--- a/gcc/config/riscv/generic.md
+++ b/gcc/config/riscv/generic.md
@@ -27,7 +27,9 @@ (define_cpu_unit "fdivsqrt" "pipe0")
 
 (define_insn_reservation "generic_alu" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,move,bitmanip,min,max,minu,maxu,clz,ctz,cpop"))
+   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
+   move,bitmanip,min,max,minu,maxu,clz,ctz,rotate,atomic,\
+   condmove,crypto,mvpair,zicond"))
   "alu")
 
 (define_insn_reservation "generic_load" 3
@@ -47,12 +49,17 @@ (define_insn_reservation "generic_xfer" 3
 
 (define_insn_reservation "generic_branch" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "branch,jump,call,jalr"))
+   (eq_attr "type" "branch,jump,call,jalr,ret,trap"))
+  "alu")
+
+(define_insn_reservation "generic_sfb_alu" 2
+  (and (eq_attr "tune" "generic")
+   (eq_attr "type" "sfb_alu"))
   "alu")
 
 (define_insn_reservation "generic_imul" 10
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "imul,clmul"))
+   (eq_attr "type" "imul,clmul,cpop"))
   "imuldiv*10")
 
 (define_insn_reservation "generic_idivsi" 34
@@ -67,6 +74,12 @@ (define_insn_reservation "generic_idivdi" 66
(eq_attr "mode" "DI")))
   "imuldiv*66")
 
+(define_insn_reservation "generic_fmul_half" 5
+  (and (eq_attr "tune" "generic")
+   (and (eq_attr "type" "fadd,fmul,fmadd")
+   (eq_attr "mode" "HF")))
+  "alu")
+
 (define_insn_reservation "generic_fmul_single" 5
   (and (eq_attr "tune" "generic")
(and (eq_attr "type" "fadd,fmul,fmadd")
@@ -88,3 +101,4 @@ (define_insn_reservation "generic_fsqrt" 25
   (and (eq_attr "tune" "generic")
(eq_attr "type" "fsqrt"))
   "fdivsqrt*25")
+
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 95753c75cfc..1ec3e165791 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -326,9 +326,7 @@ (define_attr "ext_enabled" "no,yes"
 ;; rotate   rotation instructions
 ;; atomic   atomic instructions
 ;; 

[PATCH V3 0/4] RISC-V: Associate typed insns to dfa reservation

2024-01-12 Thread Edwin Lu
Updates all tune insn reservation pipelines to cover all types defined by
define_attr "type" in riscv.md.

Creates new vector insn reservation pipelines in new file generic-vector-ooo.md
which has separate automaton vector_ooo where all reservations are mapped to.
This allows all tunes to share a common vector model for now as we make 
large changes to the vector cost model. 
(https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642511.html)

Disables pipeline scheduling for some tests with scan dump failures when using
-mtune=generic-ooo. 

Enables assert that all insn types must be associated with a dfa pipeline
reservation

Edwin Lu (4):
  RISC-V: Add non-vector types to dfa pipelines
  RISC-V: Add vector related pipelines
  RISC-V: Use default cost model for insn scheduling
  RISC-V: Enable assert for insn_has_dfa_reservation

---
V2:
- Update non-vector insn types and add new pipelines
- Add -fno-schedule-insn -fno-schedule-insn2 to some test cases

V3:
- Separate vector pipelines to separate file which all tunes have access to
---

 gcc/config/riscv/generic-ooo.md   | 138 ++-
 gcc/config/riscv/generic-vector-ooo.md| 165 ++
 gcc/config/riscv/generic.md   |  20 ++-
 gcc/config/riscv/riscv.cc |   6 +-
 gcc/config/riscv/riscv.md |  23 +--
 gcc/config/riscv/sifive-7.md  |  17 +-
 gcc/config/riscv/vector.md|   2 +-
 gcc/config/riscv/zc.md|  96 +-
 .../g++.target/riscv/rvv/base/bug-1.C |   2 +
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   2 +
 .../riscv/rvv/base/binop_vx_constraint-102.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-108.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-114.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-119.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-12.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-16.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-17.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-19.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-21.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-23.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-25.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-27.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-29.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-31.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-33.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-35.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-4.c|   2 +
 .../riscv/rvv/base/binop_vx_constraint-40.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-44.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-50.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-56.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-62.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-68.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-74.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-79.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-8.c|   2 +
 .../riscv/rvv/base/binop_vx_constraint-84.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-90.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-96.c   |   2 +
 .../rvv/base/float-point-dynamic-frm-30.c |   2 +
 .../gcc.target/riscv/rvv/base/pr108185-1.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-2.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-3.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-4.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-5.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-6.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-7.c|   2 +
 .../riscv/rvv/base/shift_vx_constraint-1.c|   2 +
 .../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-28.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-29.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-32.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-33.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_single_block-17.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_single_block-18.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_single_block-19.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-10.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-11.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-12.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-4.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-5.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-6.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-7.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-8.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-9.c   |   2 +
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +
 66 files changed, 391 insertions(+), 192 deletions(-)
 create mode 100644 gcc/config/riscv/generic-vector-ooo.md

-- 
2.34.1



[patch,avr,applied] Add link to sample ld-script in avr-gcc wiki.

2024-01-12 Thread Georg-Johann Lay

This links an example from the avr-gcc wiki that shows how to set up
a linker script for the __flashN avr address spaces in
section AVR Named Address Spaces of the GCC user manual.

Johann

--

AVR: Documentation: Web-Link an example ld-Script for Address-Space __flashN.


gcc/
* doc/extend.texi (AVR Named Address Spaces, Limitations and Caveats):
Add web-link to the avr-gcc wiki.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index eb4a42588c7..b9129d1b464 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1608,6 +1608,8 @@ supports reading across the 64@tie{}KiB flash segment boundaries is

 If you use one of the @code{__flash@var{N}} address spaces
 you must arrange your linker script to locate the
 @code{.progmem@var{N}.data} sections according to your needs.
+For an example, see the
+@w{@uref{https://gcc.gnu.org/wiki/avr-gcc#Address_Spaces,avr-gcc wiki}}

 @item
 Any data or pointers to the non-generic address spaces must


Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-12 Thread Jonathan Wakely
On Fri, 12 Jan 2024 at 17:55, Patrick Palka  wrote:
>
> On Thu, 11 Jan 2024, Jonathan Wakely wrote:
>
> > I'd like to commit this to trunk for GCC 14. Please take a look.
> >
> > -- >8 --
> >
> > This is the last part of PR libstdc++/108822 implementing P2255R2, which
> > makes it ill-formed to create a std::tuple that would bind a reference
> > to a temporary.
> >
> > The dangling checks are implemented as deleted constructors for C++20
> > and higher, and as Debug Mode static assertions in the constructor body
> > for older standards. This is similar to the r13-6084-g916ce577ad109b
> > changes for std::pair.
> >
> > As part of this change, I've reimplemented most of std::tuple for C++20,
> > making use of concepts to replace the enable_if constraints, and using
> > conditional explicit to avoid duplicating most constructors. We could
> > use conditional explicit for the C++11 implementation too (with pragmas
> > to disable the -Wc++17-extensions warnings), but that should be done as
> > a stage 1 change for GCC 15 rather than now.
> >
> > The partial specialization for std::tuple is no longer used for
> > C++20 (or more precisely, for a C++20 compiler that supports concepts
> > and conditional explicit). The additional constructors and assignment
> > operators that take std::pair arguments have been added to the C++20
> > implementation of the primary template, with sizeof...(_Elements)==2
> > constraints. This avoids reimplementing all the other constructors in
> > the std::tuple partial specialization to use concepts. This way
> > we avoid four implementations of every constructor and only have three!
> > (The primary template has an implementation of each constructor for
> > C++11 and another for C++20, and the tuple specialization has an
> > implementation of each for C++11, so that's three for each constructor.)
> >
> > In order to make the constraints more efficient on the C++20 version of
> > the default constructor I've also added a variable template for the
> > __is_implicitly_default_constructible trait, implemented using concepts.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/108822
> >   * include/std/tuple (tuple): Add checks for dangling references.
> >   Reimplement constraints and constant expressions using C++20
> >   features.
> >   * include/std/type_traits [C++20]
> >   (__is_implicitly_default_constructible_v): Define.
> >   (__is_implicitly_default_constructible): Use variable template.
> >   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> > ---
> >  libstdc++-v3/include/std/tuple| 1021 -
> >  libstdc++-v3/include/std/type_traits  |   11 +
> >  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
> >  3 files changed, 841 insertions(+), 296 deletions(-)
> >  create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> >
> > diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
> > index 50e11843757..cd05b638923 100644
> > --- a/libstdc++-v3/include/std/tuple
> > +++ b/libstdc++-v3/include/std/tuple
> > @@ -752,11 +752,467 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >template
> >  class tuple : public _Tuple_impl<0, _Elements...>
> >  {
> > -  typedef _Tuple_impl<0, _Elements...> _Inherited;
> > +  using _Inherited = _Tuple_impl<0, _Elements...>;
> >
> >template
> >   using _TCC = _TupleConstraints<_Cond, _Elements...>;
>
> I guess this should be moved into the #else branch if it's not used in
> the new impl.

Ah yes, I left them there until I was sure I wouldn't need them ...
then didn't move them when I didn't need them.

>
> >
> > +#if __cpp_concepts && __cpp_conditional_explicit // >= C++20
> > +  template
> > + static consteval bool
> > + __constructible()
> > + {
> > +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> > + return (is_constructible_v<_Elements, _UTypes> && ...);
>
> IIUC this (and all the other new constraints) won't short-circuit like
> the old versions do :/ Not sure how much that matters?

Yeah, I thought about that, but we have efficient built-ins for these
traits now, so I think it's probably OK?

If not we could go back to sharing the _TupleConstraints implementations.
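
The short-circuiting question above can be illustrated with a small
stand-alone sketch.  This is not the libstdc++ code: type_list,
constructible_fold, constructible_conj and tripwire are hypothetical names
made up for the illustration.  It contrasts a fold expression over
is_constructible_v, which instantiates every operand, with std::conjunction,
which stops instantiating at the first false operand.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: carries a pack of types as a function argument.
template<typename... Ts> struct type_list {};

// Fold-expression form, shaped like __constructible() in the quoted patch.
// Every is_constructible_v operand is instantiated, even after a false one.
template<typename... Elements, typename... UTypes>
constexpr bool
constructible_fold(type_list<Elements...>, type_list<UTypes...>)
{
  if constexpr (sizeof...(UTypes) == sizeof...(Elements))
    return (std::is_constructible_v<Elements, UTypes> && ...);
  else
    return false;
}

// std::conjunction form.  Instantiation of the operands' ::value stops at
// the first operand whose value is false.
template<typename... Elements, typename... UTypes>
constexpr bool
constructible_conj(type_list<Elements...>, type_list<UTypes...>)
{
  if constexpr (sizeof...(UTypes) == sizeof...(Elements))
    return std::conjunction_v<std::is_constructible<Elements, UTypes>...>;
  else
    return false;
}

// Hypothetical trait that is a hard error if it is ever instantiated for void.
template<typename T>
struct tripwire
{
  static_assert(!std::is_void_v<T>, "tripwire<void> was instantiated");
  static constexpr bool value = true;
};

// conjunction never looks at tripwire<void>::value, so this compiles.
static_assert(!std::conjunction_v<std::false_type, tripwire<void>>);

// The fold-style equivalent names tripwire<void>::value and would not compile:
// static_assert(!(std::false_type::value && tripwire<void>::value));

// Both forms agree on the result; they differ only in what gets instantiated.
static_assert(constructible_fold(type_list<int, std::string>{},
                                 type_list<long, const char*>{}));
static_assert(constructible_conj(type_list<int, std::string>{},
                                 type_list<long, const char*>{}));
```

Whether the extra instantiations matter in practice depends on how cheap the
traits are, which is the point made above about the compiler built-ins.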



Re: Ping [PATCH] testsuite: Reduce gcc.dg/torture/inline-mem-cpy-1.c by 11 for simulators

2024-01-12 Thread Mike Stump
On Jan 12, 2024, at 2:52 AM, Hans-Peter Nilsson  wrote:
> 
> Ping.  (Don't miss the gcc.dg/torture/inline-mem-cpy-1.c part.)
> 
> On Mon, 1 Jan 2024, Hans-Peter Nilsson wrote:
> 
>> Tested mmix-knuth-mmixware (where all torture-variants of
>> gcc.dg/torture/inline-mem-cpy-1.c now pass) and native
>> x86_64-pc-linux-gnu.  Also stepped through the test for native,
>> w/wo. RUN_FRACTION defined to see that it worked as intended.
>> 
>> You may wonder what about the "sibling" tests inline-mem-cmp-1.c and
>> inline-mem-cpy-cmp-1.c.  Well, they FAIL, but not because of
>> timeouts(!)  To be continued
>> 
>> Ok to commit?

Ok.


Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-12 Thread Patrick Palka
On Thu, 11 Jan 2024, Jonathan Wakely wrote:

> I'd like to commit this to trunk for GCC 14. Please take a look.
> 
> -- >8 --
> 
> This is the last part of PR libstdc++/108822 implementing P2255R2, which
> makes it ill-formed to create a std::tuple that would bind a reference
> to a temporary.
> 
> The dangling checks are implemented as deleted constructors for C++20
> and higher, and as Debug Mode static assertions in the constructor body
> for older standards. This is similar to the r13-6084-g916ce577ad109b
> changes for std::pair.
> 
> As part of this change, I've reimplemented most of std::tuple for C++20,
> making use of concepts to replace the enable_if constraints, and using
> conditional explicit to avoid duplicating most constructors. We could
> use conditional explicit for the C++11 implementation too (with pragmas
> to disable the -Wc++17-extensions warnings), but that should be done as
> a stage 1 change for GCC 15 rather than now.
> 
> The partial specialization for std::tuple is no longer used for
> C++20 (or more precisely, for a C++20 compiler that supports concepts
> and conditional explicit). The additional constructors and assignment
> operators that take std::pair arguments have been added to the C++20
> implementation of the primary template, with sizeof...(_Elements)==2
> constraints. This avoids reimplementing all the other constructors in
> the std::tuple partial specialization to use concepts. This way
> we avoid four implementations of every constructor and only have three!
> (The primary template has an implementation of each constructor for
> C++11 and another for C++20, and the tuple specialization has an
> implementation of each for C++11, so that's three for each constructor.)
> 
> In order to make the constraints more efficient on the C++20 version of
> the default constructor I've also added a variable template for the
> __is_implicitly_default_constructible trait, implemented using concepts.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/108822
>   * include/std/tuple (tuple): Add checks for dangling references.
>   Reimplement constraints and constant expressions using C++20
>   features.
>   * include/std/type_traits [C++20]
>   (__is_implicitly_default_constructible_v): Define.
>   (__is_implicitly_default_constructible): Use variable template.
>   * testsuite/20_util/tuple/dangling_ref.cc: New test.
> ---
>  libstdc++-v3/include/std/tuple| 1021 -
>  libstdc++-v3/include/std/type_traits  |   11 +
>  .../testsuite/20_util/tuple/dangling_ref.cc   |  105 ++
>  3 files changed, 841 insertions(+), 296 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dangling_ref.cc
> 
> diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
> index 50e11843757..cd05b638923 100644
> --- a/libstdc++-v3/include/std/tuple
> +++ b/libstdc++-v3/include/std/tuple
> @@ -752,11 +752,467 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
>  class tuple : public _Tuple_impl<0, _Elements...>
>  {
> -  typedef _Tuple_impl<0, _Elements...> _Inherited;
> +  using _Inherited = _Tuple_impl<0, _Elements...>;
>  
>template
>   using _TCC = _TupleConstraints<_Cond, _Elements...>;

I guess this should be moved into the #else branch if it's not used in
the new impl.

>  
> +#if __cpp_concepts && __cpp_conditional_explicit // >= C++20
> +  template
> + static consteval bool
> + __constructible()
> + {
> +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> + return (is_constructible_v<_Elements, _UTypes> && ...);

IIUC this (and all the other new constraints) won't short-circuit like
the old versions do :/ Not sure how much that matters?

> +   else
> + return false;
> + }
> +
> +  template
> + static consteval bool
> + __nothrow_constructible()
> + {
> +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> + return (is_nothrow_constructible_v<_Elements, _UTypes> && ...);
> +   else
> + return false;
> + }
> +
> +  template
> + static consteval bool
> + __convertible()
> + {
> +   if constexpr (sizeof...(_UTypes) == sizeof...(_Elements))
> + return (is_convertible_v<_UTypes, _Elements> && ...);
> +   else
> + return false;
> + }
> +
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 3121. tuple constructor constraints for UTypes&&... overloads
> +  template
> + static consteval bool
> + __disambiguating_constraint()
> + {
> +   if constexpr (sizeof...(_Elements) != sizeof...(_UTypes))
> + return false;
> +   else if constexpr (sizeof...(_Elements) == 1)
> + {
> +   using _U0 = typename _Nth_type<0, _UTypes...>::type;
> +   return !is_same_v, tuple>;
> + }
> +   else if constexpr (sizeof...(_Elements) < 4)
> + {
> +   using _U0 = 

[patch,avr,applied] Fix documentation for attribute "address".

2024-01-12 Thread Georg-Johann Lay

avr attribute "address" only supports exactly one argument,
fixed thusly.

Johann

--

AVR: Documentation: Attribute address has exactly one argument.

gcc/
* doc/extend.texi (AVR Variable Attributes) [address]: Remove
documentation for a version without argument, which is not supported.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index eebff4071e8..eb4a42588c7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8317,8 +8317,7 @@ allowing the use of @code{cbi}, @code{sbi}, @code{sbic} and @code{sbis}

 instructions.

 @cindex @code{address} variable attribute, AVR
-@item address
-@itemx address (@var{addr})
+@item address (@var{addr})
 Variables with the @code{address} attribute can be used to address
 memory-mapped peripherals that may lie outside the I/O address range.
 Just like with the @code{io} and @code{io_low} attributes, no memory is



Re: HELP: Questions on unshare_expr

2024-01-12 Thread Qing Zhao
Thanks a lot for the reply.  

> On Jan 12, 2024, at 11:28 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 12.01.2024 um 16:55 schrieb Qing Zhao :
>> 
>> Hi,
>> 
>> I have some questions on using the utility routine “unshare_expr”:
>> 
>> From my understanding, there should be NO shared nodes in a GENERIC function.
>> Otherwise, gimplification might fail.
> 
> There is sharing and this is why we unshare everything before gimplification.

Okay, so the “unsharing everything” is done automatically by the compiler
before gimplification?
I don’t need to worry about this?

I see many places in the FE where “unshare_expr” is used, for example
“ubsan_instrument_division”, “ubsan_instrument_shift”, etc.

So, usually, when should “unshare_expr” be used? 

>> Therefore, when we insert new tree nodes manually into the GENERIC function,
>> we should make sure that no shared nodes are introduced.
>> 
>> 1. Is the above understanding correct?
> 
> No
> 
>> 2. Is there any tool to check there is no shared nodes in the GENERIC 
>> function?
>> 3. Are there any tree nodes that are allowed to be shared in a GENERIC 
>> function? If so, what are they?
> 
> There’s some allowed sharing on GIMPLE and a verifier.
What’s the name of the verifier that I can search and check? 
> 
>> 4. For the following:
>> 
>> If both “op1” and “op2” are existing tree nodes in the current GENERIC 
>> function,
>> and we will insert a new tree node:
>> 
>> tree  new_tree = build2 (CODE, TYPE, op1, op2)
>> 
>> 
>> Should we add “unshare_expr” on both “op1” and “op2” as:
>> 
>> tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
>> ?
> 
> Not necessarily but instead you have to watch for evaluating side-effects 
> only once.  See save_expr.

Okay.  I see.
> 
>> 
>> If op2 is a node that is allowed to be shared, would the additional
>> “unshare_expr” on it trigger any potential problem?
> 
> If you unshare side-effects that’s generating wrong-code.  Otherwise 
> unsharing is safe.

Okay. 
Will unnecessary unsharing produce redundant IR?

All my questions about unshare_expr relate to an LTO bug that I am currently
stuck on when using .ACCESS_WITH_SIZE in the bounds sanitizer (it only shows
up with -flto; without -flto there is no issue):

[opc@qinzhao-aarch64-ol8 gcc]$ sh t
during IPA pass: modref
t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in LTO streams
0x14c3993 lto_write_tree
../../latest-gcc-write/gcc/lto-streamer-out.cc:561
0x14c3aeb lto_output_tree_1

And the value of the tree node that triggered the ICE is:
(gdb) call debug_tree(expr)
 
nothrow
def_stmt 
version:13 in-free-list>

Is there a good way to debug an LTO bug?

Thanks a lot for the help.

Qing


> 
> Richard 
> 
>> Thanks a lot for your help.
>> 
>> Qing
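
The point above about evaluating side-effects only once can be sketched with
a toy model.  This is not GCC code: Expr, post_increment, add and save_once
are hypothetical stand-ins, and save_once only loosely mimics what save_expr
is described as doing in the thread (evaluate the operand once and reuse the
value).  Using the same side-effecting operand twice evaluates it twice,
whereas wrapping it first evaluates it once.

```cpp
#include <functional>
#include <memory>
#include <optional>

// Toy expression node.  This is NOT GCC's `tree`; it only models evaluation.
struct Expr
{
  std::function<int()> eval_fn;
  int eval() const { return eval_fn(); }
};
using ExprPtr = std::shared_ptr<Expr>;

// Leaf with a side effect: every evaluation increments the counter (like i++).
ExprPtr post_increment(int& counter)
{
  return std::make_shared<Expr>(Expr{[&counter] { return counter++; }});
}

// Binary node that evaluates both operands each time it is evaluated.
ExprPtr add(ExprPtr a, ExprPtr b)
{
  return std::make_shared<Expr>(Expr{[a, b] { return a->eval() + b->eval(); }});
}

// Toy analogue of save_expr: the operand is evaluated at most once and the
// cached value is returned on every later evaluation.
ExprPtr save_once(ExprPtr e)
{
  auto cache = std::make_shared<std::optional<int>>();
  return std::make_shared<Expr>(Expr{[e, cache] {
    if (!*cache)
      *cache = e->eval();
    return **cache;
  }});
}
```

For example, with a counter starting at 5, add(inc, inc) on the raw
post_increment node runs the increment twice, while
add(saved, saved) on save_once(post_increment(j)) runs it once.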



Re: [PATCH RFC] c++/modules: __class_type_info and modules

2024-01-12 Thread Jason Merrill

On 1/12/24 09:09, Jason Merrill wrote:

On 12/23/23 14:46, Nathan Sidwell wrote:

On 12/18/23 17:10, Jason Merrill wrote:

On 12/18/23 16:57, Nathan Sidwell wrote:

On 12/18/23 16:31, Jason Merrill wrote:
Tested x86_64-pc-linux-gnu.  Does this make sense?  Did you have another
theory about how to merge these?


Why isn't push_abi_namespace doing the right setup here? (and I 
think get_global_binding might be similarly problematic?)


What would the right setup be?  It pushes into the global module, but 
before this change lookup doesn't find things imported into the 
global module, and so we get two independent (and so non-equivalent) 
declarations.


The comment for get_namespace_binding says "Users of this who, having 
found nothing, push a new decl must be prepared for that pushing to 
match an existing decl."  But if lookup_elaborated_type fails, so we 
pushtag a new type, check_module_override doesn't try to merge them 
because TREE_PUBLIC isn't set on the TYPE_DECL yet at that point, and 
they coexist until we complain about redeclaring __dynamic_cast with 
non-matching parameter types.


I tried setting TREE_PUBLIC on the TYPE_DECL, and then 
check_module_override called duplicate_decls, and rejected the 
redeclaration as a different type.


Sigh, it seems that doesn't work as intended. I guess your approach is
a pragmatic workaround, much as I dislike special-casing particular
identifiers. Perhaps comment with an appropriate FIXME?


I've realized there are problems with completeness here -- the
'invisible' type may be complete, but the current TU only
forward-declares it.  Our AST can't represent that right now.  And I'm
not sure if there are template instantiation issues -- is the type
complete or not in any particular instantiation?


My understanding of https://eel.is/c++draft/module#reach-4 is that this 
doesn't matter: if there is a reachable definition of the class, the 
class is complete, even if the current TU only forward-declares it.


Here's an alternate approach that handles this merging in 
check_module_override; this makes P1811 include-after-import a bit 
worse, but it's already not well supported, so perhaps that's OK for 
now.  But I'm inclined to go with my earlier patch for GCC 14.  What do 
you think?


I'm going to go ahead and push this revision of my earlier patch for 
now, we can adjust as needed.
From 27521a2f4f7b859d5656e5bdd69d3f759ea4c23a Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 18 Dec 2023 15:47:10 -0500
Subject: [PATCH] c++: __class_type_info and modules [PR113038]
To: gcc-patches@gcc.gnu.org

Doing a dynamic_cast in both TUs broke because we were declaring a new
__class_type_info in _b that conflicted with the one imported in the global
module from _a.  It seems clear to me that any new class declaration in
the global module should merge with an imported definition, but for GCC 14
let's just fix this for the specific case of __class_type_info.

	PR c++/113038

gcc/cp/ChangeLog:

	* name-lookup.cc (lookup_elaborated_type): Look for bindings
	in the global namespace in the ABI namespace.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/pr106304_b.C: Add dynamic_cast.
---
 gcc/cp/name-lookup.cc | 16 +---
 gcc/testsuite/g++.dg/modules/pr106304_b.C |  1 +
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 26c6bc71e99..d827d337d3b 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -8089,9 +8089,19 @@ lookup_elaborated_type (tree name, TAG_how how)
 	{
 	  /* We're in the global module, perhaps there's a tag
 		 there?  */
-	  // FIXME: This isn't quite right, if we find something
-	  // here, from the language PoV we're not supposed to
-	  // know it?
+
+	  /* FIXME: In general we should probably merge global module
+		 classes in check_module_override rather than here, but for
+		 GCC14 let's just fix lazy declarations of __class_type_info in
+		 build_dynamic_cast_1.  */
+	  if (current_namespace == abi_node)
+		{
+		  tree g = (BINDING_VECTOR_CLUSTER (*slot, 0)
+			.slots[BINDING_SLOT_GLOBAL]);
+		  for (ovl_iterator iter (g); iter; ++iter)
+		if (qualify_lookup (*iter, LOOK_want::TYPE))
+		  return *iter;
+		}
 	}
 	}
 }
diff --git a/gcc/testsuite/g++.dg/modules/pr106304_b.C b/gcc/testsuite/g++.dg/modules/pr106304_b.C
index e8333909c8d..0d1da086176 100644
--- a/gcc/testsuite/g++.dg/modules/pr106304_b.C
+++ b/gcc/testsuite/g++.dg/modules/pr106304_b.C
@@ -5,4 +5,5 @@ module pr106304;
 
 void f(A& a) {
   as_b(a);
+  dynamic_cast();
 }
-- 
2.39.3



Re: [PATCH v3 00/12] [GCC] arm: vld1q vst1 vst1q vst1 intrinsics

2024-01-12 Thread Richard Earnshaw (lists)
On 02/01/2024 09:23, ezra.sito...@arm.com wrote:
> From: Ezra Sitorus 
> 
> Add vld1q, vst1, vst1q and vld1 intrinsics to arm port.
> 
> Ezra Sitorus (12):
>   [GCC] arm: vld1q_types_x2 ACLE intrinsics
>   [GCC] arm: vld1q_types_x3 ACLE intrinsics
>   [GCC] arm: vld1q_types_x4 ACLE intrinsics
>   [GCC] arm: vst1_types_x2 ACLE intrinsics
>   [GCC] arm: vst1_types_x3 ACLE intrinsics
>   [GCC] arm: vst1_types_x4 ACLE intrinsics
>   [GCC] arm: vst1q_types_x2 ACLE intrinsics
>   [GCC] arm: vst1q_types_x3 ACLE intrinsics
>   [GCC] arm: vst1q_types_x4 ACLE intrinsics
>   [GCC] arm: vld1_types_x2 ACLE intrinsics
>   [GCC] arm: vld1_types_x3 ACLE intrinsics
>   [GCC] arm: vld1_types_x4 ACLE intrinsics
> 
>  gcc/config/arm/arm_neon.h | 2032 ++---
>  gcc/config/arm/arm_neon_builtins.def  |   12 +
>  gcc/config/arm/iterators.md   |6 +
>  gcc/config/arm/neon.md|  249 ++
>  gcc/config/arm/unspecs.md |8 +
>  .../gcc.target/arm/simd/vld1_base_xN_1.c  |  176 ++
>  .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   23 +
>  .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   23 +
>  .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   23 +
>  .../gcc.target/arm/simd/vld1q_base_xN_1.c |  183 ++
>  .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   24 +
>  .../gcc.target/arm/simd/vst1_base_xN_1.c  |  176 ++
>  .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   22 +
>  .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   23 +
>  .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   23 +
>  .../gcc.target/arm/simd/vst1q_base_xN_1.c |  185 ++
>  .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |   24 +
>  21 files changed, 3018 insertions(+), 290 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_p64_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_p64_xN_1.c
> 

Thanks, I've pushed this series.

Reviewing this series did highlight a couple of issues with the existing code 
base (not your patch); I'll follow up on these separately.

R.


Re: HELP: Questions on unshare_expr

2024-01-12 Thread Richard Biener



> On 12.01.2024 at 16:55, Qing Zhao wrote:
> 
> Hi,
> 
> I have some questions on using the utility routine “unshare_expr”:
> 
> From my understanding, there should be NO shared nodes in a GENERIC function.
> Otherwise, gimplification might fail.

There is sharing and this is why we unshare everything before gimplification.

> Therefore, when we insert new tree nodes manually into the GENERIC function, 
> we should make sure that no shared nodes are introduced.
> 
> 1. Is the above understanding correct?

No

> 2. Is there any tool to check that there are no shared nodes in the GENERIC 
> function?
> 3. Are there any tree nodes that are allowed to be shared in a GENERIC 
> function? If so, what are they?

There’s some allowed sharing on GIMPLE and a verifier.

> 4. For the following:
> 
> If both “op1” and “op2” are existing tree nodes in the current GENERIC 
> function,
> and we will insert a new tree node:
> 
> tree  new_tree = build2 (CODE, TYPE, op1, op2)
> 
> 
> Should we add “unshare_expr” on both “op1” and “op2” as:
> 
> tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
> ?

Not necessarily, but you do have to make sure side-effects are evaluated only 
once.  See save_expr.

> 
> If op2 is a node that is allowed to be shared, would the additional 
> “unshare_expr” on it trigger any potential problem?

If you unshare an expression with side-effects, those side-effects get 
duplicated, which generates wrong code.  Otherwise unsharing is safe.

Richard 

> Thanks a lot for your help.
> 
> Qing
> 
> 
> 
> 
> 
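
To illustrate the point about side-effects with something that runs outside
the compiler, here is a minimal sketch in C.  It is a toy model only:
toy_node, toy_build, toy_copy and toy_eval are hypothetical stand-ins, not
GCC's tree, build2, unshare_expr or save_expr, and the sketch assumes the new
node uses the same side-effecting operand twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an expression node; this is NOT GCC's `tree`.  */
enum toy_code { TOY_CONST, TOY_CALL_NEXT, TOY_PLUS, TOY_SAVE };

struct toy_node {
  enum toy_code code;
  int value;      /* TOY_CONST: the constant; TOY_SAVE: the cached result.  */
  int evaluated;  /* TOY_SAVE: nonzero once the operand has been evaluated.  */
  struct toy_node *op0, *op1;
};

/* State mutated by TOY_CALL_NEXT, i.e. the observable side effect.  */
int toy_counter;

/* Allocate a node.  Nodes are leaked on purpose to keep the sketch short.  */
static struct toy_node *
toy_build (enum toy_code code, int value,
           struct toy_node *op0, struct toy_node *op1)
{
  struct toy_node *n = calloc (1, sizeof *n);
  assert (n != NULL);
  n->code = code;
  n->value = value;
  n->op0 = op0;
  n->op1 = op1;
  return n;
}

/* Deep copy of an expression, loosely in the spirit of unsharing.  */
static struct toy_node *
toy_copy (const struct toy_node *n)
{
  if (n == NULL)
    return NULL;
  struct toy_node *c = toy_build (n->code, n->value,
                                  toy_copy (n->op0), toy_copy (n->op1));
  c->evaluated = n->evaluated;
  return c;
}

/* Evaluate an expression; a TOY_SAVE node evaluates its operand once.  */
static int
toy_eval (struct toy_node *n)
{
  switch (n->code)
    {
    case TOY_CONST:
      return n->value;
    case TOY_CALL_NEXT:
      return ++toy_counter;
    case TOY_PLUS:
      return toy_eval (n->op0) + toy_eval (n->op1);
    case TOY_SAVE:
      if (!n->evaluated)
        {
          n->value = toy_eval (n->op0);
          n->evaluated = 1;
        }
      return n->value;
    }
  return 0;
}

/* x + x built from two independent copies of a side-effecting operand.  */
int
demo_copied (void)
{
  toy_counter = 0;
  struct toy_node *x = toy_build (TOY_CALL_NEXT, 0, NULL, NULL);
  struct toy_node *sum = toy_build (TOY_PLUS, 0, toy_copy (x), toy_copy (x));
  return toy_eval (sum);
}

/* x + x where both operands share one save-style wrapper around x.  */
int
demo_saved (void)
{
  toy_counter = 0;
  struct toy_node *x = toy_build (TOY_CALL_NEXT, 0, NULL, NULL);
  struct toy_node *saved = toy_build (TOY_SAVE, 0, x, NULL);
  struct toy_node *sum = toy_build (TOY_PLUS, 0, saved, saved);
  return toy_eval (sum);
}
```

In this model demo_copied runs the side effect twice (toy_counter ends at 2),
while demo_saved runs it once (toy_counter ends at 1).  Copying is harmless
for the TOY_CONST case, which has no side effect to duplicate.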


HELP: Questions on unshare_expr

2024-01-12 Thread Qing Zhao
Hi, 

I have some questions on using the utility routine “unshare_expr”:

From my understanding, there should be NO shared nodes in a GENERIC function. 
Otherwise, gimplification might fail. 

Therefore, when we insert new tree nodes manually into the GENERIC function, we 
should make sure that no shared nodes are introduced. 

1. Is the above understanding correct?
2. Is there any tool to check that there are no shared nodes in the GENERIC function?
3. Are there any tree nodes that are allowed to be shared in a GENERIC 
function? If so, what are they?

4. For the following:

If both “op1” and “op2” are existing tree nodes in the current GENERIC 
function, 
and we will insert a new tree node:

tree  new_tree = build2 (CODE, TYPE, op1, op2)


Should we add “unshare_expr” on both “op1” and “op2” as:

tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
?

If op2 is a node that is allowed to be shared, would the additional 
“unshare_expr” on it trigger any potential problem?

Thanks a lot for your help.

Qing 







Re: [PATCH] c: Avoid _BitInt indexes > sizetype in ARRAY_REFs [PR113315]

2024-01-12 Thread Joseph Myers
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> When build_array_ref doesn't use ARRAY_REF, it casts the index to sizetype
> already, performs POINTER_PLUS_EXPR and then dereferences.
> While when emitting ARRAY_REF, we try to keep index expression as is in
> whatever type it had, which is reasonable e.g. for signed or unsigned types
> narrower than sizetype for loop optimizations etc.
> But if the index is wider than sizetype, we are unnecessarily computing
> bits beyond what is needed.  For {,unsigned }__int128 on 64-bit arches
> or {,unsigned }long long on 32-bit arches we've been doing that for decades,
> so the following patch doesn't propose to change that (might be stage1
> material), but for _BitInt at least the _BitInt lowering code doesn't expect
> to see large/huge _BitInt in the ARRAY_REF indexes, I was expecting one
> would see just casts of those to sizetype.
> 
> So, the following patch makes sure that large/huge _BitInt indexes don't
> appear in ARRAY_REFs.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [pushed][PR112918][LRA]: Fixing IRA ICE on m68k

2024-01-12 Thread YunQiang Su
Vladimir Makarov wrote on Thu, Jan 11, 2024 at 22:35:
>
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112918
>
> The patch was successfully bootstrapped and tested on x86_64, aarch64,
> ppc64le

This patch causes some ICE on MIPS:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113354

PS: how to test a cross build for MIPS:

1. apt install g++-multilib-mipsel-linux-gnu
2. apply patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641619.html
3. ../configure --target=mipsel-linux-gnu \
  --includedir=/usr/mipsel-linux-gnu/include --enable-multilib \
  --with-arch-32=mips32r2 --with-fp-32=xx \
  --enable-multiarch --enable-targets=all \
  --with-arch-64=mips64r2 --prefix=/usr --disable-libsanitizer
4. make -j

-- 
YunQiang Su


[PATCH] tree-optimization/109893 - allow more backwards jump threading

2024-01-12 Thread Richard Biener
Currently we scale the number of stmts allowed for forward
jump threading to limit those for backwards jump threading
by applying a factor of two to the counted stmts.  That doesn't
allow fine-grained adjustments, like by a single stmt as needed
for PR109893.  The following changes the factor to be a percentage
of the forward threading number and adjusts that percentage from
50 to 54, fixing the regression.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I'm cross-checking
some FAILs I see.
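
The effect of moving from a factor to a percentage can be sketched with a few
lines of C.  The two predicates mirror the conditions in the patch below; the
function names are made up for illustration, and the limit of 15 statements
used in the note after the code is an assumed value for
max-jump-thread-duplication-stmts rather than something stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: the path is rejected when n_insns * factor >= max_stmts.  */
bool
old_rejects (int n_insns, int factor, int max_stmts)
{
  return n_insns * factor >= max_stmts;
}

/* New condition: rejected when n_insns * 100 >= max_stmts * percent.  */
bool
new_rejects (int n_insns, int percent, int max_stmts)
{
  return n_insns * 100 >= max_stmts * percent;
}

/* Largest statement count that the new condition still accepts.  */
int
new_max_allowed (int percent, int max_stmts)
{
  assert (percent > 0 && max_stmts > 0);
  int n = 0;
  while (!new_rejects (n + 1, percent, max_stmts))
    n++;
  return n;
}
```

With the assumed limit of 15, a factor of 2 and a percentage of 50 both accept
at most 7 statements, while a percentage of 54 accepts 8, which is the
single-statement adjustment the description refers to.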

PR tree-optimization/109893
* params.opt (fsm-scale-path-stmts): Change to percentage
and default to 54 from 50.
* doc/invoke.texi (--param fsm-scale-path-stmts): Adjust.
* tree-ssa-threadbackward.cc
(back_threader_profitability::possibly_profitable_path_p):
Adjust param_fsm_scale_path_stmts uses.
(back_threader_profitability::profitable_path_p): Likewise.

* gcc.dg/tree-ssa/pr109893.c: New testcase.
---
 gcc/doc/invoke.texi  |  4 +--
 gcc/params.opt   |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/pr109893.c | 33 
 gcc/tree-ssa-threadbackward.cc   | 17 ++--
 4 files changed, 46 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109893.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b7a201317ce..7e19f0245de 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16647,8 +16647,8 @@ Maximum number of arrays per scop.
 Max. size of loc list for which reverse ops should be added.
 
 @item fsm-scale-path-stmts
-Scale factor to apply to the number of statements in a threading path
-crossing a loop backedge when comparing to
+Percentage of max-jump-thread-duplication-stmts to allow for the number of
+statements in a threading path crossing a loop backedge.
 @option{--param=max-jump-thread-duplication-stmts}.
 
 @item uninit-control-dep-attempts
diff --git a/gcc/params.opt b/gcc/params.opt
index 5eb045b2e6c..dbfa8ece8e0 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -131,8 +131,8 @@ Common Joined UInteger Var(param_early_inlining_insns) 
Init(6) Optimization Para
 Maximal estimated growth of function body caused by early inlining of single 
call.
 
 -param=fsm-scale-path-stmts=
-Common Joined UInteger Var(param_fsm_scale_path_stmts) Init(2) IntegerRange(1, 
10) Param Optimization
-Scale factor to apply to the number of statements in a threading path crossing 
a loop backedge when comparing to max-jump-thread-duplication-stmts.
+Common Joined UInteger Var(param_fsm_scale_path_stmts) Init(54) 
IntegerRange(1, 100) Param Optimization
+Percentage of max-jump-thread-duplication-stmts to allow for the number of 
statements in a threading path crossing a loop backedge.
 
 -param=fully-pipelined-fma=
 Common Joined UInteger Var(param_fully_pipelined_fma) Init(0) IntegerRange(0, 
1) Param Optimization
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109893.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109893.c
new file mode 100644
index 000..5c98664df72
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109893.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+void foo(void);
+void bar(void);
+static char a;
+static int b, e, f;
+static int *c = , *g;
+int main() {
+int *j = 0;
+if (a) {
+g = 0;
+if (c)
+bar();
+} else {
+j = 
+c = 0;
+}
+if (c ==  == b || c == )
+;
+else
+__builtin_unreachable();
+if (g || e) {
+if (j ==  || j == 0)
+;
+else
+foo();
+}
+a = 4;
+}
+
+/* Jump threading in thread1 should enable to elide the call to foo.  */
+/* { dg-final { scan-tree-dump-not "foo" "dom2" } } */
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index fcebcdb5eaa..3091ddf4af1 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -741,8 +741,8 @@ back_threader_profitability::possibly_profitable_path_p
   if ((!m_threaded_multiway_branch
|| !loop->latch
|| loop->latch->index == EXIT_BLOCK)
-  && (m_n_insns * param_fsm_scale_path_stmts
- >= param_max_jump_thread_duplication_stmts))
+  && (m_n_insns * 100 >= (param_max_jump_thread_duplication_stmts
+ * param_fsm_scale_path_stmts)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
@@ -751,8 +751,9 @@ back_threader_profitability::possibly_profitable_path_p
   return false;
 }
   *large_non_fsm = (!(m_threaded_through_latch && m_threaded_multiway_branch)
-   && (m_n_insns * param_fsm_scale_path_stmts
-   >= param_max_jump_thread_duplication_stmts));
+   && (m_n_insns * 100
+   >= (param_max_jump_thread_duplication_stmts
+   * param_fsm_scale_path_stmts)));
 
 

[pushed] Objective-C, Darwin: Fix a regression in handling bad receivers.

2024-01-12 Thread Iain Sandoe
Tested on i686 and powerpc darwin, and a cross from x86-64 darwin to
powerpc, pushed to trunk, thanks,
Iain

--- 8< ---

This is seen on 32b targets with a 64b multilib, and is an ICE when
the build has checking enabled.  The fix is to exit the routine
early if the sender or receiver are already error_mark_node.

gcc/objc/ChangeLog:

* objc-next-runtime-abi-02.cc
(build_v2_objc_method_fixup_call): Early exit for cases
where the sender or receiver are known to be in error.

Signed-off-by: Iain Sandoe 
---
 gcc/objc/objc-next-runtime-abi-02.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/objc/objc-next-runtime-abi-02.cc 
b/gcc/objc/objc-next-runtime-abi-02.cc
index dfc1129530a..a622f4cbf4e 100644
--- a/gcc/objc/objc-next-runtime-abi-02.cc
+++ b/gcc/objc/objc-next-runtime-abi-02.cc
@@ -1657,6 +1657,8 @@ build_v2_objc_method_fixup_call (int super_flag, tree 
method_prototype,
   rcv_p = (super_flag ? objc_super_type : objc_object_type);
 
   lookup_object = build_c_cast (input_location, rcv_p, lookup_object);
+  if (sender == error_mark_node || lookup_object == error_mark_node)
+return error_mark_node;
 
   /* Use SAVE_EXPR to avoid evaluating the receiver twice.  */
   lookup_object = save_expr (lookup_object);
-- 
2.39.2 (Apple Git-143)



Re: [pushed] c++: corresponding object parms [PR113191]

2024-01-12 Thread Jason Merrill

On 1/11/24 17:01, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

As discussed, our handling of corresponding object parameters needed to
handle the using-declaration case better.  And I took the opportunity to
share code between the add_method and cand_parms_match uses.

This patch specifically doesn't compare reversed parameters, but a follow-up
patch will.


Thus.

From 8182dc2cc293009d0bc95dd667bb872246f2ca04 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Wed, 10 Jan 2024 23:18:23 -0500
Subject: [PATCH] c++: cand_parms_match and reversed candidates
To: gcc-patches@gcc.gnu.org

When considering whether the candidate parameters match, according to the
language we're considering the synthesized reversed candidate, so we should
compare the parameters in swapped order.  In this situation it doesn't make
sense to consider whether object parameters correspond, since we're
comparing an object parameter to a non-object parameter, so I generalized
xobj_iobj_parameters_correspond accordingly.

As I refine cand_parms_match, more behaviors need to differ between its
original use to compare the original templates for two candidates, and the
later use to decide whether to compare constraints.  So now there's a
parameter to select between the semantics.

gcc/cp/ChangeLog:

	* call.cc (reversed_match): New.
	(enum class pmatch): New enum.
	(cand_parms_match): Add match_kind parm.
	(object_parms_correspond): Add fn parms.
	(joust): Adjust.
	* class.cc (xobj_iobj_parameters_correspond): Rename to...
	(iobj_parm_corresponds_to): ...this.  Take the other
	type instead of a second function.
	(object_parms_correspond): Adjust.
	* cp-tree.h (iobj_parm_corresponds_to): Declare.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-memfun4.C: Change expected
	reversed handling.
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/call.cc| 76 ++-
 gcc/cp/class.cc   | 32 +++-
 gcc/testsuite/g++.dg/cpp2a/concepts-memfun4.C | 10 +--
 4 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 83009fc837c..d9b14d7c4f5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6854,6 +6854,7 @@ extern tree build_vtbl_ref			(tree, tree);
 extern tree build_vfn_ref			(tree, tree);
 extern tree get_vtable_decl			(tree, int);
 extern bool object_parms_correspond		(tree, tree, tree);
+extern bool iobj_parm_corresponds_to		(tree, tree, tree);
 extern bool add_method(tree, tree, bool);
 extern tree declared_access			(tree);
 extern bool maybe_push_used_methods		(tree);
diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 6f024b8abc3..1f5ff417c81 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -12713,7 +12713,7 @@ class_of_implicit_object (z_candidate *cand)
[basic.scope.scope].  */
 
 static bool
-object_parms_correspond (z_candidate *c1, z_candidate *c2)
+object_parms_correspond (z_candidate *c1, tree fn1, z_candidate *c2, tree fn2)
 {
   tree context = class_of_implicit_object (c1);
   tree ctx2 = class_of_implicit_object (c2);
@@ -12727,43 +12727,80 @@ object_parms_correspond (z_candidate *c1, z_candidate *c2)
but it can occur with reversed operators.  */
 return false;
 
-  return object_parms_correspond (c1->fn, c2->fn, context);
+  return object_parms_correspond (fn1, fn2, context);
+}
+
+/* Return whether the first parameter of C1 matches the second parameter
+   of C2.  */
+
+static bool
+reversed_match (z_candidate *c1, z_candidate *c2)
+{
+  tree fn1 = c1->fn;
+  tree parms2 = TYPE_ARG_TYPES (TREE_TYPE (c2->fn));
+  tree parm2 = TREE_VALUE (TREE_CHAIN (parms2));
+  if (DECL_IOBJ_MEMBER_FUNCTION_P (fn1))
+{
+  tree ctx = class_of_implicit_object (c1);
+  return iobj_parm_corresponds_to (fn1, parm2, ctx);
+}
+  else
+{
+  tree parms1 = TYPE_ARG_TYPES (TREE_TYPE (fn1));
+  tree parm1 = TREE_VALUE (parms1);
+  return same_type_p (parm1, parm2);
+}
 }
 
 /* True if the defining declarations of the two candidates have equivalent
-   parameters.  */
+   parameters.  MATCH_KIND controls whether we're trying to compare the
+   original declarations (for a warning) or the actual candidates.  */
+
+enum class pmatch { original, current };
 
 static bool
-cand_parms_match (z_candidate *c1, z_candidate *c2)
+cand_parms_match (z_candidate *c1, z_candidate *c2, pmatch match_kind)
 {
   tree fn1 = c1->fn;
   tree fn2 = c2->fn;
-  if (fn1 == fn2)
+  bool reversed = (match_kind == pmatch::current
+		   && c1->reversed () != c2->reversed ());
+  if (fn1 == fn2 && !reversed)
 return true;
   if (identifier_p (fn1) || identifier_p (fn2))
 return false;
-  /* We don't look at c1->template_decl because that's only set for primary
- templates, not e.g. non-template member functions of class templates.  */
-  tree t1 = most_general_template (fn1);
-  tree t2 = most_general_template (fn2);
-  if (t1 || t2)
+  if 

Re: [PATCH RFC] c++/modules: __class_type_info and modules

2024-01-12 Thread Jason Merrill

On 12/23/23 14:46, Nathan Sidwell wrote:

On 12/18/23 17:10, Jason Merrill wrote:

On 12/18/23 16:57, Nathan Sidwell wrote:

On 12/18/23 16:31, Jason Merrill wrote:
Tested x86_64-pc-linux-gnu.  Does this make sense?  Did you have 
another theory about how to merge these?


Why isn't push_abi_namespace doing the right setup here? (and I think 
get_global_binding might be similarly problematic?)


What would the right setup be?  It pushes into the global module, but 
before this change lookup doesn't find things imported into the global 
module, and so we get two independent (and so non-equivalent) 
declarations.


The comment for get_namespace_binding says "Users of this who, having 
found nothing, push a new decl must be prepared for that pushing to 
match an existing decl."  But if lookup_elaborated_type fails, so we 
pushtag a new type, check_module_override doesn't try to merge them 
because TREE_PUBLIC isn't set on the TYPE_DECL yet at that point, and 
they coexist until we complain about redeclaring __dynamic_cast with 
non-matching parameter types.


I tried setting TREE_PUBLIC on the TYPE_DECL, and then 
check_module_override called duplicate_decls, and rejected the 
redeclaration as a different type.


sigh, it seems that doesn't work as intended, I guess your approach is a 
pragmatic workaround, much as I dislike special-casing a particular 
identifier. Perhaps comment with an appropriate FIXME?


I've realized there are problems with completeness here -- the 'invisible' 
type may be complete, but the current TU only forward-declares it.  Our 
AST can't represent that right now.  And I'm not sure if there are 
template instantiation issues -- is the type complete or not in any 
particular instantiation?


My understanding of https://eel.is/c++draft/module#reach-4 is that this 
doesn't matter: if there is a reachable definition of the class, the 
class is complete, even if the current TU only forward-declares it.


Here's an alternate approach that handles this merging in 
check_module_override; this makes P1811 include-after-import a bit 
worse, but it's already not well supported, so perhaps that's OK for 
now.  But I'm inclined to go with my earlier patch for GCC 14.  What do 
you think?


Jason

From a4ccd4664d6acb696db3263de8286721e75a0d2b Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 18 Dec 2023 15:47:10 -0500
Subject: [PATCH 1/2] c++: __class_type_info and modules
To: gcc-patches@gcc.gnu.org

Doing a dynamic_cast in both TUs broke because we were declaring a new
__class_type_info in _b that conflicted with the one imported in the global
module from _a.  check_module_override wasn't merging them because
TREE_PUBLIC wasn't set yet.  Fixing that led to errors from duplicate_decls
about the decls having different types; let's avoid that in
check_module_override.

gcc/cp/ChangeLog:

	* name-lookup.cc (pushtag): Set TREE_PUBLIC sooner.
	(check_module_override): Merge classes.
	(lookup_elaborated_type): Update comment.

gcc/testsuite/ChangeLog:

	* g++.dg/lookup/builtin4.C: Expect warning.
	* g++.dg/lookup/hidden-class10.C: Likewise.
	* g++.dg/modules/pr106304_b.C: Add dynamic_cast.
---
 gcc/cp/name-lookup.cc| 33 
 gcc/testsuite/g++.dg/lookup/builtin4.C   |  2 +-
 gcc/testsuite/g++.dg/lookup/hidden-class10.C |  2 +-
 gcc/testsuite/g++.dg/modules/pr106304_b.C|  1 +
 4 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 4e2d5b03015..37724ea0e65 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3794,7 +3794,22 @@ check_module_override (tree decl, tree mvec, bool hiding,
 
   for (ovl_iterator iter (mergeable); iter; ++iter)
 	{
-	  match = duplicate_decls (decl, *iter, hiding);
+	  if (!named_module_p ()
+	  && DECL_IMPLICIT_TYPEDEF_P (decl)
+	  && DECL_IMPLICIT_TYPEDEF_P (*iter)
+	  && !IDENTIFIER_ANON_P (DECL_NAME (decl)))
+	/* Merge classes of the same name in the global module;
+	   duplicate_decls would complain about different types.  This test
+	   is similar to the one in check_mergeable_decl, but only
+	   considers implicit typedefs, since explicit typedefs should
+	   actually have the same type.
+
+	   ??? If we eventually want to do consistency checking for P1811
+	   redefinition, we'll probably need to delay this merging until
+	   the end of the class definition.  */
+	match = *iter;
+	  else
+	match = duplicate_decls (decl, *iter, hiding);
 	  if (match)
 	goto matched;
 	}
@@ -8087,11 +8102,9 @@ lookup_elaborated_type (tree name, TAG_how how)
 
 	  if (!module_purview_p ())
 	{
-	  /* We're in the global module, perhaps there's a tag
-		 there?  */
-	  // FIXME: This isn't quite right, if we find something
-	  // here, from the language PoV we're not supposed to
-	  // know it?
+	  /* We're in the global module, perhaps there's a tag there?
+		 We'll find it in 

[pushed] Darwin, powerpc: Fix bootstrap.

2024-01-12 Thread Iain Sandoe
Tested on powerpc-darwin9, pushed to trunk, thanks,
Iain

--- 8< ---

Recent changes to the member names of the diagnostics class missed one case in
the Darwin PowerPC host code.  Fixed thus.

gcc/ChangeLog:

* config/rs6000/host-darwin.cc (segv_handler): Use the revised
diagnostics class member name for abort of error.

Signed-off-by: Iain Sandoe 
---
 gcc/config/rs6000/host-darwin.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/host-darwin.cc b/gcc/config/rs6000/host-darwin.cc
index 691dcb39b6d..e0001776339 100644
--- a/gcc/config/rs6000/host-darwin.cc
+++ b/gcc/config/rs6000/host-darwin.cc
@@ -119,7 +119,7 @@ segv_handler (int sig ATTRIBUTE_UNUSED,
  }
}
   
-  if (global_dc->abort_on_error)
+  if (global_dc->m_abort_on_error)
fancy_abort (__FILE__, __LINE__, __FUNCTION__);
 
   exit (FATAL_EXIT_CODE);
-- 
2.39.2 (Apple Git-143)



GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-01-12 Thread Thomas Schwinge
Hi!

OK to push the attached
"GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"?
("The relevant test cases are all-PASS with just [two] exceptions, to be
looked into individually, later on."  I'm not currently planning to look
into that.)


Regards,
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Court of registration: 
München, HRB 106955
From 3193614c4f9a8032e85a4da87bde8055aeee7d7b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 9 Jan 2024 10:25:48 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_early_break',
 'vect_early_break_hw'

Via XPASSing test cases after commit a657c7e3518fcfc796f223d47385cad5e97dc9a5
"testsuite: un-xfail TSVC loops that check for exit control flow vectorization":

PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s332.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s481.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s482.c scan-tree-dump vect "vectorized 1 loops"

..., it became apparent that GCN, too, does support vectorization of loops with
early breaks.  The relevant test cases are all-PASS with just the following
exceptions, to be looked into individually, later on:

PASS: gcc.dg/vect/vect-early-break_25.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1

PASS: gcc.dg/vect/vect-early-break_56.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_56.c execution test
XPASS: gcc.dg/vect/vect-early-break_56.c scan-tree-dump-times vect "vectorized 2 loops" 2

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_early_break)
	(check_effective_target_vect_early_break_hw): Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d1add894f..497c46de4cb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4071,6 +4071,7 @@ proc check_effective_target_vect_early_break { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_ok]
 	|| [check_effective_target_sse4]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
@@ -4085,6 +4086,7 @@ proc check_effective_target_vect_early_break_hw { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_hw]
 	|| [check_sse4_hw_available]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
-- 
2.34.1



Re: [PATCH 2/2] RISC-V/testsuite: Also verify if-conversion runs for pr105314.c

2024-01-12 Thread Maciej W. Rozycki
On Fri, 12 Jan 2024, Andrew Pinski wrote:

> > Verify that if-conversion succeeded through noce_try_store_flag_mask, as
> > per PR rtl-optimization/105314, tightening the test case and making it
> > explicit.
> >
> > gcc/testsuite/
> > * gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.
> 
> I have an objection for this, if we are checking the RTL pass and not
> overall code generation, then maybe we change the testcase so that it
> is a RTL testcase instead.

 It's not clear to me what you mean by an "RTL testcase", i.e. how you'd 
see the testcase changed (or an additional one produced instead) and why, 
please elaborate.  Right now we verify that branches are absent from 
output, but not how that happens.

> Especially when there might be improvements going into GCC 15
> specifically targeting ifcvt on the gimple level (I am planning on
> doing some).

 How are the improvements going to affect the testcase?

 If they make it no longer relevant (in which case a replacement testcase 
for the new arrangement will be needed) or require updates, then I think 
it's an expected situation: one of the purposes of the testsuite is to 
make sure we're in control and understand what the consequences of changes 
made are.  It's not that the testsuite is cast in stone and not expected 
to change.

 I.e. if we expect noce_try_store_flag_mask no longer to trigger, then 
we'll see that in the test results (good!) and we can update the relevant 
test case(s), e.g. by reversing the pass criteria so that we're still in 
control.

  Maciej


Re: [PATCH] c++, demangle: Implement https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal

2024-01-12 Thread Jason Merrill

On 1/12/24 07:45, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement what apparently clang++
implemented for explicit object member function mangling, but nobody
actually proposed in patch form in
https://github.com/itanium-cxx-abi/cxx-abi/issues/148
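
As a rough illustration of the resulting names, the sketch below in C inserts
the H directly after the N of a nested name, which is how the scan-assembler
pairs in the new testcase differ (for example _ZN1S3fooES_ versus
_ZNH1S3fooES_).  add_xobj_marker is a hypothetical helper written for this
note; it is plain string manipulation, not the mangler or the demangler, and
it assumes an input that starts with _ZN and carries no cv- or ref-qualifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy MANGLED to OUT with 'H' inserted right after
   the leading "_ZN".  Returns 0 on success, or -1 if MANGLED is not a
   nested name or OUT is too small.  */
int
add_xobj_marker (const char *mangled, char *out, size_t out_size)
{
  static const char prefix[] = "_ZN";
  const size_t prefix_len = sizeof prefix - 1;

  assert (mangled != NULL && out != NULL);

  size_t len = strlen (mangled);
  if (strncmp (mangled, prefix, prefix_len) != 0)
    return -1;
  if (len + 2 > out_size)  /* One byte for 'H', one for the terminator.  */
    return -1;

  memcpy (out, prefix, prefix_len);
  out[prefix_len] = 'H';
  /* Copy the remainder, including the terminating NUL.  */
  memcpy (out + prefix_len + 1, mangled + prefix_len, len - prefix_len + 1);
  return 0;
}
```

Calling it on _ZN1S3bazERKS_ yields _ZNH1S3bazERKS_, matching the last pair of
expected symbols in the testcase; a non-nested name such as _Z3quxP1S is
rejected.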

Ok for trunk if it passes full bootstrap/regtest?  So far just tested
on the new testcases.


OK, thanks.


2024-01-12  Jakub Jelinek  

gcc/cp/
* mangle.cc (write_nested_name): Mangle explicit object
member functions with H as per
https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal.
gcc/testsuite/
* g++.dg/abi/mangle79.C: New test.
include/
* demangle.h (enum demangle_component_type): Add
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
libiberty/
* cp-demangle.c (FNQUAL_COMPONENT_CASE): Add case for
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_dump): Handle DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_nested_name): Parse H after N in nested name.
(d_count_templates_scopes): Handle
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_print_mod): Likewise.
(d_print_function_type): Likewise.
* testsuite/demangle-expected: Add tests for explicit object
member functions.

--- gcc/cp/mangle.cc.jj 2024-01-12 10:07:31.248231747 +0100
+++ gcc/cp/mangle.cc2024-01-12 11:37:35.790915463 +0100
@@ -1247,6 +1247,8 @@ write_nested_name (const tree decl)
write_char ('R');
}
  }
+  else if (DECL_XOBJ_MEMBER_FUNCTION_P (decl))
+write_char ('H');
  
/* Is this a template instance?  */

if (tree info = maybe_template_info (decl))
--- gcc/testsuite/g++.dg/abi/mangle79.C.jj  2024-01-12 13:18:20.782917924 
+0100
+++ gcc/testsuite/g++.dg/abi/mangle79.C 2024-01-12 13:26:01.297433970 +0100
@@ -0,0 +1,61 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct S {
+  static void foo (S);
+  void foo (this S);   // { dg-warning "explicit object member function only 
available with" "" { target c++20_down } }
+  template 
+  static void bar (S, T);
+  template 
+  void bar (this S, T);// { dg-warning "explicit object member function only 
available with" "" { target c++20_down } }
+  static void baz (const S &);
+  void baz (this const S &);   // { dg-warning "explicit object member function only 
available with" "" { target c++20_down } }
+};
+
+void
+S::foo (S)
+{
+}
+
+void
+S::foo (this S)// { dg-warning "explicit object member function only 
available with" "" { target c++20_down } }
+{
+}
+
+template 
+void
+S::bar (S, T)
+{
+}
+
+template 
+void
+S::bar (this S, T) // { dg-warning "explicit object member function only 
available with" "" { target c++20_down } }
+{
+}
+
+void
+S::baz (const S &)
+{
+}
+
+void
+S::baz (this const S &)// { dg-warning "explicit object member function only 
available with" "" { target c++20_down } }
+{
+}
+
+void
+qux (S *p)
+{
+  S::foo (*p);
+  p->foo ();
+  S::bar <5> (*p, 0);
+  p->bar <5> (0);
+}
+
+// { dg-final { scan-assembler "_ZN1S3fooES_" } }
+// { dg-final { scan-assembler "_ZNH1S3fooES_" } }
+// { dg-final { scan-assembler "_ZN1S3barILi5EiEEvS_T0_" } }
+// { dg-final { scan-assembler "_ZNH1S3barILi5EiEEvS_T0_" } }
+// { dg-final { scan-assembler "_ZN1S3bazERKS_" } }
+// { dg-final { scan-assembler "_ZNH1S3bazERKS_" } }
--- include/demangle.h.jj   2024-01-03 12:07:25.330409694 +0100
+++ include/demangle.h  2024-01-12 11:43:27.543915280 +0100
@@ -314,6 +314,8 @@ enum demangle_component_type
/* C++11: An rvalue reference modifying a member function.  The one
   subtree is the type which is being referenced.  */
DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS,
+  /* C++23: A member function with explict object parameter.  */
+  DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION,
/* A vendor qualifier.  The left subtree is the type which is being
   qualified, and the right subtree is the name of the
   qualifier.  */
--- libiberty/cp-demangle.c.jj  2024-01-03 12:07:48.498085118 +0100
+++ libiberty/cp-demangle.c 2024-01-12 13:06:04.526281733 +0100
@@ -581,6 +581,7 @@ static char *d_demangle (const char *, i
  case DEMANGLE_COMPONENT_CONST_THIS:   \
  case DEMANGLE_COMPONENT_REFERENCE_THIS:   \
  case DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS:\
+case DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION:  \
  case DEMANGLE_COMPONENT_TRANSACTION_SAFE: \
  case DEMANGLE_COMPONENT_NOEXCEPT: \
  case DEMANGLE_COMPONENT_THROW_SPEC
@@ -749,6 +750,9 @@ d_dump (struct demangle_component *dc, i
  case DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS:
printf ("rvalue reference this\n");
break;
+case DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION:
+  printf ("explicit object parameter\n");
+  break;
  case DEMANGLE_COMPONENT_TRANSACTION_SAFE:
printf 

Re: [PATCH] Add support for function attributes and variable attributes

2024-01-12 Thread Guillaume Gomez
Just realized that you were asking for the patch I forgot to attach...

Here it is.

On Fri, Jan 12, 2024 at 11:09, Guillaume Gomez wrote:
>
> > It sounds like the patch you have locally is ready, but it has some
> > nontrivial changes compared to the last version you posted to the list.
> > Please post your latest version to the list.
>
> Sure!
>
> This patch adds the support for attributes on functions and variables. It does
> so by adding the following functions:
>
> * gcc_jit_function_add_attribute
> * gcc_jit_function_add_string_attribute
> * gcc_jit_function_add_integer_array_attribute
> * gcc_jit_lvalue_add_string_attribute
>
> It adds the following types:
>
> * gcc_jit_fn_attribute
> * gcc_jit_variable_attribute
>
> It adds tests to ensure that the attributes are correctly applied.
>
> > Do you have push rights, or do you need me to push it for you?
>
> I have push rights so I'll merge the patch myself. But thanks for offering to
> do it.
>
> On Thu, Jan 11, 2024 at 23:38, David Malcolm wrote:
> >
> > On Thu, 2024-01-11 at 22:40 +0100, Guillaume Gomez wrote:
> > > Hi David,
> > >
> > > > The above looks correct, but the patch adds the entrypoint
> > > > descriptions
> > > > to topics/types.rst, which seems like the wrong place.  The
> > > > function-
> > > > related ones should be in topics/functions.rst in the "Functions"
> > > > section and the lvalue/variable one in topics/expression.rst after
> > > > the
> > > > "Global variables" section.
> > >
> > > Ah indeed. Mix-up on my end. Fixed it.
> > >
> > > > test-restrict.c is a pre-existing testcase, so please don't delete
> > > > its
> > > > entry.
> > >
> > > Ah indeed, I went too quickly and thought it was a test I renamed...
> > >
> > > > BTW, the ChangeLog entry mentions adding test-restrict.c, but the
> > > > patch
> > > > doesn't add it, so that part of the proposed ChangeLog is wrong.
> > > >
> > > > Does the patch pass ./contrib/gcc-changelog/git_check_commit.py ?
> > >
> > > I messed up a bit, fixed it thanks to you. I didn't run the script in
> > > my last
> > > update but just did:
> > >
> > > ```
> > > $ contrib/gcc-changelog/git_check_commit.py $(git log -1 --format=%h)
> > > Checking 3849ee2eadf0eeec2b0080a5142ced00be96a60d: OK
> > > ```
> > >
> > > > Otherwise, looks good, assuming that the patch has been tested with
> > > > the
> > > > full jit testsuite.
> > >
> > > When rebasing on upstream yesterday I discovered that two tests
> > > were not working anymore. For the first one, it was simply because of
> > > the changes in `dummy-frontend.cc`. For the second one
> > > (test-noinline-attribute.c), it was because the rules for inlining
> > > changed
> > > since we wrote this patch apparently (our fork is very late). Antoni
> > > discovered
> > > that we could just add a call to `asm` to prevent this from happening
> > > so I
> > > added it.
> > >
> > > So yes, all jit tests are passing as expected. :)
> >
> > Good.
> >
> > It sounds like the patch you have locally is ready, but it has some
> > nontrivial changes compared to the last version you posted to the list.
> > Please post your latest version to the list.
> >
> > Do you have push rights, or do you need me to push it for you?
> >
> > Thanks
> > Dave
> >
> > >
> > > On Thu, Jan 11, 2024 at 19:46, David Malcolm wrote:
> > > >
> > > > On Thu, 2024-01-11 at 01:00 +0100, Guillaume Gomez wrote:
> > > > > Hi David.
> > > > >
> > > > > Thanks for the review!
> > > > >
> > > > > > > +.. function::  void\
> > > > > > > +   gcc_jit_lvalue_add_string_attribute
> > > > > > > (gcc_jit_lvalue *variable,
> > > > > > > +enum
> > > > > > > gcc_jit_fn_attribute attribute,
> > > > > >
> > > > > > ^^
> > > > > >
> > > > > > This got out of sync with the declaration in the header file;
> > > > > > it
> > > > > > should
> > > > > > be enum gcc_jit_variable_attribute attribute
> > > > >
> > > > > Indeed, good catch!
> > > > >
> > > > > > I took a brief look through the handler functions and with the
> > > > > > above
> > > > > > caveat I didn't see anything obviously wrong.  I'm going to
> > > > > > assume
> > > > > > this
> > > > > > code is OK given that presumably you've been testing it within
> > > > > > rustc,
> > > > > > right?
> > > > >
> > > > > Both in rustc and in the JIT tests we added.
> > > > >
> > > > > [..snip...]
> > > > >
> > > > > I added all the missing `RETURN_IF_FAIL` you mentioned. None of
> > > > > the
> > > > > arguments should be `NULL` so it was a mistake not to check it.
> > > > >
> > > > > [..snip...]
> > > > >
> > > > > I removed the tests comments as you mentioned.
> > > > >
> > > > > > Please update jit.dg/all-non-failing-tests.h for the new tests;
> > > > > > it's
> > > > > > meant to list all of the (non failing) tests alphabetically.
> > > > >
> > > > > It's not always correctly sorted. Might be worth sending a patch
> > > > > after this
> > > > > one gets merged to fix that.
> > > > >

GCC 14.0.1 Status Report (2024-01-12), Stage 4 in effect now

2024-01-12 Thread Richard Biener
Status
======

The GCC development branch which will become GCC 14 is now
in regression and documentation fixes only mode (Stage 4).

Please concentrate now on fixing regressions from GCC 13
and earlier.

GCC 14.1 will be released when we reach the milestone of
zero P1 regressions (note not all regressions have been
prioritized yet).


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1               32   +   2
P2              504   +   5
P3              241   -   3
P4              211   -   1
P5               25
--------        ---   -----------------------
Total P1-P3     777   +   4
Total          1013   +   3


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2023-November/242898.html


[patch,avr,applied] Fix PR107201 -nodevicelib not working for all devices.

2024-01-12 Thread Georg-Johann Lay

Since the advent of devices AVR*, the spec pattern mmcu=avr* no
longer works to discriminate between devices and cores like avr51.

This means -nodevicelib no longer works for AVR* devices because that
option is removed for mmcu=avr* (which were only cores in the old days).

Instead of that pattern, a new spec function is used.

If there are no objections or improvements within the next week or
so, I'll install the patch below.

Johann

--

AVR: target/107201: Make -nodevicelib work for all devices.

The driver-avr.cc contains a spec that discriminates between cores
and devices by means of a mmcu=avr* spec pattern.  This does not
work for new devices like AVR128* which also start with mmcu=avr
like all cores do.  The patch uses a new spec function in order to
tell apart cores from devices.
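
To illustrate why a prefix pattern is not enough, here is a minimal standalone sketch. It is not the driver code: the table `core_archs` is a hypothetical, abbreviated stand-in for avr_arch_types[], and the helper names `matches_avr_prefix` and `find_core_arch` are made up for this example. The exact lookup mirrors the idea behind avr_get_parch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, abbreviated table of core architecture names.  The real
// avr_arch_types[] in GCC has more entries and more fields.
struct arch_entry
{
  const char *name;
};

static const arch_entry core_archs[] = {
  { "avr2" }, { "avr25" }, { "avr5" }, { "avr51" }, { "avrxmega2" },
};

// Prefix test, comparable to the old mmcu=avr* pattern: true for every name
// that starts with "avr", so it cannot tell "avr5" from "avr128da48".
bool
matches_avr_prefix (const char *mcu)
{
  return std::strncmp (mcu, "avr", 3) == 0;
}

// Exact lookup: return the matching core entry, or nullptr when MCU names
// a device (or anything else that is not in the table).
const arch_entry *
find_core_arch (const char *mcu)
{
  for (std::size_t i = 0; i < sizeof core_archs / sizeof core_archs[0]; ++i)
    if (std::strcmp (mcu, core_archs[i].name) == 0)
      return &core_archs[i];

  return nullptr;
}
```

With these helpers, "avr128da48" passes the prefix test but is not found by the exact lookup, whereas "avr5" is found by both, which is the distinction the spec function needs.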

gcc/
PR target/107201
* config/avr/avr.h (EXTRA_SPEC_FUNCTIONS): Add no-devlib, avr_no_devlib.
* config/avr/driver-avr.cc (avr_no_devlib): New function.
(avr_devicespecs_file): Use it to remove -nodevicelib from the
options for cores only.
* config/avr/avr-arch.h (avr_get_parch): New prototype.
* config/avr/avr-devices.cc (avr_get_parch): New function.

diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h
index 03b3263d529..0ee335038a4 100644
--- a/gcc/config/avr/avr-arch.h
+++ b/gcc/config/avr/avr-arch.h
@@ -195,6 +195,7 @@ typedef struct
 
 extern const avr_arch_t avr_arch_types[];
 extern const avr_arch_t *avr_arch;
+extern const avr_arch_t *avr_get_parch (const char *mcu);
 
 extern const avr_mcu_t avr_mcu_types[];
 
diff --git a/gcc/config/avr/avr-devices.cc b/gcc/config/avr/avr-devices.cc
index 90846f3da21..43d38eb3916 100644
--- a/gcc/config/avr/avr-devices.cc
+++ b/gcc/config/avr/avr-devices.cc
@@ -153,4 +153,20 @@ avr_inform_core_architectures (void)
   free (archs);
 }
 
+
+/* When MCU names a core arch like "avr5", then return a pointer to the
+   respective entry in avr_arch_types[].  Otherwise, return NULL.  */
+
+const avr_arch_t *
+avr_get_parch (const char *mcu)
+{
+  for (size_t i = 0; i < ARRAY_SIZE (avr_arch_types); ++i)
+{
+  if (strcmp (mcu, avr_arch_types[i].name) == 0)
+	return & avr_arch_types[i];
+}
+
+  return NULL;
+}
+
 #endif // IN_GEN_AVR_MMCU_TEXI
diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 3ef60b9ab7f..7f7e23183b2 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -500,9 +500,11 @@ typedef struct avr_args
 
 extern const char *avr_devicespecs_file (int, const char**);
 extern const char *avr_double_lib (int, const char**);
+extern const char *avr_no_devlib (int, const char**);
 
 #define EXTRA_SPEC_FUNCTIONS\
   { "double-lib", avr_double_lib }, \
+  { "no-devlib", avr_no_devlib },   \
   { "device-specs-file", avr_devicespecs_file },
 
 /* Driver self specs has lmited functionality w.r.t. '%s' for dynamic specs.
diff --git a/gcc/config/avr/driver-avr.cc b/gcc/config/avr/driver-avr.cc
index 2512c2c546a..b44136e2577 100644
--- a/gcc/config/avr/driver-avr.cc
+++ b/gcc/config/avr/driver-avr.cc
@@ -105,7 +105,12 @@ avr_devicespecs_file (int argc, const char **argv)
   return concat ("%{!nodevicespecs:-specs=device-specs", dir_separator_str,
  "specs-", mmcu, "%s} %

[PATCH] c++, demangle: Implement https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal

2024-01-12 Thread Jakub Jelinek
Hi!

The following patch attempts to implement what apparently clang++
implemented for explicit object member function mangling, but nobody
actually proposed in patch form in
https://github.com/itanium-cxx-abi/cxx-abi/issues/148
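
As a rough illustration of the scheme (a toy sketch, not the code in mangle.cc), the helper below builds the nested-name form for a non-template member function of a class at namespace scope and inserts H right after N for explicit object member functions. The names `source_name` and `mangle_member` are invented for this example, and the caller is assumed to supply the already-encoded parameter types.

```cpp
#include <string>

// Length-prefixed <source-name>, e.g. "foo" -> "3foo".
std::string
source_name (const std::string &id)
{
  return std::to_string (id.size ()) + id;
}

// Mangle CLS::FN as _Z N [H] <class> <function> E <parameter types>.
// PARAM_TYPES is the already-mangled parameter list, e.g. "S_" or "RKS_".
// XOBJ selects the explicit object member function form.
std::string
mangle_member (const std::string &cls, const std::string &fn,
               const std::string &param_types, bool xobj)
{
  std::string out = "_ZN";
  if (xobj)
    out += 'H';
  out += source_name (cls) + source_name (fn) + "E" + param_types;
  return out;
}
```

For S::foo this yields _ZN1S3fooES_ for the static member and _ZNH1S3fooES_ for the explicit object member, so the two overloads no longer collide.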

Ok for trunk if it passes full bootstrap/regtest?  So far just tested
on the new testcases.

2024-01-12  Jakub Jelinek  

gcc/cp/
* mangle.cc (write_nested_name): Mangle explicit object
member functions with H as per
https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal.
gcc/testsuite/
* g++.dg/abi/mangle79.C: New test.
include/
* demangle.h (enum demangle_component_type): Add
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
libiberty/
* cp-demangle.c (FNQUAL_COMPONENT_CASE): Add case for
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_dump): Handle DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_nested_name): Parse H after N in nested name.
(d_count_templates_scopes): Handle
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_print_mod): Likewise.
(d_print_function_type): Likewise.
* testsuite/demangle-expected: Add tests for explicit object
member functions.

--- gcc/cp/mangle.cc.jj 2024-01-12 10:07:31.248231747 +0100
+++ gcc/cp/mangle.cc 2024-01-12 11:37:35.790915463 +0100
@@ -1247,6 +1247,8 @@ write_nested_name (const tree decl)
write_char ('R');
}
 }
+  else if (DECL_XOBJ_MEMBER_FUNCTION_P (decl))
+write_char ('H');
 
   /* Is this a template instance?  */
   if (tree info = maybe_template_info (decl))
--- gcc/testsuite/g++.dg/abi/mangle79.C.jj  2024-01-12 13:18:20.782917924 +0100
+++ gcc/testsuite/g++.dg/abi/mangle79.C 2024-01-12 13:26:01.297433970 +0100
@@ -0,0 +1,61 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct S {
+  static void foo (S);
+  void foo (this S);   // { dg-warning "explicit object member function only available with" "" { target c++20_down } }
+  template <int N, typename T>
+  static void bar (S, T);
+  template <int N, typename T>
+  void bar (this S, T);// { dg-warning "explicit object member function only available with" "" { target c++20_down } }
+  static void baz (const S &);
+  void baz (this const S &);   // { dg-warning "explicit object member function only available with" "" { target c++20_down } }
+};
+
+void
+S::foo (S)
+{
+}
+
+void
+S::foo (this S)// { dg-warning "explicit object member function only available with" "" { target c++20_down } }
+{
+}
+
+template <int N, typename T>
+void
+S::bar (S, T)
+{
+}
+
+template <int N, typename T>
+void
+S::bar (this S, T) // { dg-warning "explicit object member function only available with" "" { target c++20_down } }
+{
+}
+
+void
+S::baz (const S &)
+{
+}
+
+void
+S::baz (this const S &)// { dg-warning "explicit object member function only available with" "" { target c++20_down } }
+{
+}
+
+void
+qux (S *p)
+{
+  S::foo (*p);
+  p->foo ();
+  S::bar <5> (*p, 0);
+  p->bar <5> (0);
+}
+
+// { dg-final { scan-assembler "_ZN1S3fooES_" } }
+// { dg-final { scan-assembler "_ZNH1S3fooES_" } }
+// { dg-final { scan-assembler "_ZN1S3barILi5EiEEvS_T0_" } }
+// { dg-final { scan-assembler "_ZNH1S3barILi5EiEEvS_T0_" } }
+// { dg-final { scan-assembler "_ZN1S3bazERKS_" } }
+// { dg-final { scan-assembler "_ZNH1S3bazERKS_" } }
--- include/demangle.h.jj   2024-01-03 12:07:25.330409694 +0100
+++ include/demangle.h  2024-01-12 11:43:27.543915280 +0100
@@ -314,6 +314,8 @@ enum demangle_component_type
   /* C++11: An rvalue reference modifying a member function.  The one
  subtree is the type which is being referenced.  */
   DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS,
+  /* C++23: A member function with explicit object parameter.  */
+  DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION,
   /* A vendor qualifier.  The left subtree is the type which is being
  qualified, and the right subtree is the name of the
  qualifier.  */
--- libiberty/cp-demangle.c.jj  2024-01-03 12:07:48.498085118 +0100
+++ libiberty/cp-demangle.c 2024-01-12 13:06:04.526281733 +0100
@@ -581,6 +581,7 @@ static char *d_demangle (const char *, i
 case DEMANGLE_COMPONENT_CONST_THIS:\
 case DEMANGLE_COMPONENT_REFERENCE_THIS:\
 case DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS: \
+case DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION:  \
 case DEMANGLE_COMPONENT_TRANSACTION_SAFE:  \
 case DEMANGLE_COMPONENT_NOEXCEPT:  \
 case DEMANGLE_COMPONENT_THROW_SPEC
@@ -749,6 +750,9 @@ d_dump (struct demangle_component *dc, i
 case DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS:
   printf ("rvalue reference this\n");
   break;
+case DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION:
+  printf ("explicit object parameter\n");
+  break;
 case DEMANGLE_COMPONENT_TRANSACTION_SAFE:
   printf ("transaction_safe this\n");
   break;
@@ -1547,6 +1551,8 @@ d_name (struct d_info 

[pushed] aarch64: Rework uxtl->zip optimisation [PR113196]

2024-01-12 Thread Richard Sandiford
g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
UXTL{,2}, since the former has a higher throughput than the latter on
many cores.  The optimisation worked by lowering directly to ZIP during
expand, so that the zero input could be hoisted and shared.
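
For readers unfamiliar with the trick, the scalar sketch below models why a ZIP1 against a zero vector gives the same result as UXTL on byte lanes. It is an assumption-laden emulation in plain C++, using the architectural (Arm) lane numbering with the low byte of each 16-bit lane first; the function names are invented for this example and nothing here is taken from the patch.

```cpp
#include <array>
#include <cstdint>

using bytes16 = std::array<std::uint8_t, 16>;

// Emulate ZIP1 on 8-bit lanes: interleave the low halves of A and B.
bytes16
zip1_bytes (const bytes16 &a, const bytes16 &b)
{
  bytes16 r{};
  for (int i = 0; i < 8; ++i)
    {
      r[2 * i] = a[i];
      r[2 * i + 1] = b[i];
    }
  return r;
}

// Read 16-bit lane I of a byte vector, low byte first.
std::uint16_t
lane16 (const bytes16 &v, int i)
{
  return static_cast<std::uint16_t> (v[2 * i] | (v[2 * i + 1] << 8));
}

// Reference for UXTL: zero-extend the low eight bytes to 16 bits each.
std::array<std::uint16_t, 8>
uxtl_bytes (const bytes16 &a)
{
  std::array<std::uint16_t, 8> r{};
  for (int i = 0; i < 8; ++i)
    r[i] = a[i];
  return r;
}
```

Each 16-bit lane of zip1_bytes (a, zero) takes its low byte from a and its high byte from the zero vector, so it equals the corresponding lane of uxtl_bytes (a). The zero vector is the extra input that the optimisation wants to hoist and share.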

However, changing to ZIP means that zero extensions no longer benefit
from some existing combine patterns.  The patch included new patterns
for UADDW and USUBW, but the PR shows that other patterns were affected
as well.

This patch instead introduces the ZIPs during a pre-reload split
and forcibly hoists the zero move to the outermost scope.  This has
the disadvantage of executing the move even for a shrink-wrapped
function, which I suppose could be a problem if it causes a kernel
to trap and enable Advanced SIMD unnecessarily.  In other circumstances,
an unused move shouldn't affect things much.

Also, the RA should be able to rematerialise the move at an
appropriate point if necessary, such as if there is an intervening
call.

In https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641948.html
I'd then tried to allow a zero to be recombined back into a solitary
ZIP.  However, that relied on late-combine, which didn't make it into
GCC 14.  This version instead restricts the split to cases where the
UXTL executes more frequently than the entry block (which is where we
plan to put the zero).

Also, the original optimisation contained a big-endian correction
that I don't think is needed/correct.  Even on big-endian targets,
we want the ZIP to take the low half of an element from the input
vector and the high half from the zero vector.  And the patterns
map directly to the underlying Advanced SIMD instructions: the use
of unspecs means that there's no need to adjust for the difference
between GCC and Arm lane numbering.

Tested on aarch64-linux-gnu & pushed (after checking with Tamar
off-list).

Richard


gcc/
PR target/113196
* config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
New member variable.
* config/aarch64/aarch64-protos.h (aarch64_split_simd_shift_p):
Declare.
* config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
* config/aarch64/aarch64-simd.md
(vec_unpacku_hi_, vec_unpacks_hi_): Recombine into...
(vec_unpack_hi_): ...this.  Move the generation of
zip2 for zero-extends to...
(aarch64_simd_vec_unpack_hi_): ...a split of this
instruction.  Fix big-endian handling.
(vec_unpacku_lo_, vec_unpacks_lo_): Recombine into...
(vec_unpack_lo_): ...this.  Move the generation of
zip1 for zero-extends to...
(2): ...a split of this instruction.
Fix big-endian handling.
(*aarch64_zip1_uxtl): New pattern.
(aarch64_usubw_lo_zip, aarch64_uaddw_lo_zip): Delete.
(aarch64_usubw_hi_zip, aarch64_uaddw_hi_zip): Likewise.
* config/aarch64/aarch64.cc (aarch64_get_shareable_reg): New function.
(aarch64_gen_shareable_zero): Use it.
(aarch64_split_simd_shift_p): New function.

gcc/testsuite/
PR target/113196
* gcc.target/aarch64/pr113196.c: New test.
* gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
Expect uxtl2 rather than zip2.
* gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
than uxtl.
* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64-simd.md| 134 +-
 gcc/config/aarch64/aarch64.cc |  53 ++-
 gcc/config/aarch64/aarch64.h  |   6 +
 gcc/config/aarch64/iterators.md   |   2 +
 gcc/testsuite/gcc.target/aarch64/pr113196.c   |  23 +++
 .../gcc.target/aarch64/simd/vmovl_high_1.c|   8 +-
 .../gcc.target/aarch64/vect_mixed_sizes_10.c  |   2 +-
 .../gcc.target/aarch64/vect_mixed_sizes_8.c   |   2 +-
 .../gcc.target/aarch64/vect_mixed_sizes_9.c   |   2 +-
 10 files changed, 123 insertions(+), 110 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113196.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index ce9bec79cec..4c70e8a4963 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -880,6 +880,7 @@ rtx aarch64_return_addr_rtx (void);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 rtx aarch64_gen_shareable_zero (machine_mode);
+bool aarch64_split_simd_shift_p (rtx_insn *);
 bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
 bool aarch64_sve_ld1rq_operand_p (rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3cd184f46fa..6f48b4d5f21 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ 

Re: [patch,avr,applied] PR target/112952 Fix attribute "io" et al. handling.

2024-01-12 Thread Georg-Johann Lay




On 12.01.24 at 04:37, Jan-Benedict Glaw wrote:

On Thu, 2024-01-04 17:28:02 +0100, Georg-Johann Lay  wrote:

This fixes the avr-specific attributes io, io_low and address,
that are all basically the same except that io and io_low imply
assertions on allowed addressing modes.



--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc

[...]

@@ -10385,12 +10389,10 @@ avr_handle_addr_attribute (tree *node, tree name, 
tree args,
}
else if (io_p
   && (!tree_fits_shwi_p (arg)
-  || !(strcmp (IDENTIFIER_POINTER (name), "io_low") == 0
-   ? low_io_address_operand : io_address_operand)
-(GEN_INT (TREE_INT_CST_LOW (arg)), QImode)))
+  || ! IN_RANGE (TREE_INT_CST_LOW (arg), io_start, io_end)))
{
- warning_at (loc, OPT_Wattributes, "%qE attribute address "
- "out of range", name);
+ warning_at (loc, OPT_Wattributes, "%qE attribute address out of "
+ "range 0x%x...0x%x", name, (int) io_start, (int) io_end);
  *no_add = true;
}
else


Building with a recent GCC, this results in a new warning (here forced
to an error with --enable-werror-always):

/var/lib/laminar/run/gcc-avr-elf/64/local-toolchain-install/bin/g++  -fno-PIE 
-c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -fno-PIE -I. -I. 
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  
-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o avr.o -MT avr.o -MMD -MP 
-MF ./.deps/avr.TPo ../../gcc/gcc/config/avr/avr.cc
../../gcc/gcc/config/avr/avr.cc: In function 'tree_node* 
avr_handle_addr_attribute(tree_node**, tree, tree, int, bool*)':
../../gcc/gcc/config/avr/avr.cc:10391:45: error: unquoted sequence of 3 
consecutive punctuation characters '...' in format [-Werror=format-diag]
10391 |   warning_at (loc, OPT_Wattributes, "%qE attribute address out of 
"
   | 
^~~
10392 |   "range 0x%x...0x%x", name, (int) io_start, (int) 
io_end);
   |   ~~~
cc1plus: all warnings being treated as errors
make[1]: *** [Makefile:2554: avr.o] Error 1
make[1]: Leaving directory 
'/var/lib/laminar/run/gcc-avr-elf/64/toolchain-build/gcc'
make: *** [Makefile:4676: all-gcc] Error 2


I think this should be "%<...%>".

Kind regards, JBG


Hi,

thanks for sorting this out. I would install the patch below.

I must admit that I don't understand that warning and what is
illegal about having ellipses in a format string at that place.

That warning isn't even documented, at least as of 2024-02-12

https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html

There should be no quotes around the ellipses, they are intended
as real ellipses.

Plus, I saw in other modules that it warns about format strings
like printf (";%i", 10); Why is ";" not allowed there?
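
The workaround in the patch below keeps the punctuation out of the format string and passes it as an argument instead. Here is a minimal sketch of the same idea using plain snprintf rather than GCC's diagnostic machinery; the helper name `format_io_range` is made up for this example.

```cpp
#include <cstdio>
#include <string>

// Format an address range as "0x<start>...0x<end>".  The ellipsis is
// passed through %s, so the format string itself contains no run of
// consecutive punctuation characters.
std::string
format_io_range (int io_start, int io_end)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, "0x%x%s0x%x", io_start, "...", io_end);
  return buf;
}
```

The output text is the same as with a literal "..." in the format, only the format string that the checker inspects differs.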

Johann

--


diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 0cdd035fa1a..4bc3cf929de 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -10388,8 +10388,8 @@ avr_handle_addr_attribute (tree *node, tree 
name, tree args,

   && (!tree_fits_shwi_p (arg)
   || ! IN_RANGE (TREE_INT_CST_LOW (arg), io_start, 
io_end)))

{
- warning_at (loc, OPT_Wattributes, "%qE attribute address out of "
- "range 0x%x...0x%x", name, (int) io_start, (int) 
io_end);
+ warning_at (loc, OPT_Wattributes, "%qE attribute address out 
of range"
+ " 0x%x%s0x%x", name, (int) io_start, "...", (int) 
io_end);

  *no_add = true;
}
   else



[PATCH 2/2] aarch64: Use a global map to detect duplicated overloads [PR112989]

2024-01-12 Thread Richard Sandiford
As explained in the covering note to the previous patch,
the fact that aarch64-sve-* is now used for multiple header
files means that function_builder::add_overloaded_function
now needs to use a global map to detect duplicated overload
functions, instead of the member variable that it used previously.
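
The shape of the change can be sketched outside GCC as follows. This is a simplified model under stated assumptions: std::unordered_map stands in for GCC's GGC-allocated hash_map keyed by IDENTIFIER_NODEs, and `registered_overload`, `instance_id` and the boolean return value are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Simplified record of a registered overload (hypothetical fields).
struct registered_overload
{
  int instance_id;
  std::uint64_t required_extensions;
};

// One map shared by every header-processing pass, rather than one map per
// builder object, so duplicates across headers are seen.
static std::unordered_map<std::string, registered_overload> overload_names;

// Register NAME unless it is already known.  Returns true if a new entry
// was added and false if NAME was a duplicate.
bool
add_overloaded_function (const std::string &name, int instance_id,
                         std::uint64_t required_extensions)
{
  auto it = overload_names.find (name);
  if (it != overload_names.end ())
    {
      // Consistency check in the spirit of the patch: same instance, and
      // the existing entry requires nothing that the new request lacks.
      assert (it->second.instance_id == instance_id
              && (it->second.required_extensions
                  & ~required_extensions) == 0);
      return false;
    }

  overload_names.emplace (name,
                          registered_overload { instance_id,
                                                required_extensions });
  return true;
}
```

Because the map outlives any single builder, a second header that tries to register the same overloaded name hits the existing entry instead of creating a second declaration.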

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins.h
(function_builder::m_overload_names): Replace with...
* config/aarch64/aarch64-sve-builtins.cc (overload_names): ...this
new global.
(add_overloaded_function): Update accordingly, using get_identifier
to get a GGC-friendly record of the name.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc | 22 ++
 gcc/config/aarch64/aarch64-sve-builtins.h  |  4 
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 3ad2271d51c..c2f1486315f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -938,6 +938,10 @@ static GTY(()) vec 
*registered_functions;
overloaded functions.  */
 static hash_table *function_table;
 
+/* Maps all overloaded function names that we've registered so far to
+   their associated function_instances.  The map keys are IDENTIFIER_NODEs.  */
+static GTY(()) hash_map *overload_names;
+
 /* True if we've already complained about attempts to use functions
when the required extension is disabled.  */
 static bool reported_missing_extension_p;
@@ -1585,21 +1589,23 @@ function_builder::
 add_overloaded_function (const function_instance &instance,
 aarch64_feature_flags required_extensions)
 {
+  if (!overload_names)
+overload_names = hash_map::create_ggc ();
+
   char *name = get_name (instance, true);
-  if (registered_function **map_value = m_overload_names.get (name))
-{
-  gcc_assert ((*map_value)->instance == instance
- && ((*map_value)->required_extensions
- & ~required_extensions) == 0);
-  obstack_free (&m_string_obstack, name);
-}
+  tree id = get_identifier (name);
+  if (registered_function **map_value = overload_names->get (id))
+gcc_assert ((*map_value)->instance == instance
+   && ((*map_value)->required_extensions
+   & ~required_extensions) == 0);
   else
 {
   registered_function 
= add_function (instance, name, m_overload_type, NULL_TREE,
required_extensions, true, m_direct_overloads);
-  m_overload_names.put (name, );
+  overload_names->put (id, );
 }
+  obstack_free (&m_string_obstack, name);
 }
 
 /* If we are using manual overload resolution, add one function decl
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
b/gcc/config/aarch64/aarch64-sve-builtins.h
index 2bb893af7dd..e66729ed635 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -453,10 +453,6 @@ private:
 
   /* Used for building up function names.  */
   obstack m_string_obstack;
-
-  /* Maps all overloaded function names that we've registered so far
- to their associated function_instances.  */
-  hash_map m_overload_names;
 };
 
 /* A base class for handling calls to built-in functions.  */
-- 
2.25.1



[PATCH 1/2] aarch64: Use a separate group for SME builtins [PR112989]

2024-01-12 Thread Richard Sandiford
The PR shows that we were registering the same overloaded SVE
builtins twice.  This was supposed to be prevented by
function_builder::add_overloaded_function, which uses a map
to detect whether a function of the same name has already been
registered.  add_overloaded_function then had some asserts to
check for consistency.

However, the map that add_overloaded_function uses was a member of
function_builder itself.  That made sense when there was just one
header file, arm_sve.h, since it meant that the memory could be
reclaimed once arm_sve.h had been processed.  But now we have three
header files, and in principle, it's possible for arm_sme.h to include
overloads of things that arm_sve.h also defines.  We therefore need
to use a global map instead.

However, doing that meant that the consistency checks in
add_overloaded_function fired as expected, which showed some
latent issues.  This preliminary patch deals with those by adding
AARCH64_FL_SME to things that require AARCH64_FL_SME2.

This inconsistency led to another problem: functions were selected
for arm_sme.h over arm_sve.h based on whether they had AARCH64_FL_SME.
So some SME2-only things were actually defined in arm_sve.h, whereas
similar SME things were defined in arm_sme.h.

Choosing based on flags was an early get-started crutch that I forgot
to clean up later :(  This patch goes for the more direct approach of
having a separate table of SME builtins, as for arm_neon_sve_bridge.h.

aarch64-sve-builtins-sve2.def contains several intrinsics that are
currently SME-only but that operate entirely on vector registers.
Many of these will be extended to SVE2.1 once SVE2.1 support is added,
so the patch front-loads that by keeping the current division between
aarch64-sve-builtins-sve2.def (whose functions now go in arm_sve.h)
and aarch64-sve-builtins-sme.def (whose functions now go in arm_sme.h).

Tested on aarch64-linux-gnu & pushed.  Sorry for the breakage and for
the long fix time.

Richard


gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins.def: Don't include
aarch64-sve-builtins-sme.def.
(DEF_SME_ZA_FUNCTION_GS, DEF_SME_ZA_FUNCTION): Move to...
* config/aarch64/aarch64-sve-builtins-sme.def: ...here.
(DEF_SME_FUNCTION): New macro.  Use it and DEF_SME_FUNCTION_GS
instead of DEF_SVE_*.  Add AARCH64_FL_SME to anything that
requires AARCH64_FL_SME2.
* config/aarch64/aarch64-sve-builtins-sve2.def: Make same
AARCH64_FL_SME adjustment here.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Don't
include SME intrinsics.
(sme_function_groups): New array.
(handle_arm_sve_h): Remove check for AARCH64_FL_SME.
(handle_arm_sme_h): Use sme_function_groups instead of function_groups.

gcc/testsuite/
PR target/112989
* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Remove bogus
error test.
---
 .../aarch64/aarch64-sve-builtins-sme.def  | 53 +--
 .../aarch64/aarch64-sve-builtins-sve2.def |  1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc| 26 +
 gcc/config/aarch64/aarch64-sve-builtins.def   | 13 -
 .../aarch64/sve/acle/general-c/clamp_1.c  |  2 +-
 5 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.def 
b/gcc/config/aarch64/aarch64-sve-builtins-sme.def
index 5109c5e5e7d..416df0b3637 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sme.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.def
@@ -17,16 +17,31 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
+#ifndef DEF_SME_FUNCTION
+#define DEF_SME_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
+  DEF_SME_FUNCTION_GS (NAME, SHAPE, TYPES, none, PREDS)
+#endif
+
+#ifndef DEF_SME_ZA_FUNCTION_GS
+#define DEF_SME_ZA_FUNCTION_GS(NAME, SHAPE, TYPES, GROUP, PREDS) \
+  DEF_SME_FUNCTION_GS (NAME, SHAPE, TYPES, GROUP, PREDS)
+#endif
+
+#ifndef DEF_SME_ZA_FUNCTION
+#define DEF_SME_ZA_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
+  DEF_SME_ZA_FUNCTION_GS (NAME, SHAPE, TYPES, none, PREDS)
+#endif
+
 #define REQUIRED_EXTENSIONS 0
-DEF_SVE_FUNCTION (arm_has_sme, bool_inherent, none, none)
-DEF_SVE_FUNCTION (arm_in_streaming_mode, bool_inherent, none, none)
+DEF_SME_FUNCTION (arm_has_sme, bool_inherent, none, none)
+DEF_SME_FUNCTION (arm_in_streaming_mode, bool_inherent, none, none)
 #undef REQUIRED_EXTENSIONS
 
 #define REQUIRED_EXTENSIONS AARCH64_FL_SME
-DEF_SVE_FUNCTION (svcntsb, count_inherent, none, none)
-DEF_SVE_FUNCTION (svcntsd, count_inherent, none, none)
-DEF_SVE_FUNCTION (svcntsh, count_inherent, none, none)
-DEF_SVE_FUNCTION (svcntsw, count_inherent, none, none)
+DEF_SME_FUNCTION (svcntsb, count_inherent, none, none)
+DEF_SME_FUNCTION (svcntsd, count_inherent, none, none)
+DEF_SME_FUNCTION (svcntsh, count_inherent, none, none)
+DEF_SME_FUNCTION (svcntsw, count_inherent, none, none)
 DEF_SME_ZA_FUNCTION (svldr, ldr_za, za, 

[PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-12 Thread Richard Sandiford
Andrew Pinski  writes:
> Ccmp is not used if the result of the and/ior is used by both
> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> here by using ccmp in this case.
> Two changes are required. First, we need to allow the outer statement's
> result to be used more than once.
> Second, during the expansion of the gimple, we need
> to try using ccmp. This is needed because we don't expand the SSA
> name of the lhs but rather expand directly from the gimple.
>
> A small note on the ccmp_4.c testcase, we should be able to get slightly
> better than with this patch but it is one extra instruction compared to
> before.
>
> Diff from v1:
> * v2: Split out expand_gimple_assign_ssa so that we only need to handle
> promotion once. Add ccmp_5.c testcase which was suggested. Change comment
> on ccmp_candidate_p.

I meant more that we should split out the gassign handling in
expand_expr_real_1, since we're effectively making cfgexpand follow
it more closely.  What do you think about the attached version?
Tested on aarch64-linux-gnu and x86_64-linux-gnu.

OK for the expr/cfgexpand bits?

Thanks,
Richard



Ccmp is not used if the result of the and/ior is used by both
a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
here by using ccmp in this case.
Two changes are required. First, we need to allow the outer statement's
result to be used more than once.
Second, during the expansion of the gimple, we need
to try using ccmp. This is needed because we don't expand the SSA
name of the lhs but rather expand directly from the gimple.

A small note on the ccmp_4.c testcase, we should be able to get slightly
better than with this patch but it is one extra instruction compared to
before.
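
A minimal sketch of the kind of source this affects, assuming a shape
similar to the new tests.  The function names below are illustrative and
are not taken from ccmp_3.c.  The result of the AND feeds both a branch
and the return value, so its SSA name has more than one use:

```cpp
// Hypothetical example: the value of `a == 0 && b == 1` is used twice,
// once by a conditional branch and once as the returned value.
static int calls = 0;

static void note_call() { ++calls; }

int both_uses(int a, int b)
{
  bool c = (a == 0) && (b == 1);  // candidate for cmp + ccmp on AArch64
  if (c)                          // use 1: the branch (GIMPLE_COND)
    note_call();
  return c;                       // use 2: the value itself
}
```

Before this change the second use would have made ccmp_candidate_p
reject the statement because of the has_single_use check.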

PR target/100942

gcc/ChangeLog:

* ccmp.cc (ccmp_candidate_p): Add outer argument.
Allow if the outer is true and the lhs is used more
than once.
(expand_ccmp_expr): Update call to ccmp_candidate_p.
* expr.h (expand_expr_real_gassign): Declare.
* expr.cc (expand_expr_real_gassign): New function, split out from...
(expand_expr_real_1): ...here.
* cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ccmp_3.c: New test.
* gcc.target/aarch64/ccmp_4.c: New test.
* gcc.target/aarch64/ccmp_5.c: New test.

Signed-off-by: Andrew Pinski 
Co-authored-by: Richard Sandiford 
---
 gcc/ccmp.cc   |  12 +--
 gcc/cfgexpand.cc  |  31 ++-
 gcc/expr.cc   | 103 --
 gcc/expr.h|   3 +
 gcc/testsuite/gcc.target/aarch64/ccmp_3.c |  20 +
 gcc/testsuite/gcc.target/aarch64/ccmp_4.c |  35 
 gcc/testsuite/gcc.target/aarch64/ccmp_5.c |  20 +
 7 files changed, 149 insertions(+), 75 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_5.c

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 09d6b5595a4..7cb525addf4 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -90,9 +90,10 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
If all checks OK in expand_ccmp_expr, it emits insns in prep_seq, then
insns in gen_seq.  */
 
-/* Check whether G is a potential conditional compare candidate.  */
+/* Check whether G is a potential conditional compare candidate; OUTER is true if
+   G is the outer most AND/IOR.  */
 static bool
-ccmp_candidate_p (gimple *g)
+ccmp_candidate_p (gimple *g, bool outer = false)
 {
   tree lhs, op0, op1;
   gimple *gs0, *gs1;
@@ -109,8 +110,9 @@ ccmp_candidate_p (gimple *g)
   lhs = gimple_assign_lhs (g);
   op0 = gimple_assign_rhs1 (g);
   op1 = gimple_assign_rhs2 (g);
-  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
-  || !has_single_use (lhs))
+  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
+return false;
+  if (!outer && !has_single_use (lhs))
 return false;
 
   bb = gimple_bb (g);
@@ -284,7 +286,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode)
   rtx_insn *last;
   rtx tmp;
 
-  if (!ccmp_candidate_p (g))
+  if (!ccmp_candidate_p (g, true))
 return NULL_RTX;
 
   last = get_last_insn ();
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 1db22f0a1a3..381ed2c82d7 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3971,37 +3971,18 @@ expand_gimple_stmt_1 (gimple *stmt)
  {
rtx target, temp;
bool nontemporal = gimple_assign_nontemporal_move_p (assign_stmt);
-   struct separate_ops ops;
bool promoted = false;
 
target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
  promoted = true;
 
-   

Re: [PATCH] varasm: Fix up process_pending_assemble_externals [PR113182]

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> John reported that on HP-UX we no longer emit needed external libcalls.
> 
> The problem is that we didn't strip name encoding when looking up
> the identifiers in assemble_external_libcall and
> process_pending_assemble_externals, while
> assemble_name_resolve does that:
>   const char *real_name = targetm.strip_name_encoding (name);
>   tree id = maybe_get_identifier (real_name);
> 
>   if (id)
> {
> ...
>   mark_referenced (id);
> The intention is that assemble_external_libcall ensures the IDENTIFIER
> exists for the external libcall, then for actually emitted calls
> assemble_name_resolve sees those IDENTIFIERS and sets TREE_SYMBOL_REFERENCED
> on them and finally process_pending_assemble_externals looks the
> IDENTIFIER up again and checks its TREE_SYMBOL_REFERENCED.
> 
> But without the strip_name_encoding call, they can look up different
> identifiers and those are likely never used.
> 
> In the PR, John was discussing whether get_identifier or
> maybe_get_identifier should be used.  I believe in assemble_external_libcall
> we definitely want to use get_identifier: we need an IDENTIFIER allocated
> so that it can actually be tracked.  In process_pending_assemble_externals
> it doesn't matter; the IDENTIFIER should already have been created.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

OK.

> 2024-01-12  John David Anglin  
>   Jakub Jelinek  
> 
>   PR middle-end/113182
>   * varasm.cc (process_pending_assemble_externals,
>   assemble_external_libcall): Use targetm.strip_name_encoding
>   before calling get_identifier.
> 
> --- gcc/varasm.cc.jj  2024-01-08 21:56:04.968516120 +0100
> +++ gcc/varasm.cc 2024-01-11 18:44:19.171399167 +0100
> @@ -2543,7 +2543,8 @@ process_pending_assemble_externals (void
>for (rtx list = pending_libcall_symbols; list; list = XEXP (list, 1))
>  {
>rtx symbol = XEXP (list, 0);
> -  tree id = get_identifier (XSTR (symbol, 0));
> +  const char *name = targetm.strip_name_encoding (XSTR (symbol, 0));
> +  tree id = get_identifier (name);
>if (TREE_SYMBOL_REFERENCED (id))
>   targetm.asm_out.external_libcall (symbol);
>  }
> @@ -2631,7 +2632,8 @@ assemble_external_libcall (rtx fun)
>   reference to it will mark its tree node as referenced, via
>   assemble_name_resolve.  These are eventually emitted, if
>   used, in process_pending_assemble_externals. */
> -  get_identifier (XSTR (fun, 0));
> +  const char *name = targetm.strip_name_encoding (XSTR (fun, 0));
> +  get_identifier (name);
>pending_libcall_symbols = gen_rtx_EXPR_LIST (VOIDmode, fun,
>  pending_libcall_symbols);
>  }
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-12 Thread Xi Ruoyao
On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:

> > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
> > we need a target hook to tell the generic code
> > UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
> > see millions lines of messages like
> > 
> > ../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
> > UNSPEC_LA_PCREL_64_PART1 (42) found in variable location
> 
> I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't reproduced
> the problem you mentioned.
> 
>     $ ../configure --host=loongarch64-linux-gnu \
>     --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
>     --with-arch=loongarch64 --with-abi=lp64d --enable-tls \
>     --enable-languages=c,c++,fortran,lto --enable-plugin \
>     --disable-multilib --disable-host-shared --enable-bootstrap \
>     --enable-checking=release
>     $ make BOOT_FLAGS="-mcmodel=extreme"
> 
> What did I do wrong?:-(

BOOT_CFLAGS, not BOOT_FLAGS :).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-12 Thread Alexandre Oliva
On Jan 12, 2024, "Kewen.Lin"  wrote:

>>> By checking PR112917, IMHO we should keep this unbiasing
>>> guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 &&
>>> TARGET_STACK_BIAS), similar to some existing code special
>>> treating SPARC stack bias.
>> 
>> I'm afraid this change will most certainly regress 32-bit sparc, because
>> of the large register save area.

> Oh, I read the comments and commit logs in PR112917, mainly
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112917#{c4,c5,c6},
> and the "sparc64" in subject of commit r14-6737 also implies
> that this unbiasing is only required for sparc64, so I thought
> it should be safe to guard with SPARC_STACK_BOUNDARY_HACK.

It is safe, in a way, because that protects potentially active stack
areas, but it's unsafe in that it may leak data that stack scrubbing was
supposed to scrub.  There's no conservative solution here, alas; we have
to get it just right.

Specifically on sparc32, if __builtin_scrub_leave allocated its own
frame (it doesn't) with the large register-save area for its potential
(but inexistent) callees to use, it could overlap with a large chunk of
the very stack frame that it's supposed to clear.

Unfortunately, this is slowly drifting away from the notion of stack
address.  I mean, all of the following could conceivably be returned by
__builtin_stack_address:

- the (biased) stack pointer

- the address of the red zone

- the unbiased stack pointer

- the address of the save area reserved by callees for potential callees

- the boundary between caller- and callee-used stack space

The last one is what we need for stack scrubbing, so that's what I'm
planning to implement, but I'm pondering whether to change
__builtin_stack_address() to take an extra argument to select among the
different possibilities, or of other means to query these various
offsets.  It feels like overthinking, so I'm trying to push these
thoughts aside, but...  Does anyone think that would be a desirable
feature?  We can always add it later.
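
As a rough illustration of how two of the candidates above differ, here
is a minimal sketch.  The selector enum and the function name are
hypothetical and are not part of any proposed interface; the only fixed
fact is the 2047-byte stack bias of the 64-bit SPARC ABI:

```cpp
#include <cstdint>

// The 64-bit SPARC ABI biases the stack pointer by 2047 bytes.
constexpr std::uintptr_t sparc64_stack_bias = 2047;

// Hypothetical selector covering two entries of the list above.
enum class stack_address_kind { biased_sp, unbiased_sp };

// Map the raw register value to the requested notion of "stack address".
constexpr std::uintptr_t
stack_address(std::uintptr_t sp_register, stack_address_kind kind)
{
  return kind == stack_address_kind::unbiased_sp
           ? sp_register + sparc64_stack_bias
           : sp_register;
}
```

The remaining candidates (red zone, callee save area, caller/callee
boundary) would need target-specific offsets that this sketch does not
attempt to model.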


>> ISTM that PPC sets up a save area between the outgoing args and the

> Yes, taking 64-bit PowerPC ELF abi 1.9 as example:

*nod*, and that's a caller-used save area, as opposed to sparc's
callee-used save area.  Whereas the caller-used area needs to be
preserved across a call, the callee-used one could conceivably even be
used as scratch space by the caller.

> Nice, thanks!  Welcome back. :-)

Thank you!

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] rs6000: New pass for replacement of adjacent lxv with lxvp.

2024-01-12 Thread Surya Kumari Jangala
Hi Ajit,
I have taken a quick look at the patch and my comments are inlined:

On 09/01/24 4:44 pm, Ajit Agarwal wrote:
> Hello All:
> 
> This pass is registered before ira rtl pass.
> Bootstrapped and regtested for powerpc64-linux-gnu.
> 
> No regressions for spec 2017 benchmarks and improvements for some of the
> FP and INT benchmarks.
> 
> Vladimir:
> 
> I did modify IRA and LRA register Allocators. Please review.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: New pass for replacement of adjacent lxv with lxvp.

Please add PR number.

> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> This pass is registered before ira rtl pass.

Please add explanation of what changes have been made in IRA/LRA
and why those changes are required.

> 
> 2024-01-09  Ajit Kumar Agarwal  
> 


> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
>   ;;
>  rs6000*-*-*)
>   extra_options="${extra_options} g.opt fused-madd.opt 
> rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-logue.cc 
> \$(srcdir)/config/rs6000/rs6000-call.cc"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def 
> b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..e6a9810ee24 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>   The power8 does not have instructions that automaticaly do the byte 
> swaps
>   for loads and stores.  */
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_BEFORE (pass_ira, 1, pass_analyze_vecload);

Please add comments, similar to the other INSERT_PASS_BEFORE(...).

>  
>/* Pass to do the PCREL_OPT optimization that combines the load of an
>   external symbol's address along with a single load or store using that
> diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc 
> b/gcc/config/rs6000/rs6000-vecload-opt.cc
> new file mode 100644
> index 000..f02c8337f2e
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-opt.cc
> @@ -0,0 +1,395 @@
> +/* Subroutines used to replace lxv with lxvp
> +   for TARGET_POWER10 and TARGET_VSX,

s/,/.

Comment can be rewritten as follows to specify the fact that we replace
lxv's having adjacent addresses:
Subroutines used to replace lxv having adjacent addresses with lxvp.


> +/* Identify lxv instruction that are candidate of adjacent
> +   memory addresses and replace them with mma instruction lxvp.  */

The comment needs modification for better readability, perhaps as follows:
Identify lxv instructions that have adjacent memory addresses 
and replace them with an lxvp instruction.

> +unsigned int
> +rs6000_analyze_vecload (function *fun)
> +{
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  /* Rebuild ud- and du-chains.  */
> +  df_remove_problem (df_chain);
> +  df_process_deferred_rescans ();
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  basic_block bb;
> +  bool changed = false;
> +  rtx_insn *insn, *curr_insn = 0;
> +  rtx_insn *insn1 = 0, *insn2 = 0;
> +  bool first_vec_insn = false;
> +  unsigned int regno = 0;
> +
> +  FOR_ALL_BB_FN (bb, fun)

I am assuming that the 2 lxv instructions that we are searching for
should belong to the same BB. Otherwise, we risk moving a load insn across
basic blocks. In which case, the variable "first_vec_insn" has to be set to 
false here. It has to be false each time we start processing a new BB.
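
To make the point concrete, here is a toy model of the scan, assuming
each basic block is reduced to a list of load offsets.  The types and
names are invented for illustration and have nothing to do with the RTL
data structures in the patch:

```cpp
#include <cstddef>
#include <vector>

// Toy model: each "basic block" is a list of load offsets in bytes.
using block = std::vector<long>;

// Count pairs of consecutive 16-byte loads at adjacent offsets,
// never pairing a load with one that came from a previous block.
std::size_t count_adjacent_pairs(const std::vector<block>& blocks)
{
  std::size_t pairs = 0;
  for (const block& bb : blocks)
    {
      bool have_first = false;   // reset at the start of every block
      long first_offset = 0;
      for (long offset : bb)
        {
          if (have_first && offset == first_offset + 16)
            {
              ++pairs;
              have_first = false;  // both loads are consumed by the pair
            }
          else
            {
              have_first = true;
              first_offset = offset;
            }
        }
    }
  return pairs;
}
```

If have_first were declared outside the outer loop, the input
{{0}, {16}} would wrongly produce one pair spanning two blocks.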

> +FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
> +{
> +  if (LABEL_P (insn))
> + continue;
> +
> +  if (NONDEBUG_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SET)
> + {

Please correct the indentation.

> +   rtx set = single_set (insn);
> +   rtx src = SET_SRC (set);
> +   machine_mode mode = GET_MODE (SET_DEST (set));
> +
> +   if (TARGET_VSX && TARGET_POWER10 && MEM_P (src))


Ping [PATCH] testsuite: Reduce gcc.dg/torture/inline-mem-cpy-1.c by 11 for simulators

2024-01-12 Thread Hans-Peter Nilsson
Ping.  (Don't miss the gcc.dg/torture/inline-mem-cpy-1.c part.)

On Mon, 1 Jan 2024, Hans-Peter Nilsson wrote:

> Tested mmix-knuth-mmixware (where all torture-variants of
> gcc.dg/torture/inline-mem-cpy-1.c now pass) and native
> x86_64-pc-linux-gnu.  Also stepped through the test for native,
> w/wo. RUN_FRACTION defined to see that it worked as intended.
> 
> You may wonder what about the "sibling" tests inline-mem-cmp-1.c and
> inline-mem-cpy-cmp-1.c.  Well, they FAIL, but not because of
> timeouts(!)  To be continued
> 
> Ok to commit?
> 
> Or, other suggestions?
> 
> -- >8 --
> The test inline-mem-cpy-1.c takes 16 minutes at -O0 for the mmix
> simulator on a 3.5 year old laptop and thus always times out, despite
> the x 2 timeout (i.e. 10 minutes), and times out at all optimization
> levels.  For the included file (when run as gcc.dg/memcmp-1.c), the
> execution time on the same host is 9 minutes 54 seconds, so just
> within 10 minutes timeout limit.  Seems pragmatically best to reduce
> the torture-test by a factor of about 10, but there's no obvious small
> set of entities to scale down to get the intended effect, and
> splitting up the test into several tests seems a bit too intrusive.
> 
> Instead, introduce pseudo-random machinery to skip all but each
> RUN_FRACTION:th iteration, defaulting to no change when RUN_FRACTION
> isn't defined.  Use 11 for RUN_FRACTION, assuming this prime will lead
> to even distribution within nested iterations with loops looking like
> (0, 1) : (0, 1).  Do this only for the main loop in
> test_driver_memcmp; the "outermost" two levels of iterations.
> 
> With this, execution time for -O0 as above is down to 1 minute 32
> seconds.
> 
>   * gcc.dg/torture/inline-mem-cpy-1.c: Pass -DRUN_FRACTION=11
>   when testing in a simulator.
>   * gcc.dg/memcmp-1.c [RUN_FRACTION]: Add machinery to run only
>   for each RUN_FRACTION:th iteration.
>   (main): Call initialize_skip_iteration_count.
>   (test_driver_memcmp): Check SKIP_ITERATION for each iteration.
> ---
>  gcc/testsuite/gcc.dg/memcmp-1.c   | 35 +++
>  .../gcc.dg/torture/inline-mem-cpy-1.c |  1 +
>  2 files changed, 36 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
> index ea837ca0f577..13ef5b3380d0 100644
> --- a/gcc/testsuite/gcc.dg/memcmp-1.c
> +++ b/gcc/testsuite/gcc.dg/memcmp-1.c
> @@ -34,6 +34,36 @@ int lib_strncmp(const char *a, const char *b, size_t n)
>  
>  #define MAX_SZ 600
>  
> +/* A means to run only a fraction of the tests, beginning at a random
> +   count.  */
> +#ifdef RUN_FRACTION
> +
> +#define SKIP_ITERATION skip_iteration ()
> +static unsigned int iteration_count;
> +
> +static _Bool
> +skip_iteration (void)
> +{
> +  _Bool run = ++iteration_count == RUN_FRACTION;
> +
> +  if (run)
> +iteration_count = 0;
> +
> +  return !run;
> +}
> +
> +static void
> +initialize_skip_iteration_count ()
> +{
> +  srand (2024);
> +  iteration_count = (unsigned int) (rand ()) % RUN_FRACTION;
> +}
> +
> +#else
> +#define SKIP_ITERATION 0
> +#define initialize_skip_iteration_count()
> +#endif
> +
>  #define DEF_RS(ALIGN)  \
>  static void test_memcmp_runtime_size_ ## ALIGN (const char *str1,   \
>   const char *str2,  \
> @@ -110,6 +140,8 @@ static void test_driver_memcmp (void (test_memcmp)(const 
> char *, const char *, i
>int i,j,l;
>for(l=0;l  for(i=0;i +  if (SKIP_ITERATION)
> + continue;
>for(j=0;j   buf1[j] = rand() & 0xff;
>   buf2[j] = buf1[j];
> @@ -128,6 +160,8 @@ static void test_driver_memcmp (void (test_memcmp)(const 
> char *, const char *, i
>for(diff_pos = ((test_sz>TZONE)?(test_sz-TZONE):0); diff_pos < 
> test_sz+TZONE; diff_pos++)
>  for(zero_pos = ((test_sz>TZONE)?(test_sz-TZONE):0); zero_pos < 
> test_sz+TZONE; zero_pos++)
>{
> + if (SKIP_ITERATION)
> +   continue;
>   memset(buf1, 'A', 2*test_sz);
>   memset(buf2, 'A', 2*test_sz);
>   buf2[diff_pos] = 'B';
> @@ -490,6 +524,7 @@ DEF_TEST(49,1)
>  int
>  main(int argc, char **argv)
>  {
> +  initialize_skip_iteration_count ();
>  #ifdef TEST_ALL
>  RUN_TEST(1,1)
>  RUN_TEST(1,2)
> diff --git a/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c 
> b/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c
> index f4952554dd01..f0752349571b 100644
> --- a/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c
> +++ b/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-finline-stringops=memcpy -save-temps -g0 -fno-lto" } */
> +/* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
>  /* { dg-timeout-factor 2 } */
>  
>  #include "../memcmp-1.c"
> -- 
> 2.30.2
> 
> 
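
The effect of the RUN_FRACTION counter in the quoted patch can be
checked in isolation.  The following is a standalone model, not the
testsuite code; the struct and function names are made up, and the
starting phase is passed in explicitly instead of coming from rand ():

```cpp
// Standalone model of the skip_iteration counter from the patch:
// only every `fraction`-th call reports "run".
struct iteration_filter
{
  unsigned fraction;
  unsigned count;   // starting phase, 0 <= count < fraction

  bool skip()
  {
    bool run = ++count == fraction;
    if (run)
      count = 0;
    return !run;
  }
};

// Number of iterations out of `total` that are actually executed.
unsigned executed(unsigned total, unsigned fraction, unsigned start)
{
  iteration_filter f{fraction, start};
  unsigned n = 0;
  for (unsigned i = 0; i < total; ++i)
    if (!f.skip())
      ++n;
  return n;
}
```

Whatever the starting phase, a whole number of periods always runs the
same share of iterations; the phase only decides which ones.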


Re: [PATCH] libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]

2024-01-12 Thread Jonathan Wakely
On Thu, 11 Jan 2024 at 22:17, Jonathan Wakely wrote:
>
> I'd like to commit this to trunk for GCC 14. Please take a look.
>
> -- >8 --
>
> This is the last part of PR libstdc++/108822 implementing P2255R2, which
> makes it ill-formed to create a std::tuple that would bind a reference
> to a temporary.
>
> The dangling checks are implemented as deleted constructors for C++20
> and higher, and as Debug Mode static assertions in the constructor body
> for older standards. This is similar to the r13-6084-g916ce577ad109b
> changes for std::pair.
>
> As part of this change, I've reimplemented most of std::tuple for C++20,
> making use of concepts to replace the enable_if constraints, and using
> conditional explicit to avoid duplicating most constructors. We could
> use conditional explicit for the C++11 implementation too (with pragmas
> to disable the -Wc++17-extensions warnings), but that should be done as
> a stage 1 change for GCC 15 rather than now.
>
> The partial specialization for std::tuple<T1, T2> is no longer used for
> C++20 (or more precisely, for a C++20 compiler that supports concepts
> and conditional explicit). The additional constructors and assignment
> operators that take std::pair arguments have been added to the C++20
> implementation of the primary template, with sizeof...(_Elements)==2
> constraints. This avoids reimplementing all the other constructors in
> the std::tuple<T1, T2> partial specialization to use concepts. This way
> we avoid four implementations of every constructor and only have three!
> (The primary template has an implementation of each constructor for
> C++11 and another for C++20, and the tuple<T1, T2> specialization has an
> implementation of each for C++11, so that's three for each constructor.)
>
> In order to make the constraints more efficient on the C++20 version of
> the default constructor I've also added a variable template for the
> __is_implicitly_default_constructible trait, implemented using concepts.

[snip]

> +#if __cpp_concepts // >= C++20
> +private:
> +  template
> +   static consteval bool
> +   __assignable()

This causes errors for -std=c++17 -fconcepts-ts because that defines
__cpp_concepts=201507L, but does not allow C++20 consteval to be used.

I used a different condition for the constructors:
#if __cpp_concepts && __cpp_conditional_explicit // >= C++20
The difference is because the assignment ops don't use explicit. The
additional check for __cpp_conditional_explicit means it already
requires C++20, so doesn't match for -std=c++17 -fconcepts-ts. So that
preprocessor group didn't cause problems.
N.B. The different conditions mean that for a compiler that supports
concepts but not conditional explicit we will use concepts for the
assignment ops, but not for the constructors. And you'll still get the
partial specialization for std::tuple<T1, T2>, and that partial
specialization will be missing the C++23 constructors for ranges::zip.
I think that's fine - if you don't have a good enough C++20 compiler
(i.e. one that defines __cpp_conditional_explicit) then you don't get
a complete C++20 std::tuple, let alone a complete C++23 std::tuple.
dealwithit.jpg

I could just use constexpr instead of consteval for those helper
functions, but I think I will add a check for __cpp_consteval. I don't
feel comfortable trying to make the new assignment ops work with
-std=c++17 -fconcepts-ts as there might be other interactions with
C++20 features that will go unnoticed, as we don't routinely test the
whole library with C++17 + Concepts TS.
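
The relationship between the two preprocessor conditions can be
sketched outside the library.  This is only an illustration with
made-up variable names; it mirrors the conditions as first posted,
before any __cpp_consteval check is added:

```cpp
// Condition used for the assignment operators in the posted patch.
#if __cpp_concepts
constexpr bool assignment_uses_concepts = true;
#else
constexpr bool assignment_uses_concepts = false;
#endif

// Condition used for the constructors.
#if __cpp_concepts && __cpp_conditional_explicit
constexpr bool constructors_use_concepts = true;
#else
constexpr bool constructors_use_concepts = false;
#endif

// The constructor condition implies the assignment condition,
// but not the other way round.
static_assert(!constructors_use_concepts || assignment_uses_concepts,
              "constructor condition must imply assignment condition");
```

With -std=c++17 -fconcepts-ts the first flag would be true and the
second false, which is the combination that exposed the consteval
problem.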



Re: [PATCH V3] RISC-V: Adjust scalar_to_vec cost

2024-01-12 Thread juzhe.zhong
VLA has been a known issue for a long time.  GCC doesn't have much CSE
optimization for VLA vectors.  It would be a lot of work to investigate
what's going on.  I think most CSE optimizations for precomputed results
apply to VLS loops.  So as long as we do a good job in the cost model and
pick an appropriate VLS loop, it's not a big issue and not a high priority
for me.

---- Replied Message ----
From: Robin Dapp
Date: 01/12/2024 18:10
To: Juzhe-Zhong, gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com, kito.ch...@gmail.com, kito.ch...@sifive.com, jeffreya...@gmail.com
Subject: Re: [PATCH V3] RISC-V: Adjust scalar_to_vec cost

> Tested on both RV32/RV64 no regression, Ok for trunk ?

Yes, thanks!

Btw out of curiosity, did you see why we actually fail to
optimize away the VLA loop?  We should open a bug for that
I suppose.

Regards
 Robin




Re: [PATCH] Add support for function attributes and variable attributes

2024-01-12 Thread Guillaume Gomez
> It sounds like the patch you have locally is ready, but it has some
> nontrivial changes compared to the last version you posted to the list.
> Please post your latest version to the list.

Sure!

This patch adds the support for attributes on functions and variables. It does
so by adding the following functions:

* gcc_jit_function_add_attribute
* gcc_jit_function_add_string_attribute
* gcc_jit_function_add_integer_array_attribute
* gcc_jit_lvalue_add_string_attribute

It adds the following types:

* gcc_jit_fn_attribute
* gcc_jit_variable_attribute

It adds tests to ensure that the attributes are correctly applied.

> Do you have push rights, or do you need me to push it for you?

I have push rights so I'll merge the patch myself. But thanks for offering to
do it.

On Thu, 11 Jan 2024 at 23:38, David Malcolm wrote:
>
> On Thu, 2024-01-11 at 22:40 +0100, Guillaume Gomez wrote:
> > Hi David,
> >
> > > The above looks correct, but the patch adds the entrypoint
> > > descriptions
> > > to topics/types.rst, which seems like the wrong place.  The
> > > function-
> > > related ones should be in topics/functions.rst in the "Functions"
> > > section and the lvalue/variable one in topics/expression.rst after
> > > the
> > > "Global variables" section.
> >
> > Ah indeed. Mix-up on my end. Fixed it.
> >
> > > test-restrict.c is a pre-existing testcase, so please don't delete
> > > its
> > > entry.
> >
> > Ah indeed, I went too quickly and thought it was a test I renamed...
> >
> > > BTW, the ChangeLog entry mentions adding test-restrict.c, but the
> > > patch
> > > doesn't add it, so that part of the proposed ChangeLog is wrong.
> > >
> > > Does the patch pass ./contrib/gcc-changelog/git_check_commit.py ?
> >
> > I messed up a bit, fixed it thanks to you. I didn't run the script in
> > my last
> > update but just did:
> >
> > ```
> > $ contrib/gcc-changelog/git_check_commit.py $(git log -1 --format=%h)
> > Checking 3849ee2eadf0eeec2b0080a5142ced00be96a60d: OK
> > ```
> >
> > > Otherwise, looks good, assuming that the patch has been tested with
> > > the
> > > full jit testsuite.
> >
> > When rebasing on upstream yesterday I discovered that two tests
> > were not working anymore. For the first one, it was simply because of
> > the changes in `dummy-frontend.cc`. For the second one
> > (test-noinline-attribute.c), it was because the rules for inlining
> > changed
> > since we wrote this patch apparently (our fork is very late). Antoni
> > discovered
> > that we could just add a call to `asm` to prevent this from happening
> > so I
> > added it.
> >
> > So yes, all jit tests are passing as expected. :)
>
> Good.
>
> It sounds like the patch you have locally is ready, but it has some
> nontrivial changes compared to the last version you posted to the list.
> Please post your latest version to the list.
>
> Do you have push rights, or do you need me to push it for you?
>
> Thanks
> Dave
>
> >
> > On Thu, 11 Jan 2024 at 19:46, David Malcolm wrote:
> > >
> > > On Thu, 2024-01-11 at 01:00 +0100, Guillaume Gomez wrote:
> > > > Hi David.
> > > >
> > > > Thanks for the review!
> > > >
> > > > > > +.. function::  void\
> > > > > > +   gcc_jit_lvalue_add_string_attribute
> > > > > > (gcc_jit_lvalue *variable,
> > > > > > +enum
> > > > > > gcc_jit_fn_attribute attribute,
> > > > >
> > > > > ^^
> > > > >
> > > > > This got out of sync with the declaration in the header file;
> > > > > it
> > > > > should
> > > > > be enum gcc_jit_variable_attribute attribute
> > > >
> > > > Indeed, good catch!
> > > >
> > > > > I took a brief look through the handler functions and with the
> > > > > above
> > > > > caveat I didn't see anything obviously wrong.  I'm going to
> > > > > assume
> > > > > this
> > > > > code is OK given that presumably you've been testing it within
> > > > > rustc,
> > > > > right?
> > > >
> > > > Both in rustc and in the JIT tests we added.
> > > >
> > > > [..snip...]
> > > >
> > > > I added all the missing `RETURN_IF_FAIL` you mentioned. None of
> > > > the
> > > > arguments should be `NULL` so it was a mistake not to check it.
> > > >
> > > > [..snip...]
> > > >
> > > > I removed the tests comments as you mentioned.
> > > >
> > > > > Please update jit.dg/all-non-failing-tests.h for the new tests;
> > > > > it's
> > > > > meant to list all of the (non failing) tests alphabetically.
> > > >
> > > > It's not always correctly sorted. Might be worth sending a patch
> > > > after this
> > > > one gets merged to fix that.
> > > >
> > > > > I *think* all of the new tests aren't suitable to be run as
> > > > > part of
> > > > > a
> > > > > shared context (e.g. due to touching the optimization level or
> > > > > examining generated asm), so they should be listed in that
> > > > > header
> > > > > with
> > > > > comments explaining why.
> > > >
> > > > I added them with a comment on top of each of them.
> > > >
> > > > I joined the new patch 

Re: [PATCH V3] RISC-V: Adjust scalar_to_vec cost

2024-01-12 Thread Robin Dapp
> Tested on both RV32/RV64 no regression, Ok for trunk ?

Yes, thanks!

Btw out of curiosity, did you see why we actually fail to
optimize away the VLA loop?  We should open a bug for that
I suppose.

Regards
 Robin



Re: [PATCH 2/2] RISC-V/testsuite: Also verify if-conversion runs for pr105314.c

2024-01-12 Thread Andrew Pinski
On Thu, Jan 11, 2024 at 3:37 PM Maciej W. Rozycki  wrote:
>
> Verify that if-conversion succeeded through noce_try_store_flag_mask, as
> per PR rtl-optimization/105314, tightening the test case and making it
> explicit.
>
> gcc/testsuite/
> * gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.

I have an objection to this.  If we are checking the RTL pass rather than
overall code generation, then maybe we should change the testcase into
an RTL testcase instead.
This matters especially because there might be improvements going into
GCC 15 that specifically target ifcvt at the gimple level (I am planning
on doing some).

Thanks,
Andrew Pinski

> ---
>  gcc/testsuite/gcc.target/riscv/pr105314.c |2 ++
>  1 file changed, 2 insertions(+)
>
> gcc-test-riscv-pr105314-rtl.diff
> Index: gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/pr105314.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
> @@ -1,6 +1,7 @@
>  /* PR rtl-optimization/105314 */
>  /* { dg-do compile } */
>  /* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
> +/* { dg-options "-fdump-rtl-ce1" } */
>
>  long
>  foo (long a, long b, long c)
> @@ -10,4 +11,5 @@ foo (long a, long b, long c)
>return a;
>  }
>
> +/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_store_flag_mask" 1 "ce1" } } */
>  /* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
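
For context, the store-flag-mask form of if-conversion rewrites
"if (test) x = 0;" as "x &= -(test == 0);".  A source-level sketch of
the two forms follows, assuming the test body has that shape; the body
of foo is not shown in the quoted hunk, and the function names here are
made up:

```cpp
// Branchy form, as in `if (c) a = 0;`.
long with_branch(long a, long c)
{
  if (c)
    a = 0;
  return a;
}

// Branchless form: the mask is all-ones when c == 0 and zero otherwise.
long with_mask(long a, long c)
{
  return a & -static_cast<long>(c == 0);
}
```

Checking only the final assembly accepts any branch-free result,
whereas scanning the ce1 dump ties the test to this particular RTL
transformation.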


[PATCH] varasm: Fix up process_pending_assemble_externals [PR113182]

2024-01-12 Thread Jakub Jelinek
Hi!

John reported that on HP-UX we no longer emit needed external libcalls.

The problem is that we didn't strip name encoding when looking up
the identifiers in assemble_external_libcall and
process_pending_assemble_externals, while
assemble_name_resolve does that:
  const char *real_name = targetm.strip_name_encoding (name);
  tree id = maybe_get_identifier (real_name);

  if (id)
{
...
  mark_referenced (id);
The intention is that assemble_external_libcall ensures the IDENTIFIER
exists for the external libcall, then for actually emitted calls
assemble_name_resolve sees those IDENTIFIERS and sets TREE_SYMBOL_REFERENCED
on them and finally process_pending_assemble_externals looks the
IDENTIFIER up again and checks its TREE_SYMBOL_REFERENCED.

But without the strip_name_encoding call, they can look up different
identifiers and those are likely never used.
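
As a rough standalone illustration of the mismatch (this is not GCC code),
the sketch below models the identifier table as a map from spelling to a
"referenced" flag.  The encoding (a single leading '*'), the symbol name
and all helper names are assumptions made up for the example; real targets
define their own strip_name_encoding.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy identifier table: spelling -> "referenced" flag.
// Stands in for IDENTIFIER nodes and TREE_SYMBOL_REFERENCED.
using IdentTable = std::map<std::string, bool>;

// Hypothetical encoding: an optional leading '*' that is not part of the
// real symbol name.
static const char *strip_encoding(const char *name)
{
  assert(name != nullptr);
  return name[0] == '*' ? name + 1 : name;
}

// Registration step (models assemble_external_libcall).
static void register_libcall(IdentTable &t, const char *sym, bool strip)
{
  t.emplace(strip ? strip_encoding(sym) : sym, false);
}

// Reference step (models assemble_name_resolve): always strips, and only
// marks identifiers that already exist.
static void mark_referenced(IdentTable &t, const char *sym)
{
  auto it = t.find(strip_encoding(sym));
  if (it != t.end())
    it->second = true;
}

// Final step (models process_pending_assemble_externals).
static bool should_emit(const IdentTable &t, const char *sym, bool strip)
{
  auto it = t.find(strip ? strip_encoding(sym) : sym);
  return it != t.end() && it->second;
}

// Runs the three steps for one encoded symbol and reports whether the
// external libcall would be emitted.
static bool libcall_emitted(bool strip)
{
  IdentTable t;
  register_libcall(t, "*__udivdi3", strip);
  mark_referenced(t, "*__udivdi3");
  return should_emit(t, "*__udivdi3", strip);
}
```

In this model the libcall is only emitted when the registration and final
lookup strip the encoding the same way the reference step does.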

In the PR, John was discussing whether get_identifier or
maybe_get_identifier should be used.  I believe in assemble_external_libcall
we definitely want to use get_identifier, because we need an IDENTIFIER
allocated so that it can actually be tracked; in
process_pending_assemble_externals it doesn't matter, as the IDENTIFIER
should already have been created.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2024-01-12  John David Anglin  
Jakub Jelinek  

PR middle-end/113182
* varasm.cc (process_pending_assemble_externals,
assemble_external_libcall): Use targetm.strip_name_encoding
before calling get_identifier.

--- gcc/varasm.cc.jj    2024-01-08 21:56:04.968516120 +0100
+++ gcc/varasm.cc   2024-01-11 18:44:19.171399167 +0100
@@ -2543,7 +2543,8 @@ process_pending_assemble_externals (void
   for (rtx list = pending_libcall_symbols; list; list = XEXP (list, 1))
 {
   rtx symbol = XEXP (list, 0);
-  tree id = get_identifier (XSTR (symbol, 0));
+  const char *name = targetm.strip_name_encoding (XSTR (symbol, 0));
+  tree id = get_identifier (name);
   if (TREE_SYMBOL_REFERENCED (id))
targetm.asm_out.external_libcall (symbol);
 }
@@ -2631,7 +2632,8 @@ assemble_external_libcall (rtx fun)
  reference to it will mark its tree node as referenced, via
  assemble_name_resolve.  These are eventually emitted, if
  used, in process_pending_assemble_externals. */
-  get_identifier (XSTR (fun, 0));
+  const char *name = targetm.strip_name_encoding (XSTR (fun, 0));
+  get_identifier (name);
   pending_libcall_symbols = gen_rtx_EXPR_LIST (VOIDmode, fun,
   pending_libcall_symbols);
 }

Jakub



Re: [PATCH] lower-bitint: Fix up handling of unsigned INTEGER_CSTs operands with lots of 1s in the upper bits [PR113334]

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> For INTEGER_CST operands, the code decides if it should emit the whole
> INTEGER_CST into memory, or if there are enough upper bits either all 0s
> or all 1s to warrant an optimization, where we use memory for lower limbs
> or even just an INTEGER_CST for least significant limb and fill in the
> rest of limbs with 0s or 1s.  Unfortunately when not using
> bitint_min_cst_precision, the code was using tree_int_cst_sgn (op) < 0
> to determine whether to fill in the upper bits with 1s or 0s.  That is
> incorrect for TYPE_UNSIGNED INTEGER_CSTs which have higher limbs full of
> ones, we really want to check here whether the most significant bit is
> set or clear.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

> 2024-01-12  Jakub Jelinek  
> 
>   PR tree-optimization/113334
>   * gimple-lower-bitint.cc (bitint_large_huge::handle_operand): Use
>   wi::neg_p (wi::to_wide (op)) instead of tree_int_cst_sgn (op) < 0
>   to determine if number should be extended by all ones rather than zero
>   extended.
> 
>   * gcc.dg/torture/bitint-46.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-11 14:27:26.0 +0100
> +++ gcc/gimple-lower-bitint.cc    2024-01-11 17:35:07.484557476 +0100
> @@ -997,7 +997,7 @@ bitint_large_huge::handle_operand (tree
>   {
> unsigned int prec = TYPE_PRECISION (TREE_TYPE (op));
> unsigned rem = prec % (2 * limb_prec);
> -   int ext = tree_int_cst_sgn (op) < 0 ? -1 : 0;
> +   int ext = wi::neg_p (wi::to_wide (op)) ? -1 : 0;
> tree c = m_data[m_data_cnt];
> unsigned min_prec = TYPE_PRECISION (TREE_TYPE (c));
> g = gimple_build_cond (LT_EXPR, idx,
> --- gcc/testsuite/gcc.dg/torture/bitint-46.c.jj   2024-01-11 
> 17:44:42.360409112 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-46.c  2024-01-11 17:44:31.471564635 
> +0100
> @@ -0,0 +1,32 @@
> +/* PR tree-optimization/113334 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 384
> +__attribute__((noipa)) _BitInt(384)
> +foo (int s)
> +{
> +  _BitInt(384) z = (-(unsigned _BitInt(384)) 4) >> s;
> +  return z;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 384
> +  if (foo (59) != 
> 0x1fwb)
> +__builtin_abort ();
> +  if (foo (0) != -4wb)
> +__builtin_abort ();
> +  if (foo (1) != 
> 0x7ffewb)
> +__builtin_abort ();
> +  if (foo (11) != 
> 0x001fwb)
> +__builtin_abort ();
> +  if (foo (123) != 
> 0x1fwb)
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] lower-bitint: Fix up handling of unsigned INTEGER_CSTs operands with lots of 1s in the upper bits [PR113334]

2024-01-12 Thread Jakub Jelinek
Hi!

For INTEGER_CST operands, the code decides if it should emit the whole
INTEGER_CST into memory, or if there are enough upper bits either all 0s
or all 1s to warrant an optimization, where we use memory for lower limbs
or even just an INTEGER_CST for least significant limb and fill in the
rest of limbs with 0s or 1s.  Unfortunately when not using
bitint_min_cst_precision, the code was using tree_int_cst_sgn (op) < 0
to determine whether to fill in the upper bits with 1s or 0s.  That is
incorrect for TYPE_UNSIGNED INTEGER_CSTs which have higher limbs full of
ones, we really want to check here whether the most significant bit is
set or clear.
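
A small standalone sketch of the difference (not GCC code): the struct and
helper names below are made up, and a single 64-bit pattern stands in for
the INTEGER_CST.  It assumes, as described above, that the new check only
looks at the most significant bit.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an INTEGER_CST: a 64-bit pattern plus its signedness.
struct Cst
{
  std::uint64_t bits;
  bool is_unsigned;
};

// Models tree_int_cst_sgn: the sign of the *value*.  An unsigned constant
// is never negative, whatever its bit pattern looks like.
static int value_sign(const Cst &c)
{
  if (c.bits == 0)
    return 0;
  if (c.is_unsigned)
    return 1;
  return (c.bits >> 63) ? -1 : 1;
}

// Models the replacement check: is the most significant bit set?
static bool msb_set(const Cst &c)
{
  return (c.bits >> 63) != 0;
}

// Fill value for the upper limbs: -1 means all ones, 0 means all zeros.
static int ext_by_sign(const Cst &c) { return value_sign(c) < 0 ? -1 : 0; }
static int ext_by_msb(const Cst &c) { return msb_set(c) ? -1 : 0; }
```

For the unsigned pattern 0xff...fc (the analogue of -(unsigned) 4), the
sign-based choice picks zero fill although every upper bit is one, while
the MSB-based choice picks all ones.  For signed constants the two agree.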

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2024-01-12  Jakub Jelinek  

PR tree-optimization/113334
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand): Use
wi::neg_p (wi::to_wide (op)) instead of tree_int_cst_sgn (op) < 0
to determine if number should be extended by all ones rather than zero
extended.

* gcc.dg/torture/bitint-46.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-11 14:27:26.0 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-11 17:35:07.484557476 +0100
@@ -997,7 +997,7 @@ bitint_large_huge::handle_operand (tree
{
  unsigned int prec = TYPE_PRECISION (TREE_TYPE (op));
  unsigned rem = prec % (2 * limb_prec);
- int ext = tree_int_cst_sgn (op) < 0 ? -1 : 0;
+ int ext = wi::neg_p (wi::to_wide (op)) ? -1 : 0;
  tree c = m_data[m_data_cnt];
  unsigned min_prec = TYPE_PRECISION (TREE_TYPE (c));
  g = gimple_build_cond (LT_EXPR, idx,
--- gcc/testsuite/gcc.dg/torture/bitint-46.c.jj 2024-01-11 17:44:42.360409112 +0100
+++ gcc/testsuite/gcc.dg/torture/bitint-46.c    2024-01-11 17:44:31.471564635 +0100
@@ -0,0 +1,32 @@
+/* PR tree-optimization/113334 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 384
+__attribute__((noipa)) _BitInt(384)
+foo (int s)
+{
+  _BitInt(384) z = (-(unsigned _BitInt(384)) 4) >> s;
+  return z;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 384
+  if (foo (59) != 
0x1fwb)
+__builtin_abort ();
+  if (foo (0) != -4wb)
+__builtin_abort ();
+  if (foo (1) != 
0x7ffewb)
+__builtin_abort ();
+  if (foo (11) != 
0x001fwb)
+__builtin_abort ();
+  if (foo (123) != 
0x1fwb)
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub



Re: [PATCH] Fortran: annotations for DO CONCURRENT loops [PR113305]

2024-01-12 Thread Bernhard Reutner-Fischer
On Wed, 10 Jan 2024 23:24:22 +0100
Harald Anlauf  wrote:

> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
> index 82f388c05f8..88502c1e3f0 100644
> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -2926,6 +2926,10 @@ gfc_dt;
>  typedef struct gfc_forall_iterator
>  {
>gfc_expr *var, *start, *end, *stride;
> +  unsigned short unroll;
> +  bool ivdep;
> +  bool vector;
> +  bool novector;
>struct gfc_forall_iterator *next;
>  }
[]
> diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
> index a718dce237f..59a9cf99f9b 100644
> --- a/gcc/fortran/trans-stmt.cc
> +++ b/gcc/fortran/trans-stmt.cc
> @@ -41,6 +41,10 @@ typedef struct iter_info
>tree start;
>tree end;
>tree step;
> +  unsigned short unroll;
> +  bool ivdep;
> +  bool vector;
> +  bool novector;
>struct iter_info *next;
>  }

Given that we already have in gfortran.h

> typedef struct
> {
>   gfc_expr *var, *start, *end, *step;
>   unsigned short unroll;
>   bool ivdep;
>   bool vector;
>   bool novector;
> }
> gfc_iterator;

would it make sense to break out these loop annotation flags into their
own, let's say, struct gfc_iterator_flags and use pointers to those flags
instead?
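
A minimal sketch of what that could look like, purely as an illustration:
gfc_iterator_flags is only the name floated above, and the *_sketch types,
the annot member and make_iter_info are hypothetical stand-ins, not the
real gfortran declarations.

```cpp
#include <cassert>

struct gfc_expr;            // opaque here; the real type lives in gfortran.h
typedef void *tree_sketch;  // stand-in for GCC's tree

// Hypothetical: the four annotation fields grouped in one place.
struct gfc_iterator_flags
{
  unsigned short unroll;
  bool ivdep;
  bool vector;
  bool novector;
};

// Sketch of gfc_forall_iterator embedding the shared flags.
struct forall_iterator_sketch
{
  gfc_expr *var, *start, *end, *stride;
  gfc_iterator_flags annot;
  forall_iterator_sketch *next;
};

// Sketch of trans-stmt.cc's iter_info referring to the flags by pointer
// instead of copying each field.
struct iter_info_sketch
{
  tree_sketch var, start, end, step;
  const gfc_iterator_flags *annot;
  iter_info_sketch *next;
};

// Builds an iter_info that points at the front-end flags; the caller must
// keep the forall iterator alive for as long as the result is used.
static iter_info_sketch make_iter_info(const forall_iterator_sketch &fa)
{
  iter_info_sketch info = {};
  info.annot = &fa.annot;
  return info;
}
```

The pointer avoids keeping several copies of the same four fields in sync,
at the cost of a lifetime dependency on the front-end structure.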

LGTM otherwise.
Thanks for the patch!


Re: [PATCH 1/2] RISC-V/testsuite: Widen coverage for pr105314.c

2024-01-12 Thread Kito Cheng
LGTM

On Fri, Jan 12, 2024 at 7:36 AM Maciej W. Rozycki  wrote:
>
> The optimization levels pr105314.c is iterated over are needlessly
> overridden with "-O2", limiting the coverage of the test case to that
> level, perhaps with additional options the original optimization level
> has been supplied with.  We could prevent the extra iterations other
> than "-O2" from being run, but the transformation made by if-conversion
> is also expected to happen at other optimization levels, so include them
> all, and also make sure no reverse-condition branch appears in output,
> moving the `dg-final' command to the bottom, as with most test cases.
>
> gcc/testsuite/
> * gcc.target/riscv/pr105314.c: Replace `dg-options' command with
> `dg-skip-if'.  Also reject "bne" with `dg-final'.
> ---
> Hi,
>
>  Technically it's not a single self-contained change and it could be 3
> instead, but I think there's little point in splitting it further.
>
>   Maciej
> ---
>  gcc/testsuite/gcc.target/riscv/pr105314.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> gcc-test-riscv-pr105314-levels.diff
> Index: gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/pr105314.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
> @@ -1,7 +1,6 @@
>  /* PR rtl-optimization/105314 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> -/* { dg-final { scan-assembler-not "\tbeq\t" } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
>
>  long
>  foo (long a, long b, long c)
> @@ -10,3 +9,5 @@ foo (long a, long b, long c)
>  a = 0;
>return a;
>  }
> +
> +/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
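
The effect of the tightened pattern in the quoted patch can be tried
outside DejaGnu with a small sketch.  It assumes that std::regex in its
default ECMAScript mode treats \s and (?:...) the same way as the Tcl
regex engine does for these inputs; the helper names and sample assembly
lines are made up.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Old check: only a tab-delimited "beq" is rejected.
static bool old_pattern_matches(const std::string &asm_text)
{
  return asm_text.find("\tbeq\t") != std::string::npos;
}

// New check: "beq" or "bne" delimited by any whitespace on both sides.
static bool new_pattern_matches(const std::string &asm_text)
{
  static const std::regex re(R"(\s(?:beq|bne)\s)");
  return std::regex_search(asm_text, re);
}
```

With these helpers a "bne" line is caught only by the new pattern, and a
mnemonic such as "bnez" is caught by neither, since the pattern requires
whitespace directly after "bne".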


Re: [PATCH 2/2] RISC-V/testsuite: Also verify if-conversion runs for pr105314.c

2024-01-12 Thread Kito Cheng
LGTM

On Fri, Jan 12, 2024 at 7:37 AM Maciej W. Rozycki  wrote:
>
> Verify that if-conversion succeeded through noce_try_store_flag_mask, as
> per PR rtl-optimization/105314, tightening the test case and making it
> explicit.
>
> gcc/testsuite/
> * gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.
> ---
>  gcc/testsuite/gcc.target/riscv/pr105314.c |2 ++
>  1 file changed, 2 insertions(+)
>
> gcc-test-riscv-pr105314-rtl.diff
> Index: gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/pr105314.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
> @@ -1,6 +1,7 @@
>  /* PR rtl-optimization/105314 */
>  /* { dg-do compile } */
>  /* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
> +/* { dg-options "-fdump-rtl-ce1" } */
>
>  long
>  foo (long a, long b, long c)
> @@ -10,4 +11,5 @@ foo (long a, long b, long c)
>return a;
>  }
>
> +/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through 
> noce_try_store_flag_mask" 1 "ce1" } } */
>  /* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */


[committed] testsuite: Fix up preprocessor conditions in bitint-31.c test

2024-01-12 Thread Jakub Jelinek
Hi!

Andre reported on IRC that this test has weird preprocessor conditions,
obviously the intent was to test whether corresponding __*_MANT_DIG__
is equal to the expected value like earlier in the function definitions,
but somehow I've ended up with a comma expression instead, which was
always true.
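
To spell out why the comma form was always true, here is a tiny sketch
using ordinary expressions instead of #if (whether a comma is accepted in
a preprocessor condition depends on the compiler and options, so the
sketch avoids it).  The function names are made up.

```cpp
#include <cassert>

// What the broken condition effectively computed: the left operand is
// evaluated and discarded, and the result is the right operand, 53,
// which is nonzero regardless of mant_dig.
static bool comma_condition(int mant_dig)
{
  return (static_cast<void>(mant_dig), 53) != 0;
}

// What was intended: an equality test against the expected precision.
static bool equality_condition(int mant_dig)
{
  return mant_dig == 53;
}
```

So with the comma the guarded block was compiled for any mantissa width,
whereas the equality test only enables it when the width is 53.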

Regtested on x86_64-linux and i686-linux, committed to trunk as obvious.

2024-01-12  Jakub Jelinek  

* gcc.dg/bitint-31.c: Fix up #if conditions checking whether
__*_MANT_DIG__ is equal to a particular precision.

--- gcc/testsuite/gcc.dg/bitint-31.c.jj 2023-11-09 09:04:19.208535017 +0100
+++ gcc/testsuite/gcc.dg/bitint-31.c    2024-01-11 16:15:26.552641335 +0100
@@ -280,7 +280,7 @@ main ()
   check_round (testfltu_575 
(123665200736552267030251260509823595017565674550605919957031528046448612553265933585158200530621522494798835713008069669675682517153375604983773077550946583958303386074349567uwb),
 __builtin_inff (), 0xffp+104f, __builtin_inff (), 0xffp+104f);
 #endif
 #endif
-#if __DBL_MANT_DIG__, 53
+#if __DBL_MANT_DIG__ == 53
 #if __BITINT_MAXWIDTH__ >= 135
   check_round (testdbl_135 (-21267647932558650424686050812251602943wb), 
-0x1ep+71, -0x1fp+71, -0x1ep+71, 
-0x1ep+71);
   check_round (testdbl_135 (-21267647932558650424686050812251602944wb), 
-0x1ep+71, -0x1fp+71, -0x1ep+71, 
-0x1ep+71);
@@ -360,7 +360,7 @@ main ()
   check_round (testdblu_575 
(123665200736552267030251260509823595017565674550605919957031528046448612553265933585158200530621522494798835713008069669675682517153375604983773077550946583958303386074349567uwb),
 0x20p+522, 0x1fp+522, 0x20p+522, 
0x1fp+522);
 #endif
 #endif
-#if __LDBL_MANT_DIG__, 64
+#if __LDBL_MANT_DIG__ == 64
 #if __BITINT_MAXWIDTH__ >= 135
   check_round (testldbl_135 (-27577662721237071616947187835994111wb), 
-0xa9f5e144d113e1c4p+51L, -0xa9f5e144d113e1c5p+51L, -0xa9f5e144d113e1c4p+51L, 
-0xa9f5e144d113e1c4p+51L);
   check_round (testldbl_135 (-27577662721237071616947187835994112wb), 
-0xa9f5e144d113e1c4p+51L, -0xa9f5e144d113e1c5p+51L, -0xa9f5e144d113e1c4p+51L, 
-0xa9f5e144d113e1c4p+51L);
@@ -426,7 +426,7 @@ main ()
   check_round (testldblu_575 
(123665200736552267030251260509823595017565674550605919957031528046448612553265933585158200530621522494798835713008069669675682517153375604983773077550946583958303386074349567uwb),
 0x1p+511L, 0xp+511L, 
0x1p+511L, 0xp+511L);
 #endif
 #endif
-#if __FLT128_MANT_DIG__, 113
+#if __FLT128_MANT_DIG__ == 113
 #if __BITINT_MAXWIDTH__ >= 135
   check_round (testflt128_135 (-21646332438261169091754659013488783917055wb), 
-0x1fce71fdcfb1797b42dede66ac9ecp+21F128, 
-0x1fce71fdcfb1797b42dede66ac9edp+21F128, 
-0x1fce71fdcfb1797b42dede66ac9ecp+21F128, 
-0x1fce71fdcfb1797b42dede66ac9ecp+21F128);
   check_round (testflt128_135 (-21646332438261169091754659013488783917056wb), 
-0x1fce71fdcfb1797b42dede66ac9ecp+21F128, 
-0x1fce71fdcfb1797b42dede66ac9edp+21F128, 
-0x1fce71fdcfb1797b42dede66ac9ecp+21F128, 
-0x1fce71fdcfb1797b42dede66ac9ecp+21F128);

Jakub



Re: [PATCH] libstdc++: Fix std::runtime_format deviations from the spec [PR113320]

2024-01-12 Thread Jonathan Wakely
On Fri, 12 Jan 2024 at 06:53, Daniel Krügler  wrote:
>
> Am Do., 11. Jan. 2024 um 21:23 Uhr schrieb Jonathan Wakely 
> :
> >
> > Tested x86_64-linux. Does this look better now?
>
> Yes, thank you.

Thanks, pushed.


[committed] libstdc++: Implement C++23 P1951R1 (Default Args for pair's Forwarding Ctor) [PR105505]

2024-01-12 Thread Jonathan Wakely
Tested aarch64-linux, pushed to trunk.

-- >8 --

This was approved for C++23 at the June 2021 virtual meeting.

This allows more efficient construction of std::pair members when {} is
used as a constructor argument. The perfect forwarding constructor can
be used, so that the member variables are constructed from forwarded
arguments instead of being copy constructed from temporaries.
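
The mechanism can be seen with a simplified stand-in for std::pair that
builds in any C++11 mode.  OldPair and NewPair are made-up names and omit
the constraints the real constructors have; Counter follows the shape of
the one in the new test.

```cpp
#include <cassert>
#include <utility>

// Counts how often an object was copied or moved on its way into a pair.
struct Counter
{
  Counter() = default;
  Counter(int) { }
  Counter(const Counter &c) : copies(c.copies + 1), moves(c.moves) { }
  Counter(Counter &&c) : copies(c.copies), moves(c.moves + 1) { }
  int copies = 0;
  int moves = 0;
};

// Without default template arguments: U2 cannot be deduced from {}, so
// a {} argument leaves only the const T& constructor, which copies.
template<typename T1, typename T2>
struct OldPair
{
  T1 first;
  T2 second;
  OldPair(const T1 &x, const T2 &y) : first(x), second(y) { }
  template<typename U1, typename U2>
  OldPair(U1 &&x, U2 &&y)
    : first(std::forward<U1>(x)), second(std::forward<U2>(y)) { }
};

// With default template arguments: {} makes U2 fall back to T2, so the
// forwarding constructor stays viable and the members are built from
// forwarded arguments.
template<typename T1, typename T2>
struct NewPair
{
  T1 first;
  T2 second;
  NewPair(const T1 &x, const T2 &y) : first(x), second(y) { }
  template<typename U1 = T1, typename U2 = T2>
  NewPair(U1 &&x, U2 &&y)
    : first(std::forward<U1>(x)), second(std::forward<U2>(y)) { }
};
```

Constructing OldPair<Counter, Counter>(1, {}) copies both members, while
NewPair<Counter, Counter>(1, {}) constructs first directly from the int
and move-constructs second.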

libstdc++-v3/ChangeLog:

PR libstdc++/105505
* include/bits/stl_pair.h (pair::pair(U1&&, U2&&)) [C++23]: Add
default template arguments, as per P1951R1.
* testsuite/20_util/pair/cons/default_tmpl_args.cc: New test.
---
 libstdc++-v3/include/bits/stl_pair.h  |  8 
 .../20_util/pair/cons/default_tmpl_args.cc| 48 +++
 2 files changed, 56 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/20_util/pair/cons/default_tmpl_args.cc

diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index ce8826ae6a8..52f532f3c39 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -308,7 +308,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { }
 
   /// Constructor accepting two values of arbitrary types
+#if __cplusplus > 202002L
+  template<typename _U1 = _T1, typename _U2 = _T2>
+#else
   template<typename _U1, typename _U2>
+#endif
requires (_S_constructible<_U1, _U2>()) && (!_S_dangles<_U1, _U2>())
constexpr explicit(!_S_convertible<_U1, _U2>())
pair(_U1&& __x, _U2&& __y)
@@ -316,7 +320,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: first(std::forward<_U1>(__x)), second(std::forward<_U2>(__y))
{ }
 
+#if __cplusplus > 202002L
+  template<typename _U1 = _T1, typename _U2 = _T2>
+#else
   template<typename _U1, typename _U2>
+#endif
requires (_S_constructible<_U1, _U2>()) && (_S_dangles<_U1, _U2>())
constexpr explicit(!_S_convertible<_U1, _U2>())
pair(_U1&&, _U2&&) = delete;
diff --git a/libstdc++-v3/testsuite/20_util/pair/cons/default_tmpl_args.cc 
b/libstdc++-v3/testsuite/20_util/pair/cons/default_tmpl_args.cc
new file mode 100644
index 000..5960bf72e21
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/pair/cons/default_tmpl_args.cc
@@ -0,0 +1,48 @@
+// { dg-do compile { target c++23 } }
+
+// P1951R1 Default Arguments for pair's Forwarding Constructor
+
+#include 
+#include 
+#include 
+#include 
+
+void
+test_p1951r1_example()
+{
+  std::pair<std::string, std::vector<std::string>> p("hello", {});
+}
+
+struct Counter
+{
+  constexpr Counter() = default;
+  constexpr Counter(int) { }
+  constexpr Counter(const Counter& c) : copies(c.copies + 1), moves(c.moves) { 
}
+  constexpr Counter(Counter&& c) : copies(c.copies), moves(c.moves+1) { }
+  int copies = 0;
+  int moves = 0;
+};
+
+constexpr bool
+test_count_copies()
+{
+  std::pair<Counter, Counter> p1(1, {});
+  VERIFY( p1.first.copies == 0 && p1.second.copies == 0 );
+  VERIFY( p1.first.moves == 0 && p1.second.moves == 1 );
+
+  std::pair<Counter, Counter> p2({}, 1);
+  VERIFY( p2.first.copies == 0 && p2.second.copies == 0 );
+  VERIFY( p2.first.moves == 1 && p2.second.moves == 0 );
+
+  std::pair<Counter, Counter> p3({}, {});
+  VERIFY( p3.first.copies == 0 && p3.second.copies == 0 );
+  VERIFY( p3.first.moves == 1 && p3.second.moves == 1 );
+
+  return true;
+}
+
+int main()
+{
+  test_p1951r1_example();
+  static_assert( test_count_copies() );
+}
-- 
2.43.0



Re: [PATCH] sra: Punt for too large _BitInt accesses [PR113330]

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> This is the case I was talking about in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642423.html
> and Zdenek kindly found a testcase for it.
> We can only create BITINT_TYPE with precision at most 65535, not 65536,
> so need to punt if we'd want to create it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-01-12  Jakub Jelinek  
> 
>   PR tree-optimization/113330
>   * tree-sra.cc (create_access): Punt for BITINT_TYPE accesses with
>   too large size.
> 
>   * gcc.dg/bitint-69.c: New test.
> 
> --- gcc/tree-sra.cc.jj    2024-01-10 12:45:54.293851670 +0100
> +++ gcc/tree-sra.cc   2024-01-11 15:13:29.697073438 +0100
> @@ -967,6 +967,12 @@ create_access (tree expr, gimple *stmt,
>disqualify_candidate (base, "Encountered an access beyond the base.");
>return NULL;
>  }
> +  if (TREE_CODE (TREE_TYPE (expr)) == BITINT_TYPE
> +  && size > WIDE_INT_MAX_PRECISION - 1)
> +{
> +  disqualify_candidate (base, "Encountered too large _BitInt access.");
> +  return NULL;
> +}
>  
>access = create_access_1 (base, offset, size);
>access->expr = expr;
> --- gcc/testsuite/gcc.dg/bitint-69.c.jj   2024-01-11 15:16:57.573140907 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-69.c  2024-01-12 09:55:30.026374627 +0100
> @@ -0,0 +1,25 @@
> +/* PR tree-optimization/113330 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-require-stack-check "generic" } */
> +/* { dg-options "-std=c23 -O --param=large-stack-frame=131072 
> -fstack-check=generic --param=sccvn-max-alias-queries-per-access=0" } */
> +
> +_BitInt(8) a;
> +
> +static inline __attribute__((__always_inline__)) void
> +bar (int, int, int, int, int, int, int, int)
> +{
> +#if __BITINT_MAXWIDTH__ >= 65535
> +  _BitInt(65535) b = 0;
> +  _BitInt(383) c = 0;
> +#else
> +  _BitInt(63) b = 0;
> +  _BitInt(39) c = 0;
> +#endif
> +  a = b;
> +}
> +
> +void
> +foo (void)
> +{
> +  bar (0, 0, 0, 0, 0, 0, 0, 0);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] lower-bitint: Fix up handling of uninitialized large/huge _BitInt call arguments [PR113316]

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> The code to assign large/huge _BitInt SSA_NAMEs to partitions intentionally
> ignores uninitialized SSA_NAMEs:
>   /* Also ignore uninitialized uses.  */
>   if (SSA_NAME_IS_DEFAULT_DEF (s)
>   && (!SSA_NAME_VAR (s) || VAR_P (SSA_NAME_VAR (s))))
> continue;
> because there is no need to store them into memory, all we need is when
> trying to extract some limb from them use uninitialized SSA_NAME for the
> limb.
> 
> The following testcase shows this is a problem for call arguments though,
> for those we need to create replacement SSA_NAMEs which are loaded from
> the underlying variable.  For uninitialized SSA_NAMEs because we didn't
> create underlying variable for them var_to_partition doesn't work, the
> following patch handles it by just creating an uninitialized replacement
> SSA_NAME.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-01-12  Jakub Jelinek  
> 
>   PR tree-optimization/113316
>   * gimple-lower-bitint.cc (bitint_large_huge::lower_call): Handle
>   uninitialized large/huge _BitInt arguments to calls.
> 
>   * gcc.dg/bitint-67.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-11 11:46:49.147779946 +0100
> +++ gcc/gimple-lower-bitint.cc    2024-01-11 13:52:46.106182653 +0100
> @@ -5118,14 +5118,23 @@ bitint_large_huge::lower_call (tree obj,
> || TREE_CODE (TREE_TYPE (arg)) != BITINT_TYPE
> || bitint_precision_kind (TREE_TYPE (arg)) <= bitint_prec_middle)
>   continue;
> -  int p = var_to_partition (m_map, arg);
> -  tree v = m_vars[p];
> -  gcc_assert (v != NULL_TREE);
> -  if (!types_compatible_p (TREE_TYPE (arg), TREE_TYPE (v)))
> - v = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg), v);
> -  arg = make_ssa_name (TREE_TYPE (arg));
> -  gimple *g = gimple_build_assign (arg, v);
> -  gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> +  if (SSA_NAME_IS_DEFAULT_DEF (arg)
> +   && (!SSA_NAME_VAR (arg) || VAR_P (SSA_NAME_VAR (arg))))
> + {
> +   tree var = create_tmp_reg (TREE_TYPE (arg));
> +   arg = get_or_create_ssa_default_def (cfun, var);
> + }
> +  else
> + {
> +   int p = var_to_partition (m_map, arg);
> +   tree v = m_vars[p];
> +   gcc_assert (v != NULL_TREE);
> +   if (!types_compatible_p (TREE_TYPE (arg), TREE_TYPE (v)))
> + v = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg), v);
> +   arg = make_ssa_name (TREE_TYPE (arg));
> +   gimple *g = gimple_build_assign (arg, v);
> +   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> + }
>gimple_call_set_arg (stmt, i, arg);
>if (m_preserved == NULL)
>   m_preserved = BITMAP_ALLOC (NULL);
> --- gcc/testsuite/gcc.dg/bitint-67.c.jj   2024-01-11 13:54:14.561936065 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-67.c  2024-01-11 13:55:09.474162196 +0100
> @@ -0,0 +1,12 @@
> +/* PR tree-optimization/113316 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-std=c23 -O2 -w" } */
> +
> +void bar (_BitInt(535) y);
> +
> +void
> +foo (void)
> +{
> +  _BitInt(535) y;
> +  bar (y);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] lower-bitint: Fix a typo in a condition [PR113323]

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase revealed a typo in condition, as the comment
> says the intent is
>/*  If lhs of stmt is large/huge _BitInt SSA_NAME not in m_names
>it means it will be handled in a loop or straight line code
>at the location of its (ultimate) immediate use, so for
>vop checking purposes check these only at the ultimate
>immediate use.  */
> but the condition was using != BITINT_TYPE rather than == BITINT_TYPE,
> so e.g. it used bitint_precision_kind on non-BITINT_TYPEs (e.g. on vector
> types it will crash because TYPE_PRECISION means something different there,
> or on say INTEGER_TYPEs the precision will never be large enough to be
> >= bitint_prec_large).
> 
> The following patch fixes that, bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

OK

> 2024-01-12  Jakub Jelinek  
> 
>   PR tree-optimization/113323
>   * gimple-lower-bitint.cc (bitint_dom_walker::before_dom_children): Fix
>   check for lhs being large/huge _BitInt not in m_names.
> 
>   * gcc.dg/bitint-68.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-11 13:52:46.0 +0100
> +++ gcc/gimple-lower-bitint.cc    2024-01-11 14:27:26.011875196 +0100
> @@ -5513,7 +5513,7 @@ bitint_dom_walker::before_dom_children (
>tree lhs = gimple_get_lhs (stmt);
>if (lhs
> && TREE_CODE (lhs) == SSA_NAME
> -   && TREE_CODE (TREE_TYPE (lhs)) != BITINT_TYPE
> +   && TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
> && bitint_precision_kind (TREE_TYPE (lhs)) >= bitint_prec_large
> && !bitmap_bit_p (m_names, SSA_NAME_VERSION (lhs)))
>   /* If lhs of stmt is large/huge _BitInt SSA_NAME not in m_names,
> --- gcc/testsuite/gcc.dg/bitint-68.c.jj   2024-01-11 14:41:21.237183889 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-68.c  2024-01-11 14:40:35.977814727 +0100
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/113323 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +typedef long __attribute__((__vector_size__ (16))) V;
> +V u, v;
> +_BitInt(535) i;
> +
> +void
> +foo (void)
> +{
> +  while (i)
> +u = v;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] lower-bitint: Fix handling of casts on arches with abi_limb_mode != limb_mode

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> > So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> > BITINT_TYPEs to determine if it can be handled cheaply.
> > On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
> > if TYPE_SIZE is equal, it means it has the same number of limbs.
> > But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
> > both are 256-bit storage, but because DImode is used as limb_mode, the
> > former actually needs just 3 limbs, while the latter needs 4 limbs.
> > And limb_access_type was asserting that we don't try to access 4th limb
> > on types which actually have a precision which needs just 3 limbs.
> > 
> > The following patch (so far tested on x86_64 with all the bitint tests plus
> > on the bitint-7.c testcase in a cross to aarch64) should fix that.
> > 
> > Note, for the info.extended targets (currently none, but I think arm 32-bit
> > in the ABI is meant like that), we'll need to do something different,
> > because the upper bits aren't just padding and should be zero/sign extended,
> > so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
> > treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
> > I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
> > 
> > 2024-01-11  Jakub Jelinek  
> > 
> > * gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> > TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> > CEIL (TYPE_PRECISION (t), limb_prec).
> > (bitint_large_huge::handle_cast): Likewise.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK
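
For reference, the limb arithmetic from the quoted description can be
checked with a small standalone sketch.  The 64-bit limb width follows
from DImode being the limb_mode; the 128-bit storage granularity is an
assumption chosen to match the stated 256-bit storage of both types, and
the helper names are made up.

```cpp
#include <cassert>

// Rounds up, in the manner of GCC's CEIL macro.
static unsigned ceil_div(unsigned x, unsigned y)
{
  assert(y != 0);
  return (x + y - 1) / y;
}

// Assumed parameters for the aarch64 case: 64-bit limbs and storage
// rounded up to a multiple of 128 bits.
static const unsigned limb_prec = 64;
static const unsigned abi_limb_prec = 128;

// What comparing TYPE_SIZE looked at: the padded storage size in bits.
static unsigned storage_bits(unsigned prec)
{
  return ceil_div(prec, abi_limb_prec) * abi_limb_prec;
}

// What the patch compares instead: the number of limbs actually needed.
static unsigned limb_count(unsigned prec)
{
  return ceil_div(prec, limb_prec);
}
```

Under these assumptions _BitInt(135) and _BitInt(193) have the same
storage size but need 3 and 4 limbs respectively, which is why equal
TYPE_SIZE did not imply an equal number of limbs.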

> > --- gcc/gimple-lower-bitint.cc.jj   2024-01-08 13:58:21.448176859 +0100
> > +++ gcc/gimple-lower-bitint.cc  2024-01-11 11:46:49.147779946 +0100
> > @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
> > && TREE_CODE (rhs_type) == BITINT_TYPE
> > && bitint_precision_kind (lhs_type) >= bitint_prec_large
> > && bitint_precision_kind (rhs_type) >= bitint_prec_large
> > -   && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> > +   && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > +   == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
> >   {
> > if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
> >   return true;
> > @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
> >  if m_upwards_2limb * limb_prec is equal to
> >  lhs precision that is not the case.  */
> >   || (!m_var_msb
> > - && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> > -TYPE_SIZE (lhs_type))
> > + && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > + == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
> >   && (!m_upwards_2limb
> >   || (m_upwards_2limb * limb_prec
> >   < TYPE_PRECISION (lhs_type)
> > 
> > Jakub
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[committed] libstdc++: Fix incorrect PR number in comment

2024-01-12 Thread Jonathan Wakely
Tested x86_64-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/format (__format::_Arg_store): Fix PR number in
comment. Simplify preprocessor code.
---
 libstdc++-v3/include/std/format | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index b3b5a0bbdbc..7440a25ea97 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3652,12 +3652,9 @@ namespace __format
   friend std::basic_format_args<_Context>;
 
   template
-   friend auto
+   friend auto std::
 #if _GLIBCXX_INLINE_VERSION
-   // Needed for PR c++/59526
-   std::__8::
-#else
-   std::
+   __8:: // Needed for PR c++/59256
 #endif
make_format_args(_Argz&...) noexcept;
 
-- 
2.43.0



[PATCH] sra: Punt for too large _BitInt accesses [PR113330]

2024-01-12 Thread Jakub Jelinek
Hi!

This is the case I was talking about in
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642423.html
and Zdenek kindly found a testcase for it.
We can only create BITINT_TYPE with precision at most 65535, not 65536,
so need to punt if we'd want to create it.
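
A rough sketch of the size arithmetic behind this, under the assumption
(not stated above) that the storage of a large _BitInt is rounded up to
whole 64-bit limbs, so an access covering a whole _BitInt(65535) object
is 65536 bits wide.  The helper names are made up.

```cpp
#include <cassert>

// Largest precision a BITINT_TYPE can have, per the message above.
static const unsigned long max_bitint_precision = 65535;

// Assumption: storage is rounded up to a whole number of 64-bit limbs.
static unsigned long storage_bits(unsigned long precision)
{
  const unsigned long limb_bits = 64;
  return (precision + limb_bits - 1) / limb_bits * limb_bits;
}

// Mirrors the shape of the new check: punt when the access is wider than
// any BITINT_TYPE that could be created for it.
static bool must_punt(unsigned long access_size_bits)
{
  return access_size_bits > max_bitint_precision;
}
```

In this model the whole-object access to a _BitInt(65535) is rejected,
while smaller types such as _BitInt(383) are unaffected.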

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-12  Jakub Jelinek  

PR tree-optimization/113330
* tree-sra.cc (create_access): Punt for BITINT_TYPE accesses with
too large size.

* gcc.dg/bitint-69.c: New test.

--- gcc/tree-sra.cc.jj  2024-01-10 12:45:54.293851670 +0100
+++ gcc/tree-sra.cc 2024-01-11 15:13:29.697073438 +0100
@@ -967,6 +967,12 @@ create_access (tree expr, gimple *stmt,
   disqualify_candidate (base, "Encountered an access beyond the base.");
   return NULL;
 }
+  if (TREE_CODE (TREE_TYPE (expr)) == BITINT_TYPE
+  && size > WIDE_INT_MAX_PRECISION - 1)
+{
+  disqualify_candidate (base, "Encountered too large _BitInt access.");
+  return NULL;
+}
 
   access = create_access_1 (base, offset, size);
   access->expr = expr;
--- gcc/testsuite/gcc.dg/bitint-69.c.jj 2024-01-11 15:16:57.573140907 +0100
+++ gcc/testsuite/gcc.dg/bitint-69.c    2024-01-12 09:55:30.026374627 +0100
@@ -0,0 +1,25 @@
+/* PR tree-optimization/113330 */
+/* { dg-do compile { target bitint } } */
+/* { dg-require-stack-check "generic" } */
+/* { dg-options "-std=c23 -O --param=large-stack-frame=131072 -fstack-check=generic --param=sccvn-max-alias-queries-per-access=0" } */
+
+_BitInt(8) a;
+
+static inline __attribute__((__always_inline__)) void
+bar (int, int, int, int, int, int, int, int)
+{
+#if __BITINT_MAXWIDTH__ >= 65535
+  _BitInt(65535) b = 0;
+  _BitInt(383) c = 0;
+#else
+  _BitInt(63) b = 0;
+  _BitInt(39) c = 0;
+#endif
+  a = b;
+}
+
+void
+foo (void)
+{
+  bar (0, 0, 0, 0, 0, 0, 0, 0);
+}

Jakub



[PATCH] lower-bitint: Fix a typo in a condition [PR113323]

2024-01-12 Thread Jakub Jelinek
Hi!

The following testcase revealed a typo in a condition, as the comment
says the intent is
   /*  If lhs of stmt is large/huge _BitInt SSA_NAME not in m_names
   it means it will be handled in a loop or straight line code
   at the location of its (ultimate) immediate use, so for
   vop checking purposes check these only at the ultimate
   immediate use.  */
but the condition was using != BITINT_TYPE rather than == BITINT_TYPE,
so e.g. it used bitint_precision_kind on non-BITINT_TYPEs (e.g. on vector
types it will crash because TYPE_PRECISION means something different there,
or on say INTEGER_TYPEs the precision will never be large enough to be
>= bitint_prec_large).
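
The intended shape of the check can be sketched with a toy type model.
Everything here (the enum, the struct, LARGE_MIN_PREC) is hypothetical and
only illustrates why the type-code test has to come first: the precision
field is only meaningful once we know the type is a _BitInt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for tree codes; not GCC's real enumerators.  */
enum type_code { INTEGER_T, VECTOR_T, BITINT_T };

struct type
{
  enum type_code code;
  /* For VECTOR_T this field is assumed to encode something other than
     a bit precision, as the mail says TYPE_PRECISION does for vectors.  */
  unsigned precision;
};

/* Hypothetical threshold from which a _BitInt counts as large/huge.  */
enum { LARGE_MIN_PREC = 129 };

/* Only look at the precision after checking the type code, so that
   non-_BitInt types are never classified by a field that means
   something else for them.  */
static bool
is_large_bitint (const struct type *t)
{
  return t->code == BITINT_T && t->precision >= LARGE_MIN_PREC;
}
```

With the inverted test (!= instead of ==) the second operand would have
been evaluated exactly for the types it must not be applied to.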

The following patch fixes that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2024-01-12  Jakub Jelinek  

PR tree-optimization/113323
* gimple-lower-bitint.cc (bitint_dom_walker::before_dom_children): Fix
check for lhs being large/huge _BitInt not in m_names.

* gcc.dg/bitint-68.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-11 13:52:46.0 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-11 14:27:26.011875196 +0100
@@ -5513,7 +5513,7 @@ bitint_dom_walker::before_dom_children (
   tree lhs = gimple_get_lhs (stmt);
   if (lhs
  && TREE_CODE (lhs) == SSA_NAME
- && TREE_CODE (TREE_TYPE (lhs)) != BITINT_TYPE
+ && TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
  && bitint_precision_kind (TREE_TYPE (lhs)) >= bitint_prec_large
  && !bitmap_bit_p (m_names, SSA_NAME_VERSION (lhs)))
/* If lhs of stmt is large/huge _BitInt SSA_NAME not in m_names,
--- gcc/testsuite/gcc.dg/bitint-68.c.jj 2024-01-11 14:41:21.237183889 +0100
+++ gcc/testsuite/gcc.dg/bitint-68.c 2024-01-11 14:40:35.977814727 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/113323 */
+/* { dg-do compile { target bitint575 } } */
+/* { dg-options "-std=c23 -O2" } */
+
+typedef long __attribute__((__vector_size__ (16))) V;
+V u, v;
+_BitInt(535) i;
+
+void
+foo (void)
+{
+  while (i)
+u = v;
+}

Jakub



[PATCH] lower-bitint: Fix up handling of uninitialized large/huge _BitInt call arguments [PR113316]

2024-01-12 Thread Jakub Jelinek
Hi!

The code to assign large/huge _BitInt SSA_NAMEs to partitions intentionally
ignores uninitialized SSA_NAMEs:
  /* Also ignore uninitialized uses.  */
  if (SSA_NAME_IS_DEFAULT_DEF (s)
  && (!SSA_NAME_VAR (s) || VAR_P (SSA_NAME_VAR (s))))
continue;
because there is no need to store them into memory; all we need, when
trying to extract some limb from them, is to use an uninitialized SSA_NAME
for the limb.

The following testcase shows this is a problem for call arguments, though:
for those we need to create replacement SSA_NAMEs which are loaded from
the underlying variable.  For uninitialized SSA_NAMEs, var_to_partition
doesn't work because we didn't create an underlying variable for them, so
the following patch handles them by just creating an uninitialized
replacement SSA_NAME.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-12  Jakub Jelinek  

PR tree-optimization/113316
* gimple-lower-bitint.cc (bitint_large_huge::lower_call): Handle
uninitialized large/huge _BitInt arguments to calls.

* gcc.dg/bitint-67.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-11 11:46:49.147779946 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-11 13:52:46.106182653 +0100
@@ -5118,14 +5118,23 @@ bitint_large_huge::lower_call (tree obj,
  || TREE_CODE (TREE_TYPE (arg)) != BITINT_TYPE
  || bitint_precision_kind (TREE_TYPE (arg)) <= bitint_prec_middle)
continue;
-  int p = var_to_partition (m_map, arg);
-  tree v = m_vars[p];
-  gcc_assert (v != NULL_TREE);
-  if (!types_compatible_p (TREE_TYPE (arg), TREE_TYPE (v)))
-   v = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg), v);
-  arg = make_ssa_name (TREE_TYPE (arg));
-  gimple *g = gimple_build_assign (arg, v);
-  gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+  if (SSA_NAME_IS_DEFAULT_DEF (arg)
+ && (!SSA_NAME_VAR (arg) || VAR_P (SSA_NAME_VAR (arg))))
+   {
+ tree var = create_tmp_reg (TREE_TYPE (arg));
+ arg = get_or_create_ssa_default_def (cfun, var);
+   }
+  else
+   {
+ int p = var_to_partition (m_map, arg);
+ tree v = m_vars[p];
+ gcc_assert (v != NULL_TREE);
+ if (!types_compatible_p (TREE_TYPE (arg), TREE_TYPE (v)))
+   v = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg), v);
+ arg = make_ssa_name (TREE_TYPE (arg));
+ gimple *g = gimple_build_assign (arg, v);
+ gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+   }
   gimple_call_set_arg (stmt, i, arg);
   if (m_preserved == NULL)
m_preserved = BITMAP_ALLOC (NULL);
--- gcc/testsuite/gcc.dg/bitint-67.c.jj 2024-01-11 13:54:14.561936065 +0100
+++ gcc/testsuite/gcc.dg/bitint-67.c 2024-01-11 13:55:09.474162196 +0100
@@ -0,0 +1,12 @@
+/* PR tree-optimization/113316 */
+/* { dg-do compile { target bitint575 } } */
+/* { dg-options "-std=c23 -O2 -w" } */
+
+void bar (_BitInt(535) y);
+
+void
+foo (void)
+{
+  _BitInt(535) y;
+  bar (y);
+}

Jakub



[PATCH] c: Avoid _BitInt indexes > sizetype in ARRAY_REFs [PR113315]

2024-01-12 Thread Jakub Jelinek
Hi!

When build_array_ref doesn't use ARRAY_REF, it casts the index to sizetype
already, performs POINTER_PLUS_EXPR and then dereferences.
When emitting ARRAY_REF, on the other hand, we try to keep the index
expression as is in whatever type it had, which is reasonable e.g. for signed
or unsigned types narrower than sizetype, for loop optimizations etc.
But if the index is wider than sizetype, we are unnecessarily computing
bits beyond what is needed.  For {,unsigned }__int128 on 64-bit arches
or {,unsigned }long long on 32-bit arches we've been doing that for decades,
so the following patch doesn't propose to change that (it might be stage1
material).  For _BitInt, though, the _BitInt lowering code doesn't expect
to see large/huge _BitInt in the ARRAY_REF indexes; I was expecting one
would see just casts of those to sizetype.

So, the following patch makes sure that large/huge _BitInt indexes don't
appear in ARRAY_REFs.
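
What converting a too-wide index to sizetype means can be sketched with a
hand-rolled two-limb integer.  This is not the c-typeck.cc change itself;
struct wide_index, to_sizetype and load are hypothetical names, and the
sketch assumes size_t is no wider than 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 128-bit index stored as two 64-bit limbs, least significant first.
   This stands in for an index type wider than sizetype.  */
struct wide_index
{
  uint64_t limb[2];
};

/* Conversion to the size type keeps only the low-order bits; the upper
   limb never has to be computed for the purpose of indexing.  */
static size_t
to_sizetype (struct wide_index i)
{
  return (size_t) i.limb[0];
}

/* Index A with the converted value, as an ARRAY_REF with a sizetype
   index would.  */
static int
load (const int *a, struct wide_index i)
{
  return a[to_sizetype (i)];
}
```

The point of the sketch is only that the bits above sizetype precision
play no role in the address computation.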

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-12  Jakub Jelinek  

PR c/113315
* c-typeck.cc (build_array_ref): If index has BITINT_TYPE type with
precision larger than sizetype precision, convert it to sizetype.

* gcc.dg/bitint-65.c: New test.
* gcc.dg/bitint-66.c: New test.

--- gcc/c/c-typeck.cc.jj 2024-01-03 12:06:52.940863462 +0100
+++ gcc/c/c-typeck.cc   2024-01-11 12:55:05.457899186 +0100
@@ -2858,6 +2858,10 @@ build_array_ref (location_t loc, tree ar
 "array");
}
 
+  if (TREE_CODE (TREE_TYPE (index)) == BITINT_TYPE
+ && TYPE_PRECISION (TREE_TYPE (index)) > TYPE_PRECISION (sizetype))
+   index = fold_convert (sizetype, index);
+
   type = TREE_TYPE (TREE_TYPE (array));
   rval = build4 (ARRAY_REF, type, array, index, NULL_TREE, NULL_TREE);
   /* Array ref is const/volatile if the array elements are
--- gcc/testsuite/gcc.dg/bitint-65.c.jj 2024-01-11 13:03:27.843827305 +0100
+++ gcc/testsuite/gcc.dg/bitint-65.c 2024-01-11 13:03:04.807154223 +0100
@@ -0,0 +1,23 @@
+/* PR c/113315 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23" } */
+
+#if __BITINT_MAXWIDTH__ >= 535
+_BitInt(535) x;
+#else
+_BitInt(64) x;
+#endif
+extern int a[];
+extern char b[][10];
+
+int
+foo (void)
+{
+  return a[x];
+}
+
+int
+bar (void)
+{
+  return __builtin_strlen (b[x]);
+}
--- gcc/testsuite/gcc.dg/bitint-66.c.jj 2024-01-11 13:08:40.561399890 +0100
+++ gcc/testsuite/gcc.dg/bitint-66.c 2024-01-11 13:09:07.512019458 +0100
@@ -0,0 +1,12 @@
+/* PR c/113315 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -O2" } */
+
+extern int a[5];
+
+int
+foo (void)
+{
+  _BitInt(535) i = 1;
+  return a[i];
+}

Jakub



[PATCH] lower-bitint: Fix handling of casts on arches with abi_limb_mode != limb_mode

2024-01-12 Thread Jakub Jelinek
On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> BITINT_TYPEs to determine if it can be handled cheaply.
> On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
> if TYPE_SIZE is equal, it means it has the same number of limbs.
> But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
> both are 256-bit storage, but because DImode is used as limb_mode, the
> former actually needs just 3 limbs, while the latter needs 4 limbs.
> And limb_access_type was asserting that we don't try to access 4th limb
> on types which actually have a precision which needs just 3 limbs.
> 
> The following patch (so far tested on x86_64 with all the bitint tests plus
> on the bitint-7.c testcase in a cross to aarch64) should fix that.
> 
> Note, for the info.extended targets (currently none, but I think arm 32-bit
> in the ABI is meant like that), we'll need to do something different,
> because the upper bits aren't just padding and should be zero/sign extended,
> so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
> treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
> I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
> 
> 2024-01-11  Jakub Jelinek  
> 
>   * gimple-lower-bitint.cc (mergeable_op): Instead of comparing
>   TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
>   CEIL (TYPE_PRECISION (t), limb_prec).
>   (bitint_large_huge::handle_cast): Likewise.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
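
The arithmetic behind the quoted explanation can be sketched as follows.
ceil_div, limbs_needed and storage_bits are hypothetical helpers, and the
128-bit ABI unit used in the usage note is an assumption chosen so that
the numbers match the "256-bit storage" mentioned above.

```c
#include <assert.h>

/* Round-up division, in the spirit of the CEIL used in the patch.  */
static unsigned
ceil_div (unsigned a, unsigned b)
{
  return (a + b - 1) / b;
}

/* Number of limbs needed to hold PREC bits with LIMB_PREC-bit limbs.  */
static unsigned
limbs_needed (unsigned prec, unsigned limb_prec)
{
  return ceil_div (prec, limb_prec);
}

/* Storage size in bits when the ABI rounds PREC up to whole
   ABI_LIMB_PREC-bit units.  */
static unsigned
storage_bits (unsigned prec, unsigned abi_limb_prec)
{
  return ceil_div (prec, abi_limb_prec) * abi_limb_prec;
}
```

With 128-bit ABI units and 64-bit limbs, storage_bits gives 256 for both
135 and 193 bits, while limbs_needed gives 3 and 4 respectively, so equal
storage size does not imply an equal number of limbs.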

> --- gcc/gimple-lower-bitint.cc.jj 2024-01-08 13:58:21.448176859 +0100
> +++ gcc/gimple-lower-bitint.cc 2024-01-11 11:46:49.147779946 +0100
> @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
>   && TREE_CODE (rhs_type) == BITINT_TYPE
>   && bitint_precision_kind (lhs_type) >= bitint_prec_large
>   && bitint_precision_kind (rhs_type) >= bitint_prec_large
> - && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> + && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> + == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
> {
>   if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
> return true;
> @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
>if m_upwards_2limb * limb_prec is equal to
>lhs precision that is not the case.  */
> || (!m_var_msb
> -   && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> -  TYPE_SIZE (lhs_type))
> +   && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> +   == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
> && (!m_upwards_2limb
> || (m_upwards_2limb * limb_prec
> < TYPE_PRECISION (lhs_type)
> 
>   Jakub

Jakub



Re: [PATCH] c++: reject packs on xobj params. [PR113307]

2024-01-12 Thread Jakub Jelinek
On Fri, Jan 12, 2024 at 07:40:19AM +, waffl3x wrote:
> Bootstrapped and tested on x86_64-linux with no regressions.
> 
> I'm still getting used to things so let me know if the change log
> entries are excessive, thanks.

> From 9dc168e7bcbbd7d515fa28cb9cae28ec113fae0f Mon Sep 17 00:00:00 2001
> From: Waffl3x 
> Date: Thu, 11 Jan 2024 14:32:46 -0700
> Subject: [PATCH] c++: reject packs on xobj params. [PR113307]
> 
> Reject and diagnose xobj parameters declared as parameter packs.
> 
>   PR c++/113307
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_parameter_declaration): Reject packs
>   on xobj params.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp23/explicit-obj-diagnostics3.C: Add test for
>   rejection of packs.
> 
> Signed-off-by: Waffl3x 
> ---
>  gcc/cp/parser.cc  |  21 +++-
>  .../g++.dg/cpp23/explicit-obj-diagnostics3.C  | 106 +-
>  2 files changed, 125 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 8ab98cc0c23..70fbba09bf8 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -25706,6 +25706,25 @@ cp_parser_parameter_declaration (cp_parser *parser,
>   for a C-style variadic function. */
>token = cp_lexer_peek_token (parser->lexer);
>  
> +  bool const xobj_param_p
> += decl_spec_seq_has_spec_p (&decl_specifiers, ds_this);
> +
> +  if (xobj_param_p
> +  && ((declarator && declarator->parameter_pack_p)
> +   || cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)))
> +{
> +  location_t xobj_param
> + = make_location (decl_specifiers.locations[ds_this],
> +  decl_spec_token_start->location,
> +  input_location);
> +  error_at(xobj_param,
> +"an explicit object parameter cannot "
> +"be a function parameter pack");

Formatting: there should be a space before ( and the following 2 lines
should be indented accordingly.

Will defer to Jason for the rest.

Jakub



[PATCH V3] RISC-V: Adjust scalar_to_vec cost

2024-01-12 Thread Juzhe-Zhong
1. Introduce new tune info for vector register moves.
2. Adjust scalar_to_vec cost in add_stmt_cost.

With this patch we get optimal codegen with -march=rv64gcv_zvl256b:

lui a5,%hi(a)
li  a4,19
sb  a4,%lo(a)(a5)
li  a0,0
ret

Tested on both RV32 and RV64 with no regressions.  OK for trunk?
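
The cost adjustment in point 2 can be sketched outside of GCC roughly as
follows.  The names below are hypothetical simplifications of the structs
in the patch, and the base cost of 1 used in the usage note is an
assumption for illustration; the register move costs of 2 are the values
the patch puts into rvv_regmove_vector_cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for vect_cost_for_stmt.  */
enum cost_kind { SCALAR_TO_VEC, OTHER_STMT };

/* Simplified stand-in for regmove_vector_cost: cost of moving one
   general-purpose or floating-point register into a vector register.  */
struct regmove_cost
{
  int gr2vr;
  int fr2vr;
};

/* For scalar_to_vec, add the register move cost that matches the
   element type on top of BASE; leave every other kind untouched.  */
static int
adjust_cost (enum cost_kind kind, bool is_float, int base,
	     const struct regmove_cost *rm)
{
  if (kind == SCALAR_TO_VEC)
    return base + (is_float ? rm->fr2vr : rm->gr2vr);
  return base;
}
```

With both move costs set to 2 and a base cost of 1, a scalar_to_vec
statement is costed at 3, while other statement kinds keep their base
cost.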

PR target/113281

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct regmove_vector_cost): New struct.
(struct cpu_vector_cost): Add regmove struct.
(get_vector_costs): Export as global.
* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Adjust 
scalar_to_vec cost.
(costs::add_stmt_cost): Ditto.
* config/riscv/riscv.cc (get_common_costs): Export global function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113209.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-2.c: New test.

---
 gcc/config/riscv/riscv-protos.h   | 11 
 gcc/config/riscv/riscv-vector-costs.cc| 23 +
 gcc/config/riscv/riscv.cc | 25 ---
 .../vect/costmodel/riscv/rvv/pr113281-1.c | 18 +
 .../vect/costmodel/riscv/rvv/pr113281-2.c | 18 +
 .../gcc.target/riscv/rvv/autovec/pr113209.c   |  2 +-
 6 files changed, 87 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-2.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e8c54c5be50..4f3b677f4f9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -250,6 +250,13 @@ struct scalable_vector_cost : common_vector_cost
  E.g. fold_left reduction cost, lanes load/store cost, ..., etc.  */
 };
 
+/* Additional costs for register copies.  Cost is for one register.  */
+struct regmove_vector_cost
+{
+  const int GR2VR;
+  const int FR2VR;
+};
+
 /* Cost for vector insn classes.  */
 struct cpu_vector_cost
 {
@@ -276,6 +283,9 @@ struct cpu_vector_cost
 
   /* Cost of an VLA modes operations.  */
   const scalable_vector_cost *vla;
+
+  /* Cost of vector register move operations.  */
+  const regmove_vector_cost *regmove;
 };
 
 /* Routines implemented in riscv-selftests.cc.  */
@@ -764,5 +774,6 @@ struct riscv_tune_info {
 
 const struct riscv_tune_info *
 riscv_parse_tune (const char *, bool);
+const cpu_vector_cost *get_vector_costs ();
 
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 58ec0b9b503..1c3708f23a0 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1055,6 +1055,26 @@ costs::better_main_loop_than_p (const vector_costs *uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
 }
 
+/* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
+   For some statement, we would like to further fine-grain tweak the cost on
+   top of riscv_builtin_vectorization_cost handling which doesn't have any
+   information on statement operation codes etc.  */
+
+static unsigned
+adjust_stmt_cost (enum vect_cost_for_stmt kind, tree vectype, int stmt_cost)
+{
+  const cpu_vector_cost *costs = get_vector_costs ();
+  switch (kind)
+{
+case scalar_to_vec:
+  return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->FR2VR
+ : costs->regmove->GR2VR);
+default:
+  break;
+}
+  return stmt_cost;
+}
+
 unsigned
 costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
  stmt_vec_info stmt_info, slp_tree, tree vectype,
@@ -1082,6 +1102,9 @@ costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 as one iteration of the VLA loop.  */
   if (where == vect_body && m_unrolled_vls_niters)
m_unrolled_vls_stmts += count * m_unrolled_vls_niters;
+
+  if (vectype)
+   stmt_cost = adjust_stmt_cost (kind, vectype, stmt_cost);
 }
 
   return record_stmt_cost (stmt_info, where, count * stmt_cost);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f829014a589..ee1a57b321d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -391,17 +391,24 @@ static const scalable_vector_cost rvv_vla_vector_cost = {
   },
 };
 
+/* RVV register move cost.   */
+static const regmove_vector_cost rvv_regmove_vector_cost = {
+  2, /* GR2VR  */
+  2, /* FR2VR  */
+};
+
 /* Generic costs for vector insn classes.  It is supposed to be the vector cost
models used by default if no other cost model was specified.  */
 static const struct cpu_vector_cost generic_vector_cost = {
-  1,   /* scalar_int_stmt_cost  */
-  1,   /* scalar_fp_stmt_cost  */
-  1,   /* 

Re: [PATCH]middle-end: remove more usages of single_exit

2024-01-12 Thread Richard Biener
On Fri, 12 Jan 2024, Tamar Christina wrote:

> Hi All,
> 
> This replaces two more usages of single_exit that I had missed before.
> They both seem to happen when we re-use the ifcvt scalar loop for versioning.
> 
> The condition in versioning is the same as the one for when we don't re-use 
> the
> scalar loop.
> 
> I hit these during an LTO enabled bootstrap now.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu with lto enabled and no 
> issues.
> 
> Ok for master?

OK

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_loop_versioning): Replace single_exit.
>   * tree-vect-loop.cc (vect_transform_loop): Likewise.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 0931b18404856f6c33dcae1ffa8d5a350dbd0f8f..0d8c90f69e9693d5d25095e799fbc17a9910779b
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -4051,7 +4051,16 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>basic_block preheader = loop_preheader_edge (loop_to_version)->src;
>preheader->count = preheader->count.apply_probability (prob * prob2);
>scale_loop_frequencies (loop_to_version, prob * prob2);
> -  single_exit (loop_to_version)->dest->count = preheader->count;
> +  /* When the loop has multiple exits then we can only version itself.
> + This is denoted by loop_to_version == loop.  In this case we can
> + do the versioning by selecting the exit edge the vectorizer is
> + currently using.  */
> +  edge exit_edge;
> +  if (loop_to_version == loop)
> +   exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  else
> +   exit_edge = single_exit (loop_to_version);
> +  exit_edge->dest->count = preheader->count;
>LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo) = (prob * prob2).invert ();
>  
>nloop = scalar_loop;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> da2dfa176ecd457ebc11d1131302ca15d77d779d..eccf0953bbae2a0e95efba0966c85492e5057b14
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -11910,8 +11910,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
> (LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo));
>scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
> LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo));
> -  single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))->dest->count
> - = preheader->count;
> +  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)->dest->count = preheader->count;
>  }
>  
>if (niters_vector == NULL_TREE)
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] middle-end/113344 - is_truth_type_for vs GENERIC tcc_comparison

2024-01-12 Thread Richard Biener
On GENERIC, tcc_comparison can have int type, so restrict the PR113126
fix to vector types.
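
The shape of the relaxed condition can be sketched as a plain predicate.
may_narrow_comparison and its two flags are hypothetical; they only model
"result type is a vector" and "is_truth_type_for holds".

```c
#include <assert.h>
#include <stdbool.h>

/* The truth-type check is only required when the comparison result is a
   vector; scalar results (such as int on GENERIC) pass unconditionally.  */
static bool
may_narrow_comparison (bool result_is_vector, bool truth_type_matches)
{
  return !result_is_vector || truth_type_matches;
}
```

So a scalar int result no longer blocks the transformation, while a
vector result still needs a matching truth type.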

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/113344
* match.pd ((double)float CMP (double)float -> float CMP float):
Perform result type check only for vectors.
* fold-const.cc (fold_binary_loc): Likewise.
---
 gcc/fold-const.cc | 2 +-
 gcc/match.pd  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 9feb31f5c8b..594ea843d9c 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12901,7 +12901,7 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
  newtype = TREE_TYPE (targ1);
 
if (element_precision (newtype) < element_precision (TREE_TYPE (arg0))
-   && is_truth_type_for (newtype, type))
+   && (!VECTOR_TYPE_P (type) || is_truth_type_for (newtype, type)))
  return fold_build2_loc (loc, code, type,
  fold_convert_loc (loc, newtype, targ0),
  fold_convert_loc (loc, newtype, targ1));
diff --git a/gcc/match.pd b/gcc/match.pd
index 0bcf3153ff2..e42ecaf9ec7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6799,7 +6799,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
? TREE_TYPE (@00) : type1);
  }
  (if (element_precision (TREE_TYPE (@0)) > element_precision (newtype)
- && is_truth_type_for (newtype, type))
+ && (!VECTOR_TYPE_P (type) || is_truth_type_for (newtype, type)))
   (cmp (convert:newtype @00) (convert:newtype @10
 
 
-- 
2.35.3


Re: Re: [PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3

2024-01-12 Thread juzhe.zh...@rivai.ai
Hi, Richard.

I tried hard in the RISC-V backend, but I found that the case with
-march=rv64gcv_zvl4096b cannot be fixed without counting the vec_to_scalar cost.

Is there a way to count the vec_to_scalar cost without touching this piece of
code in the middle-end?

  /* ???  Enable for loop costing as well.  */
  if (!loop_vinfo)
record_stmt_cost (cost_vec, 1, vec_to_scalar, stmt_info, NULL_TREE,
  0, vect_epilogue);

Since it's stage 4, I guess we can't change this now.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2024-01-11 17:57
To: Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3
On Thu, Jan 11, 2024 at 10:52 AM Robin Dapp  wrote:
>
> On 1/11/24 10:46, juzhe.zh...@rivai.ai wrote:
> > Oh. I see I think I have done wrong here.
> >
> > I should adjust cost for VEC_EXTRACT not VEC_SET.
> >
> > But it's odd, I didn't see loop vectorizer is scanning scalar_to_vec
> > cost in vect.dump.
>
> The slidedown/vmv.x.s part is of course vec_extract but we indeed
> don't seem to cost it as vec_to_scalar here.
 
It looks like a vectorized live operation as it's not in the loop body
(and thus really irrelevant for costing in practice).  This has
 
  /* ???  Enable for loop costing as well.  */
  if (!loop_vinfo)
record_stmt_cost (cost_vec, 1, vec_to_scalar, stmt_info, NULL_TREE,
  0, vect_epilogue);
 
so live ops are not costed at all.  I would suggest to try unconditionally
enabling this?
 
> vmv.vx correspond to scalar_to_vec and I'd say 3 seems a
> bit high when a regular vector instruction is "1".
> It should rather be dependent on the latency between register
> files.  We can't really say in general but I'd say "2" is not so bad.
>
> I would suggest adding special handling in builtin_vectorization_cost
> like:
>
> /* Add register-register latency.  */
> case scalar_to_vec:
>   return common_costs->scalar_to_vec_cost + riscv_register_move_cost (...)
>
> and adjust register_move_cost accordingly.  Instead of using
> register_move_cost we could also use a cost structure directly.
> (E.g. like aarch64's regmove tuning structures.  Those don't
> contain VRs but for us it could make sense to add them).
>
> > +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize 
> > -fdump-tree-vect-details" } */
> With a cost of "3" we still vectorize for zvl512b and larger.
> Is that intended?  I don't really see why 512 should vectorized
> but 256 not.  Disregarding that everything should be optimized
> away, 2 iterations for the whole loop with 256 bits doesn't
> seem that bad.
>
> Regards
>  Robin
>
 


[PATCH]middle-end: remove more usages of single_exit

2024-01-12 Thread Tamar Christina
Hi All,

This replaces two more usages of single_exit that I had missed before.
They both seem to happen when we re-use the ifcvt scalar loop for versioning.

The condition in versioning is the same as the one for when we don't re-use the
scalar loop.
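
The exit selection used in the versioning change can be sketched with toy
types.  struct edge_s, struct loop_s and pick_exit are hypothetical and
only model the decision, not GCC's loop or edge structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy edge and loop types standing in for GCC's edge and class loop.  */
struct edge_s
{
  int id;
};

struct loop_s
{
  /* The unique exit edge, or NULL when the loop has several exits.  */
  struct edge_s *single_exit;
};

/* When the loop being versioned is the vectorized loop itself, use the
   exit the vectorizer selected (IV_EXIT); otherwise the loop being
   versioned is assumed to have a single exit and that one is used.  */
static struct edge_s *
pick_exit (struct loop_s *loop_to_version, struct loop_s *loop,
	   struct edge_s *iv_exit)
{
  return loop_to_version == loop ? iv_exit : loop_to_version->single_exit;
}
```

This avoids asking for a single exit on a loop that may have several.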

I hit these during an LTO enabled bootstrap now.

Bootstrapped and regtested on aarch64-none-linux-gnu with LTO enabled; no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_loop_versioning): Replace single_exit.
* tree-vect-loop.cc (vect_transform_loop): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 0931b18404856f6c33dcae1ffa8d5a350dbd0f8f..0d8c90f69e9693d5d25095e799fbc17a9910779b 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -4051,7 +4051,16 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   basic_block preheader = loop_preheader_edge (loop_to_version)->src;
   preheader->count = preheader->count.apply_probability (prob * prob2);
   scale_loop_frequencies (loop_to_version, prob * prob2);
-  single_exit (loop_to_version)->dest->count = preheader->count;
+  /* When the loop has multiple exits then we can only version itself.
+   This is denoted by loop_to_version == loop.  In this case we can
+   do the versioning by selecting the exit edge the vectorizer is
+   currently using.  */
+  edge exit_edge;
+  if (loop_to_version == loop)
+   exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  else
+   exit_edge = single_exit (loop_to_version);
+  exit_edge->dest->count = preheader->count;
   LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo) = (prob * prob2).invert ();
 
   nloop = scalar_loop;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index da2dfa176ecd457ebc11d1131302ca15d77d779d..eccf0953bbae2a0e95efba0966c85492e5057b14 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11910,8 +11910,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
  (LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo));
   scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
  LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo));
-  single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))->dest->count
-   = preheader->count;
+  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)->dest->count = preheader->count;
 }
 
   if (niters_vector == NULL_TREE)









Re: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2024-01-12 Thread Richard Biener
On Fri, Jan 12, 2024 at 7:15 AM Feng Xue OS  wrote:
>
> Add a depth parameter to limit recursion of vec_slp_has_scalar_use.

OK.
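
A depth-limited walk of the kind described can be sketched as below.  This
is a simplification under stated assumptions: struct def, its fields and
the limits are hypothetical, and the caching in a hash map that the patch
does is left out.

```c
#include <assert.h>

enum { MAX_USES = 4, DEPTH_LIMIT = 2 };

/* Toy definition node: ESCAPES marks a definition with a use outside
   vectorized code; USES lists definitions fed by this one.  */
struct def
{
  int escapes;
  int n_uses;
  const struct def *uses[MAX_USES];
};

/* Return 1 if D transitively has a scalar use, 0 if it has none, and
   -1 if the walk had to stop because the depth limit was reached.  */
static int
has_scalar_use (const struct def *d, int depth)
{
  if (d->escapes)
    return 1;
  for (int i = 0; i < d->n_uses; i++)
    {
      if (depth >= DEPTH_LIMIT)
	return -1;
      int r = has_scalar_use (d->uses[i], depth + 1);
      if (r != 0)
	return r;
    }
  return 0;
}
```

A caller would treat -1 conservatively, the same way as a found scalar
use.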

> Feng
> ---
>
>  .../gcc.target/aarch64/bb-slp-pr113091.c  |  22 ++
>  gcc/tree-vect-slp.cc  | 207 ++
>  2 files changed, 190 insertions(+), 39 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/bb-slp-pr113091.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/bb-slp-pr113091.c 
> b/gcc/testsuite/gcc.target/aarch64/bb-slp-pr113091.c
> new file mode 100644
> index 000..ff822e90b4a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bb-slp-pr113091.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-slp-details 
> -ftree-slp-vectorize" } */
> +
> +int test(unsigned array[8]);
> +
> +int foo(char *a, char *b)
> +{
> +  unsigned array[8];
> +
> +  array[0] = (a[0] - b[0]);
> +  array[1] = (a[1] - b[1]);
> +  array[2] = (a[2] - b[2]);
> +  array[3] = (a[3] - b[3]);
> +  array[4] = (a[4] - b[4]);
> +  array[5] = (a[5] - b[5]);
> +  array[6] = (a[6] - b[6]);
> +  array[7] = (a[7] - b[7]);
> +
> +  return test(array);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Basic block will be vectorized using 
> SLP" 1 "slp2" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index b6cce55ce90..086377a9ac0 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6418,6 +6418,102 @@ vect_slp_analyze_node_operations (vec_info *vinfo, 
> slp_tree node,
>return res;
>  }
>
> +/* Given a definition DEF, analyze if it will have any live scalar use after
> +   performing SLP vectorization whose information is represented by BB_VINFO,
> +   and record result into hash map SCALAR_USE_MAP as cache for later fast
> +   check.  If recursion DEPTH exceeds a limit, stop analysis and make a
> +   conservative assumption.  Return 0 if no scalar use, 1 if there is, -1
> +   means recursion is limited.  */
> +
> +static int
> +vec_slp_has_scalar_use (bb_vec_info bb_vinfo, tree def,
> +   hash_map<tree, int> &scalar_use_map,
> +   int depth = 0)
> +{
> +  const int depth_limit = 2;
> +  imm_use_iterator use_iter;
> +  gimple *use_stmt;
> +
> +  if (int *res = scalar_use_map.get (def))
> +return *res;
> +
> +  int scalar_use = 1;
> +
> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, def)
> +{
> +  if (is_gimple_debug (use_stmt))
> +   continue;
> +
> +  stmt_vec_info use_stmt_info = bb_vinfo->lookup_stmt (use_stmt);
> +
> +  if (!use_stmt_info)
> +   break;
> +
> +  if (PURE_SLP_STMT (vect_stmt_to_vectorize (use_stmt_info)))
> +   continue;
> +
> +  /* Do not step forward when encounter PHI statement, since it may
> +involve cyclic reference and cause infinite recursive invocation.  */
> +  if (gimple_code (use_stmt) == GIMPLE_PHI)
> +   break;
> +
> +  /* When pattern recognition is involved, a statement whose definition 
> is
> +consumed in some pattern, may not be included in the final 
> replacement
> +pattern statements, so would be skipped when building SLP graph.
> +
> +* Original
> + char a_c = *(char *) a;
> + char b_c = *(char *) b;
> + unsigned short a_s = (unsigned short) a_c;
> + int a_i = (int) a_s;
> + int b_i = (int) b_c;
> + int r_i = a_i - b_i;
> +
> +* After pattern replacement
> + a_s = (unsigned short) a_c;
> + a_i = (int) a_s;
> +
> + patt_b_s = (unsigned short) b_c;// b_i = (int) b_c
> + patt_b_i = (int) patt_b_s;  // b_i = (int) b_c
> +
> + patt_r_s = widen_minus(a_c, b_c);   // r_i = a_i - b_i
> + patt_r_i = (int) patt_r_s;  // r_i = a_i - b_i
> +
> +The definitions of a_i(original statement) and b_i(pattern statement)
> +are related to, but actually not part of widen_minus pattern.
> +Vectorizing the pattern does not cause these definition statements to
> +be marked as PURE_SLP.  For this case, we need to recursively check
> +whether their uses are all absorbed into vectorized code.  But there
> +is an exception that some use may participate in a vectorized
> +operation via an external SLP node containing that use as an element.
> +The parameter "scalar_use_map" tags such kind of SSA as having scalar
> +use in advance.  */
> +  tree lhs = gimple_get_lhs (use_stmt);
> +
> +  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> +   break;
> +
> +  if (depth_limit && depth >= depth_limit)
> +   return -1;
> +
> +  if ((scalar_use = vec_slp_has_scalar_use (bb_vinfo, lhs, scalar_use_map,
> +   depth + 1)))
> +   break;
> +}
> +
> +  if (end_imm_use_stmt_p (&use_iter))
> +scalar_use = 0;
> +
> +  /* If recursion is limited, do not cache result for non-root defs.  */
> +  
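
The quoted patch stops before the caching step, so the following is only a rough sketch of the depth-limited, memoized walk that the comments above describe. It is not the GCC implementation: ToyGraph, has_scalar_use, and the integer value ids are hypothetical names invented for illustration, and the caching rule follows my reading of the comment "do not cache result for non-root defs".

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy model, not GCC data structures: values are plain ints and
// uses[v] lists the values defined by the statements that consume v.
struct ToyGraph {
  std::unordered_map<int, std::vector<int>> uses;
  std::unordered_set<int> vectorized;  // stands in for PURE_SLP statements
  std::unordered_set<int> opaque;      // uses the walk cannot look through
};

// Returns 0 if every use is absorbed by vectorized code, 1 if a scalar use
// remains, and -1 if the depth limit stopped the walk.
int has_scalar_use(const ToyGraph &g, int def,
                   std::unordered_map<int, int> &cache, int depth = 0) {
  const int depth_limit = 2;

  auto cached = cache.find(def);
  if (cached != cache.end())
    return cached->second;

  int result = 0;
  auto found = g.uses.find(def);
  if (found != g.uses.end()) {
    for (int user : found->second) {
      if (g.vectorized.count(user))
        continue;               // use is absorbed by vector code
      if (g.opaque.count(user)) {
        result = 1;             // conservative: treat as a live scalar use
        break;
      }
      if (depth >= depth_limit)
        return -1;              // give up without caching
      result = has_scalar_use(g, user, cache, depth + 1);
      if (result != 0)
        break;
    }
  }

  // Cache definite answers everywhere, but a "limited" answer only at the
  // root, since a non-root def might be resolvable from a shallower start.
  if (depth == 0 || result >= 0)
    cache[def] = result;
  return result;
}
```

In this toy model, a chain of three non-vectorized uses starting from the root hits the limit and yields -1 for the root only; intermediate values stay uncached so that a later query starting closer to them can still produce a definite answer.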

Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-12 Thread Richard Biener
On Fri, Jan 12, 2024 at 3:15 AM HAO CHEN GUI wrote:
>
> Hi Richard,
>Thanks so much for your comments.
>
>
> >> patch.diff
> >> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> >> index 7f777666ba9..4c9b2cbeefc 100644
> >> --- a/gcc/config/rs6000/rs6000-string.cc
> >> +++ b/gcc/config/rs6000/rs6000-string.cc
> >> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
> >> }
> >>
> >>dest = adjust_address (orig_dest, mode, offset);
> >> -
> >> +  /* Set the alignment of dest to the size of mode in order to
> >> +avoid unnecessary byte swaps on LE.  */
> >> +  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
> >
> > but the alignment is now wrong which might cause ripple-down
> > wrong-code effects, no?
> >
> > It's probably bad to hide the byte-swapping in the move patterns (I'm
> > just guessing you do that)
>
> Here I just change the alignment of "dest", which is temporarily used for
> the move. The orig_dest is untouched and keeps its original alignment. The
> subsequent insns which use orig_dest are not affected. I am not sure if
> it causes ripple-down effects. Do you mean the dest might be reused
> later? But I think the alignment is different even though the mode and
> offset are the same.

If the MEM ends up in the IL then its MEM_ALIGN had better be correct.
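
To make the concern concrete, here is a small standalone sketch (not GCC code) of what "correct" means for a claimed alignment. The helper names known_align_bits and claim_is_safe are hypothetical, and an 8-bit unit is assumed in place of BITS_PER_UNIT. The claim computed in the patch, mode size times bits per unit, is only valid when the address really is a multiple of the mode size.

```cpp
#include <cstddef>
#include <cstdint>

// Largest power-of-two alignment, in bits, that ADDR actually satisfies,
// capped at MAX_BITS.  Hypothetical helper for illustration only.
unsigned known_align_bits(std::uintptr_t addr, unsigned max_bits = 128) {
  unsigned bytes = 1;
  while (bytes * 8 < max_bits && addr % (bytes * 2) == 0)
    bytes *= 2;
  return bytes * 8;
}

// True if claiming "mode size" alignment for ADDR would be accurate.
// Assumes 8 bits per unit, mirroring GET_MODE_SIZE (mode) * BITS_PER_UNIT.
bool claim_is_safe(std::uintptr_t addr, std::size_t mode_size_bytes) {
  unsigned claimed = static_cast<unsigned>(mode_size_bytes * 8);
  return known_align_bits(addr, claimed) >= claimed;
}
```

For a 16-byte mode, an address such as 0x1000 supports the 128-bit claim, while 0x1008 is only 64-bit aligned, so recording 128 there would overstate what later passes may rely on.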

> Looking forward to your advice.
>
> Thanks
> Gui Haochen