[PATCH v1 1/5] RISC-V: Add new option --param=rvv-gr2vr-cost= for rvv insn

2025-05-03 Thread pan2 . li
From: Pan Li 

While investigating the combine of vec_dup and vop.vv into
vop.vx, we need to depend on the cost of the insn that operates
from the GR to the VR, for example vadd.vx.  Thus, for better
control and testing, we introduce a new option:

--param=rvv-gr2vr-cost=

to specify the cost value of the insn that operates from the GR
to the VR.
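
A minimal usage sketch (the -march/-mabi strings and the cost value
mirror the testcases later in this series; the function name and body
are only an illustration of the DEF_VX_BINARY shape used there):

  /* { dg-do compile } */
  /* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=0" } */
  void
  test_vx_binary_add (int *restrict out, int *restrict in, int x, unsigned n)
  {
    for (unsigned i = 0; i < n; i++)
      /* vec_dup of x + vadd.vv, expected to combine into vadd.vx when
         the GR2VR cost is 0.  */
      out[i] = in[i] + x;
  }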

gcc/ChangeLog:

* config/riscv/riscv-opts.h (RVV_GR2VR_COST_UNPROVIDED): Add
new macro to indicate the param is not provided.
* config/riscv/riscv.cc (get_vector_gr2vr_cost): Add new func
to get the cost value of the rvv insn that operates from GR to VR.
* config/riscv/riscv.opt: Add new option --param=rvv-gr2vr-cost=.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-opts.h |  2 ++
 gcc/config/riscv/riscv.cc | 11 +++
 gcc/config/riscv/riscv.opt|  4 
 3 files changed, 17 insertions(+)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 26fe228e0f8..670f540f11d 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -162,4 +162,6 @@ enum riscv_tls_type {
 #define TARGET_VECTOR_AUTOVEC_SEGMENT \
   (TARGET_VECTOR && riscv_mautovec_segment)
 
+#define RVV_GR2VR_COST_UNPROVIDED -1
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ed635ab42f4..ee23888cbf7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3851,6 +3851,17 @@ riscv_extend_cost (rtx op, bool unsigned_p)
   return COSTS_N_INSNS (2);
 }
 
+static inline int
+get_vector_gr2vr_cost ()
+{
+  int cost = get_vector_costs ()->regmove->GR2VR;
+
+  if (rvv_gr2vr_cost != RVV_GR2VR_COST_UNPROVIDED)
+    cost = rvv_gr2vr_cost;
+
+  return cost;
+}
+
 /* Implement TARGET_RTX_COSTS.  */
 
 #define SINGLE_SHIFT_COST 1
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7515c8ea13d..4ed0412e1aa 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -579,6 +579,10 @@ Inline strlen calls if possible.
 Target RejectNegative Joined UInteger Var(riscv_strcmp_inline_limit) Init(64)
 Max number of bytes to compare as part of inlined strcmp/strncmp routines (default: 64).
 
+-param=rvv-gr2vr-cost=
+Target RejectNegative Joined UInteger Var(rvv_gr2vr_cost) Init(RVV_GR2VR_COST_UNPROVIDED)
+Set the cost value of the rvv instruction when operate from GR to VR.
+
 Enum
 Name(rvv_max_lmul) Type(enum rvv_max_lmul_enum)
 The RVV possible LMUL (-mrvv-max-lmul=):
-- 
2.43.0



[PATCH v1 5/5] RISC-V: Add testcases for vec_duplicate + vadd.vv combine when GR2VR cost 15

2025-05-03 Thread pan2 . li
From: Pan Li 

Add asm dump checks for the vec_duplicate + vadd.vv combine to vadd.vx.
The late-combine will not take action when the GR2VR cost is 15.

The below test suites are passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c | 8 
 8 files changed, 64 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c
new file mode 100644
index 000..61587c83680
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=15" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int16_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c
new file mode 100644
index 000..f6f9b4d863e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=15" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int32_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c
new file mode 100644
index 000..c3d21b96747
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=15" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int64_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c
new file mode 100644
index 000..42580de6cb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=15" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int8_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c
new file mode 100644
index 000..ac3831db887
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=15" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(uint16_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c
new file mode 100644
index 000..273bf52975d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c
@@ 

[PATCH v1 4/5] RISC-V: Add testcases for vec_duplicate + vadd.vv combine when GR2VR cost 1

2025-05-03 Thread pan2 . li
From: Pan Li 

Add asm dump checks for the vec_duplicate + vadd.vv combine to vadd.vx.
The late-combine will not take action when the GR2VR cost is 1.

The below test suites are passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c | 8 
 8 files changed, 64 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c
new file mode 100644
index 000..6d16a5ecfbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int16_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c
new file mode 100644
index 000..c2f83c8974d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int32_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c
new file mode 100644
index 000..059801213de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int64_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c
new file mode 100644
index 000..35ef89f0a65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(int8_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c
new file mode 100644
index 000..43f146d1400
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=rvv-gr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY(uint16_t, +)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c
new file mode 100644
index 000..282beb96ae4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c
@@ -0,0 +

[PATCH v3 0/5] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-05-03 Thread pan2 . li
From: Pan Li 

This patch series introduces the combine of vec_dup + vadd.vv into
vadd.vx based on the cost value of GR2VR.  The late-combine will take
place if the cost of GR2VR is zero, and reject the combine if it is
non-zero (1 and 15 in the tests).

The below test suites are passed for this patch series.
* The rv64gcv full regression test.

Pan Li (5):
  RISC-V: Add new option --param=rvv-gr2vr-cost= for rvv insn
  RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine when GR2VR cost 0
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine when GR2VR cost 1
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine when GR2VR cost 15

 gcc/config/riscv/autovec-opt.md   |  23 +
 gcc/config/riscv/riscv-opts.h |   2 +
 gcc/config/riscv/riscv.cc |  46 +-
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/vector-iterators.md  |   4 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  17 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 401 ++
 .../riscv/rvv/autovec/vx_vf/vx_binary_run.h   |  26 ++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c|   8 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i8.c  |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u8.c  |  14 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 41 files changed, 828 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_va

[PATCH v1 2/5] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-05-03 Thread pan2 . li
From: Pan Li 

This patch combines vec_duplicate + vadd.vv into vadd.vx, as in the
example code below.  The related pattern depends on the cost of the
vec_duplicate from GR2VR; it works as follows:

* The pattern matching is active by default.
* The cost of GR2VR is added to the total cost of the pattern, i.e.:
  vec_dup cost = gr2vr_cost
  vadd.vv v, (vec_dup (x)) = gr2vr_cost + 1

Then the late-combine will take action if the cost of GR2VR is zero,
and reject the combination if the GR2VR cost is greater than zero.
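
Roughly, with the values exercised by the tests in this series: at
--param=rvv-gr2vr-cost=0 the combined vadd.vx pattern costs 0 + 1 = 1
insn and the combination is accepted, while at a cost of 1 or 15 it
costs 2 or 16 and late-combine keeps the separate vmv.v.x + vadd.vv
form, which is what the testcases in patches 3-5 check for.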

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_VX_BINARY(T, OP)\
  void\
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {   \
for (unsigned i = 0; i < n; i++)  \
  out[i] = in[i] OP x;\
  }

  DEF_VX_BINARY(int32_t, +)

Before this patch:
  10   │ test_binary_vx_add:
  11   │ beq a3,zero,.L8
  12   │ vsetvli a5,zero,e32,m1,ta,ma // Deleted if GR2VR cost zero
  13   │ vmv.v.x v2,a2                // Ditto.
  14   │ slli    a3,a3,32
  15   │ srli    a3,a3,32
  16   │ .L3:
  17   │ vsetvli a5,a3,e32,m1,ta,ma
  18   │ vle32.v v1,0(a1)
  19   │ slli    a4,a5,2
  20   │ sub a3,a3,a5
  21   │ add a1,a1,a4
  22   │ vadd.vv v1,v2,v1
  23   │ vse32.v v1,0(a0)
  24   │ add a0,a0,a4
  25   │ bne a3,zero,.L3

After this patch:
  10   │ test_binary_vx_add:
  11   │ beq a3,zero,.L8
  12   │ slli    a3,a3,32
  13   │ srli    a3,a3,32
  14   │ .L3:
  15   │ vsetvli a5,a3,e32,m1,ta,ma
  16   │ vle32.v v1,0(a1)
  17   │ slli    a4,a5,2
  18   │ sub a3,a3,a5
  19   │ add a1,a1,a4
  20   │ vadd.vx v1,v1,a2
  21   │ vse32.v v1,0(a0)
  22   │ add a0,a0,a4
  23   │ bne a3,zero,.L3

The below test suites are passed for this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_vx_): Add new
combine to convert vec_duplicate + vadd.vv to vadd.vx on GR2VR
cost.
* config/riscv/riscv.cc (riscv_rtx_costs): Take care of the cost
when vec_dup and vadd v, vec_dup(x).
* config/riscv/vector-iterators.md: Add new iterator for vx.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec-opt.md  | 23 ++
 gcc/config/riscv/riscv.cc| 35 +++-
 gcc/config/riscv/vector-iterators.md |  4 
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 0c3b0cc7e05..7cf7e8a92ba 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1673,3 +1673,26 @@ (define_insn_and_split "*vandn_"
 DONE;
   }
   [(set_attr "type" "vandn")])
+
+
+;; =============================================================================
+;; Combine vec_duplicate + op.vv to op.vx
+;; Include
+;; - vadd.vx
+;; =============================================================================
+(define_insn_and_split "*_vx_"
+ [(set (match_operand:V_VLSI0 "register_operand")
+   (any_int_binop_no_shift_vx:V_VLSI
+(vec_duplicate:V_VLSI
+  (match_operand: 1 "register_operand"))
+(match_operand:V_VLSI  2 "")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[2], operands[1]};
+riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
+  riscv_vector::BINARY_OP, ops);
+  }
+  [(set_attr "type" "vialu")])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ee23888cbf7..3cbbbde1084 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3874,7 +3874,40 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  Cost Model need to be well analyzed and supported in the future. */
   if (riscv_v_ext_mode_p (mode))
 {
-  *total = COSTS_N_INSNS (1);
+  int gr2vr_cost = get_vector_gr2vr_cost ();
+
+  switch (outer_code)
+   {
+   case SET:
+ {
+   switch (GET_CODE (x))
+ {
+ case VEC_DUPLICATE:
+   *total = gr2vr_cost * COSTS_N_INSNS (1);
+   break;
+ case PLUS:
+   {
+ rtx op_0 = XEXP (x, 0);
+ rtx op_1 = XEXP (x, 1);
+
+ if (GET_CODE (op_0) == VEC_DUPLICATE
+ || GET_CODE (op_1) == VEC_DUPLICATE)
+   *total = (gr2vr_cost + 1) * COSTS_N_INSNS (1);
+ else
+   *total = COSTS_N_INSNS (1);
+   }
+   break;
+ default:
+   *total = COSTS_N_INSNS 

[PATCH v1 3/5] RISC-V: Add testcases for vec_duplicate + vadd.vv combine when GR2VR cost 0

2025-05-03 Thread pan2 . li
From: Pan Li 

Add asm dump checks and run tests for the vec_duplicate + vadd.vv combine
to vadd.vx.  Introduce a new folder to hold all related testcases.

The below test suites are passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add new folder vx_vf for all
vec_dup + vv to vx testcases.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  17 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 401 ++
 .../riscv/rvv/autovec/vx_vf/vx_binary_run.h   |  26 ++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c|   8 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i8.c  |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u8.c  |  14 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 20 files changed, 622 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
new file mode 100644
index 000..66654eb9022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
@@ -0,0 +1,17 @@
+#ifndef HAVE_DEFINED_VX_VF_BINARY_H
+#define HAVE_DEFINED_VX_VF_BINARY_H
+
+#include 
+
+#define DEF_VX_BI

RE: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-05-03 Thread Li, Pan2
> There are still some nagging issues (which I'll describe in a separate
> email), but I think we can go ahead and merge this one too to close the
> loop.

Thanks Vineet, but I'd like to wait for the ack from Jeff before committing.

Pan

-Original Message-
From: Vineet Gupta  
Sent: Saturday, May 3, 2025 1:04 AM
To: Jeff Law ; Li, Pan2 ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Chen, Ken 

Subject: Re: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore 
volatile define_insn

On 4/30/25 20:44, Jeff Law wrote:
>>> Sorry this got backed up as I'm working on FRM overhaul - if this is not 
>>> super
>>> urgent can you please wait for a few weeks for my work to be posted.
>>> If you prefer this go in still, fine by me as well.
>> Sure thing, feel free to ping me if there is something I can help.
> I put Pan's patch onto the deferred list.

Looks like Kito's confluence hook already fixes my issue (PR/119164) and in
general the FRM writes are much fewer.
There are still some nagging issues (which I'll describe in a separate email), but
I think we can go ahead and merge this one too to close the loop.

I'll post my current ready to go cleanups, the issues I currently see and the
future course of action.

Thx,
-Vineet


[patch, wwwdocs, committed] Fix option name in gcc15/changes.html

2025-05-03 Thread Thomas Koenig

Hello world,

I just committed the following patch after noticing that an option name
was wrong in the gcc15/changes.html file.

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index d851a744..b442b8d9 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -738,7 +738,7 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
   is not affected, because it provides backwards compatibility 
with the

   older ABI.
   
-The -Wexternal-interface-mismatch option has been
+The -Wexternal-arguments-mismatch option has been
 added.  This checks for mismatches between the argument lists in
 dummy external arguments, and is implied by -Wall
 and -fc-prototypes-external options.



[patch, Fortran] Fix PR 119928, rejects-valid 15/16 regression

2025-05-03 Thread Thomas Koenig

Hello world,

This patch fixes a case where too much was being checked with
-Wexternal-arguments-mismatch for a procedure pointer with an
unlimited polymorphic and an INTEGER argument which was inferred from
an actual argument.  I also found some checks which can trigger false
positives, which this patch also excludes from testing.

Regression-tested.

OK for trunk and backport to gcc-15?

Best regards

Thomas

gcc/fortran/ChangeLog:

PR fortran/119928
* interface.cc (gfc_check_dummy_characteristics): Do not issue
error for type if one argument is an unlimited polymorphic entity
and the other one has been generated from an actual argument.
Do not check OPTIONAL, INTENT, ALLOCATABLE, POINTER, TARGET, VALUE,
ASYNCHRONOUS or CONTIGUOUS if one of the arguments has been
generated from an actual argument.

gcc/testsuite/ChangeLog:

PR fortran/119928
* gfortran.dg/interface_60.f90: New test.
diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 1e552a3df86..af955fd2ff9 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -1387,8 +1387,10 @@ gfc_check_dummy_characteristics (gfc_symbol *s1, gfc_symbol *s2,
   /* Check type and rank.  */
   if (type_must_agree)
 {
-  if (!compare_type_characteristics (s1, s2)
-	  || !compare_type_characteristics (s2, s1))
+  if ((!compare_type_characteristics (s1, s2)
+	   || !compare_type_characteristics (s2, s1))
+	  && !((s1->attr.artificial && UNLIMITED_POLY(s2))
+	   || (s2->attr.artificial && UNLIMITED_POLY(s1
 	{
 	  snprintf (errmsg, err_len, "Type mismatch in argument '%s' (%s/%s)",
 		s1->name, gfc_dummy_typename (&s1->ts),
@@ -1403,77 +1405,82 @@ gfc_check_dummy_characteristics (gfc_symbol *s1, gfc_symbol *s2,
 	}
 }
 
-  /* Check INTENT.  */
-  if (s1->attr.intent != s2->attr.intent && !s1->attr.artificial
-  && !s2->attr.artificial)
-{
-  snprintf (errmsg, err_len, "INTENT mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* A lot of information is missing for artificially generated
+ formal arguments, let's not look into that.  */
 
-  /* Check OPTIONAL attribute.  */
-  if (s1->attr.optional != s2->attr.optional)
+  if (!s1->attr.artificial && !s2->attr.artificial)
 {
-  snprintf (errmsg, err_len, "OPTIONAL mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check INTENT.  */
+  if (s1->attr.intent != s2->attr.intent)
+	{
+	  snprintf (errmsg, err_len, "INTENT mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check ALLOCATABLE attribute.  */
-  if (s1->attr.allocatable != s2->attr.allocatable)
-{
-  snprintf (errmsg, err_len, "ALLOCATABLE mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check OPTIONAL attribute.  */
+  if (s1->attr.optional != s2->attr.optional)
+	{
+	  snprintf (errmsg, err_len, "OPTIONAL mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check POINTER attribute.  */
-  if (s1->attr.pointer != s2->attr.pointer)
-{
-  snprintf (errmsg, err_len, "POINTER mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check ALLOCATABLE attribute.  */
+  if (s1->attr.allocatable != s2->attr.allocatable)
+	{
+	  snprintf (errmsg, err_len, "ALLOCATABLE mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check TARGET attribute.  */
-  if (s1->attr.target != s2->attr.target)
-{
-  snprintf (errmsg, err_len, "TARGET mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check POINTER attribute.  */
+  if (s1->attr.pointer != s2->attr.pointer)
+	{
+	  snprintf (errmsg, err_len, "POINTER mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check ASYNCHRONOUS attribute.  */
-  if (s1->attr.asynchronous != s2->attr.asynchronous)
-{
-  snprintf (errmsg, err_len, "ASYNCHRONOUS mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check TARGET attribute.  */
+  if (s1->attr.target != s2->attr.target)
+	{
+	  snprintf (errmsg, err_len, "TARGET mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check CONTIGUOUS attribute.  */
-  if (s1->attr.contiguous != s2->attr.contiguous)
-{
-  snprintf (errmsg, err_len, "CONTIGUOUS mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check ASYNCHRONOUS attribute.  */
+  if (s1->attr.asynchronous != s2->attr.asynchronous)
+	{
+	  snprintf (errmsg, err_len, "ASYNCHRONOUS mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check VALUE attribute.  */
-  if (s1->attr.value != s2->attr.value)
-{
-  snprintf (errmsg, err_len, "VALUE mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check CONTIGUOUS attribute.  */
+  if (s1->attr.contiguous != s2->attr

Re: [PATCH] cobol, v2: Fix up cobol cross-compilation from 32-bit arches [PR119364]

2025-05-03 Thread Richard Biener
On Sat, May 3, 2025 at 1:44 PM Jakub Jelinek  wrote:
>
> On Sat, May 03, 2025 at 01:32:35PM +0200, Richard Biener wrote:
> > Any reason for unsigned long long vs. uint64_t for the following?
> >
> > -size_t attr;// See cbl_field_attr_t
> > +unsigned long long attr;// See cbl_field_attr_t
>
> Because it needs to be handled the same by the compiler, and it was easier
> to use ULONGLONG, "attr" than figure out what exact type uint64_t has.
> The C FE has
>   if (UINT64_TYPE)
> c_uint64_type_node = uint64_type_node =
>   TREE_TYPE (identifier_global_value (c_get_ident (UINT64_TYPE)));
> but the COBOL FE doesn't have that.
> For aliasing etc. reasons I'm afraid we can't use
> build_nonstandard_integer_type (64, 1);

Ah, I was hoping the stdint type nodes were initialized by the middle-end.

Richard.

>
> Jakub
>


Re: [PATCH] cobol, v2: Fix up cobol cross-compilation from 32-bit arches [PR119364]

2025-05-03 Thread Richard Biener
On Fri, May 2, 2025 at 4:53 PM Jakub Jelinek  wrote:
>
> On Sun, Apr 06, 2025 at 03:39:27PM +0200, Jakub Jelinek wrote:
> > Right now it is not possible to even build cross-compilers from 32-bit
> > architectures to e.g. x86_64-linux or aarch64-linux, even from little-endian
> > ones.
> >
> > The following patch attempts to fix that.
> >
> > There were various issues seen e.g. trying to build i686-linux ->
> > x86_64-linux cross-compiler (so still 64-bit libgcobol, but the compiler
> > is 32-bit).
> > 1) warning about >> 32 shift of size_t, on 32-bit arches size_t is 32-bit
> >and so the shift is UB; fixed by doing (new_size>>16)>>16 so that
> >it ors in >> 32 when new_size is 64-bit and 0 when it is 32-bit
> > 2) enum cbl_field_attr_t was using size_t as underlying type, but has
> >various bitmasks which require full 64-bit type; changed this to uint64_t
> >underlying type and using unsigned long long in the structure; various
> >routines which operate with those attributes had to be changed also to
> >work with uint64_t instead of size_t
> > 3) on i686-linux, config.h can #define _FILE_OFFSET_BITS 64 or similar
> >macros; as documented, those macros have to be defined before including
> >first C library header, but some sources included cobol-system.h which
> >includes config.h only after various other headers; this resulted in
> >link failures, as ino_t was sometimes unsigned long and sometines
> >unsigned long long, depending on whether config.h was included first or
> >not, and e.g. cobol_filename uses ino_t argument
> > 4) lots of places used %ld or %lx *printf format specifers with size_t
> >arguments; that works only if size_t is unsigned long, but not when it
> >is unsigned int or unsigned long long or some other type; now while
> >ISO C99 has %zd or %zx to print size_t and C++14 includes C99 (or C11?),
> >while for the C++ headers the C++ compilers typically have full control
> >over it and so support everything in C++14 (e.g. libstdc++ in GCC 5.1+
> >or libc++ if not too old), for C library we are dependent on the system
> >C library (note, on the host for the compiler side).  And not all hosts
> >support C99 in their C libraries; so instead of just changing it to
> >%zd or %zx, I'm changing it to what we use elsewhere in GCC,
> >HOST_SIZE_T_PRINT_{DEC,UNSIGNED,HEX_PURE} or GCC_PRISZ macros in the
> >*printf family format string and casts of the size_t arguments to
> >fmt_size_t.  Note, if not using the C library *printf family (e.g. in
> >dbgmsg, sprintf, snprintf, fprintf, etc.) but the GCC diagnostic code
> >(e.g. err_msg, error, warning, yywarn, ...), then %zd/%zu is supported
> >and on the other side HOST_SIZE_T_PRINT_{DEC,UNSIGNED,HEX_PURE} etc.
> >macros shouldn't be used (for two reasons, because it is unnecessary
> >when %zd/%zu is guaranteed to be supported there because GCC has
> >control over that and more importantly because it breaks translations,
> >both extraction of the to be translated strings and we don't want to
> >have different messages, once with %lld, once with %ld, once with just %d
> >or %I64d depending on host, translators couldn't translate it all).
> > 5) see above, there were already tons of %zd/%zu or %3zu etc. format
> >specifers in *printf format strings, this patch changes those too
> > 6) I've noticed dbgmsg wasn't declared with printf attribute, which resulted
> >in bugs where format specifiers didn't match actually passed types of
> >arguments
>
> Here is an updated patch against latest trunk.
> Maintaining the patch is a nightmare, got several dozens of rejects
> that I had to deal with.
>
> Bootstrapped/regtested on x86_64-linux and tested with i686-linux ->
> x86_64-linux cross, ok for trunk?

Any reason for unsigned long long vs. uint64_t for the following?

-size_t attr;// See cbl_field_attr_t
+unsigned long long attr;// See cbl_field_attr_t

Otherwise LGTM.

> 2025-05-02  Jakub Jelinek  
>
> PR cobol/119364
> libgcobol/
> * valconv.cc (__gg__realloc_if_necessary): Use (new_size>>16)>>16;
> instead of new_size>>32; to avoid warnings on 32-bit hosts.
> * common-defs.h (enum cbl_field_attr_t): Use uint64_t
> as underlying type rather than size_t.
> * gcobolio.h (cblc_field_t): Change attr member type from size_t
> to unsigned long long.
> gcc/cobol/
> * util.cc (is_numeric_edited): Use HOST_SIZE_T_PRINT_UNSIGNED
> instead of "%zu" and cast corresponding argument to fmt_size_t.
> (normalize_picture): Use GCC_PRISZ instead of "z" and pass address
> of fmt_size_t var to sscanf and copy afterwards.
> (cbl_refer_t::str): Use HOST_SIZE_T_PRINT_UNSIGNED instead of
> "%zu" or GCC_PRISZ instead of "z" and cast corresponding argument
> to fmt_size_t.
> (struct move_corresponding_

Re: [PATCH] PR tree-optimization/120048 - Allow IPA_CP to handle UNDEFINED as VARYING.

2025-05-03 Thread Richard Biener
On Sat, May 3, 2025 at 12:39 AM Andrew MacLeod  wrote:
>
> On trunk I'll eventually do something different.. but it will be more
> invasive than I think is reasonable for a backport.
>
> The problem in the PR is that there is a variable with a range that has a
> bitmask attached to it.   We often defer bitmask processing, and the
> change which triggers this problem "improves" the range by applying the
> bitmask when we call update_bitmask. (PR 119712)
>
> The case in point is a range of 0, combined with a bitmask that says the
> '1' bit must be on.   This results in an UNDEFINED range since it's
> impossible.   This is rarely a problem, but this particular snippet of
> code in IPA is tripping over it because it has checked for undefined,
> and then created a new range by combining the [0, 0] and the bitmask,
> which we turn into an UNDEFINED, which it isn't expecting, and then
> it asks for the type of the range.
>
> As Jakub points out in the PR, this is effectively unreachable code that
> is being propagated. A harmless fix would be to check if the result of
> applying the bitmask results in an UNDEFINED value and  to simply
> replace it with a VARYING value.
>
> We still reduce the testcase to "return 0" and there is no more failure.
>
> Bootstraps on x86_64-pc-linux-gnu with no regressions.
>
> If this is acceptable, I will push it to trunk, then also test/verify
> for the GCC15 and 14(?) branches and check it in there.

LGTM.  IPA CP might want to either avoid looking at the type
for UNDEFINED or track it separately from the value-range; not
sure where it looks at the type of a range.

Richard.

> Andrew
>
>


Re: [PATCH] cobol, v2: Fix up cobol cross-compilation from 32-bit arches [PR119364]

2025-05-03 Thread Jakub Jelinek
On Sat, May 03, 2025 at 01:32:35PM +0200, Richard Biener wrote:
> Any reason for unsigned long long vs. uint64_t for the following?
> 
> -size_t attr;// See cbl_field_attr_t
> +unsigned long long attr;// See cbl_field_attr_t

Because it needs to be handled the same by the compiler, and it was easier
to use ULONGLONG, "attr" than figure out what exact type uint64_t has.
The C FE has
  if (UINT64_TYPE)
c_uint64_type_node = uint64_type_node =
  TREE_TYPE (identifier_global_value (c_get_ident (UINT64_TYPE)));
but the COBOL FE doesn't have that.
For aliasing etc. reasons I'm afraid we can't use
build_nonstandard_integer_type (64, 1);

Jakub



Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-03 Thread LIU Hao

On 2025-5-2 01:25, LIU Hao wrote:
Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of 
`incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more.





I suddenly realized the previous patch was for GCC 15 branch. Here's a new one, 
rebased on master.


--
Best regards,
LIU Hao
From d127c5fcf20d34548529ccfef962e7b48c6c56ef Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Tue, 29 Apr 2025 10:43:06 +0800
Subject: [PATCH] i386/cygming: Decrease default preferred stack boundary for
 32-bit targets

This commit decreases the default preferred stack boundary to 4.

In i386-options.cc, there's

   ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;

which sets the default incoming stack boundary to this value, if it's not
overridden by other options or attributes.

Previously, GCC preferred 16-byte alignment like other platforms, unless
`-miamcu` was specified. However, the Microsoft x86 ABI only requires the
stack to be aligned to 4-byte boundaries. Callback functions from MSVC code
may break this assumption made by GCC (see reference below), causing local
variables to be misaligned.

For compatibility reasons, when the attribute `force_align_arg_pointer` is
attached to a function, it continues to ensure the stack is at least aligned
to a 16-byte boundary, as the documentation seems to suggest.
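
A minimal sketch of the kind of function this is about (the callback
name, callee and body are made up for illustration, not taken from the
patch):

    extern void process (void *, double *);  /* placeholder callee */

    /* Called back from MSVC-compiled code that only guarantees 4-byte
       stack alignment; the attribute realigns the stack on entry so the
       locals below still get their preferred alignment.  */
    __attribute__ ((force_align_arg_pointer))
    void
    on_event (void *ctx)
    {
      double buf[2] = { 0.0, 0.0 };
      process (ctx, buf);
    }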

After this change, `STACK_REALIGN_DEFAULT` no longer has an effect on this
target, so it is removed.

Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c9
Signed-off-by: LIU Hao 

gcc/ChangeLog:

PR 07
* config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override
definition from i386.h.
(STACK_REALIGN_DEFAULT): Undefine, as it no longer has an effect.
* config/i386/i386.cc (ix86_update_stack_boundary): Force minimum
128-bit alignment if `force_align_arg_pointer`.

Signed-off-by: LIU Hao 
---
 gcc/config/i386/cygming.h | 9 -
 gcc/config/i386/i386.cc   | 9 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index d587d25a58a8..743cc38f5852 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -28,16 +28,15 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_SEH
 #define TARGET_SEH  (TARGET_64BIT_MS_ABI && flag_unwind_tables)
 
+#undef PREFERRED_STACK_BOUNDARY_DEFAULT
+#define PREFERRED_STACK_BOUNDARY_DEFAULT \
+  (TARGET_64BIT ? 128 : MIN_STACK_BOUNDARY)
+
 /* Win64 with SEH cannot represent DRAP stack frames.  Disable its use.
Force the use of different mechanisms to allocate aligned local data.  */
 #undef MAX_STACK_ALIGNMENT
 #define MAX_STACK_ALIGNMENT  (TARGET_SEH ? 128 : MAX_OFILE_ALIGNMENT)
 
-/* 32-bit Windows aligns the stack on a 4-byte boundary but SSE instructions
-   may require 16-byte alignment.  */
-#undef STACK_REALIGN_DEFAULT
-#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)
-
 /* Support hooks for SEH.  */
 #undef  TARGET_ASM_UNWIND_EMIT
 #define TARGET_ASM_UNWIND_EMIT  i386_pe_seh_unwind_emit
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5ad47e194348..d517f36362d2 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7942,6 +7942,15 @@ ix86_update_stack_boundary (void)
   if (ix86_tls_descriptor_calls_expanded_in_cfun
   && crtl->preferred_stack_boundary < 128)
 crtl->preferred_stack_boundary = 128;
+
+  /* For 32-bit MS ABI, both the incoming and preferred stack boundaries
+ are 32 bits, but if force_align_arg_pointer is specified, it should
+ prefer 128 bits for a backward-compatibility reason, which is also
+ what the doc suggests.  */
+  if (lookup_attribute ("force_align_arg_pointer",
+   TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl)))
+  && crtl->preferred_stack_boundary < 128)
+crtl->preferred_stack_boundary = 128;
 }
 
 /* Handle the TARGET_GET_DRAP_RTX hook.  Return NULL if no DRAP is
-- 
2.49.0





Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-05-03 Thread Luc Grosheintz




On 4/30/25 7:13 AM, Tomasz Kaminski wrote:

Hi,

As we will be landing patches for extents, this will become a separate
patch series.
I would prefer, if you could commit per layout, and start with layout_right
(default)
I try to provide prompt responses, so if that works better for you, you can
post a patch
only with this layout first, as most of the comments will apply to all of
them.

For the general design we have constructors that allow conversion between
rank-0
and rank-1 layouts left and right. This is done because they essentially
represents
the same layout. I think we could benefit from that in code by having a
base classes
for rank0 and rank1 mapping:
template
_Rank0_mapping_base
{
static_assert(_Extents::rank() == 0);

template
// explicit, requires goes here
_Rank0_mapping_base(_Rank0_mapping_base);

 // All members layout_type goes her
};

template
_Rank1_mapping_base
{
static_assert(_Extents::rank() == 1);
   // Static assert for product is much simpler here, as we need to check one

template
// explicit, requires goes here
_Rank1_mapping_base(_Rank1_mapping_base);

   // Call operator can also be simplified
   index_type operator()(index_type i) const // conversion happens at user
side

   // constructor from strided_layout of Rank1 goes here.

 // All members layout_type goes her
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0
and layout_left/right to use _Rank1_mapping as base for rank()1;
template
struct extents {};

struct layout
{
template
struct mapping
{
// static assert that Extents mmyst be specialization of _Extents goes here.
}
};

template
struct layout::mapping>
: _Rank0_mapping_base>
{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit;
mapping(_Rank0_mapping_base> const&);
};

template
struct layout::mapping>
: _Rank1_mapping_base>

{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit, allows construction from layout_right
mapping(_Rank1_mapping_base> const&);
};
};

template
requires sizeof..(_Ext) > = 2
struct layout::mapping>

The last one is a generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from
standard text here.



Thank for reviewing and offering fast review cycles, I can't say I've
ever felt that they were anything but wonderfully fast and I appologise
for the delay (I've been away hiking for two days).

The reason I implement all three is that I needed to see them all.
Otherwise, I can see and "feel" the impact of the duplication (or
efforts to reduce duplication). It's also to make sure I understand
precisely how the layouts are similar and different. The idea is that
you'd review one at a time; and by adding the others you can pick which
one and have a glance at the other if it's helpful during review.

The review contains three topics. This email responds to the idea of
introducing a common base class. I believe I superficially understand
the request. However, it's not clear to me what we gain.

The reorganization seems to stress how rank 0 and rank 1 layouts are
similar; at the cost of making the uniformity of layout_left (regardless
of rank) and layout_right (regardless of rank) less obvious. Personally,
I quite like that we can express all layouts of one kind regardless of
their rank, without resorting to special cases via specialization.

To me the standard reads like the layouts are three separate,
independent entities. However, because it would be too tedious to not
allow conversion between layouts of rank 0 and 1, a couple of ctors were
added. An example of how the layouts are not considered related is that
we can't compare layout_left and layout_right mappings for equality.

If I count, the current implementation has 3 copies: layout_left,
layout_right, layout_stride. The proposed changes add two (likely three)
base classes which require three specializations of layout_right and
layout_left each. Therefore I end up with 6 copies of layout_left and
layout_right; and 2 (or 3) base classes. Or if we manage it use the base
classes for all layouts: 9 specialization, 2 (or 3) base classes.
Therefore, in terms of numbers the restructuring doesn't seem favourable
and I feel would require noticeably more repetition in the tests.

While the rank zero and rank one versions are a lot simpler than the
generic copy, they aren't entirely empty either (we'll likely need to
use `using super::*` several times). Furthermore, I don't see any real
simplifications in the generic copy. Meaning we still have a copy that
has all the complexities and could (almost) handle the other two cases.

Since layout_stride is a little different, I'm not sure we can reuse the
base classes for it. For example it allows conversion more freely, but
is

[pushed] c++: let plain -Wabi warn about future changes

2025-05-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

c_common_post_options limits flag_abi_version and flag_abi_compat_version to
actual ABI version numbers, but let's not do that for warn_abi_version; we
might want to add a warning relative to a future ABI version that isn't
available in the current release, such backporting the PR120012 warning.

Also allow plain -Wabi to include such a warning without complaining that
it's useless.

Also warn about an unsupported -fabi-version argument.
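
For example, judging from the constants in the hunk below, an option like
-fabi-version=25 would now get a warning that it is not supported and that
the latest version, 21, is used instead.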

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Let plain -Wabi warn
about changes in a future version.
---
 gcc/c-family/c-opts.cc | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 40163821948..f1c276f07cd 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1085,12 +1085,21 @@ c_common_post_options (const char **pfilename)
   /* Change flag_abi_version to be the actual current ABI level, for the
  benefit of c_cpp_builtins, and to make comparison simpler.  */
   const int latest_abi_version = 21;
+  /* Possibly different for non-default ABI fixes within a release.  */
+  const int default_abi_version = latest_abi_version;
   /* Generate compatibility aliases for ABI v18 (GCC 13) by default.  */
   const int abi_compat_default = 18;
 
+  if (flag_abi_version > latest_abi_version)
+warning (0, "%<-fabi-version=%d%> is not supported, using =%d",
+flag_abi_version, latest_abi_version);
+
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  flag_abi_version, default_abi_version);
+
 #define clamp(X) if (X == 0 || X > latest_abi_version) X = latest_abi_version
   clamp (flag_abi_version);
-  clamp (warn_abi_version);
+  /* Don't clamp warn_abi_version, let it be 0 or out of bounds.  */
   clamp (flag_abi_compat_version);
 #undef clamp
 
@@ -1101,24 +1110,17 @@ c_common_post_options (const char **pfilename)
 flag_abi_compat_version = warn_abi_version;
   else if (warn_abi_version == -1 && flag_abi_compat_version == -1)
 {
-  warn_abi_version = latest_abi_version;
-  if (flag_abi_version == latest_abi_version)
-   {
- auto_diagnostic_group d;
- if (warning (OPT_Wabi, "%<-Wabi%> won%'t warn about anything"))
-   {
- inform (input_location, "%<-Wabi%> warns about differences "
- "from the most up-to-date ABI, which is also used "
- "by default");
- inform (input_location, "use e.g. %<-Wabi=11%> to warn about "
- "changes from GCC 7");
-   }
- flag_abi_compat_version = abi_compat_default;
-   }
+  warn_abi_version = 0;
+  if (flag_abi_version == default_abi_version)
+   flag_abi_compat_version = abi_compat_default;
   else
flag_abi_compat_version = latest_abi_version;
 }
 
+  /* Allow warnings vs ABI versions beyond what we currently support.  */
+  if (warn_abi_version == 0)
+warn_abi_version = 1000;
+
   /* By default, enable the new inheriting constructor semantics along with ABI
  11.  New and old should coexist fine, but it is a change in what
  artificial symbols are generated.  */

base-commit: a63d871eac0e57002b4ab4e1522f3f3851183b5e
-- 
2.49.0



[pushed] c++: add fixed testcase [PR85944]

2025-05-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

This testcase was incidentally fixed by r16-325 for PR119162.

PR c++/85944

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-temp3.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/constexpr-temp3.C | 8 
 1 file changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-temp3.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-temp3.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-temp3.C
new file mode 100644
index 000..584472c7c84
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-temp3.C
@@ -0,0 +1,8 @@
+// PR c++/85944
+// { dg-do compile { target c++11 } }
+
+constexpr bool f(int const & x) {
+return &x;
+}
+
+constexpr auto x = f(0);

base-commit: 20d184e3f84d859e7e9f44a8d91772a02b658872
-- 
2.49.0



Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread Jeff Law




On 5/3/25 4:52 AM, Richard Biener wrote:

On Fri, 2 May 2025, Paul Koning wrote:





On May 2, 2025, at 12:27 PM, Maciej W. Rozycki  wrote:

...
NB I understand your position and the need to cut the line sometime, and
I knew what the situation is with the VAX backend and that it would be
manageable.  In principle it might be that it's only that single ICE that
needs debugging before we can claim LRA usable, even if poor RISC-like
code results (FWIW the PDP-11 backend suffers from the same fate).  As
long as user code builds and runs we can improve code gen gradually.


Indeed, I have noticed that LRA doesn't take advantage of PDP-11 (and I
would guess VAX) addressing modes not found in RISC type machines.  A
notable example are the pre-dec and post-inc modes, and I think memory
indirect (i.e., MEM(MEM(xyz)) modes).  What isn't clear to me is whether
there is interest in LRA doing those things, or if the answer is that
they only are part of reload and therefore now unsupported.  I would
like to be able to take advantage of those features, but have not dug
into it so far -- modern register allocators are rather intimidating
beasts.


I think the auto-inc/dec addressing modes are used by many targets so
there's definitely the chance to adapt LRA (and/or the auto-inc/dec pass)
to handle these better.  There's also the chance this is easier to pull
off when we do not have to care for the IRA + reload combo not regressing.
Yup.  There's no inherent reason why an auto-inc/auto-dec target won't 
work with LRA  (we have some in the tree), but there's probably places 
that support can be improved.




As for MEM(MEM(xyz)) addressing modes I'm less sure - I suppose those
are usually formed at RTL expansion time (rather than, say, by
RTL combine)?  If PDP-11 is the only target with those then it might
be easier to recover those post-LRA during late-combine or peephole
or alternatively in a target specific pass?  But of course I know
nothing about the constraints of said addressing mode or the challenges
those present to LRA.
Double-indirect is *very* uncommon.  I wouldn't want to pollute LRA with 
support for those addressing modes if we could avoid it.  Pushing 
it to the target seems better from an overall project maintenance 
standpoint, though it may not be the best solution for the targets which 
support double-indirect addressing modes.
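
Just to make the modes concrete for anyone who hasn't stared at these 
ports, here's a rough sketch of the kind of source they pay off for.  
Purely illustrative code, not taken from any port or testcase:

/* *dst++ / *src++ map naturally onto post-increment addressing, and
   loading a pointer from memory and then dereferencing it is the
   double-indirect (MEM (MEM (...))) shape discussed above.  */
void
copy_words (int *dst, const int *src, int n)
{
  while (n-- > 0)
    *dst++ = *src++;          /* post-increment on both operands */
}

int
sum_through_table (int *const *table, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += *table[i];         /* pointer loaded from memory, then loaded through */
  return sum;
}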


jeff



Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-05-03 Thread Luc Grosheintz

Topic: follow up question about operator() for layout_stride.

On 4/30/25 7:13 AM, Tomasz Kaminski wrote:

Hi,

As we will be landing patches for extends, this will become a separate
patch series.
I would prefer, if you could commit per layout, and start with layout_right
(default)
I try to provide prompt responses, so if that works better for you, you can
post a patch
only with this layout first, as most of the comments will apply to all of
them.

For the general design we have constructors that allow conversion between
rank-0
and rank-1 layouts left and right. This is done because they essentially
represent
the same layout. I think we could benefit from that in code by having a
base classes
for rank0 and rank1 mapping:
template
_Rank0_mapping_base
{
static_assert(_Extents::rank() == 0);

template
// explicit, requires goes here
_Rank0_mapping_base(_Rank0_mapping_base);

 // All members layout_type goes here
};

template
_Rank1_mapping_base
{
static_assert(_Extents::rank() == 1);
   // Static assert for product is much simpler here, as we need to check one

template
// explicit, requires goes here
_Rank1_mapping_base(_Rank1_mapping_base);

   // Call operator can also be simplified
   index_type operator()(index_type i) const // conversion happens at user
side

   // constructor from strided_layout of Rank1 goes here.

 // All members layout_type goes here
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0
and layout_left/right to use _Rank1_mapping as base for rank() == 1;
template
struct extents {};

struct layout
{
template
struct mapping
{
// static assert that Extents must be a specialization of _Extents goes here.
}
};

template
struct layout::mapping>
: _Rank0_mapping_base>
{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit;
mapping(_Rank0_mapping_base> const&);
};

template
struct layout::mapping>
: _Rank1_mapping_base>

{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit, allows construction from layout_right
mapping(_Rank1_mapping_base> const&);
};
};

template
requires (sizeof...(_Ext) >= 2)
struct layout::mapping>

The last one is a generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from
standard text here.


On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz 
wrote:


Implements the parts of layout_left that don't depend on any of the
other layouts.

libstdc++/ChangeLog:

 * include/std/mdspan (layout_left): New class.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan | 179 
  1 file changed, 179 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 39ced1d6301..e05048a5b93 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

namespace __mdspan
{
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __fwd = 1;
+   for(size_t __i = 0; __i < __r; ++__i)
+ __fwd *= __exts.extent(__i);
+   return __fwd;
+  }


As we are inside the standard library implementation, we can do some tricks
here,
and provide two functions:
// Returns the std::span(_ExtentsStorage::_Ext).substr(f, l);
// For extents forward to __static_exts
span __static_exts(size_t f, size_t l);
// Returns the
std::span(_ExtentsStorage::_M_dynamic_extents).substr(_S_dynamic_index[f],
_S_dynamic_index[l);
span __dynamic_exts(Extents const& c);
Then you can befriend this function both to extents and _ExtentsStorage.
Also add index_type members to _ExtentsStorage.

Then instead of having fwd-prod and rev-prod I would have:
template
consteval size_t __static_ext_prod(size_t f, size_t l)
{
   // multiply E != dynamic_ext from __static_exts
}
constexpr size __ext_prod(const _Extents& __exts, size_t f, size_t l)
{
// multiply __static_ext_prod<_Extents>(f, l) and each elements of
__dynamic_exts(__exts, f, l);
}

Then fwd-prod(e, n) would be __ext_prod(e, 0, n), and rev_prod(e, n) would
be __ext_prod(e, __ext.rank() -n, n, __ext.rank())



+
+template
+  constexpr typename _Extents::index_type
+  __rev_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __rev = 1;
+   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
+ __rev *= __exts.extent(__i);
+   return __rev;
+  }
+
  template
auto __build_dextents_type(integer_sequence)
 -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
@@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  explicit extents(_Integrals...) ->
 

Re: [PATCH] cobol, v2: Fix up cobol cross-compilation from 32-bit arches [PR119364]

2025-05-03 Thread Bernhard Reutner-Fischer
On 3 May 2025 13:53:02 CEST, Richard Biener  wrote:
>On Sat, May 3, 2025 at 1:44 PM Jakub Jelinek  wrote:
>>
>> On Sat, May 03, 2025 at 01:32:35PM +0200, Richard Biener wrote:
>> > Any reason for unsigned long long vs. uint64_t for the following?
>> >
>> > -size_t attr;// See cbl_field_attr_t
>> > +unsigned long long attr;// See cbl_field_attr_t
>>
>> Because it needs to be handled the same by the compiler, and it was easier
>> to use ULONGLONG, "attr" than figure out what exact type uint64_t has.
>> The C FE has
>>   if (UINT64_TYPE)
>> c_uint64_type_node = uint64_type_node =
>>   TREE_TYPE (identifier_global_value (c_get_ident (UINT64_TYPE)));
>> but the COBOL FE doesn't have that.
>> For aliasing etc. reasons I'm afraid we can't use
>> build_nonstandard_integer_type (64, 1);
>
>Ah, I was hoping the stdint type nodes were initialized by the middle-end.

no, each fe has that dance one way or the other. hmz

>
>Richard.
>
>>
>> Jakub
>>



Re: [PATCH] get_known_nonzero_bits_1 should use wi::bit_and_not [PR118659]

2025-05-03 Thread Bernhard Reutner-Fischer
On 1 May 2025 17:01:16 CEST, Andrew Pinski  wrote:
>While looking into bitwise optimizations, I noticed that
>get_known_nonzero_bits_1 does `bm.value () & ~bm.mask ()` which
>is ok except it creates a temporary wide_int. Instead if we
>use wi::bit_and_not, we can avoid the temporary and on some


did you look, by chance, if there are other such spots?

just curious
thanks


Re: [PATCH] get_known_nonzero_bits_1 should use wi::bit_and_not [PR118659]

2025-05-03 Thread Andrew Pinski
On Sat, May 3, 2025 at 1:03 PM Bernhard Reutner-Fischer
 wrote:
>
> On 1 May 2025 17:01:16 CEST, Andrew Pinski  wrote:
> >While looking into bitwise optimizations, I noticed that
> >get_known_nonzero_bits_1 does `bm.value () & ~bm.mask ()` which
> >is ok except it creates a temporary wide_int. Instead if we
> >use wi::bit_and_not, we can avoid the temporary and on some
>
>
> did you look, by chance, if there are other such spots?

No, I have not looked. I just happened to come across this one while
working on some other unrelated optimization.

Thanks,
Andrew

>
> just curious
> thanks


Re: [PATCH] get_known_nonzero_bits_1 should use wi::bit_and_not [PR118659]

2025-05-03 Thread Bernhard Reutner-Fischer


>No, I have not looked.

In IPA, we should be able to see them, the mask as not requiring the tmp, but 
the same otherwise


Re: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-05-03 Thread Jeff Law




On 5/3/25 2:20 AM, Li, Pan2 wrote:

There are still some nagging issues (which I'll describe in a separate email), but
I think we can go ahead and merge this one too to close the loop.


Thanks Vineet, but I'd like to wait the ack from Jeff before commit.
Go ahead.  I just wanted to make sure the two (actually three!) parallel 
efforts in this space weren't going to conflict or make some patches 
redundant.


jeff



Re: [patch, Fortran] Fix PR 119928, rejects-valid 15/16 regression

2025-05-03 Thread Harald Anlauf

Hi Thomas,

I haven't tested your patch very thoroughly, but when manually
compiling

% gfc-16 gcc/testsuite/gfortran.dg/proc_ptr_52.f90 
-Wexternal-argument-mismatch && ./a.out

STOP 1

It appears that something is not right and generates wrong code with
the check enabled.  Can you have another look?

Cheers,
Harald

Am 03.05.25 um 11:11 schrieb Thomas Koenig:

Hello world,

This patch fixes a case where too much was being checked with
-Wexternal-arguments-mismatch with a procedure pointer with an
unlimited polymorphic and an INTEGER argument which was inferred from
an actual argument.I also found some checks which can trigger false
positives, which this patch also excludes from testing.

Regression-tested.

OK for trunk and backport to gcc-15?

Best regards

 Thomas

gcc/fortran/ChangeLog:

 PR fortran/119928
 * interface.cc (gfc_check_dummy_characteristics): Do not issue
 error for type if one argument is an unlimited polymorphic entity
 and the other one has been generated from an actual argument.
 Do not check OPTIONAL, INTENT, ALLOCATABLE, POINTER, TARGET, VALUE,
 ASYNCHRONOUS or CONTIGUOUS if one of the arguments has been
 generated from an actual argument.

gcc/testsuite/ChangeLog:

 PR fortran/119928
 * gfortran.dg/interface_60.f90: New test.




[to-be-committed][RISC-V] Adjust testcases and finish register move costing fix

2025-05-03 Thread Jeff Law
The recent adjustment to more correctly cost register moves tripped a 
few testsuite regressions.


I'm pretty torn on the thead test adjustments.  But in reality they only 
worked because the register move costing was broken.  So I've reverted 
the scan-asm part of those to a prior state for two of those tests.  The 
other was only failing at -Og/-Oz which was added to the exclude list.


The other Zfa test is similar, but we can make the test behave with a 
suitable -mtune option and thus preserve the test.


While investigating I also noted that vector moves aren't being handled 
correctly for subclasses of the integer/fp register files.  So I fixed 
those while I was in there.


Note this may have an impact on some of your work Pan.  I haven't 
followed the changes from the last week or so due to illness.


Waiting on pre-commit's verdict.  It did spin through my tester 
successfully, though not all of the regressions related to that change 
are addressed (there's still one for rv32 I'll look at shortly).


jeff

gcc/
* config/riscv/riscv.cc (riscv_register_move_cost): Handle
subclasses with vector registers as well.

gcc/testsuite/

* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: Adjust expected
output.
* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: Likewise.
* gcc.target/riscv/xtheadfmv-fmv.c: Skip for -Os and -Oz.
* gcc.target/riscv/zfa-fmovh-fmovp.c: Use sifive-p400 tuning.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ed635ab42f4..9f13eeedea8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9659,17 +9659,17 @@ riscv_register_move_cost (machine_mode mode,
 
   if (from == V_REGS)
 {
-  if (to == GR_REGS)
+  if (to_is_gpr)
return get_vector_costs ()->regmove->VR2GR;
-  else if (to == FP_REGS)
+  else if (to_is_fpr)
return get_vector_costs ()->regmove->VR2FR;
 }
 
   if (to == V_REGS)
 {
-  if (from == GR_REGS)
+  if (from_is_gpr)
return get_vector_costs ()->regmove->GR2VR;
-  else if (from == FP_REGS)
+  else if (from_is_fpr)
return get_vector_costs ()->regmove->FR2VR;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c
index 6746c314057..38966fefad5 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c
@@ -35,6 +35,4 @@ double foo (int i, int j)
   return z;
 }
 
-/* { dg-final { scan-assembler-not {\mth\.flrd\M} } } */
-/* { dg-final { scan-assembler-times {\mlw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mth\.fmv\.hw\.x\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mth\.flrd\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c
index fb1ac2b735c..f0d9c80d16f 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c
@@ -35,6 +35,4 @@ double foo (int i, int j)
   return z;
 }
 
-/* { dg-final { scan-assembler-not {\mth\.flrd\M} } } */
-/* { dg-final { scan-assembler-times {\mlw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mfmvp\.d\.x\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mth\.flrd\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c b/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
index 5a52adce36a..150cfd7fc05 100644
--- a/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32g_zfa -mabi=ilp32 -O0" } */
+/* { dg-options "-march=rv32g_zfa -mabi=ilp32 -O0 -mtune=sifive-p400-series" } */
 
 double foo(long long a)
 {
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 9b4e2378448..81b240eac57 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { rv32 } } } */
 /* { dg-options "-march=rv32gc_xtheadfmv -mabi=ilp32d" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz"} } */
 
 double
 ll2d (long long ll)


Update GCC16 to use libffi 3.4.8

2025-05-03 Thread Simon Sobisch

As GCC 15, with its default -std=gnu23, now strictly interprets a declaration
such as
int *func()
as taking exactly zero arguments, I'm looking into a dynamic option that
would work for C23.  I noticed that libffi is built as part of GCC and is
part of its source tree, which is possibly a way to go (unknown number of
arguments between 0 and 252).
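
To make that concrete, here is a minimal sketch of what I have in mind
(illustration only; treating every argument as "long" and the 252-slot
arrays are assumptions made just for this example, not how the real code
would have to work):

#include <ffi.h>

/* Call through a generic function pointer with an argument count that
   is only known at run time.  */
long
call_dynamic (void (*fn) (void), long *args, unsigned int nargs)
{
  ffi_cif cif;
  ffi_type *arg_types[252];
  void *arg_values[252];

  for (unsigned int i = 0; i < nargs; i++)
    {
      arg_types[i] = &ffi_type_slong;
      arg_values[i] = &args[i];
    }

  ffi_arg result = 0;
  if (ffi_prep_cif (&cif, FFI_DEFAULT_ABI, nargs,
                    &ffi_type_slong, arg_types) == FFI_OK)
    ffi_call (&cif, fn, &result, arg_values);

  return (long) result;
}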


But then I've seen that GCC's in-tree libffi was last updated in 2021, 
while the current one is from this year.


So I do wonder:

1 Which parts of GCC use libffi?
2 Is it linked in statically for GCC's usage (I'd see no problem via its
  MIT license to put it anywhere)?
3 Is there a reason to _not_ update it in GCC16 to the most current
  version?


Bonus question: for my case of calling functions with a compile-time 
unknown number of arguments through a shared function pointer - would you 
suggest using something else?


Re: [PATCH v16] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2025-05-03 Thread Nicolas Boulenguez
Hello.
The bug log contains a slightly improved v16.

Marc Poulhiès:
> issue we currently see is that some application code doesn't build. In
> particular gprbuild needs to be adjusted following your changes. We don't want
> to split the ecosystem between code that builds with latest compiler only and
> code that builds with older compilers only.
> We're currently discussing the possible solutions. The case of gprbuild is 
> very
> localized and comes from the removal of the Timeval type in the
> GNAT.Sockets.Thin_Common package, but we fear that this pattern may exist in
> many other applications. Testing the changes is not straightforward as we need
> to adjust many downstream projects (gprbuild as already mentioned, but also 
> many
> custom runtimes that now fail to build, and probably others that we will
> discover as we fix previous failures...).

The types in System.C_Time could remain visible for a while, with
Obsolescent warnings that they will eventually be private.
Then, types removed by patch 1 could be replaced with subtypes.
For example, in GNAT.Sockets.Thin_Common:

  subtype time_t is System.C_Time.time_t;
  pragma Obsolescent (time_t, "please use System.C_Time");
  subtype suseconds_t is System.C_Time.usec_t;
  pragma Obsolescent (suseconds_t, "please use System.C_Time");
  subtype timeval is System.C_Time.timeval;
  pragma Obsolescent (timeval, "please use System.C_Time");

Except for the casing, the timeval/spec record components were
consistently named, so most user code accessing them should build
unchanged.

Would this solve the issues you are encountering?

If so, it does not cost much to replace each visible removal in patch
1 with a subtype or a function renaming (and an Obsolescent pragma),
just in case someone outside GNAT is relying on it.


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread Paul Koning



> On May 3, 2025, at 6:52 AM, Richard Biener  wrote:
> 
> On Fri, 2 May 2025, Paul Koning wrote:
> 
>> 
>> 
>>> On May 2, 2025, at 12:27 PM, Maciej W. Rozycki  wrote:
>>> 
>>> ...
>>> NB I understand your position and the need to cut the line sometime, and 
>>> I knew what the situation is with the VAX backend and that it would be 
>>> manageable.  In principle it might be that it's only that single ICE that 
>>> needs debugging before we can claim LRA usable, even if poor RISC-like 
>>> code results (FWIW the PDP-11 backend suffers from the same fate).  As 
>>> long as user code builds and runs we can improve code gen gradually.
>> 
>> Indeed, I have noticed that LRA doesn't take advantage of PDP-11 (and I 
>> would guess VAX) addressing modes not found in RISC type machines.  A 
>> notable example are the pre-dec and post-inc modes, and I think memory 
>> indirect (i.e., MEM(MEM(xyz)) modes).  What isn't clear to me is whether 
>> there is interest in LRA doing those things, or if the answer is that 
>> they only are part of reload and therefore now unsupported.  I would 
>> like to be able to take advantage of those features, but have not dug 
>> into it so far -- modern register allocators are rather intimidating 
>> beasts.
> 
> I think the auto-inc/dec addressing modes are used by many targets so
> there's definitely the chance to adapt LRA (and/or the auto-inc/dec pass)
> to handle these better.  There's also the chance this is easier to pull
> off when we do not have to care for the IRA + reload combo not regressing.

Thanks, that's helpful to know.

> As for MEM(MEM(xyz)) addressing modes I'm less sure - I suppose those
> are usually formed at RTL expansion time (rather than, say, by
> RTL combine)?  If PDP-11 is the only target with those then it might
> be easier to recover those post-LRA during late-combine or peephole
> or alternatively in a target specific pass?  But of course I know
> nothing about the constraints of said addressing mode or the challenges
> those present to LRA.

VAX also has them, in fact VAX addressing modes are a superset of the PDP11 
ones.  

paul



[PATCH] Fortran: array subreferences and components of derived types [PR119986]

2025-05-03 Thread Harald Anlauf

Dear all,

the attached, semi-obvious patch fixes bugs with the passing of
array subreferences when either an inquiry reference to a complex array
or a substring reference to a character array was involved, and the
array was a component of a derived type.  The obvious cause was always
an early termination of the scan of the reference.

The original PR was about issues with complex arrays, but since I was aware of
a similar issue for substrings, I fixed that at the same time.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is a hideous wrong-code bug, I'd like to backport
to at least 15-branch, if this is ok.

Thanks,
Harald

From 8d49cd9e0fe76d2c45495017cb87588e9b9824cf Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sat, 3 May 2025 20:35:57 +0200
Subject: [PATCH] Fortran: array subreferences and components of derived types
 [PR119986]

	PR fortran/119986

gcc/fortran/ChangeLog:

	* expr.cc (is_subref_array): When searching for array references,
	do not terminate early so that inquiry references to complex
	components work.
	* primary.cc (gfc_variable_attr): A substring reference can refer
	to either a scalar or array character variable.  Adjust search
	accordingly.

gcc/testsuite/ChangeLog:

	* gfortran.dg/actual_array_subref.f90: New test.
---
 gcc/fortran/expr.cc   |   1 +
 gcc/fortran/primary.cc|  13 ++-
 .../gfortran.dg/actual_array_subref.f90   | 103 ++
 3 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/actual_array_subref.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 07e9bac37a1..92a9ebdcbe8 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -1194,6 +1194,7 @@ is_subref_array (gfc_expr * e)
 	 what follows cannot be a subreference array, unless there is a
 	 substring reference.  */
   if (!seen_array && ref->type == REF_COMPONENT
+	  && ref->next == NULL
 	  && ref->u.c.component->ts.type != BT_CHARACTER
 	  && ref->u.c.component->ts.type != BT_CLASS
 	  && !gfc_bt_struct (ref->u.c.component->ts.type))
diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 161d4c26964..72ecc7ccf93 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2893,6 +2893,7 @@ gfc_variable_attr (gfc_expr *expr, gfc_typespec *ts)
   gfc_symbol *sym;
   gfc_component *comp;
   bool has_inquiry_part;
+  bool has_substring_ref = false;
 
   if (expr->expr_type != EXPR_VARIABLE
   && expr->expr_type != EXPR_FUNCTION
@@ -2955,7 +2956,12 @@ gfc_variable_attr (gfc_expr *expr, gfc_typespec *ts)
 
   has_inquiry_part = false;
   for (ref = expr->ref; ref; ref = ref->next)
-if (ref->type == REF_INQUIRY)
+if (ref->type == REF_SUBSTRING)
+  {
+	has_substring_ref = true;
+	optional = false;
+  }
+else if (ref->type == REF_INQUIRY)
   {
 	has_inquiry_part = true;
 	optional = false;
@@ -3003,9 +3009,8 @@ gfc_variable_attr (gfc_expr *expr, gfc_typespec *ts)
 	*ts = comp->ts;
 	/* Don't set the string length if a substring reference
 	   follows.  */
-	if (ts->type == BT_CHARACTER
-		&& ref->next && ref->next->type == REF_SUBSTRING)
-		ts->u.cl = NULL;
+	if (ts->type == BT_CHARACTER && has_substring_ref)
+	  ts->u.cl = NULL;
 	  }
 
 	if (comp->ts.type == BT_CLASS)
diff --git a/gcc/testsuite/gfortran.dg/actual_array_subref.f90 b/gcc/testsuite/gfortran.dg/actual_array_subref.f90
new file mode 100644
index 000..932d7aba121
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/actual_array_subref.f90
@@ -0,0 +1,103 @@
+! { dg-do run }
+! { dg-additional-options "-O2 -fcheck=bounds" }
+!
+! PR fortran/119986
+!
+! Check passing of inquiry references of complex arrays and substring
+! references of character arrays when these are components of derived types.
+!
+! Extended version of report by Neil Carlson.
+
+program main
+  implicit none
+  integer :: j
+
+  complex, parameter  :: z0(*) = [(cmplx(j,-j),j=1,4)]
+  type :: cx
+ real :: re
+ real :: im
+  end type cx
+  type(cx), parameter :: c0(*) = [(cx   (j,-j),j=1,4)]
+
+  type :: my_type
+ complex  :: z(4) = z0
+ type(cx) :: c(4) = c0
+  end type my_type
+  type(my_type) :: x
+
+  character(*), parameter :: s0(*) = ["abcd","efgh","ijkl","mnop"]
+  character(*), parameter :: expect(*) = s0(:)(2:3)
+  character(len(s0))  :: s1(4) = s0
+
+  type :: str1
+ character(len(s0))   :: s(4)  = s0
+  end type str1
+  type(str1) :: string1
+
+  type :: str2
+ character(:), allocatable :: s(:)
+  end type str2
+  type(str2) :: string2
+
+  integer :: stopcode = 0
+
+  if (len(expect) /= 2)stop 1
+  if (expect(4)   /= "no") stop 2
+  if (any(c0 %re  /= [ 1, 2, 3, 4])) stop 3
+  if (any(c0 %im  /= [-1,-2,-3,-4])) stop 4
+
+  stopcode = 10
+  call fubar ( x%z %re, x%z %im)
+  call fubar ( x%c %re, x%c %im)
+
+  stopcode = 20
+  call fubar ((x%z %re), (x%z %im))
+  call fubar ((x%c %re), (x%c %im))
+
+  stopcode = 30
+  call fubar ([x%z %r

Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-05-03 Thread Luc Grosheintz

Topics of this chain:
 - computing __fwd_prod and __rev_prod.
 - checking representability preconditions.

On 4/30/25 7:13 AM, Tomasz Kaminski wrote:

Hi,

As we will be landing patches for extends, this will become a separate
patch series.
I would prefer, if you could commit per layout, and start with layout_right
(default)
I try to provide prompt responses, so if that works better for you, you can
post a patch
only with this layout first, as most of the comments will apply to all of
them.

For the general design we have constructors that allow conversion between
rank-0
and rank-1 layouts left and right. This is done because they essentially
represents
the same layout. I think we could benefit from that in code by having a
base classes
for rank0 and rank1 mapping:
template
_Rank0_mapping_base
{
static_assert(_Extents::rank() == 0);

template
// explicit, requires goes here
_Rank0_mapping_base(_Rank0_mapping_base);

 // All members layout_type goes her
};

template
_Rank1_mapping_base
{
static_assert(_Extents::rank() == 1);
   // Static assert for product is much simpler here, as we need to check one

template
// explicit, requires goes here
_Rank1_mapping_base(_Rank1_mapping_base);

   // Call operator can also be simplified
   index_type operator()(index_type i) const // conversion happens at user
side

   // cosntructor from strided_layout of Rank1 goes here.

 // All members layout_type goes her
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0
and layout_left/right to use _Rank1_mapping as base for rank()1;
template
struct extents {};

struct layout
{
template
struct mapping
{
// static assert that Extents mmyst be specialization of _Extents goes here.
}
};

template
struct layout::mapping>
: _Rank0_mapping_base>
{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit;
mapping(_Rank0_mapping_base> const&);
};

template
struct layout::mapping>
: _Rank1_mapping_base>

{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit, allows construction from layout_right
mapping(_Rank1_mapping_base> const&);
};
};

template
requires sizeof..(_Ext) > = 2
struct layout::mapping>

The last one is a generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from
standard text here.


On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz 
wrote:


Implements the parts of layout_left that don't depend on any of the
other layouts.

libstdc++/ChangeLog:

 * include/std/mdspan (layout_left): New class.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan | 179 
  1 file changed, 179 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 39ced1d6301..e05048a5b93 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

namespace __mdspan
{
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __fwd = 1;
+   for(size_t __i = 0; __i < __r; ++__i)
+ __fwd *= __exts.extent(__i);
+   return __fwd;
+  }


As we are inside the standard library implementation, we can do some tricks
here,
and provide two functions:
// Returns the std::span(_ExtentsStorage::_Ext).substr(f, l);
// For extents forward to __static_exts
span __static_exts(size_t f, size_t l);
// Returns the
std::span(_ExtentsStorage::_M_dynamic_extents).substr(_S_dynamic_index[f],
_S_dynamic_index[l);
span __dynamic_exts(Extents const& c);
Then you can befriend this function both to extents and _ExtentsStorage.
Also add index_type members to _ExtentsStorage.

Then instead of having fwd-prod and rev-prod I would have:
template
consteval size_t __static_ext_prod(size_t f, size_t l)
{
   // multiply E != dynamic_ext from __static_exts
}
constexpr size __ext_prod(const _Extents& __exts, size_t f, size_t l)
{
// multiply __static_ext_prod<_Extents>(f, l) and each elements of
__dynamic_exts(__exts, f, l);
}

Then fwd-prod(e, n) would be __ext_prod(e, 0, n), and rev_prod(e, n) would
be __ext_prod(e, __ext.rank() -n, n, __ext.rank())


This makes a lot of sense (and I'd briefly thought of it before forgetting
again before submission).
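
To check that I've read the suggestion correctly, something along these
lines?  (Rough sketch only; it uses the public extent()/rank() interface
rather than _ExtentsStorage internals, and leaves __static_ext_prod out.)

  template<typename _Extents>
    constexpr typename _Extents::index_type
    __ext_prod(const _Extents& __exts, size_t __f, size_t __l) noexcept
    {
      typename _Extents::index_type __prod = 1;
      for (size_t __i = __f; __i < __l; ++__i)
        __prod *= __exts.extent(__i);
      return __prod;
    }

  // With the definitions from the posted patch:
  //   __fwd_prod(__exts, __r) == __ext_prod(__exts, 0, __r)
  //   __rev_prod(__exts, __r) == __ext_prod(__exts, __r + 1, __exts.rank())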




+
+template
+  constexpr typename _Extents::index_type
+  __rev_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __rev = 1;
+   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
+ __rev *= __exts.extent(__i);
+   return __rev;
+  }
+
  template
auto __build_dextents_type(integer_sequence)
 -> extents<_In

Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-05-03 Thread Luc Grosheintz

This chain discusses changes to `mapping::operator()`. For concrete
discussion, see below. I have a general question: is there a reason
other than style to prefer folds over recursion?
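
For concreteness, the kind of difference I mean (a toy example, not
mdspan code):

  // Fold expression: a single pack expansion, one instantiation.
  template<typename... _Ts>
    constexpr auto
    __sum_fold(_Ts... __ts)
    { return (__ts + ... + 0); }

  // Recursion: one instantiation per argument and deeper backtraces.
  constexpr int
  __sum_rec()
  { return 0; }

  template<typename _Tp, typename... _Ts>
    constexpr auto
    __sum_rec(_Tp __t, _Ts... __ts)
    { return __t + __sum_rec(__ts...); }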


On 4/30/25 7:13 AM, Tomasz Kaminski wrote:

Hi,

As we will be landing patches for extends, this will become a separate
patch series.
I would prefer, if you could commit per layout, and start with layout_right
(default)
I try to provide prompt responses, so if that works better for you, you can
post a patch
only with this layout first, as most of the comments will apply to all of
them.

For the general design we have constructors that allow conversion between
rank-0
and rank-1 layouts left and right. This is done because they essentially
represents
the same layout. I think we could benefit from that in code by having a
base classes
for rank0 and rank1 mapping:
template
_Rank0_mapping_base
{
static_assert(_Extents::rank() == 0);

template
// explicit, requires goes here
_Rank0_mapping_base(_Rank0_mapping_base);

 // All members layout_type goes her
};

template
_Rank1_mapping_base
{
static_assert(_Extents::rank() == 1);
   // Static assert for product is much simpler here, as we need to check one

template
// explicit, requires goes here
_Rank1_mapping_base(_Rank1_mapping_base);

   // Call operator can also be simplified
   index_type operator()(index_type i) const // conversion happens at user
side

   // cosntructor from strided_layout of Rank1 goes here.

 // All members layout_type goes her
};
Then we will specialize layout_left/right/stride to use _Rank0_mapping_base
as a base for rank() == 0
and layout_left/right to use _Rank1_mapping as base for rank()1;
template
struct extents {};

struct layout
{
template
struct mapping
{
// static assert that Extents mmyst be specialization of _Extents goes here.
}
};

template
struct layout::mapping>
: _Rank0_mapping_base>
{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit;
mapping(_Rank0_mapping_base> const&);
};

template
struct layout::mapping>
: _Rank1_mapping_base>

{
using layout_type = layout_left;
// Provides converting constructor.
using _Rank0_mapping_base>::_Rank0_mapping_base;
// This one is implicit, allows construction from layout_right
mapping(_Rank1_mapping_base> const&);
};
};

template
requires sizeof..(_Ext) > = 2
struct layout::mapping>

The last one is a generic implementation that you can use in yours.
Please also include a comment explaining that we are deviating from
standard text here.


On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz 
wrote:


Implements the parts of layout_left that don't depend on any of the
other layouts.

libstdc++/ChangeLog:

 * include/std/mdspan (layout_left): New class.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan | 179 
  1 file changed, 179 insertions(+)

diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 39ced1d6301..e05048a5b93 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

namespace __mdspan
{
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __fwd = 1;
+   for(size_t __i = 0; __i < __r; ++__i)
+ __fwd *= __exts.extent(__i);
+   return __fwd;
+  }


As we are inside the standard library implementation, we can do some tricks
here,
and provide two functions:
// Returns the std::span(_ExtentsStorage::_Ext).substr(f, l);
// For extents forward to __static_exts
span __static_exts(size_t f, size_t l);
// Returns the
std::span(_ExtentsStorage::_M_dynamic_extents).substr(_S_dynamic_index[f],
_S_dynamic_index[l);
span __dynamic_exts(Extents const& c);
Then you can befriend this function both to extents and _ExtentsStorage.
Also add index_type members to _ExtentsStorage.

Then instead of having fwd-prod and rev-prod I would have:
template
consteval size_t __static_ext_prod(size_t f, size_t l)
{
   // multiply E != dynamic_ext from __static_exts
}
constexpr size __ext_prod(const _Extents& __exts, size_t f, size_t l)
{
// multiply __static_ext_prod<_Extents>(f, l) and each elements of
__dynamic_exts(__exts, f, l);
}

Then fwd-prod(e, n) would be __ext_prod(e, 0, n), and rev_prod(e, n) would
be __ext_prod(e, __ext.rank() -n, n, __ext.rank())



+
+template
+  constexpr typename _Extents::index_type
+  __rev_prod(const _Extents& __exts, size_t __r) noexcept
+  {
+   typename _Extents::index_type __rev = 1;
+   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
+ __rev *= __exts.extent(__i);
+   return __rev;
+  }
+
  template
auto __build_dextents_type(integer_sequence)
 -> extents<_IndexType, ((void) _Counts

Re: [PATCH] PR tree-optimization/120048 - Allow IPA_CP to handle UNDEFINED as VARYING.

2025-05-03 Thread Andrew MacLeod



On 5/3/25 07:41, Richard Biener wrote:

On Sat, May 3, 2025 at 12:39 AM Andrew MacLeod  wrote:

On trunk I'll eventually do something different.. but it will be more
invasive than I think is reasonable for a backport.

The problem in the PR is that there is a variable with a range and has a
bitmask attached to it.   We often defer bitmask processing, but the
change which triggers this problem "improves" the range by applying the
bitmask when we call update_bitmask (PR 119712).

The case in point is a range of 0, combined with a bitmask that says the
'1' bit must be on.   This results in an UNDEFINED range since it is
impossible.   This is rarely a problem, but this particular snippet of
code in IPA is tripping over it because it has checked for undefined,
and then created a new range by combining the [0, 0] and the bitmask,
which we turn into an UNDEFINED... which it doesn't expect, and then
it asks for the type of the range.

As Jakub points out in the PR, this is effectively unreachable code that
is being propagated. A harmless fix would be to check if the result of
applying the bitmask results in an UNDEFINED value and  to simply
replace it with a VARYING value.

We still reduce the testcase to "return 0" and there is no further failure.

Bootstraps on x86_64-pc-linux-gnu with no regressions.

If this is acceptable, I will push it to trunk, then also test/verify
for the GCC15 and 14(?) branches and check it in there.

LGTM.  IPA CP might want to either avoid looking at the type
for UNDEFINED or track it separate from the value-range, not
sure where it looks at the type of a range.

Richard.

It appears they don't track undefined at all? ...but that's just a 
cursory glance.


On trunk, I think I'll adjust it next week and put the type into the 
UNDEFINED range; I have a functioning patch now.  I think it's too 
pervasive, and not really enough of an issue, to do that on the release 
branches.


It'll solve a few of these kinds of things when they pop up, and allow 
us to properly do an invert () operation  on VARYING and UNDEFINED, 
which we have discussed before.


Andrew



Improve maybe_hot handling in inliner heuristics

2025-05-03 Thread Jan Hubicka
Hi,
The inliner currently applies different heuristics to hot and cold calls (the
latter are inlined only if the code size will shrink).  It may happen that the
call itself is hot, but significant time is spent in the callee and inlining
makes it faster.  For this reason we want to check whether the call is still
considered hot after scaling its count by the anticipated speedup, which is
what this patch does (it is similar to my earlier ipa-cp change).

In general I think this is less important than the ipa-cp change, since a large
benefit from inlining happens only when something useful is propagated into the
callee, which should be handled earlier by ipa-cp.  However, the patch improves
SPEC2k17 imagick runtime by about 9% as discussed in PR 119900, though that is
mostly a problem of a bad train data set which does not exercise parts of the
program that are hot for the ref data set.  As discussed in the PR log, the
particular call that needs to be inlined has a count that falls very slightly
below the cutoff, and scaling it up by the expected savings enables inlining.
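
To illustrate the intended effect with made-up numbers (this is just a
sketch of the heuristic, not the GCC code):

#include <cstdint>

/* A call whose raw count is just below the hot cutoff still qualifies
   once its count is scaled by the expected speedup from inlining.
   All values here are hypothetical.  */
static bool
hot_after_scaling (uint64_t call_count, uint64_t hot_cutoff, double speedup)
{
  return (double) call_count * speedup >= (double) hot_cutoff;
}

/* Example: a count of 990 misses a cutoff of 1000, but with an expected
   2x speedup the scaled value 1980 is comfortably above it.  */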

Profile-bootstrapped/regtested on x86_64-linux; I plan to commit it after the
LNT testers pick up the current changes.

gcc/ChangeLog:

PR target/119900
* cgraph.cc (cgraph_edge::maybe_hot_p): Add
a variant accepting a sreal scale; use reliability of
profile.
* cgraph.h (cgraph_edge::maybe_hot_p): Declare
a variant accepting a sreal scale.
* ipa-inline.cc (callee_speedup): New function.
(want_inline_small_function_p): add early return
and avoid duplicated lookup of summaries; use scaled
maybe_hot predicate.

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 6ae6a97f6f5..1a2ec38374a 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -2984,13 +2984,22 @@ cgraph_edge::cannot_lead_to_return_p (void)
 return callee->cannot_return_p ();
 }
 
-/* Return true if the edge may be considered hot.  */
+/* Return true if the edge after scaling its profile by SCALE
+   may be considered hot.  */
 
 bool
-cgraph_edge::maybe_hot_p (void)
+cgraph_edge::maybe_hot_p (sreal scale)
 {
-  if (!maybe_hot_count_p (NULL, count.ipa ()))
+  /* Never consider calls in functions optimized for size hot.  */
+  if (opt_for_fn (caller->decl, optimize_size))
 return false;
+
+  /* If reliable IPA count is available, just use it.  */
+  profile_count c = count.ipa ();
+  if (c.reliable_p ())
+return maybe_hot_count_p (NULL, c * scale);
+
+  /* See if we can determine hotness using caller frequency.  */
   if (caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
   || (callee
  && callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
@@ -2999,25 +3008,42 @@ cgraph_edge::maybe_hot_p (void)
   && (callee
  && callee->frequency <= NODE_FREQUENCY_EXECUTED_ONCE))
 return false;
-  if (opt_for_fn (caller->decl, optimize_size))
-return false;
+  /* ??? This may make sense for hot functions determined by
+ user attribute, but if function is hot by profile, it may
+ contains non-hot calls.  In most practical cases this case
+ is handled by the reliable ipa count above, but i.e. after
+ inlining function with no profile to function with profile
+ we get here.. */
   if (caller->frequency == NODE_FREQUENCY_HOT)
 return true;
+
+  /* Use IPA count and if it s not available appy local heuristics.  */
+  if (c.initialized_p ())
+return maybe_hot_count_p (NULL, c * scale);
   if (!count.initialized_p ())
 return true;
   cgraph_node *where = caller->inlined_to ? caller->inlined_to : caller;
   if (!where->count.initialized_p ())
-return false;
+return true;
+  c = count * scale;
   if (caller->frequency == NODE_FREQUENCY_EXECUTED_ONCE)
 {
-  if (count * 2 < where->count * 3)
+  if (c * 2 < where->count * 3)
return false;
 }
-  else if (count * param_hot_bb_frequency_fraction < where->count)
+  else if (c * param_hot_bb_frequency_fraction < where->count)
 return false;
   return true;
 }
 
+/* Return true if the edge may be considered hot.  */
+
+bool
+cgraph_edge::maybe_hot_p ()
+{
+  return maybe_hot_p (1);
+}
+
 /* Worker for cgraph_can_remove_if_no_direct_calls_p.  */
 
 static bool
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index abde770ba2b..f7b67ed0a6c 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1872,8 +1872,13 @@ public:
   /* Return true when the edge represents a direct recursion.  */
   bool recursive_p (void);
 
-  /* Return true if the edge may be considered hot.  */
-  bool maybe_hot_p (void);
+  /* Return true if the edge may be considered hot after scaling its count.  */
+  bool maybe_hot_p ();
+
+  /* Return true if the edge may be considered hot after scaling its count
+ (i.e. assume that optimization would reduce runtime for callee,
+  possibly significantly).  */
+  bool maybe_hot_p (sreal scale);
 
   /* Get unique identifier of the edge.  */
   inline int get_uid ()
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index 7c2feeeffbb..38fdbfde1b3 100644

Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread John Paul Adrian Glaubitz
Hello,




> This mini-series removes the TARGET_LRA_P hook, forcing all targets
> to use LRA.  I have not touched the targets that define -mlra
> in terms of a 'Target Mask(XXX)' since IIRC there's no way to
> "default" that.  I'd expect those to wrongly assume LRA isn't enabled
> when using that XXX flag.  Likewise this defers removal of -mlra
> and the TARGET_LRA_P hook from targets with a -mlra flag.

Please let me run tests on alpha, hppa, m68k and sh to verify what the
current status of LRA on these targets is. I will report back.

I know for sure that LRA on alpha works with the baseline set to EV56,
i.e. all BWX-targets but did not work for non-BWX targets. On sh, enabling
LRA requires using Oleg Endo's tree from [1] plus the patches from the
attachments 59432 and 59550 from [2].

PS: If possible, please CC me in the future when it comes to discussions
regarding these retro-computing targets. I am subscribed to gcc-patches
but I have disabled mail delivery at the moment due to the high volume.

PPS: Sorry for posting out of thread, but unlike lore.kernel.org, I could
 not find a way to obtain the message or mboxes on gcc-patches.

Thanks,
Adrian

> [1] https://github.com/olegendo/gcc/tree/devel/sh-lra
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread Sam James
John Paul Adrian Glaubitz  writes:

> Hello,
>
>
>
>
>> This mini-series removes the TARGET_LRA_P hook, forcing all targets
>> to use LRA.  I have not touched the targets that define -mlra
>> in terms of a 'Target Mask(XXX)' since IIRC there's no way to
>> "default" that.  I'd expect those to wrongly assume LRA isn't enabled
>> when using that XXX flag.  Likewise this defers removal of -mlra
>> and the TARGET_LRA_P hook from targets with a -mlra flag.
>
> Please let me run tests on alpha, hppa, m68k and sh to verify what the
> current status of LRA on these targets are. I will report back.

Thanks. I was planning on doing that for hppa but the heat hasn't
allowed it.


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread John Paul Adrian Glaubitz
Hi Maciej,

On Fri, 2025-05-02 at 11:57 +0100, Maciej W. Rozycki wrote:
> - I have only learnt last year that the Alpha backend also needs some work 
>   here and it appears that it relies on a hack or a bunch within reload to 
>   propagate alignment information required for non-BWX targets to produce 
>   correct code.  I haven't been able to investigate it further, but it 
>   seems it may require considerable effort to solve and as it stands the 
>   backend doesn't build if switched to LRA.  And then last year's sudden 
>   removal of non-BWX support from the Linux kernel has consumed my already 
>   limited time for work on a solution to bring it back instead (with more 
>   effort still required, also as a matter of priority).

I was able to bootstrap the Alpha backend with LRA enabled just fine when using
a BWX target, i.e. the EV56. What didn't work was using the non-BWX baseline.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread Andrew Pinski
On Sat, May 3, 2025 at 10:29 PM John Paul Adrian Glaubitz
 wrote:
>
> Hello,
>
>
>
>
> > This mini-series removes the TARGET_LRA_P hook, forcing all targets
> > to use LRA.  I have not touched the targets that define -mlra
> > in terms of a 'Target Mask(XXX)' since IIRC there's no way to
> > "default" that.  I'd expect those to wrongly assume LRA isn't enabled
> > when using that XXX flag.  Likewise this defers removal of -mlra
> > and the TARGET_LRA_P hook from targets with a -mlra flag.
>
> Please let me run tests on alpha, hppa, m68k and sh to verify what the
> current status of LRA on these targets are. I will report back.
>
> I know for sure that LRA on alpha works with the baseline set to EV56,
> i.e. all BWX-targets but did not work for non-BWX targets. On sh, enabling
> LRA requires using Oleg Endo's tree from [1] plus the patches from the
> attachments 59432 and 59550 from [2].
>
> PS: If possible, please CC me in the future when it comes to discussions
> regarding these retro-computing targets. I am subscribed to gcc-patches
> but I have disabled mail delivery at the moment due to the high volume.
>
> PPS: Sorry for posting out of thread, but unlike lore.kernel.org, I could
>  not find a way to obtain the message or mboxes on gcc-patches.

https://inbox.sourceware.org/gcc-patches/ is the link to the official
public-inbox instance for next time.

Thanks,
Andrew

>
> Thanks,
> Adrian
>
> > [1] https://github.com/olegendo/gcc/tree/devel/sh-lra
> > [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread John Paul Adrian Glaubitz
Hi Maciej,

On Fri, 2025-05-02 at 17:27 +0100, Maciej W. Rozycki wrote:
>  I only have non-BWX hardware and I'm not interested in decommissioning it 
> or upgrading.  There appear to be a few users around, but I seem to be the 
> last GCC developer remaining who is willing to do anything about the port.  
> It doesn't help that Alpha/QEMU appears broken and produces unreliable 
> results, so it'd have to be someone with actual hardware (or willing to 
> fix QEMU first).

What exactly is broken with the QEMU emulation in Alpha? I don't know of any
bugs, but it could be that you have run into the nasty stack alignment issue
in the kernel that was fixed in Linux 6.14.

So, if you test on an emulated Alpha, please make sure to use at least kernel
6.14 and also make sure that CONFIG_COMPACTION is disabled. Using this setup
will get you a very reliable and stable Alpha Linux environment on QEMU.

> 
>  What I was not aware of is the situation with the Alpha backend and the 
> need to put out fires there.  That non-BWX issue with Linux kernel's RCU 
> algorithms was a nasty surprise to me, one I could have dealt with before 
> with less time pressure if I knew about it.

What RCU issue are you talking about? I can only stress that to use Linux on
Alpha, you *must* use kernel 6.14 or later with CONFIG_COMPACTION disabled
otherwise you will run into all kinds of issues.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread John Paul Adrian Glaubitz
Hi Andrew,

On Sat, 2025-05-03 at 22:32 -0700, Andrew Pinski wrote:
> > PPS: Sorry for posting out of thread, but unlike lore.kernel.org, I could
> >  not find a way to obtain the message or mboxes on gcc-patches.
> 
> https://inbox.sourceware.org/gcc-patches/ is the link to the official
> public-inbox instance for next time.

Thanks, I'm using this now. Very helpful and much better than mailman.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: Update GCC16 to use libffi 3.4.8

2025-05-03 Thread Sam James
Simon Sobisch  writes:

> As GCC15 is now strict on dynamic function as
> int *func()
> to mean exactly zero arguments via its default -std=gnu23, I'm looking
> into a dynamic option that would work for C23 and recognized libffi
> being built as part of GCC and being part of its source tree, which
> possibly is a way to go (unknown amount of arguments between 0 and
> 252)
>
> But then I've seen that GCC's in-tree libffi was last updated in 2021,
> while the current one is from this year.
>
> So I do wonder:
>
> 1 Which parts of GCC use libffi?

I think it's just Go right now; Rust may use it later.

> 2 Is it linked in statically for GCC's usage (I'd see no problem via its
>   MIT license to put it anywhere)?

Yes, it's statically linked.

> 3 Is there a reason to _not_ update it in GCC16 to the most current
>   version?

There were some regressions recently but I think master is fine now (and
IIRC the last release is too, but would need to check).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117635 tracks this. I was
maybe going to look at it once zlib was done (just need to send it) but
I'd be happy if someone else did it ;)


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread Maciej W. Rozycki
On Sat, 3 May 2025, Paul Koning wrote:

> > As for MEM(MEM(xyz)) addressing modes I'm less sure - I suppose those
> > are usually formed at RTL expansion time (rather than, say, by
> > RTL combine)?  If PDP-11 is the only target with those then it might
> > be easier to recover those post-LRA during late-combine or peephole
> > or alternatively in a target specific pass?  But of course I know
> > nothing about the constraints of said addressing mode or the challenges
> > those present to LRA.
> 
> VAX also has them, in fact VAX addressing modes are a superset of the 
> PDP11 ones.

 Indeed, with the VAX target there's double-indirect postincrement too, 
and then an index can be optionally applied to all these modes.  We get 
very good results with old reload for some scenarios with efficient 
compact code produced.

  Maciej


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-03 Thread Richard Biener
On Fri, 2 May 2025, Paul Koning wrote:

> 
> 
> > On May 2, 2025, at 12:27 PM, Maciej W. Rozycki  wrote:
> > 
> > ...
> > NB I understand your position and the need to cut the line sometime, and 
> > I knew what the situation is with the VAX backend and that it would be 
> > manageable.  In principle it might be that it's only that single ICE that 
> > needs debugging before we can claim LRA usable, even if poor RISC-like 
> > code results (FWIW the PDP-11 backend suffers from the same fate).  As 
> > long as user code builds and runs we can improve code gen gradually.
> 
> Indeed, I have noticed that LRA doesn't take advantage of PDP-11 (and I 
> would guess VAX) addressing modes not found in RISC type machines.  A 
> notable example are the pre-dec and post-inc modes, and I think memory 
> indirect (i.e., MEM(MEM(xyz)) modes).  What isn't clear to me is whether 
> there is interest in LRA doing those things, or if the answer is that 
> they only are part of reload and therefore now unsupported.  I would 
> like to be able to take advantage of those features, but have not dug 
> into it so far -- modern register allocators are rather intimidating 
> beasts.

I think the auto-inc/dec addressing modes are used by many targets so
there's definitely the chance to adapt LRA (and/or the auto-inc/dec pass)
to handle these better.  There's also the chance this is easier to pull
off when we do not have to care for the IRA + reload combo not regressing.

As for MEM(MEM(xyz)) addressing modes I'm less sure - I suppose those
are usually formed at RTL expansion time (rather than, say, by
RTL combine)?  If PDP-11 is the only target with those then it might
be easier to recover those post-LRA during late-combine or peephole
or alternatively in a target specific pass?  But of course I know
nothing about the constraints of said addressing mode or the challenges
those present to LRA.

Richard.