Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread chenglulu



在 2024/3/27 下午8:42, Xi Ruoyao 写道:

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:

在 2024/3/26 下午5:48, Xi Ruoyao 写道:

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

  int
  main (void)
  {
    unsigned long stat = 0xdeadbeef;
    for (int i = 0; i < 1; i++)
  stat = (stat * stat + stat * 114514 + 1919810) % 17;
    asm(""::"r"(stat));
  }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

The test case div-const-reduction.c is modified to assemble the instruction
sequence as follows:
lu12i.w $r12,97440>>12# 0x3b9ac000
ori $r12,$r12,2567
mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.


Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

No problem, I'll test both cases ((10,10) and (10,22)).



[PATCH] Don't set full_profile in auto-profile [PR113765]

2024-03-27 Thread Eugene Rozenfeld
auto-profile currently doesn't guarantee that it will set probabilities
on all edges because of zero basic block counts. Normally those edges
just have probabilities set by the preceding profile_estimate pass but
under -O0 profile_estimate pass doesn't run. The patch removes setting
of full_profile to true in auto-profile.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* auto-profile.cc (afdo_annotate_cfg): Don't set full_profile to true
---
 gcc/auto-profile.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index e5407d32fbb..de59b94bcb3 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1580,7 +1580,6 @@ afdo_annotate_cfg (const stmt_set _stmts)
 }
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
-  cfun->cfg->full_profile = true;
   if (flag_value_profile_transformations)
 {
   gimple_value_profile_transformations ();
-- 
2.25.1



Re: [PATCH] Fortran: fix DATA and derived types with pointer components [PR114474]

2024-03-27 Thread Jerry D

On 3/27/24 1:42 PM, Harald Anlauf wrote:

Dear all,

the attached patch fixes a 10+ regression for cases where a
derived type with a pointer component is used in a DATA statement.
The failure looked obscure, see testcase comments, and pointed
to a possible issue in the resolution (order).  For the failing
test, the target variable was seen with ts.type == BT_PROCEDURE
instead of its actual type.  For this reason, I restricted the
fixup as much as possible.

For details, please see the commit message.

Testcase cross-checked with NAG.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

If this fix survives broader testing, I would like to backport.

Thanks,
Harald

P.S.: while trying to extend coverage of conforming code, I had
much fun also with other compilers (e.g. NAG panicking...)



OK, for trunk and we will see how it survives for a bit before back port.

Jerry -


Go patch committed: Use correct check for index value overflow

2024-03-27 Thread Ian Lance Taylor
This patch to the Go frontend uses the correct size and comparison
when doing an index value overflow check.  This has apparently been
wrong since I introduced the code ten years ago.  This fixes GCC PR
114500.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
1091113a0036c7315197e09af572dce2beaf1c4c
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index de6e21fb3b5..50d430d5034 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-3f597287b6b858794dabdfe1bf83b386aad18102
+98e92493db2ab7857a5934a950a830fc1f95a4e5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 8429e553eac..238d5a56ca2 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -18790,7 +18790,7 @@ Composite_literal_expression::lower_array(Type* type)
 
  Named_type* ntype = Type::lookup_integer_type("int");
  Integer_type* inttype = ntype->integer_type();
- if (sizeof(index) <= static_cast(inttype->bits() * 8)
+ if (sizeof(index) >= static_cast(inttype->bits() / 8)
  && index >> (inttype->bits() - 1) != 0)
{
  go_error_at(index_expr->location(), "index value overflow");


[pushed] analyzer: fix ICE due to type mismatch when replaying call summary [PR114473]

2024-03-27 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9697-gfdd59818e2abf6.

gcc/analyzer/ChangeLog:
PR analyzer/114473
* call-summary.cc
(call_summary_replay::convert_svalue_from_summary): Assert that
the types match.
(call_summary_replay::convert_region_from_summary): Likewise.
(call_summary_replay::convert_region_from_summary_1): Add missing
cast for the deref of RK_SYMBOLIC case.

gcc/testsuite/ChangeLog:
PR analyzer/114473
* gcc.dg/analyzer/call-summaries-pr114473.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/call-summary.cc  | 12 +++
 .../gcc.dg/analyzer/call-summaries-pr114473.c | 31 +++
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/call-summaries-pr114473.c

diff --git a/gcc/analyzer/call-summary.cc b/gcc/analyzer/call-summary.cc
index a569bb94cec5..c2c9c71f79b4 100644
--- a/gcc/analyzer/call-summary.cc
+++ b/gcc/analyzer/call-summary.cc
@@ -235,6 +235,11 @@ call_summary_replay::convert_svalue_from_summary (const 
svalue *summary_sval)
 
   const svalue *caller_sval = convert_svalue_from_summary_1 (summary_sval);
 
+  if (caller_sval)
+if (summary_sval->get_type () && caller_sval->get_type ())
+  gcc_assert (types_compatible_p (summary_sval->get_type (),
+ caller_sval->get_type ()));
+
   /* Add to cache.  */
   add_svalue_mapping (summary_sval, caller_sval);
 
@@ -552,6 +557,11 @@ call_summary_replay::convert_region_from_summary (const 
region *summary_reg)
 
   const region *caller_reg = convert_region_from_summary_1 (summary_reg);
 
+  if (caller_reg)
+if (summary_reg->get_type () && caller_reg->get_type ())
+  gcc_assert (types_compatible_p (summary_reg->get_type (),
+ caller_reg->get_type ()));
+
   /* Add to cache.  */
   add_region_mapping (summary_reg, caller_reg);
 
@@ -603,6 +613,8 @@ call_summary_replay::convert_region_from_summary_1 (const 
region *summary_reg)
  = get_caller_model ()->deref_rvalue (caller_ptr_sval,
   NULL_TREE,
   get_ctxt ());
+   caller_reg = mgr->get_cast_region (caller_reg,
+  summary_reg->get_type ());
return caller_reg;
   }
   break;
diff --git a/gcc/testsuite/gcc.dg/analyzer/call-summaries-pr114473.c 
b/gcc/testsuite/gcc.dg/analyzer/call-summaries-pr114473.c
new file mode 100644
index ..4598840f0dff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/call-summaries-pr114473.c
@@ -0,0 +1,31 @@
+/* { dg-additional-options "-fanalyzer-call-summaries" } */
+
+int a;
+extern int *q[];
+
+int *
+baz (int *src)
+{
+  while (a)
+{
+  src && a;
+  return src;
+}
+}
+
+void
+bar (int **src)
+{
+  for (unsigned j = 0; j;)
+a = 0;
+  while (a)
+baz (src[0]);
+}
+
+void
+foo (void)
+{
+  bar (q);
+  baz ();
+  bar (q);
+}
-- 
2.26.3



[PATCH] RISC-V: testsuite: ensure vtype is call clobbered

2024-03-27 Thread Vineet Gupta
Per classic Vector calling convention ABI, vtype is call clobbered,
so ensure gcc generates fresh a VSETVLI after a function call or an
inline asm which clobbers vtype.

ATM gcc seems to be doing the right thing, but a test can never be
harmful.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vtype-call-clobbered.c: New Test.

Signed-off-by: Vineet Gupta 
---
 .../riscv/rvv/vtype-call-clobbered.c  | 47 +++
 1 file changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c
new file mode 100644
index ..be9f312aa508
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+
+#include "riscv_vector.h"
+
+extern void can_clobber_vtype();
+
+static inline void v_loop (void * restrict in, void * restrict out, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  vuint8mf8_t v = *(vuint8mf8_t*)(in + i);
+  *(vuint8mf8_t*)(out + i) = v;
+}
+}
+
+/* Two V instructions back-back.
+   Only 1 vsetvli insn.  */
+void
+vec1 (void * restrict in, void * restrict out1,  void * restrict out2, int n)
+{
+ v_loop(in, out1, n);
+ v_loop(in, out2, n);
+}
+
+/* Two V instructions seperated by a function call.
+   Both need to have a corresponding vsetvli insn.  */
+void
+vec2 (void * restrict in, void * restrict out1,  void * restrict out2, int n)
+{
+ v_loop(in, out1, n);
+ can_clobber_vtype();
+ v_loop(in, out2, n);
+}
+
+/* Two V instructions seperated by an inline asm with vtype clobber.
+   Both need to have a corresponding vsetvli insn.  */
+void
+vec3 (void * restrict in, void * restrict out1,  void * restrict out2, int n)
+{
+ v_loop(in, out1, n);
+ asm volatile("":::"vtype");
+ v_loop(in, out2, n);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 5 } } */
-- 
2.34.1



[r14-9692 Regression] FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \txor[ql]\t 2 on Linux/x86_64

2024-03-27 Thread haochen.jiang
On Linux/x86_64,

839bc42772ba7af66af3bd16efed4a69511312ae is the first bad commit
commit 839bc42772ba7af66af3bd16efed4a69511312ae
Author: Segher Boessenkool 
Date:   Wed Mar 27 14:09:52 2024 +

combine: Don't combine if I2 does not change

caused

FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \txor[ql]\t 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-9692/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101950-2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101950-2.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


[PATCH] RISC-V: Add vxsat as a register

2024-03-27 Thread Palmer Dabbelt
We aren't doing anything with vxsat right now, but I'd like to add it as
an accepted register to the clobber list.  If we get this into GCC-14
then we'll avoid some preprocessor-based twiddling if we ever start
using vxsat in the future.

gcc/ChangeLog:

* config/riscv/riscv.h (REGISTER_NAMES): Add vxsat.
---
IIUC we aren't using these N/A regnos for anything, they're just there to pad
out the types.  So I think this is safe, but Juzhe would likely know best here.

See
https://inbox.sourceware.org/libc-alpha/20240327193601.28903-2-pal...@rivosinc.com/
a use of this.
---
 gcc/config/riscv/riscv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index da089a03e9d..d5779512994 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -933,7 +933,7 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   "fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5",  \
   "fa6", "fa7", "fs2", "fs3", "fs4", "fs5", "fs6", "fs7",  \
   "fs8", "fs9", "fs10","fs11","ft8", "ft9", "ft10","ft11", \
-  "arg", "frame", "vl", "vtype", "vxrm", "frm", "N/A", "N/A",   \
+  "arg", "frame", "vl", "vtype", "vxrm", "frm", "vxsat", "N/A", \
   "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
   "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
   "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
-- 
2.44.0



Re: [PATCH 1/2 v2] libdecnumber: fixed multiple potential access-out-of bounds errors by moving range conditions before reads.

2024-03-27 Thread Ian McCormack
Hello—just pinging again about the following two patches I submitted. Each
fixes small access-out-of-bounds errors in libdecnumber.

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/644840.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/644910.html

Also, the documentation indicated that I should mention that I do not have
write access, since I'm a first-time contributor.

On Sat, Feb 3, 2024 at 2:31 PM Ian McCormack  wrote:

> Multiple `for` loops across `libdecnumber` contain boolean expressions
> where memory is accessed prior to checking if the pointer is still within a
> valid range, which can lead to out-of-bounds reads.
>
> This patch moves the range conditions to appear before the memory accesses
> in each conjunction so that these expressions short-circuit instead of
> performing an invalid read.
>
> libdecnumber/ChangeLog
>* In each `for` loop and `if` statement, all boolean expressions of
> the form `*ptr == value && in_range(ptr)` have been changed to
> `in_range(ptr) && *ptr == value`.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.
> ---
>  libdecnumber/decBasic.c  | 20 ++--
>  libdecnumber/decCommon.c |  2 +-
>  libdecnumber/decNumber.c |  2 +-
>  3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/libdecnumber/decBasic.c b/libdecnumber/decBasic.c
> index 6319f66b25d..04833f2390d 100644
> --- a/libdecnumber/decBasic.c
> +++ b/libdecnumber/decBasic.c
> @@ -341,7 +341,7 @@ static decFloat * decDivide(decFloat *result, const
> decFloat *dfl,
>  for (;;) {   /* inner loop -- calculate quotient unit */
>/* strip leading zero units from acc (either there initially or */
>/* from subtraction below); this may strip all if exactly 0 */
> -  for (; *msua==0 && msua>=lsua;) msua--;
> +  for (; msua>=lsua && *msua==0;) msua--;
>accunits=(Int)(msua-lsua+1);   /* [maybe 0] */
>/* subtraction is only necessary and possible if there are as */
>/* least as many units remaining in acc for this iteration as */
> @@ -515,7 +515,7 @@ static decFloat * decDivide(decFloat *result, const
> decFloat *dfl,
>  /* (must also continue to original lsu for correct quotient length) */
>  if (lsua>acc+DIVACCLEN-DIVOPLEN) continue;
>  for (; msua>lsua && *msua==0;) msua--;
> -if (*msua==0 && msua==lsua) break;
> +if (msua==lsua && *msua==0) break;
>  } /* outer loop */
>
>/* all of the original operand in acc has been covered at this point */
> @@ -1543,8 +1543,8 @@ decFloat * decFloatAdd(decFloat *result,
> umsd=acc+COFF+DECPMAX-1;   /* so far, so zero */
> if (ulsd>umsd) {   /* more to check */
>   umsd++;  /* to align after checked area */
> - for (; UBTOUI(umsd)==0 && umsd+3 - for (; *umsd==0 && umsd + for (; umsd+3 + for (; umsd   }
> if (*umsd==0) {/* must be true zero (and diffsign) */
>   num.sign=0;  /* assume + */
> @@ -2087,10 +2087,10 @@ decFloat * decFloatFMA(decFloat *result, const
> decFloat *dfl,
>/* remove leading zeros on both operands; this will save time later */
>/* and make testing for zero trivial (tests are safe because acc */
>/* and coe are rounded up to uInts) */
> -  for (; UBTOUI(hi->msd)==0 && hi->msd+3lsd;) hi->msd+=4;
> -  for (; *hi->msd==0 && hi->msdlsd;) hi->msd++;
> -  for (; UBTOUI(lo->msd)==0 && lo->msd+3lsd;) lo->msd+=4;
> -  for (; *lo->msd==0 && lo->msdlsd;) lo->msd++;
> +  for (; hi->msd+3lsd && UBTOUI(hi->msd)==0;) hi->msd+=4;
> +  for (; hi->msdlsd && *hi->msd==0;) hi->msd++;
> +  for (; lo->msd+3lsd && UBTOUI(lo->msd)==0;) lo->msd+=4;
> +  for (; lo->msdlsd && *lo->msd==0;) lo->msd++;
>
>/* if hi is zero then result will be lo (which has the smaller */
>/* exponent), which also may need to be tested for zero for the */
> @@ -2252,8 +2252,8 @@ decFloat * decFloatFMA(decFloat *result, const
> decFloat *dfl,
>/* all done except for the special IEEE 754 exact-zero-result */
>/* rule (see above); while testing for zero, strip leading */
>/* zeros (which will save decFinalize doing it) */
> -  for (; UBTOUI(lo->msd)==0 && lo->msd+3lsd;) lo->msd+=4;
> -  for (; *lo->msd==0 && lo->msdlsd;) lo->msd++;
> +  for (; lo->msd+3lsd && UBTOUI(lo->msd)==0;) lo->msd+=4;
> +  for (; lo->msdlsd && *lo->msd==0;) lo->msd++;
>if (*lo->msd==0) {  /* must be true zero (and diffsign) */
> lo->sign=0;/* assume + */
> if (set->round==DEC_ROUND_FLOOR) lo->sign=DECFLOAT_Sign;
> diff --git a/libdecnumber/decCommon.c b/libdecnumber/decCommon.c
> index 6f7563de6e6..1f9fe4a1935 100644
> --- a/libdecnumber/decCommon.c
> +++ b/libdecnumber/decCommon.c
> @@ -276,7 +276,7 @@ static decFloat * decFinalize(decFloat *df, bcdnum
> *num,
>  /* [this is quite expensive] */
>  if (*umsd==0) {
>for (; 

[PATCH] Fortran: fix DATA and derived types with pointer components [PR114474]

2024-03-27 Thread Harald Anlauf
Dear all,

the attached patch fixes a 10+ regression for cases where a
derived type with a pointer component is used in a DATA statement.
The failure looked obscure, see testcase comments, and pointed
to a possible issue in the resolution (order).  For the failing
test, the target variable was seen with ts.type == BT_PROCEDURE
instead of its actual type.  For this reason, I restricted the
fixup as much as possible.

For details, please see the commit message.

Testcase cross-checked with NAG.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

If this fix survives broader testing, I would like to backport.

Thanks,
Harald

P.S.: while trying to extend coverage of conforming code, I had
much fun also with other compilers (e.g. NAG panicking...)

From d5fda38243a22e1aef4367653d92521e53f2000d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 27 Mar 2024 21:18:04 +0100
Subject: [PATCH] Fortran: fix DATA and derived types with pointer components
 [PR114474]

When matching actual arguments in match_actual_arg, these are initially
treated as a possible dummy procedure, assuming that the correct type is
determined later.  This resolution could fail when the procedure is a
derived type constructor with a pointer component and appears in a DATA
statement, where the pointer shall be associated with an initial data
target.  Check for those cases where the type obviously has not been
resolved yet, and which were missed because there was no component
reference.

gcc/fortran/ChangeLog:

	PR fortran/114474
	* primary.cc (gfc_variable_attr): Catch variables used in structure
	constructors within DATA statements that are still tagged with a
	temporary type BT_PROCEDURE from match_actual_arg and which have the
	target attribute, and fix their typespec.

gcc/testsuite/ChangeLog:

	PR fortran/114474
	* gfortran.dg/data_pointer_3.f90: New test.
---
 gcc/fortran/primary.cc   | 12 +++
 gcc/testsuite/gfortran.dg/data_pointer_3.f90 | 77 
 2 files changed, 89 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/data_pointer_3.f90

diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 0ab69bb9dce..5dd6875a4a6 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2804,6 +2804,18 @@ gfc_variable_attr (gfc_expr *expr, gfc_typespec *ts)
   if (ts != NULL && expr->ts.type == BT_UNKNOWN)
 *ts = sym->ts;

+  /* Catch left-overs from match_actual_arg, where an actual argument of a
+ procedure is given a temporary ts.type == BT_PROCEDURE.  The fixup is
+ needed for structure constructors in DATA statements, where a pointer
+ is associated with a data target, and the argument has not been fully
+ resolved yet.  Components references are dealt with further below.  */
+  if (ts != NULL
+  && expr->ts.type == BT_PROCEDURE
+  && expr->ref == NULL
+  && attr.flavor != FL_PROCEDURE
+  && attr.target)
+*ts = sym->ts;
+
   has_inquiry_part = false;
   for (ref = expr->ref; ref; ref = ref->next)
 if (ref->type == REF_INQUIRY)
diff --git a/gcc/testsuite/gfortran.dg/data_pointer_3.f90 b/gcc/testsuite/gfortran.dg/data_pointer_3.f90
new file mode 100644
index 000..f0325cd5bcb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/data_pointer_3.f90
@@ -0,0 +1,77 @@
+! { dg-do compile }
+! PR fortran/114474 - DATA and derived types with pointer components
+
+program pr114474
+  implicit none
+  integer, target :: ii = 42! initial data target
+
+  integer, target :: jj = 24
+  integer, pointer:: qq => jj
+  ! ii and jj resolve slightly differently when the data statement below
+  ! is reached, as jj is resolved outside the structure constructor first
+
+  type t
+ integer, pointer :: h
+  end type t
+
+  integer, target :: kk(7) =  23
+  integer, pointer:: ll(:) => kk
+
+  type t1
+ integer  :: m(7)
+  end type t1
+
+  type(t) :: x1, x2, x3, x4, x5
+  type(t), parameter  :: z1 = t(null())
+
+  type(t1), target:: tt = t1([1,2,3,4,5,6,7])
+  type(t1), parameter :: vv = t1(22)
+  type(t1):: w1, w2
+  integer,  pointer   :: p1(:) => tt% m
+
+  data x1 / t(null())  /
+  data x2 / t(ii)  / ! ii is initial data target
+  data x3 / t(jj)  / ! jj is resolved differently...
+  data x4 / t(tt%m(3)) / ! pointer association with 3rd element
+
+  data w1 / t1(12) /
+  data w2 / t1(vv%m)   /
+
+  if (  associated (x1% h)) stop 1
+  if (.not. associated (x2% h)) stop 2
+  if (.not. associated (x3% h)) stop 3
+  if (.not. associated (x4% h)) stop 4
+  if (x2% h /= 42) stop 5
+  if (x3% h /= 24) stop 6
+  if (x4% h /=  3) stop 7
+
+ if (any (w1%m /= 12  )) stop 8
+  if (any (w2%m /= vv%m)) stop 9
+end
+
+
+subroutine sub
+  implicit none
+
+  interface
+ real function myfun (x)
+   real, intent(in) :: x
+ end function myfun
+  end interface
+
+  type u
+ procedure(myfun), pointer, nopass :: p
+  end type u
+
+  type(u):: u3 = u(null())
+  

Re: [PATCH v7 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-27 Thread Qing Zhao


> On Mar 26, 2024, at 13:20, Joseph Myers  wrote:
> 
> On Tue, 26 Mar 2024, Qing Zhao wrote:
> 
>>> What happens when there are multiple counted_by attributes on the same 
>>> field?  As far as I can see, all but one end up being ignored (by the code 
>>> that actually uses the attribute).
>> 
>> In general, is there any rule for handling multiple same attributes in 
>> GCC? i.e, from left to right, the last one wins? Or something else? I’d 
>> like to following the consistent rule with other places in GCC.
> 
> Sometimes, they are meaningful and all can be respected.  (An example is 
> the format_arg attribute, where ngettext legitimately has two such 
> attributes.)
> 
> When not meaningful, an error is appropriate.  For example, with section 
> attributes you can get
> 
>error ("section of %q+D conflicts with previous declaration",
>   *node);
> 
> if different sections are named.  I think that's a suitable model for the 
> new attribute here: allow duplicates if they name the same field, but give 
> errors if they name different fields, just as with the section attribute.
> 
> Once you give an error for multiple attributes naming different fields, 
> which one wins is just a question of error recovery; the specific choice 
> doesn't matter much, as long as you don't get an ICE in later processing.

Agreed and fixed as suggested.

Thanks.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com



Re: [PATCH] btf: Fix up btf-datasec-1.c test on x86

2024-03-27 Thread David Faust
Hi Jakub,

On 3/27/24 04:16, Jakub Jelinek wrote:
> On Wed, Mar 27, 2024 at 11:18:49AM +0100, Jakub Jelinek wrote:
>>> -/* The offset entry for each variable in a DATSEC should be 0 at compile 
>>> time.  */
>>> -/* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bts_offset" 7 } } */
>>> +/* The offset entry for each variable in a DATSEC should contain a label.  
>>> */
>>> +/* { dg-final { scan-assembler-times ".4byte\[\t \]\[a-e\]\[\t 
>>> \]+\[^\n\]*bts_offset" 5 } } */
>>
>> 4byte is used only on some targets, what exact assembler directive is used
>> for 4byte unaligned data is heavily target dependent.
> 
> Now in patch form, tested on x86_64-linux with
> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} btf.exp'
> ok for trunk?

OK.
Thank you for fixing these.

> 
> 2024-03-27  Jakub Jelinek  
> 
>   * gcc.dg/debug/btf/btf-cvr-quals-1.c: Use dg-additional-options
>   instead of multiple dg-options.
>   * gcc.dg/debug/btf/btf-datasec-1.c: Likewise.  Accept all supported
>   unaligned 4 byte assembler directives rather than assuming it must
>   be .4byte.
> 
> --- gcc/testsuite/gcc.dg/debug/btf/btf-cvr-quals-1.c.jj   2021-07-12 
> 22:57:26.056103708 +0200
> +++ gcc/testsuite/gcc.dg/debug/btf/btf-cvr-quals-1.c  2024-03-27 
> 12:09:25.383278017 +0100
> @@ -23,7 +23,7 @@
>  
>  /* { dg-do compile } */
>  /* { dg-options "-O0 -gbtf -dA" } */
> -/* { dg-options "-O0 -gbtf -gdwarf-4 -dA" { target { *-*-darwin* } } } */
> +/* { dg-additional-options "-gdwarf-4" { target { *-*-darwin* } } } */
>  
>  /* { dg-final { scan-assembler-times "ascii \"int.0\"\[\t 
> \]+\[^\n\]*btf_string" 1 } } */
>  
> --- gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c.jj 2024-03-26 
> 22:34:47.925757342 +0100
> +++ gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c2024-03-27 
> 12:12:09.761045873 +0100
> @@ -11,16 +11,16 @@
>  
>  /* { dg-do compile )  */
>  /* { dg-options "-O0 -gbtf -dA" } */
> -/* { dg-options "-O0 -gbtf -dA -msdata=none" { target { { powerpc*-*-* } && 
> ilp32 } } } */
> -/* { dg-options "-O0 -gbtf -dA -msmall-data-limit=0" { target { riscv*-*-* } 
> } } */
> -/* { dg-options "-O0 -gbtf -dA -G0" { target { nios2-*-* } } } */
> +/* { dg-additional-options "-msdata=none" { target { { powerpc*-*-* } && 
> ilp32 } } } */
> +/* { dg-additional-options "-msmall-data-limit=0" { target { riscv*-*-* } } 
> } */
> +/* { dg-additional-options "-G0" { target { nios2-*-* } } } */
>  
>  /* Check for two DATASEC entries with vlen 3, and one with vlen 1.  */
>  /* { dg-final { scan-assembler-times "0xf03\[\t \]+\[^\n\]*btt_info" 2 } 
> } */
>  /* { dg-final { scan-assembler-times "0xf01\[\t \]+\[^\n\]*btt_info" 1 } 
> } */
>  
>  /* The offset entry for each variable in a DATSEC should contain a label.  */
> -/* { dg-final { scan-assembler-times ".4byte\[\t \]\[a-e\]\[\t 
> \]+\[^\n\]*bts_offset" 5 } } */
> +/* { dg-final { scan-assembler-times 
> "(?:(?:\\.4byte|\\.long|data4\\.ua|\\.ualong|\\.uaword|\\.dword|long|dc\\.l|\\.word)\[\t
>  \]|\\.vbyte\t4,\[\t \]?)\[a-e\]\[\t \]+\[^\n\]*bts_offset" 5 } } */
>  /* { dg-final { scan-assembler-times "my_cstruct\[\t \]+\[^\n\]*bts_offset" 
> 1 } } */
>  /* { dg-final { scan-assembler-times "bigarr\[\t \]+\[^\n\]*bts_offset" 1 } 
> } */
>  
> 
>   Jakub
> 


Re: [PATCH] btf: Emit labels in DATASEC bts_offset entries.

2024-03-27 Thread Cupertino Miranda


Hi Jakub,

Thanks for the patch and appologies for the results regression.

Cupertino

Jakub Jelinek writes:

> On Tue, Mar 26, 2024 at 02:28:52PM +, Cupertino Miranda wrote:
>> GCC was defining bts_offset entry to always contain 0.
>> When comparing with clang, the same entry is instead a label to the
>> respective variable or function. The assembler emits relocations for
>> those labels.
>>
>> gcc/ChangeLog:
>>  PR target/114431
>>  * btfout.cc (get_name_for_datasec_entry): Add function.
>>  (btf_asm_datasec_entry): Print label when possible.
>>
>> gcc/testsuite/ChangeLog:
>>  gcc.dg/debug/btf/btf-datasec-1.c: Correct for new
>>  implementation.
>>  gcc.dg/debug/btf/btf-datasec-2.c: Likewise
>>  gcc.dg/debug/btf/btf-pr106773.c: Likewise
>
> For next time, missing dot after Likewise
>
>> --- a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
>> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
>> @@ -19,8 +19,10 @@
>>  /* { dg-final { scan-assembler-times "0xf03\[\t \]+\[^\n\]*btt_info" 2 
>> } } */
>>  /* { dg-final { scan-assembler-times "0xf01\[\t \]+\[^\n\]*btt_info" 1 
>> } } */
>>
>> -/* The offset entry for each variable in a DATSEC should be 0 at compile 
>> time.  */
>> -/* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bts_offset" 7 } } */
>> +/* The offset entry for each variable in a DATSEC should contain a label.  
>> */
>> +/* { dg-final { scan-assembler-times ".4byte\[\t \]\[a-e\]\[\t 
>> \]+\[^\n\]*bts_offset" 5 } } */
>
> Where has this been tested?
> 4byte is used only on some targets, what exact assembler directive is used
> for 4byte unaligned data is heavily target dependent.
> git grep define.*TARGET_ASM_UNALIGNED_SI_OP
> gcc/config/alpha/alpha.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.align 
> 0\n\t.long\t"
> gcc/config/avr/avr.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
> gcc/config/cris/cris.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
> TARGET_ASM_ALIGNED_SI_OP
> gcc/config/csky/csky.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
> gcc/config/i386/i386.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
> TARGET_ASM_ALIGNED_SI_OP
> gcc/config/ia64/ia64.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\tdata4.ua\t"
> gcc/config/m68k/m68k.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
> TARGET_ASM_ALIGNED_SI_OP
> gcc/config/mcore/mcore.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
> gcc/config/pa/pa.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
> TARGET_ASM_ALIGNED_SI_OP
> gcc/config/rs6000/rs6000.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.vbyte\t4,"
> gcc/config/rs6000/rs6000.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
> gcc/config/sh/sh.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.ualong\t"
> gcc/config/sparc/sparc.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.uaword\t"
> gcc/target-def.h:#define TARGET_ASM_UNALIGNED_SI_OP "\t.4byte\t"
> gcc/target-def.h:#define TARGET_ASM_UNALIGNED_SI_OP NULL
> Now
> git grep 'define[[:blank:]]*TARGET_ASM_ALIGNED_SI_OP' | grep 
> '/\(cris\|i386\|m68k\|pa\)/'
> gcc/config/cris/cris.cc:#define TARGET_ASM_ALIGNED_SI_OP "\t.dword\t"
> gcc/config/i386/i386.cc:#define TARGET_ASM_ALIGNED_SI_OP ASM_LONG
> gcc/config/m68k/m68k.cc:#define TARGET_ASM_ALIGNED_SI_OP "\tlong\t"
> gcc/config/m68k/m68k.cc:#define TARGET_ASM_ALIGNED_SI_OP "\tdc.l\t"
> gcc/config/pa/pa.cc:#define TARGET_ASM_ALIGNED_SI_OP "\t.word\t"
> git grep 'define[[:blank:]]*ASM_LONG'
> gcc/config/i386/att.h:#define ASM_LONG "\t.long\t"
> gcc/config/i386/bsd.h:#define ASM_LONG "\t.long\t"
> gcc/config/i386/darwin.h:#define ASM_LONG "\t.long\t"
>
> So, if you want to handle it for all arches, instead of .4byte\[\t \] you'd 
> need to
> use
> (?:(?:.4byte|.long|data4.ua|.ualong|.uaword|.dword|long|dc.l|.word)\[\t 
> \]|.vbyte\t4,\[\t \]?)
> or so.
>
> BTW,
> /* { dg-options "-O0 -gbtf -dA" } */
> /* { dg-options "-O0 -gbtf -dA -msdata=none" { target { { powerpc*-*-* } && 
> ilp32 } } } */
> /* { dg-options "-O0 -gbtf -dA -msmall-data-limit=0" { target { riscv*-*-* } 
> } } */
> /* { dg-options "-O0 -gbtf -dA -G0" { target { nios2-*-* } } } */
> would be perhaps right for test written 2 decades ago, but for a test
> written 3 years ago is something that shouldn't have passed patch review.
> This should be done with
> /* { dg-options "-O0 -gbtf -dA" } */
> /* { dg-additional-options "-msdata=none" { target { { powerpc*-*-* } && 
> ilp32 } } } */
> /* { dg-additional-options "-msmall-data-limit=0" { target { riscv*-*-* } } } 
> */
> /* { dg-additional-options "-G0" { target { nios2-*-* } } } */
>
> btf-cvr-quals-1.c test suffers from that too.
>
>   Jakub


Re: [PATCH] middle-end/114480 - IDF compute is slow

2024-03-27 Thread Jakub Jelinek
On Wed, Mar 27, 2024 at 07:44:28PM +0100, Michael Matz wrote:
> Hey,
> 
> On Wed, 27 Mar 2024, Jakub Jelinek wrote:
> 
> > > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> > >gcc_checking_assert (bb_index
> > >  < (unsigned) last_basic_block_for_fn (cfun));
> > >  
> > > -  EXECUTE_IF_AND_COMPL_IN_BITMAP ([bb_index], 
> > > phi_insertion_points,
> > > -   0, i, bi)
> > > - {
> > > +  EXECUTE_IF_SET_IN_BITMAP ([bb_index], 0, i, bi)
> > > + if (bitmap_set_bit (phi_insertion_points, i))
> > > bitmap_set_bit (work_set, i);
> > > -   bitmap_set_bit (phi_insertion_points, i);
> > > - }
> > >  }
> > 
> > I don't understand why the above is better.
> > Wouldn't it be best to do
> >   bitmap_ior_and_compl_into (work_set, [bb_index],
> >  phi_insertion_points);
> >   bitmap_ior_into (phi_insertion_points, [bb_index]);
> > ?
> 
> I had the same hunch, but:
> 
> 1) One would have to make work_set be non-tree-view again (which with the 
> current structure is a wash anyway, and that makes sense as accesses to 
> work_set aren't heavily random here).
> 
> 2) But doing that and using bitmap_ior.._into is still measurably slower: 
> on a reduced testcase with -O0 -fno-checking, proposed structure 
> (tree-view or not-tree-view workset doesn't matter):
> 
>  tree SSA rewrite   :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 
> 12%)27M (  8%)
> 
> with non-tree-view, and your suggestion:
> 
>  tree SSA rewrite   :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 
> 12%)27M (  8%)
> 
> I can only speculate that the usually extreme sparsity of the bitmaps in 
> question make the setup costs of the two bitmap_ior calls actually more 
> expensive than the often skipped second call to bitmap_set_bit in Richis 
> proposed structure.  (That or cache effects)

Ok then.

Jakub



[PATCH] RTEMS: Add multilib configuration for aarch64

2024-03-27 Thread Sebastian Huber
Add a multilib with workarounds for Cortex-A53 errata.

gcc/ChangeLog:

* config.gcc (aarch64-*-rtems*): Add target makefile fragment
t-aarch64-rtems.
* config/aarch64/t-aarch64-rtems: New file.
---
 gcc/config.gcc |  1 +
 gcc/config/aarch64/t-aarch64-rtems | 42 ++
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/config/aarch64/t-aarch64-rtems

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 648b3dc2110..c3b73d05eb7 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1139,6 +1139,7 @@ aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
 ;;
aarch64-*-rtems*)
tm_file="${tm_file} aarch64/rtems.h rtems.h"
+   tmake_file="${tmake_file} aarch64/t-aarch64-rtems"
;;
esac
case $target in
diff --git a/gcc/config/aarch64/t-aarch64-rtems 
b/gcc/config/aarch64/t-aarch64-rtems
new file mode 100644
index 000..11f8c380222
--- /dev/null
+++ b/gcc/config/aarch64/t-aarch64-rtems
@@ -0,0 +1,42 @@
+# Machine description for AArch64 architecture.
+#  Copyright (C) 2024 Free Software Foundation, Inc.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  .
+
+MULTILIB_OPTIONS  =
+MULTILIB_DIRNAMES =
+
+MULTILIB_OPTIONS  += mabi=ilp32
+MULTILIB_DIRNAMES += ilp32
+
+MULTILIB_OPTIONS  += mno-outline-atomics
+MULTILIB_DIRNAMES += nooa
+
+MULTILIB_OPTIONS  += mcpu=cortex-a53
+MULTILIB_DIRNAMES += a53
+
+MULTILIB_OPTIONS  += mfix-cortex-a53-835769
+MULTILIB_DIRNAMES += fix835769
+
+MULTILIB_OPTIONS  += mfix-cortex-a53-843419
+MULTILIB_DIRNAMES += fix843419
+
+MULTILIB_REQUIRED =
+
+MULTILIB_REQUIRED += mabi=ilp32
+MULTILIB_REQUIRED += 
mabi=ilp32/mno-outline-atomics/mcpu=cortex-a53/mfix-cortex-a53-835769/mfix-cortex-a53-843419
+MULTILIB_REQUIRED += 
mno-outline-atomics/mcpu=cortex-a53/mfix-cortex-a53-835769/mfix-cortex-a53-843419
-- 
2.35.3



Go patch committed: Update issue16016 test

2024-03-27 Thread Ian Lance Taylor
This patch to the Go testsuite updates issue16016.go.  This backports
https://go.dev/cl/574536 into the GCC testsuite.  This fixes PR
go/114453.  Bootstrapped and ran test.  Committed to mainline.

Ian
5b6f599670994bef957bd15c683102468a7104f1
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue16016.go 
b/gcc/testsuite/go.test/test/fixedbugs/issue16016.go
index e738e1dba0e..b1947f5548d 100644
--- a/gcc/testsuite/go.test/test/fixedbugs/issue16016.go
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue16016.go
@@ -6,7 +6,10 @@
 
 package main
 
-import "time"
+import (
+   "runtime"
+   "time"
+)
 
 type T struct{}
 
@@ -24,8 +27,19 @@ type Q interface {
 }
 
 func main() {
+   var count = 1
+   if runtime.Compiler == "gccgo" {
+   // On targets without split-stack libgo allocates
+   // a large stack for each goroutine. On 32-bit
+   // systems this test can run out of memory.
+   const intSize = 32 << (^uint(0) >> 63) // 32 or 64
+   if intSize < 64 {
+   count = 100
+   }
+   }
+
var q Q = {{}}
-   for i := 0; i < 1; i++ {
+   for i := 0; i < count; i++ {
go func() {
defer q.Foo([]interface{}{"meow"})
time.Sleep(100 * time.Millisecond)


Re: [PATCH] middle-end/114480 - IDF compute is slow

2024-03-27 Thread Michael Matz
Hey,

On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> >gcc_checking_assert (bb_index
> >< (unsigned) last_basic_block_for_fn (cfun));
> >  
> > -  EXECUTE_IF_AND_COMPL_IN_BITMAP ([bb_index], phi_insertion_points,
> > - 0, i, bi)
> > -   {
> > +  EXECUTE_IF_SET_IN_BITMAP ([bb_index], 0, i, bi)
> > +   if (bitmap_set_bit (phi_insertion_points, i))
> >   bitmap_set_bit (work_set, i);
> > - bitmap_set_bit (phi_insertion_points, i);
> > -   }
> >  }
> 
> I don't understand why the above is better.
> Wouldn't it be best to do
>   bitmap_ior_and_compl_into (work_set, [bb_index],
>phi_insertion_points);
>   bitmap_ior_into (phi_insertion_points, [bb_index]);
> ?

I had the same hunch, but:

1) One would have to make work_set be non-tree-view again (which with the 
current structure is a wash anyway, and that makes sense as accesses to 
work_set aren't heavily random here).

2) But doing that and using bitmap_ior.._into is still measurably slower: 
on a reduced testcase with -O0 -fno-checking, proposed structure 
(tree-view or not-tree-view workset doesn't matter):

 tree SSA rewrite   :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 
12%)27M (  8%)

with non-tree-view, and your suggestion:

 tree SSA rewrite   :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 
12%)27M (  8%)

I can only speculate that the usually extreme sparsity of the bitmaps in 
question make the setup costs of the two bitmap_ior calls actually more 
expensive than the often skipped second call to bitmap_set_bit in Richis 
proposed structure.  (That or cache effects)


Ciao,
Michael.


Re: [PATCH] c++: fix alias CTAD [PR114377]

2024-03-27 Thread centurion
>From 22056e95bde82b1dc45b8b611be4c8d756122b02 Mon Sep 17 00:00:00 2001
From: centurion 
Date: Wed, 27 Mar 2024 22:29:57 +0400
Subject: [PATCH] c++: fix alias CTAD [PR114377]

PR c++/114377

gcc/cp/ChangeLog:

PR c++/114377
* pt.cc (find_template_parameter_info::found): Use TREE_TYPE for
TEMPLATE_DECL instead of DECL_INITIAL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias19.C: New test.
---
 gcc/cp/pt.cc  |  3 ++-
 .../g++.dg/cpp2a/class-deduction-alias19.C| 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 7b00a8615d2..1425d6116d0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11032,7 +11032,8 @@ find_template_parameter_info::found (tree parm)
 {
   if (TREE_CODE (parm) == TREE_LIST)
 parm = TREE_VALUE (parm);
-  if (TREE_CODE (parm) == TYPE_DECL)
+  if (TREE_CODE (parm) == TYPE_DECL
+  || TREE_CODE (parm) == TEMPLATE_DECL)
 parm = TREE_TYPE (parm);
   else
 parm = DECL_INITIAL (parm);
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
new file mode 100644
index 000..1ea79bd7691
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
@@ -0,0 +1,15 @@
+// PR c++/114377
+// { dg-do compile { target c++20 } }
+
+template  typename Iterator>
+struct K {};
+
+template 
+class Foo {};
+
+template  typename TTP>
+using Bar = Foo>;
+
+void s() {
+Bar(1); // { dg-error "failed|no match" }
+}
-- 
2.44.0


On Wednesday, March 27th, 2024 at 6:21 PM, Patrick Palka  
wrote:

> On Mon, 25 Mar 2024, centurion wrote:
> 
> > From b34312d82b236601c348382d30e625558f37d40c Mon Sep 17 00:00:00 2001
> > From: centurion centurion...@proton.me
> > Date: Mon, 25 Mar 2024 01:57:21 +0400
> > Subject: [PATCH] c++: fix alias CTAD [PR114377]
> > 
> > PR c++/114377
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/114377
> > * pt.cc (find_template_parameter_info::found): Use TREE_TYPE for
> > TEMPLATE_DECL instead of DECL_INITIAL.
> 
> 
> Makes sense, this is consistent with e.g. template_parm_to_arg. LGTM!
> 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/class-deduction-alias19.C: New test.
> > ---
> > gcc/cp/pt.cc | 3 ++-
> > .../g++.dg/cpp2a/class-deduction-alias19.C | 15 +++
> > 2 files changed, 17 insertions(+), 1 deletion(-)
> > create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 8cf0d5b7a8d..d8a02f1cd7f 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -11032,7 +11032,8 @@ find_template_parameter_info::found (tree parm)
> > {
> > if (TREE_CODE (parm) == TREE_LIST)
> > parm = TREE_VALUE (parm);
> > - if (TREE_CODE (parm) == TYPE_DECL)
> > + if (TREE_CODE (parm) == TYPE_DECL
> > + || TREE_CODE(parm) == TEMPLATE_DECL)
> 
> 
> Style note: there should be a space before the ( of a function/macro call
> 
> > parm = TREE_TYPE (parm);
> > else
> > parm = DECL_INITIAL (parm);
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C 
> > b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
> > new file mode 100644
> > index 000..1ea79bd7691
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
> > @@ -0,0 +1,15 @@
> > +// PR c++/114377
> > +// { dg-do compile { target c++20 } }
> > +
> > +template  typename Iterator>
> > +struct K {};
> > +
> > +template 
> > +class Foo {};
> > +
> > +template  typename TTP>
> > +using Bar = Foo>;
> > +
> > +void s() {
> > + Bar(1); // { dg-error "failed|no match" }
> > +}
> > --
> > 2.44.0


Go patch committed: initialize local variable in lower_method_expression

2024-03-27 Thread Ian Lance Taylor
This patch to the Go frontend fixes an uninitialized variables in
lower_method_expression.  This fixes GCC PR 114463.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
7b2a24f3964509bd5b74c4579c7ea5684e82aee1
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 73cb095322c..de6e21fb3b5 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-e15a14e410b8fc5d28012d5b313cb6c8476c7df9
+3f597287b6b858794dabdfe1bf83b386aad18102
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 51ff0206129..8429e553eac 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -9059,7 +9059,7 @@ Selector_expression::lower_method_expression(Gogo* gogo)
 
   Named_type* nt = type->named_type();
   Struct_type* st = type->struct_type();
-  bool is_ambiguous;
+  bool is_ambiguous = false;
   Method* method = NULL;
   if (nt != NULL)
 method = nt->method_function(name, _ambiguous);


[PATCHv2 2/2] aarch64: Add support for _BitInt

2024-03-27 Thread Andre Vieira (lists)
This patch adds support for C23's _BitInt for the AArch64 port when 
compiling for little endianness.  Big Endianness requires further 
target-agnostic support and we therefor disable it for now.


The tests expose some suboptimal codegen for which I'll create PR's for 
optimizations after this goes in.


gcc/ChangeLog:

* config/aarch64/aarch64.cc (TARGET_C_BITINT_TYPE_INFO): Declare MACRO.
(aarch64_bitint_type_info): New function.
(aarch64_return_in_memory_1): Return large _BitInt's in memory.
(aarch64_function_arg_alignment): Adapt to correctly return the ABI
mandated alignment of _BitInt(N) where N > 128 as the alignment of
TImode.
(aarch64_composite_type_p): Return true for _BitInt(N), where N > 128.

libgcc/ChangeLog:

* config/aarch64/t-softfp (softfp_extras): Add floatbitinthf,
floatbitintbf, floatbitinttf and fixtfbitint.
* config/aarch64/libgcc-softfp.ver (GCC_14.0.0): Add __floatbitinthf,
__floatbitintbf, __floatbitinttf and __fixtfbitint.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bitint-alignments.c: New test.
* gcc.target/aarch64/bitint-args.c: New test.
* gcc.target/aarch64/bitint-sizes.c: New test.
* gcc.target/aarch64/bitfield-bitint-abi.h: New header.
* gcc.target/aarch64/bitfield-bitint-abi-align16.c: New test.
* gcc.target/aarch64/bitfield-bitint-abi-align8.c: New test.diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
b68cf3e7cb9a6fa89b4e5826a39ffa11f64ca20a..5fe55c6e980bc1ea66df0e4357932123cd049366
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6583,6 +6583,7 @@ aarch64_return_in_memory_1 (const_tree type)
   int count;
 
   if (!AGGREGATE_TYPE_P (type)
+  && TREE_CODE (type) != BITINT_TYPE
   && TREE_CODE (type) != COMPLEX_TYPE
   && TREE_CODE (type) != VECTOR_TYPE)
 /* Simple scalar types always returned in registers.  */
@@ -21991,6 +21992,11 @@ aarch64_composite_type_p (const_tree type,
   if (type && (AGGREGATE_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE))
 return true;
 
+  if (type
+  && TREE_CODE (type) == BITINT_TYPE
+  && int_size_in_bytes (type) > 16)
+return true;
+
   if (mode == BLKmode
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
@@ -28472,6 +28478,42 @@ aarch64_excess_precision (enum excess_precision_type 
type)
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement TARGET_C_BITINT_TYPE_INFO.
+   Return true if _BitInt(N) is supported and fill its details into *INFO.  */
+bool
+aarch64_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (TARGET_BIG_END)
+return false;
+
+  if (n <= 8)
+info->limb_mode = QImode;
+  else if (n <= 16)
+info->limb_mode = HImode;
+  else if (n <= 32)
+info->limb_mode = SImode;
+  else if (n <= 64)
+info->limb_mode = DImode;
+  else if (n <= 128)
+info->limb_mode = TImode;
+  else
+/* The AAPCS for AArch64 defines _BitInt(N > 128) as an array with
+   type {signed,unsigned} __int128[M] where M*128 >= N.  However, to be
+   able to use libgcc's implementation to support large _BitInt's we need
+   to use a LIMB_MODE that is no larger than 'long long'.  This is why we
+   use DImode for our internal LIMB_MODE and we define the ABI_LIMB_MODE to
+   be TImode to ensure we are ABI compliant.  */
+info->limb_mode = DImode;
+
+  if (n > 128)
+info->abi_limb_mode = TImode;
+  else
+info->abi_limb_mode = info->limb_mode;
+  info->big_endian = TARGET_BIG_END;
+  info->extended = false;
+  return true;
+}
+
 /* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
scheduled for speculative execution.  Reject the long-running division
and square-root instructions.  */
@@ -30596,6 +30638,9 @@ aarch64_run_selftests (void)
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION aarch64_excess_precision
 
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO aarch64_bitint_type_info
+
 #undef  TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN aarch64_expand_builtin
 
diff --git a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c 
b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
new file mode 100644
index 
..048d04e4c1bf90215892aa0173f6246a097d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
@@ -0,0 +1,378 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -save-temps -fno-schedule-insns 
-fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define ALIGN 16
+#include "bitfield-bitint-abi.h"
+
+// f1-f16 are all the same
+
+/*
+** f1:
+** and x0, x2, 1
+** ret
+*/
+/*
+** f8:
+** and x0, x2, 1
+** ret
+*/
+/*
+** f16:
+** and x0, x2, 1

[PATCHv2 1/2] aarch64: Do not give ABI change diagnostics for _BitInt(N)

2024-03-27 Thread Andre Vieira (lists)
This patch makes sure we do not give ABI change diagnostics for the ABI 
breaks of GCC 9, 13 and 14 for any type involving _BitInt(N), since that 
type did not exist before this GCC version.


ChangeLog:

* config/aarch64/aarch64.cc (bitint_or_aggr_of_bitint_p): New function.
(aarch64_layout_arg): Don't emit diagnostics for types involving
_BitInt(N).diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
1ea84c8bd7386e399f6ffa3a5e36408cf8831fc6..b68cf3e7cb9a6fa89b4e5826a39ffa11f64ca20a
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6744,6 +6744,33 @@ aarch64_function_arg_alignment (machine_mode mode, 
const_tree type,
   return alignment;
 }
 
+/* Return true if TYPE describes a _BitInt(N) or an angreggate that uses the
+   _BitInt(N) type.  These include ARRAY_TYPE's with an element that is a
+   _BitInt(N) or an aggregate that uses it, and a RECORD_TYPE or a UNION_TYPE
+   with a field member that is a _BitInt(N) or an aggregate that uses it.
+   Return false otherwise.  */
+
+static bool
+bitint_or_aggr_of_bitint_p (tree type)
+{
+  if (!type)
+return false;
+
+  if (TREE_CODE (type) == BITINT_TYPE)
+return true;
+
+  /* If ARRAY_TYPE, check it's element type.  */
+  if (TREE_CODE (type) == ARRAY_TYPE)
+return bitint_or_aggr_of_bitint_p (TREE_TYPE (type));
+
+  /* If RECORD_TYPE or UNION_TYPE, check the fields' types.  */
+  if (RECORD_OR_UNION_TYPE_P (type))
+for (tree field = TYPE_FIELDS (type); field; field = TREE_CHAIN (field))
+  if (bitint_or_aggr_of_bitint_p (TREE_TYPE (field)))
+   return true;
+  return false;
+}
+
 /* Layout a function argument according to the AAPCS64 rules.  The rule
numbers refer to the rule numbers in the AAPCS64.  ORIG_MODE is the
mode that was originally given to us by the target hook, whereas the
@@ -6767,12 +6794,6 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const 
function_arg_info )
   if (pcum->aapcs_arg_processed)
 return;
 
-  bool warn_pcs_change
-= (warn_psabi
-   && !pcum->silent_p
-   && (currently_expanding_function_start
-  || currently_expanding_gimple_stmt));
-
   /* HFAs and HVAs can have an alignment greater than 16 bytes.  For example:
 
typedef struct foo {
@@ -6907,6 +6928,18 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const 
function_arg_info )
  && (!alignment || abi_break_gcc_9 < alignment)
  && (!abi_break_gcc_13 || alignment < abi_break_gcc_13));
 
+
+  bool warn_pcs_change
+= (warn_psabi
+   && !pcum->silent_p
+   && (currently_expanding_function_start
+  || currently_expanding_gimple_stmt)
+  /* warn_pcs_change is currently used to gate diagnostics in case of
+abi_break_gcc_{9,13,14}.  These however, do not apply to _BitInt(N)
+types as they were only introduced in GCC 14.  */
+   && (!type || !bitint_or_aggr_of_bitint_p (type)));
+
+
   /* allocate_ncrn may be false-positive, but allocate_nvrn is quite reliable.
  The following code thus handles passing by SIMD/FP registers first.  */
 
@@ -21266,19 +21299,25 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, 
gimple_seq *pre_p,
   rsize = ROUND_UP (size, UNITS_PER_WORD);
   nregs = rsize / UNITS_PER_WORD;
 
-  if (align <= 8 && abi_break_gcc_13 && warn_psabi)
+  if (align <= 8
+ && abi_break_gcc_13
+ && warn_psabi
+ && !bitint_or_aggr_of_bitint_p (type))
inform (input_location, "parameter passing for argument of type "
"%qT changed in GCC 13.1", type);
 
   if (warn_psabi
  && abi_break_gcc_14
- && (abi_break_gcc_14 > 8 * BITS_PER_UNIT) != (align > 8))
+ && (abi_break_gcc_14 > 8 * BITS_PER_UNIT) != (align > 8)
+ && !bitint_or_aggr_of_bitint_p (type))
inform (input_location, "parameter passing for argument of type "
"%qT changed in GCC 14.1", type);
 
   if (align > 8)
{
- if (abi_break_gcc_9 && warn_psabi)
+ if (abi_break_gcc_9
+ && warn_psabi
+ && !bitint_or_aggr_of_bitint_p (type))
inform (input_location, "parameter passing for argument of type "
"%qT changed in GCC 9.1", type);
  dw_align = true;


[PATCHv2 0/2] aarch64, bitint: Add support for _BitInt for AArch64 Little Endian

2024-03-27 Thread Andre Vieira (lists)

Hi,

Introduced a new patch to disable diagnostics for ABI breaks involving 
_BitInt(N) given the type didn't exist, let me know what you think of that.


Also added further testing to replicate the ABI diagnostic tests to use 
_BitInt(N).


Andre Vieira (2)
aarch64: Do not give ABI change diagnostics for _BitInt(N)
aarch64: Add support for _BitInt



Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener



> Am 27.03.2024 um 18:37 schrieb Peter0x44 :
> 
> 
>> 
>>> >> > Another way would be to have a portable solution to truncate a file
>>> >> > (maybe even removing it would work).  I don't think we should override
>>> >> > SHELL.
> I've been thinking harder about this, these files get unlinked at the end if 
> they are temporary.
> Is there no way to get make to communicate back this info so lto-wrapper can 
> just unlink the file on its own? It feels easier and less invasive than 
> introducing a whole new option to the driver for only that purpose. If there 
> is some way, I'm not aware of it.

The point is we want to do this earlier than when control gets back to 
LTO-wrapper to reduce the amount of disk space required.

Richard.

Re: [PATCH] c-family: Cast __atomic_load_*/__atomic_exchange_* result to _BitInt rather then VCE it [PR114469]

2024-03-27 Thread Joseph Myers
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> As written in the PR, torture/bitint-64.c test fails with -O2 -flto
> and the reason is that on _BitInt arches where the padding bits
> are undefined, the padding bits in the _Atomic vars are also undefined,
> but when __atomic_load or __atomic_exchange on a _BitInt _Atomic variable
> with some padding bits is lowered into __atomic_load_{1,2,4,8,16} or
> __atomic_exchange_*, the mode precision unsigned result is VIEW_CONVERT_EXPR
> converted to _BitInt and because of the VCE nothing actually sign/zero
> extends it as needed for later uses - the var is no longer addressable and
> expansion assumes such automatic vars are properly extended.
> 
> The following patch fixes that by using NOP_EXPR on it (the
> VIEW_CONVERT_EXPR after it will then be optimized away during
> gimplification, didn't want to repeat it in the code as else result = build1
> (VIEW_CONVERT_EXPR, ...); twice.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This is OK.  A natural question arising is whether anything else might 
generate a VIEW_CONVERT_EXPR for code that somehow reinterprets memory as 
of type _BitInt, and then later fail to ensure required sign/zero 
extension, or whether this is the only place where a VIEW_CONVERT_EXPR of 
_BitInt type can cause problems.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Peter0x44

>> > Another way would be to have a portable solution to truncate a file
>> > (maybe even removing it would work).  I don't think we should override
>> > SHELL.
I've been thinking harder about this, these files get unlinked at the 
end if they are temporary.
Is there no way to get make to communicate back this info so lto-wrapper 
can just unlink the file on its own? It feels easier and less invasive 
than introducing a whole new option to the driver for only that purpose. 
If there is some way, I'm not aware of it.


Re: [PATCH] fortran: Fix specification expression check in submodules [PR114475]

2024-03-27 Thread Steve Kargl
On Wed, Mar 27, 2024 at 04:30:42PM +0100, Mikael Morin wrote:
> Hell(o),
> 
> it didn't take long for my recent patch for PR111781 to show a regression.
> The fix proposed here is actually the one Harald posted in the PR.
> I can't imagine a case where it wouldn't do the right thing.
> Regression tested on x86_64-pc-linux-gnu.
> OK for master?
> 

Yes.  Thank you (and Harald) for the quick fix.

-- 
Steve


[PATCH] combine: Don't combine if I2 does not change

2024-03-27 Thread Segher Boessenkool
In some cases combine will "combine" an I2 and I3, but end up putting
exactly the same thing back as I2 as was there before.  This is never
progress, so we shouldn't do it, it will lead to oscillating behaviour
and the like.

If we want to canonicalise things, that's fine, but this is not the
way to do it.

Committed to trunk.


Segher


2024-03-27  Segher Boessenkool  

PR rtl-optimization/101523
* combine.cc (try_combine): Don't do a 2-insn combination if
it does not in fact change I2.
---
 gcc/combine.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d8364..745391016d04 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,17 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   adjust_for_new_dest (i3);
 }
 
+  /* If I2 didn't change, this is not a combination (but a simplification or
+ canonicalisation with context), which should not be done here.  Doing
+ it here explodes the algorithm.  Don't.  */
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+{
+  if (dump_file)
+   fprintf (dump_file, "i2 didn't change, not doing this\n");
+  undo_all ();
+  return 0;
+}
+
   /* We now know that we can do this combination.  Merge the insns and
  update the status of registers and LOG_LINKS.  */
 
-- 
1.8.3.1



Re: [PATCH] middle-end/114480 - IDF compute is slow

2024-03-27 Thread Jakub Jelinek
On Wed, Mar 27, 2024 at 04:42:21PM +0100, Richard Biener wrote:
>   PR middle-end/114480
>   * cfganal.cc (compute_idf): Use simpler bitmap iteration,
>   touch work_set only when phi_insertion_points changed.
> ---
>  gcc/cfganal.cc | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/cfganal.cc b/gcc/cfganal.cc
> index 432775decf1..5ef629f677e 100644
> --- a/gcc/cfganal.cc
> +++ b/gcc/cfganal.cc
> @@ -1701,8 +1701,7 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
>   on earlier blocks first is better.
>???  Basic blocks are by no means guaranteed to be ordered in
>optimal order for this iteration.  */
> -  bb_index = bitmap_first_set_bit (work_set);
> -  bitmap_clear_bit (work_set, bb_index);
> +  bb_index = bitmap_clear_first_set_bit (work_set);
>  
>/* Since the registration of NEW -> OLD name mappings is done
>separately from the call to update_ssa, when updating the SSA

The above is clearly obvious.

> @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
>gcc_checking_assert (bb_index
>  < (unsigned) last_basic_block_for_fn (cfun));
>  
> -  EXECUTE_IF_AND_COMPL_IN_BITMAP ([bb_index], phi_insertion_points,
> -   0, i, bi)
> - {
> +  EXECUTE_IF_SET_IN_BITMAP ([bb_index], 0, i, bi)
> + if (bitmap_set_bit (phi_insertion_points, i))
> bitmap_set_bit (work_set, i);
> -   bitmap_set_bit (phi_insertion_points, i);
> - }
>  }

I don't understand why the above is better.
Wouldn't it be best to do
  bitmap_ior_and_compl_into (work_set, [bb_index],
 phi_insertion_points);
  bitmap_ior_into (phi_insertion_points, [bb_index]);
?

Jakub



[PATCH] middle-end/114480 - IDF compute is slow

2024-03-27 Thread Richard Biener
The testcase in this PR shows very slow IDF compute:

  tree SSA rewrite   :  76.99 ( 31%)
  24.78%243663  cc1plus  cc1plus [.] compute_idf

which can be mitigated to some extent by refactoring the bitmap
operations to simpler variants.  With the patch below this becomes

  tree SSA rewrite   :  15.23 (  8%)

when not optimizing and in addition to that

  tree SSA incremental   : 181.52 ( 30%)

to

  tree SSA incremental   :  24.09 (  6%)

when optimizing.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR middle-end/114480
* cfganal.cc (compute_idf): Use simpler bitmap iteration,
touch work_set only when phi_insertion_points changed.
---
 gcc/cfganal.cc | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/gcc/cfganal.cc b/gcc/cfganal.cc
index 432775decf1..5ef629f677e 100644
--- a/gcc/cfganal.cc
+++ b/gcc/cfganal.cc
@@ -1701,8 +1701,7 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
  on earlier blocks first is better.
 ???  Basic blocks are by no means guaranteed to be ordered in
 optimal order for this iteration.  */
-  bb_index = bitmap_first_set_bit (work_set);
-  bitmap_clear_bit (work_set, bb_index);
+  bb_index = bitmap_clear_first_set_bit (work_set);
 
   /* Since the registration of NEW -> OLD name mappings is done
 separately from the call to update_ssa, when updating the SSA
@@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
   gcc_checking_assert (bb_index
   < (unsigned) last_basic_block_for_fn (cfun));
 
-  EXECUTE_IF_AND_COMPL_IN_BITMAP ([bb_index], phi_insertion_points,
- 0, i, bi)
-   {
+  EXECUTE_IF_SET_IN_BITMAP ([bb_index], 0, i, bi)
+   if (bitmap_set_bit (phi_insertion_points, i))
  bitmap_set_bit (work_set, i);
- bitmap_set_bit (phi_insertion_points, i);
-   }
 }
 
   return phi_insertion_points;
-- 
2.35.3


[PATCH] fortran: Fix specification expression check in submodules [PR114475]

2024-03-27 Thread Mikael Morin
Hell(o),

it didn't take long for my recent patch for PR111781 to show a regression.
The fix proposed here is actually the one Harald posted in the PR.
I can't imagine a case where it wouldn't do the right thing.
Regression tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

The patch fixing PR111781 made the check of specification expressions more
restrictive, disallowing local variables in specification expressions of
dummy arguments.  PR114475 showed an example where that change regressed,
disallowing in submodules expressions that had been allowed in the parent
module.  In submodules indeed, the hierarchy of namespaces inherited from
the parent module is not reproduced so the host-association of symbols
can't be recognized by checking the nesting of namespaces.

This change fixes the problem by allowing in specification expressions
all the symbols in a submodule that are inherited from the parent module.

PR fortran/111781
PR fortran/114475

gcc/fortran/ChangeLog:

* expr.cc (check_restricted): In submodules, allow variables host-
associated from the parent module.

gcc/testsuite/ChangeLog:

* gfortran.dg/spec_expr_10.f90: New test.

Co-authored-by: Harald Anlauf 
---
 gcc/fortran/expr.cc|  1 +
 gcc/testsuite/gfortran.dg/spec_expr_10.f90 | 46 ++
 2 files changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/spec_expr_10.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 9a042cd7040..09d1ebd95d2 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -3517,6 +3517,7 @@ check_restricted (gfc_expr *e)
   if (e->error
|| sym->attr.in_common
|| sym->attr.use_assoc
+   || sym->attr.used_in_submodule
|| sym->attr.dummy
|| sym->attr.implied_index
|| sym->attr.flavor == FL_PARAMETER
diff --git a/gcc/testsuite/gfortran.dg/spec_expr_10.f90 
b/gcc/testsuite/gfortran.dg/spec_expr_10.f90
new file mode 100644
index 000..287b5a8d6cc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/spec_expr_10.f90
@@ -0,0 +1,46 @@
+! { dg-do compile }
+!
+! PR fortran/114475
+! The array specification of PP in OL_EVAL used to be rejected in the submodule
+! because the compiler was not able to see the host-association of N_EXTERNAL
+! there.
+!
+! Contributed by Jürgen Reuter .
+
+module t1
+  use, intrinsic :: iso_c_binding
+  implicit none
+  private
+  public :: t1_t
+  integer :: N_EXTERNAL = 0
+
+  type :: t1_t
+  contains
+procedure :: set_n_external => t1_set_n_external
+  end type t1_t
+
+  abstract interface
+ subroutine ol_eval (id, pp, emitter) bind(C)
+   import
+   real(kind = c_double), intent(in) :: pp(5 * N_EXTERNAL)
+ end subroutine ol_eval
+  end interface
+  interface
+module subroutine t1_set_n_external (object, n)
+  class(t1_t), intent(inout) :: object
+  integer, intent(in) :: n
+end subroutine t1_set_n_external
+  end interface
+
+end module t1
+
+submodule (t1) t1_s
+  implicit none
+contains
+  module subroutine t1_set_n_external (object, n)
+class(t1_t), intent(inout) :: object
+integer, intent(in) :: n
+N_EXTERNAL = n
+  end subroutine t1_set_n_external
+
+end submodule t1_s
-- 
2.43.0



Re: [PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Jakub Jelinek
On Wed, Mar 27, 2024 at 01:42:03PM +0100, Richard Biener wrote:
> The comment says it was added for (char)_Bool, probably 
> gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
> is important, it seems early SRA introduces a V_C_E here and then
> phiopt the conversion to unsigned char.  But this is a V_C_E of
> _Bool -> unsigned char -> _Bool.  We seem to apply both this
> pattern and then drop the V_C_E.
> 
> Maybe we can reasonably optimize the case where type == TREE_TYPE (@1)
> and the intermediate precision is bigger or equal?

If type is INTEGRAL_TYPE_P and has TYPE_PRECISION <= that of TREE_TYPE (@1),
sure, it can be optimized whenever TREE_TYPE (@0) has precision >= that,
no matter what TYPE_SIGN TREE_TYPE (@1) has.
Anyway, will try to cook up something.

Jakub



Re: [PATCH] docs: Use @var{S} etc. in Spec File invoke.texi documentation

2024-03-27 Thread Sandra Loosemore

On 3/27/24 04:57, Jakub Jelinek wrote:

Hi!

We got internally a question about the Spec File syntax, misunderstanding
what is the literal syntax and what are the placeholder variables in
the syntax descriptions.
The following patch attempts to use @var{S} etc. instead of just S
to clarify it stands for any option (or start of option etc.) rather
than literal S, say in %{S:X}.  At least in HTML documentation it
then uses italics.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Looks good for me, as long as you visually inspected the changed text in 
the manual and didn't just make sure it still builds.  (I generally 
consider any markup/formatting changes "obvious" as long as they 
actually look OK.)


-Sandra


Re: [PATCH] c++: fix alias CTAD [PR114377]

2024-03-27 Thread Patrick Palka
On Mon, 25 Mar 2024, centurion wrote:

> From b34312d82b236601c348382d30e625558f37d40c Mon Sep 17 00:00:00 2001
> From: centurion 
> Date: Mon, 25 Mar 2024 01:57:21 +0400
> Subject: [PATCH] c++: fix alias CTAD [PR114377]
> 
> PR c++/114377
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/114377
>   * pt.cc (find_template_parameter_info::found): Use TREE_TYPE for
>   TEMPLATE_DECL instead of DECL_INITIAL.

Makes sense, this is consistent with e.g. template_parm_to_arg.  LGTM!

> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/class-deduction-alias19.C: New test.
> ---
>  gcc/cp/pt.cc  |  3 ++-
>  .../g++.dg/cpp2a/class-deduction-alias19.C| 15 +++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 8cf0d5b7a8d..d8a02f1cd7f 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -11032,7 +11032,8 @@ find_template_parameter_info::found (tree parm)
>  {
>if (TREE_CODE (parm) == TREE_LIST)
>  parm = TREE_VALUE (parm);
> -  if (TREE_CODE (parm) == TYPE_DECL)
> +  if (TREE_CODE (parm) == TYPE_DECL
> +  || TREE_CODE(parm) == TEMPLATE_DECL)

Style note: there should be a space before the ( of a function/macro call

>  parm = TREE_TYPE (parm);
>else
>  parm = DECL_INITIAL (parm);
> diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C 
> b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
> new file mode 100644
> index 000..1ea79bd7691
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
> @@ -0,0 +1,15 @@
> +// PR c++/114377
> +// { dg-do compile { target c++20 } }
> +
> +template  typename Iterator>
> +struct K {};
> +
> +template 
> +class Foo {};
> +
> +template  typename TTP>
> +using Bar = Foo>;
> +
> +void s() {
> +Bar(1); // { dg-error "failed|no match" }
> +}
> -- 
> 2.44.0
> 
> 
> 



Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Matthias Kretz
On Wednesday, 27 March 2024 10:50:41 CET Jonathan Wakely wrote:
> As discussed on IRC, please push the revised patch with your
> suggestions incorporated (and post to the lists for posterity).

The patch as pushed is attached.

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──
commit 9ac3119fec81fb64d11dee8f853145f937389366
Author: Srinivas Yadav Singanaboina 
Date:   Sat Mar 16 19:04:35 2024 +

libstdc++: add ARM SVE support to std::experimental::simd

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add simd_sve.h.
* include/Makefile.in: Add simd_sve.h.
* include/experimental/bits/simd.h: Add new SveAbi.
* include/experimental/bits/simd_builtin.h: Use
__no_sve_deduce_t to support existing Neon Abi.
* include/experimental/bits/simd_converter.h: Convert
sequentially when sve is available.
* include/experimental/bits/simd_detail.h: Define sve
specific macro.
* include/experimental/bits/simd_math.h: Fallback frexp
to execute sequntially when sve is available, to handle
fixed_size_simd return type that always uses sve.
* include/experimental/simd: Include bits/simd_sve.h.
* testsuite/experimental/simd/tests/bits/main.h: Enable
testing for sve128, sve256, sve512.
* include/experimental/bits/simd_sve.h: New file.

Signed-off-by: Srinivas Yadav Singanaboina 

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index cb902de36ae..422a0f4bd0a 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -835,6 +835,7 @@ experimental_bits_headers = \
 	${experimental_bits_srcdir}/simd_neon.h \
 	${experimental_bits_srcdir}/simd_ppc.h \
 	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_sve.h \
 	${experimental_bits_srcdir}/simd_x86.h \
 	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 9357087acb4..9fd4ab4848c 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1181,6 +1181,7 @@ experimental_bits_headers = \
 	${experimental_bits_srcdir}/simd_neon.h \
 	${experimental_bits_srcdir}/simd_ppc.h \
 	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_sve.h \
 	${experimental_bits_srcdir}/simd_x86.h \
 	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 974377c6798..03c2e17a326 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -39,12 +39,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #if _GLIBCXX_SIMD_X86INTRIN
 #include 
 #elif _GLIBCXX_SIMD_HAVE_NEON
 #include 
 #endif
+#if _GLIBCXX_SIMD_HAVE_SVE
+#include 
+#endif
 
 /** @ingroup ts_simd
  * @{
@@ -83,6 +87,12 @@
 using __m512i [[__gnu__::__vector_size__(64)]] = long long;
 #endif
 
+#if _GLIBCXX_SIMD_HAVE_SVE
+constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS / 8;
+#else
+constexpr inline int __sve_vectorized_size_bytes = 0;
+#endif
+
 namespace simd_abi {
 // simd_abi forward declarations {{{
 // implementation details:
@@ -108,6 +118,9 @@ struct _VecBuiltin
 template 
   struct _VecBltnBtmsk;
 
+template 
+  struct _SveAbi;
+
 template 
   using _VecN = _VecBuiltin;
 
@@ -123,6 +136,9 @@ struct _VecBltnBtmsk
 template 
   using _Neon = _VecBuiltin<_UsedBytes>;
 
+template 
+  using _Sve = _SveAbi<_UsedBytes, __sve_vectorized_size_bytes>;
+
 // implementation-defined:
 using __sse = _Sse<>;
 using __avx = _Avx<>;
@@ -130,6 +146,7 @@ struct _VecBltnBtmsk
 using __neon = _Neon<>;
 using __neon128 = _Neon<16>;
 using __neon64 = _Neon<8>;
+using __sve = _Sve<>;
 
 // standard:
 template 
@@ -250,6 +267,9 @@ _S_apply(_Up* __ptr)
   false;
 #endif
 
+constexpr inline bool __have_sve = _GLIBCXX_SIMD_HAVE_SVE;
+constexpr inline bool __have_sve2 = _GLIBCXX_SIMD_HAVE_SVE2;
+
 #ifdef _ARCH_PWR10
 constexpr inline bool __have_power10vec = true;
 #else
@@ -356,12 +376,14 @@ __machine_flags()
 		 | (__have_avx512vnni << 27)
 		 | (__have_avx512vpopcntdq<< 28)
 		 | (__have_avx512vp2intersect << 29);
-else if constexpr (__have_neon)
+else if constexpr (__have_neon || __have_sve)
   return __have_neon
 	   | (__have_neon_a32 << 1)
 	   | (__have_neon_a64 << 2)
 	   | (__have_neon_a64 << 2)
-	   | (__support_neon_float << 3);
+	   

Re: [PATCH] c++: templated substitution into lambda-expr [PR114393]

2024-03-27 Thread Patrick Palka
On Mon, 25 Mar 2024, Patrick Palka wrote:

> On Mon, 25 Mar 2024, Patrick Palka wrote:
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for trunk?
> > 
> > -- >8 --
> > 
> > The below testcases use a lambda-expr as a template argument and they
> > all trip over the below added tsubst_lambda_expr sanity check ultimately
> > because current_template_parms is empty, which causes push_template_decl
> > to return error_mark_node from the call to begin_lambda_type.  Were it
> > not for the sanity check this silent error_mark_node result leads to
> > nonsensical errors down the line, or silent breakage.
> > 
> > In the first testcase, we hit this assert during instantiation of the
> > dependent alias template-id c1_t<_Data> from instantiate_template, which
> > clears current_template_parms via push_to_top_level.  Similar story for
> > the second testcase.  For the third testcase we hit the assert during
> > partial instantiation of the member template from instantiate_class_template
> > which similarly calls push_to_top_level.
> > 
> > These testcases illustrate that templated substitution into a lambda-expr
> > is not always possible, in particular when we lost the relevant template
> > context.  I experimented with recovering the template context by making
> > tsubst_lambda_expr fall back to using scope_chain->prev->template_parms if
> > current_template_parms is empty which worked but seemed like a hack.  I
> > also experimented with preserving the template context by keeping
> > current_template_parms set during instantiate_template for a dependent
> > specialization which also worked but it's at odds with the fact that we
> > cache dependent specializations (and so they should be independent of
> > the template context).
> > 
> > So instead of trying to make such substitution work, this patch uses the
> > extra-args mechanism to defer templated substitution into a lambda-expr
> > when we lost the relevant template context.
> > 
> > PR c++/114393
> > PR c++/107457
> > PR c++/93595
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (LAMBDA_EXPR_EXTRA_ARGS):
> > (struct GTY):
> > * module.cc (trees_out::core_vals) : Stream
> > LAMBDA_EXPR_EXTRA_ARGS.
> > (trees_in::core_vals) : Likewise.
> > * pt.cc (has_extra_args_mechanism_p):
> > (tsubst_lambda_expr):
> 
> Whoops, this version of the patch has an incomplete ChangeLog entry.
> 
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/lambda-targ2.C: New test.
> > * g++.dg/cpp2a/lambda-targ3.C: New test.
> > * g++.dg/cpp2a/lambda-targ4.C: New test.
> > ---
> >  gcc/cp/cp-tree.h  |  5 +
> >  gcc/cp/module.cc  |  2 ++
> >  gcc/cp/pt.cc  | 20 ++--
> >  gcc/testsuite/g++.dg/cpp2a/lambda-targ2.C | 19 +++
> >  gcc/testsuite/g++.dg/cpp2a/lambda-targ3.C | 12 
> >  gcc/testsuite/g++.dg/cpp2a/lambda-targ4.C | 12 
> >  6 files changed, 68 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ2.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ3.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ4.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index c29a5434492..27100537038 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -1538,6 +1538,10 @@ enum cp_lambda_default_capture_mode_type {
> >  #define LAMBDA_EXPR_REGEN_INFO(NODE) \
> >(((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->regen_info)
> >  
> > +/* Like PACK_EXPANSION_EXTRA_ARGS, for lambda-expressions.  */
> > +#define LAMBDA_EXPR_EXTRA_ARGS(NODE) \
> > +  (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->extra_args)
> > +
> >  /* The closure type of the lambda, which is also the type of the
> > LAMBDA_EXPR.  */
> >  #define LAMBDA_EXPR_CLOSURE(NODE) \
> > @@ -1550,6 +1554,7 @@ struct GTY (()) tree_lambda_expr
> >tree this_capture;
> >tree extra_scope;
> >tree regen_info;
> > +  tree extra_args;
> >vec *pending_proxies;
> >location_t locus;
> >enum cp_lambda_default_capture_mode_type default_capture_mode : 2;
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index 52c60cf370c..1cd890909e3 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -6312,6 +6312,7 @@ trees_out::core_vals (tree t)
> >WT (((lang_tree_node *)t)->lambda_expression.this_capture);
> >WT (((lang_tree_node *)t)->lambda_expression.extra_scope);
> >WT (((lang_tree_node *)t)->lambda_expression.regen_info);
> > +  WT (((lang_tree_node *)t)->lambda_expression.extra_args);
> >/* pending_proxies is a parse-time thing.  */
> >gcc_assert (!((lang_tree_node 
> > *)t)->lambda_expression.pending_proxies);
> >if (state)
> > @@ -6814,6 +6815,7 @@ trees_in::core_vals (tree t)
> >RT (((lang_tree_node 

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Richard Sandiford
Matthias Kretz  writes:
> Hi Richard,
>
> sorry for not answering sooner. I took action on your mail but failed to also 
> give feedback. Now in light of your veto of Srinivas patch I wanted to use 
> the 
> opportunity to pick this up again.
>
> On Dienstag, 23. Januar 2024 21:57:23 CET Richard Sandiford wrote:
>> However, we also support different vector lengths for streaming SVE
>> (running in "streaming" mode on SME) and non-streaming SVE (running
>> in "non-streaming" mode on the core).  Having two different lengths is
>> expected to be the common case, rather than a theoretical curiosity.
>
> I read up on this after you mentioned this for the first time. As a WG21 
> member I find the approach troublesome - but that's a bit off-topic for this 
> thread.
>
> The big issue here is that, IIUC, a user (and the simd library) cannot do the 
> right thing at the moment. There simply isn't enough context information 
> available when parsing the  header. I.e. on definition of 
> the class template there's no facility to take target_clones or SME 
> "streaming" mode into account. Consequently, if we want the library to be fit 
> for SME, then we need more language extension(s) to make it work.

Yeah.  I think the same applies to plain SVE.  It seems reasonable to
have functions whose implementation is specialised for a specific SVE
length, with that function being selected at runtime where appropriate.
Those functions needn't (in principle) be in separate TUs.  The “best”
definition of native then becomes a per-function property rather
than a per-TU property.

As you note later, I think the same thing would apply to x86_64.

> I guess I'm looking for a way to declare types that are different depending 
> on 
> whether they are used in streaming mode or non-streaming mode (making them 
> ill-formed to use in functions marked arm_streaming_compatible).
>
> From reading through https://arm-software.github.io/acle/main/
> acle.html#controlling-the-use-of-streaming-mode I don't see any discussion of 
> member functions or ctor/dtor, static and non-static data members, etc.
>
> The big issue I see here is that currently all of std::* is declared without 
> a 
> arm_streaming or arm_streaming_compatible. Thus, IIUC, you can't use anything 
> from the standard library in streaming mode. Since that also applies to 
> std::experimental::simd, we're not creating a new footgun, only missing out 
> on 
> potential users?

Kind-of.  However, we can inline a non-streaming function into a streaming
function if that doesn't change defined behaviour.  And that's important
in practice for C++, since most trivial inline functions will not be
marked streaming-compatible despite being so in practice.

It's UB to pass and return SVE vectors across streaming/non-streaming
boundaries unless the two VLs are equal.  It's therefore valid to inline
such functions into streaming functions *unless* the callee uses
non-streaming-only instructions such as gather loads.

Because of that, someone trying to use std::experimenal::simd in SME
functions is likely to succeed, at least in simple cases.

> Some more thoughts on target_clones/streaming SVE language extension 
> evolution:
>
>   void nonstreaming_fn(void) {
> constexpr int width = __arm_sve_bits(); // e.g. 512
> constexpr int width2 = __builtin_vector_size(); // e.g. 64 (the
>   // vector_size attribute works with bytes, not bits)
>   }
>
>   __attribute__((arm_locally_streaming))
>   void streaming_fn(void) {
> constexpr int width = __arm_sve_bits(); // e.g. 128
> constexpr int width2 = __builtin_vector_size(); // e.g. 16
>   }
>
>   __attribute__((target_clones("sse4.2,avx2")))
>   void streaming_fn(void) {
> constexpr int width = __builtin_vector_size(); // 16 in the sse4.2 clone
>   // and 32 in the avx2 clone
>   }
>
> ... as a starting point for exploration. Given this, I'd still have to resort 
> to a macro to define a "native" simd type:
>
> #define NATIVE_SIMD(T) std::experimental::simd CHAR_BITS, __arm_sve_bits() / CHAR_BITS>>
>
> Getting rid of the macro seems to be even harder.

Yeah.  The constexprs in the AArch64 functions would only be compile-time
constants in to-be-defined circumstances, using some mechanism to specify
the streaming and non-streaming vector lengths at compile time.  But that's
a premise of the whole discussion, just noting it for the record in case
anyone reading this later jumps in at this point.

> A declaration of an alias like
>
> template 
> using SveSimd = std::experimental::simd CHAR_BITS, __arm_sve_bits() / CHAR_BITS>>;
>
> would have to delay "invoking" __arm_sve_bits() until it knows its context:
>
>   void nonstreaming_fn(void) {
> static_assert(sizeof(SveSimd) == 64);
>   }
>
>   __attribute__((arm_locally_streaming))
>   void streaming_fn(void) {
> static_assert(sizeof(SveSimd) == 16);
> nonstreaming_fn(); // fine
>   }
>
> This gets even worse for target_clones, where
>
>   void f() {
> 

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-27 Thread Thomas Schwinge
Hi!

On 2024-03-22T14:15:36+, Andrew Stubbs  wrote:
> On 22/03/2024 08:43, Richard Biener wrote:
> Thanks, here's what I pushed.

> vect: more oversized bitmask fixups
>
> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> configured to use V32 instead of V64 (I plan to do this for RDNA devices).

Thanks, confirming that this "vect: more oversized bitmask fixups" does
fix the GCN target '-march=gfx1100' testing regression:

PASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s278.c execution test
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 
loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s279.c execution test
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 
loops"

... that I saw introduced by "amdgcn: Prefer V32 on RDNA devices".

(The XPASSes are independent of that, pre-existing.)


Grüße
 Thomas


> The problem was that a "not" operation on the mask inadvertently enabled
> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
> when calling internal functions (in this case COND_MINUS), when doing masked
> loads and stores, and when doing conditional jumps (some cases were already
> handled).
>
> gcc/ChangeLog:
>
>   * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
>   bitmasks.
>   (do_compare_and_jump): Remove now-redundant similar code.
>   * internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector
>   bitmasks.
>   (add_mask_and_len_args): Likewise.
>
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 88600cb42d3..5f74b696b41 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum 
> rtx_code code, int unsignedp,
>   }
>   }
>  
> +  /* For boolean vectors with less than mode precision
> +  make sure to fill padding with consistent values.  */
> +  if (val
> +   && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> +   && SCALAR_INT_MODE_P (mode))
> + {
> +   auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
> +   if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> + {
> +   op0 = expand_binop (mode, and_optab, op0,
> +   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +   NULL_RTX, true, OPTAB_WIDEN);
> +   op1 = expand_binop (mode, and_optab, op1,
> +   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +   NULL_RTX, true, OPTAB_WIDEN);
> + }
> + }
> +
>emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>  if_true_label, prob);
>  }
> @@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum 
> rtx_code signed_code,
>machine_mode mode;
>int unsignedp;
>enum rtx_code code;
> -  unsigned HOST_WIDE_INT nunits;
>  
>/* Don't crash if the comparison was erroneous.  */
>op0 = expand_normal (treeop0);
> @@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum 
> rtx_code signed_code,
>emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, 
> op1));
>op1 = new_op1;
>  }
> -  /* For boolean vectors with less than mode precision
> - make sure to fill padding with consistent values.  */
> -  else if (VECTOR_BOOLEAN_TYPE_P (type)
> -&& SCALAR_INT_MODE_P (mode)
> -&& TYPE_VECTOR_SUBPARTS (type).is_constant ()
> -&& maybe_ne (GET_MODE_PRECISION (mode), nunits))
> -{
> -  gcc_assert (code == EQ || code == NE);
> -  op0 = expand_binop (mode, and_optab, op0,
> -   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
> -   true, OPTAB_WIDEN);
> -  op1 = expand_binop (mode, and_optab, op1,
> -   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
> -   true, OPTAB_WIDEN);
> -}
>  
>do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
>  ((mode == BLKmode)
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index fcf47c7fa12..5269f0ac528 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>  && SSA_NAME_IS_DEFAULT_DEF (rhs)
>  && VAR_P (SSA_NAME_VAR (rhs)))
>   create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
> +  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> +&& SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> +&& maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> + TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
> + {
> +  

Re: [PATCH] libstdc++: Add masked ++/-- implementation for sizeof < 16

2024-03-27 Thread Jonathan Wakely
On Wed, 27 Mar 2024 at 12:48, Matthias Kretz  wrote:
>
> And one more to fix follow-up / remaining failures. Tested on x86_64-linux-
> gnu.
>
> OK for trunk and 13?

OK for both, thanks.


>
>  8< --
>
> This resolves further failures (-Wreturn-type warnings) and test
> failures for where-* tests targeting AVX-512.
>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> * include/experimental/bits/simd_x86.h (_S_masked_unary):
> Cast inputs < 16 bytes to 16 byte vectors before calling the
> right subtraction builtin. Before returning, truncate to the
> return vector type.
> ---
>  .../include/experimental/bits/simd_x86.h  | 24 +++
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──


[PATCH] libstdc++: Add masked ++/-- implementation for sizeof < 16

2024-03-27 Thread Matthias Kretz
And one more to fix follow-up / remaining failures. Tested on x86_64-linux-
gnu.

OK for trunk and 13?

 8< --

This resolves further failures (-Wreturn-type warnings) and test
failures for where-* tests targeting AVX-512.

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_x86.h (_S_masked_unary):
Cast inputs < 16 bytes to 16 byte vectors before calling the
right subtraction builtin. Before returning, truncate to the
return vector type.
---
 .../include/experimental/bits/simd_x86.h  | 24 +++
 1 file changed, 14 insertions(+), 10 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
index 6b414486fee..517c4b4a5be 100644
--- a/libstdc++-v3/include/experimental/bits/simd_x86.h
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -3508,6 +3508,9 @@ _S_masked_unary(const _SimdWrapper<_K, _Np> __k, const _SimdWrapper<_Tp, _Np> __
 #ifdef __clang__
 	return __movm<_Np, _Tp>(__k._M_data) ? __v._M_data - __pm_one : __v._M_data;
 #else // __clang__
+	using _TV = __vector_type_t<_Tp, _Np>;
+	constexpr size_t __bytes = sizeof(__v) < 16 ? 16 : sizeof(__v);
+	constexpr size_t __width = __bytes / sizeof(_Tp);
 	if constexpr (is_integral_v<_Tp>)
 	  {
 		constexpr bool __lp64 = sizeof(long) == sizeof(long long);
@@ -3517,11 +3520,11 @@ _S_masked_unary(const _SimdWrapper<_K, _Np> __k, const _SimdWrapper<_Tp, _Np> __
 			  std::conditional_t<__lp64, long long, int>,
 			  std::conditional_t<
 std::is_same_v<_Ip, signed char>, char, _Ip>>;
-		const auto __value = __vector_bitcast<_Up>(__v._M_data);
+		const auto __value = __intrin_bitcast<__vector_type_t<_Up, __width>>(__v._M_data);
 #define _GLIBCXX_SIMD_MASK_SUB(_Sizeof, _Width, _Instr)\
-  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__v) == _Width)   \
-return __vector_bitcast<_Tp>(__builtin_ia32_##_Instr##_mask(__value,   \
-	 __vector_broadcast<_Np>(_Up(__pm_one)), __value, __k._M_data))
+  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__value) == _Width)   \
+return __intrin_bitcast<_TV>(__builtin_ia32_##_Instr##_mask(__value,   \
+	 __vector_broadcast<__width>(_Up(__pm_one)), __value, __k._M_data))
 		_GLIBCXX_SIMD_MASK_SUB(1, 64, psubb512);
 		_GLIBCXX_SIMD_MASK_SUB(1, 32, psubb256);
 		_GLIBCXX_SIMD_MASK_SUB(1, 16, psubb128);
@@ -3538,16 +3541,17 @@ _S_masked_unary(const _SimdWrapper<_K, _Np> __k, const _SimdWrapper<_Tp, _Np> __
 	  }
 	else
 	  {
+		const auto __value = __intrin_bitcast<__vector_type_t<_Tp, __width>>(__v._M_data);
 #define _GLIBCXX_SIMD_MASK_SUB_512(_Sizeof, _Width, _Instr)\
-  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__v) == _Width)   \
+  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__value) == _Width)   \
 return __builtin_ia32_##_Instr##_mask( \
-	 __v._M_data, __vector_broadcast<_Np>(_Tp(__pm_one)), __v._M_data, \
+	 __value, __vector_broadcast<__width>(_Tp(__pm_one)), __value, \
 	 __k._M_data, _MM_FROUND_CUR_DIRECTION)
 #define _GLIBCXX_SIMD_MASK_SUB(_Sizeof, _Width, _Instr)\
-  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__v) == _Width)   \
-return __builtin_ia32_##_Instr##_mask( \
-	 __v._M_data, __vector_broadcast<_Np>(_Tp(__pm_one)), __v._M_data, \
-	 __k._M_data)
+  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__value) == _Width)   \
+return __intrin_bitcast<_TV>(__builtin_ia32_##_Instr##_mask(   \
+	 __value, __vector_broadcast<__width>(_Tp(__pm_one)), __value, \
+	 __k._M_data))
 		_GLIBCXX_SIMD_MASK_SUB_512(4, 64, subps512);
 		_GLIBCXX_SIMD_MASK_SUB(4, 32, subps256);
 		_GLIBCXX_SIMD_MASK_SUB(4, 16, subps128);


Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Jonathan Wakely
On Wed, 27 Mar 2024 at 12:13, Richard Sandiford
 wrote:
>
> Matthias Kretz  writes:
> > On Wednesday, 27 March 2024 11:07:14 CET Richard Sandiford wrote:
> >> I'm still worried about:
> >>
> >>   #if _GLIBCXX_SIMD_HAVE_SVE
> >>   constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS
> >> / 8; #else
> >>   constexpr inline int __sve_vectorized_size_bytes = 0;
> >>   #endif
> >>
> >> and the direct use __ARM_FEATURE_SVE_BITS elsewhere, for the reasons
> >> discussed here (including possible ODR problems):
> >>
> >>   https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640037.html
> >>   https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643734.html
> >>
> >> Logically the vector length should be a template parameter rather than
> >> an invariant.  Has this been resolved?  If not, it feels like a blocker
> >> to me (sorry).
> >
> > The vector length is always a template parameter to all user-facing API. 
> > Some
> > examples
> >
> > 1. on aarch64 the following is independent of SVE flags (and status quo):
> >
> >   simd is an alias for
> >   simd
> >
> >   fixed_size_simd is supposed to be ABI-stable anyway (passed via
> >   the stack, alignof == sizeof).
> >
> > 2. with -msve-vector-bits=512:
> >
> >   native_simd is an alias for
> >   simd>
> >
> >   simd> is an alias for
> >   simd>
> >
> > 3. with -msve-vector-bits=256:
> >
> >   native_simd is an alias for
> >   simd>
> >
> >   simd> is an alias for
> >   simd>
> >
> > Implementation functions are either [[gnu::always_inline]] or tagged with 
> > the
> > ABI tag type and the __odr_helper template argument (to ensure not-inlined
> > inline functions have unique names).
>
> Ah, thanks for the explanation.  I think the global native_float alias
> is problematic for reasons that you touched on in your later message.
> I'll reply more about that there.  But in other respects this looks good.
>
> > Does that make __ARM_FEATURE_SVE_BITS usage indirect enough?
>
> In principle, the only use of __ARM_FEATURE_SVE_BITS should be to determine
> the definition of native_simd (with the caveats above).  But current
> GCC restrictions might make that impractical.
>
> > Also for context, please consider that this is std::*experimental*::simd. 
> > The
> > underlying ISO document will likely get retracted at some point and the 
> > whole
> > API and implementation (hopefully) superseded by C++26. The main purpose of
> > the spec and implementation is to gather experience.
>
> Ah, ok.  If this is a deliberate experiment for evidence-gathering
> purposes, rather than a long-term commitment, then I agree the barrier
> should be lower.

Yes, that's definitely what this code is for. The more feedback and
impl-experience we can get now with the std::experimental::simd
version, the better std::simd will be when that happens.

In practice, we probably won't ever actually remove the
 header even when the experiment is over (e.g. we
still have  with std::tr1::shared_ptr!), but we are likely
to consider it unmaintained and deprecated once it's superseded by
std::simd.

> So yeah, I'll withdraw my objection.  I've no problem with this going
> into GCC 14 on the basis above.  Thanks again to you and Srinivas for
> working on this.
>
> Richard
>



Re: [PATCH] libstdc++/ranges: Define _S_has_simple_call_op on newer adaptors

2024-03-27 Thread Patrick Palka
On Wed, 17 Jan 2024, Jonathan Wakely wrote:

> 
> 
> On Wed, 17 Jan 2024, 02:37 Patrick Palka,  wrote:
>   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> 
> OK

Thanks a lot.  For the record I ended up not pushing this patch because
all the range adaptors it touches are C++23 ones and so ...

> 
> 
> 
>   -- >8 --
> 
>   This compile-time and diagnostic improvement[1] is less important in
>   C++23 mode where deducing this is available and used in _Pipe, but for
>   benefit of C++20 mode

... this isn't true.  There's no real benefit in setting this flag for
standard C++23 range adaptors (besides consistency I guess, but OTOH not
setting it means more testing coverage for the deducing this code path).

> and for consistency let's set the flag on the most
>   recently added range adaptors which lack it.
> 
>   [1]: Defining _S_has_simple_call_op=true for a range adaptor closure
>   object signals that the adaptor has an operator() that's not overloaded
>   according to the constness/value category of the object, which in turn
>   allows us to implement the operator() of the composition of such 
> adaptors
>   in a simpler way.
> 
>   libstdc++-v3/ChangeLog:
> 
>           * include/std/ranges (views::_Adjacent::_S_has_simple_call_op):
>           Define to true.
>           (views::_AsRvalue::_S_has_simple_call_op): Likewise.
>           (views::_Enumerate::_S_has_simple_call_op): Likewise.
>           (views::_AsConst::_S_has_simple_call_op): Likewise.
>   ---
>    libstdc++-v3/include/std/ranges | 8 
>    1 file changed, 8 insertions(+)
> 
>   diff --git a/libstdc++-v3/include/std/ranges 
> b/libstdc++-v3/include/std/ranges
>   index 7ef835f486a..66502cc1da4 100644
>   --- a/libstdc++-v3/include/std/ranges
>   +++ b/libstdc++-v3/include/std/ranges
>   @@ -5669,6 +5669,8 @@ namespace views::__adaptor
>               else
>                 return adjacent_view, 
> _Nm>(std::forward<_Range>(__r));
>             }
>   +
>   +       static constexpr bool _S_has_simple_call_op = true;
>          };
> 
>        template
>   @@ -8844,6 +8846,8 @@ namespace views::__adaptor
>             else
>               return as_rvalue_view(std::forward<_Range>(__r));
>           }
>   +
>   +      static constexpr bool _S_has_simple_call_op = true;
>        };
> 
>        inline constexpr _AsRvalue as_rvalue;
>   @@ -9147,6 +9151,8 @@ namespace views::__adaptor
>           constexpr auto
>           operator() [[nodiscard]] (_Range&& __r) const
>           { return 
> enumerate_view>(std::forward<_Range>(__r)); }
>   +
>   +      static constexpr bool _S_has_simple_call_op = true;
>        };
> 
>        inline constexpr _Enumerate enumerate;
>   @@ -9253,6 +9259,8 @@ namespace views::__adaptor
>           else
>             return as_const_view(std::forward<_Range>(__r));
>          }
>   +
>   +      static constexpr bool _S_has_simple_call_op = true;
>        };
> 
>        inline constexpr _AsConst as_const;
>   --
>   2.43.0.367.g186b115d30
> 
> 
> 

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao
On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> > 
> > 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > > 
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > > 
> > >  int
> > >  main (void)
> > >  {
> > >    unsigned long stat = 0xdeadbeef;
> > >    for (int i = 0; i < 1; i++)
> > >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >    asm(""::"r"(stat));
> > >  }
> > > 
> > > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> > 
> > The test case div-const-reduction.c is modified to assemble the instruction
> > sequence as follows:
> > lu12i.w $r12,97440>>12  # 0x3b9ac000
> > ori $r12,$r12,2567
> > mod.w   $r13,$r13,$r12
> > 
> > This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.

> Hmm indeed, it seems a waste to do this reduction for int / 17.
> I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Richard Biener
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > > The following patch attempts to fix the (view_convert (convert@0 @1))
> > > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > > and @0 has the same precision as @1 and it has a different sign
> > > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > > it should be.
> > > E.g. bitint-64.c test with -O2 has
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _9 = (_BitInt(5)) _8;
> > >   _10 = VIEW_CONVERT_EXPR(_9);
> > > and forwprop1 changes that to just
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _10 = VIEW_CONVERT_EXPR(_8);
> > > The former makes the padding bits well defined (at least on arm in
> > > the future), while the latter has those bits zero extended.
> > 
> > So I think the existing rule, when extended to
> > 
> >|| (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
> > (@1))
> >&& TYPE_UNSIGNED (TREE_TYPE (@1))
> > 
> > assumes padding is extended according to the signedness.  But the
> > original
> > 
> >&& (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE 
> > (@1))
> > 
> > rule has the padding undefined.  I think that's inconsistent at least.
> 
> The whole simplification is just weird, all it tests are the precisions
> of TREE_TYPE (@0) and TREE_TYPE (@1), but doesn't actually test if there
> are padding bits (i.e. the TYPE_SIZE (type) vs. those precisions).
> 
> I've tried to construct a non-_BitInt testcase with
> struct S { signed long long a : 37; unsigned long long b : 37; } s;
> 
> unsigned long long
> foo (long long x, long long y)
> {
>   __typeof (s.a + 0) c = x, d = y;
>   __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
>   union U { __typeof (s.b + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> unsigned long long
> bar (unsigned long long x, unsigned long long y)
> {
>   __typeof (s.b + 0) c = x, d = y;
>   __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
>   union U { __typeof (s.a + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> int
> main ()
> {
>   unsigned long long x = foo (-53245722422LL, -5719971797LL);
>   unsigned long long y = bar (136917222900, 5719971797LL);
>   __builtin_printf ("%016llx %016llx\n", x, y);
>   return 0;
> }
> which seems to print
> 0012455ee8f5 000135d6ddc9
> at -O0 and
> fff2455ee8f5 000135d6ddc9
> at -O2/-O3, dunno if in that case we consider those padding bits
> uninitialized/anything can happen, then it is fine, or something else.
> 
> > I'll note that we do not constrain 'type' so the V_C_E could
> > re-interpret the integer (with or without padding) as floating-point.
> 
> Anyway, if we have the partial int types with undefined padding bits
> (the above __typeof of a bitfield type or _BitInt on x86/aarch64),
> I guess the precision0 == precision1 case is ok, we have
> say 3 padding bits before/after, a valid program better should mask
> away those bits or it will be UB.
> If precision0 == precision1 and type1 has the extended padding bits
> and type0 has undefined padding bits, then I guess it is also ok.
> Other cases of same precision are not ok, turning well defined bits
> into undefined or changing sign extended to zero extended or vice versa.
> 
> For the precision0 > precision1 && unsigned1 case on the other side,
> I don't see how it is ever correct for the undefined padding bits cases,
> the padding bits above type0 are undefined before/after the simplification,
> but the bits in between are changed from previously zero to undefined.
> 
> For the precision0 > precision1 cases if both type0 and type1 are
> extended like _BitInt on arm, then the simplification is reasonable,
> all the padding bits above type0 are zeros and all the padding bits between
> type1 and type0 precision are zeros too, before and after.
> 
> So, indeed my patch isn't correct.
> 
> And because we don't have the _BitInt arm support for GCC 14, perhaps
> we could defer that part for GCC 15, though I wonder if we shouldn't just
> kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> && TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
> for the types with undefined padding bits (everything on trunk right now).

The comment says it was added for (char)_Bool, probably 
gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
is important, it seems early SRA introduces a V_C_E here and then
phiopt the conversion to unsigned char.  But this is a V_C_E of
_Bool -> unsigned char -> _Bool.  We seem to apply both this
pattern and then drop the V_C_E.

Maybe we can 

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Richard Biener
On Wed, Mar 27, 2024 at 1:20 PM Xi Ruoyao  wrote:
>
> On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> > On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
> > >
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > >
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > >
> > > int
> > > main (void)
> > > {
> > >   unsigned long stat = 0xdeadbeef;
> > >   for (int i = 0; i < 1; i++)
> > > stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >   asm(""::"r"(stat));
> > > }
> >
> > I think you should be able to see a constant divisor and thus could do
> > better than return the same latency for everything.  For non-constant
> > divisors using the best-case latency shouldn't be a problem.
>
> Hmm, it seems not really possible as at now.  expand_divmod does
> something like:
>
>   max_cost = (unsignedp
>   ? udiv_cost (speed, compute_mode)
>   : sdiv_cost (speed, compute_mode));
>
> which is reading the pre-calculated costs from a table.  Thus we don't
> really know the denominator and cannot estimate the cost based on it :(.

Ah, too bad.  OTOH for the actual case it decomposes it could compute
the real cost, avoiding the table which is filled with reg-reg operations only.

> CSE really invokes the cost hook with the actual (mod (a, (const_int
> 17)) RTX but it's less important.
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao
On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
> > 
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >     int
> >     main (void)
> >     {
> >   unsigned long stat = 0xdeadbeef;
> >   for (int i = 0; i < 1; i++)
> >     stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >   asm(""::"r"(stat));
> >     }
> 
> I think you should be able to see a constant divisor and thus could do
> better than return the same latency for everything.  For non-constant
> divisors using the best-case latency shouldn't be a problem.

Hmm, it seems not really possible as at now.  expand_divmod does
something like:

  max_cost = (unsignedp
  ? udiv_cost (speed, compute_mode)
  : sdiv_cost (speed, compute_mode));

which is reading the pre-calculated costs from a table.  Thus we don't
really know the denominator and cannot estimate the cost based on it :(.

CSE really invokes the cost hook with the actual (mod (a, (const_int
17)) RTX but it's less important.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Jakub Jelinek
On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > The following patch attempts to fix the (view_convert (convert@0 @1))
> > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > and @0 has the same precision as @1 and it has a different sign
> > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > it should be.
> > E.g. bitint-64.c test with -O2 has
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _9 = (_BitInt(5)) _8;
> >   _10 = VIEW_CONVERT_EXPR(_9);
> > and forwprop1 changes that to just
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _10 = VIEW_CONVERT_EXPR(_8);
> > The former makes the padding bits well defined (at least on arm in
> > the future), while the latter has those bits zero extended.
> 
> So I think the existing rule, when extended to
> 
>|| (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
> (@1))
>&& TYPE_UNSIGNED (TREE_TYPE (@1))
> 
> assumes padding is extended according to the signedness.  But the
> original
> 
>&& (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE 
> (@1))
> 
> rule has the padding undefined.  I think that's inconsistent at least.

The whole simplification is just weird, all it tests are the precisions
of TREE_TYPE (@0) and TREE_TYPE (@1), but doesn't actually test if there
are padding bits (i.e. the TYPE_SIZE (type) vs. those precisions).

I've tried to construct a non-_BitInt testcase with
struct S { signed long long a : 37; unsigned long long b : 37; } s;

unsigned long long
foo (long long x, long long y)
{
  __typeof (s.a + 0) c = x, d = y;
  __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
  union U { __typeof (s.b + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

unsigned long long
bar (unsigned long long x, unsigned long long y)
{
  __typeof (s.b + 0) c = x, d = y;
  __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
  union U { __typeof (s.a + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

int
main ()
{
  unsigned long long x = foo (-53245722422LL, -5719971797LL);
  unsigned long long y = bar (136917222900, 5719971797LL);
  __builtin_printf ("%016llx %016llx\n", x, y);
  return 0;
}
which seems to print
0012455ee8f5 000135d6ddc9
at -O0 and
fff2455ee8f5 000135d6ddc9
at -O2/-O3, dunno if in that case we consider those padding bits
uninitialized/anything can happen, then it is fine, or something else.

> I'll note that we do not constrain 'type' so the V_C_E could
> re-interpret the integer (with or without padding) as floating-point.

Anyway, if we have the partial int types with undefined padding bits
(the above __typeof of a bitfield type or _BitInt on x86/aarch64),
I guess the precision0 == precision1 case is ok, we have
say 3 padding bits before/after, a valid program better should mask
away those bits or it will be UB.
If precision0 == precision1 and type1 has the extended padding bits
and type0 has undefined padding bits, then I guess it is also ok.
Other cases of same precision are not ok, turning well defined bits
into undefined or changing sign extended to zero extended or vice versa.

For the precision0 > precision1 && unsigned1 case on the other side,
I don't see how it is ever correct for the undefined padding bits cases,
the padding bits above type0 are undefined before/after the simplification,
but the bits in between are changed from previously zero to undefined.

For the precision0 > precision1 cases if both type0 and type1 are
extended like _BitInt on arm, then the simplification is reasonable,
all the padding bits above type0 are zeros and all the padding bits between
type1 and type0 precision are zeros too, before and after.

So, indeed my patch isn't correct.

And because we don't have the _BitInt arm support for GCC 14, perhaps
we could defer that part for GCC 15, though I wonder if we shouldn't just
kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
&& TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
for the types with undefined padding bits (everything on trunk right now).

If I have
struct S { signed long long a : 42; unsigned long long b : 37; } s;
__typeof (s.b) v1 = ...;
__typeof (s.a) v2;
unsigned long long v3;
v2 = v1; // This zero extends the 5 bits
v3 = VCE  (v2);
return v3 & 0x3ffULL;
then without the folding it used to be valid, those 5 bits were zeros,
with the optimization it is not.  Similarly to VCE to struct or double
and the like.  And similarly for unsigned _BitInt(37)/ signed _BitInt(42)
on x86/aarch64.

Jakub



Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Richard Sandiford
Matthias Kretz  writes:
> On Wednesday, 27 March 2024 11:07:14 CET Richard Sandiford wrote:
>> I'm still worried about:
>> 
>>   #if _GLIBCXX_SIMD_HAVE_SVE
>>   constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS
>> / 8; #else
>>   constexpr inline int __sve_vectorized_size_bytes = 0;
>>   #endif
>> 
>> and the direct use __ARM_FEATURE_SVE_BITS elsewhere, for the reasons
>> discussed here (including possible ODR problems):
>> 
>>   https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640037.html
>>   https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643734.html
>> 
>> Logically the vector length should be a template parameter rather than
>> an invariant.  Has this been resolved?  If not, it feels like a blocker
>> to me (sorry).
>
> The vector length is always a template parameter to all user-facing API. Some 
> examples
>
> 1. on aarch64 the following is independent of SVE flags (and status quo):
>
>   simd is an alias for
>   simd
>
>   fixed_size_simd is supposed to be ABI-stable anyway (passed via
>   the stack, alignof == sizeof).
>
> 2. with -msve-vector-bits=512:
>
>   native_simd is an alias for
>   simd>
>
>   simd> is an alias for
>   simd>
>
> 3. with -msve-vector-bits=256: 
>
>   native_simd is an alias for
>   simd>
>
>   simd> is an alias for
>   simd>
>
> Implementation functions are either [[gnu::always_inline]] or tagged with the 
> ABI tag type and the __odr_helper template argument (to ensure not-inlined 
> inline functions have unique names).

Ah, thanks for the explanation.  I think the global native_float alias
is problematic for reasons that you touched on in your later message.
I'll reply more about that there.  But in other respects this looks good.

> Does that make __ARM_FEATURE_SVE_BITS usage indirect enough?

In principle, the only use of __ARM_FEATURE_SVE_BITS should be to determine
the definition of native_simd (with the caveats above).  But current
GCC restrictions might make that impractical.

> Also for context, please consider that this is std::*experimental*::simd. The 
> underlying ISO document will likely get retracted at some point and the whole 
> API and implementation (hopefully) superseded by C++26. The main purpose of 
> the spec and implementation is to gather experience.

Ah, ok.  If this is a deliberate experiment for evidence-gathering
purposes, rather than a long-term commitment, then I agree the barrier
should be lower.

So yeah, I'll withdraw my objection.  I've no problem with this going
into GCC 14 on the basis above.  Thanks again to you and Srinivas for
working on this.

Richard


Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Matthias Kretz
Hi Richard,

sorry for not answering sooner. I took action on your mail but failed to also 
give feedback. Now in light of your veto of Srinivas patch I wanted to use the 
opportunity to pick this up again.

On Dienstag, 23. Januar 2024 21:57:23 CET Richard Sandiford wrote:
> However, we also support different vector lengths for streaming SVE
> (running in "streaming" mode on SME) and non-streaming SVE (running
> in "non-streaming" mode on the core).  Having two different lengths is
> expected to be the common case, rather than a theoretical curiosity.

I read up on this after you mentioned this for the first time. As a WG21 
member I find the approach troublesome - but that's a bit off-topic for this 
thread.

The big issue here is that, IIUC, a user (and the simd library) cannot do the 
right thing at the moment. There simply isn't enough context information 
available when parsing the  header. I.e. on definition of 
the class template there's no facility to take target_clones or SME 
"streaming" mode into account. Consequently, if we want the library to be fit 
for SME, then we need more language extension(s) to make it work.

I guess I'm looking for a way to declare types that are different depending on 
whether they are used in streaming mode or non-streaming mode (making them 
ill-formed to use in functions marked arm_streaming_compatible).

From reading through https://arm-software.github.io/acle/main/
acle.html#controlling-the-use-of-streaming-mode I don't see any discussion of 
member functions or ctor/dtor, static and non-static data members, etc.

The big issue I see here is that currently all of std::* is declared without a 
arm_streaming or arm_streaming_compatible. Thus, IIUC, you can't use anything 
from the standard library in streaming mode. Since that also applies to 
std::experimental::simd, we're not creating a new footgun, only missing out on 
potential users?

Some more thoughts on target_clones/streaming SVE language extension 
evolution:

  void nonstreaming_fn(void) {
constexpr int width = __arm_sve_bits(); // e.g. 512
constexpr int width2 = __builtin_vector_size(); // e.g. 64 (the
  // vector_size attribute works with bytes, not bits)
  }

  __attribute__((arm_locally_streaming))
  void streaming_fn(void) {
constexpr int width = __arm_sve_bits(); // e.g. 128
constexpr int width2 = __builtin_vector_size(); // e.g. 16
  }

  __attribute__((target_clones("sse4.2,avx2")))
  void streaming_fn(void) {
constexpr int width = __builtin_vector_size(); // 16 in the sse4.2 clone
  // and 32 in the avx2 clone
  }

... as a starting point for exploration. Given this, I'd still have to resort 
to a macro to define a "native" simd type:

#define NATIVE_SIMD(T) std::experimental::simd>

Getting rid of the macro seems to be even harder.

A declaration of an alias like

template 
using SveSimd = std::experimental::simd>;

would have to delay "invoking" __arm_sve_bits() until it knows its context:

  void nonstreaming_fn(void) {
static_assert(sizeof(SveSimd) == 64);
  }

  __attribute__((arm_locally_streaming))
  void streaming_fn(void) {
static_assert(sizeof(SveSimd) == 16);
nonstreaming_fn(); // fine
  }

This gets even worse for target_clones, where

  void f() {
sizeof(std::simd) == ?
  }

  __attribute__((target_clones("sse4.2,avx2")))
  void g() {
f();
  }

the compiler *must* virally apply target_clones to all functions it calls. And 
member functions must either also get cloned as functions, or the whole type 
must be cloned (as in the std::simd case, where the sizeof needs to change). 


> When would NumberOfUsedBytes < SizeofRegister be used for SVE?  Would it
> be for storing narrower elements in wider containers?  If the interface
> supports that then, yeah, two parameters would probably be safer.
> 
> Or were you thinking about emulating narrower vectors with wider registers
> using predication?  I suppose that's possible too, and would be similar in
> spirit to using SVE to optimise Advanced SIMD std::simd types.
> But mightn't it cause confusion if sizeof applied to a "16-byte"
> vector actually gives 32?

Yes, the idea is to e.g. use one SVE register instead of two NEON registers 
for a "float, 8" with SVE512.

The user never asks for a "16-byte" vector. The user asks for a value-type and 
and number of elements. Sure, the wasteful "padding" might come as a surprise, 
but it's totally within the spec to implement it like this.


> I assume std::experimental::native_simd has to have the same
> meaning everywhere for ODR reasons?

No. Only std::experimental::simd has to be "ABI stable". And note that in 
the C++ spec there's no such thing as compiling and linking TUs with different 
compiler flags. That's plain UB. The committee still cares about it, but 
getting this "right" cannot be part of the standard and must be defined by 
implementers

>  If so, it'll need to be an
> Advanced SIMD vector for AArch64 (but using SVE to optimise 

[PATCH] tree-optimization/114057 - handle BB reduction remain defs as LIVE

2024-03-27 Thread Richard Biener
The following makes sure to record the scalars we add to the BB
reduction vectorization result as scalar uses for the purpose of
computing live lanes.  This restores vectorization in the
bondfree.c TU of 435.gromacs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114057
* tree-vect-slp.cc (vect_bb_slp_mark_live_stmts): Mark
BB reduction remain defs as scalar uses.
---
 gcc/tree-vect-slp.cc | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index adb2d9ae1e5..2e5481acbc7 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6665,8 +6665,14 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo)
   auto_vec worklist;
 
   for (slp_instance instance : bb_vinfo->slp_instances)
-if (!visited.add (SLP_INSTANCE_TREE (instance)))
-  worklist.safe_push (SLP_INSTANCE_TREE (instance));
+{
+  if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc)
+   for (tree op : SLP_INSTANCE_REMAIN_DEFS (instance))
+ if (TREE_CODE (op) == SSA_NAME)
+   scalar_use_map.put (op, 1);
+  if (!visited.add (SLP_INSTANCE_TREE (instance)))
+   worklist.safe_push (SLP_INSTANCE_TREE (instance));
+}
 
   do
 {
@@ -6684,7 +6690,8 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo)
if (child && !visited.add (child))
  worklist.safe_push (child);
}
-} while (!worklist.is_empty ());
+}
+  while (!worklist.is_empty ());
 
   visited.empty ();
 
-- 
2.35.3


Re: [PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Richard Biener
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch attempts to fix the (view_convert (convert@0 @1))
> optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> and @0 has the same precision as @1 and it has a different sign
> and _BitInt with padding bits are extended on the target (x86 doesn't,
> aarch64 doesn't, but arm plans to do that), then optimizing it to
> just (view_convert @1) is wrong, the padding bits wouldn't be what
> it should be.
> E.g. bitint-64.c test with -O2 has
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _9 = (_BitInt(5)) _8;
>   _10 = VIEW_CONVERT_EXPR(_9);
> and forwprop1 changes that to just
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _10 = VIEW_CONVERT_EXPR(_8);
> The former makes the padding bits well defined (at least on arm in
> the future), while the latter has those bits zero extended.

So I think the existing rule, when extended to

   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
(@1))
   && TYPE_UNSIGNED (TREE_TYPE (@1))

assumes padding is extended according to the signedness.  But the
original

   && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE 
(@1))

rule has the padding undefined.  I think that's inconsistent at least.
I'll note that we do not constrain 'type' so the V_C_E could
re-interpret the integer (with or without padding) as floating-point.

Given the previous

/* For integral conversions with the same precision or pointer
   conversions use a NOP_EXPR instead.  */
(simplify 
  (view_convert @0)

should match first we should only ever see non-integer/pointer here
or cases where the V_C_E looks at padding.  IMO we should change
this to

   && ((TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (TREE_TYPE 
(@1))
&& TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@1
   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
(@1))
   && TYPE_UNSIGNED (TREE_TYPE (@1

?  Having this inconsistent treatment of padding is bad.

In your case the precision is equal so this should fix it, right?
Not sure if it's worth adding a INTEGERAL_TYPE_P (type) case with
TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)), see the
previous pattern.


> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-03-26  Jakub Jelinek  
> 
>   PR tree-optimization/114469
>   * match.pd ((view_convert (convert@0 @1))): Don't optimize if
>   TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
>   to be extended by the ABI.
> 
> --- gcc/match.pd.jj   2024-03-15 11:04:24.672914747 +0100
> +++ gcc/match.pd  2024-03-26 15:49:44.177864509 +0100
> @@ -4699,13 +4699,38 @@ (define_operator_list SYNC_FETCH_AND_AND
> zero-extend while keeping the same size (for bool-to-char).  */
>  (simplify
>(view_convert (convert@0 @1))
> -  (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> -   && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE 
> (@1)))
> -   && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
> -   && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> -|| (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> -&& TYPE_UNSIGNED (TREE_TYPE (@1)
> -   (view_convert @1)))
> +  (with { tree type0 = TREE_TYPE (@0);
> +   tree type1 = TREE_TYPE (@1);
> +   bool ok = false;
> +   if ((INTEGRAL_TYPE_P (type0) || POINTER_TYPE_P (type0))
> +   && (INTEGRAL_TYPE_P (type1) || POINTER_TYPE_P (type1))
> +   && TYPE_SIZE (type0) == TYPE_SIZE (type1))
> + {
> +   if (TYPE_PRECISION (type0) == TYPE_PRECISION (type1))
> + {
> +   if (TREE_CODE (type0) != BITINT_TYPE)
> + ok = true;
> +   else
> + {
> +   /* Avoid optimizing this if type0 is a _BitInt
> +  type with padding bits which are supposed to be
> +  extended.  */
> +   struct bitint_info info;
> +   targetm.c.bitint_type_info (TYPE_PRECISION (type0),
> +   );
> +   if (!info.extended)
> + ok = true;
> +   else
> + ok = (TYPE_PRECISION (type0)
> +   == tree_to_uhwi (TYPE_SIZE (type0)));
> + }
> + }
> +   else if (TYPE_PRECISION (type0) > TYPE_PRECISION (type1)
> +&& TYPE_UNSIGNED (type1))
> + ok = true;
> + } }
> +   (if (ok)
> +(view_convert @1
>  
>  /* Simplify a view-converted empty or single-element constructor.  */
>  (simplify
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions 

Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener
On Wed, Mar 27, 2024 at 10:13 AM peter0x44  wrote:
>
> On 2024-03-27 01:58, Richard Biener wrote:
> > On Wed, Mar 27, 2024 at 9:13 AM Peter0x44 
> > wrote:
> >>
> >> I accidentally replied off-list. Sorry.
> >>
> >> 27 Mar 2024 8:09:30 am Peter0x44 :
> >>
> >>
> >> 27 Mar 2024 7:51:26 am Richard Biener :
> >>
> >> > On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov 
> >> > wrote:
> >> >>
> >> >> lto-wrapper generates Makefiles that use the following:
> >> >> touch -r file file.tmp && mv file.tmp file
> >> >> to truncate files.
> >> >> If there is no suitable "touch" or "mv" available, then this errors
> >> >> with
> >> >> "The system cannot find the file specified".
> >> >>
> >> >> The solution here is to check if sh -c true works, before trying to
> >> >> use it in
> >> >> the Makefile. If it doesn't, then fall back to "copy /y nul file"
> >> >> instead.
> >> >> The fallback doesn't work exactly the same (the modified time of the
> >> >> file gets
> >> >> updated), but this doesn't seem to matter in practice.
> >> >
> >> > I suppose it doesn't matter as we (no longer?) have the input as
> >> > dependency
> >> > on the rule so make doesn't get confused to re-build it.  I guess we
> >> > only truncate
> >> > the file because it's going to be deleted by another process.
> >> >
> >> > Instead of doing sth like sh_exists I would suggest to simply use
> >> > #ifdef __WIN
> >> > or something like that?  Not sure if we have suitable macros to
> >> > identify the
> >> > host operating system though and whether mingw, cygwin, etc. behave the
> >> > same
> >> > here.
> >>
> >> They do, I tested. Using sh_exists is deliberate, I've had to program
> >> on
> >> school computers that had cmd.exe disabled, but had busybox-w32
> >> working,
> >> so it might be more flexible in that way. I would prefer a solution
> >> which
> >> didn't require invoking cmd.exe if there is a working shell present.
> >
> > Hmm, but then I'd expect SHELL to be appropriately set in such
> > situation?  So shouldn't sh_exists at least try to look at $SHELL?
> I'm not sure it would . On windows, make searches the PATH for an sh.exe
> if
> present, and then uses it by default. The relevant code is here:
> https://git.savannah.gnu.org/cgit/make.git/tree/src/variable.c#n1628
> >
> >> I figured doing the "sh_exists" is okay because there is a basically
> >> identical check for make.
> >>
> >> >
> >> > As a stop-gap solution doing
> >> >
> >> >   ( @-touch -r ... ) || true
> >> >
> >> > might also work?  Or another way to note to make the command can fail
> >> > without causing a problem.
> >>
> >> I don't think it would work. cmd.exe can't run subshells like this.
> >
> > Hmm, OK.  So this is all for the case where 'make' is available (as you
> > say we check for that) but no POSIX command environment is
> > (IIRC both touch and mv are part of  POSIX).  Testing for 'touch' and
> > 'mv' would then be another option?
> I think it would work, but keep in mind they could be placed on the PATH
> but
> still invoked from cmd.exe, so you might have to be careful of the
> redirection
> syntax and no /dev/null. It's a bit more complexity that doesn't seem
> necessary, to me.
> >
> >> >
> >> > Another way would be to have a portable solution to truncate a file
> >> > (maybe even removing it would work).  I don't think we should override
> >> > SHELL.
> >>
> >> Do you mean that perhaps an special command line argument could be
> >> added
> >> to lto-wrapper to do it, and then the makefile could invoke
> >> lto-wrapper
> >> to remove or truncate files instead of a shell? I'm not sure I get the
> >> proposed suggestion.
> >
> > The point is to truncate the file at the earliest point to reduce the
> > peak disk space required for a LTO link with many LTRANS units.
> > But yes, I guess it would be possible to add a flag to gcc itself
> > so the LTRANS compile would truncate the file.  Currently we
> > emit sth like
> >
> > ./libquantum.ltrans0.ltrans.o:
> > @/space/rguenther/install/trunk-r14-8925/usr/local/bin/gcc
> > '-xlto' '-c' '-fno-openmp' '-fno-openacc' '-fno-pie'
> > '-fcf-protection=none' '-g' '-mtune=generic' '-march=x86-64' '-O2'
> > '-O2' '-g' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
> > '-dumpdir' 'libquantum.' '-dumpbase' './libquantum.ltrans0.ltrans'
> > '-fltrans' '-o' './libquantum.ltrans0.ltrans.o'
> > './libquantum.ltrans0.o'
> >
> > so adding a '-truncate-input' flag for the driver or even more
> > explicit '-truncate ./libquantum.ltrans0.o'
> > might be a more elegant solution then?
>
> This is much more elegant. I like it, I can take a look at implementing
> it.
> Would I be looking at gcc.cc for this?

Yes, you'd add the Driver option to gcc/common.opt and handle it from gcc/gcc.cc

Richard.

> Best wishes,
> Peter.
>
> >
> > Richard.
> >
> >> >
> >> > Richard.
> >> >
> >> >> I tested this both in environments both with and without sh present,
> >> >> and
> >> >> observed no issues.
> >> >>
> >> >> Signed-off-by: Peter 

[PING] Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

2024-03-27 Thread Aleksandar Rakic
I remind you that the patch for the computation of complexity for unsupported 
addressing modes ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429 ) has 
been sent:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647966.html


[PATCH] btf: Fix up btf-datasec-1.c test on x86

2024-03-27 Thread Jakub Jelinek
On Wed, Mar 27, 2024 at 11:18:49AM +0100, Jakub Jelinek wrote:
> > -/* The offset entry for each variable in a DATSEC should be 0 at compile 
> > time.  */
> > -/* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bts_offset" 7 } } */
> > +/* The offset entry for each variable in a DATSEC should contain a label.  
> > */
> > +/* { dg-final { scan-assembler-times ".4byte\[\t \]\[a-e\]\[\t 
> > \]+\[^\n\]*bts_offset" 5 } } */
> 
> 4byte is used only on some targets, what exact assembler directive is used
> for 4byte unaligned data is heavily target dependent.

Now in patch form, tested on x86_64-linux with
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} btf.exp'
ok for trunk?

2024-03-27  Jakub Jelinek  

* gcc.dg/debug/btf/btf-cvr-quals-1.c: Use dg-additional-options
instead of multiple dg-options.
* gcc.dg/debug/btf/btf-datasec-1.c: Likewise.  Accept all supported
unaligned 4 byte assembler directives rather than assuming it must
be .4byte.

--- gcc/testsuite/gcc.dg/debug/btf/btf-cvr-quals-1.c.jj 2021-07-12 
22:57:26.056103708 +0200
+++ gcc/testsuite/gcc.dg/debug/btf/btf-cvr-quals-1.c2024-03-27 
12:09:25.383278017 +0100
@@ -23,7 +23,7 @@
 
 /* { dg-do compile } */
 /* { dg-options "-O0 -gbtf -dA" } */
-/* { dg-options "-O0 -gbtf -gdwarf-4 -dA" { target { *-*-darwin* } } } */
+/* { dg-additional-options "-gdwarf-4" { target { *-*-darwin* } } } */
 
 /* { dg-final { scan-assembler-times "ascii \"int.0\"\[\t 
\]+\[^\n\]*btf_string" 1 } } */
 
--- gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c.jj   2024-03-26 
22:34:47.925757342 +0100
+++ gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c  2024-03-27 
12:12:09.761045873 +0100
@@ -11,16 +11,16 @@
 
 /* { dg-do compile )  */
 /* { dg-options "-O0 -gbtf -dA" } */
-/* { dg-options "-O0 -gbtf -dA -msdata=none" { target { { powerpc*-*-* } && 
ilp32 } } } */
-/* { dg-options "-O0 -gbtf -dA -msmall-data-limit=0" { target { riscv*-*-* } } 
} */
-/* { dg-options "-O0 -gbtf -dA -G0" { target { nios2-*-* } } } */
+/* { dg-additional-options "-msdata=none" { target { { powerpc*-*-* } && ilp32 
} } } */
+/* { dg-additional-options "-msmall-data-limit=0" { target { riscv*-*-* } } } 
*/
+/* { dg-additional-options "-G0" { target { nios2-*-* } } } */
 
 /* Check for two DATASEC entries with vlen 3, and one with vlen 1.  */
 /* { dg-final { scan-assembler-times "0xf03\[\t \]+\[^\n\]*btt_info" 2 } } 
*/
 /* { dg-final { scan-assembler-times "0xf01\[\t \]+\[^\n\]*btt_info" 1 } } 
*/
 
 /* The offset entry for each variable in a DATSEC should contain a label.  */
-/* { dg-final { scan-assembler-times ".4byte\[\t \]\[a-e\]\[\t 
\]+\[^\n\]*bts_offset" 5 } } */
+/* { dg-final { scan-assembler-times 
"(?:(?:\\.4byte|\\.long|data4\\.ua|\\.ualong|\\.uaword|\\.dword|long|dc\\.l|\\.word)\[\t
 \]|\\.vbyte\t4,\[\t \]?)\[a-e\]\[\t \]+\[^\n\]*bts_offset" 5 } } */
 /* { dg-final { scan-assembler-times "my_cstruct\[\t \]+\[^\n\]*bts_offset" 1 
} } */
 /* { dg-final { scan-assembler-times "bigarr\[\t \]+\[^\n\]*bts_offset" 1 } } 
*/
 

Jakub



[PATCH] docs: Use @var{S} etc. in Spec File invoke.texi documentation

2024-03-27 Thread Jakub Jelinek
Hi!

We got internally a question about the Spec File syntax, misunderstanding
what is the literal syntax and what are the placeholder variables in
the syntax descriptions.
The following patch attempts to use @var{S} etc. instead of just S
to clarify it stands for any option (or start of option etc.) rather
than literal S, say in %{S:X}.  At least in HTML documentation it
then uses italics.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-26  Jakub Jelinek  

* doc/invoke.texi (Spec Files): Use @var{S} instead of S,
@var{X} instead of X etc. for other placeholders.

--- gcc/doc/invoke.texi.jj  2024-03-26 10:01:08.290920704 +0100
+++ gcc/doc/invoke.texi 2024-03-26 21:56:46.133690400 +0100
@@ -36794,17 +36794,17 @@ Substitute the variable part of a matche
 Note that each comma in the substituted string is replaced by
 a single space.
 
-@item %S
-Similar to @samp{%@var{S}
+Similar to @samp{%<@var{S}}, but keep @code{-@var{S}} in the GCC command line.
 
 @item %:@var{function}(@var{args})
 Call the named function @var{function}, passing it @var{args}.
@@ -36988,47 +36988,47 @@ otherwise.
 @end smallexample
 @end table
 
-@item %@{S@}
-Substitutes the @code{-S} switch, if that switch is given to GCC@.
+@item %@{@var{S}@}
+Substitutes the @code{-@var{S}} switch, if that switch is given to GCC@.
 If that switch is not specified, this substitutes nothing.  Note that
 the leading dash is omitted when specifying this option, and it is
 automatically inserted if the substitution is performed.  Thus the spec
 string @samp{%@{foo@}} matches the command-line option @option{-foo}
 and outputs the command-line option @option{-foo}.
 
-@item %W@{S@}
-Like %@{@code{S}@} but mark last argument supplied within as a file to be
+@item %W@{@var{S}@}
+Like %@{@code{@var{S}}@} but mark last argument supplied within as a file to be
 deleted on failure.
 
-@item %@@@{S@}
-Like %@{@code{S}@} but puts the result into a @code{FILE} and substitutes
+@item %@@@{@var{S}@}
+Like %@{@code{@var{S}}@} but puts the result into a @code{FILE} and substitutes
 @code{@@FILE} if an @code{@@file} argument has been supplied.
 
-@item %@{S*@}
+@item %@{@var{S}*@}
 Substitutes all the switches specified to GCC whose names start
-with @code{-S}, but which also take an argument.  This is used for
+with @code{-@var{S}}, but which also take an argument.  This is used for
 switches like @option{-o}, @option{-D}, @option{-I}, etc.
 GCC considers @option{-o foo} as being
 one switch whose name starts with @samp{o}.  %@{o*@} substitutes this
 text, including the space.  Thus two arguments are generated.
 
-@item %@{S**@}
-Like %@{@code{S}*@}, but preserve order of @code{S} and @code{T} options
-(the order of @code{S} and @code{T} in the spec is not significant).
+@item %@{@var{S}*&@var{T}*@}
+Like %@{@code{@var{S}}*@}, but preserve order of @code{@var{S}} and 
@code{@var{T}} options
+(the order of @code{@var{S}} and @code{@var{T}} in the spec is not 
significant).
 There can be any number of ampersand-separated variables; for each the
 wild card is optional.  Useful for CPP as @samp{%@{D***@}}.
 
-@item %@{S:X@}
-Substitutes @code{X}, if the @option{-S} switch is given to GCC@.
+@item %@{@var{S}:@var{X}@}
+Substitutes @code{@var{X}}, if the @option{-@var{S}} switch is given to GCC@.
 
-@item %@{!S:X@}
-Substitutes @code{X}, if the @option{-S} switch is @emph{not} given to GCC@.
+@item %@{!@var{S}:@var{X}@}
+Substitutes @code{@var{X}}, if the @option{-@var{S}} switch is @emph{not} 
given to GCC@.
 
-@item %@{S*:X@}
-Substitutes @code{X} if one or more switches whose names start with
-@code{-S} are specified to GCC@.  Normally @code{X} is substituted only
+@item %@{@var{S}*:@var{X}@}
+Substitutes @code{@var{X}} if one or more switches whose names start with
+@code{-@var{S}} are specified to GCC@.  Normally @code{@var{X}} is substituted 
only
 once, no matter how many such switches appeared.  However, if @code{%*}
-appears somewhere in @code{X}, then @code{X} is substituted once
+appears somewhere in @code{@var{X}}, then @code{@var{X}} is substituted once
 for each matching switch, with the @code{%*} replaced by the part of
 that switch matching the @code{*}.
 
@@ -37049,23 +37049,23 @@ when matching an option like @option{-mc
 --script=newchip/memory.ld
 @end smallexample
 
-@item %@{.S:X@}
-Substitutes @code{X}, if processing a file with suffix @code{S}.
+@item %@{.@var{S}:@var{X}@}
+Substitutes @code{@var{X}}, if processing a file with suffix @code{@var{S}}.
 
-@item %@{!.S:X@}
-Substitutes @code{X}, if @emph{not} processing a file with suffix @code{S}.
+@item %@{!.@var{S}:@var{X}@}
+Substitutes @code{@var{X}}, if @emph{not} processing a file with suffix 
@code{@var{S}}.
 
-@item %@{,S:X@}
-Substitutes @code{X}, if processing a file for language @code{S}.
+@item %@{,@var{S}:@var{X}@}
+Substitutes @code{@var{X}}, if processing a file for language @code{@var{S}}.
 
-@item %@{!,S:X@}
-Substitutes @code{X}, if 

Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-03-27 Thread Philipp Tomsich
Jeff,

just a heads-up that that trunk (i.e., the soon-to-be GCC14) still
generates the suboptimal sequence:
  https://godbolt.org/z/K9YYEPsvY

Thanks,
Philipp.


On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich  wrote:
>
> On Sun, 20 Nov 2022 at 17:38, Jeff Law  wrote:
> >
> >
> > On 11/9/22 16:10, Philipp Tomsich wrote:
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >  unsigned long f(unsigned int a, unsigned long b)
> > >  {
> > >  a = a << 1;
> > >  unsigned long c = (unsigned long) a;
> > >  c = b + (c<<4);
> > >  return c;
> > >  }
> > > which will present at combine-time as:
> > >  Trying 7, 8 -> 9:
> > >  7: r78:SI=r81:DI#0<<0x1
> > >REG_DEAD r81:DI
> > >  8: r79:DI=zero_extend(r78:SI)
> > >REG_DEAD r78:SI
> > >  9: r72:DI=r79:DI<<0x4
> > >REG_DEAD r79:DI
> > >  Failed to match this instruction:
> > >  (set (reg:DI 72 [ _1 ])
> > >  (and:DI (ashift:DI (reg:DI 81)
> > >  (const_int 5 [0x5]))
> > >   (const_int 68719476704 [0xfffe0])))
> > > and produce the following (optimized) assembly:
> > >  f:
> > >   slliw   a5,a0,1
> > >   sllia5,a5,32
> > >   srlia5,a5,28
> > >   add a0,a5,a1
> > >   ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xul.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of training zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >  f:
> > >   sllia0,a0,33
> > >   srlia0,a0,28
> > >   add a0,a0,a1
> > >   ret
> > >
> > > Unfortunately, all of this causes some fallout (especially in how it
> > > interacts with Zb* extensions and zero_extract expressions formed
> > > during combine): this is addressed through additional instruction
> > > splitting and handling of zero_extract.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
> > >  as an and:DI.
> > >   (*andi_add.uw): New pattern.
> > >   (*slli_slli_uw): New pattern.
> > >   (*shift_then_shNadd.uw): New pattern.
> > >   (*slliuw): Rename to riscv_slli_uw.
> > >   (riscv_slli_uw): Renamed from *slliuw.
> > >   (*zeroextract2_highbits): New pattern.
> > >   (*zero_extract): New pattern, which will be split to
> > >   shift-left + shift-right.
> > >   * config/riscv/predicates.md (dimode_shift_operand):
> > >   * config/riscv/riscv.md (*zero_extract_lowbits):
> > >   (zero_extendsidi2_shifted): Rename.
> > >   (*zero_extendsidi2_shifted): Generalize.
> > >   (*shift_truthvalue): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/riscv/shift-shift-6.c: New test.
> > >   * gcc.target/riscv/shift-shift-7.c: New test.
> > >   * gcc.target/riscv/shift-shift-8.c: New test.
> > >   * gcc.target/riscv/shift-shift-9.c: New test.
> > >   * gcc.target/riscv/snez.c: New test.
> > >
> > > Commit notes:
> > > - Depends on a predicate posted in "RISC-V: Optimize branches testing
> > >a bit-range or a shifted immediate".  Depending on the order of
> > >applying these, I'll take care to pull that part out of the other
> > >patch if needed.
> > >
> > > Version-changes: 2
> > > - refactor
> > > - optimise for additional corner cases and deal with fallout
> > >
> > > Version-changes: 3
> > > - removed the [WIP] from the commit message (no other changes)
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > >
> > > (no changes since v1)
> > >
> > >   gcc/config/riscv/bitmanip.md  | 142 ++
> > >   gcc/config/riscv/predicates.md|   5 +
> > >   gcc/config/riscv/riscv.md |  75 +++--
> > >   .../gcc.target/riscv/shift-shift-6.c  |  14 ++
> > >   .../gcc.target/riscv/shift-shift-7.c  |  16 ++
> > >   .../gcc.target/riscv/shift-shift-8.c  |  20 +++
> > >   .../gcc.target/riscv/shift-shift-9.c  |  15 ++
> > >   gcc/testsuite/gcc.target/riscv/snez.c |  14 ++
> > >   8 files changed, 261 insertions(+), 40 deletions(-)
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c
> > >   create mode 100644 

Re: [PATCH] testsuite: Fix up ext-floating{3,12}.C on i686-linux

2024-03-27 Thread Uros Bizjak
On Wed, Mar 27, 2024 at 11:48 AM Jakub Jelinek  wrote:
>
> Hi!
>
> These tests FAIL for quite a while on i686-linux since July last year,
> likely r14-2628 .  Since that patch gcc claims _Float16 and __bf16
> support even without -msse2 because some functions could be using
> target attribute.
> Later r14-2691 added -msse2 to add_options_for_float16, but didn't do that
> for bfloat16, plus ext-floating{3,12}.C tests need the added dg-add-options,
> so that float16 and bfloat16 effective targets match the __STDCPP_FLOAT16_T__
> or __STDCPP_BFLOAT16_T__ macros.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, fixes
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 
> 144)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 
> 146)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 
> 148)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 
> 150)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 
> 152)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 
> 154)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 
> 144)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 
> 146)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 
> 148)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 
> 150)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 
> 152)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 
> 154)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 107)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 114)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 126)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 79)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 86)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 98)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 
> 22)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 
> 23)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 
> 24)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 
> 25)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 107)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 114)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 126)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 79)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 86)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 98)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 
> 22)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 
> 23)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 
> 24)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 
> 25)
> on the latter and changes nothing on the former, ok for trunk?
>
> 2024-03-26  Jakub Jelinek  
>
> * lib/target-supports.exp (add_options_for_bfloat16): Add -msse2 on
> i?86/x86_64.
> * g++.dg/cpp23/ext-floating3.C: Add dg-add-options float16.
> * g++.dg/cpp23/ext-floating12.C: Add dg-add-options float16 and
> bfloat16.

OK.

Thanks,
Uros.

> --- gcc/testsuite/lib/target-supports.exp.jj2024-03-19 08:55:18.500791497 
> +0100
> +++ gcc/testsuite/lib/target-supports.exp   2024-03-26 20:30:41.963222438 
> +0100
> @@ -3829,6 +3829,9 @@ proc check_effective_target_bfloat16_run
>  }
>
>  proc add_options_for_bfloat16 { flags } {
> +if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
> +   return "$flags -msse2"
> +}
>  return "$flags"
>  }
>
> --- gcc/testsuite/g++.dg/cpp23/ext-floating3.C.jj   2022-09-27 
> 08:03:27.0 +0200
> +++ gcc/testsuite/g++.dg/cpp23/ext-floating3.C  2024-03-26 20:26:41.921609624 
> +0100
> @@ -4,6 +4,7 @@
>  // And some further tests.
>  // { dg-do compile { target { c++23 && { i?86-*-linux* x86_64-*-linux* } } } 
> }
>  // { dg-options "" }
> +// { dg-add-options float16 }
>
>  #include "ext-floating.h"
>
> --- gcc/testsuite/g++.dg/cpp23/ext-floating12.C.jj  2022-10-31 
> 20:15:49.72032 +0100
> +++ gcc/testsuite/g++.dg/cpp23/ext-floating12.C 2024-03-26 20:31:29.876546341 
> +0100
> @@ -1,6 +1,8 @@
>  // P1467R9 - Extended floating-point types and standard names.
>  // { dg-do compile { target { c++23 && { i?86-*-linux* x86_64-*-linux* } } } 
> }
>  // { dg-options "" }
> +// { dg-add-options float16 }
> +// { dg-add-options bfloat16 }
>
>  #include 
>  #include 
>
>   

[PATCH] testsuite: Fix up ext-floating{3,12}.C on i686-linux

2024-03-27 Thread Jakub Jelinek
Hi!

These tests FAIL for quite a while on i686-linux since July last year,
likely r14-2628 .  Since that patch gcc claims _Float16 and __bf16
support even without -msse2 because some functions could be using
target attribute.
Later r14-2691 added -msse2 to add_options_for_float16, but didn't do that
for bfloat16, plus ext-floating{3,12}.C tests need the added dg-add-options,
so that float16 and bfloat16 effective targets match the __STDCPP_FLOAT16_T__
or __STDCPP_BFLOAT16_T__ macros.

Bootstrapped/regtested on x86_64-linux and i686-linux, fixes
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 144)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 146)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 148)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 150)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 152)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 154)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 144)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 146)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 148)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 150)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 152)
-FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 154)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 107)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 114)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 126)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 79)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 86)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 98)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 22)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 23)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 24)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 25)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 107)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 114)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 126)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 79)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 86)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 98)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 22)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 23)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 24)
-FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 25)
on the latter and changes nothing on the former, ok for trunk?

2024-03-26  Jakub Jelinek  

* lib/target-supports.exp (add_options_for_bfloat16): Add -msse2 on
i?86/x86_64.
* g++.dg/cpp23/ext-floating3.C: Add dg-add-options float16.
* g++.dg/cpp23/ext-floating12.C: Add dg-add-options float16 and
bfloat16.

--- gcc/testsuite/lib/target-supports.exp.jj2024-03-19 08:55:18.500791497 
+0100
+++ gcc/testsuite/lib/target-supports.exp   2024-03-26 20:30:41.963222438 
+0100
@@ -3829,6 +3829,9 @@ proc check_effective_target_bfloat16_run
 }
 
 proc add_options_for_bfloat16 { flags } {
+if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
+   return "$flags -msse2"
+}
 return "$flags"
 }
 
--- gcc/testsuite/g++.dg/cpp23/ext-floating3.C.jj   2022-09-27 
08:03:27.0 +0200
+++ gcc/testsuite/g++.dg/cpp23/ext-floating3.C  2024-03-26 20:26:41.921609624 
+0100
@@ -4,6 +4,7 @@
 // And some further tests.
 // { dg-do compile { target { c++23 && { i?86-*-linux* x86_64-*-linux* } } } }
 // { dg-options "" }
+// { dg-add-options float16 }
 
 #include "ext-floating.h"
 
--- gcc/testsuite/g++.dg/cpp23/ext-floating12.C.jj  2022-10-31 
20:15:49.72032 +0100
+++ gcc/testsuite/g++.dg/cpp23/ext-floating12.C 2024-03-26 20:31:29.876546341 
+0100
@@ -1,6 +1,8 @@
 // P1467R9 - Extended floating-point types and standard names.
 // { dg-do compile { target { c++23 && { i?86-*-linux* x86_64-*-linux* } } } }
 // { dg-options "" }
+// { dg-add-options float16 }
+// { dg-add-options bfloat16 }
 
 #include 
 #include 

Jakub



Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao
On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> 
> 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >  int
> >  main (void)
> >  {
> >    unsigned long stat = 0xdeadbeef;
> >    for (int i = 0; i < 1; i++)
> >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >    asm(""::"r"(stat));
> >  }
> > 
> > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> 
> The test case div-const-reduction.c is modified to assemble the instruction
> sequence as follows:
>   lu12i.w $r12,97440>>12  # 0x3b9ac000
>   ori $r12,$r12,2567
>   mod.w   $r13,$r13,$r12
> 
> This sequence of instructions takes 5 clock cycles.

Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Jakub Jelinek
Hi!

The following patch attempts to fix the (view_convert (convert@0 @1))
optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
and @0 has the same precision as @1 and it has a different sign
and _BitInt with padding bits are extended on the target (x86 doesn't,
aarch64 doesn't, but arm plans to do that), then optimizing it to
just (view_convert @1) is wrong, the padding bits wouldn't be what
it should be.
E.g. bitint-64.c test with -O2 has
  _5 = (unsigned _BitInt(5)) _4;
  _7 = (unsigned _BitInt(5)) e.0_1;
  _8 = _5 + _7;
  _9 = (_BitInt(5)) _8;
  _10 = VIEW_CONVERT_EXPR(_9);
and forwprop1 changes that to just
  _5 = (unsigned _BitInt(5)) _4;
  _7 = (unsigned _BitInt(5)) e.0_1;
  _8 = _5 + _7;
  _10 = VIEW_CONVERT_EXPR(_8);
The former makes the padding bits well defined (at least on arm in
the future), while the latter has those bits zero extended.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-26  Jakub Jelinek  

PR tree-optimization/114469
* match.pd ((view_convert (convert@0 @1))): Don't optimize if
TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
to be extended by the ABI.

--- gcc/match.pd.jj 2024-03-15 11:04:24.672914747 +0100
+++ gcc/match.pd2024-03-26 15:49:44.177864509 +0100
@@ -4699,13 +4699,38 @@ (define_operator_list SYNC_FETCH_AND_AND
zero-extend while keeping the same size (for bool-to-char).  */
 (simplify
   (view_convert (convert@0 @1))
-  (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
-   && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
-   && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
-   && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
-  || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
-  && TYPE_UNSIGNED (TREE_TYPE (@1)
-   (view_convert @1)))
+  (with { tree type0 = TREE_TYPE (@0);
+ tree type1 = TREE_TYPE (@1);
+ bool ok = false;
+ if ((INTEGRAL_TYPE_P (type0) || POINTER_TYPE_P (type0))
+ && (INTEGRAL_TYPE_P (type1) || POINTER_TYPE_P (type1))
+ && TYPE_SIZE (type0) == TYPE_SIZE (type1))
+   {
+ if (TYPE_PRECISION (type0) == TYPE_PRECISION (type1))
+   {
+ if (TREE_CODE (type0) != BITINT_TYPE)
+   ok = true;
+ else
+   {
+ /* Avoid optimizing this if type0 is a _BitInt
+type with padding bits which are supposed to be
+extended.  */
+ struct bitint_info info;
+ targetm.c.bitint_type_info (TYPE_PRECISION (type0),
+ );
+ if (!info.extended)
+   ok = true;
+ else
+   ok = (TYPE_PRECISION (type0)
+ == tree_to_uhwi (TYPE_SIZE (type0)));
+   }
+   }
+ else if (TYPE_PRECISION (type0) > TYPE_PRECISION (type1)
+  && TYPE_UNSIGNED (type1))
+   ok = true;
+   } }
+   (if (ok)
+(view_convert @1
 
 /* Simplify a view-converted empty or single-element constructor.  */
 (simplify

Jakub



Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Matthias Kretz
On Wednesday, 27 March 2024 11:07:14 CET Richard Sandiford wrote:
> I'm still worried about:
> 
>   #if _GLIBCXX_SIMD_HAVE_SVE
>   constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS
> / 8; #else
>   constexpr inline int __sve_vectorized_size_bytes = 0;
>   #endif
> 
> and the direct use __ARM_FEATURE_SVE_BITS elsewhere, for the reasons
> discussed here (including possible ODR problems):
> 
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640037.html
>   https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643734.html
> 
> Logically the vector length should be a template parameter rather than
> an invariant.  Has this been resolved?  If not, it feels like a blocker
> to me (sorry).

The vector length is always a template parameter to all user-facing API. Some 
examples

1. on aarch64 the following is independent of SVE flags (and status quo):

  simd is an alias for
  simd

  fixed_size_simd is supposed to be ABI-stable anyway (passed via
  the stack, alignof == sizeof).

2. with -msve-vector-bits=512:

  native_simd is an alias for
  simd>

  simd> is an alias for
  simd>

3. with -msve-vector-bits=256: 

  native_simd is an alias for
  simd>

  simd> is an alias for
  simd>

Implementation functions are either [[gnu::always_inline]] or tagged with the 
ABI tag type and the __odr_helper template argument (to ensure not-inlined 
inline functions have unique names).

Does that make __ARM_FEATURE_SVE_BITS usage indirect enough?

Also for context, please consider that this is std::*experimental*::simd. The 
underlying ISO document will likely get retracted at some point and the whole 
API and implementation (hopefully) superseded by C++26. The main purpose of 
the spec and implementation is to gather experience.

Best,
  Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──


signature.asc
Description: This is a digitally signed message part.


[PATCH] c-family: Cast __atomic_load_*/__atomic_exchange_* result to _BitInt rather then VCE it [PR114469]

2024-03-27 Thread Jakub Jelinek
Hi!

As written in the PR, torture/bitint-64.c test fails with -O2 -flto
and the reason is that on _BitInt arches where the padding bits
are undefined, the padding bits in the _Atomic vars are also undefined,
but when __atomic_load or __atomic_exchange on a _BitInt _Atomic variable
with some padding bits is lowered into __atomic_load_{1,2,4,8,16} or
__atomic_exchange_*, the mode precision unsigned result is VIEW_CONVERT_EXPR
converted to _BitInt and because of the VCE nothing actually sign/zero
extends it as needed for later uses - the var is no longer addressable and
expansion assumes such automatic vars are properly extended.

The following patch fixes that by using NOP_EXPR on it (the
VIEW_CONVERT_EXPR after it will then be optimized away during
gimplification, didn't want to repeat it in the code as else result = build1
(VIEW_CONVERT_EXPR, ...); twice.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-26  Jakub Jelinek  

PR tree-optimization/114469
* c-common.cc (resolve_overloaded_builtin): For _BitInt result
on !extended targets convert result to the _BitInt type before
using VIEW_CONVERT_EXPR.

--- gcc/c-family/c-common.cc.jj 2024-03-08 09:29:30.257747832 +0100
+++ gcc/c-family/c-common.cc2024-03-26 14:30:59.493031026 +0100
@@ -8461,7 +8461,19 @@ resolve_overloaded_builtin (location_t l
if (new_return)
  {
/* Cast function result from I{1,2,4,8,16} to the required type.  */
-   result = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (new_return), result);
+   if (TREE_CODE (TREE_TYPE (new_return)) == BITINT_TYPE)
+ {
+   struct bitint_info info;
+   unsigned prec = TYPE_PRECISION (TREE_TYPE (new_return));
+   targetm.c.bitint_type_info (prec, );
+   if (!info.extended)
+ /* For _BitInt which has the padding bits undefined
+convert to the _BitInt type rather than VCE to force
+zero or sign extension.  */
+ result = build1 (NOP_EXPR, TREE_TYPE (new_return), result);
+ }
+   result
+ = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (new_return), result);
result = build2 (MODIFY_EXPR, TREE_TYPE (new_return), new_return,
 result);
TREE_SIDE_EFFECTS (result) = 1;

Jakub



Re: [PATCH] btf: Emit labels in DATASEC bts_offset entries.

2024-03-27 Thread Jakub Jelinek
On Tue, Mar 26, 2024 at 02:28:52PM +, Cupertino Miranda wrote:
> GCC was defining bts_offset entry to always contain 0.
> When comparing with clang, the same entry is instead a label to the
> respective variable or function. The assembler emits relocations for
> those labels.
> 
> gcc/ChangeLog:
>   PR target/114431
>   * btfout.cc (get_name_for_datasec_entry): Add function.
>   (btf_asm_datasec_entry): Print label when possible.
> 
> gcc/testsuite/ChangeLog:
>   gcc.dg/debug/btf/btf-datasec-1.c: Correct for new
>   implementation.
>   gcc.dg/debug/btf/btf-datasec-2.c: Likewise
>   gcc.dg/debug/btf/btf-pr106773.c: Likewise

For next time, missing dot after Likewise

> --- a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
> @@ -19,8 +19,10 @@
>  /* { dg-final { scan-assembler-times "0xf03\[\t \]+\[^\n\]*btt_info" 2 } 
> } */
>  /* { dg-final { scan-assembler-times "0xf01\[\t \]+\[^\n\]*btt_info" 1 } 
> } */
>  
> -/* The offset entry for each variable in a DATSEC should be 0 at compile 
> time.  */
> -/* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bts_offset" 7 } } */
> +/* The offset entry for each variable in a DATSEC should contain a label.  */
> +/* { dg-final { scan-assembler-times ".4byte\[\t \]\[a-e\]\[\t 
> \]+\[^\n\]*bts_offset" 5 } } */

Where has this been tested?
4byte is used only on some targets, what exact assembler directive is used
for 4byte unaligned data is heavily target dependent.
git grep define.*TARGET_ASM_UNALIGNED_SI_OP
gcc/config/alpha/alpha.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.align 
0\n\t.long\t"
gcc/config/avr/avr.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
gcc/config/cris/cris.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
TARGET_ASM_ALIGNED_SI_OP
gcc/config/csky/csky.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
gcc/config/i386/i386.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
TARGET_ASM_ALIGNED_SI_OP
gcc/config/ia64/ia64.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\tdata4.ua\t"
gcc/config/m68k/m68k.cc:#define TARGET_ASM_UNALIGNED_SI_OP 
TARGET_ASM_ALIGNED_SI_OP
gcc/config/mcore/mcore.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
gcc/config/pa/pa.cc:#define TARGET_ASM_UNALIGNED_SI_OP TARGET_ASM_ALIGNED_SI_OP
gcc/config/rs6000/rs6000.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.vbyte\t4,"
gcc/config/rs6000/rs6000.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
gcc/config/sh/sh.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.ualong\t"
gcc/config/sparc/sparc.cc:#define TARGET_ASM_UNALIGNED_SI_OP "\t.uaword\t"
gcc/target-def.h:#define TARGET_ASM_UNALIGNED_SI_OP "\t.4byte\t"
gcc/target-def.h:#define TARGET_ASM_UNALIGNED_SI_OP NULL
Now
git grep 'define[[:blank:]]*TARGET_ASM_ALIGNED_SI_OP' | grep 
'/\(cris\|i386\|m68k\|pa\)/'
gcc/config/cris/cris.cc:#define TARGET_ASM_ALIGNED_SI_OP "\t.dword\t"
gcc/config/i386/i386.cc:#define TARGET_ASM_ALIGNED_SI_OP ASM_LONG
gcc/config/m68k/m68k.cc:#define TARGET_ASM_ALIGNED_SI_OP "\tlong\t"
gcc/config/m68k/m68k.cc:#define TARGET_ASM_ALIGNED_SI_OP "\tdc.l\t"
gcc/config/pa/pa.cc:#define TARGET_ASM_ALIGNED_SI_OP "\t.word\t"
git grep 'define[[:blank:]]*ASM_LONG'
gcc/config/i386/att.h:#define ASM_LONG "\t.long\t"
gcc/config/i386/bsd.h:#define ASM_LONG "\t.long\t"
gcc/config/i386/darwin.h:#define ASM_LONG "\t.long\t"

So, if you want to handle it for all arches, instead of .4byte\[\t \] you'd 
need to
use
(?:(?:.4byte|.long|data4.ua|.ualong|.uaword|.dword|long|dc.l|.word)\[\t 
\]|.vbyte\t4,\[\t \]?)
or so.

BTW,
/* { dg-options "-O0 -gbtf -dA" } */
/* { dg-options "-O0 -gbtf -dA -msdata=none" { target { { powerpc*-*-* } && 
ilp32 } } } */
/* { dg-options "-O0 -gbtf -dA -msmall-data-limit=0" { target { riscv*-*-* } } 
} */
/* { dg-options "-O0 -gbtf -dA -G0" { target { nios2-*-* } } } */
would be perhaps right for test written 2 decades ago, but for a test
written 3 years ago is something that shouldn't have passed patch review.
This should be done with
/* { dg-options "-O0 -gbtf -dA" } */
/* { dg-additional-options "-msdata=none" { target { { powerpc*-*-* } && ilp32 
} } } */
/* { dg-additional-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
/* { dg-additional-options "-G0" { target { nios2-*-* } } } */

btf-cvr-quals-1.c test suffers from that too.

Jakub



Re: [PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-27 Thread Victor Do Nascimento

On 3/26/24 12:26, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Given how, at present, the choice of using LSE128 atomic instructions
by the toolchain is delegated to run-time selection in the form of
Libatomic ifuncs, responsible for querying target support, the
`+lse128' target architecture compile-time flag is absent from GCC.

This, however, contrasts with the Binutils implementation, which gates
LSE128 instructions behind the `+lse128' flag.  This can lead to
problems in GCC for certain use-cases.  One such example is in the use
of inline assembly, whereby the inability of enabling the feature in
the command-line prevents the compiler from automatically issuing the
necessary LSE128 `.arch' directive.

This patch therefore brings GCC into alignment with LLVM and Binutils
in adding support for the `+lse128' architectural extension flag.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Add LSE128
AARCH64_OPT_EXTENSION, adding it as a dependency for the D128
feature.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/lse128-flag.c: New.
* gcc.target/aarch64/cpunative/info_23: Likewise.
* gcc.target/aarch64/cpunative/native_cpu_23.c: Likewise.


The new extension should be documented in doc/invoke.texi.


---
  gcc/config/aarch64/aarch64-option-extensions.def  |  4 +++-
  gcc/testsuite/gcc.target/aarch64/cpunative/info_23|  8 
  .../gcc.target/aarch64/cpunative/native_cpu_23.c  | 11 +++
  gcc/testsuite/gcc.target/aarch64/lse128-flag.c| 10 ++
  4 files changed, 32 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_23
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/lse128-flag.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 1a3b91c68cf..ac54b899a06 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -275,7 +275,9 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
  
  AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
  
-AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")

+AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
+
+AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
  
  AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
  
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_23 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_23

new file mode 100644
index 000..d77c25d2f61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_23
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp atomics 
lse128
+CPU implementer: 0xfe
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
new file mode 100644
index 000..8a1e235d8ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO 
"$srcdir/gcc.target/aarch64/cpunative/info_23" } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+lse128} 
} } */
+/* Test one where lse128 is available and so should be emitted.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/lse128-flag.c 
b/gcc/testsuite/gcc.target/aarch64/lse128-flag.c
new file mode 100644
index 000..71339c3af6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/lse128-flag.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { aarch64*-*-*} } } */
+/* { dg-additional-options "-march=armv9.4-a+lse128" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv9\.4-a\+crc\+lse128} } } */
+/* Test a normal looking procinfo.  */


Not sure I understand the comment.  Is procinfo part of this test?


Thank you for catching this, Richard.  This was originally a 
procinfo-reliant test which ended up being redesigned without reference 
to procinfo and this remained.  I've now dropped it from the patch.


Cheers!


Looks good otherwise.

Thanks,
Richard


Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Richard Sandiford
Jonathan Wakely  writes:
> On Fri, 8 Mar 2024 at 09:58, Matthias Kretz wrote:
>>
>> Hi,
>>
>> I applied and did extended testing on x86_64 (no regressions) and aarch64
>> using qemu testing SVE 256, 512, and 1024. Looks good!
>>
>> While going through the applied patch I noticed a few style issues that I
>> simply turned into a patch (attached).
>>
> [...]
>>
>> From my side, with the noted changes the patch is ready for merging.
>> @Jonathan, any chance for a green light before GCC 14.1?
>
> As discussed on IRC, please push the revised patch with your
> suggestions incorporated (and post to the lists for posterity).
>
> Thanks, everybody, for the patches and the thorough review.

I'm still worried about:

  #if _GLIBCXX_SIMD_HAVE_SVE
  constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS / 8;
  #else
  constexpr inline int __sve_vectorized_size_bytes = 0;
  #endif

and the direct use __ARM_FEATURE_SVE_BITS elsewhere, for the reasons
discussed here (including possible ODR problems):

  https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640037.html
  https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643734.html

Logically the vector length should be a template parameter rather than
an invariant.  Has this been resolved?  If not, it feels like a blocker
to me (sorry).

Thanks,
Richard


Re: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-27 Thread Jonathan Wakely
On Fri, 8 Mar 2024 at 09:58, Matthias Kretz wrote:
>
> Hi,
>
> I applied and did extended testing on x86_64 (no regressions) and aarch64
> using qemu testing SVE 256, 512, and 1024. Looks good!
>
> While going through the applied patch I noticed a few style issues that I
> simply turned into a patch (attached).
>
[...]
>
> From my side, with the noted changes the patch is ready for merging.
> @Jonathan, any chance for a green light before GCC 14.1?

As discussed on IRC, please push the revised patch with your
suggestions incorporated (and post to the lists for posterity).

Thanks, everybody, for the patches and the thorough review.



Re: [PATCH] libstdc++: Fix call signature of builtins from masked ++/--

2024-03-27 Thread Jonathan Wakely
On Wed, 27 Mar 2024 at 08:51, Matthias Kretz  wrote:
>
> This is broken on GCC 13 and 14 (https://compiler-explorer.com/z/GPKGPGs6T)
>
> Tested on x86_64-linux-gnu.
>
> OK for trunk and 13?

OK for both, thanks.


>
> --- 8< -
>
> This resolves failures in the "expensive" where-* test of check-simd
> when targeting AVX-512.
>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> * include/experimental/bits/simd_x86.h (_S_masked_unary): Call
> the 4- and 8-byte variants of __builtin_ia32_subp[ds] without
> rounding direction argument.
> ---
>  libstdc++-v3/include/experimental/bits/simd_x86.h | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──


Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread peter0x44

On 2024-03-27 01:58, Richard Biener wrote:
On Wed, Mar 27, 2024 at 9:13 AM Peter0x44  
wrote:


I accidentally replied off-list. Sorry.

27 Mar 2024 8:09:30 am Peter0x44 :


27 Mar 2024 7:51:26 am Richard Biener :

> On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov 
> wrote:
>>
>> lto-wrapper generates Makefiles that use the following:
>> touch -r file file.tmp && mv file.tmp file
>> to truncate files.
>> If there is no suitable "touch" or "mv" available, then this errors
>> with
>> "The system cannot find the file specified".
>>
>> The solution here is to check if sh -c true works, before trying to
>> use it in
>> the Makefile. If it doesn't, then fall back to "copy /y nul file"
>> instead.
>> The fallback doesn't work exactly the same (the modified time of the
>> file gets
>> updated), but this doesn't seem to matter in practice.
>
> I suppose it doesn't matter as we (no longer?) have the input as
> dependency
> on the rule so make doesn't get confused to re-build it.  I guess we
> only truncate
> the file because it's going to be deleted by another process.
>
> Instead of doing sth like sh_exists I would suggest to simply use
> #ifdef __WIN
> or something like that?  Not sure if we have suitable macros to
> identify the
> host operating system though and whether mingw, cygwin, etc. behave the
> same
> here.

They do, I tested. Using sh_exists is deliberate, I've had to program 
on
school computers that had cmd.exe disabled, but had busybox-w32 
working,
so it might be more flexible in that way. I would prefer a solution 
which

didn't require invoking cmd.exe if there is a working shell present.


Hmm, but then I'd expect SHELL to be appropriately set in such
situation?  So shouldn't sh_exists at least try to look at $SHELL?
I'm not sure it would . On windows, make searches the PATH for an sh.exe 
if

present, and then uses it by default. The relevant code is here:
https://git.savannah.gnu.org/cgit/make.git/tree/src/variable.c#n1628



I figured doing the "sh_exists" is okay because there is a basically
identical check for make.

>
> As a stop-gap solution doing
>
>   ( @-touch -r ... ) || true
>
> might also work?  Or another way to note to make the command can fail
> without causing a problem.

I don't think it would work. cmd.exe can't run subshells like this.


Hmm, OK.  So this is all for the case where 'make' is available (as you
say we check for that) but no POSIX command environment is
(IIRC both touch and mv are part of  POSIX).  Testing for 'touch' and
'mv' would then be another option?
I think it would work, but keep in mind they could be placed on the PATH 
but
still invoked from cmd.exe, so you might have to be careful of the 
redirection

syntax and no /dev/null. It's a bit more complexity that doesn't seem
necessary, to me.



>
> Another way would be to have a portable solution to truncate a file
> (maybe even removing it would work).  I don't think we should override
> SHELL.

Do you mean that perhaps an special command line argument could be 
added
to lto-wrapper to do it, and then the makefile could invoke 
lto-wrapper

to remove or truncate files instead of a shell? I'm not sure I get the
proposed suggestion.


The point is to truncate the file at the earliest point to reduce the
peak disk space required for a LTO link with many LTRANS units.
But yes, I guess it would be possible to add a flag to gcc itself
so the LTRANS compile would truncate the file.  Currently we
emit sth like

./libquantum.ltrans0.ltrans.o:
@/space/rguenther/install/trunk-r14-8925/usr/local/bin/gcc
'-xlto' '-c' '-fno-openmp' '-fno-openacc' '-fno-pie'
'-fcf-protection=none' '-g' '-mtune=generic' '-march=x86-64' '-O2'
'-O2' '-g' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
'-dumpdir' 'libquantum.' '-dumpbase' './libquantum.ltrans0.ltrans'
'-fltrans' '-o' './libquantum.ltrans0.ltrans.o'
'./libquantum.ltrans0.o'

so adding a '-truncate-input' flag for the driver or even more
explicit '-truncate ./libquantum.ltrans0.o'
might be a more elegant solution then?


This is much more elegant. I like it, I can take a look at implementing 
it.

Would I be looking at gcc.cc for this?

Best wishes,
Peter.



Richard.


>
> Richard.
>
>> I tested this both in environments both with and without sh present,
>> and
>> observed no issues.
>>
>> Signed-off-by: Peter Damianov 
>> ---
>> gcc/lto-wrapper.cc | 35 ---
>> 1 file changed, 32 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
>> index 5186d040ce0..8dee0aaa2d8 100644
>> --- a/gcc/lto-wrapper.cc
>> +++ b/gcc/lto-wrapper.cc
>> @@ -1389,6 +1389,27 @@ make_exists (void)
>>return errmsg == NULL && exit_status == 0 && err == 0;
>> }
>>
>> +/* Test that an sh command is present and working, return true if so.
>> +   This is only relevant for windows hosts, where a /bin/sh shell
>> cannot
>> +   be assumed to exist. */
>> +
>> +static bool
>> +sh_exists (void)
>> +{
>> +  const char *sh = "sh";

Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener
On Wed, Mar 27, 2024 at 9:13 AM Peter0x44  wrote:
>
> I accidentally replied off-list. Sorry.
>
> 27 Mar 2024 8:09:30 am Peter0x44 :
>
>
> 27 Mar 2024 7:51:26 am Richard Biener :
>
> > On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov 
> > wrote:
> >>
> >> lto-wrapper generates Makefiles that use the following:
> >> touch -r file file.tmp && mv file.tmp file
> >> to truncate files.
> >> If there is no suitable "touch" or "mv" available, then this errors
> >> with
> >> "The system cannot find the file specified".
> >>
> >> The solution here is to check if sh -c true works, before trying to
> >> use it in
> >> the Makefile. If it doesn't, then fall back to "copy /y nul file"
> >> instead.
> >> The fallback doesn't work exactly the same (the modified time of the
> >> file gets
> >> updated), but this doesn't seem to matter in practice.
> >
> > I suppose it doesn't matter as we (no longer?) have the input as
> > dependency
> > on the rule so make doesn't get confused to re-build it.  I guess we
> > only truncate
> > the file because it's going to be deleted by another process.
> >
> > Instead of doing sth like sh_exists I would suggest to simply use
> > #ifdef __WIN
> > or something like that?  Not sure if we have suitable macros to
> > identify the
> > host operating system though and whether mingw, cygwin, etc. behave the
> > same
> > here.
>
> They do, I tested. Using sh_exists is deliberate, I've had to program on
> school computers that had cmd.exe disabled, but had busybox-w32 working,
> so it might be more flexible in that way. I would prefer a solution which
> didn't require invoking cmd.exe if there is a working shell present.

Hmm, but then I'd expect SHELL to be appropriately set in such
situation?  So shouldn't sh_exists at least try to look at $SHELL?

> I figured doing the "sh_exists" is okay because there is a basically
> identical check for make.
>
> >
> > As a stop-gap solution doing
> >
> >   ( @-touch -r ... ) || true
> >
> > might also work?  Or another way to note to make the command can fail
> > without causing a problem.
>
> I don't think it would work. cmd.exe can't run subshells like this.

Hmm, OK.  So this is all for the case where 'make' is available (as you
say we check for that) but no POSIX command environment is
(IIRC both touch and mv are part of  POSIX).  Testing for 'touch' and
'mv' would then be another option?

> >
> > Another way would be to have a portable solution to truncate a file
> > (maybe even removing it would work).  I don't think we should override
> > SHELL.
>
> Do you mean that perhaps an special command line argument could be added
> to lto-wrapper to do it, and then the makefile could invoke lto-wrapper
> to remove or truncate files instead of a shell? I'm not sure I get the
> proposed suggestion.

The point is to truncate the file at the earliest point to reduce the
peak disk space required for a LTO link with many LTRANS units.
But yes, I guess it would be possible to add a flag to gcc itself
so the LTRANS compile would truncate the file.  Currently we
emit sth like

./libquantum.ltrans0.ltrans.o:
@/space/rguenther/install/trunk-r14-8925/usr/local/bin/gcc
'-xlto' '-c' '-fno-openmp' '-fno-openacc' '-fno-pie'
'-fcf-protection=none' '-g' '-mtune=generic' '-march=x86-64' '-O2'
'-O2' '-g' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
'-dumpdir' 'libquantum.' '-dumpbase' './libquantum.ltrans0.ltrans'
'-fltrans' '-o' './libquantum.ltrans0.ltrans.o'
'./libquantum.ltrans0.o'

so adding a '-truncate-input' flag for the driver or even more
explicit '-truncate ./libquantum.ltrans0.o'
might be a more elegant solution then?

Richard.

> >
> > Richard.
> >
> >> I tested this both in environments both with and without sh present,
> >> and
> >> observed no issues.
> >>
> >> Signed-off-by: Peter Damianov 
> >> ---
> >> gcc/lto-wrapper.cc | 35 ---
> >> 1 file changed, 32 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> >> index 5186d040ce0..8dee0aaa2d8 100644
> >> --- a/gcc/lto-wrapper.cc
> >> +++ b/gcc/lto-wrapper.cc
> >> @@ -1389,6 +1389,27 @@ make_exists (void)
> >>return errmsg == NULL && exit_status == 0 && err == 0;
> >> }
> >>
> >> +/* Test that an sh command is present and working, return true if so.
> >> +   This is only relevant for windows hosts, where a /bin/sh shell
> >> cannot
> >> +   be assumed to exist. */
> >> +
> >> +static bool
> >> +sh_exists (void)
> >> +{
> >> +  const char *sh = "sh";
> >> +  const char *sh_args[] = {sh, "-c", "true", NULL};
> >> +#ifdef _WIN32
> >> +  int exit_status = 0;
> >> +  int err = 0;
> >> +  const char *errmsg
> >> += pex_one (PEX_SEARCH, sh_args[0], CONST_CAST (char **, sh_args),
> >> +  "sh", NULL, NULL, _status, );
> >> +  return errmsg == NULL && exit_status == 0 && err == 0;
> >> +#else
> >> +  return true;
> >> +#endif
> >> +}
> >> +
> >> /* Execute gcc. ARGC is the number of arguments. ARGV contains the
> >> 

[PATCH] libstdc++: Fix call signature of builtins from masked ++/--

2024-03-27 Thread Matthias Kretz
This is broken on GCC 13 and 14 (https://compiler-explorer.com/z/GPKGPGs6T)

Tested on x86_64-linux-gnu.

OK for trunk and 13?

--- 8< -

This resolves failures in the "expensive" where-* test of check-simd
when targeting AVX-512.

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_x86.h (_S_masked_unary): Call
the 4- and 8-byte variants of __builtin_ia32_subp[ds] without
rounding direction argument.
---
 libstdc++-v3/include/experimental/bits/simd_x86.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
index 16b207be2a3..6b414486fee 100644
--- a/libstdc++-v3/include/experimental/bits/simd_x86.h
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -3538,17 +3538,23 @@ _S_masked_unary(const _SimdWrapper<_K, _Np> __k, const _SimdWrapper<_Tp, _Np> __
 	  }
 	else
 	  {
-#define _GLIBCXX_SIMD_MASK_SUB(_Sizeof, _Width, _Instr)\
+#define _GLIBCXX_SIMD_MASK_SUB_512(_Sizeof, _Width, _Instr)\
   if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__v) == _Width)   \
 return __builtin_ia32_##_Instr##_mask( \
 	 __v._M_data, __vector_broadcast<_Np>(_Tp(__pm_one)), __v._M_data, \
 	 __k._M_data, _MM_FROUND_CUR_DIRECTION)
-		_GLIBCXX_SIMD_MASK_SUB(4, 64, subps512);
+#define _GLIBCXX_SIMD_MASK_SUB(_Sizeof, _Width, _Instr)\
+  if constexpr (sizeof(_Tp) == _Sizeof && sizeof(__v) == _Width)   \
+return __builtin_ia32_##_Instr##_mask( \
+	 __v._M_data, __vector_broadcast<_Np>(_Tp(__pm_one)), __v._M_data, \
+	 __k._M_data)
+		_GLIBCXX_SIMD_MASK_SUB_512(4, 64, subps512);
 		_GLIBCXX_SIMD_MASK_SUB(4, 32, subps256);
 		_GLIBCXX_SIMD_MASK_SUB(4, 16, subps128);
-		_GLIBCXX_SIMD_MASK_SUB(8, 64, subpd512);
+		_GLIBCXX_SIMD_MASK_SUB_512(8, 64, subpd512);
 		_GLIBCXX_SIMD_MASK_SUB(8, 32, subpd256);
 		_GLIBCXX_SIMD_MASK_SUB(8, 16, subpd128);
+#undef _GLIBCXX_SIMD_MASK_SUB_512
 #undef _GLIBCXX_SIMD_MASK_SUB
 	  }
 #endif // __clang__


Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Peter0x44

I accidentally replied off-list. Sorry.

27 Mar 2024 8:09:30 am Peter0x44 :


27 Mar 2024 7:51:26 am Richard Biener :

On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov  
wrote:


lto-wrapper generates Makefiles that use the following:
touch -r file file.tmp && mv file.tmp file
to truncate files.
If there is no suitable "touch" or "mv" available, then this errors 
with

"The system cannot find the file specified".

The solution here is to check if sh -c true works, before trying to 
use it in
the Makefile. If it doesn't, then fall back to "copy /y nul file" 
instead.
The fallback doesn't work exactly the same (the modified time of the 
file gets

updated), but this doesn't seem to matter in practice.


I suppose it doesn't matter as we (no longer?) have the input as 
dependency

on the rule so make doesn't get confused to re-build it.  I guess we
only truncate
the file because it's going to be deleted by another process.

Instead of doing sth like sh_exists I would suggest to simply use 
#ifdef __WIN
or something like that?  Not sure if we have suitable macros to 
identify the
host operating system though and whether mingw, cygwin, etc. behave the 
same

here.


They do, I tested. Using sh_exists is deliberate, I've had to program on 
school computers that had cmd.exe disabled, but had busybox-w32 working, 
so it might be more flexible in that way. I would prefer a solution which 
didn't require invoking cmd.exe if there is a working shell present.


I figured doing the "sh_exists" is okay because there is a basically 
identical check for make.




As a stop-gap solution doing

  ( @-touch -r ... ) || true

might also work?  Or another way to note to make the command can fail
without causing a problem.


I don't think it would work. cmd.exe can't run subshells like this.



Another way would be to have a portable solution to truncate a file
(maybe even removing it would work).  I don't think we should override
SHELL.


Do you mean that perhaps an special command line argument could be added 
to lto-wrapper to do it, and then the makefile could invoke lto-wrapper 
to remove or truncate files instead of a shell? I'm not sure I get the 
proposed suggestion.




Richard.

I tested this both in environments both with and without sh present, 
and

observed no issues.

Signed-off-by: Peter Damianov 
---
gcc/lto-wrapper.cc | 35 ---
1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 5186d040ce0..8dee0aaa2d8 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1389,6 +1389,27 @@ make_exists (void)
   return errmsg == NULL && exit_status == 0 && err == 0;
}

+/* Test that an sh command is present and working, return true if so.
+   This is only relevant for windows hosts, where a /bin/sh shell 
cannot

+   be assumed to exist. */
+
+static bool
+sh_exists (void)
+{
+  const char *sh = "sh";
+  const char *sh_args[] = {sh, "-c", "true", NULL};
+#ifdef _WIN32
+  int exit_status = 0;
+  int err = 0;
+  const char *errmsg
+    = pex_one (PEX_SEARCH, sh_args[0], CONST_CAST (char **, sh_args),
+  "sh", NULL, NULL, _status, );
+  return errmsg == NULL && exit_status == 0 && err == 0;
+#else
+  return true;
+#endif
+}
+
/* Execute gcc. ARGC is the number of arguments. ARGV contains the 
arguments. */


static void
@@ -1402,6 +1423,7 @@ run_gcc (unsigned argc, char *argv[])
   const char *collect_gcc;
   char *collect_gcc_options;
   int parallel = 0;
+  bool have_sh = sh_exists ();
   int jobserver = 0;
   bool jobserver_requested = false;
   int auto_parallel = 0;
@@ -2016,6 +2038,7 @@ cont:
  argv_ptr[5] = NULL;
  if (parallel)
    {
+ fprintf (mstream, "SHELL=%s\n", have_sh ? "sh" : "cmd");
  fprintf (mstream, "%s:\n\t@%s ", output_name, 
new_argv[0]);

  for (j = 1; new_argv[j] != NULL; ++j)
    fprintf (mstream, " '%s'", new_argv[j]);
@@ -2024,9 +2047,15 @@ cont:
 truncate them as soon as we have processed it.  This
 reduces temporary disk-space usage.  */
  if (! save_temps)
-   fprintf (mstream, "\t@-touch -r \"%s\" \"%s.tem\" > 
/dev/null "

-    "2>&1 && mv \"%s.tem\" \"%s\"\n",
-    input_name, input_name, input_name, 
input_name);

+   {
+ fprintf (mstream,
+  have_sh
+  ? "\t@-touch -r \"%s\" \"%s.tem\" > 
/dev/null "

+    "2>&1 && mv \"%s.tem\" \"%s\"\n"
+  : "\t@-copy /y nul \"%s\" > NUL "
+    "2>&1\n",
+  input_name, input_name, input_name, 
input_name);

+   }
    }
  else
    {
--
2.39.2



Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Richard Biener
On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
>
> The latency of LA464 and LA664 division instructions depends on the
> input.  When I updated the costs in r14-6642, I unintentionally set the
> division costs to the best-case latency (when the first operand is 0).
> Per a recent discussion [1] we should use "something sensible" instead
> of it.
>
> Use the average of the minimum and maximum latency observed instead.
> This enables multiplication to reciprocal sequence reduction and speeds
> up the following test case for about 30%:
>
> int
> main (void)
> {
>   unsigned long stat = 0xdeadbeef;
>   for (int i = 0; i < 1; i++)
> stat = (stat * stat + stat * 114514 + 1919810) % 17;
>   asm(""::"r"(stat));
> }

I think you should be able to see a constant divisor and thus could do
better than return the same latency for everything.  For non-constant
divisors using the best-case latency shouldn't be a problem.

> [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
>
> gcc/ChangeLog:
>
> * config/loongarch/loongarch-def.cc
> (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
> default division cost to the average of the best case and worst
> case senarios observed.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/loongarch/div-const-reduction.c: New test.
> ---
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
>  gcc/config/loongarch/loongarch-def.cc| 8 
>  gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +
>  2 files changed, 13 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
>
> diff --git a/gcc/config/loongarch/loongarch-def.cc 
> b/gcc/config/loongarch/loongarch-def.cc
> index e8c129ce643..93e72a520d5 100644
> --- a/gcc/config/loongarch/loongarch-def.cc
> +++ b/gcc/config/loongarch/loongarch-def.cc
> @@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
>: fp_add (COSTS_N_INSNS (5)),
>  fp_mult_sf (COSTS_N_INSNS (5)),
>  fp_mult_df (COSTS_N_INSNS (5)),
> -fp_div_sf (COSTS_N_INSNS (8)),
> -fp_div_df (COSTS_N_INSNS (8)),
> +fp_div_sf (COSTS_N_INSNS (12)),
> +fp_div_df (COSTS_N_INSNS (15)),
>  int_mult_si (COSTS_N_INSNS (4)),
>  int_mult_di (COSTS_N_INSNS (4)),
> -int_div_si (COSTS_N_INSNS (5)),
> -int_div_di (COSTS_N_INSNS (5)),
> +int_div_si (COSTS_N_INSNS (14)),
> +int_div_di (COSTS_N_INSNS (22)),
>  movcf2gr (COSTS_N_INSNS (7)),
>  movgr2cf (COSTS_N_INSNS (15)),
>  branch_cost (6),
> diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c 
> b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
> new file mode 100644
> index 000..0ee86410dd7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=la464" } */
> +/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
> +
> +int
> +test (int a)
> +{
> +  return a % 17;
> +}
> --
> 2.44.0
>


[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]

2024-03-27 Thread HAO CHEN GUI
Hi,
  This patch folds builtin_isinf on IBM long double to builtin_isinf on
double type. The former patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html
implemented the DFmode isinf_optab.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double

For IBM long double, Inf is encoded in the high-order double value only.
So the builtin_isinf on IBM long double can be folded to builtin_isinf on
double type.  As former patch implemented DFmode isinf_optab, this patch
converts builtin_isinf on IBM long double to builtin_isinf on double type
if the DFmode isinf_optab exists.

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double
isinf call to double isinf call when DFmode isinf_optab exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-3.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eda8bea9c4b..d2786f207b8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9574,6 +9574,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree 
fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+   tree const isinf_fn = builtin_decl_explicit (BUILT_IN_ISINF);
+   if (interclass_mathfn_icode (arg, isinf_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isinf_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
new file mode 100644
index 000..1c816921e1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble 
-Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 2 } } */


Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener
On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov  wrote:
>
> lto-wrapper generates Makefiles that use the following:
> touch -r file file.tmp && mv file.tmp file
> to truncate files.
> If there is no suitable "touch" or "mv" available, then this errors with
> "The system cannot find the file specified".
>
> The solution here is to check if sh -c true works, before trying to use it in
> the Makefile. If it doesn't, then fall back to "copy /y nul file" instead.
> The fallback doesn't work exactly the same (the modified time of the file gets
> updated), but this doesn't seem to matter in practice.

I suppose it doesn't matter as we (no longer?) have the input as dependency
on the rule so make doesn't get confused to re-build it.  I guess we
only truncate
the file because it's going to be deleted by another process.

Instead of doing sth like sh_exists I would suggest to simply use #ifdef __WIN
or something like that?  Not sure if we have suitable macros to identify the
host operating system though and whether mingw, cygwin, etc. behave the same
here.

As a stop-gap solution doing

  ( @-touch -r ... ) || true

might also work?  Or another way to note to make the command can fail
without causing a problem.

Another way would be to have a portable solution to truncate a file
(maybe even removing it would work).  I don't think we should override
SHELL.

Richard.

> I tested this both in environments both with and without sh present, and
> observed no issues.
>
> Signed-off-by: Peter Damianov 
> ---
>  gcc/lto-wrapper.cc | 35 ---
>  1 file changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index 5186d040ce0..8dee0aaa2d8 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -1389,6 +1389,27 @@ make_exists (void)
>return errmsg == NULL && exit_status == 0 && err == 0;
>  }
>
> +/* Test that an sh command is present and working, return true if so.
> +   This is only relevant for windows hosts, where a /bin/sh shell cannot
> +   be assumed to exist. */
> +
> +static bool
> +sh_exists (void)
> +{
> +  const char *sh = "sh";
> +  const char *sh_args[] = {sh, "-c", "true", NULL};
> +#ifdef _WIN32
> +  int exit_status = 0;
> +  int err = 0;
> +  const char *errmsg
> += pex_one (PEX_SEARCH, sh_args[0], CONST_CAST (char **, sh_args),
> +  "sh", NULL, NULL, _status, );
> +  return errmsg == NULL && exit_status == 0 && err == 0;
> +#else
> +  return true;
> +#endif
> +}
> +
>  /* Execute gcc. ARGC is the number of arguments. ARGV contains the 
> arguments. */
>
>  static void
> @@ -1402,6 +1423,7 @@ run_gcc (unsigned argc, char *argv[])
>const char *collect_gcc;
>char *collect_gcc_options;
>int parallel = 0;
> +  bool have_sh = sh_exists ();
>int jobserver = 0;
>bool jobserver_requested = false;
>int auto_parallel = 0;
> @@ -2016,6 +2038,7 @@ cont:
>   argv_ptr[5] = NULL;
>   if (parallel)
> {
> + fprintf (mstream, "SHELL=%s\n", have_sh ? "sh" : "cmd");
>   fprintf (mstream, "%s:\n\t@%s ", output_name, new_argv[0]);
>   for (j = 1; new_argv[j] != NULL; ++j)
> fprintf (mstream, " '%s'", new_argv[j]);
> @@ -2024,9 +2047,15 @@ cont:
>  truncate them as soon as we have processed it.  This
>  reduces temporary disk-space usage.  */
>   if (! save_temps)
> -   fprintf (mstream, "\t@-touch -r \"%s\" \"%s.tem\" > /dev/null 
> "
> -"2>&1 && mv \"%s.tem\" \"%s\"\n",
> -input_name, input_name, input_name, input_name);
> +   {
> + fprintf (mstream,
> +  have_sh
> +  ? "\t@-touch -r \"%s\" \"%s.tem\" > /dev/null "
> +"2>&1 && mv \"%s.tem\" \"%s\"\n"
> +  : "\t@-copy /y nul \"%s\" > NUL "
> +"2>&1\n",
> +  input_name, input_name, input_name, input_name);
> +   }
> }
>   else
> {
> --
> 2.39.2
>


[PATCH] s390: avoid peeking eof after __vector

2024-03-27 Thread Jiufu Guo
Hi,

Same like PR101168, this patch is need for s390 to
avoid peeking eof after vector keyword.
And similar test case is also ok for s390.

Is this ok for trunk?

Jeff (Jiufu Guo)

PR target/95782

gcc/ChangeLog:

* config/s390/s390-c.cc (s390_macro_to_expand): Avoid empty identifier.

gcc/testsuite/ChangeLog:

* g++.target/s390/pr95782.C: New test.

---
 gcc/config/s390/s390-c.cc   | 4 +++-
 gcc/testsuite/g++.target/s390/pr95782.C | 5 +
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/s390/pr95782.C

diff --git a/gcc/config/s390/s390-c.cc b/gcc/config/s390/s390-c.cc
index 8d3d1a467a8..45f164d978b 100644
--- a/gcc/config/s390/s390-c.cc
+++ b/gcc/config/s390/s390-c.cc
@@ -275,7 +275,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token 
*tok)
   /* __vector long __bool a; */
   if (ident == C_CPP_HASHNODE (__bool_keyword))
expand_bool_p = true;
-  else
+
+  /* If there are more tokens to check.  */
+  else if (ident)
{
  /* Triggered with: __vector long long __bool a; */
  do
diff --git a/gcc/testsuite/g++.target/s390/pr95782.C 
b/gcc/testsuite/g++.target/s390/pr95782.C
new file mode 100644
index 000..daf887fc6fe
--- /dev/null
+++ b/gcc/testsuite/g++.target/s390/pr95782.C
@@ -0,0 +1,5 @@
+// { dg-do compile }
+// { dg-options "-march=z14 -mzvector" }
+
+using vdbl =  __vector double;
+#define BREAK 1
-- 
2.25.1