date:20230626

The manual already describes `asm goto` as implicitly volatile, but that
was written when `asm goto` could not have outputs. When outputs were
added to `asm goto`, only an `asm goto` without outputs was still
marked as volatile. Some parts of GCC now decide that removing an
`asm goto` is OK if its output is unused, but they do not update the CFG
(this happens at both the RTL level and the gimple level). Since the
biggest user of `asm goto` is the Linux kernel, which expects these
statements to be volatile (it uses them to copy to/from userspace), we
should just mark the inline asm as volatile.
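
As a rough illustration of the rule described above, the volatility decision can be modelled as a small standalone predicate. This is only a sketch: `asm_is_volatile` and its parameters are hypothetical names chosen for the example, not GCC internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the volatility rule; this is not GCC's code.
   An asm statement is treated as volatile if the user marked it
   volatile, if it has no outputs, or if it has labels (asm goto).  */
static bool asm_is_volatile(bool user_volatile, int noutputs, int nlabels)
{
    assert(noutputs >= 0 && nlabels >= 0);
    return user_volatile || noutputs == 0 || nlabels > 0;
}
```

Under this model, an `asm goto` with one output and no explicit `volatile` (`asm_is_volatile(false, 1, 1)`) is volatile, while a plain asm with one output (`asm_is_volatile(false, 1, 0)`) is not.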

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/110420
PR middle-end/103979
PR middle-end/98619

gcc/ChangeLog:

* gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/asmgoto-6.c: New test.
---
 gcc/gimplify.cc   |  7 -
 .../gcc.c-torture/compile/asmgoto-6.c | 26 +++
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 0e24b915b8f..dc6a00e8bd9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -6935,7 +6935,12 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
   stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
   inputs, outputs, clobbers, labels);
 
-  gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
+  /* asm is volatile if it was marked by the user as volatile or
+there are no outputs or this is an asm goto.  */
+  gimple_asm_set_volatile (stmt,
+  ASM_VOLATILE_P (expr)
+  || noutputs == 0
+  || labels);
   gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
   gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
new file mode 100644
index 000..0652bd4e4e1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
@@ -0,0 +1,26 @@
+
+/* { dg-do compile } */
+/* PR middle-end/110420 */
+/* PR middle-end/103979 */
+/* PR middle-end/98619 */
+/* Test that the middle-end does not remove the asm goto
+   with an output. */
+
+static int t;
+void g(void);
+
+void f(void)
+{
+  int  __gu_val;
+  asm goto("#my asm "
+ : "="(__gu_val)
+ :
+ :
+ : Efault);
+  t = __gu_val;
+  g();
+Efault:
+}
+
+/* Make sure "my asm " is still in the assembly. */
+/* { dg-final { scan-assembler "my asm " } } */
-- 
2.31.1

[PATCH] Fix __builtin_alloca_with_align_and_max defbuiltin usage

There is a missing space between the return type and the name,
which causes the name to be omitted from the HTML docs.

Committed as obvious after building html docs.

gcc/ChangeLog:

* doc/extend.texi (__builtin_alloca_with_align_and_max): Fix
defbuiltin usage.
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c01cd3fe90c..53a1b12f88a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13731,7 +13731,7 @@ an extension.  @xref{Variable Length}, for details.
 
 @enddefbuiltin
 
-@defbuiltin{{void *}__builtin_alloca_with_align_and_max (size_t size, size_t alignment, size_t max_size)}
+@defbuiltin{{void *} __builtin_alloca_with_align_and_max (size_t size, size_t alignment, size_t max_size)}
 Similar to @code{__builtin_alloca_with_align} but takes an extra argument
 specifying an upper bound for @var{size} in case its value cannot be computed
 at compile time, for use by @option{-fstack-usage}, @option{-Wstack-usage}
-- 
2.31.1

RE: [PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0

Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Jeff Law via Gcc-patches
Sent: Tuesday, June 27, 2023 7:50 AM
To: juzhe.zhong
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0



On 6/26/23 17:36, juzhe.zhong wrote:
> Yes. I found the “return” is redundant so I removed it.
OK.  Just wanted to be sure.

OK for the trunk.
jeff

Re: [PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0





On 6/26/23 17:36, juzhe.zhong wrote:
> Yes. I found the “return” is redundant so I removed it.
OK.  Just wanted to be sure.

OK for the trunk.
jeff

Re: [PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

On Mon, Jun 26, 2023 at 12:41 PM Andrew Pinski  wrote:
>
> On Mon, Jun 26, 2023 at 11:49 AM Andrew Pinski  wrote:
> >
> > On Mon, Jun 26, 2023 at 9:13 AM Andrew Pinski  wrote:
> > >
> > > On Sun, Jun 25, 2023 at 10:59 PM Jan-Benedict Glaw  
> > > wrote:
> > > >
> > > > Hi Andrew,
> > > >
> > > > On Fri, 2023-05-05 08:17:19 -0700, Andrew Pinski via Gcc-patches 
> > > >  wrote:
> > > > > While looking into a different issue, I noticed that it
> > > > > would take until the second forwprop pass to do some
> > > > > forward proping and it was because the ssa name was
> > > > > used more than once but the second statement was
> > > > > "dead" and we don't remove that until much later.
> > > > [...]
> > > > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > > >
> > > > Since this patch, I see a bit of fallout building the Linux kernel
> > > > using the adder875_defconfig:
> > > >
> > > > # CC  arch/powerpc/kernel/ptrace/ptrace-view.o
> > > >   powerpc-linux-gcc 
> > > > -Wp,-MMD,arch/powerpc/kernel/ptrace/.ptrace-view.o.d -nostdinc 
> > > > -I./arch/powerpc/include -I./arch/powerpc/include/generated  
> > > > -I./include -I./arch/powerpc/include/uapi 
> > > > -I./arch/powerpc/include/generated/uapi -I./include/uapi 
> > > > -I./include/generated/uapi -include ./include/linux/compiler-version.h 
> > > > -include ./include/linux/kconfig.h -include 
> > > > ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> > > > -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes 
> > > > -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE 
> > > > -Werror=implicit-function-declaration -Werror=implicit-int 
> > > > -Werror=return-type -Wno-format-security -funsigned-char -std=gnu11 
> > > > -mbig-endian -m32 -msoft-float -pipe -ffixed-r2 -mmultiple 
> > > > -mno-readonly-in-sdata -mcpu=860 -mno-prefixed -mno-pcrel -mno-altivec 
> > > > -mno-vsx -mno-mma -fno-asynchronous-unwind-tables -mno-string 
> > > > -mbig-endian -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> > > > -fno-delete-null-pointer-checks -Wno-frame-address 
> > > > -Wno-format-truncation -Wno-format-overflow 
> > > > -Wno-address-of-packed-member -O2 -fno-allow-store-data-races 
> > > > -Wframe-larger-than=1024 -fstack-protector-strong -Wno-main 
> > > > -Wno-unused-but-set-variable -Wno-unused-const-variable 
> > > > -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero 
> > > > -fno-stack-clash-protection -Wdeclaration-after-statement -Wvla 
> > > > -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation 
> > > > -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized 
> > > > -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 
> > > > -fno-strict-overflow -fno-stack-check -fconserve-stack 
> > > > -Werror=date-time -Werror=incompatible-pointer-types 
> > > > -Werror=designated-init -Wno-packed-not-aligned -g 
> > > > -mstack-protector-guard-offset=544 -Werror -DUTS_MACHINE='"ppc"'
> > > > -DKBUILD_MODFILE='"arch/powerpc/kernel/ptrace/ptrace-view"' 
> > > > -DKBUILD_BASENAME='"ptrace_view"' -DKBUILD_MODNAME='"ptrace_view"' 
> > > > -D__KBUILD_MODNAME=kmod_ptrace_view -c -o 
> > > > arch/powerpc/kernel/ptrace/ptrace-view.o 
> > > > arch/powerpc/kernel/ptrace/ptrace-view.c
> > > > during GIMPLE pass: pre
> > > > arch/powerpc/kernel/ptrace/ptrace-view.c: In function 
> > > > 'gpr32_set_common':
> > > > arch/powerpc/kernel/ptrace/ptrace-view.c:649:5: internal compiler 
> > > > error: in gimple_redirect_edge_and_branch, at tree-cfg.cc:6262
> > > >   649 | int gpr32_set_common(struct task_struct *target,
> > > >   | ^~~~
> > > > 0x1a562a6 internal_error(char const*, ...)
> > > >  ???:0
> > > > 0x826ea1 fancy_abort(char const*, int, char const*)
> > > >  ???:0
> > > > 0x9b77c9 redirect_edge_and_branch(edge_def*, basic_block_def*)
> > > >  ???:0
> > > > 0x9b7e43 split_edge(edge_def*)
> > > >  ???:0
> > > > 0xee1cc7 split_critical_edges(bool)
> > > >  ???:0
> > > > Please submit a full bug report, with preprocessed source (by using 
> > > > -freport-bug).
> > > > Please include the complete backtrace with any bug report.
> > > > See  for instructions.
> > > > make[4]: *** [scripts/Makefile.build:252: 
> > > > arch/powerpc/kernel/ptrace/ptrace-view.o] Error 1
> > > > make[3]: *** [scripts/Makefile.build:494: arch/powerpc/kernel/ptrace] 
> > > > Error 2
> > > > make[2]: *** [scripts/Makefile.build:494: arch/powerpc/kernel] Error 2
> > > > make[1]: *** [scripts/Makefile.build:494: arch/powerpc] Error 2
> > > > make: *** [Makefile:2026: .] Error 2
> > >
> > > Can you file a bug (https://gcc.gnu.org/bugzilla/) with the
> > > preprocessed source (which -freport-bug will provide). In the meantime
> > > I will try to reproduce it and see what is going on.
> >
> > Note I am suspecting it is related to GCC PR 103979 . I am still
> > trying to reproduce the ICE.
>
> I am 99% sure it is more

Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.





On 6/26/23 08:26, Robin Dapp via Gcc-patches wrote:
> Hi,
>
> this patch changes TYPE_MODE into element_mode in a match.pd
> simplification.  As the simplification can be called with vector types
> real_can_shorten_arithmetic would ICE in REAL_MODE_FORMAT which
> expects a scalar mode.  Therefore, use element_mode instead of
> TYPE_MODE.
>
> Additionally, check if the target supports the resulting operation in the
> new mode.  One target that supports e.g. a float addition but not a
> _Float16 addition is the RISC-V vector Float16 extension Zvfhmin.
>
> Bootstrap on x86_64 succeeded, testsuite is currently running.  Is this OK
> if the testsuite is clean?
>
> Regards
>   Robin
>
> gcc/ChangeLog:
>
> * match.pd: Use element_mode and check if target supports
> operation with new type.
Ugh.  What a mess -- not helped by the fact the code is poorly indented.
Indention would lead one to believe this is guarded by the integral
type test.

> +  && target_supports_op_p (newtype, op, optab_default)
FWIW, I think there are other targets that support various 16bit FP
modes, but not the full complement of operations.  It was fairly common
when the 16bit FP modes were first landing.

OK after the usual bootstrap & testing.

jeff

Re: [PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0





On 6/26/23 06:18, Juzhe-Zhong wrote:
> Currently, we are able to generate a step vector with base == 0:
>   { 0, 0, 2, 2, 4, 4, ... }
>
> ASM:
>
> vid
> vand
>
> However, we generate wrong code for a step vector with base != 0:
> { 1, 1, 3, 3, 5, 5, ... }
>
> Before this patch, such a case fails at run time.
>
> After this patch, the testcase passes and the step vector is generated
> with asm:
>
> vid
> vand
> vadd
>
> gcc/ChangeLog:
>
>  * config/riscv/riscv-v.cc (expand_const_vector): Fix stepped vector
> with base != 0.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/riscv/rvv/autovec/partial/slp-17.c: New test.
>  * gcc.target/riscv/rvv/autovec/partial/slp-18.c: New test.
>  * gcc.target/riscv/rvv/autovec/partial/slp-19.c: New test.
>  * gcc.target/riscv/rvv/autovec/partial/slp_run-17.c: New test.
>  * gcc.target/riscv/rvv/autovec/partial/slp_run-18.c: New test.
>  * gcc.target/riscv/rvv/autovec/partial/slp_run-19.c: New test.
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 5518394be1e..cd3422bf711 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -1258,7 +1258,6 @@ expand_const_vector (rtx target, rtx src)
> }
>   emit_move_insn (target, tmp);
> }
> -  return;
>   }
> else if (CONST_VECTOR_STEPPED_P (src))
>   {
Was removal of the "return" intentional here?  I'm not real familiar 
with this code, but it doesn't look related to the case you're trying to 
fix.


The rest of the code looks quite sensible.

Jeff

Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()





On 6/26/23 11:21, Jan Hubicka wrote:
> Hi,
> playing with testcases for path isolation and const function, I noticed
> that we do not seem to even try to isolate out of range array accesses:
> int a[3]={0,1,2};
> test(int i)
> {
> if (i > 3)
>   return test2(a[i]);
> return a[i];
> }
>
> Here call to test2 is dead, since a[i] will access memory past of the
> array.  We produce a warning:
>
> t.c:5:24: warning: array subscript 4 is above array bounds of ‘int[3]’ [-Warray-bounds=]
>
> but we still keep the call:
My recollection is that we'd planned to have those cases call into the
isolate paths code, but it may not have moved forward -- I lost track of
that work when I left Red Hat.  I don't think Martin S. is doing GCC
work anymore, so we'll probably need to update things ourselves.
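
For illustration, isolating the out-of-range path in the first testcase would amount, at source level, to something like the hand-written C below. This is a sketch under assumptions, not compiler output: `test_isolated` is a hypothetical name, and `abort ()` stands in for whatever trap the compiler would emit on the isolated path.

```c
#include <stdlib.h>

static const int a[3] = {0, 1, 2};

/* Hand-written equivalent of the testcase once the i > 3 path has been
   isolated.  Reading a[i] on that path is undefined, so the call to
   test2 is dropped and the path ends in a trap instead.  */
static int test_isolated(int i)
{
    if (i > 3)
        abort();    /* stand-in for the trap on the isolated path */
    return a[i];    /* still only valid for i in 0..2 */
}
```

Note that the remaining `return a[i]` is unchanged, so the function is only meaningful for in-range indices, exactly as in the original testcase.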






> Curiously adjusting the testcase:
>
> const int a[3]={0,1,2};
> test(int i)
> {
>  if (i == 3)
>  return test2(a[i]);
>  return a[i];
I would guess that we cprop a[i] into a[3] at which point the oob
reference is painfully obvious and something cleans that up, likely
before we even get to isolate-paths.



Jeff

Re: [PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

2023-06-26 Thread Jan-Benedict Glaw

Hi Andrew,

On Mon, 2023-06-26 09:13:51 -0700, Andrew Pinski  wrote:
> On Sun, Jun 25, 2023 at 10:59 PM Jan-Benedict Glaw  wrote:
> > On Fri, 2023-05-05 08:17:19 -0700, Andrew Pinski via Gcc-patches 
> >  wrote:
> > > While looking into a different issue, I noticed that it
> > > would take until the second forwprop pass to do some
> > > forward proping and it was because the ssa name was
> > > used more than once but the second statement was
> > > "dead" and we don't remove that until much later.
> > [...]
> > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > Since this patch, I see a bit of fallout building the Linux kernel
> > using the adder875_defconfig:
> >
> > # CC  arch/powerpc/kernel/ptrace/ptrace-view.o
[...]
> Can you file a bug (https://gcc.gnu.org/bugzilla/) with the
> preprocessed source (which -freport-bug will provide). In the meantime
> I will try to reproduce it and see what is going on.

Here it is:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

Once again an asm goto.

MfG, JBG


[Committed] docs: Add @cindex for some attributes

While looking for the access attribute,
I tried to find it via the concept index but it was
missing. This patch fixes that and adds one for
interrupt/interrupt_handler too.

Committed as obvious after building the HTML docs
and looking at the resulting concept index page.

gcc/ChangeLog:

* doc/extend.texi (access attribute): Add
cindex for it.
(interrupt/interrupt_handler attribute):
Likewise.
---
 gcc/doc/extend.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3040a9bdea6..05afd9ae3d9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2573,6 +2573,7 @@ The following attributes are supported on most targets.
 @table @code
 @c Keep this table alphabetized by attribute name.  Treat _ as space.
 
+@cindex @code{access} function attribute
 @item access (@var{access-mode}, @var{ref-index})
 @itemx access (@var{access-mode}, @var{ref-index}, @var{size-index})
 
@@ -3339,6 +3340,8 @@ int S::interface (int) __attribute__ ((ifunc ("_ZN1S8resolverEv")));
 Indirect functions cannot be weak.  Binutils version 2.20.1 or higher
 and GNU C Library version 2.11.1 are required to use this feature.
 
+@cindex @code{interrupt_handler} function attribute
+@cindex @code{interrupt} function attribute
 @item interrupt
 @itemx interrupt_handler
 Many GCC back ends support attributes to indicate that a function is
-- 
2.31.1

Re: [PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0

Ping. This patch is a simple fix here. Ok for trunk ?



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-06-26 20:18
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0
Currently, we are able to generate a step vector with base == 0:
{ 0, 0, 2, 2, 4, 4, ... }

ASM:

vid
vand

However, we generate wrong code for a step vector with base != 0:
{ 1, 1, 3, 3, 5, 5, ... }

Before this patch, such a case fails at run time.

After this patch, the testcase passes and the step vector is generated
with asm:
 
vid
vand
vadd
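
The vid / vand / vadd sequence above can be modelled element by element in scalar C. This is a minimal sketch, assuming the mask is `-npatterns` (as in the patch) and that `npatterns` is a power of two; `step_vector` and its parameter names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the vid / vand / vadd sequence.  npatterns must be a
   power of two so that -npatterns acts as a mask clearing the low bits.  */
static void step_vector(int64_t *out, size_t n, int64_t npatterns, int64_t base)
{
    assert(npatterns > 0 && (npatterns & (npatterns - 1)) == 0);
    for (size_t i = 0; i < n; i++) {
        int64_t vid = (int64_t) i;        /* vid: element index 0, 1, 2, ... */
        int64_t tmp = vid & -npatterns;   /* vand: 0, 0, 2, 2, 4, 4, ... for npatterns == 2 */
        out[i] = tmp + base;              /* vadd: add the first element; a no-op when base == 0 */
    }
}
```

With `npatterns == 2`, a base of 0 gives { 0, 0, 2, 2, 4, 4 } and a base of 1 gives { 1, 1, 3, 3, 5, 5 }, matching the two vectors in the description.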
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector): Fix stepped vector 
with base != 0.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-19.c: New test.
 
---
gcc/config/riscv/riscv-v.cc   | 14 +++-
.../riscv/rvv/autovec/partial/slp-17.c| 34 
.../riscv/rvv/autovec/partial/slp-18.c| 26 ++
.../riscv/rvv/autovec/partial/slp-19.c| 26 ++
.../riscv/rvv/autovec/partial/slp_run-17.c| 84 +++
.../riscv/rvv/autovec/partial/slp_run-18.c| 69 +++
.../riscv/rvv/autovec/partial/slp_run-19.c| 69 +++
7 files changed, 320 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-18.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-19.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-17.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-18.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-19.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5518394be1e..cd3422bf711 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1258,7 +1258,6 @@ expand_const_vector (rtx target, rtx src)
}
  emit_move_insn (target, tmp);
}
-  return;
 }
   else if (CONST_VECTOR_STEPPED_P (src))
 {
@@ -1287,9 +1286,20 @@ expand_const_vector (rtx target, rtx src)
  */
  rtx imm
= gen_int_mode (-builder.npatterns (), builder.inner_mode ());
-   rtx and_ops[] = {target, vid, imm};
+   rtx tmp = gen_reg_rtx (builder.mode ());
+   rtx and_ops[] = {tmp, vid, imm};
  icode = code_for_pred_scalar (AND, builder.mode ());
  emit_vlmax_insn (icode, RVV_BINOP, and_ops);
+   HOST_WIDE_INT init_val = INTVAL (builder.elt (0));
+   if (init_val == 0)
+ emit_move_insn (target, tmp);
+   else
+ {
+   rtx dup = gen_const_vector_dup (builder.mode (), init_val);
+   rtx add_ops[] = {target, tmp, dup};
+   icode = code_for_pred (PLUS, builder.mode ());
+   emit_vlmax_insn (icode, RVV_BINOP, add_ops);
+ }
}
  else
{
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c
new file mode 100644
index 000..2f2c3d11c2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include 
+
+void
+f (uint8_t *restrict a, uint8_t *restrict b,
+   uint8_t *restrict c, uint8_t *restrict d,
+   int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  a[i * 8] = c[i * 8] + d[i * 8];
+  a[i * 8 + 1] = c[i * 8] + d[i * 8 + 1];
+  a[i * 8 + 2] = c[i * 8 + 2] + d[i * 8 + 2];
+  a[i * 8 + 3] = c[i * 8 + 2] + d[i * 8 + 3];
+  a[i * 8 + 4] = c[i * 8 + 4] + d[i * 8 + 4];
+  a[i * 8 + 5] = c[i * 8 + 4] + d[i * 8 + 5];
+  a[i * 8 + 6] = c[i * 8 + 6] + d[i * 8 + 6];
+  a[i * 8 + 7] = c[i * 8 + 6] + d[i * 8 + 7];
+  b[i * 8] = c[i * 8 + 1] + d[i * 8];
+  b[i * 8 + 1] = c[i * 8 + 1] + d[i * 8 + 1];
+  b[i * 8 + 2] = c[i * 8 + 3] + d[i * 8 + 2];
+  b[i * 8 + 3] = c[i * 8 + 3] + d[i * 8 + 3];
+  b[i * 8 + 4] = c[i * 8 + 5] + d[i * 8 + 4];
+  b[i * 8 + 5] = c[i * 8 + 5] + d[i * 8 + 5];
+  b[i * 8 + 6] = c[i * 8 + 7] + d[i * 8 + 6];
+  b[i * 8 + 7] = c[i * 8 + 7] + d[i * 8 + 7];
+}
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 2 "optimized" } } */
+/* { dg-final { scan-assembler {\tvid\.v} } } */
+/* { dg-final { scan-assembler-not {\tvmul} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-18.c

Re: [PATCH] RISC-V: Add autovect widening/narrowing Integer/FP conversions.





On 6/26/23 12:59, Robin Dapp wrote:

Hi,

this patch implements widening and narrowing float-to-int and
int-to-float autovec conversions and adds tests.
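
As a rough picture of what "widening" and "narrowing" conversions mean here, the scalar loops below convert between element types of different widths. They are illustrative only: the function names and the specific type pairs are assumptions made for the example, not the testcases added by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening int-to-float: 32-bit integer source, 64-bit FP destination.  */
static void widen_itof(double *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (double) src[i];
}

/* Narrowing float-to-int: 64-bit FP source, 32-bit integer destination.
   The C conversion truncates toward zero; values must fit in int32_t.  */
static void narrow_ftoi(int32_t *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int32_t) src[i];
}
```

In both loops the destination element is twice or half the width of the source element, which is the case a same-width conversion cannot cover.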

Regards
  Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): New
expander.
(2): Dito.
(2): Dito.
(2): Dito.
* config/riscv/vector-iterators.md: Add vnconvert.

OK.
Jeff

Re: [PATCH] RISC-V: Add autovect widening/narrowing Integer/FP conversions.

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-27 02:59
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add autovect widening/narrowing Integer/FP conversions.
Hi,
 
this patch implements widening and narrowing float-to-int and
int-to-float autovec conversions and adds tests.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): New
expander.
(2): Dito.
(2): Dito.
(2): Dito.
* config/riscv/vector-iterators.md: Add vnconvert.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c: New test.
---
gcc/config/riscv/autovec.md   | 76 +++
gcc/config/riscv/vector-iterators.md  |  8 ++
.../rvv/autovec/conversions/vfncvt-ftoi-run.c | 96 +++
.../autovec/conversions/vfncvt-ftoi-rv32gcv.c |  7 ++
.../autovec/conversions/vfncvt-ftoi-rv64gcv.c |  7 ++
.../conversions/vfncvt-ftoi-template.h| 19 
.../conversions/vfncvt-ftoi-zvfh-run.c| 42 
.../rvv/autovec/conversions/vfncvt-itof-run.c | 52 ++
.../autovec/conversions/vfncvt-itof-rv32gcv.c |  7 ++
.../autovec/conversions/vfncvt-itof-rv64gcv.c |  7 ++
.../conversions/vfncvt-itof-template.h| 18 
.../conversions/vfncvt-itof-zvfh-run.c| 64 +
.../rvv/autovec/conversions/vfwcvt-ftoi-run.c | 64 +
.../autovec/conversions/vfwcvt-ftoi-rv32gcv.c |  7 ++
.../autovec/conversions/vfwcvt-ftoi-rv64gcv.c |  7 ++
.../conversions/vfwcvt-ftoi-template.h| 17 
.../conversions/vfwcvt-ftoi-zvfh-run.c| 64 +
.../rvv/autovec/conversions/vfwcvt-itof-run.c | 96 +++
.../autovec/conversions/vfwcvt-itof-rv32gcv.c |  7 ++
.../autovec/conversions/vfwcvt-itof-rv64gcv.c |  7 ++
.../conversions/vfwcvt-itof-template.h| 20 
.../conversions/vfwcvt-itof-zvfh-run.c| 45 +
22 files changed, 737 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c
create mode 100644

Re: [PATCH] RISC-V: Add autovec FP widening/narrowing.





On 6/26/23 12:58, Robin Dapp wrote:
> Hi,
>
> this patch adds FP widening and narrowing autovec expanders as well as
> tests.  Conceptually similar to integer extension/truncation, we emulate
> _Float16 -> double by two vfwcvts and double -> _Float16 by two vfncvts.
>
> Optimizations to create widening operations will be added separately.
>
> Regards
>   Robin
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (extend2): New
> expander.
> (extend2): Dito.
> (trunc2): Dito.
> (trunc2): Dito.
> * config/riscv/vector-iterators.md: Add VQEXTF and HF to
> V_QUAD_TRUNC and v_quad_trunc.
It looks like you fixed the type of trunc2 to be a 
narrowing vector shift from just a vector shift.  This isn't reflected 
in the ChangeLog.


I wasn't aware that "Dito" was an accepted spelling.  I had to look it 
up ;-)


OK with the ChangeLog fixed.

jeff

Re: [PATCH] RISC-V: Add autovec FP widening/narrowing.


A comment here:

-  [(set_attr "type" "vshift")
+  [(set_attr "type" "vnshift")

You should drop this change, otherwise LGTM.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-27 02:58
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add autovec FP widening/narrowing.
Hi,
 
this patch adds FP widening and narrowing autovec expanders as well as
tests.  Conceptually similar to integer extension/truncation, we emulate
_Float16 -> double by two vfwcvts and double -> _Float16 by two vfncvts.
 
Optimizations to create widening operations will be added separately.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (extend2): New
expander.
(extend2): Dito.
(trunc2): Dito.
(trunc2): Dito.
* config/riscv/vector-iterators.md: Add VQEXTF and HF to
V_QUAD_TRUNC and v_quad_trunc.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c: New test.
---
gcc/config/riscv/autovec.md   | 90 ++-
gcc/config/riscv/vector-iterators.md  | 14 +++
.../rvv/autovec/conversions/vfncvt-run.c  | 33 +++
.../rvv/autovec/conversions/vfncvt-rv32gcv.c  |  7 ++
.../rvv/autovec/conversions/vfncvt-rv64gcv.c  |  7 ++
.../rvv/autovec/conversions/vfncvt-template.h | 16 
.../rvv/autovec/conversions/vfncvt-zvfh-run.c | 34 +++
.../rvv/autovec/conversions/vfwcvt-run.c  | 33 +++
.../rvv/autovec/conversions/vfwcvt-rv32gcv.c  |  6 ++
.../rvv/autovec/conversions/vfwcvt-rv64gcv.c  |  6 ++
.../rvv/autovec/conversions/vfwcvt-template.h | 16 
.../rvv/autovec/conversions/vfwcvt-zvfh-run.c | 34 +++
12 files changed, 293 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aee4574b8e1..5cc48f966aa 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,12 +162,12 @@ (define_insn_and_split "3"
   riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
riscv_vector::RVV_BINOP, operands);
   DONE;
-}  
+}
  [(set_attr "type" "vshift")
   (set_attr "mode" "")])
;; -
-;;  [INT] Binary shifts by scalar.
+;;  [INT] Binary shifts by vector.
;; -
;; Includes:
;; - vsll.vv/vsra.vv/vsrl.vv
@@ -416,7 +416,7 @@ (define_insn_and_split "trunc2"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
}
-  [(set_attr "type" "vshift")
+  [(set_attr "type" "vnshift")
(set_attr "mode" "")])
;; -
@@ -466,6 +466,90 @@ (define_expand "trunc2"
   DONE;
})
+;; -
+;;  [FP] Widening.
+;; -
+;; - vfwcvt.f.f.v
+;; -
+(define_insn_and_split "extend2"
+  [(set (match_operand:VWEXTF_ZVFHMIN 0 "register_operand" "=")
+(float_extend:VWEXTF_ZVFHMIN
+ (match_operand:  1 "register_operand" "  vr")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_extend (mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+}
+  [(set_attr

Re: Re: [PATCH] RISC-V: Add autovec FP int->float conversion.

LGTM too.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-27 05:50
To: Robin Dapp; gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
Subject: Re: [PATCH] RISC-V: Add autovec FP int->float conversion.
 
 
On 6/26/23 12:58, Robin Dapp wrote:
> Hi,
> 
> this patch adds the autovec expander for vfcvt.f.x.v and tests for it.
> In addition, it modifies the zfhmin-1 test so it doesn't scan for
> "no vectorization" but rather check that we do not emit any (RTL)
> vector operations (other than float/float conversions) with a
> VNx..HFmode.
> 
> Regards
>   Robin
> 
> gcc/ChangeLog:
> 
> * config/riscv/autovec.md (2): New
> expander.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: Adjust.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c:
> Ditto.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c:
> Ditto.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h:
> Ditto.
> * gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add int/float conversions.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c: New test.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv32gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv64gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-template.h: New test.
> * gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c: New test.
OK.
 
jeff

Re: [PATCH] RISC-V: Add autovec FP int->float conversion.





On 6/26/23 12:58, Robin Dapp wrote:

Hi,

this patch adds the autovec expander for vfcvt.f.x.v and tests for it.
In addition, it modifies the zfhmin-1 test so it doesn't scan for
"no vectorization" but rather check that we do not emit any (RTL)
vector operations (other than float/float conversions) with a
VNx..HFmode.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): New
expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: Adjust.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add int/float conversions.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-template.h: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c: New 
test.

OK.

jeff

Re: [PATCH 2/2] AArch64: New RTL for ABDL

2023-06-26 Thread Richard Sandiford via Gcc-patches

Oluwatamilore Adebayo  writes:
> From: oluade01 
>
> This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md
>   (vec_widen_abdl_lo_, vec_widen_abdl_hi_):
>   Expansions for abd vec widen optabs.
>   (aarch64_abdl_insn): VQW based abdl RTL.
>   * config/aarch64/iterators.md (USMAX_EXT): Code attributes
>   that give the appropriate extend RTL for the max RTL.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/abd_2.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_3.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_4.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
> ---
>  gcc/config/aarch64/aarch64-simd.md| 70 
>  gcc/config/aarch64/iterators.md   |  3 +
>  gcc/testsuite/gcc.target/aarch64/abd_2.c  | 62 --
>  gcc/testsuite/gcc.target/aarch64/abd_3.c  | 63 --
>  gcc/testsuite/gcc.target/aarch64/abd_4.c  | 47 +--
>  gcc/testsuite/gcc.target/aarch64/abd_none_2.c | 73 
>  gcc/testsuite/gcc.target/aarch64/abd_none_3.c | 73 
>  gcc/testsuite/gcc.target/aarch64/abd_none_4.c | 84 +++
>  gcc/testsuite/gcc.target/aarch64/abd_run_1.c  | 29 +++
>  gcc/testsuite/gcc.target/aarch64/sve/abd_1.c  | 67 +--
>  gcc/testsuite/gcc.target/aarch64/sve/abd_2.c  | 53 ++--
>  .../gcc.target/aarch64/sve/abd_none_1.c   | 73 
>  .../gcc.target/aarch64/sve/abd_none_2.c   | 80 ++
>  13 files changed, 749 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> bf90202ba2ad3f62f2020486d21256f083effb07..36fefd0a96801479fbf6469a3fbcef4a0b8cad6f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -975,6 +975,76 @@ (define_expand "aarch64_abdl2"
>}
>  )
>  
> +(define_insn "aarch64_abdl_hi_internal"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (minus:
> +   (USMAX:
> + (:
> +   (vec_select:
> + (match_operand:VQW 1 "register_operand" "w")
> + (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
> + (:
> +   (vec_select:
> + (match_operand:VQW 2 "register_operand" "w")
> + (match_dup 3
> +   (:
> + (:
> +   (vec_select: (match_dup 1) (match_dup 3)))
> + (:
> +   (vec_select: (match_dup 2) (match_dup 3))]
> +  "TARGET_SIMD"
> +  "abdl2\t%0., %1., %2."
> +  [(set_attr "type" "neon_abd_long")]
> +)

We don't need the (minus (max…) (min…)) thing when widening is
involved.  It should be enough to do something like:

  (abs:
(minus:
  (ANY_EXTEND:
(vec_select:…))
  (ANY_EXTEND:
(vec_select:…

> +
> +(define_insn "aarch64_abdl_lo_internal"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (minus:
> +   (USMAX:
> + (: (vec_select:
> + (match_operand:VQW 1 "register_operand" "w")
> + (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
> + (: (vec_select:
> + (match_operand:VQW 2 "register_operand" "w")
> + (match_dup 3
> +   (:
> + (: (vec_select:
> + (match_dup 1)
> + (match_dup 3)))
> + (: (vec_select:
> + (match_dup 2)
> + (match_dup 3))]
> +  "TARGET_SIMD"
> +  "abdl\t%0., %1., %2."
> +  [(set_attr "type" "neon_abd_long")]
> +)
> +
> +(define_expand "vec_widen_abd_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
> +   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
> +  "TARGET_SIMD"
> +  {
> +rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> +emit_insn (gen_aarch64_abdl_hi_internal (operands[0], 
> operands[1],
> +operands[2], p));
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_widen_abd_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
> +   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
> +  "TARGET_SIMD"
> +  {
> +rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> +emit_insn (gen_aarch64_abdl_lo_internal (operands[0], 
> operands[1],
> +operands[2], p));
> +DONE;
> +  }
> +)
>

Re: [PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-26 Thread Richard Sandiford via Gcc-patches

Thanks for doing this.  Generally looks good, but some comments below.

Oluwatamilore Adebayo  writes:
> From: oluade01 
>
> This updates vect_recog_abd_pattern to recognize the widening
> variant of absolute difference (ABDL, ABDL2).
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (widening_fn_p, decomposes_to_hilo_fn_p):
>   Add IFN_VEC_WIDEN_ABD to the switch statement.
>   * internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
>   * optabs.def (vec_widen_sabd_optab,
>   vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
>   vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
>   vec_widen_uabd_optab,
>   vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
>   vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
>   New optabs.
>   * tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
>   build a VEC_WIDEN_ABD call if the input precision is smaller
>   than the precision of the output.
>   (vect_recog_widen_abd_pattern): Should an ABD expression be
>   found preceding an extension, replace the two with a
>   VEC_WIDEN_ABD.
> ---
>  gcc/internal-fn.def   |   5 ++
>  gcc/optabs.def|  10 +++
>  gcc/tree-vect-patterns.cc | 174 --
>  3 files changed, 161 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> 116965f4830cec8f60642ff011a86b6562e2c509..d67274d68b49943a88c531e903fd03b42343ab97
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -352,6 +352,11 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
>   first,
>   vec_widen_ssub, vec_widen_usub,
>   binary)
> +DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> + ECF_CONST | ECF_NOTHROW,
> + first,
> + vec_widen_sabd, vec_widen_uabd,
> + binary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 35b835a6ac56d72417dac8ddfd77a8a7e2475e65..68dfa1550f791a2fe833012157601ecfa68f1e09
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -418,6 +418,11 @@ OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
>  OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
>  OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
>  OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
> +OPTAB_D (vec_widen_sabd_optab, "vec_widen_sabd_$a")
> +OPTAB_D (vec_widen_sabd_hi_optab, "vec_widen_sabd_hi_$a")
> +OPTAB_D (vec_widen_sabd_lo_optab, "vec_widen_sabd_lo_$a")
> +OPTAB_D (vec_widen_sabd_odd_optab, "vec_widen_sabd_odd_$a")
> +OPTAB_D (vec_widen_sabd_even_optab, "vec_widen_sabd_even_$a")
>  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> @@ -436,6 +441,11 @@ OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
>  OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
>  OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
>  OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
> +OPTAB_D (vec_widen_uabd_optab, "vec_widen_uabd_$a")
> +OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
> +OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
> +OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
> +OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
>  OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
>  OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
>  OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")

The new optabs need to be documented in doc/md.texi.

> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> e2392113bff4065c909aefc760b4c48978b73a5a..852c4e99edb19d215c354b666b991c87c48620b4
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -1404,15 +1404,28 @@ vect_recog_sad_pattern (vec_info *vinfo,
>gcall *abd_stmt = dyn_cast  (abs_stmt_vinfo->stmt);
>if (!abd_stmt
> || !gimple_call_internal_p (abd_stmt)
> -   || gimple_call_internal_fn (abd_stmt) != IFN_ABD)
> +   || gimple_call_num_args (abd_stmt) != 2)
>   return NULL;
>  
>tree abd_oprnd0 = gimple_call_arg (abd_stmt, 0);
>tree abd_oprnd1 = gimple_call_arg (abd_stmt, 1);
>  
> -  if (!vect_look_through_possible_promotion (vinfo, abd_oprnd0, 
> [0])
> -   || !vect_look_through_possible_promotion (vinfo, abd_oprnd1,
> - [1]))
> +  if (gimple_call_internal_fn (abd_stmt) == IFN_ABD)
> + {
> +   if (!vect_look_through_possible_promotion (vinfo, abd_oprnd0,
> +

Re: [PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

On Mon, Jun 26, 2023 at 11:49 AM Andrew Pinski  wrote:
>
> On Mon, Jun 26, 2023 at 9:13 AM Andrew Pinski  wrote:
> >
> > On Sun, Jun 25, 2023 at 10:59 PM Jan-Benedict Glaw  
> > wrote:
> > >
> > > Hi Andrew,
> > >
> > > On Fri, 2023-05-05 08:17:19 -0700, Andrew Pinski via Gcc-patches 
> > >  wrote:
> > > > While looking into a different issue, I noticed that it
> > > > would take until the second forwprop pass to do some
> > > > forward proping and it was because the ssa name was
> > > > used more than once but the second statement was
> > > > "dead" and we don't remove that until much later.
> > > [...]
> > > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > Since this patch, I see a bit of fallout building the Linux kernel
> > > using the adder875_defconfig:
> > >
> > > # CC  arch/powerpc/kernel/ptrace/ptrace-view.o
> > >   powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/ptrace/.ptrace-view.o.d 
> > > -nostdinc -I./arch/powerpc/include -I./arch/powerpc/include/generated  
> > > -I./include -I./arch/powerpc/include/uapi 
> > > -I./arch/powerpc/include/generated/uapi -I./include/uapi 
> > > -I./include/generated/uapi -include ./include/linux/compiler-version.h 
> > > -include ./include/linux/kconfig.h -include 
> > > ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> > > -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes 
> > > -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE 
> > > -Werror=implicit-function-declaration -Werror=implicit-int 
> > > -Werror=return-type -Wno-format-security -funsigned-char -std=gnu11 
> > > -mbig-endian -m32 -msoft-float -pipe -ffixed-r2 -mmultiple 
> > > -mno-readonly-in-sdata -mcpu=860 -mno-prefixed -mno-pcrel -mno-altivec 
> > > -mno-vsx -mno-mma -fno-asynchronous-unwind-tables -mno-string 
> > > -mbig-endian -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> > > -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation 
> > > -Wno-format-overflow -Wno-address-of-packed-member -O2 
> > > -fno-allow-store-data-races -Wframe-larger-than=1024 
> > > -fstack-protector-strong -Wno-main -Wno-unused-but-set-variable 
> > > -Wno-unused-const-variable -Wno-dangling-pointer -fomit-frame-pointer 
> > > -ftrivial-auto-var-init=zero -fno-stack-clash-protection 
> > > -Wdeclaration-after-statement -Wvla -Wno-pointer-sign 
> > > -Wcast-function-type -Wno-stringop-truncation -Wno-stringop-overflow 
> > > -Wno-restrict -Wno-maybe-uninitialized -Wno-array-bounds 
> > > -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow 
> > > -fno-stack-check -fconserve-stack -Werror=date-time 
> > > -Werror=incompatible-pointer-types -Werror=designated-init 
> > > -Wno-packed-not-aligned -g -mstack-protector-guard-offset=544 -Werror 
> > > -DUTS_MACHINE='"ppc"'
> > > -DKBUILD_MODFILE='"arch/powerpc/kernel/ptrace/ptrace-view"' 
> > > -DKBUILD_BASENAME='"ptrace_view"' -DKBUILD_MODNAME='"ptrace_view"' 
> > > -D__KBUILD_MODNAME=kmod_ptrace_view -c -o 
> > > arch/powerpc/kernel/ptrace/ptrace-view.o 
> > > arch/powerpc/kernel/ptrace/ptrace-view.c
> > > during GIMPLE pass: pre
> > > arch/powerpc/kernel/ptrace/ptrace-view.c: In function 'gpr32_set_common':
> > > arch/powerpc/kernel/ptrace/ptrace-view.c:649:5: internal compiler error: 
> > > in gimple_redirect_edge_and_branch, at tree-cfg.cc:6262
> > >   649 | int gpr32_set_common(struct task_struct *target,
> > >   | ^~~~
> > > 0x1a562a6 internal_error(char const*, ...)
> > >  ???:0
> > > 0x826ea1 fancy_abort(char const*, int, char const*)
> > >  ???:0
> > > 0x9b77c9 redirect_edge_and_branch(edge_def*, basic_block_def*)
> > >  ???:0
> > > 0x9b7e43 split_edge(edge_def*)
> > >  ???:0
> > > 0xee1cc7 split_critical_edges(bool)
> > >  ???:0
> > > Please submit a full bug report, with preprocessed source (by using 
> > > -freport-bug).
> > > Please include the complete backtrace with any bug report.
> > > See  for instructions.
> > > make[4]: *** [scripts/Makefile.build:252: 
> > > arch/powerpc/kernel/ptrace/ptrace-view.o] Error 1
> > > make[3]: *** [scripts/Makefile.build:494: arch/powerpc/kernel/ptrace] 
> > > Error 2
> > > make[2]: *** [scripts/Makefile.build:494: arch/powerpc/kernel] Error 2
> > > make[1]: *** [scripts/Makefile.build:494: arch/powerpc] Error 2
> > > make: *** [Makefile:2026: .] Error 2
> >
> > Can you file a bug (https://gcc.gnu.org/bugzilla/) with the
> > preprocessed source (which -freport-bug will provide). In the meantime
> > I will try to reproduce it and see what is going on.
>
> Note I am suspecting it is related to GCC PR 103979 . I am still
> trying to reproduce the ICE.

I am 99% sure it is actually more closely related to PR 105165, where PRE
is now trying to insert on a critical edge and that is failing.
Basically, my patch removes some extra statements (unrelated to the asm
goto), and that makes the IR different enough that PRE now fails.
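
For readers not familiar with the term, here is a small sketch in C of what "critical edge" means. An edge is critical when its source block has more than one successor and its destination block has more than one predecessor, so code can only be inserted on it by splitting the edge, which is the split_critical_edges -> split_edge -> redirect_edge_and_branch path in the backtrace above. The types and names are a toy model, not GCC's edge or basic_block structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy CFG edge between basic-block indices; a hypothetical stand-in,
   not GCC's edge_def.  */
struct toy_edge
{
  int src;
  int dest;
};

/* Example CFG: block 0 branches to 1 or 2, and block 1 falls into 2.  */
static const struct toy_edge toy_cfg[] = { { 0, 1 }, { 0, 2 }, { 1, 2 } };
enum { TOY_CFG_EDGES = sizeof toy_cfg / sizeof toy_cfg[0] };

/* Number of edges leaving block BB.  */
static int
count_succs (const struct toy_edge *edges, size_t n, int bb)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    count += edges[i].src == bb;
  return count;
}

/* Number of edges entering block BB.  */
static int
count_preds (const struct toy_edge *edges, size_t n, int bb)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    count += edges[i].dest == bb;
  return count;
}

/* Edge I is critical when its source has several successors and its
   destination has several predecessors.  */
int
toy_edge_critical_p (const struct toy_edge *edges, size_t n, size_t i)
{
  assert (i < n);
  return (count_succs (edges, n, edges[i].src) > 1
          && count_preds (edges, n, edges[i].dest) > 1);
}
```

In the example CFG only the 0 -> 2 edge is critical: block 0 has two successors and block 2 has two predecessors.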

Thanks,

Re: [PATCH v2 1/3] c++: Track lifetimes in constant evaluation [PR70331, PR96630, PR98675]

2023-06-26 Thread Patrick Palka via Gcc-patches

On Sun, 25 Jun 2023, Nathaniel Shead wrote:

> On Fri, Jun 23, 2023 at 12:43:21PM -0400, Patrick Palka wrote:
> > On Wed, 29 Mar 2023, Nathaniel Shead via Gcc-patches wrote:
> > 
> > > This adds rudimentary lifetime tracking in C++ constexpr contexts,
> > > allowing the compiler to report errors with using values after their
> > > backing has gone out of scope. We don't yet handle other ways of ending
> > > lifetimes (e.g. explicit destructor calls).
> > 
> > Awesome!
> > 
> > > 
> > >   PR c++/96630
> > >   PR c++/98675
> > >   PR c++/70331
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * constexpr.cc (constexpr_global_ctx::put_value): Mark value as
> > >   in lifetime.
> > >   (constexpr_global_ctx::remove_value): Mark value as expired.
> > >   (cxx_eval_call_expression): Remove comment that is no longer
> > >   applicable.
> > >   (non_const_var_error): Add check for expired values.
> > >   (cxx_eval_constant_expression): Add checks for expired values. Forget
> > >   local variables at end of bind expressions. Forget temporaries at end
> > >   of cleanup points.
> > >   * cp-tree.h (struct lang_decl_base): New flag to track expired values
> > >   in constant evaluation.
> > >   (DECL_EXPIRED_P): Access the new flag.
> > >   (SET_DECL_EXPIRED_P): Modify the new flag.
> > >   * module.cc (trees_out::lang_decl_bools): Write out the new flag.
> > >   (trees_in::lang_decl_bools): Read in the new flag.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/cpp0x/constexpr-ice20.C: Update error raised by test.
> > >   * g++.dg/cpp1y/constexpr-lifetime1.C: New test.
> > >   * g++.dg/cpp1y/constexpr-lifetime2.C: New test.
> > >   * g++.dg/cpp1y/constexpr-lifetime3.C: New test.
> > >   * g++.dg/cpp1y/constexpr-lifetime4.C: New test.
> > >   * g++.dg/cpp1y/constexpr-lifetime5.C: New test.
> > > 
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >  gcc/cp/constexpr.cc   | 69 +++
> > >  gcc/cp/cp-tree.h  | 10 ++-
> > >  gcc/cp/module.cc  |  2 +
> > >  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  2 +-
> > >  .../g++.dg/cpp1y/constexpr-lifetime1.C| 13 
> > >  .../g++.dg/cpp1y/constexpr-lifetime2.C| 20 ++
> > >  .../g++.dg/cpp1y/constexpr-lifetime3.C| 13 
> > >  .../g++.dg/cpp1y/constexpr-lifetime4.C| 11 +++
> > >  .../g++.dg/cpp1y/constexpr-lifetime5.C| 11 +++
> > >  9 files changed, 137 insertions(+), 14 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
> > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
> > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
> > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
> > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
> > > 
> > > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > > index 3de60cfd0f8..bdbc12144a7 100644
> > > --- a/gcc/cp/constexpr.cc
> > > +++ b/gcc/cp/constexpr.cc
> > > @@ -1185,10 +1185,22 @@ public:
> > >void put_value (tree t, tree v)
> > >{
> > >  bool already_in_map = values.put (t, v);
> > > +if (!already_in_map && DECL_P (t))
> > > +  {
> > > + if (!DECL_LANG_SPECIFIC (t))
> > > +   retrofit_lang_decl (t);
> > > + if (DECL_LANG_SPECIFIC (t))
> > > +   SET_DECL_EXPIRED_P (t, false);
> > > +  }
> > 
> > > Since this new flag would be used only during constexpr evaluation,
> > could we instead use an on-the-side hash_set in constexpr_global_ctx for
> > tracking expired-ness?  That way we won't have to allocate a
> > DECL_LANG_SPECIFIC structure for decls that lack it, and won't have to
> > worry about the flag in other parts of the compiler.
> 
> I've tried this but I haven't been able to get it to work well. The main
> issue I'm running into is the caching of function calls in constant
> evaluation. For example, consider the following:
> 
> constexpr const double& test() {
>   const double& local = 3.0;
>   return local;
> }
> 
> constexpr int foo(const double&) { return 5; }
> 
> constexpr int a = foo(test());
> static_assert(test() == 3.0);
> 
> When constant-evaluating 'a', we evaluate 'test()'. It returns a value
> that ends its lifetime immediately, so we mark this in 'ctx->global' as
> expired. However, 'foo()' never actually evaluates this expired value,
> so the initialisation of 'a' succeeds.
> 
> However, then when the static assertion attempts to constant evaluate a
> second time, the result of 'test' has already been cached, and we just
> get directly handed a value. This is a new constant evaluation, so
> 'ctx->global' has been reset, and because we just got the result of the
> cached function we don't actually know whether this is expired or not
> anymore, and so this compiles without any error in case it was valid.

Ouch, good catch..
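
The interaction described above can be modeled with a very small C sketch. This is a toy model only, assuming a per-evaluation "expired" table and a call cache that outlives it; none of these names exist in constexpr.cc. The first evaluation runs the body and records the expiry, while the second evaluation gets a cache hit and never learns that the object has expired.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_NO_OBJECT = -1, TOY_LOCAL = 0, TOY_OBJECTS = 1 };

/* Per-evaluation state, reset for every new constant evaluation
   (stands in for the on-the-side set in the global context).  */
struct toy_eval_ctx
{
  bool expired[TOY_OBJECTS];
};

/* Call cache shared by all evaluations; it remembers only the result.  */
static int toy_cached_result = TOY_NO_OBJECT;

/* Model of evaluating test (): returns a reference to a local whose
   lifetime ends when the call returns.  */
static int
toy_eval_test (struct toy_eval_ctx *ctx)
{
  if (toy_cached_result != TOY_NO_OBJECT)
    return toy_cached_result;       /* Cache hit: the body is not re-run.  */
  ctx->expired[TOY_LOCAL] = true;   /* Body ran: the local went out of scope.  */
  toy_cached_result = TOY_LOCAL;
  return TOY_LOCAL;
}

/* Run one fresh evaluation and report whether the expiry is visible.  */
static bool
toy_evaluation_sees_expiry (void)
{
  struct toy_eval_ctx ctx = { { false } };
  int obj = toy_eval_test (&ctx);
  assert (obj == TOY_LOCAL);
  return ctx.expired[obj];
}

/* Returns true when the first evaluation sees the expiry but the
   second (cached) one does not, which is the problem case.  */
bool
toy_second_evaluation_misses_expiry (void)
{
  toy_cached_result = TOY_NO_OBJECT;
  bool first = toy_evaluation_sees_expiry ();
  bool second = toy_evaluation_sees_expiry ();
  return first && !second;
}
```

A flag stored on the object itself survives across evaluations, which is what the side table in this model cannot do.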

> 
> I haven't yet been able to come up with a good way

Re: [PATCH] Introduce hardbool attribute for C

2023-06-26 Thread Qing Zhao via Gcc-patches

Hi, Alexandre,

> On Jun 23, 2023, at 10:38 PM, Alexandre Oliva  wrote:
> 
>> For normal Boolean variables, 0x00 is false, this is a reasonable init
>> value with zero-initialization.
> 
> *nod*.  I was surprised by zero initialization of (non-hardened)
> booleans even when pattern is requested, but not consistently
> (e.g. boolean fields of a larger struct would still get
> pattern-initialized IIUC).  I'd have expected pattern would translate to
> nonzero and thus to true, rather than false.

Such inconsistent behavior was introduced by the following commit:

From c081d0a3b0291297f04a05c833d2ffa8de3a7a1a Mon Sep 17 00:00:00 2001
Subject: [PATCH] middle-end/103033 - drop native_interpret_expr with
 .DEFERRED_INIT expansion

> 
>> For hardbool variables, what 0x00 represents if it’s not false or true
>> value?
> 
> It depends on how hardbool is parameterized.  One may pick 0x00 or 0xFE
> as the representations for true or false, or neither, in which case the
> trivial initializer will end up as a trapping value.

Okay, then this looks like good behavior: it traps with
-ftrivial-auto-var-init most of the time (i.e., whenever neither 0x00 nor
0xFE was chosen as the representation for true or false), which will urge
the user to fix the uninitialized hardbool variables.
Am I missing anything here?
> 
>>> I'd probably have arranged for the front-end to create the initializer
>>> value, because expansion time is too late to figure it out: we may not
>>> even have the front-end at hand any more, in case of lto compilation.
> 
>> Is the hardbool attribute information available during the rtl expansion 
>> phase?
> 
> It is in the sense that the attribute lives on, but c_hardbool_type_attr
> is a frontend function, it cannot be called from e.g. lto1.
Does lookup_attribute work for this attribute during RTL expansion? (I am
still a little confused here.)
> 
> The hardbool attribute is also implemented in Ada, but there it only
> affects validity checking in the front end: Boolean types in Ada are
> Enumeration types, and there is standard syntax to specify the
> representations for true and false.  AFAICT, once we translate GNAT IR
> to GNU IR, hardened booleans would not be recognizable as boolean types.
> Even non-hardened booleans with representation clauses would.

So, right now, the GNU IR represents Ada’s boolean type as enumeration type? 

>  So
> handling these differently from other enumeration types, to make them
> closer to booleans, would be a bit of a challenge,

Is there any special handling in GNU IR when representing Ada’s boolean type as
an enumeration type?
Is there any issue right now?

> and a
> backwards-compatibility issue (because such booleans have already been
> handled in the present way since the introduction of -ftrivial-* back in
> GCC12)

With the new hardbool attribute added for C, an original bool type logically
becomes an enumeration type, but that information is not passed to the
middle end through GNU IR. So in the GCC middle end we still treat such a
type as boolean, not as an enumeration type.
Is this understanding correct?

> 
>>> Now, I acknowledge that the decision to make implicit
>>> zero-initialization of boolean types set them to the value for false,
>>> rather than to all-bits-zero representation, is a departure from common
>>> practice of zero-initialization yielding logical zero.
> 
>> Dont’s quite understand the above, for normal boolean variables,
> 
> Sorry, I meant hardened boolean types.  This was WRT to the design
> decision that led to this bit in the documentation:
> 
> typedef char __attribute__ ((__hardbool__ (0x5a))) hbool;
> [...]
> static hbool zeroinit; /* False, stored as (char)0x5a.  */
> auto hbool uninit; /* Undefined, may trap.  */
For the hardbool variable "uninit", -ftrivial-auto-var-init=zero will
initialize it to zero, and it will trap at run time.
Likewise, -ftrivial-auto-var-init=pattern will initialize it to 0xfe, and
it will also trap at run time.

I think these are good behaviors; they just need to be documented.
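
To make the two cases concrete, here is a minimal model in plain C. It does not use the attribute itself; it assumes the representations from the documentation example quoted above, i.e. false stored as 0x5a and true as its 8-bit complement 0xa5, and reports a would-be trap as -1. The names are hypothetical.

```c
#include <assert.h>

/* Representations assumed from the hbool example: false is 0x5a and
   true is its bitwise negation within one byte.  */
enum { TOY_HB_FALSE = 0x5a, TOY_HB_TRUE = 0xa5 };

/* Decode a raw byte as the hardened boolean would.  Returns 0 or 1 for
   a valid representation and -1 where the real type would trap.  */
int
toy_hardbool_decode (unsigned char raw)
{
  assert ((unsigned char) ~TOY_HB_FALSE == TOY_HB_TRUE);
  if (raw == TOY_HB_FALSE)
    return 0;
  if (raw == TOY_HB_TRUE)
    return 1;
  return -1;
}
```

With these representations both the zero fill (0x00) and the pattern fill (0xfe) decode as invalid, so either option exposes the uninitialized use.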
> 
>> And this is a very reasonable initial value for Boolean variables,
> 
> Agreed.  The all-zeros bit pattern is not so great for booleans that use
> alternate representations, though, such as the following standard Ada:
> 
>   type MyBool is new Boolean;
>   for MyBool use (16#5a#, 16#a5#);
>   for MyBool'Size use 8;
> 
> or for biased variables such as:
> 
>  X : Integer range 254 .. 507;
>  for X'Size use 8; -- bits, so a biased representation is required.
> 
> Just to make things more interesting, I chose a range for X that causes
> the compiler to represent 0xfe as 0x00 in in the byte that holds X, but
> that places the 0xfe pattern just out of the range :-) So with
> -ftrivial-auto-var-init=zero, X = 254, whereas with
> -ftrivial-auto-var-init=pattern, it fails validity checking, and might
> come out as 508 if that's disabled.
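
The arithmetic in the biased example can be checked with a short sketch in C. It assumes the simplest biased encoding, where the stored byte is the offset from the lower bound 254; the function names are hypothetical and this is not how GNAT implements it.

```c
#include <assert.h>

/* Bounds of X from the Ada example above.  */
enum { TOY_X_FIRST = 254, TOY_X_LAST = 507 };

/* Biased decoding: the stored byte is an offset from the lower bound.  */
int
toy_biased_decode (unsigned char raw)
{
  /* The range must fit in the 8 bits given by X'Size.  */
  assert (TOY_X_LAST - TOY_X_FIRST <= 255);
  return TOY_X_FIRST + raw;
}

/* Validity check against the declared range.  */
int
toy_biased_valid_p (int value)
{
  return value >= TOY_X_FIRST && value <= TOY_X_LAST;
}
```

A stored 0x00 decodes to 254, which is in range, while a stored 0xfe decodes to 508, one past the upper bound.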

 for the biased variable X,  it was initialized to 254 (the smallest valid 
value in the range) when

[PATCH] RISC-V: Add autovect widening/narrowing Integer/FP conversions.

Hi,

this patch implements widening and narrowing float-to-int and
int-to-float autovec conversions and adds tests.
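
As a rough illustration, loops of the following shape are the kind of source the new expanders are meant to let the vectorizer handle. This is a sketch in plain C with hypothetical names and element types picked for the example; it is not copied from the test templates in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening float-to-int: 32-bit float elements to 64-bit integers.  */
void
toy_widen_ftoi (int64_t *restrict dst, const float *restrict src, size_t n)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = (int64_t) src[i];
}

/* Narrowing int-to-float: 64-bit integer elements to 32-bit floats.  */
void
toy_narrow_itof (float *restrict dst, const int64_t *restrict src, size_t n)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = (float) src[i];
}

/* Scalar self-check with exactly representable values; returns 1 on
   success.  The C cast truncates toward zero.  */
int
toy_conversions_selfcheck (void)
{
  const float fsrc[3] = { 1.5f, -2.75f, 100.0f };
  const int64_t isrc[3] = { 3, -7, 1048576 };
  int64_t idst[3];
  float fdst[3];

  toy_widen_ftoi (idst, fsrc, 3);
  toy_narrow_itof (fdst, isrc, 3);

  return (idst[0] == 1 && idst[1] == -2 && idst[2] == 100
          && fdst[0] == 3.0f && fdst[1] == -7.0f && fdst[2] == 1048576.0f);
}
```

The source and destination element sizes differ by a factor of two in each loop, which is what distinguishes these from the same-width conversions added earlier.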

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): New
expander.
(2): Ditto.
(2): Ditto.
(2): Ditto.
* config/riscv/vector-iterators.md: Add vnconvert.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c: New 
test.
---
 gcc/config/riscv/autovec.md   | 76 +++
 gcc/config/riscv/vector-iterators.md  |  8 ++
 .../rvv/autovec/conversions/vfncvt-ftoi-run.c | 96 +++
 .../autovec/conversions/vfncvt-ftoi-rv32gcv.c |  7 ++
 .../autovec/conversions/vfncvt-ftoi-rv64gcv.c |  7 ++
 .../conversions/vfncvt-ftoi-template.h| 19 
 .../conversions/vfncvt-ftoi-zvfh-run.c| 42 
 .../rvv/autovec/conversions/vfncvt-itof-run.c | 52 ++
 .../autovec/conversions/vfncvt-itof-rv32gcv.c |  7 ++
 .../autovec/conversions/vfncvt-itof-rv64gcv.c |  7 ++
 .../conversions/vfncvt-itof-template.h| 18 
 .../conversions/vfncvt-itof-zvfh-run.c| 64 +
 .../rvv/autovec/conversions/vfwcvt-ftoi-run.c | 64 +
 .../autovec/conversions/vfwcvt-ftoi-rv32gcv.c |  7 ++
 .../autovec/conversions/vfwcvt-ftoi-rv64gcv.c |  7 ++
 .../conversions/vfwcvt-ftoi-template.h| 17 
 .../conversions/vfwcvt-ftoi-zvfh-run.c| 64 +
 .../rvv/autovec/conversions/vfwcvt-itof-run.c | 96 +++
 .../autovec/conversions/vfwcvt-itof-rv32gcv.c |  7 ++
 .../autovec/conversions/vfwcvt-itof-rv64gcv.c |  7 ++
 .../conversions/vfwcvt-itof-template.h| 20 
 .../conversions/vfwcvt-itof-zvfh-run.c| 45 +
 22 files changed, 737 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c
 create mode 100644

[PATCH] RISC-V: Add autovec FP widening/narrowing.

Hi,

this patch adds FP widening and narrowing autovec expanders as well as
tests.  Conceptually similar to integer extension/truncation, we emulate
_Float16 -> double by two vfwcvts and double -> _Float16 by two vfncvts.

Optimizations to create widening operations will be added separately.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (extend2): New
expander.
(extend2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
* config/riscv/vector-iterators.md: Add VQEXTF and HF to
V_QUAD_TRUNC and v_quad_trunc.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 90 ++-
 gcc/config/riscv/vector-iterators.md  | 14 +++
 .../rvv/autovec/conversions/vfncvt-run.c  | 33 +++
 .../rvv/autovec/conversions/vfncvt-rv32gcv.c  |  7 ++
 .../rvv/autovec/conversions/vfncvt-rv64gcv.c  |  7 ++
 .../rvv/autovec/conversions/vfncvt-template.h | 16 
 .../rvv/autovec/conversions/vfncvt-zvfh-run.c | 34 +++
 .../rvv/autovec/conversions/vfwcvt-run.c  | 33 +++
 .../rvv/autovec/conversions/vfwcvt-rv32gcv.c  |  6 ++
 .../rvv/autovec/conversions/vfwcvt-rv64gcv.c  |  6 ++
 .../rvv/autovec/conversions/vfwcvt-template.h | 16 
 .../rvv/autovec/conversions/vfwcvt-zvfh-run.c | 34 +++
 12 files changed, 293 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aee4574b8e1..5cc48f966aa 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,12 +162,12 @@ (define_insn_and_split "3"
   riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
 riscv_vector::RVV_BINOP, operands);
   DONE;
-}  
+}
  [(set_attr "type" "vshift")
   (set_attr "mode" "")])
 
 ;; -
-;;  [INT] Binary shifts by scalar.
+;;  [INT] Binary shifts by vector.
 ;; -
 ;; Includes:
 ;; - vsll.vv/vsra.vv/vsrl.vv
@@ -416,7 +416,7 @@ (define_insn_and_split "trunc2"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
 }
-  [(set_attr "type" "vshift")
+  [(set_attr "type" "vnshift")
(set_attr "mode" "")])
 
 ;; -
@@ -466,6 +466,90 @@ (define_expand "trunc2"
   DONE;
 })
 
+;; -
+;;  [FP] Widening.
+;; -
+;; - vfwcvt.f.f.v
+;; -
+(define_insn_and_split "extend2"
+  [(set (match_operand:VWEXTF_ZVFHMIN 0 "register_operand" "=")
+(float_extend:VWEXTF_ZVFHMIN
+ (match_operand:  1 "register_operand" "  vr")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_extend (mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+}
+  [(set_attr "type" "vfwcvtftof")
+   (set_attr "mode" "")])
+
+(define_expand "extend2"
+  [(set (match_operand:VQEXTF 0 "register_operand")
+

[PATCH] RISC-V: Add autovec FP int->float conversion.

Hi,

this patch adds the autovec expander for vfcvt.f.x.v and tests for it.
In addition, it modifies the zfhmin-1 test so it doesn't scan for
"no vectorization" but rather check that we do not emit any (RTL)
vector operations (other than float/float conversions) with a
VNx..HFmode.
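As a rough illustration of the kind of loop this expander is meant to
vectorize, here is a minimal C sketch.  The function names are
hypothetical and are not taken from the patch's test templates; the
check mirrors the fill pattern used by the RUN macro in the new run
test.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar analogue of vfcvt.f.x.v: signed 32-bit integer to float.
   Hypothetical name, for illustration only.  */
void
cvt_i32_to_f32 (float *restrict dst, const int32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (float) src[i];
}

/* Scalar analogue of vfcvt.f.xu.v: unsigned 32-bit integer to float.
   Hypothetical name, for illustration only.  */
void
cvt_u32_to_f32 (float *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (float) src[i];
}

/* Fill the source with i * -3 - 88932, convert, and compare each
   element against the plain scalar cast.  */
void
check_cvt_i32_to_f32 (void)
{
  enum { N = 16 };
  int32_t src[N];
  float dst[N];

  for (int i = 0; i < N; i++)
    src[i] = i * -3 - 88932;
  cvt_i32_to_f32 (dst, src, N);
  for (int i = 0; i < N; i++)
    assert (dst[i] == (float) src[i]);
}
```

Whether a given loop is actually vectorized depends on the target
options and the cost model, which is why the tests pass
-fno-vect-cost-model.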

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): New
expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: Adjust.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add int/float conversions.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-template.h: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c: New 
test.
---
 gcc/config/riscv/autovec.md   | 19 
 .../rvv/autovec/conversions/vfcvt-itof-run.c  | 96 +++
 .../autovec/conversions/vfcvt-itof-rv32gcv.c  |  7 ++
 .../autovec/conversions/vfcvt-itof-rv64gcv.c  |  7 ++
 .../autovec/conversions/vfcvt-itof-template.h | 20 
 .../autovec/conversions/vfcvt-itof-zvfh-run.c | 64 +
 .../rvv/autovec/conversions/vfcvt_rtz-run.c   | 44 +
 .../autovec/conversions/vfcvt_rtz-rv32gcv.c   |  5 +-
 .../autovec/conversions/vfcvt_rtz-rv64gcv.c   |  5 +-
 .../autovec/conversions/vfcvt_rtz-template.h  |  6 +-
 .../autovec/conversions/vfcvt_rtz-zvfh-run.c  | 64 +
 .../gcc.target/riscv/rvv/autovec/zvfhmin-1.c  | 48 --
 12 files changed, 374 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5cc48f966aa..c2913128362 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -573,6 +573,25 @@ (define_expand "2"
   DONE;
 })
 
+;; -
+;;  [FP<-INT] Conversions
+;; -
+;; Includes:
+;; - vfcvt.f.xu.v
+;; - vfcvt.f.x.v
+;; -
+
+(define_expand "2"
+  [(set (match_operand:VF 0 "register_operand")
+   (any_float:VF
+ (match_operand: 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_fp_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
 ;; =
 ;; == Unary arithmetic
 ;; =
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c
new file mode 100644
index 000..dfe73285b36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c
@@ -0,0 +1,96 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model --param=riscv-autovec-preference=scalable" } */
+
+#include "vfcvt-itof-template.h"
+
+#define RUN(TYPE1, TYPE2, NUM) \
+  TYPE1 src##TYPE1##TYPE2##NUM[NUM]; \
+  TYPE2 dst##TYPE1##TYPE2##NUM[NUM]; \
+  for (int i = 0; i < NUM; i++) \
+    { \
+      src##TYPE1##TYPE2##NUM[i] = i * -3 - 88932; \
+    } \
+  vfcvt_##TYPE1##TYPE2 (dst##TYPE1##TYPE2##NUM, src##TYPE1##TYPE2##NUM, NUM); \
+  for (int i = 0; i < NUM; i++) \
+    if (dst##TYPE1##TYPE2##NUM[i] != (TYPE2) src##TYPE1##TYPE2##NUM[i]) \
+      __builtin_abort ();

Re: [PATCH] libstdc++: Synchronize PSTL with upstream

2023-06-26 Thread Thomas Rodgers via Gcc-patches

On Wed, May 17, 2023 at 12:32 PM Jonathan Wakely  wrote:

> -template 
> -  _OutputIterator
> -__brick_generate_n(_OutputIterator __first, _Size __count, _Generator
> __g, /* is_vector = */ std::true_type) noexcept
> +template 
>
> Missing uglification on Size.
>
> +_RandomAccessIterator
> +__brick_generate_n(_RandomAccessIterator __first, Size __count,
> _Generator __g,
> +   /* is_vector = */ std::true_type) noexcept
>  {
>  return __unseq_backend::__simd_generate_n(__first, __count, __g);
>  }
>
> -template 
> -  _OutputIterator
> -__brick_generate_n(_OutputIterator __first, _Size __count, _Generator
> __g, /* is_vector = */ std::false_type) noexcept
> +template 
>
> Missing uglification on OutputIterator and Size.
>
> +OutputIterator
> +__brick_generate_n(OutputIterator __first, Size __count, _Generator __g,
> /* is_vector = */ std::false_type) noexcept
>
>
> -template  _BinaryOperation>
> -_ForwardIterator2
> -__brick_adjacent_difference(_ForwardIterator1 __first, _ForwardIterator1
> __last, _ForwardIterator2 __d_first,
> -_BinaryOperation __op, /*is_vector=*/std::true_type) noexcept
> +template  class BinaryOperation>
>
> Missing uglification on BinaryOperation.
>
> +_RandomAccessIterator2
> +__brick_adjacent_difference(_RandomAccessIterator1 __first,
> _RandomAccessIterator1 __last,
> +_RandomAccessIterator2 __d_first,
> BinaryOperation __op,
> +/*is_vector=*/std::true_type) noexcept
>
>
> The above problems exist on the declaration and the definitions.
>
>
> --- a/libstdc++-v3/include/pstl/glue_execution_defs.h
> +++ b/libstdc++-v3/include/pstl/glue_execution_defs.h
> @@ -18,8 +18,8 @@ namespace std
>  {
>  // Type trait
>  using __pstl::execution::is_execution_policy;
> -#if _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT
> -#if __INTEL_COMPILER
> +#if defined(_PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT)
> +#if defined(__INTEL_COMPILER)
>  template 
>  constexpr bool is_execution_policy_v = is_execution_policy::value;
>  #else
>
> Pre-existing, but that T should be _Tp, but it only affects the Intel
> compiler branch, so meh.
>
> Please fix these and report them upstream too.
>

Ack.


>
> All the actual code changes look good.
>
> I think I'd prefer if __pattern_partial_sort_copy used
> std::uninitialized_copy instead of a loop and placement-new, but that
> doesn't need to hold this up. We could optimize some uses of
> std::conjunction and std::conditional to use our own __and_ and
> __conditional, but I'm not sure it's worth diverging from upstream to do
> that.
>
> Please fix the naming bugs noted above and push to trunk, thanks!
>

There were a handful of additional missed uglifications in:
* libstdc++-v3/include/pstl/glue_algorithm_impl.h
* libstdc++-v3/include/pstl/unseq_backend_simd.h
These are fixed in this commit, but were not in the patch as reviewed.

Tested x86_64-linux. Pushed to trunk.


>
> +Reviewed-by: Jonathan Wakely 

Re: [PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

On Mon, Jun 26, 2023 at 9:13 AM Andrew Pinski  wrote:
>
> On Sun, Jun 25, 2023 at 10:59 PM Jan-Benedict Glaw  wrote:
> >
> > Hi Andrew,
> >
> > On Fri, 2023-05-05 08:17:19 -0700, Andrew Pinski via Gcc-patches 
> >  wrote:
> > > While looking into a different issue, I noticed that it
> > > would take until the second forwprop pass to do some
> > > forward proping and it was because the ssa name was
> > > used more than once but the second statement was
> > > "dead" and we don't remove that until much later.
> > [...]
> > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > Since this patch, I see a bit of fallout building the Linux kernel
> > using the adder875_defconfig:
> >
> > # CC  arch/powerpc/kernel/ptrace/ptrace-view.o
> >   powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/ptrace/.ptrace-view.o.d 
> > -nostdinc -I./arch/powerpc/include -I./arch/powerpc/include/generated  
> > -I./include -I./arch/powerpc/include/uapi 
> > -I./arch/powerpc/include/generated/uapi -I./include/uapi 
> > -I./include/generated/uapi -include ./include/linux/compiler-version.h 
> > -include ./include/linux/kconfig.h -include 
> > ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> > -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes 
> > -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE 
> > -Werror=implicit-function-declaration -Werror=implicit-int 
> > -Werror=return-type -Wno-format-security -funsigned-char -std=gnu11 
> > -mbig-endian -m32 -msoft-float -pipe -ffixed-r2 -mmultiple 
> > -mno-readonly-in-sdata -mcpu=860 -mno-prefixed -mno-pcrel -mno-altivec 
> > -mno-vsx -mno-mma -fno-asynchronous-unwind-tables -mno-string -mbig-endian 
> > -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> > -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation 
> > -Wno-format-overflow -Wno-address-of-packed-member -O2 
> > -fno-allow-store-data-races -Wframe-larger-than=1024 
> > -fstack-protector-strong -Wno-main -Wno-unused-but-set-variable 
> > -Wno-unused-const-variable -Wno-dangling-pointer -fomit-frame-pointer 
> > -ftrivial-auto-var-init=zero -fno-stack-clash-protection 
> > -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wcast-function-type 
> > -Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict 
> > -Wno-maybe-uninitialized -Wno-array-bounds -Wno-alloc-size-larger-than 
> > -Wimplicit-fallthrough=5 -fno-strict-overflow -fno-stack-check 
> > -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types 
> > -Werror=designated-init -Wno-packed-not-aligned -g 
> > -mstack-protector-guard-offset=544 -Werror -DUTS_MACHINE='"ppc"'
> > -DKBUILD_MODFILE='"arch/powerpc/kernel/ptrace/ptrace-view"' 
> > -DKBUILD_BASENAME='"ptrace_view"' -DKBUILD_MODNAME='"ptrace_view"' 
> > -D__KBUILD_MODNAME=kmod_ptrace_view -c -o 
> > arch/powerpc/kernel/ptrace/ptrace-view.o 
> > arch/powerpc/kernel/ptrace/ptrace-view.c
> > during GIMPLE pass: pre
> > arch/powerpc/kernel/ptrace/ptrace-view.c: In function 'gpr32_set_common':
> > arch/powerpc/kernel/ptrace/ptrace-view.c:649:5: internal compiler error: in 
> > gimple_redirect_edge_and_branch, at tree-cfg.cc:6262
> >   649 | int gpr32_set_common(struct task_struct *target,
> >   | ^~~~
> > 0x1a562a6 internal_error(char const*, ...)
> >  ???:0
> > 0x826ea1 fancy_abort(char const*, int, char const*)
> >  ???:0
> > 0x9b77c9 redirect_edge_and_branch(edge_def*, basic_block_def*)
> >  ???:0
> > 0x9b7e43 split_edge(edge_def*)
> >  ???:0
> > 0xee1cc7 split_critical_edges(bool)
> >  ???:0
> > Please submit a full bug report, with preprocessed source (by using 
> > -freport-bug).
> > Please include the complete backtrace with any bug report.
> > See  for instructions.
> > make[4]: *** [scripts/Makefile.build:252: 
> > arch/powerpc/kernel/ptrace/ptrace-view.o] Error 1
> > make[3]: *** [scripts/Makefile.build:494: arch/powerpc/kernel/ptrace] Error 
> > 2
> > make[2]: *** [scripts/Makefile.build:494: arch/powerpc/kernel] Error 2
> > make[1]: *** [scripts/Makefile.build:494: arch/powerpc] Error 2
> > make: *** [Makefile:2026: .] Error 2
>
> Can you file a bug (https://gcc.gnu.org/bugzilla/) with the
> preprocessed source (which -freport-bug will provide). In the meantime
> I will try to reproduce it and see what is going on.

Note that I suspect this is related to GCC PR 103979.  I am still
trying to reproduce the ICE.

>
> Thanks,
> Andrew
>
>
> >
> >
> > (Full log at 
> > http://toolchain.lug-owl.de/laminar/jobs/linux-powerpc-adder875_defconfig/100)
> >
> > Thanks,
> >   Jan-Benedict
> >
> > --

Re: [PATCH] New finish_compare_by_pieces target hook (for x86).

2023-06-26 Thread Richard Sandiford via Gcc-patches

Richard Biener via Gcc-patches  writes:
> On Sun, Jun 25, 2023 at 7:39 AM Roger Sayle  
> wrote:
>>
>>
>> On Tue, 13 June 2023 12:02, Richard Biener wrote:
>> > On Mon, Jun 12, 2023 at 4:04 PM Roger Sayle 
>> > wrote:
>> > > The following simple test case, from PR 104610, shows that memcmp ()
>> > > == 0 can result in some bizarre code sequences on x86.
>> > >
>> > > int foo(char *a)
>> > > {
>> > > static const char t[] = "0123456789012345678901234567890";
>> > > return __builtin_memcmp(a, &t[0], sizeof(t)) == 0; }
>> > >
>> > > with -O2 currently contains both:
>> > > xorl    %eax, %eax
>> > > xorl    $1, %eax
>> > > and also
>> > > movl    $1, %eax
>> > > xorl    $1, %eax
>> > >
>> > > Changing the return type of foo to _Bool results in the equally
>> > > bizarre:
>> > > xorl    %eax, %eax
>> > > testl   %eax, %eax
>> > > sete    %al
>> > > and also
>> > > movl    $1, %eax
>> > > testl   %eax, %eax
>> > > sete    %al
>> > >
>> > > All these sequences set the result to a constant, but this
>> > > optimization opportunity only occurs very late during compilation, by
>> > > basic block duplication in the 322r.bbro pass, too late for CSE or
>> > > peephole2 to do anything about it.  The problem is that the idiom
>> > > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic
>> > > blocks that can't easily be optimized by if-conversion due to the
>> > > multiple incoming edges on the fail block.
>> > >
>> > > In summary, compare_by_pieces generates code that looks like:
>> > >
>> > > if (x[0] != y[0]) goto fail_label;
>> > > if (x[1] != y[1]) goto fail_label;
>> > > ...
>> > > if (x[n] != y[n]) goto fail_label;
>> > > result = 1;
>> > > goto end_label;
>> > > fail_label:
>> > > result = 0;
>> > > end_label:
>> > >
>> > > In theory, the RTL if-conversion pass could be enhanced to tackle
>> > > arbitrarily complex if-then-else graphs, but the solution proposed
>> > > here is to allow suitable targets to perform if-conversion during
>> > > compare_by_pieces.  The x86, for example, can take advantage that all
>> > > of the above comparisons set and test the zero flag (ZF), which can
>> > > then be used in combination with sete.  Hence compare_by_pieces could
>> > > instead generate:
>> > >
>> > > if (x[0] != y[0]) goto fail_label;
>> > > if (x[1] != y[1]) goto fail_label;
>> > > ...
>> > > if (x[n] != y[n]) goto fail_label;
>> > > fail_label:
>> > > sete result
>> > >
>> > > which requires one less basic block, and the redundant conditional
>> > > branch to a label immediately after is cleaned up by GCC's existing
>> > > RTL optimizations.
>> > >
>> > > For the test case above, where -O2 -msse4 previously generated:
>> > >
>> > > foo:    movdqu  (%rdi), %xmm0
>> > >         pxor    .LC0(%rip), %xmm0
>> > >         ptest   %xmm0, %xmm0
>> > >         je      .L5
>> > > .L2:    movl    $1, %eax
>> > >         xorl    $1, %eax
>> > >         ret
>> > > .L5:    movdqu  16(%rdi), %xmm0
>> > >         pxor    .LC1(%rip), %xmm0
>> > >         ptest   %xmm0, %xmm0
>> > >         jne     .L2
>> > >         xorl    %eax, %eax
>> > >         xorl    $1, %eax
>> > >         ret
>> > >
>> > > we now generate:
>> > >
>> > > foo:    movdqu  (%rdi), %xmm0
>> > >         pxor    .LC0(%rip), %xmm0
>> > >         ptest   %xmm0, %xmm0
>> > >         jne     .L2
>> > >         movdqu  16(%rdi), %xmm0
>> > >         pxor    .LC1(%rip), %xmm0
>> > >         ptest   %xmm0, %xmm0
>> > > .L2:    sete    %al
>> > >         movzbl  %al, %eax
>> > >         ret
>> > >
>> > > Using a target hook allows the large amount of intelligence already in
>> > > compare_by_pieces to be re-used by the i386 backend, but this can also
>> > > help other backends with condition flags where the equality result can
>> > > be materialized.
>> > >
>> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> > > and make -k check, both with and without --target_board=unix{-m32}
>> > > with no new failures.  Ok for mainline?
>> >
>> > What's the guarantee that the zero flag is appropriately set on all edges 
>> > incoming
>> > now and forever?
>>
>> Is there any reason why this target hook can't be removed (in future) should 
>> it stop
>> being useful?  It's completely optional and not required for the correct 
>> functioning
>> of the compiler.
>>
>> > Does this require target specific knowledge on how do_compare_rtx_and_jump
>> > is emitting RTL?
>>
>> Yes.  Each backend can decide how best to implement finish_compare_by_pieces
>> given its internal knowledge of how do_compare_rtx_and_jump works.  It's not
>> important to the middle-end how the underlying invariants are guaranteed, 
>> just
>> that they are and the backend produces correct code.  A backend may store 
>> flags
>> on the target label, or maintain state in cfun.  Future changes to the

Re: Merge from trunk to gccgo branch

2023-06-26 Thread Ian Lance Taylor via Gcc-patches

I merged trunk revision 3a39a31b8ae9c6465434aefa657f7fcc86f905c0 to
the gccgo branch.

Ian

Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-26 Thread Jan Hubicka via Gcc-patches

Hi,
playing with testcases for path isolation and const function, I noticed
that we do not seem to even try to isolate out of range array accesses:
int a[3]={0,1,2};
test(int i)
{
   if (i > 3)
 return test2(a[i]);
   return a[i];
}

Here the call to test2 is dead, since a[i] would access memory past the
end of the array.  We produce a warning:

t.c:5:24: warning: array subscript 4 is above array bounds of ‘int[3]’ [-Warray-bounds=]

but we still keep the call:

test:
.LFB0:
        .cfi_startproc
        movslq  %edi, %rax
        movl    a(,%rax,4), %eax
        cmpl    $3, %edi
        jg      .L4
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        movl    %eax, %edi
        xorl    %eax, %eax
        jmp     test2

We eventually move the load before the conditional, but at path isolation
time it is still quite obvious that the conditional being true invokes
undefined behaviour:

int test (int i)
{
  int _1;
  int _2;
  int _6;
  int _8;

  <bb 2> [local count: 1073741824]:
  if (i_4(D) > 3)
    goto <bb 3>; [20.24%]
  else
    goto <bb 4>; [79.76%]

  <bb 3> [local count: 217325344]:
  _1 = a[i_4(D)];
  _8 = test2 (_1);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 856416481]:
  _6 = a[i_4(D)];

  <bb 5> [local count: 1073741824]:
  # _2 = PHI <_8(3), _6(4)>
  return _2;
} 

Curiously, adjusting the testcase:

const int a[3]={0,1,2};
test(int i)
{
if (i == 3)
return test2(a[i]);
return a[i];
}
no longer has undefined behaviour visible at isolate-paths:
int test (int i)
{
  int _1;
  int _5;
  int _7;

  <bb 2> [local count: 1073741824]:
  if (i_3(D) == 3)
    goto <bb 3>; [11.56%]
  else
    goto <bb 4>; [88.44%]

  <bb 3> [local count: 124124552]:
  _7 = test2 (0);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 949617273]:
  _5 = a[i_3(D)];

  <bb 5> [local count: 1073741824]:
  # _1 = PHI <_7(3), _5(4)>
  return _1;
}
since we fold the load to 0.  It would perhaps help optimizers to keep info on 
undefined behaviour happening there.
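
To make the idea concrete, here is a hand-written C sketch of what
isolating the erroneous path could look like for the first testcase.
This is only an assumption about the desired result, by analogy with
how path isolation replaces other erroneous paths with a trap; it is
not output from GCC, and test_isolated is a hypothetical name.

```c
#include <assert.h>

static const int a[3] = { 0, 1, 2 };

/* Hand-isolated variant of test () from the message.  On the i > 3
   path the original code reads a[i] out of bounds, so that path
   cannot happen in a valid program; the load and the call to test2
   are replaced by a trap.  */
int
test_isolated (int i)
{
  if (i > 3)
    __builtin_trap ();
  return a[i];
}

/* In-range indices behave exactly as in the original function.  */
void
check_test_isolated (void)
{
  assert (test_isolated (0) == 0);
  assert (test_isolated (1) == 1);
  assert (test_isolated (2) == 2);
}
```

Note that i == 3 is also out of bounds for a[3]; the sketch keeps the
condition from the testcase as written.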

Honza

Go patch committed: Support -fgo-importcfg

2023-06-26 Thread Ian Lance Taylor via Gcc-patches

The gc Go compiler has a -importcfg option that takes a file that
provides a mapping from import paths to the files that satisfy those
imports.  This is used by the go build tool to let the compiler read
imported packages directly out of the build cache.  Without this
option the go build tool has to construct a tree of files to provide
the same mapping in the file system.

This patch to the Go frontend adds a -fgo-importcfg option that does
the same thing.  The go build tool already uses this option if it is
supported; with this patch, it is supported.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

* lang.opt (fgo-importcfg): New option.
* go-c.h (struct go_create_gogo_args): Add importcfg field.
* go-lang.cc (go_importcfg): New static variable.
(go_langhook_init): Set args.importcfg.
(go_langhook_handle_option): Handle -fgo-importcfg.
* gccgo.texi (Invoking gccgo): Document -fgo-importcfg.
cd4f91ed9786caf207d6d68bf2e64f986ed19735
diff --git a/gcc/go/gccgo.texi b/gcc/go/gccgo.texi
index 4ab1a76818f..90651af8384 100644
--- a/gcc/go/gccgo.texi
+++ b/gcc/go/gccgo.texi
@@ -271,6 +271,14 @@ pattern to a list of file names, and @code{Files} maps 
each file name
 to a full path to the file.  This option is intended for use by the
 @command{go} command to implement @code{//go:embed}.
 
+@cindex @option{-fgo-importcfg}
+@item -fgo-importcfg=@var{file}
+Identify a file that provides mappings for import package paths found
+in the Go source files.  The file can contain two commands:
+@code{importpath} to rename import paths for vendoring and
+@code{packagefile} to map from package path to files containing export
+data.  This option is intended for use by the @command{go} command.
+
 @cindex @option{-g for gccgo}
 @item -g
 This is the standard @command{gcc} option (@pxref{Debugging Options, ,
diff --git a/gcc/go/go-c.h b/gcc/go/go-c.h
index c6050382aa8..6a2b57b3b44 100644
--- a/gcc/go/go-c.h
+++ b/gcc/go/go-c.h
@@ -41,6 +41,7 @@ struct go_create_gogo_args
   const char* prefix;
   const char* relative_import_path;
   const char* c_header;
+  const char* importcfg;
   const char* embedcfg;
   Backend* backend;
   Linemap* linemap;
diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc
index c6c147b20a5..e85a4bfe949 100644
--- a/gcc/go/go-lang.cc
+++ b/gcc/go/go-lang.cc
@@ -90,6 +90,7 @@ static const char *go_prefix = NULL;
 static const char *go_relative_import_path = NULL;
 static const char *go_c_header = NULL;
 static const char *go_embedcfg = NULL;
+static const char *go_importcfg = NULL;
 
 /* Language hooks.  */
 
@@ -111,6 +112,7 @@ go_langhook_init (void)
   args.relative_import_path = go_relative_import_path;
   args.c_header = go_c_header;
   args.embedcfg = go_embedcfg;
+  args.importcfg = go_importcfg;
   args.check_divide_by_zero = go_check_divide_zero;
   args.check_divide_overflow = go_check_divide_overflow;
   args.compiling_runtime = go_compiling_runtime;
@@ -286,6 +288,10 @@ go_langhook_handle_option (
   go_embedcfg = arg;
   break;
 
+case OPT_fgo_importcfg_:
+  go_importcfg = arg;
+  break;
+
 default:
   /* Just return 1 to indicate that the option is valid.  */
   break;
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index ff07b1a1fa6..c44cdc2baac 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-68a756b6aadc901534cfad2b1e73fae9e34f
+92152c88ea8e2dd9e8c67e91bf4ae5e3edf1b506
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/embed.cc b/gcc/go/gofrontend/embed.cc
index 0584f707ce6..6dada5efc2a 100644
--- a/gcc/go/gofrontend/embed.cc
+++ b/gcc/go/gofrontend/embed.cc
@@ -19,8 +19,8 @@
 
 // Read a file into *DATA.  Returns false on error.
 
-static bool
-read_file(const char* filename, Location loc, std::string* data)
+bool
+Gogo::read_file(const char* filename, Location loc, std::string* data)
 {
   int fd = open(filename, O_RDONLY | O_BINARY);
   if (fd < 0)
@@ -346,7 +346,8 @@ Gogo::read_embedcfg(const char *filename)
 bool
 Embedcfg_reader::initialize_from_file()
 {
-  if (!read_file(this->filename_, Linemap::unknown_location(), &this->data_))
+  if (!Gogo::read_file(this->filename_, Linemap::unknown_location(),
+  &this->data_))
 return false;
   if (this->data_.empty())
 {
@@ -849,7 +850,7 @@ Gogo::initializer_for_embeds(Type* type,
}
 
   std::string data;
-  if (!read_file(this->embed_files_[paths[0]].c_str(), loc, &data))
+  if (!Gogo::read_file(this->embed_files_[paths[0]].c_str(), loc, &data))
return Expression::make_error(loc);
 
   Expression* e = Expression::make_string(data, loc);
@@ -909,7 +910,7 @@ Gogo::initializer_for_embeds(Type* type,
   std::string data;
   if ((*pp)[pp->size() - 1] != '/')
{
- if (!read_file(this->embed_files_[*pp].c_str(), loc, &data))
+ if

[PATCH][committed] aarch64: Use <DWI> instead of <V2XWIDE> in scalar SQRSHRUN pattern

2023-06-26 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

In the scalar pattern for SQRSHRUN it's a bit clearer to use DWI instead of
V2XWIDE, since no vector modes are involved.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqrshrun_n_insn):
Use <DWI> instead of <V2XWIDE>.
(aarch64_sqrshrun_n): Likewise.


dwi.patch
Description: dwi.patch

[PATCH][committed] aarch64: Clean up some rounding immediate predicates

2023-06-26 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

aarch64_simd_rsra_rnd_imm_vec is now used for more than just RSRA
and accepts more than just vectors, so rename it to better reflect
what it does.
The aarch64_simd_rshrn_imm_vec predicate is now unused and can be deleted.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_const_vec_rsra_rnd_imm_p):
Rename to...
(aarch64_rnd_imm_p): ... This.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec):
Rename to...
(aarch64_int_rnd_operand): ... This.
(aarch64_simd_rshrn_imm_vec): Delete.
* config/aarch64/aarch64-simd.md (aarch64_rsra_n_insn):
Adjust for the above.
(aarch64_rshr_n_insn): Likewise.
(*aarch64_rshrn_n_insn): Likewise.
(*aarch64_sqrshrun_n_insn): Likewise.
(aarch64_sqrshrun_n_insn): Likewise.
(aarch64_rshrn2_n_insn_le): Likewise.
(aarch64_rshrn2_n_insn_be): Likewise.
(aarch64_sqrshrun2_n_insn_le): Likewise.
(aarch64_sqrshrun2_n_insn_be): Likewise.
* config/aarch64/aarch64.cc (aarch64_const_vec_rsra_rnd_imm_p):
Rename to...
(aarch64_rnd_imm_p): ... This.


rnd-imm.patch
Description: rnd-imm.patch

Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.





On 6/26/23 08:50, Kito Cheng wrote:
> LLVM will try to find a scratch register even after RA to resolve the long
> jump issue, so maybe we could consider a similar approach? And I guess the
> most complicated part would be when the scratch register is not found,
> which requires spill/reload after RA.

Right.  And the spill/reload after RA is a problem unless you
pre-allocate the space.  Of course in a function near 1M in size, odds
are there were some calls in there and thus $ra would be saved.  In the
exceedingly rare case where it wasn't, allocating a single stack slot
isn't going to be a major performance driver.


There's other things you can do as well.  Register scavenging, jump 
trampolines, etc.  Examples of both exist.


The point I'm trying to make is that I suspect we're better off burning 
$ra right now to address the correctness issue, then coming back to one 
of the schemes noted above when the cost/benefit analysis shows it's a 
reasonably high priority relative to other optimizations we could be doing.


Jeff

[Committed] IBM zSystems: Assume symbols without explicit alignment to be ok

2023-06-26 Thread Andreas Krebbel via Gcc-patches

A change we committed back in 2015 relies on the ABI alignment
requested by the backend being applied to ALL symbols by the
middle-end.  However, this does not appear to be the case for external
symbols.  With this commit we assume all symbols without explicit
alignment to be aligned according to the ABI.  That's the behavior we
had before.
This fixes a performance regression caused by the 2015 patch.  Since
then, the addresses of external char type symbols have been pushed to
the literal pool, although it is safe to access them with larl (which
requires symbols to reside at even addresses).
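
The change in the condition can be modelled with a small standalone C
sketch.  The struct and function names below are hypothetical and only
mimic DECL_ALIGN (in bits) and DECL_USER_ALIGN; this is not backend
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two properties of a decl that the
   backend looks at: DECL_ALIGN (in bits) and DECL_USER_ALIGN.  */
struct decl_align
{
  unsigned align_bits;
  bool user_align;
};

/* Condition before the patch: any symbol whose known alignment is
   not a multiple of 2 bytes is flagged as not 2-byte aligned.  */
bool
notalign2_old (struct decl_align d)
{
  return d.align_bits == 0 || d.align_bits % 16 != 0;
}

/* Condition after the patch: only symbols the user explicitly
   misaligned are flagged; everything else is assumed to follow
   the ABI.  */
bool
notalign2_new (struct decl_align d)
{
  return d.user_align && d.align_bits % 16 != 0;
}

/* The cases covered by the new larl-1.c test.  */
void
check_notalign2 (void)
{
  struct decl_align align1 = { 8, true };          /* aligned(1) */
  struct decl_align align2 = { 16, true };         /* aligned(2) */
  struct decl_align align_default = { 8, false };  /* extern char */

  assert (notalign2_new (align1));
  assert (!notalign2_new (align2));
  /* The regression: the old check flagged a plain extern char.  */
  assert (notalign2_old (align_default));
  assert (!notalign2_new (align_default));
}
```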

Bootstrapped and regression tested on s390x.

gcc/
* config/s390/s390.cc (s390_encode_section_info): Set
SYMBOL_FLAG_SET_NOTALIGN2 only if the symbol has explicitly been
misaligned.

gcc/testsuite/
* gcc.target/s390/larl-1.c: New test.
---
 gcc/config/s390/s390.cc|  6 +++--
 gcc/testsuite/gcc.target/s390/larl-1.c | 32 ++
 2 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/larl-1.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 9284477396d..d9f10542473 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -13706,8 +13706,10 @@ s390_encode_section_info (tree decl, rtx rtl, int 
first)
 {
   /* Store the alignment to be able to check if we can use
 a larl/load-relative instruction.  We only handle the cases
-that can go wrong (i.e. no FUNC_DECLs).  */
-  if (DECL_ALIGN (decl) == 0 || DECL_ALIGN (decl) % 16)
+that can go wrong (i.e. no FUNC_DECLs).
+All symbols without an explicit alignment are assumed to be 2
+byte aligned as mandated by our ABI.  */
+  if (DECL_USER_ALIGN (decl) && DECL_ALIGN (decl) % 16)
SYMBOL_FLAG_SET_NOTALIGN2 (XEXP (rtl, 0));
   else if (DECL_ALIGN (decl) % 32)
SYMBOL_FLAG_SET_NOTALIGN4 (XEXP (rtl, 0));
diff --git a/gcc/testsuite/gcc.target/s390/larl-1.c 
b/gcc/testsuite/gcc.target/s390/larl-1.c
new file mode 100644
index 000..5ef2ef63f82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/larl-1.c
@@ -0,0 +1,32 @@
+/* Check if load-address-relative instructions are created */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -march=z10 -mzarch -fno-section-anchors" } */
+
+/* An explicitely misaligned symbol.  This symbol is NOT aligned as
+   mandated by our ABI.  However, the back-end needs to handle that in
+   order to make things like __attribute__((packed)) work.  The symbol
+   address is expected to be loaded from literal pool.  */
+/* { dg-final { scan-assembler "lgrl\t%r2," { target { lp64 } } } } */
+/* { dg-final { scan-assembler "lrl\t%r2," { target { ! lp64 } } } } */
+extern char align1 __attribute__((aligned(1)));
+
+/* { dg-final { scan-assembler "larl\t%r2,align2" } } */
+extern char align2 __attribute__((aligned(2)));
+
+/* { dg-final { scan-assembler "larl\t%r2,align4" } } */
+extern char align4 __attribute__((aligned(4)));
+
+/* An external char symbol without explicit alignment has a DECL_ALIGN
+   of just 8. In contrast to local definitions DATA_ABI_ALIGNMENT is
+   NOT applied to DECL_ALIGN in that case.  Make sure the backend
+   still assumes this symbol to be aligned according to ABI
+   requirements.  */
+/* { dg-final { scan-assembler "larl\t%r2,align_default" } } */
+extern char align_default;
+
+char * foo1 () { return &align1; }
+char * foo2 () { return &align2; }
+char * foo3 () { return &align4; }
+char * foo4 () { return &align_default; }
+
-- 
2.41.0

[committed] libstdc++: Fix std::format for pointers [PR110239]

2023-06-26 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

The formatter for pointers was casting to uint64_t, which sign-extends a
32-bit pointer and produces a value that won't fit in the provided
buffer. Cast to uintptr_t instead.
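As a rough sketch of why the cast matters (this is not the libstdc++ formatter; `format_pointer` is a hypothetical helper): the buffer is sized for "0x" plus two hex digits per pointer byte, so the integer passed to std::to_chars should not be wider than a pointer. The sketch assumes a platform where uintptr_t has the same width as void*.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <iterator>
#include <string>
#include <system_error>

// Hypothetical helper: format a pointer as "0x" followed by hex digits.
inline std::string format_pointer(const void* p) {
  // Converting to uintptr_t keeps the value pointer-sized, so it needs
  // at most sizeof(p) * 2 hex digits (assuming uintptr_t is as wide as
  // a pointer on this platform).
  const auto u = reinterpret_cast<std::uintptr_t>(p);
  char buf[2 + sizeof(p) * 2];
  buf[0] = '0';
  buf[1] = 'x';
  const auto res = std::to_chars(buf + 2, std::end(buf), u, 16);
  assert(res.ec == std::errc{});  // the value must fit in the buffer
  return std::string(buf, res.ptr);
}
```

For example, `format_pointer(nullptr)` yields "0x0" on common platforms, and the result never exceeds 2 + sizeof(void*) * 2 characters.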

There was also a bug in the __parse_integer helper when converting a
wide string to a narrow string in order to use std::from_chars on it.
The function would always try to read 32 characters, even if the format
string was shorter than that. Fix that bug, and remove the constexpr
implementation of __parse_integer by just using __from_chars_alnum
instead of from_chars, because that's usable in constexpr even in
C++20.
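The bounded copy described above can be sketched as follows. This is a simplified standalone model, not the library code: `ParsedWidth` and `parse_width` are hypothetical names, std::from_chars stands in for the internal helper, and the narrowing cast assumes the digits are plain ASCII.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <system_error>

struct ParsedWidth {
  unsigned short value;     // parsed number, 0 on failure
  std::ptrdiff_t consumed;  // wide characters consumed, 0 on failure
};

// Copy at most 32 wide characters into a narrow buffer, then parse it.
inline ParsedWidth parse_width(const wchar_t* first, const wchar_t* last) {
  assert(first != last);
  constexpr int n = 32;
  char buf[n] = {};
  // Stop at the end of the buffer or the end of the input, whichever
  // comes first. Comparing (first + i) against last is what keeps the
  // loop from reading past a short format string.
  for (int i = 0; i < n && (first + i) != last; ++i)
    buf[i] = static_cast<char>(first[i]);  // assumes ASCII digits
  unsigned short val = 0;
  const auto res = std::from_chars(buf, buf + n, val);
  if (res.ec != std::errc{})
    return {0, 0};  // not a number, or larger than USHRT_MAX
  return {val, res.ptr - buf};
}
```

Given the wide string L"42}", the sketch reports the value 42 with two characters consumed; a width of 65536 is rejected because it does not fit in an unsigned short.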

libstdc++-v3/ChangeLog:

PR libstdc++/110239
* include/std/format (__format::__parse_integer): Fix buffer
overflow for wide chars.
(formatter::format): Cast to uintptr_t instead
of uint64_t.
* testsuite/std/format/string.cc: Test too-large widths.
---
 libstdc++-v3/include/std/format | 33 +++--
 libstdc++-v3/testsuite/std/format/string.cc |  5 
 2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 96a1e62ccc8..9d5981e4882 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -269,39 +269,26 @@ namespace __format
   if (__first == __last)
__builtin_unreachable();
 
-  // TODO: use this loop unconditionally?
-  // Most integers used for arg-id, width or precision will be small.
-  if (is_constant_evaluated())
-   {
- auto __next = __first;
- unsigned short __val = 0;
- while (__next != __last && '0' <= *__next && *__next <= '9')
-   {
- __val = (__val * 10) + (*__next - '0'); // TODO check overflow?
- ++__next;
-   }
- if (__next == __first)
-   return {0, nullptr};
- return {__val, __next};
-   }
-
-  unsigned short __val = 0;
   if constexpr (is_same_v<_CharT, char>)
{
- auto [ptr, ec] = std::from_chars(__first, __last, __val);
- if (ec == errc{})
-   return {__val, ptr};
- return {0, nullptr};
+ const auto __start = __first;
+ unsigned short __val = 0;
+ // N.B. std::from_chars is not constexpr in C++20.
+ if (__detail::__from_chars_alnum(__first, __last, __val, 10)
+   && __first != __start) [[likely]]
+   return {__val, __first};
}
   else
{
+ unsigned short __val = 0;
  constexpr int __n = 32;
  char __buf[__n]{};
- for (int __i = 0; __i < __n && __first != __last; ++__i)
+ for (int __i = 0; __i < __n && (__first + __i) != __last; ++__i)
__buf[__i] = __first[__i];
  auto [__v, __ptr] = __format::__parse_integer(__buf, __buf + __n);
  return {__v, __first + (__ptr - __buf)};
}
+  return {0, nullptr};
 }
 
   template
@@ -2118,7 +2105,7 @@ namespace __format
typename basic_format_context<_Out, _CharT>::iterator
format(const void* __v, basic_format_context<_Out, _CharT>& __fc) const
{
- auto __u = reinterpret_cast<__UINT64_TYPE__>(__v);
+ auto __u = reinterpret_cast<__UINTPTR_TYPE__>(__v);
  char __buf[2 + sizeof(__v) * 2];
  auto [__ptr, __ec] = std::to_chars(__buf + 2, std::end(__buf),
 __u, 16);
diff --git a/libstdc++-v3/testsuite/std/format/string.cc 
b/libstdc++-v3/testsuite/std/format/string.cc
index e421028a873..d28135ec260 100644
--- a/libstdc++-v3/testsuite/std/format/string.cc
+++ b/libstdc++-v3/testsuite/std/format/string.cc
@@ -121,6 +121,11 @@ test_format_spec()
   // Invalid presentation types for strings.
   VERIFY( ! is_format_string_for("{:S}", "str") );
   VERIFY( ! is_format_string_for("{:d}", "str") );
+
+  // Maximum integer value supported for widths and precisions is USHRT_MAX.
+  VERIFY( is_format_string_for("{:65535}", 1) );
+  VERIFY( ! is_format_string_for("{:65536}", 1) );
+  VERIFY( ! is_format_string_for("{:999}", 1) );
 }
 
 int main()
-- 
2.41.0

[committed] libstdc++: Implement P2538R1 ADL-proof std::projected

2023-06-26 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This was recently approved for C++26, but there's no harm in
implementing it unconditionally for C++20 and C++23. As it says in the
paper, it doesn't change the meaning of any valid code. It only enables
things that were previously ill-formed for questionable reasons.
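A small sketch of the technique, with hypothetical names (`lib::projected_impl`, `lib::touch`, `adl_call`): when the type used in an unqualified call is a non-template class nested inside a class template specialization, the template arguments of the enclosing specialization are not associated entities for argument-dependent lookup, so the compiler does not need to complete them.

```cpp
// Holder<T> cannot be completed when T is incomplete.
template<class T> struct Holder { T t; };
struct Incomplete;

namespace lib {
  // The helper is a template, but the type handed out is its nested,
  // non-template member class.
  template<class Iter>
  struct projected_impl {
    struct type {};
  };

  template<class Iter>
  using projected = typename projected_impl<Iter>::type;

  template<class T>
  constexpr int touch(const T&) { return 1; }
}

// The unqualified call triggers ADL on projected_impl<...>::type. ADL
// looks at that class and its enclosing class, but not at the template
// argument Holder<Incomplete>*, so Holder<Incomplete> is never
// instantiated.
constexpr int adl_call() {
  lib::projected<Holder<Incomplete>*> p{};
  return touch(p);
}
```

If `projected` were itself a class template taking `Holder<Incomplete>*` directly, the same unqualified call would make `Holder<Incomplete>` an associated class and, as I understand the paper, require completing it, which fails.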

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (projected): Replace class
template with alias template denoting an ADL-proofed helper.
(incremental_traits>): Remove.
* testsuite/24_iterators/indirect_callable/projected-adl.cc:
New test.
---
 libstdc++-v3/include/bits/iterator_concepts.h | 35 +++-
 .../indirect_callable/projected-adl.cc| 42 +++
 2 files changed, 67 insertions(+), 10 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/24_iterators/indirect_callable/projected-adl.cc

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index 1555c374870..6802582a459 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -771,19 +771,34 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   && invocable<_Fn, iter_reference_t<_Is>...>
 using indirect_result_t = invoke_result_t<_Fn, iter_reference_t<_Is>...>;
 
+  namespace __detail
+  {
+template
+  struct __projected
+  {
+   struct __type
+   {
+ using value_type = remove_cvref_t>;
+ indirect_result_t<_Proj&, _Iter> operator*() const; // not defined
+   };
+  };
+
+template
+  struct __projected<_Iter, _Proj>
+  {
+   struct __type
+   {
+ using value_type = remove_cvref_t>;
+ using difference_type = iter_difference_t<_Iter>;
+ indirect_result_t<_Proj&, _Iter> operator*() const; // not defined
+   };
+  };
+  } // namespace __detail
+
   /// [projected], projected
   template _Proj>
-struct projected
-{
-  using value_type = remove_cvref_t>;
-
-  indirect_result_t<_Proj&, _Iter> operator*() const; // not defined
-};
-
-  template
-struct incrementable_traits>
-{ using difference_type = iter_difference_t<_Iter>; };
+using projected = __detail::__projected<_Iter, _Proj>::__type;
 
   // [alg.req], common algorithm requirements
 
diff --git 
a/libstdc++-v3/testsuite/24_iterators/indirect_callable/projected-adl.cc 
b/libstdc++-v3/testsuite/24_iterators/indirect_callable/projected-adl.cc
new file mode 100644
index 000..4c2a0955c6e
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/indirect_callable/projected-adl.cc
@@ -0,0 +1,42 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+// P2538R1 ADL-proof std::projected
+// https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2538r1.html
+
+#include 
+
+template
+  concept has_diff_type = requires { typename T::difference_type; };
+
+static_assert( has_diff_type> );
+
+struct Indy {
+  using value_type = int;
+  int operator*() const { return 0; }
+};
+static_assert( ! std::weakly_incrementable );
+static_assert( ! has_diff_type> );
+
+
+// Examples from the paper:
+
+template struct Holder { T t; };
+struct Incomplete;
+
+void test_concepts()
+{
+  using T = Holder*;
+  static_assert(std::equality_comparable);
+  (void) std::indirectly_comparable>;
+  (void) std::sortable;
+}
+
+#include 
+
+void test_count()
+{
+  Holder* a = nullptr;
+  (void) std::count(, , nullptr);
+  (void) std::ranges::count(, , nullptr); // { dg-bogus "." }
+}
-- 
2.41.0

[committed] libstdc++: Qualify calls to debug mode helpers

2023-06-26 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

These functions should be qualified to disable unwanted ADL.

The overload of __check_singular_aux for safe iterators was previously
being found by ADL, because it wasn't declared before __check_singular.
Add a declaration so that it can be found by qualified lookup.
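A reduced sketch of the same situation, using hypothetical names in a `dbg` namespace rather than the real debug-mode helpers: once the call is qualified, only overloads declared before the calling template are candidates, so the overload for the safe-iterator base needs a declaration ahead of its use.

```cpp
#include <cassert>
#include <memory>

namespace dbg {
  // Hypothetical stand-in for a safe iterator base class.
  struct safe_base { bool singular; };

  // Fallback: an arbitrary object is never considered singular.
  inline bool check_singular_aux(const void*) { return false; }

  // Declared before its first use so that qualified lookup can see it;
  // the definition follows further down.
  inline bool check_singular_aux(const safe_base*);

  template<class Iterator>
  bool check_singular(const Iterator& x) {
    // Qualified call: ADL is disabled, so only the dbg overloads that
    // are declared above this point take part in overload resolution.
    return dbg::check_singular_aux(std::addressof(x));
  }

  inline bool check_singular_aux(const safe_base* b) {
    assert(b != nullptr);
    return b->singular;
  }
}

// An iterator-like type that derives from the base.
struct my_iter : dbg::safe_base {};
```

Calling `dbg::check_singular` on a `my_iter` selects the `safe_base` overload (a derived-to-base pointer conversion is preferred over a conversion to `const void*`), while any unrelated type falls back to the `const void*` overload.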

libstdc++-v3/ChangeLog:

* include/debug/helper_functions.h (__get_distance)
(__check_singular, __valid_range_aux, __valid_range): Qualify
calls to disable ADL.
(__check_singular_aux(const _Safe_iterator_base*)): Declare
overload that was previously found via ADL.
---
 libstdc++-v3/include/debug/helper_functions.h | 32 ---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/debug/helper_functions.h 
b/libstdc++-v3/include/debug/helper_functions.h
index dccf8e9e5e6..052b36b484c 100644
--- a/libstdc++-v3/include/debug/helper_functions.h
+++ b/libstdc++-v3/include/debug/helper_functions.h
@@ -111,12 +111,19 @@ namespace __gnu_debug
 _GLIBCXX_CONSTEXPR
 inline typename _Distance_traits<_Iterator>::__type
 __get_distance(_Iterator __lhs, _Iterator __rhs)
-{ return __get_distance(__lhs, __rhs, std::__iterator_category(__lhs)); }
+{
+  return __gnu_debug::__get_distance(__lhs, __rhs,
+std::__iterator_category(__lhs));
+}
 
   // An arbitrary iterator pointer is not singular.
   inline bool
   __check_singular_aux(const void*) { return false; }
 
+  // Defined in 
+  bool
+  __check_singular_aux(const class _Safe_iterator_base*);
+
   // We may have an iterator that derives from _Safe_iterator_base but isn't
   // a _Safe_iterator.
   template
@@ -125,7 +132,7 @@ namespace __gnu_debug
 __check_singular(_Iterator const& __x)
 {
   return ! std::__is_constant_evaluated()
-  && __check_singular_aux(std::__addressof(__x));
+  && __gnu_debug::__check_singular_aux(std::__addressof(__x));
 }
 
   /** Non-NULL pointers are nonsingular. */
@@ -163,7 +170,8 @@ namespace __gnu_debug
  std::input_iterator_tag)
 {
   return __first == __last
-   || (!__check_singular(__first) && !__check_singular(__last));
+   || (!__gnu_debug::__check_singular(__first)
+ && !__gnu_debug::__check_singular(__last));
 }
 
   template
@@ -172,8 +180,8 @@ namespace __gnu_debug
 __valid_range_aux(_InputIterator __first, _InputIterator __last,
  std::random_access_iterator_tag)
 {
-  return
-   __valid_range_aux(__first, __last, std::input_iterator_tag())
+  return __gnu_debug::__valid_range_aux(__first, __last,
+   std::input_iterator_tag())
&& __first <= __last;
 }
 
@@ -186,8 +194,8 @@ namespace __gnu_debug
 __valid_range_aux(_InputIterator __first, _InputIterator __last,
  std::__false_type)
 {
-  return __valid_range_aux(__first, __last,
-  std::__iterator_category(__first));
+  return __gnu_debug::__valid_range_aux(__first, __last,
+   std::__iterator_category(__first));
 }
 
   template
@@ -197,10 +205,11 @@ namespace __gnu_debug
  typename _Distance_traits<_InputIterator>::__type& __dist,
  std::__false_type)
 {
-  if (!__valid_range_aux(__first, __last, std::input_iterator_tag()))
+  if (!__gnu_debug::__valid_range_aux(__first, __last,
+ std::input_iterator_tag()))
return false;
 
-  __dist = __get_distance(__first, __last);
+  __dist = __gnu_debug::__get_distance(__first, __last);
   switch (__dist.second)
{
case __dp_none:
@@ -231,7 +240,8 @@ namespace __gnu_debug
  typename _Distance_traits<_InputIterator>::__type& __dist)
 {
   typedef typename std::__is_integer<_InputIterator>::__type _Integral;
-  return __valid_range_aux(__first, __last, __dist, _Integral());
+  return __gnu_debug::__valid_range_aux(__first, __last, __dist,
+   _Integral());
 }
 
   template
@@ -254,7 +264,7 @@ namespace __gnu_debug
 __valid_range(_InputIterator __first, _InputIterator __last)
 {
   typedef typename std::__is_integer<_InputIterator>::__type _Integral;
-  return __valid_range_aux(__first, __last, _Integral());
+  return __gnu_debug::__valid_range_aux(__first, __last, _Integral());
 }
 
   template
-- 
2.41.0

Re: [PATCH] Convert remaining uses of value_range in ipa-*.cc to Value_Range.

2023-06-26 Thread Martin Jambor

Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> Minor cleanups to get rid of value_range in IPA.  There's only one left,
> but it's in the switch code which is integer specific.
>
> OK?

With the same request that...

>
> gcc/ChangeLog:
>
>   * ipa-cp.cc (decide_whether_version_node): Adjust comment.
>   * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
>   for Value_Range.
>   (set_switch_stmt_execution_predicate): Same.
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
> ---
>  gcc/ipa-cp.cc|  3 +--
>  gcc/ipa-fnsummary.cc | 22 ++
>  gcc/ipa-prop.cc  |  9 +++--
>  3 files changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 03273666ea2..2e64415096e 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6287,8 +6287,7 @@ decide_whether_version_node (struct cgraph_node *node)
>   {
> /* If some values generated for self-recursive calls with
>arithmetic jump functions fall outside of the known
> -  value_range for the parameter, we can skip them.  VR interface
> -  supports this only for integers now.  */
> +  range for the parameter, we can skip them.  */
> if (TREE_CODE (val->value) == INTEGER_CST
> && !plats->m_value_range.bottom_p ()
> && !ipa_range_contains_p (plats->m_value_range.m_vr,
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index 0474af8991e..1ce8501fe85 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -488,19 +488,20 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
> if (vr.varying_p () || vr.undefined_p ())
>   break;
>  
> -   value_range res;
> +   Value_Range res (op->type);
> if (!op->val[0])
>   {
> +   Value_Range varying (op->type);
> +   varying.set_varying (op->type);
> range_op_handler handler (op->code, op->type);
> if (!handler
> || !res.supports_type_p (op->type)
> -   || !handler.fold_range (res, op->type, vr,
> -   value_range (op->type)))
> +   || !handler.fold_range (res, op->type, vr, varying))
>   res.set_varying (op->type);
>   }
> else if (!op->val[1])
>   {
> -   value_range op0;
> +   Value_Range op0 (op->type);
> range_op_handler handler (op->code, op->type);
>  
> ipa_range_set_and_normalize (op0, op->val[0]);
> @@ -518,14 +519,14 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
>   }
> if (!vr.varying_p () && !vr.undefined_p ())
>   {
> -   value_range res;
> -   value_range val_vr;
> +   int_range<2> res;
> +   Value_Range val_vr (TREE_TYPE (c->val));
> range_op_handler handler (c->code, boolean_type_node);
>  
> ipa_range_set_and_normalize (val_vr, c->val);
>  
> if (!handler
> -   || !res.supports_type_p (boolean_type_node)
> +   || !val_vr.supports_type_p (TREE_TYPE (c->val))
> || !handler.fold_range (res, boolean_type_node, vr, 
> val_vr))
>   res.set_varying (boolean_type_node);
>  
> @@ -1687,12 +1688,17 @@ set_switch_stmt_execution_predicate (struct 
> ipa_func_body_info *fbi,
>int bound_limit = opt_for_fn (fbi->node->decl,
>   param_ipa_max_switch_predicate_bounds);
>int bound_count = 0;
> -  value_range vr;
> +  // This can safely be an integer range, as switches can only hold
> +  // integers.
> +  int_range<2> vr;
>  
>get_range_query (cfun)->range_of_expr (vr, op);
>if (vr.undefined_p ())
>  vr.set_varying (TREE_TYPE (op));
>tree vr_min, vr_max;
> +  // ?? This entire function could use a rewrite to use the irange
> +  // API, instead of trying to recreate its intersection/union logic.
> +  // Any use of get_legacy_range() is a serious code smell.

you replace "??" with TODO, because that is presumably what you mean.

OK with that change.

Thanks,

Martin


>value_range_kind vr_type = get_legacy_range (vr, vr_min, vr_max);
>wide_int vr_wmin = wi::to_wide (vr_min);
>wide_int vr_wmax = wi::to_wide (vr_max);
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 6383bc11e0a..5f9e6dbbff2 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2348,7 +2348,6 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>gcall *call = cs->call_stmt;
>int n, arg_num = gimple_call_num_args (call);
>bool useful_context = false;
> -

Re: [PATCH] Implement ipa_vr hashing.

2023-06-26 Thread Martin Jambor

Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> Implement hashing for ipa_vr.  When all is said and done, all these
> patches incur a 7.64% slowdown for ipa-cp, which is entirely covered by
> the similar 7% increase in this area last week.  So we get type agnostic
> ranges with "infinite" range precision close to free.
>
> There is no change in overall compilation.
>
> OK?
>

One small request

> gcc/ChangeLog:
>
>   * ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Adjust for use with
>   ipa_vr instead of value_range.
>   (gt_pch_nx): Same.
>   (gt_ggc_mx): Same.
>   (ipa_get_value_range): Same.
>   * value-range.cc (gt_pch_nx): Move to ipa-prop.cc and adjust for
>   ipa_vr.
>   (gt_ggc_mx): Same.
> ---
>  gcc/ipa-prop.cc| 76 +++---
>  gcc/value-range.cc | 15 -
>  2 files changed, 45 insertions(+), 46 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index c46a89f1b49..6383bc11e0a 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -109,53 +109,53 @@ struct ipa_bit_ggc_hash_traits : public 
> ggc_cache_remove 
>  /* Hash table for avoid repeated allocations of equal ipa_bits.  */
>  static GTY ((cache)) hash_table 
> *ipa_bits_hash_table;
>  
> -/* Traits for a hash table for reusing value_ranges used for IPA.  Note that
> -   the equiv bitmap is not hashed and is expected to be NULL.  */
> +/* Traits for a hash table for reusing ranges.  */
>  
> -struct ipa_vr_ggc_hash_traits : public ggc_cache_remove <value_range *>
> +struct ipa_vr_ggc_hash_traits : public ggc_cache_remove <ipa_vr *>
>  {
> -  typedef value_range *value_type;
> -  typedef value_range *compare_type;
> +  typedef ipa_vr *value_type;
> +  typedef const vrange *compare_type;
>static hashval_t
> -  hash (const value_range *p)
> +  hash (const ipa_vr *p)
>  {
> -  tree min, max;
> -  value_range_kind kind = get_legacy_range (*p, min, max);
> -  inchash::hash hstate (kind);
> -  inchash::add_expr (min, hstate);
> -  inchash::add_expr (max, hstate);
> +  // This never get called, except in the verification code, as
> +  // ipa_get_value_range() calculates the hash itself.  This
> +  // function is mostly here for completness' sake.
> +  Value_Range vr;
> +  p->get_vrange (vr);
> +  inchash::hash hstate;
> +  add_vrange (vr, hstate);
>return hstate.end ();
>  }
>static bool
> -  equal (const value_range *a, const value_range *b)
> +  equal (const ipa_vr *a, const vrange *b)
>  {
> -  return (types_compatible_p (a->type (), b->type ())
> -   && *a == *b);
> +  return a->equal_p (*b);
>  }
>static const bool empty_zero_p = true;
>static void
> -  mark_empty (value_range *&p)
> +  mark_empty (ipa_vr *&p)
>  {
>p = NULL;
>  }
>static bool
> -  is_empty (const value_range *p)
> +  is_empty (const ipa_vr *p)
>  {
>return p == NULL;
>  }
>static bool
> -  is_deleted (const value_range *p)
> +  is_deleted (const ipa_vr *p)
>  {
> -  return p == reinterpret_cast<const value_range *> (1);
> +  return p == reinterpret_cast<const ipa_vr *> (1);
>  }
>static void
> -  mark_deleted (value_range *&p)
> +  mark_deleted (ipa_vr *&p)
>  {
> -  p = reinterpret_cast<value_range *> (1);
> +  p = reinterpret_cast<ipa_vr *> (1);
>  }
>  };
>  
> -/* Hash table for avoid repeated allocations of equal value_ranges.  */
> +/* Hash table for avoid repeated allocations of equal ranges.  */
>  static GTY ((cache)) hash_table *ipa_vr_hash_table;
>  
>  /* Holders of ipa cgraph hooks: */
> @@ -265,6 +265,22 @@ ipa_vr::dump (FILE *out) const
>  fprintf (out, "NO RANGE");
>  }
>  
> +// ?? These stubs are because we use an ipa_vr in a hash_traits and
> +// hash-traits.h defines an extern of gt_ggc_mx (T &) instead of
> +// picking up the gt_ggc_mx (T *) version.

If you mean FIXME or TODO, please replace the "??" string with one of
those.  Otherwise please just remove it or specify what you mean in some
clearer way.

OK with that change.

Thanks,

Martin



> +void
> +gt_pch_nx (ipa_vr *&x)
> +{
> +  return gt_pch_nx ((ipa_vr *) x);
> +}
> +
> +void
> +gt_ggc_mx (ipa_vr *&x)
> +{
> +  return gt_ggc_mx ((ipa_vr *) x);
> +}
> +
> +

[...]
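For readers unfamiliar with the sentinel convention used by the quoted hash traits, here is a standalone sketch. It is not GCC code: `range_entry`, `range_ptr_traits` and `erase_slot` are hypothetical names. A slot holds a pointer; a null pointer marks an empty slot and the address 1 marks a deleted one, and the marking functions take the slot by reference so that the assignment is visible to the caller.

```cpp
#include <cassert>

// Hypothetical payload type stored in the table.
struct range_entry { int lo, hi; };

struct range_ptr_traits {
  using value_type = range_entry*;

  static bool is_empty(const range_entry* p) { return p == nullptr; }

  static bool is_deleted(const range_entry* p) {
    // Address 1 is never a valid object address, so it can serve as a
    // tombstone value.
    return p == reinterpret_cast<const range_entry*>(1);
  }

  // The slot is taken by reference so the caller's pointer is updated.
  static void mark_empty(range_entry*& p) { p = nullptr; }

  static void mark_deleted(range_entry*& p) {
    p = reinterpret_cast<range_entry*>(1);
  }
};

// Turn a live slot into a tombstone.
inline void erase_slot(range_entry*& slot) {
  assert(!range_ptr_traits::is_empty(slot)
         && !range_ptr_traits::is_deleted(slot));
  range_ptr_traits::mark_deleted(slot);
}
```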

Re: [PATCH] Convert ipa_jump_func to use ipa_vr instead of a value_range.

2023-06-26 Thread Martin Jambor

Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> This patch converts the ipa_jump_func code to use the type agnostic
> ipa_vr suitable for GC instead of value_range which is integer specific.
>
> I've disabled the range caching to simplify the patch for review, but
> it is handled in the next patch in the series.
>
> OK?
>
> gcc/ChangeLog:
>
>   * ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
>   * ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
>   (ipa_set_jfunc_vr): Take a range.
>   (ipa_compute_jump_functions_for_edge): Pass range to
>   ipa_set_jfunc_vr.
>   (ipa_write_jump_function): Call streamer write helper.
>   (ipa_read_jump_function): Call streamer read helper.
>   * ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.

OK, thanks, and sorry for the wait; I was unexpectedly traveling
last week.

Martin

> ---
>  gcc/ipa-cp.cc   | 15 +++
>  gcc/ipa-prop.cc | 70 ++---
>  gcc/ipa-prop.h  |  5 +++-
>  3 files changed, 44 insertions(+), 46 deletions(-)
>
[...]

Fix profile of forwarders produced by cd-dce

2023-06-26 Thread Jan Hubicka via Gcc-patches

Hi,
compiling the testcase from PR109849 (which uses a std::vector-based stack to
drive a loop) with profile feedback leads to profile mismatches introduced by
tree-ssa-dce.  They come from the new code that produces unified forwarder
blocks for PHIs.

I am not including the testcase itself, since checking it for "Invalid sum"
is probably going to be too fragile, and this should show up in our LNT
testers.  The patch does, however, fix the mismatch.

Bootstrapped/regtested x86_64-linux and plan to commit it shortly.
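As a toy model of the bookkeeping the patch adds (hypothetical types, not GCC's profile_count or edge): the counts of the edges redirected to the forwarder in the loop are accumulated, and the total is what the patch assigns to the forwarder block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical edge carrying only an execution count.
struct Edge { std::uint64_t count; };

// Sum the counts of the edges in [first, last), i.e. the edges that the
// loop redirects to the forwarder block.
inline std::uint64_t redirected_count(const std::vector<Edge>& edges,
                                      std::size_t first, std::size_t last) {
  assert(first <= last && last <= edges.size());
  std::uint64_t total = 0;
  for (std::size_t j = first; j < last; ++j)
    total += edges[j].count;
  return total;
}
```

For edges with counts 10, 20 and 30, redirecting the last two gives a total of 50.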

gcc/ChangeLog:

PR tree-optimization/109849
* tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix profile
count of newly constructed forwarder block.

diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 2949957f883..f0b02456132 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -1865,12 +1865,15 @@ make_forwarders_with_degenerate_phis (function *fn)
}
  free_dominance_info (fn, CDI_DOMINATORS);
  basic_block forwarder = split_edge (args[start].first);
+ profile_count count = profile_count::zero ();
  for (unsigned j = start + 1; j < i; ++j)
{
  edge e = args[j].first;
  redirect_edge_and_branch_force (e, forwarder);
  redirect_edge_var_map_clear (e);
+ count += e->count ();
}
+ forwarder->count = count;
  if (vphi)
{
  tree def = copy_ssa_name (vphi_args[0]);

Re: [PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

On Sun, Jun 25, 2023 at 10:59 PM Jan-Benedict Glaw  wrote:
>
> Hi Andrew,
>
> On Fri, 2023-05-05 08:17:19 -0700, Andrew Pinski via Gcc-patches 
>  wrote:
> > While looking into a different issue, I noticed that it
> > would take until the second forwprop pass to do some
> > forward proping and it was because the ssa name was
> > used more than once but the second statement was
> > "dead" and we don't remove that until much later.
> [...]
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Since this patch, I see a bit of fallout building the Linux kernel
> using the adder875_defconfig:
>
> # CC  arch/powerpc/kernel/ptrace/ptrace-view.o
>   powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/ptrace/.ptrace-view.o.d 
> -nostdinc -I./arch/powerpc/include -I./arch/powerpc/include/generated  
> -I./include -I./arch/powerpc/include/uapi 
> -I./arch/powerpc/include/generated/uapi -I./include/uapi 
> -I./include/generated/uapi -include ./include/linux/compiler-version.h 
> -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h 
> -D__KERNEL__ -I ./arch/powerpc -fmacro-prefix-map=./= -Wall -Wundef 
> -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common 
> -fshort-wchar -fno-PIE -Werror=implicit-function-declaration 
> -Werror=implicit-int -Werror=return-type -Wno-format-security -funsigned-char 
> -std=gnu11 -mbig-endian -m32 -msoft-float -pipe -ffixed-r2 -mmultiple 
> -mno-readonly-in-sdata -mcpu=860 -mno-prefixed -mno-pcrel -mno-altivec 
> -mno-vsx -mno-mma -fno-asynchronous-unwind-tables -mno-string -mbig-endian 
> -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation 
> -Wno-format-overflow -Wno-address-of-packed-member -O2 
> -fno-allow-store-data-races -Wframe-larger-than=1024 -fstack-protector-strong 
> -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable 
> -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero 
> -fno-stack-clash-protection -Wdeclaration-after-statement -Wvla 
> -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation 
> -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized 
> -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 
> -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time 
> -Werror=incompatible-pointer-types -Werror=designated-init 
> -Wno-packed-not-aligned -g -mstack-protector-guard-offset=544 -Werror 
> -DUTS_MACHINE='"ppc"'
> -DKBUILD_MODFILE='"arch/powerpc/kernel/ptrace/ptrace-view"' 
> -DKBUILD_BASENAME='"ptrace_view"' -DKBUILD_MODNAME='"ptrace_view"' 
> -D__KBUILD_MODNAME=kmod_ptrace_view -c -o 
> arch/powerpc/kernel/ptrace/ptrace-view.o 
> arch/powerpc/kernel/ptrace/ptrace-view.c
> during GIMPLE pass: pre
> arch/powerpc/kernel/ptrace/ptrace-view.c: In function 'gpr32_set_common':
> arch/powerpc/kernel/ptrace/ptrace-view.c:649:5: internal compiler error: in 
> gimple_redirect_edge_and_branch, at tree-cfg.cc:6262
>   649 | int gpr32_set_common(struct task_struct *target,
>   | ^~~~
> 0x1a562a6 internal_error(char const*, ...)
>  ???:0
> 0x826ea1 fancy_abort(char const*, int, char const*)
>  ???:0
> 0x9b77c9 redirect_edge_and_branch(edge_def*, basic_block_def*)
>  ???:0
> 0x9b7e43 split_edge(edge_def*)
>  ???:0
> 0xee1cc7 split_critical_edges(bool)
>  ???:0
> Please submit a full bug report, with preprocessed source (by using 
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
> make[4]: *** [scripts/Makefile.build:252: 
> arch/powerpc/kernel/ptrace/ptrace-view.o] Error 1
> make[3]: *** [scripts/Makefile.build:494: arch/powerpc/kernel/ptrace] Error 2
> make[2]: *** [scripts/Makefile.build:494: arch/powerpc/kernel] Error 2
> make[1]: *** [scripts/Makefile.build:494: arch/powerpc] Error 2
> make: *** [Makefile:2026: .] Error 2

Can you file a bug (https://gcc.gnu.org/bugzilla/) with the
preprocessed source (which -freport-bug will provide)? In the meantime
I will try to reproduce it and see what is going on.

Thanks,
Andrew


>
>
> (Full log at 
> http://toolchain.lug-owl.de/laminar/jobs/linux-powerpc-adder875_defconfig/100)
>
> Thanks,
>   Jan-Benedict
>
> --

[PATCH 2/2] AArch64: New RTL for ABDL

2023-06-26 Thread Oluwatamilore Adebayo via Gcc-patches

From: oluade01 

This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
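For reference, a scalar model of what the signed instructions compute, as I read the pattern: the maximum of the two extended operands minus their minimum, applied to the low half (sabdl) or the high half (sabdl2) of the input lanes. The function names are hypothetical and the sketch assumes sixteen 8-bit lanes widened to eight 16-bit results; the other VQW modes are analogous.

```cpp
#include <cassert>
#include <cstdint>

// Widening absolute difference of one signed lane: extend both
// operands first, then take max minus min.
constexpr std::int16_t sabd_widen(std::int8_t a, std::int8_t b) {
  return static_cast<std::int16_t>(
      (a > b ? std::int16_t{a} : std::int16_t{b})
      - (a > b ? std::int16_t{b} : std::int16_t{a}));
}

// Unsigned counterpart.
constexpr std::uint16_t uabd_widen(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint16_t>(
      (a > b ? std::uint16_t{a} : std::uint16_t{b})
      - (a > b ? std::uint16_t{b} : std::uint16_t{a}));
}

// Model of sabdl (high == false) and sabdl2 (high == true) on sixteen
// input lanes, producing eight widened results.
inline void sabdl_half(const std::int8_t* a, const std::int8_t* b,
                       std::int16_t* out, bool high) {
  assert(a && b && out);
  const int base = high ? 8 : 0;
  for (int i = 0; i < 8; ++i)
    out[i] = sabd_widen(a[base + i], b[base + i]);
}
```

Because the operands are extended before the subtraction, the result for -128 and 127 is 255, which would not fit in the 8-bit input type.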

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(vec_widen_abdl_lo_, vec_widen_abdl_hi_):
Expansions for abd vec widen optabs.
(aarch64_abdl_insn): VQW based abdl RTL.
* config/aarch64/iterators.md (USMAX_EXT): Code attributes
that give the appropriate extend RTL for the max RTL.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
---
 gcc/config/aarch64/aarch64-simd.md| 70 
 gcc/config/aarch64/iterators.md   |  3 +
 gcc/testsuite/gcc.target/aarch64/abd_2.c  | 62 --
 gcc/testsuite/gcc.target/aarch64/abd_3.c  | 63 --
 gcc/testsuite/gcc.target/aarch64/abd_4.c  | 47 +--
 gcc/testsuite/gcc.target/aarch64/abd_none_2.c | 73 
 gcc/testsuite/gcc.target/aarch64/abd_none_3.c | 73 
 gcc/testsuite/gcc.target/aarch64/abd_none_4.c | 84 +++
 gcc/testsuite/gcc.target/aarch64/abd_run_1.c  | 29 +++
 gcc/testsuite/gcc.target/aarch64/sve/abd_1.c  | 67 +--
 gcc/testsuite/gcc.target/aarch64/sve/abd_2.c  | 53 ++--
 .../gcc.target/aarch64/sve/abd_none_1.c   | 73 
 .../gcc.target/aarch64/sve/abd_none_2.c   | 80 ++
 13 files changed, 749 insertions(+), 28 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
bf90202ba2ad3f62f2020486d21256f083effb07..36fefd0a96801479fbf6469a3fbcef4a0b8cad6f
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -975,6 +975,76 @@ (define_expand "aarch64_abdl2"
   }
 )
 
+(define_insn "aarch64_abdl_hi_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (minus:
+ (USMAX:
+   (:
+ (vec_select:
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+   (:
+ (vec_select:
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3
+ (:
+   (:
+ (vec_select: (match_dup 1) (match_dup 3)))
+   (:
+ (vec_select: (match_dup 2) (match_dup 3))]
+  "TARGET_SIMD"
+  "abdl2\t%0., %1., %2."
+  [(set_attr "type" "neon_abd_long")]
+)
+
+(define_insn "aarch64_abdl_lo_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (minus:
+ (USMAX:
+   (: (vec_select:
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+   (: (vec_select:
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3
+ (:
+   (: (vec_select:
+   (match_dup 1)
+   (match_dup 3)))
+   (: (vec_select:
+   (match_dup 2)
+   (match_dup 3))]
+  "TARGET_SIMD"
+  "abdl\t%0., %1., %2."
+  [(set_attr "type" "neon_abd_long")]
+)
+
+(define_expand "vec_widen_abd_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_abdl_hi_internal (operands[0], 
operands[1],
+  operands[2], p));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_abd_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_abdl_lo_internal (operands[0], 
operands[1],
+  operands[2], p));
+DONE;
+  }
+)
+
 (define_insn "aarch64_abal"
   [(set (match_operand: 0 "register_operand" "=w")
(plus:
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
d9c7354730ac5870c0042f1e30fb1140a117d110..1385842d0a51b3f4a0871af4d0bbbe5299d4
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1984,6 +1984,9 @@ (define_code_attr max_opp

[PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-26 Thread Oluwatamilore Adebayo via Gcc-patches

From: oluade01 

This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).

gcc/ChangeLog:

* internal-fn.cc (widening_fn_p, decomposes_to_hilo_fn_p):
Add IFN_VEC_WIDEN_ABD to the switch statement.
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_optab, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_optab, vec_widen_uabd_even_optab):
New optabs.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update
to build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.
---
 gcc/internal-fn.def   |   5 ++
 gcc/optabs.def|  10 +++
 gcc/tree-vect-patterns.cc | 174 --
 3 files changed, 161 insertions(+), 28 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 116965f4830cec8f60642ff011a86b6562e2c509..d67274d68b49943a88c531e903fd03b42343ab97 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -352,6 +352,11 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
first,
vec_widen_ssub, vec_widen_usub,
binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
+   ECF_CONST | ECF_NOTHROW,
+   first,
+   vec_widen_sabd, vec_widen_uabd,
+   binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 35b835a6ac56d72417dac8ddfd77a8a7e2475e65..68dfa1550f791a2fe833012157601ecfa68f1e09 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -418,6 +418,11 @@ OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
 OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
 OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
 OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
+OPTAB_D (vec_widen_sabd_optab, "vec_widen_sabd_$a")
+OPTAB_D (vec_widen_sabd_hi_optab, "vec_widen_sabd_hi_$a")
+OPTAB_D (vec_widen_sabd_lo_optab, "vec_widen_sabd_lo_$a")
+OPTAB_D (vec_widen_sabd_odd_optab, "vec_widen_sabd_odd_$a")
+OPTAB_D (vec_widen_sabd_even_optab, "vec_widen_sabd_even_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -436,6 +441,11 @@ OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
 OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
 OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
 OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
+OPTAB_D (vec_widen_uabd_optab, "vec_widen_uabd_$a")
+OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
+OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
+OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
+OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
 OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
 OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
 OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index e2392113bff4065c909aefc760b4c48978b73a5a..852c4e99edb19d215c354b666b991c87c48620b4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1404,15 +1404,28 @@ vect_recog_sad_pattern (vec_info *vinfo,
   gcall *abd_stmt = dyn_cast <gcall *> (abs_stmt_vinfo->stmt);
   if (!abd_stmt
  || !gimple_call_internal_p (abd_stmt)
- || gimple_call_internal_fn (abd_stmt) != IFN_ABD)
+ || gimple_call_num_args (abd_stmt) != 2)
return NULL;
 
   tree abd_oprnd0 = gimple_call_arg (abd_stmt, 0);
   tree abd_oprnd1 = gimple_call_arg (abd_stmt, 1);
 
-  if (!vect_look_through_possible_promotion (vinfo, abd_oprnd0, &unprom[0])
- || !vect_look_through_possible_promotion (vinfo, abd_oprnd1,
-   &unprom[1]))
+  if (gimple_call_internal_fn (abd_stmt) == IFN_ABD)
+   {
+ if (!vect_look_through_possible_promotion (vinfo, abd_oprnd0,
+&unprom[0])
+ || !vect_look_through_possible_promotion (vinfo, abd_oprnd1,
+   &unprom[1]))
+   return NULL;
+   }
+  else if (gimple_call_internal_fn (abd_stmt) == IFN_VEC_WIDEN_ABD)
+   {
+ unprom[0].op

Re: [PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)

2023-06-26 Thread Christophe Lyon via Gcc-patches

On Mon, 26 Jun 2023 at 17:30, Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On Mon, 26 Jun 2023 at 20:33, Christophe Lyon via Gcc-patches
>  wrote:
> >
> > After the recent MVE intrinsics re-implementation, LTO stopped working
> > because the intrinsics would no longer be defined.
> >
> > The main part of the patch is simple and similar to what we do for
> > AArch64:
> > - call handle_arm_mve_h() from arm_init_mve_builtins to declare the
> >   intrinsics when the compiler is in LTO mode
> > - actually implement arm_builtin_decl for MVE.
> >
> > It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
> > its value in the user code cannot be guessed at LTO time, so we always
> > have to assume that it was not defined.  This led to a few fixes in the
> > way we register MVE builtins as placeholders or not.  Without this
> > patch, we would just omit some versions of the intrinsics when
> > __ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
> > placeholders, we need to always keep entries for all of them to ensure
> > that we have a consistent numbering scheme.
> >
> > 2023-06-26  Christophe Lyon   
> >
> > PR target/110268
> > gcc/
> > * config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
> > (arm_builtin_decl): Handle MVE builtins.
> > * config/arm/arm-mve-builtins.cc (builtin_decl): New function.
> > (add_unique_function): Fix handling of
> > __ARM_MVE_PRESERVE_USER_NAMESPACE.
> > (add_overloaded_function): Likewise.
> > * config/arm/arm-protos.h (builtin_decl): New declaration.
> >
> > gcc/testsuite/
> > * gcc.target/arm/pr110268-1.c: New test.
> > * gcc.target/arm/pr110268-2.c: New test.
> > ---
> >  gcc/config/arm/arm-builtins.cc| 11 +++-
> >  gcc/config/arm/arm-mve-builtins.cc| 61 ---
> >  gcc/config/arm/arm-protos.h   |  1 +
> >  gcc/testsuite/gcc.target/arm/pr110268-1.c | 11 
> >  gcc/testsuite/gcc.target/arm/pr110268-2.c | 22 
> >  5 files changed, 76 insertions(+), 30 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-2.c
> >
> > diff --git a/gcc/config/arm/arm-builtins.cc
> b/gcc/config/arm/arm-builtins.cc
> > index 36365e40a5b..fca7dcaf565 100644
> > --- a/gcc/config/arm/arm-builtins.cc
> > +++ b/gcc/config/arm/arm-builtins.cc
> > @@ -1918,6 +1918,15 @@ arm_init_mve_builtins (void)
> >arm_builtin_datum *d = _builtin_data[i];
> >arm_init_builtin (fcode, d, "__builtin_mve");
> >  }
> > +
> > +  if (in_lto_p)
> > +{
> > +  arm_mve::handle_arm_mve_types_h ();
> > +  /* Under LTO, we cannot know whether
> > +__ARM_MVE_PRESERVE_USER_NAMESPACE was defined, so assume it
> > +was not.  */
> > +  arm_mve::handle_arm_mve_h (false);
> > +}
> >  }
> >
> >  /* Set up all the NEON builtins, even builtins for instructions that
> are not
> > @@ -2723,7 +2732,7 @@ arm_builtin_decl (unsigned code, bool initialize_p
> ATTRIBUTE_UNUSED)
> >  case ARM_BUILTIN_GENERAL:
> >return arm_general_builtin_decl (subcode);
> >  case ARM_BUILTIN_MVE:
> > -  return error_mark_node;
> > +  return arm_mve::builtin_decl (subcode);
> >  default:
> >gcc_unreachable ();
> >  }
> > diff --git a/gcc/config/arm/arm-mve-builtins.cc
> b/gcc/config/arm/arm-mve-builtins.cc
> > index 7033e41a571..e9a12f27411 100644
> > --- a/gcc/config/arm/arm-mve-builtins.cc
> > +++ b/gcc/config/arm/arm-mve-builtins.cc
> > @@ -493,6 +493,16 @@ handle_arm_mve_h (bool preserve_user_namespace)
> >  preserve_user_namespace);
> >  }
> >
> > +/* Return the function decl with SVE function subcode CODE, or
> error_mark_node
> > +   if no such function exists.  */
> Hi Christophe,
> Sorry to nitpick -- s/SVE/MVE ? :)
>
Gasp, I must confess you are right ;-)

Thanks,

Christophe


> Thanks,
> Prathamesh
> > +tree
> > +builtin_decl (unsigned int code)
> > +{
> > +  if (code >= vec_safe_length (registered_functions))
> > +return error_mark_node;
> > +  return (*registered_functions)[code]->decl;
> > +}
> > +
> >  /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
> > purposes.  */
> >  static bool
> > @@ -849,7 +859,6 @@ function_builder::add_function (const
> function_instance ,
> >  ? integer_zero_node
> >  : simulate_builtin_function_decl (input_location, name, fntype,
> >   code, NULL, attrs);
> > -
> >registered_function  = *ggc_alloc  ();
> >rfn.instance = instance;
> >rfn.decl = decl;
> > @@ -889,15 +898,12 @@ function_builder::add_unique_function (const
> function_instance ,
> >gcc_assert (!*rfn_slot);
> >*rfn_slot = 
> >
> > -  /* Also add the non-prefixed non-overloaded function, if the user
> namespace
> > - does not

Re: [PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)

2023-06-26 Thread Prathamesh Kulkarni via Gcc-patches

On Mon, 26 Jun 2023 at 20:33, Christophe Lyon via Gcc-patches
 wrote:
>
> After the recent MVE intrinsics re-implementation, LTO stopped working
> because the intrinsics would no longer be defined.
>
> The main part of the patch is simple and similar to what we do for
> AArch64:
> - call handle_arm_mve_h() from arm_init_mve_builtins to declare the
>   intrinsics when the compiler is in LTO mode
> - actually implement arm_builtin_decl for MVE.
>
> It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
> its value in the user code cannot be guessed at LTO time, so we always
> have to assume that it was not defined.  This led to a few fixes in the
> way we register MVE builtins as placeholders or not.  Without this
> patch, we would just omit some versions of the intrinsics when
> __ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
> placeholders, we need to always keep entries for all of them to ensure
> that we have a consistent numbering scheme.
>
> 2023-06-26  Christophe Lyon   
>
> PR target/110268
> gcc/
> * config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
> (arm_builtin_decl): Handle MVE builtins.
> * config/arm/arm-mve-builtins.cc (builtin_decl): New function.
> (add_unique_function): Fix handling of
> __ARM_MVE_PRESERVE_USER_NAMESPACE.
> (add_overloaded_function): Likewise.
> * config/arm/arm-protos.h (builtin_decl): New declaration.
>
> gcc/testsuite/
> * gcc.target/arm/pr110268-1.c: New test.
> * gcc.target/arm/pr110268-2.c: New test.
> ---
>  gcc/config/arm/arm-builtins.cc| 11 +++-
>  gcc/config/arm/arm-mve-builtins.cc| 61 ---
>  gcc/config/arm/arm-protos.h   |  1 +
>  gcc/testsuite/gcc.target/arm/pr110268-1.c | 11 
>  gcc/testsuite/gcc.target/arm/pr110268-2.c | 22 
>  5 files changed, 76 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-2.c
>
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index 36365e40a5b..fca7dcaf565 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -1918,6 +1918,15 @@ arm_init_mve_builtins (void)
>arm_builtin_datum *d = _builtin_data[i];
>arm_init_builtin (fcode, d, "__builtin_mve");
>  }
> +
> +  if (in_lto_p)
> +{
> +  arm_mve::handle_arm_mve_types_h ();
> +  /* Under LTO, we cannot know whether
> +__ARM_MVE_PRESERVE_USER_NAMESPACE was defined, so assume it
> +was not.  */
> +  arm_mve::handle_arm_mve_h (false);
> +}
>  }
>
>  /* Set up all the NEON builtins, even builtins for instructions that are not
> @@ -2723,7 +2732,7 @@ arm_builtin_decl (unsigned code, bool initialize_p 
> ATTRIBUTE_UNUSED)
>  case ARM_BUILTIN_GENERAL:
>return arm_general_builtin_decl (subcode);
>  case ARM_BUILTIN_MVE:
> -  return error_mark_node;
> +  return arm_mve::builtin_decl (subcode);
>  default:
>gcc_unreachable ();
>  }
> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> b/gcc/config/arm/arm-mve-builtins.cc
> index 7033e41a571..e9a12f27411 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -493,6 +493,16 @@ handle_arm_mve_h (bool preserve_user_namespace)
>  preserve_user_namespace);
>  }
>
> +/* Return the function decl with SVE function subcode CODE, or 
> error_mark_node
> +   if no such function exists.  */
Hi Christophe,
Sorry to nitpick -- s/SVE/MVE ? :)

Thanks,
Prathamesh
> +tree
> +builtin_decl (unsigned int code)
> +{
> +  if (code >= vec_safe_length (registered_functions))
> +return error_mark_node;
> +  return (*registered_functions)[code]->decl;
> +}
> +
>  /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
> purposes.  */
>  static bool
> @@ -849,7 +859,6 @@ function_builder::add_function (const function_instance 
> ,
>  ? integer_zero_node
>  : simulate_builtin_function_decl (input_location, name, fntype,
>   code, NULL, attrs);
> -
>registered_function  = *ggc_alloc  ();
>rfn.instance = instance;
>rfn.decl = decl;
> @@ -889,15 +898,12 @@ function_builder::add_unique_function (const 
> function_instance ,
>gcc_assert (!*rfn_slot);
>*rfn_slot = 
>
> -  /* Also add the non-prefixed non-overloaded function, if the user namespace
> - does not need to be preserved.  */
> -  if (!preserve_user_namespace)
> -{
> -  char *noprefix_name = get_name (instance, false, false);
> -  tree attrs = get_attributes (instance);
> -  add_function (instance, noprefix_name, fntype, attrs, requires_float,
> -   false, false);
> -}
> +  /* Also add the non-prefixed non-overloaded function, as placeholder
>

[PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)

2023-06-26 Thread Christophe Lyon via Gcc-patches

After the recent MVE intrinsics re-implementation, LTO stopped working
because the intrinsics would no longer be defined.

The main part of the patch is simple and similar to what we do for
AArch64:
- call handle_arm_mve_h() from arm_init_mve_builtins to declare the
  intrinsics when the compiler is in LTO mode
- actually implement arm_builtin_decl for MVE.

It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
its value in the user code cannot be guessed at LTO time, so we always
have to assume that it was not defined.  This led to a few fixes in the
way we register MVE builtins as placeholders or not.  Without this
patch, we would just omit some versions of the intrinsics when
__ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
placeholders, we need to always keep entries for all of them to ensure
that we have a consistent numbering scheme.
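
To illustrate the numbering concern, here is a simplified standalone model. It is not the code in arm-mve-builtins.cc; `Entry`, `Registry`, `add`, `lookup` and `build` are hypothetical names, and the intrinsic names are only sample data. The point is that every function is always appended, so a given code refers to the same function whether or not the user namespace is preserved, and only the placeholder flag differs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a registered builtin (assumption: the real
// structure holds a function instance and a tree decl instead).
struct Entry {
  std::string name;
  bool placeholder;  // true: occupies a code but is not user-visible
};

struct Registry {
  std::vector<Entry> entries;

  // Always append, so a function's code never depends on whether the
  // user namespace is preserved; only the placeholder flag changes.
  std::size_t add(const std::string &name, bool placeholder) {
    entries.push_back({name, placeholder});
    return entries.size() - 1;
  }

  // Bounds-checked lookup, similar in shape to builtin_decl in the patch:
  // an out-of-range code yields nullptr rather than reading past the end.
  const Entry *lookup(std::size_t code) const {
    return code < entries.size() ? &entries[code] : nullptr;
  }
};

// Register each prefixed name together with its non-prefixed alias.
inline Registry build(bool preserve_user_namespace) {
  Registry r;
  r.add("__arm_vaddq", false);
  r.add("vaddq", preserve_user_namespace);
  r.add("__arm_vsubq", false);
  r.add("vsubq", preserve_user_namespace);
  return r;
}
```

With this scheme, code 2 names the same function in both configurations, which is what an LTO reader needs when it cannot know how the original translation unit was compiled.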

2023-06-26  Christophe Lyon   

PR target/110268
gcc/
* config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
(arm_builtin_decl): Handle MVE builtins.
* config/arm/arm-mve-builtins.cc (builtin_decl): New function.
(add_unique_function): Fix handling of
__ARM_MVE_PRESERVE_USER_NAMESPACE.
(add_overloaded_function): Likewise.
* config/arm/arm-protos.h (builtin_decl): New declaration.

gcc/testsuite/
* gcc.target/arm/pr110268-1.c: New test.
* gcc.target/arm/pr110268-2.c: New test.
---
 gcc/config/arm/arm-builtins.cc| 11 +++-
 gcc/config/arm/arm-mve-builtins.cc| 61 ---
 gcc/config/arm/arm-protos.h   |  1 +
 gcc/testsuite/gcc.target/arm/pr110268-1.c | 11 
 gcc/testsuite/gcc.target/arm/pr110268-2.c | 22 
 5 files changed, 76 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-2.c

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 36365e40a5b..fca7dcaf565 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -1918,6 +1918,15 @@ arm_init_mve_builtins (void)
   arm_builtin_datum *d = &mve_builtin_data[i];
   arm_init_builtin (fcode, d, "__builtin_mve");
 }
+
+  if (in_lto_p)
+{
+  arm_mve::handle_arm_mve_types_h ();
+  /* Under LTO, we cannot know whether
+__ARM_MVE_PRESERVE_USER_NAMESPACE was defined, so assume it
+was not.  */
+  arm_mve::handle_arm_mve_h (false);
+}
 }
 
 /* Set up all the NEON builtins, even builtins for instructions that are not
@@ -2723,7 +2732,7 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
 case ARM_BUILTIN_GENERAL:
   return arm_general_builtin_decl (subcode);
 case ARM_BUILTIN_MVE:
-  return error_mark_node;
+  return arm_mve::builtin_decl (subcode);
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 7033e41a571..e9a12f27411 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -493,6 +493,16 @@ handle_arm_mve_h (bool preserve_user_namespace)
 preserve_user_namespace);
 }
 
+/* Return the function decl with MVE function subcode CODE, or error_mark_node
+   if no such function exists.  */
+tree
+builtin_decl (unsigned int code)
+{
+  if (code >= vec_safe_length (registered_functions))
+return error_mark_node;
+  return (*registered_functions)[code]->decl;
+}
+
 /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
purposes.  */
 static bool
@@ -849,7 +859,6 @@ function_builder::add_function (const function_instance &instance,
 ? integer_zero_node
 : simulate_builtin_function_decl (input_location, name, fntype,
  code, NULL, attrs);
-
   registered_function &rfn = *ggc_alloc <registered_function> ();
   rfn.instance = instance;
   rfn.decl = decl;
@@ -889,15 +898,12 @@ function_builder::add_unique_function (const function_instance &instance,
   gcc_assert (!*rfn_slot);
   *rfn_slot = &rfn;
 
-  /* Also add the non-prefixed non-overloaded function, if the user namespace
- does not need to be preserved.  */
-  if (!preserve_user_namespace)
-{
-  char *noprefix_name = get_name (instance, false, false);
-  tree attrs = get_attributes (instance);
-  add_function (instance, noprefix_name, fntype, attrs, requires_float,
-   false, false);
-}
+  /* Also add the non-prefixed non-overloaded function, as placeholder
+ if the user namespace does not need to be preserved.  */
+  char *noprefix_name = get_name (instance, false, false);
+  attrs = get_attributes (instance);
+  add_function (instance, noprefix_name, fntype, attrs, requires_float,
+   false, preserve_user_namespace);
 
   /* Also add the function under its overloaded alias, if we want
  a separate

Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

LLVM tries to find a scratch register even after RA to resolve the long
jump issue, so maybe we could consider a similar approach?  I guess the
most complicated part would be the case where no scratch register is found
and a spill/reload is required after RA.

Jeff Law via Gcc-patches wrote on Mon, 26 Jun 2023 at 22:31:

>
>
> On 6/25/23 12:45, Stefan O'Rear wrote:
>
> >
> > To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
> > register for all functions, or only those heuristically identified as potentially
> > larger than 1MiB?  And would this extend to forcing the creation of stack frames
> > for all functions, including very small functions?  I am concerned this would
> > result in a substantial performance regression.
>
> For the case Yanzhang is discussing (firmware and such), yes.  And
> that's simply the cost they're going to have to pay for wanting
> consistent backtraces without utilizing dwarf unwind info, sframe or orc.
>
> Normal builds won't be using those options and thus won't suffer from
> those performance penalties.
>
> >
> > Without seeing the patch I can't know if I'm missing something obvious but I
> > would say t1 has three advantages:
> >
> > 1. Consistency with tail, possibly simpler implementation.
>
> And as I've already stated, this sequence is defined by the assembler.
> While I do want to revisit a compiler only solution, it's way down on my
> list of things to improve if I do a cost/benefit analysis.  If someone
> wants to take a stab at it, I'm all for it.  But it's not a simple
> problem due to the phase ordering issues.
>
> >
> > 2. Very few functions use all seven t-registers.  qemu linux-user in 2016 had an
> > off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
> > notice.  By contrast, ra has live data in every non-_Noreturn function.
>
> That's a terrible way to evaluate the impact.  The right way is to use
> real benchmarks.  Not synthetic benchmarks.  Not indirect observations
> that require triggering a bug in a sigreturn path.  Build and run a real
> benchmark.
>
>
>
> >
> > 3. Any jalr instruction which has rs1=ra has a hint effect on the return address
> > stack (call, return, or coroutine swap); a jalr which is intended to be treated
> > as a plain jump must have rs1!=ra, rs1!=t0.
>
> I'm well aware of these concerns.  We support disambiguating various
> jump forms to facilitate different branch predictors.
>
> jeff
>

Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

On 6/25/23 12:45, Stefan O'Rear wrote:

> To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
> register for all functions, or only those heuristically identified as
> potentially larger than 1MiB? And would this extend to forcing the creation
> of stack frames for all functions, including very small functions? I am
> concerned this would result in a substantial performance regression.

For the case Yanzhang is discussing (firmware and such), yes. And
that's simply the cost they're going to have to pay for wanting
consistent backtraces without utilizing dwarf unwind info, sframe or orc.

Normal builds won't be using those options and thus won't suffer from
those performance penalties.

> Without seeing the patch I can't know if I'm missing something obvious but I
> would say t1 has three advantages:
>
> 1. Consistency with tail, possibly simpler implementation.

And as I've already stated, this sequence is defined by the assembler.
While I do want to revisit a compiler only solution, it's way down on my
list of things to improve if I do a cost/benefit analysis. If someone
wants to take a stab at it, I'm all for it. But it's not a simple
problem due to the phase ordering issues.

> 2. Very few functions use all seven t-registers. qemu linux-user in 2016 had an
> off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
> notice. By contrast, ra has live data in every non-_Noreturn function.

That's a terrible way to evaluate the impact. The right way is to use
real benchmarks. Not synthetic benchmarks. Not indirect observations
that require triggering a bug in a sigreturn path. Build and run a real
benchmark.

> 3. Any jalr instruction which has rs1=ra has a hint effect on the return address
> stack (call, return, or coroutine swap); a jalr which is intended to be treated
> as a plain jump must have rs1!=ra, rs1!=t0.

I'm well aware of these concerns. We support disambiguating various
jump forms to facilitate different branch predictors.

jeff

Re: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-26 Thread Richard Sandiford via Gcc-patches

Philipp Tomsich  writes:
> Richard,
>
> OK for backport to GCC-13?

Yeah, OK for GCC 13 too.

Thanks,
Richard

> Thanks,
> Philipp.
>
> On Thu, 22 Jun 2023 at 16:18, Richard Sandiford via Gcc-patches
>  wrote:
>>
>> Di Zhao OS via Gcc-patches  writes:
>> > This patch enables reassociation of floating-point additions on ampere1.
>> > This brings about 1% overall benefit on spec2017 fprate cases. (There
>> > are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
>> >
>> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
>> >
>> > Thanks,
>> > Di Zhao
>> >
>> > gcc/ChangeLog:
>> >
>> > * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1
>>
>> Thanks, pushed to trunk.
>>
>> Richard
>>
>> > ---
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index d16565b5581..301c9f6c0cd 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
>> >"32:12",   /* loop_align.  */
>> >2, /* int_reassoc_width.  */
>> >4, /* fp_reassoc_width.  */
>> > -  1, /* fma_reassoc_width.  */
>> > +  4, /* fma_reassoc_width.  */
>> >2, /* vec_reassoc_width.  */
>> >2, /* min_div_recip_mul_sf.  */
>> >2, /* min_div_recip_mul_df.  */
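
As a rough illustration of what a larger reassociation width buys (a hand-written sketch, not output of the reassoc pass; `dot_serial` and `dot_width4` are hypothetical names): with width 1 every multiply-add waits for the previous one, while width 4 keeps four independent accumulation chains in flight. GCC only performs this kind of floating-point reassociation when options such as -ffast-math permit it, because the rounding can differ.

```cpp
#include <cstddef>
#include <vector>

// Width 1: a single dependency chain; each multiply-add needs the
// previous accumulator value.
inline double dot_serial(const std::vector<double> &a,
                         const std::vector<double> &b) {
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    acc += a[i] * b[i];
  return acc;
}

// Width 4: four independent chains that are combined at the end.
// Assumes a.size() == b.size().
inline double dot_width4(const std::vector<double> &a,
                         const std::vector<double> &b) {
  double acc[4] = {0.0, 0.0, 0.0, 0.0};
  std::size_t i = 0;
  for (; i + 4 <= a.size(); i += 4)
    for (std::size_t k = 0; k < 4; ++k)
      acc[k] += a[i + k] * b[i + k];
  // Leftover elements go into the first chain.
  for (; i < a.size(); ++i)
    acc[0] += a[i] * b[i];
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

Both functions agree exactly on small integer-valued inputs; on general inputs the results may differ in the last bits, which is why the transformation is gated on the math flags.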

[PATCH] match.pd: Use element_mode instead of TYPE_MODE.

Hi,

this patch changes TYPE_MODE into element_mode in a match.pd
simplification.  As the simplification can be called with vector types
real_can_shorten_arithmetic would ICE in REAL_MODE_FORMAT which
expects a scalar mode.  Therefore, use element_mode instead of
TYPE_MODE.

Additionally, check if the target supports the resulting operation in the
new mode.  One target that supports e.g. a float addition but not a
_Float16 addition is the RISC-V vector Float16 extension Zvfhmin.
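
The difference between the two accessors can be sketched with a small standalone model (an illustration only: `Mode`, `Type`, `type_mode`, `element_mode` and `is_scalar_float_mode` below are simplified stand-ins written for this example, not the GCC definitions). For a vector type the whole-type mode is a vector mode, which a scalar-only query cannot handle, whereas the element mode is always scalar.

```cpp
// Toy set of machine modes: three scalar float modes and two vector modes.
enum class Mode { HF, SF, DF, V4SF, V2DF };

struct Type {
  Mode mode;         // mode of the whole type (a vector mode for vectors)
  const Type *elem;  // element type for vectors, nullptr for scalars
};

// Stand-in for TYPE_MODE: the mode of the type as a whole.
inline Mode type_mode(const Type &t) { return t.mode; }

// Stand-in for element_mode: the scalar mode of the element for vectors,
// or the type's own mode for scalars.
inline Mode element_mode(const Type &t) {
  return t.elem ? t.elem->mode : t.mode;
}

// A query that, like REAL_MODE_FORMAT, only makes sense for scalar modes.
inline bool is_scalar_float_mode(Mode m) {
  return m == Mode::HF || m == Mode::SF || m == Mode::DF;
}
```

In this model, passing `type_mode` of a vector type to the scalar-only query fails the precondition, while `element_mode` is safe for both scalar and vector types.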

Bootstrap on x86_64 succeeded, testsuite is currently running.  Is this OK
if the testsuite is clean?

Regards
 Robin

gcc/ChangeLog:

* match.pd: Use element_mode and check if target supports
operation with new type.
---
 gcc/match.pd | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 33ccda3e7b6..4a200f221f6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7454,10 +7454,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  values representable in the TYPE to be within the
  range of normal values of ITYPE.  */
  (if (element_precision (newtype) < element_precision (itype)
+  && target_supports_op_p (newtype, op, optab_default)
   && (flag_unsafe_math_optimizations
   || (element_precision (newtype) == element_precision (type)
-  && real_can_shorten_arithmetic (TYPE_MODE (itype),
-  TYPE_MODE (type))
+  && real_can_shorten_arithmetic (element_mode (itype),
+  element_mode (type))
   && !excess_precision_type (newtype)))
   && !types_match (itype, newtype))
 (convert:type (op (convert:newtype @1)
-- 
2.41.0

[committed] docs: Fix typo

2023-06-26 Thread Andrew Carlotti via Gcc-patches

gcc/ChangeLog:

 * doc/optinfo.texi: Fix "steam" -> "stream".


diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index 
b91bba7bd10470b17ca5190688beee06ad3b87ab..5e8c97ef118786e68b7e46f3c802154cb9b57b83
 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -100,7 +100,7 @@ that one could also use special file names @code{stdout} and
 respectively.
 
 @item @code{alt_stream}
-This steam is used for printing optimization specific output in
+This stream is used for printing optimization specific output in
 response to the @option{-fopt-info}. Again a file name can be given. If
 the file name is not given, it defaults to @code{stderr}.
 @end table

RE: [PATCH V3] DSE: Add LEN_MASK_STORE analysis into DSE and fix LEN_STORE

Committed as passed both the bootstrap and regression test, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf Of Richard Biener via Gcc-patches
Sent: Monday, June 26, 2023 4:15 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH V3] DSE: Add LEN_MASK_STORE analysis into DSE and fix LEN_STORE

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi.
> 
> This patch is adding LEN_MASK_STORE into DSE.
> 
> My understanding is that LEN_MASK_STORE is predicated by both mask and len.
> Whether or not len is constant, the ao_ref should be the same as for MASK_STORE.
> 
> Whereas for LEN_STORE, when len is constant, we use (len - bias); otherwise
> it's the same as MASK_STORE/LEN_MASK_STORE.
> 
> Not sure whether I am on the same page with you, feel free to correct me.
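
The case split described above can be sketched as a standalone function (a simplified model with hypothetical names `StoreKind` and `write_size`; the real code builds an ao_ref from GIMPLE call arguments rather than returning a byte count). A constant-length LEN_STORE gives an exact size, so it can act as a must-def; everything else can only be described as a may-def covering the whole vector.

```cpp
#include <cstdint>
#include <optional>

enum class StoreKind { LenStore, MaskStore, LenMaskStore };

// Returns the byte count to use for the write reference, or nullopt when
// no reference can be initialized.  `const_len` is set only when the
// length operand is a compile-time constant.
inline std::optional<std::int64_t>
write_size(StoreKind kind, std::optional<std::int64_t> const_len,
           std::int64_t bias, std::int64_t vector_bytes, bool may_def_ok) {
  // Exact size known: LEN_STORE with a constant length, as in the patch.
  if (kind == StoreKind::LenStore && const_len)
    return *const_len - bias;
  // Otherwise only a may-def covering the full stored vector is possible.
  if (may_def_ok)
    return vector_bytes;
  return std::nullopt;
}
```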

OK if it passes bootstrap/regtest.

Thanks,
Richard.

> Thanks.
> 
> gcc/ChangeLog:
> 
> * tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE and
> fix LEN_STORE.
> (dse_optimize_stmt): Add LEN_MASK_STORE.
> 
> ---
>  gcc/tree-ssa-dse.cc | 47 ++---
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 3c7a2e9992d..f8338037a61 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop-niter.h"
>  #include "cfgloop.h"
>  #include "tree-data-ref.h"
> +#include "internal-fn.h"
>  
>  /* This file implements dead store elimination.
>  
> @@ -157,23 +158,36 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write, bool may_def_ok = false)
>switch (gimple_call_internal_fn (stmt))
>   {
>   case IFN_LEN_STORE:
> -   ao_ref_init_from_ptr_and_size
> -   (write, gimple_call_arg (stmt, 0),
> -int_const_binop (MINUS_EXPR,
> - gimple_call_arg (stmt, 2),
> - gimple_call_arg (stmt, 4)));
> -   return true;
>   case IFN_MASK_STORE:
> -   /* We cannot initialize a must-def ao_ref (in all cases) but we
> -  can provide a may-def variant.  */
> -   if (may_def_ok)
> - {
> -   ao_ref_init_from_ptr_and_size
> -   (write, gimple_call_arg (stmt, 0),
> -TYPE_SIZE_UNIT (TREE_TYPE (gimple_call_arg (stmt, 3))));
> -   return true;
> - }
> -   break;
> + case IFN_LEN_MASK_STORE:
> +   {
> + int stored_value_index
> +   = internal_fn_stored_value_index (gimple_call_internal_fn (stmt));
> + if (gimple_call_internal_fn (stmt) == IFN_LEN_STORE)
> +   {
> + tree len = gimple_call_arg (stmt, 2);
> + tree bias = gimple_call_arg (stmt, 4);
> + if (tree_fits_uhwi_p (len))
> +   {
> + ao_ref_init_from_ptr_and_size (write,
> +gimple_call_arg (stmt, 0),
> +int_const_binop (MINUS_EXPR,
> + len, bias));
> + return true;
> +   }
> +   }
> + /* We cannot initialize a must-def ao_ref (in all cases) but we
> +can provide a may-def variant.  */
> + if (may_def_ok)
> +   {
> + ao_ref_init_from_ptr_and_size (
> +   write, gimple_call_arg (stmt, 0),
> +   TYPE_SIZE_UNIT (
> + TREE_TYPE (gimple_call_arg (stmt, stored_value_index))));
> + return true;
> +   }
> + break;
> +   }
>   default:;
>   }
>  }
> @@ -1502,6 +1516,7 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
>   {
>   case IFN_LEN_STORE:
>   case IFN_MASK_STORE:
> + case IFN_LEN_MASK_STORE:
> {
>   enum dse_store_status store_status;
>   store_status = dse_classify_store (&ref, stmt, false, live_bytes);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

RE: [PATCH V2] GIMPLE_FOLD: Fix gimple fold for LEN_{MASK}_{LOAD,STORE}

Committed as passed both the bootstrap and regression test, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf Of Richard Biener via Gcc-patches
Sent: Monday, June 26, 2023 4:17 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH V2] GIMPLE_FOLD: Fix gimple fold for LEN_{MASK}_{LOAD,STORE}

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, previously I made a mistake in the GIMPLE_FOLD of LEN_MASK_{LOAD,STORE}.
> 
> We should fold LEN_MASK_{LOAD,STORE} when (bias + len) == vf (nunits instead
> of bytesize) && mask == all-trues mask
> 
> into:
>MEM_REF [...].
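
The fold condition can be written out as a small predicate (a sketch with a hypothetical name, `can_fold_to_mem_ref`; the real check in gimple-fold.cc works on poly_int values and tree constants). The key point is that the length is compared against the number of vector elements, not the vector size in bytes.

```cpp
#include <cstdint>

// True when a length/mask-predicated access covers the whole vector and can
// therefore be replaced by a plain memory reference.
//   len    : the length operand, counted in elements
//   bias   : the target's length bias
//   nunits : number of elements in the vector type
inline bool can_fold_to_mem_ref(std::int64_t len, std::int64_t bias,
                                std::int64_t nunits, bool mask_all_true) {
  return len + bias == nunits && mask_all_true;
}
```

For a 16-element vector of 4-byte elements, a length of 16 folds, while a length of 64 (the byte size) does not.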
> 
> This patch adds a testcase to test the gimple fold of LEN_MASK_{LOAD,STORE}.
> 
> I also fix LEN_LOAD/LEN_STORE so that they have the same behavior.
> 
> OK for trunk?

OK

> gcc/ChangeLog:
> 
> * gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Fix gimple
> fold of LOAD/STORE with length.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: New test.
> 
> ---
>  gcc/gimple-fold.cc|  6 ++-
>  .../riscv/rvv/autovec/partial/gimple_fold-1.c | 43 +++
>  2 files changed, 47 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 3d46b76edeb..6d167b116b9 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5397,8 +5397,10 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, 
> tree vectype, bool mask_p)
>unsigned int nargs = gimple_call_num_args (call);
>tree bias = gimple_call_arg (call, nargs - 1);
>gcc_assert (TREE_CODE (bias) == INTEGER_CST);
> -  if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
> - GET_MODE_SIZE (TYPE_MODE (vectype))))
> +  /* For LEN_LOAD/LEN_STORE/LEN_MASK_LOAD/LEN_MASK_STORE,
> +  we don't fold when (bias + len) != VF.  */
> +  if (maybe_ne (wi::to_poly_widest (basic_len) + wi::to_widest (bias),
> + GET_MODE_NUNITS (TYPE_MODE (vectype))))
>   return NULL_TREE;
>  
>/* For LEN_MASK_{LOAD,STORE}, we should also check whether
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c
> new file mode 100644
> index 000..23407a2d3f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3 -fdump-tree-optimized-details" } */
> +
> +#include 
> +
> +#define SZ 255
> +
> +#define DEF(TYPE) void fn_##TYPE (TYPE *__restrict a);
> +
> +#define RUN(TYPE) \
> +  TYPE a##TYPE[SZ]; \
> +  for (int i = 0; i < SZ; i++) \
> +    { \
> +      a##TYPE[i] = 127; \
> +    } \
> +  fn_##TYPE (a##TYPE);
> +
> +#define RUN_ALL() \
> +  RUN (int8_t) \
> +  RUN (int16_t) \
> +  RUN (int32_t) \
> +  RUN (int64_t) \
> +  RUN (uint8_t) \
> +  RUN (uint16_t) \
> +  RUN (uint32_t) \
> +  RUN (uint64_t)
> +
> +DEF (int8_t)
> +DEF (int16_t)
> +DEF (int32_t)
> +DEF (int64_t)
> +DEF (uint8_t)
> +DEF (uint16_t)
> +DEF (uint32_t)
> +DEF (uint64_t)
> +
> +int
> +main ()
> +{
> +  RUN_ALL ()
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.LEN_MASK_STORE" 6 "optimized" } } */
> 
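
For reference, the fold condition described in the patch can be modeled in a few lines of plain C. This is only a sketch under stated assumptions: `can_fold_to_mem_ref` and its parameters are hypothetical names, the element count is an ordinary integer rather than a poly_int, and a NULL mask stands in for the LEN_LOAD/LEN_STORE case, which has no mask operand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC code: decide whether a length-controlled (and
   optionally masked) access covers the whole vector.  NUNITS is the
   number of elements in the vector type, not its size in bytes.  */
static bool
can_fold_to_mem_ref (long len, long bias, size_t nunits, const bool *mask)
{
  /* Only a full-vector access qualifies: len + bias must equal nunits.  */
  if (len + bias != (long) nunits)
    return false;

  /* A NULL mask models LEN_LOAD/LEN_STORE, which have no mask operand.  */
  if (mask == NULL)
    return true;

  /* For the masked variants every mask element must be set.  */
  for (size_t i = 0; i < nunits; i++)
    if (!mask[i])
      return false;
  return true;
}
```

The point of the model is the unit of comparison: the sum is checked against the element count, and the bias is added, not subtracted.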


[COMMITTED] PR tree-optimization/110251 - Avoid redundant GORI calcuations.

2023-06-26 Thread Andrew MacLeod via Gcc-patches

When calculating ranges, GORI evaluates the chain of definitions until 
it finds the desired name.


  _4 = (short unsigned int) c.2_1;
  _5 = _4 + 65535;
  a_lsm.19_30 = a;
  _49 = _4 + 65534;
  _12 = _5 & _49;
  _46 = _12 + 65535;
  _48 = _12 & _46;    <<--
  if (_48 != 0)

When evaluating c.2_1 on the true edge, GORI starts with _48 with a
range of [1, +INF].


Looking at _48's operands (_12 and _46), note that it depends on both _12
and _46.  Also note that _46 is itself dependent on _12.


GORI currently simply calculates c.2_1 through both operands.  This means
_12 will be evaluated back through to c.2_1, and then _46 will do the same
and the results will be combined.  That means the statements from _12
back to c.2_1 are actually calculated twice.


This PR produces a sequence of code which is quite long, with cascading 
chains of dependencies like this that feed each other. This becomes a 
geometric/exponential growth in calculation time, over and over.


This patch identifies the situation of one operand depending on the
other, and simply evaluates only the one which includes the other.  In
the above case, it simply winds back through _46, ignoring the _12 operand
in the definition of _48.  During the process of evaluating _46, we
eventually get to evaluating _12 anyway, so we don't lose much, if
anything.  This results in a much more consistently linear time
evaluation.
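
To make the growth argument concrete, here is a toy cost model in C. It is not GORI code: `toy_naive_steps` and `toy_pruned_steps` are hypothetical names, and the model assumes a chain in which every level looks like `_48 = _12 & _46` with `_46` defined from `_12`.

```c
/* Toy model: count how many statements are visited when winding back
   DEPTH levels of a chain where, at each level, operand 2 is defined
   from operand 1.  */

/* Evaluate through both operands and combine: operand 1 leads straight
   to the previous level, operand 2 costs one extra statement and then
   reaches the same previous level again.  */
static unsigned long
toy_naive_steps (unsigned int depth)
{
  if (depth == 0)
    return 1; /* Reached the name we are solving for.  */
  return 1 + toy_naive_steps (depth - 1)
	 + 1 + toy_naive_steps (depth - 1);
}

/* Evaluate only the operand that depends on the other one.  */
static unsigned long
toy_pruned_steps (unsigned int depth)
{
  if (depth == 0)
    return 1;
  return 1 + 1 + toy_pruned_steps (depth - 1);
}
```

In this model the naive count roughly doubles with every level, while the pruned count grows by two per level.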


Bootstraps on x86_64-pc-linux-gnu with no regressions.   Pushed.

Andrew







commit 6246ee062062b53275c229daf8676ccaa535f419
Author: Andrew MacLeod 
Date:   Thu Jun 22 10:00:12 2023 -0400

Avoid redundant GORI calcuations.

When GORI evaluates a statement, if operand 1 and 2 are both in the
dependency chain, GORI evaluates the name through both operands sequentially
and combines the results.

If either operand is in the dependency chain of the other, this
evaluation will do the same work twice, for questionable gain.
Instead, simple evaluate only the operand which depends on the other
and keep the evaluation linear in time.

* gimple-range-gori.cc (compute_operand1_and_operand2_range):
Check for interdependence between operands 1 and 2.

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index abc70cd54ee..4ee0ae36014 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1291,13 +1291,26 @@ gori_compute::compute_operand1_and_operand2_range (vrange &r,
 {
   Value_Range op_range (TREE_TYPE (name));
 
-  // Calculate a good a range for op2.  Since op1 == op2, this will
-  // have already included whatever the actual range of name is.
-  if (!compute_operand2_range (op_range, handler, lhs, name, src, rel))
+  // If op1 is in the def chain of op2, we'll do the work twice to evalaute
+  // op1.  This can result in an exponential time calculation.
+  // Instead just evaluate op2, which will eventualy get to op1.
+  if (in_chain_p (handler.operand1 (), handler.operand2 ()))
+return compute_operand2_range (r, handler, lhs, name, src, rel);
+
+  // Likewise if op2 is in the def chain of op1.
+  if (in_chain_p (handler.operand2 (), handler.operand1 ()))
+return compute_operand1_range (r, handler, lhs, name, src, rel);
+
+  // Calculate a good a range through op2.
+  if (!compute_operand2_range (r, handler, lhs, name, src, rel))
 return false;
 
+  // If op1 == op2 there is again no need to go further.
+  if (handler.operand1 () == handler.operand2 ())
+return true;
+
   // Now get the range thru op1.
-  if (!compute_operand1_range (r, handler, lhs, name, src, rel))
+  if (!compute_operand1_range (op_range, handler, lhs, name, src, rel))
 return false;
 
   // Both operands have to be simultaneously true, so perform an intersection.

Re: [PATCH v3] Add leafy mode for zero-call-used-regs

2023-06-26 Thread Qing Zhao via Gcc-patches



> On Jun 23, 2023, at 7:27 PM, Alexandre Oliva  wrote:
> 
> On Jun 23, 2023, Qing Zhao via Gcc-patches  wrote:
> 
>> It’s better to add this definition earlier in the list of the “three
>> basic values”, to make it “four basic values”, like the following:
> 
> Oh, my, sorry for being so dense, I had managed to miss that bit all
> this time somehow :-(
> 
>> The sentence "This value is mainly to provide users a more efficient mode to 
>> zero 
>>call-used registers in leaf functions.” just for your reference,
>> the wording can certainly be improved.  -:)
> 
> :-)  got it, thanks.  How about this?
> 
> 
> Add leafy mode for zero-call-used-regs
> 
> Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
> nonleaf functions, respectively.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

LGTM now.

However, I am not a reviewer,  you might still need approval from a middle-end 
reviewer.

thanks.

Qing
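
As a reading aid, the selection rule ('used' in leaf functions, 'all' in nonleaf functions) can be sketched as a small flag transformation. Everything here is an assumption for illustration: the `TOY_*` values and `toy_resolve_leafy` are hypothetical and do not reproduce the constants in gcc/flag-types.h or the code in function.cc.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only; the real constants
   live in gcc/flag-types.h and may differ.  */
enum
{
  TOY_ONLY_USED = 1u << 0,
  TOY_ONLY_GPR = 1u << 1,
  TOY_ONLY_ARG = 1u << 2,
  TOY_LEAFY_MODE = 1u << 3
};

/* Resolve a leafy request: leaf functions get the 'used' behaviour,
   nonleaf functions get the 'all' behaviour.  The gpr/arg modifier
   bits pass through unchanged.  */
static unsigned int
toy_resolve_leafy (unsigned int flags, bool is_leaf)
{
  if (!(flags & TOY_LEAFY_MODE))
    return flags;

  flags &= ~(unsigned int) TOY_LEAFY_MODE;
  if (is_leaf)
    flags |= TOY_ONLY_USED;
  else
    flags &= ~(unsigned int) TOY_ONLY_USED;
  return flags;
}
```

With this shape, leafy-gpr resolves to used-gpr in a leaf function and to all-gpr otherwise, which matches the documentation table in the patch.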


> 
> 
> for  gcc/ChangeLog
> 
>   * doc/extend.texi (zero-call-used-regs): Document leafy and
>   variants thereof.
>   * flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
>   LEAFY and variants.
>   * function.cc (gen_call_used_regs_seq): Set only_used for leaf
>   functions in leafy mode.
>   * opts.cc (zero_call_used_regs_opts): Add leafy and variants.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * c-c++-common/zero-scratch-regs-leafy-1.c: New.
>   * c-c++-common/zero-scratch-regs-leafy-2.c: New.
>   * gcc.target/i386/zero-scratch-regs-leafy-1.c: New.
>   * gcc.target/i386/zero-scratch-regs-leafy-2.c: New.
> ---
> gcc/doc/extend.texi|   30 ++--
> gcc/flag-types.h   |5 +++
> gcc/function.cc|3 ++
> gcc/opts.cc|4 +++
> .../c-c++-common/zero-scratch-regs-leafy-1.c   |   15 ++
> .../c-c++-common/zero-scratch-regs-leafy-2.c   |   21 ++
> .../gcc.target/i386/zero-scratch-regs-leafy-1.c|   12 
> .../gcc.target/i386/zero-scratch-regs-leafy-2.c|   16 +++
> 8 files changed, 103 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-2.c
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 852f6b629bea8..739c40368f556 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4349,7 +4349,7 @@ through registers.
> In order to satisfy users with different security needs and control the
> run-time overhead at the same time, the @var{choice} parameter provides a
> flexible way to choose the subset of the call-used registers to be zeroed.
> -The three basic values of @var{choice} are:
> +The four basic values of @var{choice} are:
> 
> @itemize @bullet
> @item
> @@ -4362,10 +4362,16 @@ the function.
> 
> @item
> @samp{all} zeros all call-used registers.
> +
> +@item
> +@samp{leafy} behaves like @samp{used} in a leaf function, and like
> +@samp{all} in a nonleaf function.  This makes for leaner zeroing in leaf
> +functions, where the set of used registers is known, and that may be
> +enough for some purposes of register zeroing.
> @end itemize
> 
> In addition to these three basic choices, it is possible to modify
> -@samp{used} or @samp{all} as follows:
> +@samp{used}, @samp{all}, and @samp{leafy} as follows:
> 
> @itemize @bullet
> @item
> @@ -4412,10 +4418,28 @@ zeros all call-used registers that pass arguments.
> @item all-gpr-arg
> zeros all call-used general purpose registers that pass
> arguments.
> +
> +@item leafy
> +Same as @samp{used} in a leaf function, and same as @samp{all} in a
> +nonleaf function.
> +
> +@item leafy-gpr
> +Same as @samp{used-gpr} in a leaf function, and same as @samp{all-gpr}
> +in a nonleaf function.
> +
> +@item leafy-arg
> +Same as @samp{used-arg} in a leaf function, and same as @samp{all-arg}
> +in a nonleaf function.
> +
> +@item leafy-gpr-arg
> +Same as @samp{used-gpr-arg} in a leaf function, and same as
> +@samp{all-gpr-arg} in a nonleaf function.
> +
> @end table
> 
> Of this list, @samp{used-arg}, @samp{used-gpr-arg}, @samp{all-arg},
> -and @samp{all-gpr-arg} are mainly used for ROP mitigation.
> +@samp{all-gpr-arg}, @samp{leafy-arg}, and @samp{leafy-gpr-arg} are
> +mainly used for ROP mitigation.
> 
> The default for the attribute is controlled by @option{-fzero-call-used-regs}.
> @end table
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index 2e650bf1c487c..0d2dab1b99dd4 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -348,6 +348,7 @@ namespace zero_regs_flags {
>   const unsigned int ONLY_GPR = 1UL << 2;
>   const unsigned int ONLY_ARG = 1UL << 3;
>   const unsigned int

[PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-06-26 Thread Andrew Carlotti via Gcc-patches

Many intrinsics currently depend on both an architecture version and a
feature, despite the corresponding instructions being available within
GCC at lower architecture versions.

LLVM has already removed these explicit architecture version
dependences; this patch does the same for GCC, as well as removing an
unnecessary simd dependency for the scalar fp16 intrinsics.

Binutils does not support all of these architecture+feature combinations
yet, but this is an existing problem that is already reachable from GCC.
For example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
GCC 10. I intend to patch this in binutils.

This patch retains explicit architecture version dependencies for
features that do not currently have a separate feature flag.

Ok for master, and backport to GCC 13?
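
A small user-side sketch of what this is meant to enable, assuming the intent above holds: code using a dot-product intrinsic should only need the +dotprod feature (for example -march=armv8-a+dotprod), not armv8.2-a. The function name `dot8_u8` is hypothetical, and the scalar fallback exists only so the example builds on targets without the feature.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Sum of products of eight unsigned bytes.  */
uint32_t
dot8_u8 (const uint8_t a[8], const uint8_t b[8])
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
  /* Each 32-bit lane accumulates the products of four byte pairs.  */
  uint32x2_t acc = vdot_u32 (vdup_n_u32 (0), vld1_u8 (a), vld1_u8 (b));
  return vget_lane_u32 (acc, 0) + vget_lane_u32 (acc, 1);
#else
  /* Portable fallback with the same result.  */
  uint32_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum += (uint32_t) a[i] * b[i];
  return sum;
#endif
}
```

Whether the assembler accepts the resulting instructions at a given -march level is a separate question, as noted above for binutils.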

gcc/ChangeLog:

 * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
 dependency.
 * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
 dependencies from target pragmas.
 * config/aarch64/arm_fp16.h (target): Likewise.
 * config/aarch64/arm_neon.h (target): Likewise.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/feature-bf16-backport.c: New test.
 * gcc.target/aarch64/feature-dotprod-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
 * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
 * gcc.target/aarch64/feature-i8mm-backport.c: New test.
 * gcc.target/aarch64/feature-memtag-backport.c: New test.
 * gcc.target/aarch64/feature-sha3-backport.c: New test.
 * gcc.target/aarch64/feature-sm4-backport.c: New test.


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
7129ed1ff370d597895b3f46b56b1250da7fa190..cdb664eb8f7db820b6b06b2667bfad6dc14cb7a2
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
 #define TARGET_RNG (AARCH64_ISA_RNG)
 
 /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
-#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
+#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
 
 /* I8MM instructions are enabled through +i8mm.  */
 #define TARGET_I8MM (AARCH64_ISA_I8MM)
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 
e0ac591d2c8d6c4c4c8a074b2d9881c47b1db1ab..87fb42f47c5821adecbb0ea441e0a38c63972e77
 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -325,7 +325,7 @@ __rndrrs (uint64_t *__res)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.5-a+memtag")
+#pragma GCC target ("+nothing+memtag")
 
 #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
   __builtin_aarch64_memtag_irg(__ptr, __u64_mask)
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 
a8fa4dbbdfe1bab4aa604bb311ef66d4e1de18ac..84b2ed66f9ba19fba6ccd8be33940d7239bfa22e
 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -30,7 +30,7 @@
 #include 
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+fp16+nosimd")
 
 typedef __fp16 float16_t;
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
eeec9f162e223df8cf7803b3227aef22e94227ac..a078674376af121c36bbebef76631c25a6815b1b
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -25590,7 +25590,7 @@ __INTERLEAVE_LIST (zip)
 #include "arm_fp16.h"
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+fp16")
 
 /* ARMv8.2-A FP16 one operand vector intrinsics.  */
 
@@ -26753,7 +26753,7 @@ vminnmvq_f16 (float16x8_t __a)
 /* AdvSIMD Dot Product intrinsics.  */
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+dotprod")
+#pragma GCC target ("+nothing+dotprod")
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26844,7 +26844,7 @@ vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, 
int8x16_t __b, const int __index)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sm4")
+#pragma GCC target ("+nothing+sm4")
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26911,7 +26911,7 @@ vsm4ekeyq_u32 (uint32x4_t __a, uint32x4_t __b)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sha3")
+#pragma GCC target ("+nothing+sha3")
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -27547,7 +27547,7 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t 
__a, float32x4_t __b,
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16fml")
+#pragma GCC

Re: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

Hi, Richi.

>> I guess it would be nice to re-formulate the loop in terms of
>> the encoded VECTOR_CST elts, but then we need to generate
>> the "extents" for set bits, not sure how to do that here.
>> Note in the end we get HOST_WIDE_INT extents from adding
>> the element size for each mask element we look at.  The question
>> is how and if we currently handle the trailing ... correctly
>> for VL vectors.

I tried to understand this but failed, since I may be missing the background of
SCCVN.  Do you want me to refactor the do...while loop?  Actually, RVV doesn't use
IFN for intrinsics, and I am not sure whether the refactoring is necessary or
not.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-26 21:04
To: Richard Biener
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
RVV doesn't use IFN for intrinsics. Is it possible to write C code that
reproduces the issue you point out?

Replied Message
From: Richard Biener
Date: 06/26/2023 20:45
To: juzhe.zh...@rivai.ai
Cc: gcc-patches, richard.sandiford
Subject: Re: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote: 

> Hi, Richi. 
> 
> >> I think you can use ordered_min here?  Alternatively doing ... 
> 
> I check the function of ordered_min: 
> ordered_min (const poly_int_pod<N, Ca> &a, const poly_int_pod<N, Cb> &b)
> { 
>   if (known_le (a, b)) 
> return a; 
>   else 
> { 
>   if (N > 1) 
>   gcc_checking_assert (known_le (b, a)); 
>   return b; 
> } 
> } 
> 
> It seems that the assertion will fail when nunits = [2,2] and len + bias = 3,
> for example.

Yes, looks like so. 
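
To spell out why that pair is unordered, here is a toy two-coefficient model in C. It is not the poly_int implementation: `toy_poly`, `toy_known_le` and `toy_ordered_p` are hypothetical names, and the model assumes a value c0 + c1 * x with an unknown x >= 0.

```c
#include <stdbool.h>

/* Toy two-coefficient value c0 + c1 * x with unknown x >= 0.  */
struct toy_poly
{
  unsigned long c0;
  unsigned long c1;
};

/* A <= B for every x >= 0 exactly when both coefficients compare <=.  */
static bool
toy_known_le (struct toy_poly a, struct toy_poly b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* The two values are ordered if one is known to be <= the other.  */
static bool
toy_ordered_p (struct toy_poly a, struct toy_poly b)
{
  return toy_known_le (a, b) || toy_known_le (b, a);
}
```

With nunits = {2, 2} and len + bias = {3, 0}, nunits is smaller at x = 0 (2 < 3) but larger at x = 1 (4 > 3), so neither comparison is known and the checking assertion in ordered_min would trigger.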

> I may be wrong. 

I guess it would be nice to re-formulate the loop in terms of 
the encoded VECTOR_CST elts, but then we need to generate 
the "extents" for set bits, not sure how to do that here. 
Note in the end we get HOST_WIDE_INT extents from adding 
the element size for each mask element we look at.  The question 
is how and if we currently handle the trailing ... correctly 
for VL vectors. 

It should be a matter of creating a few testcases where we 
expect (or expect not) to CSE a [masked] VL vector load with 
one or multiple stores.  Like if we have 

*v = 0; 
*(v + vls) = 1; 
... = *(v + vls/2); 

that is, two VL vector stores that are "adjacent" and one 
load that half-overlaps both.  That 'vls' would be a 
poly-int CST then.  It might be possible to create the 
above with intrinsics(?), for sure within a loop by 
vectorization. 

Richard. 
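
A scalar C sketch of that access pattern, with a fixed `TOY_VLS` standing in for the poly-int vector length (an assumption; in a real testcase the length would be variable and the stores would come from the vectorizer). The names `TOY_VLS` and `toy_half_overlap` are hypothetical, and nothing here claims that a particular internal function is emitted for it.

```c
#include <stddef.h>

/* Stand-in for the vector length in elements; a real VL vector length
   is not a compile-time constant.  */
#define TOY_VLS 8

/* Two adjacent block stores followed by a load that overlaps the
   second half of the first block and the first half of the second.  */
static void
toy_half_overlap (int *v, int *out)
{
  for (size_t i = 0; i < TOY_VLS; i++)
    v[i] = 0;				/* *v = 0;  */
  for (size_t i = 0; i < TOY_VLS; i++)
    v[TOY_VLS + i] = 1;			/* *(v + vls) = 1;  */
  for (size_t i = 0; i < TOY_VLS; i++)
    out[i] = v[TOY_VLS / 2 + i];	/* ... = *(v + vls/2);  */
}
```

A correct lookup has to produce zeros for the first half of the loaded value and ones for the second half, which is what a testcase built on this pattern could check.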

> Thanks. 
> 
> 
> juzhe.zh...@rivai.ai 
>   
> From: Richard Biener 
> Date: 2023-06-26 20:16 
> To: Ju-Zhe Zhong 
> CC: gcc-patches; richard.sandiford 
> Subject: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE 
> On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote: 
>   
> > From: Ju-Zhe Zhong  
> > 
> > Hi, Richi. It seems that we use nunits which is len + bias to iterate then 
> > we can 
> > simplify the codes. 
> > 
> > Also, I fixed behavior of len_store, 
> > 
> > Before this patch: 
> >(len - bias) * BITS_PER_UNIT 
> > After this patch: 
> >(len + bias) * BITS_PER_UNIT 
> > 
> > gcc/ChangeLog: 
> > 
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> > fix LEN_STORE. 
> > 
> > --- 
> >  gcc/tree-ssa-sccvn.cc | 24 ++-- 
> >  1 file changed, 22 insertions(+), 2 deletions(-) 
> > 
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc 
> > index 11061a374a2..d66e75460ed 100644 
> > --- a/gcc/tree-ssa-sccvn.cc 
> > +++ b/gcc/tree-ssa-sccvn.cc 
> > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_, 
> >if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) 
> >  return (void *)-1; 
> >break; 
> > + case IFN_LEN_MASK_STORE: 
> > +   len = gimple_call_arg (call, 2); 
> > +   bias = gimple_call_arg (call, 5); 
> > +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) 
> > + return (void *)-1; 
> > +   mask = gimple_call_arg (call, internal_fn_mask_index (fn)); 
> > +   mask = vn_valueize (mask); 
> > +   if (TREE_CODE (mask) != VECTOR_CST) 
> > + return (void *)-1; 
> > +   break; 
> >  default: 
> >return (void *)-1; 
> >  } 
> > @@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_, 
> >tree vectype = TREE_TYPE (def_rhs); 
> >unsigned HOST_WIDE_INT elsz 
> >  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); 
> > +   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); 
> > +   if (len) 
> > + { 
> > +   /* Since the following while condition known_lt 
> > +  (mask_idx, nunits) will exit the while loop 
> > +  when mask_idx > nunits.coeffs[0], we pick the 
> > +  MIN (nunits.coeffs[0], len + bias).  */ 
> > +   nunits = MIN (nunits.coeffs[0], 
> > + tree_to_uhwi (len) + tree_to_shwi (bias)); 
>   
> I think you can use ordered_min here?  Alternatively doing ... 
>   
> > + } 
> >if (mask) 
> >  {

Re: [PATCH v6] tree-ssa-sink: Improve code sinking pass

On Sat, Jun 24, 2023 at 6:12 AM Ajit Agarwal  wrote:
>
> Hello All:
>
> This patch improves code sinking pass to sink statements before call to reduce
> register pressure.
> Review comments are incorporated.
>
> For example :
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
>
> Code Sinking does the following:
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>
>   if (a != 5)
> {
>   l = a + b + c + d +e + f;
>   bar();
>   j = l;
> }
> }
>
> Bootstrapped regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
>
> tree-ssa-sink: Improve code sinking pass
>
> Currently, code sinking will sink code after function calls.  This increases
> register pressure for callee-saved registers.  The following patch improves
> code sinking by placing the sunk code before calls in the use block or in
> the immediate dominator of the use blocks.
>
> 2023-06-24  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> PR tree-optimization/81953
> * tree-ssa-sink.cc (statement_sink_location): Move statements before
> calls.
> (def_use_same_block): New function.
> (select_best_block): Add heuristics to select the best blocks in the
> immediate post dominator.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/81953
> * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
> * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 +
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 ++
>  gcc/tree-ssa-sink.cc| 68 ++---
>  3 files changed, 92 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  if (b != 3)
> +x = 3;
> +  else
> +x = 5;
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index b1ba7a2ad6c..791d44249f9 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -171,9 +171,28 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>return commondom;
>  }
>
> +/* Return TRUE if immediate uses of the defs in
> +   STMT occur in the same block as STMT, FALSE otherwise.  */
> +
> +static bool
> +def_use_same_block (gimple *stmt)
> +{
> +  def_operand_p def;
> +  ssa_op_iter iter;
> +
> +  FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
> +{
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (DEF_FROM_PTR (def));
> +  if ((gimple_bb (def_stmt) == gimple_bb (stmt)))
> +   return true;

This doesn't do what the comment says?  It returns true if 'stmt' has
any SSA DEF,
because in fact def_stmt == stmt in all cases.

I should probably stop looking here, but I'll point you to PR110218 where I note
the selection of the block should happen in the walk where we walk immediate
dominators, and both the abnormal_pred and ->count checks should happen
there, causing us to consider the next dominator as candidate.
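
To illustrate the first point with a toy model (not GIMPLE, and not a proposed replacement): what the comment describes needs a walk over the uses of the definition, not over the definitions of the statement. `toy_stmt` and `toy_use_in_same_block` are hypothetical names invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: the block it lives in and the statements using its
   result.  */
struct toy_stmt
{
  int bb;
  const struct toy_stmt *const *uses;
  size_t n_uses;
};

/* Return true if some use of DEF's result is in the same block as DEF.
   Comparing DEF's block with itself, as the patch effectively does,
   would be true for any statement that defines something.  */
static bool
toy_use_in_same_block (const struct toy_stmt *def)
{
  for (size_t i = 0; i < def->n_uses; i++)
    if (def->uses[i]->bb == def->bb)
      return true;
  return false;
}
```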

> + }
> +  return false;
> +}
> +
>  /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
> tree, return the best basic block between them (inclusive) to place
> -   statements.
> +   statements. The best basic block should be an immediate dominator of
> +   best basic block if the use stmt is after the call.
>
> We want the most control dependent block in the shallowest loop nest.
>
> @@ -190,7 +209,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>  static

Re: [PATCH] tree-optimization/110381 - preserve SLP permutation with in-order reductions

2023-06-26 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> The following fixes a bug that manifests itself during fold-left
> reduction transform in picking not the last scalar def to replace
> and thus double-counting some elements.  But the underlying issue
> is that we merge a load permutation into the in-order reduction
> which is of course wrong.
>
> Now, reduction analysis has not yet been performed when optimizing
> permutations so we have to resort to checking that ourselves.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
>   PR tree-optimization/110381
>   * tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
>   Materialize permutes before fold-left reductions.
>
>   * gcc.dg/vect/pr110381.c: New testcase.

Thanks, LGTM FWIW.

Richard

> ---
>  gcc/testsuite/gcc.dg/vect/pr110381.c | 40 
>  gcc/tree-vect-slp.cc | 18 +++--
>  2 files changed, 56 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr110381.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c 
> b/gcc/testsuite/gcc.dg/vect/pr110381.c
> new file mode 100644
> index 000..2313dbf11ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +
> +struct FOO {
> +   double a;
> +   double b;
> +   double c;
> +};
> +
> +double __attribute__((noipa))
> +sum_8_foos(const struct FOO* foos)
> +{
> +  double sum = 0;
> +
> +  for (int i = 0; i < 8; ++i)
> +{
> +  struct FOO foo = foos[i];
> +
> +  /* Need to use an in-order reduction here, preserving
> + the load permutation.  */
> +  sum += foo.a;
> +  sum += foo.c;
> +  sum += foo.b;
> +}
> +
> +  return sum;
> +}
> +
> +int main()
> +{
> +  struct FOO foos[8];
> +
> +  __builtin_memset (foos, 0, sizeof (foos));
> +  foos[0].a = __DBL_MAX__;
> +  foos[0].b = 5;
> +  foos[0].c = -__DBL_MAX__;
> +
> +  if (sum_8_foos (foos) != 5)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4481d43e3d7..8cb1ac1f319 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -4682,14 +4682,28 @@ vect_optimize_slp_pass::start_choosing_layouts ()
>m_partition_layout_costs.safe_grow_cleared (m_partitions.length ()
> * m_perms.length ());
>  
> -  /* We have to mark outgoing permutations facing non-reduction graph
> - entries that are not represented as to be materialized.  */
> +  /* We have to mark outgoing permutations facing non-associating-reduction
> + graph entries that are not represented as to be materialized.
> + slp_inst_kind_bb_reduc currently only covers associatable reductions.  
> */
>for (slp_instance instance : m_vinfo->slp_instances)
>  if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
>{
>   unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
>   m_partitions[m_vertices[node_i].partition].layout = 0;
>}
> +else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
> +  {
> + stmt_vec_info stmt_info
> +   = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
> + stmt_vec_info reduc_info = info_for_reduction (m_vinfo, stmt_info);
> + if (needs_fold_left_reduction_p (TREE_TYPE
> +(gimple_get_lhs (stmt_info->stmt)),
> +  STMT_VINFO_REDUC_CODE (reduc_info)))
> +   {
> + unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
> + m_partitions[m_vertices[node_i].partition].layout = 0;
> +   }
> +  }
>  
>/* Check which layouts each node and partition can handle.  Calculate the
>   weights associated with inserting layout changes on edges.  */

[PING] [PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

All:

Ok for trunk. Please review.

Thanks & Regards
Ajit

On 26/06/23 6:12 pm, Ajit Agarwal via Gcc-patches wrote:
> All:
> 
> Ok for trunk. Please review.
> 
> Thanks & Regards
> Ajit
> 
> On 01/06/23 10:53 am, Ajit Agarwal via Gcc-patches wrote:
>> Hello All:
>>
>> This new version of patch 4 improves the ree pass for the rs6000 target using
>> defined ABI interfaces.
>> Bootstrapped and regtested on power64-linux-gnu.
>>
>> Review comments incorporated.
>>
>> Thanks & Regards
>> Ajit
>>
>> Improve ree pass for rs6000 target using defined abi interfaces
>>
>> For the rs6000 target we see redundant zero and sign
>> extensions.  This patch improves the ree pass to eliminate
>> such redundant zero and sign extensions using defined
>> ABI interfaces.
>>
>> 2023-06-01  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * ree.cc (combine_reaching_defs): Use of  zero_extend and sign_extend
>>  defined abi interfaces.
>>  (add_removable_extension): Use of defined abi interfaces for no
>>  reaching defs.
>>  (abi_extension_candidate_return_reg_p): New function.
>>  (abi_extension_candidate_p): New function.
>>  (abi_extension_candidate_argno_p): New function.
>>  (abi_handle_regs_without_defs_p): New function.
>>  (abi_target_promote_function_mode): New function.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/powerpc/zext-elim-3.C
>> ---
>>  gcc/ree.cc| 199 +++---
>>  .../g++.target/powerpc/zext-elim-3.C  |  13 ++
>>  2 files changed, 183 insertions(+), 29 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
>>
>> diff --git a/gcc/ree.cc b/gcc/ree.cc
>> index fc04249fa84..2025a7c43da 100644
>> --- a/gcc/ree.cc
>> +++ b/gcc/ree.cc
>> @@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
>>  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
>>break;
>>  
>> -  gcc_assert (def != NULL);
>> +  if (def == NULL)
>> +return NULL;
>>  
>>ref_chain = DF_REF_CHAIN (def);
>>  
>> @@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
>>return src;
>>  }
>>  
>> +/* Return TRUE if target mode is equal to source mode of zero_extend
>> +   or sign_extend otherwise false.  */
>> +
>> +static bool
>> +abi_target_promote_function_mode (machine_mode mode)
>> +{
>> +  int unsignedp;
>> +  machine_mode tgt_mode =
>> +targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
>> + NULL_TREE, 1);
>> +
>> +  if (tgt_mode == mode)
>> +return true;
>> +  else
>> +return false;
>> +}
>> +
>> +/* Return TRUE if the candidate insn is zero extend and regno is
>> +   an return  registers.  */
>> +
>> +static bool
>> +abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
>> +{
>> +  rtx set = single_set (insn);
>> +
>> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
>> +return false;
>> +
>> +  if (FUNCTION_VALUE_REGNO_P (regno))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return TRUE if reg source operand of zero_extend is argument registers
>> +   and not return registers and source and destination operand are same
>> +   and mode of source and destination operand are not same.  */
>> +
>> +static bool
>> +abi_extension_candidate_p (rtx_insn *insn)
>> +{
>> +  rtx set = single_set (insn);
>> +
>> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
>> +return false;
>> +
>> +  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
>> +  rtx orig_src = XEXP (SET_SRC (set),0);
>> +
>> +  bool copy_needed
>> += (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
>> +
>> +  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
>> +  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
>> +  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return TRUE if the candidate insn is zero extend and regno is
>> +   an argument registers.  */
>> +
>> +static bool
>> +abi_extension_candidate_argno_p (rtx_code code, int regno)
>> +{
>> +  if (code !=  ZERO_EXTEND)
>> +return false;
>> +
>> +  if (FUNCTION_ARG_REGNO_P (regno))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return TRUE if the candidate insn doesn't have defs and have
>> + * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
>> +
>> +static bool
>> +abi_handle_regs_without_defs_p (rtx_insn *insn)
>> +{
>> +  if (side_effects_p (PATTERN (insn)))
>> +return false;
>> +
>> +  struct df_link *uses
>> += get_uses (insn, SET_DEST (PATTERN (insn)));
>> +
>> +  if (!uses)
>> +return false;
>> +
>> +  for (df_link *use = uses; use; use = use->next)
>> +{
>> +  if (!use->ref)
>> +return false;
>> +
>> +  if (BLOCK_FOR_INSN (insn)
>> +  != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
>> +return false;
>> +
>> +  rtx_insn *use_insn = DF_REF_INSN (use->ref);
>> +
>> +  if (GET_CODE (PATTERN (use_insn)) == SET)
>> +{
>> +

Re: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> I think you can use ordered_min here?  Alternatively doing ...
> 
> I checked the definition of ordered_min:
> ordered_min (const poly_int_pod<N, Ca> &a, const poly_int_pod<N, Cb> &b)
> {
>   if (known_le (a, b))
> return a;
>   else
> {
>   if (N > 1)
>   gcc_checking_assert (known_le (b, a));
>   return b;
> }
> }
> 
> It seems that the assertion will fail when nunits = [2,2] and
> len + bias = 3, for example.

Yes, looks like so.

> I may be wrong.

I guess it would be nice to re-formulate the loop in terms of
the encoded VECTOR_CST elts, but then we need to generate
the "extents" for set bits, not sure how to do that here.
Note in the end we get HOST_WIDE_INT extents from adding
the element size for each mask element we look at.  The question
is how and if we currently handle the trailing ... correctly
for VL vectors.
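
As a rough illustration of the "extents" mentioned above, here is a stand-alone sketch (not the code in tree-ssa-sccvn.cc; the function name `mask_extents` and its parameters are made up for this example) that walks a constant mask and records a (start, length) bit range for each run of set elements. The `limit` argument stands in for a fixed element bound such as MIN (nunits.coeffs[0], len + bias); the sketch says nothing about how trailing elements of a VL vector should be treated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative only: collect (start, length) bit ranges for runs of set
// mask elements, looking at no more than LIMIT elements of ELSZ bits each.
std::vector<std::pair<std::int64_t, std::int64_t>>
mask_extents (const std::vector<bool> &mask, std::int64_t elsz,
              std::size_t limit)
{
  assert (elsz > 0);
  std::vector<std::pair<std::int64_t, std::int64_t>> extents;
  std::int64_t start = 0, length = 0;
  std::size_t n = limit < mask.size () ? limit : mask.size ();
  for (std::size_t i = 0; i < n; ++i)
    {
      if (!mask[i])
        {
          // A clear element ends the current run, if there is one.
          if (length != 0)
            extents.push_back ({start, length});
          start = static_cast<std::int64_t> (i + 1) * elsz;
          length = 0;
        }
      else
        length += elsz;
    }
  // Flush a run that reaches the last element examined.
  if (length != 0)
    extents.push_back ({start, length});
  return extents;
}
```

With the mask { 1, 1, 0, 1 } and 8-bit elements this yields two ranges when all four elements are examined and only the first range when the limit is 3.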

It should be a matter of creating a few testcases where we
expect (or expect not) to CSE a [masked] VL vector load with
one or multiple stores.  Like if we have

 *v = 0;
 *(v + vls) = 1;
 ... = *(v + vls/2);

that is, two VL vector stores that are "adjacent" and one
load that half-overlaps both.  That 'vls' would be a
poly-int CST then.  It might be possible to create the
above with intrinsics(?), for sure within a loop by
vectorization.

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-26 20:16
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richi. It seems that we use nunits which is len + bias to iterate then 
> > we can
> > simplify the codes.
> > 
> > Also, I fixed behavior of len_store,
> > 
> > Before this patch:
> >(len - bias) * BITS_PER_UNIT
> > After this patch:
> >(len + bias) * BITS_PER_UNIT
> >
> > gcc/ChangeLog:
> > 
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> > fix LEN_STORE.
> > 
> > ---
> >  gcc/tree-ssa-sccvn.cc | 24 ++--
> >  1 file changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 11061a374a2..d66e75460ed 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> >  return (void *)-1;
> >break;
> > + case IFN_LEN_MASK_STORE:
> > +   len = gimple_call_arg (call, 2);
> > +   bias = gimple_call_arg (call, 5);
> > +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > + return (void *)-1;
> > +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> > +   mask = vn_valueize (mask);
> > +   if (TREE_CODE (mask) != VECTOR_CST)
> > + return (void *)-1;
> > +   break;
> >  default:
> >return (void *)-1;
> >  }
> > @@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >tree vectype = TREE_TYPE (def_rhs);
> >unsigned HOST_WIDE_INT elsz
> >  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> > +   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> > +   if (len)
> > + {
> > +   /* Since the following while condition known_lt
> > +  (mask_idx, nunits) will exit the while loop
> > +  when mask_idx > nunits.coeffs[0], we pick the
> > +  MIN (nunits.coeffs[0], len + bias).  */
> > +   nunits = MIN (nunits.coeffs[0],
> > + tree_to_uhwi (len) + tree_to_shwi (bias));
>  
> I think you can use ordered_min here?  Alternatively doing ...
>  
> > + }
> >if (mask)
> >  {
> >HOST_WIDE_INT start = 0, length = 0;
> > @@ -3373,7 +3393,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >  length += elsz;
> >mask_idx++;
> >  }
> > -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> > +   while (known_lt (mask_idx, nunits));
>  
> && mask_id < len
>  
> would be possible.
>  
> Richard?
>  
> Thanks,
> Richard.
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

All:

Ok for trunk. Please review.

Thanks & Regards
Ajit

On 01/06/23 10:53 am, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> This new version of patch 4 improves the ree pass for the rs6000 target
> using defined ABI interfaces.
> Bootstrapped and regtested on power64-linux-gnu.
> 
> Review comments incorporated.
> 
> Thanks & Regards
> Ajit
> 
> Improve ree pass for rs6000 target using defined abi interfaces
> 
> For the rs6000 target we see redundant zero and sign
> extensions.  Improve the ree pass to eliminate such
> redundant zero and sign extensions using defined
> ABI interfaces.
> 
> 2023-06-01  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * ree.cc (combine_reaching_defs): Use of  zero_extend and sign_extend
>   defined abi interfaces.
>   (add_removable_extension): Use of defined abi interfaces for no
>   reaching defs.
>   (abi_extension_candidate_return_reg_p): New function.
>   (abi_extension_candidate_p): New function.
>   (abi_extension_candidate_argno_p): New function.
>   (abi_handle_regs_without_defs_p): New function.
>   (abi_target_promote_function_mode): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/powerpc/zext-elim-3.C
> ---
>  gcc/ree.cc| 199 +++---
>  .../g++.target/powerpc/zext-elim-3.C  |  13 ++
>  2 files changed, 183 insertions(+), 29 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
> 
> diff --git a/gcc/ree.cc b/gcc/ree.cc
> index fc04249fa84..2025a7c43da 100644
> --- a/gcc/ree.cc
> +++ b/gcc/ree.cc
> @@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
>  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
>break;
>  
> -  gcc_assert (def != NULL);
> +  if (def == NULL)
> +return NULL;
>  
>ref_chain = DF_REF_CHAIN (def);
>  
> @@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
>return src;
>  }
>  
> +/* Return TRUE if target mode is equal to source mode of zero_extend
> +   or sign_extend otherwise false.  */
> +
> +static bool
> +abi_target_promote_function_mode (machine_mode mode)
> +{
> +  int unsignedp;
> +  machine_mode tgt_mode =
> +targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
> +  NULL_TREE, 1);
> +
> +  if (tgt_mode == mode)
> +return true;
> +  else
> +return false;
> +}
> +
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   an return  registers.  */
> +
> +static bool
> +abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
> +return false;
> +
> +  if (FUNCTION_VALUE_REGNO_P (regno))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if reg source operand of zero_extend is argument registers
> +   and not return registers and source and destination operand are same
> +   and mode of source and destination operand are not same.  */
> +
> +static bool
> +abi_extension_candidate_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
> +return false;
> +
> +  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
> +  rtx orig_src = XEXP (SET_SRC (set),0);
> +
> +  bool copy_needed
> += (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
> +
> +  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
> +  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   an argument registers.  */
> +
> +static bool
> +abi_extension_candidate_argno_p (rtx_code code, int regno)
> +{
> +  if (code !=  ZERO_EXTEND)
> +return false;
> +
> +  if (FUNCTION_ARG_REGNO_P (regno))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if the candidate insn doesn't have defs and have
> + * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
> +
> +static bool
> +abi_handle_regs_without_defs_p (rtx_insn *insn)
> +{
> +  if (side_effects_p (PATTERN (insn)))
> +return false;
> +
> +  struct df_link *uses
> += get_uses (insn, SET_DEST (PATTERN (insn)));
> +
> +  if (!uses)
> +return false;
> +
> +  for (df_link *use = uses; use; use = use->next)
> +{
> +  if (!use->ref)
> + return false;
> +
> +  if (BLOCK_FOR_INSN (insn)
> +   != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
> + return false;
> +
> +  rtx_insn *use_insn = DF_REF_INSN (use->ref);
> +
> +  if (GET_CODE (PATTERN (use_insn)) == SET)
> + {
> +   rtx_code code = GET_CODE (SET_SRC (PATTERN (use_insn)));
> +
> +   if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
> +   || GET_RTX_CLASS (code) == RTX_COMM_ARITH
> +   || GET_RTX_CLASS (code) == RTX_UNARY)
> + return false;
> + }
> +

[PING] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 target.

All:

Ok for trunk. Please review.

Thanks & Regards
Ajit

On 07/06/23 3:55 pm, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> This patch improves the ree pass for the rs6000 target.
> It eliminates sign_extend/zero_extend/AND with varying constants.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> ree: Improve ree pass for rs6000 target
> 
> For the rs6000 target we see redundant zero and sign extensions.  Improve the
> ree pass to eliminate such redundant zero and sign extensions.  Support
> zero_extend/sign_extend/AND.  Also support AND used as an extension with
> constants other than 1.
> 
> 2023-06-07  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * ree.cc (eliminate_across_bbs_p): Add checks to enable extension
>   elimination across and within basic blocks.
>   (def_arith_p): New function to check definition has arithmetic
>   operation.
>   (combine_set_extension): Modification to incorporate AND
>   and current zero_extend and sign_extend instruction.
>   (merge_def_and_ext): Add calls to eliminate_across_bbs_p and
>   zero_extend sign_extend and AND instruction.
>   (rtx_is_zext_p): New function.
>   (feasible_cfg): New function.
>   * rtl.h (reg_used_set_between_p): Add prototype.
>   * rtlanal.cc (reg_used_set_between_p): New function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/zext-elim.C: New testcase.
>   * g++.target/powerpc/zext-elim-1.C: New testcase.
>   * g++.target/powerpc/zext-elim-2.C: New testcase.
>   * g++.target/powerpc/sext-elim.C: New testcase.
> ---
>  gcc/ree.cc| 476 --
>  gcc/rtl.h |   1 +
>  gcc/rtlanal.cc|  15 +
>  gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
>  .../g++.target/powerpc/zext-elim-1.C  |  19 +
>  .../g++.target/powerpc/zext-elim-2.C  |  11 +
>  gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
>  7 files changed, 524 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C
> 
> diff --git a/gcc/ree.cc b/gcc/ree.cc
> index fc04249fa84..dc6da21ec16 100644
> --- a/gcc/ree.cc
> +++ b/gcc/ree.cc
> @@ -253,6 +253,66 @@ struct ext_cand
>  
>  static int max_insn_uid;
>  
> +/* Return TRUE if OP can be considered a zero extension from one or
> +   more sub-word modes to larger modes up to a full word.
> +
> +   For example (and:DI (reg) (const_int X))
> +
> +   Depending on the value of X could be considered a zero extension
> +   from QI, HI and SI to larger modes up to DImode.  */
> +
> +static bool
> +rtx_is_zext_p (rtx insn)
> +{
> +  if (GET_CODE (insn) == AND)
> +{
> +  rtx set = XEXP (insn, 0);
> +  if (REG_P (set))
> + {
> +   rtx src = XEXP (insn, 1);
> +
> +   if (CONST_INT_P (src)
> +   && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
> + return true;
> + }
> +  else
> + return false;
> +}
> +
> +  return false;
> +}
> +/* Return TRUE if OP can be considered a zero extension from one or
> +   more sub-word modes to larger modes up to a full word.
> +
> +   For example (and:DI (reg) (const_int X))
> +
> +   Depending on the value of X could be considered a zero extension
> +   from QI, HI and SI to larger modes up to DImode.  */
> +
> +static bool
> +rtx_is_zext_p (rtx_insn *insn)
> +{
> +  rtx body = single_set (insn);
> +
> +  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
> +   {
> + rtx set = XEXP (SET_SRC (body), 0);
> +
> + if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
> +   {
> +   rtx src = XEXP (SET_SRC (body), 1);
> +
> +   if (CONST_INT_P (src)
> +   && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
> + return true;
> +   }
> + else
> +  return false;
> +   }
> +
> +   return false;
> +}
> +
>  /* Update or remove REG_EQUAL or REG_EQUIV notes for INSN.  */
>  
>  static bool
> @@ -319,7 +379,7 @@ combine_set_extension (ext_cand *cand, rtx_insn 
> *curr_insn, rtx *orig_set)
>  {
>rtx orig_src = SET_SRC (*orig_set);
>machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
> -  rtx new_set;
> +  rtx new_set = NULL_RTX;
>rtx cand_pat = single_set (cand->insn);
>  
>/* If the extension's source/destination registers are not the same
> @@ -359,27 +419,41 @@ combine_set_extension (ext_cand *cand, rtx_insn 
> *curr_insn, rtx *orig_set)
>else if (GET_CODE (orig_src) == cand->code)
>  {
>/* Here is a sequence of two extensions.  Try to merge them.  */
> -  rtx temp_extension
> - = gen_rtx_fmt_e (cand->code, cand->mode, XEXP (orig_src, 0));
> +

[PING] [PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

All:

Ok for trunk. Please review.


Thanks & Regards
Ajit

On 01/06/23 10:53 am, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> This new version of patch 4 improves the ree pass for the rs6000 target
> using defined ABI interfaces.
> Bootstrapped and regtested on power64-linux-gnu.
> 
> Review comments incorporated.
> 
> Thanks & Regards
> Ajit
> 
> Improve ree pass for rs6000 target using defined abi interfaces
> 
> For the rs6000 target we see redundant zero and sign
> extensions.  Improve the ree pass to eliminate such
> redundant zero and sign extensions using defined
> ABI interfaces.
> 
> 2023-06-01  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * ree.cc (combine_reaching_defs): Use of  zero_extend and sign_extend
>   defined abi interfaces.
>   (add_removable_extension): Use of defined abi interfaces for no
>   reaching defs.
>   (abi_extension_candidate_return_reg_p): New function.
>   (abi_extension_candidate_p): New function.
>   (abi_extension_candidate_argno_p): New function.
>   (abi_handle_regs_without_defs_p): New function.
>   (abi_target_promote_function_mode): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/powerpc/zext-elim-3.C
> ---
>  gcc/ree.cc| 199 +++---
>  .../g++.target/powerpc/zext-elim-3.C  |  13 ++
>  2 files changed, 183 insertions(+), 29 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
> 
> diff --git a/gcc/ree.cc b/gcc/ree.cc
> index fc04249fa84..2025a7c43da 100644
> --- a/gcc/ree.cc
> +++ b/gcc/ree.cc
> @@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
>  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
>break;
>  
> -  gcc_assert (def != NULL);
> +  if (def == NULL)
> +return NULL;
>  
>ref_chain = DF_REF_CHAIN (def);
>  
> @@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
>return src;
>  }
>  
> +/* Return TRUE if target mode is equal to source mode of zero_extend
> +   or sign_extend otherwise false.  */
> +
> +static bool
> +abi_target_promote_function_mode (machine_mode mode)
> +{
> +  int unsignedp;
> +  machine_mode tgt_mode =
> +targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
> +  NULL_TREE, 1);
> +
> +  if (tgt_mode == mode)
> +return true;
> +  else
> +return false;
> +}
> +
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   an return  registers.  */
> +
> +static bool
> +abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
> +return false;
> +
> +  if (FUNCTION_VALUE_REGNO_P (regno))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if reg source operand of zero_extend is argument registers
> +   and not return registers and source and destination operand are same
> +   and mode of source and destination operand are not same.  */
> +
> +static bool
> +abi_extension_candidate_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
> +return false;
> +
> +  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
> +  rtx orig_src = XEXP (SET_SRC (set),0);
> +
> +  bool copy_needed
> += (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
> +
> +  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
> +  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   an argument registers.  */
> +
> +static bool
> +abi_extension_candidate_argno_p (rtx_code code, int regno)
> +{
> +  if (code !=  ZERO_EXTEND)
> +return false;
> +
> +  if (FUNCTION_ARG_REGNO_P (regno))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if the candidate insn doesn't have defs and have
> + * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
> +
> +static bool
> +abi_handle_regs_without_defs_p (rtx_insn *insn)
> +{
> +  if (side_effects_p (PATTERN (insn)))
> +return false;
> +
> +  struct df_link *uses
> += get_uses (insn, SET_DEST (PATTERN (insn)));
> +
> +  if (!uses)
> +return false;
> +
> +  for (df_link *use = uses; use; use = use->next)
> +{
> +  if (!use->ref)
> + return false;
> +
> +  if (BLOCK_FOR_INSN (insn)
> +   != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
> + return false;
> +
> +  rtx_insn *use_insn = DF_REF_INSN (use->ref);
> +
> +  if (GET_CODE (PATTERN (use_insn)) == SET)
> + {
> +   rtx_code code = GET_CODE (SET_SRC (PATTERN (use_insn)));
> +
> +   if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
> +   || GET_RTX_CLASS (code) == RTX_COMM_ARITH
> +   || GET_RTX_CLASS (code) == RTX_UNARY)
> + return false;
> + }
> +

Re: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization

Sure. Sent it:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622836.html 




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-26 17:10
To: Robin Dapp
CC: Juzhe-Zhong; gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization
Could you re-title this patch into something like "Support const
vector expansion with xxx pattern",
 
On Mon, Jun 26, 2023 at 3:52 PM Robin Dapp via Gcc-patches
 wrote:
>
> Hi Juzhe,
>
> > Currently, we are able to generate step vector with base == 0:
> >  { 0, 0, 2, 2, 4, 4, ... }
> >
> > ASM:
> >
> > vid
> > vand
> >
> > However, we do wrong for step vector with base != 0:
> > { 1, 1, 3, 3, 5, 5, ... }
> >
> > Before this patch, such case will run fail.
> >
> > After this patch, we are able to pass the testcase and generate the step 
> > vector with asm:
> >
> > vid
> > vand
> > vadd
>
> Can't we use the first case as long as pow2_p (base) == true
> and not just for base == 0?
>
> Regards
>  Robin
>

Re: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

Hi, Richi.

>> I think you can use ordered_min here?  Alternatively doing ...

I checked the definition of ordered_min:
ordered_min (const poly_int_pod<N, Ca> &a, const poly_int_pod<N, Cb> &b)
{
  if (known_le (a, b))
return a;
  else
{
  if (N > 1)
  gcc_checking_assert (known_le (b, a));
  return b;
}
}

It seems that the assertion will fail when nunits = [2,2] and
len + bias = 3, for example.
I may be wrong.
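
To make the case above concrete, here is a small stand-alone model. It is only a sketch: `Poly2` and the `*_model` helpers are hypothetical names for this example, not GCC's poly_int API. A value is c0 + c1 * x with x >= 0 known only at run time, so a <= b holds for every x exactly when both coefficients compare that way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int:
// value = c0 + c1 * x, where x >= 0 is only known at run time.
struct Poly2
{
  std::int64_t c0;
  std::int64_t c1;
};

// a <= b for every x >= 0 exactly when both coefficients are <=.
bool
known_le_model (Poly2 a, Poly2 b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// True when the two values can be ordered for all x, which is the
// condition the checking assert in ordered_min relies on.
bool
ordered_p_model (Poly2 a, Poly2 b)
{
  return known_le_model (a, b) || known_le_model (b, a);
}

// Model of ordered_min: only meaningful when the operands are ordered.
Poly2
ordered_min_model (Poly2 a, Poly2 b)
{
  if (known_le_model (a, b))
    return a;
  assert (known_le_model (b, a));
  return b;
}
```

For nunits = [2,2] and the constant 3 (that is, [3,0]), 2 + 2x is below 3 at x = 0 and above it at x = 1, so `ordered_p_model` is false and the assert in `ordered_min_model` would fire.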

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-26 20:16
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richi. It seems that we use nunits which is len + bias to iterate then we 
> can
> simplify the codes.
> 
> Also, I fixed behavior of len_store,
> 
> Before this patch:
>(len - bias) * BITS_PER_UNIT
> After this patch:
>(len + bias) * BITS_PER_UNIT
>
> gcc/ChangeLog:
> 
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> fix LEN_STORE.
> 
> ---
>  gcc/tree-ssa-sccvn.cc | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..d66e75460ed 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
>  return (void *)-1;
>break;
> + case IFN_LEN_MASK_STORE:
> +   len = gimple_call_arg (call, 2);
> +   bias = gimple_call_arg (call, 5);
> +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> + return (void *)-1;
> +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> +   mask = vn_valueize (mask);
> +   if (TREE_CODE (mask) != VECTOR_CST)
> + return (void *)-1;
> +   break;
>  default:
>return (void *)-1;
>  }
> @@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>tree vectype = TREE_TYPE (def_rhs);
>unsigned HOST_WIDE_INT elsz
>  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +   if (len)
> + {
> +   /* Since the following while condition known_lt
> +  (mask_idx, nunits) will exit the while loop
> +  when mask_idx > nunits.coeffs[0], we pick the
> +  MIN (nunits.coeffs[0], len + bias).  */
> +   nunits = MIN (nunits.coeffs[0],
> + tree_to_uhwi (len) + tree_to_shwi (bias));
 
I think you can use ordered_min here?  Alternatively doing ...
 
> + }
>if (mask)
>  {
>HOST_WIDE_INT start = 0, length = 0;
> @@ -3373,7 +3393,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>  length += elsz;
>mask_idx++;
>  }
> -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> +   while (known_lt (mask_idx, nunits));
 
&& mask_id < len
 
would be possible.
 
Richard?
 
Thanks,
Richard.

Re: [PATCH] vect: Cost intermediate conversions

On Mon, Jun 26, 2023 at 1:58 PM Richard Sandiford via Gcc-patches
 wrote:
>
> g:6f19cf7526168f8 extended N-vector to N-vector conversions
> to handle cases where an intermediate integer extension or
> truncation is needed.  This patch adjusts the cost to account
> for these intermediate conversions.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK.
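
For readers following along, the adjusted count passed to the cost hook can be sketched as below. This is a minimal illustration: the helper name `conversion_copies_to_cost` is invented here, and it assumes `multi_step_cvt` counts the intermediate conversion steps, as the expression in the patch suggests.

```cpp
#include <cassert>

// Illustrative only: number of vector statements to cost for an
// N-vector to N-vector conversion that needs MULTI_STEP_CVT extra
// intermediate steps per copy.
unsigned
conversion_copies_to_cost (unsigned ncopies, unsigned multi_step_cvt)
{
  assert (ncopies > 0);
  return ncopies * (1 + multi_step_cvt);
}
```

With two copies and one intermediate step, four statements are costed instead of two.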

> Richard
>
> gcc/
> * tree-vect-stmts.cc (vectorizable_conversion): Take multi_step_cvt
> into account when costing non-widening/truncating conversions.
> ---
>  gcc/tree-vect-stmts.cc | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index ae24f3e66e6..7bc602bf90a 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -5470,8 +5470,9 @@ vectorizable_conversion (vec_info *vinfo,
>if (modifier == NONE)
>  {
>   STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
> - vect_model_simple_cost (vinfo, stmt_info, ncopies, dt, ndts, 
> slp_node,
> - cost_vec);
> + vect_model_simple_cost (vinfo, stmt_info,
> + ncopies * (1 + multi_step_cvt),
> + dt, ndts, slp_node, cost_vec);
> }
>else if (modifier == NARROW_SRC || modifier == NARROW_DST)
> {
> --
> 2.25.1
>

[PATCH V2] RISC-V: Support const vector expansion with step vector with base != 0

2023-06-26 Juzhe-Zhong

Currently, we are able to generate step vector with base == 0:
 { 0, 0, 2, 2, 4, 4, ... }

ASM:

vid
vand

However, we generate wrong code for a step vector with base != 0:
{ 1, 1, 3, 3, 5, 5, ... }

Before this patch, such a case fails at run time.

After this patch, the testcase passes and the step vector is generated
with the following asm:

vid
vand
vadd
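
The element values produced by that sequence can be modelled in scalar code as follows. This is a sketch, not the expander itself: `stepped_vector_model` is a made-up name, and it only covers the two vectors quoted above, where each group of `npatterns` elements shares one value and `npatterns` is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative scalar model of vid / vand / vadd for one vector.
std::vector<std::int64_t>
stepped_vector_model (std::size_t nelts, std::int64_t npatterns,
                      std::int64_t base)
{
  assert (npatterns > 0 && (npatterns & (npatterns - 1)) == 0);
  std::vector<std::int64_t> out (nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    {
      std::int64_t vid = static_cast<std::int64_t> (i); // vid: element index
      std::int64_t masked = vid & -npatterns;           // vand with -npatterns
      out[i] = masked + base;                           // vadd of the base
    }
  return out;
}
```

With npatterns = 2, a base of 0 gives { 0, 0, 2, 2, 4, 4 } and a base of 1 gives { 1, 1, 3, 3, 5, 5 }; in the patch the add is skipped when the base is 0.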

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix stepped vector 
with base != 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-19.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 14 +++-
 .../riscv/rvv/autovec/partial/slp-17.c| 34 
 .../riscv/rvv/autovec/partial/slp-18.c| 26 ++
 .../riscv/rvv/autovec/partial/slp-19.c| 26 ++
 .../riscv/rvv/autovec/partial/slp_run-17.c| 84 +++
 .../riscv/rvv/autovec/partial/slp_run-18.c| 69 +++
 .../riscv/rvv/autovec/partial/slp_run-19.c| 69 +++
 7 files changed, 320 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-18.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-19.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-17.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-18.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-19.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5518394be1e..cd3422bf711 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1258,7 +1258,6 @@ expand_const_vector (rtx target, rtx src)
}
  emit_move_insn (target, tmp);
}
-  return;
 }
   else if (CONST_VECTOR_STEPPED_P (src))
 {
@@ -1287,9 +1286,20 @@ expand_const_vector (rtx target, rtx src)
  */
  rtx imm
= gen_int_mode (-builder.npatterns (), builder.inner_mode ());
- rtx and_ops[] = {target, vid, imm};
+ rtx tmp = gen_reg_rtx (builder.mode ());
+ rtx and_ops[] = {tmp, vid, imm};
  icode = code_for_pred_scalar (AND, builder.mode ());
  emit_vlmax_insn (icode, RVV_BINOP, and_ops);
+ HOST_WIDE_INT init_val = INTVAL (builder.elt (0));
+ if (init_val == 0)
+   emit_move_insn (target, tmp);
+ else
+   {
+ rtx dup = gen_const_vector_dup (builder.mode (), init_val);
+ rtx add_ops[] = {target, tmp, dup};
+ icode = code_for_pred (PLUS, builder.mode ());
+ emit_vlmax_insn (icode, RVV_BINOP, add_ops);
+   }
}
  else
{
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c
new file mode 100644
index 000..2f2c3d11c2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-17.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param 
riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include 
+
+void
+f (uint8_t *restrict a, uint8_t *restrict b,
+   uint8_t *restrict c, uint8_t *restrict d,
+   int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  a[i * 8] = c[i * 8] + d[i * 8];
+  a[i * 8 + 1] = c[i * 8] + d[i * 8 + 1];
+  a[i * 8 + 2] = c[i * 8 + 2] + d[i * 8 + 2];
+  a[i * 8 + 3] = c[i * 8 + 2] + d[i * 8 + 3];
+  a[i * 8 + 4] = c[i * 8 + 4] + d[i * 8 + 4];
+  a[i * 8 + 5] = c[i * 8 + 4] + d[i * 8 + 5];
+  a[i * 8 + 6] = c[i * 8 + 6] + d[i * 8 + 6];
+  a[i * 8 + 7] = c[i * 8 + 6] + d[i * 8 + 7];
+  b[i * 8] = c[i * 8 + 1] + d[i * 8];
+  b[i * 8 + 1] = c[i * 8 + 1] + d[i * 8 + 1];
+  b[i * 8 + 2] = c[i * 8 + 3] + d[i * 8 + 2];
+  b[i * 8 + 3] = c[i * 8 + 3] + d[i * 8 + 3];
+  b[i * 8 + 4] = c[i * 8 + 5] + d[i * 8 + 4];
+  b[i * 8 + 5] = c[i * 8 + 5] + d[i * 8 + 5];
+  b[i * 8 + 6] = c[i * 8 + 7] + d[i * 8 + 6];
+  b[i * 8 + 7] = c[i * 8 + 7] + d[i * 8 + 7];
+}
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 2 "optimized" } } */
+/* { dg-final { scan-assembler {\tvid\.v} } } */
+/* { dg-final { scan-assembler-not {\tvmul} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-18.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-18.c
new file mode 100644
index

[PATCH] tree-optimization/110381 - preserve SLP permutation with in-order reductions

The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements.  But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.

Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to checking that ourselves.
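
A small stand-alone sketch of why the order matters for an in-order floating-point reduction; the helper `sum_in_order` and the arrays are invented for this illustration and use the same three values as the testcase below.

```cpp
#include <cassert>
#include <cfloat>

// Same values as one element of the testcase: a, b, c.
const double example_vals[3] = { DBL_MAX, 5.0, -DBL_MAX };

// Source order is a, c, b; a merged load permutation would give a, b, c.
const int source_order[3] = { 0, 2, 1 };
const int permuted_order[3] = { 0, 1, 2 };

// Add VALS strictly left to right in the order given by ORDER.
double
sum_in_order (const double *vals, const int *order, int n)
{
  assert (n >= 0);
  double sum = 0;
  for (int i = 0; i < n; ++i)
    sum += vals[order[i]];
  return sum;
}
```

In source order the two large values cancel before 5 is added, so the sum is 5. In permuted order 5 is absorbed by rounding when added to DBL_MAX, and the final result is 0.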

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110381
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.

* gcc.dg/vect/pr110381.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr110381.c | 40 
 gcc/tree-vect-slp.cc | 18 +++--
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr110381.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c 
b/gcc/testsuite/gcc.dg/vect/pr110381.c
new file mode 100644
index 000..2313dbf11ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+
+struct FOO {
+   double a;
+   double b;
+   double c;
+};
+
+double __attribute__((noipa))
+sum_8_foos(const struct FOO* foos)
+{
+  double sum = 0;
+
+  for (int i = 0; i < 8; ++i)
+{
+  struct FOO foo = foos[i];
+
+  /* Need to use an in-order reduction here, preserving
+ the load permutation.  */
+  sum += foo.a;
+  sum += foo.c;
+  sum += foo.b;
+}
+
+  return sum;
+}
+
+int main()
+{
+  struct FOO foos[8];
+
+  __builtin_memset (foos, 0, sizeof (foos));
+  foos[0].a = __DBL_MAX__;
+  foos[0].b = 5;
+  foos[0].c = -__DBL_MAX__;
+
+  if (sum_8_foos (foos) != 5)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4481d43e3d7..8cb1ac1f319 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4682,14 +4682,28 @@ vect_optimize_slp_pass::start_choosing_layouts ()
   m_partition_layout_costs.safe_grow_cleared (m_partitions.length ()
  * m_perms.length ());
 
-  /* We have to mark outgoing permutations facing non-reduction graph
- entries that are not represented as to be materialized.  */
+  /* We have to mark outgoing permutations facing non-associating-reduction
+ graph entries that are not represented as to be materialized.
+ slp_inst_kind_bb_reduc currently only covers associatable reductions.  */
   for (slp_instance instance : m_vinfo->slp_instances)
 if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
   {
unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
m_partitions[m_vertices[node_i].partition].layout = 0;
   }
+else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
+  {
+   stmt_vec_info stmt_info
+ = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
+   stmt_vec_info reduc_info = info_for_reduction (m_vinfo, stmt_info);
+   if (needs_fold_left_reduction_p (TREE_TYPE
+  (gimple_get_lhs (stmt_info->stmt)),
+STMT_VINFO_REDUC_CODE (reduc_info)))
+ {
+   unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
+   m_partitions[m_vertices[node_i].partition].layout = 0;
+ }
+  }
 
   /* Check which layouts each node and partition can handle.  Calculate the
  weights associated with inserting layout changes on edges.  */
-- 
2.35.3

Re: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization

No, I don't think we can use the first case (vid + vand -npatterns) for base != 0,
since the first element of vid is always 0.
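
A scalar sketch of the two sequences may help here.  It is only a model
of the lane values, not the actual expander code, and the function name
and parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model the lane values of vid, vand and vadd for a step vector whose
   pattern repeats NPATTERNS times (NPATTERNS must be a power of two).  */
void
model_step_vector (int64_t *out, size_t n, int64_t npatterns, int64_t base)
{
  assert (out != NULL);
  assert (npatterns > 0 && (npatterns & (npatterns - 1)) == 0);
  for (size_t i = 0; i < n; ++i)
    {
      int64_t vid = (int64_t) i;        /* vid: lane index, lane 0 is 0.  */
      int64_t vand = vid & -npatterns;  /* vand: clear the low bits.  */
      out[i] = vand + base;             /* vadd: needed only if base != 0.  */
    }
}
```

Because lane 0 of vid is 0, lane 0 after vand is also 0 whatever the
mask is, so a non-zero base can only come from the extra vadd.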

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-26 15:51
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization
Hi Juzhe,
 
> Currently, we are able to generate step vector with base == 0:
>  { 0, 0, 2, 2, 4, 4, ... }
> 
> ASM:
> 
> vid
> vand
> 
> However, we do wrong for step vector with base != 0:
> { 1, 1, 3, 3, 5, 5, ... }
> 
> Before this patch, such a case fails at run time.
> 
> After this patch, we are able to pass the testcase and generate the step 
> vector with asm:
> 
> vid
> vand
> vadd
 
Can't we use the first case as long as pow2_p (base) == true
and not just for base == 0?
 
Regards
Robin

Re: [PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi. It seems that if we use nunits, which is len + bias, to iterate,
> then we can simplify the code.
> 
> Also, I fixed behavior of len_store,
> 
> Before this patch:
>(len - bias) * BITS_PER_UNIT
> After this patch:
>(len + bias) * BITS_PER_UNIT
>
> gcc/ChangeLog:
> 
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> fix LEN_STORE.
> 
> ---
>  gcc/tree-ssa-sccvn.cc | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..d66e75460ed 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
> if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
>   return (void *)-1;
> break;
> + case IFN_LEN_MASK_STORE:
> +   len = gimple_call_arg (call, 2);
> +   bias = gimple_call_arg (call, 5);
> +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> + return (void *)-1;
> +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> +   mask = vn_valueize (mask);
> +   if (TREE_CODE (mask) != VECTOR_CST)
> + return (void *)-1;
> +   break;
>   default:
> return (void *)-1;
>   }
> @@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
> tree vectype = TREE_TYPE (def_rhs);
> unsigned HOST_WIDE_INT elsz
>   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +   if (len)
> + {
> +   /* Since the following while condition known_lt
> +  (mask_idx, nunits) will exit the while loop
> +  when mask_idx > nunits.coeffs[0], we pick the
> +  MIN (nunits.coeffs[0], len + bias).  */
> +   nunits = MIN (nunits.coeffs[0],
> + tree_to_uhwi (len) + tree_to_shwi (bias));

I think you can use ordered_min here?  Alternatively doing ...

> + }
> if (mask)
>   {
> HOST_WIDE_INT start = 0, length = 0;
> @@ -3373,7 +3393,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>   length += elsz;
> mask_idx++;
>   }
> -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> +   while (known_lt (mask_idx, nunits));

&& mask_idx < len

would be possible.
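
Either way, the point is to bound the walk over the mask by both the
vector's lane count and the effective length.  A minimal sketch of that
clamping, assuming a constant lane count and a bias of 0 or -1 (the
helper name is hypothetical and this is not the SCCVN code itself):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the lanes a length-and-mask controlled store would write.
   NUNITS is the constant lane count of the mask, LEN and BIAS are the
   length operands.  */
size_t
count_active_lanes (const bool *mask, size_t nunits, size_t len, int bias)
{
  assert (mask != NULL);
  assert (bias == 0 || bias == -1);

  /* Effective length is len + bias, never below zero.  */
  size_t effective = (bias < 0 && len > 0) ? len - 1 : len;

  /* Clamp to the lanes the mask really has, so that an oversized
     length cannot make us read past the end of the mask.  */
  size_t limit = effective < nunits ? effective : nunits;

  size_t active = 0;
  for (size_t i = 0; i < limit; ++i)
    if (mask[i])
      ++active;
  return active;
}
```

With a four-lane mask and len = 8, the loop still stops after four
lanes, which is the out-of-bounds case the MIN is meant to prevent.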

Richard?

Thanks,
Richard.

[PATCH V3] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-26 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richi. It seems that if we use nunits, which is len + bias, to iterate,
then we can simplify the code.

Also, I fixed the behavior of len_store:

Before this patch:
   (len - bias) * BITS_PER_UNIT
After this patch:
   (len + bias) * BITS_PER_UNIT
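
To make the difference concrete, here is a small sketch of the two
formulas.  It assumes 8 bits per unit and a bias of either 0 or -1,
which is my understanding of the len_store bias operand; the function
names are made up for the example:

```c
#include <assert.h>

/* Assumption for this sketch: 8 bits per addressable unit.  */
#define SKETCH_BITS_PER_UNIT 8

/* Size computed before the patch: (len - bias) * BITS_PER_UNIT.  */
long
size_bits_before (long len, long bias)
{
  assert (bias == 0 || bias == -1);
  return (len - bias) * SKETCH_BITS_PER_UNIT;
}

/* Size computed after the patch: (len + bias) * BITS_PER_UNIT.  */
long
size_bits_after (long len, long bias)
{
  assert (bias == 0 || bias == -1);
  return (len + bias) * SKETCH_BITS_PER_UNIT;
}
```

For bias == 0 both agree.  For len == 16 and bias == -1 the old formula
gives 17 units (136 bits), one unit too many on each side of the
intended 15 units (120 bits).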
   
gcc/ChangeLog:

* tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix 
LEN_STORE.

---
 gcc/tree-ssa-sccvn.cc | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 11061a374a2..d66e75460ed 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
  if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
return (void *)-1;
  break;
+   case IFN_LEN_MASK_STORE:
+ len = gimple_call_arg (call, 2);
+ bias = gimple_call_arg (call, 5);
+ if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
+   return (void *)-1;
+ mask = gimple_call_arg (call, internal_fn_mask_index (fn));
+ mask = vn_valueize (mask);
+ if (TREE_CODE (mask) != VECTOR_CST)
+   return (void *)-1;
+ break;
default:
  return (void *)-1;
}
@@ -3344,6 +3354,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
  tree vectype = TREE_TYPE (def_rhs);
  unsigned HOST_WIDE_INT elsz
= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (len)
+   {
+ /* Since the following while condition known_lt
+(mask_idx, nunits) will exit the while loop
+when mask_idx > nunits.coeffs[0], we pick the
+MIN (nunits.coeffs[0], len + bias).  */
+ nunits = MIN (nunits.coeffs[0],
+   tree_to_uhwi (len) + tree_to_shwi (bias));
+   }
  if (mask)
{
  HOST_WIDE_INT start = 0, length = 0;
@@ -3373,7 +3393,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
length += elsz;
  mask_idx++;
}
- while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
+ while (known_lt (mask_idx, nunits));
  if (length != 0)
{
  pd.rhs_off = start;
@@ -3389,7 +3409,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
{
  pd.offset = offset2i;
  pd.size = (tree_to_uhwi (len)
-+ -tree_to_shwi (bias)) * BITS_PER_UNIT;
++ tree_to_shwi (bias)) * BITS_PER_UNIT;
  if (BYTES_BIG_ENDIAN)
pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
  else
-- 
2.36.3

[PATCH] vect: Cost intermediate conversions

2023-06-26 Thread Richard Sandiford via Gcc-patches

g:6f19cf7526168f8 extended N-vector to N-vector conversions
to handle cases where an intermediate integer extension or
truncation is needed.  This patch adjusts the cost to account
for these intermediate conversions.
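
In other words, each intermediate step is costed as one more vector
statement per copy.  A trivial sketch of the new count, assuming
multi_step_cvt is the number of intermediate conversions (the function
name is hypothetical):

```c
#include <assert.h>

/* Number of simple vector statements costed for a conversion that
   keeps the lane count, given NCOPIES copies and MULTI_STEP_CVT
   intermediate conversions.  */
unsigned int
conversion_stmt_count (unsigned int ncopies, unsigned int multi_step_cvt)
{
  assert (ncopies > 0);
  return ncopies * (1 + multi_step_cvt);
}
```

So two copies with one intermediate extension or truncation are costed
as four statements rather than two.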

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard

gcc/
* tree-vect-stmts.cc (vectorizable_conversion): Take multi_step_cvt
into account when costing non-widening/truncating conversions.
---
 gcc/tree-vect-stmts.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ae24f3e66e6..7bc602bf90a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5470,8 +5470,9 @@ vectorizable_conversion (vec_info *vinfo,
   if (modifier == NONE)
 {
  STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
- vect_model_simple_cost (vinfo, stmt_info, ncopies, dt, ndts, slp_node,
- cost_vec);
+ vect_model_simple_cost (vinfo, stmt_info,
+ ncopies * (1 + multi_step_cvt),
+ dt, ndts, slp_node, cost_vec);
}
   else if (modifier == NARROW_SRC || modifier == NARROW_DST)
{
-- 
2.25.1

Re: Re: [PATCH V2] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi. I am wondering whether the following is always valid:
> 
> TYPE_VECTOR_SUBPARTS (vectype).to_constant ()

Not necessarily.
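
A toy model of why: for variable-length vectors the lane count has the
form c0 + c1 * x, with x only known at run time.  The sketch below is
a simplified stand-in for a two-coefficient poly_uint64, not the real
poly_int class, and all names in it are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy value c0 + c1 * x, where x >= 0 is unknown at compile time.  */
struct toy_poly { uint64_t c0, c1; };

/* The value is a compile-time constant only if it does not depend on x.  */
bool
toy_is_constant (struct toy_poly p)
{
  return p.c1 == 0;
}

/* Only meaningful for constants, hence the assertion.  */
uint64_t
toy_to_constant (struct toy_poly p)
{
  assert (toy_is_constant (p));
  return p.c0;
}

/* A < P holds for every x >= 0 exactly when A < c0.  */
bool
toy_known_lt (uint64_t a, struct toy_poly p)
{
  return a < p.c0;
}
```

For a scalable count such as 4 + 4x the toy_to_constant assertion would
fire, while toy_known_lt (i, p) stays true only for i < 4, which matches
the coeffs[0] remark in the V3 patch comment.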

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-26 19:18
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH V2] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richi. It seems that if we use nunits, which is len + bias, to iterate,
> > then we can simplify the code.
> > 
> > Also, I fixed behavior of len_store,
> > 
> > Before this patch:
> >(len - bias) * BITS_PER_UNIT
> > After this patch:
> >(len + bias) * BITS_PER_UNIT
> >
> > gcc/ChangeLog:
> > 
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> > fix LEN_STORE.
> > 
> > ---
> >  gcc/tree-ssa-sccvn.cc | 19 +--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 11061a374a2..228ec117ff3 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> >  return (void *)-1;
> >break;
> > + case IFN_LEN_MASK_STORE:
> > +   len = gimple_call_arg (call, 2);
> > +   bias = gimple_call_arg (call, 5);
> > +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > + return (void *)-1;
> > +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> > +   mask = vn_valueize (mask);
> > +   if (TREE_CODE (mask) != VECTOR_CST)
> > + return (void *)-1;
> > +   break;
> >  default:
> >return (void *)-1;
> >  }
> > @@ -3344,6 +3354,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >tree vectype = TREE_TYPE (def_rhs);
> >unsigned HOST_WIDE_INT elsz
> >  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> > +   poly_uint64 nunits;
> > +   if (len)
> > + nunits = tree_to_uhwi (len) + tree_to_shwi (bias);
> > +   else
> > + nunits = TYPE_VECTOR_SUBPARTS (vectype);
>  
> Are the _LEN ifns accessible via intrinsics as well?  If so I think
> we should use MIN (nunits, len + bias) here as otherwise we risk
> out-of bound accesses.
>  
> Otherwise looks good to me.
>  
> Thanks,
> Richard.
>  
> >if (mask)
> >  {
> >HOST_WIDE_INT start = 0, length = 0;
> > @@ -3373,7 +3388,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >  length += elsz;
> >mask_idx++;
> >  }
> > -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> > +   while (known_lt (mask_idx, nunits));
> >if (length != 0)
> >  {
> >pd.rhs_off = start;
> > @@ -3389,7 +3404,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >  {
> >pd.offset = offset2i;
> >pd.size = (tree_to_uhwi (len)
> > -  + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> > +  + tree_to_shwi (bias)) * BITS_PER_UNIT;
> >if (BYTES_BIG_ENDIAN)
> >  pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
> >else
> > 
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: Re: [PATCH V2] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

Hi, Richi. I am wondering whether the following is always valid:

TYPE_VECTOR_SUBPARTS (vectype).to_constant ()

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-26 19:18
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richi. It seems that if we use nunits, which is len + bias, to iterate,
> then we can simplify the code.
> 
> Also, I fixed behavior of len_store,
> 
> Before this patch:
>(len - bias) * BITS_PER_UNIT
> After this patch:
>(len + bias) * BITS_PER_UNIT
>
> gcc/ChangeLog:
> 
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> fix LEN_STORE.
> 
> ---
>  gcc/tree-ssa-sccvn.cc | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..228ec117ff3 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
>  return (void *)-1;
>break;
> + case IFN_LEN_MASK_STORE:
> +   len = gimple_call_arg (call, 2);
> +   bias = gimple_call_arg (call, 5);
> +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> + return (void *)-1;
> +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> +   mask = vn_valueize (mask);
> +   if (TREE_CODE (mask) != VECTOR_CST)
> + return (void *)-1;
> +   break;
>  default:
>return (void *)-1;
>  }
> @@ -3344,6 +3354,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>tree vectype = TREE_TYPE (def_rhs);
>unsigned HOST_WIDE_INT elsz
>  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +   poly_uint64 nunits;
> +   if (len)
> + nunits = tree_to_uhwi (len) + tree_to_shwi (bias);
> +   else
> + nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
Are the _LEN ifns accessible via intrinsics as well?  If so I think
we should use MIN (nunits, len + bias) here as otherwise we risk
out-of bound accesses.
 
Otherwise looks good to me.
 
Thanks,
Richard.
 
>if (mask)
>  {
>HOST_WIDE_INT start = 0, length = 0;
> @@ -3373,7 +3388,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>  length += elsz;
>mask_idx++;
>  }
> -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> +   while (known_lt (mask_idx, nunits));
>if (length != 0)
>  {
>pd.rhs_off = start;
> @@ -3389,7 +3404,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>  {
>pd.offset = offset2i;
>pd.size = (tree_to_uhwi (len)
> -  + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> +  + tree_to_shwi (bias)) * BITS_PER_UNIT;
>if (BYTES_BIG_ENDIAN)
>  pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
>else
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH V2] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi. It seems that if we use nunits, which is len + bias, to iterate,
> then we can simplify the code.
> 
> Also, I fixed behavior of len_store,
> 
> Before this patch:
>(len - bias) * BITS_PER_UNIT
> After this patch:
>(len + bias) * BITS_PER_UNIT
>
> gcc/ChangeLog:
> 
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> fix LEN_STORE.
> 
> ---
>  gcc/tree-ssa-sccvn.cc | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..228ec117ff3 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
> if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
>   return (void *)-1;
> break;
> + case IFN_LEN_MASK_STORE:
> +   len = gimple_call_arg (call, 2);
> +   bias = gimple_call_arg (call, 5);
> +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> + return (void *)-1;
> +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> +   mask = vn_valueize (mask);
> +   if (TREE_CODE (mask) != VECTOR_CST)
> + return (void *)-1;
> +   break;
>   default:
> return (void *)-1;
>   }
> @@ -3344,6 +3354,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
> tree vectype = TREE_TYPE (def_rhs);
> unsigned HOST_WIDE_INT elsz
>   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +   poly_uint64 nunits;
> +   if (len)
> + nunits = tree_to_uhwi (len) + tree_to_shwi (bias);
> +   else
> + nunits = TYPE_VECTOR_SUBPARTS (vectype);

Are the _LEN ifns accessible via intrinsics as well?  If so I think
we should use MIN (nunits, len + bias) here as otherwise we risk
out-of bound accesses.

Otherwise looks good to me.

Thanks,
Richard.

> if (mask)
>   {
> HOST_WIDE_INT start = 0, length = 0;
> @@ -3373,7 +3388,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>   length += elsz;
> mask_idx++;
>   }
> -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> +   while (known_lt (mask_idx, nunits));
> if (length != 0)
>   {
> pd.rhs_off = start;
> @@ -3389,7 +3404,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>   {
> pd.offset = offset2i;
> pd.size = (tree_to_uhwi (len)
> -  + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> +  + tree_to_shwi (bias)) * BITS_PER_UNIT;
> if (BYTES_BIG_ENDIAN)
>   pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
> else
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

RE: [PATCH v1] RISC-V: Remove duplicated extern function_base decl

Committed, thanks kito and juzhe.

Pan

From: Kito Cheng 
Sent: Monday, June 26, 2023 5:51 PM
To: juzhe.zh...@rivai.ai
Cc: Robin Dapp ; gcc-patches ; 
jeffreyalaw ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Remove duplicated extern function_base decl

Lgtm

juzhe.zh...@rivai.ai wrote on Mon, Jun 26, 2023 at 17:40:
LGTM



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-26 17:36
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; 
yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove duplicated extern function_base decl
From: Pan Li <pan2...@intel.com>

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.
---
gcc/config/riscv/riscv-vector-builtins-bases.h | 5 -
1 file changed, 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 62ff38a2811..fb95d6afdf0 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -121,8 +121,6 @@ extern const function_base *const vsmul;
extern const function_base *const vssra;
extern const function_base *const vssrl;
extern const function_base *const vnclip;
-extern const function_base *const vnclip;
-extern const function_base *const vnclipu;
extern const function_base *const vnclipu;
extern const function_base *const vmand;
extern const function_base *const vmnand;
@@ -144,8 +142,6 @@ extern const function_base *const vmsof;
extern const function_base *const viota;
extern const function_base *const vid;
extern const function_base *const vfadd;
-extern const function_base *const vfadd;
-extern const function_base *const vfsub;
extern const function_base *const vfsub;
extern const function_base *const vfrsub;
extern const function_base *const vfwadd;
@@ -153,7 +149,6 @@ extern const function_base *const vfwsub;
extern const function_base *const vfmul;
extern const function_base *const vfmul;
extern const function_base *const vfdiv;
-extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
extern const function_base *const vfmacc;
--
2.34.1

RE: [PATCH] RISC-V: Remove redundant vcond patterns

Committed, thanks kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Monday, June 26, 2023 5:08 PM
To: Richard Biener 
Cc: Juzhe-Zhong ; gcc-patches@gcc.gnu.org; 
kito.ch...@sifive.com; pal...@dabbelt.com; pal...@rivosinc.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Remove redundant vcond patterns

ok for trunk, thanks :)

On Mon, Jun 26, 2023 at 4:44 PM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 26 Jun 2023, Juzhe-Zhong wrote:
>
> > Previously, Richi suggested that vcond patterns are only needed when the
> > target supports comparison + select in a single instruction.
> >
> > I experimented with removing those "vcond" patterns, and it works
> > perfectly.
> >
> > All testcases PASS.
> >
> > Really appreciate Richi helping us recognize this issue.
> >
> > Now remove all "vcond" patterns as Richi suggested.
>
> Btw, it's also good to have a target clear of 'vcond' to verify we are
> indeed happy with just vcmp and vcond_mask in the middle-end.
>
> I see there's only a single user of vcondeq (x86), looks like an
> opportunity to remove this optab ... (unless my grep skills are confused).
>
> Thanks,
> Richard.
>
> > gcc/ChangeLog:
> >
> > * config/riscv/autovec.md (vcond): Remove 
> > redundant vcond patterns.
> > (vcondu): Ditto.
> > * config/riscv/riscv-protos.h (expand_vcond): Ditto.
> > * config/riscv/riscv-v.cc (expand_vcond): Ditto.
> >
> > ---
> >  gcc/config/riscv/autovec.md | 38 -
> >  gcc/config/riscv/riscv-protos.h |  1 -
> >  gcc/config/riscv/riscv-v.cc | 22 ---
> >  3 files changed, 61 deletions(-)
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index 5de43a8d647..19100b5b2cb 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -311,44 +311,6 @@
> >}
> >  )
> >
> > -;; 
> > -
> > -;;  [INT,FP] Compare and select
> > -;; 
> > -
> > -;; The patterns in this section are synthetic.
> > -;; 
> > -
> > -
> > -(define_expand "vcond"
> > -  [(set (match_operand:V 0 "register_operand")
> > - (if_then_else:V
> > -   (match_operator 3 "comparison_operator"
> > - [(match_operand:VI 4 "register_operand")
> > -  (match_operand:VI 5 "register_operand")])
> > -   (match_operand:V 1 "register_operand")
> > -   (match_operand:V 2 "register_operand")))]
> > -  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
> > - GET_MODE_NUNITS (mode))"
> > -  {
> > -riscv_vector::expand_vcond (operands);
> > -DONE;
> > -  }
> > -)
> > -
> > -(define_expand "vcondu"
> > -  [(set (match_operand:V 0 "register_operand")
> > - (if_then_else:V
> > -   (match_operator 3 "comparison_operator"
> > - [(match_operand:VI 4 "register_operand")
> > -  (match_operand:VI 5 "register_operand")])
> > -   (match_operand:V 1 "register_operand")
> > -   (match_operand:V 2 "register_operand")))]
> > -  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
> > - GET_MODE_NUNITS (mode))"
> > -  {
> > -riscv_vector::expand_vcond (operands);
> > -DONE;
> > -  }
> > -)
> > -
> >  ;; 
> > -
> >  ;;  [INT] Sign and zero extension
> >  ;; 
> > -
> > diff --git a/gcc/config/riscv/riscv-protos.h 
> > b/gcc/config/riscv/riscv-protos.h
> > index f686edab3d1..7265b1c8401 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -252,7 +252,6 @@ machine_mode preferred_simd_mode (scalar_mode);
> >  opt_machine_mode get_mask_mode (machine_mode);
> >  void expand_vec_series (rtx, rtx, rtx);
> >  void expand_vec_init (rtx, rtx);
> > -void expand_vcond (rtx *);
> >  void expand_vec_perm (rtx, rtx, rtx, rtx);
> >  void expand_select_vl (rtx *);
> >  void expand_load_store (rtx *, bool);
> > diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> > index 5518394be1e..f6dd0d8e2a4 100644
> > --- a/gcc/config/riscv/riscv-v.cc
> > +++ b/gcc/config/riscv/riscv-v.cc
> > @@ -2421,28 +2421,6 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx 
> > op0, rtx op1,
> >return false;
> >  }
> >
> > -/* Expand an RVV vcond pattern with operands OPS.  DATA_MODE is the mode
> > -   of the data being merged and CMP_MODE is the mode of the values being
> > -   compared.  */
> > -
> > -void
> > -expand_vcond (rtx *ops)
> > -{
> > -  machine_mode cmp_mode = GET_MODE (ops[4]);
> > -  machine_mode data_mode = GET_MODE (ops[1]);
> > -  machine_mode mask_mode = get_mask_mode (cmp_mode).require ();
> > -

[PATCH] tree-optimization/110392 - ICE with predicate analysis

When fed unoptimized IL, predicate normalization can simplify
things so that a predicate becomes true or false.  The following
moves the early exit for that case after both simplification and
normalization to take care of that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110392
* gimple-predicate-analysis.cc (uninit_analysis::is_use_guarded):
Do early exits on true/false predicate only after normalization.
---
 gcc/gimple-predicate-analysis.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
index 7f20f81ad86..373163ba9c8 100644
--- a/gcc/gimple-predicate-analysis.cc
+++ b/gcc/gimple-predicate-analysis.cc
@@ -2216,11 +2216,11 @@ uninit_analysis::is_use_guarded (gimple *use_stmt, 
basic_block use_bb,
 return false;
 
   use_preds.simplify (use_stmt, /*is_use=*/true);
+  use_preds.normalize (use_stmt, /*is_use=*/true);
   if (use_preds.is_false ())
 return true;
   if (use_preds.is_true ())
 return false;
-  use_preds.normalize (use_stmt, /*is_use=*/true);
 
   /* Try to prune the dead incoming phi edges.  */
   if (!overlap (phi, opnds, visited, use_preds))
@@ -2238,11 +2238,11 @@ uninit_analysis::is_use_guarded (gimple *use_stmt, 
basic_block use_bb,
return false;
 
   m_phi_def_preds.simplify (phi);
+  m_phi_def_preds.normalize (phi);
   if (m_phi_def_preds.is_false ())
return false;
   if (m_phi_def_preds.is_true ())
return true;
-  m_phi_def_preds.normalize (phi);
 }
 
   /* Return true if the predicate guarding the valid definition (i.e.,
-- 
2.35.3

[PATCH] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-26 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask)
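
As a scalar sketch of the intended semantics (my reading of the md.texi
text in this patch, not the expander code): lane i is active only when
i < len and mask bit i is set.  The function names and the offset
representation below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Gather: load lane i from base[offsets[i]] when i < len and mask[i].
   Inactive lanes are left untouched here; the documented pattern
   leaves them undefined.  */
void
model_len_mask_gather (uint8_t *dst, const uint8_t *base,
                       const ptrdiff_t *offsets, size_t nunits,
                       size_t len, const bool *mask)
{
  assert (dst != NULL && base != NULL && offsets != NULL && mask != NULL);
  for (size_t i = 0; i < nunits && i < len; ++i)
    if (mask[i])
      dst[i] = base[offsets[i]];
}

/* Scatter: store lane i to base[offsets[i]] when i < len and mask[i].  */
void
model_len_mask_scatter (uint8_t *base, const ptrdiff_t *offsets,
                        const uint8_t *src, size_t nunits,
                        size_t len, const bool *mask)
{
  assert (base != NULL && offsets != NULL && src != NULL && mask != NULL);
  for (size_t i = 0; i < nunits && i < len; ++i)
    if (mask[i])
      base[offsets[i]] = src[i];
}
```

In the loop above, len would come from SELECT_VL and the mask from the
cond[i] comparison, so one pair of calls models one vector iteration.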

This patch doesn't yet use these patterns in the vectorizer; it just adds the
patterns and updates the documentation.

I will send a patch that uses these patterns in the vectorizer soon after this
patch is approved.

Thanks.

gcc/ChangeLog:

* doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 14 ++
 gcc/internal-fn.cc  | 32 ++--
 gcc/internal-fn.def |  8 ++--
 gcc/optabs.def  |  2 ++
 4 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..a7512506358 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,14 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra len operand
+as operand 5 and an extra mask operand as operand 6.  Bit @var{i} of
+the mask is set and i < len if element @var{i} of the result should be
+loaded from memory.  Element @var{i} of the result should be undefined
+value when either Bit @var{i} of the mask is clear or i >= len.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5077,12 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra len operand as
+operand 5 and an extra mask operand as operand 6.  Bit @var{i} of the mask
+is set and i < len if element @var{i} of the result should be stored to memory.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..335ff9971bc 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,6 +3546,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
 {
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree len = gimple_call_arg (stmt, mask_index - 1);
+ rtx len_rtx = expand_normal (len);
+ create_convert_operand_from (&ops[i++], len_rtx,
+  TYPE_MODE (TREE_TYPE (len)),
+  TYPE_UNSIGNED (TREE_TYPE (len)));
+   }
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx mask_rtx = expand_normal (mask);
   create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
@@ -3572,7 +3580,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
   create_address_operand (&ops[i++], base_rtx);

Re: [PATCH v1] RISC-V: Remove duplicated extern function_base decl

Lgtm

juzhe.zh...@rivai.ai wrote on Mon, Jun 26, 2023 at 17:40:

> LGTM
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-26 17:36
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang;
> kito.cheng
> Subject: [PATCH v1] RISC-V: Remove duplicated extern function_base decl
> From: Pan Li 
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.
> ---
> gcc/config/riscv/riscv-vector-builtins-bases.h | 5 -
> 1 file changed, 5 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 62ff38a2811..fb95d6afdf0 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -121,8 +121,6 @@ extern const function_base *const vsmul;
> extern const function_base *const vssra;
> extern const function_base *const vssrl;
> extern const function_base *const vnclip;
> -extern const function_base *const vnclip;
> -extern const function_base *const vnclipu;
> extern const function_base *const vnclipu;
> extern const function_base *const vmand;
> extern const function_base *const vmnand;
> @@ -144,8 +142,6 @@ extern const function_base *const vmsof;
> extern const function_base *const viota;
> extern const function_base *const vid;
> extern const function_base *const vfadd;
> -extern const function_base *const vfadd;
> -extern const function_base *const vfsub;
> extern const function_base *const vfsub;
> extern const function_base *const vfrsub;
> extern const function_base *const vfwadd;
> @@ -153,7 +149,6 @@ extern const function_base *const vfwsub;
> extern const function_base *const vfmul;
> extern const function_base *const vfmul;
> extern const function_base *const vfdiv;
> -extern const function_base *const vfdiv;
> extern const function_base *const vfrdiv;
> extern const function_base *const vfwmul;
> extern const function_base *const vfmacc;
> --
> 2.34.1
>
>
>

Re: [PATCH v1] RISC-V: Remove duplicated extern function_base decl

LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-26 17:36
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove duplicated extern function_base decl
From: Pan Li 
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.
---
gcc/config/riscv/riscv-vector-builtins-bases.h | 5 -
1 file changed, 5 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 62ff38a2811..fb95d6afdf0 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -121,8 +121,6 @@ extern const function_base *const vsmul;
extern const function_base *const vssra;
extern const function_base *const vssrl;
extern const function_base *const vnclip;
-extern const function_base *const vnclip;
-extern const function_base *const vnclipu;
extern const function_base *const vnclipu;
extern const function_base *const vmand;
extern const function_base *const vmnand;
@@ -144,8 +142,6 @@ extern const function_base *const vmsof;
extern const function_base *const viota;
extern const function_base *const vid;
extern const function_base *const vfadd;
-extern const function_base *const vfadd;
-extern const function_base *const vfsub;
extern const function_base *const vfsub;
extern const function_base *const vfrsub;
extern const function_base *const vfwadd;
@@ -153,7 +149,6 @@ extern const function_base *const vfwsub;
extern const function_base *const vfmul;
extern const function_base *const vfmul;
extern const function_base *const vfdiv;
-extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
extern const function_base *const vfmacc;
-- 
2.34.1

[PATCH V2] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-26 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richi. It seems that if we iterate up to nunits, which is len + bias,
we can simplify the code.

Also, I fixed the behavior of len_store:

Before this patch:
   (len - bias) * BITS_PER_UNIT
After this patch:
   (len + bias) * BITS_PER_UNIT
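
The difference between the two formulas can be checked with a small sketch. It is an illustration only: `kBitsPerUnit` stands in for BITS_PER_UNIT and is assumed to be 8, the function names are hypothetical, and the bias value -1 is just an example of a non-zero bias.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8 bits per addressable unit, standing in for BITS_PER_UNIT.
constexpr int64_t kBitsPerUnit = 8;

// Size in bits after the fix: (len + bias) * BITS_PER_UNIT.
int64_t len_store_size_bits(int64_t len, int64_t bias) {
  return (len + bias) * kBitsPerUnit;
}

// Size in bits before the fix: (len - bias) * BITS_PER_UNIT.
int64_t old_len_store_size_bits(int64_t len, int64_t bias) {
  return (len - bias) * kBitsPerUnit;
}
```

With a bias of 0 both functions agree. With len = 16 and bias = -1 the fixed formula covers 15 units, while the old one covered 17.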
   
gcc/ChangeLog:

* tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix
LEN_STORE.

---
 gcc/tree-ssa-sccvn.cc | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 11061a374a2..228ec117ff3 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
  if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
return (void *)-1;
  break;
+   case IFN_LEN_MASK_STORE:
+ len = gimple_call_arg (call, 2);
+ bias = gimple_call_arg (call, 5);
+ if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
+   return (void *)-1;
+ mask = gimple_call_arg (call, internal_fn_mask_index (fn));
+ mask = vn_valueize (mask);
+ if (TREE_CODE (mask) != VECTOR_CST)
+   return (void *)-1;
+ break;
default:
  return (void *)-1;
}
@@ -3344,6 +3354,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
  tree vectype = TREE_TYPE (def_rhs);
  unsigned HOST_WIDE_INT elsz
= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
+ poly_uint64 nunits;
+ if (len)
+   nunits = tree_to_uhwi (len) + tree_to_shwi (bias);
+ else
+   nunits = TYPE_VECTOR_SUBPARTS (vectype);
  if (mask)
{
  HOST_WIDE_INT start = 0, length = 0;
@@ -3373,7 +3388,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
length += elsz;
  mask_idx++;
}
- while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
+ while (known_lt (mask_idx, nunits));
  if (length != 0)
{
  pd.rhs_off = start;
@@ -3389,7 +3404,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
{
  pd.offset = offset2i;
  pd.size = (tree_to_uhwi (len)
-+ -tree_to_shwi (bias)) * BITS_PER_UNIT;
++ tree_to_shwi (bias)) * BITS_PER_UNIT;
  if (BYTES_BIG_ENDIAN)
pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
  else
-- 
2.36.3
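
The mask walk that this patch bounds by nunits can be sketched outside of GCC as follows. This is a simplified model under assumptions: `active_runs` is a hypothetical name, the mask is a plain vector of booleans rather than a VECTOR_CST, and a `for` loop replaces the `do ... while` so that an empty range is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Each run is (start bit offset, length in bits) of consecutive active lanes.
using Run = std::pair<int64_t, int64_t>;

// Collect runs of active mask elements, looking only at the first `nunits`
// elements (len + bias for a length-controlled store, else the full vector).
std::vector<Run> active_runs(const std::vector<bool> &mask, int64_t elsz,
                             std::size_t nunits) {
  std::vector<Run> runs;
  int64_t start = 0, length = 0;
  for (std::size_t idx = 0; idx < nunits && idx < mask.size(); ++idx) {
    if (!mask[idx]) {
      if (length != 0)
        runs.emplace_back(start, length);  // flush the run that just ended
      start = static_cast<int64_t>(idx + 1) * elsz;
      length = 0;
    } else {
      length += elsz;                      // extend the current run
    }
  }
  if (length != 0)
    runs.emplace_back(start, length);      // flush the trailing run
  return runs;
}
```

For a mask of {1, 1, 0, 1} with 32-bit elements, limiting the walk to two elements drops the final run, which is the effect of replacing TYPE_VECTOR_SUBPARTS (vectype) with nunits in the loop condition.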

[PATCH v1] RISC-V: Remove duplicated extern function_base decl

2023-06-26 Thread Pan Li via Gcc-patches

From: Pan Li 

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.
---
 gcc/config/riscv/riscv-vector-builtins-bases.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 62ff38a2811..fb95d6afdf0 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -121,8 +121,6 @@ extern const function_base *const vsmul;
 extern const function_base *const vssra;
 extern const function_base *const vssrl;
 extern const function_base *const vnclip;
-extern const function_base *const vnclip;
-extern const function_base *const vnclipu;
 extern const function_base *const vnclipu;
 extern const function_base *const vmand;
 extern const function_base *const vmnand;
@@ -144,8 +142,6 @@ extern const function_base *const vmsof;
 extern const function_base *const viota;
 extern const function_base *const vid;
 extern const function_base *const vfadd;
-extern const function_base *const vfadd;
-extern const function_base *const vfsub;
 extern const function_base *const vfsub;
 extern const function_base *const vfrsub;
 extern const function_base *const vfwadd;
@@ -153,7 +149,6 @@ extern const function_base *const vfwsub;
 extern const function_base *const vfmul;
 extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
-extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
 extern const function_base *const vfmacc;
-- 
2.34.1

RE: [PATCH] SCCVN: Fix repeating variable name "len"

Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Richard Biener via Gcc-patches
Sent: Monday, June 26, 2023 3:53 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH] SCCVN: Fix repeating variable name "len"

On Mon, 26 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Line 3292 declares a variable named "len":
> tree mask = NULL_TREE, len = NULL_TREE, bias = NULL_TREE;
> Line 3349 also declares a variable named "len":
> HOST_WIDE_INT start = 0, len = 0;
> 
> Since they are never used simultaneously, the issue has gone unnoticed so far.
> However, I want to add LEN_MASK_{LOAD,STORE}, which will need both
> variables, so fix the naming in this path.
> 
> Change HOST_WIDE_INT start = 0, len = 0; into
> HOST_WIDE_INT start = 0, length = 0;

OK.

> gcc/ChangeLog:
> 
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Change name "len" into 
> "length".
> 
> ---
>  gcc/tree-ssa-sccvn.cc | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 27c84e78fcf..11061a374a2 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3346,17 +3346,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> if (mask)
>   {
> -   HOST_WIDE_INT start = 0, len = 0;
> +   HOST_WIDE_INT start = 0, length = 0;
> unsigned mask_idx = 0;
> do
>   {
> if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx)))
>   {
> -   if (len != 0)
> +   if (length != 0)
>   {
> pd.rhs_off = start;
> pd.offset = offset2i + start;
> -   pd.size = len;
> +   pd.size = length;
> if (ranges_known_overlap_p
>   (offset, maxsize, pd.offset, pd.size))
>   {
> @@ -3367,18 +3367,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>   }
>   }
> start = (mask_idx + 1) * elsz;
> -   len = 0;
> +   length = 0;
>   }
> else
> - len += elsz;
> + length += elsz;
> mask_idx++;
>   }
> while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> -   if (len != 0)
> +   if (length != 0)
>   {
> pd.rhs_off = start;
> pd.offset = offset2i + start;
> -   pd.size = len;
> +   pd.size = length;
> if (ranges_known_overlap_p (offset, maxsize,
> pd.offset, pd.size))
>   return data->push_partial_def (pd, set, set,
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH] Implement ipa_vr hashing.

2023-06-26 Thread Aldy Hernandez via Gcc-patches

Errr, sorry about this ping.  I meant to re-ping my IPA patches
after 7 days, but just realized it had been only 4.  My bad.

Aldy

On Mon, Jun 26, 2023 at 11:22 AM Aldy Hernandez  wrote:
>
> PING*3
>
> On Thu, Jun 22, 2023 at 7:49 AM Aldy Hernandez  wrote:
> >
> > Ping*2
> >
> > On Wed, Jun 14, 2023, 14:11 Aldy Hernandez  wrote:
> >>
> >> PING
> >>
> >> On Sat, Jun 10, 2023 at 10:30 PM Aldy Hernandez  wrote:
> >> >
> >> >
> >> >
> >> > On 5/29/23 16:51, Martin Jambor wrote:
> >> > > Hi,
> >> > >
> >> > > On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> >> > >> Implement hashing for ipa_vr.  When all is said and done, all these
> >> > >> patches incur a 7.64% slowdown for ipa-cp, which is entirely covered 
> >> > >> by
> >> > >> the similar 7% increase in this area last week.  So we get type 
> >> > >> agnostic
> >> > >> ranges with "infinite" range precision close to free.
> >> > >
> >> > > Do you know why/where this slow-down happens?  Do we perhaps want to
> >> > > limit the "infiniteness" a little somehow?
> >> >
> >> > I addressed the slow down in another mail.
> >> >
> >> > >
> >> > > Also, jump functions live for a long time, have you looked at how 
> >> > > memory
> >> > > hungry they become?  I hope that the hashing would be good at 
> >> > > preventing
> >> > > any issues.
> >> >
> >> > On a side-note, the caching does help.  On a (mistaken) hunch, I had
> >> > played around with removing caching for everything but UNDEFINED/VARYING
> >> > and zero/nonzero to simplify things, but the cache hit ratio was still
> >> > surprisingly high (+80%).  So good job there :-).
> >> >
> >> > >
> >> > > Generally, I think I'm OK with the patches if the impact on memory is not
> >> > > too bad, though I guess they depend on the one I looked at last week, 
> >> > > so
> >> > > we may focus on that one first.
> >> >
> >> > I'm not sure whether this was an OK for the other patches, given you
> >> > approved the first patch, so I'll hold off until you give the go-ahead.
> >> >
> >> > Thanks.
> >> > Aldy

Re: [PATCH] Implement ipa_vr hashing.

2023-06-26 Thread Aldy Hernandez via Gcc-patches

PING*3

On Thu, Jun 22, 2023 at 7:49 AM Aldy Hernandez  wrote:
>
> Ping*2
>
> On Wed, Jun 14, 2023, 14:11 Aldy Hernandez  wrote:
>>
>> PING
>>
>> On Sat, Jun 10, 2023 at 10:30 PM Aldy Hernandez  wrote:
>> >
>> >
>> >
>> > On 5/29/23 16:51, Martin Jambor wrote:
>> > > Hi,
>> > >
>> > > On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
>> > >> Implement hashing for ipa_vr.  When all is said and done, all these
>> > >> patches incur a 7.64% slowdown for ipa-cp, which is entirely covered by
>> > >> the similar 7% increase in this area last week.  So we get type agnostic
>> > >> ranges with "infinite" range precision close to free.
>> > >
>> > > Do you know why/where this slow-down happens?  Do we perhaps want to
>> > > limit the "infiniteness" a little somehow?
>> >
>> > I addressed the slow down in another mail.
>> >
>> > >
>> > > Also, jump functions live for a long time, have you looked at how memory
>> > > hungry they become?  I hope that the hashing would be good at preventing
>> > > any issues.
>> >
>> > On a side-note, the caching does help.  On a (mistaken) hunch, I had
>> > played around with removing caching for everything but UNDEFINED/VARYING
>> > and zero/nonzero to simplify things, but the cache hit ratio was still
>> > surprisingly high (+80%).  So good job there :-).
>> >
>> > >
>> > > Generally, I think I'm OK with the patches if the impact on memory is not
>> > > too bad, though I guess they depend on the one I looked at last week, so
>> > > we may focus on that one first.
>> >
>> > I'm not sure whether this was an OK for the other patches, given you
>> > approved the first patch, so I'll hold off until you give the go-ahead.
>> >
>> > Thanks.
>> > Aldy
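
The general idea behind hashing long-lived range objects, as discussed in the quoted thread, can be sketched as an interning pool. This is a generic illustration and not the ipa_vr implementation: `Range`, `RangeHash` and `RangePool` are hypothetical names, and a real value range carries more than two bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical, simplified range: a closed interval [lo, hi].
struct Range {
  int64_t lo, hi;
  bool operator==(const Range &o) const { return lo == o.lo && hi == o.hi; }
};

struct RangeHash {
  std::size_t operator()(const Range &r) const {
    std::size_t h = std::hash<int64_t>{}(r.lo);
    // Combine the two bounds into one hash value.
    return h ^ (std::hash<int64_t>{}(r.hi) + 0x9e3779b9u + (h << 6) + (h >> 2));
  }
};

// Stores each distinct range once and hands out shared pointers to it.
class RangePool {
 public:
  const Range *intern(const Range &r) {
    ++lookups_;
    auto res = pool_.insert(r);  // no-op if an equal range already exists
    if (!res.second)
      ++hits_;                   // an existing entry was reused
    return &*res.first;          // element addresses are stable in the set
  }
  std::size_t size() const { return pool_.size(); }
  std::size_t hits() const { return hits_; }
  std::size_t lookups() const { return lookups_; }

 private:
  std::unordered_set<Range, RangeHash> pool_;
  std::size_t lookups_ = 0, hits_ = 0;
};
```

Interning the same range twice returns the same pointer, so memory grows with the number of distinct ranges rather than the number of users, and `hits() / lookups()` gives a hit ratio of the kind mentioned in the thread.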

Re: [PATCH v2] RISC-V: fix expand function of vlmul_ext RVV intrinsic

2023-06-26 Thread Li Xu

Hi, Jeff:

I have filled out the form. May I ask if you have received my application? Is 
there anything else I need to do?

Thanks.
--
Li Xu
>
>
>On 6/25/23 03:13, juzhe.zh...@rivai.ai wrote:
>> LGTM.
>> Thanks for fixing it.
>Agreed.  I didn't see the V2 had already been posted.
>
>>
>> Hi, Jeff:
>> I saw Li Xu is frequently helping RVV support in GCC. Is it possible to
>> give him the write access?
>Yes, we can do that with the normal process.
>
>Li Xu, fill out this form:
>
>https://sourceware.org/cgi-bin/pdw/ps_form.cgi
>
>List me as approving the request.
>Jeff

Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization

Could you retitle this patch to something like "Support const
vector expansion with xxx pattern"?

On Mon, Jun 26, 2023 at 3:52 PM Robin Dapp via Gcc-patches
 wrote:
>
> Hi Juzhe,
>
> > Currently, we are able to generate step vector with base == 0:
> >  { 0, 0, 2, 2, 4, 4, ... }
> >
> > ASM:
> >
> > vid
> > vand
> >
> > However, we generate wrong code for a step vector with base != 0:
> > { 1, 1, 3, 3, 5, 5, ... }
> >
> > Before this patch, such a case fails at run time.
> >
> > After this patch, we are able to pass the testcase and generate the step 
> > vector with asm:
> >
> > vid
> > vand
> > vadd
>
> Can't we use the first case as long as pow2_p (base) == true
> and not just for base == 0?
>
> Regards
>  Robin
>
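
The three-instruction sequence quoted above can be modelled element by element. This is a scalar sketch under assumptions, not the RISC-V expander: `step_vector` is a hypothetical name, and the mask used for the vand step is assumed to be ~1, which is what produces the pairs in the examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of: vid, vand (with ~1), vadd (with base).
std::vector<uint32_t> step_vector(std::size_t n, uint32_t base) {
  std::vector<uint32_t> v(n);
  for (std::size_t i = 0; i < n; ++i)
    v[i] = static_cast<uint32_t>(i);  // vid:  {0, 1, 2, 3, ...}
  for (auto &e : v)
    e &= ~uint32_t{1};                // vand: {0, 0, 2, 2, ...}
  for (auto &e : v)
    e += base;                        // vadd: shift every element by base
  return v;
}
```

With base == 0 the final addition changes nothing, which corresponds to the two-instruction case; with base == 1 the result is the second sequence from the quoted message.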

Re: [PATCH] RISC-V: Remove redundant vcond patterns