date:20230509

RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-09 Thread Li, Pan2 via Gcc-patches

Just migrated var-tracking to pointer_mux; it works well even though the
bit size of the tree_base code differs from that of the rtx_def code. I will
prepare the PATCH if there is no surprise from the X86 bootstrap test.

Thanks Richard for pointing out the pointer_mux!

Pan 

-Original Message-
From: Li, Pan2 
Sent: Tuesday, May 9, 2023 7:51 PM
To: Richard Biener ; Richard Sandiford 

Cc: Jeff Law ; Kito Cheng ; 
juzhe.zh...@rivai.ai; gcc-patches ; palmer 
; jakub 
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Sure thing, I will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener 
Sent: Tuesday, May 9, 2023 6:26 PM
To: Richard Sandiford 
Cc: Li, Pan2 ; Jeff Law ; Kito Cheng 
; juzhe.zh...@rivai.ai; gcc-patches 
; palmer ; jakub 
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

On Tue, 9 May 2023, Richard Sandiford wrote:

> "Li, Pan2"  writes:
> > After the bits patch like below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > The structure layout of both the rtx_def and tree_base will be something 
> > similar as below. As I understand, the lower 8-bits of tree_base will be 
> > inspected when 'dv' is a tree for the rtx conversion.
> >
> > tree_base                rtx_def
> > code: 16                 code: 8
> > side_effects_flag: 1     mode: 16
> 
> I think we should try hard to avoid that though.  The 16-bit value 
> should be aligned to 16 bits if at all possible.  decl_or_value 
> doesn't seem like something that should be dictating our approach here.
> 
> Perhaps we can use pointer_mux for decl_or_value instead?  pointer_mux 
> is intended to be a standards-compliant (hah!) way of switching 
> between two pointer types in a reasonably efficient way.

Ah, I wasn't aware of that - yes, that looks good to use I think.

Pan, can you prepare a patch only doing such conversion of the var-tracking 
decl_or_value type?  Aka make it

typedef pointer_mux<tree_node, rtx_def> decl_or_value;

and adjust uses?

Thanks,
Richard.
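
For readers unfamiliar with the idea, here is a minimal sketch of a two-way
tagged pointer in the spirit of what pointer_mux is described as doing above:
one word holds either of two pointer types, and a spare low address bit
records which one. This is not GCC's implementation. TaggedPtr, TreeLike and
RtxLike are hypothetical names made up for illustration, and the sketch
assumes both pointee types are at least 2-byte aligned so that bit 0 is free.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two pointee types; only their
// alignment matters for this sketch.
struct TreeLike { int uid; };
struct RtxLike { int code; };

// Minimal two-way tagged pointer (illustration only, not GCC's pointer_mux).
// Bit 0 clear: holds a T1*.  Bit 0 set: holds a T2*.
template <typename T1, typename T2>
class TaggedPtr {
  static_assert(alignof(T1) >= 2 && alignof(T2) >= 2,
                "bit 0 of the address must be free for the tag");

public:
  TaggedPtr() = default;

  static TaggedPtr first(T1 *p) {
    return TaggedPtr(reinterpret_cast<std::uintptr_t>(p));
  }

  static TaggedPtr second(T2 *p) {
    // A null T2* is stored as plain null rather than as a tagged null.
    return TaggedPtr(p ? (reinterpret_cast<std::uintptr_t>(p) | 1u) : 0u);
  }

  bool is_null() const { return bits_ == 0; }
  bool is_first() const { return bits_ != 0 && (bits_ & 1u) == 0; }
  bool is_second() const { return (bits_ & 1u) != 0; }

  // Each accessor returns nullptr when the other type is stored.
  T1 *as_first() const {
    return is_second() ? nullptr : reinterpret_cast<T1 *>(bits_);
  }
  T2 *as_second() const {
    return is_second()
               ? reinterpret_cast<T2 *>(bits_ & ~static_cast<std::uintptr_t>(1))
               : nullptr;
  }

private:
  explicit TaggedPtr(std::uintptr_t bits) : bits_(bits) {}
  std::uintptr_t bits_ = 0;
};
```

With a representation along these lines, a "VALUE or decl?" check consults
the tag first and never reads a code field through the wrong struct type, so
it no longer depends on the two structs sharing a bit layout.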

> Thanks,
> Richard
> 
> > constant_flag: 1
> > addressable_flag: 1
> > volatile_flag: 1
> > readonly_flag: 1
> > asm_written_flag: 1
> > nowarning_flag: 1
> > visited: 1
> > used_flag: 1
> > nothrow_flag: 1
> > static_flag: 1
> > public_flag: 1
> > private_flag: 1
> > protected_flag: 1
> > deprecated_flag: 1
> > default_def_flag: 1
> >
> > I have tried a similar approach (as below) to the one you mentioned, i.e. 
> > shrinking tree_code to keep the 1:1 overlap with rtx_code, and completed a 
> > memory-allocated-bytes test, reported in another email.
> >
> > rtx_def code 16 => 12 bits.
> > rtx_def mode 8 => 12 bits.
> > tree_base code 16 => 12 bits.
> >
> > Pan
> >
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, May 8, 2023 3:38 PM
> > To: Li, Pan2 
> > Cc: Jeff Law ; Kito Cheng 
> > ; juzhe.zh...@rivai.ai; richard.sandiford 
> > ; gcc-patches ; 
> > palmer ; jakub 
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> > 8-bit to 16-bit
> >
> > On Mon, 8 May 2023, Li, Pan2 wrote:
> >
> >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able 
> >> to fix this ICE after mode bits change.
> >
> > Can you check which bits this will inspect when 'dv' is a tree after your 
> > patch?  VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when 
> > there was a 1:1 overlap.
> >
> > I think for all cases but struct loc_exp_dep we could find a bit to record 
> > whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be 
> > difficult (unless we start to take bits from pointer representations).
> >
> > That said, I agree with Jeff that the code is ugly, but a simplistic 
> > conversion isn't what we want.
> >
> > An alternative "solution" might be to also shrink tree_code when we shrink 
> > rtx_code and keep the 1:1 overlap.
> >
> > Richard.
> >
> >> I will re-trigger the memory allocate bytes test with below changes 
> >> for X86.
> >> 
> >> rtx_def code 16 => 8 bits.
> >> rtx_def mode 8 => 16 bits.
> >> tree_base code unchanged.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Li, Pan2
> >> Sent: Monday, May 8, 2023 2:42 PM
> >> To: Richard Biener ; Jeff Law 
> >> 
> >> Cc: Kito Cheng ; juzhe.zh...@rivai.ai; 
> >> richard.sandiford ; gcc-patches 
> >> ; palmer ; jakub 
> >> 
> >> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit
> >> 
> >> Oops. Actually I am patching a version as you mentioned like storage 
> >> allocation. Thank you Richard, will try your suggestion and keep you 
> >> posted.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Monday, May 8, 2023 2:30 PM
> >> To: Jeff Law 
> >> Cc: Li, Pan2 ; Kito Cheng 
> >> ; juzhe.zh...@rivai.ai; richard.sandiford 
> >> ; gcc-patches ; 
> >> palmer ; jakub 
> >> Subject: Re: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit

[Committed] New testcase

2023-05-09 Thread Andrew Pinski via Gcc-patches

While I was writing a match.pd patch, I came across a case where GCC was being
miscompiled but no testcase was failing. So this adds that testcase.

Committed after testing on x86_64 with
make check-gcc RUNTESTFLAGS="execute.exp=20230509-1.c"

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/20230509-1.c: New test.
---
 .../gcc.c-torture/execute/20230509-1.c| 28 +++
 1 file changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/20230509-1.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/20230509-1.c 
b/gcc/testsuite/gcc.c-torture/execute/20230509-1.c
new file mode 100644
index 000..359d93c5d34
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/20230509-1.c
@@ -0,0 +1,28 @@
+int __attribute__((noipa)) f(unsigned a, int b)
+{
+  if (a < 0) __builtin_unreachable();
+  if (a > 30) __builtin_unreachable();
+  int t = a;
+  if (b)  t = 100;
+  else  if (a != 0)
+t = a ;
+  else
+t = 1;
+  return t;
+}
+
+
+int main(void)
+{
+  if (f(0, 0) != 1)
+__builtin_abort();
+  if (f(1, 0) != 1)
+__builtin_abort();
+  if (f(0, 1) != 100)
+__builtin_abort();
+  if (f(1, 0) != 1)
+__builtin_abort();
+  if (f(30, 0) != 30)
+__builtin_abort();
+}
+
-- 
2.31.1

[PATCH] _Hashtable implementation cleanup

2023-05-09 Thread François Dumont via Gcc-patches


Hi

Rather than providing a series of patches for _Hashtable I prefer to 
submit them one by one. It will maximize the chances to have some of 
them in gcc 14.


I'm starting with this simple patch to do some cleanup in the current 
implementation to ease compiler optimizations by making some methods 
implicitly inline and avoiding the iterator abstraction when useless.


It also replaces some faulty usages of __node_type* with __node_ptr. 
This should simplify the later patch for allocator custom pointer support, 
which I would like to reactivate.


libstdc++: [_Hashtable] Implement several small methods implicitly inline

Make implementation of 3 simple _Hashtable methods implicitly inline.

Avoid usage of const_iterator abstraction within _Hashtable implementation.

Replace several usages of __node_type* with expected __node_ptr.

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (_NodeBuilder<>::_S_build): Use __node_ptr.
    (_ReuseOrAllocNode<>): Use __node_ptr in place of __node_type*.
    (_AllocNode<>): Likewise.
    (_Equality<>::_M_equal): Remove const_iterator usages. Only preserved
    to call std::is_permutation in the non-unique key implementation.
    * include/bits/hashtable.h (_Hashtable<>::_M_update_bbegin()): Capture
    _M_begin() once.
    (_Hashtable<>::_M_bucket_begin(size_type)): Implement implicitly inline.
    (_Hashtable<>::_M_insert_bucket_begin): Likewise.
    (_Hashtable<>::_M_remove_bucket_begin): Likewise.
    (_Hashtable<>::_M_compute_hash_code): Use __node_ptr rather than
    const_iterator.
    (_Hashtable<>::find): Likewise.
    (_Hashtable<>::_M_emplace): Likewise.
    (_Hashtable<>::_M_insert_unique): Likewise.

Ok to commit ?

François
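
As background for the diff below, here is a small standalone sketch of the
"bucket stores the node before its first node" scheme that
_M_bucket_begin and _M_insert_bucket_begin rely on. It is a simplified model,
not the libstdc++ code: TinyTable, NodeBase and Node are hypothetical names,
keys are assumed to be non-negative ints, and the hash is simply key modulo
bucket count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified node types (not libstdc++'s).
struct NodeBase { NodeBase *next = nullptr; };
struct Node : NodeBase {
  int key;
  explicit Node(int k) : key(k) {}
};

// All nodes live in one singly-linked list headed by before_begin_.
// Each non-empty bucket stores the node *preceding* its first node.
class TinyTable {
public:
  explicit TinyTable(std::size_t n) : buckets_(n, nullptr) {}
  TinyTable(const TinyTable &) = delete;  // buckets point into *this

  std::size_t bucket_index(int key) const {
    return static_cast<std::size_t>(key) % buckets_.size();
  }

  Node *begin() const { return static_cast<Node *>(before_begin_.next); }

  // First node of a bucket, or nullptr if the bucket is empty.
  Node *bucket_begin(std::size_t bkt) const {
    NodeBase *prev = buckets_[bkt];
    return prev ? static_cast<Node *>(prev->next) : nullptr;
  }

  void insert_bucket_begin(std::size_t bkt, Node *node) {
    if (buckets_[bkt]) {
      // Non-empty bucket: link right after the bucket's "before" node.
      node->next = buckets_[bkt]->next;
      buckets_[bkt]->next = node;
    } else {
      // Empty bucket: the node becomes the new head of the whole list.
      node->next = before_begin_.next;
      before_begin_.next = node;
      if (node->next)
        // The former head's bucket pointed at before_begin_; it must now
        // point at the new node, which precedes the former head.
        buckets_[bucket_index(static_cast<Node *>(node->next)->key)] = node;
      buckets_[bkt] = &before_begin_;
    }
  }

private:
  NodeBase before_begin_;
  std::vector<NodeBase *> buckets_;
};
```

Inserting keys 1, 2 and 5 into a 4-bucket table in that order leaves the list
as 2 -> 5 -> 1: key 2 became the list head when its bucket was empty, and
key 5 was linked at the front of bucket 1, ahead of key 1.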
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index d2ff15320fc..954a1c7a58d 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -401,8 +401,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   _M_update_bbegin()
   {
-	if (_M_begin())
-	  _M_buckets[_M_bucket_index(*_M_begin())] = &_M_before_begin;
+	if (auto __begin = _M_begin())
+	  _M_buckets[_M_bucket_index(*__begin)] = &_M_before_begin;
   }
 
   void
@@ -458,7 +458,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Gets bucket begin, deals with the fact that non-empty buckets contain
   // their before begin node.
   __node_ptr
-  _M_bucket_begin(size_type __bkt) const;
+  _M_bucket_begin(size_type __bkt) const
+  {
+	__node_base_ptr __n = _M_buckets[__bkt];
+	return __n ? static_cast<__node_ptr>(__n->_M_nxt) : nullptr;
+  }
 
   __node_ptr
   _M_begin() const
@@ -831,19 +835,57 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Insert a node at the beginning of a bucket.
   void
-  _M_insert_bucket_begin(size_type, __node_ptr);
+  _M_insert_bucket_begin(size_type __bkt, __node_ptr __node)
+  {
+	if (_M_buckets[__bkt])
+	  {
+	// Bucket is not empty, we just need to insert the new node
+	// after the bucket before begin.
+	__node->_M_nxt = _M_buckets[__bkt]->_M_nxt;
+	_M_buckets[__bkt]->_M_nxt = __node;
+	  }
+	else
+	  {
+	// The bucket is empty, the new node is inserted at the
+	// beginning of the singly-linked list and the bucket will
+	// contain _M_before_begin pointer.
+	__node->_M_nxt = _M_before_begin._M_nxt;
+	_M_before_begin._M_nxt = __node;
+
+	if (__node->_M_nxt)
+	  // We must update former begin bucket that is pointing to
+	  // _M_before_begin.
+	  _M_buckets[_M_bucket_index(*__node->_M_next())] = __node;
+
+	_M_buckets[__bkt] = &_M_before_begin;
+	  }
+  }
 
   // Remove the bucket first node
   void
   _M_remove_bucket_begin(size_type __bkt, __node_ptr __next_n,
-			 size_type __next_bkt);
+			 size_type __next_bkt)
+  {
+	if (!__next_n || __next_bkt != __bkt)
+	  {
+	// Bucket is now empty
+	// First update next bucket if any
+	if (__next_n)
+	  _M_buckets[__next_bkt] = _M_buckets[__bkt];
+
+	// Second update before begin node if necessary
+	if (&_M_before_begin == _M_buckets[__bkt])
+	  _M_before_begin._M_nxt = __next_n;
+	_M_buckets[__bkt] = nullptr;
+	  }
+  }
 
   // Get the node before __n in the bucket __bkt
   __node_base_ptr
   _M_get_previous_node(size_type __bkt, __node_ptr __n);
 
-  pair<const_iterator, __hash_code>
-  _M_compute_hash_code(const_iterator __hint, const key_type& __k) const;
+  pair<__node_ptr, __hash_code>
+  _M_compute_hash_code(__node_ptr __hint, const key_type& __k) const;
 
   // Insert node __n with hash code __code, in bucket __bkt if no
   // rehash (assumes no element with same key already present).
@@ -1153,20 +1195,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   // Definitions of class template

Re: [PATCH V2] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s instructions satisfying REG_P(operand[1]) in -O0.

2023-05-09 Thread juzhe.zh...@rivai.ai

LGTM. Let's wait for kito's feedback.
Thanks :)



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-05-10 12:02
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH V2] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s 
instructions satisfying REG_P(operand[1]) in -O0.
This issue happens because operand 1 of the scalar move can be
REG_P (operand[1]) at -O0, which causes the VSETVL pass to
not insert the vsetvl instruction correctly, and the compiler crashes.
 
Consider the following case:
int16_t foo1 (void *base, size_t vl)
{
int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
return maxVal;
}
 
Before this patch:
bug.c:15:1: internal compiler error: Segmentation fault
   15 | }
  | ^
0x145d723 crash_signal
../.././riscv-gcc/gcc/toplev.cc:314
0x22929dd const_csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:44
0x2292a21 csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:46
0x23dfbb0 recog_356
../.././riscv-gcc/gcc/config/riscv/iterators.md:72
0x23efecd recog(rtx_def*, rtx_insn*, int*)
../.././riscv-gcc/gcc/config/riscv/iterators.md:89
0xdddc15 recog_memoized(rtx_insn*)
../.././riscv-gcc/gcc/recog.h:273
 
After this patch:
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a5,v1
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): For vfmv.f.s/vmv.x.s 
instruction replace null avl with (const_int 0).
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/scalar_move-10.c: New test.
* gcc.target/riscv/rvv/base/scalar_move-11.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  |  5 +++
.../riscv/rvv/base/scalar_move-10.c   | 31 +++
.../riscv/rvv/base/scalar_move-11.c   | 20 
3 files changed, 56 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d4d6f336ef9..14ebae1f3f6 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -618,6 +618,11 @@ static rtx
gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
{
   rtx avl = info.get_avl ();
+  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
+ set the value of avl to (const_int 0) so that VSETVL PASS will
+ insert vsetvl correctly.*/
+  if (info.has_avl_no_reg ())
+avl = GEN_INT (0);
   rtx sew = gen_int_mode (info.get_sew (), Pmode);
   rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
   rtx ta = gen_int_mode (info.get_ta (), Pmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
new file mode 100644
index 000..9760d77fb22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo1:
+** ...
+** vsetivli\tzero,0,e16,m1,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo1 (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
+return maxVal;
+}
+
+/*
+** foo2:
+** ...
+** vsetivli\tzero,0,e32,m1,t[au],m[au]
+** vfmv.f.s\tf[a-x0-9]+,v[0-9]+
+** ...
+*/
+float foo2 (void *base, size_t vl)
+{
+float maxVal = __riscv_vfmv_f_s_f32m1_f32 (__riscv_vle32_v_f32m1 (base, 
vl));
+return maxVal;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
new file mode 100644
index 000..8036acd0a52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo:
+** ...
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i64m4_i64 (__riscv_vle64_v_i64m4 (base, 
vl));
+return maxVal;
+}
-- 
2.17.1

[PATCH V2] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s instructions satisfying REG_P(operand[1]) in -O0.

2023-05-09 Thread Li Xu

This issue happens because operand 1 of the scalar move can be
REG_P (operand[1]) at -O0, which causes the VSETVL pass to
not insert the vsetvl instruction correctly, and the compiler crashes.

Consider the following case:
int16_t foo1 (void *base, size_t vl)
{
int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
return maxVal;
}

Before this patch:
bug.c:15:1: internal compiler error: Segmentation fault
   15 | }
  | ^
0x145d723 crash_signal
../.././riscv-gcc/gcc/toplev.cc:314
0x22929dd const_csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:44
0x2292a21 csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:46
0x23dfbb0 recog_356
../.././riscv-gcc/gcc/config/riscv/iterators.md:72
0x23efecd recog(rtx_def*, rtx_insn*, int*)
../.././riscv-gcc/gcc/config/riscv/iterators.md:89
0xdddc15 recog_memoized(rtx_insn*)
../.././riscv-gcc/gcc/recog.h:273

After this patch:
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a5,v1

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): For vfmv.f.s/vmv.x.s 
instruction replace null avl with (const_int 0).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-10.c: New test.
* gcc.target/riscv/rvv/base/scalar_move-11.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc  |  5 +++
 .../riscv/rvv/base/scalar_move-10.c   | 31 +++
 .../riscv/rvv/base/scalar_move-11.c   | 20 
 3 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d4d6f336ef9..14ebae1f3f6 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -618,6 +618,11 @@ static rtx
 gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 {
   rtx avl = info.get_avl ();
+  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
+ set the value of avl to (const_int 0) so that VSETVL PASS will
+ insert vsetvl correctly.*/
+  if (info.has_avl_no_reg ())
+avl = GEN_INT (0);
   rtx sew = gen_int_mode (info.get_sew (), Pmode);
   rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
   rtx ta = gen_int_mode (info.get_ta (), Pmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
new file mode 100644
index 000..9760d77fb22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo1:
+** ...
+** vsetivli\tzero,0,e16,m1,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo1 (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
+return maxVal;
+}
+
+/*
+** foo2:
+** ...
+** vsetivli\tzero,0,e32,m1,t[au],m[au]
+** vfmv.f.s\tf[a-x0-9]+,v[0-9]+
+** ...
+*/
+float foo2 (void *base, size_t vl)
+{
+float maxVal = __riscv_vfmv_f_s_f32m1_f32 (__riscv_vle32_v_f32m1 (base, 
vl));
+return maxVal;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
new file mode 100644
index 000..8036acd0a52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo:
+** ...
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i64m4_i64 (__riscv_vle64_v_i64m4 (base, 
vl));
+return maxVal;
+}
-- 
2.17.1

[PATCH] RISC-V: Support const series vector for RVV auto-vectorization

2023-05-09 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch is a prerequisite for more RVV auto-vectorization
support.

When we enable even very simple binary operations, we end
up with the following ICE:

during RTL pass: expand
add_run-1.c: In function 'main':
add_run-1.c:28:1: internal compiler error: Segmentation fault
0x1618ea3 crash_signal
../../../riscv-gcc/gcc/toplev.cc:314
0xe76cd9 single_set(rtx_insn const*)
../../../riscv-gcc/gcc/rtl.h:3602
0x1080f8a emit_move_insn(rtx_def*, rtx_def*)
../../../riscv-gcc/gcc/expr.cc:4342
0x170c458 insert_value_copy_on_edge
../../../riscv-gcc/gcc/tree-outof-ssa.cc:352
0x170d58e eliminate_phi
../../../riscv-gcc/gcc/tree-outof-ssa.cc:785
0x170df17 expand_phi_nodes(ssaexpand*)
../../../riscv-gcc/gcc/tree-outof-ssa.cc:1024
0xef27e2 execute
../../../riscv-gcc/gcc/cfgexpand.cc:6818

This is because the loop vectorizer assumes the target is able to handle
const series vectors once binary operations are enabled, which
easily leads to an ICE like the one above.
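
To make "series vector" concrete: element i of a linear series is
base + i * step. The scalar model below mirrors the three instruction kinds
listed in the autovec.md comment in this patch (vid.v, vmul.vx,
vadd.vx/vadd.vi). It is only an illustration. vec_series_model is a
hypothetical name, and skipping the multiply when step is 1 and the add when
base is 0 is an assumption of this sketch, not a statement about what
expand_vec_series does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a linear series of n elements: v[i] = base + i * step.
// Hypothetical helper for illustration; not part of the patch.
std::vector<std::int64_t> vec_series_model(std::size_t n, std::int64_t base,
                                           std::int64_t step) {
  std::vector<std::int64_t> v(n);

  // Step 1: index vector 0, 1, 2, ... (the role of vid.v).
  for (std::size_t i = 0; i < n; ++i)
    v[i] = static_cast<std::int64_t>(i);

  // Step 2: scale by the step (the role of vmul.vx).
  if (step != 1)
    for (std::size_t i = 0; i < n; ++i)
      v[i] *= step;

  // Step 3: add the base (the role of vadd.vx / vadd.vi).
  if (base != 0)
    for (std::size_t i = 0; i < n; ++i)
      v[i] += base;

  return v;
}
```

For example, vec_series_model (4, 10, 3) produces {10, 13, 16, 19}.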

gcc/ChangeLog:

* config/riscv/autovec.md (@vec_series): New pattern
* config/riscv/riscv-protos.h (expand_vec_series): New function.
* config/riscv/riscv-v.cc (emit_binop): Ditto.
(emit_indexop): Ditto.
(expand_vec_series): Ditto.
(expand_const_vector): Add series vector handling.
* config/riscv/riscv.cc (riscv_const_insns): Enable series vector for 
testing.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/series-1.c: New test.
* gcc.target/riscv/rvv/autovec/series_run-1.c: New test.

---
 gcc/config/riscv/autovec.md   |  24 
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   | 118 +-
 gcc/config/riscv/riscv.cc |  27 +++-
 .../gcc.target/riscv/rvv/autovec/series-1.c   |  50 
 .../riscv/rvv/autovec/series_run-1.c  |  20 +++
 6 files changed, 236 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/series-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/series_run-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1c5ff5951b..99dc4f046b0 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -58,3 +58,27 @@
 DONE;
   }
 )
+
+;; =
+;; == Vector creation
+;; =
+
+;; -
+;;  [INT] Linear series
+;; -
+;; Includes:
+;; - vid.v
+;; - vmul.vx
+;; - vadd.vx/vadd.vi
+;; -
+
+(define_expand "@vec_series"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand: 1 "reg_or_int_operand")
+   (match_operand: 2 "reg_or_int_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_series (operands[0], operands[1], operands[2]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c0293a306f9..e8a728ae226 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -219,6 +219,7 @@ rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (machine_mode, rtx *);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
+void expand_vec_series (rtx, rtx, rtx);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 7ca49ca67c1..0c3b1b4c40b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -248,6 +248,111 @@ emit_nonvlmax_op (unsigned icode, rtx dest, rtx src, rtx 
len,
   emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode, false);
 }
 
+/* Emit binary operations.  */
+
+static void
+emit_binop (unsigned icode, rtx *ops, machine_mode mask_mode,
+   machine_mode scalar_mode)
+{
+  insn_expander<9> e;
+  machine_mode mode = GET_MODE (ops[0]);
+  e.add_output_operand (ops[0], mode);
+  e.add_all_one_mask_operand (mask_mode);
+  e.add_vundef_operand (mode);
+  if (VECTOR_MODE_P (GET_MODE (ops[1])))
+e.add_input_operand (ops[1], GET_MODE (ops[1]));
+  else
+e.add_input_operand (ops[1], scalar_mode);
+  if (VECTOR_MODE_P (GET_MODE (ops[2])))
+e.add_input_operand (ops[2], GET_MODE (ops[2]));
+  else
+e.add_input_operand (ops[2], scalar_mode);
+  rtx vlmax = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vlmax);
+  e.add_input_operand (vlmax, Pmode);
+  e.add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+  e.add_avl_type_operand (avl_type::VLMAX);
+  e.expand ((enum insn_code) icode, false);
+}
+
+/* Emit vid.v instruction.  */
+
+static void
+emit_indexop

Re: [PATCH] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s instructions satisfying REG_P(operand[1]) in -O0.

2023-05-09 Thread juzhe.zh...@rivai.ai

Thanks for fixing this. It seems that we don't have much testing at -O0 (most 
testing is at optimize > 0).

A couple of comments here:

>> if (avl == NULL_RTX && !optimize)
Change it to ---> info.has_avl_no_reg ()

>> \ No newline at end of file
Check each test file and make sure it has a newline at the end of the file.

After these changes, it LGTM.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-05-10 10:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s 
instructions satisfying REG_P(operand[1]) in -O0.
This issue happens because operand 1 of the scalar move can be
REG_P (operand[1]) at -O0, which causes the VSETVL pass to
not insert the vsetvl instruction correctly, and the compiler crashes.
 
Consider the following case:
int16_t foo1 (void *base, size_t vl)
{
int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
return maxVal;
}
 
Before this patch:
bug.c:15:1: internal compiler error: Segmentation fault
   15 | }
  | ^
0x145d723 crash_signal
../.././riscv-gcc/gcc/toplev.cc:314
0x22929dd const_csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:44
0x2292a21 csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:46
0x23dfbb0 recog_356
../.././riscv-gcc/gcc/config/riscv/iterators.md:72
0x23efecd recog(rtx_def*, rtx_insn*, int*)
../.././riscv-gcc/gcc/config/riscv/iterators.md:89
0xdddc15 recog_memoized(rtx_insn*)
../.././riscv-gcc/gcc/recog.h:273
 
After this patch:
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a5,v1
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): For vfmv.f.s/vmv.x.s 
instruction replace null avl with (const_int 0).
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/scalar_move-10.c: New test.
* gcc.target/riscv/rvv/base/scalar_move-11.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  |  5 +++
.../riscv/rvv/base/scalar_move-10.c   | 31 +++
.../riscv/rvv/base/scalar_move-11.c   | 20 
3 files changed, 56 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d4d6f336ef9..dfca2515f83 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -618,6 +618,11 @@ static rtx
gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
{
   rtx avl = info.get_avl ();
+  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
+ set the value of avl to (const_int 0) so that VSETVL PASS will
+ insert vsetvl correctly.*/
+  if (avl == NULL_RTX && !optimize)
+avl = GEN_INT (0);
   rtx sew = gen_int_mode (info.get_sew (), Pmode);
   rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
   rtx ta = gen_int_mode (info.get_ta (), Pmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
new file mode 100644
index 000..186ae34335e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo1:
+** ...
+** vsetivli\tzero,0,e16,m1,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo1 (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
+return maxVal;
+}
+
+/*
+** foo2:
+** ...
+** vsetivli\tzero,0,e32,m1,t[au],m[au]
+** vfmv.f.s\tf[a-x0-9]+,v[0-9]+
+** ...
+*/
+float foo2 (void *base, size_t vl)
+{
+float maxVal = __riscv_vfmv_f_s_f32m1_f32 (__riscv_vle32_v_f32m1 (base, 
vl));
+return maxVal;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
new file mode 100644
index 000..724cf74d217
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo:
+** ...
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i64m4_i64 (__riscv_vle64_v_i64m4 (base, 
vl));
+return maxVal;
+}
\ No newline at end of file
-- 
2.17.1

[PATCH] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s instructions satisfying REG_P(operand[1]) in -O0.

2023-05-09 Thread Li Xu

This issue happens because operand 1 of the scalar move can be
REG_P (operand[1]) at -O0, which causes the VSETVL pass to
not insert the vsetvl instruction correctly, and the compiler crashes.

Consider the following case:
int16_t foo1 (void *base, size_t vl)
{
int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
vl));
return maxVal;
}

Before this patch:
bug.c:15:1: internal compiler error: Segmentation fault
   15 | }
  | ^
0x145d723 crash_signal
../.././riscv-gcc/gcc/toplev.cc:314
0x22929dd const_csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:44
0x2292a21 csr_operand(rtx_def*, machine_mode)
../.././riscv-gcc/gcc/config/riscv/predicates.md:46
0x23dfbb0 recog_356
../.././riscv-gcc/gcc/config/riscv/iterators.md:72
0x23efecd recog(rtx_def*, rtx_insn*, int*)
../.././riscv-gcc/gcc/config/riscv/iterators.md:89
0xdddc15 recog_memoized(rtx_insn*)
../.././riscv-gcc/gcc/recog.h:273

After this patch:
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a5,v1

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): For vfmv.f.s/vmv.x.s 
instruction replace null avl with (const_int 0).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-10.c: New test.
* gcc.target/riscv/rvv/base/scalar_move-11.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc  |  5 +++
 .../riscv/rvv/base/scalar_move-10.c   | 31 +++
 .../riscv/rvv/base/scalar_move-11.c   | 20 
 3 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d4d6f336ef9..dfca2515f83 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -618,6 +618,11 @@ static rtx
 gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 {
   rtx avl = info.get_avl ();
+  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
+ set the value of avl to (const_int 0) so that VSETVL PASS will
+ insert vsetvl correctly.*/
+  if (avl == NULL_RTX && !optimize)
+avl = GEN_INT (0);
   rtx sew = gen_int_mode (info.get_sew (), Pmode);
   rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
   rtx ta = gen_int_mode (info.get_ta (), Pmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
new file mode 100644
index 000..186ae34335e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo1:
+** ...
+** vsetivli\tzero,0,e16,m1,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo1 (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, vl));
+return maxVal;
+}
+
+/*
+** foo2:
+** ...
+** vsetivli\tzero,0,e32,m1,t[au],m[au]
+** vfmv.f.s\tf[a-x0-9]+,v[0-9]+
+** ...
+*/
+float foo2 (void *base, size_t vl)
+{
+float maxVal = __riscv_vfmv_f_s_f32m1_f32 (__riscv_vle32_v_f32m1 (base, vl));
+return maxVal;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
new file mode 100644
index 000..724cf74d217
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O0" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** foo:
+** ...
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** vsetivli\tzero,0,e64,m4,t[au],m[au]
+** vmv.x.s\t[a-x0-9]+,v[0-9]+
+** ...
+*/
+int16_t foo (void *base, size_t vl)
+{
+int16_t maxVal = __riscv_vmv_x_s_i64m4_i64 (__riscv_vle64_v_i64m4 (base, vl));
+return maxVal;
+}
\ No newline at end of file
-- 
2.17.1

Re: [PATCH V2 2/2] [x86] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-09 Thread Hongtao Liu via Gcc-patches

On Thu, May 4, 2023 at 5:49 PM Richard Biener
 wrote:
>
> On Thu, May 4, 2023 at 7:37 AM Hongtao Liu via Gcc-patches
>  wrote:
> >
> > On Thu, May 4, 2023 at 1:35 PM Hongtao Liu  wrote:
> > >
> > > On Thu, Dec 22, 2022 at 4:04 PM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Dec 22, 2022 at 5:40 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Thu, Dec 22, 2022 at 6:46 AM Jakub Jelinek  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Dec 21, 2022 at 02:43:43PM -0800, H.J. Lu wrote:
> > > > > > > > > > > > >  Target RejectNegative
> > > > > > > > > > > > >  Set 80387 floating-point precision to 80-bit.
> > > > > > > > > > > > >
> > > > > > > > > > > > > +mdaz-ftz
> > > > > > > > > > > > > +Target
> > > > > > > > > > > >
> > > > > > > > > > > > s/Target/Driver/
> > > > > > > > > > > Change to Driver and Got error like:cc1: error: 
> > > > > > > > > > > command-line option
> > > > > > > > > > > ‘-mdaz-ftz’ is valid for the driver but not for C.
> > > > > > > > > > Hi Jakub:
> > > > > > > > > >   I didn't find a good solution to handle this error after 
> > > > > > > > > > changing
> > > > > > > > > > *Target* to *Driver*, Could you give some hints how to 
> > > > > > > > > > solve this
> > > > > > > > > > problem?
> > > > > > > > > > Or is it ok for you to mark this as *Target*(there won't be 
> > > > > > > > > > any save
> > > > > > > > > > and restore in cfun since there's no variable defined here.)
> > > > > > > > >
> > > > > > > > > Since all -m* options are passed to cc1, -mdaz-ftz can't be 
> > > > > > > > > marked
> > > > > > > > > as Driver.  We need to give it a different name to mark it as 
> > > > > > > > > Driver.
> > > > > > > >
> > > > > > > > It is ok like that.
> > > > > > > >
> > > > > > > > Jakub
> > > > > > > >
> > > > > > >
> > > > > > > The GCC driver handles -mno-XXX automatically for -mXXX.  Use
> > > > > > > a different name needs to handle the negation.   Or we can do 
> > > > > > > something
> > > > > > > like this to check for CL_DRIVER before passing it to cc1.
> > > > > >
> > > > > > I meant I'm ok with -m{,no-}daz-ftz option being Target rather than 
> > > > > > Driver.
> > > > > >
> > > > > Thanks.
> > > > > Uros, Is the patch for you?
> > > >
> > > > The original patch is then OK.
> > > Some users found the -mdaz-ftz option to be very useful, and want it
> > > to be backport to GCC12 and GCC11.
> > > But the patch is not a bugfix one, so i'd like to ask options from
> > s/options/opinions/g
> > > other maintainers, if the patch is suitable for backport?
> > >
> > > The backport patches include both this one and [1] which apply
> > > -mdaz-ftz to all other x86 targets.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610053.html
>
> Please make sure to not backport the -ffast-math linker spec change though.
> Also note the 12 branch is currently frozen.
Do you mean not to backport the -shared part, since it would cause
different behavior between backends?
I'm trying to backport only the daz-ftz part; it won't change the existing
behavior for fast-math or shared.
%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}}
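For readers less familiar with spec syntax, the spec above can be read as a small decision function. This is only a sketch: the function name and its boolean parameters are made up for illustration, and it assumes each parameter reflects whether the corresponding switch is still present when the spec is evaluated.

```cpp
#include <cassert>

// Hypothetical model of
//   %{mdaz-ftz:X;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:X}}
// where X stands for crtfastmath.o%s.
//
// mdaz_ftz       : -mdaz-ftz was given
// mno_daz_ftz    : -mno-daz-ftz was given
// fast_math_like : any of -Ofast, -ffast-math, -funsafe-math-optimizations
bool links_crtfastmath (bool mdaz_ftz, bool mno_daz_ftz, bool fast_math_like)
{
  if (mdaz_ftz)
    return true;            // first alternative: explicit request
  if (fast_math_like)
    return !mno_daz_ftz;    // second alternative, unless opted out
  return false;             // no alternative matched, nothing is added
}
```

Read this way, a plain -ffast-math link behaves as before, -mdaz-ftz alone pulls in crtfastmath.o, and -ffast-math -mno-daz-ftz does not.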
>
> I'll defer to x86 maintainers on the -mdaz-ftz flag itself.
>
> Richard.
>
> > > >
> > > > Thanks,
> > > > Uros.
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

[PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-09 Thread Eugene Rozenfeld via Gcc-patches

The autoprofiledbootstrap build produces new warnings because its inlining
decisions differ from those of other builds. This patch contains fixes and
workarounds for those warnings.
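The workarounds all follow the same shape, which can be sketched in standalone form as below. The function pick is hypothetical and only stands in for code that the compiler cannot prove initialized; the __GNUC__ guard is an assumption for code outside the GCC tree, whereas the patch itself tests GCC_VERSION.

```cpp
#include <cassert>

// Silence one diagnostic for a single function only, then restore the
// previous diagnostic state.
#if defined (__GNUC__) && !defined (__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

// Hypothetical function: value is always assigned before it is read,
// but a compiler may fail to see that the two conditions are complementary.
int pick (bool flag)
{
  int value;
  if (flag)
    value = 1;
  if (!flag)
    value = 2;
  return value;
}

#if defined (__GNUC__) && !defined (__clang__)
#pragma GCC diagnostic pop
#endif
```

Keeping the push/pop pair tight around one function means the warning stays active for the rest of the file.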

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work around
-Wstringop-overflow false positive during autoprofiledbootstrap.
* ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow
warning during autoprofiledbootstrap.
* lra-eliminations.cc (setup_can_eliminate): Work around
-Wmaybe-uninitialized false positive during autoprofiledbootstrap.
* opts-common.cc (candidates_list_and_hint): Work around
-Wstringop-overflow false positive during autoprofiledbootstrap.
* tree-ssa-ccp.cc (bit_value_unop): Work around -Wmaybe-uninitialized
false positive during autoprofiledbootstrap.
* wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false
positive during autoprofiledbootstrap.
---
 gcc/config/i386/i386-expand.cc | 11 +++
 gcc/ipa-devirt.cc  |  3 ++-
 gcc/lra-eliminations.cc| 11 +++
 gcc/opts-common.cc |  1 +
 gcc/tree-ssa-ccp.cc| 11 +++
 gcc/wide-int.h | 11 +++
 6 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 634fe61ba79..be9f912775b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
 
 static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);
 
+/* Work around -Wstringop-overflow false positive during autoprofiledbootstrap.  */
+
+# if GCC_VERSION >= 7001
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wstringop-overflow"
+#endif
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
a two vector permutation into a single vector permutation by using
an interleave operation to merge the vectors.  */
@@ -20737,6 +20744,10 @@ expand_vec_perm_interleave2 (struct expand_vec_perm_d *d)
   return true;
 }
 
+# if GCC_VERSION >= 7001
+#pragma GCC diagnostic pop
+#endif
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
a single vector cross-lane permutation into vpermq followed
by any of the single insn permutations.  */
diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 819860258d1..36ea266e834 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -4033,7 +4033,8 @@ debug_tree_odr_name (tree type, bool demangle)
   odr = cplus_demangle (odr, opts);
 }
 
-  fprintf (stderr, "%s\n", odr);
+  if (odr != NULL)
+fprintf (stderr, "%s\n", odr);
 }
 
 /* Register ODR enum so we later stream record about its values.  */
diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 4220639..05e2a7e0d68 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -138,6 +138,13 @@ lra_debug_elim_table (void)
   print_elim_table (stderr);
 }
 
+/* Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap.  */
+
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+#endif
+
 /* Setup possibility of elimination in elimination table element EP to
VALUE.  Setup FRAME_POINTER_NEEDED if elimination from frame
pointer to stack pointer is not possible anymore.  */
@@ -152,6 +159,10 @@ setup_can_eliminate (class lra_elim_table *ep, bool value)
 REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = 0;
 }
 
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic pop
+#endif
+
 /* Map: eliminable "from" register -> its current elimination,
or NULL if none.  The elimination table may contain more than
one elimination for the same hard register, but this map specifies
diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 23ddcaa3b55..0bb8e34e2b0 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1388,6 +1388,7 @@ candidates_list_and_hint (const char *arg, char *&str,
   p[len] = ' ';
   p += len + 1;
 }
+  gcc_assert(p > str);
   p[-1] = '\0';
   return find_closest_string (arg, );
 }
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 03a984f2adf..a54e5a90464 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1976,6 +1976,13 @@ bit_value_binop (enum tree_code code, signop sgn, int width,
 }
 }
 
+/* Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap.  */
+
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+#endif
+
 /* Return the propagation value when applying the operation CODE to
the value RHS yielding type TYPE.  */
 
@@ -2011,6 +2018,10 @@ bit_value_unop (enum tree_code code, tree type, tree rhs)
   return val;
 }
 
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic pop
+#endif
+
 /*

[committed] CRIS: Fix ccmode typo in cris_postdbr_cmpelim

2023-05-09 Thread Hans-Peter Nilsson via Gcc-patches

Typo spotted while doing CCmode improvements, as a missed
optimization.  It's almost visible from the patch context;
there's not much done in terms of "mode-adjustment" when
replacing (reg:CC CRIS_CC0_REGNUM) with a copy!
This bug affects functions in the newlib printf-formatting
functions (nothing else in libgcc or newlib libc), with the
performance impact on coremark scores being less than 1e-6
(3/5078992 cycles, 6/48543 bytes).

* config/cris/cris.cc (cris_postdbr_cmpelim): Correct mode
of modeadjusted_dccr.
---
 gcc/config/cris/cris.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index 05dead9c0778..5b4cc4e204eb 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -432,7 +432,7 @@ cris_postdbr_cmpelim ()
  machine_mode ccmode = GET_MODE (src);
  rtx modeadjusted_dccr
= (ccmode == CCmode ? dccr
-  : gen_rtx_REG (CCmode, CRIS_CC0_REGNUM));
+  : gen_rtx_REG (ccmode, CRIS_CC0_REGNUM));
  rtx compare
/* We don't need to copy_rtx pat: we're going to
   delete that insn. */
-- 
2.30.2

RE: [EXTERNAL] Re: [PATCH][PUSHED] Fix cfg maintenance after inlining in AutoFDO

2023-05-09 Thread Eugene Rozenfeld via Gcc-patches

>>  free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ 
>> auto_profile (void)
>>  pop_cfun ();
>>}
>>
>> -  return TODO_rebuild_cgraph_edges;
>> +  return 0;

>This change isn't mentioned - was it accidental?

No, it wasn't accidental. There is no reason to return 
TODO_rebuild_cgraph_edges since we called cgraph_edge::rebuild_edges () after 
each early_inline ().
Here is more context before that diff:

todo |= early_inline ();
autofdo::afdo_annotate_cfg (promoted_stmts);
compute_function_frequency ();

/* Local pure-const may imply need to fixup the cfg.  */
todo |= execute_fixup_cfg ();
if (todo & TODO_cleanup_cfg)
  cleanup_tree_cfg ();

free_dominance_info (CDI_DOMINATORS);
free_dominance_info (CDI_POST_DOMINATORS);
cgraph_edge::rebuild_edges ();
compute_fn_summary (cgraph_node::get (current_function_decl), true);
pop_cfun ();
  }

  return 0;
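
The accumulation pattern in that excerpt can be sketched in isolation as follows. The flag values and the two step functions are hypothetical stand-ins; GCC defines the real TODO_* bits and passes elsewhere.

```cpp
#include <cassert>

// Hypothetical flag values, for illustration only.
constexpr unsigned int TODO_cleanup_cfg = 1u << 0;
constexpr unsigned int TODO_update_ssa  = 1u << 1;

// Stand-ins for steps that report follow-up work as a bit mask.
unsigned int step_needing_cleanup () { return TODO_cleanup_cfg; }
unsigned int step_needing_ssa_update () { return TODO_update_ssa; }

// Accumulate the masks from every step, then test the union once.
// Dropping any step's return value could lose a cleanup request.
bool cleanup_required (bool run_cleanup_step)
{
  unsigned int todo = 0;
  if (run_cleanup_step)
    todo |= step_needing_cleanup ();
  todo |= step_needing_ssa_update ();
  return (todo & TODO_cleanup_cfg) != 0;
}
```

Because the work requested by the flags is already done inside the loop body, nothing is left for the caller, which is consistent with returning 0.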

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, May 8, 2023 11:40 PM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH][PUSHED] Fix cfg maintenance after inlining in 
AutoFDO

On Tue, May 9, 2023 at 12:27 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> Todo from early_inliner needs to be propagated so that 
> cleanup_tree_cfg () is called if necessary.
>
> This bug was causing an assert in get_loop_body during ipa-sra in 
> autoprofiledbootstrap build since loops weren't fixed up and one of 
> the loops had num_nodes set to 0.
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
>
> * auto-profile.cc (auto_profile): Check todo from early_inline
> to see if cleanup_tree_vfg needs to be called.

_cfg

> (early_inline): Return todo from early_inliner.
> ---
>  gcc/auto-profile.cc | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 
> 360c42c4b89..e3af3555e75 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1589,13 +1589,14 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
>
>  /* Wrapper function to invoke early inliner.  */
>
> -static void
> +static unsigned int
>  early_inline ()
>  {
>compute_fn_summary (cgraph_node::get (current_function_decl), 
> true);
> -  unsigned todo = early_inliner (cfun);
> +  unsigned int todo = early_inliner (cfun);
>if (todo & TODO_update_ssa_any)
>  update_ssa (TODO_update_ssa);
> +  return todo;
>  }
>
>  /* Use AutoFDO profile to annoate the control flow graph.
> @@ -1651,20 +1652,22 @@ auto_profile (void)
> function before annotation, so the profile inside bar@loc_foo2
> will be useful.  */
>  autofdo::stmt_set promoted_stmts;
> +unsigned int todo = 0;
>  for (int i = 0; i < 10; i++)
>{
> -if (!flag_value_profile_transformations
> -|| !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
> -  break;
> -early_inline ();
> +   if (!flag_value_profile_transformations
> +   || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
> + break;
> +   todo |= early_inline ();
>}
>
> -early_inline ();
> +todo |= early_inline ();
>  autofdo::afdo_annotate_cfg (promoted_stmts);
>  compute_function_frequency ();
>
>  /* Local pure-const may imply need to fixup the cfg.  */
> -if (execute_fixup_cfg () & TODO_cleanup_cfg)
> +todo |= execute_fixup_cfg ();
> +if (todo & TODO_cleanup_cfg)
>cleanup_tree_cfg ();
>
>  free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ 
> auto_profile (void)
>  pop_cfun ();
>}
>
> -  return TODO_rebuild_cgraph_edges;
> +  return 0;

This change isn't mentioned - was it accidental?

Otherwise looks OK.

Thanks,
Richard.

>  }
>  } /* namespace autofdo.  */
>
> --
> 2.25.1
>

Re: [PATCH] libffi: fix handling of homogeneous float128 structs [PR109447]

2023-05-09 Thread Peter Bergner via Gcc-patches

On 5/9/23 3:50 PM, Andreas Schwab wrote:
> On Mai 09 2023, Peter Bergner via Gcc-patches wrote:
> 
>> It's almost as if the top level build machinery
>> adds a LD_LIBRARY_PATH=...
> 
> See how the toplevel Makefile sets LD_LIBRARY_PATH (via RPATH_ENVVAR) if
> gcc-bootstrap is set.

I'm sorry to be dense, but can you point to the specific line?  In my
$GCC_BUILD/Makefile, the only mention of LD_LIBRARY_PATH is:

  RPATH_ENVVAR = LD_LIBRARY_PATH

...so that isn't setting LD_LIBRARY_PATH, but using it.

Peter

Re: [PATCH] libffi: fix handling of homogeneous float128 structs [PR109447]

2023-05-09 Thread Andreas Schwab

On Mai 09 2023, Peter Bergner via Gcc-patches wrote:

> It's almost as if the top level build machinery
> adds a LD_LIBRARY_PATH=...

See how the toplevel Makefile sets LD_LIBRARY_PATH (via RPATH_ENVVAR) if
gcc-bootstrap is set.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH] Add RTX codes for BITREVERSE and COPYSIGN.

2023-05-09 Thread Joseph Myers

On Sat, 6 May 2023, Roger Sayle wrote:

> An analysis of backend UNSPECs reveals that two of the most common UNSPECs
> across target backends are for copysign and bit reversal.  This patch
> adds RTX codes for these expressions to allow their representation to
> be standardized, and them to be optimized by the middle-end RTL optimizers.

Note we have bug 50481 requesting (target-independent) built-in functions 
for bit reversal (so this patch could be useful as a basis for 
implementing such built-in functions, with appropriate lowering or libgcc 
implementation for targets without relevant instructions).
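
As an illustration of what a generic fallback could look like on a target without a bit-reverse instruction, here is a minimal sketch for 32-bit values. The name bitreverse32 is made up for this example and is not an existing GCC built-in or libgcc routine.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 32-bit value by swapping progressively
// larger groups: single bits, pairs, nibbles, bytes, then halves.
std::uint32_t bitreverse32 (std::uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  return (x >> 16) | (x << 16);
}
```

Applying the function twice returns the original value, which makes for a convenient sanity check.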

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] c++, v2: Reject attributes without arguments used as pack expansion [PR109756]

2023-05-09 Thread Jason Merrill via Gcc-patches


On 5/9/23 15:23, Jakub Jelinek wrote:

On Tue, May 09, 2023 at 01:17:09PM -0400, Jason Merrill wrote:

How about changing cp_parser_std_attribute to set TREE_VALUE to
error_mark_node if it skips arguments?


In limited testing that seems to work (tried
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make -j32 -k check-g++ 
RUNTESTFLAGS='dg.exp=*attr*'
so far with it).

Will bootstrap/regtest it tonight.

Ok if it passes?


OK.


2023-05-09  Jakub Jelinek  

PR c++/109756
* parser.cc (cp_parser_std_attribute): For unknown attributes with
arguments set TREE_VALUE (attribute) to error_mark_node after skipping
the balanced tokens.
(cp_parser_std_attribute_list): If ... is used after attribute without
arguments, diagnose it and return error_mark_node.  If
TREE_VALUE (attribute) is error_mark_node, don't call
make_pack_expansion nor return early error_mark_node.

* g++.dg/cpp0x/gen-attrs-78.C: New test.
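
The three cases that the ChangeLog distinguishes when "..." follows an attribute can be sketched as a small decision function. The enum and function names below are hypothetical and only model the states of TREE_VALUE (attribute); they are not parser code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the states of TREE_VALUE (attribute).
enum class AttrArgs
{
  none,     // NULL_TREE: the attribute was written without arguments
  skipped,  // error_mark_node: unknown attribute, arguments were skipped
  parsed    // anything else: arguments were parsed
};

enum class Action { diagnose, no_expansion, make_pack_expansion };

// What happens when "..." follows an attribute in the given state.
Action on_ellipsis (AttrArgs args)
{
  switch (args)
    {
    case AttrArgs::none:
      return Action::diagnose;             // no parameter packs possible
    case AttrArgs::skipped:
      return Action::no_expansion;         // nothing to expand, no error
    case AttrArgs::parsed:
      return Action::make_pack_expansion;  // behavior as before the patch
    }
  return Action::diagnose;  // not reached for valid enumerators
}
```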

--- gcc/cp/parser.cc.jj 2023-04-25 16:40:42.010723809 +0200
+++ gcc/cp/parser.cc 2023-05-09 20:22:42.025601924 +0200
@@ -29468,9 +29468,12 @@ cp_parser_std_attribute (cp_parser *pars
  }
  
  	/* For unknown attributes, just skip balanced tokens instead of

-  trying to parse the arguments.  */
+  trying to parse the arguments.  Set TREE_VALUE (attribute) to
+  error_mark_node to distinguish skipped arguments from attributes
+  with no arguments.  */
for (size_t n = cp_parser_skip_balanced_tokens (parser, 1) - 1; n; --n)
  cp_lexer_consume_token (parser->lexer);
+   TREE_VALUE (attribute) = error_mark_node;
return attribute;
}
  
@@ -29562,7 +29565,13 @@ cp_parser_std_attribute_list (cp_parser

  if (attribute == NULL_TREE)
error_at (token->location,
  "expected attribute before %<...%>");
- else
+ else if (TREE_VALUE (attribute) == NULL_TREE)
+   {
+ error_at (token->location, "attribute with no arguments "
+"contains no parameter packs");
+ return error_mark_node;
+   }
+ else if (TREE_VALUE (attribute) != error_mark_node)
{
  tree pack = make_pack_expansion (TREE_VALUE (attribute));
  if (pack == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-78.C.jj 2023-05-08 12:33:13.387581760 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-78.C 2023-05-08 12:32:23.146301128 +0200
@@ -0,0 +1,29 @@
+// PR c++/109756
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-attributes" }
+
+template 
+[[noreturn...]]      // { dg-error "attribute with no arguments contains no parameter packs" }
+[[deprecated...]]    // { dg-error "attribute with no arguments contains no parameter packs" }
+[[nodiscard...]]     // { dg-error "attribute with no arguments contains no parameter packs" }
+int foo (int x)
+{
+  switch (x)
+{
+case 1:
+  [[likely...]];       // { dg-error "attribute with no arguments contains no parameter packs" }
+  [[fallthrough...]];  // { dg-error "attribute with no arguments contains no parameter packs" }
+case 2:
+  [[unlikely...]];     // { dg-error "attribute with no arguments contains no parameter packs" }
+
+  break;
+default:
+  break;
+}
+  struct T {};
+  struct S { [[no_unique_address...]] T t; };  // { dg-error "attribute with no arguments contains no parameter packs" }
+  for (;;)
+;
+}
+
+int a = foo <1, 2, 3> (4);


Jakub

Re: [Patch, fortran] PR103716 - [10/11/12/13/14 Regression] ICE in gimplify_expr, at gimplify.c:15964

2023-05-09 Thread Paul Richard Thomas via Gcc-patches

Duuh! There's even a choice :-)

Paul


On Tue, 9 May 2023 at 19:29, Harald Anlauf  wrote:

> Hi Paul,
>
> On 5/9/23 18:00, Paul Richard Thomas via Gcc-patches wrote:
> > Hi All,
> >
> > This problem caused the gimplifier failure because the reference chain
> > ending in an inquiry_len still retained a full array reference. This had
> > already been corrected for deferred character lengths but the fix extends
> > this to all characters without a length expression and integer expressions,
> > which is the correct type of course, that retain a full array_spec. The
> > nullification of the se->string length in conv_inquiry is a
> > belts-and-braces measure to stop it from winding up as a hidden argument in
> > procedure calls.
> > OK for trunk and, after a decent delay, backporting?
>
> ENOTESTCASE.
>
> Nevertheless the patch LGTM and is also OK for backporting.
>
> Thanks for fixing this!
>
> Harald
>
>
> > Cheers
> >
> > Paul
> >
> > Fortran: Fix assumed length chars and len inquiry [PR103716]
> >
> > 2023-05-09  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/103716
> > * resolve.cc (gfc_resolve_ref): Conversion of array_ref into an
> > element should be done for all characters without a len expr,
> > not just deferred lens, and for integer expressions.
> > * trans-expr.cc (conv_inquiry): For len and kind inquiry refs,
> > set the se string_length to NULL_TREE.
> >
> > gcc/testsuite/
> > PR fortran/103716
> > * gfortran.dg/pr103716 : New test.
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein
! { dg-do run }
!
! The gimplifier used to throw a fit on the write statements in f1 and f2.
!
! Contributed by Gerhard Steinmetz  
!
module m
  character(6) :: buffer
contains
  integer function g(x)
integer :: x
g = x
  end
  integer function f1(x)
character(*) :: x(*)
write (buffer(1:3),'(i2)') g(x%len)
  end
  integer function f2(x)
character(*) :: x(3)
write (buffer(4:6),'(i2)') g(x%len)
  end
end module m

  use m
  integer :: i(2), j
  character(2), dimension(3) :: chr = ['ab','cd','ef']
  i(1) = f1(chr)
  i(2) = f2(chr)
  if (any (i .eq.2)) stop 1
  if (trim(buffer) .ne. ' 2  2') stop 2
end
! { dg-do compile }
!
! The gimplifier used to throw a fit on these two functions.
!
! Contributed by Gerhard Steinmetz  
!
function f1(x)
   character(*) :: x(*)
   print *, g(x%len)
end

function f2(x)
   character(*) :: x(3)
   print *, g(x%len)
end

Re: [PATCH] libffi: fix handling of homogeneous float128 structs [PR109447]

2023-05-09 Thread Peter Bergner via Gcc-patches

On 5/5/23 4:42 PM, Jakub Jelinek wrote:
> On Thu, May 04, 2023 at 02:29:34PM -0500, Peter Bergner wrote:
>> Merged from upstream libffi commit: 464b4b66e3cf3b5489e730c1466ee1bf825560e0
>>
>> 2023-05-03  Dan Horák 
>>
>> libffi/
>>  PR libffi/109447
>>  * src/powerpc/ffi_linux64.c (ffi_prep_args64): Update arg.f128 pointer.
> 
> Ok for 14/13.2/12.4 (i.e. after 12.3 is out)/11.4

Thanks, I've pushed the GCC trunk and GCC 13 commits and now that GCC 12.3
is released, I have pushed the GCC 12 backport too.

I have not pushed to GCC 11 yet, because bootstrap is currently broken when
building GCC 11 on our Fedora 36 system, so I cannot test there.  It also seems
GCC 11 is missing some IEEE128 changes from upstream libffi that GCC 12 and
later have, so it might not even be appropriate, but I'll wait for bootstrap
to be restored before making any decisions.  The problem seems to be that the
system ld.gold, which is used to link libgo.so, is dying with an undefined
runtime symbol:

/usr/bin/ld.gold: /home/bergner/gcc/build/gcc-fsf-11-baselib-regtest/powerpc64le-linux/libstdc++-v3/src/.libs/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /usr/bin/ld.gold)
collect2: error: ld returned 1 exit status

Running the link command by hand or via a make in powerpc64le-linux/libgo,
the link succeeds.  It's only when I type make in the top level build dir
where I see this error.  It's almost as if the top level build machinery
adds a LD_LIBRARY_PATH=... forcing the system ld.gold (which was built
with a gcc12 based compiler) to pick up the build's gcc11 libstdc++ which
doesn't have that symbol, rather than the gcc12 system libstdc++.  Has anyone
seen anything like that before?

Peter

Re: Question on patch -fprofile-partial-training

2023-05-09 Thread Qing Zhao via Gcc-patches

Honza,

Thanks a lot for your comments. 

> On May 9, 2023, at 6:22 AM, Jan Hubicka  wrote:
> 
> 
> From my understanding, -fprofile-partial-training is one important option 
> for PGO performance.
 
 I don't think so, speed benefit would be rather small I guess.
>>> I saw some articles online to introduce this option for gcc10,
>>> https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo
>> 
>> Hi.
>> 
>> Ah, I see.
>> 
>>> And also based on my previous experience in Studio compiler, I guess that 
>>> this one might have
>>> Some good performance impact on PGO.  Is there any old performance data on 
>>> this option? (I cannot find online)
>> 
>> Maybe Honza can chime in here? Or Martin who is the author of the white 
>> paper.
> 
> Main motivation for this was profiling programs that contain specific
> code paths for different CPUs (such as graphics library in Firefox or Linux
> kernel). When the training machine differs from the machine the
> program is run on later, we end up optimizing for size all code paths
> except the ones taken by the specific CPU.  This patch essentially tells gcc
> to consider every non-trained function as built without profile
> feedback.
Makes sense.
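
To check my understanding, the behavior described above can be modeled as below. This is only a sketch under the stated description; the enum and function names are hypothetical and do not correspond to GCC internals.

```cpp
#include <cassert>

// Hypothetical classification of how a function is optimized under
// profile feedback.
enum class Treatment
{
  use_profile,        // executed in training: follow the measured profile
  optimize_for_size,  // never executed and assumed cold
  as_without_profile  // never executed, treated as if no feedback existed
};

// executed_in_training : the function ran during the train run
// partial_training     : -fprofile-partial-training was given
Treatment classify (bool executed_in_training, bool partial_training)
{
  if (executed_in_training)
    return Treatment::use_profile;
  return partial_training ? Treatment::as_without_profile
                          : Treatment::optimize_for_size;
}
```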
> 
> For Firefox it had an important impact on graphics rendering tests back
> then, since the build machine had AVX while the benchmarking machine did not.
> Some benchmarks improved several times, which is not a surprise if you
> consider a tight graphics rendering loop optimized for size versus a
> vectorized one.

That’s a lot of improvement. So, without -fprofile-partial-training, did PGO
hurt performance in those cases?

> The patch has a bad effect on code size, which in turn
> impacts performance too, so I think it makes sense to use
> -fprofile-partial-training with a bit of care (i.e. only on code where
> such scenarios are likely).

Right. 
> 
> As for backporting, I do not have a checkout of GCC 8 right now. It
> depends on profile infrastructure that was added in 2017 (so stage1 of
> GCC 8), so the patch may backport quite easily.  I am not 100% sure
> what shape the infrastructure was in the first version, but I am quite
> convinced it had the necessary bits - it was able to make the difference
> between 0 profile count and missing profile feedback.

This is good to know. I will try to backport it to GCC 8 and let them test
whether it has a positive impact.

Qing
> 
> Honza
>>

[committed] libstdc++: Fix pretty printers and add tests

2023-05-09 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux.

Pushed to trunk. I'll backport to gcc-13 too.

-- >8--

This fixes a couple of errors in the printers for chrono types, and adds
tests to ensure they keep working.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (StdChronoDurationPrinter):
Print floating-point durations correctly.
(StdChronoTimePointPrinter): Support printing only the value,
not the type name. Uncomment handling for known clocks.
(StdChronoZonedTimePrinter): Remove type names from output.
(StdChronoCalendarPrinter): Fix hh_mm_ss member access.
(StdChronoTimeZonePrinter): Add equals sign to output.
* testsuite/libstdc++-prettyprinters/chrono.cc: New test.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py  | 44 ++
 .../libstdc++-prettyprinters/chrono.cc| 87 +++
 2 files changed, 113 insertions(+), 18 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/libstdc++-prettyprinters/chrono.cc

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 2c50e60eae7..b4c427d487c 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1863,8 +1863,8 @@ class StdAtomicPrinter:
 
 class StdFormatArgsPrinter:
 "Print a std::basic_format_args"
-# TODO: add printer for basic_format_arg and print out children
-# TODO: add printer for basic_format_args::_Store
+# TODO: add printer for basic_format_arg and print out children.
+# TODO: add printer for __format::_ArgStore.
 
 def __init__(self, typename, val):
 self.typename = strip_versioned_namespace(typename)
@@ -1930,7 +1930,10 @@ class StdChronoDurationPrinter:
 return "[{}/{}]s".format(num, den)
 
 def to_string(self):
-return "std::chrono::duration = { %d%s }" % (self.val['__r'], 
self._suffix())
+r = self.val['__r']
+if r.type.strip_typedefs().code == gdb.TYPE_CODE_FLT:
+r = "%g" % r
+return "std::chrono::duration = {{ {}{} }}".format(r, self._suffix())
 
 
 class StdChronoTimePointPrinter:
@@ -1947,19 +1950,19 @@ class StdChronoTimePointPrinter:
 or name == 'std::chrono::system_clock':
 return ('std::chrono::sys_time', 0)
 # XXX need to remove leap seconds from utc, gps, and tai
-#if name == 'std::chrono::utc_clock':
-#return ('std::chrono::utc_time', 0)
-#if name == 'std::chrono::gps_clock':
-#return ('std::chrono::gps_clock time_point', 315964809)
-#if name == 'std::chrono::tai_clock':
-#return ('std::chrono::tai_clock time_point', -378691210)
+if name == 'std::chrono::utc_clock':
+return ('std::chrono::utc_time', None) # XXX
+if name == 'std::chrono::gps_clock':
+return ('std::chrono::gps_time', None) # XXX 315964809
+if name == 'std::chrono::tai_clock':
+return ('std::chrono::tai_time', None) # XXX -378691210
 if name == 'std::filesystem::__file_clock':
 return ('std::chrono::file_time', 6437664000)
 if name == 'std::chrono::local_t':
 return ('std::chrono::local_time', 0)
 return ('{} time_point'.format(name), None)
 
-def to_string(self):
+def to_string(self, abbrev = False):
 clock, offset = self._clock()
 d = self.val['__d']
 r = d['__r']
@@ -1970,11 +1973,14 @@ class StdChronoTimePointPrinter:
 num, den = printer._ratio()
 secs = (r * num / den) + offset
 try:
-dt = datetime.fromtimestamp(secs, _utc_timezone)
+dt = datetime.datetime.fromtimestamp(secs, _utc_timezone)
 time = ' [{:%Y-%m-%d %H:%M:%S}]'.format(dt)
 except:
 pass
-return '%s = {%d%s%s}' % (clock, r, suffix, time)
+s = '%d%s%s' % (r, suffix, time)
+if abbrev:
+return s
+return '%s = { %s }' % (clock, s)
 
 class StdChronoZonedTimePrinter:
 "Print a std::chrono::zoned_time"
@@ -1984,9 +1990,11 @@ class StdChronoZonedTimePrinter:
 self.val = val
 
 def to_string(self):
-zone = self.val['_M_zone'].dereference()
+zone = self.val['_M_zone'].dereference()['_M_name']
 time = self.val['_M_tp']
-return 'std::chrono::zoned_time = {{{} {}}}'.format(zone, time)
+printer = StdChronoTimePointPrinter(time.type.name, time)
+time = printer.to_string(True)
+return 'std::chrono::zoned_time = {{ {} {} }}'.format(zone, time)
 
 
 months = [None, 'January', 'February', 'March', 'April', 'May', 'June',
@@ -2037,13 +2045,13 @@ class StdChronoCalendarPrinter:
 if typ == 'std::chrono::year_month_day_last':
 return '{}/{}'.format(y, val['_M_mdl'])
 if typ == 'std::chrono::year_month_weekday':
-return '{}/{}'.format(y, m,

[PATCH] configure: Implement --enable-host-pie

2023-05-09 Thread Marek Polacek via Gcc-patches

[ This is my third attempt to add this configure option.  The first
version was approved but it came too late in the development cycle.
The second version was also approved, but I had to revert it:
.
I've fixed the problem (by moving $(PICFLAG) from INTERNAL_CFLAGS to
ALL_COMPILERFLAGS).  Another change is that since r13-4536 I no longer
need to touch Makefile.def, so this patch is simplified. ]

This patch implements the --enable-host-pie configure option which
makes the compiler executables PIE.  This can be used to enhance
protection against ROP attacks, and can be viewed as part of a wider
trend to harden binaries.

It is similar to the option --enable-host-shared, except that --e-h-s
won't add -shared to the linker flags whereas --e-h-p will add -pie.
It is different from --enable-default-pie because that option just
adds an implicit -fPIE/-pie when the compiler is invoked, but the
compiler itself isn't PIE.

Since r12-5768-gfe7c3ecf, PCH works well with PIE, so there are no PCH
regressions.

When building the compiler, the build process may use various in-tree
libraries; these need to be built with -fPIE so that it's possible to
use them when building a PIE.  For instance, when --with-included-gettext
is in effect, intl object files must be compiled with -fPIE.  Similarly,
when building in-tree gmp, isl, mpfr and mpc, they must be compiled with
-fPIE.

With this patch and --enable-host-pie used to configure gcc:

$ file gcc/cc1{,plus,obj} gcc/f951 gcc/lto1 gcc/cpp
gcc/cc1:     ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cc1plus: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/f951:    ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cc1obj:  ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/lto1:    ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cpp:     ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped

I plan to add an option to link with -Wl,-z,now.

Bootstrapped on x86_64-pc-linux-gnu with --with-included-gettext
--enable-host-pie as well as without --enable-host-pie.  Also tested
on a Debian system where the system gcc was configured with
--enable-default-pie.

ChangeLog:

* configure.ac (--enable-host-pie): New check.  Set PICFLAG after this
check.
* configure: Regenerate.

c++tools/ChangeLog:

* Makefile.in: Rename PIEFLAG to PICFLAG.  Set LD_PICFLAG.  Use it.
Use pic/libiberty.a if PICFLAG is set.
* configure.ac (--enable-default-pie): Set PICFLAG instead of PIEFLAG.
(--enable-host-pie): New check.
* configure: Regenerate.

fixincludes/ChangeLog:

* Makefile.in: Set and use PICFLAG and LD_PICFLAG.  Use the "pic"
build of libiberty if PICFLAG is set.
* configure.ac:
* configure: Regenerate.

gcc/ChangeLog:

* Makefile.in: Set LD_PICFLAG.  Use it.  Set enable_host_pie.
Remove NO_PIE_CFLAGS and NO_PIE_FLAG.  Pass LD_PICFLAG to
ALL_LINKERFLAGS.  Use the "pic" build of libiberty if --enable-host-pie.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG and LD_PICFLAG after this
check.
* configure: Regenerate.
* doc/install.texi: Document --enable-host-pie.

gcc/d/ChangeLog:

* Make-lang.in: Remove NO_PIE_CFLAGS.

intl/ChangeLog:

* Makefile.in: Use @PICFLAG@ in COMPILE as well.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG after this check.
* configure: Regenerate.

libcody/ChangeLog:

* Makefile.in: Pass LD_PICFLAG to LDFLAGS.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG and LD_PICFLAG after this
check.
* configure: Regenerate.

libcpp/ChangeLog:

* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG after this check.
* configure: Regenerate.

libdecnumber/ChangeLog:

* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie):

[PATCH] c++, v2: Reject attributes without arguments used as pack expansion [PR109756]

2023-05-09 Thread Jakub Jelinek via Gcc-patches

On Tue, May 09, 2023 at 01:17:09PM -0400, Jason Merrill wrote:
> How about changing cp_parser_std_attribute to set TREE_VALUE to
> error_mark_node if it skips arguments?

In limited testing that seems to work (tried
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make -j32 -k check-g++ 
RUNTESTFLAGS='dg.exp=*attr*'
so far with it).

Will bootstrap/regtest it tonight.

Ok if it passes?

2023-05-09  Jakub Jelinek  

PR c++/109756
* parser.cc (cp_parser_std_attribute): For unknown attributes with
arguments set TREE_VALUE (attribute) to error_mark_node after skipping
the balanced tokens.
(cp_parser_std_attribute_list): If ... is used after attribute without
arguments, diagnose it and return error_mark_node.  If
TREE_VALUE (attribute) is error_mark_node, don't call
make_pack_expansion nor return early error_mark_node.

* g++.dg/cpp0x/gen-attrs-78.C: New test.

--- gcc/cp/parser.cc.jj 2023-04-25 16:40:42.010723809 +0200
+++ gcc/cp/parser.cc 2023-05-09 20:22:42.025601924 +0200
@@ -29468,9 +29468,12 @@ cp_parser_std_attribute (cp_parser *pars
  }
 
/* For unknown attributes, just skip balanced tokens instead of
-  trying to parse the arguments.  */
+  trying to parse the arguments.  Set TREE_VALUE (attribute) to
+  error_mark_node to distinguish skipped arguments from attributes
+  with no arguments.  */
for (size_t n = cp_parser_skip_balanced_tokens (parser, 1) - 1; n; --n)
  cp_lexer_consume_token (parser->lexer);
+   TREE_VALUE (attribute) = error_mark_node;
return attribute;
   }
 
@@ -29562,7 +29565,13 @@ cp_parser_std_attribute_list (cp_parser
  if (attribute == NULL_TREE)
error_at (token->location,
  "expected attribute before %<...%>");
- else
+ else if (TREE_VALUE (attribute) == NULL_TREE)
+   {
+ error_at (token->location, "attribute with no arguments "
+"contains no parameter packs");
+ return error_mark_node;
+   }
+ else if (TREE_VALUE (attribute) != error_mark_node)
{
  tree pack = make_pack_expansion (TREE_VALUE (attribute));
  if (pack == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-78.C.jj 2023-05-08 12:33:13.387581760 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-78.C 2023-05-08 12:32:23.146301128 +0200
@@ -0,0 +1,29 @@
+// PR c++/109756
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-attributes" }
+
+template 
+[[noreturn...]]  // { dg-error "attribute with no arguments contains no parameter packs" }
+[[deprecated...]]  // { dg-error "attribute with no arguments contains no parameter packs" }
+[[nodiscard...]]  // { dg-error "attribute with no arguments contains no parameter packs" }
+int foo (int x)
+{
+  switch (x)
+{
+case 1:
+  [[likely...]];  // { dg-error "attribute with no arguments contains no parameter packs" }
+  [[fallthrough...]];  // { dg-error "attribute with no arguments contains no parameter packs" }
+case 2:
+  [[unlikely...]];  // { dg-error "attribute with no arguments contains no parameter packs" }
+
+  break;
+default:
+  break;
+}
+  struct T {};
+  struct S { [[no_unique_address...]] T t; };  // { dg-error "attribute with no arguments contains no parameter packs" }
+  for (;;)
+;
+}
+
+int a = foo <1, 2, 3> (4);


Jakub
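
[ Editorial sketch, not part of the patch: the three-way distinction the patch relies on can be modelled as below. It assumes that TREE_VALUE (attribute) ends up in one of three states (NULL_TREE for no arguments, error_mark_node for skipped arguments of an unknown attribute, or a real argument list). The names AttrArgs, Action and handle_ellipsis are hypothetical and do not exist in GCC. ]

```cpp
#include <cassert>

// Hypothetical stand-in for the three states of TREE_VALUE (attribute):
// None    ~ NULL_TREE (attribute written without arguments),
// Skipped ~ error_mark_node (unknown attribute, balanced tokens skipped),
// Parsed  ~ a real argument list.
enum class AttrArgs { None, Skipped, Parsed };

// Hypothetical outcome when "..." follows the attribute in an attribute list.
enum class Action { Diagnose, Ignore, ExpandPack };

// Model of the branch added to cp_parser_std_attribute_list:
// no arguments      -> error "attribute with no arguments contains no
//                      parameter packs";
// skipped arguments -> neither diagnose nor call make_pack_expansion;
// parsed arguments  -> attempt the pack expansion as before.
Action handle_ellipsis(AttrArgs args) {
  switch (args) {
    case AttrArgs::None:
      return Action::Diagnose;
    case AttrArgs::Skipped:
      return Action::Ignore;
    case AttrArgs::Parsed:
      return Action::ExpandPack;
  }
  return Action::Diagnose;  // unreachable; keeps compilers quiet
}
```

Under this model, [[noreturn...]] maps to AttrArgs::None and is diagnosed, while an unknown attribute with arguments followed by "..." maps to AttrArgs::Skipped and is left alone.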

Re: Testsuite: Add missing 'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage (was: Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS

2023-05-09 Thread Christophe Lyon via Gcc-patches

Hi Thomas,

On Tue, 9 May 2023 at 17:17, Christophe Lyon 
wrote:

> Hi!
>
> On Tue, 9 May 2023 at 11:00, Thomas Schwinge 
> wrote:
>
>> Hi Christophe!
>>
>> On 2023-05-09T09:32:55+0200, Christophe Lyon 
>> wrote:
>> > On Wed, 3 May 2023 at 13:47, Richard Biener via Gcc-patches <
>> gcc-patches@gcc.gnu.org> wrote:
>> >> On Wed, 3 May 2023, Thomas Schwinge wrote:
>> >> > "Let each 'lto_init' determine the default 'LTO_OPTIONS', and
>> 'torture-init' the 'LTO_TORTURE_OPTIONS'"?
>> >
>> > This is causing issues on arm/aarch64, including:
>> >
>> > ERROR: can't read "LTO_TORTURE_OPTIONS": no such variable
>> > in gcc.target/arm/acle/acle.exp:
>> >
>> > ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
>> > in gcc.target/aarch64/sls-mitigation/sls-mitigation.exp,
>> > gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp,
>> > gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp,
>> > gcc.target/aarch64/torture/aarch64-torture.exp
>> >
>> > and maybe others
>> >
>> > Are other targets affected too?
>>
>> Sorry for that -- it means, the safe-guards I added are working as
>> expected.
>>
>> Please test whether all these issues are gone with the attached
>> "Testsuite: Add missing 'torture-init'/'torture-finish' around
>> 'LTO_TORTURE_OPTIONS' usage"?
>>
>>
> Your patch seemed reasonable,  but it doesn't work :-(
>
> Well now I get:
> ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
> because gcc-dg-runtest itself calls torture-init
>
> but I'm not sure where LTO_TORTURE_OPTIONS is set
>
>
Just checking, are you able to test your changes on arm (a cross toolchain
is OK) ?
The problem shows up even if running only acle.exp, so it's quick once you
have built the toolchain once.

I spent some time looking at it, and the conflict is that the .exp file
calls torture-init and gcc-dg-runtest, which in turn calls torture-init
again, leading to the error.

I haven't checked the details of why there are similar failures on aarch64.

Thanks,

Christophe




> Christophe
>
>
>> Grüße
>>  Thomas
>>
>>
>> -
>> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
>> 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
>> Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
>> Registergericht München, HRB 106955
>>
>

Re: [Patch, fortran] PR97122 - Spurious FINAL ... must be in the specification part of a MODULE

2023-05-09 Thread Steve Kargl via Gcc-patches

On Tue, May 09, 2023 at 08:35:00PM +0200, Harald Anlauf wrote:
> On 5/9/23 20:29, Steve Kargl via Gcc-patches wrote:
> > 
> > It's not needed.  See above.  gfc_state_stack->previous is referenced
> > a few lines above the if-stmt.  The reference will segfault if the
> > pointer is NULL.
> > 
> 
> You're absolutely right.  So it is OK as is.

Thanks for keeping us honest and the review.

-- 
Steve

Re: [Patch, fortran] PR97122 - Spurious FINAL ... must be in the specification part of a MODULE

2023-05-09 Thread Harald Anlauf via Gcc-patches


On 5/9/23 20:29, Steve Kargl via Gcc-patches wrote:

On Tue, May 09, 2023 at 08:24:16PM +0200, Harald Anlauf wrote:

Hi Paul,

On 5/9/23 17:51, Paul Richard Thomas via Gcc-patches wrote:

Hi All,

Thanks to Steve Kargl for the fix. It caused finalize_8.f03 to fail because
this testcase checked that finalizable derived types could not be specified
in a submodule. I have replaced the original test with a test of the patch.

Thanks also to Malcolm Cohen for guidance on this.

OK for trunk?


the patch looks good to me.  However:

@@ -11637,8 +11637,9 @@ gfc_match_final_decl (void)
block = gfc_state_stack->previous->sym;

  ^
See below.


gcc_assert (block);

-  if (!gfc_state_stack->previous || !gfc_state_stack->previous->previous
-  || gfc_state_stack->previous->previous->state != COMP_MODULE)
+  if (gfc_state_stack->previous->previous
+  && gfc_state_stack->previous->previous->state != COMP_MODULE
+  && gfc_state_stack->previous->previous->state != COMP_SUBMODULE)
  {
gfc_error ("Derived type declaration with FINAL at %C must be in the"
  " specification part of a MODULE");

I am wondering if we should keep the protection against a potential
NULL pointer dereference (i.e. gfc_state_stack->previous == NULL) for
possibly invalid code.  I have failed to produce a simple testcase,
but others may have "better" ideas.


It's not needed.  See above.  gfc_state_stack->previous is referenced
a few lines above the if-stmt.  The reference will segfault if the
pointer is NULL.



You're absolutely right.  So it is OK as is.

Re: [PATCH 01/16] arm: [MVE intrinsics] add binary_maxvminv shape

2023-05-09 Thread Christophe Lyon via Gcc-patches





On 5/9/23 15:50, Kyrylo Tkachov wrote:

Hi Christophe,


-Original Message-
From: Christophe Lyon 
Sent: Tuesday, May 9, 2023 1:19 PM
To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
Richard Earnshaw ; Richard Sandiford

Cc: Christophe Lyon 
Subject: [PATCH 01/16] arm: [MVE intrinsics] add binary_maxvminv shape

This patch adds the binary_maxvminv shape description.


This patch series is fairly mechanical (that's not to say simple!) and in line 
with the other series in this area.


Yeah it took me a bit of time & thinking to put the series in a 
mechanical shape, hopefully easy to review.



You obviously know what you're doing here so I'm comfortable approving it.
I did have a look at the patches individually and with the comment on patch 
06/16 addressed this series is ok for trunk.


Thanks,

Christophe


Thanks,
Kyrill



2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_maxvminv): New.
* config/arm/arm-mve-builtins-shapes.h (binary_maxvminv): New.
---
  gcc/config/arm/arm-mve-builtins-shapes.cc | 30 +++
  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
  2 files changed, 31 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 1d43b8871bf..19c3c47a20e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -401,6 +401,36 @@ struct binary_rshift_def : public overloaded_base<0>
  };
  SHAPE (binary_rshift)

+/* _t vfoo[_](_t, _t)
+
+   Example: vmaxvq.
+   int8_t [__arm_]vmaxvq[_s8](int8_t a, int8x16_t b)
+   int8_t [__arm_]vmaxvq_p[_s8](int8_t a, int8x16_t b, mve_pred16_t p)  */
+struct binary_maxvminv_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "s0,s0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || !r.require_derived_scalar_type (0, r.SAME_TYPE_CLASS)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (binary_maxvminv)
+
  /* _t vfoo[_t0](_t, _t)

 Example: vmovnbq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index dd2597dc6f5..9debf1d8733 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,7 @@ namespace arm_mve
  extern const function_shape *const binary;
  extern const function_shape *const binary_lshift;
  extern const function_shape *const binary_lshift_r;
+extern const function_shape *const binary_maxvminv;
  extern const function_shape *const binary_move_narrow;
  extern const function_shape *const binary_move_narrow_unsigned;
  extern const function_shape *const binary_opt_n;
--
2.34.1

Re: [Patch, fortran] PR97122 - Spurious FINAL ... must be in the specification part of a MODULE

2023-05-09 Thread Steve Kargl via Gcc-patches

On Tue, May 09, 2023 at 08:24:16PM +0200, Harald Anlauf wrote:
> Hi Paul,
> 
> On 5/9/23 17:51, Paul Richard Thomas via Gcc-patches wrote:
> > Hi All,
> > 
> > Thanks to Steve Kargl for the fix. It caused finalize_8.f03 to fail because
> > this testcase checked that finalizable derived types could not be specified
> > in a submodule. I have replaced the original test with a test of the patch.
> > 
> > Thanks also to Malcolm Cohen for guidance on this.
> > 
> > OK for trunk?
> 
> the patch looks good to me.  However:
> 
> @@ -11637,8 +11637,9 @@ gfc_match_final_decl (void)
>block = gfc_state_stack->previous->sym;
 ^
See below.

>gcc_assert (block);
> 
> -  if (!gfc_state_stack->previous || !gfc_state_stack->previous->previous
> -  || gfc_state_stack->previous->previous->state != COMP_MODULE)
> +  if (gfc_state_stack->previous->previous
> +  && gfc_state_stack->previous->previous->state != COMP_MODULE
> +  && gfc_state_stack->previous->previous->state != COMP_SUBMODULE)
>  {
>gfc_error ("Derived type declaration with FINAL at %C must be in the"
>  " specification part of a MODULE");
> 
> I am wondering if we should keep the protection against a potential
> NULL pointer dereference (i.e. gfc_state_stack->previous == NULL) for
> possibly invalid code.  I have failed to produce a simple testcase,
> but others may have "better" ideas.

It's not needed.  See above.  gfc_state_stack->previous is referenced
a few lines above the if-stmt.  The reference will segfault if the
pointer is NULL.
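
[ Editorial sketch, not part of the thread: a small model of the patched condition, to make the argument above concrete. StateFrame, CompState and final_decl_rejected are hypothetical names standing in for gfc_state_data, the COMP_* values and the if-statement in gfc_match_final_decl. The model assumes, as in the real function, that the parent frame has already been dereferenced, so only the grandparent needs a NULL test. ]

```cpp
#include <cassert>

// Hypothetical stand-ins for the COMP_* states relevant here.
enum class CompState { Program, Module, Submodule, DerivedType, Contains };

// Hypothetical stand-in for one entry of the parser state stack.
struct StateFrame {
  CompState state;
  const StateFrame* previous;
};

// Model of the patched test.  Precondition: top.previous is non-null,
// because the real code reads gfc_state_stack->previous->sym first.
bool final_decl_rejected(const StateFrame& top) {
  const StateFrame* grandparent = top.previous->previous;
  return grandparent != nullptr
         && grandparent->state != CompState::Module
         && grandparent->state != CompState::Submodule;
}
```

With this model, a derived type declared in a MODULE or SUBMODULE is accepted and one declared in a main program is rejected. A missing grandparent frame is not reported as an error, which differs from the removed condition.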

-- 
Steve

Re: [Patch, fortran] PR103716 - [10/11/12/13/14 Regression] ICE in gimplify_expr, at gimplify.c:15964

2023-05-09 Thread Harald Anlauf via Gcc-patches


Hi Paul,

On 5/9/23 18:00, Paul Richard Thomas via Gcc-patches wrote:

Hi All,

This problem caused the gimplifier failure because the reference chain
ending in an inquiry_len still retained a full array reference. This had
already been corrected for deferred character lengths but the fix extends
this to all characters without a length expression and integer expressions,
which is the correct type of course, that retain a full  array_spec. The
nullification of the se->string length in conv_inquiry is a
belts-and-braces measure to stop it from winding up as a hidden argument in
procedure calls.

OK for trunk and, after a decent delay, backporting?


ENOTESTCASE.

Nevertheless the patch LGTM and is also OK for backporting.

Thanks for fixing this!

Harald



Cheers

Paul

Fortran: Fix assumed length chars and len inquiry [PR103716]

2023-05-09  Paul Thomas  

gcc/fortran
PR fortran/103716
* resolve.cc (gfc_resolve_ref): Conversion of array_ref into an
element should be done for all characters without a len expr,
not just deferred lens, and for integer expressions.
* trans-expr.cc (conv_inquiry): For len and kind inquiry refs,
set the se string_length to NULL_TREE.

gcc/testsuite/
PR fortran/103716
* gfortran.dg/pr103716 : New test.

[PATCH v2 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Christophe Lyon via Gcc-patches

This patch adds the missing expanders for smax/smin for v*hf modes,
by using the VDQWH iterator instead of VALLW.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/vec-common.md (smin<mode>3): Use VDQWH iterator.
(smax<mode>3): Likewise.
---
 gcc/config/arm/vec-common.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b5fc86fdf28..6183c931e36 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -110,9 +110,9 @@ (define_expand "mul<mode>3"
 )
 
 (define_expand "smin<mode>3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-   (smin:VALLW (match_operand:VALLW 1 "s_register_operand")
-   (match_operand:VALLW 2 "s_register_operand")))]
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+   (smin:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+   (match_operand:VDQWH 2 "s_register_operand")))]
    "ARM_HAVE_<MODE>_ARITH"
 )
 
@@ -124,9 +124,9 @@ (define_expand "umin<mode>3"
 )
 
 (define_expand "smax<mode>3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-   (smax:VALLW (match_operand:VALLW 1 "s_register_operand")
-   (match_operand:VALLW 2 "s_register_operand")))]
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+   (smax:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+   (match_operand:VDQWH 2 "s_register_operand")))]
    "ARM_HAVE_<MODE>_ARITH"
 )
 
-- 
2.34.1

Re: [Patch, fortran] PR97122 - Spurious FINAL ... must be in the specification part of a MODULE

2023-05-09 Thread Harald Anlauf via Gcc-patches


Hi Paul,

On 5/9/23 17:51, Paul Richard Thomas via Gcc-patches wrote:

Hi All,

Thanks to Steve Kargl for the fix. It caused finalize_8.f03 to fail because
this testcase checked that finalizable derived types could not be specified
in a submodule. I have replaced the original test with a test of the patch.

Thanks also to Malcolm Cohen for guidance on this.

OK for trunk?


the patch looks good to me.  However:

@@ -11637,8 +11637,9 @@ gfc_match_final_decl (void)
   block = gfc_state_stack->previous->sym;
   gcc_assert (block);

-  if (!gfc_state_stack->previous || !gfc_state_stack->previous->previous
-  || gfc_state_stack->previous->previous->state != COMP_MODULE)
+  if (gfc_state_stack->previous->previous
+  && gfc_state_stack->previous->previous->state != COMP_MODULE
+  && gfc_state_stack->previous->previous->state != COMP_SUBMODULE)
 {
   gfc_error ("Derived type declaration with FINAL at %C must be in the"
 " specification part of a MODULE");

I am wondering if we should keep the protection against a potential
NULL pointer dereference (i.e. gfc_state_stack->previous == NULL) for
possibly invalid code.  I have failed to produce a simple testcase,
but others may have "better" ideas.

I'll leave it to you to amend the patch or leave as is.

Thanks,
Harald



Paul

Fortran: Allow declaration of finalizable DT in a submodule [PR97122]

2023-05-09  Paul Thomas  
Steven G. Kargl  

gcc/fortran
PR fortran/97122
* decl.cc (variable_decl): Clean up white space issues.
(gfc_match_final_decl): Declaration of finalizable derived type
is allowed in a submodule.

gcc/testsuite/
PR fortran/97122
* gfortran.dg/finalize_8.f03 : Replace testcase that checks
declaration of finalizable derived types in submodules works.

[PATCH 1/2] aarch64: Fix cut-&-pasto in aarch64-sve2-acle-asm.exp

2023-05-09 Thread Richard Sandiford via Gcc-patches

aarch64-sve2-acle-asm.exp tried to prevent --with-cpu/tune
from affecting the results, but it used sve_flags rather than
sve2_flags.  This was a silent failure when running the full
testsuite, but was a fatal error when running the harness
individually.

Tested on aarch64-linux-gnu, pushed to trunk.

Richard


gcc/testsuite/
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Use
sve2_flags instead of sve_flags.
---
 .../gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp 
b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index 2e8d78904c5..0ad6463d832 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -39,7 +39,7 @@ if { [check_effective_target_aarch64_sve2] } {
 
 # Turn off any codegen tweaks by default that may affect expected assembly.
 # Tests relying on those should turn them on explicitly.
-set sve_flags "$sve_flags -mtune=generic -moverride=tune=none"
+set sve2_flags "$sve2_flags -mtune=generic -moverride=tune=none"
 
 lappend extra_flags "-fno-ipa-icf"
 
-- 
2.25.1

Re: [PATCH] c++: noexcept-spec from nested class confusion [PR109761]

2023-05-09 Thread Jason Merrill via Gcc-patches


On 5/9/23 12:35, Patrick Palka wrote:

When processing a noexcept-spec from a nested class after completion of
the outer class (since a noexcept-spec is a complete-class context), we
pass to noexcept_override_late_checks the outer class type instead of
the nested class type, which leads to bogus errors in the below test.

This patch fixes this by making noexcept_override_late_checks obtain the
class context directly via DECL_CONTEXT instead of via an additional
parameter.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk and release branches?


OK.


PR c++/109761

gcc/cp/ChangeLog:

* parser.cc (cp_parser_class_specifier): Adjust call to
noexcept_override_late_checks.
(noexcept_override_late_checks): Remove 'type' parameter
and use DECL_CONTEXT of 'fndecl' instead.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept78.C: New test.
---
  gcc/cp/parser.cc| 13 ++---
  gcc/testsuite/g++.dg/cpp0x/noexcept78.C | 16 
  2 files changed, 22 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept78.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d89553e7da8..7f4bc0f468e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -251,7 +251,7 @@ static cp_token_cache *cp_token_cache_new
  static tree cp_parser_late_noexcept_specifier
(cp_parser *, tree);
  static void noexcept_override_late_checks
-  (tree, tree);
+  (tree);
  
  static void cp_parser_initial_pragma

(cp_token *);
@@ -26458,7 +26458,7 @@ cp_parser_class_specifier (cp_parser* parser)
  /* The finish_struct call above performed various override checking,
 but it skipped unparsed noexcept-specifier operands.  Now that we
 have resolved them, check again.  */
- noexcept_override_late_checks (type, decl);
+ noexcept_override_late_checks (decl);
  
  	  /* Remove any member-function parameters from the symbol table.  */

  pop_injected_parms ();
@@ -28240,14 +28240,13 @@ cp_parser_late_noexcept_specifier (cp_parser *parser, tree default_arg)
  }
  
  /* Perform late checking of overriding function with respect to their

-   noexcept-specifiers.  TYPE is the class and FNDECL is the function
-   that potentially overrides some virtual function with the same
-   signature.  */
+   noexcept-specifiers.  FNDECL is the member function that potentially
+   overrides some virtual function with the same signature.  */
  
  static void

-noexcept_override_late_checks (tree type, tree fndecl)
+noexcept_override_late_checks (tree fndecl)
  {
-  tree binfo = TYPE_BINFO (type);
+  tree binfo = TYPE_BINFO (DECL_CONTEXT (fndecl));
tree base_binfo;
  
if (DECL_STATIC_FUNCTION_P (fndecl))

diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept78.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept78.C
new file mode 100644
index 000..e8156eb7c6f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept78.C
@@ -0,0 +1,16 @@
+// PR c++/109761
+// { dg-do compile { target c++11 } }
+
+struct base {
+  virtual void foo() noexcept { }
+  virtual ~base() { }
+};
+
+struct outer : base {
+  struct nested {
+void foo() noexcept(noexcept(g())); // { dg-bogus "looser" }
+~nested() noexcept(noexcept(g()));  // { dg-bogus "looser" }
+  };
+  static void g();
+};
+

RE: [PATCH 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, May 9, 2023 6:33 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org;
> Richard Earnshaw ; Richard Sandiford
> 
> Subject: Re: [PATCH 06/16] arm: add smax/smin expanders for v*hf
> 
> 
> 
> On 5/9/23 19:31, Kyrylo Tkachov wrote:
> >
> >
> >> -Original Message-
> >> From: Christophe Lyon 
> >> Sent: Tuesday, May 9, 2023 6:18 PM
> >> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org;
> >> Richard Earnshaw ; Richard Sandiford
> >> 
> >> Subject: Re: [PATCH 06/16] arm: add smax/smin expanders for v*hf
> >>
> >>
> >>
> >> On 5/9/23 15:48, Kyrylo Tkachov wrote:
> >>>
> >>>
>  -Original Message-
>  From: Christophe Lyon 
>  Sent: Tuesday, May 9, 2023 1:19 PM
>  To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov
> ;
>  Richard Earnshaw ; Richard Sandiford
>  
>  Cc: Christophe Lyon 
>  Subject: [PATCH 06/16] arm: add smax/smin expanders for v*hf
> 
>  This patch adds the missing expanders for smax/smin for v*hf modes.
> 
>  2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/vec-common.md (smin3): New.
>   (smax3): New.
>  ---
> gcc/config/arm/vec-common.md | 14 ++
> 1 file changed, 14 insertions(+)
> 
>  diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
>  common.md
>  index b5fc86fdf28..1f9b7992da4 100644
>  --- a/gcc/config/arm/vec-common.md
>  +++ b/gcc/config/arm/vec-common.md
>  @@ -116,6 +116,13 @@ (define_expand "smin3"
>    "ARM_HAVE__ARITH"
> )
> 
>  +(define_expand "smin3"
>  +  [(set (match_operand:VH 0 "s_register_operand")
>  +(smin:VH (match_operand:VH 1 "s_register_operand")
>  + (match_operand:VH 2 "s_register_operand")))]
>  +   "ARM_HAVE__ARITH"
>  +)
>  +
> (define_expand "umin3"
>   [(set (match_operand:VINTW 0 "s_register_operand")
>   (umin:VINTW (match_operand:VINTW 1 "s_register_operand")
>  @@ -130,6 +137,13 @@ (define_expand "smax3"
>    "ARM_HAVE__ARITH"
> )
> 
>  +(define_expand "smax3"
>  +  [(set (match_operand:VH 0 "s_register_operand")
>  +(smax:VH (match_operand:VH 1 "s_register_operand")
>  + (match_operand:VH 2 "s_register_operand")))]
>  +   "ARM_HAVE__ARITH"
>  +)
> >>>
> >>> We already have expanders for smin and smax, can we just extend their
> >> mode iterators to include the VH modes?
> >>> The ARM_HAVE__ARITH checks should still gate them properly
> and
> >> we could avoid adding more bloat in this file.
> >>
> >> I opted for the most localized changes, to avoid breaking Neon since
> >> there are already so many similar iterators ;-)
> >>
> >> It seems I can just use the existing VDQWH, which seems to be VALLW (as
> >> already used by smax) plus V8HF/V4HF which is just what we want.
> >
> > Yes, let's use that.
> >
> Thanks, I'll do that. OK to push with change, or do you want me to post
> the updated patch?

Please post the updated patch for archival on the list and commit the series.
Thanks,
Kyrill

> 
> >>
> >> Also, ISTM that VALLW == VDQW, am I misreading?
> >
> > They do look the same. I'm generally okay with removing duplicate iterators
> unless their name seems to have a very specific meaning that would be
> confusing in other contexts.
> > But VALLW and VDQW seem equally confusing 
> Thanks for confirming the "confusing" bit :-)
> 
> Christophe
> 
> > Thanks,
> > KYrill
> >
> >>
> >> Thanks,
> >>
> >> Christophe
> >>
> >>
> >>> Thanks,
> >>> Kyrill
> >>>
>  +
> (define_expand "umax3"
>   [(set (match_operand:VINTW 0 "s_register_operand")
>   (umax:VINTW (match_operand:VINTW 1 "s_register_operand")
>  --
>  2.34.1
> >>>

Re: [PATCH 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Christophe Lyon via Gcc-patches





On 5/9/23 19:31, Kyrylo Tkachov wrote:




-Original Message-
From: Christophe Lyon 
Sent: Tuesday, May 9, 2023 6:18 PM
To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org;
Richard Earnshaw ; Richard Sandiford

Subject: Re: [PATCH 06/16] arm: add smax/smin expanders for v*hf



On 5/9/23 15:48, Kyrylo Tkachov wrote:




-Original Message-
From: Christophe Lyon 
Sent: Tuesday, May 9, 2023 1:19 PM
To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
Richard Earnshaw ; Richard Sandiford

Cc: Christophe Lyon 
Subject: [PATCH 06/16] arm: add smax/smin expanders for v*hf

This patch adds the missing expanders for smax/smin for v*hf modes.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/vec-common.md (smin3): New.
(smax3): New.
---
   gcc/config/arm/vec-common.md | 14 ++
   1 file changed, 14 insertions(+)

diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
common.md
index b5fc86fdf28..1f9b7992da4 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -116,6 +116,13 @@ (define_expand "smin3"
  "ARM_HAVE__ARITH"
   )

+(define_expand "smin3"
+  [(set (match_operand:VH 0 "s_register_operand")
+   (smin:VH (match_operand:VH 1 "s_register_operand")
+(match_operand:VH 2 "s_register_operand")))]
+   "ARM_HAVE__ARITH"
+)
+
   (define_expand "umin3"
 [(set (match_operand:VINTW 0 "s_register_operand")
(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
@@ -130,6 +137,13 @@ (define_expand "smax3"
  "ARM_HAVE__ARITH"
   )

+(define_expand "smax3"
+  [(set (match_operand:VH 0 "s_register_operand")
+   (smax:VH (match_operand:VH 1 "s_register_operand")
+(match_operand:VH 2 "s_register_operand")))]
+   "ARM_HAVE__ARITH"
+)


We already have expanders for smin and smax, can we just extend their

mode iterators to include the VH modes?

The ARM_HAVE__ARITH checks should still gate them properly and

we could avoid adding more bloat in this file.

I opted for the most localized changes, to avoid breaking Neon since
there are already so many similar iterators ;-)

It seems I can just use the existing VDQWH, which seems to be VALLW (as
already used by smax) plus V8HF/V4HF which is just what we want.


Yes, let's use that.

Thanks, I'll do that. OK to push with change, or do you want me to post 
the updated patch?




Also, ISTM that VALLW == VDQW, am I misreading?


They do look the same. I'm generally okay with removing duplicate iterators 
unless their name seems to have a very specific meaning that would be confusing 
in other contexts.
But VALLW and VDQW seem equally confusing 

Thanks for confirming the "confusing" bit :-)

Christophe


Thanks,
Kyrill



Thanks,

Christophe



Thanks,
Kyrill


+
   (define_expand "umax3"
 [(set (match_operand:VINTW 0 "s_register_operand")
(umax:VINTW (match_operand:VINTW 1 "s_register_operand")
--
2.34.1

RE: [PATCH 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Christophe Lyon 
> Sent: Tuesday, May 9, 2023 6:18 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org;
> Richard Earnshaw ; Richard Sandiford
> 
> Subject: Re: [PATCH 06/16] arm: add smax/smin expanders for v*hf
> 
> 
> 
> On 5/9/23 15:48, Kyrylo Tkachov wrote:
> >
> >
> >> -----Original Message-----
> >> From: Christophe Lyon 
> >> Sent: Tuesday, May 9, 2023 1:19 PM
> >> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> >> Richard Earnshaw ; Richard Sandiford
> >> 
> >> Cc: Christophe Lyon 
> >> Subject: [PATCH 06/16] arm: add smax/smin expanders for v*hf
> >>
> >> This patch adds the missing expanders for smax/smin for v*hf modes.
> >>
> >> 2022-09-08  Christophe Lyon  
> >>
> >>gcc/
> >>* config/arm/vec-common.md (smin3): New.
> >>(smax3): New.
> >> ---
> >>   gcc/config/arm/vec-common.md | 14 ++
> >>   1 file changed, 14 insertions(+)
> >>
> >> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> >> common.md
> >> index b5fc86fdf28..1f9b7992da4 100644
> >> --- a/gcc/config/arm/vec-common.md
> >> +++ b/gcc/config/arm/vec-common.md
> >> @@ -116,6 +116,13 @@ (define_expand "smin3"
> >>  "ARM_HAVE__ARITH"
> >>   )
> >>
> >> +(define_expand "smin3"
> >> +  [(set (match_operand:VH 0 "s_register_operand")
> >> +  (smin:VH (match_operand:VH 1 "s_register_operand")
> >> +   (match_operand:VH 2 "s_register_operand")))]
> >> +   "ARM_HAVE__ARITH"
> >> +)
> >> +
> >>   (define_expand "umin3"
> >> [(set (match_operand:VINTW 0 "s_register_operand")
> >>(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
> >> @@ -130,6 +137,13 @@ (define_expand "smax3"
> >>  "ARM_HAVE__ARITH"
> >>   )
> >>
> >> +(define_expand "smax3"
> >> +  [(set (match_operand:VH 0 "s_register_operand")
> >> +  (smax:VH (match_operand:VH 1 "s_register_operand")
> >> +   (match_operand:VH 2 "s_register_operand")))]
> >> +   "ARM_HAVE__ARITH"
> >> +)
> >
> > We already have expanders for smin and smax, can we just extend their
> mode iterators to include the VH modes?
> > The ARM_HAVE__ARITH checks should still gate them properly and
> we could avoid adding more bloat in this file.
> 
> I opted for the most localized changes, to avoid breaking Neon since
> there are already so many similar iterators ;-)
> 
> It seems I can just use the existing VDQWH, which seems to be VALLW (as
> already used by smax) plus V8HF/V4HF which is just what we want.

Yes, let's use that.

> 
> Also, ISTM that VALLW == VDQW, am I misreading?

They do look the same. I'm generally okay with removing duplicate iterators 
unless their name seems to have a very specific meaning that would be confusing 
in other contexts.
But VALLW and VDQW seem equally confusing 
Thanks,
Kyrill

> 
> Thanks,
> 
> Christophe
> 
> 
> > Thanks,
> > Kyrill
> >
> >> +
> >>   (define_expand "umax3"
> >> [(set (match_operand:VINTW 0 "s_register_operand")
> >>(umax:VINTW (match_operand:VINTW 1 "s_register_operand")
> >> --
> >> 2.34.1
> >

Re: Support parallel testing in libgomp, part II [PR66005]

2023-05-09 Thread Bernhard Reutner-Fischer via Gcc-patches

Thomas,

On Fri, 5 May 2023 10:59:31 +0200
Thomas Schwinge  wrote:

> Worth noting is that with nvptx offloading, there is one execution test case
> that times out ('libgomp.fortran/reverse-offload-5.f90').  This effectively
> stalls progress for almost 5 min:

Short of fixing it for real you could shorten the timeout for this
single test to a handful of seconds: dg-timeout 3
or set it to a 1/50 fraction or the like: dg-timeout-factor 0.02

If it's unbroken then it should run to completion in far less than 10
or 30 seconds, from the looks?

just a thought..

Re: [PATCH] c++: error-recovery ICE with unstable satisfaction [PR109752]

2023-05-09 Thread Jason Merrill via Gcc-patches


On 5/9/23 12:35, Patrick Palka wrote:

After diagnosing and recovering from unstable satisfaction, it's
possible to evaluate an atom for the first time noisily rather than
quietly.  The satisfaction cache tries to handle this situation
gracefully, but apparently not gracefully enough: we inserted an empty
slot in the cache, and left it empty, which later makes
hash_table::check_complete_insertion unhappy.  This patch fixes this by
removing the empty slot in this case.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/109752

gcc/cp/ChangeLog:

* constraint.cc (satisfaction_cache::satisfaction_cache): In the
unexpected case of evaluating an atom for the first time noisily,
clear the cache slot that we inserted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr109752.C: New test.
---
  gcc/cp/constraint.cc  | 14 +++---
  .../g++.dg/cpp2a/concepts-pr109752.C  | 26 +++
  2 files changed, 37 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 675299aa4cd..bdaa82c8741 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2687,9 +2687,17 @@ satisfaction_cache
*slot = entry;
  }
else
-/* We shouldn't get here, but if we do, let's just leave 'entry'
-   empty, effectively disabling the cache.  */
-return;
+{
+  /* We're evaluating this atom for the first time, and doing so noisily.
+This shouldn't happen outside of error recovery situations involving
+unstable satisfaction.  Let's just leave 'entry' empty, effectively
+disabling the cache, and remove the empty slot.  */
+  gcc_checking_assert (seen_error ());
+  /* To appease hash_table::check_complete_insertion.  */
+  *slot = ggc_alloc ();
+  sat_cache->clear_slot (slot);
+  return;
+}
  }
  
  /* Returns the cached satisfaction result if we have one and we're not

diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
new file mode 100644
index 000..d54ce295e50
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
@@ -0,0 +1,26 @@
+// PR c++/109752
+// { dg-do compile { target c++20 } }
+
+template 
+  inline constexpr bool is_constructible_v = __is_constructible(_Tp, _Args...);
+template
+  concept __weakly_eq_cmp_with
+ = requires(_Tp __t, _Up __u) {{ __u != __t } ; // { dg-error "changed 
from" }
+ };
+  template
+concept regular =  is_constructible_v<_Tp>  && __weakly_eq_cmp_with<_Tp, 
_Tp>;
+  template concept incrementable = true
+&& regular<_Iter>
+&& requires(_Iter __i) { { __i++ } ;}
+;
+template
+struct iterator_interface
+{
+  friend constexpr bool operator>=(D lhs, D rhs) requires __weakly_eq_cmp_with { return true; }
+};
+template
+struct iterator : iterator_interface>
+{
+bool operator==(iterator) const;
+};
+static_assert(incrementable>); // { dg-error "assert" }

Re: [PATCH 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Christophe Lyon via Gcc-patches





On 5/9/23 15:48, Kyrylo Tkachov wrote:




-----Original Message-----
From: Christophe Lyon 
Sent: Tuesday, May 9, 2023 1:19 PM
To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
Richard Earnshaw ; Richard Sandiford

Cc: Christophe Lyon 
Subject: [PATCH 06/16] arm: add smax/smin expanders for v*hf

This patch adds the missing expanders for smax/smin for v*hf modes.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/vec-common.md (smin3): New.
(smax3): New.
---
  gcc/config/arm/vec-common.md | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
common.md
index b5fc86fdf28..1f9b7992da4 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -116,6 +116,13 @@ (define_expand "smin3"
 "ARM_HAVE__ARITH"
  )

+(define_expand "smin3"
+  [(set (match_operand:VH 0 "s_register_operand")
+   (smin:VH (match_operand:VH 1 "s_register_operand")
+(match_operand:VH 2 "s_register_operand")))]
+   "ARM_HAVE__ARITH"
+)
+
  (define_expand "umin3"
[(set (match_operand:VINTW 0 "s_register_operand")
(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
@@ -130,6 +137,13 @@ (define_expand "smax3"
 "ARM_HAVE__ARITH"
  )

+(define_expand "smax3"
+  [(set (match_operand:VH 0 "s_register_operand")
+   (smax:VH (match_operand:VH 1 "s_register_operand")
+(match_operand:VH 2 "s_register_operand")))]
+   "ARM_HAVE__ARITH"
+)


We already have expanders for smin and smax, can we just extend their mode 
iterators to include the VH modes?
The ARM_HAVE__ARITH checks should still gate them properly and we could 
avoid adding more bloat in this file.


I opted for the most localized changes, to avoid breaking Neon since 
there are already so many similar iterators ;-)


It seems I can just use the existing VDQWH, which seems to be VALLW (as 
already used by smax) plus V8HF/V4HF which is just what we want.


Also, ISTM that VALLW == VDQW, am I misreading?

Thanks,

Christophe



Thanks,
Kyrill


+
  (define_expand "umax3"
[(set (match_operand:VINTW 0 "s_register_operand")
(umax:VINTW (match_operand:VINTW 1 "s_register_operand")
--
2.34.1

Re: [PATCH] c++: Reject attributes without arguments used as pack expansion [PR109756]

2023-05-09 Thread Jason Merrill via Gcc-patches


On 5/9/23 04:12, Jakub Jelinek wrote:

Hi!

The following testcase shows we silently accept (and ignore) attributes without
arguments used as pack expansions.  This is because we call
make_pack_expansion and that starts with
   if (!arg || arg == error_mark_node)
 return arg;
Now, an attribute without arguments like [[noreturn...]] is IMHO always
invalid, in this case for 2 reasons; one is that as it has no arguments,
no pack can be present and second is that the standard says that
attributes need to specially permit uses of parameter pack and doesn't
explicitly permit it for any of the standard attributes (except for alignas?
which has different syntax).
If an attribute has some arguments but doesn't contain packs in those
arguments, make_pack_expansion will already diagnose it.

My first version of the patch just added the last hunk in parser.cc and
without the && !has_args, but that unfortunately regresses
gen-attrs-65.C - TREE_VALUE (attribute) == NULL_TREE can mean a recognized
or unrecognized attribute which doesn't have any arguments, but can also
mean an unrecognized attribute which had arguments but which we didn't
parse because we didn't know how to parse those arguments.


How about changing cp_parser_std_attribute to set TREE_VALUE to 
error_mark_node if it skips arguments?



So, the patch remembers from cp_parser_std_attribute whether an attribute
had or didn't have arguments and diagnoses it just in the latter case.
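
To make the distinction concrete, here is a minimal sketch of why an empty
attribute value alone is ambiguous and how a separate flag resolves it.  It is
a toy model only: ToyAttr, parse_attr and pack_expansion_is_error are
hypothetical names, not GCC functions, and the "parser" just looks for an
opening parenthesis.

```cpp
#include <cassert>
#include <string>

// Toy attribute: has_value plays the role of a non-null TREE_VALUE.
struct ToyAttr {
  std::string name;
  bool has_value;
};

// Hypothetical toy parser accepting "name" or "name(...)".
// `known` says whether we know how to parse this attribute's arguments.
// `has_args` is set to true iff an argument list was written at all.
inline ToyAttr parse_attr(const std::string& text, bool known, bool& has_args) {
  has_args = false;
  const std::string::size_type paren = text.find('(');
  ToyAttr attr{text.substr(0, paren), false};
  if (paren == std::string::npos)
    return attr;            // no argument list: value stays empty
  has_args = true;
  attr.has_value = known;   // unknown attribute: arguments skipped, value stays empty
  return attr;
}

// "attr..." is diagnosed only when the value is empty AND no arguments
// were written; skipped arguments of an unknown attribute are left alone.
inline bool pack_expansion_is_error(const ToyAttr& attr, bool has_args) {
  return !attr.has_value && !has_args;
}
```

With this model, "noreturn" followed by "..." is rejected, while an unknown
attribute whose arguments were skipped also has an empty value but is not
diagnosed by this check.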

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
I guess this isn't appropriate for backports as accepts-invalid.

2023-05-09  Jakub Jelinek  

PR c++/109756
* parser.cc (cp_parser_std_attribute): Add HAS_ARGS argument,
set it to true iff the attribute has arguments specified.
(cp_parser_std_attribute_list): Adjust caller.  If ... is used
after attribute without arguments, diagnose it and return
error_mark_node.

* g++.dg/cpp0x/gen-attrs-78.C: New test.

--- gcc/cp/parser.cc.jj 2023-04-25 16:40:42.010723809 +0200
+++ gcc/cp/parser.cc2023-05-08 12:47:58.767924380 +0200
@@ -2634,7 +2634,7 @@ static tree cp_parser_gnu_attributes_opt
  static tree cp_parser_gnu_attribute_list
(cp_parser *, bool = false);
  static tree cp_parser_std_attribute
-  (cp_parser *, tree);
+  (cp_parser *, tree, bool &);
  static tree cp_parser_std_attribute_spec
(cp_parser *);
  static tree cp_parser_std_attribute_spec_seq
@@ -29313,7 +29313,7 @@ cp_parser_omp_sequence_args (cp_parser *
{ balanced-token-seq }.  */
  
  static tree

-cp_parser_std_attribute (cp_parser *parser, tree attr_ns)
+cp_parser_std_attribute (cp_parser *parser, tree attr_ns, bool &has_args)
  {
tree attribute, attr_id = NULL_TREE, arguments;
cp_token *token;
@@ -29323,6 +29323,7 @@ cp_parser_std_attribute (cp_parser *pars
  
/* First, parse name of the attribute, a.k.a attribute-token.  */
  
+  has_args = false;

token = cp_lexer_peek_token (parser->lexer);
if (token->type == CPP_NAME)
  attr_id = token->u.value;
@@ -29419,6 +29420,7 @@ cp_parser_std_attribute (cp_parser *pars
return attribute;
  }
  
+  has_args = true;

{
  vec *vec;
  int attr_flag = normal_attr;
@@ -29544,7 +29546,8 @@ cp_parser_std_attribute_list (cp_parser
while (true)
  {
location_t loc = cp_lexer_peek_token (parser->lexer)->location;
-  attribute = cp_parser_std_attribute (parser, attr_ns);
+  bool has_args;
+  attribute = cp_parser_std_attribute (parser, attr_ns, has_args);
if (attribute == error_mark_node)
break;
if (attribute != NULL_TREE)
@@ -29562,6 +29565,12 @@ cp_parser_std_attribute_list (cp_parser
  if (attribute == NULL_TREE)
error_at (token->location,
  "expected attribute before %<...%>");
+ else if (TREE_VALUE (attribute) == NULL_TREE && !has_args)
+   {
+ error_at (token->location, "attribute with no arguments "
+"contains no parameter packs");
+ return error_mark_node;
+   }
  else
{
  tree pack = make_pack_expansion (TREE_VALUE (attribute));
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-78.C.jj2023-05-08 
12:33:13.387581760 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-78.C   2023-05-08 12:32:23.146301128 
+0200
@@ -0,0 +1,29 @@
+// PR c++/109756
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-attributes" }
+
+template 
+[[noreturn...]]// { dg-error "attribute 
with no arguments contains no parameter packs" }
+[[deprecated...]]  // { dg-error "attribute with no 
arguments contains no parameter packs" }
+[[nodiscard...]]   // { dg-error "attribute with no 
arguments contains no parameter packs" }
+int foo (int x)
+{
+  switch (x)
+{
+case 1:
+  [[likely...]];

Re: [PATCH] c++: error-recovery ICE with unstable satisfaction [PR109752]

2023-05-09 Thread Patrick Palka via Gcc-patches

On Tue, 9 May 2023, Patrick Palka wrote:

> After diagnosing and recovering from unstable satisfaction, it's
> possible to evaluate an atom for the first time noisily rather than
> quietly.  The satisfaction cache tries to handle this situation
> gracefully, but apparently not gracefully enough: we inserted an empty
> slot in the cache, and left it empty, which later makes
> hash_table::check_complete_insertion unhappy.  This patch fixes this by
> removing the empty slot in this case.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/109752
> 
> gcc/cp/ChangeLog:
> 
>   * constraint.cc (satisfaction_cache::satisfaction_cache): In the
>   unexpected case of evaluating an atom for the first time noisily,
>   clear the cache slot that we inserted.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-pr109752.C: New test.
> ---
>  gcc/cp/constraint.cc  | 14 +++---
>  .../g++.dg/cpp2a/concepts-pr109752.C  | 26 +++
>  2 files changed, 37 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 675299aa4cd..bdaa82c8741 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -2687,9 +2687,17 @@ satisfaction_cache
>*slot = entry;
>  }
>else
> -/* We shouldn't get here, but if we do, let's just leave 'entry'
> -   empty, effectively disabling the cache.  */
> -return;
> +{
> +  /* We're evaluating this atom for the first time, and doing so noisily.
> +  This shouldn't happen outside of error recovery situations involving
> +  unstable satisfaction.  Let's just leave 'entry' empty, effectively
> +  disabling the cache, and remove the empty slot.  */
> +  gcc_checking_assert (seen_error ());
> +  /* To appease hash_table::check_complete_insertion.  */
> +  *slot = ggc_alloc ();
> +  sat_cache->clear_slot (slot);
> +  return;

Whoops, this 'return;' is rather unnecessary, so consider it removed.

> +}
>  }
>  
>  /* Returns the cached satisfaction result if we have one and we're not
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
> new file mode 100644
> index 000..d54ce295e50
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
> @@ -0,0 +1,26 @@
> +// PR c++/109752
> +// { dg-do compile { target c++20 } }
> +
> +template 
> +  inline constexpr bool is_constructible_v = __is_constructible(_Tp, 
> _Args...);
> +template
> +  concept __weakly_eq_cmp_with
> + = requires(_Tp __t, _Up __u) {{ __u != __t } ; // { dg-error "changed 
> from" }
> + };
> +  template
> +concept regular =  is_constructible_v<_Tp>  && __weakly_eq_cmp_with<_Tp, 
> _Tp>;
> +  template concept incrementable = true
> +&& regular<_Iter>
> +&& requires(_Iter __i) { { __i++ } ;}
> +;
> +template
> +struct iterator_interface
> +{
> +  friend constexpr bool operator>=(D lhs, D rhs) requires 
> __weakly_eq_cmp_with { return true; }
> +};
> +template
> +struct iterator : iterator_interface>
> +{
> +bool operator==(iterator) const;
> +};
> +static_assert(incrementable>); // { dg-error "assert" }
> -- 
> 2.40.1.476.g69c786637d
> 
>

[PATCH] c++: noexcept-spec from nested class confusion [PR109761]

2023-05-09 Thread Patrick Palka via Gcc-patches

When processing a noexcept-spec from a nested class after completion of
the outer class (since a noexcept-spec is a complete-class context), we
pass to noexcept_override_late_checks the outer class type instead of
the nested class type, which leads to bogus errors in the below test.

This patch fixes this by making noexcept_override_late_checks obtain the
class context directly via DECL_CONTEXT instead of via an additional
parameter.
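
The shape of the fix can be sketched outside the compiler.  The following is a
toy model, not GCC code: ToyClass, ToyDecl and both helper functions are
hypothetical, with ToyDecl::context standing in for DECL_CONTEXT.  It shows
how a separately passed class can disagree with the declaration it is paired
with, and how deriving the class from the declaration removes that
possibility.

```cpp
#include <cassert>
#include <string>

struct ToyClass {
  std::string name;
};

// A member function declaration that records its enclosing class.
struct ToyDecl {
  std::string name;
  const ToyClass* context;
};

// Old shape: the caller supplies the class separately, so it can pass
// the wrong one (e.g. the outer class for a nested class member).
inline const ToyClass& class_for_check_two_args(const ToyClass& type,
                                                const ToyDecl&) {
  return type;
}

// New shape: the class is read off the declaration itself.
inline const ToyClass& class_for_check(const ToyDecl& fndecl) {
  return *fndecl.context;
}
```

In the toy, passing the outer class alongside a member of the nested class
makes the two-argument form check against the wrong class, which is the
analogue of the bogus "looser exception specification" errors.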

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk and release branches?

PR c++/109761

gcc/cp/ChangeLog:

* parser.cc (cp_parser_class_specifier): Adjust call to
noexcept_override_late_checks.
(noexcept_override_late_checks): Remove 'type' parameter
and use DECL_CONTEXT of 'fndecl' instead.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept78.C: New test.
---
 gcc/cp/parser.cc| 13 ++---
 gcc/testsuite/g++.dg/cpp0x/noexcept78.C | 16 
 2 files changed, 22 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept78.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d89553e7da8..7f4bc0f468e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -251,7 +251,7 @@ static cp_token_cache *cp_token_cache_new
 static tree cp_parser_late_noexcept_specifier
   (cp_parser *, tree);
 static void noexcept_override_late_checks
-  (tree, tree);
+  (tree);
 
 static void cp_parser_initial_pragma
   (cp_token *);
@@ -26458,7 +26458,7 @@ cp_parser_class_specifier (cp_parser* parser)
  /* The finish_struct call above performed various override checking,
 but it skipped unparsed noexcept-specifier operands.  Now that we
 have resolved them, check again.  */
- noexcept_override_late_checks (type, decl);
+ noexcept_override_late_checks (decl);
 
  /* Remove any member-function parameters from the symbol table.  */
  pop_injected_parms ();
@@ -28240,14 +28240,13 @@ cp_parser_late_noexcept_specifier (cp_parser *parser, 
tree default_arg)
 }
 
 /* Perform late checking of overriding function with respect to their
-   noexcept-specifiers.  TYPE is the class and FNDECL is the function
-   that potentially overrides some virtual function with the same
-   signature.  */
+   noexcept-specifiers.  FNDECL is the member function that potentially
+   overrides some virtual function with the same signature.  */
 
 static void
-noexcept_override_late_checks (tree type, tree fndecl)
+noexcept_override_late_checks (tree fndecl)
 {
-  tree binfo = TYPE_BINFO (type);
+  tree binfo = TYPE_BINFO (DECL_CONTEXT (fndecl));
   tree base_binfo;
 
   if (DECL_STATIC_FUNCTION_P (fndecl))
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept78.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept78.C
new file mode 100644
index 000..e8156eb7c6f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept78.C
@@ -0,0 +1,16 @@
+// PR c++/109761
+// { dg-do compile { target c++11 } }
+
+struct base {
+  virtual void foo() noexcept { }
+  virtual ~base() { }
+};
+
+struct outer : base {
+  struct nested {
+void foo() noexcept(noexcept(g())); // { dg-bogus "looser" }
+~nested() noexcept(noexcept(g()));  // { dg-bogus "looser" }
+  };
+  static void g();
+};
+
-- 
2.40.1.476.g69c786637d

[PATCH] c++: error-recovery ICE with unstable satisfaction [PR109752]

2023-05-09 Thread Patrick Palka via Gcc-patches

After diagnosing and recovering from unstable satisfaction, it's
possible to evaluate an atom for the first time noisily rather than
quietly.  The satisfaction cache tries to handle this situation
gracefully, but apparently not gracefully enough: we inserted an empty
slot in the cache, and left it empty, which later makes
hash_table::check_complete_insertion unhappy.  This patch fixes this by
removing the empty slot in this case.
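
As an illustration only, the failure mode can be modelled with a toy table
that hands out a slot for insertion and later checks that the slot was not
left empty.  ToyTable and the two bail_out_* helpers are hypothetical and far
simpler than GCC's hash_table; in particular the real patch first stores a
placeholder entry and then calls clear_slot, whereas the toy only resets its
bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyTable {
  std::vector<const int*> slots;     // nullptr means "empty slot"
  const int** reserved = nullptr;    // last slot handed out for insertion

  explicit ToyTable(std::size_t n) : slots(n, nullptr) {}

  // Hand out a slot the caller is expected to fill in.
  const int** reserve(std::size_t i) {
    reserved = &slots[i];
    return reserved;
  }

  // Analogue of the completeness check: a reserved slot must not stay empty.
  bool insertion_complete() const {
    return reserved == nullptr || *reserved != nullptr;
  }

  // Analogue of clearing the slot: give it back and forget the reservation.
  void clear_slot(const int** slot) {
    *slot = nullptr;
    if (reserved == slot)
      reserved = nullptr;
  }
};

// Old behaviour: reserve a slot, then return early without filling it.
inline bool bail_out_leaving_empty_slot(ToyTable& t) {
  t.reserve(0);
  return t.insertion_complete();
}

// Patched behaviour: release the slot before returning early.
inline bool bail_out_clearing_slot(ToyTable& t) {
  const int** slot = t.reserve(0);
  t.clear_slot(slot);
  return t.insertion_complete();
}
```

The first helper leaves the table in a state that the completeness check
rejects; the second does not.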

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/109752

gcc/cp/ChangeLog:

* constraint.cc (satisfaction_cache::satisfaction_cache): In the
unexpected case of evaluating an atom for the first time noisily,
clear the cache slot that we inserted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr109752.C: New test.
---
 gcc/cp/constraint.cc  | 14 +++---
 .../g++.dg/cpp2a/concepts-pr109752.C  | 26 +++
 2 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 675299aa4cd..bdaa82c8741 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2687,9 +2687,17 @@ satisfaction_cache
   *slot = entry;
 }
   else
-/* We shouldn't get here, but if we do, let's just leave 'entry'
-   empty, effectively disabling the cache.  */
-return;
+{
+  /* We're evaluating this atom for the first time, and doing so noisily.
+This shouldn't happen outside of error recovery situations involving
+unstable satisfaction.  Let's just leave 'entry' empty, effectively
+disabling the cache, and remove the empty slot.  */
+  gcc_checking_assert (seen_error ());
+  /* To appease hash_table::check_complete_insertion.  */
+  *slot = ggc_alloc ();
+  sat_cache->clear_slot (slot);
+  return;
+}
 }
 
 /* Returns the cached satisfaction result if we have one and we're not
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
new file mode 100644
index 000..d54ce295e50
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr109752.C
@@ -0,0 +1,26 @@
+// PR c++/109752
+// { dg-do compile { target c++20 } }
+
+template 
+  inline constexpr bool is_constructible_v = __is_constructible(_Tp, _Args...);
+template
+  concept __weakly_eq_cmp_with
+ = requires(_Tp __t, _Up __u) {{ __u != __t } ; // { dg-error "changed 
from" }
+ };
+  template
+concept regular =  is_constructible_v<_Tp>  && __weakly_eq_cmp_with<_Tp, 
_Tp>;
+  template concept incrementable = true
+&& regular<_Iter>
+&& requires(_Iter __i) { { __i++ } ;}
+;
+template
+struct iterator_interface
+{
+  friend constexpr bool operator>=(D lhs, D rhs) requires 
__weakly_eq_cmp_with { return true; }
+};
+template
+struct iterator : iterator_interface>
+{
+bool operator==(iterator) const;
+};
+static_assert(incrementable>); // { dg-error "assert" }
-- 
2.40.1.476.g69c786637d

[PATCH] rtl: AArch64: New RTL for ABD

2023-05-09 Thread Oluwatamilore Adebayo via Gcc-patches

From afa416dab831795f7e1114da2fb9e94ea3b8c519 Mon Sep 17 00:00:00 2001
From: oluade01 
Date: Fri, 14 Apr 2023 15:10:07 +0100
Subject: [PATCH 2/4] AArch64: New RTL for ABD

This patch adds new RTL and tests for sabd and uabd
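
For readers unfamiliar with why the pattern is written as max minus min rather
than ABS of a subtraction, here is a small scalar sketch of the 8-bit example
from the aarch64-simd.md comment (64 and -128).  The function names are
hypothetical; the code only models the arithmetic and says nothing about how
the instructions are selected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// ABS applied to the 8-bit (wrapping) difference a - b.
inline std::uint8_t abs_of_wrapped_diff(std::int8_t a, std::int8_t b) {
  // Wrap the difference to 8 bits, then reinterpret it as signed.
  const std::uint8_t wrapped = static_cast<std::uint8_t>(a - b);
  const int as_signed = wrapped >= 128 ? static_cast<int>(wrapped) - 256
                                       : static_cast<int>(wrapped);
  return static_cast<std::uint8_t>(as_signed < 0 ? -as_signed : as_signed);
}

// Absolute difference computed as max - min in a wider type, then
// truncated to 8 bits, which matches the MINUS (max, min) formulation.
inline std::uint8_t abd(std::int8_t a, std::int8_t b) {
  const int hi = std::max<int>(a, b);
  const int lo = std::min<int>(a, b);
  return static_cast<std::uint8_t>(hi - lo);
}
```

For 64 and -128 the first function yields 64 while the second yields 192
(-64 when read as signed), which is the discrepancy the comment describes; for
inputs whose difference fits in 8 bits the two agree.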

PR tree-optimization/109156

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (sabd, uabd):
Change the mode to 3.
* config/aarch64/aarch64-simd.md (aarch64_abd):
Rename to abd3.
* config/aarch64/aarch64-sve.md (abd_3): Rename
to abd3.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/abd.h: New file.
* gcc.target/aarch64/abd_2.c: New test.
* gcc.target/aarch64/abd_3.c: New test.
* gcc.target/aarch64/abd_4.c: New test.
* gcc.target/aarch64/abd_run_1.c: New test.
* gcc.target/aarch64/sve/abd_1.c: New test.
* gcc.target/aarch64/sve/abd_2.c: New test.
---
 gcc/config/aarch64/aarch64-simd-builtins.def |  6 +-
 gcc/config/aarch64/aarch64-simd.md   |  4 +-
 gcc/config/aarch64/aarch64-sve.md|  4 +-
 gcc/testsuite/gcc.target/aarch64/abd.h   | 62 +
 gcc/testsuite/gcc.target/aarch64/abd_2.c | 34 +++
 gcc/testsuite/gcc.target/aarch64/abd_3.c | 34 +++
 gcc/testsuite/gcc.target/aarch64/abd_4.c | 33 +++
 gcc/testsuite/gcc.target/aarch64/abd_run_1.c | 93 
 gcc/testsuite/gcc.target/aarch64/sve/abd_1.c | 34 +++
 gcc/testsuite/gcc.target/aarch64/sve/abd_2.c | 33 +++
 10 files changed, 330 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_run_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/abd_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/abd_2.c

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 
1beaa08c1e7c94bc13a64865ddb677345534699c..3efbf0a1874f6242e69665b8316d9a7d62a9c8cf
 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -194,9 +194,9 @@
   BUILTIN_VDQV_L (UNOP, saddlv, 0, NONE)
   BUILTIN_VDQV_L (UNOPU, uaddlv, 0, NONE)
 
-  /* Implemented by aarch64_abd.  */
-  BUILTIN_VDQ_BHSI (BINOP, sabd, 0, NONE)
-  BUILTIN_VDQ_BHSI (BINOPU, uabd, 0, NONE)
+  /* Implemented by abd3.  */
+  BUILTIN_VDQ_BHSI (BINOP, sabd, 3, NONE)
+  BUILTIN_VDQ_BHSI (BINOPU, uabd, 3, NONE)
 
   /* Implemented by aarch64_aba.  */
   BUILTIN_VDQ_BHSI (TERNOP, saba, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
cb2223d29c2d97d6d396b4eca166463369819ca6..f52c148a80589a48befb71135e90aa02a2b253e7
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -915,7 +915,7 @@ (define_insn "aarch64_abs"
 ;; So (ABS:QI (minus:QI 64 -128)) == (ABS:QI (192 or -64 signed)) == 64.
 ;; Whereas SABD would return 192 (-64 signed) on the above example.
 ;; Use MINUS ([us]max (op1, op2), [us]min (op1, op2)) instead.
-(define_insn "aarch64_abd"
+(define_insn "abd3"
   [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
(minus:VDQ_BHSI
  (USMAX:VDQ_BHSI
@@ -1080,7 +1080,7 @@ (define_expand "sadv16qi"
   {
rtx ones = force_reg (V16QImode, CONST1_RTX (V16QImode));
rtx abd = gen_reg_rtx (V16QImode);
-   emit_insn (gen_aarch64_abdv16qi (abd, operands[1], operands[2]));
+   emit_insn (gen_abdv16qi3 (abd, operands[1], operands[2]));
emit_insn (gen_udot_prodv16qi (operands[0], abd, ones, operands[3]));
DONE;
   }
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
4b4c02c90fec6ce1ff15a8b2a5df348224a307b7..5966a33a3cc471f8c2e875b9e3a6a8a8ddc6af17
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4001,7 +4001,7 @@ (define_insn_and_rewrite "*aarch64_adr_shift_uxtw"
 ;; -
 
 ;; Unpredicated integer absolute difference.
-(define_expand "abd_3"
+(define_expand "abd3"
   [(use (match_operand:SVE_I 0 "register_operand"))
(USMAX:SVE_I
  (match_operand:SVE_I 1 "register_operand")
@@ -6973,7 +6973,7 @@ (define_expand "sad"
   {
 rtx ones = force_reg (mode, CONST1_RTX (mode));
 rtx diff = gen_reg_rtx (mode);
-emit_insn (gen_abd_3 (diff, operands[1], operands[2]));
+emit_insn (gen_abd3 (diff, operands[1], operands[2]));
 emit_insn (gen_udot_prod (operands[0], diff, ones, operands[3]));
 DONE;
   }
diff --git a/gcc/testsuite/gcc.target/aarch64/abd.h 
b/gcc/testsuite/gcc.target/aarch64/abd.h
new file mode 100644
index 
..bc38e8508056cf2623cddd6053bf1cec3fa4ece4
---

[PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-09 Thread Oluwatamilore Adebayo via Gcc-patches

From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
From: oluade01 
Date: Fri, 14 Apr 2023 10:24:43 +0100
Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD

This adds a recognition pattern for the non-widening
absolute difference (ABD).
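
As a rough illustration, the kind of source loop this idiom comes from looks
like the sketch below.  The function name is hypothetical, and whether a given
target actually emits an ABD instruction for it depends on that target
providing the corresponding optab and on the checks in the pattern recognizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Scalar loop containing the non-widening abs (x[i] - y[i]) idiom:
// inputs and result all have the same element width.
inline void abs_diff_loop(const int* x, const int* y, int* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::abs(x[i] - y[i]);
}
```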

gcc/ChangeLog:

* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs.
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
---
 gcc/doc/md.texi   |  10 ++
 gcc/internal-fn.def   |   3 +
 gcc/optabs.def|   2 +
 gcc/tree-vect-patterns.cc | 250 +-
 4 files changed, 234 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{uabd@var{m}} instruction pattern
+@cindex @code{sabd@var{m}} instruction pattern
+@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
+Signed and unsigned absolute difference instructions.  These
+instructions find the difference between operands 1 and 2
+then return the absolute value.  A C code equivalent would be:
+@smallexample
+op0 = abs (op1 - op2)
+@end smallexample
+
 @cindex @code{avg@var{m}3_floor} instruction pattern
 @cindex @code{uavg@var{m}3_floor} instruction pattern
 @item @samp{avg@var{m}3_floor}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
+ sabd, uabd, binary)
+
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
  savg_floor, uavg_floor, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
"mask_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
+OPTAB_D (uabd_optab, "uabd$a3")
+OPTAB_D (sabd_optab, "sabd$a3")
 OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a49b09539776c0056e77f99b10365d0a8747fbc5..91e1f9d4b610275dd833ec56dc77f76367ee7886 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -770,6 +770,89 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 }
 }
 
+/* Look for the following pattern
+	X = x[i]
+	Y = y[i]
+	DIFF = X - Y
+	DAD = ABS_EXPR <DIFF>
+ */
+static bool
+vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
+                                tree *half_type, bool reject_unsigned,
+                                vect_unpromoted_value unprom[2],
+                                tree diff_oprnds[2])
+{
+  if (!abs_stmt)
+    return false;
+
+  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
+     inside the loop (in case we are analyzing an outer-loop).  */
+  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
+  if (code != ABS_EXPR && code != ABSU_EXPR)
+    return false;
+
+  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
+  if (!abs_oprnd)
+    return false;
+  tree abs_type = TREE_TYPE (abs_oprnd);
+  if (reject_unsigned && TYPE_UNSIGNED (abs_type))
+    return false;
+  if (!ANY_INTEGRAL_TYPE_P (abs_type) || TYPE_OVERFLOW_WRAPS (abs_type))
+    return false;
+
+  /* Peel off conversions from the ABS input.  This can involve sign
+     changes (e.g.  from an unsigned subtraction to a signed ABS input)
+     or signed promotion, but it can't include unsigned promotion.
+     (Note that ABS of an unsigned promotion should have been folded
+     away before now anyway.)  */
+  vect_unpromoted_value unprom_diff;
+  abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd,
+

Re: [gcc13 backport] RISCV: Inline subword atomic ops

2023-05-09 Thread Patrick O'Neill


Ping.

On 5/3/23 10:19, Patrick O'Neill wrote:


RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
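
For readers unfamiliar with the subword masking mentioned above, here is a
rough C sketch of the general idea: a byte-sized fetch-and-add built from a
word-sized compare-and-swap loop. It is only an illustration, not the code
from the patch (which lives in the machine description). The helper name
subword_fetch_add_u8 is hypothetical, and the sketch assumes a little-endian
target and the GCC/Clang __atomic builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically add VAL to the byte at PTR using only
   word-sized atomics, and return the byte's previous value.
   Assumption: little-endian byte order within the containing word.  */
static uint8_t
subword_fetch_add_u8 (uint8_t *ptr, uint8_t val)
{
  assert (ptr != 0);

  /* Locate the aligned 32-bit word that contains the byte.  */
  uintptr_t addr = (uintptr_t) ptr;
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);

  /* Bit position of the byte inside that word, and a mask selecting it.  */
  unsigned shift = (unsigned) (addr & 3) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  uint32_t desired;
  do
    {
      /* Compute the new byte, then splice it into the unchanged
         neighbouring bytes of the word.  */
      uint32_t sum = ((old >> shift) + val) & 0xff;
      desired = (old & ~mask) | (sum << shift);
    }
  while (!__atomic_compare_exchange_n (word, &old, desired, 1,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));

  return (uint8_t) (old >> shift);
}
```

The point of the mask is that the three neighbouring bytes are written back
unchanged, so a concurrent update to any of them makes the compare-and-swap
fail and the loop retry.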

2023-05-03 Patrick O'Neill 

gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flags
-minline-atomics and -mno-inline-atomics.
* config/riscv/sync.md: Add masking logic and inline asm for
fetch_and_op, fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding new command-line flags
-minline-atomics and -mno-inline-atomics.

libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.

gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.

Signed-off-by: Patrick O'Neill 
Signed-off-by: Palmer Dabbelt 
---
This backport includes all the subsequent fixes.
Squashed GCC-14 commits:
   
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f797260adaf52bee0ec0e16190bbefbe1bfc3692
   
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2a26872984c109a98d0ad733b0c68c3e1648ec86

Fixed ChangeLog:
   
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=9cfdd5af3efd4a7e52ae7f97f55effc436c0cf45
---
The GCC-14 build error[1] was caused by the subsequent A.6 mappings
change[2], not the inline atomics change (this backport).
[1] 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=4bd434fbfc7865961a8e10d7e9601b28765ce7be
[2] 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bff7c77386447936dd614ebc7086b826c99c6642
---
There is a related follow-on GCC-14 patch to config/riscv/linux:
   
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=203f3060dd363361b172f7295f42bb6bf5ac0b3b

The changes from that patch are not included in this backport.
---

[Patch, fortran] PR103716 - [10/11/12/13/14 Regression] ICE in gimplify_expr, at gimplify.c:15964

2023-05-09 Thread Paul Richard Thomas via Gcc-patches

Hi All,

The gimplifier failure occurred because a reference chain ending in an
inquiry_len still retained a full array reference. This had already been
corrected for deferred character lengths; the fix extends the correction to
all characters without a length expression and to integer expressions (the
correct type, of course) that retain a full array_spec. Nullifying
se->string_length in conv_inquiry is a belt-and-braces measure to stop it
from ending up as a hidden argument in procedure calls.

OK for trunk and, after a decent delay, backporting?

Cheers

Paul

Fortran: Fix assumed length chars and len inquiry [PR103716]

2023-05-09  Paul Thomas  

gcc/fortran
PR fortran/103716
* resolve.cc (gfc_resolve_ref): Conversion of array_ref into an
element should be done for all characters without a len expr,
not just deferred lens, and for integer expressions.
* trans-expr.cc (conv_inquiry): For len and kind inquiry refs,
set the se string_length to NULL_TREE.

gcc/testsuite/
PR fortran/103716
* gfortran.dg/pr103716 : New test.
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 55d8e326a87..8f0dd8b6dee 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5504,7 +5504,9 @@ gfc_resolve_ref (gfc_expr *expr)
 	case REF_INQUIRY:
 	  /* Implement requirement in note 9.7 of F2018 that the result of the
 	 LEN inquiry be a scalar.  */
-	  if (ref->u.i == INQUIRY_LEN && array_ref && expr->ts.deferred)
+	  if (ref->u.i == INQUIRY_LEN && array_ref
+	  && ((expr->ts.type == BT_CHARACTER && !expr->ts.u.cl->length)
+		  || expr->ts.type == BT_INTEGER))
 	{
 	  array_ref->u.ar.type = AR_ELEMENT;
 	  expr->rank = 0;
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 09cdd9263c4..3225b419989 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2861,11 +2861,13 @@ conv_inquiry (gfc_se * se, gfc_ref * ref, gfc_expr *expr, gfc_typespec *ts)
 case INQUIRY_KIND:
   res = build_int_cst (gfc_typenode_for_spec (&expr->ts),
 			   ts->kind);
+  se->string_length = NULL_TREE;
   break;
 
 case INQUIRY_LEN:
   res = fold_convert (gfc_typenode_for_spec (&expr->ts),
 			  se->string_length);
+  se->string_length = NULL_TREE;
   break;
 
 default:

[PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-09 Thread Oluwatamilore Adebayo via Gcc-patches

From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
From: oluade01 
Date: Fri, 14 Apr 2023 10:24:43 +0100
Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD

This adds a recognition pattern for the non-widening
absolute difference (ABD).
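
As a scalar reference for what "non-widening absolute difference" means,
here is a small C sketch. The function names sabd_s32, uabd_u32 and abd_loop
are hypothetical and are not part of the patch; the signed variant widens
internally only so that the sketch itself avoids signed overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed absolute difference.  The subtraction is done in 64 bits so it
   cannot overflow; the result always fits in 32 unsigned bits.  */
static inline uint32_t
sabd_s32 (int32_t a, int32_t b)
{
  int64_t d = (int64_t) a - (int64_t) b;
  return (uint32_t) (d < 0 ? -d : d);
}

/* Unsigned absolute difference: subtract the smaller from the larger.  */
static inline uint32_t
uabd_u32 (uint32_t a, uint32_t b)
{
  return a > b ? a - b : b - a;
}

/* The shape of loop that an ABD recognition pattern is aimed at:
   one absolute difference per element, with no widening of the result.  */
void
abd_loop (uint32_t *out, const int32_t *x, const int32_t *y, size_t n)
{
  assert (out != NULL && x != NULL && y != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = sabd_s32 (x[i], y[i]);
}
```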

gcc/ChangeLog:

* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs.
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
---
 gcc/doc/md.texi   |  10 ++
 gcc/internal-fn.def   |   3 +
 gcc/optabs.def|   2 +
 gcc/tree-vect-patterns.cc | 250 +-
 4 files changed, 234 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{uabd@var{m}} instruction pattern
+@cindex @code{sabd@var{m}} instruction pattern
+@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
+Signed and unsigned absolute difference instructions.  These
+instructions find the difference between operands 1 and 2
+then return the absolute value.  A C code equivalent would be:
+@smallexample
+op0 = abs (op1 - op2)
+@end smallexample
+
 @cindex @code{avg@var{m}3_floor} instruction pattern
 @cindex @code{uavg@var{m}3_floor} instruction pattern
 @item @samp{avg@var{m}3_floor}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
+ sabd, uabd, binary)
+
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
  savg_floor, uavg_floor, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
+OPTAB_D (uabd_optab, "uabd$a3")
+OPTAB_D (sabd_optab, "sabd$a3")
 OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a49b09539776c0056e77f99b10365d0a8747fbc5..91e1f9d4b610275dd833ec56dc77f76367ee7886 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -770,6 +770,89 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 }
 }
 
+/* Look for the following pattern
+	X = x[i]
+	Y = y[i]
+	DIFF = X - Y
+	DAD = ABS_EXPR <DIFF>
+ */
+static bool
+vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
+                                tree *half_type, bool reject_unsigned,
+                                vect_unpromoted_value unprom[2],
+                                tree diff_oprnds[2])
+{
+  if (!abs_stmt)
+    return false;
+
+  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
+     inside the loop (in case we are analyzing an outer-loop).  */
+  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
+  if (code != ABS_EXPR && code != ABSU_EXPR)
+    return false;
+
+  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
+  if (!abs_oprnd)
+    return false;
+  tree abs_type = TREE_TYPE (abs_oprnd);
+  if (reject_unsigned && TYPE_UNSIGNED (abs_type))
+    return false;
+  if (!ANY_INTEGRAL_TYPE_P (abs_type) || TYPE_OVERFLOW_WRAPS (abs_type))
+    return false;
+
+  /* Peel off conversions from the ABS input.  This can involve sign
+     changes (e.g.  from an unsigned subtraction to a signed ABS input)
+     or signed promotion, but it can't include unsigned promotion.
+     (Note that ABS of an unsigned promotion should have been folded
+     away before now anyway.)  */
+  vect_unpromoted_value unprom_diff;
+  abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd,
+

[Patch, fortran] PR97122 - Spurious FINAL ... must be in the specification part of a MODULE

2023-05-09 Thread Paul Richard Thomas via Gcc-patches

Hi All,

Thanks to Steve Kargl for the fix. It caused finalize_8.f03 to fail because
this testcase checked that finalizable derived types could not be specified
in a submodule. I have replaced the original test with a test of the patch.
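
To make the changed check easier to follow, here is a tiny stand-alone C
model of the condition in the patched gfc_match_final_decl. The types and
the function final_decl_allowed are hypothetical stand-ins for gfortran's
parser state stack, and only the states relevant here are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parser states involved.  */
enum comp_state { COMP_MODULE, COMP_SUBMODULE, COMP_SUBROUTINE };

struct state_frame
{
  enum comp_state state;
};

/* ENCLOSING models gfc_state_stack->previous->previous, i.e. the scoping
   unit that contains the derived type definition.  Mirrors the patched
   condition: an error is raised only when an enclosing frame exists and
   is neither a module nor a submodule.  */
static bool
final_decl_allowed (const struct state_frame *enclosing)
{
  bool error = enclosing != NULL
	       && enclosing->state != COMP_MODULE
	       && enclosing->state != COMP_SUBMODULE;
  return !error;
}

/* Small self-check showing the intended outcomes.  */
void
final_decl_demo (void)
{
  struct state_frame mod = { COMP_MODULE };
  struct state_frame submod = { COMP_SUBMODULE };
  struct state_frame sub = { COMP_SUBROUTINE };

  assert (final_decl_allowed (&mod));     /* Allowed before and after.  */
  assert (final_decl_allowed (&submod));  /* Newly allowed by the patch.  */
  assert (!final_decl_allowed (&sub));    /* Still rejected.  */
}
```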

Thanks also to Malcolm Cohen for guidance on this.

OK for trunk?

Paul

Fortran: Allow declaration of finalizable DT in a submodule [PR97122]

2023-05-09  Paul Thomas  
   Steven G. Kargl  

gcc/fortran
PR fortran/97122
* decl.cc (variable_decl): Clean up white space issues.
(gfc_match_final_decl): Declaration of finalizable derived type
is allowed in a submodule.

gcc/testsuite/
PR fortran/97122
* gfortran.dg/finalize_8.f03 : Replace testcase that checks
declaration of finalizable derived types in submodules works.
diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index 233bf244d62..6d6ce0854de 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -2698,7 +2698,7 @@ variable_decl (int elem)
 	}
 
   gfc_seen_div0 = false;
-  
+
   /* F2018:C830 (R816) An explicit-shape-spec whose bounds are not
 	 constant expressions shall appear only in a subprogram, derived
 	 type definition, BLOCK construct, or interface body.  */
@@ -2769,7 +2769,7 @@ variable_decl (int elem)
 	  if (e->expr_type != EXPR_CONSTANT)
 		{
 		  n = gfc_copy_expr (e);
-		  if (!gfc_simplify_expr (n, 1)  && gfc_seen_div0) 
+		  if (!gfc_simplify_expr (n, 1)  && gfc_seen_div0)
 		{
 		  m = MATCH_ERROR;
 		  goto cleanup;
@@ -2784,12 +2784,12 @@ variable_decl (int elem)
 	  if (e->expr_type != EXPR_CONSTANT)
 		{
 		  n = gfc_copy_expr (e);
-		  if (!gfc_simplify_expr (n, 1)  && gfc_seen_div0) 
+		  if (!gfc_simplify_expr (n, 1)  && gfc_seen_div0)
 		{
 		  m = MATCH_ERROR;
 		  goto cleanup;
 		}
-		  
+
 		  if (n->expr_type == EXPR_CONSTANT)
 		gfc_replace_expr (e, n);
 		  else
@@ -11637,8 +11637,9 @@ gfc_match_final_decl (void)
   block = gfc_state_stack->previous->sym;
   gcc_assert (block);
 
-  if (!gfc_state_stack->previous || !gfc_state_stack->previous->previous
-  || gfc_state_stack->previous->previous->state != COMP_MODULE)
+  if (gfc_state_stack->previous->previous
+  && gfc_state_stack->previous->previous->state != COMP_MODULE
+  && gfc_state_stack->previous->previous->state != COMP_SUBMODULE)
 {
   gfc_error ("Derived type declaration with FINAL at %C must be in the"
 		 " specification part of a MODULE");
diff --git a/gcc/testsuite/gfortran.dg/finalize_8.f03 b/gcc/testsuite/gfortran.dg/finalize_8.f03
index b2027a0ba6d..b7fa10dda31 100644
--- a/gcc/testsuite/gfortran.dg/finalize_8.f03
+++ b/gcc/testsuite/gfortran.dg/finalize_8.f03
@@ -1,35 +1,49 @@
-! { dg-do compile }
-
-! Parsing of finalizer procedure definitions.
-! Check that FINAL-declarations are only allowed on types defined in the
-! specification part of a module.
-
-MODULE final_type
+! { dg-do run }
+!
+! PR97122: Declaration of a finalizable derived type in a submodule
+! IS allowed.
+!
+! Contributed by Ian Harvey  
+!
+MODULE m
   IMPLICIT NONE
 
-CONTAINS
+  INTERFACE
+MODULE SUBROUTINE other(i)
+  IMPLICIT NONE
+  integer, intent(inout) :: i
+END SUBROUTINE other
+  END INTERFACE
 
-  SUBROUTINE bar
-IMPLICIT NONE
+  integer :: mi
 
-TYPE :: mytype
-  INTEGER, ALLOCATABLE :: fooarr(:)
-  REAL :: foobar
-CONTAINS
-  FINAL :: myfinal ! { dg-error "in the specification part of a MODULE" }
-END TYPE mytype
-
-  CONTAINS
+END MODULE m
 
-SUBROUTINE myfinal (el)
-  TYPE(mytype) :: el
-END SUBROUTINE myfinal
+SUBMODULE (m) s
+  IMPLICIT NONE
 
-  END SUBROUTINE bar
+  TYPE :: t
+integer :: i
+  CONTAINS
+FINAL :: final_t  ! Used to be an error here
+  END TYPE t
 
-END MODULE final_type
+CONTAINS
 
-PROGRAM finalizer
-  IMPLICIT NONE
-  ! Do nothing here
-END PROGRAM finalizer
+  SUBROUTINE final_t(arg)
+TYPE(t), INTENT(INOUT) :: arg
+mi = -arg%i
+  END SUBROUTINE final_t
+
+  module subroutine other(i)  ! 'ti' is finalized
+integer, intent(inout) :: i
+type(t) :: ti
+ti%i = i
+  END subroutine other
+END SUBMODULE s
+
+  use m
+  integer :: i = 42
+  call other(i)
+  if (mi .ne. -i) stop 1
+end

Re: Re: [PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread 钟居哲

No, I don't think so. I added -fno-vect-cost-model to some testcases
because we don't yet have enough patterns to enable some auto-vectorizations.
The option forces auto-vectorization in those cases for testing.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-09 22:36
To: juzhe.zhong
CC: gcc-patches@gcc.gnu.org; pal...@dabbelt.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of 
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
One more question from me: should we just add  -fno-vect-cost-model to
AUTOVEC_TEST_OPTS?
 
On Tue, May 9, 2023 at 10:29 PM Kito Cheng  wrote:
>
> Oh, I checked default_builtin_support_vector_misalignment and realized
> we can just remove riscv_support_vector_misalignment entirely...
>
>
> On Tue, May 9, 2023 at 10:18 PM juzhe.zhong  wrote:
> >
> > The riscv_support_vector_misalignment update makes some of the testcase
> > checks fail. I have checked those failures, and they are reasonable, so I
> > include the test case adaptations in this patch.
> > Replied Message
> > From: Kito Cheng
> > Date: 05/09/2023 21:54
> > To: juzhe.zh...@rivai.ai
> > Cc: gcc-patc...@gcc.gnu.org,
> > pal...@dabbelt.com,
> > jeffreya...@gmail.com,
> > rdapp@gmail.com
> > Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of
> > TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> > I am OK with both changes, but I tried to build some test cases, and it
> > seems the changes are caused by the options update, not by the
> > riscv_support_vector_misalignment update? So I would like to see the
> > testcase changes split out into a separate patch.
> >
> > > +/* Return true if the vector misalignment factor is supported by the
> > > +   target.  */
> > >  bool
> > >  riscv_support_vector_misalignment (machine_mode mode,
> > >const_tree type ATTRIBUTE_UNUSED,
> > >int misalignment,
> > >bool is_packed ATTRIBUTE_UNUSED)
> > >  {
> > > -  if (TARGET_VECTOR)
> > > -{
> > > -  if (STRICT_ALIGNMENT)
> > > -   {
> > > - /* Return if movmisalign pattern is not supported for this 
> > > mode.  */
> > > - if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
> > > -   return false;
> > > -
> > > - /* Misalignment factor is unknown at compile time.  */
> > > - if (misalignment == -1)
> > > -   return false;
> > > -   }
> > > -  return true;
> > > -}
> > > +  /* TODO: For RVV scalable vector auto-vectorization, we should allow
> > > + movmisalign pattern to handle misalign data movement to 
> > > unblock
> > > + possible auto-vectorization.
> > >
> > > + RVV VLS auto-vectorization or SIMD auto-vectorization can be 
> > > supported here
> > > + in the future.  */
> > >return default_builtin_support_vector_misalignment (mode, type, 
> > > misalignment,
> > >   is_packed);
> > >  }
> >
> > Should we have some corresponding change on autovec.md like this?
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index f1c5ff5951bf..c2873201d82e 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -51,7 +51,7 @@
> > (define_expand "movmisalign"
> >  [(set (match_operand:V 0 "nonimmediate_operand")
> >   (match_operand:V 1 "general_operand"))]
> > -  "TARGET_VECTOR"
> > +  "TARGET_VECTOR && !STRICT_ALIGNMENT"
> >  {
> >/* Equivalent to a normal move for our purpooses.  */
> >emit_move_insn (operands[0], operands[1]);

Re: Testsuite: Add missing 'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage (was: Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS

2023-05-09 Thread Christophe Lyon via Gcc-patches

Hi!

On Tue, 9 May 2023 at 11:00, Thomas Schwinge 
wrote:

> Hi Christophe!
>
> On 2023-05-09T09:32:55+0200, Christophe Lyon 
> wrote:
> > On Wed, 3 May 2023 at 13:47, Richard Biener via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> >> On Wed, 3 May 2023, Thomas Schwinge wrote:
> >> > "Let each 'lto_init' determine the default 'LTO_OPTIONS', and
> 'torture-init' the 'LTO_TORTURE_OPTIONS'"?
> >
> > This is causing issues on arm/aarch64, including:
> >
> > ERROR: can't read "LTO_TORTURE_OPTIONS": no such variable
> > in gcc.target/arm/acle/acle.exp:
> >
> > ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
> > in gcc.target/aarch64/sls-mitigation/sls-mitigation.exp,
> > gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp,
> > gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp,
> > gcc.target/aarch64/torture/aarch64-torture.exp
> >
> > and maybe others
> >
> > Are other targets affected too?
>
> Sorry for that -- it means, the safe-guards I added are working as
> expected.
>
> Please test whether all these issues are gone with the attached
> "Testsuite: Add missing 'torture-init'/'torture-finish' around
> 'LTO_TORTURE_OPTIONS' usage"?
>
>
Your patch seemed reasonable,  but it doesn't work :-(

Well now I get:
ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
because gcc-dg-runtest itself calls torture-init

but I'm not sure where LTO_TORTURE_OPTIONS is set

Christophe


> Grüße
>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
> 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
> Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
> Registergericht München, HRB 106955
>

Re: [PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread Kito Cheng via Gcc-patches

One more question from me: should we just add  -fno-vect-cost-model to
AUTOVEC_TEST_OPTS?

On Tue, May 9, 2023 at 10:29 PM Kito Cheng  wrote:
>
> Oh, I checked default_builtin_support_vector_misalignment and realized
> we can just remove riscv_support_vector_misalignment entirely...
>
>
> On Tue, May 9, 2023 at 10:18 PM juzhe.zhong  wrote:
> >
> > The riscv_support_vector_misalignment update makes some of the testcase
> > checks fail. I have checked those failures, and they are reasonable, so I
> > include the test case adaptations in this patch.
> > Replied Message
> > From: Kito Cheng
> > Date: 05/09/2023 21:54
> > To: juzhe.zh...@rivai.ai
> > Cc: gcc-patc...@gcc.gnu.org,
> > pal...@dabbelt.com,
> > jeffreya...@gmail.com,
> > rdapp@gmail.com
> > Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of
> > TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> > I am OK with both changes, but I tried to build some test cases, and it
> > seems the changes are caused by the options update, not by the
> > riscv_support_vector_misalignment update? So I would like to see the
> > testcase changes split out into a separate patch.
> >
> > > +/* Return true if the vector misalignment factor is supported by the
> > > +   target.  */
> > >  bool
> > >  riscv_support_vector_misalignment (machine_mode mode,
> > >const_tree type ATTRIBUTE_UNUSED,
> > >int misalignment,
> > >bool is_packed ATTRIBUTE_UNUSED)
> > >  {
> > > -  if (TARGET_VECTOR)
> > > -{
> > > -  if (STRICT_ALIGNMENT)
> > > -   {
> > > - /* Return if movmisalign pattern is not supported for this 
> > > mode.  */
> > > - if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
> > > -   return false;
> > > -
> > > - /* Misalignment factor is unknown at compile time.  */
> > > - if (misalignment == -1)
> > > -   return false;
> > > -   }
> > > -  return true;
> > > -}
> > > +  /* TODO: For RVV scalable vector auto-vectorization, we should allow
> > > + movmisalign pattern to handle misalign data movement to 
> > > unblock
> > > + possible auto-vectorization.
> > >
> > > + RVV VLS auto-vectorization or SIMD auto-vectorization can be 
> > > supported here
> > > + in the future.  */
> > >return default_builtin_support_vector_misalignment (mode, type, 
> > > misalignment,
> > >   is_packed);
> > >  }
> >
> > Should we have some corresponding change on autovec.md like this?
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index f1c5ff5951bf..c2873201d82e 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -51,7 +51,7 @@
> > (define_expand "movmisalign"
> >  [(set (match_operand:V 0 "nonimmediate_operand")
> >   (match_operand:V 1 "general_operand"))]
> > -  "TARGET_VECTOR"
> > +  "TARGET_VECTOR && !STRICT_ALIGNMENT"
> >  {
> >/* Equivalent to a normal move for our purpooses.  */
> >emit_move_insn (operands[0], operands[1]);

Re: Re: [PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread 钟居哲

Yes, we can remove it, but I am keeping it here with a TODO comment, since
we may want to support it for VLS modes. ARM SVE, for example, also has
Advanced SIMD modes (128-bit VLS modes):
/* Return true if the vector misalignment factor is supported by the
   target.  */
static bool
aarch64_builtin_support_vector_misalignment (machine_mode mode,
                                             const_tree type, int misalignment,
                                             bool is_packed)
{
  if (TARGET_SIMD && STRICT_ALIGNMENT)
    {
      /* Return if movmisalign pattern is not supported for this mode.  */
      if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
        return false;

      /* Misalignment factor is unknown at compile time.  */
      if (misalignment == -1)
        return false;
    }
  return default_builtin_support_vector_misalignment (mode, type, misalignment,
                                                      is_packed);
}

This is the AArch64 implementation; TARGET_SIMD is for Advanced SIMD.
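
To compare the behaviours being discussed, here is a stand-alone C model of
the decision logic. It is only a sketch: the target macros and the optab
query are replaced by plain boolean parameters, the function names are
hypothetical, and default_style reflects my understanding of the generic
hook (supported exactly when a movmisalign pattern exists), which is an
assumption rather than something shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed model of the generic fallback: misaligned vector access is
   supported exactly when the mode has a movmisalign pattern.  */
static bool
default_style (bool has_movmisalign)
{
  return has_movmisalign;
}

/* Model of the AArch64-style hook quoted above.  TARGET_SIMD,
   STRICT_ALIGNMENT and the optab lookup become parameters.  */
static bool
aarch64_style (bool target_simd, bool strict_alignment,
	       bool has_movmisalign, int misalignment)
{
  if (target_simd && strict_alignment)
    {
      /* No movmisalign pattern for this mode.  */
      if (!has_movmisalign)
	return false;

      /* Misalignment unknown at compile time.  */
      if (misalignment == -1)
	return false;
    }
  return default_style (has_movmisalign);
}

/* Small self-check showing where the two models differ.  */
void
misalign_demo (void)
{
  /* Strict alignment with unknown misalignment: only the extra checks
     reject it.  */
  assert (default_style (true));
  assert (!aarch64_style (true, true, true, -1));

  /* Without strict alignment the two models agree.  */
  assert (aarch64_style (true, false, true, -1) == default_style (true));
}
```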

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-05-09 22:29
To: juzhe.zhong
CC: gcc-patches@gcc.gnu.org; pal...@dabbelt.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of 
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
Oh, I checked default_builtin_support_vector_misalignment and realized
we can just remove riscv_support_vector_misalignment entirely...

On Tue, May 9, 2023 at 10:18 PM juzhe.zhong  wrote:
>
> The riscv_support_vector_misalignment update makes some of the testcase
> checks fail. I have checked those failures, and they are reasonable, so I
> include the test case adaptations in this patch.
> Replied Message
> From: Kito Cheng
> Date: 05/09/2023 21:54
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patc...@gcc.gnu.org,
> pal...@dabbelt.com,
> jeffreya...@gmail.com,
> rdapp@gmail.com
> Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of
> TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> I am OK with both changes, but I tried to build some test cases, and it
> seems the changes are caused by the options update, not by the
> riscv_support_vector_misalignment update? So I would like to see the
> testcase changes split out into a separate patch.
>
> > +/* Return true if the vector misalignment factor is supported by the
> > +   target.  */
> >  bool
> >  riscv_support_vector_misalignment (machine_mode mode,
> >const_tree type ATTRIBUTE_UNUSED,
> >int misalignment,
> >bool is_packed ATTRIBUTE_UNUSED)
> >  {
> > -  if (TARGET_VECTOR)
> > -{
> > -  if (STRICT_ALIGNMENT)
> > -   {
> > - /* Return if movmisalign pattern is not supported for this mode.  
> > */
> > - if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
> > -   return false;
> > -
> > - /* Misalignment factor is unknown at compile time.  */
> > - if (misalignment == -1)
> > -   return false;
> > -   }
> > -  return true;
> > -}
> > +  /* TODO: For RVV scalable vector auto-vectorization, we should allow
> > + movmisalign pattern to handle misalign data movement to unblock
> > + possible auto-vectorization.
> >
> > + RVV VLS auto-vectorization or SIMD auto-vectorization can be 
> > supported here
> > + in the future.  */
> >return default_builtin_support_vector_misalignment (mode, type, 
> > misalignment,
> >   is_packed);
> >  }
>
> Should we have some corresponding change on autovec.md like this?
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index f1c5ff5951bf..c2873201d82e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -51,7 +51,7 @@
> (define_expand "movmisalign"
>  [(set (match_operand:V 0 "nonimmediate_operand")
>   (match_operand:V 1 "general_operand"))]
> -  "TARGET_VECTOR"
> +  "TARGET_VECTOR && !STRICT_ALIGNMENT"
>  {
>/* Equivalent to a normal move for our purpooses.  */
>emit_move_insn (operands[0], operands[1]);

Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-09 Thread 钟居哲

More details for Case 2:
+   _72 = MIN_EXPR ;
+   _75 = MIN_EXPR ;
+   ...
+   .LEN_STORE (vectp_f.8_51, 128B, _75, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
+   vectp_f.8_56 = vectp_f.8_51 + 16;
+   .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
+   ...
+   _61 = _75 / 2;
+   .LEN_STORE (vectp_d.10_59, 128B, _61, { 3, 3, 3, 3 }, 0);
+   vectp_d.10_63 = vectp_d.10_59 + 16;
+   _64 = _72 / 2;
+   .LEN_STORE (vectp_d.10_63, 128B, _64, { 3, 3, 3, 3 }, 0);
You may be confused by _61 = _75 / 2; and _64 = _72 / 2;.
This is similar to the VIEW_CONVERT_EXPR of the mask in ARM SVE.
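
As a quick illustration of that division, here is a minimal C sketch. The
helper name reuse_loop_len is hypothetical; it assumes the length was
computed for a vector type with nunits1 elements and is being reused for a
type with nunits2 elements, where nunits1 is a multiple of nunits2.

```c
#include <assert.h>

/* Hypothetical helper: convert a length counted in narrow elements
   (NUNITS1 per vector) into a length counted in wider elements
   (NUNITS2 per vector).  Each wide element covers FACTOR narrow ones.  */
static unsigned
reuse_loop_len (unsigned len, unsigned nunits1, unsigned nunits2)
{
  assert (nunits2 != 0 && nunits1 % nunits2 == 0);
  unsigned factor = nunits1 / nunits2;
  return len / factor;
}
```

In the dump above the first store writes 8 elements per vector and the
second writes 4, so the factor is 2, which is where the "/ 2" comes from.
The result is only exact when the length is a multiple of the factor.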

Like ARM SVE:
tree
vect_get_loop_mask (gimple_stmt_iterator *gsi, vec_loop_masks *masks,
unsigned int nvectors, tree vectype, unsigned int index)
{
  rgroup_controls *rgm = &(*masks)[nvectors - 1];
  tree mask_type = rgm->type;

  /* Populate the rgroup's mask array, if this is the first time we've
 used it.  */
  if (rgm->controls.is_empty ())
{
  rgm->controls.safe_grow_cleared (nvectors, true);
  for (unsigned int i = 0; i < nvectors; ++i)
{
  tree mask = make_temp_ssa_name (mask_type, NULL, "loop_mask");
  /* Provide a dummy definition until the real one is available.  */
  SSA_NAME_DEF_STMT (mask) = gimple_build_nop ();
  rgm->controls[i] = mask;
}
}

  tree mask = rgm->controls[index];
  if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
TYPE_VECTOR_SUBPARTS (vectype)))
{
  /* A loop mask for data type X can be reused for data type Y
 if X has N times more elements than Y and if Y's elements
 are N times bigger than X's.  In this case each sequence
 of N elements in the loop mask will be all-zero or all-one.
 We can then view-convert the mask so that each sequence of
 N elements is replaced by a single element.  */
  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (mask_type),
  TYPE_VECTOR_SUBPARTS (vectype)));
  gimple_seq seq = NULL;
  mask_type = truth_type_for (vectype);
  mask = gimple_build (&seq, VIEW_CONVERT_EXPR, mask_type, mask);
  if (seq)
gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
}
  return mask;
}


I am doing a similar thing for RVV:
+vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
+  vec_loop_lens *lens, unsigned int nvectors, tree vectype,
+  unsigned int index)
 {
   rgroup_controls *rgl = &(*lens)[nvectors - 1];
   bool use_bias_adjusted_len =
 LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
+  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
 
   /* Populate the rgroup's len array, if this is the first time we've
  used it.  */
@@ -10400,6 +10403,27 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
vec_loop_lens *lens,
 
   if (use_bias_adjusted_len)
 return rgl->bias_adjusted_ctrl;
+  else if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
+  OPTIMIZE_FOR_SPEED))
+{
+  tree loop_len = rgl->controls[index];
+  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
+  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
+  if (maybe_ne (nunits1, nunits2))
+   {
+ /* A loop len for data type X can be reused for data type Y
+if X has N times more elements than Y and if Y's elements
+are N times bigger than X's.  */
+ gcc_assert (multiple_p (nunits1, nunits2));
+ unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
+ gimple_seq seq = NULL;
+ loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
+  build_int_cst (iv_type, factor));
+ if (seq)
+   gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+   }
+  return loop_len;
+}
   else
 return rgl->controls[index];
 }

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-05-09 21:27
To: richard.sandiford
CC: gcc-patches; rguenther
Subject: Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by 
variable amount support
Thanks, Richard.

>> Could you go into more details about this?  I imagined that for case 3,
>> there would be a single SELECT_VL that decides how many scalar iterations
>> will be handled by the current vector iteration, then we would "expand"
>> the result (using MIN_EXPRs) to the multi-control cases.
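
One possible reading of that suggestion, sketched in C: a single per-iteration
element count is split across several controls with repeated MIN operations.
This is only an illustration of the idea in the quoted question, not the
scheme used in the patch; expand_lens and its parameters are hypothetical.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: N is the number of scalar elements chosen for the
   current vector iteration (e.g. by something like SELECT_VL).  Split it
   across NCTRL controls that each cover at most NUNITS elements.  */
static void
expand_lens (unsigned n, unsigned nunits, unsigned nctrl, unsigned *lens)
{
  assert (nunits > 0);
  for (unsigned i = 0; i < nctrl; i++)
    {
      /* Each control takes as many of the remaining elements as fit.  */
      lens[i] = MIN (n, nunits);
      n -= lens[i];
    }
}
```

With two controls of 8 elements each, a count of 11 would yield lengths 8
and 3, and a count of 5 would yield 5 and 0.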

For case 2, here is the example:
+   2. Multiple rgroup, the Gimple IR should be:
+
+   # i_23 = PHI 
+   # vectp_f.8_51 = PHI 
+   # vectp_d.10_59 = PHI 
+   # ivtmp_70 = PHI 
+   # ivtmp_73 = PHI 
+   _72 = MIN_EXPR ;
+   _75 = MIN_EXPR ;
+   _1 = i_23 * 2;
+   _2 = (long unsigned int) _1;
+   _3 = _2 * 2;
+   _4 = f_15(D) + _3;
+   _5 = _2 + 1;
+   _6 = _5 * 2;
+   _7 = f_15(D) + _6;
+   .LEN_STORE (vectp_f.8_51,

Re: [PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread Kito Cheng via Gcc-patches

Oh, I checked default_builtin_support_vector_misalignment and realized
we can just remove riscv_support_vector_misalignment entirely...


On Tue, May 9, 2023 at 10:18 PM juzhe.zhong  wrote:
>
> The riscv_support_vector_misalignment update makes some of the testcase 
> checks fail. I have checked those failures, and they are reasonable. So I 
> include the test case adaptation in this patch.
>  Replied Message 
> From: Kito Cheng
> Date: 05/09/2023 21:54
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patc...@gcc.gnu.org,
>     pal...@dabbelt.com,
>     jeffreya...@gmail.com,
>     rdapp@gmail.com
> Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of 
> TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> I am ok with both changes but I tried to build some test cases, and it
> seems the changes are caused by options update, not caused by the
> riscv_support_vector_misalignment update?  So I would like to see the
> testcase changes split out into a separate patch.
>
> > +/* Return true if the vector misalignment factor is supported by the
> > +   target.  */
> >  bool
> >  riscv_support_vector_misalignment (machine_mode mode,
> >const_tree type ATTRIBUTE_UNUSED,
> >int misalignment,
> >bool is_packed ATTRIBUTE_UNUSED)
> >  {
> > -  if (TARGET_VECTOR)
> > -{
> > -  if (STRICT_ALIGNMENT)
> > -   {
> > - /* Return if movmisalign pattern is not supported for this mode.  
> > */
> > - if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
> > -   return false;
> > -
> > - /* Misalignment factor is unknown at compile time.  */
> > - if (misalignment == -1)
> > -   return false;
> > -   }
> > -  return true;
> > -}
> > +  /* TODO: For RVV scalable vector auto-vectorization, we should allow
> > + movmisalign pattern to handle misalign data movement to unblock
> > + possible auto-vectorization.
> >
> > + RVV VLS auto-vectorization or SIMD auto-vectorization can be 
> > supported here
> > + in the future.  */
> >return default_builtin_support_vector_misalignment (mode, type, 
> > misalignment,
> >   is_packed);
> >  }
>
> Should we have some corresponding change on autovec.md like this?
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index f1c5ff5951bf..c2873201d82e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -51,7 +51,7 @@
> (define_expand "movmisalign"
>  [(set (match_operand:V 0 "nonimmediate_operand")
>   (match_operand:V 1 "general_operand"))]
> -  "TARGET_VECTOR"
> +  "TARGET_VECTOR && !STRICT_ALIGNMENT"
>  {
>/* Equivalent to a normal move for our purpooses.  */
>emit_move_insn (operands[0], operands[1]);

Re: [PATCH] c++: Reject pack expansion of assume attribute [PR109756]

2023-05-09 Thread Jason Merrill via Gcc-patches


On 5/9/23 04:22, Jakub Jelinek wrote:

Hi!

http://eel.is/c++draft/dcl.attr#grammar-4 says
"In an attribute-list, an ellipsis may appear only if that attribute's
specification permits it."
and doesn't explicitly permit it on any standard attribute.
The https://wg21.link/p1774r8 paper which introduced assume attribute says
"We could therefore hypothetically permit the assume attribute to directly
support pack expansion:
template 
void f() {
[[assume(args >= 0)...]];
}
However, we do not propose this. It would require substantial additional work


I question "substantial"; it could easily be treated as a 
fold-expression internally.  But the patch is OK.



for a very rare use case. Note that this can instead be expressed with a fold
expression, which is equivalent to the above and works out of the box without
any extra effort:
template 
void f() {
[[assume(((args >= 0) && ...))]];
}
", but as the testcase shows, GCC 13+ ICEs on assume attribute followed by
... if it contains packs.
The following patch rejects those instead of ICE and for C++17 or later
suggests using fold expressions instead (it doesn't make sense to suggest
it for C++14 and earlier when we'd error on the fold expressions).
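
As a small illustration of the fold-expression form suggested by the new diagnostic, here is a sketch for C++17 or later. `checked_sum` is a hypothetical helper, and a plain `assert` stands in for `[[assume(...)]]`, since support for that attribute varies between compilers.

```cpp
#include <cassert>

// Hypothetical helper: sums a pack of non-type template arguments.
// The unary right fold ((args >= 0) && ...) expands to
// (a0 >= 0) && (a1 >= 0) && ...; for an empty pack it yields true.
template <int... args>
int checked_sum()
{
    // Stand-in for [[assume (((args >= 0) && ...))]].
    assert(((args >= 0) && ...));

    // Binary fold with an initial value, so an empty pack sums to 0.
    return (args + ... + 0);
}
```

The whole condition is a single expression, so nothing has to be pack-expanded at the attribute level, which is the construct the patch rejects.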

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/13.2?

2023-05-09  Jakub Jelinek  

PR c++/109756
* cp-gimplify.cc (process_stmt_assume_attribute): Diagnose pack
expansion of assume attribute.

* g++.dg/cpp23/attr-assume11.C: New test.

--- gcc/cp/cp-gimplify.cc.jj2023-05-04 12:13:50.791647593 +0200
+++ gcc/cp/cp-gimplify.cc   2023-05-08 13:20:33.191070530 +0200
@@ -3267,6 +3267,16 @@ process_stmt_assume_attribute (tree std_
for (; attr; attr = lookup_attribute ("gnu", "assume", TREE_CHAIN (attr)))
  {
tree args = TREE_VALUE (attr);
+  if (args && PACK_EXPANSION_P (args))
+   {
+ auto_diagnostic_group d;
+ error_at (attrs_loc, "pack expansion of %qE attribute",
+   get_attribute_name (attr));
+ if (cxx_dialect >= cxx17)
+   inform (attrs_loc, "use fold expression in the attribute "
+  "argument instead");
+ continue;
+   }
int nargs = list_length (args);
if (nargs != 1)
{
--- gcc/testsuite/g++.dg/cpp23/attr-assume11.C.jj   2023-05-08 
13:19:07.812290213 +0200
+++ gcc/testsuite/g++.dg/cpp23/attr-assume11.C  2023-05-08 13:22:07.746719746 
+0200
@@ -0,0 +1,22 @@
+// PR c++/109756
+// { dg-do compile { target c++11 } }
+
+template 
+void
+foo ()
+{
+  [[assume (1 > 0)...]];// { dg-error "expansion pattern '\\\(1 > 
0\\\)' contains no parameter packs" }
+   // { dg-warning "attributes at the beginning of 
statement are ignored" "" { target *-*-* } .-1 }
+  [[assume (args > 0)...]]; // { dg-error "pack expansion of 'assume' 
attribute" }
+   // { dg-message "use fold expression in the attribute 
argument instead" "" { target c++17 } .-1 }
+#if __cpp_fold_expressions >= 201603L
+  [[assume (((args > 0) && ...))]];
+#endif
+  [[gnu::assume (1 > 0)...]];   // { dg-error "expansion pattern '\\\(1 
> 0\\\)' contains no parameter packs" }
+   // { dg-warning "attributes at the beginning of 
statement are ignored" "" { target *-*-* } .-1 }
+  [[gnu::assume (args > 0)...]];// { dg-error "pack expansion of 'assume' 
attribute" }
+   // { dg-message "use fold expression in the attribute 
argument instead" "" { target c++17 } .-1  }
+#if __cpp_fold_expressions >= 201603L
+  [[gnu::assume (((args > 0) && ...))]];
+#endif
+}

Jakub

Re: [PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread Kito Cheng via Gcc-patches

I am ok with both changes but I tried to build some test cases, and it
seems the changes are caused by options update, not caused by the
riscv_support_vector_misalignment update?  So I would like to see the
testcase changes split out into a separate patch.

> +/* Return true if the vector misalignment factor is supported by the
> +   target.  */
>  bool
>  riscv_support_vector_misalignment (machine_mode mode,
>const_tree type ATTRIBUTE_UNUSED,
>int misalignment,
>bool is_packed ATTRIBUTE_UNUSED)
>  {
> -  if (TARGET_VECTOR)
> -{
> -  if (STRICT_ALIGNMENT)
> -   {
> - /* Return if movmisalign pattern is not supported for this mode.  */
> - if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
> -   return false;
> -
> - /* Misalignment factor is unknown at compile time.  */
> - if (misalignment == -1)
> -   return false;
> -   }
> -  return true;
> -}
> +  /* TODO: For RVV scalable vector auto-vectorization, we should allow
> + movmisalign pattern to handle misalign data movement to unblock
> + possible auto-vectorization.
>
> + RVV VLS auto-vectorization or SIMD auto-vectorization can be supported 
> here
> + in the future.  */
>return default_builtin_support_vector_misalignment (mode, type, 
> misalignment,
>   is_packed);
>  }

Should we have some corresponding change on autovec.md like this?

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1c5ff5951bf..c2873201d82e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -51,7 +51,7 @@
(define_expand "movmisalign"
  [(set (match_operand:V 0 "nonimmediate_operand")
   (match_operand:V 1 "general_operand"))]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !STRICT_ALIGNMENT"
  {
/* Equivalent to a normal move for our purpooses.  */
emit_move_insn (operands[0], operands[1]);
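
To compare the removed logic with the plain fallback, the two behaviours can be written as a boolean model. This is a sketch under stated assumptions: `MisalignQuery` and both function names are hypothetical, the target macros and the optab query are reduced to flags, and `default_result` simply represents whatever the default hook would return.

```cpp
#include <cassert>

// Hypothetical inputs standing in for target state and hook arguments.
struct MisalignQuery {
    bool target_vector;     // stands in for TARGET_VECTOR
    bool strict_alignment;  // stands in for STRICT_ALIGNMENT
    bool has_movmisalign;   // optab_handler (...) != CODE_FOR_nothing
    int misalignment;       // -1 means unknown at compile time
};

// Old behaviour, following the '-' lines of the quoted diff.
bool old_support_vector_misalignment(const MisalignQuery &q, bool default_result)
{
    assert(q.misalignment >= -1);  // model precondition only
    if (q.target_vector) {
        if (q.strict_alignment) {
            if (!q.has_movmisalign)
                return false;  // no movmisalign pattern for this mode
            if (q.misalignment == -1)
                return false;  // misalignment unknown at compile time
        }
        return true;
    }
    return default_result;  // non-vector targets used the default hook
}

// New behaviour: always defer to the default hook.
bool new_support_vector_misalignment(const MisalignQuery &, bool default_result)
{
    return default_result;
}
```

In this model the two versions differ whenever TARGET_VECTOR is set: the old code answered on its own, while the new code leaves the decision to the default hook.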

RE: [PATCH 01/16] arm: [MVE intrinsics] add binary_maxvminv shape

2023-05-09 Thread Kyrylo Tkachov via Gcc-patches

Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, May 9, 2023 1:19 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 01/16] arm: [MVE intrinsics] add binary_maxvminv shape
> 
> This patch adds the binary_maxvminv shape description.

This patch series is fairly mechanical (that's not to say simple!) and in line 
with the other series in this area.
You obviously know what you're doing here so I'm comfortable approving it.
I did have a look at the patches individually and with the comment on patch 
06/16 addressed this series is ok for trunk.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_maxvminv): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_maxvminv): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 30 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 31 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 1d43b8871bf..19c3c47a20e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -401,6 +401,36 @@ struct binary_rshift_def : public
> overloaded_base<0>
>  };
>  SHAPE (binary_rshift)
> 
> +/* _t vfoo[_](_t, _t)
> +
> +   Example: vmaxvq.
> +   int8_t [__arm_]vmaxvq[_s8](int8_t a, int8x16_t b)
> +   int8_t [__arm_]vmaxvq_p[_s8](int8_t a, int8x16_t b, mve_pred16_t p)  */
> +struct binary_maxvminv_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none,
> preserve_user_namespace);
> +build_all (b, "s0,s0,v0", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (2, i, nargs)
> + || !r.require_derived_scalar_type (0, r.SAME_TYPE_CLASS)
> + || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
> +  return error_mark_node;
> +
> +return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (binary_maxvminv)
> +
>  /* _t vfoo[_t0](_t, _t)
> 
> Example: vmovnbq.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index dd2597dc6f5..9debf1d8733 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -37,6 +37,7 @@ namespace arm_mve
>  extern const function_shape *const binary;
>  extern const function_shape *const binary_lshift;
>  extern const function_shape *const binary_lshift_r;
> +extern const function_shape *const binary_maxvminv;
>  extern const function_shape *const binary_move_narrow;
>  extern const function_shape *const binary_move_narrow_unsigned;
>  extern const function_shape *const binary_opt_n;
> --
> 2.34.1
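
For readers unfamiliar with the shape, here is a scalar sketch of what a function with the `s0,s0,v0` signature computes in the vmaxvq case. It assumes the usual reading of vmaxvq (the result is the larger of the scalar argument and the largest vector lane); `vmaxvq_s8_model` is a hypothetical name, and a `std::array` stands in for `int8x16_t`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of int8_t vmaxvq_s8 (int8_t a, int8x16_t b).
std::int8_t vmaxvq_s8_model(std::int8_t a, const std::array<std::int8_t, 16> &b)
{
    std::int8_t result = a;  // the scalar operand seeds the reduction
    for (std::int8_t lane : b)
        result = std::max(result, lane);
    return result;
}
```

The scalar return type and first argument share the element type of the vector argument, which is what the resolver checks with `require_derived_scalar_type` and `infer_vector_type`.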

RE: [PATCH 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Kyrylo Tkachov via Gcc-patches




> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, May 9, 2023 1:19 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 06/16] arm: add smax/smin expanders for v*hf
> 
> This patch adds the missing expanders for smax/smin for v*hf modes.
> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/vec-common.md (smin3): New.
>   (smax3): New.
> ---
>  gcc/config/arm/vec-common.md | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> common.md
> index b5fc86fdf28..1f9b7992da4 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -116,6 +116,13 @@ (define_expand "smin3"
> "ARM_HAVE__ARITH"
>  )
> 
> +(define_expand "smin3"
> +  [(set (match_operand:VH 0 "s_register_operand")
> + (smin:VH (match_operand:VH 1 "s_register_operand")
> +  (match_operand:VH 2 "s_register_operand")))]
> +   "ARM_HAVE__ARITH"
> +)
> +
>  (define_expand "umin3"
>[(set (match_operand:VINTW 0 "s_register_operand")
>   (umin:VINTW (match_operand:VINTW 1 "s_register_operand")
> @@ -130,6 +137,13 @@ (define_expand "smax3"
> "ARM_HAVE__ARITH"
>  )
> 
> +(define_expand "smax3"
> +  [(set (match_operand:VH 0 "s_register_operand")
> + (smax:VH (match_operand:VH 1 "s_register_operand")
> +  (match_operand:VH 2 "s_register_operand")))]
> +   "ARM_HAVE__ARITH"
> +)

We already have expanders for smin and smax, can we just extend their mode 
iterators to include the VH modes?
The ARM_HAVE__ARITH checks should still gate them properly and we could 
avoid adding more bloat in this file.
Thanks,
Kyrill

> +
>  (define_expand "umax3"
>[(set (match_operand:VINTW 0 "s_register_operand")
>   (umax:VINTW (match_operand:VINTW 1 "s_register_operand")
> --
> 2.34.1

Re: [PATCH V5] Use reg mode to move sub blocks for parameters and returns

2023-05-09 Thread Jiufu Guo via Gcc-patches

Hi,

Jeff Law  writes:

> On 5/3/23 23:49, guojiufu wrote:
>> Hi,
>>
>> On 2023-05-01 03:00, Jeff Law wrote:
>>> On 3/16/23 21:39, Jiufu Guo wrote:
>>>> Hi,
>>>>
>>>> When assigning a parameter to a variable, or assigning a variable to
>>>> return value with struct type, and the parameter/return is passed
>>>> through registers.
>>>> For this kind of case, it would be better to use the natural mode of
>>>> the registers to move the content for the assignment.
>>>>
>>>> As the example code (like code in PR65421):
>>>>
>>>> typedef struct SA {double a[3];} A;
>>>> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
>>>> A ret_arg (A a) {return a;} // just empty fun body
>>>> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>>>>
>>>> Comparing with previous version:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html
>>>> This version refines the code to eliminate redundant code in the sub
>>>> routine "move_sub_blocks".
>>>>
>>>> Bootstrap and regtest pass on ppc64{,le}.
>>>> Is this ok for trunk?
>>>>
>> ...
>>
>>>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>>> index 15be1c8db99..97a7be9542e 100644
>>>> --- a/gcc/expr.cc
>>>> +++ b/gcc/expr.cc
>>>> @@ -5559,6 +5559,41 @@ mem_ref_refers_to_non_mem_p (tree ref)
>>>>     return non_mem_decl_p (base);
>>>>   }
>>>>   +/* Sub routine of expand_assignment, invoked when assigning from a
>>>> +   parameter or assigning to a return val on struct type which may
>>>> +   be passed through registers.  The mode of register is used to
>>>> +   move the content for the assignment.
>>>> +
>>>> +   This routine generates code for expression FROM which is BLKmode,
>>>> +   and move the generated content to TO_RTX by su-blocks in SUB_MODE.  */
>>>> +
>>>> +static void
>>>> +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode)
>>>> +{
>>>> +  gcc_assert (MEM_P (to_rtx));
>>>> +
>>>> +  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
>>> Consider the case of a BLKmode return value.  Isn't TO_RTX in this
>>> case a BLKmode object?
>>
>> Thanks for this question!
>>
>> Yes, the mode of TO_RTX is BLKmode.
>> As we know, when the function returns via registers, the mode of
>> the `return-rtx` could also be BLKmode.  This patch is going to
>> improve these kinds of cases.
>>
>> For example:
>> ```
>> typedef struct FLOATS
>> {
>>    double a[3];
>> } FLOATS;
>> FLOATS ret_arg_pt (FLOATS *a){return *a;}
>> ```
>>
>> D.3952 = *a_2(D); //this patch enhance this assignment
>> return D.3952;
>>
>> The mode is BLKmode both for the rtx of `D.3952` and for the rtx of
>> "DECL_RESULT(current_function_decl)".  And the DECL_RESULT
>> represents the return registers.
> I didn't think MEM_SIZE worked for BLKmode.  But looking at its
> definition, it's pulling the size out of the attributes rather than
> from the mode.  So I guess there's a reasonable chance it's going to
> work :-)

Thanks for pointing this out!  Yes, a BLKmode rtx may not always be a MEM.
MEM_SIZE is only OK for a MEM once its known size has been computed.
Here MEM_SIZE is fine because TO_RTX is a stack rtx corresponding to the
type of the parameter or return value, whose size has already been computed.

I updated the patch to resolve the conflicts with trunk, reran
bootstrap, and prepared a new version of the patch.

This version passes bootstrap and regtest on ppc64{,le} and x86_64.

The major change is that 'move_sub_blocks' now only handles the case where
the whole block can be moved in the same submode, that is, when
(size % sub_size) is 0.  If there are no objections, I will commit the new
version.
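
The divisibility condition can be illustrated with a byte-level sketch. This is not the RTL code from the patch: `move_by_sub_blocks` is a hypothetical function, `memcpy` of `sub_size` bytes stands in for a register-mode move, and the `false` return models falling back to the generic block move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model: copy `size` bytes in chunks of `sub_size` bytes, but
// only when the block divides evenly into such chunks.
bool move_by_sub_blocks(unsigned char *to, const unsigned char *from,
                        std::size_t size, std::size_t sub_size)
{
    assert(sub_size != 0);
    if (size % sub_size != 0)
        return false;  // caller must use the generic block move instead

    for (std::size_t off = 0; off < size; off += sub_size)
        std::memcpy(to + off, from + off, sub_size);  // one sub-block move
    return true;
}
```

For the `struct SA {double a[3];}` example, size is 24 and an 8-byte submode gives three moves, which matches the expectation of three lfd or stfd instructions.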

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
for returns.
* expr.cc (move_sub_blocks): New function.
(expand_assignment): Update assignment code about returns/parameters.
* function.cc (assign_parm_setup_block): Update to mark
DECL_REGS_TO_STACK_P for parameter.
* tree-core.h (struct tree_decl_common): Add comment.
* tree.h (DECL_USEDBY_RETURN_P): New define.
(DECL_REGS_TO_STACK_P): New define.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421-2.c: New test.
---
 gcc/cfgexpand.cc | 14 +
 gcc/expr.cc  | 62 
 gcc/function.cc  |  3 +
 gcc/tree-core.h  |  4 +-
 gcc/tree.h   |  9 +++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |  6 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 33 +++
 7 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 1a1b26b1c6c..7b6a2216492 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap

Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-09 Thread juzhe.zh...@rivai.ai

Thanks, Richard.

>> Could you go into more details about this?  I imagined that for case 3,
>> there would be a single SELECT_VL that decides how many scalar iterations
>> will be handled by the current vector iteration, then we would "expand"
>> the result (using MIN_EXPRs) to the multi-control cases.

For case 2, here is the example:
+   2. Multiple rgroup, the Gimple IR should be:
+
+   # i_23 = PHI 
+   # vectp_f.8_51 = PHI 
+   # vectp_d.10_59 = PHI 
+   # ivtmp_70 = PHI 
+   # ivtmp_73 = PHI 
+   _72 = MIN_EXPR ;
+   _75 = MIN_EXPR ;
+   _1 = i_23 * 2;
+   _2 = (long unsigned int) _1;
+   _3 = _2 * 2;
+   _4 = f_15(D) + _3;
+   _5 = _2 + 1;
+   _6 = _5 * 2;
+   _7 = f_15(D) + _6;
+   .LEN_STORE (vectp_f.8_51, 128B, _75, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
+   vectp_f.8_56 = vectp_f.8_51 + 16;
+   .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
+   _8 = (long unsigned int) i_23;
+   _9 = _8 * 4;
+   _10 = d_18(D) + _9;
+   _61 = _75 / 2;
+   .LEN_STORE (vectp_d.10_59, 128B, _61, { 3, 3, 3, 3 }, 0);
+   vectp_d.10_63 = vectp_d.10_59 + 16;
+   _64 = _72 / 2;
+   .LEN_STORE (vectp_d.10_63, 128B, _64, { 3, 3, 3, 3 }, 0);
+   i_20 = i_23 + 1;
+   vectp_f.8_52 = vectp_f.8_56 + 16;
+   vectp_d.10_60 = vectp_d.10_63 + 16;
+   ivtmp_74 = ivtmp_73 - _75;
+   ivtmp_71 = ivtmp_70 - _72;
+   if (ivtmp_74 != 0)
+ goto ; [83.33%]
+   else
+ goto ; [16.67%]
+
+   Note: We DO NOT use .SELECT_VL in SLP auto-vectorization for multiple
+   rgroups. Instead, we use MIN_EXPR to guarantee we always use VF as the
+   iteration amount for multiple rgroups.
+
+   The analysis of the flow of multiple rgroups:
+   _72 = MIN_EXPR ;
+   _75 = MIN_EXPR ;
+   ...
+   .LEN_STORE (vectp_f.8_51, 128B, _75, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
+   vectp_f.8_56 = vectp_f.8_51 + 16;
+   .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
+   ...
+   _61 = _75 / 2;
+   .LEN_STORE (vectp_d.10_59, 128B, _61, { 3, 3, 3, 3 }, 0);
+   vectp_d.10_63 = vectp_d.10_59 + 16;
+   _64 = _72 / 2;
+   .LEN_STORE (vectp_d.10_63, 128B, _64, { 3, 3, 3, 3 }, 0);

Here, if we used SELECT_VL instead of MIN_EXPR, then, because we define the
outcome of SELECT_VL to be any number in a non-final iteration, it seems
hard to adjust the address pointer IV (vectp_f.8_56 = vectp_f.8_51 + 16;)
and the next length (_61 = _75 / 2;).

For case 3:
+  3. Multiple rgroups for non-SLP auto-vectorization.
+
+ # ivtmp_26 = PHI 
+ # ivtmp.35_10 = PHI 
+ # ivtmp.36_2 = PHI 
+ _28 = MIN_EXPR ;
+ loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;
+ loop_len_16 = _28 - loop_len_15;
+ _29 = (void *) ivtmp.35_10;
+ _7 =   [(int *)_29];
+ vect__1.25_17 = .LEN_LOAD (_7, 128B, loop_len_15, 0);
+ _33 = _29 + POLY_INT_CST [16, 16];
+ _34 =   [(int *)_33];
+ vect__1.26_19 = .LEN_LOAD (_34, 128B, loop_len_16, 0);
+ vect__2.27_20 = VEC_PACK_TRUNC_EXPR ;
+ _30 = (void *) ivtmp.36_2;
+ _31 =   [(short int *)_30];
+ .LEN_STORE (_31, 128B, _28, vect__2.27_20, 0);
+ ivtmp_27 = ivtmp_26 - _28;
+ ivtmp.35_11 = ivtmp.35_10 + POLY_INT_CST [32, 32];
+ ivtmp.36_8 = ivtmp.36_2 + POLY_INT_CST [16, 16];
+ if (ivtmp_27 != 0)
+   goto ; [83.33%]
+ else
+   goto ; [16.67%]
+
+ The total length: _28 = MIN_EXPR ;
+
+ The length of first half vector:
+   loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;
+
+ The length of second half vector:
+   loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;
+   loop_len_16 = _28 - loop_len_15;
+
+ 1). _28 always <= POLY_INT_CST [8, 8].
+ 2). When _28 <= POLY_INT_CST [4, 4], second half vector is not processed.
+ 3). When _28 > POLY_INT_CST [4, 4], second half vector is processed.

We know that in case 3 we should deal with 2 vectors:
vect__2.27_20 = VEC_PACK_TRUNC_EXPR ;

First, we use "_28 = MIN_EXPR ;" to generate the number of elements to be
processed for these 2 vectors.
Second, we use "loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;".
"loop_len_15" is the number of elements to be processed for the first vector.
Then, "loop_len_16 = _28 - loop_len_15;". "loop_len_16" is the number of
elements to be processed for the second vector.

I think "loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;" is very similar
to unpacklo in ARM SVE, and "loop_len_16 = _28 - loop_len_15;" is very
similar to unpackhi in ARM SVE.
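
The split of the total length into the two half-vector lengths can be sketched with plain integers. `HalfLens` and `split_len` are hypothetical names, and POLY_INT_CST [4, 4] is modelled as the constant passed in `half`.

```cpp
#include <algorithm>
#include <cassert>

struct HalfLens {
    unsigned lo;  // plays the role of loop_len_15
    unsigned hi;  // plays the role of loop_len_16
};

// total: elements handled in this iteration (the role of _28).
// half:  elements per half vector.
HalfLens split_len(unsigned total, unsigned half)
{
    assert(total <= 2 * half);  // _28 is always <= POLY_INT_CST [8, 8]
    unsigned lo = std::min(total, half);
    unsigned hi = total - lo;   // 0 when the second half is not processed
    return {lo, hi};
}
```

With half = 4, a total of 3 gives lengths 3 and 0, and a total of 6 gives 4 and 2, matching points 2) and 3) in the comment.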
>> It's up to you.  If you don't think select_vl is worth it then it would
>>obviously make the vectoriser changes a bit simpler.
>>But making the vectoriser simpler isn't IMO the goal here.  SELECT_VL
>>seems like a perfectly reasonable construct to add to target-independent
>>code.  We just need to work out some of the details.

Ok, I also prefer keeping select_vl.

>>FWIW, I share Kewen's concern about duplicating too much logic between

[committed] Eliminate more comparisons on the H8 port

2023-05-09 Thread Jeff Law via Gcc-patches

This patch fixes a minor code quality issue I found while testing LRA on 
the H8.  Specifically we have a peephole which converts a comparison of 
a memory location against zero into a load + comparison which is 
actually more efficient.  This triggers when there are registers 
available at the right point during peephole2.


If the load is not a mode dependent address we can actually do better by 
realizing the load itself sets the proper flags and eliminate the 
comparison.  I may have expected this to happen when I wrote the 
original peephole2 -- but cmpelim runs before peephole2, so clearly if 
we want to eliminate the comparison we have to do it manually.


Committed to the trunk,
Jeff


commit 204303c81e82ddd01e7dc5a5a63719d476f9043c
Author: Jeff Law 
Date:   Tue May 9 07:18:45 2023 -0600

Eliminate more comparisons on the H8 port

This patch fixes a minor code quality issue I found while testing LRA on the
H8.  Specifically we have a peephole which converts a comparison of a memory
location against zero into a load + comparison which is actually more
efficient.  This triggers when there are registers available at the right
point during peephole2.

If the load is not a mode dependent address we can actually do better by
realizing the load itself sets the proper flags and eliminate the 
comparison.
I may have expected this to happen when I wrote the original peephole2,
but cmpelim runs before peephole2, so clearly if we want to eliminate the
comparison we have to do it manually.

gcc/
* config/h8300/testcompare.md: Add peephole2 which uses a memory
load to set flags, thus eliminating a compare against zero.

diff --git a/gcc/config/h8300/testcompare.md b/gcc/config/h8300/testcompare.md
index 81dce1d0bc1..efa66d274c7 100644
--- a/gcc/config/h8300/testcompare.md
+++ b/gcc/config/h8300/testcompare.md
@@ -171,13 +171,25 @@ (define_insn "cmpsi"
(set_attr "length_table" "*,add")])
 
 ;; Convert a memory comparison to a move if there is a scratch register.
+;; This is preferred over the next as we can proactively avoid the
+;; comparison.
+(define_peephole2
+  [(match_scratch:QHSI 1 "r")
+   (set (reg:CC CC_REG)
+   (compare (match_operand:QHSI 0 "memory_operand" "")
+(const_int 0)))]
+  "!mode_dependent_address_p (XEXP (operands[0], 0), MEM_ADDR_SPACE 
(operands[0]))"
+  [(parallel [(set (reg:CCZN CC_REG) (compare:CCZN (match_dup 0) (const_int 
0)))
+ (set (match_dup 1) (match_dup 0))])])
 
+;; Similarly, but used when the memory reference is an autoinc address
+;; mode.
 (define_peephole2
   [(match_scratch:QHSI 1 "r")
(set (reg:CC CC_REG)
(compare (match_operand:QHSI 0 "memory_operand" "")
 (const_int 0)))]
-  ""
+  "mode_dependent_address_p (XEXP (operands[0], 0), MEM_ADDR_SPACE 
(operands[0]))"
   [(parallel [(set (match_dup 1) (match_dup 0)) (clobber (reg:CC CC_REG))])
(set (reg:CC CC_REG) (compare:CC (match_dup 1) (const_int 0)))])

libgomp testsuite: Get rid of 'lang_test_file_found' (was: libgomp C++, Fortran testsuites: Resolve 'lang_test_file_found' first (was: libgomp testsuite: Localize 'lang_[...]' etc. (was: libgomp tests

2023-05-09 Thread Thomas Schwinge

Hi!

On 2023-05-09T14:59:53+0200, I wrote:
> On 2023-05-09T14:54:21+0200, I wrote:
>> [...] libgomp testsuite is a bit weird [...]: that every '*.exp' file
>> begins with a bunch of conditional 'unset lang_[...]' to clean up what
>> the previous '*.exp' file left behind.  I propose to simplify this as per
>> the attached "libgomp testsuite: Localize 'lang_[...]' etc." -- OK to
>> push?
>
> On top of this I'd then push the attached
> "libgomp C++, Fortran testsuites: Resolve 'lang_test_file_found' first",
> see attached, which again is extracted out of my earlier work.  This is
> to enable follow-on clean-up.

... which is "libgomp testsuite: Get rid of 'lang_test_file_found'", see
attached.  I'm also attaching a 'git format-patch --ignore-all-space'
variant, for easier review.

>> given that it's in line with my
>> earlier work (just a separate step), I intend to push it soon, unless
>> there are any objections, of course.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 38528210d3eb3da198be6900378f9e9d1ff10b53 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sat, 1 Nov 2014 16:25:26 +0100
Subject: [PATCH] libgomp testsuite: Get rid of 'lang_test_file_found'

Instead, 'return' early from the '*.exp' files that we're not able to test.
Also, change 'puts' into 'verbose -log'.  While re-indenting the previous
'if { $lang_test_file_found } { [...] }' code, also simplify 'ld_library_path'
setup.

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_target_compile): Don't look
	at 'lang_test_file_found'.
	* testsuite/libgomp.c++/c++.exp: Don't set and use it, and instead
	'return' early if not able to test.  Simplify 'ld_library_path' setup.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp |  31 ++-
 libgomp/testsuite/libgomp.c++/c++.exp |  61 +++---
 libgomp/testsuite/libgomp.fortran/fortran.exp |  80 
 libgomp/testsuite/libgomp.oacc-c++/c++.exp| 187 --
 .../libgomp.oacc-fortran/fortran.exp  | 166 
 5 files changed, 243 insertions(+), 282 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index dce33d788cc..c30fa4ed24b 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -240,24 +240,23 @@ proc libgomp_target_compile { source dest type options } {
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
 global GCC_UNDER_TEST
-global lang_test_file_found
+
+global lang_source_re lang_include_flags
+if { [info exists lang_include_flags] \
+	 && [regexp ${lang_source_re} ${source}] } {
+	lappend options "additional_flags=${lang_include_flags}"
+}
+
 global lang_library_path
+if { [info exists lang_library_path] } {
+	# Some targets use libgfortran.a%s in their specs, so they need
+	# a -B option for uninstalled testing.
+	lappend options "additional_flags=-B${blddir}/${lang_library_path}"
+	lappend options "ldflags=-L${blddir}/${lang_library_path}"
+}
 global lang_link_flags
-global lang_include_flags
-global lang_source_re
-
-if { [info exists lang_test_file_found] } {
-if { $blddir != "" } {
-# Some targets use libgfortran.a%s in their specs, so they need
-# a -B option for uninstalled testing.
-lappend options "additional_flags=-B${blddir}/${lang_library_path}"
-lappend options "ldflags=-L${blddir}/${lang_library_path}"
-}
-lappend options "ldflags=${lang_link_flags}"
-	if { [info exists lang_include_flags] \
-	 && [regexp ${lang_source_re} ${source}] } {
-	lappend options "additional_flags=${lang_include_flags}"
-	}
+if { [info exists lang_link_flags] } {
+	lappend options "ldflags=${lang_link_flags}"
 }
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 797a05ca870..8307baf32fc 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -1,23 +1,18 @@
 load_lib libgomp-dg.exp
 load_gcc_lib gcc-dg.exp
 
-set lang_test_file_found 0
 if { $blddir != "" } {
 set lang_library_path "../libstdc++-v3/src/.libs"
 set shlib_ext [get_shlib_extension]
-# Look for a static libstdc++ first.
-if [file exists "${blddir}/${lang_library_path}/libstdc++.a"] {
-set lang_test_file_found 1
-# We may have a shared only build, so look for a shared libstdc++.
-} elseif [file exists "${blddir}/${lang_library_path}/libstdc++.${shlib_ext}"] {
-set

libgomp C++, Fortran testsuites: Resolve 'lang_test_file_found' first (was: libgomp testsuite: Localize 'lang_[...]' etc. (was: libgomp testsuite: (not) using a specific driver for C++, Fortran?))

2023-05-09 Thread Thomas Schwinge

Hi!

On 2023-05-09T14:54:21+0200, I wrote:
> [...] libgomp testsuite is a bit weird [...]: that every '*.exp' file
> begins with a bunch of conditional 'unset lang_[...]' to clean up what
> the previous '*.exp' file left behind.  I propose to simplify this as per
> the attached "libgomp testsuite: Localize 'lang_[...]' etc." -- OK to
> push?

On top of this I'd then push the attached
"libgomp C++, Fortran testsuites: Resolve 'lang_test_file_found' first",
see attached, which again is extracted out of my earlier work.  This is
to enable follow-on clean-up.

> given that it's in line with my
> earlier work (just a separate step), I intend to push it soon, unless
> there are any objections, of course.


Grüße
 Thomas


From 9b7c6db76c5757a32b246dc8cb912902a3ffdb35 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sat, 1 Nov 2014 16:25:26 +0100
Subject: [PATCH] libgomp C++, Fortran testsuites: Resolve
 'lang_test_file_found' first

	libgomp/
	* testsuite/libgomp.c++/c++.exp: Resolve 'lang_test_file_found'
	first.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
---
 libgomp/testsuite/libgomp.c++/c++.exp | 43 +--
 libgomp/testsuite/libgomp.fortran/fortran.exp | 30 ++---
 libgomp/testsuite/libgomp.oacc-c++/c++.exp| 37 
 .../libgomp.oacc-fortran/fortran.exp  | 31 +++--
 4 files changed, 68 insertions(+), 73 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 442e9f744d0..797a05ca870 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -1,12 +1,25 @@
 load_lib libgomp-dg.exp
 load_gcc_lib gcc-dg.exp
 
-global shlib_ext
-
-set shlib_ext [get_shlib_extension]
-set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
-set lang_library_path "../libstdc++-v3/src/.libs"
+if { $blddir != "" } {
+set lang_library_path "../libstdc++-v3/src/.libs"
+set shlib_ext [get_shlib_extension]
+# Look for a static libstdc++ first.
+if [file exists "${blddir}/${lang_library_path}/libstdc++.a"] {
+set lang_test_file_found 1
+# We may have a shared only build, so look for a shared libstdc++.
+} elseif [file exists "${blddir}/${lang_library_path}/libstdc++.${shlib_ext}"] {
+set lang_test_file_found 1
+} else {
+puts "No libstdc++ library found, will not execute c++ tests"
+}
+} elseif { [info exists GXX_UNDER_TEST] } {
+set lang_test_file_found 1
+} else {
+puts "GXX_UNDER_TEST not defined, will not execute c++ tests"
+}
+set lang_link_flags "-lstdc++"
 
 # If a testcase doesn't have special options, use these.
 if ![info exists DEFAULT_CFLAGS] then {
@@ -24,22 +37,6 @@ lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
 set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
 
-if { $blddir != "" } {
-# Look for a static libstdc++ first.
-if [file exists "${blddir}/${lang_library_path}/libstdc++.a"] {
-set lang_test_file_found 1
-# We may have a shared only build, so look for a shared libstdc++.
-} elseif [file exists "${blddir}/${lang_library_path}/libstdc++.${shlib_ext}"] {
-set lang_test_file_found 1
-} else {
-puts "No libstdc++ library found, will not execute c++ tests"
-}
-} elseif { [info exists GXX_UNDER_TEST] } {
-set lang_test_file_found 1
-} else {
-puts "GXX_UNDER_TEST not defined, will not execute c++ tests"
-}
-
 if { $lang_test_file_found } {
 # Gather a list of all tests.
 set tests [lsort [concat \
@@ -72,7 +69,9 @@ if [info exists lang_include_flags] then {
 unset lang_source_re
 unset lang_include_flags
 }
-unset lang_library_path
+if { $blddir != "" } {
+unset lang_library_path
+}
 unset lang_link_flags
 unset lang_test_file_found
 
diff --git a/libgomp/testsuite/libgomp.fortran/fortran.exp b/libgomp/testsuite/libgomp.fortran/fortran.exp
index 7ea00a25bd9..16ce9d3e023 100644
--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -2,23 +2,10 @@ load_lib libgomp-dg.exp
 load_gcc_lib gcc-dg.exp
 load_gcc_lib gfortran-dg.exp
 
-global shlib_ext
-global ALWAYS_CFLAGS
-
-set shlib_ext [get_shlib_extension]
-set lang_library_path	"../libgfortran/.libs"
-set lang_link_flags	"-lgfortran -foffload=-lgfortran"
 set lang_test_file_found 0
-
-# Initialize dg.
-dg-init
-
-# Turn on OpenMP.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
-
 if { $blddir != "" } {
-set lang_source_re {^.*\.[fF](|90|95|03|08)$}
-set lang_include_flags

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-09 Thread Richard Sandiford via Gcc-patches

钟居哲  writes:
> Hi, Richards. I would like to give more information about this patch to
> make it easier for you to review.
>
> Currently, I see 3 situations that we need to handle for the loop
> control IV in auto-vectorization:
> 1. Single rgroup loop control (ncopies == 1 && vec_num == 1, so
> loop_len.length () == 1 or rgc->length () == 1).
> 2. Multiple rgroups for SLP.
> 3. Multiple rgroups for non-SLP, which Richard Sandiford pointed out
> previously (for example, VEC_PACK_TRUNC).
>
> Before discussing this patch, let me first describe the RVV LLVM
> implementation, which inspired me to send it:
> https://reviews.llvm.org/D99750
>
> The LLVM implementation adds a middle-end IR operation called
> "get_vector_length", which has exactly the
> same functionality as "select_vl" in this patch (I called it "while_len"
> previously; I have now renamed it to "select_vl" following Richard's suggestion).
>
> The LLVM implementation only lets "get_vector_length" calculate the number
> of elements for a single-rgroup loop.
> For multiple rgroups, let's take a look at:
> https://godbolt.org/z/3GP78efTY
>
> void
> foo1 (short *__restrict f, int *__restrict d, int n)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       f[i * 2 + 0] = 1;
>       f[i * 2 + 1] = 2;
>       d[i] = 3;
>     }
> }
>
> RISC-V Clang:
> foo1:                           # @foo1
> # %bb.0:
>         blez    a2, .LBB0_8
> # %bb.1:
>         li      a3, 16
>         bgeu    a2, a3, .LBB0_3
> # %bb.2:
>         li      a6, 0
>         j       .LBB0_6
> .LBB0_3:
>         andi    a6, a2, -16
>         lui     a3, 32
>         addiw   a3, a3, 1
>         vsetivli zero, 8, e32, m2, ta, ma
>         vmv.v.x v8, a3
>         vmv.v.i v10, 3
>         mv      a4, a6
>         mv      a5, a1
>         mv      a3, a0
> .LBB0_4:                        # =>This Inner Loop Header: Depth=1
>         addi    a7, a5, 32
>         addi    t0, a3, 32
>         vsetivli zero, 16, e16, m2, ta, ma
>         vse16.v v8, (a3)
>         vse16.v v8, (t0)
>         vsetivli zero, 8, e32, m2, ta, ma
>         vse32.v v10, (a5)
>         vse32.v v10, (a7)
>         addi    a3, a3, 64
>         addi    a4, a4, -16
>         addi    a5, a5, 64
>         bnez    a4, .LBB0_4
> # %bb.5:
>         beq     a6, a2, .LBB0_8
> .LBB0_6:
>         slli    a3, a6, 2
>         add     a0, a0, a3
>         addi    a0, a0, 2
>         add     a1, a1, a3
>         sub     a2, a2, a6
>         li      a3, 1
>         li      a4, 2
>         li      a5, 3
> .LBB0_7:                        # =>This Inner Loop Header: Depth=1
>         sh      a3, -2(a0)
>         sh      a4, 0(a0)
>         sw      a5, 0(a1)
>         addi    a0, a0, 4
>         addi    a2, a2, -1
>         addi    a1, a1, 4
>         bnez    a2, .LBB0_7
> .LBB0_8:
>         ret
>
> ARM GCC:
> foo1:
>         cmp     w2, 0
>         ble     .L1
>         addvl   x4, x0, #1
>         mov     x3, 0
>         cntb    x7
>         cntb    x6, all, mul #2
>         sbfiz   x2, x2, 1, 32
>         ptrue   p0.b, all
>         mov     x5, x2
>         adrp    x8, .LC0
>         uqdech  x5
>         add     x8, x8, :lo12:.LC0
>         whilelo p1.h, xzr, x5
>         ld1rw   z1.s, p0/z, [x8]
>         mov     z0.s, #3
>         whilelo p0.h, xzr, x2
> .L3:
>         st1h    z1.h, p0, [x0, x3, lsl 1]
>         st1h    z1.h, p1, [x4, x3, lsl 1]
>         st1w    z0.s, p1, [x1, #1, mul vl]
>         add     x3, x3, x7
>         whilelo p1.h, x3, x5
>         st1w    z0.s, p0, [x1]
>         add     x1, x1, x6
>         whilelo p0.h, x3, x2
>         b.any   .L3
> .L1:
>         ret
>
> It's very obvious that ARM GCC has much better codegen, since RVV LLVM just
> uses SIMD style to handle multi-rgroup SLP auto-vectorization.
>
> Well, I totally agree that we should add length support in
> auto-vectorization not only for a single rgroup but also for multiple rgroups.
> However, when I tried to implement and test multiple-rgroup lengths for both
> SLP and non-SLP, it turned out to be hard to use select_vl:
> since the "select_vl" pattern allows a flexible non-VF length (length <= min
> (remain, VF)) in any iteration, it takes many more operations to
> adjust the loop control IV and the data reference address pointer IVs than
> just using "MIN_EXPR".
>
> So for Case 2 && Case 3, I just use MIN_EXPR directly instead of SELECT_VL
> after several rounds of internal testing.
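
To make the trade-off described above concrete, here is a minimal scalar C sketch (not GCC code) of a loop driven by a decrementing IV whose per-iteration length is a plain minimum, as a MIN_EXPR would compute it. The names `min_size` and `fill_with_min_len`, and the parameter `vf` standing in for VF, are illustrative assumptions and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; stands in for MIN_EXPR (hypothetical helper).  */
static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Store 3 into d[0 .. n-1], handling at most VF elements per "vector"
   iteration.  Returns the number of vector iterations executed.  */
static size_t
fill_with_min_len (int *d, size_t n, size_t vf)
{
  assert (vf > 0);
  size_t iterations = 0;
  size_t remain = n;            /* decrementing loop control IV */
  while (remain > 0)
    {
      size_t len = min_size (remain, vf);
      for (size_t lane = 0; lane < len; ++lane)
        d[lane] = 3;            /* models one length-controlled store */
      d += len;                 /* data reference address IV */
      remain -= len;
      ++iterations;
    }
  return iterations;
}
```

With a plain minimum, `len` equals `vf` in every iteration except possibly the last, so the address IV could just as well be bumped by a constant. A `select_vl`-style length may be smaller than `vf` in any iteration, so every IV has to advance by the computed value instead.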

Could you go into more details about this?  I imagined that for case 3,
there would be a single SELECT_VL that decides how many scalar iterations
will be handled by the current vector iteration, then we would "expand"
the result (using MIN_EXPRs) to the multi-control cases.
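
One way to picture that suggestion is the scalar C sketch below. It is only a model under stated assumptions, not GCC's implementation: `select_vl_model` is a hypothetical stand-in for SELECT_VL that, like the RVV `vsetvl` rules permit, may return about half of the remaining count when fewer than two full vectors are left, and `expand_len` is a hypothetical helper that splits the chosen length across several controls using only minimum operations.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; stands in for MIN_EXPR (hypothetical helper).  */
static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Hypothetical model of SELECT_VL: returns some length that is at most
   min (remain, vf).  When between one and two full vectors remain, it
   evens out the last two iterations, which is one behaviour a target
   could choose.  */
static size_t
select_vl_model (size_t remain, size_t vf)
{
  assert (vf > 0);
  if (remain > vf && remain < 2 * vf)
    return (remain + 1) / 2;
  return min_size (remain, vf);
}

/* Expand one total length LEN into NCTRLS per-control lengths, each
   covering at most PER_CTRL scalar iterations.  */
static void
expand_len (size_t len, size_t per_ctrl, size_t nctrls, size_t *ctrl)
{
  for (size_t i = 0; i < nctrls; ++i)
    {
      size_t consumed = min_size (len, i * per_ctrl);
      ctrl[i] = min_size (len - consumed, per_ctrl);
    }
}
```

For example, with `vf` of 8 and 11 scalar iterations remaining, the model picks 6, which expands to control lengths 4 and 2 when each control covers 4 iterations.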

In a sense that replicates what the SVE code above is doing.  But for SVE,
it's possible to "optimise" the unpacking of a WHILELO result due to the
lack of implementation-defined behaviour.  So conceptually we have a
single WHILELO that is

libgomp testsuite: Localize 'lang_[...]' etc. (was: libgomp testsuite: (not) using a specific driver for C++, Fortran?)

2023-05-09 Thread Thomas Schwinge

Hi!

On 2014-11-04T10:31:37-0800, Mike Stump  wrote:
> I wonder if any variables that you set that need to be cleared out are, that 
> seems a defect in the api or conventions we use and it can screw following 
> tests with a wrong environment in subtle ways.  I could miss accidental bleed 
> over from the .exp you’re modifying into other unrelated parts of the test 
> suite.

Aye, that topic keeps coming back...  ;-\ I'm not aware of any such
defects in my changes, but I do agree that the current structure of the
libgomp testsuite is a bit weird in that regard: that every '*.exp' file
begins with a bunch of conditional 'unset lang_[...]' to clean up what
the previous '*.exp' file left behind.  I propose to simplify this as per
the attached "libgomp testsuite: Localize 'lang_[...]' etc." -- OK to
push?  (That's a new patch, loosely based on/extracted out of my earlier
work.  I'm posting it separately, but given that it's in line with my
earlier work (just a separate step), I intend to push it soon, unless
there are any objections, of course.)


Grüße
 Thomas


From 6230d3b0aef9b785c7d62c512e8939cca183fdb4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 9 May 2023 10:09:35 +0200
Subject: [PATCH] libgomp testsuite: Localize 'lang_[...]' etc.

..., instead of letting them bleed into the next '*.exp' file, requiring
clean-up there.

	libgomp/
	* testsuite/libgomp.c++/c++.exp: Localize 'lang_[...]' etc.
	* testsuite/libgomp.c/c.exp: Likewise.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
	* testsuite/libgomp.graphite/graphite.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
---
 libgomp/testsuite/libgomp.c++/c++.exp | 11 ---
 libgomp/testsuite/libgomp.c/c.exp | 11 ---
 libgomp/testsuite/libgomp.fortran/fortran.exp | 15 ++-
 libgomp/testsuite/libgomp.graphite/graphite.exp   | 11 ---
 libgomp/testsuite/libgomp.oacc-c++/c++.exp| 11 ---
 libgomp/testsuite/libgomp.oacc-c/c.exp| 11 ---
 .../testsuite/libgomp.oacc-fortran/fortran.exp| 15 ++-
 7 files changed, 36 insertions(+), 49 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 7dd2d49d568..442e9f744d0 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -7,9 +7,6 @@ set shlib_ext [get_shlib_extension]
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
-if [info exists lang_include_flags] then {
-unset lang_include_flags
-}
 
 # If a testcase doesn't have special options, use these.
 if ![info exists DEFAULT_CFLAGS] then {
@@ -71,5 +68,13 @@ if { $lang_test_file_found } {
 # See above.
 set GCC_UNDER_TEST "$SAVE_GCC_UNDER_TEST"
 
+if [info exists lang_include_flags] then {
+unset lang_source_re
+unset lang_include_flags
+}
+unset lang_library_path
+unset lang_link_flags
+unset lang_test_file_found
+
 # All done.
 dg-finish
diff --git a/libgomp/testsuite/libgomp.c/c.exp b/libgomp/testsuite/libgomp.c/c.exp
index e67adc8b378..0ee28ed723e 100644
--- a/libgomp/testsuite/libgomp.c/c.exp
+++ b/libgomp/testsuite/libgomp.c/c.exp
@@ -1,14 +1,3 @@
-if [info exists lang_library_path] then {
-unset lang_library_path
-unset lang_link_flags
-}
-if [info exists lang_test_file_found] then {
-unset lang_test_file_found
-}
-if [info exists lang_include_flags] then {
-unset lang_include_flags
-}
-
 load_lib libgomp-dg.exp
 load_gcc_lib gcc-dg.exp
 
diff --git a/libgomp/testsuite/libgomp.fortran/fortran.exp b/libgomp/testsuite/libgomp.fortran/fortran.exp
index 5e8e15e7743..7ea00a25bd9 100644
--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -8,12 +8,7 @@ global ALWAYS_CFLAGS
 set shlib_ext [get_shlib_extension]
 set lang_library_path	"../libgfortran/.libs"
 set lang_link_flags	"-lgfortran -foffload=-lgfortran"
-if [info exists lang_include_flags] then {
-unset lang_include_flags
-}
 set lang_test_file_found 0
-set quadmath_library_path "../libquadmath/.libs"
-
 
 # Initialize dg.
 dg-init
@@ -44,6 +39,7 @@ if { $lang_test_file_found } {
 set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
 if { $blddir != "" } {
+	set quadmath_library_path "../libquadmath/.libs"
 	if { [file exists "${blddir}/${quadmath_library_path}/libquadmath.a"]
 	 || [file exists "${blddir}/${quadmath_library_path}/libquadmath.${shlib_ext}"] } {
 	lappend ALWAYS_CFLAGS "ldflags=-L${blddir}/${quadmath_library_path}/"
@@ -54,6

libgomp testsuite: Use 'lang_test_file_found' instead of 'lang_test_file' (was: libgomp testsuite: Only use 'blddir' if set (was: libgomp C++ testsuite: Don't compute 'blddir' twice (was: libgomp test

2023-05-09 Thread Thomas Schwinge

Hi!

On 2023-05-09T14:39:53+0200, I wrote:
> On 2023-05-09T14:36:53+0200, I wrote:
>> On 2014-11-04T10:31:37-0800, Mike Stump  wrote:
>>> On Nov 4, 2014, at 4:13 AM, Thomas Schwinge  wrote:
>>>> On Wed, 15 Oct 2014 17:46:48 +0200, I wrote:
>>>>> [...]
>>>>>
>>>>> Am I on the right track with the following?
>>>>
>>>> Nobody commented, which also means nobody disagreed
>>>
>>> :-)
>>>
>>>> OK to commit all that to trunk?
>>>
>>> Ok, thanks.
>>
>> Almost one decade later (eek...) I'm now finally getting back to this.
>> First, a number of clean-up patches; rebased, adjusted, retested for
>> current state of affairs (not too much changed in libgomp testsuite
>> infrastructure, though).

Pushed to master branch commit 2ed5ceba0fe313ef09bdfe98788ba9377bfec9aa
"libgomp testsuite: Use 'lang_test_file_found' instead of 'lang_test_file'",
see attached.


Grüße
 Thomas


From 2ed5ceba0fe313ef09bdfe98788ba9377bfec9aa Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 31 Oct 2014 17:38:03 +0100
Subject: [PATCH] libgomp testsuite: Use 'lang_test_file_found' instead of
 'lang_test_file'

The value of 'lang_test_file' isn't actually used anywhere.

	libgomp/
	* testsuite/libgomp.c++/c++.exp: Don't set 'lang_test_file'.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
	* testsuite/libgomp.c/c.exp: Unset 'lang_test_file_found' instead of
	'lang_test_file'.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.graphite/graphite.exp: Likewise.
	* testsuite/lib/libgomp.exp (libgomp_target_compile): Look for
	'lang_test_file_found' instead of 'lang_test_file'.
---
 libgomp/testsuite/lib/libgomp.exp  | 4 ++--
 libgomp/testsuite/libgomp.c++/c++.exp  | 4 
 libgomp/testsuite/libgomp.c/c.exp  | 4 ++--
 libgomp/testsuite/libgomp.fortran/fortran.exp  | 4 
 libgomp/testsuite/libgomp.graphite/graphite.exp| 4 ++--
 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 4 
 libgomp/testsuite/libgomp.oacc-c/c.exp | 4 ++--
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 4 
 8 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 92f650742290..aae7149a97df 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -234,13 +234,13 @@ proc libgomp_target_compile { source dest type options } {
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
 global GCC_UNDER_TEST
-global lang_test_file
+global lang_test_file_found
 global lang_library_path
 global lang_link_flags
 global lang_include_flags
 global lang_source_re
 
-if { [info exists lang_test_file] } {
+if { [info exists lang_test_file_found] } {
 if { $blddir != "" } {
 # Some targets use libgfortran.a%s in their specs, so they need
 # a -B option for uninstalled testing.
diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 188d1a823561..7dd2d49d568e 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -30,19 +30,15 @@ set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
 if { $blddir != "" } {
 # Look for a static libstdc++ first.
 if [file exists "${blddir}/${lang_library_path}/libstdc++.a"] {
-set lang_test_file "${lang_library_path}/libstdc++.a"
 set lang_test_file_found 1
 # We may have a shared only build, so look for a shared libstdc++.
 } elseif [file exists "${blddir}/${lang_library_path}/libstdc++.${shlib_ext}"] {
-set lang_test_file "${lang_library_path}/libstdc++.${shlib_ext}"
 set lang_test_file_found 1
 } else {
 puts "No libstdc++ library found, will not execute c++ tests"
 }
 } elseif { [info exists GXX_UNDER_TEST] } {
 set lang_test_file_found 1
-# Needs to exist for libgomp.exp.
-set lang_test_file ""
 } else {
 puts "GXX_UNDER_TEST not defined, will not execute c++ tests"
 }
diff --git a/libgomp/testsuite/libgomp.c/c.exp b/libgomp/testsuite/libgomp.c/c.exp
index 31bdd5795dc2..e67adc8b378c 100644
--- a/libgomp/testsuite/libgomp.c/c.exp
+++ b/libgomp/testsuite/libgomp.c/c.exp
@@ -2,8 +2,8 @@ if [info exists lang_library_path] then {
 unset lang_library_path
 unset lang_link_flags
 }
-if [info exists lang_test_file] then {
-unset lang_test_file
+if [info exists lang_test_file_found] then {
+unset lang_test_file_found
 }
 if [info exists lang_include_flags] then {
 unset lang_include_flags
diff --git

libgomp testsuite: Only use 'blddir' if set (was: libgomp C++ testsuite: Don't compute 'blddir' twice (was: libgomp testsuite: (not) using a specific driver for C++, Fortran?))

2023-05-09 Thread Thomas Schwinge

Hi!

On 2023-05-09T14:36:53+0200, I wrote:
> On 2014-11-04T10:31:37-0800, Mike Stump  wrote:
>> On Nov 4, 2014, at 4:13 AM, Thomas Schwinge  wrote:
>>> On Wed, 15 Oct 2014 17:46:48 +0200, I wrote:
>>>> [...]
>>>>
>>>> Am I on the right track with the following?
>>>
>>> Nobody commented, which also means nobody disagreed
>>
>> :-)
>>
>>> OK to commit all that to trunk?
>>
>> Ok, thanks.
>
> Almost one decade later (eek...) I'm now finally getting back to this.
> First, a number of clean-up patches; rebased, adjusted, retested for
> current state of affairs (not too much changed in libgomp testsuite
> infrastructure, though).

Pushed to master branch commit fed3dbbfd1b707d7386b14e724056bfe2234e3a5
"libgomp testsuite: Only use 'blddir' if set", see attached.


Grüße
 Thomas


From fed3dbbfd1b707d7386b14e724056bfe2234e3a5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 3 Nov 2014 09:58:38 +0100
Subject: [PATCH] libgomp testsuite: Only use 'blddir' if set

(It is unclear to me why the current working directory needs to be in
'LD_LIBRARY_PATH'; leaving that alone for now.)

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Only use 'blddir' if
	set.
	* testsuite/libgomp.c++/c++.exp: Likewise.
	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp  | 5 +++--
 libgomp/testsuite/libgomp.c++/c++.exp  | 3 ++-
 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 3 ++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index e12236e9083c..92f650742290 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -66,7 +66,6 @@ proc libgomp_init { args } {
 global srcdir blddir objdir tool_root_dir
 global libgomp_initialized
 global tmpdir
-global blddir
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
 global CFLAGS
@@ -118,7 +117,7 @@ proc libgomp_init { args } {
 }
 
 # Compute what needs to be put into LD_LIBRARY_PATH
-set always_ld_library_path ".:${blddir}/.libs"
+set always_ld_library_path "."
 
 global offload_additional_lib_paths
 if { $offload_additional_lib_paths != "" } {
@@ -157,6 +156,8 @@ proc libgomp_init { args } {
 lappend ALWAYS_CFLAGS "additional_flags=-B${blddir}/.libs"
 lappend ALWAYS_CFLAGS "additional_flags=-I${blddir}"
 lappend ALWAYS_CFLAGS "ldflags=-L${blddir}/.libs"
+
+	append always_ld_library_path ":${blddir}/.libs"
 }
 # The top-level include directory, for gomp-constants.h.
 lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/../../include"
diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 81c188e297a8..188d1a823561 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -62,7 +62,8 @@ if { $lang_test_file_found } {
 set_ld_library_path_env_vars
 
 set flags_file "${blddir}/../libstdc++-v3/scripts/testsuite_flags"
-if { [file exists $flags_file] } {
+if { $blddir != ""
+	 && [file exists $flags_file] } {
 	set lang_source_re {^.*\.[cC]$}
 	set lang_include_flags [exec sh $flags_file --build-includes]
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 09001788bb42..24a4d1f67b96 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -68,7 +68,8 @@ if { $lang_test_file_found } {
 set_ld_library_path_env_vars
 
 set flags_file "${blddir}/../libstdc++-v3/scripts/testsuite_flags"
-if { [file exists $flags_file] } {
+if { $blddir != ""
+	 && [file exists $flags_file] } {
 	set lang_source_re {^.*\.[cC]$}
 	set lang_include_flags [exec sh $flags_file --build-includes]
 }
-- 
2.39.2

libgomp C++ testsuite: Don't compute 'blddir' twice (was: libgomp testsuite: (not) using a specific driver for C++, Fortran?)

2023-05-09 Thread Thomas Schwinge

Hi!

On 2014-11-04T10:31:37-0800, Mike Stump  wrote:
> On Nov 4, 2014, at 4:13 AM, Thomas Schwinge  wrote:
>> On Wed, 15 Oct 2014 17:46:48 +0200, I wrote:
>>> [...]
>>>
>>> Am I on the right track with the following?
>>
>> Nobody commented, which also means nobody disagreed
>
> :-)
>
>> OK to commit all that to trunk?
>
> Ok, thanks.

Almost one decade later (eek...) I'm now finally getting back to this.
First, a number of clean-up patches; rebased, adjusted, retested for
current state of affairs (not too much changed in libgomp testsuite
infrastructure, though).

Pushed to master branch commit b7b209843654bb240589251c90c3465a84d704da
"libgomp C++ testsuite: Don't compute 'blddir' twice", see attached.


Grüße
 Thomas


From b7b209843654bb240589251c90c3465a84d704da Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 31 Oct 2014 16:49:14 +0100
Subject: [PATCH] libgomp C++ testsuite: Don't compute 'blddir' twice

It has already been set in 'libgomp/testsuite/lib/libgomp.exp:libgomp_init'.

	libgomp/
	* testsuite/libgomp.c++/c++.exp (blddir): Don't set.
	* testsuite/libgomp.oacc-c++/c++.exp (blddir): Likewise.
---
 libgomp/testsuite/libgomp.c++/c++.exp  | 3 ---
 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 5b9a5924ff37..81c188e297a8 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -27,9 +27,6 @@ lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
 set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
 
-set blddir [lookfor_file [get_multilibs] libgomp]
-
-
 if { $blddir != "" } {
 # Look for a static libstdc++ first.
 if [file exists "${blddir}/${lang_library_path}/libstdc++.a"] {
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 0b235ba47f35..09001788bb42 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -33,9 +33,6 @@ lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
 set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
 
-set blddir [lookfor_file [get_multilibs] libgomp]
-
-
 if { $blddir != "" } {
 # Look for a static libstdc++ first.
 if [file exists "${blddir}/${lang_library_path}/libstdc++.a"] {
-- 
2.39.2

[PATCH 05/16] arm: [MVE intrinsics] rework vmaxvq vminvq vmaxavq vminavq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Implement vmaxvq, vminvq, vmaxavq, vminavq using the new MVE builtins
framework.
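
For readers unfamiliar with these intrinsics, the following portable C sketch models what the 8-bit signed variants compute, as I understand the ACLE descriptions. It is a scalar reference model only, not the GCC implementation and not code from this patch; the `model_*` names, the `LANES` constant, and the plain-array representation of a vector are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16   /* number of 8-bit lanes in a 128-bit MVE vector */

/* Magnitude of a signed lane as an unsigned value (|-128| is 128).  */
static uint8_t
lane_abs (int8_t x)
{
  return (uint8_t) (x < 0 ? -(int) x : (int) x);
}

/* Model of vmaxvq_s8: maximum of scalar A and every lane of B.  */
static int8_t
model_vmaxvq_s8 (int8_t a, const int8_t b[LANES])
{
  for (size_t i = 0; i < LANES; ++i)
    if (b[i] > a)
      a = b[i];
  return a;
}

/* Model of vminvq_s8: minimum of scalar A and every lane of B.  */
static int8_t
model_vminvq_s8 (int8_t a, const int8_t b[LANES])
{
  for (size_t i = 0; i < LANES; ++i)
    if (b[i] < a)
      a = b[i];
  return a;
}

/* Model of vmaxavq_s8: maximum of unsigned A and |B[i]| over all lanes.  */
static uint8_t
model_vmaxavq_s8 (uint8_t a, const int8_t b[LANES])
{
  for (size_t i = 0; i < LANES; ++i)
    if (lane_abs (b[i]) > a)
      a = lane_abs (b[i]);
  return a;
}

/* Model of vminavq_s8: minimum of unsigned A and |B[i]| over all lanes.  */
static uint8_t
model_vminavq_s8 (uint8_t a, const int8_t b[LANES])
{
  for (size_t i = 0; i < LANES; ++i)
    if (lane_abs (b[i]) < a)
      a = lane_abs (b[i]);
  return a;
}

/* Model of the predicated vmaxvq_p_s8: only lanes whose bit is set in
   the 16-bit predicate P take part in the reduction.  */
static int8_t
model_vmaxvq_p_s8 (int8_t a, const int8_t b[LANES], uint16_t p)
{
  for (size_t i = 0; i < LANES; ++i)
    if (((p >> i) & 1) && b[i] > a)
      a = b[i];
  return a;
}
```

Note that the "a" forms take and return an unsigned scalar, because the magnitude of the most negative lane value does not fit in the signed type.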

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_PRED_P_S_U)
(FUNCTION_PRED_P_S): New.
(vmaxavq, vminavq, vmaxvq, vminvq): New.
* config/arm/arm-mve-builtins-base.def (vmaxavq, vminavq, vmaxvq)
(vminvq): New.
* config/arm/arm-mve-builtins-base.h (vmaxavq, vminavq, vmaxvq)
(vminvq): New.
* config/arm/arm_mve.h (vminvq): Remove.
(vmaxvq): Remove.
(vminvq_p): Remove.
(vmaxvq_p): Remove.
(vminvq_u8): Remove.
(vmaxvq_u8): Remove.
(vminvq_s8): Remove.
(vmaxvq_s8): Remove.
(vminvq_u16): Remove.
(vmaxvq_u16): Remove.
(vminvq_s16): Remove.
(vmaxvq_s16): Remove.
(vminvq_u32): Remove.
(vmaxvq_u32): Remove.
(vminvq_s32): Remove.
(vmaxvq_s32): Remove.
(vminvq_p_u8): Remove.
(vmaxvq_p_u8): Remove.
(vminvq_p_s8): Remove.
(vmaxvq_p_s8): Remove.
(vminvq_p_u16): Remove.
(vmaxvq_p_u16): Remove.
(vminvq_p_s16): Remove.
(vmaxvq_p_s16): Remove.
(vminvq_p_u32): Remove.
(vmaxvq_p_u32): Remove.
(vminvq_p_s32): Remove.
(vmaxvq_p_s32): Remove.
(__arm_vminvq_u8): Remove.
(__arm_vmaxvq_u8): Remove.
(__arm_vminvq_s8): Remove.
(__arm_vmaxvq_s8): Remove.
(__arm_vminvq_u16): Remove.
(__arm_vmaxvq_u16): Remove.
(__arm_vminvq_s16): Remove.
(__arm_vmaxvq_s16): Remove.
(__arm_vminvq_u32): Remove.
(__arm_vmaxvq_u32): Remove.
(__arm_vminvq_s32): Remove.
(__arm_vmaxvq_s32): Remove.
(__arm_vminvq_p_u8): Remove.
(__arm_vmaxvq_p_u8): Remove.
(__arm_vminvq_p_s8): Remove.
(__arm_vmaxvq_p_s8): Remove.
(__arm_vminvq_p_u16): Remove.
(__arm_vmaxvq_p_u16): Remove.
(__arm_vminvq_p_s16): Remove.
(__arm_vmaxvq_p_s16): Remove.
(__arm_vminvq_p_u32): Remove.
(__arm_vmaxvq_p_u32): Remove.
(__arm_vminvq_p_s32): Remove.
(__arm_vmaxvq_p_s32): Remove.
(__arm_vminvq): Remove.
(__arm_vmaxvq): Remove.
(__arm_vminvq_p): Remove.
(__arm_vmaxvq_p): Remove.
(vminavq): Remove.
(vmaxavq): Remove.
(vminavq_p): Remove.
(vmaxavq_p): Remove.
(vminavq_s8): Remove.
(vmaxavq_s8): Remove.
(vminavq_s16): Remove.
(vmaxavq_s16): Remove.
(vminavq_s32): Remove.
(vmaxavq_s32): Remove.
(vminavq_p_s8): Remove.
(vmaxavq_p_s8): Remove.
(vminavq_p_s16): Remove.
(vmaxavq_p_s16): Remove.
(vminavq_p_s32): Remove.
(vmaxavq_p_s32): Remove.
(__arm_vminavq_s8): Remove.
(__arm_vmaxavq_s8): Remove.
(__arm_vminavq_s16): Remove.
(__arm_vmaxavq_s16): Remove.
(__arm_vminavq_s32): Remove.
(__arm_vmaxavq_s32): Remove.
(__arm_vminavq_p_s8): Remove.
(__arm_vmaxavq_p_s8): Remove.
(__arm_vminavq_p_s16): Remove.
(__arm_vmaxavq_p_s16): Remove.
(__arm_vminavq_p_s32): Remove.
(__arm_vmaxavq_p_s32): Remove.
(__arm_vminavq): Remove.
(__arm_vmaxavq): Remove.
(__arm_vminavq_p): Remove.
(__arm_vmaxavq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  17 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm_mve.h | 616 ---
 4 files changed, 25 insertions(+), 616 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index aafd85b293d..cfab3f222ed 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -212,6 +212,19 @@ namespace arm_mve {
 -1, -1, UNSPEC##_M_F,  \
 -1, -1, -1))
 
+  /* Helper for builtins without RTX codes, _S mode, _p predicated.  */
+#define FUNCTION_PRED_P_S(NAME, UNSPEC) FUNCTION                       \
+  (NAME, unspec_mve_function_exact_insn_pred_p,                        \
+   (UNSPEC##_S, -1, -1,                                                \
+    UNSPEC##_P_S, -1, -1))
+
+  /* Helper for builtins without RTX codes, _S and _U modes, _p
+     predicated.  */
+#define FUNCTION_PRED_P_S_U(NAME, UNSPEC) FUNCTION                     \
+  (NAME, unspec_mve_function_exact_insn_pred_p,                        \
+   (UNSPEC##_S, UNSPEC##_U, -1,                                        \
+    UNSPEC##_P_S, UNSPEC##_P_U, -1))
+
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, -1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
@@ -222,8

[PATCH 15/16] arm: [MVE intrinsics] factorize vmaxaq vminaq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Factorize vmaxaq vminaq so that they use the same pattern.
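
As background on what the two merged instructions compute, here is a portable C sketch of the 8-bit variants, based on my reading of the ACLE descriptions. It is a scalar reference model, not the machine description from this patch; the `model_*` names, `LANES`, and the array representation are assumptions for illustration. The first operand is updated in place, mirroring the tied operand 1 in the insn patterns.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16   /* number of 8-bit lanes in a 128-bit MVE vector */

/* Magnitude of a signed lane as an unsigned value (|-128| is 128).  */
static uint8_t
lane_abs (int8_t x)
{
  return (uint8_t) (x < 0 ? -(int) x : (int) x);
}

/* Model of vmaxaq_s8: a[i] = max (a[i], |b[i]|) for every lane.  */
static void
model_vmaxaq_s8 (uint8_t a[LANES], const int8_t b[LANES])
{
  for (size_t i = 0; i < LANES; ++i)
    if (lane_abs (b[i]) > a[i])
      a[i] = lane_abs (b[i]);
}

/* Model of vminaq_s8: a[i] = min (a[i], |b[i]|) for every lane.  */
static void
model_vminaq_s8 (uint8_t a[LANES], const int8_t b[LANES])
{
  for (size_t i = 0; i < LANES; ++i)
    if (lane_abs (b[i]) < a[i])
      a[i] = lane_abs (b[i]);
}
```

The two models differ only in the comparison, which is the same observation that lets the patch express both instructions with a single iterator-based pattern.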

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_VMAXAVMINAQ, MVE_VMAXAVMINAQ_M):
New.
(mve_insn): Add vmaxa, vmina.
(supf): Add VMAXAQ_S, VMAXAQ_M_S, VMINAQ_S, VMINAQ_M_S.
* config/arm/mve.md (mve_vmaxaq_s, mve_vminaq_s):
Merge into ...
(@mve_q_): ... this.
(mve_vmaxaq_m_s, mve_vminaq_m_s): Merge into ...
(@mve_q_m_): ... this.
---
 gcc/config/arm/iterators.md | 18 ++
 gcc/config/arm/mve.md   | 49 -
 2 files changed, 28 insertions(+), 39 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 8edbf5a55cf..3c70fd7f56d 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -621,6 +621,16 @@ (define_int_iterator MVE_VMAXNMA_VMINNMAQ_M [
 VMINNMAQ_M_F
 ])
 
+(define_int_iterator MVE_VMAXAVMINAQ [
+VMAXAQ_S
+VMINAQ_S
+])
+
+(define_int_iterator MVE_VMAXAVMINAQ_M [
+VMAXAQ_M_S
+VMINAQ_M_S
+])
+
 (define_int_iterator MVE_MOVN [
 VMOVNBQ_S VMOVNBQ_U
 VMOVNTQ_S VMOVNTQ_U
@@ -670,6 +680,8 @@ (define_int_attr mve_insn [
 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
+(VMAXAQ_M_S "vmaxa")
+(VMAXAQ_S "vmaxa")
 (VMAXAVQ_P_S "vmaxav")
 (VMAXAVQ_S "vmaxav")
 (VMAXNMAQ_F "vmaxnma")
@@ -682,6 +694,8 @@ (define_int_attr mve_insn [
 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
 (VMAXVQ_P_S "vmaxv") (VMAXVQ_P_U "vmaxv")
 (VMAXVQ_S "vmaxv") (VMAXVQ_U "vmaxv")
+(VMINAQ_M_S "vmina")
+(VMINAQ_S "vmina")
 (VMINAVQ_P_S "vminav")
 (VMINAVQ_S "vminav")
 (VMINNMAQ_F "vminnma")
@@ -2064,6 +2078,10 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
   (VMAXAVQ_P_S "s")
   (VMINAVQ_S "s")
   (VMINAVQ_P_S "s")
+  (VMAXAQ_S "s")
+  (VMAXAQ_M_S "s")
+  (VMINAQ_S "s")
+  (VMINAQ_M_S "s")
   ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ef0b6fd3ded..45bca6d6215 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -948,17 +948,18 @@ (define_insn "mve_vhcaddq_rot90_s"
 ])
 
 ;;
-;; [vmaxaq_s])
+;; [vmaxaq_s]
+;; [vminaq_s]
 ;;
-(define_insn "mve_vmaxaq_s"
+(define_insn "@mve_q_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
   (match_operand:MVE_2 2 "s_register_operand" "w")]
-VMAXAQ_S))
+MVE_VMAXAVMINAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vmaxa.s%#%q0, %q2"
+  ".s%#\t%q0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -996,21 +997,6 @@ (define_insn "@mve_q_"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vminaq_s])
-;;
-(define_insn "mve_vminaq_s"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-  (match_operand:MVE_2 2 "s_register_operand" "w")]
-VMINAQ_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmina.s%#\t%q0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmladavq_u, vmladavq_s])
 ;;
@@ -2239,18 +2225,19 @@ (define_insn "mve_vdupq_m_n_"
(set_attr "length""8")])
 
 ;;
-;; [vmaxaq_m_s])
+;; [vmaxaq_m_s]
+;; [vminaq_m_s]
 ;;
-(define_insn "mve_vmaxaq_m_s"
+(define_insn "@mve_q_m_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
   (match_operand:MVE_2 2 "s_register_operand" "w")
   (match_operand: 3 "vpr_register_operand" "Up")]
-VMAXAQ_M_S))
+MVE_VMAXAVMINAQ_M))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vmaxat.s%# %q0, %q2"
+  "vpst\;t.s%#\t%q0, %q2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
@@ -2273,22 +2260,6 @@ (define_insn "@mve_q_p_"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vminaq_m_s])
-;;
-(define_insn "mve_vminaq_m_s"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-  (match_operand:MVE_2 2 "s_register_operand" "w")
-  (match_operand: 3 "vpr_register_operand" "Up")]
-VMINAQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vminat.s%# %q0,

[PATCH 09/16] arm: [MVE intrinsics] factorize vmaxnmavq vmaxnmvq vminnmavq vminnmvq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Factorize vmaxnmavq vmaxnmvq vminnmavq vminnmvq so that they use the
same pattern.
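
As a rough illustration of what sharing a pattern means here, the sketch
below models an int iterator plus an int attribute in plain C++: one
template is instantiated once per unspec, and the attribute supplies the
mnemonic. The enum, the function names and the element-size handling are
assumptions made for this example only; they are not part of the GCC
sources, and the operand placeholders are simply copied from the pattern.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-ins for the unspecs listed in MVE_VMAXNMxV_MINNMxVQ.
enum class Unspec { VMAXNMAVQ_F, VMAXNMVQ_F, VMINNMAVQ_F, VMINNMVQ_F };

// Plays the role of the mve_insn attribute: one mnemonic per unspec.
constexpr std::string_view mve_insn(Unspec u) {
  switch (u) {
    case Unspec::VMAXNMAVQ_F: return "vmaxnmav";
    case Unspec::VMAXNMVQ_F:  return "vmaxnmv";
    case Unspec::VMINNMAVQ_F: return "vminnmav";
    case Unspec::VMINNMVQ_F:  return "vminnmv";
  }
  return "";
}

// Plays the role of the single merged pattern: the output template is
// written once and the mnemonic is filled in for each unspec.
// elem_bits (16 or 32) is an assumed stand-in for the element-size suffix.
inline std::string asm_template(Unspec u, int elem_bits) {
  return std::string(mve_insn(u)) + ".f" + std::to_string(elem_bits) +
         "\t%0, %q2";
}
```

With this model, the four former patterns differ only in the value passed
as the first argument, which is the property the factorization relies on.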

2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_VMAXNMxV_MINNMxVQ)
(MVE_VMAXNMxV_MINNMxVQ_P): New.
(mve_insn): Add vmaxnmav, vmaxnmv, vminnmav, vminnmv.
* config/arm/mve.md (mve_vmaxnmavq_f, mve_vmaxnmvq_f)
(mve_vminnmavq_f, mve_vminnmvq_f): Merge into ...
(@mve_q_f): ... this.
(mve_vmaxnmavq_p_f, mve_vmaxnmvq_p_f)
(mve_vminnmavq_p_f, mve_vminnmvq_p_f): Merge into ...
(@mve_q_p_f): ... this.
---
 gcc/config/arm/iterators.md |  22 +++
 gcc/config/arm/mve.md   | 114 +---
 2 files changed, 37 insertions(+), 99 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 397ac32720d..26ad687cefd 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -597,6 +597,20 @@ (define_int_iterator MVE_VMAXVQ_VMINVQ_P [
 VMINVQ_P_S VMINVQ_P_U
 ])
 
+(define_int_iterator MVE_VMAXNMxV_MINNMxVQ [
+VMAXNMAVQ_F
+VMAXNMVQ_F
+VMINNMAVQ_F
+VMINNMVQ_F
+])
+
+(define_int_iterator MVE_VMAXNMxV_MINNMxVQ_P [
+VMAXNMAVQ_P_F
+VMAXNMVQ_P_F
+VMINNMAVQ_P_F
+VMINNMVQ_P_F
+])
+
 (define_int_iterator MVE_MOVN [
 VMOVNBQ_S VMOVNBQ_U
 VMOVNTQ_S VMOVNTQ_U
@@ -648,13 +662,21 @@ (define_int_attr mve_insn [
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
 (VMAXAVQ_P_S "vmaxav")
 (VMAXAVQ_S "vmaxav")
+(VMAXNMAVQ_F "vmaxnmav")
+(VMAXNMAVQ_P_F "vmaxnmav")
 (VMAXNMQ_M_F "vmaxnm")
+(VMAXNMVQ_F "vmaxnmv")
+(VMAXNMVQ_P_F "vmaxnmv")
 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
 (VMAXVQ_P_S "vmaxv") (VMAXVQ_P_U "vmaxv")
 (VMAXVQ_S "vmaxv") (VMAXVQ_U "vmaxv")
 (VMINAVQ_P_S "vminav")
 (VMINAVQ_S "vminav")
+(VMINNMAVQ_F "vminnmav")
+(VMINNMAVQ_P_F "vminnmav")
 (VMINNMQ_M_F "vminnm")
+(VMINNMVQ_F "vminnmv")
+(VMINNMVQ_P_F "vminnmv")
 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
 (VMINVQ_P_S "vminv") (VMINVQ_P_U "vminv")
 (VMINVQ_S "vminv") (VMINVQ_U "vminv")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index d2863b316e0..2aebaa99bbf 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1440,17 +1440,20 @@ (define_insn "mve_vmaxnmaq_f"
 ])
 
 ;;
-;; [vmaxnmavq_f])
+;; [vmaxnmavq_f]
+;; [vmaxnmvq_f]
+;; [vminnmavq_f]
+;; [vminnmvq_f]
 ;;
-(define_insn "mve_vmaxnmavq_f"
+(define_insn "@mve_q_f"
   [
(set (match_operand: 0 "s_register_operand" "=r")
(unspec: [(match_operand: 1 "s_register_operand" "0")
  (match_operand:MVE_0 2 "s_register_operand" "w")]
-VMAXNMAVQ_F))
+MVE_VMAXNMxV_MINNMxVQ))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vmaxnmav.f%# %0, %q2"
+  ".f%#\t%0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1469,21 +1472,6 @@ (define_insn "@mve_q_f"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vmaxnmvq_f])
-;;
-(define_insn "mve_vmaxnmvq_f"
-  [
-   (set (match_operand: 0 "s_register_operand" "=r")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
- (match_operand:MVE_0 2 "s_register_operand" "w")]
-VMAXNMVQ_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vmaxnmv.f%#  %0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vminnmaq_f])
 ;;
@@ -1499,36 +1487,6 @@ (define_insn "mve_vminnmaq_f"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vminnmavq_f])
-;;
-(define_insn "mve_vminnmavq_f"
-  [
-   (set (match_operand: 0 "s_register_operand" "=r")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
- (match_operand:MVE_0 2 "s_register_operand" "w")]
-VMINNMAVQ_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vminnmav.f%# %0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vminnmvq_f])
-;;
-(define_insn "mve_vminnmvq_f"
-  [
-   (set (match_operand: 0 "s_register_operand" "=r")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
- (match_operand:MVE_0 2 "s_register_operand" "w")]
-VMINNMVQ_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vminnmv.f%#  %0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmlaldavq_u, vmlaldavq_s])
 ;;
@@ -3202,37 +3160,26 @@ (define_insn "mve_vmaxnmaq_m_f"
   "vpst\;vmaxnmat.f%#   %q0, %q2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
-;;
-;; [vmaxnmavq_p_f])
-;;
-(define_insn

[PATCH 07/16] arm: [MVE intrinsics] factorize vmaxnmq vminnmq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Factorize vmaxnmq and vminnmq so that they use the same pattern.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MAX_MIN_F): New.
(MVE_FP_M_BINARY): Add VMAXNMQ_M_F, VMINNMQ_M_F.
(mve_insn): Add vmaxnm, vminnm.
(max_min_f_str): New.
* config/arm/mve.md (mve_vmaxnmq_f, mve_vminnmq_f):
Merge into ...
(@mve_q_f): ... this.
(mve_vmaxnmq_m_f, mve_vminnmq_m_f): Merge into ...
(@mve_q_m_f): ... this.
---
 gcc/config/arm/iterators.md | 10 ++
 gcc/config/arm/mve.md   | 63 ++---
 2 files changed, 19 insertions(+), 54 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5bb7e2be7c8..397ac32720d 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -333,6 +333,9 @@ (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
 ;; Max/Min iterator, to factorize MVE patterns
 (define_code_iterator MAX_MIN_SU [smax umax smin umin])
 
+;; Floating-point Max/Min iterator, to factorize MVE patterns
+(define_code_iterator MAX_MIN_F [smax smin])
+
 ;; MVE integer unary operations.
 (define_int_iterator MVE_INT_M_UNARY [
 VABSQ_M_S
@@ -547,6 +550,8 @@ (define_int_iterator MVE_SHRN_M_N [
 (define_int_iterator MVE_FP_M_BINARY   [
 VABDQ_M_F
 VADDQ_M_F
+VMAXNMQ_M_F
+VMINNMQ_M_F
 VMULQ_M_F
 VSUBQ_M_F
 ])
@@ -643,11 +648,13 @@ (define_int_attr mve_insn [
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
 (VMAXAVQ_P_S "vmaxav")
 (VMAXAVQ_S "vmaxav")
+(VMAXNMQ_M_F "vmaxnm")
 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
 (VMAXVQ_P_S "vmaxv") (VMAXVQ_P_U "vmaxv")
 (VMAXVQ_S "vmaxv") (VMAXVQ_U "vmaxv")
 (VMINAVQ_P_S "vminav")
 (VMINAVQ_S "vminav")
+(VMINNMQ_M_F "vminnm")
 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
 (VMINVQ_P_S "vminv") (VMINVQ_P_U "vminv")
 (VMINVQ_S "vminv") (VMINVQ_U "vminv")
@@ -1516,6 +1523,9 @@ (define_code_attr max_min_supf [
 (smin "s") (umin "u")
 ])
 
+;; Floating-point max/min for MVE
+(define_code_attr max_min_f_str [(smax "vmaxnm") (smin "vminnm")])
+
 ;;
 ;; Int attributes
 ;;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 715e85c9998..d2863b316e0 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1455,16 +1455,17 @@ (define_insn "mve_vmaxnmavq_f"
 ])
 
 ;;
-;; [vmaxnmq_f])
+;; [vmaxnmq_f]
+;; [vminnmq_f]
 ;;
-(define_insn "mve_vmaxnmq_f"
+(define_insn "@mve_q_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
-   (match_operand:MVE_0 2 "s_register_operand" "w")))
+   (MAX_MIN_F:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vmaxnm.f%#   %q0, %q1, %q2"
+  ".f%#  %q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1513,20 +1514,6 @@ (define_insn "mve_vminnmavq_f"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vminnmq_f])
-;;
-(define_insn "mve_vminnmq_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
-   (match_operand:MVE_0 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vminnm.f%#   %q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vminnmvq_f])
 ;;
@@ -4533,8 +4520,10 @@ (define_insn "mve_vrmlsldavhaxq_p_sv4si"
 ;;
 ;; [vabdq_m_f]
 ;; [vaddq_m_f]
-;; [vsubq_m_f]
+;; [vmaxnmq_m_f]
+;; [vminnmq_m_f]
 ;; [vmulq_m_f]
+;; [vsubq_m_f]
 ;;
 (define_insn "@mve_q_m_f"
   [
@@ -4844,40 +4833,6 @@ (define_insn "mve_vfmsq_m_f"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vmaxnmq_m_f])
-;;
-(define_insn "mve_vmaxnmq_m_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-  (match_operand:MVE_0 2 "s_register_operand" "w")
-  (match_operand:MVE_0 3 "s_register_operand" "w")
-  (match_operand: 4 "vpr_register_operand" "Up")]
-VMAXNMQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vmaxnmt.f%#%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vminnmq_m_f])
-;;
-(define_insn "mve_vminnmq_m_f"
-  [
-   (set (match_operand:MVE_0 0

[PATCH 14/16] arm: [MVE intrinsics] add binary_maxamina shape

2023-05-09 Thread Christophe Lyon via Gcc-patches

This patch adds the binary_maxamina shape description.
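
For readers unfamiliar with shapes, the overload rule this one encodes
can be sketched in standalone C++ as below. TypeClass, TypeSuffix and
resolve_maxamina are hypothetical names for this example and do not exist
in GCC; in particular, rejecting a non-signed second argument inside the
resolver is an assumption of the sketch (in the patch that restriction
would come from the set of type suffixes the function is defined for).

```cpp
#include <optional>

// Hypothetical, simplified model of a type suffix such as _s8 or _u16.
enum class TypeClass { Signed, Unsigned, Float };

struct TypeSuffix {
  TypeClass cls;
  unsigned element_bits;
};

// Returns the suffix the call resolves to, or nullopt on a mismatch.
// arg0 and arg1 describe the element types of the two vector arguments.
inline std::optional<TypeSuffix> resolve_maxamina(TypeSuffix arg0,
                                                  TypeSuffix arg1) {
  // The overload is named after the signed second argument.
  if (arg1.cls != TypeClass::Signed)
    return std::nullopt;
  // The first argument must be unsigned with the same element width.
  if (arg0.cls != TypeClass::Unsigned ||
      arg0.element_bits != arg1.element_bits)
    return std::nullopt;
  return arg1;
}
```

So a call with (uint8x16_t, int8x16_t) would resolve to the _s8 variant,
while (uint16x8_t, int8x16_t) would be rejected.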

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_maxamina): New.
* config/arm/arm-mve-builtins-shapes.h (binary_maxamina): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 40 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 41 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 1324abd1c12..c9eac80d1e3 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -401,6 +401,46 @@ struct binary_rshift_def : public overloaded_base<0>
 };
 SHAPE (binary_rshift)
 
+
+/* _t vfoo[_t0](_t, _t)
+
+   i.e. binary operations that take a vector of unsigned elements as first argument and a
+   vector of signed elements as second argument, and produce a vector of unsigned elements.
+
+   Example: vminaq.
+   uint8x16_t [__arm_]vminaq[_s8](uint8x16_t a, int8x16_t b)
+   uint8x16_t [__arm_]vminaq_m[_s8](uint8x16_t a, int8x16_t b, mve_pred16_t p)  */
+struct binary_maxamina_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "vu0,vu0,vs0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (i)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+/* Check that the first argument has the expected unsigned
+   type.  */
+type_suffix_index return_type
+  = find_type_suffix (TYPE_unsigned, type_suffixes[type].element_bits);
+if (!r.require_matching_vector_type (0, return_type))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (binary_maxamina)
+
 /* _t vfoo[_](_t, _t)
 
Example: vmaxavq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index cf4c523ab1a..7f582d7375a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,7 @@ namespace arm_mve
 extern const function_shape *const binary;
 extern const function_shape *const binary_lshift;
 extern const function_shape *const binary_lshift_r;
+extern const function_shape *const binary_maxamina;
 extern const function_shape *const binary_maxavminav;
 extern const function_shape *const binary_maxvminv;
 extern const function_shape *const binary_move_narrow;
-- 
2.34.1

[PATCH 13/16] arm: [MVE intrinsics] rework vmaxnmaq vminnmaq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Implement vmaxnmaq and vminnmaq using the new MVE builtins framework.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-base.cc (vmaxnmaq, vminnmaq): New.
* config/arm/arm-mve-builtins-base.def (vmaxnmaq, vminnmaq): New.
* config/arm/arm-mve-builtins-base.h (vmaxnmaq, vminnmaq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vmaxnmaq and
vminnmaq.
* config/arm/arm_mve.h (vminnmaq): Remove.
(vmaxnmaq): Remove.
(vmaxnmaq_m): Remove.
(vminnmaq_m): Remove.
(vminnmaq_f16): Remove.
(vmaxnmaq_f16): Remove.
(vminnmaq_f32): Remove.
(vmaxnmaq_f32): Remove.
(vmaxnmaq_m_f16): Remove.
(vminnmaq_m_f16): Remove.
(vmaxnmaq_m_f32): Remove.
(vminnmaq_m_f32): Remove.
(__arm_vminnmaq_f16): Remove.
(__arm_vmaxnmaq_f16): Remove.
(__arm_vminnmaq_f32): Remove.
(__arm_vmaxnmaq_f32): Remove.
(__arm_vmaxnmaq_m_f16): Remove.
(__arm_vminnmaq_m_f16): Remove.
(__arm_vmaxnmaq_m_f32): Remove.
(__arm_vminnmaq_m_f32): Remove.
(__arm_vminnmaq): Remove.
(__arm_vmaxnmaq): Remove.
(__arm_vmaxnmaq_m): Remove.
(__arm_vminnmaq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm-mve-builtins.cc   |   2 +
 gcc/config/arm/arm_mve.h | 148 ---
 5 files changed, 8 insertions(+), 148 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index af00d070739..8082e97a7ea 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -242,12 +242,14 @@ FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
 FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
+FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
 FUNCTION_PRED_P_F (vmaxnmavq, VMAXNMAVQ)
 FUNCTION (vmaxnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMAX, -1, -1, -1, -1, -1, VMAXNMQ_M_F, -1, -1, -1))
 FUNCTION_PRED_P_F (vmaxnmvq, VMAXNMVQ)
 FUNCTION_WITH_RTX_M_NO_F (vmaxq, SMAX, UMAX, VMAXQ)
 FUNCTION_PRED_P_S_U (vmaxvq, VMAXVQ)
 FUNCTION_PRED_P_S (vminavq, VMINAVQ)
+FUNCTION_ONLY_F (vminnmaq, VMINNMAQ)
 FUNCTION_PRED_P_F (vminnmavq, VMINNMAVQ)
 FUNCTION (vminnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMIN, -1, -1, -1, -1, -1, VMINNMQ_M_F, -1, -1, -1))
 FUNCTION_PRED_P_F (vminnmvq, VMINNMVQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 19ac75c8f2e..45e2135452b 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -86,9 +86,11 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
 DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vmaxnmaq, binary, all_float, m_or_none)
 DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
 DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vmaxnmvq, binary_maxvminv, all_float, p_or_none)
+DEF_MVE_FUNCTION (vminnmaq, binary, all_float, m_or_none)
 DEF_MVE_FUNCTION (vminnmavq, binary_maxvminv, all_float, p_or_none)
 DEF_MVE_FUNCTION (vminnmq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vminnmvq, binary_maxvminv, all_float, p_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index dc413fc63df..0242c33ac94 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -34,12 +34,14 @@ extern const function_base *const veorq;
 extern const function_base *const vhaddq;
 extern const function_base *const vhsubq;
 extern const function_base *const vmaxavq;
+extern const function_base *const vmaxnmaq;
 extern const function_base *const vmaxnmavq;
 extern const function_base *const vmaxnmq;
 extern const function_base *const vmaxnmvq;
 extern const function_base *const vmaxq;
 extern const function_base *const vmaxvq;
 extern const function_base *const vminavq;
+extern const function_base *const vminnmaq;
 extern const function_base *const vminnmavq;
 extern const function_base *const vminnmq;
 extern const function_base *const vminnmvq;
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 38639f75785..5752434c968 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -670,6 +670,8 @@ function_instance::has_inactive_argument () const
 return false;
 
   if (mode_suffix_id == MODE_r
+  || base == functions::vmaxnmaq
+  || base == functions::vminnmaq
   || base ==

[PATCH 02/16] arm: [MVE intrinsics] add binary_maxavminav shape

2023-05-09 Thread Christophe Lyon via Gcc-patches

This patch adds the binary_maxavminav shape description.
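
The scalar-in, scalar-out signature makes more sense next to a reference
model of the operation. The sketch below assumes the usual Arm
description of VMAXAV (an unsigned running maximum of the absolute values
of the signed lanes, seeded with the scalar argument); ref_vmaxavq_s8 and
abs_u8 are hypothetical helper names, not GCC or ACLE functions.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using S8x16 = std::array<std::int8_t, 16>;

// |x| as an unsigned value; the widening keeps INT8_MIN well defined (128).
inline std::uint8_t abs_u8(std::int8_t x) {
  int wide = x;
  return static_cast<std::uint8_t>(wide < 0 ? -wide : wide);
}

// Assumed model of vmaxavq_s8: fold the absolute values of all lanes of b
// into the unsigned scalar a, keeping the maximum.
inline std::uint8_t ref_vmaxavq_s8(std::uint8_t a, const S8x16 &b) {
  std::uint8_t result = a;
  for (std::int8_t lane : b)
    result = std::max(result, abs_u8(lane));
  return result;
}
```

This is why the shape string takes an unsigned scalar first and a signed
vector second, and returns an unsigned scalar.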

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_maxavminav): New.
* config/arm/arm-mve-builtins-shapes.h (binary_maxavminav): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 30 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 31 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 19c3c47a20e..1324abd1c12 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -401,6 +401,36 @@ struct binary_rshift_def : public overloaded_base<0>
 };
 SHAPE (binary_rshift)
 
+/* _t vfoo[_](_t, _t)
+
+   Example: vmaxavq.
+   uint8_t [__arm_]vmaxavq[_s8](uint8_t a, int8x16_t b)
+   uint8_t [__arm_]vmaxavq_p[_s8](uint8_t a, int8x16_t b, mve_pred16_t p)  */
+struct binary_maxavminav_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "su0,su0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || !r.require_derived_scalar_type (0, TYPE_unsigned)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (binary_maxavminav)
+
 /* _t vfoo[_](_t, _t)
 
Example: vmaxvq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 9debf1d8733..cf4c523ab1a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,7 @@ namespace arm_mve
 extern const function_shape *const binary;
 extern const function_shape *const binary_lshift;
 extern const function_shape *const binary_lshift_r;
+extern const function_shape *const binary_maxavminav;
 extern const function_shape *const binary_maxvminv;
 extern const function_shape *const binary_move_narrow;
 extern const function_shape *const binary_move_narrow_unsigned;
-- 
2.34.1

[PATCH 11/16] arm: [MVE intrinsics] rework vmaxnmavq vmaxnmvq vminnmavq vminnmvq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Implement vmaxnmavq vmaxnmvq vminnmavq vminnmvq using the new MVE
builtins framework.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_PRED_P_F): New.
(vmaxnmavq, vmaxnmvq, vminnmavq, vminnmvq): New.
* config/arm/arm-mve-builtins-base.def (vmaxnmavq, vmaxnmvq)
(vminnmavq, vminnmvq): New.
* config/arm/arm-mve-builtins-base.h (vmaxnmavq, vmaxnmvq)
(vminnmavq, vminnmvq): New.
* config/arm/arm_mve.h (vminnmvq): Remove.
(vminnmavq): Remove.
(vmaxnmvq): Remove.
(vmaxnmavq): Remove.
(vmaxnmavq_p): Remove.
(vmaxnmvq_p): Remove.
(vminnmavq_p): Remove.
(vminnmvq_p): Remove.
(vminnmvq_f16): Remove.
(vminnmavq_f16): Remove.
(vmaxnmvq_f16): Remove.
(vmaxnmavq_f16): Remove.
(vminnmvq_f32): Remove.
(vminnmavq_f32): Remove.
(vmaxnmvq_f32): Remove.
(vmaxnmavq_f32): Remove.
(vmaxnmavq_p_f16): Remove.
(vmaxnmvq_p_f16): Remove.
(vminnmavq_p_f16): Remove.
(vminnmvq_p_f16): Remove.
(vmaxnmavq_p_f32): Remove.
(vmaxnmvq_p_f32): Remove.
(vminnmavq_p_f32): Remove.
(vminnmvq_p_f32): Remove.
(__arm_vminnmvq_f16): Remove.
(__arm_vminnmavq_f16): Remove.
(__arm_vmaxnmvq_f16): Remove.
(__arm_vmaxnmavq_f16): Remove.
(__arm_vminnmvq_f32): Remove.
(__arm_vminnmavq_f32): Remove.
(__arm_vmaxnmvq_f32): Remove.
(__arm_vmaxnmavq_f32): Remove.
(__arm_vmaxnmavq_p_f16): Remove.
(__arm_vmaxnmvq_p_f16): Remove.
(__arm_vminnmavq_p_f16): Remove.
(__arm_vminnmvq_p_f16): Remove.
(__arm_vmaxnmavq_p_f32): Remove.
(__arm_vmaxnmvq_p_f32): Remove.
(__arm_vminnmavq_p_f32): Remove.
(__arm_vminnmvq_p_f32): Remove.
(__arm_vminnmvq): Remove.
(__arm_vminnmavq): Remove.
(__arm_vmaxnmvq): Remove.
(__arm_vmaxnmavq): Remove.
(__arm_vmaxnmavq_p): Remove.
(__arm_vmaxnmvq_p): Remove.
(__arm_vminnmavq_p): Remove.
(__arm_vminnmvq_p): Remove.
(__arm_vmaxnmavq_m): Remove.
(__arm_vmaxnmvq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm_mve.h | 314 ---
 4 files changed, 18 insertions(+), 314 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index dcbd1906563..af00d070739 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -225,6 +225,12 @@ namespace arm_mve {
    (UNSPEC##_S, UNSPEC##_U, -1,                    \
     UNSPEC##_P_S, UNSPEC##_P_U, -1))
 
+  /* Helper for builtins without RTX codes, _F mode, _p predicated.  */
+#define FUNCTION_PRED_P_F(NAME, UNSPEC) FUNCTION   \
+  (NAME, unspec_mve_function_exact_insn_pred_p,    \
+   (-1, -1, UNSPEC##_F,                            \
+    -1, -1, UNSPEC##_P_F))
+
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, -1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
@@ -236,11 +242,15 @@ FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
 FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
+FUNCTION_PRED_P_F (vmaxnmavq, VMAXNMAVQ)
 FUNCTION (vmaxnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMAX, -1, -1, -1, -1, -1, VMAXNMQ_M_F, -1, -1, -1))
+FUNCTION_PRED_P_F (vmaxnmvq, VMAXNMVQ)
 FUNCTION_WITH_RTX_M_NO_F (vmaxq, SMAX, UMAX, VMAXQ)
 FUNCTION_PRED_P_S_U (vmaxvq, VMAXVQ)
 FUNCTION_PRED_P_S (vminavq, VMINAVQ)
+FUNCTION_PRED_P_F (vminnmavq, VMINNMAVQ)
 FUNCTION (vminnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMIN, -1, -1, -1, -1, -1, VMINNMQ_M_F, -1, -1, -1))
+FUNCTION_PRED_P_F (vminnmvq, VMINNMVQ)
 FUNCTION_WITH_RTX_M_NO_F (vminq, SMIN, UMIN, VMINQ)
 FUNCTION_PRED_P_S_U (vminvq, VMINVQ)
 FUNCTION_WITHOUT_N_NO_F (vmovnbq, VMOVNBQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index c2155bafeb3..19ac75c8f2e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -86,8 +86,12 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
 DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
 DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vmaxnmvq, binary_maxvminv,

[PATCH 16/16] arm: [MVE intrinsics] rework vmaxaq vminaq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Implement vmaxaq and vminaq using the new MVE builtins framework.
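
For reference, the lane-wise behaviour these intrinsics are expected to
have can be modelled in scalar C++ as below. This assumes the usual Arm
description of VMAXA/VMINA (unsigned comparison of each destination lane
with the absolute value of the corresponding signed source lane); the
ref_* helpers and type aliases are hypothetical names for this sketch and
model only the unpredicated 8-bit variants.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

using U8x16 = std::array<std::uint8_t, 16>;
using S8x16 = std::array<std::int8_t, 16>;

// |x| as an unsigned value; the widening keeps INT8_MIN well defined (128).
inline std::uint8_t abs_u8(std::int8_t x) {
  int wide = x;
  return static_cast<std::uint8_t>(wide < 0 ? -wide : wide);
}

// Assumed model of vmaxaq_s8: per lane, max (a[i], |b[i]|), unsigned.
inline U8x16 ref_vmaxaq_s8(U8x16 a, const S8x16 &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = std::max(a[i], abs_u8(b[i]));
  return a;
}

// Assumed model of vminaq_s8: per lane, min (a[i], |b[i]|), unsigned.
inline U8x16 ref_vminaq_s8(U8x16 a, const S8x16 &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = std::min(a[i], abs_u8(b[i]));
  return a;
}
```

A model like this can serve as an oracle when checking that the reworked
intrinsics still produce the same results as the old arm_mve.h wrappers.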

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmaxaq, vminaq): New.
* config/arm/arm-mve-builtins-base.def (vmaxaq, vminaq): New.
* config/arm/arm-mve-builtins-base.h (vmaxaq, vminaq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vmaxaq and
vminaq.
* config/arm/arm_mve.h (vminaq): Remove.
(vmaxaq): Remove.
(vminaq_m): Remove.
(vmaxaq_m): Remove.
(vminaq_s8): Remove.
(vmaxaq_s8): Remove.
(vminaq_s16): Remove.
(vmaxaq_s16): Remove.
(vminaq_s32): Remove.
(vmaxaq_s32): Remove.
(vminaq_m_s8): Remove.
(vmaxaq_m_s8): Remove.
(vminaq_m_s16): Remove.
(vmaxaq_m_s16): Remove.
(vminaq_m_s32): Remove.
(vmaxaq_m_s32): Remove.
(__arm_vminaq_s8): Remove.
(__arm_vmaxaq_s8): Remove.
(__arm_vminaq_s16): Remove.
(__arm_vmaxaq_s16): Remove.
(__arm_vminaq_s32): Remove.
(__arm_vmaxaq_s32): Remove.
(__arm_vminaq_m_s8): Remove.
(__arm_vmaxaq_m_s8): Remove.
(__arm_vminaq_m_s16): Remove.
(__arm_vmaxaq_m_s16): Remove.
(__arm_vminaq_m_s32): Remove.
(__arm_vmaxaq_m_s32): Remove.
(__arm_vminaq): Remove.
(__arm_vmaxaq): Remove.
(__arm_vminaq_m): Remove.
(__arm_vmaxaq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm-mve-builtins.cc   |   2 +
 gcc/config/arm/arm_mve.h | 240 ---
 5 files changed, 8 insertions(+), 240 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 8082e97a7ea..edca0d9ac6c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -242,6 +242,7 @@ FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
 FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
+FUNCTION_WITHOUT_N_NO_U_F (vmaxaq, VMAXAQ)
 FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
 FUNCTION_PRED_P_F (vmaxnmavq, VMAXNMAVQ)
 FUNCTION (vmaxnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMAX, -1, -1, -1, -1, -1, VMAXNMQ_M_F, -1, -1, -1))
@@ -249,6 +250,7 @@ FUNCTION_PRED_P_F (vmaxnmvq, VMAXNMVQ)
 FUNCTION_WITH_RTX_M_NO_F (vmaxq, SMAX, UMAX, VMAXQ)
 FUNCTION_PRED_P_S_U (vmaxvq, VMAXVQ)
 FUNCTION_PRED_P_S (vminavq, VMINAVQ)
+FUNCTION_WITHOUT_N_NO_U_F (vminaq, VMINAQ)
 FUNCTION_ONLY_F (vminnmaq, VMINNMAQ)
 FUNCTION_PRED_P_F (vminnmavq, VMINNMAVQ)
 FUNCTION (vminnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMIN, -1, -1, -1, -1, -1, VMINNMQ_M_F, -1, -1, -1))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 45e2135452b..48a07c8d888 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -28,9 +28,11 @@ DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vmaxq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmaxvq, binary_maxvminv, all_integer, p_or_none)
+DEF_MVE_FUNCTION (vminaq, binary_maxamina, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vminavq, binary_maxavminav, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vminq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vminvq, binary_maxvminv, all_integer, p_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 0242c33ac94..31417435f6f 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -33,6 +33,7 @@ extern const function_base *const vcreateq;
 extern const function_base *const veorq;
 extern const function_base *const vhaddq;
 extern const function_base *const vhsubq;
+extern const function_base *const vmaxaq;
 extern const function_base *const vmaxavq;
 extern const function_base *const vmaxnmaq;
 extern const function_base *const vmaxnmavq;
@@ -40,6 +41,7 @@ extern const function_base *const vmaxnmq;
 extern const function_base *const vmaxnmvq;
 extern const function_base *const vmaxq;
 extern const function_base *const vmaxvq;
+extern const function_base *const vminaq;
 extern const function_base *const vminavq;
 extern const function_base *const vminnmaq;
 extern const function_base *const vminnmavq;
diff --git a/gcc/config/arm/arm-mve-builtins.cc

[PATCH 08/16] arm: [MVE intrinsics] rework vmaxnmq vminnmq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Implement vmaxnmq and vminnmq using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmaxnmq, vminnmq): New.
* config/arm/arm-mve-builtins-base.def (vmaxnmq, vminnmq): New.
* config/arm/arm-mve-builtins-base.h (vmaxnmq, vminnmq): New.
* config/arm/arm_mve.h (vminnmq): Remove.
(vmaxnmq): Remove.
(vmaxnmq_m): Remove.
(vminnmq_m): Remove.
(vminnmq_x): Remove.
(vmaxnmq_x): Remove.
(vminnmq_f16): Remove.
(vmaxnmq_f16): Remove.
(vminnmq_f32): Remove.
(vmaxnmq_f32): Remove.
(vmaxnmq_m_f32): Remove.
(vmaxnmq_m_f16): Remove.
(vminnmq_m_f32): Remove.
(vminnmq_m_f16): Remove.
(vminnmq_x_f16): Remove.
(vminnmq_x_f32): Remove.
(vmaxnmq_x_f16): Remove.
(vmaxnmq_x_f32): Remove.
(__arm_vminnmq_f16): Remove.
(__arm_vmaxnmq_f16): Remove.
(__arm_vminnmq_f32): Remove.
(__arm_vmaxnmq_f32): Remove.
(__arm_vmaxnmq_m_f32): Remove.
(__arm_vmaxnmq_m_f16): Remove.
(__arm_vminnmq_m_f32): Remove.
(__arm_vminnmq_m_f16): Remove.
(__arm_vminnmq_x_f16): Remove.
(__arm_vminnmq_x_f32): Remove.
(__arm_vmaxnmq_x_f16): Remove.
(__arm_vmaxnmq_x_f32): Remove.
(__arm_vminnmq): Remove.
(__arm_vmaxnmq): Remove.
(__arm_vmaxnmq_m): Remove.
(__arm_vminnmq_m): Remove.
(__arm_vminnmq_x): Remove.
(__arm_vmaxnmq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 224 ---
 4 files changed, 6 insertions(+), 224 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index cfab3f222ed..dcbd1906563 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -236,9 +236,11 @@ FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
 FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
+FUNCTION (vmaxnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMAX, -1, -1, -1, -1, -1, VMAXNMQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_NO_F (vmaxq, SMAX, UMAX, VMAXQ)
 FUNCTION_PRED_P_S_U (vmaxvq, VMAXVQ)
 FUNCTION_PRED_P_S (vminavq, VMINAVQ)
+FUNCTION (vminnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMIN, -1, -1, -1, -1, -1, VMINNMQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_NO_F (vminq, SMIN, UMIN, VMINQ)
 FUNCTION_PRED_P_S_U (vminvq, VMINVQ)
 FUNCTION_WITHOUT_N_NO_F (vmovnbq, VMOVNBQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index d06e134719e..c2155bafeb3 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -86,6 +86,8 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
 DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vminnmq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vnegq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index 30e0f42a352..0290ee72b4c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -34,9 +34,11 @@ extern const function_base *const veorq;
 extern const function_base *const vhaddq;
 extern const function_base *const vhsubq;
 extern const function_base *const vmaxavq;
+extern const function_base *const vmaxnmq;
 extern const function_base *const vmaxq;
 extern const function_base *const vmaxvq;
 extern const function_base *const vminavq;
+extern const function_base *const vminnmq;
 extern const function_base *const vminq;
 extern const function_base *const vminvq;
 extern const function_base *const vmovnbq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index dddaab74bc0..12e77eee11e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -309,11 +309,9 @@
 #define vcvtq(__a) __arm_vcvtq(__a)
 #define vcvtq_n(__a, __imm6) __arm_vcvtq_n(__a, __imm6)
 #define vminnmvq(__a, __b) __arm_vminnmvq(__a, __b)
-#define vminnmq(__a, __b) __arm_vminnmq(__a, __b)
 #define vminnmavq(__a, __b) __arm_vminnmavq(__a, __b)
 #define vminnmaq(__a, __b) __arm_vminnmaq(__a, __b)
 #define vmaxnmvq(__a, __b) __arm_vmaxnmvq(__a, __b)
-#define vmaxnmq(__a, __b) __arm_vmaxnmq(__a, __b)
 #define vmaxnmavq(__a, __b) __arm_vmaxnmavq(__a, __b)
 #define vmaxnmaq(__a, __b)

[PATCH 04/16] arm: [MVE intrinsics] factorize vmaxvq vminvq vmaxavq vminavq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Factorize vmaxvq vminvq vmaxavq vminavq so that they use the same
pattern.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_VMAXVQ_VMINVQ, MVE_VMAXVQ_VMINVQ_P): New.
(mve_insn): Add vmaxav, vmaxv, vminav, vminv.
(supf): Add VMAXAVQ_S, VMAXAVQ_P_S, VMINAVQ_S, VMINAVQ_P_S.
* config/arm/mve.md (mve_vmaxavq_s, mve_vmaxvq_)
(mve_vminavq_s, mve_vminvq_): Merge into ...
(@mve_q_): ... this.
(mve_vmaxavq_p_s, mve_vmaxvq_p_)
(mve_vminavq_p_s, mve_vminvq_p_): Merge into ...
(@mve_q_p_): ... this.
---
 gcc/config/arm/iterators.md |  26 
 gcc/config/arm/mve.md   | 115 +---
 2 files changed, 40 insertions(+), 101 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e82ff0d5d9b..5bb7e2be7c8 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -578,6 +578,20 @@ (define_int_iterator MVE_FP_CREATE_ONLY [
 VCREATEQ_F
 ])
 
+(define_int_iterator MVE_VMAXVQ_VMINVQ [
+VMAXAVQ_S
+VMAXVQ_S VMAXVQ_U
+VMINAVQ_S
+VMINVQ_S VMINVQ_U
+])
+
+(define_int_iterator MVE_VMAXVQ_VMINVQ_P [
+VMAXAVQ_P_S
+VMAXVQ_P_S VMAXVQ_P_U
+VMINAVQ_P_S
+VMINVQ_P_S VMINVQ_P_U
+])
+
 (define_int_iterator MVE_MOVN [
 VMOVNBQ_S VMOVNBQ_U
 VMOVNTQ_S VMOVNTQ_U
@@ -627,8 +641,16 @@ (define_int_attr mve_insn [
 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
+(VMAXAVQ_P_S "vmaxav")
+(VMAXAVQ_S "vmaxav")
 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
+(VMAXVQ_P_S "vmaxv") (VMAXVQ_P_U "vmaxv")
+(VMAXVQ_S "vmaxv") (VMAXVQ_U "vmaxv")
+(VMINAVQ_P_S "vminav")
+(VMINAVQ_S "vminav")
 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
+(VMINVQ_P_S "vminv") (VMINVQ_P_U "vminv")
+(VMINVQ_S "vminv") (VMINVQ_U "vminv")
 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
 (VMOVNBQ_M_S "vmovnb") (VMOVNBQ_M_U "vmovnb")
@@ -1992,6 +2014,10 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U 
"u") (VREV16Q_S "s")
   (VQMOVUNBQ_S "s")
   (VQMOVUNTQ_M_S "s")
   (VQMOVUNTQ_S "s")
+  (VMAXAVQ_S "s")
+  (VMAXAVQ_P_S "s")
+  (VMINAVQ_S "s")
+  (VMINAVQ_P_S "s")
   ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 98728e6f3ef..715e85c9998 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -962,21 +962,6 @@ (define_insn "mve_vmaxaq_s"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vmaxavq_s])
-;;
-(define_insn "mve_vmaxavq_s"
-  [
-   (set (match_operand: 0 "s_register_operand" "=r")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
- (match_operand:MVE_2 2 "s_register_operand" "w")]
-VMAXAVQ_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmaxav.s%#\t%0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmaxq_u, vmaxq_s]
 ;; [vminq_s, vminq_u]
@@ -994,17 +979,20 @@ (define_insn "mve_q_"
 
 
 ;;
-;; [vmaxvq_u, vmaxvq_s])
+;; [vmaxavq_s]
+;; [vmaxvq_u, vmaxvq_s]
+;; [vminavq_s]
+;; [vminvq_u, vminvq_s]
 ;;
-(define_insn "mve_vmaxvq_"
+(define_insn "@mve_q_"
   [
(set (match_operand: 0 "s_register_operand" "=r")
(unspec: [(match_operand: 1 "s_register_operand" "0")
  (match_operand:MVE_2 2 "s_register_operand" "w")]
-VMAXVQ))
+MVE_VMAXVQ_VMINVQ))
   ]
   "TARGET_HAVE_MVE"
-  "vmaxv.%#\t%0, %q2"
+  ".%#\t%0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1023,36 +1011,6 @@ (define_insn "mve_vminaq_s"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vminavq_s])
-;;
-(define_insn "mve_vminavq_s"
-  [
-   (set (match_operand: 0 "s_register_operand" "=r")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
- (match_operand:MVE_2 2 "s_register_operand" "w")]
-VMINAVQ_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vminav.s%#\t%0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vminvq_u, vminvq_s])
-;;
-(define_insn "mve_vminvq_"
-  [
-   (set (match_operand: 0 "s_register_operand" "=r")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
- (match_operand:MVE_2 2 "s_register_operand" "w")]
-VMINVQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vminv.%#\t%0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;

[PATCH 10/16] arm: [MVE intrinsics] add support for mve_q_p_f

2023-05-09 Thread Christophe Lyon via Gcc-patches

We can call code_for_mve_q_p_f only once this function exists, which
is the case after we factorized vmaxnmavq, vmaxnmvq, vminnmavq and
vminnmvq in a previous patch.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-functions.h
(unspec_mve_function_exact_insn_pred_p): Use code_for_mve_q_p_f.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
b/gcc/config/arm/arm-mve-builtins-functions.h
index bf4e209a720..ddedbb2a8e1 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -428,7 +428,7 @@ public:
  else
code = code_for_mve_q_p (m_unspec_for_p_sint, m_unspec_for_p_sint, 
e.vector_mode (0));
else
- gcc_unreachable ();  /* Will be fixed later in the series.  */
+ code = code_for_mve_q_p_f (m_unspec_for_p_fp, e.vector_mode (0));
 
return e.use_exact_insn (code);
 
-- 
2.34.1

[PATCH 12/16] arm: [MVE intrinsics] factorize vmaxnmaq vminnmaq

2023-05-09 Thread Christophe Lyon via Gcc-patches

Factorize vmaxnmaq and vminnmaq so that they use the same pattern.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_VMAXNMA_VMINNMAQ)
(MVE_VMAXNMA_VMINNMAQ_M): New.
(mve_insn): Add vmaxnma, vminnma.
* config/arm/mve.md (mve_vmaxnmaq_f, mve_vminnmaq_f):
Merge into ...
(@mve_q_f): ... this.
(mve_vmaxnmaq_m_f, mve_vminnmaq_m_f): Merge into ...
(@mve_q_m_f): ... this.
---
 gcc/config/arm/iterators.md | 14 +++
 gcc/config/arm/mve.md   | 49 -
 2 files changed, 24 insertions(+), 39 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 26ad687cefd..8edbf5a55cf 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -611,6 +611,16 @@ (define_int_iterator MVE_VMAXNMxV_MINNMxVQ_P [
 VMINNMVQ_P_F
 ])
 
+(define_int_iterator MVE_VMAXNMA_VMINNMAQ [
+VMAXNMAQ_F
+VMINNMAQ_F
+])
+
+(define_int_iterator MVE_VMAXNMA_VMINNMAQ_M [
+VMAXNMAQ_M_F
+VMINNMAQ_M_F
+])
+
 (define_int_iterator MVE_MOVN [
 VMOVNBQ_S VMOVNBQ_U
 VMOVNTQ_S VMOVNTQ_U
@@ -662,6 +672,8 @@ (define_int_attr mve_insn [
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
 (VMAXAVQ_P_S "vmaxav")
 (VMAXAVQ_S "vmaxav")
+(VMAXNMAQ_F "vmaxnma")
+(VMAXNMAQ_M_F "vmaxnma")
 (VMAXNMAVQ_F "vmaxnmav")
 (VMAXNMAVQ_P_F "vmaxnmav")
 (VMAXNMQ_M_F "vmaxnm")
@@ -672,6 +684,8 @@ (define_int_attr mve_insn [
 (VMAXVQ_S "vmaxv") (VMAXVQ_U "vmaxv")
 (VMINAVQ_P_S "vminav")
 (VMINAVQ_S "vminav")
+(VMINNMAQ_F "vminnma")
+(VMINNMAQ_M_F "vminnma")
 (VMINNMAVQ_F "vminnmav")
 (VMINNMAVQ_P_F "vminnmav")
 (VMINNMQ_M_F "vminnm")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 2aebaa99bbf..ef0b6fd3ded 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1425,17 +1425,18 @@ (define_insn "mve_veorq_f"
 ])
 
 ;;
-;; [vmaxnmaq_f])
+;; [vmaxnmaq_f]
+;; [vminnmaq_f]
 ;;
-(define_insn "mve_vmaxnmaq_f"
+(define_insn "@mve_q_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
   (match_operand:MVE_0 2 "s_register_operand" "w")]
-VMAXNMAQ_F))
+MVE_VMAXNMA_VMINNMAQ))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vmaxnma.f%#  %q0, %q2"
+  ".f%#\t%q0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1472,21 +1473,6 @@ (define_insn "@mve_q_f"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vminnmaq_f])
-;;
-(define_insn "mve_vminnmaq_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-  (match_operand:MVE_0 2 "s_register_operand" "w")]
-VMINNMAQ_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vminnma.f%#  %q0, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmlaldavq_u, vmlaldavq_s])
 ;;
@@ -3146,18 +3132,19 @@ (define_insn "mve_vfmsq_f"
 ])
 
 ;;
-;; [vmaxnmaq_m_f])
+;; [vmaxnmaq_m_f]
+;; [vminnmaq_m_f]
 ;;
-(define_insn "mve_vmaxnmaq_m_f"
+(define_insn "@mve_q_m_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
   (match_operand:MVE_0 2 "s_register_operand" "w")
   (match_operand: 3 "vpr_register_operand" 
"Up")]
-VMAXNMAQ_M_F))
+MVE_VMAXNMA_VMINNMAQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vmaxnmat.f%#   %q0, %q2"
+  "vpst\;t.f%#\t%q0, %q2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
@@ -3180,22 +3167,6 @@ (define_insn "@mve_q_p_f"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vminnmaq_m_f])
-;;
-(define_insn "mve_vminnmaq_m_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-  (match_operand:MVE_0 2 "s_register_operand" "w")
-  (match_operand: 3 "vpr_register_operand" 
"Up")]
-VMINNMAQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vminnmat.f%#   %q0, %q2"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmlaldavaq_s, vmlaldavaq_u])
 ;;
-- 
2.34.1

[PATCH 06/16] arm: add smax/smin expanders for v*hf

2023-05-09 Thread Christophe Lyon via Gcc-patches

This patch adds the missing expanders for smax/smin for v*hf modes.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/vec-common.md (smin3): New.
(smax3): New.
---
 gcc/config/arm/vec-common.md | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b5fc86fdf28..1f9b7992da4 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -116,6 +116,13 @@ (define_expand "smin3"
"ARM_HAVE__ARITH"
 )
 
+(define_expand "smin3"
+  [(set (match_operand:VH 0 "s_register_operand")
+   (smin:VH (match_operand:VH 1 "s_register_operand")
+(match_operand:VH 2 "s_register_operand")))]
+   "ARM_HAVE__ARITH"
+)
+
 (define_expand "umin3"
   [(set (match_operand:VINTW 0 "s_register_operand")
(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
@@ -130,6 +137,13 @@ (define_expand "smax3"
"ARM_HAVE__ARITH"
 )
 
+(define_expand "smax3"
+  [(set (match_operand:VH 0 "s_register_operand")
+   (smax:VH (match_operand:VH 1 "s_register_operand")
+(match_operand:VH 2 "s_register_operand")))]
+   "ARM_HAVE__ARITH"
+)
+
 (define_expand "umax3"
   [(set (match_operand:VINTW 0 "s_register_operand")
(umax:VINTW (match_operand:VINTW 1 "s_register_operand")
-- 
2.34.1

[PATCH 01/16] arm: [MVE intrinsics] add binary_maxvminv shape

2023-05-09 Thread Christophe Lyon via Gcc-patches

This patch adds the binary_maxvminv shape description.
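
For readers who do not know these intrinsics, the shape takes a scalar and a vector and returns a scalar. Below is a rough portable model of the s8 variants. It is only a sketch: model_vmaxvq_s8, model_vmaxvq_p_s8 and the std::array stand-in for int8x16_t are hypothetical names, not part of arm_mve.h, and the predicate is assumed to carry one bit per 8-bit lane.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for the MVE types; real code would use arm_mve.h.
using model_int8x16 = std::array<std::int8_t, 16>;
using model_pred16 = std::uint16_t;

// Model of int8_t vmaxvq_s8 (int8_t a, int8x16_t b): the maximum of the
// scalar A and every lane of B.
inline std::int8_t
model_vmaxvq_s8 (std::int8_t a, const model_int8x16 &b)
{
  std::int8_t result = a;
  for (std::int8_t lane : b)
    if (lane > result)
      result = lane;
  return result;
}

// Model of the "_p" form: only lanes whose predicate bit is set take part.
// Assumes one predicate bit per 8-bit lane.
inline std::int8_t
model_vmaxvq_p_s8 (std::int8_t a, const model_int8x16 &b, model_pred16 p)
{
  std::int8_t result = a;
  for (unsigned i = 0; i < b.size (); ++i)
    if (((p >> i) & 1) && b[i] > result)
      result = b[i];
  return result;
}
```

With an all-zero predicate the model returns the scalar argument unchanged, since no lane takes part in the comparison.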

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_maxvminv): New.
* config/arm/arm-mve-builtins-shapes.h (binary_maxvminv): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 30 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 31 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 1d43b8871bf..19c3c47a20e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -401,6 +401,36 @@ struct binary_rshift_def : public overloaded_base<0>
 };
 SHAPE (binary_rshift)
 
+/* _t vfoo[_](_t, _t)
+
+   Example: vmaxvq.
+   int8_t [__arm_]vmaxvq[_s8](int8_t a, int8x16_t b)
+   int8_t [__arm_]vmaxvq_p[_s8](int8_t a, int8x16_t b, mve_pred16_t p)  */
+struct binary_maxvminv_def : public overloaded_base<0>
+{
+  void
+  build (function_builder , const function_group_info ,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "s0,s0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver ) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || !r.require_derived_scalar_type (0, r.SAME_TYPE_CLASS)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (binary_maxvminv)
+
 /* _t vfoo[_t0](_t, _t)
 
Example: vmovnbq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index dd2597dc6f5..9debf1d8733 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,7 @@ namespace arm_mve
 extern const function_shape *const binary;
 extern const function_shape *const binary_lshift;
 extern const function_shape *const binary_lshift_r;
+extern const function_shape *const binary_maxvminv;
 extern const function_shape *const binary_move_narrow;
 extern const function_shape *const binary_move_narrow_unsigned;
 extern const function_shape *const binary_opt_n;
-- 
2.34.1

[PATCH 03/16] arm: [MVE intrinsics] add unspec_mve_function_exact_insn_pred_p

2023-05-09 Thread Christophe Lyon via Gcc-patches

Introduce a function that will be used to build intrinsics that use p
predication.
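
The selection this class performs can be modelled outside GCC roughly as follows. This is a simplified sketch: pick_unspec, pred_kind and unspec_set are hypothetical names, the unspec codes are plain ints, and the real class goes on to look up an insn code instead of returning the unspec. In this patch the floating-point "_p" case is still unreachable and is filled in later in the series; the model simply returns the stored value.

```cpp
// Simplified model of how the new class picks an unspec from the
// predication kind and the type suffix.  All names are illustrative.
enum pred_kind { PRED_NONE, PRED_P };

struct unspec_set
{
  int for_sint, for_uint, for_fp;       // no predicate
  int for_p_sint, for_p_uint, for_p_fp; // "_p" predicate
};

// Returns the unspec that would be used for the given predication kind
// and element type.
inline int
pick_unspec (const unspec_set &u, pred_kind pred, bool integer_p,
             bool unsigned_p)
{
  if (pred == PRED_NONE)
    {
      if (!integer_p)
        return u.for_fp;
      return unsigned_p ? u.for_uint : u.for_sint;
    }

  // PRED_P: same decision tree, using the "_p" unspecs.
  if (!integer_p)
    return u.for_p_fp;
  return unsigned_p ? u.for_p_uint : u.for_p_sint;
}
```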

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_pred_p): New.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 64 +
 1 file changed, 64 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
b/gcc/config/arm/arm-mve-builtins-functions.h
index 533fd1159c6..bf4e209a720 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -376,6 +376,70 @@ public:
   }
 };
 
+/* Map the function directly to CODE (UNSPEC), when there is a
+   non-predicated version and one with the "_p" predicate.  */
+class unspec_mve_function_exact_insn_pred_p : public function_base
+{
+public:
+  CONSTEXPR unspec_mve_function_exact_insn_pred_p (int unspec_for_sint,
+  int unspec_for_uint,
+  int unspec_for_fp,
+  int unspec_for_p_sint,
+  int unspec_for_p_uint,
+  int unspec_for_p_fp)
+: m_unspec_for_sint (unspec_for_sint),
+  m_unspec_for_uint (unspec_for_uint),
+  m_unspec_for_fp (unspec_for_fp),
+  m_unspec_for_p_sint (unspec_for_p_sint),
+  m_unspec_for_p_uint (unspec_for_p_uint),
+  m_unspec_for_p_fp (unspec_for_p_fp)
+  {}
+
+  /* The unspec code associated with signed-integer and unsigned-integer
+ operations, with no predicate, or with "_p" predicate.  */
+  int m_unspec_for_sint;
+  int m_unspec_for_uint;
+  int m_unspec_for_fp;
+  int m_unspec_for_p_sint;
+  int m_unspec_for_p_uint;
+  int m_unspec_for_p_fp;
+
+  rtx
+  expand (function_expander ) const override
+  {
+insn_code code;
+switch (e.pred)
+  {
+  case PRED_none:
+   if (e.type_suffix (0).integer_p)
+ if (e.type_suffix (0).unsigned_p)
+   code = code_for_mve_q (m_unspec_for_uint, m_unspec_for_uint, 
e.vector_mode (0));
+ else
+   code = code_for_mve_q (m_unspec_for_sint, m_unspec_for_sint, 
e.vector_mode (0));
+   else
+ code = code_for_mve_q_f (m_unspec_for_fp, e.vector_mode (0));
+
+   return e.use_exact_insn (code);
+
+  case PRED_p:
+   if (e.type_suffix (0).integer_p)
+ if (e.type_suffix (0).unsigned_p)
+   code = code_for_mve_q_p (m_unspec_for_p_uint, m_unspec_for_p_uint, 
e.vector_mode (0));
+ else
+   code = code_for_mve_q_p (m_unspec_for_p_sint, m_unspec_for_p_sint, 
e.vector_mode (0));
+   else
+ gcc_unreachable ();  /* Will be fixed later in the series.  */
+
+   return e.use_exact_insn (code);
+
+  default:
+   gcc_unreachable ();
+  }
+
+gcc_unreachable ();
+  }
+};
+
 /* Map the function directly to CODE (UNSPEC, M) for vshl-like
builtins. The difference with unspec_mve_function_exact_insn is
that this function handles MODE_r and the related unspecs..  */
-- 
2.34.1

[PATCH][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-05-09 Thread Alex Coplan via Gcc-patches

Hi,

This patch implements clang's __has_feature and __has_extension in GCC.
Currently the patch aims to implement all documented features (and some
undocumented ones) following the documentation at
https://clang.llvm.org/docs/LanguageExtensions.html with the following
omissions:
 - C++ type traits.
 - Objective-C-specific features.
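
As an illustration of how user code typically consumes these macros, here is a minimal sketch. The fallback definition follows the usual idiom for compilers that lack __has_feature; built_with_asan is a hypothetical helper name, not something the patch adds.

```cpp
// Compilers without __has_feature simply report "no" for everything.
#ifndef __has_feature
#  define __has_feature(x) 0
#endif

// True when the compiler reports the address_sanitizer feature, i.e.
// when this translation unit is being built with AddressSanitizer.
constexpr bool
built_with_asan ()
{
#if __has_feature(address_sanitizer)
  return true;
#else
  return false;
#endif
}
```

Code written this way keeps building on compilers that predate the macro, where every feature query falls back to 0.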

C++ type traits aren't currently implemented since, as the clang
documentation notes, __has_builtin is the correct "modern" way to query
for these (which GCC already implements). Of course there's an argument
that we should recognize the legacy set of C++ type traits that can be
queried through __has_feature for backwards compatibility with older
code. I'm happy to do this if reviewers think that's a good idea.

There are some comments in the patch marked with XXX, I'm looking for
review comments from C/C++ maintainers on those areas in particular.

Bootstrapped/regtested on aarch64-linux-gnu. Any comments?

Thanks,
Alex
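
For reviewers who want to play with the generic lookup outside the compiler, the following is a reduced, self-contained model of it. It is a sketch only: the table is cut down to four entries, the flag and sanitizer values are made up, and model_has_feature is a hypothetical name. STRICT_P is assumed to be true for __has_feature and false for __has_extension, and an unknown name returns false here, whereas the patch defers to a language-specific lookup.

```cpp
#include <cstring>

// Illustrative flag and sanitizer bits; the values are made up.
enum { FLAG_EXT = 1, FLAG_SANITIZE = 2 };
enum { SAN_ADDRESS = 1, SAN_THREAD = 2 };

struct feature_info
{
  const char *ident;
  unsigned flags;
  unsigned mask;
};

static const feature_info feature_table[] = {
  { "address_sanitizer", FLAG_SANITIZE, SAN_ADDRESS },
  { "thread_sanitizer", FLAG_SANITIZE, SAN_THREAD },
  { "tls", 0, 0 },
  { "gnu_asm_goto_with_outputs", FLAG_EXT, 0 },
};

// ENABLED_SANITIZERS stands in for the compiler's sanitizer flags.
bool
model_has_feature (const char *feat, bool strict_p,
                   unsigned enabled_sanitizers)
{
  for (const feature_info &info : feature_table)
    {
      if (std::strcmp (info.ident, feat) != 0)
        continue;

      // Extension-only features are hidden from strict queries.
      if ((info.flags & FLAG_EXT) && strict_p)
        return false;

      // Sanitizer features depend on what is currently enabled.
      if (info.flags & FLAG_SANITIZE)
        return (enabled_sanitizers & info.mask) != 0;

      return true;
    }

  return false; // the patch falls through to a per-language lookup here
}
```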

gcc/c-family/ChangeLog:

PR c++/60512
* c-common.cc (struct hf_feature_info): New.
(has_generic_feature_p): New.
* c-common.h (c_common_has_feature): New.
(has_generic_feature_p): New.
(has_lang_feature_p): New.
* c-lex.cc (init_c_lex): Plumb through has_feature callback.
(c_common_has_builtin): Adapt into more generic helper function.
Rename to ...
(c_common_lex_availability_macro): ... this.
(c_common_has_feature): New.
* c-ppoutput.cc (init_pp_output): Plumb through has_feature callback.

gcc/c/ChangeLog:

PR c++/60512
* c-objc-common.cc (struct c_feature_info): New.
(has_lang_feature_p): New.

gcc/cp/ChangeLog:

PR c++/60512
* cp-objcp-common.cc (struct cp_feature_selector): New.
(cp_feature_selector::has_feature): New.
(struct cp_feature_info): New.
(has_lang_feature_p): New.

gcc/ChangeLog:

PR c++/60512
* doc/cpp.texi: Document __has_{feature,extension}.

libcpp/ChangeLog:

PR c++/60512
* include/cpplib.h (struct cpp_callbacks): Add has_feature callback.
(enum cpp_builtin_type): Add types for __has_{feature,extension}.
* init.cc (builtin_array): Add entries for __has_{feature,extension}.
* macro.cc (_cpp_builtin_macro_text): Handle
BT_HAS_{FEATURE,EXTENSION}.

gcc/testsuite/ChangeLog:

PR c++/60512
* c-c++-common/has-feature-common.c: New test.
* g++.dg/ext/has-feature.C: New test.
* gcc.dg/asan/has-feature-asan.c: New test.
* gcc.dg/has-feature.c: New test.
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 2b4c82facf7..5b8429244b2 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -311,6 +311,34 @@ const struct fname_var_t fname_vars[] =
   {NULL, 0, 0},
 };
 
+enum
+{
+  HF_FLAG_EXT = 1, /* Available only as an extension.  */
+  HF_FLAG_SANITIZE = 2, /* Availability depends on sanitizer flags.  */
+};
+
+struct hf_feature_info
+{
+  const char *ident;
+  unsigned flags;
+  unsigned mask;
+};
+
+static const hf_feature_info has_feature_table[] =
+{
+  { "address_sanitizer",   HF_FLAG_SANITIZE, SANITIZE_ADDRESS },
+  { "thread_sanitizer",HF_FLAG_SANITIZE, SANITIZE_THREAD },
+  { "leak_sanitizer",  HF_FLAG_SANITIZE, SANITIZE_LEAK },
+  { "hwaddress_sanitizer", HF_FLAG_SANITIZE, SANITIZE_HWADDRESS },
+  { "undefined_behavior_sanitizer", HF_FLAG_SANITIZE, SANITIZE_UNDEFINED },
+  { "attribute_deprecated_with_message",  0, 0 },
+  { "attribute_unavailable_with_message", 0, 0 },
+  { "enumerator_attributes", 0, 0 },
+  { "tls", 0, 0 },
+  { "gnu_asm_goto_with_outputs", HF_FLAG_EXT, 0 },
+  { "gnu_asm_goto_with_outputs_full",HF_FLAG_EXT, 0 }
+};
+
 /* Global visibility options.  */
 struct visibility_flags visibility_options;
 
@@ -9545,4 +9573,25 @@ c_strict_flex_array_level_of (tree array_field)
   return strict_flex_array_level;
 }
 
+bool
+has_generic_feature_p (const char *feat, bool strict_p)
+{
+  for (unsigned i = 0; i < ARRAY_SIZE (has_feature_table); i++)
+{
+  const hf_feature_info *info = has_feature_table + i;
+  if (strcmp (info->ident, feat))
+   continue;
+
+  if ((info->flags & HF_FLAG_EXT) && strict_p)
+   return false;
+
+  if (info->flags & HF_FLAG_SANITIZE)
+   return flag_sanitize & info->mask;
+
+  return true;
+}
+
+  return has_lang_feature_p (feat, strict_p);
+}
+
 #include "gt-c-family-c-common.h"
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f96350b64af..38fd30312a0 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1121,6 +1121,9 @@ extern bool c_cpp_diagnostic (cpp_reader *, enum 
cpp_diagnostic_level,
  ATTRIBUTE_GCC_DIAG(5,0);
 extern int c_common_has_attribute (cpp_reader *, bool);
 extern int c_common_has_builtin (cpp_reader *);
+extern int c_common_has_feature

[PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread juzhe . zhong

From: Juzhe-Zhong 

This incorrect code blocks scalable RVV auto-vectorization.
Compare the aarch64 implementation of this target hook:
it only has similar handling for TARGET_SIMD.

aarch64 lets movmisalign handle the scalable vectors of SVE.
For RVV, we should follow the same approach as ARM SVE.
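
The behavioural difference can be sketched with a small stand-alone model. Everything here is illustrative: target_model and the model_* functions are hypothetical names, the booleans stand in for TARGET_VECTOR, STRICT_ALIGNMENT and "a movmisalign pattern exists for this mode", and the generic fallback is assumed to accept an access exactly when such a pattern exists.

```cpp
// Model of the hook before and after the patch.  MISALIGNMENT == -1
// means the misalignment is unknown at compile time.
struct target_model
{
  bool target_vector;
  bool strict_alignment;
  bool has_movmisalign;
};

// Stand-in for the generic fallback (assumption: it only checks for a
// movmisalign pattern).
bool
model_default_support (const target_model &t)
{
  return t.has_movmisalign;
}

// Old behaviour: with strict alignment, an unknown misalignment is
// rejected even when a movmisalign pattern is available.
bool
model_old_hook (const target_model &t, int misalignment)
{
  if (t.target_vector)
    {
      if (t.strict_alignment)
        {
          if (!t.has_movmisalign)
            return false;
          if (misalignment == -1)
            return false;
        }
      return true;
    }
  return model_default_support (t);
}

// New behaviour: always defer to the generic fallback.
bool
model_new_hook (const target_model &t, int /* misalignment */)
{
  return model_default_support (t);
}
```

In this model the case that matters is a strict-alignment vector target with a movmisalign pattern and unknown misalignment: the old hook rejects it, the new one accepts it.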

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_support_vector_misalignment): Fix 
incorrect code.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/v-2.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: Ditto.

---
 gcc/config/riscv/riscv.cc | 21 +++
 .../gcc.target/riscv/rvv/autovec/v-2.c|  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve32f-3.c   |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve32x-3.c   |  2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve64d-3.c   |  4 ++--
 .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve64f-3.c   |  4 ++--
 .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |  4 ++--
 .../gcc.target/riscv/rvv/autovec/zve64x-3.c   |  2 +-
 .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |  2 +-
 15 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8684271f8ac..ff90c44d811 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7264,27 +7264,20 @@ riscv_estimated_poly_value (poly_int64 val,
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+/* Return true if the vector misalignment factor is supported by the
+   target.  */
 bool
 riscv_support_vector_misalignment (machine_mode mode,
   const_tree type ATTRIBUTE_UNUSED,
   int misalignment,
   bool is_packed ATTRIBUTE_UNUSED)
 {
-  if (TARGET_VECTOR)
-{
-  if (STRICT_ALIGNMENT)
-   {
- /* Return if movmisalign pattern is not supported for this mode.  */
- if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
-   return false;
-
- /* Misalignment factor is unknown at compile time.  */
- if (misalignment == -1)
-   return false;
-   }
-  return true;
-}
+  /* TODO: For RVV scalable vector auto-vectorization, we should allow
+ movmisalign pattern to handle misalign data movement to unblock
+ possible auto-vectorization.
 
+ RVV VLS auto-vectorization or SIMD auto-vectorization can be supported 
here
+ in the future.  */
   return default_builtin_support_vector_misalignment (mode, type, misalignment,
  is_packed);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-2.c
index 3d086e30081..66d8ea15f5b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32gcv -mabi=ilp32d --param 
riscv-autovec-preference=fixed-vlmax -fdump-tree-vect-details" } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -fno-vect-cost-model --param 
riscv-autovec-preference=fixed-vlmax -fdump-tree-vect-details" } */
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" 
} } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 6 "vect" 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c
index d6199665126..7cdc174c06f 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32gc_zve32f -mabi=ilp32d --param 
riscv-autovec-preference=fixed-vlmax -fdump-tree-vect-details" } */
+/* { dg-options "-march=rv32gc_zve32f -mabi=ilp32d

RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-09 Thread Li, Pan2 via Gcc-patches

Sure thing, I will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, May 9, 2023 6:26 PM
To: Richard Sandiford 
Cc: Li, Pan2 ; Jeff Law ; Kito Cheng 
; juzhe.zh...@rivai.ai; gcc-patches 
; palmer ; jakub 
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

On Tue, 9 May 2023, Richard Sandiford wrote:

> "Li, Pan2"  writes:
> > After the bits patch like below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > The structure layout of both the rtx_def and tree_base will be something 
> > similar as below. As I understand, the lower 8-bits of tree_base will be 
> > inspected when 'dv' is a tree for the rtx conversion.
> >
> > tree_base   rtx_def
> > code: 16code: 8
> > side_effects_flag: 1mode: 16
> 
> I think we should try hard to avoid that though.  The 16-bit value 
> should be aligned to 16 bits if at all possible.  decl_or_value 
> doesn't seem like something that should be dictating our approach here.
> 
> Perhaps we can use pointer_mux for decl_or_value instead?  pointer_mux 
> is intended to be a standards-compliant (hah!) way of switching 
> between two pointer types in a reasonably efficient way.

Ah, I wasn't aware of that - yes, that looks good to use I think.

Pan, can you prepare a patch that only converts the var-tracking 
decl_or_value type?  That is, make it

typedef pointer_mux decl_or_value;

and adjust uses?
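
For readers unfamiliar with the idea, here is a minimal sketch of a two-way tagged pointer of the kind being suggested. It is not GCC's pointer_mux: tiny_mux, fake_tree, fake_rtx and fake_decl_or_value are hypothetical names, and the typedef above appears to have lost its template arguments in this archive, which are presumably the tree and rtx node types.

```cpp
#include <cstdint>

// Hypothetical two-way tagged pointer, in the spirit of (but not copied
// from) GCC's pointer_mux.  The low bit of the stored address records
// which of the two pointer types is held, so both pointee types must be
// at least 2-byte aligned.
template <typename T1, typename T2>
class tiny_mux
{
public:
  static tiny_mux first (T1 *p)
  {
    static_assert (alignof (T1) >= 2, "low bit must be free");
    return tiny_mux (to_bits (p));
  }

  static tiny_mux second (T2 *p)
  {
    static_assert (alignof (T2) >= 2, "low bit must be free");
    return tiny_mux (to_bits (p) | 1);
  }

  bool is_second () const { return (m_bits & 1) != 0; }
  bool is_first () const { return !is_second (); }

  // Each accessor returns null when the other type is stored.
  T1 *as_first () const
  {
    return is_first () ? reinterpret_cast<T1 *> (m_bits) : nullptr;
  }

  T2 *as_second () const
  {
    return is_second ()
             ? reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1))
             : nullptr;
  }

private:
  explicit tiny_mux (std::uintptr_t bits) : m_bits (bits) {}

  static std::uintptr_t to_bits (const void *p)
  {
    return reinterpret_cast<std::uintptr_t> (p);
  }

  std::uintptr_t m_bits;
};

// Stand-ins for the two node types a decl_or_value switches between.
struct fake_tree { int code; };
struct fake_rtx { int code; };
using fake_decl_or_value = tiny_mux<fake_tree, fake_rtx>;
```

The point of the design is that the discriminator lives in the pointer itself, so telling the two cases apart no longer depends on the bit layout of either node type.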

Thanks,
Richard.

> Thanks,
> Richard
> 
> > constant_flag: 1
> > addressable_flag: 1
> > volatile_flag: 1
> > readonly_flag: 1
> > asm_written_flag: 1
> > nowarning_flag: 1
> > visited: 1
> > used_flag: 1
> > nothrow_flag: 1
> > static_flag: 1
> > public_flag: 1
> > private_flag: 1
> > protected_flag: 1
> > deprecated_flag: 1
> > default_def_flag: 1
> >
> > I have tried an approach similar to the one you mentioned (as below), 
> > i.e. shrinking tree_code to keep the 1:1 overlap with rtx_code, and 
> > completed a memory-allocated-bytes test, reported in another email.
> >
> > rtx_def code 16 => 12 bits.
> > rtx_def mode 8 => 12 bits.
> > tree_base code 16 => 12 bits.
> >
> > Pan
> >
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, May 8, 2023 3:38 PM
> > To: Li, Pan2 
> > Cc: Jeff Law ; Kito Cheng 
> > ; juzhe.zh...@rivai.ai; richard.sandiford 
> > ; gcc-patches ; 
> > palmer ; jakub 
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> > 8-bit to 16-bit
> >
> > On Mon, 8 May 2023, Li, Pan2 wrote:
> >
> >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able 
> >> to fix this ICE after mode bits change.
> >
> > Can you check which bits this will inspect when 'dv' is a tree after your 
> > patch?  VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when 
> > there was a 1:1 overlap.
> >
> > I think for all cases but struct loc_exp_dep we could find a bit to record 
> > whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be 
> > difficult (unless we start to take bits from pointer representations).
> >
> > That said, I agree with Jeff that the code is ugly, but a simplistic 
> > conversion isn't what we want.
> >
> > An alternative "solution" might be to also shrink tree_code when we shrink 
> > rtx_code and keep the 1:1 overlap.
> >
> > Richard.
> >
> >> I will re-trigger the memory allocate bytes test with below changes 
> >> for X86.
> >> 
> >> rtx_def code 16 => 8 bits.
> >> rtx_def mode 8 => 16 bits.
> >> tree_base code unchanged.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Li, Pan2
> >> Sent: Monday, May 8, 2023 2:42 PM
> >> To: Richard Biener ; Jeff Law 
> >> 
> >> Cc: Kito Cheng ; juzhe.zh...@rivai.ai; 
> >> richard.sandiford ; gcc-patches 
> >> ; palmer ; jakub 
> >> 
> >> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit
> >> 
> >> Oops. Actually I am patching a version as you mentioned like storage 
> >> allocation. Thank you Richard, will try your suggestion and keep you 
> >> posted.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Monday, May 8, 2023 2:30 PM
> >> To: Jeff Law 
> >> Cc: Li, Pan2 ; Kito Cheng 
> >> ; juzhe.zh...@rivai.ai; richard.sandiford 
> >> ; gcc-patches ; 
> >> palmer ; jakub 
> >> Subject: Re: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit
> >> 
> >> On Sun, 7 May 2023, Jeff Law wrote:
> >> 
> >> > 
> >> > 
> >> > On 5/6/23 19:55, Li, Pan2 wrote:
> >> > > It looks like we cannot simply swap the code and mode in 
> >> > > rtx_def, the code may have to be the same bits as the tree_code in 
> >> > > tree_base.
> >> > > Or we will meet ICE like below.
> >> > > 
> >> > > rtx_def code 16 => 8 bits.
> >> > > rtx_def mode 8 => 16 bits.
> >> > > 
> >> > > static inline decl_or_value
> >> > > dv_from_value (rtx value)
> >> > > {
> >> > >decl_or_value dv;
> >> > >dv = value;
> >> > >

Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-09 Thread juzhe.zh...@rivai.ai

Hi, Richards. Would you mind reviewing this patch?

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-07 23:19
To: juzhe.zhong; gcc-patches
CC: richard.sandiford; rguenther
Subject: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by 
variable amount support
 
 
On 5/4/23 07:25, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
> For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the 
> total length:
> 
>   _36 = MIN_EXPR ;
> 
>   First length (MIN (X, VF/N)):
> loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>   Second length (X - MIN (X, 1 * VF/N)):
> loop_len_16 = _36 - loop_len_15;
> 
>   Third length (X - MIN (X, 2 * VF/N)):
> _38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
> loop_len_17 = _36 - _38;
> 
>   Fourth length (X - MIN (X, 3 * VF/N)):
> _39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
> loop_len_18 = _36 - _39;
> 
> MIN_EXPR is used instead of SELECT_VL to calculate the total length because 
> using SELECT_VL to adapt the induction IV consumes more instructions than 
> just using MIN_EXPR.  Also, during testing, I found it hard to adjust the 
> length correctly according to SELECT_VL.
> 
> So, in this patch we only use SELECT_VL for single-rgroup with single length 
> control.
> 
> 3. Fix document of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
> Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. Move the get_while_len_data_ref_ptr requirement outside, so that it is 
> gated at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by 
>>> .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf) (WHILE_LEN) = IN_RANGE (0, min (n, 
> vf)), which ensures
> the loop control counter never underflows zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is a pure optimization for single-rgroup and allows
> the target to adjust to any length = IN_RANGE (0, min (n, vf)) each iteration.
> 
> Let me know if I missed something for the V3 patch.
So at a high level this is pretty good.  I think there's some 
improvements we should make in the documentation and comments, but I'm 
comfortable with most of the implementation details.
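
To make the multi-rgroup length computation quoted above concrete, here is a minimal C++ sketch. It follows the four formulas exactly as quoted, without any extra clamping. A plain integer step stands in for the POLY_INT_CST values, and the total length is passed in directly. The names rgroup_lengths, total and step are illustrative assumptions and do not come from the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative model of the four lengths listed in the patch description.
// `total` plays the role of _36 and `step` the role of VF/N; both are
// plain integers here, whereas the vectorizer works with poly_ints.
std::array<std::uint64_t, 4>
rgroup_lengths (std::uint64_t total, std::uint64_t step)
{
  assert (step > 0);
  std::array<std::uint64_t, 4> len{};
  // First length: MIN (X, VF/N).
  len[0] = std::min (total, step);
  // Remaining lengths: X - MIN (X, i * VF/N) for i = 1, 2, 3.
  for (std::uint64_t i = 1; i < 4; ++i)
    len[i] = total - std::min (total, i * step);
  return len;
}
```

With total = 5 and step = 2 this yields 2, 3, 1 and 0. None of the subtractions can wrap, because MIN (X, k) never exceeds X.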
 
 
 
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cc4a93a8763..99cf0cdbdca 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
> operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>   @end smallexample
>   
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of active elements in vector will be updated 
> value.
This reads rather poorly.  Is this still accurate?
 
Set operand 0 to the number of active elements in a vector to be updated 
in a loop iteration based on the total number of elements to be updated, 
the vectorization factor and vector properties of the target.
 
 
> +operand 1 is the total elements need to be updated value.
operand 1 is the total elements in the vector to be updated.
 
 
> +
> +The output of this pattern is not only used as IV of loop control counter, 
> but also
> +is used as the IV of address calculation with multiply/shift operation. This 
> allow
> +us dynamic adjust the number of elements is processed in each iteration of 
> the loop.
This allows dynamic adjustment of the number of elements processed each 
loop iteration. -- is that still accurate and does it read better?
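
As a rough illustration of the decrementing-IV scheme discussed in this thread, the C++ sketch below models a loop whose remaining-element counter is reduced by the length chosen in each iteration, while the same length also advances the address offset. The function model_select_vl is a hypothetical stand-in for the target pattern: it simply returns min (n, vf), which is one value inside the range the patch description allows, and a real target may choose differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for select_vl: always picks the largest value
// permitted, min (n, vf).
std::size_t
model_select_vl (std::size_t n, std::size_t vf)
{
  assert (vf > 0);
  return std::min (n, vf);
}

// Adds 1 to every element, processing model_select_vl (remaining, vf)
// elements per iteration.  Returns the number of iterations executed.
std::size_t
add_one_by_chunks (std::vector<int> &data, std::size_t vf)
{
  std::size_t remaining = data.size ();   // decrementing loop-control IV
  std::size_t offset = 0;                 // address IV
  std::size_t iterations = 0;
  while (remaining > 0)
    {
      std::size_t len = model_select_vl (remaining, vf);
      for (std::size_t i = 0; i < len; ++i)
        data[offset + i] += 1;
      offset += len;
      remaining -= len;                   // len <= remaining, so no underflow
      ++iterations;
    }
  return iterations;
}
```

For ten elements and vf = 4 the loop runs three times, with lengths 4, 4 and 2.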
 
 
> @@ -47,7 +47,9 @@ along with GCC; see the file COPYING3.  If not see
>  so that we can free them all at once.  */
>   static bitmap_obstack loop_renamer_obstack;
>   
> -/* Creates an induction variable with value BASE + STEP * iteration in LOOP.
> +/* Creates an induction variable with value BASE (+/-) STEP * iteration in 
> LOOP.
> +   If CODE is PLUS_EXPR, the induction variable is BASE + STEP * iteration.
> +   If CODE is MINUS_EXPR, the induction variable is BASE - STEP * iteration.
>  It is expected that neither BASE nor STEP are shared with other 
> expressions
>  (unless the sharing rules allow this).  Use VAR as a base var_decl for it
>  (if NULL, a new temporary will be created).  The increment will occur at
It's been pretty standard to stick with just PLUS_EXPR

[committed] mux-utils.h: Fix a comment typo

2023-05-09 Thread Jakub Jelinek via Gcc-patches

Hi!

Trivial comment typo...

Committed as obvious to trunk.

2023-05-09  Jakub Jelinek  

* mux-utils.h: Fix comment typo, avoides -> avoids.

--- gcc/mux-utils.h.jj  2023-01-02 09:32:37.423068007 +0100
+++ gcc/mux-utils.h 2023-05-09 12:45:29.235288953 +0200
@@ -34,7 +34,7 @@
 // and having a separate tag bit to indicate which alternative is active.
 // However, using this class can have two advantages over a union:
 //
-// - It avoides the need to find somewhere to store the tag bit.
+// - It avoids the need to find somewhere to store the tag bit.
 //
 // - The compiler is aware that B cannot be null, which can make checks
 //   of the form:

Jakub

Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-09 Thread Richard Biener via Gcc-patches

On Tue, 9 May 2023, Richard Sandiford wrote:

> "Li, Pan2"  writes:
> > After the bits patch like below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > The structure layout of both the rtx_def and tree_base will be something 
> > similar as below. As I understand, the lower 8-bits of tree_base will be 
> > inspected when 'dv' is a tree for the rtx conversion.
> >
> > tree_base   rtx_def
> > code: 16code: 8
> > side_effects_flag: 1mode: 16
> 
> I think we should try hard to avoid that though.  The 16-bit value should
> be aligned to 16 bits if at all possible.  decl_or_value doesn't seem
> like something that should be dictating our approach here.
> 
> Perhaps we can use pointer_mux for decl_or_value instead?  pointer_mux is
> intended to be a standards-compliant (hah!) way of switching between two
> pointer types in a reasonably efficient way.

Ah, I wasn't aware of that - yes, that looks good to use I think.

Pan, can you prepare a patch only doing such conversion of the
var-tracking decl_or_value type?  Aka make it

typedef pointer_mux decl_or_value;

and adjust uses?

Thanks,
Richard.
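
For readers who have not met the idea, the following C++ sketch shows the general technique of multiplexing two pointer types through a tag in the low bit. It is a simplified, self-written analogue and NOT the pointer_mux class from gcc/mux-utils.h; the names two_way_ptr, fake_decl, fake_value and demo_decl_or_value are assumptions for illustration only. The point is that the discrimination no longer depends on inspecting a field of the pointee, so it is independent of how rtx_def and tree_base lay out their code bits.

```cpp
#include <cassert>
#include <cstdint>

// Simplified analogue of a two-way pointer multiplexer.  This is NOT the
// class from gcc/mux-utils.h; all names here are illustrative.
template <typename A, typename B>
class two_way_ptr
{
public:
  // A default-constructed object holds a null pointer of the first kind.
  two_way_ptr () : m_bits (0) {}

  static two_way_ptr from_first (A *p)
  {
    return two_way_ptr (to_bits (p));
  }

  static two_way_ptr from_second (B *p)
  {
    // The second alternative must be non-null; the low bit marks it.
    assert (p != nullptr);
    return two_way_ptr (to_bits (p) | 1);
  }

  bool is_first () const { return (m_bits & 1) == 0; }
  bool is_second () const { return (m_bits & 1) != 0; }

  A *first () const
  {
    assert (is_first ());
    return reinterpret_cast<A *> (m_bits);
  }

  B *second () const
  {
    assert (is_second ());
    return reinterpret_cast<B *> (m_bits & ~std::uintptr_t (1));
  }

private:
  explicit two_way_ptr (std::uintptr_t bits) : m_bits (bits) {}

  template <typename T>
  static std::uintptr_t to_bits (T *p)
  {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t> (p);
    // The tag lives in bit 0, so pointees must be at least 2-byte aligned.
    assert ((bits & 1) == 0);
    return bits;
  }

  std::uintptr_t m_bits;
};

// Stand-ins for tree and rtx nodes, purely for demonstration.
struct fake_decl { int uid; };
struct fake_value { int id; };
using demo_decl_or_value = two_way_ptr<fake_decl, fake_value>;
```

The object stays pointer-sized, which matters for a memory-constrained pass such as var-tracking, and no separate discriminator field has to be allocated.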

> Thanks,
> Richard
> 
> > constant_flag: 1
> > addressable_flag: 1
> > volatile_flag: 1
> > readonly_flag: 1
> > asm_written_flag: 1
> > nowarning_flag: 1
> > visited: 1
> > used_flag: 1
> > nothrow_flag: 1
> > static_flag: 1
> > public_flag: 1
> > private_flag: 1
> > protected_flag: 1
> > deprecated_flag: 1
> > default_def_flag: 1
> >
> > I have tried a similar approach (as below) to the one you mentioned, i.e. 
> > shrinking tree_code to keep the 1:1 overlap with rtx_code, and completed one 
> > memory-allocated-bytes test, reported in another email.
> >
> > rtx_def code 16 => 12 bits.
> > rtx_def mode 8 => 12 bits.
> > tree_base code 16 => 12 bits.
> >
> > Pan
> >
> > -Original Message-
> > From: Richard Biener  
> > Sent: Monday, May 8, 2023 3:38 PM
> > To: Li, Pan2 
> > Cc: Jeff Law ; Kito Cheng ; 
> > juzhe.zh...@rivai.ai; richard.sandiford ; 
> > gcc-patches ; palmer ; jakub 
> > 
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
> > 16-bit
> >
> > On Mon, 8 May 2023, Li, Pan2 wrote:
> >
> >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to 
> >> fix this ICE after mode bits change.
> >
> > Can you check which bits this will inspect when 'dv' is a tree after your 
> > patch?  VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when 
> > there was a 1:1 overlap.
> >
> > I think for all cases but struct loc_exp_dep we could find a bit to record 
> > whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be 
> > difficult (unless we start to take bits from pointer representations).
> >
> > That said, I agree with Jeff that the code is ugly, but a simplistic 
> > conversion isn't what we want.
> >
> > An alternative "solution" might be to also shrink tree_code when we shrink 
> > rtx_code and keep the 1:1 overlap.
> >
> > Richard.
> >
> >> I will re-trigger the memory allocate bytes test with below changes 
> >> for X86.
> >> 
> >> rtx_def code 16 => 8 bits.
> >> rtx_def mode 8 => 16 bits.
> >> tree_base code unchanged.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Li, Pan2
> >> Sent: Monday, May 8, 2023 2:42 PM
> >> To: Richard Biener ; Jeff Law 
> >> 
> >> Cc: Kito Cheng ; juzhe.zh...@rivai.ai; 
> >> richard.sandiford ; gcc-patches 
> >> ; palmer ; jakub 
> >> 
> >> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit
> >> 
> >> Oops. Actually I am patching a version as you mentioned like storage 
> >> allocation. Thank you Richard, will try your suggestion and keep you 
> >> posted.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Monday, May 8, 2023 2:30 PM
> >> To: Jeff Law 
> >> Cc: Li, Pan2 ; Kito Cheng ; 
> >> juzhe.zh...@rivai.ai; richard.sandiford ; 
> >> gcc-patches ; palmer ; 
> >> jakub 
> >> Subject: Re: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit
> >> 
> >> On Sun, 7 May 2023, Jeff Law wrote:
> >> 
> >> > 
> >> > 
> >> > On 5/6/23 19:55, Li, Pan2 wrote:
> >> > > It looks like we cannot simply swap the code and mode in rtx_def, 
> >> > > the code may have to be the same bits as the tree_code in tree_base.
> >> > > Or we will meet ICE like below.
> >> > > 
> >> > > rtx_def code 16 => 8 bits.
> >> > > rtx_def mode 8 => 16 bits.
> >> > > 
> >> > > static inline decl_or_value
> >> > > dv_from_value (rtx value)
> >> > > {
> >> > >decl_or_value dv;
> >> > >dv = value;
> >> > >gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
> >> > >return dv;
> >> > Ugh.  We really just need to fix this code.  It assumes particular 
> >> > structure layouts and that's just wrong/dumb.
> >> 
> >> Well, it's a neat trick ... we just need to adjust it to
> >> 
> >> static inline bool
> >> dv_is_decl_p (decl_or_value dv)
> >> {
> >>   return !dv ||

Re: Question on patch -fprofile-partial-training

2023-05-09 Thread Jan Hubicka via Gcc-patches

> > > > 
> > > >  From my understanding, -fprofile-partial-training is one important 
> > > > option for PGO performance.
> > > 
> > > I don't think so, speed benefit would be rather small I guess.
> > I saw some articles online to introduce this option for gcc10,
> > https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo
> 
> Hi.
> 
> Ah, I see.
> 
> > And also based on my previous experience in Studio compiler, I guess that 
> > this one might have
> > Some good performance impact on PGO.  Is there any old performance data on 
> > this option? (I cannot find online)
> 
> Maybe Honza can chime in here? Or Martin who is the author of the white paper.

Main motivation for this was profiling programs that contain specific
code paths for different CPUs (such as graphics library in Firefox or Linux
kernel). When the training machine differs from the machine the
program is later run on, we end up optimizing for size all code paths
except ones taken by the specific CPU.  This patch essentially tells gcc
to consider every non-trained function as built without profile
feedback.

For Firefox it had an important impact on graphics rendering tests back
then, since the build machine had AVX while the benchmarking one did not.
Some benchmarks improved several times, which is not a surprise if you
consider a tight graphics rendering loop optimized for size versus a
vectorized one.  The patch has a bad effect on code size, which in turn
impacts performance too, so I think it makes sense to use
-fprofile-partial-training with a bit of care (i.e. only on code where
such scenarios are likely).

As for backporting, I do not have a checkout of GCC 8 right now. It
depends on profile infrastructure that was added in 2017 (so stage1 of
GCC 8), so the patch may backport quite easily.  I am not 100% sure
what shape the infrastructure was in in the first version, but I am quite
convinced it had the necessary bits - it was able to tell the difference
between a 0 profile count and missing profile feedback.

Honza
> 
> Martin
> 
> > 
> > thanks.
> > 
> > Qing
> > 
> > > 
> > > > > I’d like
> > > > > to know whether any major technical difficulty prevents it from being 
> > > > > backported to GCC8.
> > > 
> > > There might be of course some patch dependencies and I don't see a point 
> > > why should we waste
> > > time with that.
> > > 
> > > Cheers,
> > > Martin
> > > 
> > > > 
> > > > Thanks.
> > > > 
> > > > Qing
> > > > 
> > > > > 
> > > > > Martin
> > > > > 
> > > > > > 
> > > > > > > Can this patch be backported to GCC8 easily? I am wondering whether 
> > > > > > > any significant
> > > > > > > change between GCC8 and GCC10 might make the backporting very 
> > > > > > > hard?
> > > > > > Thanks a lot for your help.
> > > > > > 
> > > > > > Qing
> > 
>

[committed] testsuite: Add further testcase for already fixed PR [PR109778]

2023-05-09 Thread Jakub Jelinek via Gcc-patches

Hi!

On Tue, May 09, 2023 at 08:55:56AM +, Richard Biener wrote:
> OK.

Thanks.

I came up with a testcase which reproduces all the way to r10-7469.
LTO to avoid early inlining it, so that ccp handles rotates and not
shifts before they are turned into rotates.

Tested on x86_64-linux -m32/-m64, both trunk and 10 branch, committed
to trunk as obvious so far:

2023-05-09  Jakub Jelinek  

PR tree-optimization/109778
* gcc.dg/lto/pr109778_0.c: New test.
* gcc.dg/lto/pr109778_1.c: New file.

--- gcc/testsuite/gcc.dg/lto/pr109778_0.c.jj2023-05-09 12:03:18.186428978 
+0200
+++ gcc/testsuite/gcc.dg/lto/pr109778_0.c   2023-05-09 12:00:18.506004676 
+0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/109778 */
+/* { dg-lto-do run } */
+/* { dg-lto-options { "-O2 -flto" } } */
+/* { dg-require-effective-target int32 } */
+
+int bar (int);
+
+__attribute__((noipa)) int
+foo (int x)
+{
+  x = bar (x);
+  x = (x << 16) | (int) ((unsigned) x >> 16);
+  return x & 0x1000;
+}
+
+int
+main ()
+{
+  if (foo (0) || foo (-1))
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/lto/pr109778_1.c.jj2023-05-09 12:03:21.504381415 
+0200
+++ gcc/testsuite/gcc.dg/lto/pr109778_1.c   2023-05-09 12:00:07.062168719 
+0200
@@ -0,0 +1,7 @@
+int
+bar (int x)
+{
+  x &= 0x;
+  x |= (int) 0xf1234567U;
+  return x;
+}


Jakub
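
For readers unfamiliar with the idiom used in foo in the testcase, a pair of opposite shifts by 16 on a 32-bit value is a rotate by half the width. A small C++ sketch of the general form follows; it uses unsigned arithmetic throughout (unlike the testcase, which deliberately goes through int), and the helper name rotl32 is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by n bits, 0 < n < 32.
std::uint32_t
rotl32 (std::uint32_t x, unsigned n)
{
  assert (n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}
```

rotl32 (0x12345678, 16) swaps the two 16-bit halves and gives 0x56781234.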

Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-09 Thread Richard Sandiford via Gcc-patches

"Li, Pan2"  writes:
> After the bits patch like below.
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> tree_base code unchanged.
>
> The structure layout of both the rtx_def and tree_base will be something 
> similar as below. As I understand, the lower 8-bits of tree_base will be 
> inspected when 'dv' is a tree for the rtx conversion.
>
> tree_base rtx_def
> code: 16  code: 8
> side_effects_flag: 1  mode: 16

I think we should try hard to avoid that though.  The 16-bit value should
be aligned to 16 bits if at all possible.  decl_or_value doesn't seem
like something that should be dictating our approach here.

Perhaps we can use pointer_mux for decl_or_value instead?  pointer_mux is
intended to be a standards-compliant (hah!) way of switching between two
pointer types in a reasonably efficient way.

Thanks,
Richard

> constant_flag: 1
> addressable_flag: 1
> volatile_flag: 1
> readonly_flag: 1
> asm_written_flag: 1
> nowarning_flag: 1
> visited: 1
> used_flag: 1
> nothrow_flag: 1
> static_flag: 1
> public_flag: 1
> private_flag: 1
> protected_flag: 1
> deprecated_flag: 1
> default_def_flag: 1
>
> I have tried a similar approach (as below) to the one you mentioned, i.e. 
> shrinking tree_code to keep the 1:1 overlap with rtx_code, and completed one 
> memory-allocated-bytes test, reported in another email.
>
> rtx_def code 16 => 12 bits.
> rtx_def mode 8 => 12 bits.
> tree_base code 16 => 12 bits.
>
> Pan
>
> -Original Message-
> From: Richard Biener  
> Sent: Monday, May 8, 2023 3:38 PM
> To: Li, Pan2 
> Cc: Jeff Law ; Kito Cheng ; 
> juzhe.zh...@rivai.ai; richard.sandiford ; 
> gcc-patches ; palmer ; jakub 
> 
> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
> 16-bit
>
> On Mon, 8 May 2023, Li, Pan2 wrote:
>
>> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to 
>> fix this ICE after mode bits change.
>
> Can you check which bits this will inspect when 'dv' is a tree after your 
> patch?  VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when 
> there was a 1:1 overlap.
>
> I think for all cases but struct loc_exp_dep we could find a bit to record 
> whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be 
> difficult (unless we start to take bits from pointer representations).
>
> That said, I agree with Jeff that the code is ugly, but a simplistic 
> conversion isn't what we want.
>
> An alternative "solution" might be to also shrink tree_code when we shrink 
> rtx_code and keep the 1:1 overlap.
>
> Richard.
>
>> I will re-trigger the memory allocate bytes test with below changes 
>> for X86.
>> 
>> rtx_def code 16 => 8 bits.
>> rtx_def mode 8 => 16 bits.
>> tree_base code unchanged.
>> 
>> Pan
>> 
>> -Original Message-
>> From: Li, Pan2
>> Sent: Monday, May 8, 2023 2:42 PM
>> To: Richard Biener ; Jeff Law 
>> 
>> Cc: Kito Cheng ; juzhe.zh...@rivai.ai; 
>> richard.sandiford ; gcc-patches 
>> ; palmer ; jakub 
>> 
>> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
>> 8-bit to 16-bit
>> 
>> Oops. Actually I am patching a version as you mentioned like storage 
>> allocation. Thank you Richard, will try your suggestion and keep you posted.
>> 
>> Pan
>> 
>> -Original Message-
>> From: Richard Biener 
>> Sent: Monday, May 8, 2023 2:30 PM
>> To: Jeff Law 
>> Cc: Li, Pan2 ; Kito Cheng ; 
>> juzhe.zh...@rivai.ai; richard.sandiford ; 
>> gcc-patches ; palmer ; 
>> jakub 
>> Subject: Re: [PATCH] machine_mode type size: Extend enum size from 
>> 8-bit to 16-bit
>> 
>> On Sun, 7 May 2023, Jeff Law wrote:
>> 
>> > 
>> > 
>> > On 5/6/23 19:55, Li, Pan2 wrote:
>> > > It looks like we cannot simply swap the code and mode in rtx_def, 
>> > > the code may have to be the same bits as the tree_code in tree_base.
>> > > Or we will meet ICE like below.
>> > > 
>> > > rtx_def code 16 => 8 bits.
>> > > rtx_def mode 8 => 16 bits.
>> > > 
>> > > static inline decl_or_value
>> > > dv_from_value (rtx value)
>> > > {
>> > >decl_or_value dv;
>> > >dv = value;
>> > >gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
>> > >return dv;
>> > Ugh.  We really just need to fix this code.  It assumes particular 
>> > structure layouts and that's just wrong/dumb.
>> 
>> Well, it's a neat trick ... we just need to adjust it to
>> 
>> static inline bool
>> dv_is_decl_p (decl_or_value dv)
>> {
>>   return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; }
>> 
>> I think (and hope for the 'decl' case the bits inspected are never 'VALUE'). 
>>  Of course the above stinks from a TBAA perspective ...
>> 
>> Any "real" fix would require allocating storage for a discriminator and thus 
>> hurt the resource constrained var-tracking a lot.
>> 
>> Richard.
>> 
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, 
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 
> 36809 (AG Nuernberg)

Re: Question on patch -fprofile-partial-training

2023-05-09 Thread Martin Liška


On 5/4/23 15:37, Qing Zhao wrote:




On May 4, 2023, at 9:05 AM, Martin Liška  wrote:

On 5/4/23 14:54, Qing Zhao wrote:




On May 4, 2023, at 4:30 AM, Martin Liška  wrote:

On 5/3/23 21:10, Qing Zhao via Gcc-patches wrote:

Hi, Jan,

You added the following patch into gcc10:

 From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
From: Jan Hubicka 
Date: Thu, 5 Dec 2019 19:12:51 +0100
Subject: [PATCH] cgraphclones.c (localize_profile): New function.

   * cgraphclones.c (localize_profile): New function.
   (cgraph_node::create_clone): Use it for partial profiles.
   * common.opt (fprofile-partial-training): New flag.
   * doc/invoke.texi (-fprofile-partial-training): Document.
   * ipa-cp.c (update_profiling_info): For partial profiles do not
   set function profile to zero.
   * profile.c (compute_branch_probabilities): With partial profile
   watch if edge count is zero and turn all probabilities to guessed.
   (compute_branch_probabilities): For partial profiles do not apply
   profile when entry count is zero.
   * tree-profile.c (tree_profiling): Only do value_profile_transformations
   when profile is read.

My question is:


Hello.

Why would anybody backport such change to unsupported code-stream of GCC 8?
Generally speaking, I discourage from doing that.


Yes, I agree.
However, many users still use GCC8 right now, and some of them are asking for 
more performance
from PGO recently. That’s the reason I am studying this right now.


I understand there are products that are based on GCC8, but as the branch is 
officially unsupported, I don't
see a reason to backport a new feature from newer release. It's just asking for 
troubles. If your clients are
interested in more performance, then they should use a recent supported release.

We are trying to persuade them to use newer GCC, but it’s quite hard...




 From my understanding, -fprofile-partial-training is one important option for 
PGO performance.


I don't think so, speed benefit would be rather small I guess.

I saw some articles online to introduce this option for gcc10,
https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo


Hi.

Ah, I see.


And also based on my previous experience in Studio compiler, I guess that this 
one might have
Some good performance impact on PGO.  Is there any old performance data on this 
option? (I cannot find online)


Maybe Honza can chime in here? Or Martin who is the author of the white paper.

Martin



thanks.

Qing




I’d like
to know whether any major technical difficulty prevents it from being backported to GCC8.


There might be of course some patch dependencies and I don't see a point why 
should we waste
time with that.

Cheers,
Martin



Thanks.

Qing



Martin



Can this patch be backported to GCC8 easily? I am wondering whether any significant
change between GCC8 and GCC10 might make the backporting very hard?
Thanks a lot for your help.

Qing

[PATCH] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-09 Thread juzhe . zhong

From: Juzhe-Zhong 

This incorrect code blocks scalable RVV auto-vectorization.
Take a look at the aarch64 implementation of this target hook:
it only has similar handling for TARGET_SIMD.

It lets movmisalign handle the scalable vectors of SVE.
For RVV, we should follow the same approach as ARM SVE.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_support_vector_misalignment): Fix 
incorrect codes.

---
 gcc/config/riscv/riscv.cc | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8684271f8ac..ff90c44d811 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7264,27 +7264,20 @@ riscv_estimated_poly_value (poly_int64 val,
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+/* Return true if the vector misalignment factor is supported by the
+   target.  */
 bool
 riscv_support_vector_misalignment (machine_mode mode,
   const_tree type ATTRIBUTE_UNUSED,
   int misalignment,
   bool is_packed ATTRIBUTE_UNUSED)
 {
-  if (TARGET_VECTOR)
-{
-  if (STRICT_ALIGNMENT)
-   {
- /* Return if movmisalign pattern is not supported for this mode.  */
- if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
-   return false;
-
- /* Misalignment factor is unknown at compile time.  */
- if (misalignment == -1)
-   return false;
-   }
-  return true;
-}
+  /* TODO: For RVV scalable vector auto-vectorization, we should allow
+ movmisalign pattern to handle misalign data movement to unblock
+ possible auto-vectorization.
 
+ RVV VLS auto-vectorization or SIMD auto-vectorization can be supported 
here
+ in the future.  */
   return default_builtin_support_vector_misalignment (mode, type, misalignment,
  is_packed);
 }
-- 
2.36.3

Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-09 Thread Eric Botcazou via Gcc-patches

> I think we really need Eric (as one who e.g. introduced the
> DECL_INVARIANT_P apparently for this kind of stuff) to have a look at that
> on the Ada side.

DECL_INVARIANT_P is only set on discriminants with no default value and those 
are really invariant in Ada, i.e. do not change once set.

> The question is if the posted tree.cc (smallest) patch + 3 new testcases
> + the 7 ada testsuite workarounds are ok for trunk if it passes
> bootstrap/regtest, then I'd file a PR about the Ada regression and only once
> it is dealt with would consider backporting, or if we need to wait for Eric
> before making progress.

Let me have a quick look first, as pessimizing loop optimizations in Ada in 
order to fix an 11-year-old PR seems to be a little bit hasty.

-- 
Eric Botcazou

Testsuite: Add missing 'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage (was: Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS')

2023-05-09 Thread Thomas Schwinge

Hi Christophe!

On 2023-05-09T09:32:55+0200, Christophe Lyon  wrote:
> On Wed, 3 May 2023 at 13:47, Richard Biener via Gcc-patches 
>  wrote:
>> On Wed, 3 May 2023, Thomas Schwinge wrote:
>> > "Let each 'lto_init' determine the default 'LTO_OPTIONS', and 
>> > 'torture-init' the 'LTO_TORTURE_OPTIONS'"?
>
> This is causing issues on arm/aarch64, including:
>
> ERROR: can't read "LTO_TORTURE_OPTIONS": no such variable
> in gcc.target/arm/acle/acle.exp:
>
> ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
> in gcc.target/aarch64/sls-mitigation/sls-mitigation.exp,
> gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp,
> gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp,
> gcc.target/aarch64/torture/aarch64-torture.exp
>
> and maybe others
>
> Are other targets affected too?

Sorry for that -- it means, the safe-guards I added are working as
expected.

Please test whether all these issues are gone with the attached
"Testsuite: Add missing 'torture-init'/'torture-finish' around 
'LTO_TORTURE_OPTIONS' usage"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 5f158fb7a5167e943e1410c7faa30e682ae85c4d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 9 May 2023 10:35:27 +0200
Subject: [PATCH] Testsuite: Add missing 'torture-init'/'torture-finish' around
 'LTO_TORTURE_OPTIONS' usage

Recent commit d6654a4be3ba44c0d57be7c8a51d76d9721345e1
"Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'"
made it a requirement that 'LTO_TORTURE_OPTIONS' usage be within
'torture-init'/'torture-finish', and missed a few cases that didn't have that.

	gcc/testsuite/
	* gcc.target/arm/acle/acle.exp: Add missing
	'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS'
	usage.
	* gcc.target/arm/cmse/cmse.exp: Likewise.
	* gcc.target/arm/pure-code/pure-code.exp: Likewise.
---
 gcc/testsuite/gcc.target/arm/acle/acle.exp   | 3 +++
 gcc/testsuite/gcc.target/arm/cmse/cmse.exp   | 2 ++
 gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/acle/acle.exp b/gcc/testsuite/gcc.target/arm/acle/acle.exp
index 7b99dd72987..4d63ccc9554 100644
--- a/gcc/testsuite/gcc.target/arm/acle/acle.exp
+++ b/gcc/testsuite/gcc.target/arm/acle/acle.exp
@@ -26,6 +26,7 @@ load_lib gcc-dg.exp
 
 # Initialize `dg'.
 dg-init
+torture-init
 
 set saved-dg-do-what-default ${dg-do-what-default}
 set dg-do-what-default "assemble"
@@ -48,5 +49,7 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
 # Restore globals
 set dg-do-what-default ${saved-dg-do-what-default}
 set LTO_TORTURE_OPTIONS ${saved-lto_torture_options}
+
 # All done.
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse.exp b/gcc/testsuite/gcc.target/arm/cmse/cmse.exp
index 1d251a4fa1f..0baf8c5a504 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse.exp
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse.exp
@@ -32,6 +32,7 @@ if ![info exists DEFAULT_CFLAGS] then {
 
 # Initialize `dg'.
 dg-init
+torture-init
 
 set saved-dg-do-what-default ${dg-do-what-default}
 
@@ -104,4 +105,5 @@ set LTO_TORTURE_OPTIONS ${saved-lto_torture_options}
 set dg-do-what-default ${saved-dg-do-what-default}
 
 # All done.
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp b/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp
index c23392dcdfd..6d32e4a7f8d 100644
--- a/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp
+++ b/gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp
@@ -35,6 +35,7 @@ if ![info exists DEFAULT_CFLAGS] then {
 if {[check_effective_target_arm_cortex_m]} then {
 # Initialize `dg'.
 dg-init
+torture-init
 
 set saved-dg-do-what-default ${dg-do-what-default}
 set dg-do-what-default "assemble"
@@ -58,5 +59,6 @@ set dg-do-what-default ${saved-dg-do-what-default}
 set LTO_TORTURE_OPTIONS ${saved-lto_torture_options}
 
 # All done.
+torture-finish
 dg-finish
 }
-- 
2.34.1

[PATCH] i386: Honour -mdirect-extern-access when calling fentry

2023-05-09 Thread Ard Biesheuvel via Gcc-patches

The small and medium PIC code models generate profiling calls that
always load the address of __fentry__() via the GOT, even if
-mdirect-extern-access is in effect.

This deviates from the behavior with respect to other external
references, and results in a longer opcode that relies on linker
relaxation to eliminate the GOT load. In this particular case, the
transformation replaces an indirect 'CALL *__fentry__@GOTPCREL(%rip)'
with either 'CALL __fentry__; NOP' or 'NOP; CALL __fentry__', where the
NOP is a 1 byte NOP that preserves the 6 byte length of the sequence.

This is problematic for the Linux kernel, which generally relies on
-mdirect-extern-access and hidden visibility to eliminate GOT based
symbol references in code generated with -fpie/-fpic, without having to
depend on linker relaxation.

The Linux kernel relies on code patching to replace these opcodes with
NOPs at runtime, and this is complicated code that we'd prefer not to
complicate even more by adding support for patching both 5 and 6 byte
sequences as well as parsing the instruction stream to decide which
variant of CALL+NOP we are dealing with.

So let's honour -mdirect-extern-access, and only load the address of
__fentry__ via the GOT if direct references to external symbols are not
permitted.

Note that the GOT reference in question is in fact a data reference: we
explicitly load the address of __fentry__ from the GOT, which amounts to
eager binding, rather than emitting a PLT call that could bind eagerly,
lazily or directly at link time.
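
The intended behaviour can be summarised by the following C++ sketch. It is a simplified model, not the GCC code: the enum, the function name fentry_call_text and the plain `call` string used for the direct case are assumptions made for illustration. In the patch itself the direct case falls through to x86_print_call_or_nop, and the large code models, which the patch does not touch, are left out here.

```cpp
#include <cassert>
#include <string>

// Illustrative subset of code models; not GCC's enum.
enum class code_model { small_pic, medium_pic, other };

// Returns the text of the profiling call for the given settings.
std::string
fentry_call_text (code_model cm, bool direct_extern_access,
                  const std::string &name)
{
  assert (!name.empty ());
  bool pic = cm == code_model::small_pic || cm == code_model::medium_pic;
  // Only go through the GOT when direct references to external symbols
  // are not permitted.
  if (pic && !direct_extern_access)
    return "call\t*" + name + "@GOTPCREL(%rip)";
  return "call\t" + name;
}
```

With direct external access enabled, the small and medium PIC models therefore produce the same direct call as the default path, so only one call form has to be recognised by the kernel's patching code.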

gcc/ChangeLog:

* config/i386/i386.cc (x86_function_profiler): Take
  ix86_direct_extern_access into account when generating calls
  to __fentry__().

Cc: H.J. Lu 
Cc: Jakub Jelinek 
Cc: Richard Biener 
Cc: Uros Bizjak 
Cc: Hou Wenlong 
---
 gcc/config/i386/i386.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b1d08ecdb3d44729..69b183abb4318b0a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21836,8 +21836,12 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
           break;
         case CM_SMALL_PIC:
         case CM_MEDIUM_PIC:
-          fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
-          break;
+          if (!ix86_direct_extern_access)
+            {
+              fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
+              break;
+            }
+          /* fall through */
         default:
           x86_print_call_or_nop (file, mcount_name);
           break;
-- 
2.39.2

Re: [PATCH] tree-ssa-ccp, wide-int: Fix up handling of [LR]ROTATE_EXPR in bitwise ccp [PR109778]

2023-05-09 Thread Richard Biener via Gcc-patches

On Tue, 9 May 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because bitwise ccp2 handles
> a rotate with a signed type incorrectly.
> Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
> arguments, all other callers just rotate in the right precision and
> I think work correctly.  ccp works with widest_ints and so rotations
> by the excessive precision certainly don't match what it wants
> when it sees a rotate in some specific bitsize.  Still, if it is
> unsigned rotate and the widest_int is zero extended from width,
> the functions perform left shift and logical right shift on the value
> and then at the end zero extend the result of left shift and uselessly
> also the result of logical right shift and return | of that.
> In the testcase, the argument of the signed char rrotate by 4 is
> CONSTANT -75, i.e. 0xf...fb5, with mask 2.
> The mask is correctly rotated to 0x20, but because the 8-bit constant
> is sign extended to a 192-bit one, the logical right shift by 4 doesn't
> yield the expected 0xb, but gives 0xf...fb, and then
> return wi::zext (left, width) | wi::zext (right, width); where left is
> 0xf...fb50, so we return 0xfb instead of the expected
> 0x5b.
> 
> The following patch fixes that by doing the zero extension in case of
> the right variable before doing wi::lrshift rather than after it.
> 
> Also, wi::[lr]rotate with width < precision always zero extends
> the result.  I'm afraid it can't do better because it doesn't know
> if it is done for an unsigned or signed type, but the caller in this
> case knows that very well, so I've done the extension based on sgn
> in the caller.  E.g. 0x5b rotated right (or left) by 4 with width 8
> previously gave 0xb5, but with sgn == SIGNED in widest_int it should be
> 0xf...fb5 instead.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and release branches?

OK.

Thanks,
Richard.

> 2023-05-09  Jakub Jelinek  
> 
>   PR tree-optimization/109778
>   * wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
>   wi::zext (x, width) rather than x if width != precision, rather
>   than using wi::zext (right, width) after the shift.
>   * tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
>   of wi::lrotate or wi::rrotate.
> 
>   * gcc.c-torture/execute/pr109778.c: New test.
> 
> --- gcc/wide-int.h.jj 2023-04-18 11:00:39.926725744 +0200
> +++ gcc/wide-int.h    2023-05-08 23:36:41.104412818 +0200
> @@ -3187,9 +3187,11 @@ wi::lrotate (const T1 &x, const T2 &y, u
>      width = precision;
>    WI_UNARY_RESULT (T2) ymod = umod_trunc (y, width);
>    WI_UNARY_RESULT (T1) left = wi::lshift (x, ymod);
> -  WI_UNARY_RESULT (T1) right = wi::lrshift (x, wi::sub (width, ymod));
> +  WI_UNARY_RESULT (T1) right
> +    = wi::lrshift (width != precision ? wi::zext (x, width) : x,
> +                   wi::sub (width, ymod));
>    if (width != precision)
> -    return wi::zext (left, width) | wi::zext (right, width);
> +    return wi::zext (left, width) | right;
>    return left | right;
>  }
>  
> @@ -3204,10 +3206,11 @@ wi::rrotate (const T1 &x, const T2 &y, u
>    if (width == 0)
>      width = precision;
>    WI_UNARY_RESULT (T2) ymod = umod_trunc (y, width);
> -  WI_UNARY_RESULT (T1) right = wi::lrshift (x, ymod);
> +  WI_UNARY_RESULT (T1) right
> +    = wi::lrshift (width != precision ? wi::zext (x, width) : x, ymod);
>    WI_UNARY_RESULT (T1) left = wi::lshift (x, wi::sub (width, ymod));
>    if (width != precision)
> -    return wi::zext (left, width) | wi::zext (right, width);
> +    return wi::zext (left, width) | right;
>    return left | right;
>  }
>  
> --- gcc/tree-ssa-ccp.cc.jj    2023-01-02 09:32:39.990030918 +0100
> +++ gcc/tree-ssa-ccp.cc       2023-05-09 00:03:02.692915316 +0200
> @@ -1552,6 +1552,8 @@ bit_value_binop (enum tree_code code, si
> *mask = wi::lrotate (r1mask, shift, width);
> *val = wi::lrotate (r1val, shift, width);
>   }
> +*mask = wi::ext (*mask, width, sgn);
> +*val = wi::ext (*val, width, sgn);
>   }
>   }
>else if (wi::ltu_p (r2val | r2mask, width)
> @@ -1593,8 +1595,8 @@ bit_value_binop (enum tree_code code, si
> /* Accumulate the result.  */
> res_mask |= tmp_mask | (res_val ^ tmp_val);
>   }
> -   *val = wi::bit_and_not (res_val, res_mask);
> -   *mask = res_mask;
> +   *val = wi::ext (wi::bit_and_not (res_val, res_mask), width, sgn);
> +   *mask = wi::ext (res_mask, width, sgn);
>   }
>break;
>  
> --- gcc/testsuite/gcc.c-torture/execute/pr109778.c.jj 2023-05-09 00:05:20.249959226 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr109778.c    2023-05-09 00:04:58.870263249 +0200
> @@ -0,0 +1,26 @@
> +/* PR tree-optimization/109778 */
> +
> +int a, b, c, d, *e = &c;
> +
> +static inline unsigned
> +foo (unsigned char x)
> +{
> +  x = 1 | x << 1;
> +  x = x

[PATCH] tree-ssa-ccp, wide-int: Fix up handling of [LR]ROTATE_EXPR in bitwise ccp [PR109778]

2023-05-09 Thread Jakub Jelinek via Gcc-patches

Hi!

The following testcase is miscompiled, because bitwise ccp2 handles
a rotate with a signed type incorrectly.
Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
arguments, all other callers just rotate in the right precision and
I think work correctly.  ccp works with widest_ints and so rotations
by the excessive precision certainly don't match what it wants
when it sees a rotate in some specific bitsize.  Still, if it is
unsigned rotate and the widest_int is zero extended from width,
the functions perform left shift and logical right shift on the value
and then at the end zero extend the result of left shift and uselessly
also the result of logical right shift and return | of that.
In the testcase, the argument of the signed char rrotate by 4 is
CONSTANT -75, i.e. 0xf...fb5, with mask 2.
The mask is correctly rotated to 0x20, but because the 8-bit constant
is sign extended to a 192-bit one, the logical right shift by 4 doesn't
yield the expected 0xb, but gives 0xf...fb, and then
return wi::zext (left, width) | wi::zext (right, width); where left is
0xf...fb50, so we return 0xfb instead of the expected
0x5b.

The following patch fixes that by doing the zero extension in case of
the right variable before doing wi::lrshift rather than after it.
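
The arithmetic above can be reproduced with a small sketch, assuming a 64-bit
integer as a stand-in for the 192-bit widest_int and 0 < width < 64. The
helper names (zext, rrotate_before, rrotate_after) are made up for this
illustration and only mimic the shape of wi::rrotate before and after the
patch.

```c
#include <stdint.h>

/* Keep only the low WIDTH bits of X (zero extension); 0 < WIDTH < 64.  */
static uint64_t
zext (uint64_t x, unsigned width)
{
  return x & ((UINT64_C (1) << width) - 1);
}

/* Shape of the old code: shift the full-precision value, then zero
   extend both halves.  Bits of the sign extension leak into RIGHT.  */
static uint64_t
rrotate_before (uint64_t x, unsigned y, unsigned width)
{
  unsigned ymod = y % width;
  uint64_t right = x >> ymod;
  uint64_t left = x << (width - ymod);
  return zext (left, width) | zext (right, width);
}

/* Shape of the patched code: zero extend X before the logical right
   shift, so only bits inside WIDTH take part in it.  */
static uint64_t
rrotate_after (uint64_t x, unsigned y, unsigned width)
{
  unsigned ymod = y % width;
  uint64_t right = zext (x, width) >> ymod;
  uint64_t left = x << (width - ymod);
  return zext (left, width) | right;
}
```

For x = (uint64_t) (int64_t) -75, y = 4 and width = 8, rrotate_before yields
0xfb while rrotate_after yields 0x5b, matching the values quoted above.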

Also, wi::[lr]rotate with width < precision always zero extends
the result.  I'm afraid it can't do better because it doesn't know
if it is done for an unsigned or signed type, but the caller in this
case knows that very well, so I've done the extension based on sgn
in the caller.  E.g. 0x5b rotated right (or left) by 4 with width 8
previously gave 0xb5, but with sgn == SIGNED in widest_int it should be
0xf...fb5 instead.
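
A minimal sketch of the signed case, again assuming a 64-bit integer in place
of widest_int and 0 < width < 64. The function name sext is hypothetical; it
only illustrates what extending the rotate result from width with
sgn == SIGNED amounts to.

```c
#include <stdint.h>

/* Sign-extend the low WIDTH bits of X to 64 bits; 0 < WIDTH < 64.  */
static uint64_t
sext (uint64_t x, unsigned width)
{
  uint64_t sign = UINT64_C (1) << (width - 1);
  x &= (sign << 1) - 1;       /* Drop everything above WIDTH bits.  */
  return (x ^ sign) - sign;   /* Propagate the sign bit upwards.  */
}
```

Applied to the zero-extended rotate result 0xb5 with width 8, this gives
0xffffffffffffffb5, i.e. -75 when read as a signed 64-bit value, whereas
0x5b is left unchanged because its sign bit is clear.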

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and release branches?

2023-05-09  Jakub Jelinek  

PR tree-optimization/109778
* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
wi::zext (x, width) rather than x if width != precision, rather
than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
of wi::lrotate or wi::rrotate.

* gcc.c-torture/execute/pr109778.c: New test.

--- gcc/wide-int.h.jj   2023-04-18 11:00:39.926725744 +0200
+++ gcc/wide-int.h  2023-05-08 23:36:41.104412818 +0200
@@ -3187,9 +3187,11 @@ wi::lrotate (const T1 &x, const T2 &y, u
     width = precision;
   WI_UNARY_RESULT (T2) ymod = umod_trunc (y, width);
   WI_UNARY_RESULT (T1) left = wi::lshift (x, ymod);
-  WI_UNARY_RESULT (T1) right = wi::lrshift (x, wi::sub (width, ymod));
+  WI_UNARY_RESULT (T1) right
+    = wi::lrshift (width != precision ? wi::zext (x, width) : x,
+                   wi::sub (width, ymod));
   if (width != precision)
-    return wi::zext (left, width) | wi::zext (right, width);
+    return wi::zext (left, width) | right;
   return left | right;
 }
 
@@ -3204,10 +3206,11 @@ wi::rrotate (const T1 &x, const T2 &y, u
   if (width == 0)
     width = precision;
   WI_UNARY_RESULT (T2) ymod = umod_trunc (y, width);
-  WI_UNARY_RESULT (T1) right = wi::lrshift (x, ymod);
+  WI_UNARY_RESULT (T1) right
+    = wi::lrshift (width != precision ? wi::zext (x, width) : x, ymod);
   WI_UNARY_RESULT (T1) left = wi::lshift (x, wi::sub (width, ymod));
   if (width != precision)
-    return wi::zext (left, width) | wi::zext (right, width);
+    return wi::zext (left, width) | right;
   return left | right;
 }
 
--- gcc/tree-ssa-ccp.cc.jj  2023-01-02 09:32:39.990030918 +0100
+++ gcc/tree-ssa-ccp.cc 2023-05-09 00:03:02.692915316 +0200
@@ -1552,6 +1552,8 @@ bit_value_binop (enum tree_code code, si
  *mask = wi::lrotate (r1mask, shift, width);
  *val = wi::lrotate (r1val, shift, width);
}
+  *mask = wi::ext (*mask, width, sgn);
+  *val = wi::ext (*val, width, sgn);
}
}
   else if (wi::ltu_p (r2val | r2mask, width)
@@ -1593,8 +1595,8 @@ bit_value_binop (enum tree_code code, si
  /* Accumulate the result.  */
  res_mask |= tmp_mask | (res_val ^ tmp_val);
}
- *val = wi::bit_and_not (res_val, res_mask);
- *mask = res_mask;
+ *val = wi::ext (wi::bit_and_not (res_val, res_mask), width, sgn);
+ *mask = wi::ext (res_mask, width, sgn);
}
   break;
 
--- gcc/testsuite/gcc.c-torture/execute/pr109778.c.jj   2023-05-09 00:05:20.249959226 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109778.c      2023-05-09 00:04:58.870263249 +0200
@@ -0,0 +1,26 @@
+/* PR tree-optimization/109778 */
+
+int a, b, c, d, *e = &c;
+
+static inline unsigned
+foo (unsigned char x)
+{
+  x = 1 | x << 1;
+  x = x >> 4 | x << 4;
+  return x;
+}
+
+static inline void
+bar (unsigned x)
+{
+  *e = 8 > foo (x + 86) - 86;
+}
+
+int
+main ()
+{
+  d = a && b;
+  bar (d + 4);
+  if (c != 1)
+    __builtin_abort ();
+}

Jakub
