date:20230725

The riscv_fsflags pattern wrongly assumes that fsflags accepts an immediate
operand, but the instruction takes only a register. Immediate values must
use fsflagsi instead.

For example:
__builtin_riscv_fsflags(4);

The following error occurred.
/tmp/ccoWdWqT.s: Assembler messages:
/tmp/ccoWdWqT.s:14: Error: illegal operands `fsflags 4'
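
The fix relies on the `%i0` modifier in the output template
`fsflags%i0\t%0`. As far as I understand the RISC-V backend, `%i` appends
an `i` to the mnemonic when the operand is not a register. The sketch below
is a hand-written model of that choice, not GCC code; `Operand` and
`emit_fsflags` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an insn operand: a register name or a small
// immediate, already rendered as text.
struct Operand {
  bool is_register;
  std::string text;   // e.g. "a0" or "4"
};

// Expand the template "fsflags%i0\t%0" by hand: add the "i" suffix only
// when operand 0 is not a register (assumed meaning of "%i").
std::string emit_fsflags(const Operand& op) {
  std::string insn = "fsflags";
  if (!op.is_register)
    insn += "i";
  return insn + "\t" + op.text;
}
```

With a register operand such as a0 the model yields `fsflags\ta0`; with the
constant 4 it yields `fsflagsi\t4`, which is the form the assembler accepts.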

gcc/ChangeLog:

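* config/riscv/riscv.md (riscv_fscsr): Use register_operand and
the r constraint only.
(riscv_fsflags): Emit fsflagsi when the operand is an immediate.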

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fsflags.c: New test.
---
 gcc/config/riscv/riscv.md|  4 ++--
 gcc/testsuite/gcc.target/riscv/fsflags.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/fsflags.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4615e811947..24515bcf706 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3074,7 +3074,7 @@ (define_insn "riscv_frcsr"
   "frcsr\t%0")
 
 (define_insn "riscv_fscsr"
-  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] UNSPECV_FSCSR)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fscsr\t%0")
 
@@ -3087,7 +3087,7 @@ (define_insn "riscv_frflags"
 (define_insn "riscv_fsflags"
   [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSFLAGS)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
-  "fsflags\t%0")
+  "fsflags%i0\t%0")
 
 (define_insn "*riscv_fsnvsnan2"
   [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
diff --git a/gcc/testsuite/gcc.target/riscv/fsflags.c b/gcc/testsuite/gcc.target/riscv/fsflags.c
new file mode 100644
index 000..74a97b8a7c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fsflags.c
@@ -0,0 +1,16 @@
+/* Verify that fsflags is using the correct register or immediate.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O" } */
+
+void foo1 (int a)
+{
+   __builtin_riscv_fsflags(a);
+}
+void foo2 ()
+{
+   __builtin_riscv_fsflags(4);
+}
+
+/* { dg-final { scan-assembler-times "fsflags\t" 1 } } */
+/* { dg-final { scan-assembler-times "fsflagsi\t" 1 } } */
-- 
2.17.1

Re: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

> So I guess you should change `fscsr` to `fscsr%i0` instead of dropping
> K from the constraint list?
> 
Sorry, you are right. I thought you were talking about fsflags and
didn't notice it was fscsr. I'll correct it right away.
> On Wed, Jul 26, 2023 at 11:42 AM juzhe.zh...@rivai.ai wrote:
> >
> > I don't understand:
> >  (define_insn "riscv_fscsr"
> > -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> > +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
> >    "TARGET_HARD_FLOAT || TARGET_ZFINX"
> >    "fscsr\t%0")
> >
> > This pattern no longer allows an immediate in the constraint. Why does the
> > predicate still allow an immediate?
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Jin Ma
> > Date: 2023-07-26 11:33
> > To: gcc-patches; juzhe.zh...@rivai.ai
> > CC: jeffreyalaw; palmer; richard.sandiford; kito.cheng; philipp.tomsich;
> > christoph.muellner; Robin Dapp; jinma.contrib
> > Subject: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using
> > immediate.
> > > -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> > > +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
> > >
> > > If you don't allow immediate value in range 0 ~ 31, it should be
> > > "register_operand" instead of "csr_operand".
> > >
> >
> > I think a pattern that supports the immediate form might be better: on the
> > one hand, fsflagsi is supported in the manual; on the other hand, fsflagsi
> > can be slightly faster than fsflags.
> >
> > Regards
> > Jin
> >
> > >
> > > juzhe.zh...@rivai.ai
> > >

RE: [PATCH] RISC-V: Fix vector tuple intrinsic

2023-07-25 Thread Li, Pan2 via Gcc-patches

Thanks a lot. I just forwarded an email about the write-after-approval steps.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of juzhe.zh...@rivai.ai
Sent: Wednesday, July 26, 2023 12:22 PM
To: Li Xu ; gcc-patches 
Cc: kito.cheng ; palmer ; Li Xu 

Subject: Re: [PATCH] RISC-V: Fix vector tuple intrinsic

Thanks a lot for testing and fixing the RVV API.

Could you add a simple float16 tuple API test?

I know the API is so big that we can't add tests for all of it to the testsuite, but
adding a simple case would be nice.

By the way, do you have write access?




juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-07-26 12:04
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Fix vector tuple intrinsic
Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}
 
Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.
---
gcc/config/riscv/riscv-vector-builtins.def | 50 +++---
gcc/config/riscv/vector-iterators.md   |  1 +
gcc/config/riscv/vector.md |  1 +
3 files changed, 27 insertions(+), 25 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, _f16mf4,
  _f16, _e16mf4)
/* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)

Re: [PATCH] - Devirtualization of array destruction (C++) - 110057


On 7/12/23 10:10, Ng YongXiang via Gcc-patches wrote:

Component:
c++

Bug ID:
110057

Bugzilla link:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057

Description:
Array should not call virtual destructor of object when array is destructed

ChangeLog:

2023-07-12  Ng YongXiang

PR c++
* Devirtualize auto generated destructor calls of array

cp/
* init.c: Call non virtual destructor of objects in array

testsuite/
* g++.dg/devirt-array-destructor-1.C: New.
* g++.dg/devirt-array-destructor-2.C: New.


On Wed, Jul 12, 2023 at 5:02 PM Xi Ruoyao  wrote:


On Wed, 2023-07-12 at 16:58 +0800, Ng YongXiang via Gcc-patches wrote:

I'm writing to ask for a review of an issue I filed some time ago.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch is
attached in the bug tracker as well.


You should send the patch to gcc-patches@gcc.gnu.org for a review, see
https://gcc.gnu.org/contribute.html for the details.  Generally we
consider patches attached in bugzilla as drafts.


Thanks!  The change makes sense under 
https://eel.is/c++draft/expr.delete#3.sentence-2 , but please look again 
at contribute.html.


In particular, the Legal section; you don't seem to have a copyright 
assignment with the FSF, nor do I see a DCO certification 
(https://gcc.gnu.org/dco.html) in your patch.


Like the examples in contribute.html, the subject line should be more 
like "[PATCH] c++: devirtualization of array destruction [PR110057]"


The ChangeLog entry should be in the commit message.


 * g++.dg/warn/pr83054.C: Change expected number of devirtualized calls


This isn't just changing the expected number, it's also changing the 
array from a local variable to dynamically allocated, which is a big 
change to what's being tested.  If you want to test the dynamic case, 
please add a new test instead of making this change.
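
To illustrate the difference between the two cases, here is a small sketch
(not the testcase from the patch; `Foo`, `local_array_case` and
`heap_pointer_case` are made-up names). In the first function the elements
belong to a local array, so their dynamic type is known to be exactly `Foo`
and the destructor calls emitted at scope exit are the ones this patch wants
to devirtualize. In the second, each object is deleted through a `Foo*`,
which is an ordinary virtual destructor call.

```cpp
#include <cassert>

struct Foo {
  static int live;            // number of currently constructed objects
  Foo() { ++live; }
  virtual ~Foo() { --live; }
};
int Foo::live = 0;

// Local array: the compiler generates the nine destructor calls itself,
// and every element's dynamic type is exactly Foo.
int local_array_case() {
  int seen = 0;
  {
    Foo array[3][3];
    (void)array;              // silence unused-variable warnings
    seen = Foo::live;
  }
  assert(Foo::live == 0);     // all nine elements were destroyed at scope exit
  return seen;
}

// Separately allocated objects: each delete goes through a Foo*, so without
// further knowledge the destructor call is dispatched virtually.
int heap_pointer_case() {
  Foo* arr[9];
  for (int i = 0; i < 9; ++i)
    arr[i] = new Foo();
  int seen = Foo::live;
  for (int i = 0; i < 9; ++i)
    delete arr[i];
  return seen;
}
```

Both functions observe nine live objects, but they exercise different
destruction paths, which is why they belong in separate tests.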



diff --git a/gcc/testsuite/g++.dg/warn/pr83054.C b/gcc/testsuite/g++.dg/warn/pr83054.C
index 5285f94acee..7cd0951713d 100644
--- a/gcc/testsuite/g++.dg/warn/pr83054.C
+++ b/gcc/testsuite/g++.dg/warn/pr83054.C
@@ -10,7 +10,7 @@
 #endif
 
 extern "C" int printf (const char *, ...);

-struct foo // { dg-warning "final would enable devirtualization of 5 calls" }
+struct foo // { dg-warning "final would enable devirtualization of 1 call" }
 {
   static int count;
   void print (int i, int j) { printf ("foo[%d][%d] = %d\n", i, j, x); }
@@ -29,19 +29,15 @@ int foo::count;
 
 int main ()

 {
-  {
-foo array[3][3];
-for (int i = 0; i < 3; i++)
-  {
-   for (int j = 0; j < 3; j++)
- {
-   printf("[%d][%d] = %x\n", i, j, (void *)&array[i][j]);
- }
-  }
-  // The count should be nine, if not, fail the test.
-  if (foo::count != 9)
-   return 1;
-  }
+  foo* arr[9];
+  for (int i = 0; i < 9; ++i)
+arr[i] = new foo();
+  if (foo::count != 9)
+return 1;
+  for (int i = 0; i < 9; ++i)
+arr[i]->print(i / 3, i % 3);
+  for (int i = 0; i < 9; ++i)
+delete arr[i];

Re: [PATCH] RISC-V: Fix vector tuple intrinsic

Thanks a lot for testing and fixing the RVV API.

Could you add a simple float16 tuple API test?

I know the API is so big that we can't add tests for all of it to the testsuite, but
adding a simple case would be nice.

By the way, do you have write access?




juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-07-26 12:04
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Fix vector tuple intrinsic
Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}
 
Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.
---
gcc/config/riscv/riscv-vector-builtins.def | 50 +++---
gcc/config/riscv/vector-iterators.md   |  1 +
gcc/config/riscv/vector.md |  1 +
3 files changed, 27 insertions(+), 25 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, _f16mf4,
  _f16, _e16mf4)
/* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float16, 8, _f16mf4x8)
/* LMUL = 1/2.  */
DEF_RVV_TYPE (vfloat16mf2_t, 18, __rvv_float16mf2_t, float16, RVVMF2HF, _f16mf2,
  _f16, _e16mf2)
/* Define tuple types for SEW = 16, LMUL = MF2. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x2_t, 20,

[PATCH] RISC-V: Fix vector tuple intrinsic

2023-07-25 Thread Li Xu

Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}

Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.
---
 gcc/config/riscv/riscv-vector-builtins.def | 50 +++---
 gcc/config/riscv/vector-iterators.md   |  1 +
 gcc/config/riscv/vector.md |  1 +
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
 DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, 
_f16mf4,
  _f16, _e16mf4)
 /* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float16, 8, _f16mf4x8)
 /* LMUL = 1/2.  */
 DEF_RVV_TYPE (vfloat16mf2_t, 18, __rvv_float16mf2_t, float16, RVVMF2HF, 
_f16mf2,
  _f16, _e16mf2)
 /* Define tuple types for SEW = 16, LMUL = MF2. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x2_t, 20, __rvv_float16mf2x2_t, vfloat16mf2_t, 
float, 2, _f16mf2x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x3_t, 20, __rvv_float16mf2x3_t, vfloat16mf2_t, 
float, 3, _f16mf2x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x4_t, 20, __rvv_float16mf2x4_t, vfloat16mf2_t, 
float, 4, _f16mf2x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x5_t, 20, __rvv_float16mf2x5_t, vfloat16mf2_t, 
float, 5, _f16mf2x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x6_t, 20,

Re: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

Yes, I agree.

I haven't looked at the spec, so I'm not sure whether fscsr has an immediate form.

What I mean is that this patch's change to 'fscsr' is quite confusing.

You should either fix the assembly code generation, if fscsr has an immediate form,

or fix both the predicate and the constraint (not the constraint alone).

Thanks.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-07-26 11:45
To: juzhe.zh...@rivai.ai
CC: jinma; gcc-patches; jeffreyalaw; palmer; richard.sandiford; 
philipp.tomsich; christoph.muellner; Robin Dapp; jinma.contrib
Subject: Re: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using 
immediate.
So I guess you should change `fscsr` to `fscsr%i0` instead of dropping
K from the constraint list?

On Wed, Jul 26, 2023 at 11:42 AM juzhe.zh...@rivai.ai wrote:
>
> I don't understand:
>  (define_insn "riscv_fscsr"
> -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
>    "TARGET_HARD_FLOAT || TARGET_ZFINX"
>    "fscsr\t%0")
>
> This pattern no longer allows an immediate in the constraint. Why does the
> predicate still allow an immediate?
>
> juzhe.zh...@rivai.ai
>
> From: Jin Ma
> Date: 2023-07-26 11:33
> To: gcc-patches; juzhe.zh...@rivai.ai
> CC: jeffreyalaw; palmer; richard.sandiford; kito.cheng; philipp.tomsich;
> christoph.muellner; Robin Dapp; jinma.contrib
> Subject: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using
> immediate.
> > -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> > +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
> >
> > If you don't allow immediate value in range 0 ~ 31, it should be
> > "register_operand" instead of "csr_operand".
> >
>
> I think a pattern that supports the immediate form might be better: on the
> one hand, fsflagsi is supported in the manual; on the other hand, fsflagsi
> can be slightly faster than fsflags.
>
> Regards
> Jin
>
> >
> > juzhe.zh...@rivai.ai
> >
Re: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

So I guess you should change `fscsr` to `fscsr%i0` instead of dropping
K from the constraint list?

On Wed, Jul 26, 2023 at 11:42 AM juzhe.zh...@rivai.ai wrote:
>
> I don't understand:
>  (define_insn "riscv_fscsr"
> -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
>    "TARGET_HARD_FLOAT || TARGET_ZFINX"
>    "fscsr\t%0")
>
> This pattern no longer allows an immediate in the constraint. Why does the
> predicate still allow an immediate?
>
> juzhe.zh...@rivai.ai
>
> From: Jin Ma
> Date: 2023-07-26 11:33
> To: gcc-patches; juzhe.zh...@rivai.ai
> CC: jeffreyalaw; palmer; richard.sandiford; kito.cheng; philipp.tomsich;
> christoph.muellner; Robin Dapp; jinma.contrib
> Subject: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using
> immediate.
> > -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> > +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
> >
> > If you don't allow immediate value in range 0 ~ 31, it should be
> > "register_operand" instead of "csr_operand".
> >
>
> I think a pattern that supports the immediate form might be better: on the
> one hand, fsflagsi is supported in the manual; on the other hand, fsflagsi
> can be slightly faster than fsflags.
>
> Regards
> Jin
>
> >
> > juzhe.zh...@rivai.ai
> >

Re: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

I don't understand:
 (define_insn "riscv_fscsr"
-  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
+  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fscsr\t%0")

This pattern no longer allows an immediate in the constraint. Why does the predicate
still allow an immediate?
 



juzhe.zh...@rivai.ai
 
From: Jin Ma
Date: 2023-07-26 11:33
To: gcc-patches; juzhe.zh...@rivai.ai
CC: jeffreyalaw; palmer; richard.sandiford; kito.cheng; philipp.tomsich; 
christoph.muellner; Robin Dapp; jinma.contrib
Subject: Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using 
immediate.
> -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
> 
> If you don't allow immediate value in range 0 ~ 31, it should be 
> "register_operand" instead of "csr_operand".
> 
> 
 
I think a pattern that supports the immediate form might be better: on the
one hand, fsflagsi is supported in the manual; on the other hand, fsflagsi
can be slightly faster than fsflags.
 
Regards
Jin
 
> 
> juzhe.zh...@rivai.ai
>

Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

> -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
> +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
> 
> If you don't allow immediate value in range 0 ~ 31, it should be 
> "register_operand" instead of "csr_operand".
> 
> 

I think a pattern that supports the immediate form might be better: on the
one hand, fsflagsi is supported in the manual; on the other hand, fsflagsi
can be slightly faster than fsflags.

Regards
Jin

> 
> juzhe.zh...@rivai.ai
>

Re: RISC-V: Folding memory for FP + constant case





On 7/25/23 05:24, Jivan Hakobyan wrote:

> Hi.
>
> I re-ran the benchmarks and hopefully got the same profit.
> I also compared leela's code and figured out the reason.
>
> Actually, my and Manolis's patches do the same thing. The difference is
> only execution order.

But shouldn't your patch also allow for at least the potential to
pull the fp+offset computation out of a loop?  I'm pretty sure Manolis's
patch can't do that.

> Because f-m-o runs after register allocation, it cannot eliminate the
> redundant move of 'sp' to another register.

Actually that's supposed to be handled by a different patch that should
already be upstream.  Specifically:



commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis 
Date:   Thu May 25 13:44:41 2023 +0200

cprop_hardreg: Enable propagation of the stack pointer if possible

Propagation of the stack pointer in cprop_hardreg is currently
forbidden in all cases, due to maybe_mode_change returning NULL.
Relax this restriction and allow propagation when no mode change is
requested.

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Enable stack pointer
propagation.

I think there were a couple of follow-ups.  But that's the key change that
should allow propagation of copies from the stack pointer and thus 
eliminate the mov gpr,sp instructions.  If that's not happening, then 
it's worth investigating why.




> Besides that, I have checked the build failure on x264_r. It is already
> fixed in the third version.

Yea, this was a problem with re-recognition.  I think it was fixed by:


commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
Author: Vineet Gupta 
Date:   Thu Jul 20 11:15:37 2023 -0700

RISC-V: optim const DF +0.0 store to mem [PR/110748]

Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

[ ... ]
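
The observation in that commit message can be checked with a few lines of
C++. This is only an illustration, assuming `double` is IEEE 754 binary64
(which is what DFmode is on RISC-V); `bits_of` and
`can_store_as_integer_zero` are helper names invented here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw 64-bit pattern of a double (assumes IEEE 754 binary64).
std::uint64_t bits_of(double d) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "double must be 64 bits wide");
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// True when storing integer zero to memory is equivalent to storing d.
bool can_store_as_integer_zero(double d) {
  return bits_of(d) == 0;
}
```

Only +0.0 qualifies; -0.0 has the sign bit set, so it cannot be replaced by
a store of x0.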


So I think the big question WRT your patch is whether it still helps the
case where we weren't pulling the fp+offset computation out of a loop.


Jeff

Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]





On 7/25/23 17:05, Palmer Dabbelt wrote:

On Fri, 21 Jul 2023 11:47:58 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On 7/21/23 12:31, Palmer Dabbelt wrote:

(define_expand "len_mask_gather_load"
   [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
    (match_operand:VNX1_QHSDI 2 "register_operand")
    (match_operand 3 "")
    (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...

But won't this cause (const_int 0) to no longer match because CONST_INT
nodes are modeless (VOIDmode)?


I poked around a bit and I'm not actually sure, I'm kind of lost on the docs
here.  IIUC we're eliding the VOIDmode in the predicate correctly

    (define_predicate "const_0_operand"
  (and (match_code "const_int,const_wide_int,const_vector")
   (match_test "op == CONST0_RTX (GET_MODE (op))")))

so we're OK there, otherwise we'd presumably have similar problems with
expanders like

    (define_expand "subsi3"
  [(set (match_operand:SI   0 "register_operand" "= r")
   (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
     (match_operand:SI 2 "register_operand" "  r")))]
  ""

which we have a few of -- though it'd be kind of a silent failure, as
presumably we'd just end up with some more move-x0s emitted?

It's a bit messy, to say the least.  However, we can look at other ports
and after doing so I'm less sure my concern is valid.


Take the typical movXX pattern or expander.  Both operands have a mode, 
so things like CONST_INT must be passing through, even though they're 
VOIDmode.


So it's probably a non-issue.
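
As a rough mental model of why that works (my understanding of how
generated predicates treat modes, not a copy of GCC's code): when a
predicate can match CONST_INT, the mode test also accepts operands whose own
mode is VOIDmode. All names below (`Mode`, `Code`, `Rtx`, `mode_test`,
`reg_or_0_operand_model`) are invented for this sketch.

```cpp
#include <cassert>

enum class Mode { Void, SI, DI };
enum class Code { Reg, ConstInt };

// Toy stand-in for an RTL expression.
struct Rtx {
  Code code;
  Mode mode;     // CONST_INT nodes are modeless, i.e. Mode::Void
  long value;    // only meaningful for ConstInt
};

// Mode test as assumed in this sketch: accept when no mode is requested,
// when the modes agree, or when the operand itself is modeless.
bool mode_test(const Rtx& op, Mode wanted) {
  return wanted == Mode::Void || op.mode == wanted || op.mode == Mode::Void;
}

// Model of a "register or zero" predicate used with an explicit mode.
bool reg_or_0_operand_model(const Rtx& op, Mode wanted) {
  if (!mode_test(op, wanted))
    return false;
  if (op.code == Code::Reg)
    return true;
  return op.code == Code::ConstInt && op.value == 0;
}
```

Under these assumptions a modeless (const_int 0) still matches an operand
declared with SImode, while a register of the wrong mode is rejected.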
jeff

[PATCH] rs6000: Correct vsx operands output for xxeval [PR110741]

2023-07-25 Thread Kewen.Lin via Gcc-patches

Hi,

PR110741 exposes an issue: we didn't use the correct modifier
character for VSX operands in output operand substitution, so
the instruction can refer to the wrong registers, which hold
unexpected values.
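
For readers unfamiliar with the modifier: VSX instructions address 64
registers, where the floating-point registers occupy VSX numbers 0-31 and
the Altivec vector registers occupy 32-63. Printing an Altivec register with
plain `%0` yields its vector register number, while `%x0` yields the VSX
number. The following is a simplified model of that mapping as I understand
it; `RegFile`, `plain_number` and `vsx_number` are hypothetical names, not
rs6000 functions.

```cpp
#include <cassert>

enum class RegFile { Fpr, Altivec };

// What a plain "%0" prints in this model: the number within its own file.
int plain_number(RegFile, int index) { return index; }

// What "%x0" prints in this model: the register's VSX number.
int vsx_number(RegFile file, int index) {
  return file == RegFile::Fpr ? index : index + 32;
}
```

So an operand allocated to v2 must be emitted as VSX register 34; emitting 2
would make xxeval use the unrelated register vs2.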

Bootstrapped and regress-tested on powerpc64-linux-gnu
P7/P8/P9 and powerpc64le-linux-gnu P9/P10.

I'll push this soon and backport to release branches after
a week or so.

BR,
Kewen
-
PR target/110741

gcc/ChangeLog:

* config/rs6000/vsx.md (define_insn xxeval): Correct vsx
operands output with "x".

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr110741.C: New test.
---
 gcc/config/rs6000/vsx.md|   2 +-
 gcc/testsuite/g++.target/powerpc/pr110741.C | 552 
 2 files changed, 553 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr110741.C

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0c269e4e8d9..1a87f1c0b63 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6586,7 +6586,7 @@ (define_insn "xxeval"
  (match_operand:QI 4 "u8bit_cint_operand" "n")]
 UNSPEC_XXEVAL))]
"TARGET_POWER10"
-   "xxeval %0,%1,%2,%3,%4"
+   "xxeval %x0,%x1,%x2,%x3,%4"
[(set_attr "type" "vecperm")
 (set_attr "prefixed" "yes")])

diff --git a/gcc/testsuite/g++.target/powerpc/pr110741.C b/gcc/testsuite/g++.target/powerpc/pr110741.C
new file mode 100644
index 000..0214936b06d
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr110741.C
@@ -0,0 +1,552 @@
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#include 
+
+typedef unsigned char uint8_t;
+
+template 
+static inline vector unsigned long long
+VSXTernaryLogic (vector unsigned long long a, vector unsigned long long b,
+vector unsigned long long c)
+{
+  return vec_ternarylogic (a, b, c, kTernLogOp);
+}
+
+static vector unsigned long long
+VSXTernaryLogic (vector unsigned long long a, vector unsigned long long b,
+vector unsigned long long c, int ternary_logic_op)
+{
+  switch (ternary_logic_op & 0xFF)
+{
+case 0:
+  return VSXTernaryLogic<0> (a, b, c);
+case 1:
+  return VSXTernaryLogic<1> (a, b, c);
+case 2:
+  return VSXTernaryLogic<2> (a, b, c);
+case 3:
+  return VSXTernaryLogic<3> (a, b, c);
+case 4:
+  return VSXTernaryLogic<4> (a, b, c);
+case 5:
+  return VSXTernaryLogic<5> (a, b, c);
+case 6:
+  return VSXTernaryLogic<6> (a, b, c);
+case 7:
+  return VSXTernaryLogic<7> (a, b, c);
+case 8:
+  return VSXTernaryLogic<8> (a, b, c);
+case 9:
+  return VSXTernaryLogic<9> (a, b, c);
+case 10:
+  return VSXTernaryLogic<10> (a, b, c);
+case 11:
+  return VSXTernaryLogic<11> (a, b, c);
+case 12:
+  return VSXTernaryLogic<12> (a, b, c);
+case 13:
+  return VSXTernaryLogic<13> (a, b, c);
+case 14:
+  return VSXTernaryLogic<14> (a, b, c);
+case 15:
+  return VSXTernaryLogic<15> (a, b, c);
+case 16:
+  return VSXTernaryLogic<16> (a, b, c);
+case 17:
+  return VSXTernaryLogic<17> (a, b, c);
+case 18:
+  return VSXTernaryLogic<18> (a, b, c);
+case 19:
+  return VSXTernaryLogic<19> (a, b, c);
+case 20:
+  return VSXTernaryLogic<20> (a, b, c);
+case 21:
+  return VSXTernaryLogic<21> (a, b, c);
+case 22:
+  return VSXTernaryLogic<22> (a, b, c);
+case 23:
+  return VSXTernaryLogic<23> (a, b, c);
+case 24:
+  return VSXTernaryLogic<24> (a, b, c);
+case 25:
+  return VSXTernaryLogic<25> (a, b, c);
+case 26:
+  return VSXTernaryLogic<26> (a, b, c);
+case 27:
+  return VSXTernaryLogic<27> (a, b, c);
+case 28:
+  return VSXTernaryLogic<28> (a, b, c);
+case 29:
+  return VSXTernaryLogic<29> (a, b, c);
+case 30:
+  return VSXTernaryLogic<30> (a, b, c);
+case 31:
+  return VSXTernaryLogic<31> (a, b, c);
+case 32:
+  return VSXTernaryLogic<32> (a, b, c);
+case 33:
+  return VSXTernaryLogic<33> (a, b, c);
+case 34:
+  return VSXTernaryLogic<34> (a, b, c);
+case 35:
+  return VSXTernaryLogic<35> (a, b, c);
+case 36:
+  return VSXTernaryLogic<36> (a, b, c);
+case 37:
+  return VSXTernaryLogic<37> (a, b, c);
+case 38:
+  return VSXTernaryLogic<38> (a, b, c);
+case 39:
+  return VSXTernaryLogic<39> (a, b, c);
+case 40:
+  return VSXTernaryLogic<40> (a, b, c);
+case 41:
+  return VSXTernaryLogic<41> (a, b, c);
+case 42:
+  return VSXTernaryLogic<42> (a, b, c);
+case 43:
+  return VSXTernaryLogic<43> (a, b, c);
+case 44:
+  return VSXTernaryLogic<44> (a, b, c);
+case 45:
+  return VSXTernaryLogic<45> (a, b, c);
+case 46:
+  return VSXTernaryLogic<46> (a, b, c);
+case 47:
+  return VSXTernaryLogic<47> (a, b, c);
+case 48:
+

[PATCH] vect: Treat VMAT_ELEMENTWISE as scalar load in costing [PR110776]

2023-07-25 Thread Kewen.Lin via Gcc-patches

Hi,

PR110776 exposes an issue: costing can query the cost of an
unaligned vector load even though no unaligned vector load is
supported on the target.  The costed load has a single-lane
vector type and its memory access type is VMAT_ELEMENTWISE, so
we actually treat it as a scalar load and set its
alignment_support_scheme to dr_unaligned_supported.

To avoid the ICE, following Rich's suggestion, this patch
makes VMAT_ELEMENTWISE always be costed as a scalar load.
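
The decision the patch changes can be sketched roughly as follows. This is a simplified model, not the code in vectorizable_load: the enums and the function name load_cost_kind are hypothetical stand-ins, and only VMAT_ELEMENTWISE and the shape of the condition come from the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the vectorizer's types.
enum MemoryAccessType { VMAT_CONTIGUOUS, VMAT_ELEMENTWISE };
enum CostKind { COST_SCALAR_LOAD, COST_VECTOR_LOAD };

// A load is costed as a vector load only when its type is a vector type
// and the access is not element-wise; everything else is a scalar load.
CostKind load_cost_kind(bool ltype_is_vector, MemoryAccessType access) {
  if (ltype_is_vector && access != VMAT_ELEMENTWISE)
    return COST_VECTOR_LOAD;
  return COST_SCALAR_LOAD;
}
```

With this condition a single-lane vector type accessed element-wise never reaches the vector-load costing path, so the unaligned vector load cost is never queried for it.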

Bootstrapped and regress-tested on x86_64-redhat-linux,
powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-

Co-authored-by: Richard Biener 

PR tree-optimization/110776

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Always cost VMAT_ELEMENTWISE
as scalar load.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr110776.c: New test.
---
 gcc/testsuite/gcc.target/powerpc/pr110776.c | 22 +
 gcc/tree-vect-stmts.cc  |  5 -
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr110776.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr110776.c 
b/gcc/testsuite/gcc.target/powerpc/pr110776.c
new file mode 100644
index 000..749159fd675
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110776.c
@@ -0,0 +1,22 @@
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power6 -maltivec" } */
+
+/* Verify there is no ICE.  */
+
+int a;
+long *b;
+int
+c ()
+{
+  long e;
+  int d = 0;
+  for (long f; f; f++)
+{
+  e = b[f * a];
+  if (e)
+   d = 1;
+}
+  if (d)
+for (;;)
+  ;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ed28fbdced3..09705200594 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9840,7 +9840,10 @@ vectorizable_load (vec_info *vinfo,
{
  if (costing_p)
{
- if (VECTOR_TYPE_P (ltype))
+ /* For VMAT_ELEMENTWISE, just cost it as scalar_load to
+avoid ICE, see PR110776.  */
+ if (VECTOR_TYPE_P (ltype)
+ && memory_access_type != VMAT_ELEMENTWISE)
vect_get_load_cost (vinfo, stmt_info, 1,
alignment_support_scheme, misalignment,
false, _cost, nullptr, cost_vec,
--
2.39.1

Re: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

-  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
+  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]

If you don't allow immediate values in the range 0 to 31, the predicate
should be "register_operand" instead of "csr_operand".



juzhe.zh...@rivai.ai
 
From: Jin Ma
Date: 2023-07-26 10:17
To: gcc-patches
CC: jeffreyalaw; palmer; richard.sandiford; kito.cheng; philipp.tomsich; 
christoph.muellner; rdapp.gcc; juzhe.zhong; jinma.contrib; Jin Ma
Subject: [PATCH v3] RISC-V: Fixbug for fsflags instruction error using 
immediate.
The pattern mistakenly assumes that fsflags accepts an immediate operand,
but it does not.  Immediate operands must use fsflagsi instead.
 
For example:
__builtin_riscv_fsflags(4);
 
The following error occurred.
/tmp/ccoWdWqT.s: Assembler messages:
/tmp/ccoWdWqT.s:14: Error: illegal operands `fsflags 4'
 
gcc/ChangeLog:
 
* config/riscv/riscv.md: Likewise.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/fsflags.c: New test.
---
gcc/config/riscv/riscv.md|  4 ++--
gcc/testsuite/gcc.target/riscv/fsflags.c | 16 
2 files changed, 18 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/fsflags.c
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4615e811947..74ff9ccc968 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3074,7 +3074,7 @@ (define_insn "riscv_frcsr"
   "frcsr\t%0")
(define_insn "riscv_fscsr"
-  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
+  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fscsr\t%0")
@@ -3087,7 +3087,7 @@ (define_insn "riscv_frflags"
(define_insn "riscv_fsflags"
   [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSFLAGS)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
-  "fsflags\t%0")
+  "fsflags%i0\t%0")
(define_insn "*riscv_fsnvsnan2"
   [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
diff --git a/gcc/testsuite/gcc.target/riscv/fsflags.c 
b/gcc/testsuite/gcc.target/riscv/fsflags.c
new file mode 100644
index 000..74a97b8a7c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fsflags.c
@@ -0,0 +1,16 @@
+/* Verify that fsflags is using the correct register or immediate.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O" } */
+
+void foo1 (int a)
+{
+   __builtin_riscv_fsflags(a);
+}
+void foo2 ()
+{
+   __builtin_riscv_fsflags(4);
+}
+
+/* { dg-final { scan-assembler-times "fsflags\t" 1 } } */
+/* { dg-final { scan-assembler-times "fsflagsi\t" 1 } } */
-- 
2.17.1

[PATCH v3] RISC-V: Fixbug for fsflags instruction error using immediate.

The pattern mistakenly assumes that fsflags accepts an immediate operand,
but it does not.  Immediate operands must use fsflagsi instead.

For example:
__builtin_riscv_fsflags(4);

The following error occurred.
/tmp/ccoWdWqT.s: Assembler messages:
/tmp/ccoWdWqT.s:14: Error: illegal operands `fsflags 4'
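
The intended behaviour of the new "fsflags%i0\t%0" template can be sketched as below. This is a minimal model and not the RISC-V back end: FlagSource, accepts and emit_fsflags are hypothetical names, the "x<N>" register spelling is only illustrative, and the 0..31 range is the one mentioned in the review of this patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical operand model: a register number or an integer constant.
struct FlagSource {
  bool is_register;
  long value;
};

// Models a csr_operand-style predicate: a register, or a constant in 0..31.
bool accepts(const FlagSource &op) {
  return op.is_register || (op.value >= 0 && op.value <= 31);
}

// Models the template: the 'i' suffix is appended only when operand 0 is
// not a register, so constants select fsflagsi and registers select fsflags.
std::string emit_fsflags(const FlagSource &op) {
  assert(accepts(op));
  std::string mnemonic = op.is_register ? "fsflags" : "fsflagsi";
  std::string text = op.is_register ? "x" + std::to_string(op.value)
                                    : std::to_string(op.value);
  return mnemonic + "\t" + text;
}
```

Under this model the call with the constant 4 yields "fsflagsi\t4" rather than the rejected "fsflags 4".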

gcc/ChangeLog:

* config/riscv/riscv.md: Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fsflags.c: New test.
---
 gcc/config/riscv/riscv.md|  4 ++--
 gcc/testsuite/gcc.target/riscv/fsflags.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/fsflags.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4615e811947..74ff9ccc968 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3074,7 +3074,7 @@ (define_insn "riscv_frcsr"
   "frcsr\t%0")
 
 (define_insn "riscv_fscsr"
-  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSCSR)]
+  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")] UNSPECV_FSCSR)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fscsr\t%0")
 
@@ -3087,7 +3087,7 @@ (define_insn "riscv_frflags"
 (define_insn "riscv_fsflags"
   [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSFLAGS)]
   "TARGET_HARD_FLOAT || TARGET_ZFINX"
-  "fsflags\t%0")
+  "fsflags%i0\t%0")
 
 (define_insn "*riscv_fsnvsnan2"
   [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
diff --git a/gcc/testsuite/gcc.target/riscv/fsflags.c 
b/gcc/testsuite/gcc.target/riscv/fsflags.c
new file mode 100644
index 000..74a97b8a7c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fsflags.c
@@ -0,0 +1,16 @@
+/* Verify that fsflags is using the correct register or immediate.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O" } */
+
+void foo1 (int a)
+{
+   __builtin_riscv_fsflags(a);
+}
+void foo2 ()
+{
+   __builtin_riscv_fsflags(4);
+}
+
+/* { dg-final { scan-assembler-times "fsflags\t" 1 } } */
+/* { dg-final { scan-assembler-times "fsflagsi\t" 1 } } */
-- 
2.17.1

Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]


On 7/25/23 16:30, Marek Polacek wrote:

On Tue, Jul 25, 2023 at 04:24:39PM -0400, Jason Merrill wrote:

On 7/25/23 15:59, Marek Polacek wrote:

Something like this, then?  I see that cp_parser_initializer_clause et al
offer further opportunities (because they sometimes use a dummy too) but
this should be a good start.


Looks good.  Please do update the other callers as well, while you're
looking at this.


Thanks.  Can I push this part first?


Ah, sure.  I had thought the other callers would be trivial to add.

Jason

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-25 Thread Hao Liu OS via Gcc-patches

> When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that we're not 
> papering over an issue elsewhere.

Yes, I also wonder if this is an issue in vectorizable_reduction.  Below is the 
gimple of "gcc.target/aarch64/sve/cost_model_13.c":

  :
  # res_18 = PHI 
  # i_20 = PHI 
  _1 = (long unsigned int) i_20;
  _2 = _1 * 2;
  _3 = x_14(D) + _2;
  _4 = *_3;
  _5 = (unsigned short) _4;
  res.0_6 = (unsigned short) res_18;
  _7 = _5 + res.0_6; <-- The current stmt_info
  res_15 = (short int) _7;
  i_16 = i_20 + 1;
  if (n_11(D) > i_16)
goto ;
  else
goto ;

  :
  goto ;

It looks like STMT_VINFO_REDUC_DEF should be "res_18 = PHI "?
The status here is:
  STMT_VINFO_REDUC_IDX (stmt_info): 1
  STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
  STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0

Thanks,
Hao


From: Richard Sandiford 
Sent: Tuesday, July 25, 2023 17:44
To: Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hao Liu OS  writes:
> Hi,
>
> Thanks for the suggestion.  I tested it and found a gcc_assert failure:
> gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in 
> info_for_reduction, at tree-vect-loop.cc:5473)
>
> It is caused by empty STMT_VINFO_REDUC_DEF.

When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that
we're not papering over an issue elsewhere.

Thanks,
Richard

> So, I added an extra check before checking single_defuse_cycle. The updated 
> patch is below.  Is it OK for trunk?
>
> ---
>
> The new costs should only count reduction latency by multiplying count for
> single_defuse_cycle.  For other situations, this will increase the reduction
> latency a lot and miss vectorization opportunities.
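
The rule described in the quoted commit message can be sketched as a small function. This is a simplified model and not aarch64_vector_costs::count_ops: the function name and the boolean parameter are hypothetical, and the numbers in any call are illustrative only.

```cpp
#include <algorithm>
#include <cassert>

// Returns the updated reduction latency.  The base latency is multiplied by
// the copy count only when the reduction is forced into a single def-use
// cycle; otherwise the copies are assumed independent and only the base
// latency counts.
unsigned update_reduction_latency(unsigned current, unsigned base,
                                  unsigned count, bool single_defuse_cycle) {
  unsigned candidate = single_defuse_cycle ? base * count : base;
  return std::max(current, candidate);
}
```

For example, with a base latency of 2 and 4 copies the model gives 8 in the single def-use cycle case and 2 otherwise.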
>
> Tested on aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>   PR target/110625
>   * config/aarch64/aarch64.cc (count_ops): Only '* count' for
>   single_defuse_cycle while counting reduction_latency.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/pr110625_1.c: New testcase.
>   * gcc.target/aarch64/pr110625_2.c: New testcase.
> ---
>  gcc/config/aarch64/aarch64.cc | 13 --
>  gcc/testsuite/gcc.target/aarch64/pr110625_1.c | 46 +++
>  gcc/testsuite/gcc.target/aarch64/pr110625_2.c | 14 ++
>  3 files changed, 69 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_2.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 560e5431636..478a4e00110 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16788,10 +16788,15 @@ aarch64_vector_costs::count_ops (unsigned int 
> count, vect_cost_for_stmt kind,
>  {
>unsigned int base
>   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
> -
> -  /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
> -  that's not yet the case.  */
> -  ops->reduction_latency = MAX (ops->reduction_latency, base * count);
> +  if (STMT_VINFO_REDUC_DEF (stmt_info)
> +   && STMT_VINFO_FORCE_SINGLE_CYCLE (
> + info_for_reduction (m_vinfo, stmt_info)))
> + /* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
> +and then accumulate that, but at the moment the loop-carried
> +dependency includes all copies.  */
> + ops->reduction_latency = MAX (ops->reduction_latency, base * count);
> +  else
> + ops->reduction_latency = MAX (ops->reduction_latency, base);
>  }
>
>/* Assume that multiply-adds will become a single operation.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_1.c 
> b/gcc/testsuite/gcc.target/aarch64/pr110625_1.c
> new file mode 100644
> index 000..0965cac33a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110625_1.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details 
> -fno-tree-slp-vectorize" } */
> +/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
> +
> +/* Do not increase the vector body cost due to the incorrect reduction 
> latency
> +Original vector body cost = 51
> +Scalar issue estimate:
> +  ...
> +  reduction latency = 2
> +  estimated min cycles per iteration = 2.00
> +  estimated cycles per vector iteration (for VF 2) = 4.00
> +Vector issue estimate:
> +  ...
> +  reduction latency = 8  <-- Too large
> +  estimated min cycles per iteration = 8.00
> +Increasing body cost to 102 because scalar code would issue more quickly
> +  ...
> +missed:  cost model: the vector iteration cost = 102 divided by the 
> scalar iteration cost = 44 is greater or equal to the vectorization factor = 
> 2.
>

Re: [PATCH v5 0/3] c++: Track lifetimes in constant evaluation [PR70331, ...]


On 7/22/23 11:12, Nathaniel Shead wrote:

This is an update of the patch series at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625050.html


I applied the patches, with an addition to the first patch to fix 
constexpr-mutable3.C in C++11 mode, which was not part of the default 
std set.  And fixed the testsuite to run that test (and others that test 
c++11_only behavior) in C++11 mode.  Thanks!


FWIW, I test C++ patches with GXX_TESTSUITE_STDS=98,11,14,17,20,impcx 
for more coverage.



Changes since v4:

- Reordered patches to be more independent from each other (they don't need
   to keep updating the new tests)
- Removed workaround for better locations in cxx_eval_store_expression
- Don't bother checking lifetime for CONST_DECLs
- Rewrite patch for dangling pointers to keep the transformation to
   `return (, nullptr)`, but only perform it when genericising. It turns out
   that implementing this wasn't as hard as I thought it might be, at least for
   this specific case.

Thanks very much for all the reviews and comments so far!

Bootstrapped and regtested on x86_64-pc-linux-gnu.

Nathaniel Shead (3):
   c++: Improve location information in constant evaluation
   c++: Prevent dangling pointers from becoming nullptr in constexpr
 [PR110619]
   c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

  gcc/cp/constexpr.cc   | 159 +-
  gcc/cp/cp-gimplify.cc |  23 ++-
  gcc/cp/cp-tree.h  |   8 +-
  gcc/cp/semantics.cc   |   4 +-
  gcc/cp/typeck.cc  |   9 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  |  10 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |   8 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |   8 +-
  .../g++.dg/cpp0x/constexpr-delete2.C  |   5 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |   2 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |   1 +
  .../g++.dg/cpp0x/constexpr-recursion.C|   6 +-
  gcc/testsuite/g++.dg/cpp0x/overflow1.C|   2 +-
  gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C |  10 ++
  gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |   5 +-
  gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |   3 +-
  .../g++.dg/cpp1y/constexpr-lifetime1.C|  13 ++
  .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 +++
  .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
  .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
  .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
  .../g++.dg/cpp1y/constexpr-lifetime6.C|  15 ++
  .../g++.dg/cpp1y/constexpr-tracking-const14.C |   3 +-
  .../g++.dg/cpp1y/constexpr-tracking-const16.C |   3 +-
  .../g++.dg/cpp1y/constexpr-tracking-const18.C |   4 +-
  .../g++.dg/cpp1y/constexpr-tracking-const19.C |   4 +-
  .../g++.dg/cpp1y/constexpr-tracking-const21.C |   4 +-
  .../g++.dg/cpp1y/constexpr-tracking-const22.C |   4 +-
  .../g++.dg/cpp1y/constexpr-tracking-const3.C  |   3 +-
  .../g++.dg/cpp1y/constexpr-tracking-const4.C  |   3 +-
  .../g++.dg/cpp1y/constexpr-tracking-const7.C  |   3 +-
  gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |   4 +-
  gcc/testsuite/g++.dg/cpp1y/pr68180.C  |   4 +-
  .../g++.dg/cpp1z/constexpr-lambda6.C  |   4 +-
  .../g++.dg/cpp1z/constexpr-lambda8.C  |   5 +-
  gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   |  10 +-
  gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   |  10 +-
  gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   |  14 +-
  gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |   4 +-
  .../g++.dg/cpp2a/constexpr-dynamic17.C|   5 +-
  gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |   5 +-
  gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |   6 +-
  gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  10 +-
  gcc/testsuite/g++.dg/cpp2a/constinit10.C  |   5 +-
  .../g++.dg/cpp2a/is-corresponding-member4.C   |   4 +-
  gcc/testsuite/g++.dg/ext/constexpr-vla2.C |   4 +-
  gcc/testsuite/g++.dg/ext/constexpr-vla3.C |   4 +-
  gcc/testsuite/g++.dg/ubsan/pr63956.C  |  23 +--
  .../25_algorithms/equal/constexpr_neg.cc  |   7 +-
  .../testsuite/26_numerics/gcd/105844.cc   |  10 +-
  .../testsuite/26_numerics/lcm/105844.cc   |  14 +-
  51 files changed, 361 insertions(+), 168 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C

[pushed] testsuite: run C++11 tests in C++11 mode

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

A recent change missed updating constexpr-mutable3.C because it wasn't run
in C++11 mode even though it checks the behavior for { target c++11_only }.
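
The selection logic after this change can be sketched as follows. It is a simplified model of the Tcl code in g++-dg.exp: the real code uses search_for with glob patterns, whereas this sketch uses plain substring matching, and choose_std_list is a hypothetical name. The version lists are the ones in the patch below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Picks the list of -std= versions to run a test with, based on the
// directives found in the test's text (substring matching is an assumption
// of this sketch).
std::vector<int> choose_std_list(const std::string &test_text) {
  auto has = [&](const char *needle) {
    return test_text.find(needle) != std::string::npos;
  };
  if (has("target c++23"))
    return {23, 26};
  if (has("target c++26"))
    return {26};
  if (has("c++11_only"))
    return {98, 11, 14, 20};  // make sure C++11 itself is exercised
  return {98, 14, 17, 20};    // default list, which skips C++11
}
```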

gcc/testsuite/ChangeLog:

* lib/g++-dg.exp (g++-dg-runtest): Check for c++11_only.
---
 gcc/testsuite/lib/g++-dg.exp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/g++-dg.exp b/gcc/testsuite/lib/g++-dg.exp
index 046d63170c8..142c52c8426 100644
--- a/gcc/testsuite/lib/g++-dg.exp
+++ b/gcc/testsuite/lib/g++-dg.exp
@@ -55,13 +55,16 @@ proc g++-dg-runtest { testcases flags default-extra-flags } 
{
} else {
# If the test requires a newer C++ version than which
# is tested by default, use that C++ version for that
-   # single test.  This should be updated or commented
-   # out whenever the default std_list is updated or newer
-   # C++ effective target is added.
+   # single test.  Or if a test checks behavior specifically for
+   # one C++ version, include that version in the default list.
+   # These should be adjusted whenever the default std_list is
+   # updated or newer C++ effective target is added.
if [search_for $test "\{ dg-do * \{ target c++23"] {
set std_list { 23 26 }
} elseif [search_for $test "\{ dg-do * \{ target c++26"] {
set std_list { 26 }
+   } elseif [search_for $test "c++11_only"] {
+   set std_list { 98 11 14 20 }
} else {
set std_list { 98 14 17 20 }
}

base-commit: 50656980497d77ac12a5e7179013a6af09ba32f7
-- 
2.39.3

Re: [gcc13 backport 12/12] riscv: fix error: control reaches end of non-void function

OK for backport :)

On Wed, Jul 26, 2023 at 2:11 AM Patrick O'Neill  wrote:
>
> From: Martin Liska 
>
> Fixes:
> gcc/config/riscv/sync.md:66:1: error: control reaches end of non-void 
> function [-Werror=return-type]
> 66 |   [(set (attr "length") (const_int 4))])
>| ^
>
> PR target/109713
>
> gcc/ChangeLog:
>
> * config/riscv/sync.md: Add gcc_unreachable to a switch.
> ---
>  gcc/config/riscv/sync.md | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
> index 6e7c762ac57..9fc626267de 100644
> --- a/gcc/config/riscv/sync.md
> +++ b/gcc/config/riscv/sync.md
> @@ -62,6 +62,8 @@
> return "fence\tr,rw";
>  else if (model == MEMMODEL_RELEASE)
> return "fence\trw,w";
> +else
> +   gcc_unreachable ();
>}
>[(set (attr "length") (const_int 4))])
>
> --
> 2.34.1
>

Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]


On Fri, 21 Jul 2023 11:47:58 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On 7/21/23 12:31, Palmer Dabbelt wrote:

(define_expand "len_mask_gather_load"
   [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
    (match_operand:VNX1_QHSDI 2 "register_operand")
    (match_operand 3 "")
    (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...

But won't this cause (const_int 0) to no longer match because CONST_INT
nodes are modeless (VOIDmode)?


I poked around a bit and I'm not actually sure, I'm kind of lost on the docs
here.  IIUC we're eliding the VOIDmode in the predicate correctly

   (define_predicate "const_0_operand"
 (and (match_code "const_int,const_wide_int,const_vector")
  (match_test "op == CONST0_RTX (GET_MODE (op))")))

so we're OK there, otherwise we'd presumably have similar problems with
expanders like

   (define_expand "subsi3"
 [(set (match_operand:SI   0 "register_operand" "= r")
  (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
(match_operand:SI 2 "register_operand" "  r")))]
 ""

which we have a few of -- though it'd be kind of a silent failure, as
presumably we'd just end up with some more move-x0s emitted?
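
As far as I understand the GCC internals documentation, predicates written with define_predicate automatically test that the requested mode is VOIDmode, or that the operand has that mode, or that the operand is a CONST_INT or CONST_DOUBLE. A minimal sketch of that implicit test, with hypothetical types and covering only the CONST_INT case, would be:

```cpp
#include <cassert>

// Hypothetical, heavily reduced model of an RTL operand.
enum Mode { VOIDmode, SImode, DImode };

struct Rtx {
  bool is_const_int;
  Mode mode;  // CONST_INTs carry VOIDmode
};

// Models the implicit mode test: a modeless constant passes regardless of
// the mode the pattern asks for.
bool predicate_mode_ok(const Rtx &op, Mode wanted) {
  return wanted == VOIDmode || op.mode == wanted || op.is_const_int;
}
```

Under that reading, adding a mode to the match_operand would not by itself stop (const_int 0) from matching.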

[COMMITTED v2 1/2] bpf: don't print () in bpf_print_operand_address

[Changes from v1: save calls to fprintf]

Unfortunately, the pseudo-C dialect syntax used for some of the v3
atomic instructions clashes with unconditionally printing the
surrounding parentheses in bpf_print_operand_address.

Instead, place the parentheses in the output templates where needed.
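
The resulting split of responsibilities can be sketched as below. This is a simplified model, not bpf.cc: the Dialect enum, the function names and the register spellings ("%r1" versus "r1") are assumptions of the sketch; only the bracket and parenthesis handling follows the patch.

```cpp
#include <cassert>
#include <string>

enum class Dialect { Normal, PseudoC };

// Hypothetical register spelling for each dialect.
std::string reg_name(Dialect d, int regno) {
  return (d == Dialect::Normal ? "%r" : "r") + std::to_string(regno);
}

// Models the address printer after the patch: brackets in the normal
// dialect, no enclosing characters at all in the pseudo-C dialect.
std::string print_address(Dialect d, int regno, long offset) {
  std::string out;
  if (d == Dialect::Normal)
    out += "[";
  out += reg_name(d, regno) + "+" + std::to_string(offset);
  if (d == Dialect::Normal)
    out += "]";
  return out;
}

// Models a pseudo-C load template such as "%0 = *(u32 *) (%1)": the
// template, not the address printer, supplies the parentheses.
std::string pseudo_c_load32(int dst, int base, long offset) {
  return reg_name(Dialect::PseudoC, dst) + " = *(u32 *) (" +
         print_address(Dialect::PseudoC, base, offset) + ")";
}
```

Templates that need a bare address in the pseudo-C dialect can then use the operand without parentheses.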

gcc/

* config/bpf/bpf.cc (bpf_print_operand_address): Don't print
enclosing parentheses for pseudo-C dialect.
* config/bpf/bpf.md (zero_extendhidi2): Add parentheses around
operands of pseudo-C dialect output templates where needed.
(zero_extendqidi2): Likewise.
(zero_extendsidi2): Likewise.
(*mov): Likewise.
---
 gcc/config/bpf/bpf.cc | 11 +++
 gcc/config/bpf/bpf.md | 12 ++--
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 55b6927a62f..2e1e3e3abcf 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -933,9 +933,10 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  if (asm_dialect == ASM_NORMAL)
+   fprintf (file, "[");
   bpf_print_register (file, addr, 0);
-  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0");
   break;
 case PLUS:
   {
@@ -944,11 +945,13 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
  {
-   fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+   if (asm_dialect == ASM_NORMAL)
+ fprintf (file, "[");
bpf_print_register (file, op0, 0);
fprintf (file, "+");
output_addr_const (file, op1);
-   fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
+   if (asm_dialect == ASM_NORMAL)
+ fprintf (file, "]");
  }
else
  fatal_insn ("invalid address in operand", addr);
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 64342ea1de2..579a8213b09 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -260,7 +260,7 @@ (define_insn "zero_extendhidi2"
   "@
{and\t%0,0x|%0 &= 0x}
{mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}
-   {ldxh\t%0,%1|%0 = *(u16 *) %1}"
+   {ldxh\t%0,%1|%0 = *(u16 *) (%1)}"
   [(set_attr "type" "alu,alu,ldx")])
 
 (define_insn "zero_extendqidi2"
@@ -270,7 +270,7 @@ (define_insn "zero_extendqidi2"
   "@
{and\t%0,0xff|%0 &= 0xff}
{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}
-   {ldxh\t%0,%1|%0 = *(u8 *) %1}"
+   {ldxh\t%0,%1|%0 = *(u8 *) (%1)}"
   [(set_attr "type" "alu,alu,ldx")])
 
 (define_insn "zero_extendsidi2"
@@ -280,7 +280,7 @@ (define_insn "zero_extendsidi2"
   ""
   "@
* return bpf_has_alu32 ? \"{mov32\t%0,%1|%0 = %1}\" : 
\"{mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}\";
-   {ldxw\t%0,%1|%0 = *(u32 *) %1}"
+   {ldxw\t%0,%1|%0 = *(u32 *) (%1)}"
   [(set_attr "type" "alu,ldx")])
 
 ;;; Sign-extension
@@ -319,11 +319,11 @@ (define_insn "*mov"
 (match_operand:MM 1 "mov_src_operand"  " q,rI,B,r,I"))]
   ""
   "@
-   {ldx\t%0,%1|%0 = *( *) %1}
+   {ldx\t%0,%1|%0 = *( *) (%1)}
{mov\t%0,%1|%0 = %1}
{lddw\t%0,%1|%0 = %1 ll}
-   {stx\t%0,%1|*( *) %0 = %1}
-   {st\t%0,%1|*( *) %0 = %1}"
+   {stx\t%0,%1|*( *) (%0) = %1}
+   {st\t%0,%1|*( *) (%0) = %1}"
 [(set_attr "type" "ldx,alu,alu,stx,st")])
 
  Shifts
-- 
2.40.1

[PATCH v2 2/2] bpf: add v3 atomic instructions

[Changes from v1: fix merge issue in invoke.texi]

This patch adds support for the general atomic operations introduced in
eBPF v3. In addition to the existing atomic add instruction, this adds:
 - Atomic and, or, xor
 - Fetching versions of these operations (including add)
 - Atomic exchange
 - Atomic compare-and-exchange

To control emission of these instructions, a new target option
-m[no-]v3-atomics is added. This option is enabled by -mcpu=v3
and above.

Support for these instructions was recently added in binutils.
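
For reference, the kinds of source-level operations listed above can be written with the generic GCC __atomic builtins, as in the sketch below. The wrapper names are hypothetical, and the sketch makes no claim about which instruction the BPF back end selects for each call; it only shows the semantics of each operation.

```cpp
#include <cassert>

// Fetching AND: returns the old value and stores old & mask.
long fetch_and(long *p, long mask) {
  return __atomic_fetch_and(p, mask, __ATOMIC_SEQ_CST);
}

// Non-fetching XOR: the old value is discarded.
void atomic_xor(long *p, long mask) {
  (void) __atomic_fetch_xor(p, mask, __ATOMIC_SEQ_CST);
}

// Exchange: stores the new value and returns the old one.
long exchange(long *p, long value) {
  return __atomic_exchange_n(p, value, __ATOMIC_SEQ_CST);
}

// Compare-and-exchange: stores desired only if *p equals expected.
bool compare_exchange(long *p, long expected, long desired) {
  return __atomic_compare_exchange_n(p, &expected, desired, false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```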

gcc/

* config/bpf/bpf.opt (mv3-atomics): New option.
* config/bpf/bpf.cc (bpf_option_override): Handle it here.
* config/bpf/bpf.h (enum_reg_class): Add R0 class.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(REGNO_REG_CLASS): Handle R0.
* config/bpf/bpf.md (UNSPEC_XADD): Rename to UNSPEC_AADD.
(UNSPEC_AAND): New unspec.
(UNSPEC_AOR): Likewise.
(UNSPEC_AXOR): Likewise.
(UNSPEC_AFADD): Likewise.
(UNSPEC_AFAND): Likewise.
(UNSPEC_AFOR): Likewise.
(UNSPEC_AFXOR): Likewise.
(UNSPEC_AXCHG): Likewise.
(UNSPEC_ACMPX): Likewise.
(atomic_add): Use UNSPEC_AADD and atomic type attribute.
Move to...
* config/bpf/atomic.md: ...Here. New file.
* config/bpf/constraints.md (t): New constraint for R0.
* doc/invoke.texi (eBPF Options): Document -mv3-atomics.

gcc/testsuite/

* gcc.target/bpf/atomic-cmpxchg-1.c: New test.
* gcc.target/bpf/atomic-cmpxchg-2.c: New test.
* gcc.target/bpf/atomic-fetch-op-1.c: New test.
* gcc.target/bpf/atomic-fetch-op-2.c: New test.
* gcc.target/bpf/atomic-fetch-op-3.c: New test.
* gcc.target/bpf/atomic-op-1.c: New test.
* gcc.target/bpf/atomic-op-2.c: New test.
* gcc.target/bpf/atomic-op-3.c: New test.
* gcc.target/bpf/atomic-xchg-1.c: New test.
* gcc.target/bpf/atomic-xchg-2.c: New test.
---
 gcc/config/bpf/atomic.md  | 185 ++
 gcc/config/bpf/bpf.cc |   3 +
 gcc/config/bpf/bpf.h  |   6 +-
 gcc/config/bpf/bpf.md |  29 ++-
 gcc/config/bpf/bpf.opt|   4 +
 gcc/config/bpf/constraints.md |   3 +
 gcc/doc/invoke.texi   |   8 +-
 .../gcc.target/bpf/atomic-cmpxchg-1.c |  19 ++
 .../gcc.target/bpf/atomic-cmpxchg-2.c |  19 ++
 .../gcc.target/bpf/atomic-fetch-op-1.c|  50 +
 .../gcc.target/bpf/atomic-fetch-op-2.c|  50 +
 .../gcc.target/bpf/atomic-fetch-op-3.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-op-1.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-op-2.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-op-3.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-xchg-1.c  |  20 ++
 gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c  |  20 ++
 17 files changed, 593 insertions(+), 19 deletions(-)
 create mode 100644 gcc/config/bpf/atomic.md
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-3.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xchg-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c

diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md
new file mode 100644
index 000..caf8cc15cd4
--- /dev/null
+++ b/gcc/config/bpf/atomic.md
@@ -0,0 +1,185 @@
+;; Machine description for eBPF.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+
+(define_mode_iterator AMO [SI DI])
+
+;;; Plain atomic modify operations.
+
+;; Non-fetching atomic add predates all other BPF atomic insns.
+;; Use xadd{w,dw} for compatibility with older GAS without support
+;; for v3 atomics.  Newer GAS supports "aadd[32]" in line with the
+;; other

Re: [PATCH 1/2] bpf: don't print () in bpf_print_operand_address




On 7/25/23 15:14, Jose E. Marchesi wrote:
> 
> Hi David.
> 
>> Unfortunately, the pseudo-C dialect syntax used for some of the v3
>> atomic instructions clashes with unconditionally printing the
>> surrounding parentheses in bpf_print_operand_address.
>>
>> Instead, place the parentheses in the output templates where needed.
>>
>> Tested in bpf-unknown-none.
>> OK?
>>
>> gcc/
>>
>>  * config/bpf/bpf.cc (bpf_print_operand_address): Don't print
>>  enclosing parentheses for pseudo-C dialect.
>>  * config/bpf/bpf.md (zero_extendhidi2): Add parentheses around
>>  operands of pseudo-C dialect output templates where needed.
>>  (zero_extendqidi2): Likewise.
>>  (zero_extendsidi2): Likewise.
>>  (*mov): Likewise.
>> ---
>>  gcc/config/bpf/bpf.cc |  8 
>>  gcc/config/bpf/bpf.md | 12 ++--
>>  2 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
>> index 55b6927a62f..2c077ea834e 100644
>> --- a/gcc/config/bpf/bpf.cc
>> +++ b/gcc/config/bpf/bpf.cc
>> @@ -933,9 +933,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>>switch (GET_CODE (addr))
>>  {
>>  case REG:
>> -  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
>> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "");
> 
> We can save the call to fprintf there with a conditional.

Good point, thanks.
I will update these before pushing.

> 
>>bpf_print_register (file, addr, 0);
>> -  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0");
>>break;
>>  case PLUS:
>>{
>> @@ -944,11 +944,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>>  
>>  if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
>>{
>> -fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
>> +fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "");
> 
> Likewise.
> 
>>  bpf_print_register (file, op0, 0);
>>  fprintf (file, "+");
>>  output_addr_const (file, op1);
>> -fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
>> +fprintf (file, asm_dialect == ASM_NORMAL ? "]" : "");
>>}
>>  else
>>fatal_insn ("invalid address in operand", addr);
>> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
>> index 64342ea1de2..579a8213b09 100644
>> --- a/gcc/config/bpf/bpf.md
>> +++ b/gcc/config/bpf/bpf.md
>> @@ -260,7 +260,7 @@ (define_insn "zero_extendhidi2"
>>"@
>> {and\t%0,0x|%0 &= 0x}
>> {mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}
>> -   {ldxh\t%0,%1|%0 = *(u16 *) %1}"
>> +   {ldxh\t%0,%1|%0 = *(u16 *) (%1)}"
>>[(set_attr "type" "alu,alu,ldx")])
>>  
>>  (define_insn "zero_extendqidi2"
>> @@ -270,7 +270,7 @@ (define_insn "zero_extendqidi2"
>>"@
>> {and\t%0,0xff|%0 &= 0xff}
>> {mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}
>> -   {ldxh\t%0,%1|%0 = *(u8 *) %1}"
>> +   {ldxh\t%0,%1|%0 = *(u8 *) (%1)}"
>>[(set_attr "type" "alu,alu,ldx")])
>>  
>>  (define_insn "zero_extendsidi2"
>> @@ -280,7 +280,7 @@ (define_insn "zero_extendsidi2"
>>""
>>"@
>> * return bpf_has_alu32 ? \"{mov32\t%0,%1|%0 = %1}\" : 
>> \"{mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}\";
>> -   {ldxw\t%0,%1|%0 = *(u32 *) %1}"
>> +   {ldxw\t%0,%1|%0 = *(u32 *) (%1)}"
>>[(set_attr "type" "alu,ldx")])
>>  
>>  ;;; Sign-extension
>> @@ -319,11 +319,11 @@ (define_insn "*mov"
>>  (match_operand:MM 1 "mov_src_operand"  " q,rI,B,r,I"))]
>>""
>>"@
>> -   {ldx\t%0,%1|%0 = *( *) %1}
>> +   {ldx\t%0,%1|%0 = *( *) (%1)}
>> {mov\t%0,%1|%0 = %1}
>> {lddw\t%0,%1|%0 = %1 ll}
>> -   {stx\t%0,%1|*( *) %0 = %1}
>> -   {st\t%0,%1|*( *) %0 = %1}"
>> +   {stx\t%0,%1|*( *) (%0) = %1}
>> +   {st\t%0,%1|*( *) (%0) = %1}"
>>  [(set_attr "type" "ldx,alu,alu,stx,st")])
>>  
>>   Shifts
> 
> Otherwise, LGTM.
> OK.
> 
> Thanks!

Re: [PATCH 2/2] bpf: add v3 atomic instructions




On 7/25/23 15:18, Jose E. Marchesi wrote:
> 
> Hi David.
> 
>> +<<< HEAD
> 
> There is a merge problem there.

Ugh, I swear I've fixed this twice now. Yet it keeps cropping up.
Sorry. v2 shortly.

> 
>>  @opindex mbswap
>>  @item -mbswap
>>  Enable byte swap instructions.  Enabled for CPU v4 and above.
>> @@ -24715,6 +24716,12 @@ Enable byte swap instructions.  Enabled for CPU v4 
>> and above.
>>  @item -msdiv
>>  Enable signed division and modulus instructions.  Enabled for CPU v4
>>  and above.
>> +===
>> +@opindex mv3-atomics
>> +@item -mv3-atomics
>> +Enable instructions for general atomic operations introduced in CPU v3.
>> +Enabled for CPU v3 and above.
>> +>>> 6de76bd11b6 (bpf: add v3 atomic instructions)

Re: [PATCH 2/2] bpf: add v3 atomic instructions

2023-07-25 Thread Jose E. Marchesi via Gcc-patches



Hi David.

> +<<< HEAD

There is a merge problem there.

>  @opindex mbswap
>  @item -mbswap
>  Enable byte swap instructions.  Enabled for CPU v4 and above.
> @@ -24715,6 +24716,12 @@ Enable byte swap instructions.  Enabled for CPU v4 
> and above.
>  @item -msdiv
>  Enable signed division and modulus instructions.  Enabled for CPU v4
>  and above.
> +===
> +@opindex mv3-atomics
> +@item -mv3-atomics
> +Enable instructions for general atomic operations introduced in CPU v3.
> +Enabled for CPU v3 and above.
> +>>> 6de76bd11b6 (bpf: add v3 atomic instructions)

Re: [PATCH 1/2] bpf: don't print () in bpf_print_operand_address

2023-07-25 Thread Jose E. Marchesi via Gcc-patches



Hi David.

> Unfortunately, the pseudo-C dialect syntax used for some of the v3
> atomic instructions clashes with unconditionally printing the
> surrounding parentheses in bpf_print_operand_address.
>
> Instead, place the parentheses in the output templates where needed.
>
> Tested in bpf-unknown-none.
> OK?
>
> gcc/
>
>   * config/bpf/bpf.cc (bpf_print_operand_address): Don't print
>   enclosing parentheses for pseudo-C dialect.
>   * config/bpf/bpf.md (zero_extendhidi2): Add parentheses around
>   operands of pseudo-C dialect output templates where needed.
>   (zero_extendqidi2): Likewise.
>   (zero_extendsidi2): Likewise.
>   (*mov): Likewise.
> ---
>  gcc/config/bpf/bpf.cc |  8 
>  gcc/config/bpf/bpf.md | 12 ++--
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 55b6927a62f..2c077ea834e 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -933,9 +933,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "");

We can save the call to fprintf there with a conditional.

>bpf_print_register (file, addr, 0);
> -  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0");
>break;
>  case PLUS:
>{
> @@ -944,11 +944,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
> {
> - fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> + fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "");

Likewise.

>   bpf_print_register (file, op0, 0);
>   fprintf (file, "+");
>   output_addr_const (file, op1);
> - fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
> + fprintf (file, asm_dialect == ASM_NORMAL ? "]" : "");
> }
>   else
> fatal_insn ("invalid address in operand", addr);
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 64342ea1de2..579a8213b09 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -260,7 +260,7 @@ (define_insn "zero_extendhidi2"
>"@
> {and\t%0,0x|%0 &= 0x}
> {mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}
> -   {ldxh\t%0,%1|%0 = *(u16 *) %1}"
> +   {ldxh\t%0,%1|%0 = *(u16 *) (%1)}"
>[(set_attr "type" "alu,alu,ldx")])
>  
>  (define_insn "zero_extendqidi2"
> @@ -270,7 +270,7 @@ (define_insn "zero_extendqidi2"
>"@
> {and\t%0,0xff|%0 &= 0xff}
> {mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}
> -   {ldxh\t%0,%1|%0 = *(u8 *) %1}"
> +   {ldxh\t%0,%1|%0 = *(u8 *) (%1)}"
>[(set_attr "type" "alu,alu,ldx")])
>  
>  (define_insn "zero_extendsidi2"
> @@ -280,7 +280,7 @@ (define_insn "zero_extendsidi2"
>""
>"@
> * return bpf_has_alu32 ? \"{mov32\t%0,%1|%0 = %1}\" : 
> \"{mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}\";
> -   {ldxw\t%0,%1|%0 = *(u32 *) %1}"
> +   {ldxw\t%0,%1|%0 = *(u32 *) (%1)}"
>[(set_attr "type" "alu,ldx")])
>  
>  ;;; Sign-extension
> @@ -319,11 +319,11 @@ (define_insn "*mov"
>  (match_operand:MM 1 "mov_src_operand"  " q,rI,B,r,I"))]
>""
>"@
> -   {ldx\t%0,%1|%0 = *( *) %1}
> +   {ldx\t%0,%1|%0 = *( *) (%1)}
> {mov\t%0,%1|%0 = %1}
> {lddw\t%0,%1|%0 = %1 ll}
> -   {stx\t%0,%1|*( *) %0 = %1}
> -   {st\t%0,%1|*( *) %0 = %1}"
> +   {stx\t%0,%1|*( *) (%0) = %1}
> +   {st\t%0,%1|*( *) (%0) = %1}"
>  [(set_attr "type" "ldx,alu,alu,stx,st")])
>  
>   Shifts

Otherwise, LGTM.
OK.

Thanks!
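
As a rough illustration of the review comment above (saving the fprintf call
with a conditional), the sketch below prints the opening and closing bracket
only for the normal dialect and prints nothing otherwise. It is not the code
that was pushed: `enum dialect`, `print_address` and `address_matches` are
hypothetical stand-ins for asm_dialect and bpf_print_operand_address, and the
register is passed as a plain string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for GCC's asm_dialect values.  */
enum dialect { DIALECT_NORMAL, DIALECT_PSEUDOC };

/* Print "REG+OFF".  Brackets are emitted only for the normal dialect,
   so the pseudo-C path makes no call just to print an empty string.  */
void
print_address (FILE *file, enum dialect d, const char *reg, long off)
{
  assert (file != NULL && reg != NULL);
  if (d == DIALECT_NORMAL)
    fputc ('[', file);
  fprintf (file, "%s+%ld", reg, off);
  if (d == DIALECT_NORMAL)
    fputc (']', file);
}

/* Helper for trying the printer out: render an address into a
   temporary file and compare it with EXPECTED.  Returns 1 on a match
   and 0 otherwise (including when no temporary file is available).  */
int
address_matches (enum dialect d, const char *reg, long off,
		 const char *expected)
{
  char buf[64] = "";
  FILE *tmp = tmpfile ();
  if (tmp == NULL)
    return 0;
  print_address (tmp, d, reg, off);
  rewind (tmp);
  if (fgets (buf, sizeof buf, tmp) == NULL)
    buf[0] = '\0';
  fclose (tmp);
  return strcmp (buf, expected) == 0;
}
```

With this shape the pseudo-C output templates are free to add their own
parentheses, as in `*(u32 *) (%1)`.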

[PATCH 2/2] bpf: add v3 atomic instructions

This patch adds support for the general atomic operations introduced in
eBPF v3. In addition to the existing atomic add instruction, this adds:
 - Atomic and, or, xor
 - Fetching versions of these operations (including add)
 - Atomic exchange
 - Atomic compare-and-exchange

To control emission of these instructions, a new target option
-m[no-]v3-atomics is added. This option is enabled by -mcpu=v3
and above.
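
For orientation, here is a small C sketch of the four kinds of operation
listed above, written with the generic GCC __atomic builtins. The function
names are made up for this example, and which BPF instruction each builtin
ends up as depends on the patterns in atomic.md and on -mv3-atomics; the
sketch only shows the source-level semantics and runs on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Plain (non-fetching) atomic add: the old value is not used.  */
void
atomic_add_discard (long *p, long n)
{
  assert (p != NULL);
  (void) __atomic_fetch_add (p, n, __ATOMIC_RELAXED);
}

/* Fetching atomic or: returns the value *P held before the update.  */
long
atomic_fetch_or_old (long *p, long bits)
{
  assert (p != NULL);
  return __atomic_fetch_or (p, bits, __ATOMIC_SEQ_CST);
}

/* Atomic exchange: stores V and returns the previous value.  */
long
atomic_swap (long *p, long v)
{
  assert (p != NULL);
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}

/* Atomic compare-and-exchange: stores DESIRED only if *P equals
   EXPECTED, and reports whether the store happened.  */
bool
atomic_cas (long *p, long expected, long desired)
{
  assert (p != NULL);
  return __atomic_compare_exchange_n (p, &expected, desired, false,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```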

Support for these instructions was recently added in binutils.

Tested in bpf-unknown-none.
OK?

gcc/

* config/bpf/bpf.opt (mv3-atomics): New option.
* config/bpf/bpf.cc (bpf_option_override): Handle it here.
* config/bpf/bpf.h (enum_reg_class): Add R0 class.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(REGNO_REG_CLASS): Handle R0.
* config/bpf/bpf.md (UNSPEC_XADD): Rename to UNSPEC_AADD.
(UNSPEC_AAND): New unspec.
(UNSPEC_AOR): Likewise.
(UNSPEC_AXOR): Likewise.
(UNSPEC_AFADD): Likewise.
(UNSPEC_AFAND): Likewise.
(UNSPEC_AFOR): Likewise.
(UNSPEC_AFXOR): Likewise.
(UNSPEC_AXCHG): Likewise.
(UNSPEC_ACMPX): Likewise.
(atomic_add): Use UNSPEC_AADD and atomic type attribute.
Move to...
* config/bpf/atomic.md: ...Here. New file.
* config/bpf/constraints.md (t): New constraint for R0.
* doc/invoke.texi (eBPF Options): Document -mv3-atomics.

gcc/testsuite/

* gcc.target/bpf/atomic-cmpxchg-1.c: New test.
* gcc.target/bpf/atomic-cmpxchg-2.c: New test.
* gcc.target/bpf/atomic-fetch-op-1.c: New test.
* gcc.target/bpf/atomic-fetch-op-2.c: New test.
* gcc.target/bpf/atomic-fetch-op-3.c: New test.
* gcc.target/bpf/atomic-op-1.c: New test.
* gcc.target/bpf/atomic-op-2.c: New test.
* gcc.target/bpf/atomic-op-3.c: New test.
* gcc.target/bpf/atomic-xchg-1.c: New test.
* gcc.target/bpf/atomic-xchg-2.c: New test.
---
 gcc/config/bpf/atomic.md  | 185 ++
 gcc/config/bpf/bpf.cc |   3 +
 gcc/config/bpf/bpf.h  |   6 +-
 gcc/config/bpf/bpf.md |  29 ++-
 gcc/config/bpf/bpf.opt|   4 +
 gcc/config/bpf/constraints.md |   3 +
 gcc/doc/invoke.texi   |  10 +-
 .../gcc.target/bpf/atomic-cmpxchg-1.c |  19 ++
 .../gcc.target/bpf/atomic-cmpxchg-2.c |  19 ++
 .../gcc.target/bpf/atomic-fetch-op-1.c|  50 +
 .../gcc.target/bpf/atomic-fetch-op-2.c|  50 +
 .../gcc.target/bpf/atomic-fetch-op-3.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-op-1.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-op-2.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-op-3.c|  49 +
 gcc/testsuite/gcc.target/bpf/atomic-xchg-1.c  |  20 ++
 gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c  |  20 ++
 17 files changed, 595 insertions(+), 19 deletions(-)
 create mode 100644 gcc/config/bpf/atomic.md
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-3.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xchg-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c

diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md
new file mode 100644
index 000..caf8cc15cd4
--- /dev/null
+++ b/gcc/config/bpf/atomic.md
@@ -0,0 +1,185 @@
+;; Machine description for eBPF.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+
+(define_mode_iterator AMO [SI DI])
+
+;;; Plain atomic modify operations.
+
+;; Non-fetching atomic add predates all other BPF atomic insns.
+;; Use xadd{w,dw} for compatibility with older GAS without support
+;; for v3 atomics.  Newer GAS supports "aadd[32]" in line with the
+;; other atomic operations.

[PATCH 1/2] bpf: don't print () in bpf_print_operand_address

Unfortunately, the pseudo-C dialect syntax used for some of the v3
atomic instructions clashes with unconditionally printing the
surrounding parentheses in bpf_print_operand_address.

Instead, place the parentheses in the output templates where needed.

Tested in bpf-unknown-none.
OK?

gcc/

* config/bpf/bpf.cc (bpf_print_operand_address): Don't print
enclosing parentheses for pseudo-C dialect.
* config/bpf/bpf.md (zero_extendhidi2): Add parentheses around
operands of pseudo-C dialect output templates where needed.
(zero_extendqidi2): Likewise.
(zero_extendsidi2): Likewise.
(*mov): Likewise.
---
 gcc/config/bpf/bpf.cc |  8 
 gcc/config/bpf/bpf.md | 12 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 55b6927a62f..2c077ea834e 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -933,9 +933,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "");
   bpf_print_register (file, addr, 0);
-  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0");
   break;
 case PLUS:
   {
@@ -944,11 +944,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
  {
-   fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+   fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "");
bpf_print_register (file, op0, 0);
fprintf (file, "+");
output_addr_const (file, op1);
-   fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
+   fprintf (file, asm_dialect == ASM_NORMAL ? "]" : "");
  }
else
  fatal_insn ("invalid address in operand", addr);
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 64342ea1de2..579a8213b09 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -260,7 +260,7 @@ (define_insn "zero_extendhidi2"
   "@
{and\t%0,0x|%0 &= 0x}
{mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}
-   {ldxh\t%0,%1|%0 = *(u16 *) %1}"
+   {ldxh\t%0,%1|%0 = *(u16 *) (%1)}"
   [(set_attr "type" "alu,alu,ldx")])
 
 (define_insn "zero_extendqidi2"
@@ -270,7 +270,7 @@ (define_insn "zero_extendqidi2"
   "@
{and\t%0,0xff|%0 &= 0xff}
{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}
-   {ldxh\t%0,%1|%0 = *(u8 *) %1}"
+   {ldxh\t%0,%1|%0 = *(u8 *) (%1)}"
   [(set_attr "type" "alu,alu,ldx")])
 
 (define_insn "zero_extendsidi2"
@@ -280,7 +280,7 @@ (define_insn "zero_extendsidi2"
   ""
   "@
* return bpf_has_alu32 ? \"{mov32\t%0,%1|%0 = %1}\" : 
\"{mov\t%0,%1\;and\t%0,0x|%0 = %1;%0 &= 0x}\";
-   {ldxw\t%0,%1|%0 = *(u32 *) %1}"
+   {ldxw\t%0,%1|%0 = *(u32 *) (%1)}"
   [(set_attr "type" "alu,ldx")])
 
 ;;; Sign-extension
@@ -319,11 +319,11 @@ (define_insn "*mov"
 (match_operand:MM 1 "mov_src_operand"  " q,rI,B,r,I"))]
   ""
   "@
-   {ldx\t%0,%1|%0 = *( *) %1}
+   {ldx\t%0,%1|%0 = *( *) (%1)}
{mov\t%0,%1|%0 = %1}
{lddw\t%0,%1|%0 = %1 ll}
-   {stx\t%0,%1|*( *) %0 = %1}
-   {st\t%0,%1|*( *) %0 = %1}"
+   {stx\t%0,%1|*( *) (%0) = %1}
+   {st\t%0,%1|*( *) (%0) = %1}"
 [(set_attr "type" "ldx,alu,alu,stx,st")])
 
  Shifts
-- 
2.40.1

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-25 Thread Andrew Pinski via Gcc-patches

On Tue, Jul 25, 2023 at 1:54 PM Andrew Pinski  wrote:
>
> On Tue, Jul 25, 2023 at 12:45 PM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches 
> > wrote:
> > > Hi, Drew
> > >
> > > Thanks for addressing this missed optimization.
> > >
> > > The testcase includes an incorrect assumption: signed char, which
> > > causes the testcase to fail on PowerPC.
> > >
> > > Should the testcase be updated to specify signed char in the function
> > > signatures or should -fsigned-char be added to the command line
> > > options?
> >
> > I think we should use signed char instead of char in the testcase.
>
> I also think it should be `signed char` instead as I mentioned in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110803 .

Committed the testsuite fix as r14-2767-g67357270772b91 .

Thanks,
Andrew

>
> Thanks,
> Andrew
>
> >
> > Jakub
> >

[COMMITTED] Fix 110803: use of plain char instead of signed char

2023-07-25 Thread Andrew Pinski via Gcc-patches

The problem here is that plain char can be either signed
or unsigned depending on the target (powerpc and aarch64 are
unsigned while most other targets are signed). The testcase
gcc.c-torture/execute/pr109986.c assumed that plain char was signed,
which is wrong, so it is better to just change `char` to
`signed char`.
Note that gcc.c-torture/execute/pr109986.c includes gcc.dg/tree-ssa/pr109986.c,
where the plain char was being used.

Committed as obvious after a quick test to make sure
gcc.c-torture/execute/pr109986.c now passes and gcc.dg/tree-ssa/pr109986.c
still passes.
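
A small host-side sketch of the point being made, for readers who want to try
it: whether plain char is signed can be read off CHAR_MIN, while functions
written with `signed char` behave the same on every target. The function names
below are invented for this example and are not taken from the testcase,
except that `t3` mirrors its shape.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when plain char is signed on the target being compiled for.  */
int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Same shape as t3 in the testcase, with the signedness spelled out.  */
signed char
t3 (signed char a, signed char b)
{
  return (b | ~a) ^ a;
}

/* The form the optimization is expected to produce: ~(X & Y).  */
signed char
t3_folded (signed char a, signed char b)
{
  return ~(a & b);
}

/* Check (~X | Y) ^ X == ~(X & Y) for every pair of signed char values.  */
void
check_identity (void)
{
  for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
    for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
      assert (t3 ((signed char) a, (signed char) b)
	      == t3_folded ((signed char) a, (signed char) b));
}
```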

gcc/testsuite/ChangeLog:

PR testsuite/110803
* gcc.dg/tree-ssa/pr109986.c: Change plain char to be
`signed char`.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
index 45f099b5656..0724510e5d5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
@@ -16,14 +16,14 @@ t2 (int a, int b)
   return a ^ (~a | (unsigned int) b);
 }
 
-__attribute__((noipa)) char
-t3 (char a, char b)
+__attribute__((noipa)) signed char
+t3 (signed char a, signed char b)
 {
   return (b | ~a) ^ a;
 }
 
 __attribute__((noipa)) unsigned char
-t4 (char a, char b)
+t4 (signed char a, signed char b)
 {
   return ((unsigned char) a) ^ (b | ~a);
 }
@@ -89,20 +89,20 @@ t12 (int a, unsigned int b)
   return t3;
 }
 
-__attribute__((noipa)) char
-t13 (char a, char b)
+__attribute__((noipa)) signed char
+t13 (signed char a, signed char b)
 {
-  char t1 = ~a;
-  char t2 = b | t1;
-  char t3 = t2 ^ a;
+  signed char t1 = ~a;
+  signed char t2 = b | t1;
+  signed char t3 = t2 ^ a;
   return t3;
 }
 
 __attribute__((noipa)) unsigned char
-t14 (unsigned char a, char b)
+t14 (unsigned char a, signed char b)
 {
   unsigned char t1 = ~a;
-  char t2 = b | t1;
+  signed char t2 = b | t1;
   unsigned char t3 = a ^ t2;
   return t3;
 }
-- 
2.31.1

[patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect

2023-07-25 Thread Tobias Burnus


The attached patch calls CUDA's cuMemcpy2D and cuMemcpy3D
for omp_target_memcpy_rect[_async] when dim=2 or dim=3. This should
speed up the data transfer for noncontiguous data.

While at it, I also added support for copying from one device to another
device; while potentially slow, it is still better than not being able to
copy, and with shared memory it shouldn't be that bad.

Comments, suggestions, remarks?
If there are none, will commit it...

Disclaimer: While I have done correctness tests (on a system with two nvptx
GPUs), I have not done any performance tests. (I also tested it without
offloading configured, but that's rather boring.)

Tobias
OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect

When copying a 2D or 3D rectangular memory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.

Additionally, it permits device-to-device copies, if necessary using a
temporary variable on the host.
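
To make the shape of such a copy concrete, here is a host-only sketch of a
pitched 2D copy. It is only an illustration: `struct copy2d` and `copy2d_rows`
are hypothetical names whose fields loosely mirror the pitch/width/height idea
of CUDA_MEMCPY2D, and the loop uses plain memcpy rather than any CUDA call.
The benefit described above comes from handing the whole rectangle to the
driver in one call instead of issuing one transfer per piece.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side description of a 2D rectangle copy.  The
   pitch is the distance in bytes between the starts of two rows.  */
struct copy2d
{
  const void *src;
  size_t src_pitch;
  void *dst;
  size_t dst_pitch;
  size_t width_in_bytes;
  size_t height;
};

/* Copy HEIGHT rows of WIDTH_IN_BYTES bytes each, stepping through
   source and destination by their respective pitches.  */
void
copy2d_rows (const struct copy2d *c)
{
  assert (c != NULL);
  assert (c->width_in_bytes <= c->src_pitch
	  && c->width_in_bytes <= c->dst_pitch);
  const char *s = c->src;
  char *d = c->dst;
  for (size_t y = 0; y < c->height; y++)
    memcpy (d + y * c->dst_pitch, s + y * c->src_pitch, c->width_in_bytes);
}
```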

include/ChangeLog:

	* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
	CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
	(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
	CUDA_MEMCPY3D_PEER): New typedefs.
	(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
	cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
	cuMemcpy3DPeerAsync): New prototypes.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New prototypes.
	* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
	and memcpy3d_func.
	* libgomp.texi (5.1 Impl. Status): Add 'defaultmap(:all)' with 'N'.
	(nvptx): Document when cuMemcpy2D/cuMemcpy3D is used.
	* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
	* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
	cuMemcpy3D): Invoke via CUDA_ONE_CALL.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New.
	* target.c (omp_target_memcpy_rect_worker):
	(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
	Permit all device-to-device copies; invoke new plugins for
	2D and 3D copying when available.
	(gomp_load_plugin_for_device): DLSYM the new plugin functions.
	* testsuite/libgomp.c/target-12.c: Fix dimension bug.
	* testsuite/libgomp.fortran/target-12.f90: Likewise.
	* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.

 include/cuda/cuda.h|  85 
 libgomp/libgomp-plugin.h   |   7 +
 libgomp/libgomp.h  |   2 +
 libgomp/libgomp.texi   |   6 +
 libgomp/oacc-host.c|   2 +
 libgomp/plugin/cuda-lib.def|   3 +
 libgomp/plugin/plugin-nvptx.c  | 116 +
 libgomp/target.c   | 152 +-
 libgomp/testsuite/libgomp.c/target-12.c|   6 +-
 libgomp/testsuite/libgomp.fortran/target-12.f90|   6 +-
 .../libgomp.fortran/target-memcpy-rect-1.f90   | 531 +
 11 files changed, 885 insertions(+), 31 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 338626fb6dc..09c3c2b8dbe 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -47,6 +47,7 @@ typedef void *CUevent;
 typedef void *CUfunction;
 typedef void *CUlinkState;
 typedef void *CUmodule;
+typedef void *CUarray;
 typedef size_t (*CUoccupancyB2DSize)(int);
 typedef void *CUstream;
 
@@ -54,7 +55,10 @@ typedef enum {
   CUDA_SUCCESS = 0,
   CUDA_ERROR_INVALID_VALUE = 1,
   CUDA_ERROR_OUT_OF_MEMORY = 2,
+  CUDA_ERROR_NOT_INITIALIZED = 3,
+  CUDA_ERROR_DEINITIALIZED = 4,
   CUDA_ERROR_INVALID_CONTEXT = 201,
+  CUDA_ERROR_INVALID_HANDLE = 400,
   CUDA_ERROR_NOT_FOUND = 500,
   CUDA_ERROR_NOT_READY = 600,
   CUDA_ERROR_LAUNCH_FAILED = 719,
@@ -126,6 +130,75 @@ typedef enum {
   CU_LIMIT_MALLOC_HEAP_SIZE = 0x02,
 } CUlimit;
 
+typedef enum {
+  CU_MEMORYTYPE_HOST = 0x01,
+  CU_MEMORYTYPE_DEVICE = 0x02,
+  CU_MEMORYTYPE_ARRAY = 0x03,
+  CU_MEMORYTYPE_UNIFIED = 0x04
+} CUmemorytype;
+
+typedef struct {
+  size_t srcXInBytes, srcY;
+  CUmemorytype srcMemoryType;
+  const void *srcHost;
+  CUdeviceptr srcDevice;
+  CUarray srcArray;
+  size_t srcPitch;
+
+  size_t dstXInBytes, dstY;
+  CUmemorytype dstMemoryType;
+  const void *dstHost;
+  CUdeviceptr dstDevice;
+  CUarray dstArray;
+  size_t dstPitch;
+
+  size_t WidthInBytes, Height;
+} CUDA_MEMCPY2D;
+
+typedef struct {
+  size_t srcXInBytes, srcY, srcZ;
+  size_t srcLOD;
+  CUmemorytype srcMemoryType;
+  const void *srcHost;
+  CUdeviceptr srcDevice;
+  CUarray srcArray;
+  void *dummy;
+  size_t srcPitch, srcHeight;
+
+  size_t dstXInBytes, dstY, dstZ;
+  size_t dstLOD;

Re: [gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings


On Tue, 25 Jul 2023 14:02:24 PDT (-0700), jeffreya...@gmail.com wrote:



On 7/25/23 13:50, Jakub Jelinek wrote:

On Tue, Jul 25, 2023 at 11:01:54AM -0700, Patrick O'Neill wrote:

Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
Jeff Law.
If there aren't any objections I'll commit this cherry-picked series
on Thursday (July 27th).


Please don't commit before 13.2 is released; the branch is frozen and none of
this seems to be a release blocker.

Ugh.  Missed the boat :(

I could make an argument for inclusion given the strong desire to have
compatible mappings across the toolchains and alignment with the RVI
specs -- but I won't.  As Palmer has indicated, it's been broken for a
while and we can manage that breakage.


I think if we just merge it right after 13.2 and indicate that distros 
doing long-term binary builds before 13.3 backport the patches we should 
be fine.  I think that's just Debian right now, so while it's an 
important set of bugs to get fixed it's just the single user.


It's certainly a bummer to miss 13.2, but we've just got ourselves to 
blame for forgetting about the backport ;)







jeff

Re: [gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings





On 7/25/23 13:50, Jakub Jelinek wrote:

On Tue, Jul 25, 2023 at 11:01:54AM -0700, Patrick O'Neill wrote:

Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
Jeff Law.
If there aren't any objections I'll commit this cherry-picked series
on Thursday (July 27th).


Please don't commit before 13.2 is released; the branch is frozen and none of
this seems to be a release blocker.

Ugh.  Missed the boat :(

I could make an argument for inclusion given the strong desire to have 
compatible mappings across the toolchains and alignment with the RVI 
specs -- but I won't.  As Palmer has indicated, it's been broken for a 
while and we can manage that breakage.





jeff

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-25 Thread Andrew Pinski via Gcc-patches

On Tue, Jul 25, 2023 at 12:45 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches 
> wrote:
> > Hi, Drew
> >
> > Thanks for addressing this missed optimization.
> >
> > The testcase includes an incorrect assumption: signed char, which
> > causes the testcase to fail on PowerPC.
> >
> > Should the testcase be updated to specify signed char in the function
> > signatures or should -fsigned-char be added to the command line
> > options?
>
> I think we should use signed char instead of char in the testcase.

I also think it should be `signed char` instead as I mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110803 .

Thanks,
Andrew

>
> Jakub
>

Re: [PATCH 1/2][frontend] Add novector C++ pragma


On 7/19/23 11:15, Tamar Christina wrote:

Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C++
as gfortran does for FORTRAN and what ICC/ICX does for C++.
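
A minimal sketch of how the proposed pragma would be used on a loop follows.
It is written in C to stay buildable on any host, which is an assumption on
top of this message: the patch here only covers the C++ front end, and the
`__GNUC__ >= 14` guard is a guess at which release carries the pragma. The
function name `sum_scalar` is made up for the example. On compilers where the
guard is false the loop is simply compiled as usual.

```c
#include <assert.h>
#include <stddef.h>

/* Sum N ints with a loop that asks the compiler not to vectorize it.
   The pragma must directly precede the loop it applies to.  */
long
sum_scalar (const int *v, size_t n)
{
  assert (v != NULL || n == 0);
  long total = 0;
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 14
#pragma GCC novector
#endif
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}
```

The pragma changes only how the loop is optimized, not its result, which is
what makes it convenient for keeping testsuite loops scalar.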

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about ~800 tests.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect.exp (support vect- prefix).
* g++.dg/vect/vect-novector-pragma.cc: New test.

--- inline copy of patch --
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 0e66ca70e00caa1dc4beada1024ace32954e2aaf..c13c8ea98a523c4ef1c55a11e02d5da9db7e367e 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -305,8 +305,8 @@ DEFTREECODE (IF_STMT, "if_stmt", tcc_statement, 4)
  
  /* Used to represent a range-based `for' statement. The operands are

 RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE,
-   RANGE_FOR_UNROLL, and RANGE_FOR_INIT_STMT, respectively.  Only used in
-   templates.  */
+   RANGE_FOR_UNROLL, RANGE_FOR_NOVECTOR and RANGE_FOR_INIT_STMT,
+   respectively.  Only used in templates.  */


This change is unnecessary; RANGE_FOR_NOVECTOR is a flag, not an operand.


  DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 6)
  
  /* Used to represent an expression statement.  Use `EXPR_STMT_EXPR' to

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index dd3665c8ccf48a8a0b1ba2c06400fe50999ea240..8776e8f4cf8266ee715c3e7f943602fdb1acaf79 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13658,7 +13660,13 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree 
init, bool ivdep,
   "% pragma");
condition = error_mark_node;
  }
-  finish_for_cond (condition, stmt, ivdep, unroll);
+  else if (novector)
+{
+  cp_parser_error (parser, "missing loop condition in loop with "
+  "% pragma");
+  condition = error_mark_node;
+}


Why is it a problem for a loop with novector to have no condition?  This 
error makes sense for the other pragmas that want to optimize based on 
the condition, but it seems unneeded for this pragma.



+
+   cp_token *tok = pragma_tok;
+
+   do
  {
-   tok = cp_lexer_consume_token (parser->lexer);
-   ivdep = cp_parser_pragma_ivdep (parser, tok);
-   tok = cp_lexer_peek_token (the_parser->lexer);
+   switch (cp_parser_pragma_kind (tok))
+ {
+   case PRAGMA_IVDEP:
+ {
+   if (tok != pragma_tok)
+ tok = cp_lexer_consume_token (parser->lexer);
+   ivdep = cp_parser_pragma_ivdep (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   case PRAGMA_UNROLL:
+ {
+   if (tok != pragma_tok)
+ tok = cp_lexer_consume_token (parser->lexer);
+   unroll = cp_parser_pragma_unroll (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   case PRAGMA_NOVECTOR:
+ {
+   if (tok != pragma_tok)
+ tok = cp_lexer_consume_token (parser->lexer);
+   novector = cp_parser_pragma_novector (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   default:
+ gcc_unreachable ();


This unreachable seems to assert that if a pragma follows one of these 
pragmas, it must be another one of these pragmas?  That seems wrong; 
instead of hitting gcc_unreachable() in that case we should

Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

On Tue, Jul 25, 2023 at 04:24:39PM -0400, Jason Merrill wrote:
> On 7/25/23 15:59, Marek Polacek wrote:
> > Something like this, then?  I see that cp_parser_initializer_clause et al
> > offer further opportunities (because they sometimes use a dummy too) but
> > this should be a good start.
> 
> Looks good.  Please do update the other callers as well, while you're
> looking at this.

Thanks.  Can I push this part first?

Re: [PATCH] c++: clear tf_partial et al in instantiate_template [PR108960]


On 7/25/23 15:55, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --

In 
we concluded that we might clear all flags except tf_warning_or_error
when performing instantiate_template.

PR c++/108960

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't clear tf_partial
here.
(instantiate_template): Reset all complain flags except
tf_warning_or_error.
---
  gcc/cp/pt.cc | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 21b08a6266a..265e2a59a52 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10396,12 +10396,6 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
tree var = lookup_template_variable (templ, targs, complain);
if (var == error_mark_node)
  return error_mark_node;
-  /* We may be called while doing a partial substitution, but the
- type of the variable template may be auto, in which case we
- will call do_auto_deduction in mark_used (which clears tf_partial)
- and the auto must be properly reduced at that time for the
- deduction to work.  */
-  complain &= ~tf_partial;
var = finish_template_variable (var, complain);
mark_used (var);
return var;
@@ -22008,6 +22002,14 @@ instantiate_template (tree tmpl, tree orig_args, 
tsubst_flags_t complain)
if (tmpl == error_mark_node)
  return error_mark_node;
  
+  /* The other flags are not relevant anymore here, especially tf_partial
+ shouldn't be set.  For instance, we may be called while doing a partial
+ substitution of a template variable, but the type of the variable
+ template may be auto, in which case we will call do_auto_deduction
+ in mark_used (which clears tf_partial) and the auto must be properly
+ reduced at that time for the deduction to work.  */
+  complain &= tf_warning_or_error;
+
gcc_assert (TREE_CODE (tmpl) == TEMPLATE_DECL);
  
if (modules_p ())


base-commit: 6e424febfbcb27c21a7fe3a137e614765f9cf9d2

Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]


On 7/25/23 15:59, Marek Polacek wrote:

On Fri, Jul 21, 2023 at 01:44:17PM -0400, Jason Merrill wrote:

On 7/20/23 17:58, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 03:51:32PM -0400, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 02:37:07PM -0400, Jason Merrill wrote:

On 7/20/23 14:13, Marek Polacek wrote:

On Wed, Jul 19, 2023 at 10:11:27AM -0400, Patrick Palka wrote:

On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?


Looks reasonable to me.


Thanks.

Though I wonder if we could also fix this by not checking potentiality
at all in this case?  The problematic call to is_rvalue_constant_expression
happens from cp_parser_constant_expression with 'allow_non_constant' != 0
and with 'non_constant_p' being a dummy out argument that comes from
cp_parser_functional_cast, so the result of is_rvalue_constant_expression
is effectively unused in this case, and we should be able to safely elide
it when 'allow_non_constant && non_constant_p == nullptr'.


Sounds plausible.  I think my patch could be applied first since it
removes a tiny bit of code, then I can hopefully remove the flag below,
then maybe go back and optimize the call to is_rvalue_constant_expression.
Does that sound sensible?


Relatedly, ISTM the member cp_parser::non_integral_constant_expression_p
is also effectively unused and could be removed?


It looks that way.  Seems it's only used in cp_parser_constant_expression:
10806   if (allow_non_constant_p)
10807 *non_constant_p = parser->non_integral_constant_expression_p;
but that could be easily replaced by a local var.  I'd be happy to see if
we can actually do away with it.  (I wonder why it was introduced and when
it actually stopped being useful.)


It was for the C++98 notion of constant-expression, which was more of a
parser-level notion, and has been supplanted by the C++11 version.  I'm
happy to remove it, and therefore remove the is_rvalue_constant_expression
call.


Wonderful.  I'll do that next.


I found a use of parser->non_integral_constant_expression_p:
finish_id_expression_1 can set it to true which then makes
a difference in cp_parser_constant_expression in C++98.  In
cp_parser_constant_expression we set n_i_c_e_p to false, call
cp_parser_assignment_expression in which finish_id_expression_1
sets n_i_c_e_p to true, then back in cp_parser_constant_expression
we skip the cxx11 block, and set *non_constant_p to true.  If I
remove n_i_c_e_p, we lose that.  This can be seen in init/array60.C.


Sure, we would need to use the C++11 code for C++98 mode, which is likely
fine but is more uncertain.

It's probably simpler to just ignore n_i_c_e_p for C++11 and up, along with
Patrick's suggestion of allowing null non_constant_p with true
allow_non_constant_p.


Something like this, then?  I see that cp_parser_initializer_clause et al
offer further opportunities (because they sometimes use a dummy too) but
this should be a good start.


Looks good.  Please do update the other callers as well, while you're 
looking at this.



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It's pointless to call *_rvalue_constant_expression when we're not using
the result.  Also apply some drive-by cleanups.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_constant_expression): Allow non_constant_p to be
nullptr even when allow_non_constant_p is true.  Don't call
_rvalue_constant_expression when not necessary.  Move local variable
declarations closer to their first use.
(cp_parser_static_assert): Don't pass a dummy down to
cp_parser_constant_expression.
---
  gcc/cp/parser.cc | 24 +++-
  1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..efaa806f107 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -10734,11 +10734,6 @@ cp_parser_constant_expression (cp_parser* parser,
   bool *non_constant_p /* = NULL */,
   bool strict_p /* = false */)
  {
-  bool saved_integral_constant_expression_p;
-  bool saved_allow_non_integral_constant_expression_p;
-  bool saved_non_integral_constant_expression_p;
-  cp_expr expression;
-
/* It might seem that we could simply parse the
   conditional-expression, and then check to see if it were
   TREE_CONSTANT.  However, an expression that is TREE_CONSTANT is
@@ -10757,10 +10752,12 @@ cp_parser_constant_expression (cp_parser* parser,
   will fold this operation to an INTEGER_CST for `3'.  */
  
/* Save the old settings.  */

-  saved_integral_constant_expression_p = 
parser->integral_constant_expression_p;
-  saved_allow_non_integral_constant_expression_p
+  bool saved_integral_constant_expression_p
+= parser->integral_constant_expression_p;
+  bool saved_allow_non_integral_constant_expression_p
  = parser->allow_non_integral_constant_expression_p;
-

Re: [gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings


On Tue, 25 Jul 2023 12:50:48 PDT (-0700), ja...@redhat.com wrote:

On Tue, Jul 25, 2023 at 11:01:54AM -0700, Patrick O'Neill wrote:

Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
Jeff Law.
If there aren't any objections I'll commit this cherry-picked series
on Thursday (July 27th).


Please don't commit this before 13.2 is released; the branch is frozen and
none of this seems to be a release blocker.


Sorry I missed this.  IMO it's fine to wait, this has been broken for 
5-10 years so we can wait another cycle ;)




Jakub

Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

On Fri, Jul 21, 2023 at 01:44:17PM -0400, Jason Merrill wrote:
> On 7/20/23 17:58, Marek Polacek wrote:
> > On Thu, Jul 20, 2023 at 03:51:32PM -0400, Marek Polacek wrote:
> > > On Thu, Jul 20, 2023 at 02:37:07PM -0400, Jason Merrill wrote:
> > > > On 7/20/23 14:13, Marek Polacek wrote:
> > > > > On Wed, Jul 19, 2023 at 10:11:27AM -0400, Patrick Palka wrote:
> > > > > > On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:
> > > > > > 
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and 
> > > > > > > branches?
> > > > > > 
> > > > > > Looks reasonable to me.
> > > > > 
> > > > > Thanks.
> > > > > > Though I wonder if we could also fix this by not checking 
> > > > > > potentiality
> > > > > > at all in this case?  The problematic call to 
> > > > > > is_rvalue_constant_expression
> > > > > > happens from cp_parser_constant_expression with 
> > > > > > 'allow_non_constant' != 0
> > > > > > and with 'non_constant_p' being a dummy out argument that comes from
> > > > > > cp_parser_functional_cast, so the result of 
> > > > > > is_rvalue_constant_expression
> > > > > > is effectively unused in this case, and we should be able to safely 
> > > > > > elide
> > > > > > it when 'allow_non_constant && non_constant_p == nullptr'.
> > > > > 
> > > > > Sounds plausible.  I think my patch could be applied first since it
> > > > > removes a tiny bit of code, then I can hopefully remove the flag 
> > > > > below,
> > > > > then maybe go back and optimize the call to 
> > > > > is_rvalue_constant_expression.
> > > > > Does that sound sensible?
> > > > > 
> > > > > > Relatedly, ISTM the member 
> > > > > > cp_parser::non_integral_constant_expression_p
> > > > > > is also effectively unused and could be removed?
> > > > > 
> > > > > It looks that way.  Seems it's only used in 
> > > > > cp_parser_constant_expression:
> > > > > 10806   if (allow_non_constant_p)
> > > > > 10807 *non_constant_p = 
> > > > > parser->non_integral_constant_expression_p;
> > > > > but that could be easily replaced by a local var.  I'd be happy to 
> > > > > see if
> > > > > we can actually do away with it.  (I wonder why it was introduced and 
> > > > > when
> > > > > it actually stopped being useful.)
> > > > 
> > > > It was for the C++98 notion of constant-expression, which was more of a
> > > > parser-level notion, and has been supplanted by the C++11 version.  I'm
> > > > happy to remove it, and therefore remove the 
> > > > is_rvalue_constant_expression
> > > > call.
> > > 
> > > Wonderful.  I'll do that next.
> > 
> > I found a use of parser->non_integral_constant_expression_p:
> > finish_id_expression_1 can set it to true which then makes
> > a difference in cp_parser_constant_expression in C++98.  In
> > cp_parser_constant_expression we set n_i_c_e_p to false, call
> > cp_parser_assignment_expression in which finish_id_expression_1
> > sets n_i_c_e_p to true, then back in cp_parser_constant_expression
> > we skip the cxx11 block, and set *non_constant_p to true.  If I
> > remove n_i_c_e_p, we lose that.  This can be seen in init/array60.C.
> 
> Sure, we would need to use the C++11 code for C++98 mode, which is likely
> fine but is more uncertain.
> 
> It's probably simpler to just ignore n_i_c_e_p for C++11 and up, along with
> Patrick's suggestion of allowing null non_constant_p with true
> allow_non_constant_p.

Something like this, then?  I see that cp_parser_initializer_clause et al
offer further opportunities (because they sometimes use a dummy too) but
this should be a good start.
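
The idea under discussion (skip the potentially expensive check when the
caller passes no out-parameter and so cannot observe the answer) can be
sketched in plain C.  This is only an illustration: parse_value,
is_constant_value and the counter are hypothetical names, not the
parser.cc code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter, only here to make the skipped work observable.  */
static int expensive_checks_run = 0;

/* Stand-in for an expensive analysis such as a potential-constant check.
   In this toy model a value counts as "constant" when it is non-negative.  */
static bool
is_constant_value (int value)
{
  expensive_checks_run++;
  return value >= 0;
}

/* NON_CONSTANT_P is an optional out-parameter.  When ALLOW_NON_CONSTANT is
   true and the caller passes NULL, nobody can observe the answer, so the
   expensive check is skipped entirely.  */
static int
parse_value (int value, bool allow_non_constant, bool *non_constant_p)
{
  if (allow_non_constant && non_constant_p == NULL)
    return value;

  bool non_constant = !is_constant_value (value);
  if (non_constant_p != NULL)
    *non_constant_p = non_constant;
  return value;
}
```

A caller that previously passed a dummy bool just to satisfy the interface
would pass NULL instead.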

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It's pointless to call *_rvalue_constant_expression when we're not using
the result.  Also apply some drive-by cleanups.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_constant_expression): Allow non_constant_p to be
nullptr even when allow_non_constant_p is true.  Don't call
_rvalue_constant_expression when not necessary.  Move local variable
declarations closer to their first use.
(cp_parser_static_assert): Don't pass a dummy down to
cp_parser_constant_expression.
---
 gcc/cp/parser.cc | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..efaa806f107 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -10734,11 +10734,6 @@ cp_parser_constant_expression (cp_parser* parser,
   bool *non_constant_p /* = NULL */,
   bool strict_p /* = false */)
 {
-  bool saved_integral_constant_expression_p;
-  bool saved_allow_non_integral_constant_expression_p;
-  bool saved_non_integral_constant_expression_p;
-  cp_expr expression;
-
   /* It might seem that we could simply parse the
  conditional-expression, and then check to see if it were
  TREE_CONSTANT.  However, an expression that is TREE_CONSTANT is
@@ -10757,10 +10752,12 @@ cp_parser_constant_expression

Re: [gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings


On Tue, 25 Jul 2023 11:01:54 PDT (-0700), Patrick O'Neill wrote:

Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
Jeff Law.
If there aren't any objections I'll commit this cherry-picked series
on Thursday (July 27th).


+Jakub

According to the "GCC 13.1.1 Status Report (2023-07-20)", it looks like 
we're frozen for 13.2 and thus would need a release maintainer to sign 
off on anything we backport until 13.2 is released.


I'm not opposed to the backport, but it does look like we're down to no
P1 regressions, which means we might release very soon.  So we should at
least make sure this gets through all the tests and such.  It's kind of
splitting hairs, as this is a pretty bad set of bugs we're fixing and
distros are probably going to just backport it anyway, so I'm not sure what
the right answer is.



Patchset on trunk:
https://inbox.sourceware.org/gcc-patches/20230427162301.1151333-1-patr...@rivosinc.com/
First commit: f37a36bce81b50a43ec1613c1d08d803642f7506

Also includes bugfix from:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109713
commit: 4bd434fbfc7865961a8e10d7e9601b28765ce7be

[1] 
https://inbox.sourceware.org/gcc/mhng-b7423fca-67ec-4ce4-9694-4e062632ceb0@palmer-ri-x1c9/T/#t

Martin Liska (1):
  riscv: fix error: control reaches end of non-void function

Patrick O'Neill (11):
  RISC-V: Eliminate SYNC memory models
  RISC-V: Enforce Libatomic LR/SC SEQ_CST
  RISC-V: Enforce subword atomic LR/SC SEQ_CST
  RISC-V: Enforce atomic compare_exchange SEQ_CST
  RISC-V: Add AMO release bits
  RISC-V: Strengthen atomic stores
  RISC-V: Eliminate AMO op fences
  RISC-V: Weaken LR/SC pairs
  RISC-V: Weaken mem_thread_fence
  RISC-V: Weaken atomic loads
  RISC-V: Table A.6 conformance tests

 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv.cc |  66 --
 gcc/config/riscv/sync.md  | 196 --
 .../riscv/amo-table-a-6-amo-add-1.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-2.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-3.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-4.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-5.c   |  15 ++
 .../riscv/amo-table-a-6-compare-exchange-1.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-2.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-3.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-4.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-5.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-6.c  |  10 +
 .../riscv/amo-table-a-6-compare-exchange-7.c  |   9 +
 .../gcc.target/riscv/amo-table-a-6-fence-1.c  |  14 ++
 .../gcc.target/riscv/amo-table-a-6-fence-2.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-load-1.c   |  16 ++
 .../gcc.target/riscv/amo-table-a-6-load-2.c   |  17 ++
 .../gcc.target/riscv/amo-table-a-6-load-3.c   |  18 ++
 .../gcc.target/riscv/amo-table-a-6-store-1.c  |  16 ++
 .../gcc.target/riscv/amo-table-a-6-store-2.c  |  17 ++
 .../riscv/amo-table-a-6-store-compat-3.c  |  18 ++
 .../riscv/amo-table-a-6-subword-amo-add-1.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-2.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-3.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-4.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-5.c   |   9 +
 gcc/testsuite/gcc.target/riscv/pr89835.c  |   9 +
 libgcc/config/riscv/atomic.c  |   4 +-
 33 files changed, 563 insertions(+), 75 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-4.c
 create mode 100644

[PATCH] c++: clear tf_partial et al in instantiate_template [PR108960]

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

In 
we concluded that we might clear all flags except tf_warning_or_error
when performing instantiate_template.

PR c++/108960

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't clear tf_partial
here.
(instantiate_template): Reset all complain flags except
tf_warning_or_error.
---
 gcc/cp/pt.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 21b08a6266a..265e2a59a52 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10396,12 +10396,6 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
   tree var = lookup_template_variable (templ, targs, complain);
   if (var == error_mark_node)
 return error_mark_node;
-  /* We may be called while doing a partial substitution, but the
- type of the variable template may be auto, in which case we
- will call do_auto_deduction in mark_used (which clears tf_partial)
- and the auto must be properly reduced at that time for the
- deduction to work.  */
-  complain &= ~tf_partial;
   var = finish_template_variable (var, complain);
   mark_used (var);
   return var;
@@ -22008,6 +22002,14 @@ instantiate_template (tree tmpl, tree orig_args, 
tsubst_flags_t complain)
   if (tmpl == error_mark_node)
 return error_mark_node;
 
+  /* The other flags are not relevant anymore here, especially tf_partial
+ shouldn't be set.  For instance, we may be called while doing a partial
+ substitution of a template variable, but the type of the variable
+ template may be auto, in which case we will call do_auto_deduction
+ in mark_used (which clears tf_partial) and the auto must be properly
+ reduced at that time for the deduction to work.  */
+  complain &= tf_warning_or_error;
+
   gcc_assert (TREE_CODE (tmpl) == TEMPLATE_DECL);
 
   if (modules_p ())

base-commit: 6e424febfbcb27c21a7fe3a137e614765f9cf9d2
-- 
2.41.0
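
The difference between the two masking operations in the patch above can be
sketched with a toy flag set.  The enumerator names and bit values below are
hypothetical, chosen for illustration only; GCC's real tsubst_flags_t
enumerators are defined elsewhere and may be laid out differently.

```c
/* Hypothetical flag values for illustration only.  */
enum toy_flags
{
  TOY_NONE = 0,
  TOY_ERROR = 1 << 0,
  TOY_WARNING = 1 << 1,
  TOY_PARTIAL = 1 << 2,
  TOY_OTHER = 1 << 3,
  TOY_WARNING_OR_ERROR = TOY_ERROR | TOY_WARNING
};

/* Old approach: clear exactly one flag and leave everything else alone.  */
static unsigned
clear_partial_only (unsigned complain)
{
  return complain & ~(unsigned) TOY_PARTIAL;
}

/* New approach: keep only the diagnostic bits and drop every other flag.  */
static unsigned
keep_diagnostics_only (unsigned complain)
{
  return complain & (unsigned) TOY_WARNING_OR_ERROR;
}
```

Note that masking with the diagnostic bits never sets a bit that was clear
on entry; it only filters out the unrelated flags.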

Re: [gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-07-25 Thread Jakub Jelinek via Gcc-patches

On Tue, Jul 25, 2023 at 11:01:54AM -0700, Patrick O'Neill wrote:
> Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
> Jeff Law.
> If there aren't any objections I'll commit this cherry-picked series
> on Thursday (July 27th).

Please don't commit this before 13.2 is released; the branch is frozen and
none of this seems to be a release blocker.

Jakub

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-25 Thread Jakub Jelinek via Gcc-patches

On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches wrote:
> Hi, Drew
> 
> Thanks for addressing this missed optimization.
> 
> The testcase incorrectly assumes that plain char is signed, which
> causes the testcase to fail on PowerPC.
> 
> Should the testcase be updated to specify signed char in the function
> signatures or should -fsigned-char be added to the command line
> options?

I think we should use signed char instead of char in the testcase.
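
As a sanity check of the transformation itself, here is a small sketch that
spells the operands as signed char explicitly, so it does not depend on the
target's default for plain char.  The function names are made up for this
example and are not taken from the testcase.

```c
#include <limits.h>
#include <stdbool.h>

/* Left-hand side of the simplification: (~X | Y) ^ X.  */
static signed char
lhs (signed char x, signed char y)
{
  return (signed char) ((~x | y) ^ x);
}

/* Right-hand side: ~(X & Y).  */
static signed char
rhs (signed char x, signed char y)
{
  return (signed char) ~(x & y);
}

/* Exhaustively compare both forms over every pair of 8-bit values.  */
static bool
identity_holds (void)
{
  for (int x = SCHAR_MIN; x <= SCHAR_MAX; x++)
    for (int y = SCHAR_MIN; y <= SCHAR_MAX; y++)
      if (lhs ((signed char) x, (signed char) y)
          != rhs ((signed char) x, (signed char) y))
        return false;
  return true;
}

/* Whether plain char is signed is implementation-defined, which is why a
   testcase should not rely on it.  */
static bool
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```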

Jakub

Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-25 Thread Jakub Jelinek via Gcc-patches

On Tue, Jul 25, 2023 at 03:25:57PM -0400, Drew Ross wrote:
> > With that fixed I think for non-vector integrals the above is the most
> > suitable canonical form of a sign-extension.  Note it should also work
> > for any other constant shift amount - just use the appropriate
> > intermediate precision for the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend: (subreg: reg)).
> 
> Here is a pattern that only matches truncations that result in mode
> precision (or precision of 1):
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>   && !TYPE_UNSIGNED (type)
>   && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN
> (TREE_TYPE (@1

I'd use
 && wi::ltu_p (wi::to_wide (@1), element_precision (type))
instead.  If the shift count were negative, you'd otherwise ICE in
tree_to_uhwi on it (sure, that is UB at runtime, but the compiler shouldn't
ICE on it).
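
To illustrate why the unsigned comparison is the safer guard, here is a
sketch using plain 64-bit integers rather than the wide_int API; the helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed comparison "precision > count": a negative count passes the check
   even though it is not a usable shift amount.  */
static bool
count_ok_signed (int64_t count, unsigned precision)
{
  return (int64_t) precision > count;
}

/* Unsigned comparison "count <u precision": reinterpreting a negative count
   as unsigned makes it huge, so it is rejected together with counts that
   are too large.  */
static bool
count_ok_unsigned (int64_t count, unsigned precision)
{
  return (uint64_t) count < (uint64_t) precision;
}
```

With a count of -1 and a precision of 32, the signed form accepts the count
while the unsigned form rejects it.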

>   (with {
> int width = element_precision (type) - tree_to_uhwi (@1);
> tree stype = build_nonstandard_integer_type (width, 0);
>}
>(if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> (convert (convert:stype @0))

Jakub

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-25 Thread David Edelsohn via Gcc-patches

Hi, Drew

Thanks for addressing this missed optimization.

The testcase incorrectly assumes that plain char is signed, which
causes the testcase to fail on PowerPC.

Should the testcase be updated to specify signed char in the function
signatures or should -fsigned-char be added to the command line
options?

Thanks, David

Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-25 Thread Drew Ross via Gcc-patches

> With that fixed I think for non-vector integrals the above is the most
> suitable canonical form of a sign-extension.  Note it should also work
> for any other constant shift amount - just use the appropriate
> intermediate precision for the truncating type.
> We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend: (subreg: reg)).

Here is a pattern that only matches truncations that result in mode
precision (or precision of 1):

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (INTEGRAL_TYPE_P (type)
  && !TYPE_UNSIGNED (type)
  && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN
(TREE_TYPE (@1
  (with {
int width = element_precision (type) - tree_to_uhwi (@1);
tree stype = build_nonstandard_integer_type (width, 0);
   }
   (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
(convert (convert:stype @0))

Look ok?
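
For reference, the equivalence the pattern relies on can be checked at the
source level with a small sketch.  It assumes a 32-bit type, that >> on a
negative signed value is an arithmetic shift, and that out-of-range
conversions to a signed type wrap (both implementation-defined in ISO C,
and both documented GCC behavior).  The helper names are made up for this
example.

```c
#include <stdint.h>

/* (x << c) >> c on a signed 32-bit value, with the left shift done on the
   unsigned type to avoid undefined behavior.  Requires 0 <= c < 32.  */
static int32_t
shift_pair (int32_t x, unsigned c)
{
  return (int32_t) ((uint32_t) x << c) >> c;
}

/* Truncate to the narrower signed type and widen again: the
   (convert (convert:stype @0)) form for c == 24 and c == 16.  */
static int32_t
via_int8 (int32_t x)
{
  return (int32_t) (int8_t) x;
}

static int32_t
via_int16 (int32_t x)
{
  return (int32_t) (int16_t) x;
}

/* The one-bit special case from the subject line: c == 31 gives -(x & 1).  */
static int32_t
via_low_bit (int32_t x)
{
  return -(x & 1);
}
```

Shift counts of 24 and 16 leave 8 and 16 bits, which are mode precisions on
common targets, and a count of 31 leaves the one-bit special case.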

> You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.

The RTL is slightly better for the mode precision cases and slightly worse
for the precision 1 case.

> That said - do you have any testcase where the canonicalization is an
> enabler for further transforms or was this requested stand-alone?

No, I don't have any specific test cases. This patch is just in response to
PR101955.

On Tue, Jul 25, 2023 at 2:55 AM Richard Biener 
wrote:

> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek  wrote:
> >
> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches
> wrote:
> > > So would something like
> > >
> > > (simplify
> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> > >  (if (INTEGRAL_TYPE_P (type)
> > >   && !TYPE_UNSIGNED (type)
> > >   && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> > >   (convert (convert:stype @0)
> > >
> > > work?
> >
> > Certainly swap the if and with; the (with should then be indented by 1
> > column to the right of (if, and (convert one further (the reason for the
> > swapping is not to call build_nonstandard_integer_type when it will not be
> > needed, which will probably be far more often than an actual match).
>
> With that fixed I think for non-vector integrals the above is the most
> suitable
> canonical form of a sign-extension.  Note it should also work for any other
> constant shift amount - just use the appropriate intermediate precision for
> the truncating type.  You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.  We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend: (subreg: reg)).
>
> > As discussed privately, the above isn't what we want for vectors, and the 2
> > shifts are probably best on most arches because even when using -(x & 1) the
> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>
> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> producing the result.  Note that to reflect this on GIMPLE you'd need
>
>   _2 = _1 < { 0,0...};
>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>
> because whether the ISA has a way to produce all-ones masks isn't known.
>
> For scalars using -(T)(_1 < 0) would also be possible.
>
> That said - do you have any testcase where the canonicalization is an
> enabler
> for further transforms or was this requested stand-alone?
>
> Thanks,
> Richard.
>
> > Jakub
> >
>
>

List myself as "nvptx port" maintainer (was: Thomas Schwinge appointed co-maintainer of the nvptx backend)

2023-07-25 Thread Thomas Schwinge

Hi!

On 2023-07-19T23:41:47+0200, Gerald Pfeifer  wrote:
> It's my pleasure to announce Thomas Schwinge as co-maintainer of the
> nvptx backend.
>
> Congratulations and Happy Hacking, Thomas! Please go ahead and update
> MAINTAINERS accordingly.
>
> Gerald (on behalf of the steering committee)

Thanks!  I've pushed commit 28e3d361ba0cfa7ea2f90706159a144eaf4b650e
'List myself as "nvptx port" maintainer', see attached.


Grüße
 Thomas


From 28e3d361ba0cfa7ea2f90706159a144eaf4b650e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 25 Jul 2023 21:17:52 +0200
Subject: [PATCH] List myself as "nvptx port" maintainer

	* MAINTAINERS: List myself as "nvptx port" maintainer.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b626d89fe34..e9b11b43a0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -102,6 +102,7 @@ nds32 port		Shiva Chen		
 nios2 port		Chung-Lin Tang		
 nios2 port		Sandra Loosemore	
 nvptx port		Tom de Vries		
+nvptx port		Thomas Schwinge		
 or1k port		Stafford Horne		
 pdp11 port		Paul Koning		
 powerpcspe port		Andrew Jenner		
-- 
2.34.1

Re: [PATCH] range-op-float: Fix up -frounding-math frange_arithmetic +- handling [PR110755]

2023-07-25 Thread Aldy Hernandez via Gcc-patches

The frange bits look fine to me, so if you feel confident in the math 
logic, go right ahead :).


Thanks.
Aldy

On 7/24/23 18:01, Jakub Jelinek wrote:

Hi!

IEEE754 says that x + (-x) and x - x result in +0 in all rounding modes
except rounding towards negative infinity, in which case the result is -0
for all finite x.  x + x and x - (-x), if the result is zero, retain the
sign of x.
Now, range_arithmetic implements the normal round-to-nearest-even rounding,
and as the addition or subtraction in those cases is exact, we don't do any
further rounding etc. and e.g. on the testcase below distilled from glibc
compute a range [+0, +INF], which is fine for -fno-rounding-math or
if we had a guarantee that those statements aren't executed with rounding
towards negative infinity.

I believe it is only +- which has this problematic behavior and I think
it is best to deal with it in frange_arithmetic; if we know -frounding-math
is on, the operation is x + (-x) or x - x, and we are asked to round towards
negative infinity (i.e. we want the low bound rather than the high bound),
change a +0 result to -0.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
after a while for 13.3?  I'm wary of rushing this so late into 13.2...
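
The adjustment described above can be modelled on plain doubles.  This is a
simplified sketch, not the range-op-float.cc code: it assumes IEEE binary64
doubles and a host that adds in the default round-to-nearest mode, and the
names lower_bound and sign_bit_set are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Read the sign bit directly so that +0 and -0 can be told apart.
   Assumes IEEE binary64 doubles.  */
static bool
sign_bit_set (double v)
{
  uint64_t bits;
  memcpy (&bits, &v, sizeof bits);
  return (bits >> 63) != 0;
}

/* Lower bound of op1 + op2 (IS_PLUS) or op1 - op2.  The host computes an
   exact zero as +0; when rounding-math semantics are requested, flip it to
   the -0 that rounding towards negative infinity would produce.  */
static double
lower_bound (double op1, double op2, bool is_plus, bool rounding_math)
{
  double result = is_plus ? op1 + op2 : op1 - op2;

  if (rounding_math && result == 0.0 && !sign_bit_set (result))
    {
      /* Rewrite op1 + op2 as op1 - op2a so both cases share one test.  */
      double op2a = is_plus ? -op2 : op2;
      if (sign_bit_set (op1) == sign_bit_set (op2a) && op1 == op2a)
        result = -0.0;
    }
  return result;
}
```

The sign comparison keeps (+0) + (+0) at +0, since that sum is +0 in every
rounding mode, while x - x and x + (-x) become -0 for the lower bound.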

2023-07-24  Jakub Jelinek  

PR tree-optimization/110755
* range-op-float.cc (frange_arithmetic): Change +0 result to -0
for PLUS_EXPR or MINUS_EXPR if -frounding-math, inf is negative and
it is exact op1 + (-op1) or op1 - op1.

* gcc.dg/pr110755.c: New test.

--- gcc/range-op-float.cc.jj	2023-07-23 19:32:20.832434105 +0200
+++ gcc/range-op-float.cc   2023-07-24 09:41:26.231030258 +0200
@@ -324,6 +324,24 @@ frange_arithmetic (enum tree_code code,
bool inexact = real_arithmetic (, code, , );
real_convert (, mode, );
  
+  /* When rounding towards negative infinity, x + (-x) and
+ x - x is -0 rather than +0 real_arithmetic computes.
+ So, when we are looking for lower bound (inf is negative),
+ use -0 rather than +0.  */
+  if (flag_rounding_math
+  && (code == PLUS_EXPR || code == MINUS_EXPR)
+  && !inexact
+  && real_iszero ()
+  && !real_isneg ()
+  && real_isneg ())
+{
+  REAL_VALUE_TYPE op2a = op2;
+  if (code == PLUS_EXPR)
+   op2a.sign ^= 1;
+  if (real_isneg () == real_isneg () && real_equal (, ))
+   result.sign = 1;
+}
+
// Be extra careful if there may be discrepancies between the
// compile and runtime results.
bool round = false;
--- gcc/testsuite/gcc.dg/pr110755.c.jj  2023-07-21 10:34:05.037251433 +0200
+++ gcc/testsuite/gcc.dg/pr110755.c 2023-07-21 10:35:10.986326816 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/110755 */
+/* { dg-do run } */
+/* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -frounding-math" } */
+
+#include <fenv.h>
+
+__attribute__((noipa)) float
+foo (float x)
+{
+  if (x > 0.0)
+{
+  x += 0x1p+23;
+  x -= 0x1p+23;
+  x = __builtin_fabsf (x);
+}
+  return x;
+}
+
+int
+main ()
+{
+#ifdef FE_DOWNWARD
+  fesetround (FE_DOWNWARD);
+  if (__builtin_signbit (foo (0.5)))
+__builtin_abort ();
+#endif
+}

Jakub

[PATCH] Initialize value in bit_value_unop.

2023-07-25 Thread Aldy Hernandez via Gcc-patches

bit_value_binop initializes VAL regardless of the final mask.  It even
has a comment to that effect:

  /* Ensure that VAL is initialized (to any value).  */

However, bit_value_unop, which in theory shares the same API, does not.
This causes range-ops to choke on uninitialized VALs for some inputs to
ABS.

Instead of fixing the callers, it's cleaner to make bit_value_unop and
bit_value_binop consistent.
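
The contract being restored here, that every path stores to both outputs,
can be sketched with a toy value/mask helper.  The enum and function below
are hypothetical and much simpler than the tree-ssa-ccp.cc code; they only
show the shape of the fix.

```c
#include <stdint.h>

enum toy_code { TOY_NOT, TOY_UNKNOWN_OP };

/* Toy value/mask pair: a set bit in *MASK means that bit is unknown, and
   *VAL is only meaningful where *MASK is clear.  Every path stores to both
   outputs, so callers may read *VAL even when nothing is known.  */
static void
toy_bit_value_unop (enum toy_code code, uint64_t rval, uint64_t rmask,
                    uint64_t *val, uint64_t *mask)
{
  switch (code)
    {
    case TOY_NOT:
      /* Known bits are inverted; unknown bits stay unknown.  */
      *mask = rmask;
      *val = ~rval & ~rmask;
      break;

    default:
      /* Nothing is known about the result.  */
      *mask = UINT64_MAX;
      *val = 0;  /* Ensure that VAL is initialized (to any value).  */
      break;
    }
}
```

Without the store in the default case, a caller reading *val after an
all-unknown result would see whatever happened to be in the variable.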

OK for trunk?

gcc/ChangeLog:

* tree-ssa-ccp.cc (bit_value_unop): Initialize val when appropriate.
---
 gcc/tree-ssa-ccp.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 73fb7c11c64..15e65f16008 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1359,7 +1359,10 @@ bit_value_unop (enum tree_code code, signop type_sgn, 
int type_precision,
 case ABS_EXPR:
 case ABSU_EXPR:
   if (wi::sext (rmask, rtype_precision) == -1)
-   *mask = -1;
+   {
+ *mask = -1;
+ *val = 0;
+   }
   else if (wi::neg_p (rmask))
{
  /* Result is either rval or -rval.  */
@@ -1385,6 +1388,7 @@ bit_value_unop (enum tree_code code, signop type_sgn, int 
type_precision,
 
 default:
   *mask = -1;
+  *val = 0;
   break;
 }
 }
-- 
2.41.0

[COMMITTED] Make some functions in CCP static.

2023-07-25 Thread Aldy Hernandez via Gcc-patches

Committed as obvious.

gcc/ChangeLog:

* tree-ssa-ccp.cc (value_mask_to_min_max): Make static.
(bit_value_mult_const): Same.
(get_individual_bits): Same.
---
 gcc/tree-ssa-ccp.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 64d5fa81334..73fb7c11c64 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1297,7 +1297,7 @@ ccp_fold (gimple *stmt)
represented by the mask pair VAL and MASK with signedness SGN and
precision PRECISION.  */
 
-void
+static void
 value_mask_to_min_max (widest_int *min, widest_int *max,
   const widest_int , const widest_int ,
   signop sgn, int precision)
@@ -1391,7 +1391,7 @@ bit_value_unop (enum tree_code code, signop type_sgn, int 
type_precision,
 
 /* Determine the mask pair *VAL and *MASK from multiplying the
argument mask pair RVAL, RMASK by the unsigned constant C.  */
-void
+static void
 bit_value_mult_const (signop sgn, int width,
  widest_int *val, widest_int *mask,
  const widest_int , const widest_int ,
@@ -1453,7 +1453,7 @@ bit_value_mult_const (signop sgn, int width,
bits in X (capped at the maximum value MAX).  For example, an X
value 11, places 1, 2 and 8 in BITS and returns the value 3.  */
 
-unsigned int
+static unsigned int
 get_individual_bits (widest_int *bits, widest_int x, unsigned int max)
 {
   unsigned int count = 0;
-- 
2.41.0

[PATCH] Replace invariant ternlog operands

2023-07-25 Thread Yan Simonaytes

Sometimes GCC generates ternlog with three operands, but some of them are 
invariant.
For example:

vpternlogq  $252, %zmm2, %zmm1, %zmm0

In this case the zmm2 register isn't used by ternlog (the immediate
252 = 0xFC encodes zmm0 | zmm1), so zmm2 should be replaced with zmm0 or zmm1:

vpternlogq  $252, %zmm0, %zmm1, %zmm0

When the third operand of ternlog is in memory and both other operands are
invariant, a load from this memory into a register should be added, and the
first and second operands should be replaced with this register.
So instead of

vpternlogq  $85, (%rdi), %zmm1, %zmm0

Should emit

vmovdqa64   (%rdi), %zmm0
vpternlogq  $85, %zmm0, %zmm0, %zmm0
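
For reference, whether an operand is invariant can be read off the imm8
truth table directly.  The sketch below is standalone C, not the code in
this patch; ternlog_bit and unused_operand_mask are names made up for the
example.  It indexes imm8 with the first source (which is also the
destination) as the high bit and the third source as the low bit.  In the
AT&T listings above, the third source is the operand written right after
the immediate.

```c
#include <stdint.h>

/* Evaluate one entry of the ternlog truth table IMM8 for input bits
   A, B, C (each 0 or 1).  A is the first source (also the destination),
   B the second and C the third.  */
int
ternlog_bit (uint8_t imm8, int a, int b, int c)
{
  return (imm8 >> ((a << 2) | (b << 1) | c)) & 1;
}

/* Return a mask of operands that do not affect the result: bit 0 for
   the first operand, bit 1 for the second, bit 2 for the third.  An
   operand is unused when flipping it never changes the table entry.  */
int
unused_operand_mask (uint8_t imm8)
{
  int mask = 7;
  for (int x = 0; x < 2; x++)
    for (int y = 0; y < 2; y++)
      {
        if (ternlog_bit (imm8, 0, x, y) != ternlog_bit (imm8, 1, x, y))
          mask &= ~1;
        if (ternlog_bit (imm8, x, 0, y) != ternlog_bit (imm8, x, 1, y))
          mask &= ~2;
        if (ternlog_bit (imm8, x, y, 0) != ternlog_bit (imm8, x, y, 1))
          mask &= ~4;
      }
  return mask;
}
```

For the two immediates used above, unused_operand_mask (252) has only bit 2
set and unused_operand_mask (85) has bits 0 and 1 set, the latter being the
mask == 3 case that the new splitter checks for.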

gcc/ChangeLog:

* config/i386/i386.cc (ternlog_invariant_operand_mask): New helper
function for replacing invariant operands.
(reduce_ternlog_operands): Likewise.
* config/i386/i386-protos.h (ternlog_invariant_operand_mask): Prototype 
here.
(reduce_ternlog_operands): Likewise.
* config/i386/sse.md:

gcc/testsuite/ChangeLog:

* gcc.target/i386/reduce-ternlog-operands-1.c: New test.
* gcc.target/i386/reduce-ternlog-operands-2.c: New test.
---
 gcc/config/i386/i386-protos.h |  2 +
 gcc/config/i386/i386.cc   | 45 +++
 gcc/config/i386/sse.md| 43 ++
 .../i386/reduce-ternlog-operands-1.c  | 20 +
 .../i386/reduce-ternlog-operands-2.c  | 11 +
 5 files changed, 121 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/reduce-ternlog-operands-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/reduce-ternlog-operands-2.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 27fe73ca65c..49398ef9936 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -57,6 +57,8 @@ extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
 extern rtx standard_80387_constant_rtx (int);
 extern int standard_sse_constant_p (rtx, machine_mode);
+extern int ternlog_invariant_operand_mask (rtx *operands);
+extern void reduce_ternlog_operands (rtx *operands);
 extern const char *standard_sse_constant_opcode (rtx_insn *, rtx *);
 extern bool ix86_standard_x87sse_constant_load_p (const rtx_insn *, rtx);
 extern bool ix86_pre_reload_split (void);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f0d6167e667..140de478571 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5070,6 +5070,51 @@ ix86_check_no_addr_space (rtx insn)
 }
   return true;
 }
+
+/* Return mask of invariant operands:
+   bit number 0 1 2
+   operand number 1 2 3.  */
+
+int
+ternlog_invariant_operand_mask (rtx *operands)
+{
+  int mask = 0;
+  int imm8 = XINT (operands[4], 0);
+
+  if (((imm8 >> 4) & 0xF) == (imm8 & 0xF))
+mask |= 1;
+  if (((imm8 >> 2) & 0x33) == (imm8 & 0x33))
+mask |= (1 << 1);
+  if (((imm8 >> 1) & 0x55) == (imm8 & 0x55))
+mask |= (1 << 2);
+
+  return mask;
+}
+
+/* Replace one of the unused operators with the one used.  */
+
+void
+reduce_ternlog_operands (rtx *operands)
+{
+  int mask = ternlog_invariant_operand_mask (operands);
+
+  if (mask & 1) /* the first operand is invariant.  */
+operands[1] = operands[2];
+
+  if (mask & 2) /* the second operand is invariant.  */
+operands[2] = operands[1];
+
+  if (mask & 4)/* the third operand is invariant.  */
+   operands[3] = operands[1];
+  else if (!MEM_P (operands[3]))
+{
+  if (mask & 1) /* the first operand is invariant.  */
+   operands[1] = operands[3];
+  if (mask & 2) /* the second operands is invariant.  */
+   operands[2] = operands[3];
+}
+}
+
 
 /* Initialize the table of extra 80387 mathematical constants.  */
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a2099373123..f88d82b315c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12625,6 +12625,49 @@
  (symbol_ref " == 64 || TARGET_AVX512VL")
  (const_string "*")))])
 
+;; If the first and the second operands of ternlog are invariant and
+;; the third operand is memory
+;; then we should add load third operand from memory to register and
+;; replace first and second operands with this register
+(define_split
+  [(set (match_operand:V 0 "register_operand")
+   (unspec:V
+ [(match_operand:V 1 "register_operand")
+  (match_operand:V 2 "register_operand")
+  (match_operand:V 3 "memory_operand")
+  (match_operand:SI 4 "const_0_to_255_operand")]
+ UNSPEC_VTERNLOG))]
+  "ternlog_invariant_operand_mask (operands) == 3 && !reload_completed"
+  [(set (match_dup 0)
+   (match_dup 3))
+   (set (match_dup 0)
+   (unspec:V
+ [(match_dup 0)
+  (match_dup 0)
+  (match_dup 0)
+  (match_dup 4)]
+ UNSPEC_VTERNLOG))])
+
+;; Replace invariant ternlog operands with used operands
+;; (except for

Re: [PATCH 2/1] c++: passing partially inst ttp as ttp [PR110566]


On 7/24/23 13:03, Patrick Palka wrote:

On Fri, 21 Jul 2023, Jason Merrill wrote:

On 7/21/23 14:34, Patrick Palka wrote:

(This is a follow-up of
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624951.html)

Bootstrapped and regtested on x86_64-pc-linux-gnu, how does this look?

-- >8 --

The previous fix doesn't work for partially instantiated ttps primarily
because most_general_template doesn't work for them.  This patch fixes
this by giving such ttps a DECL_TEMPLATE_INFO (extending the
r11-734-g2fb595f8348e16 fix) with which we can obtain the original ttp.

This patch additionally makes us be more careful about using the correct
amount of levels from the scope of a ttp argument during
coerce_template_template_parms.

PR c++/110566

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Set DECL_TEMPLATE_INFO
on the DECL_TEMPLATE_RESULT of a reduced template template
parameter.
(add_defaults_to_ttp): Also update DECL_TEMPLATE_INFO of the
ttp's DECL_TEMPLATE_RESULT.
(coerce_template_template_parms): Make sure 'scope_args' has
the right amount of levels for the ttp argument.
(most_general_template): Handle template template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp39.C: New test.
---
   gcc/cp/pt.cc  | 46 ---
   gcc/testsuite/g++.dg/template/ttp39.C | 16 ++
   2 files changed, 57 insertions(+), 5 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/template/ttp39.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e0ed4bc8bbb..be7119dd9a0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4570,8 +4570,14 @@ reduce_template_parm_level (tree index, tree type,
int levels, tree args,
  TYPE_DECL, DECL_NAME (decl), type);
  DECL_TEMPLATE_RESULT (decl) = inner;
  DECL_ARTIFICIAL (inner) = true;
- DECL_TEMPLATE_PARMS (decl) = tsubst_template_parms
-   (DECL_TEMPLATE_PARMS (orig_decl), args, complain);
+ tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (orig_decl),
+ args, complain);
+ DECL_TEMPLATE_PARMS (decl) = parms;
+ retrofit_lang_decl (inner);
+ tree orig_inner = DECL_TEMPLATE_RESULT (orig_decl);
+ DECL_TEMPLATE_INFO (inner)
+   = build_template_info (DECL_TI_TEMPLATE (orig_inner),
+  template_parms_to_args (parms));


Should we assert that orig_inner doesn't have its own DECL_TEMPLATE_INFO?  I'm
wondering if it's possible to reduce the level of a TTP more than once.


It's possible for a ttp belonging to a nested generic lambda:

   template
   void f() {
 [](auto) {
   [] class TT>() {
   };
 }(0);
   }

   template void f();




}
   /* Attach the TPI to the decl.  */
@@ -7936,6 +7942,19 @@ add_defaults_to_ttp (tree otmpl)
}
   }
   +  tree oresult = DECL_TEMPLATE_RESULT (otmpl);
+  tree gen_otmpl = DECL_TI_TEMPLATE (oresult);


Hmm, here we're assuming that all TTPs have DECL_TEMPLATE_INFO?


I figured it's a reasonable assumption since all "formal" ttps
originally start out with DECL_TEMPLATE_INFO (via process_template_parm).
Though I realized I missed adjusting rewrite_template_parm to set
DECL_TEMPLATE_INFO on the new ttp, which the below patch fixes (and
adds a testcase that we'd otherwise segfault on).




+  tree gen_ntmpl;
+  if (gen_otmpl == otmpl)
+gen_ntmpl = ntmpl;
+  else
+gen_ntmpl = add_defaults_to_ttp (gen_otmpl);
+
+  tree nresult = copy_node (oresult);


Another fixed bug: since we build the new DECL_TEMPLATE_RESULT via
copy_node, we need to avoid sharing its DECL_LANG_SPECIFIC with the
old decl.


+  DECL_TEMPLATE_INFO (nresult) = copy_node (DECL_TEMPLATE_INFO (oresult));
+  DECL_TI_TEMPLATE (nresult) = gen_ntmpl;
+  DECL_TEMPLATE_RESULT (ntmpl) = nresult;
+
 hash_map_safe_put (defaulted_ttp_cache, otmpl, ntmpl);
 return ntmpl;
   }
@@ -8121,15 +8140,29 @@ coerce_template_template_parms (tree parm_tmpl,
 OUTER_ARGS are not the right outer levels in this case, as they are
 the args we're building up for PARM, and for the coercion we want the
 args for ARG.  If DECL_CONTEXT isn't set for a template template
-parameter, we can assume that it's in the current scope.  In that
case
-we might end up adding more levels than needed, but that shouldn't be
-a problem; any args we need to refer to are at the right level.  */
+parameter, we can assume that it's in the current scope.  */
 tree ctx = DECL_CONTEXT (arg_tmpl);
 if (!ctx && DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
ctx = current_scope ();
 tree scope_args = NULL_TREE;
 if (tree tinfo = get_template_info (ctx))
scope_args = TI_ARGS (tinfo);
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
+   {
+ int level =

[gcc13 backport 12/12] riscv: fix error: control reaches end of non-void function

From: Martin Liska 

Fixes:
gcc/config/riscv/sync.md:66:1: error: control reaches end of non-void function 
[-Werror=return-type]
66 |   [(set (attr "length") (const_int 4))])
   | ^

PR target/109713

gcc/ChangeLog:

* config/riscv/sync.md: Add gcc_unreachable to a switch.
---
 gcc/config/riscv/sync.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 6e7c762ac57..9fc626267de 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -62,6 +62,8 @@
return "fence\tr,rw";
 else if (model == MEMMODEL_RELEASE)
return "fence\trw,w";
+else
+   gcc_unreachable ();
   }
   [(set (attr "length") (const_int 4))])
 
-- 
2.34.1

[gcc13 backport 10/12] RISC-V: Weaken atomic loads

This change brings atomic loads in line with table A.6 of the ISA
manual.
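
As a quick reference, the mapping implemented below can be sketched in
plain C as follows.  This is an illustration only: the function name is
made up, "lw" stands in for the mode-dependent load, and C11 memory_order
values are used instead of GCC's enum memmodel.

```c
#include <stdatomic.h>

/* Return the instruction sequence selected for a word-sized atomic load
   with memory order MODEL.  Only seq_cst and acquire add fences; every
   other model gets a plain load.  */
const char *
sketch_atomic_load_seq (memory_order model)
{
  if (model == memory_order_seq_cst)
    return "fence rw,rw; lw; fence r,rw";
  if (model == memory_order_acquire)
    return "lw; fence r,rw";
  return "lw";
}
```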

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (atomic_load): Implement atomic
load mapping.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index ba132d8a1ce..6e7c762ac57 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -26,6 +26,7 @@
   UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
   UNSPEC_SYNC_EXCHANGE_SUBWORD
+  UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -66,8 +67,31 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with conservative fences.  Fall back to fences for
-;; atomic loads.
+(define_insn "atomic_load"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(unspec_volatile:GPR
+  [(match_operand:GPR 1 "memory_operand" "A")
+   (match_operand:SI 2 "const_int_operand")]  ;; model
+  UNSPEC_ATOMIC_LOAD))]
+  "TARGET_ATOMIC"
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+model = memmodel_base (model);
+
+if (model == MEMMODEL_SEQ_CST)
+  return "fence\trw,rw\;"
+"l\t%0,%1\;"
+"fence\tr,rw";
+if (model == MEMMODEL_ACQUIRE)
+  return "l\t%0,%1\;"
+"fence\tr,rw";
+else
+  return "l\t%0,%1";
+  }
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 12))])
+
+;; Implement atomic stores with conservative fences.
 ;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store"
   [(set (match_operand:GPR 0 "memory_operand" "=A")
-- 
2.34.1

[gcc13 backport 11/12] RISC-V: Table A.6 conformance tests

These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.

2023-04-27 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
 .../gcc.target/riscv/amo-table-a-6-amo-add-1.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-2.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-3.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-4.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-5.c | 15 +++
 .../riscv/amo-table-a-6-compare-exchange-1.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-2.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-3.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-4.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-5.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-6.c   | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-7.c   |  9 +
 .../gcc.target/riscv/amo-table-a-6-fence-1.c   | 14 ++
 .../gcc.target/riscv/amo-table-a-6-fence-2.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-load-1.c| 16 
 .../gcc.target/riscv/amo-table-a-6-load-2.c| 17 +
 .../gcc.target/riscv/amo-table-a-6-load-3.c| 18 ++
 .../gcc.target/riscv/amo-table-a-6-store-1.c   | 16 
 .../gcc.target/riscv/amo-table-a-6-store-2.c   | 17 +
 .../riscv/amo-table-a-6-store-compat-3.c   | 18 ++
 .../riscv/amo-table-a-6-subword-amo-add-1.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-2.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-3.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-4.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-5.c|  9 +
 28 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-6.c
 create mode 100644

[gcc13 backport 07/12] RISC-V: Eliminate AMO op fences

Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.

This change brings AMO ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_memmodel_needs_amo_release): Change function name.
(riscv_print_operand): Remove unneeded %F case.
* config/riscv/sync.md: Remove unneeded fences.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 16 +---
 gcc/config/riscv/sync.md  | 12 ++--
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index df55c427b1b..951f6b5cf42 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4307,11 +4307,11 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
 }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
+/* Return true if the .RL suffix should be added to an AMO to implement the
+   release portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_release_fence (enum memmodel model)
+riscv_memmodel_needs_amo_release (enum memmodel model)
 {
   switch (model)
 {
@@ -4337,7 +4337,6 @@ riscv_memmodel_needs_release_fence (enum memmodel model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
-   'F' Print a FENCE if the memory model requires a release.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -4499,19 +4498,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 
 case 'A':
   if (riscv_memmodel_needs_amo_acquire (model)
- && riscv_memmodel_needs_release_fence (model))
+ && riscv_memmodel_needs_amo_release (model))
fputs (".aqrl", file);
   else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
-  else if (riscv_memmodel_needs_release_fence (model))
+  else if (riscv_memmodel_needs_amo_release (model))
fputs (".rl", file);
   break;
 
-case 'F':
-  if (riscv_memmodel_needs_release_fence (model))
-   fputs ("fence iorw,ow; ", file);
-  break;
-
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 1acb78a9ae4..9a3b57bd09f 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -91,9 +91,9 @@
   (match_operand:SI 2 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F2amo.%A2 zero,%z1,%0"
+  "amo.%A2\tzero,%z1,%0"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=")
@@ -105,9 +105,9 @@
   (match_operand:SI 3 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F3amo.%A3 %0,%z2,%1"
+  "amo.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "subword_atomic_fetch_strong_"
   [(set (match_operand:SI 0 "register_operand" "=") ;; old value 
at mem
@@ -247,9 +247,9 @@
(set (match_dup 1)
(match_operand:GPR 2 "register_operand" "0"))]
   "TARGET_ATOMIC"
-  "%F3amoswap.%A3 %0,%z2,%1"
+  "amoswap.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_expand "atomic_exchange"
   [(match_operand:SHORT 0 "register_operand") ;; old value at mem
-- 
2.34.1

[gcc13 backport 05/12] RISC-V: Add AMO release bits

This patch sets the relevant .rl bits on amo operations.
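
A minimal sketch of the resulting %A suffix selection, in standalone C.
The helper names are hypothetical, and the sets of models treated as
acquire and release are an assumption of this sketch, since the bodies of
the riscv.cc predicates are not part of this diff:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed acquire set: consume, acquire, acq_rel, seq_cst.  */
bool
sketch_needs_acquire (memory_order m)
{
  return m == memory_order_consume || m == memory_order_acquire
         || m == memory_order_acq_rel || m == memory_order_seq_cst;
}

/* Assumed release set: release, acq_rel, seq_cst.  */
bool
sketch_needs_release (memory_order m)
{
  return m == memory_order_release || m == memory_order_acq_rel
         || m == memory_order_seq_cst;
}

/* Pick the AMO suffix the same way the new 'A' case does: both bits,
   acquire only, release only, or nothing.  */
const char *
sketch_amo_suffix (memory_order m)
{
  if (sketch_needs_acquire (m) && sketch_needs_release (m))
    return ".aqrl";
  else if (sketch_needs_acquire (m))
    return ".aq";
  else if (sketch_needs_release (m))
    return ".rl";
  return "";
}
```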

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Change behavior
of %A to include release bits.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 11b897aca5c..df55c427b1b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4498,8 +4498,13 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire (model))
+  if (riscv_memmodel_needs_amo_acquire (model)
+ && riscv_memmodel_needs_release_fence (model))
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
+  else if (riscv_memmodel_needs_release_fence (model))
+   fputs (".rl", file);
   break;
 
 case 'F':
-- 
2.34.1

[gcc13 backport 08/12] RISC-V: Weaken LR/SC pairs

Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.

Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.
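
The combination rule can be sketched in standalone C with C11 memory_order
values.  The function name is made up, and the sketch assumes the
enumerators compare in the order relaxed < consume < acquire < release <
acq_rel < seq_cst, which holds for GCC's __ATOMIC_* values:

```c
#include <stdatomic.h>

/* Return a memory order at least as strong as both M1 and M2.  The only
   case where neither input is sufficient is release combined with
   acquire or consume, which needs acq_rel.  */
memory_order
sketch_union_memmodels (memory_order m1, memory_order m2)
{
  memory_order weaker = m1 <= m2 ? m1 : m2;
  memory_order stronger = m1 > m2 ? m1 : m2;

  if (stronger == memory_order_release
      && (weaker == memory_order_acquire || weaker == memory_order_consume))
    return memory_order_acq_rel;
  return stronger;
}
```

For example, a compare-exchange with a release success model and an
acquire failure model is treated as acq_rel.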

This change brings LR/SC ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_union_memmodels): Expose
riscv_union_memmodels function to sync.md.
* config/riscv/riscv.cc (riscv_union_memmodels): Add function to
get the union of two memmodels in sync.md.
(riscv_print_operand): Add %I and %J flags that output the
optimal LR/SC flag bits for a given memory model.
* config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl
bits on SC op and replace with optimized %I, %J flags.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv-protos.h |   3 +
 gcc/config/riscv/riscv.cc   |  44 
 gcc/config/riscv/sync.md| 114 +++-
 3 files changed, 114 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 02b33e02020..b5616fb3e88 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_RISCV_PROTOS_H
 #define GCC_RISCV_PROTOS_H
 
+#include "memmodel.h"
+
 /* Symbol types we understand.  The order of this list must match that of
the unspec enum in riscv.md, subsequent to UNSPEC_ADDRESS_FIRST.  */
 enum riscv_symbol_type {
@@ -81,6 +83,7 @@ extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
+extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 951f6b5cf42..59899268918 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4284,6 +4284,36 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool 
hi_reloc)
   fputc (')', file);
 }
 
+/* Return the memory model that encapuslates both given models.  */
+
+enum memmodel
+riscv_union_memmodels (enum memmodel model1, enum memmodel model2)
+{
+  model1 = memmodel_base (model1);
+  model2 = memmodel_base (model2);
+
+  enum memmodel weaker = model1 <= model2 ? model1: model2;
+  enum memmodel stronger = model1 > model2 ? model1: model2;
+
+  switch (stronger)
+{
+  case MEMMODEL_SEQ_CST:
+  case MEMMODEL_ACQ_REL:
+   return stronger;
+  case MEMMODEL_RELEASE:
+   if (weaker == MEMMODEL_ACQUIRE || weaker == MEMMODEL_CONSUME)
+ return MEMMODEL_ACQ_REL;
+   else
+ return stronger;
+  case MEMMODEL_ACQUIRE:
+  case MEMMODEL_CONSUME:
+  case MEMMODEL_RELAXED:
+   return stronger;
+  default:
+   gcc_unreachable ();
+}
+}
+
 /* Return true if the .AQ suffix should be added to an AMO to implement the
acquire portion of memory model MODEL.  */
 
@@ -4337,6 +4367,8 @@ riscv_memmodel_needs_amo_release (enum memmodel model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
+   'I' Print the LR suffix for memory model OP.
+   'J' Print the SC suffix for memory model OP.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -4506,6 +4538,18 @@ riscv_print_operand (FILE *file, rtx op, int letter)
fputs (".rl", file);
   break;
 
+case 'I':
+  if (model == MEMMODEL_SEQ_CST)
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
+   fputs (".aq", file);
+  break;
+
+case 'J':
+  if (riscv_memmodel_needs_amo_release (model))
+   fputs (".rl", file);
+  break;
+
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 9a3b57bd09f..3e6345e83a3 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -116,21 +116,22 @@
(unspec_volatile:SI
  [(any_atomic:SI (match_dup 1)
 (match_operand:SI 2 "register_operand" "rI")) ;; value for 
op
-  (match_operand:SI 3 "register_operand" "rI")]   ;; mask
+  (match_operand:SI 3 "const_int_operand")]   ;; model
 UNSPEC_SYNC_OLD_OP_SUBWORD))
-(match_operand:SI 4 "register_operand" "rI")  ;; not_mask
-

[gcc13 backport 09/12] RISC-V: Weaken mem_thread_fence

This change brings atomic fences in line with table A.6 of the ISA
manual.

Relax mem_thread_fence according to the memmodel given.
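
The selection below amounts to the following lookup, sketched here in
standalone C with C11 memory_order values.  The function name is
hypothetical, and returning NULL for the remaining models is an assumption
of the sketch rather than something this patch does:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Return the fence emitted for a thread fence with memory order MODEL.  */
const char *
sketch_thread_fence (memory_order model)
{
  switch (model)
    {
    case memory_order_seq_cst:
      return "fence\trw,rw";
    case memory_order_acq_rel:
      return "fence.tso";
    case memory_order_acquire:
      return "fence\tr,rw";
    case memory_order_release:
      return "fence\trw,w";
    default:
      /* Sketch assumption: no fence string for other models.  */
      return NULL;
    }
}
```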

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (mem_thread_fence_1): Change fence
depending on the given memory model.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 3e6345e83a3..ba132d8a1ce 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -45,14 +45,24 @@
   DONE;
 })
 
-;; Until the RISC-V memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
 (define_insn "mem_thread_fence_1"
   [(set (match_operand:BLK 0 "" "")
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
(match_operand:SI 1 "const_int_operand" "")] ;; model
   ""
-  "fence\tiorw,iorw")
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[1]);
+model = memmodel_base (model);
+if (model == MEMMODEL_SEQ_CST)
+   return "fence\trw,rw";
+else if (model == MEMMODEL_ACQ_REL)
+   return "fence.tso";
+else if (model == MEMMODEL_ACQUIRE)
+   return "fence\tr,rw";
+else if (model == MEMMODEL_RELEASE)
+   return "fence\trw,w";
+  }
+  [(set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
-- 
2.34.1

[gcc13 backport 06/12] RISC-V: Strengthen atomic stores

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.

2023-04-27 Patrick O'Neill 

PR target/89835

gcc/ChangeLog:

* config/riscv/sync.md (atomic_store): Use simple store
instruction in combination with fence(s).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr89835.c: New test.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 21 ++---
 gcc/testsuite/gcc.target/riscv/pr89835.c |  9 +
 2 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr89835.c

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 5620d6ffa58..1acb78a9ae4 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -56,7 +56,9 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with amoswap.  Fall back to fences for atomic loads.
+;; Implement atomic stores with conservative fences.  Fall back to fences for
+;; atomic loads.
+;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store"
   [(set (match_operand:GPR 0 "memory_operand" "=A")
 (unspec_volatile:GPR
@@ -64,9 +66,22 @@
(match_operand:SI 2 "const_int_operand")]  ;; model
   UNSPEC_ATOMIC_STORE))]
   "TARGET_ATOMIC"
-  "%F2amoswap.%A2 zero,%z1,%0"
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+model = memmodel_base (model);
+
+if (model == MEMMODEL_SEQ_CST)
+  return "fence\trw,w\;"
+"s\t%z1,%0\;"
+"fence\trw,rw";
+if (model == MEMMODEL_RELEASE)
+  return "fence\trw,w\;"
+"s\t%z1,%0";
+else
+  return "s\t%z1,%0";
+  }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 12))])
 
 (define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
diff --git a/gcc/testsuite/gcc.target/riscv/pr89835.c 
b/gcc/testsuite/gcc.target/riscv/pr89835.c
new file mode 100644
index 000..ab190e11b60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr89835.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* Verify that relaxed atomic stores use simple store instuctions.  */
+/* { dg-final { scan-assembler-not "amoswap" } } */
+
+void
+foo(int bar, int baz)
+{
+  __atomic_store_n(&bar, baz, __ATOMIC_RELAXED);
+}
-- 
2.34.1

[gcc13 backport 02/12] RISC-V: Enforce Libatomic LR/SC SEQ_CST

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

libgcc/ChangeLog:

* config/riscv/atomic.c: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill 
---
 libgcc/config/riscv/atomic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 573d163ea04..bd2b033132b 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -41,7 +41,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1, tmp2;  \
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  #insn " %[tmp1], %[old], %[value]\n\t"\
  invert\
  "and %[tmp1], %[tmp1], %[mask]\n\t"   \
@@ -75,7 +75,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 unsigned old, tmp1;
\
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  "and %[tmp1], %[old], %[mask]\n\t"\
  "bne %[tmp1], %[o], 1f\n\t"   \
  "and %[tmp1], %[old], %[not_mask]\n\t"\
-- 
2.34.1

[gcc13 backport 04/12] RISC-V: Enforce atomic compare_exchange SEQ_CST

This patch enforces SEQ_CST for atomic compare_exchange ops.

Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.
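
As an illustration (not part of the patch), here is a source-level
operation of the kind this mapping serves.  cas_seq_cst is a
hypothetical wrapper name.  On a RISC-V target with the A extension the
patched pattern would be expected to expand this into an
lr.w.aqrl/sc.w.rl retry loop; on other targets the builtin simply uses
that target's own sequence.

```c
#include <stdbool.h>

/* Strong compare-and-swap with SEQ_CST ordering for both the success
   and the failure case.  Returns true when *ptr held EXPECTED and was
   replaced by DESIRED.  */
bool
cas_seq_cst (int *ptr, int expected, int desired)
{
  return __atomic_compare_exchange_n (ptr, &expected, desired,
                                      false /* strong, not weak */,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```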

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (atomic_cas_value_strong): Change
FENCE/LR.aq/SC.aq into sequentially consistent LR.aqrl/SC.rl
pair.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 0c83ef04607..5620d6ffa58 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -297,9 +297,16 @@
 UNSPEC_COMPARE_AND_SWAP))
(clobber (match_scratch:GPR 6 "="))]
   "TARGET_ATOMIC"
-  "%F5 1: lr.%A5 %0,%1; bne %0,%z2,1f; sc.%A4 %6,%z3,%1; bnez %6,1b; 
1:"
+  {
+return "1:\;"
+  "lr..aqrl\t%0,%1\;"
+  "bne\t%0,%z2,1f\;"
+  "sc..rl\t%6,%z3,%1\;"
+  "bnez\t%6,1b\;"
+  "1:";
+  }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 20))])
+   (set (attr "length") (const_int 16))])
 
 (define_expand "atomic_compare_and_swap"
   [(match_operand:SI 0 "register_operand" "")   ;; bool output
-- 
2.34.1

[gcc13 backport 03/12] RISC-V: Enforce subword atomic LR/SC SEQ_CST

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 19274528262..0c83ef04607 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -109,7 +109,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "\t%5, %0, %2\;"
   "and\t%5, %5, %3\;"
   "and\t%6, %0, %4\;"
@@ -173,7 +173,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "and\t%5, %0, %2\;"
   "not\t%5, %5\;"
   "and\t%5, %5, %3\;"
@@ -278,7 +278,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "and\t%4, %0, %3\;"
   "or\t%4, %4, %2\;"
   "sc.w.rl\t%4, %4, %1\;"
@@ -443,7 +443,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "and\t%6, %0, %4\;"
   "bne\t%6, %z2, 1f\;"
   "and\t%6, %0, %5\;"
-- 
2.34.1

[gcc13 backport 01/12] RISC-V: Eliminate SYNC memory models

Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().
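
A minimal sketch of the idea, for illustration only.  The sketch_ and
SK_ names are hypothetical stand-ins, and the bit position of the SYNC
marker is an assumption about the encoding rather than a quote of the
GCC headers: the legacy __sync models are a base model with a marker
bit set, so masking the marker off leaves only the six base models and
the SYNC cases in each switch become unnecessary.

```c
/* Hypothetical stand-ins for the memory-model encoding.  */
enum sketch_memmodel
{
  SK_RELAXED = 0,
  SK_CONSUME = 1,
  SK_ACQUIRE = 2,
  SK_RELEASE = 3,
  SK_ACQ_REL = 4,
  SK_SEQ_CST = 5,
  SK_SYNC = 1 << 15,            /* assumed marker for __sync models */
  SK_SYNC_ACQUIRE = SK_ACQUIRE | SK_SYNC,
  SK_SYNC_RELEASE = SK_RELEASE | SK_SYNC,
  SK_SYNC_SEQ_CST = SK_SEQ_CST | SK_SYNC
};

/* Strip the SYNC marker so callers only ever see a base model.  */
enum sketch_memmodel
sketch_memmodel_base (unsigned long val)
{
  return (enum sketch_memmodel) (val & (SK_SYNC - 1));
}

/* After normalization the switch needs no SYNC cases.  */
int
sketch_needs_amo_acquire (unsigned long model)
{
  switch (sketch_memmodel_base (model))
    {
    case SK_ACQ_REL:
    case SK_SEQ_CST:
    case SK_ACQUIRE:
    case SK_CONSUME:
      return 1;
    default:
      return 0;
    }
}
```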

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and
sanitize memmodel input with memmodel_base.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9c626904e89..11b897aca5c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4294,14 +4294,11 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
return true;
 
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -4320,14 +4317,11 @@ riscv_memmodel_needs_release_fence (enum memmodel model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
return true;
 
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -4366,6 +4360,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 }
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
+  const enum memmodel model = memmodel_base (INTVAL (op));
 
   switch (letter)
 {
@@ -4503,12 +4498,12 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
   break;
 
 case 'F':
-  if (riscv_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_release_fence (model))
fputs ("fence iorw,ow; ", file);
   break;
 
-- 
2.34.1

[gcc13 backport 00/12] RISC-V: Implement ISA Manual Table A.6 Mappings

Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
Jeff Law.
If there aren't any objections I'll commit this cherry-picked series
on Thursday (July 27th).

Patchset on trunk:
https://inbox.sourceware.org/gcc-patches/20230427162301.1151333-1-patr...@rivosinc.com/
First commit: f37a36bce81b50a43ec1613c1d08d803642f7506

Also includes bugfix from:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109713
commit: 4bd434fbfc7865961a8e10d7e9601b28765ce7be

[1] 
https://inbox.sourceware.org/gcc/mhng-b7423fca-67ec-4ce4-9694-4e062632ceb0@palmer-ri-x1c9/T/#t

Martin Liska (1):
  riscv: fix error: control reaches end of non-void function

Patrick O'Neill (11):
  RISC-V: Eliminate SYNC memory models
  RISC-V: Enforce Libatomic LR/SC SEQ_CST
  RISC-V: Enforce subword atomic LR/SC SEQ_CST
  RISC-V: Enforce atomic compare_exchange SEQ_CST
  RISC-V: Add AMO release bits
  RISC-V: Strengthen atomic stores
  RISC-V: Eliminate AMO op fences
  RISC-V: Weaken LR/SC pairs
  RISC-V: Weaken mem_thread_fence
  RISC-V: Weaken atomic loads
  RISC-V: Table A.6 conformance tests

 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv.cc |  66 --
 gcc/config/riscv/sync.md  | 196 --
 .../riscv/amo-table-a-6-amo-add-1.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-2.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-3.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-4.c   |  15 ++
 .../riscv/amo-table-a-6-amo-add-5.c   |  15 ++
 .../riscv/amo-table-a-6-compare-exchange-1.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-2.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-3.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-4.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-5.c  |   9 +
 .../riscv/amo-table-a-6-compare-exchange-6.c  |  10 +
 .../riscv/amo-table-a-6-compare-exchange-7.c  |   9 +
 .../gcc.target/riscv/amo-table-a-6-fence-1.c  |  14 ++
 .../gcc.target/riscv/amo-table-a-6-fence-2.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c  |  15 ++
 .../gcc.target/riscv/amo-table-a-6-load-1.c   |  16 ++
 .../gcc.target/riscv/amo-table-a-6-load-2.c   |  17 ++
 .../gcc.target/riscv/amo-table-a-6-load-3.c   |  18 ++
 .../gcc.target/riscv/amo-table-a-6-store-1.c  |  16 ++
 .../gcc.target/riscv/amo-table-a-6-store-2.c  |  17 ++
 .../riscv/amo-table-a-6-store-compat-3.c  |  18 ++
 .../riscv/amo-table-a-6-subword-amo-add-1.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-2.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-3.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-4.c   |   9 +
 .../riscv/amo-table-a-6-subword-amo-add-5.c   |   9 +
 gcc/testsuite/gcc.target/riscv/pr89835.c  |   9 +
 libgcc/config/riscv/atomic.c  |   4 +-
 33 files changed, 563 insertions(+), 75 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-compat-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c
 create mode

Re: [PATCH v3] c++: fix ICE with constexpr ARRAY_REF [PR110382]


On 7/25/23 12:59, Marek Polacek wrote:

On Tue, Jul 25, 2023 at 11:15:07AM -0400, Jason Merrill wrote:

On 7/24/23 18:37, Marek Polacek wrote:

On Sat, Jul 22, 2023 at 12:28:59AM -0400, Jason Merrill wrote:

On 7/21/23 18:38, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --

This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

 gcc_assert (!ctx->object || !DECL_P (ctx->object)
 || ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

 new_ctx.ctor = build_constructor (elem_type, NULL); // #1


...and this code doesn't also clear(/set) new_ctx.object like everywhere
else in constexpr.cc that sets new_ctx.ctor.  Fixing that should make the
testcase work.


Right, but then we'd be back pre-r12-2304 or r13-5693...

...except it should work to always clear the object, like below.

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

 new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.


Indeed, and using it rather than building a new one seems like a valid
optimization for trunk.

Agreed, I kept it.


I also notice that the DECL_EXPR code calls unshare_constructor, which
should be unnecessary if init == ctx->ctor?


It looks like init == ctx->ctor happens only with this new testcase.
I'm not sure it's worth adding code for such a rare case?

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
gcc/cp/constexpr.cc   |  5 -
gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
2 files changed, 21 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fb94f3cefcb..518b7c7a2d5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4291,7 +4291,10 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
  else
val = build_value_init (elem_type, tf_warning_or_error);
-  if (!SCALAR_TYPE_P (elem_type))
+  if (!SCALAR_TYPE_P (elem_type)
+  /* Create a new constructor only if we don't already have one that
+is suitable.  */
+  && !(ctx->ctor && same_type_p (elem_type, TREE_TYPE (ctx->ctor


We generally use same_type_ignoring_top_level_qualifiers_p in the constexpr
code.


True, changed.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

For 13, I guess I should only clear the object and leave out the
same_type_ bit.

-- >8 --
This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

gcc_assert (!ctx->object || !DECL_P (ctx->object)
|| ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

new_ctx.ctor = build_constructor (elem_type, NULL); // #1

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

We can fix the problem by always clearing the object, and, as an
optimization, only create/free a new ctor when actually needed.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.  Clear the object
when the type is non-scalar.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
   gcc/cp/constexpr.cc   | 17 +++--
   gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
   2 files changed, 32 insertions(+), 2 deletions(-)

Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-07-25 Thread Andreas Schwab

On Jul 19 2023, Xiao Zeng wrote:

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38d8eb2fcf5..7e6b24bd232 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2448,6 +2448,17 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
> outer_code, int opno ATTRIBUTE_UN
> *total = COSTS_N_INSNS (1);
> return true;
>   }
> +  else if (TARGET_ZICOND && outer_code == SET &&
> +   ((GET_CODE (XEXP (x, 1)) == REG && XEXP (x, 2) == const0_rtx) 
> ||
> +   (GET_CODE (XEXP (x, 2)) == REG && XEXP (x, 1) == const0_rtx) 
> ||
> +   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
> +XEXP (x, 1) == XEXP (XEXP (x, 0), 0)) ||
> +   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
> +XEXP (x, 2) == XEXP (XEXP (x, 0), 0

Line breaks before the operator, not after.
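
A small stand-alone illustration of that layout rule, as the GNU coding
style states it: when an expression is split across lines, each
continuation line starts with the operator.  The function and its
parameter names are hypothetical and only exist to show the layout.

```c
#include <stdbool.h>

/* Hypothetical predicate; only the line-breaking style matters here.  */
bool
arms_match (bool then_is_reg, bool else_is_zero,
            bool then_is_zero, bool else_is_reg)
{
  /* Each continuation line begins with the operator.  */
  return ((then_is_reg && else_is_zero)
          || (then_is_zero && else_is_reg));
}
```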

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH 0/5] Recognize Zicond extension

On 7/19/23 04:11, Xiao Zeng wrote:

Hi all RISC-V folks:

This series of patches completes support for the riscv architecture's
Zicond standard extension instruction set.

Currently, Zicond is in a frozen state.

See the Zicond specification for details:
https://github.com/riscv/riscv-zicond/releases/download/v1.0-rc2/riscv-zicond-v1.0-rc2.pdf

Prior to this, other community members have also done related work, as shown in:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611767.html
https://sourceware.org/pipermail/binutils/2023-January/125773.html

Xiao Zeng (5):
[RISC-V] Recognize Zicond extension
[RISC-V] Generate Zicond instruction for basic semantics
[RISC-V] Generate Zicond instruction for select pattern with condition
eq or neq to 0
[RISC-V] Generate Zicond instruction for select pattern with condition
eq or neq to non-zero
[RISC-V] Generate Zicond instruction for conditional execution

[ ... ]
So what I'm thinking for the overall kit is to stage it in a bit
differently given we have some bits which clearly can go forward as-is
or with very minor changes and others that are going to need some
iteration/refinement.

So I'm going to suggest a few changes so that bits which are non
controversial can move forward immediately.

1/5 looked fine as-is.

I would split 2/5. The first two patterns you added are
non-controversial and could go in immediately. The other 4 patterns
(which require some operand matching) will likely need at least one
round of iteration and should be a distinct patch.

I would split 3/5 as well. 3a would be the costing which I think just
needs to use COSTS_N_INSNS (1) rather than 0 for the cost of a
conditional move and could then move forward immediately. The bits to
wire everything up into the conditional move pattern would be a distinct
patch. We did something similar internally in Ventana and I'd like to
take the time to make sure the issues we ran into are addressed in your
version then do an evaluation of the two approaches.

I think patch 4 is probably going to need some work too. I *think* what
we did internally at Ventana will work better (utilizing scc for a
non-trivial condition).

Let's defer patch #5 initially as well. It's going to get tangled up in
a whole bunch of changes I think we need to make to ifcvt.cc.

The point being that with the bits from #1, #2 and #3 we can get some
initial support in immediately. eswincomputing and ventana can both
reduce our divergence from the trunk and work together on the rest of
the bits.

Does that work for you?

jeff

Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

On 7/19/23 04:11, Xiao Zeng wrote:

This patch recognizes Zicond for select patterns whose condition is
eq or ne against 0 (using equality as an example), namely:

1 rd = (rs2 == 0) ? non-imm : 0
2 rd = (rs2 == 0) ? non-imm : non-imm
3 rd = (rs2 == 0) ? reg : non-imm
4 rd = (rs2 == 0) ? reg : reg
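
For illustration, one common way to build such a select from the two
Zicond primitives is two conditional-zero operations followed by an OR.
The sketch below models that in plain C.  The function names are
hypothetical, and this is not necessarily the exact sequence the patch
emits for each of the four cases.

```c
#include <stdint.h>

/* Reference model of czero.eqz: result is 0 when cond == 0.  */
static uint64_t
model_czero_eqz (uint64_t val, uint64_t cond)
{
  return cond == 0 ? 0 : val;
}

/* Reference model of czero.nez: result is 0 when cond != 0.  */
static uint64_t
model_czero_nez (uint64_t val, uint64_t cond)
{
  return cond != 0 ? 0 : val;
}

/* rd = (cond == 0) ? a : b, built without a branch.  Exactly one of
   the two conditional-zero results is nonzero-capable, so OR merges
   them.  */
uint64_t
select_if_zero (uint64_t cond, uint64_t a, uint64_t b)
{
  uint64_t keep_a = model_czero_nez (a, cond); /* a when cond == 0 */
  uint64_t keep_b = model_czero_eqz (b, cond); /* b when cond != 0 */
  return keep_a | keep_b;
}
```

When one arm is the constant 0 (case 1 above), the OR and one of the
conditional-zero operations drop out, leaving a single instruction.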

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_rtx_costs): IF_THEN_ELSE costs in Zicond.
 (riscv_expand_conditional_move): Recognize Zicond.
 * config/riscv/riscv.md: Zicond patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New test.
---
  gcc/config/riscv/riscv.cc | 125 ++
  gcc/config/riscv/riscv.md |   2 +-
  .../zicond-primitiveSemantics_return_0_imm.c  |  65 +
  ...zicond-primitiveSemantics_return_imm_imm.c |  73 ++
  ...zicond-primitiveSemantics_return_imm_reg.c |  65 +
  ...zicond-primitiveSemantics_return_reg_reg.c |  65 +
  6 files changed, 394 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..7e6b24bd232 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2448,6 +2448,17 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
+  else if (TARGET_ZICOND && outer_code == SET &&
+   ((GET_CODE (XEXP (x, 1)) == REG && XEXP (x, 2) == const0_rtx) ||
+   (GET_CODE (XEXP (x, 2)) == REG && XEXP (x, 1) == const0_rtx) ||
+   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
+XEXP (x, 1) == XEXP (XEXP (x, 0), 0)) ||
+   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
+XEXP (x, 2) == XEXP (XEXP (x, 0), 0
+{
+  *total = 0;
+  return true;
+}

So why *total = 0?  I would have expected *total = COSTS_N_INSNS (1).
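
To make the concern concrete, here is a tiny sketch.  It assumes the
usual scaling in which one simple instruction costs 4 units;
SKETCH_COSTS_N_INSNS and cond_move_cost are hypothetical names for this
illustration, not GCC's definitions.  With a cost of 0 the conditional
move looks cheaper than any single real instruction, which can skew
later cost comparisons.

```c
/* Assumed scaling: one simple instruction costs 4 units.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical cost query for a conditional move that one Zicond
   instruction implements.  FREE_P models the patch as posted.  */
int
cond_move_cost (int free_p)
{
  return free_p ? 0 : SKETCH_COSTS_N_INSNS (1);
}
```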


I'm not entirely sure the changes to riscv_expand_conditional_move are 
desirable -- these are likely better done in the generic if-conversion 
pass.




diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6b8c2e8e268..b4147c7a79c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2484,7 +2484,7 @@
(if_then_else:GPR (match_operand 1 "comparison_operator")
  (match_operand:GPR 2 "reg_or_0_operand")
  (match_operand:GPR 3 "sfb_alu_operand")))]
-  "TARGET_SFB_ALU || TARGET_XTHEADCONDMOV"
+  "TARGET_SFB_ALU || TARGET_XTHEADCONDMOV || TARGET_ZICOND"
  {
if (riscv_expand_conditional_move (operands[0], operands[1],
 operands[2], operands[3]))

We had to do more than just slap on a TARGET_ZICOND.  I'm a bit 
surprised this worked as-is.  Though we also have bits to handle 
conditions other than eq/ne by first emitting an sCC style insn which 
might be adding complication or cases you hadn't encountered.



Jeff

[PATCH v3] c++: fix ICE with constexpr ARRAY_REF [PR110382]

On Tue, Jul 25, 2023 at 11:15:07AM -0400, Jason Merrill wrote:
> On 7/24/23 18:37, Marek Polacek wrote:
> > On Sat, Jul 22, 2023 at 12:28:59AM -0400, Jason Merrill wrote:
> > > On 7/21/23 18:38, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?
> > > > 
> > > > -- >8 --
> > > > 
> > > > This code in cxx_eval_array_reference has been hard to get right.
> > > > In r12-2304 I added some code; in r13-5693 I removed some of it.
> > > > 
> > > > Here the problematic line is "S s = arr[0];" which causes a crash
> > > > on the assert in verify_ctor_sanity:
> > > > 
> > > > gcc_assert (!ctx->object || !DECL_P (ctx->object)
> > > > || ctx->global->get_value (ctx->object) == ctx->ctor);
> > > > 
> > > > ctx->object is the VAR_DECL 's', which is correct here.  The second
> > > > line points to the problem: we replaced ctx->ctor in
> > > > cxx_eval_array_reference:
> > > > 
> > > > new_ctx.ctor = build_constructor (elem_type, NULL); // #1
> > > 
> > > ...and this code doesn't also clear(/set) new_ctx.object like everywhere
> > > else in constexpr.cc that sets new_ctx.ctor.  Fixing that should make the
> > > testcase work.
> > 
> > Right, but then we'd be back pre-r12-2304 or r13-5693...
> > 
> > ...except it should work to always clear the object, like below.
> > > > which I think we shouldn't have; the CONSTRUCTOR we created in
> > > > cxx_eval_constant_expression/DECL_EXPR
> > > > 
> > > > new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);
> > > > 
> > > > had the right type.
> > > 
> > > Indeed, and using it rather than building a new one seems like a valid
> > > optimization for trunk.
> > Agreed, I kept it.
> > 
> > > I also notice that the DECL_EXPR code calls unshare_constructor, which
> > > should be unnecessary if init == ctx->ctor?
> > 
> > It looks like init == ctx->ctor only happens only with this new testcase.
> > I'm not sure it's worth it adding code for such a rare case?
> > > > We still need #1 though.  E.g., in constexpr-96241.C, we never
> > > > set ctx.ctor/object before calling cxx_eval_array_reference, so
> > > > we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
> > > > we have a ctx.ctor, but it has the wrong type, so we need a new one.
> > > > 
> > > > PR c++/110382
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * constexpr.cc (cxx_eval_array_reference): Create a new 
> > > > constructor
> > > > only when we don't already have a matching one.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp1y/constexpr-110382.C: New test.
> > > > ---
> > > >gcc/cp/constexpr.cc   |  5 -
> > > >gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
> > > >2 files changed, 21 insertions(+), 1 deletion(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C
> > > > 
> > > > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > > > index fb94f3cefcb..518b7c7a2d5 100644
> > > > --- a/gcc/cp/constexpr.cc
> > > > +++ b/gcc/cp/constexpr.cc
> > > > @@ -4291,7 +4291,10 @@ cxx_eval_array_reference (const constexpr_ctx 
> > > > *ctx, tree t,
> > > >  else
> > > >val = build_value_init (elem_type, tf_warning_or_error);
> > > > -  if (!SCALAR_TYPE_P (elem_type))
> > > > +  if (!SCALAR_TYPE_P (elem_type)
> > > > +  /* Create a new constructor only if we don't already have one 
> > > > that
> > > > +is suitable.  */
> > > > +  && !(ctx->ctor && same_type_p (elem_type, TREE_TYPE 
> > > > (ctx->ctor
> > > 
> > > We generally use same_type_ignoring_top_level_qualifiers_p in the 
> > > constexpr
> > > code.
> > 
> > True, changed.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > For 13, I guess I should only clear the object and leave out the
> > same_type_ bit.
> > 
> > -- >8 --
> > This code in cxx_eval_array_reference has been hard to get right.
> > In r12-2304 I added some code; in r13-5693 I removed some of it.
> > 
> > Here the problematic line is "S s = arr[0];" which causes a crash
> > on the assert in verify_ctor_sanity:
> > 
> >gcc_assert (!ctx->object || !DECL_P (ctx->object)
> >|| ctx->global->get_value (ctx->object) == ctx->ctor);
> > 
> > ctx->object is the VAR_DECL 's', which is correct here.  The second
> > line points to the problem: we replaced ctx->ctor in
> > cxx_eval_array_reference:
> > 
> >new_ctx.ctor = build_constructor (elem_type, NULL); // #1
> > 
> > which I think we shouldn't have; the CONSTRUCTOR we created in
> > cxx_eval_constant_expression/DECL_EXPR
> > 
> >new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);
> > 
> > had the right type.
> > 
> > We still need #1 though.  E.g., in constexpr-96241.C, we never
> > set ctx.ctor/object before calling cxx_eval_array_reference, so
> > we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
> > we have a ctx.ctor, but it

Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

On 7/19/23 04:11, Xiao Zeng wrote:

This patch completes the recognition of the basic semantics
defined in the spec, namely:

Conditional zero, if condition is equal to zero
   rd = (rs2 == 0) ? 0 : rs1
Conditional zero, if condition is non zero
   rd = (rs2 != 0) ? 0 : rs1
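
The two primitives can be written down as a plain C reference model,
which is handy when checking expected results by hand.  The function
names are hypothetical; the semantics are exactly the two lines above.

```c
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1  */
uint64_t
czero_eqz (uint64_t rs1, uint64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1  */
uint64_t
czero_nez (uint64_t rs1, uint64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}
```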

gcc/ChangeLog:

* config/riscv/riscv.md: Include zicond.md
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics.c: New test.
---
  gcc/config/riscv/riscv.md |  1 +
  gcc/config/riscv/zicond.md| 84 +++
  .../riscv/zicond-primitiveSemantics.c | 49 +++
  3 files changed, 134 insertions(+)
  create mode 100644 gcc/config/riscv/zicond.md
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d63b584a4c1..6b8c2e8e268 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3317,3 +3317,4 @@
  (include "sifive-7.md")
  (include "thead.md")
  (include "vector.md")
+(include "zicond.md")
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
new file mode 100644
index 000..1cf28589c87
--- /dev/null
+++ b/gcc/config/riscv/zicond.md
@@ -0,0 +1,84 @@
+;; Machine description for the RISC-V Zicond extension
+;; Copyright (C) 2022-23 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_code_iterator eq_or_ne [eq ne])
+(define_code_attr eqz [(eq "nez") (ne "eqz")])
+(define_code_attr nez [(eq "eqz") (ne "nez")])
+
+
+;; Special optimization under eq/ne in primitive semantics
+(define_insn "*czero.eqz..opt1"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "1")
+  (match_operand:GPR 3 "register_operand" "r")))]
+  "TARGET_ZICOND && operands[1] == operands[2]"
+  "czero.eqz\t%0,%3,%1"

Interesting.  We didn't have this pattern internally, though it's 
clever.  I'm curious how often it triggered.


Why did you need the operands[1] == operands[2] condition?  I would 
hazard a guess the idea was to reject cases that weren't going to be 
profitable if LRA/reload needed to insert copies to satisfy the matching 
constraint?


It may have been better to replace operand 2 with (match_dup 1).  If 
that isn't viable, then the right check in the condition would have been

REGNO (operands[1]) == REGNO (operands[2]).

You need to be very careful comparing REG expressions for equality like 
you did.  It probably works in this case, but it's pretty fragile in 
general.  The problem is that while you can compare two pseudos using 
pointer equality, you can't necessarily do that for hard registers.


What happens under the hood is you can have two distinct pseudos which 
get allocated to the same hard reg.  At assignment time we just replace 
the underlying register # without going back and fixing all the REG 
expressions.  Meaning that you can have:



(reg:XX 12) and (reg:XX 12)

Which are at distinct memory locations.  Meaning that while the RTX 
expressions look the same, they will fail the pointer equality check.


So again, it probably works on your example, but I'd rather look for 
ways to bulletproof this better.  The (match_dup) approach is probably 
the most preferred.


Similarly for the other 3 patterns that have pointer equality tests for 
two operands in the insn condition.



Jeff

Re: [PATCH 1/5] [RISC-V] Recognize Zicond extension

On 7/19/23 04:11, Xiao Zeng wrote:

This patch is the minimal support for the Zicond extension, including
the extension name, mask, and target definition.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* config/riscv/riscv-opts.h (MASK_ZICOND): New mask.
(TARGET_ZICOND): New target.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-20.c: New test.
* gcc.target/riscv/attribute-21.c: New test.

This is OK.  Though I don't think we should install until the follow-on 
patches are ready to go.


jeff

New Ukrainian PO file for 'gcc' (version 13.1.0)

2023-07-25 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-13.1.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2023-07-25 Thread Chung-Lin Tang via Gcc-patches

On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
> As we discussed earlier, the work for actually linking this to middle-end
> points-to analysis is a somewhat non-trivial issue. This first patch allows
> the language feature to be used in OpenACC directives first (with no effect 
> for now).
> The middle-end changes are probably going to be a later patch.

This second patch tries to link the readonly modifier to points-to analysis.

There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and its support in the
alias oracle routines in tree-ssa-alias.cc, so basically what this patch does is
try to make the variables holding the array section base pointers have this
flag set.

There is also a new OMP_CLAUSE_MAP_POINTS_TO_READONLY flag, set by front-ends on
the associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
A DECL_POINTS_TO_READONLY flag is also set for VAR_DECLs when creating the tmp
vars carrying these receiver references on the offloaded side. These
eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.

This still doesn't always work as expected in terms of optimization:
struct pointer fields and Fortran arrays (kind of like C structs) need
several accesses to create the pointer access on the receiver/offloaded side,
and SRA appears not to work on these sequences, which gets in the way of much
redundancy elimination.

Currently there is one testcase where we can demonstrate that 'readonly' can
avoid a clobber by a function call. Tested on powerpc64le-linux/nvptx.

Note this patch is created atop the front-end patch.
(will respond to the other front-end patch comments later)

Thanks,
Chung-Lin

2023-07-25  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_omp_array_section):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/ChangeLog:

* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
for VAR_DECLs.
* gimplify.cc (struct gimplify_omp_ctx):
Add 'hash_set *pt_readonly_ptrs' field.
(internal_get_tmp_var): Set
DECL_POINTS_TO_READONLY/SSA_NAME_POINTS_TO_READONLY_MEMORY for
new temp vars.
(build_omp_struct_comp_nodes):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
(gimplify_scan_omp_clauses): Collect OMP_CLAUSE_MAP_POINTS_TO_READONLY
to ctx->pt_readonly_ptrs.
* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
variables of receiver refs.
* tree-pretty-print.cc (dump_omp_clause):
Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
* tree.h (DECL_POINTS_TO_READONLY): New macro.
(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/readonly-1.c: Adjust testcase.
* c-c++-common/goacc/readonly-2.c: New testcase.
* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..42591e4029a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
   else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+  if (OMP_CLAUSE_MAP_READONLY (c))
+   OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
   OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
   if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
  && !c_mark_addressable (t))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..6ab467e1140 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
}
  else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+ if (OMP_CLAUSE_MAP_READONLY (c))
+   OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
  OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
  if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
  && !cxx_mark_addressable (t))
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 2253d559f9c..d7cd65af1bb 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2524,6 +2524,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, 
gfc_exec_op op,
   node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
   OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_get (decl);
+  if (n->u.readonly)
+

Re: [PATCH v2] c++: fix ICE with constexpr ARRAY_REF [PR110382]


On 7/24/23 18:37, Marek Polacek wrote:

On Sat, Jul 22, 2023 at 12:28:59AM -0400, Jason Merrill wrote:

On 7/21/23 18:38, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --

This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

gcc_assert (!ctx->object || !DECL_P (ctx->object)
|| ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

new_ctx.ctor = build_constructor (elem_type, NULL); // #1


...and this code doesn't also clear(/set) new_ctx.object like everywhere
else in constexpr.cc that sets new_ctx.ctor.  Fixing that should make the
testcase work.


Right, but then we'd be back pre-r12-2304 or r13-5693...

...except it should work to always clear the object, like below.
  

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.


Indeed, and using it rather than building a new one seems like a valid
optimization for trunk.
  
Agreed, I kept it.



I also notice that the DECL_EXPR code calls unshare_constructor, which
should be unnecessary if init == ctx->ctor?


It looks like init == ctx->ctor happens only with this new testcase.
I'm not sure it's worth adding code for such a rare case?
  

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
   gcc/cp/constexpr.cc   |  5 -
   gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
   2 files changed, 21 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fb94f3cefcb..518b7c7a2d5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4291,7 +4291,10 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
 else
   val = build_value_init (elem_type, tf_warning_or_error);
-  if (!SCALAR_TYPE_P (elem_type))
+  if (!SCALAR_TYPE_P (elem_type)
+  /* Create a new constructor only if we don't already have one that
+is suitable.  */
+  && !(ctx->ctor && same_type_p (elem_type, TREE_TYPE (ctx->ctor


We generally use same_type_ignoring_top_level_qualifiers_p in the constexpr
code.


True, changed.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

For 13, I guess I should only clear the object and leave out the
same_type_ bit.

-- >8 --
This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

   gcc_assert (!ctx->object || !DECL_P (ctx->object)
   || ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

   new_ctx.ctor = build_constructor (elem_type, NULL); // #1

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

   new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

We can fix the problem by always clearing the object, and, as an
optimization, only create/free a new ctor when actually needed.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.  Clear the object
when the type is non-scalar.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
  gcc/cp/constexpr.cc   | 17 +++--
  gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
  2 files changed, 32 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C

diff --git a/gcc/cp/constexpr.cc

Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-25 Thread Robin Dapp via Gcc-patches

> The call to fesetround could be any function in practice, and we never
> know whether that function uses dynamic rounding mode floating-point
> operations, nor do we know whether fesetround will be called at all.
> 
> So that's why we want to restore before the function call, to make sure
> we don't break the rounding mode for any inner function; backing up the
> frm after the function call also handles the case where the inner
> function has changed the global rounding mode.

Ah, that clarifies things a bit.  Yes, I figured - the function is
opaque anyway.

I think we're touching a general question here, I suppose there already
have been discussions on the LLVM side about this?  What about ABI
implications?  By saving and restoring before and after a function
call we assume the FRM register to be call clobbered instead of
letting the function/call save/restore it.  Apart from fesetround
I can't imagine many cases where we would want to re-use the rounding
mode from inside a call so it might be advantageous to leave the
burden of saving/restoring to the callee.  But that's a similar
discussion as with the vector ABI so I don't expect a conclusion ;)

Regards
 Robin

[PATCH] rtl-optimization/110587 - speedup find_hard_regno_for_1

The following applies a micro-optimization to find_hard_regno_for_1,
re-ordering the check so we can easily jump-thread by using an else.
This reduces the time spent in this function by 15% for the testcase
in the PR.
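
As a rough sanity check of why the re-ordering is safe, the sketch below flattens the lookups from the patch into plain fields (ToyRange and its members are hypothetical stand-ins, not LRA data structures) and compares the old two-if form with the new if/else-if form over all input combinations. The two conditions can never both hold, since one requires a negative renumber value and the other a non-negative one, so turning the second test into an else branch does not change the outcome.

```cpp
#include <cassert>

// Hypothetical flattened inputs standing in for the LRA lookups in the patch.
struct ToyRange {
  int renumber;        // models live_pseudos_reg_renumber[regno]
  bool is_new_pseudo;  // models regno >= lra_constraint_new_regno_start
  bool has_pref;       // models preferred_hard_regno1 >= 0
  bool class_ok;       // models the rclass_intersect_p lookup
};

// Bit 1: would be added to the reload/inheritance conflict set.
// Bit 2: would be added to the live hard-reg pseudo set.
int classify_before(const ToyRange &r) {
  int out = 0;
  if (r.is_new_pseudo && r.has_pref && r.renumber < 0 && r.class_ok)
    out |= 1;
  if (r.renumber >= 0 && r.class_ok)
    out |= 2;
  // The two tests disagree on the sign of renumber, so both never fire.
  assert(out != 3);
  return out;
}

// Same logic with the cheap sign test first and the second test as an else.
int classify_after(const ToyRange &r) {
  int out = 0;
  if (r.renumber < 0 && r.is_new_pseudo && r.has_pref && r.class_ok)
    out |= 1;
  else if (r.renumber >= 0 && r.class_ok)
    out |= 2;
  return out;
}

// Exhaustively compares both forms over the small toy input space.
bool orderings_agree() {
  for (int renumber = -1; renumber <= 1; ++renumber)
    for (int bits = 0; bits < 8; ++bits) {
      ToyRange r{renumber, (bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0};
      if (classify_before(r) != classify_after(r))
        return false;
    }
  return true;
}
```

This says nothing about the 15% figure, which depends on the real data and the compiler's jump threading; it only illustrates that the behaviour is unchanged.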

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK if that
passes?

Thanks,
Richard.

PR rtl-optimization/110587
* lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
---
 gcc/lra-assigns.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index b8582dcafff..d2ebcfd5056 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -522,14 +522,15 @@ find_hard_regno_for_1 (int regno, int *cost, int 
try_only_hard_regno,
   r2 != NULL;
   r2 = r2->start_next)
{
- if (r2->regno >= lra_constraint_new_regno_start
+ if (live_pseudos_reg_renumber[r2->regno] < 0
+ && r2->regno >= lra_constraint_new_regno_start
  && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
- && live_pseudos_reg_renumber[r2->regno] < 0
  && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
   r2->regno);
- if (live_pseudos_reg_renumber[r2->regno] >= 0
- && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
+ else if (live_pseudos_reg_renumber[r2->regno] >= 0
+  && rclass_intersect_p
+   [regno_allocno_class_array[r2->regno]])
sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
}
}
-- 
2.35.3

[PATCH] rtl-optimization/110587 - remove quadratic regno_in_use_p

The following removes the code checking whether a noop copy
is between something involved in the return sequence composed
of a SET and USE.  Instead of checking for this special-case
the following makes us only ever remove noop copies between
pseudos - which is the case that is necessary for IRA/LRA
interfacing to function according to the comment.  That makes
looking for the return reg special case unnecessary, reducing
the compile-time in LRA non-specific to zero for the testcase.
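
The new condition can be sketched as a standalone predicate. This is a simplified model under stated assumptions: ToySet and kToyFirstPseudo are hypothetical, and the value 64 is arbitrary, whereas GCC's FIRST_PSEUDO_REGISTER is target-specific.

```cpp
#include <cassert>

// Hypothetical boundary between hard and pseudo register numbers.
// In GCC the real boundary is FIRST_PSEUDO_REGISTER and depends on the target.
constexpr unsigned kToyFirstPseudo = 64;

// Toy model of a (set (reg dest) (reg src)) pattern.
struct ToySet {
  unsigned src_regno;
  unsigned dest_regno;
};

// A copy is removable only if it is a no-op AND involves a pseudo.
// No-op copies between hard registers are left alone, which is what makes
// the return-register special case unnecessary.
bool is_removable_noop_move(const ToySet &set) {
  return set.src_regno == set.dest_regno
         && set.src_regno >= kToyFirstPseudo;
}
```

Because the check is a constant-time comparison, it no longer needs the forward scan over the basic block that regno_in_use_p performed for every candidate insn.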

Bootstrapped and tested on x86_64-unknown-linux-gnu with
all languages and {,-m32}.

OK?

Thanks,
Richard.

PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Remove.
(regno_in_use_p): Likewise.
(lra_final_code_change): Do not remove noop moves
between hard registers.
---
 gcc/lra-spills.cc | 69 +--
 1 file changed, 1 insertion(+), 68 deletions(-)

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e8cd9..fe58f162d05 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,72 +705,6 @@ alter_subregs (rtx *loc, bool final_p)
   return res;
 }
 
-/* Return true if REGNO is used for return in the current
-   function.  */
-static bool
-return_regno_p (unsigned int regno)
-{
-  rtx outgoing = crtl->return_rtx;
-
-  if (! outgoing)
-return false;
-
-  if (REG_P (outgoing))
-return REGNO (outgoing) == regno;
-  else if (GET_CODE (outgoing) == PARALLEL)
-{
-  int i;
-
-  for (i = 0; i < XVECLEN (outgoing, 0); i++)
-   {
- rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
-
- if (REG_P (x) && REGNO (x) == regno)
-   return true;
-   }
-}
-  return false;
-}
-
-/* Return true if REGNO is in one of subsequent USE after INSN in the
-   same BB.  */
-static bool
-regno_in_use_p (rtx_insn *insn, unsigned int regno)
-{
-  static lra_insn_recog_data_t id;
-  static struct lra_static_insn_data *static_id;
-  struct lra_insn_reg *reg;
-  int i, arg_regno;
-  basic_block bb = BLOCK_FOR_INSN (insn);
-
-  while ((insn = next_nondebug_insn (insn)) != NULL_RTX)
-{
-  if (BARRIER_P (insn) || bb != BLOCK_FOR_INSN (insn))
-   return false;
-  if (! INSN_P (insn))
-   continue;
-  if (GET_CODE (PATTERN (insn)) == USE
- && REG_P (XEXP (PATTERN (insn), 0))
- && regno == REGNO (XEXP (PATTERN (insn), 0)))
-   return true;
-  /* Check that the regno is not modified.  */
-  id = lra_get_insn_recog_data (insn);
-  for (reg = id->regs; reg != NULL; reg = reg->next)
-   if (reg->type != OP_IN && reg->regno == (int) regno)
- return false;
-  static_id = id->insn_static_data;
-  for (reg = static_id->hard_regs; reg != NULL; reg = reg->next)
-   if (reg->type != OP_IN && reg->regno == (int) regno)
- return false;
-  if (id->arg_hard_regs != NULL)
-   for (i = 0; (arg_regno = id->arg_hard_regs[i]) >= 0; i++)
- if ((int) regno == (arg_regno >= FIRST_PSEUDO_REGISTER
- ? arg_regno : arg_regno - FIRST_PSEUDO_REGISTER))
-   return false;
-}
-  return false;
-}
-
 /* Final change of pseudos got hard registers into the corresponding
hard registers and removing temporary clobbers.  */
 void
@@ -821,8 +755,7 @@ lra_final_code_change (void)
  if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
  && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
  && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
- && (! return_regno_p (REGNO (SET_SRC (pat)))
- || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)
+ && REGNO (SET_SRC (pat)) >= FIRST_PSEUDO_REGISTER)
{
  lra_invalidate_insn_data (insn);
  delete_insn (insn);
-- 
2.35.3

[COMMITTED] Adjust one Ada test

2023-07-25 Thread Marc Poulhiès via Gcc-patches

Recent change modified how the loops are created, with the first
iteration being extracted out of the loops in the 2 test cases.
Adjust the text to match from the unroll dump.

gcc/testsuite/ChangeLog:

* gnat.dg/unroll3.adb: Adjust.
---
 Tested on x86_64-pc-linux-gnu, committed on master.

 gcc/testsuite/gnat.dg/unroll3.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gnat.dg/unroll3.adb 
b/gcc/testsuite/gnat.dg/unroll3.adb
index 3bd06e7de76..86193d64681 100644
--- a/gcc/testsuite/gnat.dg/unroll3.adb
+++ b/gcc/testsuite/gnat.dg/unroll3.adb
@@ -23,4 +23,4 @@ package body Unroll3 is
 
 end Unroll3;
 
--- { dg-final { scan-tree-dump-times "loop with 3 iterations completely 
unrolled" 2 "cunroll" } }
+-- { dg-final { scan-tree-dump-times "loop with 2 iterations completely 
unrolled" 2 "cunroll" } }
-- 
2.40.0

Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

Hi Robin:

Let me give a bit more context about the design:

The call to fesetround could be any function in practice, and we never
know whether that function uses dynamic rounding mode floating-point
operations, nor do we know whether fesetround will be called at all.

So that's why we want to restore before the function call, to make sure
we don't break the rounding mode for any inner function; backing up the
frm after the function call also handles the case where the inner
function has changed the global rounding mode.
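
The restore-before/backup-after scheme can be modelled with a small simulation. This is only a sketch: ToyFrmState, static_op, call_opaque and run_example are hypothetical names, the FRM register is modelled as a plain variable, and nothing here reflects how the mode-switching pass actually emits insns.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy rounding modes, named after the RISC-V ones for readability only.
enum ToyRm { RNE, RTZ, RDN, RUP, RMM };

struct ToyFrmState {
  ToyRm frm;    // models the global FRM register
  ToyRm saved;  // models the compiler's backup of the dynamic mode
  std::vector<ToyRm> seen_by_callee;  // FRM value observed at each call
};

// An operation with a static rounding mode overwrites FRM.
void static_op(ToyFrmState &s, ToyRm rm) { s.frm = rm; }

// An opaque call: restore the dynamic mode first, back it up afterwards.
void call_opaque(ToyFrmState &s,
                 const std::function<void(ToyFrmState &)> &callee) {
  s.frm = s.saved;                    // restore before the call
  s.seen_by_callee.push_back(s.frm);  // what the callee starts with
  callee(s);                          // may or may not change FRM
  s.saved = s.frm;                    // backup after the call
}

// Replays the sequence: static RTZ op, call that sets RMM, static RUP op.
ToyRm run_example(ToyRm entry_mode) {
  ToyFrmState s{entry_mode, entry_mode, {}};
  static_op(s, RTZ);
  call_opaque(s, [](ToyFrmState &st) { st.frm = RMM; });
  static_op(s, RUP);
  s.frm = s.saved;  // restore before returning
  // The static RTZ must not have leaked into the callee.
  assert(s.seen_by_callee.size() == 1 && s.seen_by_callee[0] == entry_mode);
  return s.frm;
}
```

In this model the callee sees the mode that was live at function entry, and the mode it sets survives the later static operation.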

On Tue, Jul 25, 2023 at 7:53 PM Li, Pan2  wrote:
>
> Thanks Robin. Let me share one example for the CALL scenario; consider the
> code below.
>
> extern int fesetround(int rounding_mode);
>
> test_call_for_rm:
>                        <- FRM X
>   vfadd RTZ (static)   <- FRM RTZ
>                        <- Restore FRM X
>   call fesetround RMM  <- Change FRM to RMM during the call
>                        <- Backup the FRM RMM
>   vfadd RUP (static)   <- FRM RUP
>                        <- Restore the FRM to RMM
>   ret
>
> When emitting at the call insn, we need to emit 2 insns, one restore before
> the call and one backup after the call, to ensure 2 things.
>
> 1. The static FRM should not pollute the call.
> 2. The FRM updated in the call stays live until the end of the cfun.
>
> Unfortunately, current mode switching cannot emit 2 insns as above; it mostly
> emits after. It becomes even worse when the call
> is the last insn of the bb; we try to do some special handling in the needed
> function for this.
>
> And thanks again, Robin, for the nits and cleanups, like
> previous/next_nonnote_nondebug_insn_bb.
>
> Pan
>
> -Original Message-
> From: Robin Dapp 
> Sent: Tuesday, July 25, 2023 4:38 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
> Yanzhang 
> Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
> rounding
>
> Hi Pan,
>
> > Given we have a call, we would like to restore before call and then
> > backup frm after call. Looks current mode switching cannot emit insn
> > like that, it can only either emit insn before (mostly) or after
> > (when NOTE_INSN_BASIC_BLOCK_P). Thus, we try to emit the one after
> > call when needed as a specially handling here.
>
> Would you mind explaining a bit more here?  As far as I know we can
> perform necessary mode switching (including saving necessary
> registers) directly after function entry and right before function
> exit.  Is this somehow too early or too late or cannot handle what
> you want?
>
> The patch in itself makes sense (apart from some nits and possible
> cleanups) but I'm still missing the bigger picture.  For me it gets
> more confusing with every patch to be honest :D
>
> Regards
>  Robin
>

Re: vectorizer: Avoid an OOB access from vectorization

2023-07-25 Thread Richard Sandiford via Gcc-patches

Was leaving a bit of time in case Richi had any comments, but:

Matthew Malcomson  writes:
> Our checks for whether the vectorization of a given loop would make an
> out of bounds access miss the case when the vector we load is so large
> as to span multiple iterations worth of data (while only being there to
> implement a single iteration).
>
> This patch adds a check for such an access.
>
> Example where this was going wrong (smaller version of testcase added):
>
> ```
>   extern unsigned short multi_array[5][16][16];
>   extern void initialise_s(int *);
>   extern int get_sval();
>
>   void foo() {
> int s0 = get_sval();
> int s[31];
> int i,j;
> initialise_s(&s[0]);
> s0 = get_sval();
> for (j=0; j < 16; j++)
>   for (i=0; i < 16; i++)
>   multi_array[1][j][i]=s[j*2];
>   }
> ```
>
> With the above loop we would load the `s[j*2]` integer into a 4 element
> vector, which reads 3 more elements than the scalar loop would.
> `get_group_load_store_type` identifies that the loop requires a scalar
> epilogue due to gaps.  However we do not identify that the above code
> requires *two* scalar loops to be peeled due to the fact that each
> iteration loads an amount of data from the *next* iteration (while not
> using it).
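
The arithmetic behind the "two scalar iterations" claim can be checked with a small helper. This is an illustrative sketch only: iterations_to_peel is a hypothetical function that counts trailing iterations whose vector load would run past the array, and it is not how get_group_load_store_type reasons about gaps.

```cpp
#include <cassert>

// Counts how many trailing iterations must stay scalar so that no vector
// load of `vec_elems` elements, starting at index iter * stride, touches an
// index at or beyond `array_len`.
int iterations_to_peel(int array_len, int stride, int niters, int vec_elems) {
  assert(stride > 0 && vec_elems > 0 && niters >= 0);
  int peel = 0;
  for (int iter = niters - 1; iter >= 0; --iter) {
    int last_index_read = iter * stride + vec_elems - 1;
    if (last_index_read < array_len)
      break;  // this and all earlier iterations stay in bounds
    ++peel;
  }
  return peel;
}
```

For the example above (31 elements, stride 2, 16 iterations, 4-element vectors) the last load would cover s[30..33] and the one before it s[28..31], so the helper reports 2. With a 32-element array only the final iteration would overrun.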
>
> Bootstrapped and regtested on aarch64-none-linux-gnu.
> N.b. out of interest we came across this working with Morello.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c 
> b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
> new file mode 100644
> index 
> ..1b721fd26cab8d5583b153dd6b28c914db870ec3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
> @@ -0,0 +1,60 @@
> +/* For some targets we end up vectorizing the below loop such that the `sp`
> +   single integer is loaded into a 4 integer vector.
> +   While the writes are all safe, without 2 scalar loops being peeled into 
> the
> +   epilogue we would read past the end of the 31 integer array.  This happens
> +   because we load a 4 integer chunk to only use the first integer and
> +   increment by 2 integers at a time, hence the last load needs s[30-33] and
> +   the penultimate load needs s[28-31].
> +   This testcase ensures that we do not crash due to that behaviour.  */
> +/* { dg-require-effective-target mmap } */
> +#include 
> +#include 

I think this should include "tree-vect.h" and should call check_vect in main.

> +
> +#define MMAP_SIZE 0x2
> +#define ADDRESS 0x112200
> +
> +#define MB_BLOCK_SIZE 16
> +#define VERT_PRED_16 0
> +#define HOR_PRED_16 1
> +#define DC_PRED_16 2
> +int *sptr;
> +extern void intrapred_luma_16x16();
> +unsigned short mprr_2[5][16][16];
> +void initialise_s(int *s) { }
> +int main() {
> +void *s_mapping;
> +void *end_s;
> +s_mapping = mmap ((void *)ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
> +   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +if (s_mapping == MAP_FAILED)
> +  {
> + perror ("mmap");
> + return 1;
> +  }
> +end_s = (s_mapping + MMAP_SIZE);
> +sptr = (int*)(end_s - sizeof(int[31]));
> +intrapred_luma_16x16(sptr);
> +return 0;
> +}
> +
> +void intrapred_luma_16x16(int * restrict sp) {
> +for (int j=0; j < MB_BLOCK_SIZE; j++)
> +  {
> + mprr_2[VERT_PRED_16][j][0]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][1]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][2]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][3]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][4]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][5]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][6]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][7]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][8]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][9]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][10]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][11]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][12]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][13]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][14]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][15]=sp[j*2];
> +  }
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> c08d0ef951fc63adcfffc601917134ddf51ece45..1c8c6784cc7b5f2d327339ff55a5a5ea08835aab
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2217,7 +2217,9 @@ get_group_load_store_type (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>but the access in the loop doesn't cover the full vector
>we can end up with no gap recorded but still excess
>elements accessed, see PR103116.  Make sure we peel for
> -  gaps if necessary and sufficient and give up if not.  */
> +  gaps if necessary and sufficient and give up if not.
> +  If there is a combination of the access not covering the full 
> vector and
> +  a gap recorded then we may need to peel twice.  */

Nit: long line.  Might be worth adding a paragraph break.

OK with

[PATCH] internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions

2023-07-25 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richard and Richi.

Based on previous discussions, we should make COND_* and COND_LEN_*
consistent.

So, this patch defines these internal functions together through these 2
wrappers:

#ifndef DEF_INTERNAL_COND_FN
#define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
  DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE)\
  DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##OPTAB, \
 cond_len_##TYPE)
#endif

#ifndef DEF_INTERNAL_SIGNED_COND_FN
#define DEF_INTERNAL_SIGNED_COND_FN(NAME, FLAGS, SELECTOR, SIGNED_OPTAB,   \
UNSIGNED_OPTAB, TYPE)  \
  DEF_INTERNAL_SIGNED_OPTAB_FN (COND_##NAME, FLAGS, SELECTOR,  \
cond_##SIGNED_OPTAB, cond_##UNSIGNED_OPTAB,\
cond_##TYPE)   \
  DEF_INTERNAL_SIGNED_OPTAB_FN (COND_LEN_##NAME, FLAGS, SELECTOR,  \
cond_len_##SIGNED_OPTAB,   \
cond_len_##UNSIGNED_OPTAB, cond_len_##TYPE)
#endif
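
For readers unfamiliar with this style of .def wrapper, the following standalone toy shows the same token-pasting idea outside of GCC. Everything in it is hypothetical (TOY_OPTAB_FN, TOY_COND_FN, toy_registry); it merely records names as strings so the expansion can be inspected, whereas the real DEF_INTERNAL_OPTAB_FN is defined by whichever file includes internal-fn.def.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy registry standing in for whatever the includer does with each entry.
inline std::vector<std::string> &toy_registry() {
  static std::vector<std::string> names;
  return names;
}

// Toy "base" macro: records the function name and its optab as one string.
#define TOY_OPTAB_FN(NAME, OPTAB) toy_registry().push_back(#NAME ":" #OPTAB);

// Toy wrapper: one use defines both the COND_ and the COND_LEN_ variant,
// keeping the two families in sync by construction.
#define TOY_COND_FN(NAME, OPTAB)           \
  TOY_OPTAB_FN(COND_##NAME, cond_##OPTAB)  \
  TOY_OPTAB_FN(COND_LEN_##NAME, cond_len_##OPTAB)

// Expands two wrapper uses and returns the recorded entries in order.
std::vector<std::string> expand_toy_entries() {
  toy_registry().clear();
  TOY_COND_FN(ADD, add)
  TOY_COND_FN(NEG, neg)
  // Each wrapper use contributes exactly two entries.
  assert(toy_registry().size() % 2 == 0);
  return toy_registry();
}
```

The design benefit is that a new conditional operation cannot be added in only one of the two forms.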

Bootstrap and regression tests on x86 passed.
OK for trunk?

gcc/ChangeLog:

* internal-fn.def (DEF_INTERNAL_COND_FN): New macro.
(DEF_INTERNAL_SIGNED_COND_FN): Ditto.
(COND_ADD): Remove.
(COND_SUB): Ditto.
(COND_MUL): Ditto.
(COND_DIV): Ditto.
(COND_MOD): Ditto.
(COND_RDIV): Ditto.
(COND_MIN): Ditto.
(COND_MAX): Ditto.
(COND_FMIN): Ditto.
(COND_FMAX): Ditto.
(COND_AND): Ditto.
(COND_IOR): Ditto.
(COND_XOR): Ditto.
(COND_SHL): Ditto.
(COND_SHR): Ditto.
(COND_FMA): Ditto.
(COND_FMS): Ditto.
(COND_FNMA): Ditto.
(COND_FNMS): Ditto.
(COND_NEG): Ditto.
(COND_LEN_ADD): Ditto.
(COND_LEN_SUB): Ditto.
(COND_LEN_MUL): Ditto.
(COND_LEN_DIV): Ditto.
(COND_LEN_MOD): Ditto.
(COND_LEN_RDIV): Ditto.
(COND_LEN_MIN): Ditto.
(COND_LEN_MAX): Ditto.
(COND_LEN_FMIN): Ditto.
(COND_LEN_FMAX): Ditto.
(COND_LEN_AND): Ditto.
(COND_LEN_IOR): Ditto.
(COND_LEN_XOR): Ditto.
(COND_LEN_SHL): Ditto.
(COND_LEN_SHR): Ditto.
(COND_LEN_FMA): Ditto.
(COND_LEN_FMS): Ditto.
(COND_LEN_FNMA): Ditto.
(COND_LEN_FNMS): Ditto.
(COND_LEN_NEG): Ditto.
(ADD): New macro define.
(SUB): Ditto.
(MUL): Ditto.
(DIV): Ditto.
(MOD): Ditto.
(RDIV): Ditto.
(MIN): Ditto.
(MAX): Ditto.
(FMIN): Ditto.
(FMAX): Ditto.
(AND): Ditto.
(IOR): Ditto.
(XOR): Ditto.
(SHL): Ditto.
(SHR): Ditto.
(FMA): Ditto.
(FMS): Ditto.
(FNMA): Ditto.
(FNMS): Ditto.
(NEG): Ditto.

---
 gcc/internal-fn.def | 123 
 1 file changed, 56 insertions(+), 67 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 04f3812326e..bf6825c5d00 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -34,10 +34,12 @@ along with GCC; see the file COPYING3.  If not see
   UNSIGNED_OPTAB, TYPE)
  DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
 
where NAME is the name of the function, FLAGS is a set of
ECF_* flags and FNSPEC is a string describing functions fnspec.
-   
+
DEF_INTERNAL_OPTAB_FN defines an internal function that maps to a
direct optab.  The function should only be called with a given
set of types if the associated optab is available for the modes
@@ -74,7 +76,8 @@ along with GCC; see the file COPYING3.  If not see
 
- cond_len_unary: a conditional unary optab, such as cond_len_neg
- cond_len_binary: a conditional binary optab, such as cond_len_add
-   - cond_len_ternary: a conditional ternary optab, such as 
cond_len_fma_rev
+   - cond_len_ternary: a conditional ternary optab, such as
+   cond_len_fma_rev
 
DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
maps to one of two optabs, depending on the signedness of an input.
@@ -106,6 +109,16 @@ along with GCC; see the file COPYING3.  If not see
These five internal functions will require two optabs each, a SIGNED_OPTAB
and an UNSIGNED_OTPAB.
 
+   DEF_INTERNAL_COND_FN is a wrapper that defines 2 internal functions with
+   DEF_INTERNAL_OPTAB_FN:
+   - One is COND_* operations that are predicated by mask only. Such operations
+ make sense for both vectors and scalars.
+   - The other is COND_LEN_* operations that are predicated by mask and len
+

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-07-25 Thread Richard Sandiford via Gcc-patches

Hi,

Thanks for the rework and sorry for the slow review.

Prathamesh Kulkarni  writes:
> Hi Richard,
> This is a reworking of the patch to extend fold_vec_perm to handle VLA
> vectors. The attached patch unifies handling of VLS and VLA vector_csts,
> while using fallback code
> for ctors.
>
> For VLS vector, the patch ignores underlying encoding, and
> uses npatterns = nelts, and nelts_per_pattern = 1.
>
> For VLA patterns, if sel has a stepped sequence, then it
> only chooses elements from a particular pattern of a particular
> input vector.
>
> To make things simpler, the patch imposes the following constraints:
> (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
> (b) The step size for a stepped sequence is a power of 2, and
>   multiple of npatterns of chosen input vector.
> (c) Runtime vector length of sel is a multiple of sel_npatterns.
>  So, we don't handle sel.length = 2 + 2x and npatterns = 4.
>
> Eg:
> op0, op1: npatterns = 2, nelts_per_pattern = 3
> op0_len = op1_len = 16 + 16x.
> sel = { 0, 0, 2, 0, 4, 0, ... }
> npatterns = 2, nelts_per_pattern = 3.
>
> For pattern {0, 2, 4, ...}
> Let,
> a1 = 2
> S = step size = 2
>
> Let Esel denote number of elements per pattern in sel at runtime.
> Esel = (16 + 16x) / npatterns_sel
> = (16 + 16x) / 2
> = (8 + 8x)
>
> So, last element of pattern:
> ae = a1 + (Esel - 2) * S
>  = 2 + (8 + 8x - 2) * 2
>  = 14 + 16x
>
> a1 /trunc op0_len = 2 / (16 + 16x) = 0
> ae /trunc op0_len = (14 + 16x) / (16 + 16x) = 0
> Since both are equal with quotient = 0, we select elements from op0.
>
> Since step size (S) is a multiple of npatterns(op0), we select
> all elements from same pattern of op0.
>
> res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
>               = max (2, max (2, 2))
>               = 2
>
> res_nelts_per_pattern = max (op0_nelts_per_pattern,
>                              max (op1_nelts_per_pattern,
>                                   sel_nelts_per_pattern))
>                       = max (3, max (3, 3))
>                       = 3
>
> So res has encoding with npatterns = 2, nelts_per_pattern = 3.
> res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
>
> Unfortunately, this results in an issue for poly_int_cst index:
> For example,
> op0, op1: npatterns = 1, nelts_per_pattern = 3
> op0_len = op1_len = 4 + 4x
>
> sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>
> In this case,
> a1 = 5 + 4x
> S = (6 + 4x) - (5 + 4x) = 1
> Esel = 4 + 4x
>
> ae = a1 + (Esel - 2) * S
>  = (5 + 4x) + (4 + 4x - 2) * 1
>  = 7 + 8x
>
> IIUC, 7 + 8x will always be index for last element of op1 ?
> if x = 0, len = 4, 7 + 8x = 7
> if x = 1, len = 8, 7 + 8x = 15, etc.
> So the stepped sequence will always choose elements
> from op1 regardless of vector length for above case ?
>
> However,
> ae /trunc op0_len
> = (7 + 8x) / (4 + 4x)
> which is not defined because 7/4 != 8/4
> and we return NULL_TREE, but I suppose the expected result would be:
> res: { op1[0], op1[1], op1[2], ... } ?
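
The expectation in this second example can be probed numerically. The sketch below is an assumption-laden illustration, not how GCC's poly_int division decides anything: it samples x over a range and reports whether (7 + 8x) / (4 + 4x) yields the same truncating quotient each time. The function name `sampled_quotient` is hypothetical.

```c
/* Illustration only: brute-force check of the second example.  */

/* Return the common value of (7 + 8x) / (4 + 4x) for x in [0, max_x],
   or -1 if the quotient differs between sampled values of x.  */
static long
sampled_quotient (long max_x)
{
  long q = -1;
  for (long x = 0; x <= max_x; x++)
    {
      long ae = 7 + 8 * x;      /* last selected index */
      long len = 4 + 4 * x;     /* runtime length of op0 */
      long cur = ae / len;
      if (x == 0)
        q = cur;
      else if (cur != q)
        return -1;
    }
  return q;
}
```

For the sampled range the quotient is 1 every time, since 7 + 8x = (4 + 4x) + (3 + 4x) and 3 + 4x < 4 + 4x. That is consistent with the selection always coming from op1, although a sampled check is not a proof.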
>
> The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
> and on x86_64-unknown-linux-gnu.
> I would be grateful for suggestions on how to proceed.
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index a02ede79fed..8028b3e8e9a 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
>  #include "vec-perm-indices.h"
>  #include "asan.h"
>  #include "gimple-range.h"
> +#include 
> +#include "tree-pretty-print.h"
> +#include "gimple-pretty-print.h"
> +#include "print-tree.h"
>  
>  /* Nonzero if we are folding constants inside an initializer or a C++
> manifestly-constant-evaluated context; zero otherwise.
> @@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, tree 
> expr)
>  static bool
>  vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
>  {
> -  unsigned HOST_WIDE_INT i, nunits;
> +  unsigned HOST_WIDE_INT i;
>  
> -  if (TREE_CODE (arg) == VECTOR_CST
> -  && VECTOR_CST_NELTS (arg).is_constant ())
> -{
> -  for (i = 0; i < nunits; ++i)
> - elts[i] = VECTOR_CST_ELT (arg, i);
> -}
> -  else if (TREE_CODE (arg) == CONSTRUCTOR)
> +  if (TREE_CODE (arg) == CONSTRUCTOR)
>  {
>constructor_elt *elt;
>  
> @@ -10519,6 +10517,230 @@ vec_cst_ctor_to_array (tree arg, unsigned int 
> nelts, tree *elts)
>return true;
>  }
>  
> +/* Return a vector with (NPATTERNS, NELTS_PER_PATTERN) encoding.  */
> +
> +static tree
> +vector_cst_reshape (tree vec, unsigned npatterns, unsigned nelts_per_pattern)
> +{
> +  gcc_assert (pow2p_hwi (npatterns));
> +
> +  if (VECTOR_CST_NPATTERNS (vec) == npatterns
> +  && VECTOR_CST_NELTS_PER_PATTERN (vec) == nelts_per_pattern)
> +return vec;
> +
> +  tree v = make_vector (exact_log2 (npatterns),

Re: [PATCH] VECT: Support CALL vectorization for COND_LEN_*

2023-07-25 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>>> I think we should have an internal-fn helper that returns IFN_COND_LEN_*
>>> for a given IFN_COND_*.  It could handle IFN_MASK_LOAD -> IFN_MASK_LEN_LOAD
>>> etc. too.
> Could you name this helper function for me? Should it be called
> "get_conditional_len_internal_fn_for_conditional_fn"?

How about get_len_internal_fn?

/* If there exists an internal function like IFN that operates on vectors,
   but with additional length and bias parameters, return the internal_fn
   for that function, otherwise return IFN_LAST.  */
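
A minimal sketch of what such a helper could look like, under loud assumptions: the enum `sketch_ifn` and the function `sketch_get_len_internal_fn` are stand-ins invented here, not GCC's internal_fn or the eventual implementation. Only the mapping idea (COND_* to COND_LEN_*, MASK_LOAD to MASK_LEN_LOAD, otherwise "last") comes from the thread.

```c
/* Illustration only: a stand-in enum, not GCC's internal_fn.  */
enum sketch_ifn
{
  SK_COND_ADD,
  SK_COND_LEN_ADD,
  SK_MASK_LOAD,
  SK_MASK_LEN_LOAD,
  SK_ADD,
  SK_LAST
};

/* If there is a variant of FN that takes additional length and bias
   parameters, return it, otherwise return SK_LAST.  */
static enum sketch_ifn
sketch_get_len_internal_fn (enum sketch_ifn fn)
{
  switch (fn)
    {
    case SK_COND_ADD:
      return SK_COND_LEN_ADD;
    case SK_MASK_LOAD:
      return SK_MASK_LEN_LOAD;
    default:
      return SK_LAST;
    }
}
```

Returning a "last" sentinel lets callers test for the absence of a length variant without a separate predicate.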

Thanks,
Richard

RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-25 Thread Li, Pan2 via Gcc-patches

Thanks Robin. Let me share one example for the CALL scenario. Consider the
code below.

extern int fesetround(int rounding_mode);

test_call_for_rm:
                         <- FRM X
   vfadd RTZ (static)    <- FRM RTZ
                         <- Restore FRM X
   call fesetround RMM   <- Change FRM to RMM during the call
                         <- Backup the FRM RMM
   vfadd RUP (static)    <- FRM RUP
                         <- Restore the FRM to RMM
   ret

When we reach the call insn, we need to emit 2 insns, one restore before the
call and one backup after the call, to ensure 2 things.

1. The static FRM should not pollute the call.
2. The FRM updated in the call stays live to the end of the cfun.
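
The backup and restore sequence around the call can be modelled in plain C. This is a sketch under stated assumptions: FRM is simulated by an ordinary variable, and every `sketch_` name is hypothetical. It does not reflect how the mode-switching pass is implemented.

```c
/* Illustration only: FRM is modelled as a plain variable.  */
enum sketch_rm { RM_RNE, RM_RTZ, RM_RDN, RM_RUP, RM_RMM };

static enum sketch_rm sketch_frm = RM_RNE;  /* simulated FRM register */
static enum sketch_rm sketch_seen_by_call;  /* FRM value visible in callee */

/* Stand-in for a callee such as fesetround: it observes the current
   FRM and then changes it.  */
static void
sketch_call (enum sketch_rm new_mode)
{
  sketch_seen_by_call = sketch_frm;
  sketch_frm = new_mode;
}

static void
sketch_test_call_for_rm (void)
{
  enum sketch_rm backup = sketch_frm;   /* backup FRM X */
  sketch_frm = RM_RTZ;                  /* vfadd RTZ (static) */
  sketch_frm = backup;                  /* restore FRM X before the call */
  sketch_call (RM_RMM);                 /* call changes FRM to RMM */
  backup = sketch_frm;                  /* backup FRM RMM after the call */
  sketch_frm = RM_RUP;                  /* vfadd RUP (static) */
  sketch_frm = backup;                  /* restore FRM to RMM */
}
```

In this model the callee sees the original FRM rather than RTZ, and the function exits with FRM set to what the callee chose, which are the two properties listed above.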

Unfortunately, current mode switching cannot emit 2 insns as above; it mostly
emits after. It becomes even worse when the call is the last insn of the bb,
so we try to do some special handling in the needed function for this.

And thanks again, Robin, for the nits and cleanups, like
previous/next_nonnote_nondebug_insn_bb.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Tuesday, July 25, 2023 4:38 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang 
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Hi Pan,

> Given we have a call, we would like to restore before call and then
> backup frm after call. Looks current mode switching cannot emit insn
> like that, it can only either emit insn before (mostly) or after
> (when NOTE_INSN_BASIC_BLOCK_P). Thus, we try to emit the one after
> call when needed as a specially handling here.

Would you mind explaining a bit more here?  As far as I know we can
perform necessary mode switching (including saving necessary
registers) directly after function entry and right before function
exit.  Is this somehow too early or too late or cannot handle what
you want?

The patch in itself makes sense (apart from some nits and possible
cleanups) but I'm still missing the bigger picture.  For me it gets
more confusing with every patch to be honest :D

Regards
 Robin

Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle  wrote:
>
>
> This patch is the third in a series of fixes for PR rtl-optimization/110587,
> a compile-time regression with -O0, that attempts to address the underlying
> cause.  As noted previously, the pathological test case pr28071.c contains
> a large number of useless register-to-register moves that can produce
> quadratic behaviour (in LRA).  These moves are generated during RTL
> expansion in emit_group_load_1, where the middle-end attempts to simplify
> the source before calling extract_bit_field.  This is reasonable if the
> source is a complex expression (from before the tree-ssa optimizers), or
> a SUBREG, or a hard register, but it's not particularly useful to copy
> a pseudo register into a new pseudo register.  This patch eliminates that
> redundancy.
>
> The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
> 777K lines, with this patch it contains 717K lines, i.e. saving about 60K
> lines (admittedly of debugging text output, but it makes the point).
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
> As always, I'm happy to revert this change quickly if there's a problem,
> and investigate why this additional copy might (still) be needed on other
> non-x86 targets.

@@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx
orig_src, tree type,
 be loaded directly into the destination.  */
   src = orig_src;
   if (!MEM_P (orig_src)
+ && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
  && (!CONSTANT_P (orig_src)
  || (GET_MODE (orig_src) != mode
  && GET_MODE (orig_src) != VOIDmode)))

so that means the code guarded by the conditional could instead
be transformed to

   src = force_reg (mode, orig_src);

?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) != VOIDmode)
case looks odd, as in that case we'd use GET_MODE (orig_src) for the
move ... that might also mean we have to use
force_reg (GET_MODE (orig_src) == VOIDmode ? mode : GET_MODE (orig_src), orig_src)

Otherwise I think this is OK; as said, using force_reg would improve
readability here, I think.

I also wonder how the

  else if (GET_CODE (src) == CONCAT)

case will ever trigger with the current code.

Richard.

>
> 2023-07-25  Roger Sayle  
>
> gcc/ChangeLog
> PR middle-end/28071
> PR rtl-optimization/110587
> * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
> a new pseudo register, i.e. only copy hard regs into a new pseudo.
>
>
> Thanks in advance,
> Roger
> --
>

Re: [COMMITTED] bpf: add pseudo-c asm dialect for "nop"

2023-07-25 Thread Jose E. Marchesi via Gcc-patches



> The define_insn "nop" was missing a template for the pseudo-c dialect,
> so the normal syntax was unconditionally emitted.

Thank you.

> Tested on bpf-unknown-none, committed as obvious.
>
> gcc/
>
>   * config/bpf/bpf.md (nop): Add pseudo-c asm dialect template.
> ---
>  gcc/config/bpf/bpf.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 3e2d760fbe4..64342ea1de2 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -103,7 +103,7 @@ (define_mode_attr msuffix [(SI "32") (DI "")])
>  (define_insn "nop"
>[(const_int 0)]
>""
> -  "ja\t0"
> +  "{ja\t0|goto 0}"
>[(set_attr "type" "alu")])
>  
>   Arithmetic/Logical
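
For readers unfamiliar with the `{...|...}` notation in the template, GCC picks one alternative per assembler dialect when it prints the instruction. The sketch below models that selection in a simplified way. It is not GCC's output routine; the function name `sketch_expand_dialect` is hypothetical, and it assumes the normal dialect is alternative 0 and pseudo-c is alternative 1, as the order in the template suggests.

```c
#include <stddef.h>

/* Illustration only: expand "{alt0|alt1|...}" groups in TMPL by
   keeping the alternative numbered DIALECT.  Text outside braces is
   copied unchanged.  OUT must have room for SIZE bytes; the result is
   always NUL-terminated when SIZE > 0.  */
static void
sketch_expand_dialect (const char *tmpl, int dialect, char *out, size_t size)
{
  size_t n = 0;
  int in_group = 0;   /* inside a {...} group */
  int alt = 0;        /* index of the current alternative */

  if (size == 0)
    return;
  for (const char *p = tmpl; *p; p++)
    {
      if (!in_group && *p == '{')
        {
          in_group = 1;
          alt = 0;
        }
      else if (in_group && *p == '|')
        alt++;
      else if (in_group && *p == '}')
        in_group = 0;
      else if ((!in_group || alt == dialect) && n + 1 < size)
        out[n++] = *p;
    }
  out[n] = '\0';
}
```

Under these assumptions, the template from the patch expands to the `ja` form for dialect 0 and to the `goto` form for dialect 1.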

[PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

2023-07-25 Thread Roger Sayle


This patch is the third in a series of fixes for PR rtl-optimization/110587,
a compile-time regression with -O0, that attempts to address the underlying
cause.  As noted previously, the pathological test case pr28071.c contains
a large number of useless register-to-register moves that can produce
quadratic behaviour (in LRA).  These moves are generated during RTL
expansion in emit_group_load_1, where the middle-end attempts to simplify
the source before calling extract_bit_field.  This is reasonable if the
source is a complex expression (from before the tree-ssa optimizers), or
a SUBREG, or a hard register, but it's not particularly useful to copy
a pseudo register into a new pseudo register.  This patch eliminates that
redundancy.

The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
777K lines, with this patch it contains 717K lines, i.e. saving about 60K
lines (admittedly of debugging text output, but it makes the point).


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

As always, I'm happy to revert this change quickly if there's a problem,
and investigate why this additional copy might (still) be needed on other
non-x86 targets.


2023-07-25  Roger Sayle  

gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Avoid copying a pseudo register into
a new pseudo register, i.e. only copy hard regs into a new pseudo.


Thanks in advance,
Roger
--

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..11d041b 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree 
type,
 be loaded directly into the destination.  */
   src = orig_src;
   if (!MEM_P (orig_src)
+ && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
  && (!CONSTANT_P (orig_src)
  || (GET_MODE (orig_src) != mode
  && GET_MODE (orig_src) != VOIDmode)))
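
The effect of the added test on the guard can be read off a small truth-table model. This is a sketch with boolean stand-ins for the RTL predicates; the struct `sketch_src` and function `sketch_needs_copy` are hypothetical names, and only the shape of the condition is taken from the hunk.

```c
#include <stdbool.h>

/* Illustration only: boolean stand-ins for the RTL predicates.  */
struct sketch_src
{
  bool is_mem;        /* MEM_P */
  bool is_reg;        /* REG_P */
  bool is_hard_reg;   /* HARD_REGISTER_P, meaningful only if is_reg */
  bool is_constant;   /* CONSTANT_P */
  bool mode_matches;  /* GET_MODE (src) == mode */
  bool is_voidmode;   /* GET_MODE (src) == VOIDmode */
};

/* Return true if the patched condition would copy SRC into a
   new pseudo before extracting bit-fields from it.  */
static bool
sketch_needs_copy (const struct sketch_src *s)
{
  return !s->is_mem
         && (!s->is_reg || s->is_hard_reg)
         && (!s->is_constant || (!s->mode_matches && !s->is_voidmode));
}
```

In this model a pseudo register no longer triggers the copy, while a hard register, a SUBREG-like non-register expression, or a constant in a different mode still does.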

Re: Re: [PATCH] VECT: Support CALL vectorization for COND_LEN_*

Hi, Richard.
>> I think we should have an internal-fn helper that returns IFN_COND_LEN_*
>> for a given IFN_COND_*.  It could handle IFN_MASK_LOAD -> IFN_MASK_LEN_LOAD
>> etc. too.
Could you name this helper function for me? Should it be called
"get_conditional_len_internal_fn_for_conditional_fn"?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-25 18:21
To: juzhe.zhong@rivai.ai
CC: rguenther; gcc-patches
Subject: Re: [PATCH] VECT: Support CALL vectorization for COND_LEN_*
"juzhe.zh...@rivai.ai"  writes:
> Thanks Richard.
>
> Do you suggest we should add a macro like this first:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
>  DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##optab, cond_##TYPE)
>   DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##optab, 
> cond_len_##TYPE)
> #endif
 
Yeah.  (Think there's a missing backslash though.)
 
> If yes, maybe I should first do this in a single patch first?
 
Yeah, doing it as a separate patch sounds good.
 
Richard

Re: [PATCH v3] x86: Properly find the maximum stack slot alignment

On Mon, Jul 24, 2023 at 10:36 PM H.J. Lu via Gcc-patches
 wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.
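
The first step described here, collecting every register defined from the stack or frame registers, is a worklist propagation. The sketch below models it on a toy representation and is not the patch's code: each "insn" is reduced to a source and a destination register number, registers are bits in a 32-bit mask, and all `sketch_` names are hypothetical.

```c
#include <stddef.h>

/* Illustration only: each "insn" is a register copy DEST = f(SRC).  */
struct sketch_copy
{
  unsigned src;
  unsigned dest;
};

/* Starting from the registers in SEED (a bit mask, bit N = register N,
   N < 32), return the set of registers that are defined, directly or
   transitively, from a register in the set.  */
static unsigned
sketch_stack_access_regs (unsigned seed, const struct sketch_copy *insns,
                          size_t n)
{
  unsigned set = seed;
  unsigned worklist = seed;

  while (worklist)
    {
      /* Pop the lowest-numbered register from the worklist.  */
      unsigned reg = 0;
      while (!(worklist & (1u << reg)))
        reg++;
      worklist &= ~(1u << reg);

      for (size_t i = 0; i < n; i++)
        if (insns[i].src == reg && !(set & (1u << insns[i].dest)))
          {
            set |= 1u << insns[i].dest;
            worklist |= 1u << insns[i].dest;
          }
    }
  return set;
}
```

Memory accesses through any register in the returned set would then be candidates for updating the maximum stack slot alignment.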

Looks good to me from an algorithmic/DF perspective - I'm leaving it for
x86 maintainers to assess the functional bits.

Thanks,
Richard.

> gcc/
>
> PR target/109780
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.
> ---
>  gcc/config/i386/i386.cc| 128 +
>  gcc/testsuite/g++.target/i386/pr109780-1.C |  72 
>  gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 +++
>  gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 
>  4 files changed, 214 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index caca74d6dec..b71fd9401ef 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -8084,6 +8084,65 @@ output_probe_stack_range (rtx reg, rtx end)
>return "";
>  }
>
> +/* Update the maximum stack slot alignment from memory alignment in
> +   PAT.  */
> +
> +static void
> +ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
> +{
> +  /* This insn may reference stack slot.  Update the maximum stack slot
> + alignment.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, pat, ALL)
> +if (MEM_P (*iter))
> +  {
> +   unsigned int alignment = MEM_ALIGN (*iter);
> +   unsigned int *stack_alignment
> + = (unsigned int *) data;
> +   if (alignment > *stack_alignment)
> + *stack_alignment = alignment;
> +   break;
> +  }
> +}
> +
> +/* Find all registers defined with REG.  */
> +
> +static void
> +ix86_find_all_reg_use (HARD_REG_SET _slot_access,
> +  unsigned int reg, auto_bitmap )
> +{
> +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> +   ref != NULL;
> +   ref = DF_REF_NEXT_REG (ref))
> +{
> +  if (DF_REF_IS_ARTIFICIAL (ref))
> +   continue;
> +
> +  rtx_insn *insn = DF_REF_INSN (ref);
> +  if (!NONDEBUG_INSN_P (insn))
> +   continue;
> +
> +  rtx set = single_set (insn);
> +  if (!set)
> +   continue;
> +
> +  rtx src = SET_SRC (set);
> +  if (MEM_P (src))
> +   continue;
> +
> +  rtx dest = SET_DEST (set);
> +  if (!REG_P (dest))
> +   continue;
> +
> +  if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
> +   continue;
> +
> +  /* Add this register to stack_slot_access.  */
> +  add_to_hard_reg_set (_slot_access, Pmode, REGNO (dest));
> +  bitmap_set_bit (worklist, REGNO (dest));
> +}
> +}
> +
>  /* Set stack_frame_required to false if stack frame isn't required.
> Update STACK_ALIGNMENT to the largest alignment, in bits, of stack
> slot used if stack frame is required and CHECK_STACK_SLOT is true.  */
> @@ -8102,10 +8161,6 @@ ix86_find_max_used_stack_alignment (unsigned int 
> _alignment,
>add_to_hard_reg_set (_up_by_prologue, Pmode,
>HARD_FRAME_POINTER_REGNUM);
>
> -  /* The preferred stack alignment is the minimum stack alignment.  */
> -  if (stack_alignment > crtl->preferred_stack_boundary)
> -stack_alignment = crtl->preferred_stack_boundary;
> -
>bool require_stack_frame = false;
>
>FOR_EACH_BB_FN (bb, cfun)
> @@ -8117,27 +8172,58 @@ ix86_find_max_used_stack_alignment (unsigned int 
> _alignment,
>set_up_by_prologue))
>   {
> require_stack_frame = true;
> -
> -   if (check_stack_slot)
> - {
> -   /* Find the maximum stack alignment.  */
> -   subrtx_iterator::array_type array;
> -   FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
> - if (MEM_P (*iter)
> - && (reg_mentioned_p (stack_pointer_rtx,
> -  *iter)
> - || reg_mentioned_p (frame_pointer_rtx,
> - *iter)))
> -   {
> - unsigned int alignment = MEM_ALIGN (*iter);
> - if (alignment > stack_alignment)
> -   stack_alignment = alignment;
> -   }
> -

Re: RISC-V: Folding memory for FP + constant case

2023-07-25 Thread Jivan Hakobyan via Gcc-patches

Hi.

I re-ran the benchmarks and, fortunately, got the same improvement.
I also compared leela's code and figured out the reason.

Actually, my patch and Manolis's patch do the same thing. The only difference
is the execution order.
Because f-m-o runs after register allocation, it cannot eliminate the
redundant move of 'sp' to another register.

Here is an example.

int core_bench_state(int *ptr) {
   int final_counts[100] = {0};

   while (*ptr) {
     int id = foo();
     final_counts[id]++;
     ptr++;
   }

   return final_counts[0];
}


For this loop, the f-m-o pass generates the following.

.L3:
    call foo
    mv a5,sp
    sh2add a0,a0,a5
    lw a5,0(a0)
    lw a4,4(s0)
    addi s0,s0,4
    addiw a5,a5,1
    sw a5,0(a0)
    bne a4,zero,.L3


Here the 'mv a5,sp' instruction is redundant.
Leela's FastState::try_move() function has a loop that iterates over 1.3
billion times and contains 5 memory folding cases (5 redundant moves).

Besides that, I have checked the build failure on x264_r. It is already
fixed in the third version.
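
The address rewrite that the patch performs, from (fp + (index << C1)) + C2 to ((fp + C2) + (index << C1)), can be checked with plain integers. The following is a sketch only: the function names are hypothetical, and addresses are modelled as `uintptr_t` values rather than RTL.

```c
#include <stdint.h>

/* Illustration only: addresses are modelled as plain integers.  */

/* Original form: (fp + (index << c1)) + c2.  */
static uintptr_t
sketch_addr_original (uintptr_t fp, uintptr_t index, unsigned c1, uintptr_t c2)
{
  return (fp + (index << c1)) + c2;
}

/* Rewritten form: (fp + c2) + (index << c1).  The first operand does
   not depend on index, so it can be computed once outside the loop.  */
static uintptr_t
sketch_addr_folded (uintptr_t fp_plus_c2, uintptr_t index, unsigned c1)
{
  return fp_plus_c2 + (index << c1);
}
```

Both forms give the same address; the difference is that fp + C2 becomes a loop-invariant value when the rewrite happens before register allocation.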

On Sat, Jul 15, 2023 at 10:16 AM Jeff Law  wrote:

>
>
> On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
> > Accessing a local array element turns into a load from an address of
> > the form (fp + (index << C1)) + C2. When the access is in a loop, part
> > of this is a loop-invariant computation. For some reason, moving that
> > part out cannot be done in the loop-invariant passes. But we can handle
> > it in a target-specific hook (legitimize_address). That provides an
> > opportunity to rewrite the memory access into a form more suitable for
> > the target architecture.
> >
> > This patch solves the mentioned case by rewriting it to
> > ((fp + C2) + (index << C1)). I have evaluated it on SPEC2017 and got
> > an improvement on leela (over 7b instructions, .39% of the dynamic
> > count), which dwarfs the regression for gcc (14m instructions, .0012%
> > of the dynamic count).
> >
> >
> > gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimize_address):
> > Handle folding. (mem_shadd_or_shadd_rtx_p): New predicate.
> So I still need to give the new version a review.  But a high level
> question -- did you re-run the benchmarks with this version to verify
> that we still saw the same nice improvement in leela?
>
> The reason I ask is when I use this on Ventana's internal tree I don't
> see any notable differences in the dynamic instruction counts.  And
> probably the most critical difference between the upstream tree and
> Ventana's tree in this space is Ventana's internal tree has an earlier
> version of the fold-mem-offsets work from Manolis.
>
> It may ultimately be the case that this work and Manolis's f-m-o patch
> have a lot of overlap in terms of their final effect on code generation.
>   Manolis's pass runs much later (after register allocation), so it's
> not going to address the loop-invariant-code-motion issue that
> originally got us looking into this space.  But his pass is generic
> enough that it helps other targets.  So we may ultimately want both.
>
> Anyway, just wanted to verify if this variant is still showing the nice
> improvement on leela that the prior version did.
>
> Jeff
>
> ps.  I know you're on PTO.  No rush on responding -- enjoy the time off.
>
>

-- 
With the best regards
Jivan Hakobyan

Re: [PATCH] Replace lra-spill.cc's return_regno_p with return_reg_p.

On Sat, Jul 22, 2023 at 6:45 PM Roger Sayle  wrote:
>
>
> This patch is my attempt to address the compile-time hog issue
> in PR rtl-optimization/110587.  Richard Biener's analysis shows that
> compilation of pr28071.c with -O0 currently spends ~70% in timer
> "LRA non-specific" due to return_regno_p failing to filter a large
> number of calls to regno_in_use_p, resulting in quadratic behaviour.
>
> For this pathological test case, things can be improved significantly.
> Although the return register (%rax) is indeed mentioned a large
> number of times in this function, due to inlining, the inlined functions
> access the returned register in TImode, whereas the current function
> returns a DImode.  Hence the check to see if we're the last SET of the
> return register, which should be followed by a USE, can be improved
> by also testing the mode.  Implementation-wise, rather than pass an
> additional mode parameter to LRA's local return_regno_p function, which
> only has a single caller, it's more convenient to pass the rtx REG_P,
> and from this extract both the REGNO and the mode in the callee, and
> rename this function to return_reg_p.
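
The difference between the two checks can be shown with a reduced model. This is a sketch, not the LRA code: a register is represented only by its number and mode, the return value is a single register, and the `sketch_` names and mode enum are hypothetical.

```c
#include <stdbool.h>

/* Illustration only: a register is modelled by its number and mode.  */
enum sketch_mode { SK_DImode, SK_TImode };

struct sketch_reg
{
  unsigned regno;
  enum sketch_mode mode;
};

/* Old-style check: only the register number is compared.  */
static bool
sketch_return_regno_p (unsigned regno, const struct sketch_reg *ret)
{
  return regno == ret->regno;
}

/* New-style check: both register number and mode must match.  */
static bool
sketch_return_reg_p (const struct sketch_reg *reg, const struct sketch_reg *ret)
{
  return reg->regno == ret->regno && reg->mode == ret->mode;
}
```

With a DImode return register and an inlined access to the same register number in TImode, the old check matches and the new one does not, which is the filtering that avoids the repeated regno_in_use_p calls.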
>
> The good news is that with this change "LRA non-specific" drops from
> 70% to 13%.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  Ok for mainline?

I think this is a good heuristic improvement, but I don't feel competent
to assess the constraints this implies on the structure of the
return value setting instructions.  Previously we'd preserve
noop copies mentioning the return hardreg, but now only if that noop
copy has the same mode as the return_rtx.  I don't even understand
why we need to preserve this noop copy in the first place.

Looking at the history of return_regno_p and the surroundings of the
single caller tells me this is a place of "interestingness" ...

Can somebody summarize why we need to preserve a noop-move
between the return_rtx (hard?)regs?

The comment

  /* IRA can generate move insns involving pseudos.  It is
 better remove them earlier to speed up compiler a bit.
 It is also better to do it here as they might not pass
 final RTL check in LRA, (e.g. insn moving a control
 register into itself).  So remove an useless move insn
 unless next insn is USE marking the return reg (we should
 save this as some subsequent optimizations assume that
 such original insns are saved).  */

says this is about removing noop copies of _pseudos_ for correctness
reasons.  So, can't we simply scrap the return_regno_p and regno_in_use_p
checks and always preserve noop moves between hardregs here, leaving
that to other passes?

I'm going to bootstrap & test that for the fun of it.

Thanks,
Richard.

>
>
> 2023-07-22  Roger Sayle  
>
> gcc/ChangeLog
> PR middle-end/28071
> PR rtl-optimization/110587
> * lra-spills.cc (return_regno_p): Change argument and rename to...
> (return_reg_p): Check if the given register RTX has the same
> REGNO and machine mode as the function's return value.
> (lra_final_code_change): Update call to return_reg_p.
>
>
> Thanks in advance,
> Roger
> --
>

[patch] OpenMP/Fortran: Reject declarations between target + teams (was: [Patch] OpenMP/Fortran: Reject not strictly nested target -> teams [PR110725, PR71065])

2023-07-25 Thread Tobias Burnus


On 24.07.23 21:49, Jakub Jelinek via Fortran wrote:
> On Mon, Jul 24, 2023 at 09:43:10PM +0200, Tobias Burnus wrote:
>> This patch adds diagnostic for additional code alongside a nested teams
>> in a target region.
>
> Thanks for working on this.  The fuzzy thing on the Fortran side is
> if e.g. multiple nested BLOCK statements can appear sandwiched in between
> target and teams (of course without declarations in them)


Talking about declarations, I realized that I had failed to diagnose them;
the attached patch should handle them as well. (Except for 'omp nothing'
and 'omp error', which return ST_NONE.)

Comments, remarks, suggestions? If none or no changes are required,
I will later commit the attached follow-up patch.

Tobias

OpenMP/Fortran: Reject declarations between target + teams

While commit r14-2754-g2e31fe431b08b0302e1fa8a1c18ee51adafd41df
detected executable statements, declarations do not show up as
executable statements.  Hence, we now check whether the first
statement after TARGET is TEAMS - such that we can detect data
statements like type or variable declarations.  Fortran semantics
ensures that only executable directives/statements can come after
'!$omp end teams' such that those can be detected with the
previous check.

Note that statements returning ST_NONE such as 'omp nothing' or
'omp error at(compilation)' will still slip through.

	PR fortran/110725
	PR middle-end/71065

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_omp_clauses): Add target_first_st_is_teams.
	* parse.cc (parse_omp_structured_block): Set it if the first
	statement in the structured block of a TARGET is TEAMS or
	a combined/composite starting with TEAMS.
	* openmp.cc (resolve_omp_target): Also show an error for
	contains_teams_construct without target_first_st_is_teams.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/teams-6.f90: New test.

 gcc/fortran/gfortran.h |  2 +-
 gcc/fortran/openmp.cc  | 13 ++---
 gcc/fortran/parse.cc   | 25 --
 gcc/testsuite/gfortran.dg/gomp/teams-6.f90 | 78 ++
 4 files changed, 108 insertions(+), 10 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 577ef807af7..9a00e6dea6f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1575,7 +1575,7 @@ typedef struct gfc_omp_clauses
   unsigned order_unconstrained:1, order_reproducible:1, capture:1;
   unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
   unsigned non_rectangular:1, order_concurrent:1;
-  unsigned contains_teams_construct:1;
+  unsigned contains_teams_construct:1, target_first_st_is_teams:1;
   ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
   ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
   ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 675011a18ce..52eeaf2d4da 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -10666,12 +10666,13 @@ resolve_omp_target (gfc_code *code)
 
   if (!code->ext.omp_clauses->contains_teams_construct)
 return;
-  if ((GFC_IS_TEAMS_CONSTRUCT (code->block->next->op)
-   && code->block->next->next == NULL)
-  || (code->block->next->op == EXEC_BLOCK
-	  && code->block->next->next
-	  && GFC_IS_TEAMS_CONSTRUCT (code->block->next->next->op)
-	  && code->block->next->next->next == NULL))
+  if (code->ext.omp_clauses->target_first_st_is_teams
+  && ((GFC_IS_TEAMS_CONSTRUCT (code->block->next->op)
+	   && code->block->next->next == NULL)
+	  || (code->block->next->op == EXEC_BLOCK
+	  && code->block->next->next
+	  && GFC_IS_TEAMS_CONSTRUCT (code->block->next->next->op)
+	  && code->block->next->next->next == NULL)))
 return;
   gfc_code *c = code->block->next;
   while (c && !GFC_IS_TEAMS_CONSTRUCT (c->op))
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 011a39c3d04..aa6bb663def 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -5766,7 +5766,7 @@ parse_openmp_allocate_block (gfc_statement omp_st)
 static gfc_statement
 parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 {
-  gfc_statement st, omp_end_st;
+  gfc_statement st, omp_end_st, first_st;
   gfc_code *cp, *np;
   gfc_state_data s;
 
@@ -5857,7 +5857,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
   gfc_namespace *my_ns = NULL;
   gfc_namespace *my_parent = NULL;
 
-  st = next_statement ();
+  first_st = st = next_statement ();
 
   if (st == ST_BLOCK)
 {
@@ -5876,9 +5876,28 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
   new_st.ext.block.ns = my_ns;
   new_st.ext.block.assoc = NULL;
   accept_statement

Re: [PATCH v2] RISC-V: Fixbug for fsflags instruction error using immediate.

On Tue, Jul 25, 2023 at 15:29, Jin Ma via Gcc-patches wrote:

> The pattern mistakenly believes that fsflags can use immediate numbers,
> but in fact it does not support it. Immediate numbers should use fsflagsi.
>
> For example:
> __builtin_riscv_fsflags(4);
>
> The following error occurred.
> /tmp/ccoWdWqT.s: Assembler messages:
> /tmp/ccoWdWqT.s:14: Error: illegal operands `fsflags 4'
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md: Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fsflags.c: New test.
> ---
>  gcc/config/riscv/riscv.md|  8 +---
>  gcc/testsuite/gcc.target/riscv/fsflags.c | 16 
>  2 files changed, 21 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fsflags.c
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 4615e811947..1ec85e30d7e 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -3074,7 +3074,7 @@ (define_insn "riscv_frcsr"
>"frcsr\t%0")
>
>  (define_insn "riscv_fscsr"
> -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")]
> UNSPECV_FSCSR)]
> +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r")]
> UNSPECV_FSCSR)]
>"TARGET_HARD_FLOAT || TARGET_ZFINX"
>"fscsr\t%0")
>
> @@ -3085,9 +3085,11 @@ (define_insn "riscv_frflags"
>"frflags\t%0")
>
>  (define_insn "riscv_fsflags"
> -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")]
> UNSPECV_FSFLAGS)]
> +  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "r,K")]
> UNSPECV_FSFLAGS)]
>"TARGET_HARD_FLOAT || TARGET_ZFINX"
> -  "fsflags\t%0")
> +  "@
> +   fsflags\t%0
> +   fsflagsi\t%0")
>

You can use fsflags%i0 here; see the addsi pattern for reference.
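
As a rough model of what the `%i0` modifier achieves in a template such as "fsflags%i0\t%0": an "i" suffix is printed when operand 0 is an immediate and nothing is printed when it is a register. The sketch below only imitates that behaviour with snprintf; the function name `sketch_print_fsflags` is hypothetical and it is not the RISC-V backend's operand printer.

```c
#include <stdio.h>
#include <stdbool.h>

/* Illustration only: format an fsflags instruction the way a template
   such as "fsflags%i0\t%0" is meant to work: an "i" suffix is added
   when operand 0 is an immediate rather than a register.  */
static int
sketch_print_fsflags (char *out, size_t size, bool is_immediate,
                      const char *operand)
{
  return snprintf (out, size, "fsflags%s\t%s",
                   is_immediate ? "i" : "", operand);
}
```

This keeps a single output template for both constraint alternatives instead of the two-line "@" form in the v2 patch.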


>  (define_insn "*riscv_fsnvsnan2"
>[(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
> diff --git a/gcc/testsuite/gcc.target/riscv/fsflags.c
> b/gcc/testsuite/gcc.target/riscv/fsflags.c
> new file mode 100644
> index 000..74a97b8a7c7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fsflags.c
> @@ -0,0 +1,16 @@
> +/* Verify that fsflags is using the correct register or immediate.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-O" } */
> +
> +void foo1 (int a)
> +{
> +   __builtin_riscv_fsflags(a);
> +}
> +void foo2 ()
> +{
> +   __builtin_riscv_fsflags(4);
> +}
> +
> +/* { dg-final { scan-assembler-times "fsflags\t" 1 } } */
> +/* { dg-final { scan-assembler-times "fsflagsi\t" 1 } } */
> --
> 2.17.1
>
>

Re: [PATCH] VECT: Support CALL vectorization for COND_LEN_*

2023-07-25 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
> Thanks Richard.
>
> Do you suggest we should add a macro like this first:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
>  DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##optab, cond_##TYPE)
>   DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##optab, 
> cond_len_##TYPE)
> #endif

Yeah.  (Think there's a missing backslash though.)

> If yes, maybe I should first do this in a single patch first?

Yeah, doing it as a separate patch sounds good.

Richard

Re: Re: [PATCH] VECT: Support CALL vectorization for COND_LEN_*