RE: [PATCH] RISC-V: Legitimise the const0_rtx for RVV load/store address

2023-04-26 Thread Li, Pan2 via Gcc-patches
Thanks Kito. It comes from some experience of Ju-Zhe for auto vectorization in 
previous.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, April 26, 2023 9:24 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang 
Subject: Re: [PATCH] RISC-V: Legitimise the const0_rtx for RVV load/store 
address

LGTM, pushed to trunk

> This patch try to legitimise the const0_rtx (aka zero register) as the 
> base register for the RVV load/store instructions.
>
> For example:
> vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl) {
>   return __riscv_vle32_v_i32m1 ((int32_t *)0, vl); }

The example is kind of counter intuitive to me, I know it's legal from ISA spec 
level, but can't understand why it's useful...until I saw you mention auto vec 
and index load - I realized this is optimization for gather/scatter code gen.


Re: [PATCH] RISC-V: Legitimise the const0_rtx for RVV load/store address

2023-04-26 Thread Kito Cheng via Gcc-patches
LGTM, pushed to trunk

> This patch try to legitimise the const0_rtx (aka zero register)
> as the base register for the RVV load/store instructions.
>
> For example:
> vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl)
> {
>   return __riscv_vle32_v_i32m1 ((int32_t *)0, vl);
> }

The example is kind of counter intuitive to me, I know it's legal from
ISA spec level, but can't understand why it's useful...until I saw you
mention auto vec and index load - I realized this is optimization for
gather/scatter code gen.


[PATCH] RISC-V: Legitimise the const0_rtx for RVV load/store address

2023-04-26 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch try to legitimise the const0_rtx (aka zero register)
as the base register for the RVV load/store instructions.

For example:
vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl)
{
  return __riscv_vle32_v_i32m1 ((int32_t *)0, vl);
}

Before this patch:
li  a5,0
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(a5)  <- can propagate the const 0 to a5 here
vs1r.v  v24,0(a0)

After this patch:
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(zero)
vs1r.v  v24,0(a0)

As above, this patch allow you to propagaate the const 0 (aka zero
register) to the base register of the RVV Unit-Stride load in the
combine pass. This may benefit the underlying RVV auto-vectorization.

However, the indexed load failed to perform the optimization and it
will be token care of in another PATCH.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_classify_address): Allow
  const0_rtx for the RVV load/store.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: New 
test.

Signed-off-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
---
 gcc/config/riscv/riscv.cc |  17 ++-
 .../base/zero_base_load_store_optimization.c  | 135 ++
 2 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ac8e4420896..a2d2dd0bb67 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1088,9 +1088,22 @@ riscv_classify_address (struct riscv_address_info *info, 
rtx x,
  && riscv_valid_lo_sum_p (info->symbol_type, mode, info->offset));
 
 case CONST_INT:
-  /* RVV load/store disallow CONST_INT.  */
+  /* We only allow the const0_rtx for the RVV load/store.  For example:
++--+
+| li  a5,0 |
+| vsetvli zero,a1,e32,m1,ta,ma |
+| vle32.v v24,0(a5)  <- propagate the const 0 to a5 here.  |
+| vs1r.v  v24,0(a0)|
++--+
+It can be folded to:
++--+
+| vsetvli zero,a1,e32,m1,ta,ma |
+| vle32.v v24,0(zero)  |
+| vs1r.v  v24,0(a0)|
++--+
+This behavior will benefit the underlying RVV auto vectorization.  */
   if (riscv_v_ext_vector_mode_p (mode))
-   return false;
+   return x == const0_rtx;
 
   /* Small-integer addresses don't occur very often, but they
 are legitimate if x0 is a valid base register.  */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c
new file mode 100644
index 000..4b30d3505c5
--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c
@@ -0,0 +1,135 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+// #include 
+#include "riscv_vector.h"
+
+#define float32_t float
+
+// Unit-Stride Load/Store
+vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl)
+{
+  return __riscv_vle32_v_i32m1 ((int32_t *)0, vl);
+}
+
+vuint32m1_t test_vle32_v_u32m1_shortcut (size_t vl)
+{
+  return __riscv_vle32_v_u32m1 ((int32_t *)0, vl);
+}
+
+vfloat32m1_t test_vle32_v_f32m1_shortcut (size_t vl)
+{
+  return __riscv_vle32_v_f32m1 ((float32_t *)0, vl);
+}
+
+void test_vse32_v_i32m1_shortcut (vint32m1_t val, size_t vl)
+{
+  __riscv_vse32_v_i32m1 ((int32_t *)0, val, vl);
+}
+
+void test_vse32_v_u32m1_shortcut (vuint32m1_t val, size_t vl)
+{
+  __riscv_vse32_v_u32m1 ((uint32_t *)0, val, vl);
+}
+
+void test_vse32_v_f32m1_shortcut (vfloat32m1_t val, size_t vl)
+{
+  __riscv_vse32_v_f32m1 ((float32_t *)0, val, vl);
+}
+
+// Stride Load/Store
+vint32m1_t test_vlse32_v_i32m1_shortcut (ptrdiff_t bstride, size_t vl)
+{
+  return  __riscv_vlse32_v_i32m1 ((int32_t *)0, bstride, vl);
+}
+
+vuint32m1_t test_vlse32_v_u32m1_shortcut (ptrdiff_t bstride, size_t vl)
+{
+  return  __riscv_vlse32_v_u32m1 ((uint32_t *)0, bstride, vl);
+}
+
+vfloat32m1_t test_vlse32_v_f32m1_shortcut (ptrdiff_t bstride, size_t vl)
+{
+  return  __riscv_vlse32_v_f32m1 ((float32_t *)0, bstride, vl);
+}
+
+void test_vsse32_v_i32m1_shortcut (ptrdiff_t bstride, vint32m1_t val, size_t 
vl)
+{
+  __riscv_vsse32_v_i32m1 ((int32_t *)0, bstride, val, vl);
+}
+
+void test_vsse32_v_u32m1_shortcut (ptrdiff_t bstride, vuint32m1_t val, size_t 
vl)
+{
+  __riscv_vsse32_v_u32m1 ((uint32_t *)0, bstride, val, vl);
+}
+
+void test_vsse32_v_f32m1_shortcut (ptrdiff_t bstride,