RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-28 Thread Li, Pan2 via Gcc-patches
Cool, Thank you!

Pan



Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-28 Thread Kito Cheng via Gcc-patches
pushed, thanks!


RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-28 Thread Li, Pan2 via Gcc-patches
Passed both the X86 bootstrap and regression test.

Pan


RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Li, Pan2 via Gcc-patches
Thanks, kito.

Yes, you are right. I am investigating this right now on the simplify-rtl
side, given we had one similar case, VMORN, previously.

Pan


Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Kito Cheng via Gcc-patches
LGTM

I thought it could optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl) too,
but I don't know why

(eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
    (reg/v:VNx128QI 137 [ v1 ]))

is not evaluated to true. Anyway, I guess that should be your next step
to investigate :)

On Fri, Apr 28, 2023 at 10:46 AM  wrote:
>
> From: Pan Li 
>
> When some RVV integer compare operators act on the same vector
> register without a mask, they can be simplified to VMCLR.
>
> This PATCH allows the ne, lt, ltu, gt and gtu operators to perform
> this kind of simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24        <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As above, we may have one instruction eliminated and require fewer
> vector registers.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Add new define split to perform
>   the simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li 
> Co-authored-by: kito-cheng 
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++
>  2 files changed, 323 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
>    "vleff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -------------------------------------------------------------------------------
> +;;  Integer Compare Instructions Simplification
> +;; -------------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -------------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB 0 "register_operand")
> +        (if_then_else:VB
> +          (unspec:VB
> +            [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +             (match_operand 4 "vector_length_operand")
> +             (match_operand 5 "const_int_operand")
> +             (match_operand 6 "const_int_operand")
> +             (reg:SI VL_REGNUM)
> +             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +          (match_operand:VB 3 "vector_move_operand")
> +          (match_operand:VB 2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                             RVV_VUNDEF (<MODE>mode), operands[3],
> +                             operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
> +}