RE: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
Updated the PATCH v7 for comments addressing. https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619939.html Pan From: Li, Pan2 Sent: Thursday, May 25, 2023 11:17 AM To: juzhe.zh...@rivai.ai; gcc-patches Cc: Kito.cheng ; Wang, Yanzhang Subject: RE: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence Oops, forget to remove it in previous version, will wait a while and update them together. Pan From: juzhe.zh...@rivai.ai<mailto:juzhe.zh...@rivai.ai> mailto:juzhe.zh...@rivai.ai>> Sent: Thursday, May 25, 2023 11:14 AM To: Li, Pan2 mailto:pan2...@intel.com>>; gcc-patches mailto:gcc-patches@gcc.gnu.org>> Cc: Kito.cheng mailto:kito.ch...@sifive.com>>; Li, Pan2 mailto:pan2...@intel.com>>; Wang, Yanzhang mailto:yanzhang.w...@intel.com>> Subject: Re: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence * machmode.h (VECTOR_BOOL_MODE_P): New macro. --- a/gcc/machmode.h +++ b/gcc/machmode.h @@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES]; || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM\ || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM) +/* Nonzero if MODE is a vector bool mode. */ +#define VECTOR_BOOL_MODE_P(MODE)\ + (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL) \ + Why do you add this? But no use. You should drop this. juzhe.zh...@rivai.ai<mailto:juzhe.zh...@rivai.ai> From: pan2.li<mailto:pan2...@intel.com> Date: 2023-05-25 11:09 To: gcc-patches<mailto:gcc-patches@gcc.gnu.org> CC: juzhe.zhong<mailto:juzhe.zh...@rivai.ai>; kito.cheng<mailto:kito.ch...@sifive.com>; pan2.li<mailto:pan2...@intel.com>; yanzhang.wang<mailto:yanzhang.w...@intel.com> Subject: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence From: Pan Li mailto:pan2...@intel.com>> This patch would like to optimize the VLS vector initialization like repeating sequence. From the vslide1down to the vmerge with a simple cost model, aka every instruction only has 1 cost. Given code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax typedef int64_t vnx32di __attribute__ ((vector_size (256))); __attribute__ ((noipa)) void f_vnx32di (int64_t a, int64_t b, int64_t *out) { vnx32di v = { a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, }; *(vnx32di *) out = v; } Before this patch: vslide1down.vx (x31 times) After this patch: li a5,-1431654400 addi a5,a5,-1365 li a3,-1431654400 addi a3,a3,-1366 slli a5,a5,32 add a5,a5,a3 vsetvli a4,zero,e64,m8,ta,ma vmv.v.x v8,a0 vmv.s.x v0,a5 vmerge.vxm v8,v8,a1,v0 vs8r.v v8,0(a2) Since we dont't have SEW = 128 in vec_duplicate, we can't combine ab into SEW = 128 element and then broadcast this big element. Signed-off-by: Pan Li mailto:pan2...@intel.com>> Co-Authored by: Juzhe-Zhong mailto:juzhe.zh...@rivai.ai>> gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): New type. * config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro. (rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced class member. (rvv_builder::get_merged_repeating_sequence): (rvv_builder::repeating_sequence_use_merge_profitable_p): New function to evaluate the optimization cost. (rvv_builder::get_merge_scalar_mask): New function to get the merge mask. (emit_scalar_move_insn): New function to emit vmv.s.x. (emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x. (emit_nonvlmax_integer_move_insn): New function to emit nonvlmax vmv.v.x. (get_repeating_sequence_dup_machine_mode): New function to get the dup machine mode. (expand_vector_init_merge_repeating_sequence): New function to perform the optimization. (expand_vec_init): Add this vector init optimization. * config/riscv/riscv.h (BITS_PER_WORD): New macro. * machmode.h (VECTOR_BOOL_MODE_P): New macro. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test. Signed-off-by: Pan Li mailto:pan2...@intel.com>> --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 225 +- gcc/config/riscv/riscv.h | 1 + gcc/machmode.h| 4 + .../vls-vlmax/init-repeat-sequence-1.c
RE: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
Oops, forget to remove it in previous version, will wait a while and update them together. Pan From: juzhe.zh...@rivai.ai Sent: Thursday, May 25, 2023 11:14 AM To: Li, Pan2 ; gcc-patches Cc: Kito.cheng ; Li, Pan2 ; Wang, Yanzhang Subject: Re: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence * machmode.h (VECTOR_BOOL_MODE_P): New macro. --- a/gcc/machmode.h +++ b/gcc/machmode.h @@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES]; || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM\ || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM) +/* Nonzero if MODE is a vector bool mode. */ +#define VECTOR_BOOL_MODE_P(MODE)\ + (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL) \ + Why do you add this? But no use. You should drop this. juzhe.zh...@rivai.ai<mailto:juzhe.zh...@rivai.ai> From: pan2.li<mailto:pan2...@intel.com> Date: 2023-05-25 11:09 To: gcc-patches<mailto:gcc-patches@gcc.gnu.org> CC: juzhe.zhong<mailto:juzhe.zh...@rivai.ai>; kito.cheng<mailto:kito.ch...@sifive.com>; pan2.li<mailto:pan2...@intel.com>; yanzhang.wang<mailto:yanzhang.w...@intel.com> Subject: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence From: Pan Li mailto:pan2...@intel.com>> This patch would like to optimize the VLS vector initialization like repeating sequence. From the vslide1down to the vmerge with a simple cost model, aka every instruction only has 1 cost. Given code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax typedef int64_t vnx32di __attribute__ ((vector_size (256))); __attribute__ ((noipa)) void f_vnx32di (int64_t a, int64_t b, int64_t *out) { vnx32di v = { a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, }; *(vnx32di *) out = v; } Before this patch: vslide1down.vx (x31 times) After this patch: li a5,-1431654400 addi a5,a5,-1365 li a3,-1431654400 addi a3,a3,-1366 slli a5,a5,32 add a5,a5,a3 vsetvli a4,zero,e64,m8,ta,ma vmv.v.x v8,a0 vmv.s.x v0,a5 vmerge.vxm v8,v8,a1,v0 vs8r.v v8,0(a2) Since we dont't have SEW = 128 in vec_duplicate, we can't combine ab into SEW = 128 element and then broadcast this big element. Signed-off-by: Pan Li mailto:pan2...@intel.com>> Co-Authored by: Juzhe-Zhong mailto:juzhe.zh...@rivai.ai>> gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): New type. * config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro. (rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced class member. (rvv_builder::get_merged_repeating_sequence): (rvv_builder::repeating_sequence_use_merge_profitable_p): New function to evaluate the optimization cost. (rvv_builder::get_merge_scalar_mask): New function to get the merge mask. (emit_scalar_move_insn): New function to emit vmv.s.x. (emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x. (emit_nonvlmax_integer_move_insn): New function to emit nonvlmax vmv.v.x. (get_repeating_sequence_dup_machine_mode): New function to get the dup machine mode. (expand_vector_init_merge_repeating_sequence): New function to perform the optimization. (expand_vec_init): Add this vector init optimization. * config/riscv/riscv.h (BITS_PER_WORD): New macro. * machmode.h (VECTOR_BOOL_MODE_P): New macro. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test. Signed-off-by: Pan Li mailto:pan2...@intel.com>> --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 225 +- gcc/config/riscv/riscv.h | 1 + gcc/machmode.h| 4 + .../vls-vlmax/init-repeat-sequence-1.c| 21 ++ .../vls-vlmax/init-repeat-sequence-2.c| 24 ++ .../vls-vlmax/init-repeat-sequence-3.c| 25 ++ .../vls-vlmax/init-repeat-sequence-4.c| 15 ++ .../vls-vlmax/init-repeat-sequence-5.c| 17 ++ .../vls-vlmax/init-repeat-sequence-run-1.c| 47 .../vls-vlmax/init-repeat-sequence-run-2.c| 46 .../vls-vlmax/init-repeat-sequence-run-3.c| 41 12 files changed, 461 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c create mode 100644 g
Re: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
* machmode.h (VECTOR_BOOL_MODE_P): New macro. --- a/gcc/machmode.h +++ b/gcc/machmode.h @@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES]; || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM \ || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM) +/* Nonzero if MODE is a vector bool mode. */ +#define VECTOR_BOOL_MODE_P(MODE) \ + (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL) \ + Why do you add this? But no use. You should drop this. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-05-25 11:09 To: gcc-patches CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang Subject: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence From: Pan Li This patch would like to optimize the VLS vector initialization like repeating sequence. From the vslide1down to the vmerge with a simple cost model, aka every instruction only has 1 cost. Given code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax typedef int64_t vnx32di __attribute__ ((vector_size (256))); __attribute__ ((noipa)) void f_vnx32di (int64_t a, int64_t b, int64_t *out) { vnx32di v = { a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, }; *(vnx32di *) out = v; } Before this patch: vslide1down.vx (x31 times) After this patch: li a5,-1431654400 addi a5,a5,-1365 li a3,-1431654400 addi a3,a3,-1366 slli a5,a5,32 add a5,a5,a3 vsetvli a4,zero,e64,m8,ta,ma vmv.v.x v8,a0 vmv.s.x v0,a5 vmerge.vxm v8,v8,a1,v0 vs8r.v v8,0(a2) Since we dont't have SEW = 128 in vec_duplicate, we can't combine ab into SEW = 128 element and then broadcast this big element. Signed-off-by: Pan Li Co-Authored by: Juzhe-Zhong gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): New type. * config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro. (rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced class member. (rvv_builder::get_merged_repeating_sequence): (rvv_builder::repeating_sequence_use_merge_profitable_p): New function to evaluate the optimization cost. (rvv_builder::get_merge_scalar_mask): New function to get the merge mask. (emit_scalar_move_insn): New function to emit vmv.s.x. (emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x. (emit_nonvlmax_integer_move_insn): New function to emit nonvlmax vmv.v.x. (get_repeating_sequence_dup_machine_mode): New function to get the dup machine mode. (expand_vector_init_merge_repeating_sequence): New function to perform the optimization. (expand_vec_init): Add this vector init optimization. * config/riscv/riscv.h (BITS_PER_WORD): New macro. * machmode.h (VECTOR_BOOL_MODE_P): New macro. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 225 +- gcc/config/riscv/riscv.h | 1 + gcc/machmode.h| 4 + .../vls-vlmax/init-repeat-sequence-1.c| 21 ++ .../vls-vlmax/init-repeat-sequence-2.c| 24 ++ .../vls-vlmax/init-repeat-sequence-3.c| 25 ++ .../vls-vlmax/init-repeat-sequence-4.c| 15 ++ .../vls-vlmax/init-repeat-sequence-5.c| 17 ++ .../vls-vlmax/init-repeat-sequence-run-1.c| 47 .../vls-vlmax/init-repeat-sequence-run-2.c| 46 .../vls-vlmax/init-repeat-sequence-run-3.c| 41 12 files changed, 461 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c diff --git a/gcc/config/riscv/riscv-protos.h b/
[PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
From: Pan Li This patch would like to optimize the VLS vector initialization like repeating sequence. From the vslide1down to the vmerge with a simple cost model, aka every instruction only has 1 cost. Given code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax typedef int64_t vnx32di __attribute__ ((vector_size (256))); __attribute__ ((noipa)) void f_vnx32di (int64_t a, int64_t b, int64_t *out) { vnx32di v = { a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b, }; *(vnx32di *) out = v; } Before this patch: vslide1down.vx (x31 times) After this patch: li a5,-1431654400 addia5,a5,-1365 li a3,-1431654400 addia3,a3,-1366 sllia5,a5,32 add a5,a5,a3 vsetvli a4,zero,e64,m8,ta,ma vmv.v.x v8,a0 vmv.s.x v0,a5 vmerge.vxm v8,v8,a1,v0 vs8r.v v8,0(a2) Since we dont't have SEW = 128 in vec_duplicate, we can't combine ab into SEW = 128 element and then broadcast this big element. Signed-off-by: Pan Li Co-Authored by: Juzhe-Zhong gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): New type. * config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro. (rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced class member. (rvv_builder::get_merged_repeating_sequence): (rvv_builder::repeating_sequence_use_merge_profitable_p): New function to evaluate the optimization cost. (rvv_builder::get_merge_scalar_mask): New function to get the merge mask. (emit_scalar_move_insn): New function to emit vmv.s.x. (emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x. (emit_nonvlmax_integer_move_insn): New function to emit nonvlmax vmv.v.x. (get_repeating_sequence_dup_machine_mode): New function to get the dup machine mode. (expand_vector_init_merge_repeating_sequence): New function to perform the optimization. (expand_vec_init): Add this vector init optimization. * config/riscv/riscv.h (BITS_PER_WORD): New macro. * machmode.h (VECTOR_BOOL_MODE_P): New macro. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 225 +- gcc/config/riscv/riscv.h | 1 + gcc/machmode.h| 4 + .../vls-vlmax/init-repeat-sequence-1.c| 21 ++ .../vls-vlmax/init-repeat-sequence-2.c| 24 ++ .../vls-vlmax/init-repeat-sequence-3.c| 25 ++ .../vls-vlmax/init-repeat-sequence-4.c| 15 ++ .../vls-vlmax/init-repeat-sequence-5.c| 17 ++ .../vls-vlmax/init-repeat-sequence-run-1.c| 47 .../vls-vlmax/init-repeat-sequence-run-2.c| 46 .../vls-vlmax/init-repeat-sequence-run-3.c| 41 12 files changed, 461 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 36419c95bbd..768b646fec1 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -140,6 +140,7 @@ enum insn_type RVV_MERGE_OP = 4, RVV_CMP_OP = 4, RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand. */ + RVV_SCALAR_MOV_OP = 4, }; enum vlmul_type { diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index f71ad9e46a1..458020ce0a1 100644