Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
On Wed, 8 Nov 2023, Kito Cheng wrote: > OK, then LGTM, thanks for the explanation :) Please don't top-post on a GCC mailing list (and preferably in off-list replies to such mailing list messages unless it's been agreed to somehow with the participants), as it makes it difficult to make context replies. Best practice is to reply inline, quoting the relevant original paragraph (or enough context) referred to above, and with all the other parts of the message replied to discarded. We may even have it written down somewhere (though I haven't checked; in the old days it used to be assumed), and I do hope any sane modern MUA can handle it. Otherwise the discussion thread quickly grows into an illegible mess. So this change does indeed fix PR 112092, however we now have an issue with several other test cases and the new `-mmovcc' option. For example vsetvl-13.c fails with "-mmovcc -mbranch-cost=8" test options and assembly produced is like: vsetvli a6,a6,e8,mf4,ta,ma sneza5,a5 neg a5,a5 and a6,a5,a6 not a5,a5 andia5,a5,55 or a5,a6,a5 beq a4,zero,.L10 li a6,0 vsetvli zero,a5,e32,m1,tu,ma .L4: vle32.v v1,0(a0) vle32.v v1,0(a1) vle32.v v1,0(a2) vse32.v v1,0(a3) addia6,a6,1 bne a4,a6,.L4 .L10: ret As far as I can tell code produced is legitimate, and for the record analogous assembly is produced with `-march=rv32gcv_zicond' too: vsetvli a6,a6,e8,mf4,ta,ma czero.eqz a6,a6,a5 li a7,55 czero.nez a5,a7,a5 or a5,a5,a6 beq a4,zero,.L10 li a6,0 vsetvli zero,a5,e32,m1,tu,ma .L4: vle32.v v1,0(a0) vle32.v v1,0(a1) vle32.v v1,0(a2) vse32.v v1,0(a3) addia6,a6,1 bne a4,a6,.L4 .L10: ret -- it's just that you can't see it with regression testing, because the test case overrides `-march='. Presumably we do want to execute VSETVLI twice here on the basis that to avoid the second one by means of branches would be more costly than not to. Shall we just silence false failures like this with `-mno-movcc' then or shall we handle the conditional-move case somehow? For reference plain branched assembly is like: li a7,55 beq a5,zero,.L13 vsetvli zero,a6,e32,m1,tu,ma .L2: beq a4,zero,.L11 li a5,0 .L4: vle32.v v1,0(a0) vle32.v v1,0(a1) vle32.v v1,0(a2) vse32.v v1,0(a3) addia5,a5,1 bne a4,a5,.L4 .L11: ret .L13: vsetvli zero,a7,e32,m1,tu,ma j .L2 Maciej
Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
On Wed, Nov 8, 2023 at 2:37 PM juzhe.zh...@rivai.ai wrote: > > Another question raise to me. > > Is it necessary we have such many variant of vsetvls? > > I am thinking about redesign: > > __riscv_vsetvl_e8mf8 > __riscv_vsetvl_e16mf4 > __riscv_vsetvl_e32mf2 > __riscv_vsetvl_e64m1 > > They are quite redundant. They have the same result. > > May be just design as : > > __riscv_vsetvl_ratio64 > > I am no proposing it since it has been used for a long time. Just raise my > concern. Yeah, I agree those variant are just having same behavior even semantic on the current intrinsic model, one reason is we don't have smart vsetvli insertion pass at design stage, also it's more obviously to user to pick the right vsetvli intrinsic, however I intend not to change that interface, the reason is simple, it's used for a long time as you mentioned, change that would be huge disturbance. There may have same argument for vbool* stuffs, but vbool* kind of mixing historical reason* and also we didn't found better way to model that. * We have define MLEN is v-spec long times ago, I forgot it's 0.7 or 0.8..
Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
Another question raise to me. Is it necessary we have such many variant of vsetvls? I am thinking about redesign: __riscv_vsetvl_e8mf8 __riscv_vsetvl_e16mf4 __riscv_vsetvl_e32mf2 __riscv_vsetvl_e64m1 They are quite redundant. They have the same result. May be just design as : __riscv_vsetvl_ratio64 I am no proposing it since it has been used for a long time. Just raise my concern. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-11-08 14:33 To: juzhe.zh...@rivai.ai CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp Subject: Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092] OK, then LGTM, thanks for the explanation :) On Wed, Nov 8, 2023 at 2:33 PM juzhe.zh...@rivai.ai wrote: > > More details: > > bb 1 bb 2 > \/ >bb 3 > > VSETVL PASS can only do VSETVL demand fusion, fuse demand from bb 3 to bb 1, > and fuse demand from bb 3 to bb2. > We are not able to remove block bb 1 and bb 2 and create new bb 4 to hold the > vsetvl if bb 1 and bb 2 has the same vsetvl: > > bb 4 (new block) > | > bb 3 > > I don't think we should do this on VSETVL PASS. > > juzhe.zh...@rivai.ai > > > From: Kito Cheng > Date: 2023-11-08 14:16 > To: Juzhe-Zhong > CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc > Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092] > I thought vsetvli insertion will try to merge them into one for those > cases? Could you explain few more reasons why they are not fused now? > Not an objection since I could imageing that would be easier to > process, just wondering why. > > On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong wrote: > > > > Since our user vsetvl intrinsics are defined as just calculate the VL output > > which is the number of the elements to be processed. Such intrinsics do not > > have any side effects. We should normalize them when they have same ratio. > > > > E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1. > > > > Normalize them can allow us have better codegen. > > Consider this following example: > > > > #include "riscv_vector.h" > > > > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, > > int cond, int avl) { > > > > size_t vl; > > if (cond) > > vl = __riscv_vsetvl_e32m1(avl); > > else > > vl = __riscv_vsetvl_e16mf2(avl); > > for (size_t i = 0; i < n; i += 1) { > > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl); > > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl); > > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl); > > __riscv_vse32_v_i32m1(out, c, vl); > > } > > } > > > > Before this patch: > > > > foo: > > beq a5,zero,.L2 > > vsetvli a6,a6,e32,m1,tu,ma > > .L3: > > li a5,0 > > beq a4,zero,.L9 > > .L4: > > vle32.v v1,0(a0) > > addia5,a5,1 > > vle32.v v1,0(a1) > > vle32.v v1,0(a2) > > vse32.v v1,0(a3) > > bne a4,a5,.L4 > > .L9: > > ret > > .L2: > > vsetvli zero,a6,e32,m1,tu,ma > > j .L3 > > > > After this patch: > > > > foo: > > li a5,0 > > vsetvli zero,a6,e32,m1,tu,ma > > beq a4,zero,.L9 > > .L4: > > vle32.v v1,0(a0) > > addia5,a5,1 > > vle32.v v1,0(a1) > > vle32.v v1,0(a2) > > vse32.v v1,0(a3) > > bne a4,a5,.L4 > > .L9: > > ret > > > > PR target/112092 > > > > gcc/ChangeLog: > > > > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the > > vsetvls. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test. > > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test. > > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New t
Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
OK, then LGTM, thanks for the explanation :) On Wed, Nov 8, 2023 at 2:33 PM juzhe.zh...@rivai.ai wrote: > > More details: > > bb 1 bb 2 > \/ >bb 3 > > VSETVL PASS can only do VSETVL demand fusion, fuse demand from bb 3 to bb 1, > and fuse demand from bb 3 to bb2. > We are not able to remove block bb 1 and bb 2 and create new bb 4 to hold the > vsetvl if bb 1 and bb 2 has the same vsetvl: > > bb 4 (new block) > | > bb 3 > > I don't think we should do this on VSETVL PASS. > > juzhe.zh...@rivai.ai > > > From: Kito Cheng > Date: 2023-11-08 14:16 > To: Juzhe-Zhong > CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc > Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092] > I thought vsetvli insertion will try to merge them into one for those > cases? Could you explain few more reasons why they are not fused now? > Not an objection since I could imageing that would be easier to > process, just wondering why. > > On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong wrote: > > > > Since our user vsetvl intrinsics are defined as just calculate the VL output > > which is the number of the elements to be processed. Such intrinsics do not > > have any side effects. We should normalize them when they have same ratio. > > > > E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1. > > > > Normalize them can allow us have better codegen. > > Consider this following example: > > > > #include "riscv_vector.h" > > > > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, > > int cond, int avl) { > > > > size_t vl; > > if (cond) > > vl = __riscv_vsetvl_e32m1(avl); > > else > > vl = __riscv_vsetvl_e16mf2(avl); > > for (size_t i = 0; i < n; i += 1) { > > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl); > > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl); > > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl); > > __riscv_vse32_v_i32m1(out, c, vl); > > } > > } > > > > Before this patch: > > > > foo: > > beq a5,zero,.L2 > > vsetvli a6,a6,e32,m1,tu,ma > > .L3: > > li a5,0 > > beq a4,zero,.L9 > > .L4: > > vle32.v v1,0(a0) > > addia5,a5,1 > > vle32.v v1,0(a1) > > vle32.v v1,0(a2) > > vse32.v v1,0(a3) > > bne a4,a5,.L4 > > .L9: > > ret > > .L2: > > vsetvli zero,a6,e32,m1,tu,ma > > j .L3 > > > > After this patch: > > > > foo: > > li a5,0 > > vsetvli zero,a6,e32,m1,tu,ma > > beq a4,zero,.L9 > > .L4: > > vle32.v v1,0(a0) > > addia5,a5,1 > > vle32.v v1,0(a1) > > vle32.v v1,0(a2) > > vse32.v v1,0(a3) > > bne a4,a5,.L4 > > .L9: > > ret > > > > PR target/112092 > > > > gcc/ChangeLog: > > > > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the > > vsetvls. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test. > > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto. > > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test. > > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test. > > > > --- > > .../riscv/riscv-vector-builtins-bases.cc | 24 +- > > .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 2 +- > > .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 3 +-- > > .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c | 25 +++ > > .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c | 25 +++ > > .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c | 2 +- > > .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c | 2 +- > > .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c | 2 +- > > .../ri
Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
More details: bb 1 bb 2 \/ bb 3 VSETVL PASS can only do VSETVL demand fusion, fuse demand from bb 3 to bb 1, and fuse demand from bb 3 to bb2. We are not able to remove block bb 1 and bb 2 and create new bb 4 to hold the vsetvl if bb 1 and bb 2 has the same vsetvl: bb 4 (new block) | bb 3 I don't think we should do this on VSETVL PASS. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-11-08 14:16 To: Juzhe-Zhong CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092] I thought vsetvli insertion will try to merge them into one for those cases? Could you explain few more reasons why they are not fused now? Not an objection since I could imageing that would be easier to process, just wondering why. On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong wrote: > > Since our user vsetvl intrinsics are defined as just calculate the VL output > which is the number of the elements to be processed. Such intrinsics do not > have any side effects. We should normalize them when they have same ratio. > > E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1. > > Normalize them can allow us have better codegen. > Consider this following example: > > #include "riscv_vector.h" > > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, > int cond, int avl) { > > size_t vl; > if (cond) > vl = __riscv_vsetvl_e32m1(avl); > else > vl = __riscv_vsetvl_e16mf2(avl); > for (size_t i = 0; i < n; i += 1) { > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl); > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl); > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl); > __riscv_vse32_v_i32m1(out, c, vl); > } > } > > Before this patch: > > foo: > beq a5,zero,.L2 > vsetvli a6,a6,e32,m1,tu,ma > .L3: > li a5,0 > beq a4,zero,.L9 > .L4: > vle32.v v1,0(a0) > addia5,a5,1 > vle32.v v1,0(a1) > vle32.v v1,0(a2) > vse32.v v1,0(a3) > bne a4,a5,.L4 > .L9: > ret > .L2: > vsetvli zero,a6,e32,m1,tu,ma > j .L3 > > After this patch: > > foo: > li a5,0 > vsetvli zero,a6,e32,m1,tu,ma > beq a4,zero,.L9 > .L4: > vle32.v v1,0(a0) > addia5,a5,1 > vle32.v v1,0(a1) > vle32.v v1,0(a2) > vse32.v v1,0(a3) > bne a4,a5,.L4 > .L9: > ret > > PR target/112092 > > gcc/ChangeLog: > > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test. > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test. > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test. > > --- > .../riscv/riscv-vector-builtins-bases.cc | 24 +- > .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 3 +-- > .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c | 25 +++ > .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c | 25 +++ > .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c | 2 +- > .../riscv/rvv/vsetvl/vsetvlmax-13.c | 4 +-- > .../riscv/rvv/vsetvl/vsetvlmax-15.c | 6 ++--- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c | 4 +-- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c | 4 +-- > 13 files changed, 83 insertions(+), 22 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c > > diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc > b/gcc/config/riscv/riscv-vector-builtins-bases.cc > index 0298b7987a1..d70468542ee 100644 > --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc > +++ b/gcc/config/riscv/riscv-ve
Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
before VSETVL PASS. The code is as follows: bb 1: vsetvli e16mf2 -> set a6 bb 2: vsetvli e32m1 -> set a6 bb 3: ... vle (use a6) e32m1 TU vle (use a6) e32m1 TU vse (use a6) e32m1 TU VSETVL PASS only do VSETVL information fusion, it doesn't do the CFG block fusion. VSETVL PASS succeed on following fusion: Change bb 1 vsetvli e16mf2 -> e32m1TU Change bb 2 vsetvli e32m1 -> e32m1TU But VSETVL pass can't remove bb1 and bb2, can create a new block said bb 4 to hold vsetvli e32m1TU So you will see: bb 1: vsetvli e32m1TU bb 2: vsetvli e32m1TU bb 3: ... vle vle vse with this patch, since vsetvl e16mf2 and vsetvl e32m1 are normalized into same vsetvl e8mf4 Then, the before the VSETVL PASS, we will see: bb 1 vsetvli e8mf4 bb 2: ... vle vle vse Since the later vle/vle/vse is using e32m1TU, then VSETVL fuse them into bb1 change vsetvli e8mf4 into: bb 1 vsetvli e32m1TU bb 2: ... vle vle vse juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-11-08 14:16 To: Juzhe-Zhong CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092] I thought vsetvli insertion will try to merge them into one for those cases? Could you explain few more reasons why they are not fused now? Not an objection since I could imageing that would be easier to process, just wondering why. On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong wrote: > > Since our user vsetvl intrinsics are defined as just calculate the VL output > which is the number of the elements to be processed. Such intrinsics do not > have any side effects. We should normalize them when they have same ratio. > > E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1. > > Normalize them can allow us have better codegen. > Consider this following example: > > #include "riscv_vector.h" > > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, > int cond, int avl) { > > size_t vl; > if (cond) > vl = __riscv_vsetvl_e32m1(avl); > else > vl = __riscv_vsetvl_e16mf2(avl); > for (size_t i = 0; i < n; i += 1) { > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl); > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl); > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl); > __riscv_vse32_v_i32m1(out, c, vl); > } > } > > Before this patch: > > foo: > beq a5,zero,.L2 > vsetvli a6,a6,e32,m1,tu,ma > .L3: > li a5,0 > beq a4,zero,.L9 > .L4: > vle32.v v1,0(a0) > addia5,a5,1 > vle32.v v1,0(a1) > vle32.v v1,0(a2) > vse32.v v1,0(a3) > bne a4,a5,.L4 > .L9: > ret > .L2: > vsetvli zero,a6,e32,m1,tu,ma > j .L3 > > After this patch: > > foo: > li a5,0 > vsetvli zero,a6,e32,m1,tu,ma > beq a4,zero,.L9 > .L4: > vle32.v v1,0(a0) > addia5,a5,1 > vle32.v v1,0(a1) > vle32.v v1,0(a2) > vse32.v v1,0(a3) > bne a4,a5,.L4 > .L9: > ret > > PR target/112092 > > gcc/ChangeLog: > > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test. > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test. > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test. > > --- > .../riscv/riscv-vector-builtins-bases.cc | 24 +- > .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 3 +-- > .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c | 25 +++ > .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c | 25 +++ > .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c | 2 +- > .../riscv/rvv/vsetvl/vsetvlmax-13.c | 4 +-- > .../riscv/rvv/vsetvl/vsetvlmax-15.c | 6 ++--- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c | 4 +-- > .../gcc.target/riscv/rvv/vsetvl/vsetvl
Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
I thought vsetvli insertion will try to merge them into one for those cases? Could you explain few more reasons why they are not fused now? Not an objection since I could imageing that would be easier to process, just wondering why. On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong wrote: > > Since our user vsetvl intrinsics are defined as just calculate the VL output > which is the number of the elements to be processed. Such intrinsics do not > have any side effects. We should normalize them when they have same ratio. > > E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1. > > Normalize them can allow us have better codegen. > Consider this following example: > > #include "riscv_vector.h" > > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, > int cond, int avl) { > > size_t vl; > if (cond) > vl = __riscv_vsetvl_e32m1(avl); > else > vl = __riscv_vsetvl_e16mf2(avl); > for (size_t i = 0; i < n; i += 1) { > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl); > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl); > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl); > __riscv_vse32_v_i32m1(out, c, vl); > } > } > > Before this patch: > > foo: > beq a5,zero,.L2 > vsetvli a6,a6,e32,m1,tu,ma > .L3: > li a5,0 > beq a4,zero,.L9 > .L4: > vle32.v v1,0(a0) > addia5,a5,1 > vle32.v v1,0(a1) > vle32.v v1,0(a2) > vse32.v v1,0(a3) > bne a4,a5,.L4 > .L9: > ret > .L2: > vsetvli zero,a6,e32,m1,tu,ma > j .L3 > > After this patch: > > foo: > li a5,0 > vsetvli zero,a6,e32,m1,tu,ma > beq a4,zero,.L9 > .L4: > vle32.v v1,0(a0) > addia5,a5,1 > vle32.v v1,0(a1) > vle32.v v1,0(a2) > vse32.v v1,0(a3) > bne a4,a5,.L4 > .L9: > ret > > PR target/112092 > > gcc/ChangeLog: > > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test. > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test. > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test. > > --- > .../riscv/riscv-vector-builtins-bases.cc | 24 +- > .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 3 +-- > .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c | 25 +++ > .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c | 25 +++ > .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c | 2 +- > .../riscv/rvv/vsetvl/vsetvlmax-13.c | 4 +-- > .../riscv/rvv/vsetvl/vsetvlmax-15.c | 6 ++--- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c | 4 +-- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c | 4 +-- > 13 files changed, 83 insertions(+), 22 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c > > diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc > b/gcc/config/riscv/riscv-vector-builtins-bases.cc > index 0298b7987a1..d70468542ee 100644 > --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc > +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc > @@ -131,19 +131,31 @@ public: > > tree type = builtin_types[e.type.index].vector; > machine_mode mode = TYPE_MODE (type); > -machine_mode inner_mode = GET_MODE_INNER (mode); > +/* Normalize same RATO (SEW/LMUL) into same vsetvl instruction. > + > +- e8,mf8/e16,mf4/e32,mf2/e64,m1 --> e8mf8 > +- e8,mf4/e16,mf2/e32,m1/e64,m2 --> e8mf4 > +- e8,mf2/e16,m1/e32,m2/e64,m4 --> e8mf2 > +- e8,m1/e16,m2/e32,m4/e64,m8--> e8m1 > +- e8,m2/e16,m4/e32,m8 --> e8m2 > +- e8,m4/e16,m8 --> e8m4 > +- e8,m8 --> e8m8 > +*/ > /* SEW. */ > -e.add_input_operand (Pmode, > -gen_int_mode (GET_MODE_BITSIZE (inner_mode), Pmode)); > +e.add_input_operand (Pmode, gen_int_mode (8, Pmode)); > > /* LMUL. */ > -