[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #15 from GCC Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:7e854b58084c131fceca9e8fa9dcc7469972e69d commit r14-6400-g7e854b58084c131fceca9e8fa9dcc7469972e69d Author: Juzhe-Zhong Date: Sat Dec 9 12:06:29 2023 +0800 RISC-V: Support highest overlap for wv instructions According to RVV ISA, we can allow vwadd.wv v2, v2, v3 overlap. Before this patch: nop vsetivlizero,4,e8,m4,tu,ma vle16.v v8,0(a0) vmv8r.v v0,v8 vwsub.wvv0,v8,v12 nop addia4,a0,100 vle16.v v8,0(a4) vmv8r.v v24,v8 vwsub.wvv24,v8,v12 nop addia4,a0,200 vle16.v v8,0(a4) vmv8r.v v16,v8 vwsub.wvv16,v8,v12 nop After this patch: nop vsetivlizero,4,e8,m4,tu,ma vle16.v v0,0(a0) vwsub.wvv0,v0,v4 nop addia4,a0,100 vle16.v v24,0(a4) vwsub.wvv24,v24,v28 nop addia4,a0,200 vle16.v v16,0(a4) vwsub.wvv16,v16,v20 PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Support highest overlap for wv instructions. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-39.c: New test. * gcc.target/riscv/rvv/base/pr112431-40.c: New test. * gcc.target/riscv/rvv/base/pr112431-41.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #14 from GCC Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:018ba3ac952bed4ae01344c060360f13f7cc084a commit r14-6118-g018ba3ac952bed4ae01344c060360f13f7cc084a Author: Juzhe-Zhong Date: Mon Dec 4 21:44:56 2023 +0800 RISC-V: Fix overlap group incorrect overlap on v0 In serious high register pressure case (appended in this patch): We see vluxei8.v v0,(s1),v1,v0.t which is not allowed. Since according to RVV ISA: +;; The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0), +;; unless the destination vector register is being written with a mask value (e.g., compares) or the scalar result of a reduction. Such case doesn't have spillings, however, we expect such case should be spilled and reload data. The rootcause is I made a mistake in previous patch on matching dest operand and mask operand constraints: dest: "=vr" mask: "vmWc1" After this patch: dest: "vd,vr" mask: "vm,Wc1" make EEW widening pattern are same as other instruction patterns. PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Fix incorrect overlap in v0. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-34.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #13 from GCC Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:27fde325d64447a3a0d5d550c5976e5f3fb6dc16 commit r14-6117-g27fde325d64447a3a0d5d550c5976e5f3fb6dc16 Author: Juzhe-Zhong Date: Mon Dec 4 21:32:06 2023 +0800 RISC-V: Support highest-number regno overlap for widen ternary Consider this example: #include "riscv_vector.h" void foo6 (void *in, void *out) { vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4); vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1); vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4 (high_eew64); vint32m4_t high_eew32_i = __riscv_vreinterpret_v_i64m4_i32m4 (high_eew64_i); vfloat32m4_t high_eew32 = __riscv_vreinterpret_v_i32m4_f32m4 (high_eew32_i); vfloat64m8_t result = __riscv_vfwnmsac_vf_f64m8 (accum, 64, high_eew32, 4); __riscv_vse64_v_f64m8 (out, result, 4); } Before this patch: foo6: # @foo6 vsetivlizero, 4, e32, m4, ta, ma vle64.v v8, (a0) lui a0, 272384 fmv.w.x fa5, a0 vmv8r.v v16, v8 vfwnmsac.vf v16, fa5, v12 vse64.v v16, (a1) ret After this patch: foo6: .LFB5: .cfi_startproc lui a5,%hi(.LC0) flw fa5,%lo(.LC0)(a5) vsetivlizero,4,e32,m4,ta,ma vle64.v v8,0(a0) vfwnmsac.vf v8,fa5,v12 vse64.v v8,0(a1) ret PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Add highest-number overlap support. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-37.c: New test. * gcc.target/riscv/rvv/base/pr112431-38.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #12 from JuzheZhong --- Except vv/wv variant widen instructions. All other widen EEW overlap have been done. It seems that current register filter can not help us simulate accurate highest-number overlap for vwadd.vv/vwadd.wv instructions.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #11 from GCC Commits --- The trunk branch has been updated by Lehua Ding : https://gcc.gnu.org/g:7804b4e24cd16283067225d4c2c4a4483a2b31bc commit r14-6113-g7804b4e24cd16283067225d4c2c4a4483a2b31bc Author: Juzhe-Zhong Date: Mon Dec 4 16:51:06 2023 +0800 RISC-V: Remove earlyclobber from widen reduction Since the destination of reduction is not a vector register group, there is no need to apply overlap constraint. Also confirm Clang: The mir in LLVM has early clobber: early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed %17:vr, %48:gpr, %0:gprnox0, 3, 0; example.c:59:24 The mir in LLVM doesn't have early clobber: %48:vr = PseudoVWREDSUM_VS_M2_E8 $noreg(tied-def 0), %17:vrm2, killed %33:vr, %0:gprnox0, 3, 1; example.c:60:26 And also confirm both: vwredsum.vs v24, v8, v24 and vwredsum.vs v8, v8, v24 all legal on LLVM. Align with LLVM and honor RISC-V V spec, remove earlyclobber. Before this patch: vwredsum.vs v8,v24,v8 vwredsum.vs v7,v22,v7 vwredsum.vs v6,v20,v6 vwredsum.vs v5,v18,v5 vwredsum.vs v4,v16,v4 vwredsum.vs v3,v14,v3 vwredsum.vs v2,v12,v2 vwredsum.vs v1,v10,v1 vmv1r.v v9,v8 vwredsum.vs v9,v24,v9 vmv1r.v v24,v7 vwredsum.vs v24,v22,v24 vmv1r.v v22,v6 vwredsum.vs v22,v20,v22 vmv1r.v v20,v5 vwredsum.vs v20,v18,v20 vmv1r.v v18,v4 vwredsum.vs v18,v16,v18 vmv1r.v v16,v3 vwredsum.vs v16,v14,v16 vmv1r.v v14,v2 vwredsum.vs v14,v12,v14 vmv1r.v v12,v1 vwredsum.vs v12,v10,v12 After this patch: vfwredusum.vs v17,v12,v17 vfwredusum.vs v18,v10,v18 vfwredusum.vs v15,v26,v15 vfwredusum.vs v16,v24,v16 vfwredusum.vs v12,v12,v17 vfwredusum.vs v10,v10,v18 vfwredusum.vs v13,v6,v20 vfwredusum.vs v11,v8,v19 vfwredusum.vs v6,v6,v13 vfwredusum.vs v8,v8,v11 vfwredusum.vs v7,v4,v21 vfwredusum.vs v9,v2,v22 vfwredusum.vs v14,v26,v15 vfwredusum.vs v1,v24,v16 vfwredusum.vs v4,v4,v7 vfwredusum.vs v2,v2,v9 Same behavior as LLVM, and honor RISC-V V spec. PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Remove earlyclobber from widen reduction. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-35.c: New test. * gcc.target/riscv/rvv/base/pr112431-36.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #9 from GCC Commits --- The trunk branch has been updated by Lehua Ding : https://gcc.gnu.org/g:4418d55bcd1b7e0ef823981b6a781d7de5c38cce commit r14-6054-g4418d55bcd1b7e0ef823981b6a781d7de5c38cce Author: Juzhe-Zhong Date: Fri Dec 1 16:09:59 2023 +0800 RISC-V: Support highpart overlap for indexed load with SRC EEW < DEST EEW Leverage previous approach. Before this patch: .L5: add a3,s0,s2 add a4,s6,s2 add a5,s7,s2 vsetvli zero,s0,e64,m8,ta,ma vle8.v v4,0(s2) vle8.v v3,0(a3) mv s2,s1 vle8.v v2,0(a4) vle8.v v1,0(a5) nop vluxei8.v v8,(s1),v4 vs8r.v v8,0(sp) ---> spill vluxei8.v v8,(s1),v3 vluxei8.v v16,(s1),v2 vluxei8.v v24,(s1),v1 nop vmv.x.s a1,v8 vl8re64.v v8,0(sp) ---> reload vmv.x.s a3,v24 vmv.x.s a2,v16 vmv.x.s a0,v8 add s1,s1,s5 callsumation add s3,s3,a0 bgeus4,s1,.L5 After this patch: .L5: add a3,s0,s2 add a4,s6,s2 add a5,s7,s2 vsetvli zero,s0,e64,m8,ta,ma vle8.v v15,0(s2) vle8.v v23,0(a3) mv s2,s1 vle8.v v31,0(a4) vle8.v v7,0(a5) vluxei8.v v8,(s1),v15 vluxei8.v v16,(s1),v23 vluxei8.v v24,(s1),v31 vluxei8.v v0,(s1),v7 vmv.x.s a3,v0 vmv.x.s a2,v24 vmv.x.s a1,v16 vmv.x.s a0,v8 add s1,s1,s5 callsumation add s3,s3,a0 bgeus4,s1,.L5 PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Support highpart overlap for indexed load. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-28.c: New test. * gcc.target/riscv/rvv/base/pr112431-29.c: New test. * gcc.target/riscv/rvv/base/pr112431-30.c: New test. * gcc.target/riscv/rvv/base/pr112431-31.c: New test. * gcc.target/riscv/rvv/base/pr112431-32.c: New test. * gcc.target/riscv/rvv/base/pr112431-33.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #10 from GCC Commits --- The trunk branch has been updated by Lehua Ding : https://gcc.gnu.org/g:a23415d7572774701d7ec04664390260ab9a3f63 commit r14-6055-ga23415d7572774701d7ec04664390260ab9a3f63 Author: Juzhe-Zhong Date: Fri Dec 1 15:00:27 2023 +0800 RISC-V: Support highpart register overlap for widen vx/vf instructions This patch leverages the same approach as vwcvt. Before this patch: .L5: add a3,s0,s1 add a4,s6,s1 add a5,s7,s1 vsetvli zero,s0,e32,m4,ta,ma vle32.v v16,0(s1) vle32.v v12,0(a3) mv s1,s2 vle32.v v8,0(a4) vle32.v v4,0(a5) nop vfwadd.vf v24,v16,fs0 vfwadd.vf v16,v12,fs0 vs8r.v v16,0(sp)-> spill vfwadd.vf v16,v8,fs0 vfwadd.vf v8,v4,fs0 nop vsetvli zero,zero,e64,m8,ta,ma vfmv.f.sfa4,v24 vl8re64.v v24,0(sp) -> reload vfmv.f.sfa5,v24 fcvt.lu.d a0,fa4,rtz fcvt.lu.d a1,fa5,rtz vfmv.f.sfa4,v16 vfmv.f.sfa5,v8 fcvt.lu.d a2,fa4,rtz fcvt.lu.d a3,fa5,rtz add s2,s2,s5 callsumation add s3,s3,a0 bgeus4,s2,.L5 After this patch: .L5: add a3,s0,s1 add a4,s6,s1 add a5,s7,s1 vsetvli zero,s0,e32,m4,ta,ma vle32.v v4,0(s1) vle32.v v28,0(a3) mv s1,s2 vle32.v v20,0(a4) vle32.v v12,0(a5) vfwadd.vf v0,v4,fs0 vfwadd.vf v24,v28,fs0 vfwadd.vf v16,v20,fs0 vfwadd.vf v8,v12,fs0 vsetvli zero,zero,e64,m8,ta,ma vfmv.f.sfa4,v0 vfmv.f.sfa5,v24 fcvt.lu.d a0,fa4,rtz fcvt.lu.d a1,fa5,rtz vfmv.f.sfa4,v16 vfmv.f.sfa5,v8 fcvt.lu.d a2,fa4,rtz fcvt.lu.d a3,fa5,rtz add s2,s2,s5 callsumation add s3,s3,a0 bgeus4,s2,.L5 PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Support highpart overlap for vx/vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-22.c: New test. * gcc.target/riscv/rvv/base/pr112431-23.c: New test. * gcc.target/riscv/rvv/base/pr112431-24.c: New test. * gcc.target/riscv/rvv/base/pr112431-25.c: New test. * gcc.target/riscv/rvv/base/pr112431-26.c: New test. * gcc.target/riscv/rvv/base/pr112431-27.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #8 from GCC Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:303195e2a6b6f0e8f42e0578b61f9f37c6250beb commit r14-6008-g303195e2a6b6f0e8f42e0578b61f9f37c6250beb Author: Juzhe-Zhong Date: Thu Nov 30 20:08:43 2023 +0800 RISC-V: Support widening register overlap for vf4/vf8 size_t foo (char const *buf, size_t len) { size_t sum = 0; size_t vl = __riscv_vsetvlmax_e8m8 (); size_t step = vl * 4; const char *it = buf, *end = buf + len; for (; it + step <= end;) { vint8m1_t v0 = __riscv_vle8_v_i8m1 ((void *) it, vl); it += vl; vint8m1_t v1 = __riscv_vle8_v_i8m1 ((void *) it, vl); it += vl; vint8m1_t v2 = __riscv_vle8_v_i8m1 ((void *) it, vl); it += vl; vint8m1_t v3 = __riscv_vle8_v_i8m1 ((void *) it, vl); it += vl; asm volatile("nop" ::: "memory"); vint64m8_t vw0 = __riscv_vsext_vf8_i64m8 (v0, vl); vint64m8_t vw1 = __riscv_vsext_vf8_i64m8 (v1, vl); vint64m8_t vw2 = __riscv_vsext_vf8_i64m8 (v2, vl); vint64m8_t vw3 = __riscv_vsext_vf8_i64m8 (v3, vl); asm volatile("nop" ::: "memory"); size_t sum0 = __riscv_vmv_x_s_i64m8_i64 (vw0); size_t sum1 = __riscv_vmv_x_s_i64m8_i64 (vw1); size_t sum2 = __riscv_vmv_x_s_i64m8_i64 (vw2); size_t sum3 = __riscv_vmv_x_s_i64m8_i64 (vw3); sum += sumation (sum0, sum1, sum2, sum3); } return sum; } Before this patch: add a3,s0,s1 add a4,s6,s1 add a5,s7,s1 vsetvli zero,s0,e64,m8,ta,ma vle8.v v4,0(s1) vle8.v v3,0(a3) mv s1,s2 vle8.v v2,0(a4) vle8.v v1,0(a5) nop vsext.vf8 v8,v4 vsext.vf8 v16,v2 vs8r.v v8,0(sp) vsext.vf8 v24,v1 vsext.vf8 v8,v3 nop vmv.x.s a1,v8 vl8re64.v v8,0(sp) vmv.x.s a3,v24 vmv.x.s a2,v16 vmv.x.s a0,v8 add s2,s2,s5 callsumation add s3,s3,a0 bgeus4,s2,.L5 After this patch: add a3,s0,s1 add a4,s6,s1 add a5,s7,s1 vsetvli zero,s0,e64,m8,ta,ma vle8.v v15,0(s1) vle8.v v23,0(a3) mv s1,s2 vle8.v v31,0(a4) vle8.v v7,0(a5) vsext.vf8 v8,v15 vsext.vf8 v16,v23 vsext.vf8 v24,v31 vsext.vf8 v0,v7 vmv.x.s a3,v0 vmv.x.s a2,v24 vmv.x.s a1,v16 vmv.x.s a0,v8 add s2,s2,s5 callsumation add s3,s3,a0 bgeus4,s2,.L5 PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Add widening overlap of vf2/vf4. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-16.c: New test. * gcc.target/riscv/rvv/base/pr112431-17.c: New test. * gcc.target/riscv/rvv/base/pr112431-18.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #7 from GCC Commits --- The trunk branch has been updated by Lehua Ding : https://gcc.gnu.org/g:5a35152f87a36db480693884dfb27ff6a5d5d683 commit r14-6007-g5a35152f87a36db480693884dfb27ff6a5d5d683 Author: Juzhe-Zhong Date: Thu Nov 30 18:12:04 2023 +0800 RISC-V: Remove earlyclobber for wx/wf instructions. While working on overlap for widening instructions, I realize that we set vwadd.wx/vfwadd.wf as earlyclobber which is incorrect. Since according to RVV ISA: "The destination EEW equals the source EEW." vwadd.vx widens the first source operand (i.e. 2 * source EEW = dest EEW) while vwadd.wx only widens the second/scalar source operand. Therefore overlap is legal for wx but not for vx. Before this patch (heave spillings): csrra5,vlenb sllia5,a5,1 addia5,a5,64 vfwadd.wf v2,v14,fs0 add a5,a5,sp vs2r.v v2,0(a5) vl2re32.v v2,0(a1) vfwadd.wf v14,v12,fs0 vfwadd.wf v12,v10,fs0 vfwadd.wf v10,v8,fs0 vfwadd.wf v8,v6,fs0 vfwadd.wf v6,v4,fs0 vfwadd.wf v4,v2,fs0 vfwadd.wf v2,v16,fs0 vfwadd.wf v16,v18,fs0 vfwadd.wf v18,v20,fs0 vfwadd.wf v20,v22,fs0 vfwadd.wf v22,v24,fs0 vfwadd.wf v24,v26,fs0 vfwadd.wf v26,v28,fs0 vfwadd.wf v28,v30,fs0 vfwadd.wf v30,v0,fs0 nop vsetvli zero,zero,e32,m2,ta,ma csrra5,vlenb After this patch (no spillings): vfwadd.wf v16,v16,fs0 vfwadd.wf v14,v14,fs0 vfwadd.wf v12,v12,fs0 vfwadd.wf v10,v10,fs0 vfwadd.wf v8,v8,fs0 vfwadd.wf v6,v6,fs0 vfwadd.wf v4,v4,fs0 vfwadd.wf v2,v2,fs0 vfwadd.wf v18,v18,fs0 vfwadd.wf v20,v20,fs0 vfwadd.wf v22,v22,fs0 vfwadd.wf v24,v24,fs0 vfwadd.wf v26,v26,fs0 vfwadd.wf v28,v28,fs0 vfwadd.wf v30,v30,fs0 vfwadd.wf v0,v0,fs0 Confirm the codegen above run successfully on both SPIKE/QEMU. PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Remove earlyclobber for wx/wf instructions. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-19.c: New test. * gcc.target/riscv/rvv/base/pr112431-20.c: New test. * gcc.target/riscv/rvv/base/pr112431-21.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #6 from GCC Commits --- The trunk branch has been updated by Lehua Ding : https://gcc.gnu.org/g:8614cbb253484e28c3eb20cde4d1067aad56de58 commit r14-5984-g8614cbb253484e28c3eb20cde4d1067aad56de58 Author: Juzhe-Zhong Date: Thu Nov 30 10:36:30 2023 +0800 RISC-V: Support highpart overlap for floating-point widen instructions This patch leverages the approach of vwcvt/vext.vf2 which has been approved. Their approaches are totally the same. Tested no regression and committed. PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Add widenning overlap. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-10.c: New test. * gcc.target/riscv/rvv/base/pr112431-11.c: New test. * gcc.target/riscv/rvv/base/pr112431-12.c: New test. * gcc.target/riscv/rvv/base/pr112431-13.c: New test. * gcc.target/riscv/rvv/base/pr112431-14.c: New test. * gcc.target/riscv/rvv/base/pr112431-15.c: New test. * gcc.target/riscv/rvv/base/pr112431-7.c: New test. * gcc.target/riscv/rvv/base/pr112431-8.c: New test. * gcc.target/riscv/rvv/base/pr112431-9.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #5 from GCC Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:62685890d8861b72f812bfe171a20332df08bd49 commit r14-5982-g62685890d8861b72f812bfe171a20332df08bd49 Author: Juzhe-Zhong Date: Wed Nov 29 18:53:06 2023 +0800 RISC-V: Support highpart overlap for vext.vf PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Support highpart overlap for vext.vf2 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/unop_v_constraint-2.c: Adapt test. * gcc.target/riscv/rvv/base/pr112431-4.c: New test. * gcc.target/riscv/rvv/base/pr112431-5.c: New test. * gcc.target/riscv/rvv/base/pr112431-6.c: New test.
[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431 --- Comment #4 from GCC Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:bdad036da32f72b84a96070518e7d75c21706dc2 commit r14-5960-gbdad036da32f72b84a96070518e7d75c21706dc2 Author: Juzhe-Zhong Date: Wed Nov 29 16:34:10 2023 +0800 RISC-V: Support highpart register overlap for vwcvt Since Richard supports register filters recently, we are able to support highpart register overlap for widening RVV instructions. This patch support it for vwcvt intrinsics. I leverage real application user codes for vwcvt: https://github.com/riscv/riscv-v-spec/issues/929 https://godbolt.org/z/xoeGnzd8q This is the real application codes that using LMUL = 8 with unrolling to gain optimal performance for specific libraury. You can see in the codegen, GCC has optimal codegen for such since we supported register lowpart overlap for narrowing instructions (dest EEW < source EEW). Now, we start to support highpart register overlap from this patch for widening instructions (dest EEW > source EEW). Leverage this intrinsic codes above but for vwcvt: https://godbolt.org/z/1TMPE5Wfr size_t foo (char const *buf, size_t len) { size_t sum = 0; size_t vl = __riscv_vsetvlmax_e8m8 (); size_t step = vl * 4; const char *it = buf, *end = buf + len; for (; it + step <= end;) { vint8m4_t v0 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; vint8m4_t v1 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; vint8m4_t v2 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; vint8m4_t v3 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; asm volatile("nop" ::: "memory"); vint16m8_t vw0 = __riscv_vwcvt_x_x_v_i16m8 (v0, vl); vint16m8_t vw1 = __riscv_vwcvt_x_x_v_i16m8 (v1, vl); vint16m8_t vw2 = __riscv_vwcvt_x_x_v_i16m8 (v2, vl); vint16m8_t vw3 = __riscv_vwcvt_x_x_v_i16m8 (v3, vl); asm volatile("nop" ::: "memory"); size_t sum0 = __riscv_vmv_x_s_i16m8_i16 (vw0); size_t sum1 = __riscv_vmv_x_s_i16m8_i16 (vw1); size_t sum2 = __riscv_vmv_x_s_i16m8_i16 (vw2); size_t sum3 = __riscv_vmv_x_s_i16m8_i16 (vw3); sum += sumation (sum0, sum1, sum2, sum3); } return sum; } Before this patch: ... csrrt0,vlenb ... vwcvt.x.x.v v16,v8 vwcvt.x.x.v v8,v28 vs8r.v v16,0(sp) ---> spill vwcvt.x.x.v v16,v24 vwcvt.x.x.v v24,v4 nop vsetvli zero,zero,e16,m8,ta,ma vmv.x.s a2,v16 vl8re16.v v16,0(sp) ---> reload ... csrrt0,vlenb ... You can see heavy spill && reload inside the loop body. After this patch: ... vwcvt.x.x.v v8,v12 vwcvt.x.x.v v16,v20 vwcvt.x.x.v v24,v28 vwcvt.x.x.v v0,v4 ... Optimal codegen after this patch. Tested on zvl128b no regression. I am gonna to test zve64d/zvl256b/zvl512b/zvl1024b. Ok for trunk if no regression on the testing above ? Co-authored-by: kito-cheng Co-authored-by: kito-cheng PR target/112431 gcc/ChangeLog: * config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS): New register filters. * config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Ditto. (no,yes): Ditto. * config/riscv/vector.md: Support highpart register overlap for vwcvt. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-1.c: New test. * gcc.target/riscv/rvv/base/pr112431-2.c: New test. * gcc.target/riscv/rvv/base/pr112431-3.c: New test.