[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #15 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:7e854b58084c131fceca9e8fa9dcc7469972e69d

commit r14-6400-g7e854b58084c131fceca9e8fa9dcc7469972e69d
Author: Juzhe-Zhong 
Date:   Sat Dec 9 12:06:29 2023 +0800

RISC-V: Support highest overlap for wv instructions

According to RVV ISA, we can allow vwadd.wv v2, v2, v3 overlap.

Before this patch:

nop
vsetivlizero,4,e8,m4,tu,ma
vle16.v v8,0(a0)
vmv8r.v v0,v8
vwsub.wvv0,v8,v12
nop
addia4,a0,100
vle16.v v8,0(a4)
vmv8r.v v24,v8
vwsub.wvv24,v8,v12
nop
addia4,a0,200
vle16.v v8,0(a4)
vmv8r.v v16,v8
vwsub.wvv16,v8,v12
nop

After this patch:

nop
vsetivlizero,4,e8,m4,tu,ma
vle16.v v0,0(a0)
vwsub.wvv0,v0,v4
nop
addia4,a0,100
vle16.v v24,0(a4)
vwsub.wvv24,v24,v28
nop
addia4,a0,200
vle16.v v16,0(a4)
vwsub.wvv16,v16,v20

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highest overlap for wv
instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-39.c: New test.
* gcc.target/riscv/rvv/base/pr112431-40.c: New test.
* gcc.target/riscv/rvv/base/pr112431-41.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #14 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:018ba3ac952bed4ae01344c060360f13f7cc084a

commit r14-6118-g018ba3ac952bed4ae01344c060360f13f7cc084a
Author: Juzhe-Zhong 
Date:   Mon Dec 4 21:44:56 2023 +0800

RISC-V: Fix overlap group incorrect overlap on v0

In serious high register pressure case (appended in this patch):

We see vluxei8.v   v0,(s1),v1,v0.t which is not allowed.
Since according to RVV ISA:

+;; The destination vector register group for a masked vector instruction
cannot overlap the source mask register (v0),
+;; unless the destination vector register is being written with a mask
value (e.g., compares) or the scalar result of a reduction.

Such case doesn't have spillings, however, we expect such case should be
spilled and reload data.

The rootcause is I made a mistake in previous patch on matching dest
operand and mask operand constraints:

dest: "=vr"
mask: "vmWc1"

After this patch:

dest: "vd,vr"
mask: "vm,Wc1"

make EEW widening pattern are same as other instruction patterns.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Fix incorrect overlap in v0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-34.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #13 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:27fde325d64447a3a0d5d550c5976e5f3fb6dc16

commit r14-6117-g27fde325d64447a3a0d5d550c5976e5f3fb6dc16
Author: Juzhe-Zhong 
Date:   Mon Dec 4 21:32:06 2023 +0800

RISC-V: Support highest-number regno overlap for widen ternary

Consider this example:

#include "riscv_vector.h"
void
foo6 (void *in, void *out)
{
  vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4);
  vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1);
  vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4
(high_eew64);
  vint32m4_t high_eew32_i = __riscv_vreinterpret_v_i64m4_i32m4
(high_eew64_i);
  vfloat32m4_t high_eew32 = __riscv_vreinterpret_v_i32m4_f32m4
(high_eew32_i);
  vfloat64m8_t result = __riscv_vfwnmsac_vf_f64m8 (accum, 64, high_eew32,
4);
  __riscv_vse64_v_f64m8 (out, result, 4);
}

Before this patch:

foo6:   # @foo6
vsetivlizero, 4, e32, m4, ta, ma
vle64.v v8, (a0)
lui a0, 272384
fmv.w.x fa5, a0
vmv8r.v v16, v8
vfwnmsac.vf v16, fa5, v12
vse64.v v16, (a1)
ret

After this patch:

foo6:
.LFB5:
.cfi_startproc
lui a5,%hi(.LC0)
flw fa5,%lo(.LC0)(a5)
vsetivlizero,4,e32,m4,ta,ma
vle64.v v8,0(a0)
vfwnmsac.vf v8,fa5,v12
vse64.v v8,0(a1)
ret

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Add highest-number overlap support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-37.c: New test.
* gcc.target/riscv/rvv/base/pr112431-38.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-04 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #12 from JuzheZhong  ---
Except vv/wv variant widen instructions.
All other widen EEW overlap have been done.

It seems that current register filter can not help us simulate accurate
highest-number overlap for vwadd.vv/vwadd.wv instructions.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #11 from GCC Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:7804b4e24cd16283067225d4c2c4a4483a2b31bc

commit r14-6113-g7804b4e24cd16283067225d4c2c4a4483a2b31bc
Author: Juzhe-Zhong 
Date:   Mon Dec 4 16:51:06 2023 +0800

RISC-V: Remove earlyclobber from widen reduction

Since the destination of reduction is not a vector register group, there
is no need to apply overlap constraint.

Also confirm Clang:

The mir in LLVM has early clobber:
early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed
%17:vr, %48:gpr, %0:gprnox0, 3, 0; example.c:59:24

The mir in LLVM doesn't have early clobber:
%48:vr = PseudoVWREDSUM_VS_M2_E8 $noreg(tied-def 0), %17:vrm2, killed
%33:vr, %0:gprnox0, 3, 1; example.c:60:26

And also confirm both:

vwredsum.vs v24, v8, v24 and vwredsum.vs v8, v8, v24 all legal on
LLVM.

Align with LLVM and honor RISC-V V spec, remove earlyclobber.

Before this patch:

vwredsum.vs v8,v24,v8
vwredsum.vs v7,v22,v7
vwredsum.vs v6,v20,v6
vwredsum.vs v5,v18,v5
vwredsum.vs v4,v16,v4
vwredsum.vs v3,v14,v3
vwredsum.vs v2,v12,v2
vwredsum.vs v1,v10,v1
vmv1r.v v9,v8
vwredsum.vs v9,v24,v9
vmv1r.v v24,v7
vwredsum.vs v24,v22,v24
vmv1r.v v22,v6
vwredsum.vs v22,v20,v22
vmv1r.v v20,v5
vwredsum.vs v20,v18,v20
vmv1r.v v18,v4
vwredsum.vs v18,v16,v18
vmv1r.v v16,v3
vwredsum.vs v16,v14,v16
vmv1r.v v14,v2
vwredsum.vs v14,v12,v14
vmv1r.v v12,v1
vwredsum.vs v12,v10,v12

After this patch:

vfwredusum.vs   v17,v12,v17
vfwredusum.vs   v18,v10,v18
vfwredusum.vs   v15,v26,v15
vfwredusum.vs   v16,v24,v16
vfwredusum.vs   v12,v12,v17
vfwredusum.vs   v10,v10,v18
vfwredusum.vs   v13,v6,v20
vfwredusum.vs   v11,v8,v19
vfwredusum.vs   v6,v6,v13
vfwredusum.vs   v8,v8,v11
vfwredusum.vs   v7,v4,v21
vfwredusum.vs   v9,v2,v22
vfwredusum.vs   v14,v26,v15
vfwredusum.vs   v1,v24,v16
vfwredusum.vs   v4,v4,v7
vfwredusum.vs   v2,v2,v9

Same behavior as LLVM, and honor RISC-V V spec.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Remove earlyclobber from widen reduction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-35.c: New test.
* gcc.target/riscv/rvv/base/pr112431-36.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #9 from GCC Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:4418d55bcd1b7e0ef823981b6a781d7de5c38cce

commit r14-6054-g4418d55bcd1b7e0ef823981b6a781d7de5c38cce
Author: Juzhe-Zhong 
Date:   Fri Dec 1 16:09:59 2023 +0800

RISC-V: Support highpart overlap for indexed load with SRC EEW < DEST EEW

Leverage previous approach.

Before this patch:

.L5:
add a3,s0,s2
add a4,s6,s2
add a5,s7,s2
vsetvli zero,s0,e64,m8,ta,ma
vle8.v  v4,0(s2)
vle8.v  v3,0(a3)
mv  s2,s1
vle8.v  v2,0(a4)
vle8.v  v1,0(a5)
nop
vluxei8.v   v8,(s1),v4
vs8r.v  v8,0(sp)  ---> spill
vluxei8.v   v8,(s1),v3
vluxei8.v   v16,(s1),v2
vluxei8.v   v24,(s1),v1
nop
vmv.x.s a1,v8
vl8re64.v   v8,0(sp) ---> reload
vmv.x.s a3,v24
vmv.x.s a2,v16
vmv.x.s a0,v8
add s1,s1,s5
callsumation
add s3,s3,a0
bgeus4,s1,.L5

After this patch:

.L5:
add a3,s0,s2
add a4,s6,s2
add a5,s7,s2
vsetvli zero,s0,e64,m8,ta,ma
vle8.v  v15,0(s2)
vle8.v  v23,0(a3)
mv  s2,s1
vle8.v  v31,0(a4)
vle8.v  v7,0(a5)
vluxei8.v   v8,(s1),v15
vluxei8.v   v16,(s1),v23
vluxei8.v   v24,(s1),v31
vluxei8.v   v0,(s1),v7
vmv.x.s a3,v0
vmv.x.s a2,v24
vmv.x.s a1,v16
vmv.x.s a0,v8
add s1,s1,s5
callsumation
add s3,s3,a0
bgeus4,s1,.L5

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highpart overlap for indexed
load.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-28.c: New test.
* gcc.target/riscv/rvv/base/pr112431-29.c: New test.
* gcc.target/riscv/rvv/base/pr112431-30.c: New test.
* gcc.target/riscv/rvv/base/pr112431-31.c: New test.
* gcc.target/riscv/rvv/base/pr112431-32.c: New test.
* gcc.target/riscv/rvv/base/pr112431-33.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-12-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #10 from GCC Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:a23415d7572774701d7ec04664390260ab9a3f63

commit r14-6055-ga23415d7572774701d7ec04664390260ab9a3f63
Author: Juzhe-Zhong 
Date:   Fri Dec 1 15:00:27 2023 +0800

RISC-V: Support highpart register overlap for widen vx/vf instructions

This patch leverages the same approach as vwcvt.

Before this patch:

.L5:
add a3,s0,s1
add a4,s6,s1
add a5,s7,s1
vsetvli zero,s0,e32,m4,ta,ma
vle32.v v16,0(s1)
vle32.v v12,0(a3)
mv  s1,s2
vle32.v v8,0(a4)
vle32.v v4,0(a5)
nop
vfwadd.vf   v24,v16,fs0
vfwadd.vf   v16,v12,fs0
vs8r.v  v16,0(sp)-> spill
vfwadd.vf   v16,v8,fs0
vfwadd.vf   v8,v4,fs0
nop
vsetvli zero,zero,e64,m8,ta,ma
vfmv.f.sfa4,v24
vl8re64.v   v24,0(sp)   -> reload
vfmv.f.sfa5,v24
fcvt.lu.d a0,fa4,rtz
fcvt.lu.d a1,fa5,rtz
vfmv.f.sfa4,v16
vfmv.f.sfa5,v8
fcvt.lu.d a2,fa4,rtz
fcvt.lu.d a3,fa5,rtz
add s2,s2,s5
callsumation
add s3,s3,a0
bgeus4,s2,.L5

After this patch:

.L5:
add a3,s0,s1
add a4,s6,s1
add a5,s7,s1
vsetvli zero,s0,e32,m4,ta,ma
vle32.v v4,0(s1)
vle32.v v28,0(a3)
mv  s1,s2
vle32.v v20,0(a4)
vle32.v v12,0(a5)
vfwadd.vf   v0,v4,fs0
vfwadd.vf   v24,v28,fs0
vfwadd.vf   v16,v20,fs0
vfwadd.vf   v8,v12,fs0
vsetvli zero,zero,e64,m8,ta,ma
vfmv.f.sfa4,v0
vfmv.f.sfa5,v24
fcvt.lu.d a0,fa4,rtz
fcvt.lu.d a1,fa5,rtz
vfmv.f.sfa4,v16
vfmv.f.sfa5,v8
fcvt.lu.d a2,fa4,rtz
fcvt.lu.d a3,fa5,rtz
add s2,s2,s5
callsumation
add s3,s3,a0
bgeus4,s2,.L5

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highpart overlap for vx/vf.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-22.c: New test.
* gcc.target/riscv/rvv/base/pr112431-23.c: New test.
* gcc.target/riscv/rvv/base/pr112431-24.c: New test.
* gcc.target/riscv/rvv/base/pr112431-25.c: New test.
* gcc.target/riscv/rvv/base/pr112431-26.c: New test.
* gcc.target/riscv/rvv/base/pr112431-27.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-11-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:303195e2a6b6f0e8f42e0578b61f9f37c6250beb

commit r14-6008-g303195e2a6b6f0e8f42e0578b61f9f37c6250beb
Author: Juzhe-Zhong 
Date:   Thu Nov 30 20:08:43 2023 +0800

RISC-V: Support widening register overlap for vf4/vf8

size_t
foo (char const *buf, size_t len)
{
  size_t sum = 0;
  size_t vl = __riscv_vsetvlmax_e8m8 ();
  size_t step = vl * 4;
  const char *it = buf, *end = buf + len;
  for (; it + step <= end;)
{
  vint8m1_t v0 = __riscv_vle8_v_i8m1 ((void *) it, vl);
  it += vl;
  vint8m1_t v1 = __riscv_vle8_v_i8m1 ((void *) it, vl);
  it += vl;
  vint8m1_t v2 = __riscv_vle8_v_i8m1 ((void *) it, vl);
  it += vl;
  vint8m1_t v3 = __riscv_vle8_v_i8m1 ((void *) it, vl);
  it += vl;

  asm volatile("nop" ::: "memory");
  vint64m8_t vw0 = __riscv_vsext_vf8_i64m8 (v0, vl);
  vint64m8_t vw1 = __riscv_vsext_vf8_i64m8 (v1, vl);
  vint64m8_t vw2 = __riscv_vsext_vf8_i64m8 (v2, vl);
  vint64m8_t vw3 = __riscv_vsext_vf8_i64m8 (v3, vl);

  asm volatile("nop" ::: "memory");
  size_t sum0 = __riscv_vmv_x_s_i64m8_i64 (vw0);
  size_t sum1 = __riscv_vmv_x_s_i64m8_i64 (vw1);
  size_t sum2 = __riscv_vmv_x_s_i64m8_i64 (vw2);
  size_t sum3 = __riscv_vmv_x_s_i64m8_i64 (vw3);

  sum += sumation (sum0, sum1, sum2, sum3);
}
  return sum;
}

Before this patch:

add a3,s0,s1
add a4,s6,s1
add a5,s7,s1
vsetvli zero,s0,e64,m8,ta,ma
vle8.v  v4,0(s1)
vle8.v  v3,0(a3)
mv  s1,s2
vle8.v  v2,0(a4)
vle8.v  v1,0(a5)
nop
vsext.vf8   v8,v4
vsext.vf8   v16,v2
vs8r.v  v8,0(sp)
vsext.vf8   v24,v1
vsext.vf8   v8,v3
nop
vmv.x.s a1,v8
vl8re64.v   v8,0(sp)
vmv.x.s a3,v24
vmv.x.s a2,v16
vmv.x.s a0,v8
add s2,s2,s5
callsumation
add s3,s3,a0
bgeus4,s2,.L5

After this patch:

add a3,s0,s1
add a4,s6,s1
add a5,s7,s1
vsetvli zero,s0,e64,m8,ta,ma
vle8.v  v15,0(s1)
vle8.v  v23,0(a3)
mv  s1,s2
vle8.v  v31,0(a4)
vle8.v  v7,0(a5)
vsext.vf8   v8,v15
vsext.vf8   v16,v23
vsext.vf8   v24,v31
vsext.vf8   v0,v7
vmv.x.s a3,v0
vmv.x.s a2,v24
vmv.x.s a1,v16
vmv.x.s a0,v8
add s2,s2,s5
callsumation
add s3,s3,a0
bgeus4,s2,.L5

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Add widening overlap of vf2/vf4.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-16.c: New test.
* gcc.target/riscv/rvv/base/pr112431-17.c: New test.
* gcc.target/riscv/rvv/base/pr112431-18.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-11-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #7 from GCC Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:5a35152f87a36db480693884dfb27ff6a5d5d683

commit r14-6007-g5a35152f87a36db480693884dfb27ff6a5d5d683
Author: Juzhe-Zhong 
Date:   Thu Nov 30 18:12:04 2023 +0800

RISC-V: Remove earlyclobber for wx/wf instructions.

While working on overlap for widening instructions, I realize that we set
vwadd.wx/vfwadd.wf as earlyclobber which is incorrect.

Since according to RVV ISA:
"The destination EEW equals the source EEW."

vwadd.vx widens the first source operand (i.e. 2 * source EEW = dest EEW)
while
vwadd.wx only widens the second/scalar source operand.

Therefore overlap is legal for wx but not for vx.

Before this patch (heave spillings):

csrra5,vlenb
sllia5,a5,1
addia5,a5,64
vfwadd.wf   v2,v14,fs0
add a5,a5,sp
vs2r.v  v2,0(a5)
vl2re32.v   v2,0(a1)
vfwadd.wf   v14,v12,fs0
vfwadd.wf   v12,v10,fs0
vfwadd.wf   v10,v8,fs0
vfwadd.wf   v8,v6,fs0
vfwadd.wf   v6,v4,fs0
vfwadd.wf   v4,v2,fs0
vfwadd.wf   v2,v16,fs0
vfwadd.wf   v16,v18,fs0
vfwadd.wf   v18,v20,fs0
vfwadd.wf   v20,v22,fs0
vfwadd.wf   v22,v24,fs0
vfwadd.wf   v24,v26,fs0
vfwadd.wf   v26,v28,fs0
vfwadd.wf   v28,v30,fs0
vfwadd.wf   v30,v0,fs0
nop
vsetvli zero,zero,e32,m2,ta,ma
csrra5,vlenb

After this patch (no spillings):

vfwadd.wf   v16,v16,fs0
vfwadd.wf   v14,v14,fs0
vfwadd.wf   v12,v12,fs0
vfwadd.wf   v10,v10,fs0
vfwadd.wf   v8,v8,fs0
vfwadd.wf   v6,v6,fs0
vfwadd.wf   v4,v4,fs0
vfwadd.wf   v2,v2,fs0
vfwadd.wf   v18,v18,fs0
vfwadd.wf   v20,v20,fs0
vfwadd.wf   v22,v22,fs0
vfwadd.wf   v24,v24,fs0
vfwadd.wf   v26,v26,fs0
vfwadd.wf   v28,v28,fs0
vfwadd.wf   v30,v30,fs0
vfwadd.wf   v0,v0,fs0

Confirm the codegen above run successfully on both SPIKE/QEMU.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Remove earlyclobber for wx/wf
instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-19.c: New test.
* gcc.target/riscv/rvv/base/pr112431-20.c: New test.
* gcc.target/riscv/rvv/base/pr112431-21.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-11-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #6 from GCC Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:8614cbb253484e28c3eb20cde4d1067aad56de58

commit r14-5984-g8614cbb253484e28c3eb20cde4d1067aad56de58
Author: Juzhe-Zhong 
Date:   Thu Nov 30 10:36:30 2023 +0800

RISC-V: Support highpart overlap for floating-point widen instructions

This patch leverages the approach of vwcvt/vext.vf2 which has been
approved.
Their approaches are totally the same.

Tested no regression and committed.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Add widenning overlap.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-10.c: New test.
* gcc.target/riscv/rvv/base/pr112431-11.c: New test.
* gcc.target/riscv/rvv/base/pr112431-12.c: New test.
* gcc.target/riscv/rvv/base/pr112431-13.c: New test.
* gcc.target/riscv/rvv/base/pr112431-14.c: New test.
* gcc.target/riscv/rvv/base/pr112431-15.c: New test.
* gcc.target/riscv/rvv/base/pr112431-7.c: New test.
* gcc.target/riscv/rvv/base/pr112431-8.c: New test.
* gcc.target/riscv/rvv/base/pr112431-9.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-11-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:62685890d8861b72f812bfe171a20332df08bd49

commit r14-5982-g62685890d8861b72f812bfe171a20332df08bd49
Author: Juzhe-Zhong 
Date:   Wed Nov 29 18:53:06 2023 +0800

RISC-V: Support highpart overlap for vext.vf

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highpart overlap for vext.vf2

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/unop_v_constraint-2.c: Adapt test.
* gcc.target/riscv/rvv/base/pr112431-4.c: New test.
* gcc.target/riscv/rvv/base/pr112431-5.c: New test.
* gcc.target/riscv/rvv/base/pr112431-6.c: New test.

[Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

2023-11-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:bdad036da32f72b84a96070518e7d75c21706dc2

commit r14-5960-gbdad036da32f72b84a96070518e7d75c21706dc2
Author: Juzhe-Zhong 
Date:   Wed Nov 29 16:34:10 2023 +0800

RISC-V: Support highpart register overlap for vwcvt

Since Richard supports register filters recently, we are able to support
highpart register
overlap for widening RVV instructions.

This patch support it for vwcvt intrinsics.

I leverage real application user codes for vwcvt:
https://github.com/riscv/riscv-v-spec/issues/929
https://godbolt.org/z/xoeGnzd8q

This is the real application codes that using LMUL = 8 with unrolling to
gain optimal
performance for specific libraury.

You can see in the codegen, GCC has optimal codegen for such since we
supported register
lowpart overlap for narrowing instructions (dest EEW < source EEW).

Now, we start to support highpart register overlap from this patch for
widening instructions (dest EEW > source EEW).

Leverage this intrinsic codes above but for vwcvt:

https://godbolt.org/z/1TMPE5Wfr

size_t
foo (char const *buf, size_t len)
{
  size_t sum = 0;
  size_t vl = __riscv_vsetvlmax_e8m8 ();
  size_t step = vl * 4;
  const char *it = buf, *end = buf + len;
  for (; it + step <= end;)
{
  vint8m4_t v0 = __riscv_vle8_v_i8m4 ((void *) it, vl);
  it += vl;
  vint8m4_t v1 = __riscv_vle8_v_i8m4 ((void *) it, vl);
  it += vl;
  vint8m4_t v2 = __riscv_vle8_v_i8m4 ((void *) it, vl);
  it += vl;
  vint8m4_t v3 = __riscv_vle8_v_i8m4 ((void *) it, vl);
  it += vl;

  asm volatile("nop" ::: "memory");
  vint16m8_t vw0 = __riscv_vwcvt_x_x_v_i16m8 (v0, vl);
  vint16m8_t vw1 = __riscv_vwcvt_x_x_v_i16m8 (v1, vl);
  vint16m8_t vw2 = __riscv_vwcvt_x_x_v_i16m8 (v2, vl);
  vint16m8_t vw3 = __riscv_vwcvt_x_x_v_i16m8 (v3, vl);

  asm volatile("nop" ::: "memory");
  size_t sum0 = __riscv_vmv_x_s_i16m8_i16 (vw0);
  size_t sum1 = __riscv_vmv_x_s_i16m8_i16 (vw1);
  size_t sum2 = __riscv_vmv_x_s_i16m8_i16 (vw2);
  size_t sum3 = __riscv_vmv_x_s_i16m8_i16 (vw3);

  sum += sumation (sum0, sum1, sum2, sum3);
}
  return sum;
}

Before this patch:

...
csrrt0,vlenb
...
vwcvt.x.x.v v16,v8
vwcvt.x.x.v v8,v28
vs8r.v  v16,0(sp)   ---> spill
vwcvt.x.x.v v16,v24
vwcvt.x.x.v v24,v4
nop
vsetvli zero,zero,e16,m8,ta,ma
vmv.x.s a2,v16
vl8re16.v   v16,0(sp)  --->  reload
...
csrrt0,vlenb
...

You can see heavy spill && reload inside the loop body.

After this patch:

...
vwcvt.x.x.v v8,v12
vwcvt.x.x.v v16,v20
vwcvt.x.x.v v24,v28
vwcvt.x.x.v v0,v4
...

Optimal codegen after this patch.

Tested on zvl128b no regression.

I am gonna to test zve64d/zvl256b/zvl512b/zvl1024b.

Ok for trunk if no regression on the testing above ?

Co-authored-by: kito-cheng 
Co-authored-by: kito-cheng 

PR target/112431

gcc/ChangeLog:

* config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS):
New register filters.
* config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Ditto.
(no,yes): Ditto.
* config/riscv/vector.md: Support highpart register overlap for
vwcvt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-1.c: New test.
* gcc.target/riscv/rvv/base/pr112431-2.c: New test.
* gcc.target/riscv/rvv/base/pr112431-3.c: New test.