[Bug tree-optimization/115387] [15 regression] RISC-V: ICE in iovsprintf.c since r15-1081-ge14afbe2d1c

2024-06-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115387

--- Comment #7 from Li Pan  ---
Thanks a lot. I am testing a fix and will send it out if nothing unexpected turns up.

[Bug tree-optimization/115387] [15 regression] RISC-V: ICE in iovsprintf.c since r15-1081-ge14afbe2d1c

2024-06-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115387

--- Comment #5 from Li Pan  ---
Thanks all. I can reproduce this now.
Sorry, I had only run the tests with newlib, not glibc. I will take care of it ASAP.

[Bug tree-optimization/115387] [15 regression] RISC-V: ICE in iovsprintf.c since r15-1081-ge14afbe2d1c

2024-06-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115387

--- Comment #2 from Li Pan  ---
(In reply to Edwin Lu from comment #1)
> Bisected to r15-1081-ge14afbe2d1c being the first bad commit

Ack, thanks Edwin, will try to reproduce this.

[Bug rtl-optimization/115013] [15 Regression] LRA: PR114810 fix result in ICE in the RISC-V Vector

2024-05-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115013

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #8 from Li Pan  ---
Fixed.

[Bug c/115013] New: LRA: PR114810 fix result in ICE in the RISC-V Vector

2024-05-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115013

            Bug ID: 115013
           Summary: LRA: PR114810 fix result in ICE in the RISC-V Vector
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

The patch

[PR114810][LRA]: Recognize alternatives with lack of available registers for
insn and demote them.

causes a number of ICEs in rvv.exp for the RISC-V backend.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
    rv64gcv/  lp64d/ medlow | 1061 /    69 |    0 /     0 |      - |
make: *** [Makefile:1096: report-gcc-newlib] Error 1

Take imm_loop_invariant-10.c as one example:

.../gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1: error: unrecognizable insn:
(insn 265 0 0 (parallel [
            (set (reg:RVVMF8QI 309 [239])
                (unspec:RVVMF8QI [
                        (reg:SI 0 zero)
                    ] UNSPEC_VUNDEF))
            (clobber (scratch:SI))
        ]) -1
     (nil))
during RTL pass: reload
…. gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1: internal compiler error: in extract_insn, at recog.cc:2812
0xa9d309 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../.././gcc/gcc/rtl-error.cc:108
0xa9d32b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././gcc/gcc/rtl-error.cc:116
0xa9bc07 extract_insn(rtx_insn*)
../.././gcc/gcc/recog.cc:2812
0x10e5ad2 ira_remove_insn_scratches(rtx_insn*, bool, _IO_FILE*, rtx_def* (*)(rtx_def*))
../.././gcc/gcc/ira.cc:5381
0x112868f remove_insn_scratches
../.././gcc/gcc/lra.cc:2154
0x112868f lra_emit_move(rtx_def*, rtx_def*)
../.././gcc/gcc/lra.cc:513
0x1136883 match_reload
../.././gcc/gcc/lra-constraints.cc:1184
0x1142ae4 curr_insn_transform
../.././gcc/gcc/lra-constraints.cc:4778
0x11443cb lra_constraints(bool)
../.././gcc/gcc/lra-constraints.cc:5481
0x112b192 lra(_IO_FILE*, int)
../.././gcc/gcc/lra.cc:2442
0x10e0e7f do_reload
../.././gcc/gcc/ira.cc:5973
0x10e0e7f execute
../.././gcc/gcc/ira.cc:6161

Reproduced with the command below:
riscv64-unknown-elf-gcc -c -S
gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c
-march=rv32gcv -mabi=ilp32 -o -

[Bug c/114885] New: RISC-V: ICE of unrecog insn when graphite for both the c/c++ and fortran

2024-04-29 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114885

            Bug ID: 114885
           Summary: RISC-V: ICE of unrecog insn when graphite for both the
                    c/c++ and fortran
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

When running some graphite tests (which require GCC to be built with isl), several
ICEs occur because an insn cannot be recognized.

FAIL: gcc.dg/graphite/pr111878.c (internal compiler error: in extract_insn, at recog.cc:2812)

FAIL: gfortran.dg/graphite/id-27.f90   -O  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/id-27.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -g  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/graphite/vect-pr40979.f90   -O  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/vect-pr40979.f90   -O  (test for excess errors)

Steps to reproduce:

1. Download isl-0.24 and make isl a symlink to it: isl -> /some-where/riscv-gnu-toolchain/gcc/isl-0.24

2. mkdir __BUILD__ && cd __BUILD__ && ../configure \
  --target=riscv64-unknown-elf \
  --prefix=${INSTALL_DIR} \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++,fortran \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --src=`pwd`/../ \
  --with-abi=lp64d \
  --with-arch=rv64gcv \
  --with-tune=rocket \
  --with-isa-spec=20191213 \
  CFLAGS_FOR_BUILD="-O0 -g" \
  CXXFLAGS_FOR_BUILD="-O0 -g" \
  CFLAGS_FOR_TARGET="-O0  -g" \
  CXXFLAGS_FOR_TARGET="-O0 -g" \
  BOOT_CFLAGS="-O0 -g" \
  CFLAGS="-O0 -g" \
  CXXFLAGS="-O0 -g" \
  GM2FLAGS_FOR_TARGET="-O0 -g" \
  GOCFLAGS_FOR_TARGET="-O0 -g" \
  GDCFLAGS_FOR_TARGET="-O0 -g"
make -j $(nproc) all-gcc && make install-gcc

3. ../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc
gcc/testsuite/gcc.dg/graphite/pr111878.c -O3 -fgraphite-identity
-fsave-optimization-record -march=rv64gcv -mabi=lp64d -c -S -o -

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #19 from Li Pan  ---
Thanks Juzhe.  Here is another example:

---
#include <riscv_vector.h>

extern size_t get_new_vl ();

size_t
__attribute__((noinline))
get_vl (size_t *c)
{
  size_t vl = c[0] + c[1];

  return vl;
}

vbool64_t
test_fail_2 (vuint64m1_t a, unsigned long b, size_t *c)
{
  return __riscv_vmsne_vx_u64m1_b64 (a, b, get_vl (c));
}
---

test_fail_2:
        addi    sp,sp,-16
        sd      ra,8(sp)
        sd      s0,0(sp)
        csrr    t0,vlenb
        sub     sp,sp,t0
        vs1r.v  v1,0(sp)
        sub     sp,sp,t0
        vs1r.v  v2,0(sp)
        sub     sp,sp,t0
        vs1r.v  v3,0(sp)
        sub     sp,sp,t0
        vs1r.v  v4,0(sp)
        sub     sp,sp,t0
        vs1r.v  v5,0(sp)
        sub     sp,sp,t0
        vs1r.v  v6,0(sp)
        sub     sp,sp,t0
        vs1r.v  v7,0(sp)
        sub     sp,sp,t0
        vs1r.v  v24,0(sp)
        sub     sp,sp,t0
        vs1r.v  v25,0(sp)
        sub     sp,sp,t0
        vs1r.v  v26,0(sp)
        sub     sp,sp,t0
        vs1r.v  v27,0(sp)
        sub     sp,sp,t0
        vs1r.v  v28,0(sp)
        sub     sp,sp,t0
        vs1r.v  v29,0(sp)
        sub     sp,sp,t0
        vs1r.v  v30,0(sp)
        sub     sp,sp,t0
        vs1r.v  v31,0(sp)
        csrr    t0,vlenb
        sub     sp,sp,t0
        vs1r.v  v8,0(sp)
        mv      s0,a0
        mv      a0,a1
        call    get_vl
        vl1re64.v       v8,0(sp)
        vsetvli zero,a0,e64,m1,ta,ma
        vmsne.vx        v0,v8,s0
        csrr    t0,vlenb
        add     sp,sp,t0
        csrr    t0,vlenb
        vl1re64.v       v31,0(sp)
        add     sp,sp,t0
        vl1re64.v       v30,0(sp)
        add     sp,sp,t0
        vl1re64.v       v29,0(sp)
        add     sp,sp,t0
        vl1re64.v       v28,0(sp)
        ...

As I understand it, these callee-saved vector registers do not need to be saved if
the function body does not clobber them.  Only the clobbered registers need to be
saved to and restored from the stack.

However, this is more of a missed optimization than a bug; we can consider improving
it in GCC-15 if my understanding is correct.
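
The idea above, saving only the callee-saved vector registers that the function body actually clobbers, can be sketched as a bitmask intersection. This is a minimal standalone C model and not GCC code: the mask constant is derived from the V ABI register table (v1-v7 and v24-v31 callee-saved), and `vregs_to_save` and `count_vregs` are hypothetical helper names.

```c
#include <stdint.h>

/* Bit N stands for vector register vN.  Callee-saved per the V ABI
   table: v1-v7 (0x000000FE) and v24-v31 (0xFF000000).  */
#define CALLEE_SAVED_VREGS ((uint32_t) 0xFF0000FEu)

/* Hypothetical helper: given the set of vector registers written by
   the function body, return the subset that the prologue/epilogue
   would have to save and restore.  */
uint32_t
vregs_to_save (uint32_t clobbered)
{
  return clobbered & CALLEE_SAVED_VREGS;
}

/* Count the registers present in a mask.  */
unsigned
count_vregs (uint32_t mask)
{
  unsigned n = 0;
  for (; mask != 0; mask &= mask - 1)
    n++;
  return n;
}
```

In this model a body that only touches v0 and v8, as test_fail_2 does, yields an empty save set, while treating every register as clobbered yields the 15 saves seen in the asm above.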

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #17 from Li Pan  ---
According to the V ABI, it looks like the asm code saves/restores the
callee-saved registers when there is a call in the function body.

| Name    | ABI Mnemonic | Meaning                | Preserved across calls?
=========================================================================
| v0      |              | Argument register      | No
| v1-v7   |              | Callee-saved registers | Yes
| v8-v23  |              | Argument registers     | No
| v24-v31 |              | Callee-saved registers | Yes
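
For quick reference, the table can be written as a small predicate. This is only an illustrative C sketch of the table as quoted here; `vreg_is_callee_saved` is a hypothetical name and not a function in the riscv backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if vector register v<regno> is preserved across calls
   according to the table above (v1-v7 and v24-v31).  */
bool
vreg_is_callee_saved (unsigned regno)
{
  assert (regno < 32);   /* There are only v0-v31.  */
  return (regno >= 1 && regno <= 7) || regno >= 24;
}
```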

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

--- Comment #4 from Li Pan  ---
(In reply to Kito Cheng from comment #3)
> Reduced case, not the final result, but it already run 8+ hours...
> ```
> typedef int a;
> typedef short b;
> typedef unsigned c;
> template < typename > using e = unsigned;
> template < typename > void ab();
> #pragma riscv intrinsic "vector"
> template < typename f, int, int ac > struct g {
>   using i = f;
>   template < typename m > using j = g< m, 0, ac >;
>   using k = g< i, 1, ac - 1 >;
>   using ad = g< i, 1, ac + 1 >;
> };
> namespace ae {
> struct af {
>   using h = g< short, 6, 0 < 3 >;
> };
> struct ag {
>   using h = af::h;
> };
> } template < typename, int > using ah = ae::ag::h;
> template < class ai > using aj = typename ai::i;
> template < class i, class ai > using j = typename ai::j< i >;
> template < class ai > using ak = j< e< ai >, ai >;
> template < class ai > using k = typename ai::k;
> template < class ai > using ad = typename ai::ad;
> template < a ap > vuint16m1_t ar(g< b, ap, 0 >, b);
> template < a ap > vuint16m2_t ar(g< b, ap, 1 >, b);
> template < a ap > vuint32m2_t ar(g< c, ap, 1 >, c);
> template < a ap > vuint32m4_t ar(g< c, ap, 2 >, c);
> template < class ai > using as = decltype(ar(ai(), aj< ai >()));
> template < class ai > as< ai > at(ai);
> namespace ae {
> template < int ap > vuint32m4_t au(g< c, ap, 1 + 1 >, vuint32m2_t l) {
>   return __riscv_vlmul_ext_v_u32m2_u32m4(l);
> }
> } template < int ap > vuint32m2_t aw(g< c, ap, 1 >, vuint16m1_t l) {
>   return __riscv_vzext_vf2_u32m2(l, 0);
> }
> namespace ae {
> vuint32m4_t ax(vuint32m4_t, vuint32m4_t, a);
> }
> template < class ay, class an > as< ay > az(ay ba, an bc) {
>   an bb;
>   return ae::ax(ae::au(ba, bc), ae::au(ba, bb), 2);
> }
> template < class bd > as< bd > be(bd, as< ad< bd > >);
> namespace ae {
> template < class bh, class bi > void bj(bh bk, bi bl) {
>   ad< decltype(bk) > bn;
>   az(bn, bl);
> }
> } template < int ap, int ac, class bp, class bq >
> void br(g< c, ap, ac > bk, bp, bq bl) {
>   ae::bj(bk, bl);
> }
> template < class ai > using bs = decltype(at(ai()));
> struct bt;
> template < int ac = 1 > class bu {
> public:
>   template < typename i > void operator()(i) {
> ah< i, ac > d;
> bt()(i(), d);
>   }
> };
> struct bt {
>   template < typename bv, class bf > void operator()(bv, bf bw) {
> using bx = bv;
> ak< bf > by;
> k< bf > bz;
> using bq = bs< decltype(by) >;
> using bp = bs< decltype(bw) >;
> bp cb;
> ab< bx >();
> for (;;) {
>   bp cc;
>   bq bl = aw(by, be(bz, cc));
>   br(by, cb, bl);
> }
>   }
> };
> void d() { bu()(b()); }
> 
> ```

Thanks Kito, this really saved my day!

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

--- Comment #2 from Li Pan  ---
vzext.vf2 has an earlyclobber destination operand, so the destination cannot be
allocated to the same register as the source operand, as in vzext.vf2 v0, v0.
Thus the check_rtl step fails.

(define_insn "@pred__vf2"
  [(set (match_operand:VWEXTI 0 "register_operand" "=vd, vr,
vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
(if_then_else:VWEXTI
  (unspec:
[(match_operand: 1 "vector_mask_operand"   " vm,Wc1,
vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1")
 (match_operand 4 "vector_length_operand"  " rK, rK,
rK, rK, rK, rK, rK, rK, rK, rK, rK, rK,   rK,   rK")
 (match_operand 5 "const_int_operand"  "i,  i,  i, 
i,  i,  i,  i,  i,  i,  i,  i,  i,i,i")
 (match_operand 6 "const_int_operand"  "i,  i,  i, 
i,  i,  i,  i,  i,  i,  i,  i,  i,i,i")
 (match_operand 7 "const_int_operand"  "i,  i,  i, 
i,  i,  i,  i,  i,  i,  i,  i,  i,i,i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (any_extend:VWEXTI
(match_operand: 3 "register_operand"  
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,   vr,   vr"))
  (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, vu, 
0,  0, vu, vu,  0,  0, vu, vu,  0,  0,   vu,0")))]
  "TARGET_VECTOR"
  "vext.vf2\t%0,%3%p1"
  [(set_attr "type" "vext")
   (set_attr "mode" "")
   (set_attr "group_overlap"
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])



insn 1205 1214 5405 70 (set (reg:RVVM1SI 97 v1 [orig:687 _1177 ] [687])
(if_then_else:RVVM1SI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 25 s9 [orig:539 _889 ] [539])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(zero_extend:RVVM1SI (reg:RVVMF2HI 97 v1 [orig:654 _1100 ] [654]))
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) "../hwy/ops/rvv-inl.h":1964:386 discrim 1
8452 {pred_zero_extendrvvm1si_vf2}
 (nil))
during RTL pass: reload
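
The earlyclobber condition described in this comment can be modelled as a range-overlap check on register groups. The sketch below is a simplified standalone C model under stated assumptions: `struct reg_group`, `reg_groups_overlap` and `earlyclobber_ok` are hypothetical names, and the model does not cover the group_overlap alternatives of the pattern above.

```c
#include <assert.h>
#include <stdbool.h>

/* A register group: NREGS consecutive hard registers starting at FIRST.  */
struct reg_group
{
  unsigned first;
  unsigned nregs;
};

/* True if the two groups share at least one hard register.  */
bool
reg_groups_overlap (struct reg_group a, struct reg_group b)
{
  return a.first < b.first + b.nregs && b.first < a.first + a.nregs;
}

/* An earlyclobber destination is acceptable only if it is disjoint
   from every source group.  */
bool
earlyclobber_ok (struct reg_group dest, const struct reg_group *srcs,
                 unsigned n_srcs)
{
  assert (dest.nregs > 0);
  for (unsigned i = 0; i < n_srcs; i++)
    if (reg_groups_overlap (dest, srcs[i]))
      return false;
  return true;
}
```

With the insn dumped above, destination and source are both hard register 97 (v1), so this model rejects the allocation.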

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Confirmed from riscv64-unknown-elf-g++ (GCC) 14.0.1 20240415 (experimental).

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #13 from Li Pan  ---
Overriding the TARGET_CLASS_LIKELY_SPILLED_P hook may not be the right fix, as it
generates various spills for the sample code below.

vbool2_t test_vmfge_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
  return __riscv_vmfge_vf_f16m8_b2(op1, op2, vl);
}

We need to rethink this from the mode-switching side.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #12 from Li Pan  ---
#include <riscv_vector.h>

extern unsigned long get_vl ();

#if 0

#else

vint32m1_t test (vint32m1_t a)
{
  unsigned b;
  return __riscv_vadd_vx_i32m1 (a, b, get_vl ()); // No ICE
}

vbool16_t test (vuint64m4_t a)
{
  unsigned long b;
  return __riscv_vmsne_vx_u64m4_b16 (a, b, get_vl ()); // ICE
}

#endif

This comes from the following check:

!(targetm.class_likely_spilled_p (REGNO_REG_CLASS (ret_start)));

For RVV, the reg_class sizes are listed below. Because the vector mask class (VM)
has only one register, it is considered likely spilled: by default the
TARGET_CLASS_LIKELY_SPILLED_P hook returns true if reg_class_size[class] == 1.

I am not sure whether overriding the TARGET_CLASS_LIKELY_SPILLED_P hook for riscv
is a reasonable fix; I am still trying to understand TARGET_CLASS_LIKELY_SPILLED_P...


panli-reg_class_size[0]=0
panli-reg_class_size[1]=14
panli-reg_class_size[2]=26
panli-reg_class_size[3]=32
panli-reg_class_size[4]=32
panli-reg_class_size[5]=2
panli-reg_class_size[6]=1  <= VM
panli-reg_class_size[7]=31 <= VD
panli-reg_class_size[8]=32 <= V
panli-reg_class_size[9]=98
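
The default behaviour described in this comment (a class is likely spilled when it contains exactly one register) can be replayed against the sizes dumped above. This is a standalone C sketch, not the GCC hook itself: `dumped_class_size` just copies the debug output, and `default_likely_spilled` is a hypothetical name for the modelled rule.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_DUMPED_CLASSES = 10 };

/* Sizes copied from the debug dump above; index 6 is VM, 7 is VD,
   8 is V.  */
static const unsigned dumped_class_size[N_DUMPED_CLASSES] =
  { 0, 14, 26, 32, 32, 2, 1, 31, 32, 98 };

/* Model of the default rule: a register class with exactly one
   register is treated as likely spilled.  */
bool
default_likely_spilled (unsigned rclass)
{
  assert (rclass < N_DUMPED_CLASSES);
  return dumped_class_size[rclass] == 1;
}
```

Under this rule only class 6 (VM) qualifies, which matches the observation that the mask register class is the one tripping the check in create_pre_exit.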

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #11 from Li Pan  ---
(In reply to Li Pan from comment #10)
> The #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)
> of the riscv backend doesn't honor vector mode.  Then the below part
> 
>  370 if (!targetm.calls.function_value_regno_p (copy_start))
>  371   copy_num = 0;
>  372 else
>  373   copy_num = hard_regno_nregs (copy_start,
>  374    GET_MODE (copy_reg));
> 
> will have copy_num == 0 and then went to a different code path.
> 
> Let me run fully riscv regression test for this fix first.

I may have misunderstood this; I need to double-check the vector ABI for return
values.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #10 from Li Pan  ---
The riscv backend's #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)
does not take vector modes into account.  As a result, the code below

 370 if (!targetm.calls.function_value_regno_p (copy_start))
 371   copy_num = 0;
 372 else
 373   copy_num = hard_regno_nregs (copy_start,
 374    GET_MODE (copy_reg));

ends up with copy_num == 0 and then takes a different code path.

Let me run the full riscv regression test for this fix first.
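
The control flow described here can be sketched in plain C. This is only a model of the quoted snippet and does not settle whether it is the root cause: the register numbers are assumptions for illustration (a0 = 10, fa0 = 42, v0 = 96), and `function_value_regno_p_model` and `copy_num_model` are hypothetical names.

```c
#include <stdbool.h>

/* Assumed hard register numbers, for illustration only.  */
enum
{
  GP_RETURN = 10,   /* a0 (assumption) */
  FP_RETURN = 42,   /* fa0 (assumption) */
  V0_REGNUM = 96    /* v0 (assumption) */
};

/* Model of the macro quoted above: only the GP and FP return
   registers are accepted.  */
bool
function_value_regno_p_model (unsigned regno)
{
  return regno == GP_RETURN || regno == FP_RETURN;
}

/* Model of the quoted snippet: the register count is forced to 0 when
   the copy source is not recognized as a return-value register.  */
unsigned
copy_num_model (unsigned copy_start, unsigned nregs_for_mode)
{
  if (!function_value_regno_p_model (copy_start))
    return 0;
  return nregs_for_mode;
}
```

In this model a value copied from a vector register gets copy_num == 0, while the same copy from a0 keeps its register count.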

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #8 from Li Pan  ---
Found an even simpler reproducer.

#include <riscv_vector.h>

extern unsigned long get_vl ();

vbool16_t test (vuint64m4_t a)
{
  unsigned long b;
  return __riscv_vmsne_vx_u64m4_b16 (a, b, get_vl ());
}

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-g++ -O3 -march=rv64gcv -c
ref.c -S -o -

Commit acc22d56e140220e7dc6c138918cb6754b6d1c0b enabled the vector ABI by default,
which triggers this assert in create_pre_exit. Replacing get_vl () with a local
variable works around the issue. I will continue to investigate.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #7 from Li Pan  ---
Bisecting points to commit acc22d56e140220e7dc6c138918cb6754b6d1c0b; I will
take a look at it.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #5 from Li Pan  ---
(In reply to Kito Cheng from comment #4)
> Reduced case:
> ```c
> typedef long c;
> #pragma riscv intrinsic "vector"
> template  struct d {};
> struct e {
>   using f = d<0>;
> };
> struct g {
>   using f = e::f;
> };
> template  using h = g::f;
> template  long k(d);
> vbool16_t j(vuint64m4_t a) {
>   c b;
>   return __riscv_vmsne_vx_u64m4_b16(a, b, k(h()));
> }
> 
> ```

Thanks Kito, I reproduced it with the reduced case using "riscv64-unknown-elf-g++
-O2 -march=rv64gcv". I will take a look.


during RTL pass: mode_sw
test.c: In function ‘vbool16_t j(vuint64m4_t)’:
test.c:15:1: internal compiler error: in create_pre_exit, at mode-switching.cc:451
   15 | }
      | ^
0x3978f12 create_pre_exit
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/mode-switching.cc:451
0x3979e9e optimize_mode_switching
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/mode-switching.cc:849
0x397b9bc execute
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/mode-switching.cc:1324
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #3 from Li Pan  ---
Reproduced from my side too.

[Bug target/114352] RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114352

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Li Pan  ---
Fixed.

[Bug c/114352] RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114352

--- Comment #1 from Li Pan  ---
Test GCC version:

riscv64-unknown-elf-gcc (GCC) 14.0.1 20240315 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug c/114352] New: RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114352

            Bug ID: 114352
           Summary: RISC-V: ICE when __attribute__((target("arch=+v")) and
                    build with rv64gc -O3
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

Assume we have the sample code below:

void
__attribute__((target("arch=+v")))
add (int *a, int *b, int *out, unsigned count)
{
  unsigned i;

  for (i = 0; i < count; i++)
    out[i] = a[i] + b[i];
}

When built with -march=rv64gc -O3, the compiler ICEs as below:
test.c: In function ‘add’:
test.c:4:1: internal compiler error: Floating point exception
    4 | {
      | ^
0x1a5891b crash_signal
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:319
0x7f0a7884251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x1f51ba4 riscv_hard_regno_nregs
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
0x1967bb9 init_reg_modes_target()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/reginfo.cc:471
0x13fc029 init_emit_regs()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
0x1a5b83d target_reinit()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:1936
0x35e374d save_target_globals()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/target-globals.cc:92
0x35e381f save_target_globals_default_opts()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/target-globals.cc:122
0x1f544cc riscv_save_restore_target_globals(tree_node*)
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
0x1f55c36 riscv_set_current_function
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9477
0x1505be7 invoke_set_current_function_hook
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/function.cc:4690
0x1505f60 allocate_struct_function(tree_node*, bool)
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/function.cc:4813
0x1044e33 store_parm_decls()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-decl.cc:11084
0x10b8a54 c_parser_declaration_or_fndef
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:2975
0x10b62b7 c_parser_external_declaration
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:2046
0x10b5d2a c_parser_translation_unit
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:1900
0x110d5f4 c_parse_file()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:26889
0x11bd3f3 c_common_parse_file()

I prepared a script to try most of the vector arch combinations; the results are:
arch=+v Fail
arch=+zve32x Fail
arch=+zve32f Fail
arch=+zve64x Fail
arch=+zve64f Fail
arch=+zve64d Fail
arch=+zvl64b Pass
arch=+zvl128b Pass
arch=+zvl256b Pass
arch=+zvl4096b Pass
arch=+zve32x_zvl64b Fail
arch=+zve32x_zvl128b Fail
arch=+zve32x_zvl256b Fail
arch=+zve32x_zvl4096b Fail
arch=+zve32f_zvl64b Fail
arch=+zve32f_zvl128b Fail
arch=+zve32f_zvl256b Fail
arch=+zve32f_zvl4096b Fail
arch=+zve64x_zvl64b Fail
arch=+zve64x_zvl128b Fail
arch=+zve64x_zvl256b Fail
arch=+zve64x_zvl4096b Fail
arch=+zve64f_zvl64b Fail
arch=+zve64f_zvl128b Fail
arch=+zve64f_zvl256b Fail
arch=+zve64f_zvl4096b Fail
arch=+zve64d_zvl64b Fail
arch=+zve64d_zvl128b Fail
arch=+zve64d_zvl256b Fail
arch=+zve64d_zvl4096b Fail
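
The 30 attribute strings in this list follow a regular pattern, so they can be generated rather than typed. The sketch below is a hypothetical C stand-in for the script mentioned above (which is not shown in this report); `emit_arch_attrs` is an illustrative name, and the code only prints the attribute strings without running the compiler.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

static const char *const vector_exts[] =
  { "v", "zve32x", "zve32f", "zve64x", "zve64f", "zve64d" };
static const unsigned zvl_bits[] = { 64, 128, 256, 4096 };

/* Print every arch attribute string from the list above, one per
   line, and return how many were printed.  */
size_t
emit_arch_attrs (FILE *out)
{
  size_t count = 0;

  /* Plain vector extensions: arch=+v, arch=+zve32x, ...  */
  for (size_t i = 0; i < ARRAY_LEN (vector_exts); i++, count++)
    fprintf (out, "arch=+%s\n", vector_exts[i]);

  /* Plain zvl extensions: arch=+zvl64b, ...  */
  for (size_t j = 0; j < ARRAY_LEN (zvl_bits); j++, count++)
    fprintf (out, "arch=+zvl%ub\n", zvl_bits[j]);

  /* Combinations; index 0 ("v") is skipped because the list only
     combines zve* with zvl*.  */
  for (size_t i = 1; i < ARRAY_LEN (vector_exts); i++)
    for (size_t j = 0; j < ARRAY_LEN (zvl_bits); j++, count++)
      fprintf (out, "arch=+%s_zvl%ub\n", vector_exts[i], zvl_bits[j]);

  return count;
}
```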

The arch values that pass still cannot vectorize the loop, whereas on AArch64
-march=armv8-a -O3 with __attribute__((target("+sve2"))) does vectorize it.

I will try to fix this ICE soon.

[Bug c/114351] New: RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114351

            Bug ID: 114351
           Summary: RISC-V: ICE when __attribute__((target("arch=+v")) and
                    build with rv64gc -O3
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #4 from Li Pan  ---
Hi Patrick,

Could you please double-check whether upstream has this problem, as well as
PR114198?

Thanks.

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #3 from Li Pan  ---
Testing a fix for possible regression.

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #2 from Li Pan  ---
This triggers the assert below in vectorizable_store; for this loop_vinfo, use_partial,
fully_masked and fully_lens are all true.

/* Shouldn't go with length-based approach if fully masked.  */
gcc_assert (!loop_lens || !loop_masks);

Introduced by this commit:
https://github.com/gcc-mirror/gcc/commit/9fb832ce382d649b7687426e6bc4e5d3715cb78a#diff-97f675a4f401d6ec84d031e0d7259a0b6ba3b50eccc3fe483e9376becc9d9cf9
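
The condition in that assert says that mask-based and length-based partial vectors must not both be in use. A tiny standalone C model of the predicate is shown below; `partial_vector_style_ok` is a hypothetical name, and the two booleans stand in for "loop_masks is non-null" and "loop_lens is non-null".

```c
#include <stdbool.h>

/* Mirrors the asserted condition (!loop_lens || !loop_masks): at most
   one of the two partial-vector styles may be active.  */
bool
partial_vector_style_ok (bool have_loop_masks, bool have_loop_lens)
{
  return !have_loop_lens || !have_loop_masks;
}
```

The failing case reported here corresponds to both arguments being true.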

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #1 from Li Pan  ---
Confirmed with
  1. build option '-march=rv64gcv -O3'.
  2. riscv64-unknown-elf-gcc (GCC) 14.0.1 20240306 (experimental).

If no one is working on this ICE already, I will take a look at it.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #4 from Li Pan  ---
I tried a hack in the riscv backend: replace the expansion code of
reduc_smax_scal_ with that of reduc_xor_scal_.

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..58424baabd7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2107,10 +2107,8 @@ (define_expand "reduc_smax_scal_"
(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  int prec = GET_MODE_PRECISION (mode);
-  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), mode);
-  riscv_vector::expand_reduction (UNSPEC_REDUC_MAX, riscv_vector::REDUCE_OP,
-  operands, min);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_XOR, riscv_vector::REDUCE_OP,
+  operands, CONST0_RTX (mode));
   DONE;
 })

The idea is to show that the last standard name should be .REDUC_XOR.

With this change the test (both the narrowed one and the original one) passes.
That may indicate we pick .REDUC_MAX by mistake somewhere. Let me try to figure
it out.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #3 from Li Pan  ---
Narrowed a little compared to the original test case.

---
int b[10][7] = {{}, // 0
{}, // 1
{}, // 2
{}, // 3
{}, // 4
{}, // 5
{0, 0, 0, 0, 0, 1}, // 6
{2, 3, 4, 5, 6, 7}, // 7
{8, 8, 8, 8, 8, 8}};// 8
   //0  1  2  3  4  5
int c;

int main() {
  int d = 0, a = 0;
  c = 0x;

  for (a = 0; a < 5; a++) {
for (d = 0; d < 6; d++) {
  c ^= -3L;

  if (b[a + 3][d])
continue;

  c = 0;
}
  }

  if (c == -3) {
return 0;
  } else {
return 1;
  }
}
---

Semantically the loop acts on a 5 * 6 matrix. Upstream currently vectorizes the
first 4 * 6 part and then goes scalar for the last 6 elements. The vectorized
part looks like below.

  vect_array.16 = .MASK_LEN_LOAD_LANES (  [(void *) + 84B],
32B, { -1, ... }, POLY_INT_CST [4, 4], 0);
  vect__28.17_94 = vect_array.16[0];
  vect__28.18_95 = vect_array.16[1];
  vect__28.19_96 = vect_array.16[2];
  vect__28.20_97 = vect_array.16[3];
  vect__28.21_98 = vect_array.16[4];
  vect__28.22_99 = vect_array.16[5];
  vect_array.16 ={v} {CLOBBER};
  mask__70.24_102 = vect__28.17_94 != { 0, ... };
  vect_prephitmp_76.25_104 = .VCOND_MASK (mask__70.24_102, { -1, ... }, { -3,
... });
  mask__80.26_106 = vect__28.18_95 != { 0, ... };
  vect_c_lsm.27_108 = .VCOND_MASK (mask__80.26_106, vect_prephitmp_76.25_104, {
0, ... });
  mask__51.28_110 = vect__28.19_96 != { 0, ... };
  vect_prephitmp_66.29_112 = .VCOND_MASK (mask__51.28_110, vect_c_lsm.27_108, {
-3, ... });
  mask__16.30_114 = vect__28.20_97 != { 0, ... };
  vect_c_lsm.31_116 = .VCOND_MASK (mask__16.30_114, vect_prephitmp_66.29_112, {
0, ... });
  mask__79.32_118 = vect__28.21_98 != { 0, ... };
  vect_prephitmp_56.33_120 = .VCOND_MASK (mask__79.32_118, vect_c_lsm.31_116, {
-3, ... });
  mask__25.34_122 = vect__28.22_99 != { 0, ... };
  vect_c_lsm.35_124 = .VCOND_MASK (mask__25.34_122, vect_prephitmp_56.33_120, {
0, ... });
  _126 = .REDUC_MAX (vect_c_lsm.35_124);

Looks like the last .REDUC_MAX is kind of a surprise here? It is not easy to
map the semantics of REDUC_MAX back to the source code. Actually c depends on
the previous iteration.

For example, as long as the b condition is 0, c stays 0. If the b condition is
non-zero, c follows a sequence similar to [-3, 0, -3, 0, ...].

Not sure if my understanding is correct, will take a look into tree-vect.
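
To make the dependency on the previous iteration concrete, here is a minimal
scalar sketch of the narrowed loop nest. It is only a reference to compare a
vectorized build against, not the vectorizer's algorithm. The function names
are made up for illustration, and `init` stands in for the initial value of c,
which is elided in the test case above.

```c
#include <assert.h>

/* Same table as the narrowed test case, with empty rows written as {0}. */
static const int b[10][7] = {
    {0}, {0}, {0}, {0}, {0}, {0},
    {0, 0, 0, 0, 0, 1},   /* row 6 */
    {2, 3, 4, 5, 6, 7},   /* row 7 */
    {8, 8, 8, 8, 8, 8},   /* row 8 */
};

/* Scalar reference for the loop nest.  `init` is an assumed placeholder
   for the elided initial value of c.  */
static int scalar_reference(int init)
{
    int c = init;
    for (int a = 0; a < 5; a++) {
        for (int d = 0; d < 6; d++) {
            c ^= -3L;          /* toggle, depends on the previous value */
            if (b[a + 3][d])
                continue;      /* non-zero entry keeps the toggled value */
            c = 0;             /* zero entry resets c */
        }
    }
    return c;
}

/* Hypothetical self-check: the very first element visited is zero, so the
   result does not depend on the initial value.  */
static void check_scalar_reference(void)
{
    assert(scalar_reference(0) == -3);
    assert(scalar_reference(0x1234) == -3);
}
```

Rows 3 to 5 reset c to 0, row 6 leaves c at -3 after its last element, and the
six non-zero entries of row 7 toggle it an even number of times, so the result
is -3. A plain maximum over lanes has no such toggle-and-reset behaviour, which
is why the .REDUC_MAX looks suspicious.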

[Bug c/113696] RISC-V: ineffective vsetvl behavior

2024-02-19 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113696

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #18 from Li Pan  ---
Thanks for the confirmation.

Yes, it was before expand. I will prepare one PATCH for this, and it should
target for gcc-15 I bet.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #16 from Li Pan  ---
I gave it a try as below and finally got the Standard Name "SAT_ADD". Could you
please help to double-check if my understanding is correct?

Given the example code below:

typedef unsigned int uint32_t;

uint32_t
sat_add (uint32_t x, uint32_t y)
{
  return (x + y) | - ((x + y) < x);
}

Then add one simplify to match.pd and define a new DEF_INTERNAL_OPTAB_FN for
it. Then we have the SAT_ADD representation after expand.

uint32_t sat_add (uint32_t x, uint32_t y)
{
  uint32_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT

}

If everything goes well, I will prepare the patch for it later. Thanks.
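
As a sanity check of the source pattern itself (not of the match.pd change), a
small sketch like the one below can compare the branchless form against an
explicit clamp. The helper names are hypothetical and only serve the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form from the example above.  */
static uint32_t sat_add_branchless(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;                /* wraps modulo 2^32 */
    return sum | -(uint32_t)(sum < x);   /* all-ones mask on overflow */
}

/* Reference with an explicit clamp in a wider type.  */
static uint32_t sat_add_reference(uint32_t x, uint32_t y)
{
    uint64_t wide = (uint64_t)x + y;
    return wide > UINT32_MAX ? UINT32_MAX : (uint32_t)wide;
}

/* Hypothetical self-check over a few boundary values.  */
static void check_sat_add(void)
{
    const uint32_t v[] = {0, 1, 2, 0x7fffffffu, 0x80000000u, UINT32_MAX - 1,
                          UINT32_MAX};
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
        for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
            assert(sat_add_branchless(v[i], v[j])
                   == sat_add_reference(v[i], v[j]));
}
```

The overflow test `sum < x` works because an unsigned addition that wraps
always produces a result smaller than either operand.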

[Bug target/113766] ICE: in generate_insn, at config/riscv/riscv-vector-builtins.cc:4186 with (invalid?) __riscv_vfredosum_tu()

2024-02-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113766

--- Comment #1 from Li Pan  ---
Thanks, I will take care of it.

[Bug target/112896] RISC-V: gcc.dg/pr30957-1.c run failure when rv64gcv_zvl1024b_zvfh_zfh

2024-02-04 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112896

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
This testcase is not well designed and has been removed from upstream, so
closing this bugzilla.

[Bug target/113697] RISC-V: Redundant vsetvl insn in function

2024-02-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113697

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-02 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #15 from Li Pan  ---
(In reply to Tamar Christina from comment #14)
> Awesome! Feel free to reach out if you need any help.
> 
> It’s likely easier to start with add and sub and get things pipe cleaned and
> expand incrementally than to try and do it all at once.

Cool, thanks in advance.

I will first try to make a SAT_ADD to the direct optab for a POC following your
RFC and suggestion. Looks like at least match.pd and internal-fn.def will be
touched. I am learning how match.pd works right now.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #13 from Li Pan  ---
I'll try to understand it and make it happen soon.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #7 from Li Pan  ---
RISC-V backend reproducer, built with "-march=rv64gcv_zba_zbb_zbc_zbs
--param=riscv-autovec-preference=fixed-vlmax -Ofast -ffast-math"

typedef unsigned short uint16_t;

void AAA (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
{
  unsigned m = 0, n = count;
  register uint16_t *p;

  p = x;

  do {
m = *--p;
*p = (uint16_t)(m >= wsize ? m-wsize : 0);
  } while (--n);

  n = wsize;
  p = y;

  do {
  m = *--p;
  *p = (uint16_t)(m >= wsize ? m-wsize : 0);
  } while (--n);
}
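
The loop body `m >= wsize ? m-wsize : 0` is an unsigned saturating subtraction.
A minimal sketch of that operation in isolation follows. The helper names are
hypothetical, and both operands are narrowed to uint16_t for simplicity, while
the reproducer above compares in `unsigned` before truncating.

```c
#include <assert.h>
#include <stdint.h>

/* Conditional form, as written in the reproducer's loop body.  */
static uint16_t sat_sub_u16(uint16_t x, uint16_t y)
{
    return (uint16_t)(x >= y ? x - y : 0);
}

/* Equivalent branchless form: mask the wrapped difference.  */
static uint16_t sat_sub_u16_branchless(uint16_t x, uint16_t y)
{
    uint16_t diff = (uint16_t)(x - y);        /* wraps modulo 2^16 */
    uint16_t mask = (uint16_t)-(x >= y);      /* 0xffff if no underflow */
    return (uint16_t)(diff & mask);
}

/* Hypothetical self-check over a few boundary values.  */
static void check_sat_sub(void)
{
    const uint16_t v[] = {0, 1, 2, 0x7fff, 0x8000, 0xfffe, 0xffff};
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
        for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
            assert(sat_sub_u16(v[i], v[j])
                   == sat_sub_u16_branchless(v[i], v[j]));
}
```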

[Bug c/113697] New: RISC-V: Redundant vsetvl insn in function

2024-01-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113697

Bug ID: 113697
   Summary: RISC-V: Redundant vsetvl insn in function
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Given the sample code below, build with -march=rv64gcv -O3 -g0

int foo (int * __restrict a, int n)
{
int result = 0;
for (int i = 0; i < n; i++)
  result += a[i];
return result;
}

The asm code looks like below; there is one redundant vsetvl insn here.

foo:
.LFB0:
.cfi_startproc
ble a1,zero,.L4
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
li  a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli a5,zero,e32,m1,ta,ma  <== redundant vsetvl
vredsum.vs  v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li  a0,0
ret

[Bug c/113696] New: RISC-V: ineffective vsetvl behavior

2024-01-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113696

Bug ID: 113696
   Summary: RISC-V: ineffective vsetvl behavior
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Given the sample code below, build with '-march=rv64gcv -O3 -g0'.


#include "riscv_vector.h"

void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond,
size_t cond2)
{
  for (size_t i = 0; i < n; i++)
{
  if (i == cond) {
vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
*(vint8mf8_t*)(out + i + 100) = v;
  } else if (i == cond2) {
vfloat32mf2_t v = *(vfloat32mf2_t*)(in + i + 200);
*(vfloat32mf2_t*)(out + i + 200) = v;
  } else if (i == (cond2 - 1)) {
vuint16mf2_t v = *(vuint16mf2_t*)(in + i + 300);
*(vuint16mf2_t*)(out + i + 300) = v;
  } else {
vint8mf4_t v = *(vint8mf4_t*)(in + i + 400);
*(vint8mf4_t*)(out + i + 400) = v;
  }
}
}

We then get the asm code below; the vsetvl insns are somewhat ineffective and
can be refined to some extent.

f:
.LFB0:
.cfi_startproc
beq a2,zero,.L12
addi a7,a0,400
addi a6,a1,400
addi a0,a0,1600
addi a1,a1,1600
li  a5,0
addi t6,a4,-1
vsetvli t3,zero,e8,mf8,ta,ma
.L7:
beq a3,a5,.L15
beq a4,a5,.L16
beq t6,a5,.L17
vsetvli t1,zero,e8,mf4,ta,ma
vle8.v  v1,0(a0)
vse8.v  v1,0(a1)
vsetvli t3,zero,e8,mf8,ta,ma
.L4:
addi a5,a5,1
addi a7,a7,4
addi a6,a6,4
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L7
.L12:
ret
.L15:
vle8.v  v1,0(a7)
vse8.v  v1,0(a6)
j   .L4
.L17:
vsetvli t1,zero,e8,mf4,ta,ma
addi t5,a0,-400
addi t4,a1,-400
vle16.v v1,0(t5)
vse16.v v1,0(t4)
vsetvli t3,zero,e8,mf8,ta,ma
j   .L4
.L16:
addi t5,a0,-800
addi t4,a1,-800
vle32.v v1,0(t5)
vse32.v v1,0(t4)
j   .L4

[Bug target/113469] RISC-V: Illegal Insn for test case 920501-8.c when make linux for rv32

2024-01-25 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113469

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug c/113469] New: RISC-V: Illegal Insn for test case 920501-8.c when make linux for rv32

2024-01-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113469

Bug ID: 113469
   Summary: RISC-V: Illegal Insn for test case 920501-8.c when
make linux for rv32
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

The test case hits an illegal instruction with the `make linux` build of the
riscv-gnu-toolchain repo for rv32.

1. Build.
../__RISC-V_INSTALL___RV32/bin/riscv32-unknown-linux-gnu-gcc
gcc/testsuite/gcc.c-torture/execute/920501-8.c -march=rv32gcv -mabi=ilp32d
-mtune=rocket -mcmodel=medlow -fdiagnostics-plain-output -O2 -w -lm -o
./920501-8.elf -static

2. Run with qemu
../build-qemu/qemu-riscv32 -cpu rv32,vlen=512,v=true,vext_spec=v1.0
920501-8.elf
Illegal instruction (core dumped)

3. On entering the function __printf_buffer (which comes from libc.a),
execution goes as below from the first insn:
  __printf_buffer:
auipc a5,0x5f  => directly jumps to the vmv insn, and then the illegal insn is hit.
...
vmv.v.i v1,0

4. After some investigation, the function __printf_buffer should be the
   function Xprintf_buffer in glibc/stdio-common/vfprintf-internal.c. You can
   use the below command to compile it.

   cd glibc/stdio-common/
../../__RISC-V_INSTALL___RV32/bin/riscv32-unknown-linux-gnu-gcc
vfprintf-internal.c  \
 -c -std=gnu11 -fgnu89-inline  -mcmodel=medlow -O2 -Wall -Wwrite-strings
-Wundef \
 -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common
-Wstrict-prototypes -Wold-style-definition  \
 -fmath-errno -fPIE   -ftls-model=initial-exec -I../include \

-I/home/pli/gcc/444/riscv-gnu-toolchain/build-glibc-linux-rv32gcv-ilp32d/stdio-common
 \
 -I/home/pli/gcc/444/riscv-gnu-toolchain/build-glibc-linux-rv32gcv-ilp32d 
-I../sysdeps/unix/sysv/linux/riscv/rv32 \
 -I../sysdeps/unix/sysv/linux/riscv  -I../sysdeps/riscv/nptl 
-I../sysdeps/unix/sysv/linux/generic/wordsize-32   \
 -I../sysdeps/unix/sysv/linux/generic  -I../sysdeps/unix/sysv/linux/include
-I../sysdeps/unix/sysv/linux  \
 -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu 
-I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  \
 -I../sysdeps/unix  -I../sysdeps/posix  \
 -I../sysdeps/riscv/rv32/rvd  -I../sysdeps/riscv/rv32/rvf  
-I../sysdeps/riscv/rvf \
 -I../sysdeps/riscv/rvd  -I../sysdeps/riscv/rv32  -I../sysdeps/riscv  \
 -I../sysdeps/ieee754/ldbl-128  -I../sysdeps/ieee754/dbl-64 
-I../sysdeps/ieee754/flt-32  \
 -I../sysdeps/wordsize-32   -I../sysdeps/ieee754  -I../sysdeps/generic \
 -I.. -I../libio -I. -nostdinc -isystem
/home/pli/gcc/444/riscv-gnu-toolchain/__RISC-V_INSTALL___RV32/lib/gcc/riscv32-unknown-linux-gnu/14.0.1/include
\
 -isystem
/home/pli/gcc/444/riscv-gnu-toolchain/__RISC-V_INSTALL___RV32/lib/gcc/riscv32-unknown-linux-gnu/14.0.1/include-fixed
  \
 -isystem /home/pli/gcc/444/riscv-gnu-toolchain/linux-headers/include \
 -D_LIBC_REENTRANT -include
/home/pli/gcc/444/riscv-gnu-toolchain/build-glibc-linux-rv32gcv-ilp32d/libc-modules.h
\
 -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC  \
 -DTOP_NAMESPACE=glibc -D_IO_MTSAFE_IO -o test.o

[Bug target/110265] RISC-V: ICE when build RVV intrinsic integer reduction with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2024-01-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110265

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Li Pan  ---
Fixed.

[Bug target/109615] Redundant VSETVL after optimized code of RVV

2024-01-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109615

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug target/113393] RISC-V: Full coverage test bugs for upstream 20240112

2024-01-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113393

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
Closed as fixed in upstream and validated.

[Bug c/113393] New: RISC-V: Full coverage test bugs for upstream 20240112

2024-01-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113393

Bug ID: 113393
   Summary: RISC-V:  Full coverage test bugs for upstream 20240112
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

For the RV64 parts

Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/pr68532.c   -O0  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test

Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

Running target
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247

--- Comment #10 from Li Pan  ---
(In reply to Robin Dapp from comment #9)
> I also noticed this (likely unwanted) vector snippet and wondered where it
> is being created.  First I thought it's a vec_extract but doesn't look like
> it.  I'm going to check why we create this.
> 
> Pan, the test was on real hardware I suppose?  

Yes.

> So regardless of the fact
> that we likely want to get rid of the snippet above, would you mind checking
> whether generic-ooo has any effect on performance?  Maybe you could try
> -march=rv64gc -mtune=generic-ooo.  Thanks.

Sure thing, actually I have some performance data that is under review to make
sure it aligns with company policy before sharing it with the community. I will
add a new column for generic-ooo.

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #8 from Li Pan  ---
The performance ratio of sha-test compared to scalar is about -2.2% when built
with -mtune=generic-ooo.

That is, the option -mtune=generic-ooo moves the ratio (compared to scalar)
from -70% to -2.2%. I suppose the remaining negative ratio may be caused by the
part mentioned by Juzhe.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #29 from Li Pan  ---
(In reply to Patrick O'Neill from comment #27)
> Linking the discussion/plan here since more interested people are CCd here.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113206#c9
> Using 4a0a8dc1b88408222b88e10278017189f6144602, the spec run failed on:
> zvl128b (All runtime fails):
> 527.cam4 (Runtime)
> 531.deepsjeng (Runtime)
> 521.wrf (Runtime)
> 523.xalancbmk (Runtime)
> 
> zvl256b:
> 507.cactuBSSN (Runtime)
> 521.wrf (Build)
> 527.cam4 (Runtime)
> 531.deepsjeng (Runtime)
> 549.fotonik3d (Runtime)
> 
> With that info I think the next steps are:
> 1. Triage the zvl256b 521.wrf build failure
> 2. Bisect the newly-failing testcases
> 3. Finish triaging the remaining testcases the fuzzer found
> 4. Attempt to manually reduce cam4 for zvl128b (since it seems to have the
> fastest build+runtime)
> 5. Attempt to manually reduce other fails.

Hi Patrick,

Thanks a lot for the summary. Could you please share some more information
about the spec2017 setup for the above data? Like the data set (test, train, or
ref), the environment (qemu, spike, or hardware) as well as the spec config
file. I just would like to make sure we are on the same page for the failures
and that they are reproducible by others.

Thanks again. Pan

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-12 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

--- Comment #19 from Li Pan  ---
(In reply to Robin Dapp from comment #7)
> Here
> 
> 0x105c6   vse8.v  v8,(a5)
> 
> is where we overwrite m.  The vl is 128 but the preceding vsetvl gets a4 =
> 46912504507016 as AVL which seems already borken.

I can reproduce this up to a point.

0x10282   vsetvli zero,a4,e8,m8,ta,ma

(gdb) p $a4
$2 = 110736

Looks like 110736 is not the correct vl here, will continue to investigate.
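
For context on why the AVL looks wrong, a rough model of how vsetvli derives vl
is sketched below. It assumes VLEN=128 (taken from the QEMU command line in the
original report) together with the e8,m8 setting from the insn above, and it
simplifies the rule to vl = min(AVL, VLMAX); the RVV specification allows other
choices when AVL lies between VLMAX and 2*VLMAX. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* VLMAX = VLEN * LMUL / SEW, for integer LMUL >= 1 (all sizes in bits).  */
static uint64_t vlmax(uint64_t vlen_bits, uint64_t sew_bits, uint64_t lmul)
{
    return vlen_bits * lmul / sew_bits;
}

/* Simplified vsetvli model: vl = min(AVL, VLMAX).  */
static uint64_t vsetvli_vl(uint64_t avl, uint64_t vlen_bits,
                           uint64_t sew_bits, uint64_t lmul)
{
    uint64_t max = vlmax(vlen_bits, sew_bits, lmul);
    return avl < max ? avl : max;
}

/* Hypothetical self-check with the values seen in the gdb session.  */
static void check_vsetvli_model(void)
{
    /* e8, m8 with VLEN=128 gives VLMAX = 128 elements.  */
    assert(vlmax(128, 8, 8) == 128);
    /* An oversized AVL such as 110736 is clamped to VLMAX.  */
    assert(vsetvli_vl(110736, 128, 8, 8) == 128);
}
```

Under this model any oversized AVL makes the following byte store write the
full 128 elements, which is consistent with the vl of 128 mentioned in
comment 7.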

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-11 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

--- Comment #18 from Li Pan  ---
I see, thanks all, will have a try with variadic function call.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

--- Comment #12 from Li Pan  ---
(In reply to Patrick O'Neill from comment #0)
> Testcase:
> int printf(char *, ...);
> int a, b, l, i, p, q, t, n, o;
> int *volatile c;
> static int j;
> static struct pack_1_struct d;
> long e;
> char m = 5;
> short s;
> #pragma pack(1)
> struct pack_1_struct {
>   long c;
>   int d;
>   int e;
>   int f;
>   int g;
>   int h;
>   int i;
> } h, r = {1}, *f = , *volatile g;
> int main() {
>   int u;
>   j = 0;
>   for (; j < 9; ++j) {
> u = ++t ? a : 0;
> if (u) {
>   int *v = 
>   *v = g || e;
>   *c = 0;
>   *f = h;
> }
> s = l && c;
> o = i;
> d.f || (p = 0);
> q |= n;
>   }
>   r = *f;
>   printf("b: %d\n", b);
>   printf("m: %d\n", m);
> }
> 
> Commands:
> rv64gc:
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/riscv64-unknown-linux-gnu-gcc
> >  -march=rv64gc -mabi=lp64d -O3 red.c -o rv64gc.out
> > QEMU_CPU=rv64,vlen=128,v=true,vext_spec=v1.0 
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/qemu-riscv64 rv64gc.out
> b: 0
> m: 5
> 
> rv64gcv:
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/riscv64-unknown-linux-gnu-gcc
> >  -march=rv64gcv -mabi=lp64d -O3 red.c -o rv64gcv.out
> > QEMU_CPU=rv64,vlen=128,v=true,vext_spec=v1.0 
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/qemu-riscv64 
> > rv64gcv.out
> b: 0
> m: 0
> 
> Nothing touches the m variable so at the end it should equal 5.
> 
> Commenting out the preceding printf("b: %d\n", b); statement causes the
> testcase to pass successfully (and doesn't cause much change to the
> assembly):
> https://godbolt.org/z/Erzzqxo8q

Could you please share the commit id of GCC for the above test? I would like to
double-check whether upstream still has this issue.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #11 from Li Pan  ---
(In reply to JuzheZhong from comment #8)
> Li Pan will investigate it. He will note me if there is a bug in vsetvl pass.

The interesting thing is that I cannot fully reproduce this with build
20231210.

PASS >> ../build-qemu/qemu-riscv64 -cpu rv64,vlen=128,v=true,vext_spec=v1.0
test.rv64gc.elf

FAIL ../build-qemu/qemu-riscv64 -cpu rv64,vlen=128,v=true,vext_spec=v1.0
test.gcv.elf
Segmentation fault (core dumped)

It will PASS if built with rv64gc but gets a segmentation fault in printf when
built with rv64gcv. Thus I adjusted this case to bypass the segmentation fault,
and then the rv64gcv build passes. The modified test case is given below.

qemu-riscv64 version 8.1.92 (v8.2.0-rc2-48-gd451e32ce8).
newlib for gcc build.

Modified test case:

int a, b, l, i, p, q, t, n, o;
int *volatile c;
static int j;
static struct pack_1_struct d;
long e;
char m = 5;
short s;

#pragma pack(1)
struct pack_1_struct {
  long c;
  int d;
  int e;
  int f;
  int g;
  int h;
  int i;
} h, r = {1}, *f = , *volatile g;

int main() {
  int u;
  j = 0;

  for (; j < 9; ++j) {
u = ++t ? a : 0;
if (u) {
  int *v = 
  *v = g || e;
  *c = 0;
  *f = h;
}
s = l && c;
o = i;
d.f || (p = 0);
q |= n;
  }

  r = *f;

  if (m == 5)// Reference m like print
return 0;

  return 1234;
}

[Bug c/112896] New: RISC-V: gcc.dg/pr30957-1.c run failure when rv64gcv_zvl1024b_zvfh_zfh

2023-12-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112896

Bug ID: 112896
   Summary: RISC-V: gcc.dg/pr30957-1.c run failure when
rv64gcv_zvl1024b_zvfh_zfh
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

The gcc.dg/pr30957-1.c test case fails in the RISC-V backend when built with
the options below.

-march=rv64gcv_zvl1024b_zvfh_zfh -mabi=lp64d  -O2 -mcmodel=medlow
--param=riscv-autovec-preference=fixed-vlmax -funroll-loops -fassociative-math
-fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller
-fdump-rtl-expand-details -lm gcc/testsuite/gcc.dg/pr30957-1.c -o test.elf

The test gcc/testsuite/gcc.dg/pr30957-1.c is similar to the code below.

float __attribute__((noinline))
foo (float d, int n)
{
  unsigned i;
  float accum = d;

  for (i = 0; i < n; i++)
accum += d;

  return accum;
}

int
main ()
{
  /* When compiling standard compliant we expect foo to return -0.0.  But the
     variable expansion during unrolling optimization (for this testcase
     enabled by non-compliant -fassociative-math) instantiates copy(s) of the
     accumulator which it initializes with +0.0.  Hence we expect that foo
     returns +0.0.  */
  if (__builtin_copysignf (1.0, foo (0.0 / -5.0, 10)) != 1.0)
abort ();
  exit (0);
}

An initial investigation shows that the RISC-V backend always gets LPT_NONE in
unroll_loops: the step of the loop becomes dynamic after vectorization, so the
simple loop flag is false, and the unroll_loops pass does nothing for a
non-simple loop.

We may need further investigation for this case.
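
The sign dependence described in the test's comment can be modelled by hand. The
sketch below is an assumption-laden illustration, not what the unroller
actually emits: it splits the sum across two accumulators and starts the extra
one at +0.0, as the comment describes. The function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Strict left-to-right accumulation, as the C source is written.  */
static float sum_sequential(float d, int n)
{
    float accum = d;
    for (int i = 0; i < n; i++)
        accum += d;
    return accum;
}

/* Hand-written model of variable expansion with two accumulators.
   The extra accumulator starts at +0.0.  */
static float sum_two_accumulators(float d, int n)
{
    float acc0 = d;
    float acc1 = 0.0f;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            acc0 += d;
        else
            acc1 += d;
    }
    return acc0 + acc1;   /* (-0.0) + (+0.0) is +0.0 in round-to-nearest */
}

/* Hypothetical self-check of the sign of the result for d = -0.0.  */
static void check_signed_zero(void)
{
    assert(signbit(sum_sequential(-0.0f, 10)) != 0);       /* stays -0.0 */
    assert(signbit(sum_two_accumulators(-0.0f, 10)) == 0); /* becomes +0.0 */
}
```

If the unroller does nothing, the loop behaves like the sequential version and
foo returns -0.0, so the test's `!= 1.0` comparison is true and abort is called.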

[Bug target/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-12-02 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

--- Comment #6 from Li Pan  ---
Double confirmed the riscv-gnu-toolchain can be built successfully with the
latest newlib.

[Bug target/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-11-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

--- Comment #4 from Li Pan  ---
There may be another ICE for zve32f, will double-check about the details.

[Bug c/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-11-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Thanks Juzhe, will take a look at this issue and keep you posted.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #17 from Li Pan  ---
(In reply to Robin Dapp from comment #15)
> Does the =m fix your issue?  Or is the code gen different then and we're
> just lucky?  For my problem it doesn't help because we still don't recognize
> an alias between load and store and the load is moved.
> 
> Richi's RFC patch from a while ago helps, though.  I cc'd you.

No, it should be something that happens to work. I double-checked the asm
layout, the alias is still false between scalar load and vector store.

   1016e:   158000ef   jal     102c6
   10172:   ffc50793   add     a5,a0,-4
   10176:   4689       li      a3,2
   10178:   0d047057   vsetvli zero,s0,e32,m1,ta,ma
   1017c:   40d8       lw      a4,4(s1)        <= LOAD
   1017e:   5e00b0d7   vmv.v.i v1,1
   10182:   74d1a423   sw      a3,1864(gp) # 13398
   10186:   0207e0a7   vse32.v v1,(a5)         <= STORE
   1018a:   03271163   bne     a4,s2,101ac

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #14 from Li Pan  ---
The diff below, similar to the x86 workaround, does not seem to work unless we
change the `+m` to `=m`. But I have not fully tested the impact of this change
beyond the case itself.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 935eeb7fd8e..882fc8fe5ec 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -85,6 +85,9 @@ (define_c_enum "unspec" [

   ;; String unspecs
   UNSPEC_STRLEN
+
+  ;; test
+  UNSPEC_MASKSTORE
 ])

 (define_c_enum "unspecv" [
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ba9c9e5a9b6..2f74cec51d1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1738,16 +1738,17 @@ (define_insn_and_split "*pred_mov"
 ;; Dedicated pattern for vse.v instruction since we can't reuse pred_mov
pattern to include
 ;; memory operand as input which will produce inferior codegen.
 (define_insn "@pred_store"
-  [(set (match_operand:V 0 "memory_operand" "+m")
-   (if_then_else:V
- (unspec:
-   [(match_operand: 1 "vector_mask_operand" "vmWc1")
-(match_operand 3 "vector_length_operand""   rK")
-(match_operand 4 "const_int_operand""i")
-(reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operand:V 2 "register_operand" "vr")
- (match_dup 0)))]
+  [(set (match_operand:V 0 "memory_operand" "=m")
+   (unspec:V
+ [(if_then_else:V
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand" "vmWc1")
+  (match_operand 3 "vector_length_operand""   rK")
+  (match_operand 4 "const_int_operand""i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (match_operand:V 2 "register_operand" "vr")
+   (match_dup 0))] UNSPEC_MASKSTORE))]
   "TARGET_VECTOR"
   "vse.v\t%2,%0%p1"
   [(set_attr "type" "vste")

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #12 from Li Pan  ---
Hi Robin,

Do you have any ideas about the possible fix for this issue? The x86 backend
has one workaround for this issue as below.

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=dbf8ab449417aa24669f6ccf50be8c17f8c1278e

But unfortunately it is not suitable for riscv after a quick try, because of
the define_insn below:
 (define_insn "@pred_store"
   [(set (match_operand:V 0 "memory_operand" "+m") // "=m" here for x86 SSE.

Given the current stage of GCC, I am not quite sure whether we need to fix it
in the backend (or bypass it) or in the middle end.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-26 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #10 from Li Pan  ---
Link to one similar issue as below.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-23 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #9 from Li Pan  ---
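Before tracer
-------------
              ENTRY
                |
            +-------+
            |  B2   |
            +-------+
             /     \
        a < 2       a >= 2
           /           \
   +-----------+    +------------+
   | vec store |--->| _3 = b[1]  |
   +-----------+    +------------+
                      /        \
                 _3 != 1      _3 == 1
                    /            \
             +--------+      +----------+
             | abort  |      | return 0 |
             +--------+      +----------+

After tracer
------------
              ENTRY
                |
            +-------+
            |  B2   |
            +-------+
             /     \
        a < 2       a >= 2
           /           \
+--------------+    +------------+
| vec store    |    | _3 = b[1]  |
|              |    +------------+
| after tracer |      /        \
|              | _3 != 1      _3 == 1
| _31 = b[1]   |    /            \
+--------------+ +--------+  +----------+
|--------------->| abort  |  | return 0 |<---|
|                +--------+  +----------+    |
|                                            |
|--------------------------------------------|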

After tracer, the vec store and the scalar load are in the same basic block and
unfortunately reference the same memory address. sched1 then moves the scalar
load before the vec store, which breaks the memory access order and causes the
failure.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-22 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #8 from Li Pan  ---
For gcc.dg/torture/pr58955-2.c, we can simply reproduce it with the options below.

Pass when: -O3
Pass when: -O3 -ftracer -fno-schedule-insns -fno-schedule-insns2
Fail when: -O3 -ftracer -fno-schedule-insns2

   10154:   4409   li   s0,2
   10156:   9c1d   subw s0,s0,a5
   10158:   1402   sll  s0,s0,0x20
   1015a:   9001   srl  s0,s0,0x20
   1015c:   97ca   add  a5,a5,s2
   1015e:   078a   sll  a5,a5,0x2
   10160:   7b018493   add  s1,gp,1968 # 13400 
   10164:   97a6   add  a5,a5,s1
   10166:   00241613   sll  a2,s0,0x2
   1016a:   853e   mv   a0,a5
   1016c:   4581   li   a1,0
   1016e:   158000ef   jal  102c6 
   10172:   ffc50793   add  a5,a0,-4
   10176:   4689   li   a3,2
   10178:   0d047057   vsetvli  zero,s0,e32,m1,ta,ma
   1017c:   40d8   lw   a4,4(s1)       <== Load
   1017e:   5e00b0d7   vmv.v.i  v1,1
   10182:   74d1a423   sw   a3,1864(gp) # 13398 
   10186:   0207e0a7   vse32.v  v1,(a5) <== Store
   1018a:   03271163   bne  a4,s2,101ac 

It looks like the combination of the tracer and sch1 causes the failure; AFAIK
it is a typical load-before-store issue. Semantically the lw load should come
after the vse32 store, but sch1 moves it before the store, so the value of a4
is not the expected one.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-11-22 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #31 from Li Pan  ---
We still have some unnecessary stack-related code here; I will take care of it
in another patch.

After this patch:
test:
  lui     a5,%hi(.LANCHOR0)
  addi    a5,a5,%lo(.LANCHOR0)
  li      a4,32
  addi    sp,sp,-32   <== unnecessary insn
  vsetvli zero,a4,e8,m1,ta,ma
  vle8.v  v1,0(a5)
  vs1r.v  v1,0(a0)
  addi    sp,sp,32    <== unnecessary insn
  jr      ra

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-11-20 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #21 from Li Pan  ---
(In reply to Robin Dapp from comment #18)
> I did a quick testsuite run on rv32 and can confirm that this fixes the
> issue for me.

Confirmed that this fixes the issue on RV64 too.

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

Li Pan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Li Pan  ---
The FLOATN support patch has already been merged to trunk; the builtins below
have FLOATN support now.

1. lrint
2. lround
3. llrint
4. llround

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

--- Comment #5 from Li Pan  ---
(In reply to Li Pan from comment #4)
> (In reply to Richard Biener from comment #3)
> > Ah, yes, for lrint we have the builtins - I just looked for lceil here.  So
> > yeah, where there are DEF_EXT_LIB_FLOATN_NX_BUILTINS we should have
> > DEF_INTERNAL_FLT_FLOATN_FN.
> 
> Thanks Richard, I will have a try for this change.

After some double-checking, the related definitions are listed below.

         glibc  GCC-FLOATN_NX_BUILTINS
iceil    N      N
ifloor   N      N
irint    N      N
iround   N      N

lceil    N      N
lfloor   N      N
lrint    Y      Y
lround   Y      Y

llceil   N      N
llfloor  N      N
llrint   Y      Y
llround  Y      Y

We only need to support lrint/lround/llrint/llround for FLOATN for now.
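
As a rough illustration of the table above, it can be encoded with a
.def-style list macro. This is only a sketch: DEMO_FN_LIST, DEMO_FN and
demo_has_floatn are hypothetical names, not the macros in internal-fn.def or
builtins.def.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list: (name, has FLOATN variant), taken from the table.  */
#define DEMO_FN_LIST   \
  DEMO_FN (iceil, 0)   \
  DEMO_FN (ifloor, 0)  \
  DEMO_FN (irint, 0)   \
  DEMO_FN (iround, 0)  \
  DEMO_FN (lceil, 0)   \
  DEMO_FN (lfloor, 0)  \
  DEMO_FN (lrint, 1)   \
  DEMO_FN (lround, 1)  \
  DEMO_FN (llceil, 0)  \
  DEMO_FN (llfloor, 0) \
  DEMO_FN (llrint, 1)  \
  DEMO_FN (llround, 1)

struct demo_fn
{
  const char *name;
  int has_floatn;
};

/* Expand the list into a table, one entry per DEMO_FN line.  */
#define DEMO_FN(NAME, FLOATN) { #NAME, FLOATN },
static const struct demo_fn demo_fns[] = { DEMO_FN_LIST };
#undef DEMO_FN

/* Returns 1 or 0 for a listed function, -1 for an unknown name.  */
static int
demo_has_floatn (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof demo_fns / sizeof demo_fns[0]; i++)
    if (strcmp (demo_fns[i].name, name) == 0)
      return demo_fns[i].has_floatn;
  return -1;
}
```

Only the four entries marked 1 would need the FLOATN flavour of the
definition; the rest can stay as they are.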

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

--- Comment #4 from Li Pan  ---
(In reply to Richard Biener from comment #3)
> Ah, yes, for lrint we have the builtins - I just looked for lceil here.  So
> yeah, where there are DEF_EXT_LIB_FLOATN_NX_BUILTINS we should have
> DEF_INTERNAL_FLT_FLOATN_FN.

Thanks Richard, I will have a try for this change.

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

--- Comment #2 from Li Pan  ---
(In reply to Richard Biener from comment #1)
> Is there a corresponding C API?  We don't have "generic" versions in
> builtins.def either (with _VAR).
> 
> That said, what's the testcase here?

I found some FLOATN-like APIs in the glibc doc, for example when N is 16.

long int lrintfN (_FloatN x);
long int lroundfN (_FloatN x);

https://www.gnu.org/software/libc/manual/2.38/html_mono/libc.html

The context is the auto-vectorization of lrintf and lrintf16, for example as
below.

void
test_lrintf16 (long *out, _Float16 *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf16 (in[i]);
}

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

We get code similar to the below when compiling with "-march=rv64gcv_zvfh_zfh
-mabi=lp64d -O3 -ftree-vectorize -ffast-math".
void
test_lrintf16 (long *out, _Float16 *in, unsigned count)
{
  # ivtmp.8_28 = PHI  
  # ivtmp.9_25 = PHI 
  _22 = (void *) ivtmp.8_28;
  _4 = MEM[(_Float16 *)_22];
  _7 = __builtin_lrintf16 (_4);
  _21 = (void *) ivtmp.9_25;
  MEM[(long int *)_21] = _7;
  ivtmp.8_27 = ivtmp.8_28 + 2;
  ivtmp.9_24 = ivtmp.9_25 + 8;
}

void
test_lrintf (long *out, float *in, unsigned count)
{
  # ivtmp.37_32 = PHI 
  # ivtmp.40_26 = PHI 
  _23 = (void *) ivtmp.37_32;
  vect__4.21_40 = MEM  [(float *)_23];
  vect__7.22_41 = .LRINT (vect__4.21_40); // Expand lrint
  _22 = (void *) ivtmp.40_26;
  MEM  [(long int *)_22] = vect__7.22_41;
  ivtmp.37_48 = ivtmp.37_32 + 64;
  ivtmp.40_25 = ivtmp.40_26 + 128;
}

[Bug c/112432] New: Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

Bug ID: 112432
   Summary: Internal-fn: The [i|l|ll]rint family don't support
FLOATN
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

The [i|l|ll]rint family is defined with DEF_INTERNAL_FLT_FN instead of
DEF_INTERNAL_FLT_FLOATN_FN in internal-fn.def. Thus, a standard name like
lrint cannot be expanded when the _Float16 type is given.

Is there any reason/background why [i|l|ll]rint does not honor FLOATN? All
related fn definitions are listed below.

DEF_INTERNAL_FLT_FN (ICEIL, ECF_CONST, lceil, unary_convert)
DEF_INTERNAL_FLT_FN (IFLOOR, ECF_CONST, lfloor, unary_convert)
DEF_INTERNAL_FLT_FN (IRINT, ECF_CONST, lrint, unary_convert)
DEF_INTERNAL_FLT_FN (IROUND, ECF_CONST, lround, unary_convert)
DEF_INTERNAL_FLT_FN (LCEIL, ECF_CONST, lceil, unary_convert)
DEF_INTERNAL_FLT_FN (LFLOOR, ECF_CONST, lfloor, unary_convert)
DEF_INTERNAL_FLT_FN (LRINT, ECF_CONST, lrint, unary_convert)
DEF_INTERNAL_FLT_FN (LROUND, ECF_CONST, lround, unary_convert)
DEF_INTERNAL_FLT_FN (LLCEIL, ECF_CONST, lceil, unary_convert)
DEF_INTERNAL_FLT_FN (LLFLOOR, ECF_CONST, lfloor, unary_convert)
DEF_INTERNAL_FLT_FN (LLRINT, ECF_CONST, lrint, unary_convert)
DEF_INTERNAL_FLT_FN (LLROUND, ECF_CONST, lround, unary_convert)

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-11-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #27 from Li Pan  ---
Hi Richard and Juzhe.

I investigated this issue recently and noticed that it may be related to the
array size of the constant memory. Assume we have 2 functions as below.

vuint8m1_t fn_0 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m1(arr, 32);
}

vuint8m2_t fn_1 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m2(arr, 32);
}

The vuint8m1 version will have stack variables but the vuint8m2 version
doesn't. Thus I guess there may be some limitation in the optimization.
Finally, I located extract_low_bits, called from get_stored_val in dse. It
looks like it can only handle scalar modes if the nunits are not equal.

rtx extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
{
  ...
  if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
      || !int_mode_for_mode (mode).exists (&int_mode))
    return NULL_RTX;
  ...
}

I tried to allow vector modes for the gen_lowpart here if and only if the size
of mode is not greater than that of src mode. It eliminates the stack
variables, as we expected, up to a point for the above functions.

I ran the RVV regression tests and they look good for now. But I would like to
double-confirm with you that this is reasonable before we start more
testing. ;)

Thanks.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #8 from Li Pan  ---
Still fails in upstream.

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc \
  -march=rv64imafdcv -mabi=lp64d \
  -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax \
  --param riscv-autovec-lmul=dynamic --param vect-epilogues-nomask=0 \
  -ffast-math -lm \
  gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c \
  -o test.elf

../build-qemu/qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v1.0 \
  test.elf
assertion "dest_int32_t_int8_t[i * 2] == (src_int32_t_int8_t [index_int32_t_int8_t[i * 2]] + 1)" failed:
  file "gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c",
  line 45, function: main

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 14.0.0 20231021 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #7 from Li Pan  ---
Seems no luck with --param vect-epilogues-nomask=0. I will retry this issue
with the newest upstream and, if everything looks OK, keep you posted.

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc \
  -march=rv64imafdcv -mabi=lp64d \
  -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax \
  --param riscv-autovec-lmul=dynamic --param vect-epilogues-nomask=0 \
  -ffast-math -lm \
  gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c \
  -o test.elf

../build-qemu/qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v1.0 \
  test.elf
assertion "dest_int32_t_int8_t[i * 2] == (src_int32_t_int8_t [index_int32_t_int8_t[i * 2]] + 1)" failed:
  file "gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c",
  line 45, function: main

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 14.0.0 20231019 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #5 from Li Pan  ---
Thank you. If there is anything I can help with, please feel free to let me
know.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-25 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #3 from Li Pan  ---
Double-confirmed that GCC trunk still has this issue.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-24 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #2 from Li Pan  ---
Add more information about how to build and run the test cases.

Build:

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc -march=rv64imafdcv
-mabi=lp64d -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax
--param riscv-autovec-lmul=dynamic -ffast-math -lm
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
-o test.elf

Run:

qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v1.0  test.elf
assertion "dest_float_uint8_t[i * 2] == (src_float_uint8_t
[index_float_uint8_t[i * 2]] + 1)" failed: file
"gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c",
line 106, function: main

[Bug c/111970] [tree-optimization] SLP for non-IFN gathers result in RISC-V test failure on gather

2023-10-24 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #1 from Li Pan  ---
Created attachment 56198
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56198&action=edit
Without this commit

[Bug c/111970] New: [tree-optimization] SLP for non-IFN gathers result in RISC-V test failure on gather

2023-10-24 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

Bug ID: 111970
   Summary: [tree-optimization] SLP for non-IFN gathers result in
RISC-V test failure on gather
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 56197
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56197&action=edit
Within this commit

Hi Richard Biener,

Recently we found one regression in the RISC-V backend for gather autovec,
namely
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c.
I narrowed it down to a small piece of code like below:

include 

#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                      \
  void __attribute__ ((noinline, noclone))                                    \
  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
                                INDEX_TYPE *restrict index)                   \
  {                                                                           \
    for (int i = 0; i < 100; ++i)                                             \
      {                                                                       \
        y[i * 2] = x[index[i * 2]] + 1;                                       \
        y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                               \
      }                                                                       \
  }

TEST_LOOP (float, uint8_t)

The commit beab5b95c581452adeb26efd59ae84a61fb3b429
(tree-optimization/31 - SLP for non-IFN gathers) causes incorrect IR to be
generated, as shown in the attachments.

The data array and the index array should have the same step after
vectorization. But we get an incorrect offset for the second iteration.

vector(32) float vect__11.11;
_209 = BIT_FIELD_REF  [(uint8_t *)_163], 8, 16>;

then update offset for the second iteration.

ivtmp.35_613 = ivtmp.35_594 + 64; // should be ivtmp = ivtmp + 32
ivtmp.38_76 = ivtmp.38_620 + 256;
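
For reference, the scalar semantics that any vectorized version has to
preserve can be sketched as below. This is a hand-written sketch, not the
testsuite source; gather_ref, first_mismatch and demo_run are hypothetical
helper names, and the input data is made up.

```c
#include <assert.h>
#include <stdint.h>

#define N 100

/* Scalar reference for TEST_LOOP (float, uint8_t).  */
static void
gather_ref (float *restrict y, const float *restrict x,
            const uint8_t *restrict index)
{
  for (int i = 0; i < N; ++i)
    {
      y[i * 2] = x[index[i * 2]] + 1;
      y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;
    }
}

/* Returns the first output position that disagrees with the reference
   semantics, or -1 if every element matches.  */
static int
first_mismatch (const float *y, const float *x, const uint8_t *index)
{
  for (int i = 0; i < N * 2; ++i)
    {
      float expected = x[index[i]] + (i % 2 == 0 ? 1 : 2);
      if (y[i] != expected)
        return i;
    }
  return -1;
}

/* Builds made-up inputs, runs the reference, optionally corrupts one
   output element, and reports the first mismatch.  */
static int
demo_run (int corrupt_at)
{
  float x[256], y[N * 2];
  uint8_t index[N * 2];

  assert (corrupt_at < N * 2);
  for (int i = 0; i < 256; i++)
    x[i] = (float) i;
  for (int i = 0; i < N * 2; i++)
    index[i] = (uint8_t) ((i * 7) % 256);

  gather_ref (y, x, index);
  if (corrupt_at >= 0)
    y[corrupt_at] += 1;
  return first_mismatch (y, x, index);
}
```

A wrong step on the index pointer shows up in such a check as a mismatch at
the first element of the second vector iteration.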

I have also uploaded the tree.optimized dumps from before and after this
commit, where you can check more details. If any more information is required,
please feel free to let me know.

Pan

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #14 from Li Pan  ---
Looks like option -fmerge-all-constants doesn't work for this case, as well as
RISC-V.

For RISC-V, the CLOBBER exists after tree gimple.

void test (vuint8m1_t *out) {

  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  *out = *(vuint8m1_t *)arr;
}

void test (vuint8m1_t * out)
{
  uint8_t arr[32];

  try
{
  arr =
"\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
  arr.0_1 = &arr;
  _2 = MEM[(vuint8m1_t *)arr.0_1];
  *out = _2;
}
  finally
{
  arr = {CLOBBER(eol)};
}
}

[Bug target/111313] RISC-V: Incorrect vsetvl code gen for 2 level loop

2023-09-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111313

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
Closed it as validated.

[Bug c/111313] New: RISC-V: Incorrect code gen for 2 level loop

2023-09-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111313

Bug ID: 111313
   Summary: RISC-V: Incorrect code gen for 2 level loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55846
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55846&action=edit
Reproduce code

Given we have an example code as below.

#define K 32

signed short in[2*K][K] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
signed short coeff[K][K] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));

__attribute__ ((noinline)) void
test ()
{
  for (int j = 0; j < K; j++)
  {
for (int i = 0; i < 2*K; i++)
  in[i][j] = i+j;

for (int i = 0; i < K; i++)
  coeff[i][j] = i + 2;
  }
}

When compiled with options similar to "-march=rv64imafdcv -mabi=lp64d
-mcmodel=medlow   -fdiagnostics-plain-output  -flto -ffat-lto-objects   --param
riscv-autovec-preference=scalable -Wno-psabi -ftree-vectorize
-fno-tree-loop-distribute-patterns   -fno-vect-cost-model -fno-common
-fdump-tree-vect-details "

The assembly code will be:

init_in:
lui t1,%hi(coeff)
lui a7,%hi(in)
csrr a0,vlenb
addi t1,t1,%lo(coeff)
addi a7,a7,%lo(in)
srli a0,a0,2
li  a6,0
li  t3,32
vsetvli a1,zero,e16,mf2,ta,ma
vid.v   v3
vsll.vi v3,v3,6
.L2:
mv  a2,a7
li  a4,64
vmv.v.x v4,a6  <= this insn will have e16 first, and then e32 when looping back
vsetvli zero,zero,e32,m1,ta,ma
vid.v   v2
.L3:
vsetvli zero,zero,e16,mf2,ta,ma
vmv1r.v v1,v2
vncvt.x.x.w v1,v1
vsetvli a5,a4,e8,mf4,ta,ma
vsetvli a3,zero,e16,mf2,ta,ma
sub a4,a4,a5
vadd.vv v1,v1,v4
vsetvli zero,a5,e16,mf2,ta,ma
slli a5,a5,6
vsuxei16.v  v1,(a2),v3
vsetvli a1,zero,e32,m1,ta,ma
add a2,a2,a5
vmv.v.x v1,a0
vadd.vv v2,v2,v1
bne a4,zero,.L3
mv  a2,t1
li  a4,32
vid.v   v2
.L4:
vsetvli zero,zero,e16,mf2,ta,ma
vmv1r.v v1,v2
vncvt.x.x.w v1,v1
vsetvli a5,a4,e8,mf4,ta,ma
vsetvli a3,zero,e16,mf2,ta,ma
sub a4,a4,a5
vadd.vi v1,v1,2
vsetvli zero,a5,e16,mf2,ta,ma
slli a5,a5,6
vsuxei16.v  v1,(a2),v3
vsetvli a1,zero,e32,m1,ta,ma
add a2,a2,a5
vmv.v.x v1,a0
vadd.vv v2,v2,v1
bne a4,zero,.L4
addiw   a6,a6,1
addi t1,t1,2
addi a7,a7,2
bne a6,t3,.L2
ret
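
The problem with the marked vmv.v.x can be modelled with a tiny piece of C
that only tracks the SEW in effect when the instruction executes. This is a
simplified model of the listing above, not a simulator; record_vmv_sew and
demo_sew_at_trip are hypothetical names.

```c
#include <assert.h>

/* Records the SEW seen by "vmv.v.x v4,a6" at the top of .L2 for each
   trip of the outer loop.  */
static void
record_vmv_sew (int outer_trips, int *seen)
{
  assert (outer_trips >= 0 && seen != 0);

  int sew = 16;              /* preheader: vsetvli a1,zero,e16,mf2,ta,ma */
  for (int j = 0; j < outer_trips; j++)
    {
      seen[j] = sew;         /* .L2: vmv.v.x runs under the incoming SEW */
      sew = 32;              /* vsetvli zero,zero,e32,m1 after vmv.v.x   */
      sew = 32;              /* .L3 exits after vsetvli a1,zero,e32,m1   */
      sew = 32;              /* .L4 exits after vsetvli a1,zero,e32,m1   */
    }
}

/* Convenience wrapper: SEW seen on a given outer-loop trip (0..3).  */
static int
demo_sew_at_trip (int trip)
{
  int seen[4];
  assert (trip >= 0 && trip < 4);
  record_vmv_sew (4, seen);
  return seen[trip];
}
```

The first trip sees e16 and every later trip sees e32, so the same
instruction runs with two different vtype settings.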

[Bug target/110985] RISC-V: Incorrect code gen for RVV VLS

2023-08-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985

Li Pan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Li Pan  ---
Closing this bug as the fix has already been committed.

[Bug c/110985] New: RISC-V: Incorrect code gen for RVV VLS

2023-08-11 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985

Bug ID: 110985
   Summary: RISC-V: Incorrect code gen for RVV VLS
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Given we have the below sample code.

#include 

typedef int16_t vnx16i __attribute__ ((vector_size (32)));

void
foo (int16_t *__restrict out)
{
  vnx16i v = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
  *(vnx16i *) out = v;
}

It generates the incorrect asm below when compiled with "-march=rv64gcv -O3
--param=riscv-autovec-preference=fixed-vlmax".

foo:
ret

In fact, it should be something similar to the assembly code below.

foo:
vsetivli zero, 16, e16, m2, ta, ma
vid.v    v8
vrsub.vi v8, v8, 15
vse16.v  v8, (a0)
ret
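
As a sanity check on that sequence, the following scalar C model shows that
vid.v followed by vrsub.vi with immediate 15 yields exactly the constant
vector from the sample. It is only a sketch; model_vid_vrsub and
matches_expected are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of "vid.v v8" followed by "vrsub.vi v8, v8, 15".  */
static void
model_vid_vrsub (int16_t *out, int lanes)
{
  assert (out != 0 && lanes > 0);
  for (int i = 0; i < lanes; i++)
    {
      int16_t lane = (int16_t) i;      /* vid.v: element index          */
      out[i] = (int16_t) (15 - lane);  /* vrsub.vi: immediate - element */
    }
}

/* Returns 1 if the model reproduces the vector stored by foo.  */
static int
matches_expected (void)
{
  static const int16_t v[16]
    = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
  int16_t out[16];

  model_vid_vrsub (out, 16);
  for (int i = 0; i < 16; i++)
    if (out[i] != v[i])
      return 0;
  return 1;
}
```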

[Bug other/110744] [14 regression] gcc.dg/tree-ssa/pr84512.c fails after r14-2267-gb8806f6ffbe72

2023-07-19 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110744

--- Comment #7 from Li Pan  ---
Thanks a lot for the explanation, Kewen. 

It looks like you are taking care of this already. If anything is required
from my side, please feel free to let me know.

[Bug other/110744] [14 regression] gcc.dg/tree-ssa/pr84512.c fails after r14-2267-gb8806f6ffbe72

2023-07-19 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110744

--- Comment #2 from Li Pan  ---
Hi there,

I just tried to reproduce this bug with a PowerPC cross compiler (sorry, we
don't have real PowerPC hardware) using the options below. Unfortunately, I
failed to reproduce the bug as described above.

Could you please take a look and check whether something is missing or
incorrect?

1. Build cross compiler

../configure \
  --target=powerpc-unknown-elf \
  --prefix=${INSTALL_DIR} \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++ \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --enable-multilib \
  --src=`pwd`/../ \
  --enable-werror

make -j $(nproc) all-gcc && make install-gcc

2. Compile the source file.

>> ./bin/powerpc-unknown-elf-gcc -mcpu=power10 -O3 -fdump-tree-optimized -c -S 
>> ../gcc/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c -o -
.file   "pr84512.c"
.machine power10
.section ".text"
.align 2
.align 4
.globl foo
.type   foo, @function
foo:
.LFB0:
li 3,285
blr
.LFE0:
.size   foo, .-foo
.section .eh_frame,"aw",@progbits
.Lframe1:
.4byte  .LECIE1-.LSCIE1
.LSCIE1:
.4byte  0
.byte   0x3
.string ""
.byte   0x1
.byte   0x7c
.byte   0x41
.byte   0xc
.byte   0x1
.byte   0
.align 2
.LECIE1:
.LSFDE1:
.4byte  .LEFDE1-.LASFDE1
.LASFDE1:
.4byte  .LASFDE1-.Lframe1
.4byte  .LFB0
.4byte  .LFE0-.LFB0
.align 2
.LEFDE1:
.ident  "GCC: (GNU) 14.0.0 20230720 (experimental)"

>> cat pr84512.c.257t.optimized

;; Function foo (foo, funcdef_no=0, decl_uid=3445, cgraph_uid=1,
symbol_order=0)

int foo ()
{
   [local count: 97603129]:
  return 285;

}

[Bug c/110299] New: RISC-V: ICE when build RVV intrinsic widen with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2023-06-18 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110299

Bug ID: 110299
   Summary: RISC-V: ICE when build RVV intrinsic widen with
"-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and
13.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55358
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55358&action=edit
Reproduce code

Given we have the below code.

#include "riscv_vector.h"

vfloat32m1_t test_vfwredosum_vs_f16m1_f32m1(vfloat16m1_t vector, vfloat32m1_t
scalar, size_t vl) {
  return __riscv_vfwredosum_vs_f16m1_f32m1(vector, scalar, vl);
}

An ICE occurs when building with a command similar to "riscv64-unknown-elf-gcc
-march=rv64gc_zve64d -mabi=lp64 -O3 -Wno-psabi -c -S test-float.c -o -".

.text
test-widen.c: In function 'test_vfwredosum_vs_f16m1_f32m1':
test-widen.c:4:10: error: invalid argument to built-in function
4 |   return __riscv_vfwredosum_vs_f16m1_f32m1(vector, scalar, vl);
  |  ^
during RTL pass: expand
test-widen.c:4:10: internal compiler error: Segmentation fault
0x1044343 crash_signal
        ../.././gcc/gcc/toplev.cc:314
0x7f76d0c4251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0xc5d1d7 store_expr(tree_node*, rtx_def*, int, bool, bool)
        ../.././gcc/gcc/expr.cc:6345
0xc5f500 expand_assignment(tree_node*, tree_node*, bool)
        ../.././gcc/gcc/expr.cc:6048
0xb2142c expand_call_stmt
        ../.././gcc/gcc/cfgexpand.cc:2829
0xb2142c expand_gimple_stmt_1
        ../.././gcc/gcc/cfgexpand.cc:3880
0xb2142c expand_gimple_stmt
        ../.././gcc/gcc/cfgexpand.cc:4044
0xb26770 expand_gimple_basic_block
        ../.././gcc/gcc/cfgexpand.cc:6096
0xb28837 execute
        ../.././gcc/gcc/cfgexpand.cc:6831
[Bug c/110277] RISC-V: ICE when build RVV intrinsic float reduction with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2023-06-16 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110277

--- Comment #1 from Li Pan  ---
Meanwhile, the float reduction for FP16 is not well supported for both the
ZVE64 and ZVE32. We will try to fix them together with this bug.

[Bug c/110277] New: RISC-V: ICE when build RVV intrinsic float reduction with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2023-06-16 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110277

Bug ID: 110277
   Summary: RISC-V: ICE when build RVV intrinsic float reduction
with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14
and 13.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55338
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55338&action=edit
Reproduce code

Given we have the below code.

#include "riscv_vector.h"

vfloat32m1_t test_vfredmax_vs_f32mf2_f32m1(vfloat32mf2_t vector, vfloat32m1_t
scalar, size_t vl) {
  return __riscv_vfredmax_vs_f32mf2_f32m1(vector, scalar, vl);
}

An ICE occurs when building with a command similar to "riscv64-unknown-elf-gcc
-march=rv64gc_zve64d -mabi=lp64 -O3 -Wno-psabi -c -S test-float.c -o -".

test-float.c: In function ‘test_vfredmax_vs_f32mf2_f32m1’:
test-float.c:17:10: error: invalid argument to built-in function
   17 |   return __riscv_vfredmax_vs_f32mf2_f32m1(vector, scalar, vl);
  |  ^~~~
during RTL pass: expand
test-float.c:17:10: internal compiler error: Segmentation fault
0x16e8945 crash_signal
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:314
0x7fcc1724251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x111c93c store_expr(tree_node*, rtx_def*, int, bool, bool)
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/expr.cc:6345
0x111ae77 expand_assignment(tree_node*, tree_node*, bool)
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/expr.cc:6048
0xf65d2c expand_call_stmt
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:2829
0xf69ac6 expand_gimple_stmt_1
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:3880
0xf6a1b3 expand_gimple_stmt
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:4044
0xf72d20 expand_gimple_basic_block
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:6096
0xf75279 execute
        /home/pli/repos/gcc/555/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/cfgexpand.cc:6831

[Bug c/110265] New: RISC-V: ICE when build RVV intrinsic with "-march=rv32gc_zve64d -mabi=ilp32d"

2023-06-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110265

Bug ID: 110265
   Summary: RISC-V: ICE when build RVV intrinsic with
"-march=rv32gc_zve64d -mabi=ilp32d"
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55325
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55325&action=edit
Reproduce source code

Given we have the below code.

#include "riscv_vector.h"

vint16m1_t test_vredmax_vs_i16mf4_i16m1(vint16mf4_t vector, vint16m1_t scalar,
size_t vl) {
  return __riscv_vredmax_vs_i16mf4_i16m1(vector, scalar, vl);
}

An ICE occurs when building with a command similar to "riscv64-unknown-elf-gcc
-march=rv32gc_zve64d -mabi=ilp32d -O3 -Wno-psabi test-int.c  -c -S -o -".

>> ../__RISC-V_INSTALL_/bin/riscv64-unknown-elf-gcc -march=rv32gc_zve64d 
>> -mabi=ilp32d -O3 -Wno-psabi test-int.c  -c -S -o -
.file   "test-int.c"
.option nopic
.attribute arch,
"rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
test-int.c: In function ‘test_vredmax_vs_i16mf4_i16m1’:
test-int.c:4:10: error: invalid argument to built-in function
4 |   return __riscv_vredmax_vs_i16mf4_i16m1(vector, scalar, vl);
  |  ^~~
during RTL pass: expand
test-int.c:4:10: internal compiler error: Segmentation fault
0x16e7017 crash_signal
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/toplev.cc:314
0x7f9dcf04251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x111b9fa store_expr(tree_node*, rtx_def*, int, bool, bool)
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/expr.cc:6352
0x1119e77 expand_assignment(tree_node*, tree_node*, bool)
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/expr.cc:6048
0xf64d2c expand_call_stmt
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/cfgexpand.cc:2829
0xf68ac6 expand_gimple_stmt_1
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/cfgexpand.cc:3880
0xf691b3 expand_gimple_stmt
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/cfgexpand.cc:4044
0xf71d20 expand_gimple_basic_block
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/cfgexpand.cc:6096
0xf74279 execute
        /home/pli/repos/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/cfgexpand.cc:6831

[Bug target/110146] ICE in riscv_vector::function_builder::add_unique_function()

2023-06-12 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110146

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Hi Palmer,

As I understand it, this should already have been fixed around June 06. Do you
still see this issue with upstream?

[Bug target/110109] RISC-V: ICE when build the Intrinsic code

2023-06-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110109

--- Comment #1 from Li Pan  ---
GCC 13 doesn't have this issue.

[Bug c/110109] New: RISC-V: ICE when build the Intrinsic code

2023-06-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110109

Bug ID: 110109
   Summary: RISC-V: ICE when build the Intrinsic code
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55251
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55251&action=edit
Example code for reproducing

An ICE occurs when building the code below with options similar to
'/_INSTALL_RISC-V/bin/riscv64-unknown-elf-gcc -march=rv64gcv -O3 tmp.c -c -S -o
-'

#include "riscv_vector.h"

void __attribute__ ((noinline, noclone))
clean_subreg (int32_t *in, int32_t *out, size_t m)
{
  vint16m8_t v24, v8, v16;
  vint32m8_t result = __riscv_vle32_v_i32m8 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m8_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m8_i32m1 (result, 1);
  vint32m1_t v2 = __riscv_vget_v_i32m8_i32m1 (result, 2);
  vint32m1_t v3 = __riscv_vget_v_i32m8_i32m1 (result, 3);
  vint32m1_t v4 = __riscv_vget_v_i32m8_i32m1 (result, 4);
  vint32m1_t v5 = __riscv_vget_v_i32m8_i32m1 (result, 5);
  vint32m1_t v6 = __riscv_vget_v_i32m8_i32m1 (result, 6);
  vint32m1_t v7 = __riscv_vget_v_i32m8_i32m1 (result, 7);

  for (size_t i = 0; i < m; i++)
  {
    v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
    v1 = __riscv_vadd_vv_i32m1(v1, v1, 4);
    v2 = __riscv_vadd_vv_i32m1(v2, v2, 4);
    v3 = __riscv_vadd_vv_i32m1(v3, v3, 4);
    v4 = __riscv_vadd_vv_i32m1(v4, v4, 4);
    v5 = __riscv_vadd_vv_i32m1(v5, v5, 4);
    v6 = __riscv_vadd_vv_i32m1(v6, v6, 4);
    v7 = __riscv_vadd_vv_i32m1(v7, v7, 4);
  }

  vint32m8_t result2;

  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 0, v0);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 1, v1);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 2, v2);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 3, v3);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 4, v4);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 5, v5);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 6, v6);
  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 7, v7);

  __riscv_vse32_v_i32m8((int8_t *)(out), result2, 64);
 }

The compiler then reports an ICE as shown below.

.file   "tmp.c"
.option nopic
.attribute arch,
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
tmp.c: In function ‘clean_subreg’:
tmp.c:40:25: warning: passing argument 1 of ‘__riscv_vse32_v_i32m8’ from
incompatible pointer type [-Wincompatible-pointer-types]
   40 |   __riscv_vse32_v_i32m8((int8_t *)(out), result2, 64);
  | ^~~
  | |
  | int8_t * {aka signed char *}
In file included from tmp.c:1:
/home/pli/repos/gcc/111/riscv-gnu-toolchain/_INSTALL_RISC-V/lib/gcc/riscv64-unknown-elf/14.0.0/include/riscv_vector.h:94:9:
note: expected ‘int *’ but argument is of type ‘int8_t *’ {aka ‘signed char *’}
   94 | #pragma riscv intrinsic "vector"
  | ^
.text
during RTL pass: vregs
tmp.c:41:2: internal compiler error: in to_constant, at poly-int.h:504
   41 |  }
  |  ^
0xf22120 poly_int_pod<2u, unsigned short>::to_constant() const
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/poly-int.h:504
0x26fb2fb pattern57
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/gcc/insn-recog.cc:4694
0x2863074 recog_410
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/iterators.md:74
0x28901ec recog_440
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/riscv.md:1621
0x2891f11 recog(rtx_def*, rtx_insn*, int*)
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/config/riscv/iterators.md:52
0xf278e7 recog_memoized(rtx_insn*)
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/recog.h:273
0x15dbaca extract_insn(rtx_insn*)
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/recog.cc:2789
0x11a2edc instantiate_virtual_regs_in_insn
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/function.cc:1611
0x11a4501 instantiate_virtual_regs
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/function.cc:1984
0x11a45e8 execute
        /home/pli/repos/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD/../gcc/function.cc:2033
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/109974] New: RISCV: RVV VSETVL Pass ICE in SLP auto-vectorization

2023-05-25 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974

Bug ID: 109974
   Summary: RISCV: RVV VSETVL Pass ICE in SLP auto-vectorization
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55160
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55160&action=edit
Reproduce source file

Build the example code below with the options "-march=rv64gcv_zbb -O3
--param=riscv-autovec-preference=fixed-vlmax".

#include <stdint.h>

void __attribute__((noinline, noclone))
func (int8_t *__restrict x, int64_t *__restrict y, int n)
{
  for (int i = 0, j = 0; i < n; i++, j +=2 )
  {
    x[i + 0] += 1;
    y[j + 0] += 1;
    y[j + 1] += 2;
  }
}

It will trigger one ICE during RTL pass: vsetvl.

.file   "test.c"
.option nopic
.attribute arch,
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
during RTL pass: vsetvl
test.c: In function ‘func’:
test.c:12:1: internal compiler error: in source_equal_p, at
config/riscv/riscv-vsetvl.cc:1141
   12 | }
  | ^
0x1cd6902 source_equal_p
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:1141
0x1cd7c59 riscv_vector::avl_info::single_source_equal_p(riscv_vector::avl_info const&) const
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:1658
0x1cd7f01 riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:1722
0x1cd8fe0 riscv_vector::vector_insn_info::compatible_avl_p(riscv_vector::vl_vtype_info const&) const
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:2010
0x1cd6ce4 incompatible_avl_p
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:1199
0x1ce453b riscv_vector::demands_cond::dual_incompatible_p(riscv_vector::vector_insn_info const&, riscv_vector::vector_insn_info const&) const
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.h:491
0x1cd8e36 riscv_vector::vector_insn_info::compatible_p(riscv_vector::vector_insn_info const&) const
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:1983
0x1cdbf13 pass_vsetvl::compute_local_backward_infos(rtl_ssa::bb_info const*)
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:2819
0x1ce188b pass_vsetvl::lazy_vsetvl()
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:4542
0x1ce1b40 pass_vsetvl::execute(function*)
        /home/pli/repos/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vsetvl.cc:4601
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug c/109773] New: RISC-V: ICE when build RVV Intrinsic

2023-05-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109773

Bug ID: 109773
   Summary: RISC-V: ICE when build RVV Intrinsic
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

There will be one ICE when trying to build the below code with the option
'-march=rv64gcv -O3 -fno-schedule-insns -c -S test.c -o -'.

#include "riscv_vector.h"
void f4 (int32_t * a, int32_t * b, int n)
{
  if (n <= 0)
    return;
  int i = n;
  size_t vl = __riscv_vsetvl_e8mf4 (i);
  for (; i >= 0; i--)
    {
      vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl);
      v = __riscv_vle32_v_i32m1_tu (v, a + i + 100, vl);
      __riscv_vse32_v_i32m1 (b + i, v, vl);

      if (i >= vl)
        continue;
      if (i == 0)
        return;
      vl = __riscv_vsetvl_e8mf4 (vl);
    }
}

The output of compiling.

> riscv64-unknown-linux-gnu-gcc -march=rv64gcv -O3 -fno-schedule-insns -c -S test.c -o -
.file   "test.c"
.option nopic
.attribute arch,
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
riscv64-unknown-linux-gnu-gcc: internal compiler error: Segmentation fault
signal terminated program cc1
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
See <https://gcc.gnu.org/bugs/> for instructions.
> ~/bin/gnu-rv64-linux-13/bin/riscv64-unknown-linux-gnu-gcc --version
riscv64-unknown-linux-gnu-gcc (g1c9c53f0256) 13.1.1 20230508
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug target/109748] RISC-V: Mis code gen for the RVV intrinsic VSETVL

2023-05-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109748

--- Comment #2 from Li Pan  ---
No, it should have been introduced by one of Juzhe's optimizations in GCC 14.
Juzhe is working on a fix; I opened this bug on Juzhe's behalf for tracking.

[Bug c/109748] New: RISC-V: Mis code gen for the

2023-05-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109748

Bug ID: 109748
   Summary: RISC-V: Mis code gen for the
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55007
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55007&action=edit
Test file for reproducing

Given the code below.

#include <riscv_vector.h>

int byte_mac_vec(unsigned char *a, unsigned char *b, int len) {
  size_t vlmax = __riscv_vsetvlmax_e8m1();
  vint32m4_t vec_s = __riscv_vmv_v_x_i32m4(0, vlmax);
  vint32m1_t vec_zero = __riscv_vmv_v_x_i32m1(0, vlmax);
  int k = len;

  for (size_t vl; k > 0; k -= vl, a += vl, b += vl) {
    vl = __riscv_vsetvl_e8m1(k);

    vuint8m1_t a8s = __riscv_vle8_v_u8m1(a, vl);
    vuint8m1_t b8s = __riscv_vle8_v_u8m1(b, vl);
    vuint32m4_t a8s_extended = __riscv_vzext_vf4_u32m4(a8s, vl);
    vuint32m4_t b8s_extended = __riscv_vzext_vf4_u32m4(a8s, vl);

    vint32m4_t a8s_as_i32 = __riscv_vreinterpret_v_u32m4_i32m4(a8s_extended);
    vint32m4_t b8s_as_i32 = __riscv_vreinterpret_v_u32m4_i32m4(b8s_extended);

    vec_s = __riscv_vmacc_vv_i32m4_tu(vec_s, a8s_as_i32, b8s_as_i32, vl);
  }

  vint32m1_t vec_sum = __riscv_vredsum_vs_i32m4_i32m1(vec_s, vec_zero,
                                                      __riscv_vsetvl_e32m4(len));
  int sum = __riscv_vmv_x_s_i32m1_i32(vec_sum);

  return sum;
}

It will generate the below assembly code with build option '-march=rv64gcv
-mabi=lp64 -O3 -c -S test.c -o -'.

byte_mac_vec:
vsetvli a5,zero,e32,m4,ta,ma
vmv.v.i v4,0
vsetvli zero,a5,e32,m1,ta,ma
vmv.v.i v2,0
ble a2,zero,.L2
mv  a4,a2
.L3:
vsetvli a5,a4,e8,m1,ta,ma   <- should be e32m4
subw    a4,a4,a5
vle8.v  v1,0(a0)
add a0,a0,a5
vzext.vf4   v8,v1
vmacc.vv    v4,v8,v8
bgt a4,zero,.L3
.L2:
vsetvli zero,a2,e32,m4,ta,ma
vredsum.vs  v4,v4,v2
vmv.x.s a0,v4
sext.w  a0,a0
ret

[Bug c/109743] New: RISC-V: Unnecessary VSETVLI of the RVV intrinsic in loop

2023-05-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743

Bug ID: 109743
   Summary: RISC-V: Unnecessary VSETVLI of the RVV intrinsic in
loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 55004
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55004&action=edit
Test file for the unnecessary VSETVL for RVV intrinsic.

Assume we have the example code below. It looks like 2 VSETVL instructions can
be eliminated.

#include "riscv_vector.h"

void
foo2 (int32_t *a, int32_t *b, int n)
{
  if (n <= 0)
    return;
  int i = n;
  size_t vl = __riscv_vsetvl_e32m1 (i);

  for (; i >= 0; i--)
    {
      vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
      __riscv_vse32_v_i32m1 (b, v, vl);

      if (i >= vl)
        continue;

      if (i == 0)
        return;

      vl = __riscv_vsetvl_e32m1 (i);
    }
}

When compiled with the options '-march=rv64gc_zve64d -mabi=lp64d -O3 test.c -c
-S -o -', it generates assembly code like the below.

foo2:
.LFB2:
.cfi_startproc
ble a2,zero,.L1
mv  a4,a2
li  a3,-1
vsetvli a5,a2,e32,m1,ta,mu
vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeu    a4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,mu
addi    a4,a4,-1
vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw   a2,a2,-1
bltu    a4,a5,.L10
.L3:
addiw   a2,a2,-1
addi    a4,a4,-1
bne a2,a3,.L5
.L1:
ret

GCC version:
riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug c/109617] New: RISC-V: ICE for vlmul_ext_v intrinsic API

2023-04-24 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109617

Bug ID: 109617
   Summary: RISC-V: ICE for vlmul_ext_v intrinsic API
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Given the sample code below.

#include <riscv_vector.h>

vint16m8_t test_vlmul_ext_v_i16mf4_i16m8(vint16mf4_t op1) {
  return __riscv_vlmul_ext_v_i16mf4_i16m8(op1);
}

Building it with "riscv64-unknown-elf-gcc -march=rv64gcv -O3 test.c -c -S -o -"
results in an ICE.

during RTL pass: expand
test.c: In function 'test_vlmul_ext_v_i16mf4_i16m8':
test.c:7:10: internal compiler error: in code_for_vlmul_extx32, at
./insn-opinit.h:572
7 |   return __riscv_vlmul_ext_v_i16mf4_i16m8(op1);
  |  ^
0x1c07559 code_for_vlmul_extx32(machine_mode)
        ./insn-opinit.h:572
0x1c0b14e riscv_vector::vlmul_ext::expand(riscv_vector::function_expander&) const
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vector-builtins-bases.cc:1522
0x1c03b2d riscv_vector::expand_builtin(unsigned int, tree_node*, rtx_def*)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-vector-builtins.cc:3501
0x1bd8463 riscv_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/config/riscv/riscv-builtins.cc:379
0xe2951d expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/builtins.cc:7341
0x103cb67 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/expr.cc:11864
0x102e744 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/expr.cc:8999
0x1022e43 store_expr(tree_node*, rtx_def*, int, bool, bool)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/expr.cc:6330
0x10213c1 expand_assignment(tree_node*, tree_node*, bool)
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/expr.cc:6048
0xe6cab9 expand_call_stmt
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/cfgexpand.cc:2829
0xe70886 expand_gimple_stmt_1
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/cfgexpand.cc:3880
0xe70f7a expand_gimple_stmt
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/cfgexpand.cc:4044
0xe79c1a expand_gimple_basic_block
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/cfgexpand.cc:6106
0xe7c17c execute
        /home/pli/repos/toolchains/gcc/reference/riscv-gnu-toolchain/gcc/__BUILD_RISC-V/../gcc/cfgexpand.cc:6841

[Bug c/109615] New: Redundant VSETVL after optimized code of RVV

2023-04-24 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109615

Bug ID: 109615
   Summary: Redundant VSETVL after optimized code of RVV
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Assume we have the sample code below.

#include "riscv_vector.h"

void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  if (cond)
    vl = m * 2;
  else
    vl = m * 2 * vl;

  for (size_t i = 0; i < n; i++)
    {
      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
      __riscv_vse8_v_i8mf8 (out + i, v, vl);

      vbool64_t mask = __riscv_vlm_v_b64 (in + i + 100, vl);

      vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in + i + 100, vl);
      __riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
    }

  for (size_t i = 0; i < n; i++)
    {
      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
      __riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
    }
}

With the options *-march=rv64gcv -O3 -frename-registers*, upstream currently
generates the code below. The last vsetvli in the .L4 basic block looks
redundant.

f:
slliw   a3,a3,1
bne a4,zero,.L2
li  a5,101
mul a3,a3,a5
.L2:
addi    a4,a1,100
add t1,a0,a2
mv  t0,a0
beq a2,zero,.L1
vsetvli zero,a3,e8,mf8,tu,mu
.L4:
addi    a6,t0,100
addi    a7,a4,-100
vle8.v  v1,0(t0)
addi    t0,t0,1
vse8.v  v1,0(a7)
vlm.v   v0,0(a6)
vle8.v  v1,0(a6),v0.t
vse8.v  v1,0(a4)
addi    a4,a4,1
bne t0,t1,.L4
addi    a0,a0,300
addi    a1,a1,300
add a2,a0,a2
vsetvli zero,a3,e8,mf8,ta,ma   // <= redundant ?
.L5:
vle8.v  v2,0(a0)
addi    a0,a0,1
vse8.v  v2,0(a1)
addi    a1,a1,1
bne a2,a0,.L5
.L1:
ret

[Bug target/109535] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471

2023-04-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #6 from Li Pan  ---
(In reply to rsand...@gcc.gnu.org from comment #2)
> The assert in question fires if the pass creates an instruction
> whose pattern uses a register or memory and if the pass doesn't
> provide associated use information.  Let me know if it looks like
> a bug in rtl-ssa rather than a bug in the vsetvl pass.

I just synced with Juzhe about the assertion failure.
function_info::finalize_new_accesses tries to find regno=8 in
shared_uses=[0, 66, 67], does not find it, and then hits the NOT_NULL assert.

Meanwhile, pass_vsetvl::cleanup_insns cleans up the AVL use, similar to the
below.

AVL regno 8
Before Uses Reg Nums => [0,8,66,67,]
After  Uses Reg Nums => [0,66,67,]

After the vsetvl pass, the AVL-related use, i.e. the use of regno=8, is removed,
because the RVV instruction pattern eliminates the dependency on that operand.

Juzhe, please correct me if any of this is misleading or wrong.

Thanks.
