[Bug middle-end/113474] RISC-V: Fail to use vmerge.vim for constant vector

2024-05-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/115093] RISC-V Vector ICE in extract_insn: unrecognizable insn

2024-05-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115093 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug c/115104] RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-05-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 --- Comment #1 from JuzheZhong --- I wonder whether RIVOS CI has already found which commit caused this regression?

[Bug c/115104] New: RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-05-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 Bug ID: 115104 Summary: RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed Product: gcc Version: 15.0 Status: UNCONFIRMED Severity:

[Bug c/115068] New: RISC-V: Illegal instruction of vfwadd

2024-05-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115068 Bug ID: 115068 Summary: RISC-V: Illegal instruction of vfwadd Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug target/114988] RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 --- Comment #2 from JuzheZhong --- Li Pan is going to work on it. Hi, kito and Jeff. Can this fix be backported to GCC-14?

[Bug c/114988] RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 --- Comment #1 from JuzheZhong --- Ideally, it should be reported as (-march=rv64gc): https://godbolt.org/z/3P76YEb9s : In function 'test_vfwsub_wf_f32mf2': :4:15: error: return type 'vfloat32mf2_t' requires the V ISA extension 4 |

[Bug c/114988] New: RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2

2024-05-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 Bug ID: 114988 Summary: RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug target/114887] RISC-V: expect M8 but M4 generated with dynamic LMUL for TSVC s319

2024-04-29 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114887 --- Comment #2 from JuzheZhong --- I think there is a too conservative analysis here: note: _1: type = float, start = 1, end = 6 note: _5: type = float, start = 6, end = 8 note: _3: type = float, start = 3, end = 7 note: _4: type =

[Bug target/114887] RISC-V: expect M8 but M4 generated with dynamic LMUL for TSVC s319

2024-04-29 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114887 --- Comment #1 from JuzheZhong --- The "vect" cost model analysis: https://godbolt.org/z/qbqzon8x1 note: Maximum lmul = 8, At most 40 number of live V_REG at program point 6 for bb 3 It seems that we count one more variable in program
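
The two comments on bug 114887 reason about live ranges and a register budget. As a rough illustration only (not the code in GCC's cost model), the C sketch below counts how many values are live at a program point and picks the largest LMUL that fits in the 32 RVV vector registers. The struct, the function names, and the inclusive-end convention are assumptions made for this example; the three sample ranges in the usage note are the ones visible in the truncated dump above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical live range of one vectorized value: live from program
   point `start` through program point `end`, both inclusive (an
   assumption made for this sketch). */
struct live_range { int start; int end; };

/* Count how many values are live at program point `point`. */
static int live_values_at(const struct live_range *r, size_t n, int point)
{
    int live = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].start <= point && point <= r[i].end)
            live++;
    return live;
}

/* Return the largest LMUL in {8, 4, 2, 1} for which `max_live` values
   fit in the 32 RVV vector registers, assuming every value occupies
   `lmul` registers. */
static int pick_lmul(int max_live)
{
    assert(max_live >= 0);
    for (int lmul = 8; lmul > 1; lmul /= 2)
        if (max_live * lmul <= 32)
            return lmul;
    return 1;
}
```

Under this simplified model, five live values need 40 registers at LMUL 8, so `pick_lmul(5)` falls back to 4, while `pick_lmul(4)` still allows 8. Counting one live value too many at a single program point is therefore enough to lose M8. For the ranges `{1, 6}`, `{6, 8}`, `{3, 7}`, `live_values_at` reports three values live at point 6.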

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 --- Comment #18 from JuzheZhong --- (In reply to Li Pan from comment #17) > According to the V abi, looks like the asm code tries to save/restore the > callee-saved registers when there is a call in function body. > > | Name| ABI Mnemonic

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 --- Comment #16 from JuzheZhong --- This issue is not fully fixed: the patch only fixes the ICE, and there is still a regression in codegen: https://godbolt.org/z/4nvxeqb6K Terrible codegen: test(__rvv_uint64m4_t): addi sp,sp,-16

[Bug target/114809] [RISC-V RVV] Counting elements might be simpler

2024-04-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug tree-optimization/114749] [13 Regression] RISC-V rv64gcv ICE: in vectorizable_load, at tree-vect-stmts.cc

2024-04-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114749 --- Comment #4 from JuzheZhong --- Hi, Patrick. It seems that Richard didn't include the testcase in the patch. Could you send a patch to add the testcase for the RISC-V port? Thanks.

[Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns

2024-04-15 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug target/114686] Feature request: Dynamic LMUL should be the default for the RISC-V Vector extension

2024-04-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 --- Comment #6 from JuzheZhong --- Definitely it is a regression: https://compiler-explorer.com/z/e68x5sT9h GCC 13.2 is ok, but GCC 14 ICE. I think you should bisect first.

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vect-cost-model (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-04-02 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #7 from JuzheZhong --- Hi, Robin. Will you fix this bug ?

[Bug target/114506] RISC-V: expect M8 but M4 generated with dynamic LMUL

2024-03-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114506 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-21 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #19 from JuzheZhong --- I think it's better to add pr114396.c to the vect testsuite instead of the x86 target tests, since the bug does not happen only on x86.

[Bug tree-optimization/113281] [11/12/13 Regression] Latent wrong code due to vectorization of shift reduction and missing promotions since r9-1590

2024-03-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281 --- Comment #28 from JuzheZhong --- The original cost model I wrote worked for all cases, but after some middle-end changes it fails. I don't have time to figure out what's going on here. Robin may be interested in it.

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #3 from JuzheZhong --- (In reply to Robin Dapp from comment #2) > It is vectorized with a higher zvl, e.g. zvl512b, refer > https://godbolt.org/z/vbfjYn5Kd. OK. I see. But Clang generates many slide instructions, which are expensive

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #1 from JuzheZhong --- It seems RISC-V Clang didn't vectorize it ? https://godbolt.org/z/G4han6vM3

[Bug target/113913] [14] RISC-V: suboptimal code gen for intrinsic vcreate

2024-02-16 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113913 --- Comment #2 from JuzheZhong --- It's a known issue that we are trying to fix in GCC-15. My colleague Lehua is taking care of it. CCing Lehua.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #16 from JuzheZhong --- The FMA is generated in widening_mul PASS: Before widening_mul (fab1): _5 = 3.33314829616256247390992939472198486328125e-1 - _4; _6 = _5 *

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #15 from JuzheZhong --- (In reply to rguent...@suse.de from comment #14) > On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > > > --- Comment #13 from JuzheZhong

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-06 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #13 from JuzheZhong --- Ok. I found the optimized tree: _5 = 3.33314829616256247390992939472198486328125e-1 - _4; _8 = .FMA (_5, 1.229982236431605997495353221893310546875e-1, _4); Let CST0 =

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-06 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #12 from JuzheZhong --- Ok. I found it even without vectorization: GCC is worse than Clang: https://godbolt.org/z/addr54Gc6 GCC (14 instructions inside the loop): fld fa3,0(a0) fld fa5,8(a0) fld

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-04 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #11 from JuzheZhong --- Hi, I think this RVV compiler codegen is the optimal codegen we want for RVV: https://repo.hca.bsc.es/epic/z/P6QXCc .LBB0_5: # %vector.body sub a4, t0, a3

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-02-02 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #22 from JuzheZhong --- I have done this following experiment. diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc index bf017137260..8c36cc63d3b 100644 --- a/gcc/tree-ssa-loop-ivcanon.cc +++

[Bug target/113608] RISC-V: Vector spills after enabling vector abi

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113608 --- Comment #2 from JuzheZhong --- vuint16m2_t vadd(vuint16m2_t a, vuint8m1_t b) { int vl = __riscv_vsetvlmax_e8m1(); vuint16m2_t c = __riscv_vzext_vf2_u16m2(b, vl); return __riscv_vadd_vv_u16m2(a, c, vl); }

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #21 from JuzheZhong --- Hi, Richard. I looked into ivcanon. I found that: /* If the loop has more than one exit, try checking all of them for # of iterations determinable through scev. */ if (!exit)

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #11 from JuzheZhong --- Hi, Tamar. We are interested in supporting saturating and rounding. We may need to support scalar first. Do you have any suggestions ? Or you are already working on it? Thanks.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #10 from JuzheZhong --- Hi, Tamar. We are interested in supporting saturating and rounding. We may need to support scalar first. Do you have any suggestions ? Or you are already working on it? Thanks.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #9 from JuzheZhong --- Ok. After investigation of LLVM: Before loop vectorizer: %cond12 = tail call i32 @llvm.usub.sat.i32(i32 %conv5, i32 %wsize) %conv13 = trunc i32 %cond12 to i16 After loop vectorizer: %10 = call <16 x
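
The `@llvm.usub.sat.i32` call followed by a `trunc` to `i16` in the IR above corresponds to an unsigned saturating subtraction whose result is narrowed to 16 bits. A minimal C sketch of a scalar loop with that shape follows. The function names `usub_sat` and `slide`, and the table layout, are assumptions for illustration and are not taken from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating subtraction: a - b, clamped at 0 instead of
   wrapping around. */
static uint32_t usub_sat(uint32_t a, uint32_t b)
{
    return a > b ? a - b : 0;
}

/* Hypothetical loop: subtract `wsize` from every 16-bit table entry,
   saturating at zero, then narrow the result back to 16 bits. */
static void slide(uint16_t *table, size_t n, uint32_t wsize)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        table[i] = (uint16_t)usub_sat(table[i], wsize);
}
```

For a table holding `{0, 5, 10, 65535}` and `wsize = 10`, the loop leaves `{0, 0, 0, 65525}`. A vectorizer that recognizes the compare-and-select as a saturating operation can map the loop body onto a vector saturating subtract instead of a compare, subtract, and select sequence.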

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492 --- Comment #8 from JuzheZhong --- Missing saturating vectorization makes RVV Clang perform 20% better than RVV GCC in a recent benchmark evaluation. In coremark pro zip-test, I believe other targets should be the same. I wonder how we

[Bug c/113695] RISC-V: Sources with different EEW must use different registers

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113695 --- Comment #1 from JuzheZhong --- Since both operands are input operands, the early-clobber "&" constraint cannot help.

[Bug c/113695] New: RISC-V: Sources with different EEW must use different registers

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113695 Bug ID: 113695 Summary: RISC-V: Sources with different EEW must use different registers Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #19 from JuzheZhong --- The loop is: bb 3 -> bb 4 -> bb 5 | |__⬆ |__⬆ The condition in bb 3 is if (i_21 == 1001). The condition in bb 4 is if (N_13(D) > i_18). Look into lsplit: This loop doesn't
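
The comment above describes a loop with two exit tests: one against the constant 1001 and one against `N`. The C sketch below is a hypothetical reconstruction of such a loop, not the bug's actual testcase. It shows why a side-effect-free early break can be folded into a single trip-count bound. The function names and the summation body are assumptions for this example.

```c
#include <assert.h>

#define LIMIT 1001  /* constant from the i == 1001 exit test quoted above */

/* Two-exit form: the loop leaves either through the early break or
   when the induction variable reaches n. */
static long sum_two_exits(const int *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i == LIMIT)
            break;
        sum += a[i];
    }
    return sum;
}

/* Single-exit form: both exits folded into one bound, min(n, LIMIT). */
static long sum_single_exit(const int *a, int n)
{
    assert(a != 0 || n <= 0);
    int bound = n < LIMIT ? n : LIMIT;
    long sum = 0;
    for (int i = 0; i < bound; i++)
        sum += a[i];
    return sum;
}
```

Both functions visit the same elements for every `n`, so the second form, which has a single countable exit, computes the same result as the first.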

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #18 from JuzheZhong --- (In reply to rguent...@suse.de from comment #17) > On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 > > > > --- Comment #16 from JuzheZhong

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #16 from JuzheZhong --- (In reply to rguent...@suse.de from comment #15) > On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 > > > > --- Comment #14 from JuzheZhong

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-31 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #14 from JuzheZhong --- Thanks Richard. It seems that we can't fix this issue for now. Is that right ? If I understand correctly, do you mean we should wait after SLP representations are finished and then revisit this PR?

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #12 from JuzheZhong --- OK. It seems it has data dependency issue: missed: not vectorized, possible dependence between data-refs a[i_15] and a[_4] a[i_15] = _3; STMT 1 _4 = i_15 + 2; _5 = a[_4];STMT 2 STMT2 should not

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #11 from JuzheZhong --- It seems that we should fix this case (Richard gave) first which I think it's not the SCEV or value-numbering issue: double a[1024]; void foo () { for (int i = 0; i < 1022; i += 2) { double tem =

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #10 from JuzheZhong --- I think the root cause is we think i_16 and _1 are alias due to scalar evolution: (get_scalar_evolution (scalar = i_16) (scalar_evolution = {0, +, 2}_1)) (get_scalar_evolution (scalar = _1)

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 JuzheZhong changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #20 from JuzheZhong --- (In reply to Robin Dapp from comment #19) > What seems odd to me is that in fre5 we simplify > > _429 = .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0, > ... }); >

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #8 from JuzheZhong --- Hi, Richard. I now have time to look at GCC vectorization optimization. I found this case: _2 = a[_1]; ... a[i_16] = _4; ... _7 = a[_1]; ---> This load should be eliminated and _2 re-used. Am I

[Bug middle-end/113166] RISC-V: Redundant move instructions in RVV intrinsic codes

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113166 --- Comment #3 from JuzheZhong --- #include #include template inline vuint8m1_t tail_load(void const* data); template<> inline vuint8m1_t tail_load(void const* data) { uint64_t const* ptr64 = reinterpret_cast(data); #if 1 const

[Bug c/113666] New: RISC-V: Cost model test regression due to recent middle-end loop vectorizer changes

2024-01-29 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113666 Bug ID: 113666 Summary: RISC-V: Cost model test regression due to recent middle-end loop vectorizer changes Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-28 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #15 from JuzheZhong --- Hi, Robin. I tried to disable vec_extract, then the case passed. diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 3b32369f68c..b61b886ef3d 100644 ---

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #13 from JuzheZhong --- Ok. I found a regression between rvv-next and trunk. I believe it is GCC-12 vs GCC-14: rvv-next: ... .L11: li t1,31 mv a2,a1 bleu a7,t1,.L12 bne a6,zero,.L13

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #11 from JuzheZhong --- (In reply to Robin Dapp from comment #10) > The compile farm machine I'm using doesn't have SVE. > Compiling with -march=armv8-a -O3 pr113607.c -fno-vect-cost-model and > running it returns 0 (i.e. ok). > >

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #9 from JuzheZhong --- Hi, Robin. Could you try this case on the latest ARM SVE, with -march=armv8-a+sve -O3 -fno-vect-cost-model? I want to make sure first that it is not a middle-end bug. The RVV vectorized IR is the same as ARM SVE.

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #8 from JuzheZhong --- Ok. I can reproduce it too. I am gonna work on fixing it. Thanks.

[Bug c/113608] New: RISC-V: Vector spills after enabling vector abi

2024-01-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113608 Bug ID: 113608 Summary: RISC-V: Vector spills after enabling vector abi Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component:

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #3 from JuzheZhong --- I tried trunk GCC to run your case with SPIKE, still didn't reproduce this issue.

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #2 from JuzheZhong --- I can't reproduce this issue. Could you test it with this patch applied ? https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643934.html

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #1 from JuzheZhong --- I can reproduce this issue. Could you test it with this patch applied ? https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643934.html

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #7 from JuzheZhong --- (In reply to rguent...@suse.de from comment #6) > On Thu, 25 Jan 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > > > --- Comment #5 from JuzheZhong ---

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570 --- Comment #5 from JuzheZhong --- It seems that we don't have any bugs in current SPEC 2017 testing. So I strongly suggest "full coverage" testing on SPEC 2017 which I mentioned in PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #5 from JuzheZhong --- Both ICC and Clang X86 can vectorize SPEC 2017 lbm: https://godbolt.org/z/MjbTbYf1G But I am not sure whether X86 ICC or X86 Clang is better.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #4 from JuzheZhong --- OK. Confirmed: X86 GCC fails to vectorize it, whereas Clang X86 can vectorize it. https://godbolt.org/z/EaTjGbPGW The X86 Clang and RISC-V Clang IR are the same: %12 = tail call <8 x double>

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087 --- Comment #44 from JuzheZhong --- (In reply to Patrick O'Neill from comment #43) > (In reply to Patrick O'Neill from comment #42) > > I kicked off a run roughly 10 hours ago with your memory-hog fix patch > > applied to

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #3 from JuzheZhong --- Ok I see. If we change NN into 8, then we can vectorize it with load_lanes/store_lanes with group size = 8: https://godbolt.org/z/doe9c3hfo We will use vlseg8e64 which is RVVM1DF[8] == RVVM1x8DFmode. Here
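
The comment above mentions load_lanes with group size 8 and the `vlseg8e64` instruction. As a rough scalar model only (an illustration, not GCC or hardware code), a unit-stride segment load with 8 fields de-interleaves memory so that field `f` of segment `i` lands in element `i` of destination vector `f`. The function name `segment_load8` and the flat output layout are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NF 8  /* fields per segment, matching group size = 8 */

/* Scalar model of a unit-stride segment load with NF fields of 64-bit
   elements. `mem` holds `vl` consecutive segments of NF doubles each.
   `out` holds NF destination vectors of `vl` elements, stored back to
   back: vector f occupies out[f * vl] .. out[f * vl + vl - 1]. */
static void segment_load8(const double *mem, size_t vl, double *out)
{
    assert(mem != NULL && out != NULL);
    for (size_t i = 0; i < vl; i++)          /* segment index */
        for (size_t f = 0; f < NF; f++)      /* field within the segment */
            out[f * vl + i] = mem[i * NF + f];
}
```

With `vl = 2` and memory holding the values 0 through 15, vector 0 receives `{0, 8}`, vector 1 receives `{1, 9}`, and vector 7 receives `{7, 15}`. This is the array-of-structures to structure-of-arrays transform that a group of 8 interleaved accesses needs.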

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #1 from JuzheZhong --- It's interesting: for Clang, only RISC-V can vectorize it. I think there are 2 topics: 1. Support vectorization of this code in the loop vectorizer. 2. Transform gather/scatter into strided load/store for

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087 --- Comment #41 from JuzheZhong --- Hi, Patrick. Could you trigger test again base on latest trunk GCC? We have recent memory-hog fix patch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3132d2d36b4705bb762e61b1c8ca4da7c78a8321 I want to

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #24 from JuzheZhong --- (In reply to Richard Biener from comment #19) > (In reply to Richard Biener from comment #18) > > (In reply to Tamar Christina from comment #17) > > > Ok, bisected to > > > > > >

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #14 from JuzheZhong --- I just tried again both GCC-13.2 and GCC-14 with -fno-vect-cost-model. https://godbolt.org/z/enEG3qf5K GCC-14 requires scalar epilogue loop, whereas GCC-13.2 doesn't. I believe it's not cost model issue.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #12 from JuzheZhong --- (In reply to Richard Biener from comment #11) > (In reply to Tamar Christina from comment #9) > > There is a weird costing going on in the PHI nodes though: > > > > m_108 = PHI 1 times vector_stmt costs 0

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #10 from JuzheZhong --- (In reply to Tamar Christina from comment #9) > So on SVE the change is cost modelling. > > Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed > the compiler's defaults to using the

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #31 from JuzheZhong --- machine dep reorg : 403.69 ( 56%) 23.48 ( 93%) 427.17 ( 57%) 5290k ( 0%) Confirmed: after removing RTL DF checking, LICM is no longer a compile-time hog. The VSETVL pass accounts for 56% of compile time.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #30 from JuzheZhong --- Ok. I believe m_avl_def_in && m_avl_def_out can be removed with a better algorithm. Then the memory-hog should be fixed soon. I am gonna rewrite avl_vl_unmodified_between_p and trigger full coverage testingl

[Bug tree-optimization/113441] [13/14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #8 from JuzheZhong --- I believe a change between Nov and Dec causes the regression, but I did not continue bisecting. I hope this information can help with your bisection. Thanks.

[Bug tree-optimization/113441] [13/14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #7 from JuzheZhong --- (In reply to Tamar Christina from comment #6) > Hello, > > I can bisect it if you want. it should only take a few seconds. Ok. Thanks a lot ... I spent 2 hours bisecting it manually but still didn't locate

[Bug tree-optimization/113441] [13/14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #5 from JuzheZhong --- Confirm at Nov, 1. The regression is gone. https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=eac0917bd3d2ead4829d56c8f2769176087c7b3d This commit is ok, which has no regressions. Still bisecting manually.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #28 from JuzheZhong --- (In reply to Robin Dapp from comment #27) > Following up on this: > > I'm seeing the same thing Patrick does. We create a lot of large non-sparse > sbitmaps that amount to around 33G in total. > > I did

[Bug target/113420] risc-v vector: ICE when using C compiler compile C++ RVV intrinsics

2024-01-21 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113420 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #25 from JuzheZhong --- The RISC-V backend memory-hog issue is fixed. But the compile-time hog in LICM is still there, so keep this PR open.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #22 from JuzheZhong --- (In reply to Richard Biener from comment #21) > I once tried to avoid df_reorganize_refs and/or optimize this with the > blocks involved but failed. I am considering whether we should disable LICM for RISC-V

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #19 from JuzheZhong --- (In reply to JuzheZhong from comment #18) > Hi, Robin. > > I have fixed patch for memory-hog: > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html > > I will commit it after the testing. >

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #18 from JuzheZhong --- Hi, Robin. I have a patch that fixes the memory-hog: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html I will commit it after the testing. But the compile-time hog still exists which is loop

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #17 from JuzheZhong --- OK. Confirmed: the original test goes from 33383M to 4796k now.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #16 from JuzheZhong --- (In reply to Andrew Pinski from comment #15) > (In reply to JuzheZhong from comment #14) > > Oh. I know the reason now. > > > > The issue is not the RISC-V backend VSETVL pass. > > > > It's memory bug of

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #14 from JuzheZhong --- Oh. I know the reason now. The issue is not the RISC-V backend VSETVL pass. I think it's a memory bug in rtx_equal_p. We are calling rtx_equal_p, which is very costly. For example, has_nonvlmax_reg_avl is

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #13 from JuzheZhong --- So I think we should investigate why calling has_nonvlmax_reg_avl costs so much memory.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #12 from JuzheZhong --- OK. Here is a simple fix which gives some hints: diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 2067073185f..ede818140dc 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #11 from JuzheZhong --- It should be compute_lcm_local_properties. Memory usage drops by 50% after I remove this function. I am still investigating.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #10 from JuzheZhong --- No, it's not caused here. I removed the whole function compute_avl_def_data. The memory usage doesn't change.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #6 from JuzheZhong --- (In reply to Andrew Pinski from comment #5) > Note "loop invariant motion" is the RTL based loop invariant motion pass. So you mean it should still be a RISC-V issue, right?

[Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #4 from JuzheZhong --- Also, building the original file with -fno-move-loop-invariants reduces compile time from 60 minutes to 7 minutes: real 7m12.528s user 6m55.214s sys 0m17.147s machine dep reorg : 75.93 (

[Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #3 from JuzheZhong --- Ok. The reduced case: # 1 "module_first_rk_step_part1.fppized.f90" # 1 "" # 1 "" # 1 "module_first_rk_step_part1.fppized.f90" !WRF:MEDIATION_LAYER:SOLVER MODULE module_first_rk_step_part1 CONTAINS

[Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #2 from JuzheZhong --- To build the attached file, we need the following files from SPEC2017: module_big_step_utilities_em.mod module_cumulus_driver.mod module_fddagd_driver.mod module_model_constants.mod

[Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #1 from JuzheZhong --- Created attachment 57149 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57149&action=edit spec2017 wrf

[Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 Bug ID: 113495 Summary: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug middle-end/113166] RISC-V: Redundant move instructions in RVV intrinsic codes

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113166 --- Comment #2 from JuzheZhong --- #include #if TO_16 # define uintOut_t uint16_t # define utf8_to_utf32_scalar utf8_to_utf16_scalar # define utf8_to_utf32_rvv utf8_to_utf16_rvv #else # define uintOut_t uint32_t #endif size_t

[Bug c/113474] RISC-V: Fail to use vmerge.vim for constant vector

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474 --- Comment #2 from JuzheZhong --- Oh. It's a pretty simple fix. I am not sure whether Richards will allow it since it's stage4, but it's worth a try. Could you send a patch?

[Bug c/113474] New: RISC-V: Fail to use vmerge.vim for constant vector

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474 Bug ID: 113474 Summary: RISC-V: Fail to use vmerge.vim for constant vector Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug target/113429] RISC-V: SPEC2017 527 cam4 miscompilation in autovec VLA build

2024-01-17 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113429 --- Comment #10 from JuzheZhong --- I have committed the V3 patch, rebased because the V2 patch conflicts with trunk. I think you can use trunk GCC to validate CAM4 directly now.
