[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-05-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #19 from Richard Biener --- (In reply to Robin Dapp from comment #18) [...] > Regarding the mentioned element-wise costing how should we proceed here? > I'm going to remove the hunk in question, run SPEC2017 on x86 and post a >

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-05-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #18 from Robin Dapp --- A bit of a follow-up: I'm working on a patch for reassociation that can handle the mentioned cases and some more but it will still require a bit of time to get everything regression free and correct. What

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #17 from rguenther at suse dot de --- On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > --- Comment #16 from JuzheZhong --- > The FMA is generated in widening_mul

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #14 from rguenther at suse dot de --- On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > --- Comment #13 from JuzheZhong --- > Ok. I found the optimized tree: > > >

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #16 from JuzheZhong --- The FMA is generated in widening_mul PASS: Before widening_mul (fab1): _5 = 3.33314829616256247390992939472198486328125e-1 - _4; _6 = _5 *

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-07 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #15 from JuzheZhong --- (In reply to rguent...@suse.de from comment #14) > On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > > > --- Comment #13 from JuzheZhong

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-06 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #13 from JuzheZhong --- Ok. I found the optimized tree: _5 = 3.33314829616256247390992939472198486328125e-1 - _4; _8 = .FMA (_5, 1.229982236431605997495353221893310546875e-1, _4); Let CST0 =
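
For readers who want to reproduce the shape of the GIMPLE quoted in Comment #13, the following is a minimal C sketch, not the actual 519.lbm source. The two constants are copied from the snippet; the names K_SUB, K_MUL and relax_step are hypothetical. It assumes that a subtraction followed by a multiply-add of this form is what GCC may contract into a single .FMA when floating-point contraction is enabled.

```c
/* Hypothetical names; only the two constant values come from the quoted GIMPLE. */
static const double K_SUB = 3.33314829616256247390992939472198486328125e-1;
static const double K_MUL = 1.229982236431605997495353221893310546875e-1;

/* Mirrors the quoted statements:
 *   _5 = K_SUB - _4;
 *   _8 = .FMA (_5, K_MUL, _4);   i.e. _5 * K_MUL + _4
 * Written as plain arithmetic so that the compiler decides whether to fuse. */
double relax_step(double x)
{
    double diff = K_SUB - x;   /* corresponds to _5 */
    return diff * K_MUL + x;   /* candidate for contraction into an FMA */
}
```

Compiling this with optimization and inspecting the output of -fdump-tree-optimized should show whether the multiply-add was fused on a given target.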

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-06 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #12 from JuzheZhong --- Ok. I found it even without vectorization: GCC is worse than Clang: https://godbolt.org/z/addr54Gc6 GCC (14 instructions inside the loop): fld fa3,0(a0) fld fa5,8(a0) fld

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-02-04 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #11 from JuzheZhong --- Hi, I think this RVV compiler codegen is the optimal codegen we want for RVV: https://repo.hca.bsc.es/epic/z/P6QXCc .LBB0_5:# %vector.body sub a4, t0, a3

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-26 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #10 from rguenther at suse dot de --- On Fri, 26 Jan 2024, rdapp at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > --- Comment #9 from Robin Dapp --- > (In reply to rguent...@suse.de from comment

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #9 from Robin Dapp --- (In reply to rguent...@suse.de from comment #6) > t.c:47:21: missed: the size of the group of accesses is not a power of 2 > or not equal to 3 > t.c:47:21: missed: not falling back to elementwise

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #8 from Richard Biener --- (In reply to JuzheZhong from comment #7) > > But I wonder if we see it is beneficial on some boards, could you teach us > how we can enable vectorization for such case according to uarchs ? If you figure

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #7 from JuzheZhong --- (In reply to rguent...@suse.de from comment #6) > On Thu, 25 Jan 2024, juzhe.zhong at rivai dot ai wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > > > --- Comment #5 from JuzheZhong ---

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-25 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #6 from rguenther at suse dot de --- On Thu, 25 Jan 2024, juzhe.zhong at rivai dot ai wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 > > --- Comment #5 from JuzheZhong --- > Both ICC and Clang X86 can vectorize SPEC

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 Andrew Pinski changed:
What     |Removed |Added
Severity |normal  |enhancement

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #5 from JuzheZhong --- Both ICC and Clang X86 can vectorize SPEC 2017 lbm: https://godbolt.org/z/MjbTbYf1G But I am not sure whether X86 ICC or X86 Clang is better.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #4 from JuzheZhong --- OK. Confirmed: on X86, GCC fails to vectorize it, whereas Clang X86 can vectorize it. https://godbolt.org/z/EaTjGbPGW The X86 Clang and RISC-V Clang IR are the same: %12 = tail call <8 x double>

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #3 from JuzheZhong --- Ok I see. If we change NN into 8, then we can vectorize it with load_lanes/store_lanes with group size = 8: https://godbolt.org/z/doe9c3hfo We will use vlseg8e64 which is RVVM1DF[8] == RVVM1x8DFmode. Here
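
The group-size point in Comment #3 can be illustrated with a small C sketch. This is an assumption-laden stand-in, not the loop from the bug report: update_cells and the per-field arithmetic are hypothetical, and NN is simply set to 8 as the comment describes. Each iteration reads and writes NN adjacent doubles, so the accesses form one interleaved group of size NN, which is the pattern that load_lanes/store_lanes (for example vlseg8e64 on RVV) is meant to cover.

```c
#include <stddef.h>

#define NN 8  /* group size per cell; 8 as in the comment, a power of 2 */

/* Hypothetical kernel: every cell holds NN doubles, and each field gets its
 * own operation, so the vectorizer sees a strided group of NN accesses. */
void update_cells(double *restrict dst, const double *restrict src,
                  size_t n_cells)
{
    for (size_t i = 0; i < n_cells; i++) {
        const double *s = src + i * NN;
        double *d = dst + i * NN;
        d[0] = s[0] + s[1];
        d[1] = s[1] - s[0];
        d[2] = s[2] * 2.0;
        d[3] = s[3] + 1.0;
        d[4] = s[4] * s[5];
        d[5] = s[5];
        d[6] = s[6] - s[7];
        d[7] = s[7] * 0.5;
    }
}
```

Whether a given GCC version and target actually vectorizes this sketch depends on the cost model; -fopt-info-vec-missed reports the reason when it does not.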

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #2 from Robin Dapp --- > It's interesting, for Clang only RISC-V can vectorize it. The full loop can be vectorized on clang x86 as well when I remove the first conditional (which is not in the snippet I posted above). So that's

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #1 from JuzheZhong --- It's interesting, for Clang only RISC-V can vectorize it. I think there are 2 topics: 1. Support vectorization of this code in the loop vectorizer. 2. Transform gather/scatter into strided load/store for