[Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

Martin Jambor changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Martin Jambor ---
The bug as specified here has been fixed by commits 31584824665, 91153e0af9a, 67ce9099bc9, 1e7fdc02cba, 7d2cb2755a1. We can still do better on the benchmark if we fix PR 98782.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #6 from Martin Jambor ---
(In reply to Richard Biener from comment #2)
> Do we ever hit the vectorized paths?

What's the best way to find out? If I open the disassembled code in perf report and search for ymm, some of these (groups of) instructions have (very few) samples, but more often they don't.
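The manual search for ymm described above could be turned into a rough number. The following is only a sketch: it assumes annotated disassembly saved as text, one instruction per line in the shape "<percent> : <address>: <instruction>", similar to what `perf annotate --stdio` prints. That line layout is an assumption and may differ between perf versions. The function name `ymm_sample_share` is hypothetical.

```python
import re

# Assumed line shape: "<percent> : <hex address>: <instruction text>".
# Lines that do not match (headers, source lines, blanks) are skipped.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*:\s*[0-9a-f]+:\s*(.*)$")

# Any reference to a 256-bit AVX register, e.g. %ymm0 or ymm15.
YMM_RE = re.compile(r"%?ymm\d+")


def ymm_sample_share(annotated_lines):
    """Return (ymm_percent, total_percent) summed over matching lines."""
    ymm_percent = 0.0
    total_percent = 0.0
    for line in annotated_lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        percent = float(match.group(1))
        total_percent += percent
        if YMM_RE.search(match.group(2)):
            ymm_percent += percent
    return ymm_percent, total_percent
```

A small ymm share relative to the total would suggest that the 256-bit paths are rarely executed, but it says nothing about 128-bit (xmm) vector code, which would need a separate pattern.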
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #5 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Martin Jambor from comment #3)
> > (In reply to Hongtao.liu from comment #1)
> > > Try -mprefer-vector-width=128; 256-bit vectorization is not helpful
> > > for 548, in our experience.
> >
> > I have seen this helping on one system running SLES 15.1 and with
> > trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
> > Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020). So,
> > from my perspective, perhaps it helps, perhaps it doesn't.
>
> What are your GCC options on openSUSE?
>
> The default value of -mprefer-vector-width for -mtune=zenver1 is 128; if
> that is the case, it won't help.
> Different processors have different tuning, which may have a different
> default vector width.

For -march=native, it depends on the processor of your server/client.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #4 from Hongtao.liu ---
(In reply to Martin Jambor from comment #3)
> (In reply to Hongtao.liu from comment #1)
> > Try -mprefer-vector-width=128; 256-bit vectorization is not helpful
> > for 548, in our experience.
>
> I have seen this helping on one system running SLES 15.1 and with
> trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
> Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020). So,
> from my perspective, perhaps it helps, perhaps it doesn't.

What are your GCC options on openSUSE?

The default value of -mprefer-vector-width for -mtune=zenver1 is 128; if that is the case, it won't help.
Different processors have different tuning, which may have a different default vector width.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #3 from Martin Jambor ---
(In reply to Hongtao.liu from comment #1)
> Try -mprefer-vector-width=128; 256-bit vectorization is not helpful
> for 548, in our experience.

I have seen this helping on one system running SLES 15.1 and with trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020). So, from my perspective, perhaps it helps, perhaps it doesn't.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #2 from Richard Biener ---
Do we ever hit the vectorized paths? I guess the number of iterations isn't bound so the cost model has a hard time, possibly only triggering at runtime.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #1 from Hongtao.liu ---
Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548, in our experience.