[Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native

2021-02-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Martin Jambor  ---
The bug as specified here has been fixed by commits 31584824665, 91153e0af9a,
67ce9099bc9, 1e7fdc02cba, 7d2cb2755a1.

We can still do better on the benchmark if we fix PR 98782.

2020-04-01 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #6 from Martin Jambor  ---
(In reply to Richard Biener from comment #2)
> Do we ever hit the vectorized paths?

What's the best way to find out?  If I open the disassembled code in
perf report and search for ymm, some of these (groups of) instructions
have (very few) samples, but more often they don't.
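One way to put a number on this is to total the sample percentages that land on instructions touching ymm registers. The following is only a minimal sketch: it assumes the annotated disassembly has been saved as text in which each instruction line has the shape "percent : address: instruction", and the function name, the regular expression and the sample text are all made up for illustration. Real perf output may be laid out differently, in which case the pattern needs adjusting.

```python
import re

# Assumed line shape: "<percent> : <hex address>: <instruction text>".
# This layout is an assumption; adjust the pattern to the actual output.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*:\s*[0-9a-f]+:\s*(.*)$")


def ymm_sample_share(annotated_text):
    """Return (percent on ymm instructions, percent on all instructions)."""
    ymm_total = 0.0
    all_total = 0.0
    for line in annotated_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip source lines, labels and blank lines
        percent = float(match.group(1))
        all_total += percent
        if "ymm" in match.group(2):
            ymm_total += percent
    return ymm_total, all_total


# Hypothetical excerpt, not taken from a real exchange2 profile.
SAMPLE = """\
    0.00 :   4011a0:  vmovdqu (%rsi,%rax,1),%ymm0
    0.25 :   4011a5:  vpaddd %ymm0,%ymm1,%ymm1
   12.50 :   4011b0:  mov    (%rsi,%rax,4),%edx
   37.25 :   4011b3:  add    %edx,%ecx
"""

if __name__ == "__main__":
    ymm, total = ymm_sample_share(SAMPLE)
    print(f"ymm instructions: {ymm:.2f}% of {total:.2f}% sampled")
```

A share close to zero would suggest that the 256-bit paths are rarely executed, although sampling skid can attribute samples to neighbouring instructions.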

2020-03-30 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #5 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Martin Jambor from comment #3)
> > (In reply to Hongtao.liu from comment #1)
> > > Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
> > > according to our experience.
> > 
> > I have seen this helping on one system running SLES 15.1 and with
> > trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
> > Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020).  So,
> > from my perspective, perhaps it helps, perhaps it doesn't.
> 
> What GCC options do you use on openSUSE?
>
> The default value of -mprefer-vector-width for -mtune=znver1 is 128; if that
> is what you are tuning for, the option won't help.
> Different processors have different tunings, which may have different default
> vector widths.

For -march=native, it depends on the processor of your server/client.

2020-03-30 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #4 from Hongtao.liu  ---
(In reply to Martin Jambor from comment #3)
> (In reply to Hongtao.liu from comment #1)
> > Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
> > according to our experience.
> 
> I have seen this helping on one system running SLES 15.1 and with
> trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
> Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020).  So,
> from my perspective, perhaps it helps, perhaps it doesn't.

What GCC options do you use on openSUSE?

The default value of -mprefer-vector-width for -mtune=znver1 is 128; if that
is what you are tuning for, the option won't help.
Different processors have different tunings, which may have different default
vector widths.

2020-03-30 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #3 from Martin Jambor  ---
(In reply to Hongtao.liu from comment #1)
> Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
> according to our experience.

I have seen this helping on one system running SLES 15.1 and with
trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020).  So,
from my perspective, perhaps it helps, perhaps it doesn't.

2020-03-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #2 from Richard Biener  ---
Do we ever hit the vectorized paths?  I guess the number of iterations isn't
bounded, so the cost model has a hard time, possibly only triggering at runtime.
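
To illustrate what "only triggering at runtime" can look like, here is a toy model of a versioned loop: a runtime check on the trip count selects either a scalar loop or a chunked "vector" loop with a scalar epilogue. This is a sketch under stated assumptions, not what GCC emits for exchange2: the function name, the width of 8 lanes and the threshold of 16 iterations are hypothetical values chosen for the example.

```python
def versioned_sum(values, vector_width=8, min_profitable_iters=16):
    """Sum values like a runtime-versioned loop; also report the path taken.

    vector_width and min_profitable_iters are hypothetical numbers, not
    values taken from GCC's cost model.
    """
    n = len(values)

    # Runtime check: short trip counts stay on the scalar version.
    if n < min_profitable_iters:
        total = 0
        for v in values:
            total += v
        return total, "scalar"

    # "Vector" version: process full-width chunks lane by lane ...
    main_end = n - n % vector_width
    lanes = [0] * vector_width
    for start in range(0, main_end, vector_width):
        for lane in range(vector_width):
            lanes[lane] += values[start + lane]
    total = sum(lanes)

    # ... then finish the leftover iterations in a scalar epilogue.
    for v in values[main_end:]:
        total += v
    return total, "vector"


if __name__ == "__main__":
    print(versioned_sum(list(range(9))))
    print(versioned_sum(list(range(20))))
```

In this model, a workload whose loops mostly run for a handful of iterations would almost never reach the vector version, so the vector instructions would collect few samples in a profile even though they are present in the binary.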


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

2020-03-29 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #1 from Hongtao.liu  ---
Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
according to our experience.