[Bug target/80568] x86 -mavx256-split-unaligned-load (and store) is affecting AVX2 code, but probably shouldn't be.

2017-09-07 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568

Peter Cordes changed:

           What    |Removed     |Added
------------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|---         |DUPLICATE

--- Comment #3 from Peter Cordes ---
Bug 78762 is asking for the same thing: disable at least load-splitting in
-mtune=generic when -mavx2 is enabled.

Or more generally, ISA-aware tune=generic.

*** This bug has been marked as a duplicate of bug 78762 ***

[Bug target/80568] x86 -mavx256-split-unaligned-load (and store) is affecting AVX2 code, but probably shouldn't be.

2017-05-02 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568

--- Comment #2 from Peter Cordes ---
Using ISA-extension options removes some microarchitectures from the set of
CPUs that can run the code, so it would be appropriate for them to have some
effect on tuning.

A "generic AVX2 CPU" is much more specific than a "generic x86-64 CPU".  For
example, the rep ret workaround is useless with -mavx, since PhenomII doesn't
support AVX (or SSE4, actually).

As it stands now, gcc doesn't have a way to tune for a "generic AVX2 CPU",
i.e. to only try to avoid problems on Haswell, Skylake, KNL, Excavator, and
Ryzen, without caring about things that are slow on IvyBridge, Steamroller, or
Atom.

-mtune=haswell tells gcc that bsf/bsr are fast, but that's not the case on
Excavator (at least it isn't on Steamroller).  So -mtune=intel or
-mtune=haswell aren't necessarily appropriate, especially if we're just talking
about -mavx, not -mavx2.

---

In the absence of any -mtune or -march options, -mavx could imply
-mtune=generic-avx, the way -march implies a tuning but can be overridden with
-march=foo -mtune=bar.

Or maybe the default -mtune option should be changed to -mtune=generic-isa, so
users can think of it as a tuning that looks at what -m options are enabled to
decide which uarches it can ignore.

It might be easier to maintain if those tune options are limited to only
disabling workaround-options like rep ret and splitting 256b loads/stores.

Or maybe this suggestion would already add too much maintenance work.

---

I don't know whether -mavx256-split-unaligned-load/store is still worth it if
we take SnB/IvB out of the picture.  If it helps a lot for Excavator/Zen, then
maybe.  It probably hurts for KNL, which easily bottlenecks on decode
throughput according to Agner Fog, so extra instructions are definitely bad.
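
For anyone who wants to look at the effect themselves, here is a minimal
sketch of the kind of code that is affected.  The function name add_arrays and
the file name add.c are made up for illustration.  Nothing in the loop promises
32-byte alignment, so when gcc auto-vectorizes it with 256-bit vectors it has
to use unaligned loads and stores.  Comparing the output of
"gcc -O3 -mavx2 -S add.c" with the output of the same command plus
-mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store should show
whether splitting is being applied (typically as 128-bit vmovups plus
vinsertf128 / vextractf128 instead of a single 256-bit vmovups).  The exact
asm depends on the gcc version and its default tuning, so treat this as a
starting point, not as a statement of what any particular release emits.

```c
#include <stddef.h>

/* Adds src[i] into dst[i] for i in [0, n).
 * The pointers carry no alignment guarantee, so a vectorizing compiler
 * must use unaligned vector loads and stores for the main loop.
 * restrict tells the compiler the two arrays do not overlap, which
 * makes auto-vectorization straightforward. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}
```

Calling it on pointers offset by one element (e.g. add_arrays(a + 1, b + 1, n))
ensures the data really is misaligned at run time, which matters if you want
to benchmark the split and non-split versions and not just read the asm.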

---

I didn't find any related bug reports, even when searching closed bugs for
"split unaligned load" or for -mavx256-split-unaligned-load.  I also searched
git for the commit that changed this, but didn't find anything.

Thanks for confirming that it was an intentional bugfix.

[Bug target/80568] x86 -mavx256-split-unaligned-load (and store) is affecting AVX2 code, but probably shouldn't be.

2017-05-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568

--- Comment #1 from Richard Biener ---
It was a bugfix and it's now working as intended AFAIK.  You can search for
duplicate bug reports.