[Bug middle-end/108441] [12 Regression] Maybe missed optimization: loading an 16-bit integer value from .rodata instead of an immediate store

2023-01-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108441

--- Comment #5 from Richard Biener  ---
(In reply to Peter Cordes from comment #4)
> If there isn't already a bug open about tuning choices mismatching hardware,
> I can repost this as a new bug if you'd like.

These are probably best recorded in a new bug, separately for each issue.

[Bug middle-end/108441] [12 Regression] Maybe missed optimization: loading an 16-bit integer value from .rodata instead of an immediate store

2023-01-18 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108441

--- Comment #4 from Peter Cordes  ---
This is already fixed in current trunk; sorry I forgot to check that before
recommending to report this store-coalescing bug.

# https://godbolt.org/z/j3MdWrcWM
# GCC nightly -O3   (tune=generic)  and GCC11
store:
        movl    $16, %eax
        movw    %ax, ldap(%rip)
        ret

In case anyone's wondering why GCC doesn't  movw $16, foo(%rip)
it's avoiding LCP stalls on Intel P6-family CPUs from the 16-bit immediate.

For MOV specifically, that only happens on P6-family (Nehalem and earlier), not
Sandybridge-family, so it's getting close to time to drop it from
-mtune=generic.  (-mtune=bdver* and -mtune=znver* don't do it, so there is a
tuning setting controlling it.)

GCC *only* seems to know about MOV, so ironically with -march=skylake for
example, we avoid a non-existent LCP stall for mov to memory, but GCC compiles
x += 1234 into code that will LCP stall, addw $1234, x(%rip).
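
A minimal C sketch for reproducing the two cases above, under stated
assumptions: the global `x` and the function names `store_const`, `add_const`
and `check_stores` are hypothetical and chosen only for illustration (the
Godbolt example in this comment stores to a global named ldap). The assertions
check the C-level result only; they say nothing about decode stalls.

```c
#include <assert.h>

/* Hypothetical 16-bit global, standing in for the thread's variable. */
short x;

/* Plain 16-bit constant store: the MOV case that the tuning setting covers. */
void store_const(void)
{
    x = 16;
}

/* 16-bit read-modify-write. 1234 does not fit in a sign-extended 8-bit
   immediate, so a memory-destination ADD needs a 16-bit immediate. */
void add_const(void)
{
    x += 1234;
}

/* Sanity check of the arithmetic only, not of the generated code. */
void check_stores(void)
{
    store_const();
    assert(x == 16);
    add_const();
    assert(x == 1250);
}
```

To look at the instruction selection, one could compile this file with
something like `gcc -O3 -march=skylake -S` and compare against
`-march=alderlake`. The exact output depends on the GCC version and tuning
tables, so treat any difference from the assembly quoted here as expected
variation rather than an error.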

-march=alderlake disables this tuning workaround, using movw $imm, mem.  (The
Silvermont-family E-cores in Alder Lake don't have this problem either, so
that's correct.  Agner Fog's guide didn't mention any changes in LCP stalls for
Alder Lake.)

Avoiding LCP stalls is somewhat less important on CPUs with a uop cache, since
it only happens on legacy decode.  That said, various things can cause code to
run only from legacy decode even inside a loop, such as Skylake's JCC erratum
microcode mitigation when users don't assemble with the option to have GAS
work around it, which GCC doesn't pass by default with -march=skylake.

If there isn't already a bug open about tuning choices mismatching hardware, I
can repost this as a new bug if you'd like.


Related:
https://stackoverflow.com/questions/75154687/is-this-a-missed-optimization-in-gcc-loading-an-16-bit-integer-value-from-roda

and
https://stackoverflow.com/questions/70719114/why-does-the-short-16-bit-variable-mov-a-value-to-a-register-and-store-that-u

[Bug middle-end/108441] [12 Regression] Maybe missed optimization: loading an 16-bit integer value from .rodata instead of an immediate store

2023-01-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108441

Jakub Jelinek  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #3 from Jakub Jelinek  ---
Therefore a dup.

*** This bug has been marked as a duplicate of bug 106022 ***

[Bug middle-end/108441] [12 Regression] Maybe missed optimization: loading an 16-bit integer value from .rodata instead of an immediate store

2023-01-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108441

Jakub Jelinek  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
            Summary|[12.2] Maybe missed         |[12 Regression] Maybe
                   |optimization: loading an    |missed optimization:
                   |16-bit integer value from   |loading an 16-bit integer
                   |.rodata instead of an       |value from .rodata instead
                   |immediate store             |of an immediate store
   Last reconfirmed|                            |2023-01-18
           Priority|P3                          |P2
     Ever confirmed|0                           |1
   Target Milestone|---                         |12.3
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Jakub Jelinek  ---
Reduced testcase for -O2:
struct S { char a, b; } c;

void
foo (void)
{
  c.a = 16;
  c.b = 0;
}
Started with r12-6173-g9ff206d3865df5cb8
Went away again with r13-1415-gf3a5e75cb66dc96efca
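
As a sketch of why the two byte stores in the reduced testcase can be merged
into a single 16-bit store of the value 16, the following reads the struct back
as one 16-bit integer. The helpers `merged_value` and `check_coalesced` are
hypothetical additions for illustration and are not part of the original
testcase; the expected value is built from the bytes so the check does not
hard-code a byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct S { char a, b; } c;

/* The reduced testcase from this comment, unchanged in behaviour. */
void foo(void)
{
    c.a = 16;
    c.b = 0;
}

/* Hypothetical helper: view both bytes of c as one 16-bit value,
   which is what a single merged store would have written. */
uint16_t merged_value(void)
{
    uint16_t v;
    memcpy(&v, &c, sizeof v);
    return v;
}

/* Hypothetical helper: confirm the two stores equal one 16-bit store. */
void check_coalesced(void)
{
    const unsigned char bytes[2] = { 16, 0 };
    uint16_t expected;

    memcpy(&expected, bytes, sizeof expected);
    foo();
    assert(sizeof(struct S) == sizeof(uint16_t));
    assert(merged_value() == expected);  /* equals 16 on little-endian x86 */
}
```

Compiling just `foo` at -O2 with the affected GCC 12 revisions and with a
fixed revision should show the difference described in this bug, assuming the
same target and tuning as the report.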