[Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero

2024-06-20 rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|12.4|12.5

--- Comment #13 from Richard Biener  ---
GCC 12.4 is being released, retargeting bugs to GCC 12.5.

2023-05-08 rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|12.3|12.4

--- Comment #12 from Richard Biener  ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.

2022-08-19 rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|12.2|12.3

--- Comment #11 from Richard Biener  ---
GCC 12.2 is being released, retargeting bugs to GCC 12.3.

2022-05-06 jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|12.0|12.2

--- Comment #10 from Jakub Jelinek  ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

2022-02-15 crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #9 from Hongtao.liu  ---
(In reply to H.J. Lu from comment #8)
> It turns out that reading YMM registers with all zero bits needs VZEROUPPER
> on Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid
> SSE <-> AVX transition penalty.

We should add a target tuning for r12-2571-g9775e465c1fbfc32656de77c618c61acf5bd905d so that its behavior can be controlled per CPU.
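One way to picture such a tuning is a per-CPU flag that decides whether the r12-2571 shortcut (not marking the upper state dirty after zeroing a YMM/ZMM register) may be used. The sketch below is hypothetical: the table, the function name zeroing_ymm_sets_dirty and the default for CPUs that are not listed are assumptions for illustration, not GCC's actual tuning mechanism. Only the CPU list comes from comment #8, written as -march names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical table: CPUs that, per comment #8, still need vzeroupper
   after reading all-zero YMM registers. */
static const char *const needs_vzeroupper_cpus[] = {
    "sandybridge", "ivybridge", "haswell", "broadwell", "alderlake"
};

/* Return true if zeroing a YMM/ZMM register should still mark the upper
   state dirty on this CPU, i.e. the r12-2571 shortcut must not be used.
   Assumption: CPUs not in the table are treated as penalty-free. */
bool zeroing_ymm_sets_dirty(const char *cpu)
{
    assert(cpu != NULL);
    size_t n = sizeof needs_vzeroupper_cpus / sizeof needs_vzeroupper_cpus[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(needs_vzeroupper_cpus[i], cpu) == 0)
            return true;
    return false;
}
```

A real implementation would more likely hook into the compiler's existing tuning infrastructure than compare strings; the table only makes the per-CPU decision concrete.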

2022-02-15 hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Resolution|FIXED   |---
 Status|RESOLVED|REOPENED
   Last reconfirmed||2022-02-15

--- Comment #8 from H.J. Lu  ---
It turns out that reading YMM registers with all zero bits needs VZEROUPPER
on Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid
SSE <-> AVX transition penalty.

2021-07-28 hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from H.J. Lu  ---
Fixed for GCC 12.  Please reopen it if there are other cases where
vzeroupper should be skipped.

2021-07-28 cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #6 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:9775e465c1fbfc32656de77c618c61acf5bd905d

commit r12-2571-g9775e465c1fbfc32656de77c618c61acf5bd905d
Author: H.J. Lu 
Date:   Tue Jul 27 07:46:04 2021 -0700

x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register

There is no SSE <-> AVX transition penalty if the upper bits of YMM/ZMM
registers are unchanged and a YMM/ZMM store doesn't change the upper bits
of YMM/ZMM registers.

1. Since zeroing a YMM/ZMM register is implemented by zeroing the XMM
register, don't set AVX_U128_DIRTY when zeroing a YMM/ZMM register.
2. Since a store doesn't change the INIT state of the upper bits of a
YMM/ZMM register, don't set AVX_U128_DIRTY on a store if the source
of the store was never non-zero.

Here are the vzeroupper count differences on SPEC CPU 2017 with

-Ofast -march=skylake-avx512

Benchmark        Before   After     Diff
500.perlbench_r     226     225   -0.44%
502.gcc_r          1263    1103  -12.67%
503.bwaves_r         14      14    0.00%
505.mcf_r            29      28   -3.45%
507.cactuBSSN_r    4651    4628   -0.49%
508.namd_r          433     432   -0.23%
510.parest_r      20380   19347   -5.07%
511.povray_r        495     452   -8.69%
519.lbm_r             2       2    0.00%
520.omnetpp_r      5954    5677   -4.65%
521.wrf_r         12353   12339   -0.11%
523.xalancbmk_r   13137   13001   -1.04%
525.x264_r          192     191   -0.52%
526.blender_r      2515    2366   -5.92%
527.cam4_r         4601    4583   -0.39%
531.deepsjeng_r      20      19   -5.00%
538.imagick_r       898     805  -10.36%
541.leela_r         427     399   -6.56%
544.nab_r            74      74    0.00%
548.exchange2_r      72      72    0.00%
549.fotonik3d_r     318     318    0.00%
554.roms_r          558     554   -0.72%
557.xz_r             79      52  -34.18%

and performance differences are within noise range.
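The Diff column is the relative change (After - Before) / Before, expressed as a percentage; for 502.gcc_r, (1103 - 1263) / 1263 is about -12.67%. The sketch below recomputes that column from the counts in the table. The struct spec_row, pct_change and print_diffs names are illustrative only and are not part of the commit.

```c
#include <assert.h>
#include <stdio.h>

/* Per-benchmark vzeroupper counts copied from the table above. */
struct spec_row {
    const char *name;
    long before;
    long after;
};

const struct spec_row spec_rows[] = {
    { "500.perlbench_r", 226, 225 },     { "502.gcc_r", 1263, 1103 },
    { "503.bwaves_r", 14, 14 },          { "505.mcf_r", 29, 28 },
    { "507.cactuBSSN_r", 4651, 4628 },   { "508.namd_r", 433, 432 },
    { "510.parest_r", 20380, 19347 },    { "511.povray_r", 495, 452 },
    { "519.lbm_r", 2, 2 },               { "520.omnetpp_r", 5954, 5677 },
    { "521.wrf_r", 12353, 12339 },       { "523.xalancbmk_r", 13137, 13001 },
    { "525.x264_r", 192, 191 },          { "526.blender_r", 2515, 2366 },
    { "527.cam4_r", 4601, 4583 },        { "531.deepsjeng_r", 20, 19 },
    { "538.imagick_r", 898, 805 },       { "541.leela_r", 427, 399 },
    { "544.nab_r", 74, 74 },             { "548.exchange2_r", 72, 72 },
    { "549.fotonik3d_r", 318, 318 },     { "554.roms_r", 558, 554 },
    { "557.xz_r", 79, 52 },
};

#define N_SPEC_ROWS (sizeof spec_rows / sizeof spec_rows[0])

/* Relative change in percent, as reported in the Diff column. */
double pct_change(long before, long after)
{
    assert(before != 0);
    return 100.0 * (double)(after - before) / (double)before;
}

/* Print each row with its recomputed Diff value. */
void print_diffs(void)
{
    for (size_t i = 0; i < N_SPEC_ROWS; i++)
        printf("%-16s %6ld %6ld %8.2f%%\n", spec_rows[i].name,
               spec_rows[i].before, spec_rows[i].after,
               pct_change(spec_rows[i].before, spec_rows[i].after));
}
```

Printing with two decimals reproduces the rounding used in the Diff column.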

gcc/

PR target/101456
* config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
AVX_U128_DIRTY when all bits are zero.

gcc/testsuite/

PR target/101456
* gcc.target/i386/pr101456-1.c: New test.
* gcc.target/i386/pr101456-2.c: Likewise.
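The two rules in the commit message can be illustrated with a small C model. This is a hypothetical simplification written for illustration, not GCC's ix86_avx_u128_mode_needed: the op kinds, the count_vzeroupper function and the per-register maybe_nonzero flag are assumptions. It tracks a single dirty flag for the upper halves and counts how many vzeroupper instructions would be placed before calls, with and without the new rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the AVX_U128 tracking described in
   r12-2571; it is not GCC's ix86_avx_u128_mode_needed. */

enum op_kind {
    OP_ZERO_YMM,   /* zero a ymm register, e.g. vxorpd %xmm0, %xmm0, %xmm0 */
    OP_SET_YMM,    /* 256-bit write that may leave non-zero upper bits */
    OP_STORE_YMM,  /* 256-bit store whose source is a ymm register */
    OP_CALL        /* call into code that may run legacy SSE */
};

struct op {
    enum op_kind kind;
    int reg;       /* ymm register number; ignored for OP_CALL */
};

#define NREGS 16

/* Count the vzeroupper instructions the model places before calls.
   new_rules == false: zeroing and storing a ymm register always dirty.
   new_rules == true: apply the two rules from the commit message. */
int count_vzeroupper(const struct op *ops, int n, bool new_rules)
{
    bool maybe_nonzero[NREGS] = { false };  /* may the reg hold non-zero bits? */
    bool dirty = false;                     /* models AVX_U128_DIRTY */
    int inserted = 0;

    for (int i = 0; i < n; i++) {
        const struct op *op = &ops[i];
        if (op->kind != OP_CALL)
            assert(op->reg >= 0 && op->reg < NREGS);

        switch (op->kind) {
        case OP_ZERO_YMM:
            maybe_nonzero[op->reg] = false;
            if (!new_rules)
                dirty = true;   /* old behavior; rule 1 skips this */
            break;
        case OP_SET_YMM:
            maybe_nonzero[op->reg] = true;
            dirty = true;
            break;
        case OP_STORE_YMM:
            /* rule 2: storing a register that was only zeroed stays clean */
            if (!new_rules || maybe_nonzero[op->reg])
                dirty = true;
            break;
        case OP_CALL:
            if (dirty) {
                inserted++;     /* vzeroupper before the call */
                dirty = false;  /* upper halves are clean again */
            }
            break;
        }
    }
    return inserted;
}
```

For a sequence that zeroes ymm0, stores it and then makes a call, the old model places one vzeroupper and the new model places none; a sequence that first writes a possibly non-zero 256-bit value still gets one vzeroupper under both.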

2021-07-16 hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #5 from H.J. Lu  ---
We need to verify that LOADING the zero YMM register won't trigger
SSE<->AVX transition penalty.

2021-07-15 hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

H.J. Lu  changed:

   What|Removed |Added

  Attachment #51153|0   |1
is obsolete||

--- Comment #4 from H.J. Lu  ---
Created attachment 51157
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51157&action=edit
A new patch

2021-07-14 hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #3 from H.J. Lu  ---
(In reply to Arjan van de Ven from comment #1)
> Actually, the point is not that they're zero (they are), but that they're in
> the "init" state, since the vpxor wrote to xmm, not ymm.

We generate:

vxorpd  %xmm0, %xmm0, %xmm0 # 5 [c=4 l=4]  movv4df_internal/0

to zero all bits in YMM and ZMM registers.

2021-07-14 hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #2 from H.J. Lu  ---
Created attachment 51153
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51153&action=edit
A patch

2021-07-14 arjan at linux dot intel.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Arjan van de Ven  changed:

   What|Removed |Added

 CC||arjan at linux dot intel.com

--- Comment #1 from Arjan van de Ven  ---
Actually, the point is not that they're zero (they are), but that they're in
the "init" state, since the vpxor wrote to xmm, not ymm.