AVX512 register-zeroing should always use AVX 128b, not ymm or zmm

peter at cordes dot ca Thu, 04 May 2017 17:03:33 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80636


            Bug ID: 80636
           Summary: AVX / AVX512 register-zeroing should always use AVX
                    128b, not ymm or zmm
           Product: gcc
           Version: 8.0
               URL: http://stackoverflow.com/questions/43713273/is-vxorps-zeroing-on-amd-jaguar-bulldozer-zen-faster-with-xmm-registers-than-ymm
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Currently, gcc compiles _mm256_setzero_ps() to vxorps %ymm0, %ymm0, %ymm0, and
_mm512_setzero_ps() to the zmm equivalent.  The same holds for pd and integer
vectors: the zeroing instruction uses the vector width that the register will
later be used with.

vxorps %xmm0, %xmm0, %xmm0 has the same effect, because VEX-encoded AVX
instructions zero the upper bits of the destination register out to VLMAX.
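
The claim that xmm-zeroing clears the whole register can be checked with a
small test.  This is only a sketch, assuming GCC-style inline asm and the %x
operand modifier (which prints the xmm name of the register holding the
operand); the helper names zero_via_xmm and xmm_zeroing_clears_ymm are
hypothetical and not part of gcc or of this report.

```c
#include <immintrin.h>

/* Hypothetical helper: fill a ymm register with non-zero lanes, zero it
   through its xmm alias, then store all eight lanes to out. */
__attribute__((target("avx")))
void zero_via_xmm(float out[8])
{
    __m256 v = _mm256_set1_ps(1.0f);   /* all eight lanes hold 1.0f */

    /* %x0 names the xmm alias of whichever ymm register holds v, so this
       emits the 128b form: vxorps %xmmN, %xmmN, %xmmN. */
    __asm__ ("vxorps %x0, %x0, %x0" : "+x" (v));

    _mm256_storeu_ps(out, v);          /* store the full 256 bits */
}

/* Returns 1 if all eight lanes read back as zero, 0 if any lane survived,
   and -1 if the CPU running the test does not support AVX. */
int xmm_zeroing_clears_ymm(void)
{
    float out[8];

    if (!__builtin_cpu_supports("avx"))
        return -1;

    zero_via_xmm(out);
    for (int i = 0; i < 8; i++)
        if (out[i] != 0.0f)
            return 0;
    return 1;
}
```

If the upper 128 bits were left untouched, lanes 4-7 would still hold 1.0f and
the check would return 0.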

AMD Ryzen decodes the xmm version to 1 micro-op, but the ymm version to 2
micro-ops.  It doesn't detect the zeroing idiom special-case until after the
decoder has split it.  (Earlier AMD CPUs (Bulldozer/Jaguar) may be similar.)

---

For zeroing a ZMM register, it also saves a byte or two to use a VEX prefix
instead of EVEX, if the target register is zmm0-15.  (zmm16-31 of course always
need EVEX).
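
To make the size difference concrete, here are the encodings written out as
byte arrays.  The bytes were assembled by hand from the VEX/EVEX prefix rules,
so treat them as an assumption to verify against an assembler or objdump; the
array names are made up for illustration.

```c
/* vxorps %xmm0, %xmm0, %xmm0   (2-byte VEX, 128b) */
const unsigned char vex_xmm0[]  = { 0xC5, 0xF8, 0x57, 0xC0 };

/* vxorps %ymm0, %ymm0, %ymm0   (2-byte VEX, 256b: only the L bit differs) */
const unsigned char vex_ymm0[]  = { 0xC5, 0xFC, 0x57, 0xC0 };

/* vxorps %zmm0, %zmm0, %zmm0   (4-byte EVEX prefix) */
const unsigned char evex_zmm0[] = { 0x62, 0xF1, 0x7C, 0x48, 0x57, 0xC0 };

/* vxorps %xmm8, %xmm8, %xmm8   (needs the 3-byte VEX form for the B bit) */
const unsigned char vex_xmm8[]  = { 0xC4, 0x41, 0x38, 0x57, 0xC0 };

/* vxorps %zmm8, %zmm8, %zmm8   (EVEX is the same length for any register) */
const unsigned char evex_zmm8[] = { 0x62, 0x51, 0x3C, 0x48, 0x57, 0xC0 };
```

Under these encodings the xmm form saves two bytes for zmm0-7 (6 vs. 4) and
one byte for zmm8-15 (6 vs. 5).  The xmm and ymm VEX forms are the same
length, so the ymm case has no code-size benefit, only the decode benefit.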

---

There is no benefit, but also no downside, to using xmm-zeroing on Intel CPUs
that don't split 256b or 512b vector ops.  This change could be made across the
board, without adding any tuning options to control it.

References: 
http://stackoverflow.com/a/43751783/224132 Agner Fog's answer to my SO question
about this.
https://bugs.llvm.org/show_bug.cgi?id=32862  the same issue for clang.

