https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348

            Bug ID: 98348
           Summary: GCC 10.2 AVX512 Mask regression from GCC 9
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: danielhanchen at gmail dot com
  Target Milestone: ---

In GCC 9, vector comparisons on 128 and 256bit vectors on a AVX512 machine used
vpcmpeqd without any masks.

In GCC 10, for 128bit and 256bit vectors, AVX512 mask instructions are used.
https://gcc.godbolt.org/z/1sPzM5

GCC 10 should follow GCC 9 for vector comparisons when a mask is not needed.

The reason why is https://uops.info/table.html shows that using mask registers
makes 128/256/512 operations have a throughput of 1 and a latency of 3.

However, using a vector comparison directly has a throughput of 2 and a latency
of 1.

Reply via email to