https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348

            Bug ID: 98348
           Summary: GCC 10.2 AVX512 Mask regression from GCC 9
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: danielhanchen at gmail dot com
  Target Milestone: ---

In GCC 9, vector comparisons on 128-bit and 256-bit vectors on an AVX512
machine used vpcmpeqd without any masks. In GCC 10, AVX512 mask
instructions are used for 128-bit and 256-bit vectors.

https://gcc.godbolt.org/z/1sPzM5

GCC 10 should follow GCC 9 for vector comparisons when a mask is not
needed. The reason is that https://uops.info/table.html shows that using
mask registers gives 128/256/512-bit operations a throughput of 1 and a
latency of 3, whereas a direct vector comparison has a throughput of 2
and a latency of 1.
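
For reference, a minimal sketch of the kind of comparison described above. This is an illustrative reproducer, not the code behind the godbolt link: the names v8si, compare_eq and self_check are hypothetical, and it uses the GCC vector extension rather than intrinsics so that it builds without target flags. To look at the generated code on both compiler versions, something like -O2 -march=skylake-avx512 would presumably be needed (the exact flags are an assumption).

```cpp
#include <cassert>
#include <cstdint>

// 256-bit vector of eight 32-bit integers (GCC vector extension).
// The type name v8si is chosen for this sketch only.
typedef std::int32_t v8si __attribute__((vector_size(32)));

// Lane-wise equality. Each result lane is -1 (all bits set) where the
// inputs are equal and 0 where they differ, so the result is itself a
// vector and no mask register is needed to represent it.
v8si compare_eq(v8si a, v8si b) {
    return a == b;
}

// Small usage check of the lane semantics (hypothetical helper).
void self_check() {
    v8si a = {1, 2, 3, 4, 5, 6, 7, 8};
    v8si b = {1, 0, 3, 0, 5, 0, 7, 0};
    v8si r = compare_eq(a, b);
    for (int i = 0; i < 8; ++i) {
        // Even lanes match, odd lanes differ.
        assert(r[i] == ((i % 2 == 0) ? -1 : 0));
    }
}
```

Since the result of compare_eq is consumed as a full vector, this is a case where a mask is not needed.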