https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91400

            Bug ID: 91400
           Summary: __builtin_cpu_supports conjunction is optimized poorly
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vanyacpp at gmail dot com
  Target Milestone: ---

Clang 8 optimizes both f() and g() to the same code:

bool f()
{
    return __builtin_cpu_supports("popcnt") && __builtin_cpu_supports("ssse3");
}

bool g()
{
    extern unsigned int cpu_model;
    return (cpu_model & 64) && (cpu_model & 4);
}

f()/g():
        mov     eax, dword ptr [rip + cpu_model]
        and     eax, 68
        cmp     eax, 68
        sete    al
        ret

GCC generates this code only for g(). For f(), GCC generates less optimal code:

f():
        mov     edx, DWORD PTR __cpu_model[rip+12]
        mov     eax, edx
        shr     eax, 6
        and     eax, 1
        and     edx, 4
        mov     edx, 0
        cmove   eax, edx
        ret

I believe it would be great if GCC could generate the same code for f()
too.
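
For reference, the folding requested here amounts to merging two single-bit
tests into one masked compare, which is what the "and eax, 68 / cmp eax, 68"
sequence above does. A minimal sketch of that equivalence, assuming the two
masks from g(); the names kMaskA, kMaskB, both_separate, both_merged and
forms_agree are hypothetical and only illustrate the idea, they are not GCC
internals:

```cpp
// The two single-bit masks used in g(). Which CPU feature each bit stands
// for is deliberately left unnamed here.
constexpr unsigned kMaskA = 64;
constexpr unsigned kMaskB = 4;

// Two separate bit tests joined with &&, as written in g().
constexpr bool both_separate(unsigned features)
{
    return (features & kMaskA) && (features & kMaskB);
}

// One combined test: keep only the two bits of interest and require that
// both are set (64 | 4 == 68).
constexpr bool both_merged(unsigned features)
{
    return (features & (kMaskA | kMaskB)) == (kMaskA | kMaskB);
}

// Compare the two forms for every value of the low 8 bits, which covers
// all combinations of the two tested bits.
inline bool forms_agree()
{
    for (unsigned v = 0; v < 256; ++v)
        if (both_separate(v) != both_merged(v))
            return false;
    return true;
}

static_assert(both_merged(68), "both bits set");
static_assert(!both_merged(64), "only one bit set");
static_assert(!both_merged(4), "only one bit set");
```

The two forms agree for every input, so rewriting f() this way would not
change its result, only the generated code.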
