https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85390

            Bug ID: 85390
           Summary: possible missed optimisation / regression from 6.3
                    with conditional expression
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vegard.nossum at oracle dot com
  Target Milestone: ---

Input:

extern int a, b, c;

int f(int x)
{
    __builtin_prefetch((void *) (x ? a : b));
    return c;
}

Current trunk with -O3 produces this:

f(int):
  testl %edi, %edi
  je .L2
  movslq a(%rip), %rax
  prefetcht0 (%rax)
  movl c(%rip), %eax
  ret
.L2:
  movslq b(%rip), %rax
  prefetcht0 (%rax)
  movl c(%rip), %eax
  ret

GCC 6.3.0, by contrast, produced branchless code:

f(int):
  movslq a(%rip), %rdx
  movslq b(%rip), %rax
  testl %edi, %edi
  cmovne %rdx, %rax
  prefetcht0 (%rax)
  movl c(%rip), %eax
  ret
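
At the source level, the 6.3.0 output corresponds roughly to loading both globals unconditionally and selecting between the loaded values afterwards. A minimal sketch of that shape follows. The name f_select, the definitions of a, b and c (the report only declares them extern), and the intptr_t cast (added to avoid the int-to-pointer size warning) are assumptions for illustration. Whether a particular GCC version emits a cmov for it is not guaranteed.

```c
#include <stdint.h>

/* Definitions added so the sketch links on its own;
   the report only has "extern int a, b, c;". */
int a, b, c;

/* Hypothetical variant of f(): both loads happen before the select,
   mirroring the two movslq instructions in the 6.3.0 listing. */
int f_select(int x)
{
    int va = a;
    int vb = b;
    intptr_t addr = x ? va : vb;   /* int -> intptr_t sign-extends */

    __builtin_prefetch((void *) addr);
    return c;
}
```

Comparing the -O3 output of f_select with that of the original f is one way to check whether the branch comes from the conditional expression itself or from where the loads are placed.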

For reference, clang also outputs a branchless (but slightly longer) version:

f(int): # @f(int)
  testl %edi, %edi
  movl $a, %eax
  movl $b, %ecx
  cmovneq %rax, %rcx
  movslq (%rcx), %rax
  prefetcht0 (%rax)
  movl c(%rip), %eax
  retq

In my tests, the 6.3.0 code is equally fast in the x == 0 and x != 0 cases.
The trunk/8.0.1 code matches the 6.3.0 speed when the branch is not taken
(x != 0), but is only half as fast as 6.3.0 when the branch is taken
(x == 0).
