https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

            Bug ID: 79552
           Summary: [Regression GCC 6+] Wrong code generation due to
                    -fschedule-insns, with __restrict__ and inline asm
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
          Assignee: unassigned at gcc dot gnu.org
          Reporter: katsunori.kumatani at gmail dot com
  Target Milestone: ---

This is a regression: GCC generates wrong code for C++ that combines
__restrict__ with an inline asm statement that has a memory output. I tracked
the culprit down to the -fschedule-insns option. Because that option is
implicitly enabled at -O2 and higher, I used -O1 plus -fschedule-insns to
isolate it. The bug first appears in GCC 6; GCC 5.4 is fine.

Here is a minimal test case, a simplistic memset written in inline asm (pasted
here because it is very short):


inline void memset_test(void* a, char c, unsigned long n)
{
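  // Mark the pointed-to data as one big clobber via a local variable (this
  // works): the "=m" output below then covers every byte rep stosb may write
  struct { char _[unsigned(~0U)>>1]; } *const m=(typeof(m))(a);

  // rep stosb stores AL to [EDI/RDI], ECX/RCX times
  asm("rep stosb":"+D"(a),"+c"(n),"=m"(*m):"a"(c));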
}

void foo(char* __restrict__ a, int c)
{
  memset_test(a, 0, c);
  asm("xor %0, %0"::"q"(a[0]));  // dummy dependency to show the problem
}


Compile the above with -m32 -O1 -fschedule-insns (or with -m64, which has the
same bug).
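A minimal runtime harness, offered as a sketch rather than part of the original
report: it assumes an x86 (i386 or x86-64) target and GCC/Clang-style inline
asm, and replaces the dummy xor asm in foo with a plain return of a[0] so the
result can be checked. The names big_blob and first_after_clear are
hypothetical, std::size_t stands in for unsigned long, and a named struct with
static_cast replaces the anonymous struct and typeof. It has not been verified
that this exact shape triggers the misscheduling; on an affected build (for
example -O1 -fschedule-insns), a load hoisted above rep stosb would be expected
to show up as a stale first byte.

```cpp
// Hypothetical harness; assumes x86 and GCC/Clang-style extended asm.
#include <cassert>
#include <cstddef>
#include <cstring>

namespace {

// Hypothetical stand-in for the anonymous huge struct: tells the compiler
// the asm may write a whole region starting at the pointer.
struct big_blob { char bytes[~0U >> 1]; };

inline void memset_test(void* a, char c, std::size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
  big_blob* const m = static_cast<big_blob*>(a);
  // rep stosb stores AL to [EDI/RDI], ECX/RCX times.
  asm("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c));
#else
  std::memset(a, c, n);  // portable fallback on non-x86 targets
#endif
}

}  // namespace

// Same shape as foo() in the report, but returns a[0] instead of feeding it
// to a dummy asm, so the value can be checked at run time.
char first_after_clear(char* __restrict__ a, std::size_t n)
{
  memset_test(a, 0, n);
  return a[0];  // must observe the store made by rep stosb
}
```

For example, fill a buffer with 'x', call first_after_clear(buf, sizeof buf),
and check that it returns 0.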

I used Godbolt's Compiler Explorer to test multiple versions and also confirmed
the bug in the upcoming GCC 7. Here is the i386 output from GCC 6 and from a
GCC 7 snapshot (both are identical):

GCC 6.x or GCC 7:
        push    edi
        mov     eax, DWORD PTR [esp+8]
        mov     edi, eax
        mov     ecx, DWORD PTR [esp+12]
        movzx   edx, BYTE PTR [eax]        # WRONG: a[0] loaded before rep stosb
        mov     eax, 0
        rep stosb
        xor dl, dl
        pop     edi
        ret


GCC 5.4:
        push    edi
        mov     edx, DWORD PTR [esp+8]
        mov     edi, edx
        mov     ecx, DWORD PTR [esp+12]
        mov     eax, 0
        rep stosb
        movzx   eax, BYTE PTR [edx]        # CORRECT: a[0] loaded after rep stosb
        xor al, al
        pop     edi
        ret

I don't have GCC 7 installed locally, but I confirmed the bug with GCC 6 on my
own machine, not just on Godbolt, where I tested GCC 7. Other compilers (Clang,
ICC) generate correct code from the same source.

Things to note:

- It happens with GCC 6 and 7 only; GCC 5.4 generates correct output.
- It happens only once -fschedule-insns is turned on, so the bug is in that
  pass.
- Removing __restrict__ from the pointer parameter of foo makes the problem
  go away.
- Using "asm volatile" instead of plain "asm" in memset_test generates correct
  code.
- Adding a "memory" clobber to that asm also generates correct code.
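The two asm-level workarounds listed above can be sketched as follows, under
the same assumptions as the test case (x86 target, GCC-style inline asm). The
function names and the big_blob type are hypothetical; each variant keeps the
original "=m"(*m) output and changes exactly one thing: volatile in the first,
a "memory" clobber in the second. A plain memset fallback is assumed on
non-x86 targets.

```cpp
// Hypothetical sketch of the two workarounds; assumes x86 and GCC-style asm.
#include <cassert>
#include <cstddef>
#include <cstring>

namespace {

// Hypothetical stand-in for the anonymous huge struct in memset_test.
struct big_blob { char bytes[~0U >> 1]; };

// Workaround 1 (from the report): mark the asm volatile.
inline void memset_volatile(void* a, char c, std::size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
  big_blob* const m = static_cast<big_blob*>(a);
  asm volatile("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c));
#else
  std::memset(a, c, n);
#endif
}

// Workaround 2 (from the report): add a "memory" clobber to the same asm.
inline void memset_clobber(void* a, char c, std::size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
  big_blob* const m = static_cast<big_blob*>(a);
  asm("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c) : "memory");
#else
  std::memset(a, c, n);
#endif
}

}  // namespace

// Same shape as foo() in the report, returning a[0] so it can be checked.
char first_byte_volatile(char* __restrict__ a, std::size_t n)
{
  memset_volatile(a, 0, n);
  return a[0];
}

char first_byte_clobber(char* __restrict__ a, std::size_t n)
{
  memset_clobber(a, 0, n);
  return a[0];
}
```

The "memory" clobber tells the compiler that the asm may read or write any
memory, so values cached in registers must be written back before it and
reloaded after it, which is why it is the most expensive of the two.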


Most of these workarounds are not acceptable here because they disable
optimizations: they keep the problem from surfacing rather than fixing it. The
"memory" clobber is by far the worst, since it forces the compiler to discard
any memory values cached in registers. "asm volatile" is probably the least bad
workaround. Dropping __restrict__ is not a good option either: it is genuinely
useful for pointers of the same type, which the compiler cannot otherwise prove
do not alias.


Please look into it; it was really annoying to track down.
