https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552
Bug ID: 79552
Summary: [Regression GCC 6+] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm
Product: gcc
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: inline-asm
Assignee: unassigned at gcc dot gnu.org
Reporter: katsunori.kumatani at gmail dot com
Target Milestone: ---

This is a regression bug leading to wrong code generation with __restrict__ in C++ and inline asm with memory output. I tracked down the culprit to be the -fschedule-insns option, so I used -O1 to track it as it is implicitly enabled in -O2 and higher.

It only happens since GCC 6, version 5.4 is fine.

Simple test case of a simplistic memset in asm (pasted here because it's very short):

    inline void memset_test(void* a, char c, unsigned long n)
    {
      // mark the pointer's data as a big clobber via a local variable (this works)
      struct { char _[unsigned(~0U)>>1]; } *const m=(typeof(m))(a);
      asm("rep stosb":"+D"(a),"+c"(n),"=m"(*m):"a"(c));
    }

    void foo(char* __restrict__ a, int c)
    {
      memset_test(a, 0, c);
      asm("xor %0, %0"::"q"(a[0]));  // dummy dependency to show the problem
    }

Compile the above with: -m32 -O1 -fschedule-insns (or -m64 which has the same bug)

I've used Godbolt's Compiler Explorer to easily test multiple versions and then confirmed it even in the upcoming version 7. Here's the example i386 output from GCC 6 or 7 snapshot (both are identical):

GCC 6.x or GCC 7:

    push  edi
    mov   eax, DWORD PTR [esp+8]
    mov   edi, eax
    mov   ecx, DWORD PTR [esp+12]
    movzx edx, BYTE PTR [eax]    # this is WRONG!
    mov   eax, 0
    rep stosb
    xor   dl, dl
    pop   edi
    ret

GCC 5.4:

    push  edi
    mov   edx, DWORD PTR [esp+8]
    mov   edi, edx
    mov   ecx, DWORD PTR [esp+12]
    mov   eax, 0
    rep stosb
    movzx eax, BYTE PTR [edx]    # CORRECT, after stosb
    xor   al, al
    pop   edi
    ret

I don't have GCC 7 but this bug is confirmed on GCC 6 on my machine, not just on Godbolt's site where I tested version 7. Other compilers generate correct code (Clang, ICC) with the same source code.

Things to note:

- This happens on GCC 6 and 7 only; GCC 5.4 generates correct output.
- This happens once you turn on the -fschedule-insns option, so the bug is there.
- If you remove the __restrict__ from the pointer in foo's parameter, the problem is gone.
- Using "asm volatile" instead of "asm" in memset_test generates correct code.
- Using a "memory" clobber in that asm also generates correct code.

Most of these workarounds are not valid in this context because they DISABLE the optimizations, so they prevent the problem from popping up instead of solving it. The "memory" clobber is obviously the worst solution by far, as it invalidates any memory values cached in registers. "asm volatile" is probably the least bad workaround. __restrict__ is definitely useful for pointers of the same type, which the compiler can't otherwise know won't alias.

Please look into it, it was really annoying to track down.
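
For reference, the two asm-level workarounds mentioned above ("asm volatile" and the "memory" clobber) can be sketched as below. This is only an illustration, assuming GCC or Clang on x86. The names big_clobber, memset_volatile, memset_clobber and the fill_and_read_* helpers are made up for this sketch and are not part of the original test case.

```cpp
// Hypothetical sketch of the two workarounds; x86 only, GNU inline asm.

// Same "big struct" idea as in the test case: the "=m" output tells the
// compiler that the asm writes through the pointer, without an exact size.
struct big_clobber { char _[unsigned(~0U) >> 1]; };

// Workaround 1: "asm volatile" instead of plain "asm".
inline void memset_volatile(void* a, char c, unsigned long n)
{
  big_clobber* const m = static_cast<big_clobber*>(a);
  asm volatile("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c));
}

// Workaround 2: keep plain "asm" but add a "memory" clobber.
// This forces the compiler to reload every value it had cached from memory.
inline void memset_clobber(void* a, char c, unsigned long n)
{
  big_clobber* const m = static_cast<big_clobber*>(a);
  asm("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c) : "memory");
}

// Counterparts of foo(): fill through a __restrict__ pointer, then read
// a[0]. The read has to observe the bytes written by "rep stosb".
char fill_and_read_volatile(char* __restrict__ a, char value, unsigned long n)
{
  memset_volatile(a, value, n);
  return a[0];
}

char fill_and_read_clobber(char* __restrict__ a, char value, unsigned long n)
{
  memset_clobber(a, value, n);
  return a[0];
}
```

Both variants trade optimization freedom for correctness, which is why neither is a real fix for the scheduler issue.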