[Bug tree-optimization/86318] const local aggregates can be assumed not to be modified even when escaped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86318 --- Comment #4 from Josh Haberman --- Is there any plan or timeline for fixing this bug?
[Bug tree-optimization/108226] New: __restrict on inlined function parameters does not function as expected
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108226

            Bug ID: 108226
           Summary: __restrict on inlined function parameters does not function as expected
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jhaberman at gmail dot com
  Target Milestone: ---

In bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58526 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60712#c3 it is said that restrict/__restrict on inlined function parameters was fixed in GCC 5. But I ran into a case where __restrict does not work as expected:

// Godbolt link for this example: https://godbolt.org/z/e5j93Ex3v

long g;

static void Func1(void* p1, int* p2) {
  switch (*p2) {
    case 2:
      __builtin_memcpy(p1, &g, 1);
      return;
    case 1:
      __builtin_memcpy(p1, &g, 8);
      return;
    case 0: {
      __builtin_memcpy(p1, &g, 16);
      return;
    }
  }
}

static void Func2(char* __restrict p1, int* __restrict p2) {
  *p2 = 1;
  *p1 = 123;
  Func1(p1, p2);
}

void Func3(char* p1, int* p2) {
  *p2 = 1;
  Func2(p1, p2);
}

The __restrict qualifiers on Func2() should allow the switch() to be optimized away. Clang optimizes it, GCC does not.

It appears that __restrict on function parameters can even make the code worse. Consider a slight variation on this example:

// Godbolt link for this example: https://godbolt.org/z/Y61qajETd

long g;

static void Func1(void* p1, int* p2) {
  switch (*p2) {
    case 2:
      __builtin_memcpy(p1, &g, 1);
      return;
    case 1:
      __builtin_memcpy(p1, &g, 8);
      return;
    case 0: {
      __builtin_memcpy(p1, &g, 16);
      return;
    }
  }
}

// If we remove __restrict here, GCC succeeds in optimizing away the switch().
static void Func2(char* __restrict p1, int* __restrict p2) {
  *p1 = 123;
  *p2 = 1;
  Func1(p1, p2);
}

void Func3(char* p1, int* p2) {
  *p2 = 1;
  Func2(p1, p2);
}

In this case, it should be straightforward to optimize away the switch(), even without __restrict. But GCC does not optimize this correctly unless we *remove* __restrict.
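For reference, a minimal sketch of what the first example could reduce to once the switch() is folded. The name Func3_expected is hypothetical and is not part of the report. The sketch assumes the caller honors the no-alias promise made by __restrict and that long is 8 bytes, as the report's memcpy sizes imply.

```c
#include <assert.h>
#include <string.h>

long g;

/* Hypothetical hand-written equivalent of Func3() from the first example.
 * Assumes p1 and p2 do not alias (the __restrict promise on Func2) and
 * that sizeof(long) == 8. */
void Func3_expected(char *p1, int *p2) {
  *p2 = 1;            /* the switch operand is now known to be 1 */
  *p1 = 123;          /* cannot change *p2 under the no-alias promise */
  memcpy(p1, &g, 8);  /* only "case 1" of Func1() remains reachable */
}
```

The store to *p1 is kept in the sketch because p1 is a char pointer and the report does not rule out other observers of that byte; only the switch() is removed.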
[Bug tree-optimization/56456] [meta-bug] bogus/missing -Warray-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
Bug 56456 depends on bug 108217, which changed state.

Bug 108217 Summary: bogus -Warray-bounds with pointer to constant local
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108217

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|DUPLICATE                   |---
[Bug middle-end/108217] bogus -Warray-bounds with pointer to constant local
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108217

Josh Haberman changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|DUPLICATE                   |---

--- Comment #3 from Josh Haberman ---
> That being said there is a missed optimization but that is the same as PR
> 23384 .
> The const part is a misleading you really.

I think there are two issues here.

1. Escape analysis is not flow sensitive. I agree that aspect of my bug report is a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23384, and closing as a dup is appropriate there.

2. Escape analysis does not take 'const-ness' of the underlying object into account.

Let me illustrate (2) with an example that isolates that issue (Godbolt: https://godbolt.org/z/16cv87s9d)

void ExternFunc(const int*);

int Bad() {
  const int i = 0;
  const int* pi = &i;
  ExternFunc(pi);
  return *pi;
}

int Good() {
  const int i = 0;
  ExternFunc(&i);
  return i;
}

These two functions are effectively the same, but in Bad() GCC does not perform constant propagation across the external function call. While it's true that the pointer escapes, the underlying object is const and cannot change, so constant propagation should work here, as it does in Good().

Currently GCC re-loads `i` from the stack in Bad(), even though we know statically that the value must be zero.

The same missed optimization is present in Clang: https://github.com/llvm/llvm-project/issues/59694
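As an illustration of the transformation being asked for in (2), here is a small sketch that pairs Bad() with the form it could take after constant propagation. The body of ExternFunc, the last_seen variable, and the name Bad_expected are all hypothetical stand-ins added so the snippet is self-contained; the real ExternFunc is opaque to the compiler.

```c
#include <assert.h>

/* Hypothetical stand-in for the external function. A conforming callee
 * may read *p, but modifying an object defined as const is undefined
 * behavior, so it must not write through the pointer. */
static int last_seen = -1;
void ExternFunc(const int *p) { last_seen = *p; }

/* Bad() as written in the report: reloads *pi after the call. */
int Bad(void) {
  const int i = 0;
  const int *pi = &i;
  ExternFunc(pi);
  return *pi;
}

/* Hypothetical result of constant propagation: the address still escapes
 * to ExternFunc(), but the return value is the known initializer. */
int Bad_expected(void) {
  const int i = 0;
  ExternFunc(&i);
  return 0;
}
```

Both functions still pass a valid address to ExternFunc(); the only difference is that Bad_expected() no longer reloads the value afterwards.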
[Bug middle-end/108217] New: bogus -Warray-bounds with pointer to constant local
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108217

            Bug ID: 108217
           Summary: bogus -Warray-bounds with pointer to constant local
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jhaberman at gmail dot com
  Target Milestone: ---

Repro:

void ExternFunc1();
void ExternFunc2(const int*);
char mem[32];

static void StaticFunc(const int* i) {
  void* ptr = (void*)0;
  switch (*i) {
    case 0:
      ExternFunc2(i);
      return;
    case 1:
      __builtin_memcpy(mem, &ptr, sizeof(ptr));
      return;
    case 2: {
      __builtin_memcpy(mem, &ptr, 32);
      return;
    }
  }
}

void Bad() {
  const int i = 1;
  ExternFunc1();
  StaticFunc(&i);
}

This reproduces on trunk according to Godbolt: https://godbolt.org/z/vYGo1z6bG

Godbolt also indicates a missed optimization, which is probably related to the bogus warning. Clang correctly performs constant propagation of the local `i`, whereas GCC seems to think that all cases of the switch() are reachable.

It is true that &i escapes, but mutating `i` is UB because it is const, so it should be legal to perform constant propagation here. Additionally, even if ExternFunc2() mutated `i`, it would be too late to change its value in time to affect the switch().
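A rough sketch of the shape Bad() could take once `i == 1` is propagated into the inlined StaticFunc(): only the `case 1` copy survives, so the 32-byte memcpy that triggers the warning is never reached. The name Bad_expected, the body of ExternFunc1, and the calls counter are hypothetical additions for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the opaque external function. */
static int calls;
void ExternFunc1(void) { calls++; }

char mem[32];

/* Hypothetical folded form of Bad(): with i known to be 1, the switch()
 * in StaticFunc() collapses to its "case 1" arm, which copies exactly
 * sizeof(ptr) bytes and stays within the bounds of ptr. */
void Bad_expected(void) {
  void *ptr = (void *)0;
  ExternFunc1();
  memcpy(mem, &ptr, sizeof(ptr));
}
```

In this form there is no out-of-bounds read of `ptr` left for -Warray-bounds to diagnose.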
[Bug rtl-optimization/70782] zero-initialized long returned by value generates useless stores/loads to the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70782

Josh Haberman changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|zero-initialized union      |zero-initialized long
                   |returned by value generates |returned by value generates
                   |useless stores/loads to the |useless stores/loads to the
                   |stack                       |stack

--- Comment #1 from Josh Haberman ---
I just realized that the union has nothing to do with it. I get exactly the same results if the function returns a long:

--
#include <string.h>

long f(const void *p, int type) {
  long v;
  memset(&v, 0, 8);
  if (type == 1) {
    memcpy(&v, p, 1);
  } else if (type <= 5) {
    memcpy(&v, p, 4);
  } else if (type <= 8) {
    memcpy(&v, p, 8);
  }
  return v;
}
--

I've retitled the bug accordingly.
[Bug rtl-optimization/70782] New: zero-initialized union returned by value generates useless stores/loads to the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70782

            Bug ID: 70782
           Summary: zero-initialized union returned by value generates useless stores/loads to the stack
           Product: gcc
           Version: 5.2.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jhaberman at gmail dot com
  Target Milestone: ---

Test case:

--
#include <string.h>

typedef union {
  char ch;
  float fl;
  double dbl;
} u;

u f(const void *p, int type) {
  u v;
  memset(&v, 0, 8);
  if (type == 1) {
    memcpy(&v, p, 1);
  } else if (type <= 5) {
    memcpy(&v, p, 4);
  } else if (type <= 8) {
    memcpy(&v, p, 8);
  }
  return v;
}
--

With gcc 5.2.1 on Ubuntu, compiled with -O3 -fno-stack-protector I get:

--
   0:   83 fe 01                cmp    esi,0x1
   3:   48 c7 44 24 e8 00 00    mov    QWORD PTR [rsp-0x18],0x0
   a:   00 00
   c:   74 32                   je     40
   e:   83 fe 05                cmp    esi,0x5
  11:   7e 1d                   jle    30
  13:   83 fe 08                cmp    esi,0x8
  16:   7f 08                   jg     20
  18:   48 8b 07                mov    rax,QWORD PTR [rdi]
  1b:   48 89 44 24 e8          mov    QWORD PTR [rsp-0x18],rax
  20:   48 8b 44 24 e8          mov    rax,QWORD PTR [rsp-0x18]
  25:   c3                      ret
  26:   66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
  2d:   00 00 00
  30:   8b 07                   mov    eax,DWORD PTR [rdi]
  32:   89 44 24 e8             mov    DWORD PTR [rsp-0x18],eax
  36:   48 8b 44 24 e8          mov    rax,QWORD PTR [rsp-0x18]
  3b:   c3                      ret
  3c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
  40:   0f b6 07                movzx  eax,BYTE PTR [rdi]
  43:   88 44 24 e8             mov    BYTE PTR [rsp-0x18],al
  47:   48 8b 44 24 e8          mov    rax,QWORD PTR [rsp-0x18]
  4c:   c3                      ret
--

In every code path it saves the read value to the stack, only to read it back. None of these operations are actually necessary, since the code is already zeroing the other parts of rax. This function shouldn't need to use any stack space at all.
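To make "no stack space at all" concrete, here is a sketch that places the long-returning variant of f() from comment #1 next to a hand-written equivalent that only performs zero-extending loads. The name f_expected is hypothetical. The equivalence assumes a little-endian target with an 8-byte long (as on x86-64 Linux); on a big-endian target the partial memcpy calls would fill the high-order bytes instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The long-returning variant of f() from the report. */
long f(const void *p, int type) {
  long v;
  memset(&v, 0, 8);
  if (type == 1) {
    memcpy(&v, p, 1);
  } else if (type <= 5) {
    memcpy(&v, p, 4);
  } else if (type <= 8) {
    memcpy(&v, p, 8);
  }
  return v;
}

/* Hypothetical equivalent with no zeroed temporary: each branch is a
 * single load that is zero-extended into the result.
 * Assumes little-endian byte order and sizeof(long) == 8. */
long f_expected(const void *p, int type) {
  if (type == 1) {
    uint8_t b;
    memcpy(&b, p, 1);
    return (long)b;
  }
  if (type <= 5) {
    uint32_t w;
    memcpy(&w, p, 4);
    return (long)w;
  }
  if (type <= 8) {
    uint64_t q;
    memcpy(&q, p, 8);
    return (long)q;
  }
  return 0;
}
```

The three branches correspond to the movzx, 32-bit mov, and 64-bit mov loads already present in the disassembly; the sketch simply omits the round trip through [rsp-0x18].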
[Bug inline-asm/52813] %rsp in clobber list is silently ignored
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813 --- Comment #4 from Josh Haberman 2012-04-01 19:23:14 UTC --- I understand that GCC may not be able to save/restore %rsp like it does other registers. But if that's the case, GCC should throw an error if the user puts %rsp in the clobber list, instead of silently ignoring it. Otherwise how is the user supposed to know that %rsp will not be saved except through trial and error?
[Bug inline-asm/52813] %rsp in clobber list is silently ignored
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813

Josh Haberman changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

--- Comment #2 from Josh Haberman 2012-04-01 15:54:27 UTC ---
I don't expect the compiler to analyze the asm string. I expect the compiler to respect my clobber list.

I told GCC that I would clobber %rsp. Any other register that I put in the clobber list causes GCC to save that register to the stack or to another register before the asm and restore it from the stack/register after the asm. For example:

--
#include <stdlib.h>

int main() {
  int x = rand();
  asm volatile ("movq $0, %%rax" : : : "%rax");
  return x;
}

$ gcc -Wall -O3 -fomit-frame-pointer -c -o test.o test.c
$ objdump -d -r -M intel test.o

test.o: file format elf64-x86-64

Disassembly of section .text.startup:

   0:   48 83 ec 08             sub    rsp,0x8
   4:   e8 00 00 00 00          call   9
                        5: R_X86_64_PC32        rand-0x4
   9:   89 c2                   mov    edx,eax
   b:   48 c7 c0 00 00 00 00    mov    rax,0x0
  12:   89 d0                   mov    eax,edx
  14:   48 83 c4 08             add    rsp,0x8
  18:   c3                      ret
--

Notice that it saved eax to edx before my asm and restored it afterwards.

This works for every register except %rsp, which is silently ignored if you try to list it in the clobber list. This is a bug.
[Bug inline-asm/52813] New: %rsp in clobber list is silently ignored
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813

             Bug #: 52813
           Summary: %rsp in clobber list is silently ignored
    Classification: Unclassified
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jhaber...@gmail.com

The following test program crashes even though I correctly listed %rsp as clobbered:

--
int main() {
  asm volatile ("movq $0, %%rsp" : : : "%rsp");
  return 0;
}
--

I would prefer gcc to error out in this case instead of silently ignoring my instruction.
[Bug target/52055] load of 64-bit pointer reads 64 bits even when only 32 are used
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52055 --- Comment #2 from Josh Haberman 2012-01-31 17:23:51 UTC --- Is there any requirement that you trap if the 64-bit read would have trapped? Aren't unaligned reads undefined behavior that only happen to work on x86-64?
[Bug target/52055] New: load of 64-bit pointer reads 64 bits even when only 32 are used
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52055

             Bug #: 52055
           Summary: load of 64-bit pointer reads 64 bits even when only 32 are used
    Classification: Unclassified
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jhaber...@gmail.com

The following test program:

#include <stdint.h>

uint32_t rd32(uint64_t *i) {
  return *i;
}

Compiles to this (-O3 -fomit-frame-pointer):

   0:   48 8b 07                mov    rax,QWORD PTR [rdi]
   3:   c3                      ret

But Clang compiles to this, which seems correct, is one byte shorter and touches less memory:

   0:   8b 07                   mov    eax,DWORD PTR [rdi]
   2:   c3                      ret
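The narrower load can also be written out at the source level. The sketch below shows rd32() next to a hypothetical rd32_narrow() (a name invented here) that reads only the first four bytes of the object. The two agree only on a little-endian target such as x86-64, where the low 32 bits of a uint64_t are stored first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* rd32() from the report: a 64-bit load followed by truncation. */
uint32_t rd32(uint64_t *i) {
  return *i;
}

/* Hypothetical source-level form of the 32-bit load Clang emits.
 * Assumes little-endian byte order, so the first four bytes of *i
 * hold its low 32 bits. */
uint32_t rd32_narrow(const uint64_t *i) {
  uint32_t lo;
  memcpy(&lo, i, sizeof(lo));
  return lo;
}
```

Both functions touch memory inside the same uint64_t object; the second simply reads half as many bytes.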
[Bug rtl-optimization/44194] struct returned by value generates useless stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194

--- Comment #8 from Josh Haberman 2011-02-24 03:27:04 UTC ---
I found another test case for this. I thought I'd post it since it's extremely different than the original one.

--
class Foo {
 public:
  virtual ~Foo() {}
  virtual void DoSomething() = 0;
};

void foo(Foo *f, void (Foo::*member)()) {
  (f->*member)();
}
--

$ g++ -c -O3 -fomit-frame-pointer test.cc
$ objdump -M intel -d test.o

test.o: file format elf64-x86-64

Disassembly of section .text:

<_Z3fooP3FooMS_FvvE>:
   0:   40 f6 c6 01             test   sil,0x1
   4:   48 89 74 24 e8          mov    QWORD PTR [rsp-0x18],rsi
   9:   48 89 54 24 f0          mov    QWORD PTR [rsp-0x10],rdx
   e:   74 10                   je     20 <_Z3fooP3FooMS_FvvE+0x20>
  10:   48 01 d7                add    rdi,rdx
  13:   48 8b 07                mov    rax,QWORD PTR [rdi]
  16:   48 8b 74 30 ff          mov    rsi,QWORD PTR [rax+rsi*1-0x1]
  1b:   ff e6                   jmp    rsi
  1d:   0f 1f 00                nop    DWORD PTR [rax]
  20:   48 01 d7                add    rdi,rdx
  23:   ff e6                   jmp    rsi
--

We spilled rsi and rdx to the stack (in the red zone, it appears) for no reason (AFAICS).
[Bug rtl-optimization/44194] struct returned by value generates useless stores
--- Comment #7 from jhaberman at gmail dot com 2010-07-10 01:48 --- I must have been on crack when I wrote that last comment. Sorry for the noise. Though I do wonder how difficult the original bug is to fix. This seems to make it more expensive to return structures by value. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
[Bug rtl-optimization/44194] struct returned by value generates useless stores
--- Comment #4 from jhaberman at gmail dot com 2010-07-10 01:38 ---
This seems to happen even with POD return types:

int foo();
void bar(int a);

void func() {
  bar(foo());
}

In 32-bit mode it spills the return value to the stack for no reason. It also seems to overallocate the stack (28 bytes allocated, only 4 used):

   0:   83 ec 1c                sub    esp,0x1c
   3:   e8 fc ff ff ff          call   4
                        4: R_386_PC32   foo
   8:   89 04 24                mov    DWORD PTR [esp],eax
   b:   e8 fc ff ff ff          call   c
                        c: R_386_PC32   bar
  10:   83 c4 1c                add    esp,0x1c
  13:   c3                      ret

In 64-bit mode there is no store, but it *does* allocate 8 bytes of stack that it never uses:

   0:   48 83 ec 08             sub    rsp,0x8
   4:   31 c0                   xor    eax,eax
   6:   e8 00 00 00 00          call   b
                        7: R_X86_64_PC32        foo+0xfffc
   b:   48 83 c4 08             add    rsp,0x8
   f:   89 c7                   mov    edi,eax
  11:   e9 00 00 00 00          jmp    16
                        12: R_X86_64_PC32       bar+0xfffc

Any idea how hard this bug is to fix?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
[Bug rtl-optimization/44194] New: struct returned by value generates useless stores
Test case:

--
#include <stdint.h>

struct twoints {
  uint64_t a, b;
} foo();
void bar(uint64_t a, uint64_t b);

void func() {
  struct twoints s = foo();
  bar(s.a, s.b);
}
--

$ gcc -save-temps -Wall -c -o testbad.o -msse2 -O3 -fomit-frame-pointer testbad.c
$ objdump -d -r -M intel testbad.o

testbad.o: file format elf64-x86-64

Disassembly of section .text:

   0:   48 83 ec 28             sub    rsp,0x28
   4:   31 c0                   xor    eax,eax
   6:   e8 00 00 00 00          call   b
                        7: R_X86_64_PC32        foo-0x4
   b:   48 89 04 24             mov    QWORD PTR [rsp],rax
   f:   48 89 54 24 08          mov    QWORD PTR [rsp+0x8],rdx
  14:   48 89 d6                mov    rsi,rdx
  17:   48 89 44 24 10          mov    QWORD PTR [rsp+0x10],rax
  1c:   48 89 54 24 18          mov    QWORD PTR [rsp+0x18],rdx
  21:   48 89 c7                mov    rdi,rax
  24:   48 83 c4 28             add    rsp,0x28
  28:   e9 00 00 00 00          jmp    2d
                        29: R_X86_64_PC32       bar-0x4
--

As you can see above, rax and rdx are stored to the stack twice, but these stores are unnecessary.

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.3-4ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

--
           Summary: struct returned by value generates useless stores
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jhaberman at gmail dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194