[Bug lto/49237] error with -flto: 'f' causes a section type conflict
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49237 --- Comment #2 from Wouter Vermaelen wouter.vermaelen at scarlet dot be 2011-12-16 19:28:36 UTC --- I also can't reproduce it anymore.
[Bug rtl-optimization/51014] [4.7 Regression] ICE: in apply_opt_in_copies, at loop-unroll.c:2283 with -O2 -g -funroll-loops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51014

Wouter Vermaelen wouter.vermaelen at scarlet dot be changed:

    What    |Removed |Added
    CC      |        |wouter.vermaelen at scarlet dot be

--- Comment #1 from Wouter Vermaelen wouter.vermaelen at scarlet dot be 2011-11-08 13:23:32 UTC ---

I hit the same ICE, with the same required compiler flags. Here's my reduced testcase:

    struct S {
        ~S() { delete p; }
        int* p;
    };
    void f(S* b, S* e)
    {
        for (/**/; b != e; ++b) {
            b->~S();
        }
    }
[Bug tree-optimization/50417] New: regression: memcpy with known alignment
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417

    Bug #: 50417
    Summary: regression: memcpy with known alignment
    Classification: Unclassified
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: normal
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

Consider these functions:

    void copy1(char* d, const char* s) {
        memcpy(d, s, 256);
    }
    void copy2(short* d, const short* s) {
        memcpy(d, s, 256);
    }
    void copy3(int* d, const int* s) {
        memcpy(d, s, 256);
    }
    void copy4(long* d, const long* s) {
        memcpy(d, s, 256);
    }

g++-4.5.2 is able to generate better code for the latter functions (copy2() to copy4()), whose pointer types imply a stricter alignment. But when I test with a recent snapshot (SVN revision 178875 on linux x86_64) it generates the same code for all versions (same as copy1()).
[Bug tree-optimization/50385] New: missed-optimization: jump to __builtin_unreachable() not removed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50385

    Bug #: 50385
    Summary: missed-optimization: jump to __builtin_unreachable() not removed
    Classification: Unclassified
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: enhancement
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

I'm not sure, but this issue might be the same as bug 49054 (if so, feel free to delete this one).

    #include <vector>

    struct S { int a, b; };
    std::vector<S> v;

    int search_1(int a) {
        for (auto it = v.begin(); /**/; ++it)
            if (it->a == a) return it->b;
    }

    int search_2(int a) {
        for (auto e : v)
            if (e.a == a) return e.b;
        __builtin_unreachable();
    }

I expected to see the same generated code for both functions. Instead the 2nd one still contains some useless comparisons and jumps past the end of the function. Since taking such a (conditional) jump is undefined behavior anyway, it might as well be removed (including the instructions required to calculate the condition).

Tested with SVN revision 178775 (20110912) on Linux x86_64.
[Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

    Bug #: 50339
    Summary: suboptimal register allocation for abs(__int128_t)
    Classification: Unclassified
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: enhancement
    Priority: P3
    Component: rtl-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

This function:

    __int128_t abs128(__int128_t a)
    {
        return (a >= 0) ? a : -a;
    }

Currently generates the following code (with -O3):
(linux x86_64, g++-4.7.0, SVN revision 178692)

    49 89 f9       mov    %rdi,%r9
    48 89 f7       mov    %rsi,%rdi
    49 89 f2       mov    %rsi,%r10
    48 c1 ff 3f    sar    $0x3f,%rdi
    48 89 f8       mov    %rdi,%rax
    48 89 fa       mov    %rdi,%rdx
    4c 31 c8       xor    %r9,%rax
    4c 31 d2       xor    %r10,%rdx
    48 29 f8       sub    %rdi,%rax
    48 19 fa       sbb    %rdi,%rdx
    c3             retq

But the following has two fewer 'mov' instructions:

    48 89 f8       mov    %rdi,%rax
    48 89 f2       mov    %rsi,%rdx
    48 89 d1       mov    %rdx,%rcx
    48 c1 f9 3f    sar    $0x3f,%rcx
    48 31 c8       xor    %rcx,%rax
    48 31 ca       xor    %rcx,%rdx
    48 29 c8       sub    %rcx,%rax
    48 19 ca       sbb    %rcx,%rdx
    c3             retq
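Both listings compute the absolute value without a branch: the sign is smeared into a mask with 'sar', the value is XORed with the mask, and the mask is subtracted. A minimal C++ sketch of that idea follows. The name abs128_branchless is hypothetical and not part of the report, and the sketch assumes GCC's behavior of using an arithmetic right shift for signed values.

```cpp
// Branchless absolute value for GCC's __int128_t extension type.
// Hypothetical helper, written to mirror the sar/xor/sub/sbb sequence above.
__int128_t abs128_branchless(__int128_t a)
{
    // Assumption: '>>' on a negative signed value is an arithmetic shift
    // (true for GCC). mask is 0 for a >= 0 and all ones for a < 0.
    const __int128_t mask = a >> 127;

    // Do the arithmetic in unsigned to avoid signed overflow.
    const __uint128_t u = static_cast<__uint128_t>(a);
    const __uint128_t m = static_cast<__uint128_t>(mask);
    return static_cast<__int128_t>((u ^ m) - m);
}

// Reference version, as given in the report.
__int128_t abs128(__int128_t a)
{
    return (a >= 0) ? a : -a;
}
```

For any input other than the most negative value, both functions should return the same result; the report is only about which registers the compiler picks for the intermediate values.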
[Bug tree-optimization/49552] missed optimization: test for zero remainder after division by a constant.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49552

--- Comment #2 from Wouter Vermaelen wouter.vermaelen at scarlet dot be 2011-06-28 12:22:11 UTC ---

> Confirmed. Is this possible for all constant moduli?

It is. I recommend you get a copy of the book I mentioned before. The theory behind the transformation is much better explained there than I could ever do here. But I'll try to give a recipe to construct the routine for a specific constant:
(all examples are for 32-bit, but it should be easy enough to generalize)

There are 3 different cases for the test (x % C) == 0:

* 'x' is unsigned, 'C' is odd:

    return (x * Cinv) <= (0xffffffff / C);

  Where Cinv is the multiplicative inverse of C (C * Cinv = 1 (modulo pow(2, 32))). Cinv is the same 'magic number' as is used to optimize exact-division (division where it's known that the remainder will be zero).

* 'x' is unsigned, 'C' is even:

  Split 'C' in an odd factor and a power of two.

    C = Codd * Ceven    where Ceven = pow(2, k)

  Now we test that 'x' is divisible by both 'Codd' and 'Ceven'.

    return !(x & (Ceven - 1)) && ((x * Codd_inv) <= (0xffffffff / Codd));

  When a rotate-right instruction is available, the expression above can be rewritten so that it only requires one test:

    return rotateRight(x * Codd_inv, k) <= (0xffffffff / C); // unsigned comparison

* 'x' is signed, (C can be odd or even)

  (I admit, I don't fully understand the theory behind this transformation, so I'll only give the final result).

    constexpr unsigned A = (0x7fffffff / Codd) & -(1 << k);
    constexpr unsigned B = k ? (A >> (k - 1)) : (A << 1);
    return rotateRight((x * Codd_inv) + A, k) <= B; // unsigned comparison
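The unsigned part of this recipe can be sketched as a small C++14 template. Everything named here (inverse_mod_2_32, rotate_right, is_multiple_of) is hypothetical and only illustrates the recipe; it is not what gcc does or would do internally. The inverse is computed with Newton's iteration, which is one common way to obtain the 'magic number'.

```cpp
#include <cstdint>

// Multiplicative inverse of an odd 32-bit value modulo 2^32.
// Newton's iteration doubles the number of correct low bits per step.
constexpr std::uint32_t inverse_mod_2_32(std::uint32_t odd)
{
    std::uint32_t inv = odd;        // odd * odd == 1 (mod 8): 3 bits correct
    for (int i = 0; i < 4; ++i)     // 3 -> 6 -> 12 -> 24 -> 48 bits
        inv *= 2u - odd * inv;
    return inv;
}

// Rotate a 32-bit value right by k bits (0 <= k < 32).
constexpr std::uint32_t rotate_right(std::uint32_t v, unsigned k)
{
    return (k == 0) ? v : ((v >> k) | (v << (32 - k)));
}

// Test (x % C) == 0 for unsigned x and a non-zero constant C,
// using a multiplication, a rotate and one unsigned comparison.
template <std::uint32_t C>
constexpr bool is_multiple_of(std::uint32_t x)
{
    static_assert(C != 0, "C must be non-zero");

    // Split C = Codd * pow(2, k).
    std::uint32_t codd = C;
    unsigned k = 0;
    while ((codd & 1u) == 0) {
        codd >>= 1;
        ++k;
    }

    const std::uint32_t codd_inv = inverse_mod_2_32(codd);
    return rotate_right(x * codd_inv, k) <= (0xffffffffu / C);
}
```

As a usage example, is_multiple_of<28>(56) is expected to be true and is_multiple_of<28>(14) false; for odd C the loop leaves k at 0 and the rotate is a no-op, which gives the first case of the recipe.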
[Bug tree-optimization/49552] New: missed optimization: test for zero remainder after division by a constant.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49552

    Summary: missed optimization: test for zero remainder after division by a constant.
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: enhancement
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

Just like there are tricks to transform a division by a constant into a multiplication and some shifts, there are also tricks to test if the remainder of some division by a constant will be equal to zero. Some examples:

    bool is_mod3_zero(unsigned x)
    {
        // equivalent to: return (x % 3) == 0;
        return (x * 0xaaaaaaab) <= (0xffffffff / 3);
    }

    bool is_mod28_zero(unsigned x)
    {
        // equivalent to: return (x % 28) == 0;
        // return !(x & 3) && ((x * 0xb6db6db7) <= (0xffffffff / 7));
        return rotateRight(x * 0xb6db6db7, 2) <= (0xffffffff / 28);
    }

    bool is_signed_mod28_zero(int x)
    {
        // equivalent to: return (x % 28) == 0;
        const unsigned c = (0x7fffffff / 7) & -(1 << 2);
        unsigned q = rotateRight((x * 0xb6db6db7) + c, 2);
        return q <= (c >> (2 - 1));
    }

I found this trick in the book Hacker's Delight, chapter 10-16 Test for Zero Remainder after Division by a Constant. The book also explains the theory behind this transformation.

It would be nice if gcc could automatically perform this optimization.

Bonus:

    bool is_mod3_one(unsigned x)
    {
        // equivalent to: return (x % 3) == 1;
        // only correct if 'x + 2' does not overflow
        // (sometimes this can be derived from VRP)
        return ((x + (3 - 1)) * 0xaaaaaaab) <= (0xffffffff / 3);
    }
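The report uses rotateRight() without defining it. The following self-contained sketch supplies a hypothetical definition and restates the signed and the bonus examples with fixed-width types, so that they can be compared against the plain '%' operator. The constants assume 32-bit arithmetic, as in the report.

```cpp
#include <cstdint>

// Hypothetical helper: rotate a 32-bit value right by k bits (0 <= k < 32).
inline std::uint32_t rotateRight(std::uint32_t v, unsigned k)
{
    return (k == 0) ? v : ((v >> k) | (v << (32 - k)));
}

// Signed test, intended to be equivalent to: return (x % 28) == 0;
inline bool is_signed_mod28_zero(std::int32_t x)
{
    // 0xb6db6db7 is the inverse of 7 modulo 2^32, and 28 == 7 * (1 << 2).
    const std::uint32_t c = (0x7fffffffu / 7) & -(1u << 2);
    const std::uint32_t q =
        rotateRight(static_cast<std::uint32_t>(x) * 0xb6db6db7u + c, 2);
    return q <= (c >> (2 - 1));
}

// Bonus test, intended to be equivalent to: return (x % 3) == 1;
// Only valid while 'x + 2' does not wrap around.
inline bool is_mod3_one(std::uint32_t x)
{
    return ((x + (3 - 1)) * 0xaaaaaaabu) <= (0xffffffffu / 3);
}
```

A brute-force loop over a range of inputs that compares these functions with (x % 28) == 0 and (x % 3) == 1 is an easy way to gain confidence in the constants before relying on them.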
[Bug lto/49237] New: error with -flto: 'f' causes a section type conflict
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49237

    Summary: error with -flto: 'f' causes a section type conflict
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: normal
    Priority: P3
    Component: lto
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

> cat bug.cc
struct Bar;
struct Base1 {
    virtual ~Base1();
};
template<typename T> struct Base2 {
    virtual void f(T) = 0;
};
template<typename> struct Foo : Base1, Base2<Bar> {
    virtual void f(Bar) {}
};
template struct Foo<Bar>;

> g++-snapshot --version
g++-snapshot (GCC) 4.7.0 20110530 (experimental)

> g++-snapshot bug.cc -c -flto
> g++-snapshot bug.o -flto
In file included from bug.cc:8:0,
                 from :14:
bug.cc: In member function ‘f’:
bug.cc:9:15: error: f causes a section type conflict

Without the '-flto' option it works as expected.

This is on linux x86_64 (though I don't think this matters).
[Bug tree-optimization/49203] New: missed-optimization: useless expressions not moved out of loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49203

    Summary: missed-optimization: useless expressions not moved out of loop
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: enhancement
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

Hi all,

Below is (a simplified version of) some real code I recently encountered. The stores to the 'output' array are written in the inner loop, but the intention was probably to have them in the outer loop. Gcc is able to 'correct' this programming mistake, but only partly: the stores themselves are moved to the outer loop, but the instructions that calculate those values remain in the inner loop.

For this particular example, the best solution is of course to fix the C code. But maybe this missed-optimization can also occur in other, more valid, contexts.

Below I've included the generated x86_64 code for this example by recent versions of both gcc and llvm.

///

unsigned char input[100];
unsigned char output[100];

void f() {
    for (int i = 0; i < 32; i += 4) {
        unsigned tmp = 0;
        for (int j = 0; j < 16; ++j) {
            tmp = (tmp << 2) | (input[i + j] & 0x03);
            output[i + 0] = (tmp >> 24) & 0xFF;
            output[i + 1] = (tmp >> 16) & 0xFF;
            output[i + 2] = (tmp >>  8) & 0xFF;
            output[i + 3] = (tmp >>  0) & 0xFF;
        }
    }
}

///

g++ (GCC) 4.7.0 20110527 (experimental)
g++ -O2 -S

        movl    $output, %r10d
        movq    %r10, %r9
        .p2align 4,,10
        .p2align 3
.L2:
        movl    %r9d, %esi
        xorl    %edx, %edx
        xorl    %eax, %eax
        subl    %r10d, %esi
        .p2align 4,,10
        .p2align 3
.L3:
        leal    0(,%rax,4), %ecx
        leal    (%rdx,%rsi), %eax
        addl    $1, %edx
        cltq
        movzbl  input(%rax), %eax
        andl    $3, %eax
        orl     %ecx, %eax
        movl    %eax, %r8d
        movl    %eax, %edi
        movl    %eax, %ecx
        shrl    $24, %r8d
        shrl    $16, %edi
        shrl    $8, %ecx
        cmpl    $16, %edx
        jne     .L3
        movb    %r8b, (%r9)
        movb    %dil, 1(%r9)
        movb    %cl, 2(%r9)
        movb    %al, 3(%r9)
        addq    $4, %r9
        cmpq    $output+32, %r9
        jne     .L2
        rep ret

///

clang version 3.0 (http://llvm.org/git/clang.git 855f41963e545172a935d07b4713d079e258a207)
clang++ -O2 -S

# BB#0:                                 # %entry
        xorl    %eax, %eax
        .align  16, 0x90
.LBB0_1:                                # %for.cond4.preheader
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_2 Depth 2
        xorl    %esi, %esi
        movq    $-16, %rdx
        .align  16, 0x90
.LBB0_2:                                # %for.body7
                                        #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        movl    %esi, %ecx
        movzbl  input+16(%rdx,%rax), %edi
        andl    $3, %edi
        leal    (,%rcx,4), %esi
        orl     %edi, %esi
        incq    %rdx
        jne     .LBB0_2
# BB#3:                                 # %for.inc44
                                        #   in Loop: Header=BB0_1 Depth=1
        movb    %sil, output+3(%rax)
        movl    %ecx, %edx
        shrl    $6, %edx
        movb    %dl, output+2(%rax)
        movl    %ecx, %edx
        shrl    $14, %edx
        movb    %dl, output+1(%rax)
        shrl    $22, %ecx
        movb    %cl, output(%rax)
        addq    $4, %rax
        cmpq    $32, %rax
        jne     .LBB0_1
# BB#4:                                 # %for.end47
        ret
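For reference, the source-level fix mentioned in the report could look like the sketch below. The function name f_fixed is hypothetical. Only the final value of 'tmp' is observable after the inner loop, so the four stores are written once per outer iteration.

```cpp
unsigned char input[100];
unsigned char output[100];

// Variant of f() with the stores moved to the outer loop by hand.
void f_fixed()
{
    for (int i = 0; i < 32; i += 4) {
        unsigned tmp = 0;
        // The inner loop only accumulates two bits per input byte.
        for (int j = 0; j < 16; ++j) {
            tmp = (tmp << 2) | (input[i + j] & 0x03);
        }
        // Split the 32-bit result into bytes, most significant first.
        output[i + 0] = (tmp >> 24) & 0xFF;
        output[i + 1] = (tmp >> 16) & 0xFF;
        output[i + 2] = (tmp >>  8) & 0xFF;
        output[i + 3] = (tmp >>  0) & 0xFF;
    }
}
```

With the stores written this way, the shifts that extract the four bytes are outside the inner loop in the source itself, so the result no longer depends on the compiler sinking them.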
[Bug tree-optimization/48764] New: wrong-code bug in gcc-4.5.x, related to __restrict
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48764

    Summary: wrong-code bug in gcc-4.5.x, related to __restrict
    Product: gcc
    Version: 4.5.3
    Status: UNCONFIRMED
    Severity: normal
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

I had originally posted this on gcc-help because I wasn't sure whether it was an actual compiler bug or undefined behavior. Ian Lance Taylor replied that he didn't see any undefined behavior. So I'm reporting it now as a bug. Here's the original message:

    http://gcc.gnu.org/ml/gcc-help/2011-04/msg00476.html

But I'll repeat it below:

Hi all,

I believe I found a wrong-code bug. The problem triggers when using gcc-4.5.1, 4.5.2 or 4.5.3, but not when using 4.4.5 or 4.7.0 (snapshot 20110419). It also only triggers with certain optimization levels/flags. I wonder whether this is a known problem that is already fixed in 4.7.0, or whether the problem still exists but for some reason doesn't trigger in 4.7.0 (I couldn't easily find something in bugzilla).

Below is a reduced test-case that shows the problem. I tried, but I couldn't get it smaller than these 4 files (combined about 60 lines).

While reducing this problem I realized that it *might* not be a compiler bug, but undefined behavior in the usage of __restrict in Buffer::read(). What I wanted to express there is that the memory write done by memcpy() can never overwrite the member variable 'p'. At the moment I still believe it's a compiler bug, but I'm not 100% sure anymore.

So is this a compiler bug or undefined behavior in my program? In case of the latter I would appreciate it if someone could explain what the problem is and maybe suggest a way to fix it.

Thanks.
Wouter

BTW: The code for gcc-4.7.0 is correct but contains some useless extra instructions (which I tried to avoid with __restrict). I'd also appreciate hints on how to improve the generated code.
I do realize that the code in this reduced test-case may look a bit silly and that suggestions to optimize the code may be hard because of this.

/// FooBar.hh ///

struct Loader;
struct FooBar
{
    void load(Loader& l);
    char c1, c2;
};

/// Loader.hh ///

#include <cstring>

struct Buffer
{
    Buffer(const char* data) : p(data) {}
    void read(void* __restrict out) __restrict
    {
        memcpy(out, p, 1);
        ++p;
    }
    const char* p;
};

template<typename Derived> struct Base
{
    void load2(char& t)
    {
        Derived& self = static_cast<Derived&>(*this);
        self.load1(t);
    }
    int dummy;
};

struct Loader : Base<Loader>
{
    Loader(const char* data) : buffer(data) {}
    void load1(char& t) { buffer.read(&t); }
    Buffer buffer;
};

/// FooBar.cc ///

#include "FooBar.hh"
#include "Loader.hh"
#include <cstdio>

void FooBar::load(Loader& l)
{
    l.load1(c1);
    //printf("This print hides the bug\n");
    l.load2(c2);
}

/// main.cc ///

#include "FooBar.hh"
#include "Loader.hh"
#include <cstdio>

int main()
{
    char data[2] = { 3, 5 };
    Loader loader(data);
    FooBar fb;
    fb.load(loader);
    if ((fb.c1 == 3) && (fb.c2 == 5)) {
        printf("Ok\n");
    } else {
        printf("Wrong!\n");
    }
}

> g++ --version
g++ (GCC) 4.5.3 20110423 (prerelease)

> uname -a
Linux argon 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64 GNU/Linux

> g++ -O3 FooBar.cc -c
> g++ -O3 main.cc -c
> g++ -o bug FooBar.o main.o
> ./bug
Wrong!

> objdump -d FooBar.o    (gcc-4.5.3 prerelease)
    mov    0x8(%rsi),%rdx
    lea    0x8(%rsi),%rax
    movzbl (%rdx),%edx
    mov    %dl,(%rdi)
    mov    0x8(%rsi),%rdx    <-- WRONG: still uses original value of Buffer::p
    addq   $0x1,(%rax)       <-- it is only increased here (for the 1st time)
    movzbl (%rdx),%edx
    mov    %dl,0x1(%rdi)
    addq   $0x1,(%rax)
    retq

> objdump -d FooBar.o    (gcc-4.7.0 20110419)
    mov    0x8(%rsi),%rax
    movzbl (%rax),%edx
    mov    %dl,(%rdi)
    lea    0x1(%rax),%rdx    <-- correct, but I know this is not
    mov    %rdx,0x8(%rsi)    <-- required for my application
    movzbl 0x1(%rax),%edx
    add    $0x2,%rax
    mov    %dl,0x1(%rdi)
    mov    %rax,0x8(%rsi)
    retq
[Bug lto/48354] New: internal compiler error: in splice_child_die, at dwarf2out.c:8064
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48354

    Summary: internal compiler error: in splice_child_die, at dwarf2out.c:8064
    Product: gcc
    Version: 4.7.0
    Status: UNCONFIRMED
    Severity: normal
    Priority: P3
    Component: lto
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

I got this ICE when trying to compile the openMSX package using -flto. I managed to reduce it to this:

> cat bug.ii
template<typename T> struct Identity {
    typedef T type;
};
struct S {
    typedef void (S::*FP)();
    FP fp;
};
void g();
void f() {
    typedef Identity<S>::type Dummy;
    S s;
    g();
}

> g++-snapshot -r -nostdlib -g -flto bug.ii
...
bug.ii:11:1: internal compiler error: in splice_child_die, at dwarf2out.c:8064
...

I'm using revision trunk@171714.

This may or may not be a duplicate of bug 46135, though the testcase looks very different.
[Bug tree-optimization/46780] New: -fgraphite-identity ICE in refs_may_alias_p_1, at tree-ssa-alias.c:1081
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46780

    Summary: -fgraphite-identity ICE in refs_may_alias_p_1, at tree-ssa-alias.c:1081
    Product: gcc
    Version: 4.6.0
    Status: UNCONFIRMED
    Severity: normal
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

> cat bug.ii
extern "C" {
    double cos(double x);
}
static int outbuf[8][8];
int main()
{
    double buf1[9];
    double* buf1_1 = &buf1[1];
    for (int i = 0; i < 8; ++i) {
        buf1_1[i] *= cos(i);
    }
    int buf2[64];
    for (int i = 0; i < 8; ++i) {
        int* buf2_i = &buf2[i];
        for (int j = 0; j < 8; ++j) {
            outbuf[i][j] = buf2_i[8 * j];
        }
    }
}

> g++ -O2 -fgraphite-identity bug.ii
bug.ii: In function ‘int main()’:
bug.ii:19:1: internal compiler error: in refs_may_alias_p_1, at tree-ssa-alias.c:1081

I'm using SVN revision tr...@167414 on linux x86_64.
[Bug 45764] (tree-optimization) New: wrong code -O2 vs -O3 (problem in vectorizer???)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45764

    Summary: wrong code -O2 vs -O3 (problem in vectorizer???)
    Product: gcc
    Version: 4.6.0
    Status: UNCONFIRMED
    Severity: major
    Priority: P3
    Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: wouter.vermae...@scarlet.be

> cat bug.cc
int result[64][16];
int main()
{
    double dbuf[1000] = {0.0};
    int ibuf[900];
    double d1 = 0.0;
    double d2 = 0.0;
    for (int i = 0; i < 900; ++i) {
        ibuf[i] = int(d2 - d1);
        d1 += dbuf[i];
        d2 += dbuf[i + 64];
    }
    for (int i = 0; i < 64; ++i) {
        for (int j = 0; j < 8; ++j) {
            result[i][     j] = ibuf[64 - i + 64 * j];
            result[i][15 - j] = ibuf[     i + 64 * j];
        }
    }
}

> g++ -O2 bug.cc
> ./a.out
> g++ -O3 bug.cc
> ./a.out
Segmentation fault (core dumped)

I'm using SVN revision 164570 on linux_x86_64.