[Bug c++/77896] Object vtable lookups are not hoisted out of loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896 --- Comment #5 from Ryan Johnson --- In an ideal world, C++ would disallow such behavior by default, with a function attribute of some kind that flags cases where a type change might occur (kind of like how C++11 assumes `noexcept` for destructors unless you specify otherwise). Not only would that allow better optimizations, it would also be safer, because the compiler could then detect and forbid (or at least warn about) problematic usage of such a class (like stack-allocating it, or calling a type-change function when cast as the type that's about to change).
[Bug c++/77896] Object vtable lookups are not hoisted out of loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896
--- Comment #4 from Ryan Johnson ---

Yikes. That explains it, all right. I never would have thought of an object destroying itself and changing its type with placement new... I guess it must be subject to the same restrictions as `delete this` [1], because things don't turn out well if the compiler thinks it knows the type of the object:

=== alter.cpp ===
#include <cstdio>
#include <new>
struct AlterEgo {
    virtual ~AlterEgo() { }
    virtual void toggle()=0;
};
struct Jekyl : AlterEgo {
    ~Jekyl() { puts("~Jekyl"); }
    void toggle();
};
struct Hyde : AlterEgo {
    ~Hyde() { puts("~Hyde"); }
    void toggle();
};
void Jekyl::toggle() {
    this->~AlterEgo();
    new (this) Hyde;
}
void Hyde::toggle() {
    this->~AlterEgo();
    new (this) Jekyl;
}
void whatami(AlterEgo* x) {
    printf("Jekyl? %p\n", dynamic_cast<Jekyl*>(x));
    x->toggle();
    printf("Jekyl? %p\n", dynamic_cast<Jekyl*>(x));
}
int main() {
    puts("\nWorks ok-ish:");
    Jekyl* x = new Jekyl;
    whatami(x);
    puts("\nJekyl?");
    delete x;
    puts("\nBad idea:");
    Jekyl j;
    j.toggle();
    j.toggle();
    whatami(&j);
    puts("\nJekyl?");
}
===

$ g++ -Wall alter.cpp && ./a.out

Works ok-ish:
Jekyl? 0x6000104c0
~Jekyl
Jekyl? 0x0

Jekyl?
~Hyde

Bad idea:
~Jekyl
~Hyde
Jekyl? 0x0
~Hyde
Jekyl? 0xcbf0

Jekyl?
~Jekyl

[1] https://isocpp.org/wiki/faq/freestore-mgmt#delete-this
[Bug c++/77896] Object vtable lookups are not hoisted out of loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896 --- Comment #1 from Ryan Johnson --- It appears that multiple calls to different virtual functions of the same object are not optimized, either (each performs the same load-load-jump operation).
[Bug c++/77896] New: Object vtable lookups are not hoisted out of loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896

Bug ID: 77896
Summary: Object vtable lookups are not hoisted out of loops
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com
Target Milestone: ---

C++ virtual function calls normally require two memory loads followed by an indirect jump: one load fetches the vtable from the object, another fetches the function address from the vtable, and the indirect call invokes the function.

Given that an object's vtable is fixed over its lifetime, and the contents of a given vtable are compile-time constant, I would expect the vtable lookups to be hoisted out of loops when appropriate. For example:

=== foo.cpp ===
struct Foo {
    virtual void frob(int i)=0;
};
void frobN(Foo* f, int n) {
    for (int i=0; i < n; i++)
        f->frob(i);
}
===

Compiles at -O2 to substantially the same x86 assembly code for gcc-4.9, gcc-5.2 and gcc-6.2:

_Z5frobNP3Fooi:
        testl   %esi, %esi
        jle     .L10
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
.L5:
        movq    0(%rbp), %rax
        movl    %ebx, %esi
        addl    $1, %ebx
        movq    %rbp, %rdi
        call    *(%rax)
        cmpl    %ebx, %r12d
        jne     .L5
        popq    %rbx
        popq    %rbp
        popq    %r12
.L10:
        rep ret

I would have expected to see something more like this (obtained using the bound member function extension):

_Z5frobNP3Fooi:
.LFB12:
        pushq   %r13
        pushq   %r12
        pushq   %rbp
        pushq   %rbx
        subq    $8, %rsp
        movq    (%rdi), %rax
        testl   %esi, %esi
        movq    (%rax), %r13
        jle     .L1
        movq    %rdi, %r12
        movl    %esi, %ebp
        xorl    %ebx, %ebx
.L5:
        movl    %ebx, %esi
        addl    $1, %ebx
        movq    %r12, %rdi
        call    *%r13
        cmpl    %ebx, %ebp
        jne     .L5
.L1:
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        popq    %r12
        popq    %r13
        ret

Altering the test case to trigger speculative devirtualization as follows:

=== bug2.cpp ===
#include <cstdio>
struct Foo {
    virtual void frob(int i)=0;
};
void frobN(Foo* f, int n) {
    for (int i=0; i < n; i++)
        f->frob(i);
}
struct Bar : Foo {
    void frob(int i) { printf("Bar:%d\n", i); }
};
int main() {
    Bar b;
    frobN(&b, 10);
}
===

shows that even the speculative devirtualization is stuck inside the loop body:

_Z5frobNP3Fooi:
        testl   %esi, %esi
        jle     .L13
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
        jmp     .L8
.L16:
        xorl    %eax, %eax
        movl    $.LC0, %edi
        addl    $1, %ebx
        call    printf
        cmpl    %ebx, %r12d
        je      .L15
.L8:
        movq    0(%rbp), %rax
        movl    %ebx, %esi
        movq    (%rax), %rax
        cmpq    $_ZN3Bar4frobEi, %rax
        je      .L16
        addl    $1, %ebx
        movq    %rbp, %rdi
        call    *%rax
        cmpl    %ebx, %r12d
        jne     .L8
.L15:
        popq    %rbx
        popq    %rbp
        popq    %r12
.L13:
        rep ret

If the vtable lookup could be hoisted, the speculative de-virt could become very powerful by replicating the loop, something like this:

_Z5frobNP3Fooi:
        testl   %esi, %esi
        jle     .L10
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
        movq    0(%rbp), %rax
        pushq   %r12
        movq    (%rax), %r13
        cmpq    $_ZN3Bar4frobEi, %r13
        je      .L16
.L5:
        movl    %ebx, %esi
        addl    $1, %ebx
        movq    %rbp, %rdi
        call    *%r13
        cmpl    %ebx, %r12d
        jne     .L5
        jmp     .L10
.L16:
        xorl    %eax, %eax
        movl    $.LC0, %edi
        movl    %ebx, %esi
        addl    $1, %ebx
        call    printf
        cmpl    %ebx, %r12d
        jne     .L16
        popq    %r13
.L10:
        popq    %rbx
        popq    %rbp
        popq    %r12
        rep ret
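The loop replication requested in this report can be approximated by hand at the source level. The following is only a rough sketch of that idea, not what GCC does: it tests the dynamic type once, before the loop, and then runs either a loop of direct calls or the ordinary virtual-call loop. The types `Bar` (here marked `final`) and `Other`, the `sum`/`calls` members, and the names `frobN_versioned` and `demo` are assumptions made for illustration. It also silently assumes the object does not change its dynamic type while the loop runs, which is exactly the guarantee the compiler lacks.

```cpp
#include <cassert>

struct Foo {
    virtual ~Foo() {}
    virtual void frob(int i) = 0;
};

// Hypothetical concrete types, used only for this illustration.
struct Bar final : Foo {
    long sum = 0;
    void frob(int i) override { sum += i; }
};

struct Other : Foo {
    int calls = 0;
    void frob(int) override { ++calls; }
};

// Hand-written loop versioning: the type test happens once, outside the
// loop, instead of once per iteration.
void frobN_versioned(Foo* f, int n) {
    if (Bar* b = dynamic_cast<Bar*>(f)) {
        // Fast path: qualified call, so no vtable lookup is needed and the
        // callee can be inlined.
        for (int i = 0; i < n; i++)
            b->Bar::frob(i);
    } else {
        // Fallback: ordinary virtual dispatch on every iteration.
        for (int i = 0; i < n; i++)
            f->frob(i);
    }
}

// Usage: both paths must call frob exactly n times.
inline void demo() {
    Bar b;
    frobN_versioned(&b, 10);
    assert(b.sum == 45);      // 0 + 1 + ... + 9

    Other o;
    frobN_versioned(&o, 7);
    assert(o.calls == 7);
}
```

The fast path is only valid because the source asserts something the language does not: that `frob` never replaces `*f` with an object of another type.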
[Bug c++/68859] New: Add a less strict/smarter version of -Wreorder
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68859

Bug ID: 68859
Summary: Add a less strict/smarter version of -Wreorder
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com
Target Milestone: ---

I am working with a large legacy code base that triggers a huge number of warnings when compiled with -Wreorder (or -Wall, which enables it). I am not making any excuses for that code, but still it would be nice to have a weaker-but-smarter variant of -Wreorder that only triggers when the initialization order actually matters.

For example, in the .cpp below, the warning triggered by struct `definitely_bad` is helpful and identifies a real bug. The warning for struct `not_a_problem`, on the other hand, is significantly less interesting because each member is initialized completely independently of the others (***). The middle example is evil because it calls a member function of a partially constructed object, so I don't think it much matters whether this smarter warning would trip or not in that case.

=== example.cpp ===
struct definitely_bad {
    int val;
    int *ptr;
    definitely_bad(int *p) : ptr(p), val(*ptr) { }
};
struct bad_for_a_different_reason {
    int val;
    int *ptr;
    bad_for_a_different_reason(int *p) : ptr(p), val(do_something()) { }
    int do_something();
};
struct not_a_problem {
    int val;
    int *ptr;
    not_a_problem(int* p, int v) : ptr(p), val(v) { }
};
===

Basically, I could imagine building a dependency graph that tracks which member initializers depend on other members, and then trigger the warning only if the true initialization order is not a valid partial order in that graph.

(***) I realize that members could have constructors with global side effects (e.g. calls to printf or changes to global variables). In that case changing the initialization order would still be observable, but this seems like a rare enough case that the proposed warning could ignore it (leaving the existing -Wreorder to flag it if the user desires).
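The dependency check proposed above can be sketched outside the compiler. This is a toy model, not GCC code: `MemberInit`, `order_sensitive_members` and `demo` are hypothetical names, and the `reads` lists are supplied by hand, whereas a real implementation would have to extract them from the initializer trees. Members are listed in declaration order, which is the order in which they are actually initialized; a member is flagged when its initializer reads a member that has not been initialized yet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one data member and the other members that its
// initializer reads. Constructor parameters are deliberately not listed.
struct MemberInit {
    std::string name;
    std::vector<std::string> reads;
};

// `members` is in declaration order (the real initialization order).
// Returns the names of members whose initializer reads a member that is
// declared at the same position or later, i.e. not yet initialized.
std::vector<std::string>
order_sensitive_members(const std::vector<MemberInit>& members) {
    std::vector<std::string> flagged;
    for (std::size_t i = 0; i < members.size(); ++i) {
        bool bad = false;
        for (const std::string& dep : members[i].reads) {
            bool initialized_earlier = false;
            for (std::size_t j = 0; j < i; ++j)
                if (members[j].name == dep)
                    initialized_earlier = true;
            if (!initialized_earlier)
                bad = true;
        }
        if (bad)
            flagged.push_back(members[i].name);
    }
    return flagged;
}

// Usage: the first and last structs from example.cpp.
inline void demo() {
    // definitely_bad: val(*ptr) reads ptr, which is declared after val.
    std::vector<MemberInit> definitely_bad = {{"val", {"ptr"}}, {"ptr", {}}};
    std::vector<std::string> r = order_sensitive_members(definitely_bad);
    assert(r.size() == 1 && r[0] == "val");

    // not_a_problem: neither initializer reads another member.
    std::vector<MemberInit> not_a_problem = {{"val", {}}, {"ptr", {}}};
    assert(order_sensitive_members(not_a_problem).empty());
}
```

The `bad_for_a_different_reason` case is outside this model, since a member function call could read any member.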
[Bug c++/68859] Add a less strict/smarter version of -Wreorder
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68859 --- Comment #1 from Ryan Johnson --- (I would be happy to do some legwork on this if somebody is willing to send a few pointers by PM. I know the code in gcc/cp/init.c, particularly functions `perform_member_init` and `sort_mem_initializers` are relevant, but would need some help figuring out how to traverse trees and pick out uses of member variables from other members' initializers)
[Bug c++/67866] New: False positive -Wshift-count-overflow on template code that checks for shift count overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67866

Bug ID: 67866
Summary: False positive -Wshift-count-overflow on template code that checks for shift count overflow
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com
Target Milestone: ---

The following code snippet evokes an obviously unhelpful warning:

=== bug.cpp ===
template <int M>
unsigned long int m()
{
    unsigned long int max_value = 1;
    if (M < 64)
        max_value = (max_value << M) - 1;
    else
        max_value = ~(max_value - 1);
    return max_value;
}

int main()
{
    m<64>();
}
===

Output is:

g++-5.2 -Wshift-count-overflow bug.cpp -O3
bug.cpp: In instantiation of 'long unsigned int m() [with int M = 64]':
bug.cpp:14:11: required from here
bug.cpp:6:32: warning: left shift count >= width of type [-Wshift-count-overflow]
        max_value = (max_value << M) - 1;
                               ^
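One possible way to sidestep the warning, offered only as a sketch and not as a fix for the underlying diagnostic, is to clamp the shift count so that no instantiation ever contains a shift by the full width of the type. The name `low_bits_mask` and the use of `std::uint64_t` instead of `unsigned long` are assumptions made for illustration.

```cpp
#include <cstdint>

// Returns a value with the low M bits set, for M in [0, 64].
// The inner conditional keeps the shift count below 64 in every
// instantiation, including M == 64, where the shifted value is discarded.
template <int M>
constexpr std::uint64_t low_bits_mask() {
    static_assert(M >= 0 && M <= 64, "M must be in [0, 64]");
    return M < 64 ? (std::uint64_t{1} << (M < 64 ? M : 0)) - 1
                  : ~std::uint64_t{0};
}
```

With this shape, `low_bits_mask<64>()` never contains the expression `1 << 64`, so the compiler should have nothing to warn about, at the cost of a slightly redundant-looking condition.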
[Bug c++/61991] Destructors not always called for statically initialized thread_local objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61991
--- Comment #1 from Ryan Johnson <scovich at gmail dot com> ---

C++14 (N3652 [1]) specifically alters the Standard to state that a thread_local object with static or constexpr initialization may have a non-trivial destructor (implying that such a destructor should actually run):

    Variables with static storage duration (3.7.1) or thread storage duration (3.7.2) shall be zero-initialized (8.5) before any other initialization takes place. A constant initializer for an object o is an expression that is a constant expression, except that it may also invoke constexpr constructors for o and its subobjects even if those objects are of non-literal class types [ Note: such a class may have a non-trivial destructor ].

[1] https://isocpp.org/files/papers/N3652.html
[Bug c++/65656] __builtin_constant_p should always be constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65656
--- Comment #3 from Ryan Johnson <scovich at gmail dot com> ---

(In reply to Jason Merrill from comment #2)
> Author: jason
> Date: Tue Apr 28 14:43:59 2015
> New Revision: 222531
>
> URL: https://gcc.gnu.org/viewcvs?rev=222531&root=gcc&view=rev
> Log:
> PR c++/65656
> * constexpr.c (cxx_eval_builtin_function_call): Fix __builtin_constant_p.
>
> Added:
> trunk/gcc/testsuite/g++.dg/cpp0x/constexpr-builtin3.C
> Modified:
> trunk/gcc/cp/ChangeLog
> trunk/gcc/cp/constexpr.c

Any reason this bug should not be closed as 'fixed'?
[Bug c++/65656] New: __builtin_constant_p should be constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65656

Bug ID: 65656
Summary: __builtin_constant_p should be constexpr
Product: gcc
Version: 4.8.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

Consider the following program compiled with `gcc -std=c++11':

=== bug.cpp ===
#include <cstdio>
int main(int argc, char *argv[]) {
   constexpr bool x = __builtin_constant_p(argc);
   std::printf("x=%d\n", x);
}
===

With optimizations disabled, it correctly treats __builtin_constant_p() as constexpr and prints 0 as expected (because the value of argc is not a compile-time constant).

With optimizations enabled (-O1 or higher), compilation fails:

bug.cpp: In function ‘int main(int, char**)’:
bug.cpp:3:48: error: ‘argc’ is not a constant expression
   constexpr bool x = __builtin_constant_p(argc);
                                               ^

Clang 3.4 handles the case just fine.

While I can 100% understand that the return value of __builtin_constant_p() might change depending on what information the optimizer has available, I'm pretty sure __builtin_constant_p() should always return a value computable at compile time.

NOTE: this issue is *NOT* the same as Bug #54021, in spite of the two sharing the same title. The latter is mis-named: it actually requests support for constant folding of ternary expressions involving __builtin_constant_p, even when optimizations are disabled and such folding would not normally occur.
[Bug c++/61991] New: Destructors not always called for statically initialized thread_local objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61991

Bug ID: 61991
Summary: Destructors not always called for statically initialized thread_local objects
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

If a thread_local object is statically initialized---trivial or constexpr constructor---but has a non-trivial destructor, the destructor will only run if some *other* thread_local object needs dynamic initialization *and* if at least one such object is accessed by the thread during its lifetime. Accessing members of the object itself does nothing, because it is statically initialized.

Example:

#include <cstdio>
static thread_local struct X {
    ~X() { printf("bye!\n"); }
} x;
static thread_local int y = printf("initialized y\n");
int main() {
    //printf("%d\n", y);
}

Compiling the above with `g++ -std=gnu++11 bug.cpp' gives an executable that produces no output when run. Uncomment the printf in main() and recompile, and the resulting executable prints:

initialized y
14
bye!

The only hint of trouble at compile time is that the compiler may warn about an unused variable. However, that warning only comes if the offending object is never accessed otherwise (perhaps because it is an exit guard of some type), has static storage class, *and* no other dynamic thread_local storage exists... an unlikely combination.

Looking at the assembly code output, the problem is obvious: X::~X is only registered with __cxa_thread_atexit if __tls_init is called, and the latter is only called if the thread accesses a TLS object that needs dynamic initialization.

Under Cygwin, I hit the further problem that __tls_init doesn't even contain any calls to __cxa_thread_atexit. That's probably a separate bug, though, and I don't know whose problem it might be.

If there's no easy fix, might I suggest that a loud warning somewhere in the docs would be appropriate, so people have a way to know about the limitation? I tried searching for this online, but Google didn't turn anything up.

(Discovered in 4.8.3, still there in 4.9.0, and given the nature of the bug I suspect it's in more recent versions, too.)
[Bug inline-asm/49611] Inline asm should support input/output of flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611
--- Comment #9 from Ryan Johnson <scovich at gmail dot com> ---

(In reply to Andi Kleen from comment #7)
> You can do many of these things these days with asm goto, however it
> typically requires non-structured control flow (goto labels).

I filed this bug after determining that asm goto was unsuitable for this purpose. Goto labels are not a problem per se (actually kind of slick), but asm goto requires all outputs to pass through memory and so is only good for control flow (not computation plus exceptional case). It also requires the actual branching and all attendant glue to happen in assembly. Both limitations increase bulk and hamper the optimizer, and go against (what I thought was) the intention that inline asm normally be used for very small snippets of code the compiler can't handle.

At some point you may as well just setcc and do a new comparison/branch outside the asm block; that is less bug-prone and would probably yield faster and cleaner code, too.
[Bug inline-asm/49611] Inline asm should support input/output of flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611
--- Comment #6 from Ryan Johnson <scovich at gmail dot com> ---

(In reply to Jeremy from comment #5)
> It may not be possible, but perhaps a simpler thing might be for
> the asm() to notionally return a single boolean value which
> reflects ONE flag only.

Interesting! Ironically, limiting it to one flag opens the way to cleanly specify branching based on multiple flags. The optimizer just needs to recognize that when it sees two otherwise-equivalent (non-volatile) asm statements with different asm_return attributes, it's really just one asm statement that sets multiple flags. Thus:

#ifdef USE_ASM
#define CMP(a,b) asm("cmp %0 %1" : : "r"(a), "r"(b))
#define BELOW(a,b) (__attribute__((asm_return(cc_carry))) CMP(a,b))
#define EQUAL(a,b) (__attribute__((asm_return(cc_zero))) CMP(a,b))
#else
#define BELOW(a,b) ((a) < (b))
#define EQUAL(a,b) ((a) == (b))
#endif

int do_it(unsigned int a, unsigned int b, int c, int d, int e, int f) {
    int x;
    if (BELOW(a,b))
        x = c+d;
    else if (EQUAL(a,b))
        x = d+e;
    else
        x = c+e;
    return x+f;
}

would produce the same assembly code output---with only one comparison---whether USE_ASM was defined or not.

Even more fun would be if the optimizer could recognize conditionals that depend on multiple flags (like x86 less or equal) and turn this:

if ((__attribute__((asm_return(cc_zero))) CMP(a,b))
    || (__attribute__((asm_return(cc_overflow))) CMP(a,b))
       != (__attribute__((asm_return(cc_sign))) CMP(a,b)))
    do_less_or_equal();
do_something_else();

into:

    cmp %[a] %[b]
    jg 1f
    call do_less_or_equal
1:
    call do_something_else

Much of the flag-wrangling machinery seems to already exist, because the compiler emits the above asm if you replace the inline asm with either a <= b or a < b || a == b (assuming now that a and b are signed ints).
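For comparison with the `asm_return` idea sketched above (which is a proposal in this thread, not an existing attribute): GCC 6 and later document "flag output operands" for x86 inline asm, written as `"=@cc<cond>"`, and define the macro `__GCC_ASM_FLAG_OUTPUTS__` when they are available. The following is a minimal sketch under that assumption; the function name `below` is hypothetical, and on other targets or older compilers it simply falls back to plain C++.

```cpp
// Returns true when a < b, comparing as unsigned values.
inline bool below(unsigned a, unsigned b) {
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    int carry;
    // AT&T operand order: "cmp %2, %1" computes a - b and sets the flags.
    // The "=@ccb" output reads the "below" condition (carry flag set)
    // straight from the flags register, without a setcc in the asm text.
    __asm__("cmp %2, %1" : "=@ccb"(carry) : "r"(a), "r"(b));
    return carry != 0;
#else
    // Fallback for targets or compilers without flag output operands.
    return a < b;
#endif
}
```

Because the condition is an ordinary output operand, the compiler is free to branch on the flags directly rather than materialize a boolean first.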
[Bug c++/61372] New: Add warning to detect noexcept functions that might throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61372

Bug ID: 61372
Summary: Add warning to detect noexcept functions that might throw
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

The C++11 standard adds the noexcept specification that lets the programmer assert that a function does not throw any exceptions (terminating execution if that assertion ever turns out to be false at runtime).

Unfortunately, there is currently no reliable way for a programmer to validate, at compile time, her assertion that a function does or does not throw. The closest thing is -Wnoexcept, which detects the (very narrow) case where the following all apply to some function F:

1. F lacks the noexcept declaration or has declared noexcept(false)
2. The compiler has determined that F cannot throw
3. F causes some noexcept operator to evaluate to false

Unfortunately, that narrow formulation makes it really hard to validate much of anything (see example and further discussion below).

It would be very helpful to have a warning flag which tells the compiler to report cases where a function's noexcept specification contradicts the compiler's analysis of the function body. Perhaps -Wnoexcept-mismatch={1,2,3}?

1 (high priority): functions declared noexcept(true) but which contain expressions that might throw. This validates stated noexcept assumptions, helping to avoid issues like PR #56166.

2 (medium priority): Also report functions declared noexcept(false) that in fact cannot throw (e.g. cases #1 and #2 for -Wnoexcept). This improves the accuracy of noexcept validation, and also improves performance in general (by eliminating unwind handlers). It also makes it easier to avoid/fix things like PR #52562.

3 (low priority): Also report functions which lack any noexcept declaration but which cannot throw (similar to -Wsuggest-attribute for const, pure, etc.). This also improves accuracy of noexcept, but the programmer would have to decide whether to make the API change (marking the function noexcept) or whether it's important to retain the ability to throw in the future.

Probably none of the above warnings should be enabled by default, but it might make sense to enable -Wnoexcept-mismatch=1 with -Wall and -Wnoexcept-mismatch=2 with -Wextra.

To implement the warning, the compiler would make a pass over each function body (after applying most optimizations, especially inlining and dead code elimination). It would then infer a noexcept value by examining all function calls that remain, and compare that result with the function's actual noexcept specification (or lack thereof). No need for any kind of IPA: if a callee lies about its noexcept status, it's the callee's problem.

=== Workaround using -Wnoexcept ===

One might try to combine static_assert with noexcept, e.g.:

// example.cpp
void might_throw(int);    // lacks noexcept
void also_might_throw();  // lacks noexcept
void never_throw(int a)
    noexcept(noexcept(might_throw(a)) && noexcept(also_might_throw()))
{
    if (a)
        might_throw(a);
    also_might_throw();
}
void foo(int a) noexcept(noexcept(might_throw(a)))
{
    might_throw(a);
}
static_assert(noexcept(foo(0)), "never_throw might throw");

There are two glaring problems with that approach, however:

1. Every expression in the function body must be part of the noexcept clause, effectively replicating the function body in its signature (but without the ability to declare local variables).
   - Maintaining the noexcept chain across code changes would be tedious and error-prone for all but the smallest and most stable functions (i.e. the ones least in need of verification).
   - Operator overloading means you can't even assume basic expressions like a+b are nothrow. To get complete coverage would require either a very careful analysis (error prone) or cracking the entire function body into an AST of atomic expressions (tedious *and* error prone).
   - Macro expansions would add even more headaches, because they may expand to more than one statement and/or include control flow.

2. The static_assert must choose one set of inputs for each function call it passes to operator noexcept.
   - An optimizer (especially after inlining and constant propagation) could conceivably report that the function is noexcept for that particular input, when in fact other inputs exist that could cause an exception to be thrown (this does not seem to be the case currently).
   - There may not be any easy way to come up with a valid input (objects that lack a default constructor, etc.). Using hacks like (*(T*)0) would violate all sorts of compiler/optimizer assumptions and risks breaking the analysis.
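A small compilable sketch of what the noexcept operator can and cannot check, to make the limitations above concrete. The function names and bodies here (`might_throw`, `never_throws`, `wrapper`) are hypothetical stand-ins, not the ones from the report. The point is that the operator consults only the declared exception specifications of the calls in its operand; it never inspects function bodies or argument values.

```cpp
#include <stdexcept>

// Hypothetical functions: one declared potentially-throwing, one noexcept.
inline void might_throw(int a) {
    if (a < 0)
        throw std::invalid_argument("negative input");
}
inline void never_throws(int) noexcept {}

// A conditional noexcept clause has to repeat every call made in the body.
inline void wrapper(int a)
    noexcept(noexcept(might_throw(a)) && noexcept(never_throws(a))) {
    might_throw(a);
    never_throws(a);
}

// The operator reports the declared specification of each callee...
static_assert(!noexcept(might_throw(0)), "declared without noexcept");
static_assert(noexcept(never_throws(0)), "declared noexcept");

// ...so wrapper is potentially-throwing because one of its callees is.
static_assert(!noexcept(wrapper(0)), "one callee is not noexcept");

// Argument values play no role: this call could never throw at run time,
// yet the operator still evaluates to false.
static_assert(!noexcept(might_throw(1)), "the body is not analyzed");
```

In the other direction, marking `might_throw` as noexcept would make every assertion about it flip without any diagnostic about the throw expression in its body, which is the gap the proposed -Wnoexcept-mismatch=1 is meant to cover.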
[Bug c++/14932] [3.4/4.0 Regression] cannot use offsetof to get offsets of array elements in g++ 3.4.0 prerelease
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14932

Ryan Johnson <scovich at gmail dot com> changed:

What    |Removed |Added
CC      |        |scovich at gmail dot com

--- Comment #16 from Ryan Johnson <scovich at gmail dot com> ---

A very similar problem arises with gcc-4.8.2 (and 4.9.0):

#include <stdio.h>
struct foo {
    char data[10];
};
int main() {
    int x = 4;
    printf("%zd\n", offsetof(struct foo, data[x]));
    return 0;
}

gcc-4.8.2 accepts it (with -xc), as does clang-3.0. In both cases, the resulting binary prints 4 as expected.

g++-4.8.2 rejects:

bug.cpp: In function ‘int main()’:
bug.cpp:9:47: error: ‘x’ cannot appear in a constant-expression
    printf("%zd\n", offsetof(struct foo, data[x]));

So again, I don't think this bug is fixed... but I'll happily file a new PR if that's preferred.
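Until the compiler accepts a run-time index inside offsetof, the same value can be computed portably by taking the constant offset of the array itself and adding the scaled index. This is a sketch of that workaround; the helper name `data_offset` is hypothetical.

```cpp
#include <cstddef>

struct foo {
    char data[10];
};

// Offset of foo::data[index] from the start of foo: the constant offset of
// the array plus the index scaled by the element size.
constexpr std::size_t data_offset(std::size_t index) {
    return offsetof(foo, data) + index * sizeof(foo::data[0]);
}
```

Because `offsetof(foo, data)` involves no run-time value, it is a constant expression in both C and C++, and the index arithmetic happens outside the macro.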
[Bug rtl-optimization/10474] shrink wrapping for functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474
--- Comment #18 from Ryan Johnson <scovich at gmail dot com> ---

(In reply to Martin Jambor from comment #17)
> The testcase is now shrink-wrapped on ppc64 and x86_64, it is not on
> others such as i?86 because parameter-passing ABI basically prevents
> it. If any of the three testcases pass also on any other platform
> (e.g. Ramana claimed it also works on AArch32 [1]), feel free to add
> it to the dg target in the testcase(s). For my part, I now consider
> this to be fixed.
>
> [1] http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02726.html

Great! Does this mean shrink-wrapping will be in gcc-4.9, at least for x86_64 and ppc64?
[Bug rtl-optimization/10474] shrink wrapping for functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474
--- Comment #20 from Ryan Johnson <scovich at gmail dot com> ---

Hi Martin,

(PM reply because I don't have up-to-date information to file a proper bug report with)

On 25/11/2013 9:57 AM, jamborm at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474
>
> --- Comment #19 from Martin Jambor <jamborm at gcc dot gnu.org> ---
> (In reply to Ryan Johnson from comment #18)
>> Great! Does this mean shrink-wrapping will be in gcc-4.9, at least for
>> x86_64 and ppc64?
>
> Well, a fairly basic (but not altogether unreasonable) shrink-wrapping
> was in gcc 4.8 (and earlier versions) too and that has not changed at
> all. The problem with this and similar testcases was that the register
> allocator made decisions which made shrink-wrapping impossible (or at
> least too difficult to perform). The change I committed and which will
> be a part of gcc 4.9 fixes this for a class of pseudo-registers which
> commonly result in this problem but other cases will still remain
> unresolved, for example PR 51982.
>
> For some statistics about what impact the implemented technique has,
> see the email accompanying the first submission of the patch:
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01719.html
>
> If you find another similar example which is important and clearly
> possible to shrink-wrap but we don't do it, feel free to submit a new
> missed-optimization bug and CC me.

One that comes to mind right off, but is from several years ago and possibly no longer true: on platforms like solaris/sparc, accesses to thread-local storage require a function call to retrieve the base of thread-local storage; the compiler seems to emit the call once, in the function prologue. I strongly suspect (but can't confirm, since I no longer have access to Solaris/sparc) that such a function-call-in-prologue would confound subsequent efforts at shrink wrapping.

I don't know how often this sort of scenario arises any more, though. It may be that the new emutls stuff has changed everything, because on cygwin and gcc-4.8 I now see separate calls into emutls for every TLS access.

As for PR 51982, it looks like having flow-sensitive local analysis could go a long way: just as it can be useful to know that an escaped pointer has not *yet* escaped (e.g. PR 50346), here it would be useful to know that the stack frame, though perhaps eventually needed, is not needed just yet. Then, generation of the stack frame can be pushed down to the first basic block(s) where the need for a stack frame is undisputed, after any conditions that gate it. But I've been told that teaching gcc to think that way would not be easy...

In any case, thanks for the improvement to a hairy problem.

Regards,
Ryan
[Bug c++/58050] New: RVO fails when calling static function through unnamed temporary
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58050

Bug ID: 58050
Summary: RVO fails when calling static function through unnamed temporary
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

Return value optimization is not applied when calling a static member function via an unnamed temporary (value or pointer, it doesn't matter). Calling the function directly, or through a named value/pointer, works as expected:

// --- bug.cpp ---
extern "C" int puts(char const*);
struct B {
    ~B() { puts("\t~B"); }
};
struct A {
    static B make() { return B(); }
} a;
A *ap() { return &a; }
int main () {
    puts("b1");
    {B b = A::make();}
    puts("b2");
    {B b = a.make();}
    puts("b3");
    {B b = ap()->make();}
    puts("b4");
    {B b = A().make();}
}
// --- end bug.cpp ---

Output is (same for both 4.8.1 and 4.6.3):

$ g++ bug.cpp && ./a.out
b1
        ~B
b2
        ~B
b3
        ~B
        ~B
b4
        ~B
        ~B

The workaround is simple enough to apply, if you happen to notice all the extra object copies being made; I isolated the test case from an app that used 5x more malloc bandwidth than necessary because a single static function called the wrong way returned a largish STL object by value.
[Bug c++/58051] New: No named return value optimization when returned object is implicitly converted
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58051

Bug ID: 58051
Summary: No named return value optimization when returned object is implicitly converted
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

The following test case introduces an extra object copy, even though none should be required:

// --- bug.cpp ---
extern "C" void puts(char const *);
struct A {
    A()=default;
    A(A &&)=default;
    A(A const &) { puts("copy"); }
    ~A() { puts("~A"); }
};
struct B {
    A _a;
    B(A &&a) : _a((A &&)(a)) { }
};
B go() {
    A rval;
    return rval;
}
int main () {
    go();
}
// --- end bug.cpp ---

(when compiled with both `gcc-4.8.1 -std=gnu++11' and `gcc-4.6.3 -std=gnu++0x')

RVO works properly if go() returns A() or std::move(rval).
[Bug c++/58022] New: Compiler rejects abstract class in template class with friend operator
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58022

Bug ID: 58022
Summary: Compiler rejects abstract class in template class with friend operator
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

First, apologies for the vague subject line, I really don't know what to call this bug...

Consider the following test case:

// --- begin bug.cpp ---
#include <iostream>
using namespace std;
template <class T> class foo;
template <class T> ostream &operator<<(ostream &o, const foo<T> &l);
template <class T> class foo {
    friend ostream &operator<< <T> (ostream &o, const foo<T> &l);
};
class bar;
foo<bar> fb;
class bar {
    virtual void baz()=0;
};
// --- end bug.cpp ---

The test case was isolated using multidelta on a large code base that compiles cleanly with gcc-4.7 and earlier. Compiling it with gcc-4.8.1 gives the error: cannot allocate an object of abstract type ‘bar’, and identifying this function in ostream:

template<typename _CharT, typename _Traits>
inline basic_ostream<_CharT, _Traits>&
operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
{ return __ostream_insert(__out, &__c, 1); }

Replacing `using namespace std` with std::ostream everywhere allows it to compile, as does moving the definition of bar above the friend declaration.

I'm not 100% certain the code is valid C++, seeing as how it instantiates a template using an incomplete type, but there are still several issues:

1. The compiler gives no hint whatsoever where the real problem is, leaving the user to infer the context in some other way; it took 2h with multidelta to isolate the above test case and finally see what had happened.

2. The declaration of operator<< (which accepts a const ref) should not interfere with the one in ostream (which accepts a value); without the const ref declaration the compiler (rightfully!) complains that template-id ‘operator<< <bar>’ for ‘std::ostream& operator<<(std::ostream&, const foo<bar>&)’ does not match any template declaration.

3. At no point is bar actually instantiated, passed by value, or its members accessed; even if operator<< did do one of those things, operator<< is never actually called with foo<bar> as an argument, so the template shouldn't be instantiated.

For now, the workaround seems to be ensuring that bar is fully defined before any template class mentions it, but that's not going to be easy given how hard it is to find the problem (and the fact that the foo template is in a utility library and really should be included first under normal circumstances).
[Bug c++/58022] [4.8 Regression] Compiler rejects abstract class in template class with friend operator
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58022
--- Comment #3 from Ryan Johnson <scovich at gmail dot com> ---

(In reply to Paolo Carlini from comment #1)
> Please try to reduce the testcase further, no includes. You have a number of
> options here: http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction

Sorry, I thought ostream was an important part of the bug and did some work to put it back in... Here's the fully reduced case:

// --- begin bug.cpp ---
template<typename _CharT> class basic_ostream;
typedef basic_ostream<char> ostream;
template<typename T> basic_ostream<T>& operator<<(basic_ostream<T>&, T);
template <class T> class foo;
template <class T> ostream& operator<<(ostream&, const foo<T>&);
template <class T> class foo {
    friend ostream& operator<< <T> (ostream&, const foo<T>&);
};
class bar;
foo<bar> fb;
class bar {
    virtual void baz()=0;
};
// --- end bug.cpp ---
[Bug c++/57971] New: Improve copy elision when returning structs by value
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57971

Bug ID: 57971
Summary: Improve copy elision when returning structs by value
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: scovich at gmail dot com

Hi all,

In the testcase below, bar() and baz() perform copy elision as expected, but blah() does not, in spite of its being functionally identical to baz():

#include <cstdio>
struct foo {
    foo() { printf("make\n"); }
    foo(foo const &) { printf("copy\n"); }
    void frob() { printf("frob\n"); }
};
foo bar(bool) { foo f; f.frob(); return f; }
foo baz(bool mknew) { if (mknew) return foo(); return bar(mknew); }
foo blah(bool mknew) { if (mknew) return foo(); foo f = bar(mknew); return f; }
int main() {
    printf("*** bar ***\n"); bar(false);
    printf("*** baz ***\n"); baz(false);
    printf("*** blah ***\n"); blah(false);
}

Output is:

$ g++ -Wall bug.cpp && ./a.out
*** bar ***
make
frob
*** baz ***
make
frob
*** blah ***
make
frob
copy

I assume that bar() and baz() exploit the named and unnamed return value optimizations, respectively, but blah() is missed because it needs both optimizations together.
[Bug c++/55288] New: Improve handling/suppression of maybe-uninitialized warnings
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55288

Bug #: 55288
Summary: Improve handling/suppression of maybe-uninitialized warnings
Classification: Unclassified
Product: gcc
Version: 4.7.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

Created attachment 28669 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28669
maybe-uninitialized false positive

Enabling -Wmaybe-uninitialized (part of -Wall) can result in false positives, which is fine (the warning is still quite useful). However, there is currently no way to disable such warnings on a per-variable basis.

It is possible, but ineffective, to push a diagnostic pragma to ignore such warnings: Warnings are generated where the uninitialized value (!= variable) is eventually consumed, and that can easily happen outside the range covered by the pragma. Inlining makes the problem much worse [1].

The attached test case (reduced from actual code) illustrates the problem clearly, failing to compile with `-O2 -Wall -Werror' even though (a) the value *is* always written before being read and (b) the containing function has maybe-uninitialized warnings disabled. Adding -DWORKS allows it to compile by disabling the warning for the call site, even though the offending variable is not in scope at any part of the source code where the pragma is in effect.

Since the compiler can clearly track which variable was the problem, I would instead propose a new variable attribute, ((maybe_uninitialized)), to suppress all maybe-uninitialized warnings the marked variable might trigger for its consumers. That way, known false positives can be whitelisted without disabling a useful warning for large swaths of unrelated code [2].

[1] First, it can vastly expand the number of problematic end points that lie outside the pragma (they may even reside in different files).
Second, the resulting error message is extremely unhelpful, because it names the variable that was originally uninitialized, rather than the variable that ended up holding the poisoned value at the point of use (the former might not even be in the same file, let alone be in scope, and there's no easy way to figure out which of its uses causes the problem). It would be much better in this case if the diagnostic listed the call site(s) and/or assignments that led to the identified line of code depending on the potentially-uninitialized value, similar to how template substitution failures or errors in included headers are handled today. [2] Another potential solution would be to propagate the pragma to inlined call sites, but that seems like a horrifically hacky and error prone solution.
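For readers who have not used it, the pragma-based suppression discussed in this report looks roughly like the sketch below. The names (Result, try_compute, doubled_or_default) are invented for illustration and are not taken from the attached test case; whether GCC actually warns on this exact code depends on the version and optimization level.

```cpp
struct Result { long value; };

// Hypothetical helper: writes `out` only when it returns true.
static bool try_compute(long input, Result* out) {
    if (input < 0) return false;
    out->value = input * 2;
    return true;
}

// push/ignored/pop limits the suppression to this textual region.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
long doubled_or_default(long input, long fallback) {
    Result r;                       // deliberately left uninitialized
    bool ok = try_compute(input, &r);
    return ok ? r.value : fallback; // r.value is read only when it was written
}
#pragma GCC diagnostic pop
```

The pragma only covers code that sits between push and pop in the source, which is the limitation described above: if the value ends up being consumed somewhere else (for instance after inlining into a caller), the suppression may not apply there.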
[Bug c++/55288] Improve handling/suppression of maybe-uninitialized warnings
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55288

--- Comment #2 from Ryan Johnson <scovich at gmail dot com> 2012-11-12 21:11:43 UTC ---
(In reply to comment #1)
> Why don't just initialize the variable? It seems simpler than implementing yet
> another special attribute in GCC.

In the original program, the variable is a largish struct, the function is hot, and the 'valid' execution path is not the most common one. Avoiding unnecessary initialization there has a measurable impact on performance.

Note that, in other parts of the code that gcc understands better, the initialization is unnecessary (no warning) and gets optimized away even if I do have it in place... much to my chagrin once, after I did a lot of work to refactor a complex function, only to realize that gcc emitted *exactly* the same machine code afterward, because it had already noticed and eliminated the dead stores.

There's also a philosophical argument to be made... if we agree that all warnings subject to false positives should be suppressible, the current mechanism for maybe-uninitialized is inadequate, and a variable attribute would resolve the issue very nicely.

There's precedent for this: you *could* use #ifndef NDEBUG (or even pragma diagnostic) to avoid unused-variable warnings for helper variables used by multiple assertions scattered over a region of code, but setting ((unused)) on the offending variable is much easier to read and maintain, while still allowing other unused variables to be flagged properly.
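To make the ((unused)) precedent concrete, here is a small sketch of a helper variable that exists only to feed assertions. The function and its parameters are hypothetical; the only GCC feature relied on is the documented `unused` variable attribute.

```cpp
#include <cassert>

// Sums 1..n. `limit` is only read by the assertions, so with -DNDEBUG it
// would otherwise trigger an unused-variable warning.
long triangular(long n, long max_n) {
    long limit __attribute__((unused)) = max_n;
    assert(n <= limit);
    long total = 0;
    for (long i = 1; i <= n; ++i) {
        assert(i <= limit);
        total += i;
    }
    return total;
}
```

Only `limit` is exempted; any other unused variable in the same function would still be reported, which is the per-variable granularity being asked for with maybe-uninitialized.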
[Bug inline-asm/49611] Inline asm should support input/output of flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611 --- Comment #3 from Ryan Johnson scovich at gmail dot com 2012-04-12 16:39:32 UTC --- FYI: based on a discussion from quite some time ago [1], it seems that the Linux kernel folks would be tickled pink to have this feature, and discussed several potential ways to implement it. [1] http://lkml.indiana.edu/hypermail/linux/kernel/0111.2/0256.html
[Bug middle-end/32074] Optimizer does not exploit assertions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32074

--- Comment #5 from Ryan Johnson <scovich at gmail dot com> 2012-03-29 02:46:50 UTC ---
(In reply to comment #4)
> We have __builtin_unreachable() now which should allow for this optimization.

I've been using __builtin_unreachable() for some time now, and it's very nice for its intended purpose (telling gcc when it's safe to produce better code). I've noticed, though, that the ``x'' passed to assert(x) in already-existing code is often too expensive (or side effect-ful) to optimize away when converted to ``if(!(x)) { __builtin_unreachable(); }''

I would therefore advise against executing the expression passed to assertions under NDEBUG. I use the following in my own code instead:

#ifdef NDEBUG
#define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#define ASSUME assert
#endif

The idea is to state assumptions that might help the compiler generate better code... treating them like assertions in debug mode to catch faulty assumptions. Assertions, meanwhile, should retain their traditional purpose of debugging aid or sanity test and continue to disappear completely in NDEBUG mode.

Recommend to close as WONTFIX, unless there are other reasons to keep it open.
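A short usage sketch of the ASSUME macro from this comment follows. The function fast_mod and its precondition are invented for illustration; whether the compiler actually turns the modulo into a mask under NDEBUG is not guaranteed.

```cpp
#include <cassert>

#ifdef NDEBUG
#define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(x) assert(x)
#endif

// Hypothetical example: the caller promises that `divisor` is a nonzero
// power of two. In debug builds the promise is checked; under NDEBUG it
// becomes a hint the optimizer is free to exploit.
unsigned fast_mod(unsigned value, unsigned divisor) {
    ASSUME(divisor != 0 && (divisor & (divisor - 1)) == 0);
    return value % divisor;
}
```

The condition here is cheap and free of side effects, which is the kind of expression the comment argues ASSUME should be reserved for.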
[Bug c++/52637] New: ICE producing debug info for c++11 code using templates/decltype/lambda
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52637

Bug #: 52637
Summary: ICE producing debug info for c++11 code using templates/decltype/lambda
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

The following code snippet produces an ICE when compiled by gcc-4.7.0-RC1 with flags `-std=gnu++11 -g -c' (gcc-4.6.2 and 4.5.3 accept it):

=== bug.cpp ===
template <typename T>
struct foo {
    foo(T fn) { }
};
template <class T, typename V>
void bar(T*, V) {
    auto x = [] { };
    auto y = foo<decltype(x)>(x);
}
template <typename T>
void bar(T* t) {
    bar(t, [] { });
}
struct baz {
    void bar() { ::bar(this); }
};
===

$ ~/apps/gcc-4.7-RC1/bin/g++ -std=gnu++11 -g bug.cpp
bug.cpp:17:2: internal compiler error: in output_die, at dwarf2out.c:8463
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

The testcase was distilled from a multi-thousand line app with help from multidelta. My platform is i686-pc-cygwin, in case that matters.
[Bug bootstrap/52513] gcc-4.7.0-RC-20120302 fails to build for i686-pc-cygwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52513

--- Comment #2 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 13:02:50 UTC ---
(In reply to comment #1)
> 4.6 should be broken as well for you?

Oops. I reported wrong in my OP. I've actually been using a home-built 4.6.2 for some time now... and it is the host compiler for this build:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/Ryan/apps/gcc-4.6.2/libexec/gcc/i686-pc-cygwin/4.6.2/lto-wrapper.exe
Target: i686-pc-cygwin
Configured with: ../gcc-4.6.2-src/configure --prefix=/home/Ryan/apps/gcc-4.6.2
Thread model: single
gcc version 4.6.2 (GCC)

> Can you check why configure thinks spawnve is available in process.h
> (contrary to the warning we see in your snippet)?

Sorry, I may not have been clear on this. Google reported that spawnve lives in process.h. A quick search on my filesystem shows that spawnve actually lives in cygwin/process.h, not process.h as expected by pex-unix.c. Configure probably only tested linker status for the function and therefore wouldn't have noticed. Perhaps the file moved recently (since 1.7.9 or 10)? I've sent mail to the cygwin list to see if anybody there knows.

Meanwhile, soft-linking process.h to where gcc expects it lets the compile continue. I'll report back if any further issues arise.

> What Windows version are you using?

W7-x64
[Bug tree-optimization/50346] Function call foils VRP/jump-threading of redundant predicate on struct member
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

--- Comment #6 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 13:31:19 UTC ---
(In reply to comment #5)
> On Wed, 12 Oct 2011, scovich at gmail dot com wrote:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346
> > --- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2011-10-12 12:40:25 UTC ---
> > (In reply to comment #3)
> > > Well, it's a tree optimization issue. It's simple - the local aggregate f
> > > escapes the function via the member function call to baz:
> > > <bb 5>: foo::baz (&f);
> > > and as our points-to analysis is not flow-sensitive for memory/calls this
> > > causes f to be clobbered by the call to bar
> > Is flow-sensitive analysis within single functions prohibitively expensive?
> > All the papers I can find talk about whole-program analysis, where it's very
> > expensive in both time and space; the best I could find (CGO'11 best paper)
> > gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC.
> It would need a complete rewrite, it isn't integratable into the current
> solver (which happens to be shared between IPA and non-IPA modes).

That makes sense...

Wild idea: would it be possible to annotate references as "escaped" or "not escaped yet"? Anything global or passed into the function would be marked as escaped, while anything allocated locally would start out as not escaped; assigning to an escaped location or passing to a function would mark it as escaped if it wasn't already. The status could be determined in linear time using local information only (= scalable), and would benefit strongly as inlining (IPA or not) eliminates escape points.

Alternatively (or maybe it's really the same thing?), I could imagine an SSA operation which moves the non-escaped variable into an escaped one (which happens to live at the same address) just before it escapes? That might give the same effect with no changes to the current flow-insensitive algorithm, as long as the optimizer knew how to adjust things to account for inlining.
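The "escaped / not escaped yet" marking can be sketched on a toy statement list. Everything below (the Kind enum, Stmt, escaped_before, example_body) is a hypothetical illustration and has nothing to do with GCC's real points-to solver; it handles straight-line code only, so branches and loops would need some merge rule that is not shown.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy statement kinds for one function body (not GIMPLE).
enum Kind { LOCAL_DECL, PASS_TO_CALL, STORE_TO_ESCAPED, USE };

struct Stmt {
    Kind kind;
    std::string var;
};

// For each statement, report whether its variable had already escaped
// *before* that statement executes. A single forward pass suffices.
std::vector<bool> escaped_before(const std::vector<Stmt>& body) {
    std::set<std::string> locals;
    std::set<std::string> escaped;
    std::vector<bool> result;
    for (std::size_t i = 0; i < body.size(); ++i) {
        const Stmt& s = body[i];
        if (s.kind == LOCAL_DECL) locals.insert(s.var);
        // Globals and parameters (anything not declared locally) start escaped.
        bool is_escaped = locals.count(s.var) == 0 || escaped.count(s.var) > 0;
        result.push_back(is_escaped);
        // Passing to a call or storing into escaped memory marks it from here on.
        if (s.kind == PASS_TO_CALL || s.kind == STORE_TO_ESCAPED)
            escaped.insert(s.var);
    }
    return result;
}

// Roughly mirrors test() from the report: f is declared, read,
// passed to foo::baz, then read again.
std::vector<Stmt> example_body() {
    std::vector<Stmt> body;
    body.push_back(Stmt{LOCAL_DECL, "f"});
    body.push_back(Stmt{USE, "f"});
    body.push_back(Stmt{PASS_TO_CALL, "f"});
    body.push_back(Stmt{USE, "f"});
    return body;
}
```

In this toy model the first read of f is known to see a non-escaped object, so an intervening call such as bar() could not have modified it; only the read after the foo::baz call has to be treated conservatively.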
[Bug tree-optimization/50346] Function call foils VRP/jump-threading of redundant predicate on struct member
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

--- Comment #8 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 14:28:29 UTC ---
(In reply to comment #7)
> On Wed, 7 Mar 2012, scovich at gmail dot com wrote:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346
> > --- Comment #6 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 13:31:19 UTC ---
> > (In reply to comment #5)
> > > On Wed, 12 Oct 2011, scovich at gmail dot com wrote:
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346
> > > > --- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2011-10-12 12:40:25 UTC ---
> > > > (In reply to comment #3)
> > > > > Well, it's a tree optimization issue. It's simple - the local aggregate f
> > > > > escapes the function via the member function call to baz:
> > > > > <bb 5>: foo::baz (&f);
> > > > > and as our points-to analysis is not flow-sensitive for memory/calls this
> > > > > causes f to be clobbered by the call to bar
> > > > Is flow-sensitive analysis within single functions prohibitively expensive?
> > > > All the papers I can find talk about whole-program analysis, where it's very
> > > > expensive in both time and space; the best I could find (CGO'11 best paper)
> > > > gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC.
> > > It would need a complete rewrite, it isn't integratable into the current
> > > solver (which happens to be shared between IPA and non-IPA modes).
> > That makes sense...
> > Wild idea: would it be possible to annotate references as "escaped" or "not
> > escaped yet"? Anything global or passed into the function would be marked as
> > escaped, while anything allocated locally would start out as not escaped;
> > assigning to an escaped location or passing to a function would mark it as
> > escaped if it wasn't already. The status could be determined in linear time
> > using local information only (= scalable), and would benefit strongly as
> > inlining (IPA or not) eliminates escape points.
> Well, you can compute the clobber/use sets of individual function calls,
> IPA PTA computes a simple mod-ref analysis this way. You can also annotate
> functions whether they make arguments escape or whether it reads from them
> or clobbers them. The plan is to do some simple analysis and propagate
> that up the callgraph, similar to pure-const analysis. The escape part
> could be integrated there.

That sounds really slick to have in general... but would it actually catch the test case above? What you describe seems to depend on test() having information about foo::baz() -- which it does not -- while analyzing the body of test() could at least identify the part of f's lifetime where it cannot possibly have escaped. Or does the local analysis come for free once those IPA changes are in place?
[Bug c++/52529] New: Compiler rejects template code inconsistently
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52529

Bug #: 52529
Summary: Compiler rejects template code inconsistently
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

The following code does or does not compile (with varying errors) under 4.7.0-RC20120302 and 4.6.2 depending on the choice of FIRST..FOURTH; all four variants compile under 4.5.3.

I've narrowed down the test case as far as I could, but I don't really understand what's going wrong. Does the code break some subtle lookup rule which gcc recently became more strict about? There seem to be two issues, because either A<M> or foo() is enough to trigger an error; only A<1>::foo<T> compiles.

=== bug.cpp ===
template <long N> struct A {
    template <typename T> long foo(typename T::X *x);
};
template <typename T> struct B {
    typedef typename T::X X;
    enum { M=1 };
    static void bar(X *x);
};
struct C {
    struct X;
};
int main() { B<C>::bar(0); }
template <typename T>
void B<T>::bar(X *x) {
#if defined(FIRST)
    A<M>().foo(x);
#elif defined(SECOND)
    A<M>().foo<T>(x);
#elif defined(THIRD)
    A<1>().foo(x);
#elif defined(FOURTH)
    A<1>().foo<T>(x);
#endif
}
=== end bug.cpp ===

Sample error message from 4.7.0, since it has the clearest error messages (4.6.2 gets confused by earlier errors when attempting to suggest a candidate):

FIRST
bug.cpp: In instantiation of ‘static void B<T>::bar(B<T>::X*) [with T = C; B<T>::X = C::X]’:
bug.cpp:12:20: required from here
bug.cpp:16:5: error: no matching function for call to ‘A<1l>::foo(B<C>::X*&)’
bug.cpp:16:5: note: candidate is:
bug.cpp:2:32: note: template<class T> long int A::foo(typename T::X*) [with T = T; long int N = 1l]
bug.cpp:2:32: note: template argument deduction/substitution failed:
bug.cpp:16:5: note: couldn't deduce template parameter ‘T’

SECOND
bug.cpp: In static member function ‘static void B<T>::bar(B<T>::X*)’:
bug.cpp:18:17: error: expected primary-expression before ‘>’ token

THIRD
bug.cpp: In instantiation of ‘static void B<T>::bar(B<T>::X*) [with T = C; B<T>::X = C::X]’:
bug.cpp:12:20: required from here
bug.cpp:20:5: error: no matching function for call to ‘A<1l>::foo(B<C>::X*&)’
bug.cpp:20:5: note: candidate is:
bug.cpp:2:32: note: template<class T> long int A::foo(typename T::X*) [with T = T; long int N = 1l]
bug.cpp:2:32: note: template argument deduction/substitution failed:
bug.cpp:20:5: note: couldn't deduce template parameter ‘T’

FOURTH
[successful compilation]
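One possible reading of the errors above, offered as a sketch rather than as the resolution of this bug: T cannot be deduced from a parameter of type `typename T::X*` (a non-deduced context), which would explain FIRST and THIRD, and a member template named on a dependent object normally needs the `template` disambiguator, which would explain SECOND. The variant below is adapted from the test case; the return values and the inline definition of foo were added only so the result can be checked.

```cpp
template <long N>
struct A {
    template <typename T>
    long foo(typename T::X*) { return N; }
};

template <typename T>
struct B {
    typedef typename T::X X;
    enum { M = 1 };
    static long bar(X* x);
};

struct C { struct X; };

template <typename T>
long B<T>::bar(X* x) {
    // A<M> depends on T here, so the member template name needs `template`;
    // T is spelled out explicitly because it cannot be deduced from x.
    return A<M>().template foo<T>(x);
}
```

Calling `B<C>::bar(0)` then instantiates `A<1>::foo<C>` and returns 1.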
[Bug bootstrap/52513] New: gcc-4.7.0-RC-20120302 fails to build for i686-pc-cygwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52513

Bug #: 52513
Summary: gcc-4.7.0-RC-20120302 fails to build for i686-pc-cygwin
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: bootstrap
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

The RC doesn't build on i686-pc-cygwin:

gcc -c -DHAVE_CONFIG_H -g -fkeep-inline-functions -I. -I../../gcc-4.7.0-RC-20120302/libiberty/../include -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -pedantic ../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c -o pex-unix.o
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c: In function ‘pex_unix_exec_child’:
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:549:2: warning: implicit declaration of function ‘spawnvpe’ [-Wimplicit-function-declaration]
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:549:18: error: ‘_P_NOWAITO’ undeclared (first use in this function)
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:549:18: note: each undeclared identifier is reported only once for each function it appears in
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:551:2: warning: implicit declaration of function ‘spawnve’ [-Wimplicit-function-declaration]
Makefile:892: recipe for target `pex-unix.o' failed
make[3]: *** [pex-unix.o] Error 1
make[3]: Leaving directory `/home/Ryan/apps/gcc-4.7.0-obj/libiberty'
Makefile:8642: recipe for target `all-stage1-libiberty' failed
make[2]: *** [all-stage1-libiberty] Error 2
make[2]: Leaving directory `/home/Ryan/apps/gcc-4.7.0-obj'
Makefile:15771: recipe for target `stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/home/Ryan/apps/gcc-4.7.0-obj'
Makefile:897: recipe for target `all' failed
make: *** [all] Error 2

The needed declarations seem to live in Windows headers (process.h?)

This is using all official (and latest) cygwin packages:
binutils-2.22.51
cygwin-1.7.11s(0.259/5/3)
gcc-4.5.3
gmp-4.3.2
make-3.82.90
mpfr-3.0.1

Configure command:
../gcc-4.7.0-RC-20120302/configure --prefix=$HOME/apps/gcc-4.7 --enable-languages=c,c++,lto
[Bug c++/52166] New: c++0x required to import standard c++ headers?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52166

Bug #: 52166
Summary: c++0x required to import standard c++ headers?
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

Several of the standard C++ wrapper versions of C headers can only be imported with c++0x support enabled (tested on both cygwin and x86_64-linux):

#error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.

Workaround is to #include <foo.h> instead of <cfoo>, but it's annoying given that the latter is supposedly The Right Way for C++ programs to bring in the header.

Affected files:
<ccomplex>
<cfenv>
<cinttypes>
<cstdint>
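As a concrete instance of the workaround, the sketch below pulls in the fixed-width integer types through the C header, so it does not depend on -std=c++0x. The function pack_halves is just an illustrative name.

```cpp
#include <stdint.h>   // C header instead of <cstdint>

// Packs two 16-bit halves into one 32-bit word using the fixed-width types.
uint32_t pack_halves(uint16_t high, uint16_t low) {
    return (static_cast<uint32_t>(high) << 16) | low;
}
```

The types land in the global namespace this way (uint32_t rather than std::uint32_t), which is the usual trade-off of including the .h form.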
[Bug tree-optimization/50346] Function call foils VRP/jump-threading of redundant predicate on struct member
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

--- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2011-10-12 12:40:25 UTC ---
(In reply to comment #3)
> Well, it's a tree optimization issue. It's simple - the local aggregate f
> escapes the function via the member function call to baz:
> <bb 5>: foo::baz (&f);
> and as our points-to analysis is not flow-sensitive for memory/calls this
> causes f to be clobbered by the call to bar

Is flow-sensitive analysis within single functions prohibitively expensive? All the papers I can find talk about whole-program analysis, where it's very expensive in both time and space; the best I could find (CGO'11 best paper) gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC.

> as neither the bodies of baz nor bar are visible there is nothing we can do

Would knowing the body of bar() help if the latter cannot be inlined?
[Bug c++/50346] New: Function call foils VRP/jump-threading of redundant predicate on struct member
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

Bug #: 50346
Summary: Function call foils VRP/jump-threading of redundant predicate on struct member
Classification: Unclassified
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

When compiling the following code with options `-O3 -DBUG':

// === bug.cpp ===
struct foo {
    bool b;
    foo() : b(false) { }
    void baz();
};
bool bar();
void baz();
void test() {
    foo f;
    bool b = false;
    if (bar())
        b = f.b = true;
#ifndef BUG
    if (f.b != b)
        __builtin_unreachable();
#endif
    if (f.b)
        f.baz();
}
// === end ==

gcc fails to eliminate the second (redundant) if statement:

_Z4testv:
.LFB3:
        subq    $24, %rsp
        movb    $0, 15(%rsp)      <=== assign f.b = 0
        call    _Z3barv           <=== cannot access f.b
        testb   %al, %al
        je      .L2
        movb    $1, 15(%rsp)
.L3:
        leaq    15(%rsp), %rdi
        call    _ZN3foo3bazEv
        addq    $24, %rsp
        ret
.L2:
        cmpb    $0, 15(%rsp)      <=== always compares equal
        jne     .L3
        addq    $24, %rsp
        ret

Compiling with `-O3 -UBUG' gives the expected results:

_Z4testv:
.LFB3:
        subq    $24, %rsp
        movb    $0, 15(%rsp)
        call    _Z3barv
        testb   %al, %al
        je      .L1
        leaq    15(%rsp), %rdi
        movb    $1, 15(%rsp)
        call    _ZN3foo3bazEv
.L1:
        addq    $24, %rsp
        ret

This sort of scenario comes up a lot with RAII-related code, particularly when some code paths clean up the object manually before the destructor runs (obviating the need for the destructor to do it again). While it should be possible to give hints using __builtin_unreachable(), it's not always easy to tell where to put it, and it may need to be placed multiple times to be effective.
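The RAII pattern mentioned at the end of this report might look like the following sketch. Guard, release, run_once and the cleanup counter are all hypothetical names chosen for illustration; the real code the report was reduced from is not shown in this thread.

```cpp
// Counts how many times the stand-in cleanup routine actually runs.
static int cleanup_calls = 0;

class Guard {
    bool active_;
public:
    Guard() : active_(true) { }
    // Manual cleanup used on some paths; the destructor must not repeat it.
    void release() {
        if (active_) { ++cleanup_calls; active_ = false; }
    }
    ~Guard() { release(); }
};

// Returns the number of cleanups performed over one Guard lifetime.
int run_once(bool early_release) {
    cleanup_calls = 0;
    {
        Guard g;
        if (early_release) g.release();
        // At scope exit the destructor re-tests active_, although its value
        // is already known on each path: this is the redundant predicate.
    }
    return cleanup_calls;
}
```

Either path cleans up exactly once. The optimization question in the report is whether the compiler can drop the second test of the flag once an opaque call sits between the two tests.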
[Bug c++/50312] New: ICE when calling offsetof() illegally on incomplete template class
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50312

Bug #: 50312
Summary: ICE when calling offsetof() illegally on incomplete template class
Classification: Unclassified
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

The following (admittedly illegal) code:

//== begin ==
#include <cstddef>
#ifdef BUG
template <typename T=int>
#define EXTRA <>
#else
#define EXTRA
#endif
struct foo {
    int bar;
    enum { END = offsetof(foo, bar) };
};
foo EXTRA a;
//== end ===

Causes an ICE when compiled with -DBUG:

$ g++ -DBUG bug.cpp
bug.cpp: In instantiation of ‘foo<int>’:
bug.cpp:12:11: instantiated from here
bug.cpp:10:10: internal compiler error: in tree_low_cst, at tree.h:4233
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

The above is for g++-4.6.1; the same error occurs for g++-4.5.0 (but at tree.c:6202).

Compiling without -DBUG triggers a much more helpful diagnostic:

$ g++ bug.cpp
bug.cpp:10:18: error: invalid use of incomplete type ‘struct foo {aka struct foo}’
bug.cpp:8:8: error: forward declaration of ‘struct foo {aka struct foo}’
[Bug inline-asm/49611] Inline asm should support input/output of flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

--- Comment #2 from Ryan Johnson <scovich at gmail dot com> 2011-07-04 20:32:01 UTC ---
(In reply to comment #1)
> Making this work reliably is probably more work than making GCC use the flags
> from more cases from regular C code.

Does that mean each such case would need to be identified individually and then hard-wired into i386.md? The existence of modes like CCGC, CCGOC, CCNO, etc. in i386-modes.def made me hope that some high-level mechanism existed for reasoning about the semantics of condition codes. Or does that mechanism exist, and is just difficult to expose to inline asm for some reason?
[Bug inline-asm/49611] New: Inline asm should support input/output of flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

Summary: Inline asm should support input/output of flags
Product: gcc
Version: 4.5.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: inline-asm
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

The main reason I find myself writing inline asm is to do clever things with the flags register, especially in conjunction with unusual instructions. Some examples:

1. Using the sparc brz instruction if the compiler doesn't emit it (e.g. bug #40067).

2. Using the carry flag in x86 to determine whether the unsigned comparison a != b was greater or less than, using subtract-with-borrow (seen in gnu libc): sbb %eax, %eax; sbb $-1, %eax leaves %eax containing -1 if a < b and +1 if a > b.

3. AMD's Advanced Synchronization Facility, which proposes a jmp-like instruction for starting hardware transactions. Its effect is similar to fork(): the first pass through it sets flags and eax to zero; a transaction failure resumes from the same PC, but with eax and flags set to reflect an error code.

4. In my experience, the main reason people would want asm goto to allow outputs is because they can't export flags (otherwise the goto can become control flow in C/C++).

In all of these cases the inline asm becomes needlessly long simply because uses of the flags generated within the asm block will only work reliably within that asm block (including branches, loops, etc.).
Consider the following concrete example:

#define EOL "\n"
#define EOLT EOL "\t"
long pstrcmp(unsigned char const* a, unsigned char const* b, long* pout, long pin=0) {
    long delta, tmp;
    asm("#" EOL
        "1:" EOLT
        "movzb (%[a], %[n]), %k[tmp]" EOLT
        "movzb (%[b], %[n]), %k[delta]" EOLT
        "cmpb %b[delta], %b[tmp]" EOLT
        "jnz 2f" EOLT
        "testb %b[tmp], %b[tmp]" EOLT
        "jz 3f" EOLT
        "sub %[m1], %[n]" EOLT
        "jmp 1b" EOL
        "2:" EOLT
        "sbb %[delta], %[delta]" EOLT
        "sbb %[m1], %[delta]" EOL
        "3:"
        : [a] "+r"(a), [b] "+r"(b), [n] "+r"(pin), [delta] "=q"(delta), [tmp] "=q"(tmp)
        : [m1] "i"(-1)
        );
    *pout = pin;
    return delta;
}

With inline asm support for flags it would look more like this:

long pstrcmp(unsigned char const* a, unsigned char const* b, long* pout, long pin=0) {
    long delta, tmp;
again:
    if (a[pin] == b[pin]) {
        if (a[pin] != 0) {
            pin++;
            goto again;
        }
        else {
            delta = b[pin];
        }
    }
    else {
        asm("sbb %[delta], %[delta]" EOLT
            "sbb %[m1], %[delta]"
            : [delta] "=r"(delta)
            : [m1] "i"(-1), "flags"(a[pin] != b[pin])
            );
    }
    *pout = pin;
    return delta;
}

The intent is that the "flags" input specifier tells the compiler to arrange for flags to be set at entry to the asm block as if the expression passed to it had just completed (the compiler would warn/error if it were unclear the effect evaluating the expression would have on flags). In theory the optimizer should be able to eliminate common expressions and shuffle code to avoid materializing the flags at all.

Using flags as output (perhaps to pass as input to another inline asm block) might look like this:

asm("cmp %0, %1" : "=flags"(flags) : "r"(a), "r"(b));
...
asm("jz 1f" : : "flags"(flags));

The flags should probably take type 'int' in C. Ideally, the compiler could even recognize and optimize patterns like this:

asm("cmp %0, %1" : "=flags"(flags) : "r"(a), "r"(b));
enum { CF=1 };
if (flags & CF) { ... }
else if (flags & ZF) { ... }
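The arithmetic behind the two-instruction sbb idiom from item 2 can be checked in plain C++ without any inline asm. The function below is only a model of the instructions, with the carry flag represented as an ordinary integer; the name sign_from_carry is made up for this sketch.

```cpp
#include <stdint.h>

// Models the x86 carry flag after comparing a with b (computing a - b):
// CF is set when a < b as unsigned values.
//   sbb %eax, %eax   ->  eax = eax - eax - CF = -CF   (CF stays the same)
//   sbb $-1,  %eax   ->  eax = eax - (-1) - CF = eax + 1 - CF
int32_t sign_from_carry(uint32_t a, uint32_t b) {
    int32_t cf = (a < b) ? 1 : 0;
    int32_t eax = -cf;      // first sbb: 0 or -1
    eax = eax + 1 - cf;     // second sbb: +1 or -1
    return eax;
}
```

With CF set the result is -1 + 1 - 1 = -1, and with CF clear it is 0 + 1 - 0 = +1. The idiom is only meaningful when a != b, since equal inputs also leave CF clear and would yield +1.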
[Bug middle-end/49035] New: Avoid setting up stack frame for short, hot code paths
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49035

Summary: Avoid setting up stack frame for short, hot code paths
Product: gcc
Version: 4.5.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com

I often find myself writing functions of the following form:

void foo () {
    if (common_case) {
        /* do a little work and return */
    }
    /* uncommon case: do a lot of work, call functions, etc. */
}

The resulting assembly code always sets up a stack frame in the function prologue, even though the function usually executes as a leaf using few (or zero) of the callee-save registers and stack slots it saves. Here's an example which is only slightly contrived:

=== rfe.cpp
struct link {
    link* prev;
    long go_slow;
    void frob(link* parent, link* grandparent);
};
link* foo(link* list) {
    link* prev = list->prev;
    while (__builtin_expect(prev->go_slow, 0)) {
        link* pprev = __sync_lock_test_and_set(&prev->prev, 0);
        pprev->frob(prev, list);
        prev = pprev;
    }
    return prev;
}
=== rfe.cpp

Compiling the above with `x86_64-unknown-linux-gnu-g++-4.5.2 -O3 -S' yields

_Z3fooP4link:
.LFB0:
        movq    %rbx, -24(%rsp)
        movq    %rbp, -16(%rsp)
        movq    %rdi, %rbx
        movq    %r12, -8(%rsp)
        subq    $24, %rsp
        movq    (%rdi), %rax
        cmpq    $0, 8(%rax)
        jne     .L8
.L2:
        movq    (%rsp), %rbx
        movq    8(%rsp), %rbp
        movq    16(%rsp), %r12
        addq    $24, %rsp
        ret
.L8:
        xorl    %r12d, %r12d
.L6:
        movq    %r12, %rbp
        xchgq   (%rax), %rbp
        movq    %rbx, %rdx
        movq    %rax, %rsi
        movq    %rbp, %rdi
        call    _ZN4link4frobEPS_S0_
        cmpq    $0, 8(%rbp)
        jne     .L4
        movq    %rbp, %rax
        jmp     .L2
.L4:
        movq    %rbp, %rax
        jmp     .L6

Ideally, it would look like this instead:

_Z3fooP4link:
.LFB0:
        ;; *** hot path executes as leaf ***
        movq    (%rdi), %rax
        cmpq    $0, 8(%rax)
        jne     .L8
        ret
.L8:
        ;; *** set up stack frame ***
        movq    %rbx, -24(%rsp)
        movq    %rbp, -16(%rsp)
        movq    %rdi, %rbx
        movq    %r12, -8(%rsp)
        subq    $24, %rsp
        ;; ***
        xorl    %r12d, %r12d
.L6:
        movq    %r12, %rbp
        xchgq   (%rax), %rbp
        movq    %rbx, %rdx
        movq    %rax, %rsi
        movq    %rbp, %rdi
        call    _ZN4link4frobEPS_S0_
        cmpq    $0, 8(%rbp)
        jne     .L4
        ;; *** tear down stack frame ***
        movq    %rbp, %rax
        movq    (%rsp), %rbx
        movq    8(%rsp), %rbp
        movq    16(%rsp), %r12
        addq    $24, %rsp
        ;; ***
        ret
.L4:
        movq    %rbp, %rax
        jmp     .L6

The effect can sometimes be simulated using an inlined foo which includes the fast path and a call to the (non-inlined) slow path, but the whims of function inlining often conspire against it even when callers are able to inline foo (e.g. foo is not a library function).

There's probably some overlap with partial inlining here: the ideal case essentially splits the slow path off into its own function (reached via a tail call); presumably partial inlining would inline the fast path while having all callers jump to the same copy of the slow path function. However, the optimization is arguably useful even if foo is never inlined at all.

Thoughts?
Ryan
[Bug middle-end/49035] Avoid setting up stack frame for short, hot code paths
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49035

--- Comment #1 from Ryan Johnson <scovich at gmail dot com> 2011-05-18 02:56:23 UTC ---
Update: using __attribute__((noinline)) it is actually possible to force the compiler to do the right thing, though it makes the code significantly less readable:

=== example.cpp
struct link {
    link* prev;
    long go_slow;
    void frob(link* parent, link* grandparent);
};
link* __attribute__((noinline)) foo_slow(link* list, link* prev) {
    do {
        link* pprev = __sync_lock_test_and_set(&prev->prev, 0);
        pprev->frob(prev, list);
        prev = pprev;
    } while(__builtin_expect(prev->go_slow, 0));
    return prev;
}
link* foo_fast(link* list) {
    link* prev = list->prev;
    if (__builtin_expect(prev->go_slow, 0))
        return foo_slow(list, prev);
    return prev;
}
=== example.cpp

The above compiles down to something much better, though the calling convention requires an extra movq and there are more jumps than required (the compiler probably doesn't ever perform a tail call using a conditional jump):

_Z8foo_fastP4link:
        movq    (%rdi), %rax
        cmpq    $0, 8(%rax)
        jne     .L7
        rep ret
.L7:
        movq    %rax, %rsi
        jmp     _Z8foo_slowP4linkS0_

_Z8foo_slowP4linkS0_:
        movq    %rbp, -16(%rsp)
        movq    %r12, -8(%rsp)
        xorl    %ebp, %ebp
        movq    %rbx, -24(%rsp)
        movq    %rdi, %r12
        subq    $24, %rsp
.L2:
        movq    %rbp, %rbx
        xchgq   (%rsi), %rbx
        movq    %r12, %rdx
        movq    %rbx, %rdi
        call    _ZN4link4frobEPS_S0_
        cmpq    $0, 8(%rbx)
        jne     .L3
        movq    %rbx, %rax
        movq    8(%rsp), %rbp
        movq    (%rsp), %rbx
        movq    16(%rsp), %r12
        addq    $24, %rsp
        ret
.L3:
        movq    %rbx, %rsi
        jmp     .L2
[Bug c++/46143] New: __attribute__((optimize)) emits wrong code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46143

           Summary: __attribute__((optimize)) emits wrong code
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: scov...@gmail.com

Created attachment 22129
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22129
Test case showing wrong code with __attribute__((optimize(0)))

Applying '__attribute__((optimize(0)))' to a function causes it to call the wrong variant/clone of an optimized callee that returns a struct by value. The attached test case reproduces the problem when compiled with `g++ -O3 -DBUG bug.cpp'

The problem seems to be the way gcc optimizes return-by-value. The statement:

    iterator it = v.begin()

becomes

    tmp = alloca(sizeof(iterator))
    vector::begin(tmp, v)
    iterator it(*(iterator*)tmp)

However, gcc actually calls the wrong variant of vector::begin, with the latter thinking its first argument is v._M_impl._M_start (an iterator to be copied) and which has optimized away the struct completely to return only a pointer. It therefore allocates a temporary and proceeds to initialize it using the (uninitialized) return-value it was passed, then returns the temporary's contents to the caller (main). As a result, 'it' points to whatever happened to be on the stack at the time of the call.

Note that the test case smashes the stack only to make the symptoms consistent; the bug remains with or without it.

The relevant disassembly follows:

main:
        # call vector::begin(rval_ptr, v)
        subq    $24, %rsp       # allocate hidden tmp1
        movq    v(%rip), %rdx
        movq    %rdx, %rsi      # second arg is v
        movq    %rsp, %rdi      # first arg is tmp1
        call    _ZNSt6vectorIP3fooSaIS1_EE5beginEv.clone.1
        ...

_ZNSt6vectorIP3fooSaIS1_EE5beginEv.clone.1:
        subq    $24, %rsp       # allocate hidden tmp2
        movq    %rdi, %rsi      # second arg expects v but gets tmp1
        movq    %rsp, %rdi      # first arg is tmp2
        call    _ZN9__gnu_cxx17__normal_iteratorIPP3fooSt6vectorIS2_SaIS2_EEEC2ERKS3_.clone.0
        movq    (%rsp), %rax    # return the contents of tmp2
        addq    $24, %rsp
        ret

_ZN9__gnu_cxx17__normal_iteratorIPP3fooSt6vectorIS2_SaIS2_EEEC2ERKS3_.clone.0:
        movq    %rsi, (%rdi)    # tmp2 = tmp1
        ret
[Bug c++/46143] __attribute__((optimize)) emits wrong code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46143

Ryan Johnson <scovich at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #22129|0                           |1
        is obsolete|                            |

--- Comment #1 from Ryan Johnson <scovich at gmail dot com> 2010-10-22 22:18:16 UTC ---
Created attachment 22130
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22130
Test case showing wrong code with __attribute__((optimize(0)))

Oops... the previous version had stray marks from emacs+gdb.
[Bug c++/46143] __attribute__((optimize)) emits wrong code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46143

--- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2010-10-22 23:06:53 UTC ---
As I said, the stack smashing was only there to make the behavior consistent. If the offending stack location happens to contain zero, the bug would go unnoticed (try adding 'long n[1]' as another local; for me it makes the symptom go away unless the stack smash exposes it).

In any case, here's a minimal testcase which doesn't do anything evil:

#include <vector>
#include <cassert>
typedef std::vector<int> intv;

int
#ifdef BUG
__attribute__((optimize(0)))
#endif
main() {
    intv v;
    intv::iterator it = v.begin();
    assert(it == v.begin());
    return 0;
}
[Bug lto/45959] [4.6 Regression] ICE: tree code 'template_type_parm' is not supported in gimple streams with -flto/-fwhopr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45959

Ryan Johnson <scovich at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |scovich at gmail dot com

--- Comment #6 from Ryan Johnson <scovich at gmail dot com> 2010-10-11 13:53:42 UTC ---
Actually, this isn't a regression -- not on 4.6, at least. The following minimal test case makes x86_64-unknown-linux-gnu-gcc-4.5.1 die with the same error message:

$ cat > lto-bug.h <<EOF
#pragma interface
template<class T> struct foo;
template<class T>
struct foo<T*> : foo<T> {
  foo<T*>(T* t) : foo<T>(*t) { }
};
EOF
$ cat > lto-bug.C <<EOF
#pragma implementation "lto-bug.h"
#include "lto-bug.h"
EOF
$ gcc-4.5.1 -flto lto-bug.C
In file included from lto-bug.C:2:0:
lto-bug.h:6:2: internal compiler error: tree code ‘template_type_parm’ is not supported in gimple streams
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Removing anything at all makes the ICE disappear.

I don't have a copy of the 4.6 sources to test whether the just checked-in fix takes care of this... reopen?
[Bug c++/45968] New: ICE: tree code 'template_type_parm' is not supported in gimple streams with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45968

           Summary: ICE: tree code 'template_type_parm' is not supported in gimple streams with -flto
           Product: gcc
           Version: lto
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: scov...@gmail.com

The following minimal test case makes x86_64-unknown-linux-gnu-gcc-4.5.1 ICE when the -flto flag is supplied:

$ cat > lto-bug.h <<EOF
#pragma interface
template<class T> struct foo;
template<class T>
struct foo<T*> : foo<T> {
  foo<T*>(T* t) : foo<T>(*t) { }
};
EOF
$ cat > lto-bug.C <<EOF
#pragma implementation "lto-bug.h"
#include "lto-bug.h"
EOF
$ gcc-4.5.1 -flto lto-bug.C
In file included from lto-bug.C:2:0:
lto-bug.h:6:2: internal compiler error: tree code ‘template_type_parm’ is not supported in gimple streams
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Removing anything at all makes the ICE disappear.

This seemed similar to bug #45959, but applying the patch mentioned there to gcc-4.5.1-src/gcc/cp/pt.c:11322 does not help, so I'm filing a new bug for this.
[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces
--- Comment #5 from scovich at gmail dot com 2010-05-07 20:12 ---
Belated follow-up: I just tried to use sparc-sun-solaris2.10-gcc-4.4.0 (built from sources) and it does not emit the DW_AT_call_* debug attributes which gdb expects in order to unwind inlined functions.

I have searched both the gdb and gcc docs and cannot find any mention of (modern) machines/systems/situations where this is not supported; given that the required attributes are missing it seems like a gcc problem (feeding the .s file to gas doesn't help, so I doubt it's the sun assembler/linker, either).

gcc -v
Using built-in specs.
Target: sparc-sun-solaris2.10
Configured with: ../gcc-4.4.0/configure --prefix=/export/home/ryanjohn/apps/gcc-4.4.0 --with-gmp=/export/home/ryanjohn/apps --with-mpfr=/export/home/ryanjohn/apps --without-gnu-ld --without-gnu-as
Thread model: posix
gcc version 4.4.0 (GCC)

--
scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
   GCC host triplet|                            |sparc-sun-solaris2.10
      Known to fail|                            |4.4.0
         Resolution|WORKSFORME                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828
[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces
--- Comment #6 from scovich at gmail dot com 2010-05-07 21:20 ---
Aha! The problem is not that gcc fails to emit the proper debug info, it's that it doesn't always track well which instructions came from which function. For example, if we compile this toy program:

int volatile global;
int foo(int a) { return a + global; }
int bar(int a) { return global + foo(a); }
int baz(int a) { return global + bar(a); }
int main(int argc, char const* argv[]) { return global + baz(argc); }

Running it in gdb will seem to begin execution at exit from bar:

Dump of assembler code for function main:
   0x000106cc <+0>:     sethi  %hi(0x20800), %g1
   0x000106d0 <+4>:     ld  [ %g1 + 0x124 ], %g4        ! 0x20924 <global>
=> 0x000106d4 <+8>:     ld  [ %g1 + 0x124 ], %g3
   0x000106d8 <+12>:    ld  [ %g1 + 0x124 ], %g2
   0x000106dc <+16>:    ld  [ %g1 + 0x124 ], %g1
   0x000106e0 <+20>:    add  %g4, %g1, %g1
   0x000106e4 <+24>:    add  %g1, %g3, %g1
   0x000106e8 <+28>:    add  %g1, %g2, %g1
   0x000106ec <+32>:    retl
   0x000106f0 <+36>:    add  %g1, %o0, %o0
End of assembler dump.

Apparently someone made the reasonable judgment call that it was better to only enter inlined functions once rather than jumping around, and even then only if code from later in the containing function hasn't already run. Putting a printf in foo() gave the expected result.

--
scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WORKSFORME

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828
[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces
--- Comment #4 from scovich at gmail dot com 2010-04-23 23:29 ---
> Try the -i option of addr2line.

Ah, very nice. It turns out I was using a 4.0-series gcc, which according to gdb's docs doesn't output quite enough debug information to reconstruct inlined stack traces; 4.1 and later do. Time for an upgrade!

Thanks!

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828
[Bug debug/43828] New: Emit debug info allowing inlined functions to show in stack traces
It would be very nice if gcc emitted debug information that allowed profilers and debuggers the option to extract a stack trace which included calls to inlined functions. This would allow developers much greater insight into the behavior of optimized code. C++ programs would benefit disproportionately, especially those which use the STL heavily -- disabling inlining produces a very different executable which makes profiling worse than useless and can mask heisenbugs.

Profiling would become significantly more accurate because it could determine how much of a function's overheads remain even after inlining, which is pretty much impossible right now. It would also allow profilers to generate functional call graphs which show all uses of a function, inlined or not.

Debugging would also improve because the user would be able to navigate a stack trace which corresponds to the code they're trying to debug, even if the actual calls were optimized away. Questions like "which of this function's 10 calls to std::vector::begin seg faulted?" would suddenly be *much* easier to answer, and in an intuitive way. With some work it would probably even be possible to maintain mappings for local vars/params (assuming they exist at the time).

All this "virtual stack trace" functionality would need to remain separate (and probably not the default) so as to not confuse (impede) folks who are used to (prefer) the current behavior.

NOTE: I realize that full support for this would require changes to other projects like gdb and gprof, but gcc could solve the chicken-and-egg problem by emitting appropriate debug info as a first step; perhaps the new debug info changes introduced with 4.5.0 already do (some of) this?
--
           Summary: Emit debug info allowing inlined functions to show in stack traces
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: debug
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828
[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces
--- Comment #1 from scovich at gmail dot com 2010-04-21 09:29 --- (In reply to comment #0) One more way debugging would improve: it would become possible to set breakpoints in inlined functions -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828
[Bug bootstrap/43301] New: top-level configure script ignores ---with-build-time-tools
./configure ... --with-build-time-tools=$MY_TOOLS ignores $MY_TOOLS (though it correctly warns when $MY_TOOLS is not an absolute path).

Let's just say this led to extremely frustrating behavior until I decided to start digging...

Suggested patch to correct the problem:

Index: /home/Ryan/apps/gcc-4.5-src/configure.ac
===
--- /home/Ryan/apps/gcc-4.5-src/configure.ac    (revision 157227)
+++ /home/Ryan/apps/gcc-4.5-src/configure.ac    (working copy)
@@ -3221,7 +3221,9 @@
 [  --with-build-time-tools=PATH
                           use given path to find target tools during the build],
 [case x$withval in
-     x/*) ;;
+     x/*)
+       with_build_time_tools=$withval
+       ;;
      *)
        with_build_time_tools=
        AC_MSG_WARN([argument to --with-build-time-tools must be an absolute path])

--
           Summary: top-level configure script ignores ---with-build-time-tools
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: bootstrap
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com
  GCC host triplet: i686-pc-cygwin
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43301
[Bug bootstrap/43301] top-level configure script ignores ---with-build-time-tools
--- Comment #1 from scovich at gmail dot com 2010-03-09 01:04 ---
(In reply to comment #0)
> Let's just say this led to extremely frustrating behavior until I decided to
> start digging...

To be more specific, the gcc/as wrapper is generated with:

ORIGINAL_AS_FOR_TARGET=
ORIGINAL_LD_FOR_TARGET=
ORIGINAL_PLUGIN_LD_FOR_TARGET=
ORIGINAL_NM_FOR_TARGET=

This causes the build of libgcc to fail later on, at gcc/as line 83, with a message about "exec: not found".

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43301
[Bug c/35503] Warning about restricted pointers?
--- Comment #2 from scovich at gmail dot com 2009-11-27 07:45 --- I've also run into this. Perhaps the machinery which tracks strict aliasing (and generates best-effort warnings) could be used here? ... adding this comment instead of filing a duplicate :P -- scovich at gmail dot com changed: What|Removed |Added CC||scovich at gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35503
[Bug middle-end/42077] New: std::set: dereferencing pointer '__x.15' does break strict-aliasing rules
With gcc-4.4.2 the following code generates warnings about strict aliasing:

=| bug.cpp |===
#include <set>
#ifdef SHOW_BUG
struct foo {
    int i;
    bool operator<(foo const &o) const { return i < o.i; }
};
#else
typedef int foo;
#endif
int main() {
    std::set<foo>().insert((foo) {0});
}
=| bug.cpp |===

$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.4.2-src/configure --prefix=/home/rjo/apps/gcc-4.4.2 --with-gmp=/home/rjo/apps --with-mpfr=/home/rjo/apps --disable-nls --disable-multilib
Thread model: posix
gcc version 4.4.2 (GCC)

$ gcc -Wall -O3 -DSHOW_BUG bug.cpp
bug.cpp: In function 'int main()':
bug.cpp:5: warning: dereferencing pointer '__x.15' does break strict-aliasing rules
/home/rjo/apps/gcc-4.4.2/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../include/c++/4.4.2/bits/stl_tree.h:525: note: initialized from here
/home/rjo/experiments/scratch.cpp:5: warning: dereferencing pointer '__x.15' does break strict-aliasing rules
/home/rjo/apps/gcc-4.4.2/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../include/c++/4.4.2/bits/stl_tree.h:525: note: initialized from here

In addition to the problem of the STL (appearing to?) break strict-aliasing rules, it looks like bug #38477 is back.

Below is the (hopefully) relevant snippet of CFG. It looks exactly like the issue described in bug #38477, with static cast voodoo:

static const _Val& std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_S_value(const std::_Rb_tree_node_base*) [with _Key = foo, _Val = foo, _KeyOfValue = std::_Identity<foo>, _Compare = std::less<foo>, _Alloc = std::allocator<foo>] (const struct _Rb_tree_node_base * __x)
{
  const struct _Rb_tree_node * __x.15;
  const struct foo & D.8747;

<bb 2>:
  __x.15 = (const struct _Rb_tree_node *) __x;
  D.8747 = &__x.15->_M_value_field;
  return D.8747;
}

Also, is there a particular reason the diagnostic says "does break" instead of "breaks"? The former may be technically correct English but sounds strange, even if there's a corresponding "may break" diagnostic.

--
           Summary: std::set: dereferencing pointer '__x.15' does break strict-aliasing rules
           Product: gcc
           Version: 4.4.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com
  GCC host triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42077
[Bug tree-optimization/39390] [4.4 Regression] Bogus aliasing warning with std::set
--- Comment #10 from scovich at gmail dot com 2009-11-17 11:16 ---
(In reply to comment #3)
> the warning is for dead code. Thus this is not a wrong-code problem.

Just to verify, does this (and comment #7) mean that the warning is harmless and can be ignored?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39390
[Bug bootstrap/42028] New: Bootstrap fails for mpfr/gmp not in LD_LIBRARY_PATH
Bootstrapping of gcc-4.4.2 fails on my machine because the stage 1 compiler has a runtime dependency on mpfr and gmp, which are not in my LD_LIBRARY_PATH because I only built them in order to compile gcc. Using --with-gmp, --with-mpfr and --with-build-libsubdir at configure time lets it compile but doesn't help it run. Given that the inputs to configure make it pretty clear mpfr and gmp are not in standard locations, and the finished compiler won't have any dependencies on those libraries, I would expect the build system to ensure they are accessible to any dependent intermediate binaries it runs. The workaround is to set up an LD_LIBRARY_PATH, so this is more annoyance than anything else. -- Summary: Bootstrap fails for mpfr/gmp not in LD_LIBRARY_PATH Product: gcc Version: 4.4.2 Status: UNCONFIRMED Severity: trivial Priority: P3 Component: bootstrap AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: scovich at gmail dot com GCC host triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42028
[Bug bootstrap/42028] Bootstrap fails for mpfr/gmp not in LD_LIBRARY_PATH
--- Comment #1 from scovich at gmail dot com 2009-11-13 10:35 --- Hmm.. it seems the final executable depends on mpfr and gmp as well... I could have sworn the docs said it was a build-time dependency only. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42028
[Bug inline-asm/40124] Inline asm should support limited control flow
--- Comment #9 from scovich at gmail dot com 2009-05-13 07:55 ---
RE: __builtin_expect -- Thanks! It did help quite a bit, even though the compiler was already emitting not-taken branch hints on its own.

RE: Filing bugs -- I have. This RFE arose out of Bug #40078, which was triggered by attempts to work around Bug #40067. I still have some issues with overconservative use of branch delay slots and possibly loop pipelining, but haven't gotten to filing them yet. I've also filed other bugs in the past where it would have been nice to work around using inline asm but control flow was a pain.

In the end, is there any particular reason *not* to make inline asm easier to use and more transparent to the compiler, given points #1 and #2? Invoking point #3, what significant uses of computed gotos exist, other than to work around switch statements that compile suboptimally? The docs don't mention any, and yet we have them instead of (or in addition to) bug reports.

I'd take a stab at implementing this myself -- it's probably a one-liner -- but I've never hacked gcc before and have no clue where that one line might lurk.

BTW, how does one exploit the compiler's overflow catching? I tried testing a+b < a and a+b < b (for unsigned ints) with no luck, and there's no __builtin test for overflow or carry.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124
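
For reference, the unsigned check mentioned in the last paragraph can be sketched as below. This is a minimal illustration rather than anything from the bug report: `add_wraps` is a hypothetical helper name, and whether the compiler turns the comparison into a carry-flag test is exactly the open question in the comment. GCC 5 and later also provide `__builtin_add_overflow`, which did not exist when this comment was written.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper (not from the bug report): stores a + b in *sum and
// returns true when the addition wrapped around.
// Unsigned arithmetic is modular, so the wrapped sum is smaller than a
// (and smaller than b) exactly when a carry out of the top bit occurred.
static bool add_wraps(unsigned a, unsigned b, unsigned* sum) {
    assert(sum != nullptr);
    *sum = a + b;
    return *sum < a;
}
```

A caller would use it as `unsigned s; if (add_wraps(x, y, &s)) { /* handle overflow */ }`; comparing against `b` instead of `a` gives the same answer.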
[Bug inline-asm/40124] Inline asm should support limited control flow
--- Comment #11 from scovich at gmail dot com 2009-05-13 09:51 ---
> If you allow inline asms to change control flow, even just to labels whose
> address has been taken through &&label, you penalize a lot of code which
> doesn't change the control flow, as the compiler will have to assume each
> inline asm which could possibly get at an label address (not just directly,
> but through global variables, pointers etc.) can jump to it.

I'm going to invoke #3 again to respond to these concerns:

a. This RFE is specifically limited to local control flow only, so the compiler can safely ignore any label not in the asm's enclosing function, as well as labels whose addresses are never taken (or provably never used). Computed gotos appear to make the same assumptions, based on the docs' strong warning not to allow labels to leak out of their enclosing function in any way.

b. While it's always possible that an asm could jump to a value loaded from an arbitrary, dynamically-generated address, the same is true for computed gotos. Either way, compiler analysis or not, doing so would almost certainly send you to la-la land because label values aren't known until assembler time or later and have no guaranteed relationship with each other. The only way to get a valid label address is using one directly, or computing it with some sort of base+(label-base). Either way requires taking the address of the desired label at some point and tipping off the compiler.

c. It's pretty easy to write functions whose computed gotos defy static analysis, but most of the time the compiler does pretty well. Well-written asm blocks should access memory via "m" constraints -- which the compiler can analyze -- rather than manually dereferencing a pointer passed in with an "r" constraint. This is especially true for asm blocks with no internal control flow, which this RFE encourages.

d. If a code path is short/simple enough that incoming jumps penalize it heavily (whether from computed gotos or jumps from asm), it's probably also small enough that the compiler (or programmer, if need be) can duplicate it for the short path. A big, ugly code path probably wouldn't even notice an extra control flow arc or two.

In the end, a big goal of this RFE is to allow programmers to make the compiler aware of control flow arcs they're already adding (or tempted to add) behind its back. It therefore wouldn't strike me as much of a limitation if jumps to labels not explicitly passed to the asm are unsupported and may lead to undefined behavior.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124
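
Points (a) and (b) lean on how computed gotos expose their targets: every reachable label has its address taken inside the enclosing function. A minimal sketch of that pattern follows, assuming the GNU "labels as values" extension (g++ or clang++); the `run` function and its three opcodes are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical mini-interpreter, GNU C/C++ only (labels as values).
// Opcodes: 0 = increment, 1 = double, 2 = halt. The program must end
// with a halt opcode and contain no other values.
static int run(const unsigned char* code) {
    assert(code != nullptr);
    // Every label whose address is taken appears in this table, so the
    // compiler can enumerate all possible targets of the computed gotos.
    static void* const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0;
    goto *dispatch[*code++];
op_inc:
    acc += 1;
    goto *dispatch[*code++];
op_dbl:
    acc *= 2;
    goto *dispatch[*code++];
op_halt:
    return acc;
}
```

None of the labels leaves `run`, which is the same locality restriction the RFE proposes for labels handed to an asm block.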
[Bug inline-asm/40124] New: Inline asm should support limited control flow
)
        movq    labels.1894(%rip), %rax
        jmp     *%rax
        .p2align 4,,7
.L5:
        leaq    12(%rsp), %rdx
        call    handle_overflow
.L4:
        movl    12(%rsp), %eax
        movq    16(%rsp), %rbx
        movq    24(%rsp), %rbp
        addq    $32, %rsp
        ret

--
           Summary: Inline asm should support limited control flow
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: inline-asm
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124
[Bug inline-asm/40124] Inline asm should support limited control flow
--- Comment #2 from scovich at gmail dot com 2009-05-12 16:13 --- Overflow and adc were only examples. Other instructions that set cc, or other conditions (e.g. parity) would not have that optimization. Another use is the ability to jump out of an inline asm to handle an uncommon case (if writing hand-tuned asm for speed, for example). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124
[Bug inline-asm/40124] Inline asm should support limited control flow
--- Comment #4 from scovich at gmail dot com 2009-05-12 16:36 ---
I'm actually running sparcv9-sun-solaris2.10 (the examples used x86 because more people know it and its asm is easier to read).

My use case is the following: I'm implementing high-performance synchronization primitives and the compiler isn't generating good enough code -- partly because it doesn't pipeline spinloops, and partly because it has no way to know what stuff is truly critical path and what just needs to happen eventually.

Here's a basic idea of what I've been looking at:

long mcs_lock_acquire(mcs_lock* lock, mcs_qnode* me) {
 again:
    /* initialize qnode, etc */
    membar_producer();
    mcs_qnode* pred = atomic_swap(&lock->tail, me);
    if(pred) {
        pred->next = me;
        while(int flags=me->wait_flags) {
            if(flags & ERROR) {
                /* recovery code */
                goto again;
            }
        }
    }
    membar_enter();
    return (long) pred;
}

This code is absolutely performance-critical because every instruction on the critical path delays O(N) other threads -- even a single extra load or store causes noticeable delays. I was trying to rewrite just the while loop above in asm to be more efficient, but it is hard because of that goto inside. Basically, the error isn't going anywhere once it shows up, so we don't have to check it nearly as often as the flags==0 case, and it can be interleaved across as many loop iterations as needed to make its overhead disappear. Manually unrolling and pipelining the loop helped a bit, but the compiler still tended to cluster things together more than was strictly necessary (leading to bursts of saturated pipeline alternating with slack).

For CC stuff, especially x86-related, I bet places like fftw and gmp are good sources of frustration to mine.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124
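
For readers who want something compilable, the acquire path above can be approximated in portable C++11 atomics. This is only a sketch under stated assumptions, not the author's Solaris code: it drops the ERROR/recovery path, replaces `wait_flags` with a single hypothetical `waiting` flag, uses `std::atomic` in place of `atomic_swap` and the membar calls, and adds a release function that the original comment does not show.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical queue node: one per thread per acquisition.
struct mcs_qnode {
    std::atomic<mcs_qnode*> next{nullptr};
    std::atomic<bool> waiting{false};
};

struct mcs_lock {
    std::atomic<mcs_qnode*> tail{nullptr};
};

// Returns the predecessor node (nullptr if the lock was free).
inline mcs_qnode* mcs_lock_acquire(mcs_lock* lock, mcs_qnode* me) {
    assert(lock != nullptr && me != nullptr);
    me->next.store(nullptr, std::memory_order_relaxed);
    me->waiting.store(true, std::memory_order_relaxed);
    // The swap publishes the initialized node and, when the lock was free,
    // synchronizes with the previous holder's release.
    mcs_qnode* pred = lock->tail.exchange(me, std::memory_order_acq_rel);
    if (pred) {
        pred->next.store(me, std::memory_order_release);
        while (me->waiting.load(std::memory_order_acquire)) {
            // spin until the predecessor hands the lock over
        }
    }
    return pred;
}

inline void mcs_lock_release(mcs_lock* lock, mcs_qnode* me) {
    mcs_qnode* succ = me->next.load(std::memory_order_acquire);
    if (!succ) {
        // No visible successor: try to swing tail back to empty.
        mcs_qnode* expected = me;
        if (lock->tail.compare_exchange_strong(expected, nullptr,
                                               std::memory_order_release,
                                               std::memory_order_relaxed))
            return;
        // A successor swapped itself in but has not linked yet; wait for it.
        while (!(succ = me->next.load(std::memory_order_acquire))) {
        }
    }
    succ->waiting.store(false, std::memory_order_release);
}
```

The spin loop on `waiting` corresponds to the `while` loop the comment wants to hand-tune; how well a given compiler schedules it is not something this sketch can show.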
[Bug inline-asm/40124] Inline asm should support limited control flow
--- Comment #7 from scovich at gmail dot com 2009-05-12 17:01 --- Isn't __builtin_expect a way to send branch prediction hints? I'm not having trouble with that AFAIK. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124
[Bug middle-end/37722] destructors not called on computed goto
--- Comment #2 from scovich at gmail dot com 2009-05-09 08:16 ---
Computed gotos can easily make it impossible for the compiler to call constructors and destructors consistently. This is a major gotcha of computed gotos for people who have used normal gotos in C++ and expect destructors to be handled properly. Consider this program, for instance:

#include <stdio.h>
template<int i>
struct foo {
    foo() { printf("%s%d\n", __FUNCTION__, i); }
    ~foo() { printf("%s%d\n", __FUNCTION__, i); }
};
enum {RETRY, INSIDE, OUTSIDE, EVIL};
int bar(int idx) {
    static void* const gotos[] = {&&RETRY, &&INSIDE, &&OUTSIDE, &&EVIL};
    bool first = true;
    {
    RETRY:
        foo<1> f1;
        if(first) {
            first = false;
            goto *gotos[idx];
        }
    INSIDE:
        return 1;
    }
    if(0) {
        foo<2> f2;
    EVIL:
        return 2;
    }
 OUTSIDE:
    return 0;
}
int main() {
    for(int i=RETRY; i <= EVIL; i++)
        printf("%d\n", bar(i));
    return 0;
}

Not only does it let you jump out of a block without calling destructors, it lets you jump into one without calling constructors:

$ g++-4.4.0 -Wall -O3 scratch.cpp && ./a.out
foo1
foo1
~foo1
1
foo1
~foo1
1
foo1
0
foo1
~foo2
2

Ideally, the compiler could analyze possible destinations of the goto (best-effort, of course) and emit suitable diagnostics:

scratch.cpp:16: warning: computed goto bypasses destructor of 'foo<1> f1'
scratch.cpp:13: warning:   declared here
scratch.cpp:23: warning: possible jump to label 'EVIL'
scratch.cpp:16: warning:   from here
scratch.cpp:22: warning:   crosses initialization of 'foo<2> f2'

In this particular example the compiler should be able to figure out that no labels reach a live f1 and call its destructor properly.
If it's not feasible to analyze the possible destinations of the computed goto, regular control flow analysis should at least be able to identify potentially dangerous labels and gotos, e.g.:

scratch.cpp:16: warning: computed goto may bypass destructor of 'foo<1> f1'
scratch.cpp:13: warning:   declared here
scratch.cpp:23: warning: jump to label 'EVIL'
scratch.cpp:8: warning:   using a computed goto
scratch.cpp:22: warning:   may cross initialization of 'foo<2> f2'

--
scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |scovich at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37722
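
For contrast, this is the behavior people coming from ordinary gotos expect: leaving a scope with a plain `goto` destroys the automatic objects in that scope. The sketch below is not from the bug report; `tracer`, `live` and `leave_with_goto` are hypothetical names used only to make the effect observable.

```cpp
#include <cassert>

static int live = 0;  // number of currently constructed tracer objects

// Hypothetical instrumented type: counts how many instances are alive.
struct tracer {
    tracer()  { ++live; }
    ~tracer() { --live; }
};

// Leaves the inner block either by falling off its end or with a plain
// goto. In both cases the language requires t to be destroyed on the way
// out, so the function returns 0.
static int leave_with_goto(bool jump) {
    {
        tracer t;
        assert(live == 1);
        if (jump)
            goto outside;  // destructor of t runs as part of this jump
    }
outside:
    return live;
}
```

Replacing the plain `goto` with a computed one is what loses this guarantee in the program above.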
[Bug c/40067] New: gcc should use brz(brnz) instead of cmp/be(bne) when possible
Compiling the following function with -O3 gives the following assembly output:

void spin(int volatile* ptr) {
    while(*ptr);
    return;
}

spin:
.LLFB1:
        .register       %g2, #scratch
        lduw    [%o0], %g1      ! 8     *zero_extendsidi2_insn_sp64/2   [length = 1]
        cmp     %g1, 0          ! 9     *cmpsi_insn     [length = 1]
        be,pn   %icc, .LL3      ! 10    *normal_branch  [length = 1]
         mov    0, %g1          ! 17    *movdi_insn_sp64/1      [length = 1]
.LL6:
        lduw    [%o0], %g2      ! 20    *zero_extendsidi2_insn_sp64/2   [length = 1]
        cmp     %g2, 0          ! 22    *cmpsi_insn     [length = 1]
        bne,pt  %icc, .LL6      ! 23    *normal_branch  [length = 1]
         add    %g1, 1, %g1     ! 19    *adddi3_sp64/1  [length = 1]
.LL3:
        jmp     %o7+8           ! 55    *return_internal        [length = 1]
         mov    %g1, %o0        ! 30    *movdi_insn_sp64/1      [length = 1]

Manually replacing the cmp/b* pairs with br* instructions gives 10-11% more iterations/sec on my machine:

        .global spin_brz
spin_brz:
        .register       %g2, #scratch
        ld      [%o0], %g1
        brz,pn  %g1, spin_brz_done
         clr    %g1
spin_brz_again:
        ld      [%o0], %g2
        brnz,pt %g2, spin_brz_again
         add    %g1, 0x1, %g1
spin_brz_done:
        retl
         mov    %g1, %o0
        .size   spin_brz, .- spin_brz

--
           Summary: gcc should use brz(brnz) instead of cmp/be(bne) when possible
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com
GCC target triplet: sparc-sun-solaris2.10

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40067
[Bug middle-end/40067] gcc should use brz(brnz) instead of cmp/be(bne) when possible
--- Comment #1 from scovich at gmail dot com 2009-05-08 09:38 ---
Sorry, the C code should have been:

long spin(int volatile* ptr) {
    long rval=0;
    while(*ptr)
        rval++;
    return rval;
}

--
scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |middle-end
            Version|unknown                     |4.4.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40067
[Bug target/40067] use brz instead of cmp/be with 32-bit values
--- Comment #3 from scovich at gmail dot com 2009-05-08 11:30 ---
>            What    |Removed                     |Added
> GCC target triplet|sparc-sun-solaris2.10       |sparc64-sun-solaris2.10

I think this affects 32-bit sparc as well, unless the br* instructions are new in sparcv9 (they don't seem to be). The only difference with v9 seems to be that 32-bit code needs to use ldsw to sign-extend if needed.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40067
[Bug middle-end/40078] New: passing label to inline asm i constraint generates bad code
Somewhat to my surprise, gcc accepts the following inline asm syntax:

asm("jmp %0" : : "i"(&&some_label));

The output is what you'd expect: assuming some_label (in C/C++) is associated with the assembler label .LLBF4 gives:

jmp .LLBF4

Unfortunately, the optimizer plays havoc with things by happily eliminating the code associated with that label if it is otherwise unused. Consider the following code:

static inline int foo() { return 10; }
int could_be_anything;
long test_label(int volatile* ptr) {
    int rval = 0;
    int dummy = 5;
    static void* const gotos[2] = {&&DONE, &&ERROR};
    asm volatile("jmp %0\n\t nop" : : "i"(&&ERROR), "i"(foo), "r"(dummy));
    //goto *gotos[could_be_anything];
 DONE:
    return rval;
 ERROR:
    rval = 1;
    goto DONE;
}

This function should return 1 after jumping from ERROR to DONE. Instead, the code for ERROR is eliminated by the optimizer; you either get a return value of zero or an infinite loop depending on whether the label started below or above the asm block (I get the latter):

foo:
        jmp     %o7+8
         mov    10, %o0
        .size   foo, .-foo
test_label:
.LL4:
.LL5:
        mov     5, %g1
! 8 "test_label.c" 1
        jmp .LL4
         nop
! 0 "" 2
        jmp     %o7+8
         mov    0, %o0
        .size   test_label, .-test_label

The compiler correctly recognizes that both dummy and foo are in use and does not eliminate them, but ERROR gets the axe unless the computed goto is enabled.

In theory the inconsistency should be easy to fix -- just mark the label as in-use if it gets passed to a live inline asm block (exactly how functions and variables are currently treated). If for some reason that's impossible or undesirable, it should at least generate a diagnostic.

--
           Summary: passing label to inline asm i constraint generates bad code
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com
GCC target triplet: sparc-sun-solaris2.10

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40078
[Bug middle-end/40078] passing label to inline asm i constraint generates bad code
--- Comment #2 from scovich at gmail dot com 2009-05-08 23:24 ---
Sorry to bring this back up, but I'm not sure if comments show up in a meaningful way on closed bugs...

1. Where is it documented that inline asm can't change control flow? I can't find it in the info pages, nor anywhere in google except this bug and another which was also resolved-invalid with a comment that it's clearly documented. The docs say you can do control flow within a single asm (if your assembler supports local labels), but the only other mention is the part that says you can't jump between asm blocks because the compiler has no way to know that you did it.

2. It makes sense that anything related to stack frames (ret, call) would be a snake pit, but is there some reason why local gotos are inherently unsafe? Unlike an asm-asm jump, the compiler knows all the places control might go (you can only jump out once, after all), and presumably users wouldn't pass in labels they don't intend to use. It seems like the compiler could just treat the asm block accepting labels as a basic block containing a computed goto -- control could fall out the bottom of the block or jump to any of the labels which were passed in.

3. Supporting local gotos would help work around the annoyance of getting condition codes out of an asm block efficiently -- pass in the label to a branch instruction and voila!

In any case, I'm happy to accept a "go away", but would be extremely interested to hear the reasons behind this limitation given that it seems so close to working by accident.

--
scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40078
[Bug libgomp/29986] testsuite failures
--- Comment #4 from scovich at gmail dot com 2008-04-09 15:18 ---
If it's any help, adding some inline asm to the file makes the Sun toolchain croak on my machine.

    SunOS 5.10 Generic_118833-23 sun4v sparc SUNW,Sun-Fire-T200
    Sun C 5.9 SunOS_sparc Patch 124867-01 2007/07/12
    Solaris Link Editors: 5.10-1.482

    // begin tls-bug.c
    void membar_producer() { asm volatile("membar #StoreStore"); }
    static __thread bool val;
    int main() { return val; }
    // end tls-bug.c

This bug seems to show up in arbitrary ways for each of the three compilers on my machine:

    $ cc -V
    cc: Sun C 5.9 SunOS_sparc Patch 124867-01 2007/07/12

    $ gcc -v
    Reading specs from /usr/sfw/lib/gcc/sparc-sun-solaris2.10/3.4.3/specs
    Configured with: /gates/sfw10/builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure --prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++ --enable-shared
    Thread model: posix
    gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

    $ ~/apps/gcc/4.3/bin/gcc-4.3 -v
    Using built-in specs.
    Target: sparc64-sun-solaris2.10
    Configured with: ../configure --prefix=/export/home/ryanjohn/apps/gcc/4.3 --build=sparc64-sun-solaris2.10 --program-suffix=-4.3 --with-mpfr=/export/home/ryanjohn/apps --with-gmp=/export/home/ryanjohn/apps --disable-multilib --with-as=/usr/ccs/bin/as --without-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld
    Thread model: posix
    gcc version 4.3.0 (GCC)

Note that all three use the same copy of ld.

    $ cc tls-bug.c
    $ cc -g tls-bug.c
    $ CC tls-bug.c
    ld: fatal: relocation error: R_SPARC_TLS_GD_HI22: file tls-bug.o: symbol <unknown>: bad symbol type SECT: symbol type must be TLS
    $ CC -g tls-bug.c
    $ gcc -m64 tls-bug.c
    $ gcc -m64 -g tls-bug.c
    ld: fatal: relocation error: R_SPARC_TLS_DTPOFF64: file /var/tmp//ccuJHWqp.o: symbol done: offset 0x7d901c33 is non-aligned
    collect2: ld returned 1 exit status
    $ gcc-4.3 tls-bug.c
    ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file /var/tmp//ccUeK1AZ.o: symbol <unknown>: bad symbol type SECT: symbol type must be TLS
    collect2: ld returned 1 exit status
    $ gcc-4.3 tls-bug.c -g
    ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file /var/tmp//cceRP4ZP.o: symbol <unknown>: bad symbol type SECT: symbol type must be TLS
    collect2: ld returned 1 exit status

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29986
[Bug c++/34184] Scope broken for inherited members inside template class?
--- Comment #4 from scovich at gmail dot com 2007-12-11 17:27 ---
(In reply to comment #3)
> Note you can declare a specialization of foo<T>::bar which shows that the code
> is really dependent.

Duh! That's perfect. Thanks.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34184
[Bug target/34115] atomic builtins not supported on i686?
--- Comment #9 from scovich at gmail dot com 2007-11-28 14:20 ---
(In reply to comment #8)
> (In reply to comment #7)
> > Too bad they aren't defined for any machine I've tried so far...
>
> The explanation is very simple: the new macros are implemented only in mainline
> (would be 4.3.0).

Any chance of backporting? (I know, probably not)

The only question left is whether the compiler is supposed to emit a warning when it doesn't support the intrinsics (like the docs say) or whether the user should just be ready for linker errors.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115
[Bug target/34115] atomic builtins not supported on i686?
--- Comment #7 from scovich at gmail dot com 2007-11-28 01:56 ---
(In reply to comment #2)
> I think this is essentially invalid. Note that now we also have the various
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros:
>
> http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html

Too bad they aren't defined for any machine I've tried so far...

    ia64-linux-gnu (4.1.2 Debian)
    x86_64-unknown-linux-gnu (4.2.0)
    sparc-sun-solaris2.10 (4.1.1)
    powerpc64-unknown-linux-gnu (4.1.2 Gentoo)
    i686-pc-cygwin (4.2.2)

All these actually *do* support CAS, and emit perfectly respectable .asm... as long as you don't wrap them in any #ifdef's.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115
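For readers who want to see what wrapping the builtin in one of those macros looks like, here is a rough sketch. The macro `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4` and the builtin `__sync_val_compare_and_swap` are the ones named in this thread; `cas_int` and `kNativeCas` are hypothetical names, and the sketch assumes a 4-byte int. The fallback branch is deliberately NOT atomic and only exists so the file compiles on compilers that do not define the macro.

```cpp
// Sketch: use the builtin only when the compiler advertises a native
// 4-byte compare-and-swap. cas_int and kNativeCas are made-up names.
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
static const bool kNativeCas = true;

// Returns the value *p held before the operation.
inline int cas_int(int* p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}
#else
static const bool kNativeCas = false;

// Placeholder with the same return convention. NOT atomic: real code
// would need a lock or a hand-written primitive here.
inline int cas_int(int* p, int expected, int desired)
{
    int old = *p;
    if (old == expected)
        *p = desired;
    return old;
}
#endif
```

Checking `kNativeCas` at run time (or erroring out in the `#else` branch instead) is one way to notice the situation described above, where the hardware supports CAS but the macro is not defined.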
[Bug c++/34184] Scope broken for inherited members inside template class?
--- Comment #2 from scovich at gmail dot com 2007-11-23 02:06 ---
Subject: Re: Scope broken for inherited members inside template class?

On 22 Nov 2007 21:03:11, pinskia at gcc dot gnu dot org wrote:
> The issue comes down to if bar is dependent here and if so is baz's base. The
> namelookup rules for being dependent are weird and hard to understand really
> and actually changes namelookup in some cases so we have to go what the
> standard says.

Somehow I'm not surprised the standard is confusing on this point... my (unqualified) opinion is that bar and baz don't depend on foo's template parameter. While foo<int>::bar is certainly not the same class as foo<char>::bar, bar's relationship to baz is always the same for any instantiation -- part of the template itself.

Imagine that, instead of declaring bar and baz inside template<class T> foo, we put them inside an anonymous namespace. What aspect of name lookup has changed so that baz can suddenly find 'bar::i' without hand-holding?

I just found an interesting phenomenon -- if baz declares it is 'using bar::i', life is suddenly good. Apparently the compiler is able to determine that bar is, in fact, a parent class of baz, and that the using clause is valid. However, it doesn't notice a problem with 'using bar::j' until you actually instantiate a foo<*>::baz.

Given that the compiler does know bar is a base class of baz, shouldn't any name lookup within baz check for matching members of bar?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34184
[Bug c++/34184] New: Scope broken for inherited members inside template class?
The following code fails to compile:

    this.cpp: In member function 'int foo<T>::baz::foo()':
    this.cpp:8: error: 'i' was not declared in this scope

    // begin this.cpp
    template <class T>
    struct foo {
        struct bar {
            int i;
        };
        struct baz : bar {
            int foo() { return i; }
        };
    };
    int main() { }
    // end this.cpp

Changing it to 'this->i' solves the problem, but is unwieldy for classes with lots of members.

I'm unsure what the Standard says, but I thought you only needed 'this->' when the member depends on information the compiler won't have until template instantiation time. However, that doesn't really apply here -- foo and bar do not depend on the template's type, so the compiler should be able to figure things out well before the template gets instantiated.

FWIW Sun's CC accepts the code with no warnings. It's usually much more strict than gcc (to the point of being really frustrating). Even if the Standard says gcc is right, it would be very convenient if gcc matched CC on this extension.

--
Summary: Scope broken for inherited members inside template class?
Product: gcc
Version: 4.2.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34184
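The two workarounds mentioned in this thread (qualifying the member with this->, or adding a using-declaration) can be sketched side by side. The struct names baz_this and baz_using and the member function get() are made up for the illustration; the rest follows the shape of the test case.

```cpp
// Sketch of the two workarounds for looking up a member of a
// dependent base class. baz_this, baz_using and get() are
// hypothetical names, not part of the original test case.
template <class T>
struct foo {
    struct bar {
        int i;
    };

    // Workaround 1: this-> makes the name dependent, so lookup is
    // deferred until the template is instantiated.
    struct baz_this : bar {
        int get() { return this->i; }
    };

    // Workaround 2: a using-declaration brings bar::i into baz's own
    // scope, after which the unqualified name is found.
    struct baz_using : bar {
        using bar::i;
        int get() { return i; }
    };
};
```

The using-declaration costs one line per inherited member, so it is not obviously less unwieldy than this-> for classes with many members, but it keeps the function bodies unchanged.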
[Bug c/34115] New: atomic builtins not supported on i686?
Linking fails for the program below, with the error:

    undefined reference to `___sync_val_compare_and_swap_4'

    // gcc -Wall atomic.c
    int main() {
        int *a, b, c;
        return __sync_val_compare_and_swap(a, b, c);
    }

According to the atomic builtins docs:

    "Not all operations are supported by all target processors. If a particular operation cannot be implemented on the target processor, a warning will be generated and a call an external function will be generated. The external function will carry the same name as the builtin, with an additional suffix `_n' where n is the size of the data type."

If CAS is not supported, how come I don't get a warning? Why would i686 *not* support compare and swap? The cmpxchg instruction has been around since 80486, according to the intel IA-32 processor manual.

Also, does an unsupported builtin mean the user is responsible to write that function, or simply that the compiler must make a function call to synthesize its behavior?

FWIW, my x86_64 cross-compile gcc 4.2.0 handles it fine, emitting a lock+cmpxchg pair.

--
Summary: atomic builtins not supported on i686?
Product: gcc
Version: 4.2.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC host triplet: i686-pc-cygwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115
[Bug target/34115] atomic builtins not supported on i686?
--- Comment #5 from scovich at gmail dot com 2007-11-16 01:00 ---
Subject: Re: atomic builtins not supported on i686?

On 15 Nov 2007 23:53:06, joseph at codesourcery dot com wrote:
> Because the default arch for i686-linux-gnu is i386. Which is a stupid
> inconsistency and arguably a bug.

++

BTW, -march=i686 works beautifully. Close the bug? Or rename it as an RFE to have i686-* default to -march=i686?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115
[Bug target/34115] atomic builtins not supported on i686?
--- Comment #6 from scovich at gmail dot com 2007-11-16 01:04 ---
(In reply to comment #5)
> Subject: Re: atomic builtins not supported on i686?
>
> On 15 Nov 2007 23:53:06, joseph at codesourcery dot com wrote:
> > Because the default arch for i686-linux-gnu is i386. Which is a stupid
> > inconsistency and arguably a bug.
>
> ++
>
> BTW, -march=i686 works beautifully. Close the bug? or rename it as a RFE to
> have i686-* default to -march=i686?

Oh, and is there supposed to be a warning about unsupported atomic ops or not? If not, the docs should say to expect a linker error instead (and also mention/link those macros Paolo pointed out).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115
[Bug debug/32990] [Regression] gdb has symbol table issues
--- Comment #7 from scovich at gmail dot com 2007-08-13 21:10 ---
(In reply to comment #6)
> Sorry, my mistake. I meant readelf -wi (lowercase I).

Unfortunately, I recompiled with 4.1 to get on with debugging, and also updated to 20070810 later that day. Now the bug won't cooperate and show up any more. Maybe the changes over the last three weeks fixed the problem?

Also unfortunately, I will lose access to the code once my internship ends this week. It might be best to close this bug or leave it in WAITING as a placeholder in case anyone else sees the same thing in an easier-to-replicate context...

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990
[Bug debug/32990] [Regression] gdb has symbol table issues
--- Comment #3 from scovich at gmail dot com 2007-08-10 16:20 --- Created an attachment (id=14050) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14050action=view) Output of readelf -wf I'm attaching the output of `readelf -wf' This time around some of offending PC are 0x41ac8c, 0x41bc1c, 0x41bc2d, 0x41bc44, 0x41bc45, 0x41bc55, 0x41bc56, 0x41bc63, 0x41bc64. Also in case it helps, `readelf -a' prints the following warning/error messages: readelf: Warning: There is a hole [0xe1fc - 0xe238] in .debug_loc section. readelf: Warning: There is a hole [0x100dc - 0x10118] in .debug_loc section. readelf: Warning: There is a hole [0x13860 - 0x1389c] in .debug_loc section. readelf: Warning: There is a hole [0x138ac - 0x138e8] in .debug_loc section. readelf: Warning: There is a hole [0x13c3c - 0x13c78] in .debug_loc section. readelf: Warning: There is a hole [0x13f34 - 0x13f70] in .debug_loc section. readelf: Warning: There is a hole [0x13f80 - 0x13fbc] in .debug_loc section. readelf: Warning: There is a hole [0x14148 - 0x14184] in .debug_loc section. readelf: Warning: There is a hole [0x15908 - 0x15944] in .debug_loc section. readelf: Warning: There is a hole [0x16618 - 0x16654] in .debug_loc section. readelf: Warning: There is a hole [0x17f54 - 0x17f90] in .debug_loc section. readelf: Warning: There is a hole [0x17fec - 0x18028] in .debug_loc section. readelf: Warning: There is a hole [0x1824c - 0x18288] in .debug_loc section. readelf: Warning: There is a hole [0x184ac - 0x184e8] in .debug_loc section. readelf: Warning: There is a hole [0x18590 - 0x185cc] in .debug_loc section. readelf: Warning: There is a hole [0x22a08 - 0x22a44] in .debug_loc section. readelf: Warning: There is a hole [0x232f0 - 0x2332c] in .debug_loc section. readelf: Warning: There is a hole [0x26944 - 0x26980] in .debug_loc section. readelf: Warning: There is a hole [0x29320 - 0x2935c] in .debug_loc section. readelf: Warning: There is a hole [0x29878 - 0x298b4] in .debug_loc section. 
readelf: Warning: There is a hole [0x29910 - 0x2994c] in .debug_loc section. readelf: Error: Range lists in .debug_info section aren't in ascending order! readelf: Warning: There is a hole [0x50 - 0xb0] in .debug_ranges section. readelf: Warning: There is an overlap [0x2fe0 - 0x50] in .debug_ranges section. readelf: Warning: There is a hole [0xb0 - 0x3010] in .debug_ranges section. readelf: Warning: There is an overlap [0x30b0 - 0x2fe0] in .debug_ranges section. readelf: Warning: There is a hole [0x3010 - 0x56e0] in .debug_ranges section. readelf: Warning: There is a hole [0x7610 - 0x76d0] in .debug_ranges section. readelf: Warning: There is an overlap [0x7700 - 0x7610] in .debug_ranges section. readelf: Warning: There is a hole [0x76d0 - 0x9b40] in .debug_ranges section. readelf: Warning: There is an overlap [0xd700 - 0x9a20] in .debug_ranges section. readelf: Warning: There is a hole [0x9b40 - 0xd700] in .debug_ranges section. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990
[Bug debug/32990] [Regression] gdb has symbol table issues
--- Comment #5 from scovich at gmail dot com 2007-08-10 16:50 ---
Murphy strikes again -- 5 minutes after closing this bug it popped back up in spite of a clean compile. Apparently `make clean' can change which PC causes complaints but doesn't necessarily fix the problem.

--
scovich at gmail dot com changed:

    What       |Removed   |Added
    Status     |RESOLVED  |UNCONFIRMED
    Resolution |INVALID   |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990
[Bug debug/32990] [Regression] gdb has symbol table issues
--- Comment #4 from scovich at gmail dot com 2007-08-10 16:39 ---
The problem comes from adding a member function to a header file and only recompiling some of the source files that include it (make depend missed something). It looked like a regression because changing versions of gcc required a clean recompile.

--
scovich at gmail dot com changed:

    What       |Removed   |Added
    Status     |WAITING   |RESOLVED
    Resolution |          |INVALID

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990
[Bug c++/32990] New: [Regression] gdb has symbol table issues
When debugging code produced by g++-4.3.0-20070716 the debugger regularly outputs the following error message when stopping at breakpoints or examining stack frames: error: warning: (Internal error: pc 0x419e59 in read in psymtab, but not in symtab.) Compiling the same code with g++-4.1.2 and running the same breakpoints results in no problems. I'm using gdb-6.6-debian, if that's any help. Unfortunately I have no idea how to narrow the test case down, and am not allowed to submit my program (it's from work). Some searching on Google indicates that .linkonce functions might be part of the issue (I have tons of those), but other than that I'm at a loss to narrow down the problem. If anyone has ideas I'm happy to try them out and report back. -- Summary: [Regression] gdb has symbol table issues Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: scovich at gmail dot com GCC target triplet: x86_64-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990
[Bug c++/32992] New: Incorrect code generated for anonymous union
Compiling and running the code below produces the following output:

    $ g++ -Wall -DT=long -g3 union-bug.C && ./a.out
    array= 0x7fff11782ef0
    a= 0x7fff11782f20
    B={1,3,5}
    A={-1719443200,4196007,-1719451616}

A and B should contain the same values, but A contains garbage instead because the two members of the union do not reside at the same address. Changing T to 'int' produces the expected output:

    $ g++ -Wall -DT=int -g3 union-bug.C && ./a.out
    array= 0x7fff1fbf6370
    a= 0x7fff1fbf6370
    B={1,3,5}
    A={1,3,5}

'char' and 'short' also work correctly; '__m128i' (SSE register) breaks. This bug affects both gcc-4.1.2 and gcc-4.3.

    // union-bug.C
    #include <cstdio>
    struct A {
        T _a;
        T _b;
        T _c;
    };
    struct B {
        T _array[3];
        operator A() {
            union {
                T array[3];
                A a;
            };
            printf("array= %p\na= %p\n", array, &a);
            for(int i=0; i < 3; i++)
                array[i] = _array[i];
            return a;
        }
    };
    int main() {
        B b = {{1,3,5}};
        A a = b;
        printf("B={%d,%d,%d}\n", (int) b._array[0], (int) b._array[1], (int) b._array[2]);
        printf("A={%d,%d,%d}\n", (int) a._a, (int) a._b, (int) a._c);
        return 0;
    }

--
Summary: Incorrect code generated for anonymous union
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32992
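As a side note, the same array-to-struct conversion can be written without a union at all, which sidesteps the question of where the compiler places the union members. The following is a sketch under stated assumptions, not the code from the report: T is fixed to long (mirroring -DT=long), and it assumes A has no padding, which the static_assert checks.

```cpp
#include <cstring>

typedef long T;   // assumption: mirrors -DT=long from the report

struct A {
    T _a;
    T _b;
    T _c;
};

struct B {
    T _array[3];

    // Copy the three elements into an A byte-for-byte instead of
    // reading them back through a different union member.
    operator A() const {
        static_assert(sizeof(A) == sizeof(T[3]), "A must have no padding");
        A a;
        std::memcpy(&a, _array, sizeof a);
        return a;
    }
};
```

With this version, `B b = {{1,3,5}}; A a = b;` leaves `a._a`, `a._b` and `a._c` equal to 1, 3 and 5. It does not explain the miscompilation in the report; it only avoids depending on it.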
[Bug c++/32912] New: [Regression] ICE with C99 compound literal expression
Compiling the test case below gives the following ICE:

    bug.C: In function 'void bar()':
    bug.C:30: internal compiler error: in build_int_cst_wide, at tree.c:890
    Please submit a full bug report,

I think this might be related to bug 20103, except that gcc-4.1 handles the test case just fine. The test case also compiles in 4.3 with -O{0,1,s} instead of -O{2,3}. -ftree-pre is the culprit flag -- removing it from -O2 or adding it to -O1 toggles the bug.

    // g++-4.3-20070716 -msse3 -O2 bug.C
    #include <emmintrin.h>
    // Must be a vector, not a scalar
    #if 0
    typedef long v2d;
    #else
    typedef __m128i v2d;
    #endif
    v2d rval();
    v2d g;
    struct A { // Must have 2+ members
        v2d a;
        v2d b;
    };
    struct B { // Need a struct containing an A
        A a;
    };
    struct C {
        operator A() {
            v2d l;
            A a;
            // Must compute (a ^ ~b). Neither (a ^ b) nor (a + ~b) breaks.
            a.a ^= ~a.b;
            // globals, locals, and rvals don't break
            a.a ^= ~(v2d) {0,0};
            // members and compound literals do
            return a;
        }
    };
    void foo(B);
    void bar() { foo((B){C()}); }

--
Summary: [Regression] ICE with C99 compound literal expression
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32912
[Bug rtl-optimization/32725] Unnecessary reg-reg moves
--- Comment #6 from scovich at gmail dot com 2007-07-26 22:51 ---
I've observed several more pieces of code where this bug comes up, and it always seems to be a case of (a) the compiler duplicating a register to preserve the value after a 2-operand insn clobbers the original, then (b) later failing to notice that the other use(s) got optimized away, never existed, or were reads that got scheduled before the clobber.

Perhaps a register renaming pass later in the compilation process might solve the issue? (I don't know how much that would slow down compilation, though.)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32725
[Bug c++/32870] New: Unclear error message when declaring struct in wrong namespace
Compiling this code: struct Foo { struct Bar; Foo(); }; namespace Baz { Foo::Foo() { } struct Foo::Bar { }; } Gives the following two error messages: test.C:6: error: definition of 'void Foo::Foo()' is not in namespace enclosing 'Foo' test.C:7: error: declaration of 'struct Foo::Bar' in 'Baz' which does not enclose 'struct Foo' The first error is nice and clear; the second would be much easier to understand quickly if it also identified 'Baz' as a namespace. Note: this bug dates back at least as far as g++-3.4.4 (tested on 3.4.4, 4.1.2, 4.2.0 and 4.3-20070716) -- Summary: Unclear error message when declaring struct in wrong namespace Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: trivial Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: scovich at gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32870
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #2 from scovich at gmail dot com 2007-07-11 15:03 ---
(In reply to comment #1)
> Confirmed, not a regression.

Also affects 4.3. Changing target.

--
scovich at gmail dot com changed:

    What    |Removed |Added
    Version |4.1.2   |4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #3 from scovich at gmail dot com 2007-07-11 15:10 ---
This bug also causes _mm_cvtsi128_si64x() (which calls __builtin_ia32_vec_ext_v2di) to emit suboptimal code.

    // g++-4.3-070710 -mtune=core2 -O3 -S -dp
    #include <emmintrin.h>
    long vector2long(__m128i* src) { return _mm_cvtsi128_si64x(*src); }

Becomes

    _Z11vector2longPU8__vectorx:
    .LFB529:
        movdqa  (%rdi), %xmm0   # 6   *movv2di_internal/2 [length = 3]
        movd    %xmm0, %rax     # 25  *movdi_1_rex64/14   [length = 4]
        ret                     # 28  return_internal     [length = 1]

This might be related to bug 32708 (and therefore have a similar fix?)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug middle-end/32729] New: Loop unrolling not performed with large constant loop bound
Consider the following functions:

    // g++ -mtune=core2 -O3 -S -dp
    void loop(int* dest, int* src, int count) {
        for(int i=0; i < count; i++)
            dest[i] = src[i];
    }
    void loop_few(int* dest, int* src) { loop(dest, src, 8); }
    void loop_many(int* dest, int* src) { loop(dest, src, 64); }

loop() unrolls 8x, as expected. loop_few() peels completely, as expected. However, loop_many() neither peels nor unrolls.

    _Z9loop_manyPiS_:
        xorl    %edx, %edx          # 34  *movdi_xor_rex64       [length = 2]
    .L47:
        movl    (%rsi,%rdx,4), %eax # 11  *movsi_1/1             [length = 3]
        movl    %eax, (%rdi,%rdx,4) # 12  *movsi_1/2             [length = 3]
        incq    %rdx                # 13  *adddi_1_rex64/1       [length = 3]
        cmpq    $64, %rdx           # 15  cmpdi_1_insn_rex64/1   [length = 4]
        jne     .L47                # 16  *jcc_1                 [length = 2]
        rep ; ret                   # 35  return_internal_long   [length = 1]

Ideally the optimizer would unroll 8x, then notice that (count%8==0) and eliminate the partial unroll code. However, even a stock unroll would be better than nothing.

--
Summary: Loop unrolling not performed with large constant loop bound
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729
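To make the "unroll 8x, then drop the partial-unroll code when count%8==0" idea concrete, here is a hand-written source-level equivalent. It is only a sketch of the transformation the report hopes the optimizer would perform; copy_unrolled is a hypothetical name and nothing here is GCC output.

```cpp
// Hand-unrolled copy: a source-level picture of an 8x unroll plus a
// remainder loop. copy_unrolled is a made-up name for illustration.
void copy_unrolled(int* dest, const int* src, int count)
{
    int i = 0;

    // Main body: eight copies per iteration while at least eight
    // elements remain.
    for (; count - i >= 8; i += 8) {
        dest[i + 0] = src[i + 0];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
        dest[i + 4] = src[i + 4];
        dest[i + 5] = src[i + 5];
        dest[i + 6] = src[i + 6];
        dest[i + 7] = src[i + 7];
    }

    // Remainder: at most seven leftover elements. When count is a
    // known multiple of 8 (such as 64), this loop never executes and
    // could be removed entirely.
    for (; i < count; i++)
        dest[i] = src[i];
}
```

With a constant count of 64 inlined, the first loop runs exactly eight times and the second is dead, which is the simplification the report describes.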
[Bug middle-end/32729] Loop unrolling not performed with large constant loop bound
--- Comment #1 from scovich at gmail dot com 2007-07-11 16:36 ---
(In reply to comment #0)
> // g++ -mtune=core2 -O3 -S -dp

Oops... that doesn't actually unroll loop() at all, though it still peels loop_few().

Adding -funroll-loops (supposedly enabled by -O3?) unrolls loop().
Adding -funroll-all-loops does nothing.

Nested loops also have issues:

    void nested_loop(int* dest, int* src) {
        for(int i=0; i < 2; i++)
            for(int j=0; j < 2; j++)
                dest[4*i+j] = src[4*j+i];
    }

becomes

    _Z11nested_loopPiS_:
    .LFB533:
        xorl    %edx, %edx           # 39  *movdi_xor_rex64       [length = 2]
    .L47:
        movl    (%rsi), %ecx         # 13  *movsi_1/1             [length = 2]
        movl    %ecx, (%rdi,%rdx,4)  # 14  *movsi_1/2             [length = 3]
        movl    16(%rsi), %eax       # 15  *movsi_1/1             [length = 3]
        addq    $4, %rsi             # 17  *adddi_1_rex64/1       [length = 4]
        movl    %eax, 4(%rdi,%rdx,4) # 16  *movsi_1/2             [length = 4]
        addq    $4, %rdx             # 18  *adddi_1_rex64/1       [length = 4]
        cmpq    $8, %rdx             # 20  cmpdi_1_insn_rex64/1   [length = 4]
        jne     .L47                 # 21  *jcc_1                 [length = 2]
        rep ; ret                    # 40  return_internal_long   [length = 1]

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729
[Bug middle-end/32729] Regression: Loop unrolling not performed with large constant loop bound
--- Comment #2 from scovich at gmail dot com 2007-07-11 16:37 ---
Regression: gcc-4.1.2 outputs the expected code for all test cases.

--
scovich at gmail dot com changed:

    What    |Removed                                                         |Added
    Summary |Loop unrolling not performed with large constant loop bound     |Regression: Loop unrolling not performed with large constant loop bound

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #6 from scovich at gmail dot com 2007-07-11 20:27 ---
(In reply to comment #5)
> SImode moves will be a bit harder, because shufps insn pattern is involved in
> the vector expansion.

IIRC, shufps takes 3 cycles on Core2 (http://www.agner.org/optimize/instruction_tables.pdf), even without the operand type mismatch (does that still exist?). That's >=4 cycles. Storing the vector to the stack and loading the desired entry would take <=4 cycles, even without Intel's store-load optimizations, and I imagine the optimizer would be able to deal with it better.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug middle-end/32725] New: Unnecessary reg-reg moves
Compiling the following code

    // g++-4.3-070710 -O3 -msse3 -mtune=core2 -S
    #include <emmintrin.h>
    typedef unsigned long long u64;
    void foo(int* dest, unsigned short* src, long* indexes,
             __m128i _m1, __m128i _e, __m128i _m2) {
        // required by the API, and makes the bug worse
        u64 e = _mm_cvtsi128_si64x(_e);
        u64 m1 = _mm_cvtsi128_si64x(_m1);
        u64 m2 = _mm_cvtsi128_si64x(_m2);
        for(long i=0; i < 3; i++) {
            u64 data = src[indexes[i]];
            __uint128_t result = (__uint128_t) (data & m1) * e;
            dest[i] = (result >> 64) & m2;
        }
    }

Produces redundant reg-reg moves

    _Z3fooPiPtPlU8__vectorxS2_S2_:
    .LFB527:
        pushq   %rbx
    .LCFI0:
        movq    %rdx, %r11
        movd    %xmm1, %r10
        movd    %xmm0, %r8
        movd    %xmm2, %r9
        movq    (%rdx), %rax
        movzwl  (%rsi,%rax,2), %eax
        movq    %rax, %rbx          # (1)
        andq    %r8, %rbx
        movq    %rbx, %rax          # (2)
        mulq    %r10
        movl    %r9d, %eax          # (3)
        andl    %edx, %eax
        movl    %eax, (%rdi)
        movq    8(%r11), %rax
        movzwl  (%rsi,%rax,2), %eax
        movq    %rax, %rbx          # (1)
        andq    %r8, %rbx
        movq    %rbx, %rax          # (2)
        popq    %rbx
        mulq    %r10
        movl    %r9d, %eax          # (3)
        andl    %edx, %eax
        movl    %eax, 4(%rdi)
        movq    16(%r11), %rax
        movzwl  (%rsi,%rax,2), %eax
        andq    %rax, %r8           # Almost what (1) should be
        movq    %r8, %rax           # (2)
        mulq    %r10
        andl    %edx, %r9d          # Essentially what (3) should be
        movl    %r9d, 8(%rdi)
        ret

The output of a single iteration should look something like this (33% fewer instructions):

        movq    8(%r11), %rax
        movzwl  (%rsi,%rax,2), %eax
        andq    %r8, %rax
        mulq    %r10
        andl    %r9d, %edx
        movl    %edx, 4(%rdi)

Methinks cases (2) and (3) are related to bug 15158 and bug 21202, but that case (1) is something else. There's also that odd choice to use %rbx, even though there are plenty of call-clobber regs to use instead...

--
Summary: Unnecessary reg-reg moves
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32725
[Bug middle-end/32708] New: _mm_cvtsi64x_si128() and _mm_cvtsi128_si64x() inefficient
Consider the following functions (compiled with g++-4.1.2 -msse3 -O3):

    #include <emmintrin.h>
    __m128i int2vector(int i) { return _mm_cvtsi32_si128(i); }
    int vector2int(__m128i i) { return _mm_cvtsi128_si32(i); }
    __m128i long2vector(long long i) { return _mm_cvtsi64x_si128(i); }
    long long vector2long(__m128i i) { return _mm_cvtsi128_si64x(i); }

They become:

    _Z10int2vectori:
        movd    %edi, %xmm0
        ret
    _Z10vector2intU8__vectorx:
        movd    %xmm0, %rax
        movq    %xmm0, -16(%rsp)
        ret
    _Z11long2vectorx:
        movd    %rdi, %mm0
        movq    %rdi, -8(%rsp)
        movq2dq %mm0, %xmm0
        ret
    _Z11vector2longU8__vectorx:
        movd    %xmm0, %rax
        movq    %xmm0, -16(%rsp)
        ret

long2vector() should use a simple MOVQ instruction the way int2vector() uses MOVD. It appears that the reason for the stack access is that the original code used a reg64->mem->mm->xmm path, which the optimizer partly noticed; gcc-4.3-20070617 leaves the full path in place.

Also, do the vector2X() functions really need to access the stack?

Finally, I've noticed several places where instructions involving 64-bit values use the d/l suffix (e.g. long i = 0 ==> xorl %eax, %eax), or 32-bit operations that use 64-bit registers (e.g. movd %xmm0, %rax above). Are those generally features, bugs, or a "who cares"?

--
Summary: _mm_cvtsi64x_si128() and _mm_cvtsi128_si64x() inefficient
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32708
[Bug middle-end/32711] New: Regression: ICE when using inline asm constraint X
Compiling the following functions with gcc-4.{1,2,3} results in an ICE. gcc-3.4.4 does not ICE:

    #include <emmintrin.h>
    static inline __m128i my_asm(__m128i a, __m128i b) {
        __m128i result;
        asm("pshufb\t%1,%0" : "=x"(result) : "X"(b), "0"(a));
        return result;
    }
    __m128i foo(__m128i src) { return my_asm(src, _mm_set1_epi32(1)); }

If the inline asm is called directly (not through an inline function) or if the "X" constraint changes to "mx", everything works fine.

--
Summary: Regression: ICE when using inline asm constraint X
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32711
[Bug middle-end/32711] Regression: ICE when using inline asm constraint X
--- Comment #2 from scovich at gmail dot com 2007-07-09 23:27 ---
(In reply to comment #1)
> "X" constraint means anything matches. Now why we are ICEing is a bit weird.

I started using it because "g" doesn't seem to allow xmm references. Fortunately, "xm" seems to have the desired effect.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32711
[Bug middle-end/32662] Significant extra code generation for 64x64=128-bit multiply
--- Comment #3 from scovich at gmail dot com 2007-07-07 14:55 ---
While it's nice that the new optimization framework can eliminate the redundant IMUL instruction(s), why were they being generated in the first place?

Compiling the simpler foo() without optimizations gives:

    _Z3fooPyPKyy:
    .LFB2:
        pushq   %rbp
    .LCFI0:
        movq    %rsp, %rbp
    .LCFI1:
        pushq   %rbx
    .LCFI2:
        movq    %rdi, -16(%rbp)
        movq    %rsi, -24(%rbp)
        movq    %rdx, -32(%rbp)
        movq    -24(%rbp), %rax
        movq    (%rax), %rax
        movq    %rax, %rcx
        movl    $0, %ebx            # here
        movq    -32(%rbp), %rax
        movl    $0, %edx            # here
        movq    %rbx, %rsi
        imulq   %rax, %rsi          # here
        movq    %rdx, %rdi
        imulq   %rcx, %rdi          # here
        addq    %rdi, %rsi
        mulq    %rcx
        addq    %rdx, %rsi
        movq    %rsi, %rdx
        movq    %rdx, %rax
        xorl    %edx, %edx
        movq    %rax, %rdx
        movq    -16(%rbp), %rax
        movq    %rdx, (%rax)
        popq    %rbx
        leave
        ret

Barring something really strange, it seems like this problem could/should be fixed at its source, even for 4.1/4.2.

Reopen?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32662
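For context, the kind of source that leads to a 64x64=128-bit multiply looks roughly like the sketch below. This is not the foo() from the bug (its source is not shown in this message); mul_high is a hypothetical helper, and the code assumes a 64-bit GCC or Clang target that provides the `unsigned __int128` extension.

```cpp
typedef unsigned long long u64;

// Hypothetical helper: returns the upper 64 bits of the full 128-bit
// product a * b. Assumes the compiler provides unsigned __int128.
inline u64 mul_high(u64 a, u64 b)
{
    // Widen one operand first so the multiplication is done in 128 bits.
    unsigned __int128 product = (unsigned __int128) a * b;
    return (u64) (product >> 64);
}
```

Since both operands are zero-extended from 64 bits, their upper halves are known to be zero, so the two cross-term multiplies marked "here" in the listing contribute nothing; on x86_64 a single widening MULQ is enough to produce the result.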
[Bug middle-end/32660] New: ICE using __builtin_ia32_vec_ext_v2di()
Attempting to compile the following function causes an ICE:

    #include <emmintrin.h>
    long foo(__m128i val) {
        return __builtin_ia32_vec_ext_v2di(val, 1);
    }

Changing the call to any of the following compiles just fine:

    __builtin_ia32_vec_ext_v2di(val, 0)
    __builtin_ia32_vec_ext_v4si(val, 1)
    __builtin_ia32_vec_ext_v8hi(val, 1)

--
Summary: ICE using __builtin_ia32_vec_ext_v2di()
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32660
[Bug middle-end/32660] ICE using __builtin_ia32_vec_ext_v2di()
--- Comment #1 from scovich at gmail dot com 2007-07-06 23:11 ---
Oops.. forgot to include the error message:

    $ g++ -Wall -msse3 -O3 -S union-bug.C
    union-bug.C: In function long int foo(long long int __vector__):
    union-bug.C:4: error: unrecognizable insn:
    (insn 12 7 13 0 (set (reg:DI 61)
            (vec_select:DI (reg/v:V2DI 60 [ val ])
                (parallel [
                        (const_int 1 [0x1])
                    ]))) -1 (nil)
        (nil))
    union-bug.C:4: internal compiler error: in extract_insn, at recog.c:2084

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32660