[Bug middle-end/35545] tracer pass is run too late
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545 --- Comment #23 from davidxl xinliangli at gmail dot com --- (In reply to Jan Hubicka from comment #22) Doing it at same approximately the same place as loop header copying seems to make most sense to me. It benefits from early cleanups and DCE definitly and it should enable more fun with the later scalar passes that are almost all rerun then. We need to make sure tracer doesn't mess too much with loops then. Btw, useless tracing may be undone again by tail-merging. Tracer already does: /* We have the tendency to duplicate the loop header of all do { } while loops. Do not do that - it is not profitable and it might create a loop with multiple entries or at least rotate the loop. */ bb2-loop_father-header != bb2) so it won't kill natural loops and peel (I should find time to update the peeling pass). It also has: if (bb_seen_p (bb2) || (e-flags (EDGE_DFS_BACK | EDGE_COMPLEX)) || find_best_successor (bb2) != e) break; to not unroll. So i think it is safe for loop optimizer - all it can do is expanding a loop that has control flow in it reducing its unrollability later. Tracer seems to consume only profile information and thus doesn't rely on any other transforms (well, apart from cleanups which could affect its cost function). Why not schedule it even earlier? Like to before pass_build_alias? (the pipeline up to loop transforms is quite a mess...) It uses profile information and code size estiamtes. I expect that (especially for C++ stuff) the early post inline passes will remove a lot of code and thus improve traceability. This is why I looked for spot after first DCE/DSE. David, can you please be more specific about how you tested? Was it with profile feedback? What about code size metrics? It is with profile feedback. There is also reduction in code size, though very tiny change. David Honza
[Bug middle-end/35545] tracer pass is run too late
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545 --- Comment #19 from davidxl xinliangli at gmail dot com --- (In reply to Jan Hubicka from comment #18) WE can try some internal benchmarks with this change too. That would be very welcome. Tracer used to be quite useful pass in old days, doing 1.6% on -O3+FDO SPECint (for 1.4% code size cost) that put it very close to the inliner (1.8%) that however needed 12.9% of code size. http://www.ucw.cz/~hubicka/papers/amd64/node4.html (see aggressive optimization table) I do not think it was ever resonably returned for GIMPLE. Martin, do you have any current scores? Part of benefits came from lack of global optimizers, but I think tail duplication has a potential in modern compiler, too. Perhaps we should try to do that and given that we now have tail merging pass, consider getting it useful without FDO, too. Honza I saw small improvement across most of the benchmarks with this change on sandybridge machines with FDO. The geomean improvement is about 0.3%. The baseline compiler is Google gcc-49 compiler, and the test compiler is Google gcc-49 with the patch.
[Bug middle-end/35545] tracer pass is run too late
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545 --- Comment #17 from davidxl xinliangli at gmail dot com --- (In reply to Jan Hubicka from comment #16) I have moved tracer before the late cleanups that seems to be rather obbious thing to do. This lets us to optimize the testcase (with -O2): int main() () { struct A * ap; int i; int _6; bb 2: bb 3: # i_29 = PHI i_22(6), 0(2) _6 = i_29 % 7; if (_6 == 0) goto bb 4; else goto bb 5; bb 4: ap_8 = operator new (16); ap_8-i = 0; ap_8-_vptr.A = MEM[(void *)_ZTV1A + 16B]; goto bb 6; bb 5: ap_13 = operator new (16); MEM[(struct B *)ap_13].D.2244.i = 0; MEM[(struct B *)ap_13].b = 0; MEM[(struct B *)ap_13].D.2244._vptr.A = MEM[(void *)_ZTV1B + 16B]; bb 6: # ap_4 = PHI ap_13(5), ap_8(4) operator delete (ap_4); i_22 = i_29 + 1; if (i_22 != 1) goto bb 3; else goto bb 7; bb 7: return 0; } Martin, I do not have SPEC setup, do you think you can benchmark the attached patch with SPEC and profile feedback and also non-FDO -O3 -ftracer compared to -O3, please? It would be nice to know code size impact, too. Index: passes.def === --- passes.def (revision 215651) +++ passes.def (working copy) @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_dce); NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); + NEXT_PASS (pass_tracer); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt); @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_cse_reciprocals); NEXT_PASS (pass_reassoc); NEXT_PASS (pass_strength_reduction); - NEXT_PASS (pass_tracer); NEXT_PASS (pass_dominator); NEXT_PASS (pass_strlen); NEXT_PASS (pass_vrp); Doing it at same approximately the same place as loop header copying seems to make most sense to me. It benefits from early cleanups and DCE definitly and it should enable more fun with the later scalar passes that are almost all rerun then. WE can try some internal benchmarks with this change too. David
[Bug middle-end/45631] Indirect call profiling can be improved to handle multiple targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45631 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xur at google dot com --- Comment #8 from davidxl xinliangli at gmail dot com --- (In reply to Jan Hubicka from comment #7) Any news on this code? Rong, I think it is probably good time to extract the multi-target value profiling code and upstream it. David
[Bug middle-end/63220] error: inlining failed in call to always_inline
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63220 --- Comment #2 from davidxl xinliangli at gmail dot com --- (In reply to Richard Biener from comment #1) First of all you should mark the functions 'inline' as well. This does not help. Then the issue is that 'eq' is called indirectly which isn't allowed for always_inline functions: Is this documented somewhere? A function can be called indirectly and directly. What is the right way to force inlining the direct calls? A warning is already emitted about always_inline might not be inlinable, why the error? t.C:81:66: error: inlining failed in call to always_inline 'static constexpr bool std::__1::char_traitschar::eq(std::__1::char_traitschar::char_type, std::__1::char_traitschar::char_type)': indirect function call with a yet undetermined callee __attribute__ (( __always_inline__)) static constexpr bool eq(char_type __c1, char_type __c2) { ^ t.C:75:37: error: called from here if (!__pred(*__m1, *__m2)) { } ^ which means this is a missed-optimization only. The error is your fault. Note that getting the error is unreliable so -O0 simply doesn't discover the failed inlining. -O2 works fine -- I have not debugged the problem -- but it seems to be some newly cloned cgraph edge to be in inconsistent state. David
[Bug target/63209] [ARM] Wrong conditional move generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63209 davidxl xinliangli at gmail dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from davidxl xinliangli at gmail dot com --- Fixed in trunk: r215136, in gcc-4_9: r215137 2014-09-10 Xinliang David Li davi...@google.com PR target/63209 * config/arm/arm.md (movcond_addsi): Handle case where source and target operands are the same.
[Bug middle-end/63220] New: error: inlining failed in call to always_inline
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63220 Bug ID: 63220 Summary: error: inlining failed in call to always_inline Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: xinliangli at gmail dot com Created attachment 33466 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33466action=edit test case Compile the attached source with -std=gnu++11 -O1 option, the compilation will fail with the error message error: inlining failed in call to always_inline There is no problem with O2.
[Bug rtl-optimization/63209] New: [ARM] Wrong conditional move generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63209 Bug ID: 63209 Summary: [ARM] Wrong conditional move generated Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: xinliangli at gmail dot com Created attachment 33460 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33460action=edit patch file GCC (arm-unknown-linux-gnueabi) generates wrong code (O2) for the following program. The problem is that it uses a source register in the condition move that has already been overwritten in the previous move. The program produces output: top[-1]: ff7a7a7a top[0]: ff7a7a7a left: ff7b7b7b pred: ff7a7a7a pred != left The bad assembly code is: cmn r1, r2 mov r0, r3 movgt r0, r3 The expected output is: top[-1]: ff7a7a7a top[0]: ff7a7a7a left: ff7b7b7b pred: ff7b7b7b The expected assembly: cmn r1, r2 movle r0, r3 The attached patch fixed the problem. The patch passes regression test. // tests.c #include stdio.h static int Sub(int a, int b) { return b -a; } static unsigned Select(unsigned a, unsigned b, unsigned c) { const int pa_minus_pb = Sub((a 8) 0xff, (b 8) 0xff) + Sub((a 0) 0xff, (b 0) 0xff); return (pa_minus_pb = 0) ? a : b; } __attribute__((noinline)) unsigned Predictor(unsigned left, const unsigned* const top) { const unsigned pred = Select(top[1], left, top[0]); return pred; } int main(void) { const unsigned top[2] = {0xff7a7a7a, 0xff7a7a7a}; const unsigned left = 0xff7b7b7b; const unsigned pred = Predictor(left, top /*+ 1*/); fprintf(stderr, top[-1]: %8x top[0]: %8x left: %8x pred: %8x\n, top[0], top[1], left, pred); if (pred == left) return 0; fprintf(stderr, pred != left\n); return 1; }
[Bug rtl-optimization/62040] New: internal compiler error: in simplify_const_unary_operation, at simplify-rtx.c:1555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62040 Bug ID: 62040 Summary: internal compiler error: in simplify_const_unary_operation, at simplify-rtx.c:1555 Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: xinliangli at gmail dot com Compile the attached source file with option -Os -g using aarch64 compiler, the compiler ICEs: et2.c: In function 'foo': et2.c:189:1: internal compiler error: in simplify_const_unary_operation, at simplify-rtx.c:1555 }
[Bug rtl-optimization/62040] internal compiler error: in simplify_const_unary_operation, at simplify-rtx.c:1555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62040 --- Comment #1 from davidxl xinliangli at gmail dot com --- Created attachment 33264 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33264action=edit test case
[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #3 from davidxl xinliangli at gmail dot com --- In tree-profiling (), in the intrumentation loop (for each def function) -- a pending list of functions calling pure functions should be recorded. After the loop that reset the const/pure flags, a loop should iterate though the pending list, and perform cfg fixup on them. David (In reply to Richard Biener from comment #1) There are several possibilities: - don't instrument pure/const functions - do not reset the pure/const flag (I see no reason to - the compiler might not insert side-effects and we can consider the counter updates as non-side-effect) That is, I wonder why we do /* Drop pure/const flags from instrumented functions. */ FOR_EACH_DEFINED_FUNCTION (node) { if (!gimple_has_body_p (node-decl) || !(!node-clone_of || node-decl != node-clone_of-decl)) continue; /* Don't profile functions produced for builtin stuff. */ if (DECL_SOURCE_LOCATION (node-decl) == BUILTINS_LOCATION) continue; cgraph_set_const_flag (node, false, false); cgraph_set_pure_flag (node, false, false); } but don't then call execute_fixup_cfg () again (but it doesn't yet deal with a const/pure function becoming non-pure/const and thus a bb-ending stmt). Btw, this is yet another case where transitioning to call flags instead of decl flags would save us - definitely even the instrumented call cannot return abnormally. Dropping the clearing of const/pure fixes the bug. Honza? I only can think of the case where we have for (..) { const (); const (); } and inline only one of the calls where after that loop store-motion might apply store-motion to the counter updates of the inline clone, overwriting the updates from the non-inline call. But do we really care? Anyway, confirmed. Btw, goo() should be leaf but it seems we don't auto-compute 'leaf' yet?
[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776 --- Comment #5 from davidxl xinliangli at gmail dot com --- (In reply to wmi from comment #4) Can we move the pure/const resetting loop to an earlier place: inside branch_prob , after instrument_edges and before gsi_commit_edge_inserts (where stmt_ends_bb_p is checked), so that gsi_commit_edge_inserts() which changes cfg could take reset const/pure flags into consideration? Sounds plausible. Have you tried it? David
[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776 --- Comment #7 from davidxl xinliangli at gmail dot com --- (In reply to wmi from comment #6) (In reply to davidxl from comment #5) (In reply to wmi from comment #4) Can we move the pure/const resetting loop to an earlier place: inside branch_prob , after instrument_edges and before gsi_commit_edge_inserts (where stmt_ends_bb_p is checked), so that gsi_commit_edge_inserts() which changes cfg could take reset const/pure flags into consideration? Sounds plausible. Have you tried it? David I just tried but found it was not very easy. FOR_EACH_DEFINED_FUNCTION (node) { execute_fixup_cfg() and cleanup_tree_cfg() branch_prob() } For the above loop, branch_prob is called one by one for each defined func. Because a func could possibly call any other funcs on the cgraph, we need to reset the const/pure flags for every defined func before any branch_prob() is called, so we cannot put the const/pure reset code inside branch_prob(). As you noted below, resetting the flag before branch_prob will lead to cfg mismatch between instr and profile-use build. David We also cannot move the const/pure reset loop before the branch_prob() loop, because execute_fixup_cfg will use const/pure flags to generate different cfg. If we put the const/pure reset code before the branch_prob() loop, the const/pure reset code should only be executed in intrumentation phase, not in annotation phase, so that we may get different cfg between intrumentation and annotation. Wei.
[Bug middle-end/61571] bad aliasing -- wrong FRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571 --- Comment #5 from davidxl xinliangli at gmail dot com --- Thanks for the analysis. I agree it is invalid to use -fstrict-aliasing for the code. The implementation tries to save some space in __list_impl class by making the sentinel marker __end_ to be just __list_node_base, but it is linked with other __list_node type objects. David
[Bug middle-end/61571] New: bad aliasing -- wrong FRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571 Bug ID: 61571 Summary: bad aliasing -- wrong FRE Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: xinliangli at gmail dot com Compile the attached source with -m32 -std=c++11 -O2 on x86, the program will crash. It runs fine with -m32 -std=c++11 -O2 -fno-tree-pre -fno-tree-fre. (same failure is also observed on arm target). The bad FRE happens in listint::merge function. listint::splice needs to be inlined to trigger the bad CSE. The problematic CSE is this one: listint::merge (...) { if (this != __c) { iterator __f1 = begin(); iterator __e1 = end(); iterator __f2 = __c.begin(); // this one ... while (...) { } // inline instance of splice: if (!__c.empty()) { __node_pointer __f = __c.__end_.__next_; // and this one __node_pointer __l = __c.__end_.__prev_; base::__unlink_nodes(__f, __l); __link_nodes(__p.__ptr_, __f, __l); base::__sz() += __c.__sz(); __c.__sz() = 0; } Note begin() method accesses __c.__end_.__next_.
[Bug middle-end/61571] bad aliasing -- wrong FRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571 --- Comment #1 from davidxl xinliangli at gmail dot com --- Created attachment 32979 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32979action=edit source file
[Bug middle-end/61571] bad aliasing -- wrong FRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||richard.guenther at gmail dot com --- Comment #2 from davidxl xinliangli at gmail dot com --- The issue exists in trunk and as early as 4.7 (that i have tested).
[Bug tree-optimization/61289] New: Bad jump threading generates infinite loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61289 Bug ID: 61289 Summary: Bad jump threading generates infinite loop Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: xinliangli at gmail dot com Build the following program with -fno-exceptions -O2, the program aborts at runtime. Adding -fno-tree-dominator-opts, the program runs fine. To trigger the problem, the first iteration needs to return list == new_list -- the code will then get into the cloned path which is actually an infinite loop. (the toy program does something nonsense, but it should finish without aborting). jpthread.h - struct MyList { inline int LastHit () const { return last_; } // inline int FirstHit () const { return last_-1; } MyList* PushBack(int); // MyList* PushFront(int); void Destroy(); static MyList* Create(int); MyList(int i): last_(i) {} void PopBack(); int last_; }; jpthread.cc #include jpthread.h void test () { MyList *list = MyList::Create(20); for (int i = 0; i 2; ) { MyList* new_list = list-PushBack( list-LastHit() + 10); if (new_list != list) { list-Destroy(); list = new_list; ++i; } } list-Destroy(); } int main() { test(); } m.cc - #include stdlib.h #include stdio.h #include jpthread.h int cnt = 0; MyList *prev = 0; MyList* MyList::PushBack(int i) { cnt ++; if (cnt == 1) return prev; if (cnt 3) { fprintf (stderr, Loop not ending, aborting ..\n); abort (); } prev = this; return new MyList(i+last_); } MyList* MyList::Create(int i) { prev = new MyList(i); return prev; } void MyList::Destroy() { delete this; }
[Bug tree-optimization/61009] Incorrect jump threading in dom
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61009 --- Comment #10 from davidxl xinliangli at gmail dot com --- (In reply to Jeffrey A. Law from comment #9) Sorry, personal issue taking an enormous amount of my time right now. I have a fully tested patch and just need to twiddle the attached test into an executable testcase for the regression suite. Can you post the early version of the patch? We can help test it with larger test cases that exposed the problem in the first place. David
[Bug tree-optimization/61009] Incorrect jump threading in dom
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61009 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #8 from davidxl xinliangli at gmail dot com --- (In reply to Jeffrey A. Law from comment #7) I see what's happening here... I need to think about how best to handle this situation. Oh how threading across loop backedges perilous. Any update on this issue? David
[Bug tree-optimization/60899] undef reference generated with -fdevirtualize-speculatively
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899 --- Comment #3 from davidxl xinliangli at gmail dot com --- (In reply to Jan Hubicka from comment #2) David, it seems a_m.C should be different form a.C. From chain of events you describe I think we need to figure out why the last folding happens. Does the function pass can_refer_decl_in_current_unit_p and if so, how does cgraph node look at that time? Honza Cut paste error: // a_m.cc #include a.h struct D2: public DI { virtual int doit () { return 3; } }; extern int bar(DI*); int main() { D2 d2; return bar(d2); }
[Bug tree-optimization/60899] undef reference generated with -fdevirtualize-speculatively
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899 --- Comment #4 from davidxl xinliangli at gmail dot com --- (In reply to davidxl from comment #3) (In reply to Jan Hubicka from comment #2) David, it seems a_m.C should be different form a.C. From chain of events you describe I think we need to figure out why the last folding happens. Does the function pass can_refer_decl_in_current_unit_p and if so, how does cgraph node look at that time? Honza Cut paste error: // a_m.cc #include a.h struct D2: public DI { virtual int doit () { return 3; } }; extern int bar(DI*); int main() { D2 d2; return bar(d2); } stepping into can_refer_decl_in_current_unit_p indicates it returns true (for A::foo and A::vtable) at the condition @line 106.
[Bug tree-optimization/60899] undef reference generated with -fdevirtualize-speculatively
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899 --- Comment #7 from davidxl xinliangli at gmail dot com --- (In reply to Paul Pluzhnikov from comment #6) Google ref: b/13453242 Verified that the proposed patch fixed the problem in b/1345242.
[Bug tree-optimization/60899] New: undef reference generated with -fdevirtualize-speculatively
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899 Bug ID: 60899 Summary: undef reference generated with -fdevirtualize-speculatively Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: xinliangli at gmail dot com Build the following code with the following command line: g++ -O2 -fdisable-tree-einline a.cc a_m.cc results in: /tmp/cci31j3N.o: In function `D::doit()': a.cc:(.text._ZN1D4doitEv[_ZN1D4doitEv]+0x5): undefined reference to `A::foo()' collect2: error: ld returned 1 exit status It builds fine when devirtualization is disabled: -O2 -fno-devirtualization-speculatively -fdisable-tree-einline The problem is there is no instantiation of any class A instances (final or subclass) in the program, so vtables and A::foo are all eliminated. The reference to A::foo is from D::doit. In a successful build, there are no D instances either, so D::doit won't be emitted. However with speculative devirtualization, D::doit may be speculatively referenced even though there are no D instances. What happens is that during ipa-inline, goo is inlined into D::doit, the virtual call to foo should become an direct call to A::foo, but the new edge is not discovered. Since there is no call edge to A::foo, A::foo gets removed right after ipa-inline (before inline transform). However during inline transform, gimple-fold-call converts the virtual call into a direct call. The test case is extracted from a very large real program. The explicit reference to D::doit in bar is to demonstrate the problem -- in the real program, the reference is from spec-devirt. //a.h struct B { virtual int foo() = 0; int goo() { return foo(); } int i; }; struct A : public B { A() : i(0) {} int foo() { return 1;} int i; }; struct A2 : public B { int foo() { return 2;} }; struct DI { virtual int doit() = 0; }; struct D : public DI { virtual int doit () { return m.goo(); } A m; }; // a.cc #include a.h int cond; int bar (DI* ap) { if (cond) return static_castD*(ap)-D::doit(); // Mimic speculative devirtualization return ap-doit(); } // a_m.cc #include a.h int cond; int bar (DI* ap) { if (cond) return static_castD*(ap)-D::doit(); // Mimic speculative devirtualization return ap-doit(); }
[Bug tree-optimization/48316] missed CSE / reassoc with array offsets
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48316 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #1 from davidxl xinliangli at gmail dot com --- int a[8][8]; int foo(int i) { return a[i][3] + a[i+1][3] + a[i+2][3]; } ICC generates: movslq%edi, %rdi#5.1 shlq $5, %rdi #6.10 movl 12+a(%rdi), %eax #6.10 addl 44+a(%rdi), %eax #6.20 addl 76+a(%rdi), %eax GCC can not CSE the common code for address arithmetic: leal1(%rdi), %eax movslq %edi, %rdx addl$2, %edi cltq movslq %edi, %rdi salq$5, %rdx salq$5, %rax salq$5, %rdi movla+12(%rax), %eax addla+12(%rdx), %eax addla+12(%rdi), %eax ret
[Bug tree-optimization/59303] [4.9 Regression] uninit-pred-8_b.c and uninit-pred-9_b.c fail after better optimizations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59303 --- Comment #5 from davidxl xinliangli at gmail dot com --- Fixing this requires more powerful predicate analysis with the help of value equivalent classes. From the dump: Use in stmt blah (v_1); is guarded by : if (_23 != 0) _23 = pephitmp_22 | _8, so the use guard is prephitmp_22 != 0 || _8 != 0 prephitmp_22 = PHI 0(13), prephitmp_21(3), so the guard can be translated to prephitmp_21 != 0 || _8 != 0 prephitmp_21 = r_9(D) = 19, so the predicate becomes r_9(D) = 19 || _8 != 0 For the def: [CHECK] Found def edge 1 in v_1 = PHI v_14(D)(13), r_9(D)(3) Operand defs of phi v_1 = PHI v_14(D)(13), r_9(D)(3) is guarded by : if (_8 != 0) (.OR.) (.NOT.) if (_8 != 0) (.AND.) if (_13 != 0) the predicate is _8 != 0 || _13 != 0 where _13 = _10 | _12, 10_ = (r_9(D) = 19), _12 = l_11(D) != 0 so the predicate is _8 != 0 || r_9(D) = 19 || 1_11(D) != 0 which is the superset for the use. Perhaps we need to do around of predicate normalization before comparison. David
[Bug tree-optimization/59303] [4.9 Regression] uninit-pred-8_b.c and uninit-pred-9_b.c fail after better optimizations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59303 --- Comment #6 from davidxl xinliangli at gmail dot com --- I am working on a solution (and some cleanups). David
[Bug middle-end/45631] devirtualization with profile feedback does not work for function pointers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45631 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #6 from davidxl xinliangli at gmail dot com --- (In reply to Jan Hubicka from comment #5) Google has mentioned to have patches for smarter multi-target collection. So perhaps for next stage1? Yes -- we will contribute that code back to trunk in next stage1 (was planning to do so in this cycle, but Rong was busy with contributing other patches: fast coverage mode, atomic update, libgcov refactoring, and profile-tool). David
[Bug target/59376] -mmemcpy-strategy= and -mmemset-strategy= are undocumented
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59376 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #1 from davidxl xinliangli at gmail dot com --- Where should it be documented (they are documented in doc/invoke.texi already)?
[Bug tree-optimization/41488] IVOpts cannot coalesce multiple induction variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41488 --- Comment #5 from davidxl xinliangli at gmail dot com --- Alternative approach -- introduce a special forward propagation before or after the ivopt to get rid of the redundant iv. This propagation needs to propagate through header phi.
[Bug middle-end/57287] [4.9 Regression] Bogous uninitialized warning with abnormal control flow
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57287 --- Comment #16 from davidxl xinliangli at gmail dot com --- (In reply to Richard Biener from comment #15) Confirmed. David, can you have a look here? I had a hard time following what exactly to do with the dataflow in the uninit pass for abnormal control flow (abnormal control flow should be considered receiving an initialized value). Thanks. I looked at it a little. The warning is not from the predicated uninit analysis which checks uses of phi defs. The warning is emitted from tree-ssa.c:1644 in function warn_uninitialized_vars. It warns about the 'definitely' uninitialized variable that may be executed. In this case it is use of 'buf_16(ab)' in BB3. Not sure where this statement comes from: buf_24 = buf_16(ab); The dot file is attached. digraph uninit.c.131t.uninit1 { overlap=false; subgraph baz { color=black; label=baz; fn_11_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label=ENTRY]; fn_11_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label=EXIT]; fn_11_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:1 |\bb\ 2\:\l\ |varseen_11(ab)\ =\ 0;\l\ |_14\ =\ head\ ();\l\ }]; fn_11_basic_block_3 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:7100 |\bb\ 3\:\l\ |list\ =\ _14;\l\ |buf_24\ =\ buf_16(ab);\l\ |if\ (_14\ !=\ 0B)\l\ \ \ goto\ \bb\ 4\;\l\ else\l\ \ \ goto\ \bb\ 8\;\l\ }]; fn_11_basic_block_4 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:8500 |\bb\ 4\:\l\ |#\ varseen_3(ab)\ =\ PHI\ \1(3),\ 1(6)\\l\ |#\ n_26(ab)\ =\ PHI\ \_14(3),\ n_2(ab)(6)\\l\ |#\ buf_27(ab)\ =\ PHI\ \buf_24(3),\ buf_7(ab)(6)\\l\ |buf_21\ =\ bar\ ();\l\ }]; fn_11_basic_block_5 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:4250 |\bb\ 5\:\l\ }]; fn_11_basic_block_6 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:2900 |\bb\ 6\:\l\ |#\ n_1(ab)\ =\ PHI\ \n_15(ab)(2),\ n_26(ab)(4),\ n_26(ab)(5)\\l\ |#\ varseen_4(ab)\ =\ PHI\ \varseen_11(ab)(2),\ varseen_3(ab)(4),\ varseen_3(ab)(5)\\l\ |#\ buf_6(ab)\ =\ PHI\ \buf_16(ab)(2),\ buf_27(ab)(4),\ buf_21(5)\\l\ |__sigsetjmp\ (buf_6(ab),\ 1);\l\ |n_19\ =\ n_1(ab)-\next;\l\ |n_2(ab)\ =\ n_19;\l\ |varseen_5\ =\ varseen_4(ab);\l\ |buf_7(ab)\ =\ buf_6(ab);\l\ |if\ (n_2(ab)\ !=\ 0B)\l\ \ \ goto\ \bb\ 4\;\l\ else\l\ \ \ goto\ \bb\ 7\;\l\ }]; fn_11_basic_block_7 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:1500 |\bb\ 7\:\l\ |_25\ =\ varseen_5;\l\ }]; fn_11_basic_block_8 [shape=record,style=filled,fillcolor=lightgrey,label={ FREQ:1500 |\bb\ 8\:\l\ |#\ _8\ =\ PHI\ \_25(7),\ 0(3)\\l\ |return\ _8;\l\ }]; fn_11_basic_block_0:s - fn_11_basic_block_2:n [style=solid,bold,color=blue,weight=100,constraint=true, label=[100%]]; fn_11_basic_block_2:s - fn_11_basic_block_6:n [style=solid,bold,color=red,weight=10,constraint=true, label=[29%]]; fn_11_basic_block_2:s - fn_11_basic_block_3:n [style=solid,bold,color=blue,weight=100,constraint=true, label=[71%]]; fn_11_basic_block_3:s - fn_11_basic_block_4:n [style=solid,bold,color=black,weight=10,constraint=true, label=[85%]]; fn_11_basic_block_3:s - fn_11_basic_block_8:n [style=solid,bold,color=black,weight=10,constraint=true, label=[15%]]; fn_11_basic_block_4:s - fn_11_basic_block_6:n [style=dotted,bold,color=red,weight=10,constraint=false, label=[50%]]; fn_11_basic_block_4:s - fn_11_basic_block_5:n [style=solid,bold,color=blue,weight=100,constraint=true, label=[50%]]; fn_11_basic_block_5:s - fn_11_basic_block_6:n [style=dotted,bold,color=blue,weight=10,constraint=false, label=[100%]]; fn_11_basic_block_6:s - fn_11_basic_block_4:n [style=solid,bold,color=black,weight=10,constraint=true, label=[85%]]; fn_11_basic_block_6:s - fn_11_basic_block_7:n [style=solid,bold,color=black,weight=10,constraint=true, label=[15%]]; fn_11_basic_block_7:s - fn_11_basic_block_8:n [style=solid,bold,color=blue,weight=100,constraint=true, label=[100%]]; fn_11_basic_block_8:s - fn_11_basic_block_1:n [style=solid,bold,color=black,weight=10,constraint=true, label=[100%]]; fn_11_basic_block_0:s - fn_11_basic_block_1:n [style=invis,constraint=true]; } }
[Bug tree-optimization/56490] [4.6/4.7/4.8 Regression] -Wall triggering infinite loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56490 --- Comment #5 from davidxl xinliangli at gmail dot com 2013-03-08 16:21:13 UTC --- (In reply to comment #4) Clearly the limiting is bougs ... the issue is with large fanout BBs we recurse in compute_control_dep_chain which leads to exponential complexity. Limiting that to 8^5 isn't really a fix ... (yes, you can call that O(1)). compute_control_dep_chain is called 65 million times(!) for the testcase after the patch. There has to be a better algorithmic way of implementing this. IMHO still not fixed in 4.8. Right. However for on demand computation like this, limiting is still needed. For this particular case, limiting the depth to 3 (or may be smaller) won't cause any regressions.
[Bug middle-end/56490] [4.6/4.7/4.8 Regression] -Wall triggering infinite loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56490 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #2 from davidxl xinliangli at gmail dot com 2013-03-01 22:45:48 UTC --- The function has a control flow that post-dom chain is as long as 103. Limiting the max post-dom walk length solve the problem.
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #34 from davidxl xinliangli at gmail dot com 2013-01-18 17:27:43 UTC --- The patch is missing changes in documentation on the new attribute. David (In reply to comment #32) Created attachment 29207 [details] gcc48-pr55742.patch This bug is open for way too long given its severity, so let's start talking over patches. This patch attempts to implement what I understand from Jason's comments, just with default instead of any, because it is indeed the default target attribute (whatever you specify on the command line). Say on: void foo (); void foo () __attribute__((target (avx))); void foo () __attribute__((target (default))); __attribute__((target (default))) void foo () { } __attribute__((target (avx))) void foo () { } void (*fn) () = foo; first we merge the first two decls, because only if target attribute is present on both, we consider it for multi-versioning, for compatibility with 4.7 and older. On e.g. void foo (); void foo () __attribute__((target (sse4))); void foo () __attribute__((target (default))); void foo () { } we reject the last fn definition, because at that point foo is already known to be multi-versioned, thus it is required that target attribute is specified for foo (either default, or some other). Unfortunately, for this case the error is reported twice for some reason. The #c0 testcase now compiles. Now, the issues I discovered with multiversioning, still unfixed by the patch: 1) the mv*.C testcases should be moved, probably to g++.dg/ext/mv*.C 2) can you please explain the mess in handle_target_attribute? /* Do not strip invalid target attributes for targets which support function multiversioning as the target string is used to determine versioned functions. */ else if (! targetm.target_option.valid_attribute_p (*node, name, args, flags) ! targetm.target_option.supports_function_versions ()) *no_add_attrs = true; Why do you need that? Consider complete garbage in target attribute arguments, which is errored about, but the above for i386/x86_64 keeps the target attribute around anyway, leading to lots of ICEs everywhere: Consider e.g.: __attribute__((target (default))) void foo (void) { } __attribute__((target (128))) void foo (void) { } 3) the multiversioning code assumes that target has a single argument, but it can have more than one. Say for: __attribute__((target (avx,popcnt))) void foo (void) { } __attribute__((target (popcnt,avx))) void bar (void) { } the compiler handles those two as equivalent, but with -Dbar=foo multi-versioning only considers the first string out of that.
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #30 from davidxl xinliangli at gmail dot com 2013-01-17 22:45:22 UTC --- (In reply to comment #26) On Wed, Jan 16, 2013 at 5:02 PM, jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #25 from Jakub Jelinek jakub at gcc dot gnu.org 2013-01-16 16:02:35 UTC --- The actual merging of target attribute isn't that important, what would be more important is that other attributes are merged together in that case and the decls treated as the same thing. Anyway, with target(any) attribute, what would happen for void foo () __attribute__((target (avx))); void foo () __attribute__((target (any))); IMHO the re-declaration with a different target attribute should be an error. This can be a clean way to handle declarations. The definition should have either 'default' attribute or a matching target attribute. A proper forward declaration for a function with MV applied shouldn't have any target attribute. What does this sentence mean? David
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #22 from davidxl xinliangli at gmail dot com 2013-01-15 19:03:03 UTC --- (In reply to comment #21) What does it mean to merge two declarations with different target attributes? Are the strings just combined? This is a good question. 'merge' needs to be clearly defined and can be implementation defined. There might be conflicts between explicit target strings) and the compiler can choose to pick one and discard others (with a proper warning). David
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #11 from davidxl xinliangli at gmail dot com 2013-01-14 17:17:21 UTC --- (In reply to comment #9) I'd say once a target attribute appears at a declaration (non-definition) MV needs to be disabled. Or, the declarations target attribute and those at the MV definitions need to be merged. Thus, a declarations target attribute applies to all definitions (thus all MV variants). This might reduce the usefulness of MV -- using declarations with attributes to indicate MV candidates is the intended use model. David
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #12 from davidxl xinliangli at gmail dot com 2013-01-14 17:20:56 UTC --- (In reply to comment #10) Either use a different name of the attribute (replace target with mv_target or whatever), or require a new attribute (mv?) to be present for multi-versioning (mv attribute on any of the decls would enable it, if mv attribute isn't present on either of the two decls being merged, then the target attribute is merged as before 4.8). I like this proposal: require a new attribute (mv?) to be present for multi-versioning (mv attribute on any of the decls would enable it, if mv attribute isn't present on either of the two decls being merged, then the target attribute is merged as before 4.8) David
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #14 from davidxl xinliangli at gmail dot com 2013-01-14 17:49:59 UTC --- (In reply to comment #13) (In reply to comment #12) (In reply to comment #10) Either use a different name of the attribute (replace target with mv_target or whatever), or require a new attribute (mv?) to be present for multi-versioning (mv attribute on any of the decls would enable it, if mv attribute isn't present on either of the two decls being merged, then the target attribute is merged as before 4.8). I like this proposal: I too like just using a different attribute name. I will prepare a patch asap for this. Thanks Sri. require a new attribute (mv?) to be present for multi-versioning (mv attribute on any of the decls would enable it, if mv attribute isn't present on either of the two decls being merged, then the target attribute is merged as before 4.8) David I mean Jacub's second alternative -- adding additional attribute that alters the meaning of 'target' attribute -- when it is present, no merging will be done. David
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #18 from davidxl xinliangli at gmail dot com 2013-01-14 20:17:45 UTC --- All target attributes on decls not tagged with 'mv' attribute should be merged into the default definition. Any decl tagged with 'mv' should be treated as a new decl. David
[Bug c++/55742] [4.8 regression] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 --- Comment #20 from davidxl xinliangli at gmail dot com 2013-01-14 20:29:54 UTC --- (In reply to comment #19) That wouldn't work, because you would then have the default (non-mv) version, possibly mv version with no target attribute, and then some other mv versions with target attributes. The problem with that is that the first two mangle the same. This means that a non-mv and mv with no target attribute needs to be treated as the same decl (i.e. merged together). that makes sense -- mv annotated decl without target attribute gets merged with default version -- basically it has no actual effect. David
[Bug c++/55742] __attribute__ in class function declaration cause prototype does not match errors.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #6 from davidxl xinliangli at gmail dot com 2012-12-20 19:52:32 UTC --- Since target attribute merging/conflicts handling is not well specified nor implemented, with the introduction of MV, it is a good opportunity to straighten it out and document it. MV helps providing diagnostics to detect the broken code (where the target attribute is ignored) :) David (In reply to comment #5) (In reply to comment #4) On Thu, Dec 20, 2012 at 1:21 PM, tmsriram at google dot com gcc-bugzi...@gcc.gnu.org wrote: However, with function multiversioning, this will become a problem as multiversioning does not treat two decls with different target attributes as identical. Since we are enabling multiversioning by default, atleast in C++ front-end for now, IMO, it is better to insist that the definition and declaration contain identical target attributes. Unfortunately, we cannot do that. A lot of existing code relies on this attribute merging. The cleanest approach here is probably to add an additional 'mv' attribute to explicitly enable multiversioning. Breaking the existing semantics is going to break a lot of code. Ok, just to be clear, there are two problems here: 1) Target attribute merging. If the assumption that the target attributes of the decls must be merged is valid, there is a bug. Also, this means that using target attributes to do multiversioning is wrong. 2) Function multiversioning is exposing the bug, via build failures, the problem of declarations and definitions not having identical target attributes. First, we need to decide if target attribute merging is a valid assumption. If so, we fix the bug and make function multiversioning use a new attribute. If the assumption is not valid, changing the source is the only solution. The source is invalid now since the intended behaviour is not happening. Diego.
[Bug c/55771] Negation and type conversion incorrectly exchanged
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55771 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #2 from davidxl xinliangli at gmail dot com 2012-12-21 06:58:59 UTC --- (In reply to comment #1) Google ref b/7902206 The FE behaves correctly when 'double' type is used instead of 'float' .
[Bug middle-end/55595] [google] r172952 (LIPO) broke profiledbootstrap on google/main, and later in google/gcc-4_7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55595 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #1 from davidxl xinliangli at gmail dot com 2012-12-04 20:19:48 UTC --- google/main branch is not fully tested, and not supposed to be used. Google's release branches are. In very near future, lots of cleanups will be done in google/main, and it will be synced with trunk. During that period, expect it to be very unstable. David (In reply to comment #0) starting with r172592, google/main fails profiledbootstrap with the error below. going one revision back eliminates the error. later commits compound/mutate the profiledbootstrap failures; once this bug is fixed, I'll retest. $ rm -rf * $ ../google-main/configure --program-suffix=-google-4.7 --prefix=/u/mhargett --enable-languages=c,c++,lto --enable-lto --with-fpmath=sse --disable-libmudflap --disable-libssp --enable-build-with-cxx --enable-gold=yes --with-mpc=/u/mhargett --with-gmp=/u/mhargett/ --with-mpfr=/u/mhargett/ --disable-multilib --disable-libgomp --disable-werror $ make -j1 profiledbootstrap if [ x-fpic != x ]; then \ /work/mhargett/google-main-obj/./prev-gcc/xgcc -B/work/mhargett/google-main-obj/./prev-gcc/ -B/u/mhargett/x86_64-unknown-linux-gnu/bin/ -B/u/mhargett/x86_64-unknown-linux-gnu/bin/ -B/u/mhargett/x86_64-unknown-linux-gnu/lib/ -isystem /u/mhargett/x86_64-unknown-linux-gnu/include -isystem /u/mhargett/x86_64-unknown-linux-gnu/sys-include-c -DHAVE_CONFIG_H -g -O2 -fprofile-use -I. -I../../google-main/libiberty/../include -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -pedantic -fpic ../../google-main/libiberty/cp-demangle.c -o pic/cp-demangle.o; \ else true; fi ../../google-main/libiberty/cp-demangle.c: In function \u2018is_ctor_or_dtor\u2019: ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for function \u2018is_ctor_or_dtor\u2019 while reading counter \u2018arcs\u2019 [-Werror=coverage-mismatch] ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 7 instead of 8 ../../google-main/libiberty/cp-demangle.c: In function \u2018d_demangle_callback\u2019: ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for function \u2018d_demangle_callback\u2019 while reading counter \u2018arcs\u2019 [-Werror=coverage-mismatch] ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 25 instead of 26 ../../google-main/libiberty/cp-demangle.c: In function \u2018d_ctor_dtor_name\u2019: ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for function \u2018d_ctor_dtor_name\u2019 while reading counter \u2018arcs\u2019 [-Werror=coverage-mismatch] ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 20 instead of 18 ../../google-main/libiberty/cp-demangle.c: In function \u2018d_operator_name\u2019: ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for function \u2018d_operator_name\u2019 while reading counter \u2018arcs\u2019 [-Werror=coverage-mismatch] ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 21 instead of 20 ../../google-main/libiberty/cp-demangle.c: In function \u2018d_make_name\u2019: ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for function \u2018d_make_name\u2019 while reading counter \u2018arcs\u2019 [-Werror=coverage-mismatch] ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 5 instead of 4 cc1: some warnings being treated as errors
[Bug c++/55245] [4.6/4.7/4.8 Regression] Compiler segfault when compiling a small test case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245 --- Comment #5 from davidxl xinliangli at gmail dot com 2012-11-21 16:17:27 UTC --- Should the main variant types gets laid out in gimplify_type_sizes, when the variant's type has size, but the main variant is incomplete? David
[Bug other/55376] [asan] libsanitizer/README.gcc must contain the exact steps to do code changes and to port code from upstream
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55376 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #1 from davidxl xinliangli at gmail dot com 2012-11-18 05:18:11 UTC --- (In reply to comment #0) We need to document the libsanitizer update process in libsanitizer/README.gcc Few things to cover: - The changes need to be approved by one of the maintainers (or is it obvious)? - Except for *really* trivial changes all patches should go through the upstream tree first. - When updating from upstream, the comment header should be tampered with (oh, my) - Ideally, we need to have a fully automated script to grab the upstream changes (including new/deleted/moved files, etc). Suggestions here? - What is the testing procedure when updating from upstream? (e.g. how do we avoid regressions on the platforms to which the maintainers have no access?) Are all upstream changes considered reviewed and automatically approved for gcc repo? If yes, then this should be automated (including gcc build and libasan testing).
[Bug c++/55245] New: Compiler segfault when compiling the small test case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245 Bug #: 55245 Summary: Compiler segfault when compiling the small test case Classification: Unclassified Product: gcc Version: 4.7.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com Compiling the following program without any option, gcc segfaults: get.cc: In member function 'virtual void Test2::TestBody()': get.cc:22:27: internal compiler error: Segmentation fault The problem is that one array type Vector2_f[2]'s size field is reset to zero during gimplify_type_sizes by one of its variant (which has null size). The problem does not show up in trunk compiler. Was this fixed explicitly in trunk or it happens to work by luck? David template typename VType class Vector2 { }; typedef Vector2float Vector2_f; void GetR( const Vector2_f mosaic_position[3]); class Test1 { private: virtual void TestBody(); }; void Test1::TestBody() { Vector2_f mosaic_position[2][1]; // (1) } class Test2 { private: virtual void TestBody(); }; int tri; void Test2::TestBody() { Vector2_f mosaic_position[2][3] = { }; GetR(mosaic_position[tri]); }
[Bug c++/55245] Compiler segfault when compiling the small test case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245 davidxl xinliangli at gmail dot com changed: What|Removed |Added Version|4.7.3 |4.8.0 --- Comment #1 from davidxl xinliangli at gmail dot com 2012-11-09 00:42:55 UTC --- The problem actually exists in main line compiler too.
[Bug c++/55245] Compiler segfault when compiling the small test case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245 --- Comment #2 from davidxl xinliangli at gmail dot com 2012-11-09 06:05:43 UTC --- (In reply to comment #1) The problem actually exists in main line compiler too. This is another test case. Segfaults without option, but builds ok with -DOK. The problem seems to be that the main variant of the array type Vector2float[2] does not have size, while the variant Vector2_f[2] has. The variant's size later gets reset to 0 (from main variant). David template typename VType class Vector2 { public: Vector2(); VType x() const; }; typedef Vector2float Vector2_f; void GetR(const Vector2_f mosaic_position[3]); template class C struct DefaultDeleter { float GetColorTexCoord(Vector2_f tex_coord) const { return tex_coord.x(); } }; class GetT { private: virtual void TestBody(); }; void GetT::TestBody() { #ifdef OK Vector2float mosaic_position[2][3] = { }; #else Vector2_f mosaic_position[2][3] = { }; #endif }
[Bug middle-end/35310] Late struct expansion -- missing PRE (2)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35310 --- Comment #2 from davidxl xinliangli at gmail dot com 2012-10-31 21:17:27 UTC --- GCC is still not generating good code here, neither does ICC. However LLVM does a good job here. David (In reply to comment #1) Confirmed. Looks like something for postreload-gcse to handle. Before that, there are no partial redundancies in the RTL (at least, not in the quick look I gave it).
[Bug middle-end/35305] Speculative PRE support missing
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35305 --- Comment #4 from davidxl xinliangli at gmail dot com 2012-10-30 18:57:18 UTC --- The suggested transformation can be useful in some cases, but won't be general enough. The listed example is an extreme case. For instance, the second a+b instance does not have to be in the hot trace but still still hot enough to be PREed. Or there is no one dominating traces available -- the expression is available in all incoming paths except for one rare path. switch (a) { case 1: g[1] = a + b; break; case 2: g[2] = a + b; break; case 3: g[3] = a + b; ... default: g[0] = 0; } switch (b) { case 1: ... a + b; // partially redundant case 2: ... a + b; // redundant default: // does nothing break; } Regarding handling dereferences, the availability and down-safety analysis needs to be extended to to recognize safe speculation candidates. Example 1: dereference of same pointer fully available -- int g1, g2; struct A { int a; int b; }; void foo(struct A* ap, int k,int m) { if (__builtin_expect (k, 1)) g1 = ap-b; else g2 = ap-a; if (__builtin_expect (m, 1)) { g2 = ap-b;// Good safe speculative PRE candidate } } Example 2: deference of ap fully anticipated int g1, g2; struct A { int a; int b; }; void foo(struct A* ap, int k,int m) { if (__builtin_expect (k, 1)) g1 = ap-b; if (__builtin_expect (m, 1)) g2 = ap-b; // Safe to speculatively hoist across the branch into the else of the previous branch else g1 = ap-a; } David (In reply to comment #3) Wouldn't this be a candidate for forming a superblock from hot traces of a function? Thus in the testcase if (k m) { g1 = a + b; g2 = a + b; } else { ... old code } which would also handle the case where we cannot speculatively move code (like dereferences)?
[Bug tree-optimization/35357] Loop peeling not happening
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35357 --- Comment #2 from davidxl xinliangli at gmail dot com 2012-10-26 16:55:48 UTC --- (In reply to comment #1) What is the expected optimization and what is its benefit? Should be transformed into the following the simplified loop body. void foo(int n) { int i; if (n 0) { for ( i = 0; i n; i++) { b[i*2] = a[3*i+1]; b[i*2] +=1; } b[i*2] = a[3*i + 1]; } } Icc at O2 distributes the loop body: for ( i = 0; i = n; i++) { b[i*2] = a[3*i+1]; } for ( i = 0; i n; i++) { b[i*2] += 1; }
[Bug c++/53845] Another error reporting routines re-entered issue
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53845 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #2 from davidxl xinliangli at gmail dot com 2012-07-13 00:10:07 UTC --- (In reply to comment #1) push_tinst_level tries to error out even when called (during diagnostics) by a deduction_tsubst_fntype which is passed a complain not including tf_error. Similar ICE can also be triggered in many cases with -fdump-tree-all option. The dumper tries to print template parameters which triggers template instantiation (odd). The instantiation may fail for unknown reason causing the ICE. David
[Bug c++/53605] [4.7 Regression] Compiler ICEs in size_binop_loc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53605 --- Comment #7 from davidxl xinliangli at gmail dot com 2012-06-13 22:32:20 UTC --- thanks for the fix. Is the fix going to be in gcc-4_7 branch? David
[Bug c++/53605] New: Compiler ICEs in size_binop_loc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53605 Bug #: 53605 Summary: Compiler ICEs in size_binop_loc Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com Compile the small program with gcc (trunk or 4-7 compiler), the compiler will ICE. template bool lhs_is_null_literal class EqHelper { public: template typename T1, typename T2 static int Compare( const T1 expected, const T2 actual); }; void foo(){ static const int kData[] = {}; ::EqHelperfalse::Compare(kData, abc); } This is a regression from gcc4_6. David
[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363 --- Comment #20 from davidxl xinliangli at gmail dot com 2012-05-10 20:45:58 UTC --- Auto Cloning is an independent feature. What is requested here is an extension of a previous auto clone patch by Sri (which is based on command line option -mversion=...). David (In reply to comment #18) On Thu, May 10, 2012 at 3:16 AM, vincenzo.innocente at cern dot ch gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363 --- Comment #17 from vincenzo Innocente vincenzo.innocente at cern dot ch 2012-05-10 10:16:13 UTC --- I tested this float x[1024], y[1024], z[1024], w[1024]; void foo() { for (int i=0; i!=1024; ++i) x[i]=y[i]*z[i]+w[i]; } void __attribute__ ((target(arch=corei7))) foo() { for (int i=0; i!=1024; ++i) x[i]=y[i]*z[i]+w[i]; } void __attribute__ ((target(avx))) foo() { for (int i=0; i!=1024; ++i) x[i]=y[i]*z[i]+w[i]; } and see the three versions generated + the resolver. As you notice the source code is identical as I'm exploiting compiler autovectorization here. In this case I was hoping that a single declaration such as __attribute__ ((target(arch=corei7,avx))) or __attribute__ ((target(arch=corei7),target(avx))) would generate the two versions w/o hang to duplicate the source code. Is this possible to support? Yes, this is on the list of things to add. The front-end should clone the bodies and tag the attributes appropriately. Thanks, -Sri, -- Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug.
[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220 --- Comment #5 from davidxl xinliangli at gmail dot com 2012-05-07 16:18:13 UTC --- So it is possible either 1) to keep the current G++ semantics of compound literals, but change its behavior due to the implementation change (with clobber marker); or 2) to change hte G++ semantics to match C semantic, but keep the compiler behavior the same Which way to go? If we go for 1), we probably just need to document this behavior better in GCC, and let user change their code. David
[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220 --- Comment #7 from davidxl xinliangli at gmail dot com 2012-05-07 17:03:51 UTC --- Yes, the array case should be warned or disallowed if 1 is the way to go. I won't call it a lousy choice -- the C++ semantics of the compound literals allow more agressive optimization and smaller stack usage. David (In reply to comment #6) (In reply to comment #5) 1) to keep the current G++ semantics of compound literals, but change its behavior due to the implementation change (with clobber marker); I would argue that 1 is completely useless for you can also construct an array use case from http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html It always initializes the pointer with dangling storage, and is always a bug. If keep the current g++ semantics, then the code should be rejected at compile time, and should *not* work when built without optimization. IMO, having this code working in C and not working in C++ is a lousy choice.
[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220 --- Comment #9 from davidxl xinliangli at gmail dot com 2012-05-08 00:16:30 UTC --- c++11 defines the lifetime of a temporary -- does it match C or g++'s semantics of compound literals or neither? Note that without your change, the original program may also be subject to runtime failures due to escaped life ranges of the scoped variables leading to bad stack layout -- though such bugs are more subtle and less likely to be triggered. David (In reply to comment #8) The thing is, C++11 introduces list-initialized temporaries; I could rewrite the testcase in C++11 as extern C int printf (const char *, ...); int main() { typedef int AR[4]; for (int *p = AR{1,2,3,0}; *p; ++p) { printf (%d\n, *p); } return 0; } so it made sense to me for compound literals to have the same semantics; otherwise you have a difference in lifetime based on whether or not the type is wrapped in parentheses. I definitely agree that we need to give a diagnostic about taking the address of a temporary here.
[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #3 from davidxl xinliangli at gmail dot com 2012-05-04 18:52:21 UTC --- What is the right scope for the compound literals? Is the clobber correctly added or the program is ill formed? David
[Bug middle-end/53090] New: suboptimal ivopt
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53090 Bug #: 53090 Summary: suboptimal ivopt Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com Compiling the attached benchmark code with trunk gcc, the code generated for the hot memory swap loop (line 60) is very inefficient : both icc and llvm use two ivs and generate a tight loop with 9 instructions, but gcc decides to use 3 ivs, and the loop exit testing code is wierd and inefficient -- it ends up produce a loop with 11 instructions. #define XCH(x,y){ Aint t_mp; t_mp=(x); (x)=(y); (y)=t_mp; } for( i=1, j=k-1 ; ij ; ++i, --j ) { XCH(perm[i], perm[j]) } The tight version: .LBB0_11: # %for.body57.i # Parent Loop BB0_1 Depth=1 # Parent Loop BB0_9 Depth=2 # =This Inner Loop Header: Depth=3 movl(%rbx,%rdi,4), %ebp movl(%rbx,%rsi,4), %eax movl%eax, (%rbx,%rdi,4) movl%ebp, (%rbx,%rsi,4) decq%rsi incq%rdi cmpl%edx, %edi leal-1(%rdx), %edx jl.LBB0_11 The gcc version: .L18: movl(%rdx), %edi addl$1, %ecx movl(%rsi), %eax movl%eax, (%rdx) addq$4, %rdx movl%edi, (%rsi) movl%r8d, %edi subq$4, %rsi subl%ecx, %edi cmpl%edi, %ecx jl.L18 However gcc is doing the right thing when applied on the extracted test case: #define XCH(x,y) { int t_mp; t_mp=(x); (x)=(y); (y)=t_mp; } void foo (int *perm, int k) { int i,j; for( i=1, j=k-1 ; ij ; ++i, --j ) { XCH(perm[i], perm[j]) } }
[Bug middle-end/53090] suboptimal ivopt
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53090 --- Comment #1 from davidxl xinliangli at gmail dot com 2012-04-23 17:37:40 UTC --- Created attachment 27223 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27223 benchmark source
[Bug middle-end/53081] New: memcpy/memset loop recognition
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081 Bug #: 53081 Summary: memcpy/memset loop recognition Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com Both LLVM and icc recognize initialization and copy loop and synthesize calls to memcpy and memset. memmove call can also be synthesized when src/target may overlap. Option needs to provided to disable such optimization in signal handlers. I consider this as optimization for benchmarking ;) For instance, the prime number finder program sieve.c is one of the benchmarks in LLVM. Both LLVM and icc beats gcc in this one because of the missing optimization. #ifndef T #define T int #endif T arr[1000]; void foo(int n) { int i; for (i = 0; i n; i++) { arr[i] = 0; } } void foo2(int n, T* p) { int i; for (i = 0; i n; i++) { *p++ = 0; } } #ifndef T #define T int #endif T arr[1000]; T arr2[1000]; void foo(int n) { int i; for (i = 0; i n; i++) { arr[i] = arr2[i]; } } // sieve.c /* -*- mode: c -*- * $Id: sieve.c 36673 2007-05-03 16:55:46Z laurov $ * http://www.bagley.org/~doug/shootout/ */ #include stdio.h #include stdlib.h int main(int argc, char *argv[]) { #ifdef SMALL_PROBLEM_SIZE #define LENGTH 17000 #else #define LENGTH 17 #endif int NUM = ((argc == 2) ? atoi(argv[1]) : LENGTH); static char flags[8192 + 1]; long i, k; int count = 0; while (NUM--) { count = 0; for (i=2; i = 8192; i++) { flags[i] = 1; } for (i=2; i = 8192; i++) { if (flags[i]) { /* remove all multiples of prime: i */ for (k=i+i; k = 8192; k+=i) { flags[k] = 0; } count++; } } } printf(Count: %d\n, count); return(0); }
[Bug middle-end/53081] memcpy/memset loop recognition
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081 --- Comment #3 from davidxl xinliangli at gmail dot com 2012-04-23 05:34:24 UTC --- (In reply to comment #2) I should mention I made one patch before based on the vectorizer code which did detection of at least memset; it was while I was an intern at Apple. I posted it and there was some review. And a few years back there was a paper at the GCC summit about it and expanding it to memcpy. I don't know what happened to that code though. Some simple analysis using scev to identify loads and stores with linear address should be good enough. LLVM's version also tries to merge smaller memsets into a larger one.
[Bug middle-end/53081] memcpy/memset loop recognition
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081 --- Comment #4 from davidxl xinliangli at gmail dot com 2012-04-23 05:34:55 UTC --- (In reply to comment #2) I should mention I made one patch before based on the vectorizer code which did detection of at least memset; it was while I was an intern at Apple. I posted it and there was some review. And a few years back there was a paper at the GCC summit about it and expanding it to memcpy. I don't know what happened to that code though. Some simple analysis using scev to identify loads and stores with linear address should be good enough. LLVM's version also tries to merge smaller memsets into a larger one.
[Bug middle-end/53082] New: local malloc/free optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53082 Bug #: 53082 Summary: local malloc/free optimization Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com Malloc/free pairs sometimes are used in a way that behaves like a stack memory. The optimizer should recognize the pattern and take advantage of it. For instance if the memory is not escaped, dead stores to the memory can be removed. In extreme case, the malloc/free pair can be completely removed. For instance: for (i = 0; i 100; i++) { int *s = (int*) malloc (sizeof(int)); *s = 10; free (s); } This code can be completely removed (which LLVM does). The above can be considered as a benchmarking optimization -- see the one appended (objinst.c in LLVM's benchmark suite). The following more advanced optimization (which as far as I know only HP compiler does) can also be done: coalescing multiple malloc calls into one alloca call, and eliminating the free calls. The memory can be escaped. for (...) { int *s1 = malloc (...); int *s2 = malloc (...); int *s3 = malloc (...); ... free(s1); free(s2); free(s3); } == for (...) { int s = alloca (...); s1 = s; s2 = s+ .. s3 = s+ .. ... } // objinst.c /* -*- mode: c -*- * $Id: objinst.c 36673 2007-05-03 16:55:46Z laurov $ * http://www.bagley.org/~doug/shootout/ */ #include stdio.h #include stdlib.h enum {false, true}; #define TOGGLE \ char state; \ char (*value)(struct Toggle *); \ struct Toggle *(*activate)(struct Toggle *) #define DESTROY free typedef struct Toggle { TOGGLE; } Toggle; char toggle_value(Toggle *this) { return(this-state); } Toggle *toggle_activate(Toggle *this) { this-state = !this-state; return(this); } Toggle *init_Toggle(Toggle *this, char start_state) { this-state = start_state; this-value = toggle_value; this-activate = toggle_activate; return(this); } Toggle *new_Toggle(char start_state) { Toggle *this = (Toggle *)malloc(sizeof(Toggle)); return(init_Toggle(this, start_state)); } typedef struct NthToggle { Toggle base; int count_max; int counter; } NthToggle; NthToggle *nth_toggle_activate(NthToggle *this) { if (++this-counter = this-count_max) { this-base.state = !this-base.state; this-counter = 0; } return(this); } NthToggle *init_NthToggle(NthToggle *this, int max_count) { this-count_max = max_count; this-counter = 0; this-base.activate = (Toggle *(*)(Toggle *))nth_toggle_activate; return(this); } NthToggle *new_NthToggle(char start_state, int max_count) { NthToggle *this = (NthToggle *)malloc(sizeof(NthToggle)); this = (NthToggle *)init_Toggle((Toggle *)this, start_state); return(init_NthToggle(this, max_count)); } int main(int argc, char *argv[]) { #ifdef SMALL_PROBLEM_SIZE #define LENGTH 700 #else #define LENGTH 7000 #endif int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH); Toggle *tog; NthToggle *ntog; tog = new_Toggle(true); for (i=0; i5; i++) { puts((tog-activate(tog)-value(tog)) ? true : false); } DESTROY(tog); for (i=0; in; i++) { tog = new_Toggle(true); DESTROY(tog); } puts(); ntog = new_NthToggle(true, 3); for (i=0; i8; i++) { const char *Msg; if (ntog-base.activate((Toggle*)ntog)-value((Toggle*)ntog)) Msg = true; else Msg = false; puts(Msg); } DESTROY(ntog); for (i=0; in; i++) { ntog = new_NthToggle(true, 3); DESTROY(ntog); } return 0; }
[Bug c++/52957] Missing suggestions on '=' and '==' confusion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957 --- Comment #11 from davidxl xinliangli at gmail dot com 2012-04-15 04:09:53 UTC --- (In reply to comment #7) are against it). I don't think there is any GCC contributor that is paid to improve GCC's diagnostics (am I wrong?). Right, but partially because it is a hard thing to do improve and the ROI is low. It is not *technically* hard. Anyone that knows C and a bit of C++ can fix hundreds (if you know C++ well and have a some experience with the C++ parser, you can probably fix thousands of bugs on your own, each fix taking no more than a couple of hours of debugging and few minutes of programming). However, it is time-consuming because of all the bureaucracy: copyright assignment, setting up the environment, bootstrapping+testing, patch submission, patch review, communication delays, style nits, changelogs nits, and updating the testsuite. That is why even current contributors often don't bother with one-liners, because all the associated tasks can make a one minute change require days of effort. For many compiler hobbyists, the bureaucratic issues you mentioned don't exist yet before they start hacking on GCC FE. If the learning curve is steep and the interfaces are obscure, GCC will be less appealing to them. There is a difference between 'can do' and 'want to do'. This is very different from Clang, where Chris obviously cares a great deal about Clang's diagnostics, and it seems to be supported by a team in Apple (and in Google?). Yes, and in Google, but do you know why? Because it is easy to improve their diagnostics and benefit can be quickly seen. But why is it easy? For example, the C++ FE, although I don't like many of its design decisions, is quite easy to hack. Clang can be used to do lots of interesting stuff -- such as source code indexing (for semantic based code search), complicated automatic code refactoring etc. The Clang's design makes it possible because 1) source location information is well preserved; 2) information about macros are preserved; 3) there is no on the fly constant folding etc. And I don't think this is the reason. GCC is obviously easy to hack for their current developers, but they focus on different things than the Clang/LLVM developers. The whole attitude is quite different, for better or worse, and in terms of popularity, the LLVM crowd wins (of course, the GCC crowd may say that this is not a popularity contest, but if you want to attract new developers, who now have a choice, then it is). However, there are a lot of them, and getting patches reviewed and accepted requires herculean amounts of perseverance. But probably you can share with me more about your (or Googlers) thoughts on the latter, since you have the experience of submitting the same project to both LLVM and GCC. Infrastructure makes a big difference here -- see more about this below. Indeed, but my point is that moving to C++ does little (in my opinion) to fix the infrastructure issues in GCC: dejagnu is awful, the wiki is unmaintained, patch submission is a pain, patch review is a bottleneck, no automatic style checks, useless obligatory changelogs, no marketing, poor documentation for beginners, poor coordination with other projects (binutils, gdb, glibc), etc. (and, worse, the expectation that new/non-paid contributors should take care of fixing these issues). Nothing of the above gets fixed by switching to C++ or by having a clearly defined AST... I think they are clearly related. In any case, I hope you are right, and the move to C++ helps. I imagine that it can help to attract more corporate contributors, who don't have to bother with legal paperwork and are paid to do also boring bits. For sporadic and/or non-paid contributors, there are other bigger issues to solve. Moving to C++ itself won't do anything useful -- it is the new design as a result of moving to C++ that will make a difference. David
[Bug c++/52954] New: Missing bounds check warning
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52954 Bug #: 52954 Summary: Missing bounds check warning Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com int foo() { int arr[100]; // ... return arr[100]; } /* From clang: array.cpp:4:10: warning: array index 100 is past the end of the array (which contains 100 elements) [-Warray-bounds] return arr[100]; ^ ~~~ array.cpp:2:3: note: array 'arr' declared here int arr[100]; ^ 1 warning generated. */
[Bug c++/52954] Missing bounds check warning
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52954 --- Comment #1 from davidxl xinliangli at gmail dot com 2012-04-12 21:56:07 UTC --- Another example: constexpr int arr_size = 42; constexpr int N = 44; void f(int); int test() { int arr[arr_size]; // ... f(arr[N]); // ... if (N arr_size) return arr[N]; return 0; } /* deadcode.cpp:7:5: warning: array index 44 is past the end of the array (which contains 42 elements) [-Warray-bounds] f(arr[N]); ^ ~ deadcode.cpp:5:3: note: array 'arr' declared here int arr[arr_size]; ^ 1 warning generated. */
[Bug c++/52955] New: Missing warning on wrong sizeof usage
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52955 Bug #: 52955 Summary: Missing warning on wrong sizeof usage Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com Example: #include memory.h struct S { int a, b, c; float vec[16]; }; void test(S *s) { // ... memset(s, 0, sizeof(s)); // ... } /* memset.cpp:5:23: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to dereference it? [- Wsizeof-pointer-memaccess] memset(s, 0, sizeof(s)); ~^ 1 warning generated. */
[Bug c++/52956] New: Missing overflow warning
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52956 Bug #: 52956 Summary: Missing overflow warning Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com static const long long DiskCacheSize = 8 30; // 8 Gigs /* overflow.cpp:1:42: warning: signed shift result (0x2) requires 35 bits to represent, but 'int' only has 32 bits [-Wshift-overflow] static const long long DiskCacheSize = 8 30; // 8 Gigs ~ ^ ~~ 1 warning generated. */
[Bug c++/52957] New: Missing suggestions on '=' and '==' confusion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957 Bug #: 52957 Summary: Missing suggestions on '=' and '==' confusion Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com void foo(); void bar(); void test(int i) { // ... if (i = 42) foo(); else bar(); } /* parentheses1.cpp:5:9: warning: using the result of an assignment as a condition without parentheses [-Wparentheses] if (i = 42) foo(); ~~^~~~ parentheses1.cpp:5:9: note: place parentheses around the assignment to silence this warning if (i = 42) foo(); ^ ( ) parentheses1.cpp:5:9: note: use '==' to turn this assignment into an equality comparison if (i = 42) foo(); ^ == 1 warning generated. */
[Bug c++/52958] New: Missing warning on missed parehthesis
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52958 Bug #: 52958 Summary: Missing warning on missed parehthesis Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com int test(bool b, int x, int y) { return 42 + b ? x : y; } /* parentheses4.cpp:2:17: warning: operator '?:' has lower precedence than '+'; '+' will be evaluated first [-Wparentheses] return 42 + b ? x : y; ~~ ^ parentheses4.cpp:2:17: note: place parentheses around the '+' expression to silence this warning return 42 + b ? x : y; ^ ( ) parentheses4.cpp:2:17: note: place parentheses around the '?:' expression to evaluate it first return 42 + b ? x : y; ^ () 1 warning generated. */
[Bug c++/52959] New: Missing typo suggestions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52959 Bug #: 52959 Summary: Missing typo suggestions Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com struct BaseType {}; struct DerivedType : public BaseType { static int base_type; DerivedType() : basetype() {} }; /* typo1.cpp:4:19: error: initializer 'basetype' does not name a non-static data member or base class; did you mean the base class 'BaseType'? DerivedType() : basetype() {} ^~~~ BaseType typo1.cpp:2:22: note: base class 'BaseType' specified here struct DerivedType : public BaseType { ^~~ 1 error generated. */ namespace fiz { namespace bang { int foobar(); } } int test() { return bang::Foobar(); } /* typo2.cpp:5:10: error: use of undeclared identifier 'bang'; did you mean 'fiz::bang'? return bang::Foobar(); ^~~~ fiz::bang typo2.cpp:1:27: note: 'fiz::bang' declared here namespace fiz { namespace bang { ^ typo2.cpp:5:16: error: no member named 'Foobar' in namespace 'fiz::bang'; did you mean 'foobar'? return bang::Foobar(); ~~^~ foobar typo2.cpp:2:7: note: 'foobar' declared here int foobar(); ^ 2 errors generated. */
[Bug c++/52960] New: Missing warnings on ambiguous source : function decl vs local var decl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52960 Bug #: 52960 Summary: Missing warnings on ambiguous source : function decl vs local var decl Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com struct S { int n; }; S bar(); S test() { S foo, bar(); // ... return foo; } /* conversions.cppreturn.cpvexing.cpp vexing.cpp:5:6: warning: empty parentheses interpreted as a function declaration [-Wvexing-parse] bar(); ^~ vexing.cpp:4:8: note: change this ',' to a ';' to call 'bar' S foo, ^ ; vexing.cpp:5:6: note: replace parentheses with an initializer to declare a variable bar(); ^~ {} 1 warning generated. */
[Bug c++/52961] New: Missing warning on empty if
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52961 Bug #: 52961 Summary: Missing warning on empty if Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com int test(int i) { if (i 42); return 42; return i; } /* emptyif.cpp:2:14: warning: if statement has empty body [-Wempty-body] if (i 42); ^ 1 warning generated. */
[Bug c++/52962] New: Column number incorrect in error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52962 Bug #: 52962 Summary: Column number incorrect in error Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com #define M1(x, y, z) y(); #define M2(x, y, z) M1(x, y, z) void test() { M2(a, b, c); } /* = Clang macros1.cpp:4:9: error: use of undeclared identifier 'b' M2(a, b, c); ^ macros1.cpp:2:27: note: expanded from macro 'M2' #define M2(x, y, z) M1(x, y, z) ^ macros1.cpp:1:21: note: expanded from macro 'M1' #define M1(x, y, z) y(); ^ 1 error generated. */
[Bug c++/52963] New: Missing error on pack expansion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52963 Bug #: 52963 Summary: Missing error on pack expansion Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com template class T, class U struct S {}; template typename... Ts void f(Ts...); template typename... Ts struct X { template typename... Us void g() { f(STs, Us()...); } }; void test(Xint, float x) { x.gint, float, double(); } /* oscar-wilde.cpp:6:18: error: pack expansion contains parameter packs 'Ts' and 'Us' that have different lengths (2 vs. 3) f(STs, Us()...); ~~ ~~ ^ oscar-wilde.cpp:11:5: note: in instantiation of function template specialization 'Xint, float::gint, float, double' requested here x.gint, float, double(); ^ 1 error generated. */
[Bug c++/52964] New: No warning on negative array size in template instantatiation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52964 Bug #: 52964 Summary: No warning on negative array size in template instantatiation Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com template int N struct S1 { int arr[N - 5]; }; template typename T struct S2 { S1sizeof(T) s1; }; template typename T void foo(T x) { S2T s2; } void test() { foo(42); } /* From Clang: templates1.cpp:1:38: error: 'arr' declared as an array with a negative size template int N struct S1 { int arr[N - 5]; }; ^ templates1.cpp:2:49: note: in instantiation of template class 'S14' requested here template typename T struct S2 { S1sizeof(T) s1; }; ^ templates1.cpp:3:45: note: in instantiation of template class 'S2int' requested here template typename T void foo(T x) { S2T s2; } ^ templates1.cpp:4:15: note: in instantiation of function template specialization 'fooint' requested here void test() { foo(42); } == */
[Bug c++/52961] Missing warning on empty if
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52961 --- Comment #3 from davidxl xinliangli at gmail dot com 2012-04-12 23:11:10 UTC --- (In reply to comment #2) Yes, see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36478 . thanks. I tried -Wempty-body, gcc gives warning as expected: emptyif.cpp: In function 'int test(int)': emptyif.cpp:2:15: warning: suggest braces around empty body in an 'if' statement [-Wempty-body] if (i 42);
[Bug c++/52964] No warning on negative array size in template instantatiation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52964 --- Comment #3 from davidxl xinliangli at gmail dot com 2012-04-12 23:19:33 UTC --- (In reply to comment #2) And with trunk we print: pr52964.cc: In instantiation of ‘struct S14’: pr52964.cc:2:49: required from ‘struct S2int’ pr52964.cc:3:45: required from ‘void foo(T) [with T = int]’ pr52964.cc:4:21: required from here pr52964.cc:1:43: error: size of array is negative template int N struct S1 { int arr[N - 5]; }; ^ We could easily print the caret for each line. And the locations could be much better. I also like more the order of Clang and the extra info about what each thing is, but we would have to convince Gabriel and/or Jason. David, what else do you see that could be improved? With -fsyntax-only, the warning is not emitted -- any reason for that? David
[Bug c++/52957] Missing suggestions on '=' and '==' confusion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957 --- Comment #4 from davidxl xinliangli at gmail dot com 2012-04-12 23:28:38 UTC --- (In reply to comment #3) Yep, a dup. I am sorry David, but many of these deficiencies are well-known. What GCC sorely needs is people working on fixing them, and that is an issue that has no easy fix and it is only going to get worse. :-( *** This bug has been marked as a duplicate of bug 29280 *** This brings up the issue of programmer productivity where Clang can claim that their C++ based code base and clearly defined AST can make fixing things like this easier .. David
[Bug c++/52957] Missing suggestions on '=' and '==' confusion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957 --- Comment #6 from davidxl xinliangli at gmail dot com 2012-04-13 05:01:56 UTC --- I think you indirectly proved my point (see below) :) (In reply to comment #5) Maintainers (those who decide) are too few and they either do not care or do not have time to fix these things. Existing or new contributors that are paid to do something specific don't care about anything else as long as it doesn't interfere with their job (if it does, whatever improvement it may be, then they are against it). I don't think there is any GCC contributor that is paid to improve GCC's diagnostics (am I wrong?). Right, but partially because it is a hard thing to do improve and the ROI is low. This is very different from Clang, where Chris obviously cares a great deal about Clang's diagnostics, and it seems to be supported by a team in Apple (and in Google?). Yes, and in Google, but do you know why? Because it is easy to improve their diagnostics and benefit can be quickly seen. Contributors to GCC that care about these things, either unpaid or in their free time, are very few and completely overworked. It is true that the GCC code base has a huge technical debt, but diagnostic fixes tend to be easy to fix, if you have a little of experience. But Clang is even easier to hack on. Any person without any compiler background can build tools on top of it. In other words, the bar of working on compiler becomes very low -- and this is really exciting and attractive for developers who used to think it as untouchable area to work on. However, there are a lot of them, and getting patches reviewed and accepted requires herculean amounts of perseverance. But probably you can share with me more about your (or Googlers) thoughts on the latter, since you have the experience of submitting the same project to both LLVM and GCC. Infrastructure makes a big difference here -- see more about this below. New contributors to GCC are almost non-existent (except those that are paid to work on very specific stuff, like a new port, but these new contributors also don't care about GCC in general, as long as it doesn't interfere with their job). The reasons why GCC fails to attract new contributors are very well-known: the copyright assignment is a major hurdle for casual/unpaid contributors, patches from new contributors are routinely ignored and the patch review process is hostile to new contributors (in reality, much less than in the Linux kernel, but they have enough developers to afford being hostile). I am aware of this. Interacting for the first time with GCC developers can easily discourage a lot of people (there is a whole page just on the subject! http://gcc.gnu.org/wiki/GCC_Research). In Clang for example, small obvious fixes proposed by new contributors without any changelog or even without a patch are often committed immediately. I have never seen this in +6 years contributing to GCC, on the contrary, the contributor will be asked to bootstrap, regtest, provide a Changelog, fix the Changelog, provide a patch, read the code style, fix the code style, some bikeshedding, and between each step, there may be weeks of no feedback at all. The root cause is still GCC's older infrastructure. Lack of modern design leads to weird interactions between various components, lots of hidden assumptions, lack of good verification etc -- all these lead to the problem you mention -- GCC patches from new contributors are bound to be buggy or to trigger other latent bugs here and there, which means the heavy weight process in GCC is required. Being an expert in GCC requires years of knowledge accumulation and hostility is bound to happen (and ironically really required to ensure quality) because qualified reviewers are so few and it just does not scale. Clang can afford to lower the bar for contributors because its infrastructure. GCC also has a huge marketing problem, no overall vision, no clear leadership (it is easy to get contradictory answers by the people who decide), and GCC developers show both a lack of enthusiasm (just compare presentations by Clang/LLVM developers, full of amazing features, awesome graphics, never-seen-before numbers and LLVM webpages are full of how good their compiler is, how much better than anything else, etc.) and a sense of superiority I think part of the reasons is that contributors in that community are much younger on average (the requirement on contributor experience is lower). Young kids tend to be very enthusiastic. (we don't want to be like Clang/LLVM! It produces slower code by 0.0001%! I will assure you the performance difference is much more than that. It doesn't support mmix-knuth-mmixware). These kind of things are essential to attract new contributors, however, they do not seem to provide any advantage to existing developers, and they are seen as pointless extra work better done by, wait
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #28 from davidxl xinliangli at gmail dot com 2012-01-11 17:26:46 UTC --- See comment 24 for shorter test case. Summary: 1) the regression reported by Oleg in gcc4_6 and earlier versions is due to FE code generation difference which lead to the backend to generate code leading to partial register stall. 2) the RAT stall problem is fixed in gcc4_7 3) however in 4_7, there is a different problem -- redundant sign-extension and move instruction is generated. It could be due to the limitation in RTL forward propagation and combine pass to deal with multiple downward uses David
[Bug c++/23383] builtin array operator new is not marked with malloc attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #20 from davidxl xinliangli at gmail dot com 2012-01-05 18:11:18 UTC --- (In reply to comment #19) On Wed, 4 Jan 2012, xinliangli at gmail dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #18 from davidxl xinliangli at gmail dot com 2012-01-04 17:11:26 UTC --- (In reply to comment #17) On Wed, 4 Jan 2012, xinliangli at gmail dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #16 from davidxl xinliangli at gmail dot com 2012-01-04 00:28:55 UTC --- A related topic - aliasing property of realloc -- the malloc attribute is not applied in the glibc header and the comment is like /* __attribute_malloc__ is not used, because if realloc returns the same pointer that was passed to it, aliasing needs to be allowed between objects pointed by the old and new pointers. * It is true that the realloc can return an address is physically aliased with the pointer passed to it -- but assuming no-alias by the compiler should cause no harm -- as all code motions/CSEs across the realloc call will not be possible because realloc may modify/use the memory location. Any comment on this? The malloc attribute assumes that the contents of the memory pointed to by the return value is undefined, so the comment is inaccurate but the malloc attribute can indeed be not used. Which part of the optimizer takes advantage of the 'undefinedness' of returned memory? points-to analysis. It assumes that the returned blob of memory points to nothing (yet). So for int i; int **p = malloc (8); *p = i; int **q = realloc (p, 8); you'd get that *q points to nothing insteda of i. The malloc attribute documentation is confusing: malloc The malloc attribute is used to tell the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer valid when the function returns. This will often improve optimization. Standard functions with this property include malloc and calloc. realloc-like functions have this property as long as the old pointer is never referred to (including comparing it to the new pointer) after the function returns a non-NULL value. It does not mention the undefineness explicitly. It might be better to fix the semantics of this attribute to only specify the aliasing property so that it can also be applied to realloc. Points-to needs to be fixed to special case realloc for the initialization. David We can explicitly handle REALLOC in the points-to code to honor the fact that it is not (we do not at the moment). ok. which needs to be fixed here. Richard.
[Bug c++/23383] builtin array operator new is not marked with malloc attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #22 from davidxl xinliangli at gmail dot com 2012-01-05 18:54:51 UTC --- (In reply to comment #21) But can't a valid code also compare the result from realloc with the old pointer, and if they are equal, do something, otherwise do something else? I think it is pretty common e.g. if the malloced block contains pointers to parts of the malloced area and upon realloc that didn't return the passed address wants to adjust all those pointers. Having a malloc attribute on realloc would still break this. I'd say we want realloc attribute and handle it where we currently handle BUILT_IN_REALLOC. Right, the case you describe may be common. On the other hand, the compiler probably should not optimize away pointer comparisons without knowing if the pointer is used after 'free' or 'free' like function 'realloc'. David
[Bug c++/23383] builtin array operator new is not marked with malloc attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #18 from davidxl xinliangli at gmail dot com 2012-01-04 17:11:26 UTC --- (In reply to comment #17) On Wed, 4 Jan 2012, xinliangli at gmail dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #16 from davidxl xinliangli at gmail dot com 2012-01-04 00:28:55 UTC --- A related topic - aliasing property of realloc -- the malloc attribute is not applied in the glibc header and the comment is like /* __attribute_malloc__ is not used, because if realloc returns the same pointer that was passed to it, aliasing needs to be allowed between objects pointed by the old and new pointers. * It is true that the realloc can return an address is physically aliased with the pointer passed to it -- but assuming no-alias by the compiler should cause no harm -- as all code motions/CSEs across the realloc call will not be possible because realloc may modify/use the memory location. Any comment on this? The malloc attribute assumes that the contents of the memory pointed to by the return value is undefined, so the comment is inaccurate but the malloc attribute can indeed be not used. Which part of the optimizer takes advantage of the 'undefinedness' of returned memory? We can explicitly handle REALLOC in the points-to code to honor the fact that it is not (we do not at the moment). ok.
[Bug c++/23383] builtin array operator new is not marked with malloc attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383 --- Comment #16 from davidxl xinliangli at gmail dot com 2012-01-04 00:28:55 UTC --- A related topic - aliasing property of realloc -- the malloc attribute is not applied in the glibc header and the comment is like /* __attribute_malloc__ is not used, because if realloc returns the same pointer that was passed to it, aliasing needs to be allowed between objects pointed by the old and new pointers. * It is true that the realloc can return an address is physically aliased with the pointer passed to it -- but assuming no-alias by the compiler should cause no harm -- as all code motions/CSEs across the realloc call will not be possible because realloc may modify/use the memory location. Any comment on this? David (In reply to comment #15) (In reply to comment #14) We do the exact opposite - type-based rules override points-to must-alias information (or really may-alias information). Also for the proposed scheme to work you need to guarantee that you always can compute correct points-to relations (I mean, if points-to information says pt_anything and if you then assume must-alias and thus a conflict then you simply disable TBAA completely). Right, in general, type alias rules should override field and flow insensitive pointer aliasing information as they really have very low confidence level (especially for pt_anything case which is just a baseless guess) -- but precise/trustworthy aliasing info should be checked before assertion based alias information and decide whether to proceed. For example: if (no_alias_according_to_conservative_pointer_info) return no_alias; if (no_alias_according_to_precise_pointer_info) return no_alias; if (must_alias or definitely_may_alias) return may/must_alias; (1) // now proceed with type based rules, etc. This is in theory. In practice, it can be tricky to tag the confidence level of aliasing info. David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #20 from davidxl xinliangli at gmail dot com 2011-10-24 19:33:18 UTC --- The test.cpp attached seems to be the same as the old version. David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #22 from davidxl xinliangli at gmail dot com 2011-10-24 19:58:23 UTC --- (In reply to comment #21) OK, just in case, here is my current test. Preprocessed test case? I saw the main assembly difference that can explain the performance diff, but want to make sure it is not due to your new source change (I saw some print statement addeded). David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #24 from davidxl xinliangli at gmail dot com 2011-10-24 23:00:22 UTC --- (In reply to comment #23) Here is the source preprocessed for gcc47. The test exhibits the slowdown mentioned in comment 11. The problem can be reproduced with a simplified test case -- basically depending on how the result value from the inner loop is used in the outer loop (related to casting), the inner loop code is quite different - in the slow case, there are two redundant sign extension and a move instructions generated. # the fast version gcc -O3 -DFAST_VER bug.cpp ./a.out rv=4282167296 test description absolute operations ratio with number time per second test0 0 int8_t constant add 1.05 sec 1523.81 M 1.00 Total absolute time for int8_t constant folding: 1.05 sec # the slow version: gcc -O3 bug.cpp ./a.out rv=4282167296 test description absolute operations ratio with number time per second test0 0 int8_t constant add 1.57 sec 1019.11 M 1.00 Total absolute time for int8_t constant folding: 1.57 sec # however, when disabling inlining of check_shifted_sum_1 in the slow case, the runtime is recovered: gcc -O3 -DNOINLINE bug.cpp ./a.out rv=4282167296 test description absolute operations ratio with number time per second test0 0 int8_t constant add 1.05 sec 1523.81 M 1.00 Total absolute time for int8_t constant folding: 1.05 sec The inner loop body in faster case: .L60: movzbl0(%rbp,%rcx), %r9d addq$1, %rcx cmpl%ecx, %ebx leal10(%r8,%r9), %r8d # SUCC: 4 [91.0%] (dfs_back,can_fallthru) 5 [9.0%] (fallthru,can_fallthru,loop_exit) jg.L60 while for the slow case: .L60: movzbl(%r12,%rcx), %eax movsbl%r8b, %r8d addq$1, %rcx leal10(%rax), %r9d movsbl%r9b, %r9d addl%r8d, %r9d cmpl%ecx, %ebp movl%r9d, %r8d # SUCC: 4 [91.0%] (dfs_back,can_fallthru) 5 [9.0%] (fallthru,can_fallthru,loop_exit) jg.L60 The relevant source change: #ifdef NOINLINE #define INL __attribute__((noinline)) #else #define INL inline #endif template typename T, typename T2, typename Shifter INL void check_shifted_sum_1(T2 result) { T temp = (T)SIZE * Shifter::do_shift((T)init_value); if (!tolerance_equalT((T)result,temp)) printf(test %i failed\n, current_test); } #ifdef FAST_VER #define TYPE u_int32_t #else #define TYPE int8_t #endif template typename T, typename Shifter __attribute__((noinline)) u_int32_t test_constant(T* first, int count, const char *label) { int i; u_int32_t rv = 0; start_timer(); for (i = 0; i iterations; ++i) { T result = 0; for (int n = 0; n count; ++n) { result += Shifter::do_shift( first[n] ); } rv += result; check_shifted_sum_1T, TYPE, Shifter(result); } record_result( timer(), label ); return rv; }
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #25 from davidxl xinliangli at gmail dot com 2011-10-24 23:02:14 UTC --- Created attachment 25600 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25600 test case for 47 Note that with gcc46, the result is even slower -- it has the RAT stall problem which is fixed in 47. David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #15 from davidxl xinliangli at gmail dot com 2011-10-21 23:02:16 UTC --- (In reply to comment #14) (In reply to comment #13) David, it looks like we are seeing different things with v4.7... See my comment 11 - I am still observing the slowdown. Do you have access to v4.1 and v4.6? Could you try reproducing my test please? Sorry for the delay -- I am pretty swamped these days (till mid October). I will try to look at the problem more then. David I still can not reproduce the problem with trunk compiler: rv=4282167296 test description absolute operations ratio with number time per second test0 0 int8_t constant add 1.09 sec 1467.89 M 1.00 Total absolute time for int8_t constant folding: 1.09 sec Can you attach the output of -v and the assembly file with -fverbose-asm -dA and the optimized dump file with option -fdump-tree-optimized-blocks using trunk compiler? thanks, David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #14 from davidxl xinliangli at gmail dot com 2011-09-15 17:28:10 UTC --- (In reply to comment #13) David, it looks like we are seeing different things with v4.7... See my comment 11 - I am still observing the slowdown. Do you have access to v4.1 and v4.6? Could you try reproducing my test please? Sorry for the delay -- I am pretty swamped these days (till mid October). I will try to look at the problem more then. David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #8 from davidxl xinliangli at gmail dot com 2011-08-25 16:17:10 UTC --- gcc46 and gcc47 difference can be reproduced using -O2 -m64. David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 davidxl xinliangli at gmail dot com changed: What|Removed |Added CC||xinliangli at gmail dot com --- Comment #2 from davidxl xinliangli at gmail dot com 2011-08-24 23:15:44 UTC --- The problem is fixed in trunk compiler: 1) with 4.6 compiler: test description absolute operations ratio with number time per second test0 0 int8_t constant add 3.29 sec 486.32 M 1.00 RAT_STALLS.registers = 288249 (sampling count 10001) 2) with trunk compiler: test description absolute operations ratio with number time per second test0 0 int8_t constant add 1.34 sec 1194.03 M 1.00 No partial register stalls from user functions. Inner loop from trunk compiler: .L55: movzbl0(%rbp,%rcx), %r9d addq$1, %rcx cmpl%ecx, %ebx leal10(%r8,%r9), %r8d jg.L55 Inner loop from 46 compiler: .L43: addl$10, %eax addb(%rdx), %al addq$1, %rdx cmpq$data8+8000, %rdx jne.L43 RAT stalls (not precise event so the instruction causing stalls is a little off) : 400e27:nopw 0x0(%rax,%rax,1) 127 0.0440 : 400e30:add$0xa,%eax 5869 2.0330 : 400e33:add(%rdx),%al 282125 97.7263 : 400e35:add$0x1,%rdx : 400e39:cmp$0x404560,%rdx : 400e40:jne400e30 main+0xd0 David
[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 --- Comment #3 from davidxl xinliangli at gmail dot com 2011-08-25 00:13:00 UTC --- Caused by differences in FE generated code: 46: D.6887 = (int) D.6886; D.6888 = custom_constant_addsigned char::do_shift (D.6887); D.6889 = (unsigned char) D.6888; result.8 = (unsigned char) result; D.6891 = D.6889 + result.8; result = (signed char) D.6891; n = n + 1; trunk: D.6938 = (int) D.6937; D.6874 = custom_constant_addsigned char::do_shift (D.6938); D.6939 = (int) result; -- promoted to int D.6940 = (int) D.6874; ---promoted to int D.6941 = D.6939 + D.6940; result = (signed char) D.6941; n = n + 1;
[Bug c++/49855] New: internal compiler error: in fold_convert_const_int_from_real
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49855 Summary: internal compiler error: in fold_convert_const_int_from_real Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: xinlian...@gmail.com In the following code, FE should generate FIX_TRUNC_EXPR instead of NOP_EXPR for the float to int conversion. extern void foo(int); template class Key, class Value void Basic() { const int kT = 1.5e6;// --- causes ICE int size = kT*2/3; do { foo(size); size = size * 0.5 - 1; } while (size = 0 ); } Note that removing the template specification, the right code is generated and there is no ICE. David
[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363 --- Comment #6 from davidxl xinliangli at gmail dot com 2011-07-23 17:28:30 UTC --- (In reply to comment #5) Any news on this item? Is this feature still foreseen for 4.7? A patch to test for instance. Sri is working on this. He will post a formal specification of feature soon. Part of the proposal also includes implementation of the MV runtime support (runtime initialization to determine cpu topology and intrinsics to query) and command line option extensions. David
[Bug regression/49498] [4.7 Regression]: gcc.dg/uninit-pred-8_b.c bogus warning line 20
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49498 --- Comment #9 from davidxl xinliangli at gmail dot com 2011-07-07 19:31:53 UTC --- On Thu, Jul 7, 2011 at 9:23 AM, law at redhat dot com gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49498 --- Comment #8 from Jeffrey A. Law law at redhat dot com 2011-07-07 16:22:48 UTC --- -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/05/11 16:52, xinliangli at gmail dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49498 --- Comment #7 from davidxl xinliangli at gmail dot com 2011-07-05 22:51:49 UTC --- (In reply to comment #6) Created attachment 24693 [details] k.c.127t.uninit 1) is the complicated control flow generated by if-merging + jump-threading? I think jump-threading is the primarly culprit. At one time I tried to track how things changed through the passes, but ultimately decided the end result either had to be analyzable or not by the uninit code. 2) On the targets that have the problem, is branch cost considered cheap? No idea. I didn't bother to look at why cris-elf triggers the problem, but x86 doesn't. Presumably it's a branch-cost or similar issue. 3) Looks like there are more jump-threading opportunities that can clean up the control flow, namely, Yes. Jump threading is inherently an iterative process. It was decided some time ago to remove the iterative step as it doesn't buy a whole lot in terms of code quality and it's a fairly significant compile-time cost. Does it need to be iterative? Any simple example to show dependency of jump-threading analysis on actual transformation of some other jump-threading instance (e.g, opportunity only gets exposed after transformation)? During my investigations noticed the same thing and manually reran DOM from within GDB and verified it further simplified the CFG, but rerunning DOM isn't really going to be an option. Is incremental DOM update a choice? David With the complicated control flow, the uninit analysis is definitely confused -- requires powerful symbolic evaluation to simplify the predicates: the predicates guarding the use in BLOCK 6: (n=9 AND m==0) OR (n = 9 AND m != 0 AND (n =9 OR m 100)) OR (n 9 AND m 100) OR (n 9 AND m = 100 AND (n=9 OR m 100)) of course there might be other bugs in the analysis that prevent the above from formulated, but nonetheless, it is unlikely to be handled). OK. That's kindof what I needed to know -- we're unlikely to handle this particular case. Solution 1) fix the test case by disabling jumpthreading and if-merging Or xfailing it for the affected targets. 2) or maybe move the uninit analysis earlier in the pipeline. As you know, that runs the risk of introducing other false positives. I can live with #1 or the target-specific xfailing approach. jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJOFd1BAAoJEBRtltQi2kC7wpsIAJ1FStgYozE/6o8o5mZrj+5V hWP7WmspSVFQMLTGDZw7hNeCEnE2rIl/8Tmin6/GDWM50oNGati2HqTRqTSZwU0y YvMY8NlH1/YY4hJ94YPEpNrALAIwD8w3qdhiPVlS7eTWgOl8iTmXLJmJqk6OnT+Z BeYPxpYDkQgYvicyjT4VVcWpwcmCbE/D9bKTNt68ABAH8RTmkba/VaK1wtGpt3qt hy0qXCi59OUPh7TbcKvgrDLL3GDmy68C9afHUiEKyfDu3JxV9awl4y4Bfr1lOURF bvTOuhKQo0MOlbtgJo24nGYK2TU/1Nv1DkcdhgvsCCBciO8LPoocnSZ176ohE5E= =/ti9 -END PGP SIGNATURE- -- Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug.