[Bug middle-end/38857] [4.4 Regression] ICE in selective scheduler
--- Comment #3 from amonakov at gcc dot gnu dot org 2009-01-20 15:45 ---
The failing assert checks whether an instruction was correctly disconnected from the insn stream at its original location, so that it can be inserted at the scheduling boundary by adjusting PREV_INSN/NEXT_INSN links. (We move instructions instead of removing them and issuing new ones, to avoid the cost of re-initializing the associated structures.)

There are two different versions of the code that decides whether it is appropriate to move an instruction: one for the place where instructions are disconnected from the stream and one for the place where they are inserted into it, because different scheduler data is available at each. The attached patch (by Andrey) changes this so that the decision is made at the instruction's original location and saved in the should_move parameter until we need to move/issue the instruction at the scheduling boundary.

Bootstrapped and regtested on ia64-linux. I will include the testcase when sending the patch to gcc-patches@.

Steven, can you please also check whether it fixes the testcase you've seen fail on this assert?

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
                 CC|                                  |amonakov at gcc dot gnu dot org
         AssignedTo|unassigned at gcc dot gnu dot org |amonakov at gcc dot gnu dot org
             Status|NEW                               |ASSIGNED
   Last reconfirmed|2009-01-19 22:49:52               |2009-01-20 15:45:10
               date|                                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38857
[Bug middle-end/38857] [4.4 Regression] ICE in selective scheduler
--- Comment #4 from amonakov at gcc dot gnu dot org 2009-01-20 15:47 ---
Created an attachment (id=17153)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17153&action=view)
proposed patch

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38857
[Bug middle-end/38857] [4.4 Regression] ICE in selective scheduler
--- Comment #7 from amonakov at gcc dot gnu dot org 2009-01-22 12:19 ---
(In reply to comment #6)
> -static bool code_motion_path_driver (insn_t, av_set_t, ilist_t,
> -                                     cmpd_local_params_p, void *);
> +static int code_motion_path_driver (insn_t, av_set_t, ilist_t,
> +                                    cmpd_local_params_p, void *);
>
> You probably don't want this bit...?

The function returns -1 in some circumstances. This change is not relevant to the ICE in question, but is nevertheless a correction (maybe not the best one, as 'return true' and 'return false' are still used in the function's body). I'm not sure what's best here: to include this in the PR fix submission, or to send it as a separate patch.

FWIW, there are a couple more unrelated changes:

1) Check that a reg is actually a hard reg:

   if (REG_P (*cur_rtx)
+      && HARD_REGISTER_P (*cur_rtx)
       && hard_regno_nregs[REGNO(*cur_rtx)][GET_MODE (*cur_rtx)] > 1)

2) Do not merge info from successors if not relevant:

   /* Merge data, clean up, etc.  */
-  if (code_motion_path_driver_info->after_merge_succs)
+  if (res != -1 && code_motion_path_driver_info->after_merge_succs)
     code_motion_path_driver_info->after_merge_succs (&lparams, static_params);

Again, I will submit them separately if so desired.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38857
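
To illustrate why the return type matters here, a minimal standalone sketch follows. It is not the sel-sched.c code: driver_as_bool, driver_as_int and would_merge are hypothetical stand-ins, and a C99 bool is assumed. A -1 sentinel returned through bool is converted to true, so a later 'res != -1' guard can no longer recognize it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver declared to return bool.
   Under C99 _Bool semantics any nonzero value, including -1,
   converts to true (1).  */
static bool driver_as_bool (int found)
{
  if (found < 0)
    return (bool) -1;   /* the sentinel is lost: this is just 'true' */
  return found != 0;
}

/* Same logic with an int return type: the -1 sentinel survives.  */
static int driver_as_int (int found)
{
  if (found < 0)
    return -1;
  return found != 0;
}

/* Mirrors the shape of the 'res != -1' guard quoted above:
   nonzero means the after-merge callback would be invoked.  */
static int would_merge (int res)
{
  return res != -1;
}
```

With the bool version, would_merge (driver_as_bool (-1)) is nonzero, i.e. the callback would still run; with the int version it is zero.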
[Bug middle-end/38857] [4.4 Regression] ICE in selective scheduler
--- Comment #9 from amonakov at gcc dot gnu dot org 2009-01-29 10:53 ---
Subject: Bug 38857

Author: amonakov
Date: Thu Jan 29 10:53:15 2009
New Revision: 143753

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=143753
Log:
2009-01-29  Andrey Belevantsev  a...@ispras.ru
            Alexander Monakov  amona...@ispras.ru

        PR middle-end/38857
        * sel-sched.c (count_occurrences_1): Check that *cur_rtx is a hard
        register.
        (move_exprs_to_boundary): Change return type and pass through
        should_move from move_op.  Relax assert.  Update usage ...
        (schedule_expr_on_boundary): ... here.  Use should_move instead of
        cant_move.
        (move_op_orig_expr_found): Indicate that insn was disconnected from
        stream.
        (code_motion_process_successors): Do not call after_merge_succs
        callback if original expression was not found when traversing any of
        the branches.
        (code_motion_path_driver): Change return type.  Update prototype.
        (move_op): Update comment.  Add a new parameter (should_move).  Update
        prototype.  Set *should_move based on indication provided by
        move_op_orig_expr_found.

2009-01-29  Steve Ellcey  s...@cup.hp.com

        PR middle-end/38857
        * gcc.c-torture/compile/pr38857.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/compile/pr38857.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38857
[Bug middle-end/38857] [4.4 Regression] ICE in selective scheduler
--- Comment #10 from amonakov at gcc dot gnu dot org 2009-01-29 10:55 ---
Fixed with above commit.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38857
[Bug middle-end/39794] [4.4/4.5 Regression] Miscompile with -O2 -funroll-loops
--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                            |Added
----------------------------------------------------------------------------
      Known to fail|                                   |4.4.0 4.5.0
      Known to work|                                   |4.3.2
           Priority|P3                                 |P5
            Summary|Miscompile with -O2 -funroll-loops |[4.4/4.5 Regression] Miscompile with -O2 -funroll-loops

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39794
[Bug middle-end/39794] [4.4/4.5 Regression] Miscompile with -O2 -funroll-loops
--- Comment #2 from amonakov at gcc dot gnu dot org 2009-04-17 21:55 ---
I attempted to investigate the miscompilation on the 4.4 branch. The problem seems to appear in dse2 pass. Basically, after encountering

313 dx:DI=ax:DI+0x4
187 {[di:DI+dx:DI]=[di:DI+dx:DI]0x1;clobber flags:CC;}
...
191 [di:DI+dx:DI+0x4]=cx:SI
314 dx:DI=ax:DI+0x8
200 {[di:DI+dx:DI]=[di:DI+dx:DI]0x1;clobber flags:CC;}

and upon considering insn 200, dse2 decides to delete insn 191 and protect insn 187 (both are wrong, 200 depends on 191 and 187 is irrelevant):

**scanning insn=200
  mem: (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
    (reg:DI 1 dx [orig:84 ivtmp.36 ] [84]))
expanding: r5 into: NULL
expanding: r1 into: (plus:DI (value:DI)
    (const_int 8 [0x8]))
expanding value DI into: r0
expanding: r0 into: NULL

   after cselib_expand address: (plus:DI (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
        (reg:DI 0 ax [orig:76 ivtmp.36 ] [76]))
    (const_int 8 [0x8]))

   after canon_rtx address: (plus:DI (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
        (reg:DI 0 ax [orig:76 ivtmp.36 ] [76]))
    (const_int 8 [0x8]))
  varying cselib base=67 offset = 8
 processing cselib load mem:(mem:SI (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
        (reg:DI 1 dx [orig:84 ivtmp.36 ] [84])) [2 S4 A32])
 processing cselib load against insn 191
 processing cselib load against insn 187
removing from active insn=187 has store
  mem: (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
    (reg:DI 1 dx [orig:84 ivtmp.36 ] [84]))
expanding: r5 into: NULL
expanding: r1 into: (plus:DI (value:DI)
    (const_int 8 [0x8]))
expanding value DI into: r0
expanding: r0 into: NULL

   after cselib_expand address: (plus:DI (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
        (reg:DI 0 ax [orig:76 ivtmp.36 ] [76]))
    (const_int 8 [0x8]))

   after canon_rtx address: (plus:DI (plus:DI (reg/v/f:DI 5 di [orig:63 a ] [63])
        (reg:DI 0 ax [orig:76 ivtmp.36 ] [76]))
    (const_int 8 [0x8]))
  varying cselib base=67 offset = 8
 processing cselib store [8..12)
    trying store in insn=191 gid=-1[8..12)
Locally deleting insn 191
deferring deletion of insn with uid = 191.
mems_found = 1, cannot_delete = false

I wonder how dse2 is supposed to notice that insn 314 changes DX. E.g. when checking rhs of insn 200 ([di+dx]) against lhs of insn 191 ([di+dx+4] for different dx) in check_mem_read_rtx it calls canon_true_dependence (from dse.c:2224) for [di+dx] and [di+dx+4] which returns false. However, these references clearly conflict.

Maybe a stupid question, but shouldn't this canon_true_dependence call receive canonicalized MEMs from 'base' and 'store_info->cse_base'?

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gmail dot com, amonakov at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39794
[Bug driver/39851] New: gcc -Q --help=target does not list extensions selected by -march=
Even though -march=nocona enables SSE2 and SSE3 extensions, gcc -Q --help=target does not list them as enabled. This may be confusing to the user (see http://gcc.gnu.org/ml/gcc-help/2009-04/msg00293.html ):

$ gcc -Q --help=target -march=nocona | grep msse[23]
  -msse2                                [disabled]
  -msse3                                [disabled]

-msse[23] would be enabled in gcc/config/i386/i386.c:override_options(), called from toplev.c:process_options(), which is in turn called from do_compile(). OTOH, --help=target is processed in decode_options(), which is executed before do_compile(). It would be nice if --help=target processing could see options overridden by the target backend.

--
           Summary: gcc -Q --help=target does not list extensions selected by -march=
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: driver
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: amonakov at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39851
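
The ordering problem described in this report can be modelled with a small sketch. Everything in it is a simplified assumption for illustration: struct target_flags, override_options_sketch and report_sse3 are hypothetical names, not GCC's actual data structures or functions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified model of the option state.  */
struct target_flags
{
  bool sse2;
  bool sse3;
  const char *march;
};

/* Stand-in for the backend hook that derives ISA flags from -march=.  */
static void override_options_sketch (struct target_flags *f)
{
  if (f->march && strcmp (f->march, "nocona") == 0)
    {
      f->sse2 = true;
      f->sse3 = true;
    }
}

/* Stand-in for what --help=target would print for -msse3.  */
static const char *report_sse3 (const struct target_flags *f)
{
  return f->sse3 ? "[enabled]" : "[disabled]";
}
```

If report_sse3 is called before override_options_sketch (the order the report describes for decode_options versus do_compile), it yields "[disabled]" even with march set to "nocona"; called afterwards, it yields "[enabled]".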
[Bug middle-end/42130] [graphite-branch] dealII fails
--- Comment #4 from amonakov at gcc dot gnu dot org 2009-11-25 11:48 ---
(In reply to comment #3)

Tobias,

Please fix the testcase before committing to trunk, like this ('return 0' is needed to ensure the test does not fail when compiled correctly; 'noclone' to ensure that foo is not specialized for n=0):

/* { dg-options "-O2 -fno-tree-ch" } */
#include <vector>
using std::vector;

vector<unsigned> __attribute__((noinline,noclone)) foo(unsigned n)
{
  vector<unsigned> *vv = new vector<unsigned>(n, 0u);
  return *vv;
}

int main()
{
  foo(0);
  return 0;
}

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42130
[Bug middle-end/42245] ICE in verify_backedges for 197.parser with sel-sched
--- Comment #2 from amonakov at gcc dot gnu dot org 2009-12-04 18:00 ---
(In reply to comment #0)

Janis,

Thank you for the testcase. This bug and PR42249 are fixed by Andrey's old patch: http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01930.html

The patch in that message still applies cleanly. I'm working on re-testing it with current mainline. If you could test that patch in your environment, it would be much appreciated.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
                 CC|                                  |amonakov at gcc dot gnu dot org
         AssignedTo|unassigned at gcc dot gnu dot org |amonakov at gcc dot gnu dot org
             Status|UNCONFIRMED                       |ASSIGNED
     Ever Confirmed|0                                 |1
   Last reconfirmed|0000-00-00 00:00:00               |2009-12-04 18:00:57
               date|                                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42245
[Bug rtl-optimization/42246] ICE in init_seqno for 186.crafty with sel-sched
--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
                 CC|                                  |amonakov at gcc dot gnu dot org
         AssignedTo|unassigned at gcc dot gnu dot org |amonakov at gcc dot gnu dot org
             Status|UNCONFIRMED                       |ASSIGNED
     Ever Confirmed|0                                 |1
   Last reconfirmed|0000-00-00 00:00:00               |2009-12-04 18:01:40
               date|                                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42246
[Bug rtl-optimization/42249] unrecognizable insn for 254.gap with sel-sched
--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
                 CC|                                  |amonakov at gcc dot gnu dot org
         AssignedTo|unassigned at gcc dot gnu dot org |amonakov at gcc dot gnu dot org
             Status|UNCONFIRMED                       |ASSIGNED
     Ever Confirmed|0                                 |1
   Last reconfirmed|0000-00-00 00:00:00               |2009-12-04 18:02:16
               date|                                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42249
[Bug rtl-optimization/42294] [4.5 Regression] ICE in code_motion_path_driver for 416.gamess
--- Comment #3 from amonakov at gcc dot gnu dot org 2009-12-07 18:23 ---
Also not reproducible on x86_64-ppc64 cross. While codegen differences on ppc/ppc64/x86_64 cross are certainly surprising, in the end this testcase most likely indicates a bug in sel-sched.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
                 CC|                                  |abel at ispras dot ru
         AssignedTo|unassigned at gcc dot gnu dot org |amonakov at gcc dot gnu dot org
             Status|UNCONFIRMED                       |ASSIGNED
     Ever Confirmed|0                                 |1
   Last reconfirmed|0000-00-00 00:00:00               |2009-12-07 18:23:47
               date|                                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42294
[Bug rtl-optimization/42294] [4.5 Regression] ICE in code_motion_path_driver for 416.gamess
--- Comment #4 from amonakov at gcc dot gnu dot org 2009-12-08 11:55 ---
(In reply to comment #3)
> Also not reproducible on x86_64-ppc64 cross. While codegen differences on
> ppc/ppc64/x86_64 cross are certainly surprising, in the end this testcase most
> likely indicates a bug in sel-sched.

Reproducible on x86_64-ppc64 cross with -fno-section-anchors appended to the command line. The native ppc64 compiler does not seem to use section anchors on this testcase.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42294
[Bug middle-end/42245] ICE in verify_backedges for 197.parser with sel-sched
--- Comment #5 from amonakov at gcc dot gnu dot org 2009-12-28 14:23 ---
(In reply to comment #4)
> Re. comment #3 - do you have an example of when/how this can happen? Perhaps
> you can add it to the comment.

Here, we are scheduling a loop that looks like this:

      +---+
      | 4 |<--+
      +-+-+   |
        |     |
        v     |
      +---+   |
      | 5 +-+ |
      +-+-+ | |
        |   | |
        v   | |
      +---+ | |
exit<-+ 6 | | |
      +---+ | |
        |   | |
        v   | |
      +---+ | |
    +-+ 7 | | |
    | +---+ | |
    |       | |
    |       | |
    | +---+ | |
    | | 8 |<+ |
    | +-+-+   |
    |   |     |
    |   v     |
    | +---+   |
    ->| 9 +---+
      +---+

The order of blocks as given by prev_bb/next_bb corresponds to the figure, but in rev_top_order_index BB8 goes before BB6 (which is OK since they are topologically independent). When BB8 becomes empty in the process of scheduling, in preparation for deleting it we redirect the BB7->BB9 branch to BB8 (essentially eliminating a now unneeded jump instruction at the end of BB7). The corresponding comment in sel-sched-ir.c reads:

  /* Check if there is an unnecessary jump in previous basic block leading
     to next basic block left after removing INSN from stream.
     If it is so, remove that jump and redirect edge to current
     basic block (where there was INSN before deletion).  This way
     when NOP will be deleted several instructions later with its
     basic block we will not get a jump to next instruction, which
     can be harmful.  */

This makes BB6 and BB8 topologically dependent. We then merge BB7 and BB8, creating a BB6->BB8 branch, which appears to be a backedge (since topological order has not been recomputed).

Since the explanation is quite lengthy, we'd prefer to just leave a reference to this PR in the comment.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42245
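
The effect of a stale topological order can be reproduced with a minimal sketch. It is not GCC code: struct edge_sketch and apparent_backedges are hypothetical names, and for simplicity pos[] holds positions in a forward topological order rather than GCC's rev_top_order_index. An edge is reported as an apparent backedge when its source does not come strictly before its destination in the given order.

```c
#include <stddef.h>

/* Hypothetical edge record: block numbers of source and destination.  */
struct edge_sketch
{
  int src;
  int dest;
};

/* Count edges that look like backedges under the ordering POS, where
   pos[b] is the position of block b in a (possibly stale) forward
   topological order.  The real loop backedge (9 -> 4 in the example
   from the comment above) is assumed to be excluded by the caller.  */
static int apparent_backedges (const struct edge_sketch *edges, size_t n,
                               const int *pos)
{
  int count = 0;

  for (size_t i = 0; i < n; i++)
    if (pos[edges[i].src] >= pos[edges[i].dest])
      count++;

  return count;
}
```

With the order 4, 5, 8, 6, 7, 9 (BB8 before BB6), the original edges produce no apparent backedges. After redirecting 7->9 to 7->8, or after merging BB7 into BB8 so that 6->8 exists, the same stale order reports one, while a recomputed order 4, 5, 6, 7, 8, 9 reports none.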
[Bug libgomp/42519] bootstrap fails on powerpc64-linux because of libgomp
--- Comment #2 from amonakov at gcc dot gnu dot org 2009-12-28 17:45 --- (In reply to comment #1) So, your pthread.h doesn't contain prototype for pthread_getaffinity_np, yet libpthread.so.0 exports it? Otherwise: Affected system declares those functions in /usr/include/nptl/pthread.h. Some clarifications here: http://gcc.gnu.org/ml/gcc/2009-12/msg00376.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42519
[Bug middle-end/42245] ICE in verify_backedges for 197.parser with sel-sched
--- Comment #7 from amonakov at gcc dot gnu dot org 2010-01-11 15:04 ---
Our previous patch (http://gcc.gnu.org/ml/gcc-patches/2009-12/msg01215.html) failed to correctly fix the problem, and the new testcase uncovers a flaw in that implementation. We 'forgot' to recompute topological order if it was invalidated in tidy_control_flow but not in maybe_tidy_empty_bb called from that function. Fixed by passing recompute_toporder_p to the latter on top of the mentioned previous patch as below (the patch also makes maybe_tidy_empty_bb static by moving the only caller into the same file).

        * sel-sched-ir.c (maybe_tidy_empty_bb): Make static.  Add new
        argument.  Update all callers.
        (purge_empty_blocks): Export and move from...
        * sel-sched.c (purge_empty_blocks): ... here.  Delete.
        * sel-sched-ir.h (maybe_tidy_empty_bb): Delete prototype.
        (purge_empty_blocks): Declare.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index cffee1c..e20df17 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3538,13 +3538,13 @@ sel_recompute_toporder (void)
 }
 
 /* Tidy the possibly empty block BB.  */
-bool
-maybe_tidy_empty_bb (basic_block bb)
+static bool
+maybe_tidy_empty_bb (basic_block bb, bool recompute_toporder_p)
 {
   basic_block succ_bb, pred_bb;
   edge e;
   edge_iterator ei;
-  bool rescan_p, recompute_toporder_p = false;
+  bool rescan_p;
 
   /* Keep empty bb only if this block immediately precedes EXIT and
      has incoming non-fallthrough edge, or it has no predecessors or
@@ -3630,7 +3630,7 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
   insn_t first, last;
 
   /* First check whether XBB is empty.  */
-  changed = maybe_tidy_empty_bb (xbb);
+  changed = maybe_tidy_empty_bb (xbb, false);
   if (changed || !full_tidying)
     return changed;
 
@@ -3694,7 +3694,7 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
          that contained that jump, becomes empty too.  In such case
          remove it too.  */
       if (sel_bb_empty_p (xbb->prev_bb))
-        changed = maybe_tidy_empty_bb (xbb->prev_bb);
+        changed = maybe_tidy_empty_bb (xbb->prev_bb, recompute_toporder_p);
       else if (recompute_toporder_p)
         sel_recompute_toporder ();
     }
@@ -3702,6 +3702,24 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
   return changed;
 }
 
+/* Purge meaningless empty blocks in the middle of a region.  */
+void
+purge_empty_blocks (void)
+{
+  /* Do not attempt to delete preheader.  */
+  int i = sel_is_loop_preheader_p (BASIC_BLOCK (BB_TO_BLOCK (0))) ? 1 : 0;
+
+  while (i < current_nr_blocks)
+    {
+      basic_block b = BASIC_BLOCK (BB_TO_BLOCK (i));
+
+      if (maybe_tidy_empty_bb (b, false))
+        continue;
+
+      i++;
+    }
+}
+
 /* Rip-off INSN from the insn stream.  When ONLY_DISCONNECT is true,
    do not delete insn's data, because it will be later re-emitted.
    Return true if we have removed some blocks afterwards.  */
diff --git a/gcc/sel-sched-ir.h b/gcc/sel-sched-ir.h
index 317258c..b5121c0 100644
--- a/gcc/sel-sched-ir.h
+++ b/gcc/sel-sched-ir.h
@@ -1619,7 +1619,7 @@ extern bool tidy_control_flow (basic_block, bool);
 extern void free_bb_note_pool (void);
 
 extern void sel_remove_empty_bb (basic_block, bool, bool);
-extern bool maybe_tidy_empty_bb (basic_block bb);
+extern void purge_empty_blocks (void);
 extern basic_block sel_split_edge (edge);
 extern basic_block sel_create_recovery_block (insn_t);
 extern void sel_merge_blocks (basic_block, basic_block);
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 37be754..9271b80 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -6790,24 +6790,6 @@ setup_current_loop_nest (int rgn)
   gcc_assert (LOOP_MARKED_FOR_PIPELINING_P (current_loop_nest));
 }
 
-/* Purge meaningless empty blocks in the middle of a region.  */
-static void
-purge_empty_blocks (void)
-{
-  /* Do not attempt to delete preheader.  */
-  int i = sel_is_loop_preheader_p (BASIC_BLOCK (BB_TO_BLOCK (0))) ? 1 : 0;
-
-  while (i < current_nr_blocks)
-    {
-      basic_block b = BASIC_BLOCK (BB_TO_BLOCK (i));
-
-      if (maybe_tidy_empty_bb (b))
-        continue;
-
-      i++;
-    }
-}
-
 /* Compute instruction priorities for current region.  */
 static void
 sel_compute_priorities (int rgn)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42245
[Bug middle-end/37447] [4.4 Regression] test pr28982b.c fails execution on power4 or later with ira change
--- Comment #6 from amonakov at gcc dot gnu dot org 2008-10-02 11:27 --- This patch also fixes miscompilation of vla1.f90 test on ia64 on sel-sched branch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37447
[Bug target/37381] [4.4 Regression] ICE in ia64_speculate_insn, at config/ia64/ia64.c:6902
--- Comment #10 from amonakov at gcc dot gnu dot org 2008-10-14 13:41 ---
Since Andrey has just committed ia64 changes from sel-sched branch to trunk, the underlying problem is fixed and ICEs on original testcase and mentioned regression tests are gone. Closing as fixed.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37381
[Bug target/37381] [4.4 Regression] ICE in ia64_speculate_insn, at config/ia64/ia64.c:6902
--- Comment #12 from amonakov at gcc dot gnu dot org 2008-10-16 17:31 ---
Subject: Bug 37381

Author: amonakov
Date: Thu Oct 16 17:30:06 2008
New Revision: 141177

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141177
Log:
2008-10-16  Alexander Monakov  [EMAIL PROTECTED]

        PR target/37381
        * gcc.c-torture/compile/pr37381.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/compile/pr37381.c
Modified:
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37381
[Bug target/37381] [4.4 Regression] ICE in ia64_speculate_insn, at config/ia64/ia64.c:6902
--- Comment #13 from amonakov at gcc dot gnu dot org 2008-10-16 17:42 ---
H.J., thanks for reminding. Closing again.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37381
[Bug tree-optimization/41118] New: Wrong dependence analysis in graphite for unrestricted pointers
Consider this example:

cat > pr41118.c << EOF
void foo(int n, int *a, int *b)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = b[i];
}
EOF
gcc -S -O2 pr41118.c -floop-parallelize-all -ftree-parallelize-loops=2
grep GOMP pr41118.s

GCC considers the loop parallel, even though arrays pointed to by arguments may arbitrarily overlap. This is because dependency analysis in graphite treats p[i] as global_mem[alias_set_for_p][i]. In this example, since both a and b have alias set 0, a[0][i0] and b[0][i1] are considered independent for i0 != i1.

--
           Summary: Wrong dependence analysis in graphite for unrestricted pointers
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: amonakov at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41118
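
A small sketch shows why overlapping arguments make the loop order-dependent. The helper names copy_forward and copy_backward are made up for this illustration; the loop body is the one from foo in the report, run once in source order and once in reverse, which stands in for "some other order a parallel schedule might pick".

```c
/* Same loop as foo in the report, iterating in source order.  */
static void copy_forward (int n, int *a, int *b)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i];
}

/* Same statements in reverse order.  If the iterations really were
   independent, the result would match copy_forward.  */
static void copy_backward (int n, int *a, int *b)
{
  for (int i = n - 1; i >= 0; i--)
    a[i] = b[i];
}
```

Called with a == b + 1 on the array {1, 2, 3, 4, 5} and n == 4, the forward version leaves {1, 1, 1, 1, 1} while the backward version leaves {1, 1, 2, 3, 4}, so the iterations carry a dependence that must not be ignored.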
[Bug target/33604] [4.3/4.4 Regression] Revision 119502 causes significantly slower results with 4.3/4.4 compared to 4.2
--- Comment #43 from amonakov at gcc dot gnu dot org 2008-08-29 13:12 ---
Checking original testcase times on x86_64 prescott with gentoo 4.2, 4.3 and today's trunk:

2.960s  g++-4.2.4 (GCC) 4.2.4 (Gentoo 4.2.4 p1.0)
2.916s  g++-4.3.1 (Gentoo 4.3.1-r1 p1.1) 4.3.1
3.993s  g++ (GCC) 4.4.0 20080829 (experimental)
2.796s  g++ (GCC) 4.4.0 20080829 (experimental) with --param max-inline-insns-auto=126

So I believe lack of inlining is 4.4's biggest problem. We do not inline 3x3 matrix multiplication in the benchmark loop.

While looking at it I found that the einline2 dump does not always show the reason for not inlining. I would like to propose the following patch:

--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1494,6 +1494,8 @@ cgraph_decide_inlining_incrementally (struct cgraph_node *node,
            }
          if (cgraph_default_inline_p (e->callee, &failed_reason))
            inlined |= try_inline (e, mode, depth);
+         else if (dump_file)
+           fprintf (dump_file, "Not inlining: %s.\n", failed_reason);
        }
       node->aux = (void *)(size_t) old_mode;
       return inlined;

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
[Bug target/37381] [4.4 Regression] ICE in ia64_speculate_insn, at config/ia64/ia64.c:6902
--- Comment #4 from amonakov at gcc dot gnu dot org 2008-09-08 10:38 ---
Scheduling of instructions dependent on speculative loads was implemented a bit differently on sel-sched branch and on trunk (before the merge). Since ia64.c changes were not checked in, a discrepancy appeared, resulting in an ICE when attempting to schedule an instruction dependent on a speculative load.

This issue is fixed by restoring the scheduler behaviour to the pre-merge state. This change will have to be reverted when ia64 changes are approved (handling of BE_IN_SPEC bits will be implemented in the back-end).

Bootstrapped and regtested on ia64-linux. I will add the testcase and re-post to [EMAIL PROTECTED]

        PR target/37381
        * haifa-sched.c (sched_speculate_insn): Filter BE_IN_SPEC bits before
        passing dependence status to back-end.

Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c   (revision 140093)
+++ gcc/haifa-sched.c   (working copy)
@@ -4288,7 +4288,7 @@ sched_speculate_insn (rtx insn, ds_t req
       && !(request & BEGIN_SPEC))
     return 0;
 
-  return targetm.sched.speculate_insn (insn, request, new_pat);
+  return targetm.sched.speculate_insn (insn, request & BEGIN_SPEC, new_pat);
 }
 
 static int

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu dot org |amonakov at gcc dot gnu dot org
             Status|UNCONFIRMED                       |ASSIGNED
     Ever Confirmed|0                                 |1
   Last reconfirmed|0000-00-00 00:00:00               |2008-09-08 10:38:12
               date|                                  |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37381
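
The filtering step in this patch amounts to a bit mask. The following sketch uses made-up bit assignments and names with an _SK suffix to mark them as hypothetical; the real ds_t layout in GCC is different and also carries weakness values.

```c
typedef unsigned int ds_sketch_t;

/* Hypothetical bit assignments, for illustration only.  */
#define BEGIN_DATA_SK     0x1u
#define BEGIN_CONTROL_SK  0x2u
#define BE_IN_DATA_SK     0x4u
#define BE_IN_CONTROL_SK  0x8u
#define BEGIN_SPEC_SK     (BEGIN_DATA_SK | BEGIN_CONTROL_SK)
#define BE_IN_SPEC_SK     (BE_IN_DATA_SK | BE_IN_CONTROL_SK)

/* Keep only the BEGIN_* bits before handing the request to a
   back-end hook, analogous to 'request & BEGIN_SPEC' in the patch.  */
static ds_sketch_t filter_for_backend (ds_sketch_t request)
{
  return request & BEGIN_SPEC_SK;
}
```

A request combining BEGIN_DATA_SK and BE_IN_CONTROL_SK is reduced to BEGIN_DATA_SK alone, so a back-end that does not yet understand BE_IN_* bits never sees them.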
[Bug middle-end/37499] [4.4 Regression] Scheduling pass 2 time increases by order of magnitude
--- Comment #4 from amonakov at gcc dot gnu dot org 2008-09-16 15:12 ---
A patch for this bug has been posted at http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01135.html

Running the testcase on a similarly configured compiler shows 2.47 seconds spent in scheduling2, out of 151.27 total time (2.2GHz Opteron 8354). Unpatched times are 32.43 sched2/180.89 total.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|abel at gcc dot gnu dot org |amonakov at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37499
[Bug middle-end/37499] [4.4 Regression] Scheduling pass 2 time increases by order of magnitude
--- Comment #6 from amonakov at gcc dot gnu dot org 2008-09-18 08:31 ---
Subject: Bug 37499

Author: amonakov
Date: Thu Sep 18 08:29:48 2008
New Revision: 140445

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140445
Log:
2008-09-18  Alexander Monakov  [EMAIL PROTECTED]

        PR middle-end/37499
        * sched-int.h (struct _haifa_insn_data): Remove unused field
        ref_count.

        * sched-rgn.c (ref_counts): Remove.
        (insn_referenced): New static variable.
        (INSN_REF_COUNT): Remove.
        (sched_run_compute_dependencies): Use insn_referenced instead of
        INSN_REF_COUNT.
        (add_branch_dependences): Likewise.  Delete dead assignment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-int.h
    trunk/gcc/sched-rgn.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37499
[Bug middle-end/37499] [4.4 Regression] Scheduling pass 2 time increases by order of magnitude
--- Comment #7 from amonakov at gcc dot gnu dot org 2008-09-18 08:34 ---
Fixed with above commit.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37499
[Bug middle-end/42245] ICE in verify_backedges for 197.parser with sel-sched
--- Comment #9 from amonakov at gcc dot gnu dot org 2010-01-14 10:29 ---
Subject: Bug 42245

Author: amonakov
Date: Thu Jan 14 10:28:47 2010
New Revision: 155890

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155890
Log:
2010-01-14  Andrey Belevantsev  a...@ispras.ru
            Alexander Monakov  amona...@ispras.ru

        PR middle-end/42245
        * sel-sched-ir.c (sel_recompute_toporder): New.  Use it...
        (maybe_tidy_empty_bb): ... here.  Make static.  Add new
        argument.  Update all callers.
        (tidy_control_flow): ... and here.  Recompute topological order
        of basic blocks in region if necessary.
        (sel_redirect_edge_and_branch): Change return type.  Return true
        if topological order might have been invalidated.
        (purge_empty_blocks): Export and move from...
        * sel-sched.c (purge_empty_blocks): ... here.
        * sel-sched-ir.h (sel_redirect_edge_and_branch): Update prototype.
        (maybe_tidy_empty_bb): Delete prototype.
        (purge_empty_blocks): Declare.

        * gcc.dg/pr42245.c: New.
        * gcc.dg/pr42245-2.c: New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched-ir.c
    trunk/gcc/sel-sched-ir.h
    trunk/gcc/sel-sched.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42245
[Bug middle-end/42245] ICE in verify_backedges for 197.parser with sel-sched
--- Comment #10 from amonakov at gcc dot gnu dot org 2010-01-14 10:38 ---
Subject: Bug 42245

Author: amonakov
Date: Thu Jan 14 10:38:14 2010
New Revision: 155891

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155891
Log:
Add tests missing from previous commit.

        PR middle-end/42245
        * gcc.dg/pr42245.c: New.
        * gcc.dg/pr42245-2.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/pr42245-2.c
    trunk/gcc/testsuite/gcc.dg/pr42245.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42245
[Bug rtl-optimization/39453] ICE : in init_seqno, at sel-sched.c:6433
--- Comment #6 from amonakov at gcc dot gnu dot org 2010-01-14 10:40 ---
Subject: Bug 39453

Author: amonakov
Date: Thu Jan 14 10:40:19 2010
New Revision: 155892

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155892
Log:
2010-01-14  Alexander Monakov  amona...@ispras.ru

        PR rtl-optimization/39453
        PR rtl-optimization/42246
        * sel-sched-ir.c (considered_for_pipelining_p): Do not test for
        pipelining_p.
        (sel_add_loop_preheaders): Add preheader to last_added_blocks.

        * gcc.dg/pr39453.c: New.
        * gcc.dg/pr42246.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/pr39453.c
    trunk/gcc/testsuite/gcc.dg/pr42246.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched-ir.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39453
[Bug rtl-optimization/42246] ICE in init_seqno for 186.crafty with sel-sched
--- Comment #5 from amonakov at gcc dot gnu dot org 2010-01-14 10:40 ---
Subject: Bug 42246

Author: amonakov
Date: Thu Jan 14 10:40:19 2010
New Revision: 155892

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155892
Log:
2010-01-14  Alexander Monakov  amona...@ispras.ru

        PR rtl-optimization/39453
        PR rtl-optimization/42246
        * sel-sched-ir.c (considered_for_pipelining_p): Do not test for
        pipelining_p.
        (sel_add_loop_preheaders): Add preheader to last_added_blocks.

        * gcc.dg/pr39453.c: New.
        * gcc.dg/pr42246.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/pr39453.c
    trunk/gcc/testsuite/gcc.dg/pr42246.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched-ir.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42246
[Bug middle-end/42245] ICE in verify_backedges for 197.parser with sel-sched
--- Comment #11 from amonakov at gcc dot gnu dot org 2010-01-14 10:41 ---
Fixed by revision 155890

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42245
[Bug rtl-optimization/39453] ICE : in init_seqno, at sel-sched.c:6433
--- Comment #7 from amonakov at gcc dot gnu dot org 2010-01-14 10:44 ---
Fixed by revision 155892

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39453
[Bug rtl-optimization/42294] [4.5 Regression] ICE in code_motion_path_driver for 416.gamess
--- Comment #10 from amonakov at gcc dot gnu dot org 2010-01-14 10:47 ---
Subject: Bug 42294

Author: amonakov
Date: Thu Jan 14 10:46:57 2010
New Revision: 155893

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155893
Log:
2010-01-14  Alexander Monakov  amona...@ispras.ru

        PR rtl-optimization/42294
        * sel-sched-ir.h (struct _sel_insn_data): Update comment.
        * sel-sched.c (move_exprs_to_boundary): Transitively add all
        originators' originators.

        * gfortran.dg/pr42294.f: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr42294.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched-ir.h
    trunk/gcc/sel-sched.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42294
[Bug tree-optimization/42771] [4.5 Regression] ICE: in graphite_loop_normal_form, at graphite-sese-to-poly.c (2)
--- Comment #5 from amonakov at gcc dot gnu dot org 2010-01-25 17:06 ---
We fail to find number of iterations after rewriting reductions out of SSA.

Before graphite pass, IR looks like (for the previous testcase, pr42771.c):

<bb 9>:
  # j_26 = PHI <j_20(10)>

<bb 10>:
  # j_33 = PHI <j_26(9), 1(16)>
  D.2747_16 = B[j_33][0];
  D.2748_17 = j_33 + -1;
  D.2749_18 = B[D.2748_17][0];
  D.2750_19 = D.2749_18 ^ D.2747_16;
  B[j_33][0] = D.2750_19;
  j_20 = j_33 + 1;
  if (jm_14(D) > j_20)
    goto <bb 9>;
  else
    goto <bb 11>;

At the time of the ICE, IR is transformed to

(gdb) call debug_loop(use_loop, 3)
loop_3 (header = 10, latch = 9, niter = (unsigned int) jm_14(D) + 4294967294, upper_bound = 2147483646)
{
  bb_9 (preds = {bb_10 }, succs = {bb_10 })
  {
  <bb 9>:
    # .MEM_24 = PHI <.MEM_78(10)>
    # VUSE <.MEM_24>
    j_26 = Close_Phi.13[0];
    # .MEM_79 = VDEF <.MEM_24>
    General_Reduction.14[0] = j_26;

  }
  bb_10 (preds = {bb_9 bb_16 }, succs = {bb_9 bb_11 })
  {
  <bb 10>:
    # .MEM_15 = PHI <.MEM_79(9), .MEM_77(16)>
    # VUSE <.MEM_15>
    D.2766_76 = General_Reduction.14[0];
    j_33 = D.2766_76;
    # VUSE <.MEM_15>
    D.2747_16 = B[j_33][0];
    D.2748_17 = j_33 + -1;
    # VUSE <.MEM_15>
    D.2749_18 = B[D.2748_17][0];
    D.2750_19 = D.2749_18 ^ D.2747_16;
    # .MEM_30 = VDEF <.MEM_15>
    B[j_33][0] = D.2750_19;
    j_20 = j_33 + 1;
    # .MEM_78 = VDEF <.MEM_30>
    Close_Phi.13[0] = j_20;
    if (jm_14(D) > j_20)
      goto <bb 9>;
    else
      goto <bb 11>;

  }
}

We would fail to discover scalar evolution of j_33.

--
amonakov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42771
[Bug tree-optimization/42771] [4.5 Regression][graphite] ICE: in graphite_loop_normal_form, at graphite-sese-to-poly.c (2)
--- Comment #10 from amonakov at gcc dot gnu dot org 2010-02-10 18:26 --- (In reply to comment #9) Fixed as described in http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00436.html I don't see how this patch makes simple_iv call from number_of_iterations_exit return true for j_20. Could you please kindly explain? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42771
[Bug tree-optimization/43012] [4.5 Regression][graphite] wrong code for -floop-strip-mine in 453.povray
--- Comment #2 from amonakov at gcc dot gnu dot org 2010-02-10 18:41 --- Confirming. Reproducible on amd64-linux. This appears to be a bug in CLooG. Disabling CLooG optimizations on the graphite branch fixes the bug. The problem is that CLooG generates wrong bounds for parts of the strip-mined loop (the bounds of the first and the last loops are wrong): for (scat_3=-51;scat_3<=63;scat_3++) { S3(scat_3) ; S4(scat_3) ; } for (scat_3=64;scat_3<=76;scat_3++) { S3(scat_3) ; S6(scat_3) ; } for (scat_3=77;scat_3<=88;scat_3++) { S3(scat_3) ; S8(scat_3) ; } for (scat_3=89;scat_3<=-1;scat_3++) { S3(scat_3) ; } -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-02-10 18:41:13 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43012
[Bug tree-optimization/43012] [4.5 Regression][graphite] wrong code for -floop-strip-mine in 453.povray
--- Comment #3 from amonakov at gcc dot gnu dot org 2010-02-11 14:28 --- (In reply to comment #2) Confirming. Reproducible on amd64-linux. This appears to be a bug in CLooG. Disable CLooG optimizations on graphite branch fixes the bug. The problem is that CLooG generates wrong bounds for parts of strip-mined loop (bounds of the first and the last loops are wrong): I've sent a CLooG patch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43012
[Bug c/43052] Inline memcmp is *much* slower than glibc's
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-02-12 15:45 --- Confirmed. GCC simply emits repz cmpsb. There was even an e-mail with benchmark results and a patch (never applied): http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02129.html -- amonakov at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-02-12 15:45:19 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
[Bug tree-optimization/36905] [4.3/4.4/4.5 Regression] IV-opts needs a little help with a[i+1]
--- Comment #7 from amonakov at gcc dot gnu dot org 2010-02-16 17:43 --- (In reply to comment #6) Looks like this has been fixed. We do generate good code: fred: li 0,100 mtctr 0 .L2: sthu 3,2(4) bdnz .L2 blr .size fred, .-fred .ident GCC: (GNU) 4.5.0 20100215 (experimental) And ivopts dump is quite sane: bb 2: ivtmp.13_17 = (unsigned int) out1_5(D); bb 3: # i_13 = PHI i_3(4), 0(2) # ivtmp.13_15 = PHI ivtmp.13_16(4), ivtmp.13_17(2) i_3 = i_13 + 1; ivtmp.13_16 = ivtmp.13_15 + 2; D.2031_18 = (void *) ivtmp.13_16; MEM[base: D.2031_18] = in_7(D); if (i_3 != 100) goto bb 4; else goto bb 5; bb 4: goto bb 3; -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36905
[Bug pending/41998] GCC 4.6 pending patches meta-bug
--- Comment #6 from amonakov at gcc dot gnu dot org 2010-02-17 16:32 --- Handle ADDR_EXPR in SCEV http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00666.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41998
[Bug tree-optimization/43174] New: Teaching SCEV about ADDR_EXPR causes regression
With the patch from here: http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00668.html IVopts begins to create IVs for expressions like a0[i][j][0]. This may cause regressions in stack usage and code size (also possibly speed). Test case: /* ---8--- */ enum {N=123}; int a0[N][N][N], a1[N][N][N], a2[N][N][N], a3[N][N][N], a4[N][N][N], a5[N][N][N], a6[N][N][N], a7[N][N][N]; int foo() { int i, j, k, s = 0; for (i = 0; i < N; i++) for (j = 0; j < N; j++) for (k = 0; k < N; k++) { s += a0[i][j][k]; s += a1[i][j][k]; s += a2[i][j][k]; s += a3[i][j][k]; s += a4[i][j][k]; s += a5[i][j][k]; s += a6[i][j][k]; s += a7[i][j][k]; } return s; } /* ---8--- */ Without the patch, IVopts produces one IV for the j loop and 8 IVs for the k loop. With the patch, IVopts additionally produces 8 IVs for the j loop (with a 123*4 increment), 4 of which live on the stack (on x86-64, -O2). Creation of IVs that live on the stack is likely due to inexact register pressure estimation in IVopts. However, it would be nice if IVopts could notice that it's cheaper to take the final value of the inner loop IVs (e.g. a0[i][j][k]) instead of incrementing the IV holding a0[i][j][0] by 123*4. It would decrease register pressure and allow generating perfect code for the test case. -- Summary: Teaching SCEV about ADDR_EXPR causes regression Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: amonakov at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
[Bug tree-optimization/43209] [4.5 Regression] ICE in try_improve_iv_set, at tree-ssa-loop-ivopts.c:5238
--- Comment #4 from amonakov at gcc dot gnu dot org 2010-02-28 22:01 --- Confirmed. The first invocation of get_computation_aff fails with ustep == (long) j, cstep == (unsigned long) j: constant_multiple_of (ustep, cstep, rat) returns false (j is int, STRIP_NOPS ({u,c}step) preserves conversions). -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-02-28 22:01:43 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43209
[Bug tree-optimization/43174] Teaching SCEV about ADDR_EXPR causes regression
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-03-01 17:43 --- Created an attachment (id=20001) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20001&action=view) Simplify increments in IVopts using final values of inner loop IVs A quick and dirty attempt to implement register pressure reduction in outer loops by using final values of inner loop IVs. Currently, given for (i = 0; i < N; i++) for (j = 0; j < N; j++) s += a[i][j]; we generate something like bb1 L1: s.0 = PHI(0, s.2) i.0 = PHI(0, i.1) ivtmp.0 = a[i.0][0] bb2 L2: s.1 = PHI(s.0, s.2) j.0 = PHI(122, j.1) ivtmp.1 = PHI(ivtmp.0, ivtmp.2) s.2 = s.1 + MEM(ivtmp.1) ivtmp.2 = ivtmp.1 + 4 j.1 = j.0 - 1 if (j.1 >= 0) goto L2 bb3 i.1 = i.0 + 1 if (i.1 <= 122) goto L1 This, together with the patch mentioned in the previous comment, allows us to generate: ivtmp.0 = a[0][0] bb1 L1: s.0 = PHI(0, s.2) i.0 = PHI(122, i.1) ivtmp.1 = PHI(ivtmp.0, ivtmp.4) bb2 L2: s.1 = PHI(s.0, s.2) j.0 = PHI(122, j.1) ivtmp.2 = PHI(ivtmp.1, ivtmp.3) s.2 = s.1 + MEM(ivtmp.2) ivtmp.3 = ivtmp.2 + 4 j.1 = j.0 - 1 if (j.1 >= 0) goto L2 bb3 ivtmp.4 = ivtmp.3 // would be ivtmp.4 = ivtmp.1 + stride i.1 = i.0 - 1 if (i.1 >= 0) goto L1 The improvement is that ivtmp.1 is not live across the inner loop. The approach is to store final values of IVs in a hashtable, mapping the SSA_NAME of the initial value in the preheader to an aff_tree with the final value, and then try to replace increments of new IVs with uses of IVs from inner loops (currently I have just implemented a brute-force loop over all IV uses to find a useful entry in that hashtable). Does this make sense and sound acceptable? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
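For readers who prefer source-level code to IR, the two shapes discussed in this comment can be sketched roughly as follows. This is only an illustration written against a flat N*N array; the function names are made up for the example and none of this is taken from the attached patch. The first function keeps a separate outer IV (`row`) that stays live across the inner loop and is bumped by the stride; the second lets the final value of the inner pointer serve as the start of the next row.

```c
enum { N = 123 };

/* Shape 1: a separate outer IV holds the start of each row.  It is live
   across the inner loop and is incremented by the row stride afterwards. */
int sum_with_outer_iv(const int *a)
{
    int s = 0;
    const int *row = a;
    for (int i = 0; i < N; i++) {
        const int *p = row;          /* inner IV starts at the row start */
        for (int j = 0; j < N; j++)
            s += *p++;
        row += N;                    /* extra increment by the stride */
    }
    return s;
}

/* Shape 2: the final value of the inner IV is reused as the initial value
   for the next outer iteration, so no separate row pointer is needed. */
int sum_with_final_value(const int *a)
{
    int s = 0;
    const int *p = a;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += *p++;               /* after N steps p is at the next row */
    return s;
}
```

Both functions compute the same sum; the second simply has one fewer value that must survive the inner loop, which is the register-pressure effect the comment is after.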
[Bug tree-optimization/43236] -ftree-loop-distribution produces wrong code in reload1.c:delete_output_reload(), bootstrap fails
--- Comment #6 from amonakov at gcc dot gnu dot org 2010-03-03 13:06 --- Not a regression: the off-by-one error in the reverse iteration case has been there since day one. Patch: diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 13ac7ea..110abdc 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -285,6 +285,8 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter, addr_base = fold_convert_loc (loc, sizetype, addr_base); addr_base = size_binop_loc (loc, MINUS_EXPR, addr_base, fold_convert_loc (loc, sizetype, nb_bytes)); + addr_base = size_binop_loc (loc, PLUS_EXPR, addr_base, + TYPE_SIZE_UNIT (TREE_TYPE (op0))); addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR, TREE_TYPE (DR_BASE_ADDRESS (dr)), DR_BASE_ADDRESS (dr), addr_base); This fixes the -O[123] miscompilations. -Os is slightly harder to fix, since we use the wrong number of iterations (the cond bb is executed 11 times, the latch bb with the assignment 10 times). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43236
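The address arithmetic behind that off-by-one can be shown with a small standalone sketch. It assumes the setup the patch suggests: for a loop that walks downward, the known address is that of the first element written (the highest one), so the block to clear starts at that address minus the total byte count plus one element. The helper name `zero_range` is invented for this example and is not a GCC function.

```c
#include <stddef.h>
#include <string.h>

/* Zero n ints with one memset, given a pointer to the element the original
   loop writes first.  If reverse is nonzero the loop walks downward, so
   `first` is the highest address written. */
void zero_range(int *first, size_t n, int reverse)
{
    char *base = (char *)first;
    if (reverse && n > 0) {
        /* Equivalent to base - n*size + size: stepping back by all n
           elements alone would start the block one element too low. */
        base -= (n - 1) * sizeof(int);
    }
    memset(base, 0, n * sizeof(int));
}
```

For a loop clearing a[19] down to a[10], `zero_range(&a[19], 10, 1)` clears exactly a[10]..a[19]; without the one-element correction it would clear a[9]..a[18] instead.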
[Bug tree-optimization/43236] -ftree-loop-distribution produces wrong code in reload1.c:delete_output_reload(), bootstrap fails
--- Comment #7 from amonakov at gcc dot gnu dot org 2010-03-03 13:38 --- (In reply to comment #6) This fixes the -O[123] miscompilations. -Os is slightly harder to fix, since we use wrong number of iterations (cond bb is executed 11 times, latch bb with assignment 10 times). I don't see what the proper fix for the -Os problem is. The loop structure is as follows: bb2 i = 20 goto bb4 bb3 i-- a[i] = 0 bb4 if (i > 10) goto bb3 Thus, bb4 is the header, bb3 is the latch, number_of_exit_cond_executions() is 11, and just_once_each_iteration_p() is true for both bb3 and bb4 (?!) -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43236
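The 11-versus-10 discrepancy is easy to check by transcribing the block structure into C with gotos. The sketch below assumes the exit test is `i > 10` (the comparison operator is lost in the archived text, but this is the reading consistent with the counts quoted above); the struct and function names are made up for the example.

```c
/* Counters filled in by run_loop(). */
struct loop_counts {
    int cond_execs;   /* how often the exit test in bb4 runs */
    int latch_execs;  /* how often the store in bb3 runs */
};

/* a must have at least 20 elements. */
struct loop_counts run_loop(int *a)
{
    struct loop_counts c = {0, 0};
    int i = 20;                 /* bb2 */
    goto bb4;
bb3:                            /* latch: contains the assignment */
    i--;
    a[i] = 0;
    c.latch_execs++;
bb4:                            /* header: contains the exit test */
    c.cond_execs++;
    if (i > 10)
        goto bb3;
    return c;
}
```

The test runs for i = 20, 19, ..., 10, while the store runs only for i = 19, ..., 10, so a byte count derived from the number of exit-test executions covers one element too many.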
[Bug tree-optimization/43236] -ftree-loop-distribution produces wrong code in reload1.c:delete_output_reload(), bootstrap fails
--- Comment #8 from amonakov at gcc dot gnu dot org 2010-03-09 16:55 --- Given that loop distribution only works for two-bb loops, I think the fix is simply to take the number of latch executions when the stmt is in the latch. diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -389,6 +391,8 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p) goto end; write = stmt; + if (bb == loop->latch) + nb_iter = number_of_latch_executions (loop); } } } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43236
[Bug tree-optimization/43236] -ftree-loop-distribution produces wrong code in reload1.c:delete_output_reload(), bootstrap fails
--- Comment #9 from amonakov at gcc dot gnu dot org 2010-03-10 12:54 --- Subject: Bug 43236 Author: amonakov Date: Wed Mar 10 12:53:51 2010 New Revision: 157339 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=157339 Log: PR tree-optimization/43236 * tree-loop-distribution.c (generate_memset_zero): Fix off-by-one error in calculation of base address in reverse iteration case. (generate_builtin): Take number of latch executions if the statement is in the latch. * gcc.c-torture/execute/pr43236.c: New. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr43236.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-loop-distribution.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43236
[Bug tree-optimization/43236] -ftree-loop-distribution produces wrong code in reload1.c:delete_output_reload(), bootstrap fails
--- Comment #10 from amonakov at gcc dot gnu dot org 2010-03-10 12:57 --- Both issues are fixed with above commit. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43236
[Bug gcov-profile/43341] New: pragma pack changes padding in struct gcov_info on 64-bit archs
cat <<EOF >t.c
int foo(){}
#pragma pack(1)
EOF
gcc -S -fprofile-generate t.c
grep '^\.LPBX' -A 2 t.s
.LPBX0:
        .long 875574314
        .quad 0
The padding ('.zero 4') between .long and .quad that correspond to the first two fields of struct gcov_info (an int and a ptr) is gone. This makes building Firefox with profile feedback impossible on amd64. At least gcc-4.[1345] behave this way.
-- Summary: pragma pack changes padding in struct gcov_info on 64- bit archs Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-profile AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: amonakov at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43341
[Bug lto/43355] New: Undefined references with -flto -fuse-linker-plugin, dependent on object file ordering
cat <<EOF >a.c
extern int puts(const char*);
char *program;
void fail() {puts(program);}
EOF
cat <<EOF >b.c
extern int puts(const char*);
extern char *program;
extern void fail();
void usage() {puts(program);}
int main(int argc, char *argv[]) { program = argv[0]; if (argc) usage(); else fail(); return 0; }
EOF
gcc -flto -fuse-linker-plugin -O2 b.c a.c
/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.0-pre/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccvkxseZ.lto.o: in function usage:ccHTGBCK.o(.text+0x3): error: undefined reference to 'program'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.0-pre/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccvkxseZ.lto.o: in function fail:ccHTGBCK.o(.text+0x13): error: undefined reference to 'program'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.0-pre/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccvkxseZ.lto.o: in function main:ccHTGBCK.o(.text+0x2c): error: undefined reference to 'program'
collect2: ld returned 1 exit status
Works with reversed ordering or without -fuse-linker-plugin.
-- Summary: Undefined references with -flto -fuse-linker-plugin, dependent on object file ordering Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: amonakov at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43355
[Bug rtl-optimization/32283] [4.3/4.4/4.5 regression] Missed induction variable optimization
--- Comment #28 from amonakov at gcc dot gnu dot org 2010-03-16 15:26 --- To provide an update of the situation on 4.5 trunk: AFAIK the situation has been generally improved with Zdenek's second commit (in comment #23) and auto-inc-dec improvements in 4.5. However, on the particular testcase discussed here (comment #20), we still don't DTRT on ia64 (powerpc is OK, don't know about arm). There is unfortunate interplay between IVOPTS, RTL PRE, RTL loop analysis and the auto-inc-dec pass. First, ivopts produce harder-to-grok sequence in loop preheader, transforming bb 3: pretmp.2_8 = (short int) v_4(D); bb 4: # i_13 = PHI i_6(5), 0(3) a[i_13] = pretmp.2_8; i_6 = i_13 + 1; if (len_3(D) i_6) goto bb 5; else goto bb 6; bb 5: goto bb 4; to bb 3: pretmp.2_8 = (short int) v_4(D); ivtmp.7_11 = (long unsigned int) a[0]; D.2006_17 = (unsigned int) len_3(D); D.2007_18 = D.2006_17 + 4294967295; D.2008_19 = (long unsigned int) D.2007_18; D.2009_20 = D.2008_19 * 2; a.13_21 = (long unsigned int) a; D.2011_22 = a.13_21 + 2; D.2012_23 = D.2009_20 + D.2011_22; bb 4: # ivtmp.7_1 = PHI ivtmp.7_12(5), ivtmp.7_11(3) D.2005_16 = (void *) ivtmp.7_1; MEM[base: D.2005_16]{a[i]} = pretmp.2_8; ivtmp.7_12 = ivtmp.7_1 + 2; if (ivtmp.7_12 != D.2012_23) goto bb 5; else goto bb 6; bb 5: goto bb 4; The preheader is not cleaned up until RTL PRE. Then, PRE transforms L54: 51 NOTE_INSN_BASIC_BLOCK 52 [r371:DI]=r373:SI#0 53 r371:DI=r371:DI+0x2 55 r392:BI=r371:DI!=r381:DI 56 pc={(r392:BI!=0x0)?L54:pc} to L83: 82 NOTE_INSN_BASIC_BLOCK 77 r397:DI=r371:DI+0x2 L54: 51 NOTE_INSN_BASIC_BLOCK 52 [r371:DI]=r373:SI#0 75 r371:DI=r397:DI REG_EQUAL: r371:DI+0x2 55 r392:BI=r371:DI!=r381:DI 56 pc={(r392:BI!=0x0)?L83:pc} REG_DEAD: r392:BI REG_BR_PROB: 0x26ac ... which is something auto-inc-dec pass is not able to handle. 
If I disable rtl pre with -fdbg-cnt=pre:0, auto-inc is generated, but doloop pass is confused instead: Loop 1 is simple: simple exit 4 - 5 infinite if: (expr_list:REG_DEP_TRUE (subreg:SI (and:DI (plus:DI (minus:DI (ashift:DI (reg:DI 390) (const_int 1 [0x1])) (reg:DI 371 [ ivtmp.7 ])) (const:DI (plus:DI (symbol_ref:DI (a) [flags 0x2] var_decl 0x77968000 a) (const_int 2 [0x2] (const_int 1 [0x1])) 0) (nil)) number of iterations: (lshiftrt:DI (plus:DI (minus:DI (reg:DI 381 [ D.2012 ]) (reg:DI 371 [ ivtmp.7 ])) (const_int -2 [0xfffe])) (const_int 1 [0x1])) upper bound: -2 Doloop: Possible infinite iteration case. Doloop: The loop is not suitable. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32283
[Bug tree-optimization/43415] [4.4/4.5 Regression] Consumes large amounts of memory and time in PRE at -O3
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-03-18 11:30 --- Confirming. 4.5 trunk needs lots of memory in PRE. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||rguenth at gcc dot gnu dot ||org, amonakov at gcc dot gnu ||dot org Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Keywords||compile-time-hog, memory-hog Known to fail||4.4.3 4.5.0 Known to work||4.4.2 Last reconfirmed|-00-00 00:00:00 |2010-03-18 11:30:27 date|| Summary|[4.4 regression] gcc takes |[4.4/4.5 Regression] |unusually large amounts of |Consumes large amounts of |memory and time to compile |memory and time in PRE at - |nested for loop at -O3 |O3 Target Milestone|--- |4.5.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43415
[Bug c/43423] gcc should vectorize this loop through iteration range splitting
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-03-18 18:13 --- Graphite is able to split the loop, but then the vectorizer punts anyway: gcc -O3 -ftree-vectorizer-verbose=7 -fgraphite-identity -S t.c t.c:11: note: not vectorized: number of iterations cannot be computed. t.c:9: note: not vectorized: number of iterations cannot be computed. t.c:3: note: vectorized 0 loops in function. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
[Bug target/43603] gcc-4.4.3 ICE on ia64 with -O3
--- Comment #2 from amonakov at gcc dot gnu dot org 2010-04-06 17:10 --- Thanks for the analysis. This is reproducible on trunk with -O2 -fsel-sched-pipelining -fselective-scheduling2 (with -O3, pressure-aware loop invariant motion slightly changes the code, and it's not possible to disable it (not even with -fno-ira-loop-pressure, because it's enabled unconditionally in ia64_override_options)). The real problem is that we are attempting to clone an instruction with asm operands as a bookkeeping copy -- that should never happen. Thus, I think copy_rtx calls should stay. We'll have to fix the scheduler to never attempt such code motion. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||abel at gcc dot gnu dot org, ||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43603
[Bug rtl-optimization/45472] [4.5/4.6 Regression] ICE: in move_op_ascend, at sel-sched.c:6124 with -fselective-scheduling2
--- Comment #4 from amonakov at gcc dot gnu dot org 2010-09-20 14:49 --- A small testcase to illustrate the problem with volatile fields. //---8--- struct vv {volatile long a, b;} vv1, vv2; int foo() { vv1 = vv2; } //---8--- gcc/cc1 -O2 -frename-registers -fschedule-insns2 vol.c movq vv2+8(%rip), %rax movq vv2(%rip), %rdx movq %rax, vv1+8(%rip) movq %rdx, vv1(%rip) The compiler reorders accesses to volatile fields. As Andrey said, /v bits are missing on MEMs even in the .expand dump. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45472
[Bug gcov-profile/43341] pragma pack changes padding in struct gcov_info on 64-bit archs
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-04-21 16:43 --- Created an attachment (id=20455) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20455&action=view) proposed patch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43341
[Bug gcov-profile/43825] gcov is initialized wrong on x86_64
--- Comment #7 from amonakov at gcc dot gnu dot org 2010-04-21 16:45 --- *** This bug has been marked as a duplicate of 43341 *** -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org Status|WAITING |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43825
[Bug gcov-profile/43341] pragma pack changes padding in struct gcov_info on 64-bit archs
--- Comment #2 from amonakov at gcc dot gnu dot org 2010-04-21 16:45 --- *** Bug 43825 has been marked as a duplicate of this bug. *** -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||tglek at mozilla dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43341
[Bug gcov-profile/43825] gcov is initialized wrong on x86_64
--- Comment #8 from amonakov at gcc dot gnu dot org 2010-04-21 16:48 --- Taras, to avoid triggering the problem in Firefox, you can search for the file that uses #pragma pack(1) without resetting it (as I remember, there is only one in xulrunner) and add #pragma pack() at the end of that file. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43825
[Bug gcov-profile/43341] pragma pack changes padding in struct gcov_info on 64-bit archs
--- Comment #3 from amonakov at gcc dot gnu dot org 2010-04-21 16:54 --- (In reply to comment #1) Created an attachment (id=20455) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20455action=view) [edit] proposed patch GCC generates gcov structures at runtime, and #pragma pack(1) in the source file affects their layout. We probably can reset the alignment in create_coverage to avoid that. The above patch implements a different approach -- it rearranges structure fields and manually sets alignment so that layout does not depend on current structure packing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43341
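To see the layout effect in isolation, here is a small sketch using ordinary user-defined structs. The struct names are stand-ins invented for this example (they only mimic "an int followed by a pointer", as in the first two fields of struct gcov_info), and the sketch says nothing about how the compiler builds the real gcov structures internally. It compares the offset of the pointer field under default packing, under #pragma pack(1), and after packing has been reset with #pragma pack().

```c
#include <stddef.h>

/* Hypothetical stand-in for the first two fields: an int and a pointer. */
struct info_default {
    unsigned version;
    void *next;
};

#pragma pack(1)
struct info_packed {            /* laid out with no padding at all */
    unsigned version;
    void *next;
};
#pragma pack()                  /* restore the default packing */

struct info_after_reset {       /* default layout again */
    unsigned version;
    void *next;
};

size_t next_offset_default(void)
{
    return offsetof(struct info_default, next);
}

size_t next_offset_packed(void)
{
    return offsetof(struct info_packed, next);
}

size_t next_offset_after_reset(void)
{
    return offsetof(struct info_after_reset, next);
}
```

On a typical LP64 target the default offset is 8 and the packed offset is 4, which matches the missing '.zero 4' in the report; the struct declared after #pragma pack() gets the default layout back.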
[Bug fortran/42958] Weird temporary array allocation
--- Comment #19 from amonakov at gcc dot gnu dot org 2010-04-28 15:15 --- (In reply to comment #18) 3) for the same reason you can also drop the + 1 in computing the allocation size which is currently (ubound - lbound + 1) * 4 Sorry, but isn't +1 needed because bounds are inclusive? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42958
[Bug c/44116] 64bit inodes for source code causes Value too large for defined data type (XFS,inode64)
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-05-13 14:46 --- r...@matylda1: /mnt/data/kasparek# LC_ALL=C gcc -o test.o test-10356.c cc1: error: test-10356.c: Value too large for defined data type The first this I need to help with is how to check if the code that causes this (expect somewhere is used 32bit variable to store the inode) is from gcc itself or it is some third-party library. I Execute gcc -o test.o test-10356.c -### 2>&1 | grep cc1 to get the cc1 command line. Then use gdb --args cc1 cmdline to debug the compiler. Getting a backtrace before the abort would be nice. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44116
[Bug testsuite/44343] malloc check in libstdc++-v3/testsuite/22_locale/codecvt/unshift/char/1.cc
--- Comment #1 from amonakov at gcc dot gnu dot org 2010-05-31 14:23 --- Yes, it's a bug in 1.cc that was fixed for 4.6. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44343
[Bug rtl-optimization/44838] [4.6 regression] RTL loop unrolling causes FAIL: gcc.dg/pr39794.c
--- Comment #16 from amonakov at gcc dot gnu dot org 2010-07-07 09:54 --- (In reply to comment #15) Subject: Re: [4.6 regression] RTL loop unrolling causes FAIL: gcc.dg/pr39794.c I am not sure what you mean -- I may be misunderstanding how rtl alias analysis works, but as far as I can tell, what unroller does (just preserving the MEM_ATTRs) is conservatively correct (so, potentially it may make us believe that there are dependences that are not really present, but it should not cause a wrong-code bug). Consider this simplified example: for (i ...) { /*A*/ t = a[i]; /*B*/ a[i+1] = t; } MEM_ATTRS would indicate that memory references in A and B do not alias. Unrolling by 2 produces: for (i ...) { /*A */ t = a[i]; /*B */ a[i+1] = t; /*A'*/ t = a[i+1]; /*B'*/ a[i+2] = t; } Preserving MEM_ATTRS wrongly indicates that memory references in B and A' do not alias, and the scheduler then may happen to lift A' above B. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44838
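The B / A' dependence can be demonstrated at the source level. The sketch below is only an illustration of the argument: the function names are invented, and the second function hand-applies the reordering that a scheduler could perform if it believed the load in A' and the store in B never alias. Both functions expect an array of n + 1 elements with n even.

```c
/* Loop unrolled by two with the original statement order A, B, A', B'. */
void unrolled_in_order(int *a, int n)
{
    for (int i = 0; i + 1 < n; i += 2) {
        int t;
        t = a[i];           /* A  */
        a[i + 1] = t;       /* B  */
        t = a[i + 1];       /* A' reads what B just wrote */
        a[i + 2] = t;       /* B' */
    }
}

/* Same loop with the load A' hoisted above the store B, as if the two
   references were known not to alias.  The load now sees a stale value. */
void unrolled_load_hoisted(int *a, int n)
{
    for (int i = 0; i + 1 < n; i += 2) {
        int t = a[i];       /* A  */
        int t2 = a[i + 1];  /* A' moved above B */
        a[i + 1] = t;       /* B  */
        a[i + 2] = t2;      /* B' stores the stale value */
    }
}
```

Starting from {1, 2, 3, 4, 5} with n = 4, the in-order version propagates a[0] through the whole array, while the hoisted version leaves {1, 1, 2, 2, 4}.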
[Bug rtl-optimization/43494] gcc.c-torture/execute/vector-2.c fails with -fpic/-fPIC
--- Comment #9 from amonakov at gcc dot gnu dot org 2010-07-20 08:25 --- It's probably worth noting that the disambiguated MEMs are of different widths: load from (mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64]) store to (mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32]) (btw, IIUC the above mems do not alias indeed, and the problem is in disambiguating the above store and a load from (mem/c/i:DI (post_inc:DI (reg/f:DI 14 r14 [351])) [2 t+0 S8 A128]) ) Also, with -fno-strict-aliasing the failure vanishes. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43494
[Bug rtl-optimization/43494] gcc.c-torture/execute/vector-2.c fails with -fpic/-fPIC
--- Comment #13 from amonakov at gcc dot gnu dot org 2010-07-20 14:13 --- (In reply to comment #10) Re. comment 9: Well, the order of *this* store and *this* load is the difference between the test case failing or passing. So I do not think the problem is between this load and another store. To clarify, I'm saying that the problem is in moving insn 9 (the store) past insn 21, not past insn 28. Your simplified dumps in comment #10 support that (r15 is incremented and decremented, r14 is post-modified; thus, both insn 21 and insn 9 touch memory[r12]). Insn 28 is not relevant to the miscompilation. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43494
[Bug rtl-optimization/43494] [4.4/4.5/4.6 Regression] Overlooked dependency causes wrong scheduling, wrong code
--- Comment #17 from amonakov at gcc dot gnu dot org 2010-07-21 08:32 --- (In reply to comment #16) OK, I think I finally understand what Alexander tried to explain, and I've annotated the code. Alexander, does this look right to you? Yes, thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43494
[Bug rtl-optimization/43494] [4.4/4.5/4.6 Regression] Overlooked dependency causes wrong scheduling, wrong code
--- Comment #21 from amonakov at gcc dot gnu dot org 2010-07-21 10:07 --- (In reply to comment #20) (Even sel-sched apparently does not use cselib, that's surprising!) Offtopic: yes, using cselib in sel-sched is not quite straightforward, since we need it to work on arbitrary regions (as I understand cselib is designed to work on EBBs). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43494
[Bug rtl-optimization/38644] [4.3/4.4/4.5/4.6 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code
--- Comment #22 from amonakov at gcc dot gnu dot org 2010-08-12 10:12 --- It looks like the patch from comment #16 should fix the problem, but it was not reviewed and/or applied. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644
[Bug rtl-optimization/38644] [4.3/4.4/4.5/4.6 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code
--- Comment #25 from amonakov at gcc dot gnu dot org 2010-08-12 12:00 --- (In reply to comment #23) The patch from comment #16 only fixes the symptom, and only on ARM. It is not a proper fix for the generic problem that is apparently also visible on POWER. PR30282 audit trail contains more discussion of this problem. Jim Wilson argues that this problem should be addressed by emitting stack ties in epilogues for targets that suffer from this problem (other targets apparently do not thanks to red zone). POWER was fixed that way (PR44199). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644
[Bug rtl-optimization/44919] ICE on ia64 with -O3 at sel-sched.c:4672
--- Comment #8 from amonakov at gcc dot gnu dot org 2010-09-06 08:57 --- Subject: Bug 44919 Author: amonakov Date: Mon Sep 6 08:56:43 2010 New Revision: 163904 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=163904 Log: PR rtl-optimization/44919 * sel-sched.c (move_cond_jump): Remove assert, check that the several blocks case can only happen with mutually exclusive insns instead. Rewrite the movement code to support moving through several basic blocks. * g++.dg/opt/pr44919.C: New. Added: trunk/gcc/testsuite/g++.dg/opt/pr44919.C Modified: trunk/gcc/ChangeLog trunk/gcc/sel-sched.c trunk/gcc/testsuite/ChangeLog -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44919
[Bug rtl-optimization/44919] ICE on ia64 with -O3 at sel-sched.c:4672
--- Comment #9 from amonakov at gcc dot gnu dot org 2010-09-06 09:00 --- (In reply to comment #7) Any progress with the copyright assignment? The copyright assignment is renewed, and I have committed the patch to the current development branch on Andrey's behalf. It will be committed to release branches in a few days. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44919
[Bug rtl-optimization/44919] ICE on ia64 with -O3 at sel-sched.c:4672
--- Comment #11 from amonakov at gcc dot gnu dot org 2010-09-12 20:34 --- Subject: Bug 44919 Author: amonakov Date: Sun Sep 12 20:34:26 2010 New Revision: 164234 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=164234 Log: Backport from mainline 2010-09-06 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/44919 * sel-sched.c (move_cond_jump): Remove assert, check that the several blocks case can only happen with mutually exclusive insns instead. Rewrite the movement code to support moving through several basic blocks. * g++.dg/opt/pr44919.C: New. Added: branches/gcc-4_5-branch/gcc/testsuite/g++.dg/opt/pr44919.C Modified: branches/gcc-4_5-branch/gcc/ChangeLog branches/gcc-4_5-branch/gcc/sel-sched.c branches/gcc-4_5-branch/gcc/testsuite/ChangeLog -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44919
[Bug rtl-optimization/44919] ICE on ia64 with -O3 at sel-sched.c:4672
--- Comment #12 from amonakov at gcc dot gnu dot org 2010-09-12 20:36 --- Subject: Bug 44919 Author: amonakov Date: Sun Sep 12 20:35:53 2010 New Revision: 164235 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=164235 Log: Backport from mainline 2010-09-06 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/44919 * sel-sched.c (move_cond_jump): Remove assert, check that the several blocks case can only happen with mutually exclusive insns instead. Rewrite the movement code to support moving through several basic blocks. * g++.dg/opt/pr44919.C: New. Added: branches/gcc-4_4-branch/gcc/testsuite/g++.dg/opt/pr44919.C Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/sel-sched.c branches/gcc-4_4-branch/gcc/testsuite/ChangeLog -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44919
[Bug rtl-optimization/44919] ICE on ia64 with -O3 at sel-sched.c:4672
--- Comment #13 from amonakov at gcc dot gnu dot org 2010-09-12 20:38 --- Fixed on release branches with above commits. -- amonakov at gcc dot gnu dot org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44919
[Bug rtl-optimization/45652] [4.6 Regression] gcc.dg/compat/scalar-by-value-3 FAILs with -O2 -fselective-scheduling2
--- Comment #2 from amonakov at gcc dot gnu dot org 2010-09-13 16:53 --- Confirmed. Not related to PR43949 since selective scheduling does not use cselib. The miscompilation seems to come from RTL aliasing: sel-sched lifts a load that references stack via a general-purpose register above a store via %rsp. bad cmdline: cc1 -O2 -fselective-scheduling2 -fdbg-cnt=sel_sched_insn_cnt:31 good cmdline: cc1 -O2 -fselective-scheduling2 -fdbg-cnt=sel_sched_insn_cnt:30 The no-aliasing decision comes from (base_alias_check): 1742 /* If one address is a stack reference there can be no alias: 1743 stack references using different base registers do not alias, 1744 a stack reference can not alias a parameter, and a stack reference 1745 can not alias a global. */ 1746 if ((GET_CODE (x_base) == ADDRESS GET_MODE (x_base) == Pmode) 1747 || (GET_CODE (y_base) == ADDRESS GET_MODE (y_base) == Pmode)) 1748return 0; Related GDB session: Breakpoint 4, base_alias_check (x=0x76f20920, y=0x76f2d018, x_mode=DImode, y_mode=SImode) at /home/monoid/checkout/git/gcc-selfixes/gcc/alias.c:1687 1687 rtx x_base = find_base_term (x); (gdb) up #1 0x0076da1d in true_dependence_1 (mem=0x76f2d030, mem_mode=SImode, mem_addr=0x76f2d018, x=0x76f30870, x_addr=0x76f20920, varies=0x14041f2 rtx_varies_p, mem_canonicalized=0 '\000') at /home/monoid/checkout/git/gcc-selfixes/gcc/alias.c:2440 2440 if (! base_alias_check (x_addr, mem_addr, GET_MODE (x), mem_mode)) (gdb) call debug_rtx(mem) (mem/s/c:SI (plus:DI (reg/f:DI 7 sp) (const_int 12 [0xc])) [5 ap.fp_offset+0 S4 A32]) (gdb) call debug_rtx(x) (mem/s:DI (reg:DI 4 si) [0 MEM[(struct S * {ref-all})addr.0_2]+0 S8 A64]) (gdb) down #0 base_alias_check (x=0x76f20920, y=0x76f2d018, x_mode=DImode, y_mode=SImode) at /home/monoid/checkout/git/gcc-selfixes/gcc/alias.c:1687 1687 rtx x_base = find_base_term (x); (gdb) n ... 
(gdb) list 1741 1742 /* If one address is a stack reference there can be no alias: 1743 stack references using different base registers do not alias, 1744 a stack reference can not alias a parameter, and a stack reference 1745 can not alias a global. */ 1746 if ((GET_CODE (x_base) == ADDRESS GET_MODE (x_base) == Pmode) 1747 || (GET_CODE (y_base) == ADDRESS GET_MODE (y_base) == Pmode)) 1748return 0; 1749 1750 return 1; (gdb) call debug_rtx(x_base) (address (reg:DI 4 si)) (gdb) call debug_rtx(y_base) (address:DI (reg/f:DI 7 sp)) (gdb) fin Run till exit from #0 base_alias_check (x=0x76f20920, y=0x76f2d018, x_mode=DImode, y_mode=SImode) at /home/monoid/checkout/git/gcc-selfixes/gcc/alias.c:1746 0x0076da1d in true_dependence_1 (mem=0x76f2d030, mem_mode=SImode, mem_addr=0x76f2d018, x=0x76f30870, x_addr=0x76f20920, varies=0x14041f2 rtx_varies_p, mem_canonicalized=0 '\000') at /home/monoid/checkout/git/gcc-selfixes/gcc/alias.c:2440 2440 if (! base_alias_check (x_addr, mem_addr, GET_MODE (x), mem_mode)) Value returned is $58 = 0 -- amonakov at gcc dot gnu dot org changed: What|Removed |Added CC||amonakov at gcc dot gnu dot ||org Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-09-13 16:53:59 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45652