Updating general info in tree-parloops.c
Hi, I updated some of the info in tree-parloops.c, like adding myself to the contributors, and updating the TODO list, both long overdue... I also update the wiki http://gcc.gnu.org/wiki/AutoParInGCC and added a link to it from tree-parloops.c. If there are no objections, I will commit as obvious, 2012-05-20 Razya Ladelsky ra...@il.ibm.com * tree-parloops.c : Add myself to contributors, update TODO list, add link to wiki. Thanks, Razya Index: tree-parloops.c === --- tree-parloops.c (revision 187694) +++ tree-parloops.c (working copy) @@ -1,8 +1,8 @@ /* Loop autoparallelization. Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. - Contributed by Sebastian Pop p...@cri.ensmp.fr and - Zdenek Dvorak dvor...@suse.cz. + Contributed by Sebastian Pop p...@cri.ensmp.fr + Zdenek Dvorak dvor...@suse.cz and Razya Ladelsky ra...@il.ibm.com. This file is part of GCC. @@ -54,9 +54,9 @@ along with GCC; see the file COPYING3. If not see -- if there are several parallelizable loops in a function, it may be possible to generate the threads just once (using synchronization to ensure that cross-loop dependences are obeyed). - -- handling of common scalar dependence patterns (accumulation, ...) - -- handling of non-innermost loops */ - + -- handling of common reduction patterns for outer loops. + + More info can also be found at http://gcc.gnu.org/wiki/AutoParInGCC */ /* Reduction handling: currently we use vect_force_simple_reduction() to detect reduction patterns. =
Refining autopar cost model for outer loops
Hi, This patch changes the minimum number of iterations of outer loops for the runtime check which tests whether it is worthwhile to parallelize the loop or not. The current minimum number of iterations for all loops is MIN_PER_THREAD * number of threads, when MIN_PER_THREAD is arbitrarily set to 100. This prevents some of the promising loops of SPEC2006 from getting parallelized. I changed the minimum bound for outer loops, under the assumption that even if there are not enough iterations, the fact that an outer loop contains more loops, obtains enough work to get parallelized. This indeed allowed for a lot more loops to get parallelized, resulting in substantial performance improvements for SPEC2006 benchmarks, measured on a Power7 6 core, 4 way SMT each. I compared the trunk with O3 + autopar (parallelizing with 6 threads) vs. the trunk with O3 minus vectorization. None of the benchmarks shows any significant degradation. The speedup shown for libquatum with autopar has been obtained with previous versions of autopar, having no relation to this patch, but surely not degraded by it either. These are the speedups I collected: 462.libquantum 2.5 X 410.bwaves 3.3 X 436.cactusADM 4.5 X 459.GemsFDTD1.27 X 481.wrf 1.25 X Bootstrap and testsuite (with -ftree-parallelize-loops=4) pass successfully. spec-2006 showed no regressions. OK for trunk? Thanks, razya 2012-05-08 Razya Ladelsky ra...@il.ibm.com * tree-parloops.c (gen_parallel_loop): Change many_iterations_cond for outer loops. Index: tree-parloops.c === --- tree-parloops.c (revision 186667) +++ tree-parloops.c (working copy) @@ -1732,6 +1732,7 @@ gen_parallel_loop (struct loop *loop, htab_t reduc unsigned prob; location_t loc; gimple cond_stmt; + unsigned int m_p_thread=2; /* From @@ -1792,9 +1793,15 @@ gen_parallel_loop (struct loop *loop, htab_t reduc if (stmts) gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts); - many_iterations_cond = -fold_build2 (GE_EXPR, boolean_type_node, -nit, build_int_cst (type, MIN_PER_THREAD * n_threads)); + if (loop-inner) +m_p_thread=2; + else +m_p_thread=MIN_PER_THREAD; + + many_iterations_cond = + fold_build2 (GE_EXPR, boolean_type_node, +nit, build_int_cst (type, m_p_thread * n_threads)); + many_iterations_cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, invert_truthvalue (unshare_expr (niter-may_be_zero)), =
Correcting transform_to_exit_first_loop + fix to PR46886
Hi, This patch handles duplicating of the last iteration correctly. The current code always duplicates the complete static iteration from the entry to the latch, and then sets the number of iterations according to the pattern of the loop (according to whether the cond before the body, or the body before the cond). The correct way to go about is to not assume anything about the control flow of the loop, and duplicate the last iteration only from entry to the block consisting of the cond, that is the real last dynamic iteration that would be executed. The number of iterations then needs no special care for each loop pattern. This was actually Zdenek's original intent by duplicating the last iteration. This code allows us to remove the do_while cond, and gets PR46886 resolved (instead of the solution suggested in http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01642.html, which handled the number of iterations according to the specific shape of the loop) Bootstrap and testsuite pass successfully. Testsuite with -ftree-parallelize-loops=4 shows only one regression which is uninit-17.c test for warnings, which seems unrelated to how the loop gets parallelized. I also ran spec-2006, and it showed no regressions. 2012-04-20 Razya Ladelsky ra...@il.ibm.com PR tree-optimization/46886 * tree-parloops.c (transform_to_exit_first_loop): Remove setting of number of iterations according to the loop pattern. Duplicate from entry to exit-src instead of loop-latch. (pallelize_loops): Remove the condition preventing do-while loops. * tree-cfg.c (bool bb_in_region_p): New. (gimple_duplicate_sese_tail): Adjust duplication of the the subloops. Adjust redirection of the duplicated iteration. O.K to commit? Thanks, Razya Index: tree-parloops.c === --- tree-parloops.c (revision 186493) +++ tree-parloops.c (working copy) @@ -1481,8 +1481,6 @@ transform_to_exit_first_loop (struct loop *loop, h gimple phi, nphi, cond_stmt, stmt, cond_nit; gimple_stmt_iterator gsi; tree nit_1; - edge exit_1; - tree new_rhs; split_block_after_labels (loop-header); orig_header = single_succ (loop-header); @@ -1512,41 +1510,10 @@ transform_to_exit_first_loop (struct loop *loop, h } } - /* Setting the condition towards peeling the last iteration: -If the block consisting of the exit condition has the latch as -successor, then the body of the loop is executed before -the exit condition is tested. In such case, moving the -condition to the entry, causes that the loop will iterate -one less iteration (which is the wanted outcome, since we -peel out the last iteration). If the body is executed after -the condition, moving the condition to the entry requires -decrementing one iteration. */ - exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); - if (exit_1-dest == loop-latch) -new_rhs = gimple_cond_rhs (cond_stmt); - else - { -new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)), - gimple_cond_rhs (cond_stmt), - build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 1)); -if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) - { - basic_block preheader; - gimple_stmt_iterator gsi1; - - preheader = loop_preheader_edge(loop)-src; - gsi1 = gsi_after_labels (preheader); - new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, - NULL_TREE,false,GSI_CONTINUE_LINKING); - } - } - gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs)); - gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt))); - bbs = get_loop_body_in_dom_order (loop); - for (n = 0; bbs[n] != loop-latch; n++) -continue; + for (n = 0; bbs[n] != exit-src; n++) + continue; nbbs = XNEWVEC (basic_block, n); ok = gimple_duplicate_sese_tail (single_succ_edge (loop-header), exit, bbs + 1, n, nbbs); @@ -1599,6 +1566,7 @@ transform_to_exit_first_loop (struct loop *loop, h /* Initialize the control variable to number of iterations according to the rhs of the exit condition. */ gsi = gsi_after_labels (ex_bb); + exit = single_dom_exit (loop); cond_nit = last_stmt (exit-src); nit_1 = gimple_cond_rhs (cond_nit); nit_1 = force_gimple_operand_gsi (gsi, @@ -1861,8 +1829,8 @@ gen_parallel_loop (struct loop *loop, htab_t reduc /* Ensure that the exit condition is the first statement in the loop. */ transform_to_exit_first_loop (loop, reduction_list, nit); - /* Generate initializations for reductions. */ + if (htab_elements (reduction_list) 0) htab_traverse (reduction_list, initialize_reductions, loop); @@ -2187,10 +2155,9
[PATCH] Permanent Fix for PR46886
Hi, This is (hopefully) a permanent fix to pr46886.c I removed the condition preventing parallelization of do_while loops, as it was blocking parallelizing important loops in spec-2006. The patch fixes the number of iterations for cases where the body could appear in the latch, as in pr46886.c. 2012-03-26 Razya Ladelsky ra...@il.ibm.com PR tree-optimization/46886 * tree-parloops.c (transform_to_exit_first_loop):Set number of iterations correctly when the body may appear at the latch. (pallelize_loops): Remove the condition preventing do-while loops. Bootstrap and testsuite psss successfully on power7 linux machine. Ok to commit? Thanks, Index: tree-parloops.c === --- tree-parloops.c (revision 185775) +++ tree-parloops.c (working copy) @@ -1522,7 +1522,10 @@ transform_to_exit_first_loop (struct loop *loop, h the condition, moving the condition to the entry requires decrementing one iteration. */ exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); - if (exit_1-dest == loop-latch) + + /* if the latch contains more than the one statemnt of control variable + increment then it contains the body. */ + if (exit_1-dest == loop-latch last_and_only_stmt (loop-latch)) new_rhs = gimple_cond_rhs (cond_stmt); else { @@ -2146,7 +2149,6 @@ parallelize_loops (void) return false; if (cfun-has_nonlocal_label) return false; - gcc_obstack_init (parloop_obstack); reduction_list = htab_create (10, reduction_info_hash, reduction_info_eq, free); @@ -2187,10 +2189,7 @@ parallelize_loops (void) || loop_has_blocks_with_irreducible_flag (loop) || (loop_preheader_edge (loop)-src-flags BB_IRREDUCIBLE_LOOP) /* FIXME: the check for vector phi nodes could be removed. */ - || loop_has_vector_phi_nodes (loop) - /* FIXME: transform_to_exit_first_loop does not handle not -header-copied loops correctly - see PR46886. */ - || !do_while_loop_p (loop)) + || loop_has_vector_phi_nodes (loop)) continue; estimated = max_stmt_executions_int (loop, false); /* FIXME: Bypass this check as graphite doesn't update the @@ -2213,6 +2212,7 @@ parallelize_loops (void) continue; changed = true; + if (dump_file (dump_flags TDF_DETAILS)) { if (loop-inner) =
Re: [PATCH] Permanent Fix for PR46886
Richard Guenther rguent...@suse.de wrote on 26/03/2012 01:23:15 PM: From: Richard Guenther rguent...@suse.de To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org Date: 26/03/2012 01:23 PM Subject: Re: [PATCH] Permanent Fix for PR46886 On Mon, 26 Mar 2012, Razya Ladelsky wrote: Hi, This is (hopefully) a permanent fix to pr46886.c I removed the condition preventing parallelization of do_while loops, as it was blocking parallelizing important loops in spec-2006. The patch fixes the number of iterations for cases where the body could appear in the latch, as in pr46886.c. 2012-03-26 Razya Ladelsky ra...@il.ibm.com PR tree-optimization/46886 * tree-parloops.c (transform_to_exit_first_loop):Set number of iterations correctly when the body may appear at the latch. (pallelize_loops): Remove the condition preventing do-while loops. Bootstrap and testsuite psss successfully on power7 linux machine. Ok to commit? + + /* if the latch contains more than the one statemnt of control variable + increment then it contains the body. */ + if (exit_1-dest == loop-latch last_and_only_stmt (loop-latch)) new_rhs = gimple_cond_rhs (cond_stmt); please check what the comment suggests, thus, last_and_only_stmt (loop-latch) == cond_stmt Hi Richard, The latch has at least one stmt for sure: the control variable increment (not a cond_stmt) coming from the call to canonicalize_loop_ivs(loop, nit, true) earlier in tree-parloops: /* . When BUMP_IN_LATCH is true, the induction variable is incremented in the loop latch, otherwise it is incremented in the loop header. Return the induction variable that was created. */ tree canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch) tree-parloops.c is quite fragile in what it expects the IL to look like and tests do not cover what later code expects. Please do not add to that. I agree. I have also tested this code on spec2006 benchmarks. (together with an upcoming patch, 5 of these show speedups with autopar on a linux-power7) @@ -2146,7 +2149,6 @@ parallelize_loops (void) return false; if (cfun-has_nonlocal_label) return false; - spurious whitespace change. @@ -2213,6 +2212,7 @@ parallelize_loops (void) continue; changed = true; + if (dump_file (dump_flags TDF_DETAILS)) { if (loop-inner) Likewise. Ok with the above change. Ok, thanks, Razya Thanks, Richard.
Fw: [PATCH] Permanent Fix for PR46886
+ + /* if the latch contains more than the one statemnt of control variable + increment then it contains the body. */ + if (exit_1-dest == loop-latch last_and_only_stmt (loop-latch)) new_rhs = gimple_cond_rhs (cond_stmt); please check what the comment suggests, thus, last_and_only_stmt (loop-latch) == cond_stmt Richard, There's no simple way to do such a check, as the control variable increment stmt is explicitly created in canonicalize_loop_ivs and not in tree-parloops. What we do know is that it is put in the latch because we ask for it when we call canonicalize_loop_ivs from parloops. One way to get the control variable increment stmt is to use canonicalize_loop_ivs() return value and look for the stmts that use it. Do you think I should add all this functionality to the code in order to assert the stmt that is in the latch? or can I skip it ? Thanks, Razya
[RFC][PATCH] A change to do_while_loop_p()
Hi, I need to use do_while_loop_p, but I'm not sure its functionality is what I expected it to be. This is the part that I do not understand: /* If the header contains just a condition, it is not a do-while loop. */ stmt = last_and_only_stmt (loop-header); if (stmt gimple_code (stmt) == GIMPLE_COND) return false; The header could contain a condition which is not the loop's exit condition, but rather a part of its body, then why do we rule out this loop as a do_while loop? I ran into this in a loop (the outer loop) extracted from bwaves benchmark: do k=1,nz km1=mod(k+nz-2,nz)+1 kp1=mod(k,nz)+1 do j=1,ny jm1=mod(j+ny-2,ny)+1 jp1=mod(j,ny)+1 . enddo enddo which was translated to: D.2361_17 = *ny_16(D); bb 5: # k_3 = PHI 1(4), k_562(25) if (D.2361_17 0) goto bb 8; else goto bb 6; bb 6: k_562 = k_3 + 1; # DEBUG k = k_562 if (k_3 == D.1583_270) goto bb 7; --- return else goto bb 25; bb 25: goto bb 5; bb 8: -- starting the body of the the second loop pretmp.318_776 = (integer(kind=8)) k_3; pretmp.318_777 = stride.92_20 * pretmp.318_776; ... bb 5 is the header of the outer loop, and bb 25 is the latch. According to do_while_loop_p () this is NOT a do while loop, but it seems that it should be. I am attaching a patch to change do_while_loop_p() assuming that what I understand is indeed correct, Please let me know if I'm right, Thank you, Razya Index: tree-ssa-loop-ch.c === --- tree-ssa-loop-ch.c (revision 185604) +++ tree-ssa-loop-ch.c (working copy) @@ -107,6 +107,8 @@ should_duplicate_loop_header_p (basic_block header bool do_while_loop_p (struct loop *loop) { + edge exit_edge; + gimple cond_stmt; gimple stmt = last_stmt (loop-latch); /* If the latch of the loop is not empty, it is not a do-while loop. */ @@ -116,8 +118,14 @@ do_while_loop_p (struct loop *loop) /* If the header contains just a condition, it is not a do-while loop. */ stmt = last_and_only_stmt (loop-header); + exit_edge = single_dom_exit (loop); + if (exit_edge) +cond_stmt = last_stmt (exit_edge-src); + else +cond_stmt =stmt; if (stmt - gimple_code (stmt) == GIMPLE_COND) + gimple_code (stmt) == GIMPLE_COND + stmt == cond_stmt) return false; return true; =
Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence
Jakub Jelinek ja...@redhat.com wrote on 21/11/2011 07:25:10 PM: From: Jakub Jelinek ja...@redhat.com To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: GCC Patches gcc-patches@gcc.gnu.org, Richard Guenther richard.guent...@gmail.com Date: 21/11/2011 07:25 PM Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence On Mon, Nov 21, 2011 at 06:56:55PM +0200, Razya Ladelsky wrote: I have some non-affine cases for which compute_affine_dependence is called (as it is called for ALL dependences from compte_all_depepndences()), and no harm is done. I looked a little bit closer into the code, and this is what happens for non affine accesses: initialize_data_dependence_relation() assigns DDR_ARE_DEPENDENT (res) = chrec_dont_know for the dr. It can be chrec_known too (that's actually the only interesting case for us unless it is a read-read ddr), but you're right that likely object_address_invariant_in_loop_p should be false. Then, compute_affine_depepndence() tests if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE), and does nothing otherwise. Since the dr was initialized with chrec_dont_know, no harm is done. Anyway, since it is useless for your gather case, I'll just remove it, along with compute_self_dependence(). OK? Yes, patch preapproved. Jakub Hi, Going to commit this patch today. I bootstrapped and regtested on ppc64-suse-linux. ChangeLog: * tree-data-ref.c (initialize_data_dependence_relation): Update comment for the self dependence case. (compute_self_dependence): Remove. * tree-vect-data-refs.c (vect_analyze_data_refs): Remove call to compute_self_dependenc. Thanks, Razya Index: tree-data-ref.c === --- tree-data-ref.c (revision 181675) +++ tree-data-ref.c (working copy) @@ -1385,8 +1385,7 @@ initialize_data_dependence_relation (struct data_r return res; } - /* When the references are exactly the same, don't spend time doing - the data dependence tests, just initialize the ddr and return. */ + /* The case where the references are exactly the same. */ if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) { if (loop_nest @@ -4132,19 +4131,6 @@ compute_affine_dependence (struct data_dependence_ fprintf (dump_file, )\n); } -/* This computes the dependence relation for the same data - reference into DDR. */ - -void -compute_self_dependence (struct data_dependence_relation *ddr) -{ - if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE) -return; - - if (DDR_LOOP_NEST (ddr)) -compute_affine_dependence (ddr, VEC_index (loop_p, DDR_LOOP_NEST (ddr), 0)); -} - /* Compute in DEPENDENCE_RELATIONS the data dependence graph for all the data references in DATAREFS, in the LOOP_NEST. When COMPUTE_SELF_AND_RR is FALSE, don't compute read-read and self Index: tree-vect-data-refs.c === --- tree-vect-data-refs.c (revision 181675) +++ tree-vect-data-refs.c (working copy) @@ -3122,7 +3122,6 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo, ddr = VEC_index (ddr_p, ddrs, k); gcc_assert (DDR_A (ddr) == olddr DDR_B (ddr) == olddr); newddr = initialize_data_dependence_relation (dr, dr, nest); - compute_self_dependence (newddr); VEC_replace (ddr_p, ddrs, k, newddr); free_dependence_relation (ddr); VEC_replace (data_reference_p, datarefs, i, dr); =
Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence
gcc-patches-ow...@gcc.gnu.org wrote on 21/11/2011 02:57:07 PM: From: Jakub Jelinek ja...@redhat.com To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: Richard Guenther richard.guent...@gmail.com, GCC Patches gcc- patc...@gcc.gnu.org Date: 21/11/2011 02:57 PM Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence Sent by: gcc-patches-ow...@gcc.gnu.org On Tue, Nov 15, 2011 at 12:31:39PM +0200, Razya Ladelsky wrote: This patch fixes the failures described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960 It also fixes bzips when run with autopar enabled. Sorry, I've been away and couldn't react to this earlier, but the patch looks wrong to me. The reason why I've been calling compute_self_dependence from the gather handling code is to do there what we did for normal data references in compute_all_dependences, except for the affine stuff. Hi Jakub, what do you mean by 'except for the affine stuff'? Before having this patch, compute_self_depepndence just marked the distance zero and returned, while now it calls compute_affine_dependence. So I think you do get the right outcome now. So, when compute_all_dependences no longer calls it, neither should vect_analyze_data_refs. So do you want to replace it with the call to compute_affine_dependence? Thanks, Razya Jakub
Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence
Jakub Jelinek ja...@redhat.com wrote on 21/11/2011 03:59:15 PM: From: Jakub Jelinek ja...@redhat.com To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: GCC Patches gcc-patches@gcc.gnu.org, Richard Guenther richard.guent...@gmail.com Date: 21/11/2011 03:59 PM Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence On Mon, Nov 21, 2011 at 03:50:10PM +0200, Razya Ladelsky wrote: what do you mean by 'except for the affine stuff'? I mean that compute_affine_dependence must never be called on gather data refs, that function can't do anything meaningful for them, they are gather data refs exactly because dr_analyze_innermost failed on them otherwise, as base and/or offset aren't simple IVs. I understand. I tried to follow the same paths that data refs (affine or not) go through in compute_all_depepndences. According to compute_all_depepndences(), the path for computing any data dependence is: call initialize_data_depepndence_relation(), and call compute_affine_dependence() if there's a loop nest. I tried to keep the same semantics for self dependences. Although compute_affine_dependence() can't do anything meaningful for the gather data refs, it may still be assigning values to the dependence relation structure, if you need to use them. Therefore I did not skip the call to compute_self_dependence(), which indeed calls compute_affine_depepndence(). If you think it's redundant, we can remove it. Thanks, Razya So, when compute_all_dependences no longer calls it, neither should vect_analyze_data_refs. So do you want to replace it with the call to compute_affine_dependence? No. With nothing at all. Jakub
Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence
Jakub Jelinek ja...@redhat.com wrote on 21/11/2011 05:07:54 PM: From: Jakub Jelinek ja...@redhat.com To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: GCC Patches gcc-patches@gcc.gnu.org, Richard Guenther richard.guent...@gmail.com Date: 21/11/2011 05:08 PM Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence On Mon, Nov 21, 2011 at 04:54:09PM +0200, Razya Ladelsky wrote: Although compute_affine_dependence() can't do anything meaningful for the gather data refs, it may still be assigning values to the dependence relation structure, if you need to use them. Therefore I did not skip the call to compute_self_dependence(), which indeed calls compute_affine_depepndence(). If you think it's redundant, we can remove it. It is not just redundant, but harmful. compute_affine_dependence assumes that base and offset are simple IVs, that is not true for gather deps. So please do remove it. Jakub Jakub, I have some non-affine cases for which compute_affine_dependence is called (as it is called for ALL dependences from compte_all_depepndences()), and no harm is done. I looked a little bit closer into the code, and this is what happens for non affine accesses: initialize_data_dependence_relation() assigns DDR_ARE_DEPENDENT (res) = chrec_dont_know for the dr. Then, compute_affine_depepndence() tests if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE), and does nothing otherwise. Since the dr was initialized with chrec_dont_know, no harm is done. Anyway, since it is useless for your gather case, I'll just remove it, along with compute_self_dependence(). OK? Thanks, Razya
[PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence
I hope it's clearer now, I will add a comment to the code, and submit it before committing it. No, it's not clearer, because it is not clear why you need to add the hack instead of avoiding the 2nd access function. And iff you add the hack it needs a comment why zero should be special (any other constant would be the same I suppose). Btw, your fortran example does not compile and I don't believe the issue is still present after my last changes to dr_analyze_indices. So, did you verify this on trunk? Richard. This patch fixes the failures described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960 It also fixes bzips when run with autopar enabled. In both cases the self dependences are not handled correctly. In the first case, a non affine access is analyzed: in the second, the distance vector is not calculated correctly (the distance vector considered for for self dependences is always (0,0,...)) As a result, the loops get wrongfully parallelized. I modified the previous patch according to the last changes in the trunk, which indeed do not requite special handling for the 2nd access function (as mentioned by Richard). Another change is that the previous version of the patch eliminated compute_self_dependences function as the calls to it were redundant, while this version considers the new call to compute_self_dependences from the vect code for gather (inserted lately by Jakub). ChangeLog: PR tree-optimization/49960 * tree-data-ref.c (initialize_data_dependence_relation): Add initializations. Remove call to compute_self_dependence. (compute_affine_dependence): Remove the !DDR_SELF_REFERENCE condition. (compute_self_dependence): Remove old code. Add call to compute_affine_dependence. (compute_all_dependences): Remove call to compute_self_dependence. Add call to compute_affine_dependence. testsuite/ChangeLog: PR tree-optimization/49960 * gcc.dg/autopar/pr49960.c: New test. * gcc.dg/autopar/pr49960-1.c: New test. Bootstrap and testsuite pass successfully for ppc64-redhat-linux. OK for trunk? Thank you, Razya pr49960.c Description: Binary data pr49960-1.c Description: Binary data Index: gcc/tree-data-ref.c === --- gcc/tree-data-ref.c (revision 181168) +++ gcc/tree-data-ref.c (working copy) @@ -1389,13 +1389,30 @@ initialize_data_dependence_relation (struct data_r the data dependence tests, just initialize the ddr and return. */ if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) { + if (loop_nest + !object_address_invariant_in_loop_p (VEC_index (loop_p, loop_nest, 0), + DR_BASE_OBJECT (a))) + { +DDR_ARE_DEPENDENT (res) = chrec_dont_know; +return res; + } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; DDR_SUBSCRIPTS (res) = VEC_alloc (subscript_p, heap, DR_NUM_DIMENSIONS (a)); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = true; - compute_self_dependence (res); + for (i = 0; i DR_NUM_DIMENSIONS (a); i++) + { + struct subscript *subscript; + + subscript = XNEW (struct subscript); + SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); + SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); + SUB_LAST_CONFLICT (subscript) = chrec_dont_know; + SUB_DISTANCE (subscript) = chrec_dont_know; + VEC_safe_push (subscript_p, heap, DDR_SUBSCRIPTS (res), subscript); + } return res; } @@ -4040,8 +4057,7 @@ compute_affine_dependence (struct data_dependence_ } /* Analyze only when the dependence relation is not yet known. */ - if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE - !DDR_SELF_REFERENCE (ddr)) + if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) { dependence_stats.num_dependence_tests++; @@ -4122,31 +4138,11 @@ compute_affine_dependence (struct data_dependence_ void compute_self_dependence (struct data_dependence_relation *ddr) { - unsigned int i; - struct subscript *subscript; - if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE) return; - for (i = 0; VEC_iterate (subscript_p, DDR_SUBSCRIPTS (ddr), i, subscript); - i++) -{ - if (SUB_CONFLICTS_IN_A (subscript)) - free_conflict_function (SUB_CONFLICTS_IN_A (subscript)); - if (SUB_CONFLICTS_IN_B (subscript)) - free_conflict_function (SUB_CONFLICTS_IN_B (subscript)); - - /* The accessed index overlaps for each iteration. */ - SUB_CONFLICTS_IN_A (subscript) - = conflict_fn (1, affine_fn_cst (integer_zero_node)); - SUB_CONFLICTS_IN_B (subscript) - = conflict_fn (1, affine_fn_cst (integer_zero_node)); - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; -} - - /* The distance
Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence
gcc-patches-ow...@gcc.gnu.org wrote on 17/10/2011 09:03:59 AM: From: Richard Guenther richard.guent...@gmail.com To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Sebastian Pop s...@gcc.gnu.org Date: 17/10/2011 09:04 AM Subject: Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence Sent by: gcc-patches-ow...@gcc.gnu.org On Mon, Oct 17, 2011 at 8:23 AM, Razya Ladelsky ra...@il.ibm.com wrote: This patch fixes the failures described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960 It also fixes bzips when run with autopar enabled. In both cases the self dependences are not handled correctly. In the first case, a non affine access is analyzed: in the second, the distance vector is not calculated correctly (the distance vector considered for for self dependences is always (0,0,...)) As a result, the loops get wrongfully parallelized. The patch avoids the special handling of self dependences, and analyzes all dependences in the same way. Specific adjustments and support for the self dependence cases were made. Can you elaborate on @@ -3119,8 +3135,11 @@ add_other_self_distances (struct data_dependence_r { if (DDR_NUM_SUBSCRIPTS (ddr) != 1) { -DDR_ARE_DEPENDENT (ddr) = chrec_dont_know; -return; +if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop (DR_ACCESS_FN (DDR_A (ddr), 1))) + { +DDR_ARE_DEPENDENT (ddr) = chrec_dont_know; +return; + } } access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); ? It needed a comment before, and now so even more. The rest of the patch is ok, I suppose the above hunk is to enhance something, not to fix the bug? For fortran code like: DO 140 J=1,MB DO 130 K=1,NA BKJ=B(K,J) IF(BKJ.EQ.ZERO) GO TO 130 DO 120 I=1,MA C(I,J)=C(I,J)+A(K,I)*BKJ 120 CONTINUE 130CONTINUE 140 CONTINUE RETURN The access functions for the C(i j) self dependence are: (Data Dep: #(Data Ref: # bb: 9 # stmt: D.1427_79 = *c_78(D)[D.1426_77]; # ref: *c_78(D)[D.1426_77]; # base_object: *c_78(D); # Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, stride.12_25}_1, +, 1}_3 # Access function 1: 0B #) #(Data Ref: # bb: 9 # stmt: *c_78(D)[D.1426_77] = D.1433_88; # ref: *c_78(D)[D.1426_77]; # base_object: *c_78(D); # Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, stride.12_25}_1, +, 1}_3 # Access function 1: 0B #) Two dimesions are created to describe C(i j) although there's no need for access function 1 which is just 0B. If this was a C code, we would have these two access functions for C[i][j]: (Data Dep: #(Data Ref: # bb: 5 # stmt: t_10 = C[i_33][j_37]; # ref: C[i_33][j_37]; # base_object: C # Access function 0: {3, +, 1}_3 # Access function 1: {3, +, 1}_2 #) #(Data Ref: # bb: 5 # stmt: C[i_33][j_37] = D.3852_15; # ref: C[i_33][j_37]; # base_object: C # Access function 0: {3, +, 1}_3 # Access function 1: {3, +, 1}_2 #) In order to handle the Fortran data accesses, even for simple cases as above, I would need to handle multivariate accesses. The data dependence analysis doesn't know how to handle such dependences if there's more than one subscript. The above Frotran code doesn't actually have two subscripts, but one, and thus should be handled. The reason this issue came up only with the changes of this patch is that now add_other_self_distances is called from build_classic_dist_vector, which is called also for self dependences. Before the patch, the distance vector for self dependences was always determined as a vector of 0's, and build_classic_dist_vector was not called. I hope it's clearer now, I will add a comment to the code, and submit it before committing it. Thanks, Razya Thanks, Richard. Bootstrap and testsuite pass successfully for ppc64-redhat-linux. OK for trunk? Thank you, Razya ChangeLog: PR tree-optimization/49960 * tree-data-ref.c (compute_self_dependence): Remove. (initialize_data_dependence_relation): Add intializations. Remove compute_self_dependence. (add_other_self_distances): Add support for two dimensions if the second is zero. (compute_affine_dependence): Remove the !DDR_SELF_REFERENCE condition. (compute_all_dependences): Remove call to compute_self_dependence. Add call to compute_affine_dependence testsuite/ChangeLog: PR tree-optimization/49660 * gcc.dg/autopar/pr49660.c: New test. * gcc.dg/autopar/pr49660-1.c: New test.
[patch] Fix PR tree-optimization/49960 ,Fix self data dependence
This patch fixes the failures described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960 It also fixes bzips when run with autopar enabled. In both cases the self dependences are not handled correctly. In the first case, a non affine access is analyzed: in the second, the distance vector is not calculated correctly (the distance vector considered for for self dependences is always (0,0,...)) As a result, the loops get wrongfully parallelized. The patch avoids the special handling of self dependences, and analyzes all dependences in the same way. Specific adjustments and support for the self dependence cases were made. Bootstrap and testsuite pass successfully for ppc64-redhat-linux. OK for trunk? Thank you, Razya ChangeLog: PR tree-optimization/49960 * tree-data-ref.c (compute_self_dependence): Remove. (initialize_data_dependence_relation): Add intializations. Remove compute_self_dependence. (add_other_self_distances): Add support for two dimensions if the second is zero. (compute_affine_dependence): Remove the !DDR_SELF_REFERENCE condition. (compute_all_dependences): Remove call to compute_self_dependence. Add call to compute_affine_dependence testsuite/ChangeLog: PR tree-optimization/49660 * gcc.dg/autopar/pr49660.c: New test. * gcc.dg/autopar/pr49660-1.c: New test. Index: tree-data-ref.c === --- tree-data-ref.c (revision 179799) +++ tree-data-ref.c (working copy) @@ -1346,7 +1346,6 @@ dr_may_alias_p (const struct data_reference *a, co return refs_may_alias_p (addr_a, addr_b); } -static void compute_self_dependence (struct data_dependence_relation *); /* Initialize a data dependence relation between data accesses A and B. NB_LOOPS is the number of loops surrounding the references: the @@ -1386,13 +1385,30 @@ initialize_data_dependence_relation (struct data_r the data dependence tests, just initialize the ddr and return. */ if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) { + if (loop_nest + !object_address_invariant_in_loop_p (VEC_index (loop_p, loop_nest, 0), + DR_BASE_OBJECT (a))) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; DDR_SUBSCRIPTS (res) = VEC_alloc (subscript_p, heap, DR_NUM_DIMENSIONS (a)); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = true; - compute_self_dependence (res); + for (i = 0; i DR_NUM_DIMENSIONS (a); i++) + { + struct subscript *subscript; + + subscript = XNEW (struct subscript); + SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); + SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); + SUB_LAST_CONFLICT (subscript) = chrec_dont_know; + SUB_DISTANCE (subscript) = chrec_dont_know; + VEC_safe_push (subscript_p, heap, DDR_SUBSCRIPTS (res), subscript); + } return res; } @@ -3119,8 +3135,11 @@ add_other_self_distances (struct data_dependence_r { if (DDR_NUM_SUBSCRIPTS (ddr) != 1) { - DDR_ARE_DEPENDENT (ddr) = chrec_dont_know; - return; + if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop (DR_ACCESS_FN (DDR_A (ddr), 1))) + { + DDR_ARE_DEPENDENT (ddr) = chrec_dont_know; + return; + } } access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); @@ -4037,8 +4056,7 @@ compute_affine_dependence (struct data_dependence_ } /* Analyze only when the dependence relation is not yet known. */ - if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE - !DDR_SELF_REFERENCE (ddr)) + if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) { dependence_stats.num_dependence_tests++; @@ -4113,39 +4131,6 @@ compute_affine_dependence (struct data_dependence_ fprintf (dump_file, )\n); } -/* This computes the dependence relation for the same data - reference into DDR. */ - -static void -compute_self_dependence (struct data_dependence_relation *ddr) -{ - unsigned int i; - struct subscript *subscript; - - if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE) -return; - - for (i = 0; VEC_iterate (subscript_p, DDR_SUBSCRIPTS (ddr), i, subscript); - i++) -{ - if (SUB_CONFLICTS_IN_A (subscript)) - free_conflict_function (SUB_CONFLICTS_IN_A (subscript)); - if (SUB_CONFLICTS_IN_B (subscript)) - free_conflict_function (SUB_CONFLICTS_IN_B (subscript)); - - /* The accessed index overlaps for each iteration. */ - SUB_CONFLICTS_IN_A (subscript) - = conflict_fn (1, affine_fn_cst
Re: [patch] Fix PR tree-optimization/49471
Richard Guenther richard.guent...@gmail.com wrote on 25/07/2011 05:54:28 PM: From: Richard Guenther richard.guent...@gmail.com To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Zdenek Dvorak rakd...@kam.mff.cuni.cz, Sebastian Pop s...@gcc.gnu.org Date: 25/07/2011 05:54 PM Subject: Re: [patch] Fix PR tree-optimization/49471 On Mon, Jul 25, 2011 at 4:47 PM, Razya Ladelsky ra...@il.ibm.com wrote: Hi, This patch fixes the build failure of cactusADM and dealII spec2006 benchmarks when autopar is enabled. (for powerpc they fail only when -m32 is additionally enabled) The problem originated in canonicalize_loop_ivs, where we iterate the header's phis in order to base all the induction variables on a single control variable. We use the largest precision of the loop's ivs in order to determine the type of the control variable. Since iterating the loop's phis takes into account not only the loop's ivs, but also reduction variables, we got precision values like 80 for x86, or 128 for ppc. The compilers failed to create proper types for these sizes (respectively). The proper behavior for determining the control variable's type is to take into account only the loop's ivs, which is what this patch does. Bootstrap and testsuite pass successfully (as autopar is not enabled by default). No new regressions when the testsuite is run with autopar enabled. No new regressions for the run of spec2006 with autopar enabled, cactusADM and dealII benchmarks now pass successfully with autopar on powerpc and x86. Thanks to Zdenek who helped me figure out the failure/fix. OK for trunk? It'll collide with Sebastians patch in that area. I suggested a INTEGRAL_TYPE_P check instead of the simple_iv one, it should be cheaper. Zdenek, do you think it will be incorrect in some cases? The INTEGRAL_TYPE_P check does work for cactusADM and dealII, but I'm not sure about the general case. Razya Thanks, Richard. Thanks, Razya ChangeLog: PR tree-optimization/49471 * tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to ignore reduction variables when iterating the loop header's phis.
Re: [patch] Fix PR tree-optimization/49471
Razya Ladelsky/Haifa/IBM wrote on 25/07/2011 05:44:02 PM: From: Razya Ladelsky/Haifa/IBM To: gcc-patches@gcc.gnu.org Cc: Zdenek Dvorak rakd...@kam.mff.cuni.cz, Richard Guenther richard.guent...@gmail.com Date: 25/07/2011 05:44 PM Subject: [patch] Fix PR tree-optimization/49471 Hi, This patch fixes the build failure of cactusADM and dealII spec2006 benchmarks when autopar is enabled. (for powerpc they fail only when -m32 is additionally enabled) The problem originated in canonicalize_loop_ivs, where we iterate the header's phis in order to base all the induction variables on a single control variable. We use the largest precision of the loop's ivs in order to determine the type of the control variable. Since iterating the loop's phis takes into account not only the loop's ivs, but also reduction variables, we got precision values like 80 for x86, or 128 for ppc. The compilers failed to create proper types for these sizes (respectively). The proper behavior for determining the control variable's type is to take into account only the loop's ivs, which is what this patch does. Bootstrap and testsuite pass successfully (as autopar is not enabled by default). No new regressions when the testsuite is run with autopar enabled. No new regressions for the run of spec2006 with autopar enabled, cactusADM and dealII benchmarks now pass successfully with autopar on powerpc and x86. Thanks to Zdenek who helped me figure out the failure/fix. OK for trunk? Thanks, Razya ChangeLog: PR tree-optimization/49471 * tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to ignore reduction variables when iterating the loop header's phis. I have an error in the ChangeLog: the change is in tree-ssa-loop-manip.c instead of tree-vect-loop-manip.c Sorry, Razya [attachment cactus_dealII_patch.txt deleted by Razya Ladelsky/Haifa/IBM]
Re: PATCH] PR 49580
Zdenek Dvorak rakd...@kam.mff.cuni.cz wrote on 30/06/2011 15:21:43: From: Zdenek Dvorak rakd...@kam.mff.cuni.cz To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Richard Guenther richard.guent...@gmail.com Date: 30/06/2011 15:21 Subject: Re: PATCH] PR 49580 Hi, This patch fixes the build failure of gcc spec2006 benchmark. The change is in gimple_duplicate_sese_tail(), where we need to subtract 1 from the loop's number of iterations. The stmt created to change the rhs of the loop's condition, used to be inserted right after the defining stmt of the rhs (an ssa name). This requires special handling of different cases of the defining stmt: 1.gimple_stmt 2.gimple_phi 3.default_def Instead of handling each case separately, if we insert the new stmt at the begining of the loop's preheader, we're sure that it's already been defined. Bootstrap and testsuite pass successfully (as autopar is not enabled by default). No new regressions when the testsuite is run with autopar enabled. No new regressions for the run of spec2006 with autopar enabled, and gcc benchmark now passes successfully.. OK for trunk? actually, I think a better approach would be not to have this kind of pass-specific adjustments in gimple_duplicate_sese_tail at all. The code decreasing the number of iterations in gimple_duplicate_sese_tail only works because parloops does induction variable canonicalization first; should we need it to be used anywhere else, it will break. I.e., parloops should first transform the condition so that it does the comparison with the adjusted value, and then gimple_duplicate_sese_tail could do just code copying and cfg changes, Zdenek Hi, I moved the adjustment of the loop's iterations from gimple_duplicate_sese_tail to tree-parloops.c, right before the call to gimple_duplicate_sese_tail. I repeated the bootstrap, regression and spec runs - no new regressions. OK to commit? Thanks, razya Index: gcc/tree-parloops.c === --- gcc/tree-parloops.c (revision 174166) +++ gcc/tree-parloops.c (working copy) @@ -1464,6 +1464,8 @@ transform_to_exit_first_loop (struct loop *loop, h gimple phi, nphi, cond_stmt, stmt, cond_nit; gimple_stmt_iterator gsi; tree nit_1; + edge exit_1; + tree new_rhs; split_block_after_labels (loop-header); orig_header = single_succ (loop-header); @@ -1492,6 +1494,38 @@ transform_to_exit_first_loop (struct loop *loop, h control = t; } } + + /* Setting the condition towards peeling the last iteration: +If the block consisting of the exit condition has the latch as +successor, then the body of the loop is executed before +the exit condition is tested. In such case, moving the +condition to the entry, causes that the loop will iterate +one less iteration (which is the wanted outcome, since we +peel out the last iteration). If the body is executed after +the condition, moving the condition to the entry requires +decrementing one iteration. */ + exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); + if (exit_1-dest == loop-latch) +new_rhs = gimple_cond_rhs (cond_stmt); + else + { +new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)), + gimple_cond_rhs (cond_stmt), + build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 1)); +if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) + { + basic_block preheader; + gimple_stmt_iterator gsi1; + + preheader = loop_preheader_edge(loop)-src; + gsi1 = gsi_after_labels (preheader); + new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, + NULL_TREE,false,GSI_CONTINUE_LINKING); + } + } + gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs)); + gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt))); + bbs = get_loop_body_in_dom_order (loop); for (n = 0; bbs[n] != loop-latch; n++) Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 174166) +++ gcc/tree-cfg.c (working copy) @@ -5397,12 +5397,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U int total_freq = 0, exit_freq = 0; gcov_type total_count = 0, exit_count = 0; edge exits[2], nexits[2], e; - gimple_stmt_iterator gsi,gsi1; + gimple_stmt_iterator gsi; gimple cond_stmt; edge sorig, snew; basic_block exit_bb; - basic_block iters_bb; - tree new_rhs; gimple_stmt_iterator psi; gimple phi; tree def; @@ -5483,35 +5481,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND); cond_stmt = gimple_copy (cond_stmt); - /* If the block consisting of the exit condition has the latch
Re: PATCH] PR 49580
Zdenek Dvorak rakd...@kam.mff.cuni.cz wrote on 05/07/2011 13:37:41: From: Zdenek Dvorak rakd...@kam.mff.cuni.cz To: Razya Ladelsky/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Richard Guenther richard.guent...@gmail.com Date: 05/07/2011 13:37 Subject: Re: PATCH] PR 49580 Hi, I moved the adjustment of the loop's iterations from gimple_duplicate_sese_tail to tree-parloops.c, right before the call to gimple_duplicate_sese_tail. I repeated the bootstrap, regression and spec runs - no new regressions. OK to commit? OK, Zdenek I also want to commit this testcase, which is a reduced case of the gcc benchmark. I apologize for not submitting it together with the patch. OK? Thanks, Razya Index: gcc/tree-parloops.c === --- gcc/tree-parloops.c (revision 174166) +++ gcc/tree-parloops.c (working copy) @@ -1464,6 +1464,8 @@ transform_to_exit_first_loop (struct loop *loop, h gimple phi, nphi, cond_stmt, stmt, cond_nit; gimple_stmt_iterator gsi; tree nit_1; + edge exit_1; + tree new_rhs; split_block_after_labels (loop-header); orig_header = single_succ (loop-header); @@ -1492,6 +1494,38 @@ transform_to_exit_first_loop (struct loop *loop, h control = t; } } + + /* Setting the condition towards peeling the last iteration: +If the block consisting of the exit condition has the latch as +successor, then the body of the loop is executed before +the exit condition is tested. In such case, moving the +condition to the entry, causes that the loop will iterate +one less iteration (which is the wanted outcome, since we +peel out the last iteration). If the body is executed after +the condition, moving the condition to the entry requires +decrementing one iteration. */ + exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); + if (exit_1-dest == loop-latch) +new_rhs = gimple_cond_rhs (cond_stmt); + else + { +new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)), +gimple_cond_rhs (cond_stmt), +build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 1)); +if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) + { +basic_block preheader; + gimple_stmt_iterator gsi1; + + preheader = loop_preheader_edge(loop)-src; + gsi1 = gsi_after_labels (preheader); + new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, + NULL_TREE,false,GSI_CONTINUE_LINKING); + } + } + gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs)); + gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt))); + bbs = get_loop_body_in_dom_order (loop); for (n = 0; bbs[n] != loop-latch; n++) Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 174166) +++ gcc/tree-cfg.c (working copy) @@ -5397,12 +5397,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U int total_freq = 0, exit_freq = 0; gcov_type total_count = 0, exit_count = 0; edge exits[2], nexits[2], e; - gimple_stmt_iterator gsi,gsi1; + gimple_stmt_iterator gsi; gimple cond_stmt; edge sorig, snew; basic_block exit_bb; - basic_block iters_bb; - tree new_rhs; gimple_stmt_iterator psi; gimple phi; tree def; @@ -5483,35 +5481,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND); cond_stmt = gimple_copy (cond_stmt); - /* If the block consisting of the exit condition has the latch as -successor, then the body of the loop is executed before -the exit condition is tested. In such case, moving the -condition to the entry, causes that the loop will iterate -one less iteration (which is the wanted outcome, since we -peel out the last iteration). If the body is executed after -the condition, moving the condition to the entry requires -decrementing one iteration. */ - if (exits[1]-dest == orig_loop-latch) -new_rhs = gimple_cond_rhs (cond_stmt); - else - { -new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)), -gimple_cond_rhs (cond_stmt), -build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 1)); - -if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) - { - iters_bb = gimple_bb (SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt))); - for (gsi1 = gsi_start_bb (iters_bb); !gsi_end_p (gsi1); gsi_next (gsi1)) - if (gsi_stmt (gsi1) == SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt))) - break; - - new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, - NULL_TREE,false,GSI_CONTINUE_LINKING
PATCH] PR 49580
Hi, This patch fixes the build failure of gcc spec2006 benchmark. The change is in gimple_duplicate_sese_tail(), where we need to subtract 1 from the loop's number of iterations. The stmt created to change the rhs of the loop's condition, used to be inserted right after the defining stmt of the rhs (an ssa name). This requires special handling of different cases of the defining stmt: 1.gimple_stmt 2.gimple_phi 3.default_def Instead of handling each case separately, if we insert the new stmt at the begining of the loop's preheader, we're sure that it's already been defined. Bootstrap and testsuite pass successfully (as autopar is not enabled by default). No new regressions when the testsuite is run with autopar enabled. No new regressions for the run of spec2006 with autopar enabled, and gcc benchmark now passes successfully.. OK for trunk? Thanks, Razya Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 174166) +++ gcc/tree-cfg.c (working copy) @@ -5401,7 +5401,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U gimple cond_stmt; edge sorig, snew; basic_block exit_bb; - basic_block iters_bb; tree new_rhs; gimple_stmt_iterator psi; gimple phi; @@ -5501,11 +5500,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) { - iters_bb = gimple_bb (SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt))); - for (gsi1 = gsi_start_bb (iters_bb); !gsi_end_p (gsi1); gsi_next (gsi1)) - if (gsi_stmt (gsi1) == SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt))) - break; + basic_block preheader; + preheader = loop_preheader_edge(orig_loop)-src; + gsi1 = gsi_after_labels (preheader); new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, NULL_TREE,false,GSI_CONTINUE_LINKING); } = 22-12-2009 Razya Ladelsky ra...@il.ibm.com * tree-cfg.c (gimple_duplicate_sese_tail): Insert the stmt caclculating the new rhs of the loop's condition stmt to the preheader instead of iters_bb. =