Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution
On Wed, Jun 14, 2017 at 11:25 AM, Bin.Chengwrote: > On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener > wrote: >> On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng wrote: >>> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener >>> wrote: On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng wrote: > Hi, > During the work I ran into a latent bug for distributing. For the moment > we sort statements > in dominance order, but that's not enough because basic blocks may be > sorted in reverse order > of execution flow. This results in wrong data dependence direction > later. This patch fixes > the issue by sorting in topological order. > > Bootstrap and test on x86_64 and AArch64. Is it OK? I suppose you are fixing static int pg_add_dependence_edges (struct graph *rdg, vec loops, int dir, vec drs1, vec drs2) { ... /* Re-shuffle data-refs to be in dominator order. */ if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) { std::swap (dr1, dr2); this_dir = -this_dir; } but then for stmts that are not "ordered" by RPO or DOM like if (flag) ... = dr1; else ... = dr2; this doesn't avoid spurious swaps? Also the code was basically >>> No, this is mainly for below case: >>> if (flag) >>> { >>> partition1: arr[i] = x; >>> } >>> partition2: arr[i] = y; >>> >>> function pg_add_dependence_edges is like: >>> /* Re-shuffle data-refs to be in dominator order. */ >>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> //... >>> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) >>> { >>> if (DDR_REVERSED_P (ddr)) >>>{ >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>>} >>> /* Known dependences can still be unordered througout the >>> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ >>> if (DDR_NUM_DIST_VECTS (ddr) != 1) >>>this_dir = 2; >>> /* If the overlap is exact preserve stmt order. */ >>> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >>>; >>> else >>>{ >>> /* Else as the distance vector is lexicographic positive >>> swap the dependence direction. */ >>> this_dir = -this_dir; >>>} >>> } >>> For non-ZERO distance vector dependence, the second part always >>> computes src->dest dependence info correctly, as well as edge >>> direction of PG. For ZERO distance vector dependence, we rely on the >>> swap part (thus topological order) to get correct dependence >>> direction. For mentioned case, the two data references are unordered >>> in dominance relation, but ordered in RPO. This is why DOM is not >>> enough. For if-then-else case, the order actually doesn't matter, and >>> the references are unordered in either dominance relation or RPO. >>> Specifically, src->dest is always computed correctly for non-ZERO >>> distance vector cases, no matter or is passed to >>> data dependence analyzer. As for ZERO distance vector (exact >>> overlap), the order doesn't matter either because they control >>> dependent on the same condition. We can simply assume an arbitrary >>> order for it. >> >> Ok, if it only is an issue for zero-distance then yes, I agree. > Yeah, so for distribution, I think we can further simplify > pg_add_dependence_edge because swap is only necessary for > zero-distance cases. >> copied from tree-data-refs.c:find_data_references_in_loop which does iterate over get_loop_body_in_dom_order as well. So isn't the issue latent there as well? >>> In theory maybe. In practice, this is not a problem at all since loop >>> distribution is the only one handles control dependence so far. >> >> You mean the only one running into the bogus BB ordering case. >> I don't see how handling control dependences factor in here. > Yes, that's what I meant. The ordering issue doesn't exist if there > is no control dependence between basic blocks in loop body I think. I > didn't realize the autopar pass. >> That said, what about those "unordered" stmts? I suppose dependence analysis still happily computes a dependence distance but in reality we'd have to consider both execution orders? >>> As explained, there is no need to consider both orders. GCC doesn't >>> really support control dependence parallelization? >>
Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution
On Wed, Jun 14, 2017 at 10:15 AM, Richard Bienerwrote: > On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng wrote: >> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener >> wrote: >>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng wrote: Hi, During the work I ran into a latent bug for distributing. For the moment we sort statements in dominance order, but that's not enough because basic blocks may be sorted in reverse order of execution flow. This results in wrong data dependence direction later. This patch fixes the issue by sorting in topological order. Bootstrap and test on x86_64 and AArch64. Is it OK? >>> >>> I suppose you are fixing >>> >>> static int >>> pg_add_dependence_edges (struct graph *rdg, vec loops, int dir, >>> vec drs1, >>> vec drs2) >>> { >>> ... >>> /* Re-shuffle data-refs to be in dominator order. */ >>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> >>> but then for stmts that are not "ordered" by RPO or DOM like >>> >>>if (flag) >>> ... = dr1; >>>else >>> ... = dr2; >>> >>> this doesn't avoid spurious swaps? Also the code was basically >> No, this is mainly for below case: >> if (flag) >> { >> partition1: arr[i] = x; >> } >> partition2: arr[i] = y; >> >> function pg_add_dependence_edges is like: >> /* Re-shuffle data-refs to be in dominator order. */ >> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> //... >> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) >> { >> if (DDR_REVERSED_P (ddr)) >>{ >> std::swap (dr1, dr2); >> this_dir = -this_dir; >>} >> /* Known dependences can still be unordered througout the >> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ >> if (DDR_NUM_DIST_VECTS (ddr) != 1) >>this_dir = 2; >> /* If the overlap is exact preserve stmt order. */ >> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >>; >> else >>{ >> /* Else as the distance vector is lexicographic positive >> swap the dependence direction. */ >> this_dir = -this_dir; >>} >> } >> For non-ZERO distance vector dependence, the second part always >> computes src->dest dependence info correctly, as well as edge >> direction of PG. For ZERO distance vector dependence, we rely on the >> swap part (thus topological order) to get correct dependence >> direction. For mentioned case, the two data references are unordered >> in dominance relation, but ordered in RPO. This is why DOM is not >> enough. For if-then-else case, the order actually doesn't matter, and >> the references are unordered in either dominance relation or RPO. >> Specifically, src->dest is always computed correctly for non-ZERO >> distance vector cases, no matter or is passed to >> data dependence analyzer. As for ZERO distance vector (exact >> overlap), the order doesn't matter either because they control >> dependent on the same condition. We can simply assume an arbitrary >> order for it. > > Ok, if it only is an issue for zero-distance then yes, I agree. Yeah, so for distribution, I think we can further simplify pg_add_dependence_edge because swap is only necessary for zero-distance cases. > >>> copied from tree-data-refs.c:find_data_references_in_loop which >>> does iterate over get_loop_body_in_dom_order as well. So isn't the >>> issue latent there as well? >> In theory maybe. In practice, this is not a problem at all since loop >> distribution is the only one handles control dependence so far. > > You mean the only one running into the bogus BB ordering case. > I don't see how handling control dependences factor in here. Yes, that's what I meant. The ordering issue doesn't exist if there is no control dependence between basic blocks in loop body I think. I didn't realize the autopar pass. > >>> >>> That said, what about those "unordered" stmts? I suppose >>> dependence analysis still happily computes a dependence >>> distance but in reality we'd have to consider both execution >>> orders? >> As explained, there is no need to consider both orders. GCC doesn't >> really support control dependence parallelization? > > I think autopar supports an arbitrary CFG inside the loops but as it > will never split them it won't change stmt ordering for zero-distance. > > That said, if dependence info can be incorrect if
Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution
On Wed, Jun 14, 2017 at 9:53 AM, Bin.Chengwrote: > On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener > wrote: >> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng wrote: >>> Hi, >>> During the work I ran into a latent bug for distributing. For the moment >>> we sort statements >>> in dominance order, but that's not enough because basic blocks may be >>> sorted in reverse order >>> of execution flow. This results in wrong data dependence direction later. >>> This patch fixes >>> the issue by sorting in topological order. >>> >>> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> I suppose you are fixing >> >> static int >> pg_add_dependence_edges (struct graph *rdg, vec loops, int dir, >> vec drs1, >> vec drs2) >> { >> ... >> /* Re-shuffle data-refs to be in dominator order. */ >> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> >> but then for stmts that are not "ordered" by RPO or DOM like >> >>if (flag) >> ... = dr1; >>else >> ... = dr2; >> >> this doesn't avoid spurious swaps? Also the code was basically > No, this is mainly for below case: > if (flag) > { > partition1: arr[i] = x; > } > partition2: arr[i] = y; > > function pg_add_dependence_edges is like: > /* Re-shuffle data-refs to be in dominator order. */ > if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > //... > else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) > { > if (DDR_REVERSED_P (ddr)) >{ > std::swap (dr1, dr2); > this_dir = -this_dir; >} > /* Known dependences can still be unordered througout the > iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ > if (DDR_NUM_DIST_VECTS (ddr) != 1) >this_dir = 2; > /* If the overlap is exact preserve stmt order. */ > else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >; > else >{ > /* Else as the distance vector is lexicographic positive > swap the dependence direction. */ > this_dir = -this_dir; >} > } > For non-ZERO distance vector dependence, the second part always > computes src->dest dependence info correctly, as well as edge > direction of PG. For ZERO distance vector dependence, we rely on the > swap part (thus topological order) to get correct dependence > direction. For mentioned case, the two data references are unordered > in dominance relation, but ordered in RPO. This is why DOM is not > enough. For if-then-else case, the order actually doesn't matter, and > the references are unordered in either dominance relation or RPO. > Specifically, src->dest is always computed correctly for non-ZERO > distance vector cases, no matter or is passed to > data dependence analyzer. As for ZERO distance vector (exact > overlap), the order doesn't matter either because they control > dependent on the same condition. We can simply assume an arbitrary > order for it. Ok, if it only is an issue for zero-distance then yes, I agree. >> copied from tree-data-refs.c:find_data_references_in_loop which >> does iterate over get_loop_body_in_dom_order as well. So isn't the >> issue latent there as well? > In theory maybe. In practice, this is not a problem at all since loop > distribution is the only one handles control dependence so far. You mean the only one running into the bogus BB ordering case. I don't see how handling control dependences factor in here. >> >> That said, what about those "unordered" stmts? I suppose >> dependence analysis still happily computes a dependence >> distance but in reality we'd have to consider both execution >> orders? > As explained, there is no need to consider both orders. GCC doesn't > really support control dependence parallelization? I think autopar supports an arbitrary CFG inside the loops but as it will never split them it won't change stmt ordering for zero-distance. That said, if dependence info can be incorrect if applied to a loop we should fixup tree-data-refs.c as well. It might be useful to make get_loop_body_in_rpo_order available then (and eventually all _in_dom_order users can work with rpo order as well thus we can replace it entirely as a second step). Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >> >>> >>> Thanks, >>> bin >>> 2017-06-07 Bin Cheng >>> >>> * tree-loop-distribution.c (bb_top_order_index): New. >>> (bb_top_order_index_size, bb_top_order_cmp):
Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution
On Tue, Jun 13, 2017 at 11:59 AM, Richard Bienerwrote: > On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng wrote: >> Hi, >> During the work I ran into a latent bug for distributing. For the moment we >> sort statements >> in dominance order, but that's not enough because basic blocks may be sorted >> in reverse order >> of execution flow. This results in wrong data dependence direction later. >> This patch fixes >> the issue by sorting in topological order. >> >> Bootstrap and test on x86_64 and AArch64. Is it OK? > > I suppose you are fixing > > static int > pg_add_dependence_edges (struct graph *rdg, vec loops, int dir, > vec drs1, > vec drs2) > { > ... > /* Re-shuffle data-refs to be in dominator order. */ > if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > > but then for stmts that are not "ordered" by RPO or DOM like > >if (flag) > ... = dr1; >else > ... = dr2; > > this doesn't avoid spurious swaps? Also the code was basically No, this is mainly for below case: if (flag) { partition1: arr[i] = x; } partition2: arr[i] = y; function pg_add_dependence_edges is like: /* Re-shuffle data-refs to be in dominator order. */ if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) { std::swap (dr1, dr2); this_dir = -this_dir; } //... else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) { if (DDR_REVERSED_P (ddr)) { std::swap (dr1, dr2); this_dir = -this_dir; } /* Known dependences can still be unordered througout the iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ if (DDR_NUM_DIST_VECTS (ddr) != 1) this_dir = 2; /* If the overlap is exact preserve stmt order. */ else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) ; else { /* Else as the distance vector is lexicographic positive swap the dependence direction. */ this_dir = -this_dir; } } For non-ZERO distance vector dependence, the second part always computes src->dest dependence info correctly, as well as edge direction of PG. For ZERO distance vector dependence, we rely on the swap part (thus topological order) to get correct dependence direction. For mentioned case, the two data references are unordered in dominance relation, but ordered in RPO. This is why DOM is not enough. For if-then-else case, the order actually doesn't matter, and the references are unordered in either dominance relation or RPO. Specifically, src->dest is always computed correctly for non-ZERO distance vector cases, no matter or is passed to data dependence analyzer. As for ZERO distance vector (exact overlap), the order doesn't matter either because they control dependent on the same condition. We can simply assume an arbitrary order for it. > copied from tree-data-refs.c:find_data_references_in_loop which > does iterate over get_loop_body_in_dom_order as well. So isn't the > issue latent there as well? In theory maybe. In practice, this is not a problem at all since loop distribution is the only one handles control dependence so far. > > That said, what about those "unordered" stmts? I suppose > dependence analysis still happily computes a dependence > distance but in reality we'd have to consider both execution > orders? As explained, there is no need to consider both orders. GCC doesn't really support control dependence parallelization? Thanks, bin > > Thanks, > Richard. > > >> >> Thanks, >> bin >> 2017-06-07 Bin Cheng >> >> * tree-loop-distribution.c (bb_top_order_index): New. >> (bb_top_order_index_size, bb_top_order_cmp): New. >> (stmts_from_loop): Use topological order. >> (pass_loop_distribution::execute): Compute topological order for. >> basic blocks.
Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution
On Mon, Jun 12, 2017 at 7:02 PM, Bin Chengwrote: > Hi, > During the work I ran into a latent bug for distributing. For the moment we > sort statements > in dominance order, but that's not enough because basic blocks may be sorted > in reverse order > of execution flow. This results in wrong data dependence direction later. > This patch fixes > the issue by sorting in topological order. > > Bootstrap and test on x86_64 and AArch64. Is it OK? I suppose you are fixing static int pg_add_dependence_edges (struct graph *rdg, vec loops, int dir, vec drs1, vec drs2) { ... /* Re-shuffle data-refs to be in dominator order. */ if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) { std::swap (dr1, dr2); this_dir = -this_dir; } but then for stmts that are not "ordered" by RPO or DOM like if (flag) ... = dr1; else ... = dr2; this doesn't avoid spurious swaps? Also the code was basically copied from tree-data-refs.c:find_data_references_in_loop which does iterate over get_loop_body_in_dom_order as well. So isn't the issue latent there as well? That said, what about those "unordered" stmts? I suppose dependence analysis still happily computes a dependence distance but in reality we'd have to consider both execution orders? Thanks, Richard. > > Thanks, > bin > 2017-06-07 Bin Cheng > > * tree-loop-distribution.c (bb_top_order_index): New. > (bb_top_order_index_size, bb_top_order_cmp): New. > (stmts_from_loop): Use topological order. > (pass_loop_distribution::execute): Compute topological order for. > basic blocks.
[PATCH GCC][04/13]Sort statements in topological order for loop distribution
Hi, During the work I ran into a latent bug for distributing. For the moment we sort statements in dominance order, but that's not enough because basic blocks may be sorted in reverse order of execution flow. This results in wrong data dependence direction later. This patch fixes the issue by sorting in topological order. Bootstrap and test on x86_64 and AArch64. Is it OK? Thanks, bin 2017-06-07 Bin Cheng* tree-loop-distribution.c (bb_top_order_index): New. (bb_top_order_index_size, bb_top_order_cmp): New. (stmts_from_loop): Use topological order. (pass_loop_distribution::execute): Compute topological order for. basic blocks.From 4bb233239e080eca956b3db7836cdf64da486dbf Mon Sep 17 00:00:00 2001 From: Bin Cheng Date: Wed, 7 Jun 2017 13:47:52 +0100 Subject: [PATCH 04/14] sort-stmts-in-top-order-20170607.txt --- gcc/tree-loop-distribution.c | 58 +++- 1 file changed, 52 insertions(+), 6 deletions(-) diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index b0b9d66..a32253c 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -373,16 +373,39 @@ create_rdg_vertices (struct graph *rdg, vec stmts, loop_p loop, return true; } -/* Initialize STMTS with all the statements of LOOP. The order in - which we discover statements is important as - generate_loops_for_partition is using the same traversal for - identifying statements in loop copies. */ +/* Array mapping basic block's index to its topological order. */ +static int *bb_top_order_index; +/* And size of the array. */ +static int bb_top_order_index_size; + +/* If X has a smaller topological sort number than Y, returns -1; + if greater, returns 1. */ + +static int +bb_top_order_cmp (const void *x, const void *y) +{ + basic_block bb1 = *(const basic_block *) x; + basic_block bb2 = *(const basic_block *) y; + + gcc_assert (bb1->index < bb_top_order_index_size + && bb2->index < bb_top_order_index_size); + gcc_assert (bb1 == bb2 + || bb_top_order_index[bb1->index] +!= bb_top_order_index[bb2->index]); + + return (bb_top_order_index[bb1->index] - bb_top_order_index[bb2->index]); +} + +/* Initialize STMTS with all the statements of LOOP. We use topological + order to discover all statements. The order is important because + generate_loops_for_partition is using the same traversal for identifying + statements in loop copies. */ static void stmts_from_loop (struct loop *loop, vec *stmts) { unsigned int i; - basic_block *bbs = get_loop_body_in_dom_order (loop); + basic_block *bbs = get_loop_body_in_custom_order (loop, bb_top_order_cmp); for (i = 0; i < loop->num_nodes; i++) { @@ -1764,6 +1787,22 @@ pass_loop_distribution::execute (function *fun) if (number_of_loops (fun) <= 1) return 0; + /* Compute topological order for basic blocks. Topological order is + needed because data dependence is computed for data references in + lexicographical order. */ + if (bb_top_order_index == NULL) +{ + int *rpo = XNEWVEC (int, last_basic_block_for_fn (cfun)); + + bb_top_order_index = XNEWVEC (int, last_basic_block_for_fn (cfun)); + bb_top_order_index_size + = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, true); + for (int i = 0; i < bb_top_order_index_size; i++) + bb_top_order_index[rpo[i]] = i; + + free (rpo); +} + FOR_ALL_BB_FN (bb, fun) { gimple_stmt_iterator gsi; @@ -1881,13 +1920,20 @@ out: if (cd) delete cd; + if (bb_top_order_index != NULL) +{ + free (bb_top_order_index); + bb_top_order_index = NULL; + bb_top_order_index_size = 0; +} + if (changed) { /* Destroy loop bodies that could not be reused. Do this late as we otherwise can end up refering to stale data in control dependences. */ unsigned i; FOR_EACH_VEC_ELT (loops_to_be_destroyed, i, loop) - destroy_loop (loop); + destroy_loop (loop); /* Cached scalar evolutions now may refer to wrong or non-existing loops. */ -- 1.9.1