Re: [PATCH GCC 1/9]Delete useless code in tree-vect-loop-manip.c

2016-09-07 Thread Jeff Law

On 09/06/2016 12:49 PM, Bin Cheng wrote:

Hi,
This is a patch set generating a new control flow graph for the vectorized loop
and its peeling loops.  At the moment, the CFG for a vectorized loop is
complicated and sub-optimal.  The major issues are:
A) For both the prologue and the vectorized loop, it generates a guard/branch
before the loop checking if the following (prologue/vectorized) loop should be
skipped.  It also generates a guard/branch after the loop checking if the next
(vectorized/epilogue) loop should be skipped.
B) Depending on how conditional set is supported by the target, it may generate
one additional if-statement (branch) setting the niters for the prologue loop.
C) In the worst case, up to 4 branch instructions need to be executed before
the vectorized loop is entered.
D) For loops without enough niters, it checks some (niters_prologue)
iterations with the prologue loop; then checks if the remaining number of
iterations (niters - niters_prologue) is enough for vectorization; if not, it
skips the vectorized loop and continues with the epilogue loop.  This is bad
since the vectorized loop won't be executed at all after all the hassle.

This patch set improves it by merging the different checks so that only 2
branch instructions (could be further reduced in combination with loop
versioning) are executed before the vectorized loop; it does better compile
time analysis in order to avoid prologue/epilogue peeling if possible; it
improves code generation in various ways (live overflow handling, generating
short live ranges).  In terms of implementation, it tries to factor the SSA
updating code out of the CFG changing code.  I think this may help future work
replacing slpeel_* with a generic GIMPLE loop copier.

So far there are 9 patches in the set.  Patches [1-5] are small prerequisites
for the major change, which is done by patch 6.  Patches [7-9] are small
patches that either address test cases or improve code generation.  Final
bootstrap and test of the patch set are ongoing on x86_64 and AArch64.
Assuming there are no new failures, or that any are fixed, any comments on
this?

This is the first patch, deleting useless code in tree-vect-loop-manip.c as
well as fixing an obvious code style issue.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code
style issue.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment):
Remove useless code.

Seems obvious to me -- I can't think of any reason why we'd emit a NULL 
sequence to the loop preheader edge.


jeff



[PATCH GCC 1/9]Delete useless code in tree-vect-loop-manip.c

2016-09-06 Thread Bin Cheng
Hi,
This is a patch set generating a new control flow graph for the vectorized loop
and its peeling loops.  At the moment, the CFG for a vectorized loop is
complicated and sub-optimal.  The major issues are:
A) For both the prologue and the vectorized loop, it generates a guard/branch
before the loop checking if the following (prologue/vectorized) loop should be
skipped.  It also generates a guard/branch after the loop checking if the next
(vectorized/epilogue) loop should be skipped.
B) Depending on how conditional set is supported by the target, it may generate
one additional if-statement (branch) setting the niters for the prologue loop.
C) In the worst case, up to 4 branch instructions need to be executed before
the vectorized loop is entered.
D) For loops without enough niters, it checks some (niters_prologue)
iterations with the prologue loop; then checks if the remaining number of
iterations (niters - niters_prologue) is enough for vectorization; if not, it
skips the vectorized loop and continues with the epilogue loop.  This is bad
since the vectorized loop won't be executed at all after all the hassle.

This patch set improves it by merging the different checks so that only 2
branch instructions (could be further reduced in combination with loop
versioning) are executed before the vectorized loop; it does better compile
time analysis in order to avoid prologue/epilogue peeling if possible; it
improves code generation in various ways (live overflow handling, generating
short live ranges).  In terms of implementation, it tries to factor the SSA
updating code out of the CFG changing code.  I think this may help future work
replacing slpeel_* with a generic GIMPLE loop copier.
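
To make the idea of a merged check concrete, here is a rough hand-written
sketch in plain C.  It is not what the vectorizer emits and not code from
this patch set; the function sum_peeled, the constant VF and the
vector_iters out-parameter are made up for illustration.  It only shows the
shape described above: one up-front test decides whether the prologue and
the vectorized loop run at all, so a loop with too few iterations goes
straight to the scalar epilogue instead of running the prologue first.

```c
#include <stddef.h>

/* Hypothetical vectorization factor, chosen only for this sketch.  */
#define VF 4

/* Sum a[0..n).  NITERS_PROLOGUE is an assumed number of scalar iterations
   to peel in front of the main loop (e.g. for alignment).  If VECTOR_ITERS
   is non-NULL it receives the number of VF-wide iterations executed.  */
long
sum_peeled (const int *a, size_t n, size_t niters_prologue,
            size_t *vector_iters)
{
  long sum = 0;
  size_t i = 0;
  size_t vec = 0;

  /* Merged guard: skip both the prologue and the vectorized loop unless
     there is room for the prologue plus at least one vector iteration.  */
  if (n >= niters_prologue + VF)
    {
      /* Prologue loop: peeled scalar iterations.  */
      for (; i < niters_prologue; i++)
        sum += a[i];

      /* Stand-in for the vectorized loop: VF elements per iteration.  */
      for (; i + VF <= n; i += VF)
        {
          sum += (long) a[i] + a[i + 1] + a[i + 2] + a[i + 3];
          vec++;
        }
    }

  /* Epilogue loop: whatever is left, or everything if the guard failed.  */
  for (; i < n; i++)
    sum += a[i];

  if (vector_iters)
    *vector_iters = vec;
  return sum;
}
```

With n = 3 and niters_prologue = 1 the guard fails and only the epilogue
runs, which is the case issue D) above complains about in the current CFG.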

So far there are 9 patches in the set.  Patches [1-5] are small prerequisites
for the major change, which is done by patch 6.  Patches [7-9] are small
patches that either address test cases or improve code generation.  Final
bootstrap and test of the patch set are ongoing on x86_64 and AArch64.
Assuming there are no new failures, or that any are fixed, any comments on
this?

This is the first patch, deleting useless code in tree-vect-loop-manip.c as
well as fixing an obvious code style issue.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code
style issue.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment):
Remove useless code.

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 01d6bb1..3a3b0bc 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1003,9 +1003,9 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
-  /* All loops have an outer scope; the only case loop->outer is NULL is for
- the function itself.  */
-  if (!loop_outer (loop)
+  /* All loops have an outer scope; the only case loop->outer is NULL is for
+ the function itself.  */
+  if (!loop_outer (loop)
   || loop->num_nodes != num_bb
   || !empty_block_p (loop->latch)
   || !single_exit (loop)
@@ -1786,7 +1786,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
   struct loop *new_loop;
   edge update_e;
   basic_block preheader;
-  int loop_num;
   int max_iter;
   tree cond_expr = NULL_TREE;
   gimple_seq cond_expr_stmt_list = NULL;
@@ -1797,8 +1796,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
 
   initialize_original_copy_tables ();
 
-  loop_num  = loop->num;
-
   new_loop
 = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
 _mult_vf_name, ni_name, false,
@@ -1806,7 +1803,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
 cond_expr, cond_expr_stmt_list,
 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
   gcc_assert (new_loop);
-  gcc_assert (loop_num == loop->num);
   slpeel_checking_verify_cfg_after_peeling (loop, new_loop);
 
   /* A guard that controls whether the new_loop is to be executed or skipped
@@ -2053,8 +2049,6 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, tree ni_name,
 
   initialize_original_copy_tables ();
 
-  gimple_seq stmts = NULL;
-  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
   niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo,
   ni_name,
   );