Updating general info in tree-parloops.c

2012-05-20 Thread Razya Ladelsky
Hi,

I updated some of the info in tree-parloops.c, like adding myself to the 
contributors, and 
updating the TODO list, both long overdue...

I also update the wiki http://gcc.gnu.org/wiki/AutoParInGCC
and added a link to it from tree-parloops.c.
If there are no objections, I will commit as obvious,


2012-05-20  Razya Ladelsky  ra...@il.ibm.com
 
 * tree-parloops.c : Add myself to contributors, update 
TODO list, add link to wiki.



Thanks,
Razya


Index: tree-parloops.c
===
--- tree-parloops.c (revision 187694)
+++ tree-parloops.c (working copy)
@@ -1,8 +1,8 @@
 /* Loop autoparallelization.
Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012
Free Software Foundation, Inc.
-   Contributed by Sebastian Pop p...@cri.ensmp.fr and
-   Zdenek Dvorak dvor...@suse.cz.
+   Contributed by Sebastian Pop p...@cri.ensmp.fr 
+   Zdenek Dvorak dvor...@suse.cz and Razya Ladelsky ra...@il.ibm.com.
 
 This file is part of GCC.
 
@@ -54,9 +54,9 @@ along with GCC; see the file COPYING3.  If not see
-- if there are several parallelizable loops in a function, it may be
   possible to generate the threads just once (using synchronization to
   ensure that cross-loop dependences are obeyed).
-   -- handling of common scalar dependence patterns (accumulation, ...)
-   -- handling of non-innermost loops  */
-
+   -- handling of common reduction patterns for outer loops.  
+
+   More info can also be found at http://gcc.gnu.org/wiki/AutoParInGCC  */
 /*
   Reduction handling:
   currently we use vect_force_simple_reduction() to detect reduction patterns.
=

Refining autopar cost model for outer loops

2012-05-13 Thread Razya Ladelsky
Hi,

This patch changes the minimum number of iterations of outer loops for the 
runtime check which tests whether it is worthwhile to parallelize the loop 
or not.
The current minimum number of iterations for all loops is MIN_PER_THREAD * 
number of threads, when MIN_PER_THREAD is arbitrarily set to 100.
This prevents some of the promising loops of SPEC2006 from getting 
parallelized.
I changed the minimum bound for outer loops, under the assumption that 
even if there are not enough iterations, the fact that an outer loop 
contains more loops, obtains enough work to get parallelized.
This indeed allowed for a lot more loops to get parallelized, resulting in 
substantial performance improvements for SPEC2006 benchmarks, measured on 
a Power7 6 core, 4 way SMT each.
I compared  the trunk with O3 + autopar (parallelizing with 6 threads) vs. 
the trunk with   O3  minus vectorization.
None of the benchmarks shows any significant degradation.

The speedup shown for  libquatum  with autopar has been obtained with 
previous versions of autopar, having no relation to this patch, but surely 
not degraded by it either.

These are the speedups I collected:

462.libquantum  2.5 X
410.bwaves  3.3 X
436.cactusADM   4.5 X
459.GemsFDTD1.27 X
481.wrf 1.25 X


Bootstrap and testsuite (with -ftree-parallelize-loops=4) pass 
successfully.
spec-2006 showed no regressions.


OK for trunk?
Thanks,
razya

2012-05-08  Razya Ladelsky  ra...@il.ibm.com
 
 * tree-parloops.c (gen_parallel_loop): Change 
many_iterations_cond for outer loops.
 Index: tree-parloops.c
===
--- tree-parloops.c (revision 186667)
+++ tree-parloops.c (working copy)
@@ -1732,6 +1732,7 @@ gen_parallel_loop (struct loop *loop, htab_t reduc
   unsigned prob;
   location_t loc;
   gimple cond_stmt;
+  unsigned int m_p_thread=2;
 
   /* From
 
@@ -1792,9 +1793,15 @@ gen_parallel_loop (struct loop *loop, htab_t reduc
   if (stmts)
 gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
 
-  many_iterations_cond =
-fold_build2 (GE_EXPR, boolean_type_node,
-nit, build_int_cst (type, MIN_PER_THREAD * n_threads));
+  if (loop-inner)
+m_p_thread=2;
+  else
+m_p_thread=MIN_PER_THREAD;
+
+   many_iterations_cond =
+ fold_build2 (GE_EXPR, boolean_type_node,
+nit, build_int_cst (type, m_p_thread * n_threads));
+
   many_iterations_cond
 = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
   invert_truthvalue (unshare_expr (niter-may_be_zero)),
=

Correcting transform_to_exit_first_loop + fix to PR46886

2012-04-20 Thread Razya Ladelsky
Hi,

This patch handles duplicating of the last iteration correctly.
The current code always duplicates the complete static  iteration from 
the entry to the latch,
and then sets the number of iterations according to the pattern of the 
loop (according to whether the cond before the body, or the body before 
the cond).

The correct way to go about is to not assume anything about the control 
flow of the loop, and  duplicate the last iteration only from 
entry to the block consisting of the cond, that is the real last dynamic 
iteration that would be executed.
The number of iterations then needs no special care for each loop pattern.
This was actually Zdenek's original intent by duplicating the last 
iteration.

This code allows us to remove the do_while cond, and gets PR46886 resolved 
(instead of the solution suggested in 
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01642.html, 
which handled the number of iterations according to the specific shape of 
the loop)

Bootstrap and testsuite pass successfully.
Testsuite with -ftree-parallelize-loops=4 shows only one regression which 
is uninit-17.c test for warnings, which seems
unrelated to how the loop gets parallelized.
I also ran spec-2006, and it showed no regressions.

2012-04-20  Razya Ladelsky  ra...@il.ibm.com
 
 PR tree-optimization/46886
 * tree-parloops.c (transform_to_exit_first_loop): Remove 
setting of number of iterations according to the loop pattern.
 Duplicate from entry to exit-src instead of loop-latch.
 (pallelize_loops): Remove the condition preventing 
do-while loops.
 * tree-cfg.c (bool bb_in_region_p): New.
 (gimple_duplicate_sese_tail): Adjust duplication of the 
the subloops.
 Adjust redirection of the duplicated iteration. 



O.K to commit?
Thanks,
Razya
Index: tree-parloops.c
===
--- tree-parloops.c (revision 186493)
+++ tree-parloops.c (working copy)
@@ -1481,8 +1481,6 @@ transform_to_exit_first_loop (struct loop *loop, h
   gimple phi, nphi, cond_stmt, stmt, cond_nit;
   gimple_stmt_iterator gsi;
   tree nit_1;
-  edge exit_1;
-  tree new_rhs;
 
   split_block_after_labels (loop-header);
   orig_header = single_succ (loop-header);
@@ -1512,41 +1510,10 @@ transform_to_exit_first_loop (struct loop *loop, h
}
 }
 
- /* Setting the condition towards peeling the last iteration:
-If the block consisting of the exit condition has the latch as
-successor, then the body of the loop is executed before
-the exit condition is tested.  In such case, moving the
-condition to the entry, causes that the loop will iterate
-one less iteration (which is the wanted outcome, since we
-peel out the last iteration).  If the body is executed after
-the condition, moving the condition to the entry requires
-decrementing one iteration.  */
-  exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); 
-  if (exit_1-dest == loop-latch)
-new_rhs = gimple_cond_rhs (cond_stmt);
-  else
-  {
-new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)),
-  gimple_cond_rhs (cond_stmt),
-  build_int_cst (TREE_TYPE (gimple_cond_rhs 
(cond_stmt)), 1));
-if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
-  {
-   basic_block preheader;
-   gimple_stmt_iterator gsi1;
-
-   preheader = loop_preheader_edge(loop)-src;
-   gsi1 = gsi_after_labels (preheader);
-   new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,
-   
NULL_TREE,false,GSI_CONTINUE_LINKING);
-  }
-  }
-  gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs));
-  gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt)));
-  
   bbs = get_loop_body_in_dom_order (loop);
 
-  for (n = 0; bbs[n] != loop-latch; n++)
-continue;
+  for (n = 0; bbs[n] != exit-src; n++)
+   continue;
   nbbs = XNEWVEC (basic_block, n);
   ok = gimple_duplicate_sese_tail (single_succ_edge (loop-header), exit,
   bbs + 1, n, nbbs);
@@ -1599,6 +1566,7 @@ transform_to_exit_first_loop (struct loop *loop, h
   /* Initialize the control variable to number of iterations
  according to the rhs of the exit condition.  */
   gsi = gsi_after_labels (ex_bb);
+  exit = single_dom_exit (loop);
   cond_nit = last_stmt (exit-src);
   nit_1 =  gimple_cond_rhs (cond_nit);
   nit_1 = force_gimple_operand_gsi (gsi,
@@ -1861,8 +1829,8 @@ gen_parallel_loop (struct loop *loop, htab_t reduc
 
   /* Ensure that the exit condition is the first statement in the loop.  */
   transform_to_exit_first_loop (loop, reduction_list, nit);
-
   /* Generate initializations for reductions.  */
+
   if (htab_elements (reduction_list)  0)
 htab_traverse (reduction_list, initialize_reductions, loop);
 
@@ -2187,10 +2155,9

[PATCH] Permanent Fix for PR46886

2012-03-26 Thread Razya Ladelsky
Hi,

This is (hopefully) a permanent fix  to pr46886.c
I removed the condition preventing parallelization of do_while loops, as 
it 
was blocking parallelizing important loops in spec-2006.
The patch fixes the number of iterations for cases where the body could 
appear in the latch, as in pr46886.c.

2012-03-26  Razya Ladelsky  ra...@il.ibm.com

 PR tree-optimization/46886
 * tree-parloops.c (transform_to_exit_first_loop):Set 
number of iterations correctly when the body may appear at the latch.
 (pallelize_loops): Remove the condition preventing 
do-while loops.
 


Bootstrap and testsuite psss successfully on power7 linux machine.
Ok to commit?
Thanks,



Index: tree-parloops.c
===
--- tree-parloops.c (revision 185775)
+++ tree-parloops.c (working copy)
@@ -1522,7 +1522,10 @@ transform_to_exit_first_loop (struct loop *loop, h
 the condition, moving the condition to the entry requires
 decrementing one iteration.  */
   exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); 
-  if (exit_1-dest == loop-latch)
+  
+  /* if the latch contains more than the one statemnt of control variable 
+ increment then it contains the body.  */
+  if (exit_1-dest == loop-latch  last_and_only_stmt (loop-latch))
 new_rhs = gimple_cond_rhs (cond_stmt);
   else
   {
@@ -2146,7 +2149,6 @@ parallelize_loops (void)
 return false;
   if (cfun-has_nonlocal_label)
 return false;
-
   gcc_obstack_init (parloop_obstack);
   reduction_list = htab_create (10, reduction_info_hash,
 reduction_info_eq, free);
@@ -2187,10 +2189,7 @@ parallelize_loops (void)
  || loop_has_blocks_with_irreducible_flag (loop)
  || (loop_preheader_edge (loop)-src-flags  BB_IRREDUCIBLE_LOOP)
  /* FIXME: the check for vector phi nodes could be removed.  */
- || loop_has_vector_phi_nodes (loop)
- /* FIXME: transform_to_exit_first_loop does not handle not
-header-copied loops correctly - see PR46886.  */
- || !do_while_loop_p (loop))
+ || loop_has_vector_phi_nodes (loop))
continue;
   estimated = max_stmt_executions_int (loop, false);
   /* FIXME: Bypass this check as graphite doesn't update the
@@ -2213,6 +2212,7 @@ parallelize_loops (void)
continue;
 
   changed = true;
+   
   if (dump_file  (dump_flags  TDF_DETAILS))
   {
if (loop-inner)
=

Re: [PATCH] Permanent Fix for PR46886

2012-03-26 Thread Razya Ladelsky
Richard Guenther rguent...@suse.de wrote on 26/03/2012 01:23:15 PM:

 From: Richard Guenther rguent...@suse.de
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org
 Date: 26/03/2012 01:23 PM
 Subject: Re: [PATCH] Permanent Fix for PR46886
 
 On Mon, 26 Mar 2012, Razya Ladelsky wrote:
 
  Hi,
  
  This is (hopefully) a permanent fix  to pr46886.c
  I removed the condition preventing parallelization of do_while loops, 
as 
  it 
  was blocking parallelizing important loops in spec-2006.
  The patch fixes the number of iterations for cases where the body 
could 
  appear in the latch, as in pr46886.c.
  
  2012-03-26  Razya Ladelsky  ra...@il.ibm.com
  
   PR tree-optimization/46886
   * tree-parloops.c (transform_to_exit_first_loop):Set 
  number of iterations correctly when the body may appear at the latch.
   (pallelize_loops): Remove the condition preventing 
  do-while loops.
  
  
  
  Bootstrap and testsuite psss successfully on power7 linux machine.
  Ok to commit?
 
 + 
 +  /* if the latch contains more than the one statemnt of control 
variable 
 + increment then it contains the body.  */
 +  if (exit_1-dest == loop-latch  last_and_only_stmt (loop-latch))
  new_rhs = gimple_cond_rhs (cond_stmt);
 
 please check what the comment suggests, thus,
 
 last_and_only_stmt (loop-latch) == cond_stmt
 

Hi Richard,
The latch has at least one stmt for sure:
the control variable increment (not a cond_stmt) coming from the call to 
canonicalize_loop_ivs(loop, nit, true)  earlier in tree-parloops:

 
/* .  When BUMP_IN_LATCH is true, the induction
   variable is incremented in the loop latch, otherwise it is
   incremented in the loop header.  Return the induction variable that
   was created.  */

tree
canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)



 tree-parloops.c is quite fragile in what it expects the IL to look like
 and tests do not cover what later code expects.  Please do not add to 
 that.


I agree.
I have also tested this code on spec2006 benchmarks.
(together with an upcoming patch, 5 of these show speedups with autopar on 
a linux-power7)




 
 @@ -2146,7 +2149,6 @@ parallelize_loops (void)
  return false;
if (cfun-has_nonlocal_label)
  return false;
 -
 
 spurious whitespace change.
 
 @@ -2213,6 +2212,7 @@ parallelize_loops (void)
 continue;
 
changed = true;
 + 
if (dump_file  (dump_flags  TDF_DETAILS))
{
 if (loop-inner)
 
 Likewise.
 
 Ok with the above change.

Ok, thanks,
Razya

 
 Thanks,
 Richard.
 



Fw: [PATCH] Permanent Fix for PR46886

2012-03-26 Thread Razya Ladelsky
 + 
 +  /* if the latch contains more than the one statemnt of control 
variable 
 + increment then it contains the body.  */
 +  if (exit_1-dest == loop-latch  last_and_only_stmt (loop-latch))
  new_rhs = gimple_cond_rhs (cond_stmt);
 
 please check what the comment suggests, thus,
 
 last_and_only_stmt (loop-latch) == cond_stmt
 


Richard,
There's no simple way to do such a check,
as the control variable increment stmt is explicitly created in 
canonicalize_loop_ivs and not in tree-parloops.
What we do know is that it is put in the latch because we ask for it when 
we call canonicalize_loop_ivs from parloops.

One way to get the control variable increment stmt is to use 
canonicalize_loop_ivs() return value and look for the stmts that use it.

Do you think I should add all this functionality to the code in order to 
assert
the stmt that is in the latch? or can I skip it ?

Thanks,
Razya



[RFC][PATCH] A change to do_while_loop_p()

2012-03-21 Thread Razya Ladelsky
Hi,

I need to use do_while_loop_p, but I'm not sure its functionality is what 
I expected it to be.

This is the part that I do not understand:

/* If the header contains just a condition, it is not a do-while loop.  */
  stmt = last_and_only_stmt (loop-header);
 if (stmt
   gimple_code (stmt) == GIMPLE_COND)
return false;

The header could contain a condition which is not the loop's exit 
condition,
but rather a part of its body, then  why do we rule out this loop as a 
do_while loop?

I ran into this in a loop (the outer loop) extracted from bwaves 
benchmark:

  do k=1,nz
 km1=mod(k+nz-2,nz)+1
 kp1=mod(k,nz)+1
 do j=1,ny
jm1=mod(j+ny-2,ny)+1
jp1=mod(j,ny)+1
.
 enddo
  enddo
 
which was translated to:

D.2361_17 = *ny_16(D);

bb 5:
  # k_3 = PHI 1(4), k_562(25)
  if (D.2361_17  0)
goto bb 8;
  else
goto bb 6;

bb 6:
  k_562 = k_3 + 1;
  # DEBUG k = k_562
  if (k_3 == D.1583_270)
goto bb 7;  ---   return
  else
goto bb 25;

bb 25:
  goto bb 5;

bb 8:  -- starting the body of the the second loop
  pretmp.318_776 = (integer(kind=8)) k_3;
  pretmp.318_777 = stride.92_20 * pretmp.318_776;
... 



bb 5 is the header of the outer loop, and bb 25 is the latch.
According to do_while_loop_p ()  this is NOT a do while loop, but it
seems that it should be.

 I am attaching a patch to change do_while_loop_p() assuming that what I 
understand is indeed correct,
 Please let me know if I'm right,

Thank you,
Razya

Index: tree-ssa-loop-ch.c
===
--- tree-ssa-loop-ch.c  (revision 185604)
+++ tree-ssa-loop-ch.c  (working copy)
@@ -107,6 +107,8 @@ should_duplicate_loop_header_p (basic_block header
 bool
 do_while_loop_p (struct loop *loop)
 {
+  edge exit_edge;
+  gimple cond_stmt;
   gimple stmt = last_stmt (loop-latch);
 
   /* If the latch of the loop is not empty, it is not a do-while loop.  */
@@ -116,8 +118,14 @@ do_while_loop_p (struct loop *loop)
 
   /* If the header contains just a condition, it is not a do-while loop.  */
   stmt = last_and_only_stmt (loop-header);
+  exit_edge = single_dom_exit (loop);
+  if (exit_edge)
+cond_stmt = last_stmt (exit_edge-src);
+  else
+cond_stmt =stmt;
   if (stmt
-   gimple_code (stmt) == GIMPLE_COND)
+   gimple_code (stmt) == GIMPLE_COND
+   stmt == cond_stmt)
 return false;
 
   return true;
=

Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-11-24 Thread Razya Ladelsky
Jakub Jelinek ja...@redhat.com wrote on 21/11/2011 07:25:10 PM:

 From: Jakub Jelinek ja...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: GCC Patches gcc-patches@gcc.gnu.org, Richard Guenther 
 richard.guent...@gmail.com
 Date: 21/11/2011 07:25 PM
 Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix 
 self data dependence
 
 On Mon, Nov 21, 2011 at 06:56:55PM +0200, Razya Ladelsky wrote:
  I have some non-affine cases for which compute_affine_dependence is 
called 
  (as it is called for 
  ALL dependences from compte_all_depepndences()), and no harm is done.
  I looked a little bit closer into the code, and this is what happens 
for 
  non affine accesses:
  
  initialize_data_dependence_relation() assigns 
  DDR_ARE_DEPENDENT (res) = chrec_dont_know for the dr.
 
 It can be chrec_known too (that's actually the only interesting case for 
us
 unless it is a read-read ddr), but you're right that likely
 object_address_invariant_in_loop_p should be false.
 
  Then, compute_affine_depepndence()
  tests if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE), and does nothing 
  otherwise.
  Since the dr was initialized with chrec_dont_know, no harm is done.
  
  Anyway, since it is useless for your gather case, I'll just remove it, 

  along with compute_self_dependence().
  
  OK?
 
 Yes, patch preapproved.
 
Jakub
 

Hi,

Going to commit this patch today.
I bootstrapped and regtested on  ppc64-suse-linux.

ChangeLog:
 
* tree-data-ref.c (initialize_data_dependence_relation): Update 
comment for the 
self dependence case.
(compute_self_dependence): Remove.
* tree-vect-data-refs.c (vect_analyze_data_refs): Remove call to 
compute_self_dependenc. 


Thanks,
Razya

Index: tree-data-ref.c
===
--- tree-data-ref.c (revision 181675)
+++ tree-data-ref.c (working copy)
@@ -1385,8 +1385,7 @@ initialize_data_dependence_relation (struct data_r
   return res;
 }
 
-  /* When the references are exactly the same, don't spend time doing
- the data dependence tests, just initialize the ddr and return.  */
+  /* The case where the references are exactly the same.  */
   if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
 {
  if (loop_nest
@@ -4132,19 +4131,6 @@ compute_affine_dependence (struct data_dependence_
 fprintf (dump_file, )\n);
 }
 
-/* This computes the dependence relation for the same data
-   reference into DDR.  */
-
-void
-compute_self_dependence (struct data_dependence_relation *ddr)
-{
-  if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE)
-return;
-
-  if (DDR_LOOP_NEST (ddr))
-compute_affine_dependence (ddr, VEC_index (loop_p, DDR_LOOP_NEST (ddr), 
0));
-}
-
 /* Compute in DEPENDENCE_RELATIONS the data dependence graph for all
the data references in DATAREFS, in the LOOP_NEST.  When
COMPUTE_SELF_AND_RR is FALSE, don't compute read-read and self
Index: tree-vect-data-refs.c
===
--- tree-vect-data-refs.c   (revision 181675)
+++ tree-vect-data-refs.c   (working copy)
@@ -3122,7 +3122,6 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo,
  ddr = VEC_index (ddr_p, ddrs, k);
  gcc_assert (DDR_A (ddr) == olddr  DDR_B (ddr) == olddr);
  newddr = initialize_data_dependence_relation (dr, dr, nest);
- compute_self_dependence (newddr);
  VEC_replace (ddr_p, ddrs, k, newddr);
  free_dependence_relation (ddr);
  VEC_replace (data_reference_p, datarefs, i, dr);
=

Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-11-21 Thread Razya Ladelsky
gcc-patches-ow...@gcc.gnu.org wrote on 21/11/2011 02:57:07 PM:

 From: Jakub Jelinek ja...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: Richard Guenther richard.guent...@gmail.com, GCC Patches gcc-
 patc...@gcc.gnu.org
 Date: 21/11/2011 02:57 PM
 Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix 
 self data dependence
 Sent by: gcc-patches-ow...@gcc.gnu.org
 
 On Tue, Nov 15, 2011 at 12:31:39PM +0200, Razya Ladelsky wrote:
  This patch fixes the failures described in 
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960
  It also fixes bzips when run with autopar enabled.
 
 Sorry, I've been away and couldn't react to this earlier, but the patch
 looks wrong to me.  The reason why I've been calling 
compute_self_dependence
 from the gather handling code is to do there what we did for normal data
 references in compute_all_dependences, except for the affine stuff.

Hi Jakub,
what do you mean by 'except for the affine stuff'?


Before having this patch, compute_self_depepndence just marked the 
distance zero 
and returned, while now it calls compute_affine_dependence.
So I think you do get the right outcome now.

 So, when compute_all_dependences no longer calls it, neither should
 vect_analyze_data_refs.

So do you want to replace it with the call to compute_affine_dependence?

Thanks,
Razya

 
Jakub
 



Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-11-21 Thread Razya Ladelsky
Jakub Jelinek ja...@redhat.com wrote on 21/11/2011 03:59:15 PM:

 From: Jakub Jelinek ja...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: GCC Patches gcc-patches@gcc.gnu.org, Richard Guenther 
 richard.guent...@gmail.com
 Date: 21/11/2011 03:59 PM
 Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix 
 self data dependence
 
 On Mon, Nov 21, 2011 at 03:50:10PM +0200, Razya Ladelsky wrote:
  what do you mean by 'except for the affine stuff'?
 
 I mean that compute_affine_dependence must never be called on gather
 data refs, that function can't do anything meaningful for them, they are
 gather data refs exactly because dr_analyze_innermost failed on them
 otherwise, as base and/or offset aren't simple IVs.
 

I understand.
I tried to follow the same paths that data refs (affine or not) go through 
in compute_all_depepndences.

According to compute_all_depepndences(), the path for computing any data 
dependence is:
call initialize_data_depepndence_relation(), and call 
compute_affine_dependence() if there's a loop nest.
I tried to keep the same semantics for self dependences.

Although compute_affine_dependence() can't do anything meaningful for the 
gather data refs,
it may still be assigning values to the dependence relation structure, if 
you need to use them. 
Therefore I did not skip the call to compute_self_dependence(), which 
indeed calls
compute_affine_depepndence().


If you think it's redundant, we can remove it.
Thanks,
Razya
 

   So, when compute_all_dependences no longer calls it, neither should
   vect_analyze_data_refs.
  
  So do you want to replace it with the call to 
compute_affine_dependence?
 
 No.  With nothing at all.
 
Jakub
 



Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-11-21 Thread Razya Ladelsky
Jakub Jelinek ja...@redhat.com wrote on 21/11/2011 05:07:54 PM:

 From: Jakub Jelinek ja...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: GCC Patches gcc-patches@gcc.gnu.org, Richard Guenther 
 richard.guent...@gmail.com
 Date: 21/11/2011 05:08 PM
 Subject: Re: [PATCH, take 2] Fix PR tree-optimization/49960 ,Fix 
 self data dependence
 
 On Mon, Nov 21, 2011 at 04:54:09PM +0200, Razya Ladelsky wrote:
  Although compute_affine_dependence() can't do anything meaningful for 
the 
  gather data refs,
  it may still be assigning values to the dependence relation structure, 
if 
  you need to use them. 
  Therefore I did not skip the call to compute_self_dependence(), which 
  indeed calls
  compute_affine_depepndence().
  
  
  If you think it's redundant, we can remove it.
 
 It is not just redundant, but harmful.  compute_affine_dependence 
assumes
 that base and offset are simple IVs, that is not true for gather deps.
 So please do remove it.
 
Jakub
 

Jakub,

I have some non-affine cases for which compute_affine_dependence is called 
(as it is called for 
ALL dependences from compte_all_depepndences()), and no harm is done.
I looked a little bit closer into the code, and this is what happens for 
non affine accesses:

initialize_data_dependence_relation() assigns 
DDR_ARE_DEPENDENT (res) = chrec_dont_know for the dr.

Then, compute_affine_depepndence()
tests if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE), and does nothing 
otherwise.
Since the dr was initialized with chrec_dont_know, no harm is done.

Anyway, since it is useless for your gather case, I'll just remove it, 
along with compute_self_dependence().

OK?

Thanks,
Razya



[PATCH, take 2] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-11-15 Thread Razya Ladelsky
  I hope it's clearer now, I will add a comment to the code, and submit 
it
  before committing it.
 
 No, it's not clearer, because it is not clear why you need to add the 
hack
 instead of avoiding the 2nd access function. And iff you add the hack it
 needs a comment why zero should be special (any other constant would
 be the same I suppose).
 
 Btw, your fortran example does not compile and I don't believe the issue
 is still present after my last changes to dr_analyze_indices.  So, did
 you verify this on trunk?
 
 Richard.

This patch fixes the failures described in 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960
It also fixes bzips when run with autopar enabled.

In both cases the self dependences are not handled correctly.
In the first case, a non affine access is analyzed:
in the second, the distance vector is not calculated correctly (the 
distance vector considered for for self dependences is always (0,0,...))
As  a result, the loops get wrongfully parallelized.

I modified the previous patch according to the last changes in the trunk,
which indeed do not requite special handling for the 2nd access function 
(as mentioned by Richard).
Another change is that the previous version of the patch eliminated 
compute_self_dependences function
as the calls to it were redundant, while this version considers the new 
call to compute_self_dependences from the vect code for gather (inserted 
lately by Jakub).

ChangeLog:
PR tree-optimization/49960

* tree-data-ref.c (initialize_data_dependence_relation): Add 
initializations. 
Remove call to compute_self_dependence.
(compute_affine_dependence): Remove the !DDR_SELF_REFERENCE 
condition.
(compute_self_dependence): Remove old code. Add call to 
compute_affine_dependence.
(compute_all_dependences): Remove call to compute_self_dependence. 

Add call to compute_affine_dependence.
 
testsuite/ChangeLog:
PR tree-optimization/49960

* gcc.dg/autopar/pr49960.c: New test.
* gcc.dg/autopar/pr49960-1.c: New test.







Bootstrap and testsuite pass successfully for ppc64-redhat-linux.

OK for trunk?
Thank you,
Razya



pr49960.c
Description: Binary data


pr49960-1.c
Description: Binary data
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c (revision 181168)
+++ gcc/tree-data-ref.c (working copy)
@@ -1389,13 +1389,30 @@ initialize_data_dependence_relation (struct data_r
  the data dependence tests, just initialize the ddr and return.  */
   if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
 {
+ if (loop_nest
+ !object_address_invariant_in_loop_p (VEC_index (loop_p, loop_nest, 
0),
+   DR_BASE_OBJECT (a)))
+  {
+DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+return res;
+  }
   DDR_AFFINE_P (res) = true;
   DDR_ARE_DEPENDENT (res) = NULL_TREE;
   DDR_SUBSCRIPTS (res) = VEC_alloc (subscript_p, heap, DR_NUM_DIMENSIONS 
(a));
   DDR_LOOP_NEST (res) = loop_nest;
   DDR_INNER_LOOP (res) = 0;
   DDR_SELF_REFERENCE (res) = true;
-  compute_self_dependence (res);
+  for (i = 0; i  DR_NUM_DIMENSIONS (a); i++)
+   {
+ struct subscript *subscript;
+
+ subscript = XNEW (struct subscript);
+ SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
+ SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
+ SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
+ SUB_DISTANCE (subscript) = chrec_dont_know;
+ VEC_safe_push (subscript_p, heap, DDR_SUBSCRIPTS (res), subscript);
+   }
   return res;
 }
 
@@ -4040,8 +4057,7 @@ compute_affine_dependence (struct data_dependence_
 }
 
   /* Analyze only when the dependence relation is not yet known.  */
-  if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE
-   !DDR_SELF_REFERENCE (ddr))
+  if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
 {
   dependence_stats.num_dependence_tests++;
 
@@ -4122,31 +4138,11 @@ compute_affine_dependence (struct data_dependence_
 void
 compute_self_dependence (struct data_dependence_relation *ddr)
 {
-  unsigned int i;
-  struct subscript *subscript;
-
   if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE)
 return;
 
-  for (i = 0; VEC_iterate (subscript_p, DDR_SUBSCRIPTS (ddr), i, subscript);
-   i++)
-{
-  if (SUB_CONFLICTS_IN_A (subscript))
-   free_conflict_function (SUB_CONFLICTS_IN_A (subscript));
-  if (SUB_CONFLICTS_IN_B (subscript))
-   free_conflict_function (SUB_CONFLICTS_IN_B (subscript));
-
-  /* The accessed index overlaps for each iteration.  */
-  SUB_CONFLICTS_IN_A (subscript)
-   = conflict_fn (1, affine_fn_cst (integer_zero_node));
-  SUB_CONFLICTS_IN_B (subscript)
-   = conflict_fn (1, affine_fn_cst (integer_zero_node));
-  SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
-}
-
-  /* The distance 

Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-10-19 Thread Razya Ladelsky
gcc-patches-ow...@gcc.gnu.org wrote on 17/10/2011 09:03:59 AM:

 From: Richard Guenther richard.guent...@gmail.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Sebastian Pop s...@gcc.gnu.org
 Date: 17/10/2011 09:04 AM
 Subject: Re: [patch] Fix PR tree-optimization/49960 ,Fix self data 
dependence
 Sent by: gcc-patches-ow...@gcc.gnu.org
 
 On Mon, Oct 17, 2011 at 8:23 AM, Razya Ladelsky ra...@il.ibm.com 
wrote:
  This patch fixes the failures described in
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960
  It also fixes bzips when run with autopar enabled.
 
  In both cases the self dependences are not handled correctly.
  In the first case, a non affine access is analyzed:
  in the second, the distance vector is not calculated correctly (the
  distance vector considered for for self dependences is always 
(0,0,...))
 
  As  a result, the loops get wrongfully parallelized.
 
  The patch avoids the special handling of  self dependences, and 
analyzes
  all dependences in the same way. Specific adjustments
  and support for the self dependence cases were made.
 
 Can you elaborate on
 
 @@ -3119,8 +3135,11 @@ add_other_self_distances (struct 
data_dependence_r
 {
   if (DDR_NUM_SUBSCRIPTS (ddr) != 1)
{
 -DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
 -return;
 +if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop 
(DR_ACCESS_FN
 (DDR_A (ddr), 1)))
 +  {
 +DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
 +return;
 +  }
}
 
   access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
 
 ?  It needed a comment before, and now so even more.
 
 The rest of the patch is ok, I suppose the above hunk is to enhance
 something, not
 to fix the bug?

For fortran code like:

  DO 140 J=1,MB
 DO 130 K=1,NA
BKJ=B(K,J)
IF(BKJ.EQ.ZERO) GO TO 130
   DO 120 I=1,MA
  C(I,J)=C(I,J)+A(K,I)*BKJ
  120  CONTINUE
  130CONTINUE
  140 CONTINUE
  RETURN


The access functions for the C(i j) self dependence are:

(Data Dep: 
#(Data Ref: 
#  bb: 9 
#  stmt: D.1427_79 = *c_78(D)[D.1426_77];
#  ref: *c_78(D)[D.1426_77];
#  base_object: *c_78(D);
#  Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, 
stride.12_25}_1, +, 1}_3
#  Access function 1: 0B
#)
#(Data Ref: 
#  bb: 9 
#  stmt: *c_78(D)[D.1426_77] = D.1433_88;
#  ref: *c_78(D)[D.1426_77];
#  base_object: *c_78(D);
#  Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, 
stride.12_25}_1, +, 1}_3
#  Access function 1: 0B
#)


Two dimesions are created to describe C(i j) although there's no need for 
access function 1 which is just 0B.


If this was a C code, we would have these two access functions for 
C[i][j]:

(Data Dep: 
#(Data Ref: 
#  bb: 5 
#  stmt: t_10 = C[i_33][j_37];
#  ref: C[i_33][j_37];
#  base_object: C
#  Access function 0: {3, +, 1}_3
#  Access function 1: {3, +, 1}_2
#)
#(Data Ref: 
#  bb: 5 
#  stmt: C[i_33][j_37] = D.3852_15;
#  ref: C[i_33][j_37];
#  base_object: C
#  Access function 0: {3, +, 1}_3
#  Access function 1: {3, +, 1}_2
#)


In order to handle the Fortran data accesses, even for simple cases as 
above, 
I would need to handle multivariate accesses.
The data dependence analysis doesn't know how to handle such dependences 
if there's more than one subscript.
The above Frotran code doesn't actually have two subscripts, but one, and 
thus should be handled.
 
The reason this issue came up only with the changes of this patch is that 
now 
add_other_self_distances is called from build_classic_dist_vector, which 
is called also for self dependences.
Before the patch, the distance vector for self dependences was always 
determined as a vector of 0's, and build_classic_dist_vector
was not called.

I hope it's clearer now, I will add a comment to the code, and submit it 
before committing it.

Thanks,
Razya


 
 Thanks,
 Richard.
 
  Bootstrap and testsuite pass successfully for ppc64-redhat-linux.
 
  OK for trunk?
  Thank you,
  Razya
 
 
  ChangeLog:
 
 PR tree-optimization/49960
 * tree-data-ref.c (compute_self_dependence): Remove.
  (initialize_data_dependence_relation): Add intializations.
  Remove compute_self_dependence.
  (add_other_self_distances): Add support for two dimensions 
if
  the second is zero.
  (compute_affine_dependence): Remove the 
!DDR_SELF_REFERENCE
  condition.
  (compute_all_dependences): Remove call to
  compute_self_dependence. Add call to compute_affine_dependence
 
  testsuite/ChangeLog:
 
 PR tree-optimization/49660
 * gcc.dg/autopar/pr49660.c: New test.
* gcc.dg/autopar/pr49660-1.c: New test.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



[patch] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-10-17 Thread Razya Ladelsky
This patch fixes the failures described in 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960
It also fixes bzips when run with autopar enabled.

In both cases the self dependences are not handled correctly.
In the first case, a non affine access is analyzed:
in the second, the distance vector is not calculated correctly (the 
distance vector considered for for self dependences is always (0,0,...))

As  a result, the loops get wrongfully parallelized.

The patch avoids the special handling of  self dependences, and analyzes 
all dependences in the same way. Specific adjustments
and support for the self dependence cases were made.

Bootstrap and testsuite pass successfully for ppc64-redhat-linux.

OK for trunk?
Thank you,
Razya


ChangeLog:

PR tree-optimization/49960
* tree-data-ref.c (compute_self_dependence): Remove.
 (initialize_data_dependence_relation): Add intializations. 
Remove compute_self_dependence.
 (add_other_self_distances): Add support for two dimensions if 
the second is zero.
 (compute_affine_dependence): Remove the !DDR_SELF_REFERENCE 
condition.
 (compute_all_dependences): Remove call to 
compute_self_dependence. Add call to compute_affine_dependence

testsuite/ChangeLog:

PR tree-optimization/49660
* gcc.dg/autopar/pr49660.c: New test.
   * gcc.dg/autopar/pr49660-1.c: New test.













 

Index: tree-data-ref.c
===
--- tree-data-ref.c (revision 179799)
+++ tree-data-ref.c (working copy)
@@ -1346,7 +1346,6 @@ dr_may_alias_p (const struct data_reference *a, co
   return refs_may_alias_p (addr_a, addr_b);
 }
 
-static void compute_self_dependence (struct data_dependence_relation *);
 
 /* Initialize a data dependence relation between data accesses A and
B.  NB_LOOPS is the number of loops surrounding the references: the
@@ -1386,13 +1385,30 @@ initialize_data_dependence_relation (struct data_r
  the data dependence tests, just initialize the ddr and return.  */
   if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
 {
+  if (loop_nest
+  !object_address_invariant_in_loop_p (VEC_index (loop_p, loop_nest, 
0),
+ DR_BASE_OBJECT (a)))
+   {
+ DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+ return res;
+   }
   DDR_AFFINE_P (res) = true;
   DDR_ARE_DEPENDENT (res) = NULL_TREE;
   DDR_SUBSCRIPTS (res) = VEC_alloc (subscript_p, heap, DR_NUM_DIMENSIONS 
(a));
   DDR_LOOP_NEST (res) = loop_nest;
   DDR_INNER_LOOP (res) = 0;
   DDR_SELF_REFERENCE (res) = true;
-  compute_self_dependence (res);
+  for (i = 0; i  DR_NUM_DIMENSIONS (a); i++)
+   {
+ struct subscript *subscript;
+ 
+ subscript = XNEW (struct subscript);
+ SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
+ SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
+ SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
+ SUB_DISTANCE (subscript) = chrec_dont_know;
+ VEC_safe_push (subscript_p, heap, DDR_SUBSCRIPTS (res), subscript);
+   }
   return res;
 }
 
@@ -3119,8 +3135,11 @@ add_other_self_distances (struct data_dependence_r
{
  if (DDR_NUM_SUBSCRIPTS (ddr) != 1)
{
- DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
- return;
+ if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop 
(DR_ACCESS_FN (DDR_A (ddr), 1)))
+   {
+ DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
+ return;
+   }
}
 
  access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
@@ -4037,8 +4056,7 @@ compute_affine_dependence (struct data_dependence_
 }
 
   /* Analyze only when the dependence relation is not yet known.  */
-  if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE
-   !DDR_SELF_REFERENCE (ddr))
+  if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
 {
   dependence_stats.num_dependence_tests++;
 
@@ -4113,39 +4131,6 @@ compute_affine_dependence (struct data_dependence_
 fprintf (dump_file, )\n);
 }
 
-/* This computes the dependence relation for the same data
-   reference into DDR.  */
-
-static void
-compute_self_dependence (struct data_dependence_relation *ddr)
-{
-  unsigned int i;
-  struct subscript *subscript;
-
-  if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE)
-return;
-
-  for (i = 0; VEC_iterate (subscript_p, DDR_SUBSCRIPTS (ddr), i, subscript);
-   i++)
-{
-  if (SUB_CONFLICTS_IN_A (subscript))
-   free_conflict_function (SUB_CONFLICTS_IN_A (subscript));
-  if (SUB_CONFLICTS_IN_B (subscript))
-   free_conflict_function (SUB_CONFLICTS_IN_B (subscript));
-
-  /* The accessed index overlaps for each iteration.  */
-  SUB_CONFLICTS_IN_A (subscript)
-   = conflict_fn (1, affine_fn_cst 

Re: [patch] Fix PR tree-optimization/49471

2011-07-26 Thread Razya Ladelsky
Richard Guenther richard.guent...@gmail.com wrote on 25/07/2011 05:54:28 
PM:

 From: Richard Guenther richard.guent...@gmail.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Zdenek Dvorak 
 rakd...@kam.mff.cuni.cz, Sebastian Pop s...@gcc.gnu.org
 Date: 25/07/2011 05:54 PM
 Subject: Re: [patch] Fix PR tree-optimization/49471
 
 On Mon, Jul 25, 2011 at 4:47 PM, Razya Ladelsky ra...@il.ibm.com 
wrote:
  Hi,
 
  This patch fixes the build failure of cactusADM and dealII spec2006
  benchmarks when autopar is enabled.
  (for powerpc they fail only when -m32 is additionally enabled)
 
  The problem originated in canonicalize_loop_ivs, where we iterate the
  header's phis in order to base all
  the induction variables on a single control variable.
  We use the largest precision of the loop's ivs in order to determine 
the
  type of the control variable.
 
  Since iterating the loop's phis takes into account not only the loop's
  ivs, but also reduction variables,
  we got precision values like 80 for x86, or 128 for ppc.
  The compilers failed to create proper types for these sizes
  (respectively).
 
  The proper behavior for determining the control variable's type is to 
take
  into account only the loop's ivs,
  which is what this patch does.
 
  Bootstrap and testsuite pass successfully (as autopar is not enabled 
by
  default).
  No new regressions when the testsuite is run with autopar enabled.
  No new regressions for the run of spec2006 with autopar enabled,
 
  cactusADM and dealII benchmarks now pass successfully with autopar on
  powerpc and x86.
 
  Thanks to Zdenek who helped me figure out the failure/fix.
  OK for trunk?
 
 It'll collide with Sebastians patch in that area.  I suggested a
 INTEGRAL_TYPE_P check instead of the simple_iv one, it
 should be cheaper.  Zdenek, do you think it will be incorrect
 in some cases?
 

The INTEGRAL_TYPE_P check does work for cactusADM and dealII, but
I'm not sure about the general case.

Razya


 Thanks,
 Richard.
 
  Thanks,
  Razya
 
  ChangeLog:
 
PR tree-optimization/49471
* tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to
ignore reduction variables when iterating the loop header's phis.
 
 
 



Re: [patch] Fix PR tree-optimization/49471

2011-07-25 Thread Razya Ladelsky
Razya Ladelsky/Haifa/IBM wrote on 25/07/2011 05:44:02 PM:

 From: Razya Ladelsky/Haifa/IBM
 To: gcc-patches@gcc.gnu.org
 Cc: Zdenek Dvorak rakd...@kam.mff.cuni.cz, Richard Guenther 
 richard.guent...@gmail.com
 Date: 25/07/2011 05:44 PM
 Subject: [patch] Fix PR tree-optimization/49471
 
 Hi,
 
 This patch fixes the build failure of cactusADM and dealII spec2006 
 benchmarks when autopar is enabled.
 (for powerpc they fail only when -m32 is additionally enabled)
 
 The problem originated in canonicalize_loop_ivs, where we iterate 
 the header's phis in order to base all
 the induction variables on a single control variable.
 We use the largest precision of the loop's ivs in order to determine
 the type of the control variable. 
 
 Since iterating the loop's phis takes into account not only the 
 loop's ivs, but also reduction variables, 
 we got precision values like 80 for x86, or 128 for ppc.
 The compilers failed to create proper types for these sizes 
(respectively).
 
 The proper behavior for determining the control variable's type is 
 to take into account only the loop's ivs,
 which is what this patch does. 
 
 Bootstrap and testsuite pass successfully (as autopar is not enabled
 by default).
 No new regressions when the testsuite is run with autopar enabled.
 No new regressions for the run of spec2006 with autopar enabled, 
 
 cactusADM and dealII benchmarks now pass successfully with autopar 
 on powerpc and x86.
 
 Thanks to Zdenek who helped me figure out the failure/fix. 
 OK for trunk? 
 Thanks,
 Razya
 
 ChangeLog:
 
PR tree-optimization/49471
* tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to 
ignore reduction variables when iterating the loop header's phis.

I have an error in the ChangeLog:
the change is in tree-ssa-loop-manip.c instead of tree-vect-loop-manip.c 

Sorry,
Razya
 
 [attachment cactus_dealII_patch.txt deleted by Razya 
Ladelsky/Haifa/IBM] 


Re: PATCH] PR 49580

2011-07-05 Thread Razya Ladelsky
Zdenek Dvorak rakd...@kam.mff.cuni.cz wrote on 30/06/2011 15:21:43:

 From: Zdenek Dvorak rakd...@kam.mff.cuni.cz
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Richard Guenther 
richard.guent...@gmail.com
 Date: 30/06/2011 15:21
 Subject: Re: PATCH] PR 49580
 
 Hi,
 
  This patch fixes the build failure of gcc spec2006 benchmark.
  The change is in  gimple_duplicate_sese_tail(), where we need to 
subtract 
  1 from the loop's number of iterations.
  The stmt created to change the rhs of the loop's condition, used to be 

  inserted right after the defining stmt of the rhs (an ssa name).
  This requires special handling of different cases of the defining 
stmt:
  1.gimple_stmt
  2.gimple_phi
  3.default_def
  
  Instead of handling each case separately, if we insert the new stmt at 
the 
  begining of the loop's preheader, we're sure that 
  it's already been defined.
  
  Bootstrap and testsuite pass successfully (as autopar is not enabled 
by 
  default).
  No new regressions when the testsuite is run with autopar enabled.
  No new regressions for the run of spec2006 with autopar enabled, and 
gcc 
  benchmark now passes successfully.. 
  
  OK for trunk? 
 
 actually, I think a better approach would be not to have this kind 
 of pass-specific
 adjustments in gimple_duplicate_sese_tail at all.  The code 
 decreasing the number of
 iterations in gimple_duplicate_sese_tail only works because parloops
 does induction
 variable canonicalization first; should we need it to be used 
 anywhere else, it will break.
 I.e., parloops should first transform the condition so that it does 
 the comparison with
 the adjusted value, and then gimple_duplicate_sese_tail could do 
 just code copying
 and cfg changes,
 
 Zdenek

Hi,

I moved the adjustment of the loop's iterations from 
gimple_duplicate_sese_tail
to tree-parloops.c, right before the call to gimple_duplicate_sese_tail.
I repeated the bootstrap, regression and spec runs - no new regressions.

OK to commit?
Thanks,
razya




Index: gcc/tree-parloops.c
===
--- gcc/tree-parloops.c (revision 174166)
+++ gcc/tree-parloops.c (working copy)
@@ -1464,6 +1464,8 @@ transform_to_exit_first_loop (struct loop *loop, h
   gimple phi, nphi, cond_stmt, stmt, cond_nit;
   gimple_stmt_iterator gsi;
   tree nit_1;
+  edge exit_1;
+  tree new_rhs;
 
   split_block_after_labels (loop-header);
   orig_header = single_succ (loop-header);
@@ -1492,6 +1494,38 @@ transform_to_exit_first_loop (struct loop *loop, h
  control = t;
}
 }
+
+ /* Setting the condition towards peeling the last iteration:
+If the block consisting of the exit condition has the latch as
+successor, then the body of the loop is executed before
+the exit condition is tested.  In such case, moving the
+condition to the entry, causes that the loop will iterate
+one less iteration (which is the wanted outcome, since we
+peel out the last iteration).  If the body is executed after
+the condition, moving the condition to the entry requires
+decrementing one iteration.  */
+  exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); 
+  if (exit_1-dest == loop-latch)
+new_rhs = gimple_cond_rhs (cond_stmt);
+  else
+  {
+new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)),
+  gimple_cond_rhs (cond_stmt),
+  build_int_cst (TREE_TYPE (gimple_cond_rhs 
(cond_stmt)), 1));
+if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
+  {
+   basic_block preheader;
+   gimple_stmt_iterator gsi1;
+
+   preheader = loop_preheader_edge(loop)-src;
+   gsi1 = gsi_after_labels (preheader);
+   new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,
+   
NULL_TREE,false,GSI_CONTINUE_LINKING);
+  }
+  }
+  gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs));
+  gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt)));
+  
   bbs = get_loop_body_in_dom_order (loop);
 
   for (n = 0; bbs[n] != loop-latch; n++)
Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 174166)
+++ gcc/tree-cfg.c  (working copy)
@@ -5397,12 +5397,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U
   int total_freq = 0, exit_freq = 0;
   gcov_type total_count = 0, exit_count = 0;
   edge exits[2], nexits[2], e;
-  gimple_stmt_iterator gsi,gsi1;
+  gimple_stmt_iterator gsi;
   gimple cond_stmt;
   edge sorig, snew;
   basic_block exit_bb;
-  basic_block iters_bb;
-  tree new_rhs;
   gimple_stmt_iterator psi;
   gimple phi;
   tree def;
@@ -5483,35 +5481,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U
   gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND);
   cond_stmt = gimple_copy (cond_stmt);
 
- /* If the block consisting of the exit condition has the latch

Re: PATCH] PR 49580

2011-07-05 Thread Razya Ladelsky
Zdenek Dvorak rakd...@kam.mff.cuni.cz wrote on 05/07/2011 13:37:41:

 From: Zdenek Dvorak rakd...@kam.mff.cuni.cz
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Richard Guenther 
richard.guent...@gmail.com
 Date: 05/07/2011 13:37
 Subject: Re: PATCH] PR 49580
 
 Hi,
 
  I moved the adjustment of the loop's iterations from 
  gimple_duplicate_sese_tail
  to tree-parloops.c, right before the call to 
gimple_duplicate_sese_tail.
  I repeated the bootstrap, regression and spec runs - no new 
regressions.
  
  OK to commit?
 
 OK,
 
 Zdenek

I also want to commit this testcase, which is a reduced case of the gcc 
benchmark.
I apologize for not submitting it together with the patch.
OK?

Thanks,
Razya




 
  Index: gcc/tree-parloops.c
  ===
  --- gcc/tree-parloops.c   (revision 174166)
  +++ gcc/tree-parloops.c   (working copy)
  @@ -1464,6 +1464,8 @@ transform_to_exit_first_loop (struct loop *loop, 
h
 gimple phi, nphi, cond_stmt, stmt, cond_nit;
 gimple_stmt_iterator gsi;
 tree nit_1;
  +  edge exit_1;
  +  tree new_rhs;
  
 split_block_after_labels (loop-header);
 orig_header = single_succ (loop-header);
  @@ -1492,6 +1494,38 @@ transform_to_exit_first_loop (struct loop 
*loop, h
control = t;
  }
   }
  +
  + /* Setting the condition towards peeling the last iteration:
  +If the block consisting of the exit condition has the latch as
  +successor, then the body of the loop is executed before
  +the exit condition is tested.  In such case, moving the
  +condition to the entry, causes that the loop will iterate
  +one less iteration (which is the wanted outcome, since we
  +peel out the last iteration).  If the body is executed after
  +the condition, moving the condition to the entry requires
  +decrementing one iteration.  */
  +  exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); 
  +  if (exit_1-dest == loop-latch)
  +new_rhs = gimple_cond_rhs (cond_stmt);
  +  else
  +  {
  +new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs
 (cond_stmt)),
  +gimple_cond_rhs (cond_stmt),
  +build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 
1));
  +if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
  +  {
  +basic_block preheader;
  + gimple_stmt_iterator gsi1;
  +
  + preheader = loop_preheader_edge(loop)-src;
  +   gsi1 = gsi_after_labels (preheader);
  +   new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,
  +   NULL_TREE,false,GSI_CONTINUE_LINKING);
  +  }
  +  }
  +  gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs));
  +  gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs 
 (cond_stmt)));
  + 
 bbs = get_loop_body_in_dom_order (loop);
  
 for (n = 0; bbs[n] != loop-latch; n++)
  Index: gcc/tree-cfg.c
  ===
  --- gcc/tree-cfg.c   (revision 174166)
  +++ gcc/tree-cfg.c   (working copy)
  @@ -5397,12 +5397,10 @@ gimple_duplicate_sese_tail (edge entry 
ATTRIBUTE_U
 int total_freq = 0, exit_freq = 0;
 gcov_type total_count = 0, exit_count = 0;
 edge exits[2], nexits[2], e;
  -  gimple_stmt_iterator gsi,gsi1;
  +  gimple_stmt_iterator gsi;
 gimple cond_stmt;
 edge sorig, snew;
 basic_block exit_bb;
  -  basic_block iters_bb;
  -  tree new_rhs;
 gimple_stmt_iterator psi;
 gimple phi;
 tree def;
  @@ -5483,35 +5481,6 @@ gimple_duplicate_sese_tail (edge entry 
ATTRIBUTE_U
 gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND);
 cond_stmt = gimple_copy (cond_stmt);
  
  - /* If the block consisting of the exit condition has the latch as
  -successor, then the body of the loop is executed before
  -the exit condition is tested.  In such case, moving the
  -condition to the entry, causes that the loop will iterate
  -one less iteration (which is the wanted outcome, since we
  -peel out the last iteration).  If the body is executed after
  -the condition, moving the condition to the entry requires
  -decrementing one iteration.  */
  -  if (exits[1]-dest == orig_loop-latch)
  -new_rhs = gimple_cond_rhs (cond_stmt);
  -  else
  -  {
  -new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs
 (cond_stmt)),
  -gimple_cond_rhs (cond_stmt),
  -build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 
1));
  -
  -if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
  -  {
  -   iters_bb = gimple_bb (SSA_NAME_DEF_STMT (gimple_cond_rhs 
(cond_stmt)));
  -   for (gsi1 = gsi_start_bb (iters_bb); !gsi_end_p (gsi1); 
 gsi_next (gsi1))
  - if (gsi_stmt (gsi1) == SSA_NAME_DEF_STMT (gimple_cond_rhs 
 (cond_stmt)))
  -   break;
  -
  -   new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,
  -   NULL_TREE,false,GSI_CONTINUE_LINKING

PATCH] PR 49580

2011-06-30 Thread Razya Ladelsky
Hi,

This patch fixes the build failure of gcc spec2006 benchmark.
The change is in  gimple_duplicate_sese_tail(), where we need to subtract 
1 from the loop's number of iterations.
The stmt created to change the rhs of the loop's condition, used to be 
inserted right after the defining stmt of the rhs (an ssa name).
This requires special handling of different cases of the defining stmt:
1.gimple_stmt
2.gimple_phi
3.default_def

Instead of handling each case separately, if we insert the new stmt at the 
begining of the loop's preheader, we're sure that 
it's already been defined.

Bootstrap and testsuite pass successfully (as autopar is not enabled by 
default).
No new regressions when the testsuite is run with autopar enabled.
No new regressions for the run of spec2006 with autopar enabled, and gcc 
benchmark now passes successfully.. 

OK for trunk? 
Thanks,
Razya

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 174166)
+++ gcc/tree-cfg.c  (working copy)
@@ -5401,7 +5401,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U
   gimple cond_stmt;
   edge sorig, snew;
   basic_block exit_bb;
-  basic_block iters_bb;
   tree new_rhs;
   gimple_stmt_iterator psi;
   gimple phi;
@@ -5501,11 +5500,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U
 
 if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
   {
-   iters_bb = gimple_bb (SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt)));
-   for (gsi1 = gsi_start_bb (iters_bb); !gsi_end_p (gsi1); gsi_next 
(gsi1))
- if (gsi_stmt (gsi1) == SSA_NAME_DEF_STMT (gimple_cond_rhs 
(cond_stmt)))
-   break;
+   basic_block preheader;
 
+   preheader = loop_preheader_edge(orig_loop)-src;
+   gsi1 = gsi_after_labels (preheader);
new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,

NULL_TREE,false,GSI_CONTINUE_LINKING);
   }
=
22-12-2009  Razya Ladelsky  ra...@il.ibm.com

* tree-cfg.c (gimple_duplicate_sese_tail): Insert the stmt caclculating 
the new rhs
 of the loop's condition stmt to the preheader instead of iters_bb.
=