[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-06-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443
Bug 65443 depends on bug 66442, which changed state.

Bug 66442 Summary: [6 regression] FAIL: gcc.dg/autopar/pr46885.c (test for 
excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66442

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-06-05 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #18 from vries at gcc dot gnu.org ---
Author: vries
Date: Fri Jun  5 15:57:34 2015
New Revision: 224154

URL: https://gcc.gnu.org/viewcvs?rev=224154&root=gcc&view=rev
Log:
Add transform_to_exit_first_loop_alt

2015-06-05  Tom de Vries  t...@codesourcery.com

merge from gomp4 branch:
2015-05-28  Tom de Vries  t...@codesourcery.com

PR tree-optimization/65443
* tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by)
(replace_uses_in_bbs_by, transform_to_exit_first_loop_alt)
(try_transform_to_exit_first_loop_alt): New function.
(transform_to_exit_first_loop): Use
try_transform_to_exit_first_loop_alt.

* gcc.dg/parloops-exit-first-loop-alt-2.c: New test.
* gcc.dg/parloops-exit-first-loop-alt-3.c: New test.
* gcc.dg/parloops-exit-first-loop-alt.c: New test.

* testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test.
* testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test.
* testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
trunk/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
trunk/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
trunk/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c
trunk/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c
trunk/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-parloops.c
trunk/libgomp/ChangeLog


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-06-05 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #19 from vries at gcc dot gnu.org ---
Patch with test-cases committed to trunk, marking resolved-fixed.


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-05-28 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #17 from vries at gcc dot gnu.org ---
Author: vries
Date: Thu May 28 21:23:54 2015
New Revision: 223848

URL: https://gcc.gnu.org/viewcvs?rev=223848&root=gcc&view=rev
Log:
Add transform_to_exit_first_loop_alt

2015-05-28  Tom de Vries  t...@codesourcery.com

PR tree-optimization/65443
* tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by)
(replace_uses_in_bbs_by, transform_to_exit_first_loop_alt)
(try_transform_to_exit_first_loop_alt): New function.
(transform_to_exit_first_loop): Use
try_transform_to_exit_first_loop_alt.

* gcc.dg/parloops-exit-first-loop-alt-2.c: New test.
* gcc.dg/parloops-exit-first-loop-alt-3.c: New test.
* gcc.dg/parloops-exit-first-loop-alt.c: New test.

* testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test.
* testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test.
* testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test.

Added:
branches/gomp-4_0-branch/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
branches/gomp-4_0-branch/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
branches/gomp-4_0-branch/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
branches/gomp-4_0-branch/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c
branches/gomp-4_0-branch/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c
branches/gomp-4_0-branch/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c
Modified:
branches/gomp-4_0-branch/gcc/ChangeLog.gomp
branches/gomp-4_0-branch/gcc/testsuite/ChangeLog.gomp
branches/gomp-4_0-branch/gcc/tree-parloops.c
branches/gomp-4_0-branch/libgomp/ChangeLog.gomp


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-04-16 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #16 from vries at gcc dot gnu.org ---
ping:
- https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00763.html


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-04-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #15 from vries at gcc dot gnu.org ---
Submitted updated patch:
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00115.html


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-27 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch

--- Comment #14 from vries at gcc dot gnu.org ---
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01441.html


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-26 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35103|0                           |1
        is obsolete|                            |

--- Comment #12 from vries at gcc dot gnu.org ---
Created attachment 35142
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35142&action=edit
WIP patch

Updated patch.

Now handles both constant and variable bounds, and lists the test-cases with
variable bounds it doesn't handle.

Built and reg-tested on x86_64.

Still todo: reductions.


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-26 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #13 from vries at gcc dot gnu.org ---
Created attachment 35145
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35145&action=edit
WIP patch

Added reduction example to testcases. Patch runs test-cases successfully.


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-23 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35092|0                           |1
        is obsolete|                            |

--- Comment #11 from vries at gcc dot gnu.org ---
Created attachment 35103
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35103&action=edit
WIP patch

Updated patch. Skips cases that it can't handle, so it's on by default now.
Bootstrapped and reg-tested on x86_64, no issues found.


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-22 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35078|0                           |1
        is obsolete|                            |

--- Comment #10 from vries at gcc dot gnu.org ---
Created attachment 35092
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35092&action=edit
WIP patch

Updated patch which fixes probability/frequency. The generated code for the
loopfn is now identical at the optimized dump (previously we were sinking loads
into the loop nest due to the broken probability/frequency). 

The main difference in generated code at the optimized dump is this:
...
   <bb 5>:
+  n_24 = n_5(D);
   .paral_data_store.6.a = a;
   .paral_data_store.6.b = b;
   .paral_data_store.6.c = c;
-  .paral_data_store.6.D.1854 = _12;
+  .paral_data_store.6.D.1854 = n_5(D);
   __builtin_GOMP_parallel (f._loopfn.0, .paral_data_store.6, 2, 0);
-  ivtmp_27 = (signed int) _12;
-  _29 = a[ivtmp_27];
-  _30 = b[ivtmp_27];
-  _31 = _29 + _30;
-  c[ivtmp_27] = _31;
...

That is, we increase the number of iterations by one (from n - 1 to n), and
remove the peeled-off last loop iteration (the code after the
__builtin_GOMP_parallel).
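
A source-level sketch of that difference may help. It assumes int arrays and
n > 0 (the dump does not show the element types), the names add_peeled and
add_full are hypothetical, and the plain loops only stand in for the work that
is handed to __builtin_GOMP_parallel:

```c
#include <assert.h>

/* Old shape: the loop covers n - 1 iterations, and the last iteration
   is peeled off and executed after the loop.  Assumes n > 0.  */
static void
add_peeled (const int *a, const int *b, int *c, unsigned int n)
{
  assert (n > 0);
  unsigned int last = n - 1;
  for (unsigned int i = 0; i < last; i++)
    c[i] = a[i] + b[i];
  /* Peeled last iteration (the statements after the parallel call).  */
  c[last] = a[last] + b[last];
}

/* New shape: the loop covers all n iterations, nothing is peeled.  */
static void
add_full (const int *a, const int *b, int *c, unsigned int n)
{
  for (unsigned int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Both functions fill c identically; the second one leaves no sequential tail
after the loop.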


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #7 from vries at gcc dot gnu.org ---
Created attachment 35078
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35078&action=edit
WIP patch

WIP patch, works on included testcase only.


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #9 from vries at gcc dot gnu.org ---
Created attachment 35080
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35080&action=edit
parloops dump with -ftry


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #8 from vries at gcc dot gnu.org ---
Created attachment 35079
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35079&action=edit
parloops dump with -fno-try


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #6 from vries at gcc dot gnu.org ---
After looking into it a bit further, I think what we're trying to get is:
...
  <bb x>:
  goto <bb y>;

  <bb 4>:
  i_17 = (int) ivtmp_y;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_y;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;
  goto <bb 6>;

  <bb 6>:
  ivtmp_6 = ivtmp_y + 1;
  goto <bb y>;

  <bb y>:
  # sum_y = PHI <1(x), sum_11(6)>
  # ivtmp_y = PHI <0(x), ivtmp_6(6)>
  if (ivtmp_y < _20 + 1)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 5>:
  # sum_21 = PHI <sum_y(y), sum_26(8)>
  goto <bb 7>;
...


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-03-18
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
parloops needs a _lot_ of TLC!

Confirmed.


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-17 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #4 from vries at gcc dot gnu.org ---
(In reply to vries from comment #3)
> (In reply to vries from comment #2)
> > The problem with this transformation is that '_20 + 1' might overflow,
> > that's what the comment 'This may need some additional preconditioning in
> > case NIT = ~0' refers to.
>
> AFAIU, we might also move 'ivtmp_6 = ivtmp_y + 1' to the end of bb4. That
> way it's not triggered at loop entry, as before the transformation,
> eliminating the need for '_20 + 1'.

One thing I overlooked there:

  _20 = n_4(D) + 4294967295;

If n == 0, we don't reach the loop.

If n == 1, we reach the loop, and _20 == 0. And when we reach the loop
condition from loop entry with ivtmp == 0, ivtmp < _20 will evaluate to false,
and we won't even enter the loop. That's the problem we're trying to solve
using '_20 + 1'. And moving 'ivtmp_6 = ivtmp_y + 1' to the end of bb4 doesn't
fix that.
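
The n == 1 case can be checked with a small sketch. It assumes 32-bit
unsigned arithmetic as in the dump; latch_count, body_runs_exit_last and
body_runs_exit_first are hypothetical helper names that only count how often
the loop body would run:

```c
#include <assert.h>
#include <stdint.h>

/* _20 = n + 4294967295, i.e. n - 1 modulo 2^32.  Only meaningful for
   n > 0, since the loop is not reached when n == 0.  */
static uint32_t
latch_count (uint32_t n)
{
  assert (n > 0);
  return n + UINT32_C (4294967295);
}

/* Original exit-last loop: the body runs once before the first test,
   then once more for every taken latch edge.  */
static uint64_t
body_runs_exit_last (uint32_t nit)
{
  uint64_t runs = 0;
  uint32_t iv = 0;
  for (;;)
    {
      runs++;
      if (!(iv < nit))
        break;
      iv++;
    }
  return runs;
}

/* Exit-first loop with the bound left at nit: the test comes first,
   so the body runs only nit times.  */
static uint64_t
body_runs_exit_first (uint32_t nit)
{
  uint64_t runs = 0;
  for (uint32_t iv = 0; iv < nit; iv++)
    runs++;
  return runs;
}
```

For n == 1 the latch count is 0, so the exit-last loop runs the body once
while the exit-first loop with the unchanged bound runs it zero times. Where
the increment of the induction variable sits does not change that count.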


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-16 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #1 from vries at gcc dot gnu.org ---
Consider test.c, compiled with -O2 -ftree-parallelize-loops=2:
...
#include <stdio.h>

extern unsigned int *a;

void
f (unsigned int n)
{
  int i;
  unsigned int sum = 1;

#pragma omp parallel
  {
#pragma omp for
    for (i = 0; i < n; ++i)
      sum += a[i];
  }

  printf ("%u\n", sum);
}
...

Before transform_to_exit_first_loop, the loop looks like this:
...
  <bb 4>:
  # sum_18 = PHI <1(11), sum_11(6)>
  # ivtmp_25 = PHI <0(11), ivtmp_6(6)>
  i_17 = (int) ivtmp_25;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_18;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;
  if (ivtmp_25 < _20)
    goto <bb 6>;
  else
    goto <bb 5>;

  <bb 5>:
  # sum_21 = PHI <sum_11(4), sum_26(8)>
  goto <bb 7>;

  <bb 6>:
  ivtmp_6 = ivtmp_25 + 1;
  goto <bb 4>;
...

You might say that the transformation applied by transform_to_exit_first_loop
is that all the statements in bb4 that precede the if are moved past the if,
onto both outgoing paths (towards bb5 and towards bb6).

After, it looks like:
...
  <bb 4>:
  # sum_28 = PHI <1(11), sum_11(6)>
  # ivtmp_29 = PHI <0(11), ivtmp_6(6)>
  if (ivtmp_29 < _20)
    goto <bb 13>;
  else
    goto <bb 14>;

  <bb 13>:
  # sum_18 = PHI <sum_28(4)>
  # ivtmp_25 = PHI <ivtmp_29(4)>
  i_17 = (int) ivtmp_25;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_18;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;
  goto <bb 6>;

  <bb 14>:
  # sum_30 = PHI <sum_28(4)>
  ivtmp_31 = _20;
  i_32 = (int) ivtmp_31;
  _33 = (long unsigned int) i_32;
  _34 = _33 * 4;
  _35 = pretmp_24 + _34;
  _36 = *_35;
  sum_37 = _36 + sum_30;
  i_38 = i_32 + 1;
  i.1_39 = (unsigned int) i_38;

  <bb 5>:
  # sum_21 = PHI <sum_37(14), sum_26(8)>
  goto <bb 7>;

  <bb 6>:
  ivtmp_6 = ivtmp_25 + 1;
  goto <bb 4>;
...
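
At source level, the two dumps above correspond roughly to the following
sketch. This is an illustration only, not code from tree-parloops.c: the
function names are hypothetical, nit plays the role of _20, and a is assumed
to have at least nit + 1 elements:

```c
#include <assert.h>

/* Shape before the transformation: body first, exit test last.
   With latch count nit, the body runs nit + 1 times.  */
static unsigned int
sum_exit_last (const unsigned int *a, unsigned int nit)
{
  assert (a);
  unsigned int sum = 1;
  unsigned int iv = 0;
  for (;;)
    {
      sum += a[iv];
      if (!(iv < nit))
        break;
      iv++;
    }
  return sum;
}

/* Shape after the transformation: exit test first, with the last
   iteration peeled off behind the loop (bb 14 in the dump).  */
static unsigned int
sum_exit_first_peeled (const unsigned int *a, unsigned int nit)
{
  assert (a);
  unsigned int sum = 1;
  for (unsigned int iv = 0; iv < nit; iv++)
    sum += a[iv];
  /* Peeled iteration, using iv == nit (ivtmp_31 = _20).  */
  sum += a[nit];
  return sum;
}
```

Both functions return the same value for the same inputs; the peeled
statement is the part that later ends up outside the parallelized loop.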


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-16 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #2 from vries at gcc dot gnu.org ---
AFAIU, this is what the todo means:
...
  <bb x>:
  goto <bb y>;

  <bb 4>:
  i_17 = (int) ivtmp_6;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_y;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;

  <bb y>:
  # sum_y = PHI <1(x), sum_11(4)>
  # ivtmp_y = PHI <0(x), ivtmp_6(4)>
  if (ivtmp_y < _20 + 1)
    goto <bb 6>;
  else
    goto <bb 5>;

  <bb 5>:
  # sum_21 = PHI <sum_11(4), sum_26(8)>
  goto <bb 7>;

  <bb 6>:
  ivtmp_6 = ivtmp_y + 1;
  goto <bb 4>;
...

So, sort of:
- Split bb 4 before the loop condition, creating bb y.
- Don't enter the loop at bb 4 as before; instead jump to just before the
  loop condition, to bb y (creating bb x in the process).
- For each phi in bb 4, add a corresponding phi to bb y:
  - For the values on entry from bb x, use the values that the phis in bb 4
    have on entry from bb 11.
  - For the values on entry from bb 4, use the reaching definitions.
- Increase the loop bound by 1 (_20 + 1).
- Simplify the phis in bb 4.
- Use the new phis in bb y as defs for the reachable uses.

The problem with this transformation is that '_20 + 1' might overflow, that's
what the comment 'This may need some additional preconditioning in case NIT =
~0' refers to.
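
The overflow concern can be illustrated with a minimal sketch. The helper
try_increment_bound is hypothetical and is not the check used in
tree-parloops.c; it only shows that nit + 1 wraps to 0 when nit is the
maximum unsigned value, so that case has to be rejected or handled separately:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Compute the incremented bound nit + 1 for the exit-first loop.
   Returns false and leaves *alt_bound untouched when nit == ~0u,
   because nit + 1 would wrap around to 0 and the loop would then
   run zero times instead of nit + 1 times.  */
static bool
try_increment_bound (unsigned int nit, unsigned int *alt_bound)
{
  assert (alt_bound);
  if (nit == UINT_MAX)
    return false;
  *alt_bound = nit + 1u;
  return true;
}
```

A caller would fall back to another strategy, such as keeping the peeled
iteration, whenever the helper returns false.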


[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop

2015-03-16 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #3 from vries at gcc dot gnu.org ---
(In reply to vries from comment #2)
> The problem with this transformation is that '_20 + 1' might overflow,
> that's what the comment 'This may need some additional preconditioning in
> case NIT = ~0' refers to.

AFAIU, we might also move 'ivtmp_6 = ivtmp_y + 1' to the end of bb4. That way
it's not triggered at loop entry, as before the transformation,  eliminating
the need for '_20 + 1'.