[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-04-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #13 from Tom de Vries  ---
(In reply to Arseny Solokha from comment #12)
> I suppose it is fixed.

Filed PR85157 - "[parloops] Prevent canonicalize_loop_ivs failure in
gen_parallel_loop" to fix this in an optimal way.

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-03-29 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

Arseny Solokha  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Arseny Solokha  ---
I suppose it is fixed.

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-03-21 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #11 from Tom de Vries  ---
Author: vries
Date: Wed Mar 21 12:25:03 2018
New Revision: 258713

URL: https://gcc.gnu.org/viewcvs?rev=258713=gcc=rev
Log:
[parloops] Handle canonicalize_loop_ivs failure

2018-03-21  Tom de Vries  

PR tree-optimization/83126
* tree-parloops.c (num_phis): New function.
(gen_parallel_loop): Detect and handle canonicalize_loop_ivs failure.

* gcc.dg/graphite/pr83126.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/graphite/pr83126.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-parloops.c

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-22 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

Tom de Vries  changed:

   What|Removed |Added

   Keywords||patch

--- Comment #10 from Tom de Vries  ---
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01272.html

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-21 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #9 from Tom de Vries  ---
(In reply to Tom de Vries from comment #8)
> Created attachment 43476 [details]
> Tentative patch
> 
> (In reply to rguent...@suse.de from comment #3)
> 
> > This is the usual "you should not repeat analysis during transform" issue.
> > The vectorizer gets around this by caching relevant scalar evolution
> > but obviously that's difficult if using generic stuff like
> > canonicalize_loop_ivs ...
> 
> This patch caches affine_iv info before calling loop_version, and then uses
> that cached info in canonicalize_loop_ivs. This fixes the ICE.

Bootstrap and reg-test on x86_64 succeeded.

Richard, is this approach ok for stage1 and/or stage4?

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #8 from Tom de Vries  ---
Created attachment 43476
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43476=edit
Tentative patch

(In reply to rguent...@suse.de from comment #3)

> This is the usual "you should not repeat analysis during transform" issue.
> The vectorizer gets around this by caching relevant scalar evolution
> but obviously that's difficult if using generic stuff like
> canonicalize_loop_ivs ...

This patch caches affine_iv info before calling loop_version, and then uses
that cached info in canonicalize_loop_ivs. This fixes the ICE.

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-13 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #7 from Tom de Vries  ---
Created attachment 43404
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43404=edit
Demonstrator patch

(In reply to Richard Biener from comment #6)
> (In reply to Tom de Vries from comment #4)
> > (In reply to rguent...@suse.de from comment #3)
> > 
> > > This is the usual "you should not repeat analysis during transform" issue.
> > > The vectorizer gets around this by caching relevant scalar evolution
> > > but obviously that's difficult if using generic stuff like
> > > canonicalize_loop_ivs ...
> > 
> > I wonder if it makes sense to add an interface to scalar_evolution_info to
> > update the instantiated_below field for existing entries. So, before the
> > loop versioning, the loop preheader is bb10, but after loop versioning, it's
> > bb15. If we update the instantiated_below field from 10 to 15, the cached
> > scalar evolution info can still be used.
> 
> I don't see how this is a sound API that can be used without too much
> possibility to shoot yourself in the foot ;)
> 

I suppose, yes. Anyway, out of curiosity, I made this demonstrator patch that
fixes the problem using such an API.

> I _think_ that eventually the bug is fixed if you move the update_ssa after
> versioning to after canonicalize_loop_ivs which should have a similar effect
> than what you propose.  GRAPHITE relies on defered update_ssa as well to
> maintain "compatible" SCEV analysis results.

I think that indeed is necessary, but not sufficient by itself. Postponing
update_ssa doesn't fix the fact that the instantiated_below bb has changed
(which is the bit that this patch tries to fix).

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #6 from Richard Biener  ---
(In reply to Tom de Vries from comment #4)
> (In reply to rguent...@suse.de from comment #3)
> 
> > This is the usual "you should not repeat analysis during transform" issue.
> > The vectorizer gets around this by caching relevant scalar evolution
> > but obviously that's difficult if using generic stuff like
> > canonicalize_loop_ivs ...
> 
> I wonder if it makes sense to add an interface to scalar_evolution_info to
> update the instantiated_below field for existing entries. So, before the
> loop versioning, the loop preheader is bb10, but after loop versioning, it's
> bb15. If we update the instantiated_below field from 10 to 15, the cached
> scalar evolution info can still be used.

I don't see how this is a sound API that can be used without too much
possibility to shoot yourself in the foot ;)

I _think_ that eventually the bug is fixed if you move the update_ssa after
versioning to after canonicalize_loop_ivs which should have a similar effect
than what you propose.  GRAPHITE relies on defered update_ssa as well to
maintain "compatible" SCEV analysis results.

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #5 from Tom de Vries  ---
This more of a stage4 fix: instead of ICE-ing, abort transformation:
...
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e44ad5e..11f773f 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2371,6 +2371,23 @@ gen_parallel_loop (struct loop *loop,
   /* Base all the induction variables in LOOP on a single control one.  */
   canonicalize_loop_ivs (loop, , true);

+  unsigned nr_phis = 0;
+  for (gphi_iterator gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi);
+   gsi_next ())
+{
+  gphi *phi = gsi.phi ();
+  tree def = PHI_RESULT (phi);
+  affine_iv iv;
+
+  if (virtual_operand_p (def))
+   continue;
+
+  nr_phis++;
+}
+
+  if (nr_phis != reduction_list->elements () + 1)
+return;
+
   /* Ensure that the exit condition is the first statement in the loop.
  The common case is that latch of the loop is empty (apart from the
  increment) and immediately follows the loop exit test.  Attempt to move
the
...

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #4 from Tom de Vries  ---
(In reply to rguent...@suse.de from comment #3)

> This is the usual "you should not repeat analysis during transform" issue.
> The vectorizer gets around this by caching relevant scalar evolution
> but obviously that's difficult if using generic stuff like
> canonicalize_loop_ivs ...

I wonder if it makes sense to add an interface to scalar_evolution_info to
update the instantiated_below field for existing entries. So, before the loop
versioning, the loop preheader is bb10, but after loop versioning, it's bb15.
If we update the instantiated_below field from 10 to 15, the cached scalar
evolution info can still be used.

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-12 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #3 from rguenther at suse dot de  ---
On Mon, 12 Feb 2018, vries at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126
> 
> Tom de Vries  changed:
> 
>What|Removed |Added
> 
>  CC||rguenth at gcc dot gnu.org
> 
> --- Comment #2 from Tom de Vries  ---
> I.
> 
> If we disable transform_to_exit_first_loop_alt, we run into this assert 
> instead
> in transform_to_exit_first_loop:
> ...
>   gcc_assert (control_name == NULL_TREE
>   && SSA_NAME_VAR (res) == SSA_NAME_VAR (control));
> ...
> 
> So the problem is not specific to transform_to_exit_first_loop_alt.
> 
> 
> II.
> 
> This is the loop we're trying to parallelize, at dce5 (so before parloops):
> ...
>[local count: 118111600]:
> 
>[local count: 236258638]:
>   # _22 = PHI <0(10), _5(8)>
>   # c9_23 = PHI 
>   # e1_27 = PHI <0(10), e1_17(8)>
>   # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
>   _4 = (short unsigned int) e1_27;
>   c9_14 = _4 * c9_23;
>   _5 = _22 + 1;
>   e1_17 = (int) _5;
>   ivtmp_24 = ivtmp_26 - 1;
>   if (ivtmp_24 != 0)
> goto ; [66.67%]
>   else
> goto ; [33.33%]
> 
>[local count: 157513634]:
>   goto ; [100.00%]
> 
>[local count: 78745004]:
>   # c9_21 = PHI 
> ...
> 
> The loop has one exit phi, and 4 loop header phis. The phis are classified in
> try_create_reduction_list.
> 
> The loop exit phi c9_21 together with loop header phi c9_23 is classified as a
> reduction. The remaining 3 loop header phis are classified as 'simple_iv'.
> 
> The idea is that the 3 ivs will be unified by canonicalize_loop_ivs before
> doing transform_to_exit_first_loop_alt or transform_to_exit_first_loop.
> 
> But, after canonicalize_loop_ivs we still have 3 loop header phis (instead of
> the two we are expecting: one for the reduction and one for the iv):
> ...
>[local count: 189006911]:
>   # c9_23 = PHI 
>   # e1_27 = PHI 
>   # ivtmp_8 = PHI 
>   _22 = ivtmp_8;
>   ivtmp_26 = 2 - ivtmp_8;
>   _4 = (short unsigned int) e1_27;
>   c9_14 = _4 * c9_23;
>   _5 = _22 + 1;
>   e1_17 = (int) _5;
>   ivtmp_24 = ivtmp_26 - 1;
>   if (ivtmp_8 < 1)
> goto ; [66.67%]
>   else
> goto ; [33.33%]
> ...
> The reason for this is that the e1_27 phi is not classified as simple_iv by
> canonicalize_loop_ivs.
> 
> This again can be tracked down to the loop versioning transformation at the
> beginning of gen_parallel_loop. Before the transformation, we have 3 simple_iv
> header phis, after the transformation we have 2. This is caused by scalar
> evolution failing for e1_27.
> 
> 
> III. 
> 
> The loop versioning transformation creates a condition, and a conditional jump
> to either the original loop (which is to be transformed into a parallel loop),
> or a copy of the loop. As it happens the condition for this example is
> hardcoded to 0 != 0, meaning we always jump to the copy. This makes the
> original loop dead code, and it's possible that the scalar evolution is 
> somehow
> influenced by that.
> 
> 
> IV.
> 
> FWIW, moving canonicalize_loop_ivs to before the loop versioning fixes the 
> ICE.
> But that needlessly transforms the copy of the loop, while this is only
> necessary for the original loop (which is to be transformed into a parallel
> loop).

This is the usual "you should not repeat analysis during transform" issue.
The vectorizer gets around this by caching relevant scalar evolution
but obviously that's difficult if using generic stuff like
canonicalize_loop_ivs ...

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-02-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

Tom de Vries  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #2 from Tom de Vries  ---
I.

If we disable transform_to_exit_first_loop_alt, we run into this assert instead
in transform_to_exit_first_loop:
...
  gcc_assert (control_name == NULL_TREE
  && SSA_NAME_VAR (res) == SSA_NAME_VAR (control));
...

So the problem is not specific to transform_to_exit_first_loop_alt.


II.

This is the loop we're trying to parallelize, at dce5 (so before parloops):
...
   [local count: 118111600]:

   [local count: 236258638]:
  # _22 = PHI <0(10), _5(8)>
  # c9_23 = PHI 
  # e1_27 = PHI <0(10), e1_17(8)>
  # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
  _4 = (short unsigned int) e1_27;
  c9_14 = _4 * c9_23;
  _5 = _22 + 1;
  e1_17 = (int) _5;
  ivtmp_24 = ivtmp_26 - 1;
  if (ivtmp_24 != 0)
goto ; [66.67%]
  else
goto ; [33.33%]

   [local count: 157513634]:
  goto ; [100.00%]

   [local count: 78745004]:
  # c9_21 = PHI 
...

The loop has one exit phi, and 4 loop header phis. The phis are classified in
try_create_reduction_list.

The loop exit phi c9_21 together with loop header phi c9_23 is classified as a
reduction. The remaining 3 loop header phis are classified as 'simple_iv'.

The idea is that the 3 ivs will be unified by canonicalize_loop_ivs before
doing transform_to_exit_first_loop_alt or transform_to_exit_first_loop.

But, after canonicalize_loop_ivs we still have 3 loop header phis (instead of
the two we are expecting: one for the reduction and one for the iv):
...
   [local count: 189006911]:
  # c9_23 = PHI 
  # e1_27 = PHI 
  # ivtmp_8 = PHI 
  _22 = ivtmp_8;
  ivtmp_26 = 2 - ivtmp_8;
  _4 = (short unsigned int) e1_27;
  c9_14 = _4 * c9_23;
  _5 = _22 + 1;
  e1_17 = (int) _5;
  ivtmp_24 = ivtmp_26 - 1;
  if (ivtmp_8 < 1)
goto ; [66.67%]
  else
goto ; [33.33%]
...
The reason for this is that the e1_27 phi is not classified as simple_iv by
canonicalize_loop_ivs.

This again can be tracked down to the loop versioning transformation at the
beginning of gen_parallel_loop. Before the transformation, we have 3 simple_iv
header phis, after the transformation we have 2. This is caused by scalar
evolution failing for e1_27.


III. 

The loop versioning transformation creates a condition, and a conditional jump
to either the original loop (which is to be transformed into a parallel loop),
or a copy of the loop. As it happens the condition for this example is
hardcoded to 0 != 0, meaning we always jump to the copy. This makes the
original loop dead code, and it's possible that the scalar evolution is somehow
influenced by that.


IV.

FWIW, moving canonicalize_loop_ivs to before the loop versioning fixes the ICE.
But that needlessly transforms the copy of the loop, while this is only
necessary for the original loop (which is to be transformed into a parallel
loop).

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2018-01-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P4
 CC||jakub at gcc dot gnu.org

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2017-12-19 Thread aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

Aldy Hernandez  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-12-19
 CC||aldyh at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Aldy Hernandez  ---
This test has a write to *fd where fd is uninitialized.

Also, the set to fd looks weird:

fd = *fd;

fd is an int *, and you're trying to set the pointer to the value of a short?

I've fixed the uninitialized problem, while leaving the short to int * move,
and I still get an ICE, so... confirmed:

void
ew (unsigned short int c9, int stuff)
{
  int e1;

  for (;;)
{
  unsigned int *by = 
  int *fd = 

  *fd = c9;
  fd = *fd;
  if (*fd != 0)
for (*by = 0; *by < 2; ++*by)
  c9 *= e1;
}
}

We are failing the following assert in transform_to_exit_first_loop_alt:

  struct reduction_info *red = reduction_phi (reduction_list, phi);
  gcc_assert (virtual_operand_p (res_a)
  || res_a == control
  || red != NULL);

(gdb) call debug_generic_stmt(res_a)
e1_27

(gdb) call debug_generic_stmt(control)
ivtmp_8

FWIW, the defining PHI statement for ivtmp_8 looks weird:

  # ivtmp_8 = PHI <(17)>

As do other PHIs:
  # c9_29 = PHI <(8), (15)>
  # e1_30 = PHI <(8), (15)>

[Bug tree-optimization/83126] [8 Regression] ICE in transform_to_exit_first_loop_alt, at tree-parloops.c:1713

2017-11-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.0