[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2023-07-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |8.0

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-05-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #21 from Richard Biener  ---
*** Bug 85862 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-05-10 Thread matthew.weber at rockwellcollins dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Matt Weber  changed:

   What|Removed |Added

 CC||matthew.weber@rockwellcolli
   ||ns.com

--- Comment #20 from Matt Weber  ---
Another datapoint that libnss 3.35 fails to build on the microblaze arch
(uclibc) with any of the 6.x series and this bug's patch resolves that
(https://github.com/gcc-mirror/gcc/commit/df03ebc3574a0d7893127e3b9754a01abf2d8b70).
 

microblazeel-buildroot-linux-uclibc-gcc -o
Linux2.6_microblazeel_microblazeel-buildroot-linux-uclibc-gcc.br_real_glibc_PTH_OPT.OBJ/sslgrp.o
-c -O2 -fPIC   -pipe -ffunction-sections -fdata-sections -DHAVE_STRERROR
-DLINUX -Dlinux -Wall -Werror -DXP_UNIX -UDEBUG -DNDEBUG -D_REENTRANT
-DNSS_NO_INIT_SUPPORT -DUSE_UTIL_DIRECTLY -DNO_NSPR_10_SUPPORT
-DSSL_DISABLE_DEPRECATED_CIPHER_SUITE_NAMES -Isysroot/usr/include/nspr
-Ilibnss-3.35/dist/include -I../../../dist/public/nss
-I../../../dist/private/nss  sslgrp.c


However the libnss build works fine with the 7.x branch and master.  I've been
bisecting all 3 branches for a few days and figured I should just try this
patch before debugging the 6.x branch further.  I even went back and tried to
find common ancestors between the 6.x and 7.x and couldn't get a point where
the good/bad builds matched up.  Guessing backports on 6.x from master
introduced the libnss bug variant as the gcc/cse* and gcc/alias.x files had a
lot of updates from 5.x to 7.x.

Here's the stack trace on cc1 when its hung building a libnss sslgrp.c
file...
( A lot more find_base_terms() before this one)
#71 0x005ae67d in find_base_term(rtx_def*) ()
#72 0x005ae532 in find_base_term(rtx_def*) ()
#73 0x005ae67d in find_base_term(rtx_def*) ()
#74 0x005ae532 in find_base_term(rtx_def*) ()
#75 0x005ae67d in find_base_term(rtx_def*) ()
#76 0x005ae568 in find_base_term(rtx_def*) ()
#77 0x005b137d in write_dependence_p(rtx_def const*, rtx_def const*,
machine_mode, rtx_def*, bool, bool, bool) ()
#78 0x005b1585 in canon_anti_dependence(rtx_def const*, bool, rtx_def
const*, machine_mode, rtx_def*) ()
#79 0x0061c5e3 in cselib_invalidate_mem(rtx_def*) ()
#80 0x0061d32d in cselib_record_sets(rtx_insn*) ()
#81 0x0061e5c8 in cselib_process_insn(rtx_insn*) ()
#82 0x008774b8 in reload_cse_regs_1() ()
#83 0x008777dc in (anonymous
namespace)::pass_postreload_cse::execute(function*) ()
#84 0x0086714d in execute_one_pass(opt_pass*) ()

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-05-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #19 from rguenther at suse dot de  ---
On Wed, 9 May 2018, romain.naour at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180
> 
> Romain Naour  changed:
> 
>What|Removed |Added
> 
>  CC||romain.naour at gmail dot com
> 
> --- Comment #18 from Romain Naour  ---
> (In reply to Richard Biener from comment #17)
> > (In reply to romain.naour from comment #16)
> > > Hi,
> > > 
> > > gcc 7.3.0 is affected by this bug but only on microblaze architecture, see
> > > [1].
> > > Do you plan to backport this patch on gcc 7.x?
> > > It is safe to do so without take the risk to break something with other
> > > architecture or optimization level?
> > > 
> > > Best regards,
> > > Romain
> > > 
> > > [1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html
> > 
> > The bug isn't a regression so technically it doesn't qualify.  OTOH it
> > looks reasonably safe to backport and the bug is annoying.  Jakub, would
> > you be ok with a backport?
> 
> Ping?

Jakub said it doesn't qualify give it's too risky.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-05-09 Thread romain.naour at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Romain Naour  changed:

   What|Removed |Added

 CC||romain.naour at gmail dot com

--- Comment #18 from Romain Naour  ---
(In reply to Richard Biener from comment #17)
> (In reply to romain.naour from comment #16)
> > Hi,
> > 
> > gcc 7.3.0 is affected by this bug but only on microblaze architecture, see
> > [1].
> > Do you plan to backport this patch on gcc 7.x?
> > It is safe to do so without take the risk to break something with other
> > architecture or optimization level?
> > 
> > Best regards,
> > Romain
> > 
> > [1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html
> 
> The bug isn't a regression so technically it doesn't qualify.  OTOH it
> looks reasonably safe to backport and the bug is annoying.  Jakub, would
> you be ok with a backport?

Ping?

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #17 from Richard Biener  ---
(In reply to romain.naour from comment #16)
> Hi,
> 
> gcc 7.3.0 is affected by this bug but only on microblaze architecture, see
> [1].
> Do you plan to backport this patch on gcc 7.x?
> It is safe to do so without take the risk to break something with other
> architecture or optimization level?
> 
> Best regards,
> Romain
> 
> [1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html

The bug isn't a regression so technically it doesn't qualify.  OTOH it
looks reasonably safe to backport and the bug is annoying.  Jakub, would
you be ok with a backport?

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-30 Thread romain.naour at smile dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #16 from romain.naour at smile dot fr ---
Hi,

gcc 7.3.0 is affected by this bug but only on microblaze architecture, see [1].
Do you plan to backport this patch on gcc 7.x?
It is safe to do so without take the risk to break something with other
architecture or optimization level?

Best regards,
Romain

[1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||8.0
 Resolution|--- |FIXED
  Known to fail||7.3.1

--- Comment #15 from Richard Biener  ---
Fixed for GCC 8.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #14 from Richard Biener  ---
Author: rguenth
Date: Fri Apr  6 08:30:52 2018
New Revision: 259166

URL: https://gcc.gnu.org/viewcvs?rev=259166=gcc=rev
Log:
2018-04-06  Richard Biener  

PR middle-end/85180
* alias.c (find_base_term): New wrapper around find_base_term
unwinding CSELIB_VAL_PTR changes.
(find_base_term): Do not restore CSELIB_VAL_PTR during the
recursion.

* gcc.dg/pr85180.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/pr85180.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/alias.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #13 from Richard Biener  ---
Created attachment 43851
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43851=edit
other alternative

Like this.

11.00user 0.01system 0:11.01elapsed 99%CPU (0avgtext+0avgdata
43204maxresident)k
0inputs+144outputs (0major+7848minor)pagefaults 0swaps

needed to avoid calling cselib_sp_based_value_p because that didn't like the
"messed up" val_rtx.  Now the question is whether that's too dangerous given
we don't control FIND_BASE_TERM as defined by targets... (I could guard that
with !VALUE and re-do that below in the VALUE case after the caching, but
eventually that function may recurse itself and wreck the whole thing
anyways...  it also looks like it is invented to do some RTX massaging
before finding the base value, not find it itself?)

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #12 from Jakub Jelinek  ---
True, but might eat more compile time memory.
Further, the question stands, is what find_base_term returns for a particular
VALUE cacheable for the whole duration between cselib_init and cselib_finish in
each pass, or not?  Would be nice to again do instrumented bootstrap/regtest to
gather data whether it ever differs and how often if it does.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #11 from Richard Biener  ---
So it looks like caching in cselib_val->rtx and unwinding that via a stack
might be the fastest variant (if we care)?

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #10 from Richard Biener  ---
Created attachment 43850
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43850=edit
alternative patch

The most simple hash_map variant is the attached.  Comparing (-O0 optimized
cc1)
compile-time for your testcase shows

hash-map:
> /usr/bin/time ./cc1 -quiet t.c -O -I include 
13.80user 0.01system 0:13.83elapsed 99%CPU (0avgtext+0avgdata
43556maxresident)k
0inputs+144outputs (0major+7967minor)pagefaults 0swaps

vector-unwind:
> /usr/bin/time ./cc1 -quiet t.c -O -I include 
12.10user 0.01system 0:12.12elapsed 99%CPU (0avgtext+0avgdata
43656maxresident)k
0inputs+144outputs (0major+7949minor)pagefaults 0swaps

just building alias.c with -O2:

hash-map:
11.28user 0.01system 0:11.30elapsed 99%CPU (0avgtext+0avgdata
43256maxresident)k
0inputs+144outputs (0major+7855minor)pagefaults 0swaps

vector-unwind:
11.03user 0.01system 0:11.04elapsed 100%CPU (0avgtext+0avgdata
43168maxresident)k
0inputs+144outputs (0major+7872minor)pagefaults 0swaps

caching directly in an enlarged cselib_val yields in

11.03user 0.02system 0:11.05elapsed 99%CPU (0avgtext+0avgdata
43148maxresident)k
0inputs+144outputs (0major+7852minor)pagefaults 0swaps

Index: gcc/alias.c
===
--- gcc/alias.c (revision 259082)
+++ gcc/alias.c (working copy)
@@ -1939,6 +1939,9 @@ find_base_term (rtx x)
   if (cselib_sp_based_value_p (val))
return static_reg_base_value[STACK_POINTER_REGNUM];

+  if (val->base_term != (rtx)-1)
+   return val->base_term;
+  val->base_term = NULL_RTX;
   f = val->locs;
   /* Temporarily reset val->locs to avoid infinite recursion.  */
   val->locs = NULL;
@@ -1953,6 +1956,7 @@ find_base_term (rtx x)
  break;

   val->locs = f;
+  val->base_term = ret;
   return ret;

 case LO_SUM:

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #9 from rguenther at suse dot de  ---
On Thu, 5 Apr 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180
> 
> --- Comment #8 from Jakub Jelinek  ---
> If find_base_term always returned whatever first returned non-NULL up to the
> ultimate caller, then I think the above would work fine.  Sadly, that is the
> case only in most of the spots, not all of them.
> The PLUS/MINUS case is the major exception (and *_EXTEND too, but that doesn't
> matter):
> rtx base = find_base_term (tmp1);
> if (base != NULL_RTX
> && ((REG_P (tmp1) && REG_POINTER (tmp1))
>  || known_base_value_p (base)))
>   return base;
> base = find_base_term (tmp2);
> if (base != NULL_RTX
> && ((REG_P (tmp2) && REG_POINTER (tmp2))
>  || known_base_value_p (base)))
>   return base;
> If tmp1 or tmp2 aren't REGs with REG_POINTER (the REG with REG_POINTER case
> isn't that interesting, because the find_base_term recursion is one level only
> in that case and no VALUEs are involved), then we only return whatever has 
> been
> returned if known_base_value_p (base).  So other bases (that is group (1),
> incoming arguments, right?) might be returned before your patch and not after
> it, e.g. if
> we at toplevel call find_base_term on a VALUE that has (plus (value 123) 
> (value
> 345)) and (value 456) locs, value 123 has (minus (value 456) (value 345)),
> value 345 doesn't have a base term and value 456 is incoming argument.  When
> recursing on the plus and minus, we see that find_base_term on 456 returned
> non-NULL base that is !known_base_value_p (base) and return NULL, then we walk
> value 456 again and at this point without your patch we would return the
> incoming ADDRESS, but with the patch don't, because we've already walked it.
> 
> Perhaps those cases are sufficiently rare that we don't care, still, it would
> be nice to gather at least some statistics on this on multiple targets (say
> x86_64 and powerpc64) bootstraps/regtests, tweak your patch so that the 2
> argument find_base_term can have the second argument NULL and in that case it
> behaves the old way and call it twice, once with NULL argument and then with
> non-NULL one and compare the result, gather statistics on how many toplevel
> find_base_term calls there were and how many out of them resulted in different
> result.

So we could conservatively address this by

  unsigned saved_stackpos = visited_vals.length ();
  rtx base = find_base_term (tmp1, visited_vals);
  if (base != NULL_RTX
&& ((REG_P (tmp1) && REG_POINTER (tmp1))
 || known_base_value_p (base)))
  return base;
  if (base != NULL_RTX)
unwind-visited_vals-to-saved-stackpos;

and the same for the tmp2 case.  But I'm not sure whether that's a good
idea since it looks like we may end up re-introducing the exponential
complexity if we never return any of the found bases.

Fixing this case with keeping the current behavior would instead ask
for caching the outcome of find_base_term on VALUEs where we then
can remove the val->locs frobbing.

The question is whether that is worth it (as you say) and what
the cost of doing so is.  It looks like we may be able to
change cselib_val in a way to make a direct-mapped cache
possible without too much overhead?  find_base_term is quite
performance sensitive IIRC.  OTOH cselib_val might be size-sensitive...
Using a bit in the RTX and re-using cselib_val->val_rtx and keeping
an unwinding stack for that might be possible if the more trivial
implementation with a hash-map is too expensive.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #8 from Jakub Jelinek  ---
If find_base_term always returned whatever first returned non-NULL up to the
ultimate caller, then I think the above would work fine.  Sadly, that is the
case only in most of the spots, not all of them.
The PLUS/MINUS case is the major exception (and *_EXTEND too, but that doesn't
matter):
rtx base = find_base_term (tmp1);
if (base != NULL_RTX
&& ((REG_P (tmp1) && REG_POINTER (tmp1))
 || known_base_value_p (base)))
  return base;
base = find_base_term (tmp2);
if (base != NULL_RTX
&& ((REG_P (tmp2) && REG_POINTER (tmp2))
 || known_base_value_p (base)))
  return base;
If tmp1 or tmp2 aren't REGs with REG_POINTER (the REG with REG_POINTER case
isn't that interesting, because the find_base_term recursion is one level only
in that case and no VALUEs are involved), then we only return whatever has been
returned if known_base_value_p (base).  So other bases (that is group (1),
incoming arguments, right?) might be returned before your patch and not after
it, e.g. if
we at toplevel call find_base_term on a VALUE that has (plus (value 123) (value
345)) and (value 456) locs, value 123 has (minus (value 456) (value 345)),
value 345 doesn't have a base term and value 456 is incoming argument.  When
recursing on the plus and minus, we see that find_base_term on 456 returned
non-NULL base that is !known_base_value_p (base) and return NULL, then we walk
value 456 again and at this point without your patch we would return the
incoming ADDRESS, but with the patch don't, because we've already walked it.

Perhaps those cases are sufficiently rare that we don't care, still, it would
be nice to gather at least some statistics on this on multiple targets (say
x86_64 and powerpc64) bootstraps/regtests, tweak your patch so that the 2
argument find_base_term can have the second argument NULL and in that case it
behaves the old way and call it twice, once with NULL argument and then with
non-NULL one and compare the result, gather statistics on how many toplevel
find_base_term calls there were and how many out of them resulted in different
result.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #7 from Richard Biener  ---
Created attachment 43849
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43849=edit
patch

Patch I am testing.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #6 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #4)
> Actually, find_base_value is probably ok, it doesn't handle VALUEs and for
> PLUS/MINUS it just guesses one operand on which to recurse, rather than both.

find_base_value is ok in that it walks the full expression once but the
linear complexity breaks down once the expression forms a tree (thus we
get sharing, only possible via VALUEs I guess).

> Another possibility is to add some counter and count how many VALUEs we've
> processed for one toplevel call (or how many recursions we've done) and
> limit that by some constant or param, and just return NULL after we hit that
> limit.

Hmm, so the val->locs = NULL trick would need to be extended to only "pop"
the NULLs before the toplevel call returns.

The following fixes the testcase.  On the testcase most of the time the
stack remains empty, once we have 1 entry and 4 times we have 236
(the problematic one I guess).  On your last testcase (which is also
fixed) I get 3 times 2000 and 2001 times 2001 while in the majority
of calls 0, 1 or 2 (added printf of visited_vals.length ()).

> ./cc1 -quiet t.c -O -I include 2>&1 | sort | uniq -c
 795295 0
 130751 1
   1488 2
  3 2000
   2001 2001

Index: gcc/alias.c
===
--- gcc/alias.c (revision 259082)
+++ gcc/alias.c (working copy)
@@ -1876,7 +1876,8 @@ rtx_equal_for_memref_p (const_rtx x, con
 }

 static rtx
-find_base_term (rtx x)
+find_base_term (rtx x, vec _vals)
 {
   cselib_val *val;
   struct elt_loc_list *l, *f;
@@ -1910,7 +1911,7 @@ find_base_term (rtx x)
 case POST_DEC:
 case PRE_MODIFY:
 case POST_MODIFY:
-  return find_base_term (XEXP (x, 0));
+  return find_base_term (XEXP (x, 0), visited_vals);

 case ZERO_EXTEND:
 case SIGN_EXTEND:  /* Used for Alpha/NT pointers */
@@ -1921,7 +1922,7 @@ find_base_term (rtx x)
return 0;

   {
-   rtx temp = find_base_term (XEXP (x, 0));
+   rtx temp = find_base_term (XEXP (x, 0), visited_vals);

if (temp != 0 && CONSTANT_P (temp))
  temp = convert_memory_address (Pmode, temp);
@@ -1940,7 +1941,9 @@ find_base_term (rtx x)
return static_reg_base_value[STACK_POINTER_REGNUM];

   f = val->locs;
-  /* Temporarily reset val->locs to avoid infinite recursion.  */
+  /* Reset val->locs to avoid infinite recursion.  */
+  if (f)
+   visited_vals.safe_push (std::make_pair (val, f));
   val->locs = NULL;

   for (l = f; l; l = l->next)
@@ -1949,16 +1952,15 @@ find_base_term (rtx x)
&& !CSELIB_VAL_PTR (l->loc)->locs->next
&& CSELIB_VAL_PTR (l->loc)->locs->loc == x)
  continue;
-   else if ((ret = find_base_term (l->loc)) != 0)
+   else if ((ret = find_base_term (l->loc, visited_vals)) != 0)
  break;

-  val->locs = f;
   return ret;

 case LO_SUM:
   /* The standard form is (lo_sum reg sym) so look only at the
  second operand.  */
-  return find_base_term (XEXP (x, 1));
+  return find_base_term (XEXP (x, 1), visited_vals);

 case CONST:
   x = XEXP (x, 0);
@@ -1984,7 +1986,7 @@ find_base_term (rtx x)
   other operand is the base register.  */

if (tmp1 == pic_offset_table_rtx && CONSTANT_P (tmp2))
- return find_base_term (tmp2);
+ return find_base_term (tmp2, visited_vals);

/* If either operand is known to be a pointer, then prefer it
   to determine the base term.  */
@@ -2001,12 +2003,12 @@ find_base_term (rtx x)
   term is from a pointer or is a named object or a special address
   (like an argument or stack reference), then use it for the
   base term.  */
-   rtx base = find_base_term (tmp1);
+   rtx base = find_base_term (tmp1, visited_vals);
if (base != NULL_RTX
&& ((REG_P (tmp1) && REG_POINTER (tmp1))
 || known_base_value_p (base)))
  return base;
-   base = find_base_term (tmp2);
+   base = find_base_term (tmp2, visited_vals);
if (base != NULL_RTX
&& ((REG_P (tmp2) && REG_POINTER (tmp2))
 || known_base_value_p (base)))
@@ -2020,7 +2022,7 @@ find_base_term (rtx x)

 case AND:
   if (CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) != 0)
-   return find_base_term (XEXP (x, 0));
+   return find_base_term (XEXP (x, 0), visited_vals);
   return 0;

 case SYMBOL_REF:
@@ -2032,6 +2034,16 @@ find_base_term (rtx x)
 }
 }

+static rtx
+find_base_term (rtx x)
+{
+  auto_vec visited_vals;
+  rtx res = find_base_term (x, visited_vals);
+  for (unsigned i = 0; i < visited_vals.length (); ++i)
+visited_vals[i].first->locs = visited_vals[i].second;
+  return res;
+}
+
 /* Return true if accesses to address X may alias accesses based
on the stack pointer.  */

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #5 from Jakub Jelinek  ---
Another testcase running into this.

char *bar (void);
__INTPTR_TYPE__ baz (void);

void
foo (__INTPTR_TYPE__ *q)
{
  char *p = bar ();
  __INTPTR_TYPE__ a = baz ();
  __INTPTR_TYPE__ b = baz ();
  int i = 0;
#define X q[i++] = a; q[i++] = b; a = a + b; b = b + a;
#define Y X X X X X X X X X X
#define Z Y Y Y Y Y Y Y Y Y Y
  Z Z Z Z Z Z Z Z Z Z
  p[a] = 1;
  p[b] = 2;
}

With Y X X X instead of the 10 Zs I see with -O1 -ftime-report:
 dead store elim1   :   0.43 ( 81%)   0.00 (  0%)   0.43 ( 81%)
  6 kB (  0%)
With Y X X X X:
 dead store elim1   :   1.15 ( 88%)   0.00 (  0%)   1.15 ( 88%)
  7 kB (  0%)
With Y X X X X X:
 dead store elim1   :   3.29 ( 92%)   0.00 (  0%)   3.29 ( 91%)
  7 kB (  0%)
With Y X X X X X X:
 dead store elim1   :   9.18 ( 93%)   0.00 (  0%)   9.19 ( 93%)
  8 kB (  0%)
With Y X X X X X X X:
 dead store elim1   :  24.71 ( 94%)   0.00 (  0%)  24.73 ( 94%)
  8 kB (  1%)
With Y X X X X X X X X:
 dead store elim1   :  68.89 ( 94%)   0.00 (  0%)  68.94 ( 94%)
  9 kB (  1%)
With Y X X X X X X X X X:
 dead store elim1   : 190.42 ( 95%)   0.00 (  0%) 190.56 ( 95%)
  9 kB (  1%)
With Y Y:
 dead store elim1   : 522.59 ( 95%)   0.00 (  0%) 522.96 ( 95%)
 10 kB (  1%)

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #4 from Jakub Jelinek  ---
Actually, find_base_value is probably ok, it doesn't handle VALUEs and for
PLUS/MINUS it just guesses one operand on which to recurse, rather than both.

Another possibility is to add some counter and count how many VALUEs we've
processed for one toplevel call (or how many recursions we've done) and limit
that by some constant or param, and just return NULL after we hit that limit.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

2018-04-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

--- Comment #3 from Jakub Jelinek  ---
Simplified testcase:
struct words
{
#define V(n) short word_##n;
#define W(n) V(n)
#define X(n) W(n##0) W(n##1) W(n##2) W(n##3) W(n##4) W(n##5) W(n##6) W(n##7)
W(n##8) W(n##9)
#define Y X(0) X(1) X(2) X(3) X(4) X(5)
  Y
};

static char *pbuf, **pbuf_ptrs;

void
foo (struct words *w)
{
  char *aa = pbuf, **cptr = pbuf_ptrs;

  *cptr = aa;
#undef V
#define V(n) \
  __builtin_sprintf (*(cptr++), "word_" #n " %d ", w->word_##n); \
  *cptr = (aa += __builtin_strlen (aa) + 1);
  Y

  __builtin_sprintf (*(cptr++), " ");
  *cptr = (aa += __builtin_strlen (aa) + 1);
}

void
bar (char *x, char **y)
{
  pbuf = x;
  pbuf_ptrs = y;
}

Infinite recursion is not possible and is not what's going on here, that is
prevented through:
  f = val->locs;
  /* Temporarily reset val->locs to avoid infinite recursion.  */
  val->locs = NULL;
...
  val->locs = f;
in find_base_term.

I think the problem is rather that for PLUS and MINUS we recurse up to twice
(once for each argument) and for VALUEs we can recurse even more times if it
has long locs list, so if we are unlucky it can be exponential, one single
toplevel find_base_term call recursing transitively on a particular VALUE many
times.

Now, I guess caching find_base_term return value for each VALUE (or perhaps
delayed, look up say 20 VALUEs without caching and only then build the cache
and start caching) could be a way out of this, but the val->locs = NULL; hack
above will not be very good for that, because then we might cache a negative
answer even for VALUEs that would yield non-NULL answer if we called
find_base_term for it from the toplevel.  Dunno if that is acceptable or not.

I also wonder if we couldn't (either in new struct cselib_val field or on the
side table) just record find_base_term value (and find_base_value?) of each
VALUE we create when we process the instruction that created the VALUE.  Or is
the find_base_term not meant to be sticky for each VALUE forever and is
supposed to change as further insns are processed (invalidate locs from VALUEs
etc.)?

[Bug rtl-optimization/85180] Infinite (?) loop in RTL DSE optimizer

2018-04-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Richard Biener  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Patched find_base_term that tracks width/depth shows

#0  rhs_regno (x=0x7f3e30 <_start>)
at /space/rguenther/src/svn/early-lto-debug/gcc/rtl.h:1895
#1  0x00bb724d in find_base_term (x=0x76a84f18, width=118, 
depth=176) at /space/rguenther/src/svn/early-lto-debug/gcc/alias.c:1894

where one major offender of width is the CSELIB loc iteration.  Adding a
hash-set to see if we recurse into already visited refs shows:

#1  0x00bb7252 in find_base_term (x=0x2805d60, width=61, depth=121, 
set=...) at /space/rguenther/src/svn/early-lto-debug/gcc/alias.c:1886
1886  gcc_assert (!set.add (x));

so indeed this is an endless recursion, not just one that is very deep.

(gdb) p debug_rtx (x)
(value:DI 11:4356 @0x2805d60/0x27f5e40)

coming via

(gdb) p debug_rtx (x)
(minus:DI (value:DI 20:20 @0x2805e38/0x27f5ff0)
(value:DI 11:4356 @0x2805d60/0x27f5e40))

(gdb) p debug_rtx (x)
(value:DI 21:4465 @0x2805e50/0x27f6020)

(plus:DI (value:DI 21:4465 @0x2805e50/0x27f6020)
(const_int 1 [0x1]))
...

not sure if values should ever form a cycle this way?

Anyhow - limiting search width/depth would of course catch this.

[Bug rtl-optimization/85180] Infinite (?) loop in RTL DSE optimizer

2018-04-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Richard Biener  changed:

   What|Removed |Added

   Keywords||compile-time-hog
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-04-04
Version|unknown |8.0.1
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
find_base_term is known to be quadratic... (there's some dups for this I bet). 
I suppose you tested trunk.

I guess since this is present for such a long time now we should simply limit
the with and/or depth of the find_base_term recursion given it's safe to
return 0.