[Bug tree-optimization/61515] [4.10 Regression] Extremely long compile time for generated code

2014-06-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|memory-hog                  |
             Blocks|                            |47344

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> (In reply to Richard Biener from comment #6)
> > 4.9 at -Os takes 5min and ~2.2GB of ram (points-to takes 20%, DF 33%)
> 
> trunk at -Os takes 15min and ~2.1GB of ram (dominator optimization takes 67%)
> 
> trunk at -O1 takes 14min and ~2GB of ram (still DOM at 62%)
> 
> So it seems that on trunk DOM regressed a lot.  Confirmed as 4.10 regression.

With -O1 -fno-tree-dominator-opts behavior is sane:

 df reaching defs          :  86.99 (28%) usr  36.53 (66%) sys 124.04 (34%) wall       0 kB ( 0%) ggc
 tree PTA                  :  10.31 ( 3%) usr   0.15 ( 0%) sys  10.47 ( 3%) wall    1948 kB ( 0%) ggc
 tree SSA incremental      :  98.24 (31%) usr   8.39 (15%) sys 106.63 (29%) wall   13570 kB ( 1%) ggc
 tree loop invariant motion:  14.46 ( 5%) usr   0.10 ( 0%) sys  14.54 ( 4%) wall   14421 kB ( 1%) ggc
 scev constant prop        :  11.14 ( 4%) usr   0.02 ( 0%) sys  11.18 ( 3%) wall   28795 kB ( 2%) ggc
 TOTAL                     : 312.10            55.72           368.92             1523092 kB
312.10user 55.84system 6:09.34elapsed 99%CPU (0avgtext+0avgdata 2073876maxresident)k
19664inputs+13880outputs (130major+38202143minor)pagefaults 0swaps

(well, sane apart from DF and SSA incremental), but with DOM not disabled
we get

 df reaching defs          : 108.63 (11%) usr   8.83 (38%) sys 117.08 (12%) wall       0 kB ( 0%) ggc
 tree PTA                  :  10.51 ( 1%) usr   0.09 ( 0%) sys  10.60 ( 1%) wall    1948 kB ( 0%) ggc
 tree SSA incremental      :  99.41 (10%) usr   9.17 (39%) sys 108.93 (11%) wall   13474 kB ( 1%) ggc
 dominator optimization    : 617.08 (63%) usr   0.32 ( 1%) sys 618.97 (61%) wall   56878 kB ( 3%) ggc
 tree loop invariant motion:   2.15 ( 0%) usr   0.11 ( 0%) sys   2.25 ( 0%) wall   12012 kB ( 1%) ggc
 scev constant prop        :  13.31 ( 1%) usr   0.02 ( 0%) sys  13.36 ( 1%) wall   28348 kB ( 2%) ggc
 TOTAL                     : 981.98            23.25          1007.67             1625119 kB
981.98user 23.34system 16:47.79elapsed 99%CPU (0avgtext+0avgdata 2071112maxresident)k
184inputs+66416outputs (5major+18571554minor)pagefaults 0swaps

(yes, this is with release checking)

The testcase has _lots_ of loops (~11000) inside a big outer one; the
maximum nesting isn't too deep (10 from what I can see).
SSA incremental is likely the loop-closed SSA rewrite - didn't check.

It should be possible to reduce the testcase somewhat if needed.

Eventually the solution for DOM is to disable the new path-based
threading (if that turns out to be the issue) for -fno-expensive-optimizations
(that is, optimize < 2).


[Bug tree-optimization/61515] [4.10 Regression] Extremely long compile time for generated code

2014-06-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32952
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32952&action=edit
stripped down testcase

Stripped down, now it starts to fall off a bit:

 tree SSA incremental      :   1.48 ( 7%) usr   0.42 (33%) sys   1.96 ( 8%) wall    1882 kB ( 1%) ggc
 dominator optimization    :  10.57 (50%) usr   0.03 ( 2%) sys  12.44 (48%) wall    6918 kB ( 4%) ggc
 TOTAL                     :  21.12             1.27            25.67              191269 kB

(just cutting in half three times from the bottom and appending } until it
compiles again)
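
The reduction step described above can be approximated mechanically. The following is only a sketch under stated assumptions: the helper name halve_and_close is hypothetical, it cuts at a line boundary, and it counts braces naively (braces inside string literals, character literals, or comments are counted too), so the result may still need manual fixing before it compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: keep the first half of the lines of SRC and append
// one "}" line for every '{' that is left unmatched by the cut.
std::string halve_and_close(const std::string &src) {
  // Count complete lines (terminated by '\n').
  std::size_t lines = 0;
  for (char c : src)
    if (c == '\n') ++lines;

  // Find the byte offset just past the last line we keep.
  std::size_t keep = lines / 2;
  std::size_t pos = 0;
  for (std::size_t n = 0; n < keep; ++n)
    pos = src.find('\n', pos) + 1;
  assert(pos <= src.size());

  std::string out = src.substr(0, pos);

  // Naive brace depth of the kept prefix.
  long depth = 0;
  for (char c : out) {
    if (c == '{') ++depth;
    else if (c == '}') --depth;
  }

  // Close whatever the cut left open.
  for (; depth > 0; --depth) out += "}\n";
  return out;
}
```

Applying the helper three times in a row corresponds to "cutting in half three times"; each intermediate result still has to be checked against the compiler.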


[Bug tree-optimization/61515] [4.10 Regression] Extremely long compile time for generated code

2014-06-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
All time is spent in invalidate_equivalences.

/* A new value has been assigned to LHS.  If necessary, invalidate any
   equivalences that are no longer valid.  */
static void
invalidate_equivalences (tree lhs, vec<tree> *stack)
{

  for (unsigned int i = 1; i < num_ssa_names; i++)
    if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs)
      record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);

  if (SSA_NAME_VALUE (lhs))
    record_temporary_equivalence (lhs, NULL_TREE, stack);
}

this is obviously quadratic ... nearly all calls are from

static gimple
record_temporary_equivalences_from_stmts_at_dest (edge e,
                                                  vec<tree> *stack,
                                                  tree (*simplify) (gimple,
                                                                    gimple),
                                                  bool backedge_seen)
{
...
      else if (backedge_seen)
        invalidate_equivalences (gimple_get_lhs (stmt), stack);
    }
  return stmt;
}

I think you should record a bitmap of the SSA names whose equivalences
need to be invalidated, and only at the end of this do a single scan over
all SSA names.
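
To illustrate the suggestion, here is a standalone toy model, not GCC code. Everything in it is an assumption made for the sketch: EquivTable, kNone, invalidate_eagerly and invalidate_deferred are hypothetical names, SSA names are modelled as plain indices, and the unwinding stack used by record_temporary_equivalence is left out. It also assumes no new equivalences are recorded between the individual invalidations, which the real pass would have to account for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for "no recorded value" (stands in for NULL_TREE in this toy model).
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Toy stand-in for the SSA name table: value_of[i] is the index of the
// name that name i is currently recorded as equivalent to, or kNone.
struct EquivTable {
  std::vector<std::size_t> value_of;
  explicit EquivTable(std::size_t n) : value_of(n, kNone) {}
};

// Pattern from the comment: one full scan over all names per assigned LHS.
// Returns the number of table entries visited.
std::size_t invalidate_eagerly(EquivTable &t,
                               const std::vector<std::size_t> &assigned) {
  std::size_t visited = 0;
  for (std::size_t lhs : assigned) {
    assert(lhs < t.value_of.size());
    for (std::size_t i = 0; i < t.value_of.size(); ++i, ++visited)
      if (t.value_of[i] == lhs)
        t.value_of[i] = kNone;
    t.value_of[lhs] = kNone;
  }
  return visited;
}

// Deferred variant: mark every assigned LHS in a bitmap first, then do a
// single scan. Returns the number of table entries visited.
std::size_t invalidate_deferred(EquivTable &t,
                                const std::vector<std::size_t> &assigned) {
  std::vector<bool> pending(t.value_of.size(), false);
  for (std::size_t lhs : assigned) {
    assert(lhs < pending.size());
    pending[lhs] = true;
  }
  std::size_t visited = 0;
  for (std::size_t i = 0; i < t.value_of.size(); ++i, ++visited) {
    std::size_t v = t.value_of[i];
    if (pending[i] || (v != kNone && pending[v]))
      t.value_of[i] = kNone;
  }
  return visited;
}
```

With N names and K invalidated LHSs, the eager version visits N * K entries while the deferred one visits N, and under the assumptions above both leave the table in the same state.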


[Bug tree-optimization/61515] [4.10 Regression] Extremely long compile time for generated code

2014-06-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-06-16
                 CC|                            |law at gcc dot gnu.org
            Version|unknown                     |4.10.0
   Target Milestone|---                         |4.10.0
            Summary|Extremely long compile time |[4.10 Regression] Extremely
                   |for generated code          |long compile time for
                   |                            |generated code
     Ever confirmed|0                           |1

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> 4.9 at -Os takes 5min and ~2.2GB of ram (points-to takes 20%, DF 33%)

trunk at -Os takes 15min and ~2.1GB of ram (dominator optimization takes 67%)

trunk at -O1 takes 14min and ~2GB of ram (still DOM at 62%)

So it seems that on trunk DOM regressed a lot.  Confirmed as 4.10 regression.