[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher

2012-11-05 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157



--- Comment #25 from Steven Bosscher steven at gcc dot gnu.org 2012-11-05 22:02:18 UTC ---

This problem has been fixed in DF with the DF_RD_PRUNE_DEAD_DEFS flag.

I see no good reason to deprecate the param, though.  For such a huge
loop, any loop invariant code motion is unlikely to be a win.



2009-02-20 Thread jakub at gcc dot gnu dot org


--- Comment #24 from jakub at gcc dot gnu dot org  2009-02-20 12:56 ---
Subject: Bug 39157

Author: jakub
Date: Fri Feb 20 12:56:01 2009
New Revision: 144320

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144320
Log:
PR middle-end/39157
* Makefile.in (loop-invariant.o): Depend on $(PARAMS_H).
* params.h (LOOP_INVARIANT_MAX_BBS_IN_LOOP): Define.
* params.def (loop-invariant-max-bbs-in-loop): New parameter.
* opts.c (decode_options): Set loop-invariant-max-bbs-in-loop
parameter to 1000 for -O1 by default.
* doc/invoke.texi (loop-invariant-max-bbs-in-loop): Document new
parameter.
* loop-invariant.c: Include params.h.
(move_loop_invariants): Don't call move_single_loop_invariants on
very large loops.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/doc/invoke.texi
trunk/gcc/loop-invariant.c
trunk/gcc/opts.c
trunk/gcc/params.def
trunk/gcc/params.h
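
The effect of the move_loop_invariants change in the log above can be
sketched in a few lines of C.  This is a simplified, self-contained model
and not the GCC source: struct toy_loop, count_loops_processed and the
max_bbs_in_loop argument are hypothetical stand-ins for GCC's loop
structures and for the value of LOOP_INVARIANT_MAX_BBS_IN_LOOP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop descriptor; only the block count
   matters for this sketch.  */
struct toy_loop {
    unsigned num_nodes;   /* number of basic blocks in the loop */
};

/* Return how many loops would still get invariant motion when loops
   with more than max_bbs_in_loop basic blocks are skipped.  */
static size_t count_loops_processed(const struct toy_loop *loops, size_t n,
                                    unsigned max_bbs_in_loop)
{
    size_t processed = 0;

    assert(loops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Very large loops are skipped: analysing them is slow and can
           need a lot of memory.  */
        if (loops[i].num_nodes <= max_bbs_in_loop)
            processed++;   /* a real pass would move invariants here */
    }
    return processed;
}
```

With loops of 12, 980 and 56000 blocks and a limit of 1000, only the first
two would be processed; the 56000-block loop is left alone.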



2009-02-13 Thread jakub at gcc dot gnu dot org


--- Comment #21 from jakub at gcc dot gnu dot org  2009-02-13 09:17 ---
To answer 2), I bet fwprop would suffer the same problem, but fwprop is
disabled at -O1, LICM is not.  I can try it today.



2009-02-13 Thread rguenther at suse dot de


--- Comment #22 from rguenther at suse dot de  2009-02-13 11:06 ---
Subject: Re:  Code that compiles fine in 1GB of memory
 with 4.1.2 requires  20GB in 4.2.* and higher

On Thu, 12 Feb 2009, lucier at math dot purdue dot edu wrote:

> --- Comment #19 from lucier at math dot purdue dot edu  2009-02-12 20:51 ---
> Subject: Re:  Code that compiles fine in 1GB of
>  memory with 4.1.2 requires  20GB in 4.2.* and higher
>
> On Thu, 2009-02-12 at 16:52 +0000, rguenth at gcc dot gnu dot org wrote:
> > --- Comment #16 from rguenth at gcc dot gnu dot org  2009-02-12 16:52 ---
> > Actually for PR26854 it is just one loop that is detected, covering all of
> > the function (with approx. 56000 basic blocks and one basic-block that has
> > edges to all other basic blocks in the loop).
>
> Richard:
>
> I'm wondering if you could look at a smaller file that's generated in a
> somewhat different way.
>
> At
>
> http://www.math.purdue.edu/~lucier/bugzilla/8/
>
> there's a file _num.i.gz that I think should have smaller (nested) loops
> than the entire file, for example, from the label
>
> ___L189__23__23_bignum_2e__2a_:
>
> at line 50031 to just before label
>
> ___L190__23__23_bignum_2e__2a_:
>
> at line 50105.
>
> Moving loop invariants out of this loop might help if it is detected as a
> loop, but I don't know how to check whether it is.
>
> Perhaps you might check and report whether this small loop is treated as
> a loop or whether, again, the entire function is the only loop
> detected.

Yes, there are several small loops detected for this testcase (139 in
total, including one large with the computed goto).

Btw, thanks for the smaller testcase.

Richard.



2009-02-13 Thread rguenth at gcc dot gnu dot org


--- Comment #23 from rguenth at gcc dot gnu dot org  2009-02-13 11:08 ---
Lemme close this bug as a dup of the one marked as regression.

*** This bug has been marked as a duplicate of 26854 ***


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE



2009-02-12 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2009-02-12 09:58 ---
Is this the same as PR26854?  Note that it would be extremely helpful to have
a testcase that fits into less memory to be able to analyze this.  As you say
it is autogenerated source, can you shrink the testcase?

Thanks.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org



2009-02-12 Thread jakub at gcc dot gnu dot org


--- Comment #2 from jakub at gcc dot gnu dot org  2009-02-12 10:41 ---
The peak on x86_64 (with normal checking cc1) seems to be around 29.3GB, but
most of the time top numbers were in the .5GB - 1GB or at most 1.3GB range,
with just a single very high peak.  Will now look closer where all the memory
is needed.



2009-02-12 Thread rguenth at gcc dot gnu dot org


--- Comment #3 from rguenth at gcc dot gnu dot org  2009-02-12 10:46 ---
If it is like PR26854 then it is all df memory.  The gambit stuff consists of
a state machine with loads of computed gotos (thus a nearly fully connected
CFG) IIRC.
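
For readers who have not seen this style of generated code, here is a
minimal sketch of a computed-goto state machine, using the GNU C
labels-as-values extension.  It is a toy and not the Gambit output: the
function name run_program and the three opcodes are hypothetical.  Each
handler ends in its own "goto *", so every handler can jump to every
other handler, which is what makes the CFG nearly fully connected.

```c
#include <assert.h>

/* Toy interpreter using GNU C labels-as-values (a GCC extension).
   Opcodes: 0 = halt, 1 = increment, 2 = double.  The caller must pass
   only these opcodes and must terminate the program with 0.  */
static int run_program(const unsigned char *code)
{
    static void *const dispatch[] = { &&op_halt, &&op_inc, &&op_dbl };
    int acc = 0;

    assert(code != 0);
    /* Jump to the handler of the first opcode.  */
    goto *dispatch[*code++];

op_inc:
    acc += 1;
    goto *dispatch[*code++];   /* edge to every handler */
op_dbl:
    acc *= 2;
    goto *dispatch[*code++];   /* edge to every handler */
op_halt:
    return acc;
}
```

With N handlers written this way the number of CFG edges grows roughly
like N * N, since each indirect jump may reach each label.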



2009-02-12 Thread steven at gcc dot gnu dot org


--- Comment #4 from steven at gcc dot gnu dot org  2009-02-12 11:26 ---
Even for code with lots of computed gotos, the CFG should not be fully
connected. We factorize computed gotos to avoid exactly that.  At least we used
to.  Maybe the factorizing is broken, or it is undone somewhere too early?
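
As an illustration of what factoring means here, the sketch below routes
every handler through one shared dispatch block, so the function contains
a single computed goto.  It is hand-written toy code under the same
assumptions as any small interpreter example (hypothetical name
run_program_factored, three made-up opcodes); GCC performs the
equivalent transformation internally on its own representation.

```c
#include <assert.h>

/* Same toy interpreter idea, but with the computed goto factored into
   one block.  Opcodes: 0 = halt, 1 = increment, 2 = double.  */
static int run_program_factored(const unsigned char *code)
{
    static void *const dispatch[] = { &&op_halt, &&op_inc, &&op_dbl };
    int acc = 0;

    assert(code != 0);

dispatch_block:
    /* The only indirect jump in the function.  */
    goto *dispatch[*code++];

op_inc:
    acc += 1;
    goto dispatch_block;       /* one ordinary edge back to the hub */
op_dbl:
    acc *= 2;
    goto dispatch_block;       /* one ordinary edge back to the hub */
op_halt:
    return acc;
}
```

With N handlers this form has about 2 * N edges (N into the hub, N out
of it) instead of about N * N, at the price of one hub block with a very
high in- and out-degree.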



2009-02-12 Thread ebotcazou at gcc dot gnu dot org


--- Comment #5 from ebotcazou at gcc dot gnu dot org  2009-02-12 11:40 ---
> Even for code with lots of computed gotos, the CFG should not be fully
> connected. We factorize computed gotos to avoid exactly that.  At least we
> used to.  Maybe the factorizing is broken, or it is undone somewhere too early?

If I'm not mistaken the factoring is undone during RTL expansion.


-- 

ebotcazou at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu dot
                   |                            |org



2009-02-12 Thread steven at gcc dot gnu dot org


--- Comment #6 from steven at gcc dot gnu dot org  2009-02-12 12:01 ---
In the past, we did not unfactor them (see e.g. Bug 15242).


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-02-12 12:01:30
               date|                            |



2009-02-12 Thread jakub at gcc dot gnu dot org


--- Comment #7 from jakub at gcc dot gnu dot org  2009-02-12 12:16 ---
Adding -fno-move-loop-invariants to the x86_64 mentioned options results in
VIRT memory topping around 1224m (only because of IRA, before RA it never went
above 1GB).  Seems it is really the loop2_invariant pass that ate 28GB of RAM.



2009-02-12 Thread steven at gcc dot gnu dot org


--- Comment #8 from steven at gcc dot gnu dot org  2009-02-12 12:49 ---
If there is a test case that compiles in less than 4GB, I'll take this bug (I
have no access to machines with more memory than that ;-)



2009-02-12 Thread rguenth at gcc dot gnu dot org


--- Comment #9 from rguenth at gcc dot gnu dot org  2009-02-12 12:58 ---
PR26854 is fixed as well with -fno-move-loop-invariants.  It has a little
less peak memory requirement than the testcase here.



2009-02-12 Thread jakub at gcc dot gnu dot org


--- Comment #10 from jakub at gcc dot gnu dot org  2009-02-12 13:16 ---
Most of the memory is allocated in df_chain->block_pool:
p *df->problems_by_index[4]->block_pool
$37 = {name = 0xbc3114 "df_chain_block pool", id = 475, elts_per_block = 50,
  returned_free_list = 0x0, virgin_free_list = 0x72ce7cae8,
  virgin_elts_remaining = 18, elts_allocated = 775734800, elts_free = 18,
  blocks_allocated = 15514696, block_list = 0x72ce7c6e0, block_size = 1608,
  elt_size = 32}
p *cfun->cfg
$39 = {x_entry_block_ptr = 0x70cb11e0, x_exit_block_ptr = 0x70cb1240,
  x_basic_block_info = 0x7283f000, x_n_basic_blocks = 76672,
  x_n_edges = 135946, x_last_basic_block = 76676, x_label_to_block_map = 0x0,
  x_profile_status = PROFILE_GUESSED,
  x_dom_computed = {DOM_NO_FAST_QUERY, DOM_NONE},
  x_n_bbs_in_dom_tree = {76672, 0}, max_jumptable_ents = 0,
  last_label_uid = 97987}
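
The pool numbers above translate directly into bytes.  The small helpers
below (hypothetical names, written only for this worked example) multiply
them out: 775734800 elements of 32 bytes are 24823513600 bytes, roughly
24.8 GB, and 15514696 blocks of 1608 bytes are 24947631168 bytes, roughly
24.9 GB once per-block overhead is included.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by the pool elements themselves.  */
static uint64_t pool_element_bytes(uint64_t elts_allocated, uint64_t elt_size)
{
    return elts_allocated * elt_size;
}

/* Bytes occupied by the pool's blocks, including per-block overhead.  */
static uint64_t pool_block_bytes(uint64_t blocks_allocated,
                                 uint64_t block_size)
{
    return blocks_allocated * block_size;
}
```

Note also that 775734800 / 50 = 15514696, so the element and block counts
in the dump are consistent with elts_per_block = 50.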



2009-02-12 Thread jakub at gcc dot gnu dot org


--- Comment #11 from jakub at gcc dot gnu dot org  2009-02-12 14:21 ---
Created an attachment (id=17288)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17288&action=view)
gcc44-pr39157.patch

Quick hack to avoid loop invariant motion from excessively large loops at -O1.
With this cc1 still tops at 1.2GB, but loop invariants are still moved out of
reasonably sized loops.  For the huge ones I wonder how much it buys in anyway,
there it increases register pressure a lot.  10000 is pretty arbitrary, I
wonder if we want a param or something.



2009-02-12 Thread rguenther at suse dot de


--- Comment #12 from rguenther at suse dot de  2009-02-12 14:23 ---
Subject: Re:  Code that compiles fine in 1GB of memory
 with 4.1.2 requires  20GB in 4.2.* and higher

On Thu, 12 Feb 2009, jakub at gcc dot gnu dot org wrote:

> --- Comment #11 from jakub at gcc dot gnu dot org  2009-02-12 14:21 ---
> Created an attachment (id=17288)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17288&action=view)
> gcc44-pr39157.patch
>
> Quick hack to avoid loop invariant motion from excessively large loops at -O1.
> With this cc1 still tops at 1.2GB, but loop invariants are still moved out of
> reasonably sized loops.  For the huge ones I wonder how much it buys in anyway,
> there it increases register pressure a lot.  10000 is pretty arbitrary, I
> wonder if we want a param or something.

I'd say for -O1 disable it completely, for -O2 and up use the
heuristic and add a param for it.

Richard.



2009-02-12 Thread rguenth at gcc dot gnu dot org


--- Comment #13 from rguenth at gcc dot gnu dot org  2009-02-12 15:47 ---
Zdenek, in this case (and PR26854) can we make sure not to recognize loops
that involve the single non-local goto BB?  Maybe this would solve the
problem as well.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot
                   |                            |org



2009-02-12 Thread jakub at gcc dot gnu dot org


--- Comment #14 from jakub at gcc dot gnu dot org  2009-02-12 15:57 ---
Created an attachment (id=17291)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17291&action=view)
gcc44-pr39157.patch

Patch to add loop-invariant-max-bbs-in-loop parameter.



2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #15 from lucier at math dot purdue dot edu  2009-02-12 16:35 ---
Some comments (a lot went on while I was sleeping):

1.  Yes, this is similar to the test case of PR26854, but the C code generator
has changed significantly since that test case was filed.  I don't know if the
changes in the code generator really affect what's happening here, however.

2.  I'm trying to get a moderately sized test case that will compile in about
3GB of RAM, as Steven requested.  (The test case from PR26854 takes at least
7GB of RAM to compile on ppc64.)

3.  If the amount of memory and cpu time required by the test case at -O1
doesn't increase significantly when loop-invariant motion is performed on loops
of size up to 10,000, then it would be good if the parameter at -O1 could be
10,000 instead of 100 (or at least larger than 100), as is suggested in the
most recent patch.



2009-02-12 Thread rguenth at gcc dot gnu dot org


--- Comment #16 from rguenth at gcc dot gnu dot org  2009-02-12 16:52 ---
Actually for PR26854 it is just one loop that is detected, covering all of
the function (with approx. 56000 basic blocks and one basic-block that has
edges to all other basic blocks in the loop).  So the default for the param
could indeed be 10000 here.  And possibly the problem is not really the large
number of blocks but the in/out-degree of that single basic-block with the
computed goto.

Thus, I'd suggest again to not recognize such loops or maybe mark them
specially?



2009-02-12 Thread rakdver at kam dot mff dot cuni dot cz


--- Comment #17 from rakdver at kam dot mff dot cuni dot cz  2009-02-12 18:29 ---
Subject: Re:  Code that compiles fine in 1GB of memory with 4.1.2 requires
 20GB in 4.2.* and higher

> Zdenek, in this case (and PR26854) can we make sure not to recognize loops
> that involve the single non-local goto BB?  Maybe this would solve the
> problem as well.

maybe; however, I think it would even make sense to add an option to the
loop analysis not to detect loops over a certain size at all.  Most of the
loop optimization passes become senseless on loops with thousands of
basic blocks, just wasting the resources.



2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #18 from lucier at math dot purdue dot edu  2009-02-12 19:54 ---
There is now a file slatex.i at

http://www.math.purdue.edu/~lucier/bugzilla/8/

that compiles in about 650MB of memory with gcc-4.2.3 on x86-64 with the same
options; I don't know if that will help Steven.



2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #19 from lucier at math dot purdue dot edu  2009-02-12 20:51 ---
Subject: Re:  Code that compiles fine in 1GB of
 memory with 4.1.2 requires  20GB in 4.2.* and higher

On Thu, 2009-02-12 at 16:52 +0000, rguenth at gcc dot gnu dot org wrote:
> --- Comment #16 from rguenth at gcc dot gnu dot org  2009-02-12 16:52 ---
> Actually for PR26854 it is just one loop that is detected, covering all of
> the function (with approx. 56000 basic blocks and one basic-block that has
> edges to all other basic blocks in the loop).

Richard:

I'm wondering if you could look at a smaller file that's generated in a
somewhat different way.

At

http://www.math.purdue.edu/~lucier/bugzilla/8/

there's a file _num.i.gz that I think should have smaller (nested) loops
than the entire file, for example, from the label

___L189__23__23_bignum_2e__2a_:

at line 50031 to just before label

___L190__23__23_bignum_2e__2a_:

at line 50105.

Moving loop invariants out of this loop might help if it is detected as a
loop, but I don't know how to check whether it is.

Perhaps you might check and report whether this small loop is treated as
a loop or whether, again, the entire function is the only loop
detected.

Brad



2009-02-12 Thread steven at gcc dot gnu dot org


--- Comment #20 from steven at gcc dot gnu dot org  2009-02-13 07:44 ---
Re: "Moving loop invariants out of this loop might help if it is detected as a
loop, but I don't know how to check whether it is." (Comment #19):

It's not like there will not be any loop invariant code motion (LICM) at all
anymore if the RTL LICM pass is disabled.  There is an LICM pass on GIMPLE, and
there is also PRE for GIMPLE (and lazy code motion for RTL but I think it
disables itself for your test case).

The RTL LICM pass mostly cleans up after expand, i.e. moves things that are not
exposed in GIMPLE. This is mostly just address calculations.
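
To make "address calculations" concrete, here is a source-level picture
of the kind of invariant that gets hoisted.  It is only an analogy: the
RTL pass works on the compiler's internal representation, not on C, and
the function names below are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the row offset row * cols is recomputed on every iteration.  */
static long sum_row_naive(const long *m, size_t cols, size_t row)
{
    long sum = 0;

    assert(m != NULL);
    for (size_t j = 0; j < cols; j++)
        sum += m[row * cols + j];
    return sum;
}

/* After: the loop-invariant part of the address is computed once,
   outside the loop, and held in a register for the whole loop.  */
static long sum_row_hoisted(const long *m, size_t cols, size_t row)
{
    const long *row_base;
    long sum = 0;

    assert(m != NULL);
    row_base = m + row * cols;
    for (size_t j = 0; j < cols; j++)
        sum += row_base[j];
    return sum;
}
```

The hoisted value stays live across the entire loop, which is why moving
invariants out of a huge loop tends to raise register pressure.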

I think Zdenek is right in Comment #17.  LICM on big loops probably does more
harm than good.  Still, randomly disabling RTL LICM on big loops, when we do
nothing about the other LICM-like passes, looks like a hack to me and not like
a permanent solution.


There are a couple more ways to fix the problem (ISTR I've listed them before
somewhere...):

1. Stop using the DF UD chains and build them locally only.  This is not
something I would like much, but clearly there is something with the DF chains
that consumes too many resources.

2. Figure out why *only* RTL LICM has a big problem with the DF chains for this
test case.  Why does fwprop work?  Does it shut down itself somehow?  Is fwprop
asking only for UD chains for pseudos and LICM also for hardregs, perhaps?

3. Make RTL FUD chains work.  Shame on me for being unable so far to make that
work.  Not 4.4 material.

4. Poor-man's-FUD: For CFGs with many basic blocks and a large join-split point
(like the block that holds the factored computed goto), add a no-op move for
all live registers for which a valid reg-reg move instruction is available.
This effectively splits the UD chains, and the no-op moves would be cleaned up
later (during RA, or maybe earlier).
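
A back-of-the-envelope model of why option 4 could help, assuming (as a
simplification) that for one register every definition before the hub
block reaches every use after it.  The helper names are hypothetical and
the model ignores everything else DF does.

```c
#include <assert.h>
#include <stdint.h>

/* Chain links when every def reaching the hub is linked directly to
   every use after it.  */
static uint64_t chain_links_direct(uint64_t defs, uint64_t uses)
{
    return defs * uses;
}

/* Chain links with one no-op move in the hub: each def links to the
   move's use, and the move's def links to each later use.  */
static uint64_t chain_links_split(uint64_t defs, uint64_t uses)
{
    return defs + uses;
}
```

For 1000 defs and 1000 uses of a single register this is 1000000 links
without the move and 2000 with it, so the growth changes from quadratic
to linear in this toy model.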


Just to see if it works, I want to give the poor-man's-FUD option a try with
Brad's smaller test case.

