[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2024-02-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed|2010-06-17 10:36:48 |2024-2-16

--- Comment #40 from Richard Biener  ---
Reconfirmed.  For GCC 14 we use about 2GB of RAM on x86_64 with -O0, and it takes 20s.
With -O1 that regresses to 60s and a little less peak memory.

 callgraph ipa passes   :  14.18 ( 23%)
 tree PTA   :  16.43 ( 27%) 

And -O2 memory usage improves further at about the same compile-time.

[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2019-11-22 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Vladimir Makarov  changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #39 from Vladimir Makarov  ---
(In reply to Jan Hubicka from comment #38)
>
> Seems a lot of memory is taken by IRA, too.
> 
>

I played with the test too.  2^16 functions.  That is a lot of insns, and IRA/LRA
is not only an RA but in some ways a code selector too, which has to keep a lot
of additional structures about insns.  The memory numbers for IRA are just GCed
memory allocation.  We should be interested in *peak* memory allocation.  I don't
see any peak for RA.  ./cc1 -O2 with valgrind massif looks like this (after
Richard's recent patch changing the IRA XNEWs to a memory pool):


KB
206.8^:
 |   :#::@:@::@:::
 | :   ::: : :#::@:@::@:::
 |:: : : : : :#::@:@::@:::
 |   : : : : : : :#::@:@::@:::
 |   :@ :: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
 |  ::@::: : : : : : :#::@:@::@:::
   0
+--->Mi
 0   1.435

So it is about 200MB *peak* memory consumption, with RA consuming only 4% of the
time, which I think is an excellent result (usually RA takes more time).

 integrated RA              :   7.21 (  4%)   1.51 (  3%)   9.37 (  3%) 1581592 kB ( 38%)
 LRA non-specific           :   1.35 (  1%)   0.31 (  1%)   1.95 (  1%)    3584 kB (  0%)
 LRA virtuals elimination   :   0.38 (  0%)   0.10 (  0%)   0.43 (  0%)       0 kB (  0%)
 LRA reload inheritance     :   0.16 (  0%)   0.04 (  0%)   0.14 (  0%)       0 kB (  0%)
 LRA create live ranges     :   0.01 (  0%)   0.01 (  0%)   0.08 (  0%)       0 kB (  0%)
 LRA hard reg assignment    :   0.23 (  0%)   0.06 (  0%)   0.17 (  0%)       0 kB (  0%)
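
The difference between the GGC column above (memory handed out over the whole run) and the massif peak can be illustrated with a small sketch. This is a toy model, not GCC code: the function name `allocation_stats` and the event format are assumptions made up for the illustration.

```python
def allocation_stats(events):
    """Return (cumulative_kb, peak_kb) for a trace of allocation events.

    events is an iterable of (kind, kb) pairs, where kind is "alloc" or
    "free".  This is a hypothetical trace format, not massif or GCC output.
    """
    live = 0        # memory currently in use
    peak = 0        # highest value 'live' ever reached
    cumulative = 0  # everything ever allocated, freed or not
    for kind, kb in events:
        if kind == "alloc":
            live += kb
            cumulative += kb
            peak = max(peak, live)
        elif kind == "free":
            live -= kb
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return cumulative, peak


if __name__ == "__main__":
    # A pass that allocates and releases 100 kB per function, three times.
    trace = [("alloc", 100), ("free", 100)] * 3
    print(allocation_stats(trace))
```

A pass that repeatedly allocates and releases memory per function shows a large cumulative figure while barely moving the peak, which is the point made about the IRA numbers.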

[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2019-11-21 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Jan Hubicka  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|hubicka at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #38 from Jan Hubicka  ---
It is GCC 10, but I finally managed to implement the incremental update here.
Memory use is about 1.1GB but the inliner finishes quite quickly:

Time variable                       usr           sys          wall            GGC
 phase setup                   :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)     1237 kB (  0%)
 phase parsing                 :   1.29 (  2%)   1.24 (  6%)   2.54 (  3%)   247897 kB (  6%)
 phase lang. deferred          :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 phase opt and generate        :  56.81 ( 98%)  19.35 ( 94%)  76.27 ( 97%)  3859026 kB ( 94%)
 garbage collection            :   0.84 (  1%)   0.10 (  0%)   0.93 (  1%)        0 kB (  0%)
 dump files                    :   3.28 (  6%)   1.85 (  9%)   5.30 (  7%)        0 kB (  0%)
 callgraph construction        :   0.70 (  1%)   0.28 (  1%)   1.07 (  1%)    99328 kB (  2%)
 callgraph optimization        :   1.38 (  2%)   0.74 (  4%)   2.03 (  3%)     1026 kB (  0%)
 callgraph functions expansion :  47.27 ( 81%)  15.51 ( 75%)  62.89 ( 80%)  2827825 kB ( 69%)
 callgraph ipa passes          :   8.19 ( 14%)   3.26 ( 16%)  11.45 ( 15%)   709147 kB ( 17%)
 ipa function summary          :   0.34 (  1%)   0.08 (  0%)   0.43 (  1%)    97794 kB (  2%)
 ipa dead code removal         :   0.25 (  0%)   0.01 (  0%)   0.27 (  0%)        0 kB (  0%)
 ipa inheritance graph         :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
 ipa devirtualization          :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 ipa cp                        :   0.23 (  0%)   0.02 (  0%)   0.27 (  0%)     7169 kB (  0%)
 ipa inlining heuristics       :   0.19 (  0%)   0.00 (  0%)   0.22 (  0%)        0 kB (  0%)
 ipa function splitting        :   0.02 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa comdats                   :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 ipa various optimizations     :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa reference                 :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)        0 kB (  0%)
 ipa profile                   :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa pure const                :   0.45 (  1%)   0.15 (  1%)   0.47 (  1%)        0 kB (  0%)
 ipa icf                       :   0.22 (  0%)   0.01 (  0%)   0.23 (  0%)        0 kB (  0%)
 ipa SRA                       :   0.13 (  0%)   0.00 (  0%)   0.14 (  0%)     5120 kB (  0%)
 ipa free lang data            :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 ipa free inline summary       :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)        0 kB (  0%)
 cfg construction              :   0.07 (  0%)   0.01 (  0%)   0.19 (  0%)        0 kB (  0%)
 cfg cleanup                   :   0.73 (  1%)   0.23 (  1%)   0.95 (  1%)        0 kB (  0%)
 trivially dead code           :   0.30 (  1%)   0.06 (  0%)   0.30 (  0%)        0 kB (  0%)
 df scan insns                 :   0.81 (  1%)   0.21 (  1%)   0.93 (  1%)     3072 kB (  0%)
 df multiple defs              :   0.28 (  0%)   0.06 (  0%)   0.41 (  1%)        0 kB (  0%)
 df reaching defs              :   1.48 (  3%)   0.20 (  1%)   1.63 (  2%)        0 kB (  0%)
 df live regs                  :   1.12 (  2%)   0.26 (  1%)   1.33 (  2%)        0 kB (  0%)
 df live&initialized regs      :   0.51 (  1%)   0.19 (  1%)   0.66 (  1%)        0 kB (  0%)
 df must-initialized regs      :   0.11 (  0%)   0.06 (  0%)   0.14 (  0%)        0 kB (  0%)
 df use-def / def-use chains   :   0.36 (  1%)   0.04 (  0%)   0.43 (  1%)        0 kB (  0%)
 df reg dead/unused notes      :   1.69 (  3%)   0.20 (  1%)   1.81 (  2%)    12288 kB (  0%)
 register information          :   0.38 (  1%)   0.04 (  0%)   0.39 (  0%)        0 kB (  0%)
 alias analysis                :   0.82 (  1%)   0.17 (  1%)   1.15 (  1%)    36865 kB (  1%)
 alias stmt walking            :   0.06 (  0%)   0.04 (  0%)   0.07 (  0%)        0 kB (  0%)
 register scan                 :   0.07 (  0%)   0.03 (  0%)   0.11 (  0%)        0 kB (  0%)
 rebuild jump labels           :   0.16 (  0%)   0.06 (  0%)   0.14 (  0%)        0 kB (  0%)
 preprocessing                 :   0.39 (  1%)   0.32 (  2%)   0.49 (  1%)    44508 kB (  1%)
 lexical analysis              :   0.32 (  1%)   0.39 (  2%)   0.73 (  1%)        0 kB (  0%)
 parser (global)               :   0.11 (  0%)   0.08 (  0%)   

[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2016-08-22 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #37 from rguenther at suse dot de  ---
On Mon, 22 Aug 2016, d.v.a at ngs dot ru wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
> 
> --- Comment #36 from __vic  ---
> What about 6.2?

No, maybe GCC 7 if Honza finally manages to get to this...

[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2016-08-22 Thread d.v.a at ngs dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #36 from __vic  ---
What about 6.2?

[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2016-02-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #35 from Richard Biener  ---
So ... too late for GCC 6 I guess.

[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2015-03-15 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #34 from Jan Hubicka hubicka at ucw dot cz ---
The problem is (as described earlier) the fact that we sum the size of all call
statements in the function after every inline decision.
Most of the time is spent in calling estimate_edge_size_and_time:
 79.95%   cc1  cc1            [.] _ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_S1_j3vecIP9tree_node7va_heap6vl_ptrES2_I28ipa_polymorphic_call_contextS5_S6_ES2_IP21ipa_agg
  2.21%   cc1  libc-2.13.so   [.] _int_malloc
  0.59%   cc1  libc-2.13.so   [.] _int_free

Updating summaries incrementally will solve it, but at the moment I do not see
any really simple change for GCC 5 (I looked at this code a couple of times
already because of this PR).

Honza
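
The contrast between re-summing all call statements after every inline decision and updating the sum incrementally can be sketched as follows. This is a toy model under stated assumptions, not the inliner's data structures: `IncrementalCallSummary`, `naive_total` and the call-id dictionaries are hypothetical names invented for the illustration.

```python
class IncrementalCallSummary:
    """Toy per-function summary: call-site id -> estimated size.

    Hypothetical structure for illustration only; it does not mirror the
    real inline summaries in GCC.
    """

    def __init__(self, call_sizes):
        self.call_sizes = dict(call_sizes)
        # Summed once up front, then maintained on every change.
        self.total = sum(self.call_sizes.values())

    def inline_call(self, call_id, callee_calls):
        """Replace one call site by the call sites of the inlined callee.

        Cost is proportional to the callee's calls, not to all calls that
        the caller has accumulated so far.
        """
        removed = self.call_sizes.pop(call_id)
        added = 0
        for new_id, size in callee_calls.items():
            self.call_sizes[new_id] = size
            added += size
        self.total += added - removed
        return self.total


def naive_total(summary):
    """The quadratic variant: walk every call site again after each decision."""
    return sum(summary.call_sizes.values())
```

With n inline decisions into one function, calling something like `naive_total` after each decision walks O(n) call sites every time, so O(n^2) overall, while `inline_call` keeps the same number available at a cost that depends only on the callee.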


[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2015-03-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org

--- Comment #33 from Richard Biener rguenth at gcc dot gnu.org ---
Assigning to Honza - I wonder if there is any low-hanging fruit to improve
things for GCC 5 still.


[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2015-03-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #32 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Fri Mar 13 08:52:51 2015
New Revision: 221410

URL: https://gcc.gnu.org/viewcvs?rev=221410&root=gcc&view=rev
Log:
2015-03-12  Richard Biener  rguent...@suse.de

PR middle-end/44563
* tree-inline.c (gimple_expand_calls_inline): Walk BB backwards
to avoid quadratic behavior with inline expansion splitting blocks.
* tree-cfgcleanup.c (cleanup_tree_cfg_bb): Do not merge block
with the successor if the predecessor will be merged with it.
* tree-cfg.c (gimple_can_merge_blocks_p): We can't merge the
entry block with its successor.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-cfg.c
trunk/gcc/tree-cfgcleanup.c
trunk/gcc/tree-inline.c
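
Why walking a block backwards avoids the quadratic behavior mentioned in the log can be shown with a counting sketch. It assumes a simplified model in which splitting a block after a call moves every later statement still in that block into a new block, at a cost of one unit per moved statement; `split_cost` is a hypothetical helper, not a GCC function.

```python
def split_cost(num_calls, backwards):
    """Count statements moved when a block is split after each of its calls.

    Toy model: the block contains num_calls call statements and nothing
    else.  Splitting after position i moves all later statements that are
    still in the current block into a fresh block.
    """
    moved = 0
    if backwards:
        block_len = num_calls
        for i in range(num_calls - 1, -1, -1):
            # Only the statements between call i and the previous split
            # point are still in this block.
            moved += block_len - (i + 1)
            block_len = i + 1
    else:
        remaining = num_calls
        for _ in range(num_calls):
            # The call is first in the current block; everything after it
            # moves, and the walk continues in the new block.
            moved += remaining - 1
            remaining -= 1
    return moved


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, split_cost(n, backwards=False), split_cost(n, backwards=True))
```

In this model the forward walk moves n(n-1)/2 statements in total, while the backward walk moves at most one statement per split.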


[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2015-03-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

  Component|tree-optimization   |ipa
  Known to fail||5.0

--- Comment #30 from Richard Biener rguenth at gcc dot gnu.org ---
With all the patches I have for now we end up with a pure IPA issue:

 phase opt and generate  : 193.97 (99%) usr  13.82 (93%) sys 207.75 (99%) wall 3311016 kB (94%) ggc
 ipa inlining heuristics : 140.48 (72%) usr   0.44 ( 3%) sys 141.13 (67%) wall  396289 kB (11%) ggc
 dominance computation   :   2.99 ( 2%) usr   1.00 ( 7%) sys   3.89 ( 2%) wall       0 kB ( 0%) ggc
 integrated RA           :   4.05 ( 2%) usr   0.85 ( 6%) sys   5.26 ( 3%) wall 1577496 kB (45%) ggc
 rest of compilation     :   6.53 ( 3%) usr   1.67 (11%) sys   7.91 ( 4%) wall  155664 kB ( 4%) ggc
 unaccounted todo        :   3.82 ( 2%) usr   1.07 ( 7%) sys   4.98 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                   : 195.46            14.79           210.23            3514948 kB

everything <= 1% dropped.  I wonder what that unaccounted todo is ;)


[Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions

2015-03-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #31 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Fri Mar 13 08:47:14 2015
New Revision: 221409

URL: https://gcc.gnu.org/viewcvs?rev=221409&root=gcc&view=rev
Log:
2015-03-10  Richard Biener  rguent...@suse.de

PR middle-end/44563
* tree-cfgcleanup.c (split_bb_on_noreturn_calls): Remove.
(cleanup_tree_cfg_1): Do not call it.
(execute_cleanup_cfg_post_optimizing): Fixup the CFG here.
(fixup_noreturn_call): Mark the stmt as control altering.
* tree-cfg.c (execute_fixup_cfg): Do not dump the function
here.
(pass_data_fixup_cfg): Produce a dump file.
* tree-ssa-dom.c: Include tree-cfgcleanup.h.
(need_noreturn_fixup): New global.
(pass_dominator::execute): Fixup queued noreturn calls.
(optimize_stmt): Queue calls that became noreturn for fixup.
* tree-ssa-forwprop.c (pass_forwprop::execute): Likewise.
* tree-ssa-pre.c: Include tree-cfgcleanup.h.
(el_to_fixup): New global.
(eliminate_dom_walker::before_dom_children): Queue calls that
became noreturn for fixup.
(eliminate): Fixup queued noreturn calls.
* tree-ssa-propagate.c: Include tree-cfgcleanup.h.
(substitute_and_fold_dom_walker): New member stmts_to_fixup.
(substitute_and_fold_dom_walker::before_dom_children): Queue
calls that became noreturn for fixup.
(substitute_and_fold): Fixup queued noreturn calls.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-cfg.c
trunk/gcc/tree-cfgcleanup.c
trunk/gcc/tree-ssa-dom.c
trunk/gcc/tree-ssa-forwprop.c
trunk/gcc/tree-ssa-pre.c
trunk/gcc/tree-ssa-propagate.c
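
The "queue during the walk, fix up afterwards" pattern described in the log can be sketched generically. This is a toy under stated assumptions: blocks are plain lists of statement names, and the "fixup" just drops the statements after a noreturn call, which is a simplification of what the real CFG fixup does; `walk_and_queue` and `fixup_queued` are hypothetical names.

```python
def walk_and_queue(blocks, became_noreturn):
    """Visit every statement without changing any block.

    Calls found in became_noreturn are only queued, so the walk never
    mutates the structure it is iterating over.
    """
    to_fixup = []
    for block in blocks:
        for stmt in block:
            if stmt in became_noreturn:
                to_fixup.append((block, stmt))
    return to_fixup


def fixup_queued(to_fixup):
    """After the walk: drop everything following each queued call."""
    for block, stmt in to_fixup:
        # An earlier fixup in the same block may already have removed it.
        if stmt in block:
            del block[block.index(stmt) + 1:]


if __name__ == "__main__":
    blocks = [["a", "call_abort", "b"], ["c"]]
    fixup_queued(walk_and_queue(blocks, {"call_abort"}))
    print(blocks)
```

Deferring the structural change keeps the walk itself cheap and avoids re-scanning every block for noreturn calls on each cleanup.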