[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-12-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog, deferred,
                   |                            |lto, memory-hog
           Priority|P3                          |P2
   Target Milestone|9.0                         |10.0

--- Comment #11 from Richard Biener  ---
Deferred.

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-29 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #10 from rguenther at suse dot de  ---
On Wed, 28 Nov 2018, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988
> 
> --- Comment #9 from Jan Hubicka  ---
> We still have:
>   /* When not generating debug info we can eliminate info on unused
>      variables.  */
>   else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
>            && !optinfo_wants_inlining_info_p ())
> 
> can we do better here?

I think we can do better in the earlier loop over BLOCK_VARS if we make
sure to not call remove_unused_scope_block_p before early debug 
generation.  For example we should be able to elide

  else if (TREE_CODE (*t) == TYPE_DECL
           || debug_info_level == DINFO_LEVEL_NORMAL
           || debug_info_level == DINFO_LEVEL_VERBOSE)
    ;

completely.  Likewise

  /* Debug info of nested function refers to the block of the
     function.  We might still call it even if all statements
     of function it was nested into was eliminated.

     TODO: We can actually look into cgraph to see if function
     will be output to file.  */
  if (TREE_CODE (*t) == FUNCTION_DECL)
    unused = false;

should not be necessary - that is, after early debug BLOCK_VARS
only needs to retain used decls (decls we want to annotate with
locations later).

The code you quote above is a bit weird indeed.
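
As a rough illustration of the pruning described above, the sketch below
unlinks unused entries from a chained list through a pointer-to-pointer
cursor, the same `*t` idiom as the quoted snippets.  It is a toy model
only: `struct toy_decl`, its `used` flag and `prune_unused` are
hypothetical names, not GCC's tree API, and it assumes the used/unused
decision has already been made elsewhere.

```c
#include <stddef.h>

/* Hypothetical stand-in for a decl on a BLOCK_VARS-style chain.  */
struct toy_decl
{
  const char *name;
  int used;                  /* nonzero if the decl must be retained */
  struct toy_decl *chain;    /* next entry, or NULL at the end */
};

/* Unlink every unused entry from the chain whose head pointer is *T.
   Returns the number of entries removed.  T always points at the link
   that refers to the current entry, so the head needs no special case.  */
size_t
prune_unused (struct toy_decl **t)
{
  size_t removed = 0;

  while (*t)
    {
      if (!(*t)->used)
        {
          /* Drop the entry by redirecting the incoming link past it.  */
          *t = (*t)->chain;
          removed++;
        }
      else
        /* Keep the entry and advance to its outgoing link.  */
        t = &(*t)->chain;
    }

  return removed;
}
```

Called as `prune_unused (&head)`, the function leaves `head` pointing at
the first retained entry, or at NULL if nothing was used.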

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #9 from Jan Hubicka  ---
We still have:
   /* When not generating debug info we can eliminate info on unused
      variables.  */
   else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
            && !optinfo_wants_inlining_info_p ())

can we do better here?

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-27 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #8 from rguenther at suse dot de  ---
On November 27, 2018 12:01:03 PM GMT+01:00, "hubicka at gcc dot gnu.org" wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988
>
>--- Comment #7 from Jan Hubicka  ---
>Hi,
>ltrans files are 1374K without and 1339K with patch.
>
>WPA report without patch:
>[WPA] read 13690507 SCCs of average size 1.397311
>[WPA] 19129895 tree bodies read in total
>[WPA] tree SCC table: size 4194301, 2847668 elements, collision ratio: 0.834030
>[WPA] tree SCC max chain length 156 (size 1)
>[WPA] Compared 5298938 SCCs, 3633923 collisions (0.685783)
>[WPA] Merged 5282405 SCCs
>[WPA] Merged 9857161 tree bodies
>[WPA] Merged 3063763 types
>[WPA] 1614359 types prevailed (2464626 associated trees)
>[WPA] GIMPLE canonical type table: size 32749, 22785 elements, 149648 searches, 63491 collisions (ratio: 0.424269)
>[WPA] GIMPLE canonical type pointer-map: 22785 elements, 348123 searches
>[WPA] # of input files: 2236
>[WPA] Compression: 298531604 input bytes, 898586109 uncompressed bytes (ratio: 3.010020)
>[WPA] Size of mmap'd section decls: 298531604 bytes
>
>WPA report with patch:
>[WPA] read 13139926 SCCs of average size 1.412202
>[WPA] 18556224 tree bodies read in total
>[WPA] tree SCC table: size 4194301, 2725601 elements, collision ratio: 0.813527
>[WPA] tree SCC max chain length 153 (size 1)
>[WPA] Compared 5043033 SCCs, 3379920 collisions (0.670216)
>[WPA] Merged 5027945 SCCs
>[WPA] Merged 9584037 tree bodies
>[WPA] Merged 2957501 types
>[WPA] 1557131 types prevailed (2402973 associated trees)
>[WPA] GIMPLE canonical type table: size 32749, 22783 elements, 148468 searches, 63408 collisions (ratio: 0.427082)
>[WPA] GIMPLE canonical type pointer-map: 22783 elements, 347231 searches
>[WPA] Compression: 288287953 input bytes, 867825506 uncompressed bytes (ratio: 3.010273)
>[WPA] Size of mmap'd section decls: 288287953 bytes
>
>
>The WPA report after optimization without patch:
>WPA statistics
>[WPA] read 13690507 SCCs of average size 1.397311
>[WPA] 19129895 tree bodies read in total
>[WPA] # of input files: 2236
>[WPA] # of input cgraph nodes: 411683
>[WPA] # of function bodies: 83824
>[WPA] # of output files: 128
>[WPA] # of output symtab nodes: 1050843
>[WPA] # of output tree pickle references: 708465
>[WPA] # of output tree bodies: 190996
>[WPA] # callgraph partitions: 128
>[WPA] Compression: 416611153 input bytes, 1228624058 uncompressed bytes (ratio: 2.949090)
>[WPA] Size of mmap'd section decls: 298531604 bytes
>[WPA] Size of mmap'd section function_body: 72959123 bytes
>
>with patch:
>WPA statistics
>[WPA] read 13139926 SCCs of average size 1.412202
>[WPA] 18556224 tree bodies read in total
>[WPA] # of input files: 2236
>[WPA] # of input cgraph nodes: 411683
>[WPA] # of function bodies: 83824
>[WPA] # of output files: 128
>[WPA] # of output symtab nodes: 1050843
>[WPA] # of output tree pickle references: 685935
>[WPA] # of output tree bodies: 183785
>[WPA] # callgraph partitions: 128
>[WPA] Compression: 404728108 input bytes, 1193421138 uncompressed bytes (ratio: 2.948699)
>[WPA] Size of mmap'd section decls: 288287953 bytes
>[WPA] Size of mmap'd section function_body: 71319730 bytes

That makes it not worth the trouble? It might be that most of the trees are
still reachable via DECL_ABSTRACT_ORIGIN from the locals (even if they are
unused).

Do we now aggressively prune unused locals from BLOCK_VARS?

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-27 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #7 from Jan Hubicka  ---
Hi,
ltrans files are 1374K without and 1339K with patch.

WPA report without patch:
[WPA] read 13690507 SCCs of average size 1.397311
[WPA] 19129895 tree bodies read in total
[WPA] tree SCC table: size 4194301, 2847668 elements, collision ratio: 0.834030
[WPA] tree SCC max chain length 156 (size 1)
[WPA] Compared 5298938 SCCs, 3633923 collisions (0.685783)
[WPA] Merged 5282405 SCCs
[WPA] Merged 9857161 tree bodies
[WPA] Merged 3063763 types
[WPA] 1614359 types prevailed (2464626 associated trees)
[WPA] GIMPLE canonical type table: size 32749, 22785 elements, 149648 searches,
63491 collisions (ratio: 0.424269)
[WPA] GIMPLE canonical type pointer-map: 22785 elements, 348123 searches
[WPA] # of input files: 2236
[WPA] Compression: 298531604 input bytes, 898586109 uncompressed bytes (ratio:
3.010020)
[WPA] Size of mmap'd section decls: 298531604 bytes

WPA report with patch:
[WPA] read 13139926 SCCs of average size 1.412202
[WPA] 18556224 tree bodies read in total
[WPA] tree SCC table: size 4194301, 2725601 elements, collision ratio: 0.813527
[WPA] tree SCC max chain length 153 (size 1)
[WPA] Compared 5043033 SCCs, 3379920 collisions (0.670216)
[WPA] Merged 5027945 SCCs
[WPA] Merged 9584037 tree bodies
[WPA] Merged 2957501 types
[WPA] 1557131 types prevailed (2402973 associated trees)
[WPA] GIMPLE canonical type table: size 32749, 22783 elements, 148468 searches,
63408 collisions (ratio: 0.427082)
[WPA] GIMPLE canonical type pointer-map: 22783 elements, 347231 searches
[WPA] Compression: 288287953 input bytes, 867825506 uncompressed bytes (ratio:
3.010273)
[WPA] Size of mmap'd section decls: 288287953 bytes


The WPA report after optimization without patch:
WPA statistics
[WPA] read 13690507 SCCs of average size 1.397311
[WPA] 19129895 tree bodies read in total
[WPA] # of input files: 2236
[WPA] # of input cgraph nodes: 411683
[WPA] # of function bodies: 83824
[WPA] # of output files: 128
[WPA] # of output symtab nodes: 1050843
[WPA] # of output tree pickle references: 708465
[WPA] # of output tree bodies: 190996
[WPA] # callgraph partitions: 128
[WPA] Compression: 416611153 input bytes, 1228624058 uncompressed bytes (ratio:
2.949090)
[WPA] Size of mmap'd section decls: 298531604 bytes
[WPA] Size of mmap'd section function_body: 72959123 bytes

with patch:
WPA statistics
[WPA] read 13139926 SCCs of average size 1.412202
[WPA] 18556224 tree bodies read in total
[WPA] # of input files: 2236
[WPA] # of input cgraph nodes: 411683
[WPA] # of function bodies: 83824
[WPA] # of output files: 128
[WPA] # of output symtab nodes: 1050843
[WPA] # of output tree pickle references: 685935
[WPA] # of output tree bodies: 183785
[WPA] # callgraph partitions: 128
[WPA] Compression: 404728108 input bytes, 1193421138 uncompressed bytes (ratio:
2.948699)
[WPA] Size of mmap'd section decls: 288287953 bytes
[WPA] Size of mmap'd section function_body: 71319730 bytes
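
For scale, the relative savings implied by the figures above can be worked
out with a small helper.  This is only a sketch; `reduction_percent` is a
hypothetical name, and the inputs listed in the comment are the numbers
reported above.

```c
/* Percentage by which AFTER is smaller than BEFORE.
   Assumes BEFORE is greater than zero.  */
double
reduction_percent (double before, double after)
{
  return 100.0 * (before - after) / before;
}

/* Inputs taken from the reports above:
     decls section:          298531604 -> 288287953 bytes
     function_body section:   72959123 ->  71319730 bytes
     ltrans files:               1374K ->      1339K
   For example: reduction_percent (298531604.0, 288287953.0).  */
```

With these inputs the decls section shrinks by roughly 3.4%, the
function_body section by roughly 2.2%, and the ltrans files by roughly 2.5%.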

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-26 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #6 from Jan Hubicka  ---
> 
> Honza - can you test the effect of this patch please?
Thanks! I am just redoing the tests (rebuilding Firefox with an updated
tree), so I will do that today or tomorrow.

Honza

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #5 from Richard Biener  ---
Created attachment 45092
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45092&action=edit
better patch

This is a less hacky approach mimicking TREE_DIEs.  It elides a BLOCK's
BLOCK_ABSTRACT_ORIGIN if that is itself a BLOCK.  The downside is that the
elided BLOCKs are re-materialized for each _use_ as BLOCK_ABSTRACT_ORIGIN
(that can probably be fixed, though with some extra overhead).  Also, the
streaming overhead is one NULL pointer when BLOCK_ABSTRACT_ORIGIN is not set.

Honza - can you test the effect of this patch please?

I'll throw it on an LTO bootstrap; it at least survives lto.exp testing ;)

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #4 from Richard Biener  ---
OK, I'd rather not put this hack on trunk.  If anything, a very limited-scope
TREE_DIE (covering the same cases as the hack) should be brought in.

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-13 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #3 from Jan Hubicka  ---
Thanks, I will give it a try.  Note that the numbers I posted are from trunk
and trunk with abstract origin streaming disabled.  Thanks to the type
streaming reorg we do not have an overall regression relative to gcc8:

[WPA] read 14320726 SCCs of average size 1.594004
[WPA] 22827300 tree bodies read in total
[WPA] tree SCC table: size 8388593, 3202920 elements, collision ratio: 0.923241
[WPA] tree SCC max chain length 173 (size 2)
[WPA] Compared 4496372 SCCs, 3676586 collisions (0.817678)
[WPA] Merged 4480405 SCCs
[WPA] Merged 11209786 tree bodies
[WPA] Merged 2602183 types
[WPA] 1837633 types prevailed (3482639 associated trees)
[WPA] GIMPLE canonical type table: size 32749, 22890 elements, 278542 searches,
119444 collisions (ratio: 0.428819)
[WPA] GIMPLE canonical type pointer-map: 22890 elements, 629811 searches
[WPA] # of input files: 2236
[WPA] Compression: 347027044 input bytes, 1054806677 uncompressed bytes (ratio:
3.039552)

The overall size of the streamed ltrans.o files is 1.7GB.

So abstract origins are definitely important to solve, but we are not in a
desperate situation for GCC9 unless other testcases turn out to behave worse
than Firefox (I am in the process of testing other stuff).

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #2 from Richard Biener  ---
Created attachment 44995
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44995&action=edit
untested patch

It's surprisingly difficult to hack around things ... but the attached at least
survives lto.exp testing.

Can you see if it fixes the regression?

I very much expect it to break FAT objects since I "wreck" abstract origins
in a way others may not be happy about.

[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive

2018-11-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2018-11-13
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Target Milestone|---                         |9.0
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
The full TREE_DIE thing won't materialize but I have an idea to "hack" around
the special case of BLOCK_ABSTRACT_ORIGIN.