[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog, deferred,
                   |                            |lto, memory-hog
           Priority|P3                          |P2
   Target Milestone|9.0                         |10.0

--- Comment #11 from Richard Biener ---
Deferred.
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #10 from rguenther at suse dot de ---
On Wed, 28 Nov 2018, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988
>
> --- Comment #9 from Jan Hubicka ---
> We still have:
>       /* When not generating debug info we can eliminate info on unused
>          variables.  */
>       else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
>                && !optinfo_wants_inlining_info_p ())
>
> can we do better here?

I think we can do better in the earlier loop over BLOCK_VARS if we
make sure to not call remove_unused_scope_block_p before early
debug generation.  For example we should be able to elide

      else if (TREE_CODE (*t) == TYPE_DECL
               || debug_info_level == DINFO_LEVEL_NORMAL
               || debug_info_level == DINFO_LEVEL_VERBOSE)
        ;

completely.  Likewise

      /* Debug info of nested function refers to the block of the
         function.  We might stil call it even if all statements
         of function it was nested into was elliminated.

         TODO: We can actually look into cgraph to see if function
         will be output to file.  */
      if (TREE_CODE (*t) == FUNCTION_DECL)
        unused = false;

should not be necessary - that is, after early debug BLOCK_VARS
only needs to retain used decls (decls we want to annotate with
locations later).

The code you quote above is a bit weird indeed.
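The idea that BLOCK_VARS only needs to retain used decls after early debug can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `struct toy_decl`, `toy_prune_unused_vars` and `toy_chain_length` are hypothetical names invented for illustration, the `used` flag stands in for "decls we want to annotate with locations later", and none of it is the actual remove_unused_scope_block_p code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a declaration chained on BLOCK_VARS (hypothetical
   type, not GCC's tree).  */
struct toy_decl
{
  const char *name;
  bool used;               /* Assumed flag: annotate with locations later?  */
  struct toy_decl *chain;  /* Next variable in the same scope.  */
};

/* Unlink every unused declaration from the chain rooted at *VARS and
   return how many were removed.  Walking with a pointer-to-pointer
   mirrors the "*t" idiom in the code quoted above.  */
size_t
toy_prune_unused_vars (struct toy_decl **vars)
{
  size_t removed = 0;
  assert (vars != NULL);
  for (struct toy_decl **t = vars; *t != NULL; )
    {
      if (!(*t)->used)
        {
          *t = (*t)->chain;  /* Drop the decl; do not advance.  */
          removed++;
        }
      else
        t = &(*t)->chain;
    }
  return removed;
}

/* Count the declarations left on a chain.  */
size_t
toy_chain_length (const struct toy_decl *vars)
{
  size_t n = 0;
  for (; vars != NULL; vars = vars->chain)
    n++;
  return n;
}
```

In this model there is no special case for TYPE_DECL, FUNCTION_DECL or the debug level: once early debug has recorded what it needs, the only criterion left is whether the decl is still used.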
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #9 from Jan Hubicka ---
We still have:
      /* When not generating debug info we can eliminate info on unused
         variables.  */
      else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
               && !optinfo_wants_inlining_info_p ())

can we do better here?
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #8 from rguenther at suse dot de ---
On November 27, 2018 12:01:03 PM GMT+01:00, "hubicka at gcc dot gnu.org" wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988
>
> --- Comment #7 from Jan Hubicka ---
> Hi,
> ltrans files are 1374K without and 1339K with patch.
>
> WPA report without patch:
> [WPA] read 13690507 SCCs of average size 1.397311
> [WPA] 19129895 tree bodies read in total
> [WPA] tree SCC table: size 4194301, 2847668 elements, collision ratio: 0.834030
> [WPA] tree SCC max chain length 156 (size 1)
> [WPA] Compared 5298938 SCCs, 3633923 collisions (0.685783)
> [WPA] Merged 5282405 SCCs
> [WPA] Merged 9857161 tree bodies
> [WPA] Merged 3063763 types
> [WPA] 1614359 types prevailed (2464626 associated trees)
> [WPA] GIMPLE canonical type table: size 32749, 22785 elements, 149648 searches, 63491 collisions (ratio: 0.424269)
> [WPA] GIMPLE canonical type pointer-map: 22785 elements, 348123 searches
> [WPA] # of input files: 2236
> [WPA] Compression: 298531604 input bytes, 898586109 uncompressed bytes (ratio: 3.010020)
> [WPA] Size of mmap'd section decls: 298531604 bytes
>
> WPA report with patch:
> [WPA] read 13139926 SCCs of average size 1.412202
> [WPA] 18556224 tree bodies read in total
> [WPA] tree SCC table: size 4194301, 2725601 elements, collision ratio: 0.813527
> [WPA] tree SCC max chain length 153 (size 1)
> [WPA] Compared 5043033 SCCs, 3379920 collisions (0.670216)
> [WPA] Merged 5027945 SCCs
> [WPA] Merged 9584037 tree bodies
> [WPA] Merged 2957501 types
> [WPA] 1557131 types prevailed (2402973 associated trees)
> [WPA] GIMPLE canonical type table: size 32749, 22783 elements, 148468 searches, 63408 collisions (ratio: 0.427082)
> [WPA] GIMPLE canonical type pointer-map: 22783 elements, 347231 searches
> [WPA] Compression: 288287953 input bytes, 867825506 uncompressed bytes (ratio: 3.010273)
> [WPA] Size of mmap'd section decls: 288287953 bytes
>
> The WPA report after optimization without patch:
> WPA statistics
> [WPA] read 13690507 SCCs of average size 1.397311
> [WPA] 19129895 tree bodies read in total
> [WPA] # of input files: 2236
> [WPA] # of input cgraph nodes: 411683
> [WPA] # of function bodies: 83824
> [WPA] # of output files: 128
> [WPA] # of output symtab nodes: 1050843
> [WPA] # of output tree pickle references: 708465
> [WPA] # of output tree bodies: 190996
> [WPA] # callgraph partitions: 128
> [WPA] Compression: 416611153 input bytes, 1228624058 uncompressed bytes (ratio: 2.949090)
> [WPA] Size of mmap'd section decls: 298531604 bytes
> [WPA] Size of mmap'd section function_body: 72959123 bytes
>
> with patch:
> WPA statistics
> [WPA] read 13139926 SCCs of average size 1.412202
> [WPA] 18556224 tree bodies read in total
> [WPA] # of input files: 2236
> [WPA] # of input cgraph nodes: 411683
> [WPA] # of function bodies: 83824
> [WPA] # of output files: 128
> [WPA] # of output symtab nodes: 1050843
> [WPA] # of output tree pickle references: 685935
> [WPA] # of output tree bodies: 183785
> [WPA] # callgraph partitions: 128
> [WPA] Compression: 404728108 input bytes, 1193421138 uncompressed bytes (ratio: 2.948699)
> [WPA] Size of mmap'd section decls: 288287953 bytes
> [WPA] Size of mmap'd section function_body: 71319730 bytes

That makes it not worth the trouble?  It might be that most of the trees
are still reachable via DECL_ABSTRACT_ORIGIN from the locals (even if
they are unused).  Do we now aggressively prune unused locals from
BLOCK_VARS?
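To put the "not worth the trouble" question in numbers, the relative savings can be computed directly from the figures quoted in comment #7. The helper below is a minimal sketch; `toy_saving_percent` is a hypothetical name, not part of GCC or of its -flto-report output.

```c
#include <assert.h>

/* Percentage by which AFTER is smaller than BEFORE.  Hypothetical
   helper for reading the WPA reports; BEFORE must be positive.  */
double
toy_saving_percent (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before * 100.0;
}
```

Applied to the quoted values, this gives about 2.5% for the ltrans files (1374K vs. 1339K), about 3.4% for the mmap'd decls section (298531604 vs. 288287953 bytes), about 3.0% for the tree bodies read (19129895 vs. 18556224) and about 2.2% for the function_body section (72959123 vs. 71319730 bytes).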
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #7 from Jan Hubicka ---
Hi,
ltrans files are 1374K without and 1339K with patch.

WPA report without patch:
[WPA] read 13690507 SCCs of average size 1.397311
[WPA] 19129895 tree bodies read in total
[WPA] tree SCC table: size 4194301, 2847668 elements, collision ratio: 0.834030
[WPA] tree SCC max chain length 156 (size 1)
[WPA] Compared 5298938 SCCs, 3633923 collisions (0.685783)
[WPA] Merged 5282405 SCCs
[WPA] Merged 9857161 tree bodies
[WPA] Merged 3063763 types
[WPA] 1614359 types prevailed (2464626 associated trees)
[WPA] GIMPLE canonical type table: size 32749, 22785 elements, 149648 searches, 63491 collisions (ratio: 0.424269)
[WPA] GIMPLE canonical type pointer-map: 22785 elements, 348123 searches
[WPA] # of input files: 2236
[WPA] Compression: 298531604 input bytes, 898586109 uncompressed bytes (ratio: 3.010020)
[WPA] Size of mmap'd section decls: 298531604 bytes

WPA report with patch:
[WPA] read 13139926 SCCs of average size 1.412202
[WPA] 18556224 tree bodies read in total
[WPA] tree SCC table: size 4194301, 2725601 elements, collision ratio: 0.813527
[WPA] tree SCC max chain length 153 (size 1)
[WPA] Compared 5043033 SCCs, 3379920 collisions (0.670216)
[WPA] Merged 5027945 SCCs
[WPA] Merged 9584037 tree bodies
[WPA] Merged 2957501 types
[WPA] 1557131 types prevailed (2402973 associated trees)
[WPA] GIMPLE canonical type table: size 32749, 22783 elements, 148468 searches, 63408 collisions (ratio: 0.427082)
[WPA] GIMPLE canonical type pointer-map: 22783 elements, 347231 searches
[WPA] Compression: 288287953 input bytes, 867825506 uncompressed bytes (ratio: 3.010273)
[WPA] Size of mmap'd section decls: 288287953 bytes

The WPA report after optimization without patch:
WPA statistics
[WPA] read 13690507 SCCs of average size 1.397311
[WPA] 19129895 tree bodies read in total
[WPA] # of input files: 2236
[WPA] # of input cgraph nodes: 411683
[WPA] # of function bodies: 83824
[WPA] # of output files: 128
[WPA] # of output symtab nodes: 1050843
[WPA] # of output tree pickle references: 708465
[WPA] # of output tree bodies: 190996
[WPA] # callgraph partitions: 128
[WPA] Compression: 416611153 input bytes, 1228624058 uncompressed bytes (ratio: 2.949090)
[WPA] Size of mmap'd section decls: 298531604 bytes
[WPA] Size of mmap'd section function_body: 72959123 bytes

with patch:
WPA statistics
[WPA] read 13139926 SCCs of average size 1.412202
[WPA] 18556224 tree bodies read in total
[WPA] # of input files: 2236
[WPA] # of input cgraph nodes: 411683
[WPA] # of function bodies: 83824
[WPA] # of output files: 128
[WPA] # of output symtab nodes: 1050843
[WPA] # of output tree pickle references: 685935
[WPA] # of output tree bodies: 183785
[WPA] # callgraph partitions: 128
[WPA] Compression: 404728108 input bytes, 1193421138 uncompressed bytes (ratio: 2.948699)
[WPA] Size of mmap'd section decls: 288287953 bytes
[WPA] Size of mmap'd section function_body: 71319730 bytes
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #6 from Jan Hubicka ---
> Honza - can you test the effect of this patch please?

Thanks! I am just redoing the tests (rebuilding firefoxes with updated
tree), so I will do that today or tomorrow.

Honza
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #5 from Richard Biener ---
Created attachment 45092
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45092&action=edit
better patch

This is a less hacky approach mimicking TREE_DIEs.  It elides a BLOCK's
BLOCK_ABSTRACT_ORIGIN if that is a BLOCK.  The downside is that the elided
BLOCKs are re-materialized for each _use_ as BLOCK_ABSTRACT_ORIGIN (that
can probably be fixed though with some extra overhead).  Also the overhead
for streaming is one NULL pointer in case BLOCK_ABSTRACT_ORIGIN is not set.

Honza - can you test the effect of this patch please?  I'll throw it on
an LTO bootstrap; it at least survives lto.exp testing ;)
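The trade-off described in comment #5 can be modelled in a few lines. This is a toy sketch and not the attached patch: `struct toy_block`, `toy_stream_out_origin` and `toy_stream_in_origin` are hypothetical, and representing the elided origin by a small non-zero integer id is purely an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy scope block (hypothetical; GCC's BLOCK is a tree node).  */
struct toy_block
{
  int id;                             /* Assumed compact, non-zero reference.  */
  struct toy_block *abstract_origin;  /* NULL when not set.  */
};

/* Writer side: instead of streaming the origin block itself, emit only
   its id, or 0 as the single "NULL pointer" of overhead when the
   abstract origin is not set.  */
int
toy_stream_out_origin (const struct toy_block *block)
{
  assert (block != NULL);
  return block->abstract_origin ? block->abstract_origin->id : 0;
}

/* Reader side: materialize a fresh stub for every non-null reference.
   Two users of the same origin therefore get two distinct stubs, which
   is the downside mentioned above.  The caller releases the result
   with free ().  */
struct toy_block *
toy_stream_in_origin (int ref)
{
  if (ref == 0)
    return NULL;
  struct toy_block *stub = calloc (1, sizeof *stub);
  if (stub != NULL)
    stub->id = ref;
  return stub;
}
```

Sharing the stubs between uses would need some lookup table on the reader side, which is presumably the "extra overhead" a fix would cost.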
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #4 from Richard Biener ---
OK, I'd rather not put this hack on trunk.  If at all, then only a very
limited-scope TREE_DIE (same cases as the hack) should be brought in.
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #3 from Jan Hubicka ---
Thanks, I will give it a try. Note that the numbers I posted are from
trunk and trunk with abstract origin streaming disabled.

Thanks to the type streaming reorg we do not have an overall regression
relative to gcc8:

[WPA] read 14320726 SCCs of average size 1.594004
[WPA] 22827300 tree bodies read in total
[WPA] tree SCC table: size 8388593, 3202920 elements, collision ratio: 0.923241
[WPA] tree SCC max chain length 173 (size 2)
[WPA] Compared 4496372 SCCs, 3676586 collisions (0.817678)
[WPA] Merged 4480405 SCCs
[WPA] Merged 11209786 tree bodies
[WPA] Merged 2602183 types
[WPA] 1837633 types prevailed (3482639 associated trees)
[WPA] GIMPLE canonical type table: size 32749, 22890 elements, 278542 searches, 119444 collisions (ratio: 0.428819)
[WPA] GIMPLE canonical type pointer-map: 22890 elements, 629811 searches
[WPA] # of input files: 2236
[WPA] Compression: 347027044 input bytes, 1054806677 uncompressed bytes (ratio: 3.039552)

Overall size of streamed ltrans.o files is 1.7GB

So abstract origins are definitely important to solve, but we are not in
a desperate situation for GCC9 unless other testcases turn out to behave
worse than firefox (I am in the process of testing other stuff)
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

--- Comment #2 from Richard Biener ---
Created attachment 44995
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44995&action=edit
untested patch

It's surprisingly difficult to hack around things ... but the attached at
least survives lto.exp testing.  Can you see if it fixes the regression?
I very much expect it to break FAT objects since I "wreck" abstract origins
in a way others may not be happy about.
[Bug lto/87988] [9 regression] Streaming of ABSTRACT_ORIGIN is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87988

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2018-11-13
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Target Milestone|---                           |9.0
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener ---
The full TREE_DIE thing won't materialize but I have an idea to "hack"
around the special case of BLOCK_ABSTRACT_ORIGIN.