[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

Bug 45375 depends on bug 48724, which changed state.

Bug 48724 Summary: Lto build of mozilla dies at lto-wrapper: error trying to exec 'make -j1': execvp: No such file or directory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48724

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #219 from Jan Hubicka --- The devirtualization issue is now fixed, so the only workaround still needed is -fno-lifetime-dse.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #218 from Martin Liška ---
Hi.

Building Firefox revision:

commit a704d34fb1f9e0f5dbf4113298d885cdb650906c
Author: Matthew Noorenberghe
Date:   Thu Dec 3 17:33:35 2015 -0800

    Bug 1230391 - Disable password visibility toggling in the capture doorhanger outside Nightly. rs=bnicholson, a=lizzard on a CLOSED TREE

    --HG--
    extra : source : aea828e2cdf767a358ebc6ea661dd3b9b4160321
    extra : intermediate-source : 366dd290472633b06f0942d7737c34e942e0916a

This is a minimal set of LTO options for which the built binary can run:

MYFLAGS="$OPT -march=native -flto=9 -fno-lifetime-dse -fno-devirtualize"

For more details:

# MYFLAGS="$OPT -march=native -flto=9" FAILED
# MYFLAGS="$OPT -march=native -flto=9 -fno-lifetime-dse -fno-delete-null-pointer-checks -fno-devirtualize -fno-strict-aliasing" OK
# MYFLAGS="$OPT -march=native -flto=9 -fno-lifetime-dse -fno-delete-null-pointer-checks" FAILED
# MYFLAGS="$OPT -march=native -flto=9 -fno-lifetime-dse -fno-delete-null-pointer-checks -fno-devirtualize" OK
# MYFLAGS="$OPT -march=native -flto=9 -fno-devirtualize" FAILED
# MYFLAGS="$OPT -march=native -flto=9 -fno-lifetime-dse -fno-devirtualize" OK
# MYFLAGS="$OPT -march=native -flto=9 -fno-lifetime-dse" FAILED

Martin
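The OK/FAILED matrix in comment #218 can be cross-checked mechanically. The following is only a sketch in Python: the flag sets are transcribed from the comment, while the helper names (`minimal_passing`, `required_flags`) are made up for illustration and are not part of any GCC or Mozilla tooling.

```python
# Workaround flags tried on top of "$OPT -march=native -flto=9",
# mapped to whether the resulting binary ran (transcribed from comment #218).
RESULTS = {
    frozenset(): False,
    frozenset({"-fno-lifetime-dse", "-fno-delete-null-pointer-checks",
               "-fno-devirtualize", "-fno-strict-aliasing"}): True,
    frozenset({"-fno-lifetime-dse", "-fno-delete-null-pointer-checks"}): False,
    frozenset({"-fno-lifetime-dse", "-fno-delete-null-pointer-checks",
               "-fno-devirtualize"}): True,
    frozenset({"-fno-devirtualize"}): False,
    frozenset({"-fno-lifetime-dse", "-fno-devirtualize"}): True,
    frozenset({"-fno-lifetime-dse"}): False,
}


def minimal_passing(results):
    """Passing flag sets that have no passing proper subset among the tested ones."""
    passing = [flags for flags, ok in results.items() if ok]
    return [p for p in passing if not any(q < p for q in passing)]


def required_flags(results):
    """Flags that appear in every passing configuration."""
    passing = [flags for flags, ok in results.items() if ok]
    return frozenset.intersection(*passing)


if __name__ == "__main__":
    for flags in minimal_passing(RESULTS):
        print("minimal passing set:", " ".join(sorted(flags)))
    print("needed in every passing run:", " ".join(sorted(required_flags(RESULTS))))
```

This only reasons about the configurations that were actually tested; it says nothing about untested combinations.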
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #217 from Jan Hubicka hubicka at gcc dot gnu.org ---
Author: hubicka
Date: Tue Jan 20 19:48:59 2015
New Revision: 219909

URL: https://gcc.gnu.org/viewcvs?rev=219909&root=gcc&view=rev
Log:
    PR lto/45375
    * ipa-inline.c: Include lto-streamer.h
    (report_inline_failed_reason): Output source file differences and
    flags on optimization/target node mismatch.
    (can_inline_edge_p): Consider caller to be the outer inline function;
    be less restrictive about matching optimize and optimize_size attributes.
    (inline_account_function_p): Break out from ...
    (inline_small_functions): ... here.
    * ipa-inline-transform.c (clone_inlined_nodes): Use
    inline_account_function_p.
    (inline_call): Use optimize attribution; use inline_account_function_p.
    (inline_transform): Use opt_for_fn.
    * ipa-inline.h (inline_account_function_p): Declare.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline-transform.c
    trunk/gcc/ipa-inline.c
    trunk/gcc/ipa-inline.h
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #216 from Jan Hubicka hubicka at gcc dot gnu.org ---
Author: hubicka
Date: Tue Jan 20 04:39:45 2015
New Revision: 219878

URL: https://gcc.gnu.org/viewcvs?rev=219878&root=gcc&view=rev
Log:
    PR lto/45375
    * i386.c (ix86_option_override_internal): Use ix86_tune_cost
    to set branch cost.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #215 from Jan Hubicka hubicka at gcc dot gnu.org ---
Author: hubicka
Date: Mon Jan 19 23:58:19 2015
New Revision: 219871

URL: https://gcc.gnu.org/viewcvs?rev=219871&root=gcc&view=rev
Log:
    PR lto/45375
    * i386.c (gate): Check flag_expensive_optimizations and
    optimize_size.
    (ix86_option_override_internal): Drop optimize_size condition
    on MASK_ACCUMULATE_OUTGOING_ARGS, MASK_VZEROUPPER,
    MASK_AVX256_SPLIT_UNALIGNED_LOAD, MASK_AVX256_SPLIT_UNALIGNED_STORE,
    MASK_PREFER_AVX128.
    (ix86_avx256_split_vector_move_misalign,
    ix86_avx256_split_vector_move_misalign): Check optimize_insn_for_speed.
    * sse.md (all uses of TARGET_PREFER_AVX128): Add
    optimize_insn_for_speed_p check.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

Martin Liška marxin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #214 from Martin Liška marxin at gcc dot gnu.org ---
I've just found an ICE at r217480 with LTO and -O2:

lto1: internal compiler error: in lto_output_node, at lto-cgraph.c:462
0x7ce411 lto_output_node
    ../../gcc/lto-cgraph.c:462
0x7ce411 output_symtab()
    ../../gcc/lto-cgraph.c:974
0x7db276 lto_output()
    ../../gcc/lto-streamer-out.c:2309
0x814671 write_lto
    ../../gcc/passes.c:2346
0x8177c1 ipa_write_optimization_summaries(lto_symtab_encoder_d*)
    ../../gcc/passes.c:2545
0x59512a do_stream_out
    ../../gcc/lto/lto.c:2475
0x59a41f stream_out
    ../../gcc/lto/lto.c:2538
0x59a41f lto_wpa_write_files
    ../../gcc/lto/lto.c:2655
0x59a41f do_whole_program_analysis
    ../../gcc/lto/lto.c:3323
0x59a41f lto_main()
    ../../gcc/lto/lto.c:3443

The failing assertion is here:

  if (tag == LTO_symtab_analyzed_node)
    gcc_assert (clone_of || !node->clone_of);
    ^
  if (!clone_of)
    streamer_write_hwi_stream (ob->main_stream, LCC_NOT_FOUND);
  else
    streamer_write_hwi_stream (ob->main_stream, ref);

If needed, I will try to reduce the objects that are part of the WPA phase.

Martin
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #213 from Steffen Hau steffen at hauihau dot de --- Hi Jan, just a short update: Firefox since version 30 as well as Thunderbird since version 31 both compile fine with LTO enabled, without the need for any additional patches. The package size was reduced by 51% (Firefox ~420MB -> ~207MB) and 59% (Thunderbird ~480MB -> ~200MB). Both programs work as intended, with no crashes or unexpected behaviour so far. Best regards, Steffen
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #212 from Steffen Hau steffen at hauihau dot de --- Hi Jan, I have binutils version 2.24 with the patch from Markus Trippelsdorf for early plugin loading, so I have no wrappers for ar, nm and ranlib. I've also symlinked liblto_plugin.so into the binutils bfd-plugins directory. I'll try to apply the 3 patches you mentioned in your blog post and see whether they help, but I think they are not relevant to the elfhack portion, which is what fails on my system. Which Firefox version did you successfully compile?
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #211 from Jan Hubicka hubicka at gcc dot gnu.org ---
Elfhack is rather sensitive to LTO, but it works for me, so this seems like a binutils issue or some elfhack change that happened recently.

I wrote instructions for building Firefox with LTO here:
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html

Here I am attaching the -ftime-report after the symtab hashtable was removed:

Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1536 kB ( 0%) ggc
 phase opt and generate  :  54.29 (58%) usr   1.28 (18%) sys  55.58 (50%) wall  720779 kB (18%) ggc
 phase stream in         :  33.54 (36%) usr   1.84 (26%) sys  35.39 (32%) wall 3389310 kB (82%) ggc
 phase stream out        :   6.00 ( 6%) usr   4.02 (56%) sys  19.99 (18%) wall       0 kB ( 0%) ggc
 garbage collection      :   1.86 ( 2%) usr   0.00 ( 0%) sys   1.86 ( 2%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall       9 kB ( 0%) ggc
 ipa dead code removal   :   5.70 ( 6%) usr   0.18 ( 3%) sys   6.15 ( 6%) wall      92 kB ( 0%) ggc
 ipa inheritance graph   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     883 kB ( 0%) ggc
 ipa virtual call target :   5.58 ( 6%) usr   0.06 ( 1%) sys   5.32 ( 5%) wall       0 kB ( 0%) ggc
 ipa devirtualization    :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall    9201 kB ( 0%) ggc
 ipa cp                  :   2.34 ( 2%) usr   0.21 ( 3%) sys   2.55 ( 2%) wall  223628 kB ( 5%) ggc
 ipa inlining heuristics :  26.97 (29%) usr   0.67 ( 9%) sys  27.66 (25%) wall  865791 kB (21%) ggc
 ipa comdats             :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   0.07 ( 0%) usr   0.11 ( 2%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :   0.46 ( 0%) usr   0.19 ( 3%) sys   0.65 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :  24.76 (26%) usr   1.28 (18%) sys  26.08 (23%) wall 2571773 kB (63%) ggc
 ipa lto decl out        :   5.45 ( 6%) usr   0.28 ( 4%) sys   5.75 ( 5%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.13 ( 1%) usr   0.24 ( 3%) sys   1.38 ( 1%) wall  414551 kB (10%) ggc
 ipa lto decl merge      :   2.57 ( 3%) usr   0.01 ( 0%) sys   2.58 ( 2%) wall    8227 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.72 ( 2%) usr   0.00 ( 0%) sys   1.72 ( 2%) wall   12166 kB ( 0%) ggc
 whopr wpa               :   1.04 ( 1%) usr   0.00 ( 0%) sys   1.04 ( 1%) wall       2 kB ( 0%) ggc
 whopr wpa I/O           :   0.03 ( 0%) usr   3.55 (50%) sys  13.51 (12%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   4.97 ( 5%) usr   0.06 ( 1%) sys   5.02 ( 5%) wall    3738 kB ( 0%) ggc
 ipa reference           :   3.62 ( 4%) usr   0.12 ( 2%) sys   3.75 ( 3%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.33 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   3.86 ( 4%) usr   0.01 ( 0%) sys   3.88 ( 3%) wall       0 kB ( 0%) ggc
 tree eh                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG cleanup        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.05 ( 0%) usr   0.16 ( 2%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.65 ( 1%) usr   0.00 ( 0%) sys   0.64 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                   :  93.84             7.14           110.98          4111626 kB

There are some improvements in devirtualization performance, which used quite a few decl-symbol lookups (about 20%).
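In the -ftime-report output quoted in comment #211, the four "phase" rows add up to the TOTAL row, while the remaining rows break those phases down further. Below is a small Python sketch that parses rows of this shape and re-derives the totals. The regular expression is an assumption based on the layout shown in this report only, not a specification of GCC's output format, and the helper names are hypothetical.

```python
import re

# The four "phase" rows, transcribed from the report in comment #211.
REPORT = """\
phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1536 kB ( 0%) ggc
phase opt and generate  :  54.29 (58%) usr   1.28 (18%) sys  55.58 (50%) wall  720779 kB (18%) ggc
phase stream in         :  33.54 (36%) usr   1.84 (26%) sys  35.39 (32%) wall 3389310 kB (82%) ggc
phase stream out        :   6.00 ( 6%) usr   4.02 (56%) sys  19.99 (18%) wall       0 kB ( 0%) ggc
"""

# Assumed row layout: name : usr (pct) usr  sys (pct) sys  wall (pct) wall  N kB (pct) ggc
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:"
    r"\s*(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr"
    r"\s*(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys"
    r"\s*(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall"
    r"\s*(?P<kb>\d+)\s*kB"
)


def parse_rows(text):
    """Map timer name -> (usr seconds, sys seconds, wall seconds, ggc kB)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = (float(m["usr"]), float(m["sys"]),
                               float(m["wall"]), int(m["kb"]))
    return rows


def phase_totals(rows):
    """Column-wise sums over the rows whose name starts with 'phase '."""
    phases = [v for name, v in rows.items() if name.startswith("phase ")]
    return tuple(round(sum(col), 2) for col in zip(*phases))


if __name__ == "__main__":
    print(phase_totals(parse_rows(REPORT)))
```

The summed ggc figure can differ from the reported TOTAL by a kilobyte or so, since each row is rounded independently.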
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

Steffen Hau steffen at hauihau dot de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steffen at hauihau dot de

--- Comment #210 from Steffen Hau steffen at hauihau dot de ---
Latest Firefox 29.0.1 does not compile with LTO enabled (Gentoo/GCC 4.9.0). It fails in elfhack:

make[5]: Entering directory '/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/build/unix/elfhack'
elfhack
/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/_virtualenv/bin/python /home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/config/expandlibs_exec.py --depend .deps/elfhack.pp --target elfhack -- x86_64-pc-linux-gnu-g++ -o elfhack -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -mno-avx -std=gnu++0x -MD -MP -MF .deps/elfhack.pp -Wl,-O1 -Wl,--as-needed -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -Wl,-znow -Wl,--sort-common -Wl,--hash-style=gnu -Wl,--enable-new-dtags host_elf.o host_elfhack.o
x86_64-pc-linux-gnu-gcc -o dummy dummy.o -lpthread -Wl,-O1 -Wl,--as-needed -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -Wl,-znow -Wl,--sort-common -Wl,--hash-style=gnu -Wl,--enable-new-dtags -Wl,-z,noexecstack -Wl,-z,text -Wl,-rpath-link,/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/dist/bin -Wl,-rpath-link,/usr/lib
x86_64-pc-linux-gnu-g++ -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Werror=int-to-pointer-cast -Wtype-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -Wcast-align -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -mno-avx -fno-strict-aliasing -fno-rtti -fno-math-errno -std=gnu++0x -pthread -pipe -fexceptions -DNDEBUG -DTRIMMED -O2 -fomit-frame-pointer -fPIC -shared -Wl,-z,defs -Wl,-h,test-array.so -o test-array.so -lpthread -Wl,-O1 -Wl,--as-needed -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -Wl,-znow -Wl,--sort-common -Wl,--hash-style=gnu -Wl,--enable-new-dtags -Wl,-z,noexecstack -Wl,-z,text -Wl,-rpath-link,/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/dist/bin -Wl,-rpath-link,/usr/lib test-array.o -nostartfiles
x86_64-pc-linux-gnu-g++ -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Werror=int-to-pointer-cast -Wtype-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -Wcast-align -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -mno-avx -fno-strict-aliasing -fno-rtti -fno-math-errno -std=gnu++0x -pthread -pipe -fexceptions -DNDEBUG -DTRIMMED -O2 -fomit-frame-pointer -fPIC -shared -Wl,-z,defs -Wl,-h,test-ctors.so -o test-ctors.so -lpthread -Wl,-O1 -Wl,--as-needed -march=native -pipe -ggdb -flto=5 -fuse-linker-plugin -Wl,-znow -Wl,--sort-common -Wl,--hash-style=gnu -Wl,--enable-new-dtags -Wl,-z,noexecstack -Wl,-z,text -Wl,-rpath-link,/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/dist/bin -Wl,-rpath-link,/usr/lib test-ctors.o -nostartfiles
===
=== If you get failures below, please file a bug describing the error
=== and your environment (compiler and linker versions), and use
=== --disable-elf-hack until this is fixed.
===
# Fail if the library doesn't have INIT .dynamic info
readelf -d test-ctors.so | grep '(INIT)'
 0x000c (INIT) 0x0
/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/build/unix/elfhack/elfhack -b -f test-ctors.so
===
=== If you get failures below, please file a bug describing the error
=== and your environment (compiler and linker versions), and use
=== --disable-elf-hack until this is fixed.
===
# Fail if the library doesn't have INIT_ARRAY .dynamic info
test-ctors.so: Reduced by 12096 bytes
readelf -d test-array.so | grep '(INIT_ARRAY)'
# Fail if the backup file doesn't exist
[ -f 'test-ctors.so.bak' ]
 0x0019 (INIT_ARRAY) 0x9790
# Fail if the new library doesn't contain less relocations
/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/build/unix/elfhack/elfhack -b -f test-array.so
test-array.so: [ $(objdump -R test-ctors.so.bak | wc -l) -gt $(objdump -R test-ctors.so | wc -l) ]
Reduced by 12088 bytes
# Fail if the backup file doesn't exist
[ -f 'test-array.so.bak' ]
# Fail if the new library doesn't contain less relocations
[ $(objdump -R test-array.so.bak | wc -l) -gt $(objdump -R test-array.so | wc -l) ]
# Will either crash or return exit code 1 if elfhack is broken
LD_PRELOAD=/home/misc/gentoo/tmp/portage/www-client/firefox-29.0.1/work/mozilla-release/obj-x86_64-pc-linux-gnu/build/unix/elfhack/test-array.so
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #209 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
(In reply to Markus Trippelsdorf from comment #208)
> Both issues from Comment 201 were fixed by:
> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00338.html

No, only the first issue is fixed. The second one (LTO/PGO build) still happens, unfortunately.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #208 from Markus Trippelsdorf trippels at gcc dot gnu.org --- Both issues from Comment 201 were fixed by: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00338.html
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #206 from Martin Liška mliska at suse dot cz --- Firefox (and Chromium) memory reports with -flto=9 and -O2; the archive also contains a memory usage graph: https://docs.google.com/file/d/0B0pisUJ80pO1bnV5V0RtWXJkaVU/edit
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #207 from Martin Liška mliska at suse dot cz ---
Created attachment 32525
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32525&action=edit
Memory usage graphs for -flto=9, -flto=4, -flto=1 with -O2
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #205 from Jan Hubicka hubicka at ucw dot cz --- I was looking into this recently, too. Curiously enough, for me clang+LTO was winning, but comparing the symbols it seemed that the configure scripts picked a few more features on the GCC side. I looked briefly at the differences: we can optimize out more vtables, for which I have a patch pending for the next stage1, and optimize out write-only global vars. Still, the differences may be worth further investigation - clang seems to produce noticeably fewer external relocations, too. This seems like an ABI bug on the clang side, though. What I use for my Firefox builds is --param inline-unit-growth=5. Our -O3 seems a bit of overkill for an application of the size of Firefox... Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #204 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Here is a comparison of libxul sizes (in bytes, unstripped) for different compiler options:

gcc (trunk):
  -O3              90213016
  -O3 -flto        79682648
  -O3 -flto / PGO  77250512
  -Os              70431584
  -Os -flto        62474008

clang (trunk):
  -O3              80574784
  -O3 -flto        79394992
  -Os              72452776
  -Os -flto        65111640
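To make the libxul numbers in comment #204 easier to compare, the relative saving from adding -flto can be computed per compiler and optimization level. This is a plain arithmetic sketch in Python over the figures quoted in that comment; the function name `lto_saving` is made up for illustration.

```python
# libxul sizes in bytes (unstripped), transcribed from comment #204.
SIZES = {
    "gcc": {"-O3": 90213016, "-O3 -flto": 79682648,
            "-Os": 70431584, "-Os -flto": 62474008},
    "clang": {"-O3": 80574784, "-O3 -flto": 79394992,
              "-Os": 72452776, "-Os -flto": 65111640},
}


def lto_saving(compiler, level):
    """Fraction by which libxul shrinks when -flto is added at this level."""
    base = SIZES[compiler][level]
    lto = SIZES[compiler][level + " -flto"]
    return (base - lto) / base


if __name__ == "__main__":
    for compiler in SIZES:
        for level in ("-O3", "-Os"):
            saving = lto_saving(compiler, level)
            print(f"{compiler} {level}: {saving:.1%} smaller with -flto")
```

The PGO row is left out because the comment gives no matching non-LTO PGO figure to compare it against.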
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #200 from Martin Jambor jamborm at gcc dot gnu.org --- I currently cannot build Firefox with LTO due to PR 60449 (yeah, I know, using gcc configured with checking makes life hard, sometimes unnecessarily). I get errors like /home/mjambor/mozilla/mzc2/media/libvpx/vp8/encoder/onyx_if.c:4884:5: error: control flow in the middle of basic block 7
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #201 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
With current gcc trunk and mozilla-central trunk, Firefox crashes on startup when built with -flto (--enable-optimize=-O3):

0x75ce5d8f in nsCOMPtr_base::assign_with_AddRef(nsISupports*) [clone .constprop.13162] () from /var/tmp/moz-build-dir/dist/bin/libxul.so
(gdb) bt
#0 0x75ce5d8f in nsCOMPtr_base::assign_with_AddRef(nsISupports*) [clone .constprop.13162] () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#1 0x73fe60eb in nsSocketTransport::OnSocketDetached(PRFileDesc*) () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#2 0x73eb74ac in nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*, nsSocketTransportService::SocketContext*) () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#3 0x73fff28f in nsSocketTransportService::Run() () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#4 0x74059c6a in nsThread::ProcessNextEvent(bool, bool*) () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#5 0x75ce5b39 in NS_ProcessNextEvent(nsIThread*, bool) [clone .constprop.13167] () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#6 0x745af7a0 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#7 0x73ec649d in MessageLoop::Run() () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#8 0x73fe7a56 in nsThread::ThreadFunc(void*) () from /var/tmp/moz-build-dir/dist/bin/libxul.so
#9 0x77e7757c in _pt_root () from /var/tmp/moz-build-dir/dist/bin/libnspr4.so
#10 0x77bc41e2 in start_thread () from /lib/libpthread.so.0
#11 0x774932ad in clone () from /lib/libc.so.6

When I build with PGO/LTO, Firefox crashes later (when I close a tab with e.g. https://github.com/JuliaLang/julia/pull/6018 ):

Program received signal SIGSEGV, Segmentation fault.
0x751645ed in PL_DHashTableEnumerate(PLDHashTable*, PLDHashOperator (*)(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*), void*) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
(gdb) bt
#0 0x751645ed in PL_DHashTableEnumerate(PLDHashTable*, PLDHashOperator (*)(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*), void*) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#1 0x75754d32 in PresShell::Destroy() () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#2 0x75754831 in nsDocumentViewer::DestroyPresShell() () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#3 0x755ee5c4 in nsDocumentViewer::Hide() () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#4 0x757b72eb in nsDocShell::SetVisibility(bool) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#5 0x75a589a4 in nsFrameLoader::Hide() () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#6 0x75a588f6 in nsHideViewer::Run() () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#7 0x753b97de in nsContentUtils::RemoveScriptBlocker() () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#8 0x753cc954 in nsDocument::EndUpdate(unsigned int) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#9 0x75651dd6 in mozilla::dom::XULDocument::EndUpdate(unsigned int) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#10 0x7549673b in nsINode::doRemoveChildAt(unsigned int, bool, nsIContent*, nsAttrAndChildArray) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#11 0x75496085 in nsXULElement::RemoveChildAt(unsigned int, bool) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#12 0x75494df9 in nsINode::RemoveChild(nsINode, mozilla::ErrorResult) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#13 0x75494a00 in mozilla::dom::NodeBinding::removeChild(JSContext*, JS::HandleJSObject*, nsINode*, JSJitMethodCallArgs const) [clone .lto_priv.13709] () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#14 0x753b01e7 in mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#15 0x75262744 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#16 0x7524a14c in Interpret(JSContext*, js::RunState) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#17 0x75249801 in js::RunScript(JSContext*, js::RunState) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#18 0x752627ec in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) () from /var/tmp/firefox-destdir/usr/lib/firefox-30.0a1/libxul.so
#19 0x752a574c in js::Invoke(JSContext*, JS::Value
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #202 from H.J. Lu hjl.tools at gmail dot com --- LTO miscompiles 435.gromacs in SPEC CPU 2006 on x32 with -mx32 -O3 -funroll-loops -ffast-math since r208165 (PR 60418). Can you try r208163?
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #203 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
(In reply to H.J. Lu from comment #202)
> LTO miscompiles 435.gromacs in SPEC CPU 2006 on x32 with
> -mx32 -O3 -funroll-loops -ffast-math since r208165 (PR 60418).
> Can you try r208163?

Yes. Unfortunately, with r208163 Firefox still crashes on startup.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

Markus Trippelsdorf trippels at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |trippels at gcc dot gnu.org

--- Comment #197 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Created attachment 31876
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31876&action=edit
mozilla-central patch
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #198 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Created attachment 31877
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31877&action=edit
My local PGO/LTO script
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #199 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Created attachment 31878
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31878&action=edit
.mozconfig_profile_gen
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #196 from Markus Trippelsdorf markus at trippelsdorf dot de ---
(In reply to Jan Hubicka from comment #195)
> Today there were two fixes for bugs that produce undefined symbols like the
> one you see. Does the problem still exist on current mainline? Are you using
> profile feedback?

The problem is gone on current mainline. (And yes, I'm using profile feedback.)
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #195 from Jan Hubicka hubicka at ucw dot cz --- Today there were two fixes for bugs that produce undefined symbols like the one you see. Does the problem still exist on current mainline? Are you using profile feedback?
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #193 from Jan Hubicka hubicka at gcc dot gnu.org --- I am building Firefox with -O3 and get no undefined symbols. Could you please relink with -Wl,--no-demangle --save-temps -fdump-ipa-all and look up the missing symbol in the -lm.res file? If it is not UNDEF there, please make the dumps available somewhere. If it is undefined there, it may be a Firefox bug.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #194 from Markus Trippelsdorf markus at trippelsdorf dot de ---
(In reply to Jan Hubicka from comment #193)
> I am building firefox with -O3 and get no undefined symbols. Can you,
> please, relink with -Wl,--no-demangle --save-temps -fdump-ipa-all and try to
> look up the missing symbol in -lm.res file and if it not UNDEF there make
> somewhere available the dumps? If it is undefined there, it may be firefox
> bug..

Hmm, it's strange, because there are five undefined references; one of them does not appear in lm.res at all and the other four are all PREVAILING_DEF_IRONLY.

(The whole dump is huge. Please tell me which part you need and I will try to upload it somewhere.)
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #191 from Markus Trippelsdorf markus at trippelsdorf dot de ---
First of all, many thanks for your work on reducing memory usage. Peak memory usage is now lower (~3GB) than clang's (~4GB).

However, with --enable-optimize=-O3 on rev202079 I get the errors below. (A default (-Os) build on rev202053 went fine this morning.)

/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccd3grW1.ltrans0.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN17nsHttpTransaction18ReadRequestSegmentEP14nsIInputStreamPvPKcjjPj' which may overflow at runtime; recompile with -fPIC
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccd3grW1.ltrans0.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN17nsHttpTransaction18ReadRequestSegmentEP14nsIInputStreamPvPKcjjPj' which may overflow at runtime; recompile with -fPIC
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccd3grW1.ltrans1.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN16nsInputStreamTee15WriteSegmentFunEP14nsIInputStreamPvPKcjjPj' which may overflow at runtime; recompile with -fPIC
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccd3grW1.ltrans24.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN16nsInputStreamTee15WriteSegmentFunEP14nsIInputStreamPvPKcjjPj' which may overflow at runtime; recompile with -fPIC
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: read-only segment has dynamic relocations
/tmp/ccd3grW1.ltrans0.ltrans.o:ccd3grW1.ltrans0.o:function nsHttpTransaction::ReadSegments(nsAHttpSegmentReader*, unsigned int, unsigned int*): error: undefined reference to 'nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)'
/tmp/ccd3grW1.ltrans0.ltrans.o:ccd3grW1.ltrans0.o:function nsHttpConnection::OnSocketWritable(): error: undefined reference to 'nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)'
/tmp/ccd3grW1.ltrans0.ltrans.o:ccd3grW1.ltrans0.o:function nsHttpPipeline::ReadSegments(nsAHttpSegmentReader*, unsigned int, unsigned int*): error: undefined reference to 'nsHttpPipeline::ReadFromPipe(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)'
/tmp/ccd3grW1.ltrans1.ltrans.o:ccd3grW1.ltrans1.o:function imgRequest::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long, unsigned int): error: undefined reference to 'nsInputStreamTee::WriteSegmentFun(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)'
/tmp/ccd3grW1.ltrans24.ltrans.o:ccd3grW1.ltrans24.o:function nsInputStreamTee::ReadSegments(tag_nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*): error: undefined reference to 'nsInputStreamTee::WriteSegmentFun(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)'

Not sure if -O3 or rev202079 is to blame.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #192 from Markus Trippelsdorf markus at trippelsdorf dot de --- It turned out that --enable-optimize=-O3 is the cause. Rev202079 with -Os links fine.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

Martin Liška marxin.liska at gmail dot com changed:

           What    |Removed                     |Added
                 CC|                            |marxin.liska at gmail dot com

--- Comment #189 from Martin Liška marxin.liska at gmail dot com ---
I've encountered problems connected with PGO:

gcc revision: 201894
firefox changeset: 143205:1d6bf2bd4003 (Aug 20, 2013)

I build an instrumented binary without LTO and then use the profile for the LTO build:

MYFLAGS=-flto=9 -fno-fat-lto-objects -ftoplevel-reorder -fprofile-use -Wno-error=coverage-mismatch

Some gcda files are mentioned earlier in this thread; I removed the following:

jemalloc.gcda (makes sense)
ptsynch.gcda (likewise)
HashFunctions.gcda (?)
sqlite3.gcda (?)

After linking of sqlite3, there are many corrupted profiles, for example in:

/ssd/firefox/js/src/gc/Marking.cpp
/ssd/firefox/js/src/frontend/BytecodeEmitter.cpp
/ssd/firefox/js/src/frontend/Interpreter.cpp
...

Example of an error:

/ssd/firefox/js/src/gc/Marking.cpp: In function ‘js::gc::IsAboutToBeFinalized<JSAtom>(JSAtom**)bool [clone .isra.65]’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
 }
 ^
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-6 thought to be -81
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-4 thought to be 39667
/ssd/firefox/js/src/gc/Marking.cpp: In function ‘js::gc::IsAboutToBeFinalized<js::UnownedBaseShape>(js::UnownedBaseShape**)bool [clone .isra.52]’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-6 thought to be -1
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-4 thought to be 41156
/ssd/firefox/js/src/gc/Marking.cpp: In function ‘MarkInternal<JSAtom>(JSTracer*, JSAtom**)void’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 9-14 thought to be -39
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 9-10 thought to be 180119
/ssd/firefox/js/src/gc/Marking.cpp: In function ‘MarkInternal<JSObject>(JSTracer*, JSObject**)void’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 11-18 thought to be -1
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 11-12 thought to be 49007
/ssd/firefox/js/src/gc/Marking.cpp: In member function ‘js::MarkStack<unsigned long>::push(unsigned long)’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 4-6 thought to be -1
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 4-5 thought to be 1
/ssd/firefox/js/src/gc/Marking.cpp: In member function ‘js::GCMarker::drainMarkStack(js::SliceBudget)’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-4 thought to be -7
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-1 thought to be 7
/ssd/firefox/js/src/gc/Marking.cpp: In member function ‘js::ObjectImpl::slotSpan() const’:
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 5-7 thought to be -1
/ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 5-6 thought to be 15965

Thank you,
Martin
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #190 from Jan Hubicka hubicka at ucw dot cz ---
> /ssd/firefox/js/src/gc/Marking.cpp: In function ‘js::gc::IsAboutToBeFinalized<JSAtom>(JSAtom**)bool [clone .isra.65]’:
> /ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: profile data is not flow-consistent
>  }
>  ^
> /ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-6 thought to be -81

This actually looks like corruption from concurrent updates (profiling is not thread safe). Do you get many more of these? I can imagine that the garbage collector runs in parallel, and often.

> /ssd/firefox/js/src/gc/Marking.cpp:1713:1: error: corrupted profile info: number of executions for edge 3-4 thought to be 39667

Perhaps we should fix the dumping to print the full 64-bit value. :)

Honza
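
To illustrate why racy counter updates can surface as a negative edge count, here is a minimal sketch. It assumes that basic block 3 has exactly two outgoing edges (3-4 and 3-6) and that the count for 3-6 is derived by flow conservation instead of being measured directly. The entry count 39586 is hypothetical: it is back-computed from the two values in the log and does not appear in the report itself.

```python
def derive_edge_count(block_entries, measured_out_edges):
    """Derive the count of the single uninstrumented outgoing edge.

    Flow conservation: entries into a block == sum of its outgoing edge counts.
    """
    return block_entries - sum(measured_out_edges)


# Hypothetical entry count for block 3 (an assumption, see lead-in).
block3_entries = 39586
# Count reported in the log for edge 3-4.
edge_3_4 = 39667

edge_3_6 = derive_edge_count(block3_entries, [edge_3_4])
print(edge_3_6)
```

If unsynchronized increments from several threads make one counter drift relative to its neighbours, the measured edge can exceed the block's entry count and the derived edge drops below zero, which is the inconsistency the error message reports.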
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #187 from Jan Hubicka hubicka at gcc dot gnu.org ---
WPA time report

Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1398 kB ( 0%) ggc
 phase opt and generate  :  80.79 (13%) usr   1.01 ( 3%) sys  81.96 (12%) wall  315727 kB (25%) ggc
 phase stream in         : 283.33 (45%) usr   7.82 (24%) sys 292.12 (44%) wall  940315 kB (74%) ggc
 phase stream out        : 261.66 (42%) usr  23.14 (72%) sys 287.88 (43%) wall    7534 kB ( 1%) ggc
 garbage collection      :  14.45 ( 2%) usr   0.02 ( 0%) sys  14.48 ( 2%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   2.55 ( 0%) usr   0.00 ( 0%) sys   2.55 ( 0%) wall      33 kB ( 0%) ggc
 ipa cp                  :  10.45 ( 2%) usr   0.36 ( 1%) sys  10.81 ( 2%) wall  456287 kB (36%) ggc
 ipa inlining heuristics :  42.12 ( 7%) usr   1.06 ( 3%) sys  43.27 ( 7%) wall 1485346 kB (117%) ggc
 ipa lto gimple in       :   0.56 ( 0%) usr   0.25 ( 1%) sys   0.87 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :  21.77 ( 3%) usr   1.72 ( 5%) sys  23.53 ( 4%) wall       0 kB ( 0%) ggc
 ipa lto decl in         : 183.90 (29%) usr   4.77 (15%) sys 189.46 (29%) wall  959299 kB (76%) ggc
 ipa lto decl out        : 231.70 (37%) usr  10.78 (34%) sys 242.73 (37%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :  14.38 ( 2%) usr   1.57 ( 5%) sys  15.99 ( 2%) wall 2405760 kB (190%) ggc
 ipa lto decl merge      :  32.16 ( 5%) usr   0.00 ( 0%) sys  32.24 ( 5%) wall    8268 kB ( 1%) ggc
 ipa lto cgraph merge    :  28.72 ( 5%) usr   0.06 ( 0%) sys  28.81 ( 4%) wall  135235 kB (11%) ggc
 whopr wpa               :   9.57 ( 2%) usr   0.05 ( 0%) sys   9.62 ( 1%) wall    7537 kB ( 1%) ggc
 whopr wpa I/O           :   2.07 ( 0%) usr  10.62 (33%) sys  15.49 ( 2%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.26 ( 1%) usr   0.03 ( 0%) sys   3.29 ( 0%) wall       0 kB ( 0%) ggc
 ipa reference           :   5.55 ( 1%) usr   0.05 ( 0%) sys   5.62 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile             :   2.82 ( 0%) usr   0.05 ( 0%) sys   2.88 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   6.25 ( 1%) usr   0.13 ( 0%) sys   6.38 ( 1%) wall       0 kB ( 0%) ggc
 unaccounted todo        :  13.25 ( 2%) usr   0.28 ( 1%) sys  13.58 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                   : 625.79            31.97           661.97            1264976 kB
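
A quick way to sanity-check a time report like the one above is to confirm that the four "phase" rows add up to the TOTAL row. The sketch below copies only those four rows from the report; the regular expression and the helper name `phase_totals` are illustrative and not part of any GCC tooling.

```python
import re

# The four top-level phase rows, copied from the report above.
REPORT = """\
phase setup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1398 kB ( 0%) ggc
phase opt and generate : 80.79 (13%) usr 1.01 ( 3%) sys 81.96 (12%) wall 315727 kB (25%) ggc
phase stream in : 283.33 (45%) usr 7.82 (24%) sys 292.12 (44%) wall 940315 kB (74%) ggc
phase stream out: 261.66 (42%) usr 23.14 (72%) sys 287.88 (43%) wall 7534 kB ( 1%) ggc
"""

# One row: name, then usr/sys/wall seconds, each followed by a percentage.
ROW = re.compile(
    r"^(?P<name>.+?)\s*:\s*"
    r"(?P<usr>[\d.]+) \(.*?\) usr\s+"
    r"(?P<sys>[\d.]+) \(.*?\) sys\s+"
    r"(?P<wall>[\d.]+) \(.*?\) wall"
)


def phase_totals(report):
    """Sum usr/sys/wall seconds over the rows whose name starts with 'phase '."""
    totals = {"usr": 0.0, "sys": 0.0, "wall": 0.0}
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group("name").startswith("phase "):
            for key in totals:
                totals[key] += float(match.group(key))
    return {key: round(value, 2) for key, value in totals.items()}


totals = phase_totals(REPORT)
print(totals)
```

The sums come out to 625.79 usr, 31.97 sys and 661.97 wall, matching the TOTAL row. The remaining rows are sub-timers nested inside the phases, which is why their percentages do not add up to 100.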
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #185 from Jan Hubicka hubicka at gcc dot gnu.org ---
I merged in some patches intended to reduce the memory use of Firefox LTO and also updated the Firefox tree. Some more involved patches are on the way, so this is a summary of where we stand now.

WPA memory usage as shown in top is now 10GB.

1) After streaming in trees, the GGC usage is now 5.1GB:
 - 2.5GB are trees
 - 1GB are linemaps
 - 0.8GB are decl maps (decl states)

 tree_list      12561507
 integer_type    1511296
 pointer_type    4610735
 record_type     8139077
 method_type     2401664
 integer_cst     6677946
 string_cst      2127890
 function_decl   6069299
 label_decl       504859
 field_decl      5104957
 var_decl         596020
 const_decl      5401253
 parm_decl       9002744
 type_decl      10150100
 result_decl     2181250
 addr_expr       4173661
 tree_binfo      4780477

I have a cache that cuts down the linemaps, plus a patch to not stream PARM_DECLs and RESULT_DECLs. With these the usage goes below 3GB.

2) Cgraph streaming now becomes an important factor. GGC usage goes up to 7.7GB.

GGC use:
 - cgraph nodes themselves are 1.5GB
 - inline summaries are 0.5GB
 - cgraph edges are 3.7GB
 - IPA references 2.3GB
 - IPA-prop 0.7GB

Off GGC:
 - IPA-prop 0.6GB
 - Inline summary 0.5GB
 - symtab encoder 0.17GB

Here one can easily
 - compress the vectors recording definitions
 - pull off parts of cgraph nodes that are not really needed by WPA (nested info, etc.)
 - perhaps implement streaming of the merged cgraph.

So the good news is that we now have a lot of interesting low-hanging fruit. The bad news is that tree streaming still feels slow. I suppose we need to dig more into what trees really need to go into WPA...
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #186 from Jan Hubicka hubicka at gcc dot gnu.org ---
oprofile of merging

67647 13.0501 lto1 inflate_fast
38682 7.4624 lto1 compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
32365 6.2437 lto1 streamer_read_uhwi(lto_input_block*)
31198 6.0186 lto1 streamer_read_tree_bitfields(lto_input_block*, data_in*, tree_node*)
21155 4.0811 libc-2.11.1.so msort_with_tmp
19581 3.7775 lto1 ht_lookup_with_hash(ht*, unsigned char const*, unsigned long, unsigned int, ht_lookup_option)
16584 3.1993 lto1 lto_input_tree(lto_input_block*, data_in*)
15203 2.9329 lto1 lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
15194 2.9312 libc-2.11.1.so memcpy
14823 2.8596 lto1 htab_find_slot_with_hash
12860 2.4809 lto1 streamer_read_tree_body(lto_input_block*, data_in*, tree_node*)
12705 2.4510 lto1 hash_table<tree_scc_hasher, xcallocator>::find_slot_with_hash(tree_scc const*, unsigned int, insert_option)
11773 2.2712 lto1 adler32
11504 2.2193 libc-2.11.1.so _IO_vfscanf
11401 2.1994 lto1 unify_scc(streamer_tree_cache_d*, unsigned int, unsigned int, unsigned int, unsigned int)
9548 1.8420 lto1 streamer_get_pickled_tree(lto_input_block*, data_in*)
9315 1.7970 lto1 inflate

IPA

18799 6.2862 lto1 symtab_remove_unreachable_nodes(bool, _IO_FILE*)
11878 3.9719 lto1 cgraph_redirect_edge_callee(cgraph_edge*, cgraph_node*)
11223 3.7528 lto1 do_per_function(void (*)(void*), void*)
10813 3.6157 lto1 pointer_set_lookup(pointer_set_t const*, void const*, unsigned long*)
8415 2.8139 lto1 ipa_reverse_postorder(cgraph_node**)
7689 2.5711 lto1 htab_find_slot_with_hash
7677 2.5671 lto1 do_estimate_growth_1(cgraph_node*, void*)
7477 2.5002 libc-2.11.1.so free
7035 2.3524 libc-2.11.1.so malloc_consolidate

Stream out

9440 16.1663 lto1 linemap_lookup(line_maps*, unsigned int)
7663 13.1231 lto1 DFS_write_tree(output_block*, sccs*, tree_node*, bool, bool)
6052 10.3643 lto1 streamer_write_uhwi_stream(lto_output_stream*, unsigned long)
5831 9.9858 lto1 pointer_set_lookup(pointer_set_t const*, void const*, unsigned long*)
3342 5.7233 lto1 streamer_tree_cache_lookup(streamer_tree_cache_d*, tree_node*, unsigned int*)
2229 3.8172 lto1 pointer_map_insert(pointer_map_t*, void const*)
2196 3.7607 lto1 streamer_pack_tree_bitfields(output_block*, bitpack_d*, tree_node*)
2054 3.5175 lto1 lto_output_tree(output_block*, tree_node*, bool, bool)
1656 2.8360 lto1 inflate_fast
1655 2.8342 lto1 pointer_map<unsigned int>::insert(void const*, bool*)
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #184 from Jan Hubicka hubicka at gcc dot gnu.org ---
New profiles after Richard's changes to remove pointer maps from streaming in.

Stream in:

samples % app name symbol name
36599 12.3464 lto1 inflate_fast
27382 9.2371 lto1 streamer_read_uhwi(lto_input_block*)
19282 6.5047 lto1 streamer_read_tree_bitfields(lto_input_block*, data_in*, tree_node*)
15807 5.3324 lto1 compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
11385 3.8407 libc-2.11.1.so msort_with_tmp
9054 3.0543 libc-2.11.1.so memcpy
8701 2.9352 lto1 htab_find_slot_with_hash
8506 2.8694 lto1 lto_input_tree(lto_input_block*, data_in*)
8405 2.8354 lto1 lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
8055 2.7173 lto1 ht_lookup_with_hash(ht*, unsigned char const*, unsigned long, unsigned int, ht_lookup_option)
6436 2.1711 lto1 streamer_read_tree_body(lto_input_block*, data_in*, tree_node*)
6287 2.1209 lto1 adler32
5891 1.9873 lto1 streamer_get_pickled_tree(lto_input_block*, data_in*)

Stream out:

samples % app name symbol name
19885 14.6837 lto1 DFS_write_tree(output_block*, sccs*, tree_node*, bool, bool)
19285 14.2407 lto1 linemap_lookup(line_maps*, unsigned int)
16192 11.9567 lto1 streamer_write_uhwi_stream(lto_output_stream*, unsigned long)
15926 11.7603 lto1 pointer_map_insert(pointer_map_t*, void const*)
10285 7.5948 lto1 pointer_map_contains(pointer_map_t const*, void const*)
7324 5.4083 lto1 streamer_tree_cache_lookup(streamer_tree_cache_d*, tree_node*, unsigned int*)
5897 4.3545 lto1 streamer_pack_tree_bitfields(output_block*, bitpack_d*, tree_node*)
5374 3.9683 lto1 lto_output_tree(output_block*, tree_node*, bool, bool)
4896 3.6154 lto1 streamer_tree_cache_insert_1(streamer_tree_cache_d*, tree_node*, unsigned int, unsigned int*, bool)
3285 2.4258 libc-2.11.1.so memset
2669 1.9709 lto1 streamer_write_tree_body(output_block*, tree_node*, bool)
2520 1.8608 libc-2.11.1.so memcpy
2383 1.7597 lto1 streamer_tree_cache_add_to_node_array(streamer_tree_cache_d*, unsigned int, tree_node*, unsigned int)

linemap_lookup is an easy target, obviously.

Execution times (seconds)
 phase setup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1399 kB ( 0%) ggc
 phase opt and generate : 69.29 (14%) usr 0.82 ( 3%) sys 70.62 (13%) wall 270269 kB (11%) ggc
 phase stream in : 224.95 (44%) usr 6.23 (22%) sys 236.02 (43%) wall 2174294 kB (89%) ggc
 phase stream out: 213.26 (42%) usr 21.35 (75%) sys 236.87 (44%) wall 7157 kB ( 0%) ggc
 garbage collection : 9.92 ( 2%) usr 0.00 ( 0%) sys 9.99 ( 2%) wall 0 kB ( 0%) ggc
 callgraph optimization : 1.36 ( 0%) usr 0.00 ( 0%) sys 1.34 ( 0%) wall 32 kB ( 0%) ggc
 ipa cp : 7.65 ( 2%) usr 0.32 ( 1%) sys 8.01 ( 1%) wall 418436 kB (17%) ggc
 ipa inlining heuristics : 38.83 ( 8%) usr 0.83 ( 3%) sys 39.99 ( 7%) wall 1352530 kB (55%) ggc
 ipa lto gimple in : 0.39 ( 0%) usr 0.05 ( 0%) sys 0.53 ( 0%) wall 0 kB ( 0%) ggc
 ipa lto gimple out : 16.46 ( 3%) usr 1.39 ( 5%) sys 17.93 ( 3%) wall 0 kB ( 0%) ggc
 ipa lto decl in : 158.55 (31%) usr 3.99 (14%) sys 166.99 (31%) wall 2583106 kB (105%) ggc
 ipa lto decl out: 191.10 (38%) usr 11.48 (40%) sys 203.47 (37%) wall 0 kB ( 0%) ggc
 ipa lto cgraph I/O : 7.07 ( 1%) usr 1.17 ( 4%) sys 8.27 ( 2%) wall 2134131 kB (87%) ggc
 ipa lto decl merge : 29.94 ( 6%) usr 0.01 ( 0%) sys 30.06 ( 6%) wall 8270 kB ( 0%) ggc
 ipa lto cgraph merge: 12.02 ( 2%) usr 0.04 ( 0%) sys 12.13 ( 2%) wall 142240 kB ( 6%) ggc
 whopr wpa : 7.30 ( 1%) usr 0.03 ( 0%) sys 7.39 ( 1%) wall 7160 kB ( 0%) ggc
 whopr wpa I/O : 1.40 ( 0%) usr 8.46 (30%) sys 11.14 ( 2%) wall 0 kB ( 0%) ggc
 whopr partitioning : 2.33 ( 0%) usr 0.01 ( 0%) sys 2.36 ( 0%) wall 0 kB ( 0%) ggc
 ipa reference : 5.44 ( 1%) usr 0.04 ( 0%) sys 5.53 ( 1%) wall 0 kB ( 0%) ggc
 ipa profile : 1.26 ( 0%) usr 0.04 ( 0%) sys 1.32 ( 0%) wall 0 kB ( 0%) ggc
 ipa pure const : 5.87 ( 1%) usr 0.13 ( 0%) sys 6.03 ( 1%) wall 0 kB ( 0%) ggc
 inline parameters : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #182 from Jan Hubicka hubicka at gcc dot gnu.org ---
OK, after a while I should update the stats here. Richard's new tree merging patch makes libxul linking a lot faster and less memory consuming. Peak memory usage (in top) is now just below 10GB; with a bit of incremental improvement I hope to get below 8GB again soon.

Build time is:

real 19m0.355s
user 56m20.459s
sys 2m17.533s

GGC memory usage after stream in: 4938399k

Execution times (seconds)
 phase setup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1399 kB ( 0%) ggc
 phase opt and generate : 72.86 (12%) usr 0.90 ( 3%) sys 75.25 (11%) wall 270952 kB ( 7%) ggc
 phase stream in : 274.88 (44%) usr 9.01 (26%) sys 294.99 (43%) wall 3478515 kB (93%) ggc
 phase stream out: 282.18 (45%) usr 24.40 (71%) sys 308.42 (45%) wall 7162 kB ( 0%) ggc
 garbage collection : 12.99 ( 2%) usr 0.01 ( 0%) sys 13.00 ( 2%) wall 0 kB ( 0%) ggc
 callgraph optimization : 1.95 ( 0%) usr 0.00 ( 0%) sys 1.95 ( 0%) wall 32 kB ( 0%) ggc
 ipa cp : 9.82 ( 2%) usr 0.39 ( 1%) sys 10.26 ( 2%) wall 418482 kB (11%) ggc
 ipa inlining heuristics : 39.30 ( 6%) usr 1.12 ( 3%) sys 41.52 ( 6%) wall 1353294 kB (36%) ggc
 ipa lto gimple in : 0.45 ( 0%) usr 0.15 ( 0%) sys 0.62 ( 0%) wall 0 kB ( 0%) ggc
 ipa lto gimple out : 18.24 ( 3%) usr 1.50 ( 4%) sys 19.86 ( 3%) wall 0 kB ( 0%) ggc
 ipa lto decl in : 200.68 (32%) usr 5.85 (17%) sys 216.44 (32%) wall 3887175 kB (103%) ggc
 ipa lto decl out: 256.24 (41%) usr 13.44 (39%) sys 271.24 (40%) wall 0 kB ( 0%) ggc
 ipa lto cgraph I/O : 7.20 ( 1%) usr 1.61 ( 5%) sys 8.83 ( 1%) wall 2134157 kB (57%) ggc
 ipa lto decl merge : 27.71 ( 4%) usr 0.01 ( 0%) sys 27.72 ( 4%) wall 8270 kB ( 0%) ggc
 ipa lto cgraph merge: 17.31 ( 3%) usr 0.07 ( 0%) sys 17.39 ( 3%) wall 142240 kB ( 4%) ggc
 whopr wpa : 8.82 ( 1%) usr 0.04 ( 0%) sys 8.89 ( 1%) wall 7165 kB ( 0%) ggc
 whopr wpa I/O : 1.63 ( 0%) usr 9.43 (27%) sys 11.19 ( 2%) wall 0 kB ( 0%) ggc
 whopr partitioning : 3.21 ( 1%) usr 0.04 ( 0%) sys 3.25 ( 0%) wall 0 kB ( 0%) ggc
 ipa reference : 5.56 ( 1%) usr 0.04 ( 0%) sys 5.81 ( 1%) wall 0 kB ( 0%) ggc
 ipa profile : 1.83 ( 0%) usr 0.02 ( 0%) sys 1.86 ( 0%) wall 0 kB ( 0%) ggc
 ipa pure const : 6.07 ( 1%) usr 0.18 ( 1%) sys 6.26 ( 1%) wall 0 kB ( 0%) ggc
 inline parameters : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 14 kB ( 0%) ggc
 tree copy propagation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree PTA: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA rewrite: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 27 kB ( 0%) ggc
 tree SSA other : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree CCP: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 dominance computation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 varconst: 0.14 ( 0%) usr 0.12 ( 0%) sys 0.24 ( 0%) wall 0 kB ( 0%) ggc
 unaccounted todo: 10.69 ( 2%) usr 0.29 ( 1%) sys 11.10 ( 2%) wall 0 kB ( 0%) ggc
 TOTAL : 629.93 34.31 678.67 3758029 kB

Memory usage seems about the same with -g.

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #183 from Jan Hubicka hubicka at gcc dot gnu.org ---
type merging stats

[WPA] read 43156894 SCCs of average size 2.270660
[WPA] 97994652 tree bodies read in total
[WPA] tree SCC table: size 8388593, 3830511 elements, collision ratio: 0.684487
[WPA] tree SCC max chain length 88 (size 1)
[WPA] Compared 19139975 SCCs, 344923 collisions (0.018021)
[WPA] Merged 19067050 SCCs
[WPA] Merged 58757829 tree bodies
[WPA] Merged 11951381 types
[WPA] 4357267 types prevailed (13278034 associated trees)
[WPA] Old merging code merges an additional 2026163 types of which 140937 are in the same SCC with their prevailing variant (12389865 and 6362266 associated trees)
[WPA] GIMPLE canonical type table: size 131071, 77910 elements, 4357402 searches, 1095104 collisions (ratio: 0.251320)
[WPA] GIMPLE canonical type hash table: size 8388593, 4357346 elements, 15252531 searches, 11817317 collisions (ratio: 0.774777)
[WPA] # of input files: 4918
[WPA] # of input cgraph nodes: 0
[WPA] # of function bodies: 0
[WPA] # of output files: 0
[WPA] # of output symtab nodes: 0
[WPA] # of output tree pickle references: 0
[WPA] # of output tree bodies: 0
[WPA] # callgraph partitions: 0
[WPA] Compression: 1311851796 input bytes, 4153897270 uncompressed bytes (ratio: 3.166438)
[WPA] Size of mmap'd section decls: 1311851796 bytes
[LTRANS] read 314277 SCCs of average size 6.082532
[LTRANS] 1911600 tree bodies read in total
[LTRANS] GIMPLE canonical type table: size 16381, 9653 elements, 453967 searches, 24697 collisions (ratio: 0.054403)
[LTRANS] GIMPLE canonical type hash table: size 1048573, 453913 elements, 1562009 searches, 1517260 collisions (ratio: 0.971352)
[LTRANS] # of input files: 1
[LTRANS] # of input cgraph nodes: 0
[LTRANS] # of function bodies: 0
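
The ratios printed in these stats can be recomputed from the raw counts on the same lines, which is a handy cross-check when comparing runs. A small sketch using only numbers from the [WPA] lines above; the variable names are illustrative, and the "merged fraction" is a derived figure that the report itself does not print.

```python
# Raw counts copied from the [WPA] lines above.
input_bytes = 1311851796
uncompressed_bytes = 4153897270
compared_sccs = 19139975
scc_collisions = 344923
read_sccs = 43156894
merged_sccs = 19067050

# Ratios as the report prints them (six decimal places).
compression_ratio = round(uncompressed_bytes / input_bytes, 6)
collision_ratio = round(scc_collisions / compared_sccs, 6)

# Derived figure, not printed by the report: share of read SCCs that were merged.
merged_fraction = merged_sccs / read_sccs

print(compression_ratio, collision_ratio, round(merged_fraction, 3))
```

The first two values reproduce the 3.166438 and 0.018021 shown in the report; the third shows that somewhat under half of all SCCs read by WPA were merged away.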
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 Martin Jambor jamborm at gcc dot gnu.org changed: What|Removed |Added Depends on||56570 --- Comment #181 from Martin Jambor jamborm at gcc dot gnu.org 2013-03-08 10:41:54 UTC --- The bug described in comment #179 is now PR 56570.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #180 from Richard Biener rguenth at gcc dot gnu.org 2013-03-07 16:08:29 UTC ---
Try

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c (revision 196520)
+++ gcc/tree-inline.c (working copy)
@@ -3929,7 +3929,7 @@ expand_call_inline (basic_block bb, gimp
     {
       id->block = make_node (BLOCK);
       BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
-      BLOCK_SOURCE_LOCATION (id->block) = input_location;
+      BLOCK_SOURCE_LOCATION (id->block) = LOCATION_LOCUS (input_location);
       prepend_lexical_block (gimple_block (stmt), id->block);
     }
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #179 from Martin Jambor jamborm at gcc dot gnu.org 2013-03-06 15:14:35 UTC ---
I'm currently (gcc revision 196427, FF changeset 123831:c95439870e05) facing a few ICEs during the compilation phase with the following backtrace:

#0 0x00f89a73 in get_location_from_adhoc_loc (set=0x77ff2000, loc=2947526575) at /home/mjambor/gcc/trunk/src/libcpp/line-map.c:165
#1 0x00c247fe in inlined_function_outer_scope_p (block=0x7fffee4bcb28) at /home/mjambor/gcc/trunk/src/gcc/tree.h:5561
#2 pack_ts_block_value_fields (expr=0x7fffee4bcb28, bp=0x7fffd1a0, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:319
#3 streamer_pack_tree_bitfields (ob=0x1c73210, bp=0x7fffd1a0, expr=0x7fffee4bcb28) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:417
#4 0x009c3bc9 in lto_write_tree (ref_p=true, expr=0x7fffee4bcb28, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:317
#5 lto_output_tree (ob=0x1c73210, expr=0x7fffee4bcb28, ref_p=true, this_ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:410
#6 0x00c26617 in write_ts_common_tree_pointers (ref_p=true, expr=0x73f6bc80, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:514
#7 streamer_write_tree_body (ob=0x1c73210, expr=0x73f6bc80, ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:845
#8 0x009c3bf7 in lto_write_tree (ref_p=true, expr=0x73f6bc80, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:321
#9 lto_output_tree (ob=ob@entry=0x1c73210, expr=0x73f6bc80, ref_p=ref_p@entry=true, this_ref_p=this_ref_p@entry=true) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:410
#10 0x00c26e62 in write_ts_exp_tree_pointers (ref_p=<optimized out>, expr=<optimized out>, ob=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:747
#11 streamer_write_tree_body (ob=0x1c73210, expr=0x7fffecc63dc0, ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:884
#12 0x009c3bf7 in lto_write_tree (ref_p=true, expr=0x7fffecc63dc0, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:321
#13 lto_output_tree (ob=0x1c73210, expr=0x7fffecc63dc0, ref_p=true, this_ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:410
#14 0x00c26df8 in write_ts_exp_tree_pointers (ref_p=<optimized out>, expr=<optimized out>, ob=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:746
#15 streamer_write_tree_body (ob=0x1c73210, expr=0x7fffecc70078, ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:884
#16 0x009c3bf7 in lto_write_tree (ref_p=true, expr=0x7fffecc70078, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:321
#17 lto_output_tree (ob=ob@entry=0x1c73210, expr=0x7fffecc70078, ref_p=ref_p@entry=true, this_ref_p=this_ref_p@entry=true) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:410
#18 0x00c2681d in write_ts_decl_common_tree_pointers (ref_p=true, expr=0x7fffecc6d720, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:584
#19 streamer_write_tree_body (ob=0x1c73210, expr=0x7fffecc6d720, ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/tree-streamer-out.c:857
#20 0x009c3bf7 in lto_write_tree (ref_p=true, expr=0x7fffecc6d720, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:321
#21 lto_output_tree (ob=0x1c73210, expr=0x7fffecc6d720, ref_p=true, this_ref_p=<optimized out>) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:410
#22 0x00ecd118 in output_gimple_stmt (stmt=0x7fffec6206c0, ob=0x1c73210) at /home/mjambor/gcc/trunk/src/gcc/gimple-streamer-out.c:143
#23 output_bb (ob=0x1c73210, bb=0x7fffed130f08, fn=0x7fffef8603f0) at /home/mjambor/gcc/trunk/src/gcc/gimple-streamer-out.c:199
#24 0x009c4f26 in output_function (node=0x7fffef8614a0) at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:823
#25 lto_output () at /home/mjambor/gcc/trunk/src/gcc/lto-streamer-out.c:987
#26 0x009fa971 in ipa_write_summaries_2 (pass=0x1618f00 <pass_ipa_lto_gimple_out>, state=0x1ad8c00) at /home/mjambor/gcc/trunk/src/gcc/passes.c:2408

The statement being written is:

(gdb) call debug_gimple_stmt ((gimple)0x7fffec6206c0)
# DEBUG v => 18444633011384221696

This happens, for example, during compilation of js/src/ion/shared/CodeGenerator-shared.cpp
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #172 from Richard Biener rguenth at gcc dot gnu.org 2013-01-17 10:53:29 UTC ---
(In reply to comment #171)
> Created attachment 29182 [details]
> Patch to compress line info
>
> This patch removes column information from LTO (so we lose caret diagnostics in warnings/errors output at LTO time, which seems a reasonable thing to do) and avoids entering duplicate locators into the linemap.
>
> The patch reduces linemap usage from 23% to 5% of GGC memory, saving 1-2GB on Mozilla (and also reducing LTO file size).

Patch looks incomplete? What does dropping columns only do to memory use?

Please disable flag_diagnostics_show_caret unconditionally in lto1 if you do that.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #173 from Jan Hubicka hubicka at ucw dot cz 2013-01-17 12:30:30 UTC ---
> Patch looks incomplete? What does dropping columns only do to memory use?

I will check. I remember that prior to the columns there were also some savings from the cache. Just saving 20% out of 23% is cooler than saving 20% out of 5% of memory. Note that we are still over 8GB for Mozilla LTO after the latest Mozilla checkout.

> Please disable flag_diagnostics_show_caret unconditionally in lto1 if you do that.

Yeah, I wanted to, but I am not sure where in lto.c the proper place to do so is.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #174 from Jakub Jelinek jakub at gcc dot gnu.org 2013-01-17 12:42:06 UTC --- lto_post_options ?
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #175 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-17 14:40:04 UTC ---
Created attachment 29191
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29191
alternative patch without the compression.

This is an alternative patch that just skips columns but does not do the compression. It seems that compression is actually quite effective:

 no compression, no column info: 1073872920 bytes
 compression, no column info: 268566544 bytes
 compression, with column info: 1073872920 bytes

Perhaps I messed up the caching with column info? It strikes me as wrong that the numbers are precisely the same. But perhaps it is just the reallocation strategy. I will also generate fresh numbers for unpatched GCC.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #176 from Richard Biener rguenth at gcc dot gnu.org 2013-01-17 14:54:22 UTC ---
(In reply to comment #175)
> Created attachment 29191 [details]
> alternative patch without the compression.
>
> This is alternative patch just skipping columns but not doing the compression. It seems that compression is actually quite effective. Non-compressing w/o column info is 1073872920 bytes, compression + no column is 268566544 bytes compression + column is 1073872920 bytes
>
> Perhaps I messed up the caching with column info? It strikes wrong that the numbers are precisely the same. But perhaps it is just reallocation strategy. I will also generate fresh numbers for unpatched GCC.

+linemap_line_start (line_table, data_in->current_line, 0);

-  return linemap_position_for_column (line_table, data_in->current_col);
+  return linemap_position_for_column (line_table, 0);

linemap_line_start will already return a location for column 0. So I'd say we want

  if (file_change)
    {
      ...
    }
  return linemap_line_start (line_table, data_in->current_line, 0);

instead. Which hopefully does nothing if nothing changed. I don't know how you implement caching - you didn't attach a patch to do so.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #177 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-17 15:13:53 UTC ---
Created attachment 29192
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29192
caching

Aha, now I see why you ask for the complete patch. I obviously messed up the code. This is how I do caching (in a version that still has columns in it). I removed the final incarnation of the patch, but it should be easy to redo.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #178 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-17 17:11:13 UTC ---
The global cache with an arbitrarily large size reduces usage down to 0.3% (16908304 bytes). So it seems that sharing across files is quite an important part of the game. I will try to fiddle with the cache size to see how big a cache is actually needed.

Unpatched mainline needs 1073872920 bytes, which is the same as with dropping columns and/or my initial local caching implementation. This is apparently because of the exponential resizing of the table (i.e. we simply do not save enough to see a difference).

Honza
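
The point about exponential resizing can be made concrete with a toy model. The sketch below assumes a table that doubles its capacity whenever it fills up; the doubling factor, the starting capacity and the item counts are all hypothetical and are not taken from the line map implementation.

```python
def capacity_after_growth(n_items, initial_capacity=1):
    """Capacity of a table that doubles whenever it runs out of room."""
    capacity = initial_capacity
    while capacity < n_items:
        capacity *= 2
    return capacity


# Hypothetical item counts: a 40% reduction that stays within one doubling step.
before = capacity_after_growth(1000)
after_small_saving = capacity_after_growth(600)
# A reduction large enough to cross a power-of-two boundary.
after_large_saving = capacity_after_growth(500)

print(before, after_small_saving, after_large_saving)
```

Under this model the allocated size only changes when the item count crosses a doubling boundary, so a moderate saving can leave the reported byte count exactly the same, which is consistent with the identical 1073872920-byte figures above.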
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #171 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-16 17:25:04 UTC ---
Created attachment 29182
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29182
Patch to compress line info

This patch removes column information from LTO (so we lose caret diagnostics in warnings/errors output at LTO time, which seems a reasonable thing to do) and avoids entering duplicate locators into the linemap.

The patch reduces linemap usage from 23% to 5% of GGC memory, saving 1-2GB on Mozilla (and also reducing LTO file size).
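
The attachment itself is not reproduced in this thread, so the following is only a sketch of the general idea of not entering duplicate locators: intern each (file, line) pair once and hand out the existing index on repeat lookups. The class `LocationTable` and its methods are hypothetical and do not correspond to the libcpp line map API.

```python
class LocationTable:
    """Toy stand-in for a line map that stores each (file, line) pair once."""

    def __init__(self):
        self._entries = []  # index -> (filename, line)
        self._cache = {}    # (filename, line) -> index

    def intern(self, filename, line):
        """Return the index for this location, adding it only if it is new."""
        key = (filename, line)
        index = self._cache.get(key)
        if index is None:
            index = len(self._entries)
            self._entries.append(key)
            self._cache[key] = index
        return index

    def __len__(self):
        return len(self._entries)


table = LocationTable()
first = table.intern("Marking.cpp", 1713)
again = table.intern("Marking.cpp", 1713)   # duplicate, reuses the entry
other = table.intern("Marking.cpp", 1714)   # new line, new entry
print(first, again, other, len(table))
```

Dropping the column from the key, as this sketch does, makes many more locations compare equal, which is what lets a cache of this kind remove a large share of the entries.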
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #170 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-10 15:04:10 UTC ---
OK, here is the updated memory use:

source location                                        Garbage            Freed             Leak         Overhead    Times
--------------------------------------------------------------------------------------------------------------------------
cgraph.c:863 (cgraph_allocate_init_indirect_info       5905200: 0.1%          0: 0.0%     6020160: 0.1%         0: 0.0%    298134
tree.c:1237 (build_int_cst_wide)                      15554272: 0.4%          0: 0.0%      782528: 0.0%         0: 0.0%    510525
tree.c:1559 (build_string)                            10685931: 0.2%          0: 0.0%    16715642: 0.4%   2193469: 1.7%    563828
stringpool.c:75 (alloc_node)                                 0: 0.0%          0: 0.0%    30574880: 0.7%         0: 0.0%    764372
lto/lto.c:2286 (create_subid_section_table)            1522184: 0.0%          0: 0.0%    39117064: 0.8%   8051472: 6.4%      3978
stringpool.c:58 (stringpool_ggc_alloc)                       0: 0.0%          0: 0.0%    41092405: 0.9%   2954893: 2.4%    764372
gimple.c:3167 (iterative_hash_canonical_type)         45040752: 1.0%          0: 0.0%           0: 0.0%         0: 0.0%   2815047
lto/lto.c:1222 (iterative_hash_gimple_type)           68276864: 1.6%          0: 0.0%           0: 0.0%         0: 0.0%   4267304
ggc-common.c:249 (ggc_cleared_alloc_ptr_array_tw         91784: 0.0%  487289424:48.8%    71432600: 1.5%    248976: 0.2%     10974
lto/lto.c:1266 (iterative_hash_gimple_type)           75288576: 1.8%          0: 0.0%           0: 0.0%         0: 0.0%   4705536
lto-section-in.c:362 (lto_new_in_decl_state)            694320: 0.0%          0: 0.0%    94861800: 2.0%         0: 0.0%    796301
tree.c:1263 (build_int_cst_wide)                      76232736: 1.8%          0: 0.0%    19358880: 0.4%         0: 0.0%   2987238
cgraph.c:794 (cgraph_create_edge_1)                          0: 0.0%          0: 0.0%   125510632: 2.7%         0: 0.0%   1206833
vec.h:565 ((null))                                    66034564: 1.5%      98716: 0.0%    68500548: 1.5%   3484420: 2.8%    597783
vec.h:695 ((null))                                   124654648: 2.9%  122044288:12.2%    63749232: 1.4%   2614800: 2.1%   1590429
tree-streamer-in.c:562 (streamer_alloc_tree)         125829312: 2.9%          0: 0.0%    74222904: 1.6%      7072: 0.0%   2005091
lto/lto.c:267 (lto_read_in_decl_state)                 1478720: 0.0%          0: 0.0%   216390688: 4.7%  38247784:30.5%   5574107
vec.h:747 ((null))                                   173791988: 4.0%   19565412: 2.0%    68225644: 1.5%   2680332: 2.1%   1396070
vec.h:707 ((null))                                   133872480: 3.1%          0: 0.0%   285212728: 6.1%    800360: 0.6%   1059913
cgraph.c:500 (cgraph_allocate_node)                          0: 0.0%          0: 0.0%   472831880:10.2%         0: 0.0%   1597405
tree.c:1223 (build_int_cst_wide)                     607138944:14.1%          0: 0.0%    10427664: 0.2%   4719336: 3.8%    315034
toplev.c:959 (realloc_for_line_map)                          0: 0.0%  358037664:35.8%  1073872920:23.1%       184: 0.0%        16
tree-streamer-in.c:573 (streamer_alloc_tree)        2762184192:64.2%          0: 0.0%  1861017624:40.0%  59027616:47.1%  34649937
Total                                               4302007795        999178184        4651003487       125411458        68828967

Actually it is a bit of an improvement over my past report. Some obvious things:

1) We still soak in too many trees (40% of memory). The per-tree stats are:

 decls         17310018  -1609736744
 types          8983387   1509209016
 exprs          2427302     80045744
 constants      4079292    135393547
 binfos         2005091    200038072
 random kinds   5691481    227659664

and counts:

 tree_list       5691475
 pointer_type    2337585
 record_type     3702066
 function_decl   1856282
 field_decl      2812564
 const_decl      2739702
 parm_decl       3549707
 type_decl       4780459
 result_decl     1144482
 tree_binfo      2005091

2) New linemaps are still a disaster.

3) The VEC rewrite did break the stats.

Honza
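
The negative byte count printed for decls in the per-tree stats above looks like a value that no longer fits in 32 bits. Assuming it is a signed 32-bit wraparound (an assumption on my part; the report does not say how the field is printed), the intended size can be recovered as follows. The helper name is illustrative.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


# Values copied from the "decls" row of the per-tree stats.
decl_count = 17310018
decl_bytes_printed = -1609736744

# Assumption: the printed value wrapped around once at 2**32.
decl_bytes = as_unsigned32(decl_bytes_printed)
bytes_per_decl = decl_bytes // decl_count

print(decl_bytes, bytes_per_decl)
```

Under that assumption decls account for roughly 2.7GB, or about 155 bytes per node. The figure is an inference and should be checked against a dump that prints full 64-bit values.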
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #165 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-09 15:16:26 UTC ---
OK, I tracked down the cause of the undefined reference error:

/tmp/cc0oq4BG.ltrans1.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN12SkAnnotationC1ER23SkFlattenableReadBuffer' which may overflow at runtime; recompile with -fPIC

It is caused by a bug in Mozilla: it includes a file defining a virtual function that uses '_ZN12SkAnnotationC1ER23SkFlattenableReadBuffer' (in SkPaint), but it never links with the implementation. Normally the function is optimized out. Here it is not, because we never optimize out virtual functions prior to inlining (they are kept for devirtualization), and in the WPA path we forget to remove them when done.

Fixed by the following patch:

Index: ipa-inline.c
===
--- ipa-inline.c (revision 194916)
+++ ipa-inline.c (working copy)
@@ -1793,7 +1793,7 @@
     }
   inline_small_functions ();
-  symtab_remove_unreachable_nodes (true, dump_file);
+  symtab_remove_unreachable_nodes (false, dump_file);
   free (order);
   /* Inline functions with a property that after inlining into all callers the
Index: lto/lto.c
===
--- lto/lto.c (revision 194916)
+++ lto/lto.c (working copy)
@@ -3215,6 +3215,7 @@
   cgraph_state = CGRAPH_STATE_IPA_SSA;
   execute_ipa_pass_list (all_regular_ipa_passes);
+  symtab_remove_unreachable_nodes (false, dump_file);
   if (cgraph_dump_file)
     {
Index: cgraphclones.c
===
--- cgraphclones.c (revision 194916)
+++ cgraphclones.c (working copy)
@@ -184,6 +184,7 @@
   new_node->symbol.decl = decl;
   symtab_register_node ((symtab_node)new_node);
   new_node->origin = n->origin;
+  new_node->symbol.lto_file_data = n->symbol.lto_file_data;
   if (new_node->origin)
     {
       new_node->next_nested = new_node->origin->nested;
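The effect described in this comment can be illustrated with a toy mark-and-sweep over a symbol table. This is not GCC code: `Node`, `remove_unreachable`, `undefined_references` and the symbol names are all hypothetical, and the only thing modelled is the flag that keeps virtual functions alive before inlining.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy symbol-table entry (hypothetical, not GCC's cgraph node)."""
    name: str
    refs: list = field(default_factory=list)  # names this body references
    is_virtual: bool = False
    has_body: bool = True  # False models a symbol that is never linked in


def remove_unreachable(symtab, roots, before_inlining):
    """Return the names that survive a mark-and-sweep from `roots`.

    When `before_inlining` is true, virtual functions are kept even if
    nothing reaches them, since they may still become devirtualization
    targets.
    """
    worklist = list(roots)
    if before_inlining:
        worklist += [n.name for n in symtab.values() if n.is_virtual]
    live = set()
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        node = symtab.get(name)
        if node is not None and node.has_body:
            worklist.extend(node.refs)
    return live


def undefined_references(symtab, live):
    """Names referenced by surviving bodies that no body in the link defines."""
    missing = set()
    for name in live:
        node = symtab.get(name)
        if node is None or not node.has_body:
            continue
        for ref in node.refs:
            target = symtab.get(ref)
            if target is None or not target.has_body:
                missing.add(ref)
    return missing


symtab = {
    "main": Node("main", refs=["draw"]),
    "draw": Node("draw"),
    # An unused virtual method whose body calls a constructor that is
    # declared but never linked in.
    "unused_virtual": Node("unused_virtual", refs=["missing_ctor"], is_virtual=True),
    "missing_ctor": Node("missing_ctor", has_body=False),
}

kept_early = remove_unreachable(symtab, ["main"], before_inlining=True)
kept_late = remove_unreachable(symtab, ["main"], before_inlining=False)
print(sorted(undefined_references(symtab, kept_early)))
print(sorted(undefined_references(symtab, kept_late)))
```

In this model, running the sweep only in "before inlining" mode leaves the unused virtual body, and its dangling reference, in the output. A final sweep with the flag cleared drops both, which is the idea behind calling the removal with `false` after the IPA passes.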
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #166 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-09 15:19:41 UTC --- Markus, the appearance of the undefined references I fixed with the patch above is highly sensitive to partitioning and inlining decisions. Can you please check whether the problem with PGO remains? It may be another instance of the same issue.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #167 from Markus Trippelsdorf markus at trippelsdorf dot de 2013-01-09 19:58:33 UTC ---
(In reply to comment #166)
> Markus, the appearance of the undefined references I fixed with the patch above is
> highly sensitive to partitioning and inlining decisions. Can you please check
> whether the problem with PGO remains? It may be another instance of the same issue.

Just checked it using your patch from comment 165, but the issue from comment 162 is still there:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccACx905.ltrans6.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN13nsXULDocument14MaybeBroadcastEv.466048' which may overflow at runtime; recompile with -fPIC
/tmp/ccACx905.ltrans6.ltrans.o:ccACx905.ltrans6.o:function nsRunnableMethodTraits<void (nsXULDocument::*)(), true>::base_type* NS_NewRunnableMethod<nsXULDocument*, void (nsXULDocument::*)()>(nsXULDocument*, void (nsXULDocument::*)()) [clone .local.42120] [clone .constprop.89117]: error: undefined reference to 'nsXULDocument::MaybeBroadcast() [clone .466048]'
/tmp/ccACx905.ltrans6.ltrans.o:ccACx905.ltrans6.o:function nsRunnableMethodTraits<void (nsXULDocument::*)(), true>::base_type* NS_NewRunnableMethod<nsXULDocument*, void (nsXULDocument::*)()>(nsXULDocument*, void (nsXULDocument::*)()) [clone .local.42120] [clone .constprop.89117]: error: undefined reference to 'nsXULDocument::MaybeBroadcast() [clone .466048]'

Also the memory usage went through the roof (not sure if this is caused by your patch or by my recent git-pull of mozilla-central): over 9GB of RAM is needed (not much fun on my 8GB test machine).
(So I will stop testing Firefox for now, until LTO/PGO memory usage gets sane again (hopefully for 4.9).)
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #168 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-09 21:20:46 UTC --- Too bad :( The patch should reduce memory usage, not increase it, so it must be something else. My build was around 7GB w/o PGO; I will need to try the PGO builds myself. My tree is however somewhat out of date. I will try a fresh checkout and post memory usage stats. Perhaps you could share somewhere the -lm.res and *wpa*cgraph dumps of a --save-temps -fdump-ipa-cgraph build? I will try to figure out those symbols.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #169 from Jan Hubicka hubicka at gcc dot gnu.org 2013-01-09 21:22:33 UTC ---
Author: hubicka
Date: Wed Jan 9 21:22:26 2013
New Revision: 195066

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195066
Log:
PR lto/45375
* ipa-inline.c (ipa_inline): Remove extern inlines and virtual functions.
* cgraphclones.c (cgraph_clone_node): Copy also LTO file data.
* lto.c (do_whole_program_analysis): Remove unreachable nodes after IPA.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/cgraphclones.c
trunk/gcc/ipa-inline.c
trunk/gcc/lto/ChangeLog
trunk/gcc/lto/lto.c
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

Leo Yuriev leo at yuriev dot ru changed:

What|Removed |Added
CC| |leo at yuriev dot ru

--- Comment #164 from Leo Yuriev leo at yuriev dot ru 2013-01-06 00:31:55 UTC ---
Some trouble while building LLVM with -flto:

../x86_64-linux-gnu/bin/ld.gold: error: /tmp/cc60XH2F.ltrans0.ltrans.o: requires dynamic R_X86_64_PC32 reloc against 'X86CompilationCallback2' which may overflow at runtime; recompile with -fPIC

Code:

extern "C" {
  void X86CompilationCallback(void);
  asm(
    ".text\n"
    ".align 8\n"
    ".globl " ASMPREFIX "X86CompilationCallback\n"
    TYPE_FUNCTION(X86CompilationCallback)
    ASMPREFIX "X86CompilationCallback:\n"
    ...
    "movq 8(%rbp), %rdx\n"
    "call " ASMPREFIX "X86CompilationCallback2\n"
    "addq $32, %rsp\n"
    ...
  );
}

void __attribute__((used)) X86CompilationCallback2(intptr_t *StackPtr, intptr_t RetAddr) {
  intptr_t *RetAddrLoc = &StackPtr[1];
  ...
}
}
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #163 from Jan Hubicka hubicka at ucw dot cz 2012-12-14 18:24:31 UTC ---
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
>
> --- Comment #162 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-13 22:25:27 UTC ---
> The libxul binary size issue is solved now.

Good

> During testing I came across another issue that looks similar to the one in
> comment 146:
>
> /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccwu5G98.ltrans4.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN13nsXULDocument14MaybeBroadcastEv.429466' which may overflow at runtime; recompile with -fPIC
> /tmp/ccwu5G98.ltrans4.ltrans.o:ccwu5G98.ltrans4.o:function nsRunnableMethodTraits<void (nsXULDocument::*)(), true>::base_type* NS_NewRunnableMethod<nsXULDocument*, void (nsXULDocument::*)()>(nsXULDocument*, void (nsXULDocument::*)()) [clone .local.39398] [clone .constprop.84952]: error: undefined reference to 'nsXULDocument::MaybeBroadcast() [clone .429466]'
> /tmp/ccwu5G98.ltrans4.ltrans.o:ccwu5G98.ltrans4.o:function nsRunnableMethodTraits<void (nsXULDocument::*)(), true>::base_type* NS_NewRunnableMethod<nsXULDocument*, void (nsXULDocument::*)()>(nsXULDocument*, void (nsXULDocument::*)()) [clone .local.39398] [clone .constprop.84952]: error: undefined reference to 'nsXULDocument::MaybeBroadcast() [clone .429466]'
> collect2: error: ld returned 1 exit status
>
> After I deleted both nsXULDocument.o and nsXULDocument.gcda and rebuilt with:
> make -f client.mk realbuild MOZ_PROFILE_USE=1
> the problem did go away.

This sounds like an independent problem with partitioning. I am travelling till the 17th, so I will try to check this locally myself. Perhaps you can give details on your setup? (My Mozilla tree got quite dirty with various local hacks I made over time; perhaps I should refresh it to a cleaner state.)

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #160 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-13 09:52:37 UTC ---
(In reply to comment #159)
> > hal/Hal.gcda: 96.72%: num counts=30069, min counter=16389
> > hal/Hal.gcda: 97.50%: num counts=35296, min counter=10241
> > hal/Hal.gcda: 98.28%: num counts=43669, min counter=6145
> > hal/Hal.gcda: 99.06%: num counts=59589, min counter=3072
> > hal/Hal.gcda: 99.90%: num counts=115840, min counter=320
> >
> > So it looks like you would want a cutoff of 97.5% to get close to what
> > was there before.
>
> Setting the default cutoff to something like 95% would sound fine to me.
> I see I asked to reduce the parameter but suggested 990. Markus, can you
> try setting HOT_BB_COUNT_WS_PERMILLE to 950?

It doesn't help:
HOT_BB_COUNT_WS_PERMILLE=950: size of libxul.so: 42149632 bytes

(In reply to comment #157)
> (Unfortunately this new ICE happens with yesterday's gcc when linking libxul:
> /var/tmp/mozilla-central/content/base/src/nsDocument.cpp: In member function ‘CreateRange’:
> /var/tmp/mozilla-central/content/base/src/nsDocument.cpp:4999:0: internal compiler error: in cgraph_mark_address_taken_node, at cgraph.c:1409
> I will open a new PR for this later.)

See PR55669
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #161 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-13 12:59:59 UTC --- I've opened a new bug for the binary size increase issue: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55674
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #162 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-13 22:25:27 UTC ---
The libxul binary size issue is solved now.

During testing I came across another issue that looks similar to the one in comment 146:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/ccwu5G98.ltrans4.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN13nsXULDocument14MaybeBroadcastEv.429466' which may overflow at runtime; recompile with -fPIC
/tmp/ccwu5G98.ltrans4.ltrans.o:ccwu5G98.ltrans4.o:function nsRunnableMethodTraits<void (nsXULDocument::*)(), true>::base_type* NS_NewRunnableMethod<nsXULDocument*, void (nsXULDocument::*)()>(nsXULDocument*, void (nsXULDocument::*)()) [clone .local.39398] [clone .constprop.84952]: error: undefined reference to 'nsXULDocument::MaybeBroadcast() [clone .429466]'
/tmp/ccwu5G98.ltrans4.ltrans.o:ccwu5G98.ltrans4.o:function nsRunnableMethodTraits<void (nsXULDocument::*)(), true>::base_type* NS_NewRunnableMethod<nsXULDocument*, void (nsXULDocument::*)()>(nsXULDocument*, void (nsXULDocument::*)()) [clone .local.39398] [clone .constprop.84952]: error: undefined reference to 'nsXULDocument::MaybeBroadcast() [clone .429466]'
collect2: error: ld returned 1 exit status

After I deleted both nsXULDocument.o and nsXULDocument.gcda and rebuilt with:
make -f client.mk realbuild MOZ_PROFILE_USE=1
the problem did go away.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #157 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-12 11:43:27 UTC ---
With revision 193740 libxul's size is ~34MB, which is OK.

(Unfortunately this new ICE happens with yesterday's gcc when linking libxul:
/var/tmp/mozilla-central/content/base/src/nsDocument.cpp: In member function ‘CreateRange’:
/var/tmp/mozilla-central/content/base/src/nsDocument.cpp:4999:0: internal compiler error: in cgraph_mark_address_taken_node, at cgraph.c:1409
I will open a new PR for this later.)

Here are the requested files:
(I don't know which of the ~3000 gcda files you need, so I've uploaded them all)
http://www.trippelsdorf.de/gcda_before.tar.bz2 (4MB)
http://www.trippelsdorf.de/gcda_after.tar.bz2 (4MB)

(-fdump-ipa-inline output)
http://www.trippelsdorf.de/libxul_before.inline.tar.bz2 (100MB)
http://www.trippelsdorf.de/libxul_after.inline.tar.bz2 (68MB, everything 'till the ICE hit)
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #158 from Teresa Johnson tejohnson at google dot com 2012-12-12 18:59:56 UTC ---
On Wed, Dec 12, 2012 at 3:43 AM, markus at trippelsdorf dot de gcc-bugzi...@gcc.gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
>
> --- Comment #157 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-12 11:43:27 UTC ---
> With revision 193740 libxul's size is ~34MB, which is OK.
> (Unfortunately this new ICE happens with yesterday's gcc when linking libxul:
> /var/tmp/mozilla-central/content/base/src/nsDocument.cpp: In member function ‘CreateRange’:
> /var/tmp/mozilla-central/content/base/src/nsDocument.cpp:4999:0: internal compiler error: in cgraph_mark_address_taken_node, at cgraph.c:1409
> I will open a new PR for this later.)
>
> Here are the requested files:
> (I don't know which of the ~3000 gcda files you need, so I've uploaded them all)
> http://www.trippelsdorf.de/gcda_before.tar.bz2 (4MB)
> http://www.trippelsdorf.de/gcda_after.tar.bz2 (4MB)

Sorry, I should have clarified that any one of them would do (as long as it corresponded to an object file included in the LTO link for the main executable), since the info I need is in the program summary section for the executable, which is duplicated in each of them.

> (-fdump-ipa-inline output)
> http://www.trippelsdorf.de/libxul_before.inline.tar.bz2 (100MB)
> http://www.trippelsdorf.de/libxul_after.inline.tar.bz2 (68MB, everything 'till the ICE hit)

With the old heuristics, the hot bb cutoff was:

profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)

In this case, sum_max is 103439951 and HOT_BB_COUNT_FRACTION was 10000, so the cutoff count was 10343.

From the working set computed from the histogram, the 99.9% cutoff count is 320. See the end of this email for the full set of histograms and working sets, but here are the top few working sets:

...
hal/Hal.gcda: 96.72%: num counts=30069, min counter=16389
hal/Hal.gcda: 97.50%: num counts=35296, min counter=10241
hal/Hal.gcda: 98.28%: num counts=43669, min counter=6145
hal/Hal.gcda: 99.06%: num counts=59589, min counter=3072
hal/Hal.gcda: 99.90%: num counts=115840, min counter=320

So it looks like you would want a cutoff of 97.5% to get close to what was there before.

(Honza, I just made some changes to enable gcov-dump to optionally compute and dump out the working sets from the histogram. I can send this for upstream review as I have wanted this several times.)

The much smaller cutoff count is why there are fewer calls marked unlikely and more inlining:

$ grep "call is unlikely" before/libxul.so.wpa.049i.inline | wc
442342 4944522 42560600
$ grep "call is unlikely" after/libxul.so.wpa.049i.inline | wc
392683 4349335 37477001
$ grep Inlined before/libxul.so.wpa.049i.inline | grep eliminated
Inlined 60432 calls, eliminated 30986 functions
$ grep Inlined after/libxul.so.wpa.049i.inline | grep eliminated
Inlined 89573 calls, eliminated 28921 functions

One thing that is interesting in the above info, and may be contributing to the larger size now, is that there are more inlines, but fewer functions are being eliminated. I'm not sure why that is offhand.

It's possible (probable) that the inlining heuristics need some retuning to make optimal use of the new cutoffs. We also see additional inlines in some of our large internal apps with the change, but not much increase in binary size, and it sometimes leads to better performance. We are not as much affected, though, because the google branches were already using a much larger HOT_BB_COUNT_FRACTION of 60K in order to get more inlining. In this case, it looks like you are getting more inlines but it is apparently performance-neutral?

Looking at a graph of the working set data, the number of counters starts increasing super-exponentially as the percentages approach 100%. I've been thinking that it may be useful to find the knee of the curve to determine the appropriate cutoff percentage. I'll see if I can make some progress on that.

Full histogram/working set data:

hal/Hal.gcda: a300: 512:PROGRAM_SUMMARY checksum=0x3aa34521
hal/Hal.gcda: counts=2109045, runs=7, sum_all=9749748271, run_max=97136704, sum_max=103439951
hal/Hal.gcda: counter histogram:
hal/Hal.gcda: 0: num counts=1824318, min counter=0, cum_counter=0
hal/Hal.gcda: 1: num counts=30727, min counter=1, cum_counter=30727
hal/Hal.gcda: 2: num counts=11646, min counter=2, cum_counter=23292
hal/Hal.gcda: 3: num counts=5414, min counter=3, cum_counter=16242
hal/Hal.gcda: 4: num counts=5156, min counter=4, cum_counter=20624
hal/Hal.gcda: 5: num counts=3379, min counter=5, cum_counter=16895
hal/Hal.gcda: 6: num counts=3674, min counter=6, cum_counter=22044
hal/Hal.gcda: 7: num counts=2310, min counter=7, cum_counter=16170
hal/Hal.gcda: 8: num counts=4756, min counter=8, cum_counter=40330
hal/Hal.gcda: 9: num counts=4725, min counter=10,
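The comparison between the old fixed-fraction cutoff and the histogram-based working sets can be reproduced with a few lines. This is a sketch only: the fraction of 10000 is an assumption (it is the value that reproduces the quoted cutoff of 10343 from sum_max), and the function names are invented for illustration.

```python
# Value quoted in the comment above.
SUM_MAX = 103439951
# Assumption: the hot-bb-count-fraction in effect, chosen because it
# reproduces the quoted cutoff count of 10343.
HOT_BB_COUNT_FRACTION = 10000

# (working-set percentage, number of counters, minimum counter value)
WORKING_SETS = [
    (96.72, 30069, 16389),
    (97.50, 35296, 10241),
    (98.28, 43669, 6145),
    (99.06, 59589, 3072),
    (99.90, 115840, 320),
]


def old_cutoff(sum_max, fraction):
    """Old heuristic: a block is hot if its count reaches sum_max / fraction."""
    return sum_max // fraction


def closest_working_set(working_sets, cutoff):
    """Working set whose minimum counter is nearest to the old cutoff."""
    return min(working_sets, key=lambda ws: abs(ws[2] - cutoff))


cutoff = old_cutoff(SUM_MAX, HOT_BB_COUNT_FRACTION)
match = closest_working_set(WORKING_SETS, cutoff)
print(cutoff, match)
```

The nearest entry is the 97.50% working set, matching the suggestion that a cutoff around 97.5% would be close to the previous behaviour, whereas the 99.9% working set admits counters as low as 320.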
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #159 from Jan Hubicka hubicka at ucw dot cz 2012-12-12 20:35:37 UTC ---
> hal/Hal.gcda: 96.72%: num counts=30069, min counter=16389
> hal/Hal.gcda: 97.50%: num counts=35296, min counter=10241
> hal/Hal.gcda: 98.28%: num counts=43669, min counter=6145
> hal/Hal.gcda: 99.06%: num counts=59589, min counter=3072
> hal/Hal.gcda: 99.90%: num counts=115840, min counter=320
>
> So it looks like you would want a cutoff of 97.5% to get close to what
> was there before.

Setting the default cutoff to something like 95% would sound fine to me. I see I asked to reduce the parameter but suggested 990. Markus, can you try setting HOT_BB_COUNT_WS_PERMILLE to 950?

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #154 from Teresa Johnson tejohnson at google dot com 2012-12-11 19:30:53 UTC --- What was the size of the gcc lto/pgo binary before the change to use the histogram? Was it close to the gcc 4.7 lto/pgo size? In that case that is a very large increase, ~25%. Markus, could you attach to the bug one of the gcda files so that I can see the program summary and figure out how far off the old hot bb threshold is from the new histogram-based one? Also, it would be good to see the -fdump-ipa-inline dumps before and after the regression (if necessary, the before one could be from 4_7).
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #155 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-11 22:57:14 UTC ---
(In reply to comment #154)
> What was the size of the gcc lto/pgo binary before the change to use
> the histogram? Was it close to the gcc 4.7 lto/pgo size? In that case
> that is a very large increase, ~25%.

With revision 193914 (before the change) the lto/pgo size is 42115424 bytes. So it looks like Teresa is off the hook.

> Markus, could you attach to the bug one of the gcda files so that I
> can see the program summary and figure out how far off the old hot bb
> threshold is from the new histogram-based one? Also, it would be good
> to see the -fdump-ipa-inline dumps before and after the regression (if
> necessary, the before one could be from 4_7).

Will try to post them tomorrow.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #156 from Teresa Johnson tejohnson at google dot com 2012-12-12 00:00:17 UTC ---
On Tue, Dec 11, 2012 at 2:57 PM, markus at trippelsdorf dot de gcc-bugzi...@gcc.gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
>
> --- Comment #155 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-11 22:57:14 UTC ---
> (In reply to comment #154)
> > What was the size of the gcc lto/pgo binary before the change to use
> > the histogram? Was it close to the gcc 4.7 lto/pgo size? In that case
> > that is a very large increase, ~25%.
>
> With revision 193914 (before the change) the lto/pgo size is 42115424 bytes.
> So it looks like Teresa is off the hook.

Unfortunately, I am still possibly on the hook, since the main suspect change is r193747 (committed by Honza, with changes made by him and me to use the histogram instead of a hard limit for determining bb hotness). Between then and when I committed fixes for this under LTO (r193999), I would expect that the code size might have been worse temporarily: everything looked hot because the histogram was not being streamed through the LTO files properly, and so inlining could have gotten excessive.

> > Markus, could you attach to the bug one of the gcda files so that I
> > can see the program summary and figure out how far off the old hot bb
> > threshold is from the new histogram-based one? Also, it would be good
> > to see the -fdump-ipa-inline dumps before and after the regression (if
> > necessary, the before one could be from 4_7).
>
> Will try to post them tomorrow.

Ok thanks.

Teresa
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #147 from Jan Hubicka hubicka at ucw dot cz 2012-12-02 09:23:09 UTC ---
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
>
> --- Comment #146 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-02 07:36:02 UTC ---
> (In reply to comment #145)
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
> > >
> > > --- Comment #144 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-01 12:39:30 UTC ---
> > > It looks like there is a LTO code-size regression on trunk:
> > > (size of libxul.so, build without elfhack):
> > >
> > > gcc lto/pgo : size: 42204584 | Kraken bench: 2723.9ms +/- 0.9%
> >
> > About LTO+PGO please be sure that you have the Teresa's fix from this Friday in
> > your tree.
>
> Yes, my tree already included this fix and also the fix from bug 1.

Please try to reduce HOT_BB_COUNT_WS_PERMILLE to 990. I also see some regressions on some SPEC benchmarks (such as GCC) and this helps. If it doesn't, it would be nice to know what value is needed for comparable size.

> > > gcc : size: 34072808 | Kraken bench: 2804.3ms +/- 1.6%
> >
> > Is LTO w/o PGO bigger than previous builds?
>
> Couldn't tell, because it doesn't link:
>
> /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: warning: hidden symbol 'pixman_add_triangles' in /var/tmp/moz-build-dir/toolkit/library/../../gfx/cairo/libpixman/src/pixman-trap.o is referenced by DSO /usr/lib64/libcairo.so
> /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/cc0oq4BG.ltrans1.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN12SkAnnotationC1ER23SkFlattenableReadBuffer' which may overflow at runtime; recompile with -fPIC
> /tmp/cc0oq4BG.ltrans0.ltrans.o:cc0oq4BG.ltrans0.o:function SharedStub: error: undefined reference to 'PrepareAndDispatch'
> /tmp/cc0oq4BG.ltrans1.ltrans.o:cc0oq4BG.ltrans1.o:function SkAnnotation::CreateProc(SkFlattenableReadBuffer&) [clone .local.7828.1055099]: error: undefined reference to 'SkAnnotation::SkAnnotation(SkFlattenableReadBuffer&)'
> collect2: error: ld returned 1 exit status
>
> The undefined reference to PrepareAndDispatch is easily fixed by an
> __attribute__ ((used)).
> Do you have an idea on how to fix the
> SkAnnotation::SkAnnotation(SkFlattenableReadBuffer&) issue?

Hmm, I remember seeing this one, too. I will check.

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #148 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-02 11:57:27 UTC ---
(In reply to comment #147)
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
> >
> > --- Comment #146 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-02 07:36:02 UTC ---
> > (In reply to comment #145)
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
> > > >
> > > > --- Comment #144 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-01 12:39:30 UTC ---
> > > > It looks like there is a LTO code-size regression on trunk:
> > > > (size of libxul.so, build without elfhack):
> > > >
> > > > gcc lto/pgo : size: 42204584 | Kraken bench: 2723.9ms +/- 0.9%
> > >
> > > About LTO+PGO please be sure that you have the Teresa's fix from this Friday in
> > > your tree.
> >
> > Yes, my tree already included this fix and also the fix from bug 1.
>
> Please try to reduce HOT_BB_COUNT_WS_PERMILLE to 990. I also see some
> regressions on some SPEC benchmarks (such as GCC) and this helps. If it doesn't
> it would be nice to know what value is needed for comparable size.

Unfortunately it doesn't help much, because with --param hot-bb-count-ws-permille=990 the size is only 0.25% smaller:

(With --param) : 42098856
(Without     ) : 42204584

I will try smaller values later.
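For reference, the 0.25% figure follows directly from the two sizes quoted in this comment. A minimal sketch; the function and variable names are illustrative only.

```python
def percent_smaller(before, after):
    """Relative size reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


without_param = 42204584  # libxul.so size without the --param, in bytes
with_param = 42098856     # size with --param hot-bb-count-ws-permille=990

saved_bytes = without_param - with_param
reduction = percent_smaller(without_param, with_param)
print(f"{saved_bytes} bytes saved, {reduction:.2f}% smaller")
```

So lowering the parameter from its default to 990 recovers roughly 100 KB out of about 42 MB.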
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #149 from Jan Hubicka hubicka at ucw dot cz 2012-12-02 15:05:52 UTC ---
> > Please try to reduce HOT_BB_COUNT_WS_PERMILLE to 990. I also see some
> > regressions on some SPEC benchmarks (such as GCC) and this helps. If it doesn't
> > it would be nice to know what value is needed for comparable size.
>
> Unfortunately it doesn't help much, because with
> --param hot-bb-count-ws-permille=990 the size is only 0.25% smaller:
>
> (With --param) : 42098856
> (Without     ) : 42204584
>
> I will try smaller values later.

Hmm, that sounds like quite bad news: the histogram code was supposed to help in such cases. I will try to fix the non-PGO case, and let's first compare how PGO and non-PGO builds compare. If you could put the -fdump-ipa-inline dump somewhere, I will try to check if there is something obviously wrong.

In the worst case we can resort to combining both heuristics, i.e. keeping the hot_bb_fraction in addition to the histogram code. In fact I planned to do it this way, but Teresa removed the old code and I did not see a good reason to keep it.

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #150 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-02 18:03:28 UTC --- For comparison I've just disabled skia and built with LTO only; the size looks good for this case: 31356968
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #151 from Jan Hubicka hubicka at ucw dot cz 2012-12-02 20:52:13 UTC --- Teresa committed another bugfix just today. So with a bit of luck it will work now? I will try to look deeper into it ASAP, but I am just getting ready for a trip to the USA. Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #152 from Jan Hubicka hubicka at ucw dot cz 2012-12-02 21:09:24 UTC --- Also I suppose you don't have a comparison to 4.7 handy? (I am curious because of the inliner heuristic re-tuning) Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #153 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-02 21:13:21 UTC ---
On 2012.12.02 at 21:09 +0000, hubicka at ucw dot cz wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
>
> --- Comment #152 from Jan Hubicka hubicka at ucw dot cz 2012-12-02 21:09:24 UTC ---
> Also I suppose you don't have a comparison to 4.7 handy? (I am curious because
> of the inliner heuristic re-tuning)

The LTO/PGO sizes were measured with the newest patch from Teresa already applied.

gcc-4.7 lto/pgo: size: 7456 | Kraken bench: 2706.7ms +/- 1.1%
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #144 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-01 12:39:30 UTC ---
It looks like there is an LTO code-size regression on trunk:
(size of libxul.so, built without elfhack):

gcc lto/pgo : size: 42204584 | Kraken bench: 2723.9ms +/- 0.9%
gcc         : size: 34072808 | Kraken bench: 2804.3ms +/- 1.6%
clang lto   : size: 35071848 | Kraken bench: 2804.2ms +/- 1.2%
clang       : size: 36797384 | Kraken bench: 2819.6ms +/- 1.4%
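To put the size column in proportion, the four builds can be compared against the row labelled "gcc". The sketch below only does the arithmetic on the numbers quoted above; the dictionary keys mirror the row labels and the helper name is invented.

```python
# libxul.so sizes in bytes, as listed in the comment above.
SIZES = {
    "gcc lto/pgo": 42204584,
    "gcc": 34072808,
    "clang lto": 35071848,
    "clang": 36797384,
}


def growth_percent(size, baseline):
    """Size increase over `baseline`, in percent."""
    return 100.0 * (size - baseline) / baseline


baseline = SIZES["gcc"]
growth = {name: round(growth_percent(size, baseline), 1)
          for name, size in SIZES.items()}

for name, pct in growth.items():
    print(f"{name:12s} {pct:+.1f}%")
```

The gcc lto/pgo build comes out a little under 24% larger than the "gcc" row, which is in line with the "~25%" increase mentioned elsewhere in the thread.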
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #145 from Jan Hubicka hubicka at ucw dot cz 2012-12-01 22:09:07 UTC ---
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
>
> --- Comment #144 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-01 12:39:30 UTC ---
> It looks like there is a LTO code-size regression on trunk:
> (size of libxul.so, build without elfhack):
>
> gcc lto/pgo : size: 42204584 | Kraken bench: 2723.9ms +/- 0.9%

About LTO+PGO, please be sure that you have Teresa's fix from this Friday in your tree.

> gcc : size: 34072808 | Kraken bench: 2804.3ms +/- 1.6%

Is LTO w/o PGO bigger than previous builds?

> clang lto : size: 35071848 | Kraken bench: 2804.2ms +/- 1.2%
> clang : size: 36797384 | Kraken bench: 2819.6ms +/- 1.4%
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #146 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-02 07:36:02 UTC ---
(In reply to comment #145)
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
> >
> > --- Comment #144 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-12-01 12:39:30 UTC ---
> > It looks like there is a LTO code-size regression on trunk:
> > (size of libxul.so, build without elfhack):
> >
> > gcc lto/pgo : size: 42204584 | Kraken bench: 2723.9ms +/- 0.9%
>
> About LTO+PGO please be sure that you have the Teresa's fix from this Friday in
> your tree.

Yes, my tree already included this fix and also the fix from bug 1.

> > gcc : size: 34072808 | Kraken bench: 2804.3ms +/- 1.6%
>
> Is LTO w/o PGO bigger than previous builds?

Couldn't tell, because it doesn't link:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: warning: hidden symbol 'pixman_add_triangles' in /var/tmp/moz-build-dir/toolkit/library/../../gfx/cairo/libpixman/src/pixman-trap.o is referenced by DSO /usr/lib64/libcairo.so
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /tmp/cc0oq4BG.ltrans1.ltrans.o: requires dynamic R_X86_64_PC32 reloc against '_ZN12SkAnnotationC1ER23SkFlattenableReadBuffer' which may overflow at runtime; recompile with -fPIC
/tmp/cc0oq4BG.ltrans0.ltrans.o:cc0oq4BG.ltrans0.o:function SharedStub: error: undefined reference to 'PrepareAndDispatch'
/tmp/cc0oq4BG.ltrans1.ltrans.o:cc0oq4BG.ltrans1.o:function SkAnnotation::CreateProc(SkFlattenableReadBuffer&) [clone .local.7828.1055099]: error: undefined reference to 'SkAnnotation::SkAnnotation(SkFlattenableReadBuffer&)'
collect2: error: ld returned 1 exit status

The undefined reference to PrepareAndDispatch is easily fixed by an __attribute__ ((used)).
Do you have an idea on how to fix the SkAnnotation::SkAnnotation(SkFlattenableReadBuffer&) issue?
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #142 from Jan Hubicka hubicka at gcc dot gnu.org 2012-10-08 22:19:55 UTC ---
After updating Mozilla this weekend, I definitely overflow an 8GB machine. The peak in top is around 9-10GB.
I checked malloc usage and there are not many surprises. It is about 300MB, mostly GGC overhead, pointer maps and such. Most memory is actually the GGC, about 7GB. Here 5GB survives type and decl merging and is distributed as follows:

cgraph.c:722 (cgraph_allocate_init_indirect_info  1671240: 0.0%  0: 0.0%  8202960: 0.2%  0: 0.0%  246855
tree.c:1226 (build_int_cst_wide)  625825208:12.3%  0: 0.0%  10437744: 0.2%  4863752: 3.1%  325009
ipa-prop.h:471 (ipa_check_create_edge_args)  0: 0.0%  0: 0.0%  16777216: 0.3%  0: 0.0%  1
ipa-inline-analysis.c:3697 (inline_read_section)  0: 0.0%  28298904: 1.6%  21095504: 0.4%  1064480: 0.7%  423701
tree.c:1561 (build_string)  16526800: 0.3%  0: 0.0%  21695715: 0.4%  3395427: 2.2%  864326
ipa-prop.c:3393 (ipa_read_node_info)  0: 0.0%  4302088: 0.2%  25029448: 0.5%  119192: 0.1%  246788
stringpool.c:75 (alloc_node)  0: 0.0%  0: 0.0%  27817760: 0.5%  0: 0.0%  695444
ipa-ref.c:51 (ipa_record_reference)  0: 0.0%  188442816:10.3%  28443272: 0.6%  2114424: 1.4%  1256259
stringpool.c:58 (stringpool_ggc_alloc)  0: 0.0%  0: 0.0%  34673092: 0.7%  2619412: 1.7%  695444
lto/lto.c:2279 (create_subid_section_table)  275832: 0.0%  0: 0.0%  40363416: 0.8%  8051472: 5.2%  3978
tree-streamer-in.c:895 (lto_input_ts_constructor  171812232: 3.4%  192568640:10.6%  42205992: 0.8%  1425072: 0.9%  947082
ipa-prop.c:3380 (ipa_read_node_info)  0: 0.0%  35825488: 2.0%  58764528: 1.1%  659704: 0.4%  909232
tree-streamer-in.c:488 (streamer_alloc_tree)  129846168: 2.6%  0: 0.0%  75997752: 1.5%  7072: 0.0%  2063753
tree.c:1263 (build_int_cst_wide)  237791264: 4.7%  0: 0.0%  90464320: 1.8%  0: 0.0%  10257987
ipa-inline-analysis.c:3709 (inline_read_section)  0: 0.0%  133938484: 7.4%  101874268: 2.0%  1606480: 1.0%  1099389
lto-section-in.c:361 (lto_new_in_decl_state)  3240: 0.0%  0: 0.0%  107452560: 2.1%  0: 0.0%  895465
cgraph.c:653 (cgraph_create_edge_1)  0: 0.0%  0: 0.0%  135509816: 2.6%  0: 0.0%  1302979
ggc-common.c:253 (ggc_cleared_alloc_ptr_array_tw  2040: 0.0%  866397160:47.6%  190623368: 3.7%  263888: 0.2%  11459
lto/lto.c:267 (lto_read_in_decl_state)  3024: 0.0%  0: 0.0%  225743280: 4.4%  41057176:26.5%  6268255
ipa-inline-analysis.c:931 (inline_summary_alloc)  0: 0.0%  0: 0.0%  268435464: 5.2%  8: 0.0%  1
cgraph.c:362 (cgraph_allocate_node)  0: 0.0%  0: 0.0%  515473640:10.1%  0: 0.0%  1741465
toplev.c:953 (realloc_for_line_map)  0: 0.0%  358955168:19.7%  1074790424:21.0%  184: 0.0%  19
tree-streamer-in.c:499 (streamer_alloc_tree)  3668091656:72.1%  0: 0.0%  1995384408:38.9%  87485792:56.5%  46580224
Total  5089831352  1821058652  5124870115  154815271  91384962
source location  Garbage  Freed  Leak  Overhead  Times

I.e. 20% are now linemaps, 38% trees read by the streamer, 10% cgraph nodes, 5% inline summaries, 4% streamer table converting UIDs to decls (that can be freed).

The trees are distributed as follows:

Kind             Nodes     Bytes
---
decls            20489087  -1105370640
types            10321297  1733977896
blocks           102012    8160960
stmts            0         0
refs             44297     1806000
exprs            8205133   264995952
constants        11667038  376994197
identifiers      695444    27817760
vecs             325009    626535448
binfos           2063753   205829776
ssa names        0         0
constructors     369886    8877264
random kinds     7039351   281574472
lang_decl kinds  0         0
lang_type kinds  0         0
omp clauses      0         0
---
Total            61322307  -1863768211
---

I think all the blocks read into WPA are bugs. We may also do better on sharing constants.

Code  Nodes
identifier_node
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #143 from Steven Bosscher steven at gcc dot gnu.org 2012-10-08 22:30:20 UTC --- Created attachment 28395 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28395 Use size_t for tree code book-keeping ...because overflow looks so sloppy.
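
The negative byte counts in the statistics of comment #142 (for example -1105370640 bytes of decls) are what a byte counter looks like after it wraps in a 32-bit signed int. The sketch below only illustrates that arithmetic; it is not the attached patch, the helper name as_wrapped_int32 is made up for illustration, and the "true" value 3189596656 is an assumption: it is the smallest non-negative count that would print as -1105370640.

```c
#include <stdint.h>

/* Model what a byte count looks like when stored in a 32-bit signed
   counter: keep the low 32 bits and reinterpret them as two's
   complement.  Written with explicit arithmetic so it does not rely on
   implementation-defined narrowing conversions. */
static int64_t as_wrapped_int32(uint64_t bytes)
{
    uint64_t low = bytes & 0xFFFFFFFFu;
    if (low >= 0x80000000u)
        return (int64_t)low - 4294967296LL;  /* subtract 2^32 */
    return (int64_t)low;
}
```

Under this assumption, as_wrapped_int32(3189596656) yields -1105370640, while a count below 2^31 such as the 1733977896 bytes reported for types is unaffected. A size_t counter on a 64-bit host avoids the wrap.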
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #141 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-09-15 14:05:38 UTC ---
After the new IonMonkey JIT went in (http://blog.mozilla.org/javascript/2012/09/12/ionmonkey-in-firefox-18/), peak memory use went up. It is now 6.8GB (gcc-4.7 is roughly the same: 6.5GB). So we're approaching the point where an 8GB machine isn't enough to build Firefox with LTO...
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #139 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-18 09:36:55 UTC ---
oprofile of WPA:

649295  18.2243  lto1            lto1            lto_main()
341256   9.5783  lto1            lto1            htab_find_slot_with_hash
126567   3.5525  lto1            lto1            do_estimate_growth_1(cgraph_node*, void*)
 97142   2.7266  lto1            lto1            htab_expand
 89658   2.5165  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
 82117   2.3048  lto1            lto1            pointer_map_insert(pointer_map_t*, void const*)
 60238   1.6907  lto1            lto1            iterative_hash_hashval_t(unsigned int, unsigned int)
 58145   1.6320  lto1            lto1            ggc_internal_alloc_stat(unsigned long, char const*, int, char const*)
 53679   1.5067  lto1            lto1            linemap_lookup(line_maps*, unsigned int)
 47271   1.3268  lto1            lto1            lto_output_tree(output_block*, tree_node*, bool, bool)
 43043   1.2081  lto1            lto1            gt_ggc_mx_lang_tree_node(void*)
 42675   1.1978  lto1            lto1            verify_cgraph_node(cgraph_node*)
 40609   1.1398  lto1            lto1            streamer_tree_cache_insert_1(streamer_tree_cache_d*, tree_node*, unsigned int*, bool)
 40245   1.1296  lto1            lto1            ggc_marked_p(void const*)
 39474   1.1079  libc-2.11.1.so  libc-2.11.1.so  memset
 38955   1.0934  libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
 32085   0.9006  lto1            lto1            streamer_write_uhwi_stream(lto_output_stream*, unsigned long)
 31965   0.8972  lto1            lto1            ggc_set_mark(void const*)
 31406   0.8815  lto1            lto1            lto_input_tree(lto_input_block*, data_in*)
 29213   0.8199  lto1            lto1            streamer_read_tree_bitfields(lto_input_block*, tree_node*)
 26846   0.7535  lto1            lto1            hash_pointer
 25870   0.7261  libc-2.11.1.so  libc-2.11.1.so  memcpy

We still spend an insanely long time walking types in lto_main (introduced by Michael's patch).
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #140 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-19 05:55:26 UTC ---
Author: hubicka
Date: Sun Aug 19 05:55:20 2012
New Revision: 190509

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190509
Log:
    PR lto/45375
    * ipa-inline.c (want_inline_small_function_p): Bypass inline limits for hinted functions.
    (edge_badness): Dump hints; decrease badness for hinted funcitons.
    * ipa-inline.h (enum inline_hints_vals): New enum.
    (inline_hints): New type.
    (edge_growth_cache_entry): Add hints.
    (dump_inline_summary): Update.
    (dump_inline_hints): Declare.
    (do_estimate_edge_hints): Declare.
    (estimate_edge_hints): New inline function.
    (reset_edge_growth_cache): Update.
    * predict.c (cgraph_maybe_hot_edge_p): Do not ice on indirect edges.
    * ipa-inline-analysis.c (dump_inline_hints): New function.
    (estimate_edge_devirt_benefit): Return true when function should be hinted.
    (estimate_calls_size_and_time): New hints argument; set it when devritualization happens.
    (estimate_node_size_and_time): New hints argument.
    (do_estimate_edge_time): Cache hints.
    (do_estimate_edge_growth): Update.
    (do_estimate_edge_hints): New function

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline-analysis.c
    trunk/gcc/ipa-inline.c
    trunk/gcc/ipa-inline.h
    trunk/gcc/predict.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/ipa/iinline-1.c
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #137 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-10 15:06:51 UTC ---
So since the last report we managed to double WPA memory usage and compile time... 12m wall, 42m user is needed for WPA build.

Execution times (seconds)
phase opt and generate :  97.34 (21%) usr   0.33 ( 1%) sys  97.70 (20%) wall   98900 kB ( 3%) ggc
phase stream in        : 242.70 (51%) usr   5.12 (22%) sys 247.94 (50%) wall 3174311 kB (97%) ggc
phase stream out       : 131.99 (28%) usr  17.49 (76%) sys 149.59 (30%) wall       0 kB ( 0%) ggc
garbage collection     :  24.01 ( 5%) usr   0.00 ( 0%) sys  24.03 ( 5%)
ipa lto gimple out     :  12.59 ( 3%) usr   1.07 ( 5%) sys  13.69 ( 3%) wall       0 kB ( 0%) ggc
ipa lto decl in        : 188.50 (40%) usr   3.93 (17%) sys 192.53 (39%) wall 2083552 kB (64%) ggc
ipa lto decl out       : 113.33 (24%) usr   8.48 (37%) sys 121.84 (25%) wall       0 kB ( 0%) ggc
ipa lto cgraph I/O     :   5.58 ( 1%) usr   0.67 ( 3%) sys   6.25 ( 1%) wall  684122 kB (21%) ggc
ipa lto decl merge     :  10.64 ( 2%) usr   0.01 ( 0%) sys  10.64 ( 2%) wall     291 kB ( 0%) ggc
ipa lto cgraph merge   :   9.15 ( 2%) usr   0.01 ( 0%) sys   9.17 ( 2%) wall   15100 kB ( 0%) ggc
whopr wpa              :   5.80 ( 1%) usr   0.05 ( 0%) sys   5.89 ( 1%) wall       1 kB ( 0%) ggc
whopr wpa I/O          :   2.19 ( 0%) usr   7.94 (35%) sys  10.19 ( 2%)
inline heuristics      :  61.46 (13%) usr   0.31 ( 1%) sys  61.80 (12%) wall  351753 kB (11%) ggc
callgraph verifier     :  15.97 ( 3%) usr   0.06 ( 0%) sys  16.00 ( 3%) wall       0 kB ( 0%) ggc
TOTAL                  : 472.05            22.94           495.25            3274649 kB
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #138 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-10 15:35:44 UTC ---
Actually not, I looked up the wrong report. The last report in comment #121 shows:

TOTAL                  : 616.43            22.26           651.79            2165706 kB

So we actually got noticeably faster, but need more memory: about 1GB more GGC space, and a lot more in the top report. I will look into mem report analysis once I am done with merging some other cleanups/speedups.
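
To put numbers on "noticeably faster, but need more memory", the comparison of the two TOTAL lines can be worked out as a relative change. The helper below is only an illustration of that arithmetic; the function name percent_change is made up, and the inputs are the wall-clock seconds and GGC kilobytes quoted in comments #121 and #137.

```c
/* Relative change from old_value to new_value, in percent.
   Negative means the new value is smaller. */
static double percent_change(double old_value, double new_value)
{
    return (new_value - old_value) / old_value * 100.0;
}
```

With the figures above, percent_change(651.79, 495.25) is roughly -24 (wall time dropped by about a quarter) and percent_change(2165706, 3274649) is roughly +51 (GGC memory grew by about half, i.e. a bit over 1GB).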
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #136 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-13 16:29:04 UTC ---
... and oprofile of compilation stage of -flto-partition=none

samples  %       image name      app name        symbol name
194976   2.8536  lto1            lto1            alloc_page
109091   1.5966  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
 99458   1.4556  lto1            lto1            operand_equal_p
 88092   1.2893  lto1            lto1            record_reg_classes
 87508   1.2807  lto1            lto1            bitmap_set_bit
 75628   1.1069  lto1            lto1            estimate_edge_growth
 68760   1.0064  lto1            lto1            mem_attrs_eq_p
 62151   0.9096  lto1            lto1            for_each_rtx_1
 58274   0.8529  libc-2.11.1.so  libc-2.11.1.so  memset
 55257   0.8087  libc-2.11.1.so  libc-2.11.1.so  malloc
 52116   0.7628  lto1            lto1            htab_find_slot_with_hash
 50481   0.7388  oprofiled       oprofiled       /usr/bin/oprofiled
 42524   0.6224  lto1            lto1            ggc_set_mark
 40190   0.5882  lto1            lto1            constrain_operands
 40124   0.5872  lto1            lto1            lookup_page_table_entry
 39279   0.5749  lto1            lto1            extract_insn
 34436   0.5040  lto1            lto1            ggc_internal_alloc_stat
 33609   0.4919  lto1            lto1            preprocess_constraints
 32843   0.4807  lto1            lto1            get_attr_enabled
 32582   0.4769  lto1            lto1            reload_cse_simplify_operands
 32573   0.4767  lto1            lto1            bitmap_clear_bit
 32278   0.4724  libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
 29633   0.4337  lto1            lto1            bitmap_bit_p
 29593   0.4331  lto1            lto1            find_reg_note
 29428   0.4307  libc-2.11.1.so  libc-2.11.1.so  _int_free
 29161   0.4268  lto1            lto1            df_note_bb_compute
 28939   0.4235  libc-2.11.1.so  libc-2.11.1.so  calloc
 28794   0.4214  lto1            lto1            cse_insn
 28084   0.4110  lto1            lto1            find_reloads
 26192   0.3833  lto1            lto1            ix86_decompose_address
 25211   0.3690  libc-2.11.1.so  libc-2.11.1.so  memcpy
 25016   0.3661  lto1            lto1            df_ref_create_structure
 24321   0.3560  lto1            lto1            nonzero_bits1
 24066   0.3522  lto1            lto1            htab_traverse_noresize
 23895   0.3497  libc-2.11.1.so  libc-2.11.1.so  free
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #130 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-12 14:44:47 UTC ---
After fixing one linker error, I can now build Mozilla with -flto-partition=none. It takes 11GB and 40 minutes, so there is space for improvement ;)

There are some obvious questions, like why IRA needs 63% of GGC memory, and VRP 23%.

Also the -flto-partition=none .text section is now 18% smaller. This is large enough to be declared a bug, but I am not sure how to track it.

Note that my machine has quite poor single-CPU performance, so the compile times are likely not comparable with the LLVM ones reported above (and I also use a debugging build).

ipa lto gimple in      :  52.12 ( 2%) usr   3.68 ( 9%) sys  55.72 ( 2%) wall 2998249 kB (84%) ggc
ipa lto decl in        : 225.68 ( 8%) usr   2.39 ( 6%) sys 228.17 ( 8%) wall 1124821 kB (31%) ggc
ipa lto cgraph I/O     :   4.82 ( 0%) usr   0.44 ( 1%) sys   5.27 ( 0%) wall  684110 kB (19%) ggc
cfg construction       :   3.01 ( 0%) usr   0.12 ( 0%) sys   3.29 ( 0%) wall   70205 kB ( 2%) ggc
cfg cleanup            :  46.57 ( 2%) usr   0.41 ( 1%) sys  46.69 ( 2%) wall   75005 kB ( 2%) ggc
df live regs           :  78.21 ( 3%) usr   0.25 ( 1%) sys  77.55 ( 3%) wall       0 kB ( 0%) ggc
alias analysis         :  25.59 ( 1%) usr   0.12 ( 0%) sys  25.88 ( 1%) wall  474769 kB (13%) ggc
parser (global)        :   8.62 ( 0%) usr   0.65 ( 2%) sys  10.00 ( 0%) wall  259389 kB ( 7%) ggc
inline heuristics      :  87.23 ( 3%) usr   0.51 ( 1%) sys  88.41 ( 3%) wall  451358 kB (13%) ggc
integration            :  50.61 ( 2%) usr   1.51 ( 4%) sys  52.67 ( 2%) wall 1479979 kB (41%) ggc
tree CFG cleanup       :  46.68 ( 2%) usr   0.43 ( 1%) sys  48.09 ( 2%) wall   70493 kB ( 2%) ggc
tree VRP               :  65.88 ( 2%) usr   0.73 ( 2%) sys  66.71 ( 2%) wall  862879 kB (24%) ggc
tree copy propagation  :  22.30 ( 1%) usr   0.17 ( 0%) sys  22.11 ( 1%) wall  144298 kB ( 4%) ggc
tree PTA               :  46.70 ( 2%) usr   0.06 ( 0%) sys  46.90 ( 2%) wall  100249 kB ( 3%) ggc
tree SSA rewrite       :  19.16 ( 1%) usr   0.15 ( 0%) sys  19.09 ( 1%) wall  149347 kB ( 4%) ggc
tree SSA incremental   :  27.75 ( 1%) usr   0.61 ( 1%) sys  27.86 ( 1%) wall   72307 kB ( 2%) ggc
tree operand scan      :  57.17 ( 2%) usr   3.03 ( 7%) sys  59.92 ( 2%) wall 1296208 kB (36%) ggc
dominator optimization :  35.95 ( 1%) usr   0.21 ( 0%) sys  35.74 ( 1%) wall  311024 kB ( 9%) ggc
tree CCP               :  31.61 ( 1%) usr   0.12 ( 0%) sys  31.17 ( 1%) wall      69 kB ( 3%) ggc
tree PRE               :  87.46 ( 3%) usr   0.60 ( 1%) sys  88.62 ( 3%) wall  538859 kB (15%) ggc
tree FRE               :  47.37 ( 2%) usr   0.58 ( 1%) sys  45.89 ( 2%) wall  274455 kB ( 8%) ggc
tree aggressive DCE    :   8.96 ( 0%) usr   0.22 ( 1%) sys   8.86 ( 0%) wall  137686 kB ( 4%) ggc
tree forward propagate :  10.28 ( 0%) usr   0.10 ( 0%) sys  10.33 ( 0%) wall   56466 kB ( 2%) ggc
tree slp vectorization :  25.42 ( 1%) usr   0.16 ( 0%) sys  25.50 ( 1%) wall  436119 kB (12%) ggc
complete unrolling     :   5.81 ( 0%) usr   0.13 ( 0%) sys   6.07 ( 0%) wall  115165 kB ( 3%) ggc
tree vectorization     :   1.44 ( 0%) usr   0.05 ( 0%) sys   1.36 ( 0%) wall   31337 kB ( 1%) ggc
tree iv optimization   :  13.00 ( 0%) usr   0.08 ( 0%) sys  12.94 ( 0%) wall  185893 kB ( 5%) ggc
dominance computation  :  48.61 ( 2%) usr   0.54 ( 1%) sys  47.65 ( 2%) wall       0 kB ( 0%) ggc
expand vars            :  18.81 ( 1%) usr   0.09 ( 0%) sys  18.42 ( 1%) wall  167798 kB ( 5%) ggc
expand                 : 116.32 ( 4%) usr   0.61 ( 1%) sys 116.22 ( 4%) wall 1508612 kB (42%) ggc
forward prop           :  23.01 ( 1%) usr   0.36 ( 1%) sys  23.43 ( 1%) wall  130825 kB ( 4%) ggc
CSE                    :  67.21 ( 2%) usr   0.23 ( 1%) sys  66.28 ( 2%) wall   44439 kB ( 1%) ggc
dead store elim1       :  20.47 ( 1%) usr   0.10 ( 0%) sys  20.83 ( 1%) wall  103309 kB ( 3%) ggc
dead store elim2       :  18.99 ( 1%) usr   0.18 ( 0%) sys  20.48 ( 1%) wall  140398 kB ( 4%) ggc
CPROP                  :  52.83 ( 2%) usr   0.33 ( 1%) sys  52.91 ( 2%) wall  336514 kB ( 9%) ggc
PRE                    :  30.60 ( 1%) usr   0.06 ( 0%) sys  30.51 ( 1%) wall   52724 kB ( 1%) ggc
CSE 2                  :  37.89 ( 1%) usr   0.04 ( 0%) sys  38.88 ( 1%) wall   29785 kB ( 1%) ggc
combiner               :  80.20 ( 3%) usr   0.23 ( 1%) sys  80.57 ( 3%) wall  400168 kB (11%) ggc
integrated RA          : 191.13 ( 7%) usr   0.44 ( 1%) sys 190.64 ( 7%) wall 2328880 kB (65%) ggc
reload                 :  65.46 ( 2%) usr   0.09 ( 0%) sys  67.43 ( 2%) wall  193522 kB ( 5%) ggc
reload CSE regs        :  56.71 ( 2%) usr   0.14 ( 0%) sys  56.49 ( 2%) wall  241394 kB ( 7%) ggc
thread pro- & epilogue :  14.43 ( 1%) usr   0.15 ( 0%) sys  14.97 ( 1%) wall  201098 kB ( 6%) ggc
final                  :  44.77 ( 2%) usr   2.80 ( 6%) sys  48.99 ( 2%) wall  367580 kB (10%) ggc
rest
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 Steven Bosscher steven at gcc dot gnu.org changed: What|Removed |Added CC||steven at gcc dot gnu.org
--- Comment #131 from Steven Bosscher steven at gcc dot gnu.org 2012-05-12 15:52:54 UTC ---
(In reply to comment #130)
> There are some obvious questions, like why IRA needs 63% of GGC memory, and VRP 23%

> tree VRP: 65.88 ( 2%) usr 0.73 ( 2%) sys 66.71 ( 2%) wall 862879 kB (24%) ggc

Is it possible to do this again with gathering statistics enabled? The only thing I can think of for this would be ASSERT_EXPRs and all the rewriting involved for them.

> tree slp vectorization : 25.42 ( 1%) usr 0.16 ( 0%) sys 25.50 ( 1%) wall 436119 kB (12%) ggc

This 12% also seems excessive.

> CPROP : 52.83 ( 2%) usr 0.33 ( 1%) sys 52.91 ( 2%) wall 336514 kB ( 9%) ggc

And this one also. I'll see if I can understand and explain this one.

> integrated RA : 191.13 ( 7%) usr 0.44 ( 1%) sys 190.64 ( 7%) wall 2328880 kB (65%) ggc

Uh, wow! :-(
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #132 from Jan Hubicka hubicka at ucw dot cz 2012-05-12 18:32:14 UTC ---
> > tree VRP: 65.88 ( 2%) usr 0.73 ( 2%) sys 66.71 ( 2%) wall 862879 kB (24%) ggc
>
> Is it possible to do this again with gathering statistics enabled? The

I started it some time ago, but it takes a while (it runs out of RAM even on my machine ;)

> only thing I can think of for this would be ASSERT_EXPRs and all the rewriting involved for them.

It also might be folding creating too many temporaries.

> > tree slp vectorization : 25.42 ( 1%) usr 0.16 ( 0%) sys 25.50 ( 1%) wall 436119 kB (12%) ggc
>
> This 12% also seems excessive.

Indeed it is.

> > integrated RA : 191.13 ( 7%) usr 0.44 ( 1%) sys 190.64 ( 7%) wall 2328880 kB (65%) ggc
>
> Uh, wow! :-(

Yep, something seems degenerate here. IRA is usually not that big of a memory hog.

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #133 from Jan Hubicka hubicka at ucw dot cz 2012-05-12 19:07:32 UTC ---
Another thing to observe is that GGC memory is just 4GB. I am not sure where the other 8GB goes when our IL is believed to be the major memory consumer and it resides almost completely in GGC memory. Perhaps some of the streaming hashtables get out of control.

Also it seems that line number info is about 1GB. It may be a win to write better streaming of locations. The current one enables almost no reuse of locators.

Honza
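
As a rough illustration of what "reuse of locators" could mean, here is a minimal sketch of a one-entry cache that hands out the same id when consecutive items share a source location. Everything in it is hypothetical: struct loc, struct loc_cache and intern_loc are made-up names, and this is not how GCC's line maps or the LTO streamer are implemented.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified source location; not GCC's line_map. */
struct loc {
    const char *file;
    int line;
    int col;
};

/* One-entry cache: remembers the last location and the id given to it. */
struct loc_cache {
    struct loc last;
    uint32_t last_id;
    uint32_t next_id;
    int valid;
};

/* Return an id for l, reusing the previous id when l repeats the
   location seen on the immediately preceding call. */
static uint32_t intern_loc(struct loc_cache *c, struct loc l)
{
    if (c->valid && c->last.line == l.line && c->last.col == l.col
        && strcmp(c->last.file, l.file) == 0)
        return c->last_id;          /* hit: no new locator allocated */

    c->last = l;                    /* miss: allocate a fresh id */
    c->last_id = c->next_id++;
    c->valid = 1;
    return c->last_id;
}
```

A single entry only helps when identical locations arrive back to back; a location that recurs after an intervening different one still gets a new id, so a real design would likely need a larger table.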
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #134 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-12 20:22:27 UTC ---
I tracked down the LTO/WHOPR code size difference. It is EH handling. The EH frame is empty for the LTO build and quite large for WHOPR. Probably -fno-exceptions is getting lost on the way to ltrans?

With memory stats there don't seem to be major surprises:

tree-phinodes.c:129 (allocate_phi_node)  110246192: 0.8%  0: 0.0%  3405296: 0.1%  409376: 0.0%  372408
gimple.c:600 (gimple_build_nop)  119935632: 0.8%  0: 0.0%  252144: 0.0%  0: 0.0%  2503912
gimplify.c:437 (create_tmp_var_raw)  119589760: 0.8%  0: 0.0%  1119200: 0.0%  0: 0.0%  754431
tree-vrp.c:3993 (build_assert_expr_for)  124663296: 0.9%  0: 0.0%  0: 0.0%  0: 0.0%  1298576
emit-rtl.c:3731 (make_jump_insn_raw)  118395600: 0.8%  0: 0.0%  11138960: 0.3%  0: 0.0%  1619182
tree-streamer-in.c:484 (streamer_alloc_tree)  90340024: 0.6%  0: 0.0%  51300472: 1.5%  4376: 0.0%  1420249
simplify-rtx.c:183 (simplify_gen_binary)  153607224: 1.1%  0: 0.0%  619968: 0.0%  0: 0.0%  6426133
fold-const.c:1870 (fold_convert_loc)  154700600: 1.1%  0: 0.0%  2160: 0.0%  0: 0.0%  3867569
ggc-common.c:253 (ggc_cleared_alloc_ptr_array_tw  80243272: 0.6%  1267966456:15.3%  76357960: 2.2%  11155352: 1.2%  1833025
lto/lto.c:281 (lto_read_in_decl_state)  835696: 0.0%  0: 0.0%  163487336: 4.6%  31116920: 3.4%  4176305
cfg.c:216 (connect_src)  174302184: 1.2%  623048: 0.0%  7861944: 0.2%  133632: 0.0%  4542618
cfg.c:226 (connect_dest)  177198328: 1.2%  5444688: 0.1%  8603432: 0.2%  347648: 0.0%  4628047
tree.c:9115 (make_vector_type)  206615472: 1.4%  0: 0.0%  6720: 0.0%  0: 0.0%  1229894
emit-rtl.c:639 (gen_rtx_MEM)  202133352: 1.4%  0: 0.0%  6629016: 0.2%  0: 0.0%  8698432
dwarf2cfi.c:386 (copy_cfi_row)  212886640: 1.5%  0: 0.0%  0: 0.0%  0: 0.0%  1400570
tree-inline.c:4851 (copy_decl_no_change)  211988960: 1.5%  0: 0.0%  7283480: 0.2%  0: 0.0%  1387268
tree-ssanames.c:78 (init_ssanames)  224107008: 1.6%  252869632: 3.1%  1536: 0.0%  153516032:16.6%  309555
lists.c:144 (alloc_EXPR_LIST)  236354400: 1.7%  0: 0.0%  5798160: 0.2%  0: 0.0%  10089690
gimple.c:2237 (gimple_copy)  268995784: 1.9%  0: 0.0%  4002872: 0.1%  644208: 0.1%  2530798
gimple-streamer-in.c:95 (input_gimple_stmt)  272340080: 1.9%  0: 0.0%  4356168: 0.1%  917040: 0.1%  2550173
tree-inline.c:4331 (copy_tree_r)  286698704: 2.0%  0: 0.0%  2053920: 0.1%  0: 0.0%  5999420
rtl.c:287 (copy_rtx)  291942896: 2.0%  0: 0.0%  318864: 0.0%  0: 0.0%  12315136
emit-rtl.c:393 (gen_raw_REG)  271761568: 1.9%  0: 0.0%  25188032: 0.7%  0: 0.0%  9279675
cselib.c:1896 (cselib_subst_to_values)  299291264: 2.1%  0: 0.0%  0: 0.0%  0: 0.0%  12658684
emit-rtl.c:5427 (init_emit)  354914672: 2.5%  19547728: 0.2%  0: 0.0%  102897600:11.1%  132600
cgraph.c:359 (cgraph_allocate_node)  0: 0.0%  0: 0.0%  401297520:11.4%  0: 0.0%  1286210
emit-rtl.c:3679 (make_insn_raw)  435416472: 3.0%  0: 0.0%  1754496: 0.0%  0: 0.0%  6071819
fold-const.c:7624 (build_fold_addr_expr_with_typ  463283920: 3.2%  0: 0.0%  72880: 0.0%  0: 0.0%  11583920
tree-ssanames.c:141 (make_ssa_name_fn)  459164960: 3.2%  0: 0.0%  5805920: 0.2%  0: 0.0%  5812136
cfg.c:142 (alloc_block)  469702464: 3.3%  0: 0.0%  20328672: 0.6%  0: 0.0%  4375278
toplev.c:964 (realloc_for_line_map)  0: 0.0%  357908640: 4.3%  1073741848:30.4%  184: 0.0%  9
tree.c:1228 (build_int_cst_wide)  1188738504: 8.3%  0: 0.0%  31478720: 0.9%  401175208:43.3%  295230
tree-streamer-in.c:495 (streamer_alloc_tree)  2413661896:16.9%  0: 0.0%  1163973288:32.9%  41183648: 4.4%  28110064
Total  14300758513  8262871404  3534486067  927547008  308001940
source location  Garbage  Freed  Leak  Overhead  Times

Among the explicitly freed GGC memory there are a few interesting cases:

alias.c:2807 (init_alias_analysis)  0: 0.0%
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #135 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-12 21:33:36 UTC ---
... and mem reports on WPA stage:

toplev.c:964 (realloc_for_line_map)  0: 0.0%  89473168: 9.4%  268435472:10.3%  160: 0.0%  8
cgraph.c:359 (cgraph_allocate_node)  0: 0.0%  0: 0.0%  401297520:15.3%  0: 0.0%  1286210
tree.c:1228 (build_int_cst_wide)  1188709752:33.7%  0: 0.0%  22765400: 0.9%  399425424:83.1%  208540
tree-streamer-in.c:495 (streamer_alloc_tree)  1950272016:55.3%  0: 0.0%  1143907104:43.7%  41182080: 8.6%  22462122
Total  3527995024  956449616  2618397893  480920037  47749265
source location  Garbage  Freed  Leak  Overhead  Times

So about 50% trees, 15% cgraph nodes (I do have plans for how to get those smaller), 10% linemaps (I wonder if a simple cache would not save a lot of locators), 5% inline summaries.

I wonder who is producing that 1GB of temporary integer nodes? Someone abusing them for counting too much? It is there before IPA, so it seems to be the streaming or type machinery.
Heap vectors: source locationLeak Peak Times --- ipa-reference.c:186 (set_reference_optimization_ 10289688:10.5% 11240664 13: 0.0% lto-cgraph.c:118 (lto_cgraph_encoder_encode) 12756976:13.0% 23348152 26300: 0.2% ipa-ref.c:55 (ipa_record_reference)13593072:13.8% 41932432 1000565: 6.0% passes.c:2214 (execute_one_pass) 21214520:21.5% 41942992 557113: 3.3% ipa-inline-analysis.c:804 (inline_summary_alloc) 30037064:30.5% 30037064 1: 0.0% Total 98450004 16768143 Bitmap Overall Allocated PeakLeak searched search itr - ipa-reference.c:911 (propagate) 37274131244280 3122372031223720 0 0 ipa-reference.c:739 (propagate) 32925813341680 3058960 3058960 0 0 ipa-reference.c:923 (propagate) 37218625153920 2513852025138520 0 0 ipa-reference.c:417 (init_function_info)48726319809560 1980956019809560551335 ipa-reference.c:418 (init_function_info)48726319584680 1958468019584680 79 45 ipa-reference.c:747 (propagate) 32935113229360 3053920 3053920 0 0 Kind Nodes Bytes --- decls11059354 1770384416 types6163492 1035466656 blocks 1 80 stmts 0 0 refs5243 267944 exprs1826905 7444 constants2198755 72290570 identifiers 538891 21555640 vecs 208540 412624304 binfos 1420249 141631744 ssa names111 8880 constructors 1591693820056 random kinds 3270917 130837088 Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #124 from Jan Hubicka hubicka at ucw dot cz 2012-05-11 08:34:17 UTC ---
> Just for comparison, clang with -O4 runs only single threaded and does everything in memory (no streaming out). It uses 3.5GB of memory (peak) and takes 19 minutes to finish...

Interesting. Microsoft's compiler also barely fits in the 4GB address space, right?
Is it with debug info? I will try a non-WHOPR build to see how bad we are.

The actual IL is about 1.5GB of the footprint (measuring GGC memory). I think a good part of the rest goes to mmap address space (the object files are rather large).

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #125 from Richard Guenther rguenth at gcc dot gnu.org 2012-05-11 08:44:51 UTC ---
(In reply to comment #122)
> oprofile shows:
>
> 139188  15.6963  lto1            lto1            uniquify_nodes
>  66390   7.4868  lto1            lto1            estimate_edge_growth
>  52815   5.9560  lto1            lto1            VEC_edge_growth_cache_entry_base_length
>  47137   5.3157  lto1            lto1            iterative_hash_hashval_t
>  34037   3.8384  lto1            lto1            htab_find_slot_with_hash
>  33604   3.7895  lto1            lto1            bp_unpack_value
>  26584   2.9979  lto1            lto1            do_estimate_growth_1
>  21410   2.4144  lto1            lto1            ggc_set_mark
>  17124   1.9311  lto1            lto1            inflate_fast
>  14464   1.6311  lto1            lto1            streamer_read_uhwi
>  14204   1.6018  lto1            lto1            lookup_page_table_entry
>  11430   1.2890  libc-2.11.1.so  libc-2.11.1.so  memset
>  11405   1.2861  lto1            lto1            streamer_read_hwi_in_range
>  11286   1.2727  lto1            lto1            gt_ggc_mx_lang_tree_node
>  11017   1.2424  lto1            lto1            iterative_hash_gimple_type
>  10851   1.2237  lto1            lto1            pointer_map_insert
>  10674   1.2037  lto1            lto1            lto_input_tree
>  10536   1.1881  lto1            lto1            ht_lookup_with_hash
>  10269   1.1580  lto1            lto1            streamer_read_uchar
>   9972   1.1245  lto1            lto1            streamer_read_uchar
>   9089   1.0250  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
>   9086   1.0246  lto1            lto1            alloc_page
>   6603   0.7446  lto1            lto1            VEC_edge_growth_cache_entry_base_index
>
> looks like uniquify_nodes got out of control?

Well - the obvious possibly slow part of uniquify nodes is that it walks all fields of record/union types.

So - do you have a more detailed profile of uniquify_nodes?
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #126 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-05-11 08:46:39 UTC ---
(In reply to comment #124)
> > Just for comparison, clang with -O4 runs only single threaded and does everything in memory (no streaming out). It uses 3.5GB of memory (peak) and takes 19 minutes to finish...
>
> Interesting. Microsoft's compiler is also barely in 4GB space, right?

IIRC Mozilla recently switched to a 64-bit toolchain on Windows, because the 32-bit linker ran out of memory. So they are above 4GB already.

> Is it with debug info?

No.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #127 from Mike Hommey mh+gcc at glandium dot org 2012-05-11 08:52:24 UTC ---
(In reply to comment #126)
> (In reply to comment #124)
> > > Just for comparison, clang with -O4 runs only single threaded and does everything in memory (no streaming out). It uses 3.5GB of memory (peak) and takes 19 minutes to finish...
> >
> > Interesting. Microsoft's compiler is also barely in 4GB space, right?
>
> IIRC Mozilla recently switched to a 64-bit toolchain on windows, because the 32-bit linker ran out of memory. So they are above 4GB already.

There is unfortunately no cross-linker in MSVC, so you can't link 32-bit binaries with a 64-bit toolchain. We're in the process of switching to a 64-bit OS with a 32-bit toolchain, which will allow an extra gigabyte of address space. We've gone past the current 3GB limit a couple of times now, at which point we moved some stuff out of libxul. Before that, we hit the 2GB limit, at which point we used the /3GB option that allows for the extra GB.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #128 from Jan Hubicka hubicka at ucw dot cz 2012-05-11 08:52:50 UTC ---
> Well - the obvious possibly slow part of uniquify nodes is that it walks all fields of record/union types.
>
> So - do you have a more detailed profile of uniquify_nodes?

No, I will try to generate annotated sources then. I am a bit puzzled by this - looking at the code, there seems to be nothing inherently expensive in it.

Honza
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #129 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-11 19:05:19 UTC ---
OK, the slow part of uniquify_nodes is:

    /* Remove us from our main variant list if we are not the
       variant leader.  */
    if (TYPE_MAIN_VARIANT (t) != t)
      {
        tem = TYPE_MAIN_VARIANT (t);
        while (tem && TYPE_NEXT_VARIANT (tem) != t)
          tem = TYPE_NEXT_VARIANT (tem);
        if (tem)
          TYPE_NEXT_VARIANT (tem) = TYPE_NEXT_VARIANT (t);
        TYPE_NEXT_VARIANT (t) = NULL_TREE;
      }
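
The loop quoted above removes a node from a singly linked list by walking from the list head, so each removal costs time proportional to the node's position. The following self-contained sketch models that with a made-up struct type_node (not GCC's tree type) and counts the links walked; the helper names unlink_variant and total_steps_worst_case are hypothetical. It is meant only to show why many variants of one main variant can make the total cost quadratic, not to reproduce GCC's code or the eventual fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type node; only the two links matter. */
struct type_node {
    struct type_node *main_variant;  /* leader of the variant chain */
    struct type_node *next_variant;  /* next type in the chain */
};

/* Unlink t from its main variant's chain, mirroring the quoted loop.
   Returns the number of links walked to find t's predecessor. */
static size_t unlink_variant(struct type_node *t)
{
    size_t steps = 0;
    if (t->main_variant != t) {
        struct type_node *tem = t->main_variant;
        while (tem && tem->next_variant != t) {
            tem = tem->next_variant;
            steps++;
        }
        if (tem)
            tem->next_variant = t->next_variant;
        t->next_variant = NULL;
    }
    return steps;
}

/* Build the chain nodes[0] -> nodes[1] -> ... -> nodes[n-1] with
   nodes[0] as leader, then unlink every non-leader starting from the
   tail, which is the worst case for the walk above. */
static size_t total_steps_worst_case(struct type_node *nodes, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        nodes[i].main_variant = &nodes[0];
        nodes[i].next_variant = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    for (size_t i = n; i-- > 1; )
        total += unlink_variant(&nodes[i]);
    return total;
}
```

For a chain of n nodes the tail-first order walks (n-1)(n-2)/2 links in total, so a main variant with thousands of variants makes this step expensive even though each individual removal looks cheap.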
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #121 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-10 21:45:10 UTC ---
With the inliner performance fix I am going to push out today, the situation looks as follows:

Execution times (seconds)
phase parsing          : 606.20 (98%) usr  21.98 (99%) sys 641.28 (98%) wall 2164274 kB (100%) ggc
phase cgraph           : 337.00 (55%) usr  18.52 (83%) sys 367.32 (56%) wall   88841 kB ( 4%) ggc
phase finalize         :  10.21 ( 2%) usr   0.28 ( 1%) sys  10.50 ( 2%) wall       0 kB ( 0%) ggc
garbage collection     :  33.12 ( 5%) usr   0.04 ( 0%) sys  33.21 ( 5%) wall       0 kB ( 0%) ggc
ipa cp                 :   3.52 ( 1%) usr   0.15 ( 1%) sys   3.67 ( 1%) wall   93737 kB ( 4%) ggc
ipa lto gimple out     :  14.43 ( 2%) usr   1.38 ( 6%) sys  15.89 ( 2%) wall       0 kB ( 0%) ggc
ipa lto decl in        : 221.85 (36%) usr   2.52 (11%) sys 225.61 (35%) wall 1153296 kB (53%) ggc
ipa lto decl out       : 179.65 (29%) usr   8.60 (39%) sys 198.90 (31%) wall       0 kB ( 0%) ggc
ipa lto cgraph I/O     :   4.59 ( 1%) usr   0.50 ( 2%) sys   5.09 ( 1%) wall  550051 kB (25%) ggc
ipa lto decl merge     :   9.57 ( 2%) usr   0.00 ( 0%) sys   9.58 ( 1%) wall     291 kB ( 0%) ggc
ipa lto cgraph merge   :   6.06 ( 1%) usr   0.00 ( 0%) sys   6.08 ( 1%) wall   14158 kB ( 1%) ggc
whopr wpa              :   6.44 ( 1%) usr   0.06 ( 0%) sys   6.54 ( 1%) wall       2 kB ( 0%) ggc
whopr wpa I/O          :   2.77 ( 0%) usr   8.03 (36%) sys  11.56 ( 2%) wall       0 kB ( 0%) ggc
ipa reference          :   5.16 ( 1%) usr   0.08 ( 0%) sys   5.25 ( 1%) wall       0 kB ( 0%) ggc
ipa profile            :   0.55 ( 0%) usr   0.00 ( 0%) sys   0.55 ( 0%) wall       0 kB ( 0%) ggc
ipa pure const         :   5.59 ( 1%) usr   0.02 ( 0%) sys   5.61 ( 1%) wall       0 kB ( 0%) ggc
parser (global)        :   3.98 ( 1%) usr   0.04 ( 0%) sys   4.04 ( 1%) wall       0 kB ( 0%) ggc
inline heuristics      :  94.38 (15%) usr   0.31 ( 1%) sys  94.90 (15%) wall  342900 kB (16%) ggc
tree CFG cleanup       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
callgraph verifier     :  18.53 ( 3%) usr   0.08 ( 0%) sys  18.61 ( 3%) wall       0 kB ( 0%) ggc
varconst               :   0.04 ( 0%) usr   0.03 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
unaccounted todo       :   4.70 ( 1%) usr   0.10 ( 0%) sys   4.81 ( 1%) wall       0 kB ( 0%) ggc
TOTAL                  : 616.43            22.26           651.79            2165706 kB

So memory use is somewhat up (4GB compared to 3.2GB), but Mozilla grew a bit too, so I think there are no important changes since my last report. Performance-wise we are in better shape than the 4.7 release (I will backport the fix; 4.7 needs over 10 minutes in the inliner), but we are still way too slow, with over 3 minutes needed for streaming in.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375 --- Comment #122 from Jan Hubicka hubicka at gcc dot gnu.org 2012-05-10 21:53:54 UTC ---
oprofile shows:

139188  15.6963  lto1            lto1            uniquify_nodes
 66390   7.4868  lto1            lto1            estimate_edge_growth
 52815   5.9560  lto1            lto1            VEC_edge_growth_cache_entry_base_length
 47137   5.3157  lto1            lto1            iterative_hash_hashval_t
 34037   3.8384  lto1            lto1            htab_find_slot_with_hash
 33604   3.7895  lto1            lto1            bp_unpack_value
 26584   2.9979  lto1            lto1            do_estimate_growth_1
 21410   2.4144  lto1            lto1            ggc_set_mark
 17124   1.9311  lto1            lto1            inflate_fast
 14464   1.6311  lto1            lto1            streamer_read_uhwi
 14204   1.6018  lto1            lto1            lookup_page_table_entry
 11430   1.2890  libc-2.11.1.so  libc-2.11.1.so  memset
 11405   1.2861  lto1            lto1            streamer_read_hwi_in_range
 11286   1.2727  lto1            lto1            gt_ggc_mx_lang_tree_node
 11017   1.2424  lto1            lto1            iterative_hash_gimple_type
 10851   1.2237  lto1            lto1            pointer_map_insert
 10674   1.2037  lto1            lto1            lto_input_tree
 10536   1.1881  lto1            lto1            ht_lookup_with_hash
 10269   1.1580  lto1            lto1            streamer_read_uchar
  9972   1.1245  lto1            lto1            streamer_read_uchar
  9089   1.0250  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
  9086   1.0246  lto1            lto1            alloc_page
  6603   0.7446  lto1            lto1            VEC_edge_growth_cache_entry_base_index

Looks like uniquify_nodes got out of control?