[Bug rtl-optimization/96388] scheduling takes forever with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Andrew Pinski changed:

           What    |Removed |Added
   Target Milestone|---     |14.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Maxim Kuvyrkov changed:

           What    |Removed  |Added
             Status|ASSIGNED |RESOLVED
         Resolution|---      |FIXED

--- Comment #18 from Maxim Kuvyrkov ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #17 from GCC Commits ---
The master branch has been updated by Maxim Kuvyrkov:

https://gcc.gnu.org/g:0c42d1782e48d8ad578ace2065cce9b3615f97c0

commit r14-8174-g0c42d1782e48d8ad578ace2065cce9b3615f97c0
Author: Maxim Kuvyrkov
Date: Sun Nov 19 08:43:05 2023 +

sched-deps.cc (find_modifiable_mems): Avoid exponential behavior [PR96388]

This patch avoids sched-deps.cc:find_inc() creating exponential number of
dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
[For simplicity, let's assume find_inc(backwards==true)].
In this example find_modifiable_mems() will arrange for mem_insnA* to be
able to pass sp_insnA, and, while doing this, will create dependencies
between all mem_insnA*s and sp_insnB -- because sp_insnB is a consumer of
sp_insnA. After this sp_insnB will have N new backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*. This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of 30GB
and compilation time of 30 minutes, with sched2 accounting for 95% of both
metrics. After this patch the RAM usage is down to 1GB and compilation
time is down to 3-4 minutes, with sched2 no longer standing out on
-ftime-report or memory usage.

gcc/ChangeLog:

	PR rtl-optimization/96388
	PR rtl-optimization/111554
	* sched-deps.cc (find_inc): Avoid exponential behavior.
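To make the counting argument concrete, here is a toy C model of it. It is a hypothetical illustration, not GCC code. The names `model_find_inc_growth` and `dep_counts` are invented, and the model only counts new dependency edges under the commit's simplifying assumptions: backwards == true and a single pair of stack adjustments. Strictly speaking the total grows as N*M, which is quadratic.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the counting argument in the commit message; NOT GCC code.
   n = number of mem_insnA* (addressed off sp before sp_insnB),
   m = number of mem_insnB* (addressed off sp after sp_insnB).  */
typedef struct
{
  uint64_t sp_insn_b_deps;   /* new backward deps attached to sp_insnB */
  uint64_t mem_insn_b_deps;  /* new deps created across all mem_insnB* */
} dep_counts;

static dep_counts
model_find_inc_growth (uint64_t n, uint64_t m)
{
  /* Keep n * m within uint64_t.  */
  assert (n <= UINT32_MAX && m <= UINT32_MAX);
  dep_counts c = { 0, 0 };

  /* Letting each mem_insnA* pass sp_insnA also links it to sp_insnB
     (a consumer of sp_insnA): one new backward dep per mem_insnA*.  */
  for (uint64_t i = 0; i < n; i++)
    c.sp_insn_b_deps++;

  /* Each mem_insnB* then gets N new dependencies of its own, mirroring
     the N backward deps now sitting on sp_insnB, so the work repeats
     m times.  */
  for (uint64_t j = 0; j < m; j++)
    c.mem_insn_b_deps += c.sp_insn_b_deps;

  return c;
}
```

Take N = M = 15000, the upper end of the range quoted in the commit message. The model then yields 225,000,000 new dependencies on the mem_insnB* side, against only 15,000 on sp_insnB.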
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #16 from Maxim Kuvyrkov ---
Posted patch in https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637419.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Maxim Kuvyrkov changed:

           What    |Removed |Added
             Status|NEW     |ASSIGNED

--- Comment #15 from Maxim Kuvyrkov ---
Finished analysis. Will post a patch next week.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Maxim Kuvyrkov changed:

           What    |Removed                       |Added
           Assignee|unassigned at gcc dot gnu.org |mkuvyrkov at gcc dot gnu.org

--- Comment #14 from Maxim Kuvyrkov ---
Taking.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Martin Liška changed:

           What    |Removed |Added
                 CC|        |amker at gcc dot gnu.org

--- Comment #13 from Martin Liška ---
(In reply to Richard Biener from comment #10)
> The partially reduced (In reply to Martin Liška from comment #9)
> > Created attachment 48962 [details]
> > Partially reduced test-case
> >
> > The reduction is quite stuck at this point.
>
> No longer keys on -fPIC though, so the bisection for this is likely wrong.

You are right. Without the -fPIC argument, and with a 1GB memory limit, the first bad revision is:
r5-4790-g43722f9fa69d4cc9
while the previous revision only needs ~190MB.

> -fno-schedule-insns2 improves it from 18s to 5s compile time and from
> 1.1GB of peak RSS to 320MB.
>
> scheduling 2 : 12.69 ( 71%) 0.10 ( 67%) 12.79 ( 70%) 11128 kB ( 16%)
>
> -fmem-report doesn't show anything interesting, looking for heap allocations
> now to find the offender.
>
> Can you bisect your reduced testcase again? GCC 8.4 behaves the same for it
> rather than being good but GCC 4.8.5 is fine.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener changed:

           What    |Removed |Added
                 CC|        |amonakov at gcc dot gnu.org

--- Comment #12 from Richard Biener ---
None of the various scheduler --params has any effect. Selective scheduling is also affected.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener changed:

           What    |Removed |Added
                 CC|        |law at gcc dot gnu.org

--- Comment #11 from Richard Biener ---
(In reply to Richard Biener from comment #10)
> The partially reduced (In reply to Martin Liška from comment #9)
> > Created attachment 48962 [details]
> > Partially reduced test-case
> >
> > The reduction is quite stuck at this point.
>
> No longer keys on -fPIC though, so the bisection for this is likely wrong.
> -fno-schedule-insns2 improves it from 18s to 5s compile time and from
> 1.1GB of peak RSS to 320MB.
>
> scheduling 2 : 12.69 ( 71%) 0.10 ( 67%) 12.79 ( 70%) 11128 kB ( 16%)
>
> -fmem-report doesn't show anything interesting, looking for heap allocations
> now to find the offender.
>
> Can you bisect your reduced testcase again? GCC 8.4 behaves the same for it
> rather than being good but GCC 4.8.5 is fine.

For the testcase, most time is spent in constrain_operands and
update_conflict_hard_regno_costs.

The main issue looks like a very large chain of dependences. The compile goes
from 27000 schedule_insn calls to 10 000 000 calls to try_ready, so the
sd_iterator iterates over many dependent instructions instead of stopping at
"common dependences".

That is likely also the source of the memory use (the dn_pool). However,
memory reporting with --enable-gather-detailed-mem-stats doesn't seem to work
for this pool:

dep_node   sched-deps.c:4107 (sched_deps_init)  1  0 : 0.0%  0      0 : 0.0%  80
deps_list  sched-deps.c:4105 (sched_deps_init)  1  0 : 0.0%  2179k  136k: 0.9%  16

There are also 10 million dep_replacement nodes, all allocated via XCNEW.
Another object_allocator would probably be more efficient here.

Could it be that sched-deps makes a tree out of a dependence graph?

CCing the only active haifa scheduler maintainer...
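The object_allocator suggestion replaces one XCNEW call per dep_replacement node with allocation from a pool. GCC's actual mechanism is the C++ `object_allocator` template in alloc-pool.h. The sketch below is only a minimal C analogue of the same idea, and every name in it (`pool`, `pool_alloc`, `pool_free`, `pool_release`) is invented for illustration. Objects are carved out of large blocks, so there is one malloc per block rather than one per node, and freed objects are recycled through a free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of a pool allocator such as GCC's
   object_allocator; names and layout are invented for illustration.  */
#define POOL_ALIGN _Alignof (max_align_t)
#define POOL_ROUND_UP(x) (((x) + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN)

typedef struct pool_block { struct pool_block *next; } pool_block;
typedef struct free_elt { struct free_elt *next; } free_elt;

typedef struct
{
  size_t elt_size;        /* bytes per object, rounded up for alignment */
  size_t elts_per_block;  /* objects carved from each malloc'd block */
  pool_block *blocks;     /* every block, so they can be released at once */
  free_elt *free_list;    /* objects returned by pool_free, reused first */
  char *cursor;           /* next unused object in the newest block */
  size_t remaining;       /* unused objects left in the newest block */
  size_t block_mallocs;   /* number of malloc calls made so far */
} pool;

static void
pool_init (pool *p, size_t elt_size, size_t elts_per_block)
{
  assert (elt_size > 0 && elts_per_block > 0);
  memset (p, 0, sizeof *p);
  if (elt_size < sizeof (free_elt))
    elt_size = sizeof (free_elt);  /* room for the free-list link */
  p->elt_size = POOL_ROUND_UP (elt_size);
  p->elts_per_block = elts_per_block;
}

/* Return a zero-filled object, as XCNEW would, or NULL if out of memory.  */
static void *
pool_alloc (pool *p)
{
  void *obj;
  if (p->free_list != NULL)
    {
      obj = p->free_list;
      p->free_list = p->free_list->next;
    }
  else
    {
      if (p->remaining == 0)
        {
          size_t header = POOL_ROUND_UP (sizeof (pool_block));
          pool_block *b = malloc (header + p->elt_size * p->elts_per_block);
          if (b == NULL)
            return NULL;
          b->next = p->blocks;
          p->blocks = b;
          p->cursor = (char *) b + header;
          p->remaining = p->elts_per_block;
          p->block_mallocs++;
        }
      obj = p->cursor;
      p->cursor += p->elt_size;
      p->remaining--;
    }
  return memset (obj, 0, p->elt_size);
}

/* Put obj back on the free list; its memory stays inside its block.  */
static void
pool_free (pool *p, void *obj)
{
  free_elt *e = obj;
  e->next = p->free_list;
  p->free_list = e;
}

/* Release every block in one sweep; all objects become invalid.  */
static void
pool_release (pool *p)
{
  while (p->blocks != NULL)
    {
      pool_block *next = p->blocks->next;
      free (p->blocks);
      p->blocks = next;
    }
  p->free_list = NULL;
  p->cursor = NULL;
  p->remaining = 0;
}
```

This design trades per-object bookkeeping for bulk release. Freeing individual nodes only recycles them inside the pool, and pool_release returns all memory at once, similar to how the scheduler discards its dependency data after a pass.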
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #10 from Richard Biener ---
The partially reduced (In reply to Martin Liška from comment #9)
> Created attachment 48962 [details]
> Partially reduced test-case
>
> The reduction is quite stuck at this point.

No longer keys on -fPIC though, so the bisection for this is likely wrong.
-fno-schedule-insns2 improves it from 18s to 5s compile time and from
1.1GB of peak RSS to 320MB.

scheduling 2 : 12.69 ( 71%) 0.10 ( 67%) 12.79 ( 70%) 11128 kB ( 16%)

-fmem-report doesn't show anything interesting, looking for heap allocations
now to find the offender.

Can you bisect your reduced testcase again? GCC 8.4 behaves the same for it
rather than being good but GCC 4.8.5 is fine.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #9 from Martin Liška ---
Created attachment 48962
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48962&action=edit
Partially reduced test-case

The reduction is quite stuck at this point.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #8 from Richard Biener ---
> /usr/bin/time ./cc1 -quiet t.c 2>&1 | head -1 | awk '{ print $6 * 1 }'
35975

is the max RSS in kB. Guess subtracting the value for an empty compile makes
sense as well.
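The same peak-RSS figure can also be collected from inside a driver program, for example a bisection or reduction harness. The following is a minimal POSIX sketch that assumes Linux, where `ru_maxrss` is reported in kilobytes. `run_and_get_max_rss_kb` is a hypothetical helper, not a GCC or reducer API. It reaps the child with `wait4`, which fills in that child's own resource usage, roughly the value GNU time prints as "maxresident".

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv (NULL-terminated, argv[0] looked up in PATH)
   and return the child's peak resident set size in kB, or -1 if the command
   could not be run or did not exit with status 0.  Assumes Linux, where
   ru_maxrss is reported in kilobytes.  */
static long
run_and_get_max_rss_kb (char *const argv[])
{
  assert (argv != NULL && argv[0] != NULL);
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      execvp (argv[0], argv);
      _exit (127);  /* exec failed */
    }
  int status;
  struct rusage ru;
  /* wait4 reaps this particular child and reports its own rusage, so the
     figure is not mixed up with other children.  */
  if (wait4 (pid, &status, 0, &ru) != pid)
    return -1;
  if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
    return -1;
  return ru.ru_maxrss;
}
```

Following the suggestion above, you could subtract the figure for an empty compile from the figure for t.c. That leaves a rough estimate of the memory attributable to the test case itself.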
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #7 from Martin Liška ---
(In reply to Richard Biener from comment #6)
> (In reply to Martin Liška from comment #5)
> > Started with r9-2469-gc6067437d314f803.
>
> Hmm, it probably makes a latent scheduler issue appear. Guess for better
> analysis we have to trim down the source. Not sure how - maybe
> automatically with one good and one bad rev. looking for a hundred-fold
> increase in memory use? Look at the good compile with /usr/bin/time
> and using the RSS to compute a ulimit -v limit or so.

I'm reducing it within the ulimit fence ;) Let's see where it leads.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #6 from Richard Biener ---
(In reply to Martin Liška from comment #5)
> Started with r9-2469-gc6067437d314f803.

Hmm, it probably makes a latent scheduler issue appear. Guess for better
analysis we have to trim down the source. Not sure how - maybe
automatically with one good and one bad rev. looking for a hundred-fold
increase in memory use? Look at the good compile with /usr/bin/time
and using the RSS to compute a ulimit -v limit or so.
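The "ulimit fence" idea can likewise be sketched in C for use in a reducer's interestingness check. This is a hypothetical helper, not part of GCC or any reducer, and it assumes a POSIX system. It caps the child's address space with `setrlimit(RLIMIT_AS)`, which is the resource bash's `ulimit -v` controls on Linux, in KiB. It then reports whether the command still exits successfully under that cap.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv (NULL-terminated, argv[0] looked up in PATH)
   with its address space capped at limit_kb KiB, like `ulimit -v limit_kb`
   in bash.  Returns true only if the command exits with status 0.  */
static bool
runs_within_vm_limit (char *const argv[], rlim_t limit_kb)
{
  assert (argv != NULL && argv[0] != NULL);
  pid_t pid = fork ();
  if (pid < 0)
    return false;
  if (pid == 0)
    {
      /* Child: lower only the soft limit, never above the hard limit.  */
      struct rlimit rl;
      if (getrlimit (RLIMIT_AS, &rl) != 0)
        _exit (126);
      rlim_t want = limit_kb * 1024;
      if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        want = rl.rlim_max;
      rl.rlim_cur = want;
      if (setrlimit (RLIMIT_AS, &rl) != 0)
        _exit (126);
      execvp (argv[0], argv);
      _exit (127);  /* exec failed, e.g. because the cap is too small */
    }
  int status;
  if (waitpid (pid, &status, 0) != pid)
    return false;
  /* Exceeding the cap usually shows up as a failed allocation (non-zero
     exit) or a fatal signal; both count as "not within the limit".  */
  return WIFEXITED (status) && WEXITSTATUS (status) == 0;
}
```

A reduction step could keep a candidate only when the good compiler passes and the bad one fails under the same cap. Take the cap from the good compile's peak RSS, with some headroom, because address space is always larger than RSS.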
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Martin Liška changed:

           What    |Removed |Added
                 CC|        |uros at gcc dot gnu.org

--- Comment #5 from Martin Liška ---
Started with r9-2469-gc6067437d314f803.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Martin Liška changed:

           What    |Removed     |Added
     Ever confirmed|0           |1
   Last reconfirmed|            |2020-07-30
                 CC|            |marxin at gcc dot gnu.org
             Status|UNCONFIRMED |NEW

--- Comment #4 from Martin Liška ---
Bisecting that...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener changed:

           What    |Removed |Added
      Known to fail|        |9.1.0

--- Comment #3 from Richard Biener ---
Bisection with ulimit -v 100 (1GB) should work I guess.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #2 from Richard Biener ---
Created attachment 48958
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48958&action=edit
smaller testcase

Removed the smaller functions.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener changed:

           What    |Removed |Added
           Keywords|        |compile-time-hog,
                   |        |memory-hog, needs-bisection
             Target|        |x86_64-*-*
      Known to work|        |8.4.0
      Known to fail|        |11.0, 9.3.0

--- Comment #1 from Richard Biener ---
GCC 8.4 is fine, though, taking 600MB and 30 seconds. -O1 -fPIC is fine everywhere, as is -O2 -fno-PIC.