[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2024-01-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2024-01-16 Thread mkuvyrkov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Maxim Kuvyrkov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Maxim Kuvyrkov  ---
Fixed.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2024-01-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #17 from GCC Commits  ---
The master branch has been updated by Maxim Kuvyrkov :

https://gcc.gnu.org/g:0c42d1782e48d8ad578ace2065cce9b3615f97c0

commit r14-8174-g0c42d1782e48d8ad578ace2065cce9b3615f97c0
Author: Maxim Kuvyrkov 
Date:   Sun Nov 19 08:43:05 2023 +

sched-deps.cc (find_modifiable_mems): Avoid exponential behavior [PR96388]

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===

[For simplicity, let's assume find_inc(backwards==true)].
In this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.
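
The counting argument above can be checked with a toy model. This is only a
sketch in Python, not GCC code: the function name model_new_deps and the
string labels are made up for illustration, and dependencies are modelled as
plain (producer, consumer) pairs.

```python
def model_new_deps(n, m):
    """Toy model of the dependency growth described above; not GCC code.

    Returns (deps added to sp_insnB, deps added to the mem_insnB* group).
    """
    mem_a = [f"mem_insnA{i}" for i in range(1, n + 1)]
    mem_b = [f"mem_insnB{j}" for j in range(1, m + 1)]

    # Step 1: letting every mem_insnA* pass sp_insnA adds a dependency
    # between that mem_insnA* and sp_insnB (the consumer of sp_insnA).
    sp_b_back_deps = {(a, "sp_insnB") for a in mem_a}

    # Step 2: every mem_insnB* that is allowed to pass sp_insnB picks up
    # one new dependency per backward dependency of sp_insnB.
    mem_b_deps = {(a, b) for b in mem_b for (a, _) in sp_b_back_deps}

    return len(sp_b_back_deps), len(mem_b_deps)


if __name__ == "__main__":
    for n, m in [(3, 4), (100, 100), (300, 200)]:
        on_sp_b, on_mem_b = model_new_deps(n, m)
        print(f"N={n} M={m}: {on_sp_b} deps on sp_insnB, "
              f"{on_mem_b} deps on mem_insnB*")
```

The second number is always N*M.  For N = M = 10,000 that product is
100 million dependencies and for N = M = 15,000 it is 225 million, so the
model itself should only be run with small sizes.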

In PR96388's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

PR rtl-optimization/96388
PR rtl-optimization/111554
* sched-deps.cc (find_inc): Avoid exponential behavior.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2023-11-20 Thread mkuvyrkov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #16 from Maxim Kuvyrkov  ---
Posted patch in
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637419.html

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2023-11-18 Thread mkuvyrkov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Maxim Kuvyrkov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #15 from Maxim Kuvyrkov  ---
Finished analysis.  Will post a patch next week.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2023-10-30 Thread mkuvyrkov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Maxim Kuvyrkov  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |mkuvyrkov at gcc dot gnu.org

--- Comment #14 from Maxim Kuvyrkov  ---
Taking.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-31 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org

--- Comment #13 from Martin Liška  ---
(In reply to Richard Biener from comment #10)
> (In reply to Martin Liška from comment #9)
> > Created attachment 48962 [details]
> > Partially reduced test-case
> > 
> > The reduction is quite stuck at this point.
> 
> No longer keys on -fPIC though, so the bisection for this is likely wrong.

You are right.  Without the -fPIC argument and with a 1GB memory limit, the
first bad revision is:
r5-4790-g43722f9fa69d4cc9

where the previous revision only needs ~190MB.


> -fno-schedule-insns2 improves it from 18s to 5s compile time and from
> 1.1GB of peak RSS to 320MB.
> 
>  scheduling 2   :  12.69 ( 71%)   0.10 ( 67%)  12.79 ( 70%)   11128 kB ( 16%)
> 
> -fmem-report doesn't show anything interesting, looking for heap allocations
> now to find the offender.
> 
> Can you bisect your reduced testcase again?  GCC 8.4 behaves the same for it
> rather than being good but GCC 4.8.5 is fine.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #12 from Richard Biener  ---
None of the various scheduler --params has any effect.  Selective scheduling is
also affected.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #11 from Richard Biener  ---
(In reply to Richard Biener from comment #10)
> (In reply to Martin Liška from comment #9)
> > Created attachment 48962 [details]
> > Partially reduced test-case
> > 
> > The reduction is quite stuck at this point.
> 
> No longer keys on -fPIC though, so the bisection for this is likely wrong.
> -fno-schedule-insns2 improves it from 18s to 5s compile time and from
> 1.1GB of peak RSS to 320MB.
> 
>  scheduling 2   :  12.69 ( 71%)   0.10 ( 67%)  12.79 ( 70%)   11128 kB ( 16%)
> 
> -fmem-report doesn't show anything interesting, looking for heap allocations
> now to find the offender.
> 
> Can you bisect your reduced testcase again?  GCC 8.4 behaves the same for it
> rather than being good but GCC 4.8.5 is fine.

For the testcase most time is spent in constrain_operands and
update_conflict_hard_regno_costs.  It looks like the main issue
is a very large chain of dependences and thus going from
27000 schedule_insn calls to 10 000 000 calls to try_ready
which means the sd_iterator iterates over many dependent instructions,
not stopping at "common dependences".  That's likely also the source
of the memory use (the dn_pool), though memory reporting with
--enable-gather-detailed-mem-stats doesn't seem to work for this pool?

dep_node     sched-deps.c:4107 (sched_deps_init)      1      0 :  0.0%      0      0 :  0.0%      80
deps_list    sched-deps.c:4105 (sched_deps_init)      1      0 :  0.0%  2179k   136k :  0.9%      16

There's also 10 million dep_replacement nodes which are all allocated
via XCNEW ... another object_allocator would be more efficient here
I guess.  Could it be that sched-deps makes a tree out of a dependence
graph?

CCing the only active haifa scheduler maintainer...

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #10 from Richard Biener  ---
(In reply to Martin Liška from comment #9)
> Created attachment 48962 [details]
> Partially reduced test-case
> 
> The reduction is quite stuck at this point.

No longer keys on -fPIC though, so the bisection for this is likely wrong.
-fno-schedule-insns2 improves it from 18s to 5s compile time and from
1.1GB of peak RSS to 320MB.

 scheduling 2   :  12.69 ( 71%)   0.10 ( 67%)  12.79 ( 70%)   11128 kB ( 16%)

-fmem-report doesn't show anything interesting, looking for heap allocations
now to find the offender.

Can you bisect your reduced testcase again?  GCC 8.4 behaves the same for it
rather than being good but GCC 4.8.5 is fine.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #9 from Martin Liška  ---
Created attachment 48962
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48962&action=edit
Partially reduced test-case

The reduction is quite stuck at this point.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #8 from Richard Biener  ---
> /usr/bin/time ./cc1 -quiet t.c 2>&1 | head -1 | awk '{ print $6 * 1 }'
35975

is the max RSS in kB.  Guess subtracting the value for an empty compile
makes sense as well.
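
The same measurement, including the suggested subtraction of an empty
compile, could be scripted roughly as follows.  This is a sketch in Python
that assumes Linux (where ru_maxrss is reported in kB; macOS reports bytes)
and Python 3.9 or later; the helper names peak_rss and rss_above_baseline are
made up here and are not part of any GCC tooling.

```python
import os
import subprocess
import sys


def peak_rss(cmd):
    """Run cmd to completion and return (exit_code, peak RSS of that child).

    The RSS unit is whatever the OS reports in ru_maxrss (kB on Linux).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    # wait4() reaps exactly this child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code so the Popen object knows the child is gone.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return proc.returncode, usage.ru_maxrss


def rss_above_baseline(cmd, baseline_cmd):
    """Peak RSS of cmd minus the peak RSS of a baseline command."""
    _, rss = peak_rss(cmd)
    _, base = peak_rss(baseline_cmd)
    return rss - base


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: script.py command [args...]
    code, rss = peak_rss(sys.argv[1:])
    print(f"exit code {code}, peak RSS {rss}")
```

For the case discussed here, cmd would be the cc1 invocation on the test
case and baseline_cmd the same invocation on an empty source file.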

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #7 from Martin Liška  ---
(In reply to Richard Biener from comment #6)
> (In reply to Martin Liška from comment #5)
> > Started with r9-2469-gc6067437d314f803.
> 
> Hmm, it probably makes a latent scheduler issue appear.  Guess for better
> analysis we have to trim down the source.  Not sure how - maybe
> automatically with one good and one bad rev. looking for a hundred-fold
> increase in memory use?  Look at the good compile with /usr/bin/time
> and using the RSS to compute a ulimit -v limit or so.

I'm reducing it within the ulimit fence ;) Let's see where it leads.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #6 from Richard Biener  ---
(In reply to Martin Liška from comment #5)
> Started with r9-2469-gc6067437d314f803.

Hmm, it probably makes a latent scheduler issue appear.  Guess for better
analysis we have to trim down the source.  Not sure how - maybe
automatically with one good and one bad rev. looking for a hundred-fold
increase in memory use?  Look at the good compile with /usr/bin/time
and using the RSS to compute a ulimit -v limit or so.
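
One way to turn this idea into a pass/fail check is sketched below in Python.
It assumes a POSIX system where RLIMIT_AS is enforced (as on Linux); the
helper names, the default factor of 5 and the command-line handling are
arbitrary choices for this sketch.  Note that the cap applies to address
space, which is larger than RSS, so the factor needs some headroom.

```python
import resource
import subprocess
import sys


def limit_from_good_rss(good_rss_kb, factor=5):
    """Derive a memory cap in kB from the peak RSS of a known-good compile.

    factor is an arbitrary safety margin chosen for this sketch.
    """
    return good_rss_kb * factor


def fits_in_memory(cmd, limit_kb):
    """Return True if cmd exits with status 0 under an address-space cap."""
    limit_bytes = limit_kb * 1024

    def apply_limit():
        # Runs in the child just before exec.  Only the soft limit is
        # lowered; the existing hard limit is left untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    result = subprocess.run(cmd, preexec_fn=apply_limit,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: script.py GOOD_RSS_KB command [args...]
    cap = limit_from_good_rss(int(sys.argv[1]))
    # Exit 0 for "good" and 1 for "bad", as e.g. `git bisect run` expects.
    sys.exit(0 if fits_in_memory(sys.argv[2:], cap) else 1)
```

A compile that runs out of memory under the cap counts as "bad", so the same
script can serve both for bisection and as the interestingness test when
reducing the source.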

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #5 from Martin Liška  ---
Started with r9-2469-gc6067437d314f803.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-07-30
                 CC|                            |marxin at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Martin Liška  ---
Bisecting that..

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |9.1.0

--- Comment #3 from Richard Biener  ---
Bisection with ulimit -v 1000000 (about 1GB; the value is in kB) should work I
guess.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

--- Comment #2 from Richard Biener  ---
Created attachment 48958
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48958&action=edit
smaller testcase

Removed the smaller functions.

[Bug rtl-optimization/96388] scheduling takes forever with -fPIC

2020-07-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog,
                   |                            |memory-hog, needs-bisection
             Target|                            |x86_64-*-*
      Known to work|                            |8.4.0
      Known to fail|                            |11.0, 9.3.0

--- Comment #1 from Richard Biener  ---
GCC 8.4 is fine though, taking 600MB and 30 seconds.  -O1 -fPIC is fine
everywhere as well as -O2 -fno-PIC.