https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103454

            Bug ID: 103454
           Summary: -finline-functions-called-once is both compile-time
                    and runtime loss at average for spec2006, spec2017 and
                    tramp3d
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Looking into exchange2 performance I ran benchmarks with
-fno-inline-functions-called-once.  It seems we have important regressions
here.

The following compares default flags (base) and run with additional
-fno-inline-functions-called-once
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.01&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C67b183fac7b08067fdd3c09abd3efd2691083395&include_user_branches=on

https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.01&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C67b183fac7b08067fdd3c09abd3efd2691083395&include_user_branches=on
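The A/B comparison above can be reproduced along these lines (a minimal sketch; "bench.c" and the timing loop are placeholders, not the actual LNT/SPEC harness):

```shell
# Base run: default flags.
gcc -Ofast -o bench.base bench.c
# Comparison run: same flags plus -fno-inline-functions-called-once.
gcc -Ofast -fno-inline-functions-called-once -o bench.noonce bench.c
# Compare both runtime and the time taken to compile each variant.
time ./bench.base
time ./bench.noonce
```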

Large differences are

default flags wins
 - fatigue2, at both -O2 and -Ofast, runs 40% faster with inlining

-fno-inline-functions-called-once wins:
 - tramp3d with -Ofast. 31%
 - exchange2 with -Ofast 11-21%
 - specfp2006 total build time 41% faster (mostly wrf, which builds 71% faster)
 - specint2006 total build time about 1.5-3% faster
 - specfp2017 total build time 64% faster (again mostly wrf)
 - specint2017 total build time 2.5-3.5% faster

Once more tests are run I can make a better summary.  It has been a couple of
releases since I last benchmarked -fno-inline-functions-called-once, so I am
not quite sure how long we have had the problem.

For exchange2 the problem is inlining different clones of digits2 into each
other. Each clone of digits2 has nine nested loops and calls the other clone
from the innermost one.  I guess we may want to have a loop-depth limit on
inlining functions called once, and also give it its own specific
large-function-insns and growth limits (in particular, I think the growth
limit wants to be smaller, say 10%, instead of letting the function grow
twice).

It also shows, however, that we have middle-end problems in both scalability
and code quality on large CFGs, which are probably quite important (and
annoying) to track down.