[Bug other/49194] Trivially stupid inlining decisions

2019-05-10 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #12 from Eric Gallager  ---
(In reply to Jan Hubicka from comment #11)
> Well, I am working on gradual improvements in the inlining decisions,
> but since the PR is not very specific, we never will be perfect :)

So should we leave it open then, or close it?

[Bug other/49194] Trivially stupid inlining decisions

2019-02-10 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #11 from Jan Hubicka  ---
Well, I am working on gradual improvements in the inlining decisions,
but since the PR is not very specific, we never will be perfect :)

[Bug other/49194] Trivially stupid inlining decisions

2019-02-10 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

Eric Gallager  changed:

   What|Removed |Added

   Assignee|hubicka at gcc dot gnu.org |unassigned at gcc dot 
gnu.org

--- Comment #10 from Eric Gallager  ---
(In reply to Eric Gallager from comment #9)
> (In reply to Jan Hubicka from comment #8)
> > GCC 4.7 has now shrink wrapping that should reduce effect of inlining large
> > cold functions called once.  Realistic testcases where we still kill code
> > quality would be welcome.
> > 
> > I tested patch to disable inlining once for cold calls, but it does not help
> > in general. What happens is that we stop inlining
> > constructors/destructors/initialization loops that eventually kills code
> > quality of some benchmarks since known values are no longer propagated.  I
> > will do more tunning of this for 4.8.
> > 
> > Honza
> 
> So are you still working on this then?

Guess not; unassigning

[Bug other/49194] Trivially stupid inlining decisions

2018-06-15 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org

--- Comment #9 from Eric Gallager  ---
(In reply to Jan Hubicka from comment #8)
> GCC 4.7 has now shrink wrapping that should reduce effect of inlining large
> cold functions called once.  Realistic testcases where we still kill code
> quality would be welcome.
> 
> I tested patch to disable inlining once for cold calls, but it does not help
> in general. What happens is that we stop inlining
> constructors/destructors/initialization loops that eventually kills code
> quality of some benchmarks since known values are no longer propagated.  I
> will do more tunning of this for 4.8.
> 
> Honza

So are you still working on this then?

[Bug other/49194] Trivially stupid inlining decisions

2012-03-25 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #8 from Jan Hubicka hubicka at gcc dot gnu.org 2012-03-25 
09:43:46 UTC ---
GCC 4.7 has now shrink wrapping that should reduce effect of inlining large
cold functions called once.  Realistic testcases where we still kill code
quality would be welcome.

I tested patch to disable inlining once for cold calls, but it does not help in
general. What happens is that we stop inlining
constructors/destructors/initialization loops that eventually kills code
quality of some benchmarks since known values are no longer propagated.  I will
do more tunning of this for 4.8.

Honza


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2011.05.27 16:02:09
 CC||hubicka at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
 Ever Confirmed|0   |1

--- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-27 
16:02:09 UTC ---
I agree if the called function is big and it is very unlikely (most probably
just in PROB_VERY_UNLIKELY cases) -finline-functions-called-once shouldn't
inline.

E.g.
void baz (void);

static void
foo (int x)
{
  char buf[65536];
  int a, b, c, d, e, f, g, h, i, j, k, l, m;
#define A asm volatile (nop : +r (x), =r (a), =r (b), =r (c), \
=r (d), =r (e), =r (f), =r (g), =r (h), \
=r (i), =r (j), =r (k), =r (l), =r (m) \
: r (buf));
#define B A A A A A A A A A A A
#define C B B B B B B B B B B B
  C
}

int
bar (int x)
{
  if (__builtin_expect (x  65, 0))
foo (x + 6);
  baz ();
  return x;
}

inlines foo at -O2.


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread matz at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

Michael Matz matz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||matz at gcc dot gnu.org

--- Comment #2 from Michael Matz matz at gcc dot gnu.org 2011-05-27 16:12:12 
UTC ---
For adjusting GCCs idea about what stack frame growth is acceptable also 
fiddle with the large-stack-frame and large-stack-frame-growth parameters.


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #3 from Jan Hubicka hubicka at ucw dot cz 2011-05-27 16:15:02 UTC 
---
 I agree if the called function is big and it is very unlikely (most probably
 just in PROB_VERY_UNLIKELY cases) -finline-functions-called-once shouldn't
 inline.

-finline-functions-called-once  is trottled down by the large-function-growth
and large-stack-frame-growth limits. The  Kernel case coupld proably be handled
by the second. Does kernel bump down that limits?
It still won't help in case function doesn't have any on-stack aggregates,
since we optimistically assume that all gimple registers will disappear.
Probably
even that could be change, though estimating reload's stack frame usage so
early would
be iffy.

I agree that both limits are bit high for this particular case. However I
experimented with bumping this down some time ago with an extra parameter and
did not find any practictical benefit that would justify adding yet another
limit to the inliner.

So testcases would be welcome.
Honza


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #4 from Jan Hubicka hubicka at gcc dot gnu.org 2011-05-27 
16:19:04 UTC ---
BTW mainline won't inline foo in that testcase:

Deciding on functions called once:
  not inlinable: bar/1 - foo/0, --param large-stack-frame-growth limit reached

I fixed some stack-growth estimation issues while rewriting inliner for 4.7, if
4.6 inlines, perhaps I can figure out what gets wrong there.


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-27 
16:24:45 UTC ---
Oops, s/65536/128/, I've changed the testcase too late without retesting.

Anyway, the point is that the limits should be adjusted somewhat if the call is
PROB_VERY_UNLIKELY.


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread torva...@linux-foundation.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #6 from Linus Torvalds torva...@linux-foundation.org 2011-05-27 
16:38:22 UTC ---
(In reply to comment #3)
 
 -finline-functions-called-once  is trottled down by the large-function-growth
 and large-stack-frame-growth limits. The  Kernel case coupld proably be 
 handled
 by the second. Does kernel bump down that limits?

We used to play with inlining limits (gcc had some really bad decisions), but
the meaning of the numbers kept changing from one gcc version to another, and
the heuristics gcc used kept changing too. Which made it practically impossible
to use sanely - you could tweak it for one particular architecture, and one
particular version of gcc, but it would then be worse for others.

Quite frankly, with that kind of history, I'm not very eager to start playing
around with random gcc internal variables again.

So I'd much rather have gcc have good heuristics by default, possibly helped by
the kinds of obvious hints we can give (unlikely() in particular is something
we can add for things like this).

Obviously, we can (and do) use the force the decision with either noinline
or __always_inline (which are just the kernel macros to make the gcc
attribute syntax slightly more readable), but since I've been doing those other
bug reports about bad gcc code generation, I thought I'd point out this one
too.

 It still won't help in case function doesn't have any on-stack aggregates,
 since we optimistically assume that all gimple registers will disappear.
 Probably
 even that could be change, though estimating reload's stack frame usage so
 early would
 be iffy.

Yes, early stack estimation might not work all that well.

In the kernel, we do end up having a few complex functions that we basically
expect to inline to almost nothing - simply because we end up depending on
compile-time constant issues (sometimes very explicitly, with
__builtin_constant_p() followed by a largish switch () statement).

That said, this is something where the call-site really can make a big
difference. Not just the fact that the call site might be marked unlikely()
(again, that's just the kernel making __builtin_expect() readable), but things
like none of the arguments are constants could easily be a good heuristic to
use as a basis for whether to inline or not.

IOW, start out with whatever 'large-stack-frame-growth' and
'large-function-growth' values, but if the call-site is in an unlikely region,
cut those values in half (or whatever). And if none of the arguments are
constants, cut it in half again.

This is an example of why giving these limits as compiler options really
doesn't work: the choice should probably be much more dynamic than just a
single number.

I dunno. As mentioned, we can fix this problem by just marking things noinline
by hand. But I do think that there are fairly obvious cases where inlining
really isn't worth it, and gcc might as well just get those cases right.


[Bug other/49194] Trivially stupid inlining decisions

2011-05-27 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #7 from Jan Hubicka hubicka at ucw dot cz 2011-05-27 18:47:56 UTC 
---
 
 We used to play with inlining limits (gcc had some really bad decisions), but
 the meaning of the numbers kept changing from one gcc version to another, and
 the heuristics gcc used kept changing too. Which made it practically 
 impossible
 to use sanely - you could tweak it for one particular architecture, and one
 particular version of gcc, but it would then be worse for others.

Well, the --param limits are really meant to be more for GCC development than
for being adjusted to random values by random codebases, so this won't really
help indeed. The heuristics needs to keep envolving since the expectations
on it are changing,too (i.e. 10 years ago people didn't really cared about sane
boost/pooma performance and just year ago no one really loaded 20 functions
into compiler at once as we do now with LTO on mozilla).

However concerning the stack growth, this was implemented on request of Andi
Kleen for the kernel, since the stack size constraints are special. There
is -fconserve-stack argument to avoid need to use magic --param values (that
are not that magic for stack usage after all given that they count bytes).
 
 So I'd much rather have gcc have good heuristics by default, possibly helped 
 by
 the kinds of obvious hints we can give (unlikely() in particular is 
 something
 we can add for things like this).

Agreed.  In cases like this you might find handy the cold attribute that has
the effect of making function optimized for size as well as telling branch
predictor to predicts calls leading to call of cold function as unlikely.
It would save need sfor some of the unlikely() calls.
 
 Yes, early stack estimation might not work all that well.

Yep, one would have to do something like computing the number of live
values at given place of function that is doable but moderately expensive
for such a special purpose situation.
 
 That said, this is something where the call-site really can make a big
 difference. Not just the fact that the call site might be marked unlikely()
 (again, that's just the kernel making __builtin_expect() readable), but things
 like none of the arguments are constants could easily be a good heuristic to
 use as a basis for whether to inline or not.
 
 IOW, start out with whatever 'large-stack-frame-growth' and
 'large-function-growth' values, but if the call-site is in an unlikely region,
 cut those values in half (or whatever). And if none of the arguments are
 constants, cut it in half again.
 
 This is an example of why giving these limits as compiler options really
 doesn't work: the choice should probably be much more dynamic than just a
 single number.
 
 I dunno. As mentioned, we can fix this problem by just marking things noinline
 by hand. But I do think that there are fairly obvious cases where inlining
 really isn't worth it, and gcc might as well just get those cases right.

Well, this model won't work as suggested. The problem is that in GCC inliner
simplistic POV inlining is always a win, unless it hits the large
function/stack/unit bounds. Only tradeoff it understand is the code size
growth.

Here inlining causes regalloc to produce bigger stack frame because it is
stupid and doesn't know how to do shrink wraping.  (Bernd recently has posted
patches for this, I duno about their status and if they will help here) This is
important because the hot path through the outer function is extremely short
and the outer function is simple so it won't need the registers otherwise.

The large function/unit limits don't really worry about actual code quality,
just the fact that we don't want non-linear algorithms in the compiler to
become too prominent.  So the starting values are high for this purpose. Large
function is currently set to amusing constant of 2700insns. Dividing by 4 won't
do much help. The real reason is that we are really mixing two concepts
together (compiler nonlinearity and code quality considerations). This is not
good idea.

Some years ago I introduced the notion of hot and cold basic blocks to the
GCC inliner and told it to not inline functions into cold basic blocks unless
caller size (or overall program size) is expected to shrink. This has also
introduced number of regression I had to get through.  (think of not inlining
destructor of object in EH handling code that prevents the object from being
scalar replaced and optimized away).

At the moment I can come up with the following suggestions:

1) inline functions called once from cold basic blocks only when they are small
so caller size will shring (like inlining of small functions does)
2) introduce new function body size limit used only for cold functions called
once
3) try to somehow get very rough stack frame estimate into our current stack
frame growth limits.

I guess I will try 1) and see how it affects other code and if it is not
too bad we can stay with