[Bug other/49194] Trivially stupid inlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #12 from Eric Gallager --- (In reply to Jan Hubicka from comment #11) > Well, I am working on gradual improvements in the inlining decisions, > but since the PR is not very specific, we never will be perfect :) So should we leave it open then, or close it?
[Bug other/49194] Trivially stupid inlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #11 from Jan Hubicka --- Well, I am working on gradual improvements in the inlining decisions, but since the PR is not very specific, we never will be perfect :)
[Bug other/49194] Trivially stupid inlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 Eric Gallager changed: What|Removed |Added Assignee|hubicka at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #10 from Eric Gallager --- (In reply to Eric Gallager from comment #9) > (In reply to Jan Hubicka from comment #8) > > GCC 4.7 has now shrink wrapping that should reduce effect of inlining large > > cold functions called once. Realistic testcases where we still kill code > > quality would be welcome. > > > > I tested patch to disable inlining once for cold calls, but it does not help > > in general. What happens is that we stop inlining > > constructors/destructors/initialization loops that eventually kills code > > quality of some benchmarks since known values are no longer propagated. I > > will do more tunning of this for 4.8. > > > > Honza > > So are you still working on this then? Guess not; unassigning
[Bug other/49194] Trivially stupid inlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 Eric Gallager changed: What|Removed |Added CC||egallager at gcc dot gnu.org --- Comment #9 from Eric Gallager --- (In reply to Jan Hubicka from comment #8) > GCC 4.7 has now shrink wrapping that should reduce effect of inlining large > cold functions called once. Realistic testcases where we still kill code > quality would be welcome. > > I tested patch to disable inlining once for cold calls, but it does not help > in general. What happens is that we stop inlining > constructors/destructors/initialization loops that eventually kills code > quality of some benchmarks since known values are no longer propagated. I > will do more tunning of this for 4.8. > > Honza So are you still working on this then?
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #8 from Jan Hubicka hubicka at gcc dot gnu.org 2012-03-25 09:43:46 UTC --- GCC 4.7 has now shrink wrapping that should reduce effect of inlining large cold functions called once. Realistic testcases where we still kill code quality would be welcome. I tested patch to disable inlining once for cold calls, but it does not help in general. What happens is that we stop inlining constructors/destructors/initialization loops that eventually kills code quality of some benchmarks since known values are no longer propagated. I will do more tunning of this for 4.8. Honza
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2011.05.27 16:02:09 CC||hubicka at gcc dot gnu.org, ||jakub at gcc dot gnu.org Ever Confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-27 16:02:09 UTC --- I agree if the called function is big and it is very unlikely (most probably just in PROB_VERY_UNLIKELY cases) -finline-functions-called-once shouldn't inline. E.g. void baz (void); static void foo (int x) { char buf[65536]; int a, b, c, d, e, f, g, h, i, j, k, l, m; #define A asm volatile (nop : +r (x), =r (a), =r (b), =r (c), \ =r (d), =r (e), =r (f), =r (g), =r (h), \ =r (i), =r (j), =r (k), =r (l), =r (m) \ : r (buf)); #define B A A A A A A A A A A A #define C B B B B B B B B B B B C } int bar (int x) { if (__builtin_expect (x 65, 0)) foo (x + 6); baz (); return x; } inlines foo at -O2.
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 Michael Matz matz at gcc dot gnu.org changed: What|Removed |Added CC||matz at gcc dot gnu.org --- Comment #2 from Michael Matz matz at gcc dot gnu.org 2011-05-27 16:12:12 UTC --- For adjusting GCCs idea about what stack frame growth is acceptable also fiddle with the large-stack-frame and large-stack-frame-growth parameters.
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #3 from Jan Hubicka hubicka at ucw dot cz 2011-05-27 16:15:02 UTC --- I agree if the called function is big and it is very unlikely (most probably just in PROB_VERY_UNLIKELY cases) -finline-functions-called-once shouldn't inline. -finline-functions-called-once is trottled down by the large-function-growth and large-stack-frame-growth limits. The Kernel case coupld proably be handled by the second. Does kernel bump down that limits? It still won't help in case function doesn't have any on-stack aggregates, since we optimistically assume that all gimple registers will disappear. Probably even that could be change, though estimating reload's stack frame usage so early would be iffy. I agree that both limits are bit high for this particular case. However I experimented with bumping this down some time ago with an extra parameter and did not find any practictical benefit that would justify adding yet another limit to the inliner. So testcases would be welcome. Honza
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #4 from Jan Hubicka hubicka at gcc dot gnu.org 2011-05-27 16:19:04 UTC --- BTW mainline won't inline foo in that testcase: Deciding on functions called once: not inlinable: bar/1 - foo/0, --param large-stack-frame-growth limit reached I fixed some stack-growth estimation issues while rewriting inliner for 4.7, if 4.6 inlines, perhaps I can figure out what gets wrong there.
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-27 16:24:45 UTC --- Oops, s/65536/128/, I've changed the testcase too late without retesting. Anyway, the point is that the limits should be adjusted somewhat if the call is PROB_VERY_UNLIKELY.
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #6 from Linus Torvalds torva...@linux-foundation.org 2011-05-27 16:38:22 UTC --- (In reply to comment #3) -finline-functions-called-once is trottled down by the large-function-growth and large-stack-frame-growth limits. The Kernel case coupld proably be handled by the second. Does kernel bump down that limits? We used to play with inlining limits (gcc had some really bad decisions), but the meaning of the numbers kept changing from one gcc version to another, and the heuristics gcc used kept changing too. Which made it practically impossible to use sanely - you could tweak it for one particular architecture, and one particular version of gcc, but it would then be worse for others. Quite frankly, with that kind of history, I'm not very eager to start playing around with random gcc internal variables again. So I'd much rather have gcc have good heuristics by default, possibly helped by the kinds of obvious hints we can give (unlikely() in particular is something we can add for things like this). Obviously, we can (and do) use the force the decision with either noinline or __always_inline (which are just the kernel macros to make the gcc attribute syntax slightly more readable), but since I've been doing those other bug reports about bad gcc code generation, I thought I'd point out this one too. It still won't help in case function doesn't have any on-stack aggregates, since we optimistically assume that all gimple registers will disappear. Probably even that could be change, though estimating reload's stack frame usage so early would be iffy. Yes, early stack estimation might not work all that well. In the kernel, we do end up having a few complex functions that we basically expect to inline to almost nothing - simply because we end up depending on compile-time constant issues (sometimes very explicitly, with __builtin_constant_p() followed by a largish switch () statement). That said, this is something where the call-site really can make a big difference. Not just the fact that the call site might be marked unlikely() (again, that's just the kernel making __builtin_expect() readable), but things like none of the arguments are constants could easily be a good heuristic to use as a basis for whether to inline or not. IOW, start out with whatever 'large-stack-frame-growth' and 'large-function-growth' values, but if the call-site is in an unlikely region, cut those values in half (or whatever). And if none of the arguments are constants, cut it in half again. This is an example of why giving these limits as compiler options really doesn't work: the choice should probably be much more dynamic than just a single number. I dunno. As mentioned, we can fix this problem by just marking things noinline by hand. But I do think that there are fairly obvious cases where inlining really isn't worth it, and gcc might as well just get those cases right.
[Bug other/49194] Trivially stupid inlining decisions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 --- Comment #7 from Jan Hubicka hubicka at ucw dot cz 2011-05-27 18:47:56 UTC --- We used to play with inlining limits (gcc had some really bad decisions), but the meaning of the numbers kept changing from one gcc version to another, and the heuristics gcc used kept changing too. Which made it practically impossible to use sanely - you could tweak it for one particular architecture, and one particular version of gcc, but it would then be worse for others. Well, the --param limits are really meant to be more for GCC development than for being adjusted to random values by random codebases, so this won't really help indeed. The heuristics needs to keep envolving since the expectations on it are changing,too (i.e. 10 years ago people didn't really cared about sane boost/pooma performance and just year ago no one really loaded 20 functions into compiler at once as we do now with LTO on mozilla). However concerning the stack growth, this was implemented on request of Andi Kleen for the kernel, since the stack size constraints are special. There is -fconserve-stack argument to avoid need to use magic --param values (that are not that magic for stack usage after all given that they count bytes). So I'd much rather have gcc have good heuristics by default, possibly helped by the kinds of obvious hints we can give (unlikely() in particular is something we can add for things like this). Agreed. In cases like this you might find handy the cold attribute that has the effect of making function optimized for size as well as telling branch predictor to predicts calls leading to call of cold function as unlikely. It would save need sfor some of the unlikely() calls. Yes, early stack estimation might not work all that well. Yep, one would have to do something like computing the number of live values at given place of function that is doable but moderately expensive for such a special purpose situation. That said, this is something where the call-site really can make a big difference. Not just the fact that the call site might be marked unlikely() (again, that's just the kernel making __builtin_expect() readable), but things like none of the arguments are constants could easily be a good heuristic to use as a basis for whether to inline or not. IOW, start out with whatever 'large-stack-frame-growth' and 'large-function-growth' values, but if the call-site is in an unlikely region, cut those values in half (or whatever). And if none of the arguments are constants, cut it in half again. This is an example of why giving these limits as compiler options really doesn't work: the choice should probably be much more dynamic than just a single number. I dunno. As mentioned, we can fix this problem by just marking things noinline by hand. But I do think that there are fairly obvious cases where inlining really isn't worth it, and gcc might as well just get those cases right. Well, this model won't work as suggested. The problem is that in GCC inliner simplistic POV inlining is always a win, unless it hits the large function/stack/unit bounds. Only tradeoff it understand is the code size growth. Here inlining causes regalloc to produce bigger stack frame because it is stupid and doesn't know how to do shrink wraping. (Bernd recently has posted patches for this, I duno about their status and if they will help here) This is important because the hot path through the outer function is extremely short and the outer function is simple so it won't need the registers otherwise. The large function/unit limits don't really worry about actual code quality, just the fact that we don't want non-linear algorithms in the compiler to become too prominent. So the starting values are high for this purpose. Large function is currently set to amusing constant of 2700insns. Dividing by 4 won't do much help. The real reason is that we are really mixing two concepts together (compiler nonlinearity and code quality considerations). This is not good idea. Some years ago I introduced the notion of hot and cold basic blocks to the GCC inliner and told it to not inline functions into cold basic blocks unless caller size (or overall program size) is expected to shrink. This has also introduced number of regression I had to get through. (think of not inlining destructor of object in EH handling code that prevents the object from being scalar replaced and optimized away). At the moment I can come up with the following suggestions: 1) inline functions called once from cold basic blocks only when they are small so caller size will shring (like inlining of small functions does) 2) introduce new function body size limit used only for cold functions called once 3) try to somehow get very rough stack frame estimate into our current stack frame growth limits. I guess I will try 1) and see how it affects other code and if it is not too bad we can stay with