Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
On Mon, Oct 19, 2020 at 10:54:40AM +0200, Christophe Leroy wrote: > Le 19/10/2020 à 10:32, Segher Boessenkool a écrit : > >The kernel should just use __always_inline if that is what it *wants*; > >that is true here most likely. GCC could perhaps improve its heuristics > >so that it no longer thinks these functions are often too big for > >inlining (they *are* pretty big, but not after basic optimisations with > >constant integer arguments). > > Yes I guess __always_inline is to be added on functions like this defined > in headers for exactly that, and that's the purpose of this patch. > > However I find it odd that get_order() is outlined by GCC even in some > object files that don't use it at all, for instance in fs/pipe.o It is (arguably) too big too always inline if you do not consider that __builtin_constant_p will remove half of the function one way or another. Not sure if that is what happens here, but now we have a PR (thanks!) and we will find out. Segher
Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
Le 19/10/2020 à 10:32, Segher Boessenkool a écrit : On Mon, Oct 19, 2020 at 07:50:41AM +0200, Christophe Leroy wrote: Le 19/10/2020 à 06:55, Joel Stanley a écrit : In the old days, marking a function 'static inline' was forcing GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline a function. It looks like GCC 10 is taking poor decisions on this. 1952 bytes smaller with your patch applied. Did you raise this with anyone from GCC? Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 For the time being, it's at a standstill. The kernel should just use __always_inline if that is what it *wants*; that is true here most likely. GCC could perhaps improve its heuristics so that it no longer thinks these functions are often too big for inlining (they *are* pretty big, but not after basic optimisations with constant integer arguments). Yes I guess __always_inline is to be added on functions like this defined in headers for exactly that, and that's the purpose of this patch. However I find it odd that get_order() is outlined by GCC even in some object files that don't use it at all, for instance in fs/pipe.o Christophe
Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
On Mon, Oct 19, 2020 at 07:50:41AM +0200, Christophe Leroy wrote: > Le 19/10/2020 à 06:55, Joel Stanley a écrit : > >>In the old days, marking a function 'static inline' was forcing > >>GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable > >>CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline > >>a function. > >> > >>It looks like GCC 10 is taking poor decisions on this. > >1952 bytes smaller with your patch applied. Did you raise this with > >anyone from GCC? > > Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 > > For the time being, it's at a standstill. The kernel should just use __always_inline if that is what it *wants*; that is true here most likely. GCC could perhaps improve its heuristics so that it no longer thinks these functions are often too big for inlining (they *are* pretty big, but not after basic optimisations with constant integer arguments). Segher
Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
Le 19/10/2020 à 06:55, Joel Stanley a écrit : On Sat, 17 Oct 2020 at 15:55, Christophe Leroy wrote: When building mpc885_ads_defconfig with gcc 10.1, the function get_order() appears 50 times in vmlinux: [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l 50 [linux]# size vmlinux textdata bss dec hex filename 3842620 675624 135160 4653404 47015c vmlinux In the old days, marking a function 'static inline' was forcing GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline a function. It looks like GCC 10 is taking poor decisions on this. get_order() compiles into the following tiny function, occupying 20 bytes of text. 007c : 7c: 38 63 ff ff addir3,r3,-1 80: 54 63 a3 3e rlwinm r3,r3,20,12,31 84: 7c 63 00 34 cntlzw r3,r3 88: 20 63 00 20 subfic r3,r3,32 8c: 4e 80 00 20 blr By forcing get_order() to be __always_inline, the size of text is reduced by 1940 bytes, that is almost twice the space occupied by 50 times get_order() [linux-powerpc]# size vmlinux textdata bss dec hex filename 3840680 675588 135176 4651444 46f9b4 vmlinux I see similar results with GCC 10.2 building for arm32. There are 143 instances of get_order with aspeed_g5_defconfig. Before: 9071838 2630138 186468 11888444 b5673c vmlinux After: 9069886 2630126 186468 11886480 b55f90 vmlinux 1952 bytes smaller with your patch applied. Did you raise this with anyone from GCC? Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 For the time being, it's at a standstill. Christophe Reviewed-by: Joel Stanley Signed-off-by: Christophe Leroy --- include/asm-generic/getorder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/getorder.h b/include/asm-generic/getorder.h index e9f20b813a69..f2979e3a96b6 100644 --- a/include/asm-generic/getorder.h +++ b/include/asm-generic/getorder.h @@ -26,7 +26,7 @@ * * The result is undefined if the size is 0. */ -static inline __attribute_const__ int get_order(unsigned long size) +static __always_inline __attribute_const__ int get_order(unsigned long size) { if (__builtin_constant_p(size)) { if (!size) -- 2.25.0
Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
On Sat, 17 Oct 2020 at 15:55, Christophe Leroy wrote: > > When building mpc885_ads_defconfig with gcc 10.1, > the function get_order() appears 50 times in vmlinux: > > [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l > 50 > > [linux]# size vmlinux >textdata bss dec hex filename > 3842620 675624 135160 4653404 47015c vmlinux > > In the old days, marking a function 'static inline' was forcing > GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable > CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline > a function. > > It looks like GCC 10 is taking poor decisions on this. > > get_order() compiles into the following tiny function, > occupying 20 bytes of text. > > 007c : > 7c: 38 63 ff ff addir3,r3,-1 > 80: 54 63 a3 3e rlwinm r3,r3,20,12,31 > 84: 7c 63 00 34 cntlzw r3,r3 > 88: 20 63 00 20 subfic r3,r3,32 > 8c: 4e 80 00 20 blr > > By forcing get_order() to be __always_inline, the size of text is > reduced by 1940 bytes, that is almost twice the space occupied by > 50 times get_order() > > [linux-powerpc]# size vmlinux >textdata bss dec hex filename > 3840680 675588 135176 4651444 46f9b4 vmlinux I see similar results with GCC 10.2 building for arm32. There are 143 instances of get_order with aspeed_g5_defconfig. Before: 9071838 2630138 186468 11888444 b5673c vmlinux After: 9069886 2630126 186468 11886480 b55f90 vmlinux 1952 bytes smaller with your patch applied. Did you raise this with anyone from GCC? Reviewed-by: Joel Stanley > Signed-off-by: Christophe Leroy > --- > include/asm-generic/getorder.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/asm-generic/getorder.h b/include/asm-generic/getorder.h > index e9f20b813a69..f2979e3a96b6 100644 > --- a/include/asm-generic/getorder.h > +++ b/include/asm-generic/getorder.h > @@ -26,7 +26,7 @@ > * > * The result is undefined if the size is 0. > */ > -static inline __attribute_const__ int get_order(unsigned long size) > +static __always_inline __attribute_const__ int get_order(unsigned long size) > { > if (__builtin_constant_p(size)) { > if (!size) > -- > 2.25.0 >
[PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
When building mpc885_ads_defconfig with gcc 10.1, the function get_order() appears 50 times in vmlinux: [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l 50 [linux]# size vmlinux textdata bss dec hex filename 3842620 675624 135160 4653404 47015c vmlinux In the old days, marking a function 'static inline' was forcing GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline a function. It looks like GCC 10 is taking poor decisions on this. get_order() compiles into the following tiny function, occupying 20 bytes of text. 007c : 7c: 38 63 ff ff addir3,r3,-1 80: 54 63 a3 3e rlwinm r3,r3,20,12,31 84: 7c 63 00 34 cntlzw r3,r3 88: 20 63 00 20 subfic r3,r3,32 8c: 4e 80 00 20 blr By forcing get_order() to be __always_inline, the size of text is reduced by 1940 bytes, that is almost twice the space occupied by 50 times get_order() [linux-powerpc]# size vmlinux textdata bss dec hex filename 3840680 675588 135176 4651444 46f9b4 vmlinux Signed-off-by: Christophe Leroy --- include/asm-generic/getorder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/getorder.h b/include/asm-generic/getorder.h index e9f20b813a69..f2979e3a96b6 100644 --- a/include/asm-generic/getorder.h +++ b/include/asm-generic/getorder.h @@ -26,7 +26,7 @@ * * The result is undefined if the size is 0. */ -static inline __attribute_const__ int get_order(unsigned long size) +static __always_inline __attribute_const__ int get_order(unsigned long size) { if (__builtin_constant_p(size)) { if (!size) -- 2.25.0