Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, 2017-09-22 at 23:39 +0200, Arnd Bergmann wrote: > On Fri, Sep 22, 2017 at 9:17 PM, Arnd Bergmann wrote: > > On Fri, Sep 22, 2017 at 7:21 PM, Joe Perches wrote: > > > On Fri, 2017-09-22 at 09:48 +0200, Arnd Bergmann wrote: > > > > On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King > > > >textdata bss dec hex filename > > > > 18220 176 0 1839647dc > > > > build/tmp/lib/lz4/lz4_decompress-after.o > > > > 22297 0 0 222975719 > > > > build/tmp/lib/lz4/lz4_decompress-before.o > > > > > > Perhaps not so much a gcc bug as an opportunity > > > for gcc to add an additional optimization. > > > > > > gcc would have to verify that the const array is > > > not initialized with some variable or argument like: > > > > > > int foo(int a) > > > { > > > const int array[] = {1, a}; > > > ... > > > } > > > > It depends. With a 10KB different in .text size, my guess is that this > > is a case where gcc does the right optimization in principle, but > > fails to do what was intended in some corner cases. > > I found the problem: "gcc -fsanitze=kernel-address --param asan-stack=1" > produces lots of expensive checks here with gcc-5 or higher. > > Disabling it makes a big difference: > > upstream: > gcc-7.0.0 --fsanitze=kernel-address --param asan-stack=1: 31789 bytes > gcc-7.0.0: 16535 bytes > > patched: > gcc-7.0.0 --fsanitze=kernel-address --param asan-stack=1: 20351 bytes > gcc-7.0.0: 14490 bytes I think you are looking at a different issue. There seems still a difference in size between current and Colin's patch in compiled object size. > Arnd
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, 2017-09-22 at 21:17 +0200, Arnd Bergmann wrote: > On Fri, Sep 22, 2017 at 7:21 PM, Joe Perches wrote: > > On Fri, 2017-09-22 at 09:48 +0200, Arnd Bergmann wrote: > > > On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King > > >textdata bss dec hex filename > > > 18220 176 0 1839647dc > > > build/tmp/lib/lz4/lz4_decompress-after.o > > > 22297 0 0 222975719 > > > build/tmp/lib/lz4/lz4_decompress-before.o > > > > Perhaps not so much a gcc bug as an opportunity > > for gcc to add an additional optimization. > > > > gcc would have to verify that the const array is > > not initialized with some variable or argument like: > > > > int foo(int a) > > { > > const int array[] = {1, a}; > > ... > > } > > It depends. With a 10KB different in .text size, my guess is that this > is a case where gcc does the right optimization in principle, but > fails to do what was intended in some corner cases. Maybe/maybe not. > I just cross-checked by building with clang, there the patch has > no impact on code size, it is 24929 bytes with or without the patch. > > Looking at other versions of (x86) gcc, I see .text sizes of > > afterbefore > gcc-3.4.6 10855 12977 > gcc-4.0.4 11088 11088 > gcc-4.1.3 10973 10973 > gcc-4.2.5 11183 11183 > gcc-4.3.6 15501 17724 Interesting this was apparently deoptimized at version 4.3. Glancing at the release notes doesn't seem to indicate anything obvious. https://gcc.gnu.org/gcc-4.3/changes.html > gcc-4.4.7 13337 15693 > gcc-4.5.4 13162 15491 > gcc-4.6.4 14846 17302 > gcc-4.7.4 14187 16294 > gcc-4.8.5 16591 18730 > gcc-4.9.4 19582 21995 > gcc-5.4.1 18294 22510 > gcc-6.1.1 20487 25172 > gcc-6.3.1 20487 25172 > gcc-7.0.0 20351 31789 > gcc-7.0.1 20351 24966 > gcc-7.1.1 20383 24982 > gcc-8.0.0 20686 25065 > > It seems whatever happened in early versions of gcc-7 has since > improved, and it probably was a bug since older and newer versions > create similar code size (I have not looked at the actual object code). > > The 5K difference in gcc-5 and higher still seems like a lot. It would > also be interesting to look at the decompression performance of > this code witth the different compilers to see if it got better or worse. yup > Most likely, gcc got better at inlining and unrolling parts of the > algorithm, but sometimes an object file that doubles or triples in > size is an indication that the compiler did something really bad. yup[2] cheers, Joe
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On 22/09/17 22:39, Arnd Bergmann wrote: > On Fri, Sep 22, 2017 at 9:17 PM, Arnd Bergmann wrote: >> On Fri, Sep 22, 2017 at 7:21 PM, Joe Perches wrote: >>> On Fri, 2017-09-22 at 09:48 +0200, Arnd Bergmann wrote: On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King >> textdata bss dec hex filename 18220 176 0 1839647dc build/tmp/lib/lz4/lz4_decompress-after.o 22297 0 0 222975719 build/tmp/lib/lz4/lz4_decompress-before.o >>> >>> Perhaps not so much a gcc bug as an opportunity >>> for gcc to add an additional optimization. >>> >>> gcc would have to verify that the const array is >>> not initialized with some variable or argument like: >>> >>> int foo(int a) >>> { >>> const int array[] = {1, a}; >>> ... >>> } >> >> It depends. With a 10KB different in .text size, my guess is that this >> is a case where gcc does the right optimization in principle, but >> fails to do what was intended in some corner cases. > > I found the problem: "gcc -fsanitze=kernel-address --param asan-stack=1" > produces lots of expensive checks here with gcc-5 or higher. > > Disabling it makes a big difference: > > upstream: > gcc-7.0.0 --fsanitze=kernel-address --param asan-stack=1: 31789 bytes > gcc-7.0.0: 16535 bytes > > patched: > gcc-7.0.0 --fsanitze=kernel-address --param asan-stack=1: 20351 bytes > gcc-7.0.0: 14490 bytes > > Arnd > Nice catch!
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, Sep 22, 2017 at 9:17 PM, Arnd Bergmann wrote: > On Fri, Sep 22, 2017 at 7:21 PM, Joe Perches wrote: >> On Fri, 2017-09-22 at 09:48 +0200, Arnd Bergmann wrote: >>> On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King > >>>textdata bss dec hex filename >>> 18220 176 0 1839647dc >>> build/tmp/lib/lz4/lz4_decompress-after.o >>> 22297 0 0 222975719 >>> build/tmp/lib/lz4/lz4_decompress-before.o >> >> Perhaps not so much a gcc bug as an opportunity >> for gcc to add an additional optimization. >> >> gcc would have to verify that the const array is >> not initialized with some variable or argument like: >> >> int foo(int a) >> { >> const int array[] = {1, a}; >> ... >> } > > It depends. With a 10KB different in .text size, my guess is that this > is a case where gcc does the right optimization in principle, but > fails to do what was intended in some corner cases. I found the problem: "gcc -fsanitze=kernel-address --param asan-stack=1" produces lots of expensive checks here with gcc-5 or higher. Disabling it makes a big difference: upstream: gcc-7.0.0 --fsanitze=kernel-address --param asan-stack=1: 31789 bytes gcc-7.0.0: 16535 bytes patched: gcc-7.0.0 --fsanitze=kernel-address --param asan-stack=1: 20351 bytes gcc-7.0.0: 14490 bytes Arnd
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, Sep 22, 2017 at 9:17 PM, Arnd Bergmann wrote: > afterbefore > gcc-3.4.6 10855 12977 > gcc-4.0.4 11088 11088 > gcc-4.1.3 10973 10973 > gcc-4.2.5 11183 11183 > gcc-4.3.6 15501 17724 > gcc-4.4.7 13337 15693 > gcc-4.5.4 13162 15491 > gcc-4.6.4 14846 17302 > gcc-4.7.4 14187 16294 > gcc-4.8.5 16591 18730 > gcc-4.9.4 19582 21995 > gcc-5.4.1 18294 22510 > gcc-6.1.1 20487 25172 > gcc-6.3.1 20487 25172 > gcc-7.0.0 20351 31789 > gcc-7.0.1 20351 24966 > gcc-7.1.1 20383 24982 > gcc-8.0.0 20686 25065 Here is the same table for 32-bit ARM, showing results that roughly match my expectations of what we should see here 4.3.6 15318 16574 4.4.7 13364 14388 4.5.4 12856 13912 4.6.4 14596 15124 4.7.4 14942 15850 4.8.5 15454 16074 4.9.4 15994 16742 5.4.1 15930 16686 6.3.1 17302 18154 7.0.0 18042 18958 7.0.1 18278 19098 7.1.1 18302 19106 8.0.0 19710 20490 Arnd
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, Sep 22, 2017 at 7:21 PM, Joe Perches wrote: > On Fri, 2017-09-22 at 09:48 +0200, Arnd Bergmann wrote: >> On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King >>textdata bss dec hex filename >> 18220 176 0 1839647dc >> build/tmp/lib/lz4/lz4_decompress-after.o >> 22297 0 0 222975719 >> build/tmp/lib/lz4/lz4_decompress-before.o > > Perhaps not so much a gcc bug as an opportunity > for gcc to add an additional optimization. > > gcc would have to verify that the const array is > not initialized with some variable or argument like: > > int foo(int a) > { > const int array[] = {1, a}; > ... > } It depends. With a 10KB different in .text size, my guess is that this is a case where gcc does the right optimization in principle, but fails to do what was intended in some corner cases. I just cross-checked by building with clang, there the patch has no impact on code size, it is 24929 bytes with or without the patch. Looking at other versions of (x86) gcc, I see .text sizes of afterbefore gcc-3.4.6 10855 12977 gcc-4.0.4 11088 11088 gcc-4.1.3 10973 10973 gcc-4.2.5 11183 11183 gcc-4.3.6 15501 17724 gcc-4.4.7 13337 15693 gcc-4.5.4 13162 15491 gcc-4.6.4 14846 17302 gcc-4.7.4 14187 16294 gcc-4.8.5 16591 18730 gcc-4.9.4 19582 21995 gcc-5.4.1 18294 22510 gcc-6.1.1 20487 25172 gcc-6.3.1 20487 25172 gcc-7.0.0 20351 31789 gcc-7.0.1 20351 24966 gcc-7.1.1 20383 24982 gcc-8.0.0 20686 25065 It seems whatever happened in early versions of gcc-7 has since improved, and it probably was a bug since older and newer versions create similar code size (I have not looked at the actual object code). The 5K difference in gcc-5 and higher still seems like a lot. It would also be interesting to look at the decompression performance of this code witth the different compilers to see if it got better or worse. Most likely, gcc got better at inlining and unrolling parts of the algorithm, but sometimes an object file that doubles or triples in size is an indication that the compiler did something really bad. Arnd
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, 2017-09-22 at 09:48 +0200, Arnd Bergmann wrote: > On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King > wrote: > > On 22/09/17 00:09, Christophe JAILLET wrote: > > > Le 22/09/2017 à 00:19, Colin King a écrit : > > > > From: Colin Ian King > > > > > > > > Don't populate the read-only arrays dec32table and dec64table on the > > > > stack, instead make them both static const. Makes the object code > > > > smaller by over 10K bytes: > > > > > > 10k? Wouaouh! This is way much more than what you usually win with such > > > patches. > > > > Yes, I had to triple check it because it was an unbelievable win. > > > > I wonder whether this should be reported as a gcc bug. I tried reproducing > it here with gcc-7.1.1 and gcc-8.0.0, but I only see a 4K difference: > >textdata bss dec hex filename > 18220 176 0 1839647dc > build/tmp/lib/lz4/lz4_decompress-after.o > 22297 0 0 222975719 > build/tmp/lib/lz4/lz4_decompress-before.o Perhaps not so much a gcc bug as an opportunity for gcc to add an additional optimization. gcc would have to verify that the const array is not initialized with some variable or argument like: int foo(int a) { const int array[] = {1, a}; ... }
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On Fri, Sep 22, 2017 at 1:11 AM, Colin Ian King wrote: > On 22/09/17 00:09, Christophe JAILLET wrote: >> Le 22/09/2017 à 00:19, Colin King a écrit : >>> From: Colin Ian King >>> >>> Don't populate the read-only arrays dec32table and dec64table on the >>> stack, instead make them both static const. Makes the object code >>> smaller by over 10K bytes: >> 10k? Wouaouh! This is way much more than what you usually win with such >> patches. > > Yes, I had to triple check it because it was an unbelievable win. > I wonder whether this should be reported as a gcc bug. I tried reproducing it here with gcc-7.1.1 and gcc-8.0.0, but I only see a 4K difference: textdata bss dec hex filename 18220 176 0 1839647dc build/tmp/lib/lz4/lz4_decompress-after.o 22297 0 0 222975719 build/tmp/lib/lz4/lz4_decompress-before.o I tried x86 defconfig and allmodconfig on a slightly patched mainline kernel. Arnd
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
On 22/09/17 00:09, Christophe JAILLET wrote: > Le 22/09/2017 à 00:19, Colin King a écrit : >> From: Colin Ian King >> >> Don't populate the read-only arrays dec32table and dec64table on the >> stack, instead make them both static const. Makes the object code >> smaller by over 10K bytes: > 10k? Wouaouh! This is way much more than what you usually win with such > patches. Yes, I had to triple check it because it was an unbelievable win. > I assume we must thanks FORCE_INLINE in this case. Yep, I believe so. > Nice catch! > > CJ > >> Before: >> text databssdechexfilename >>31500 0 0 31500 7b0c >> lib/lz4/lz4_decompress.o >> >> After: >> text databssdechexfilename >>20237176 0 20413 4fbd >> lib/lz4/lz4_decompress.o >> >> (gcc version 7.2.0 x86_64) >> >> Signed-off-by: Colin Ian King >> --- >> lib/lz4/lz4_decompress.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c >> index bd3574312b82..141734d255e4 100644 >> --- a/lib/lz4/lz4_decompress.c >> +++ b/lib/lz4/lz4_decompress.c >> @@ -85,8 +85,8 @@ static FORCE_INLINE int LZ4_decompress_generic( >> const BYTE * const lowLimit = lowPrefix - dictSize; >> const BYTE * const dictEnd = (const BYTE *)dictStart + dictSize; >> -const unsigned int dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 }; >> -const int dec64table[] = { 0, 0, 0, -1, 0, 1, 2, 3 }; >> +static const unsigned int dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 }; >> +static const int dec64table[] = { 0, 0, 0, -1, 0, 1, 2, 3 }; >> const int safeDecode = (endOnInput == endOnInputSize); >> const int checkOffset = ((safeDecode) && (dictSize < (int)(64 * >> KB))); > >
Re: [PATCH] lib/lz4: make arrays static const, reduces object code size
Le 22/09/2017 à 00:19, Colin King a écrit : From: Colin Ian King Don't populate the read-only arrays dec32table and dec64table on the stack, instead make them both static const. Makes the object code smaller by over 10K bytes: 10k? Wouaouh! This is way much more than what you usually win with such patches. I assume we must thanks FORCE_INLINE in this case. Nice catch! CJ Before: text data bss dec hex filename 31500 0 0 315007b0c lib/lz4/lz4_decompress.o After: text data bss dec hex filename 20237176 0 204134fbd lib/lz4/lz4_decompress.o (gcc version 7.2.0 x86_64) Signed-off-by: Colin Ian King --- lib/lz4/lz4_decompress.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c index bd3574312b82..141734d255e4 100644 --- a/lib/lz4/lz4_decompress.c +++ b/lib/lz4/lz4_decompress.c @@ -85,8 +85,8 @@ static FORCE_INLINE int LZ4_decompress_generic( const BYTE * const lowLimit = lowPrefix - dictSize; const BYTE * const dictEnd = (const BYTE *)dictStart + dictSize; - const unsigned int dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 }; - const int dec64table[] = { 0, 0, 0, -1, 0, 1, 2, 3 }; + static const unsigned int dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 }; + static const int dec64table[] = { 0, 0, 0, -1, 0, 1, 2, 3 }; const int safeDecode = (endOnInput == endOnInputSize); const int checkOffset = ((safeDecode) && (dictSize < (int)(64 * KB)));