On Fri, Jan 06, 2023 at 11:26:36PM +0000, Maier, Brandon L
Collins wrote:
> > From: Tom Rini <[email protected]>
> > On Mon, Jan 02, 2023 at 04:02:06PM +0000, Maier, Brandon L
> > >
> > > I understand Linux replaced their custom zstd code with the Facebook zstd
> > code, so the changes are widespread. The new zstd does support many
> > optional performance features as documented in zstd's lib/README[1],
> > which I have already added a KConfig to disable all of them in this patch.
> > For
> > reference, if I enable the performance options the build size increases as
> > so:
> > >
> > > aarch64: (for 2/2 boards) all +27432.0 rodata +320.0 text +27112.0
> > > arm: (for 5/5 boards) all +21801.6 rodata +256.0 text +21545.6
> > > sandbox: (for 2/2 boards) all +59004.0 bss +16.0 rodata +800.0 text
> > +58188.0
> > >
> > > I haven't had any luck finding another way to decrease build size.
> > >
> > > [1]
> > https://github.com/facebook/zstd/blob/dev/lib/README.md#modular-build
> >
> > After skimming over that, did you see if not disabling inlining, and
> > letting the compiler figure it out is a big size penalty?
>
> That is one of the settings I am already enabling. To clarify from that
> README, the settings I am enabling in U-Boot are HUF_FORCE_DECOMPRESS_X1,
> ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT, ZSTD_NO_INLINE, and
> ZSTD_STRIP_ERROR_STRINGS. Which is what upstream zstd uses for their own
> ZSTD_LIB_MINIFY option in lib/libzstd.mk.
>
> If I drop the ZSTD_NO_INLINE, the build sizes increase as follows:
>
> aarch64: (for 2/2 boards) all +6212.0 rodata +8.0 text +6204.0
> arm: (for 5/5 boards) all +7408.0 text +7408.0
> sandbox: (for 2/2 boards) all +21928.0 bss -16.0 data +16.0 rodata +224.0
> text +21704.0On top of the previous I assume, OK, darn. > > Also, while the arch-level details are good, a specific platform set of > > numbers > > instead might give some ideas on where maybe reductions could come from. > > I looked over each platform, but the size deltas are consistent within each > architecture. > > What stands out though is that the old ZSTD ARM code is about 4.6kb smaller > (25%) than the old ZSTD AARCH64 code. The newer ZSTD doesn't do that, both > ARM and AARCH64 are about the same in size. That is why there is a 4k jump in > size for just the arm boards. Comparing the functions list between the two, > both archs appear similar except ZSTD_decompressBlock_internal() grows > significantly on aarch64 for some reason. I haven't seen any obvious root > cause though. > > Here is the full breakdown of platform size deltas > > aarch64: (for 2/2 boards) all +535.5 rodata +965.5 text -430.0 > turris_mox : all +572 rodata +968 text -396 > mvebu_espressobin-88f3720: all +499 rodata +963 text -464 > arm: (for 5/5 boards) all +4489.6 rodata +940.0 text +3549.6 > turris_omnia : all +4608 rodata +940 text +3668 > m53menlo : all +4460 rodata +940 text +3520 > dh_imx6 : all +4460 rodata +940 text +3520 > stm32mp15_dhcom_basic: all +4460 rodata +940 text +3520 > stm32mp15_dhcor_basic: all +4460 rodata +940 text +3520 > sandbox: (for 2/2 boards) all +8.0 rodata +784.0 text -776.0 > sandbox : all +528 bss -32 rodata +768 text -208 > sandbox64 : all -512 bss +32 rodata +800 text -1344 > > > > Finally, what is the whole list of platforms that grow? > > All of them except sandbox64 grow by some amount. > > > It's all not ideal, yes, but it looks like BTRFS is the main user, right > > now, which > > isn't widely enabled. So maybe we can look towards improving upstream a > > bit here, if motivated. > > I can't speak for the BTRFS code as I don't have a suitable platform > to test it on. But the zstd code does appear to be broken on master. > Running the zstd compression test in sandbox or sandbox64 both > segfault. And I have similar problems trying to decompress FIT images > on our AARCH64 platform. The primary motivation to this patch is that > the new version does work. And this seems like the appropriate fix > given Linux has abandoned the original version of this code. Well, I guess in the end, non-functional smaller code is factually worse than larger functional code, so, I'll pick this up post v2023.01, thanks! -- Tom
signature.asc
Description: PGP signature

