On Wed, Mar 09, 2022 at 01:24:55PM +0000, Dave Jones wrote:
> Hi Michael (and others),
> 
> Julian's summarised this near perfectly, but I'll try and add a little
> detail from the data I've gathered [1] (with others' generous help, in
> particular Heinrich for the RISC-V bits):
> 
> On Wed, Mar 09, 2022 at 09:46:19AM +0100, Julian Andres Klode wrote:
> > On Wed, Mar 09, 2022 at 02:10:57PM +1300, Michael Hudson-Doyle wrote:
> [snip]
> > > > some time ago, the default compressor for initramfs was changed
> > > > from lz4 -9 to zstd -19. This caused significant problems:
> > > >
> > > 
> > > Exactly three months later... we still haven't taken any action on this.
> > > Time to do something!
> 
> Agreed!
> 
> > > I have a few questions below but tl;dr: unless there are immediate
> > > objections, I'm going to make a change to initramfs-tools to allow the
> > > compression level to be configured and set the default to 12 for zstd.
> > 
> > So xypron had a patch to change the default level to 9 for sponsoring
> > out for a couple of months now (no idea how that level came up).
> > 
> > We pushed back on that as it does not account for low-memory systems
> > which we need to take care of as well.
> 
> Yes, zstd -12 is not safe on low memory systems. In particular it fails to
> even run successfully on an otherwise entirely idle and unloaded Pi Zero 2
> (which, on arm64, has a little more than 200MB of RAM free at runtime,
> whilst zstd -T0 -12 on a PC requested ~415MB resident at runtime).
> 
> In fact, I'm not convinced *any* of zstd's levels are actually useful on a
> machine with as limited RAM as the Zero 2 (or 3A+) are. For example, with
> ~200MB of RAM free, if the user is running a daemon that eats, say, 100MB
> resident and we start up a compressor that eats 50MB (as zstd -T0 does at
> level -1) we stand a fair chance of pushing the daemon into OOM.
> 
> Now, in practice this doesn't actually matter right now as I've already
> overridden initramfs' default to lz4 in ubuntu-raspi-settings, however I
> think there are adjustments that should be made there too (and there is the
> question of whether this is relevant for, say, minimal memory cloud
> instances).
> 
> > We then postponed any implementation to after a discussion in
> > Frankfurt.
> > 
> > I think the summary from the Frankfurt discussion was:
> > 
> > - lz4 -1 is the right choice for low-memory systems
> > - if you have more memory, zstd -1 becomes the best choice
> > - pigz is outperforming both a bunch of times
> > 
> > But that's really for waveform to share.
> 
> Just to clarify a couple of things here:
> 
> Firstly I actually think lz4 -2 is probably the ideal level for that
> compressor. There's a large difference in compression performance between
> lz4 -1 and lz4 -2 across all platforms tested, but no difference in memory
> usage, and only a minimal increase in compression & decompression time.
> However, lz4 is currently configured to use level -9 which takes a
> considerable amount of extra time for little to no gain in compression
> performance (at least with our initramfs inputs anyway).
> 
> On machines with more generous RAM allowances, zstd -T0 -1 does appear to be
> the ideal. The incremental gains in compression at higher levels are
> outweighed by the extra time spent compressing (i.e. for our initramfs
> inputs at least, the extra time spent on the compression is not gained back
> on reading the compressed data at I/O speeds typical for their respective
> platforms).
> 
> [snipped some data]
> 
> At this point, if you want some data to play with I'd highly recommend
> cloning the following repo and following the instructions in the README:
> 
> https://github.com/waveform80/compression

I'm still not convinced by the data, as it does not align at all with
what I see on my laptop. Or does it? It certainly does not _feel_ like
it, as I was arguing that -12 makes most sense.

I did some remeasurements.

Compression levels:

uncompressed        157MB
lz4 -2               75MB (42%)
lz4 -9               63MB (40%)
zstd -T0 -1          56MB (36%)
zstd -T0 -2          52MB (33%)
zstd -T0 -3          47MB (30%)
zstd -T0 -6          45MB (29%)
zstd -T0 -12         40MB (22%)

I don't know where 19 is, but a switch to lz4 -2 would roughly
double the size, that's for sure, so how would this affect
/boot size?

Looking at size, zstd clearly is the correct choice. If we reverted
to lz4 -2, sizes would even grow relative to the older lz4 -9 choice,
meaning those users upgrading from focal could run out of boot space.
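
To put numbers on that, here is a rough Python sketch that compares each
result against the old lz4 -9 size. It only uses the rounded MB figures
from the table above; the names SIZES_MB and change_vs_baseline are made
up for illustration.

```python
# Rounded initrd sizes in MB, copied from the table above.
SIZES_MB = {
    "lz4 -2": 75,
    "lz4 -9": 63,
    "zstd -T0 -1": 56,
    "zstd -T0 -2": 52,
    "zstd -T0 -3": 47,
    "zstd -T0 -6": 45,
    "zstd -T0 -12": 40,
}


def change_vs_baseline(sizes, baseline):
    """Return each size's relative change against sizes[baseline].

    A value of 0.1 means 10% bigger than the baseline, -0.1 means 10% smaller.
    """
    base = sizes[baseline]
    return {name: size / base - 1 for name, size in sizes.items()}


if __name__ == "__main__":
    for name, delta in change_vs_baseline(SIZES_MB, "lz4 -9").items():
        print(f"{name:<14} {delta:+.0%}")
```

With these inputs lz4 -2 comes out roughly 19% larger than lz4 -9, while
zstd -T0 -1 comes out roughly 11% smaller.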

Ignoring non-LTS users for a moment, we essentially need to find
a compressor that accommodates the size increase in kernel initramfs
due to new code and stuff, and I think zstd -1 does that reasonably
well.

Times spent (compressor/total update-initramfs)
                   user        system      total
lz4 -2             0.3/ 6.2s   0.1/ 2.6s   0.3/ 8.2s (3% of update-initramfs time)
lz4 -9             4.8/10.8s   0.1/ 2.6s   4.9/12.9s <- this is totally silly
zstd -T0 -1        0.7/ 5.6s   0.1/ 1.7s   0.2/ 6.2s (um, faster than lz4?)
zstd -T0 -1        0.7/ 7.1s   0.1/ 3.5s   0.2/ 9.3s (um, much slower in 2nd run)
zstd -T0 -2        0.9/ 7.1s   0.1/ 3.0s   0.3/ 8.8s (more noise than difference)
zstd -T0 -3        1.6/ 7.8s   0.1/ 2.9s   0.5/ 8.8s
zstd     -3        0.9/ 7.2s   0.1/ 3.1s   0.8/ 9.5s
zstd     -3        0.9/ 7.7s   0.1/ 3.8s   0.8/10.7s (noise, lots of noise)
zstd -T0 -6        6.2/12.8s   0.1/ 3.9s   1.7/11.4s
zstd -T0 -12      13.1/19.7s   0.2/ 3.4s   4.0/13.0s

It shows us that looking at the compressor alone does not tell us the
whole story; for low-level zstd and lz4 values, you will absolutely not
notice the time spent compressing; in fact, there is more noise from
I/O or whatever despite the laptop essentially idling.

There's no way I can figure out if zstd -3 performs worse than zstd
-T0 -1, as its runtime varies by 50%.
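
For anyone wanting to repeat this with more than two runs per setting,
a minimal Python sketch of how one could quantify that noise. The
function names are hypothetical, and "do the ranges overlap" is a crude
check I picked for illustration, not a proper statistical test.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def relative_spread(durations):
    """(max - min) / min: how much slower the slowest run is than the fastest."""
    return (max(durations) - min(durations)) / min(durations)


def distinguishable(a, b):
    """True only if the two sets of timings do not overlap at all."""
    return max(a) < min(b) or max(b) < min(a)


def summarize(durations):
    """Return (median, relative spread) for a list of timings."""
    return statistics.median(durations), relative_spread(durations)
```

Fed the two zstd -T0 -1 totals from the table (6.2s and 9.3s),
relative_spread gives about 0.5, and the single zstd -T0 -2 total (8.8s)
falls inside that range, so distinguishable reports False for the pair.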

We also need to consider initrds we prebuild on images, and things like
combined kernel.efi binaries: they are built once and used hundreds of
thousands of times, so they need *special* configuration.

But my conclusion now is that I think zstd -1 or zstd -2 or whatever is probably
a safe choice for users coming from focal in that it does not grow their
initrds, so it's probably a good default.

I do not like zstd -T0 at high levels as that means you steal too much
CPU time from your real software. If at all, higher levels would have
to be niced I guess.
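
A minimal sketch of what "niced" could look like, in Python for
illustration only. This is not what initramfs-tools does today, and the
helper names are made up. It assumes nice and zstd are on PATH and relies
on zstd acting as a stdin-to-stdout filter when given no file arguments.

```python
import subprocess


def niced_zstd_argv(level, niceness=19, threads=0):
    """Build an argv that runs zstd as a filter at lowered CPU priority."""
    return [
        "nice", "-n", str(niceness),   # higher niceness == lower priority
        "zstd", "-q", f"-T{threads}",  # -T0 lets zstd pick the thread count
        f"-{level}",
    ]


def compress_file(src, dst, level):
    """Compress src into dst using the niced command line above."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        subprocess.run(niced_zstd_argv(level), stdin=fin, stdout=fout,
                       check=True)
```

Note that nice only affects CPU scheduling; it does nothing about the
memory the compressor needs.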

One thing we should work on is performing the compression in parallel
to the CPIO building, this should reduce I/O wait times and offer
more meaningful parallelization. But not sure how feasible that is
- I don't just mean cpio | compressor, but also running the scripts,
and copying them to the output, more like scripts | cpio files from
stdin | compress.
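
To illustrate the shape of that, a small Python sketch that starts every
stage at once and connects them with pipes, so a later stage can work
while an earlier one is still producing output. run_pipeline is a
hypothetical helper, not existing initramfs-tools code, and it says
nothing about how the hook scripts would have to be restructured.

```python
import subprocess


def run_pipeline(stages, out_path):
    """Run all argv lists in `stages` concurrently, stdout piped to stdin.

    The last stage writes to out_path. Returns the exit codes in order.
    """
    procs = []
    with open(out_path, "wb") as out:
        prev_stdout = None
        for i, argv in enumerate(stages):
            last = i == len(stages) - 1
            proc = subprocess.Popen(
                argv,
                stdin=prev_stdout,
                stdout=out if last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Drop our copy of the pipe so the upstream process sees
                # a broken pipe if this stage exits early.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        return [p.wait() for p in procs]
```

As an illustrative call (the stage commands here are my assumption):
run_pipeline([["sh", "-c", "find . | cpio -o -H newc"],
["zstd", "-q", "-T0", "-1"]], "initrd.img").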

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en

-- 
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel