On Thu, Jul 27, 2023, Adrien Nader wrote:
> On Thu, Jul 27, 2023, Michael Hudson-Doyle wrote:
> > On Thu, 27 Jul 2023 at 09:21, Benjamin Drung <bdr...@ubuntu.com> wrote:
> > 
> > > On Wed, 2023-07-26 at 17:53 +0200, Benjamin Drung wrote:
> > > > Hi all,
> > > >
> > > > A few weeks ago, I posted an idea how to reduce the initramfs size and
> > > > speed up the generation:
> > > >
> > > > https://lists.ubuntu.com/archives/ubuntu-devel/2023-July/042652.html
> > > >
> > > > This post sparked a lively discussion. The initial idea was ditched for
> > > > a better solution: mkinitramfs will put all compressed files (kernel
> > > > modules and firmware files) into a cpio archive that is not compressed
> > > > (because compressing compressed files makes no sense). All other files
> > > > will be added to a cpio archive that gets compressed. As next steps, the
> > > > kernel modules and firmware files need to be shipped compressed.
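
For anyone skimming the thread, the split described above can be sketched roughly as follows. This is only an illustration in Python, not the actual initramfs-tools code (which is shell); the suffix list and the function name `split_for_cpio` are my own assumptions.

```python
from pathlib import PurePosixPath

# Assumed suffixes for files that are already compressed. The real list
# used by initramfs-tools may differ.
COMPRESSED_SUFFIXES = (".zst", ".xz", ".gz")


def split_for_cpio(paths):
    """Partition paths into (already_compressed, needs_compression).

    The first list would go into the uncompressed cpio archive, the
    second into the cpio archive that gets compressed.
    """
    already_compressed = []
    needs_compression = []
    for path in paths:
        if PurePosixPath(path).suffix in COMPRESSED_SUFFIXES:
            already_compressed.append(path)
        else:
            needs_compression.append(path)
    return already_compressed, needs_compression
```

Both lists keep the input order, so the resulting archives stay deterministic for a given file list.
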
> > > >
> > > > After several iterations of the implementation and review by Dave
> > > > Jones, I just uploaded initramfs-tools 0.142ubuntu8 to mantic that puts
> > > > compressed kernel modules and firmware files in an uncompressed cpio
> > > > (https://launchpad.net/bugs/2028567).
> > > >
> > > > I created/updated the follow-up tickets and added my patches to them:
> > > >
> > > > Ship kernel modules Zstd compressed
> > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2028568
> > > >
> > > > compress firmware in /lib/firmware
> > > > https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1942260
> > > >
> > > > And without further ado, here come the benchmark results:
> > > >
> > > > The benchmarks were done either on an AMD Ryzen 7 5700G with schroot and
> > > > overlay on tmpfs or on the hardware mentioned. All tests were running
> > > > the latest Ubuntu mantic development release.
> > > >
> > > > * minimal: schroot with linux-image-generic initramfs-tools zstd
> > > > * full: minimal + busybox-initramfs cryptsetup-initramfs
> > > >   isc-dhcp-client kbd lvm2 mdadm ntfs-3g plymouth plymouth-theme-spinner
> > > > * nvidia: full + linux-headers-generic nvidia-driver-525
> > > > * nvidia fw: nvidia + compressed /lib/firmware/nvidia/525.125.06/
> > > > * VisionFive 2: VisionFive 2 RISC-V board
> > > > * RPi Zero 2: Raspberry Pi Zero 2 ARM board (running armhf)
> > > >
> > > > "next" means using kernel/firmware/initramfs from ppa:bdrung/ppa i.e.
> > > > * initramfs-tools 0.142ubuntu7bd4
> > > > * linux 6.3.0-7.7bd2
> > > > * linux-firmware 20230629.gitee91452d-0ubuntu1bd1
> > > >
> > > > > |                | build   | size               | uncompressed size  |
> > > > > | test           | time    | in bytes  | in MiB | in bytes  | in MiB |
> > > > > |----------------|---------|-----------|--------|-----------|--------|
> > > > > | minimal        | 4.30 s  |  66701818 |  63.6  | 224087608 | 213.7  |
> > > > > | minimal next   | 4.54 s  |  59935186 |  57.2  |  67738810 |  64.6  |
> > > > > | full           | 7.15 s  | 118007038 | 112.5  | 387976843 | 370.0  |
> > > > > | full next      | 7.29 s  | 106937908 | 102.0  | 128610985 | 122.7  |
> > > > > | nvidia         | 7.04 s  | 209200523 | 199.5  | 513554279 | 489.8  |
> > > > > | nvidia next    | 7.21 s  | 195246287 | 186.2  | 235288174 | 224.4  |
> > > > > | nvidia fw next | 7.16 s  | 191329102 | 182.5  | 213078234 | 203.2  |
> > > > > | VisionFive 2   | 142.9 s | 121895035 | 116.2  | 411160836 | 392.1  |
> > > > > | VF 2 next      | 126.7 s | 111651453 | 106.5  | 134120804 | 127.9  |
> > > > > | RPi Zero 2     | 109.5 s |  39803044 |  38.0  |  69592789 |  66.4  |
> > > > > | RPi Zero 2 ²   | 103.5 s |  39804499 |  38.0  |  69592789 |  66.4  |
> > > > > | RPi Zero 2 next| 101.2 s |  31463352 |  30.0  |  41145762 |  39.2  |
> > > >
> > > > ² Updated initramfs-tools (but no compressed modules or firmware)
> > > >
> > > > The build time was averaged over five runs.
> > > >
> > > > > | improvement  | build time | size   | uncompressed size |
> > > > > |--------------|------------|--------|-------------------|
> > > > > | minimal      |  105.6 %   | 89.9 % |      30.2 %       |
> > > > > | full         |  102.0 %   | 90.6 % |      33.1 %       |
> > > > > | nvidia       |  101.7 %   | 91.5 % |      41.5 %       |
> > > > > | VisionFive 2 |   88.7 %   | 91.6 % |      32.6 %       |
> > > > > | RPi Zero 2   |   92.4 %   | 79.0 % |      59.1 %       |
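
As a sanity check on how I read the "improvement" table: the percentages look like new value divided by old value, so anything below 100 % is a gain. A small sketch that recomputes two rows from the figures in the first table (the helper name `ratio_percent` is mine; the nvidia row seems to compare "nvidia" against "nvidia fw next", which is my inference from the numbers):

```python
def ratio_percent(new, old):
    """Return new/old as a percentage rounded to one decimal place."""
    return round(100.0 * new / old, 1)


# minimal vs. minimal next (build time in s, sizes in bytes)
minimal = (
    ratio_percent(4.54, 4.30),
    ratio_percent(59935186, 66701818),
    ratio_percent(67738810, 224087608),
)

# nvidia vs. nvidia fw next (inferred pairing, see above)
nvidia = (
    ratio_percent(7.16, 7.04),
    ratio_percent(191329102, 209200523),
    ratio_percent(213078234, 513554279),
)

print(minimal)
print(nvidia)
```
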
> > > >
> > > > Building the initramfs takes more CPU cycles (see tests on tmpfs), but
> > > > saves time on disk IO. Dave Jones saw much bigger time savings on his
> > > > Raspberry Pis, but his tests were on lunar.
> > > >
> > > > Build time influence:
> > > > + add_directories plus uniq take several milliseconds
> > > > + depmod on compressed kernel modules takes hundreds of
> > > >   milliseconds longer
> > > > - copying smaller kernel modules (due to compression) is faster
> > > > - cpio archive that needs to be compressed is smaller
> > > > - not storing intermediate cpio archives saves time
> > > >
> > > > Saving 10 to 20 percent on the initramfs size and only needing half or a
> > > > third of the size when unpacked (i.e. needed memory during boot) is a
> > > > good improvement.
> > >
> > > The smaller initramfs overall size (less to load into memory and unpack)
> > > and the smaller compressed cpio (less to decompress) have a positive
> > > effect on the boot speed, especially on systems with slow CPU and/or
> > > slow IO.
> > >
> > > When looking at the "kernel" time from systemd-analyze, the improvement
> > > ranges from 1.62s - 1.36s = 0.26s in a VM on my desktop to a heavily
> > > noticeable 37.9s - 16.5s = 21.4s on the VisionFive 2 RISC-V board.
> > >
> > 
> > This is good stuff. It's a bit of a shame that the build time for the
> > initramfs hasn't improved much. I guess it's not as dominated by
> > compression time as I thought?
> 
> Compression is typically slower on inputs that compress poorly. That can
> explain some of the low speedup.
> 
> > Do you have any thoughts about making it faster? I know I once ran 'strace
> > -ff mkinitramfs' and ended up with tens or hundreds of thousands of trace
> > files so not having everything done by a billion tiny shell scripts would
> > help, but I don't know how much.
> 
> Without performance data it's too early to discuss that, but if that's the
> case, that raises the question of how much effort we would want to put
> into improving the scripts, and whether we would rather optimize the
> shell script or rewrite it (or parts of it).
> 
> In any case, what would be an acceptable overhead for the script
> compared to compression and I/O? Something like 20 or 30% at most?
> 
> > I wonder if we can make depmod incremental somehow?
> 
> I find that hundreds of milliseconds sounds like a lot. On my laptop,
> modules are ~100KB on average, so that would put the decompression
> speed below 500KB/s, while it's closer to 1.5GB/s on my laptop. An
> RPi Zero 2 is much slower, but not that much slower. I wouldn't be
> surprised if there is low-hanging fruit, and maybe some
> initialization work that is done repeatedly.

I now realize that I'm not sure I properly understood your benchmark
environment. Was it on your laptop/desktop or on Raspberry Pis? Also,
was depmod taking hundreds of milliseconds longer in total or per file?
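
To spell out why the answer matters, here is the back-of-the-envelope arithmetic as a rough sketch. The 0.2 s of extra time and the count of 1000 modules are placeholders I made up for illustration, not measurements; only the ~100KB average module size comes from my laptop.

```python
def implied_throughput_kb_s(total_kb, extra_seconds):
    """Decompression throughput implied by spending extra_seconds on total_kb."""
    return total_kb / extra_seconds


AVG_MODULE_KB = 100    # rough average module size on my laptop
EXTRA_SECONDS = 0.2    # placeholder for "hundreds of milliseconds"
MODULE_COUNT = 1000    # hypothetical number of modules

# Reading 1: the extra time is spent on every single module.
per_file = implied_throughput_kb_s(AVG_MODULE_KB, EXTRA_SECONDS)

# Reading 2: the extra time is the total over all modules.
in_total = implied_throughput_kb_s(AVG_MODULE_KB * MODULE_COUNT, EXTRA_SECONDS)

print(f"per file: {per_file:.0f} KB/s")
print(f"in total: {in_total / 1e6:.2f} GB/s")
```

With these placeholder numbers, the per-file reading gives about 500KB/s, which would be suspiciously slow, while the in-total reading gives about 0.5GB/s, which is the same order of magnitude as what I see on my laptop.
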

Thanks,

-- 
Adrien

