Hello!

I apologize for replying so late. I kept my computer off for a few
months, so I was away from email too.

On 2020-08-01 Bernhard M. Wiedemann wrote:
> While working on reproducible builds for openSUSE, I found that
> xz --threads=0 produces different output on 1-core-VMs as seen with:
> for n in 2 3 ; do echo | taskset $n xz --threads=0 -c - | md5sum ;
> done
> 
> Is there a simpler way to get reproducible output
> while still taking advantage of parallel processing than
> https://github.com/openSUSE/obs-service-recompress/pull/17 ?

There's no easy and clean way, and obviously that should be fixed.

With the current versions you need a workaround. For example, to always
use at least two threads:

    T=$(expr $(nproc --ignore=1) + 1)
    xz --threads="$T" ...
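
The same idea as a small POSIX shell function, as a rough sketch only.
The name xz_threads is made up for this example, and the fallback to 1
when nproc is missing or lacks --ignore is an assumption, not something
xz itself does:

```shell
# Hypothetical helper: print a thread count that is never below 2.
xz_threads() {
    # "nproc --ignore=1" prints at least 1, so the sum is at least 2.
    # Assumption: if nproc is unavailable or fails, behave as if it
    # had printed 1.
    n=$(nproc --ignore=1 2>/dev/null) || n=1
    echo $((n + 1))
}

# Example use (file names are placeholders):
#   xz --threads="$(xz_threads)" -c input > input.xz
```

On a single-core VM this prints 2, so xz stays in multithreaded mode
there as well.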

If it is essential to use only one thread, there is an ugly workaround
that abuses how xz's memory limiting feature scales down the number of
threads:

    M=$(expr $(nproc) \* 200)M
    xz -6 -T0 --memlimit="$M" ...

Or:

    M=$(expr $(nproc) \* 1300)M
    xz -9 -T0 --memlimit="$M" ...

A per-core multiplier of 200-300 works for xz -6, and 1300-2400 works
for xz -9. In theory these values could change in a future version.
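
The two cases can be folded into one helper. This is only a sketch: the
name xz_memlimit and the optional second argument (a core count that
overrides nproc, handy for testing) are invented for the example, and
the multipliers are the lower bounds mentioned above, which may stop
working in a future xz version:

```shell
# Hypothetical helper: print a --memlimit value for a given preset.
# Usage: xz_memlimit PRESET [CORES]
xz_memlimit() {
    preset=$1
    # Assumption: default to the number of available cores.
    cores=${2:-$(nproc)}
    case $preset in
        6) per_core=200 ;;   # lower bound of the 200-300 range
        9) per_core=1300 ;;  # lower bound of the 1300-2400 range
        *) echo "no known multiplier for preset $preset" >&2
           return 1 ;;
    esac
    echo "$((cores * per_core))M"
}

# Example use (file names are placeholders):
#   xz -6 -T0 --memlimit="$(xz_memlimit 6)" -c input > input.xz
```

Other presets are rejected on purpose because only -6 and -9 have known
working values here.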

> Arch Linux devs also noticed this:
> https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029520.html

Arch switched to zstd so I guess it doesn't matter there so much
anymore.

As long as users' Internet connections are fast enough (10-20 Mbit/s)
and package managers first download the packages and then decompress
them (instead of doing both in parallel), zstd results in as fast or
faster download + install time. So unless users' connections are slower
than 10 Mbit/s or the monetary cost per megabyte is significant, zstd
is better than xz in package manager use.

Note that Arch used the default "xz -6" which uses an 8 MiB dictionary
while the zstd-compressed packages currently in Arch use a 32 MiB
dictionary. With some packages a bigger dictionary results in a big
improvement with both compressors. However, this doesn't change the big
picture much and the above paragraph is still true. Implementing
threaded decompression would help xz but only with big packages.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
