On 2021-11-29 Jia Tan wrote:
> This patch addresses the issues with reproducible builds when using
> multithreaded xz. Previously, specifying --threads=1 instead of
> --threads=[n>1] creates different output. Now, setting any number of
> threads forces multithreading mode, even if there is only 1 worker
> thread.
This is an old problem that should have been fixed long ago. Unfortunately, I think the fix needs to be a little more complex due to backward compatibility.

With this patch, if threading has been enabled, no further option on the command line (except --flush-timeout) will disable threading. Sometimes default options (for example, XZ_DEFAULTS) enable threading, and one wants to disable it in a specific situation (like running multiple xz commands in parallel via xargs). If --threads=1 always enables threading, memory usage will be quite a bit higher than in non-threaded mode (94 MiB vs. 166 MiB for the default compression level -6; 674 MiB vs. 1250 MiB for -9).

To be backward compatible, maybe it needs extra syntax within the --threads option or a new command line option. Both are a bit annoying and ugly, but I don't have a better idea.

Currently, one-thread multi-threaded mode is used if one specifies two or more threads but the memory limit is so low that only one thread can be used. In that case xz will never switch to non-threaded mode. This ensures that the output file is always the same even if the number of threads gets reduced.

When -T0 is used, this is broken in the sense that the threading mode (and thus the encoded output) depends on how many hardware threads are supported. So perhaps -T0 should mean that multi-threaded mode must be used even with a single thread (your patch would do this too). A way to explicitly specify one-thread multi-threaded mode is still needed, but I guess it wouldn't need to be used so often if -T0 already handles it.

-T0 needs improvements in default memory usage limiting too, and both changes could make the default behavior better.

The opposite functionality could be made available too: if the number of threads becomes one for whatever reason, an option could tell xz to always use single-threaded mode to get better compression and to save RAM.

> +#include "common.h"

[...]

>  // The max is from src/liblzma/common/common.h.
>  hardware_threads_set(str_to_uint64("threads",
> -        optarg, 0, 16384));
> +        optarg, 0, LZMA_THREADS_MAX));

common.h is internal to liblzma and must not be used from xz. Maybe LZMA_THREADS_MAX could be moved to the public API; I don't know right now.

-- 
Lasse Collin