Re: [xz-devel] [PATCH] xz: Multithreaded mode now always uses stream_encoder_mt to ensure reproducible builds

2021-11-30 Thread Jia Tan
> To be backward compatible, maybe it needs extra syntax within the
> --threads option or a new command line option. Both are a bit annoying
> and ugly but I don't have a better idea.

I added a flag to the threading option to force multithreading or
single threading modes. Right now, if the user specifies -T s4, xz
will ignore the 4 threads requested and do single threaded mode.
Should I issue a warning message, cause an error, or is this the
preferred behavior?

> common.h is internal to liblzma and must not be used from xz. Maybe
> LZMA_THREADS_MAX could be moved to the public API, I don't know right
> now.

I put the hardcoded value back.

---
src/xz/args.c | 38 +++---
src/xz/coder.c | 16 +---
src/xz/coder.h | 3 +++
src/xz/message.c | 7 +--
4 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/src/xz/args.c b/src/xz/args.c
index 9238fb3..2fe98cd 100644
--- a/src/xz/args.c
+++ b/src/xz/args.c
@@ -115,6 +115,40 @@ parse_block_list(char *str)
return;
}
+static void
+parse_threading(char* optarg){
+ char multithreaded_mode = optarg[0];
+ bool threading_specified = false;
+
+ if(multithreaded_mode == 'm' || multithreaded_mode == 's'){
+ threading_specified = true;
+ optarg++;
+ }
+
+ // The max is from src/liblzma/common/common.h.
+ uint64_t threads_requested = str_to_uint64("threads",
+ optarg, 0, 16384);
+ hardware_threads_set(threads_requested);
+
+ if(threading_specified){
+ if(multithreaded_mode == 'm'){
+ set_multithreaded_mode(true);
+ }
+ else if(multithreaded_mode == 's') {
+ set_multithreaded_mode(false);
+ }
+ }
+ else {
+ //Default for --threads=1 is single threaded mode
+ if(threads_requested == 1){
+ set_multithreaded_mode(false);
+ }
+ //Default for --threads=0 or --threads=[n>1] is multi threaded mode
+ else {
+ set_multithreaded_mode(true);
+ }
+ }
+}
static void
parse_real(args_info *args, int argc, char **argv)
@@ -247,9 +281,7 @@ parse_real(args_info *args, int argc, char **argv)
break;
case 'T':
- // The max is from src/liblzma/common/common.h.
- hardware_threads_set(str_to_uint64("threads",
- optarg, 0, 16384));
+ parse_threading(optarg);
break;
// --version
diff --git a/src/xz/coder.c b/src/xz/coder.c
index 85f9543..8dfdeb2 100644
--- a/src/xz/coder.c
+++ b/src/xz/coder.c
@@ -51,6 +51,9 @@ static lzma_check check;
/// This becomes false if the --check=CHECK option is used.
static bool check_default = true;
+/// Flag to indicate multithreaded compression
+static bool multithreaded_mode = false;
+
#if defined(HAVE_ENCODERS) && defined(MYTHREAD_ENABLED)
static lzma_mt mt_options = {
.flags = 0,
@@ -211,10 +214,11 @@ coder_set_compression_settings(void)
}
}
- if (hardware_threads_get() > 1) {
+ if (multithreaded_mode) {
message(V_WARNING, _("Switching to single-threaded "
"mode due to --flush-timeout"));
hardware_threads_set(1);
+ set_multithreaded_mode(false);
}
}
@@ -225,7 +229,7 @@ coder_set_compression_settings(void)
if (opt_mode == MODE_COMPRESS) {
#ifdef HAVE_ENCODERS
# ifdef MYTHREAD_ENABLED
- if (opt_format == FORMAT_XZ && hardware_threads_get() > 1) {
+ if (opt_format == FORMAT_XZ && (hardware_threads_get() > 1 ||
multithreaded_mode)) {
mt_options.threads = hardware_threads_get();
mt_options.block_size = opt_block_size;
mt_options.check = check;
@@ -446,7 +450,7 @@ coder_init(file_pair *pair)
case FORMAT_XZ:
# ifdef MYTHREAD_ENABLED
- if (hardware_threads_get() > 1)
+ if (multithreaded_mode)
ret = lzma_stream_encoder_mt(
&strm, &mt_options);
else
@@ -933,6 +937,12 @@ coder_run(const char *filename)
return;
}
+extern void
+set_multithreaded_mode(bool mode)
+{
+ multithreaded_mode = mode;
+}
+
#ifndef NDEBUG
extern void
diff --git a/src/xz/coder.h b/src/xz/coder.h
index 583da8f..178f036 100644
--- a/src/xz/coder.h
+++ b/src/xz/coder.h
@@ -70,6 +70,9 @@ extern void coder_set_compression_settings(void);
/// Compress or decompress the given file
extern void coder_run(const char *filename);
+ Set multithread mode true/false
+extern void set_multithreaded_mode(bool mode);
+
#ifndef NDEBUG
/// Free the memory allocated for the coder and kill the worker threads.
extern void coder_free(void);
diff --git a/src/xz/message.c b/src/xz/message.c
index 00eb65b..ceb1cdd 100644
--- a/src/xz/message.c
+++ b/src/xz/message.c
@@ -1159,8 +1159,11 @@ message_help(bool long_help)
" does not affect decompressor memory requirements"));
puts(_(
-" -T, --threads=NUM use at most NUM threads; the default is 1; set to 0\n"
-" to use as many threads as there are processor cores"));
+" -T[m|s],\n"
+" --threads[m|s]=NUM use at most NUM threads; the default is 1; set to 0\n"
+" to use as many threads as there are processor cores\n"
+" if s is set, force single threaded mode; if m i set,\n"
+" force multithreaded mode"));
if (long_help) {
puts(_(
-- 
2.25.1
From f855507add2f7d2e1822584f76c7ad56fe7d249d Mon Sep 17 00:00:00 2001
From: jiat75 
Date: Tue, 30 Nov 2021 22:07:46 +0800
Subject: [PATCH] Multithreaded mode now always uses stream_encoder_mt

---
 s

Re: [xz-devel] [PATCH] xz: Multithreaded mode now always uses stream_encoder_mt to ensure reproducible builds

2021-11-29 Thread Lasse Collin
On 2021-11-29 Jia Tan wrote:
> This patch addresses the issues with reproducible builds when using
> multithreaded xz. Previously, specifying --threads=1 instead of
> --threads=[n>1] creates different output. Now, setting any number of  
> threads forces multithreading mode, even if there is only 1 worker
> thread.

This is an old problem that should have been fixed long ago.
Unfortunately I think the fix needs to be a little more complex due to
backward compatibility.

With this patch, if threading has been enabled, no further option on
the command line (except --flush-timeout) will disable threading.
Sometimes there are default options (for exampe, XZ_DEFAULTS) that
enable threading and one wants to disable it in a specific situation
(like running multiple xz commands in parallel via xargs). If
--threads=1 always enables threading, memory usage will be quite a bit
higher than in non-threaded mode (94 MiB vs. 166 MiB for the default
compression level -6; 674 MiB vs. 1250 MiB for -9).

To be backward compatible, maybe it needs extra syntax within the
--threads option or a new command line option. Both are a bit annoying
and ugly but I don't have a better idea.

Currently one-thread multi-threading is done if one specifies two or
more threads but the memory limit is so low that only one thread can be
used. In that case xz will never switch to non-threaded mode. This
ensures that the output file is always the same even if the number of
threads gets reduced.

When -T0 is used, that is broken in sense that threading mode (and
thus encoded output) depends on how many hardware threads are supported.
So perhaps -T0 should mean that multi-threaded mode must be used even
for single thread (your patch would do this too).

A way to explicitly specify one-thread multi-threaded mode is still
needed but I guess it wouldn't need to be used so often if -T0 handles
it already. -T0 needs improvements in default memory usage limiting too,
and both changes could make the default behavior better.

The opposite functionality could be made available too: if the number
of threads becomes one for whatever reason, an option could tell xz to
always use single-threaded mode to get better compression and to save
RAM.

> +#include "common.h"
[...]
> // The max is from src/liblzma/common/common.h.
> hardware_threads_set(str_to_uint64("threads",
> - optarg, 0, 16384));
> + optarg, 0, LZMA_THREADS_MAX));

common.h is internal to liblzma and must not be used from xz. Maybe
LZMA_THREADS_MAX could be moved to the public API, I don't know right
now.

-- 
Lasse Collin