Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
On 04/07/2017 03:11 PM, Yuya Nishihara wrote: On Thu, 6 Apr 2017 10:51:36 -0700, Gregory Szorc wrote: Does anyone see any value in adding it as an experimental config option in Mercurial? Or should we drop the feature for now? I'd kinda like it there to make it easier to experiment with. But if others aren't comfortable with it, I totally understand. An experimental config makes sense to me. My only concern was a hard bug in C layer, but that would be okay-ish for experimental features. +1, -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
On Thu, 6 Apr 2017 10:51:36 -0700, Gregory Szorc wrote: > Does anyone see any value in adding it as an experimental config option in > Mercurial? Or should we drop the feature for now? I'd kinda like it there > to make it easier to experiment with. But if others aren't comfortable with > it, I totally understand. An experimental config makes sense to me. My only concern was a hard bug in C layer, but that would be okay-ish for experimental features. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
OK then. Thanks, Yann. Does anyone see any value in adding it as an experimental config option in Mercurial? Or should we drop the feature for now? I'd kinda like it there to make it easier to experiment with. But if others aren't comfortable with it, I totally understand. On Thu, Apr 6, 2017 at 10:26 AM, Yann Colletwrote: > Yes, the multi-threading API is still considered experimental, hence not > yet recommended for production use. > > We are actually working on that currently, and expect to change this > status in a future release. > > > On 06/04/2017, 10:21, "Gregory Szorc" wrote: > > > > > On Apr 6, 2017, at 06:10, Yuya Nishihara wrote: > > > >> On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote: > >> # HG changeset patch > >> # User Gregory Szorc > >> # Date 1491085720 25200 > >> # Sat Apr 01 15:28:40 2017 -0700 > >> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a > >> # Parent 7f469db35d4a303d53b4c7dc141a8378fa67d9e1 > >> util: support multi-threaded zstandard compression for bundles > > > > Is multi-threading support stable enough? I heard someone saying the > code > > quality wasn't great. > > > > https://github.com/facebook/zstd/issues/491#issuecomment-277780684 > > "experimental" in zstd land typically means "unstable API." This isn't > an issue for python-zstandard since it bundles it's on zstd copy and > statically links. > > But just in case the code may not be suitable to run in production > just yet, I've CCd Yann who give an opinion. > > ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
Yes, the multi-threading API is still considered experimental, hence not yet recommended for production use. We are actually working on that currently, and expect to change this status in a future release. On 06/04/2017, 10:21, "Gregory Szorc"wrote: > On Apr 6, 2017, at 06:10, Yuya Nishihara wrote: > >> On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote: >> # HG changeset patch >> # User Gregory Szorc >> # Date 1491085720 25200 >> # Sat Apr 01 15:28:40 2017 -0700 >> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a >> # Parent 7f469db35d4a303d53b4c7dc141a8378fa67d9e1 >> util: support multi-threaded zstandard compression for bundles > > Is multi-threading support stable enough? I heard someone saying the code > quality wasn't great. > > https://github.com/facebook/zstd/issues/491#issuecomment-277780684 "experimental" in zstd land typically means "unstable API." This isn't an issue for python-zstandard since it bundles it's on zstd copy and statically links. But just in case the code may not be suitable to run in production just yet, I've CCd Yann who give an opinion. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
> On Apr 6, 2017, at 06:10, Yuya Nishiharawrote: > >> On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote: >> # HG changeset patch >> # User Gregory Szorc >> # Date 1491085720 25200 >> # Sat Apr 01 15:28:40 2017 -0700 >> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a >> # Parent 7f469db35d4a303d53b4c7dc141a8378fa67d9e1 >> util: support multi-threaded zstandard compression for bundles > > Is multi-threading support stable enough? I heard someone saying the code > quality wasn't great. > > https://github.com/facebook/zstd/issues/491#issuecomment-277780684 "experimental" in zstd land typically means "unstable API." This isn't an issue for python-zstandard since it bundles it's on zstd copy and statically links. But just in case the code may not be suitable to run in production just yet, I've CCd Yann who give an opinion. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc> # Date 1491085720 25200 > # Sat Apr 01 15:28:40 2017 -0700 > # Node ID c171ff452d36d4caeab1887df4da43e45a61022a > # Parent 7f469db35d4a303d53b4c7dc141a8378fa67d9e1 > util: support multi-threaded zstandard compression for bundles Is multi-threading support stable enough? I heard someone saying the code quality wasn't great. https://github.com/facebook/zstd/issues/491#issuecomment-277780684 ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles
# HG changeset patch # User Gregory Szorc# Date 1491085720 25200 # Sat Apr 01 15:28:40 2017 -0700 # Node ID c171ff452d36d4caeab1887df4da43e45a61022a # Parent 7f469db35d4a303d53b4c7dc141a8378fa67d9e1 util: support multi-threaded zstandard compression for bundles python-zstandard 0.8.0 exposes support for multi-threaded compression. This seems like a useful feature to make available to operations that perform streaming compression. This patch implements a bundlespec parameter for the zstandard compression engine that defines how many threads to use for compression. By default, absence of the parameter retains the existing behavior of using a single thread. Although, I could make a compelling argument to have it use as many threads as the machine has cores by default (because why wouldn't you want the operation to complete as fast as possible). That being said, there are some subtle differences with how multi-threaded compression works. So leaving it as a single thread is a reasonable default for the initial feature. On the mozilla-unified repo on my i7-6700K, `hg bundle -a` times for various bundlespecs are as follows: ! zstd-v2;compressionlevel=3 121s wall; 119s user; 1,082,202,822 bytes ! zstd-v2;compressionlevel=3;compressionthreads=4 105s wall; 117s user; 1,088,479,719 bytes ! zstd-v2;compressionlevel=15 492s wall; 489s user; 899,048,902 bytes ! zstd-v2;compressionlevel=15;compressionthreads=4 151s wall; 620s user; 905,597,146 bytes We see multi-threaded compression doesn't really help much for level=3. This is because Mercurial can't deliver data to zstandard fast enough. But for level=15, we see a significant slowdown for single-threaded operation compared to level=3 while multi-threaded compression claws back most of the wall time difference (by throwing more CPU cores at the problem) while preserving most of the ~180 MB size reduction. Not bad. The zstandard format is such that the consumer doesn't know nor care that the producer used multi-threaded compression. There are no significant performance implications on the client from using multi-threaded compression. The only difference is the "deltas" within the compressed stream are slightly different due to the way the compressor splits up work across threads. That's why the size of multi-threaded compressed data is typically a little larger. diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -3616,6 +3616,16 @@ class _zstdengine(compressionengine): decompression. Compression levels higher than ``19`` can require hundreds of megabytes of memory and may exhaust memory in 32-bit processes. + +The ``compressionthreads`` parameters defines the number of threads +to use for compression. The default is to use a single thread. +A negative value will attempt to detect the number of available CPU +cores and use that many threads. For many compression levels, +Mercurial won't be able to send data to the compressor fast enough +for multi-threaded compression to make compression faster. The use +of multi-threaded compression can be combined with a higher +``compressionlevel`` to achieve better compression ratios while +still completing in a reasonable amount of time. """ return 'zstd', 'ZS' @@ -3638,6 +3648,16 @@ class _zstdengine(compressionengine): opts['level'] = level +if 'compressionthreads' in params: +relevant.add('compressionthreads') +try: +threads = int(params['compressionthreads']) +except ValueError: +raise ValueError(_('compressionthreads "%s" is not an ' + 'integer') % params['compressionthreads']) + +opts['threads'] = threads + return relevant, opts def wireprotosupport(self): @@ -3652,9 +3672,10 @@ class _zstdengine(compressionengine): # while providing no worse compression. It strikes a good balance # between speed and compression. level = opts.get('level', 3) +threads = opts.get('threads', 0) zstd = self._module -z = zstd.ZstdCompressor(level=level).compressobj() +z = zstd.ZstdCompressor(level=level, threads=threads).compressobj() for chunk in it: data = z.compress(chunk) if data: diff --git a/tests/test-bundle-type.t b/tests/test-bundle-type.t --- a/tests/test-bundle-type.t +++ b/tests/test-bundle-type.t @@ -238,6 +238,19 @@ compressionlevel parameter is parsed and bundle2-output-bundle: "HG20", (1 params) 1 parts total bundle2-output-part: "changegroup" (params: 1 mandatory 1 advisory) streamed payload +multi-threaded bundlespec parameter is parsed and used +(we can't easily test this has an effect on the output) + + $ hg --debug -R b1 bundle -a -t 'zstd-v2;compressionthreads=2'