Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-07 Thread Pierre-Yves David



On 04/07/2017 03:11 PM, Yuya Nishihara wrote:

On Thu, 6 Apr 2017 10:51:36 -0700, Gregory Szorc wrote:

Does anyone see any value in adding it as an experimental config option in
Mercurial? Or should we drop the feature for now? I'd kinda like it there
to make it easier to experiment with. But if others aren't comfortable with
it, I totally understand.


An experimental config makes sense to me. My only concern was a hard bug
in C layer, but that would be okay-ish for experimental features.


+1,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-07 Thread Yuya Nishihara
On Thu, 6 Apr 2017 10:51:36 -0700, Gregory Szorc wrote:
> Does anyone see any value in adding it as an experimental config option in
> Mercurial? Or should we drop the feature for now? I'd kinda like it there
> to make it easier to experiment with. But if others aren't comfortable with
> it, I totally understand.

An experimental config makes sense to me. My only concern was a hard bug
in C layer, but that would be okay-ish for experimental features.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-06 Thread Gregory Szorc
OK then. Thanks, Yann.

Does anyone see any value in adding it as an experimental config option in
Mercurial? Or should we drop the feature for now? I'd kinda like it there
to make it easier to experiment with. But if others aren't comfortable with
it, I totally understand.

On Thu, Apr 6, 2017 at 10:26 AM, Yann Collet  wrote:

> Yes, the multi-threading API is still considered experimental, hence not
> yet recommended for production use.
>
> We are actually working on that currently, and expect to change this
> status in a future release.
>
>
> On 06/04/2017, 10:21, "Gregory Szorc"  wrote:
>
>
>
> > On Apr 6, 2017, at 06:10, Yuya Nishihara  wrote:
> >
> >> On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote:
> >> # HG changeset patch
> >> # User Gregory Szorc 
> >> # Date 1491085720 25200
> >> #  Sat Apr 01 15:28:40 2017 -0700
> >> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a
> >> # Parent  7f469db35d4a303d53b4c7dc141a8378fa67d9e1
> >> util: support multi-threaded zstandard compression for bundles
> >
> > Is multi-threading support stable enough? I heard someone saying the
> code
> > quality wasn't great.
> >
> > https://github.com/facebook/zstd/issues/491#issuecomment-277780684
>
> "experimental" in zstd land typically means "unstable API." This isn't
> an issue for python-zstandard since it bundles it's on zstd copy and
> statically links.
>
> But just in case the code may not be suitable to run in production
> just yet, I've CCd Yann who give an opinion.
>
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-06 Thread Yann Collet
Yes, the multi-threading API is still considered experimental, hence not yet 
recommended for production use.

We are actually working on that currently, and expect to change this status in 
a future release.


On 06/04/2017, 10:21, "Gregory Szorc"  wrote:



> On Apr 6, 2017, at 06:10, Yuya Nishihara  wrote:
> 
>> On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1491085720 25200
>> #  Sat Apr 01 15:28:40 2017 -0700
>> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a
>> # Parent  7f469db35d4a303d53b4c7dc141a8378fa67d9e1
>> util: support multi-threaded zstandard compression for bundles
> 
> Is multi-threading support stable enough? I heard someone saying the code
> quality wasn't great.
> 
> https://github.com/facebook/zstd/issues/491#issuecomment-277780684

"experimental" in zstd land typically means "unstable API." This isn't an 
issue for python-zstandard since it bundles it's on zstd copy and statically 
links.

But just in case the code may not be suitable to run in production just 
yet, I've CCd Yann who give an opinion.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-06 Thread Gregory Szorc


> On Apr 6, 2017, at 06:10, Yuya Nishihara  wrote:
> 
>> On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1491085720 25200
>> #  Sat Apr 01 15:28:40 2017 -0700
>> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a
>> # Parent  7f469db35d4a303d53b4c7dc141a8378fa67d9e1
>> util: support multi-threaded zstandard compression for bundles
> 
> Is multi-threading support stable enough? I heard someone saying the code
> quality wasn't great.
> 
> https://github.com/facebook/zstd/issues/491#issuecomment-277780684

"experimental" in zstd land typically means "unstable API." This isn't an issue 
for python-zstandard since it bundles it's on zstd copy and statically links.

But just in case the code may not be suitable to run in production just yet, 
I've CCd Yann who give an opinion.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-06 Thread Yuya Nishihara
On Sat, 01 Apr 2017 15:31:59 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc 
> # Date 1491085720 25200
> #  Sat Apr 01 15:28:40 2017 -0700
> # Node ID c171ff452d36d4caeab1887df4da43e45a61022a
> # Parent  7f469db35d4a303d53b4c7dc141a8378fa67d9e1
> util: support multi-threaded zstandard compression for bundles

Is multi-threading support stable enough? I heard someone saying the code
quality wasn't great.

https://github.com/facebook/zstd/issues/491#issuecomment-277780684
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 8 of 9] util: support multi-threaded zstandard compression for bundles

2017-04-01 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1491085720 25200
#  Sat Apr 01 15:28:40 2017 -0700
# Node ID c171ff452d36d4caeab1887df4da43e45a61022a
# Parent  7f469db35d4a303d53b4c7dc141a8378fa67d9e1
util: support multi-threaded zstandard compression for bundles

python-zstandard 0.8.0 exposes support for multi-threaded compression.
This seems like a useful feature to make available to operations
that perform streaming compression.

This patch implements a bundlespec parameter for the zstandard
compression engine that defines how many threads to use for
compression. By default, absence of the parameter retains the
existing behavior of using a single thread. Although, I could
make a compelling argument to have it use as many threads as the
machine has cores by default (because why wouldn't you want the
operation to complete as fast as possible). That being said,
there are some subtle differences with how multi-threaded
compression works. So leaving it as a single thread is a
reasonable default for the initial feature.

On the mozilla-unified repo on my i7-6700K, `hg bundle -a` times
for various bundlespecs are as follows:

! zstd-v2;compressionlevel=3
121s wall; 119s user; 1,082,202,822 bytes
! zstd-v2;compressionlevel=3;compressionthreads=4
105s wall; 117s user; 1,088,479,719 bytes
! zstd-v2;compressionlevel=15
492s wall; 489s user; 899,048,902 bytes
! zstd-v2;compressionlevel=15;compressionthreads=4
151s wall; 620s user; 905,597,146 bytes

We see multi-threaded compression doesn't really help much for level=3.
This is because Mercurial can't deliver data to zstandard fast enough.
But for level=15, we see a significant slowdown for single-threaded
operation compared to level=3 while multi-threaded compression claws
back most of the wall time difference (by throwing more CPU cores at
the problem) while preserving most of the ~180 MB size reduction. Not
bad.

The zstandard format is such that the consumer doesn't know nor care
that the producer used multi-threaded compression. There are no
significant performance implications on the client from using
multi-threaded compression. The only difference is the "deltas" within
the compressed stream are slightly different due to the way the
compressor splits up work across threads. That's why the size of
multi-threaded compressed data is typically a little larger.

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -3616,6 +3616,16 @@ class _zstdengine(compressionengine):
 decompression. Compression levels higher than ``19`` can require
 hundreds of megabytes of memory and may exhaust memory in 32-bit
 processes.
+
+The ``compressionthreads`` parameters defines the number of threads
+to use for compression. The default is to use a single thread.
+A negative value will attempt to detect the number of available CPU
+cores and use that many threads. For many compression levels,
+Mercurial won't be able to send data to the compressor fast enough
+for multi-threaded compression to make compression faster. The use
+of multi-threaded compression can be combined with a higher
+``compressionlevel`` to achieve better compression ratios while
+still completing in a reasonable amount of time.
 """
 return 'zstd', 'ZS'
 
@@ -3638,6 +3648,16 @@ class _zstdengine(compressionengine):
 
 opts['level'] = level
 
+if 'compressionthreads' in params:
+relevant.add('compressionthreads')
+try:
+threads = int(params['compressionthreads'])
+except ValueError:
+raise ValueError(_('compressionthreads "%s" is not an '
+   'integer') % params['compressionthreads'])
+
+opts['threads'] = threads
+
 return relevant, opts
 
 def wireprotosupport(self):
@@ -3652,9 +3672,10 @@ class _zstdengine(compressionengine):
 # while providing no worse compression. It strikes a good balance
 # between speed and compression.
 level = opts.get('level', 3)
+threads = opts.get('threads', 0)
 
 zstd = self._module
-z = zstd.ZstdCompressor(level=level).compressobj()
+z = zstd.ZstdCompressor(level=level, threads=threads).compressobj()
 for chunk in it:
 data = z.compress(chunk)
 if data:
diff --git a/tests/test-bundle-type.t b/tests/test-bundle-type.t
--- a/tests/test-bundle-type.t
+++ b/tests/test-bundle-type.t
@@ -238,6 +238,19 @@ compressionlevel parameter is parsed and
   bundle2-output-bundle: "HG20", (1 params) 1 parts total
   bundle2-output-part: "changegroup" (params: 1 mandatory 1 advisory) streamed 
payload
 
+multi-threaded bundlespec parameter is parsed and used
+(we can't easily test this has an effect on the output)
+
+  $ hg --debug -R b1 bundle -a -t 'zstd-v2;compressionthreads=2'