There is a set of benchmarks comparing these algorithms (lz4, zstd, zlib,
etc.) and their tradeoffs here:

http://facebook.github.io/zstd/

cheers
Mark

On Mon, Dec 13, 2021 at 10:14 PM Nathan Hartman <hartman.nat...@gmail.com>
wrote:

> On Mon, Dec 13, 2021 at 2:31 PM Luke Mauldin <lukemaul...@icloud.com>
> wrote:
> >
> > From reading the documentation, I can see that Subversion 1.14 supports
> > both zlib and LZ4 compression. I am running Subversion on FreeBSD 13.x on
> > ZFS, which supports native zstd compression. Some of the repos I host are
> > relatively large (60K revisions and 60 GB+), and I am wondering what
> > combination will give me the best performance. Currently, I have
> > Subversion compression disabled and ZFS with zstd compression enabled. In
> > this setup, ZFS reports a compression ratio of 1.69x. I would think that
> > native zstd support in Subversion would be best, but since it does not
> > exist, I just wanted to see if anyone had recommendations.
>
>
> As I understand it, the motivation for adding LZ4 compression in 1.10
> was speed. From vague memory (I haven't looked into
> compression algorithms recently), I think zlib achieves a better
> compression ratio in terms of disk space saved, but LZ4 is faster. I
> haven't had experience with zstd yet.
>
> It is difficult to say which compression format would give the "best"
> performance for a particular application without some experimentation
> because things like hardware I/O speeds and the nature of the data
> being compressed affect the outcome.
>
> Are you looking for the best speed, the best compression ratio, or a
> good tradeoff between the two?
>
> If you want to conserve disk space, I would suggest (if it's feasible,
> and on a separate machine, not in production) producing a dumpfile and
> loading it twice, once with zlib and once with LZ4, and then comparing
> the resulting on-disk sizes with that of your current repository on the
> zstd-compressed ZFS volume. Note Subversion's data deduplication
> feature (representation sharing): if it was turned off in the past or
> is off now, some or all of your repo might contain duplicated data; to
> make the experiment "fair" you would need to take this into account.
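>
> In case it helps, a rough sketch of that dump/load experiment, scripted
> in Python, might look like the following. The repository paths are
> placeholders, and the fsfs.conf details (a "compression" option under
> [deltification], available since 1.10) are from memory, so check them
> against the comments in your repository's db/fsfs.conf.
>
> #!/usr/bin/env python3
> # Sketch only: compares the on-disk repository size after loading the
> # same dumpfile with different FSFS compression settings. Assumes
> # "svnadmin" is on PATH; all paths below are placeholders.
> import subprocess
> from pathlib import Path
>
> SOURCE_REPO = Path("/srv/svn/myrepo")        # existing repository
> DUMPFILE = Path("/tmp/myrepo.dump")          # scratch dumpfile
> WORKDIR = Path("/tmp/svn-compression-test")  # test repositories go here
>
> def disk_usage(path):
>     """Total size of all files under 'path', in bytes."""
>     return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
>
> WORKDIR.mkdir(parents=True, exist_ok=True)
>
> # 1. Dump the source repository once.
> with DUMPFILE.open("wb") as out:
>     subprocess.run(["svnadmin", "dump", str(SOURCE_REPO)],
>                    stdout=out, check=True)
>
> # 2. Load it into a fresh repository per compression setting.
> for algo in ("zlib", "lz4", "none"):
>     repo = WORKDIR / f"repo-{algo}"
>     subprocess.run(["svnadmin", "create", str(repo)], check=True)
>
>     # Appending a new [deltification] section is a shortcut; editing
>     # the existing section in db/fsfs.conf by hand is tidier.
>     with (repo / "db" / "fsfs.conf").open("a") as conf:
>         conf.write(f"\n[deltification]\ncompression = {algo}\n")
>
>     with DUMPFILE.open("rb") as dump:
>         subprocess.run(["svnadmin", "load", "--quiet", str(repo)],
>                        stdin=dump, check=True)
>
>     print(f"{algo:>5}: {disk_usage(repo) / 1e9:.2f} GB on disk")
>
> You would then compare those numbers with what "du" or "zfs list"
> reports for the dataset that holds your current repository on zstd.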
>
> If you are looking for the best performance in terms of speed, I don't
> have a simple answer, because it depends on a great many variables, of
> which Subversion's compression is only one. I would assume that network
> I/O probably plays a bigger role than compression here.
>
> Hope this helps,
> Nathan
>


-- 
Mark Mc Keown, Developer
E: mark.mcke...@wandisco.com
