On Dec 2, 2017, at 6:04 PM, Tomas Vondra wrote:

> On 12/01/2017 10:52 PM, Andres Freund wrote:
>> On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
>>> Honestly, if we can give everybody a 4% space reduction by
>>> switching to lz4, I think that's totally worth doing -- but let's
>>> not make people choose it, let's make it the default going forward,
>>> and keep pglz support around so we don't break pg_upgrade
>>> compatibility (and so people can continue to choose it if for some
>>> reason it works better in their use case). That kind of improvement
>>> is nothing special in a specific workload, but TOAST is a pretty
>>> general-purpose mechanism. I have become, through a few bitter
>>> experiences, a strong believer in the value of trying to reduce our
>>> on-disk footprint, and knocking 4% off the size of every TOAST
>>> table in the world does not sound worthless to me -- even though
>>> context-aware compression can doubtless do a lot better.
>> 
>> +1. It's also a lot faster, and I've seen way, way too many workloads
>> with 50%+ time spent in pglz.
>> 
> 
> TBH the 4% figure is something I mostly made up (I'm fake news!). On the
> mailing list archive (which I believe is pretty compressible) I observed
> something like 2.5% size reduction with lz4 compared to pglz, at least
> with the compression levels I've used ...
> 
> Other algorithms (e.g. zstd) got significantly better compression (25%)
> compared to pglz, but in exchange for longer compression times. I'm sure
> we could lower the compression level to make it faster, but that will of
> course hurt the compression ratio.
> 
> I don't think switching to a different compression algorithm is a way
> forward - it was proposed and explored repeatedly in the past, and every
> time it failed for a number of reasons, most of which are still valid.
> 
> 
> Firstly, it's going to be quite hard (or perhaps impossible) to find an
> algorithm that is "universally better" than pglz. Some algorithms do
> work better for text documents, some for binary blobs, etc. I don't
> think there's a win-win option.
> 
> Sure, there are workloads where pglz performs poorly (I've seen such
> cases too), but IMHO that's more an argument for the custom compression
> method approach. pglz gives you good default compression in most cases,
> and you can change it for columns where it matters, and where a
> different space/time trade-off makes sense.
> 
> 
> Secondly, all the previous attempts ran into some legal issues, i.e.
> licensing and/or patents. Maybe the situation changed since then (no
> idea, haven't looked into that), but in the past the "pluggable"
> approach was proposed as a way to address this.
> 
> 

Maybe it will be interesting for you to see the following results of applying
page-level compression (CFS in PgPro-EE) to pgbench data:

Configuration          | Size (GB) | Time (sec)
-----------------------+-----------+-----------
vanilla postgres       |     15.31 |         92
zlib (default level)   |      2.37 |        284
zlib (best speed)      |      2.43 |        191
postgres internal lz   |      3.89 |        214
lz4                    |      4.12 |         95
snappy (google)        |      5.18 |         99
lzfse (apple)          |      2.80 |       1099
zstd (facebook)        |      1.69 |        125

All algorithms (except zlib) were used with their best-speed option: using a
higher compression level usually does not improve the compression ratio all
that much (<30%), but can increase compression time several times over.
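
To give an idea of that trade-off in isolation, a rough standalone sketch like
the one below sweeps zstd levels over an arbitrary data file. This is only an
illustration assuming libzstd is installed, not the CFS benchmark; the file
name, level step, and build line are arbitrary:

/*
 * Sketch: sweep zstd compression levels over a file and report how the
 * compression ratio and time scale with the level. Error handling is
 * kept minimal on purpose.
 *
 * Build: cc -O2 zstd_levels.c -lzstd -o zstd_levels
 * Run:   ./zstd_levels some_data_file
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zstd.h>

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* Slurp the whole file into memory. */
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL)
        return 1;
    fseek(f, 0, SEEK_END);
    long src_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *src = malloc(src_size);
    if (fread(src, 1, src_size, f) != (size_t) src_size)
        return 1;
    fclose(f);

    size_t bound = ZSTD_compressBound((size_t) src_size);
    void  *dst = malloc(bound);

    for (int level = 1; level <= ZSTD_maxCLevel(); level += 3)
    {
        struct timespec t0, t1;
        size_t csize;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        csize = ZSTD_compress(dst, bound, src, (size_t) src_size, level);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        if (ZSTD_isError(csize))
            continue;
        printf("level %2d: ratio %.2f, %.3f s\n", level,
               (double) src_size / (double) csize,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    }

    free(dst);
    free(src);
    return 0;
}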
Certainly pgbench is not the best candidate for testing compression
algorithms: it generates a lot of artificial and redundant data. But we also
measured compression on real customer data, and zstd still seems to be the
best compression method: it provides good compression with the smallest CPU
overhead.
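
For anyone who wants to repeat this kind of comparison on their own data
outside the server, a rough standalone sketch along the following lines is
enough to compare lz4, zlib, and zstd at their fastest settings. Again, this
is not the CFS benchmark; the synthetic input and the build line are only
illustrative, assuming liblz4, zlib, and libzstd are installed:

/*
 * Sketch: compare lz4, zlib and zstd at their fastest settings on a
 * synthetic, fairly redundant buffer (vaguely pgbench-like rows).
 *
 * Build: cc -O2 compress_bench.c -llz4 -lz -lzstd -o compress_bench
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include <lz4.h>
#include <zlib.h>
#include <zstd.h>

static double
elapsed_sec(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    size_t  src_size = 64 * 1024 * 1024;
    char   *src = calloc(1, src_size);
    struct timespec t0, t1;

    /* Fill the buffer with redundant, record-like data. */
    for (size_t i = 0; i + 64 <= src_size; i += 64)
        snprintf(src + i, 64, "aid=%zu bid=%zu abalance=0 filler", i, i % 100);

    /* lz4, default (fast) mode */
    {
        int   bound = LZ4_compressBound((int) src_size);
        char *dst = malloc(bound);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        int csize = LZ4_compress_default(src, dst, (int) src_size, bound);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("lz4:  %d bytes, %.3f s\n", csize, elapsed_sec(t0, t1));
        free(dst);
    }

    /* zlib, best-speed level */
    {
        uLongf dlen = compressBound(src_size);
        Bytef *dst = malloc(dlen);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        compress2(dst, &dlen, (const Bytef *) src, src_size, Z_BEST_SPEED);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("zlib: %lu bytes, %.3f s\n", (unsigned long) dlen,
               elapsed_sec(t0, t1));
        free(dst);
    }

    /* zstd, level 1 */
    {
        size_t bound = ZSTD_compressBound(src_size);
        void  *dst = malloc(bound);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t csize = ZSTD_compress(dst, bound, src, src_size, 1);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("zstd: %zu bytes, %.3f s\n", csize, elapsed_sec(t0, t1));
        free(dst);
    }

    free(src);
    return 0;
}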
