Hi,

On 04/08/2019 11:57, Andrey Borodin wrote:


On 2 Aug 2019, at 21:39, Andres Freund <and...@anarazel.de> wrote:

On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
We have a kind of "roadmap" for "extensible pglz". We plan to provide an
implementation for November's CF.

I don't understand why it's a good idea to improve the compression side
of pglz. There are plenty of other people who have spent a lot of time
developing better compression algorithms.
Improving the compression side of pglz covers two different projects:
1. Faster compression with less code and the same compression ratio (the patch
in this thread).
2. Better compression ratio with at least the same compression speed of
uncompressed values.
Why do I want to do a patch for 2? Because it's interesting.
Will 1 or 2 be reviewed or committed? I have no idea.
Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to 
stop compressing with pglz.


FWIW I agree.

Just so that we don't idly talk, what do you think about the attached?
It:
- adds a new GUC compression_algorithm with possible values of pglz (default)
and lz4 (if lz4 is compiled in); changing it requires SIGHUP
- adds a --with-lz4 configure option (default yes, so the configure option is
actually --without-lz4) that enables lz4, using the system library
- uses compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz already supported it)
- supports reading old TOAST values
- adds a 1-byte header to the compressed data where we currently store the
algorithm kind, which leaves us with 254 more to add :) (that's extra
overhead compared to the current state)
- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from a table whose rows were stored with different
algorithms (so that the GUC itself can be freely changed)
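
As an illustration of the header items in the list above, here is a minimal sketch in C of packing a 31-bit raw size together with a format bit, followed by a 1-byte algorithm id. It is not taken from the attached patch: every name, the choice of the top bit as the format flag, and the id values 0 and 1 are assumptions made for the example. It also assumes old-format values always have the top bit clear, since their raw size is below 2^31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and layout; the actual patch may differ. */
#define SKETCH_NEW_FORMAT_BIT  UINT32_C(0x80000000)  /* top bit set: new format */
#define SKETCH_RAWSIZE_MASK    UINT32_C(0x7FFFFFFF)  /* low 31 bits: raw size */

/* Assumed id values stored in the 1-byte header of new-format data. */
typedef enum
{
    SKETCH_COMPRESSION_PGLZ = 0,
    SKETCH_COMPRESSION_LZ4 = 1
} SketchCompressionId;

/* Pack a raw size into 31 bits and mark the value as new format. */
static uint32_t
sketch_pack_rawsize(uint32_t rawsize)
{
    assert(rawsize <= SKETCH_RAWSIZE_MASK);
    return rawsize | SKETCH_NEW_FORMAT_BIT;
}

/* True if the size word carries the new-format flag. */
static bool
sketch_is_new_format(uint32_t word)
{
    return (word & SKETCH_NEW_FORMAT_BIT) != 0;
}

/* Extract the raw size; this works for both old and new format words. */
static uint32_t
sketch_rawsize(uint32_t word)
{
    return word & SKETCH_RAWSIZE_MASK;
}

/*
 * Pick the algorithm for a stored value. Old-format values have no
 * algorithm byte and are always pglz; new-format values keep the id
 * in the first byte of the compressed data.
 */
static SketchCompressionId
sketch_algorithm(uint32_t word, const uint8_t *data)
{
    if (!sketch_is_new_format(word))
        return SKETCH_COMPRESSION_PGLZ;
    return (SketchCompressionId) data[0];
}
```

Because the decision is made per value from the stored bits, a reader built this way would not need to consult the GUC at all, which is what allows rows written under different settings to coexist in one table.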
That's cool. I suggest defaulting to lz4 if it is available. Once a cluster
has used lz4, you cannot start it on non-lz4 binaries.
Do we plan to allow compression algorithms as extensions? Or will all
algorithms be packed into that byte in core?

What I wrote does not expect extensions providing new compression. We'd have to somehow reserve compression ids for specific extensions and that seems like a lot of extra complexity for little benefit. I don't see much benefit in having more than, say, 3 generic compressors (I could imagine adding zstd). If you are thinking about data type specific compression then I think this is the wrong layer.

What about lz4 "common prefix"? System or user-defined. If lz4 is compiled in 
we can even offer in-system training, just make sure that trained prefixes will make 
their way to standbys.


I definitely don't plan to work on common prefix. But I don't see why that could not be added later.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

