On 22.02.2021 08:38, Craig Ringer wrote:


On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov, <usernam...@yandex-team.ru> wrote:


    3. Chunked compression allows us to compress only well-compressible
    messages and save CPU cycles by not compressing the others
    4. Chunked compression introduces some traffic overhead compared
    to permanent compression (1.2810G vs 1.2761G TX data on pg_restore
    of the IMDB database dump, according to the results in my previous
    message)
    5. From the protocol point of view, chunked compression seems a
    little bit more flexible:
     - we can inject some uncompressed messages at any time without
    the need to decompress/compress the compressed data
     - we can potentially switch the compression algorithm at any time
    (but I think that this might be over-engineering)


Chunked compression also potentially makes it easier to handle non-blocking sockets, because you aren't worrying about yet another layer of buffering within the compression stream. This is a real pain with SSL, for example.

It simplifies protocol analysis.

It permits compression to be decided on the fly, heuristically, based on message size and potential compressibility.
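For illustration, a minimal sketch of such a heuristic in C; the 'D'/'d' tags are the real DataRow/CopyData message codes, but should_compress() and the threshold are names invented for this example, not taken from the patch:

#include <stdbool.h>
#include <stddef.h>

#define COMPRESS_MIN_LEN 60     /* below this, frame overhead usually wins */

/* Decide per message whether compression is likely to pay off. */
static bool
should_compress(char msg_type, size_t msg_len)
{
    /* Tiny messages compress poorly and still pay the per-chunk header. */
    if (msg_len < COMPRESS_MIN_LEN)
        return false;

    /* Only bulk data-carrying messages tend to be worth the CPU. */
    switch (msg_type)
    {
        case 'D':               /* DataRow */
        case 'd':               /* CopyData */
            return true;
        default:
            return false;
    }
}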

It could relatively easily be extended to compress a group of pending small messages, e.g. by PQflush. That'd help mitigate the downsides with small messages.
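A rough sketch of that batching idea, using zlib's one-shot compress() and a made-up OutBuffer API (the framing details are illustrative, not from the patch): messages are queued unsent, and the flush path wraps everything accumulated so far in a single compressed frame.

#include <stdint.h>
#include <string.h>
#include <zlib.h>

typedef struct
{
    uint8_t data[65536];
    size_t  used;
} OutBuffer;

/* Queue one wire message (type byte + int32 length + payload) unsent. */
static int
queue_message(OutBuffer *buf, char type, const void *payload, uint32_t len)
{
    uint32_t netlen = len + 4;  /* the length field counts itself */

    if (buf->used + 1 + 4 + len > sizeof(buf->data))
        return -1;              /* caller should flush first */
    buf->data[buf->used++] = (uint8_t) type;
    buf->data[buf->used++] = (uint8_t) (netlen >> 24);
    buf->data[buf->used++] = (uint8_t) (netlen >> 16);
    buf->data[buf->used++] = (uint8_t) (netlen >> 8);
    buf->data[buf->used++] = (uint8_t) netlen;
    memcpy(buf->data + buf->used, payload, len);
    buf->used += len;
    return 0;
}

/* At flush time, compress all queued messages as one frame.
 * frame must hold at least compressBound(buf->used) bytes. */
static int
flush_compressed(OutBuffer *buf, uint8_t *frame, uLongf *frame_len)
{
    int rc = compress(frame, frame_len, buf->data, buf->used);

    if (rc == Z_OK)
        buf->used = 0;
    return rc;
}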

So while stream compression offers better compression ratios, I'm inclined to suspect we'll want message level compression.
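The flexibility points quoted above follow directly from the framing: every compressed chunk is just another self-describing message on the wire, so uncompressed messages can be interleaved freely and the algorithm could in principle change per chunk. A hypothetical wrapper, purely for illustration (the 'z' tag, the CompressedData name and raw_send() are all invented here, not the patch's actual framing):

#include <stdint.h>
#include <stdio.h>

/* Stand-in transport primitive for the sketch. */
static void
raw_send(const void *buf, size_t len)
{
    fwrite(buf, 1, len, stdout);
}

/* Send one hypothetical CompressedData chunk:
 *   'z' | int32 length | algorithm byte | compressed payload
 * Anything not wrapped this way travels as an ordinary protocol message. */
static void
send_compressed_chunk(uint8_t algo, const uint8_t *payload, uint32_t payload_len)
{
    uint8_t  header[6];
    uint32_t len = 4 + 1 + payload_len;   /* length counts itself + algo byte */

    header[0] = 'z';
    header[1] = (uint8_t) (len >> 24);
    header[2] = (uint8_t) (len >> 16);
    header[3] = (uint8_t) (len >> 8);
    header[4] = (uint8_t) len;
    header[5] = algo;                     /* per-chunk algorithm selector */

    raw_send(header, sizeof(header));
    raw_send(payload, payload_len);
}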

From my point of view there are several use cases where protocol compression can be useful:
1. Replication
2. Backup/dump
3. Bulk load (COPY)
4. Queries returning large objects (json, blobs,...)

All these cases are controlled by the user or DBA, so they can decide whether to use compression or not. Switching compression on the fly or using different algorithms in different directions is not needed. Yes, in all these scenarios data is mostly transferred in one direction, so compression of the small messages going in the opposite direction is not strictly needed. But benchmarks show that it has almost no influence on performance and CPU usage. So I suggest not complicating the protocol and the implementation, and instead implementing the functionality which is present in most other DBMSes.
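For example, assuming a libpq "compression" connection option along the lines of what this patch series proposes (an unpatched libpq will reject it as an unknown connection parameter), enabling it only for such bulk sessions could look like:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "compression=on" is assumed from this patch series, not released
     * libpq; ordinary OLTP connections would simply omit the option. */
    PGconn *conn = PQconnectdb("host=localhost dbname=imdb compression=on");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));

    /* ... run COPY / pg_restore-style bulk traffic here ... */

    PQfinish(conn);
    return 0;
}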

There is no sense in trying compression on workloads like pgbench and drawing conclusions from it; from my point of view that is obvious misuse. Compressing each message individually, or chunked compression, significantly decreases the compression ratio, because the typical message size is not large enough, and resetting the compression state after processing each message (clearing the compression dictionary) adds too much overhead.
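A toy zlib program makes this concrete (made-up message contents, numbers will vary): it compresses the same small message many times, once with the state reset after every message and once as a continuous stream with per-message sync flushes.

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define NMSG 1000

int
main(void)
{
    const char *msg = "SELECT id, name FROM users WHERE id = 42;";
    unsigned char out[256];
    z_stream zs;
    unsigned long reset_total = 0, stream_total = 0;

    /* Per-message: the dictionary is cleared after every message. */
    memset(&zs, 0, sizeof(zs));
    deflateInit(&zs, Z_DEFAULT_COMPRESSION);
    for (int i = 0; i < NMSG; i++)
    {
        zs.next_in = (unsigned char *) msg;
        zs.avail_in = strlen(msg);
        zs.next_out = out;
        zs.avail_out = sizeof(out);
        deflate(&zs, Z_FINISH);
        reset_total += sizeof(out) - zs.avail_out;
        deflateReset(&zs);      /* throw the dictionary away */
    }
    deflateEnd(&zs);

    /* Streaming: one deflate stream, flushed per message; Z_SYNC_FLUSH
     * emits complete output but keeps the dictionary. */
    memset(&zs, 0, sizeof(zs));
    deflateInit(&zs, Z_DEFAULT_COMPRESSION);
    for (int i = 0; i < NMSG; i++)
    {
        zs.next_in = (unsigned char *) msg;
        zs.avail_in = strlen(msg);
        zs.next_out = out;
        zs.avail_out = sizeof(out);
        deflate(&zs, Z_SYNC_FLUSH);
        stream_total += sizeof(out) - zs.avail_out;
    }
    deflateEnd(&zs);

    printf("per-message reset: %lu bytes, streaming: %lu bytes\n",
           reset_total, stream_total);
    return 0;
}

After the first message the streaming variant emits only a few bytes per message, while the reset variant pays the full cost every time.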
