On 22.02.2021 08:38, Craig Ringer wrote:


On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov, <usernam...@yandex-team.ru> wrote:


    3. Chunked compression allows us to compress only well-compressible
    messages and save CPU cycles by not compressing the others
    4. Chunked compression introduces some traffic overhead compared
    to permanent compression (1.2810G vs 1.2761G TX data on pg_restore
    of the IMDB database dump, according to the results in my previous
    message)
    5. From the protocol point of view, chunked compression seems a
    little bit more flexible:
     - we can inject some uncompressed messages at any time without
    the need to decompress/compress the compressed data
     - we can potentially switch the compression algorithm at any time
    (but I think that this might be over-engineering)


Chunked compression also potentially makes it easier to handle non-blocking sockets, because you aren't worrying about yet another layer of buffering within the compression stream. This is a real pain with SSL, for example.

It simplifies protocol analysis.

It permits compression to be decided on the fly, heuristically, based on message size and potential compressibility.
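For illustration, a minimal sketch of such a heuristic in C; the 'D'/'d' tags are the real DataRow/CopyData message codes, but should_compress() and the threshold are names invented for this example, not taken from the patch:

#include <stdbool.h>
#include <stddef.h>

#define COMPRESS_MIN_LEN 60     /* below this, frame overhead usually wins */

/* Decide per message whether compression is likely to pay off. */
static bool
should_compress(char msg_type, size_t msg_len)
{
    /* Tiny messages compress poorly and still pay the per-chunk header. */
    if (msg_len < COMPRESS_MIN_LEN)
        return false;

    /* Only bulk data-carrying messages tend to be worth the CPU. */
    switch (msg_type)
    {
        case 'D':               /* DataRow */
        case 'd':               /* CopyData */
            return true;
        default:
            return false;
    }
}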

It could relatively easily be extended to compress a group of pending small messages, e.g. by PQflush. That'd help mitigate the downsides with small messages.
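A rough sketch of that batching idea, using zlib's one-shot compress() and a made-up OutBuffer API (the framing details are illustrative, not from the patch): messages are queued unsent, and the flush path wraps everything accumulated so far in a single compressed frame.

#include <stdint.h>
#include <string.h>
#include <zlib.h>

typedef struct
{
    uint8_t data[65536];
    size_t  used;
} OutBuffer;

/* Queue one wire message (type byte + int32 length + payload) unsent. */
static int
queue_message(OutBuffer *buf, char type, const void *payload, uint32_t len)
{
    uint32_t netlen = len + 4;  /* the length field counts itself */

    if (buf->used + 1 + 4 + len > sizeof(buf->data))
        return -1;              /* caller should flush first */
    buf->data[buf->used++] = (uint8_t) type;
    buf->data[buf->used++] = (uint8_t) (netlen >> 24);
    buf->data[buf->used++] = (uint8_t) (netlen >> 16);
    buf->data[buf->used++] = (uint8_t) (netlen >> 8);
    buf->data[buf->used++] = (uint8_t) netlen;
    memcpy(buf->data + buf->used, payload, len);
    buf->used += len;
    return 0;
}

/* At flush time, compress all queued messages as one frame.
 * frame must hold at least compressBound(buf->used) bytes. */
static int
flush_compressed(OutBuffer *buf, uint8_t *frame, uLongf *frame_len)
{
    int rc = compress(frame, frame_len, buf->data, buf->used);

    if (rc == Z_OK)
        buf->used = 0;
    return rc;
}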

So while stream compression offers better compression ratios, I'm inclined to suspect we'll want message level compression.
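The flexibility points quoted above follow directly from the framing: every compressed chunk is just another self-describing message on the wire, so uncompressed messages can be interleaved freely and the algorithm could in principle change per chunk. A hypothetical wrapper, purely for illustration (the 'z' tag, the CompressedData name and raw_send() are all invented here, not the patch's actual framing):

#include <stdint.h>
#include <stdio.h>

/* Stand-in transport primitive for the sketch. */
static void
raw_send(const void *buf, size_t len)
{
    fwrite(buf, 1, len, stdout);
}

/* Send one hypothetical CompressedData chunk:
 *   'z' | int32 length | algorithm byte | compressed payload
 * Anything not wrapped this way travels as an ordinary protocol message. */
static void
send_compressed_chunk(uint8_t algo, const uint8_t *payload, uint32_t payload_len)
{
    uint8_t  header[6];
    uint32_t len = 4 + 1 + payload_len;   /* length counts itself + algo byte */

    header[0] = 'z';
    header[1] = (uint8_t) (len >> 24);
    header[2] = (uint8_t) (len >> 16);
    header[3] = (uint8_t) (len >> 8);
    header[4] = (uint8_t) len;
    header[5] = algo;                     /* per-chunk algorithm selector */

    raw_send(header, sizeof(header));
    raw_send(payload, payload_len);
}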

From my point of view there are several use cases where protocol compression can be useful:
1. Replication
2. Backup/dump
3. Bulk load (COPY)
4. Queries returning large objects (json, blobs,...)

All these cases are controlled by the user or DBA, so they can decide whether to use compression or not. Switching compression on the fly or using different algorithms in different directions is not needed. Yes, in all these scenarios data is mostly transferred in one direction, so compression of the small messages going in the opposite direction is not strictly needed. But benchmarks show that it has almost no influence on performance and CPU usage. So I suggest not complicating the protocol and the implementation, and instead implementing the functionality which is present in most other DBMSes.
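For example, assuming a libpq "compression" connection option along the lines of what this patch series proposes (an unpatched libpq will reject it as an unknown connection parameter), enabling it only for such bulk sessions could look like:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "compression=on" is assumed from this patch series, not released
     * libpq; ordinary OLTP connections would simply omit the option. */
    PGconn *conn = PQconnectdb("host=localhost dbname=imdb compression=on");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));

    /* ... run COPY / pg_restore-style bulk traffic here ... */

    PQfinish(conn);
    return 0;
}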

There is no sense in trying compression on workloads like pgbench and drawing conclusions from it; from my point of view that is obvious misuse. Compressing each message individually, or chunked compression, significantly decreases the compression ratio, because the typical message size is not large enough, and resetting the compression state after processing each message (clearing the compression dictionary) adds too much overhead.
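A toy zlib program makes this concrete (made-up message contents, numbers will vary): it compresses the same small message many times, once with the state reset after every message and once as a continuous stream with per-message sync flushes.

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define NMSG 1000

int
main(void)
{
    const char *msg = "SELECT id, name FROM users WHERE id = 42;";
    unsigned char out[256];
    z_stream zs;
    unsigned long reset_total = 0, stream_total = 0;

    /* Per-message: the dictionary is cleared after every message. */
    memset(&zs, 0, sizeof(zs));
    deflateInit(&zs, Z_DEFAULT_COMPRESSION);
    for (int i = 0; i < NMSG; i++)
    {
        zs.next_in = (unsigned char *) msg;
        zs.avail_in = strlen(msg);
        zs.next_out = out;
        zs.avail_out = sizeof(out);
        deflate(&zs, Z_FINISH);
        reset_total += sizeof(out) - zs.avail_out;
        deflateReset(&zs);      /* throw the dictionary away */
    }
    deflateEnd(&zs);

    /* Streaming: one deflate stream, flushed per message; Z_SYNC_FLUSH
     * emits complete output but keeps the dictionary. */
    memset(&zs, 0, sizeof(zs));
    deflateInit(&zs, Z_DEFAULT_COMPRESSION);
    for (int i = 0; i < NMSG; i++)
    {
        zs.next_in = (unsigned char *) msg;
        zs.avail_in = strlen(msg);
        zs.next_out = out;
        zs.avail_out = sizeof(out);
        deflate(&zs, Z_SYNC_FLUSH);
        stream_total += sizeof(out) - zs.avail_out;
    }
    deflateEnd(&zs);

    printf("per-message reset: %lu bytes, streaming: %lu bytes\n",
           reset_total, stream_total);
    return 0;
}

After the first message the streaming variant emits only a few bytes per message, while the reset variant pays the full cost every time.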
