On 22.02.2021 08:38, Craig Ringer wrote:
On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov
<usernam...@yandex-team.ru> wrote:
3. Chunked compression allows compressing only well-compressible
messages, saving CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared
to permanent (stream) compression (1.2810G vs 1.2761G TX data on a
pg_restore of the IMDB database dump, per the results in my previous
message)
5. From the protocol point of view, chunked compression seems a
little bit more flexible:
- we can inject some uncompressed messages at any time without
the need to decompress/compress the compressed data
- we can potentially switch the compression algorithm at any time
(but I think that this might be over-engineering)
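The flexibility points above can be sketched as a per-message frame carrying a one-byte flag, so compressed and uncompressed messages interleave freely. The layout and names here are illustrative only, not the actual patch's wire format:

```python
import struct
import zlib

COMPRESSED, UNCOMPRESSED = 1, 0

def pack_message(payload: bytes, compress: bool) -> bytes:
    """Frame one message: 1-byte flag + 4-byte body length + body."""
    body = zlib.compress(payload) if compress else payload
    flag = COMPRESSED if compress else UNCOMPRESSED
    return struct.pack("!BI", flag, len(body)) + body

def unpack_message(frame: bytes) -> bytes:
    """Inverse of pack_message for a single frame."""
    flag, length = struct.unpack_from("!BI", frame)
    body = frame[5:5 + length]
    return zlib.decompress(body) if flag == COMPRESSED else body

# An uncompressed message can be injected right after a compressed one,
# with no need to touch the already-compressed data:
stream = pack_message(b"x" * 1000, compress=True) + \
         pack_message(b"\x00\x01", compress=False)
```

Because each frame is self-describing, a protocol analyzer can also decode messages independently, which is the analysis benefit mentioned below.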
Chunked compression also potentially makes it easier to handle
non-blocking sockets, because you aren't worrying about yet another layer
of buffering within the compression stream. This is a real pain with
SSL, for example.
It simplifies protocol analysis.
It permits compression to be decided on the fly, heuristically, based
on message size and potential compressibility.
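Such an on-the-fly heuristic might look like the sketch below; the threshold value and function name are assumptions for illustration, not taken from the patch:

```python
import zlib

MIN_COMPRESS_SIZE = 128  # illustrative threshold, not from the patch

def maybe_compress(payload: bytes) -> tuple[bool, bytes]:
    """Per-message decision: skip small messages outright, and keep the
    compressed form only when it actually came out smaller."""
    if len(payload) < MIN_COMPRESS_SIZE:
        return False, payload
    compressed = zlib.compress(payload)
    if len(compressed) >= len(payload):
        # Incompressible payload (e.g. an already-compressed blob).
        return False, payload
    return True, compressed
```

The trial-compression step costs CPU on incompressible payloads, so a real implementation might instead sample the first bytes or rely on the message type.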
It could relatively easily be extended to compress a group of pending
small messages, e.g. by PQflush. That'd help mitigate the downsides
with small messages.
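A sketch of that batching idea, assuming a hypothetical send buffer that is flushed explicitly (this is not libpq code; the class and helper names are invented):

```python
import struct
import zlib

class SendBuffer:
    """Queue small messages and compress them together at flush time,
    in the spirit of compressing a group of pending messages at PQflush."""
    def __init__(self):
        self.pending = []

    def put(self, msg: bytes):
        self.pending.append(msg)

    def flush(self) -> bytes:
        # Length-prefix each message so the receiver can split the batch.
        batch = b"".join(struct.pack("!I", len(m)) + m
                         for m in self.pending)
        self.pending.clear()
        return zlib.compress(batch)

def split_batch(chunk: bytes):
    """Decompress one flushed chunk back into its individual messages."""
    data = zlib.decompress(chunk)
    msgs, off = [], 0
    while off < len(data):
        (n,) = struct.unpack_from("!I", data, off)
        off += 4
        msgs.append(data[off:off + n])
        off += n
    return msgs
```

Batching this way gives the compressor one larger input instead of many tiny ones, recovering some of the ratio that per-message compression loses.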
So while stream compression offers better compression ratios, I'm
inclined to suspect we'll want message level compression.
From my point of view there are several use cases where protocol
compression can be useful:
1. Replication
2. Backup/dump
3. Bulk load (COPY)
4. Queries returning large objects (json, blobs,...)
All these cases are controlled by the user or DBA, so they can decide
whether to use compression or not.
Switching compression on the fly or using different algorithms in
different directions is not needed.
Yes, in all these scenarios data is mostly transferred in one direction,
so compressing the small messages going in the opposite direction is not
strictly needed.
But benchmarks show that it has almost no influence on performance or
CPU usage.
So I suggest not complicating the protocol and implementation, and
instead implementing the functionality that is present in most other DBMSes.
There is no sense in trying compression on workloads like pgbench and
drawing conclusions from it; from my point of view that is obvious misuse.
Compressing each message individually (chunked compression)
significantly decreases the compression ratio, because the typical
message size is not large enough,
and resetting the compression state after each message (clearing the
compression dictionary) adds too much overhead.
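The ratio loss from resetting the compression state per message is easy to demonstrate with zlib; the workload below is synthetic, chosen only to resemble many small, similar protocol messages:

```python
import zlib

# 1000 small, similar messages (synthetic stand-in for protocol traffic)
messages = [f"INSERT INTO t VALUES ({i}, 'row data {i}')".encode()
            for i in range(1000)]

# Per-message compression: a fresh compressor, i.e. an empty dictionary,
# for every message.
per_message = sum(len(zlib.compress(m)) for m in messages)

# Stream compression: one compressor for the whole session, so the
# dictionary built from earlier messages helps compress later ones.
c = zlib.compressobj()
stream = sum(len(c.compress(m)) for m in messages) + len(c.flush())

# For this kind of traffic, stream output is several times smaller
# than the per-message total.
```

On such traffic the per-message variant can even exceed the uncompressed size, since each tiny message pays the full container overhead while the dictionary never warms up.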