On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> pglz_compress() and pglz_decompress() still use PGLZ_Header, so the
>> frontend which uses those functions needs to handle PGLZ_Header. But it
>> basically should be handled via the varlena macros. That is, the frontend
>> still seems to need to understand the varlena datatype. I think we should
>> avoid that. Thoughts?
> Hm, yes, it may be wiser to remove it and make the data passed to pglz
> for varlena 8 bytes shorter.

OK, here is the result of this work, made of 3 patches.

The first two patches move pglz to src/common and make it a frontend
utility entirely independent of varlena and its related metadata.
- Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still
present. There is nothing amazing here; this is the broken version that
was reverted in 966115c.
- The real work comes with patch 2, which removes PGLZ_Header and changes
the compression and decompression APIs of pglz so that they no longer carry
any toast metadata; that metadata is now localized in tuptoaster.c. Note
that this patch preserves the on-disk format (tested with pg_upgrade from
9.4 to a patched HEAD server). Here is how the compression and
decompression APIs look with this patch, simply performing operations from
a source to a destination:
extern int32 pglz_compress(const char *source, int32 slen, char *dest,
                          const PGLZ_Strategy *strategy);
extern int32 pglz_decompress(const char *source, char *dest,
                          int32 compressed_size, int32 raw_size);
The return value of those functions is the number of bytes written to the
destination buffer, or 0 if the operation failed. This also aims to make
the backend more pluggable. The reason why patch 2 exists separately (it
could be merged with patch 1) is to ease review of the changes made to
pglz to make it an entirely independent facility.
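
To illustrate the contract described above (the caller, not the compressor,
keeps track of the raw size), here is a minimal sketch. It does not use
pglz: toy_compress() and toy_decompress() are hypothetical run-length
stand-ins that only mimic the shape of the new API, the dest_cap argument
is my addition, and ToyCompressedDatum is an invented caller-side
container playing the role that tuptoaster.c plays for toast metadata.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy run-length encoder standing in for pglz_compress(); this is NOT the
 * pglz algorithm. Output is a sequence of (count, byte) pairs. Returns the
 * number of bytes written to dest, or 0 if the output does not fit.
 * dest_cap is an assumption of this sketch, not part of the patch's API.
 */
static int32_t
toy_compress(const char *source, int32_t slen, char *dest, int32_t dest_cap)
{
    int32_t in = 0, out = 0;

    while (in < slen)
    {
        unsigned char run = 1;

        while (in + run < slen && run < 255 && source[in + run] == source[in])
            run++;
        if (out + 2 > dest_cap)
            return 0;           /* failure: destination too small */
        dest[out++] = (char) run;
        dest[out++] = source[in];
        in += run;
    }
    return out;
}

/*
 * Counterpart with the same argument order as the new pglz_decompress():
 * the caller must supply both sizes, since the stream has no header.
 * Returns the number of bytes written, or 0 on any inconsistency.
 */
static int32_t
toy_decompress(const char *source, char *dest,
               int32_t compressed_size, int32_t raw_size)
{
    int32_t in = 0, out = 0;

    if (compressed_size % 2 != 0)
        return 0;
    while (in < compressed_size)
    {
        unsigned char run = (unsigned char) source[in++];
        char value = source[in++];

        if (run == 0 || out + run > raw_size)
            return 0;           /* corrupt input or wrong raw_size */
        memset(dest + out, value, run);
        out += run;
    }
    return (out == raw_size) ? out : 0;
}

/* Hypothetical caller-side container: the metadata lives with the caller. */
typedef struct ToyCompressedDatum
{
    int32_t raw_size;
    int32_t compressed_size;
    char    data[64];
} ToyCompressedDatum;

/* Compress src into d and record both sizes; returns 1 on success. */
static int
toy_store(ToyCompressedDatum *d, const char *src, int32_t slen)
{
    int32_t n;

    assert(slen > 0);
    n = toy_compress(src, slen, d->data, (int32_t) sizeof(d->data));
    if (n == 0)
        return 0;
    d->raw_size = slen;
    d->compressed_size = n;
    return 1;
}

/* Decompress d into dest using the sizes the caller recorded. */
static int32_t
toy_load(const ToyCompressedDatum *d, char *dest)
{
    return toy_decompress(d->data, dest, d->compressed_size, d->raw_size);
}
```

A caller would check toy_store() for failure and fall back to storing the
data uncompressed, then pass the recorded sizes back through toy_load().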

Patch 3 is the FPW compression, changed to fit with those changes. Note
that as PGLZ_Header contained the raw size of the compressed data and no
longer exists, it is necessary to store the raw length of the block image
directly in the block image header, using 2 additional bytes. Those 2
bytes are present only when wal_compression is set to true, as indicated
by a boolean flag, so if wal_compression is disabled the WAL record length
is exactly the same as on HEAD and there is no penalty in the default case.
As with previous patches, the block image is compressed without its hole.
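
The idea of a header that grows by 2 bytes only when the image is
compressed can be sketched as follows. This is only an illustration under
assumptions: the struct name, the field names, and the choice to keep the
flag in the high bit of the hole offset are all hypothetical and not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; assumes hole offsets always fit in 15 bits. */
#define SKETCH_COMPRESSED_FLAG 0x8000u

/* Hypothetical in-memory form of a block image header. */
typedef struct SketchBlockImageHeader
{
    uint16_t hole_offset;
    uint16_t hole_length;
    bool     is_compressed;
    uint16_t raw_length;        /* meaningful only if is_compressed */
} SketchBlockImageHeader;

/*
 * Serialize the header into buf and return the number of bytes used:
 * 4 bytes when uncompressed, 6 bytes when compressed. The flag costs no
 * extra space because it shares a field with the hole offset.
 */
static size_t
sketch_header_write(unsigned char *buf, uint16_t hole_offset,
                    uint16_t hole_length, bool is_compressed,
                    uint16_t raw_length)
{
    uint16_t first = hole_offset;
    size_t   len = 0;

    assert(hole_offset < SKETCH_COMPRESSED_FLAG);
    if (is_compressed)
        first |= SKETCH_COMPRESSED_FLAG;

    memcpy(buf + len, &first, sizeof(first));
    len += sizeof(first);
    memcpy(buf + len, &hole_length, sizeof(hole_length));
    len += sizeof(hole_length);

    /* The raw length is written only for compressed images. */
    if (is_compressed)
    {
        memcpy(buf + len, &raw_length, sizeof(raw_length));
        len += sizeof(raw_length);
    }
    return len;
}

/* Parse a header written by sketch_header_write(); returns bytes consumed. */
static size_t
sketch_header_read(const unsigned char *buf, SketchBlockImageHeader *hdr)
{
    uint16_t first;
    size_t   len = 0;

    memcpy(&first, buf + len, sizeof(first));
    len += sizeof(first);
    memcpy(&hdr->hole_length, buf + len, sizeof(hdr->hole_length));
    len += sizeof(hdr->hole_length);

    hdr->is_compressed = (first & SKETCH_COMPRESSED_FLAG) != 0;
    hdr->hole_offset = (uint16_t) (first & ~SKETCH_COMPRESSED_FLAG);
    hdr->raw_length = 0;
    if (hdr->is_compressed)
    {
        memcpy(&hdr->raw_length, buf + len, sizeof(hdr->raw_length));
        len += sizeof(hdr->raw_length);
    }
    return len;
}
```

With this layout an uncompressed image header has the same size as it
would without the feature, which mirrors the "no penalty in the default
case" property, and the reader learns from the flag whether the 2 extra
bytes follow.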

To finish, here are some results using the same test as here, with the hack
on getrusage to get the system and user CPU diff on a single backend.
Just as a reminder, this test generates a fixed number of FPWs on a single
backend, with fsync and autovacuum disabled, for several values of
fillfactor to see the effect of page holes.

  test   | ffactor | user_diff | system_diff | pg_size_pretty
---------+---------+-----------+-------------+----------------
 FPW on  |      50 | 48.823907 |    0.737649 | 582 MB
 FPW on  |      20 | 16.135000 |    0.764682 | 229 MB
 FPW on  |      10 |  8.521099 |    0.751947 | 116 MB
 FPW off |      50 | 29.722793 |    1.045577 | 746 MB
 FPW off |      20 | 12.673375 |    0.905422 | 293 MB
 FPW off |      10 |  6.723120 |    0.779936 | 148 MB
 HEAD    |      50 | 30.763136 |    1.129822 | 746 MB
 HEAD    |      20 | 13.340823 |    0.893365 | 293 MB
 HEAD    |      10 |  7.267311 |    0.909057 | 148 MB
(9 rows)

Results are similar to what was measured previously (it doesn't hurt to
check again): roughly, the CPU cost is balanced by the reduction in WAL
volume. There is 0 bytes of difference in terms of WAL record length
between HEAD and this patch when wal_compression = off.

Patches, as well as the test script and the results, are attached.

Attachment: results.sql
Description: Binary data

Attachment: test_compress
Description: Binary data

Attachment: 20141228_fpw_compression_v12.tar.gz
Description: GNU Zip compressed data
