On Fri, Feb 25, 2022 at 12:05:31PM +0000, Georgios wrote: > The first commit does the heavy lifting required for additional compression > methods. > It expands testing coverage for the already supported gzip compression. Commit > bf9aa490db introduced cfp in compress_io.{c,h} with the intent of unifying > compression related code and allow for the introduction of additional archive > formats. However, pg_backup_archiver.c was not using that API. This commit > teaches pg_backup_archiver.c about cfp and is using it through out.
Thanks for the patch. I have a few high-level comments. + # Do not use --no-sync to give test coverage for data sync. + compression_gzip_directory_format => { + test_key => 'compression', The tests for GZIP had better be split into their own commit, as that's a coverage improvement for the existing code. I was assuming that this was going to be much larger :) +/* Routines that support LZ4 compressed data I/O */ +#ifdef HAVE_LIBLZ4 +static void InitCompressorLZ4(CompressorState *cs); +static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, ReadFunc readF); +static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs, + const char *data, size_t dLen); +static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs); +#endif Hmm. This is the same set of APIs as ZLIB and NONE to init, read, write and end, but for the LZ4 compressor (NONE has no init/end). Wouldn't it be better to refactor the existing pg_dump code to have a central structure holding all the function definitions in a common structure so as all those function signatures are set in stone in the shape of a catalog of callbacks, making the addition of more compression formats easier? I would imagine that we'd split the code of each compression method into their own file with their own context data. This would lead to a removal of compress_io.c, with its entry points ReadDataFromArchive(), WriteDataToArchive() & co replaced by pointers to each per-compression callback. > Furthermore, compression was chosen based on the value of the level passed > as an argument during the invocation of pg_dump or some hardcoded defaults. > This > does not scale for more than one compression methods. Now the method used for > compression can be explicitly requested during command invocation, or set > during > hardcoded defaults. Then it is stored in the relevant structs and passed in > the > relevant functions, along side compression level which has lost it's special > meaning. The method for compression is not yet stored in the actual archive. > This is done in the next commit which does introduce a new method. That's one thing Robert was arguing about with pg_basebackup, so that would be consistent, and the option set is backward-compatible as far as I get it by reading the code. -- Michael
signature.asc
Description: PGP signature