> On Oct 12, 2025, at 09:25, Tom Lane <[email protected]> wrote:
>
> While playing around with the test cases for pg_dump compression,
> I was startled to discover that the performance of compress_lz4's
> "stream API" code is absolutely abysmal. Here is a simple test
> case to demonstrate, using the regression database as test data:
>
> $ pg_dump -Fd --compress=lz4 -f rlz4.dir regression
> $ time pg_restore -f /dev/null rlz4.dir
>
> real 0m0.023s
> user 0m0.017s
> sys 0m0.006s
>
> So far so good, but now let's compress the toc.dat file:
>
> $ lz4 -f -m --rm rlz4.dir/toc.dat
> $ time pg_restore -f /dev/null rlz4.dir
>
> real 0m1.335s
> user 0m1.326s
> sys 0m0.008s
>
> Considering that lz4 prides itself on fast decompression speed,
> that is not a sane result. Decompressing the file only requires
> a couple ms on my machine:
>
> $ time lz4cat rlz4.dir/toc.dat.lz4 >/dev/null
>
> real 0m0.002s
> user 0m0.000s
> sys 0m0.002s
>
> So on this example, pg_restore is something more than 600x slower
> to read the TOC data than it ought to be.
>
I tested the patch on my MacBook M4 (Sequoia 15.6.1), compiled with -O2
optimization:
- For LZ4 performance improvement:
Without the fix (latest master):
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=lz4 -f rlz4.dir regression
chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 0.03s user 0.03s system 11% cpu 0.463 total
chaol@ChaodeMacBook-Air tmp % lz4 -f -m --rm rlz4.dir/toc.dat
# It took 1.59s to restore from compressed data, much slower than from
uncompressed data.
chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 1.59s user 0.02s system 97% cpu 1.653 total
```
With the fix:
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=lz4 -f rlz4.dir regression
chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 0.02s user 0.03s system 16% cpu 0.305 total
chaol@ChaodeMacBook-Air tmp % lz4 -f -m --rm rlz4.dir/toc.dat
# Much faster !!!
chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 0.02s user 0.02s system 93% cpu 0.043 total
```
- For Gzip zero read bug:
Without the fix:
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=gzip -f rgzip.dir
regression
chaol@ChaodeMacBook-Air tmp % gzip rgzip.dir/toc.dat
# Failed
chaol@ChaodeMacBook-Air tmp % pg_restore -f /dev/null rgzip.dir
pg_restore: error: could not read from input file:
```
With the fix:
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=gzip -f rgzip.dir
regression
chaol@ChaodeMacBook-Air tmp % gzip rgzip.dir/toc.dat
# Ran successfully
chaol@ChaodeMacBook-Air tmp % pg_restore -f /dev/null rgzip.dir
```
I also reviewed the code change, basically LGTM. Just a couple of trivial
comments:
1 - 0001
In WriteDataToArchiveLZ4()
```
+ required = LZ4F_compressBound(chunk, &state->prefs);
+ if (required > state->buflen - state->bufdata)
+ {
+ cs->writeF(AH, state->buffer, state->bufdata);
+ state->bufdata = 0;
+ }
```
And in EndCompressorLZ4()
```
+ required = LZ4F_compressBound(0, &state->prefs);
+ if (required > state->buflen - state->bufdata)
+ {
+ cs->writeF(AH, state->buffer, state->bufdata);
+ state->bufdata = 0;
+ }
```
These two code pieces are similar, only difference is the first parameter
passed to LZ4F_compressBound().
I think we can create an inline function for it. But these are just 5 lines, I
don’t have a strong option on that.
Same thing for the other code pieces in LZ4Stream_write() and LZ4Stream_close().
2 - 0003
```
/*
* LZ4 equivalent to feof() or gzeof(). Return true iff there is no
- * decompressed output in the overflow buffer and the end of the backing file
```
This doesn’t belong to the current patch. But “iff” seems a typo of “if”. You
may fix it as you are touching this piece of code.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/