> On Mar 2, 2026, at 17:17, Chao Li <[email protected]> wrote:
> 
> Hi,
> 
> There have been a couple of LZ4-related patches recently, so I spent some 
> time playing with the LZ4 path and found a bug in 
> astreamer_lz4_decompressor_content().
> 
> Looking at the code snippet (omitting irrelevant code):
> ```
> ret = LZ4F_decompress(mystreamer->dctx,
>  next_out, &out_size,
>  next_in, &read_size, NULL);
> 
> mystreamer->bytes_written += out_size; // <== bumped bytes_written already
> 
> /*
> * If output buffer is full then forward the content to next streamer
> * and update the output buffer.
> */
> if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
> {
> astreamer_content(mystreamer->base.bbs_next, member,
>  mystreamer->base.bbs_buffer.data,
>  mystreamer->base.bbs_buffer.maxlen,
>  context);
> 
> avail_out = mystreamer->base.bbs_buffer.maxlen;
> mystreamer->bytes_written = 0;
> next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
> }
> else
> {
> avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
> next_out += mystreamer->bytes_written; // <== The bug is there
> }
> ```
> 
> To advance next_out, the code uses mystreamer->bytes_written. However, 
> bytes_written has already been increased by out_size in the current 
> iteration. As a result, next_out is advanced by the cumulative number of 
> bytes written so far, instead of just the number of bytes produced in this 
> iteration. Effectively, the pointer movement is double-counting the previous 
> progress.
> 
> When I tried to design a test case to trigger this bug, I found it is 
> actually not easy to hit in normal execution. Tracing into the function, I 
> found that the default output buffer size is 1024 bytes, and in practice 
> LZ4F_decompress() tends to fill the output buffer in one or two iterations. 
> As a result, the problematic else branch is either not reached, or reached in 
> a case where bytes_written == out_size, so the incorrect pointer increment 
> does not manifest.
> 
> To reliably trigger the bug, I used a small hack: instead of letting 
> LZ4F_decompress() use the full available out_size, I artificially limited 
> out_size before the call, forcing LZ4F_decompress() to require one more 
> iteration to fill the buffer. See the attached nocfbot_hack.diff for the hack.
> 
> With that hack in place, the bug can be reproduced using the following 
> procedure:
> 
> 1. initdb
> 2 Set "wal_level = replica” in postgreSQl.conf
> 3. Restart the instance
> 4. Create a database
> 5. Generate some WAL logs by psql
> ```
> CREATE TABLE t AS SELECT generate_series(1, 100000) AS id;
> CHECKPOINT;
> ```
> 6. Create a backup
> ```
> % rm -rf /tmp/bkup_lz4
> % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
> ```
> 7. Verify the backup
> ```
> % pg_verifybackup -F t -n /tmp/bkup_lz4
> pg_verifybackup: error: zsh: trace trap  pg_verifybackup -F t -n /tmp/bkup_lz4
> ```
> 
> With the fix applied (plus the hack), step 7 succeeds:
> ```
> % pg_verifybackup -F t -n /tmp/bkup_lz4
> backup successfully verified
> ```
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
> 
> 
> 
> 
> <nocfbot_hack.diff><v1-0001-astreamer_lz4-fix-output-pointer-advancement-in-d.patch>


Added to CF for tracking https://commitfest.postgresql.org/patch/6561/

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to