On Thu, Apr 4, 2024 at 9:11 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> On 4/4/24 19:38, Robert Haas wrote:
> > Hi,
> >
> > Yesterday, Tomas Vondra reported to me off-list that he was seeing
> > what appeared to be data corruption after taking and restoring an
> > incremental backup. Overnight, Jakub Wartak further experimented with
> > Tomas's test case, did some initial analysis, and made it very easy to
> > reproduce. I spent this morning tracking down the problem, for which I
> > attach a patch.
>
> Thanks, I can confirm this fixes the issue I've observed/reported. On
> master 10 out of 10 runs failed, with the patch no failures.
Same here, the patch fixes it on recent master. I've also run pgbench for
~30 minutes and compared the source cluster against the restored
incremental backup - 0 differences, so it should be good.

> The test is very simple:
>
> 1) init pgbench

Tomas had magic fingers here - he used pgbench -i -s 100, which creates
bigger relations (it wouldn't trigger for smaller -s values, as Robert
explained - now it makes full sense; in earlier tests I was using a much
smaller -s, then transitioned to other workloads (mostly append-only), and
the final 100GB+/24h+ tests used mostly INSERTs rather than UPDATEs,
AFAIR). The other interesting thing is that one of the buildfarm animals
runs with configure --with-segsize-blocks=<somesmallvalue> (so new
relations fill up and spill into additional segments much earlier), and it
was not caught there either - wouldn't it be a good idea to add a test
like that in src/test/recovery/?

And of course I'm attaching the reproducer with some braindump notes, in
case someone hits a similar issue in the future and wonders where to even
start looking (it's very primitive, but it might help).

-J.
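PS. Re the src/test/recovery/ idea - a rough, untested sketch of the build
side (assuming the --with-segsize-blocks developer switch; the 16-block
(= 128kB) segment size and the install prefix are arbitrary values picked
for illustration):

# build a server whose relations spill into additional segments very early
./configure --with-segsize-blocks=16 --prefix="$HOME/pg-smallseg"
make -j"$(nproc)" && make install
# with segments this small, even pgbench -i -s 1 produces a multi-segment
# pgbench_accounts, so the full -> incremental -> pg_combinebackup -> diff
# cycle from the attached script should stay cheap enough for a TAP test
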
#!/bin/bash
rm -rf tmp
mkdir /tmp/wals
pg_ctl -D tmp/data init
echo "summarize_wal = on" >> tmp/data/postgresql.conf
#optional
echo "autovacuum = off" >> tmp/data/postgresql.conf
#optional (gather WAL for analysis)
echo "archive_mode = on" >> tmp/data/postgresql.conf
echo "archive_command = 'cp %p /tmp/wals/%f'" >> tmp/data/postgresql.conf
#optional (no FPW in pg_waldump)
#echo "full_page_writes = off" >> tmp/data/postgresql.conf
#below steps by Tomas:
pg_ctl -D tmp/data -l tmp/pg.log start
createdb test
pgbench -i -s 100 test
pg_basebackup -D tmp/full -c fast
pgbench -t 1000 test
pg_basebackup -D tmp/increment-1 --increment=tmp/full/backup_manifest -c fast
pg_dumpall > tmp/dumpall-source.log
pg_ctl -D tmp/data -l tmp/pg.log stop
pg_verifybackup tmp/full
pg_verifybackup tmp/increment-1/
pg_combinebackup -d -o tmp/restored tmp/full tmp/increment-1/ > tmp/combinebackup 2>&1
pg_ctl -D tmp/restored/ -l tmp/pg.log start
pg_dumpall > tmp/dumpall-restored.log
#pg_ctl -D tmp/restored/ -l tmp/pg.log stop
diff -u tmp/dumpall-source.log tmp/dumpall-restored.log | wc -l
exit 0

-----------------8<-----------------8<-----------------8<-----------------8<-----------------8<-----------------

it gives:

postgres@test:5432 : 54750 # select ctid,aid,bid,abalance from pgbench_accounts where aid in (select aid from pgbench_accounts group by aid, bid having count(*) > 1 order by aid, bid limit 1);
    ctid    |   aid   | bid | abalance
------------+---------+-----+----------
 (65690,55) | 4007145 |  41 |        0
(1 row)

postgres@test:5432 : 54750 # set enable_bitmapscan to off;
SET
postgres@test:5432 : 54750 # set enable_indexscan to off;
SET
postgres@test:5432 : 54750 # select ctid,aid,bid,abalance from pgbench_accounts where aid in (select aid from pgbench_accounts group by aid, bid having count(*) > 1 order by aid, bid limit 1);
    ctid     |   aid   | bid | abalance
-------------+---------+-----+----------
 (65690,55)  | 4007145 |  41 |        0
 (163943,49) | 4007145 |  41 |      638
(2 rows)

postgres@test:5432 : 54750 # select relfilenode from pg_class where relname ='pgbench_accounts';
 relfilenode
-------------
       16400
(1 row)

postgres@test:5432 : 54750 #

$ for f in tmp/data/pg_wal/summaries/* ; do >&2 echo "$f"; pg_walsummary -i $f ; done | grep -e 65690 -e 163943
tmp/data/pg_wal/summaries/00000001000000000100002800000000010B1AA0.summary
tmp/data/pg_wal/summaries/0000000100000000010B1AA000000000014E6538.summary
tmp/data/pg_wal/summaries/0000000100000000014E653800000000014E6638.summary
tmp/data/pg_wal/summaries/0000000100000000014E663800000000014ED020.summary
tmp/data/pg_wal/summaries/0000000100000000014ED02000000000014ED120.summary
tmp/data/pg_wal/summaries/0000000100000000014ED12000000000014ED458.summary
tmp/data/pg_wal/summaries/0000000100000000014ED4580000000022403570.summary
TS 1663, DB 16384, REL 16400, FORK main: block 65690
tmp/data/pg_wal/summaries/0000000100000000224035700000000043431F50.summary
tmp/data/pg_wal/summaries/000000010000000043431F50000000004F0000D8.summary
tmp/data/pg_wal/summaries/00000001000000004F0000D80000000051000028.summary
TS 1663, DB 16384, REL 16400, FORK main: block 65690
TS 1663, DB 16384, REL 16400, FORK main: block 163943

and on tmp/data/pg_wal/summaries/00000001000000004F0000D80000000051000028.summary, without -i:
TS 1663, DB 16384, REL 16400, FORK main: block 163069
TS 1663, DB 16384, REL 16400, FORK main: block 163213
TS 1663, DB 16384, REL 16400, FORK main: block 163253
TS 1663, DB 16384, REL 16400, FORK main: block 163392
TS 1663, DB 16384, REL 16400, FORK main: block 163492
TS 1663, DB 16384, REL 16400, FORK main: block 163542
TS 1663, DB 16384, REL 16400, FORK main: block 163811
TS 1663, DB 16384, REL 16400, FORK main: block 163819
TS 1663, DB 16384, REL 16400, FORK main: blocks 163934..163950    << dupe here
TS 1663, DB 16384, REL 16401, FORK main: block 0
TS 1663, DB 16384, REL 16402, FORK main: limit 0
TS 1663, DB 16384, REL 16402, FORK vm: limit 0
TS 1663, DB 16384, REL 16402, FORK init: limit 0

# being lazy! full WAL archive_command scan since the beginning
$ for f in /tmp/wals/0* ; do >&2 echo "$f"; pg_waldump $f ; done | grep 1663/16384/16400 | grep --color -e 'blk 65690' -e 'blk 163943'
[..]
/tmp/wals/000000010000000000000019
/tmp/wals/00000001000000000000001A
/tmp/wals/00000001000000000000001B    <-- normal load from pgbench -i?
rmgr: Heap2       len (rec/tot):  6515/  6515, tx:        743, lsn: 0/1B748238, prev 0/1B7481F8, desc: MULTI_INSERT+INIT ntuples: 61, flags: 0x20, blkref #0: rel 1663/16384/16400 blk 65690
rmgr: Heap2       len (rec/tot):    59/    59, tx:        743, lsn: 0/1B749BB0, prev 0/1B748238, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16384/16400 fork vm blk 2, blkref #1: rel 1663/16384/16400 blk 65690
/tmp/wals/00000001000000000000001C
/tmp/wals/00000001000000000000001D
/tmp/wals/00000001000000000000001E
/tmp/wals/00000001000000000000004E
/tmp/wals/00000001000000000000004F
/tmp/wals/00000001000000000000004F.000000D8.backup    // ignore
pg_waldump: error: could not read file "00000001000000000000004F.000000D8.backup": read 343 of 8192
/tmp/wals/000000010000000000000050    // our bug trigger?
rmgr: Heap        len (rec/tot):   171/   171, tx:       1275, lsn: 0/5081F350, prev 0/5081D370, desc: UPDATE+INIT old_xmax: 1275, old_off: 37, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 1, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 13119
rmgr: Heap        len (rec/tot):   171/   171, tx:       1276, lsn: 0/50823260, prev 0/50821280, desc: UPDATE old_xmax: 1276, old_off: 31, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 2, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 158621
rmgr: Heap        len (rec/tot):   171/   171, tx:       1277, lsn: 0/508271F8, prev 0/50825218, desc: UPDATE old_xmax: 1277, old_off: 24, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 3, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 39933
rmgr: Heap        len (rec/tot):   171/   171, tx:       1278, lsn: 0/5082B0C8, prev 0/508290E8, desc: UPDATE old_xmax: 1278, old_off: 33, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 4, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 157948
rmgr: Heap        len (rec/tot):   171/   171, tx:       1279, lsn: 0/5082EFD8, prev 0/5082CFF8, desc: UPDATE old_xmax: 1279, old_off: 40, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 5, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 134187
[..]
rmgr: Heap        len (rec/tot):   171/   171, tx:       1321, lsn: 0/508D47E8, prev 0/508D2808, desc: UPDATE old_xmax: 1321, old_off: 13, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 47, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 30385
rmgr: Heap        len (rec/tot):   171/   171, tx:       1322, lsn: 0/508D86F8, prev 0/508D6718, desc: UPDATE old_xmax: 1322, old_off: 27, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 48, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 27497
rmgr: Heap        len (rec/tot):    59/  8135, tx:       1323, lsn: 0/508DA628, prev 0/508DA600, desc: LOCK xmax: 1323, off: 55, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x01, blkref #0: rel 1663/16384/16400 blk 65690 FPW
rmgr: Heap        len (rec/tot):   171/   171, tx:       1323, lsn: 0/508DC608, prev 0/508DA628, desc: UPDATE old_xmax: 1323, old_off: 55, old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 49, blkref #0: rel 1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 65690
/tmp/wals/000000010000000000000051
/tmp/wals/000000010000000000000051.00000028.backup
pg_waldump: error: could not read file "000000010000000000000051.00000028.backup": read 400 of 8192
/tmp/wals/000000010000000000000052
rmgr: Heap2       len (rec/tot):    60/  8008, tx:          0, lsn: 0/522DD5B0, prev 0/522DB650, desc: PRUNE snapshotConflictHorizon: 1323, nredirected: 0, ndead: 1, isCatalogRel: F, blkref #0: rel 1663/16384/16400 blk 65690 FPW

The interesting WAL is in /tmp/wals/000000010000000000000050 - compare it
against the tmp/combinebackup debug output.
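To cross-check the two, something like this should do (untested; the grep
against tmp/combinebackup is a guess at what the -d debug output contains,
adjust to whatever it actually prints):

# pull out everything pg_combinebackup decided about the suspect blocks
grep -n -e 163943 -e 65690 tmp/combinebackup
# and the WAL records that touched them in the interesting segment
pg_waldump /tmp/wals/000000010000000000000050 | grep 1663/16384/16400 | grep --color -e 'blk 65690' -e 'blk 163943'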