On Thu, Apr 4, 2024 at 9:11 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> On 4/4/24 19:38, Robert Haas wrote:
> > Hi,
> >
> > Yesterday, Tomas Vondra reported to me off-list that he was seeing
> > what appeared to be data corruption after taking and restoring an
> > incremental backup. Overnight, Jakub Wartak further experimented with
> > Tomas's test case, did some initial analysis, and made it very easy to
> > reproduce. I spent this morning tracking down the problem, for which I
> > attach a patch.
> >
>
> Thanks, I can confirm this fixes the issue I've observed/reported. On
> master 10 out of 10 runs failed, with the patch no failures.

Same here, the patch fixes it on recent master. I've also run pgbench for
~30 minutes and compared master and incremental backups and got 0
differences, so it should be good.

> The test is very simple:
>
> 1) init pgbench

Tomas had magic fingers here - he used pgbench -i -s 100, which creates
bigger relations (it wouldn't trigger for smaller -s values, as Robert
explained - now it makes full sense; in my earlier tests I was using a much
smaller -s, then transitioned to other workloads (mostly append-only), and
the final 100GB+/24h+ tests used mostly INSERTs rather than UPDATEs AFAIR).
The other interesting thing is that one of the buildfarm animals runs with
configure --with-relsegsize=<somesmallvalue> (so relations span multiple
segment files much earlier) and it was not caught there either - wouldn't
it be a good idea to add a test like that under src/test/recovery/?
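
As a side note, a quick way to check whether a given scale factor even gets
pgbench_accounts past a single segment file (a rough sketch, assuming the
default 8kB block size; 16400 is the relation's relfilenode as seen in the
session further below):

psql test -c "SELECT pg_size_pretty(pg_relation_size('pgbench_accounts')) AS size,
                     pg_relation_size('pgbench_accounts') / 8192 AS blocks;"
ls -l tmp/data/base/*/16400*   # more than one file => relation spans multiple segments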

And of course I'm attaching the reproducer with some braindump notes, in
case someone hits a similar issue in the future and wonders where to even
start looking (it's very primitive, but it might help).

-J.
#!/bin/bash
rm -rf tmp
mkdir -p /tmp/wals
pg_ctl -D tmp/data init
echo "summarize_wal = on" >> tmp/data/postgresql.conf
#optional
echo "autovacuum = off" >> tmp/data/postgresql.conf
#optional (gather WAL for analysis)
echo "archive_mode = on" >> tmp/data/postgresql.conf
echo "archive_command = 'cp %p /tmp/wals/%f'" >> tmp/data/postgresql.conf
#optional (no FPW in pg_waldump)
#echo "full_page_writes = off" >> tmp/data/postgresql.conf

#below steps by Tomas:
pg_ctl -D tmp/data -l tmp/pg.log start
createdb test
pgbench -i -s 100 test
pg_basebackup -D tmp/full -c fast
pgbench -t 1000 test
pg_basebackup -D tmp/increment-1 --incremental=tmp/full/backup_manifest -c fast
pg_dumpall > tmp/dumpall-source.log
pg_ctl -D tmp/data -l tmp/pg.log stop
pg_verifybackup tmp/full
pg_verifybackup tmp/increment-1/
pg_combinebackup -d -o tmp/restored tmp/full tmp/increment-1/ > tmp/combinebackup 2>&1
pg_ctl -D tmp/restored/ -l tmp/pg.log start
pg_dumpall > tmp/dumpall-restored.log
#pg_ctl -D tmp/restored/ -l tmp/pg.log stop
diff -u tmp/dumpall-source.log tmp/dumpall-restored.log | wc -l
exit 0
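
# Usage sketch (assumes PG 17 binaries with incremental backup support are
# first on PATH, nothing else is listening on the default port, and the
# script is saved e.g. as repro.sh - the name is arbitrary):
#   chmod +x repro.sh && ./repro.sh
# The final "diff ... | wc -l" prints 0 when the restored cluster matches
# the source; any non-zero count means the two pg_dumpall outputs diverged,
# i.e. the corruption case described above.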

-----------------8<-----------------8<-----------------8<-----------------8<-----------------8<-----------------

it gives:

postgres@test:5432 : 54750 # select ctid,aid,bid,abalance from pgbench_accounts 
where aid in (select aid from pgbench_accounts group by aid, bid having 
count(*) > 1 order by aid, bid limit 1);
    ctid    |   aid   | bid | abalance
------------+---------+-----+----------
 (65690,55) | 4007145 |  41 |        0
(1 row)

postgres@test:5432 : 54750 # set enable_bitmapscan to off;
SET
postgres@test:5432 : 54750 # set enable_indexscan to off;
SET
postgres@test:5432 : 54750 # select ctid,aid,bid,abalance from pgbench_accounts 
where aid in (select aid from pgbench_accounts group by aid, bid having 
count(*) > 1 order by aid, bid limit 1);
    ctid     |   aid   | bid | abalance
-------------+---------+-----+----------
 (65690,55)  | 4007145 |  41 |        0
 (163943,49) | 4007145 |  41 |      638
(2 rows)

postgres@test:5432 : 54750 # select relfilenode from pg_class where relname  
='pgbench_accounts';                                                            
  relfilenode
-------------
       16400
(1 row)

postgres@test:5432 : 54750 #
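
Just to see where those two blocks physically live - a back-of-the-envelope
sketch, assuming the default 8kB block size and 1GB segment size (131072
blocks per segment file):

for blk in 65690 163943; do
  seg=$((blk / 131072))              # 1GB / 8kB = 131072 blocks per segment
  off=$(( (blk % 131072) * 8192 ))   # byte offset within that segment file
  if [ "$seg" -eq 0 ]; then file=16400; else file=16400.$seg; fi
  echo "block $blk -> base file $file, byte offset $off"
done
# block 65690  -> 16400    (first segment)
# block 163943 -> 16400.1  (second segment)

So one version of the row sits in the first segment file and the other in
the second one, which fits the observation above that small scale factors
(single-segment relations) don't reproduce this.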

$ for f in tmp/data/pg_wal/summaries/* ; do >&2 echo "$f"; pg_walsummary -i $f ; done | grep -e 65690 -e 163943
tmp/data/pg_wal/summaries/00000001000000000100002800000000010B1AA0.summary
tmp/data/pg_wal/summaries/0000000100000000010B1AA000000000014E6538.summary
tmp/data/pg_wal/summaries/0000000100000000014E653800000000014E6638.summary
tmp/data/pg_wal/summaries/0000000100000000014E663800000000014ED020.summary
tmp/data/pg_wal/summaries/0000000100000000014ED02000000000014ED120.summary
tmp/data/pg_wal/summaries/0000000100000000014ED12000000000014ED458.summary
tmp/data/pg_wal/summaries/0000000100000000014ED4580000000022403570.summary
TS 1663, DB 16384, REL 16400, FORK main: block 65690
tmp/data/pg_wal/summaries/0000000100000000224035700000000043431F50.summary
tmp/data/pg_wal/summaries/000000010000000043431F50000000004F0000D8.summary
tmp/data/pg_wal/summaries/00000001000000004F0000D80000000051000028.summary
TS 1663, DB 16384, REL 16400, FORK main: block 65690
TS 1663, DB 16384, REL 16400, FORK main: block 163943
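
Each summary file name is just <timeline><start LSN><end LSN> in hex, which
makes it easy to see which WAL range the interesting summary covers, e.g.:

f=00000001000000004F0000D80000000051000028.summary
echo "tli=${f:0:8} start_lsn=${f:8:16} end_lsn=${f:24:16}"
# tli=00000001 start_lsn=000000004F0000D8 end_lsn=0000000051000028

i.e. 0/4F0000D8 .. 0/51000028 - the range between the full backup start and
the incremental backup start (matching the two .backup history files visible
in the WAL archive listing further below).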


on tmp/data/pg_wal/summaries/00000001000000004F0000D80000000051000028.summary 
without -i:

TS 1663, DB 16384, REL 16400, FORK main: block 163069
TS 1663, DB 16384, REL 16400, FORK main: block 163213
TS 1663, DB 16384, REL 16400, FORK main: block 163253
TS 1663, DB 16384, REL 16400, FORK main: block 163392
TS 1663, DB 16384, REL 16400, FORK main: block 163492
TS 1663, DB 16384, REL 16400, FORK main: block 163542
TS 1663, DB 16384, REL 16400, FORK main: block 163811
TS 1663, DB 16384, REL 16400, FORK main: block 163819
TS 1663, DB 16384, REL 16400, FORK main: blocks 163934..163950 << dupe here
TS 1663, DB 16384, REL 16401, FORK main: block 0
TS 1663, DB 16384, REL 16402, FORK main: limit 0
TS 1663, DB 16384, REL 16402, FORK vm: limit 0
TS 1663, DB 16384, REL 16402, FORK init: limit 0

# being lazy! full wal archive_command scan since the beginning
$ for f in /tmp/wals/0* ; do >&2 echo "$f"; pg_waldump $f ; done | grep 1663/16384/16400 | grep --color -e 'blk 65690' -e 'blk 163943'
[..]
/tmp/wals/000000010000000000000019
/tmp/wals/00000001000000000000001A
/tmp/wals/00000001000000000000001B <-- normal load from pgbench -i?
rmgr: Heap2       len (rec/tot):   6515/  6515, tx:        743, lsn: 
0/1B748238, prev 0/1B7481F8, desc: MULTI_INSERT+INIT ntuples: 61, flags: 0x20, 
blkref #0: rel 1663/16384/16400 blk 65690
rmgr: Heap2       len (rec/tot):     59/    59, tx:        743, lsn: 
0/1B749BB0, prev 0/1B748238, desc: VISIBLE snapshotConflictHorizon: 0, flags: 
0x03, blkref #0: rel 1663/16384/16400 fork vm blk 2, blkref #1: rel 
1663/16384/16400 blk 65690
/tmp/wals/00000001000000000000001C
/tmp/wals/00000001000000000000001D
/tmp/wals/00000001000000000000001E
/tmp/wals/00000001000000000000004E
/tmp/wals/00000001000000000000004F
/tmp/wals/00000001000000000000004F.000000D8.backup // ignore
pg_waldump: error: could not read file 
"00000001000000000000004F.000000D8.backup": read 343 of 8192
/tmp/wals/000000010000000000000050 // our bug trigger?
rmgr: Heap        len (rec/tot):    171/   171, tx:       1275, lsn: 
0/5081F350, prev 0/5081D370, desc: UPDATE+INIT old_xmax: 1275, old_off: 37, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 1, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 13119
rmgr: Heap        len (rec/tot):    171/   171, tx:       1276, lsn: 
0/50823260, prev 0/50821280, desc: UPDATE old_xmax: 1276, old_off: 31, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 2, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 158621
rmgr: Heap        len (rec/tot):    171/   171, tx:       1277, lsn: 
0/508271F8, prev 0/50825218, desc: UPDATE old_xmax: 1277, old_off: 24, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 3, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 39933
rmgr: Heap        len (rec/tot):    171/   171, tx:       1278, lsn: 
0/5082B0C8, prev 0/508290E8, desc: UPDATE old_xmax: 1278, old_off: 33, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 4, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 157948
rmgr: Heap        len (rec/tot):    171/   171, tx:       1279, lsn: 
0/5082EFD8, prev 0/5082CFF8, desc: UPDATE old_xmax: 1279, old_off: 40, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 5, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 134187
[..]
rmgr: Heap        len (rec/tot):    171/   171, tx:       1321, lsn: 
0/508D47E8, prev 0/508D2808, desc: UPDATE old_xmax: 1321, old_off: 13, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 47, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 30385
rmgr: Heap        len (rec/tot):    171/   171, tx:       1322, lsn: 
0/508D86F8, prev 0/508D6718, desc: UPDATE old_xmax: 1322, old_off: 27, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 48, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 27497
rmgr: Heap        len (rec/tot):     59/  8135, tx:       1323, lsn: 
0/508DA628, prev 0/508DA600, desc: LOCK xmax: 1323, off: 55, infobits: 
[LOCK_ONLY, EXCL_LOCK], flags: 0x01, blkref #0: rel 1663/16384/16400 blk 65690 
FPW
rmgr: Heap        len (rec/tot):    171/   171, tx:       1323, lsn: 
0/508DC608, prev 0/508DA628, desc: UPDATE old_xmax: 1323, old_off: 55, 
old_infobits: [], flags: 0x01, new_xmax: 0, new_off: 49, blkref #0: rel 
1663/16384/16400 blk 163943, blkref #1: rel 1663/16384/16400 blk 65690
/tmp/wals/000000010000000000000051
/tmp/wals/000000010000000000000051.00000028.backup
pg_waldump: error: could not read file 
"000000010000000000000051.00000028.backup": read 400 of 8192
/tmp/wals/000000010000000000000052
rmgr: Heap2       len (rec/tot):     60/  8008, tx:          0, lsn: 
0/522DD5B0, prev 0/522DB650, desc: PRUNE snapshotConflictHorizon: 1323, 
nredirected: 0, ndead: 1, isCatalogRel: F, blkref #0: rel 1663/16384/16400 blk 
65690 FPW



The WAL of interest is in /tmp/wals/000000010000000000000050 - compare it
against the tmp/combinebackup output (from pg_combinebackup -d above).
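
FWIW, instead of the lazy full grep, pg_waldump's built-in filters could
narrow this down directly (IIRC --relation/--block exist since v15), e.g.:

pg_waldump --relation=1663/16384/16400 --block=65690  /tmp/wals/000000010000000000000050
pg_waldump --relation=1663/16384/16400 --block=163943 /tmp/wals/000000010000000000000050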
