>    Nice catch!  However, I'm not sure I like the patch.

>             * made it through and start writing after the portion that 
> persisted.
>             * (It's critical to first write an OVERWRITE_CONTRECORD message, 
> which
>             * we'll do as soon as we're open for writing new WAL.)
>    +        *
>    +        * If the last wal record is ahead of the missing contrecord, this 
> is a
>    +        * recently promoted primary and we should not write an overwrite
>    +        * contrecord.

>    Before the part, the comment follows the part shown below.

>    >        * Actually, if WAL ended in an incomplete record, skip the parts 
> that

>    So, actually WAL did not ended in an incomplete record.  I think
>    FinishWalRecover is the last place to do that. (But it could be
>    earlier.)

Thanks for the feedback. 

## On the primary, an OVERWRITE_CONTRECORD is written. The aborted record 
(abortedRecPtr) and insert position of the missing contrecord 
(missingContrecPtr) are set during FinishWalRecovery and after recovery but 
before writes are accepted, the OVERWRITE_CONTRECORD is written.
## on the primary logs

2022-02-25 02:25:16.208 UTC [7397] LOG:  redo done at 0/1FFD2B8 system usage: 
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
2022-02-25 02:25:16.208 UTC [7397] DEBUG:  resetting unlogged relations: 
cleanup 0 init 1
2022-02-25 02:25:16.209 UTC [7397] DEBUG:  creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  done creating and filling new WAL 
file
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  MultiXactId wrap limit is 
2147483648, limited by database with OID 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  MultiXact member stop limit is now 
4294914944 based on MultiXact 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  OVERWRITE_CONTRECORD inserted at 
0/2000028 for aborted record 0/1FFD2E0 <<------- Attached V3 of patch that adds 
logging which shows the overwrite record created on the primary
2022-02-25 02:25:16.217 UTC [7395] LOG:  checkpoint starting: end-of-recovery 
immediate wait
2022-02-25 02:25:16.217 UTC [7395] DEBUG:  performing replication slot 
checkpoint
2022-02-25 02:25:16.218 UTC [7395] DEBUG:  attempting to remove WAL segments 
older than log file 000000000000000000000000
2022-02-25 02:25:16.218 UTC [7395] LOG:  checkpoint complete: wrote 131 buffers 
(102.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 
s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; 
distance=11381 kB, estimate=11381 kB
2022-02-25 02:25:16.219 UTC [7394] DEBUG:  starting background worker process 
"logical replication launcher"
2022-02-25 02:25:16.219 UTC [7394] LOG:  database system is ready to accept 
connections

## A downstream standby now skips over this OVERWRITE_CONTRECORD, but the value 
for missingContrecPtr is not invalidated afterwards.
## on the standby logs

2022-02-25 02:25:16.616 UTC [7413] DEBUG:  sending hot standby feedback xmin 0 
epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
2022-02-25 02:25:16.616 UTC [7387] LOG:  successfully skipped missing 
contrecord at 0/1FFD2E0, overwritten at 2022-02-25 02:25:16.2175+00
2022-02-25 02:25:16.616 UTC [7387] CONTEXT:  WAL redo at 0/2000028 for 
XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFD2E0; time 2022-02-25 02:25:16.2175+00

## After promotion, the standby attempts to write the overwrite_contrecord 
again using the missingContrecPtr LSN which is now in the past.
## on the standby logs

2022-02-25 02:25:16.646 UTC [7387] LOG:  invalid record length at 0/201EC70: 
wanted 24, got 0
2022-02-25 02:25:16.646 UTC [7387] LOG:  redo done at 0/201EC48 system usage: 
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.55 s
2022-02-25 02:25:16.646 UTC [7387] LOG:  last completed transaction was at log 
time 2022-02-25 02:25:16.301554+00
2022-02-25 02:25:16.646 UTC [7387] DEBUG:  resetting unlogged relations: 
cleanup 0 init 1
2022-02-25 02:25:16.646 UTC [7387] LOG:  selected new timeline ID: 2
2022-02-25 02:25:16.646 UTC [7387] DEBUG:  updated min recovery point to 
0/201EC70 on timeline 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  could not remove file 
"pg_wal/000000020000000000000002": No such file or directory
2022-02-25 02:25:16.656 UTC [7387] LOG:  archive recovery complete
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  MultiXactId wrap limit is 
2147483648, limited by database with OID 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  MultiXact member stop limit is now 
4294914944 based on MultiXact 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  OVERWRITE_CONTRECORD inserted at 
0/2000028 for aborted record 0/1FFD2E0 <<------- The same overwrite record is 
written on the recently promoted standby
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  attempting to remove WAL segments 
newer than log file 000000020000000000000001
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  recycled write-ahead log file 
"000000010000000000000002"
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  release all standby locks
2022-02-25 02:25:16.656 UTC [7385] LOG:  checkpoint starting: force
2022-02-25 02:25:16.656 UTC [7385] DEBUG:  performing replication slot 
checkpoint
2022-02-25 02:25:16.656 UTC [7385] LOG:  request to flush past end of generated 
WAL; request 0/201EC70, current position 0/2000088
2022-02-25 02:25:16.656 UTC [7385] PANIC:  xlog flush request 0/201EC70 is not 
satisfied --- flushed only to 0/2000088
2022-02-25 02:25:16.657 UTC [7384] LOG:  checkpointer process (PID 7385) was 
terminated by signal 6: Aborted

The purpose of the patch is to check that if the current LSN is beyond 
missingContrecPtr, and to skip ( incorrectly ) writing an overwrite contrecord 
that was written on the old writer.

--
Sami Imseih
Amazon Web Services


Attachment: v3-0001-Fix-missing-continuation-record-after-standby-pro.patch
Description: v3-0001-Fix-missing-continuation-record-after-standby-pro.patch

Reply via email to