On Wed, Mar 16, 2022 at 10:24:44AM +0900, Kyotaro Horiguchi wrote:
> While discussing on additional LSNs in checkpoint log message,
> Fujii-san pointed out [2] that there is a case where
> CreateRestartPoint leaves unrecoverable database when concurrent
> promotion happens.  That corruption is "fixed" by the next checkpoint
> so it is not a severe corruption.

I suspect we'll start seeing this problem more often once
end-of-recovery checkpoints are removed [0].  Would you mind creating a
commitfest entry for this thread?  I didn't see one.

> AFAICS since 9.5, no check(/restart)pionts won't run concurrently with
> restartpoint [3].  So I propose to remove the code path as attached.

Yeah, this "quick hack" has been around for some time (2de48a8), and I
believe much has changed since then, so something like what you're
proposing is probably the right thing to do.

>  	/* Also update the info_lck-protected copy */
>  	SpinLockAcquire(&XLogCtl->info_lck);
> -	XLogCtl->RedoRecPtr = lastCheckPoint.redo;
> +	XLogCtl->RedoRecPtr = RedoRecPtr;
>  	SpinLockRelease(&XLogCtl->info_lck);
>
>  	/*
> @@ -6984,7 +6987,10 @@ CreateRestartPoint(int flags)
>  	/* Update the process title */
>  	update_checkpoint_display(flags, true, false);
>
> -	CheckPointGuts(lastCheckPoint.redo, flags);
> +	CheckPointGuts(RedoRecPtr, flags);

I don't understand the purpose of these changes.  Are these related to
the fix, or is this just tidying up?

[0] https://postgr.es/m/CA%2BTgmoY%2BSJLTjma4Hfn1sA7S6CZAgbihYd%3DKzO6srd7Ut%3DXVBQ%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com