On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> I have analyzed more, and concluded that there is no difference between
> a manual and a shutdown checkpoint.
>
> The difference was whether the CHECKPOINT record had been decoded or not.
> The overall workflow of this test was:
>
> 1. do INSERT
> (2. do CHECKPOINT)
> (3. decode CHECKPOINT record)
> 4. receive feedback message from standby
> 5. do shutdown CHECKPOINT
>
> At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The
> stack trace was:
> standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().
>
> At step 4, the confirmed_flush of the slot was updated, but
> ReplicationSlotSave() was executed only when slot->candidate_xmin_lsn had
> a valid LSN. If steps 2 and 3 are missed, the dirty flag is not set and
> the change remains only in memory.
>
> Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are missed
> and the patch from Julien is not applied, the updated value will be
> discarded. This is what I observed. The patch forces the logical slot to be
> saved at the shutdown checkpoint, so the confirmed_lsn is saved to disk at
> step 5.
>
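For reference, the five-step workflow quoted above can be sketched as a toy model in C. This is not PostgreSQL code: every Toy* name, the LSN values, and the force_save_at_shutdown switch (standing in for the proposed patch) are assumptions made purely for illustration, and the save logic is heavily simplified compared with what slot.c and logical.c actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and InvalidXLogRecPtr. */
typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)

/* Toy replication slot: in-memory state plus what was last written to disk. */
typedef struct ToySlot
{
    ToyLsn confirmed_flush;         /* in-memory confirmed position */
    ToyLsn candidate_xmin_lsn;      /* set when a standby record is decoded */
    ToyLsn on_disk_confirmed_flush; /* value that would survive a restart */
    bool   dirty;                   /* slot has unsaved changes */
} ToySlot;

/* Write the in-memory state to "disk" and clear the dirty flag. */
static void
toy_save(ToySlot *slot)
{
    slot->on_disk_confirmed_flush = slot->confirmed_flush;
    slot->dirty = false;
}

/* Step 3: decoding the record written by the checkpoint sets a candidate. */
static void
toy_decode_standby_record(ToySlot *slot, ToyLsn lsn)
{
    slot->candidate_xmin_lsn = lsn;
}

/*
 * Step 4: feedback from the standby advances confirmed_flush. The slot is
 * marked dirty and saved only if a candidate LSN was pending.
 */
static void
toy_confirm_flush(ToySlot *slot, ToyLsn lsn)
{
    assert(slot != NULL);
    slot->confirmed_flush = lsn;
    if (slot->candidate_xmin_lsn != TOY_INVALID_LSN &&
        slot->candidate_xmin_lsn <= lsn)
    {
        slot->candidate_xmin_lsn = TOY_INVALID_LSN;
        slot->dirty = true;
        toy_save(slot);
    }
}

/* Step 5: a checkpoint saves dirty slots; the patch also forces a save. */
static void
toy_checkpoint(ToySlot *slot, bool is_shutdown, bool force_save_at_shutdown)
{
    if (slot->dirty || (is_shutdown && force_save_at_shutdown))
        toy_save(slot);
}

/* Run steps 1-5 and return the confirmed_flush that reached "disk". */
ToyLsn
toy_run_scenario(bool decode_checkpoint, bool force_save_at_shutdown)
{
    ToySlot slot = {100, TOY_INVALID_LSN, 100, false};

    /* Step 1: INSERT generates WAL up to LSN 200 (arbitrary value). */
    if (decode_checkpoint)
        toy_decode_standby_record(&slot, 150);  /* steps 2 and 3 */
    toy_confirm_flush(&slot, 200);              /* step 4 */
    toy_checkpoint(&slot, true, force_save_at_shutdown); /* step 5 */
    return slot.on_disk_confirmed_flush;
}
```

In this model, toy_run_scenario(false, false) leaves the old value on disk, which corresponds to the discarded update described above, while either decoding the checkpoint record or enabling the forced save lets the new value reach disk.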
I see your point, but there are comments in walsender.c which indicate
that we also wait for step 5 to get replicated. See [1] and the comments
atop walsender.c. If this is true, then we don't need a special check as
you have in patch 0003, or at least it doesn't seem to be required in
all cases.

[1] -
/*
 * When SIGUSR2 arrives, we send any outstanding logs up to the
 * shutdown checkpoint record (i.e., the latest record), wait for
 * them to be replicated to the standby, and exit.
 ...
 */

--
With Regards,
Amit Kapila.