On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> I have analyzed more, and concluded that there are no difference between 
> manual
> and shutdown checkpoint.
>
> The difference was whether the CHECKPOINT record has been decoded or not.
> The overall workflow of this test was:
>
> 1. do INSERT
> (2. do CHECKPOINT)
> (3. decode CHECKPOINT record)
> 4. receive feedback message from standby
> 5. do shutdown CHECKPOINT
>
> At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The 
> stucktrace was:
> standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().
>
> At step 4, the confirmed_flush of the slot was updated, but 
> ReplicationSlotSave()
> was executed only when the slot->candidate_xmin_lsn had valid lsn. If step 2 
> and
> 3 are misssed, the dirty flag is not set and the change is still on the 
> memory.
>
> FInally, the CHECKPOINT was executed at step 5. If step 2 and 3 are misssed 
> and
> the patch from Julien is not applied, the updated value will be discarded. 
> This
> is what I observed. The patch forces to save the logical slot at the shutdown
> checkpoint, so the confirmed_lsn is save to disk at step 5.
>

I see your point but there are comments in walsender.c which indicates
that we also wait for step-5 to get replicated. See [1] and comments
atop walsender.c. If this is true then we don't need a special check
as you have in patch 0003 or at least it doesn't seem to be required
in all cases.

[1] -
/*
* When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. ...
*/

-- 
With Regards,
Amit Kapila.


Reply via email to