Dear Amit,

> I have studied this a bit more and it seems that is true for physical
> walsenders where we set the state of walsender as WALSNDSTATE_STOPPING
> in XLogSendPhysical, then the checkpointer finishes writing checkpoint
> record and then postmaster sends SIGUSR2 for walsender to exit. IIUC,
> this whole logic of different stop states has been introduced in
> commit c6c3334364 based on the discussion in the thread [1]. As per my
> understanding, logical walsenders don't seem to be waiting for
> shutdown checkpoint record and finishes before even we LOG that
> record. It seems that the behavior of logical walsenders is different
> from physical walsenders where we wait for them to send even the final
> shutdown checkpoint record before they finish.

Yes, you are right. Physical walsenders wait for the checkpointer to finish
writing the shutdown checkpoint record, but logical ones exit before it does.
This is because a logical walsender may generate WAL by executing replication
commands such as START_REPLICATION and CREATE_REPLICATION_SLOT, and that WAL
could end up recorded after the shutdown checkpoint record, which leads to a
PANIC.

> If so, then we won't be
> able to switchover to logical subscribers even in case of a clean
> shutdown. Am, I missing something?
> 
> [1] -
> https://www.postgresql.org/message-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zV
> mYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com

Based on the above, we are considering delaying the shutdown of logical
walsenders. The preliminary workflow is:

1. When a logical walsender receives the signal from the checkpointer, it
   consumes all WAL records, changes its state to WALSNDSTATE_STOPPING, and
   stops doing anything.
2. Then the checkpointer performs the shutdown checkpoint.
3. After that, the postmaster sends a signal to the walsenders, same as in the
   current implementation.
4. Finally, the logical walsenders process the shutdown checkpoint record and
   update the confirmed_lsn after the acknowledgement from the subscriber.
   Note that logical walsenders don't have to send the shutdown checkpoint
   record to the subscriber, but the following keepalive message will help us
   advance the confirmed_lsn.
5. Once all tasks are done, they exit.
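
To make the ordering in steps 1-5 concrete, here is a rough, standalone model
of it. This is only a sketch and not PostgreSQL code: every name starting with
Sketch/sketch_ is hypothetical, LSNs are plain integers, and the shutdown
checkpoint record is assumed to occupy one unit of WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the proposal; not PostgreSQL source. */
typedef uint64_t SketchLsn;

typedef enum
{
    SKETCH_STREAMING,   /* normal decoding and sending */
    SKETCH_STOPPING,    /* step 1 done: caught up and idle */
    SKETCH_EXITED       /* step 5 done */
} SketchState;

typedef struct
{
    SketchState state;
    SketchLsn   sent_lsn;       /* WAL processed by the walsender so far */
    SketchLsn   confirmed_lsn;  /* position acknowledged by the subscriber */
} SketchWalSender;

typedef struct
{
    SketchLsn   insert_lsn;     /* current end of WAL on the publisher */
    bool        shutdown_checkpoint_written;
} SketchPublisher;

/* Step 1: on the checkpointer's signal, consume all WAL and go idle. */
static void
sketch_on_stop_request(SketchWalSender *ws, const SketchPublisher *pub)
{
    assert(ws->sent_lsn <= pub->insert_lsn);
    ws->sent_lsn = pub->insert_lsn;
    ws->state = SKETCH_STOPPING;
}

/* Step 2: the checkpointer writes the shutdown checkpoint record. */
static void
sketch_shutdown_checkpoint(SketchPublisher *pub)
{
    pub->insert_lsn += 1;   /* assumption: the record is one unit of WAL */
    pub->shutdown_checkpoint_written = true;
}

/*
 * Steps 3-5: on the postmaster's signal, process the shutdown checkpoint
 * record, let the keepalive acknowledgement advance confirmed_lsn, and exit.
 * Returns false if the walsender is not yet allowed to finish.
 */
static bool
sketch_on_final_signal(SketchWalSender *ws, const SketchPublisher *pub)
{
    if (ws->state != SKETCH_STOPPING || !pub->shutdown_checkpoint_written)
        return false;

    ws->sent_lsn = pub->insert_lsn;     /* record is processed, not sent */
    ws->confirmed_lsn = ws->sent_lsn;   /* modelled keepalive acknowledgement */
    ws->state = SKETCH_EXITED;
    return true;
}
```

In this model, calling the three functions in order leaves confirmed_lsn equal
to the publisher's insert_lsn, and the final signal is rejected if it arrives
before the walsender has stopped or before the checkpoint has been written.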

This mechanism ensures that the confirmed_lsn of active slots is the same as
the current WAL location of the old publisher, so the 0003 patch would become
simpler. We would not have to calculate the acceptable difference anymore.

One thing we must consider is that no WAL must be generated while decoding
the shutdown checkpoint record, since that would cause a PANIC. IIUC, decoding
the record leads to SnapBuildSerializationPoint(), which just serializes the
snapbuild or restores from it, so the change may be acceptable. Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
