Hi,

On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote:
On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand 
<bertranddrouvot...@gmail.com> wrote:
On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:
Yeah good point, agree to just error out in all the case then (if we
discard the sync_ reserved wording proposal, which seems to be the
case as probably not worth the extra work).

Thanks for the discussion!

Here is the V33 patch set which includes the following changes:

Thanks for working on it!


1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver.

@@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr
RecPtr, bool randAccess,
                       * this only after failure, so when you promote, we still
                       * finish replaying as much as we can from archive and
                       * pg_wal before failover.
+                    *
+                    * Drop the slots for which sync is initiated but not yet
+                    * completed i.e. they are still waiting for the primary
+                    * server to catch up.
                       */
                      if (StandbyMode && CheckForStandbyTrigger())
                      {
                          XLogShutdownWalRcv();
+                       slotsync_drop_initiated_slots();
                          return XLREAD_FAIL;
                      }

I had a closer look and it seems this is not located at the right place.

Indeed, it's added here:

switch (currentSource)
{
        case XLOG_FROM_ARCHIVE:
        case XLOG_FROM_PG_WAL:

While in our case we are in

        case XLOG_FROM_STREAM:

So I think we should move slotsync_drop_initiated_slots() in the
XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
(the TODO item number 2 you mentioned up-thread)

Thanks for the comment.

I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown
slotsync worker and drop slots. There could be other reasons(other than
promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code
there. I thought if the intention is to stop slotsync workers on promotion,
maybe FinishWalRecovery() is a better place to do it as it's indicating the end
of recovery and XLogShutdownWalRcv is also called in it.

I can see that slotsync_drop_initiated_slots() has been moved in 
FinishWalRecovery()
in v35. That looks ok.

And I feel we'd better drop the slots after shutting down the slotsync workers,
because otherwise the slotsync workers could create the dropped slot again in
rare cases.

Yeah, agree and I can see that it's done that way in v35.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to