Re: Synchronizing slots from primary to standby

Drouvot, Bertrand Tue, 03 Oct 2023 23:25:47 -0700

Hi,

On 10/4/23 6:26 AM, shveta malik wrote:

On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapil...@gmail.com> wrote:


On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.ma...@gmail.com> wrote:


On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:


Hi,

On 10/3/23 12:54 PM, Amit Kapila wrote:

On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:


On 9/29/23 1:33 PM, Amit Kapila wrote:

On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:

- probably open corner cases like: what if a standby is down? would that mean
that synchronize_slot_names not being sent to the primary would allow the
decoding on the primary to go ahead?


Good question. BTW, irrespective of whether we have
'standby_slot_names' parameters or not, how should we behave if
standby is down? Say, if 'synchronize_slot_names' is only specified on
standby then in such a situation primary won't be even aware that some
of the logical walsenders need to wait.


Exactly, that's why I was thinking of keeping standby_slot_names to address
this scenario. In such a case one could simply decide to keep or remove
the associated physical replication slot from standby_slot_names. Keeping it
would mean "wait" and removing it would mean allowing decoding on the primary.

OTOH, one can say that users
should configure 'synchronize_slot_names' on both primary and standby
but note that this value could be different for different standbys,
so we can't configure it on primary.


Yeah, I think that's a good use case for standby_slot_names, what do you think?


But, even if we keep 'standby_slot_names' for this purpose, the
primary doesn't know the value of 'synchronize_slot_names' once the
standby is down and/or the primary is restarted. So, how will we know
which logical WAL senders need to wait for 'standby_slot_names'?


Yeah right, I also think we'd need:

- synchronize_slot_names on both primary and standby

But now we would need to take care of different standbys having different values
(as you said up-thread)....

Thinking out loud: What about a single GUC on the primary (not
standby_slot_names nor synchronize_slot_names) but say
logical_slots_wait_for_standby that could be a list of say
"logical_slot_name:physical_slot".

I think this GUC would help us define each walsender's behavior (whether the
standby(s) are up or down):


It may help in defining the walsender's behaviour better for sure. But
the problem I see once we start defining sync-slot-names on primary
(in any form whether as independent GUC or as above mapping GUC) is
that it needs to be then in sync with standbys, as each standby for
sure needs to maintain its own sync-slot-names GUC to make it aware of
what all it needs to sync.


Yes, I also think so. Also, defining such a GUC when the user wants to
sync all the slots, which would normally be the case, would be a
nightmare for the users.


This brings us to the original question of
how do we actually keep these configurations in sync between primary
and standby if we plan to maintain it on both?

- don't wait if its associated logical_slot is not listed in this GUC
- or wait based on its associated "list" of mapped physical slots (would
probably have to deal with the min restart_lsn for all the corresponding
mapped ones).
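
To make the two rules above concrete, here is a minimal sketch in C. It is
not PostgreSQL code: SlotWaitMapping and logical_slot_wait_lsn() are
hypothetical names, XLogRecPtr is a stand-in typedef, and the sketch assumes
the GUC has already been parsed into (logical slot, physical slot) pairs with
each physical slot's restart_lsn looked up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

/* One parsed "logical_slot_name:physical_slot" entry (hypothetical). */
typedef struct SlotWaitMapping
{
    const char *logical_slot;
    const char *physical_slot;
    XLogRecPtr  physical_restart_lsn;   /* restart_lsn of that physical slot */
} SlotWaitMapping;

/*
 * Returns false if the logical slot is not listed at all (rule 1: don't
 * wait). Otherwise returns true and sets *wait_lsn to the minimum
 * restart_lsn over all physical slots mapped to it (rule 2).
 */
static bool
logical_slot_wait_lsn(const SlotWaitMapping *map, size_t nentries,
                      const char *logical_slot, XLogRecPtr *wait_lsn)
{
    bool        found = false;
    size_t      i;

    for (i = 0; i < nentries; i++)
    {
        if (strcmp(map[i].logical_slot, logical_slot) != 0)
            continue;
        if (!found || map[i].physical_restart_lsn < *wait_lsn)
            *wait_lsn = map[i].physical_restart_lsn;
        found = true;
    }
    return found;
}
```

With entries such as "lsub1:phys_a" and "lsub1:phys_b", a walsender for lsub1
would be held back to the older of the two restart_lsn values, while a slot
that is not listed would not wait at all.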

I don't think we can avoid having to define at least one GUC on the primary
(at least to handle the case of standby(s) being down).


How about an alternate scheme where we define sync_slot_names on
standby but then store the physical_slot_name in the corresponding
logical slot (ReplicationSlotPersistentData) to be synced? So, the
standby will send the list of 'sync_slot_names' and the primary will
add the physical standby's slot_name in each of the corresponding
sync_slot. Now, if we do this then even after restart, we should be
able to know for which physical slot each logical slot needs to wait.
We can even provide an SQL API to reset the value of
standby_slot_names in logical slots as a way to unblock decoding in
case of emergency (for example, when the corresponding physical standby
never comes up).
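
A rough sketch of what this scheme could look like, purely for illustration.
LogicalSlotWaitList, wait_list_add() and wait_list_reset() are hypothetical
names, and the fixed MAX_WAIT_SLOTS bound is an assumption made only to keep
the example small; nothing here is taken from an actual patch.

```c
#include <stdbool.h>
#include <string.h>

#define NAMEDATALEN 64          /* PostgreSQL's default name length */
#define MAX_WAIT_SLOTS 4        /* arbitrary bound for this sketch only */

/* Hypothetical extra state stored with each logical slot. */
typedef struct LogicalSlotWaitList
{
    int         nwait;
    char        wait_slots[MAX_WAIT_SLOTS][NAMEDATALEN];
} LogicalSlotWaitList;

/*
 * Called when a standby using 'physical_slot' asks for this logical slot
 * to be synced. Returns false if the name is too long or the list is full.
 */
static bool
wait_list_add(LogicalSlotWaitList *wl, const char *physical_slot)
{
    int         i;

    if (strlen(physical_slot) >= NAMEDATALEN)
        return false;
    for (i = 0; i < wl->nwait; i++)
        if (strcmp(wl->wait_slots[i], physical_slot) == 0)
            return true;        /* already recorded */
    if (wl->nwait >= MAX_WAIT_SLOTS)
        return false;           /* no room left */
    strcpy(wl->wait_slots[wl->nwait++], physical_slot);
    return true;
}

/* Emergency reset: the logical slot stops waiting for any standby. */
static void
wait_list_reset(LogicalSlotWaitList *wl)
{
    memset(wl, 0, sizeof(*wl));
}
```

Because the list would live with the slot's persistent data, it would survive
a primary restart, and the reset function corresponds to what the SQL API
mentioned above would do.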



Looks like a better approach to me. It solves most of the pain points like:
1) Avoids the need for multiple GUCs
2) Primary and standby need not worry about staying in sync, as they would
if we maintained the sync-slot-names GUC on both
3) User still gets the flexibility to remove a standby from the wait-list
of the primary's logical walsenders using the reset SQL API.


Fully agree.

Now some initial thoughts:
1) Since each logical slot might need to be synced by multiple
physical standbys, we need to hold a list of standby names in
ReplicationSlotPersistentData. This brings us to the question of how
much we should allocate initially in shared memory. Should each
ReplicationSlotPersistentData have room for max_replication_slots
(worst-case scenario) physical-standby names?
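
For a sense of scale, a back-of-the-envelope sketch of the worst case
described above. It assumes each name takes NAMEDATALEN bytes (64 in a
default build) and that every slot reserves room for max_replication_slots
names; the two function names are hypothetical.

```c
#include <stddef.h>

#define NAMEDATALEN 64          /* PostgreSQL's default name length */

/* Bytes reserved in one slot for the worst-case list of standby names. */
static size_t
standby_names_bytes_per_slot(int max_replication_slots)
{
    return (size_t) max_replication_slots * NAMEDATALEN;
}

/* Bytes reserved across all slots if every slot carries such a list. */
static size_t
standby_names_bytes_total(int max_replication_slots)
{
    return (size_t) max_replication_slots *
        standby_names_bytes_per_slot(max_replication_slots);
}
```

With the default max_replication_slots of 10 that is 640 bytes per slot and
6400 bytes overall, but the total grows quadratically: 100 slots would
reserve 640000 bytes.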


Yeah, and even if we do the opposite, meaning add the 'to-sync'
logical replication slots in the ReplicationSlotPersistentData of the physical
slot(s), the questions still remain (as a physical standby could want to
sync multiple slots).

2) If standby sends '*', then we need to update each logical-slot with
that standby-name. Or do we have a better way to deal with '*'? Need to
think more on this.

JFYI, along similar lines, currently in ReplicationSlotPersistentData,
we are maintaining a flag for slot-sync feature which is:

         bool            synced; /* Is this a slot created by a sync-slot worker? */

This flag currently holds significance only on physical-standby. This
has been added to distinguish between a slot created by a user for
logical decoding purposes and the ones being synced from primary.


BTW, what about having this "user visible" through pg_replication_slots?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
