Hi,

On 11/8/23 4:50 AM, Amit Kapila wrote:
On Tue, Nov 7, 2023 at 7:58 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:

If we think this window is too short we could:

- increase it
or
- don't drop the slot once created (even if there is no activity
on the primary during PrimaryCatchupWaitAttempt attempts) so that
the next loop of attempts will compare with "older" LSN/xmin (as compared to
dropping and re-creating the slot). That way the window would be since the
initial slot creation.


Yeah, this sounds reasonable but we can't mark such slots to be
synced/available for use after failover.

Yeah, currently we are fine as slots are dropped in
wait_for_primary_slot_catchup() if we are not in recovery anymore.

I think if we want to follow
this approach then we need to also monitor these slots for any change
in the consecutive cycles and if we are able to sync them then
accordingly we enable them to use after failover.

What about adding a new field in ReplicationSlotPersistentData
indicating that we are waiting for "sync", and dropping such slots during
promotion and/or if not in recovery?
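
To make the idea a bit more concrete, here is a rough standalone sketch. It is
not taken from the patch: SketchSlot, its sync_pending field and both function
names are made up for illustration, the struct is only a cut-down stand-in for
ReplicationSlotPersistentData, and only restart_lsn is compared for brevity
(a real check would presumably look at the xmin as well).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for ReplicationSlotPersistentData. */
typedef struct SketchSlot
{
    bool     in_use;
    uint64_t restart_lsn;   /* local (standby) restart_lsn */
    bool     sync_pending;  /* hypothetical new field: not usable after failover yet */
} SketchSlot;

/*
 * Called once per sync cycle. Clears the pending flag once the primary's
 * restart_lsn has reached the local one. Returns true if the slot can be
 * used after failover.
 */
static bool
sketch_recheck_slot(SketchSlot *slot, uint64_t primary_restart_lsn)
{
    assert(slot != NULL);

    if (slot->in_use && slot->sync_pending &&
        primary_restart_lsn >= slot->restart_lsn)
        slot->sync_pending = false;

    return slot->in_use && !slot->sync_pending;
}

/*
 * Called at promotion (or when no longer in recovery). Drops every slot
 * that is still waiting for sync and returns how many were dropped.
 */
static int
sketch_drop_pending_slots(SketchSlot *slots, size_t nslots)
{
    int dropped = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].in_use && slots[i].sync_pending)
        {
            slots[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}
```

With something like this the slot is kept across cycles (so the comparison
window starts at the initial slot creation), but it never becomes visible as
usable until the flag is cleared.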

Another somewhat related point is that right now, we just wait for the
change on the first slot (the patch refers to it as the monitoring
slot) for computing nap_time before which we will recheck all the
slots. I think we can improve that as well such that even if any
slot's information is changed, we don't consider changing naptime.


Yeah, that sounds reasonable to me.
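
As I read the suggestion, something along these lines: look at all the slots
instead of only the first one, and keep the short nap as long as any of them
moved. This is only a sketch under my own assumptions. The SketchSlotState
struct, the function names and the two nap values are placeholders, not what
the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder nap times in milliseconds (assumed values). */
#define SKETCH_ACTIVE_NAP_MS 10L
#define SKETCH_IDLE_NAP_MS   10000L

/* Hypothetical per-slot values remembered from the previous cycle. */
typedef struct SketchSlotState
{
    uint64_t restart_lsn;
    uint64_t confirmed_flush_lsn;
    uint32_t catalog_xmin;
} SketchSlotState;

/* Returns true if any of the n slots changed between two cycles. */
static bool
sketch_any_slot_changed(const SketchSlotState *prev,
                        const SketchSlotState *cur, size_t n)
{
    assert(n == 0 || (prev != NULL && cur != NULL));

    for (size_t i = 0; i < n; i++)
    {
        if (prev[i].restart_lsn != cur[i].restart_lsn ||
            prev[i].confirmed_flush_lsn != cur[i].confirmed_flush_lsn ||
            prev[i].catalog_xmin != cur[i].catalog_xmin)
            return true;
    }
    return false;
}

/* Keep the short nap while there is activity on any slot, else back off. */
static long
sketch_choose_naptime(const SketchSlotState *prev,
                      const SketchSlotState *cur, size_t n)
{
    return sketch_any_slot_changed(prev, cur, n)
        ? SKETCH_ACTIVE_NAP_MS
        : SKETCH_IDLE_NAP_MS;
}
```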

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

