On Thu, Aug 28, 2025 at 3:29 PM Kirill Reshke <reshkekir...@gmail.com> wrote:
>
> On Thu, 28 Aug 2025 at 14:56, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> > >
> > > We have seen cases where slot synchronization gets delayed, for example 
> > > when the slot is behind the failover standby or vice versa, and the slot 
> > > sync worker has to wait for one to catch up with the other. During this 
> > > waiting period, users querying pg_replication_slots can only see whether 
> > > the slot has been synchronized or not. If it has already synchronized, 
> > > that’s fine, but if synchronization is taking longer, users would 
> > > naturally want to understand the reason for the delay.
> > >
> > > Is there a way for end users to know the cause of slot synchronization 
> > > delays, so they can take appropriate actions to speed it up?
> > >
> > > I understand that server logs are emitted in such cases, but logs are not 
> > > something end users would want to check regularly. Moreover, since 
> > > logging is configuration-based, relevant messages may sometimes be 
> > > skipped or suppressed.
> > >
> >
> > Currently, the way to see the reason for sync skip is LOGs but I think
> > it is better to add a new column like sync_skip_reason in
> > pg_replication_slots. This can show the reasons like
> > standby_LSN_ahead_remote_LSN. I think ideally users can compare
> > standby's slot LSN/XMIN with remote_slot being synced. Do you have any
> > better ideas?
> >
>
> How about something like pg_stat_progress_replication_slot with remote
> LSN/standby LSN/catalog XID etc?
> Wouldn't this be in sync with all other debug pg_stat_progress* views
> and thus more Postgres-y?
>

Yes, that is another option. I am a little worried that sync does not
always lag behind, so having a separate view just for sync progress may
be too much. Yet another option is the existing view
pg_stat_replication_slots, though sync progress doesn't seem to fit
there directly. There, for example, we could add a sync_skipped counter,
the time of the last sync skip, and last_sync_skip_reason, which could
be sufficient to dig into the problem further.
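
To make the idea concrete, here is a minimal sketch of what such
per-slot statistics might track. It is written in Python purely for
illustration. The class, the last_sync_skip field name, and the slot
name are assumptions, not an existing PostgreSQL structure or API; the
reason string is the one mentioned earlier in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SlotSyncStats:
    """Hypothetical per-slot sync statistics (illustration only)."""
    slot_name: str
    sync_skipped: int = 0                        # number of skipped sync cycles
    last_sync_skip: Optional[datetime] = None    # time of the most recent skip
    last_sync_skip_reason: Optional[str] = None  # reason for the most recent skip

    def record_skip(self, reason: str, now: Optional[datetime] = None) -> None:
        """Count one skipped sync cycle and remember when and why it happened."""
        self.sync_skipped += 1
        self.last_sync_skip = now or datetime.now(timezone.utc)
        self.last_sync_skip_reason = reason


# Hypothetical slot name; two consecutive cycles skipped for the same reason.
stats = SlotSyncStats(slot_name="failover_slot")
stats.record_skip("standby_LSN_ahead_remote_LSN")
stats.record_skip("standby_LSN_ahead_remote_LSN")
print(stats.sync_skipped, stats.last_sync_skip_reason)
```

A user polling these three values could tell that sync is being skipped
repeatedly, when it last happened, and why, without reading the logs.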

-- 
With Regards,
Amit Kapila.

