On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM <satyanarlapu...@gmail.com> wrote:
>
> Hi Hackers,
>
> I am considering implementing RPO (recovery point objective) enforcement
> feature for Postgres where the WAL writes on the primary are stalled when the
> WAL distance between the primary and standby exceeds the configured
> (replica_lag_in_bytes) threshold. This feature is useful particularly in the
> disaster recovery setups where primary and standby are in different regions
> and synchronous replication can't be set up for latency and performance
> reasons yet requires some level of RPO enforcement.
Limiting transaction rate when the standby falls behind is a good feature ...
>
> The idea here is to calculate the lag between the primary and the standby
> (Async?) server during XLogInsert and block the caller until the lag is less
> than the threshold value. We can calculate the max lag by iterating over
> ReplicationSlotCtl->replication_slots. If this is not something we don't want
> to do in the core, at least adding a hook for XlogInsert is of great value.

but doing it in XLogInsert does not seem to be a good idea. It's a common point for all kinds of logging, including VACUUM. We could accidentally stall a critical VACUUM operation because of that.

As Bharath described, it is better handled through application-level monitoring.

--
Best Wishes,
Ashutosh Bapat