On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapu...@gmail.com> wrote:
>
> Hi Hackers,
>
> I am considering implementing RPO (recovery point objective) enforcement 
> feature for Postgres where the WAL writes on the primary are stalled when the 
> WAL distance between the primary and standby exceeds the configured 
> (replica_lag_in_bytes) threshold. This feature is useful particularly in the 
> disaster recovery setups where primary and standby are in different regions 
> and synchronous replication can't be set up for latency and performance 
> reasons yet requires some level of RPO enforcement.

Limiting transaction rate when the standby falls behind is a good feature ...

>
> The idea here is to calculate the lag between the primary and the standby 
> (Async?) server during XLogInsert and block the caller until the lag is less 
> than the threshold value. We can calculate the max lag by iterating over 
> ReplicationSlotCtl->replication_slots. If this is not something we want 
> to do in the core, at least adding a hook for XLogInsert is of great value.

but doing it in XLogInsert does not seem to be a good idea. It's a
common point for all kinds of WAL logging, including VACUUM. We could
accidentally stall a critical VACUUM operation because of that.

As Bharath described, this is better handled by application-level monitoring.
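
To make that concrete, here is a rough sketch of what such a check could
look like on the monitoring side. It is only an illustration, not a
proposed implementation. It assumes the monitor has already fetched the
primary's current LSN (e.g. from pg_current_wal_lsn()) and each standby's
replay_lsn (e.g. from pg_stat_replication) as text. The function names and
the threshold parameter are made up for this example; the threshold plays
the role of the replica_lag_in_bytes setting proposed above.

```python
from typing import Iterable


def parse_lsn(text: str) -> int:
    """Convert an LSN in its textual 'hi/lo' hex form to a byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def max_lag_bytes(primary_lsn: str, standby_lsns: Iterable[str]) -> int:
    """Largest WAL distance in bytes between the primary and any standby."""
    primary = parse_lsn(primary_lsn)
    # Clamp at zero so an unexpected ordering never yields a negative lag.
    lags = (max(0, primary - parse_lsn(s)) for s in standby_lsns)
    return max(lags, default=0)


def rpo_exceeded(primary_lsn: str, standby_lsns: Iterable[str],
                 threshold_bytes: int) -> bool:
    """True when the worst standby lag is above the configured threshold."""
    return max_lag_bytes(primary_lsn, standby_lsns) > threshold_bytes


if __name__ == "__main__":
    standbys = ["0/2000000", "0/2FFFFF0"]
    print(max_lag_bytes("0/3000000", standbys))
    print(rpo_exceeded("0/3000000", standbys, threshold_bytes=8 * 1024 * 1024))
```

What the application does when the check fires (throttle its own writes,
raise an alert, etc.) is a policy decision left to the application, and
background work such as VACUUM is never blocked by it.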

-- 
Best Wishes,
Ashutosh Bapat

