On Fri, Apr 28, 2023 at 2:35 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote:
>
> Dear hackers,
>
> I rebased and refined my PoC. Followings are the changes:
Thanks. Apologies for being late here. Please bear with me if I'm repeating any of the points already discussed. I'm mainly trying to understand the production-level use case behind this feature, and for that matter, recovery_min_apply_delay.

AFAIK, people try to keep replication lag as low as possible, i.e. near zero, to avoid serious problems on production servers - WAL file growth, blocked vacuum, crash and downtime. The proposed feature's commit message and the existing docs on recovery_min_apply_delay justify the delay as 'offering opportunities to correct data loss errors'. If someone wants to enable recovery_min_apply_delay/min_apply_delay on production servers, I'm guessing their values will be in hours, not minutes, for the simple reason that when data loss occurs, the people/infrastructure monitoring postgres need to know about it first and then need time to respond with corrective actions to recover from the loss. While these parameters are set, the primary server must not generate too much WAL, to avoid an eventual crash/downtime. Who would really want to be so defensive against somebody who may or may not accidentally cause data loss, enable these features on production servers (especially when they can take down the primary server), and live happily with the induced replication lag? AFAIK, PITR is what people use to recover from data loss errors in production.

IMO, before we even implement the apply-delay feature for logical replication, it's worth understanding whether induced replication lag has any production-level significance. We can also debate whether providing apply-delay hooks, backed by simple out-of-the-box extensions, would be better than the core providing these features.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com