On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote:
> > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi
> > <horikyota....@gmail.com> wrote:
> >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart
> >> <nathandboss...@gmail.com> wrote in
> >> > My general point is that we should probably offer some basic preventative
> >> > measure against flipping back and forth between streaming and archive
> >> > recovery while making zero progress.  As I noted, maybe that's as simple as
> >> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive
> >> > at least once before the new parameter forces us to switch to streaming
> >> > replication.  There might be other ways to handle this.
> >>
> >> +1.
> >
> > Hm. In that case, I think we can get rid of timeout based switching
> > mechanism and have this behaviour - the standby can attempt to switch
> > to streaming mode from archive, say, after fetching 1, 2 or a
> > configurable number of WAL files. In fact, this is the original idea
> > proposed by Satya in this thread.
>
> IMO the timeout approach would be more intuitive for users.  When it comes
> to archive recovery, "WAL segment" isn't a standard unit of measure.  WAL
> segment size can differ between clusters, and WAL files can have different
> amounts of data or take different amounts of time to replay.

How about the amount of WAL bytes fetched from the archive after which
a standby attempts to connect to the primary or enter streaming mode? Of
late, we've changed some GUCs to represent bytes instead of WAL
files/segments, see [1].

> So I think it
> would be difficult for the end user to decide on a value.  However, even
> the timeout approach has this sort of problem.  If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute.  That would likely
> need to be documented.

If we have a configurable number of WAL bytes instead of a timeout for the
standby's WAL source switch from archive to primary, we don't have the
above problem, right?

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3fe108c025e4a080315562d4c15ecbe3f00405e

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
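
P.S. To make the byte-based idea above a bit more concrete, here is a
minimal sketch in Python. It is a toy model, not PostgreSQL code and not
the patch. The names SourceSwitcher, switch_after_bytes,
on_archive_file_restored and on_streaming_failed are all hypothetical and
exist only for illustration. The model assumes the standby counts the
bytes it has restored from the archive and tries streaming once that
count reaches the configured threshold.

```python
from dataclasses import dataclass
from enum import Enum


class WalSource(Enum):
    ARCHIVE = "archive"
    STREAM = "stream"


@dataclass
class SourceSwitcher:
    """Toy model of a standby deciding when to retry streaming."""

    # Hypothetical setting: bytes of WAL to fetch from the archive before
    # the standby tries to connect to the primary again.
    switch_after_bytes: int
    source: WalSource = WalSource.ARCHIVE
    bytes_from_archive: int = 0

    def on_archive_file_restored(self, nbytes: int) -> WalSource:
        """Record one restored archive file and pick the next source."""
        self.bytes_from_archive += nbytes
        if self.bytes_from_archive >= self.switch_after_bytes:
            self.source = WalSource.STREAM
        return self.source

    def on_streaming_failed(self) -> WalSource:
        """Fall back to the archive and start counting from zero again."""
        self.source = WalSource.ARCHIVE
        self.bytes_from_archive = 0
        return self.source


MB = 1024 * 1024
switcher = SourceSwitcher(switch_after_bytes=32 * MB)
decisions = [switcher.on_archive_file_restored(16 * MB) for _ in range(2)]
print([d.value for d in decisions])  # prints ['archive', 'stream']
```

In this model the switch to streaming can only happen right after a file
has been restored from the archive, so the standby always makes some
progress between two streaming attempts. How long the restore itself
takes does not affect the decision, since only bytes are counted.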