On Wed, Feb 2, 2022 at 9:28 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
>
> On Wed, Feb 02, 2022 at 09:14:03PM +0530, Bharath Rupireddy wrote:
> >
> > FYI that thread is closed, it committed the change (f61e1dd [1]) that
> > pg_receivewal can read from its replication slot restart lsn.
> >
> > I know that providing the start pos as an option came up there [2],
> > but I wanted to start the discussion fresh as that thread got closed.
>
> Ah sorry I misunderstood your email.
>
> I'm not sure it's a good idea. If you have missing WALs in your target
> directory but have an alternative backup location, you will have to restore the
> WAL from that alternative location anyway, so I'm not sure how accepting a
> different start position is going to help in that scenario. On the other hand
> allowing a position at the command line can also lead to accepting a bogus
> position, which could possibly make things worse.
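The "bogus position" concern could be reduced by validating a user-supplied start position before streaming begins. Below is a minimal sketch in Python, assuming PostgreSQL's usual "X/X" hexadecimal LSN text form, the default 16 MB WAL segment size, and that the server's current flush position is known (for example from the xlogpos column returned by IDENTIFY_SYSTEM). The helper names (parse_lsn, format_lsn, choose_startpos) are hypothetical and are not pg_receivewal code.

```python
import re

# Assumption: default 16 MB WAL segment size (the real value is server-specific).
WAL_SEG_SIZE = 16 * 1024 * 1024

# "X/X": two 32-bit hexadecimal halves separated by a slash, e.g. "0/16B3748".
_LSN_RE = re.compile(r"([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})")


def parse_lsn(text: str) -> int:
    """Parse an LSN in 'X/X' text form into a single 64-bit integer."""
    m = _LSN_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid LSN {text!r}")
    return (int(m.group(1), 16) << 32) | int(m.group(2), 16)


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN back into 'X/X' text form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def choose_startpos(user_text: str, server_flush_lsn: int) -> int:
    """Hypothetical validation of a user-supplied start position.

    Rejects malformed or zero positions and positions beyond the server's
    reported flush position, then rounds down to a segment boundary
    (assumption: streaming starts at the beginning of a segment).
    """
    startpos = parse_lsn(user_text)
    if startpos == 0:
        raise ValueError("start position must be non-zero")
    if startpos > server_flush_lsn:
        raise ValueError(
            f"start position {format_lsn(startpos)} is ahead of the server "
            f"flush position {format_lsn(server_flush_lsn)}"
        )
    return startpos - (startpos % WAL_SEG_SIZE)
```

For example, choose_startpos("0/16B3748", parse_lsn("0/3000000")) yields the segment-aligned position 0/1000000, while a malformed string or a position ahead of the server raises ValueError instead of being silently accepted.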
Isn't it complex, in production environments, to go to the archive location? That involves extra steps: getting authentication tokens, searching for the required WAL file, downloading it, unzipping it, copying it back to the pg_receivewal node, and so on. This is error-prone and adds time to bringing pg_receivewal back up. Instead, if I know the latest checkpoint LSN and the last archived WAL file from the primary, I can simply provide the startpos (probably the last checkpoint LSN) to pg_receivewal so that it continues fetching WAL records from the primary, avoiding all of that manual work.

> > 2) Currently, RECONNECT_SLEEP_TIME is 5sec - but I may want to have
> > more reconnect time as I know that the primary can go down at any time
> > for whatever reasons in production environments which can take some
> > time till I bring up primary and I don't want to waste compute cycles
> > in the node on which pg_receivewal is running
>
> I don't think that attempting a connection is really costly. Also, increasing
> this retry time also increases the amount of time you're not streaming WALs,
> and thus the amount of data you can lose so I'm not sure that's actually a good
> idea. But you might also want to make it more aggressive, so no objection to
> make it configurable.

Yeah, making it configurable lets users tune the reconnect time to their requirements.

Regards,
Bharath Rupireddy.