On Fri, Jul 8, 2022 at 9:16 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Sat, Jun 25, 2022 at 1:31 AM Cary Huang <cary.hu...@highgo.ca> wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:       tested, passed
> > Spec compliant:           not tested
> > Documentation:            not tested
> >
> > Hello
> >
> > I tested this patch in a setup where the standby is in the middle of
> > replicating and REDOing primary's WAL files during a very large data
> > insertion. During this time, I keep killing the walreceiver process to
> > cause a stream failure and force standby to read from archive. The system
> > will restore from archive for "wal_retrieve_retry_interval" seconds before
> > it attempts to stream again. Without this patch, once the streaming is
> > interrupted, it keeps reading from archive until standby reaches the same
> > consistent state of primary and then it will switch back to streaming
> > again. So it seems that the patch does the job as described and does bring
> > some benefit during a very large REDO job where it will try to re-stream
> > after restoring some WALs from archive to speed up this "catch up" process.
> > But if the recovery job is not a large one, PG is already switching back to
> > streaming once it hits consistent state.
>
> Thanks a lot Cary for testing the patch.
>
> > Here's a v1 patch that I've come up with. I'm right now using the
> > existing GUC wal_retrieve_retry_interval to switch to stream mode from
> > archive mode as opposed to switching only after the failure to get WAL
> > from archive mode. If okay with the approach, I can add tests, change
> > the docs and add a new GUC to control this behaviour. I'm open to
> > thoughts and ideas here.
>
> It will be great if I can hear some thoughts on the above points (as
> posted upthread).
Here's the v2 patch with a separate GUC. A new GUC was necessary because the existing GUC wal_retrieve_retry_interval is already loaded with multiple usages. When the feature is enabled, it lets the standby switch to stream mode, i.e. fetch WAL from the primary, even before fetching from the archive fails. The switch from archive to stream mode happens in 2 scenarios:

1) when the standby is in initial recovery;
2) when there was a failure in receiving from the primary (the walreceiver got killed, crashed or timed out, or connectivity to the primary was broken, for whatever reason).

I also added test cases to the v2 patch. Please review the patch.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
v2-0001-Switch-WAL-source-to-stream-from-archive.patch