On Fri, Aug 13, 2021 at 05:59:21PM -0700, Soumyadeep Chakraborty wrote:
> and passes with the code change, as expected. I can't explain why the
> test doesn't freeze up in v3 in wait_for_catchup() at the end.

It took me some time to understand why.  If I am right, that's because
of the intermediate test block working on $standby_2 and the two INSERT
queries of the primary.  In v1 and v4, we have no activity on the
primary between the first set of tests and yours, meaning that $standby
has nothing to do.  In v2 and v3, the two INSERT queries run on the
primary for the purpose of the recovery pause make $standby_1 wait for
the default value of recovery_min_apply_delay, aka 3s, in parallel.  If
the set of tests for $standby_2 is faster than that, we'd bump into the
phase where the code still waited for 3s, not the 2 hours set, visibly.

After considering this stuff, the order dependency we'd introduce in
this test makes the whole thing more brittle than it should be.  And
such an edge case does not seem worth spending extra cycles testing
anyway, as if things break we'd finish with a test stuck for an
unnecessarily long time by relying on wait_for_catchup("replay").  We
could use something else, say based on a lookup of pg_stat_activity,
but this still requires extra run time for the wait phases needed.

So in the end I have dropped the test, but backpatched the fix.
--
Michael