On Thu, 25 Oct 2018 02:57:18 -0400
Nikolay Samokhvalov <samokhva...@gmail.com> wrote:

...
> My research shows that some people already rely on the following when
> planned failover (aka switchover) procedure, doing it in production:
>
> 1) shutdown the current master
> 2) ensure that the "master candidate" replica has received all WAL data
> including shutdown checkpoint from the old master
> 3) promote the master candidate to make it new master
> 4) configure recovery.conf on the old master node, while it's inactive
> 5) start the old master node as a new replica following the new master.
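Steps 1, 2 and 4 are the ones that are easy to get wrong by hand. The following is a minimal sketch of how they could be checked and prepared. It assumes you already captured the `pg_controldata` output of the stopped old master, and the replica's received position from `pg_last_wal_receive_lsn()` (PostgreSQL 10+) or `pg_last_xlog_receive_location()` (9.x). All helper names are hypothetical. The step 2 test is simplified: it only checks that the replica received WAL beyond the *start* of the shutdown checkpoint record. A stricter check would look for that record in the replica's WAL, for example with pg_waldump.

```python
# Sketch only: helper names are hypothetical, not part of PostgreSQL or PAF.


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000028' into a 64-bit integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def controldata_field(output: str, field: str) -> str:
    """Return the value of one field from `pg_controldata` text output."""
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == field:
            return value.strip()
    raise KeyError(field)


def old_master_ready(controldata_output: str) -> str:
    """Step 1: confirm a clean shutdown and return the shutdown checkpoint LSN."""
    state = controldata_field(controldata_output, "Database cluster state")
    if state != "shut down":
        raise RuntimeError(f"old master not cleanly shut down: {state!r}")
    return controldata_field(controldata_output, "Latest checkpoint location")


def shutdown_checkpoint_received(checkpoint_lsn: str, replica_received_lsn: str) -> bool:
    """Step 2 (simplified): the replica must have received WAL beyond the
    start of the old master's shutdown checkpoint record."""
    return parse_lsn(replica_received_lsn) > parse_lsn(checkpoint_lsn)


def recovery_conf(primary_conninfo: str) -> str:
    """Step 4: minimal recovery.conf turning the old master into a replica
    (recovery.conf-era releases; PostgreSQL 12 moved these settings)."""
    quoted = primary_conninfo.replace("'", "''")  # escape quotes in the value
    return (
        "standby_mode = 'on'\n"
        f"primary_conninfo = '{quoted}'\n"
        # the promoted replica switches to a new timeline; follow it
        "recovery_target_timeline = 'latest'\n"
    )


if __name__ == "__main__":
    # Hypothetical excerpt of `pg_controldata` output on the old master.
    sample = (
        "Database cluster state:               shut down\n"
        "Latest checkpoint location:           0/3000028\n"
    )
    lsn = old_master_ready(sample)
    print(lsn, shutdown_checkpoint_received(lsn, "0/30000A0"))
    print(recovery_conf("host=new-master port=5432 user=replicator"), end="")
```

Promotion (step 3) should only happen once the step 2 check passes. `recovery_target_timeline = 'latest'` is included because the promoted replica continues on a new timeline, which the old master has to follow.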
Indeed.

> It looks to me now, that if no steps missed in the procedure, this approach
> is eligible for Postgres versions 9.3+ (for older versions like 9.3 maybe
> not really always – people who know details better will correct me here
> maybe). Am I right? Or I'm missing some risks here?

As far as I know, this is correct.

> Two changes were made in 9.3 which allowed this approach in general [1]
> [2]. Also, I see from the code [3] that during shutdown process, the
> walsenders are the last who are stopped, so allow replicas to get the
> shutdown checkpoint information.

I came to the same conclusions when I was studying controlled failover some
years ago to implement it in the PAF project (allowing controlled switchover
in one command). Here is a discussion about switchover that took place three
years ago on the Pacemaker mailing list:

https://lists.clusterlabs.org/pipermail/users/2016-October/011568.html

> Is this approach considered as safe now?

Considering the above points, I think so.

The only additional nice step would be the ability to run some more safety
tests on the old master AFTER the switchover process. The only way I can think
of would be to run pg_rewind, even if it doesn't do much.

> if so, let's add it to the documentation, making it official. The patch is
> attached.

I suppose we should add the technical steps in a sample procedure?
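The pg_rewind safety test suggested above could be scripted as a dry run against the stopped old master, before it is started as a replica. The following is a sketch under stated assumptions. It uses pg_rewind's `--target-pgdata`, `--source-server` and `--dry-run` options. The data directory and connection string are hypothetical. pg_rewind also requires that the old master was initialized with data checksums or runs with `wal_log_hints = on`. After a clean switchover, one would expect it to report that no rewind is required; any other outcome would be a reason to investigate before step 5.

```python
import subprocess
from typing import List


def pg_rewind_check_cmd(target_pgdata: str, source_conninfo: str) -> List[str]:
    """Build a pg_rewind invocation that inspects but never modifies the target."""
    return [
        "pg_rewind",
        f"--target-pgdata={target_pgdata}",     # old master's data dir (must be stopped)
        f"--source-server={source_conninfo}",   # libpq connection to the new master
        "--dry-run",                            # report only, change nothing
    ]


def run_post_switchover_check(target_pgdata: str, source_conninfo: str) -> subprocess.CompletedProcess:
    """Run the dry run and capture its output for inspection (hypothetical helper)."""
    cmd = pg_rewind_check_cmd(target_pgdata, source_conninfo)
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

The command is built separately from its execution so it can be logged or reviewed before being run on a production node.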