Hi,

On 2018-08-01 21:03:22 +0300, Sergei Kornilov wrote:
> > They fail over to a secondary to do maintenance on a primary.
> But this is not problem even in current patch state. We can restart replica
> before failover and it works. I tested this behavior during my review.
> We can:
> - call pg_enable_data_checksums() on master
> - wait change data_checksums to inprogress on replica

That's *precisely* the problem. What if your replicas are delayed
(e.g. recovery_min_apply_delay)? How would you schedule that restart
properly? What if you later need to do PITR?

> - restart replica - we can restart replica before promote, right?
> - promote this replica
> - checksum helper is launched now and working on this promoted cluster

This doesn't test the consequences of the restart being skipped, nor
does it review the correctness at the code level.

Greetings,

Andres Freund