Hi

> This doesn't test the consequences of the restart being skipped, nor
> does it review on a code level the correctness.
I don't check just one thing during review; I read the code too. The
checksumhelper.c bgworker is registered with:
> bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
It then processes the whole cluster, even if checksumhelper ran before but
exited before it completed. Or does BgWorkerStart_RecoveryFinished not
guarantee that the worker starts only after recovery has finished?
Before doing any real work (and after recovery has ended), checksumhelper
checks the current cluster state again:

> +      * If a standby was restarted when in pending state, a background worker
> +      * was registered to start. If it's later promoted after the master has
> +      * completed enabling checksums, we need to terminate immediately and not
> +      * do anything. If the cluster is still in pending state when promoted,
> +      * the background worker should start to complete the job.
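
To make that startup check concrete, here is a minimal, self-contained C
model of the decision. The enum and function names are hypothetical and are
not taken from the patch; it only assumes the cluster is in one of three
states (off, pending, on), and treating "off" as "nothing to do" is my
assumption, not something the comment above states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the cluster-wide checksum state (illustrative names). */
typedef enum
{
    CHECKSUMS_OFF,
    CHECKSUMS_PENDING,   /* pg_enable_data_checksums() called, not finished */
    CHECKSUMS_ON
} ChecksumState;

/*
 * Decision taken by the helper once it has been started after recovery:
 * re-read the current state and do real work only if enabling is still
 * pending.
 */
static bool
helper_should_process(ChecksumState current)
{
    switch (current)
    {
        case CHECKSUMS_PENDING:
            return true;    /* job not finished: process the whole cluster */
        case CHECKSUMS_ON:
            return false;   /* master already completed: exit immediately */
        case CHECKSUMS_OFF:
            return false;   /* assumption: nothing pending, so nothing to do */
    }
    assert(!"unknown checksum state");
    return false;
}
```

The point is that the worker being registered is harmless on its own: the
state check at startup decides whether it actually touches any data.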

> What if your replicas are delayed (e.g. recovery_min_apply_delay)?
> What if you later need to do PITR?
If we stop after replaying pg_enable_data_checksums but before it has
completed, we schedule the bgworker to start when recovery finishes.
If we have replayed the checksumhelper completion, we _can_ still start
checksumhelper again, and this case is handled when checksumhelper starts.
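
A small C sketch of those two replay scenarios, again with hypothetical
names that are not from the patch: replaying only a prefix of the events
stands in for a delayed replica or a PITR target, and "promotion" starts the
registered helper, which re-checks the state before doing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical replayed events relevant to enabling checksums. */
typedef enum { EV_ENABLE, EV_FINISH } ReplayEvent;

typedef enum { ST_OFF, ST_PENDING, ST_ON } State;

typedef struct
{
    State state;
    bool  worker_registered;   /* helper scheduled for "recovery finished" */
} Standby;

/* Replay the first n events; stopping early models delay or a PITR target. */
static Standby
replay(const ReplayEvent *events, size_t n)
{
    Standby s = { ST_OFF, false };

    for (size_t i = 0; i < n; i++)
    {
        switch (events[i])
        {
            case EV_ENABLE:
                s.state = ST_PENDING;
                s.worker_registered = true;
                break;
            case EV_FINISH:
                assert(s.state == ST_PENDING);  /* finish only follows enable */
                s.state = ST_ON;                /* worker stays registered */
                break;
        }
    }
    return s;
}

/* At promotion the registered helper starts and reports whether it did work. */
static bool
promote(Standby s)
{
    if (!s.worker_registered)
        return false;
    return s.state == ST_PENDING;   /* the startup re-check */
}
```

With the events `{ EV_ENABLE, EV_FINISH }`, promoting after one event makes
the helper process the cluster, while promoting after both makes it start
and exit without doing anything.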

The behavior seems correct to me. Am I missing something badly wrong?

regards, Sergei
