On 25/11/2020 15:20, Daniel Gustafsson wrote:
On 23 Nov 2020, at 18:36, Heikki Linnakangas <[email protected]> wrote: What happens is if you crash between UpdateControlFile() and XlogChecksum()?Good point, that would not get the cluster to a consistent state. The XlogChecksum should be performed before controlfile is udpated. +void +SetDataChecksumsOff(void) +{ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + + if (ControlFile->data_checksum_version == 0) + { + LWLockRelease(ControlFileLock); + return; + } + + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION) + { + ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION; + UpdateControlFile(); + LWLockRelease(ControlFileLock); + + /* + * Update local state in all backends to ensure that any backend in + * "on" state is changed to "inprogress-off". + */ + XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION); + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF)); + + /* + * At this point we know that no backends are verifying data checksums + * during reading. Next, we can safely move to state "off" to also + * stop writing checksums. + */ + } + + XlogChecksums(0); + + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + ControlFile->data_checksum_version = 0; + UpdateControlFile(); + LWLockRelease(ControlFileLock); + + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF)); +}
The lwlocking doesn't look right here. If ControlFile->data_checksum_version != PG_DATA_CHECKSUM_VERSION, LWLockAcquire is called twice without a LWLockRelease in between.
What if a checkpoint, and a crash, happens just after the WAL record has been written, but before the control file is updated? That's a ridiculously tight window for a whole checkpoint cycle to happen, but in principle I think that would spell trouble. I think you could set delayChkpt to prevent the checkpoint from happening in that window, similar to how we avoid this problem with the clog updates at commit. Also, I think this should be in a critical section; we don't want the process to error out in between for any reason, and if it does happen, it's panic time.
- Heikki
