On Wed, Feb 27, 2019 at 11:50:17AM +0100, Fabien COELHO wrote:
>> Shouldn't be necessary - the control file fits into a single page, and
>> writes of that size ought to always be atomic. And I also think
>> introducing flock usage for this would be quite disproportional.

There are static assertions to make sure that the size of the control
file data never gets higher than 512 bytes for this purpose.

> Note that my concern is not about the page size, but rather that as more
> commands may change the cluster status by editing the control file, it would
> be better that a postmaster does not start while a pg_rewind or enable
> checksum or whatever is in progress, and currently there is a possible race
> condition between the read and write that can induce an issue, at least
> theoretically.

Something that I think we could live with instead is a special flag in
the control file to mark the postmaster as in maintenance mode.  This
would be useful to prevent the postmaster from starting if it sees this
flag in the control file, as well as to find out that a host has crashed
in the middle of a maintenance operation.  We don't give this assurance
now when running pg_rewind, which is bad.  That's also separate from the
checksum-related patches and pg_rewind.

flock() can be hard to live with for cross-platform compatibility, for
example on Windows (LockFileEx) or fancy platforms.  And note that we
don't use it yet in the tree.  And flock() would help in the first case
I am mentioning, but not in the second.
--
Michael