Thomas Munro <thomas.mu...@gmail.com> writes:
> On Wed, Nov 23, 2022 at 11:03 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
>> I assume this is ext4. Presumably anything that reads the
>> controlfile, like pg_ctl, pg_checksums, pg_resetwal,
>> pg_control_system(), ... by reading without interlocking against
>> writes could see garbage. I have lost track of the versions and the
>> thread, but I worked out at some point by experimentation that this
>> only started relatively recently for concurrent read() and write(),
>> but always happened with concurrent pread() and pwrite(). The control
>> file uses the non-p variants which didn't mash old/new data like
>> grated cheese under concurrency due to some implementation detail, but
>> now does.

Ugh.

> As for what to do about it, some ideas:
> 2. Retry after a short time on checksum failure. The probability is
> already miniscule, and becomes pretty close to 0 if we read thrice
> 100ms apart.

> First thought is that 2 is appropriate level of complexity for this
> rare and stupid problem.

Yeah, I was thinking the same. A variant could be "repeat until
we see the same calculated checksum twice".

			regards, tom lane
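
For illustration only, the "same calculated checksum twice" variant could be sketched roughly as below. This is not PostgreSQL code. It assumes a POSIX system, and sketch_checksum() (an FNV-1a hash standing in for the real control file CRC), read_whole(), read_until_stable() and their parameters are all hypothetical names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the real control file CRC: a plain FNV-1a hash. */
static uint32_t
sketch_checksum(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Read exactly len bytes from the start of path; false on error or short file. */
static bool
read_whole(const char *path, unsigned char *buf, size_t len)
{
    int     fd = open(path, O_RDONLY);
    size_t  done = 0;

    if (fd < 0)
        return false;
    while (done < len)
    {
        ssize_t n = read(fd, buf + done, len - done);

        if (n <= 0)
            break;              /* error or EOF: give up on this attempt */
        done += (size_t) n;
    }
    close(fd);
    return done == len;
}

/*
 * Re-read the file until two consecutive reads produce the same calculated
 * checksum, sleeping delay_ms between reads.  Gives up (returns false) after
 * max_attempts reads, so at least two attempts are needed to succeed.
 */
static bool
read_until_stable(const char *path, unsigned char *buf, size_t len,
                  int max_attempts, long delay_ms)
{
    uint32_t prev = 0;
    bool     have_prev = false;

    assert(buf != NULL && len > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++)
    {
        uint32_t cur;

        if (attempt > 0)
        {
            struct timespec ts = {delay_ms / 1000, (delay_ms % 1000) * 1000000L};

            nanosleep(&ts, NULL);
        }
        if (!read_whole(path, buf, len))
            return false;
        cur = sketch_checksum(buf, len);
        if (have_prev && cur == prev)
            return true;        /* same contents seen twice in a row */
        prev = cur;
        have_prev = true;
    }
    return false;
}
```

A caller would presumably still compare the stored checksum against the calculated one after a stable read. The sketch only shows the "read until it stops changing" part, for example read_until_stable(path, buf, len, 3, 100) to mirror the "thrice 100ms apart" suggestion.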