Hi, On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote: > > Oh, that's not my intention either -- I just wanted to make sure I > > was thinking about the same issue you were.
> I agree we shouldn't rely on chance here - if we might read a stale > value, we need to fix that of course. It's perfectly possible that some side-conditions mitigate this. What concerns me that a) Nobody appears to have raised this issue beforehand, besides an unlocked read of a critical variable being a fairly obvious issue. This kind of thing needs to be carefully thought about. b) If there's some "side channel" interlock, it's not documented. I noticed the issue because of an IM question about the general feature, and I did a three minute skim and saw the read without a comment. > I'm not quite sure I fully understand the issue, though. I assume both > LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to > happen the process would need to be already past LockBufHdr when the > checksum version is updated. In which case it can use a stale version > when writing the buffer out. Correct? Yes, they're are memory barriers. > I wonder if that's actually a problem, considering the checksum worker > will then overwrite all data with correct checksums anyway. So the other > process would have to overwrite the buffer after checksum worker, at > which point it'll have to go through LockBufHdr. Again, I'm not sure if there's some combination of issues that make this not a problem in practice. I just *asked* if there's something preventing this from being a problem. The really problematic case would be if it is possible for some process to wait long enough, without executing a barrier implying operation, that it'd try to write out a page that the checksum worker has already passed over. Greetings, Andres Freund