On Thu, Feb 02, 2023 at 03:23:51PM -0500, Robert Haas wrote: > 0001 fixes what I believe to be a slight logical error in sendFile(), > introduced by me during the v15 development cycle when I introduced > the bbsink abstraction. I believe that it is theoretically possible > for this to cause an assertion failure, although the chances of that > actually happening seem extremely remote in practice. I don't believe > there are any consequences worse than that; for instance, I don't > think this can result in your backup getting corrupted. See the > proposed commit message for full details. Because I believe that this > is formally a bug, I am inclined to think that this should be > back-patched, but I also think it's fairly likely that no one would > ever notice if we didn't. However, patch authors have been known to be > wrong about the consequences of their own bugs from time to time, so > please do let me know if this seems more serious to you than what I'm > indicating, or conversely if you think it's not a problem at all for > some reason.
Seems right, I think that you should backpatch that as VERIFY_CHECKSUMS is the default. > 0002 removes an old comment from the file that I find useless and > slightly misleading. Okay. > 0003 rewrites a comment about the way that we verify checksums during > backups. If we get a checksum mismatch, we reread the block and see if > the perceived problem goes away. If it doesn't, then we report it. > This is intended as protection against the backup reading a block > while some other process is in the midst of writing it, but there's no > guarantee that any concurrent write will finish quickly enough for our > second read attempt to see the updated contents. There is more to it: the page LSN is checked before its checksum. Hence, if the page's LSN is corrupted in a way that it is higher than sink->bbs_state->startptr, the checksum verification is just skipped while the page is broken but not reported as such. (Spoiler: this has been mentioned in the past, and maybe we'd better remove this stuff in its entirety.) > The comment claims > otherwise, and that's false, and I'm getting tired of reading this > false claim every time I read this code, so I rewrote the comment to > say what I believe to be true, namely, that our algorithm is flaky and > we have no good way to fix that right now. I'm pretty sure that Andres > pointed this problem out when this feature was under discussion, but > somehow it's still like this. There's another nearby comment which is > also false or at least misleading for basically the same reasons which > probably should be rewritten too, but I was a bit less certain how to > rewrite it and it wasn't making me as annoyed as this one, so for now > I only rewrote the one. Indeed, that's an improvement ;) -- Michael
signature.asc
Description: PGP signature