On Mon, Apr 9, 2018 at 12:13 PM, Andres Freund <and...@anarazel.de> wrote:
> Let's lower the pitchforks a bit here. Obviously a grand rewrite is
> absurd, as is some of the proposed ways this is all supposed to
> work. But I think the case we're discussing is much closer to a near
> irresolvable corner case than anything else.

+1

> We're talking about the storage layer returning an irresolvable
> error. You're hosed even if we report it properly. Yes, it'd be nice if
> we could report it reliably. But that doesn't change the fact that what
> we're doing is ensuring that data is safely fsynced unless storage
> fails, in which case it's not safely fsynced anyway.

Right. We seem to be implicitly assuming that there is a big
difference between a problem in the storage layer that we could in
principle detect, but don't, and any other problem in the storage
layer. I've read articles claiming that technologies like SMART are
not really reliable in a practical sense [1], so it seems to me that
there is reason to doubt that this gap is all that big.

That said, I suspect that the problems with running out of disk space
are serious practical problems. I have personally scoffed at stories
involving Postgres database corruption that gets attributed to
running out of disk space. Looks like I was dead wrong.

[1] https://danluu.com/file-consistency/ -- "Filesystem correctness"
-- 
Peter Geoghegan