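On 2018-04-09 10:00:41 +0800, Craig Ringer wrote:
> I suspect we've written off a fair few issues in the past as "it's bad
> hardware" when actually, the hardware fault was the trigger for a Pg/kernel
> interaction bug. And blamed containers for things that weren't really the
> container's fault. But even so, if it were happening tons, we'd hear more
> noise.

Agreed on that, but I think that's FAR more likely to be things like
multixacts, index structure corruption due to logic bugs etc.

> I've already been very surprised there when I learned that PostgreSQL
> completely ignores wholly absent relfilenodes. Specifically, if you
> unlink() a relation's backing relfilenode while Pg is down and that file
> has writes pending in the WAL. We merrily re-create it with uninitialized
> pages and go on our way. As Andres pointed out in an offlist discussion,
> redo isn't a consistency check, and it's not obliged to fail in such cases.
> We can say "well, don't do that then" and define away file losses from FS
> corruption etc as not our problem, the lower levels we expect to take care
> of this have failed.

And it'd be a really bad idea to behave differently.

> And in many failure modes there's no reason to expect any data loss at all,
> like:
>
> * Local disk fills up (seems to be safe already due to space reservation at
> write() time)

That definitely should be treated separately.

> * Thin-provisioned storage backing local volume iSCSI or paravirt block
> device fills up
> * NFS volume fills up

Those should be the same as the above.

> I think we need to think about a more robust path in future. But it's
> certainly not "stop the world" territory.

I think you're underestimating the complexity of doing that by at least two
orders of magnitude.

Greetings,

Andres Freund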
Agreed on that, but I think that's FAR more likely to be things like multixacts, index structure corruption due to logic bugs etc. > I've already been very surprised there when I learned that PostgreSQL > completely ignores wholly absent relfilenodes. Specifically, if you > unlink() a relation's backing relfilenode while Pg is down and that file > has writes pending in the WAL. We merrily re-create it with uninitalized > pages and go on our way. As Andres pointed out in an offlist discussion, > redo isn't a consistency check, and it's not obliged to fail in such cases. > We can say "well, don't do that then" and define away file losses from FS > corruption etc as not our problem, the lower levels we expect to take care > of this have failed. And it'd be a realy bad idea to behave differently. > And in many failure modes there's no reason to expect any data loss at all, > like: > > * Local disk fills up (seems to be safe already due to space reservation at > write() time) That definitely should be treated separately. > * Thin-provisioned storage backing local volume iSCSI or paravirt block > device fills up > * NFS volume fills up Those should be the same as the above. > I think we need to think about a more robust path in future. But it's > certainly not "stop the world" territory. I think you're underestimating the complexity of doing that by at least two orders of magnitude. Greetings, Andres Freund