On Wed, Feb 20, 2019 at 5:52 AM Andres Freund <and...@anarazel.de> wrote:
> > 1.  Figure out how to get the ALLOCATE command all the way through the
> > stack from PostgreSQL to the remote NFS server, and know for sure that
> > it really happened.  On the Debian buster Linux 4.18 system I checked,
> > fallocate() fails with EOPNOTSUPP, and posix_fallocate()
> > appears to succeed but it doesn't really do anything at all (though I
> > understand that some versions sometimes write zeros to simulate
> > allocation, which in this case would be equally useless as it doesn't
> > reserve anything on an NFS server).  We need the server and NFS client
> > and libc to be of the right version and cooperate and tell us that
> > they have really truly reserved space, but there isn't currently a way
> > as far as I can tell.  How can we achieve that, without writing our
> > own NFS client?
> >
> > 2.  Deal with the resulting performance suckage.  Extending 8kb at a
> > time with synchronous network round trips won't fly.
>
> I think I'd just go for fsync();pwrite();fsync(); as the extension
> mechanism, iff we're detecting a tablespace is on NFS. The first fsync()
> to make sure there are no previous errors that we could mistake for
> ENOSPC, the pwrite to extend, the second fsync to make sure there's
> actually space. Then we can detect ENOSPC properly.  That possibly does
> leave some errors where we could mistake ENOSPC as something more benign
> than it is, but the cases seem pretty narrow, due to the previous
> fsync() (maybe the other side could be thin provisioned and get an
> ENOSPC there - but in that case we didn't actually lose any data. The
> only dangerous scenario I can come up with is that the remote side is on a
> thinly provisioned CoW system, and a concurrent write to an earlier
> block runs out of space - but seriously, good riddance to you).

This seems to make sense, and has the advantage that it uses
interfaces that exist right now.  But it seems a bit like we'll have
to wait for the kernel folks to finish building out the errseq_t
support for NFS to avoid various races around the mapping's AS_EIO
flag (A: fsync() -> EIO, B: fsync() -> SUCCESS, log checkpoint; A:
panic), and then maybe we'd have to get at least one of { fd-passing,
direct IO, threads } working on our side ...

-- 
Thomas Munro
https://enterprisedb.com
