On Wed, Jan 27, 2021 at 2:49 PM Greg Kurz <gr...@kaod.org> wrote:
>
> On Wed, 27 Jan 2021 11:34:52 +0100
> Miklos Szeredi <mszer...@redhat.com> wrote:

> > Another solution specifically for O_CREAT without O_EXCL would be to
> > turn it into an exclusive create.
>
> Would this added O_EXCL then appear on the client side, e.g. to
> guest userspace doing fcntl(F_GETFL) ?

No.  Guest kernel keeps track of open flags.

> > If that fails with EEXIST then try
> > the normal open path (open with O_PATH, fstat, open proc symlink).  If
>
> open(O_PATH | O_NOFOLLOW) + fstatat(AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW)
> would indeed allow to filter out anything that isn't a directory and
> to safely open the proc symlink.
>
> > that fails with ENOENT, then retry the whole thing a certain number of
>
> Indeed someone could have unlinked the file in the meantime, in which
> case the open(O_PATH | O_NOFOLLOW) would fail, but if it succeeds then
> we cannot hit ENOENT anymore AFAICT.

Right.

> > times.  If it still fails then somebody is definitely messing with us
> > and we can fail the request with EIO.
> >
>
> Not sure what the retry+timeout is trying to mitigate here... why not
> returning EIO right away ?

The semantics of O_CREATE are that it can fail neither because the
file exists nor because it doesn't.  This doesn't matter if the
exported tree is not modified outside of a single guest, because of
locking provided by the guest kernel.

However if we want to support shared access to a tree then O_CREAT
semantics should work even in the face of races due to external
modification of the tree.  I.e. following race:

virtiofsd: open(foo, O_CREAT | O_EXCL) -> EEXIST
other task: unlink(foo)
virtiofsd: open(foo, O_PATH | O_NOFOLLOW) -> ENOENT

To properly support the above the O_CREAT | O_EXCL open would need to
be retried.

Thanks,
Miklos


Reply via email to