Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-19 Thread Jeff Layton
On Tue, 2020-05-19 at 07:00 +0300, Amir Goldstein wrote:
> On Tue, May 19, 2020 at 1:30 AM Gregory Farnum  wrote:
> > Maybe we resolved this conversation; I can't quite tell...
> 
> I think v2 patch wraps it up...
> 
> [...]
> 

Agreed.

> > > > Questions:
> > > > 1. Does sync() result in fully purging inodes on MDS?
> > > 
> > > I don't think so, but again, that code is not trivial to follow. I do
> > > know that the MDS keeps around a "strays directory" which contains
> > > unlinked inodes that are lazily cleaned up. My suspicion is that it's
> > > satisfying lookups out of this cache as well.
> > > 
> > > Which may be fine...the MDS is not required to be POSIX compliant after
> > > all. Only the fs drivers are.
> > 
> > I don't think this is quite that simple. Yes, the MDS is certainly
> > giving back stray inodes in response to a lookup-by-ino request. But
> > that's for a specific purpose: we need to be able to give back caps on
> > unlinked-but-open files. For NFS specifically, I don't know what the
> > rules are on NFS file handles and unlinked files, but the Ceph MDS
> > won't know when files are closed everywhere, and it translates from
> > NFS fh to Ceph inode using that lookup-by-ino functionality.
> > 
> 
> There is no protocol rule that NFS server MUST return ESTALE
> for file handle of a deleted file, but there is a rule that it MAY return
> ESTALE for deleted file. For example, on server restart and traditional
> block filesystem, there is not much choice.
> 
> So returning ESTALE when file is deleted but opened on another ceph
> client is definitely allowed by the protocol standard, the question is
> whether changing the behavior will break any existing workloads...
> 

Right -- that was sort of the point of my original question about the
xfstest. The fact that ceph wasn't returning ESTALE in this situation
didn't seem to be technically _wrong_ to me, but the xfstest treated
that as a failure. It's probably best to return ESTALE since that's the
conventional behavior, but I don't think it's necessarily required for
correct operation in general.

FWIW, if we ever implement O_TMPFILE in ceph, then we may need to
revisit this code. With that, you can do a 0->1 transition on i_nlink,
which blows some of the assumptions we're making here out of the water.

> > > > 2. Is i_nlink synchronized among nodes on deferred delete?
> > > > > IOW, can inode come back from the dead on client if another node
> > > > has linked it before i_nlink 0 was observed?
> > > 
> > > No, that shouldn't happen. The caps mechanism should ensure that it
> > > can't be observed by other clients until after the change.
> > > 
> > > That said, Luis' current patch doesn't ensure we have the correct caps
> > > to check the i_nlink. We may need to add that in before we can roll with
> > > this.
> > > 
> > > > 3. Can an NFS client be "migrated" from one ceph node to another
> > > > with an open but unlinked file?
> > > > 
> > > 
> > > No. Open files in ceph are generally per-client. You can't pass around a
> > > fd (or equivalent).
> > 
> > But the NFS file handles I think do work across clients, right?
> > 
> 
> Maybe they can, but that would be like NFS server restart, so
> all bets are off w.r.t open but deleted files.
> 

They do work across clients, but a file handle is just an identifier for
an inode. That's completely orthogonal to whether the file is open.

-- 
Jeff Layton 
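
The check that the original patch adds to __fh_to_dentry() can be modelled
in plain userspace C roughly as follows. This is only a sketch: struct
model_inode and model_fh_to_dentry() are hypothetical names invented for
illustration, not kernel interfaces, and the cap handling that the thread
asks for before trusting i_nlink is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of inode state the patch reads. */
struct model_inode {
    unsigned int nlink; /* models inode->i_nlink */
    bool is_open;       /* models a non-zero __ceph_is_file_opened(ci) */
};

/*
 * Returns 0 when the file handle should resolve to a dentry, or -ESTALE
 * when it should be rejected. Only an inode that is both unlinked and
 * not open on this client is treated as stale.
 */
static int model_fh_to_dentry(const struct model_inode *inode)
{
    if (inode->nlink == 0 && !inode->is_open)
        return -ESTALE;
    return 0;
}
```

With this model, the generic/467 sequence (open, unlink, then
open_by_handle_at) corresponds to nlink == 0 with is_open == true and
resolves, while the generic/426 sequence (unlink with no open file)
corresponds to nlink == 0 with is_open == false and yields -ESTALE.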



Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-18 Thread Amir Goldstein
On Tue, May 19, 2020 at 1:30 AM Gregory Farnum  wrote:
>
> Maybe we resolved this conversation; I can't quite tell...

I think v2 patch wraps it up...

[...]

> > >
> > > Questions:
> > > 1. Does sync() result in fully purging inodes on MDS?
> >
> > I don't think so, but again, that code is not trivial to follow. I do
> > know that the MDS keeps around a "strays directory" which contains
> > unlinked inodes that are lazily cleaned up. My suspicion is that it's
> > satisfying lookups out of this cache as well.
> >
> > Which may be fine...the MDS is not required to be POSIX compliant after
> > all. Only the fs drivers are.
>
> I don't think this is quite that simple. Yes, the MDS is certainly
> giving back stray inodes in response to a lookup-by-ino request. But
> that's for a specific purpose: we need to be able to give back caps on
> unlinked-but-open files. For NFS specifically, I don't know what the
> rules are on NFS file handles and unlinked files, but the Ceph MDS
> won't know when files are closed everywhere, and it translates from
> NFS fh to Ceph inode using that lookup-by-ino functionality.
>

There is no protocol rule that NFS server MUST return ESTALE
for file handle of a deleted file, but there is a rule that it MAY return
ESTALE for deleted file. For example, on server restart and traditional
block filesystem, there is not much choice.

So returning ESTALE when file is deleted but opened on another ceph
client is definitely allowed by the protocol standard, the question is
whether changing the behavior will break any existing workloads...

> >
> > > 2. Is i_nlink synchronized among nodes on deferred delete?
> > > > IOW, can inode come back from the dead on client if another node
> > > has linked it before i_nlink 0 was observed?
> >
> > No, that shouldn't happen. The caps mechanism should ensure that it
> > can't be observed by other clients until after the change.
> >
> > That said, Luis' current patch doesn't ensure we have the correct caps
> > to check the i_nlink. We may need to add that in before we can roll with
> > this.
> >
> > > 3. Can an NFS client be "migrated" from one ceph node to another
> > > with an open but unlinked file?
> > >
> >
> > No. Open files in ceph are generally per-client. You can't pass around a
> > fd (or equivalent).
>
> But the NFS file handles I think do work across clients, right?
>

Maybe they can, but that would be like NFS server restart, so
all bets are off w.r.t open but deleted files.

Thanks,
Amir.


Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-18 Thread Gregory Farnum
Maybe we resolved this conversation; I can't quite tell...

On Fri, May 15, 2020 at 12:16 PM Jeff Layton  wrote:
>
> On Fri, 2020-05-15 at 19:56 +0300, Amir Goldstein wrote:
> > On Fri, May 15, 2020 at 2:38 PM Jeff Layton  wrote:
> > > On Fri, 2020-05-15 at 12:15 +0100, Luis Henriques wrote:
> > > > On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote:
> > > > > +CC: fstests
> > > > >
> > > > > On Thu, May 14, 2020 at 4:15 PM Jeff Layton  
> > > > > wrote:
> > > > > > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > > > > > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > > > > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > > > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while 
> > > > > > > > > converting
> > > > > > > > > a file handle to dentry"), this fixes another corner case with
> > > > > > > > > name_to_handle_at/open_by_handle_at.  The issue has been 
> > > > > > > > > detected by
> > > > > > > > > xfstest generic/467, when doing:
> > > > > > > > >
> > > > > > > > >  - name_to_handle_at("/cephfs/myfile")
> > > > > > > > >  - open("/cephfs/myfile")
> > > > > > > > >  - unlink("/cephfs/myfile")
> > > > > > > > >  - open_by_handle_at()
> > > > > > > > >
> > > > > > > > > The call to open_by_handle_at should not fail because the 
> > > > > > > > > file still
> > > > > > > > > exists and we do have a valid handle to it.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Luis Henriques 
> > > > > > > > > ---
> > > > > > > > >  fs/ceph/export.c | 13 +++--
> > > > > > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > > > > > > index 79dc06881e78..8556df9d94d0 100644
> > > > > > > > > --- a/fs/ceph/export.c
> > > > > > > > > +++ b/fs/ceph/export.c
> > > > > > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct 
> > > > > > > > > super_block *sb, u64 ino)
> > > > > > > > >
> > > > > > > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, 
> > > > > > > > > u64 ino)
> > > > > > > > >  {
> > > > > > > > > + struct ceph_inode_info *ci;
> > > > > > > > >   struct inode *inode = __lookup_inode(sb, ino);
> > > > > > > > > +
> > > > > > > > >   if (IS_ERR(inode))
> > > > > > > > >   return ERR_CAST(inode);
> > > > > > > > >   if (inode->i_nlink == 0) {
> > > > > > > > > - iput(inode);
> > > > > > > > > - return ERR_PTR(-ESTALE);
> > > > > > > > > + bool is_open;
> > > > > > > > > + ci = ceph_inode(inode);
> > > > > > > > > > + spin_lock(&ci->i_ceph_lock);
> > > > > > > > > > + is_open = __ceph_is_file_opened(ci);
> > > > > > > > > > + spin_unlock(&ci->i_ceph_lock);
> > > > > > > > > + if (!is_open) {
> > > > > > > > > + iput(inode);
> > > > > > > > > + return ERR_PTR(-ESTALE);
> > > > > > > > > + }
> > > > > > > > >   }
> > > > > > > > >   return d_obtain_alias(inode);
> > > > > > > > >  }
> > > > > > > >
> > > > > > > > > Thanks Luis. Out of curiosity, is there any reason we
> > > > > > > > shouldn't ignore
> > > > > > > > the i_nlink value here? Does anything obviously break if we do?
> > > > > > >
> > > > > > > Yes, the scenario described in commit 03f219041fdb is still 
> > > > > > > valid, which
> > > > > > > is basically the same but without the extra open(2):
> > > > > > >
> > > > > > >   - name_to_handle_at("/cephfs/myfile")
> > > > > > >   - unlink("/cephfs/myfile")
> > > > > > >   - open_by_handle_at()
> > > > > > >
> > > > > >
> > > > > > Ok, I guess we end up doing some delayed cleanup, and that allows 
> > > > > > the
> > > > > > inode to be found in that situation.
> > > > > >
> > > > > > > The open_by_handle_at man page isn't really clear about these 2 
> > > > > > > scenarios,
> > > > > > > but generic/426 will fail if -ESTALE isn't returned.  Want me to 
> > > > > > > add a
> > > > > > > comment to the code, describing these 2 scenarios?
> > > > > > >
> > > > > >
> > > > > > (cc'ing Amir since he added this test)
> > > > > >
> > > > > > I don't think there is any hard requirement that open_by_handle_at
> > > > > > should fail in that situation. It generally does for most 
> > > > > > filesystems
> > > > > > > due to the way they handle cleaning up unlinked inodes, but I don't
> > > > > > think it's technically illegal to allow the inode to still be 
> > > > > > found. If
> > > > > > the caller cares about whether it has been unlinked it can always 
> > > > > > test
> > > > > > i_nlink itself.
> > > > > >
> > > > > > Amir, is this required for some reason that I'm not aware of?
> > > > >
> > > > > Hi Jeff,
> > > > >
> > > > > The origin of this test is in fstests commit:
> > > > > 794798fa xfsqa: test open_by_handle() on unlinked and freed inode 
> > > > > clusters
> > > > >
> > > > > It was 

Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-16 Thread Jeff Layton
On Sat, 2020-05-16 at 09:58 +0300, Amir Goldstein wrote:
> [pulling in nfs guys]
> 
> > > Questions:
> > > 1. Does sync() result in fully purging inodes on MDS?
> > 
> > I don't think so, but again, that code is not trivial to follow. I do
> > know that the MDS keeps around a "strays directory" which contains
> > unlinked inodes that are lazily cleaned up. My suspicion is that it's
> > satisfying lookups out of this cache as well.
> > 
> > Which may be fine...the MDS is not required to be POSIX compliant after
> > all. Only the fs drivers are.
> > 
> > > 2. Is i_nlink synchronized among nodes on deferred delete?
> > > > IOW, can inode come back from the dead on client if another node
> > > has linked it before i_nlink 0 was observed?
> > 
> > No, that shouldn't happen. The caps mechanism should ensure that it
> > can't be observed by other clients until after the change.
> > 
> > That said, Luis' current patch doesn't ensure we have the correct caps
> > to check the i_nlink. We may need to add that in before we can roll with
> > this.
> > 
> > > 3. Can an NFS client be "migrated" from one ceph node to another
> > > with an open but unlinked file?
> > > 
> > 
> > No. Open files in ceph are generally per-client. You can't pass around a
> > fd (or equivalent).
> 
> Not sure we are talking about the same thing.
> It's not ceph fd that is being passed around, it's the NFS client's fd.
> If there is no case where NFS client would access ceph client2
> with a handle it got from ceph client1, then there is no reason to satisfy
> an open_by_handle() call for an unlinked file on client2.
> If file was opened on client1, it may be "legal" to satisfy open_by_handle()
> on client2, but I don't see how stopping to satisfy that can break anything.
> 

Not currently, but eventually we may need to allow for that...which is
another good reason to handle this on the (Ceph) client instead, as the
client can then decide whether to treat an unlinked file as an ESTALE
return based on its needs.

> > > I think what the test is trying to verify is that "fully purged" inodes
> > > cannot be opened by handle, but there is no standard way to verify
> > > "fully purged", so the test resorts to sync() + another sync() + 
> > > drop_caches.
> > > 
> > 
> > Got it. That makes sense.
> > 
> > > Is there anything else that needs to be done on ceph in order to flush
> > > all deferred operations from this client to MDS?
> > 
> > I'm leaning toward something like what Luis has proposed, but adding in
> > appropriate cap handling.
> 
> That sounds fine.
> 
> > Basically, we have to consider the situation where one client has the
> > file open and another client unlinks it, and then does an
> > open_by_handle_at. Should it succeed in that case?
> > 
> > I can see arguments for either way.
> 
> IMO, the behavior should be defined for a client that has the file open.
> For the rest it does not really matter.
> 
> My argument is that it is easy to satisfy the test's expectation and conform
> to behavior of other filesystems without breaking any real workload.
> 
> To satisfy the test's expectation, you only need to change behavior of ceph
> client in i_count 1 use case. If i_count is 1, we need to take all relevant caps
> to check that i_nlink is "globally" 0, before returning ESTALE.
> But if i_count > 1, no need to bother.

Makes sense. Thanks.

-- 
Jeff Layton 



Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-16 Thread Amir Goldstein
[pulling in nfs guys]

> > Questions:
> > 1. Does sync() result in fully purging inodes on MDS?
>
> I don't think so, but again, that code is not trivial to follow. I do
> know that the MDS keeps around a "strays directory" which contains
> unlinked inodes that are lazily cleaned up. My suspicion is that it's
> satisfying lookups out of this cache as well.
>
> Which may be fine...the MDS is not required to be POSIX compliant after
> all. Only the fs drivers are.
>
> > 2. Is i_nlink synchronized among nodes on deferred delete?
> > > IOW, can inode come back from the dead on client if another node
> > has linked it before i_nlink 0 was observed?
>
> No, that shouldn't happen. The caps mechanism should ensure that it
> can't be observed by other clients until after the change.
>
> That said, Luis' current patch doesn't ensure we have the correct caps
> to check the i_nlink. We may need to add that in before we can roll with
> this.
>
> > 3. Can an NFS client be "migrated" from one ceph node to another
> > with an open but unlinked file?
> >
>
> No. Open files in ceph are generally per-client. You can't pass around a
> fd (or equivalent).

Not sure we are talking about the same thing.
It's not ceph fd that is being passed around, it's the NFS client's fd.
If there is no case where NFS client would access ceph client2
with a handle it got from ceph client1, then there is no reason to satisfy
an open_by_handle() call for an unlinked file on client2.
If file was opened on client1, it may be "legal" to satisfy open_by_handle()
on client2, but I don't see how stopping to satisfy that can break anything.

>
> > I think what the test is trying to verify is that "fully purged" inodes
> > cannot be opened by handle, but there is no standard way to verify
> > "fully purged", so the test resorts to sync() + another sync() + 
> > drop_caches.
> >
>
> Got it. That makes sense.
>
> > Is there anything else that needs to be done on ceph in order to flush
> > all deferred operations from this client to MDS?
>
> I'm leaning toward something like what Luis has proposed, but adding in
> appropriate cap handling.

That sounds fine.

>
> Basically, we have to consider the situation where one client has the
> file open and another client unlinks it, and then does an
> open_by_handle_at. Should it succeed in that case?
>
> I can see arguments for either way.

IMO, the behavior should be defined for a client that has the file open.
For the rest it does not really matter.

My argument is that it is easy to satisfy the test's expectation and conform
to behavior of other filesystems without breaking any real workload.

To satisfy the test's expectation, you only need to change behavior of ceph
client in i_count 1 use case. If i_count is 1, we need to take all relevant caps
to check that i_nlink is "globally" 0, before returning ESTALE.
But if i_count > 1, no need to bother.

Thanks,
Amir.
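
The i_count rule proposed above can be sketched as a small userspace C
model. Everything here is an assumption made for illustration:
model_handle_lookup(), struct fake_mds and fake_fetch_nlink() are
hypothetical names, and the callback merely stands in for "take all
relevant caps and read the authoritative i_nlink"; it is not how the
ceph client actually acquires caps.

```c
#include <assert.h>
#include <errno.h>

/* Callback standing in for "acquire caps, then read the global i_nlink". */
typedef unsigned int (*nlink_fetch_fn)(void *ctx);

/* Hypothetical fake of the authoritative state; counts how often it is asked. */
struct fake_mds {
    unsigned int nlink;
    int fetches;
};

static unsigned int fake_fetch_nlink(void *ctx)
{
    struct fake_mds *mds = ctx;

    mds->fetches++;
    return mds->nlink;
}

/*
 * Returns 0 if the handle should resolve, -ESTALE otherwise.
 * i_count > 1: something else on this client holds the inode, so the
 * handle resolves without any round trip. i_count == 1: consult the
 * authoritative link count before deciding to return -ESTALE.
 */
static int model_handle_lookup(int i_count, nlink_fetch_fn fetch, void *ctx)
{
    if (i_count > 1)
        return 0;
    if (fetch(ctx) == 0)
        return -ESTALE;
    return 0;
}
```

The point of the split is cost: the expensive "global" check is paid only
in the case where nothing else on the client is holding the inode.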


Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-15 Thread Jeff Layton
On Fri, 2020-05-15 at 19:56 +0300, Amir Goldstein wrote:
> On Fri, May 15, 2020 at 2:38 PM Jeff Layton  wrote:
> > On Fri, 2020-05-15 at 12:15 +0100, Luis Henriques wrote:
> > > On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote:
> > > > +CC: fstests
> > > > 
> > > > On Thu, May 14, 2020 at 4:15 PM Jeff Layton  wrote:
> > > > > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > > > > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > > > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while 
> > > > > > > > converting
> > > > > > > > a file handle to dentry"), this fixes another corner case with
> > > > > > > > name_to_handle_at/open_by_handle_at.  The issue has been 
> > > > > > > > detected by
> > > > > > > > xfstest generic/467, when doing:
> > > > > > > > 
> > > > > > > >  - name_to_handle_at("/cephfs/myfile")
> > > > > > > >  - open("/cephfs/myfile")
> > > > > > > >  - unlink("/cephfs/myfile")
> > > > > > > >  - open_by_handle_at()
> > > > > > > > 
> > > > > > > > The call to open_by_handle_at should not fail because the file 
> > > > > > > > still
> > > > > > > > exists and we do have a valid handle to it.
> > > > > > > > 
> > > > > > > > Signed-off-by: Luis Henriques 
> > > > > > > > ---
> > > > > > > >  fs/ceph/export.c | 13 +++--
> > > > > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > > > > > index 79dc06881e78..8556df9d94d0 100644
> > > > > > > > --- a/fs/ceph/export.c
> > > > > > > > +++ b/fs/ceph/export.c
> > > > > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct 
> > > > > > > > super_block *sb, u64 ino)
> > > > > > > > 
> > > > > > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, 
> > > > > > > > u64 ino)
> > > > > > > >  {
> > > > > > > > + struct ceph_inode_info *ci;
> > > > > > > >   struct inode *inode = __lookup_inode(sb, ino);
> > > > > > > > +
> > > > > > > >   if (IS_ERR(inode))
> > > > > > > >   return ERR_CAST(inode);
> > > > > > > >   if (inode->i_nlink == 0) {
> > > > > > > > - iput(inode);
> > > > > > > > - return ERR_PTR(-ESTALE);
> > > > > > > > + bool is_open;
> > > > > > > > + ci = ceph_inode(inode);
> > > > > > > > > + spin_lock(&ci->i_ceph_lock);
> > > > > > > > > + is_open = __ceph_is_file_opened(ci);
> > > > > > > > > + spin_unlock(&ci->i_ceph_lock);
> > > > > > > > + if (!is_open) {
> > > > > > > > + iput(inode);
> > > > > > > > + return ERR_PTR(-ESTALE);
> > > > > > > > + }
> > > > > > > >   }
> > > > > > > >   return d_obtain_alias(inode);
> > > > > > > >  }
> > > > > > > 
> > > > > > > > Thanks Luis. Out of curiosity, is there any reason we shouldn't
> > > > > > > ignore
> > > > > > > the i_nlink value here? Does anything obviously break if we do?
> > > > > > 
> > > > > > Yes, the scenario described in commit 03f219041fdb is still valid, 
> > > > > > which
> > > > > > is basically the same but without the extra open(2):
> > > > > > 
> > > > > >   - name_to_handle_at("/cephfs/myfile")
> > > > > >   - unlink("/cephfs/myfile")
> > > > > >   - open_by_handle_at()
> > > > > > 
> > > > > 
> > > > > Ok, I guess we end up doing some delayed cleanup, and that allows the
> > > > > inode to be found in that situation.
> > > > > 
> > > > > > The open_by_handle_at man page isn't really clear about these 2 
> > > > > > scenarios,
> > > > > > but generic/426 will fail if -ESTALE isn't returned.  Want me to 
> > > > > > add a
> > > > > > comment to the code, describing these 2 scenarios?
> > > > > > 
> > > > > 
> > > > > (cc'ing Amir since he added this test)
> > > > > 
> > > > > I don't think there is any hard requirement that open_by_handle_at
> > > > > should fail in that situation. It generally does for most filesystems
> > > > > > due to the way they handle cleaning up unlinked inodes, but I don't
> > > > > think it's technically illegal to allow the inode to still be found. 
> > > > > If
> > > > > the caller cares about whether it has been unlinked it can always test
> > > > > i_nlink itself.
> > > > > 
> > > > > Amir, is this required for some reason that I'm not aware of?
> > > > 
> > > > Hi Jeff,
> > > > 
> > > > The origin of this test is in fstests commit:
> > > > 794798fa xfsqa: test open_by_handle() on unlinked and freed inode 
> > > > clusters
> > > > 
> > > > It was introduced to catch an xfs bug, so this behavior is the 
> > > > expectation
> > > > of xfs filesystem, but note that it is not a general expectation to fail
> > > > open_by_handle() after unlink(), it is an expectation to fail 
> > > > open_by_handle()
> > > > after unlink() + sync() + drop_caches.
> > > 
> > > Yes, sorry I should have mentioned the 

Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-15 Thread Amir Goldstein
On Fri, May 15, 2020 at 2:38 PM Jeff Layton  wrote:
>
> On Fri, 2020-05-15 at 12:15 +0100, Luis Henriques wrote:
> > On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote:
> > > +CC: fstests
> > >
> > > On Thu, May 14, 2020 at 4:15 PM Jeff Layton  wrote:
> > > > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > > > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while 
> > > > > > > converting
> > > > > > > a file handle to dentry"), this fixes another corner case with
> > > > > > > name_to_handle_at/open_by_handle_at.  The issue has been detected 
> > > > > > > by
> > > > > > > xfstest generic/467, when doing:
> > > > > > >
> > > > > > >  - name_to_handle_at("/cephfs/myfile")
> > > > > > >  - open("/cephfs/myfile")
> > > > > > >  - unlink("/cephfs/myfile")
> > > > > > >  - open_by_handle_at()
> > > > > > >
> > > > > > > The call to open_by_handle_at should not fail because the file 
> > > > > > > still
> > > > > > > exists and we do have a valid handle to it.
> > > > > > >
> > > > > > > Signed-off-by: Luis Henriques 
> > > > > > > ---
> > > > > > >  fs/ceph/export.c | 13 +++--
> > > > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > > > > index 79dc06881e78..8556df9d94d0 100644
> > > > > > > --- a/fs/ceph/export.c
> > > > > > > +++ b/fs/ceph/export.c
> > > > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct 
> > > > > > > super_block *sb, u64 ino)
> > > > > > >
> > > > > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 
> > > > > > > ino)
> > > > > > >  {
> > > > > > > + struct ceph_inode_info *ci;
> > > > > > >   struct inode *inode = __lookup_inode(sb, ino);
> > > > > > > +
> > > > > > >   if (IS_ERR(inode))
> > > > > > >   return ERR_CAST(inode);
> > > > > > >   if (inode->i_nlink == 0) {
> > > > > > > - iput(inode);
> > > > > > > - return ERR_PTR(-ESTALE);
> > > > > > > + bool is_open;
> > > > > > > + ci = ceph_inode(inode);
> > > > > > > > + spin_lock(&ci->i_ceph_lock);
> > > > > > > > + is_open = __ceph_is_file_opened(ci);
> > > > > > > > + spin_unlock(&ci->i_ceph_lock);
> > > > > > > + if (!is_open) {
> > > > > > > + iput(inode);
> > > > > > > + return ERR_PTR(-ESTALE);
> > > > > > > + }
> > > > > > >   }
> > > > > > >   return d_obtain_alias(inode);
> > > > > > >  }
> > > > > >
> > > > > > > Thanks Luis. Out of curiosity, is there any reason we shouldn't
> > > > > > ignore
> > > > > > the i_nlink value here? Does anything obviously break if we do?
> > > > >
> > > > > Yes, the scenario described in commit 03f219041fdb is still valid, 
> > > > > which
> > > > > is basically the same but without the extra open(2):
> > > > >
> > > > >   - name_to_handle_at("/cephfs/myfile")
> > > > >   - unlink("/cephfs/myfile")
> > > > >   - open_by_handle_at()
> > > > >
> > > >
> > > > Ok, I guess we end up doing some delayed cleanup, and that allows the
> > > > inode to be found in that situation.
> > > >
> > > > > The open_by_handle_at man page isn't really clear about these 2 
> > > > > scenarios,
> > > > > but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
> > > > > comment to the code, describing these 2 scenarios?
> > > > >
> > > >
> > > > (cc'ing Amir since he added this test)
> > > >
> > > > I don't think there is any hard requirement that open_by_handle_at
> > > > should fail in that situation. It generally does for most filesystems
> > > > > due to the way they handle cleaning up unlinked inodes, but I don't
> > > > think it's technically illegal to allow the inode to still be found. If
> > > > the caller cares about whether it has been unlinked it can always test
> > > > i_nlink itself.
> > > >
> > > > Amir, is this required for some reason that I'm not aware of?
> > >
> > > Hi Jeff,
> > >
> > > The origin of this test is in fstests commit:
> > > 794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters
> > >
> > > It was introduced to catch an xfs bug, so this behavior is the expectation
> > > of xfs filesystem, but note that it is not a general expectation to fail
> > > open_by_handle() after unlink(), it is an expectation to fail 
> > > open_by_handle()
> > > after unlink() + sync() + drop_caches.
> >
> > Yes, sorry I should have mentioned the sync+drop_caches in the
> > description.
> >
> > > I have later converted the test to generic, because I needed to check the
> > > same expectation for overlayfs use case, which is:
> > > The original inode is always there (in lower layer), unlink creates a 
> > > whiteout
> > > mark and open_by_handle should treat that as ESTALE, 

Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-15 Thread Jeff Layton
On Fri, 2020-05-15 at 12:15 +0100, Luis Henriques wrote:
> On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote:
> > +CC: fstests
> > 
> > On Thu, May 14, 2020 at 4:15 PM Jeff Layton  wrote:
> > > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while 
> > > > > > converting
> > > > > > a file handle to dentry"), this fixes another corner case with
> > > > > > name_to_handle_at/open_by_handle_at.  The issue has been detected by
> > > > > > xfstest generic/467, when doing:
> > > > > > 
> > > > > >  - name_to_handle_at("/cephfs/myfile")
> > > > > >  - open("/cephfs/myfile")
> > > > > >  - unlink("/cephfs/myfile")
> > > > > >  - open_by_handle_at()
> > > > > > 
> > > > > > The call to open_by_handle_at should not fail because the file still
> > > > > > exists and we do have a valid handle to it.
> > > > > > 
> > > > > > Signed-off-by: Luis Henriques 
> > > > > > ---
> > > > > >  fs/ceph/export.c | 13 +++--
> > > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > > > index 79dc06881e78..8556df9d94d0 100644
> > > > > > --- a/fs/ceph/export.c
> > > > > > +++ b/fs/ceph/export.c
> > > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct 
> > > > > > super_block *sb, u64 ino)
> > > > > > 
> > > > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 
> > > > > > ino)
> > > > > >  {
> > > > > > + struct ceph_inode_info *ci;
> > > > > >   struct inode *inode = __lookup_inode(sb, ino);
> > > > > > +
> > > > > >   if (IS_ERR(inode))
> > > > > >   return ERR_CAST(inode);
> > > > > >   if (inode->i_nlink == 0) {
> > > > > > - iput(inode);
> > > > > > - return ERR_PTR(-ESTALE);
> > > > > > + bool is_open;
> > > > > > + ci = ceph_inode(inode);
> > > > > > > + spin_lock(&ci->i_ceph_lock);
> > > > > > > + is_open = __ceph_is_file_opened(ci);
> > > > > > > + spin_unlock(&ci->i_ceph_lock);
> > > > > > + if (!is_open) {
> > > > > > + iput(inode);
> > > > > > + return ERR_PTR(-ESTALE);
> > > > > > + }
> > > > > >   }
> > > > > >   return d_obtain_alias(inode);
> > > > > >  }
> > > > > 
> > > > > > Thanks Luis. Out of curiosity, is there any reason we shouldn't
> > > > > ignore
> > > > > the i_nlink value here? Does anything obviously break if we do?
> > > > 
> > > > Yes, the scenario described in commit 03f219041fdb is still valid, which
> > > > is basically the same but without the extra open(2):
> > > > 
> > > >   - name_to_handle_at("/cephfs/myfile")
> > > >   - unlink("/cephfs/myfile")
> > > >   - open_by_handle_at()
> > > > 
> > > 
> > > Ok, I guess we end up doing some delayed cleanup, and that allows the
> > > inode to be found in that situation.
> > > 
> > > > The open_by_handle_at man page isn't really clear about these 2 
> > > > scenarios,
> > > > but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
> > > > comment to the code, describing these 2 scenarios?
> > > > 
> > > 
> > > (cc'ing Amir since he added this test)
> > > 
> > > I don't think there is any hard requirement that open_by_handle_at
> > > should fail in that situation. It generally does for most filesystems
> > > > due to the way they handle cleaning up unlinked inodes, but I don't
> > > think it's technically illegal to allow the inode to still be found. If
> > > the caller cares about whether it has been unlinked it can always test
> > > i_nlink itself.
> > > 
> > > Amir, is this required for some reason that I'm not aware of?
> > 
> > Hi Jeff,
> > 
> > The origin of this test is in fstests commit:
> > 794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters
> > 
> > It was introduced to catch an xfs bug, so this behavior is the expectation
> > of xfs filesystem, but note that it is not a general expectation to fail
> > open_by_handle() after unlink(), it is an expectation to fail 
> > open_by_handle()
> > after unlink() + sync() + drop_caches.
> 
> Yes, sorry I should have mentioned the sync+drop_caches in the
> description.
> 
> > I have later converted the test to generic, because I needed to check the
> > same expectation for overlayfs use case, which is:
> > The original inode is always there (in lower layer), unlink creates a whiteout
> > mark and open_by_handle should treat that as ESTALE, otherwise the
> > unlinked files would be accessible to nfs clients forever.
> > 

Ok, that makes sense. 

The situation with Ceph is a bit different I think. I suspect that we're
cleaning the inode out of the client's caches after drop_caches, but
then we end up issuing a lookup by inode number to the 

Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-15 Thread Luis Henriques
On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote:
> +CC: fstests
> 
> On Thu, May 14, 2020 at 4:15 PM Jeff Layton  wrote:
> >
> > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
> > > > > a file handle to dentry"), this fixes another corner case with
> > > > > name_to_handle_at/open_by_handle_at.  The issue has been detected by
> > > > > xfstest generic/467, when doing:
> > > > >
> > > > >  - name_to_handle_at("/cephfs/myfile")
> > > > >  - open("/cephfs/myfile")
> > > > >  - unlink("/cephfs/myfile")
> > > > >  - open_by_handle_at()
> > > > >
> > > > > The call to open_by_handle_at should not fail because the file still
> > > > > exists and we do have a valid handle to it.
> > > > >
> > > > > Signed-off-by: Luis Henriques 
> > > > > ---
> > > > >  fs/ceph/export.c | 13 +++--
> > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > > index 79dc06881e78..8556df9d94d0 100644
> > > > > --- a/fs/ceph/export.c
> > > > > +++ b/fs/ceph/export.c
> > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
> > > > >
> > > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
> > > > >  {
> > > > > +	struct ceph_inode_info *ci;
> > > > >  	struct inode *inode = __lookup_inode(sb, ino);
> > > > > +
> > > > >  	if (IS_ERR(inode))
> > > > >  		return ERR_CAST(inode);
> > > > >  	if (inode->i_nlink == 0) {
> > > > > -		iput(inode);
> > > > > -		return ERR_PTR(-ESTALE);
> > > > > +		bool is_open;
> > > > > +		ci = ceph_inode(inode);
> > > > > +		spin_lock(&ci->i_ceph_lock);
> > > > > +		is_open = __ceph_is_file_opened(ci);
> > > > > +		spin_unlock(&ci->i_ceph_lock);
> > > > > +		if (!is_open) {
> > > > > +			iput(inode);
> > > > > +			return ERR_PTR(-ESTALE);
> > > > > +		}
> > > > >  	}
> > > > >  	return d_obtain_alias(inode);
> > > > >  }
> > > >
> > > > Thanks Luis. Out of curiosity, is there any reason we shouldn't ignore
> > > > the i_nlink value here? Does anything obviously break if we do?
> > >
> > > Yes, the scenario described in commit 03f219041fdb is still valid, which
> > > is basically the same but without the extra open(2):
> > >
> > >   - name_to_handle_at("/cephfs/myfile")
> > >   - unlink("/cephfs/myfile")
> > >   - open_by_handle_at()
> > >
> >
> > Ok, I guess we end up doing some delayed cleanup, and that allows the
> > inode to be found in that situation.
> >
> > > The open_by_handle_at man page isn't really clear about these 2 scenarios,
> > > but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
> > > comment to the code, describing these 2 scenarios?
> > >
> >
> > (cc'ing Amir since he added this test)
> >
> > I don't think there is any hard requirement that open_by_handle_at
> > should fail in that situation. It generally does for most filesystems
> > due to the way they handle cleaning up unlinked inodes, but I don't
> > think it's technically illegal to allow the inode to still be found. If
> > the caller cares about whether it has been unlinked it can always test
> > i_nlink itself.
> >
> > Amir, is this required for some reason that I'm not aware of?
> 
> Hi Jeff,
> 
> The origin of this test is in fstests commit:
> 794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters
> 
> It was introduced to catch an xfs bug, so this behavior is the expectation
> of xfs filesystem, but note that it is not a general expectation to fail
> open_by_handle() after unlink(), it is an expectation to fail open_by_handle()
> after unlink() + sync() + drop_caches.

Yes, sorry I should have mentioned the sync+drop_caches in the
description.

> I have later converted the test to generic, because I needed to check the
> same expectation for overlayfs use case, which is:
> The original inode is always there (in lower layer), unlink creates a whiteout
> mark and open_by_handle should treat that as ESTALE, otherwise the
> unlinked files would be accessible to nfs clients forever.
> 
> In overlayfs, we handle the open file case by returning a dentry only
> in case the inode with deletion mark in question is already in inode cache,
> but we take care not to populate inode cache with the check.
> It is easier, because we do not need to get inode into cache for checking
> the delete marker.
> 
> Maybe you could instead check in __fh_to_dentry():
> 
> if (inode->i_nlink == 0 && atomic_read(&inode->i_count) == 1) {
> 	iput(inode);
> 	return ERR_PTR(-ESTALE);
> }
> 
> The above is untested, so I don't know if 

Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-15 Thread Amir Goldstein
+CC: fstests

On Thu, May 14, 2020 at 4:15 PM Jeff Layton  wrote:
>
> On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
> > > > a file handle to dentry"), this fixes another corner case with
> > > > name_to_handle_at/open_by_handle_at.  The issue has been detected by
> > > > xfstest generic/467, when doing:
> > > >
> > > >  - name_to_handle_at("/cephfs/myfile")
> > > >  - open("/cephfs/myfile")
> > > >  - unlink("/cephfs/myfile")
> > > >  - open_by_handle_at()
> > > >
> > > > The call to open_by_handle_at should not fail because the file still
> > > > exists and we do have a valid handle to it.
> > > >
> > > > Signed-off-by: Luis Henriques 
> > > > ---
> > > >  fs/ceph/export.c | 13 +++--
> > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > index 79dc06881e78..8556df9d94d0 100644
> > > > --- a/fs/ceph/export.c
> > > > +++ b/fs/ceph/export.c
> > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
> > > >
> > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
> > > >  {
> > > > +	struct ceph_inode_info *ci;
> > > >  	struct inode *inode = __lookup_inode(sb, ino);
> > > > +
> > > >  	if (IS_ERR(inode))
> > > >  		return ERR_CAST(inode);
> > > >  	if (inode->i_nlink == 0) {
> > > > -		iput(inode);
> > > > -		return ERR_PTR(-ESTALE);
> > > > +		bool is_open;
> > > > +		ci = ceph_inode(inode);
> > > > +		spin_lock(&ci->i_ceph_lock);
> > > > +		is_open = __ceph_is_file_opened(ci);
> > > > +		spin_unlock(&ci->i_ceph_lock);
> > > > +		if (!is_open) {
> > > > +			iput(inode);
> > > > +			return ERR_PTR(-ESTALE);
> > > > +		}
> > > >  	}
> > > >  	return d_obtain_alias(inode);
> > > >  }
> > >
> > > Thanks Luis. Out of curiosity, is there any reason we shouldn't ignore
> > > the i_nlink value here? Does anything obviously break if we do?
> >
> > Yes, the scenario described in commit 03f219041fdb is still valid, which
> > is basically the same but without the extra open(2):
> >
> >   - name_to_handle_at("/cephfs/myfile")
> >   - unlink("/cephfs/myfile")
> >   - open_by_handle_at()
> >
>
> Ok, I guess we end up doing some delayed cleanup, and that allows the
> inode to be found in that situation.
>
> > The open_by_handle_at man page isn't really clear about these 2 scenarios,
> > but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
> > comment to the code, describing these 2 scenarios?
> >
>
> (cc'ing Amir since he added this test)
>
> I don't think there is any hard requirement that open_by_handle_at
> should fail in that situation. It generally does for most filesystems
> due to the way they handle cleaning up unlinked inodes, but I don't
> think it's technically illegal to allow the inode to still be found. If
> the caller cares about whether it has been unlinked it can always test
> i_nlink itself.
>
> Amir, is this required for some reason that I'm not aware of?

Hi Jeff,

The origin of this test is in fstests commit:
794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters

It was introduced to catch an xfs bug, so this behavior is the expectation
of xfs filesystem, but note that it is not a general expectation to fail
open_by_handle() after unlink(), it is an expectation to fail open_by_handle()
after unlink() + sync() + drop_caches.

I have later converted the test to generic, because I needed to check the
same expectation for overlayfs use case, which is:
The original inode is always there (in lower layer), unlink creates a whiteout
mark and open_by_handle should treat that as ESTALE, otherwise the
unlinked files would be accessible to nfs clients forever.

In overlayfs, we handle the open file case by returning a dentry only
in case the inode with deletion mark in question is already in inode cache,
but we take care not to populate inode cache with the check.
It is easier, because we do not need to get inode into cache for checking
the delete marker.

Maybe you could instead check in __fh_to_dentry():

if (inode->i_nlink == 0 && atomic_read(&inode->i_count) == 1) {
	iput(inode);
	return ERR_PTR(-ESTALE);
}

The above is untested, so I don't know if it's enough to pass generic/426.
Note that generic/467 also checks the same behavior for rmdir().
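
To make the suggested condition concrete, here is a rough user-space model of
it. This is only a sketch: the struct and function names are hypothetical and
not kernel code, and it assumes that when i_count == 1 the only remaining
reference is the one taken by the lookup itself, so nobody else has the inode
in use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two inode fields the check reads. */
struct model_inode_refs {
	unsigned int nlink;	/* models inode->i_nlink */
	int count;		/* models atomic_read(&inode->i_count) */
};

/*
 * Returns -ESTALE when the inode is unlinked and the only reference is
 * the one assumed to be held by the lookup itself; returns 0 otherwise.
 */
int model_fh_check_by_count(const struct model_inode_refs *inode)
{
	assert(inode != NULL);
	if (inode->nlink == 0 && inode->count == 1)
		return -ESTALE;
	return 0;
}
```

The model only captures the condition itself; whether i_count really drops to
1 on ceph after sync() + drop_caches is exactly the part that is untested.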

If you decide that ceph does not need to comply with this behavior,
then we probably need to whitelist/blocklist the filesystems that
want to test this behavior, which will be a shame.

Thanks,
Amir.


Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-14 Thread Jeff Layton
On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
> > > a file handle to dentry"), this fixes another corner case with
> > > name_to_handle_at/open_by_handle_at.  The issue has been detected by
> > > xfstest generic/467, when doing:
> > > 
> > >  - name_to_handle_at("/cephfs/myfile")
> > >  - open("/cephfs/myfile")
> > >  - unlink("/cephfs/myfile")
> > >  - open_by_handle_at()
> > > 
> > > The call to open_by_handle_at should not fail because the file still
> > > exists and we do have a valid handle to it.
> > > 
> > > Signed-off-by: Luis Henriques 
> > > ---
> > >  fs/ceph/export.c | 13 +++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > index 79dc06881e78..8556df9d94d0 100644
> > > --- a/fs/ceph/export.c
> > > +++ b/fs/ceph/export.c
> > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
> > >
> > >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
> > >  {
> > > +	struct ceph_inode_info *ci;
> > >  	struct inode *inode = __lookup_inode(sb, ino);
> > > +
> > >  	if (IS_ERR(inode))
> > >  		return ERR_CAST(inode);
> > >  	if (inode->i_nlink == 0) {
> > > -		iput(inode);
> > > -		return ERR_PTR(-ESTALE);
> > > +		bool is_open;
> > > +		ci = ceph_inode(inode);
> > > +		spin_lock(&ci->i_ceph_lock);
> > > +		is_open = __ceph_is_file_opened(ci);
> > > +		spin_unlock(&ci->i_ceph_lock);
> > > +		if (!is_open) {
> > > +			iput(inode);
> > > +			return ERR_PTR(-ESTALE);
> > > +		}
> > >  	}
> > >  	return d_obtain_alias(inode);
> > >  }
> > 
> > Thanks Luis. Out of curiosity, is there any reason we shouldn't ignore
> > the i_nlink value here? Does anything obviously break if we do?
> 
> Yes, the scenario described in commit 03f219041fdb is still valid, which
> is basically the same but without the extra open(2):
> 
>   - name_to_handle_at("/cephfs/myfile")
>   - unlink("/cephfs/myfile")
>   - open_by_handle_at()
> 

Ok, I guess we end up doing some delayed cleanup, and that allows the
inode to be found in that situation.

> The open_by_handle_at man page isn't really clear about these 2 scenarios,
> but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
> comment to the code, describing these 2 scenarios?
> 

(cc'ing Amir since he added this test)

I don't think there is any hard requirement that open_by_handle_at
should fail in that situation. It generally does for most filesystems
due to the way they handle cleaning up unlinked inodes, but I don't
think it's technically illegal to allow the inode to still be found. If
the caller cares about whether it has been unlinked it can always test
i_nlink itself.

Amir, is this required for some reason that I'm not aware of?

Thanks,
-- 
Jeff Layton 



Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-14 Thread Luis Henriques
On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
> > a file handle to dentry"), this fixes another corner case with
> > name_to_handle_at/open_by_handle_at.  The issue has been detected by
> > xfstest generic/467, when doing:
> > 
> >  - name_to_handle_at("/cephfs/myfile")
> >  - open("/cephfs/myfile")
> >  - unlink("/cephfs/myfile")
> >  - open_by_handle_at()
> > 
> > The call to open_by_handle_at should not fail because the file still
> > exists and we do have a valid handle to it.
> > 
> > Signed-off-by: Luis Henriques 
> > ---
> >  fs/ceph/export.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > index 79dc06881e78..8556df9d94d0 100644
> > --- a/fs/ceph/export.c
> > +++ b/fs/ceph/export.c
> > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
> >
> >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
> >  {
> > +	struct ceph_inode_info *ci;
> >  	struct inode *inode = __lookup_inode(sb, ino);
> > +
> >  	if (IS_ERR(inode))
> >  		return ERR_CAST(inode);
> >  	if (inode->i_nlink == 0) {
> > -		iput(inode);
> > -		return ERR_PTR(-ESTALE);
> > +		bool is_open;
> > +		ci = ceph_inode(inode);
> > +		spin_lock(&ci->i_ceph_lock);
> > +		is_open = __ceph_is_file_opened(ci);
> > +		spin_unlock(&ci->i_ceph_lock);
> > +		if (!is_open) {
> > +			iput(inode);
> > +			return ERR_PTR(-ESTALE);
> > +		}
> >  	}
> >  	return d_obtain_alias(inode);
> >  }
> 
> Thanks Luis. Out of curiosity, is there any reason we shouldn't ignore
> the i_nlink value here? Does anything obviously break if we do?

Yes, the scenario described in commit 03f219041fdb is still valid, which
is basically the same but without the extra open(2):

  - name_to_handle_at("/cephfs/myfile")
  - unlink("/cephfs/myfile")
  - open_by_handle_at()

The open_by_handle_at man page isn't really clear about these 2 scenarios,
but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
comment to the code, describing these 2 scenarios?
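
For reference, a minimal user-space model of the decision the patched
__fh_to_dentry() makes in these two scenarios. It is a sketch, not kernel
code: the struct and function names are hypothetical, nlink stands in for
inode->i_nlink and is_open stands in for the result of
__ceph_is_file_opened(ci).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode state the patched check looks at. */
struct model_inode {
	unsigned int nlink;	/* models inode->i_nlink */
	bool is_open;		/* models __ceph_is_file_opened(ci) */
};

/*
 * Returns 0 if the file handle would still resolve to a dentry, or
 * -ESTALE if the inode is unlinked and no longer open on this client.
 */
int model_fh_to_dentry(const struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->nlink == 0 && !inode->is_open)
		return -ESTALE;
	return 0;
}
```

With this model, the sequence without the extra open(2) maps to
{ nlink = 0, is_open = false } and yields -ESTALE, while the sequence with
the extra open(2) maps to { nlink = 0, is_open = true } and yields 0.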

Cheers,
--
Luis


Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-14 Thread Jeff Layton
On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
> a file handle to dentry"), this fixes another corner case with
> name_to_handle_at/open_by_handle_at.  The issue has been detected by
> xfstest generic/467, when doing:
> 
>  - name_to_handle_at("/cephfs/myfile")
>  - open("/cephfs/myfile")
>  - unlink("/cephfs/myfile")
>  - open_by_handle_at()
> 
> The call to open_by_handle_at should not fail because the file still
> exists and we do have a valid handle to it.
> 
> Signed-off-by: Luis Henriques 
> ---
>  fs/ceph/export.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> index 79dc06881e78..8556df9d94d0 100644
> --- a/fs/ceph/export.c
> +++ b/fs/ceph/export.c
> @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
>
>  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
>  {
> +	struct ceph_inode_info *ci;
>  	struct inode *inode = __lookup_inode(sb, ino);
> +
>  	if (IS_ERR(inode))
>  		return ERR_CAST(inode);
>  	if (inode->i_nlink == 0) {
> -		iput(inode);
> -		return ERR_PTR(-ESTALE);
> +		bool is_open;
> +		ci = ceph_inode(inode);
> +		spin_lock(&ci->i_ceph_lock);
> +		is_open = __ceph_is_file_opened(ci);
> +		spin_unlock(&ci->i_ceph_lock);
> +		if (!is_open) {
> +			iput(inode);
> +			return ERR_PTR(-ESTALE);
> +		}
>  	}
>  	return d_obtain_alias(inode);
>  }

Thanks Luis. Out of curiosity, is there any reason we shouldn't ignore
the i_nlink value here? Does anything obviously break if we do?

Thanks,
-- 
Jeff Layton 



[PATCH] ceph: don't return -ESTALE if there's still an open file

2020-05-14 Thread Luis Henriques
Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
a file handle to dentry"), this fixes another corner case with
name_to_handle_at/open_by_handle_at.  The issue has been detected by
xfstest generic/467, when doing:

 - name_to_handle_at("/cephfs/myfile")
 - open("/cephfs/myfile")
 - unlink("/cephfs/myfile")
 - open_by_handle_at()

The call to open_by_handle_at should not fail because the file still
exists and we do have a valid handle to it.

Signed-off-by: Luis Henriques 
---
 fs/ceph/export.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 79dc06881e78..8556df9d94d0 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
 
 static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
 {
+	struct ceph_inode_info *ci;
 	struct inode *inode = __lookup_inode(sb, ino);
+
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
 	if (inode->i_nlink == 0) {
-		iput(inode);
-		return ERR_PTR(-ESTALE);
+		bool is_open;
+		ci = ceph_inode(inode);
+		spin_lock(&ci->i_ceph_lock);
+		is_open = __ceph_is_file_opened(ci);
+		spin_unlock(&ci->i_ceph_lock);
+		if (!is_open) {
+			iput(inode);
+			return ERR_PTR(-ESTALE);
+		}
 	}
 	return d_obtain_alias(inode);
 }