Oops: Forget to do a reply-to-all..

Pete,
Weird..
d_revalidate is the FS callback to tell the VFS that the dentry is no
longer valid
and in this case it looks like it rightfully returns 0 indicating that
the entry is longer to be trusted.

       int status = dentry->d_op->d_revalidate(dentry, nd);
        if (unlikely(status <= 0)) {
                /*
                 * The dentry failed validation.
                 * If d_revalidate returned 0 attempt to invalidate
                 * the dentry otherwise d_revalidate is asking us
                 * to return a fail status.
                 */
                if (!status) {
                        if (!d_invalidate(dentry)) { <-- will try to
invalidate the dentry
                                dput(dentry);  <-- drop the reference
                                dentry = NULL; <-- set it to NULL
                        }
                } else {
                        dput(dentry);
                        dentry = ERR_PTR(status);
                }
return dentry <-- gets returned.

This percolates upwards

        dentry = cached_lookup(base, name, nd);
        if (!dentry) {
                struct dentry *new = d_alloc(base, name); <-- would
have allocated a new dentry
                dentry = ERR_PTR(-ENOMEM);
                if (!new)
                        goto out;
                dentry = inode->i_op->lookup(inode, new, nd); <--
callsinto our lookup
                if (!dentry)
                        dentry = new;
                else
                        dput(new);
        }

I am confused on why we are doing a getattr on the stale handle unless
the dentry was somehow not being invalidated i.e. d_invalidate()
failed perhaps?
In any case d_drop() is the API to drop the dentry which is what
d_invalidate() should have invoked..
btw, Were there are any files under the "foo" directory?

Can you bump up the loglevels (include lookup GOSSIP_NAME_DEBUG,
getattr GOSSIP_GETATTR_DEBUG) and send it to me?
I will run this workload tonight and see if I can repro this..
This needs more than 1 machine, right?
thanks,
Murali

On 8/15/07, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> Sam and I have been tracking down a pvfs bug when using the VFS
> interface.  Kevin discovered it.
>
> The code is test #7 in simul.  It runs in parallel, four tasks,
> tasks 0 and 1 on node1 and tasks 2 and 3 on node2.  It does:
>
>     if (task == 0)
>         mkdir("foo");
>
>     MPI_Barrier();
>     sleep(3);
>
>     stat("foo");
>
>     if (task == 0)
>         rmdir("foo");
>
> On a freshly initialized pvfs (1 server for both md + io), it works.
> Task 0 creates the directory, and all four tasks stat it
> successfully.  When the process exits, the directory is indeed gone.
>
> The second time you run it, tasks 2 and 3 (on node2) get -ENOENT
> from the stat, but tasks 0 and 1 work fine as before and the
> directory was indeed created properly.
>
> Looking down a bit further, the server sees lookup requests from
> tasks 2 and 3, and returns the proper handle Id.  Then it sees
> getattr requests from tasks 2 and 3 for the handle ID that the
> directory had on the first run, not the handle ID for this run.
>
> We may have traced this to the kernel module.  Some of the log looks
> like this:
>
>     pvfs2_d_revalidate_common: called on dentry ffff81003df86970.
>     pvfs2_d_revalidate_common: parent found.
>     pvfs2_d_revalidate_common: attempting lookup.
>     Alloced OP (ffff81003d98a1f8: 121 OP_LOOKUP)
>     pvfs2: service_operation: pvfs2_lookup ffff81003d98a1f8
>     client-core: reading op tag 120 OP_LOOKUP
>     client-core: reading op tag 121 OP_LOOKUP
>     (get) Alloced OP (ffff81003df561b8:120)
>     (get) Alloced OP (ffff81003d98a1f8:121)
>     pvfs2: service_operation pvfs2_lookup returning: 0 for
>     ffff81003df561b8.
>     pvfs2_d_revalidate_common: lookup failure or no match.
>     Releasing OP (ffff81003df561b8: 120)
>     pvfs2_getattr: called on simul_dir_stat.0
>     pvfs2_inode_getattr: called on inode 1048471
>
> Something calls revalidate on the dentry.  The lookup returns
> successful from userspace.  The kernel sees that the handles are
> different:
>
>             if((new_op->downcall.status != 0) ||
>                     !match_handle(new_op->downcall.resp.lookup.refn.handle, 
> inode))
>             {
>                 gossip_debug(GOSSIP_DCACHE_DEBUG, "pvfs2_d_revalidate_common: 
> lookup failure or no match.\n");
>                 op_release(new_op);
>                 return(0);
>             }
>
> But then immediatly issues a getattr for the old handle ID.  Anybody
> know how to fix or destroy the bad dentry?  (Looks at Murali...)
>
>                 -- Pete
>
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

Reply via email to