Sam and I have been tracking down a pvfs bug that shows up when using
the VFS interface.  Kevin discovered it.

The code is test #7 in simul.  It runs in parallel with four tasks:
tasks 0 and 1 on node1 and tasks 2 and 3 on node2.  It does:
    
    struct stat sb;

    if (task == 0)
        mkdir("foo", 0755);

    MPI_Barrier(MPI_COMM_WORLD);
    sleep(3);

    stat("foo", &sb);    /* on the second run, fails with ENOENT on node2 */

    if (task == 0)
        rmdir("foo");

On a freshly initialized pvfs (one server for both metadata and I/O), it works.
Task 0 creates the directory, and all four tasks stat it
successfully.  When the process exits, the directory is indeed gone.

The second time you run it, tasks 2 and 3 (on node2) get -ENOENT
from the stat, while tasks 0 and 1 work fine as before and the
directory is indeed created properly.

Digging a bit further, we see that the server receives lookup requests
from tasks 2 and 3 and returns the proper handle ID.  Then it receives
getattr requests from tasks 2 and 3 for the handle ID that the
directory had on the first run, not the handle ID for this run.

We may have traced this to the kernel module.  Some of the log looks
like this:

    pvfs2_d_revalidate_common: called on dentry ffff81003df86970.
    pvfs2_d_revalidate_common: parent found.
    pvfs2_d_revalidate_common: attempting lookup.
    Alloced OP (ffff81003d98a1f8: 121 OP_LOOKUP)
    pvfs2: service_operation: pvfs2_lookup ffff81003d98a1f8
    client-core: reading op tag 120 OP_LOOKUP
    client-core: reading op tag 121 OP_LOOKUP
    (get) Alloced OP (ffff81003df561b8:120)
    (get) Alloced OP (ffff81003d98a1f8:121)
    pvfs2: service_operation pvfs2_lookup returning: 0 for ffff81003df561b8.
    pvfs2_d_revalidate_common: lookup failure or no match.
    Releasing OP (ffff81003df561b8: 120)
    pvfs2_getattr: called on simul_dir_stat.0
    pvfs2_inode_getattr: called on inode 1048471

Something calls revalidate on the dentry.  The lookup returns
successfully from userspace, and the kernel sees that the handles are
different:

            if((new_op->downcall.status != 0) ||
                    !match_handle(new_op->downcall.resp.lookup.refn.handle, inode))
            {
                gossip_debug(GOSSIP_DCACHE_DEBUG, "pvfs2_d_revalidate_common: lookup failure or no match.\n");
                op_release(new_op);
                return(0);
            }

But then it immediately issues a getattr for the old handle ID.  Does
anybody know how to fix or destroy the bad dentry?  (Looks at Murali...)

                -- Pete

_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
