You might want to try repeating the test with the ncache and acache
disabled in pvfs2-client (set the timeouts to zero, either through /proc
or with command-line arguments). I don't know whether they are playing
any role, but it may at least simplify the debugging a little.
-Phil
Pete Wyckoff wrote:
Sam and I have been tracking down a pvfs bug when using the VFS
interface. Kevin discovered it.
The code is test #7 in simul. It runs in parallel, four tasks,
tasks 0 and 1 on node1 and tasks 2 and 3 on node2. It does:
    if (task == 0)
        mkdir("foo");
    MPI_Barrier();
    sleep(3);
    stat("foo");
    if (task == 0)
        rmdir("foo");
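
For anyone who wants to poke at the per-task sequence without the MPI
harness, here is a minimal single-process sketch of what one task does.
It is not the simul source: the function name test7_task, its
parameters, and the return convention are made up for illustration, and
the barrier is only marked with a comment. On its own it will not
reproduce the two-node failure; it just makes the order of the calls
concrete.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the steps one task performs in the test above.
 * Returns 0 if everything succeeded, or -errno from the first failure. */
int test7_task(int task, const char *dir, unsigned delay_s)
{
    struct stat sb;
    int ret = 0;

    assert(dir != NULL);

    /* Only task 0 creates the directory. */
    if (task == 0 && mkdir(dir, 0755) != 0)
        return -errno;

    /* MPI_Barrier() would go here in the parallel version. */
    sleep(delay_s);

    /* Every task stats the directory; this is where -ENOENT showed up. */
    if (stat(dir, &sb) != 0)
        ret = -errno;

    /* Only task 0 removes the directory again. */
    if (task == 0 && rmdir(dir) != 0 && ret == 0)
        ret = -errno;

    return ret;
}
```

Calling test7_task(0, "foo", 3) from a directory on the pvfs mount
mirrors task 0; any other task number skips the mkdir and rmdir.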
On a freshly initialized pvfs (1 server for both md + io), it works.
Task 0 creates the directory, and all four tasks stat it
successfully. When the process exits, the directory is indeed gone.
The second time you run it, tasks 2 and 3 (on node2) get -ENOENT
from the stat, but tasks 0 and 1 work fine as before and the
directory was indeed created properly.
Looking down a bit further, the server sees lookup requests from
tasks 2 and 3, and returns the proper handle ID. Then it sees
getattr requests from tasks 2 and 3 for the handle ID that the
directory had on the first run, not the handle ID for this run.
We may have traced this to the kernel module. Some of the log looks
like this:
pvfs2_d_revalidate_common: called on dentry ffff81003df86970.
pvfs2_d_revalidate_common: parent found.
pvfs2_d_revalidate_common: attempting lookup.
Alloced OP (ffff81003d98a1f8: 121 OP_LOOKUP)
pvfs2: service_operation: pvfs2_lookup ffff81003d98a1f8
client-core: reading op tag 120 OP_LOOKUP
client-core: reading op tag 121 OP_LOOKUP
(get) Alloced OP (ffff81003df561b8:120)
(get) Alloced OP (ffff81003d98a1f8:121)
pvfs2: service_operation pvfs2_lookup returning: 0 for ffff81003df561b8.
pvfs2_d_revalidate_common: lookup failure or no match.
Releasing OP (ffff81003df561b8: 120)
pvfs2_getattr: called on simul_dir_stat.0
pvfs2_inode_getattr: called on inode 1048471
Something calls revalidate on the dentry. The lookup returns
successfully from userspace. The kernel sees that the handles are
different:
    if ((new_op->downcall.status != 0) ||
        !match_handle(new_op->downcall.resp.lookup.refn.handle, inode))
    {
        gossip_debug(GOSSIP_DCACHE_DEBUG,
                     "pvfs2_d_revalidate_common: lookup failure or no match.\n");
        op_release(new_op);
        return(0);
    }
But then it immediately issues a getattr for the old handle ID. Does
anybody know how to fix or destroy the bad dentry? (Looks at Murali...)
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers