Hi all,
While I understand that the ncache does add some disparity to the views
of various processes, I think it would be worth tracking down exactly
what is happening here. It seems like this particular error is one that
we should be able to avoid.
I agree with RobL that the ncache isn't going to be terribly helpful for
most MPI-IO users, and with Phil that disabling it by default through the
system interface is probably good.
Rob
Phil Carns wrote:
Pete Wyckoff wrote:
The simul code, test #14, does a shared create: all processes
try to do "creat(file, 0644)" at the same time through the VFS.
There is no O_EXCL, so what should happen here is that they all
succeed, although under the hood, all but one will probably have
to unwind the SYS_CREATE when they notice that the dirent already
exists from another process.
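
For anyone who wants to see the expected semantics away from PVFS2, here is a minimal sketch of the same access pattern on a local file system. It is not the simul code: the helper name shared_create and the use of fork() in place of MPI ranks are assumptions made for illustration only. Each child calls creat(path, 0644) on the same path, and because creat() carries no O_EXCL, every call should succeed even when the file already exists.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of simul or PVFS2): fork nprocs children
 * that all call creat(path, 0644) on the same path at roughly the same
 * time. Returns the number of children whose creat() failed, or -1 if
 * fork() or wait() itself failed.
 */
int shared_create(const char *path, int nprocs)
{
    int spawned = 0;
    int failures = 0;
    int error = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            error = 1;
            break;
        }
        if (pid == 0) {
            /* creat() is open() with O_WRONLY|O_CREAT|O_TRUNC and no
             * O_EXCL, so an already existing file is not an error. */
            int fd = creat(path, 0644);
            if (fd < 0) {
                perror("creat");
                _exit(1);
            }
            close(fd);
            _exit(0);
        }
        spawned++;
    }

    /* Reap every child that was started and count the ones that failed. */
    for (int i = 0; i < spawned; i++) {
        int status;
        if (wait(&status) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return error ? -1 : failures;
}
```

Calling shared_create("/tmp/some-file", 4) on a local file system should return 0. The failure reported in this thread corresponds to one of those creat() calls returning ENOENT instead.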
This used to work just fine. With the addition of the ncache code
to pvfs2-client, I'm guessing, things break. The test works again
if I add "-n 0" to the pvfs2-client command line.
My setup is all x86_64. Two IO servers, one of which does MD too.
Two other nodes as clients, running:

    mpiexec -pernode -np 2 $simul/simul -d /pvfs-ib -i 14 -n 200 -N 1

eventually one will fail, usually around the second iteration, with

    14:46:54: Process 1(ib26): FAILED in simul_creat, creat failed: No such file or directory

Does anybody know the ncache code well enough to figure this out?
I find the -EEXIST fixup code in client-core, but can't see what
kind of ncache invalidation should presumably happen around there.
-- Pete
What happens on each iteration? Does the code at some point delete a
file with a particular name and then create a new one with the same name?
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers