On Sep 5, 2007, at 10:31 AM, Phil Carns wrote:
We have run into a problem with running "rm -rf" and "ls"
concurrently on the same directory from different client nodes. In
the particular case that we are looking at, the directory has about
7000 files in it but no subdirectories. If we do an ls on the
directory while an "rm -rf" is running from a different client,
then the rm fails to remove all of the files. It seems to get
worse if you do more than one ls while the rm is working. This is
on RHEL4 with 2.6.9.something kernels.
Has anyone else seen this? Any idea what the problem is?
Hi Phil,
The trove layer caches the position -> name mapping for positions it
returns to the client on a readdir. The problem is probably related
to caching those entries: the readdir for the rm iterates over the
directory, inserting position -> name entries into the cache, and
then ls comes along and replaces those entries with its own, where
the position is the same but the name is further down in the
directory (because rm has removed some of them). That's just a guess.
You could check whether disabling the position cache fixes the
problem. Disabling it will cause the Berkeley DB iteration to walk
through all the entries up to the position, though, so it's going to
be much slower. The position cache is in dbpf-keyval-pcache.c.
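A toy Python model may make the suspected race easier to see. It is
not the dbpf code: the class PositionCacheDir, the helpers list_all
and rm_all, the page size of 3, and the rule that a cache miss simply
skips `pos` entries are all hypothetical. The directory is a sorted
key list standing in for the Berkeley DB btree, and a single
position -> name cache is shared by every client, which is the
assumption behind the guess above.

```python
import bisect


class PositionCacheDir:
    """Hypothetical model: entries kept as sorted keys (standing in for a
    Berkeley DB btree), with ONE position -> name cache shared by all clients."""

    def __init__(self, names):
        self.keys = sorted(names)
        self.pcache = {}  # integer position -> key at which that position resumes

    def unlink(self, name):
        self.keys.remove(name)

    def readdir(self, pos, count):
        if pos == 0:
            start = 0
        elif pos in self.pcache:
            # cache hit: resume at the first key >= the cached name
            start = bisect.bisect_left(self.keys, self.pcache[pos])
        else:
            # cache miss: skip `pos` entries from the start (the slow, uncached walk)
            start = min(pos, len(self.keys))
        entries = self.keys[start:start + count]
        next_pos = pos + len(entries)
        if start + count < len(self.keys):
            # remember where next_pos resumes; any later caller may overwrite this
            self.pcache[next_pos] = self.keys[start + count]
        return entries, next_pos


def list_all(d, count=3):
    """An 'ls': read the whole directory in pages, starting from position 0."""
    names, pos = [], 0
    while True:
        entries, pos = d.readdir(pos, count)
        if not entries:
            return names
        names.extend(entries)


def rm_all(d, count=3, concurrent_ls=False):
    """An 'rm -rf' style loop: read a page, unlink it, continue from the
    returned position. With concurrent_ls, another client runs a full 'ls'
    right after rm's first page. Returns whatever rm failed to remove."""
    pos, first = 0, True
    while True:
        entries, pos = d.readdir(pos, count)
        if not entries:
            return d.keys
        for name in entries:
            d.unlink(name)
        if concurrent_ls and first:
            list_all(d, count)
        first = False


names = [f"f{i:02d}" for i in range(10)]
print(rm_all(PositionCacheDir(names)))                      # []
print(rm_all(PositionCacheDir(names), concurrent_ls=True))  # ['f03', 'f04', 'f05']
```

In this model rm alone empties the directory, but a single ls run
after rm's first page overwrites the cache entry for position 3 with
"f06", so rm resumes past f03..f05 and leaves them behind, which
matches the symptom described above.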
Probably the right long-term solution is to return the name as the
position, instead of an int.
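A sketch of the name-as-position idea, again as a hypothetical Python
model rather than PVFS2 code: readdir_after returns the last name in
each page as the cursor, and the next call resumes at the first key
strictly greater than it. The function names, the lock-step
interleaving of rm and ls, and the page size are made up for
illustration.

```python
import bisect


def readdir_after(keys, after, count):
    """Hypothetical readdir whose position is a name: return up to `count`
    keys strictly greater than `after` (None means the start of the
    directory), plus the cursor to pass back on the next call."""
    start = 0 if after is None else bisect.bisect_right(keys, after)
    batch = keys[start:start + count]
    return batch, (batch[-1] if batch else after)


def rm_with_interleaved_ls(names, count=3):
    """rm unlinks page by page while an ls pages through the same directory
    in lock-step. Each client holds only its own cursor, so there is no
    shared position state for one client to clobber for the other."""
    keys = sorted(names)
    rm_cursor = ls_cursor = None
    seen_by_ls = []
    while True:
        batch, rm_cursor = readdir_after(keys, rm_cursor, count)
        for name in batch:
            keys.remove(name)
        ls_batch, ls_cursor = readdir_after(keys, ls_cursor, count)
        seen_by_ls.extend(ls_batch)
        if not batch and not ls_batch:
            return keys, seen_by_ls


remaining, seen = rm_with_interleaved_ls([f"f{i:02d}" for i in range(10)])
print(remaining)  # []
print(seen)       # ['f03', 'f04', 'f05', 'f06', 'f07', 'f08', 'f09']
```

Because bisect_right finds the next key whether or not the cursor
name still exists, rm's cursor stays valid after rm unlinks that very
name. Here ls never sees f00..f02 only because rm had already removed
them before ls started.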
-sam
thanks,
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers