Hi Bart,
Could you try this patch out and see if it fixes your problem? This is
checked into trunk as well. This won't eliminate the inode alloc
warning, but I think it does actually fix the umount hang.
I suspect that this same issue may affect a few other cases as well, but
it would be good if you could confirm this much for starters. I think
the same class of bug is affecting some of the proc file handlers, for
example. Cases like this also cause a pvfs2-client-core hang:
"for i in `seq 1 100`; do echo $i; cat
/proc/sys/pvfs2/perf-counters/acache; done"
thanks!
-Phil
Bart Taylor wrote:
Hey guys,
I have been running some tests against the 271 release, and I am having
some trouble with multiple mounts on one client. My setup has 2 servers
(both meta and io servers on local disk) and one client all of which are
running RHEL4 update 6. All that was done on the test client is loading
the kernel module and starting pvfs2-client. I can mount the file
system once and use it without any problem, but I have attached a test
script that keeps failing; it takes file system information and the
number of times to mount it. Here are the steps it executes:
- For the number of mounts requested:
    - Create a new directory (defaults to /tmp/mount_limit.#)
    - Mount the specified file system on the new dir
- For the number of mounts requested:
    - Do a recursive ls comparison (keep a copy the first time through
      and compare subsequent mounts to the first)
    - Unmount the dir
    - Delete the dir
I have been able to consistently reproduce the problem running the
attached script like this:
./test-mount-limit.pl pvfs2-server1:3334/pvfs2-fs 100
It stalls every time with either 36 or 37 mounts remaining. The script
has been successfully run on previous versions of pvfs2 up to several
thousand mounts.
The problem comes at the umount step. Eventually the process just
hangs, strands a bunch of mounts, and umount doesn't work as expected
after that even from the command line. When it stalls, I start seeing
messages like this one in dmesg and syslog:
May 2 15:02:44 client-node kernel: pvfs2_kill_sb: (WARNING) number of
inode allocs (4100) != number of inode deallocs (2665)
I am running this against an almost empty file system since the
recursive ls would take a while if it were large. Am I doing something
wrong/strange here, or is there a client/kernel problem? The test seems
pretty straightforward, and I've never had an issue with the script
before. I'm not sure if it was run against the 2.7.0 release, though.
Bart.
------------------------------------------------------------------------
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
---------------------
PatchSet 198
Date: 2008/05/13 09:46:28
Author: pcarns
Branch: HEAD
Tag: (none)
Log:
The umount request in pvfs2-client was not reposting an unexpected device
request, which means that all outstanding operations would be exhausted
after 64 umounts (hanging the pvfs2-client-core daemon). This change
makes umount act like some of the more normal operation types.
Suspect a similar problem in other operations that are handled inline (like
perf monitoring and other /proc/sys/pvfs2 handlers) but will address those
separately.
Fixes bug reported by Bart Taylor:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2008-May/004018.html
Members:
pvfs2-client-core.c:1.94->1.95
Index: pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c
diff -u pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c:1.94 pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c:1.95
--- pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c:1.94 Wed May 7 15:12:51 2008
+++ pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c Tue May 13 09:46:28 2008
@@ -1169,7 +1169,9 @@
ok:
PVFS_util_free_mntent(&mntent);
- write_inlined_device_response(vfs_request);
+ /* let handle_unexp_vfs_request() function detect completion and handle */
+ vfs_request->op_id = -1;
+
return 0;
fail_downcall:
gossip_err(
@@ -2666,6 +2668,8 @@
}
break;
}
+ case PVFS2_VFS_OP_FS_UMOUNT:
+ break;
default:
gossip_err("Completed upcall of unknown type %x!\n",
vfs_request->in_upcall.type);
@@ -2855,6 +2859,7 @@
calls that are serviced inline.
*/
case PVFS2_VFS_OP_FS_UMOUNT:
+ posted_op = 1;
ret = service_fs_umount_request(vfs_request);
break;
case PVFS2_VFS_OP_PERF_COUNT: