Hi Bart,

Could you try this patch out and see if it fixes your problem? This is checked into trunk as well. This won't eliminate the inode alloc warning, but I think it does actually fix the umount hang.

I suspect that this same issue affects a few other cases as well, but it would be good if you could confirm this much for starters.

I think the same class of bug is affecting some of the /proc file handlers, for example. A loop like this also causes a pvfs2-client-core hang:

"for i in `seq 1 100`; do echo $i; cat /proc/sys/pvfs2/perf-counters/acache; done"

thanks!
-Phil

Bart Taylor wrote:
Hey guys,

I have been running some tests against the 2.7.1 release, and I am having some trouble with multiple mounts on one client. My setup has 2 servers (each running both meta and I/O servers on local disk) and one client, all running RHEL4 update 6. All that was done on the test client was loading the kernel module and starting pvfs2-client. I can mount the file system once and use it without any problem, but I have attached a test script - it takes the file system information and a number of times to mount it - that keeps failing. Here are the steps it executes:

- For the number of mounts requested
   - Create a new directory (defaults to /tmp/mount_limit.#)
   - Mount the specified file system on the new dir

- For the number of mounts requested
   - Do a recursive ls comparison (keep a copy the first time through and compare subsequent mounts to the first)
   - Unmount the dir
   - Delete the dir

I have been able to consistently reproduce the problem running the attached script like this:
./test-mount-limit.pl pvfs2-server1:3334/pvfs2-fs 100
It stalls every time with either 36 or 37 mounts remaining. The script has been successfully run on previous versions of pvfs2 up to several thousand mounts.

The problem comes at the umount step. Eventually the process just hangs, strands a bunch of mounts, and umount doesn't work as expected after that, even from the command line. When it stalls, I start seeing messages like this one in dmesg and syslog:

May 2 15:02:44 client-node kernel: pvfs2_kill_sb: (WARNING) number of inode allocs (4100) != number of inode deallocs (2665)

I am running this against an almost empty file system, since the recursive ls would take a while if it were large. Am I doing something wrong/strange here, or is there a client/kernel problem? The test seems pretty straightforward, and I've never had an issue with the script before. I'm not sure if it was run against the 2.7.0 release though.

Bart.


------------------------------------------------------------------------

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

---------------------
PatchSet 198 
Date: 2008/05/13 09:46:28
Author: pcarns
Branch: HEAD
Tag: (none) 
Log:
The umount request in pvfs2-client was not reposting an unexpected device
request, which means that all outstanding operations would be exhausted
after 64 umounts (and hang the pvfs2-client-core daemon).  This change
makes umount act like some of the more normal operation types.

Suspect a similar problem in other operations that are handled inline (like
perf monitoring and other /proc/sys/pvfs2 handlers) but will address those
separately.

Fixes bug reported by Bart Taylor:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2008-May/004018.html

Members: 
	pvfs2-client-core.c:1.94->1.95 

Index: pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c
diff -u pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c:1.94 pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c:1.95
--- pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c:1.94	Wed May  7 15:12:51 2008
+++ pvfs2-1/src/apps/kernel/linux/pvfs2-client-core.c	Tue May 13 09:46:28 2008
@@ -1169,7 +1169,9 @@
 ok:
     PVFS_util_free_mntent(&mntent);
 
-    write_inlined_device_response(vfs_request);
+    /* let handle_unexp_vfs_request() function detect completion and handle */
+    vfs_request->op_id = -1;
+
     return 0;
 fail_downcall:
     gossip_err(
@@ -2666,6 +2668,8 @@
             }
             break;
         }
+        case PVFS2_VFS_OP_FS_UMOUNT:
+            break;
         default:
             gossip_err("Completed upcall of unknown type %x!\n",
                        vfs_request->in_upcall.type);
@@ -2855,6 +2859,7 @@
               calls that are serviced inline.
             */
         case PVFS2_VFS_OP_FS_UMOUNT:
+            posted_op = 1;
             ret = service_fs_umount_request(vfs_request);
             break;
         case PVFS2_VFS_OP_PERF_COUNT: