Attached is a patch implementing what I discussed below. It simply allows 
cancellation ops to be serviced before the filesystem is mounted.
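
As a rough illustration of the check this patch changes, here is a minimal, standalone sketch. The enum and function names are hypothetical stand-ins (the real code tests vfs_request->in_upcall.type against the PVFS2_VFS_OP_* constants inside pvfs2-client-core.c), and I am assuming that an upcall failing the check is simply not serviced until the remount completes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PVFS2_VFS_OP_* upcall types. */
enum sketch_op {
    SKETCH_OP_FS_MOUNT,
    SKETCH_OP_CANCEL,
    SKETCH_OP_FILE_IO
};

/* Hypothetical stand-ins for the remount_complete states. */
enum sketch_remount {
    SKETCH_REMOUNT_NOTCOMPLETED,
    SKETCH_REMOUNT_COMPLETED
};

/*
 * Returns true when an upcall has to wait for the remount.
 * Before the patch only mount upcalls were exempt; with the patch
 * cancellation upcalls are exempt as well, so the kernel can get
 * its cancellation downcall instead of spinning.
 */
static bool sketch_must_wait_for_remount(enum sketch_remount state,
                                         enum sketch_op op)
{
    return state == SKETCH_REMOUNT_NOTCOMPLETED &&
           op != SKETCH_OP_FS_MOUNT &&
           op != SKETCH_OP_CANCEL;
}
```

With these stand-ins, a file I/O upcall arriving before the remount still waits, while mount and cancel upcalls go through; once the remount is complete nothing waits.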

I've also attached a change to client-state-machine.c that I must have 
left out of my last patch. It makes the cancelled_io_jobs_are_pending 
call use the base frame of the stack.
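
To show why the base frame matters, here is a small self-contained sketch. The struct and function names are hypothetical, and the frame stack is modelled as a plain array; the only thing taken from the patch is the indexing convention, where index 0 is assumed to mean the current frame and -(frame_count - 1) the base frame. The actual PINT_sm_frame implementation may differ.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_FRAMES 8

/* Hypothetical frame holding only the counter used by the patch. */
struct sketch_frame {
    int total_cancellations_remaining;
};

/* Hypothetical control block: frames[0] is the base of the stack. */
struct sketch_smcb {
    struct sketch_frame *frames[SKETCH_MAX_FRAMES];
    int frame_count;
};

/* Assumed convention: index 0 is the current (top) frame and
 * negative indexes walk down toward the base frame. */
static struct sketch_frame *sketch_frame_at(struct sketch_smcb *smcb,
                                            int index)
{
    int pos = (smcb->frame_count - 1) + index;

    if (pos < 0 || pos >= smcb->frame_count)
        return NULL;
    return smcb->frames[pos];
}

/* Mirrors the patched logic: always account for cancellations on
 * the base frame, no matter which frame is current. */
static int sketch_cancellations_pending(struct sketch_smcb *smcb)
{
    struct sketch_frame *base =
        sketch_frame_at(smcb, -(smcb->frame_count - 1));

    assert(base);
    if (base->total_cancellations_remaining > 0)
        base->total_cancellations_remaining--;
    return base->total_cancellations_remaining != 0;
}
```

If a nested frame has been pushed on top of the I/O frame, the current frame's counter is not the one the I/O operation set, which is why the lookup goes through the base frame here.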

Michael

On Tue, Mar 09, 2010 at 02:32:26PM -0500, Michael Moore wrote:
> I really wish I didn't have to resurrect this. We are again seeing issues 
> with processes with in-progress writes spinning when pvfs2-client-core 
> segfaults and restarts and the filesystem is not yet mounted.
> 
> I think the issue is this:
> If a process issues a cancellation request (e.g. 'kill -9 badguy') 
> while the filesystem is not yet mounted, the kernel spins waiting for 
> the cancellation downcall. I assume this is because pvfs2-client-core 
> does not service requests other than mount requests until the filesystem 
> is remounted, so a downcall never comes.
> 
> To reproduce this (a convoluted process):
> 1. Have a process write to a pvfs filesystem.
> 2. Cause communication to fail (e.g. kill a server process).
> 3. After client-core has begun retrying, kill the client-core.
> 4. After client-core restarts and the write retries, kill the writing 
>    process.
> 
> Unfortunately, this is the behavior we see when a bad perror_gossip call 
> is segfaulting client-core as I/O is being re-tried and cancelled 
> (haven't tracked that down yet). 
> 
> Fix:
> So far, allowing cancellation requests in addition to remount 
> requests to be handled before the filesystem is mounted seems to fix the 
> issue. Does this make sense? I imagine it means that the cancel I/O 
> request is effectively not handled, since the write request doesn't exist 
> yet. If the filesystem is eventually mounted, will the writes re-issue 
> the cancellation if they continue to fail?
> 
> Thanks,
> Michael
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
Index: pvfs2/src/apps/kernel/linux/pvfs2-client-core.c
===================================================================
RCS file: /projects/cvsroot/pvfs2/src/apps/kernel/linux/pvfs2-client-core.c,v
retrieving revision 1.108
diff -a -u -p -r1.108 pvfs2-client-core.c
--- pvfs2/src/apps/kernel/linux/pvfs2-client-core.c     11 Feb 2010 23:03:32 -0000      1.108
+++ pvfs2/src/apps/kernel/linux/pvfs2-client-core.c     10 Mar 2010 12:11:43 -0000
@@ -2852,7 +2852,8 @@ static inline PVFS_error handle_unexp_vf
     }
 
     if (remount_complete == REMOUNT_NOTCOMPLETED &&
-        (vfs_request->in_upcall.type != PVFS2_VFS_OP_FS_MOUNT))
+        (vfs_request->in_upcall.type != PVFS2_VFS_OP_FS_MOUNT) && 
+        (vfs_request->in_upcall.type != PVFS2_VFS_OP_CANCEL) )
     {
         gossip_debug(
             GOSSIP_CLIENTCORE_DEBUG, "Got an upcall operation of "
Index: pvfs2/src/client/sysint/client-state-machine.c
===================================================================
RCS file: /projects/cvsroot/pvfs2/src/client/sysint/client-state-machine.c,v
retrieving revision 1.105
diff -a -u -p -r1.105 client-state-machine.c
--- pvfs2/src/client/sysint/client-state-machine.c      8 Feb 2010 16:46:33 -0000       1.105
+++ pvfs2/src/client/sysint/client-state-machine.c      10 Mar 2010 12:19:03 -0000
@@ -208,23 +208,28 @@ static inline int cancelled_io_jobs_are_
       cancellations on the I/O operation are accounted for
     */
     assert(sm_p);
+    
+    PINT_client_sm *sm_base_p = 
+        PINT_sm_frame(smcb, (-(smcb->frame_count -1)));
+
+    assert(sm_base_p);
 
     /*
       this *can* possibly be 0 in the case that the I/O has already
       completed and no job cancellation were issued at I/O cancel time
     */
-    if (sm_p->u.io.total_cancellations_remaining > 0)
+    if (sm_base_p->u.io.total_cancellations_remaining > 0)
     {
-        sm_p->u.io.total_cancellations_remaining--;
+        sm_base_p->u.io.total_cancellations_remaining--;
     }
 
     gossip_debug(
         GOSSIP_IO_DEBUG, "(%p) cancelled_io_jobs_are_pending: %d "
-        "remaining (op %s)\n", sm_p,
-        sm_p->u.io.total_cancellations_remaining,
+        "remaining (op %s)\n", sm_base_p,
+        sm_base_p->u.io.total_cancellations_remaining,
         (PINT_smcb_complete(smcb) ? "complete" : "NOT complete"));
 
-    return (sm_p->u.io.total_cancellations_remaining != 0);
+    return (sm_base_p->u.io.total_cancellations_remaining != 0);
 }
 
 /* this array must be ordered to match the enum in client-state-machine.h */ 