Attached is a patch implementing what I discussed below. It simply allows
cancellation ops to be serviced before the filesystem is mounted.
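To make the intended gate explicit, here is a minimal standalone sketch of the check, not the client-core code itself. The enum names and the upcall_allowed() helper are hypothetical stand-ins for PVFS2_VFS_OP_FS_MOUNT, PVFS2_VFS_OP_CANCEL and the remount_complete flag in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real upcall types and remount flag. */
enum upcall_type { OP_FS_MOUNT, OP_CANCEL, OP_FILE_IO, OP_LOOKUP };
enum remount_state { REMOUNT_NOTCOMPLETED, REMOUNT_COMPLETED };

/*
 * Returns true when an upcall may be serviced right away.
 * Before the remount completes, only mount and cancel upcalls pass;
 * once it has completed, every upcall passes.
 */
static bool upcall_allowed(enum remount_state state, enum upcall_type type)
{
    if (state == REMOUNT_COMPLETED)
    {
        return true;
    }
    return (type == OP_FS_MOUNT) || (type == OP_CANCEL);
}
```

With this sketch, upcall_allowed(REMOUNT_NOTCOMPLETED, OP_CANCEL) is true while upcall_allowed(REMOUNT_NOTCOMPLETED, OP_FILE_IO) is false, which is the behavior change the first hunk of the patch is after.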
I've also attached a change to client-state-machine.c that I must have
left out of my last patch. It makes the cancelled_io_jobs_are_pending
call use the base frame of the stack.
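For anyone unfamiliar with the frame indexing, here is a toy model of the base-frame lookup. It is only a sketch: the toy_* types and functions are hypothetical, and it assumes that index 0 names the current (top) frame and negative indices step toward the base, which is what the -(frame_count - 1) expression in the patch suggests. The real PINT_sm_frame may differ in detail.

```c
#include <stddef.h>

#define TOY_MAX_FRAMES 8

/* Hypothetical toy frame; only the counter the patch touches is modeled. */
struct toy_frame
{
    int total_cancellations_remaining;
};

/* Hypothetical toy control block; not the PVFS2 smcb layout. */
struct toy_smcb
{
    struct toy_frame *frames[TOY_MAX_FRAMES]; /* frames[0] is the base */
    int frame_count;                          /* frames currently pushed */
};

/* ASSUMPTION: index 0 is the top frame, negative indices move toward base. */
static struct toy_frame *toy_sm_frame(struct toy_smcb *smcb, int index)
{
    int pos = (smcb->frame_count - 1) + index;
    if (pos < 0 || pos >= smcb->frame_count)
    {
        return NULL;
    }
    return smcb->frames[pos];
}

/* Same index expression as the patch: step down from the top to the base. */
static struct toy_frame *toy_base_frame(struct toy_smcb *smcb)
{
    return toy_sm_frame(smcb, -(smcb->frame_count - 1));
}

/* Decrement the counter on the base frame and report whether any remain. */
static int toy_cancellations_pending(struct toy_smcb *smcb)
{
    struct toy_frame *base = toy_base_frame(smcb);
    if (base == NULL)
    {
        return 0;
    }
    if (base->total_cancellations_remaining > 0)
    {
        base->total_cancellations_remaining--;
    }
    return (base->total_cancellations_remaining != 0);
}
```

The point is that the counter lives on the base frame, so reading it through whatever frame happens to be current can look at the wrong structure once nested frames have been pushed.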
Michael
On Tue, Mar 09, 2010 at 02:32:26PM -0500, Michael Moore wrote:
> I really wish I didn't have to resurrect this. We are again seeing issues
> with processes with in-progress writes spinning when pvfs2-client-core
> segfaults and restarts and the filesystem is not yet mounted.
>
> I think the issue is this:
> If a process issues a cancellation request (e.g. 'kill -9 badguy') and
> the filesystem is not yet mounted, the kernel spins waiting for the
> cancellation downcall. I assume this is because pvfs2-client-core does
> not service requests other than mount requests until the filesystem is
> remounted, so a downcall never comes.
>
> To reproduce this (the process is convoluted):
> 1. Have a process write to a PVFS filesystem.
> 2. Cause communication to fail (e.g. kill a server process).
> 3. After client-core has begun retrying, kill the client-core.
> 4. After client-core restarts and the write is retried, kill the
>    writing process.
>
> Unfortunately, this is the behavior we see when a bad perror_gossip call
> is segfaulting client-core as I/O is being re-tried and cancelled
> (haven't tracked that down yet).
>
> Fix:
> So far, allowing cancellation requests in addition to remount
> requests to be handled before the filesystem is mounted seems to fix the
> issue. Does this make sense? I imagine it means that the cancel I/O
> request is effectively not handled, since the write request doesn't exist
> yet. If the filesystem is eventually mounted, will the writes re-issue
> the cancellation if they continue to fail?
>
> Thanks,
> Michael
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
Index: pvfs2/src/apps/kernel/linux/pvfs2-client-core.c
===================================================================
RCS file: /projects/cvsroot/pvfs2/src/apps/kernel/linux/pvfs2-client-core.c,v
retrieving revision 1.108
diff -a -u -p -r1.108 pvfs2-client-core.c
--- pvfs2/src/apps/kernel/linux/pvfs2-client-core.c 11 Feb 2010 23:03:32 -0000 1.108
+++ pvfs2/src/apps/kernel/linux/pvfs2-client-core.c 10 Mar 2010 12:11:43 -0000
@@ -2852,7 +2852,8 @@ static inline PVFS_error handle_unexp_vf
}
if (remount_complete == REMOUNT_NOTCOMPLETED &&
- (vfs_request->in_upcall.type != PVFS2_VFS_OP_FS_MOUNT))
+ (vfs_request->in_upcall.type != PVFS2_VFS_OP_FS_MOUNT) &&
+ (vfs_request->in_upcall.type != PVFS2_VFS_OP_CANCEL) )
{
gossip_debug(
GOSSIP_CLIENTCORE_DEBUG, "Got an upcall operation of "
Index: pvfs2/src/client/sysint/client-state-machine.c
===================================================================
RCS file: /projects/cvsroot/pvfs2/src/client/sysint/client-state-machine.c,v
retrieving revision 1.105
diff -a -u -p -r1.105 client-state-machine.c
--- pvfs2/src/client/sysint/client-state-machine.c 8 Feb 2010 16:46:33 -0000 1.105
+++ pvfs2/src/client/sysint/client-state-machine.c 10 Mar 2010 12:19:03 -0000
@@ -208,23 +208,28 @@ static inline int cancelled_io_jobs_are_
cancellations on the I/O operation are accounted for
*/
assert(sm_p);
+
+ PINT_client_sm *sm_base_p =
+ PINT_sm_frame(smcb, (-(smcb->frame_count -1)));
+
+ assert(sm_base_p);
/*
this *can* possibly be 0 in the case that the I/O has already
completed and no job cancellation were issued at I/O cancel time
*/
- if (sm_p->u.io.total_cancellations_remaining > 0)
+ if (sm_base_p->u.io.total_cancellations_remaining > 0)
{
- sm_p->u.io.total_cancellations_remaining--;
+ sm_base_p->u.io.total_cancellations_remaining--;
}
gossip_debug(
GOSSIP_IO_DEBUG, "(%p) cancelled_io_jobs_are_pending: %d "
- "remaining (op %s)\n", sm_p,
- sm_p->u.io.total_cancellations_remaining,
+ "remaining (op %s)\n", sm_base_p,
+ sm_base_p->u.io.total_cancellations_remaining,
(PINT_smcb_complete(smcb) ? "complete" : "NOT complete"));
- return (sm_p->u.io.total_cancellations_remaining != 0);
+ return (sm_base_p->u.io.total_cancellations_remaining != 0);
}
/* this array must be ordered to match the enum in client-state-machine.h */