Thanks, Michael. From your first description I wondered whether it would be better to discard the cancel request in that case, but I like that your approach doesn't require adding any significant new logic.

Both patches are now in trunk and the 2-8 branch.

-Phil

Michael Moore wrote:
Attached is a patch implementing what I discussed below. It simply allows cancellation ops to be serviced before the filesystem is mounted.

I've also attached a change to client-state-machine.c, which I must have left out of my last patch, that uses the base frame of the stack in the cancelled_io_jobs_are_pending call.

Michael

On Tue, Mar 09, 2010 at 02:32:26PM -0500, Michael Moore wrote:
I really wish I didn't have to resurrect this. We are again seeing processes with in-progress writes spin when pvfs2-client-core segfaults and restarts while the filesystem is not yet mounted.

I think the issue is this:
If a process issues a cancellation request (e.g. 'kill -9 badguy') while the filesystem is not yet mounted, the kernel spins waiting for the cancellation downcall. I assume this is because pvfs2-client-core services only mount requests until the filesystem is remounted, so the downcall never comes.

To reproduce this (it's a convoluted process):
1. Have a process write to a PVFS filesystem.
2. Cause communication to fail (e.g. kill a server process).
3. After client-core has begun retrying, kill client-core.
4. After client-core restarts and the write is retried, kill the writing process.

Unfortunately, this is exactly the behavior we see when a bad perror_gossip call segfaults client-core while I/O is being retried and cancelled (I haven't tracked that down yet).
Fix:
So far, allowing cancellation requests, in addition to remount requests, to be handled before the filesystem is mounted seems to fix the issue. Does this make sense? I imagine it means the cancel I/O request is effectively a no-op, since the write request doesn't exist yet. Once the filesystem is mounted, will the writes re-issue the cancellation if they continue to fail?

Thanks,
Michael
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
