I really wish I didn't have to resurrect this. We are again seeing processes with in-progress writes spin when pvfs2-client-core segfaults and restarts while the filesystem is not yet mounted.
I think the issue is this: if a process issues a cancellation request (e.g. 'kill -9 badguy') and the filesystem is not yet mounted, the kernel spins waiting for the cancellation downcall. I assume this is because pvfs2-client-core does not service requests other than mount requests until the filesystem is remounted, so the downcall never comes.

To reproduce this (the convoluted process):

1. Have a process write to a pvfs filesystem.
2. Cause communication to fail (i.e. kill a server process).
3. After client-core has begun retrying, kill the client-core.
4. After the client-core restarts and the write retries, kill the writing process.

Unfortunately, this is the behavior we see when a bad perror_gossip call segfaults client-core as I/O is being retried and cancelled (haven't tracked that down yet).

Fix: So far, allowing cancellation requests in addition to remount requests to be handled before the filesystem is mounted seems to fix the issue.

Does this make sense? I imagine it means that the cancel I/O request is effectively not handled, since the write request doesn't exist yet. If the filesystem is ever mounted, will the writes re-issue the cancellation if they continue to fail?

Thanks,
Michael
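P.S. To make the proposed fix concrete, here is a minimal sketch of the gating logic as I understand it. The enum and function names are hypothetical and are not the actual pvfs2-client-core identifiers. It only illustrates the idea that, before the remount completes, cancellation requests are let through alongside mount requests while everything else keeps waiting.

```c
#include <stdbool.h>

/* Hypothetical request types for illustration only; these are NOT the
 * real pvfs2-client-core identifiers. */
enum upcall_type {
    UPCALL_MOUNT,
    UPCALL_CANCEL,
    UPCALL_FILE_IO,
    UPCALL_OTHER
};

/* Assumed current behavior: until the filesystem is remounted, only
 * mount requests are serviced, so a cancellation never gets a downcall. */
bool may_service_before_fix(enum upcall_type type, bool remount_complete)
{
    return remount_complete || type == UPCALL_MOUNT;
}

/* Proposed behavior: cancellations are also serviced before the remount
 * completes, so the kernel is not left waiting for a downcall. */
bool may_service_after_fix(enum upcall_type type, bool remount_complete)
{
    if (remount_complete)
        return true;
    return type == UPCALL_MOUNT || type == UPCALL_CANCEL;
}
```

With this sketch, a cancel that arrives before the remount is rejected by the first predicate and accepted by the second, while ordinary I/O is still held back until the filesystem is mounted.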
