I really wish I didn't have to resurrect this, but we are again seeing 
processes with in-progress writes spin when pvfs2-client-core segfaults 
and restarts while the filesystem is not yet remounted.

I think the issue is this:
If a process issues a cancellation request (e.g. via 'kill -9 badguy') 
while the filesystem is not yet mounted, the kernel spins waiting for the 
cancellation downcall. I assume this is because pvfs2-client-core does 
not service any requests other than mount requests until the filesystem 
is remounted, so the downcall never comes.

To reproduce this (the process is convoluted):
1. Have a process write to a pvfs filesystem.
2. Cause communication to fail (e.g. kill a server process).
3. After client-core has begun retrying, kill the client-core.
4. After client-core restarts and the writes are retried, kill the 
   writing process.

Unfortunately, this is the behavior we see when a bad perror_gossip call 
segfaults client-core while I/O is being retried and cancelled (I 
haven't tracked that down yet).

Fix:
So far, allowing cancellation requests, in addition to remount requests, 
to be handled before the filesystem is mounted seems to fix the issue. 
Does this make sense? I imagine it means that the cancel I/O request is 
effectively not handled, since the write request doesn't exist yet. If 
the filesystem is eventually mounted, will the writes re-issue the 
cancellation if they continue to fail?
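
As a rough illustration of the fix described above, here is a minimal, 
self-contained sketch in C. It is not the actual client-core code: the 
enum values, the result codes, and the function names 
(allowed_before_mount, dispatch) are all hypothetical, and the 
target_known flag simply stands in for "client-core still knows about the 
write being cancelled". The point is only that a cancel arriving before 
the remount gets answered (possibly as a no-op) instead of being held 
back, so the kernel has a downcall to stop waiting on.

```c
#include <stdbool.h>

/* Hypothetical request kinds; NOT the real PVFS2 upcall identifiers. */
enum upcall_type { UPCALL_MOUNT, UPCALL_CANCEL, UPCALL_IO, UPCALL_LOOKUP };

/* Hypothetical outcomes of trying to service one request. */
enum dispatch_result {
    SERVICED,     /* handled normally, downcall written        */
    CANCEL_NOOP,  /* nothing to cancel, but downcall written   */
    DEFERRED      /* held back until the filesystem is mounted */
};

/* Before the fix only mount requests would pass this check;
 * with the fix, cancellation requests pass as well. */
static bool allowed_before_mount(enum upcall_type type)
{
    return type == UPCALL_MOUNT || type == UPCALL_CANCEL;
}

/* Decide what to do with a request given the current mount state.
 * target_known is only consulted for cancellation requests. */
static enum dispatch_result dispatch(enum upcall_type type,
                                     bool fs_mounted,
                                     bool target_known)
{
    if (!fs_mounted && !allowed_before_mount(type))
        return DEFERRED;

    /* After a client-core restart the original write is unknown,
     * so the cancel has nothing to act on. It is still answered. */
    if (type == UPCALL_CANCEL && !target_known)
        return CANCEL_NOOP;

    return SERVICED;
}
```

With this policy, dispatch(UPCALL_CANCEL, false, false) yields 
CANCEL_NOOP rather than DEFERRED, while an ordinary I/O request is still 
deferred until the filesystem is mounted.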

Thanks,
Michael
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
