[Lustre-discuss] lock callback timer expired, lock on destroyed export, locks stolen, busy with active RPCs, operation 400 on unconnected MDS

2010-05-03 Thread Thomas Roth
Hi all,

just want to share my recent insight and increase the number of Google hits for those who suffer from
- MDT / filesystem becoming suddenly unusable
- LustreError: ... lock callback timer expired ...
- LustreError: ... lock on destroyed export ...
- Lustre: ... Stealing 1 locks ...
-
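For anyone who wants to check their own logs for the symptoms quoted so far, here is a minimal Python sketch that counts lines containing those message fragments. It is not from the thread: the names `SYMPTOMS` and `count_symptoms` are hypothetical, only the visible parts of the messages are matched (the elided "..." portions are ignored), and it assumes the number in "Stealing 1 locks" can vary.

```python
import re
from collections import Counter

# Message fragments quoted in the post; the elided parts ("...") are not matched.
SYMPTOMS = {
    "lock callback timer expired": re.compile(r"lock callback timer expired"),
    "lock on destroyed export": re.compile(r"lock on destroyed export"),
    # Assumption: the count in "Stealing 1 locks" varies, so any integer is accepted.
    "locks stolen": re.compile(r"Stealing \d+ locks"),
}


def count_symptoms(lines):
    """Count how many of the given log lines contain each symptom fragment."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

The function takes any iterable of strings, so an open log file can be passed in directly; where the relevant messages end up on a given system is left to the reader.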

Re: [Lustre-discuss] lock callback timer expired, lock on destroyed export, locks stolen, busy with active RPCs, operation 400 on unconnected MDS

2010-05-03 Thread Oleg Drokin
Hello!

On May 3, 2010, at 11:49 AM, Thomas Roth wrote:
> We found a user job submission script that probably caused all this by starting
> - several hundred (900) jobs simultaneously
> - all of them opening one and the same file for batch system errors and one and the same file for its output.
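The access pattern quoted above has every job opening the same two files. One possible way to avoid that, sketched here in Python, is to derive a separate output and error path for each job. This is an illustration only and not a fix proposed in the thread: the function name `job_log_paths`, the directory argument, and the `job_<id>.out` / `job_<id>.err` naming are all hypothetical, and how the paths are handed to a particular batch system is not covered.

```python
from pathlib import Path


def job_log_paths(log_dir, job_id):
    """Return distinct (stdout, stderr) paths for a single job.

    Hypothetical naming scheme: each job gets its own pair of files,
    so no two jobs open the same file.
    """
    base = Path(log_dir)
    return base / f"job_{job_id}.out", base / f"job_{job_id}.err"
```

Calling `job_log_paths("logs", i)` for each of the 900 job indices yields 900 different file pairs, so the jobs no longer all target one shared file.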