Hi all,
I just want to share a recent insight and increase the number of Google hits
for those who suffer from
- MDT / filesystem becoming suddenly unusable
- LustreError: ... lock callback timer expired ...
- LustreError: ... lock on destroyed export ...
- Lustre: ... Stealing 1 locks ...
Hello!
On May 3, 2010, at 11:49 AM, Thomas Roth wrote:
We found a user job submission script that probably caused all this by
starting
- several hundred (900) jobs simultaneously
- all of them opening one and the same file for batch system errors and
one and the same file for their output.
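
For illustration only, here is a minimal sketch of one way a submission script could avoid the pattern described above, by giving every job its own stdout and stderr file instead of pointing all jobs at the same two files. Nothing here comes from the original script: the helper per_job_log_paths, the "logs" directory, the "myjob" name and the JOB_ID environment variable are all hypothetical, and the variable your batch system actually exports for the job id will likely differ.

```python
import os
from pathlib import Path


def per_job_log_paths(log_dir, job_name, job_id):
    """Return (stdout_path, stderr_path) that are unique to one job.

    Hypothetical helper: the job id is embedded in the file name, so
    simultaneously starting jobs never open the same file.
    """
    base = Path(log_dir)
    stem = f"{job_name}.{job_id}"
    return base / f"{stem}.out", base / f"{stem}.err"


if __name__ == "__main__":
    # JOB_ID is an assumed variable name; fall back to the process id
    # so the sketch still runs outside a batch system.
    job_id = os.environ.get("JOB_ID", str(os.getpid()))
    out_path, err_path = per_job_log_paths("logs", "myjob", job_id)
    print(out_path)
    print(err_path)
```

With 900 jobs this yields 1800 distinct files rather than two shared ones, so the jobs no longer all open the same file at once. Whether that is enough to avoid the lock timeouts on a given system is something to verify, not a guarantee.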