On Wed, 28 Nov 2018 at 22:11, Davide Tacchella <dtacche...@cray.com> wrote:
> Your problem, as described, looks like a Lustre locking issue, avoiding
> scan on a certain directory may work for some time, real fix would be
> to identify Lustre MDS issue and fix it.
Yep. The problem has been that the only sign anything 'strange' was happening on the box was that something tries to load a whole bunch of kernel modules just as it dies. The finger of blame was initially pointed at the watchdog, but once the same thing started happening on a second box at about the same point (4h into the scan), suspicion switched to the filesystem.

I'm going to run with the exclude for now and see if this scan completes (we're at about 200M inodes on the filesystem just now). Then I'll start again, scanning just the suspect part of the tree to identify the problem directory. (It's a whole pile of backups from pre-rhine/redwood...)

Is there anything likely to show up on the MDS that I should watch for?

Many thanks,
Andrew

_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support