On Wed, 28 Nov 2018 at 22:11, Davide Tacchella <dtacche...@cray.com> wrote:
> Your problem, as described, looks like a Lustre locking issue, avoiding
> scan on a certain directory may work for some time, real fix would be
> to identify Lustre MDS issue and fix it.

Yep. The problem has been that the only sign of anything 'strange'
happening on the box was that something tried to load a whole bunch of
kernel modules just as it died. The finger of blame was pointed at the
watchdog, but once it started happening at about the same time (4h
into the scan) on a second box, suspicion switched to the filesystem.

I'm going to run with the exclude for now and see if this scan completes
(we're at about 200M inodes on the filesystem just now), and then start
again, scanning just the suspect part of the tree to identify the problem
directory. (It's a whole pile of backups from pre rhine/redwood...)

Is there something likely to show on the MDS that I should watch for?

Many thanks

Andrew


_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support
