Hello,

I have a problem similar to 
https://sourceforge.net/p/robinhood/mailman/message/35883907/: the 
robinhood server, running mariadb-5.5.52-1.el7.x86_64 and a lustre 2.8.0.8 
client, reboots when the initial scan is run. I am running this in a testbed 
environment because I want to understand it fully before I commit to 
deploying it on our production system. I have two separate lustre file 
systems that I am scanning: one is a 408TB lustre 2.8 file system with ~16M 
inodes, the other is a 204TB lustre 2.5.5 file system with ~3M inodes.

The curious thing is that I had already scanned both file systems 
successfully, one at a time, with everything working (including the web 
GUI). I then blew away the databases so I could get a data point on how the 
system performed, and how long it took, when both file systems were scanned 
simultaneously. The problem appears to affect only the 2.8 file system 
database: I just ran a fresh scan against the 2.5.5 file system without any 
problem, then started a new scan against the 2.8 file system and once again 
the server rebooted.

As in the report linked above, when I ran the scan only on the 2.8 file 
system in debug mode, it logged messages similar to “2017/07/10 15:44:58 
[15191/6] FS_Scan | openat failed on <parent_fd=18>/libippch.so: Too many 
levels of symbolic links”. I checked a large number of the reported files; 
for the most part they were library files with only a couple of symlinks 
pointing to the .so file in the same directory.
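
One way to double-check the reported paths is sketched below. It rests on 
an assumption that is not confirmed from the robinhood source: that the 
scanner opens entries with O_NOFOLLOW. On Linux, open/openat with 
O_NOFOLLOW fails with ELOOP ("Too many levels of symbolic links") whenever 
the final path component is a symlink, even if the link resolves cleanly. 
The helper name classify_path is hypothetical.

```python
import errno
import os


def classify_path(path):
    """Classify how `path` behaves when opened without following symlinks.

    Hypothetical helper for illustration only; it does not reproduce
    robinhood's actual scanning code.
    """
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        if exc.errno != errno.ELOOP:
            return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
        # ELOOP with O_NOFOLLOW: the last component is a symlink.
        # os.stat() follows links, so it tells us whether the chain resolves.
        try:
            os.stat(path)
        except OSError as stat_exc:
            if stat_exc.errno == errno.ELOOP:
                return "symlink loop"
            if stat_exc.errno == errno.ENOENT:
                return "dangling symlink"
            return "error: " + errno.errorcode.get(
                stat_exc.errno, str(stat_exc.errno))
        return "symlink, resolves normally"
    os.close(fd)
    return "opens normally"
```

Running classify_path over the paths named in the debug log would show 
whether any of them is a genuine loop or whether they are ordinary 
symlinks that only fail because they are opened without being followed.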

The only other thing of note that I was able to capture is this from the 
console output:

[ 3301.937577] LustreError: 15209:0:(linux-module.c:92:obd_ioctl_getdata()) Version mismatch kernel (10004) vs application (0)
[ 3301.950059] LustreError: 15209:0:(class_obd.c:230:class_handle_ioctl()) OBD ioctl: data error

There was no indication of a fault in any of the log files. I was running 
top and htop during the scan, and neither CPU nor memory was exhausted. Nor 
did I see anything suspicious happening on the file system itself.

Any help or clues as to why this is failing would be greatly appreciated. 
Thanks in advance.
====

Joe Mervini
Sandia National Laboratories
High Performance Computing
505.844.6770
jame...@sandia.gov


