Hello,
I have a problem similar to the one described in
https://sourceforge.net/p/robinhood/mailman/message/35883907/: the Robinhood
server, running mariadb-5.5.52-1.el7.x86_64 and the Lustre 2.8.0.8 client,
reboots when the initial scan is run. I am running this in a testbed
environment before deploying on our production system because I want a
complete handle on it before I commit to the deployment. I have 2 separate
Lustre file systems that I am running against: one is a 408TB Lustre 2.8 file
system with ~16M inodes, the other is a 204TB Lustre 2.5.5 file system with ~3M
inodes.
The curious thing is that I had previously scanned both file systems
successfully, one at a time, on this system with everything working (including
the web GUI). I then blew away the databases so I could measure how the system
performed, and how long it took, when scanning both file systems
simultaneously. The problem appears to affect only the 2.8 file system
database: I just ran a fresh scan against the 2.5.5 file system without
problem, then started a new scan against the 2.8 file system, and once again
the server rebooted.
As in the thread linked above, when I ran the scan only on the 2.8 file
system in debug mode, it also reported messages similar to "2017/07/10 15:44:58
[15191/6] FS_Scan | openat failed on <parent_fd=18>/libippch.so: Too many
levels of symbolic links". I checked a large number of the reported files, and
for the most part they were library files with only a couple of symlinks to
the .so file in the same directory.
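For context, the FS_Scan message above may be harmless on its own. On Linux, openat() with the O_NOFOLLOW flag fails with ELOOP ("Too many levels of symbolic links") when the final path component is a symlink, so a scanner that opens directory entries that way would log exactly this for each .so symlink. Whether Robinhood's scanner actually opens entries with O_NOFOLLOW is an assumption, not something confirmed in this report. The minimal Python sketch below reproduces the error; the helper open_nofollow and the file names are hypothetical.

```python
import errno
import os
import tempfile


def open_nofollow(parent_fd: int, name: str) -> int:
    """Open `name` relative to `parent_fd` (like openat), refusing to follow
    a final symlink. Returns a new fd on success, or -errno on failure.
    Hypothetical helper for illustration only."""
    try:
        return os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=parent_fd)
    except OSError as exc:
        return -exc.errno


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one real library file plus a symlink to it in the
    # same directory, similar to the libippch.so files described above.
    with open(os.path.join(tmp, "libexample.so.1"), "w"):
        pass
    os.symlink("libexample.so.1", os.path.join(tmp, "libexample.so"))

    dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
    try:
        file_result = open_nofollow(dir_fd, "libexample.so.1")
        link_result = open_nofollow(dir_fd, "libexample.so")
        # Close any descriptors that were actually opened.
        for fd in (file_result, link_result):
            if fd >= 0:
                os.close(fd)
    finally:
        os.close(dir_fd)

print("regular file opened:", file_result >= 0)
print("symlink error:", os.strerror(-link_result))
```

If this assumption holds, the debug messages would only show that the symlinks were skipped rather than opened, and would not by themselves explain the reboot.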
The only other thing of note that I was able to capture is this from the
console output:
[ 3301.937577] LustreError: 15209:0:(linux-module.c:92:obd_ioctl_getdata()) Version mismatch kernel (10004) vs application (0)
[ 3301.950059] LustreError: 15209:0:(class_obd.c:230:class_handle_ioctl()) OBD ioctl: data error
There was no indication of a fault in any of the log files, and I was running
top and htop during the process: neither CPU nor memory was exhausted. Nor
did I see anything suspicious happening on the file system itself.
Any help or clues as to why this is failing would be greatly appreciated.
Thanks in advance.
====
Joe Mervini
Sandia National Laboratories
High Performance Computing
505.844.6770
jame...@sandia.gov
_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support