Robinhood in scanning mode seems to reset the machine at random after
running for some time (anywhere from ~1 to ~10 hours).
This has been observed on at least 3 different nodes.

The only messages in the log before the node resets are the following:

kernel: fuse init (API version 7.14)
kernel: iTCO_wdt: Unexpected close, not stopping watchdog!
kernel: lp: driver loaded but no devices found
kernel: ppdev: user-space parallel port driver
kernel: PPP generic driver version 2.4.2
kernel: tun: Universal TUN/TAP device driver, 1.6
kernel: tun: (C) 1999-2004 Max Krasnyansky <m...@qualcomm.com>
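For reference, the iTCO_wdt line above follows the Linux watchdog API's "Magic Close" rule. Opening /dev/watchdog arms the hardware timer, and a driver that supports Magic Close only disarms it if the character 'V' is written just before the file is closed. A close without it logs "Unexpected close, not stopping watchdog!" and leaves the timer running, so the board resets when it expires unless the device is reopened in time. The Python sketch below shows a clean close. It is a minimal illustration only: the function name is hypothetical, nothing in it comes from robinhood, and it assumes the driver was not loaded with nowayout (in which case the watchdog cannot be stopped at all).

```python
import os


def close_watchdog_cleanly(path="/dev/watchdog"):
    """Open a watchdog device and close it with the magic 'V' character.

    Hypothetical helper. Opening a real watchdog device arms the timer;
    a driver supporting Magic Close disarms it only if 'V' is written
    just before close(). With nowayout set, the timer keeps running anyway.
    The path is a parameter so the function can be tried on a plain file.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, b"V")  # magic close character expected by the driver
    finally:
        os.close(fd)  # without the 'V' above, this close would leave it armed
```

Calling this against a real /dev/watchdog is only safe on a test machine, since the open itself starts the countdown.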

If I remove the *_wdt watchdog modules, the scan completes successfully.
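To confirm which *_wdt modules are loaded on a node before and after removing them, a small helper can filter /proc/modules, where each line starts with the module name followed by size, use count, dependencies, state and load address. This is a minimal sketch with a hypothetical function name; `lsmod | grep _wdt` gives the same information from the shell.

```python
def loaded_wdt_modules(proc_modules_text):
    """Return the names of loaded kernel modules whose name ends in '_wdt'.

    Expects text in /proc/modules format: one module per line, with the
    module name as the first whitespace-separated field.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].endswith("_wdt"):
            names.append(fields[0])
    return names
```

On a node it would be called as `loaded_wdt_modules(open("/proc/modules").read())`; an empty list means no *_wdt driver is currently loaded.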

Nodes run SL 6.7, kernel 2.6.32-642.6.2.el6.x86_64 and Lustre 2.4.3,
and we are using the pre-built robinhood 3.0 RPMs. The Lustre file system
being scanned has ~50M files.

Any ideas why this is happening?

Gizo



-- 
Dr. Gizo Nanava
Leibniz Universitaet IT Services
Leibniz Universitaet Hannover
Schlosswender Str. 5
D-30159 Hannover
Tel +49 511 762 7919085
http://www.luis.uni-hannover.de

_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support