On Wed, Mar 20, 2013 at 08:50:40AM -0700, Peter Wood wrote:
> I'm sorry. I should have mentioned it that I can't find any errors in the
> logs. The last entry in /var/adm/messages is that I removed the keyboard
> after the last reboot and then it shows the new boot up messages when I
> boot up the system after the crash. The BIOS log is empty. I'm not sure how to
> check the IPMI but IPMI is not configured and I'm not using it.
You definitely should! Plug a cable into the dedicated network port and configure it (the easiest way for you is probably to jump into the BIOS and assign the appropriate IP address etc.). Then, for a quick look, point your browser to that IP address, port 80 (default login is ADMIN/ADMIN). You may also want to configure some other details now (accounts/passwords/roles).

To track the problem, either write a script which polls the parameters in question periodically, or just install the latest IPMIView and use it to monitor your sensors ad hoc. See ftp://ftp.supermicro.com/utility/IPMIView/

> Just another observation - the crashes are more intense the more data the
> system serves (NFS).
> I'm looking into FRMW upgrades for the LSI now.

The latest LSI FW should be P15; for this MB type, 217 (2.17); MB BIOS C28 (1.0b).

However, I doubt that your problem has anything to do with the SAS controller, OI, or ZFS. My guess is that either your MB is broken (we had an X9DRH-iF which instantly "disappeared" as soon as it got some real load) or you have a heat problem (watch your CPU temperature, e.g. via IPMIView). With 2 GHz CPUs that's not very likely, but worth a try (socket placement on this board is not really smart IMHO).

To test quickly:
- disable all additional, unneeded services in OI which may put some load on the machine (like the NFS service, http, etc.), and perhaps even export unneeded pools (just to be sure)
- fire up IPMIView and look at the sensors (set the update interval to 10s), or refresh manually often
- start 'openssl speed -multi 32' and keep watching your CPU temperature sensors (with 2 GHz I guess it takes ~12 min)

I guess your machine "disappears" before the CPUs get really hot (broken MB). If the CPUs switch off (usually CPU2 first and CPU1 a little later), you have a cooling problem. If nothing happens, well, then it could be an OI or ZFS problem ;-)

Have fun,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 52768
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss