Same issue here around two months ago when an L2ARC device failed… failmode was at its default, and the device was actually an mSATA SSD mounted in a PCI-E mSATA card:
http://www.addonics.com/products/ad4mspx2.php
and the disk was one of four of these:
http://www.samsung.com/us/computer/memory-storage/MZ-MTE1T0BW

Can these reboots be avoided in any way?

Br,
Rune

From: OmniOS-discuss [mailto:[email protected]] On Behalf Of Schweiss, Chip
Sent: Monday, May 18, 2015 10:31 PM
To: Paul B. Henson
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] disk failure causing reboot?

I had the exact same failure mode last week. With over 1000 spindles I see this about once a month.

I can publish my dump also if anyone actually wants to try to fix this problem, but I think there are several of the same thing already linked to tickets in Illumos-gate.

Pools for the most part should be set to failmode=panic or wait, but a failed disk should not cause a panic. On the system where this happened to me, failmode was set to wait. It is also on r151012, waiting on a window to upgrade to r151014. My pool is raidz3, so there is no reason not to kick a bad disk. All my disks are SAS in DataON JBODs, dual connected across two LSI HBAs.

BTW, pull a SAS cable and you get a panic too, not degraded multipath. Illumos seems to panic on just about any SAS event these days, regardless of redundancy.

-Chip

On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson <[email protected]> wrote:

On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:

> A drive failed in one of our supermicro 5048R-E1CR36L servers running
> omnios r151012 last night, and somewhat unexpectedly, the whole system
> seems to have panicked.

You don't happen to have failmode set to panic on the pool? From the zpool manpage:

    failmode=wait | continue | panic

        Controls the system behavior in the event of catastrophic pool
        failure. This condition is typically a result of a loss of
        connectivity to the underlying storage device(s) or a failure of
        all devices within the pool. The behavior of such an event is
        determined as follows:

        wait        Blocks all I/O access until the device connectivity
                    is recovered and the errors are cleared. This is the
                    default behavior.

        continue    Returns EIO to any new write I/O requests but allows
                    reads to any of the remaining healthy devices. Any
                    write requests that have yet to be committed to disk
                    would be blocked.

        panic       Prints out a message to the console and generates a
                    system crash dump.

_______________________________________________
OmniOS-discuss mailing list
[email protected]
http://lists.omniti.com/mailman/listinfo/omnios-discuss
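For anyone wanting to confirm which failmode a pool actually has, the setting quoted from the zpool manpage can be read with `zpool get failmode <pool>` and changed with `zpool set failmode=<mode> <pool>`. Below is a minimal Python sketch of that check. The pool name `tank`, the sample output, and the helper names are illustrative assumptions, not taken from the thread, and the parser assumes the usual four-column NAME/PROPERTY/VALUE/SOURCE layout. Note that this only inspects or builds the command for the setting; the thread reports panics even with failmode=wait, so it is not presented as a fix.

```python
import subprocess

# The three values documented in the zpool manpage excerpt.
VALID_FAILMODES = {"wait", "continue", "panic"}


def parse_failmode(zpool_get_output):
    """Extract the failmode value from `zpool get failmode <pool>` output.

    Assumes the tabular layout: NAME PROPERTY VALUE SOURCE.
    """
    for line in zpool_get_output.splitlines():
        fields = line.split()
        # The header row has "PROPERTY" in column 2, so it is skipped here.
        if len(fields) >= 3 and fields[1] == "failmode":
            value = fields[2]
            if value not in VALID_FAILMODES:
                raise ValueError(f"unexpected failmode value: {value!r}")
            return value
    raise ValueError("no failmode row found in output")


def get_failmode(pool):
    """Run zpool on the local host and return the pool's failmode."""
    result = subprocess.run(
        ["zpool", "get", "failmode", pool],
        capture_output=True, text=True, check=True,
    )
    return parse_failmode(result.stdout)


def set_failmode_command(pool, mode):
    """Build (but do not run) the command that changes a pool's failmode."""
    if mode not in VALID_FAILMODES:
        raise ValueError(f"failmode must be one of {sorted(VALID_FAILMODES)}")
    return ["zpool", "set", f"failmode={mode}", pool]


if __name__ == "__main__":
    # Hypothetical sample output for a pool named "tank".
    sample = (
        "NAME  PROPERTY  VALUE  SOURCE\n"
        "tank  failmode  wait   default\n"
    )
    print(parse_failmode(sample))
    print(" ".join(set_failmode_command("tank", "continue")))
```

On a host with ZFS, `get_failmode("tank")` would run the real command; the `__main__` section only exercises the parser against the sample text, so it can be tried on any machine.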
