This looks substantially similar to OS-2415 (https://smartos.org/bugview/OS-2415, https://www.illumos.org/issues/4013). That issue has been closed for over 3 years, but the fix was just a revert of a commit made 3 years before that. I'm not sure anyone ever followed up to re-fix the original bug (that work would surely have been given a new ID, which makes it hard for me to track). One of our kernel developers may remember this and know more. If this is indeed the same issue, you've hit a bug that's at least six years old.
The description of OS-2415 seems to indicate faulty hardware, though it's an OS bug that there was a panic instead of simply faulting the device. Since you know you have a flaky drive that could be what's going on here. In any event it's probably best to replace that drive as soon as possible.

I'd like to get a copy of the crash dump from you, if possible (I can give you a signed Manta URL to upload to if that helps). That will help confirm whether or not it's actually related to OS-2415, and hopefully get the underlying issue corrected.

-- 
Brian Bennett
Systems Engineer, Cloud Operations
Joyent, Inc. | www.joyent.com <http://www.joyent.com/>

> On Sep 10, 2016, at 2:50 PM, Daniel Carosone <[email protected]> wrote:
>
> Hi all,
>
> I get the following panic when scrubbing a pool. After the panic, it will
> continue scrubbing and possibly panic again, several times.
> However, if left to run, the pool will finish scrubbing and is reported clean.
>
> The pool is an 8-way raidz2, across several AHCI controllers (Intel and
> Marvell) on one of those AsRock Avoton C2750D4I Mini-ITX boards that were all
> the rage a couple of years ago for home servers. I'm only just getting around
> to putting it into full service, but it's been running fine (with fewer
> drives) until now. It's running the current PI.
>
> I'm assuming there's some load-related locking issue, and hoping it's a
> solvable software issue rather than bad hardware. I haven't yet really looked
> at the BIOS options to see if there are some interrupt-mapping options that
> might move the issue around somehow. I can try that, but I'd rather a
> deliberate set of tests rather than random shuffling.
>
> There's also a flaky ssd in the zones pool (on sata0/0 c3t0d0). It works
> fine, most of the time, but occasionally goes offline until I power off,
> fiddle and re-try it. I suspect a cable problem, and will be pulling it out
> to test separately. It doesn't share a controller with the pool, and has
> been offline through a full panic cycle, so I'm hoping is unrelated. At
> least, a failed drive shouldn't be able to cause this, so I'm posting before
> trying that anyway.
>
> So, some crash and config details below. Suggestions and requests for
> further info welcome (I presume info on driver and interrupt status would be
> useful, but I don't have the mdb incantations..)
>
> [root@d0-50-99-46-c2-00 /var/crash/volatile]# mdb -e '::status;$C' vmcore.5
> debugging crash dump vmcore.5 (64-bit) from d0-50-99-46-c2-00
> operating system: 5.11 joyent_20160906T181054Z (i86pc)
> image uuid: (not set)
> panic message: I/O to pool 'titan' appears to be hung.
> dump content: kernel pages only
> ffffff003d54f9d0 vpanic()
> ffffff003d54fa20 vdev_deadman+0x10b(ffffff0d08289380)
> ffffff003d54fa70 vdev_deadman+0x4a(ffffff0d10a28640)
> ffffff003d54fac0 vdev_deadman+0x4a(ffffff0d10bbd040)
> ffffff003d54faf0 spa_deadman+0xad(ffffff0d11210000)
> ffffff003d54fb90 cyclic_softint+0xfd(ffffff0d07de9a80, 0)
> ffffff003d54fba0 cbe_low_level+0x14()
> ffffff003d54fbf0 av_dispatch_softvect+0x78(2)
> ffffff003d54fc20 dispatch_softint+0x39(0, 0)
> ffffff003d4e8a20 switch_sp_and_call+0x13()
> ffffff003d4e8a60 dosoftint+0x44(ffffff003d4e8ad0)
> ffffff003d4e8ac0 do_interrupt+0xba(ffffff003d4e8ad0, 0)
> ffffff003d4e8ad0 _interrupt+0xba()
> ffffff003d4e8bc0 i86_mwait+0xd()
> ffffff003d4e8c00 cpu_idle_mwait+0x109()
> ffffff003d4e8c20 idle+0xa7()
> ffffff003d4e8c30 thread_start+8()
>
> [root@d0-50-99-46-c2-00 /var/crash/volatile]# zpool status -v titan
>   pool: titan
>  state: ONLINE
>   scan: scrub repaired 0 in 3h53m with 0 errors on Sat Sep 10 13:59:42 2016
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         titan       ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             c4t0d0  ONLINE       0     0     0
>             c4t1d0  ONLINE       0     0     0
>             c0t0d0  ONLINE       0     0     0
>             c0t1d0  ONLINE       0     0     0
>             c1t0d0  ONLINE       0     0     0
>             c1t1d0  ONLINE       0     0     0
>             c1t2d0  ONLINE       0     0     0
>             c1t3d0  ONLINE       0     0     0
>
> [root@d0-50-99-46-c2-00 /var/crash/volatile]# cfgadm -lv
> Ap_Id                 Receptacle  Occupant      Condition  Information
> When         Type       Busy  Phys_Id
> sata0/0               connected   unconfigured  unknown    Mod: FRev: SN:
> unavailable  unknown    n     /devices/pci@0,0/pci1849,1f22@17:0
> sata0/1::dsk/c3t1d0   connected   configured    ok         Mod: INTEL SSDSC2BW240A4 FRev: DC32 SN: CVDA44520AGG2403GN
> unavailable  disk       n     /devices/pci@0,0/pci1849,1f22@17:1
> sata0/2::dsk/c3t2d0   connected   configured    ok         Mod: KINGSTON SVP200S37A240G FRev: 502ABBF0 SN: 50026B722C0629B9
> unavailable  disk       n     /devices/pci@0,0/pci1849,1f22@17:2
> sata0/3               empty       unconfigured  ok
> unavailable  sata-port  n     /devices/pci@0,0/pci1849,1f22@17:3
> sata1/0::dsk/c4t0d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VKJA71SX
> unavailable  disk       n     /devices/pci@0,0/pci1849,1f32@18:0
> sata1/1::dsk/c4t1d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VLGUM69Z
> unavailable  disk       n     /devices/pci@0,0/pci1849,1f32@18:1
> sata2/0::dsk/c0t0d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VKJ9UZ6X
> unavailable  disk       n     /devices/pci@0,0/pci8086,1f12@3/pci10b5,8608@0/pci10b5,8608@1/pci1849,9172@0:0
> sata2/1::dsk/c0t1d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VKJAVGNX
> unavailable  disk       n     /devices/pci@0,0/pci8086,1f12@3/pci10b5,8608@0/pci10b5,8608@1/pci1849,9172@0:1
> sata3/0::dsk/c1t0d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VKJYRS2Y
> unavailable  disk       n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:0
> sata3/1::dsk/c1t1d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VLGUJSDZ
> unavailable  disk       n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:1
> sata3/2::dsk/c1t2d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VLGUNDZZ
> unavailable  disk       n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:2
> sata3/3::dsk/c1t3d0   connected   configured    ok         Mod: WDC WD80EFZX-68UW8N0 FRev: 83.H0A83 SN: VLGUJNXZ
> unavailable  disk       n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:3
> sata3/4               empty       unconfigured  ok
> unavailable  sata-port  n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:4
> sata3/5               empty       unconfigured  ok
> unavailable  sata-port  n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:5
> sata3/6               empty       unconfigured  ok
> unavailable  sata-port  n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:6
> sata3/7               connected   unconfigured  ok         Mod: MARVELL VIRTUALL FRev: 1.09 SN:
> unavailable  processor  n     /devices/pci@0,0/pci8086,1f13@4/pci1849,9230@0:7
> usb0/1                connected   configured    ok         Mfg: <undef> Product: <undef> NConfigs: 1 Config: 0 <no cfg str descr>
> # usb stuff below here trimmed
>
> -- 
> Dan.
> smartos-discuss | Archives
> <https://www.listbox.com/member/archive/184463/=now>
> <https://www.listbox.com/member/archive/rss/184463/26986985-d0246faa> |
> Modify <https://www.listbox.com/member/?&> Your Subscription
> <http://www.listbox.com/>
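
For anyone reading the stack trace above, a rough Python model of the check that produces this panic may help. It is only a sketch: it assumes, as the trace suggests, that the deadman check starts at the pool's root vdev, recurses through the raidz2 vdev down to the leaf disks, and panics when a leaf's oldest outstanding I/O has been pending longer than some threshold. The `Vdev` class, the `PoolHung` exception, the millisecond timestamps and the threshold value are all illustrative stand-ins, not the kernel's actual structures or settings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed threshold, for illustration only; the real value is a kernel tunable.
DEADMAN_THRESHOLD_MS = 1_000_000


class PoolHung(Exception):
    """Stand-in for the kernel panic seen in the crash dump."""


@dataclass
class Vdev:
    name: str
    children: List["Vdev"] = field(default_factory=list)
    # Issue time (ms) of the oldest I/O still outstanding on a leaf;
    # None means the leaf has nothing in flight.
    oldest_io_issued_ms: Optional[int] = None


def vdev_deadman(vd: Vdev, now_ms: int, pool: str) -> None:
    """Walk the vdev tree and raise if any leaf has an overdue I/O."""
    for child in vd.children:
        vdev_deadman(child, now_ms, pool)

    # Only leaf vdevs (the disks themselves) carry outstanding I/O.
    if vd.children or vd.oldest_io_issued_ms is None:
        return

    if now_ms - vd.oldest_io_issued_ms > DEADMAN_THRESHOLD_MS:
        raise PoolHung(f"I/O to pool '{pool}' appears to be hung.")


# Hypothetical layout: root vdev -> raidz2-0 -> disks, one with a stuck I/O.
titan = Vdev("titan", [Vdev("raidz2-0", [
    Vdev("c4t0d0"),
    Vdev("c1t3d0", oldest_io_issued_ms=0),
])])

try:
    vdev_deadman(titan, now_ms=DEADMAN_THRESHOLD_MS + 1, pool="titan")
except PoolHung as err:
    print("panic:", err)
```

The three nested `vdev_deadman` frames in the trace would correspond to the root, raidz2 and leaf levels of this walk. Marking only the slow leaf as faulted at the point where the sketch raises would be the "fault the device" behaviour described above as preferable to a panic.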
