It still panics during scrub, even with the suspect SSD removed. Some more output from mdb, based on the reports in OS-2415, is below. The notable difference is the absence of any evidence of disk timeouts or other issues prior to the deadman firing.
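A minimal sketch for comparing dumps, assuming the output of the un_ncmds_in_driver walk shown further down has been saved as text for each dump. It only picks out the disks with commands still outstanding in the driver. The function name busy_disks and the sample text are illustrative and not part of mdb or any existing tool.

```python
import re

# One "un_sd = <addr>" / "un_ncmds_in_driver = <n>" pair, as printed by
#   *sd_state::walk softstate | ::print struct sd_lun un_sd un_ncmds_in_driver
PAIR = re.compile(
    r"un_sd\s*=\s*(0x[0-9a-fA-F]+)\s+"
    r"un_ncmds_in_driver\s*=\s*(0x[0-9a-fA-F]+|\d+)"
)

def busy_disks(mdb_text):
    """Return [(un_sd address, outstanding commands)] for disks with a non-zero count.

    busy_disks is a hypothetical helper name; mdb_text is saved mdb output.
    """
    busy = []
    for addr, count in PAIR.findall(mdb_text):
        n = int(count, 0)  # base 0 accepts both "0" and "0x2"
        if n > 0:
            busy.append((addr, n))
    return busy

# Three entries copied from the vmcore.12 output in this message.
sample = """
un_sd = 0xffffff0d0ecba650
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba890
un_ncmds_in_driver = 0x2
un_sd = 0xffffff0d0ecba0b0
un_ncmds_in_driver = 0
"""

print(busy_disks(sample))
```

Running this over the saved output from each dump would show at a glance whether the busy disk changes from one panic to the next.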
I looked at several of the dumps in the same way. They look about the same, with one disk having 2 ncmds_in_driver - but it's not the same disk every time. Where should I send the dump?

[root@d0-50-99-46-c2-00 /var/crash/volatile]# mdb unix.12 vmcore.12
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm sata crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> ::status
debugging crash dump vmcore.12 (64-bit) from d0-50-99-46-c2-00
operating system: 5.11 joyent_20160906T181054Z (i86pc)
image uuid: (not set)
panic message: I/O to pool 'titan' appears to be hung.
dump content: kernel pages only
> $C
ffffff003d3149d0 vpanic()
ffffff003d314a20 vdev_deadman+0x10b(ffffff0d13422300)
ffffff003d314a70 vdev_deadman+0x4a(ffffff0d11955080)
ffffff003d314ac0 vdev_deadman+0x4a(ffffff0d119556c0)
ffffff003d314af0 spa_deadman+0xad(ffffff0d10a94000)
ffffff003d314b90 cyclic_softint+0xfd(ffffff0d07a64540, 0)
ffffff003d314ba0 cbe_low_level+0x14()
ffffff003d314bf0 av_dispatch_softvect+0x78(2)
ffffff003d314c20 dispatch_softint+0x39(0, 0)
ffffff003d2c3a20 switch_sp_and_call+0x13()
ffffff003d2c3a60 dosoftint+0x44(ffffff003d2c3ad0)
ffffff003d2c3ac0 do_interrupt+0xba(ffffff003d2c3ad0, 0)
ffffff003d2c3ad0 _interrupt+0xba()
ffffff003d2c3bc0 i86_mwait+0xd()
ffffff003d2c3c00 cpu_idle_mwait+0x109()
ffffff003d2c3c20 idle+0xa7()
ffffff003d2c3c30 thread_start+8()
> *sd_state::walk softstate | ::print struct sd_lun un_sd un_ncmds_in_driver
un_sd = 0xffffff0d0e66aed8
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0e9ea108
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba950
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0e70a6a0
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecbaad0
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba650
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba890
un_ncmds_in_driver = 0x2
un_sd = 0xffffff0d0ecba0b0
un_ncmds_in_driver = 0
un_sd = 0xffffff0d10a99f58
un_ncmds_in_driver = 0
un_sd = 0xffffff0d10a99e98
un_ncmds_in_driver = 0
un_sd = 0xffffff0d10a99e38
un_ncmds_in_driver = 0
> 0xffffff0d0ecba890::print struct scsi_device sd_dev | ::devinfo
ffffff0d05e127f8 scsiclass,00, instance #6 (driver name: sd)
        Driver properties at ffffff0d109f16f0:
            name='inquiry-serial-no' type=string items=1
                value='VKJYRS2Y'
            name='pm-components' type=string items=5
                value='NAME=spindle-motor' + '0=stopped' + '1=standby' + '2=idle' + '3=active'
            name='pm-hardware-state' type=string items=1
                value='needs-suspend-resume'
            name='ddi-failfast-supported' type=any items=0
            name='ddi-kernel-ioctl' type=any items=0
            name='fm-ereport-capable' type=any items=0
            name='pm-capable' type=int items=1
                value=00030006
        Hardware properties at ffffff0d109f16c8:
            name='devid' type=string items=1
                value='id1,sd@SATA_____WDC_WD80EFZX-68U____________VKJYRS2Y'
            name='inquiry-device-type' type=int items=1
                value=00000000
            name='inquiry-revision-id' type=string items=1
                value='83.H0A83'
            name='inquiry-product-id' type=string items=1
                value='WD80EFZX-68UW8N0'
            name='inquiry-vendor-id' type=string items=1
                value='WDC'
            name='class' type=string items=1
                value='scsi'
            name='sata-phy' type=int items=1
                value=00000000
            name='compatible' type=string items=4
                value='scsiclass,00.vATA.pWDC_WD80EFZX-68U.r0A83' + 'scsiclass,00.vATA.pWDC_WD80EFZX-68U' + 'scsiclass,00' + 'scsiclass'
            name='lun' type=int items=1
                value=00000000
            name='target' type=int items=1
                value=00000000
            name='device-type' type=string items=1
                value='scsi'
        Global properties at ffffff0d05cbd400:
            name='ddi-devid-registrant' type=int items=1
                value=00000001
            name='sd-config-list' type=string items=14
                value='' + 'retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2' + 'DELL PERC H710' + ' cache-nonvolatile:true' + 'DELL PERC H700' + 'cache-nonvolatile:true' + 'DELL PERC/6i' + 'cache-nonvolatile:true ' + 'ATA Samsung SSD 830' + 'physical-block-size:4096' + 'ATA Samsung SSD 840' + 'physical-block-size:4096' + ' ATA Samsung SSD 850' + 'physical-block-size:4096'
> *sd_state::walk softstate | ::print struct sd_lun un_cmd_timeout
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
>

On 12 September 2016 at 09:22, Daniel Carosone <daniel.caros...@gmail.com> wrote:

> Oh, hm, one more detail: in the time between when scrubbing activity on
> that pool stops and the deadman fires, a shell on the box still works.
> zpool status will hang on the pool that panics, but works fine on other
> pools, including the one with the flaky ssd notionally as a member.
>
> So whatever is hung really does seem to involve those disks / controllers
> / ports specifically.
> -- Dan.