Re: [smartos-discuss] Pool scrub causes panic via spa_deadman and vdev_deadman

Daniel Carosone Tue, 13 Sep 2016 04:19:15 -0700

It still panics during scrub, even with the suspect SSD removed.

Some more output from mdb, based on the reports in OS-2415, is below.  The
notable difference is the absence of any evidence of disk timeouts or other
issues prior to the deadman firing.


I looked at several of the dumps in the same way. They look about the same,
with one disk showing un_ncmds_in_driver = 2, but it's not the same disk
every time.

Where should I send the dump?

[root@d0-50-99-46-c2-00 /var/crash/volatile]# mdb unix.12 vmcore.12
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp
scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm
sata crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> ::status
debugging crash dump vmcore.12 (64-bit) from d0-50-99-46-c2-00
operating system: 5.11 joyent_20160906T181054Z (i86pc)
image uuid: (not set)
panic message: I/O to pool 'titan' appears to be hung.
dump content: kernel pages only
> $C
ffffff003d3149d0 vpanic()
ffffff003d314a20 vdev_deadman+0x10b(ffffff0d13422300)
ffffff003d314a70 vdev_deadman+0x4a(ffffff0d11955080)
ffffff003d314ac0 vdev_deadman+0x4a(ffffff0d119556c0)
ffffff003d314af0 spa_deadman+0xad(ffffff0d10a94000)
ffffff003d314b90 cyclic_softint+0xfd(ffffff0d07a64540, 0)
ffffff003d314ba0 cbe_low_level+0x14()
ffffff003d314bf0 av_dispatch_softvect+0x78(2)
ffffff003d314c20 dispatch_softint+0x39(0, 0)
ffffff003d2c3a20 switch_sp_and_call+0x13()
ffffff003d2c3a60 dosoftint+0x44(ffffff003d2c3ad0)
ffffff003d2c3ac0 do_interrupt+0xba(ffffff003d2c3ad0, 0)
ffffff003d2c3ad0 _interrupt+0xba()
ffffff003d2c3bc0 i86_mwait+0xd()
ffffff003d2c3c00 cpu_idle_mwait+0x109()
ffffff003d2c3c20 idle+0xa7()
ffffff003d2c3c30 thread_start+8()
> *sd_state::walk softstate | ::print struct sd_lun un_sd un_ncmds_in_driver
un_sd = 0xffffff0d0e66aed8
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0e9ea108
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba950
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0e70a6a0
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecbaad0
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba650
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba890
un_ncmds_in_driver = 0x2
un_sd = 0xffffff0d0ecba0b0
un_ncmds_in_driver = 0
un_sd = 0xffffff0d10a99f58
un_ncmds_in_driver = 0
un_sd = 0xffffff0d10a99e98
un_ncmds_in_driver = 0
un_sd = 0xffffff0d10a99e38
un_ncmds_in_driver = 0
> 0xffffff0d0ecba890::print struct scsi_device sd_dev | ::devinfo
ffffff0d05e127f8 scsiclass,00, instance #6 (driver name: sd)
        Driver properties at ffffff0d109f16f0:
            name='inquiry-serial-no' type=string items=1
                value='VKJYRS2Y'
            name='pm-components' type=string items=5
                value='NAME=spindle-motor' + '0=stopped' + '1=standby' + '2=idle' + '3=active'
            name='pm-hardware-state' type=string items=1
                value='needs-suspend-resume'
            name='ddi-failfast-supported' type=any items=0
            name='ddi-kernel-ioctl' type=any items=0
            name='fm-ereport-capable' type=any items=0
            name='pm-capable' type=int items=1
                value=00030006
        Hardware properties at ffffff0d109f16c8:
            name='devid' type=string items=1
                value='id1,sd@SATA_____WDC_WD80EFZX-68U____________VKJYRS2Y'
            name='inquiry-device-type' type=int items=1
                value=00000000
            name='inquiry-revision-id' type=string items=1
                value='83.H0A83'
            name='inquiry-product-id' type=string items=1
                value='WD80EFZX-68UW8N0'
            name='inquiry-vendor-id' type=string items=1
                value='WDC'
            name='class' type=string items=1
                value='scsi'
            name='sata-phy' type=int items=1
                value=00000000
            name='compatible' type=string items=4
                value='scsiclass,00.vATA.pWDC_WD80EFZX-68U.r0A83' + 'scsiclass,00.vATA.pWDC_WD80EFZX-68U' + 'scsiclass,00' + 'scsiclass'
            name='lun' type=int items=1
                value=00000000
            name='target' type=int items=1
                value=00000000
            name='device-type' type=string items=1
                value='scsi'
        Global properties at ffffff0d05cbd400:
            name='ddi-devid-registrant' type=int items=1
                value=00000001
            name='sd-config-list' type=string items=14
                value='' +
'retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2' +
'DELL    PERC H710' + '
                cache-nonvolatile:true' + 'DELL    PERC H700' +
'cache-nonvolatile:true' + 'DELL    PERC/6i' + 'cache-nonvolatile:true
                ' + 'ATA     Samsung SSD 830' + 'physical-block-size:4096'
+ 'ATA     Samsung SSD 840' + 'physical-block-size:4096' + '
                ATA     Samsung SSD 850' + 'physical-block-size:4096'
> *sd_state::walk softstate | ::print struct sd_lun un_cmd_timeout
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
un_cmd_timeout = 0xa
>
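
For comparing several dumps, the un_sd / un_ncmds_in_driver pairs printed
above can be filtered down to just the LUNs with commands outstanding. This
is only a minimal sketch, assuming the ::print output has been pasted into a
string; the function name busy_luns and the sample text are illustrative and
not part of mdb or the sd driver.

```python
import re

# Matches lines such as "un_sd = 0xffffff0d0ecba890" or
# "un_ncmds_in_driver = 0x2" as printed by ::print.
FIELD_RE = re.compile(r"\s*(\w+) = (0x[0-9a-fA-F]+|\d+)\s*$")


def busy_luns(mdb_output):
    """Return (un_sd address, count) pairs with non-zero un_ncmds_in_driver."""
    busy = []
    current_sd = None
    for line in mdb_output.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue  # skip prompts, blank lines and anything unrelated
        field, value = m.group(1), m.group(2)
        if field == "un_sd":
            current_sd = value
        elif field == "un_ncmds_in_driver":
            count = int(value, 0)  # base 0 accepts both "0" and "0x2"
            if count != 0:
                busy.append((current_sd, count))
    return busy


# Three of the pairs from the session above, used as sample input.
sample = """\
un_sd = 0xffffff0d0ecba650
un_ncmds_in_driver = 0
un_sd = 0xffffff0d0ecba890
un_ncmds_in_driver = 0x2
un_sd = 0xffffff0d0ecba0b0
un_ncmds_in_driver = 0
"""

for addr, count in busy_luns(sample):
    print(addr, count)
```

The address it reports can then be fed to the
::print struct scsi_device sd_dev | ::devinfo pipeline, as done above, to see
which physical disk it is in each dump.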



On 12 September 2016 at 09:22, Daniel Carosone <daniel.caros...@gmail.com>
wrote:

> Oh, hm, one more detail: in the time between when scrubbing activity on
> that pool stops and the deadman fires, a shell on the box still works.
> zpool status will hang on the pool that panics, but works fine on other
> pools, including the one with the flaky ssd notionally as a member.
>
> So whatever is hung really does seem to involve those disks / controllers
> / ports specifically.
>



-- 
Dan.


