Thanks Dale, thanks Robert.
Unfortunately the dtrace script does not work - with the nvme driver unloaded,
it complains "::nvme_attach:entry does not match any probes"; with nvme
loaded, "rem_drv nvme" returns "Device busy. Cannot unload module: nvme. Will
be unloaded upon reboot.".
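(As an aside, in case anyone else hits the same "Device busy" problem: one
thing I have not tried - just a guess, not verified - is finding the module id
with modinfo and forcing it out with modunload before re-running the script:

# modinfo | grep nvme
# modunload -i <id from modinfo>

though with the blkdev children still attached it will probably fail the same
way.)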
So I built an OS image with nvme debug output enabled, and found that the nvme
driver itself seems to be working fine: nvme_attach() succeeded for each of the
NVMe SSDs.
The problem is that, for some reason, the OS decided to retire
/pci@0,0/pci8086,6f08@3, which subsequently caused all of the NVMe SSDs to be
retired:
genunix: [ID 385786 kern.notice] NOTICE: Retire device: found dip = ffffd127046fed58 for /pci@0,0/pci8086,6f08@3.
genunix: [ID 701951 kern.notice] NOTICE: retire: subtree retire notify: path = /pci@0,0/pci8086,6f08@3
genunix: [ID 966719 kern.notice] NOTICE: retire succeeded: path = /pci@0,0/pci8086,6f08@3
genunix: [ID 631017 kern.notice] NOTICE: Device: already retired: /pci@0,0/pci8086,6f08@3/pci10b5,9765@0
genunix: [ID 943439 kern.notice] NOTICE: attach: device is retired: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@4/pci8086,3703@0/blkdev@1,0
genunix: [ID 619516 kern.notice] NOTICE: attach: Mark and fence subtree: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@4/pci8086,3703@0/blkdev@1,0
genunix: [ID 319796 kern.notice] NOTICE: Fenced: /pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@4/pci8086,3703@0/blkdev@1,0
genunix: [ID 943439 kern.notice] NOTICE: attach: device is retired: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@5/pci8086,370a@0/blkdev@1,0
genunix: [ID 619516 kern.notice] NOTICE: attach: Mark and fence subtree: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@5/pci8086,370a@0/blkdev@1,0
genunix: [ID 319796 kern.notice] NOTICE: Fenced: /pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@5/pci8086,370a@0/blkdev@1,0
genunix: [ID 943439 kern.notice] NOTICE: attach: device is retired: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@6/pci8086,370a@0/blkdev@1,0
genunix: [ID 619516 kern.notice] NOTICE: attach: Mark and fence subtree: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@6/pci8086,370a@0/blkdev@1,0
genunix: [ID 319796 kern.notice] NOTICE: Fenced: /pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@6/pci8086,370a@0/blkdev@1,0
genunix: [ID 943439 kern.notice] NOTICE: attach: device is retired: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@7/pci8086,370a@0/blkdev@1,0
genunix: [ID 619516 kern.notice] NOTICE: attach: Mark and fence subtree: path=/pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@7/pci8086,370a@0/blkdev@1,0
genunix: [ID 319796 kern.notice] NOTICE: Fenced: /pci@0,0/pci8086,6f08@3/pci10b5,9765@0/pci10b5,9765@7/pci8086,370a@0/blkdev@1,0
I panicked the host when e_ddi_retire_device() was called, and here is what I
found: it is /usr/lib/fm/fmd/fmd that calls modctl -> modctl_retire
-> e_ddi_retire_device to retire /pci@0,0/pci8086,6f08@3.
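(For the record, instead of panicking the host, something like the following
fbt one-liner should also show who triggers the retire - an untested sketch,
assuming the fbt probe is available on this build:

# dtrace -n 'fbt::e_ddi_retire_device:entry { printf("%s", stringof(arg0)); stack(); ustack(); }'

arg0 is the device path passed to e_ddi_retire_device(); stack()/ustack()
should show the kernel call chain and fmd's userland stack.)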
Attached is a file with some entries produced by fmdump (included below, after
the quoted message). It's odd that sometimes I get those FM entries, yet other
times the system generates nothing but still retires the drives.
I don't know how to interpret these entries - maybe someone on the list can
shed some light?
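(In case it helps with correlation: if fmd actually diagnosed a fault before
retiring the port, I believe something like

# fmadm faulty
# fmdump -v -u <case uuid>

should show the suspect list for the case; the entries below are just the raw
ereports.)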
Device 8086:6f08 is "Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3
v4/Xeon D PCI Express Root Port 3" and seems to use the "PCIe bridge/switch
driver" (pcieb). Is it possible that the pcieb driver in illumos does not work
properly with this device?
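(To double-check which driver actually binds to the root port, I think
something along these lines should work - both are standard commands, and the
grep patterns are just what I would look for:

# grep pcieb /etc/driver_aliases
# prtconf -D | grep -i 6f08

The second one should show the driver name next to the pci8086,6f08 node.)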
Thanks,
--Youzhong
On Wed, Jun 22, 2016 at 11:30 AM, Dale Ghent <[email protected]> wrote:
>
> > On Jun 22, 2016, at 9:20 AM, Youzhong Yang <[email protected]> wrote:
> >
> > Thanks Robert. Yes the driver failed to attach:
> >
> > # rem_drv nvme
> > # add_drv nvme
> > devfsadm: driver failed to attach: nvme
> > Warning: Driver (nvme) successfully added to system but failed to attach
>
> Here's a dtrace script:
> https://paste.ec/paste/Mk5A6WTy#p-75cIYWF6qap6Jlo2T4x/mWGDnapNQda6sNP+wPQeV
>
> In one window, run it as such:
> ./nvmestack.d nvme_attach
>
> In a second window, run rem_drv and add_drv as you did above, and provide
> the output of the script.
>
> /dale
>
Jun 24 06:12:13.3100 ereport.io.pci.fabric 0x549024bd89b01001
Jun 24 06:12:13.3181 ereport.io.pci.fabric 0x54902c71f1101001
Jun 24 06:12:13.3181 ereport.io.pci.fabric 0x54902c769f701001
Jun 24 06:12:13.1585 ereport.io.pciex.rc.ce-msg 0x548f943974a01001
Jun 24 06:12:13.3211 ereport.io.pci.fabric 0x54902f5afaf01001
Jun 24 06:12:13.3212 ereport.io.pci.fabric 0x54902f5fd7f01001
Jun 24 06:12:13.1585 ereport.io.pciex.pl.re 0x548f943e23a01001
Jun 24 06:12:13.3977 ereport.io.pci.fabric 0x54907854f6101001
Jun 24 06:12:13.1585 ereport.io.pciex.rc.ce-msg 0x548f943e23a01001
Jun 24 2016 06:12:13.310053547 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x549024bd89b01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3/pci10b5,9765@0
(end detector)
bdf = 0x300
device_id = 0x9765
vendor_id = 0x10b5
rev_id = 0xaa
dev_type = 0x50
pcie_off = 0x68
pcix_off = 0x0
aer_off = 0xfb4
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x147
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x1
pcie_command = 0x37
pcie_dev_cap = 0x648002
pcie_adv_ctl = 0xbf
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x1
pcie_ce_mask = 0x0
remainder = 0x0
severity = 0x3
__ttl = 0x1
__tod = 0x576d077d 0x127b0aab
Jun 24 2016 06:12:13.318131206 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x54902c71f1101001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3
(end detector)
bdf = 0x18
device_id = 0x6f08
vendor_id = 0x8086
rev_id = 0x1
dev_type = 0x40
pcie_off = 0x90
pcix_off = 0x0
aer_off = 0x148
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x27
pcie_dev_cap = 0x8001
pcie_adv_ctl = 0xa0
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
pcie_rp_status = 0x0
pcie_rp_control = 0x0
pcie_adv_rp_status = 0x1
pcie_adv_rp_command = 0x7
pcie_adv_rp_ce_src_id = 0x300
pcie_adv_rp_ue_src_id = 0x0
remainder = 0x1
severity = 0x1
__ttl = 0x1
__tod = 0x576d077d 0x12f64c06
Jun 24 2016 06:12:13.318150094 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x54902c769f701001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3/pci10b5,9765@0
(end detector)
bdf = 0x300
device_id = 0x9765
vendor_id = 0x10b5
rev_id = 0xaa
dev_type = 0x50
pcie_off = 0x68
pcix_off = 0x0
aer_off = 0xfb4
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x147
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x1
pcie_command = 0x37
pcie_dev_cap = 0x648002
pcie_adv_ctl = 0xbf
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x1
pcie_ce_mask = 0x0
remainder = 0x0
severity = 0x3
__ttl = 0x1
__tod = 0x576d077d 0x12f695ce
Jun 24 2016 06:12:13.158523978 ereport.io.pciex.rc.ce-msg
nvlist version: 0
ena = 0x548f943974a01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3
(end detector)
class = ereport.io.pciex.rc.ce-msg
rc-status = 0x1
source-id = 0x300
source-valid = 1
__ttl = 0x1
__tod = 0x576d077d 0x972e24a
Jun 24 2016 06:12:13.321182265 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x54902f5afaf01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3
(end detector)
bdf = 0x18
device_id = 0x6f08
vendor_id = 0x8086
rev_id = 0x1
dev_type = 0x40
pcie_off = 0x90
pcix_off = 0x0
aer_off = 0x148
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x27
pcie_dev_cap = 0x8001
pcie_adv_ctl = 0xa0
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
pcie_rp_status = 0x0
pcie_rp_control = 0x0
pcie_adv_rp_status = 0x1
pcie_adv_rp_command = 0x7
pcie_adv_rp_ce_src_id = 0x300
pcie_adv_rp_ue_src_id = 0x0
remainder = 0x1
severity = 0x1
__ttl = 0x1
__tod = 0x576d077d 0x1324da39
Jun 24 2016 06:12:13.321200166 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x54902f5fd7f01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3/pci10b5,9765@0
(end detector)
bdf = 0x300
device_id = 0x9765
vendor_id = 0x10b5
rev_id = 0xaa
dev_type = 0x50
pcie_off = 0x68
pcix_off = 0x0
aer_off = 0xfb4
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x147
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x1
pcie_command = 0x37
pcie_dev_cap = 0x648002
pcie_adv_ctl = 0xbf
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x1
pcie_ce_mask = 0x0
remainder = 0x0
severity = 0x3
__ttl = 0x1
__tod = 0x576d077d 0x13252026
Jun 24 2016 06:12:13.158542439 ereport.io.pciex.pl.re
nvlist version: 0
ena = 0x548f943e23a01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3/pci10b5,9765@0
(end detector)
class = ereport.io.pciex.pl.re
dev-status = 0x1
ce-status = 0x1
__ttl = 0x1
__tod = 0x576d077d 0x9732a67
Jun 24 2016 06:12:13.397700477 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x54907854f6101001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,6f08@3
(end detector)
bdf = 0x18
device_id = 0x6f08
vendor_id = 0x8086
rev_id = 0x1
dev_type = 0x40
pcie_off = 0x90
pcix_off = 0x0
aer_off = 0x148
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x27
pcie_dev_cap = 0x8001
pcie_adv_ctl = 0xa0
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
pcie_rp_status = 0x0
pcie_rp_control = 0x0
pcie_adv_rp_status = 0x1
pcie_adv_rp_command = 0x7
pcie_adv_rp_ce_src_id = 0x300
pcie_adv_rp_ue_src_id = 0x0
remainder = 0x1
severity = 0x1
__ttl = 0x1
__tod = 0x576d077d 0x17b46d7d
Jun 24 2016 06:12:13.158542439 ereport.io.pciex.rc.ce-msg
nvlist version: 0
ena = 0x548f943e23a01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0
(end detector)
class = ereport.io.pciex.rc.ce-msg
rc-status = 0x1
source-id = 0x300
source-valid = 1
__ttl = 0x1
__tod = 0x576d077d 0x9732a67