Afternoon!
I’ve been trying to diagnose a box that’s been crashing after anywhere between
3 and 30 days of uptime. For a little while, I’d suspected the issue might be
storage-related, as the crashes never created a dump, whereas sending an NMI
always did.
Today, the box actually created a dump, and on first investigation, it appears
that the mpt_sas driver may be involved. Maybe. The last message in the buffer
points to the PCI address that the SAS controller occupies, but a lot of the
messages before that appear to be related to KVM.
If anyone has the time, I’d really appreciate some pointers on narrowing this
down. I’ve not raised an issue on GitHub yet, because this could easily be a
hardware-related issue.
Through some poking around at the end of last year, I have also noticed that
the onboard SAS controller (LSI/Avago 3008) isn’t running the firmware we
specified to the VAR - not sure if this is likely to make a difference, but
it’s easily flashed. It is currently running the IR firmware on an older
phase, when it should really be running the recent IT firmware.
Here are the potentially interesting bits from the dump:
> ::status
debugging crash dump vmcore.0 (64-bit) from bri-triw-001
operating system: 5.11 joyent_20161208T003358Z (i86pc)
image uuid: (not set)
panic message:
mutex_exit: not owner, lp=fffffea460ba2020 owner=fffffea3b90aa460 thread=fffffea3f40b1be0
dump content: kernel pages only
> $C
fffffe426a808fe8 vpanic()
fffffe426a809008 mutex_panic+0x58(fffffffffb94dc45, fffffea460ba2020)
fffffe426a809038 mutex_vector_exit+0x40(fffffea460ba2020)
fffffe426a809070 gfn_to_memslot_unaliased+0x6f()
fffffe426a809090 gfn_to_hva+0x27()
fffffe426a8090c0 kvm_read_guest_page+0x29()
fffffe426a809110 kvm_read_guest+0x43()
fffffe426a809190 paging64_walk_addr+0xef()
fffffe426a809230 paging64_gva_to_gpa+0x43()
fffffe426a809260 kvm_mmu_gva_to_gpa_read+0x45()
fffffe426a8092b0 emulator_read_emulated+0x7c()
fffffe426a809350 x86_emulate_insn+0x1af()
fffffe426a809390 emulate_instruction+0x1e9()
fffffe426a8093c0 kvm_mmu_page_fault+0x60()
fffffe426a8093f0 handle_ept_violation+0x111()
fffffe426a809430 vmx_handle_exit+0x16a()
fffffe426a809460 vcpu_enter_guest+0x3ea()
fffffe426a8094a0 __vcpu_run+0x8b()
fffffe426a8094e0 kvm_arch_vcpu_ioctl_run+0x112()
fffffe426a809cc0 kvm_ioctl+0x466()
fffffe426a809d00 cdev_ioctl+0x39(3400000068, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8)
fffffe426a809d50 spec_ioctl+0x60(fffffea4440b5040, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8, 0)
fffffe426a809de0 fop_ioctl+0x55(fffffea4440b5040, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8, 0)
fffffe426a809f00 ioctl+0x9b(16, 2000ae80, 0)
fffffe426a809f10 sys_syscall+0x1a2()
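As a side note on the stack above: the second argument in the ioctl frames, 2000ae80, looks like the KVM_RUN command, which would fit the thread being a vCPU thread inside kvm_arch_vcpu_ioctl_run. Below is a minimal sketch of that decode. It assumes the illumos-style _IO() encoding (IOC_VOID = 0x20000000) and the usual KVM values (group 0xAE, request number 0x80 for KVM_RUN). None of these constants come from the dump itself, and decode_ioctl is just a hypothetical helper name.

```python
# Assumed constants (not taken from the dump): illumos-style _IO() encoding
# and the KVM ioctl group / request number for KVM_RUN.
IOC_VOID = 0x20000000   # "no parameters" direction bit
KVMIO = 0xAE            # KVM ioctl group
KVM_RUN_NR = 0x80       # request number for KVM_RUN


def decode_ioctl(cmd):
    """Split an ioctl command word into direction, group and number."""
    return {
        "direction": cmd & 0xE0000000,  # top three bits: in/out/void
        "group": (cmd >> 8) & 0xFF,     # the 'x' in _IO(x, y)
        "number": cmd & 0xFF,           # the 'y' in _IO(x, y)
    }


fields = decode_ioctl(0x2000AE80)       # value from the cdev_ioctl frame
is_kvm_run = (
    fields["direction"] == IOC_VOID
    and fields["group"] == KVMIO
    and fields["number"] == KVM_RUN_NR
)
print({k: hex(v) for k, v in fields.items()}, is_kvm_run)
```

If those assumptions hold, the panicking thread was servicing a guest exit for one of the KVM instances rather than doing storage I/O directly.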
> ::msgbuf
MESSAGE
vcpu 3 received sipi with vector # 10
vcpu 5 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea4434cf000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea4434ff000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 9 received sipi with vector # 10
vcpu 10 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea44354f000, id=9, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 4 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea493522000, id=10, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 6 received sipi with vector # 10
vcpu 7 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea44348f000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 8 received sipi with vector # 10
vcpu 2 received sipi with vector # 10
vcpu 11 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea4434bf000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea44352f000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea443517000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea443557000, id=8, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea44347f000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea49351a000, id=11, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
vcpu 1 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea44348f000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 2 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea44347f000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 3 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea4434cf000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 4 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea4434bf000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 5 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea4434ff000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 6 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea44352f000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 7 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea443517000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 8 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea443557000, id=8, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 9 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea44354f000, id=9, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 10 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea493522000, id=10, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 11 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea49351a000, id=11, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
vcpu 1 received sipi with vector # 10
vcpu 3 received sipi with vector # 10
vcpu 2 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
vcpu 1 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 2 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 3 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
vcpu 2 received sipi with vector # 10
vcpu 1 received sipi with vector # 10
vcpu 3 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
vcpu 1 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 2 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 3 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
vcpu 3 received sipi with vector # 10
vcpu 1 received sipi with vector # 10
vcpu 2 received sipi with vector # 10
kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
vcpu 1 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 2 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 3 received sipi with vector # 1
kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
panic[cpu20]/thread=fffffea3f40b1be0:
mutex_exit: not owner, lp=fffffea460ba2020 owner=fffffea3b90aa460 thread=fffffea3f40b1be0
>> warning! 8-byte aligned %fp = fffffe426a809008
fffffe426a809008 unix:mutex_panic+58 ()
>> warning! 8-byte aligned %fp = fffffe426a809038
fffffe426a809038 unix:mutex_vector_exit+40 ()
fffffe426a809070 kvm:gfn_to_memslot_unaliased+6f ()
fffffe426a809090 kvm:gfn_to_hva+27 ()
fffffe426a8090c0 kvm:kvm_read_guest_page+29 ()
fffffe426a809110 kvm:kvm_read_guest+43 ()
fffffe426a809190 kvm:paging64_walk_addr+ef ()
fffffe426a809230 kvm:paging64_gva_to_gpa+43 ()
fffffe426a809260 kvm:kvm_mmu_gva_to_gpa_read+45 ()
fffffe426a8092b0 kvm:emulator_read_emulated+7c ()
fffffe426a809350 kvm:x86_emulate_insn+1af ()
fffffe426a809390 kvm:emulate_instruction+1e9 ()
fffffe426a8093c0 kvm:kvm_mmu_page_fault+60 ()
fffffe426a8093f0 kvm:handle_ept_violation+111 ()
fffffe426a809430 kvm:vmx_handle_exit+16a ()
fffffe426a809460 kvm:vcpu_enter_guest+3ea ()
fffffe426a8094a0 kvm:__vcpu_run+8b ()
fffffe426a8094e0 kvm:kvm_arch_vcpu_ioctl_run+112 ()
fffffe426a809cc0 kvm:kvm_ioctl+466 ()
fffffe426a809d00 genunix:cdev_ioctl+39 ()
fffffe426a809d50 specfs:spec_ioctl+60 ()
fffffe426a809de0 genunix:fop_ioctl+55 ()
fffffe426a809f00 genunix:ioctl+9b ()
fffffe426a809f10 unix:brand_sys_syscall+21d ()
WARNING: /pci@0,0/pci8086,6f02@1/pci15d9,808@0 (mpt_sas0):
mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31170000
/pci@0,0/pci8086,6f02@1/pci15d9,808@0 (mpt_sas0):
Log info 0x31170000 received for target 16 w5000cca07d0109a9.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
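
For anyone wanting to pick apart the mpt_sas values above, here is a minimal sketch that splits them into fields. It assumes the SAS log-info layout as I understand it from the LSI MPI headers (bus type in bits 31-28, originator in bits 27-24, code in bits 23-16, sub-code in bits 15-0), and that bit 15 of the IOC status is the "log info available" flag. The function names are hypothetical, and what each field value means should be checked against the headers.

```python
def decode_loginfo(value):
    """Split a 32-bit IOCLogInfo word into its assumed sub-fields."""
    return {
        "bus_type": (value >> 28) & 0xF,    # assumed: 3 means SAS
        "originator": (value >> 24) & 0xF,  # assumed: 0=IOP, 1=PL, 2=IR
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def decode_ioc_status(value):
    """Separate the assumed 'log info available' flag from the status code."""
    return {
        "log_info_available": bool(value & 0x8000),
        "status": value & 0x7FFF,
    }


# Values copied from the two mpt_sas0 messages above.
loginfo = decode_loginfo(0x31170000)
event_status = decode_ioc_status(0x8000)
io_status = decode_ioc_status(0x804B)

print({k: hex(v) for k, v in loginfo.items()})
print(event_status["log_info_available"], hex(event_status["status"]))
print(io_status["log_info_available"], hex(io_status["status"]))
```

Under those assumptions, the log info is a SAS protocol-layer code 0x17 with no sub-code, and the per-target ioc_status carries status 0x4b with the log-info flag set. I believe 0x4b corresponds to "SCSI IOC terminated" in the MPI2 headers, but I have not verified that against this firmware phase.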