On 1/5/17 6:55, Adam Richmond-Gordon wrote:
> Afternoon!
> 
> I’ve been trying to diagnose a box that’s been crashing after anywhere 
> between 3 and 30 days uptime. For a little while, I’d suspected the issue 
> might be storage related, as the crashes never created a dump, but sending an 
> NMI always did.
> 
> Today, the box actually created a dump, and on first investigation, it 
> appears that the mpt_sas driver may be involved. Maybe. The last message in 
> the buffer points to the PCI address that the SAS controller occupies, but a 
> lot of the messages before that appear to be related to KVM.
> 
> If anyone has the time, I’d really appreciate some pointers on narrowing this 
> down. I’ve not raised an issue on GitHub yet, because this could easily be a 
> hardware-related issue.

The stack you have shows that something went wrong while exiting a mutex:
the thread tried to release a lock that it doesn't own. If you can upload
the dump somewhere that folks can dig into it, that'd be helpful. If you
need help doing so or need a location to put it, please let me know. I'd
also create a GitHub ticket for it.
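
To make the "not owner" condition concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not the illumos mutex implementation: the `OwnedMutex` class and the `try_exit` helper are hypothetical names made up for illustration, and the two thread addresses are simply the `owner=` and `thread=` values from the panic message.

```python
class OwnedMutex:
    """Toy lock that remembers which thread holds it (illustration only)."""

    def __init__(self):
        self.owner = None  # identifier of the holding thread, None when free

    def enter(self, thread):
        # The toy model never blocks; it just refuses a second holder.
        if self.owner is not None:
            raise RuntimeError("lock already held")
        self.owner = thread

    def exit(self, thread):
        # Only the recorded owner may release the lock. Anything else is
        # the condition the panic message describes.
        if self.owner != thread:
            owner = "none" if self.owner is None else format(self.owner, "x")
            raise RuntimeError(
                f"mutex_exit: not owner, owner={owner} thread={thread:x}"
            )
        self.owner = None


def try_exit(lock, thread):
    """Attempt a release and report the outcome as a string."""
    try:
        lock.exit(thread)
        return "released"
    except RuntimeError as err:
        return str(err)


lock = OwnedMutex()
lock.enter(0xFFFFFEA3B90AA460)              # owner= value from the panic
print(try_exit(lock, 0xFFFFFEA3F40B1BE0))   # thread= value from the panic
print(try_exit(lock, 0xFFFFFEA3B90AA460))   # the recorded owner releasing
```

In the real dump the interesting question is why the lock's recorded owner differs from the thread on the panic stack, which is what the full dump would help answer.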

For the crashes not creating a dump, I'd double check your dump device size.
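
As a starting point for that check, running `dumpadm` with no arguments prints the current dump configuration, including the dump device. The sketch below only parses that kind of "Key: value" text to pull out the device path. The `parse_dumpadm` helper is a hypothetical name, and the sample text is an illustrative assumption, not output from the box in this thread.

```python
def parse_dumpadm(text):
    """Turn 'Key: value' lines of dumpadm-style output into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            config[key.strip()] = value.strip()
    return config


# Illustrative sample only; paste the real output of `dumpadm` here.
SAMPLE = """\
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/zones/dump (dedicated)
Savecore directory: /var/crash/volatile
  Savecore enabled: yes
"""

config = parse_dumpadm(SAMPLE)
# The device path is the first word; a trailing note such as "(dedicated)"
# is dropped.
dump_device = config["Dump device"].split()[0]
print(dump_device)
```

With the device identified, its size can be compared against how much kernel memory the box typically has in use when it falls over.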

Robert

> Through some poking around at the end of last year, I have also noticed that 
> the onboard SAS controller (LSI/Avago 3008) isn’t running the firmware we 
> specified to the VAR - not sure if this is likely to make a difference, but 
> it’s easily flashed. It is currently running the IR firmware on an older 
> phase, when it should really be running the recent IT firmware.
> 
> Here are the potentially interesting bits from the dump;
> 
>> ::status
> debugging crash dump vmcore.0 (64-bit) from bri-triw-001
> operating system: 5.11 joyent_20161208T003358Z (i86pc)
> image uuid: (not set)
> panic message:
> mutex_exit: not owner, lp=fffffea460ba2020 owner=fffffea3b90aa460 thread=fffffea3f40b1be0
> dump content: kernel pages only
> 
>> $C
> fffffe426a808fe8 vpanic()
> fffffe426a809008 mutex_panic+0x58(fffffffffb94dc45, fffffea460ba2020)
> fffffe426a809038 mutex_vector_exit+0x40(fffffea460ba2020)
> fffffe426a809070 gfn_to_memslot_unaliased+0x6f()
> fffffe426a809090 gfn_to_hva+0x27()
> fffffe426a8090c0 kvm_read_guest_page+0x29()
> fffffe426a809110 kvm_read_guest+0x43()
> fffffe426a809190 paging64_walk_addr+0xef()
> fffffe426a809230 paging64_gva_to_gpa+0x43()
> fffffe426a809260 kvm_mmu_gva_to_gpa_read+0x45()
> fffffe426a8092b0 emulator_read_emulated+0x7c()
> fffffe426a809350 x86_emulate_insn+0x1af()
> fffffe426a809390 emulate_instruction+0x1e9()
> fffffe426a8093c0 kvm_mmu_page_fault+0x60()
> fffffe426a8093f0 handle_ept_violation+0x111()
> fffffe426a809430 vmx_handle_exit+0x16a()
> fffffe426a809460 vcpu_enter_guest+0x3ea()
> fffffe426a8094a0 __vcpu_run+0x8b()
> fffffe426a8094e0 kvm_arch_vcpu_ioctl_run+0x112()
> fffffe426a809cc0 kvm_ioctl+0x466()
> fffffe426a809d00 cdev_ioctl+0x39(3400000068, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8)
> fffffe426a809d50 spec_ioctl+0x60(fffffea4440b5040, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8, 0)
> fffffe426a809de0 fop_ioctl+0x55(fffffea4440b5040, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8, 0)
> fffffe426a809f00 ioctl+0x9b(16, 2000ae80, 0)
> fffffe426a809f10 sys_syscall+0x1a2()
> 
>> ::msgbuf
> MESSAGE
> vcpu 3 received sipi with vector # 10
> vcpu 5 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea4434cf000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea4434ff000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 9 received sipi with vector # 10
> vcpu 10 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea44354f000, id=9, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 4 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea493522000, id=10, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 6 received sipi with vector # 10
> vcpu 7 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea44348f000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 8 received sipi with vector # 10
> vcpu 2 received sipi with vector # 10
> vcpu 11 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea4434bf000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea44352f000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea443517000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea443557000, id=8, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea44347f000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea49351a000, id=11, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> vcpu 1 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea44348f000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 2 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea44347f000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 3 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea4434cf000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 4 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea4434bf000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 5 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea4434ff000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 6 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea44352f000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 7 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea443517000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 8 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea443557000, id=8, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 9 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea44354f000, id=9, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 10 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea493522000, id=10, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 11 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea49351a000, id=11, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
> vcpu 1 received sipi with vector # 10
> vcpu 3 received sipi with vector # 10
> vcpu 2 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> vcpu 1 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 2 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 3 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
> vcpu 2 received sipi with vector # 10
> vcpu 1 received sipi with vector # 10
> vcpu 3 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> vcpu 1 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 2 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 3 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x0 data 0
> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
> vcpu 3 received sipi with vector # 10
> vcpu 1 received sipi with vector # 10
> vcpu 2 received sipi with vector # 10
> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> unhandled wrmsr: 0x0 data 0
> vcpu 1 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 2 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
> vcpu 3 received sipi with vector # 1
> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
> 
> 
> panic[cpu20]/thread=fffffea3f40b1be0:
> mutex_exit: not owner, lp=fffffea460ba2020 owner=fffffea3b90aa460 thread=fffffea3f40b1be0
> 
> 
>   >> warning! 8-byte aligned %fp = fffffe426a809008
> fffffe426a809008 unix:mutex_panic+58 ()
>   >> warning! 8-byte aligned %fp = fffffe426a809038
> fffffe426a809038 unix:mutex_vector_exit+40 ()
> fffffe426a809070 kvm:gfn_to_memslot_unaliased+6f ()
> fffffe426a809090 kvm:gfn_to_hva+27 ()
> fffffe426a8090c0 kvm:kvm_read_guest_page+29 ()
> fffffe426a809110 kvm:kvm_read_guest+43 ()
> fffffe426a809190 kvm:paging64_walk_addr+ef ()
> fffffe426a809230 kvm:paging64_gva_to_gpa+43 ()
> fffffe426a809260 kvm:kvm_mmu_gva_to_gpa_read+45 ()
> fffffe426a8092b0 kvm:emulator_read_emulated+7c ()
> fffffe426a809350 kvm:x86_emulate_insn+1af ()
> fffffe426a809390 kvm:emulate_instruction+1e9 ()
> fffffe426a8093c0 kvm:kvm_mmu_page_fault+60 ()
> fffffe426a8093f0 kvm:handle_ept_violation+111 ()
> fffffe426a809430 kvm:vmx_handle_exit+16a ()
> fffffe426a809460 kvm:vcpu_enter_guest+3ea ()
> fffffe426a8094a0 kvm:__vcpu_run+8b ()
> fffffe426a8094e0 kvm:kvm_arch_vcpu_ioctl_run+112 ()
> fffffe426a809cc0 kvm:kvm_ioctl+466 ()
> fffffe426a809d00 genunix:cdev_ioctl+39 ()
> fffffe426a809d50 specfs:spec_ioctl+60 ()
> fffffe426a809de0 genunix:fop_ioctl+55 ()
> fffffe426a809f00 genunix:ioctl+9b ()
> fffffe426a809f10 unix:brand_sys_syscall+21d ()
> 
> WARNING: /pci@0,0/pci8086,6f02@1/pci15d9,808@0 (mpt_sas0):
>         mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31170000
> /pci@0,0/pci8086,6f02@1/pci15d9,808@0 (mpt_sas0):
>         Log info 0x31170000 received for target 16 w5000cca07d0109a9.
>         scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
> 

