Wow, that was stupid.

Password invalidated haha :|

> On 7 Jan 2017, at 00:24, Adam Richmond-Gordon <[email protected]> wrote:
> 
> Hi Robert,
> 
> I really appreciate you picking this up!
> 
> SFTP server:
> 213.249.128.37
> Port:
> 61205
> User:
> joyent
> Password: 
> ;URe#bVg+|Yn_6(3!5eO~y>Y'E\cU}w@&DN<YxWv6d|eG_do{V<tTL1`q<KoR<C
> 
> Pretty sure there’s everything that’s needed to get started there - I’ve 
> compressed the dump, it unpacks to about 20GB.
> 
> Let me know if you or anyone else needs anything else from me!
> 
> Adam
> 
>> On 6 Jan 2017, at 22:56, Robert Mustacchi <[email protected]> wrote:
>> 
>> On 1/6/17 4:28 , Adam Richmond-Gordon wrote:
>>> Hello,
>>> 
>>> I have opened a GitHub issue for this:
>>> https://github.com/joyent/illumos-joyent/issues/135
>> 
>> Hi Adam,
>> 
>> Thanks for following up there. If you can contact me privately, I'll
>> follow up and grab the dump so we can get some eyes on it at Joyent.
>> 
>> Thanks,
>> Robert
>> 
>>>> On 5 Jan 2017, at 17:16, Adam Richmond-Gordon <[email protected]> wrote:
>>>> 
>>>> Thanks Robert - I’ll get it uploaded and create an issue.
>>>> 
>>>> The dump device size is set to 256GB (equal to RAM) - I’ve been gradually 
>>>> increasing it after every dumpless crash. I’d been considering connecting 
>>>> a USB disk and pointing dumpadm at that.
>>>> 
>>>>> On 5 Jan 2017, at 16:52, Robert Mustacchi <[email protected]> wrote:
>>>>> 
>>>>> On 1/5/17 6:55 , Adam Richmond-Gordon wrote:
>>>>>> Afternoon!
>>>>>> 
>>>>>> I’ve been trying to diagnose a box that’s been crashing after anywhere 
>>>>>> between 3 and 30 days uptime. For a little while, I’d suspected the 
>>>>>> issue might be storage related, as the crashes never created a dump, but 
>>>>>> sending an NMI always did.
>>>>>> 
>>>>>> Today, the box actually created a dump, and on first investigation, it 
>>>>>> appears that the mpt_sas driver may be involved. Maybe. The last message 
>>>>>> in the buffer points to the PCI address that the SAS controller 
>>>>>> occupies, but a lot of the messages before that appear to be related to 
>>>>>> KVM.
>>>>>> 
>>>>>> If anyone has the time, I’d really appreciate some pointers on narrowing 
>>>>>> this down. I’ve not raised an issue on GitHub yet, because this could 
>>>>>> easily be a hardware-related issue.
>>>>> 
>>>>> The stack that you have shows that something went wrong while exiting a
>>>>> mutex: we tried to unlock a lock that we don't own. If you can upload the
>>>>> dump somewhere that folks can dig into it, that'd be helpful. If you need
>>>>> help doing so or need a location to put it, please let me know. I'd also
>>>>> create a GitHub ticket for that.
>>>>> 
>>>>> For the crashes not creating a dump, I'd double check your dump device 
>>>>> size.
>>>>> 
>>>>> Robert
>>>>> 
>>>>>> Through some poking around at the end of last year, I have also noticed 
>>>>>> that the onboard SAS controller (LSI/Avago 3008) isn’t running the 
>>>>>> firmware we specified to the VAR - not sure if this is likely to make a 
>>>>>> difference, but it’s easily flashed. It is currently running the IR 
>>>>>> firmware on an older phase, when it should really be running the recent 
>>>>>> IT firmware.
>>>>>> 
>>>>>> Here are the potentially interesting bits from the dump:
>>>>>> 
>>>>>> > ::status
>>>>>> debugging crash dump vmcore.0 (64-bit) from bri-triw-001
>>>>>> operating system: 5.11 joyent_20161208T003358Z (i86pc)
>>>>>> image uuid: (not set)
>>>>>> panic message:
>>>>>> mutex_exit: not owner, lp=fffffea460ba2020 owner=fffffea3b90aa460 thread=fffffea3f40b1be0
>>>>>> dump content: kernel pages only
>>>>>> 
>>>>>> > $C
>>>>>> fffffe426a808fe8 vpanic()
>>>>>> fffffe426a809008 mutex_panic+0x58(fffffffffb94dc45, fffffea460ba2020)
>>>>>> fffffe426a809038 mutex_vector_exit+0x40(fffffea460ba2020)
>>>>>> fffffe426a809070 gfn_to_memslot_unaliased+0x6f()
>>>>>> fffffe426a809090 gfn_to_hva+0x27()
>>>>>> fffffe426a8090c0 kvm_read_guest_page+0x29()
>>>>>> fffffe426a809110 kvm_read_guest+0x43()
>>>>>> fffffe426a809190 paging64_walk_addr+0xef()
>>>>>> fffffe426a809230 paging64_gva_to_gpa+0x43()
>>>>>> fffffe426a809260 kvm_mmu_gva_to_gpa_read+0x45()
>>>>>> fffffe426a8092b0 emulator_read_emulated+0x7c()
>>>>>> fffffe426a809350 x86_emulate_insn+0x1af()
>>>>>> fffffe426a809390 emulate_instruction+0x1e9()
>>>>>> fffffe426a8093c0 kvm_mmu_page_fault+0x60()
>>>>>> fffffe426a8093f0 handle_ept_violation+0x111()
>>>>>> fffffe426a809430 vmx_handle_exit+0x16a()
>>>>>> fffffe426a809460 vcpu_enter_guest+0x3ea()
>>>>>> fffffe426a8094a0 __vcpu_run+0x8b()
>>>>>> fffffe426a8094e0 kvm_arch_vcpu_ioctl_run+0x112()
>>>>>> fffffe426a809cc0 kvm_ioctl+0x466()
>>>>>> fffffe426a809d00 cdev_ioctl+0x39(3400000068, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8)
>>>>>> fffffe426a809d50 spec_ioctl+0x60(fffffea4440b5040, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8, 0)
>>>>>> fffffe426a809de0 fop_ioctl+0x55(fffffea4440b5040, 2000ae80, 0, 202003, fffffea3b7d370d0, fffffe426a809ea8, 0)
>>>>>> fffffe426a809f00 ioctl+0x9b(16, 2000ae80, 0)
>>>>>> fffffe426a809f10 sys_syscall+0x1a2()
>>>>>> 
>>>>>> > ::msgbuf
>>>>>> MESSAGE
>>>>>> vcpu 3 received sipi with vector # 10
>>>>>> vcpu 5 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea4434cf000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea4434ff000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 9 received sipi with vector # 10
>>>>>> vcpu 10 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea44354f000, id=9, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 4 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea493522000, id=10, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 6 received sipi with vector # 10
>>>>>> vcpu 7 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea44348f000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 8 received sipi with vector # 10
>>>>>> vcpu 2 received sipi with vector # 10
>>>>>> vcpu 11 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea4434bf000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea44352f000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea443517000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea443557000, id=8, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea44347f000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea49351a000, id=11, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> vcpu 1 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea44348f000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 2 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea44347f000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 3 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea4434cf000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 4 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea4434bf000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 5 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea4434ff000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 6 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea44352f000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 7 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea443517000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 8 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea443557000, id=8, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 9 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea44354f000, id=9, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 10 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea493522000, id=10, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 11 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea49351a000, id=11, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0xffbfe0 data fffffd7fffdfe720
>>>>>> vcpu 1 received sipi with vector # 10
>>>>>> vcpu 3 received sipi with vector # 10
>>>>>> vcpu 2 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> vcpu 1 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 2 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 3 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0x11e4720 data fffffd7fffdfe720
>>>>>> vcpu 2 received sipi with vector # 10
>>>>>> vcpu 1 received sipi with vector # 10
>>>>>> vcpu 3 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> vcpu 1 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 2 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 3 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
>>>>>> unhandled wrmsr: 0x11e4a20 data fffffd7fffdfe720
>>>>>> vcpu 3 received sipi with vector # 10
>>>>>> vcpu 1 received sipi with vector # 10
>>>>>> vcpu 2 received sipi with vector # 10
>>>>>> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> unhandled wrmsr: 0x0 data 0
>>>>>> vcpu 1 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea5ac6e8000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 2 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea443527000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> vcpu 3 received sipi with vector # 1
>>>>>> kvm_lapic_reset: vcpu=fffffea625738000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
>>>>>> 
>>>>>> panic[cpu20]/thread=fffffea3f40b1be0:
>>>>>> mutex_exit: not owner, lp=fffffea460ba2020 owner=fffffea3b90aa460 thread=fffffea3f40b1be0
>>>>>> 
>>>>>> 
>>>>>> >> warning! 8-byte aligned %fp = fffffe426a809008
>>>>>> fffffe426a809008 unix:mutex_panic+58 ()
>>>>>> >> warning! 8-byte aligned %fp = fffffe426a809038
>>>>>> fffffe426a809038 unix:mutex_vector_exit+40 ()
>>>>>> fffffe426a809070 kvm:gfn_to_memslot_unaliased+6f ()
>>>>>> fffffe426a809090 kvm:gfn_to_hva+27 ()
>>>>>> fffffe426a8090c0 kvm:kvm_read_guest_page+29 ()
>>>>>> fffffe426a809110 kvm:kvm_read_guest+43 ()
>>>>>> fffffe426a809190 kvm:paging64_walk_addr+ef ()
>>>>>> fffffe426a809230 kvm:paging64_gva_to_gpa+43 ()
>>>>>> fffffe426a809260 kvm:kvm_mmu_gva_to_gpa_read+45 ()
>>>>>> fffffe426a8092b0 kvm:emulator_read_emulated+7c ()
>>>>>> fffffe426a809350 kvm:x86_emulate_insn+1af ()
>>>>>> fffffe426a809390 kvm:emulate_instruction+1e9 ()
>>>>>> fffffe426a8093c0 kvm:kvm_mmu_page_fault+60 ()
>>>>>> fffffe426a8093f0 kvm:handle_ept_violation+111 ()
>>>>>> fffffe426a809430 kvm:vmx_handle_exit+16a ()
>>>>>> fffffe426a809460 kvm:vcpu_enter_guest+3ea ()
>>>>>> fffffe426a8094a0 kvm:__vcpu_run+8b ()
>>>>>> fffffe426a8094e0 kvm:kvm_arch_vcpu_ioctl_run+112 ()
>>>>>> fffffe426a809cc0 kvm:kvm_ioctl+466 ()
>>>>>> fffffe426a809d00 genunix:cdev_ioctl+39 ()
>>>>>> fffffe426a809d50 specfs:spec_ioctl+60 ()
>>>>>> fffffe426a809de0 genunix:fop_ioctl+55 ()
>>>>>> fffffe426a809f00 genunix:ioctl+9b ()
>>>>>> fffffe426a809f10 unix:brand_sys_syscall+21d ()
>>>>>> 
>>>>>> WARNING: /pci@0,0/pci8086,6f02@1/pci15d9,808@0 (mpt_sas0):
>>>>>>       mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31170000
>>>>>> /pci@0,0/pci8086,6f02@1/pci15d9,808@0 (mpt_sas0):
>>>>>>       Log info 0x31170000 received for target 16 w5000cca07d0109a9.
>>>>>>       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>>>>>> 
>>>>> 
>>>>> 
>>>> smartos-discuss | Archives 
>>>> <https://www.listbox.com/member/archive/184463/=now> 
>>>> <https://www.listbox.com/member/archive/rss/184463/28443474-1732e24d> | 
>>>> Modify <https://www.listbox.com/member/?&;> Your Subscription
>>> 
>>> 
>> 
>> 
> 

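For anyone following the quoted ::status output: the panic string already carries the key fact, namely that the thread calling mutex_exit is not the thread recorded as the lock's owner. A minimal Python sketch of that check is below. The regular expression and the name parse_mutex_panic are my own, for illustration only, and assume the message format exactly as it appears in this thread.

```python
import re

# Pattern for the panic string as quoted in this thread. The format is
# assumed from that single example, not taken from the kernel source.
PANIC_RE = re.compile(
    r"mutex_exit: not owner, lp=(?P<lp>[0-9a-f]+)\s+"
    r"owner=(?P<owner>[0-9a-f]+)\s+thread=(?P<thread>[0-9a-f]+)"
)


def parse_mutex_panic(message):
    """Return the lock, owner and panicking-thread addresses as integers."""
    match = PANIC_RE.search(message)
    if match is None:
        raise ValueError("not a 'mutex_exit: not owner' panic message")
    return {name: int(value, 16) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    msg = ("mutex_exit: not owner, lp=fffffea460ba2020 "
           "owner=fffffea3b90aa460 thread=fffffea3f40b1be0")
    info = parse_mutex_panic(msg)
    print("lock address:     %x" % info["lp"])
    print("recorded owner:   %x" % info["owner"])
    print("panicking thread: %x" % info["thread"])
    print("owner is the panicking thread:", info["owner"] == info["thread"])
```

The owner and thread addresses differ, which matches the "not owner" wording. It says nothing about why the lock ended up held by another thread; that needs the dump itself.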


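The mpt_sas warning at the end of the quoted message buffer can also be split into fields. This is a rough sketch only. It assumes the SAS LogInfo layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit sub-code) and the IOCStatus convention (bit 15 flags that log info is available, the low 15 bits are the status) as I understand them from the LSI MPI headers. The function names and the originator table are hypothetical and should be checked against the headers for this controller before reading anything into them.

```python
# Assumed originator values; verify against the MPI headers before relying on them.
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}


def decode_sas_loginfo(loginfo):
    """Split a 32-bit LogInfo word into its assumed bit fields."""
    return {
        "bus_type": (loginfo >> 28) & 0xF,
        "originator": ORIGINATORS.get((loginfo >> 24) & 0xF, "unknown"),
        "code": (loginfo >> 16) & 0xFF,
        "subcode": loginfo & 0xFFFF,
    }


def decode_ioc_status(ioc_status):
    """Separate the assumed log-info-available flag from the status value."""
    return {
        "log_info_available": bool(ioc_status & 0x8000),
        "status": ioc_status & 0x7FFF,
    }


if __name__ == "__main__":
    # Values copied from the mpt_sas0 warning in the quoted msgbuf output.
    print(decode_sas_loginfo(0x31170000))
    print(decode_ioc_status(0x8000))
    print(decode_ioc_status(0x804B))
```

Under those assumptions, 0x31170000 splits into bus type 3, originator 1, code 0x17 and sub-code 0, and ioc_status 0x804b is the log-info flag plus status 0x4b.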
