Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Youzhong Yang
So your image booted up. Interesting ... Maybe something else messed up
your fix?
Anyway, I am now building my image and will see what I can get from ::cpustack.

On Mon, Aug 13, 2018 at 1:14 AM, Jason King wrote:

> Doh.. the problems of it being late :) .. there should be a ‘public’ in
> there.
>
> Try
>
> https://us-east.manta.joyent.com/jbk/public/OS-7079/platform-20180719T001516Z.iso
>
>
> From: Youzhong Yang
> Reply: Youzhong Yang
> Date: August 13, 2018 at 12:12:52 AM
> To: Jason King
> Cc: smartos-discuss@lists.smartos.org
> Subject: Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>
> I got this:
>
> {"code":"ResourceNotFound","message":"/jbk/OS-7079/platform-20180719T001516Z.iso does not exist"}
>
> In our /etc/system, I have
> set pcplusmp:apic_panic_on_nmi=1
> set apix:apic_panic_on_nmi=1
>
> If I set them to 0 and boot with -k, an NMI should drop into kmdb, right?
> I will build an image now and test.
>
>
> On Mon, Aug 13, 2018 at 1:04 AM, Jason King wrote:
>
>> There’s a couple of ways — you can boot -kd and set a breakpoint to set
>> it.  You can also set it in etc/system in the proto area when building an
>> image.
>>
>> If you want, I do have an image of 20180719 w/ OS-7079 applied and kmdb
>> on NMI already set (you’d still want to boot -k)  — you can grab it at
>> https://us-east.manta.joyent.com/jbk/OS-7079/platform-20180719T001516Z.{iso,tgz,usb.bz2}
>>
>> If you do, it’d be interesting to see ::cpustack on each core looks like.
>>
>>
>> From: Youzhong Yang
>> Reply: Youzhong Yang
>> Date: August 12, 2018 at 11:58:48 PM
>> To: Jason King
>> Cc: smartos-discuss@lists.smartos.org
>> Subject: Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>>
>> I sent NMI, but it printed out a stack trace plus a message "no dump
>> device" or something then rebooted. I tried -v on my old supermicro system,
>> on the console I saw message about sd## devices, then it hung. The console
>> still responded to keyboard, but just stayed that way forever.
>>
>> What change is needed to drop into kmdb when the OS receives NMI?
>>
>> On Mon, Aug 13, 2018 at 12:06 AM, Jason King wrote:
>>
>>> Was that with boot -v?  Are you able to send the system an NMI after it
>>> hangs (or get the boot -v output up to the hang)?
>>>
>>> Prior to OS-7079, the system would start to startup the next CPU before
>>> it had completely finished initializing the ‘current’ CPU (which could
>>> deadlock depending on which CPU obtained a particular lock first), the
>>> change makes it wait until the current CPU is finished starting up before
>>> proceeding to the next CPU.
>>>
>>> It’s certainly possible it could have revealed another bug — OS-7079
>>> itself was introduced almost 10 years ago, but didn’t seem to be easy to
>>> trigger until recent CPUs.
>>>
>>>
>>> From: Youzhong Yang
>>> Reply: smartos-discuss@lists.smartos.org
>>> Date: August 12, 2018 at 10:46:05 PM
>>> To: smartos-discuss@lists.smartos.org
>>> Subject: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>>>
>>> Today I built a smartos image (with all git repos synced to master) and
>>> rebooted the host with that image. It hung after the banner message + one
>>> more line about power management or something.
>>>
>>> Then I reverted OS-7079, built an image, rebooted, it worked perfectly.
>>>
>>> So does it mean OS-7079 fixed one issue, but caused another? My host is
>>> an old Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I
>>> will try on a new all NVMe system and see if it works.
>>>
>>> Thanks.



---
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
Modify Your Subscription: https://www.listbox.com/member/?member_id=25769125
Powered by Listbox: https://www.listbox.com


Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Jason King
Doh.. the problems of it being late :) .. there should be a ‘public’ in there. 

Try

https://us-east.manta.joyent.com/jbk/public/OS-7079/platform-20180719T001516Z.iso


From: Youzhong Yang 
Reply: Youzhong Yang 
Date: August 13, 2018 at 12:12:52 AM
To: Jason King 
Cc: smartos-discuss@lists.smartos.org 
Subject:  Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common 
races itself  

I got this:

{"code":"ResourceNotFound","message":"/jbk/OS-7079/platform-20180719T001516Z.iso does not exist"}

In our /etc/system, I have 
set pcplusmp:apic_panic_on_nmi=1
set apix:apic_panic_on_nmi=1

If I set them to 0, and boot with -k, a NMI should drop into kmdb, right? I 
will build an image now and test.


On Mon, Aug 13, 2018 at 1:04 AM, Jason King  wrote:
There’s a couple of ways — you can boot -kd and set a breakpoint to set it.  
You can also set it in etc/system in the proto area when building an image.

If you want, I do have an image of 20180719 w/ OS-7079 applied and kmdb on NMI 
already set (you’d still want to boot -k)  — you can grab it at
https://us-east.manta.joyent.com/jbk/OS-7079/platform-20180719T001516Z.{iso,tgz,usb.bz2}

If you do, it’d be interesting to see ::cpustack on each core looks like.


From: Youzhong Yang 
Reply: Youzhong Yang 
Date: August 12, 2018 at 11:58:48 PM
To: Jason King 
Cc: smartos-discuss@lists.smartos.org 
Subject:  Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common 
races itself

I sent NMI, but it printed out a stack trace plus a message "no dump device" or 
something then rebooted. I tried -v on my old supermicro system, on the console 
I saw message about sd## devices, then it hung. The console still responded to 
keyboard, but just stayed that way forever.

What change is needed to drop into kmdb when the OS receives NMI?

On Mon, Aug 13, 2018 at 12:06 AM, Jason King  wrote:
Was that with boot -v?  Are you able to send the system an NMI after it hangs 
(or get the boot -v output up to the hang)?

Prior to OS-7079, the system would start to startup the next CPU before it had 
completely finished initializing the ‘current’ CPU (which could deadlock 
depending on which CPU obtained a particular lock first), the change makes it 
wait until the current CPU is finished starting up before proceeding to the 
next CPU.

It’s certainly possible it could have revealed another bug — OS-7079 itself was 
introduced almost 10 years ago, but didn’t seem to be easy to trigger until 
recent CPUs.


From: Youzhong Yang 
Reply: smartos-discuss@lists.smartos.org 
Date: August 12, 2018 at 10:46:05 PM
To: smartos-discuss@lists.smartos.org 
Subject:  [smartos-discuss] still hang at boot - OS-7079 mp_startup_common 
races itself

Today I built a smartos image (with all git repos synced to master) and 
rebooted the host with that image. It hung after the banner message + one more 
line about power management or something.

Then I reverted OS-7079, built an image, rebooted, it worked perfectly.

So does it mean OS-7079 fixed one issue, but caused another? My host is an old 
Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I will try on 
a new all NVMe system and see if it works.

Thanks.


Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Jason King
There’s a couple of ways — you can boot -kd and set a breakpoint to set it.  
You can also set it in etc/system in the proto area when building an image.
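
For reference, a sketch of what the etc/system route might look like. The thread never names the tunable, so `apic_kmdb_on_nmi` below is an assumption: it is the variable I believe the illumos APIC code checks before entering the debugger on an NMI, written in the same form as the `apic_panic_on_nmi` lines quoted elsewhere in this thread. Check it against your source tree before relying on it.

```
* Assumption: apic_kmdb_on_nmi is the tunable meant by "kmdb on NMI".
* kmdb still has to be loaded at boot (boot -k) for the NMI to enter it.
set pcplusmp:apic_kmdb_on_nmi=1
set apix:apic_kmdb_on_nmi=1
```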

If you want, I do have an image of 20180719 w/ OS-7079 applied and kmdb on NMI 
already set (you’d still want to boot -k)  — you can grab it at
https://us-east.manta.joyent.com/jbk/OS-7079/platform-20180719T001516Z.{iso,tgz,usb.bz2}

If you do, it’d be interesting to see what ::cpustack on each core looks like.


From: Youzhong Yang 
Reply: Youzhong Yang 
Date: August 12, 2018 at 11:58:48 PM
To: Jason King 
Cc: smartos-discuss@lists.smartos.org 
Subject:  Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common 
races itself  

I sent NMI, but it printed out a stack trace plus a message "no dump device" or 
something then rebooted. I tried -v on my old supermicro system, on the console 
I saw message about sd## devices, then it hung. The console still responded to 
keyboard, but just stayed that way forever.

What change is needed to drop into kmdb when the OS receives NMI?

On Mon, Aug 13, 2018 at 12:06 AM, Jason King  wrote:
Was that with boot -v?  Are you able to send the system an NMI after it hangs 
(or get the boot -v output up to the hang)?

Prior to OS-7079, the system would start to startup the next CPU before it had 
completely finished initializing the ‘current’ CPU (which could deadlock 
depending on which CPU obtained a particular lock first), the change makes it 
wait until the current CPU is finished starting up before proceeding to the 
next CPU.

It’s certainly possible it could have revealed another bug — OS-7079 itself was 
introduced almost 10 years ago, but didn’t seem to be easy to trigger until 
recent CPUs.


From: Youzhong Yang 
Reply: smartos-discuss@lists.smartos.org 
Date: August 12, 2018 at 10:46:05 PM
To: smartos-discuss@lists.smartos.org 
Subject:  [smartos-discuss] still hang at boot - OS-7079 mp_startup_common 
races itself

Today I built a smartos image (with all git repos synced to master) and 
rebooted the host with that image. It hung after the banner message + one more 
line about power management or something.

Then I reverted OS-7079, built an image, rebooted, it worked perfectly.

So does it mean OS-7079 fixed one issue, but caused another? My host is an old 
Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I will try on 
a new all NVMe system and see if it works.

Thanks.


Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Youzhong Yang
I sent an NMI, but it printed out a stack trace plus a message "no dump
device" or something, then rebooted. I tried -v on my old Supermicro system;
on the console I saw messages about sd## devices, then it hung. The console
still responded to the keyboard, but just stayed that way forever.

What change is needed to drop into kmdb when the OS receives NMI?

On Mon, Aug 13, 2018 at 12:06 AM, Jason King wrote:

> Was that with boot -v?  Are you able to send the system an NMI after it
> hangs (or get the boot -v output up to the hang)?
>
> Prior to OS-7079, the system would start to startup the next CPU before it
> had completely finished initializing the ‘current’ CPU (which could
> deadlock depending on which CPU obtained a particular lock first), the
> change makes it wait until the current CPU is finished starting up before
> proceeding to the next CPU.
>
> It’s certainly possible it could have revealed another bug — OS-7079
> itself was introduced almost 10 years ago, but didn’t seem to be easy to
> trigger until recent CPUs.
>
>
> From: Youzhong Yang
> Reply: smartos-discuss@lists.smartos.org
> Date: August 12, 2018 at 10:46:05 PM
> To: smartos-discuss@lists.smartos.org
> Subject: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>
> Today I built a smartos image (with all git repos synced to master) and
> rebooted the host with that image. It hung after the banner message + one
> more line about power management or something.
>
> Then I reverted OS-7079, built an image, rebooted, it worked perfectly.
>
> So does it mean OS-7079 fixed one issue, but caused another? My host is an
> old Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I
> will try on a new all NVMe system and see if it works.
>
> Thanks.





Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Youzhong Yang
So I tried on our new Supermicro X11DPU system (Intel(R) Xeon(R) Gold 6140
CPU @ 2.30GHz): same issue, hung at boot, but with OS-7079 reverted, it
booted up successfully.

I can try NMI tomorrow and update.

On Mon, Aug 13, 2018 at 12:06 AM, Jason King wrote:

> Was that with boot -v?  Are you able to send the system an NMI after it
> hangs (or get the boot -v output up to the hang)?
>
> Prior to OS-7079, the system would start to startup the next CPU before it
> had completely finished initializing the ‘current’ CPU (which could
> deadlock depending on which CPU obtained a particular lock first), the
> change makes it wait until the current CPU is finished starting up before
> proceeding to the next CPU.
>
> It’s certainly possible it could have revealed another bug — OS-7079
> itself was introduced almost 10 years ago, but didn’t seem to be easy to
> trigger until recent CPUs.
>
>
> From: Youzhong Yang
> Reply: smartos-discuss@lists.smartos.org
> Date: August 12, 2018 at 10:46:05 PM
> To: smartos-discuss@lists.smartos.org
> Subject: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>
> Today I built a smartos image (with all git repos synced to master) and
> rebooted the host with that image. It hung after the banner message + one
> more line about power management or something.
>
> Then I reverted OS-7079, built an image, rebooted, it worked perfectly.
>
> So does it mean OS-7079 fixed one issue, but caused another? My host is an
> old Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I
> will try on a new all NVMe system and see if it works.
>
> Thanks.





Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Jason King
Was that with boot -v?  Are you able to send the system an NMI after it hangs 
(or get the boot -v output up to the hang)?

Prior to OS-7079, the system would begin starting the next CPU before it had 
completely finished initializing the ‘current’ CPU, which could deadlock 
depending on which CPU obtained a particular lock first. The change makes it 
wait until the current CPU has finished starting up before proceeding to the 
next CPU.
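
The ordering described above can be illustrated with a toy model. This is only a sketch using Python threads as stand-ins for CPUs, not the illumos code: `start_cpus_serialized` and `cpu_startup` are hypothetical names, and the "initialization" is just two log entries. It shows the post-fix behaviour, where the boot CPU waits for each CPU to report that it is fully up before starting the next one.

```python
import threading


def start_cpus_serialized(cpu_ids):
    """Start each simulated CPU and wait for it to finish before the next."""
    log = []                       # ordered record of startup events
    log_lock = threading.Lock()    # protects the shared log

    def cpu_startup(cpu_id, ready):
        # Stand-in for the per-CPU initialization work.
        with log_lock:
            log.append(("init-begin", cpu_id))
        with log_lock:
            log.append(("init-done", cpu_id))
        ready.set()                # tell the boot CPU this CPU is fully up

    for cpu_id in cpu_ids:
        ready = threading.Event()
        worker = threading.Thread(target=cpu_startup, args=(cpu_id, ready))
        worker.start()
        ready.wait()               # do not start the next CPU until this one is done
        worker.join()
    return log
```

Because of the `ready.wait()` call, the begin and done events of different CPUs never interleave in the returned log. Dropping that wait would model the pre-fix ordering, in which a second CPU can begin while the first is still initializing.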

It’s certainly possible the fix has revealed another bug. The race that 
OS-7079 fixes was itself introduced almost 10 years ago, but didn’t seem to be 
easy to trigger until recent CPUs.


From: Youzhong Yang 
Reply: smartos-discuss@lists.smartos.org 
Date: August 12, 2018 at 10:46:05 PM
To: smartos-discuss@lists.smartos.org 
Subject:  [smartos-discuss] still hang at boot - OS-7079 mp_startup_common 
races itself  

Today I built a smartos image (with all git repos synced to master) and 
rebooted the host with that image. It hung after the banner message + one more 
line about power management or something.

Then I reverted OS-7079, built an image, rebooted, it worked perfectly.

So does it mean OS-7079 fixed one issue, but caused another? My host is an old 
Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I will try on 
a new all NVMe system and see if it works.

Thanks.


[smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself

2018-08-12 Thread Youzhong Yang
Today I built a smartos image (with all git repos synced to master) and
rebooted the host with that image. It hung after the banner message + one
more line about power management or something.

Then I reverted OS-7079, built an image, rebooted, it worked perfectly.

So does it mean OS-7079 fixed one issue, but caused another? My host is an
old Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I
will try on a new all NVMe system and see if it works.

Thanks.


