Hi Jason,

The image that had the issue was a full build, but last night I ran git
reset --hard, git pull, etc., and the image I got didn't reproduce the hang.
So now I am starting all over again to see if I can reproduce it.

Thanks.

On Mon, Aug 13, 2018 at 10:20 AM, Jason King <[email protected]>
wrote:

> That is strange: the same git repo that built that image was the one that
> was pushed to Gerrit and merged with master (i.e. it is the _same_
> commit).  Looking at the current master (as of a few minutes ago), the
> change doesn’t appear to have been stepped on. It’s a very small change: it
> just moves the setting of a bit mask indicating the CPU has finished
> starting up to the last thing in the per-cpu startup thread (minus some
> diagnostic messages after startup and a call to thread_exit()).
>
> Did you try doing a 'gmake clobber' in your SmartOS repo before building?
> Unfortunately, incremental builds (even just rebuilding illumos-joyent)
> don’t always work and can sometimes cause strange behavior.
>
> In the meantime, I’ll try doing a full build of the current master and
> installing it on my server here at home — it was very good at tripping the
> bug in OS-7079, so I’ll see if I can get it to hang (though it’ll take a
> bit to do a full build).
>
>
> From: Youzhong Yang <[email protected]> <[email protected]>
> Reply: Youzhong Yang <[email protected]> <[email protected]>
> Date: August 13, 2018 at 12:27:52 AM
>
> To: Jason King <[email protected]> <[email protected]>
> Cc: [email protected] <[email protected]>
> <[email protected]>
> Subject:  Re: [smartos-discuss] still hang at boot - OS-7079
> mp_startup_common races itself
>
> So your image booted up. Interesting ... Maybe something else messed up
> your fix?
> Anyway, I am now building my image and will see what I can get from ::cpustack.
>
> On Mon, Aug 13, 2018 at 1:14 AM, Jason King <[email protected]>
> wrote:
>
>> Doh.. the problems of it being late :) .. there should be a ‘public’ in
>> there.
>>
>> Try
>>
>> https://us-east.manta.joyent.com/jbk/public/OS-7079/platform-20180719T001516Z.iso
>>
>>
>> From: Youzhong Yang <[email protected]> <[email protected]>
>> Reply: Youzhong Yang <[email protected]> <[email protected]>
>> Date: August 13, 2018 at 12:12:52 AM
>>
>> To: Jason King <[email protected]> <[email protected]>
>> Cc: [email protected] <[email protected]>
>> <[email protected]>
>> Subject:  Re: [smartos-discuss] still hang at boot - OS-7079
>> mp_startup_common races itself
>>
>> I got this:
>>
>> {"code":"ResourceNotFound","message":"/jbk/OS-7079/platform-20180719T001516Z.iso does not exist"}
>>
>> In our /etc/system, I have
>> set pcplusmp:apic_panic_on_nmi=1
>> set apix:apic_panic_on_nmi=1
>>
>> If I set them to 0, and boot with -k, an NMI should drop into kmdb, right?
>> I will build an image now and test.
>>
>>
>> On Mon, Aug 13, 2018 at 1:04 AM, Jason King <[email protected]>
>> wrote:
>>
>>> There’s a couple of ways — you can boot -kd and set a breakpoint to set
>>> it.  You can also set it in etc/system in the proto area when building an
>>> image.
>>>
>>> If you want, I do have an image of 20180719 w/ OS-7079 applied and kmdb
>>> on NMI already set (you’d still want to boot -k)  — you can grab it at
>>> https://us-east.manta.joyent.com/jbk/OS-7079/platform-20180719T001516Z.{iso,tgz,usb.bz2}
>>>
>>> If you do, it’d be interesting to see what ::cpustack on each core looks like.
>>>
>>>
>>> From: Youzhong Yang <[email protected]> <[email protected]>
>>> Reply: Youzhong Yang <[email protected]> <[email protected]>
>>> Date: August 12, 2018 at 11:58:48 PM
>>> To: Jason King <[email protected]> <[email protected]>
>>> Cc: [email protected] <[email protected]
>>> .org> <[email protected]>
>>> Subject:  Re: [smartos-discuss] still hang at boot - OS-7079
>>> mp_startup_common races itself
>>>
>>> I sent an NMI, but it printed out a stack trace plus a message "no dump
>>> device" or something, then rebooted. I tried -v on my old Supermicro
>>> system; on the console I saw messages about sd## devices, then it hung.
>>> The console still responded to the keyboard, but it just stayed that way
>>> forever.
>>>
>>> What change is needed to drop into kmdb when the OS receives NMI?
>>>
>>> On Mon, Aug 13, 2018 at 12:06 AM, Jason King <[email protected]> wrote:
>>>
>>>> Was that with boot -v?  Are you able to send the system an NMI after it
>>>> hangs (or get the boot -v output up to the hang)?
>>>>
>>>> Prior to OS-7079, the system would begin starting up the next CPU before
>>>> it had completely finished initializing the ‘current’ CPU (which could
>>>> deadlock depending on which CPU obtained a particular lock first). The
>>>> change makes it wait until the current CPU has finished starting up
>>>> before proceeding to the next CPU.
>>>>
>>>> It’s certainly possible it could have revealed another bug. The bug fixed
>>>> by OS-7079 was introduced almost 10 years ago, but didn’t seem to be easy
>>>> to trigger until recent CPUs.
>>>>
>>>>
>>>> From: Youzhong Yang <[email protected]> <[email protected]>
>>>> Reply: [email protected] <[email protected]
>>>> .org> <[email protected]>
>>>> Date: August 12, 2018 at 10:46:05 PM
>>>> To: [email protected] <[email protected]
>>>> .org> <[email protected]>
>>>> Subject:  [smartos-discuss] still hang at boot - OS-7079
>>>> mp_startup_common races itself
>>>>
>>>> Today I built a smartos image (with all git repos synced to master) and
>>>> rebooted the host with that image. It hung after the banner message + one
>>>> more line about power management or something.
>>>>
>>>> Then I reverted OS-7079, built an image, and rebooted; it worked perfectly.
>>>>
>>>> So does it mean OS-7079 fixed one issue, but caused another? My host is
>>>> an old Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570  @ 2.93GHz. Tomorrow I
>>>> will try on a new all NVMe system and see if it works.
>>>>
>>>> Thanks.
>>>> *smartos-discuss* | Archives
>>>> <https://www.listbox.com/member/archive/184463/=now> | Modify
>>>> <https://www.listbox.com/member/?> Your Subscription
>>>> <https://www.listbox.com>
>>>>
>>>>
>>>
>>
>
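
For anyone following the thread, here is a rough sketch of the ordering issue Jason describes above: the next CPU being started before the current one has finished initializing, versus setting the "ready" bit as the last step. This is a toy Python model, not the illumos code. The names cpu_startup, boot and overlaps are made up for illustration, the per-CPU "threads" are plain generators stepped by hand, and it only shows the ordering; it does not reproduce the actual lock or the deadlock.

```python
from typing import Iterator, List, Set

_DONE = object()  # sentinel returned by next() once a generator is exhausted


def cpu_startup(cpu: int, ready: Set[int], log: List[str],
                signal_last: bool) -> Iterator[None]:
    """Hypothetical per-CPU startup thread, split into steps at each yield."""
    log.append(f"cpu{cpu}: begin init")
    yield
    if not signal_last:
        # Old ordering (assumed for illustration): ready bit set early.
        ready.add(cpu)
        log.append(f"cpu{cpu}: ready")
        yield
    log.append(f"cpu{cpu}: finish init")
    yield
    if signal_last:
        # New ordering: ready bit is the last thing the thread does.
        ready.add(cpu)
        log.append(f"cpu{cpu}: ready")


def step_all(running: List[Iterator[None]]) -> None:
    """Advance every started CPU by one step, dropping finished ones."""
    for thread in list(running):
        if next(thread, _DONE) is _DONE:
            running.remove(thread)


def boot(ncpus: int, signal_last: bool) -> List[str]:
    """Boot CPU starts each secondary CPU, then waits for its ready bit."""
    ready: Set[int] = set()
    log: List[str] = []
    running: List[Iterator[None]] = []
    for cpu in range(1, ncpus + 1):
        thread = cpu_startup(cpu, ready, log, signal_last)
        running.append(thread)
        next(thread)              # the new CPU begins init as soon as it starts
        while cpu not in ready:   # boot CPU waits; started CPUs keep running
            step_all(running)
    while running:                # let remaining startup threads finish
        step_all(running)
    return log


def overlaps(log: List[str]) -> bool:
    """True if a CPU begins init while another has begun but not finished."""
    in_progress = 0
    for entry in log:
        if entry.endswith("begin init"):
            if in_progress:
                return True
            in_progress += 1
        elif entry.endswith("finish init"):
            in_progress -= 1
    return False


if __name__ == "__main__":
    print(overlaps(boot(2, signal_last=False)))  # True: cpu2 starts mid-init
    print(overlaps(boot(2, signal_last=True)))   # False: strictly one at a time
```

With the ready bit set early, cpu2 begins while cpu1 is still initializing, which is the window where two CPUs could contend for the same lock. With the bit set last, startup is strictly one CPU at a time.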


