So your image booted up. Interesting ... Maybe something else messed up your fix? Anyway, I am now building my image and will see what I can get from ::cpustack.
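
For reference, the /etc/system change discussed in the quoted thread might look roughly like the fragment below. It is only a sketch and untested: the two apic_panic_on_nmi lines are the settings from my earlier message with the value flipped to 0, while apic_kmdb_on_nmi is an assumption about which tunable makes an NMI enter kmdb; nobody has confirmed it in this thread.

```
* /etc/system fragment (sketch, untested)
* Do not panic on NMI, so the system does not print a stack and reboot.
set pcplusmp:apic_panic_on_nmi=0
set apix:apic_panic_on_nmi=0
* Assumed tunable: enter kmdb on NMI. kmdb must be loaded (boot -k).
set pcplusmp:apic_kmdb_on_nmi=1
set apix:apic_kmdb_on_nmi=1
```

With something like this in the proto area's etc/system when building the image, the idea is to boot with -k, wait for the hang, send the NMI, and then run ::cpustack for each CPU from the kmdb prompt.
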
On Mon, Aug 13, 2018 at 1:14 AM, Jason King <[email protected]> wrote:

> Doh.. the problems of it being late :) .. there should be a ‘public’ in
> there.
>
> Try
>
> https://us-east.manta.joyent.com/jbk/public/OS-7079/platform-20180719T001516Z.iso
>
>
> From: Youzhong Yang <[email protected]>
> Reply: Youzhong Yang <[email protected]>
> Date: August 13, 2018 at 12:12:52 AM
> To: Jason King <[email protected]>
> Cc: [email protected]
> Subject: Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>
> I got this:
>
> {"code":"ResourceNotFound","message":"/jbk/OS-7079/platform-20180719T001516Z.iso does not exist"}
>
> In our /etc/system, I have
> set pcplusmp:apic_panic_on_nmi=1
> set apix:apic_panic_on_nmi=1
>
> If I set them to 0, and boot with -k, a NMI should drop into kmdb, right?
> I will build an image now and test.
>
>
> On Mon, Aug 13, 2018 at 1:04 AM, Jason King <[email protected]> wrote:
>
>> There’s a couple of ways - you can boot -kd and set a breakpoint to set
>> it. You can also set it in etc/system in the proto area when building an
>> image.
>>
>> If you want, I do have an image of 20180719 w/ OS-7079 applied and kmdb
>> on NMI already set (you’d still want to boot -k) - you can grab it at
>> https://us-east.manta.joyent.com/jbk/OS-7079/platform-20180719T001516Z.{iso,tgz,usb.bz2}
>>
>> If you do, it’d be interesting to see ::cpustack on each core looks like.
>>
>>
>> From: Youzhong Yang <[email protected]>
>> Reply: Youzhong Yang <[email protected]>
>> Date: August 12, 2018 at 11:58:48 PM
>> To: Jason King <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>>
>> I sent NMI, but it printed out a stack trace plus a message "no dump
>> device" or something then rebooted. I tried -v on my old supermicro system,
>> on the console I saw message about sd## devices, then it hung. The console
>> still responded to keyboard, but just stayed that way forever.
>>
>> What change is needed to drop into kmdb when the OS receives NMI?
>>
>> On Mon, Aug 13, 2018 at 12:06 AM, Jason King <[email protected]> wrote:
>>
>>> Was that with boot -v? Are you able to send the system an NMI after it
>>> hangs (or get the boot -v output up to the hang)?
>>>
>>> Prior to OS-7079, the system would start to startup the next CPU before
>>> it had completely finished initializing the ‘current’ CPU (which could
>>> deadlock depending on which CPU obtained a particular lock first), the
>>> change makes it wait until the current CPU is finished starting up before
>>> proceeding to the next CPU.
>>>
>>> It’s certainly possible it could have revealed another bug - OS-7079
>>> itself was introduced almost 10 years ago, but didn’t seem to be easy to
>>> trigger until recent CPUs.
>>>
>>>
>>> From: Youzhong Yang <[email protected]>
>>> Reply: [email protected]
>>> Date: August 12, 2018 at 10:46:05 PM
>>> To: [email protected]
>>> Subject: [smartos-discuss] still hang at boot - OS-7079 mp_startup_common races itself
>>>
>>> Today I built a smartos image (with all git repos synced to master) and
>>> rebooted the host with that image. It hung after the banner message + one
>>> more line about power management or something.
>>>
>>> Then I reverted OS-7079, built an image, rebooted, it worked perfectly.
>>>
>>> So does it mean OS-7079 fixed one issue, but caused another? My host is
>>> an old Supermicro X8DAH, Intel(R) Xeon(R) CPU X5570 @ 2.93GHz. Tomorrow I
>>> will try on a new all NVMe system and see if it works.
>>>
>>> Thanks.
>>> *smartos-discuss* | Archives
>>> <https://www.listbox.com/member/archive/184463/=now> | Modify
>>> <https://www.listbox.com/member/?> Your Subscription
>>> <https://www.listbox.com>
