Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
On 07/25/18 13:35, Dong, Eric wrote: >> From: Laszlo Ersek [mailto:ler...@redhat.com] >> Sent: Wednesday, July 25, 2018 6:14 PM >> To: Dong, Eric ; edk2-devel@lists.01.org >> Cc: Ni, Ruiyu >> On 07/25/18 05:50, Dong, Eric wrote: >>> But some AP may wake up later and it failed to return from the >>> procedure. >> >> Ah! So that explains another symptom I've since seen as well -- >> although *very* rarely. Namely, if an AP wakes up *after* >> PiSmmCpuDxeSmm moves on, thinking that all APs are finished, the AP >> can execute garbage in "no man's land" -- and that crashes the guest. >> Basically, QEMU/KVM pause the guest with "emulation failure", and >> QEMU dumps the VCPU register state to the standard error on the host >> side. In particular, the register state indicates that the crashed >> VCPU is *not* in SMM. > > Where to get these QEMU dumps info? QEMU simply writes the register dump to stderr. If you use QEMU together with libvirt, then QEMU's stderr is saved in: /var/log/libvirt/qemu/$DOMAIN.log Here's an example: KVM internal error. Suberror: 1 emulation failure RAX= RBX=0005 RCX=7eab9828 RDX=0002 RSI=0009f000 RDI=7eef2ae0 RBP=7eef2e40 RSP=7eef2a08 R8 =0002 R9 = R10= R11= R12= R13= R14= R15= RIP=7ea9cdb2 RFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0030 00c09300 DPL=0 DS [-WA] CS =0038 00a09b00 DPL=0 CS64 [-RA] SS =0030 00c09300 DPL=0 DS [-WA] DS =0030 00c09300 DPL=0 DS [-WA] FS =0030 00c09300 DPL=0 DS [-WA] GS =0030 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS64-busy GDT= 7ebed998 0047 IDT= 7e237280 0fff CR0=c0010033 CR2= CR3=7ec01000 CR4=0668 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=0500 Code=89 d0 5f c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 90 cc cc cc cc cc cc cc cc cc cc cc cc cc 53 89 c8 89 d1 50 0f a2 4c 8b 54 24 38 4d 85 d2 (Anyway, this is just for general discussion now -- your v3 series works perfectly for me.) > I'm not clear who calls StartAllAPs function when I send this mail, I > just based on the debug message found StartAllAps trigged right after > PiSmmCpuDxeSmm driver. but I think this not blocks my patches for this > issue. So I sent the patches and back to dig it more. Sure, I think your analysis was spot-on, regardless of how you produced it. The register dumps / emulation failures that I mentioned in my previous email in this thread are not a new issue with your v3 posting. The v3 series is fine. Instead, these failures were a *further* (but less frequent) symptom with the *original* series, and I encountered them separately from the boot hangs that I reported at first. > Now I found it's PiSmmIpl driver calls gDS->SetMemorySpaceAttributes > in SmmIplDxeDispatchEventNotify function which finally calls the > StartAllAps. > The Ap procedure is SetMtrrsFromBuffer function in CpuDxe driver. In > SetMtrrsFromBuffer function, it just call MtrrSetAllMtrrs to update > the MTRR, and it use local variable to save the MTRR settings. > > When the hang issue raised, in the time the AP begin to run the > procedure, the BSP has leave current function. So the saved MTRR > setting in the local variable is not valid anymore. So the > MtrrSetAllMtrrs function use an invalid Mtrr Setttings to set mtrrs > and never exit the procedure. Do you think it's reasonable? Absolutely. Basically, if an AP is allowed to continue running the user procedure after the BSP in MpInitLib thinks that the AP is finished, anything at all can happen -- the escapes control, it may access memory that has been freed, and perhaps it might even execute code that has been freed and overwritten. The actual symptoms can vary wildly. > I found till the ExitBootService point, the AP still in busy state. I > check the ApWakeupFunction function, found between the AP procedure > and set AP state to Idle, no other possible code exist. So I think the > AP should not return back from procedure. I try to add debug message > to know where the AP is stopped at. But after I add debug message in > MtrrSetAllMtrrs, the issue can't reproduce anymore. I tried about 30 > times. Do you have other message to get to know where the AP is? No, I don't have any other ideas at hand to deduce where the AP was stuck -- as far as I'm concerned, it had just wandered off into the weeds :) I think your idea to check the AP state at the time of the
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Hi Laszlo, > -Original Message- > From: Laszlo Ersek [mailto:ler...@redhat.com] > Sent: Wednesday, July 25, 2018 6:14 PM > To: Dong, Eric ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu > Subject: Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant > parameter. > > On 07/25/18 05:50, Dong, Eric wrote: > > Hi Laszlo, > > > > I have root cause this issue, the AP hangs in the procedure when > > PiSmmCpuDxeSmm driver start up trigged this issue. > > > > When PiSmmCpuDxeSmm driver start up, it will call StartAllAps to set > > memory attribute. In StartAllAps function, after call WakeUpAp to > > start Aps, it calls CheckAllAps to wait all Aps finished the task. In > > CheckAllAps function, it detect AP state to know whether the AP has > > finished its task. In old code, it check whether the AP state is > > CpuStateFinished to know whether AP has finished tasks. This state is > > only set by AP when it truly finished task. In new logic, > > CpuStateFinished been replace with CpuStateIdle. And CpuStateIdle is > > also the begin state of the AP. AP will change state from CpuStateIdle > > to CpuStateBusy when it start execute the procedure. And after it > > finished the procedure, it will change state back to CpuStateIdle. > > > > So when the hang issue raised, AP state is not been changed to > > CpuStateBusy when BSP calls CheckAllAps to check whether the AP has > > finished its task. So the state for the AP still in CpuStateIdle, but > > BSP think AP has finished its task. In this case, BSP think all the > > Aps has finished their tasks and it continues boot. > > Awesome analysis! > > So, this looks like an "inverse" variant of the classic "ABA problem": > > https://en.wikipedia.org/wiki/ABA_problem Yes, it's an ABA problem. New patch keep the CpuStateReady to distinguish from the CpuStateIdle state. > > > But some AP may wake up later and it failed to return from the > > procedure. > > Ah! So that explains another symptom I've since seen as well -- although > *very* rarely. Namely, if an AP wakes up *after* PiSmmCpuDxeSmm moves > on, thinking that all APs are finished, the AP can execute garbage in "no > man's > land" -- and that crashes the guest. Basically, QEMU/KVM pause the guest > with "emulation failure", and QEMU dumps the VCPU register state to the > standard error on the host side. In particular, the register state indicates > that > the crashed VCPU is *not* in SMM. Where to get these QEMU dumps info? I'm not clear who calls StartAllAPs function when I send this mail, I just based on the debug message found StartAllAps trigged right after PiSmmCpuDxeSmm driver. but I think this not blocks my patches for this issue. So I sent the patches and back to dig it more. Now I found it's PiSmmIpl driver calls gDS->SetMemorySpaceAttributes in SmmIplDxeDispatchEventNotify function which finally calls the StartAllAps. The Ap procedure is SetMtrrsFromBuffer function in CpuDxe driver. In SetMtrrsFromBuffer function, it just call MtrrSetAllMtrrs to update the MTRR, and it use local variable to save the MTRR settings. When the hang issue raised, in the time the AP begin to run the procedure, the BSP has leave current function. So the saved MTRR setting in the local variable is not valid anymore. So the MtrrSetAllMtrrs function use an invalid Mtrr Setttings to set mtrrs and never exit the procedure. Do you think it's reasonable? I found till the ExitBootService point, the AP still in busy state. I check the ApWakeupFunction function, found between the AP procedure and set AP state to Idle, no other possible code exist. So I think the AP should not return back from procedure. I try to add debug message to know where the AP is stopped at. But after I add debug message in MtrrSetAllMtrrs, the issue can't reproduce anymore. I tried about 30 times. Do you have other message to get to know where the AP is? Thanks, Eric > > When I first encountered this symptom now, while playing some more with > your patches, it reminded me of earlier problems with MpInitLib. And now > your analysis makes perfect sense of this additional symptom! > > > In this case, the AP state keeps at CpuStateBusy. So later in > > ChangeApLoopCallback function, because this AP state still in > > CpuStateBusy, this AP will not trig the procedure. But BSP wait all > > APs to trig the procedure(BSP wait the Aps to reduce the > > mNumberToFinish value in procedure to continue boot) to continue the > > boot, so the hang occurred. > > This completes the explanation. > > > I think we should keep a middle state to let us know whether the AP > > truly finished its task. I will send another serial patch for this > > issue. Please help to check the new patches. > > Yes, I'll test them too. > > Thanks! > Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
On 07/25/18 05:50, Dong, Eric wrote: > Hi Laszlo, > > I have root cause this issue, the AP hangs in the procedure when > PiSmmCpuDxeSmm driver start up trigged this issue. > > When PiSmmCpuDxeSmm driver start up, it will call StartAllAps to set > memory attribute. In StartAllAps function, after call WakeUpAp to > start Aps, it calls CheckAllAps to wait all Aps finished the task. In > CheckAllAps function, it detect AP state to know whether the AP has > finished its task. In old code, it check whether the AP state is > CpuStateFinished to know whether AP has finished tasks. This state is > only set by AP when it truly finished task. In new logic, > CpuStateFinished been replace with CpuStateIdle. And CpuStateIdle is > also the begin state of the AP. AP will change state from CpuStateIdle > to CpuStateBusy when it start execute the procedure. And after it > finished the procedure, it will change state back to CpuStateIdle. > > So when the hang issue raised, AP state is not been changed to > CpuStateBusy when BSP calls CheckAllAps to check whether the AP has > finished its task. So the state for the AP still in CpuStateIdle, but > BSP think AP has finished its task. In this case, BSP think all the > Aps has finished their tasks and it continues boot. Awesome analysis! So, this looks like an "inverse" variant of the classic "ABA problem": https://en.wikipedia.org/wiki/ABA_problem > But some AP may wake up later and it failed to return from the > procedure. Ah! So that explains another symptom I've since seen as well -- although *very* rarely. Namely, if an AP wakes up *after* PiSmmCpuDxeSmm moves on, thinking that all APs are finished, the AP can execute garbage in "no man's land" -- and that crashes the guest. Basically, QEMU/KVM pause the guest with "emulation failure", and QEMU dumps the VCPU register state to the standard error on the host side. In particular, the register state indicates that the crashed VCPU is *not* in SMM. When I first encountered this symptom now, while playing some more with your patches, it reminded me of earlier problems with MpInitLib. And now your analysis makes perfect sense of this additional symptom! > In this case, the AP state keeps at CpuStateBusy. So later in > ChangeApLoopCallback function, because this AP state still in > CpuStateBusy, this AP will not trig the procedure. But BSP wait all > APs to trig the procedure(BSP wait the Aps to reduce the > mNumberToFinish value in procedure to continue boot) to continue the > boot, so the hang occurred. This completes the explanation. > I think we should keep a middle state to let us know whether the AP > truly finished its task. I will send another serial patch for this > issue. Please help to check the new patches. Yes, I'll test them too. Thanks! Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Hi Laszlo, I have root cause this issue, the AP hangs in the procedure when PiSmmCpuDxeSmm driver start up trigged this issue. When PiSmmCpuDxeSmm driver start up, it will call StartAllAps to set memory attribute. In StartAllAps function, after call WakeUpAp to start Aps, it calls CheckAllAps to wait all Aps finished the task. In CheckAllAps function, it detect AP state to know whether the AP has finished its task. In old code, it check whether the AP state is CpuStateFinished to know whether AP has finished tasks. This state is only set by AP when it truly finished task. In new logic, CpuStateFinished been replace with CpuStateIdle. And CpuStateIdle is also the begin state of the AP. AP will change state from CpuStateIdle to CpuStateBusy when it start execute the procedure. And after it finished the procedure, it will change state back to CpuStateIdle. So when the hang issue raised, AP state is not been changed to CpuStateBusy when BSP calls CheckAllAps to check whether the AP has finished its task. So the state for the AP still in CpuStateIdle, but BSP think AP has finished its task. In this case, BSP think all the Aps has finished their tasks and it continues boot. But some AP may wake up later and it failed to return from the procedure. In this case, the AP state keeps at CpuStateBusy. So later in ChangeApLoopCallback function, because this AP state still in CpuStateBusy, this AP will not trig the procedure. But BSP wait all APs to trig the procedure(BSP wait the Aps to reduce the mNumberToFinish value in procedure to continue boot) to continue the boot, so the hang occurred. I think we should keep a middle state to let us know whether the AP truly finished its task. I will send another serial patch for this issue. Please help to check the new patches. Thanks, Eric > -Original Message- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Laszlo Ersek > Sent: Saturday, July 21, 2018 12:30 AM > To: Dong, Eric ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu > Subject: Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant > parameter. > > On 07/20/18 08:53, Dong, Eric wrote: > >> -Original Message- From: Laszlo Ersek > >> [mailto:ler...@redhat.com] > > >> Therefore, please upgrade the host to Fedora 26. In Fedora 26, QEMU > >> 2.9 is shipped: > >> > >> https://koji.fedoraproject.org/koji/buildinfo?buildID=986762 > >> > >> ... It's even better if you can upgrade to Fedora 27, as Fedora 27 is > >> the oldest Fedora release still supported at this point. The > >> following article describes the recommended upgrade method: > >> > >> https://fedoraproject.org/wiki/DNF_system_upgrade > >> > > > > I updated the system to fedora 28, but it failed to boot. :( so I > > borrowed an exited fedora 27 DVD and installed it. With this OS, I can > > reproduce this issue now. I found this issue is an random issue, I > > booted 5 times and met the issue. I'm checking the issue. > > Awesome! > > (I'm not happy about the problem itself, of course, but I'm *very* thankful > that you took the time to install a Linux box, for testing with > KVM!!!) > > Laszlo > ___ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
On 07/20/18 08:53, Dong, Eric wrote: >> -Original Message- From: Laszlo Ersek >> [mailto:ler...@redhat.com] >> Therefore, please upgrade the host to Fedora 26. In Fedora 26, QEMU >> 2.9 is shipped: >> >> https://koji.fedoraproject.org/koji/buildinfo?buildID=986762 >> >> ... It's even better if you can upgrade to Fedora 27, as Fedora 27 >> is the oldest Fedora release still supported at this point. The >> following article describes the recommended upgrade method: >> >> https://fedoraproject.org/wiki/DNF_system_upgrade >> > > I updated the system to fedora 28, but it failed to boot. :( so I > borrowed an exited fedora 27 DVD and installed it. With this OS, I > can reproduce this issue now. I found this issue is an random issue, > I booted 5 times and met the issue. I'm checking the issue. Awesome! (I'm not happy about the problem itself, of course, but I'm *very* thankful that you took the time to install a Linux box, for testing with KVM!!!) Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Hi Laszlo, > -Original Message- > From: Laszlo Ersek [mailto:ler...@redhat.com] > Sent: Friday, July 20, 2018 1:01 AM > To: Dong, Eric ; edk2-devel@lists.01.org > Cc: Ni, Ruiyu > Subject: Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant > parameter. > > Hi Eric, > > apologies about the delay. > > On 07/18/18 14:59, Dong, Eric wrote: > > Hi Laszlo, > > > > I finally succeed to setup the OVMF platform which can verify the boot > > failure issue. But on my platform, if I use image build with below > > command (I assume it is used to enable SMM), the system can't boot to > > OS (host OS is fedora 25 and guest OS is Ubuntu 18.04). It hang at OS > > boot phase after ExitBootService point (I can see the console log > > which should been printed at ExitBootService point, so I think hang > > should after this point). > > build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc -t VS2015x86 -b > > NOOPT -D SMM_REQUIRE -D SECURE_BOOT_ENABLE -D TLS_ENABLE > > > > If I use below command to build the image, the system can boot to OS. > > build -a IA32 -a X64 -p OvmfPkg\OvmfPkgIa32X64.dsc -t VS2015x86 -b > > NOOPT > > > > Does my OVMF environment still has problem? > > > > > > When do the above test, I don't include my two patches. > > Yes, I think this host environment is still problematic. Namely, the latest > QEMU version shipped in Fedora 25 is QEMU-2.7: > > https://koji.fedoraproject.org/koji/buildinfo?buildID=918114 > > and QEMU-2.7 does not have a feature that is important for SMM stability. > This feature is called "SMI broadcast". > > In OVMF, the "OvmfPkg/SmmControl2Dxe" runtime driver implements > EFI_SMM_CONTROL2_PROTOCOL (which is a runtime protocol). The Trigger() > member function raises an SMI, by writing to IO port 0xB2 (ICH9_APM_CNT). > > Originally, QEMU would raise the SMI synchronously only on the sole VCPU > that called Trigger(). Then, the edk2 SMM driver stack would have to pull the > other processors explicitly into SMM (via APIC accesses, if I remember > correctly). This was extremely slow (the processor first raising the SMI would > wait for a long time for the other processors to show up in SMM, before it > would decide to pull them in with APIC writes). Also when we switched the > edk2 SMM sync mode to "relaxed", the results remained very unstable. We > decided that edk2 supported the "traditional" SMM sync mode much better, > and so we implemented "SMI broadcast" in QEMU, to satisfy that sync mode. > > (My memories are a bit fuzzy at this point; you can read more in the following > RH Bugzilla entries: > > https://bugzilla.redhat.com/show_bug.cgi?id=1412327 [QEMU] > https://bugzilla.redhat.com/show_bug.cgi?id=1412313 [OVMF]) > > The idea of "SMI broadcast" is that, regardless of which VCPU triggers the > SMI, QEMU raises the SMI immediately on all VCPUs. This made a > *huge* difference for the performance and the stability of the edk2 SMM > driver stack, used in OVMF and on QEMU/KVM. > > Now, in order to be able to use old OVMF on new QEMU and vice versa, this > feature is runtime-negotiated between "OvmfPkg/SmmControl2Dxe" and > QEMU. (The feature is not enabled by default, and without "SMI broadcast", > the "relaxed" sync method is slightly less broken than the "tradiational" > method, so OVMF defaults to that. With the feature enabled, the "traditional" > mode is better -- that config is the absolute best of all four possible > combinations.) > > More precisely, on the QEMU side, the feature is not tied to a QEMU release, > but to Q35 *machine type versions*. Therefore, in order to benefit from the > feature, you need all of the following: > > - a recent enough OVMF, > - a recent enough QEMU release, > - a recent enough Q35 machine type, specified on the QEMU command line. > > The particular minimum machine type is "pc-q35-2.9" (which is clearly only > provided by QEMU-2.9 and later). The machine type requirement is > automatically satisfied if you use QEMU-2.9+, and just request the "q35" > machine type. (Without an explicit machtype version number, the highest one > supported by the QEMU release will be picked.) > > The lack of this feature in your environment is confirmed by your OVMF > log: > > > NegotiateSmiFeatures: SMI feature negotiation unavailable > > If the feature is available, you will see the following two messages > instead: > > NegotiateSmiFeatures: using SMI broadcast > [...] > AppendFwCfgBootScript: SMI feature negot
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Hi Eric, apologies about the delay. On 07/18/18 14:59, Dong, Eric wrote: > Hi Laszlo, > > I finally succeed to setup the OVMF platform which can verify the boot > failure issue. But on my platform, if I use image build with below > command (I assume it is used to enable SMM), the system can't boot to > OS (host OS is fedora 25 and guest OS is Ubuntu 18.04). It hang at OS > boot phase after ExitBootService point (I can see the console log > which should been printed at ExitBootService point, so I think hang > should after this point). > build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc -t VS2015x86 -b > NOOPT -D SMM_REQUIRE -D SECURE_BOOT_ENABLE -D TLS_ENABLE > > If I use below command to build the image, the system can boot to OS. > build -a IA32 -a X64 -p OvmfPkg\OvmfPkgIa32X64.dsc -t VS2015x86 -b NOOPT > > Does my OVMF environment still has problem? > > > When do the above test, I don't include my two patches. Yes, I think this host environment is still problematic. Namely, the latest QEMU version shipped in Fedora 25 is QEMU-2.7: https://koji.fedoraproject.org/koji/buildinfo?buildID=918114 and QEMU-2.7 does not have a feature that is important for SMM stability. This feature is called "SMI broadcast". In OVMF, the "OvmfPkg/SmmControl2Dxe" runtime driver implements EFI_SMM_CONTROL2_PROTOCOL (which is a runtime protocol). The Trigger() member function raises an SMI, by writing to IO port 0xB2 (ICH9_APM_CNT). Originally, QEMU would raise the SMI synchronously only on the sole VCPU that called Trigger(). Then, the edk2 SMM driver stack would have to pull the other processors explicitly into SMM (via APIC accesses, if I remember correctly). This was extremely slow (the processor first raising the SMI would wait for a long time for the other processors to show up in SMM, before it would decide to pull them in with APIC writes). Also when we switched the edk2 SMM sync mode to "relaxed", the results remained very unstable. We decided that edk2 supported the "traditional" SMM sync mode much better, and so we implemented "SMI broadcast" in QEMU, to satisfy that sync mode. (My memories are a bit fuzzy at this point; you can read more in the following RH Bugzilla entries: https://bugzilla.redhat.com/show_bug.cgi?id=1412327 [QEMU] https://bugzilla.redhat.com/show_bug.cgi?id=1412313 [OVMF]) The idea of "SMI broadcast" is that, regardless of which VCPU triggers the SMI, QEMU raises the SMI immediately on all VCPUs. This made a *huge* difference for the performance and the stability of the edk2 SMM driver stack, used in OVMF and on QEMU/KVM. Now, in order to be able to use old OVMF on new QEMU and vice versa, this feature is runtime-negotiated between "OvmfPkg/SmmControl2Dxe" and QEMU. (The feature is not enabled by default, and without "SMI broadcast", the "relaxed" sync method is slightly less broken than the "tradiational" method, so OVMF defaults to that. With the feature enabled, the "traditional" mode is better -- that config is the absolute best of all four possible combinations.) More precisely, on the QEMU side, the feature is not tied to a QEMU release, but to Q35 *machine type versions*. Therefore, in order to benefit from the feature, you need all of the following: - a recent enough OVMF, - a recent enough QEMU release, - a recent enough Q35 machine type, specified on the QEMU command line. The particular minimum machine type is "pc-q35-2.9" (which is clearly only provided by QEMU-2.9 and later). The machine type requirement is automatically satisfied if you use QEMU-2.9+, and just request the "q35" machine type. (Without an explicit machtype version number, the highest one supported by the QEMU release will be picked.) The lack of this feature in your environment is confirmed by your OVMF log: > NegotiateSmiFeatures: SMI feature negotiation unavailable If the feature is available, you will see the following two messages instead: NegotiateSmiFeatures: using SMI broadcast [...] AppendFwCfgBootScript: SMI feature negotiation boot script saved (The second message only appears if you have S3 enabled -- at S3 resume, the feature has to be re-enabled, so SmmControl2Dxe saves a boot script fragment for that.) Therefore, please upgrade the host to Fedora 26. In Fedora 26, QEMU 2.9 is shipped: https://koji.fedoraproject.org/koji/buildinfo?buildID=986762 ... It's even better if you can upgrade to Fedora 27, as Fedora 27 is the oldest Fedora release still supported at this point. The following article describes the recommended upgrade method: https://fedoraproject.org/wiki/DNF_system_upgrade > Then I include my patches and build the image with SMM enabled, I > found I can't reproduce the issue you met. I can find the > "MpInitChangeApLoopCallback done!" message in the console log. > Attached the console log. Yes, I can see "MpInitChangeApLoopCallback() done" in the log. > Can you help to verify the OVMF image build from my side? Your firmware image (SHA1:
Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Hi Eric, On 06/29/18 05:20, Eric Dong wrote: > Parameter StartCount duplicates with RunningCount. After this change, > RunningCount means the running AP count. > > V2 changes: Remove volatile for RunningCount. > > Done Test: > 1.PI SCT Test > 2.Boot OS / S3 > > Cc: Ruiyu Ni > Cc: Jeff Fan > Contributed-under: TianoCore Contribution Agreement 1.1 > Signed-off-by: Eric Dong > --- > UefiCpuPkg/Library/MpInitLib/MpLib.c | 11 +-- > UefiCpuPkg/Library/MpInitLib/MpLib.h | 3 +-- > 2 files changed, 6 insertions(+), 8 deletions(-) > > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > index 3945771764..52c9679099 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > @@ -1400,7 +1400,7 @@ CheckAllAPs ( > // value of state after setting the it to CpuStateFinished, so BSP can > safely make use of its value. > // > if (GetApState(CpuData) != CpuStateBusy) { > - CpuMpData->RunningCount ++; > + CpuMpData->RunningCount --; >CpuMpData->CpuData[ProcessorNumber].Waiting = FALSE; > >// > @@ -1425,7 +1425,7 @@ CheckAllAPs ( >// >// If all APs finish, return EFI_SUCCESS. >// > - if (CpuMpData->RunningCount == CpuMpData->StartCount) { > + if (CpuMpData->RunningCount == 0) { > return EFI_SUCCESS; >} > > @@ -1442,7 +1442,7 @@ CheckAllAPs ( > // > if (CpuMpData->FailedCpuList != NULL) { >*CpuMpData->FailedCpuList = > - AllocatePool ((CpuMpData->StartCount - CpuMpData->FinishedCount + > 1) * sizeof (UINTN)); > + AllocatePool ((CpuMpData->RunningCount + 1) * sizeof (UINTN)); >ASSERT (*CpuMpData->FailedCpuList != NULL); > } > ListIndex = 0; > @@ -2121,7 +2121,7 @@ StartupAllAPsWorker ( > return EFI_NOT_STARTED; >} > > - CpuMpData->StartCount = 0; > + CpuMpData->RunningCount = 0; >for (ProcessorNumber = 0; ProcessorNumber < ProcessorCount; > ProcessorNumber++) { > CpuData = >CpuData[ProcessorNumber]; > CpuData->Waiting = FALSE; > @@ -2131,7 +2131,7 @@ StartupAllAPsWorker ( > // Mark this processor as responsible for current calling. > // > CpuData->Waiting = TRUE; > -CpuMpData->StartCount++; > +CpuMpData->RunningCount++; >} > } >} > @@ -2140,7 +2140,6 @@ StartupAllAPsWorker ( >CpuMpData->ProcArguments = ProcedureArgument; >CpuMpData->SingleThread = SingleThread; >CpuMpData->FinishedCount = 0; > - CpuMpData->RunningCount = 0; >CpuMpData->FailedCpuList = FailedCpuList; >CpuMpData->ExpectedTime = CalculateTimeout ( > TimeoutInMicroseconds, > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h > b/UefiCpuPkg/Library/MpInitLib/MpLib.h > index 90c09fb8fb..ad62acf766 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h > @@ -210,9 +210,8 @@ struct _CPU_MP_DATA { >UINTN BackupBuffer; >UINTN BackupBufferSize; > > - volatile UINT32StartCount; >volatile UINT32FinishedCount; > - volatile UINT32RunningCount; > + UINT32 RunningCount; >BOOLEANSingleThread; >EFI_AP_PROCEDURE Procedure; >VOID *ProcArguments; > I got confused by the way you sent out this patch. First I thought that you meant it separately (stand-alone). I tried to test it, but it didn't apply. Also my intent was to ask you whether you wanted to send a new version of [edk2] [Patch 1/2] UefiCpuPkg/MpInitLib: Not use disabled AP. However, upon seeing that this patch wouldn't apply, I'm now thinking that you would like to preserve [edk2] [Patch 1/2] UefiCpuPkg/MpInitLib: Not use disabled AP. without any changes, and your intent is to only update [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Remove redundant parameter. to version 2. In such cases, please do not post an independent patch. Instead, pick one of the following: - Repost the entire series as v2, and mark in the cover letter that patch #1 is unchanged from the v1 posting. - Alternatively, post the patch in response to the *original* v1 thread (using --in-reply-to= with git-send-email), and use the subject [edk2] [Patch v2 2/2] UefiCpuPkg/MpInitLib: Remove redundant parameter. Either of these approaches will let reviewers know that you intend the two patches to go together (and in what order), and that only patch #2 has been updated. So... Assuming this was indeed your intent, I applied the following two patches locally, for testing: [edk2] [Patch 1_2] UefiCpuPkg_MpInitLib: Not use disabled AP. Message-Id: <20180628112920.5296-1-eric.d...@intel.com> 20180628112920.5296-1-eric.dong@intel.com">http://mid.mail-archive.com/20180628112920.5296-1-eric.dong@intel.com [edk2]
[edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Parameter StartCount duplicates with RunningCount. After this change, RunningCount means the running AP count. V2 changes: Remove volatile for RunningCount. Done Test: 1.PI SCT Test 2.Boot OS / S3 Cc: Ruiyu Ni Cc: Jeff Fan Contributed-under: TianoCore Contribution Agreement 1.1 Signed-off-by: Eric Dong --- UefiCpuPkg/Library/MpInitLib/MpLib.c | 11 +-- UefiCpuPkg/Library/MpInitLib/MpLib.h | 3 +-- 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c index 3945771764..52c9679099 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c @@ -1400,7 +1400,7 @@ CheckAllAPs ( // value of state after setting the it to CpuStateFinished, so BSP can safely make use of its value. // if (GetApState(CpuData) != CpuStateBusy) { - CpuMpData->RunningCount ++; + CpuMpData->RunningCount --; CpuMpData->CpuData[ProcessorNumber].Waiting = FALSE; // @@ -1425,7 +1425,7 @@ CheckAllAPs ( // // If all APs finish, return EFI_SUCCESS. // - if (CpuMpData->RunningCount == CpuMpData->StartCount) { + if (CpuMpData->RunningCount == 0) { return EFI_SUCCESS; } @@ -1442,7 +1442,7 @@ CheckAllAPs ( // if (CpuMpData->FailedCpuList != NULL) { *CpuMpData->FailedCpuList = - AllocatePool ((CpuMpData->StartCount - CpuMpData->FinishedCount + 1) * sizeof (UINTN)); + AllocatePool ((CpuMpData->RunningCount + 1) * sizeof (UINTN)); ASSERT (*CpuMpData->FailedCpuList != NULL); } ListIndex = 0; @@ -2121,7 +2121,7 @@ StartupAllAPsWorker ( return EFI_NOT_STARTED; } - CpuMpData->StartCount = 0; + CpuMpData->RunningCount = 0; for (ProcessorNumber = 0; ProcessorNumber < ProcessorCount; ProcessorNumber++) { CpuData = >CpuData[ProcessorNumber]; CpuData->Waiting = FALSE; @@ -2131,7 +2131,7 @@ StartupAllAPsWorker ( // Mark this processor as responsible for current calling. // CpuData->Waiting = TRUE; -CpuMpData->StartCount++; +CpuMpData->RunningCount++; } } } @@ -2140,7 +2140,6 @@ StartupAllAPsWorker ( CpuMpData->ProcArguments = ProcedureArgument; CpuMpData->SingleThread = SingleThread; CpuMpData->FinishedCount = 0; - CpuMpData->RunningCount = 0; CpuMpData->FailedCpuList = FailedCpuList; CpuMpData->ExpectedTime = CalculateTimeout ( TimeoutInMicroseconds, diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h index 90c09fb8fb..ad62acf766 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h @@ -210,9 +210,8 @@ struct _CPU_MP_DATA { UINTN BackupBuffer; UINTN BackupBufferSize; - volatile UINT32StartCount; volatile UINT32FinishedCount; - volatile UINT32RunningCount; + UINT32 RunningCount; BOOLEANSingleThread; EFI_AP_PROCEDURE Procedure; VOID *ProcArguments; -- 2.15.0.windows.1 ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel