On 08/26/20 15:32, Laszlo Ersek wrote: > On 08/26/20 11:24, Laszlo Ersek wrote: >> Hi Igor, >> >> On 08/25/20 19:25, Laszlo Ersek wrote: >> >>> So I would suggest fetching the CNEW array element back into "uid" >>> first, then using "uid" for both the NOTIFY call, and the (currently >>> missing) restoration of CSEL. Then we can write 1 to CINS. >>> >>> Expressed as a patch on top of yours: >>> >>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c >>>> index 4864c3b39694..2bea6144fd5e 100644 >>>> --- a/hw/acpi/cpu.c >>>> +++ b/hw/acpi/cpu.c >>>> @@ -564,8 +564,11 @@ void build_cpus_aml(Aml *table, MachineState >>>> *machine, CPUHotplugFeatures opts, >>>> aml_append(method, aml_store(zero, cpu_idx)); >>>> while_ctx = aml_while(aml_lless(cpu_idx, num_added_cpus)); >>>> { >>>> - aml_append(while_ctx, aml_call2(CPU_NOTIFY_METHOD, >>>> - aml_derefof(aml_index(new_cpus, cpu_idx)), dev_chk)); >>>> + aml_append(while_ctx, >>>> + aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)), >>>> uid)); >>>> + aml_append(while_ctx, >>>> + aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk)); >>>> + aml_append(while_ctx, aml_store(uid, cpu_selector)); >>>> aml_append(while_ctx, aml_store(one, ins_evt)); >>>> aml_append(while_ctx, aml_increment(cpu_idx)); >>>> } >>> >>> This effects the following change, in the decompiled method: >>> >>>> @@ -37,15 +37,17 @@ >>>> If ((Local_NumAddedCpus != Zero)) >>>> { >>>> \_SB.PCI0.SMI0.SMIC = 0x04 >>>> } >>>> >>>> Local_CpuIdx = Zero >>>> While ((Local_CpuIdx < Local_NumAddedCpus)) >>>> { >>>> - CTFY (DerefOf (CNEW [Local_CpuIdx]), One) >>>> + Local_Uid = DerefOf (CNEW [Local_CpuIdx]) >>>> + CTFY (Local_Uid, One) >>>> + \_SB.PCI0.PRES.CSEL = Local_Uid >>>> \_SB.PCI0.PRES.CINS = One >>>> Local_CpuIdx++ >>>> } >>>> >>>> Release (\_SB.PCI0.PRES.CPLK) >>>> } >>> >>> With this change, the >>> >>> virsh setvcpus DOMAIN 8 --live >>> >>> command works for me. The topology in my test domain has CPU#0 and >>> CPU#2 cold-plugged, so the command adds 6 VCPUs. Viewed from the >>> firmware side, the 6 "device_add" commands, issued in close succession >>> by libvirtd, coalesce into 4 "batches". (And of course the firmware >>> sees the 4 batches back-to-back.) >> >> unfortunately, with more testing, I have run into two more races: >> >> (1) When a "device_add" occurs after the ACPI loop collects the CPUS >> from the register block, but before the SMI. >> >> Here, the "stray CPU" is processed fine by the firmware. However, >> the CTFY loop in ACPI does not know about the CPU, so it doesn't >> clear the pending insert event for it. And when the firmware is >> entered with an SMI for the *next* time, the firmware sees the same >> CPU *again* as pending, and tries to relocate it again. Bad things >> happen. >> >> (2) When a "device_add" occurs after the SMI, but before the firmware >> collects the pending CPUs from the register block. >> >> Here, the firmware collects the "stray CPU". However, the "broadcast >> SMI", with which we entered the firmware, did *not* cover the stray >> CPU -- the CPU_FOREACH() loop in ich9_apm_ctrl_changed() could not >> make the SMI pending for the new CPU, because at that time, the CPU >> had not been added yet. As a result, when the firmware sends an >> INIT-SIPI-SIPI to the new CPU, expecting it to boot right into SMM, >> the new CPU instead boots straight into the post-RSM (normal mode) >> "pen", skipping its initial SMI handler. Meaning that the CPU halts >> nicely, but its SMBASE is never relocated, and the SMRAM message >> exchange with the BSP falls apart. >> >> Possible mitigations I can think of: >> >> For problem (1): >> >> (1a) Change the firmware so it notices that it has relocated the >> "stray" CPU before -- such CPUs should be simply skipped in the >> firmware. The next time the CTFY loop runs in ACPI, it will clear >> the pending event. >> >> (1b) Alternatively, stop consuming the hotplug register block in the >> firmware altogether, and work out general message passing, from >> ACPI to firmware. See the outline here: >> >> >> http://mid.mail-archive.com/cf887d74-f65d-602a-9629-3d25cef93a69@redhat.com >> >> For problem (2): >> >> (2a) Change the firmware so that it sends a directed SMI as well to >> each CPU, just before sending an INIT-SIPI-SIPI. This should be >> idempotent -- if the broadcast SMI *has* covered the the CPU, >> then sending a directed SMI should make no difference. >> >> (2b) Alternatively, change the "device_add" command in QEMU so that, >> if "CPU hotplug with SMI" has been negotiated, the new CPU is >> added with the SMI made pending for it at once. (That is, no >> hot-plugged CPU would exist with the directed SMI *not* pending >> for it.) >> >> (2c) Alternatively, approach (1b) would fix problem (2) as well -- the >> firmware would only relocate such CPUs that ACPI collected before >> injecting the SMI. So all those CPUs would have the SMI pending. >> >> >> I can experiment with (1a) and (2a), > > My patches for (1a) and (1b) seem to work -- my workstation has 10
aargh, I meant my patches for (1a) and (2a). sorry Laszlo > PCPUs, and I'm using a guest with 20 possible VCPUs and 2 cold-plugged > VCPUs on it, for testing. The patches survive the hot-plugging of 18 > VCPUs in one go, or two batches like 9+9. I can see the fixes being > exercised. > > Unless you strongly disagree (or I find issues in further testing), I > propose that I post these fixes to edk2-devel (they should still be in > scope for the upcoming release), and that we stick with your current > patch series for QEMU (v3 -- upcoming, or maybe already posted). > > Thanks! > Laszlo >