On Wed Sep 10, 2025 at 7:01 PM CEST, Alejandro Vallejo wrote: > On Wed Sep 10, 2025 at 5:31 PM CEST, Jan Beulich wrote: >> On 10.09.2025 17:16, Alejandro Vallejo wrote: >>> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote: >>>> On 10.09.2025 16:49, Alejandro Vallejo wrote: >>>>> CPU hotplug relies on the guest having access to the legacy online CPU >>>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so >>>>> this causes the MADT to get corrupted due to spurious modifications of >>>>> the "online" flag in MADT entries and the table checksum during the >>>>> initial acpica passes. >>>> >>>> I don't understand this MADT corruption aspect, which - aiui - is why >>>> there's a Fixes: tag here. The code change itself looks plausible. >>> >>> When there's no DM to provide a real and honest online CPU bitmap on PIO >>> 0xAF00 >>> then we get all 1s (because there's no IOREQ server). Which confuses the GPE >>> handler. >>> >>> Somehow, the GPE handler is being triggered. Whether this is due to a real >>> SCI >>> or just it being spuriously executed as part of the initial acpica pass, I >>> don't >>> know. >>> >>> Both statements combined means the checksum and online flags in the MADT get >>> changed after initial parsing making it appear as-if all 128 CPUs were >>> plugged. >> >> I can follow this part (the online flags one, that is). >> >>> This patch makes the checksums be correct after acpica init. >> >> I'm still in trouble with this one. If MADT is modified in the process, >> there's >> only one of two possible options: >> 1) It's expected for the checksum to no longer be correct. >> 2) The checksum is being fixed up in the process. >> That's independent of being HVM or PVH and independent of guest boot or >> later. >> (Of course there's a sub-variant of 2, where the adjusting of the checksum >> would be broken, but that wouldn't be covered by your change.) >> >> Jan > > I see what you mean now. The checksum correction code LOOKS correct. But I > wonder about the table length... We report a table as big as it needs to be, > but the checksum update is done irrespective of FLG being inside the valid > range > of the MADT. If a guest with 2 vCPUs (in max_vcpus) sees vCPU127 being > signalled > that'd trigger the (unseen) online flag to be enabled and the checksum > adjusted, > except the checksum must not being adjusted. > > I could add even more AML to cover that, but that'd be QEMU misbehaving (or > being absent). This patch covers the latter case, but it might be good to > change the commit message to reflect the real problem. > > Cheers, > Alejandro
It doesn't quite add up in the mismatch though. There might be something else lurking in there. Regardless, I don't want this junk in PVH. Would a commit reword suffice to have it acked? Cheers, Alejandro