On 11/09/2025 1:35 pm, Alejandro Vallejo wrote:
> On Thu Sep 11, 2025 at 2:03 PM CEST, Andrew Cooper wrote:
>> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>>> by the device model. The GPE handler checks this and compares it against
>>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>>> related bit in the bitmap and adjusting the table's checksum.
>>>
>>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>>> reaches 128, even if that overflows the MADT into some other (hopefully
>>> mapped) memory. The reading isn't as problematic as the writing though.
>>>
>>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>>> then the bit where the "online" flag would be is flipped, thus
>>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>>> that happened outside its range. It's all terrible.
>>>
>>> Note that this corruption happens regardless of the device-model being
>>> present or not, because even if the bitmap holds 0s, the overflowed
>>> memory might not hold 0s at the bits corresponding to the "online" flag.
>>>
>>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>>
>>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>>> Reported-by: Grygorii Strashko <grygorii_stras...@epam.com>
>>> Signed-off-by: Alejandro Vallejo <alejandro.garciavall...@amd.com>
>>> ---
>>> Half RFC. Not thoroughly tested. Pipeline is green, but none of this is
>>> tested there.
>>>
>>> v2:
>>>   * New patch with the general fix for HVM too. Turns out the correction
>>>     logic was buggy after all.
>> Hmm, this does sound rather more serious.  I have a nagging feeling that
>> until recently we always wrote 128 MADT entries.
> If so, I don't see where. It used to be 16, waaaaaaaaaaaaaaaaaaaaay back when.
> Then it got extended to whatever it needed to be.
>
> I have the nagging feeling that the rather opaque "some OSs (cough Windows cough)
> don't like more than 16 CPUs" was actually this bug in action. Making the DSDTs
> with exactly 16 CPUs a particular kind of silly.
>
>> So, while this looks like a good fix, I think we might want a second
>> Fixes tag.
> Happy to add it, but I really don't see anything like that in the git log.

I can't find it either.  Maybe my memory is getting faulty.

~Andrew
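
For readers following the thread, the overflow described in the quoted commit message can be modelled roughly as below. This is only a sketch in Python, not the actual AML bytecode or the patch: the function names, the flat `bytearray` standing in for guest memory, and the way the checksum is adjusted are all assumptions made for illustration. The layout constants follow the ACPI MADT format (44-byte table header, 8-byte Processor Local APIC entries with the enabled bit in bit 0 of the flags field, checksum at header offset 9).

```python
MADT_HEADER_LEN = 44   # 36-byte ACPI header + 8 bytes of MADT-specific fields
LAPIC_ENTRY_LEN = 8    # type, length, ACPI UID, APIC ID, 4-byte flags
LAPIC_FLAGS_OFF = 4    # offset of the flags field inside a LAPIC entry
CHECKSUM_OFF = 9       # checksum byte inside the ACPI table header
MAX_SCANNED = 128      # upper bound the handler scans, per the commit message


def build_memory(ncpus, trailing):
    """Hypothetical helper: a MADT with `ncpus` offline LAPIC entries,
    followed by unrelated bytes. Returns (memory, madt_length)."""
    madt = bytearray(MADT_HEADER_LEN)
    for cpu in range(ncpus):
        madt += bytes([0, LAPIC_ENTRY_LEN, cpu, cpu * 2, 0, 0, 0, 0])
    # ACPI tables must sum to zero modulo 256.
    madt[CHECKSUM_OFF] = (-sum(madt)) & 0xFF
    return madt + bytearray(trailing), len(madt)


def sync_online_flags(mem, bitmap, limit):
    """Model of the GPE handler: make each entry's "online" flag match its
    bit in `bitmap`, fixing up the checksum for every flip.

    limit=MAX_SCANNED models the buggy behaviour; limit=NCPUS models
    skipping entries >= NCPUS.
    """
    for cpu in range(limit):
        flag_off = MADT_HEADER_LEN + cpu * LAPIC_ENTRY_LEN + LAPIC_FLAGS_OFF
        if flag_off >= len(mem):
            break  # keep the model itself from running off its fake memory
        want = (bitmap >> cpu) & 1
        have = mem[flag_off] & 1
        if want != have:
            mem[flag_off] ^= 1                 # flip the "online" bit
            delta = 1 if want else -1          # change in the flag byte
            mem[CHECKSUM_OFF] = (mem[CHECKSUM_OFF] - delta) & 0xFF
```

With 4 vCPUs, an all-zero bitmap and non-zero bytes after the table, calling `sync_online_flags(mem, 0, MAX_SCANNED)` clears bit 0 in bytes beyond the MADT and leaves the table's checksum wrong, while `sync_online_flags(mem, 0, 4)` leaves everything untouched. That matches the point that the corruption does not depend on a device model being present.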
