On 11/09/2025 1:35 pm, Alejandro Vallejo wrote:
> On Thu Sep 11, 2025 at 2:03 PM CEST, Andrew Cooper wrote:
>> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>>> by the device model. The GPE handler checks this and compares it against
>>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>>> related bit in the bitmap and adjusting the table's checksum.
>>>
>>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>>> reaches 128, even if that overflows the MADT into some other (hopefully
>>> mapped) memory. The reading isn't as problematic as the writing though.
>>>
>>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>>> then the bit where the "online" flag would be is flipped, thus
>>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>>> that happened outside its range. It's all terrible.
>>>
>>> Note that this corruption happens regardless of the device-model being
>>> present or not, because even if the bitmap holds 0s, the overflowed
>>> memory might not at the bits corresponding to the "online" flag.
>>>
>>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>>
>>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>>> Reported-by: Grygorii Strashko <grygorii_stras...@epam.com>
>>> Signed-off-by: Alejandro Vallejo <alejandro.garciavall...@amd.com>
>>> ---
>>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is
>>> tested there.
>>>
>>> v2:
>>> * New patch with the general fix for HVM too. Turns out the correction
>>> logic was buggy after all.
>> Hmm, this does sound rather more serious. I have a nagging feeling that
>> until recently we always wrote 128 MADT entries.
> If so, I don't see where. It used to be 16, waaaaaaaaaaaaaaaaaaaaay back when.
> Then it got extended to whatever it needed to be.
>
> I have the nagging feeling that rather opaque "some OSs (cough Windows cough)
> don't like more than 16 CPUs was actually this bug in action. Making the DSDTs
> with exactly 16 CPUs a particular kind of silly.
>
>> So, while this looks like a good fix, I think we might want a second
>> Fixes tag.
> Happy to add it, but I really don't see anything like that in the git log.

I can't find it either. Maybe my memory is getting faulty.

~Andrew
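
For readers following along, the overflow described in the quoted commit message can be modelled roughly as below. This is only a Python sketch of the behaviour, not the AML in the DSDT. The helper names (`sync_lapic_flags`, `make_memory`), the flat `bytearray` standing in for guest memory, the 0xff filler after the table, and the simplification that the LAPIC entries start at offset 0 are all assumptions made for illustration. The checksum adjustment is left out. The 8-byte entry with its flags field at offset 4 follows the ACPI Processor Local APIC structure.

```python
MAX_CPUS = 128        # bound the handler loops to, per the commit message
LAPIC_ENTRY_LEN = 8   # MADT Processor Local APIC entry size in bytes
FLAGS_OFFSET = 4      # offset of the flags field inside an entry
ENABLED = 0x01        # bit 0 of the flags field, the "online" flag


def sync_lapic_flags(mem, base, online_bitmap, ncpus, bounded):
    """Make each entry's enabled flag match the online bitmap.

    With bounded=False this mimics the behaviour described as buggy:
    every index up to MAX_CPUS is treated as a LAPIC entry.
    With bounded=True, indices >= ncpus are skipped (the described fix).
    Returns the offsets of every byte that was modified.
    """
    touched = []
    for cpu in range(MAX_CPUS):
        if bounded and cpu >= ncpus:
            continue
        flag_at = base + cpu * LAPIC_ENTRY_LEN + FLAGS_OFFSET
        online = (online_bitmap >> cpu) & 1
        if (mem[flag_at] & ENABLED) != online:
            mem[flag_at] ^= ENABLED   # flip the "online" flag in place
            touched.append(flag_at)
    return touched


def make_memory(ncpus):
    """Hypothetical memory image: ncpus LAPIC entries, then unrelated 0xff bytes."""
    mem = bytearray(b"\xff" * (MAX_CPUS * LAPIC_ENTRY_LEN))
    for cpu in range(ncpus):
        start = cpu * LAPIC_ENTRY_LEN
        # type 0, length 8, processor ID, APIC ID, 32-bit flags (enabled)
        mem[start:start + LAPIC_ENTRY_LEN] = bytes(
            [0, LAPIC_ENTRY_LEN, cpu, cpu, ENABLED, 0, 0, 0])
    return mem


ncpus = 4
madt_end = ncpus * LAPIC_ENTRY_LEN
bitmap = 0b1111  # all four real vCPUs online, so in-range entries already agree

buggy = sync_lapic_flags(make_memory(ncpus), 0, bitmap, ncpus, bounded=False)
fixed = sync_lapic_flags(make_memory(ncpus), 0, bitmap, ncpus, bounded=True)

print(len(buggy), "bytes modified without the bound, first at offset", min(buggy))
print(len(fixed), "bytes modified with the bound")
```

In this model every unbounded write lands at or beyond `madt_end`, i.e. in whatever happens to follow the table, while the bounded variant leaves that memory alone.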