On 15/11/2024 9:28 am, Jan Beulich wrote: > On 14.11.2024 19:22, Andrew Cooper wrote: >> It may have taken a while, but it occurs to me that the mentioned commit >> fixed >> a second problem too. >> >> On entering trampoline_boot_cpu_entry(), %esp points at the trampoline stack, >> but in a 32bit flat segment. It happens to be page aligned. >> >> When dropping into 16bit mode, the stack segment operates on %sp, preserving >> the upper bits. Prior to 1ed76797439e, the top nibble of %sp would depend on >> where the trampoline was placed in low memory, and only had a 1/16 chance of >> being 0 and therefore operating on the intended stack. >> >> There was a 15/16 chance of using a different page in the trampoline as if it >> were the stack. Therefore, zeroing %esp was correct, but for more reasons >> than realised at the time. > I'm afraid I don't follow this analysis. Said commit replaced clearing of %sp > by clearing of %esp.
Correct > That made no difference for anything using the 16-bit > register. True, but Xen's 16bit code isn't very relevant to the analysis. Fujitsu's BIOS is. > I don't see how the top nibble of %sp could have been non-zero > prior to that change. Oh, that's a typo. It should have been the 5th nibble of %esp. Said nibble depends entirely on where the trampoline is placed in low memory. We first enter the trampoline in 32bit flat mode with %esp being the absolute address of the stack. i.e. it's 0x000yyyyy with a 15/16th's chance of the 5th nibble being non-zero. Then we drop down into Real mode (non flat, because the trampoline never overlaps the IVT at 0). At this point we used to zero %sp which preserves %esp's 5th nibble. And in the case that went wrong, INT $0x15 corrupted memory that happened to be in the Xen image. Anyway, when I was debugging 11 years ago, I noticed that %esp was nonzero in its upper half and, despite deciding this was suspicious, couldn't figure out why and zeroing it all fixed the memory corruption. I also didn't appreciate that `xor %sp, %sp` was strongly dependent on the trampoline being at exactly trampoline_start + 64k. Anyway, given that everyone seemed to be confused, I guess I need to try rewriting the commit message. ~Andrew
