On 23.07.25 11:26, David Woodhouse wrote:
> On Thu, 2025-06-19 at 21:42 +0200, Mathias Krause wrote:
>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system
>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode
>> exception (#UD) as they are just the wrong instruction for the CPU
>> given. But instead of forwarding the exception to the guest, KVM tries
>> to patch the guest instruction to match the host's actual hypercall
>> instruction. That is doomed to fail as read-only code is rather the
>> standard these days. But, instead of letting go the patching attempt and
>> falling back to #UD injection, KVM injects the page fault instead.
>>
>> That's wrong on multiple levels. Not only isn't that a valid exception
>> to be generated by these instructions, confusing attempts to handle
>> them. It also destroys guest state by doing so, namely the value of CR2.
>>
>> Sean attempted to fix that in KVM[1] but the patch was never applied.
>>
>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least,
>> conceptually be disabled. Paolo even called out to add this very
>> functionality to disable the quirk in QEMU[3]. So let's just do it.
>>
>> A new property 'hypercall-patching=on|off' is added, for the very
>> unlikely case that there are setups that really need the patching.
>> However, these would be vulnerable to memory corruption attacks freely
>> overwriting code as they please. So, my guess is, there are exactly 0
>> systems out there requiring this quirk.
>
> I am always wary of making assumptions about how guests behave in the
> general case. Every time we do so, we seem to find that *some* ancient
> version of some random network appliance - or FreeBSD - does exactly
> the thing we considered unlikely. And customers get sad.
>
> As a general rule, before disabling a thing that even *might* have
> worked for a guest, I'd like to run in a 'warning' mode first. Only
> after running the whole fleet with such a warning and observing that it
> *doesn't* trigger, can we actually switch the thing *off*.

Looks like I was overly optimistic. There are, of course, use cases that
rely on the hypercall patching, even if it's just for testing purposes.
One of these is the KUT tests. I tried to fix these[1]. However, there
are probably more such mini-kernels, so I reverted to not changing the
default behaviour and only provided a knob to disable the quirk, making
users manually opt in to it[2].

> Can we have 'hypercall-patching=on|off|log' ?

I'd like to have the 'log' option as well. But as KVM does the patching
on its own, this would require QEMU to analyze and react to related #UD
exceptions (and possibly #PF, to handle the currently failing use cases
with read-only code too), which I'd rather not do.

Another option would be to do a WARN_ON[_ONCE]() in KVM if it does the
patching. But then existing use cases that used to work before would
suddenly trigger a kernel warning. Again, something users probably
don't want to see. :/

I guess we have to stick with the default but make users aware of the
option to disable the patching themselves.

Thanks,
Mathias

[1] https://lore.kernel.org/kvm/20250724191050.1988675-1-mini...@grsecurity.net/
[2] https://lore.kernel.org/kvm/20250801131226.2729893-1-mini...@grsecurity.net/
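
For illustration only, here is a minimal sketch of the decision an
opt-in knob like 'hypercall-patching=off' boils down to on the VMM
side. It is not the code from the patches referenced above. The enum,
the function name and the locally defined constant are hypothetical;
the constant is assumed to mirror KVM_X86_QUIRK_FIX_HYPERCALL_INSN
(bit 5) from the kernel's uapi headers and should be checked against
them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors KVM_X86_QUIRK_FIX_HYPERCALL_INSN (bit 5) from the
 * kernel uapi headers; defined locally so the sketch builds anywhere. */
#define QUIRK_FIX_HYPERCALL_INSN (UINT64_C(1) << 5)

/* Hypothetical representation of the 'hypercall-patching=on|off' knob. */
enum hypercall_patching {
    HYPERCALL_PATCHING_ON,   /* default: leave KVM's patching quirk alone */
    HYPERCALL_PATCHING_OFF   /* opt-in: ask KVM to disable the quirk */
};

/*
 * Compute the quirk mask a VMM would ask KVM to disable.
 *
 * 'supported' is the set of quirks the host kernel allows to be disabled.
 * Returns 0 and stores the mask (possibly 0) in *mask on success, or -1
 * if patching should be turned off but the host cannot disable the quirk.
 */
int hypercall_quirk_mask(enum hypercall_patching mode, uint64_t supported,
                         uint64_t *mask)
{
    assert(mask != NULL);
    *mask = 0;

    if (mode == HYPERCALL_PATCHING_ON)
        return 0;                      /* nothing to disable */

    if (!(supported & QUIRK_FIX_HYPERCALL_INSN))
        return -1;                     /* host kernel too old for the knob */

    *mask = QUIRK_FIX_HYPERCALL_INSN;
    return 0;
}
```

In a real VMM, 'supported' would come from KVM_CHECK_EXTENSION with
KVM_CAP_DISABLE_QUIRKS2, and a non-zero mask would go into args[0] of
the struct kvm_enable_cap passed to KVM_ENABLE_CAP on the VM file
descriptor. Keeping 'on' as the default matches the conclusion above:
existing guests keep working and only users who ask for it lose the
patching.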