On 23.07.25 11:26, David Woodhouse wrote:
> On Thu, 2025-06-19 at 21:42 +0200, Mathias Krause wrote:
>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system
>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode
>> exception (#UD) as they are just the wrong instruction for the CPU
>> given. But instead of forwarding the exception to the guest, KVM tries
>> to patch the guest instruction to match the host's actual hypercall
>> instruction. That is doomed to fail as read-only code is rather the
>> standard these days. But, instead of letting go of the patching attempt and
>> falling back to #UD injection, KVM injects the page fault instead.
>>
>> That's wrong on multiple levels. Not only isn't that a valid exception
>> to be generated by these instructions, confusing attempts to handle
>> them. It also destroys guest state by doing so, namely the value of CR2.
>>
>> Sean attempted to fix that in KVM[1] but the patch was never applied.
>>
>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least,
>> conceptually be disabled. Paolo even called out to add this very
>> functionality to disable the quirk in QEMU[3]. So let's just do it.
>>
>> A new property 'hypercall-patching=on|off' is added, for the very
>> unlikely case that there are setups that really need the patching.
>> However, these would be vulnerable to memory corruption attacks freely
>> overwriting code as they please. So, my guess is, there are exactly 0
>> systems out there requiring this quirk.
> 
> I am always wary of making assumptions about how guests behave in the
> general case. Every time we do so, we seem to find that *some* ancient
> version of some random network appliance — or FreeBSD — does exactly
> the thing we considered unlikely. And customers get sad.
> 
> As a general rule, before disabling a thing that even *might* have
> worked for a guest, I'd like to run in a 'warning' mode first. Only
> after running the whole fleet with such a warning and observing that it
> *doesn't* trigger, can we actually switch the thing *off*.

Looks like I was overly optimistic. There are, of course, use cases that
rely on the hypercall patching, even if it's just for testing purposes.
One of these is kvm-unit-tests (KUT). I tried to fix those tests[1], but
there are probably more such mini-kernels, so I went back to not
changing the default behaviour and only provide a knob to disable the
quirk, which users have to opt in to manually[2].

> 
> Can we have 'hypercall-patching=on|off|log' ? 

I'd like to have the 'log' option as well. But as KVM does the patching
on its own, this would require QEMU to analyze and react to the related
#UD exceptions (and possibly #PF, to handle the currently failing use
cases with read-only code too), which I'd rather not do.

Another option would be to do a WARN_ON[_ONCE]() in KVM if it does the
patching. But then existing use cases that used to work silently would
suddenly trigger a kernel warning. Again, something users probably
don't want to see. :/

I guess we have to stick with the default but make users aware of the
option to disable the patching themselves.
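
For reference, a minimal sketch of what disabling the patching boils
down to on the KVM side, assuming a kernel that has
KVM_CAP_DISABLE_QUIRKS2 and the KVM_X86_QUIRK_FIX_HYPERCALL_INSN quirk
bit. This is not the QEMU patch itself: the two helper names are made up
for illustration, and the fallback defines are only there for older or
non-x86 headers.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for older or non-x86 headers; values as in the upstream uapi. */
#ifndef KVM_CAP_DISABLE_QUIRKS2
#define KVM_CAP_DISABLE_QUIRKS2 213
#endif
#ifndef KVM_X86_QUIRK_FIX_HYPERCALL_INSN
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
#endif

/* Hypothetical helper: build the request that disables hypercall patching. */
struct kvm_enable_cap hypercall_patching_off_request(void)
{
    struct kvm_enable_cap cap;

    memset(&cap, 0, sizeof(cap));
    cap.cap = KVM_CAP_DISABLE_QUIRKS2;
    cap.args[0] = KVM_X86_QUIRK_FIX_HYPERCALL_INSN; /* quirks to disable */
    return cap;
}

/*
 * Hypothetical helper: disable the quirk on an already created VM fd.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int disable_hypercall_patching(int vm_fd)
{
    struct kvm_enable_cap cap = hypercall_patching_off_request();
    int supported;

    /* For this capability, KVM reports the mask of quirks it can disable. */
    supported = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DISABLE_QUIRKS2);
    if (supported < 0)
        return -1; /* errno already set by ioctl() */
    if (!(supported & KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
        errno = ENOTSUP; /* kernel does not know this quirk bit */
        return -1;
    }

    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

With the quirk disabled, the guest should simply get the #UD for the
wrong hypercall instruction instead of a patching attempt.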

Thanks,
Mathias

[1]
https://lore.kernel.org/kvm/20250724191050.1988675-1-mini...@grsecurity.net/
[2]
https://lore.kernel.org/kvm/20250801131226.2729893-1-mini...@grsecurity.net/
