Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
Hey Nicolas, On 08.11.23 12:17, Nicolas Saenz Julienne wrote: Hyper-V's Virtual Secure Mode (VSM) is a virtualisation security feature that leverages the hypervisor to create secure execution environments within a guest. VSM is documented as part of Microsoft's Hypervisor Top Level Functional Specification [1]. Security features that build upon VSM, like Windows Credential Guard, are enabled by default on Windows 11, and are becoming a prerequisite in some industries. This RFC series introduces the necessary infrastructure to emulate VSM enabled guests. It is a snapshot of the progress we made so far, and its main goal is to gather design feedback. Specifically on the KVM APIs we introduce. For a high level design overview, see the documentation in patch 33. Additionally, this topic will be discussed as part of the KVM Micro-conference, in this year's Linux Plumbers Conference [2]. Awesome, looking forward to the session! :) The series is accompanied by two repositories: - A PoC QEMU implementation of VSM [3]. - VSM kvm-unit-tests [4]. Note that this isn't a full VSM implementation. For now it only supports 2 VTLs, and only runs on uniprocessor guests. It is capable of booting Windows Sever 2016/2019, but is unstable during runtime. How much of these limitations are inherent in the current set of patches? What is missing to go beyond 2 VTLs and into SMP land? Anything that will require API changes? Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
On Wed Nov 8, 2023 at 11:40 AM UTC, Alexander Graf wrote: > Hey Nicolas, [...] > > The series is accompanied by two repositories: > > - A PoC QEMU implementation of VSM [3]. > > - VSM kvm-unit-tests [4]. > > > > Note that this isn't a full VSM implementation. For now it only supports > > 2 VTLs, and only runs on uniprocessor guests. It is capable of booting > > Windows Sever 2016/2019, but is unstable during runtime. > > How much of these limitations are inherent in the current set of > patches? What is missing to go beyond 2 VTLs and into SMP land? Anything > that will require API changes? The main KVM concepts introduced by this series are ready to deal with any number of VTLs (APIC ID groups, VTL KVM device). KVM_HV_GET_VSM_STATE should provide a copy of 'vsm_code_page_offsets' per-VTL, since the hypercall page is partition wide but per-VTL. Attaching that information as a VTL KVM device attribute fits that requirement nicely. I'd prefer going that way especially if the VTL KVM device has a decent reception. Also, the secure memory intecepts and HVCALL_TRANSLATE_VIRTUAL_ADDRESS take some VTL related shortcuts, but those are going away. Otherwise, I don't see any necessary in-kernel changes. When virtualizing Windows with VSM I've never seen usages that go beyond VTL1. So enabling VTL > 1 will be mostly a kvm-unit-tests effort. As for SMP, it just a matter of work. Notably HvStartVirtualProcessor and HvGetVpIndexFromApicId need to be implemented, and making sure the QEMU VTL scheduling code holds up. Nicolas
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
On Wed, Nov 08, 2023, Nicolas Saenz Julienne wrote: > This RFC series introduces the necessary infrastructure to emulate VSM > enabled guests. It is a snapshot of the progress we made so far, and its > main goal is to gather design feedback. Heh, then please provide an overview of the design, and ideally context and/or justification for various design decisions. It doesn't need to be a proper design doc, and you can certainly point at other documentation for explaining VSM/VTLs, but a few paragraphs and/or verbose bullet points would go a long way. The documentation in patch 33 provides an explanation of VSM itself, and a little insight into how userspace can utilize the KVM implementation. But the documentation provides no explanation of the mechanics that KVM *developers* care about, e.g. the use of memory attributes, how memory attributes are enforced, whether or not an in-kernel local APIC is required, etc. Nor does the documentation explain *why*, e.g. why store a separate set of memory attributes per VTL "device", which by the by is broken and unnecessary. > Specifically on the KVM APIs we introduce. For a high level design overview, > see the documentation in patch 33. > > Additionally, this topic will be discussed as part of the KVM > Micro-conference, in this year's Linux Plumbers Conference [2]. > > The series is accompanied by two repositories: > - A PoC QEMU implementation of VSM [3]. > - VSM kvm-unit-tests [4]. > > Note that this isn't a full VSM implementation. For now it only supports > 2 VTLs, and only runs on uniprocessor guests. It is capable of booting > Windows Sever 2016/2019, but is unstable during runtime. > > The series is based on the v6.6 kernel release, and depends on the > introduction of KVM memory attributes, which is being worked on > independently in "KVM: guest_memfd() and per-page attributes" [5]. This doesn't actually apply on 6.6 with v14 of guest_memfd, because v14 of guest_memfd is based on kvm-6.7-1. Ah, and looking at your github repo, this isn't based on v14 at all, it's based on v12. That's totally fine, but the cover letter needs to explicitly, clearly, and *accurately* state the dependencies. I can obviously grab the full branch from github, but that's not foolproof, e.g. if you accidentally delete or force push to that branch. And I also prefer to know that what I'm replying to on list is the exact same code that I am looking at. > A full Linux tree is also made available [6]. > > Series rundown: > - Patch 2 introduces the concept of APIC ID groups. > - Patches 3-12 introduce the VSM capability and basic VTL awareness into >Hyper-V emulation. > - Patch 13 introduces vCPU polling support. > - Patches 14-31 use KVM's memory attributes to implement VTL memory >protections. Introduces the VTL KMV device and secure memory >intercepts. > - Patch 32 is a temporary implementation of >HVCALL_TRANSLATE_VIRTUAL_ADDRESS necessary to boot Windows 2019. > - Patch 33 introduces documentation. > > Our intention is to integrate feedback gathered in the RFC and LPC while > we finish the VSM implementation. In the future, we will split the series > into distinct feature patch sets and upstream these independently. > > Thanks, > Nicolas > > [1] > https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf > [2] https://lpc.events/event/17/sessions/166/#20231114 > [3] https://github.com/vianpl/qemu/tree/vsm-rfc-v1 > [4] https://github.com/vianpl/kvm-unit-tests/tree/vsm-rfc-v1 > [5] https://lore.kernel.org/lkml/20231105163040.14904-1-pbonz...@redhat.com/. > [6] Full tree: https://github.com/vianpl/linux/tree/vsm-rfc-v1. When providing github links, my preference is to format the pointers as: or tags/ e.g. https://github.com/vianpl/linux vsm-rfc-v1 so that readers can copy+paste the full thing directly into `git fetch`. It's a minor thing, but AFAIK no one actually does review by clicking through github's webview. > There are also two small dependencies with > https://marc.info/?l=kvm&m=167887543028109&w=2 and > https://lkml.org/lkml/2023/10/17/972 Please use lore links, there's zero reason to use anything else these days. For those of us that use b4, lore links make life much easier.
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
On Wed, Nov 08, 2023, Sean Christopherson wrote: > On Wed, Nov 08, 2023, Nicolas Saenz Julienne wrote: > > This RFC series introduces the necessary infrastructure to emulate VSM > > enabled guests. It is a snapshot of the progress we made so far, and its > > main goal is to gather design feedback. > > Heh, then please provide an overview of the design, and ideally context and/or > justification for various design decisions. It doesn't need to be a proper > design > doc, and you can certainly point at other documentation for explaining > VSM/VTLs, > but a few paragraphs and/or verbose bullet points would go a long way. > > The documentation in patch 33 provides an explanation of VSM itself, and a > little > insight into how userspace can utilize the KVM implementation. But the > documentation > provides no explanation of the mechanics that KVM *developers* care about, > e.g. > the use of memory attributes, how memory attributes are enforced, whether or > not > an in-kernel local APIC is required, etc. > > Nor does the documentation explain *why*, e.g. why store a separate set of > memory > attributes per VTL "device", which by the by is broken and unnecessary. After speed reading the series.. An overview of the design, why you made certain choices, and the tradeoffs between various options is definitely needed. A few questions off the top of my head: - What is the split between userspace and KVM? How did you arrive at that split? - How much *needs* to be in KVM? I.e. how much can be pushed to userspace while maintaininly good performance? - Why not make VTLs a first-party concept in KVM? E.g. rather than bury info in a VTL device and APIC ID groups, why not modify "struct kvm" to support replicating state that needs to be tracked per-VTL? Because of how memory attributes affect hugepages, duplicating *memslots* might actually be easier than teaching memslots to be VTL-aware. - Is "struct kvm_vcpu" the best representation of an execution context (if I'm getting the terminology right)? E.g. if 90% of the state is guaranteed to be identical for a given vCPU across execution contexts, then modeling that with separate kvm_vcpu structures is very inefficient. I highly doubt it's 90%, but it might be quite high depending on how much the TFLS restricts the state of the vCPU, e.g. if it's 64-bit only. The more info you can provide before LPC, the better, e.g. so that we can spend time discussing options instead of you getting peppered with questions about the requirements and whatnot.
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
Hi Sean, Thanks for taking the time to review the series. I took note of your comments across the series, and will incorporate them into the LPC discussion. On Wed Nov 8, 2023 at 6:33 PM UTC, Sean Christopherson wrote: > On Wed, Nov 08, 2023, Sean Christopherson wrote: > > On Wed, Nov 08, 2023, Nicolas Saenz Julienne wrote: > > > This RFC series introduces the necessary infrastructure to emulate VSM > > > enabled guests. It is a snapshot of the progress we made so far, and its > > > main goal is to gather design feedback. > > > > Heh, then please provide an overview of the design, and ideally context > > and/or > > justification for various design decisions. It doesn't need to be a proper > > design > > doc, and you can certainly point at other documentation for explaining > > VSM/VTLs, > > but a few paragraphs and/or verbose bullet points would go a long way. > > > > The documentation in patch 33 provides an explanation of VSM itself, and a > > little > > insight into how userspace can utilize the KVM implementation. But the > > documentation > > provides no explanation of the mechanics that KVM *developers* care about, > > e.g. > > the use of memory attributes, how memory attributes are enforced, whether > > or not > > an in-kernel local APIC is required, etc. > > > > Nor does the documentation explain *why*, e.g. why store a separate set of > > memory > > attributes per VTL "device", which by the by is broken and unnecessary. > > After speed reading the series.. An overview of the design, why you made > certain > choices, and the tradeoffs between various options is definitely needed. > > A few questions off the top of my head: > > - What is the split between userspace and KVM? How did you arrive at that > split? Our original design, which we discussed in the KVM forum 2023 [1] and is public [2], implemented most of VSM in-kernel. Notably we introduced VTL awareness in struct kvm_vcpu. This turned out to be way too complex: now vcpus have multiple CPU architectural states, events, apics, mmu, etc. First of all, the code turned out to be very intrusive, for example, most apic APIs had to be reworked one way or another to accommodate the fact there are multiple apics available. Also, we were forced to introduce VSM-specific semantics in x86 emulation code. But more importantly, the biggest pain has been dealing with state changes, they may be triggered remotely through requests, and some are already fairly delicate as-is. They involve a multitude of corner cases that almost never apply for a VTL aware kvm_vcpu. Especially if you factor in features like live migration. It's been a continuous source of regressions. Memory protections were implemented by using memory slot modifications. We introduced a downstream API that allows updating memory slots concurrently with vCPUs running. I think there was a similar proposal upstream from Red Hat some time ago. The result is complex, hard to generalize and slow. So we decided to move all this complexity outside of struct kvm_vcpu and, as much as possible, out of the kernel. We figures out the basic kernel building blocks that are absolutely necessary, and let user-space deal with the rest. > - How much *needs* to be in KVM? I.e. how much can be pushed to userspace > while >maintaininly good performance? As I said above, the aim of the current design is to keep it as light as possible. The biggest move we made was moving VTL switching into user-space. We don't see any indication that performance is affected in a major way. But we will know for sure once we finish the implementation and test it under real use-cases. > - Why not make VTLs a first-party concept in KVM? E.g. rather than bury info >in a VTL device and APIC ID groups, why not modify "struct kvm" to support >replicating state that needs to be tracked per-VTL? Because of how memory >attributes affect hugepages, duplicating *memslots* might actually be > easier >than teaching memslots to be VTL-aware. I do agree that we need to introduce some level VTL awareness into memslots. There's the hugepages issues you pointed out. But it'll be also necessary once we look at how to implement overlay pages that are per-VTL. (A topic I didn't mention in the series as I though I had managed to solve memory protections while avoiding the need for multiple slots). What I have in mind is introducing a memory slot address space per-VTL, similar to how we do things with SMM. It's important to note that requirements for overlay pages and memory protections are very different. Overlay pages are scarce, and are setup once and never change (AFAICT), so we think stopping all vCPUs, updating slots, and resuming execution will provide good enough performance. Memory protections happen very frequently, generally with page granularity, and may be short-lived. > - Is "struct kvm_vcpu" the best representation of an execution context (if > I'm >getting the terminology right)? Le
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
On Wed Nov 8, 2023 at 4:55 PM UTC, Sean Christopherson wrote: > > This RFC series introduces the necessary infrastructure to emulate VSM > > enabled guests. It is a snapshot of the progress we made so far, and its > > main goal is to gather design feedback. > > Heh, then please provide an overview of the design, and ideally context and/or > justification for various design decisions. It doesn't need to be a proper > design > doc, and you can certainly point at other documentation for explaining > VSM/VTLs, > but a few paragraphs and/or verbose bullet points would go a long way. > > The documentation in patch 33 provides an explanation of VSM itself, and a > little > insight into how userspace can utilize the KVM implementation. But the > documentation > provides no explanation of the mechanics that KVM *developers* care about, > e.g. > the use of memory attributes, how memory attributes are enforced, whether or > not > an in-kernel local APIC is required, etc. Noted, I'll provide a design review on the next submission. > Nor does the documentation explain *why*, e.g. why store a separate set of > memory > attributes per VTL "device", which by the by is broken and unnecessary. It's clear to me how the current implementation of VTL devices is broken. But unncessary? That made me think we could inject the VTL In the memory attribute key, for ex. with 'gfn | vtl << 58'. And then use generic API and a single xarray. Nicolas
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
On Fri, Nov 10, 2023, Nicolas Saenz Julienne wrote: > On Wed Nov 8, 2023 at 6:33 PM UTC, Sean Christopherson wrote: > > - What is the split between userspace and KVM? How did you arrive at that > > split? > > Our original design, which we discussed in the KVM forum 2023 [1] and is > public [2], implemented most of VSM in-kernel. Notably we introduced VTL > awareness in struct kvm_vcpu. ... > So we decided to move all this complexity outside of struct kvm_vcpu > and, as much as possible, out of the kernel. We figures out the basic > kernel building blocks that are absolutely necessary, and let user-space > deal with the rest. Sorry, I should have been more clear. What's the split in terms of responsibilities, i.e. what will KVM's ABI look like? E.g. if the vCPU=>VTLs setup is nonsensical, does KVM care? My general preference is for KVM to be as permissive as possible, i.e. let userspace do whatever it wants so long as it doesn't place undue burden on KVM. But at the same time I don't to end up in a similar boat as many of the paravirt features, where things just stop working if userspace or the guest makes a goof. > > - Why not make VTLs a first-party concept in KVM? E.g. rather than bury > > info > >in a VTL device and APIC ID groups, why not modify "struct kvm" to > > support > >replicating state that needs to be tracked per-VTL? Because of how > > memory > >attributes affect hugepages, duplicating *memslots* might actually be > > easier > >than teaching memslots to be VTL-aware. > > I do agree that we need to introduce some level VTL awareness into > memslots. There's the hugepages issues you pointed out. But it'll be > also necessary once we look at how to implement overlay pages that are > per-VTL. (A topic I didn't mention in the series as I though I had > managed to solve memory protections while avoiding the need for multiple > slots). What I have in mind is introducing a memory slot address space > per-VTL, similar to how we do things with SMM. N (I hate memslot address spaces :-) ) Why not represent each VTL with a separate "struct kvm" instance? That would naturally provide per-VTL behavior for: - APIC groups - memslot overlays - memory attributes (and their impact on hugepages) - MMU pages The only (obvious) issue with that approach would be cross-VTL operations. IIUC, sending IPIs across VTLs isn't allowed, but even if it were, that should be easy enough to solve, e.g. KVM already supports posting interrupts from non-KVM sources. GVA=>GPA translation would be trickier, but that patch suggests you want to handle that in userspace anyways. And if translation is a rare/slow path, maybe it could simply be punted to userspace? NOTE: The hypercall implementation is incomplete and only shared for completion. Additionally we'd like to move the VTL aware parts to user-space. Ewww, and looking at what it would take to support cross-VM translations shows another problem with using vCPUs to model VTLs. Nothing prevents userspace from running a virtual CPU at multiple VTLs concurrently, which means that anything that uses kvm_hv_get_vtl_vcpu() is unsafe, e.g. walk_mmu->gva_to_gpa() could be modified while kvm_hv_xlate_va_walk() is running. I suppose that's not too hard to solve, e.g. mutex_trylock() and bail if something holds the other kvm_vcpu/VTL's mutex. Though ideally, KVM would punt all cross-VTL operations to userspace. :-) If punting to userspace isn't feasible, using a struct kvm per VTL probably wouldn't make the locking and concurrency problems meaningfully easier or harder to solve. E.g. KVM could require VTLs, i.e. "struct kvm" instances that are part of a single virtual machine, to belong to the same process. That'd avoid headaches with mm_struct, at which point I don't _think_ getting and using a kvm_vcpu from a different kvm would need special handling? Heh, another fun one, the VTL handling in kvm_hv_send_ipi() is wildly broken, the in_vtl field is consumed before send_ipi is read from userspace. union hv_input_vtl *in_vtl; u64 valid_bank_mask; u32 vector; bool all_cpus; u8 vtl; /* VTL is at the same offset on both IPI types */ in_vtl = &send_ipi.in_vtl; vtl = in_vtl->use_target_vtl ? in_vtl->target_vtl : kvm_hv_get_active_vtl(vcpu); > >E.g. if 90% of the state is guaranteed to be identical for a given > >vCPU across execution contexts, then modeling that with separate > >kvm_vcpu structures is very inefficient. I highly doubt it's 90%, > >but it might be quite high depending on how much the TFLS restricts > >the state of the vCPU, e.g. if it's 64-bit only. > > For the record here's the private VTL state (TLFS 15.11.1): > > "In general, each VTL has its own control registers, RIP register, RSP > register, and MSRs: > > SYSENTER_CS, SYSENTER_ESP, SYSENTER_EIP, STAR, LSTAR, CSTAR, SFMASK, > EFER, PAT, KERNEL_GSBASE
Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
On Fri Nov 10, 2023 at 7:32 PM UTC, Sean Christopherson wrote: > On Fri, Nov 10, 2023, Nicolas Saenz Julienne wrote: > > On Wed Nov 8, 2023 at 6:33 PM UTC, Sean Christopherson wrote: > > > - What is the split between userspace and KVM? How did you arrive at > > > that split? > > > > Our original design, which we discussed in the KVM forum 2023 [1] and is > > public [2], implemented most of VSM in-kernel. Notably we introduced VTL > > awareness in struct kvm_vcpu. > > ... > > > So we decided to move all this complexity outside of struct kvm_vcpu > > and, as much as possible, out of the kernel. We figures out the basic > > kernel building blocks that are absolutely necessary, and let user-space > > deal with the rest. > > Sorry, I should have been more clear. What's the split in terms of > responsibilities, > i.e. what will KVM's ABI look like? E.g. if the vCPU=>VTLs setup is > nonsensical, > does KVM care? > > My general preference is for KVM to be as permissive as possible, i.e. let > userspace do whatever it wants so long as it doesn't place undue burden on > KVM. > But at the same time I don't to end up in a similar boat as many of the > paravirt > features, where things just stop working if userspace or the guest makes a > goof. I'll make sure to formalize this for whenever I post a full series, I need to go over every hcall and think from that perspective. There are some rules it might make sense to enforce. But it really depends on the abstractions we settle on. KVM might not have the necessary introspection to enforce them. IMO ideally it wouldn't, VTLs should remain a user-space concept. My approach so far has been trusting QEMU is doing the right thing. Some high level examples come to mind: - Only one VTL vCPU might run at all times. - Privileged VTL interrupts have precedence over lower VTL execution. - lAPICs can only access their VTL. (Cross VTL IPIs happen through the PV interface). - Lower VTL state should be up to date when accessed from privileged VTLs (through the GET/SET_VP_REGSITER hcall). > > > - Why not make VTLs a first-party concept in KVM? E.g. rather than bury > > > info > > >in a VTL device and APIC ID groups, why not modify "struct kvm" to > > > support > > >replicating state that needs to be tracked per-VTL? Because of how > > > memory > > >attributes affect hugepages, duplicating *memslots* might actually be > > > easier > > >than teaching memslots to be VTL-aware. > > > > I do agree that we need to introduce some level VTL awareness into > > memslots. There's the hugepages issues you pointed out. But it'll be > > also necessary once we look at how to implement overlay pages that are > > per-VTL. (A topic I didn't mention in the series as I though I had > > managed to solve memory protections while avoiding the need for multiple > > slots). What I have in mind is introducing a memory slot address space > > per-VTL, similar to how we do things with SMM. > > N (I hate memslot address spaces :-) ) > > Why not represent each VTL with a separate "struct kvm" instance? That would > naturally provide per-VTL behavior for: > > - APIC groups > - memslot overlays > - memory attributes (and their impact on hugepages) > - MMU pages Very interesting idea! I'll spend some time researching it, it sure solves a lot of issues. > The only (obvious) issue with that approach would be cross-VTL operations. > IIUC, > sending IPIs across VTLs isn't allowed, but even if it were, that should be > easy > enough to solve, e.g. KVM already supports posting interrupts from non-KVM > sources. Correct. Only through kvm_hv_send_ipi(), but from experience it happens very rarely, so performance shouldn't be critical. > GVA=>GPA translation would be trickier, but that patch suggests you want to > handle > that in userspace anyways. And if translation is a rare/slow path, maybe it > could > simply be punted to userspace? > > NOTE: The hypercall implementation is incomplete and only shared for > completion. Additionally we'd like to move the VTL aware parts to > user-space. > > Ewww, and looking at what it would take to support cross-VM translations shows > another problem with using vCPUs to model VTLs. Nothing prevents userspace > from > running a virtual CPU at multiple VTLs concurrently, which means that anything > that uses kvm_hv_get_vtl_vcpu() is unsafe, e.g. walk_mmu->gva_to_gpa() could > be > modified while kvm_hv_xlate_va_walk() is running. > > I suppose that's not too hard to solve, e.g. mutex_trylock() and bail if > something > holds the other kvm_vcpu/VTL's mutex. Though ideally, KVM would punt all > cross-VTL > operations to userspace. :-) > > If punting to userspace isn't feasible, using a struct kvm per VTL probably > wouldn't > make the locking and concurrency problems meaningfully easier or harder to > solve. > E.g. KVM could require VTLs, i.e. "struct kvm" instances that are part of a > single > virtual ma