Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
> > How come we get to pin the page and directly dereference it every time,
> > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > instead?
>
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().

I have a new plan. Screw pinning it and mapping it into kernel space
just because we need to do atomic operations on it.

Futexes can do atomic operations on userspace; so can we. We just add
atomic_cmpxchg_user() and use that.

We can add kvm_cmpxchg_guest_offset_cached() and use that from places
like record_steal_time(). That works nicely for all of the Xen support
needs too, I think -- since test-and-set and test-and-clear bit
operations all want to be built on cmpxchg.

The one thing I can think of offhand that it doesn't address easily is
the testing of vcpu_info->evtchn_upcall_pending in
kvm_xen_cpu_has_interrupt(), which really does want to be fast, so I'm
not sure I want to be using copy_from_user() for that.

But I think I can open-code that to do the cache checking that
kvm_read_guest_offset_cached() does, and use __get_user() directly in
the fast path, falling back to actually calling
kvm_read_guest_offset_cached() when it needs to.

That will suffice until/unless we grow more use cases which would make
it worth providing that as a kvm_get_guest...() set of functions for
generic use.
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Thu, Dec 03, 2020, David Woodhouse wrote:
> On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > > On IRC, Paolo told me that permanent pinning causes problems for memory
> > > hotplug, and pointed me at the trick we do with an MMU notifier and
> > > kvm_vcpu_reload_apic_access_page().
> >
> > Okay that answers my question. Thanks for clearing that up.
> >
> > Not sure of a good place to document this but it would be good to
> > have this written down somewhere. Maybe kvm_map_gfn()?
>
> Trying not to get too distracted by polishing this part, so I can
> continue with making more things actually work. But I took a quick look
> at the reload_apic_access_page() thing.
>
> AFAICT it works because the access is only from *within* the vCPU, in
> guest mode.
>
> So all the notifier has to do is kick all CPUs, which happens when it
> calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
> are *out* of guest mode by the time...
>
> ...er... maybe not by the time the notifier returns, because all
> we've done is *send* the IPI and we don't know the other CPUs have
> actually stopped running the guest yet?
>
> Maybe there's some explanation of why the actual TLB shootdown
> truly *will* occur before the page goes away, and some ordering
> rules which mean our reschedule IPI will happen first? Something
> like that ideally would have been in a comment in the MMU notifier.

KVM_REQ_APIC_PAGE_RELOAD is tagged with KVM_REQUEST_WAIT, which means
that kvm_kick_many_cpus() and thus smp_call_function_many() will have
@wait=true, i.e. the sender will wait for the SMP function call to
finish on the target CPUs.
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > On IRC, Paolo told me that permanent pinning causes problems for memory
> > hotplug, and pointed me at the trick we do with an MMU notifier and
> > kvm_vcpu_reload_apic_access_page().
>
> Okay that answers my question. Thanks for clearing that up.
>
> Not sure of a good place to document this but it would be good to
> have this written down somewhere. Maybe kvm_map_gfn()?

Trying not to get too distracted by polishing this part, so I can
continue with making more things actually work. But I took a quick look
at the reload_apic_access_page() thing.

AFAICT it works because the access is only from *within* the vCPU, in
guest mode.

So all the notifier has to do is kick all CPUs, which happens when it
calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
are *out* of guest mode by the time...

...er... maybe not by the time the notifier returns, because all
we've done is *send* the IPI and we don't know the other CPUs have
actually stopped running the guest yet?

Maybe there's some explanation of why the actual TLB shootdown
truly *will* occur before the page goes away, and some ordering
rules which mean our reschedule IPI will happen first? Something
like that ideally would have been in a comment in the MMU notifier.

Anyway, I'm not going to fret about that because it clearly doesn't
work for the pvclock/shinfo cases anyway; those are accessed from the
kernel *outside* guest mode. I think we can use a similar trick but
just protect the accesses with kvm->srcu.

The code in my tree now uses kvm_map_gfn() at setup time and leaves the
shinfo page mapped. A bit like the KVM pvclock code, except that I
don't pointlessly map and unmap all the time while leaving the page
pinned anyway. So all we need to do is kvm_release_pfn() right after
mapping it, leaving the map->hva "unsafe".

Then we hook in to the MMU notifier to unmap it when it goes away (in
RCU style: clearing the pointer variable, synchronize_srcu(), and
*then* unmap, possibly in different range_start/range/range_end
callbacks). The *users* of such mappings will need an RCU read lock,
and will need to be able to fault the GFN back in when they need it.

For now, though, I'm content that converting the Xen shinfo support to
kvm_map_gfn() is the right way to go, and the memory hotplug support is
an incremental addition on top of it.

And I'm even more comforted by the observation that the KVM pvclock
support *already* pins its pages, so I'm not even failing to meet the
existing bar :)
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On 2020-12-02 2:44 a.m., Joao Martins wrote:
> [late response - was on holiday yesterday]
>
> On 12/2/20 12:40 AM, Ankur Arora wrote:
>> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
>>> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>>> +{
>>>> +	struct shared_info *shared_info;
>>>> +	struct page *page;
>>>> +
>>>> +	page = gfn_to_page(kvm, gfn);
>>>> +	if (is_error_page(page))
>>>> +		return -EINVAL;
>>>> +
>>>> +	kvm->arch.xen.shinfo_addr = gfn;
>>>> +
>>>> +	shared_info = page_to_virt(page);
>>>> +	memset(shared_info, 0, sizeof(struct shared_info));
>>>> +	kvm->arch.xen.shinfo = shared_info;
>>>> +	return 0;
>>>> +}
>>>> +
>>>
>>> Hm.
>>>
>>> How come we get to pin the page and directly dereference it every time,
>>> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
>>> instead?
>>
>> So looking at my WIP trees from the time, this is something that
>> we went back and forth on as well with using just a pinned page or a
>> persistent kvm_vcpu_map().
>>
>> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
>> as shared_info is created early and is not expected to change during the
>> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
>> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
>> kvm_vcpu_unmap() dance or do some kind of synchronization.
>>
>> That said, I don't think this code explicitly disallows any updates
>> to shared_info.
>>
>>> If that was allowed, wouldn't it have been a much simpler fix for
>>> CVE-2019-3016? What am I missing?
>>
>> Agreed.
>>
>> Perhaps, Paolo can chime in with why KVM never uses pinned page
>> and always prefers to do cached mappings instead?
>
> Part of the CVE fix to not use cached versions.
>
> It's not a longterm pin of the page unlike we try to do here (partly due
> to the nature of the pages we are mapping) but we still map the gpa,
> RMW the steal time struct, and then unmap the page.
>
> See record_steal_time() -- but more specifically commit b043138246
> ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").
>
> But I am not sure it's a good idea to follow the same as
> record_steal_time() given that this is a fairly sensitive code path for
> event channels.
>
>>> Should I rework these to use kvm_write_guest_cached()?
>>
>> kvm_vcpu_map() would be better. The event channel logic does RMW
>> operations on shared_info->vcpu_info.
>
> Indeed, yes.
>
> IIRC, we saw missed event channel notifications when we were using the
> {write,read}_cached() version of the patch. But I can't remember the
> reason it was due to, either the evtchn_pending or the mask word --
> which would make it not inject an upcall.

If memory serves, it was the mask. Though I don't think that we had
kvm_{write,read}_cached in use at that point -- given that they were
definitely not RMW safe.

Ankur
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On 2020-12-02 4:20 a.m., David Woodhouse wrote:
> On Wed, 2020-12-02 at 10:44 +0000, Joao Martins wrote:
>> [late response - was on holiday yesterday]
>>
>> On 12/2/20 12:40 AM, Ankur Arora wrote:
>>> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
>>>> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>>>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>>>> +{
>>>>> +	struct shared_info *shared_info;
>>>>> +	struct page *page;
>>>>> +
>>>>> +	page = gfn_to_page(kvm, gfn);
>>>>> +	if (is_error_page(page))
>>>>> +		return -EINVAL;
>>>>> +
>>>>> +	kvm->arch.xen.shinfo_addr = gfn;
>>>>> +
>>>>> +	shared_info = page_to_virt(page);
>>>>> +	memset(shared_info, 0, sizeof(struct shared_info));
>>>>> +	kvm->arch.xen.shinfo = shared_info;
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>
>>>> Hm.
>>>>
>>>> How come we get to pin the page and directly dereference it every time,
>>>> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
>>>> instead?
>>>
>>> So looking at my WIP trees from the time, this is something that
>>> we went back and forth on as well with using just a pinned page or a
>>> persistent kvm_vcpu_map().
>>>
>>> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
>>> as shared_info is created early and is not expected to change during the
>>> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
>>> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
>>> kvm_vcpu_unmap() dance or do some kind of synchronization.
>>>
>>> That said, I don't think this code explicitly disallows any updates
>>> to shared_info.
>>>
>>>> If that was allowed, wouldn't it have been a much simpler fix for
>>>> CVE-2019-3016? What am I missing?
>>>
>>> Agreed.
>>>
>>> Perhaps, Paolo can chime in with why KVM never uses pinned page
>>> and always prefers to do cached mappings instead?
>>
>> Part of the CVE fix to not use cached versions.
>>
>> It's not a longterm pin of the page unlike we try to do here (partly due
>> to the nature of the pages we are mapping) but we still map the gpa,
>> RMW the steal time struct, and then unmap the page.
>>
>> See record_steal_time() -- but more specifically commit b043138246
>> ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").
>>
>> But I am not sure it's a good idea to follow the same as
>> record_steal_time() given that this is a fairly sensitive code path for
>> event channels.
>
> Right. We definitely need to use atomic RMW operations (like the CVE
> fix did) so the page needs to be *mapped*. My question was about a
> permanent pinned mapping vs the map/unmap as we need it that
> record_steal_time() does.
>
> On IRC, Paolo told me that permanent pinning causes problems for memory
> hotplug, and pointed me at the trick we do with an MMU notifier and
> kvm_vcpu_reload_apic_access_page().

Okay that answers my question. Thanks for clearing that up.

Not sure of a good place to document this but it would be good to
have this written down somewhere. Maybe kvm_map_gfn()?

> I'm going to stick with the pinning we have for the moment, and just
> fix up the fact that it leaks the pinned pages if the guest sets the
> shared_info address more than once.
>
> At some point the apic page MMU notifier thing can be made generic, and
> we can use that for this and for KVM steal time too.

Yeah, that's something that'll definitely be good to have.

Ankur
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Wed, 2020-12-02 at 10:44 +0000, Joao Martins wrote:
> [late response - was on holiday yesterday]
>
> On 12/2/20 12:40 AM, Ankur Arora wrote:
> > On 2020-12-01 5:07 a.m., David Woodhouse wrote:
> > > On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
> > > > +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> > > > +{
> > > > +	struct shared_info *shared_info;
> > > > +	struct page *page;
> > > > +
> > > > +	page = gfn_to_page(kvm, gfn);
> > > > +	if (is_error_page(page))
> > > > +		return -EINVAL;
> > > > +
> > > > +	kvm->arch.xen.shinfo_addr = gfn;
> > > > +
> > > > +	shared_info = page_to_virt(page);
> > > > +	memset(shared_info, 0, sizeof(struct shared_info));
> > > > +	kvm->arch.xen.shinfo = shared_info;
> > > > +	return 0;
> > > > +}
> > > > +
> > >
> > > Hm.
> > >
> > > How come we get to pin the page and directly dereference it every time,
> > > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > > instead?
> >
> > So looking at my WIP trees from the time, this is something that
> > we went back and forth on as well with using just a pinned page or a
> > persistent kvm_vcpu_map().
> >
> > I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> > as shared_info is created early and is not expected to change during the
> > lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> > MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> > kvm_vcpu_unmap() dance or do some kind of synchronization.
> >
> > That said, I don't think this code explicitly disallows any updates
> > to shared_info.
> >
> > > If that was allowed, wouldn't it have been a much simpler fix for
> > > CVE-2019-3016? What am I missing?
> >
> > Agreed.
> >
> > Perhaps, Paolo can chime in with why KVM never uses pinned page
> > and always prefers to do cached mappings instead?
>
> Part of the CVE fix to not use cached versions.
>
> It's not a longterm pin of the page unlike we try to do here (partly due
> to the nature of the pages we are mapping) but we still map the gpa,
> RMW the steal time struct, and then unmap the page.
>
> See record_steal_time() -- but more specifically commit b043138246
> ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").
>
> But I am not sure it's a good idea to follow the same as
> record_steal_time() given that this is a fairly sensitive code path for
> event channels.

Right. We definitely need to use atomic RMW operations (like the CVE
fix did) so the page needs to be *mapped*. My question was about a
permanent pinned mapping vs the map/unmap as we need it that
record_steal_time() does.

On IRC, Paolo told me that permanent pinning causes problems for memory
hotplug, and pointed me at the trick we do with an MMU notifier and
kvm_vcpu_reload_apic_access_page().

I'm going to stick with the pinning we have for the moment, and just
fix up the fact that it leaks the pinned pages if the guest sets the
shared_info address more than once.

At some point the apic page MMU notifier thing can be made generic, and
we can use that for this and for KVM steal time too.
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On 12/2/20 5:17 AM, Ankur Arora wrote:
> On 2020-12-01 5:26 p.m., David Woodhouse wrote:
>> On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
>>> On 2020-12-01 5:07 a.m., David Woodhouse wrote:

[...]

>>>> If that was allowed, wouldn't it have been a much simpler fix for
>>>> CVE-2019-3016? What am I missing?
>>>
>>> Agreed.
>>>
>>> Perhaps, Paolo can chime in with why KVM never uses pinned page
>>> and always prefers to do cached mappings instead?
>>>
>>>> Should I rework these to use kvm_write_guest_cached()?
>>>
>>> kvm_vcpu_map() would be better. The event channel logic does RMW
>>> operations on shared_info->vcpu_info.
>>
>> I've ported the shared_info/vcpu_info parts and made a test case, and
>> was going back through to make it use kvm_write_guest_cached(). But I
>> should probably plug on through the evtchn bits before I do that.
>>
>> I also don't see much locking on the cache, and can't convince myself
>> that accessing the shared_info page from multiple CPUs with
>> kvm_write_guest_cached() or kvm_map_gfn() is sane unless they each have
>> their own cache.
>
> I think you could get a VCPU specific cache with kvm_vcpu_map().

Steal clock handling creates such a mapping cache (struct
gfn_to_pfn_cache).

	Joao
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
[late response - was on holiday yesterday]

On 12/2/20 12:40 AM, Ankur Arora wrote:
> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
>> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +	struct shared_info *shared_info;
>>> +	struct page *page;
>>> +
>>> +	page = gfn_to_page(kvm, gfn);
>>> +	if (is_error_page(page))
>>> +		return -EINVAL;
>>> +
>>> +	kvm->arch.xen.shinfo_addr = gfn;
>>> +
>>> +	shared_info = page_to_virt(page);
>>> +	memset(shared_info, 0, sizeof(struct shared_info));
>>> +	kvm->arch.xen.shinfo = shared_info;
>>> +	return 0;
>>> +}
>>> +
>>
>> Hm.
>>
>> How come we get to pin the page and directly dereference it every time,
>> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
>> instead?
>
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().
>
> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> as shared_info is created early and is not expected to change during the
> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> kvm_vcpu_unmap() dance or do some kind of synchronization.
>
> That said, I don't think this code explicitly disallows any updates
> to shared_info.
>
>> If that was allowed, wouldn't it have been a much simpler fix for
>> CVE-2019-3016? What am I missing?
>
> Agreed.
>
> Perhaps, Paolo can chime in with why KVM never uses pinned page
> and always prefers to do cached mappings instead?

Part of the CVE fix to not use cached versions.

It's not a longterm pin of the page unlike we try to do here (partly due
to the nature of the pages we are mapping) but we still map the gpa, RMW
the steal time struct, and then unmap the page.

See record_steal_time() -- but more specifically commit b043138246
("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").

But I am not sure it's a good idea to follow the same as
record_steal_time() given that this is a fairly sensitive code path for
event channels.

>> Should I rework these to use kvm_write_guest_cached()?
>
> kvm_vcpu_map() would be better. The event channel logic does RMW
> operations on shared_info->vcpu_info.

Indeed, yes.

IIRC, we saw missed event channel notifications when we were using the
{write,read}_cached() version of the patch. But I can't remember the
reason it was due to, either the evtchn_pending or the mask word --
which would make it not inject an upcall.

	Joao
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On 2020-12-01 5:26 p.m., David Woodhouse wrote:
> On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
>> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
>>> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>>> +{
>>>> +	struct shared_info *shared_info;
>>>> +	struct page *page;
>>>> +
>>>> +	page = gfn_to_page(kvm, gfn);
>>>> +	if (is_error_page(page))
>>>> +		return -EINVAL;
>>>> +
>>>> +	kvm->arch.xen.shinfo_addr = gfn;
>>>> +
>>>> +	shared_info = page_to_virt(page);
>>>> +	memset(shared_info, 0, sizeof(struct shared_info));
>>>> +	kvm->arch.xen.shinfo = shared_info;
>>>> +	return 0;
>>>> +}
>>>> +
>>>
>>> Hm.
>>>
>>> How come we get to pin the page and directly dereference it every time,
>>> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
>>> instead?
>>
>> So looking at my WIP trees from the time, this is something that
>> we went back and forth on as well with using just a pinned page or a
>> persistent kvm_vcpu_map().
>
> OK, thanks.
>
>> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
>> as shared_info is created early and is not expected to change during the
>> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
>> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
>> kvm_vcpu_unmap() dance or do some kind of synchronization.
>>
>> That said, I don't think this code explicitly disallows any updates
>> to shared_info.
>
> It needs to allow updates as well as disabling the shared_info pages.
> We're going to need that to implement SHUTDOWN_soft_reset for kexec.

True.

>>> If that was allowed, wouldn't it have been a much simpler fix for
>>> CVE-2019-3016? What am I missing?
>>
>> Agreed.
>>
>> Perhaps, Paolo can chime in with why KVM never uses pinned page
>> and always prefers to do cached mappings instead?
>>
>>> Should I rework these to use kvm_write_guest_cached()?
>>
>> kvm_vcpu_map() would be better. The event channel logic does RMW
>> operations on shared_info->vcpu_info.
>
> I've ported the shared_info/vcpu_info parts and made a test case, and
> was going back through to make it use kvm_write_guest_cached(). But I
> should probably plug on through the evtchn bits before I do that.
>
> I also don't see much locking on the cache, and can't convince myself
> that accessing the shared_info page from multiple CPUs with
> kvm_write_guest_cached() or kvm_map_gfn() is sane unless they each have
> their own cache.

I think you could get a VCPU specific cache with kvm_vcpu_map().

> What I have so far is at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xenpv

Thanks. Will take a look there.

Ankur

> I'll do the event channel support tomorrow and hook it up to my actual
> VMM to give it some more serious testing.
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
> > On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
> > > +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> > > +{
> > > +	struct shared_info *shared_info;
> > > +	struct page *page;
> > > +
> > > +	page = gfn_to_page(kvm, gfn);
> > > +	if (is_error_page(page))
> > > +		return -EINVAL;
> > > +
> > > +	kvm->arch.xen.shinfo_addr = gfn;
> > > +
> > > +	shared_info = page_to_virt(page);
> > > +	memset(shared_info, 0, sizeof(struct shared_info));
> > > +	kvm->arch.xen.shinfo = shared_info;
> > > +	return 0;
> > > +}
> > > +
> >
> > Hm.
> >
> > How come we get to pin the page and directly dereference it every time,
> > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > instead?
>
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().

OK, thanks.

> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> as shared_info is created early and is not expected to change during the
> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> kvm_vcpu_unmap() dance or do some kind of synchronization.
>
> That said, I don't think this code explicitly disallows any updates
> to shared_info.

It needs to allow updates as well as disabling the shared_info pages.
We're going to need that to implement SHUTDOWN_soft_reset for kexec.

> > If that was allowed, wouldn't it have been a much simpler fix for
> > CVE-2019-3016? What am I missing?
>
> Agreed.
>
> Perhaps, Paolo can chime in with why KVM never uses pinned page
> and always prefers to do cached mappings instead?
>
> > Should I rework these to use kvm_write_guest_cached()?
>
> kvm_vcpu_map() would be better. The event channel logic does RMW
> operations on shared_info->vcpu_info.

I've ported the shared_info/vcpu_info parts and made a test case, and
was going back through to make it use kvm_write_guest_cached(). But I
should probably plug on through the evtchn bits before I do that.

I also don't see much locking on the cache, and can't convince myself
that accessing the shared_info page from multiple CPUs with
kvm_write_guest_cached() or kvm_map_gfn() is sane unless they each have
their own cache.

What I have so far is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xenpv

I'll do the event channel support tomorrow and hook it up to my actual
VMM to give it some more serious testing.
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On 2020-12-01 5:07 a.m., David Woodhouse wrote:
> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>> +{
>> +	struct shared_info *shared_info;
>> +	struct page *page;
>> +
>> +	page = gfn_to_page(kvm, gfn);
>> +	if (is_error_page(page))
>> +		return -EINVAL;
>> +
>> +	kvm->arch.xen.shinfo_addr = gfn;
>> +
>> +	shared_info = page_to_virt(page);
>> +	memset(shared_info, 0, sizeof(struct shared_info));
>> +	kvm->arch.xen.shinfo = shared_info;
>> +	return 0;
>> +}
>> +
>
> Hm.
>
> How come we get to pin the page and directly dereference it every time,
> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> instead?

So looking at my WIP trees from the time, this is something that
we went back and forth on as well with using just a pinned page or a
persistent kvm_vcpu_map().

I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
as shared_info is created early and is not expected to change during the
lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
kvm_vcpu_unmap() dance or do some kind of synchronization.

That said, I don't think this code explicitly disallows any updates
to shared_info.

> If that was allowed, wouldn't it have been a much simpler fix for
> CVE-2019-3016? What am I missing?

Agreed.

Perhaps, Paolo can chime in with why KVM never uses pinned page
and always prefers to do cached mappings instead?

> Should I rework these to use kvm_write_guest_cached()?

kvm_vcpu_map() would be better. The event channel logic does RMW
operations on shared_info->vcpu_info.

Ankur
Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct shared_info *shared_info;
> +	struct page *page;
> +
> +	page = gfn_to_page(kvm, gfn);
> +	if (is_error_page(page))
> +		return -EINVAL;
> +
> +	kvm->arch.xen.shinfo_addr = gfn;
> +
> +	shared_info = page_to_virt(page);
> +	memset(shared_info, 0, sizeof(struct shared_info));
> +	kvm->arch.xen.shinfo = shared_info;
> +	return 0;
> +}
> +

Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?

If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?

Should I rework these to use kvm_write_guest_cached()?