Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-12 Thread David Woodhouse
On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
> > How come we get to pin the page and directly dereference it every time,
> > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > instead?
> 
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().

I have a new plan.

Screw pinning it and mapping it into kernel space just because we need
to do atomic operations on it. 

Futexes can do atomic operations on userspace; so can we. We just add
atomic_cmpxchg_user() and use that.

We can add kvm_cmpxchg_guest_offset_cached() and use that from places
like record_steal_time().

That works nicely for all of the Xen support needs too, I think — since
test-and-set and test-and-clear bit operations all want to be built on
cmpxchg.
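
As an illustration of that point, a test-and-set built purely on a
compare-exchange loop can be sketched in plain C11. The name and the
userspace atomics here are stand-ins; the real primitive would be the
proposed atomic_cmpxchg_user() operating on the guest-shared page:

```c
#include <stdatomic.h>

/* Hypothetical sketch: test_and_set_bit built on nothing but cmpxchg.
 * Returns the previous value of the bit, like the kernel helper does. */
static int test_and_set_bit_cmpxchg(_Atomic unsigned long *word, int bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_load(word);

	/* Retry until we install old|mask over an unchanged old;
	 * on failure, 'old' is refreshed with the current value. */
	while (!atomic_compare_exchange_weak(word, &old, old | mask))
		;
	return (old & mask) != 0;
}
```

Test-and-clear is the same loop with `old & ~mask`, which is why cmpxchg
suffices as the single underlying primitive.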

The one thing I can think of offhand that it doesn't address easily is
the testing of vcpu_info->evtchn_upcall_pending in
kvm_xen_cpu_has_interrupt(), which really does want to be fast, so I'm
not sure I want to be using copy_from_user() for that.


But I think I can open-code that to do the cache checking that
kvm_read_guest_offset_cached() does, and use __get_user() directly in
the fast path, falling back to actually calling
kvm_read_guest_offset_cached() when it needs to. That will suffice
until/unless we grow more use cases which would make it worth providing
that as a kvm_get_guest...() set of functions for generic use.
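
The shape of that open-coded fast path might look like the following
userspace sketch. The struct and names are illustrative only (they do not
match the real gfn_to_hva_cache), and the generation check stands in for
the memslots-generation validation that kvm_read_guest_offset_cached() does:

```c
#include <stddef.h>

/* Illustrative model: trust a still-valid gfn->hva cache on the fast
 * path, fall back to the slow revalidating lookup otherwise. */
struct gfn_cache {
	unsigned long generation;	/* snapshot when hva was resolved */
	unsigned char *hva;		/* cached host virtual address */
};

static unsigned long memslots_generation;

/* Stand-in for the slow path's revalidation / re-translation. */
static unsigned char *slow_resolve(struct gfn_cache *c)
{
	c->generation = memslots_generation;
	/* ... re-translate gfn to hva here ... */
	return c->hva;
}

static int read_u8_cached(struct gfn_cache *c, size_t off, unsigned char *out)
{
	if (c->hva && c->generation == memslots_generation) {
		*out = c->hva[off];	/* __get_user() analog: fast path */
		return 0;
	}
	if (!slow_resolve(c))
		return -1;
	*out = c->hva[off];
	return 0;
}
```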





Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-04 Thread Sean Christopherson
On Thu, Dec 03, 2020, David Woodhouse wrote:
> On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > > On IRC, Paolo told me that permanent pinning causes problems for memory
> > > hotplug, and pointed me at the trick we do with an MMU notifier and
> > > kvm_vcpu_reload_apic_access_page().
> > 
> > Okay that answers my question. Thanks for clearing that up.
> > 
> > Not sure of a good place to document this but it would be good to
> > have this written down somewhere. Maybe kvm_map_gfn()?
> 
> Trying not to get too distracted by polishing this part, so I can
> continue with making more things actually work. But I took a quick look
> at the reload_apic_access_page() thing.
> 
> AFAICT it works because the access is only from *within* the vCPU, in
> guest mode.
> 
> So all the notifier has to do is kick all CPUs, which happens when it
> calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
> are *out* of guest mode by the time...
> 
> ...er... maybe not by the time the notifier returns, because all 
> we've done is *send* the IPI and we don't know the other CPUs have 
> actually stopped running the guest yet? 
> 
> Maybe there's some explanation of why the actual TLB shootdown
> truly *will* occur before the page goes away, and some ordering
> rules which mean our reschedule IPI will happen first? Something
> like that ideally would have been in a comment in the MMU notifier.

KVM_REQ_APIC_PAGE_RELOAD is tagged with KVM_REQUEST_WAIT, which means that
kvm_kick_many_cpus() and thus smp_call_function_many() will have @wait=true,
i.e. the sender will wait for the SMP function call to finish on the target 
CPUs.
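
The @wait=true semantics can be pictured with a userspace analogy (not
kernel code; pthread_join stands in for the SMP call's completion wait):

```c
#include <pthread.h>
#include <stdatomic.h>

#define NCPUS 4
static atomic_int handled;

/* Stand-in for the remote CPU leaving guest mode and running the
 * requested function before acknowledging completion. */
static void *ipi_target(void *arg)
{
	(void)arg;
	atomic_fetch_add(&handled, 1);
	return NULL;
}

/* With @wait=true the sender only returns once every target has
 * finished, so completion is guaranteed, not merely requested. */
static void kick_all_cpus_and_wait(void)
{
	pthread_t t[NCPUS];
	for (int i = 0; i < NCPUS; i++)
		pthread_create(&t[i], NULL, ipi_target, NULL);
	for (int i = 0; i < NCPUS; i++)
		pthread_join(t[i], NULL);
}
```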


Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-03 Thread David Woodhouse
On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > On IRC, Paolo told me that permanent pinning causes problems for memory
> > hotplug, and pointed me at the trick we do with an MMU notifier and
> > kvm_vcpu_reload_apic_access_page().
> 
> Okay that answers my question. Thanks for clearing that up.
> 
> Not sure of a good place to document this but it would be good to
> have this written down somewhere. Maybe kvm_map_gfn()?

Trying not to get too distracted by polishing this part, so I can
continue with making more things actually work. But I took a quick look
at the reload_apic_access_page() thing.

AFAICT it works because the access is only from *within* the vCPU, in
guest mode.

So all the notifier has to do is kick all CPUs, which happens when it
calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
are *out* of guest mode by the time...

...er... maybe not by the time the notifier returns, because all 
we've done is *send* the IPI and we don't know the other CPUs have 
actually stopped running the guest yet? 

Maybe there's some explanation of why the actual TLB shootdown
truly *will* occur before the page goes away, and some ordering
rules which mean our reschedule IPI will happen first? Something
like that ideally would have been in a comment in the MMU notifier.

Anyway, I'm not going to fret about that because it clearly doesn't
work for the pvclock/shinfo cases anyway; those are accessed from the
kernel *outside* guest mode. I think we can use a similar trick but
just protect the accesses with kvm->srcu.

The code in my tree now uses kvm_map_gfn() at setup time and leaves the
shinfo page mapped. A bit like the KVM pvclock code, except that I
don't pointlessly map and unmap all the time while leaving the page
pinned anyway.

So all we need to do is kvm_release_pfn() right after mapping it,
leaving the map->hva "unsafe".

Then we hook in to the MMU notifier to unmap it when it goes away (in
RCU style; clearing the pointer variable, synchronize_srcu(), and
*then* unmap, possibly in different range_start/range/range_end
callbacks).

The *users* of such mappings will need an RCU read lock, and will need
to be able to fault the GFN back in when they need it.

For now, though, I'm content that converting the Xen shinfo support to
kvm_map_gfn() is the right way to go, and the memory hotplug support is
an incremental addition on top of it.

And I'm even more comforted by the observation that the KVM pvclock
support *already* pins its pages, so I'm not even failing to meet the
existing bar :)





Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-02 Thread Ankur Arora

On 2020-12-02 2:44 a.m., Joao Martins wrote:

[late response - was on holiday yesterday]

On 12/2/20 12:40 AM, Ankur Arora wrote:

On 2020-12-01 5:07 a.m., David Woodhouse wrote:

On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:

+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+   struct shared_info *shared_info;
+   struct page *page;
+
+   page = gfn_to_page(kvm, gfn);
+   if (is_error_page(page))
+   return -EINVAL;
+
+   kvm->arch.xen.shinfo_addr = gfn;
+
+   shared_info = page_to_virt(page);
+   memset(shared_info, 0, sizeof(struct shared_info));
+   kvm->arch.xen.shinfo = shared_info;
+   return 0;
+}
+


Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?


So looking at my WIP trees from the time, this is something that
we went back and forth on as well with using just a pinned page or a
persistent kvm_vcpu_map().

I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
as shared_info is created early and is not expected to change during the
lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
kvm_vcpu_unmap() dance or do some kind of synchronization.

That said, I don't think this code explicitly disallows any updates
to shared_info.



If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?


Agreed.

Perhaps, Paolo can chime in with why KVM never uses pinned page
and always prefers to do cached mappings instead?


Part of the CVE fix to not use cached versions.

It's not a longterm pin of the page, unlike what we try to do here (partly
due to the nature of the pages we are mapping), but we still map the gpa,
RMW the steal time struct, and then unmap the page.

See record_steal_time() -- but more specifically commit b043138246
("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").

But I am not sure it's a good idea to follow the same as record_steal_time()
given that this is a fairly sensitive code path for event channels.



Should I rework these to use kvm_write_guest_cached()?


kvm_vcpu_map() would be better. The event channel logic does RMW operations
on shared_info->vcpu_info.


Indeed, yes.

Ankur, IIRC, we saw missed event channel notifications when we were using
the {write,read}_cached() version of the patch.

But I can't remember what it was due to -- either the evtchn_pending or the
mask word -- which would make it not inject an upcall.


If memory serves, it was the mask. Though I don't think that we had
kvm_{write,read}_cached in use at that point -- given that they were
definitely not RMW safe.
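
The missed-notification symptom described here is the classic lost-update
interleaving. A minimal sketch (illustrative names, not the real
shared_info layout) of why a write_cached-style read/modify/write-back is
not RMW safe, versus a true atomic RMW:

```c
#include <stdatomic.h>

/* Non-atomic read, modify, write back: the host's write-back can
 * clobber a bit the guest set in between, losing the event. */
static unsigned long simulate_lost_update(void)
{
	unsigned long pending = 0;

	/* Host snapshots the word, intending to set bit 0 ... */
	unsigned long host_copy = pending | (1UL << 0);

	/* ... meanwhile the guest sets bit 5 in place ... */
	pending |= 1UL << 5;

	/* ... and the host's write-back clobbers it: bit 5 is lost. */
	pending = host_copy;
	return pending;
}

/* A true atomic RMW on the mapped word cannot lose either bit. */
static unsigned long simulate_atomic_rmw(void)
{
	_Atomic unsigned long pending = 0;

	atomic_fetch_or(&pending, 1UL << 5);	/* guest */
	atomic_fetch_or(&pending, 1UL << 0);	/* host */
	return atomic_load(&pending);
}
```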


Ankur



Joao



Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-02 Thread Ankur Arora




On 2020-12-02 4:20 a.m., David Woodhouse wrote:

On Wed, 2020-12-02 at 10:44 +, Joao Martins wrote:

[late response - was on holiday yesterday]

On 12/2/20 12:40 AM, Ankur Arora wrote:

On 2020-12-01 5:07 a.m., David Woodhouse wrote:

On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:

+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+   struct shared_info *shared_info;
+   struct page *page;
+
+   page = gfn_to_page(kvm, gfn);
+   if (is_error_page(page))
+   return -EINVAL;
+
+   kvm->arch.xen.shinfo_addr = gfn;
+
+   shared_info = page_to_virt(page);
+   memset(shared_info, 0, sizeof(struct shared_info));
+   kvm->arch.xen.shinfo = shared_info;
+   return 0;
+}
+


Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?


So looking at my WIP trees from the time, this is something that
we went back and forth on as well with using just a pinned page or a
persistent kvm_vcpu_map().

I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
as shared_info is created early and is not expected to change during the
lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
kvm_vcpu_unmap() dance or do some kind of synchronization.

That said, I don't think this code explicitly disallows any updates
to shared_info.



If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?


Agreed.

Perhaps, Paolo can chime in with why KVM never uses pinned page
and always prefers to do cached mappings instead?



Part of the CVE fix to not use cached versions.

It's not a longterm pin of the page, unlike what we try to do here (partly
due to the nature of the pages we are mapping), but we still map the gpa,
RMW the steal time struct, and then unmap the page.

See record_steal_time() -- but more specifically commit b043138246
("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").

But I am not sure it's a good idea to follow the same as record_steal_time()
given that this is a fairly sensitive code path for event channels.


Right. We definitely need to use atomic RMW operations (like the CVE
fix did) so the page needs to be *mapped*.

My question was about a permanent pinned mapping vs the map/unmap as we
need it that record_steal_time() does.

On IRC, Paolo told me that permanent pinning causes problems for memory
hotplug, and pointed me at the trick we do with an MMU notifier and
kvm_vcpu_reload_apic_access_page().


Okay that answers my question. Thanks for clearing that up.

Not sure of a good place to document this but it would be good to
have this written down somewhere. Maybe kvm_map_gfn()?



I'm going to stick with the pinning we have for the moment, and just
fix up the fact that it leaks the pinned pages if the guest sets the
shared_info address more than once.

At some point the apic page MMU notifier thing can be made generic, and
we can use that for this and for KVM steal time too.



Yeah, that's something that'll definitely be good to have.

Ankur


Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-02 Thread David Woodhouse
On Wed, 2020-12-02 at 10:44 +, Joao Martins wrote:
> [late response - was on holiday yesterday]
> 
> On 12/2/20 12:40 AM, Ankur Arora wrote:
> > On 2020-12-01 5:07 a.m., David Woodhouse wrote:
> > > On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:
> > > > +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> > > > +{
> > > > +   struct shared_info *shared_info;
> > > > +   struct page *page;
> > > > +
> > > > +   page = gfn_to_page(kvm, gfn);
> > > > +   if (is_error_page(page))
> > > > +   return -EINVAL;
> > > > +
> > > > +   kvm->arch.xen.shinfo_addr = gfn;
> > > > +
> > > > +   shared_info = page_to_virt(page);
> > > > +   memset(shared_info, 0, sizeof(struct shared_info));
> > > > +   kvm->arch.xen.shinfo = shared_info;
> > > > +   return 0;
> > > > +}
> > > > +
> > > 
> > > Hm.
> > > 
> > > How come we get to pin the page and directly dereference it every time,
> > > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > > instead?
> > 
> > So looking at my WIP trees from the time, this is something that
> > we went back and forth on as well with using just a pinned page or a
> > persistent kvm_vcpu_map().
> > 
> > I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> > as shared_info is created early and is not expected to change during the
> > lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> > MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> > kvm_vcpu_unmap() dance or do some kind of synchronization.
> > 
> > That said, I don't think this code explicitly disallows any updates
> > to shared_info.
> > 
> > > 
> > > If that was allowed, wouldn't it have been a much simpler fix for
> > > CVE-2019-3016? What am I missing?
> > 
> > Agreed.
> > 
> > Perhaps, Paolo can chime in with why KVM never uses pinned page
> > and always prefers to do cached mappings instead?
> > 
> 
> Part of the CVE fix to not use cached versions.
> 
> It's not a longterm pin of the page, unlike what we try to do here (partly
> due to the nature of the pages we are mapping), but we still map the gpa,
> RMW the steal time struct, and then unmap the page.
> 
> See record_steal_time() -- but more specifically commit b043138246
> ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").
> 
> But I am not sure it's a good idea to follow the same as record_steal_time()
> given that this is a fairly sensitive code path for event channels.

Right. We definitely need to use atomic RMW operations (like the CVE
fix did) so the page needs to be *mapped*.

My question was about a permanent pinned mapping vs the map/unmap as we
need it that record_steal_time() does.

On IRC, Paolo told me that permanent pinning causes problems for memory
hotplug, and pointed me at the trick we do with an MMU notifier and
kvm_vcpu_reload_apic_access_page().

I'm going to stick with the pinning we have for the moment, and just
fix up the fact that it leaks the pinned pages if the guest sets the
shared_info address more than once.

At some point the apic page MMU notifier thing can be made generic, and
we can use that for this and for KVM steal time too.





Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-02 Thread Joao Martins
On 12/2/20 5:17 AM, Ankur Arora wrote:
> On 2020-12-01 5:26 p.m., David Woodhouse wrote:
>> On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
>>> On 2020-12-01 5:07 a.m., David Woodhouse wrote:

[...]

 If that was allowed, wouldn't it have been a much simpler fix for
 CVE-2019-3016? What am I missing?
>>>
>>> Agreed.
>>>
>>> Perhaps, Paolo can chime in with why KVM never uses pinned page
>>> and always prefers to do cached mappings instead?
>>>

 Should I rework these to use kvm_write_guest_cached()?
>>>
>>> kvm_vcpu_map() would be better. The event channel logic does RMW operations
>>> on shared_info->vcpu_info.
>>
>> I've ported the shared_info/vcpu_info parts and made a test case, and
>> was going back through to make it use kvm_write_guest_cached(). But I
>> should probably plug on through the evtchn bits before I do that.
>>
>> I also don't see much locking on the cache, and can't convince myself
>> that accessing the shared_info page from multiple CPUs with
>> kvm_write_guest_cached() or kvm_map_gfn() is sane unless they each have
>> their own cache.
> 
> I think you could get a VCPU specific cache with kvm_vcpu_map().
> 

steal clock handling creates such a mapping cache (struct gfn_to_pfn_cache).

Joao


Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-02 Thread Joao Martins
[late response - was on holiday yesterday]

On 12/2/20 12:40 AM, Ankur Arora wrote:
> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
>> On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:
>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +   struct shared_info *shared_info;
>>> +   struct page *page;
>>> +
>>> +   page = gfn_to_page(kvm, gfn);
>>> +   if (is_error_page(page))
>>> +   return -EINVAL;
>>> +
>>> +   kvm->arch.xen.shinfo_addr = gfn;
>>> +
>>> +   shared_info = page_to_virt(page);
>>> +   memset(shared_info, 0, sizeof(struct shared_info));
>>> +   kvm->arch.xen.shinfo = shared_info;
>>> +   return 0;
>>> +}
>>> +
>>
>> Hm.
>>
>> How come we get to pin the page and directly dereference it every time,
>> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
>> instead?
> 
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().
> 
> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> as shared_info is created early and is not expected to change during the
> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> kvm_vcpu_unmap() dance or do some kind of synchronization.
> 
> That said, I don't think this code explicitly disallows any updates
> to shared_info.
> 
>>
>> If that was allowed, wouldn't it have been a much simpler fix for
>> CVE-2019-3016? What am I missing?
> 
> Agreed.
> 
> Perhaps, Paolo can chime in with why KVM never uses pinned page
> and always prefers to do cached mappings instead?
> 
Part of the CVE fix to not use cached versions.

It's not a longterm pin of the page, unlike what we try to do here (partly
due to the nature of the pages we are mapping), but we still map the gpa,
RMW the steal time struct, and then unmap the page.

See record_steal_time() -- but more specifically commit b043138246
("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed").

But I am not sure it's a good idea to follow the same as record_steal_time()
given that this is a fairly sensitive code path for event channels.

>>
>> Should I rework these to use kvm_write_guest_cached()?
> 
> kvm_vcpu_map() would be better. The event channel logic does RMW operations
> on shared_info->vcpu_info.
> 
Indeed, yes.

Ankur, IIRC, we saw missed event channel notifications when we were using
the {write,read}_cached() version of the patch.

But I can't remember what it was due to -- either the evtchn_pending or the
mask word -- which would make it not inject an upcall.

Joao


Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-01 Thread Ankur Arora

On 2020-12-01 5:26 p.m., David Woodhouse wrote:

On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:

On 2020-12-01 5:07 a.m., David Woodhouse wrote:

On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:

+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+   struct shared_info *shared_info;
+   struct page *page;
+
+   page = gfn_to_page(kvm, gfn);
+   if (is_error_page(page))
+   return -EINVAL;
+
+   kvm->arch.xen.shinfo_addr = gfn;
+
+   shared_info = page_to_virt(page);
+   memset(shared_info, 0, sizeof(struct shared_info));
+   kvm->arch.xen.shinfo = shared_info;
+   return 0;
+}
+


Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?


So looking at my WIP trees from the time, this is something that
we went back and forth on as well with using just a pinned page or a
persistent kvm_vcpu_map().


OK, thanks.


I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
as shared_info is created early and is not expected to change during the
lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
kvm_vcpu_unmap() dance or do some kind of synchronization.

That said, I don't think this code explicitly disallows any updates
to shared_info.


It needs to allow updates as well as disabling the shared_info pages.
We're going to need that to implement SHUTDOWN_soft_reset for kexec.

True.





If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?


Agreed.

Perhaps, Paolo can chime in with why KVM never uses pinned page
and always prefers to do cached mappings instead?



Should I rework these to use kvm_write_guest_cached()?


kvm_vcpu_map() would be better. The event channel logic does RMW operations
on shared_info->vcpu_info.


I've ported the shared_info/vcpu_info parts and made a test case, and
was going back through to make it use kvm_write_guest_cached(). But I
should probably plug on through the evtchn bits before I do that.

I also don't see much locking on the cache, and can't convince myself
that accessing the shared_info page from multiple CPUs with
kvm_write_guest_cached() or kvm_map_gfn() is sane unless they each have
their own cache.


I think you could get a VCPU specific cache with kvm_vcpu_map().



What I have so far is at

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xenpv


Thanks. Will take a look there.

Ankur



I'll do the event channel support tomorrow and hook it up to my actual
VMM to give it some more serious testing.



Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-01 Thread David Woodhouse
On Tue, 2020-12-01 at 16:40 -0800, Ankur Arora wrote:
> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
> > On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:
> > > +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> > > +{
> > > +   struct shared_info *shared_info;
> > > +   struct page *page;
> > > +
> > > +   page = gfn_to_page(kvm, gfn);
> > > +   if (is_error_page(page))
> > > +   return -EINVAL;
> > > +
> > > +   kvm->arch.xen.shinfo_addr = gfn;
> > > +
> > > +   shared_info = page_to_virt(page);
> > > +   memset(shared_info, 0, sizeof(struct shared_info));
> > > +   kvm->arch.xen.shinfo = shared_info;
> > > +   return 0;
> > > +}
> > > +
> > 
> > Hm.
> > 
> > How come we get to pin the page and directly dereference it every time,
> > while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
> > instead?
> 
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().

OK, thanks.

> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> as shared_info is created early and is not expected to change during the
> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> kvm_vcpu_unmap() dance or do some kind of synchronization.
> 
> That said, I don't think this code explicitly disallows any updates
> to shared_info.

It needs to allow updates as well as disabling the shared_info pages.
We're going to need that to implement SHUTDOWN_soft_reset for kexec.

> > 
> > If that was allowed, wouldn't it have been a much simpler fix for
> > CVE-2019-3016? What am I missing?
> 
> Agreed.
> 
> Perhaps, Paolo can chime in with why KVM never uses pinned page
> and always prefers to do cached mappings instead?
> 
> > 
> > Should I rework these to use kvm_write_guest_cached()?
> 
> kvm_vcpu_map() would be better. The event channel logic does RMW operations
> on shared_info->vcpu_info.

I've ported the shared_info/vcpu_info parts and made a test case, and
was going back through to make it use kvm_write_guest_cached(). But I
should probably plug on through the evtchn bits before I do that.

I also don't see much locking on the cache, and can't convince myself
that accessing the shared_info page from multiple CPUs with
kvm_write_guest_cached() or kvm_map_gfn() is sane unless they each have
their own cache.

What I have so far is at

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xenpv

I'll do the event channel support tomorrow and hook it up to my actual
VMM to give it some more serious testing.





Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-01 Thread Ankur Arora

On 2020-12-01 5:07 a.m., David Woodhouse wrote:

On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:

+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+   struct shared_info *shared_info;
+   struct page *page;
+
+   page = gfn_to_page(kvm, gfn);
+   if (is_error_page(page))
+   return -EINVAL;
+
+   kvm->arch.xen.shinfo_addr = gfn;
+
+   shared_info = page_to_virt(page);
+   memset(shared_info, 0, sizeof(struct shared_info));
+   kvm->arch.xen.shinfo = shared_info;
+   return 0;
+}
+


Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?


So looking at my WIP trees from the time, this is something that
we went back and forth on as well with using just a pinned page or a
persistent kvm_vcpu_map().

I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
as shared_info is created early and is not expected to change during the
lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
kvm_vcpu_unmap() dance or do some kind of synchronization.

That said, I don't think this code explicitly disallows any updates
to shared_info.



If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?


Agreed.

Perhaps, Paolo can chime in with why KVM never uses pinned page
and always prefers to do cached mappings instead?



Should I rework these to use kvm_write_guest_cached()?


kvm_vcpu_map() would be better. The event channel logic does RMW operations
on shared_info->vcpu_info.


Ankur






Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

2020-12-01 Thread David Woodhouse
On Wed, 2019-02-20 at 20:15 +, Joao Martins wrote:
> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> +{
> +   struct shared_info *shared_info;
> +   struct page *page;
> +
> +   page = gfn_to_page(kvm, gfn);
> +   if (is_error_page(page))
> +   return -EINVAL;
> +
> +   kvm->arch.xen.shinfo_addr = gfn;
> +
> +   shared_info = page_to_virt(page);
> +   memset(shared_info, 0, sizeof(struct shared_info));
> +   kvm->arch.xen.shinfo = shared_info;
> +   return 0;
> +}
> +

Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?

If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?

Should I rework these to use kvm_write_guest_cached()? 



