Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration

2019-08-14 Thread Jason Wang


On 2019/8/14 12:41 AM, Christoph Hellwig wrote:

On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:

On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:


What kind of issues do you see? Spinlock is to synchronize GUP with MMU
notifier in this series.

A GUP that can't sleep can't pagefault, which makes it a really weird
pattern.

get_user_pages/get_user_pages_fast must not be called under a spinlock.
We have the somewhat misnamed __get_user_pages_fast that just does a
lookup for existing pages and never faults, for the few places that need
to do that lookup from contexts where we can't sleep.
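Purely as an illustration (plain userspace C, not kernel code; every name here is made up), the pattern described above — a non-sleeping fast lookup that may be called under a spinlock, with the sleeping slow path taken only after the lock is dropped — looks roughly like:

```c
#include <stdio.h>

#define NPAGES 8

static int page_present[NPAGES];  /* toy stand-in for a page table */

/* Fast path: safe under a spinlock because it never blocks.
 * On a miss it simply fails and lets the caller fall back. */
static int lookup_fast(int pfn)
{
    if (pfn < 0 || pfn >= NPAGES || !page_present[pfn])
        return -1;
    return 0;
}

/* Slow path: may "fault" the page in (here: just populate the entry).
 * Real code could sleep here, so no spinlock may be held. */
static int lookup_slow(int pfn)
{
    if (pfn < 0 || pfn >= NPAGES)
        return -1;
    page_present[pfn] = 1;
    return 0;
}

/* Caller: try the fast lookup under the (elided) lock, and only
 * retry the sleeping path after the lock has been released. */
static int get_page_sketch(int pfn)
{
    int ret = lookup_fast(pfn);   /* spin_lock()/spin_unlock() elided */

    if (ret < 0)
        ret = lookup_slow(pfn);   /* sleeping is allowed here */
    return ret;
}
```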



Yes, I do use __get_user_pages_fast() in the code.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration

2019-08-14 Thread Jason Wang


On 2019/8/13 7:57 PM, Jason Gunthorpe wrote:

On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:


What kind of issues do you see? Spinlock is to synchronize GUP with MMU
notifier in this series.

A GUP that can't sleep can't pagefault, which makes it a really weird
pattern.



My understanding is that __get_user_pages_fast() assumes the caller can fail
or has a fallback, and we do fall back gracefully to copy_{to|from}_user().






Btw, back to the original question. May I know why synchronize_rcu() is not
suitable? Consider:

We already went over this. You'd need to determine it doesn't somehow
deadlock the mm on reclaim paths. Maybe it is OK, the rcu_gp_wq is
marked WQ_MEM_RECLAIM at least...



Yes, will take a look at this.




I also think Michael was concerned about the latency spikes a long RCU
delay would cause.



I don't think it's a real problem, considering the MMU notifier could be
preempted or blocked.


Thanks




Jason


Re: [PATCH] virtio-net: lower min ring num_free for efficiency

2019-08-14 Thread Jason Wang


On 2019/8/15 11:11 AM, 冉 jiang wrote:

On 2019/8/15 11:01, Jason Wang wrote:

On 2019/8/14 10:06 AM, 冉 jiang wrote:

This change lowers ring buffer reclaim threshold from 1/2*queue to
budget
for better performance. According to our test with qemu + dpdk, packet
dropping happens when the guest is not able to provide free buffer in
avail ring timely with default 1/2*queue. The value in the patch has
been
tested and does show better performance.


Please add your test setup and results here.

Thanks



Signed-off-by: jiangkidd 
---
   drivers/net/virtio_net.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d4115c9e20b..bc08be7925eb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
    }
    }
-    if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
+    if (rq->vq->num_free > min((unsigned int)budget,
+                               virtqueue_get_vring_size(rq->vq)) / 2) {
     if (!try_fill_recv(vi, rq, GFP_ATOMIC))
         schedule_delayed_work(&vi->refill, 0);
     }

Sure, here are the details:



Thanks for the details, but I meant it would be better to summarize your
test results in the commit log in a compact way.


Btw, some comments, see below:





Test setup & result:



Below is a snippet from our test results. Test1 was done with the default
driver (threshold of 1/2 * queue), while test2 is with my patch. The average
number of dropped packets decreases a lot in test2.

test1 time    test1 avg drops    test2 time    test2 avg drops    pps

16:21.0    12.295    56:50.4    0    300k
17:19.1    15.244    56:50.4    0    300k
18:17.5    18.789    56:50.4    0    300k
19:15.1    14.208    56:50.4    0    300k
20:13.2    20.818    56:50.4    0.267    300k
21:11.2    12.397    56:50.4    0    300k
22:09.3    12.599    56:50.4    0    300k
23:07.3    15.531    57:48.4    0    300k
24:05.5    13.664    58:46.5    0    300k
25:03.7    13.158    59:44.5    4.73    300k
26:01.1    2.486    00:42.6    0    300k
26:59.1    11.241    01:40.6    0    300k
27:57.2    20.521    02:38.6    0    300k
28:55.2    30.094    03:36.7    0    300k
29:53.3    16.828    04:34.7    0.963    300k
30:51.3    46.916    05:32.8    0    400k
31:49.3    56.214    05:32.8    0    400k
32:47.3    58.69    05:32.8    0    400k
33:45.3    61.486    05:32.8    0    400k
34:43.3    72.175    05:32.8    0.598    400k
35:41.3    56.699    05:32.8    0    400k
36:39.3    61.071    05:32.8    0    400k
37:37.3    43.355    06:30.8    0    400k
38:35.4    44.644    06:30.8    0    400k
39:33.4    72.336    06:30.8    0    400k
40:31.4    70.676    06:30.8    0    400k
41:29.4    108.009    06:30.8    0    400k
42:27.4    65.216    06:30.8    0    400k



Why is there a difference in the test times? Could you summarize them like:

Test setup: e.g. testpmd or pktgen to generate packets to the guest

avg packets dropped before: XXX

avg packets dropped after: YYY (-ZZZ%)

Thanks





Data to prove why the patch helps:



We have completed several rounds of tests with the value set to budget
(64 as the default value). It improves things a lot when the pps is below
400k for a single stream. We are confident that we run out of free
buffers in the avail ring when packet dropping happens, based on the
SystemTap script below:

Just a snippet:

probe module("virtio_ring").function("virtqueue_get_buf")
{
    # used->idx - last_used_idx tells us whether the queue is full,
    # i.e. whether the guest is unable to take buffers from the queue timely
    x = (@cast($_vq, "vring_virtqueue")->vring->used->idx) -
        (@cast($_vq, "vring_virtqueue")->last_used_idx)

    if (x < 0 && (x + 65535) < 4096)
        x = x + 65535

    if ((x == 1024) && @cast($_vq, "vring_virtqueue")->vq->callback == callback_addr)
        netrxcount[x] <<< gettimeofday_s()
}


probe module("virtio_ring").function("virtqueue_add_inbuf")
{
    # avail->idx - used->idx tells us whether we have run out of
    # free buffers in the avail ring
    y = (@cast($vq, "vring_virtqueue")->vring->avail->idx) -
        (@cast($vq, "vring_virtqueue")->vring->used->idx)

    if (y < 0 && (y + 65535) < 4096)
        y = y + 65535

    if (@2 == "debugon")
    {
        if (y == 0 && @cast($vq, "vring_virtqueue")->vq->callback == callback_addr)
        {
            netrxfreecount[y] <<< gettimeofday_s()

            printf("no avail ring left seen, printing most recent 5 num free, vq: %lx, current index: %d\n",
                   $vq, recentfreecount)
            for (i = recentfreecount; i != ((recentfreecount + 4) % 5); i = ((i + 1) % 5))
            {
                printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
            }

            printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
            //exit()
        }
    }
}
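The index arithmetic the script approximates is plain unsigned 16-bit wraparound: the vring `idx` fields are free-running u16 counters, so the true distance is their difference mod 2^16 (the script's `x + 65535` is a script-side approximation of this). A minimal C sketch, with an illustrative helper name:

```c
#include <stdint.h>

/* Number of entries the device has marked used that the driver has
 * not consumed yet. Because both indices are free-running u16
 * counters, truncating the subtraction to 16 bits handles the
 * wraparound case automatically. */
static uint16_t ring_pending(uint16_t used_idx, uint16_t last_used_idx)
{
    return (uint16_t)(used_idx - last_used_idx);
}
```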



Re: [PATCH] virtio-net: lower min ring num_free for efficiency

2019-08-14 Thread 冉 jiang

On 2019/8/15 11:01, Jason Wang wrote:
>
>> On 2019/8/14 10:06 AM, 冉 jiang wrote:
>> This change lowers ring buffer reclaim threshold from 1/2*queue to 
>> budget
>> for better performance. According to our test with qemu + dpdk, packet
>> dropping happens when the guest is not able to provide free buffer in
>> avail ring timely with default 1/2*queue. The value in the patch has 
>> been
>> tested and does show better performance.
>
>
> Please add your test setup and results here.
>
> Thanks
>
>
>>
>> Signed-off-by: jiangkidd 
>> ---
>>   drivers/net/virtio_net.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 0d4115c9e20b..bc08be7925eb 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>>   }
>>   }
>> -    if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
>> +    if (rq->vq->num_free > min((unsigned int)budget,
>> +                               virtqueue_get_vring_size(rq->vq)) / 2) {
>>       if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>>           schedule_delayed_work(&vi->refill, 0);
>>   }

Sure, here are the details:


Test setup & result:



Below is a snippet from our test results. Test1 was done with the default
driver (threshold of 1/2 * queue), while test2 is with my patch. The average
number of dropped packets decreases a lot in test2.

test1 time    test1 avg drops    test2 time    test2 avg drops    pps

16:21.0    12.295    56:50.4    0    300k
17:19.1    15.244    56:50.4    0    300k
18:17.5    18.789    56:50.4    0    300k
19:15.1    14.208    56:50.4    0    300k
20:13.2    20.818    56:50.4    0.267    300k
21:11.2    12.397    56:50.4    0    300k
22:09.3    12.599    56:50.4    0    300k
23:07.3    15.531    57:48.4    0    300k
24:05.5    13.664    58:46.5    0    300k
25:03.7    13.158    59:44.5    4.73    300k
26:01.1    2.486    00:42.6    0    300k
26:59.1    11.241    01:40.6    0    300k
27:57.2    20.521    02:38.6    0    300k
28:55.2    30.094    03:36.7    0    300k
29:53.3    16.828    04:34.7    0.963    300k
30:51.3    46.916    05:32.8    0    400k
31:49.3    56.214    05:32.8    0    400k
32:47.3    58.69    05:32.8    0    400k
33:45.3    61.486    05:32.8    0    400k
34:43.3    72.175    05:32.8    0.598    400k
35:41.3    56.699    05:32.8    0    400k
36:39.3    61.071    05:32.8    0    400k
37:37.3    43.355    06:30.8    0    400k
38:35.4    44.644    06:30.8    0    400k
39:33.4    72.336    06:30.8    0    400k
40:31.4    70.676    06:30.8    0    400k
41:29.4    108.009    06:30.8    0    400k
42:27.4    65.216    06:30.8    0    400k


Data to prove why the patch helps:



We have completed several rounds of tests with the value set to budget
(64 as the default value). It improves things a lot when the pps is below
400k for a single stream. We are confident that we run out of free
buffers in the avail ring when packet dropping happens, based on the
SystemTap script below:

Just a snippet:

probe module("virtio_ring").function("virtqueue_get_buf")
{
    # used->idx - last_used_idx tells us whether the queue is full,
    # i.e. whether the guest is unable to take buffers from the queue timely
    x = (@cast($_vq, "vring_virtqueue")->vring->used->idx) -
        (@cast($_vq, "vring_virtqueue")->last_used_idx)

    if (x < 0 && (x + 65535) < 4096)
        x = x + 65535

    if ((x == 1024) && @cast($_vq, "vring_virtqueue")->vq->callback == callback_addr)
        netrxcount[x] <<< gettimeofday_s()
}


probe module("virtio_ring").function("virtqueue_add_inbuf")
{
    # avail->idx - used->idx tells us whether we have run out of
    # free buffers in the avail ring
    y = (@cast($vq, "vring_virtqueue")->vring->avail->idx) -
        (@cast($vq, "vring_virtqueue")->vring->used->idx)

    if (y < 0 && (y + 65535) < 4096)
        y = y + 65535

    if (@2 == "debugon")
    {
        if (y == 0 && @cast($vq, "vring_virtqueue")->vq->callback == callback_addr)
        {
            netrxfreecount[y] <<< gettimeofday_s()

            printf("no avail ring left seen, printing most recent 5 num free, vq: %lx, current index: %d\n",
                   $vq, recentfreecount)
            for (i = recentfreecount; i != ((recentfreecount + 4) % 5); i = ((i + 1) % 5))
            {
                printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
            }

            printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
            //exit()
        }
    }
}


probe module("virtio_net").statement("virtnet_receive@drivers/net/virtio_net.c:732")
{
    recentfreecount++
    recentfreecount = recentfreecount % 5
    # record num_free for the last 5 calls to virtnet_receive, so we
    # can see if lowering the bar helps
    recentfree[$rq->vq, recentfreecount] = $rq->vq->num_free
}


Here is the result:

no avail ring left seen, printing most recent 5 num free, 

Re: [PATCH] virtio-net: lower min ring num_free for efficiency

2019-08-14 Thread Jason Wang


On 2019/8/14 10:06 AM, 冉 jiang wrote:

This change lowers ring buffer reclaim threshold from 1/2*queue to budget
for better performance. According to our test with qemu + dpdk, packet
dropping happens when the guest is not able to provide free buffer in
avail ring timely with default 1/2*queue. The value in the patch has been
tested and does show better performance.



Please add your test setup and results here.

Thanks




Signed-off-by: jiangkidd 
---
  drivers/net/virtio_net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d4115c9e20b..bc08be7925eb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		}
 	}
 
-	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
+	if (rq->vq->num_free > min((unsigned int)budget,
+				   virtqueue_get_vring_size(rq->vq)) / 2) {
 		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
 			schedule_delayed_work(&vi->refill, 0);
 	}
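For what it's worth, the effect of the one-line change can be sketched in self-contained C (function names are illustrative, not the driver's): the refill threshold drops from half the vring size to half of min(budget, vring size), so on a large ring the refill work is scheduled much earlier.

```c
#include <stdbool.h>

/* Old behaviour: refill only once more than half the vring is free. */
static bool should_refill_old(unsigned int num_free, unsigned int vring_size)
{
    return num_free > vring_size / 2;
}

/* Patched behaviour: the threshold is capped by the NAPI budget, so
 * with a 1024-entry ring and the default budget of 64, refill kicks
 * in once more than 32 buffers are free instead of more than 512. */
static bool should_refill_new(unsigned int num_free, unsigned int vring_size,
                              unsigned int budget)
{
    unsigned int cap = budget < vring_size ? budget : vring_size;

    return num_free > cap / 2;
}
```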


Re: [RFC PATCH v6 64/92] kvm: introspection: add single-stepping

2019-08-14 Thread Paolo Bonzini
On 14/08/19 14:36, Nicusor CITU wrote:
> Thank you for signaling this. This piece of code is leftover from the
> initial attempt to make single step running.
> Based on latest results, we do not actually need to change
> interruptibility during the singlestep. It is enough to enable the MTF
> and just suppress any interrupt injection (if any) before leaving the
> vcpu entering in guest.
> 

This is exactly what testcases are for...

Paolo


Re: [RFC PATCH v6 74/92] kvm: x86: do not unconditionally patch the hypercall instruction during emulation

2019-08-14 Thread Paolo Bonzini
On 14/08/19 14:07, Adalbert Lazăr wrote:
> On Tue, 13 Aug 2019 11:20:45 +0200, Paolo Bonzini  wrote:
>> On 09/08/19 18:00, Adalbert Lazăr wrote:
>>> From: Mihai Donțu 
>>>
>>> It can happen that we end up emulating the VMCALL instruction as a
>>> result of the handling of an EPT write fault. In this situation, the
>>> emulator will try to unconditionally patch the correct hypercall opcode
>>> bytes using emulator_write_emulated(). However, this last call uses the
>>> fault GPA (if available) or walks the guest page tables at RIP,
>>> otherwise. The trouble begins when using KVMI, when we forbid the use of
>>> the fault GPA and fall back to the guest page-table walk: in Windows (8.1
>>> and newer) the page that we try to write into is marked read-execute and
>>> as such emulator_write_emulated() fails and we inject a write #PF,
>>> leading to a guest crash.
>>>
>>> The fix is rather simple: check the existing instruction bytes before
>>> doing the patching. This does not change the normal KVM behaviour, but
>>> does help when using KVMI as we no longer inject a write #PF.
>>
>> Fixing the hypercall is just an optimization.  Can we just hush and
>> return to the guest if emulator_write_emulated returns
>> X86EMUL_PROPAGATE_FAULT?
>>
>> Paolo
> 
> Something like this?
> 
>   err = emulator_write_emulated(...);
>   if (err == X86EMUL_PROPAGATE_FAULT)
>           err = X86EMUL_CONTINUE;
>   return err;

Yes.  The only difference will be that you'll keep getting #UD vmexits
instead of hypercall vmexits.  It's also safer, we want to obey those
r-x permissions because PatchGuard would crash the system if it noticed
the rewriting for whatever reason.

Paolo

>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 04b1d2916a0a..965c4f0108eb 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -7363,16 +7363,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>  }
>>>  EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
>>>  
>>> +#define KVM_HYPERCALL_INSN_LEN 3
>>> +
>>>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
>>>  {
>>> +   int err;
>>> struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
>>> -   char instruction[3];
>>> +   char buf[KVM_HYPERCALL_INSN_LEN];
>>> +   char instruction[KVM_HYPERCALL_INSN_LEN];
>>> unsigned long rip = kvm_rip_read(vcpu);
>>>  
>>> +   err = emulator_read_emulated(ctxt, rip, buf, sizeof(buf),
>>> +                                &ctxt->exception);
>>> +   if (err != X86EMUL_CONTINUE)
>>> +   return err;
>>> +
>>> kvm_x86_ops->patch_hypercall(vcpu, instruction);
>>> +   if (!memcmp(instruction, buf, sizeof(instruction)))
>>> +   /*
>>> +    * The hypercall instruction is the correct one. Retry
>>> +    * its execution; maybe we got here as a result of an
>>> +    * event other than #UD which has been resolved in the
>>> +    * meantime.
>>> +    */
>>> +   return X86EMUL_CONTINUE;
>>>  
>>> -   return emulator_write_emulated(ctxt, rip, instruction, 3,
>>> -                                  &ctxt->exception);
>>> +   return emulator_write_emulated(ctxt, rip, instruction,
>>> +                                  sizeof(instruction), &ctxt->exception);
>>>  }
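The guard added by the patch can be illustrated with a self-contained userspace sketch (the 3-byte sequence `0f 01 c1` is Intel's VMCALL encoding, used here only as an example): compare the existing bytes with what would be written, and skip the write when they already match, so a read-execute page is never dirtied needlessly.

```c
#include <string.h>

#define HYPERCALL_INSN_LEN 3

/* Returns 1 if the bytes differed and were rewritten, 0 if the
 * existing bytes already matched and the write was skipped. */
static int patch_hypercall_sketch(unsigned char *guest_insn,
                                  const unsigned char *wanted)
{
    if (memcmp(guest_insn, wanted, HYPERCALL_INSN_LEN) == 0)
        return 0;                 /* already correct: no write needed */

    memcpy(guest_insn, wanted, HYPERCALL_INSN_LEN);
    return 1;
}
```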


Re: [RFC PATCH v6 74/92] kvm: x86: do not unconditionally patch the hypercall instruction during emulation

2019-08-14 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 11:20:45 +0200, Paolo Bonzini  wrote:
> On 09/08/19 18:00, Adalbert Lazăr wrote:
> > From: Mihai Donțu 
> > 
> > It can happen that we end up emulating the VMCALL instruction as a
> > result of the handling of an EPT write fault. In this situation, the
> > emulator will try to unconditionally patch the correct hypercall opcode
> > bytes using emulator_write_emulated(). However, this last call uses the
> > fault GPA (if available) or walks the guest page tables at RIP,
> > otherwise. The trouble begins when using KVMI, when we forbid the use of
> > the fault GPA and fall back to the guest page-table walk: in Windows (8.1
> > and newer) the page that we try to write into is marked read-execute and
> > as such emulator_write_emulated() fails and we inject a write #PF,
> > leading to a guest crash.
> > 
> > The fix is rather simple: check the existing instruction bytes before
> > doing the patching. This does not change the normal KVM behaviour, but
> > does help when using KVMI as we no longer inject a write #PF.
> 
> Fixing the hypercall is just an optimization.  Can we just hush and
> return to the guest if emulator_write_emulated returns
> X86EMUL_PROPAGATE_FAULT?
> 
> Paolo

Something like this?

err = emulator_write_emulated(...);
if (err == X86EMUL_PROPAGATE_FAULT)
        err = X86EMUL_CONTINUE;
return err;

> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 04b1d2916a0a..965c4f0108eb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7363,16 +7363,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
> >  
> > +#define KVM_HYPERCALL_INSN_LEN 3
> > +
> >  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
> >  {
> > +   int err;
> > struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
> > -   char instruction[3];
> > +   char buf[KVM_HYPERCALL_INSN_LEN];
> > +   char instruction[KVM_HYPERCALL_INSN_LEN];
> > unsigned long rip = kvm_rip_read(vcpu);
> >  
> > +   err = emulator_read_emulated(ctxt, rip, buf, sizeof(buf),
> > +                                &ctxt->exception);
> > +   if (err != X86EMUL_CONTINUE)
> > +   return err;
> > +
> > kvm_x86_ops->patch_hypercall(vcpu, instruction);
> > +   if (!memcmp(instruction, buf, sizeof(instruction)))
> > +   /*
> > +    * The hypercall instruction is the correct one. Retry
> > +    * its execution; maybe we got here as a result of an
> > +    * event other than #UD which has been resolved in the
> > +    * meantime.
> > +    */
> > +   return X86EMUL_CONTINUE;
> >  
> > -   return emulator_write_emulated(ctxt, rip, instruction, 3,
> > -                                  &ctxt->exception);
> > +   return emulator_write_emulated(ctxt, rip, instruction,
> > +                                  sizeof(instruction), &ctxt->exception);
> >  }

Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-14 Thread Paolo Bonzini
On 14/08/19 11:48, Adalbert Lazăr wrote:
>> Why does closing the socket require destroying the kvmi object?  E.g. can
>> it be marked as defunct or whatever and only fully removed on a synchronous
>> unhook from userspace?  Re-hooking could either require said unhook, or
>> maybe reuse the existing kvmi object with a new socket.
> Will it be better to have the following ioctls?
> 
>   - hook (alloc kvmi and kvmi_vcpu structs)
>   - notify_imminent_unhook (send the KVMI_EVENT_UNHOOK event)
>   - unhook (free kvmi and kvmi_vcpu structs)

Yeah, that is nice also because it leaves the timeout policy to
userspace.  (BTW, please change references to QEMU to "userspace").

Paolo

Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-14 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 08:01:28 -0700, Sean Christopherson 
 wrote:
> On Tue, Aug 13, 2019 at 02:09:51PM +0200, Paolo Bonzini wrote:
> > On 13/08/19 13:57, Adalbert Lazăr wrote:
> > >> The refcounting approach seems a bit backwards, and AFAICT is driven by
> > >> implementing unhook via a message, which also seems backwards.  I assume
> > >> hook and unhook are relatively rare events and not performance critical,
> > >> so make those the restricted/slow flows, e.g. force userspace to quiesce
> > >> the VM by making unhook() mutually exclusive with every vcpu ioctl() and
> > >> maybe anything that takes kvm->lock. 
> > >>
> > >> Then kvmi_ioctl_unhook() can use thread_stop() and kvmi_recv() just needs
> > >> to check kthread_should_stop().
> > >>
> > >> That way kvmi doesn't need to be refcounted since it's guaranteed to be
> > >> alive if the pointer is non-null.  Eliminating the refcounting will clean
> > >> up a lot of the code by eliminating calls to kvmi_{get,put}(), e.g.
> > >> wrappers like kvmi_breakpoint_event() just check vcpu->kvmi, or maybe
> > >> even get dropped altogether.
> > > 
> > > The unhook event has been added to cover the following case: while the
> > > introspection tool runs in another VM, both VMs, the virtual appliance
> > > and the introspected VM, could be paused by the user. We needed a way
> > > to signal this to the introspection tool and give it time to unhook
> > > (the introspected VM has to run and execute the introspection commands
> > > during this phase). The receiving thread quits when the socket is closed
> > > (by QEMU or by the introspection tool).
> 
> Why does closing the socket require destroying the kvmi object?  E.g. can
> it be marked as defunct or whatever and only fully removed on a synchronous
> unhook from userspace?  Re-hooking could either require said unhook, or
> maybe reuse the existing kvmi object with a new socket.

Will it be better to have the following ioctls?

  - hook (alloc kvmi and kvmi_vcpu structs)
  - notify_imminent_unhook (send the KVMI_EVENT_UNHOOK event)
  - unhook (free kvmi and kvmi_vcpu structs)
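The three-ioctl lifecycle sketched above can be written down as a tiny state machine (illustrative C only; the state names and string tags are made up, not KVM's API):

```c
#include <string.h>

enum kvmi_state { KVMI_UNHOOKED, KVMI_HOOKED, KVMI_UNHOOK_PENDING };

/* Returns the next state for a given ioctl, or -1 if the ioctl is
 * not valid in the current state. */
static int kvmi_next_state(int state, const char *op)
{
    if (!strcmp(op, "hook") && state == KVMI_UNHOOKED)
        return KVMI_HOOKED;            /* alloc kvmi + kvmi_vcpu structs */
    if (!strcmp(op, "notify_imminent_unhook") && state == KVMI_HOOKED)
        return KVMI_UNHOOK_PENDING;    /* send KVMI_EVENT_UNHOOK */
    if (!strcmp(op, "unhook") && state != KVMI_UNHOOKED)
        return KVMI_UNHOOKED;          /* free kvmi + kvmi_vcpu structs */
    return -1;                         /* invalid in this state */
}
```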

Re: [RFC PATCH v6 14/92] kvm: introspection: handle introspection commands before returning to guest

2019-08-14 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 16:45:11 +0200, Paolo Bonzini  wrote:
> On 13/08/19 15:54, Adalbert Lazăr wrote:
> > Leaving kvm_vcpu_block() in order to handle a request such as 'pause',
> > would cause the vCPU to enter the guest when resumed. Most of the
> > time this does not appear to be an issue, but during early boot it
> > can happen for a non-boot vCPU to start executing code from areas that
> > first needed to be set up by vCPU #0.
> > 
> > In a particular case, vCPU #1 executed code which resided in an area
> > not covered by a memslot, which caused an EPT violation that got
> > turned in mmu_set_spte() into an MMIO request that required emulation.
> > Unfortunately, the emulator tripped, exited to userspace, and the VM
> > was aborted.
> 
> Okay, this makes sense.  Maybe you want to handle KVM_REQ_INTROSPECTION
> in vcpu_run rather than vcpu_enter_guest?
> 
> Paolo

Right! We've missed that.