Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Christoffer Dall
On Fri, Sep 23, 2016 at 11:10:46AM +0200, Alexander Graf wrote:
> 
> 
> On 23.09.16 10:57, Paolo Bonzini wrote:
> > 
> > 
> > On 23/09/2016 09:14, Alexander Graf wrote:
> >>>> +/*
> >>>> + * Synchronize the timer IRQ state with the interrupt controller.
> >>>> + */
> >>>>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
> >>>>  {
> >>>>  	int ret;
> >>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>>>  
> >>>>  	timer->active_cleared_last = false;
> >>>>  	timer->irq.level = new_level;
> >>>> -	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->irq.irq,
> >>>> +	trace_kvm_timer_update_irq(vcpu->vcpu_id, host_vtimer_irq,
> >>>>  				   timer->irq.level);
> >>>> +	[...]
> >>>> +	struct kvm_sync_regs *regs = &vcpu->run->s.regs;
> >>>> +
> >>>> +	/* Populate the timer bitmap for user space */
> >>>> +	regs->kernel_timer_pending &= ~KVM_ARM_TIMER_VTIMER;
> >>>> +	if (new_level)
> >>>> +		regs->kernel_timer_pending |= KVM_ARM_TIMER_VTIMER;
> >>>
> >>> I think if you got here, it means you have to exit to userspace to
> >>> inform it of the new state.  If you don't want to propagate a return
> >>
> >> Yes, but we can't exit straight away with our own exit reason because we
> >> might be inside an MMIO exit path here which already occupies the
> >> exit_reason.
> > 
> > So the idea is that whenever you're here you have one of the following
> > cases:
> > 
> > - you are coming from kvm_timer_flush_hwstate, and then you exit immediately
> > with KVM_EXIT_INTR if needed
> > 
> > - you are coming from the kvm_timer_sync_hwstate just before
> > handle_exit.  Then if there's a vmexit you have already set
> > regs->kernel_timer_pending, if not you'll do a kvm_timer_flush_hwstate soon.
> > 
> > - you are coming from the kvm_timer_sync_hwstate in the middle of
> > kvm_arch_vcpu_ioctl_run, and then "continue" will either exit the loop
> > immediately (if ret <= 0) or go to kvm_timer_flush_hwstate as in the
> > previous case
> > 
> > Right?
> 
> Yup :)
> 
> > 
> >>> Maybe I'm misunderstanding and user_timer_pending is just a cached
> >>> version of what you said last, but as I said above, I think you can just
> >>> compare timer->irq.level with the last value in the kvm_run struct, and if
> >>> something changed, you have to exit.
> >>
> >> So how would user space know whether the line went up or down? Or didn't
> >> change at all (if we coalesce with an MMIO exit)?
> > 
> > It would save the status of the line somewhere in its own variable,
> > without introducing a relatively delicate API between kernel and user.
> > 
> > I agree that you cannot update kernel_timer_pending only in
> > flush_hwstate; otherwise you miss case (2) when there is a vmexit.
> > That has to stay in kvm_timer_sync_hwstate (or it could become a new
> > function called at the very end of kvm_arch_vcpu_ioctl_run).
> 
> The beauty of having it in the timer update function is that it gets
> called from flush_hwstate() as well. That way we catch the update also
> in cases where we need it before we enter the vcpu.
> 
> > However, I'm not sure why user_timer_pending is necessary.  If it is
> > just the value you assigned to regs->kernel_timer_pending at the time of
> > the previous vmexit, can kvm_timer_flush_hwstate just do
> > 
> > 	if (timer->prev_kernel_timer_pending != regs->kernel_timer_pending) {
> > 		timer->prev_kernel_timer_pending = regs->kernel_timer_pending;
> > 		return 1;
> > 	}
> > 
> > ?  Or even
> > 
> > 	if (timer->prev_irq_level != timer->irq.level) {
> > 		timer->prev_irq_level = timer->irq.level;
> > 		return 1;
> > 	}
> > 
> > so that regs->kernel_timer_pending remains exclusively
> > kvm_timer_sync_hwstate's business?
> 
> We could do that too, yes. But it puts more burden on user space - it
> would have to ensure that it *always* checks for the pending timer
> status. With this approach, user space may opt to only check for timer
> state changes on -EINTR and things would still work.
> 
Huh?  I thought the whole point was that you could piggyback a change in
the line state on an MMIO exit, for example.  So I don't understand
this.

In any case, exposing to user space internal historical state that the
kernel only uses to figure out whether or not it should force an exit
(when there isn't already one happening) just feels weird to me.  My
emphasis is on having a clean timer emulation component in the kernel
where the semantics are clear.  Have a look at my proposal in the other
mail on this thread.

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Paolo Bonzini


On 23/09/2016 11:17, Alexander Graf wrote:
>> > 
>> > On the other hand, what happens if you run a new kernel with old userspace?
>> > With user_timer_pending you'd get an infinite stream of vmexits the
>> > first time the timer fires, wouldn't you?  Whereas if you keep it in the
>> > kernel, userspace would simply not get the interrupt (because it doesn't
>> > know about kernel_timer_pending) and think it got a spurious vmexit.
>> > The kernel's IRQ would stay masked and everything would just (not) work
>> > like before your patch?
> Yes, we'd definitely stay more compatible by tracking it only in the
> kernel. I'm not fully convinced that it's the better interface, but
> since both Christoffer and you seem to choke on that part, I'll give it
> a stab ;).

Cool!  FWIW my suggestion for kernel_timer_pending's name would be
timer_irq_level (nicely matching timer->irq.level in the kernel).

Paolo


Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Alexander Graf


On 23.09.16 11:15, Paolo Bonzini wrote:
> 
> 
> On 23/09/2016 11:10, Alexander Graf wrote:
>>>>> Maybe I'm misunderstanding and user_timer_pending is just a cached
>>>>> version of what you said last, but as I said above, I think you can just
>>>>> compare timer->irq.level with the last value in the kvm_run struct, and if
>>>>> something changed, you have to exit.
>>>>
>>>> So how would user space know whether the line went up or down? Or didn't
>>>> change at all (if we coalesce with an MMIO exit)?
>>>
>>> It would save the status of the line somewhere in its own variable,
>>> without introducing a relatively delicate API between kernel and user.
>>>
>>> I agree that you cannot update kernel_timer_pending only in
>>> flush_hwstate; otherwise you miss case (2) when there is a vmexit.
>>> That has to stay in kvm_timer_sync_hwstate (or it could become a new
>>> function called at the very end of kvm_arch_vcpu_ioctl_run).
>>
>> The beauty of having it in the timer update function is that it gets
>> called from flush_hwstate() as well. That way we catch the update also
>> in cases where we need it before we enter the vcpu.
> 
> Yeah, the timer update function is ideal because it is called from both
> flush_hwstate and sync_hwstate.
> 
>>> However, I'm not sure why user_timer_pending is necessary.  If it is
>>> just the value you assigned to regs->kernel_timer_pending at the time of
>>> the previous vmexit, can kvm_timer_flush_hwstate just do
>>>
>>> 	if (timer->prev_kernel_timer_pending != regs->kernel_timer_pending) {
>>> 		timer->prev_kernel_timer_pending = regs->kernel_timer_pending;
>>> 		return 1;
>>> 	}
>>>
>>> ?  Or even
>>>
>>> 	if (timer->prev_irq_level != timer->irq.level) {
>>> 		timer->prev_irq_level = timer->irq.level;
>>> 		return 1;
>>> 	}
>>>
>>> so that regs->kernel_timer_pending remains exclusively
>>> kvm_timer_sync_hwstate's business?
>>
>> We could do that too, yes. But it puts more burden on user space - it
>> would have to ensure that it *always* checks for the pending timer
>> status. With this approach, user space may opt to only check for timer
>> state changes on -EINTR and things would still work.
> 
> That would be overloading EINTR a bit though.  But that's a digression,
> I don't think it matters much.
> 
> On the other hand, what happens if you run a new kernel with old userspace?
> With user_timer_pending you'd get an infinite stream of vmexits the
> first time the timer fires, wouldn't you?  Whereas if you keep it in the
> kernel, userspace would simply not get the interrupt (because it doesn't
> know about kernel_timer_pending) and think it got a spurious vmexit.
> The kernel's IRQ would stay masked and everything would just (not) work
> like before your patch?

Yes, we'd definitely stay more compatible by tracking it only in the
kernel. I'm not fully convinced that it's the better interface, but
since both Christoffer and you seem to choke on that part, I'll give it
a stab ;).


Alex


Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Paolo Bonzini


On 23/09/2016 11:10, Alexander Graf wrote:
>>>> Maybe I'm misunderstanding and user_timer_pending is just a cached
>>>> version of what you said last, but as I said above, I think you can just
>>>> compare timer->irq.level with the last value in the kvm_run struct, and if
>>>> something changed, you have to exit.
>>>
>>> So how would user space know whether the line went up or down? Or didn't
>>> change at all (if we coalesce with an MMIO exit)?
>>
>> It would save the status of the line somewhere in its own variable,
>> without introducing a relatively delicate API between kernel and user.
>>
>> I agree that you cannot update kernel_timer_pending only in
>> flush_hwstate; otherwise you miss case (2) when there is a vmexit.
>> That has to stay in kvm_timer_sync_hwstate (or it could become a new
>> function called at the very end of kvm_arch_vcpu_ioctl_run).
> 
> The beauty of having it in the timer update function is that it gets
> called from flush_hwstate() as well. That way we catch the update also
> in cases where we need it before we enter the vcpu.

Yeah, the timer update function is ideal because it is called from both
flush_hwstate and sync_hwstate.

>> However, I'm not sure why user_timer_pending is necessary.  If it is
>> just the value you assigned to regs->kernel_timer_pending at the time of
>> the previous vmexit, can kvm_timer_flush_hwstate just do
>>
>> 	if (timer->prev_kernel_timer_pending != regs->kernel_timer_pending) {
>> 		timer->prev_kernel_timer_pending = regs->kernel_timer_pending;
>> 		return 1;
>> 	}
>>
>> ?  Or even
>>
>> 	if (timer->prev_irq_level != timer->irq.level) {
>> 		timer->prev_irq_level = timer->irq.level;
>> 		return 1;
>> 	}
>>
>> so that regs->kernel_timer_pending remains exclusively
>> kvm_timer_sync_hwstate's business?
> 
> We could do that too, yes. But it puts more burden on user space - it
> would have to ensure that it *always* checks for the pending timer
> status. With this approach, user space may opt to only check for timer
> state changes on -EINTR and things would still work.

That would be overloading EINTR a bit though.  But that's a digression,
I don't think it matters much.

On the other hand, what happens if you run a new kernel with old userspace?
With user_timer_pending you'd get an infinite stream of vmexits the
first time the timer fires, wouldn't you?  Whereas if you keep it in the
kernel, userspace would simply not get the interrupt (because it doesn't
know about kernel_timer_pending) and think it got a spurious vmexit.
The kernel's IRQ would stay masked and everything would just (not) work
like before your patch?
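Paolo's compatibility argument can be made concrete with a toy model. This is not KVM code: both functions are hypothetical, the timer is assumed to have already fired, and "old user space" is modelled as never writing the new kvm_run field. The first scheme compares the kernel's level with a user-written user_timer_pending copy; the second compares it against the level the kernel itself last reported.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not KVM code: count how many of n_runs KVM_RUN calls return
 * to user space immediately after the timer output has risen, when user
 * space is "old" and never touches the new kvm_run fields. */

/* Scheme 1: the kernel compares its level with a field written by user space. */
static int forced_exits_two_fields(int n_runs)
{
	bool kernel_level = true;  /* the timer has fired */
	bool user_level = false;   /* old user space never updates this */
	int exits = 0;

	for (int i = 0; i < n_runs; i++)
		if (kernel_level != user_level)
			exits++;           /* the mismatch never clears: exit every time */
	return exits;
}

/* Scheme 2: the kernel compares against the level it last reported itself. */
static int forced_exits_kernel_only(int n_runs)
{
	bool kernel_level = true;
	bool prev_reported = false;
	int exits = 0;

	for (int i = 0; i < n_runs; i++) {
		if (kernel_level != prev_reported) {
			prev_reported = kernel_level;  /* reported once, then remembered */
			exits++;
		}
	}
	return exits;
}
```

With ten KVM_RUN calls the first scheme returns immediately every time, which is the stream of vmexits described above, while the second reports the change once and then lets the guest run with the host interrupt still masked.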

Thanks,

Paolo


Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Alexander Graf


On 23.09.16 10:57, Paolo Bonzini wrote:
> 
> 
> On 23/09/2016 09:14, Alexander Graf wrote:
>>>> +/*
>>>> + * Synchronize the timer IRQ state with the interrupt controller.
>>>> + */
>>>>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
>>>>  {
>>>>  	int ret;
>>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>>>  
>>>>  	timer->active_cleared_last = false;
>>>>  	timer->irq.level = new_level;
>>>> -	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->irq.irq,
>>>> +	trace_kvm_timer_update_irq(vcpu->vcpu_id, host_vtimer_irq,
>>>>  				   timer->irq.level);
>>>> +	[...]
>>>> +	struct kvm_sync_regs *regs = &vcpu->run->s.regs;
>>>> +
>>>> +	/* Populate the timer bitmap for user space */
>>>> +	regs->kernel_timer_pending &= ~KVM_ARM_TIMER_VTIMER;
>>>> +	if (new_level)
>>>> +		regs->kernel_timer_pending |= KVM_ARM_TIMER_VTIMER;
>>>
>>> I think if you got here, it means you have to exit to userspace to
>>> inform it of the new state.  If you don't want to propagate a return
>>
>> Yes, but we can't exit straight away with our own exit reason because we
>> might be inside an MMIO exit path here which already occupies the
>> exit_reason.
> 
> So the idea is that whenever you're here you have one of the following
> cases:
> 
> - you are coming from kvm_timer_flush_hwstate, and then you exit immediately
> with KVM_EXIT_INTR if needed
> 
> - you are coming from the kvm_timer_sync_hwstate just before
> handle_exit.  Then if there's a vmexit you have already set
> regs->kernel_timer_pending, if not you'll do a kvm_timer_flush_hwstate soon.
> 
> - you are coming from the kvm_timer_sync_hwstate in the middle of
> kvm_arch_vcpu_ioctl_run, and then "continue" will either exit the loop
> immediately (if ret <= 0) or go to kvm_timer_flush_hwstate as in the
> previous case
> 
> Right?

Yup :)

> 
>>> Maybe I'm misunderstanding and user_timer_pending is just a cached
>>> version of what you said last, but as I said above, I think you can just
>>> compare timer->irq.level with the last value in the kvm_run struct, and if
>>> something changed, you have to exit.
>>
>> So how would user space know whether the line went up or down? Or didn't
>> change at all (if we coalesce with an MMIO exit)?
> 
> It would save the status of the line somewhere in its own variable,
> without introducing a relatively delicate API between kernel and user.
> 
> I agree that you cannot update kernel_timer_pending only in
> flush_hwstate; otherwise you miss case (2) when there is a vmexit.
> That has to stay in kvm_timer_sync_hwstate (or it could become a new
> function called at the very end of kvm_arch_vcpu_ioctl_run).

The beauty of having it in the timer update function is that it gets
called from flush_hwstate() as well. That way we catch the update also
in cases where we need it before we enter the vcpu.

> However, I'm not sure why user_timer_pending is necessary.  If it is
> just the value you assigned to regs->kernel_timer_pending at the time of
> the previous vmexit, can kvm_timer_flush_hwstate just do
> 
> 	if (timer->prev_kernel_timer_pending != regs->kernel_timer_pending) {
> 		timer->prev_kernel_timer_pending = regs->kernel_timer_pending;
> 		return 1;
> 	}
> 
> ?  Or even
> 
> 	if (timer->prev_irq_level != timer->irq.level) {
> 		timer->prev_irq_level = timer->irq.level;
> 		return 1;
> 	}
> 
> so that regs->kernel_timer_pending remains exclusively
> kvm_timer_sync_hwstate's business?

We could do that too, yes. But it puts more burden on user space - it
would have to ensure that it *always* checks for the pending timer
status. With this approach, user space may opt to only check for timer
state changes on -EINTR and things would still work.


Alex


Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Christoffer Dall
On Fri, Sep 23, 2016 at 09:14:13AM +0200, Alexander Graf wrote:
> 
> 
> On 22.09.16 23:28, Christoffer Dall wrote:
> > On Thu, Sep 22, 2016 at 02:52:49PM +0200, Alexander Graf wrote:
> >> We have 2 modes for dealing with interrupts in the ARM world. We can either
> >> handle them all using hardware acceleration through the vgic or we can emulate
> >> a gic in user space and only drive CPU IRQ pins from there.
> >>
> >> Unfortunately, when driving IRQs from user space, we never tell user space
> >> about timer events that may result in interrupt line state changes, so we
> >> lose out on timer events if we run with user space gic emulation.
> >>
> >> This patch fixes that by syncing user space's view of the vtimer irq line
> >> with the kvm view of that same line.
> >>
> >> With this patch I can successfully run edk2 and Linux with user space gic
> >> emulation.
> >>
> >> Signed-off-by: Alexander Graf 
> >>
> >> ---
> >>
> >> v1 -> v2:
> >>
> >>   - Add back curly brace that got lost
> >>
> >> v2 -> v3:
> >>
> >>   - Split into patch set
> >>
> >> v3 -> v4:
> >>
> >>   - Improve documentation
> >>
> >> v4 -> v5:
> >>
> >>   - Rewrite to use pending state sync in sregs (marc)
> >>   - Remove redundant checks of vgic_initialized()
> >>   - qemu tree to try this out: https://github.com/agraf/u-boot.git no-kvm-irqchip-for-v5
> > 
> > huh, qemu=u-boot?
> 
> Bleks, qemu.git of course.
> 
> > 
> >> ---
> >>  Documentation/virtual/kvm/api.txt |  26
> >>  arch/arm/include/uapi/asm/kvm.h   |   3 +
> >>  arch/arm/kvm/arm.c                |  14 ++---
> >>  arch/arm64/include/uapi/asm/kvm.h |   3 +
> >>  include/kvm/arm_arch_timer.h      |   2 +-
> >>  include/uapi/linux/kvm.h          |   6 ++
> >>  virt/kvm/arm/arch_timer.c         | 129 ++
> >>  7 files changed, 134 insertions(+), 49 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 739db9a..8049327 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -3928,3 +3928,29 @@ In order to use SynIC, it has to be activated by setting this
> >>  capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
> >>  will disable the use of APIC hardware virtualization even if supported
> >>  by the CPU, as it's incompatible with SynIC auto-EOI behavior.
> >> +
> >> +8.3 KVM_CAP_ARM_TIMER
> >> +
> >> +Architectures: arm, arm64
> >> +This capability, if KVM_CHECK_EXTENSION indicates that it is available and no
> >> +in-kernel interrupt controller is in use, means that the kernel populates
> >> +the vcpu's run->s.regs.kernel_timer_pending field with timers that are currently
> >> +considered pending by kvm.
> > 
> > Be careful with the word 'pending' here.  I think this could be
> > misleading, because pending is a state in the GIC, but not really
> > something I can find specific to the timer.  It would be more
> > descriptive to say that the kernel maintained generic timer's output
> > signal is asserted.
> 
> Sure, asserted works for me. Or maybe istatus?
> 

I think the concept that most closely describes the generic timer
architecture talks about 'asserting the timer output signal', so I'd
rather we stick to something as close to that as possible.

> > 
> >> +
> >> +If active, it also allows user space to propagate its own pending state of timer
> >> +interrupt lines using run->s.regs.user_timer_pending. If those two fields
> >> +mismatch during CPU execution, kvm will exit to user space to give it a chance
> > 
> > I don't quite understand the semantics here.  The only entity that knows
> > what the level state of the output of the timer is, is the kernel, which
> > emulates the timer.  Userspace knows interrupt controller state, but if
> > it has a different view of the timer state than the kernel, it's because
> > the kernel failed to notify userspace of a change or userspace failed to
> > listen?
> 
> Right, and the reason we have 2 fields is to get us exit-less (and
> easier) updates whenever we can. In most cases, taking the assertion
> down will coincide with an MMIO exit to user space for an EOI for example.

The assertion of the timer output signal is a completely separate
concept from an EOI.  Most often, the signal will be deasserted *before*
the EOI, but only noticed when you trap for the EOI.  Just to make that
distinction clear.

> 
> Somewhere in between in the development I had a version that explicitly
> triggered a KVM_EXIT for every state change of the timer. But that gets
> messy very quickly. You need to update the state change before an MMIO
> for example, otherwise a sequence like
> 
>   * set cval to future (which in turn sets istatus=0)
>   * read gic pending state
> 
> will give you bogus results, as the mmio read to user space did not yet
> have the timer status updated. And I really didn't want to wrap my head
> around

Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Paolo Bonzini


On 23/09/2016 09:14, Alexander Graf wrote:
>>> +/*
>>> + * Synchronize the timer IRQ state with the interrupt controller.
>>> + */
>>>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
>>>  {
>>> int ret;
>>> struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>>  
>>> timer->active_cleared_last = false;
>>> timer->irq.level = new_level;
>>> -   trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->irq.irq,
>>> +   trace_kvm_timer_update_irq(vcpu->vcpu_id, host_vtimer_irq,
>>>timer->irq.level);
>>> +   [...]
>>> +   struct kvm_sync_regs *regs = &vcpu->run->s.regs;
>>> +
>>> +   /* Populate the timer bitmap for user space */
>>> +   regs->kernel_timer_pending &= ~KVM_ARM_TIMER_VTIMER;
>>> +   if (new_level)
>>> +   regs->kernel_timer_pending |= KVM_ARM_TIMER_VTIMER;
>>
>> I think if you got here, it means you have to exit to userspace to
>> inform it of the new state.  If you don't want to propagate a return
> 
> Yes, but we can't exit straight away with our own exit reason because we
> might be inside an MMIO exit path here which already occupies the
> exit_reason.

So the idea is that whenever you're here you have one of the following
cases:

- you are coming from kvm_timer_flush_hwstate, and then you exit immediately
with KVM_EXIT_INTR if needed

- you are coming from the kvm_timer_sync_hwstate just before
handle_exit.  Then if there's a vmexit you have already set
regs->kernel_timer_pending, if not you'll do a kvm_timer_flush_hwstate soon.

- you are coming from the kvm_timer_sync_hwstate in the middle of
kvm_arch_vcpu_ioctl_run, and then "continue" will either exit the loop
immediately (if ret <= 0) or go to kvm_timer_flush_hwstate as in the
previous case

Right?

>> Maybe I'm misunderstanding and user_timer_pending is just a cached
>> version of what you said last, but as I said above, I think you can just
>> compare timer->irq.level with the last value in the kvm_run struct, and if
>> something changed, you have to exit.
> 
> So how would user space know whether the line went up or down? Or didn't
> change at all (if we coalesce with an MMIO exit)?

It would save the status of the line somewhere in its own variable,
without introducing a relatively delicate API between kernel and user.
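Paolo's alternative leaves edge detection to user space: it keeps the last level it saw and compares it with the kernel-provided level after every KVM_RUN return, whatever the exit reason. A minimal sketch, assuming the level arrives as a bitmap in kvm_run and assuming a bit value for KVM_ARM_TIMER_VTIMER; struct vtimer_shadow, gic_set_timer_line() and sync_vtimer_from_run() are hypothetical names, and the GIC hook is reduced to recording the line level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value; the patch names KVM_ARM_TIMER_VTIMER but its value is not shown. */
#define KVM_ARM_TIMER_VTIMER (1U << 0)

/* Hypothetical user-space state: the last vtimer level forwarded to the emulated GIC. */
struct vtimer_shadow {
	bool level;
};

/* Hypothetical hook into the user-space GIC model; here it only records the level. */
static bool gic_line;
static void gic_set_timer_line(bool level)
{
	gic_line = level;
}

/* Run after every KVM_RUN return, including MMIO exits.
 * Returns true if the line changed and was forwarded to the GIC. */
static bool sync_vtimer_from_run(struct vtimer_shadow *s, uint64_t timer_bits)
{
	bool level = (timer_bits & KVM_ARM_TIMER_VTIMER) != 0;

	if (level == s->level)
		return false;          /* no edge: nothing to forward */
	s->level = level;
	gic_set_timer_line(level);
	return true;
}
```

Because the comparison runs on every exit, a level change that coincides with an MMIO exit is still picked up, and user space learns whether the line went up, went down, or did not change without writing anything back to the kernel.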

I agree that you cannot update kernel_timer_pending only in
flush_hwstate; otherwise you miss case (2) when there is a vmexit.
That has to stay in kvm_timer_sync_hwstate (or it could become a new
function called at the very end of kvm_arch_vcpu_ioctl_run).

However, I'm not sure why user_timer_pending is necessary.  If it is
just the value you assigned to regs->kernel_timer_pending at the time of
the previous vmexit, can kvm_timer_flush_hwstate just do

	if (timer->prev_kernel_timer_pending != regs->kernel_timer_pending) {
		timer->prev_kernel_timer_pending = regs->kernel_timer_pending;
		return 1;
	}

?  Or even

	if (timer->prev_irq_level != timer->irq.level) {
		timer->prev_irq_level = timer->irq.level;
		return 1;
	}

so that regs->kernel_timer_pending remains exclusively
kvm_timer_sync_hwstate's business?
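Paolo's second variant reduces to a simple change detector on the kernel's own state; note that the saved value has to come from timer->irq.level, since that is where the timer level lives. A sketch on a simplified stand-in structure: struct timer_sketch and timer_flush_check() are hypothetical, prev_irq_level is the field proposed above, and the return convention (1 = exit to user space, 0 = enter the guest) follows the snippet and is otherwise an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct arch_timer_cpu. */
struct timer_sketch {
	bool irq_level;       /* current timer output level (timer->irq.level) */
	bool prev_irq_level;  /* level last reported to user space */
};

/* Called before entering the guest: request an exit only when the level
 * differs from what user space was last told, then remember the new level. */
static int timer_flush_check(struct timer_sketch *timer)
{
	if (timer->prev_irq_level != timer->irq_level) {
		timer->prev_irq_level = timer->irq_level;
		return 1;
	}
	return 0;
}
```

Since the previous level stays in the kernel, user space only ever sees the current level, which keeps regs->kernel_timer_pending a one-way output.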

Thanks,

Paolo

>>
>>> +
>>> +   /*
>>> +* As long as user space is aware that the timer is pending,
>>> +* we do not need to get new host timer events.
>>> +*/
>>
>> yes, correct, but I don't think this concept was clearly reflected in
>> your API text above.
>>
>>> +   if (timer->irq.level)
>>> +   disable_percpu_irq(host_vtimer_irq);
>>> +   else
>>> +   enable_percpu_irq(host_vtimer_irq, 0);
>>> +   }
>>
>> could we move these two blocks into their own functions instead?  That
>> would also give nice names to the huge chunk of complicated
>> functionality, e.g. flush_timer_state_to_user() and
>> flush_timer_state_to_vgic().
> 
> That's probably a very useful cleanup, yes :).
> 
> 
> Alex


Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-23 Thread Alexander Graf


On 22.09.16 23:28, Christoffer Dall wrote:
> On Thu, Sep 22, 2016 at 02:52:49PM +0200, Alexander Graf wrote:
>> We have 2 modes for dealing with interrupts in the ARM world. We can either
>> handle them all using hardware acceleration through the vgic or we can emulate
>> a gic in user space and only drive CPU IRQ pins from there.
>>
>> Unfortunately, when driving IRQs from user space, we never tell user space
>> about timer events that may result in interrupt line state changes, so we
>> lose out on timer events if we run with user space gic emulation.
>>
>> This patch fixes that by syncing user space's view of the vtimer irq line
>> with the kvm view of that same line.
>>
>> With this patch I can successfully run edk2 and Linux with user space gic
>> emulation.
>>
>> Signed-off-by: Alexander Graf 
>>
>> ---
>>
>> v1 -> v2:
>>
>>   - Add back curly brace that got lost
>>
>> v2 -> v3:
>>
>>   - Split into patch set
>>
>> v3 -> v4:
>>
>>   - Improve documentation
>>
>> v4 -> v5:
>>
>>   - Rewrite to use pending state sync in sregs (marc)
>>   - Remove redundant checks of vgic_initialized()
>>   - qemu tree to try this out: https://github.com/agraf/u-boot.git no-kvm-irqchip-for-v5
> 
> huh, qemu=u-boot?

Bleks, qemu.git of course.

> 
>> ---
>>  Documentation/virtual/kvm/api.txt |  26
>>  arch/arm/include/uapi/asm/kvm.h   |   3 +
>>  arch/arm/kvm/arm.c                |  14 ++---
>>  arch/arm64/include/uapi/asm/kvm.h |   3 +
>>  include/kvm/arm_arch_timer.h      |   2 +-
>>  include/uapi/linux/kvm.h          |   6 ++
>>  virt/kvm/arm/arch_timer.c         | 129 ++
>>  7 files changed, 134 insertions(+), 49 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 739db9a..8049327 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3928,3 +3928,29 @@ In order to use SynIC, it has to be activated by setting this
>>  capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
>>  will disable the use of APIC hardware virtualization even if supported
>>  by the CPU, as it's incompatible with SynIC auto-EOI behavior.
>> +
>> +8.3 KVM_CAP_ARM_TIMER
>> +
>> +Architectures: arm, arm64
>> +This capability, if KVM_CHECK_EXTENSION indicates that it is available and no
>> +in-kernel interrupt controller is in use, means that the kernel populates
>> +the vcpu's run->s.regs.kernel_timer_pending field with timers that are currently
>> +considered pending by kvm.
> 
> Be careful with the word 'pending' here.  I think this could be
> misleading, because pending is a state in the GIC, but not really
> something I can find specific to the timer.  It would be more
> descriptive to say that the kernel maintained generic timer's output
> signal is asserted.

Sure, asserted works for me. Or maybe istatus?

> 
>> +
>> +If active, it also allows user space to propagate its own pending state of timer
>> +interrupt lines using run->s.regs.user_timer_pending. If those two fields
>> +mismatch during CPU execution, kvm will exit to user space to give it a chance
> 
> I don't quite understand the semantics here.  The only entity that knows
> what the level state of the output of the timer is, is the kernel, which
> emulates the timer.  Userspace knows interrupt controller state, but if
> it has a different view of the timer state than the kernel, it's because
> the kernel failed to notify userspace of a change or userspace failed to
> listen?

Right, and the reason we have 2 fields is to get us exit-less (and
easier) updates whenever we can. In most cases, taking the assertion
down will coincide with an MMIO exit to user space for an EOI for example.

Somewhere in between in the development I had a version that explicitly
triggered a KVM_EXIT for every state change of the timer. But that gets
messy very quickly. You need to update the state change before an MMIO
for example, otherwise a sequence like

  * set cval to future (which in turn sets istatus=0)
  * read gic pending state

will give you bogus results, as the mmio read to user space did not yet
have the timer status updated. And I really didn't want to wrap my head
around restarting MMIO exits ;).

So having this side channel where user space potentially expects a timer
status change on every exit is much cleaner.
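Alex's example sequence depends on the generic timer's output rule: the virtual timer asserts its output only while it is enabled (CNTV_CTL.ENABLE), not masked (CNTV_CTL.IMASK), and the counter has reached the compare value. The sketch below encodes that rule, treating the CNTVCT/CVAL comparison as an unsigned 64-bit comparison; the helper names are hypothetical and the kernel's own implementation is structured differently. It illustrates that moving CVAL into the future drops the line without any GIC access, so user space must already hold the new level when it serves the following GIC MMIO read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL_EL0 control bits of the generic timer. */
#define CNTV_CTL_ENABLE (1U << 0)
#define CNTV_CTL_IMASK  (1U << 1)

/* Timer condition (what ISTATUS reports while the timer is enabled):
 * the virtual counter has reached the compare value. */
static bool timer_condition_met(uint64_t cntvct, uint64_t cval)
{
	return cntvct >= cval;
}

/* The timer's output signal is asserted only when it is enabled,
 * not masked, and its condition is met. */
static bool timer_output_asserted(uint32_t ctl, uint64_t cntvct, uint64_t cval)
{
	return (ctl & CNTV_CTL_ENABLE) && !(ctl & CNTV_CTL_IMASK) &&
	       timer_condition_met(cntvct, cval);
}
```

For example, with the counter at 1000 a CVAL of 900 asserts the output, while rewriting CVAL to 2000 deasserts it immediately; only the timer registers were touched.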

> 
>> +to update its own interrupt pending status. This usually involves triggering
>> +an interrupt line on a user space emulated interrupt controller.
> 
> To me it feels like the semantics should be that userspace can always
> derive the status of the timer and the level of the output signal from
> the timer by simply looking at kvm_run structure.

Yes, it can, and it does. That's what the "kernel_timer_pending" field is.

The kernel however also needs to know whether user space's view is in
sync, because we don't know whether there was an exit between our
internal state 

Re: [PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-22 Thread Christoffer Dall
On Thu, Sep 22, 2016 at 02:52:49PM +0200, Alexander Graf wrote:
> We have 2 modes for dealing with interrupts in the ARM world. We can either
> handle them all using hardware acceleration through the vgic or we can emulate
> a gic in user space and only drive CPU IRQ pins from there.
> 
> Unfortunately, when driving IRQs from user space, we never tell user space
> about timer events that may result in interrupt line state changes, so we
> lose out on timer events if we run with user space gic emulation.
> 
> This patch fixes that by syncing user space's view of the vtimer irq line
> with the kvm view of that same line.
> 
> With this patch I can successfully run edk2 and Linux with user space gic
> emulation.
> 
> Signed-off-by: Alexander Graf 
> 
> ---
> 
> v1 -> v2:
> 
>   - Add back curly brace that got lost
> 
> v2 -> v3:
> 
>   - Split into patch set
> 
> v3 -> v4:
> 
>   - Improve documentation
> 
> v4 -> v5:
> 
>   - Rewrite to use pending state sync in sregs (marc)
>   - Remove redundant checks of vgic_initialized()
>   - qemu tree to try this out: https://github.com/agraf/u-boot.git no-kvm-irqchip-for-v5

huh, qemu=u-boot?

> ---
>  Documentation/virtual/kvm/api.txt |  26
>  arch/arm/include/uapi/asm/kvm.h   |   3 +
>  arch/arm/kvm/arm.c                |  14 ++---
>  arch/arm64/include/uapi/asm/kvm.h |   3 +
>  include/kvm/arm_arch_timer.h      |   2 +-
>  include/uapi/linux/kvm.h          |   6 ++
>  virt/kvm/arm/arch_timer.c         | 129 ++
>  7 files changed, 134 insertions(+), 49 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 739db9a..8049327 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3928,3 +3928,29 @@ In order to use SynIC, it has to be activated by setting this
>  capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
>  will disable the use of APIC hardware virtualization even if supported
>  by the CPU, as it's incompatible with SynIC auto-EOI behavior.
> +
> +8.3 KVM_CAP_ARM_TIMER
> +
> +Architectures: arm, arm64
> +This capability, if KVM_CHECK_EXTENSION indicates that it is available and no
> +in-kernel interrupt controller is in use, means that the kernel populates
> +the vcpu's run->s.regs.kernel_timer_pending field with timers that are currently
> +considered pending by kvm.

Be careful with the word 'pending' here.  I think this could be
misleading, because pending is a state in the GIC, but not really
something I can find specific to the timer.  It would be more
descriptive to say that the kernel-maintained generic timer's output
signal is asserted.
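To make the "asserted" wording concrete: in the ARM generic timer, the virtual timer's output signal is asserted when CNTV_CTL has ENABLE set and IMASK clear and the virtual count has reached CNTV_CVAL. The sketch below models only that rule; struct vtimer_state and the helper names are hypothetical, and only the CNTV_CTL bit positions come from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL bit positions defined by the ARM generic timer architecture. */
#define CNTV_CTL_ENABLE (1u << 0)
#define CNTV_CTL_IMASK  (1u << 1)

/* Hypothetical snapshot of the guest's virtual timer state. */
struct vtimer_state {
	uint32_t cntv_ctl;  /* control register (ENABLE, IMASK, ...) */
	uint64_t cntv_cval; /* compare value */
	uint64_t cntvct;    /* current virtual count */
};

/* Timer condition (reported as ISTATUS): the count has reached CVAL. */
static bool vtimer_condition_met(const struct vtimer_state *t)
{
	return t->cntvct >= t->cntv_cval;
}

/* Output is asserted only if enabled, unmasked and the condition is met. */
static bool vtimer_output_asserted(const struct vtimer_state *t)
{
	return (t->cntv_ctl & CNTV_CTL_ENABLE) &&
	       !(t->cntv_ctl & CNTV_CTL_IMASK) &&
	       vtimer_condition_met(t);
}
```

Nothing in this rule refers to GIC state such as "pending"; that only arises once the signal reaches an interrupt controller.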

> +
> +If active, it also allows user space to propagate its own pending state of timer
> +interrupt lines using run->s.regs.user_timer_pending. If those two fields
> +mismatch during CPU execution, kvm will exit to user space to give it a chance

I don't quite understand the semantics here.  The only entity that knows
the level of the timer's output is the kernel, which emulates the timer.
Userspace knows the interrupt controller state, but if it has a different
view of the timer state than the kernel, isn't that because the kernel
failed to notify userspace of a change or userspace failed to listen?
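Read literally, the proposed text means the kernel compares the two bitmaps before guest entry and, if they differ, returns to user space with KVM_EXIT_INTR instead of running the guest. A minimal sketch under that reading; struct sketch_vcpu, enum sketch_exit_reason, timer_needs_user_exit() and vcpu_run_once() are hypothetical stand-ins, and the KVM_ARM_TIMER_VTIMER value is assumed because the uapi/linux/kvm.h hunk is not quoted here.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed bit value; the real definition is in the patch's uapi header. */
#define KVM_ARM_TIMER_VTIMER (1u << 0)

/* Hypothetical stand-in for run->exit_reason values. */
enum sketch_exit_reason { EXIT_NONE, EXIT_INTR };

/* Hypothetical reduction of the vcpu/kvm_run state involved. */
struct sketch_vcpu {
	uint8_t kernel_timer_pending; /* timer outputs as seen by kvm */
	uint8_t user_timer_pending;   /* timer lines as reported by user space */
	enum sketch_exit_reason exit_reason;
};

/* Nonzero if user space has not caught up with the kernel's timer state. */
static int timer_needs_user_exit(const struct sketch_vcpu *v)
{
	return (v->kernel_timer_pending ^ v->user_timer_pending) != 0;
}

/* One pass of the guest entry path, reduced to the timer check. */
static int vcpu_run_once(struct sketch_vcpu *v)
{
	if (timer_needs_user_exit(v)) {
		v->exit_reason = EXIT_INTR;
		return -EINTR; /* skip guest entry, return to user space */
	}
	/* ... guest entry would happen here ... */
	return 1;
}
```

The sketch makes the question above visible: the only input user space contributes is an acknowledgement of state the kernel already knows.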

> +to update its own interrupt pending status. This usually involves triggering
> +an interrupt line on a user space emulated interrupt controller.

To me it feels like the semantics should be that userspace can always
derive the status of the timer and the level of the output signal from
the timer by simply looking at the kvm_run structure.

The remaining two problems are:

(1) when should the kernel trigger exits to userspace?  Presumably on
any change in the timer's output level, because this change has to be
propagated to the userspace interrupt controller.

(2) the kernel needs to somehow mask the underlying hardware timer
interrupt signal when it's active, because otherwise the guest won't
proceed.  If we simply mask the hardware signal after telling userspace
that the output signal is asserted, and keep it masked until the output
signal becomes deasserted, why do we need to listen to anything
userspace has to say?
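The alternative in (1) and (2) can be sketched as follows, assuming the kernel remembers the last output level it exported through kvm_run and keeps the host vtimer IRQ masked while that level is asserted. struct user_timer_sync and timer_output_changed() are hypothetical names; note that no user-supplied state is consulted at all.

```c
#include <stdbool.h>

/* Hypothetical per-vcpu bookkeeping when the irqchip lives in user space. */
struct user_timer_sync {
	bool reported_level; /* last output level exported via kvm_run */
	bool hw_irq_masked;  /* is the host's vtimer IRQ currently masked? */
};

/*
 * Called whenever the emulated timer output may have changed.
 * Returns true if the vcpu must exit so user space can propagate
 * the new level to its interrupt controller.
 */
static bool timer_output_changed(struct user_timer_sync *s, bool new_level)
{
	if (new_level == s->reported_level)
		return false; /* nothing new to tell user space */

	s->reported_level = new_level;
	/*
	 * Mask the hardware interrupt while the output is asserted so the
	 * guest can make progress; unmask once it is deasserted again.
	 */
	s->hw_irq_masked = new_level;
	return true;
}
```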


> +
> +The fields run->s.regs.kernel_timer_pending and run->s.regs.user_timer_pending
> +are available independent of run->kvm_valid_regs or run->kvm_dirty_regs bits.
> +If no in-kernel interrupt controller is used and the capability exists, they
> +will always be available and used.
> +
> +Currently the following bits are defined for both bitmaps:
> +
> +KVM_ARM_TIMER_VTIMER  -  virtual timer
> +
> +Future versions of kvm may implement additional timer events. These will get
> +indicated by additional KVM_CAP extensions.
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index a2b3eb3..caad81d 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -105

[PATCH v5] KVM: arm/arm64: Route vtimer events to user space

2016-09-22 Thread Alexander Graf
We have 2 modes for dealing with interrupts in the ARM world. We can either
handle them all using hardware acceleration through the vgic or we can emulate
a gic in user space and only drive CPU IRQ pins from there.

Unfortunately, when driving IRQs from user space, we never tell user space
about timer events that may result in interrupt line state changes, so we
lose out on timer events if we run with user space gic emulation.

This patch fixes that by syncing user space's view of the vtimer irq line
with the kvm view of that same line.
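On the user space side, that sync might look like the sketch below: after KVM_RUN returns, propagate every timer bit that differs between kernel_timer_pending and user_timer_pending to the emulated GIC, then copy the kernel bitmap back so the next KVM_RUN does not exit again for the same change. The two field names follow the patch's struct kvm_sync_regs; struct fake_gic, gic_set_vtimer_line(), sync_timer_lines() and the KVM_ARM_TIMER_VTIMER value are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value; the real definition is in the patch's uapi header. */
#define KVM_ARM_TIMER_VTIMER (1u << 0)

/* Mirrors the two fields the patch adds to struct kvm_sync_regs. */
struct timer_sync_regs {
	uint8_t kernel_timer_pending;
	uint8_t user_timer_pending;
};

/* Hypothetical user space GIC model: one line level for the vtimer PPI. */
struct fake_gic {
	bool vtimer_line;
};

static void gic_set_vtimer_line(struct fake_gic *gic, bool level)
{
	gic->vtimer_line = level;
}

/*
 * Run after KVM_RUN returns: forward changed timer bits to the GIC model,
 * then acknowledge the kernel's state so the bitmaps match again.
 */
static void sync_timer_lines(struct timer_sync_regs *regs, struct fake_gic *gic)
{
	uint8_t changed = regs->kernel_timer_pending ^ regs->user_timer_pending;

	if (changed & KVM_ARM_TIMER_VTIMER)
		gic_set_vtimer_line(gic,
			(regs->kernel_timer_pending & KVM_ARM_TIMER_VTIMER) != 0);

	regs->user_timer_pending = regs->kernel_timer_pending;
}
```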

With this patch I can successfully run edk2 and Linux with user space gic
emulation.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Add back curly brace that got lost

v2 -> v3:

  - Split into patch set

v3 -> v4:

  - Improve documentation

v4 -> v5:

  - Rewrite to use pending state sync in sregs (marc)
  - Remove redundant checks of vgic_initialized()
  - qemu tree to try this out: https://github.com/agraf/u-boot.git no-kvm-irqchip-for-v5
---
 Documentation/virtual/kvm/api.txt |  26 
 arch/arm/include/uapi/asm/kvm.h   |   3 +
 arch/arm/kvm/arm.c                |  14 ++---
 arch/arm64/include/uapi/asm/kvm.h |   3 +
 include/kvm/arm_arch_timer.h      |   2 +-
 include/uapi/linux/kvm.h          |   6 ++
 virt/kvm/arm/arch_timer.c         | 129 ++
 7 files changed, 134 insertions(+), 49 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 739db9a..8049327 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3928,3 +3928,29 @@ In order to use SynIC, it has to be activated by setting this
 capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
 will disable the use of APIC hardware virtualization even if supported
 by the CPU, as it's incompatible with SynIC auto-EOI behavior.
+
+8.3 KVM_CAP_ARM_TIMER
+
+Architectures: arm, arm64
+This capability, if KVM_CHECK_EXTENSION indicates that it is available and no
+in-kernel interrupt controller is in use, means that the kernel populates
+the vcpu's run->s.regs.kernel_timer_pending field with timers that are currently
+considered pending by kvm.
+
+If active, it also allows user space to propagate its own pending state of timer
+interrupt lines using run->s.regs.user_timer_pending. If those two fields
+mismatch during CPU execution, kvm will exit to user space to give it a chance
+to update its own interrupt pending status. This usually involves triggering
+an interrupt line on a user space emulated interrupt controller.
+
+The fields run->s.regs.kernel_timer_pending and run->s.regs.user_timer_pending
+are available independent of run->kvm_valid_regs or run->kvm_dirty_regs bits.
+If no in-kernel interrupt controller is used and the capability exists, they
+will always be available and used.
+
+Currently the following bits are defined for both bitmaps:
+
+KVM_ARM_TIMER_VTIMER  -  virtual timer
+
+Future versions of kvm may implement additional timer events. These will get
+indicated by additional KVM_CAP extensions.
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index a2b3eb3..caad81d 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -105,6 +105,9 @@ struct kvm_debug_exit_arch {
 };
 
 struct kvm_sync_regs {
+   /* Used with KVM_CAP_ARM_TIMER */
+   u8 kernel_timer_pending;
+   u8 user_timer_pending;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..dc19221 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -187,6 +187,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PSCI_0_2:
 	case KVM_CAP_READONLY_MEM:
 	case KVM_CAP_MP_STATE:
+	case KVM_CAP_ARM_TIMER:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -474,13 +475,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 			return ret;
 	}
 
-	/*
-	 * Enable the arch timers only if we have an in-kernel VGIC
-	 * and it has been properly initialized, since we cannot handle
-	 * interrupts from the virtual timer with a userspace gic.
-	 */
-	if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-		ret = kvm_timer_enable(vcpu);
+	ret = kvm_timer_enable(vcpu);
 
 	return ret;
 }
@@ -588,7 +583,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		preempt_disable();
 		kvm_pmu_flush_hwstate(vcpu);
-		kvm_timer_flush_hwstate(vcpu);
+		if (kvm_timer_flush_hwstate(vcpu)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+		}
 		kvm_vgic_flush_hwstate(vcpu);
 
 		local_irq_disable();
diff --git a/arch/arm64/include/uapi