Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Andre Przywara
Hi Pavel,

On 02/10/15 13:39, Pavel Fedin wrote:
>  Hello!
> 
>> Can't you use the ELRSR bitmap instead? The idea of lr_used sounds like
>> a moot optimization to me.
> 
>  This perfectly works on 4.2, but will break HW interrupt forwarding on 4.3. 
> If you look at 4.3
> __kvm_vgic_sync_hwstate(), you'll notice that for HW interrupts lr_used and 
> elrsr_ptr will diverge
> at this point, and this function actually brings them into sync. And it 
> relies on lr_used for the
> loop to operate correctly (no idea why we use "for" loop here with extra 
> check instead of
> "for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr)", looks stupid to me).

I know, because I have reworked my patch lately to work on top of 4.3-rc
and Christoffer's timer rework series. I have something which "works
now"(TM), but wanted to wait for Christoffer's respin to send out.
I will send you this new version this as a sneak preview in private,
maybe that helps.

Cheers,
Andre.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Pavel Fedin
 Hello! One more concern.

> Currently we track which IRQ has been mapped to which VGIC list
> register and also have to synchronize both. We used to do this
> to hold some extra state (for instance the active bit).
> It turns out that this extra state in the LRs is no longer needed and
> this extra tracking causes some pain later.

 Returning to the beginning, can you explain, what kind of pain exactly does it 
bring?
 For some reasons here i had to keep all this tracking mechanism, and modified 
your patchset. I see
no significant problems, except memory usage. I have to allocate 
vgic_irq_lr_map large enough to
hold indexes up to 16384, and indexes from dist->nr_irqs to 8192 appear to be 
not used.
 Since the map itself is actually used only for piggy-back optimization, it is 
possible to easily
get rid of it using for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) 
iteration instead. The rest
of mechanism will work as it is, there's no need to remove the state tracking 
bitmap and
optimization itself.
 I am currently testing this approach and preparing my alternative patch for 
upstreaming.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Pavel Fedin
 Hello!

> For instance how did you come up with that 16384? LPIs could actually be
> much bigger (in fact the emulation currently support up to 64k).

 Well, read "at least 16384". I actually don't remember the exact value you've 
put in.

> >  Since the map itself is actually used only for piggy-back optimization, it 
> > is possible to
> easily
> > get rid of it using for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) 
> > iteration instead.
> 
> Can't you use the ELRSR bitmap instead? The idea of lr_used sounds like
> a moot optimization to me.

 I'll take a look. You seem to be right, lr_used became a direct (well, 
inverted) copy of elrsr
after full elrsr synchronization was introduced long time ago. It's just 
current code relying on
lr_used everywhere.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Pavel Fedin
 Hello!

> Can't you use the ELRSR bitmap instead? The idea of lr_used sounds like
> a moot optimization to me.

 This perfectly works on 4.2, but will break HW interrupt forwarding on 4.3. If 
you look at 4.3
__kvm_vgic_sync_hwstate(), you'll notice that for HW interrupts lr_used and 
elrsr_ptr will diverge
at this point, and this function actually brings them into sync. And it relies 
on lr_used for the
loop to operate correctly (no idea why we use "for" loop here with extra check 
instead of
"for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr)", looks stupid to me).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Andre Przywara
Hi Pavel,

On 02/10/15 10:55, Pavel Fedin wrote:
>  Hello! One more concern.
> 
>> Currently we track which IRQ has been mapped to which VGIC list
>> register and also have to synchronize both. We used to do this
>> to hold some extra state (for instance the active bit).
>> It turns out that this extra state in the LRs is no longer needed and
>> this extra tracking causes some pain later.
> 
>  Returning to the beginning, can you explain, what kind of pain exactly does 
> it bring?
>  For some reasons here i had to keep all this tracking mechanism, and 
> modified your patchset. I see
> no significant problems, except memory usage. I have to allocate 
> vgic_irq_lr_map large enough to
> hold indexes up to 16384, and indexes from dist->nr_irqs to 8192 appear to be 
> not used.

Yes, this was the main problem I was concerned about. Actually not so
much about memory usage really, but more about the idea of pushing the
concept of bitmaps beyond the point where it is a reasonable data
structure to use.
I briefly thought about extending the bitmaps, but that sounded like a
hack to me.
For instance how did you come up with that 16384? LPIs could actually be
much bigger (in fact the emulation currently support up to 64k).
In my opinion removing that tracking mechanism is actually a good idea.
Most implementations actually have only _four_ list registers and since
we keep them shadowed in our KVM data structure, iterating over all of
them is not really painful.

>  Since the map itself is actually used only for piggy-back optimization, it 
> is possible to easily
> get rid of it using for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) 
> iteration instead.

Can't you use the ELRSR bitmap instead? The idea of lr_used sounds like
a moot optimization to me.

Cheers,
Andre.

> The rest
> of mechanism will work as it is, there's no need to remove the state tracking 
> bitmap and
> optimization itself.
>  I am currently testing this approach and preparing my alternative patch for 
> upstreaming.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-09-02 Thread Andre Przywara
On 31/08/15 09:42, Eric Auger wrote:
> On 08/24/2015 06:33 PM, Andre Przywara wrote:

Salut Eric,

...

 @@ -1126,9 +1124,9 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu 
 *vcpu, int irq,
   */
  bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
  {
 -   struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
 struct vgic_dist *dist = >kvm->arch.vgic;
 -   struct vgic_lr vlr;
 +   u64 elrsr = vgic_get_elrsr(vcpu);
 +   unsigned long *elrsr_ptr = u64_to_bitmask();
 int lr;

 /* Sanitize the input... */
 @@ -1138,42 +1136,20 @@ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 
 sgi_source_id, int irq)

 kvm_debug("Queue IRQ%d\n", irq);

 -   lr = vgic_cpu->vgic_irq_lr_map[irq];
 -
 -   /* Do we have an active interrupt for the same CPUID? */
 -   if (lr != LR_EMPTY) {
 -   vlr = vgic_get_lr(vcpu, lr);
 -   if (vlr.source == sgi_source_id) {
 -   kvm_debug("LR%d piggyback for IRQ%d\n", lr, vlr.irq);
 -   BUG_ON(!test_bit(lr, vgic_cpu->lr_used));
 -   vgic_queue_irq_to_lr(vcpu, irq, lr, vlr);
 -   return true;
 -   }
 -   }
 +   lr = find_first_bit(elrsr_ptr, vgic->nr_lr);

 -   /* Try to use another LR for this interrupt */
 -   lr = find_first_zero_bit((unsigned long *)vgic_cpu->lr_used,
 -  vgic->nr_lr);
 if (lr >= vgic->nr_lr)
 return false;

 kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id);
 -   vgic_cpu->vgic_irq_lr_map[irq] = lr;
 -   set_bit(lr, vgic_cpu->lr_used);

 -   vlr.irq = irq;
 -   vlr.source = sgi_source_id;
 -   vlr.state = 0;
 -   vgic_queue_irq_to_lr(vcpu, irq, lr, vlr);
 +   vgic_queue_irq_to_lr(vcpu, irq, lr, sgi_source_id);

 return true;
  }

  static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
  {
 -   if (!vgic_can_sample_irq(vcpu, irq))
 -   return true; /* level interrupt, already queued */
 -
>>> I think that change needs to be introduced in a separate patch as the
>>> other one mentioned above and justified since it affects the state machine.
>>
>> But this change is dependent on this patch: it will not work without
>> this patch and this patch will not work without that change.
>> So the idea is that on returning from the guest we now harvest all
>> _used_ LRs by copying their state back into the distributor. The
>> previous behaviour was to just check _unused_ LRs for completed IRQs.
>> So now all IRQs need to be re-inserted into the LRs before the next
>> guest run, that's why we have to remove the test which skipped this for
>> IRQs where the code knew that they were still in the LRs.
>> Does that make sense?
> In level sensitive case, what if the IRQ'LR state was active. LR was
> kept intact. IRQ is queued. With previous code we wouldn't inject a new
> IRQ. Here aren't we going to inject it?

Effectively we only inject _pending_ IRQs: I was going forth and back
through the current code and couldn't find a place where we actually
make use of any active-only interrupt - the only exception being
migration, where we explicitly iterate through all LRs again and pick up
active-only IRQs as well. We never set the active state except there.

So I decided to keep only-active IRQs in the LRs and do not propagate
them back into the emulated distributor state. One reason is that we
don't use it, which makes the code look silly and secondly we avoid the
issue you described above.

> In the sync when you move the IRQ from the LR reg to the state variable,
> shouldn't you reset the queued state for level sensitive case? In such a
> case I think you could keep that check.

We keep them as "queued" in our emulation until they become inactive and
we clear the queued bit in the EOI handler.

Admittedly this whole approach is not obvious and it's perfectly
possible that I missed something. So please can you check and confirm my
assumptions above?

Merci,
André

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-08-31 Thread Eric Auger
On 08/24/2015 06:33 PM, Andre Przywara wrote:
> Hi Eric,
> 
> On 12/08/15 10:01, Eric Auger wrote:
> 
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index bc40137..394622c 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -79,7 +79,6 @@
>>>  #include "vgic.h"
>>>  
>>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>> -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>>>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>>>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr 
>>> lr_desc);
>>>  
>>> @@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct 
>>> kvm_exit_mmio *mmio,
>>> return false;
>>>  }
>>>  
>>> +static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
>>> +  struct vgic_lr vlr)
>>> +{
>>> +   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
>>> +}
>> why not renaming this into vgic_set_elrsr. This would be homogeneous
>> with other virtual interface control register setters?
> 
> But that would involve renaming the vgic_ops members as well to be
> consistent, right?
yes
 As there is no change in the behaviour, a naming
> change sounds unmotivated to me. And _set_ wouldn't be exact, as this
> function deals only with only one bit at a time and allows to clear it
> as well.
OK. Not that much important. That's just I had in mind the
__kvm_vgic_SYNC_hwstate which has more intelligence ;-) That function
just sets one bit of elrsr ...
> 
>>> +
>>> +static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
>>> +{
>>> +   return vgic_ops->get_elrsr(vcpu);
>>> +}
>> If I am not wrong, each time you manipulate the elrsr you handle the
>> bitmap. why not directly returning an unsigned long * then (elrsr_ptr)?
> 
> Because the pointer needs to point somewhere, and that storage is
> currently located on the caller's stack. Directly returning a pointer
> would require the caller to provide some memory for the u64, which does
> not save you so much in terms on LOC:
> 
> - u64 elrsr = vgic_get_elrsr(vcpu);
> - unsigned long *elrsr_ptr = u64_to_bitmask();
> + u64 elrsr;
> + unsigned long *elrsr_ptr = vgic_get_elrsr_bm(vcpu, );
> 
> Also we need u64_to_bitmask() in one case when converting the EISR
> value, so we cannot get lost of that function.

OK fair enough
> 
>>> +
>>>  /**
>>>   * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
>>>   * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
>>> @@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct 
>>> kvm_exit_mmio *mmio,
>>>  void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
>>>  {
>>> struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
>>> +   u64 elrsr = vgic_get_elrsr(vcpu);
>>> +   unsigned long *elrsr_ptr = u64_to_bitmask();
>>> int i;
>>>  
>>> -   for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
>>> +   for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
>>> struct vgic_lr lr = vgic_get_lr(vcpu, i);
>>>  
>>> /*
>>> @@ -703,7 +715,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
>>>  * Mark the LR as free for other use.
>>>  */
>>> BUG_ON(lr.state & LR_STATE_MASK);
>>> -   vgic_retire_lr(i, lr.irq, vcpu);
>>> +   vgic_sync_lr_elrsr(vcpu, i, lr);
>>> vgic_irq_clear_queued(vcpu, lr.irq);
>>>  
>>> /* Finally update the VGIC state. */
>>> @@ -1011,17 +1023,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int 
>>> lr,
>>> vgic_ops->set_lr(vcpu, lr, vlr);
>>>  }
>>>  
>>> -static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
>>> -  struct vgic_lr vlr)
>>> -{
>>> -   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
>>> -}
>>> -
>>> -static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
>>> -{
>>> -   return vgic_ops->get_elrsr(vcpu);
>>> -}
>>> -
>>>  static inline u64 vgic_get_eisr(struct kvm_vcpu *vcpu)
>>>  {
>>> return vgic_ops->get_eisr(vcpu);
>>> @@ -1062,18 +1063,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
>>> vgic_ops->enable(vcpu);
>>>  }
>>>  
>>> -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
>>> -{
>>> -   struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
>>> -   struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
>>> -
>>> -   vlr.state = 0;
>>> -   vgic_set_lr(vcpu, lr_nr, vlr);
>>> -   clear_bit(lr_nr, vgic_cpu->lr_used);
>>> -   vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
>>> -   vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>>> -}
>>> -
>>>  /*
>>>   * An interrupt may have been disabled after being made pending on the
>>>   * CPU interface (the classic case is a timer running while we're
>>> @@ -1085,23 +1074,32 @@ static void vgic_retire_lr(int lr_nr, int irq, 
>>> struct kvm_vcpu *vcpu)
>>>   */
>>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
>>>  {
>>> -   struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
>>> +   u64 elrsr = 

Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-08-24 Thread Andre Przywara
Hi Eric,

On 12/08/15 10:01, Eric Auger wrote:

 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index bc40137..394622c 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -79,7 +79,6 @@
  #include vgic.h
  
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr 
 lr_desc);
  
 @@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
 *mmio,
  return false;
  }
  
 +static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
 +   struct vgic_lr vlr)
 +{
 +vgic_ops-sync_lr_elrsr(vcpu, lr, vlr);
 +}
 why not renaming this into vgic_set_elrsr. This would be homogeneous
 with other virtual interface control register setters?

But that would involve renaming the vgic_ops members as well to be
consistent, right? As there is no change in the behaviour, a naming
change sounds unmotivated to me. And _set_ wouldn't be exact, as this
function deals only with only one bit at a time and allows to clear it
as well.

 +
 +static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 +{
 +return vgic_ops-get_elrsr(vcpu);
 +}
 If I am not wrong, each time you manipulate the elrsr you handle the
 bitmap. why not directly returning an unsigned long * then (elrsr_ptr)?

Because the pointer needs to point somewhere, and that storage is
currently located on the caller's stack. Directly returning a pointer
would require the caller to provide some memory for the u64, which does
not save you so much in terms on LOC:

-   u64 elrsr = vgic_get_elrsr(vcpu);
-   unsigned long *elrsr_ptr = u64_to_bitmask(elrsr);
+   u64 elrsr;
+   unsigned long *elrsr_ptr = vgic_get_elrsr_bm(vcpu, elrsr);

Also we need u64_to_bitmask() in one case when converting the EISR
value, so we cannot get lost of that function.

 +
  /**
   * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
   * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
 @@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
 *mmio,
  void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
  {
  struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu;
 +u64 elrsr = vgic_get_elrsr(vcpu);
 +unsigned long *elrsr_ptr = u64_to_bitmask(elrsr);
  int i;
  
 -for_each_set_bit(i, vgic_cpu-lr_used, vgic_cpu-nr_lr) {
 +for_each_clear_bit(i, elrsr_ptr, vgic_cpu-nr_lr) {
  struct vgic_lr lr = vgic_get_lr(vcpu, i);
  
  /*
 @@ -703,7 +715,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
   * Mark the LR as free for other use.
   */
  BUG_ON(lr.state  LR_STATE_MASK);
 -vgic_retire_lr(i, lr.irq, vcpu);
 +vgic_sync_lr_elrsr(vcpu, i, lr);
  vgic_irq_clear_queued(vcpu, lr.irq);
  
  /* Finally update the VGIC state. */
 @@ -1011,17 +1023,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr,
  vgic_ops-set_lr(vcpu, lr, vlr);
  }
  
 -static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
 -   struct vgic_lr vlr)
 -{
 -vgic_ops-sync_lr_elrsr(vcpu, lr, vlr);
 -}
 -
 -static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 -{
 -return vgic_ops-get_elrsr(vcpu);
 -}
 -
  static inline u64 vgic_get_eisr(struct kvm_vcpu *vcpu)
  {
  return vgic_ops-get_eisr(vcpu);
 @@ -1062,18 +1063,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
  vgic_ops-enable(vcpu);
  }
  
 -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 -{
 -struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu;
 -struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 -
 -vlr.state = 0;
 -vgic_set_lr(vcpu, lr_nr, vlr);
 -clear_bit(lr_nr, vgic_cpu-lr_used);
 -vgic_cpu-vgic_irq_lr_map[irq] = LR_EMPTY;
 -vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 -}
 -
  /*
   * An interrupt may have been disabled after being made pending on the
   * CPU interface (the classic case is a timer running while we're
 @@ -1085,23 +1074,32 @@ static void vgic_retire_lr(int lr_nr, int irq, 
 struct kvm_vcpu *vcpu)
   */
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
  {
 -struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu;
 +u64 elrsr = vgic_get_elrsr(vcpu);
 +unsigned long *elrsr_ptr = u64_to_bitmask(elrsr);
 if you agree with above modif I would simply rename elrsr_ptr into elrsr.
  int lr;
 +struct vgic_lr vlr;
 why moving this declaration here. I think this can remain in the block.

Possibly. Don't remember the reason of this move, I think it was due to
some other changes I later removed. I will revert it.

  
 -for_each_set_bit(lr, vgic_cpu-lr_used, vgic-nr_lr) {
 -struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
 +for_each_clear_bit(lr, elrsr_ptr, 

Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-08-12 Thread Eric Auger
Hi Andre,
On 07/10/2015 04:21 PM, Andre Przywara wrote:
 Currently we track which IRQ has been mapped to which VGIC list
 register and also have to synchronize both. We used to do this
 to hold some extra state (for instance the active bit).
 It turns out that this extra state in the LRs is no longer needed and
 this extra tracking causes some pain later.
 Remove the tracking feature (lr_map and lr_used) and get rid of
 quite some code on the way.
 On a guest exit we pick up all still pending IRQs from the LRs and put
 them back in the distributor. We don't care about active-only IRQs,
 so we keep them in the LRs. They will be retired either by our
 vgic_process_maintenance() routine or by the GIC hardware in case of
 edge triggered interrupts.
 In places where we scan LRs we now use our shadow copy of the ELRSR
 register directly.
 This code change means we lose the piggy-back optimization, which
 would re-use an active-only LR to inject the pending state on top of
 it. Tracing with various workloads shows that this actually occurred
 very rarely, the ballpark figure is about once every 10,000 exits
 in a disk I/O heavy workload. Also the list registers don't seem to
 as scarce as assumed, with all 4 LRs on the popular implementations
 used less than once every 100,000 exits.
 
 This has been briefly tested on Midway, Juno and the model (the latter
 both with GICv2 and GICv3 guests).
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h |   6 ---
  virt/kvm/arm/vgic-v2.c |   1 +
  virt/kvm/arm/vgic-v3.c |   1 +
  virt/kvm/arm/vgic.c| 143 
 ++---
  4 files changed, 66 insertions(+), 85 deletions(-)
 
 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 133ea00..2ccfa9a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -279,9 +279,6 @@ struct vgic_v3_cpu_if {
  };
  
  struct vgic_cpu {
 - /* per IRQ to LR mapping */
 - u8  *vgic_irq_lr_map;
 -
   /* Pending/active/both interrupts on this VCPU */
   DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
   DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
 @@ -292,9 +289,6 @@ struct vgic_cpu {
   unsigned long   *active_shared;
   unsigned long   *pend_act_shared;
  
 - /* Bitmap of used/free list registers */
 - DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
 -
   /* Number of list registers on this CPU */
   int nr_lr;
  
 diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
 index f9b9c7c..f723710 100644
 --- a/virt/kvm/arm/vgic-v2.c
 +++ b/virt/kvm/arm/vgic-v2.c
 @@ -144,6 +144,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
* anyway.
*/
   vcpu-arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
 + vcpu-arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
  
   /* Get the show on the road... */
   vcpu-arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
 diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
 index dff0602..21e5d28 100644
 --- a/virt/kvm/arm/vgic-v3.c
 +++ b/virt/kvm/arm/vgic-v3.c
 @@ -178,6 +178,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
* anyway.
*/
   vgic_v3-vgic_vmcr = 0;
 + vgic_v3-vgic_elrsr = ~0;
  
   /*
* If we are emulating a GICv3, we do it in an non-GICv2-compatible
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index bc40137..394622c 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -79,7 +79,6 @@
  #include vgic.h
  
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr 
 lr_desc);
  
 @@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
 *mmio,
   return false;
  }
  
 +static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
 +struct vgic_lr vlr)
 +{
 + vgic_ops-sync_lr_elrsr(vcpu, lr, vlr);
 +}
why not renaming this into vgic_set_elrsr. This would be homogeneous
with other virtual interface control register setters?
 +
 +static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 +{
 + return vgic_ops-get_elrsr(vcpu);
 +}
If I am not wrong, each time you manipulate the elrsr you handle the
bitmap. why not directly returning an unsigned long * then (elrsr_ptr)?
 +
  /**
   * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
   * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
 @@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
 *mmio,
  void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
  {
   struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu;
 + u64 elrsr = vgic_get_elrsr(vcpu);
 + unsigned long *elrsr_ptr = u64_to_bitmask(elrsr);
   int i;
  
 - for_each_set_bit(i, vgic_cpu-lr_used, 

[PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-07-10 Thread Andre Przywara
Currently we track which IRQ has been mapped to which VGIC list
register and also have to synchronize both. We used to do this
to hold some extra state (for instance the active bit).
It turns out that this extra state in the LRs is no longer needed and
this extra tracking causes some pain later.
Remove the tracking feature (lr_map and lr_used) and get rid of
quite some code on the way.
On a guest exit we pick up all still pending IRQs from the LRs and put
them back in the distributor. We don't care about active-only IRQs,
so we keep them in the LRs. They will be retired either by our
vgic_process_maintenance() routine or by the GIC hardware in case of
edge triggered interrupts.
In places where we scan LRs we now use our shadow copy of the ELRSR
register directly.
This code change means we lose the piggy-back optimization, which
would re-use an active-only LR to inject the pending state on top of
it. Tracing with various workloads shows that this actually occurred
very rarely, the ballpark figure is about once every 10,000 exits
in a disk I/O heavy workload. Also the list registers don't seem to
as scarce as assumed, with all 4 LRs on the popular implementations
used less than once every 100,000 exits.

This has been briefly tested on Midway, Juno and the model (the latter
both with GICv2 and GICv3 guests).

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 include/kvm/arm_vgic.h |   6 ---
 virt/kvm/arm/vgic-v2.c |   1 +
 virt/kvm/arm/vgic-v3.c |   1 +
 virt/kvm/arm/vgic.c| 143 ++---
 4 files changed, 66 insertions(+), 85 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 133ea00..2ccfa9a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -279,9 +279,6 @@ struct vgic_v3_cpu_if {
 };
 
 struct vgic_cpu {
-   /* per IRQ to LR mapping */
-   u8  *vgic_irq_lr_map;
-
/* Pending/active/both interrupts on this VCPU */
DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -292,9 +289,6 @@ struct vgic_cpu {
unsigned long   *active_shared;
unsigned long   *pend_act_shared;
 
-   /* Bitmap of used/free list registers */
-   DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
-
/* Number of list registers on this CPU */
int nr_lr;
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index f9b9c7c..f723710 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -144,6 +144,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vcpu-arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+   vcpu-arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
/* Get the show on the road... */
vcpu-arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index dff0602..21e5d28 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -178,6 +178,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vgic_v3-vgic_vmcr = 0;
+   vgic_v3-vgic_elrsr = ~0;
 
/*
 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index bc40137..394622c 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -79,7 +79,6 @@
 #include vgic.h
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 
@@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
*mmio,
return false;
 }
 
+static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
+  struct vgic_lr vlr)
+{
+   vgic_ops-sync_lr_elrsr(vcpu, lr, vlr);
+}
+
+static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
+{
+   return vgic_ops-get_elrsr(vcpu);
+}
+
 /**
  * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
  * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
@@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
*mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu;
+   u64 elrsr = vgic_get_elrsr(vcpu);
+   unsigned long *elrsr_ptr = u64_to_bitmask(elrsr);
int i;
 
-   for_each_set_bit(i, vgic_cpu-lr_used, vgic_cpu-nr_lr) {
+   for_each_clear_bit(i, elrsr_ptr, vgic_cpu-nr_lr) {
struct vgic_lr lr = vgic_get_lr(vcpu, i);
 
/*
@@ -703,7 +715,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * Mark the LR as free for other use.
 */
BUG_ON(lr.state  LR_STATE_MASK);
-   vgic_retire_lr(i,