Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-11 Thread Radim Krcmár
2015-12-10 01:52+, Wu, Feng:
>> From: Radim Krčmář [mailto:rkrc...@redhat.com]
>> (Physical xAPIC+x2APIC mode is still somewhat reasonable and xAPIC CPUs
>>  start with LDR=0, which means that operating system doesn't need to
>>  utilize mixed mode, as defined by KVM, when switching to x2APIC.)
> 
> I think you mean Physical xAPIC+Physical x2APIC mode, right? For physical
> mode, we don't use LDR in any case, do we? So in physical mode, we only
> use the APIC ID, that is why they can be mixed, is my understanding correct?

Yes.  (Technically, physical and logical addressing is always active in
APIC, but xAPIC must have nonzero LDR to accept logical interrupts[1].)
If all xAPIC LDRs are zero, KVM doesn't enter a "mixed mode" even if
some are xAPIC and some x2APIC [2].

1: Real LAPICs probably do not accept broadcasts on APICs where LDR=0,
   KVM LAPICs do, but lowest priority broadcast is not allowed anyway,
   so PI doesn't care.

2: KVM allows OS-writeable APIC ID, which complicates things and real
   hardware probably doesn't allow it because of that ... we'd be saner
   with RO APIC ID, but it's not that bad.  (And no major OS does it :])

>>  the system uses cluster xAPIC, OS should set DFR before LDR, which
>>  doesn't trigger mixed mode either.)
> 
> Just curious: if the APIC is software disabled and in xAPIC mode, and the
> OS sets a different DFR value for different APICs, then when the OS sets
> LDR, KVM can trigger mixed flat and cluster mode, right?

Exactly.
APICs with zeroed LDR are ignored, so KVM will use the slow-path for
delivery (= trigger mixed mode) at the moment the first APIC with
different DFR is configured.
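
The rule described above (xAPICs with LDR=0 are ignored, and mixed mode
starts as soon as two different logical modes are visible) can be sketched
roughly as follows.  This is only an illustration, not KVM's code: the
struct, the enum and every sketch_* name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode flags, one bit per logical addressing mode. */
enum sketch_mode {
	SKETCH_XAPIC_FLAT    = 1u << 0,
	SKETCH_XAPIC_CLUSTER = 1u << 1,
	SKETCH_X2APIC        = 1u << 2,
};

/* Hypothetical, heavily simplified view of one local APIC. */
struct sketch_apic {
	bool     x2apic;   /* APIC is in x2APIC mode */
	bool     cluster;  /* xAPIC only: DFR selects the cluster model */
	uint32_t ldr;      /* logical destination register */
};

/* OR together the modes of all APICs that accept logical interrupts. */
static unsigned sketch_logical_modes(const struct sketch_apic *apics, size_t n)
{
	unsigned modes = 0;

	assert(apics != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (apics[i].x2apic)
			modes |= SKETCH_X2APIC;
		else if (apics[i].ldr != 0) /* xAPIC with LDR=0 is ignored */
			modes |= apics[i].cluster ? SKETCH_XAPIC_CLUSTER
						  : SKETCH_XAPIC_FLAT;
	}
	return modes;
}

/* More than one bit set means mixed mode, i.e. slow-path delivery. */
static bool sketch_is_mixed(unsigned modes)
{
	return (modes & (modes - 1)) != 0;
}
```

With this model, xAPICs that differ in DFR stay harmless until an LDR is
written, and an x2APIC next to xAPICs with LDR=0 does not count as mixed.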
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-26 Thread Radim Krcmár
2015-11-26 06:24+, Wu, Feng:
>> From: Radim Krčmář [mailto:rkrc...@redhat.com]
>> 2015-11-25 15:38+0100, Paolo Bonzini:
>>> On 25/11/2015 15:12, Radim Krcmár wrote:
>>>> I think it's ok to pick any algorithm we like.  It's unlikely that
>>>> software would recognize and take advantage of the hardware algorithm
>>>> without adding a special treatment for KVM.
>>>> (I'd vote for the simple pick-first-APIC lowest priority algorithm ...
>>>>  I don't see much point in complicating lowest priority when it doesn't
>>>>  deliver to lowest priority CPU anyway.)
>>>
>>> Vector hashing is an improvement for the common case where all vectors
>>> are set to all CPUs.  Sure you can get an unlucky assignment, but it's
>>> still better than pick-first-APIC.
>> 
>> Yeah, hashing has a valid use case, but a subtle weighting of drawbacks
>> led me to prefer pick-first-APIC ...
> 
> Is it possible that the pick-first-APIC policy makes a certain vCPU's irq
> workload too heavy?

It is, but vector hashing doesn't eliminate that possibility, just makes
it significantly less likely.
irqbalanced takes care of proper distribution in Linux guests.  I'm not
sure what other OS do, but they should have something like that as well.
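
A small sketch of the difference, assuming four destination vCPUs and a
hash that is simply the vector modulo the destination count (both are
assumptions for illustration; the sketch_* names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPUS 4 /* assumed number of destination vCPUs */

/* Pick-first: always the lowest-numbered destination. */
static unsigned sketch_pick_first(uint8_t vector)
{
	(void)vector;
	return 0;
}

/* Vector hashing, modelled here as vector modulo destination count. */
static unsigned sketch_pick_hash(uint8_t vector)
{
	return vector % SKETCH_NCPUS;
}

/* Count how many vectors in [first, last] land on each vCPU. */
static void sketch_histogram(unsigned (*pick)(uint8_t),
			     unsigned first, unsigned last,
			     unsigned counts[SKETCH_NCPUS])
{
	assert(first <= last && last <= 255);
	for (unsigned i = 0; i < SKETCH_NCPUS; i++)
		counts[i] = 0;
	for (unsigned v = first; v <= last; v++)
		counts[pick((uint8_t)v)]++;
}
```

Over a contiguous vector range the hash spreads evenly while pick-first
puts everything on one vCPU.  A guest whose vectors are all spaced by a
multiple of four would still hit a single vCPU with this hash, which is
the unlucky assignment discussed in this thread.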

>> (I'd prefer to have simple code in KVM and depend on static IRQ balancing
>>  in a guest to handle the distribution.
>>  The guest could get the unlucky assignment anyway, so it should be
>>  prepared;  and hashing just made KVM worse in that case.  Guests might
>>  also configure physical x(2)APIC, where there is no lowest priority.
>>  And if the guest doesn't do anything with IRQs, then it might not even
>>  care about the impact that our choice has.)
> 
> Do you guys have an agreement on how to handle this? Or can we implement
> vector hashing at the current stage, and then improve it as Radim
> mentioned above if it is really needed?

Vector hashing is definitely an improvement over the current situation;
I'll agree with any algorithm if it is reasonably implemented.

(v1 fails delivery if chosen APIC is disabled, misuses KVM_MAX_VCPUS as
 bitmap size, and counting number of bits is better done with hweight16()
 -- these bugs would hopefully be fixed by having a common function :])
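
One possible shape for such a common helper, as a sketch only: it masks
out disabled APICs before hashing and counts bits on a 16-bit destination
bitmap.  sketch_hweight16() is a userspace stand-in for the kernel's
hweight16(); the function names, the enabled mask and the modulo hash are
assumptions, not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's hweight16(): count set bits. */
static unsigned sketch_hweight16(uint16_t w)
{
	unsigned n = 0;

	for (; w != 0; w &= (uint16_t)(w - 1))
		n++;
	return n;
}

/*
 * Return the bit index of the chosen APIC, or -1 if no destination is
 * enabled.  Disabled APICs are dropped first, so the hash can never
 * select one of them.
 */
static int sketch_lowest_prio_dest(uint16_t dest, uint16_t enabled,
				   uint8_t vector)
{
	uint16_t candidates = dest & enabled;
	unsigned count = sketch_hweight16(candidates);
	unsigned skip;

	if (count == 0)
		return -1;

	skip = vector % count;
	for (int bit = 0; bit < 16; bit++) {
		if (!(candidates & (1u << bit)))
			continue;
		if (skip == 0)
			return bit;
		skip--;
	}
	assert(0); /* not reached: count > 0 guarantees a hit */
	return -1;
}
```

Sizing the bitmap by its real width (16 bits here) rather than by a vCPU
limit keeps the bit search and the bit count consistent with each other.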


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Radim Krcmár
2015-11-25 03:21+, Wu, Feng:
> From: Radim Krčmář [mailto:rkrc...@redhat.com]
>> The hash function just interprets a subset of vector's bits as a number
>> and uses that as a starting offset in a search for an enabled APIC
>> within the destination set?
>> 
>> For example:
>> The x2APIC destination is 0x0055 (= first four even APICs in cluster
>> 0), the vector is 0b1110, and bits 10:8 of IntControl are 000.
>> 
>> 000 means that bits 7:4 of vector are selected, thus the vector hash is
>> 0b1110 = 14, so the round-robin effectively does 14 % 4 (because we only
>> have 4 destinations) and delivers to the 3rd possible APIC (= ID 4)?
> 
> In my current implementation, I don't select a subset of the vector's bits
> as the number; instead, I use the whole vector number. From a software
> emulation point of view, do we really need to select a subset of the
> vector's bits as the base number? What is your opinion? Thanks a lot!

I think it's ok to pick any algorithm we like.  It's unlikely that
software would recognize and take advantage of the hardware algorithm
without adding a special treatment for KVM.
(I'd vote for the simple pick-first-APIC lowest priority algorithm ...
 I don't see much point in complicating lowest priority when it doesn't
 deliver to lowest priority CPU anyway.)

I mainly wanted to know what real hardware actually does, because there
are a lot of alternatives that still fit the Xeon documentation.
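
For reference, the two hash choices discussed in this message can be
compared with a short sketch.  It only models the one encoding mentioned
above (IntControl bits 10:8 = 000 selecting vector bits 7:4); 0xE0 is an
assumed vector whose bits 7:4 are 0b1110, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Zero-based search: skip (hash % number of destinations) set bits in
 * the destination bitmap and return the index of the bit reached.
 */
static int sketch_nth_set_bit(uint32_t dest, unsigned hash)
{
	unsigned count = 0;
	unsigned skip;

	for (uint32_t d = dest; d != 0; d &= d - 1)
		count++;
	assert(count > 0);

	skip = hash % count;
	for (int bit = 0; bit < 32; bit++) {
		if (((dest >> bit) & 1u) == 0)
			continue;
		if (skip == 0)
			return bit;
		skip--;
	}
	return -1; /* not reached when dest != 0 */
}

/* Hash from a subset of the vector: bits 7:4. */
static unsigned sketch_hash_bits_7_4(uint8_t vector)
{
	return (vector >> 4) & 0xFu;
}

/* Hash from the whole vector number. */
static unsigned sketch_hash_whole(uint8_t vector)
{
	return vector;
}
```

With destination bitmap 0x55 and vector 0xE0, the bits 7:4 hash is 14,
and 14 % 4 = 2 skips two set bits and lands on bit 4.  The whole-vector
hash gives 224 % 4 = 0 and lands on bit 0.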


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-24 Thread Radim Krcmár
2015-11-24 01:26+, Wu, Feng:
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> On 16/11/2015 20:03, Radim Krčmář wrote:
>> > 2015-11-09 10:46+0800, Feng Wu:
>> >> Use vector-hashing to handle lowest-priority interrupts for
>> >> posted-interrupts. As an example, modern Intel CPUs use this
>> >> method to handle lowest-priority interrupts.
>> >
>> > (I don't think it's a good idea that the algorithm differs from non-PI
>> >  lowest priority delivery.  I'd make them both vector-hashing, which
>> >  would be "fun" to explain to people expecting round robin ...)
>> 
>> Yup, I would make it a module option.  Thanks very much Radim for
>> helping with the review.
> 
> Thanks for the review. Yes, we can introduce a module option
> for it. According to Radim's comments above, we need to use the
> same policy for PI and non-PI lowest-priority interrupts, so here is the
> question: vector hashing is easy to apply to both the non-PI and PI
> cases. For round robin in the non-PI case, however, the round-robin
> counter is used and updated when the interrupt is injected into the
> guest, but in the PI case the interrupt is injected into the guest
> entirely by hardware; software cannot control it during interrupt
> delivery, and we can only decide the destination vCPU for the PI
> interrupt at initial configuration time (guest updates vMSI -> QEMU ->
> KVM). Do you have any good suggestions for doing round robin for PI
> lowest-priority? It seems round robin is not a good fit for PI
> lowest-priority interrupts. Any comments are appreciated!

It's meaningless to try dynamic algorithms with PI so if we allow both
lowest priority algorithms, I'd let PI handle any lowest priority only
with vector hashing.  (It's an ugly compromise.)
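
The reason a dynamic algorithm does not fit PI can be seen in a small
sketch: round robin carries state that must be advanced on every
delivery, while vector hashing is a pure function that can be evaluated
once when the route is configured.  The names below are hypothetical and
the modulo hash is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Round-robin state: has to be touched by software on each delivery. */
struct sketch_rr {
	unsigned next;
};

static unsigned sketch_rr_deliver(struct sketch_rr *rr, unsigned ndest)
{
	unsigned dest;

	assert(rr != 0 && ndest > 0);
	dest = rr->next % ndest;
	rr->next++; /* impossible when hardware injects the interrupt */
	return dest;
}

/* Vector hashing: no state, same answer at configuration time. */
static unsigned sketch_hash_route(uint8_t vector, unsigned ndest)
{
	assert(ndest > 0);
	return vector % ndest;
}
```

A posted-interrupt route can store the result of sketch_hash_route()
once, whereas the result of sketch_rr_deliver() changes with every call.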