On 2016-05-03 07:30, Peter Xu wrote: > On Tue, May 03, 2016 at 06:38:28AM +0200, Jan Kiszka wrote: >> On 2016-05-03 05:22, Peter Xu wrote: >>> On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote: >>>> 2016-04-28 17:18+0800, Peter Xu: >>>>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote: >>>>>> Instead of fiddling with irq routes for the IOAPIC - where we don't need >>>>>> it -, I would suggest to do the following: Send IOAPIC events via >>>>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio) >>>>>> should use the pattern you are now applying to the IOAPIC: establish >>>>>> static routes as an irqfd is set up, and that route should be translated >>>>>> by the iommu first, register an IEC notifier to update any affected >>>>>> route when the iommu translation changes. >>>>> >>>>> Yes, maybe that's the right thing to do. Or say, when split irqchip, >>>>> IOAPIC can avoid using GSI routes any more. If with that, I should >>>>> also remove lots of things, like: IEC notifiers for IOAPIC, and all >>>>> things related to msi route sync-up in IOAPIC codes with KVM (so I >>>>> suppose we will save 24 gsi route entries for KVM, which sounds >>>>> good). >>>> >>>> Sadly, we can't get rid of those GSI routes. KVM uses them to decide >>>> whether it should forward EOI to userspace. And QEMU also has to remap >>>> them, because KVM uses dest_id for that decision. >>> >>> Thanks to point out. Actually I have started to test the new patch >>> and did encountered issue when using split irqchip mode. The above >>> explains. :) >>> >>> So I think I will keep this part of codes un-touched in this >>> series, and I can move forward with IR changes. >>> >>> However, I am still wondering whether we can avoid this in KVM when >>> when using split irqchip mode? E.g., what if we do not do this check >>> in KVM and let all EOI be passed to userspace? Like: >>> >>> --- >>> >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >>> index 36591fa..c0a6e51 100644 >>> --- a/arch/x86/kvm/lapic.c >>> +++ b/arch/x86/kvm/lapic.c >>> @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic >>> *apic, int vector) >>> { >>> int trigger_mode; >>> >>> - /* Eoi the ioapic only if the ioapic doesn't own the vector. */ >>> - if (!kvm_ioapic_handles_vector(apic, vector)) >>> - return; >>> - >>> /* Request a KVM exit to inform the userspace IOAPIC. */ >>> if (irqchip_split(apic->vcpu->kvm)) { >>> apic->vcpu->arch.pending_ioapic_eoi = vector; >>> @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic >>> *apic, int vector) >>> return; >>> } >>> >>> + /* Eoi the ioapic only if the ioapic doesn't own the vector. */ >>> + if (!kvm_ioapic_handles_vector(apic, vector)) >>> + return; >>> + >>> if (apic_test_vector(vector, apic->regs + APIC_TMR)) >>> trigger_mode = IOAPIC_LEVEL_TRIG; >>> else >>> >>> --- >>> >>> I believe this will brings more user-space context switches, I just >>> do not know how this will impact the system (only for curiosity :). >> >> Yes, this doesn't look good from the performance POV. Given that most >> EOIs of the APICs will not trigger a message to an IOAPIC and userspace >> exits are expensive, that should be measurable. >> >> But you should optimize route updating a bit: I noticed real delays >> (almost a second) when reprogramming all IOAPIC pins during Jailhouse >> handover. That's because of the heavyweight synchronize_srcu we have on >> each route change. Even if that is an extreme case, you should try to >> reduce route updates to only those that were actually affected by an >> IRTE change. > > Yes, this solution will possibly need one more IOMMU API besides > int_remap(), for IOAPIC to know whether specific IOAPIC entry needs > to be updated (IOAPIC needs to know the index information of each > IOAPIC entry, and see whether it is matched with IEC notified index > and mask, in ioapic_iec_notifier()). > > Another problem that may cause this issue is that: now we are > committing route information every time when updating route entries > (please check kvm_update_routing_entry()). IMHO this is not > necessary - we will need 25 times of committing for each IEC > invalication, while actually we only need one at the end. > > So... We'll have another solution for this: how about not committing > for each update, and just do one commit after all changes settled? > This seems to be much cleaner, and easier comparing to the above > proposal.
If you see a number of invalidation requests queued in a row, you may also process them atomically from the guest POV, which includes a single route update at the end, before the guest gets told that everything was done. Jan > > (These two solutions actually solves different problems, and either > of them should help in reducing the 1 sec delay mentioned > above. For now, I'd prefer first choose the 2nd solution to do > explicit route commit, and add the 1st one into my todo list) > > Thanks, > > -- peterx >