On 2016-05-03 08:00, Peter Xu wrote: > On Tue, May 03, 2016 at 07:40:50AM +0200, Jan Kiszka wrote: >> On 2016-05-03 07:30, Peter Xu wrote: >>> On Tue, May 03, 2016 at 06:38:28AM +0200, Jan Kiszka wrote: >>>> On 2016-05-03 05:22, Peter Xu wrote: >>>>> On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote: >>>>>> 2016-04-28 17:18+0800, Peter Xu: >>>>>>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote: >>>>>>>> Instead of fiddling with irq routes for the IOAPIC - where we don't >>>>>>>> need >>>>>>>> it -, I would suggest to do the following: Send IOAPIC events via >>>>>>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio) >>>>>>>> should use the pattern you are now applying to the IOAPIC: establish >>>>>>>> static routes as an irqfd is set up, and that route should be >>>>>>>> translated >>>>>>>> by the iommu first, register an IEC notifier to update any affected >>>>>>>> route when the iommu translation changes. >>>>>>> >>>>>>> Yes, maybe that's the right thing to do. Or say, when split irqchip, >>>>>>> IOAPIC can avoid using GSI routes any more. If with that, I should >>>>>>> also remove lots of things, like: IEC notifiers for IOAPIC, and all >>>>>>> things related to msi route sync-up in IOAPIC codes with KVM (so I >>>>>>> suppose we will save 24 gsi route entries for KVM, which sounds >>>>>>> good). >>>>>> >>>>>> Sadly, we can't get rid of those GSI routes. KVM uses them to decide >>>>>> whether it should forward EOI to userspace. And QEMU also has to remap >>>>>> them, because KVM uses dest_id for that decision. >>>>> >>>>> Thanks to point out. Actually I have started to test the new patch >>>>> and did encountered issue when using split irqchip mode. The above >>>>> explains. :) >>>>> >>>>> So I think I will keep this part of codes un-touched in this >>>>> series, and I can move forward with IR changes. >>>>> >>>>> However, I am still wondering whether we can avoid this in KVM when >>>>> when using split irqchip mode? E.g., what if we do not do this check >>>>> in KVM and let all EOI be passed to userspace? Like: >>>>> >>>>> --- >>>>> >>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >>>>> index 36591fa..c0a6e51 100644 >>>>> --- a/arch/x86/kvm/lapic.c >>>>> +++ b/arch/x86/kvm/lapic.c >>>>> @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic >>>>> *apic, int vector) >>>>> { >>>>> int trigger_mode; >>>>> >>>>> - /* Eoi the ioapic only if the ioapic doesn't own the vector. */ >>>>> - if (!kvm_ioapic_handles_vector(apic, vector)) >>>>> - return; >>>>> - >>>>> /* Request a KVM exit to inform the userspace IOAPIC. */ >>>>> if (irqchip_split(apic->vcpu->kvm)) { >>>>> apic->vcpu->arch.pending_ioapic_eoi = vector; >>>>> @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic >>>>> *apic, int vector) >>>>> return; >>>>> } >>>>> >>>>> + /* Eoi the ioapic only if the ioapic doesn't own the vector. */ >>>>> + if (!kvm_ioapic_handles_vector(apic, vector)) >>>>> + return; >>>>> + >>>>> if (apic_test_vector(vector, apic->regs + APIC_TMR)) >>>>> trigger_mode = IOAPIC_LEVEL_TRIG; >>>>> else >>>>> >>>>> --- >>>>> >>>>> I believe this will brings more user-space context switches, I just >>>>> do not know how this will impact the system (only for curiosity :). >>>> >>>> Yes, this doesn't look good from the performance POV. Given that most >>>> EOIs of the APICs will not trigger a message to an IOAPIC and userspace >>>> exits are expensive, that should be measurable. >>>> >>>> But you should optimize route updating a bit: I noticed real delays >>>> (almost a second) when reprogramming all IOAPIC pins during Jailhouse >>>> handover. That's because of the heavyweight synchronize_srcu we have on >>>> each route change. Even if that is an extreme case, you should try to >>>> reduce route updates to only those that were actually affected by an >>>> IRTE change. >>> >>> Yes, this solution will possibly need one more IOMMU API besides >>> int_remap(), for IOAPIC to know whether specific IOAPIC entry needs >>> to be updated (IOAPIC needs to know the index information of each >>> IOAPIC entry, and see whether it is matched with IEC notified index >>> and mask, in ioapic_iec_notifier()). >>> >>> Another problem that may cause this issue is that: now we are >>> committing route information every time when updating route entries >>> (please check kvm_update_routing_entry()). IMHO this is not >>> necessary - we will need 25 times of committing for each IEC >>> invalication, while actually we only need one at the end. >>> >>> So... We'll have another solution for this: how about not committing >>> for each update, and just do one commit after all changes settled? >>> This seems to be much cleaner, and easier comparing to the above >>> proposal. >> >> If you see a number of invalidation requests queued in a row, you may >> also process them atomically from the guest POV, which includes a single >> route update at the end, before the guest gets told that everything was >> done. > > Yes, this would be better as a 3rd step after the above two. If you > would not mind, I'll noting this down as well with the 1st proposal > (considering that this will need more code changes, not only in QI > logic, also in IEC notification path), and will only contain the 2nd > change (explicit route commit) in v6, to partially solve the problem > for now.
Ack. Jan