Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On 12/11/2014 04:42, Zhang, Yang Z wrote:
> Personally, I think this feature will be helpful to legacy device assignment. Agreed, VFIO is the right solution for future feature enabling. But old KVM without good VFIO support is still widely used today. Users really want this feature, but they will not upgrade their kernels. It's easy for us to backport this feature to old KVM with legacy device assignment, but it is impossible to backport the whole of VFIO.

You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.

By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Wednesday, November 12, 2014 5:14 PM
> To: Zhang, Yang Z; Wu, Feng; Alex Williamson
> Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>
> On 12/11/2014 04:42, Zhang, Yang Z wrote:
>> Personally, I think this feature will be helpful to legacy device assignment. Agreed, VFIO is the right solution for future feature enabling. But old KVM without good VFIO support is still widely used today. Users really want this feature, but they will not upgrade their kernels. It's easy for us to backport this feature to old KVM with legacy device assignment, but it is impossible to backport the whole of VFIO.
>
> You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.
>
> By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf.

Do you mean benchmark results for APICv itself or VT-d Posted-Interrupts?

Thanks,
Feng

>
> Paolo
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On 12/11/2014 10:19, Wu, Feng wrote:
>> You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.
>>
>> By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf.
>
> Do you mean benchmark results for APICv itself or VT-d Posted-Interrupts?

Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv.

Paolo
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On Wed, 2014-11-12 at 10:14 +0100, Paolo Bonzini wrote:
> On 12/11/2014 04:42, Zhang, Yang Z wrote:
>> Personally, I think this feature will be helpful to legacy device assignment. Agreed, VFIO is the right solution for future feature enabling. But old KVM without good VFIO support is still widely used today. Users really want this feature, but they will not upgrade their kernels. It's easy for us to backport this feature to old KVM with legacy device assignment, but it is impossible to backport the whole of VFIO.
>
> You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.

Thanks Paolo, I agree. We should design the interfaces for VFIO since we expect legacy KVM assignment to be deprecated and eventually removed. I think that some of the platform device work for ARM's IRQ forwarding should probably be leveraged for this interface. IRQ forwarding effectively allows level-triggered interrupts to be handled as edge-triggered, eliminating the mask/unmask overhead and the EOI path entirely. To do this through VFIO, they make use of the KVM-VFIO device to register the device and set attributes for the forwarded IRQ. This enables KVM to use the VFIO external user interfaces to acquire a VFIO device reference and access the struct device. From there it can do some IRQ manipulation on the device to reconfigure how the host handles the interrupt. Ideally we could use the same base KVM-VFIO device interface, perhaps with different attributes, and obviously with different architecture backing.

Thanks,
Alex
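For reference, the base KVM-VFIO device interface Alex mentions is driven from userspace with the KVM_CREATE_DEVICE and KVM_SET_DEVICE_ATTR ioctls. Below is a minimal sketch that registers one VFIO group with a VM. It assumes the caller already holds a VM fd (from KVM_CREATE_VM) and a group fd opened from /dev/vfio/<group>. The helper name kvm_vfio_device_add_group() is hypothetical, and the forwarded-IRQ attributes from the ARM work are deliberately not shown, because their names and layout are not given here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: create a KVM-VFIO device on an existing VM and
 * register one VFIO group with it. Returns the new device fd, which the
 * caller keeps open while the VM uses the device, or a negative errno.
 */
static int kvm_vfio_device_add_group(int vm_fd, int32_t group_fd)
{
    struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
    struct kvm_device_attr attr = {
        .group = KVM_DEV_VFIO_GROUP,
        .attr  = KVM_DEV_VFIO_GROUP_ADD,
        /* For GROUP_ADD, addr points to an int32_t holding the group fd. */
        .addr  = (uint64_t)(uintptr_t)&group_fd,
    };
    int err;

    /* Instantiate the pseudo device; KVM returns its fd in cd.fd. */
    if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
        return -errno;

    /* Hand the VFIO group to KVM so it can take its own reference. */
    if (ioctl((int)cd.fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
        err = -errno;
        close((int)cd.fd);
        return err;
    }
    return (int)cd.fd;
}
```

Returning the device fd instead of closing it is a conservative choice. It leaves the caller in control of when the device goes away. Any architecture-specific attributes, such as those for forwarded IRQs, would presumably be set with further KVM_SET_DEVICE_ATTR calls on that same fd.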
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
Wu, Feng wrote on 2014-11-13:
> kvm-ow...@vger.kernel.org wrote on 2014-11-12:
>> kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>>
>> On 12/11/2014 10:19, Wu, Feng wrote:
>>>> You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.
>>>>
>>>> By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf.
>>>
>>> Do you mean benchmark results for APICv itself or VT-d Posted-Interrupts?
>>
>> Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv.
>
> We have some draft performance data internally; please see the attachment. For VT-d PI, I think we get the biggest performance gain when the VCPU is running in non-root mode most of the time (not in HLT state), since external interrupts from assigned devices are then delivered to the guest directly. That means running some CPU-intensive workload in the guests.

Have you checked that the CPU-side posted interrupt is taking effect in the w/o VT-d PI case? Per my understanding, the performance gap should not be so large if you use CPU-side posted interrupts. This data looks more like VT-d PI vs. non-PI (both VT-d and CPU).

> Thanks,
> Feng
>
>> Paolo

Best regards,
Yang
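Feng's point, that interrupts can reach the guest without a VM exit while the VCPU runs in non-root mode, rests on the posted-interrupt descriptor. The sender records the vector in a 256-bit Posted Interrupt Request (PIR) bitmap. It then sends a notification only if the Outstanding Notification (ON) bit was not already set. The code below is a simplified software model of that protocol, not KVM's or the hardware's implementation. It assumes only the PIR and ON fields exist (a real descriptor has more), and in the VT-d PI case the posting side is done by the IOMMU rather than by software. struct pi_desc_model, pi_post() and pi_sync_to_irr() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified posted-interrupt descriptor: PIR + ON only. */
struct pi_desc_model {
    _Atomic uint32_t pir[8];   /* one bit per vector 0..255 */
    atomic_bool on;            /* a notification is already outstanding */
};

static void pi_desc_init(struct pi_desc_model *d)
{
    for (int i = 0; i < 8; i++)
        atomic_init(&d->pir[i], 0);
    atomic_init(&d->on, false);
}

/*
 * Sender side: post @vector. Returns true if the sender must raise a
 * notification event, false if one is already outstanding.
 */
static bool pi_post(struct pi_desc_model *d, uint8_t vector)
{
    atomic_fetch_or(&d->pir[vector / 32], 1u << (vector % 32));
    /* Only the first poster since the last sync needs to notify. */
    return !atomic_exchange(&d->on, true);
}

/* Receiver side: clear ON, then drain PIR into a 256-bit IRR image. */
static void pi_sync_to_irr(struct pi_desc_model *d, uint32_t irr[8])
{
    atomic_store(&d->on, false);
    for (int i = 0; i < 8; i++)
        irr[i] |= atomic_exchange(&d->pir[i], 0);
}
```

The receiver clears ON before it drains PIR. As a result, a vector posted during the drain is either picked up in this pass or makes its poster see ON clear and notify again, so it is not lost.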
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Thursday, November 13, 2014 9:21 AM
> To: Wu, Feng; Paolo Bonzini; Alex Williamson
> Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>
> Wu, Feng wrote on 2014-11-13:
>> kvm-ow...@vger.kernel.org wrote on 2014-11-12:
>>> kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>>>
>>> On 12/11/2014 10:19, Wu, Feng wrote:
>>>>> You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.
>>>>>
>>>>> By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf.
>>>>
>>>> Do you mean benchmark results for APICv itself or VT-d Posted-Interrupts?
>>>
>>> Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv.
>>
>> We have some draft performance data internally; please see the attachment. For VT-d PI, I think we get the biggest performance gain when the VCPU is running in non-root mode most of the time (not in HLT state), since external interrupts from assigned devices are then delivered to the guest directly. That means running some CPU-intensive workload in the guests.
>
> Have you checked that the CPU-side posted interrupt is taking effect in the w/o VT-d PI case? Per my understanding, the performance gap should not be so large if you use CPU-side posted interrupts. This data looks more like VT-d PI vs. non-PI (both VT-d and CPU).

Yes, this data is VT-d PI vs. non-VT-d PI. The CPU-side APICv mechanism (including CPU-side Posted-Interrupts) is enabled.

Thanks,
Feng

>
>> Thanks,
>> Feng
>>
>>> Paolo
>
> Best regards,
> Yang
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
Wu, Feng wrote on 2014-11-13:
> Zhang, Yang Z wrote on 2014-11-13:
>> kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>> Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>>
>> Wu, Feng wrote on 2014-11-13:
>>> kvm-ow...@vger.kernel.org wrote on 2014-11-12:
>>>> kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>>>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>>>>
>>>> On 12/11/2014 10:19, Wu, Feng wrote:
>>>>>> You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea.
>>>>>>
>>>>>> By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf.
>>>>>
>>>>> Do you mean benchmark results for APICv itself or VT-d Posted-Interrupts?
>>>>
>>>> Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv.
>>>
>>> We have some draft performance data internally; please see the attachment. For VT-d PI, I think we get the biggest performance gain when the VCPU is running in non-root mode most of the time (not in HLT state), since external interrupts from assigned devices are then delivered to the guest directly. That means running some CPU-intensive workload in the guests.
>>
>> Have you checked that the CPU-side posted interrupt is taking effect in the w/o VT-d PI case? Per my understanding, the performance gap should not be so large if you use CPU-side posted interrupts. This data looks more like VT-d PI vs. non-PI (both VT-d and CPU).
>
> Yes, this data is VT-d PI vs. non-VT-d PI. The CPU-side APICv mechanism (including CPU-side Posted-Interrupts) is enabled.

From the CPU utilization data, the APICv test environment does not look reasonable to me. With current APICv, the interrupt should not be delivered to the pCPU where the vCPU is running. Otherwise, it forces a VM exit on the vCPU and the CPU-side posted interrupt cannot take effect. Did you set the interrupt affinity manually?

> Thanks,
> Feng
>
>>> Thanks,
>>> Feng
>>>
>>>> Paolo
>>
>> Best regards,
>> Yang

Best regards,
Yang
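Yang's suggestion is to keep the assigned device's host interrupt off the pCPU that runs the target vCPU. One common way to do that is to write a CPU mask to /proc/irq/<N>/smp_affinity. The following is a minimal sketch under these assumptions: the host has at most 32 CPUs, so the mask fits in a single hex word; the host IRQ number and the pCPU of the (pinned) vCPU thread are already known; and the process runs as root. The helper names affinity_without_cpu() and set_irq_affinity() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Mask covering CPUs 0..ncpus-1 except exclude_cpu. Limited to 32 CPUs
 * so that the mask fits one 32-bit word of the smp_affinity hex format.
 */
static uint32_t affinity_without_cpu(unsigned int ncpus, unsigned int exclude_cpu)
{
    assert(ncpus >= 2 && ncpus <= 32 && exclude_cpu < ncpus);
    uint32_t all = (ncpus == 32) ? UINT32_MAX : ((UINT32_C(1) << ncpus) - 1);
    return all & ~(UINT32_C(1) << exclude_cpu);
}

/* Write @mask to /proc/irq/<irq>/smp_affinity. Returns 0 or -errno. */
static int set_irq_affinity(unsigned int irq, uint32_t mask)
{
    char path[64];
    FILE *f;
    int rc;

    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -errno;
    rc = (fprintf(f, "%x\n", (unsigned int)mask) < 0) ? -EIO : 0;
    /* Errors from the kernel's parser surface when the write is flushed. */
    if (fclose(f) != 0 && rc == 0)
        rc = -errno;
    return rc;
}
```

For example, on a 4-CPU host with the vCPU pinned to pCPU 2, affinity_without_cpu(4, 2) yields 0xb, which steers the device interrupt to CPUs 0, 1 and 3.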
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
> -----Original Message-----
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, November 11, 2014 5:58 AM
> To: Wu, Feng
> Cc: g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>
> On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
>> When the guest changes its interrupt configuration (such as the vector) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so that external interrupts from the assigned devices can be injected into guests without a VM exit.
>>
>> The current method of handling guest lowest-priority interrupts is to use a counter 'apic_arb_prio' for each VCPU: we choose the VCPU with the smallest 'apic_arb_prio' and then increase it by 1. However, for VT-d PI we cannot reuse this, since we no longer have control over 'apic_arb_prio' when posted interrupts are delivered directly by hardware. Here, we introduce a mechanism similar to 'apic_arb_prio' to handle guest lowest-priority interrupts when VT-d PI is used. Here is the idea:
>>
>> - Each VCPU has a counter 'round_robin_counter'.
>> - When the guest sets an interrupt to lowest priority, we choose the VCPU with the smallest 'round_robin_counter' as the destination, then increase it.
>>
>> Signed-off-by: Feng Wu feng...@intel.com
>> ---
>>  arch/x86/include/asm/irq_remapping.h |   6 ++
>>  arch/x86/include/asm/kvm_host.h      |   2 +
>>  arch/x86/kvm/vmx.c                   |  12 +++
>>  arch/x86/kvm/x86.c                   |  11 +++
>>  drivers/iommu/amd_iommu.c            |   6 ++
>>  drivers/iommu/intel_irq_remapping.c  |  28 +++
>>  drivers/iommu/irq_remapping.c        |   9 ++
>>  drivers/iommu/irq_remapping.h        |   3 +
>>  include/linux/dmar.h                 |  26 ++
>>  include/linux/kvm_host.h             |  22 +
>>  include/uapi/linux/kvm.h             |   1 +
>>  virt/kvm/assigned-dev.c              | 141 ++
>>  virt/kvm/irq_comm.c                  |   4 +-
>>  virt/kvm/irqchip.c                   |  11 ---
>>  14 files changed, 269 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
>> index a3cc437..32d6cc4 100644
>> --- a/arch/x86/include/asm/irq_remapping.h
>> +++ b/arch/x86/include/asm/irq_remapping.h
>> @@ -51,6 +51,7 @@ extern void compose_remapped_msi_msg(struct pci_dev *pdev,
>>  				     unsigned int irq, unsigned int dest,
>>  				     struct msi_msg *msg, u8 hpet_id);
>>  extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
>> +extern int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector);
>>  extern void panic_if_irq_remap(const char *msg);
>>  extern bool setup_remapped_irq(int irq,
>>  			       struct irq_cfg *cfg,
>> @@ -88,6 +89,11 @@ static inline int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
>>  	return -ENODEV;
>>  }
>>
>> +static inline int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  static inline void panic_if_irq_remap(const char *msg)
>>  {
>>  }
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 6ed0c30..0630161 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
>>  	struct kvm_lapic *apic;    /* kernel irqchip context */
>>  	unsigned long apic_attention;
>>  	int32_t apic_arb_prio;
>> +	int32_t round_robin_counter;
>>  	int mp_state;
>>  	u64 ia32_misc_enable_msr;
>>  	bool tpr_access_reporting;
>> @@ -771,6 +772,7 @@ struct kvm_x86_ops {
>>  	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
>>
>>  	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
>> +	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
>>  };
>>
>>  struct kvm_arch_async_pf {
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index a4670d3..ae91b72 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -544,6 +544,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
>>  	return container_of(vcpu, struct vcpu_vmx, vcpu);
>>  }
>>
>> +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
>> +{
>> +	return &(to_vmx(vcpu)->pi_desc);
>> +}
>> +
>>  #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
>>  #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
>>  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
>> @@ -4280,6 +4285,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
>>  	return;
>>  }
>>
>> +static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
>> +{
>> +	return __pa((u64)vcpu_to_pi_desc(vcpu));
>> +}
>> +
>>  /*
>>   * Set up
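The lowest-priority scheme in the patch description (pick the VCPU with the smallest 'round_robin_counter' as the destination, then increase it) can be sketched in plain C as follows. This is a minimal illustration, not the patch's code. struct vcpu_model and pick_lowest_prio_dest() are hypothetical names. The sketch also assumes the candidate destination set has already been derived from the guest's interrupt configuration, and it breaks ties by taking the first VCPU in the set, which the patch description does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU state the patch adds. */
struct vcpu_model {
    int id;
    int32_t round_robin_counter;
};

/*
 * Pick the VCPU in @dest with the smallest round_robin_counter and bump
 * its counter, so repeated lowest-priority setups rotate across the set.
 * Ties go to the earliest entry. Returns NULL for an empty set.
 */
static struct vcpu_model *pick_lowest_prio_dest(struct vcpu_model **dest, size_t n)
{
    struct vcpu_model *best = NULL;

    for (size_t i = 0; i < n; i++) {
        assert(dest[i] != NULL);
        if (!best || dest[i]->round_robin_counter < best->round_robin_counter)
            best = dest[i];
    }
    if (best)
        best->round_robin_counter++;
    return best;
}
```

With three idle VCPUs, successive calls return VCPU 0, 1, 2 and then 0 again. Unlike apic_arb_prio, the counter only advances when the IRTE is (re)programmed, not on every delivered interrupt, because with VT-d PI the hardware delivers the interrupt directly.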
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On 11/11/2014 10:20, Wu, Feng wrote:
>> Since legacy KVM device assignment is effectively deprecated, have you considered how we might do this with VFIO?  Thanks,
>
> I haven't thought about how to enable this in VFIO so far. I think I can continue to implement that if needed after this patch set is finished. What do you think of this?

Hi Feng,

we are not applying new features to legacy KVM device assignment, since it is unsafe (it does not honor ACS).

Alex and I can help you with designing a way to interface VFIO with KVM posted interrupts. Give us a few days to study these patches more, or feel free to request comments if you have ideas about it yourself.

Paolo
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, November 11, 2014 7:02 PM
> To: Wu, Feng; Alex Williamson
> Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
>
> On 11/11/2014 10:20, Wu, Feng wrote:
>>> Since legacy KVM device assignment is effectively deprecated, have you considered how we might do this with VFIO?  Thanks,
>>
>> I haven't thought about how to enable this in VFIO so far. I think I can continue to implement that if needed after this patch set is finished. What do you think of this?
>
> Hi Feng,
>
> we are not applying new features to legacy KVM device assignment, since it is unsafe (it does not honor ACS).
>
> Alex and I can help you with designing a way to interface VFIO with KVM posted interrupts. Give us a few days to study these patches more, or feel free to request comments if you have ideas about it yourself.
>
> Paolo

Okay, then I will put some effort into getting familiar with the VFIO mechanism. If you have any questions about these patches, we can discuss them together.

Thanks,
Feng
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
Paolo Bonzini wrote on 2014-11-11:
> On 11/11/2014 10:20, Wu, Feng wrote:
>>> Since legacy KVM device assignment is effectively deprecated, have you considered how we might do this with VFIO?  Thanks,
>>
>> I haven't thought about how to enable this in VFIO so far. I think I can continue to implement that if needed after this patch set is finished. What do you think of this?
>
> Hi Feng,
>
> we are not applying new features to legacy KVM device assignment, since it is unsafe (it does not honor ACS).

Personally, I think this feature will be helpful to legacy device assignment. Agreed, VFIO is the right solution for future feature enabling. But old KVM without good VFIO support is still widely used today. Users really want this feature, but they will not upgrade their kernels. It's easy for us to backport this feature to old KVM with legacy device assignment, but it is impossible to backport the whole of VFIO. So I think you guys could consider adding this feature to both VFIO and legacy device assignment.

> Alex and I can help you with designing a way to interface VFIO with KVM posted interrupts. Give us a few days to study these patches more, or feel free to request comments if you have ideas about it yourself.
>
> Paolo

Best regards,
Yang
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> When the guest changes its interrupt configuration (such as the vector) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so that external interrupts from the assigned devices can be injected into guests without a VM exit.
>
> The current method of handling guest lowest-priority interrupts is to use a counter 'apic_arb_prio' for each VCPU: we choose the VCPU with the smallest 'apic_arb_prio' and then increase it by 1. However, for VT-d PI we cannot reuse this, since we no longer have control over 'apic_arb_prio' when posted interrupts are delivered directly by hardware. Here, we introduce a mechanism similar to 'apic_arb_prio' to handle guest lowest-priority interrupts when VT-d PI is used. Here is the idea:
>
> - Each VCPU has a counter 'round_robin_counter'.
> - When the guest sets an interrupt to lowest priority, we choose the VCPU with the smallest 'round_robin_counter' as the destination, then increase it.
>
> Signed-off-by: Feng Wu feng...@intel.com
> ---
>  arch/x86/include/asm/irq_remapping.h |   6 ++
>  arch/x86/include/asm/kvm_host.h      |   2 +
>  arch/x86/kvm/vmx.c                   |  12 +++
>  arch/x86/kvm/x86.c                   |  11 +++
>  drivers/iommu/amd_iommu.c            |   6 ++
>  drivers/iommu/intel_irq_remapping.c  |  28 +++
>  drivers/iommu/irq_remapping.c        |   9 ++
>  drivers/iommu/irq_remapping.h        |   3 +
>  include/linux/dmar.h                 |  26 ++
>  include/linux/kvm_host.h             |  22 +
>  include/uapi/linux/kvm.h             |   1 +
>  virt/kvm/assigned-dev.c              | 141 ++
>  virt/kvm/irq_comm.c                  |   4 +-
>  virt/kvm/irqchip.c                   |  11 ---
>  14 files changed, 269 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
> index a3cc437..32d6cc4 100644
> --- a/arch/x86/include/asm/irq_remapping.h
> +++ b/arch/x86/include/asm/irq_remapping.h
> @@ -51,6 +51,7 @@ extern void compose_remapped_msi_msg(struct pci_dev *pdev,
>  				     unsigned int irq, unsigned int dest,
>  				     struct msi_msg *msg, u8 hpet_id);
>  extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
> +extern int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector);
>  extern void panic_if_irq_remap(const char *msg);
>  extern bool setup_remapped_irq(int irq,
>  			       struct irq_cfg *cfg,
> @@ -88,6 +89,11 @@ static inline int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
>  	return -ENODEV;
>  }
>
> +static inline int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline void panic_if_irq_remap(const char *msg)
>  {
>  }
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6ed0c30..0630161 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
>  	struct kvm_lapic *apic;    /* kernel irqchip context */
>  	unsigned long apic_attention;
>  	int32_t apic_arb_prio;
> +	int32_t round_robin_counter;
>  	int mp_state;
>  	u64 ia32_misc_enable_msr;
>  	bool tpr_access_reporting;
> @@ -771,6 +772,7 @@ struct kvm_x86_ops {
>  	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
>
>  	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
> +	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
>  };
>
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a4670d3..ae91b72 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -544,6 +544,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
>  	return container_of(vcpu, struct vcpu_vmx, vcpu);
>  }
>
> +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
> +{
> +	return &(to_vmx(vcpu)->pi_desc);
> +}
> +
>  #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
>  #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
>  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
> @@ -4280,6 +4285,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
>  	return;
>  }
>
> +static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
> +{
> +	return __pa((u64)vcpu_to_pi_desc(vcpu));
> +}
> +
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> @@ -9232,6 +9242,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.check_nested_events = vmx_check_nested_events,
>
>  	.sched_in = vmx_sched_in,
> +
> +	.get_pi_desc_addr = vmx_get_pi_desc_addr,
>  };
>
>  static int __init vmx_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b447a98..0c19d15 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7735,6 +7735,17 @@ bool