Re: [Qemu-devel] [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts
On 9 July 2015 at 14:49, Paolo Bonzini pbonz...@redhat.com wrote: On 09/07/2015 10:13, Leon Alrae wrote: On 08/07/2015 16:22, Paolo Bonzini wrote: On 08/07/2015 17:03, James Hogan wrote: Hi Paolo, On 24/04/15 11:26, James Hogan wrote: A couple of small fixes for accessing 32-bit KVM registers on big endian, and to sign extend struct kvm_regs registers so as to work on MIPS64 hosts. James Hogan (2): mips/kvm: Fix Big endian 32-bit register access mips/kvm: Sign extend registers written to KVM Any chance of applying these in time for v2.4? I had assumed they'd go through Leon, but I can apply them for 2.4-rc1 as well. I thought these changes would get applied via kvm queue since they touch target-mips/kvm.c only. But if such target-specific kvm patches usually go via target-* tree then I'm happy to pick them up. It's the same---they can go in through any tree. In this case I've now applied them and will send them out next week. I've actually just applied them to master as buildfixes :-) thanks -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer
On Thu, 2015-07-09 at 14:28 +0200, Joerg Roedel wrote: On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote: Hosting the bypass manager in kernel/irq seemed appropriate, but really it could be anywhere. Does anyone have a different preference or specifically want it under their scope? We had originally thought of this as an IOMMU service, but I think we've generalized it beyond that. I expect we should also add the necessary hooks to turn it into a loadable module to keep the tinification folks happy, I'll incorporate the current working changes and post a version with that. Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is probably the wrong place to host it. Will there be any other producers than VFIO or any other consumers than KVM? If not, it should live in one of these spaces. KVM is probably the best choice, as any hardware feature that uses this targets virtualization, so there will hardly ever be another consumer than KVM. If we think that it's *only* a kvm-vfio interaction then we could add it to virt/kvm/vfio.c. vfio could use symbol_get to avoid a module dependency and effectively disable the code path when not used with kvm. The reverse model of hosting it in vfio and using symbol_get from kvm-vfio would also work. Do we really want to declare it to be kvm-vfio specific though? Another option would be to simply host it under virt/lib with module dependencies for both vfio and kvm. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer
On 09/07/2015 16:13, Alex Williamson wrote: On Thu, 2015-07-09 at 14:28 +0200, Joerg Roedel wrote: On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote: Hosting the bypass manager in kernel/irq seemed appropriate, but really it could be anywhere. Does anyone have a different preference or specifically want it under their scope? We had originally thought of this as an IOMMU service, but I think we've generalized it beyond that. I expect we should also add the necessary hooks to turn it into a loadable module to keep the tinification folks happy, I'll incorporate the current working changes and post a version with that. Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is probably the wrong place to host it. Will there be any other producers than VFIO or any other consumers than KVM? If not, it should live in one of these spaces. KVM is probably the best choice, as any hardware feature that uses this targets virtualization, so there will hardly ever be another consumer than KVM. If we think that it's *only* a kvm-vfio interaction then we could add it to virt/kvm/vfio.c. vfio could use symbol_get to avoid a module dependency and effectively disable the code path when not used with kvm. The reverse model of hosting it in vfio and using symbol_get from kvm-vfio would also work. Do we really want to declare it to be kvm-vfio specific though? Another option would be to simply host it under virt/lib with module dependencies for both vfio and kvm. I wonder if in the future we may have some kind of driver-mediated passthrough, e.g. for network drivers. They might use the bypass mechanism too. So I think drivers/vfio is too restrictive. virt/ right now only hosts KVM, but it could for example host lguest too. virt/lib/ is okay with me. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On Thu, Jul 09, 2015 at 02:24:06PM +0200, Christoffer Dall wrote: On Thu, Jul 09, 2015 at 01:07:24PM +0100, Peter Maydell wrote: On 9 July 2015 at 13:05, Christoffer Dall christoffer.d...@linaro.org wrote: As I understand it, the problem is that if we ever run a VCPU after reading the value, and write back the value afterwards, you potentially make time go backwards and get inconsistent views of time from different VCPUs because they may have read the time before/after updating the CNTVOFF. Right, but I think if QEMU does that it's a bug (and more to the point I don't entirely understand why we would do that yet, even given that we don't have a distinction between registers to sync always and registers to sync only on reset...) I think we have evidence that it does that, but we don't know why/how. So I ran this through GDB, and this happens when the guest probes the virtio devices, specifically the backtrace tells me that virtio_current_cpu_endian () at /root/src/qemu/hw/virtio/virtio.c:594 = cc-virtio_is_big_endian - arm_cpu_is_big_endian - cpu_synchronize_state - kvm_cpu_synchronize_state which causes cpu-kvm_vcpu_dirty = true, which causes the run-loop to write the CNTVOFF on a per-vcpu basis without stopping anything, as far as I can tell. So yeah, I guess the only required fix here is to fix QEMU in some way as to not fiddle with the timer registers in this way, and then I honestly don't know if we should try to fix legacy userspace in the kernel, but based on the feedback from Jan, I suppose not. Thanks, -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 9 July 2015 at 15:17, Christoffer Dall christoffer.d...@linaro.org wrote: On Thu, Jul 09, 2015 at 02:24:06PM +0200, Christoffer Dall wrote: So I ran this through GDB, and this happens when the guest probes the virtio devices, specifically the backtrace tells me that virtio_current_cpu_endian () at /root/src/qemu/hw/virtio/virtio.c:594 = cc-virtio_is_big_endian - arm_cpu_is_big_endian - cpu_synchronize_state - kvm_cpu_synchronize_state which causes cpu-kvm_vcpu_dirty = true, which causes the run-loop to write the CNTVOFF on a per-vcpu basis without stopping anything, as far as I can tell. Ah, I was wondering if it was going to turn out to be this, but I hadn't figured out why it was going to cause us to do a write-back of state rather than just a read. So yeah, I guess the only required fix here is to fix QEMU in some way as to not fiddle with the timer registers in this way, and then I honestly don't know if we should try to fix legacy userspace in the kernel, but based on the feedback from Jan, I suppose not. arm_cpu_is_big_endian() doesn't actually want to write back any state -- all it's interested in is reading. So we really ought not to need to write anything back there. kvm-all.c's sync functions don't seem to provide a sync kernel state to userspace but I promise I'm not going to dirty it function though. I guess we could add one. (Overall it's kind of fragile even if we can avoid this specific case where we're writing back the counter state, though -- we should probably think about the sync level stuff as well.) -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] KVM: SVM: use NPT page attributes
On 09/07/2015 04:30, Xiao Guangrong wrote: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 602b974a60a6..0f125c1860ec 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1085,6 +1085,47 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc) return target_tsc - tsc; } +static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat) +{ +struct kvm_vcpu *vcpu = svm-vcpu; + +/* Unlike Intel, AMD takes the guest's CR0.CD into account. I noticed this code in svm_set_cr0(): if (!(vcpu-kvm-arch.disabled_quirks KVM_QUIRK_CD_NW_CLEARED)) cr0 = ~(X86_CR0_CD | X86_CR0_NW); gCR0.CD is hidden to CPU if KVM_QUIRK_CD_NW_CLEARED is not set and looks like it is the normal case after grepping Qemu code. Hi Xiao, yes, this is correct. QEMU still does not have support for disabling quirks, so gCR0.CD is currently hidden on SVM. I would like to include this series in 4.2, while for 4.3 I will disable the quirk above altogether (it is superseded by the way PAT is forced to all-WB). In any case, this is just a KVM limitation. gCR0.CD is taken into account by the processor when computing the effective memory type. It uses all of these: - the guest page tables PAT/PCD/PWT - the guest page tables CR3.PCD/CR3.PWD - the nested page tables PAT/PCD/PWT - the guest page tables CR3.PCD/CR3.PWD - the host MTRRs - gCR0.CD - hCR0.CD Paolo How about this one? I still do not know how SVM properly emulates CR0.CD? :( -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support
Hi Pavel, On 09/07/15 15:37, Pavel Fedin wrote: Hello! v1 - v2: - user API changed: x devid id passed in kvm_irq_routing_msi x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI Andre, you never replied to my last comment to the previous series. Oh dear, my draft folder again :-( Sorry for that! Are you going to do the same change in your MSI API? Otherwise: 1. KVM_IRQ_LINE - we have completely own convention. Well, this was already done before us, we cannot fix it. 2. KVM_SIGNAL_MSI - we use VALID_DEVID flag plus devid Yes, because there is already a flag value and no other way to specify this, in contrast to ... 3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid Here we already have a type field with some users, so lets piggy-back on this. Both ioctl extensions are coupled with a per-VM capability to let userland know that it needs to provide a device ID. Don't (2) and (3) together still look bad? Since we agreed on not using flags, i would suggest to have KVM_SIGNAL_EXTENDED_MSI counterpart, which also doesn't use flags. Using flags on its own (without an explicit capability) is what I opposed against, not flags in general. After all, that's what they are meant for, right? In case of KVM_SET_GSI_ROUTING it just seems awkward to me to use a flag when a different type would do as well. But after all, I don't have a strong opinion on that matter, so if others prefer using a flag I am also fine with that. Poka, Andre. I know, we were already talking about it, so, if this gets ignored for the second time, i assume the Architects decided that fancy APIs are cool, and i promise to stop this. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts
On 09/07/2015 10:13, Leon Alrae wrote: On 08/07/2015 16:22, Paolo Bonzini wrote: On 08/07/2015 17:03, James Hogan wrote: Hi Paolo, On 24/04/15 11:26, James Hogan wrote: A couple of small fixes for accessing 32-bit KVM registers on big endian, and to sign extend struct kvm_regs registers so as to work on MIPS64 hosts. James Hogan (2): mips/kvm: Fix Big endian 32-bit register access mips/kvm: Sign extend registers written to KVM Any chance of applying these in time for v2.4? I had assumed they'd go through Leon, but I can apply them for 2.4-rc1 as well. I thought these changes would get applied via kvm queue since they touch target-mips/kvm.c only. But if such target-specific kvm patches usually go via target-* tree then I'm happy to pick them up. It's the same---they can go in through any tree. In this case I've now applied them and will send them out next week. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for 2.4 0/2] MIPS build fixes for v2.4
On 9 July 2015 at 12:52, Leon Alrae leon.al...@imgtec.com wrote: On 09/07/2015 10:17, James Hogan wrote: These two patches fix build errors for the MIPS TCG backend and MIPS KVM. Please could they be applied for v2.4. James Hogan (2): tcg/mips: Fix build error from merged memop+mmu_idx parameter mips/kvm: Sync with newer MIPS KVM headers target-mips/kvm.c | 15 ++- tcg/mips/tcg-target.c | 4 ++-- 2 files changed, 4 insertions(+), 15 deletions(-) Reviewed-by: Leon Alrae leon.al...@imgtec.com Peter, since these are build fixes, could they be squeezed into rc0? Applied to master, thanks. -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support
Hello! v1 - v2: - user API changed: x devid id passed in kvm_irq_routing_msi x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI Andre, you never replied to my last comment to the previous series. Are you going to do the same change in your MSI API? Otherwise: 1. KVM_IRQ_LINE - we have completely own convention. Well, this was already done before us, we cannot fix it. 2. KVM_SIGNAL_MSI - we use VALID_DEVID flag plus devid 3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid Don't (2) and (3) together still look bad? Since we agreed on not using flags, i would suggest to have KVM_SIGNAL_EXTENDED_MSI counterpart, which also doesn't use flags. I know, we were already talking about it, so, if this gets ignored for the second time, i assume the Architects decided that fancy APIs are cool, and i promise to stop this. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts
On 09/07/2015 16:00, Peter Maydell wrote: I've actually just applied them to master as buildfixes :-) No, wait, I'm confusing this set with a different 2-patch set of MIPS fixes. Paolo, can you go ahead and take them through the kvm tree? Sure, I had already queued them in fact. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts
On 9 July 2015 at 14:58, Peter Maydell peter.mayd...@linaro.org wrote: On 9 July 2015 at 14:49, Paolo Bonzini pbonz...@redhat.com wrote: It's the same---they can go in through any tree. In this case I've now applied them and will send them out next week. I've actually just applied them to master as buildfixes :-) No, wait, I'm confusing this set with a different 2-patch set of MIPS fixes. Paolo, can you go ahead and take them through the kvm tree? thanks -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer
On Thu, Jul 09, 2015 at 04:38:41PM +0200, Paolo Bonzini wrote: On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote: If we think that it's *only* a kvm-vfio interaction then we could add it to virt/kvm/vfio.c. vfio could use symbol_get to avoid a module dependency and effectively disable the code path when not used with kvm. The reverse model of hosting it in vfio and using symbol_get from kvm-vfio would also work. Do we really want to declare it to be kvm-vfio specific though? Another option would be to simply host it under virt/lib with module dependencies for both vfio and kvm. I wonder if in the future we may have some kind of driver-mediated passthrough, e.g. for network drivers. They might use the bypass mechanism too. So I think drivers/vfio is too restrictive. virt/ right now only hosts KVM, but it could for example host lguest too. virt/lib/ is okay with me. Yeah, virt/lib is probably the best choice. Joerg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support
Hi! 3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid Here we already have a type field with some users, so lets piggy-back on this. We already have 'flags' there too. Both ioctl extensions are coupled with a per-VM capability to let userland know that it needs to provide a device ID. Using flags on its own (without an explicit capability) is what I opposed against, not flags in general. Ok, and in your next respin you'll add the capability, correct? So that we will finally have all pieces in place. In case of KVM_SET_GSI_ROUTING it just seems awkward to me to use a flag when a different type would do as well. Well, MSI vs Extended MSI are even not different types really. It's just MSI which has devid. And we *ALREADY* have VALID_DEVID flag. But after all, I don't have a strong opinion on that matter, so if others prefer using a flag I am also fine with that. So, ok, to be short... My vote is for flag, because it's already there and it keeps up with the style we already have. Eric, this is my final statement about it. It's up to you to accept or ignore. In qemu code flag is a little bit nicer because it's just stored in a variable and helps to avoid several if-else's (however small ones). Compare: --- cut --- kroute.gsi = virq; kroute.type = KVM_IRQ_ROUTING_MSI; kroute.u.msi.address_lo = (uint32_t)msg.address; kroute.u.msi.address_hi = msg.address 32; kroute.u.msi.data = le32_to_cpu(msg.data); kroute.flags = kvm_msi_flags; if (kroute.flags KVM_MSI_VALID_DEVID) { kroute.u.msi.devid = (pci_bus_num(dev-bus) 8) | dev-devfn; } --- cut --- and: --- cut --- kroute.gsi = virq; if (use_extended_msi) { kroute.u.msi.devid = (pci_bus_num(dev-bus) 8) | dev-devfn; kroute.type = KVM_IRQ_ROUTING_EXTENDED_MSI; } else { kroute.type = KVM_IRQ_ROUTING_MSI; } kroute.u.msi.address_lo = (uint32_t)msg.address; kroute.u.msi.address_hi = msg.address 32; kroute.u.msi.data = le32_to_cpu(msg.data); kroute.flags = kvm_msi_flags; --- cut --- Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On Thu, Jul 09, 2015 at 03:26:20PM +0100, Peter Maydell wrote: On 9 July 2015 at 15:17, Christoffer Dall christoffer.d...@linaro.org wrote: On Thu, Jul 09, 2015 at 02:24:06PM +0200, Christoffer Dall wrote: So I ran this through GDB, and this happens when the guest probes the virtio devices, specifically the backtrace tells me that virtio_current_cpu_endian () at /root/src/qemu/hw/virtio/virtio.c:594 = cc-virtio_is_big_endian - arm_cpu_is_big_endian - cpu_synchronize_state - kvm_cpu_synchronize_state which causes cpu-kvm_vcpu_dirty = true, which causes the run-loop to write the CNTVOFF on a per-vcpu basis without stopping anything, as far as I can tell. Ah, I was wondering if it was going to turn out to be this, but I hadn't figured out why it was going to cause us to do a write-back of state rather than just a read. So yeah, I guess the only required fix here is to fix QEMU in some way as to not fiddle with the timer registers in this way, and then I honestly don't know if we should try to fix legacy userspace in the kernel, but based on the feedback from Jan, I suppose not. arm_cpu_is_big_endian() doesn't actually want to write back any state -- all it's interested in is reading. So we really ought not to need to write anything back there. kvm-all.c's sync functions don't seem to provide a sync kernel state to userspace but I promise I'm not going to dirty it function though. I guess we could add one. (Overall it's kind of fragile even if we can avoid this specific case where we're writing back the counter state, though -- we should probably think about the sync level stuff as well.) Sounds to me like we should add the sync level stuff first, as it should catch all callers, and afterwards consider adding the sync-one-way optimization. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support
Hi Pavel, Andre, On 07/09/2015 05:52 PM, Pavel Fedin wrote: Hi! 3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid Here we already have a type field with some users, so lets piggy-back on this. We already have 'flags' there too. Both ioctl extensions are coupled with a per-VM capability to let userland know that it needs to provide a device ID. Using flags on its own (without an explicit capability) is what I opposed against, not flags in general. Ok, and in your next respin you'll add the capability, correct? So that we will finally have all pieces in place. In case of KVM_SET_GSI_ROUTING it just seems awkward to me to use a flag when a different type would do as well. Well, MSI vs Extended MSI are even not different types really. It's just MSI which has devid. And we *ALREADY* have VALID_DEVID flag. But after all, I don't have a strong opinion on that matter, so if others prefer using a flag I am also fine with that. So, ok, to be short... My vote is for flag, because it's already there and it keeps up with the style we already have. Eric, this is my final statement about it. It's up to you to accept or ignore. In qemu code flag is a little bit nicer because it's just stored in a variable and helps to avoid several if-else's (however small ones). Compare: Well personally I prefer the type thing and I don't see much difference at userspace level anyway. But I am not this kind of hyperspace architect guy. So, since there is no consensus here, I would say let's wait for formal reviews of our maintainers and I will align. The v2 update is not the outcome of a consensus so I made arbitrary decisions to progress fix bugs and I hope this eventually works with ITS ;-) Best Regards Eric --- cut --- kroute.gsi = virq; kroute.type = KVM_IRQ_ROUTING_MSI; kroute.u.msi.address_lo = (uint32_t)msg.address; kroute.u.msi.address_hi = msg.address 32; kroute.u.msi.data = le32_to_cpu(msg.data); kroute.flags = kvm_msi_flags; if (kroute.flags KVM_MSI_VALID_DEVID) { kroute.u.msi.devid = (pci_bus_num(dev-bus) 8) | dev-devfn; } --- cut --- and: --- cut --- kroute.gsi = virq; if (use_extended_msi) { kroute.u.msi.devid = (pci_bus_num(dev-bus) 8) | dev-devfn; kroute.type = KVM_IRQ_ROUTING_EXTENDED_MSI; } else { kroute.type = KVM_IRQ_ROUTING_MSI; } kroute.u.msi.address_lo = (uint32_t)msg.address; kroute.u.msi.address_hi = msg.address 32; kroute.u.msi.data = le32_to_cpu(msg.data); kroute.flags = kvm_msi_flags; --- cut --- Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer
On Thu, 2015-07-09 at 17:34 +0200, Joerg Roedel wrote: On Thu, Jul 09, 2015 at 04:38:41PM +0200, Paolo Bonzini wrote: On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote: If we think that it's *only* a kvm-vfio interaction then we could add it to virt/kvm/vfio.c. vfio could use symbol_get to avoid a module dependency and effectively disable the code path when not used with kvm. The reverse model of hosting it in vfio and using symbol_get from kvm-vfio would also work. Do we really want to declare it to be kvm-vfio specific though? Another option would be to simply host it under virt/lib with module dependencies for both vfio and kvm. I wonder if in the future we may have some kind of driver-mediated passthrough, e.g. for network drivers. They might use the bypass mechanism too. So I think drivers/vfio is too restrictive. virt/ right now only hosts KVM, but it could for example host lguest too. virt/lib/ is okay with me. Yeah, virt/lib is probably the best choice. I will make it so. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Add host physical address width capability
Paolo Bonzini pbonz...@redhat.com writes: On 09/07/2015 08:43, Laszlo Ersek wrote: On 07/09/15 08:09, Paolo Bonzini wrote: On 09/07/2015 00:36, Bandan Das wrote: Let userspace inquire the maximum physical address width of the host processors; this can be used to identify maximum memory that can be assigned to the guest. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- arch/x86/kvm/x86.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..97d6746 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_NR_MEMSLOTS: r = KVM_USER_MEM_SLOTS; break; + case KVM_CAP_PHY_ADDR_WIDTH: + r = boot_cpu_data.x86_phys_bits; + break; Userspace can just use CPUID, can't it? I believe KVM's cooperation is necessary, for the following reason: The truncation only occurs when the guest-phys - host-phys translation is done in hardware, *and* the phys bits of the host processor are insufficient to represent the highest guest-phys address that the guest will ever face. The first condition (of course) means that the truncation depends on EPT being enabled. (I didn't test on AMD so I don't know if RVI has the same issue.) If EPT is disabled, either because the host processor lacks it, or because the respective kvm_intel module parameter is set so, then the issue cannot be experienced. Therefore I believe a KVM patch is necessary. However, this specific patch doesn't seem sufficient; it should also consider whether EPT is enabled. (And the ioctl should be perhaps renamed to reflect that -- what QEMU needs to know is not the raw physical address width of the host processor, but whether that width will cause EPT to silently truncate high guest-phys addresses.) Right; if you want to consider whether EPT is enabled (which is the right thing to do, albeit it makes for a much bigger patch) a KVM patch is necessary. In that case you also need to patch the API documentation. Note that this patch really doesn't do anything except for printing a message that something might potentially go wrong. Without EPT, you don't hit the processor limitation with your setup, but the user should nevertheless still be notified. In fact, I think shadow paging code should also emulate this behavior if the gpa is out of range. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Add host physical address width capability
On 07/09/15 20:32, Bandan Das wrote: Paolo Bonzini pbonz...@redhat.com writes: On 09/07/2015 08:43, Laszlo Ersek wrote: On 07/09/15 08:09, Paolo Bonzini wrote: On 09/07/2015 00:36, Bandan Das wrote: Let userspace inquire the maximum physical address width of the host processors; this can be used to identify maximum memory that can be assigned to the guest. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- arch/x86/kvm/x86.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..97d6746 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_NR_MEMSLOTS: r = KVM_USER_MEM_SLOTS; break; + case KVM_CAP_PHY_ADDR_WIDTH: + r = boot_cpu_data.x86_phys_bits; + break; Userspace can just use CPUID, can't it? I believe KVM's cooperation is necessary, for the following reason: The truncation only occurs when the guest-phys - host-phys translation is done in hardware, *and* the phys bits of the host processor are insufficient to represent the highest guest-phys address that the guest will ever face. The first condition (of course) means that the truncation depends on EPT being enabled. (I didn't test on AMD so I don't know if RVI has the same issue.) If EPT is disabled, either because the host processor lacks it, or because the respective kvm_intel module parameter is set so, then the issue cannot be experienced. Therefore I believe a KVM patch is necessary. However, this specific patch doesn't seem sufficient; it should also consider whether EPT is enabled. (And the ioctl should be perhaps renamed to reflect that -- what QEMU needs to know is not the raw physical address width of the host processor, but whether that width will cause EPT to silently truncate high guest-phys addresses.) Right; if you want to consider whether EPT is enabled (which is the right thing to do, albeit it makes for a much bigger patch) a KVM patch is necessary. In that case you also need to patch the API documentation. Note that this patch really doesn't do anything except for printing a message that something might potentially go wrong. Yes. Without EPT, you don't hit the processor limitation with your setup, but the user should nevertheless still be notified. I disagree. In fact, I think shadow paging code should also emulate this behavior if the gpa is out of range. I disagree. There is no out of range gpa. QEMU allocates enough memory, and it should be completely transparent to the guest. The fact that it silently breaks with nested paging if the host processor doesn't have enough address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not sure, but I suspect it's a hardware bug). In any case the guest shouldn't care at all. It is a *virtual* machine, and the VMM should lie to it plausibly enough. How much RAM, and how many phys address bits the host has, is a performance question, but it should not be a correctness question. A 256 GB guest should run (slowly, but correctly) on a laptop that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap space. Because otherwise your argument could be extrapolated as TCG should break too if the gpa is 'out of range'. So, I disagree. Whatever memory you give to the guest should just work (unless of course you want to emulate a small address width for the *VCPU*, but that's absolutely not the use case here). What we have here is a leaky abstraction: a PCPU limitation giving away a lie that the guest should never notice. The guest should be able to use all memory that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT vs. KVM-with-EPT. If the last case cannot work (due to hardware limitations), that's fine, but then (and only then) a warning should be printed. ... In any case, please understand that I'm not campaigning for this warning :) IIRC the warning was your (very welcome!) idea after I reported the problem; I'm just trying to ensure that the warning match the exact issue I encountered. Thanks! Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support
Well personally I prefer the type thing and I don't see much difference at userspace level anyway. But I am not this kind of hyperspace architect guy. So, since there is no consensus here, I would say let's wait for formal reviews of our maintainers and I will align. Hah, okay... Don't take it for too much. Anyway your word weighs much more than my one... hope this eventually works with ITS ;-) You know... There are actually problems with irqfd. However i'm not sure whether they are result of some bug or of poorly made vGIC CPU interface software emulation. I don't have hardware with working vGICv3 here. With GICv2m everything is fine, but code path is quite different, and LPI != SPI. This is further overcomplicated by the fact that as soon as i start up the profiler, the problem goes away. Looks like increased number of context switches have impact. Actually, the problem is very poor vhost-net throughput, which jumps up when i start the profiler and goes back down when i stop it. Just to note... I'm off since now for a small vacation, will be back on monday or tuesday. See you guys! Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width
Laszlo Ersek ler...@redhat.com writes: ... First, see my comments on the KVM patch. Second, ram_size is not the right thing to compare. What should be checked is whether the highest guest-physical address that maps to RAM can be represented in the address width of the host processor (and only if EPT is enabled, but that sub-condition belongs to the KVM patch). Note that this is not the same as the check written in the patch. For example, if you assume a 32-bit PCI hole with size 1 GB, then a total guest RAM of size 63 GB will result in the highest guest-phys memory address being 0xF__, which just fits into 36 bits. Correspondingly, the above code would not print the warning for -m $((63 * 1024 + 1)) on my laptop (which has address sizes : 36 bits physical, ...), even though such a guest would not boot for me (with EPT enabled). Please see http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447 So, ram_size in the controlling expression should be replaced with maximum_guest_ram_address (which should be inclusive, and the = relop should be preserved). also with memory hotplug tuned on we should check if the end of hotplug memory area is less then limit, i.e.: pcms-hotplug_memory.base + hotplug_mem_size 1ULL max_phys_bits Seems reasonable, thanks for the hint! Thanks Igor and Laszlo, makes sense. I am wondering if this 1GB PCI hole is always fixed so that I can simply include that in calculating the maximum guest ram address ? Or do we have to figure that out every time ? (The LHS in this instance is exclusive though, so equality should *not* trigger the warning. maximum_guest_ram_address is inclusive, and equality should trigger the warning. (Although equality seems quite impossible in practice.)) Thanks! Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, 9 Jul 2015 16:07:47 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote: On 09/07/2015 11:48, Laurent Vivier wrote: On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost. Paolo vhost does not require irqfd anymore. I think ioeventfd actually works fine though I didn't try, it would be easy to support. That's an interesting issue, thanks for pointing this out, Laurent! So do we now rather want to leave everything as it currently is, in case somebody wants to use vhost-net with a cross-endian TCG guest one day? Or do we assume that either a) TCG is so slow anyway that nobody wants to accelerate it with vhost or b) TCG vhost likely won't happen that soon so we hope that everybody will already be using virtio 1.0 at that point in time (with a fixed endianness) ? ... then I think we should go on and include this patch. Thomas -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, 9 Jul 2015 16:07:47 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote: On 09/07/2015 11:48, Laurent Vivier wrote: On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost. Paolo vhost does not require irqfd anymore. I think ioeventfd actually works fine though I didn't try, it would be easy to support. That's an interesting issue, thanks for pointing this out, Laurent! So do we now rather want to leave everything as it currently is, in case somebody wants to use vhost-net with a cross-endian TCG guest one day? Or do we assume that either a) TCG is so slow anyway that nobody wants to accelerate it with vhost or b) TCG vhost likely won't happen that soon so we hope that everybody will already be using virtio 1.0 at that point in time (with a fixed endianness) ? ... then I think we should go on and include this patch. Thomas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v1 4/5] KVM: x86: Add arch specific routines for irqbypass manager
On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote: Add the following x86 specific routines for irqbypass manger: - kvm_arch_irq_bypass_add_producer - kvm_arch_irq_bypass_del_producer - kvm_arch_irq_bypass_stop - kvm_arch_irq_bypass_resume - kvm_arch_irq_bypass_update Signed-off-by: Feng Wu feng...@intel.com --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/x86.c | 55 2 files changed, 56 insertions(+) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 22f6fcb..ae68f2a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -62,6 +62,7 @@ config KVM_INTEL # for perf_guest_get_msrs(): depends on CPU_SUP_INTEL select IRQ_BYPASS_MANAGER + select HAVE_KVM_IRQ_BYPASS ---help--- Provides support for KVM on Intel processors equipped with the VT extensions. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d81ac02..a59f7e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -49,6 +49,8 @@ #include linux/pci.h #include linux/timekeeper_internal.h #include linux/pvclock_gtod.h +#include linux/kvm_irqfd.h +#include linux/irqbypass.h #include trace/events/kvm.h #define CREATE_TRACE_POINTS @@ -8023,6 +8025,59 @@ out: return ret; } +void kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons, + struct irq_bypass_producer *prod) +{ + int ret; + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + + irqfd-producer = prod; + + ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 1); + WARN_ON(ret); +} + +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons, + struct irq_bypass_producer *prod) +{ + int ret; + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + + irqfd-producer = NULL; + + /* + * When producer of consumer is unregistered, we change back to + * remapped mode, so we can re-use the current implementation + * when the irq is masked/disabed or the consumer side (KVM + * int this case doesn't want to receive the interrupts. + */ + ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 0); + WARN_ON(ret); +} + +void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons) +{ +} + +void kvm_arch_irq_bypass_resume(struct irq_bypass_consumer *cons) +{ +} No need for dummy functions, these callbacks will be optional. Thanks, Alex + +void kvm_arch_irq_bypass_update(struct irq_bypass_consumer *cons) +{ + int ret; + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + + BUG_ON(!irqfd-producer); + + ret = kvm_arch_update_pi_irte(irqfd-kvm, irqfd-producer-irq, + irqfd-gsi, 1); + WARN_ON(ret); +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC v1 5/5] Call irqbypass update routine when updating irqfd
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Friday, July 10, 2015 11:26 AM To: Wu, Feng Cc: kvm@vger.kernel.org; pbonz...@redhat.com; j...@8bytes.org; avi.kiv...@gmail.com; eric.au...@linaro.org Subject: Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote: Call update routine when updating irqfd, this can update the IRTE for Intel posted-interrupts. Signed-off-by: Feng Wu feng...@intel.com --- virt/kvm/eventfd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index a32cf6c..1226835 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm) spin_lock_irq(kvm-irqfds.lock); - list_for_each_entry(irqfd, kvm-irqfds.items, list) + list_for_each_entry(irqfd, kvm-irqfds.items, list) { irqfd_update(kvm, irqfd); + irqfd-consumer.update(irqfd-consumer); + } spin_unlock_irq(kvm-irqfds.lock); } I don't understand why the irq bypass manager needs to know about this update callback. We could just as easily make it be a function pointer on the irqfd structure or maybe just open code it. It's defined by the consumer and called by the consumer, the irq bypass manager shouldn't know about it. Thanks, Yes, you are right. All we need is the producer information which has been passed in the register routine. And we can easily make this update logic inside the consumer. Thanks for your comments! Thanks, Feng Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Add host physical address width capability
Bandan Das b...@redhat.com writes: Laszlo Ersek ler...@redhat.com writes: ... Yes. Without EPT, you don't hit the processor limitation with your setup, but the user should nevertheless still be notified. I disagree. In fact, I think shadow paging code should also emulate this behavior if the gpa is out of range. I disagree. There is no out of range gpa. QEMU allocates enough memory, and it should be completely transparent to the guest. The fact that it silently breaks with nested paging if the host processor doesn't have enough address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not sure, but I suspect it's a hardware bug). In any case the guest shouldn't care at all. It is a *virtual* machine, and the VMM should lie to it plausibly enough. How much RAM, and how many phys address bits the host has, is a performance question, but it should not be a correctness question. A 256 GB guest should run (slowly, but correctly) on a laptop that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap space. Because otherwise your argument could be extrapolated as TCG should break too if the gpa is 'out of range'. So, I disagree. Whatever memory you give to the guest should just work (unless of course you want to emulate a small address width for the *VCPU*, but that's absolutely not the use case here). What we have here is a leaky abstraction: a PCPU limitation giving away a lie that the guest should never notice. The guest should be able to use all memory that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT vs. KVM-with-EPT. If the last case cannot work (due to hardware limitations), that's fine, but then (and only then) a warning should be printed. Hmm... Ok, I understand your point. So, this is more like a EPT limitation/bug in that Qemu isn't complaining about the memory assigned to the guest but EPT code is breaking owing to the processor physical address width. And honestly, I now think that this patch just makes the whole situation more confusing :) I am wondering if it's just possible for kvm to simply throw an error like a EPT misconfiguration or something .. Or in other words, if using a hardware assisted mechanism is just not possible, KVM will simply not let it run instead of letting a guest stuck in boot. I noticed that when the guest gets stuck, trace shows an endless loop of EXTERNAL_INTERRUPT exits with code 14 (PF). There's a note in 28.2.2 of the spec that No processors supporting the Intel64 architecture support more than 48 physical-address bits.. An attempt to use such an address causes a page fault. So, my first guess was to print out the guest physical address. That seems to be well beyond the range and is always 0xff000 (when the guest is stuck). The other thing I can think of is the EPT entries have bits in the 51:N range set which is reserved and always 0. I haven't verified but it looks like there's ept_misconfig_inspect_spte() that should already catch this condition. I am out of ideas for today :) ... In any case, please understand that I'm not campaigning for this warning :) IIRC the warning was your (very welcome!) idea after I reported the problem; I'm just trying to ensure that the warning match the exact issue I encountered. Thanks! Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] KVM: SVM: use NPT page attributes
On 07/09/2015 11:18 PM, Paolo Bonzini wrote: On 09/07/2015 04:30, Xiao Guangrong wrote: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 602b974a60a6..0f125c1860ec 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1085,6 +1085,47 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc) return target_tsc - tsc; } +static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat) +{ +struct kvm_vcpu *vcpu = svm-vcpu; + +/* Unlike Intel, AMD takes the guest's CR0.CD into account. I noticed this code in svm_set_cr0(): if (!(vcpu-kvm-arch.disabled_quirks KVM_QUIRK_CD_NW_CLEARED)) cr0 = ~(X86_CR0_CD | X86_CR0_NW); gCR0.CD is hidden to CPU if KVM_QUIRK_CD_NW_CLEARED is not set and looks like it is the normal case after grepping Qemu code. Hi Xiao, yes, this is correct. QEMU still does not have support for disabling quirks, so gCR0.CD is currently hidden on SVM. I would like to include this series in 4.2, while for 4.3 I will disable the quirk above altogether (it is superseded by the way PAT is forced to all-WB). That plan sounds good to me. You will drop disabled_quirks completely or just enable it in Qemu? :) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v1 1/5] vfio: Register/unregister irq_bypass_producer
This patch adds the registration/unregistration of an irq_bypass_producer for MSI/MSIx on vfio pci devices. Signed-off-by: Feng Wu feng...@intel.com --- drivers/vfio/pci/vfio_pci_intrs.c | 8 ++-- drivers/vfio/pci/vfio_pci_private.h | 2 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 4e053be..6e86292 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -322,7 +322,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, if (vdev-ctx[vector].trigger) { free_irq(irq, vdev-ctx[vector].trigger); - /* irq_bypass_unregister_producer(); */ + irq_bypass_unregister_producer(vdev-ctx[vector].producer); kfree(vdev-ctx[vector].name); eventfd_ctx_put(vdev-ctx[vector].trigger); vdev-ctx[vector].trigger = NULL; @@ -364,7 +364,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, return ret; } - /* irq_bypass_register_producer(); */ + INIT_LIST_HEAD(vdev-ctx[vector].producer.node); + vdev-ctx[vector].producer.token = trigger; + vdev-ctx[vector].producer.irq = irq; + ret = irq_bypass_register_producer(vdev-ctx[vector].producer); + WARN_ON(ret); vdev-ctx[vector].trigger = trigger; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index ae0e1b4..0e7394f 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -13,6 +13,7 @@ #include linux/mutex.h #include linux/pci.h +#include linux/irqbypass.h #ifndef VFIO_PCI_PRIVATE_H #define VFIO_PCI_PRIVATE_H @@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx { struct virqfd *mask; char*name; boolmasked; + struct irq_bypass_producer producer; }; struct vfio_pci_device { -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v1 3/5] KVM: Add pointer to 'struct irq_bypass_produce' in 'kvm_kernel_irqfd'
Add reference to struct irq_bypass_produce so we can get the producer information from the consumer side. Signed-off-by: Feng Wu feng...@intel.com --- include/linux/kvm_irqfd.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h index 3c0bd07..0c1de05 100644 --- a/include/linux/kvm_irqfd.h +++ b/include/linux/kvm_irqfd.h @@ -65,6 +65,7 @@ struct kvm_kernel_irqfd { poll_table pt; struct work_struct shutdown; struct irq_bypass_consumer consumer; + struct irq_bypass_producer *producer; }; #endif /* __LINUX_KVM_IRQFD_H */ -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v1 5/5] Call irqbypass update routine when updating irqfd
Call update routine when updating irqfd, this can update the IRTE for Intel posted-interrupts. Signed-off-by: Feng Wu feng...@intel.com --- virt/kvm/eventfd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index a32cf6c..1226835 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm) spin_lock_irq(kvm-irqfds.lock); - list_for_each_entry(irqfd, kvm-irqfds.items, list) + list_for_each_entry(irqfd, kvm-irqfds.items, list) { irqfd_update(kvm, irqfd); + irqfd-consumer.update(irqfd-consumer); + } spin_unlock_irq(kvm-irqfds.lock); } -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v1 2/5] KVM: x86: Update IRTE for posted-interrupts
This patch adds the routine to update IRTE for posted-interrupts when guest changes the interrupt configuration. Signed-off-by: Feng Wu feng...@intel.com --- arch/x86/kvm/x86.c | 73 ++ 1 file changed, 73 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 26eaeb5..d81ac02 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -63,6 +63,7 @@ #include asm/fpu/internal.h /* Ugh! */ #include asm/pvclock.h #include asm/div64.h +#include asm/irq_remapping.h #define MAX_IO_MSRS 256 #define KVM_MAX_MCE_BANKS 32 @@ -7950,6 +7951,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm) } EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma); +/* + * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts + * + * @kvm: kvm + * @host_irq: host irq of the interrupt + * @guest_irq: gsi of the interrupt + * @set: set or unset PI + * returns 0 on success, 0 on failure + */ +int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq, + uint32_t guest_irq, bool set) +{ + struct kvm_kernel_irq_routing_entry *e; + struct kvm_irq_routing_table *irq_rt; + struct kvm_lapic_irq irq; + struct kvm_vcpu *vcpu; + struct vcpu_data vcpu_info; + int idx, ret = -EINVAL; + + if (!irq_remapping_cap(IRQ_POSTING_CAP)) + return 0; + + idx = srcu_read_lock(kvm-irq_srcu); + irq_rt = srcu_dereference(kvm-irq_routing, kvm-irq_srcu); + BUG_ON(guest_irq = irq_rt-nr_rt_entries); + + hlist_for_each_entry(e, irq_rt-map[guest_irq], link) { + if (e-type != KVM_IRQ_ROUTING_MSI) + continue; + /* +* VT-d PI cannot support posting multicast/broadcast +* interrupts to a VCPU, we still use interrupt remapping +* for these kind of interrupts. +* +* For lowest-priority interrupts, we only support +* those with single CPU as the destination, e.g. user +* configures the interrupts via /proc/irq or uses +* irqbalance to make the interrupts single-CPU. +* +* We will support full lowest-priority interrupt later. +* +*/ + + kvm_set_msi_irq(e, irq); + if (!kvm_intr_is_single_vcpu(kvm, irq, vcpu)) + continue; + + vcpu_info.pi_desc_addr = kvm_x86_ops-get_pi_desc_addr(vcpu); + vcpu_info.vector = irq.vector; + + if (set) + ret = irq_set_vcpu_affinity(host_irq, vcpu_info); + else { + /* suppress notification event before unposting */ + kvm_x86_ops-pi_set_sn(vcpu); + ret = irq_set_vcpu_affinity(host_irq, NULL); + kvm_x86_ops-pi_clear_sn(vcpu); + } + + if (ret 0) { + printk(KERN_INFO %s: failed to update PI IRTE\n, + __func__); + goto out; + } + } + + ret = 0; +out: + srcu_read_unlock(kvm-irq_srcu, idx); + return ret; +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v1 4/5] KVM: x86: Add arch specific routines for irqbypass manager
Add the following x86 specific routines for irqbypass manger: - kvm_arch_irq_bypass_add_producer - kvm_arch_irq_bypass_del_producer - kvm_arch_irq_bypass_stop - kvm_arch_irq_bypass_resume - kvm_arch_irq_bypass_update Signed-off-by: Feng Wu feng...@intel.com --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/x86.c | 55 2 files changed, 56 insertions(+) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 22f6fcb..ae68f2a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -62,6 +62,7 @@ config KVM_INTEL # for perf_guest_get_msrs(): depends on CPU_SUP_INTEL select IRQ_BYPASS_MANAGER + select HAVE_KVM_IRQ_BYPASS ---help--- Provides support for KVM on Intel processors equipped with the VT extensions. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d81ac02..a59f7e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -49,6 +49,8 @@ #include linux/pci.h #include linux/timekeeper_internal.h #include linux/pvclock_gtod.h +#include linux/kvm_irqfd.h +#include linux/irqbypass.h #include trace/events/kvm.h #define CREATE_TRACE_POINTS @@ -8023,6 +8025,59 @@ out: return ret; } +void kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons, + struct irq_bypass_producer *prod) +{ + int ret; + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + + irqfd-producer = prod; + + ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 1); + WARN_ON(ret); +} + +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons, + struct irq_bypass_producer *prod) +{ + int ret; + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + + irqfd-producer = NULL; + + /* +* When producer of consumer is unregistered, we change back to +* remapped mode, so we can re-use the current implementation +* when the irq is masked/disabed or the consumer side (KVM +* int this case doesn't want to receive the interrupts. +*/ + ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 0); + WARN_ON(ret); +} + +void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons) +{ +} + +void kvm_arch_irq_bypass_resume(struct irq_bypass_consumer *cons) +{ +} + +void kvm_arch_irq_bypass_update(struct irq_bypass_consumer *cons) +{ + int ret; + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + + BUG_ON(!irqfd-producer); + + ret = kvm_arch_update_pi_irte(irqfd-kvm, irqfd-producer-irq, + irqfd-gsi, 1); + WARN_ON(ret); +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC v1 0/5] irq bypass interface implementation for VT-d Posted-interrupts
This series is based on Alex and Eric's irq bypass manager framework. To make things clear, I only send out the patches related to irq bypass manager, the purpose here is to show how certain callbacks are used in VT-d PI and help to improve the irqbypass manager itself. Feng Wu (5): vfio: Register/unregister irq_bypass_producer KVM: x86: Update IRTE for posted-interrupts KVM: Add pointer to 'struct irq_bypass_produce' in 'kvm_kernel_irqfd' KVM: x86: Add arch specific routines for irqbypass manager Call irqbypass update routine when updating irqfd arch/x86/kvm/Kconfig| 1 + arch/x86/kvm/x86.c | 128 drivers/vfio/pci/vfio_pci_intrs.c | 8 ++- drivers/vfio/pci/vfio_pci_private.h | 2 + include/linux/kvm_irqfd.h | 1 + virt/kvm/eventfd.c | 4 +- 6 files changed, 141 insertions(+), 3 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd
On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote: Call update routine when updating irqfd, this can update the IRTE for Intel posted-interrupts. Signed-off-by: Feng Wu feng...@intel.com --- virt/kvm/eventfd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index a32cf6c..1226835 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm) spin_lock_irq(kvm-irqfds.lock); - list_for_each_entry(irqfd, kvm-irqfds.items, list) + list_for_each_entry(irqfd, kvm-irqfds.items, list) { irqfd_update(kvm, irqfd); + irqfd-consumer.update(irqfd-consumer); + } spin_unlock_irq(kvm-irqfds.lock); } I don't understand why the irq bypass manager needs to know about this update callback. We could just as easily make it be a function pointer on the irqfd structure or maybe just open code it. It's defined by the consumer and called by the consumer, the irq bypass manager shouldn't know about it. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: Add Kconfig option to signal cross-endian guests
The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n + depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost + depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts
On 08/07/2015 16:22, Paolo Bonzini wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 08/07/2015 17:03, James Hogan wrote: Hi Paolo, On 24/04/15 11:26, James Hogan wrote: A couple of small fixes for accessing 32-bit KVM registers on big endian, and to sign extend struct kvm_regs registers so as to work on MIPS64 hosts. James Hogan (2): mips/kvm: Fix Big endian 32-bit register access mips/kvm: Sign extend registers written to KVM Any chance of applying these in time for v2.4? I had assumed they'd go through Leon, but I can apply them for 2.4-rc1 as well. I thought these changes would get applied via kvm queue since they touch target-mips/kvm.c only. But if such target-specific kvm patches usually go via target-* tree then I'm happy to pick them up. Leon -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 9 July 2015 at 11:22, Christoffer Dall christoffer.d...@linaro.org wrote: On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote: I suspect Jan is right and we really need to distinguish the KVM_PUT_*_STATE levels in ARM QEMU. This probably implies some kind of whitelist/override mechanism, since by and large we neither know nor want to know the semantics for system registers, we leave that up to the kernel. Q: if you have a running VM, and you pause it for an hour, what should the CNTVCT register do? Presumably it should not advance, but how do we arrange for that to happen? I think the CNTVCT should not advance when the VM is not scheduled, so if we pause the VM or starve all the VCPUs for enough time, the guest should not see time progressing, since otherwise the guest scheduler cannot maintain fairness and you're bound to see spurious RCU stalls etc. That's exactly why a guest can read both a virtual and physical counter and it is an area where you simply want some level of paravirtualization. I haven't studied how/if Linux deals with this at all. So I think adjusting CNTVOFF should be managed by the kernel for the pause/starvation scenario (which I think Avi once told me x86 does too - does anyone know the current state of the art?). Agreed that we should be aiming for same behaviour as x86 rather than reinventing the wheel... So the only situation where I think userspace should adjust the CNTVOFF value is for migration where we are talking about a brand new VM with has_run_once clear. For incoming migration/loadvm the VM will be freshly *reset*, but it will not be completely created from scratch. It's possible for the user to run a VM for a bit, and then use loadvm from the monitor to load an old snapshot. This will do a qemu_system_reset() and then restore the state. So in that case has_run_once won't be clear. (Would it be OK to clear it when userspace does VCPU_INIT? What else does it control?) Thus, if we were designing this from scratch now, the API should be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the VM has run once, but it's too late for that as we would break userspace. The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF in the kernel side as well, and finally also fix QEMU so that it doesn't try to do the thing that future kernels will ignore. Isn't Marc's patch causing us to return an error to userspace? (It does a 'return -1', at least -- does that get ignored at a higher level?) In principle, if userspace does: * stop all vCPUs * read the counter values from each vCPU * write the counter values back to each vCPU * resume vCPUs this shouldn't cause time to do funny things, should it? (ie the problem only occurs if we try to write the vCPU counter value while only one vCPU is stopped). thanks -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 2015-07-09 12:22, Christoffer Dall wrote: Hi Peter and Marc, [cc'ing Paolo for his input on x86 timekeeping] On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote: On 8 July 2015 at 17:37, Marc Zyngier marc.zyng...@arm.com wrote: On 08/07/15 17:06, Peter Maydell wrote: I'd prefer it if somebody could investigate to see why QEMU is actually doing this -- so far we just have speculation. I'd prefer that too, but so far people seem to be more comfortable waiting for the issue to fix itself. In the meantime, VMs are broken in weird and wonderful ways, and I don't think the current status-quo helps anyone. Putting in a patch which might not be the right fix isn't necessarily a good plan either... Does has_run_once get cleared if we do a re-VCPU_INIT of a CPU that's run before? (We need to allow rewriting of guest state at that point so that reset VM and load migration state behaves correctly.) no, it does not, has_run_once is set the first time a VCPU is run and is currently *never* cleared. I suspect Jan is right and we really need to distinguish the KVM_PUT_*_STATE levels in ARM QEMU. This probably implies some kind of whitelist/override mechanism, since by and large we neither know nor want to know the semantics for system registers, we leave that up to the kernel. Q: if you have a running VM, and you pause it for an hour, what should the CNTVCT register do? Presumably it should not advance, but how do we arrange for that to happen? I think the CNTVCT should not advance when the VM is not scheduled, so if we pause the VM or starve all the VCPUs for enough time, the guest should not see time progressing, since otherwise the guest scheduler cannot maintain fairness and you're bound to see spurious RCU stalls etc. That's exactly why a guest can read both a virtual and physical counter and it is an area where you simply want some level of paravirtualization. I haven't studied how/if Linux deals with this at all. So I think adjusting CNTVOFF should be managed by the kernel for the pause/starvation scenario (which I think Avi once told me x86 does too - does anyone know the current state of the art?). So the only situation where I think userspace should adjust the CNTVOFF value is for migration where we are talking about a brand new VM with has_run_once clear. Thus, if we were designing this from scratch now, the API should be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the VM has run once, but it's too late for that as we would break userspace. The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF in the kernel side as well, and finally also fix QEMU so that it doesn't try to do the thing that future kernels will ignore. Fixing QEMU to only write on KVM_PUT_FULL_STATE - yes, that should be done, but I don't think the approach for the kernel is generally right. The kernel should not do any policing on user space requests to change the VCPU or VM state unless - security is affected - userspace lacks information, thus cannot decide correctly - legacy userspace has a bug, we can detect it and want to fix that up without affecting future userspace that has a reason to do it differently Regarding CNTVOFF, the first two criteria do not apply for sure. Maybe the last one, don't know. Just think of the hypothetical scenario that a userspace VM debugger wants to inject certain register manipulations. If you block this by some hidden VM state like proposed, that feature would no longer be implementable easily. Jan signature.asc Description: OpenPGP digital signature
Re: [PATCH] KVM: x86: Add host physical address width capability
On 07/09/15 08:09, Paolo Bonzini wrote: On 09/07/2015 00:36, Bandan Das wrote: Let userspace inquire the maximum physical address width of the host processors; this can be used to identify maximum memory that can be assigned to the guest. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- arch/x86/kvm/x86.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..97d6746 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_NR_MEMSLOTS: r = KVM_USER_MEM_SLOTS; break; +case KVM_CAP_PHY_ADDR_WIDTH: +r = boot_cpu_data.x86_phys_bits; +break; Userspace can just use CPUID, can't it? I believe KVM's cooperation is necessary, for the following reason: The truncation only occurs when the guest-phys - host-phys translation is done in hardware, *and* the phys bits of the host processor are insufficient to represent the highest guest-phys address that the guest will ever face. The first condition (of course) means that the truncation depends on EPT being enabled. (I didn't test on AMD so I don't know if RVI has the same issue.) If EPT is disabled, either because the host processor lacks it, or because the respective kvm_intel module parameter is set so, then the issue cannot be experienced. Therefore I believe a KVM patch is necessary. However, this specific patch doesn't seem sufficient; it should also consider whether EPT is enabled. (And the ioctl should be perhaps renamed to reflect that -- what QEMU needs to know is not the raw physical address width of the host processor, but whether that width will cause EPT to silently truncate high guest-phys addresses.) Thanks Laszlo Paolo case KVM_CAP_PV_MMU:/* obsolete */ r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 716ad4a..e7949a1 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: x86: fix load xsave feature warning
[ 68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]() [ 68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core [ 68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2 [ 68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015 [ 68.197007] a03b0657 8800d984bca8 815915a2 [ 68.197009] 8800d984bce8 81057c0a 7ff6d0001000 [ 68.197010] 0002 880211c1a000 0004 8800ce0288c0 [ 68.197012] Call Trace: [ 68.197017] [815915a2] dump_stack+0x45/0x57 [ 68.197020] [81057c0a] warn_slowpath_common+0x8a/0xc0 [ 68.197022] [81057cfa] warn_slowpath_null+0x1a/0x20 [ 68.197029] [a037bed8] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm] [ 68.197035] [a037aede] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm] [ 68.197040] [a03696a6] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm] [ 68.197043] [811252d2] ? perf_pmu_enable+0x22/0x30 [ 68.197044] [8112663e] ? perf_event_context_sched_in+0x7e/0xb0 [ 68.197048] [811a6882] do_vfs_ioctl+0x2c2/0x4a0 [ 68.197050] [8107bf33] ? finish_task_switch+0x173/0x220 [ 68.197053] [8123307f] ? selinux_file_ioctl+0x4f/0xd0 [ 68.197055] [8122cac3] ? security_file_ioctl+0x43/0x60 [ 68.197057] [811a6ad9] SyS_ioctl+0x79/0x90 [ 68.197060] [81597e57] entry_SYSCALL_64_fastpath+0x12/0x6a [ 68.197061] ---[ end trace 558a5ebf9445fc80 ]--- After commit (0c4109bec0 'x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions'), there is no assumption an xsave bit is present in the hardware (pcntxt_mask) that it is always present in a given xsave buffer. An enabled state to be present on 'pcntxt_mask', but *not* in 'xstate_bv' could happen when the last 'xsave' did not request that this feature be saved (unlikely) or because the init optimization caused it to not be saved. This patch kill the assumption. Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- Note: This patch against latest linus tree. arch/x86/kvm/x86.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..7125cefe 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3157,8 +3157,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src) cpuid_count(XSTATE_CPUID, index, size, offset, ecx, edx); memcpy(dest, src + offset, size); - } else - WARN_ON_ONCE(1); + } valid -= feature; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Add host physical address width capability
On 09/07/2015 00:36, Bandan Das wrote: Let userspace inquire the maximum physical address width of the host processors; this can be used to identify maximum memory that can be assigned to the guest. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- arch/x86/kvm/x86.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..97d6746 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_NR_MEMSLOTS: r = KVM_USER_MEM_SLOTS; break; + case KVM_CAP_PHY_ADDR_WIDTH: + r = boot_cpu_data.x86_phys_bits; + break; Userspace can just use CPUID, can't it? Paolo case KVM_CAP_PV_MMU:/* obsolete */ r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 716ad4a..e7949a1 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: Sanity check host processor physical address width
On 07/09/15 00:42, Bandan Das wrote: If a Linux guest is assigned more memory than is supported by the host processor, the guest is unable to boot. That is expected, however, there's no message indicating the user what went wrong. This change prints a message to stderr if KVM has the corresponding capability. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- linux-headers/linux/kvm.h | 1 + target-i386/kvm.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 3bac873..6afad49 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 066d03d..66e3448 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) uint64_t shadow_mem; int ret; struct utsname utsname; +int max_phys_bits; ret = kvm_get_supported_msrs(s); if (ret 0) { @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH); +if (max_phys_bits (1ULL max_phys_bits) = ram_size) +fprintf(stderr, Warning: The amount of memory assigned to the guest +is more than that supported by the host CPU(s). Guest may be unstable.\n); + if (kvm_check_extension(s, KVM_CAP_X86_SMM)) { smram_machine_done.notify = register_smram_listener; qemu_add_machine_init_done_notifier(smram_machine_done); First, see my comments on the KVM patch. Second, ram_size is not the right thing to compare. What should be checked is whether the highest guest-physical address that maps to RAM can be represented in the address width of the host processor (and only if EPT is enabled, but that sub-condition belongs to the KVM patch). Note that this is not the same as the check written in the patch. For example, if you assume a 32-bit PCI hole with size 1 GB, then a total guest RAM of size 63 GB will result in the highest guest-phys memory address being 0xF__, which just fits into 36 bits. Correspondingly, the above code would not print the warning for -m $((63 * 1024 + 1)) on my laptop (which has address sizes : 36 bits physical, ...), even though such a guest would not boot for me (with EPT enabled). Please see http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447 So, ram_size in the controlling expression should be replaced with maximum_guest_ram_address (which should be inclusive, and the = relop should be preserved). Thanks Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: Add Kconfig option to signal cross-endian guests
The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n + depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost + depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: Sanity check host processor physical address width
On 09/07/2015 00:42, Bandan Das wrote: If a Linux guest is assigned more memory than is supported by the host processor, the guest is unable to boot. That is expected, however, there's no message indicating the user what went wrong. This change prints a message to stderr if KVM has the corresponding capability. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com This is not entirely correct, because it doesn't take the PCI hole into account. Perhaps KVM could simply hide memory above the limit (i.e. treat it as MMIO), and the BIOS could remove RAM above the limit from the e820 memory map? Paolo --- linux-headers/linux/kvm.h | 1 + target-i386/kvm.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 3bac873..6afad49 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 066d03d..66e3448 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) uint64_t shadow_mem; int ret; struct utsname utsname; +int max_phys_bits; ret = kvm_get_supported_msrs(s); if (ret 0) { @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH); +if (max_phys_bits (1ULL max_phys_bits) = ram_size) +fprintf(stderr, Warning: The amount of memory assigned to the guest +is more than that supported by the host CPU(s). Guest may be unstable.\n); + if (kvm_check_extension(s, KVM_CAP_X86_SMM)) { smram_machine_done.notify = register_smram_listener; qemu_add_machine_init_done_notifier(smram_machine_done); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: Sanity check host processor physical address width
On 07/09/15 09:59, Paolo Bonzini wrote: On 09/07/2015 00:42, Bandan Das wrote: If a Linux guest is assigned more memory than is supported by the host processor, the guest is unable to boot. That is expected, however, there's no message indicating the user what went wrong. This change prints a message to stderr if KVM has the corresponding capability. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com This is not entirely correct, because it doesn't take the PCI hole into account. Perhaps KVM could simply hide memory above the limit (i.e. treat it as MMIO), and the BIOS could remove RAM above the limit from the e820 memory map? I'd prefer to leave the guest firmware*s* out of this... :) E820 is a legacy BIOS concept. In OVMF we'd have to hack the memory resource descriptor HOBs (which in turn control the DXE memory space map, which in turn controls the UEFI memory map). Those HOBs are currently based on what the CMOS reports about the RAM available under and above 4GB. It's pretty complex already (will get more complex with SMM support), and TBH, for working around such an obscure issue, I wouldn't like to complicate it even further... After all, this is a host platform limitation. The solution should be to either move to a more capable host, or do it in software (disable EPT). Thanks! Laszlo Paolo --- linux-headers/linux/kvm.h | 1 + target-i386/kvm.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 3bac873..6afad49 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 066d03d..66e3448 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) uint64_t shadow_mem; int ret; struct utsname utsname; +int max_phys_bits; ret = kvm_get_supported_msrs(s); if (ret 0) { @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH); +if (max_phys_bits (1ULL max_phys_bits) = ram_size) +fprintf(stderr, Warning: The amount of memory assigned to the guest +is more than that supported by the host CPU(s). Guest may be unstable.\n); + if (kvm_check_extension(s, KVM_CAP_X86_SMM)) { smram_machine_done.notify = register_smram_listener; qemu_add_machine_init_done_notifier(smram_machine_done); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Live migration using shared storage in different networks
On Mon, Jul 06, 2015 at 09:44:07AM +0100, Miguel Barbosa Gonçalves wrote: I am building a KVM cluster that needs VM live migration. My shared storage as well as the KVM hosts will be running CentOS. Because 10 Gbps Ethernet switches are very expensive at the moment I will connect the KVM hosts to the storage by cross-over cables and create private networks for each connection (10.0.0.0/30 and 10.0.0.4/30). The following diagram shows the topology Management ManagementManagement VLAN VLAN VLAN | | | ++-+ 10 Gbps +++ 10 Gbps ++-+ | KVM Host |---| Storage |---| KVM Host | ++-+ +++ ++-+ | | | Public PublicPublic VLAN VLAN VLAN My question is: will live migration work in this configuration since the storage will have 2 different IP addresses (10.0.0.1 and 10.0.0.5) in 2 different networks even though it is the same storage? At the QEMU level this works. At the libvirt level you may need to be careful how you configure the storage and domains (VMs). You need to make sure that the storage network details are not part of the same VM configuration that gets applied on both hosts. For example: if you NFS mount or attach iSCSI LUNs to the host and then just give libvirt the path to the image file, it will work. But if you want libvirt to do the NFS/iSCSI setup for you then you'll have to dig into the configuration documentation (http://libvirt.org/) and it may not be possible. Stefan pgpDCdJm4w0jZ.pgp Description: PGP signature
[PATCH for 2.4 0/2] MIPS build fixes for v2.4
These two patches fix build errors for the MIPS TCG backend and MIPS KVM. Please could they be applied for v2.4. James Hogan (2): tcg/mips: Fix build error from merged memop+mmu_idx parameter mips/kvm: Sync with newer MIPS KVM headers target-mips/kvm.c | 15 ++- tcg/mips/tcg-target.c | 4 ++-- 2 files changed, 4 insertions(+), 15 deletions(-) Cc: Aurelien Jarno aurel...@aurel32.net Cc: Leon Alrae leon.al...@imgtec.com Cc: Richard Henderson r...@twiddle.net Cc: Peter Maydell peter.mayd...@linaro.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: kvm@vger.kernel.org -- 2.3.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for 2.4 2/2] mips/kvm: Sync with newer MIPS KVM headers
The KVM_REG_MIPS_COUNT_* definitions are now included in linux-headers/asm-mips/kvm.h since commit b061808d39fa (linux-headers: update linux headers to kvm/next), therefore the duplicate definitions in target-mips/kvm.c can now be dropped (the definitions were tweaked slightly in commit 7a52ce8a1607 (linux-headers: update) which triggered the following build warnings turned errors): target-mips/kvm.c:232:0: error: KVM_REG_MIPS_COUNT_CTL redefined [-Werror] linux-headers/asm/kvm.h:129:0: note: this is the location of the previous definition target-mips/kvm.c:236:0: error: KVM_REG_MIPS_COUNT_RESUME redefined [-Werror] linux-headers/asm/kvm.h:141:0: note: this is the location of the previous definition target-mips/kvm.c:239:0: error: KVM_REG_MIPS_COUNT_HZ redefined [-Werror] linux-headers/asm/kvm.h:147:0: note: this is the location of the previous definition Also update the MIPS_C0_{32,64} macros to utilise definitions more recently added to the asm-mips/kvm.h header. Signed-off-by: James Hogan james.ho...@imgtec.com Cc: Paolo Bonzini pbonz...@redhat.com Cc: Leon Alrae leon.al...@imgtec.com Cc: Aurelien Jarno aurel...@aurel32.net Cc: kvm@vger.kernel.org --- target-mips/kvm.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/target-mips/kvm.c b/target-mips/kvm.c index 7d2293d93492..bd64a70bcda0 100644 --- a/target-mips/kvm.c +++ b/target-mips/kvm.c @@ -212,10 +212,10 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int level) } #define MIPS_CP0_32(_R, _S) \ -(KVM_REG_MIPS | KVM_REG_SIZE_U32 | 0x1 | (8 * (_R) + (_S))) +(KVM_REG_MIPS_CP0 | KVM_REG_SIZE_U32 | (8 * (_R) + (_S))) #define MIPS_CP0_64(_R, _S) \ -(KVM_REG_MIPS | KVM_REG_SIZE_U64 | 0x1 | (8 * (_R) + (_S))) +(KVM_REG_MIPS_CP0 | KVM_REG_SIZE_U64 | (8 * (_R) + (_S))) #define KVM_REG_MIPS_CP0_INDEX MIPS_CP0_32(0, 0) #define KVM_REG_MIPS_CP0_CONTEXTMIPS_CP0_64(4, 0) @@ -232,17 +232,6 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int level) #define KVM_REG_MIPS_CP0_EPCMIPS_CP0_64(14, 0) #define KVM_REG_MIPS_CP0_ERROREPC MIPS_CP0_64(30, 0) -/* CP0_Count control */ -#define KVM_REG_MIPS_COUNT_CTL (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \ - 0x2 | 0) -#define KVM_REG_MIPS_COUNT_CTL_DC 0x0001 /* master disable */ -/* CP0_Count resume monotonic nanoseconds */ -#define KVM_REG_MIPS_COUNT_RESUME (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \ - 0x2 | 1) -/* CP0_Count rate in Hz */ -#define KVM_REG_MIPS_COUNT_HZ (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \ - 0x2 | 2) - static inline int kvm_mips_put_one_reg(CPUState *cs, uint64_t reg_id, int32_t *addr) { -- 2.3.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/13] arm64: KVM: VHE: Add alternatives for VHE-enabled world-switch
Hi Mario, On 09/07/15 02:29, Mario Smarduch wrote: On 07/08/2015 09:19 AM, Marc Zyngier wrote: In order to switch between host and guest, a VHE-enabled kernel must use different accessors for certain system registers. This patch uses runtime patching to use the right instruction when required... Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm64/include/asm/kvm_asm.h | 40 ++-- arch/arm64/kvm/hyp.S | 210 ++- arch/arm64/kvm/vhe-macros.h | 18 3 files changed, 191 insertions(+), 77 deletions(-) [] * Author: Marc Zyngier marc.zyng...@arm.com * * This program is free software; you can redistribute it and/or modify @@ -67,40 +67,52 @@ stp x29, lr, [x3, #80] mrs x19, sp_el0 - mrs x20, elr_el2// pc before entering el2 - mrs x21, spsr_el2 // pstate before entering el2 + str x19, [x3, #96] +.endm - stp x19, x20, [x3, #96] - str x21, [x3, #112] Hi Marc, trying to make a little sense out of this :) Don't even try, it hurts... ;-) In the case of VHE kernel the two 'mrs_hyp()' and 'mrs_el1()' calls would be accessing same registers - namely EL1 variants? For non VHE EL2, EL1? The mrs_s and sysreg_EL12 are new, not sure what these mean. mrs_s and msr_s are just macros to that deal with system registers that the assembler doesn't know about (yet). They have been in (moderate) use for about a year, and have been introduced with the GICv3 support. See arch/arm64/include/asm/sysreg.h for the gory details. Now, on to sysreg_EL12: The main idea with VHE is that anything that used to run at EL1 (the kernel) can now run unmodified at EL2, and that it is the EL2 software that has to change to deal with it. So when the kernel uses VHE and runs at EL2, an access to sysreg_EL1 really accesses sysreg_EL2, transparently. This is what makes it possible to run the kernel at EL2 without any change. But when the KVM world switch wants to access a guest register, it cannot use sysreg_EL1 anymore (that would hit on the EL2 register because of the above rule). For this, it must use sysreg_EL12 which effectively means access the EL1 register from EL2. As a consequence, we have the following rules: - non-VHE: msr_el1 uses EL1, msr_hyp uses EL2 - VHE: msr_el1 uses EL12, msr_hyp uses EL1 Does this help? M. - Mario +.macro save_el1_state + mrs_hyp(x20, ELR) // pc before entering el2 + mrs_hyp(x21, SPSR) // pstate before entering el2 mrs x22, sp_el1 - mrs x23, elr_el1 - mrs x24, spsr_el1 + + mrs_el1(x23, elr) + mrs_el1(x24, spsr) + + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0 + stp x20, x21, [x3, #8] // HACK: Store to the regs after SP_EL0 str x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)] str x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)] str x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)] .endm -.macro restore_common_regs +.macro restore_el1_state // x2: base address for cpu context // x3: tmp register + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0 + ldp x20, x21, [x3, #8] // Same hack again, get guest PC and pstate + ldr x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)] ldr x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)] ldr x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)] + msr_hyp(ELR, x20) // pc on return from el2 + msr_hyp(SPSR, x21) // pstate on return from el2 + msr sp_el1, x22 - msr elr_el1, x23 - msr spsr_el1, x24 - add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0 - ldp x19, x20, [x3] - ldr x21, [x3, #16] + msr_el1(elr, x23) + msr_el1(spsr, x24) +.endm +.macro restore_common_regs + // x2: base address for cpu context + // x3: tmp register + + ldr x19, [x2, #CPU_XREG_OFFSET(31)] // SP_EL0 msr sp_el0, x19 - msr elr_el2, x20// pc on return from el2 - msr spsr_el2, x21 // pstate on return from el2 add x3, x2, #CPU_XREG_OFFSET(19) ldp x19, x20, [x3] @@ -113,9 +125,15 @@ .macro save_host_regs save_common_regs +ifnvhe nop,b skip_el1_save + save_el1_state +skip_el1_save: .endm .macro restore_host_regs +ifnvhe nop,b skip_el1_restore + restore_el1_state +skip_el1_restore: restore_common_regs .endm @@ -159,6 +177,7 @@ stp x6, x7, [x3, #16] save_common_regs + save_el1_state .endm .macro restore_guest_regs @@ -184,6 +203,7 @@ ldr x18, [x3, #144] // x19-x29, lr, sp*, elr*, spsr* + restore_el1_state restore_common_regs // Last bits of the
[PATCH v2 0/7] KVM: arm/arm64: gsi routing support
With the advent of GICv3 ITS in-kernel emulation, KVM GSI routing appears to be requested. More specifically MSI routing is needed. irqchip routing does not sound to be really useful on arm but usage of MSI routing also mandates to integrate irqchip routing. The initial implementation of irqfd on arm must be upgraded with the integration of kvm irqchip.c code and the implementation of its standard hooks in the architecture specific part. In case KVM_SET_GSI_ROUTING ioctl is not called, a default routing table with flat irqchip routing entries is built enabling to inject gsi corresponding to the SPI indexes seen by the guest. As soon as KVM_SET_GSI_ROUTING is called, user-space overwrites this default routing table and is responsible for building the whole routing table. for arm/arm64 KVM_SET_GSI_ROUTING has a limited support: - only applies to KVM_IRQFD and not to KVM_IRQ_LINE - irqchip routing was tested on Calxeda midway (VFIO with irqfd) with and without explicit routing - MSI routing without GICv3 ITS was tested using APM Xgene-I (qemu VIRTIO-PCI vhost net without gsi_direct_mapping). - MSI routing with GICv3 ITS is *NOT* tested. Code can be found at https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.2-rc1-gsi-routing-v2 It applies on Andre's [PATCH 00/13] arm64: KVM: GICv3 ITS emulation (http://www.spinics.net/lists/kvm/msg117402.html) History: v1 - v2: - user API changed: x devid id passed in kvm_irq_routing_msi x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI - kvm_host.h: adopt Andre's proposal to replace the msi_msg by a struct composed of the msi_msg and devid in kvm_kernel_irq_routing_entry - Fix bug reported by Pavel: Added KVM_IRQ_ROUTING_EXTENDED_MSI handling in eventfd.c - added vgic_v2m_inject_msi in vgic-v2-emul.c as suggested by Andre - fix bug reported by Andre: bad setting of msi.flags and msi.devid in kvm_send_userspace_msi - avoid injecting reserved IRQ numbers in vgic_irqfd_set_irq RFC - PATCH: - clearly state limited support on arm/arm64: KVM_IRQ_LINE not impacted by GSI routing - add default routing table feature (new patch file) - changed uapi to use padding field area - reword api.txt Eric Auger (7): KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI KVM: kvm_host: add devid in kvm_kernel_irq_routing_entry KVM: irqchip: convey devid to kvm_set_msi KVM: arm/arm64: enable irqchip routing KVM: arm/arm64: build a default routing table KVM: arm/arm64: enable MSI routing KVM: arm: implement kvm_set_msi by gsi direct mapping Documentation/virtual/kvm/api.txt | 32 ++-- arch/arm/include/asm/kvm_host.h | 2 + arch/arm/kvm/Kconfig | 3 ++ arch/arm/kvm/Makefile | 2 +- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/Kconfig| 2 + arch/arm64/kvm/Makefile | 2 +- include/kvm/arm_vgic.h| 9 include/linux/kvm_host.h | 7 ++- include/uapi/linux/kvm.h | 6 ++- virt/kvm/arm/vgic-v2-emul.c | 12 + virt/kvm/arm/vgic.c | 101 +- virt/kvm/eventfd.c| 6 ++- virt/kvm/irqchip.c| 12 - 14 files changed, 153 insertions(+), 44 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/7] KVM: arm/arm64: build a default routing table
Implement a default routing table made of flat irqchip routing entries (gsi = irqchip.pin) covering the VGIC SPI indexes. This routing table is overwritten by the first user-space call to KVM_SET_GSI_ROUTING ioctl. Signed-off-by: Eric Auger eric.au...@linaro.org --- PATCH: creation --- virt/kvm/arm/vgic.c | 21 + 1 file changed, 21 insertions(+) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 6c6c25e..63be67e 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1781,6 +1781,8 @@ int vgic_init(struct kvm *kvm) ret |= vgic_init_bitmap(dist-irq_cfg, nr_cpus, nr_irqs); ret |= vgic_init_bytemap(dist-irq_priority, nr_cpus, nr_irqs); + ret |= kvm_setup_default_irq_routing(kvm); + if (ret) goto out; @@ -2258,6 +2260,25 @@ out: return r; } +int kvm_setup_default_irq_routing(struct kvm *kvm) +{ + struct kvm_irq_routing_entry *entries; + u32 nr = kvm-arch.vgic.nr_irqs - VGIC_NR_PRIVATE_IRQS; + int i, ret; + + entries = kcalloc(nr, sizeof(struct kvm_kernel_irq_routing_entry), + GFP_KERNEL); + for (i = 0; i nr; i++) { + entries[i].gsi = i; + entries[i].type = KVM_IRQ_ROUTING_IRQCHIP; + entries[i].u.irqchip.irqchip = 0; + entries[i].u.irqchip.pin = i; + } + ret = kvm_set_irq_routing(kvm, entries, nr, 0); + kfree(entries); + return ret; +} + int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, int irq_source_id, int level, bool line_status) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/7] KVM: kvm_host: add devid in kvm_kernel_irq_routing_entry
Extend kvm_kernel_irq_routing_entry to transport devid. This is needed for ARM. Its validity depends on the routing type entry. Signed-off-by: Eric Auger eric.au...@linaro.org --- v1 - v2: replace msi_msg field by a struct composed of msi_msg and devid RFC - PATCH: - reword the commit message after change in first patch (uapi) --- include/linux/kvm_host.h | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9564fd7..cac0fe4 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -321,7 +321,10 @@ struct kvm_kernel_irq_routing_entry { unsigned irqchip; unsigned pin; } irqchip; - struct msi_msg msi; + struct { + struct msi_msg msi; + u32 devid; + }; struct kvm_s390_adapter_int adapter; }; struct hlist_node link; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
If the ITS modality is not available, let's simply support MSI injection by transforming the MSI.data into an SPI ID. This becomes possible to use KVM_SIGNAL_MSI ioctl for arm too. Signed-off-by: Eric Auger eric.au...@linaro.org --- v1 - v2: - introduce vgic_v2m_inject_msi in vgic-v2-emul.c following Andre's advice --- arch/arm/kvm/Kconfig| 1 + virt/kvm/arm/vgic-v2-emul.c | 12 2 files changed, 13 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 151e710..0f58baf 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select HAVE_KVM_MSI select HAVE_KVM_IRQCHIP select HAVE_KVM_IRQ_ROUTING depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c index 1390797..43013cc 100644 --- a/virt/kvm/arm/vgic-v2-emul.c +++ b/virt/kvm/arm/vgic-v2-emul.c @@ -478,6 +478,17 @@ static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq) } /** + * Emulates GICv2M MSI injection by injecting the SPI ID matching + * the msi data + * @kvm: pointer to the kvm struct + * @msi: the msi struct handle + */ +static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi) +{ + return kvm_vgic_inject_irq(kvm, 0, msi-data, 1); +} + +/** * kvm_vgic_map_resources - Configure global VGIC state before running any VCPUs * @kvm: pointer to the kvm struct * @@ -566,6 +577,7 @@ void vgic_v2_init_emulation(struct kvm *kvm) dist-vm_ops.add_sgi_source = vgic_v2_add_sgi_source; dist-vm_ops.init_model = vgic_v2_init_model; dist-vm_ops.map_resources = vgic_v2_map_resources; + dist-vm_ops.inject_msi = vgic_v2m_inject_msi; kvm-arch.max_vcpus = VGIC_V2_MAX_CPUS; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 6/7] KVM: arm/arm64: enable MSI routing
Up to now, only irqchip routing entries could be set. This patch adds the capability to insert MSI routing entries, with or without device id. Although standard MSI entries can be set, their injection still is not supported. For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: include SPI irqchip flat routes plus MSI routes. In the future this might be extended. The new MSI routing entry type also must be managed similarly to legacy KVM_IRQ_ROUTING_MSI in eventfd irqfd_wakeup and irqfd_update. Signed-off-by: Eric Auger eric.au...@linaro.org --- v1 - v2: - adapt to new routing entry types RFC - PATCH: - move api MSI routing updates into that patch file - use new devid field of user api struct --- Documentation/virtual/kvm/api.txt | 10 ++ include/linux/kvm_host.h | 2 ++ virt/kvm/arm/vgic.c | 13 + virt/kvm/eventfd.c| 6 -- 4 files changed, 29 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 9276dac..686b121 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1431,6 +1431,11 @@ Sets the GSI routing table entries, overwriting any previously set entries. On arm/arm64, GSI routing has the following limitation: - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. +On arm/arm64, MSI routing through in-kernel GICv3 ITS must use +KVM_IRQ_ROUTING_EXTENDED_MSI routing type and device ID must be set +in msi struct. Otherwise, KVM_IRQ_ROUTING_MSI must be used without +populating the msi devid field. + struct kvm_irq_routing { __u32 nr; __u32 flags; @@ -2342,6 +2347,11 @@ On arm/arm64, gsi routing being supported, the following can happen: - in case no routing entry is associated to this gsi, injection fails - in case the gsi is associated to an irqchip routing entry, irqchip.pin + 32 corresponds to the injected SPI ID. +- in case the gsi is associated to an MSI routing entry, + * without GICv3 ITS in-kernel emulation, MSI data matches the SPI ID +of the injected SPI + * with GICv3 ITS in-kernel emulation, the MSI message and device ID +are translated into an LPI. 4.76 KVM_PPC_ALLOCATE_HTAB diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index cac0fe4..ea1a810 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -976,6 +976,8 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq) #ifdef CONFIG_S390 #define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that... +#elif defined(CONFIG_ARM64) +#define KVM_MAX_IRQ_ROUTES 4096 #else #define KVM_MAX_IRQ_ROUTES 1024 #endif diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 63be67e..c5546d8 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -2252,6 +2252,19 @@ int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e, (e-irqchip.irqchip = KVM_NR_IRQCHIPS)) goto out; break; + case KVM_IRQ_ROUTING_MSI: + e-set = kvm_set_msi; + e-msi.address_lo = ue-u.msi.address_lo; + e-msi.address_hi = ue-u.msi.address_hi; + e-msi.data = ue-u.msi.data; + break; + case KVM_IRQ_ROUTING_EXTENDED_MSI: + e-set = kvm_set_msi; + e-msi.address_lo = ue-u.msi.address_lo; + e-msi.address_hi = ue-u.msi.address_hi; + e-msi.data = ue-u.msi.data; + e-devid = ue-u.msi.devid; + break; default: goto out; } diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 9ff4193..d76d05d 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -238,7 +238,8 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) irq = irqfd-irq_entry; } while (read_seqcount_retry(irqfd-irq_entry_sc, seq)); /* An event has been signaled, inject an interrupt */ - if (irq.type == KVM_IRQ_ROUTING_MSI) + if (irq.type == KVM_IRQ_ROUTING_MSI || + irq.type == KVM_IRQ_ROUTING_EXTENDED_MSI) kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false); else @@ -294,7 +295,8 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd) e = entries; for (i = 0; i n_entries; ++i, ++e) { /* Only fast-path MSI. */ - if (e-type == KVM_IRQ_ROUTING_MSI) + if (e-type == KVM_IRQ_ROUTING_MSI || + e-type == KVM_IRQ_ROUTING_EXTENDED_MSI) irqfd-irq_entry = *e; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at
[PATCH v2 3/7] KVM: irqchip: convey devid to kvm_set_msi
on ARM, a devid field is populated in kvm_msi struct in case the flag is set to KVM_MSI_VALID_DEVID. Let's populate the corresponding kvm_kernel_irq_routing_entry devid field and set the msi type to KVM_IRQ_ROUTING_EXTENDED_MSI. Signed-off-by: Eric Auger eric.au...@linaro.org --- virt/kvm/irqchip.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c index 21c1424..e678f8a 100644 --- a/virt/kvm/irqchip.c +++ b/virt/kvm/irqchip.c @@ -72,9 +72,17 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi) { struct kvm_kernel_irq_routing_entry route; - if (!irqchip_in_kernel(kvm) || msi-flags != 0) + if (!irqchip_in_kernel(kvm)) return -EINVAL; + if (msi-flags KVM_MSI_VALID_DEVID) { + route.devid = msi-devid; + route.type = KVM_IRQ_ROUTING_EXTENDED_MSI; + } else if (!msi-flags) + return -EINVAL; + + /* historically the route.type was not set */ + route.msi.address_lo = msi-address_lo; route.msi.address_hi = msi-address_hi; route.msi.data = msi-data; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/7] KVM: arm/arm64: enable irqchip routing
This patch adds compilation and link against irqchip. On ARM, irqchip routing is not really useful since there is a single irqchip. However main motivation behind using irqchip code is to enable MSI routing code. With the support of in-kernel GICv3 ITS emulation, it now seems to be a MUST HAVE requirement. Functions previously implemented in vgic.c and substitute to more complex irqchip implementation are removed: - kvm_send_userspace_msi - kvm_irq_map_chip_pin - kvm_set_irq - kvm_irq_map_gsi. They implemented a kernel default identity GSI routing. This is now replaced by user-side provided routing. Routing standard hooks are now implemented in vgic: - kvm_set_routing_entry - kvm_set_irq - kvm_set_msi Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined. KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed. MSI routing is not yet allowed. Signed-off-by: Eric Auger eric.au...@linaro.org --- v1 - v2: - fix bug reported by Andre related to msi.flags and msi.devid setting in kvm_send_userspace_msi - avoid injecting reserved IRQ numbers in vgic_irqfd_set_irq RFC - PATCH - reword api.txt: x move MSI routing comments in a subsequent patch, x clearly state GSI routing does not apply to KVM_IRQ_LINE --- Documentation/virtual/kvm/api.txt | 12 --- arch/arm/include/asm/kvm_host.h | 2 ++ arch/arm/kvm/Kconfig | 2 ++ arch/arm/kvm/Makefile | 2 +- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/Kconfig| 2 ++ arch/arm64/kvm/Makefile | 2 +- include/kvm/arm_vgic.h| 9 - virt/kvm/arm/vgic.c | 71 +-- virt/kvm/irqchip.c| 2 ++ 10 files changed, 65 insertions(+), 40 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 3d920cd..9276dac 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1421,13 +1421,16 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed. 4.52 KVM_SET_GSI_ROUTING Capability: KVM_CAP_IRQ_ROUTING -Architectures: x86 s390 +Architectures: x86 s390 arm arm64 Type: vm ioctl Parameters: struct kvm_irq_routing (in) Returns: 0 on success, -1 on error Sets the GSI routing table entries, overwriting any previously set entries. +On arm/arm64, GSI routing has the following limitation: +- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. + struct kvm_irq_routing { __u32 nr; __u32 flags; @@ -2335,9 +2338,10 @@ Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. -On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared -Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is -given by gsi + 32. +On arm/arm64, gsi routing being supported, the following can happen: +- in case no routing entry is associated to this gsi, injection fails +- in case the gsi is associated to an irqchip routing entry, + irqchip.pin + 32 corresponds to the injected SPI ID. 4.76 KVM_PPC_ALLOCATE_HTAB diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index e896d2c..9871441 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -42,6 +42,8 @@ #define KVM_VCPU_MAX_FEATURES 2 +#define KVM_IRQCHIP_NUM_PINS 988 /* 1020 -32 is the number of SPI */ + #include kvm/arm_vgic.h u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..151e710 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,8 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQ_ROUTING depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index c5eef02c..1a8f48a 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o $(KVM)/irqchip.o obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 2709db2..f6201eb 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -44,6 +44,7 @@ #include kvm/arm_arch_timer.h #define KVM_VCPU_MAX_FEATURES 3 +#define KVM_IRQCHIP_NUM_PINS 988 /*
[PATCH v2 1/7] KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI
On ARM, the MSI msg (address and data) comes along with out-of-band device ID information. The device ID encodes the device that writes the MSI msg. Let's convey the device id in kvm_irq_routing_msi and use a new routing entry type to indicate the devid is populated. Signed-off-by: Eric Auger eric.au...@linaro.org --- v1 - v2: - devid id passed in kvm_irq_routing_msi instead of in kvm_irq_routing_entry RFC - PATCH - remove kvm_irq_routing_extended_msi and use union instead --- Documentation/virtual/kvm/api.txt | 10 +- include/uapi/linux/kvm.h | 6 +- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bea8de7..3d920cd 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1453,6 +1453,7 @@ struct kvm_irq_routing_entry { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4 No flags are specified so far, the corresponding field must be set to zero. @@ -1465,9 +1466,16 @@ struct kvm_irq_routing_msi { __u32 address_lo; __u32 address_hi; __u32 data; - __u32 pad; + union { + __u32 pad; + __u32 devid; + }; }; +for KVM_IRQ_ROUTING_EXTENDED_MSI routing entry type, the kvm_irq_routing_msi +routing entry is used and devid is populated with the device ID that wrote +the MSI message. For PCI, this is usually a BFD identifier in the lower 16 bits. + struct kvm_irq_routing_s390_adapter { __u64 ind_addr; __u64 summary_addr; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 9a261e5..39ec164 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -829,7 +829,10 @@ struct kvm_irq_routing_msi { __u32 address_lo; __u32 address_hi; __u32 data; - __u32 pad; + union { + __u32 pad; + __u32 devid; + }; }; struct kvm_irq_routing_s390_adapter { @@ -844,6 +847,7 @@ struct kvm_irq_routing_s390_adapter { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4 struct kvm_irq_routing_entry { __u32 gsi; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Detect vGIC presence at runtime
On Tue, Jul 07, 2015 at 02:11:01PM +0300, Pavel Fedin wrote: Allows to use KVM on hardware without vGIC. Interrupt controller has to be emulated in userspace in this case. Signed-off-by: Pavel Fedin p.fe...@samsung.com --- arch/arm/kvm/arm.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index d9631ec..f2f7a13 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -61,6 +61,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); static u8 kvm_next_vmid; static DEFINE_SPINLOCK(kvm_vmid_lock); +static bool vgic_present; + static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu) { BUG_ON(preemptible()); @@ -172,6 +174,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) switch (ext) { case KVM_CAP_IRQCHIP: case KVM_CAP_IRQFD: + r = vgic_present; + break; case KVM_CAP_IOEVENTFD: case KVM_CAP_DEVICE_CTRL: case KVM_CAP_USER_MEMORY: @@ -850,6 +854,8 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm, switch (dev_id) { case KVM_ARM_DEVICE_VGIC_V2: + if (!vgic_present) + return -ENXIO; return kvm_vgic_addr(kvm, type, dev_addr-addr, true); default: return -ENODEV; @@ -864,6 +870,8 @@ long kvm_arch_vm_ioctl(struct file *filp, switch (ioctl) { case KVM_CREATE_IRQCHIP: { + if (!vgic_present) + return -ENXIO; return kvm_vgic_create(kvm, KVM_DEV_TYPE_ARM_VGIC_V2); } case KVM_ARM_SET_DEVICE_ADDR: { @@ -1046,8 +1054,17 @@ static int init_hyp_mode(void) * Init HYP view of VGIC */ err = kvm_vgic_hyp_init(); - if (err) + switch (err) { + case 0: + vgic_present = true; + break; + case -ENODEV: + case -ENXIO: + vgic_present = false; why not report ENXIO as an error? If probing the vgic fails due to being unable to request the irq or something similar, then surely your system has and error and this should be reported. This may be more nicely implemented by letting the vgic init/probe functions set the vgic_present, or maybe better yet, just export a function from vgic.c: bool kvm_vgic_present(void) { return vgic_ops != NULL; } + break; + default: goto out_free_context; + } /* * Init HYP architected timer support -- 2.4.4 -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for 2.4 0/2] MIPS build fixes for v2.4
On 09/07/2015 10:17, James Hogan wrote: These two patches fix build errors for the MIPS TCG backend and MIPS KVM. Please could they be applied for v2.4. James Hogan (2): tcg/mips: Fix build error from merged memop+mmu_idx parameter mips/kvm: Sync with newer MIPS KVM headers target-mips/kvm.c | 15 ++- tcg/mips/tcg-target.c | 4 ++-- 2 files changed, 4 insertions(+), 15 deletions(-) Reviewed-by: Leon Alrae leon.al...@imgtec.com Peter, since these are build fixes, could they be squeezed into rc0? Thanks, Leon -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, 9 Jul 2015 09:49:05 +0200 Thomas Huth th...@redhat.com wrote: The option for supporting cross-endianness legacy guests in s/cross-endianness/cross-endian/ ? the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) Acked-by: Cornelia Huck cornelia.h...@de.ibm.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, 9 Jul 2015 09:49:05 +0200 Thomas Huth th...@redhat.com wrote: The option for supporting cross-endianness legacy guests in s/cross-endianness/cross-endian/ ? the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) Acked-by: Cornelia Huck cornelia.h...@de.ibm.com -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer
On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote: Hosting the bypass manager in kernel/irq seemed appropriate, but really it could be anywhere. Does anyone have a different preference or specifically want it under their scope? We had originally thought of this as an IOMMU service, but I think we've generalized it beyond that. I expect we should also add the necessary hooks to turn it into a loadable module to keep the tinification folks happy, I'll incorporate the current working changes and post a version with that. Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is probably the wrong place to host it. Will there be any other producers than VFIO or any other consumers than KVM? If not, it should live in one of these spaces. KVM is probably the best choice, as any hardware feature that uses this targets virtualization, so there will hardly ever be another consumer than KVM. Joerg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 9 July 2015 at 13:05, Christoffer Dall christoffer.d...@linaro.org wrote: As I understand it, the problem is that if we ever run a VCPU after reading the value, and write back the value afterwards, you potentially make time go backwards and get inconsistent views of time from different VCPUs because they may have read the time before/after updating the CNTVOFF. Right, but I think if QEMU does that it's a bug (and more to the point I don't entirely understand why we would do that yet, even given that we don't have a distinction between registers to sync always and registers to sync only on reset...) -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On Thu, Jul 09, 2015 at 01:07:24PM +0100, Peter Maydell wrote: On 9 July 2015 at 13:05, Christoffer Dall christoffer.d...@linaro.org wrote: As I understand it, the problem is that if we ever run a VCPU after reading the value, and write back the value afterwards, you potentially make time go backwards and get inconsistent views of time from different VCPUs because they may have read the time before/after updating the CNTVOFF. Right, but I think if QEMU does that it's a bug (and more to the point I don't entirely understand why we would do that yet, even given that we don't have a distinction between registers to sync always and registers to sync only on reset...) I think we have evidence that it does that, but we don't know why/how. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 08/11] KVM: arm: implement dirty bit mechanism for debug registers
On Tue, Jul 07, 2015 at 11:24:06AM +0100, Will Deacon wrote: On Tue, Jul 07, 2015 at 11:06:57AM +0100, Zhichao Huang wrote: Chazy and me are talking about how to reduce the saving/restoring overhead for debug registers. We want to add a state in hw_breakpoint.c to indicate whether the host enable any hwbrpts or not (might export a fuction that kvm can call), then we can read this state from memory instead of reading from real hardware registers, and to decide whether we need a world switch or not. Does it acceptable? Maybe, hard to tell without the code. There are obvious races to deal with if you use variables to indicate whether resources are in use -- why not just trap debug access from the host as well? Then you could keep track of the owner in kvm and trap accesses from everybody else. The only information we're looking for here is whether the host has enabled some break/watch point so that we need to disable them before running the guest. Just to re-iterate, when we are about to run a guest, we have the following cases: 1) Neither the host nor the guest has configured any [WB]points 2) Only the host has configured any [WB]points 3) Only the guest has configured any [WB]points 4) Both the host and the guest have configured any [WB]points In case (1), KVM should enable trapping and swtich the register state on guest accesses. In cases (2), (3), and (4) we must switch the register state on each entry/exit. If we are to trap debug register accesses in KVM to set a flag to keep track of the owner (iow. has the host touched the registers) then don't we impose an ordering requirement of whether KVM or the breakpoint functionality gets initialized first, and we need to take special care when tearing down KVM to disable the traps? It sounds a little complex. I've previously suggested to simply look at the B/W control registers to figure out what to do. Caching the state in memory is an optimization, do we even have any idea how important such an optimization is? Thanks, -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On Thu, Jul 09, 2015 at 11:38:40AM +0100, Peter Maydell wrote: On 9 July 2015 at 11:22, Christoffer Dall christoffer.d...@linaro.org wrote: On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote: I suspect Jan is right and we really need to distinguish the KVM_PUT_*_STATE levels in ARM QEMU. This probably implies some kind of whitelist/override mechanism, since by and large we neither know nor want to know the semantics for system registers, we leave that up to the kernel. Q: if you have a running VM, and you pause it for an hour, what should the CNTVCT register do? Presumably it should not advance, but how do we arrange for that to happen? I think the CNTVCT should not advance when the VM is not scheduled, so if we pause the VM or starve all the VCPUs for enough time, the guest should not see time progressing, since otherwise the guest scheduler cannot maintain fairness and you're bound to see spurious RCU stalls etc. That's exactly why a guest can read both a virtual and physical counter and it is an area where you simply want some level of paravirtualization. I haven't studied how/if Linux deals with this at all. So I think adjusting CNTVOFF should be managed by the kernel for the pause/starvation scenario (which I think Avi once told me x86 does too - does anyone know the current state of the art?). Agreed that we should be aiming for same behaviour as x86 rather than reinventing the wheel... So the only situation where I think userspace should adjust the CNTVOFF value is for migration where we are talking about a brand new VM with has_run_once clear. For incoming migration/loadvm the VM will be freshly *reset*, but it will not be completely created from scratch. It's possible for the user to run a VM for a bit, and then use loadvm from the monitor to load an old snapshot. This will do a qemu_system_reset() and then restore the state. So in that case has_run_once won't be clear. (Would it be OK to clear it when userspace does VCPU_INIT? What else does it control?) We would have to introduce another flag. It is currently used to only map the VGIC resources once (we have to do this before running the VCPU the first time because only then do we have the required info of where to map the resources etc.), and we use it to enforce that the user doesn't initialize a VCPU with one setting, and then initialize it with a different setting afterwards (for example with power-on state or the emulated CPU target). Thus, if we were designing this from scratch now, the API should be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the VM has run once, but it's too late for that as we would break userspace. The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF in the kernel side as well, and finally also fix QEMU so that it doesn't try to do the thing that future kernels will ignore. Isn't Marc's patch causing us to return an error to userspace? (It does a 'return -1', at least -- does that get ignored at a higher level?) I remembered it as it simply ignored it, but you're right, it returns -1 to userspace (which is a strange thing we do in arch_timer.c, I'm not sure why we return -EPERM here - probably a bug). So that can't work obviously, because it will regress userspace. In principle, if userspace does: * stop all vCPUs * read the counter values from each vCPU * write the counter values back to each vCPU * resume vCPUs this shouldn't cause time to do funny things, should it? No, it shouldn't. (ie the problem only occurs if we try to write the vCPU counter value while only one vCPU is stopped). As I understand it, the problem is that if we ever run a VCPU after reading the value, and write back the value afterwards, you potentially make time go backwards and get inconsistent views of time from different VCPUs because they may have read the time before/after updating the CNTVOFF. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
Hi Jan, On Thu, Jul 09, 2015 at 12:40:39PM +0200, Jan Kiszka wrote: On 2015-07-09 12:22, Christoffer Dall wrote: Hi Peter and Marc, [cc'ing Paolo for his input on x86 timekeeping] On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote: On 8 July 2015 at 17:37, Marc Zyngier marc.zyng...@arm.com wrote: On 08/07/15 17:06, Peter Maydell wrote: I'd prefer it if somebody could investigate to see why QEMU is actually doing this -- so far we just have speculation. I'd prefer that too, but so far people seem to be more comfortable waiting for the issue to fix itself. In the meantime, VMs are broken in weird and wonderful ways, and I don't think the current status-quo helps anyone. Putting in a patch which might not be the right fix isn't necessarily a good plan either... Does has_run_once get cleared if we do a re-VCPU_INIT of a CPU that's run before? (We need to allow rewriting of guest state at that point so that reset VM and load migration state behaves correctly.) no, it does not, has_run_once is set the first time a VCPU is run and is currently *never* cleared. I suspect Jan is right and we really need to distinguish the KVM_PUT_*_STATE levels in ARM QEMU. This probably implies some kind of whitelist/override mechanism, since by and large we neither know nor want to know the semantics for system registers, we leave that up to the kernel. Q: if you have a running VM, and you pause it for an hour, what should the CNTVCT register do? Presumably it should not advance, but how do we arrange for that to happen? I think the CNTVCT should not advance when the VM is not scheduled, so if we pause the VM or starve all the VCPUs for enough time, the guest should not see time progressing, since otherwise the guest scheduler cannot maintain fairness and you're bound to see spurious RCU stalls etc. That's exactly why a guest can read both a virtual and physical counter and it is an area where you simply want some level of paravirtualization. I haven't studied how/if Linux deals with this at all. So I think adjusting CNTVOFF should be managed by the kernel for the pause/starvation scenario (which I think Avi once told me x86 does too - does anyone know the current state of the art?). So the only situation where I think userspace should adjust the CNTVOFF value is for migration where we are talking about a brand new VM with has_run_once clear. Thus, if we were designing this from scratch now, the API should be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the VM has run once, but it's too late for that as we would break userspace. The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF in the kernel side as well, and finally also fix QEMU so that it doesn't try to do the thing that future kernels will ignore. Fixing QEMU to only write on KVM_PUT_FULL_STATE - yes, that should be done, but I don't think the approach for the kernel is generally right. The kernel should not do any policing on user space requests to change the VCPU or VM state unless - security is affected - userspace lacks information, thus cannot decide correctly - legacy userspace has a bug, we can detect it and want to fix that up without affecting future userspace that has a reason to do it differently Regarding CNTVOFF, the first two criteria do not apply for sure. Maybe the last one, don't know. Just think of the hypothetical scenario that a userspace VM debugger wants to inject certain register manipulations. If you block this by some hidden VM state like proposed, that feature would no longer be implementable easily. Hmm, I didn't think about this, I guess reverse execution and record/replay scenarios would also want you to make time appear to be going backwards? But the third case kind of does apply, if we care about fixing legacy userspace in the kernel. We could define a new register index for the ioctl that allows time to move forward that future userspace can use, and impose the restriction of not making time go backwards with the current index. Would that break any currently working use cases on arm64 with legacy userspace, such as migration? Thanks, -Christoffer signature.asc Description: Digital signature
Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate
Hi, +static inline bool is_kernel_in_hyp_mode(void) +{ + u64 el; + + asm(mrs %0, CurrentEL : =r (el)); + return el == CurrentEL_EL2; +} If you can include cputype.h, I think this can be: static inline bool is_kernel_in_hyp_mode(void) { return read_cpuid(CurrentEL) == CurrentEL_EL2; } Mark. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? Do you forbid the use of vhost in this case ? Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n + depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost + depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/11] KVM Guest Debug support for arm64
On Tue, Jul 07, 2015 at 05:29:52PM +0100, Alex Bennée wrote: Here is V8 of the KVM Guest Debug support for arm64. The diffstat between v7 and v8 is getting pretty small and as I haven't re-based you can run: git diff -u guest-debug/4.1-v7..guest-debug/4.1-v8 And the kernelci report is at: http://kernelci.org/build/alex/kernel/v4.1-11-g182a5fa64600/ The only real changes apart from comments and white space are to sys_regs which sees another minor re-factoring. The 32 bit handling explicitly preserves the top 32 bits of the AArch64 registers although I'm not convinced it matter too much as for a booting AArch32 guest kernel they should start as 0 and never change. For full details see the changelog on each of the patches. GIT Repos: The patches for this series are based off v4.1 and can be found at: Kernel: https://git.linaro.org/people/alex.bennee/linux.git branch: guest-debug/4.1-v8 describe: v4.1-11-g182a5faB QEMU: https://github.com/stsquad/qemu branch: kvm/guest-debug-v6*** BLURB HERE *** I'm happy with this series as it stands now and think it can be put into -next. I have boot-tested the series with a VM on both Juno and TC2. Thanks, -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/13] arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
On 09/07/15 10:48, Mark Rutland wrote: On Wed, Jul 08, 2015 at 05:19:06PM +0100, Marc Zyngier wrote: Add a new ARM64_HAS_VIRT_HOST_EXTN features to indicate that the CPU has the ARMv8,1 VHE capability. Nit: s/,/./ It's probably worth mentioning somewhere that we have to check CurrentEL rather than a feature register in case some prior software dropped us to EL1N (e.g. if we're a guest under this scheme). Good point, this is a leftover from a previous version that actually checked the feature register. I'll clean up the commit log. Thanks, M. Mark. This will be used to trigger kernel patching in KVM. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm64/include/asm/cpufeature.h | 3 ++- arch/arm64/kernel/cpufeature.c | 11 +++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index c104421..6c3742d 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -25,8 +25,9 @@ #define ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE1 #define ARM64_WORKAROUND_845719 2 #define ARM64_HAS_SYSREG_GIC_CPUIF 3 +#define ARM64_HAS_VIRT_HOST_EXTN4 -#define ARM64_NCAPS 4 +#define ARM64_NCAPS 5 #ifndef __ASSEMBLY__ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 5ad86ce..e1dcd63 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -21,6 +21,7 @@ #include linux/types.h #include asm/cpu.h #include asm/cpufeature.h +#include asm/virt.h static bool has_id_aa64pfr0_feature(const struct arm64_cpu_capabilities *entry) @@ -31,6 +32,11 @@ has_id_aa64pfr0_feature(const struct arm64_cpu_capabilities *entry) return (val entry-register_mask) == entry-register_value; } +static bool runs_at_el2(const struct arm64_cpu_capabilities *entry) +{ +return is_kernel_in_hyp_mode(); +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = GIC system register CPU interface, @@ -39,6 +45,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .register_mask = (0xf 24), .register_value = (1 24), }, +{ +.desc = Virtualization Host Extensions, +.capability = ARM64_HAS_VIRT_HOST_EXTN, +.matches = runs_at_el2, +}, {}, }; -- 2.1.4 ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, Jul 09, 2015 at 09:49:05AM +0200, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com Acked-by: Christoffer Dall christoffer.d...@linaro.org -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, Jul 09, 2015 at 09:49:05AM +0200, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com Acked-by: Christoffer Dall christoffer.d...@linaro.org -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width
On Thu, 09 Jul 2015 09:02:38 +0200 Laszlo Ersek ler...@redhat.com wrote: On 07/09/15 00:42, Bandan Das wrote: If a Linux guest is assigned more memory than is supported by the host processor, the guest is unable to boot. That is expected, however, there's no message indicating the user what went wrong. This change prints a message to stderr if KVM has the corresponding capability. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- linux-headers/linux/kvm.h | 1 + target-i386/kvm.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 3bac873..6afad49 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 066d03d..66e3448 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) uint64_t shadow_mem; int ret; struct utsname utsname; +int max_phys_bits; ret = kvm_get_supported_msrs(s); if (ret 0) { @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH); +if (max_phys_bits (1ULL max_phys_bits) = ram_size) +fprintf(stderr, Warning: The amount of memory assigned to the guest +is more than that supported by the host CPU(s). Guest may be unstable.\n); + if (kvm_check_extension(s, KVM_CAP_X86_SMM)) { smram_machine_done.notify = register_smram_listener; qemu_add_machine_init_done_notifier(smram_machine_done); First, see my comments on the KVM patch. Second, ram_size is not the right thing to compare. What should be checked is whether the highest guest-physical address that maps to RAM can be represented in the address width of the host processor (and only if EPT is enabled, but that sub-condition belongs to the KVM patch). Note that this is not the same as the check written in the patch. For example, if you assume a 32-bit PCI hole with size 1 GB, then a total guest RAM of size 63 GB will result in the highest guest-phys memory address being 0xF__, which just fits into 36 bits. Correspondingly, the above code would not print the warning for -m $((63 * 1024 + 1)) on my laptop (which has address sizes : 36 bits physical, ...), even though such a guest would not boot for me (with EPT enabled). Please see http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447 So, ram_size in the controlling expression should be replaced with maximum_guest_ram_address (which should be inclusive, and the = relop should be preserved). also with memory hotplug tuned on we should check if the end of hotplug memory area is less then limit, i.e.: pcms-hotplug_memory.base + hotplug_mem_size 1ULL max_phys_bits Thanks Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/13] arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
On Wed, Jul 08, 2015 at 05:19:06PM +0100, Marc Zyngier wrote: Add a new ARM64_HAS_VIRT_HOST_EXTN features to indicate that the CPU has the ARMv8,1 VHE capability. Nit: s/,/./ It's probably worth mentioning somewhere that we have to check CurrentEL rather than a feature register in case some prior software dropped us to EL1N (e.g. if we're a guest under this scheme). Mark. This will be used to trigger kernel patching in KVM. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm64/include/asm/cpufeature.h | 3 ++- arch/arm64/kernel/cpufeature.c | 11 +++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index c104421..6c3742d 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -25,8 +25,9 @@ #define ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE 1 #define ARM64_WORKAROUND_845719 2 #define ARM64_HAS_SYSREG_GIC_CPUIF 3 +#define ARM64_HAS_VIRT_HOST_EXTN 4 -#define ARM64_NCAPS 4 +#define ARM64_NCAPS 5 #ifndef __ASSEMBLY__ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 5ad86ce..e1dcd63 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -21,6 +21,7 @@ #include linux/types.h #include asm/cpu.h #include asm/cpufeature.h +#include asm/virt.h static bool has_id_aa64pfr0_feature(const struct arm64_cpu_capabilities *entry) @@ -31,6 +32,11 @@ has_id_aa64pfr0_feature(const struct arm64_cpu_capabilities *entry) return (val entry-register_mask) == entry-register_value; } +static bool runs_at_el2(const struct arm64_cpu_capabilities *entry) +{ + return is_kernel_in_hyp_mode(); +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = GIC system register CPU interface, @@ -39,6 +45,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .register_mask = (0xf 24), .register_value = (1 24), }, + { + .desc = Virtualization Host Extensions, + .capability = ARM64_HAS_VIRT_HOST_EXTN, + .matches = runs_at_el2, + }, {}, }; -- 2.1.4 ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? Do you forbid the use of vhost in this case ? Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n + depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost + depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width
On 07/09/15 11:27, Igor Mammedov wrote: On Thu, 09 Jul 2015 09:02:38 +0200 Laszlo Ersek ler...@redhat.com wrote: On 07/09/15 00:42, Bandan Das wrote: If a Linux guest is assigned more memory than is supported by the host processor, the guest is unable to boot. That is expected, however, there's no message indicating the user what went wrong. This change prints a message to stderr if KVM has the corresponding capability. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- linux-headers/linux/kvm.h | 1 + target-i386/kvm.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 3bac873..6afad49 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 066d03d..66e3448 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) uint64_t shadow_mem; int ret; struct utsname utsname; +int max_phys_bits; ret = kvm_get_supported_msrs(s); if (ret 0) { @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH); +if (max_phys_bits (1ULL max_phys_bits) = ram_size) +fprintf(stderr, Warning: The amount of memory assigned to the guest +is more than that supported by the host CPU(s). Guest may be unstable.\n); + if (kvm_check_extension(s, KVM_CAP_X86_SMM)) { smram_machine_done.notify = register_smram_listener; qemu_add_machine_init_done_notifier(smram_machine_done); First, see my comments on the KVM patch. Second, ram_size is not the right thing to compare. What should be checked is whether the highest guest-physical address that maps to RAM can be represented in the address width of the host processor (and only if EPT is enabled, but that sub-condition belongs to the KVM patch). Note that this is not the same as the check written in the patch. For example, if you assume a 32-bit PCI hole with size 1 GB, then a total guest RAM of size 63 GB will result in the highest guest-phys memory address being 0xF__, which just fits into 36 bits. Correspondingly, the above code would not print the warning for -m $((63 * 1024 + 1)) on my laptop (which has address sizes : 36 bits physical, ...), even though such a guest would not boot for me (with EPT enabled). Please see http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447 So, ram_size in the controlling expression should be replaced with maximum_guest_ram_address (which should be inclusive, and the = relop should be preserved). also with memory hotplug tuned on we should check if the end of hotplug memory area is less then limit, i.e.: pcms-hotplug_memory.base + hotplug_mem_size 1ULL max_phys_bits Seems reasonable, thanks for the hint! (The LHS in this instance is exclusive though, so equality should *not* trigger the warning. maximum_guest_ram_address is inclusive, and equality should trigger the warning. (Although equality seems quite impossible in practice.)) Thanks! Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate
On 09/07/15 10:42, Mark Rutland wrote: Hi, +static inline bool is_kernel_in_hyp_mode(void) +{ +u64 el; + +asm(mrs %0, CurrentEL : =r (el)); +return el == CurrentEL_EL2; +} If you can include cputype.h, I think this can be: static inline bool is_kernel_in_hyp_mode(void) { return read_cpuid(CurrentEL) == CurrentEL_EL2; } This would indeed work, but CurrentEL is hardly an ID register. I feel slightly uncomfortable using read_cpuid (which might return a cached version at some point) for random system registers. Thoughts? M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate
On Thu, Jul 09, 2015 at 11:05:34AM +0100, Marc Zyngier wrote: On 09/07/15 10:42, Mark Rutland wrote: Hi, +static inline bool is_kernel_in_hyp_mode(void) +{ + u64 el; + + asm(mrs %0, CurrentEL : =r (el)); + return el == CurrentEL_EL2; +} If you can include cputype.h, I think this can be: static inline bool is_kernel_in_hyp_mode(void) { return read_cpuid(CurrentEL) == CurrentEL_EL2; } This would indeed work, but CurrentEL is hardly an ID register. I feel slightly uncomfortable using read_cpuid (which might return a cached version at some point) for random system registers. Thoughts? I have no strong feelings either way, but I agree with the general uneasiness w.r.t. what read_cpuid can be expected to do. Elsewhere we just use inline asm to read non CPUID system registers, so let's leave your patch as it was. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
Hi Peter and Marc, [cc'ing Paolo for his input on x86 timekeeping] On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote: On 8 July 2015 at 17:37, Marc Zyngier marc.zyng...@arm.com wrote: On 08/07/15 17:06, Peter Maydell wrote: I'd prefer it if somebody could investigate to see why QEMU is actually doing this -- so far we just have speculation. I'd prefer that too, but so far people seem to be more comfortable waiting for the issue to fix itself. In the meantime, VMs are broken in weird and wonderful ways, and I don't think the current status-quo helps anyone. Putting in a patch which might not be the right fix isn't necessarily a good plan either... Does has_run_once get cleared if we do a re-VCPU_INIT of a CPU that's run before? (We need to allow rewriting of guest state at that point so that reset VM and load migration state behaves correctly.) no, it does not, has_run_once is set the first time a VCPU is run and is currently *never* cleared. I suspect Jan is right and we really need to distinguish the KVM_PUT_*_STATE levels in ARM QEMU. This probably implies some kind of whitelist/override mechanism, since by and large we neither know nor want to know the semantics for system registers, we leave that up to the kernel. Q: if you have a running VM, and you pause it for an hour, what should the CNTVCT register do? Presumably it should not advance, but how do we arrange for that to happen? I think the CNTVCT should not advance when the VM is not scheduled, so if we pause the VM or starve all the VCPUs for enough time, the guest should not see time progressing, since otherwise the guest scheduler cannot maintain fairness and you're bound to see spurious RCU stalls etc. That's exactly why a guest can read both a virtual and physical counter and it is an area where you simply want some level of paravirtualization. I haven't studied how/if Linux deals with this at all. So I think adjusting CNTVOFF should be managed by the kernel for the pause/starvation scenario (which I think Avi once told me x86 does too - does anyone know the current state of the art?). So the only situation where I think userspace should adjust the CNTVOFF value is for migration where we are talking about a brand new VM with has_run_once clear. Thus, if we were designing this from scratch now, the API should be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the VM has run once, but it's too late for that as we would break userspace. The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF in the kernel side as well, and finally also fix QEMU so that it doesn't try to do the thing that future kernels will ignore. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width
On Wed, 08 Jul 2015 18:42:01 -0400 Bandan Das b...@redhat.com wrote: If a Linux guest is assigned more memory than is supported by the host processor, the guest is unable to boot. That is expected, however, there's no message indicating the user what went wrong. This change prints a message to stderr if KVM has the corresponding capability. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- linux-headers/linux/kvm.h | 1 + target-i386/kvm.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 3bac873..6afad49 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_PHY_ADDR_WIDTH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 066d03d..66e3448 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) uint64_t shadow_mem; int ret; struct utsname utsname; +int max_phys_bits; ret = kvm_get_supported_msrs(s); if (ret 0) { @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH); max_phys_bits seems generic enough and could be applied to other targets as well. making it a property of machine, would make accessing/manipulating it easier. define default value for machine/TCG mode and when KVM is enabled it would override/set its own limit. then any board could easily access machine-max_gpa to make board specific checks. +if (max_phys_bits (1ULL max_phys_bits) = ram_size) +fprintf(stderr, Warning: The amount of memory assigned to the guest +is more than that supported by the host CPU(s). Guest may be unstable.\n); + if (kvm_check_extension(s, KVM_CAP_X86_SMM)) { smram_machine_done.notify = register_smram_listener; qemu_add_machine_init_done_notifier(smram_machine_done); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/2] Detect vGIC presence at runtime
Hello! why not report ENXIO as an error? If probing the vgic fails due to being unable to request the irq or something similar, then surely your system has and error and this should be reported. It is reported by probe function itself. -ENODEV here means there's no GIC at all. -ENXIO happens when, for example, there is GIC node in the device tree, but it does not specify vGIC resources. Normally this means that vGIC is defunct on the machine. This may be more nicely implemented by letting the vgic init/probe functions set the vgic_present, or maybe better yet, just export a function from vgic.c: bool kvm_vgic_present(void) { return vgic_ops != NULL; } Is it necessary? Actually this flag is not needed anywhere else except arch/arm/kvm/arm.c, only at init time. Runtime should, i believe, use irqchip_in_kernel(), because userland can choose just not to use vGIC for some reason (testing for example). Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: fix load xsave feature warning
On 09/07/2015 09:44, Wanpeng Li wrote: [ 68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]() [ 68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core [ 68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2 [ 68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015 [ 68.197007] a03b0657 8800d984bca8 815915a2 [ 68.197009] 8800d984bce8 81057c0a 7ff6d0001000 [ 68.197010] 0002 880211c1a000 0004 8800ce0288c0 [ 68.197012] Call Trace: [ 68.197017] [815915a2] dump_stack+0x45/0x57 [ 68.197020] [81057c0a] warn_slowpath_common+0x8a/0xc0 [ 68.197022] [81057cfa] warn_slowpath_null+0x1a/0x20 [ 68.197029] [a037bed8] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm] [ 68.197035] [a037aede] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm] [ 68.197040] [a03696a6] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm] [ 68.197043] [811252d2] ? perf_pmu_enable+0x22/0x30 [ 68.197044] [8112663e] ? perf_event_context_sched_in+0x7e/0xb0 [ 68.197048] [811a6882] do_vfs_ioctl+0x2c2/0x4a0 [ 68.197050] [8107bf33] ? finish_task_switch+0x173/0x220 [ 68.197053] [8123307f] ? selinux_file_ioctl+0x4f/0xd0 [ 68.197055] [8122cac3] ? security_file_ioctl+0x43/0x60 [ 68.197057] [811a6ad9] SyS_ioctl+0x79/0x90 [ 68.197060] [81597e57] entry_SYSCALL_64_fastpath+0x12/0x6a [ 68.197061] ---[ end trace 558a5ebf9445fc80 ]--- After commit (0c4109bec0 'x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions'), there is no assumption an xsave bit is present in the hardware (pcntxt_mask) that it is always present in a given xsave buffer. An enabled state to be present on 'pcntxt_mask', but *not* in 'xstate_bv' could happen when the last 'xsave' did not request that this feature be saved (unlikely) or because the init optimization caused it to not be saved. This patch kill the assumption. Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- Note: This patch against latest linus tree. arch/x86/kvm/x86.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..7125cefe 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3157,8 +3157,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src) cpuid_count(XSTATE_CPUID, index, size, offset, ecx, edx); memcpy(dest, src + offset, size); - } else - WARN_ON_ONCE(1); + } valid -= feature; } I'll apply this tomorrow, thanks. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 08/11] KVM: arm: implement dirty bit mechanism for debug registers
On July 9, 2015 1:08:55 AM GMT+08:00, Will Deacon will.dea...@arm.com wrote: On Wed, Jul 08, 2015 at 11:50:22AM +0100, Zhichao Huang wrote: Are you happy with this?: You miss the reserved breakpoint, I think. Sorry, I can't quite understand. What is the reserved breakpoint? When arch_[iu]ninstall_hw_breakpoint is called, it only set/clear the brps and wrps slots. I also still don't understand why this is preferable to trapping. Will -- Zhichao Huang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote: On 09/07/2015 11:48, Laurent Vivier wrote: On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost. Paolo vhost does not require irqfd anymore. I think ioeventfd actually works fine though I didn't try, it would be easy to support. Do you forbid the use of vhost in this case ? Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n + depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost + depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] irqchip: GICv2/v3: Add support for irq_vcpu_affinity
The GICv2 and GICv3 architectures allow an active physical interrupt to be forwarded to a guest, and the guest to indirectly perform the deactivation of the interrupt by performing an EOI on the virtual interrupt (see for example the GICv2 spec, 3.2.1). This allows some substantial performance improvement for level triggered interrupts that otherwise have to be masked/unmasked in VFIO, not to mention the required trap back to KVM when the guest performs an EOI. To enable this, the GICs need to be switched to a different EOImode, where a taken interrupt can be left active (which prevents the same interrupt from being taken again), while other interrupts are still being processed normally. We also use the new irq_set_vcpu_affinity hook that was introduced for Intel's Posted Interrupts to determine whether or not to perform the deactivation at EOI-time. As all of this only makes sense when the kernel can behave as a hypervisor, we only enable this mode on detecting that the kernel was actually booted in HYP mode, and that the GIC supports this feature. This series is a complete rework of a RFC I sent over a year ago: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266328.html Since then, a lot has been either merged (the irqchip_state) or reworked (my active-timer series: http://www.spinics.net/lists/kvm/msg118768.html), and this implements the last few bits for Eric Auger's series to finally make it into the kernel: https://lkml.org/lkml/2015/7/2/268 https://lkml.org/lkml/2015/7/6/291 With all these patches combined, physical interrupt routing from the kernel into a VM becomes possible. This has been tested on Juno (GICv2) and FastModel (GICv3). A branch is available at: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-irq-vcpu-affinity Marc Zyngier (6): irqchip: GICv3: Convert to EOImode == 1 irqchip: GIC: Convert to EOImode == 1 irqchip: GICv3: Skip LPI deactivation irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts irqchip: GICv3: Don't deactivate interrupts forwarded to a guest irqchip: GIC: Don't deactivate interrupts forwarded to a guest drivers/irqchip/irq-gic-v3.c | 57 +++-- drivers/irqchip/irq-gic.c | 64 +++--- include/linux/irqchip/arm-gic-v3.h | 9 ++ include/linux/irqchip/arm-gic.h| 4 +++ 4 files changed, 126 insertions(+), 8 deletions(-) -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Detect vGIC presence at runtime
On Thu, Jul 09, 2015 at 03:50:49PM +0300, Pavel Fedin wrote: Hello! why not report ENXIO as an error? If probing the vgic fails due to being unable to request the irq or something similar, then surely your system has and error and this should be reported. It is reported by probe function itself. -ENODEV here means there's no GIC at all. -ENXIO happens when, for example, there is GIC node in the device tree, but it does not specify vGIC resources. Normally this means that vGIC is defunct on the machine. I'd like to distinguish between the 'missing vgic' and 'something bad happened when trying to initialize the vgic' cases, which I don't think we do currently, because the ENXIO code is used in various situations. This may be more nicely implemented by letting the vgic init/probe functions set the vgic_present, or maybe better yet, just export a function from vgic.c: bool kvm_vgic_present(void) { return vgic_ops != NULL; } Is it necessary? Actually this flag is not needed anywhere else except arch/arm/kvm/arm.c, only at init time. Runtime should, i believe, use irqchip_in_kernel(), because userland can choose just not to use vGIC for some reason (testing for example). I feel the init flow is relatively difficult to follow and adding a bunch of flags here and there doesn't seem to help. By adding a function with a proper comment, it should be more clear, and I don't like the switch statement on the error return values. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Add host physical address width capability
On 09/07/2015 08:43, Laszlo Ersek wrote: On 07/09/15 08:09, Paolo Bonzini wrote: On 09/07/2015 00:36, Bandan Das wrote: Let userspace inquire the maximum physical address width of the host processors; this can be used to identify maximum memory that can be assigned to the guest. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Bandan Das b...@redhat.com --- arch/x86/kvm/x86.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbaf44e..97d6746 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_NR_MEMSLOTS: r = KVM_USER_MEM_SLOTS; break; + case KVM_CAP_PHY_ADDR_WIDTH: + r = boot_cpu_data.x86_phys_bits; + break; Userspace can just use CPUID, can't it? I believe KVM's cooperation is necessary, for the following reason: The truncation only occurs when the guest-phys - host-phys translation is done in hardware, *and* the phys bits of the host processor are insufficient to represent the highest guest-phys address that the guest will ever face. The first condition (of course) means that the truncation depends on EPT being enabled. (I didn't test on AMD so I don't know if RVI has the same issue.) If EPT is disabled, either because the host processor lacks it, or because the respective kvm_intel module parameter is set so, then the issue cannot be experienced. Therefore I believe a KVM patch is necessary. However, this specific patch doesn't seem sufficient; it should also consider whether EPT is enabled. (And the ioctl should be perhaps renamed to reflect that -- what QEMU needs to know is not the raw physical address width of the host processor, but whether that width will cause EPT to silently truncate high guest-phys addresses.) Right; if you want to consider whether EPT is enabled (which is the right thing to do, albeit it makes for a much bigger patch) a KVM patch is necessary. In that case you also need to patch the API documentation. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On 09/07/2015 11:48, Laurent Vivier wrote: On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost. Paolo Do you forbid the use of vhost in this case ? Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD +select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD +select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA +select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n +depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost +depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On 09/07/2015 11:48, Laurent Vivier wrote: On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost. Paolo Do you forbid the use of vhost in this case ? Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD +select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD +select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA +select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n +depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost +depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: Sanity check host processor physical address width
On 09/07/2015 10:26, Laszlo Ersek wrote: Perhaps KVM could simply hide memory above the limit (i.e. treat it as MMIO), and the BIOS could remove RAM above the limit from the e820 memory map? I'd prefer to leave the guest firmware*s* out of this... :) E820 is a legacy BIOS concept. In OVMF we'd have to hack the memory resource descriptor HOBs (which in turn control the DXE memory space map, which in turn controls the UEFI memory map). Those HOBs are currently based on what the CMOS reports about the RAM available under and above 4GB. It's pretty complex already (will get more complex with SMM support), and TBH, for working around such an obscure issue, I wouldn't like to complicate it even further... After all, this is a host platform limitation. The solution should be to either move to a more capable host, or do it in software (disable EPT). The reason I mentioned the firmware is because you could in principle have the same issue on real hardware - say putting 128 GB on your laptop. The firmware should cope with it. If OVMF does not use etc/e820, it can instead hack the values it reads from CMOS, bounding them according to the CPUID value. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote: On 09/07/2015 11:48, Laurent Vivier wrote: On 09/07/2015 09:49, Thomas Huth wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. I'm sure I misunderstand something, but what happens if we use QEMU with TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64 little endian host ? TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost. Paolo vhost does not require irqfd anymore. I think ioeventfd actually works fine though I didn't try, it would be easy to support. Do you forbid the use of vhost in this case ? Signed-off-by: Thomas Huth th...@redhat.com --- arch/arm/kvm/Kconfig | 1 + arch/arm64/kvm/Kconfig | 1 + arch/powerpc/kvm/Kconfig | 1 + drivers/net/Kconfig | 1 + drivers/vhost/Kconfig| 1 + virt/kvm/Kconfig | 3 +++ 6 files changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..9d8f363 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS depends on ARM_VIRT_EXT ARM_LPAE ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..9af39fe 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support hosting virtualized guest machines. diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 3caec2c..e028710 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA + select KVM_CROSS_ENDIAN_GUESTS ---help--- Support running unmodified book3s_64 guest kernels in virtual machines on POWER7 and PPC970 processors that have diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c18f9e6..0c4ce47 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -261,6 +261,7 @@ config TUN config TUN_VNET_CROSS_LE bool Support for cross-endian vnet headers on little-endian kernels default n + depends on KVM_CROSS_ENDIAN_GUESTS ---help--- This option allows TUN/TAP and MACVTAP device drivers in a little-endian kernel to parse vnet headers that come from a diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 533eaf0..4d8ae6b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -35,6 +35,7 @@ config VHOST config VHOST_CROSS_ENDIAN_LEGACY bool Cross-endian support for vhost + depends on KVM_CROSS_ENDIAN_GUESTS default n ---help--- This option allows vhost to support guests with a different byte diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index e2c876d..cc7b28a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT config KVM_COMPAT def_bool y depends on COMPAT !S390 + +config KVM_CROSS_ENDIAN_GUESTS + bool -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] irqchip: GICv3: Skip LPI deactivation
Contrary to other GICv3 interrupts, LPIs do not have an active state by virtue of being edge-triggered only (they only have a pending state). Given this, there is no point trying to deactivate them, and we can skip the ICC_DIR_EL1 entierely. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic-v3.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 49768fc..e02592b 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -295,10 +295,14 @@ static int gic_irq_get_irqchip_state(struct irq_data *d, static void gic_eoi_irq(struct irq_data *d) { - if (static_key_true(supports_deactivate)) + if (static_key_true(supports_deactivate)) { + /* No need to deactivate an LPI */ + if (gic_irq(d) = 8192) + return; gic_write_dir(gic_irq(d)); - else + } else { gic_write_eoir(gic_irq(d)); + } } static int gic_set_type(struct irq_data *d, unsigned int type) -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] irqchip: GIC: Convert to EOImode == 1
So far, GICv2 has been used in with EOImode == 0. The effect of this mode is to perform the priority drop and the deactivation of the interrupt at the same time. While this works perfectly for Linux (we only have a single priority), it causes issues when an interrupt is forwarded to a guest, and when we want the guest to perform the EOI itself. For this case, the GIC architecture provides EOImode == 1, where: - A write to the EOI register drops the priority of the interrupt and leaves it active. Other interrupts at the same priority level can now be taken, but the active interrupt cannot be taken again - A write to the DIR marks the interrupt as inactive, meaning it can now be taken again. We only enable this feature when booted in HYP mode and that the device-tree reporte a suitable CPU interface. Observable behaviour should remain unchanged. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic.c | 32 +--- include/linux/irqchip/arm-gic.h | 4 2 files changed, 33 insertions(+), 3 deletions(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 8d7e1c8..e264675 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -46,6 +46,7 @@ #include asm/irq.h #include asm/exception.h #include asm/smp_plat.h +#include asm/virt.h #include irq-gic-common.h #include irqchip.h @@ -82,6 +83,8 @@ static DEFINE_RAW_SPINLOCK(irq_controller_lock); #define NR_GIC_CPU_IF 8 static u8 gic_cpu_map[NR_GIC_CPU_IF] __read_mostly; +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE; + #ifndef MAX_GIC_NR #define MAX_GIC_NR 1 #endif @@ -164,7 +167,10 @@ static void gic_unmask_irq(struct irq_data *d) static void gic_eoi_irq(struct irq_data *d) { - writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI); + if (static_key_true(supports_deactivate)) + writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_DEACTIVATE); + else + writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI); } static int gic_irq_set_irqchip_state(struct irq_data *d, @@ -272,11 +278,15 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) irqnr = irqstat GICC_IAR_INT_ID_MASK; if (likely(irqnr 15 irqnr 1021)) { + if (static_key_true(supports_deactivate)) + writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI); handle_domain_irq(gic-domain, irqnr, regs); continue; } if (irqnr 16) { writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI); + if (static_key_true(supports_deactivate)) + writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE); #ifdef CONFIG_SMP handle_IPI(irqnr, regs); #endif @@ -358,7 +368,11 @@ static u8 gic_get_cpumask(struct gic_chip_data *gic) static void gic_cpu_if_up(void) { void __iomem *cpu_base = gic_data_cpu_base(gic_data[0]); - u32 bypass = 0; + u32 bypass; + u32 mode = 0; + + if (static_key_true(supports_deactivate)) + mode = GIC_CPU_CTRL_EOImodeNS; /* * Preserve bypass disable bits to be written back later @@ -366,7 +380,7 @@ static void gic_cpu_if_up(void) bypass = readl(cpu_base + GIC_CPU_CTRL); bypass = GICC_DIS_BYPASS_MASK; - writel_relaxed(bypass | GICC_ENABLE, cpu_base + GIC_CPU_CTRL); + writel_relaxed(bypass | mode | GICC_ENABLE, cpu_base + GIC_CPU_CTRL); } @@ -986,6 +1000,9 @@ void __init gic_init_bases(unsigned int gic_nr, int irq_start, register_cpu_notifier(gic_cpu_notifier); #endif set_handle_irq(gic_handle_irq); + pr_info (GIC: Using EOImode == %d\n, +static_key_true(supports_deactivate)); + } gic_dist_init(gic); @@ -1001,6 +1018,7 @@ gic_of_init(struct device_node *node, struct device_node *parent) { void __iomem *cpu_base; void __iomem *dist_base; + struct resource cpu_res; u32 percpu_offset; int irq; @@ -1013,6 +1031,11 @@ gic_of_init(struct device_node *node, struct device_node *parent) cpu_base = of_iomap(node, 1); WARN(!cpu_base, unable to map gic cpu registers\n); + of_address_to_resource(node, 1, cpu_res); + if (!gic_cnt + (!is_hyp_mode_available() || resource_size(cpu_res) SZ_8K)) + static_key_slow_dec(supports_deactivate); + if (of_property_read_u32(node, cpu-offset, percpu_offset)) percpu_offset = 0; @@ -1131,6 +1154,9 @@ gic_v2_acpi_init(struct acpi_table_header *table) return -ENOMEM; } + if (!is_hyp_mode_available()) + static_key_slow_dec(supports_deactivate); + /* *
[PATCH 6/6] irqchip: GIC: Don't deactivate interrupts forwarded to a guest
Commit 0a4377de3056 (genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU) added just what we needed at the lowest level to allow an interrupt to be deactivated by a guest. When such a request reaches the GIC, it knows it doesn't need to perform the deactivation anymore, and can safely leave the guest do its magic. This of course requires additional support in both VFIO and KVM. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 3c7f3a4..ca4074a 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -140,6 +140,11 @@ static inline unsigned int gic_irq(struct irq_data *d) return d-hwirq; } +static inline bool forwarded_irq(struct irq_data *d) +{ + return d-handler_data != NULL; +} + /* * Routines to acknowledge, disable and enable interrupts */ @@ -158,6 +163,12 @@ static int gic_peek_irq(struct irq_data *d, u32 offset) static void gic_mask_irq(struct irq_data *d) { gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR); + /* +* When masking a forwarded interrupt, make sure it is +* deactivated as well. +*/ + if (static_key_true(supports_deactivate) forwarded_irq(d)) + gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR); } static void gic_unmask_irq(struct irq_data *d) @@ -167,10 +178,14 @@ static void gic_unmask_irq(struct irq_data *d) static void gic_eoi_irq(struct irq_data *d) { - if (static_key_true(supports_deactivate)) + if (static_key_true(supports_deactivate)) { + /* Do not deactivate an IRQ forwarded to a vcpu. */ + if (forwarded_irq(d)) + return; writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_DEACTIVATE); - else + } else { writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI); + } } static int gic_irq_set_irqchip_state(struct irq_data *d, @@ -239,6 +254,18 @@ static int gic_set_type(struct irq_data *d, unsigned int type) return gic_configure_irq(gicirq, type, base, NULL); } +static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) +{ + /* Only interrupts on the primary GIC can be forwarded to a vcpu. */ + if (static_key_true(supports_deactivate) + irq_data_get_irq_chip_data(d) == gic_data[0]) { + d-handler_data = vcpu; + return 0; + } + + return -EINVAL; +} + #ifdef CONFIG_SMP static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, bool force) @@ -334,6 +361,7 @@ static struct irq_chip gic_chip = { #endif .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, + .irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity, .flags = IRQCHIP_SET_TYPE_MASKED, }; -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] irqchip: GICv3: Don't deactivate interrupts forwarded to a guest
Commit 0a4377de3056 (genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU) added just what we needed at the lowest level to allow an interrupt to be deactivated by a guest. When such a request reaches the GIC, it knows it doesn't need to perform the deactivation anymore, and can safely leave the guest do its magic. This of course requires additional support in both VFIO and KVM. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic-v3.c | 29 +++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index e02592b..a1ca9e6 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -70,6 +70,11 @@ static inline int gic_irq_in_rdist(struct irq_data *d) return gic_irq(d) 32; } +static inline bool forwarded_irq(struct irq_data *d) +{ + return d-handler_data != NULL; +} + static inline void __iomem *gic_dist_base(struct irq_data *d) { if (gic_irq_in_rdist(d))/* SGI+PPI - SGI_base for this CPU */ @@ -231,6 +236,12 @@ static void gic_poke_irq(struct irq_data *d, u32 offset) static void gic_mask_irq(struct irq_data *d) { gic_poke_irq(d, GICD_ICENABLER); + /* +* When masking a forwarded interrupt, make sure it is +* deactivated as well. +*/ + if (static_key_true(supports_deactivate) forwarded_irq(d)) + gic_poke_irq(d, GICD_ICACTIVER); } static void gic_unmask_irq(struct irq_data *d) @@ -296,8 +307,11 @@ static int gic_irq_get_irqchip_state(struct irq_data *d, static void gic_eoi_irq(struct irq_data *d) { if (static_key_true(supports_deactivate)) { - /* No need to deactivate an LPI */ - if (gic_irq(d) = 8192) + /* +* No need to deactivate an LPI, or an interrupt that +* is is getting forwarded to a vcpu. +*/ + if (gic_irq(d) = 8192 || forwarded_irq(d)) return; gic_write_dir(gic_irq(d)); } else { @@ -331,6 +345,16 @@ static int gic_set_type(struct irq_data *d, unsigned int type) return gic_configure_irq(irq, type, base, rwp_wait); } +static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) +{ + if (static_key_true(supports_deactivate)) { + d-handler_data = vcpu; + return 0; + } + + return -EINVAL; +} + static u64 gic_mpidr_to_affinity(u64 mpidr) { u64 aff; @@ -678,6 +702,7 @@ static struct irq_chip gic_chip = { .irq_set_affinity = gic_set_affinity, .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, + .irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity, .flags = IRQCHIP_SET_TYPE_MASKED, }; -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts
When used as a primary interrupt controller, the GIC driver uses irq_data-chip_data to extract controller-specific information. When used as a secondary interrupt controller, it uses handler_data instead. As this difference is relatively pointless and only creates confusion, change the secondary path to match what the rest of the driver does. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index e264675..3c7f3a4 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -298,7 +298,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) static void gic_handle_cascade_irq(unsigned int irq, struct irq_desc *desc) { - struct gic_chip_data *chip_data = irq_get_handler_data(irq); + struct gic_chip_data *chip_data = irq_get_chip_data(irq); struct irq_chip *chip = irq_get_chip(irq); unsigned int cascade_irq, gic_irq; unsigned long status; @@ -341,7 +341,7 @@ void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq) { if (gic_nr = MAX_GIC_NR) BUG(); - if (irq_set_handler_data(irq, gic_data[gic_nr]) != 0) + if (irq_set_chip_data(irq, gic_data[gic_nr]) != 0) BUG(); irq_set_chained_handler(irq, gic_handle_cascade_irq); } -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] irqchip: GICv3: Convert to EOImode == 1
So far, GICv3 has been used in with EOImode == 0. The effect of this mode is to perform the priority drop and the deactivation of the interrupt at the same time. While this works perfectly for Linux (we only have a single priority), it causes issues when an interrupt is forwarded to a guest, and when we want the guest to perform the EOI itself. For this case, the GIC architecture provides EOImode == 1, where: - A write to ICC_EOIR1_EL1 drops the priority of the interrupt and leaves it active. Other interrupts at the same priority level can now be taken, but the active interrupt cannot be taken again - A write to ICC_DIR_EL1 marks the interrupt as inactive, meaning it can now be taken again. This patch converts the driver to be able to use this new mode, depending on whether or not the kernel can behave as a hypervisor. No feature change. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic-v3.c | 28 +--- include/linux/irqchip/arm-gic-v3.h | 9 + 2 files changed, 34 insertions(+), 3 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index c52f7ba..49768fc 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -30,6 +30,7 @@ #include asm/cputype.h #include asm/exception.h #include asm/smp_plat.h +#include asm/virt.h #include irq-gic-common.h #include irqchip.h @@ -50,6 +51,7 @@ struct gic_chip_data { }; static struct gic_chip_data gic_data __read_mostly; +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE; #define gic_data_rdist() (this_cpu_ptr(gic_data.rdists.rdist)) #define gic_data_rdist_rd_base() (gic_data_rdist()-rd_base) @@ -293,7 +295,10 @@ static int gic_irq_get_irqchip_state(struct irq_data *d, static void gic_eoi_irq(struct irq_data *d) { - gic_write_eoir(gic_irq(d)); + if (static_key_true(supports_deactivate)) + gic_write_dir(gic_irq(d)); + else + gic_write_eoir(gic_irq(d)); } static int gic_set_type(struct irq_data *d, unsigned int type) @@ -343,6 +348,10 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs if (likely(irqnr 15 irqnr 1020) || irqnr = 8192) { int err; + + if (static_key_true(supports_deactivate)) + gic_write_eoir(irqnr); + err = handle_domain_irq(gic_data.domain, irqnr, regs); if (err) { WARN_ONCE(true, Unexpected interrupt received!\n); @@ -352,6 +361,8 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs } if (irqnr 16) { gic_write_eoir(irqnr); + if (static_key_true(supports_deactivate)) + gic_write_dir(irqnr); #ifdef CONFIG_SMP handle_IPI(irqnr, regs); #else @@ -451,8 +462,13 @@ static void gic_cpu_sys_reg_init(void) /* Set priority mask register */ gic_write_pmr(DEFAULT_PMR_VALUE); - /* EOI deactivates interrupt too (mode 0) */ - gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir); + if (static_key_true(supports_deactivate)) { + /* EOI drops priority only (mode 1) */ + gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop); + } else { + /* EOI deactivates interrupt too (mode 0) */ + gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir); + } /* ... and let's hit the road... */ gic_write_grpen1(1); @@ -820,6 +836,12 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare if (of_property_read_u64(node, redistributor-stride, redist_stride)) redist_stride = 0; + if (!is_hyp_mode_available()) + static_key_slow_dec(supports_deactivate); + + pr_info (GIC: Using EOImode == %d\n, +static_key_true(supports_deactivate)); + gic_data.dist_base = dist_base; gic_data.redist_regions = rdist_regs; gic_data.nr_redist_regions = nr_redist_regions; diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index ffbc034..bc98832 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -104,6 +104,8 @@ #define GICR_SYNCR 0x00C0 #define GICR_MOVLPIR 0x0100 #define GICR_MOVALLR 0x0110 +#define GICR_ISACTIVER GICD_ISACTIVER +#define GICR_ICACTIVER GICD_ICACTIVER #define GICR_IDREGSGICD_IDREGS #define GICR_PIDR2 GICD_PIDR2 @@ -288,6 +290,7 @@ #define ICH_VMCR_PMR_MASK (0xffUL ICH_VMCR_PMR_SHIFT) #define ICC_EOIR1_EL1 sys_reg(3, 0, 12, 12, 1) +#define ICC_DIR_EL1
RE: [PATCH 2/2] Detect vGIC presence at runtime
Hello! I'd like to distinguish between the 'missing vgic' and 'something bad happened when trying to initialize the vgic' cases, which I don't think we do currently, because the ENXIO code is used in various situations. It is done. Check, for example, vgic_v2_probe(). -ENXIO is returned when some of resources are either missing from DT specification or wrongly given (not page-aligned). In the rest of cases error code is taken from underlying functions, which are more supposed to return things like -EINVAL or -ENOMEM. If you are doubtful, i could suggest to replace -EINVAL with -ENODEV in cases where resources are not present. This would give more clear indication of we don't have vGIC condition. I feel the init flow is relatively difficult to follow and adding a bunch of flags here and there doesn't seem to help. By adding a function with a proper comment, it should be more clear, and I don't like the switch statement on the error return values. Well, the alternative is: 1. If GIC node or vGIC resources are not present, return 0 instead of error code. 2. Implement a function which you suggest. Then: 1. Situation with missing vGIC is just considered to be normal; it's not a to-be-ignored error any more. 2. vGIC present situation is automatically determined by vgic_ops != NULL; this means that probe function completely worked and vGIC implementation has been chosen. Agree? Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -Original Message- From: Christoffer Dall [mailto:christoffer.d...@linaro.org] Sent: Thursday, July 09, 2015 4:25 PM To: Pavel Fedin Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; 'Marc Zyngier' Subject: Re: [PATCH 2/2] Detect vGIC presence at runtime On Thu, Jul 09, 2015 at 03:50:49PM +0300, Pavel Fedin wrote: Hello! why not report ENXIO as an error? If probing the vgic fails due to being unable to request the irq or something similar, then surely your system has and error and this should be reported. It is reported by probe function itself. -ENODEV here means there's no GIC at all. -ENXIO happens when, for example, there is GIC node in the device tree, but it does not specify vGIC resources. Normally this means that vGIC is defunct on the machine. This may be more nicely implemented by letting the vgic init/probe functions set the vgic_present, or maybe better yet, just export a function from vgic.c: bool kvm_vgic_present(void) { return vgic_ops != NULL; } Is it necessary? Actually this flag is not needed anywhere else except arch/arm/kvm/arm.c, only at init time. Runtime should, i believe, use irqchip_in_kernel(), because userland can choose just not to use vGIC for some reason (testing for example). I feel the init flow is relatively difficult to follow and adding a bunch of flags here and there doesn't seem to help. By adding a function with a proper comment, it should be more clear, and I don't like the switch statement on the error return values. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: Sanity check host processor physical address width
Paolo Bonzini pbonz...@redhat.com writes: On 09/07/2015 10:26, Laszlo Ersek wrote: Perhaps KVM could simply hide memory above the limit (i.e. treat it as MMIO), and the BIOS could remove RAM above the limit from the e820 memory map? I'd prefer to leave the guest firmware*s* out of this... :) E820 is a legacy BIOS concept. In OVMF we'd have to hack the memory resource descriptor HOBs (which in turn control the DXE memory space map, which in turn controls the UEFI memory map). Those HOBs are currently based on what the CMOS reports about the RAM available under and above 4GB. It's pretty complex already (will get more complex with SMM support), and TBH, for working around such an obscure issue, I wouldn't like to complicate it even further... After all, this is a host platform limitation. The solution should be to either move to a more capable host, or do it in software (disable EPT). The reason I mentioned the firmware is because you could in principle have the same issue on real hardware - say putting 128 GB on your laptop. The firmware should cope with it. Agreed, it's probably not a good idea to deviate too much from how real hardware would behave IMO. As a simplification of Paolo's idea, is it possible for qemu to completely ignore memory above the limit ? Will that break anything ? :) If OVMF does not use etc/e820, it can instead hack the values it reads from CMOS, bounding them according to the CPUID value. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html