[PATCH] VSOCK: mark virtio_transport.ko experimental
Be explicit that the virtio_transport.ko code implements a draft virtio
specification that is still subject to change.

Signed-off-by: Stefan Hajnoczi
---
If you'd rather wait until the device specification has been finalized,
feel free to revert the virtio-vsock code for now. Apologies for not
mentioning the status in the Kconfig earlier.

 net/vmw_vsock/Kconfig | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 74e0bc8..d8be850 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -28,12 +28,17 @@ config VMWARE_VMCI_VSOCKETS
 	  will be called vmw_vsock_vmci_transport. If unsure, say N.
 
 config VIRTIO_VSOCKETS
-	tristate "virtio transport for Virtual Sockets"
+	tristate "virtio transport for Virtual Sockets (Experimental)"
 	depends on VSOCKETS && VIRTIO
 	select VIRTIO_VSOCKETS_COMMON
+	default n
 	help
 	  This module implements a virtio transport for Virtual Sockets.
 
+	  This feature is based on a draft of the virtio-vsock device
+	  specification that is still subject to change. It can be used
+	  to begin developing applications that use Virtual Sockets.
+
 	  Enable this transport if your Virtual Machine runs on Qemu/KVM.
 
 	  To compile this driver as a module, choose M here: the module
-- 
2.5.0
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/5] Add virtio transport for AF_VSOCK
On Wed, Dec 02, 2015 at 02:43:58PM +0800, Stefan Hajnoczi wrote:
> v2:
>  * Rebased onto Linux v4.4-rc2
>  * vhost: Refuse to assign reserved CIDs
>  * vhost: Refuse guest CID if already in use
>  * vhost: Only accept correctly addressed packets (no spoofing!)
>  * vhost: Support flexible rx/tx descriptor layout
>  * vhost: Add missing total_tx_buf decrement
>  * virtio_transport: Fix total_tx_buf accounting
>  * virtio_transport: Add virtio_transport global mutex to prevent races
>  * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
>    semantics)
>  * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes
>    deadlock)
>  * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
>  * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
>  * common: Fix peer_buf_alloc inheritance on child socket
>
> This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
> AF_VSOCK is designed for communication between virtual machines and
> hypervisors. It is currently only implemented for VMware's VMCI transport.
>
> This series implements the proposed virtio-vsock device specification from
> here:
> http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855
>
> Most of the work was done by Asias He and Gerd Hoffmann a while back. I have
> picked up the series again.
>
> The QEMU userspace changes are here:
> https://github.com/stefanha/qemu/commits/vsock
>
> Why virtio-vsock?
> -----------------
> Guest<->host communication is currently done over the virtio-serial device.
> This makes it hard to port sockets API-based applications and is limited to
> static ports.
>
> virtio-vsock uses the sockets API so that applications can rely on familiar
> SOCK_STREAM and SOCK_DGRAM semantics. Applications on the host can easily
> connect to guest agents because the sockets API allows multiple connections
> to a listen socket (unlike virtio-serial). This simplifies the guest<->host
> communication and eliminates the need for extra processes on the host to
> arbitrate virtio-serial ports.
>
> Overview
> --------
> This series adds 3 pieces:
>
> 1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko
>
> 2. virtio_transport.ko - guest driver
>
> 3. drivers/vhost/vsock.ko - host driver
>
> Howto
> -----
> The following kernel options are needed:
> CONFIG_VSOCKETS=y
> CONFIG_VIRTIO_VSOCKETS=y
> CONFIG_VIRTIO_VSOCKETS_COMMON=y
> CONFIG_VHOST_VSOCK=m
>
> Launch QEMU as follows:
> # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3
>
> Guest and host can communicate via AF_VSOCK sockets. The host's CID
> (address) is 2 and the guest is automatically assigned a CID (use
> VMADDR_CID_ANY (-1) to bind to it).
>
> Status
> ------
> There are a few design changes I'd like to make to the virtio-vsock device:
>
> 1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
>    Spoofing packets is also impossible so the security aspects of the 3-way
>    handshake (including syn cookie) add nothing. The next version will have
>    a single operation to establish a connection.

It's hard to discuss without seeing the details, but we do need to slow
down guests that are flooding the host with socket creation requests.
The handshake is a simple way for the hypervisor to defer such requests
until it has resources, without breaking things.

> 2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple
>    clients can transmit to the same listen socket. There is no way for the
>    clients to coordinate buffer space with each other fairly. The next
>    version will drop credit-based flow control for SOCK_DGRAM and only rely
>    on best-effort delivery. SOCK_STREAM still has guaranteed delivery.

I suspect in the end we will need a measure of fairness even if you drop
packets. And recovering from packet loss is hard enough that not many
applications do it correctly. I suggest disabling SOCK_DGRAM for now.

> 3. In the next version only the host will be able to establish connections
>    (i.e. to connect to a guest agent). This is for security reasons since
>    there is currently no ability to provide host services only to certain
>    guests. This also matches how AF_VSOCK works on modern VMware hypervisors.

I see David merged this one already, but the above planned changes are
userspace- and hypervisor/guest-visible. Once this is upstream and
userspace/guests start relying on it, we'll be stuck supporting this
version in addition to whatever we really want, with no easy way to even
test it. Might it not be better to defer enabling this upstream until the
interface is finalized?

> Asias He (5):
>   VSOCK: Introduce vsock_find_unbound_socket and
>     vsock_bind_dgram_generic
>   VSOCK: Introduce virtio-vsock-common.ko
>   VSOCK: Introduce virtio-vsock.ko
>   VSOCK: Introduce vhost-vsock.ko
>   VSOCK: Add Makefile and Kconfig
>
>  drivers/vhost/Kconfig
Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:
>> We hope to find a better way to make SRIOV NIC work in these cases and
>> this is worth doing since SRIOV NIC provides better network performance
>> compared with PV NIC.
>
> If this is a performance optimization as the above implies, you need to
> include some numbers, and document how you implemented the switch and
> how you measured the performance.

OK. Some ideas in my patches come from the paper "CompSC: Live Migration
with Pass-through Devices":
http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf

It compared performance data between the solution of switching between
PV and VF and VF migration (Chapter 7: Discussion).

>> Current patches have some issues. I think we can find solutions for
>> them and improve them step by step.
RE: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().
Hello!

>>> Cc: sta...@vger.kernel.org
>>> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's
>>> uncachedness")
>>
>> That commit is not in a release yet, so no need for cc stable
[...]
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.

Sorry guys for messing things up a bit, but the affected commit actually
is in a stable branch (4.4-rc3), so I decided to Cc: stable, just in
case, because the breakage is quite big IMHO.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 1/3] KVM: arm64: Correctly handle zero register during MMIO
On 03/12/15 09:58, Pavel Fedin wrote:
> On ARM64 register index of 31 corresponds to both the zero register and
> SP. However, all memory access instructions use ZR as the transfer
> register. SP is used only as a base register in indirect memory
> addressing, or by register-register arithmetic, which cannot be trapped
> here.
>
> Correct emulation is achieved by introducing new register accessor
> functions, which can do special handling for reg_num == 31. These new
> accessors intentionally do not rely on the old vcpu_reg() on ARM64,
> because it is to be removed. Since the affected code is shared by both
> ARM flavours, implementations of these accessors are also added to the
> ARM32 code.
>
> This patch fixes setting an MMIO register to a random value (actually
> SP) instead of zero by something like:
>
>  *((volatile int *)reg) = 0;
>
> compilers tend to generate "str wzr, [xx]" here
>
> Signed-off-by: Pavel Fedin
> ---
>  arch/arm/include/asm/kvm_emulate.h   | 12 ++++++++++++
>  arch/arm/kvm/mmio.c                  |  5 +++--
>  arch/arm64/include/asm/kvm_emulate.h | 13 +++++++++++++
>  3 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index a9c80a2..b7ff32e 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -28,6 +28,18 @@
>  unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
>  
> +static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
> +					 u8 reg_num)
> +{
> +	return *vcpu_reg(vcpu, reg_num);
> +}
> +
> +static inline void vcpu_set_reg(const struct kvm_vcpu *vcpu, u8 reg_num,
> +				unsigned long val)
> +{
> +	*vcpu_reg(vcpu, reg_num) = val;
> +}
> +
>  bool kvm_condition_valid(struct kvm_vcpu *vcpu);
>  void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
>  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> index 974b1c6..3a10c9f 100644
> --- a/arch/arm/kvm/mmio.c
> +++ b/arch/arm/kvm/mmio.c
> @@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
>  			       data);
>  		data = vcpu_data_host_to_guest(vcpu, data, len);
> -		*vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) = data;
> +		vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
>  	}
>  
>  	return 0;
> @@ -186,7 +186,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	rt = vcpu->arch.mmio_decode.rt;
>  
>  	if (is_write) {
> -		data = vcpu_data_guest_to_host(vcpu, *vcpu_reg(vcpu, rt), len);
> +		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
> +					       len);
>  
>  		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, data);
>  		mmio_write_buf(data_buf, len, data);
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 3ca894e..5a182af 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -109,6 +109,19 @@ static inline unsigned long *vcpu_reg(const struct kvm_vcpu *vcpu, u8 reg_num)
>  	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.regs[reg_num];
>  }
>  
> +static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
> +					 u8 reg_num)
> +{
> +	return (reg_num == 31) ? 0 : vcpu_gp_regs(vcpu)->regs.regs[reg_num];
> +}
> +
> +static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
> +				unsigned long val)
> +{
> +	if (reg_num != 31)
> +		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
> +}
> +
>  /* Get vcpu SPSR for current mode */
>  static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
>  {

Thanks for finding this nasty one.

Reviewed-by: Marc Zyngier

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers
On 03/12/15 10:53, Pavel Fedin wrote:
> Hello!
>
>>> The problem has been discovered by performing an operation
>>>
>>>  *((volatile int *)reg) = 0;
>>>
>>> which compiles as "str xzr, [xx]", and resulted in strange values
>>> being written.
>>
>> Interesting find. Which compiler is that?
>
> $ aarch64-linux-gnu-gcc --version
> aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 20141031 (prerelease)
> Copyright (C) 2014 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OK. I was just wondering if that was a new thing or not.

[...]

> Isn't it legitimate to write from ZR to an MMIO register?
> Another potential case is in our vgic-v3-switch.S:
>
>  msr_s ICH_HCR_EL2, xzr
>
> It's only because it is KVM code that we have never discovered this
> problem yet. Somebody could write such a thing in some other place,
> with some other register, which would be executed by KVM, and... boo...

I'm certainly not disputing that, this is a real bug that should be
fixed right now. Looking forward to seeing your v2.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
RE: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses
Hello!

>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 87a64e8..a667228 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>>
>>  	BUG_ON(!p->is_write);
>>
>> -	val = *vcpu_reg(vcpu, p->Rt);
>> +	val = *p->val;
>
> Why does it have to be a pointer? You could just have "val = p->val" if
> you carried the actual value instead of a pointer to the stack variable
> holding that value.

There's only one concern with the pointer approach. Actually, this
refactor is based on my vGICv3 live migration API patch set:
http://www.spinics.net/lists/kvm/msg124205.html
http://www.spinics.net/lists/kvm/msg124202.html

It's simply more convenient to use a pointer for the exchange with
userspace, see vgic_v3_cpu_regs_access() and its callers. I wouldn't
like to refactor the code again. What's your opinion on this?

And of course I'll fix up the rest.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses
On 03/12/15 11:08, Pavel Fedin wrote:
> Hello!
>
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 87a64e8..a667228 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>>>
>>>  	BUG_ON(!p->is_write);
>>>
>>> -	val = *vcpu_reg(vcpu, p->Rt);
>>> +	val = *p->val;
>>
>> Why does it have to be a pointer? You could just have "val = p->val"
>> if you carried the actual value instead of a pointer to the stack
>> variable holding that value.
>
> There's only one concern for the pointer approach. Actually, this
> refactor is based on my vGICv3 live migration API patch set:
> http://www.spinics.net/lists/kvm/msg124205.html
> http://www.spinics.net/lists/kvm/msg124202.html
>
> It's simply more convenient to use a pointer for exchange with
> userspace, see vgic_v3_cpu_regs_access() and callers. I wouldn't like
> to refactor the code again. What's your opinion on this?

I still don't think this is a good idea. You can still store the value
as an integer in vgic_v3_cpu_regs_access(), and check the write property
to do the writeback on read. Which is the same thing I asked for in this
patch.

> And of course I'll fix up the rest.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
RE: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses
Hello!

>> It's simply more convenient to use a pointer for exchange with
>> userspace, see vgic_v3_cpu_regs_access() and callers. I wouldn't like
>> to refactor the code again. What's your opinion on this?
>
> I still don't think this is a good idea. You can still store the value
> as an integer in vgic_v3_cpu_regs_access(), and check the write
> property to do the writeback on read. Which is the same thing I asked
> for in this patch.

I started doing this and found one more (big) reason against it. All
sysreg handlers have a 'const struct sys_reg_params' declaration, and so
do their callers, and their callers' callers... This 'const' is all over
the code, and it would take a separate huge patch to un-const'ify all of
it. Is it worth that?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().
On 3 December 2015 at 08:14, Pavel Fedin wrote:
> Hello!
>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 7dace90..51ad98f 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>>>
>>>  	pte = pte_offset_kernel(pmd, addr);
>>>  	do {
>>> -		if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>>> +		if (!pte_none(*pte) &&
>>> +		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>
>> I think your analysis is correct, but does that not apply to both
>> instances?
>
> No no, the other one is correct, since it operates on a real PFN (at
> least it looks like so). I have verified my fix against the original
> problem (crash on Exynos5410 without generic timer), and it still works
> fine there.

I don't think so. Regardless of whether you are manipulating HYP
mappings or stage-2 mappings, the physical address is always the output,
not the input of the translation, so addr is always either a virtual
address or an intermediate physical address, whereas pfn_valid()
operates on host physical addresses.

>> And instead of reverting, could we fix this properly instead?
>
> Of course, I'm not against alternate approaches, feel free to. I've
> just suggested what I could, to fix things quickly. I'm indeed no
> expert in KVM memory management yet. After all, this is what mailing
> lists are for.

OK. I will follow up with a patch, as Christoffer requested. I'd
appreciate it if you could test to see if it also fixes the current
issue, and the original arch timer issue.

Thanks,
Ard.
Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector
On Wed, 2 Dec 2015 16:34:56 -0600
Andrew Jones wrote:

> On Fri, Nov 27, 2015 at 06:50:04PM +0000, Marc Zyngier wrote:
>> KVM so far relies on code patching, and is likely to use it more
>> in the future. The main issue is that our alternative system works
>> at the instruction level, while we'd like to have alternatives at
>> the function level.
>
> How about setting static-keys at hyp init time?

That was an option I looked at. And while static keys would work to some
extent, they also have some nasty side effects:

- They create both a fast and a slow path. We don't want that - both
  paths should be equally fast, or at least have as little overhead as
  possible.

- We do need code patching for some assembly code, and using static keys
  on top creates a parallel mechanism that makes it hard to
  follow/debug/maintain.

You can view this alternative function call as a slightly different kind
of static key - one that gives you the capability to handle function
calls instead of just jumping over code sequences. Both have their own
merits.

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.
Re: [RFC PATCH V2 09/10] Qemu/VFIO: Add SRIOV VF migration support
On 12/3/2015 6:25 AM, Alex Williamson wrote:
> On Tue, 2015-11-24 at 21:35 +0800, Lan Tianyu wrote:
>> This patch is to add SRIOV VF migration support.
>> Create new device type "vfio-sriov" and add faked PCI migration
>> capability to the type device.
>>
>> The purpose of the new capability
>> 1) sync migration status with VF driver in the VM
>> 2) Get mailbox irq vector to notify VF driver during migration.
>> 3) Provide a way to control injecting irq or not.
>>
>> Qemu will migrate PCI configure space regs and MSIX config for VF.
>> Inject mailbox irq at last stage of migration to notify VF about
>> migration event and wait for VF driver to be ready for migration. VF
>> driver writes PCI config reg PCI_VF_MIGRATION_VF_STATUS in the new cap
>> table to tell Qemu.
>
> What makes this sr-iov specific? Why wouldn't we simply extend vfio-pci
> with a migration=on feature? Thanks,
>
> Alex

Sounds reasonable and will update it.
Re: [RFC PATCH V2 06/10] Qemu/PCI: Add macros for faked PCI migration capability
On 12/3/2015 6:25 AM, Alex Williamson wrote:
> This will of course break if the PCI SIG defines that capability index.
> Couldn't this be done within a vendor-defined capability? Thanks,

Yes, it should work and thanks for the suggestion.
RE: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().
Hello!

>>> I think your analysis is correct, but does that not apply to both
>>> instances?
>>
>> No no, the other one is correct, since it operates on a real PFN (at
>> least it looks like so). I have verified my fix against the original
>> problem (crash on Exynos5410 without generic timer), and it still
>> works fine there.
>
> I don't think so. Regardless of whether you are manipulating HYP
> mappings or stage-2 mappings, the physical address is always the
> output, not the input of the translation, so addr is always either a
> virtual address or an intermediate physical address, whereas
> pfn_valid() operates on host physical addresses.

Yes, you are right. I have reviewed this more carefully, and indeed,
unmap_range() is also called by unmap_stage2_range(), so it can be both
an IPA and a real PA.

> OK. I will follow up with a patch, as Christoffer requested. I'd
> appreciate it if you could test to see if it also fixes the current
> issue, and the original arch timer issue.

I have just made the same patch, and am currently testing it on all my
boards. I'll also test it on my ARM64, just in case. I was about to
finish the testing and send the patch in maybe one or two hours.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
[PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers
The ARM64 CPU has a zero register which is read-only, with a value of 0.
However, KVM currently incorrectly recognizes it as SP (because
Rt == 31, and in struct user_pt_regs the 'regs' array is followed by
SP), resulting in an invalid value being read, or even SP corruption on
write.

The problem has been discovered by performing an operation

 *((volatile int *)reg) = 0;

which compiles as "str xzr, [xx]", and resulted in strange values being
written.

Pavel Fedin (3):
  KVM: arm64: Correctly handle zero register during MMIO
  KVM: arm64: Correctly handle zero register in system register accesses
  KVM: arm64: Get rid of old vcpu_reg()

 arch/arm/include/asm/kvm_emulate.h   | 12 ++
 arch/arm/kvm/mmio.c                  |  5 ++-
 arch/arm/kvm/psci.c                  | 20 -----
 arch/arm64/include/asm/kvm_emulate.h | 18 +---
 arch/arm64/kvm/handle_exit.c         |  2 +-
 arch/arm64/kvm/sys_regs.c            | 79 ++++++++++++++++++------------------
 arch/arm64/kvm/sys_regs.h            |  4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
 8 files changed, 85 insertions(+), 57 deletions(-)

-- 
2.4.4
[PATCH 1/3] KVM: arm64: Correctly handle zero register during MMIO
On ARM64 register index of 31 corresponds to both the zero register and
SP. However, all memory access instructions use ZR as the transfer
register. SP is used only as a base register in indirect memory
addressing, or by register-register arithmetic, which cannot be trapped
here.

Correct emulation is achieved by introducing new register accessor
functions, which can do special handling for reg_num == 31. These new
accessors intentionally do not rely on the old vcpu_reg() on ARM64,
because it is to be removed. Since the affected code is shared by both
ARM flavours, implementations of these accessors are also added to the
ARM32 code.

This patch fixes setting an MMIO register to a random value (actually
SP) instead of zero by something like:

 *((volatile int *)reg) = 0;

compilers tend to generate "str wzr, [xx]" here

Signed-off-by: Pavel Fedin
---
 arch/arm/include/asm/kvm_emulate.h   | 12 ++++++++++++
 arch/arm/kvm/mmio.c                  |  5 +++--
 arch/arm64/include/asm/kvm_emulate.h | 13 +++++++++++++
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index a9c80a2..b7ff32e 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -28,6 +28,18 @@
 unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
 
+static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
+					 u8 reg_num)
+{
+	return *vcpu_reg(vcpu, reg_num);
+}
+
+static inline void vcpu_set_reg(const struct kvm_vcpu *vcpu, u8 reg_num,
+				unsigned long val)
+{
+	*vcpu_reg(vcpu, reg_num) = val;
+}
+
 bool kvm_condition_valid(struct kvm_vcpu *vcpu);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 974b1c6..3a10c9f 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
 			       data);
 		data = vcpu_data_host_to_guest(vcpu, data, len);
-		*vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) = data;
+		vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
 	}
 
 	return 0;
@@ -186,7 +186,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	rt = vcpu->arch.mmio_decode.rt;
 
 	if (is_write) {
-		data = vcpu_data_guest_to_host(vcpu, *vcpu_reg(vcpu, rt), len);
+		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
+					       len);
 
 		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, data);
 		mmio_write_buf(data_buf, len, data);
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3ca894e..5a182af 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -109,6 +109,19 @@ static inline unsigned long *vcpu_reg(const struct kvm_vcpu *vcpu, u8 reg_num)
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.regs[reg_num];
 }
 
+static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
+					 u8 reg_num)
+{
+	return (reg_num == 31) ? 0 : vcpu_gp_regs(vcpu)->regs.regs[reg_num];
+}
+
+static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
+				unsigned long val)
+{
+	if (reg_num != 31)
+		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
+}
+
 /* Get vcpu SPSR for current mode */
 static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
 {
-- 
2.4.4
[PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses
System register accesses also use the zero register for Rt == 31, and
therefore using it will also result in getting the SP value instead.
This patch makes them also use the new accessors, introduced by the
previous patch. Additionally, got rid of the "massive hack" in
kvm_handle_cp_64().

Signed-off-by: Pavel Fedin
---
 arch/arm64/kvm/sys_regs.c            | 79 ++++++++++++++++++------------------
 arch/arm64/kvm/sys_regs.h            |  4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
 3 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 87a64e8..a667228 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 
 	BUG_ON(!p->is_write);
 
-	val = *vcpu_reg(vcpu, p->Rt);
+	val = *p->val;
 	if (!p->is_aarch32) {
 		vcpu_sys_reg(vcpu, r->reg) = val;
 	} else {
@@ -125,13 +125,10 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
 			   const struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {
-	u64 val;
-
 	if (!p->is_write)
 		return read_from_write_only(vcpu, p);
 
-	val = *vcpu_reg(vcpu, p->Rt);
-	vgic_v3_dispatch_sgi(vcpu, val);
+	vgic_v3_dispatch_sgi(vcpu, *p->val);
 
 	return true;
 }
@@ -153,7 +150,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
 	if (p->is_write) {
 		return ignore_write(vcpu, p);
 	} else {
-		*vcpu_reg(vcpu, p->Rt) = (1 << 3);
+		*p->val = (1 << 3);
 		return true;
 	}
 }
@@ -167,7 +164,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
 	} else {
 		u32 val;
 		asm volatile("mrs %0, dbgauthstatus_el1" : "=r" (val));
-		*vcpu_reg(vcpu, p->Rt) = val;
+		*p->val = val;
 		return true;
 	}
 }
@@ -204,13 +201,13 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    const struct sys_reg_desc *r)
 {
 	if (p->is_write) {
-		vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+		vcpu_sys_reg(vcpu, r->reg) = *p->val;
 		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
 	} else {
-		*vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
+		*p->val = vcpu_sys_reg(vcpu, r->reg);
 	}
 
-	trace_trap_reg(__func__, r->reg, p->is_write, *vcpu_reg(vcpu, p->Rt));
+	trace_trap_reg(__func__, r->reg, p->is_write, *p->val);
 
 	return true;
 }
@@ -228,7 +225,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
 			      const struct sys_reg_params *p,
 			      u64 *dbg_reg)
 {
-	u64 val = *vcpu_reg(vcpu, p->Rt);
+	u64 val = *p->val;
 
 	if (p->is_32bit) {
 		val &= 0xffffffffUL;
@@ -248,7 +245,7 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
 	if (p->is_32bit)
 		val &= 0xffffffffUL;
 
-	*vcpu_reg(vcpu, p->Rt) = val;
+	*p->val = val;
 }
 
 static inline bool trap_bvr(struct kvm_vcpu *vcpu,
@@ -697,10 +694,10 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
 		u64 pfr = read_system_reg(SYS_ID_AA64PFR0_EL1);
 		u32 el3 = !!cpuid_feature_extract_field(pfr, ID_AA64PFR0_EL3_SHIFT);
 
-		*vcpu_reg(vcpu, p->Rt) = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
-			     (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
-			     (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
-			     (6 << 16) | (el3 << 14) | (el3 << 12));
+		*p->val = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
+			   (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
+			   (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
+			   (6 << 16) | (el3 << 14) | (el3 << 12));
 		return true;
 	}
 }
@@ -710,10 +707,10 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
 			 const struct sys_reg_desc *r)
 {
 	if (p->is_write) {
-		vcpu_cp14(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+		vcpu_cp14(vcpu, r->reg) = *p->val;
 		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
 	} else {
-		*vcpu_reg(vcpu, p->Rt) = vcpu_cp14(vcpu, r->reg);
+		*p->val = vcpu_cp14(vcpu, r->reg);
 	}
 
 	return true;
@@ -740,12 +737,12 @@ static inline bool trap_xvr(struct kvm_vcpu *vcpu,
 		u64 val = *dbg_reg;
 
 		val &= 0xffffffffUL;
-		val |= *vcpu_reg(vcpu, p->Rt) << 32;
+		val |= *p->val << 32;
 		*dbg_reg = val;
[PATCH 3/3] KVM: arm64: Get rid of old vcpu_reg()
Using the old-style vcpu_reg() accessor has proven to be inappropriate and unsafe on ARM64. This patch fixes the rest of the use cases and completely removes vcpu_reg() on ARM64.

Signed-off-by: Pavel Fedin
---
 arch/arm/kvm/psci.c                  | 20 ++--
 arch/arm64/include/asm/kvm_emulate.h | 11 +++
 arch/arm64/kvm/handle_exit.c         |  2 +-
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 0b55696..a9b3b90 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -75,7 +75,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	unsigned long context_id;
 	phys_addr_t target_pc;
 
-	cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
+	cpu_id = vcpu_get_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
 	if (vcpu_mode_is_32bit(source_vcpu))
 		cpu_id &= ~((u32) 0);
 
@@ -94,8 +94,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 		return PSCI_RET_INVALID_PARAMS;
 	}
 
-	target_pc = *vcpu_reg(source_vcpu, 2);
-	context_id = *vcpu_reg(source_vcpu, 3);
+	target_pc = vcpu_get_reg(source_vcpu, 2);
+	context_id = vcpu_get_reg(source_vcpu, 3);
 
 	kvm_reset_vcpu(vcpu);
 
@@ -114,7 +114,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 * NOTE: We always update r0 (or x0) because for PSCI v0.1
 	 * the general purpose registers are undefined upon CPU_ON.
 	 */
-	*vcpu_reg(vcpu, 0) = context_id;
+	vcpu_set_reg(vcpu, 0, context_id);
 	vcpu->arch.power_off = false;
 	smp_mb();		/* Make sure the above is visible */
 
@@ -134,8 +134,8 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_vcpu *tmp;
 
-	target_affinity = *vcpu_reg(vcpu, 1);
-	lowest_affinity_level = *vcpu_reg(vcpu, 2);
+	target_affinity = vcpu_get_reg(vcpu, 1);
+	lowest_affinity_level = vcpu_get_reg(vcpu, 2);
 
 	/* Determine target affinity mask */
 	target_affinity_mask = psci_affinity_mask(lowest_affinity_level);
@@ -209,7 +209,7 @@ int kvm_psci_version(struct kvm_vcpu *vcpu)
 static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 {
 	int ret = 1;
-	unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
+	unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
 	unsigned long val;
 
 	switch (psci_fn) {
@@ -273,13 +273,13 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 		break;
 	}
 
-	*vcpu_reg(vcpu, 0) = val;
+	vcpu_set_reg(vcpu, 0, val);
 	return ret;
 }
 
 static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 {
-	unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
+	unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
 	unsigned long val;
 
 	switch (psci_fn) {
@@ -295,7 +295,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 		break;
 	}
 
-	*vcpu_reg(vcpu, 0) = val;
+	vcpu_set_reg(vcpu, 0, val);
 	return 1;
 }
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5a182af..25a4021 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -100,15 +100,10 @@ static inline void vcpu_set_thumb(struct kvm_vcpu *vcpu)
 }
 
 /*
- * vcpu_reg should always be passed a register number coming from a
- * read of ESR_EL2. Otherwise, it may give the wrong result on AArch32
- * with banked registers.
+ * vcpu_get_reg and vcpu_set_reg should always be passed a register number
+ * coming from a read of ESR_EL2. Otherwise, it may give the wrong result on
+ * AArch32 with banked registers.
  */
-static inline unsigned long *vcpu_reg(const struct kvm_vcpu *vcpu, u8 reg_num)
-{
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.regs[reg_num];
-}
-
 static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
 					 u8 reg_num)
 {
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 68a0759..15f0477 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -37,7 +37,7 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	int ret;
 
-	trace_kvm_hvc_arm64(*vcpu_pc(vcpu), *vcpu_reg(vcpu, 0),
+	trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
 			    kvm_vcpu_hvc_get_imm(vcpu));
 
 	ret = kvm_psci_call(vcpu);
-- 
2.4.4
RE: [PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers
Hello!

> > The problem has been discovered by performing an operation
> >
> > *((volatile int *)reg) = 0;
> >
> > which compiles as "str xzr, [xx]", and resulted in strange values being
> > written.
>
> Interesting find. Which compiler is that?

$ aarch64-linux-gnu-gcc --version
aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 20141031 (prerelease)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This is from my colleague who actually hit the bug with his driver. And I can
reproduce the issue with a different compiler version using the following
small testcase:

--- cut ---
p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ cat test.c
volatile int *addr;

int test_val(int val)
{
        *addr = val;
}

int test_zero(void)
{
        *addr = 0;
}

p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ aarch64-unknown-linux-gnu-gcc -O2 -c test.c

p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ aarch64-unknown-linux-gnu-objdump -d test.o

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

<test_val>:
   0:   2a0003e2        mov     w2, w0
   4:   2a0103e0        mov     w0, w1
   8:   9001            adrp    x1, 8
   c:   f9400021        ldr     x1, [x1]
  10:   b922            str     w2, [x1]
  14:   d65f03c0        ret

0018 <test_zero>:
  18:   9001            adrp    x1, 8
  1c:   f9400021        ldr     x1, [x1]
  20:   b93f            str     wzr, [x1]
  24:   d65f03c0        ret

p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ aarch64-unknown-linux-gnu-gcc --version
aarch64-unknown-linux-gnu-gcc (GCC) 4.9.0
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--- cut ---

Isn't it legitimate to write from ZR to an MMIO register? Another potential
case is in our vgic-v3-switch.S:

	msr_s	ICH_HCR_EL2, xzr

It's only because this is KVM code that we have not discovered the problem
until now. Somebody could write such a thing in some other place, with some
other register, which would be executed by KVM, and... boo...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [RFC PATCH V2 02/10] Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition
On 12/3/2015 6:25 AM, Alex Williamson wrote:
> I didn't see a matching kernel patch series for this, but why is the
> kernel more capable of doing this than userspace is already?

The following link is the kernel patch.
http://marc.info/?l=kvm&m=144837328920989&w=2

> These seem like pointless ioctls, we're creating a purely virtual PCI
> capability, the kernel doesn't really need to participate in that.

The VFIO kernel driver has pci_config_map, which indicates each PCI
capability's position and length and thus helps to find free PCI config regs.
The Qemu side doesn't have such info and can't get the exact table size of a
PCI capability. If we wanted to add such support in Qemu, we would need to
duplicate a lot of code from vfio_pci_config.c in Qemu.

> Also, why are we restricting ourselves to standard capabilities?

This version is to check whether it's on the right track, and we can extend
this to PCI extended capabilities later.

> That's often a crowded space and we can't always know whether an area is
> free or not based only on it being covered by a capability. Some
> capabilities can also appear more than once, so there's context that isn't
> being passed to the kernel here. Thanks,

The regions outside of the PCI capabilities are not passed to the kernel or
used by Qemu for MSI/MSI-X. It's possible to use these places for a new
capability. One concern is that a guest driver may abuse them, and a quirk
masking some special regs outside of the capabilities may be helpful.
Re: [PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers
On 03/12/15 09:58, Pavel Fedin wrote:
> ARM64 CPU has zero register which is read-only, with a value of 0.
> However, KVM currently incorrectly recognizes it being SP (because
> Rt == 31, and in struct user_pt_regs 'regs' array is followed by SP),
> resulting in invalid value being read, or even SP corruption on write.

Not really. XZR and SP do share the same encoding.

> The problem has been discovered by performing an operation
>
> *((volatile int *)reg) = 0;
>
> which compiles as "str xzr, [xx]", and resulted in strange values being
> written.

Interesting find. Which compiler is that?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses
Pavel,

On 03/12/15 09:58, Pavel Fedin wrote:
> System register accesses also use zero register for Rt == 31, and
> therefore using it will also result in getting SP value instead. This
> patch makes them also using new accessors, introduced by the previous
> patch.
>
> Additionally, got rid of "massive hack" in kvm_handle_cp_64().

Looks good for a first drop - some comments below.

> Signed-off-by: Pavel Fedin
> ---
>  arch/arm64/kvm/sys_regs.c            | 79
>  arch/arm64/kvm/sys_regs.h            |  4 +-
>  arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
>  3 files changed, 46 insertions(+), 39 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 87a64e8..a667228 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>
>  	BUG_ON(!p->is_write);
>
> -	val = *vcpu_reg(vcpu, p->Rt);
> +	val = *p->val;

Why does it have to be a pointer? You could just have "val = p->val" if you
carried the actual value instead of a pointer to the stack variable holding
that value.
>  	if (!p->is_aarch32) {
>  		vcpu_sys_reg(vcpu, r->reg) = val;
>  	} else {
> @@ -125,13 +125,10 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>  			   const struct sys_reg_params *p,
>  			   const struct sys_reg_desc *r)
>  {
> -	u64 val;
> -
>  	if (!p->is_write)
>  		return read_from_write_only(vcpu, p);
>
> -	val = *vcpu_reg(vcpu, p->Rt);
> -	vgic_v3_dispatch_sgi(vcpu, val);
> +	vgic_v3_dispatch_sgi(vcpu, *p->val);
>
>  	return true;
>  }
> @@ -153,7 +150,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>  	if (p->is_write) {
>  		return ignore_write(vcpu, p);
>  	} else {
> -		*vcpu_reg(vcpu, p->Rt) = (1 << 3);
> +		*p->val = (1 << 3);
>  		return true;
>  	}
>  }
> @@ -167,7 +164,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
>  	} else {
>  		u32 val;
>  		asm volatile("mrs %0, dbgauthstatus_el1" : "=r" (val));
> -		*vcpu_reg(vcpu, p->Rt) = val;
> +		*p->val = val;
>  		return true;
>  	}
>  }
> @@ -204,13 +201,13 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>  			    const struct sys_reg_desc *r)
> {
>  	if (p->is_write) {
> -		vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> +		vcpu_sys_reg(vcpu, r->reg) = *p->val;
>  		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
> -		*vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
> +		*p->val = vcpu_sys_reg(vcpu, r->reg);
>  	}
>
> -	trace_trap_reg(__func__, r->reg, p->is_write, *vcpu_reg(vcpu, p->Rt));
> +	trace_trap_reg(__func__, r->reg, p->is_write, *p->val);
>
>  	return true;
>  }
> @@ -228,7 +225,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>  			      const struct sys_reg_params *p,
>  			      u64 *dbg_reg)
>  {
> -	u64 val = *vcpu_reg(vcpu, p->Rt);
> +	u64 val = *p->val;
>
>  	if (p->is_32bit) {
>  		val &= 0xffffffffUL;
> @@ -248,7 +245,7 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
>  	if (p->is_32bit)
>  		val &= 0xffffffffUL;
>
> -	*vcpu_reg(vcpu, p->Rt) = val;
> +	*p->val = val;
>  }
>
>  static inline bool trap_bvr(struct kvm_vcpu *vcpu,
> @@ -697,10 +694,10 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
>  			u64 pfr = read_system_reg(SYS_ID_AA64PFR0_EL1);
>  			u32 el3 = !!cpuid_feature_extract_field(pfr, ID_AA64PFR0_EL3_SHIFT);
>
> -			*vcpu_reg(vcpu, p->Rt) = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
> -				  (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
> -				  (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
> -				  (6 << 16) | (el3 << 14) | (el3 << 12));
> +			*p->val = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
> +				   (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
> +				   (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
> +				   (6 << 16) | (el3 << 14) | (el3 << 12));
>  			return true;
>  		}
>  	}
> @@ -710,10 +707,10 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
>  			 const struct sys_reg_desc *r)
> {
>  	if (p->is_write) {
> -		vcpu_cp14(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> +		vcpu_cp14(vcpu, r->reg) = *p->val;
>  		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
> -		*vcpu_reg(vcpu, p->Rt) = vcpu_cp14(vcpu, r->reg);
> +
RE: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
Hello!

> I like that you're making this transparent for the user, but at the same
> time, directly calling function pointers through the msi_domain_ops is
> quite ugly.

Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce
some wrappers in include/linux/msi.h like
msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not conceptually
change anything.

> There needs to be a real interface there that isn't specific to vfio.

Hm... What else is going to use this?

Actually, in my implementation the only thing specific to vfio is using
struct vfio_iommu_driver_ops. This is because we have to perform MSI mapping
for all "vfio domains" registered for this container. At least this is how
the original type1 driver works.

Can anybody explain to me what these "vfio domains" are? From the code it
looks like we can have several IOMMU instances belonging to one VFIO
container, and in this case one IOMMU == one "vfio domain". So is my
understanding correct that a "vfio domain" is an IOMMU instance?

And here come completely different ideas...

First of all, can anybody explain why I perform all mappings on a per-IOMMU
basis, not on a per-device basis? AFAIK at least the ARM SMMU knows about
"stream IDs", and therefore it should be capable of distinguishing between
different devices. So can I have per-device mappings? This would make things
much simpler.

So:
Idea 1: do per-device mappings. In this case I don't have to track down
which devices belong to which group and which IOMMU...

Idea 2: What if we indeed simply simulate x86 behavior? What if we just do a
1:1 mapping for the MSI register when the IOMMU is initialized and forget
about it, so that MSI messages are guaranteed to reach the host? Or would
this mean that we would have to do a 1:1 mapping for the whole address
range? Looks like (1) tried to do something similar, with address
reservation.

Idea 3: Is a single device guaranteed to correspond to a single "vfio
domain" (and, as a consequence, to a single IOMMU)? In this case it's very
easy to unlink the interface introduced by 0002 of my series from vfio, and
pass just a raw struct iommu_domain * without any driver_ops. irqchip code
would only need iommu_map() and iommu_unmap() then, no calling back to the
vfio layer.

Cc'ed to authors of all mentioned series

> [1] http://www.spinics.net/lists/kvm/msg121669.html,
> http://www.spinics.net/lists/kvm/msg121662.html
> [2] http://www.spinics.net/lists/kvm/msg119236.html

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:
>> We hope to find a better way to make SRIOV NICs work in these cases, and
>> this is worth doing since an SRIOV NIC provides better network
>> performance compared with a PV NIC.
>
> If this is a performance optimization as the above implies, you need to
> include some numbers, and document how you implemented the switch and how
> you measured the performance.

OK. Some ideas of my patches come from the paper "CompSC: Live Migration
with Pass-through Devices".
http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf

It compared performance data between the solution of switching between PV
and VF and VF migration. (Chapter 7: Discussion)

>> Current patches have some issues. I think we can find solutions for them
>> and improve them step by step.
Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
On Thu, 3 Dec 2015 16:16:54 +0300
Pavel Fedin wrote:

Pavel,

> Hello!
>
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
>
> Do you mean dereferencing info->ops->vfio_map() in .c code? I can
> introduce some wrappers in include/linux/msi.h like
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not
> conceptually change anything.
>
> > There needs to be a real interface there that isn't specific to vfio.
>
> Hm... What else is going to use this?
> Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to
> perform MSI mapping for all "vfio domains" registered for this
> container. At least this is how original type1 driver works.
> Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So
> is my understanding correct that "vfio domain" is IOMMU instance?
> And here come completely different ideas...
> First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

Not exactly. You can have a bunch of devices behind a single StreamID
(making them completely indistinguishable). This is a system integration
decision.

> So:
> Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

As explained above, this doesn't work.

> Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

Neither. We want userspace to be in control of the memory map, and it is
the kernel's job to tell us whether or not this matches the HW capabilities.
A fixed mapping may completely clash with the memory map I want (think
emulating HW x on platform y), and there is no reason why we should have
the restrictions x86 has.

My understanding is that userspace should be in charge here, and maybe
obtain a range of acceptable IOVAs for the MSI doorbell to be mapped.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.
Re: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses
On 03/12/15 11:55, Pavel Fedin wrote:
> Hello!
>
>>> It's simply more convenient to use a pointer for exchange with
>>> userspace, see vgic_v3_cpu_regs_access() and callers. I wouldn't like
>>> to refactor the code again. What's your opinion on this?
>>
>> I still don't think this is a good idea. You can still store the value
>> as an integer in vgic_v3_cpu_regs_access(), and check the write property
>> to do the writeback on read. Which is the same thing I asked for in this
>> patch.
>
> Started doing this and found one more (big) reason against. All sysreg
> handlers have 'const struct sys_reg_params' declaration, and
> callers, and their callers... This 'const' is all around the code, and it
> would take a separate huge patch to un-const'ify all this.
> Does it worth that?

maz@approximate:~/Work/arm-platforms$ git diff --stat
 arch/arm64/kvm/sys_regs.c | 36 ++++++++++++++++++------------------
 arch/arm64/kvm/sys_regs.h |  2 +-
 2 files changed, 19 insertions(+), 19 deletions(-)

Hardly a big deal... You can have that as a separate patch.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV
On 03/12/2015 06:29, roy.qing...@gmail.com wrote:
> From: Li RongQing
>
> POSTED_INTR_NV is 16bit, should not use 64bit write function
>
> [ 5311.676074] vmwrite error: reg 3 value 0 (err 12)
> [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 4.1.13-WR8.0.0.0_standard #1
> [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> [ 5311.699550] e69a7e1c c1950de1 e69a7e38 fafcff45 fafebd24
> [ 5311.706924] 0003 000c b6a06dfa e69a7e40 fafcff79 e69a7eb0 fafd5f57
> [ 5311.714296] e69a7ec0 c1080600 0001 c0e18018 01be 0b43
> [ 5311.721651] Call Trace:
> [ 5311.722942] [] dump_stack+0x4b/0x75
> [ 5311.726467] [] vmwrite_error+0x35/0x40 [kvm_intel]
> [ 5311.731444] [] vmcs_writel+0x29/0x30 [kvm_intel]
> [ 5311.736228] [] vmx_create_vcpu+0x337/0xb90 [kvm_intel]
> [ 5311.741600] [] ? dequeue_task_fair+0x2e0/0xf60
> [ 5311.746197] [] kvm_arch_vcpu_create+0x3a/0x70 [kvm]
> [ 5311.751278] [] kvm_vm_ioctl+0x14d/0x640 [kvm]
> [ 5311.755771] [] ? free_pages_prepare+0x1a4/0x2d0
> [ 5311.760455] [] ? debug_smp_processor_id+0x12/0x20
> [ 5311.765333] [] ? sched_move_task+0xbe/0x170
> [ 5311.769621] [] ? kmem_cache_free+0x213/0x230
> [ 5311.774016] [] ? kvm_set_memory_region+0x60/0x60 [kvm]
> [ 5311.779379] [] do_vfs_ioctl+0x2e2/0x500
> [ 5311.783285] [] ? kmem_cache_free+0x213/0x230
> [ 5311.787677] [] ? __mmdrop+0x63/0xd0
> [ 5311.791196] [] ? __mmdrop+0x63/0xd0
> [ 5311.794712] [] ? __mmdrop+0x63/0xd0
> [ 5311.798234] [] ? __fget+0x57/0x90
> [ 5311.801559] [] ? __fget_light+0x22/0x50
> [ 5311.805464] [] SyS_ioctl+0x80/0x90
> [ 5311.808885] [] sysenter_do_call+0x12/0x12
> [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound
> [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0x
> [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570
>
> Signed-off-by: Li RongQing
> Cc: Yang Zhang
> ---
>  arch/x86/kvm/vmx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5eb56ed..418e084 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4780,7 +4780,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>
>  		vmcs_write16(GUEST_INTR_STATUS, 0);
>
> -		vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> +		vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>  		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
>  	}
>
> @@ -9505,7 +9505,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  		 */
>  		vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
>  		vmx->nested.pi_pending = false;
> -		vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> +		vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>  		vmcs_write64(POSTED_INTR_DESC_ADDR,
>  			page_to_phys(vmx->nested.pi_desc_page) +
>  			(unsigned long)(vmcs12->posted_intr_desc_addr &

Indeed this is a problem for 32-bit hosts.

Paolo
[PATCH 0/1] virtio/s390: one fix
Michael,

here's one fix for the virtio-ccw driver. Patch is against master, as your
git branches on mst/vhost.git seem to be quite old. Is there some branch I
can base my own branch on, or do you prefer to take patches directly anyway?

Cornelia Huck (1):
  virtio/s390: handle error values in irb

 drivers/s390/virtio/virtio_ccw.c | 62
 1 file changed, 37 insertions(+), 25 deletions(-)

-- 
2.3.9
[PATCH 1/1] virtio/s390: handle error values in irb
The common I/O layer may pass an error value as the irb in the device's
interrupt handler (for classic channel I/O). This won't happen in current
virtio-ccw implementations, but it's better to be safe than sorry.

Let's just return the error conveyed by the irb and clear any possible
pending I/O indications.

Signed-off-by: Cornelia Huck
Reviewed-by: Guenther Hutzl
Reviewed-by: Pierre Morel
---
 drivers/s390/virtio/virtio_ccw.c | 62
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index b2a1a81..1b83159 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -984,6 +984,36 @@ static struct virtqueue *virtio_ccw_vq_by_ind(struct virtio_ccw_device *vcdev,
 	return vq;
 }
 
+static void virtio_ccw_check_activity(struct virtio_ccw_device *vcdev,
+				      __u32 activity)
+{
+	if (vcdev->curr_io & activity) {
+		switch (activity) {
+		case VIRTIO_CCW_DOING_READ_FEAT:
+		case VIRTIO_CCW_DOING_WRITE_FEAT:
+		case VIRTIO_CCW_DOING_READ_CONFIG:
+		case VIRTIO_CCW_DOING_WRITE_CONFIG:
+		case VIRTIO_CCW_DOING_WRITE_STATUS:
+		case VIRTIO_CCW_DOING_SET_VQ:
+		case VIRTIO_CCW_DOING_SET_IND:
+		case VIRTIO_CCW_DOING_SET_CONF_IND:
+		case VIRTIO_CCW_DOING_RESET:
+		case VIRTIO_CCW_DOING_READ_VQ_CONF:
+		case VIRTIO_CCW_DOING_SET_IND_ADAPTER:
+		case VIRTIO_CCW_DOING_SET_VIRTIO_REV:
+			vcdev->curr_io &= ~activity;
+			wake_up(&vcdev->wait_q);
+			break;
+		default:
+			/* don't know what to do... */
+			dev_warn(&vcdev->cdev->dev,
+				 "Suspicious activity '%08x'\n", activity);
+			WARN_ON(1);
+			break;
+		}
+	}
+}
+
 static void virtio_ccw_int_handler(struct ccw_device *cdev,
 				   unsigned long intparm,
 				   struct irb *irb)
@@ -995,6 +1025,12 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
 	if (!vcdev)
 		return;
+	if (IS_ERR(irb)) {
+		vcdev->err = PTR_ERR(irb);
+		virtio_ccw_check_activity(vcdev, activity);
+		/* Don't poke around indicators, something's wrong. */
+		return;
+	}
 	/* Check if it's a notification from the host. */
 	if ((intparm == 0) &&
 	    (scsw_stctl(&irb->scsw) ==
@@ -1010,31 +1046,7 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
 		/* Map everything else to -EIO. */
 		vcdev->err = -EIO;
 	}
-	if (vcdev->curr_io & activity) {
-		switch (activity) {
-		case VIRTIO_CCW_DOING_READ_FEAT:
-		case VIRTIO_CCW_DOING_WRITE_FEAT:
-		case VIRTIO_CCW_DOING_READ_CONFIG:
-		case VIRTIO_CCW_DOING_WRITE_CONFIG:
-		case VIRTIO_CCW_DOING_WRITE_STATUS:
-		case VIRTIO_CCW_DOING_SET_VQ:
-		case VIRTIO_CCW_DOING_SET_IND:
-		case VIRTIO_CCW_DOING_SET_CONF_IND:
-		case VIRTIO_CCW_DOING_RESET:
-		case VIRTIO_CCW_DOING_READ_VQ_CONF:
-		case VIRTIO_CCW_DOING_SET_IND_ADAPTER:
-		case VIRTIO_CCW_DOING_SET_VIRTIO_REV:
-			vcdev->curr_io &= ~activity;
-			wake_up(&vcdev->wait_q);
-			break;
-		default:
-			/* don't know what to do... */
-			dev_warn(&cdev->dev, "Suspicious activity '%08x'\n",
-				 activity);
-			WARN_ON(1);
-			break;
-		}
-	}
+	virtio_ccw_check_activity(vcdev, activity);
 	for_each_set_bit(i, &vcdev->indicators,
 			 sizeof(vcdev->indicators) * BITS_PER_BYTE) {
 		/* The bit clear must happen before the vring kick. */
-- 
2.3.9
Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
On Thu, 2015-12-03 at 16:16 +0300, Pavel Fedin wrote:
> Hello!
>
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
>
> Do you mean dereferencing info->ops->vfio_map() in .c code? I can
> introduce some wrappers in include/linux/msi.h like
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not
> conceptually change anything.

But otherwise we're parsing data structures that are possibly intended to
be private. An interface also abstracts the implementation from the caller.

> > There needs to be a real interface there that isn't specific to vfio.
>
> Hm... What else is going to use this?

I don't know, but pushing vfio specific data structures and concepts into
core kernel callbacks is clearly the wrong direction to go.

> Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to perform
> MSI mapping for all "vfio domains" registered for this container. At
> least this is how original type1 driver works.
> Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So is
> my understanding correct that "vfio domain" is IOMMU instance?

There's no such thing as a vfio domain, I think you mean iommu domains.

A vfio container represents a user iommu context. All of the groups (and
thus devices) within a container have the same iommu mappings. However, not
all of the groups are necessarily behind iommu hardware units that support
the same set of features. We might therefore need to mirror the user iommu
context for the container across multiple physical iommu contexts (aka
domains). When we walk the iommu->domain_list, we're mirroring mappings
across these multiple iommu domains within the container.
> And here come completely different ideas... > First of all, can anybody explain, why do i perform all mappings on > per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU > knows about "stream IDs", and therefore it should be capable of > distinguishing between different devices. So can i have per-device > mapping? This would make things much simpler. vfio is built on iommu groups with the premise being that an iommu group represents the smallest set of devices that are isolated from all other devices both by iommu visibility and by DMA isolation (peer-to-peer). Therefore we base iommu mappings on an iommu group because we cannot enforce userspace isolation at a sub-group level. In a system or topology that is well architected for device isolation, there will be a one-to-one mapping of iommu groups to devices. So, iommu mappings are always on a per-group basis, but the user may attach multiple groups to a single container, which as discussed above represents a single iommu context. That single context may be backed by one or more iommu domains, depending on the capabilities of the iommu hardware. You're therefore not performing all mappings on a per-iommu basis unless you have a user defined container spanning multiple iommus which are incompatible in a way that requires us to manage them with separate iommu domains. The iommu's ability to do per device mappings here is irrelevant. You're working within a user defined IOMMU context where they have decided that all of the devices should have the same context. > So: > Idea 1: do per-device mappings. In this case i don't have to track > down which devices belong to which group and which IOMMU... Nak, that doesn't solve anything. > Idea 2: What if we indeed simply simulate x86 behavior? What if we > just do 1:1 mapping for MSI register when IOMMU is initialized and > forget about it, so that MSI messages are guaranteed to reach the > host? 
> Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

x86 isn't problem-free in this space. An x86 VM is going to know that the 0xfee00000 address range is special; it won't be backed by RAM and won't be a DMA target, thus we'll never attempt to map it for an iova address. However, if we run a non-x86 VM or a userspace driver, it doesn't necessarily know that there's anything special about that range of iovas. I intend to resolve this with an extension to the iommu info ioctl that describes the available iova space for the iommu. The interrupt region would simply be excluded. This may be an option for you too, but you need to consider whether it precludes things like hotplug. Even in the x86 case, if we have a non-x86 VM and we try to hot-add a PCI device, we can't dynamically remove the RAM that would interfere with the MSI vector block. I don't know what that looks like on your platform, whether you can pick a fixed range for the
[PATCH] KVM: VMX: fix read/write sizes of VMCS fields
In theory this should have broken EPT on 32-bit kernels (due to reading the high part of natural-width field GUEST_CR3). Not sure if no one noticed or the processor behaves differently from the documentation. Signed-off-by: Paolo Bonzini--- arch/x86/kvm/vmx.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c39737ff0581..b1af1e48070b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4868,7 +4868,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) seg_setup(VCPU_SREG_CS); vmcs_write16(GUEST_CS_SELECTOR, 0xf000); - vmcs_write32(GUEST_CS_BASE, 0x); + vmcs_writel(GUEST_CS_BASE, 0xul); seg_setup(VCPU_SREG_DS); seg_setup(VCPU_SREG_ES); @@ -4904,7 +4904,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); - vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0); + vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0); setup_msrs(vmx); @@ -7893,7 +7893,7 @@ static void dump_vmcs(void) u32 pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL); u32 secondary_exec_control = 0; unsigned long cr4 = vmcs_readl(GUEST_CR4); - u64 efer = vmcs_readl(GUEST_IA32_EFER); + u64 efer = vmcs_read64(GUEST_IA32_EFER); int i, n; if (cpu_has_secondary_exec_ctrls()) @@ -10159,7 +10159,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, * Additionally, restore L2's PDPTR to vmcs12. */ if (enable_ept) { - vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3); + vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3); vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0); vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1); vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: VMX: fix read/write sizes of VMCS fields in dump_vmcs
This was not printing the high parts of several 64-bit fields on 32-bit kernels. Separate from the previous one to make the patches easier to review. Signed-off-by: Paolo Bonzini--- arch/x86/kvm/vmx.c | 39 --- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b1af1e48070b..b1a453d78155 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7909,10 +7909,10 @@ static void dump_vmcs(void) if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) && (cr4 & X86_CR4_PAE) && !(efer & EFER_LMA)) { - pr_err("PDPTR0 = 0x%016lx PDPTR1 = 0x%016lx\n", - vmcs_readl(GUEST_PDPTR0), vmcs_readl(GUEST_PDPTR1)); - pr_err("PDPTR2 = 0x%016lx PDPTR3 = 0x%016lx\n", - vmcs_readl(GUEST_PDPTR2), vmcs_readl(GUEST_PDPTR3)); + pr_err("PDPTR0 = 0x%016llx PDPTR1 = 0x%016llx\n", + vmcs_read64(GUEST_PDPTR0), vmcs_read64(GUEST_PDPTR1)); + pr_err("PDPTR2 = 0x%016llx PDPTR3 = 0x%016llx\n", + vmcs_read64(GUEST_PDPTR2), vmcs_read64(GUEST_PDPTR3)); } pr_err("RSP = 0x%016lx RIP = 0x%016lx\n", vmcs_readl(GUEST_RSP), vmcs_readl(GUEST_RIP)); @@ -7933,16 +7933,16 @@ static void dump_vmcs(void) vmx_dump_sel("TR: ", GUEST_TR_SELECTOR); if ((vmexit_ctl & (VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER)) || (vmentry_ctl & (VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_IA32_EFER))) - pr_err("EFER = 0x%016llx PAT = 0x%016lx\n", - efer, vmcs_readl(GUEST_IA32_PAT)); - pr_err("DebugCtl = 0x%016lx DebugExceptions = 0x%016lx\n", - vmcs_readl(GUEST_IA32_DEBUGCTL), + pr_err("EFER = 0x%016llx PAT = 0x%016llx\n", + efer, vmcs_read64(GUEST_IA32_PAT)); + pr_err("DebugCtl = 0x%016llx DebugExceptions = 0x%016lx\n", + vmcs_read64(GUEST_IA32_DEBUGCTL), vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS)); if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) - pr_err("PerfGlobCtl = 0x%016lx\n", - vmcs_readl(GUEST_IA32_PERF_GLOBAL_CTRL)); + pr_err("PerfGlobCtl = 0x%016llx\n", + vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL)); if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) - pr_err("BndCfgS = 
0x%016lx\n", vmcs_readl(GUEST_BNDCFGS)); + pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS)); pr_err("Interruptibility = %08x ActivityState = %08x\n", vmcs_read32(GUEST_INTERRUPTIBILITY_INFO), vmcs_read32(GUEST_ACTIVITY_STATE)); @@ -7971,11 +7971,12 @@ static void dump_vmcs(void) vmcs_read32(HOST_IA32_SYSENTER_CS), vmcs_readl(HOST_IA32_SYSENTER_EIP)); if (vmexit_ctl & (VM_EXIT_LOAD_IA32_PAT | VM_EXIT_LOAD_IA32_EFER)) - pr_err("EFER = 0x%016lx PAT = 0x%016lx\n", - vmcs_readl(HOST_IA32_EFER), vmcs_readl(HOST_IA32_PAT)); + pr_err("EFER = 0x%016llx PAT = 0x%016llx\n", + vmcs_read64(HOST_IA32_EFER), + vmcs_read64(HOST_IA32_PAT)); if (vmexit_ctl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) - pr_err("PerfGlobCtl = 0x%016lx\n", - vmcs_readl(HOST_IA32_PERF_GLOBAL_CTRL)); + pr_err("PerfGlobCtl = 0x%016llx\n", + vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL)); pr_err("*** Control State ***\n"); pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n", @@ -7998,16 +7999,16 @@ static void dump_vmcs(void) pr_err("IDTVectoring: info=%08x errcode=%08x\n", vmcs_read32(IDT_VECTORING_INFO_FIELD), vmcs_read32(IDT_VECTORING_ERROR_CODE)); - pr_err("TSC Offset = 0x%016lx\n", vmcs_readl(TSC_OFFSET)); + pr_err("TSC Offset = 0x%016llx\n", vmcs_read64(TSC_OFFSET)); if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING) - pr_err("TSC Multiplier = 0x%016lx\n", - vmcs_readl(TSC_MULTIPLIER)); + pr_err("TSC Multiplier = 0x%016llx\n", + vmcs_read64(TSC_MULTIPLIER)); if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW) pr_err("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD)); if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR) pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV)); if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT)) - pr_err("EPT pointer = 0x%016lx\n", vmcs_readl(EPT_POINTER)); + pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER)); n = vmcs_read32(CR3_TARGET_COUNT); for (i = 0; i + 1 < n; i += 4) pr_err("CR3
Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support
On 14/11/15 22:12, Mario Smarduch wrote: > This patch adds vcpu fields to track lazy state, save host FPEXC, and > offsets to fields. > > Signed-off-by: Mario Smarduch> --- > arch/arm/include/asm/kvm_host.h | 6 ++ > arch/arm/kernel/asm-offsets.c | 2 ++ > 2 files changed, 8 insertions(+) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 3df1e97..f1bf551 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -107,6 +107,12 @@ struct kvm_vcpu_arch { > /* Interrupt related fields */ > u32 irq_lines; /* IRQ and FIQ levels */ > > + /* fp/simd dirty flag true if guest accessed register file */ > + boolvfp_dirty; I think we do not need this bool, because it is already represented by the state of the trapping bits. > + > + /* Save host FPEXC register to later restore on vcpu put */ > + u32 host_fpexc; > + > /* Exception Information */ > struct kvm_vcpu_fault_info fault; > > diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c > index 871b826..9f79712 100644 > --- a/arch/arm/kernel/asm-offsets.c > +++ b/arch/arm/kernel/asm-offsets.c > @@ -186,6 +186,8 @@ int main(void) >DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, > arch.regs.usr_regs.ARM_cpsr)); >DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr)); >DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines)); > + DEFINE(VCPU_VFP_DIRTY, offsetof(struct kvm_vcpu, arch.vfp_dirty)); > + DEFINE(VCPU_VFP_HOST_FPEXC,offsetof(struct kvm_vcpu, > arch.host_fpexc)); >DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr)); >DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.fault.hxfar)); >DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.fault.hpfar)); > Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
On 14/11/15 22:12, Mario Smarduch wrote: > This patch tracks armv7 fp/simd hardware state with a vcpu lazy flag. > On vcpu_load saves host fpexc and enables FP access, and later enables fp/simd > trapping if lazy flag is not set. On first fp/simd access trap to handler > to save host and restore guest context, disable trapping and set vcpu lazy > flag. On vcpu_put if flag is set save guest and restore host context and > always restore host fpexc. > > Signed-off-by: Mario Smarduch> --- > arch/arm/include/asm/kvm_host.h | 33 ++ > arch/arm/kvm/arm.c| 12 > arch/arm/kvm/interrupts.S | 58 > +++ > arch/arm/kvm/interrupts_head.S| 26 +- > arch/arm64/include/asm/kvm_host.h | 6 > 5 files changed, 104 insertions(+), 31 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index f1bf551..8fc7a59 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -40,6 +40,38 @@ > > #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS > > +/* > + * Reads the host FPEXC register, saves it to vcpu context and enables the > + * FP/SIMD unit. > + */ > +#ifdef CONFIG_VFPv3 > +#define kvm_enable_fpexc(vcpu) { \ > + u32 fpexc = 0; \ > + asm volatile( \ > + "mrc p10, 7, %0, cr8, cr0, 0\n" \ > + "str %0, [%1]\n"\ > + "orr %0, %0, #(1 << 30)\n" \ > + "mcr p10, 7, %0, cr8, cr0, 0\n" \ Don't you need an ISB here? Also, it would be a lot nicer if this was a real function (possibly inlined). I don't see any real reason to make this a #define. Also, you're preserving a lot of the host's FPEXC bits. Is that safe? 
> + : "+r" (fpexc) \ > + : "r" (>arch.host_fpexc) \ > + ); \ > +} > +#else > +#define kvm_enable_fpexc(vcpu) > +#endif > + > +/* Restores host FPEXC register */ > +#ifdef CONFIG_VFPv3 > +#define kvm_restore_host_fpexc(vcpu) { \ > + asm volatile( \ > + "mcr p10, 7, %0, cr8, cr0, 0\n" \ > + : : "r" (vcpu->arch.host_fpexc) \ > + ); \ > +} > +#else > +#define kvm_restore_host_fpexc(vcpu) > +#endif > + > u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); > int __attribute_const__ kvm_target_cpu(void); > int kvm_reset_vcpu(struct kvm_vcpu *vcpu); > @@ -227,6 +259,7 @@ int kvm_perf_teardown(void); > void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); > > struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); > +void kvm_restore_host_vfp_state(struct kvm_vcpu *); > > static inline void kvm_arch_hardware_disable(void) {} > static inline void kvm_arch_hardware_unsetup(void) {} > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index dc017ad..cfc348a 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -291,10 +291,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); > > kvm_arm_set_running_vcpu(vcpu); > + > + /* Save and enable FPEXC before we load guest context */ > + kvm_enable_fpexc(vcpu); > } > > void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > { > + /* If the fp/simd registers are dirty save guest, restore host. */ > + if (vcpu->arch.vfp_dirty) { See my previous comment about the dirty state. > + kvm_restore_host_vfp_state(vcpu); > + vcpu->arch.vfp_dirty = 0; > + } > + > + /* Restore host FPEXC trashed in vcpu_load */ > + kvm_restore_host_fpexc(vcpu); > + > /* >* The arch-generic KVM code expects the cpu field of a vcpu to be -1 >* if the vcpu is no longer assigned to a cpu. 
This is used for the > diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S > index 900ef6d..1ddaa89 100644 > --- a/arch/arm/kvm/interrupts.S > +++ b/arch/arm/kvm/interrupts.S > @@ -28,6 +28,26 @@ > #include "interrupts_head.S" > > .text > +/** > + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - > + * This function is called from host to save the guest, and restore host > + * fp/simd hardware context. It's placed outside of hyp start/end region. > + */ > +ENTRY(kvm_restore_host_vfp_state) > +#ifdef CONFIG_VFPv3 > + push{r4-r7} > + > + add r7, vcpu, #VCPU_VFP_GUEST > + store_vfp_state r7 > + > + add r7, vcpu, #VCPU_VFP_HOST > + ldr r7, [r7] > + restore_vfp_state r7 > + > + pop {r4-r7} > +#endif > + bx lr > +ENDPROC(kvm_restore_host_vfp_state) > Please don't mix things that are
Re: [RFC PATCH V2 02/10] Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition
On Thu, 2015-12-03 at 16:40 +0800, Lan, Tianyu wrote:
> On 12/3/2015 6:25 AM, Alex Williamson wrote:
> > I didn't see a matching kernel patch series for this, but why is the
> > kernel more capable of doing this than userspace is already?
>
> The following link is the kernel patch.
> http://marc.info/?l=kvm=144837328920989=2
>
> > These seem like pointless ioctls, we're creating a purely virtual PCI
> > capability, the kernel doesn't really need to participate in that.
>
> The VFIO kernel driver has pci_config_map, which indicates the PCI
> capability positions and lengths, and that helps to find free PCI
> config regs. The Qemu side doesn't have such info and can't get the
> exact table size of a PCI capability. If we want to add such support in
> Qemu, we need to duplicate a lot of code from vfio_pci_configs.c in
> Qemu.

That's an internal implementation detail of the kernel, not motivation for creating a new userspace ABI. QEMU can recreate this data on its own. The kernel is in no more of an authoritative position to determine capability extents than userspace.

> > Also, why are we restricting ourselves to standard capabilities?
>
> This version is to check whether it's on the right track, and we can
> extend this to PCI extended capabilities later.
>
> > That's often a crowded space and we can't always know whether an area
> > is free or not based only on it being covered by a capability. Some
> > capabilities can also appear more than once, so there's context that
> > isn't being passed to the kernel here. Thanks,
>
> The regions outside of PCI capabilities are not passed to the kernel or
> used by Qemu for MSI/MSIX. It's possible to use these places for a new
> capability. One concern is that a guest driver may abuse them, and a
> quirk masking some special regs outside of the capabilities may be
> helpful.

That's not correct, see kernel commit a7d1ea1c11b33bda2691f3294b4d735ed635535a.
Gaps between capabilities are exposed with raw read-write access from the kernel and some drivers and devices depend on this. There's also no guarantee that there's a sufficiently sized gap in conventional space. Thanks,

Alex
Re: [PATCH v4 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
On 14/11/15 22:12, Mario Smarduch wrote: > This patch tracks armv7 and armv8 fp/simd hardware state with a vcpu lazy > flag. > On vcpu_load for 32 bit guests enable FP access, and later enable fp/simd > trapping for 32 and 64 bit guests if lazy flag is not set. On first fp/simd > access trap to handler to save host and restore guest context, disable > trapping and set vcpu lazy flag. On vcpu_put if flag is set save guest and > restore host context and also save guest fpexc register. > > Signed-off-by: Mario Smarduch> --- > arch/arm/include/asm/kvm_host.h | 3 ++ > arch/arm/kvm/arm.c| 18 +++-- > arch/arm64/include/asm/kvm_asm.h | 2 + > arch/arm64/include/asm/kvm_host.h | 17 +++- > arch/arm64/kernel/asm-offsets.c | 1 + > arch/arm64/kvm/hyp.S | 83 > +-- > 6 files changed, 89 insertions(+), 35 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 8fc7a59..6960ff2 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -40,6 +40,8 @@ > > #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS > > +#define kvm_guest_is32bit(vcpu) true This should be defined as an inline function, and placed in asm/kvm_emulate.h, probably renamed as kvm_guest_vcpu_is_32bit. > + > /* > * Reads the host FPEXC register, saves to vcpu context and enables the > * FPEXC. 
> @@ -260,6 +262,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); > > struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); > void kvm_restore_host_vfp_state(struct kvm_vcpu *); > +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) {} > > static inline void kvm_arch_hardware_disable(void) {} > static inline void kvm_arch_hardware_unsetup(void) {} > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index cfc348a..7a20530 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > > kvm_arm_set_running_vcpu(vcpu); > > - /* Save and enable FPEXC before we load guest context */ > - kvm_enable_fpexc(vcpu); > + /* > + * For 32bit guest executing on arm64, enable fp/simd access in > + * EL2. On arm32 save host fpexc and then enable fp/simd access. > + */ > + if (kvm_guest_is32bit(vcpu)) > + kvm_enable_fpexc(vcpu); > } > > void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > @@ -301,10 +305,18 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > /* If the fp/simd registers are dirty save guest, restore host. */ > if (vcpu->arch.vfp_dirty) { > kvm_restore_host_vfp_state(vcpu); > + > + /* > + * For 32bit guest on arm64 save the guest fpexc register > + * in EL2 mode. > + */ > + if (kvm_guest_is32bit(vcpu)) > + kvm_save_guest_fpexc(vcpu); > + > vcpu->arch.vfp_dirty = 0; > } > > - /* Restore host FPEXC trashed in vcpu_load */ > + /* For arm32 restore host FPEXC trashed in vcpu_load. 
*/ > kvm_restore_host_fpexc(vcpu); > > /* > diff --git a/arch/arm64/include/asm/kvm_asm.h > b/arch/arm64/include/asm/kvm_asm.h > index 5e37710..c589ca9 100644 > --- a/arch/arm64/include/asm/kvm_asm.h > +++ b/arch/arm64/include/asm/kvm_asm.h > @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[]; > extern void __kvm_flush_vm_context(void); > extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa); > extern void __kvm_tlb_flush_vmid(struct kvm *kvm); > +extern void __kvm_enable_fpexc32(void); > +extern void __kvm_save_fpexc32(struct kvm_vcpu *vcpu); > > extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu); > > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index 83e65dd..6e2d6b5 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -41,6 +41,8 @@ > > #define KVM_VCPU_MAX_FEATURES 3 > > +#define kvm_guest_is32bit(vcpu) (!(vcpu->arch.hcr_el2 & HCR_RW)) > + > int __attribute_const__ kvm_target_cpu(void); > int kvm_reset_vcpu(struct kvm_vcpu *vcpu); > int kvm_arch_dev_ioctl_check_extension(long ext); > @@ -251,9 +253,20 @@ static inline void kvm_arch_hardware_unsetup(void) {} > static inline void kvm_arch_sync_events(struct kvm *kvm) {} > static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} > static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} > -static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu) {} > -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {} > + > +static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu) > +{ > + /* Enable FP/SIMD access from EL2 mode*/ > + kvm_call_hyp(__kvm_enable_fpexc32); > +} > + > +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) > +{
[PATCH] KVM: vmx: detect mismatched size in VMCS read/write
Signed-off-by: Paolo Bonzini--- I am sending this as RFC because the error messages it produces are very ugly. Because of inlining, the original line is lost. The alternative is to change vmcs_read/write/checkXX into macros, but then you need to have a single huge BUILD_BUG_ON or BUILD_BUG_ON_MSG because multiple BUILD_BUG_ON* with the same __LINE__ are not supported well. --- arch/x86/kvm/vmx.c | 100 - 1 file changed, 83 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b1a453d78155..62d958a1ec0f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1447,7 +1447,51 @@ static inline void ept_sync_context(u64 eptp) } } -static __always_inline unsigned long vmcs_readl(unsigned long field) +static __always_inline void vmcs_check16(unsigned long field) +{ +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2000, +"16-bit accessor invalid for 64-bit field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001, +"16-bit accessor invalid for 64-bit high field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000, +"16-bit accessor invalid for 32-bit high field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000, +"16-bit accessor invalid for natural width field"); +} + +static __always_inline void vmcs_check32(unsigned long field) +{ +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0, +"32-bit accessor invalid for 16-bit field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000, +"32-bit accessor invalid for natural width field"); +} + +static __always_inline void vmcs_check64(unsigned long field) +{ +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0, +"64-bit accessor invalid for 16-bit field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001, +"64-bit accessor invalid for 64-bit high field"); 
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000, +"64-bit accessor invalid for 32-bit field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000, +"64-bit accessor invalid for natural width field"); +} + +static __always_inline void vmcs_checkl(unsigned long field) +{ +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0, +"Natural width accessor invalid for 16-bit field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2000, +"Natural width accessor invalid for 64-bit field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001, +"Natural width accessor invalid for 64-bit high field"); +BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000, +"Natural width accessor invalid for 32-bit field"); +} + +static __always_inline unsigned long __vmcs_readl(unsigned long field) { unsigned long value; @@ -1458,23 +1502,32 @@ static __always_inline unsigned long vmcs_readl(unsigned long field) static __always_inline u16 vmcs_read16(unsigned long field) { - return vmcs_readl(field); + vmcs_check16(field); + return __vmcs_readl(field); } static __always_inline u32 vmcs_read32(unsigned long field) { - return vmcs_readl(field); + vmcs_check32(field); + return __vmcs_readl(field); } static __always_inline u64 vmcs_read64(unsigned long field) { + vmcs_check64(field); #ifdef CONFIG_X86_64 - return vmcs_readl(field); + return __vmcs_readl(field); #else - return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32); + return __vmcs_readl(field) | ((u64)__vmcs_readl(field+1) << 32); #endif } +static __always_inline unsigned long vmcs_readl(unsigned long field) +{ + vmcs_checkl(field); + return __vmcs_readl(field); +} + static noinline void vmwrite_error(unsigned long field, unsigned long value) { printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n", @@ -1482,7 +1535,7 @@ static noinline void vmwrite_error(unsigned long field, 
unsigned long value) dump_stack(); } -static void vmcs_writel(unsigned long field, unsigned long value) +static __always_inline void __vmcs_writel(unsigned long field, unsigned long value) { u8 error; @@ -1492,33 +1545,46 @@ static void vmcs_writel(unsigned long field, unsigned long
Re: [PATCH v2 0/5] Add virtio transport for AF_VSOCK
From: Stefan Hajnoczi
Date: Wed, 2 Dec 2015 14:43:58 +0800

> This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).

Series applied, thanks.
Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support
On 12/3/2015 7:46 AM, Marc Zyngier wrote: > On 14/11/15 22:12, Mario Smarduch wrote: >> This patch adds vcpu fields to track lazy state, save host FPEXC, and >> offsets to fields. >> >> Signed-off-by: Mario Smarduch>> --- >> arch/arm/include/asm/kvm_host.h | 6 ++ >> arch/arm/kernel/asm-offsets.c | 2 ++ >> 2 files changed, 8 insertions(+) >> >> diff --git a/arch/arm/include/asm/kvm_host.h >> b/arch/arm/include/asm/kvm_host.h >> index 3df1e97..f1bf551 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch { >> /* Interrupt related fields */ >> u32 irq_lines; /* IRQ and FIQ levels */ >> >> +/* fp/simd dirty flag true if guest accessed register file */ >> +boolvfp_dirty; > > I think we do not need this bool, because it is already represented by > the state of the trapping bits. The trapping bit state is lost on exit since they're cleared, no? > >> + >> +/* Save host FPEXC register to later restore on vcpu put */ >> +u32 host_fpexc; >> + >> /* Exception Information */ >> struct kvm_vcpu_fault_info fault; >> >> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c >> index 871b826..9f79712 100644 >> --- a/arch/arm/kernel/asm-offsets.c >> +++ b/arch/arm/kernel/asm-offsets.c >> @@ -186,6 +186,8 @@ int main(void) >>DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, >> arch.regs.usr_regs.ARM_cpsr)); >>DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr)); >>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines)); >> + DEFINE(VCPU_VFP_DIRTY,offsetof(struct kvm_vcpu, arch.vfp_dirty)); >> + DEFINE(VCPU_VFP_HOST_FPEXC, offsetof(struct kvm_vcpu, >> arch.host_fpexc)); >>DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr)); >>DEFINE(VCPU_HxFAR,offsetof(struct kvm_vcpu, >> arch.fault.hxfar)); >>DEFINE(VCPU_HPFAR,offsetof(struct kvm_vcpu, >> arch.fault.hpfar)); >> > > Thanks, > > M. 
Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support
On 03/12/15 19:21, Mario Smarduch wrote: > > > On 12/3/2015 7:46 AM, Marc Zyngier wrote: >> On 14/11/15 22:12, Mario Smarduch wrote: >>> This patch adds vcpu fields to track lazy state, save host FPEXC, and >>> offsets to fields. >>> >>> Signed-off-by: Mario Smarduch>>> --- >>> arch/arm/include/asm/kvm_host.h | 6 ++ >>> arch/arm/kernel/asm-offsets.c | 2 ++ >>> 2 files changed, 8 insertions(+) >>> >>> diff --git a/arch/arm/include/asm/kvm_host.h >>> b/arch/arm/include/asm/kvm_host.h >>> index 3df1e97..f1bf551 100644 >>> --- a/arch/arm/include/asm/kvm_host.h >>> +++ b/arch/arm/include/asm/kvm_host.h >>> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch { >>> /* Interrupt related fields */ >>> u32 irq_lines; /* IRQ and FIQ levels */ >>> >>> + /* fp/simd dirty flag true if guest accessed register file */ >>> + boolvfp_dirty; >> >> I think we do not need this bool, because it is already represented by >> the state of the trapping bits. > > The trapping bit state is lost on exit since they're cleared, no? But that's what should actually be preserved, no? At the moment, you maintain some side state to reflect what the trapping state is. You might as well keep it around all the time. Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device
On Thu, Dec 03, 2015 at 11:55:53AM -0700, Alex Williamson wrote:
> On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> > When assigning a VFIO device to a KVM guest with low latency
> > requirement, it is better to handle the interrupt in the hard
> > interrupt context, to reduce the context switch to/from the IRQ
> > thread.
> >
> > Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO
> > msi interrupt is changed to use request_threaded_irq(). The primary
> > interrupt handler tries to set the guest interrupt atomically. If it
> > fails to achieve it, a threaded interrupt handler will be invoked.
> >
> > The irq_bypass manager is extended for this purpose. The KVM eventfd
> > will provide an irqbypass consumer to handle the interrupt at hard
> > interrupt context. The producer will invoke the consumer's handler
> > then.
>
> Do you have any performance data? Thanks,

Sorry, I should have included the data in the commit messages.

The test: Launch a VM with an FPGA device, which triggers an interrupt every 1ms. The VM is launched on an isolated CPU with NOHZ_FULL enabled. Two sets of data are collected. On the guest, a special program checks the latency from the time the interrupt is triggered to the time the interrupt is received by the guest ISR. On the host, I use perf to collect the perf data.
The performance data with the patch:

Host perf data:

  [root@otcnfv02 test-tools]# perf kvm stat record -C 34 sleep 10
  [root@otcnfv02 test-tools]# perf kvm stat report

  Analyze events for all VMs, all VCPUs:

  VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time  Avg time
  EXTERNAL_INTERRUPT     9997    98.55%  99.31%    1.73us   17.07us  2.09us ( +- 0.21% )
  PAUSE_INSTRUCTION       127     1.25%   0.51%    0.69us    1.20us  0.84us ( +- 1.34% )
  MSR_WRITE                20     0.20%   0.18%    1.62us    3.21us  1.93us ( +- 3.95% )

Guest data:

  [nfv@rt-test-1 ~]$ ./run-int-test.sh
  Latency is Min: 3.74us Max: 20.08us Avg: 4.49us
  No of interrupts = 74995

The performance data without the patch:

Host perf data:

  [root@otcnfv02 test-tools]# perf kvm stat record -C 34 sleep 10
  [root@otcnfv02 test-tools]# perf kvm stat report

  Analyze events for all VMs, all VCPUs:

  VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time  Avg time
  PAUSE_INSTRUCTION    141136    87.74%  50.39%    0.69us    8.51us  0.77us ( +- 0.07% )
  EXTERNAL_INTERRUPT    19701    12.25%  49.59%    2.31us   15.93us  5.46us ( +- 0.12% )
  MSR_WRITE                24     0.01%   0.02%    1.51us    2.22us  1.91us ( +- 1.91% )

Notice: The EXTERNAL_INTERRUPT VMExit count is different with the patch (9997) and without the patch (19701). This is because with threaded IRQ, NOHZ_FULL is not working since there are two threads on the pCPU, thus we have both the FPGA device interrupt and the tick timer interrupt. After calculation, the average time for the FPGA device interrupt is 4.72 us.

Guest data:

  [nfv@rt-test-1 ~]$ ./run-int-test.sh
  Latency is Min: 6.70us Max: 50.38us Avg: 7.44us
  No of interrupts = 42639

Thanks
--jyh

> Alex
Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device
On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote: > When assigning a VFIO device to a KVM guest with low latency requirement, it > is better to handle the interrupt in the hard interrupt context, to reduce > the context switch to/from the IRQ thread. > > Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi > interrupt is changed to use request_threaded_irq(). The primary interrupt > handler tries to set the guest interrupt atomically. If it fails to achieve > it, a threaded interrupt handler will be invoked. > > The irq_bypass manager is extended for this purpose. The KVM eventfd will > provide a irqbypass consumer to handle the interrupt at hard interrupt > context. The producer will invoke the consumer's handler then.

Do you have any performance data? Thanks,

Alex
Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support
On 12/3/2015 11:24 AM, Marc Zyngier wrote: > On 03/12/15 19:21, Mario Smarduch wrote: >> >> >> On 12/3/2015 7:46 AM, Marc Zyngier wrote: >>> On 14/11/15 22:12, Mario Smarduch wrote: This patch adds vcpu fields to track lazy state, save host FPEXC, and offsets to fields. Signed-off-by: Mario Smarduch--- arch/arm/include/asm/kvm_host.h | 6 ++ arch/arm/kernel/asm-offsets.c | 2 ++ 2 files changed, 8 insertions(+) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 3df1e97..f1bf551 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -107,6 +107,12 @@ struct kvm_vcpu_arch { /* Interrupt related fields */ u32 irq_lines; /* IRQ and FIQ levels */ + /* fp/simd dirty flag true if guest accessed register file */ + boolvfp_dirty; >>> >>> I think we do not need this bool, because it is already represented by >>> the state of the trapping bits. >> >> The trapping bit state is lost on exit since they're cleared, no? > > But that's what should actually be preserved, no? At the moment, you > maintain some side state to reflect what the trapping state is. You > might as well keep it around all the time. Ok I see, you should be able to preserve and use the trap registers. I'll rework it. > > Thanks, > > M. > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
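The bookkeeping being discussed above — save/restore fp/simd state only when the guest actually touches the register file — can be modeled in plain C. This is a hypothetical userspace sketch: the names (vfp_dirty, the load/trap/put hooks) mirror the patch, but the logic only illustrates the dirty-flag idea, not the actual KVM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the lazy fp/simd switch; all names are illustrative. */
struct vcpu_model {
    bool trap_enabled;   /* fp/simd accesses trap to the hypervisor  */
    bool vfp_dirty;      /* guest touched the register file this run */
    int  saves;          /* host-context saves performed             */
    int  restores;       /* host-context restores performed          */
};

/* vcpu_load: enable trapping so the first guest fp/simd access faults. */
void model_vcpu_load(struct vcpu_model *v)
{
    v->trap_enabled = true;
}

/* First fp/simd access: save host state, restore guest state, disable
 * further trapping, and mark the register file dirty. */
void model_fpsimd_trap(struct vcpu_model *v)
{
    if (!v->trap_enabled)
        return;          /* already switched; no further traps taken */
    v->saves++;          /* save host fp/simd context                */
    v->trap_enabled = false;
    v->vfp_dirty = true;
}

/* vcpu_put: pay for the switch back only if the guest used fp/simd. */
void model_vcpu_put(struct vcpu_model *v)
{
    if (v->vfp_dirty) {
        v->restores++;   /* save guest, restore host context */
        v->vfp_dirty = false;
    }
}
```

As Marc's comment suggests, the dirty bool here is redundant with the trapping state: a run where trapping is still enabled at vcpu_put is exactly a run where the guest never touched fp/simd.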
Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV
On Thu, Dec 3, 2015 at 9:58 PM, Paolo Bonzini wrote: > Indeed this is a problem for 32-bit hosts.

I met this error when running a 32-bit kernel on an Intel Wildcat Pass platform.

-Roy
Re: [PATCH 4/5] KVM: Add the irq handling consumer
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system] Hi Yunhong, [auto build test ERROR on kvm/linux-next] [also build test ERROR on v4.4-rc3 next-20151202] url: https://github.com/0day-ci/linux/commits/Yunhong-Jiang/KVM-Extract-the-irqfd_wakeup_pollin-irqfd_wakeup_pollup/20151204-024034 base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next config: arm64-allmodconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=arm64 All errors (new ones prefixed by >>): arch/arm64/kvm/built-in.o: In function `kvm_irqfd_assign': >> :(.text+0x10a70): undefined reference to `irq_bypass_register_consumer' arch/arm64/kvm/built-in.o: In function `irqfd_shutdown': >> :(.text+0x10cd4): undefined reference to `irq_bypass_unregister_consumer' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
[PATCH 5/5] KVM: Expose x86 kvm_arch_set_irq_inatomic()
The x86 support setting irq in atomic, expose it to vfio driver. Signed-off-by: Yunhong Jiang--- arch/x86/kvm/Kconfig | 1 + include/linux/kvm_host.h | 19 --- virt/kvm/Kconfig | 3 +++ virt/kvm/eventfd.c | 9 - 4 files changed, 20 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 639a6e34500c..642e8b905c96 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -30,6 +30,7 @@ config KVM select HAVE_KVM_IRQFD select IRQ_BYPASS_MANAGER select HAVE_KVM_IRQ_BYPASS + select KVM_SET_IRQ_INATOMIC select HAVE_KVM_IRQ_ROUTING select HAVE_KVM_EVENTFD select KVM_APIC_ARCHITECTURE diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 590c46e672df..a6e237275928 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -852,9 +852,6 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level, bool line_status); int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, int irq_source_id, int level, bool line_status); -int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e, - struct kvm *kvm, int irq_source_id, - int level, bool line_status); bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_notify_acked_gsi(struct kvm *kvm, int gsi); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); @@ -1207,4 +1204,20 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq, bool set); #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */ +#ifndef CONFIG_KVM_SET_IRQ_INATOMIC +int __attribute__((weak)) kvm_arch_set_irq_inatomic( + struct kvm_kernel_irq_routing_entry *irq, + struct kvm *kvm, int irq_source_id, + int level, + bool line_status) +{ + return -EWOULDBLOCK; +} +#else +extern int kvm_arch_set_irq_inatomic( + struct kvm_kernel_irq_routing_entry *e, + struct kvm *kvm, int irq_source_id, int level, + bool line_status); +#endif + #endif diff --git a/virt/kvm/Kconfig 
b/virt/kvm/Kconfig index 7a79b6853583..7c99dd4724a4 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -50,3 +50,6 @@ config KVM_COMPAT config HAVE_KVM_IRQ_BYPASS bool + +config KVM_SET_IRQ_INATOMIC + bool diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index b20a2d1bbf73..405c26742380 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -173,15 +173,6 @@ irqfd_deactivate(struct kvm_kernel_irqfd *irqfd) queue_work(irqfd_cleanup_wq, >shutdown); } -int __attribute__((weak)) kvm_arch_set_irq_inatomic( - struct kvm_kernel_irq_routing_entry *irq, - struct kvm *kvm, int irq_source_id, - int level, - bool line_status) -{ - return -EWOULDBLOCK; -} - static int irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd) { -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] VIRT: Support runtime irq_bypass consumer
Extend the irq_bypass manager to support runtime consumers. A runtime irq_bypass consumer can handle an interrupt when the interrupt is triggered. A runtime consumer has its handle_irq() function set and passes an irq_context for the irq handling. A producer keeps a list of the runtime consumers, so that it can invoke each consumer's handle_irq() when an irq arrives.

Currently the irq_bypass manager has several code paths assuming there is only one consumer/producer pair for each token. For example, when registering the producer, it exits the loop after finding one matching consumer. This is updated to support both static consumers (like the Posted Interrupt consumer) and runtime consumers.

Signed-off-by: Yunhong Jiang--- include/linux/irqbypass.h | 8 + virt/lib/irqbypass.c | 82 +++ 2 files changed, 69 insertions(+), 21 deletions(-) diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h index 1551b5b2f4c2..d5bec0c7be3a 100644 --- a/include/linux/irqbypass.h +++ b/include/linux/irqbypass.h @@ -12,6 +12,7 @@ #define IRQBYPASS_H #include +#include struct irq_bypass_consumer; @@ -47,6 +48,9 @@ struct irq_bypass_consumer; */ struct irq_bypass_producer { struct list_head node; + /* Update side is synchronized by the lock on irqbypass.c */ + struct srcu_struct srcu; + struct list_head consumers; void *token; int irq; int (*add_consumer)(struct irq_bypass_producer *, @@ -61,6 +65,7 @@ struct irq_bypass_producer { * struct irq_bypass_consumer - IRQ bypass consumer definition * @node: IRQ bypass manager private list management * @token: opaque token to match between producer and consumer + * @sibling: consumers with same token list management * @add_producer: Connect the IRQ consumer to an IRQ producer * @del_producer: Disconnect the IRQ consumer from an IRQ producer * @stop: Perform any quiesce operations necessary prior to add/del (optional) @@ -73,6 +78,7 @@ struct irq_bypass_producer { */ struct irq_bypass_consumer { struct list_head node; + struct list_head sibling; void
*token; int (*add_producer)(struct irq_bypass_consumer *, struct irq_bypass_producer *); @@ -80,6 +86,8 @@ struct irq_bypass_consumer { struct irq_bypass_producer *); void (*stop)(struct irq_bypass_consumer *); void (*start)(struct irq_bypass_consumer *); + int (*handle_irq)(void *arg); + void *irq_context; }; int irq_bypass_register_producer(struct irq_bypass_producer *); diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c index 09a03b5a21ff..43ef9e2c77dc 100644 --- a/virt/lib/irqbypass.c +++ b/virt/lib/irqbypass.c @@ -21,6 +21,7 @@ #include #include #include +#include MODULE_LICENSE("GPL v2"); MODULE_DESCRIPTION("IRQ bypass manager utility module"); @@ -49,11 +50,8 @@ static int __connect(struct irq_bypass_producer *prod, prod->del_consumer(prod, cons); } - if (cons->start) - cons->start(cons); - if (prod->start) - prod->start(prod); - + if (!ret && cons->handle_irq) + list_add_rcu(>sibling, >consumers); return ret; } @@ -71,6 +69,11 @@ static void __disconnect(struct irq_bypass_producer *prod, if (prod->del_consumer) prod->del_consumer(prod, cons); + if (cons->handle_irq) { + list_del_rcu(>sibling); + synchronize_srcu(>srcu); + } + if (cons->start) cons->start(cons); if (prod->start) @@ -87,7 +90,8 @@ static void __disconnect(struct irq_bypass_producer *prod, int irq_bypass_register_producer(struct irq_bypass_producer *producer) { struct irq_bypass_producer *tmp; - struct irq_bypass_consumer *consumer; + struct list_head *node, *next, siblings = LIST_HEAD_INIT(siblings); + int ret; might_sleep(); @@ -96,6 +100,9 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer) mutex_lock(); + INIT_LIST_HEAD(>consumers); + init_srcu_struct(>srcu); + list_for_each_entry(tmp, , node) { if (tmp->token == producer->token) { mutex_unlock(); @@ -104,23 +111,48 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer) } } - list_for_each_entry(consumer, , node) { + list_for_each_safe(node, next, ) { + struct irq_bypass_consumer *consumer = 
container_of( + node, struct irq_bypass_consumer, node); + if (consumer->token == producer->token) { - int ret = __connect(producer, consumer); - if (ret) { - mutex_unlock(); -
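The token-matching change the patch describes — one producer connecting to every consumer that shares its token, with runtime consumers (those that set handle_irq) additionally kept on a per-producer dispatch list — can be sketched in userspace C. All names here are illustrative stand-ins for the kernel structures, and a fixed array replaces the SRCU-protected list:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNTIME 8

/* Simplified stand-ins for struct irq_bypass_consumer/producer. */
struct consumer {
    void *token;
    int (*handle_irq)(void *ctx);  /* NULL for a static consumer */
    void *irq_context;
    int connected;
};

struct producer {
    void *token;
    struct consumer *runtime[MAX_RUNTIME]; /* stand-in for prod->consumers */
    int nr_runtime;
};

/* Connect the producer to every consumer sharing its token. Runtime
 * consumers are also kept on the producer's dispatch list, mirroring the
 * list_add_rcu() the patch adds to __connect(). */
void register_producer(struct producer *p, struct consumer *all, int n)
{
    for (int i = 0; i < n; i++) {
        if (all[i].token != p->token)
            continue;              /* no early exit: match them all */
        all[i].connected = 1;
        if (all[i].handle_irq && p->nr_runtime < MAX_RUNTIME)
            p->runtime[p->nr_runtime++] = &all[i];
    }
}

static int dummy_handler(void *ctx) { (void)ctx; return 0; }
```

The point of the loop not exiting on the first match is exactly the change called out in the commit message: one token may now map to several consumers.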
[PATCH 1/5] KVM: Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
Separate the irqfd_wakeup_pollin/irqfd_wakeup_pollup from the irqfd_wakeup, so that we can reuse the logic for MSI fastpath injection. Signed-off-by: Yunhong Jiang--- virt/kvm/eventfd.c | 86 -- 1 file changed, 51 insertions(+), 35 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 46dbc0a7dfc1..c31d43b762db 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -180,6 +180,53 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic( return -EWOULDBLOCK; } +static int +irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd) +{ + struct kvm *kvm = irqfd->kvm; + struct kvm_kernel_irq_routing_entry irq; + unsigned seq; + int idx, ret; + + idx = srcu_read_lock(>irq_srcu); + do { + seq = read_seqcount_begin(>irq_entry_sc); + irq = irqfd->irq_entry; + } while (read_seqcount_retry(>irq_entry_sc, seq)); + /* An event has been signaled, inject an interrupt */ + ret = kvm_arch_set_irq_inatomic(, kvm, + KVM_USERSPACE_IRQ_SOURCE_ID, 1, + false); + srcu_read_unlock(>irq_srcu, idx); + + return ret; +} + +static int +irqfd_wakeup_pollup(struct kvm_kernel_irqfd *irqfd) +{ + struct kvm *kvm = irqfd->kvm; + unsigned long flags; + + spin_lock_irqsave(>irqfds.lock, flags); + + /* +* We must check if someone deactivated the irqfd before +* we could acquire the irqfds.lock since the item is +* deactivated from the KVM side before it is unhooked from +* the wait-queue. If it is already deactivated, we can +* simply return knowing the other side will cleanup for us. 
+* We cannot race against the irqfd going away since the +* other side is required to acquire wqh->lock, which we hold +*/ + if (irqfd_is_active(irqfd)) + irqfd_deactivate(irqfd); + + spin_unlock_irqrestore(>irqfds.lock, flags); + + return 0; +} + /* * Called with wqh->lock held and interrupts disabled */ @@ -189,45 +236,14 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) struct kvm_kernel_irqfd *irqfd = container_of(wait, struct kvm_kernel_irqfd, wait); unsigned long flags = (unsigned long)key; - struct kvm_kernel_irq_routing_entry irq; - struct kvm *kvm = irqfd->kvm; - unsigned seq; - int idx; - if (flags & POLLIN) { - idx = srcu_read_lock(>irq_srcu); - do { - seq = read_seqcount_begin(>irq_entry_sc); - irq = irqfd->irq_entry; - } while (read_seqcount_retry(>irq_entry_sc, seq)); - /* An event has been signaled, inject an interrupt */ - if (kvm_arch_set_irq_inatomic(, kvm, - KVM_USERSPACE_IRQ_SOURCE_ID, 1, - false) == -EWOULDBLOCK) + if (flags & POLLIN) + if (irqfd_wakeup_pollin(irqfd) == -EWOULDBLOCK) schedule_work(>inject); - srcu_read_unlock(>irq_srcu, idx); - } - if (flags & POLLHUP) { + if (flags & POLLHUP) /* The eventfd is closing, detach from KVM */ - unsigned long flags; - - spin_lock_irqsave(>irqfds.lock, flags); - - /* -* We must check if someone deactivated the irqfd before -* we could acquire the irqfds.lock since the item is -* deactivated from the KVM side before it is unhooked from -* the wait-queue. If it is already deactivated, we can -* simply return knowing the other side will cleanup for us. 
-* We cannot race against the irqfd going away since the -* other side is required to acquire wqh->lock, which we hold -*/ - if (irqfd_is_active(irqfd)) - irqfd_deactivate(irqfd); - - spin_unlock_irqrestore(>irqfds.lock, flags); - } + irqfd_wakeup_pollup(irqfd); return 0; } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
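The seqcount-protected snapshot that the new irqfd_wakeup_pollin() helper takes of irqfd->irq_entry follows the standard read-retry protocol. A minimal single-threaded userspace model of that protocol (the real kernel primitives add memory barriers that are omitted here; all names are illustrative):

```c
#include <assert.h>

/* Toy seqcount: odd value means a writer is in progress. */
struct seqcount { unsigned seq; };
struct entry    { int gsi; };

struct irqfd_model {
    struct seqcount sc;
    struct entry irq_entry;
};

unsigned read_begin(struct seqcount *s)              { return s->seq & ~1u; }
int      read_retry(struct seqcount *s, unsigned v)  { return s->seq != v;  }
void     write_begin(struct seqcount *s)             { s->seq++; /* odd  */ }
void     write_end(struct seqcount *s)               { s->seq++; /* even */ }

/* Copy a consistent snapshot of the routing entry, retrying if a writer
 * raced with the read -- the same shape as the do/while in the patch. */
struct entry snapshot(struct irqfd_model *f)
{
    struct entry e;
    unsigned seq;

    do {
        seq = read_begin(&f->sc);
        e = f->irq_entry;
    } while (read_retry(&f->sc, seq));
    return e;
}
```

This is why the helper can safely run in hard-irq context: readers never block writers, they just retry.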
[PATCH 3/5] VFIO: Support threaded interrupt handling on VFIO
For a VFIO device with MSI interrupt type, it's possible to handle the interrupt in hard interrupt context without invoking the interrupt thread. Handling the interrupt in hard interrupt context reduces the interrupt latency.

Signed-off-by: Yunhong Jiang--- drivers/vfio/pci/vfio_pci_intrs.c | 39 ++- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 3b3ba15558b7..108d335c5656 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -236,12 +236,35 @@ static void vfio_intx_disable(struct vfio_pci_device *vdev) kfree(vdev->ctx); } +static irqreturn_t vfio_msihandler(int irq, void *arg) +{ + struct vfio_pci_irq_ctx *ctx = arg; + struct irq_bypass_producer *producer = >producer; + struct irq_bypass_consumer *consumer; + int ret = IRQ_HANDLED, idx; + + idx = srcu_read_lock(>srcu); + + list_for_each_entry_rcu(consumer, >consumers, sibling) { + /* +* Invoke the thread handler if any consumer would block, but +* finish all consumers.
+*/ + if (consumer->handle_irq(consumer->irq_context) == -EWOULDBLOCK) + ret = IRQ_WAKE_THREAD; + continue; + } + + srcu_read_unlock(>srcu, idx); + return ret; +} + /* * MSI/MSI-X */ -static irqreturn_t vfio_msihandler(int irq, void *arg) +static irqreturn_t vfio_msihandler_threaded(int irq, void *arg) { - struct eventfd_ctx *trigger = arg; + struct eventfd_ctx *trigger = ((struct vfio_pci_irq_ctx *)arg)->trigger; eventfd_signal(trigger, 1); return IRQ_HANDLED; @@ -318,7 +341,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, return -EINVAL; if (vdev->ctx[vector].trigger) { - free_irq(irq, vdev->ctx[vector].trigger); + free_irq(irq, >ctx[vector]); irq_bypass_unregister_producer(>ctx[vector].producer); kfree(vdev->ctx[vector].name); eventfd_ctx_put(vdev->ctx[vector].trigger); @@ -353,8 +376,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, pci_write_msi_msg(irq, ); } - ret = request_irq(irq, vfio_msihandler, 0, - vdev->ctx[vector].name, trigger); + /* +* Currently the primary handler for the thread_irq will be invoked on +* a thread, the IRQF_ONESHOT is a hack for it. +*/ + ret = request_threaded_irq(irq, vfio_msihandler, + vfio_msihandler_threaded, + IRQF_ONESHOT, vdev->ctx[vector].name, + >ctx[vector]); if (ret) { kfree(vdev->ctx[vector].name); eventfd_ctx_put(trigger); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
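The primary-handler loop in this patch visits every consumer and only falls back to the IRQ thread if at least one of them returned -EWOULDBLOCK. A minimal userspace simulation of that aggregation (hypothetical names; plain constants stand in for irqreturn_t and the errno):

```c
#include <assert.h>

#define SIM_EWOULDBLOCK 11                 /* stand-in for EWOULDBLOCK   */
enum sim_irqreturn { SIM_IRQ_HANDLED = 1, SIM_IRQ_WAKE_THREAD = 2 };

struct consumer_sim {
    int (*handle_irq)(void *ctx);
    void *irq_context;
};

/* Mirrors the loop in the patch: every consumer runs, and a single
 * -EWOULDBLOCK from any of them is enough to wake the thread. */
enum sim_irqreturn msihandler_sim(struct consumer_sim *c, int n)
{
    enum sim_irqreturn ret = SIM_IRQ_HANDLED;

    for (int i = 0; i < n; i++)
        if (c[i].handle_irq(c[i].irq_context) == -SIM_EWOULDBLOCK)
            ret = SIM_IRQ_WAKE_THREAD;   /* defer, but keep iterating */
    return ret;
}

static int ok_handler(void *ctx)    { (void)ctx; return 0; }
static int block_handler(void *ctx) { (void)ctx; return -SIM_EWOULDBLOCK; }
```

Returning IRQ_WAKE_THREAD is what hands control to the (renamed) eventfd-signalling threaded handler, so the slow path is only taken when the fast path genuinely could not complete.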
Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On Wed, Dec 2, 2015 at 6:08 AM, Lan, Tianyuwrote: > On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote: >>> >>> But >>> it requires guest OS to do specific configurations inside and rely on >>> bonding driver which blocks it work on Windows. >>> From performance side, >>> putting VF and virtio NIC under bonded interface will affect their >>> performance even when not do migration. These factors block to use VF >>> NIC passthough in some user cases(Especially in the cloud) which require >>> migration. >> >> >> That's really up to guest. You don't need to do bonding, >> you can just move the IP and mac from userspace, that's >> possible on most OS-es. >> >> Or write something in guest kernel that is more lightweight if you are >> so inclined. What we are discussing here is the host-guest interface, >> not the in-guest interface. >> >>> Current solution we proposed changes NIC driver and Qemu. Guest Os >>> doesn't need to do special thing for migration. >>> It's easy to deploy >> >> >> >> Except of course these patches don't even work properly yet. >> >> And when they do, even minor changes in host side NIC hardware across >> migration will break guests in hard to predict ways. > > > Switching between PV and VF NIC will introduce network stop and the > latency of hotplug VF is measurable. For some user cases(cloud service > and OPNFV) which are sensitive to network stabilization and performance, > these are not friend and blocks SRIOV NIC usage in these case. We hope > to find a better way to make SRIOV NIC work in these cases and this is > worth to do since SRIOV NIC provides better network performance compared > with PV NIC. Current patches have some issues. I think we can find > solution for them andimprove them step by step. I still believe the concepts being put into use here are deeply flawed. You are assuming you can somehow complete the migration while the device is active and I seriously doubt that is the case. 
You are going to cause data corruption, or worse, a kernel panic when you end up corrupting the guest memory. You have to halt the device at some point in order to complete the migration. Now I fully agree it is best to do this for as small a window as possible.

I really think that your best approach would be to embrace and extend the current solution that makes use of bonding. The first step would be to make it so that you don't have to hot-plug the VF until just before you halt the guest, instead of before you start the migration. Just doing that would yield a significant gain in terms of performance during the migration. In addition, something like that should be doable without being overly invasive into the drivers. A few tweaks to the DMA API and you could probably have that resolved.

As far as avoiding the hot-plug itself, that would be better handled as a separate follow-up, and really belongs more to the PCI layer than the NIC device drivers. The device drivers should already have code for handling a suspend/resume due to a power cycle event. If you could make use of that, then it is just a matter of implementing something in the hot-plug or PCIe drivers that would allow QEMU to signal when the device needs to go into D3 and when it can resume normal operation at D0. You could probably use the PCI Bus Master Enable bit as the test of whether the device is ready for migration: if the bit is set you cannot migrate the VM, and if it is cleared then you are ready to migrate.
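The readiness test suggested at the end — treat a cleared Bus Master Enable bit in the PCI Command register as "device quiesced, safe to migrate" — would look roughly like this against a snapshot of config space. This is only a sketch of the proposed heuristic, not code from any driver:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_OFF    0x04  /* Command register, config space offset */
#define PCI_COMMAND_MASTER 0x04  /* Bus Master Enable bit                 */

/* Given a little-endian snapshot of a function's config space, report
 * whether migration could start under the proposed rule: bus mastering
 * off implies no DMA in flight, so the device is quiesced. */
int ready_to_migrate(const uint8_t *cfg)
{
    uint16_t cmd = (uint16_t)cfg[PCI_COMMAND_OFF] |
                   ((uint16_t)cfg[PCI_COMMAND_OFF + 1] << 8);

    return !(cmd & PCI_COMMAND_MASTER);
}
```

In a real implementation the hypervisor would read the Command register through its VFIO/config-space view rather than a raw buffer; the offsets and bit value above follow the PCI specification.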
[PATCH 0/5] Threaded MSI interrupt for VFIO PCI device
When assigning a VFIO device to a KVM guest with a low latency requirement, it is better to handle the interrupt in hard interrupt context, to reduce the context switches to/from the IRQ thread.

Based on the discussion at https://lkml.org/lkml/2015/10/26/764, the VFIO MSI interrupt is changed to use request_threaded_irq(). The primary interrupt handler tries to set the guest interrupt atomically. If it fails, a threaded interrupt handler will be invoked.

The irq_bypass manager is extended for this purpose. The KVM eventfd will provide an irqbypass consumer to handle the interrupt in hard interrupt context. The producer will then invoke the consumer's handler.

Yunhong Jiang (5):
  Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
  Support runtime irq_bypass consumer
  Support threaded interrupt handling on VFIO
  Add the irq handling consumer
  Expose x86 kvm_arch_set_irq_inatomic()

 arch/x86/kvm/Kconfig              |   1 +
 drivers/vfio/pci/vfio_pci_intrs.c |  39 ++--
 include/linux/irqbypass.h         |   8 +++
 include/linux/kvm_host.h          |  19 +-
 include/linux/kvm_irqfd.h         |   1 +
 virt/kvm/Kconfig                  |   3 +
 virt/kvm/eventfd.c                | 131 ++
 virt/lib/irqbypass.c              |  82 ++--
 8 files changed, 214 insertions(+), 70 deletions(-)

-- 1.8.3.1
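The two-stage delivery the cover letter describes — attempt an "inatomic" injection from hard interrupt context, and defer to the slower deferred-work path only when that attempt returns -EWOULDBLOCK — reduces to a small decision function. A hypothetical model (the can_atomic flag stands in for whether kvm_arch_set_irq_inatomic() can complete the route without sleeping, e.g. an MSI route on x86 per patch 5/5):

```c
#include <assert.h>

#define SIM_EWOULDBLOCK 11

struct delivery_stats {
    int fast;   /* injected directly in hard-irq context        */
    int slow;   /* deferred: schedule_work(&irqfd->inject) path */
};

/* One interrupt arrival: try the atomic injection, fall back if needed. */
void deliver(struct delivery_stats *s, int can_atomic)
{
    /* Stand-in for kvm_arch_set_irq_inatomic(): 0 on success. */
    int ret = can_atomic ? 0 : -SIM_EWOULDBLOCK;

    if (ret == -SIM_EWOULDBLOCK)
        s->slow++;
    else
        s->fast++;
}
```

The performance numbers earlier in the thread come from exactly this split: the more deliveries land on the fast counter, the fewer context switches to the IRQ thread.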
[PATCH 4/5] KVM: Add the irq handling consumer
Add an irq_bypass consumer to the KVM eventfd, so that when a MSI interrupt happens and triggerred from VFIO, it can be handled fast. Signed-off-by: Yunhong Jiang--- include/linux/kvm_irqfd.h | 1 + virt/kvm/eventfd.c| 42 ++ 2 files changed, 43 insertions(+) diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h index 0c1de05098c8..5573d53ccebb 100644 --- a/include/linux/kvm_irqfd.h +++ b/include/linux/kvm_irqfd.h @@ -65,6 +65,7 @@ struct kvm_kernel_irqfd { poll_table pt; struct work_struct shutdown; struct irq_bypass_consumer consumer; + struct irq_bypass_consumer fastpath; struct irq_bypass_producer *producer; }; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index c31d43b762db..b20a2d1bbf73 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -144,6 +144,8 @@ irqfd_shutdown(struct work_struct *work) #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS irq_bypass_unregister_consumer(>consumer); #endif + if (irqfd->fastpath.token) + irq_bypass_unregister_consumer(>fastpath); eventfd_ctx_put(irqfd->eventfd); kfree(irqfd); } @@ -203,6 +205,14 @@ irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd) } static int +kvm_fastpath_irq(void *arg) +{ + struct kvm_kernel_irqfd *irqfd = arg; + + return irqfd_wakeup_pollin(irqfd); +} + +static int irqfd_wakeup_pollup(struct kvm_kernel_irqfd *irqfd) { struct kvm *kvm = irqfd->kvm; @@ -296,6 +306,34 @@ int __attribute__((weak)) kvm_arch_update_irqfd_routing( } #endif +static int kvm_fastpath_stub(struct irq_bypass_consumer *stub, +struct irq_bypass_producer *stub1) +{ + return 0; +} + +static void kvm_fastpath_stub1(struct irq_bypass_consumer *stub, +struct irq_bypass_producer *stub1) +{ +} + +static int setup_fastpath_consumer(struct kvm_kernel_irqfd *irqfd) +{ + int ret; + + irqfd->fastpath.token = (void *)irqfd->eventfd; + irqfd->fastpath.add_producer = kvm_fastpath_stub; + irqfd->fastpath.del_producer = kvm_fastpath_stub1; + irqfd->fastpath.handle_irq = kvm_fastpath_irq; + irqfd->fastpath.irq_context = irqfd; + ret = 
irq_bypass_register_consumer(>fastpath); + + if (ret) + /* A special tag to indicate consumer not working */ + irqfd->fastpath.token = (void *)0; + return ret; +} + static int kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) { @@ -435,6 +473,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) irqfd->consumer.token, ret); #endif + if (setup_fastpath_consumer(irqfd)) + pr_info("irq bypass fastpath consumer (toke %p) registration fails: %d\n", + irqfd->eventfd, ret); + return 0; fail: -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
On 12/3/2015 8:13 AM, Marc Zyngier wrote: > On 14/11/15 22:12, Mario Smarduch wrote: >> This patch tracks armv7 and armv8 fp/simd hardware state with a vcpu lazy >> flag. >> On vcpu_load for 32 bit guests enable FP access, and later enable fp/simd >> trapping for 32 and 64 bit guests if lazy flag is not set. On first fp/simd >> access trap to handler to save host and restore guest context, disable >> trapping and set vcpu lazy flag. On vcpu_put if flag is set save guest and >> restore host context and also save guest fpexc register. >> >> Signed-off-by: Mario Smarduch>> --- >> arch/arm/include/asm/kvm_host.h | 3 ++ >> arch/arm/kvm/arm.c| 18 +++-- >> arch/arm64/include/asm/kvm_asm.h | 2 + >> arch/arm64/include/asm/kvm_host.h | 17 +++- >> arch/arm64/kernel/asm-offsets.c | 1 + >> arch/arm64/kvm/hyp.S | 83 >> +-- >> 6 files changed, 89 insertions(+), 35 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_host.h >> b/arch/arm/include/asm/kvm_host.h >> index 8fc7a59..6960ff2 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -40,6 +40,8 @@ >> >> #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS >> >> +#define kvm_guest_is32bit(vcpu) true > > This should be defined as an inline function, and placed in > asm/kvm_emulate.h, probably renamed as kvm_guest_vcpu_is_32bit. Will do, this header file should also resolve my problems in armv7. > >> + >> /* >> * Reads the host FPEXC register, saves to vcpu context and enables the >> * FPEXC. 
>> @@ -260,6 +262,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); >> >> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); >> void kvm_restore_host_vfp_state(struct kvm_vcpu *); >> +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) {} >> >> static inline void kvm_arch_hardware_disable(void) {} >> static inline void kvm_arch_hardware_unsetup(void) {} >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> index cfc348a..7a20530 100644 >> --- a/arch/arm/kvm/arm.c >> +++ b/arch/arm/kvm/arm.c >> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) >> >> kvm_arm_set_running_vcpu(vcpu); >> >> -/* Save and enable FPEXC before we load guest context */ >> -kvm_enable_fpexc(vcpu); >> +/* >> + * For 32bit guest executing on arm64, enable fp/simd access in >> + * EL2. On arm32 save host fpexc and then enable fp/simd access. >> + */ >> +if (kvm_guest_is32bit(vcpu)) >> +kvm_enable_fpexc(vcpu); >> } >> >> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) >> @@ -301,10 +305,18 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) >> /* If the fp/simd registers are dirty save guest, restore host. */ >> if (vcpu->arch.vfp_dirty) { >> kvm_restore_host_vfp_state(vcpu); >> + >> +/* >> + * For 32bit guest on arm64 save the guest fpexc register >> + * in EL2 mode. >> + */ >> +if (kvm_guest_is32bit(vcpu)) >> +kvm_save_guest_fpexc(vcpu); >> + >> vcpu->arch.vfp_dirty = 0; >> } >> >> -/* Restore host FPEXC trashed in vcpu_load */ >> +/* For arm32 restore host FPEXC trashed in vcpu_load. 
*/ >> kvm_restore_host_fpexc(vcpu); >> >> /* >> diff --git a/arch/arm64/include/asm/kvm_asm.h >> b/arch/arm64/include/asm/kvm_asm.h >> index 5e37710..c589ca9 100644 >> --- a/arch/arm64/include/asm/kvm_asm.h >> +++ b/arch/arm64/include/asm/kvm_asm.h >> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[]; >> extern void __kvm_flush_vm_context(void); >> extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa); >> extern void __kvm_tlb_flush_vmid(struct kvm *kvm); >> +extern void __kvm_enable_fpexc32(void); >> +extern void __kvm_save_fpexc32(struct kvm_vcpu *vcpu); >> >> extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu); >> >> diff --git a/arch/arm64/include/asm/kvm_host.h >> b/arch/arm64/include/asm/kvm_host.h >> index 83e65dd..6e2d6b5 100644 >> --- a/arch/arm64/include/asm/kvm_host.h >> +++ b/arch/arm64/include/asm/kvm_host.h >> @@ -41,6 +41,8 @@ >> >> #define KVM_VCPU_MAX_FEATURES 3 >> >> +#define kvm_guest_is32bit(vcpu) (!(vcpu->arch.hcr_el2 & HCR_RW)) >> + >> int __attribute_const__ kvm_target_cpu(void); >> int kvm_reset_vcpu(struct kvm_vcpu *vcpu); >> int kvm_arch_dev_ioctl_check_extension(long ext); >> @@ -251,9 +253,20 @@ static inline void kvm_arch_hardware_unsetup(void) {} >> static inline void kvm_arch_sync_events(struct kvm *kvm) {} >> static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} >> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} >> -static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu) {} >> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {} >> + >> +static inline void
Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV
On 2015/12/3 13:29, roy.qing...@gmail.com wrote: From: Li RongQingPOSTED_INTR_NV is 16bit, should not use 64bit write function [ 5311.676074] vmwrite error: reg 3 value 0 (err 12) [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 4.1.13-WR8.0.0.0_standard #1 [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [ 5311.699550] e69a7e1c c1950de1 e69a7e38 fafcff45 fafebd24 [ 5311.706924] 0003 000c b6a06dfa e69a7e40 fafcff79 e69a7eb0 fafd5f57 [ 5311.714296] e69a7ec0 c1080600 0001 c0e18018 01be 0b43 [ 5311.721651] Call Trace: [ 5311.722942] [] dump_stack+0x4b/0x75 [ 5311.726467] [] vmwrite_error+0x35/0x40 [kvm_intel] [ 5311.731444] [] vmcs_writel+0x29/0x30 [kvm_intel] [ 5311.736228] [] vmx_create_vcpu+0x337/0xb90 [kvm_intel] [ 5311.741600] [] ? dequeue_task_fair+0x2e0/0xf60 [ 5311.746197] [] kvm_arch_vcpu_create+0x3a/0x70 [kvm] [ 5311.751278] [] kvm_vm_ioctl+0x14d/0x640 [kvm] [ 5311.755771] [] ? free_pages_prepare+0x1a4/0x2d0 [ 5311.760455] [] ? debug_smp_processor_id+0x12/0x20 [ 5311.765333] [] ? sched_move_task+0xbe/0x170 [ 5311.769621] [] ? kmem_cache_free+0x213/0x230 [ 5311.774016] [] ? kvm_set_memory_region+0x60/0x60 [kvm] [ 5311.779379] [] do_vfs_ioctl+0x2e2/0x500 [ 5311.783285] [] ? kmem_cache_free+0x213/0x230 [ 5311.787677] [] ? __mmdrop+0x63/0xd0 [ 5311.791196] [] ? __mmdrop+0x63/0xd0 [ 5311.794712] [] ? __mmdrop+0x63/0xd0 [ 5311.798234] [] ? __fget+0x57/0x90 [ 5311.801559] [] ? 
__fget_light+0x22/0x50 [ 5311.805464] [] SyS_ioctl+0x80/0x90 [ 5311.808885] [] sysenter_do_call+0x12/0x12 [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0x [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570 Signed-off-by: Li RongQing Cc: Yang Zhang --- arch/x86/kvm/vmx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 5eb56ed..418e084 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4780,7 +4780,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write16(GUEST_INTR_STATUS, 0); - vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR); + vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((>pi_desc))); } @@ -9505,7 +9505,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) */ vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv; vmx->nested.pi_pending = false; - vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR); + vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); vmcs_write64(POSTED_INTR_DESC_ADDR, page_to_phys(vmx->nested.pi_desc_page) + (unsigned long)(vmcs12->posted_intr_desc_addr & Yes, it is a mistake from my original patch.:( Reviewed-by: Yang Zhang -- best regards yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
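The underlying reason vmcs_write16() is the right accessor here: in the VMCS field-encoding scheme, bits 14:13 of the field number encode the access width (0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural width), and POSTED_INTR_NV's encoding marks it as a 16-bit field. A small check, using field numbers as defined in the kernel's arch/x86/include/asm/vmx.h:

```c
#include <assert.h>
#include <stdint.h>

/* VMCS field encodings (values as in the kernel's vmx.h). */
#define POSTED_INTR_NV        0x00000002u  /* 16-bit control field */
#define POSTED_INTR_DESC_ADDR 0x00002016u  /* 64-bit control field */

/* Bits 14:13 of the encoding give the field's access width:
 * 0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural width. */
unsigned vmcs_field_width(uint32_t encoding)
{
    return (encoding >> 13) & 0x3;
}
```

A 64-bit write to a 16-bit field is what produced the `vmwrite error: reg 3` above on 32-bit hosts, where the accessor issues a second VMWRITE to encoding+1 for the high half.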
Re: [PATCH v4 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
On 12/3/2015 7:58 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch tracks armv7 fp/simd hardware state with a vcpu lazy flag.
>> On vcpu_load saves host fpexc and enables FP access, and later enables
>> fp/simd trapping if lazy flag is not set. On first fp/simd access trap
>> to handler to save host and restore guest context, disable trapping
>> and set vcpu lazy flag. On vcpu_put if flag is set save guest and
>> restore host context and always restore host fpexc.
>>
>> Signed-off-by: Mario Smarduch
>> ---
>>  arch/arm/include/asm/kvm_host.h   | 33 ++
>>  arch/arm/kvm/arm.c                | 12 
>>  arch/arm/kvm/interrupts.S         | 58 +++
>>  arch/arm/kvm/interrupts_head.S    | 26 +-
>>  arch/arm64/include/asm/kvm_host.h |  6 
>>  5 files changed, 104 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index f1bf551..8fc7a59 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -40,6 +40,38 @@
>>
>>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>>
>> +/*
>> + * Reads the host FPEXC register, saves it to vcpu context and enables the
>> + * FP/SIMD unit.
>> + */
>> +#ifdef CONFIG_VFPv3
>> +#define kvm_enable_fpexc(vcpu) {\
>> +	u32 fpexc = 0; \
>> +	asm volatile( \
>> +	"mrc p10, 7, %0, cr8, cr0, 0\n" \
>> +	"str %0, [%1]\n" \
>> +	"orr %0, %0, #(1 << 30)\n" \
>> +	"mcr p10, 7, %0, cr8, cr0, 0\n" \
>
> Don't you need an ISB here?

Yes it does (B2.7.3) - was thinking something else but the manual is
clear here.

> Also, it would be a lot nicer if this was a real function (possibly
> inlined). I don't see any real reason to make this a #define.

Had some trouble reconciling arm and arm64 compile making this a
function in kvm_host.h. I'll work to resolve it.

> Also, you're preserving a lot of the host's FPEXC bits. Is that safe?

No it may not be, should just set the enable bit.

>
>> +	: "+r" (fpexc) \
>> +	: "r" (&vcpu->arch.host_fpexc) \
>> +	); \
>> +}
>> +#else
>> +#define kvm_enable_fpexc(vcpu)
>> +#endif
>> +
>> +/* Restores host FPEXC register */
>> +#ifdef CONFIG_VFPv3
>> +#define kvm_restore_host_fpexc(vcpu) { \
>> +	asm volatile( \
>> +	"mcr p10, 7, %0, cr8, cr0, 0\n" \
>> +	: : "r" (vcpu->arch.host_fpexc) \
>> +	); \
>> +}
>> +#else
>> +#define kvm_restore_host_fpexc(vcpu)
>> +#endif
>> +
>>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>>  int __attribute_const__ kvm_target_cpu(void);
>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>> @@ -227,6 +259,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..cfc348a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -291,10 +291,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>
>>  	kvm_arm_set_running_vcpu(vcpu);
>> +
>> +	/* Save and enable FPEXC before we load guest context */
>> +	kvm_enable_fpexc(vcpu);
>>  }
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +	/* If the fp/simd registers are dirty save guest, restore host. */
>> +	if (vcpu->arch.vfp_dirty) {
>
> See my previous comment about the dirty state.

Yes that change seems to be working out fine.

>
>> +		kvm_restore_host_vfp_state(vcpu);
>> +		vcpu->arch.vfp_dirty = 0;
>> +	}
>> +
>> +	/* Restore host FPEXC trashed in vcpu_load */
>> +	kvm_restore_host_fpexc(vcpu);
>> +
>>  	/*
>>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>  	 * if the vcpu is no longer assigned to a cpu. This is used for the
>>
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..1ddaa89 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -28,6 +28,26 @@
>>  #include "interrupts_head.S"
>>
>>  .text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
>> + * This function is called from host to save the guest, and restore host
>> + * fp/simd hardware context.
Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().
On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
> Hi Pavel,
>
> Thanks for getting to the bottom of this.
>
> On 1 December 2015 at 14:03, Pavel Fedin wrote:
> > This function takes stage-2 physical addresses (a.k.a. IPA) on input,
> > not real physical addresses. This causes kvm_is_device_pfn() to return
> > wrong values, depending on how closely the guest and host memory maps
> > match. This results in completely broken KVM on some boards. The
> > problem has been caught on Samsung proprietary hardware.
> >
> > Cc: sta...@vger.kernel.org
> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>
> That commit is not in a release yet, so no need for cc stable
[...]

But it is cc'd to stable, so unless it is going to be nacked at review
stage, any subsequent fixes should also be cc'd.

Ben.

-- 
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison
Re: [PATCH] KVM: VMX: fix read/write sizes of VMCS fields
On 2015/12/3 23:11, Paolo Bonzini wrote:
> In theory this should have broken EPT on 32-bit kernels (due to reading
> the high part of natural-width field GUEST_CR3). Not sure if no one
> noticed or the processor behaves differently from the documentation.

It seems we check the success of vmcs_write but not of vmcs_read.
Shouldn't vmcs_read be checked as well?

> Signed-off-by: Paolo Bonzini
> ---
>  arch/x86/kvm/vmx.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c39737ff0581..b1af1e48070b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4868,7 +4868,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	seg_setup(VCPU_SREG_CS);
>  	vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
> -	vmcs_write32(GUEST_CS_BASE, 0xffff0000);
> +	vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);
>
>  	seg_setup(VCPU_SREG_DS);
>  	seg_setup(VCPU_SREG_ES);
> @@ -4904,7 +4904,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
>  	vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
> -	vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0);
> +	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);
>
>  	setup_msrs(vmx);
> @@ -7893,7 +7893,7 @@ static void dump_vmcs(void)
>  	u32 pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
>  	u32 secondary_exec_control = 0;
>  	unsigned long cr4 = vmcs_readl(GUEST_CR4);
> -	u64 efer = vmcs_readl(GUEST_IA32_EFER);
> +	u64 efer = vmcs_read64(GUEST_IA32_EFER);
>  	int i, n;
>
>  	if (cpu_has_secondary_exec_ctrls())
> @@ -10159,7 +10159,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	 * Additionally, restore L2's PDPTR to vmcs12.
>  	 */
>  	if (enable_ept) {
> -		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
> +		vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3);
>  		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
>  		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
>  		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);

-- 
best regards
yang
Re: [PATCH net-next 3/3] vhost_net: basic polling support
On 12/02/2015 08:36 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2015 at 01:04:03PM +0800, Jason Wang wrote:
>> On 12/01/2015 10:43 PM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 01, 2015 at 01:17:49PM +0800, Jason Wang wrote:
>>>> On 11/30/2015 06:44 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 25, 2015 at 03:11:29PM +0800, Jason Wang wrote:
>>>>>> This patch tries to poll for new added tx buffer or socket receive
>>>>>> queue for a while at the end of tx/rx processing. The maximum time
>>>>>> spent on polling were specified through a new kind of vring ioctl.
>>>>>>
>>>>>> Signed-off-by: Jason Wang
>>>>>
>>>>> One further enhancement would be to actually poll the underlying
>>>>> device. This should be reasonably straight-forward with macvtap
>>>>> (especially in the passthrough mode).
>>>>
>>>> Yes, it is. I have some patches to do this by replacing
>>>> skb_queue_empty() with sk_busy_loop() but for tap.
>>>
>>> We probably don't want to do this unconditionally, though.
>>
>> Tests does not show any improvement but some regression.
>
>>> Did you add code to call sk_mark_napi_id on tap then?
>>> sk_busy_loop won't do anything useful without.
>>
>> Yes I did. Probably something wrong elsewhere.
>
> Is this for guest-to-guest?

Nope. Like you said below, since it requires NAPI it was external host to
guest.

> the patch to do napi for tap is still not upstream due to minor
> performance regression. Want me to repost it?

Sure, I've played with this a little bit in the past too.

> Maybe it's better to test macvtap.

Same thing ...