RE: [PATCH v4 3/7] KVM: arm/arm64: Fix the documentation
Hello!

> > --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
> > +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
> > @@ -44,28 +44,29 @@ Groups:
> >    Attributes:
> >      The attr field of kvm_device_attr encodes two values:
> >      bits:   | 63 .... 40 | 39 .. 32 | 31 .... 0 |
> > -   values: |  reserved  |  cpu id  |  offset   |
> > +   values: |  reserved  | cpu idx  |  offset   |
>
> why should this be changed to cpu idx?

Because it's an index (from 0 to N - 1), and "cpu id" may confuse readers into thinking it should be an MPIDR affinity value. In the register access function we do "vcpu = kvm_get_vcpu(dev->kvm, cpuid);" (see here: http://lxr.free-electrons.com/source/virt/kvm/arm/vgic-v2-emul.c#L664), and kvm_get_vcpu just indexes the array: http://lxr.free-electrons.com/source/include/linux/kvm_host.h#L427

I decided to change this after http://www.spinics.net/lists/kvm-arm/msg16359.html, where Andre clearly mistook this ID for an affinity value. Before GICv3 nobody saw the difference, because we had only up to 16 CPUs, with IDs from 0 to 15, i.e. corresponding to the indexes.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: nVMX: expose VPID capability to L1
On 9/29/15 6:39 PM, Paolo Bonzini wrote:
> On 29/09/2015 04:55, Wanpeng Li wrote:
>> Expose VPID capability to L1.
>>
>> Signed-off-by: Wanpeng Li
>> ---
>> v1 -> v2:
>>  * set only VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT
>
> Thanks. I've checked more thoroughly your implementation against the
> SDM now, and there are a few missing things between this patch and the
> one that emulates INVVPID:
>
> - you're not setting bit 32 of the VMX_EPT_VPID_CAP MSR
> - you were not checking against the supported types in the
>   implementation of the INVVPID instruction
> - the memory operand must always be read even if it isn't needed
>   (e.g., for type==global), similar to INVEPT
> - for single-context invalidation you're not checking that VPID != 0,
>   though in practice that doesn't matter because we don't want to
>   support single-context invalidation
> - you're always setting the MSR's bits to 1 even if !enable_vpid

I believe you mean always setting the MSR's bits to 1 when !enable_ept and enable_vpid.

> At this point it's better if you resend the whole nested VPID
> implementation, i.e. the following five patches:
>
>   KVM: VMX: adjust interface to allocate/free_vpid
>   KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
>   KVM: nVMX: emulate the INVVPID instruction
>   KVM: nVMX: nested VPID emulation
>   KVM: nVMX: expose VPID capability to L1
>
> with the above issues fixed.

I just sent out v2.

> Please also send kvm-unit-tests patches that test for the error cases.

I will do it later.

Regards,
Wanpeng Li
[PATCH v2 1/5] KVM: VMX: adjust interface to allocate/free_vpid
Adjust allocate/free_vpid so that they can be reused for the nested vpid.

Reviewed-by: Wincy Van
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/vmx.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6407674..3c9e2a4a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4155,29 +4155,28 @@ static int alloc_identity_pagetable(struct kvm *kvm)
 	return r;
 }
 
-static void allocate_vpid(struct vcpu_vmx *vmx)
+static int allocate_vpid(void)
 {
 	int vpid;
 
-	vmx->vpid = 0;
 	if (!enable_vpid)
-		return;
+		return 0;
 	spin_lock(&vmx_vpid_lock);
 	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-	if (vpid < VMX_NR_VPIDS) {
-		vmx->vpid = vpid;
+	if (vpid < VMX_NR_VPIDS)
 		__set_bit(vpid, vmx_vpid_bitmap);
-	}
+	else
+		vpid = 0;
 	spin_unlock(&vmx_vpid_lock);
+	return vpid;
 }
 
-static void free_vpid(struct vcpu_vmx *vmx)
+static void free_vpid(int vpid)
 {
-	if (!enable_vpid)
+	if (!enable_vpid || vpid == 0)
 		return;
 	spin_lock(&vmx_vpid_lock);
-	if (vmx->vpid != 0)
-		__clear_bit(vmx->vpid, vmx_vpid_bitmap);
+	__clear_bit(vpid, vmx_vpid_bitmap);
 	spin_unlock(&vmx_vpid_lock);
 }
 
@@ -8492,7 +8491,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_disable_pml(vmx);
-	free_vpid(vmx);
+	free_vpid(vmx->vpid);
 	leave_guest_mode(vcpu);
 	vmx_load_vmcs01(vcpu);
 	free_nested(vmx);
@@ -8511,7 +8510,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!vmx)
 		return ERR_PTR(-ENOMEM);
 
-	allocate_vpid(vmx);
+	vmx->vpid = allocate_vpid();
 
 	err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
 	if (err)
@@ -8587,7 +8586,7 @@ free_msrs:
 uninit_vcpu:
 	kvm_vcpu_uninit(&vmx->vcpu);
 free_vcpu:
-	free_vpid(vmx);
+	free_vpid(vmx->vpid);
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 	return ERR_PTR(err);
 }
-- 
1.9.1
[PATCH v2 2/5] KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
Introduce __vmx_flush_tlb() to handle a specific vpid. It will be used by later patches; note that the "all context" variant can be mapped to vpid_sync_vcpu_single with vpid02 as the argument (a nice side effect of the vpid02 design).

Reviewed-by: Wincy Van
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/vmx.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3c9e2a4a..215db2b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1337,13 +1337,13 @@ static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 			 __loaded_vmcs_clear, loaded_vmcs, 1);
 }
 
-static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)
+static inline void vpid_sync_vcpu_single(int vpid)
 {
-	if (vmx->vpid == 0)
+	if (vpid == 0)
 		return;
 
 	if (cpu_has_vmx_invvpid_single())
-		__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx->vpid, 0);
+		__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vpid, 0);
 }
 
 static inline void vpid_sync_vcpu_global(void)
@@ -1352,10 +1352,10 @@ static inline void vpid_sync_vcpu_global(void)
 		__invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0);
 }
 
-static inline void vpid_sync_context(struct vcpu_vmx *vmx)
+static inline void vpid_sync_context(int vpid)
 {
 	if (cpu_has_vmx_invvpid_single())
-		vpid_sync_vcpu_single(vmx);
+		vpid_sync_vcpu_single(vpid);
 	else
 		vpid_sync_vcpu_global();
 }
@@ -3441,9 +3441,9 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 {
-	vpid_sync_context(to_vmx(vcpu));
+	vpid_sync_context(vpid);
 	if (enable_ept) {
 		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 			return;
@@ -3451,6 +3451,11 @@ static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+{
+	__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+}
+
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
 {
 	ulong cr0_guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;
@@ -4784,7 +4789,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_fpu_activate(vcpu);
 	update_exception_bitmap(vcpu);
 
-	vpid_sync_context(vmx);
+	vpid_sync_context(vmx->vpid);
 }
 
 /*
-- 
1.9.1
[PATCH v2 0/5] KVM: nVMX: nested VPID emulation
v1 -> v2:
 * set bit 32 of the VMX_EPT_VPID_CAP MSR
 * check against the supported types in the implementation of the
   INVVPID instruction
 * always read the memory operand even if it isn't needed (e.g., for
   type==global), similar to INVEPT
 * for single-context invalidation, check that VPID != 0, though in
   practice that doesn't matter because we don't want to support
   single-context invalidation
 * don't set the MSR's EPT-related bits if !enable_ept

VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses the same VPID to run L1 and all its guests, so KVM flushes the VPID when switching between L1 and L2.

This patchset advertises VPID to the L1 hypervisor, so that the address spaces of L1 and L2 can be treated separately and the TLB flush avoided when switching between L1 and L2. On each nested vmentry, if vpid12 has changed, the shadow vpid is reused with an invvpid.

Performance: run lmbench on L2 with a 3.5 kernel.

Context switching - times in microseconds - smaller is better
----------------------------------------------------------------------------
Host   OS            2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                     ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
------ ------------- ------ ------ ------ ------ ------ ------- -------
kernel Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.6     2.88000  nested VPID
kernel Linux 3.5.0-1 1.2600 1.4300 1.5600 12.7   12.9   3.49000 7.46000  vanilla

Wanpeng Li (5):
  KVM: VMX: adjust interface to allocate/free_vpid
  KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
  KVM: nVMX: emulate the INVVPID instruction
  KVM: nVMX: nested VPID emulation
  KVM: nVMX: expose VPID capability to L1

 arch/x86/include/asm/vmx.h |   3 +
 arch/x86/kvm/vmx.c         | 164 ++++++++++++++++++++++++++++----------
 2 files changed, 129 insertions(+), 38 deletions(-)

-- 
1.9.1
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On Thu, Oct 08, 2015 at 12:10:35PM +0300, Pavel Fedin wrote:
> Hello!
>
[...]
> > The architecture defines how to address a specific CPU, and that's using
> > the MPIDR, not inventing our own scheme, so that's what we should do.
>
> But doesn't the same apply to GICv2 then? It just happened so that we had
> Aff0 == idx, therefore it looks like nobody cared.

I don't think it's about caring, but we (I) didn't consider it when
designing that API.

> My argument is to have the GICv3 API as similar to GICv2 as possible;
> IMHO that would make it easier to maintain the code, and perhaps give
> some way to reuse it.

Plenty of things are broken about the VGICv2 implementation and API, so
my goal is not to have GICv3 be similar to GICv2, but to design a good
API.

> > For example, I don't think you had the full 32-bits available to address
> > a CPU in your proposal, so if nothing else, that proposal had less
> > expressive power than what the architecture defines.
>
> My proposal was:
>
> --- cut ---
> KVM_DEV_ARM_VGIC_GRP_DIST_REGS
>   Attributes:
>     The attr field of kvm_device_attr encodes two values:
>     bits:   | 63   | 62 .. 52 | 51 .. 32 | 31 .... 0 |
>     values: | size | reserved | cpu idx  |  offset   |
>
>     All distributor regs can be accessed as (rw, 32-bit).
>     For GICv3 some registers are actually (rw, 64-bit) according to the
>     specification. In order to perform a full 64-bit access the 'size'
>     bit should be set to 1. The KVM_DEV_ARM_VGIC_64BIT flag value is
>     provided for this purpose.
> --- cut ---
>
> Bit 63 was meant to specify the access size (u64 or u32), and bits 62 - 52
> were reserved just in order to match the ARM64_SYS_REG() macro, which uses
> these bits for its own purpose.
>
> But, since your proposal suggests that all GICv3 accesses are 64-bit wide,
> and we use our own system register encoding (I don't see any serious
> problems with this), it is possible just to use bits 63...32 for the vCPU
> index.
> So, the maximum number of CPUs would be the same 0x as in your proposal,
> and the API would be fully compatible with the GICv2 one (well, except
> access size), and we would use the same definitions and the same approach
> to handle both. This would bring more consistency and less confusion to
> userspace developers who use the API.

I don't agree; using the same API with slightly different semantics
depending on which device you created is much more error prone than
having a clear separation between the two different APIs, IMHO.

> By the way, the rest of the KVM API (KVM_IRQ_LINE, for example) also uses
> the vCPU index.
>
> That's all my arguments for vCPU index instead of affinity value

I'm not convinced that we should be compatible with GICv2 at all.

(The deeper argument here is that GICv2 had an architectural limitation
to 8 CPUs, so all the CPU addressing is simple in that case. This is a
fundamental evolution from GICv2 to GICv3, so it is only natural that
there are substantial changes in this area.)

I'll let Marc or Peter chime in if they disagree.

> , and if you say "that doesn't count, we don't have to be compatible with
> GICv2", I'll accept this and proceed with the rewrite.

Cool!

Thanks,
-Christoffer
RE: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
Hello!

> +The mpidr encoding is based on the affinity information in the
> +architecture defined MPIDR, and the field is encoded as follows:
> +  | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 |
> +  |    Aff3    |    Aff2    |    Aff1    |    Aff0    |

One concern about this... Does it already have "We are Bosses, we Decided it, It's not subject to change, We do not care" status?

Actually, the current approach of using an index instead of the Aff bits works pretty well. It works fine with qemu too, because internally qemu also maintains a CPU index, which it uses for GICv2 accesses. Keeping the index allows more code to be reused, and provides better backwards compatibility. So could we do this?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
RE: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation
Hello!

Sorry for taking up your time, and thank you very much for the explanation.

> I'd appreciate if you could try to read and understand the architecture
> spec instead of randomly googling and quoting various bits of
> irrelevant information.

My apologies for not having had time to read the whole spec from beginning to end. I can only add that it's quite weird to have these important things in the "Terminology" section; I would expect them to be in 6.1, for example. That was the part I read, but I failed to find the exact answer:

--- cut ---
LPIs do not have an active state, and transition to the inactive state
on being acknowledged by a PE
--- cut ---

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
[PATCH v2 5/5] KVM: nVMX: expose VPID capability to L1
Expose VPID capability to L1. For nested guests, we don't do anything specific for single context invalidation. Hence, only advertise support for global context invalidation. The major benefit of nested VPID comes from having separate vpids when switching between L1 and L2, and also when L2's vCPUs are not scheduled in/out on L1.

Reviewed-by: Wincy Van
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/vmx.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 31d272e..22b4dc7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -442,7 +442,7 @@ struct nested_vmx {
 	u32 nested_vmx_true_entry_ctls_low;
 	u32 nested_vmx_misc_low;
 	u32 nested_vmx_misc_high;
-	u32 nested_vmx_ept_caps;
+	u64 nested_vmx_ept_vpid_caps;
 };
 
 #define POSTED_INTR_ON 0
@@ -2489,18 +2489,22 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		/* nested EPT: emulate EPT also to L1 */
 		vmx->nested.nested_vmx_secondary_ctls_high |=
 			SECONDARY_EXEC_ENABLE_EPT;
-		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
+		vmx->nested.nested_vmx_ept_vpid_caps = VMX_EPT_PAGE_WALK_4_BIT |
 			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
 			 VMX_EPT_INVEPT_BIT;
-		vmx->nested.nested_vmx_ept_caps &= vmx_capability.ept;
+		vmx->nested.nested_vmx_ept_vpid_caps &= vmx_capability.ept;
 		/*
 		 * For nested guests, we don't do anything specific
 		 * for single context invalidation. Hence, only advertise
 		 * support for global context invalidation.
 		 */
-		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT;
+		vmx->nested.nested_vmx_ept_vpid_caps |= VMX_EPT_EXTENT_GLOBAL_BIT;
 	} else
-		vmx->nested.nested_vmx_ept_caps = 0;
+		vmx->nested.nested_vmx_ept_vpid_caps = 0;
+
+	if (enable_vpid)
+		vmx->nested.nested_vmx_ept_vpid_caps |= (VMX_VPID_INVVPID_BIT |
+			VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT) << 32;
 
 	if (enable_unrestricted_guest)
 		vmx->nested.nested_vmx_secondary_ctls_high |=
@@ -2616,8 +2620,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 			vmx->nested.nested_vmx_secondary_ctls_high);
 		break;
 	case MSR_IA32_VMX_EPT_VPID_CAP:
-		/* Currently, no nested vpid support */
-		*pdata = vmx->nested.nested_vmx_ept_caps;
+		*pdata = vmx->nested.nested_vmx_ept_vpid_caps;
 		break;
 	default:
 		return 1;
@@ -7152,7 +7155,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 
 	if (!(vmx->nested.nested_vmx_secondary_ctls_high &
 	      SECONDARY_EXEC_ENABLE_EPT) ||
-	    !(vmx->nested.nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+	    !(vmx->nested.nested_vmx_ept_vpid_caps & VMX_EPT_INVEPT_BIT)) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return 1;
 	}
@@ -7168,7 +7171,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
 
-	types = (vmx->nested.nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
+	types = (vmx->nested.nested_vmx_ept_vpid_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
 
 	if (!(types & (1UL << type))) {
 		nested_vmx_failValid(vcpu,
@@ -7207,14 +7210,15 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 static int handle_invvpid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 vmx_instruction_info;
+	u32 vmx_instruction_info, types;
 	unsigned long type;
 	gva_t gva;
 	struct x86_exception e;
 	int vpid;
 
 	if (!(vmx->nested.nested_vmx_secondary_ctls_high &
-	      SECONDARY_EXEC_ENABLE_VPID)) {
+	      SECONDARY_EXEC_ENABLE_VPID) ||
+	    !(vmx->nested.nested_vmx_ept_vpid_caps & (VMX_VPID_INVVPID_BIT << 32))) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return 1;
 	}
@@ -7225,6 +7229,14 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
 
+	types = (vmx->nested.nested_vmx_ept_vpid_caps >> VMX_VPID_EXTENT_SHIFT) & 0x7;
+
+	if (!(types & (1UL << type))) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		return 1;
+	}
+
 	/* according to the intel vmx instruction
[PATCH v2 4/5] KVM: nVMX: nested VPID emulation
VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses the same VPID to run L1 and all its guests, so KVM flushes the VPID when switching between L1 and L2.

This patch advertises VPID to the L1 hypervisor, so that the address spaces of L1 and L2 can be treated separately and the TLB flush avoided when switching between L1 and L2. On each nested vmentry, if vpid12 has changed, the shadow vpid is reused with an invvpid.

Performance: run lmbench on L2 with a 3.5 kernel.

Context switching - times in microseconds - smaller is better
----------------------------------------------------------------------------
Host   OS            2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                     ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
------ ------------- ------ ------ ------ ------ ------ ------- -------
kernel Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.6     2.88000  nested VPID
kernel Linux 3.5.0-1 1.2600 1.4300 1.5600 12.7   12.9   3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka
Reviewed-by: Wincy Van
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/vmx.c | 39 +++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 87d042a..31d272e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -424,6 +424,9 @@ struct nested_vmx {
 	/* to migrate it to L2 if VM_ENTRY_LOAD_DEBUG_CONTROLS is off */
 	u64 vmcs01_debugctl;
 
+	u16 vpid02;
+	u16 last_vpid;
+
 	u32 nested_vmx_procbased_ctls_low;
 	u32 nested_vmx_procbased_ctls_high;
 	u32 nested_vmx_true_procbased_ctls_low;
@@ -1157,6 +1160,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
 	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
 }
 
+static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
+}
+
 static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
 {
 	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
@@ -2471,6 +2479,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 		SECONDARY_EXEC_RDTSCP |
 		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+		SECONDARY_EXEC_ENABLE_VPID |
 		SECONDARY_EXEC_APIC_REGISTER_VIRT |
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 		SECONDARY_EXEC_WBINVD_EXITING |
@@ -6680,6 +6689,7 @@ static void free_nested(struct vcpu_vmx *vmx)
 		return;
 
 	vmx->nested.vmxon = false;
+	free_vpid(vmx->nested.vpid02);
 	nested_release_vmcs12(vmx);
 	if (enable_shadow_vmcs)
 		free_vmcs(vmx->nested.current_shadow_vmcs);
@@ -7234,7 +7244,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 			return 1;
 		}
-		vmx_flush_tlb(vcpu);
+		__vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
 		nested_vmx_succeed(vcpu);
 		break;
 	default:
@@ -8610,8 +8620,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		goto free_vmcs;
 	}
 
-	if (nested)
+	if (nested) {
 		nested_vmx_setup_ctls_msrs(vmx);
+		vmx->nested.vpid02 = allocate_vpid();
+	}
 
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
@@ -8632,6 +8644,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	return &vmx->vcpu;
 
 free_vmcs:
+	free_vpid(vmx->nested.vpid02);
 	free_loaded_vmcs(vmx->loaded_vmcs);
 free_msrs:
 	kfree(vmx->guest_msrs);
@@ -9493,12 +9506,24 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 
 	if (enable_vpid) {
 		/*
-		 * Trivially support vpid by letting L2s share their parent
-		 * L1's vpid. TODO: move to a more elaborate solution, giving
-		 * each L2 its own vpid and exposing the vpid feature to L1.
+		 * There is no direct mapping between vpid02 and vpid12, the
+		 * vpid02 is per-vCPU for L0 and reused while the value of
+		 * vpid12 is changed w/ one invvpid during nested vmentry.
+		 * The vpid12 is allocated by L1 for L2, so it will not
+		 * influence global bitmap(for vpid01 and vpid02 allocation)
+		 * even if spawn a lot of nested vCPUs.
 		 */
-		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
-		vmx_flush_tlb(vcpu);
+		if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02) {
+
[PATCH v2 3/5] KVM: nVMX: emulate the INVVPID instruction
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van
Signed-off-by: Wanpeng Li
---
 arch/x86/include/asm/vmx.h |  3 +++
 arch/x86/kvm/vmx.c         | 49 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 448b7ca..af5fdaf 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -397,8 +397,10 @@ enum vmcs_field {
 #define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT	(KVM_USER_MEM_SLOTS + 2)
 
 #define VMX_NR_VPIDS				(1 << 16)
+#define VMX_VPID_EXTENT_INDIVIDUAL_ADDR		0
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT		1
 #define VMX_VPID_EXTENT_ALL_CONTEXT		2
+#define VMX_VPID_EXTENT_SHIFT			40
 
 #define VMX_EPT_EXTENT_INDIVIDUAL_ADDR		0
 #define VMX_EPT_EXTENT_CONTEXT			1
@@ -416,6 +418,7 @@ enum vmcs_field {
 #define VMX_EPT_EXTENT_CONTEXT_BIT		(1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT		(1ull << 26)
 
+#define VMX_VPID_INVVPID_BIT			(1ull << 0) /* (32 - 32) */
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT	(1ull << 9) /* (41 - 32) */
 #define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT	(1ull << 10) /* (42 - 32) */
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 215db2b..87d042a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7196,7 +7196,54 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 
 static int handle_invvpid(struct kvm_vcpu *vcpu)
 {
-	kvm_queue_exception(vcpu, UD_VECTOR);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	u32 vmx_instruction_info;
+	unsigned long type;
+	gva_t gva;
+	struct x86_exception e;
+	int vpid;
+
+	if (!(vmx->nested.nested_vmx_secondary_ctls_high &
+	      SECONDARY_EXEC_ENABLE_VPID)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+	/* according to the intel vmx instruction reference, the memory
+	 * operand is read even if it isn't needed (e.g., for type==global)
+	 */
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmx_instruction_info, false, &gva))
+		return 1;
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vpid,
+				sizeof(u32), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	switch (type) {
+	case VMX_VPID_EXTENT_ALL_CONTEXT:
+		if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+			return 1;
+		}
+		vmx_flush_tlb(vcpu);
+		nested_vmx_succeed(vcpu);
+		break;
+	default:
+		/* Trap single context invalidation invvpid calls */
+		BUG_ON(1);
+		break;
+	}
+
+	skip_emulated_instruction(vcpu);
 	return 1;
 }
 
-- 
1.9.1
Re: [PATCH] KVM: arm/arm64: Fix memory leak if timer initialization fails
On Tue, Oct 06, 2015 at 11:14:35AM +0300, Pavel Fedin wrote:
> Jump to correct label and free kvm_host_cpu_state
>
> Signed-off-by: Pavel Fedin
> ---
>  arch/arm/kvm/arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..78b2869 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -1080,7 +1080,7 @@ static int init_hyp_mode(void)
>  	 */
>  	err = kvm_timer_hyp_init();
>  	if (err)
> -		goto out_free_mappings;
> +		goto out_free_context;
>
>  #ifndef CONFIG_HOTPLUG_CPU
>  	free_boot_hyp_pgd();

Nice catch. Applied, thanks.

-Christoffer
RE: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
Hello!

> > One concern about this... Does it already have "We are Bosses, we Decided
> > it, It's not subject to change, We do not care" status?
>
> That's a rather negative question.

Sorry, I didn't want to offend anyone. I just wanted to say that I know that you, as maintainers, have much more power than I do, and you can always say "it's a political decision, we just want it and that's final"; if you choose to do this, I'm perfectly OK with that, just say it.

> The architecture defines how to address a specific CPU, and that's using
> the MPIDR, not inventing our own scheme, so that's what we should do.

But doesn't the same apply to GICv2 then? It just happened so that we had Aff0 == idx, therefore it looks like nobody cared.

My argument is to have the GICv3 API as similar to GICv2 as possible; IMHO that would make it easier to maintain the code, and perhaps give some way to reuse it.

> For example, I don't think you had the full 32-bits available to address
> a CPU in your proposal, so if nothing else, that proposal had less
> expressive power than what the architecture defines.

My proposal was:

--- cut ---
KVM_DEV_ARM_VGIC_GRP_DIST_REGS
  Attributes:
    The attr field of kvm_device_attr encodes two values:
    bits:   | 63   | 62 .. 52 | 51 .. 32 | 31 .... 0 |
    values: | size | reserved | cpu idx  |  offset   |

    All distributor regs can be accessed as (rw, 32-bit).
    For GICv3 some registers are actually (rw, 64-bit) according to the
    specification. In order to perform a full 64-bit access the 'size'
    bit should be set to 1. The KVM_DEV_ARM_VGIC_64BIT flag value is
    provided for this purpose.
--- cut ---

Bit 63 was meant to specify the access size (u64 or u32), and bits 62 - 52 were reserved just in order to match the ARM64_SYS_REG() macro, which uses these bits for its own purpose.

But, since your proposal suggests that all GICv3 accesses are 64-bit wide, and we use our own system register encoding (I don't see any serious problems with this), it is possible just to use bits 63...32 for the vCPU index. So, the maximum number of CPUs would be the same 0x as in your proposal, and the API would be fully compatible with the GICv2 one (well, except access size), and we would use the same definitions and the same approach to handle both. This would bring more consistency and less confusion to userspace developers who use the API.

By the way, the rest of the KVM API (KVM_IRQ_LINE, for example) also uses the vCPU index.

That's all my arguments for vCPU index instead of affinity value, and if you say "that doesn't count, we don't have to be compatible with GICv2", I'll accept this and proceed with the rewrite.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On Thu, Oct 08, 2015 at 10:17:09AM +0300, Pavel Fedin wrote:
> Hello!
>
> > +The mpidr encoding is based on the affinity information in the
> > +architecture defined MPIDR, and the field is encoded as follows:
> > +  | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 |
> > +  |    Aff3    |    Aff2    |    Aff1    |    Aff0    |
>
> One concern about this... Does it already have "We are Bosses, we Decided
> it, It's not subject to change, We do not care" status?

That's a rather negative question.

We spent a fair amount of time discussing this during SFO15 and arrived
at the conclusion that this was the way to go.  If there are good
arguments for why this is not sufficient or causes problems, then of
course we'll make changes as required.

> Actually, current approach with using index instead of Aff bits works
> pretty well. It works fine with qemu too, because internally qemu also
> maintains CPU index, which it uses for GICv2 accesses.
> Keeping index allows to reuse more code, and provides better backwards
> compatibility. So could we do this?

The architecture defines how to address a specific CPU, and that's using
the MPIDR, not inventing our own scheme, so that's what we should do.

For example, I don't think you had the full 32-bits available to address
a CPU in your proposal, so if nothing else, that proposal had less
expressive power than what the architecture defines.

I also don't buy the argument that this saves a lot of code.  If you
have something that deals with a cpu index, surely two simple functions
can allow the same amount of code reuse:

  unsigned long mpidr_to_vcpu_idx(unsigned long mpidr);
  unsigned long vcpu_idx_to_mpidr(unsigned long vcpu_idx);

-Christoffer
Re: [PATCH v4 3/7] KVM: arm/arm64: Fix the documentation
On Thu, Oct 08, 2015 at 09:52:02AM +0300, Pavel Fedin wrote:
> Hello!
>
> > > --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
> > > +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
> > > @@ -44,28 +44,29 @@ Groups:
> > >    Attributes:
> > >      The attr field of kvm_device_attr encodes two values:
> > >      bits:   | 63 .... 40 | 39 .. 32 | 31 .... 0 |
> > > -   values: |  reserved  |  cpu id  |  offset   |
> > > +   values: |  reserved  | cpu idx  |  offset   |
> >
> > why should this be changed to cpu idx?
>
> Because it's index (from 0 to N - 1), and "cpu id" may confuse readers that
> it should be MPIDR affinity value. In register access function we do
> "vcpu = kvm_get_vcpu(dev->kvm, cpuid);" (see here:
> http://lxr.free-electrons.com/source/virt/kvm/arm/vgic-v2-emul.c#L664),
> and kvm_get_vcpu just indexes the array:
> http://lxr.free-electrons.com/source/include/linux/kvm_host.h#L427
> I decided to change this after
> http://www.spinics.net/lists/kvm-arm/msg16359.html, Andre clearly
> mistook this ID for being affinity value.
> Before GICv3 nobody saw the difference because we had only up to 16 CPUs,
> with IDs from 0 to 15, i.e. corresponding to indexes.

ok, fair enough.  This kind of rationale is helpful to put in the commit
text though.

-Christoffer
RE: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
Hello!

> Yes, I am looking at merging this. From the discussion with Pavel I
> remember some things that I disagreed with, so I may propose a follow-up
> patch. I will give this a try tomorrow.

I reply to this thread, because this relates to the whole changeset. After the merge, some pieces are missing, considering my live migration W.I.P (see patch below). Together with this, vITS v3 backported to v4.2.2 and...

Tested-by: Pavel Fedin
---
From b08e9ba1da69f9cf5731c89a4ff39561cd16e6ea Mon Sep 17 00:00:00 2001
From: Pavel Fedin
Date: Thu, 8 Oct 2015 14:43:23 +0300
Subject: [PATCH] Missing vITS pieces for live migration and safety

1. Split up vits_init() and perform allocations on PENDBASER access.
   Fixes crash when setting LPI registers from userspace before any
   vCPU has been run. The bug is triggered by the reset routine in qemu.
2. Properly handle LPIs in vgic_unqueue_irqs(). Does not corrupt memory
   any more if an LPI happens to get in there.

Signed-off-by: Pavel Fedin
---
 virt/kvm/arm/its-emul.c     | 26 +++---
 virt/kvm/arm/its-emul.h     |  1 +
 virt/kvm/arm/vgic-v3-emul.c | 11 ++-
 virt/kvm/arm/vgic.c         | 15 +++
 4 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index b40a7fc..b1d61df 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -1117,7 +1117,9 @@ int vits_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct vgic_its *its = &dist->its;
-	int ret;
+
+	if (dist->pendbaser)
+		return 0;
 
 	dist->pendbaser = kcalloc(dist->nr_cpus, sizeof(u64), GFP_KERNEL);
 	if (!dist->pendbaser)
@@ -1132,18 +1134,27 @@ int vits_init(struct kvm *kvm)
 	INIT_LIST_HEAD(&its->device_list);
 	INIT_LIST_HEAD(&its->collection_list);
 
-	ret = vgic_register_kvm_io_dev(kvm, dist->vgic_its_base,
-				       KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges,
-				       -1, &its->iodev);
-	if (ret)
-		return ret;
-
 	its->enabled = false;
 	dist->msis_require_devid = true;
 
 	return 0;
 }
 
+int vits_map_resources(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct vgic_its *its = &dist->its;
+	int ret;
+
+	ret = vits_init(kvm);
+	if (ret)
+		return ret;
+
+	return vgic_register_kvm_io_dev(kvm, dist->vgic_its_base,
+					KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges,
+					-1, &its->iodev);
+}
+
 void vits_destroy(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
@@ -1182,6 +1193,7 @@ void vits_destroy(struct kvm *kvm)
 	kfree(its->buffer_page);
 	kfree(dist->pendbaser);
+	dist->pendbaser = NULL;
 	its->enabled = false;
 	spin_unlock(&its->lock);
 }
diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h
index 95e56a7..236f153 100644
--- a/virt/kvm/arm/its-emul.h
+++ b/virt/kvm/arm/its-emul.h
@@ -34,6 +34,7 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 
 int vits_init(struct kvm *kvm);
+int vits_map_resources(struct kvm *kvm);
 void vits_destroy(struct kvm *kvm);
 
 int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 3e262f3..b340202 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -693,6 +693,15 @@ static bool handle_mmio_pendbaser_redist(struct kvm_vcpu *vcpu,
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	int mode = ACCESS_READ_VALUE;
 
+	/*
+	 * This makes sure that dist->pendbaser is allocated.
+	 * We don't use any locks here because the actual initialization will
+	 * happen either during register access from userspace, or upon first
+	 * run. Both situations are already single-threaded.
+	 */
+	if (vits_init(vcpu->kvm))
+		return false;
+
 	/* Storing a value with LPIs already enabled is undefined */
 	mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
 	vgic_reg64_access(mmio, offset,
@@ -880,7 +889,7 @@ static int vgic_v3_map_resources(struct kvm *kvm,
 	}
 
 	if (vgic_has_its(kvm)) {
-		ret = vits_init(kvm);
+		ret = vits_map_resources(kvm);
 		if (ret)
 			goto out_unregister;
 	}
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index b32baa0..8dbbb1a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -711,6 +711,7 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
  */
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	u64 elrsr = vgic_get_elrsr(vcpu);
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On 8 October 2015 at 10:10, Pavel Fedin wrote:
> Sorry, didn't want to offend anyone. I just wanted to tell that i know
> that you, as maintainers, have much more power than i do, and you can
> always say "it's political decision, we just want it and that's final",
> and if you choose to do this, i'm perfectly OK with that, just say it.

This isn't intended to be a political decision; it's our joint technical opinion on the best design for this API. Since we all happened to be in one physical location the other week it was a good opportunity for us to work through how we thought the API should look from a technical perspective. At that point it seemed to us to be clearer to write up the results of that discussion as a single patch to the API documentation, rather than doing it by (for instance) making lots of different review comments on your patches.

Christoffer's API documentation patch is a patch like any other and it has to go through review. If there are parts where you don't understand the rationale or you think we got something wrong you should let us know.

thanks
-- PMM
RE: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
Hello!

> Well, compatibility with GICv2 is the biggest mistake we made when
> designing the GICv3 architecture. And that's why our emulation doesn't
> give a damn about v2 compatibility.

Ok, I see your arguments, and after that it becomes a matter of personal taste. Three beat one, so I'll go your way. :) I don't know if I'll be able to roll out v5 tomorrow, but I think on Monday I'll definitely do it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
On Thu, Oct 08, 2015 at 12:15:06PM +0100, Andre Przywara wrote:
> Hi,
>
> On 08/10/15 11:56, Marc Zyngier wrote:
> > On 08/10/15 11:14, Christoffer Dall wrote:
> >> Hi Pavel,
> >>
> >> On Fri, Oct 02, 2015 at 05:44:27PM +0300, Pavel Fedin wrote:
> >>> Current KVM code has lots of old redundancies, which can be cleaned up.
> >>> This patchset is actually a better alternative to
> >>> http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows to
> >>> keep piggy-backed LRs. The idea is based on the fact that our code also
> >>> maintains LR state in elrsr, and this information is enough to track LR
> >>> usage.
> >>>
> >>> This patchset is made against linux-next of 02.10.2015. Thanks to Andre
> >>> for pointing out some 4.3 specifics.
> >>>
> >> I'm not opposed to these changes, they clean up the data structures
> >> which is definitely a good thing.
> >>
> >> I am a bit worried about how/if this is going to conflict with the ITS
> >> series and other patches in flight touching the vgic.
> >>
> >> Marc/Andre, any thoughts on this?
> >
> > I don't mind the simplification (Andre was already removing the
> > piggybacking stuff as part of his ITS series). I'm a bit more cautious
> > about the sync_elrsr stuff, but that's mostly because I've only read the
> > patch in a superficial way.
> >
> > But yes, this is probably going to clash, unless we make this part of an
> > existing series (/me looks at André... ;-)
>
> Yes, I am looking at merging this. From the discussion with Pavel I
> remember some things that I disagreed with, so I may propose a follow-up
> patch. I will give this a try tomorrow.
>
From a maintainer perspective I'd much prefer Andre take these patches as part of his series and send everything together in one go.

Thanks,
-Christoffer
RE: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
Hello!

> I don't mind the simplification (Andre was already removing the
> piggybacking stuff as part of his ITS series). I'm a bit more cautious
> about the sync_elrsr stuff, but that's mostly because I've only read the
> patch in a superficial way.

If you are really afraid of it, you can omit 2/2. The reason why I've done it is exactly what I said in the commit message - LR setting and ELRSR sync *always* go together. The main part of the cleanup is 1/2.

> But yes, this is probably going to clash, unless we make this part of an
> existing series (/me looks at André... ;-)

It clashes a little bit - patch No 0012 needs small modifications, but I'd say they are trivial. If you want, I could rebase the whole thing on top of current kvmarm.git by myself.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
On 08/10/15 11:14, Christoffer Dall wrote:
> Hi Pavel,
>
> On Fri, Oct 02, 2015 at 05:44:27PM +0300, Pavel Fedin wrote:
>> Current KVM code has lots of old redundancies, which can be cleaned up.
>> This patchset is actually a better alternative to
>> http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows to
>> keep piggy-backed LRs. The idea is based on the fact that our code also
>> maintains LR state in elrsr, and this information is enough to track LR
>> usage.
>>
>> This patchset is made against linux-next of 02.10.2015. Thanks to Andre
>> for pointing out some 4.3 specifics.
>>
> I'm not opposed to these changes, they clean up the data structures
> which is definitely a good thing.
>
> I am a bit worried about how/if this is going to conflict with the ITS
> series and other patches in flight touching the vgic.
>
> Marc/Andre, any thoughts on this?

I don't mind the simplification (Andre was already removing the piggybacking stuff as part of his ITS series). I'm a bit more cautious about the sync_elrsr stuff, but that's mostly because I've only read the patch in a superficial way.

But yes, this is probably going to clash, unless we make this part of an existing series (/me looks at André... ;-)

M.
-- 
Jazz is not dead. It just smells funny...
RE: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
Hello!

> I am a bit worried about how/if this is going to conflict with the ITS
> series and other patches in flight touching the vgic.

Just to note: of course I test them together. This works fine at least with vITS v2, where it replaces patch 0001 from the original series. I guess it should also work fine with v3, replacing patches 0001 and 0002. Merging this at the moment. Yes, vgic_unqueue_lpi() falls out of use, but this is temporary, because it will be very useful for live migration.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?
Dear KVM Developers:

I am Xiang Song from UCloud company. We currently encounter a weird phenomenon with the Qemu-KVM iothread. We recently tried to use Linux AIO from the guest OS and found that the iothread mechanism of Qemu-KVM will reorder I/O requests from the guest OS even when the AIO write requests are issued from a single thread in order. This does not happen on the host OS, however. We are not sure whether this is a feature of the Qemu-KVM iothread mechanism or a bug.

The testbed is as follows (the guest disk device cache is configured to writethrough):
CPU: Intel(R) Xeon(R) CPU E5-2650
QEMU version: 1.5.3
Host/Guest Kernel: Both Linux 4.1.8 & Linux 2.6.32, OS type CentOS 6.5
Simplified Guest OS qemu cmd:
/usr/libexec/qemu-kvm -machine rhel6.3.0,accel=kvm,usb=off -cpu kvm64 -smp 8,sockets=8,cores=1,threads=1 -drive file=/var/lib/libvirt/images/song-disk.img,if=none,id=drive-virtio-disk0,format=qcow2,serial=UCLOUD_DISK_VDA,cache=writethrough -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:22:d5:52,bus=pci.0,addr=0x4

The test code triggering this phenomenon works as follows: it uses the Linux AIO API to issue concurrent async write requests to a file. During execution it continuously writes data into the target test file. There are 'X' jobs in total, and each job is assigned a job id JOB_ID which starts from 0. Each job writes 16 * 512 bytes of data into the target file at offset = JOB_ID * 512 (the data is the repeated uint64_t JOB_ID). There is only one thread handling the 'X' jobs one by one through Linux AIO (io_submit). When handling jobs, it continuously issues AIO requests without waiting for AIO callbacks. When it finishes, the file should look like:
[0...0][1...1][2...2][3...3]...[X-1...X-1]
Then we use a check program to test the resulting file: it reads the first 8 bytes (uint64_t) of each sector and prints them out.

In normal cases, its output is like:
0 1 2 3 ... X-1

Exec output (Set X=32):
In our guest OS, the output is abnormal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 18 18 18 18 18 24 25 26 27 28 29 30 31.
It can be seen that jobs 20~24 are overwritten by job 19.
In our host OS, the output is as expected: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31.

I can provide the example code if needed.

Best regards, song

2015-10-08

charlie.song
Re: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
Hi Pavel,

On Fri, Oct 02, 2015 at 05:44:27PM +0300, Pavel Fedin wrote:
> Current KVM code has lots of old redundancies, which can be cleaned up.
> This patchset is actually a better alternative to
> http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows to
> keep piggy-backed LRs. The idea is based on the fact that our code also
> maintains LR state in elrsr, and this information is enough to track LR
> usage.
>
> This patchset is made against linux-next of 02.10.2015. Thanks to Andre
> for pointing out some 4.3 specifics.
>
I'm not opposed to these changes, they clean up the data structures, which is definitely a good thing.

I am a bit worried about how/if this is going to conflict with the ITS series and other patches in flight touching the vgic.

Marc/Andre, any thoughts on this?

-Christoffer
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On 08/10/15 10:28, Christoffer Dall wrote:
> On Thu, Oct 08, 2015 at 12:10:35PM +0300, Pavel Fedin wrote:
>> Hello!
>>
> [...]
>>
>>> The architecture defines how to address a specific CPU, and that's using
>>> the MPIDR, not inventing our own scheme, so that's what we should do.
>>
>> But doesn't the same apply to GICv2 then? It just happened so that we had
>> Aff0 == idx, therefore it looks like nobody cared.
>
> I don't think it's about caring, but we (I) didn't consider it when
> designing that API.
>
>> My argument is to have the GICv3 API as similar to GICv2 as possible; IMHO
>> that would make it easier to maintain the code, and perhaps give some way
>> to reusing it.
>
> Plenty of things are broken about the VGICv2 implementation and API, so
> my goal is not to have GICv3 be similar to GICv2, but to design a good
> API.
>
>>> For example, I don't think you had the full 32-bits available to address
>>> a CPU in your proposal, so if nothing else, that proposal had less
>>> expressive power than what the architecture defines.
>>
>> My proposal was:
>>
>> --- cut ---
>> KVM_DEV_ARM_VGIC_GRP_DIST_REGS
>> Attributes:
>> The attr field of kvm_device_attr encodes two values:
>> bits:   | 63   | 62 .. 52 | 51 .. 32 | 31 0   |
>> values: | size | reserved | cpu idx  | offset |
>>
>> All distributor regs can be accessed as (rw, 32-bit).
>> For GICv3 some registers are actually (rw, 64-bit) according to the
>> specification. In order to perform a full 64-bit access the 'size' bit
>> should be set to 1. The KVM_DEV_ARM_VGIC_64BIT flag value is provided for
>> this purpose.
>> --- cut ---
>>
>> Bit 63 was meant to specify access size (u64 or u32), and bits 62 - 52 were
>> reserved just in order to match the ARM64_SYS_REG() macro, which uses these
>> bits for its own purpose.
>>
>> But, since your proposal suggests that all GICv3 accesses are 64-bit wide,
>> and we use our own system register encoding (I don't see any serious
>> problems with this), it is possible just to use bits 63...32 for the vCPU
>> index. So, the maximum number of CPUs would be the same 0x as in your
>> proposal, and the API would be fully compatible with the GICv2 one (well,
>> except access size), and we would use the same definitions and the same
>> approach to handle both. This would bring more consistency and less
>> confusion to userspace developers who use the API.
>
> I don't agree; using the same API with slightly different semantics
> depending on which device you created is much more error prone than
> having a clear separation between the two different APIs, IMHO.
>
>> By the way, the rest of the KVM API (KVM_IRQ_LINE, for example) also uses
>> the vCPU index.
>>
>> That's all my arguments for vCPU index instead of affinity value.
>
> I'm not convinced that we should be compatible with GICv2 at all. (The
> deeper argument here is that GICv2 had an architectural limitation to 8
> CPUs so all the CPU addressing is simple in that case. This is a
> fundamental evolution from GICv2 to GICv3 so it is only natural that
> there are substantial changes in this area.)
>
> I'll let Marc or Peter chime in if they disagree.

Well, compatibility with GICv2 is the biggest mistake we made when designing the GICv3 architecture. And that's why our emulation doesn't give a damn about v2 compatibility.

Designing the kernel API in terms of GICv2 is nothing short of revolting, if you ask me. We've already shoe-horned GICv3 into GICv2 data structures in KVM, and that's an absolute mess. I can't wait to simply nuke the thing. Once v2 is out of the picture, everything is much simpler.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code
Hi,

On 08/10/15 11:56, Marc Zyngier wrote:
> On 08/10/15 11:14, Christoffer Dall wrote:
>> Hi Pavel,
>>
>> On Fri, Oct 02, 2015 at 05:44:27PM +0300, Pavel Fedin wrote:
>>> Current KVM code has lots of old redundancies, which can be cleaned up.
>>> This patchset is actually a better alternative to
>>> http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows to
>>> keep piggy-backed LRs. The idea is based on the fact that our code also
>>> maintains LR state in elrsr, and this information is enough to track LR
>>> usage.
>>>
>>> This patchset is made against linux-next of 02.10.2015. Thanks to Andre
>>> for pointing out some 4.3 specifics.
>>>
>> I'm not opposed to these changes, they clean up the data structures
>> which is definitely a good thing.
>>
>> I am a bit worried about how/if this is going to conflict with the ITS
>> series and other patches in flight touching the vgic.
>>
>> Marc/Andre, any thoughts on this?
>
> I don't mind the simplification (Andre was already removing the
> piggybacking stuff as part of his ITS series). I'm a bit more cautious
> about the sync_elrsr stuff, but that's mostly because I've only read the
> patch in a superficial way.
>
> But yes, this is probably going to clash, unless we make this part of an
> existing series (/me looks at André... ;-)

Yes, I am looking at merging this. From the discussion with Pavel I remember some things that I disagreed with, so I may propose a follow-up patch. I will give this a try tomorrow.

Cheers,
Andre.
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On Thu, Oct 08, 2015 at 03:28:40PM +0300, Pavel Fedin wrote:
> Hello!
>
> > Well, compatibility with GICv2 is the biggest mistake we made when
> > designing the GICv3 architecture. And that's why our emulation doesn't
> > give a damn about v2 compatibility.
>
> Ok, i see your arguments, and after that it becomes a matter of personal
> taste. Three beat one, i go your way. :) Don't know if i'll be able to
> roll out v5 tomorrow, but i think on monday i'll definitely do.
>
There's no rush, I don't think this will make it into v4.4 anyhow, as we're already on -rc4 and there's a bunch of other stuff in flight, and I haven't configured a way to test these patches yet.

Speaking of which, does the QEMU side of this patch set require first adding the GICv3 emulation for the data structures or is there a stand-alone migration patch set somewhere?

-Christoffer
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On 08/10/15 13:45, Pavel Fedin wrote:
> Hello!
>
>> There's no rush, I don't think this will make it into v4.4 anyhow
>
> Did you mean 4.3 here?

No, that'd really be 4.4. 4.3 closed 4 weeks ago, and 4.4 is about to open. This work is unlikely to make it before 4.5, TBH.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
On 8 October 2015 at 13:45, Pavel Fedin wrote:
>> Speaking of which, does the QEMU side of this patch set require first
>> adding the GICv3 emulation for the data structures or is there a
>> stand-alone migration patch set somewhere?
>
> I rolled it out a week ago:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg325331.html. I
> tried to get reviewed/accepted/whatever at least data structure part, but
> Peter replied that he isn't interested before we have the kernel.

More specifically, I wanted us to agree on the kernel API for migration, because that significantly affects how the QEMU code looks. A quick look at your patch suggests you still have data structures like

+struct gicv3_irq_state {
+    /* The enable bits are only banked for per-cpu interrupts. */
+    unsigned long *enabled;
+    unsigned long *pending;
+    unsigned long *active;
+    unsigned long *level;
+    unsigned long *group;
+    bool edge_trigger; /* true: edge-triggered, false: level-triggered */
+    uint32_t mask_size; /* Size of bitfields in long's, for migration */
+};

I think it's probably going to be better to have an array of redistributor structures (one per CPU), and then keep the state that in hardware is in each redistributor inside those sub-structures. Similarly for state that in hardware is inside the distributor, and inside each cpu interface.

thanks
-- PMM
RE: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation
Hello!

> There's no rush, I don't think this will make it into v4.4 anyhow

Did you mean 4.3 here?

> Speaking of which, does the QEMU side of this patch set require first
> adding the GICv3 emulation for the data structures or is there a
> stand-alone migration patch set somewhere?

I rolled it out a week ago: https://www.mail-archive.com/qemu-devel@nongnu.org/msg325331.html. I tried to get it reviewed/accepted/whatever, at least the data structure part, but Peter replied that he isn't interested before we have the kernel.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 10/15] arm64: kvm: Fix {V}TCR_EL2_TG0 mask
On Tue, Sep 15, 2015 at 04:41:19PM +0100, Suzuki K. Poulose wrote:
> From: "Suzuki K. Poulose"
>
> {V}TCR_EL2_TG0 is a 2bit wide field, where:
>
>  00 - 4K
>  01 - 64K
>  10 - 16K
>
> But we use only 1 bit, which has worked well so far since
> we never cared about 16K. Fix it for 16K support.
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Marc Zyngier
> Cc: Christoffer Dall
> Cc: kvm...@lists.cs.columbia.edu
> Acked-by: Mark Rutland
> Signed-off-by: Suzuki K. Poulose
> ---
>  arch/arm64/include/asm/kvm_arm.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 7605e09..bdf139e 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -98,7 +98,7 @@
>  #define TCR_EL2_TBI	(1 << 20)
>  #define TCR_EL2_PS	(7 << 16)
>  #define TCR_EL2_PS_40B	(2 << 16)
> -#define TCR_EL2_TG0	(1 << 14)
> +#define TCR_EL2_TG0	(3 << 14)
>  #define TCR_EL2_SH0	(3 << 12)
>  #define TCR_EL2_ORGN0	(3 << 10)
>  #define TCR_EL2_IRGN0	(3 << 8)
> @@ -110,7 +110,7 @@
>
>  /* VTCR_EL2 Registers bits */
>  #define VTCR_EL2_PS_MASK	(7 << 16)
> -#define VTCR_EL2_TG0_MASK	(1 << 14)
> +#define VTCR_EL2_TG0_MASK	(3 << 14)
>  #define VTCR_EL2_TG0_4K	(0 << 14)
>  #define VTCR_EL2_TG0_64K	(1 << 14)
>  #define VTCR_EL2_SH0_MASK	(3 << 12)
> --
> 1.7.9.5
>
Reviewed-by: Christoffer Dall
Re: [PATCH 03/15] arm64: Introduce helpers for page table levels
On Wed, Oct 07, 2015 at 10:26:14AM +0100, Marc Zyngier wrote: > On 07/10/15 09:26, Christoffer Dall wrote: > > Hi Suzuki, > > > > On Tue, Sep 15, 2015 at 04:41:12PM +0100, Suzuki K. Poulose wrote: > >> From: "Suzuki K. Poulose"> >> > >> Introduce helpers for finding the number of page table > >> levels required for a given VA width, shift for a particular > >> page table level. > >> > >> Convert the existing users to the new helpers. More users > >> to follow. > >> > >> Cc: Ard Biesheuvel > >> Cc: Mark Rutland > >> Cc: Catalin Marinas > >> Cc: Will Deacon > >> Signed-off-by: Suzuki K. Poulose > >> Reviewed-by: Ard Biesheuvel > >> Tested-by: Ard Biesheuvel > >> --- > >> arch/arm64/include/asm/pgtable-hwdef.h | 15 --- > >> 1 file changed, 12 insertions(+), 3 deletions(-) > >> > >> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h > >> b/arch/arm64/include/asm/pgtable-hwdef.h > >> index 24154b0..ce18389 100644 > >> --- a/arch/arm64/include/asm/pgtable-hwdef.h > >> +++ b/arch/arm64/include/asm/pgtable-hwdef.h > >> @@ -16,13 +16,21 @@ > >> #ifndef __ASM_PGTABLE_HWDEF_H > >> #define __ASM_PGTABLE_HWDEF_H > >> > >> +/* > >> + * Number of page-table levels required to address 'va_bits' wide > >> + * address, without section mapping > >> + */ > >> +#define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - > >> 3)) > > > > I don't understand the '(va_bits) - 4' here, can you explain it (and add a > > comment to that effect) ? > > I just had a chat with Catalin, who did shed some light on this. > It all has to do with rounding up. What you would like to have here is: > > #define ARM64_HW_PGTABLE_LEVELS(va_bits) DIV_ROUND_UP(va_bits - PAGE_SHIFT, > PAGE_SHIFT - 3) > > where (va_bits - PAGE_SHIFT) is the total number of bits we deal > with during a page table walk, and (PAGE_SHIFT - 3) is the number > of bits we deal with per level. 
> The clue is in how DIV_ROUND_UP is written:
>
> #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
>
> which gives you Suzuki's magic formula.
>
> I'd vote for the DIV_ROUND_UP(), which will make things a lot more readable.
>
Thanks for the explanation, I vote for DIV_ROUND_UP too.

You can stash this away for a cryptic interview question ;)

-Christoffer
Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?
On Thu, 10/08 19:59, charlie.song wrote: > Dear KVM Developers: > I am Xiang Song from UCloud company. We currently encounter a weird > phenomenon about Qemu-KVM IOthread. > We recently try to use Linux AIO from guest OS and find that the IOthread > mechanism of Qemu-KVM will reorder I/O requests from guest OS > even when the AIO write requests are issued from a single thread in order. > This does not happen on the host OS however. > We are not sure whether this is a feature of Qemu-KVM IOthread mechanism > or a Bug. > > The testbd is as following: (the guest disk device cache is configured to > writethrough.) > CPU: Intel(R) Xeon(R) CPU E5-2650 > QEMU version: 1.5.3 > Host/Guest Kernel: Both Linux 4.1.8 & Linux 2.6.32, OS type CentOS 6.5 > Simplified Guest OS qemu cmd: > /usr/libexec/qemu-kvm -machine rhel6.3.0,accel=kvm,usb=off -cpu kvm64 -smp > 8,sockets=8,cores=1,threads=1 > -drive > file=/var/lib/libvirt/images/song-disk.img,if=none,id=drive-virtio-disk0,format=qcow2,serial=UCLOUD_DISK_VDA,cache=writethrough > > -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:22:d5:52,bus=pci.0,addr=0x4 You mentioned iothread above but it's not in your command line? > > The test code triggerring this phenomenon work as following: it use linux aio > API to issue concurrent async write requests to a file. During exection it > will > continuously write data into target test file. There are total 'X' jobs, and > each job is assigned a job id JOB_ID which starts from 0. Each job will write > 16 * 512 > Byte data into the target file at offset = JOB_ID * 512. (the data is > repeated uint64_t JOB_ID). > There is only one thread handling 'X' jobs one by one through Linux AIO > (io_submit) cmd. When handling jobs, it will continuously > issuing AIO requests without waiting for AIO Callbacks. 
When it finishes, the > file should look like: > [00][1...1][2...2][3...3]...[X-1...X-1] > Then we use a check program to test the resulting file, it can > continuously read the first 8 byte (uint64_t) of each sector and print it > out. In normal cases, > it's output is like: > 0 1 2 3 X-1 > > Exec output: (Set X=32) > In our guest OS, the output is abnormal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 > 15 16 17 18 18 18 18 18 18 24 25 26 27 28 29 30 31. > It can be seen that job20~job24 are overwrited by job19. > In our host OS, the output is as expected, 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 > 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31. I'm not 100% sure but I don't think the returning of io_submit guarantees any ordering, usually you need to wait for the callback to ensure that. Fam > > > I can provide the example code if needed. > > Best regards, song > > 2015-10-08 > > > charlie.song > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/15] arm64: Introduce helpers for page table levels
On Thu, Oct 08, 2015 at 06:22:34PM +0100, Suzuki K. Poulose wrote:
> On 08/10/15 15:45, Christoffer Dall wrote:
> > On Wed, Oct 07, 2015 at 10:26:14AM +0100, Marc Zyngier wrote:
> > > I just had a chat with Catalin, who did shed some light on this.
> > > It all has to do with rounding up. What you would like to have here is:
> > >
> > > #define ARM64_HW_PGTABLE_LEVELS(va_bits) DIV_ROUND_UP(va_bits - PAGE_SHIFT, PAGE_SHIFT - 3)
> > >
> > > where (va_bits - PAGE_SHIFT) is the total number of bits we deal
> > > with during a page table walk, and (PAGE_SHIFT - 3) is the number
> > > of bits we deal with per level.
> > >
> > > The clue is in how DIV_ROUND_UP is written:
> > >
> > > #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
> > >
> > > which gives you Suzuki's magic formula.
> > >
> > > I'd vote for the DIV_ROUND_UP(), which will make things a lot more readable.
> >
> > Thanks for the explanation, I vote for DIV_ROUND_UP too.
>
> Btw, DIV_ROUND_UP is defined in linux/kernel.h, and including that in the
> required headers breaks the build. I could add a local definition of the same.

Or just keep the original magic formula and add the DIV_ROUND_UP one in a
comment.

--
Catalin
Re: [PATCH 03/15] arm64: Introduce helpers for page table levels
On 08/10/15 15:45, Christoffer Dall wrote:
> On Wed, Oct 07, 2015 at 10:26:14AM +0100, Marc Zyngier wrote:
> > On 07/10/15 09:26, Christoffer Dall wrote:
> > > Hi Suzuki,
> >
> > I just had a chat with Catalin, who did shed some light on this.
> > It all has to do with rounding up. What you would like to have here is:
> >
> > #define ARM64_HW_PGTABLE_LEVELS(va_bits) DIV_ROUND_UP(va_bits - PAGE_SHIFT, PAGE_SHIFT - 3)
> >
> > where (va_bits - PAGE_SHIFT) is the total number of bits we deal
> > with during a page table walk, and (PAGE_SHIFT - 3) is the number
> > of bits we deal with per level.
> >
> > The clue is in how DIV_ROUND_UP is written:
> >
> > #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
> >
> > which gives you Suzuki's magic formula.
> >
> > I'd vote for the DIV_ROUND_UP(), which will make things a lot more readable.
>
> Thanks for the explanation, I vote for DIV_ROUND_UP too.

Btw, DIV_ROUND_UP is defined in linux/kernel.h, including which in the
required headers breaks the build. I could add the definition of the same
locally.

You can stash this away for a cryptic interview question ;) ;)

Suzuki
Re: Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?
On Fri, 10/09 11:25, charlie.song wrote:
> At 2015-10-08 23:37:02, "Fam Zheng" wrote:
> > On Thu, 10/08 19:59, charlie.song wrote:
> > > [full description of the overlapping-write AIO test quoted above]
> >
> > You mentioned iothread above but it's not in your command line?
> I mean the thread pool mechanism used by qemu-kvm to accelerate I/O
> processing. This is used by paio_submit (block/raw-posix.c) by default,
> with pool->max_threads = 64 as far as I know. (qemu-kvm version 1.5.3)

The thread pool parallelism may reorder non-overlapping requests, but it
shouldn't cause any reordering of overlapping requests like the ones in your
I/O pattern. QEMU ensures that.

Do you see this with aio=native?

Fam

> > I'm not 100% sure but I don't think the return of io_submit guarantees any
> > ordering; usually you need to wait for the callback to ensure that.
> Is there any proof or article about the ordering of io_submit requests?
> >
> > Fam
> > > I can provide the example code if needed.
> > >
> > > Best regards, song
> > >
> > > 2015-10-08
> > >
> > > charlie.song
[PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
async_pf_execute() seems to be missing a memory barrier which might cause the
waker to not notice the waiter and miss sending a wake_up, as in the following
figure.

async_pf_execute()                  kvm_vcpu_block()
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test
   for the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                        TASK_INTERRUPTIBLE);
                                    /* if (kvm_vcpu_check_block(vcpu) < 0) */
                                    /*  if (kvm_arch_vcpu_runnable(vcpu)) { */
                                    ...
                                    return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                            !vcpu->arch.apf.halted) ||
                                           !list_empty_careful(&vcpu->async_pf.done)
                                    ...
                                    return 0;
list_add_tail(&apf->link,
              &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code for places
calling waitqueue_active() before wake_up*(), but without preceding memory
barriers, after sending a patch to fix a similar issue in drivers/tty/n_tty.c
(Details about the original issue can be found here:
https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa
---
 virt/kvm/async_pf.c | 5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..303fc7f 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,6 +94,11 @@ static void async_pf_execute(struct work_struct *work)
 
 	trace_kvm_async_pf_completed(addr, gva);
 
+	/*
+	 * Memory barrier is required here to make sure change to
+	 * vcpu->async_pf.done is visible from other CPUs
+	 */
+	smp_mb();
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);
-- 
1.7.1
Re: Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?
At 2015-10-08 23:37:02, "Fam Zheng" wrote:
> On Thu, 10/08 19:59, charlie.song wrote:
> > [full description of the overlapping-write AIO test quoted above]
>
> You mentioned iothread above but it's not in your command line?
I mean the thread pool mechanism used by qemu-kvm to accelerate I/O
processing. This is used by paio_submit (block/raw-posix.c) by default,
with pool->max_threads = 64 as far as I know. (qemu-kvm version 1.5.3)

> I'm not 100% sure but I don't think the return of io_submit guarantees any
> ordering; usually you need to wait for the callback to ensure that.
Is there any proof or article about the ordering of io_submit requests?

>
> Fam
Re: Re: Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?
At 2015-10-09 12:33:20, "charlie.song" wrote:
> At 2015-10-09 12:16:03, "Fam Zheng" wrote:
> > The thread pool parallelism may reorder non-overlapping requests, but it
> > shouldn't cause any reordering of overlapping requests like the ones in
> > your I/O pattern. QEMU ensures that.
> >
> > Do you see this with aio=native?
> We see this with both aio=threads and aio=native.
> Are you sure "QEMU ensures the ordering of overlapping requests"?
> I can provide the example code if needed.

If I configure the guest disk IO with "cache=directsync,aio=native", this
phenomenon disappears.
Re: Re: Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?
At 2015-10-09 12:16:03, "Fam Zheng" wrote:
> On Fri, 10/09 11:25, charlie.song wrote:
> > I mean the thread pool mechanism used by qemu-kvm to accelerate I/O
> > processing. This is used by paio_submit (block/raw-posix.c) by default,
> > with pool->max_threads = 64 as far as I know. (qemu-kvm version 1.5.3)
>
> The thread pool parallelism may reorder non-overlapping requests, but it
> shouldn't cause any reordering of overlapping requests like the ones in your
> I/O pattern. QEMU ensures that.
>
> Do you see this with aio=native?
We see this with both aio=threads and aio=native.
Are you sure "QEMU ensures the ordering of overlapping requests"?
I can provide the example code if needed.

>
> Fam
[PATCH v2 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race
KVM uses eoi_exit_bitmap to track vectors that need an action on EOI. The
problem is that the IOAPIC can be reconfigured while an interrupt with the old
configuration is pending, and eoi_exit_bitmap only remembers the newest
configuration; thus the EOI from the pending interrupt is not recognized.
(Reconfiguration is not a problem for level interrupts, because the IOAPIC
sends the interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like the i8254 timer, things can
happen in this order:
 1) IOAPIC injects a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap on
    the original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is not
    in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR. This creates an unwanted situation if the vector is
reused by a non-IOAPIC source, but I think it is so rare that we don't want
to make the solution more sophisticated. The simple solution also doesn't
work if we are reconfiguring the vector. (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

There are no races because ioapic injection and reconfig are locked.

Fixes: b053b2aef25d ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2aef25d, this bug happened only with APICv.]
Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support")
Cc:
Signed-off-by: Radim Krčmář
---
v2: moved sync_pir_to_irr out of kvm_ioapic_scan_entry [Paolo]

 arch/x86/kvm/ioapic.c | 4 +++-
 arch/x86/kvm/x86.c    | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2dcda0f188ba..88d0a92d3f94 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -246,7 +246,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
 		    index == RTC_GSI) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
-				e->fields.dest_id, e->fields.dest_mode))
+				e->fields.dest_id, e->fields.dest_mode) ||
+			    (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
+			     kvm_apic_pending_eoi(vcpu, e->fields.vector)))
 				__set_bit(e->fields.vector,
 					  (unsigned long *)eoi_exit_bitmap);
 		}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d2c9bb0d6d6..7ed88020d414 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6201,8 +6201,10 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 	if (irqchip_split(vcpu->kvm))
 		kvm_scan_ioapic_routes(vcpu, vcpu->arch.eoi_exit_bitmap);
-	else
+	else {
+		kvm_x86_ops->sync_pir_to_irr(vcpu);
 		kvm_ioapic_scan_entry(vcpu, vcpu->arch.eoi_exit_bitmap);
+	}
 	kvm_x86_ops->load_eoi_exitmap(vcpu);
 }
-- 
2.5.3
Re: [PATCH] KVM: x86: don't notify userspace IOAPIC on edge EOI
2015-10-08 20:30+0200, Radim Krčmář:
> On real hardware, edge-triggered interrupts don't set a bit in TMR,
> which means that IOAPIC isn't notified on EOI. Do the same here.
>
> Staying in guest/kernel mode after edge EOI is what we want for most
> devices. If some bugs could be nicely worked around with edge EOI
> notifications, we should invest in a better interface.
>
> Signed-off-by: Radim Krčmář
> ---
> Completely untested.
>
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> @@ -389,13 +389,15 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  	for (i = 0; i < nr_ioapic_pins; ++i) {
>  		hlist_for_each_entry(entry, &table->map[i], link) {
>  			u32 dest_id, dest_mode;
> +			bool level;
>
>  			if (entry->type != KVM_IRQ_ROUTING_MSI)
>  				continue;
>  			dest_id = (entry->msi.address_lo >> 12) & 0xff;
>  			dest_mode = (entry->msi.address_lo >> 2) & 0x1;
> -			if (kvm_apic_match_dest(vcpu, NULL, 0, dest_id,
> -						dest_mode)) {
> +			level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
> +			if (level && kvm_apic_match_dest(vcpu, NULL, 0,

Describing that result is overkill -- I'll send v2 if the idea is accepted.
[PATCH v2 0/2] KVM: x86: fix edge EOI and IOAPIC reconfig race
v2:
 * rewritten [1/2] and
 * refactored [2/2], all thanks to Paolo's comments

This problem is not fixed for the split userspace part, as I think it would
be better to solve that by excluding edge interrupts from eoi_exit_bitmap
(see the next patch on the kvm list for discussion).

Radim Krčmář (2):
  kvm: x86: set KVM_REQ_EVENT when updating IRR
  KVM: x86: fix edge EOI and IOAPIC reconfig race

 arch/x86/kvm/ioapic.c | 4 +++-
 arch/x86/kvm/lapic.c  | 2 ++
 arch/x86/kvm/x86.c    | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.5.3
[PATCH v2 1/2] kvm: x86: set KVM_REQ_EVENT when updating IRR
After moving PIR to IRR, the interrupt needs to be delivered manually.

Reported-by: Paolo Bonzini
Cc:
Signed-off-by: Radim Krčmář
---
v2: completely rewritten

 arch/x86/kvm/lapic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 944b38a56929..168b8759bd73 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -348,6 +348,8 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	__kvm_apic_update_irr(pir, apic->regs);
+
+	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
-- 
2.5.3
[PATCH] KVM: x86: don't notify userspace IOAPIC on edge EOI
On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI. Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices. If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář
---
Completely untested.

 arch/x86/kvm/irq_comm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index c89228980230..8f4499c7ffc1 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -389,13 +389,15 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	for (i = 0; i < nr_ioapic_pins; ++i) {
 		hlist_for_each_entry(entry, &table->map[i], link) {
 			u32 dest_id, dest_mode;
+			bool level;
 
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
 			dest_id = (entry->msi.address_lo >> 12) & 0xff;
 			dest_mode = (entry->msi.address_lo >> 2) & 0x1;
-			if (kvm_apic_match_dest(vcpu, NULL, 0, dest_id,
-						dest_mode)) {
+			level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
+			if (level && kvm_apic_match_dest(vcpu, NULL, 0,
+						dest_id, dest_mode)) {
 				u32 vector = entry->msi.data & 0xff;
 
 				__set_bit(vector,
-- 
2.5.3