Re: [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
On 13.04.14 04:27, Liu ping fan wrote: On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf wrote: On 11.04.2014, at 13:45, Liu Ping Fan wrote: When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start and mmu_notifier_invalidate_range_end, which will mark existing guest hpte entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new guest hpte entries. What happens when we don't? Why do we need the check? Why isn't it done implicitly? What happens when we treat a NUMA marked page as non-present? Why does it work out for us? Assume you have no idea what PAGE_NUMA is, but try to figure out what this patch does and whether you need to cherry-pick it into your downstream kernel. The description as is still is not very helpful for that. It doesn't even explain what really changes with this patch applied. Yeah. what about appending the following description? Can it make the context clear? "Guest should not setup a hpte for the page whose pte is marked with _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect to check whether the page is placed correctly or not." Try to come up with a text that answers the following questions in order: - What does _PAGE_NUMA mean? - How does page migration with _PAGE_NUMA work? -> Why should we not map pages when _PAGE_NUMA is set? - Which part of what needs to be done did the previous _PAGE_NUMA patch address? - What's the situation without this patch? - Which scenario does this patch fix? Once you have a text that answers those, you should have a good patch description :). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] KVM: nVMX: Don't advertise single context invalidation for invept
On 2014-04-11 21:35, Marcelo Tosatti wrote: > On Fri, Apr 11, 2014 at 08:53:09PM +0200, Jan Kiszka wrote: >> On 2014-04-11 20:35, Bandan Das wrote: >>> Jan Kiszka writes: >>> On 2014-04-11 19:26, Bandan Das wrote: > Jan Kiszka writes: > >> On 2014-04-11 02:27, Bandan Das wrote: >>> Marcelo Tosatti writes: >>> On Mon, Mar 31, 2014 at 05:00:23PM -0400, Bandan Das wrote: > For single context invalidation, we fall through to global > invalidation in handle_invept() except for one case - when > the operand supplied by L1 is different from what we have in > vmcs12. However, typically hypervisors will only call invept > for the currently loaded eptp, so the condition will > never be true. > > Signed-off-by: Bandan Das Bandan, Why not fix INVEPT single-context rather than removing it entirely? "Single-context. If the INVEPT type is 1, the logical processor invalidates all guest-physical mappings and combined mappings associated with the EP4TA specified in the INVEPT descriptor. Combined mappings for that EP4TA are invalidated for all VPIDs and all PCIDs. (The instruction may invalidate mappings associated with other EP4TAs.)" So just removing the "if (EPTP != CURRENT.EPTP) BREAK" should be enough. >>> >>> The single context invalidation in handle_invept() doesn't do >>> anything different. It just falls down to the global case. >>> And the invept code in Xen and KVM both seemed to fall back >>> to global invalidation if support for single context wasn't found. >>> So, it was proposed not to advertise it at all. >>> >>> But rethinking this again, I agree with you. If there's a hypervisor >>> with a single context invept implmentation that does not fallback, >>> this will unfortunately not work. Jan, do you agree with this ? >> >> A hypervisor that doesn't properly check the HW caps is just broken. And >> one that mandates single context invalidation support is silly. > > Well, but we could make life a little bit easier for the unfortunate user > using the broken hypervisor :) And advertising single context > inavalidation > doesn't really seem to have any downsides. Ok, let's try it this way: single-context invalidation is inherently tied to VPID support (that's how you address a context). However, KVM does not expose VPID to its guest. So this discussion is mood: no hypervisor will make use of this feature as it has no means to fill in the required parameter. >>> >>> I thought (from the spec) invept single context invalidation >>> takes the EP4TA as the second argument. invvpid single context >>> however takes the VPID as its descriptor. >> >> Oops, invept/invvpid mess-up while re-reading the spec - sorry. >> >>> >>> The Xen L1 hypervisor was actually calling single context invept >>> multiple times. That's how I hit this bug. >> >> ...and it's no longer doing it now, I suppose. The question remains, >> which hypervisor we want to cater with a >> "single-context-that-is-current-context" invalidation (that is my >> understanding of Marcelo's proposal). > > My proposal is to implement what is in the spec. > >> On the other hand, if some hypervisor actually uses invept to >> invalidate a non-current mapping, we would regress compared to not >> exposing single context invept. Hope I got this conclusion right. ;) > > In that case INVEPT global would also be broken. I'm all for having a proper invept single context support but that, first of all, requires tracking the vEPTP->EPTP mappings. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] virtio-scsi: Skip setting affinity on uninitialized vq
virtscsi_init calls virtscsi_remove_vqs on err, even before initializing the vqs. The latter calls virtscsi_set_affinity, so let's check the pointer there before setting affinity on it. This fixes a panic when setting device's num_queues=2 on RHEL 6.5: qemu-system-x86_64 ... \ -device virtio-scsi-pci,id=scsi0,addr=0x13,...,num_queues=2 \ -drive file=/stor/vm/dummy.raw,id=drive-scsi-disk,... \ -device scsi-hd,drive=drive-scsi-disk,... [0.354734] scsi0 : Virtio SCSI HBA [0.379504] BUG: unable to handle kernel NULL pointer dereference at 0020 [0.380141] IP: [] __virtscsi_set_affinity+0x4f/0x120 [0.380141] PGD 0 [0.380141] Oops: [#1] SMP [0.380141] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #5 [0.380141] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [0.380141] task: 88003c9f ti: 88003c9f8000 task.ti: 88003c9f8000 [0.380141] RIP: 0010:[] [] __virtscsi_set_affinity+0x4f/0x120 [0.380141] RSP: :88003c9f9c08 EFLAGS: 00010256 [0.380141] RAX: RBX: 88003c3a9d40 RCX: 1070 [0.380141] RDX: 0002 RSI: RDI: [0.380141] RBP: 88003c9f9c28 R08: 000136c0 R09: 88003c801c00 [0.380141] R10: 81475229 R11: 0008 R12: [0.380141] R13: 81cc7ca8 R14: 88003cac3d40 R15: 88003cac37a0 [0.380141] FS: () GS:88003e40() knlGS: [0.380141] CS: 0010 DS: ES: CR0: 8005003b [0.380141] CR2: 0020 CR3: 01c0e000 CR4: 06f0 [0.380141] Stack: [0.380141] 88003c3a9d40 88003cac3d80 88003cac3d40 [0.380141] 88003c9f9c48 814742e8 88003c26d000 88003c26d000 [0.380141] 88003c9f9c68 81474321 88003c26d000 88003c3a9d40 [0.380141] Call Trace: [0.380141] [] virtscsi_set_affinity+0x28/0x40 [0.380141] [] virtscsi_remove_vqs+0x21/0x50 [0.380141] [] virtscsi_init+0x91/0x240 [0.380141] [] ? vp_get+0x50/0x70 [0.380141] [] virtscsi_probe+0xf4/0x280 [0.380141] [] virtio_dev_probe+0xe5/0x140 [0.380141] [] driver_probe_device+0x89/0x230 [0.380141] [] __driver_attach+0x9b/0xa0 [0.380141] [] ? driver_probe_device+0x230/0x230 [0.380141] [] ? driver_probe_device+0x230/0x230 [0.380141] [] bus_for_each_dev+0x8c/0xb0 [0.380141] [] driver_attach+0x19/0x20 [0.380141] [] bus_add_driver+0x198/0x220 [0.380141] [] driver_register+0x5f/0xf0 [0.380141] [] ? spi_transport_init+0x79/0x79 [0.380141] [] register_virtio_driver+0x1b/0x30 [0.380141] [] init+0x88/0xd6 [0.380141] [] ? scsi_init_procfs+0x5b/0x5b [0.380141] [] do_one_initcall+0x7f/0x10a [0.380141] [] kernel_init_freeable+0x14a/0x1de [0.380141] [] ? kernel_init_freeable+0x1de/0x1de [0.380141] [] ? rest_init+0x80/0x80 [0.380141] [] kernel_init+0x9/0xf0 [0.380141] [] ret_from_fork+0x7c/0xb0 [0.380141] [] ? rest_init+0x80/0x80 [0.380141] RIP [] __virtscsi_set_affinity+0x4f/0x120 [0.380141] RSP [0.380141] CR2: 0020 [0.380141] ---[ end trace 8074b70c3d5e1d73 ]--- [0.475018] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [0.475018] [0.475068] Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) [0.475068] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 Signed-off-by: Fam Zheng Acked-by: Paolo Bonzini Cc: sta...@vger.kernel.org --- drivers/scsi/virtio_scsi.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 16bfd50..3019267 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -750,8 +750,12 @@ static void __virtscsi_set_affinity(struct virtio_scsi *vscsi, bool affinity) vscsi->affinity_hint_set = true; } else { - for (i = 0; i < vscsi->num_queues; i++) + for (i = 0; i < vscsi->num_queues; i++) { + if (!vscsi->req_vqs[i].vq) { + continue; + } virtqueue_set_affinity(vscsi->req_vqs[i].vq, -1); + } vscsi->affinity_hint_set = false; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit
Testing by Michael Neuling revealed that commit e4e38121507a ("KVM: PPC: Book3S HV: Add transactional memory support") is missing the code that saves away the checkpointed state of the guest when switching to the host. This adds that code, which was in earlier versions of the patch but went missing somehow. Reported-by: Michael Neuling Signed-off-by: Paul Mackerras --- Please send this along for inclusion in 3.15. Without it, on POWER8 it is possible for the guest to cause the host to crash. arch/powerpc/kvm/book3s_hv_rmhandlers.S | 104 1 file changed, 104 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4963335..ec5ad6e 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1312,6 +1312,110 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) mr r3, r9 bl kvmppc_save_fp +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +BEGIN_FTR_SECTION + b 2f +END_FTR_SECTION_IFCLR(CPU_FTR_TM) + /* Turn on TM. */ + mfmsr r8 + li r0, 1 + rldimi r8, r0, MSR_TM_LG, 63-MSR_TM_LG + mtmsrd r8 + + ld r5, VCPU_MSR(r9) + rldicl. r5, r5, 64 - MSR_TS_S_LG, 62 + beq 1f /* TM not active in guest. */ + + li r3, TM_CAUSE_KVM_RESCHED + + /* Clear the MSR RI since r1, r13 are all going to be foobar. */ + li r5, 0 + mtmsrd r5, 1 + + /* All GPRs are volatile at this point. */ + TRECLAIM(R3) + + /* Temporarily store r13 and r9 so we have some regs to play with */ + SET_SCRATCH0(r13) + GET_PACA(r13) + std r9, PACATMSCRATCH(r13) + ld r9, HSTATE_KVM_VCPU(r13) + + /* Get a few more GPRs free. */ + std r29, VCPU_GPRS_TM(29)(r9) + std r30, VCPU_GPRS_TM(30)(r9) + std r31, VCPU_GPRS_TM(31)(r9) + + /* Save away PPR and DSCR soon so don't run with user values. */ + mfspr r31, SPRN_PPR + HMT_MEDIUM + mfspr r30, SPRN_DSCR + ld r29, HSTATE_DSCR(r13) + mtspr SPRN_DSCR, r29 + + /* Save all but r9, r13 & r29-r31 */ + reg = 0 + .rept 29 + .if (reg != 9) && (reg != 13) + std reg, VCPU_GPRS_TM(reg)(r9) + .endif + reg = reg + 1 + .endr + /* ... now save r13 */ + GET_SCRATCH0(r4) + std r4, VCPU_GPRS_TM(13)(r9) + /* ... and save r9 */ + ld r4, PACATMSCRATCH(r13) + std r4, VCPU_GPRS_TM(9)(r9) + + /* Reload stack pointer and TOC. */ + ld r1, HSTATE_HOST_R1(r13) + ld r2, PACATOC(r13) + + /* Set MSR RI now we have r1 and r13 back. */ + li r5, MSR_RI + mtmsrd r5, 1 + + /* Save away checkpinted SPRs. */ + std r31, VCPU_PPR_TM(r9) + std r30, VCPU_DSCR_TM(r9) + mflrr5 + mfcrr6 + mfctr r7 + mfspr r8, SPRN_AMR + mfspr r10, SPRN_TAR + std r5, VCPU_LR_TM(r9) + stw r6, VCPU_CR_TM(r9) + std r7, VCPU_CTR_TM(r9) + std r8, VCPU_AMR_TM(r9) + std r10, VCPU_TAR_TM(r9) + + /* Restore r12 as trap number. */ + lwz r12, VCPU_TRAP(r9) + + /* Save FP/VSX. */ + addir3, r9, VCPU_FPRS_TM + bl .store_fp_state + addir3, r9, VCPU_VRS_TM + bl .store_vr_state + mfspr r6, SPRN_VRSAVE + stw r6, VCPU_VRSAVE_TM(r9) +1: + /* +* We need to save these SPRs after the treclaim so that the software +* error code is recorded correctly in the TEXASR. Also the user may +* change these outside of a transaction, so they must always be +* context switched. +*/ + mfspr r5, SPRN_TFHAR + mfspr r6, SPRN_TFIAR + mfspr r7, SPRN_TEXASR + std r5, VCPU_TFHAR(r9) + std r6, VCPU_TFIAR(r9) + std r7, VCPU_TEXASR(r9) +2: +#endif + /* Increment yield count if they have a VPA */ ld r8, VCPU_VPA(r9)/* do they have a VPA? */ cmpdi r8, 0 -- 1.9.rc1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/4] KVM: enable Intel SMAP for KVM
I would like to see this in 3.15. -hpa On April 13, 2014 2:57:38 PM PDT, Marcelo Tosatti wrote: >On Fri, Apr 11, 2014 at 08:16:28PM -0400, Paolo Bonzini wrote: >> Il 10/04/2014 16:01, Marcelo Tosatti ha scritto: >> >On Tue, Apr 08, 2014 at 04:38:08PM -0400, Paolo Bonzini wrote: >> >>Il 07/04/2014 21:06, Wu, Feng ha scritto: >> >Even though the tests do not cover the CPL=3/implicit access >case, the >> >logic to compute PFERR_RSVD_MASK dynamically is already covered >by AC=1. >> > So I'm quite happy with the coverage. Series is >> > >> >Reviewed-by: Paolo Bonzini ] >> >>>Thanks very much for your review on this. >> >>>BTW: Since 3.15 merge window is still open, I am wondering whether >there is >> >>>any possibility to make SMAP into 3.15 with another pull request. >> >>> >> >> >> >>This is up to Marcelo who is currently managing the KVM tree. >> >> >> >>Paolo >> > >> >The merge window is for patches which have been tested in queue/next >> >for sometime. This patch has received no testing other than the >> >developer testing. >> >> This is not going to change unfortunately since this is not shipping >> in any real silicon. The only hope could be to use QEMU's SVM and >> SMAP emulation. > >Well, let me know if you want an exception to the rule so i should >merge this patchset and submit it for 3.15. > >> >> >Lack of implicit supervisor mode by instructions such as "Examples >of >> >such implicit..." in section 9.3.2, in KVM's emulator, makes the >feature >> >incomplete, does it not ? >> >> Implicit supervisor mode is handled by KVM emulator using >> read/write_std. These accesses do not set PFERR_USER_MASK, and >> should work fine with SMAP. Am I misunderstanding? >> >> Paolo > >Right. -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/4] KVM: enable Intel SMAP for KVM
On Fri, Apr 11, 2014 at 08:16:28PM -0400, Paolo Bonzini wrote: > Il 10/04/2014 16:01, Marcelo Tosatti ha scritto: > >On Tue, Apr 08, 2014 at 04:38:08PM -0400, Paolo Bonzini wrote: > >>Il 07/04/2014 21:06, Wu, Feng ha scritto: > >Even though the tests do not cover the CPL=3/implicit access case, the > >logic to compute PFERR_RSVD_MASK dynamically is already covered by AC=1. > > So I'm quite happy with the coverage. Series is > > > >Reviewed-by: Paolo Bonzini ] > >>>Thanks very much for your review on this. > >>>BTW: Since 3.15 merge window is still open, I am wondering whether there is > >>>any possibility to make SMAP into 3.15 with another pull request. > >>> > >> > >>This is up to Marcelo who is currently managing the KVM tree. > >> > >>Paolo > > > >The merge window is for patches which have been tested in queue/next > >for sometime. This patch has received no testing other than the > >developer testing. > > This is not going to change unfortunately since this is not shipping > in any real silicon. The only hope could be to use QEMU's SVM and > SMAP emulation. Well, let me know if you want an exception to the rule so i should merge this patchset and submit it for 3.15. > > >Lack of implicit supervisor mode by instructions such as "Examples of > >such implicit..." in section 9.3.2, in KVM's emulator, makes the feature > >incomplete, does it not ? > > Implicit supervisor mode is handled by KVM emulator using > read/write_std. These accesses do not set PFERR_USER_MASK, and > should work fine with SMAP. Am I misunderstanding? > > Paolo Right. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
Hello, I had some basic questions regarding KVM, and would appreciate any help:) I have been reading about the KVM architecture, and as I understand it, the guest shows up as a regular process in the host itself.. I had some questions around that.. 1. Are the guest processes implemented as a control group within the overall VM process itself? Is the VM a kernel process or a user process? 2. Is there a way for me to force some specific CPU/s to a guest, and those CPUs to be not used for any work on the host itself? Pinning is just making sure the vCPU runs on the same physical CPU always, I am looking for something more than that.. 3. If the host is compiled as a non pre-emptible kernel, kernel process run to completion until they give up the CPU themselves. In the context of a guest, I am trying to understand what that would mean in the context of KVM and guest VMs. If the VM is a user process, it means nothing, I wasnt sure as per (1). Cheers! M -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC untested] kvm/x86: implement hv EOI assist
It seems that it's easy to implement the EOI assist on top of the PV EOI feature: simply convert the page address to the format expected by PV EOI. Signed-off-by: Michael S. Tsirkin --- Warning: untested! As I'll be off-line for a couple of days, sending this out for early review. Review/comments/flames/testing reports welcome. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8ae1ff5..d84d750fc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1890,6 +1890,8 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) { vcpu->arch.hv_vapic = data; + if (kvm_lapic_enable_pv_eoi(vcpu, 0)) + return 1; break; } gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT; @@ -1900,6 +1902,8 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data) return 1; vcpu->arch.hv_vapic = data; mark_page_dirty(vcpu->kvm, gfn); + if (kvm_lapic_enable_pv_eoi(vcpu, gfn_to_gpa(gfn) | KVM_MSR_ENABLED)) + return 1; break; } case HV_X64_MSR_EOI: -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 3/6] KVM: emulate: cleanup decode_rm
Il 10/04/2014 20:03, Bandan Das ha scritto: - if (ctxt->rex_prefix) { - ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1; /* REX.R */ - index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */ - ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */ - } + index_reg = (ctxt->rex_prefix << 2) & 8; /* REX.X */ + base_reg = (ctxt->rex_prefix << 3) & 8; /* REG.B */ This is REX.B (preexisting typo), and please leave REX.R here too for clarity. Also, the function is "decode_modrm", not "decode_rm" (in the commit message). Otherwise the series seems okay from a somewhat cursory review. Paolo - ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6; - ctxt->modrm_reg |= (ctxt->modrm & 0x38) >> 3; - ctxt->modrm_rm |= (ctxt->modrm & 0x07); + ctxt->modrm_mod = (ctxt->modrm & 0xc0) >> 6; + ctxt->modrm_reg = ((ctxt->rex_prefix << 1) & 8) | /* REX.R */ + ((ctxt->modrm & 0x38) >> 3); + ctxt->modrm_rm = base_reg | (ctxt->modrm & 0x07); ctxt->modrm_seg = VCPU_SREG_DS; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 6/6] KVM: emulate: remove memopp and rip_relative
Il 10/04/2014 20:03, Bandan Das ha scritto: /* Fields above regs are cleared together. */ This comment is not accurate anymore after patch 4. Since you're fixing it, please add another comment saying where the cleared fields start, too. + ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea; This is missing "if (ctxt->ad_bytes != 8)". + if (rip_relative) + ctxt->memop.addr.mem.ea += ctxt->_eip; Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html