[GIT PULL] more KVM changes for 3.11
Linus,

Please pull from

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.11-2

to receive more KVM updates for the 3.11 merge window. There is a fix
for a bug that prevents some guests from working on old Intel CPUs, and
a patch that integrates ARM64 KVM, merged via the ARM64 tree, into
Kconfig.

Gleb Natapov (1):
      KVM: VMX: mark unusable segment as nonpresent

Marc Zyngier (1):
      arm64: KVM: Kconfig integration

 arch/arm64/Kconfig              |  2 ++
 arch/arm64/kernel/asm-offsets.c |  1 +
 arch/arm64/kvm/Kconfig          | 51 +++
 arch/x86/kvm/vmx.c              | 11 +++--
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/Kconfig

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm tools: fix boot of guests with more than 4gb of ram
On Sun, Jul 7, 2013 at 7:00 PM, Sasha Levin sasha.le...@oracle.com wrote:
> Commit "kvm tools: virtio: remove hardcoded assumptions about guest
> page size" has introduced a bug that prevented guests with more than
> 4gb of ram from booting.
>
> The issue is that 'pfn' is a 32bit integer, so when multiplying it by
> page size to get the actual page will cause an overflow if the pfn
> referred to a memory area above 4gb.
>
> Signed-off-by: Sasha Levin sasha.le...@oracle.com

Will, Michael, Asias, good to merge?
[Bug 60505] Heavy network traffic triggers vhost_net lockup
https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #4 from Bart Van Assche bvanass...@acm.org ---

I have not yet tried to disable zero-copy tx. But even with the
vhost-net patch applied on kernel v3.9.9 I can still trigger this issue:

Jul 8 10:58:01 asus kernel: BUG: unable to handle kernel NULL pointer dereference at 001c
Jul 8 10:58:01 asus kernel: IP: [810f73a9] put_compound_page+0x89/0x170
Jul 8 10:58:01 asus kernel: PGD 0
Jul 8 10:58:01 asus kernel: Oops: [#1] SMP
Jul 8 10:58:01 asus kernel: Modules linked in: dm_queue_length dm_multipath ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net tun fuse ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables af_packet bridge stp llc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core dm_mod hid_generic usbhid hid acpi_cpufreq mperf kvm_intel i2c_i801 kvm r8169 ehci_pci snd_hda_codec_hdmi qla2xxx snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep ehci_hcd snd_pcm snd_seq mii sr_mod cdrom sg snd_timer pcspkr snd_seq_device mlx4_core scsi_transport_fc wmi snd soundcore snd_page_alloc crc32c_intel microcode autofs4 ext4 jbd2 mbcache crc16 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid0 raid1 sd_mod crc_t10dif i915 drm_kms_helper drm ahci libahci intel_agp i2c_algo_bit intel_gtt agpgart xhci_hcd i2c_core video usbcore usb_common button processor thermal_sys hwmon scsi_dh_alua scsi_dh pata_acpi libata scsi_mod
Jul 8 10:58:01 asus kernel: CPU 3
Jul 8 10:58:01 asus kernel: Pid: 5485, comm: vhost-5462 Not tainted 3.9.9+ #1 Gigabyte Technology Co., Ltd. Z68X-UD3H-B3/Z68X-UD3H-B3
Jul 8 10:58:01 asus kernel: RIP: 0010:[810f73a9] [810f73a9] put_compound_page+0x89/0x170
Jul 8 10:58:01 asus kernel: RSP: 0018:8800aab13bd8 EFLAGS: 00010286
Jul 8 10:58:01 asus kernel: RAX: 880118b0b600 RBX: 880118b0b800 RCX: ea000252801c
Jul 8 10:58:01 asus kernel: RDX: 0140 RSI: 0246 RDI: 880118b0b800
Jul 8 10:58:01 asus kernel: RBP: 8800aab13bf8 R08: 8800aa8f4518 R09: 0010
Jul 8 10:58:01 asus kernel: R10: R11: 7fa0c000 R12:
Jul 8 10:58:01 asus kernel: R13: a078f96c R14: 91aa R15: 8800b3bb7500
Jul 8 10:58:01 asus kernel: FS: () GS:88011fac() knlGS:
Jul 8 10:58:01 asus kernel: CS: 0010 DS: ES: CR0: 80050033
Jul 8 10:58:01 asus kernel: CR2: 001c CR3: aab9f000 CR4: 000427e0
Jul 8 10:58:01 asus kernel: DR0: DR1: DR2:
Jul 8 10:58:01 asus kernel: DR3: DR6: 0ff0 DR7: 0400
Jul 8 10:58:01 asus kernel: Process vhost-5462 (pid: 5485, threadinfo 8800aab12000, task 88010792)
Jul 8 10:58:01 asus kernel: Stack:
Jul 8 10:58:01 asus kernel: eaecae40 0012 8800b3bb7500 a078f96c
Jul 8 10:58:01 asus kernel: 8800aab13c08 810f77ec 8800aab13c28 8132045f
Jul 8 10:58:01 asus kernel: 8800b3bb7500 8800b3bb7500 8800aab13c48 813204fe
Jul 8 10:58:01 asus kernel: Call Trace:
Jul 8 10:58:01 asus kernel: [810f77ec] put_page+0x2c/0x40
Jul 8 10:58:01 asus kernel: [8132045f] skb_release_data+0x8f/0x110
Jul 8 10:58:01 asus kernel: [813204fe] __kfree_skb+0x1e/0xa0
Jul 8 10:58:01 asus kernel: [813205b6] kfree_skb+0x36/0xa0
Jul 8 10:58:01 asus kernel: [a078f96c] tun_get_user+0x71c/0x810 [tun]
Jul 8 10:58:01 asus kernel: [a078faba] tun_sendmsg+0x5a/0x80 [tun]
Jul 8 10:58:01 asus kernel: [a079e607] handle_tx+0x287/0x680 [vhost_net]
Jul 8 10:58:01 asus kernel: [a079ea35] handle_tx_kick+0x15/0x20 [vhost_net]
Jul 8 10:58:01 asus kernel: [a079a80a] vhost_worker+0xaa/0x1a0 [vhost_net]
Jul 8 10:58:01 asus kernel: [8105ef80] kthread+0xc0/0xd0
Jul 8 10:58:01 asus kernel: [8140395c] ret_from_fork+0x7c/0xb0
Jul 8 10:58:01 asus kernel: Code: 8b 6d f8 c9 c3 48 8b 07 f6 c4 80 75 0d f0 ff 4b 1c 0f 94 c0 84 c0 74 c9 eb bf 4c 8b 67 30 48 8b 07 f6 c4 80 74 e7 4c 39 e7 74 e2 41 8b 54 24 1c 49 8d 4c 24 1c 85 d2 74 d4 8d 72 01 89 d0 f0 0f
Jul 8 10:58:01 asus kernel: RIP [810f73a9] put_compound_page+0x89/0x170
Jul 8 10:58:01 asus kernel: RSP 8800aab13bd8
Jul 8 10:58:01 asus kernel: CR2: 001c
Jul 8 10:58:01 asus kernel: ---[ end trace 481d0b283c089c9a ]---

The patch I ran this test with is as follows:

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index dfff647..98f81e6 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -857,7 +857,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	mutex_unlock(&vq->mutex);
[PATCH 1/2] KVM: PPC: Book3S HV: Correct tlbie usage
This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On
the PPC970, the bit to select large vs. small page is in the
instruction, not in the RB register value. This changes the code to use
the correct form on PPC970.

On POWER7 we were calculating the AVAL (Abbreviated Virtual Address,
Lower) field of the RB value incorrectly for 64k pages. This fixes it.

Since we now have several cases to handle for the tlbie instruction,
this factors out the code to do a sequence of tlbies into a new
function, do_tlbies(), and calls that from the various places where the
code was doing tlbie instructions inline. It also makes
kvmppc_h_bulk_remove() use the same global_invalidates() function for
determining whether to do local or global TLB invalidations as is used
in other places, for consistency, and also to make sure that
kvm->arch.need_tlb_flush gets updated properly.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 139 ++-
 2 files changed, 82 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9c1ff33..dc6b84a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -100,7 +100,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 			/* (masks depend on page size) */
 			rb |= 0x1000;		/* page encoding in LP field */
 			rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
-			rb |= (va_low & 0xfe);	/* AVAL field (P7 doesn't seem to care) */
+			rb |= ((va_low << 4) & 0xf0);	/* AVAL field (P7 doesn't seem to care) */
 		}
 	} else {
 		/* 4kB page */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 6dcbb49..105b00f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -385,6 +385,80 @@ static inline int try_lock_tlbie(unsigned int *lock)
 	return old == 0;
 }

+/*
+ * tlbie/tlbiel is a bit different on the PPC970 compared to later
+ * processors such as POWER7; the large page bit is in the instruction
+ * not RB, and the top 16 bits and the bottom 12 bits of the VA
+ * in RB must be 0.
+ */
+static void do_tlbies_970(struct kvm *kvm, unsigned long *rbvalues,
+			  long npages, int global, bool need_sync)
+{
+	long i;
+
+	if (global) {
+		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+			cpu_relax();
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i) {
+			unsigned long rb = rbvalues[i];
+
+			if (rb & 1)		/* large page */
+				asm volatile("tlbie %0,1" : :
+					     "r" (rb & 0xf000ul));
+			else
+				asm volatile("tlbie %0,0" : :
+					     "r" (rb & 0xf000ul));
+		}
+		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+		kvm->arch.tlbie_lock = 0;
+	} else {
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i) {
+			unsigned long rb = rbvalues[i];
+
+			if (rb & 1)		/* large page */
+				asm volatile("tlbiel %0,1" : :
+					     "r" (rb & 0xf000ul));
+			else
+				asm volatile("tlbiel %0,0" : :
+					     "r" (rb & 0xf000ul));
+		}
+		asm volatile("ptesync" : : : "memory");
+	}
+}
+
+static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
+		      long npages, int global, bool need_sync)
+{
+	long i;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_201)) {
+		/* PPC970 tlbie instruction is a bit different */
+		do_tlbies_970(kvm, rbvalues, npages, global, need_sync);
+		return;
+	}
+	if (global) {
+		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+			cpu_relax();
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i)
+			asm volatile(PPC_TLBIE(%1,%0) : :
+				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
+		asm volatile("eieio; tlbsync; ptesync
[Bug 60505] Heavy network traffic triggers vhost_net lockup
https://bugzilla.kernel.org/show_bug.cgi?id=60505

Bart Van Assche bvanass...@acm.org changed:

           What      |Removed |Added
------------------------------------
           Regression|No      |Yes

--
You are receiving this mail because:
You are watching the assignee of the bug.
[PATCH 2/2] KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers
The table of offsets to real-mode hcall handlers in
book3s_hv_rmhandlers.S can contain negative values, if some of the
handlers end up before the table in the vmlinux binary. Thus we need to
use a sign-extending load to read the values in the table rather than a
zero-extending load. Without this, the host crashes when the guest does
one of the hcalls with negative offsets, due to jumping to a bogus
address.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b02f91e..60dce5b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1381,7 +1381,7 @@ hcall_try_real_mode:
 	cmpldi	r3,hcall_real_table_end - hcall_real_table
 	bge	guest_exit_cont
 	LOAD_REG_ADDR(r4, hcall_real_table)
-	lwzx	r3,r3,r4
+	lwax	r3,r3,r4
 	cmpwi	r3,0
 	beq	guest_exit_cont
 	add	r3,r3,r4
--
1.8.3.1
[Bug 60505] Heavy network traffic triggers vhost_net lockup
https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #5 from Bart Van Assche bvanass...@acm.org ---

The lockup does not occur with kernel 3.8.12 but occurs with at least
kernel 3.9.9 and kernel 3.10. I have been able to trigger the lockup
with kernel 3.10 without seeing any tasks hanging in vhost_work_flush().
Re: [PATCH v3] KVM: nVMX: Fix read/write to MSR_IA32_FEATURE_CONTROL
On Sun, Jul 07, 2013 at 11:07:33PM +0800, Arthur Chunqi Li wrote:
> Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment. This
> patch simulate this MSR in nested_vmx and the default value is 0x0.
> BIOS should set it to 0x5 before VMXON. After setting the lock bit,
> write to it will cause #GP(0).
>
> Another QEMU patch is also needed to handle emulation of reset and
> migration. Reset to vCPU should clear this MSR and migration should
> reserve value of it.
>
> This patch is based on Nadav's previous commit.
> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478
>
> Signed-off-by: Nadav Har'El nyh at il.ibm.com
Nadav's address is n...@math.technion.ac.il. Also the first line of the
email should be

From: Nadav Har'El n...@math.technion.ac.il

> Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
> ---
>  arch/x86/kvm/vmx.c | 32 ++--
>  arch/x86/kvm/x86.c |  3 ++-
>  2 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a7e1855..a64efd0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -373,6 +373,7 @@ struct nested_vmx {
> 	 * we must keep them pinned while L2 runs.
> 	 */
> 	struct page *apic_access_page;
> +	u64 msr_ia32_feature_control;
>  };
>
>  #define POSTED_INTR_ON 0
> @@ -2282,8 +2283,11 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>
>  	switch (msr_index) {
>  	case MSR_IA32_FEATURE_CONTROL:
> -		*pdata = 0;
> -		break;
> +		if (nested_vmx_allowed(vcpu)){
Space after { here and everywhere. Use scripts/checkpatch.pl to check
your patches for style issues.

> +			*pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
> +			break;
> +		}
> +		return 0;
>  	case MSR_IA32_VMX_BASIC:
>  		/*
>  		 * This MSR reports some information about VMX support. We
> @@ -2356,14 +2360,21 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  	return 1;
>  }
>
> -static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> +static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  {
> +	u32 msr_index = msr_info->index;
> +	u64 data = msr_info->data;
> +	bool host_initialized = msr_info->host_initiated;
Leave empty line here.

>  	if (!nested_vmx_allowed(vcpu))
>  		return 0;
> -	if (msr_index == MSR_IA32_FEATURE_CONTROL)
> -		/* TODO: the right thing. */
> +	if (msr_index == MSR_IA32_FEATURE_CONTROL){
> +		if (!host_initialized &&
> +			to_vmx(vcpu)->nested.msr_ia32_feature_control
> +				& FEATURE_CONTROL_LOCKED)
> +			return 0;
> +		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
>  		return 1;
> +	}
>  	/*
>  	 * No need to treat VMX capability MSRs specially: If we don't handle
>  	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
> @@ -2494,7 +2505,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		return 1;
>  	/* Otherwise falls through */
>  	default:
> -		if (vmx_set_vmx_msr(vcpu, msr_index, data))
> +		if (vmx_set_vmx_msr(vcpu, msr_info))
>  			break;
>  		msr = find_msr_entry(vmx, msr_index);
>  		if (msr) {
> @@ -5576,6 +5587,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
>  	struct kvm_segment cs;
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	struct vmcs *shadow_vmcs;
> +	const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
> +		| FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
>
>  	/* The Intel VMX Instruction Reference lists a bunch of bits that
>  	 * are prerequisite to running VMXON, most notably cr4.VMXE must be
> @@ -5604,6 +5617,13 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
>  		skip_emulated_instruction(vcpu);
>  		return 1;
>  	}
> +
> +	if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
> +			!= VMXON_NEEDED_FEATURES) {
> +		kvm_inject_gp(vcpu, 0);
> +		return 1;
> +	}
> +
>  	if (enable_shadow_vmcs) {
>  		shadow_vmcs = alloc_vmcs();
>  		if (!shadow_vmcs)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d21bce5..cff77c4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -850,7 +850,8 @@ static u32 msrs_to_save[] = {
>  #ifdef CONFIG_X86_64
>  	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
>  #endif
> -	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
> +	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> +	MSR_IA32_FEATURE_CONTROL
>  };
>
>  static unsigned num_msrs_to_save;
> --
> 1.7.9.5

--
Gleb.
Re: [PATCH 1/2] KVM: Introduce kvm_arch_memslots_updated()
On Thu, 4 Jul 2013 12:53:15 +0300
Gleb Natapov g...@redhat.com wrote:

> On Thu, Jul 04, 2013 at 01:40:29PM +0900, Takuya Yoshikawa wrote:
> > This is called right after the memslots is updated, i.e. when the
> > result of update_memslots() gets installed in install_new_memslots().
> > Since the memslots needs to be updated twice when we delete or move a
> > memslot, kvm_arch_commit_memory_region() does not correspond to this
> > exactly.
> >
> > In the following patch, x86 will use this new API to check if the
> > mmio generation has reached its maximum value, in which case mmio
> > sptes need to be flushed out.
> >
> > Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
> > ---
> >  Removed the trailing space after "return old_memslots;" at this chance.
> >
> >  arch/arm/kvm/arm.c         | 4
> >  arch/ia64/kvm/kvm-ia64.c   | 4
> >  arch/mips/kvm/kvm_mips.c   | 4
> >  arch/powerpc/kvm/powerpc.c | 4
> >  arch/s390/kvm/kvm-s390.c   | 4
> >  arch/x86/kvm/x86.c         | 4
> >  include/linux/kvm_host.h   | 1 +
> >  virt/kvm/kvm_main.c        | 5 -
> >  8 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index e3aae6d..1c1e9de 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -498,6 +498,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> >  void kvm_arch_free_memslot(struct kvm_memory_slot *free,
> >  			   struct kvm_memory_slot *dont);
> >  int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages);
> > +void kvm_arch_memslots_updated(struct kvm *kvm);
>
> We can define empty function here like this:
>
> #ifdef __KVM_HAVE_MEMSLOT_UPDATE
> void kvm_arch_memslots_updated(struct kvm *kvm);
> #else
> static void kvm_arch_memslots_updated(struct kvm *kvm)
> {
> }
> #endif
>
> and make x86.c define __KVM_HAVE_MEMSLOT_UPDATE. But I am fine with
> your approach too. Do other arch maintainers have any preferences
> here?

I don't really have a strong preference either way, so

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com

for the current approach.
Re: [PATCH v2] kvm tools: fix boot of guests with more than 4gb of ram
Hi guys,

On Mon, Jul 08, 2013 at 09:12:26AM +0100, Pekka Enberg wrote:
> On Sun, Jul 7, 2013 at 7:00 PM, Sasha Levin sasha.le...@oracle.com wrote:
> > Commit "kvm tools: virtio: remove hardcoded assumptions about guest
> > page size" has introduced a bug that prevented guests with more than
> > 4gb of ram from booting.
> >
> > The issue is that 'pfn' is a 32bit integer, so when multiplying it by
> > page size to get the actual page will cause an overflow if the pfn
> > referred to a memory area above 4gb.
> >
> > Signed-off-by: Sasha Levin sasha.le...@oracle.com
>
> Will, Michael, Asias, good to merge?

I'm at a conference at the moment, so unable to test this patch, but it
looks simple and correct enough to me:

Acked-by: Will Deacon will.dea...@arm.com

Cheers,

Will
[PATCH v4] KVM: nVMX: Fix read/write to MSR_IA32_FEATURE_CONTROL
From: Nadav Har'El n...@math.technion.ac.il

Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment. This
patch simulate this MSR in nested_vmx and the default value is 0x0.
BIOS should set it to 0x5 before VMXON. After setting the lock bit,
write to it will cause #GP(0).

Another QEMU patch is also needed to handle emulation of reset and
migration. Reset to vCPU should clear this MSR and migration should
reserve value of it.

This patch is based on Nadav's previous commit.
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478

Signed-off-by: Nadav Har'El n...@math.technion.ac.il
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c | 35 +--
 arch/x86/kvm/x86.c |  3 ++-
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a7e1855..1200e4e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -373,6 +373,7 @@ struct nested_vmx {
 	 * we must keep them pinned while L2 runs.
 	 */
 	struct page *apic_access_page;
+	u64 msr_ia32_feature_control;
 };

 #define POSTED_INTR_ON 0
@@ -2282,8 +2283,11 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)

 	switch (msr_index) {
 	case MSR_IA32_FEATURE_CONTROL:
-		*pdata = 0;
-		break;
+		if (nested_vmx_allowed(vcpu)) {
+			*pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+			break;
+		}
+		return 0;
 	case MSR_IA32_VMX_BASIC:
 		/*
 		 * This MSR reports some information about VMX support. We
@@ -2356,14 +2360,24 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	return 1;
 }

-static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
+static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+	u32 msr_index = msr_info->index;
+	u64 data = msr_info->data;
+	bool host_initialized = msr_info->host_initiated;
+
 	if (!nested_vmx_allowed(vcpu))
 		return 0;

-	if (msr_index == MSR_IA32_FEATURE_CONTROL)
-		/* TODO: the right thing. */
+	if (msr_index == MSR_IA32_FEATURE_CONTROL) {
+		if (!host_initialized &&
+			to_vmx(vcpu)->nested.msr_ia32_feature_control
+				& FEATURE_CONTROL_LOCKED)
+			return 0;
+		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
 		return 1;
+	}
+
 	/*
 	 * No need to treat VMX capability MSRs specially: If we don't handle
 	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
@@ -2494,7 +2508,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		return 1;
 	/* Otherwise falls through */
 	default:
-		if (vmx_set_vmx_msr(vcpu, msr_index, data))
+		if (vmx_set_vmx_msr(vcpu, msr_info))
 			break;
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {
@@ -5576,6 +5590,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 	struct kvm_segment cs;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmcs *shadow_vmcs;
+	const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
+		| FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;

 	/* The Intel VMX Instruction Reference lists a bunch of bits that
 	 * are prerequisite to running VMXON, most notably cr4.VMXE must be
@@ -5604,6 +5620,13 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 		skip_emulated_instruction(vcpu);
 		return 1;
 	}
+
+	if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
+			!= VMXON_NEEDED_FEATURES) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	if (enable_shadow_vmcs) {
 		shadow_vmcs = alloc_vmcs();
 		if (!shadow_vmcs)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d21bce5..cff77c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -850,7 +850,8 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+	MSR_IA32_FEATURE_CONTROL
 };

 static unsigned num_msrs_to_save;
--
1.7.9.5
[PULL] vhost: cleanups and fixes
The following changes since commit 8bb495e3f02401ee6f76d1b1d77f3ac9f079e376:

  Linux 3.10 (2013-06-30 15:13:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 09a34c8404c1d4c5782de319c02e1d742c57875c:

  vhost/test: update test after vhost cleanups (2013-07-07 18:02:25 +0300)

----------------------------------------------------------------
vhost: fixes and cleanups 3.11

This includes some fixes and cleanups for the vhost net and scsi
drivers. The scsi driver changes will cause a conflict with Nicholas
Bellinger's scsi target changes, but the conflicting commit in my tree
simply renames some variables so it's trivial to resolve.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

----------------------------------------------------------------
Asias He (8):
      vhost: Simplify dev->vqs[i] access
      vhost-scsi: Remove unnecessary forward struct vhost_scsi declaration
      vhost-scsi: Rename struct vhost_scsi *s to *vs
      vhost-scsi: Make func indention more consistent
      vhost-scsi: Rename struct tcm_vhost_tpg *tv_tpg to *tpg
      vhost-scsi: Rename struct tcm_vhost_cmd *tv_cmd to *cmd
      vhost: Make vhost a separate module
      vhost: Make local function static

Michael S. Tsirkin (2):
      vhost-net: fix use-after-free in vhost_net_flush
      vhost/test: update test after vhost cleanups

 drivers/vhost/Kconfig  |   8 +
 drivers/vhost/Makefile |   3 +-
 drivers/vhost/net.c    |  13 +-
 drivers/vhost/scsi.c   | 472 ++---
 drivers/vhost/test.c   |  33 ++--
 drivers/vhost/vhost.c  |  86 +++--
 drivers/vhost/vhost.h  |   2 +
 7 files changed, 356 insertions(+), 261 deletions(-)
Registers need to recover when emulating L2 vmexit
Hi Gleb and Paolo,

From the current KVM code, when L2 causes a VMEXIT or L1 fails to enter
L2, host VMX will execute nested_vmx_vmexit() and
nested_vmx_entry_failure(). Both of them call load_vmcs12_host_state(),
which loads vmcs12's HOST fields as vmcs01's GUEST fields. But the HOST
and GUEST fields do not correspond exactly, e.g.
GUEST_CS/ES..._BASE/LIMIT/AR have no HOST counterparts. What will these
fields be set to?

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
Re: [PATCH 2/4] PF: Move architecture specifics to the backends
On 2013-07-05 21:55, Dominik Dingel wrote:
> Current common code uses PAGE_OFFSET to indicate a bad host virtual
> address. As this check won't work on architectures that don't map
> kernel and user memory into the same address space (e.g. s390), it is
> moved into architecture specific code.
>
> Signed-off-by: Dominik Dingel din...@linux.vnet.ibm.com
> ---
>  arch/arm/include/asm/kvm_host.h     |  8
>  arch/ia64/include/asm/kvm_host.h    |  3 +++
>  arch/mips/include/asm/kvm_host.h    |  6 ++
>  arch/powerpc/include/asm/kvm_host.h |  8
>  arch/s390/include/asm/kvm_host.h    | 12
>  arch/x86/include/asm/kvm_host.h     |  8
>  include/linux/kvm_host.h            |  8
>  7 files changed, 45 insertions(+), 8 deletions(-)
[...]
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a63d83e..210f493 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -85,14 +85,6 @@ static inline bool is_noslot_pfn(pfn_t pfn)
>  	return pfn == KVM_PFN_NOSLOT;
>  }
>
> -#define KVM_HVA_ERR_BAD		(PAGE_OFFSET)
> -#define KVM_HVA_ERR_RO_BAD	(PAGE_OFFSET + PAGE_SIZE)
> -
> -static inline bool kvm_is_error_hva(unsigned long addr)
> -{
> -	return addr >= PAGE_OFFSET;
> -}
> -
>  #define KVM_ERR_PTR_BAD_PAGE	(ERR_PTR(-ENOENT))
>
>  static inline bool is_error_page(struct page *page)

Nit: This breaks arm64. I suppose the patches have been created before
the arm64 code got merged, so I'd expect the next version of this
series to deal with arm64 as well.

Thanks,

	M.
--
Fast, cheap, reliable. Pick two.
Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
On Thu, Jul 04, 2013 at 08:42:53AM +0000, Zhang, Yang Z wrote:
Gleb Natapov wrote on 2013-07-02:
On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
On 2013-07-02 17:15, Gleb Natapov wrote:
On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
On 2013-07-02 15:59, Gleb Natapov wrote:
On Tue, Jul 02, 2013 at 03:01:24AM +0000, Zhang, Yang Z wrote:

Since this series has been pending on the mailing list for a long time,
and it's really a big feature for nested VMX, and I suspect the
original authors (Jun and Nadav) don't have enough time to continue it,
I will pick it up. :) See comments below:

Paolo Bonzini wrote on 2013-05-20:
On 19/05/2013 06:52, Jun Nakajima wrote:

From: Nadav Har'El n...@il.ibm.com

Recent KVM, since
http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
switch the EFER MSR when EPT is used and the host and guest have
different NX bits. So if we add support for nested EPT (L1 guest using
EPT to run L2) and want to be able to run recent KVM as L1, we need to
allow L1 to use this EFER switching feature.

To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if
available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD.
This patch adds support for the former (the latter is still
unsupported).

Nested entry and exit emulation (prepare_vmcs_02 and
load_vmcs12_host_state, respectively) already handled
VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in
this patch is to properly advertise this feature to L1.

Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by
using vmx_set_efer (which itself sets one of several vmcs02 fields),
so we always support this feature, regardless of whether the host
supports it.

Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
Signed-off-by: Xinhao Xu xinhao...@intel.com
---
 arch/x86/kvm/vmx.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 260a919..fb9cae5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2192,7 +2192,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #else
 	nested_vmx_exit_ctls_high = 0;
 #endif
-	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
+				      VM_EXIT_LOAD_IA32_EFER);

 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
 	nested_vmx_entry_ctls_high = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
-	nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
-
+	nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
+				       VM_ENTRY_LOAD_IA32_EFER);
 	/* cpu-based controls */
 	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
 		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
@@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vcpu->arch.cr0_guest_owned_bits = ~vmcs12->cr0_guest_host_mask;
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);

-	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
-	vmcs_write32(VM_EXIT_CONTROLS,
-		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
-	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
+	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
+	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
+	 * bits are further modified by vmx_set_efer() below.
+	 */
+	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);

This is wrong. We cannot use L0 exit controls directly.
LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFER, LOAD_HOST_PAT and
ACK_INTR_ON_EXIT should use the host's exit controls. But the others
still need to use (vmcs12 | host).

I do not see why. We always intercept DR7/PAT/EFER, so save is emulated
too. Host address space size always comes from L0, and the preemption
timer is not supported for nested IIRC; when it will be, the host will
have to save it on exit anyway for correct emulation.

Preemption timer is already supported and works fine as far as I
tested. KVM doesn't use it for L1, so we do not need to save/restore
it - IIRC.

So what happens if L1 configures it to value X, after X/2 ticks an L0
exit happens and L0 gets back to L2 directly? The counter will be X
again instead of X/2.

Likely. Yes, we need to improve our emulation by setting "Save
VMX-preemption timer value" or emulate this in software if the hardware
lacks support for it (was this flag introduced
Re: [Qemu-devel] [PATCH qom-cpu v9] target-i386: Move hyperv_* static globals to X86CPU
On Mon, 8 Jul 2013 03:03:54 +0200 Andreas Färber afaer...@suse.de wrote: From: Igor Mammedov imamm...@redhat.com - since hyperv_* helper functions are used only in target-i386/kvm.c move them there as static helpers Requested-by: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Andreas Färber afaer...@suse.de I'm not tested it yet, but it looks good to me. --- v8 (imammedo) - v9: * Use X86CPU instead of CPUX86State (only used in KVM) * Changed helper functions to X86CPU argument * Moved field initialization to QOM instance_init * Fixed subject (not today's CPUState) target-i386/Makefile.objs | 2 +- target-i386/cpu-qom.h | 4 +++ target-i386/cpu.c | 16 target-i386/cpu.h | 4 +++ target-i386/hyperv.c | 64 --- target-i386/hyperv.h | 45 - target-i386/kvm.c | 36 ++ 7 files changed, 46 insertions(+), 125 deletions(-) delete mode 100644 target-i386/hyperv.c delete mode 100644 target-i386/hyperv.h diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs index c1d4f05..887dca7 100644 --- a/target-i386/Makefile.objs +++ b/target-i386/Makefile.objs @@ -2,7 +2,7 @@ obj-y += translate.o helper.o cpu.o obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o svm_helper.o obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o -obj-$(CONFIG_KVM) += kvm.o hyperv.o +obj-$(CONFIG_KVM) += kvm.o obj-$(CONFIG_NO_KVM) += kvm-stub.o obj-$(CONFIG_LINUX_USER) += ioport-user.o obj-$(CONFIG_BSD_USER) += ioport-user.o diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h index 7e55e5f..18f08b8 100644 --- a/target-i386/cpu-qom.h +++ b/target-i386/cpu-qom.h @@ -66,6 +66,10 @@ typedef struct X86CPU { CPUX86State env; +bool hyperv_vapic; +bool hyperv_relaxed_timing; +int hyperv_spinlock_attempts; + /* Features that were filtered out because of missing host capabilities */ uint32_t filtered_features[FEATURE_WORDS]; } X86CPU; diff --git a/target-i386/cpu.c 
b/target-i386/cpu.c index e3f75a8..14e9c7e 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -35,8 +35,6 @@ #include qapi/visitor.h #include sysemu/arch_init.h -#include hyperv.h - #include hw/hw.h #if defined(CONFIG_KVM) #include linux/kvm_para.h @@ -1587,12 +1585,19 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char *features, Error **errp) object_property_parse(OBJECT(cpu), num, tsc-frequency, errp); } else if (!strcmp(featurestr, hv-spinlocks)) { char *err; +const int min = 0xFFF; numvalue = strtoul(val, err, 0); if (!*val || *err) { error_setg(errp, bad numerical value %s, val); goto out; } -hyperv_set_spinlock_retries(numvalue); +if (numvalue min) { +fprintf(stderr, hv-spinlocks value shall always be = 0x%x +, fixup will be removed in future versions\n, +min); +numvalue = min; +} +cpu-hyperv_spinlock_attempts = numvalue; } else { error_setg(errp, unrecognized feature %s, featurestr); goto out; @@ -1602,9 +1607,9 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char *features, Error **errp) } else if (!strcmp(featurestr, enforce)) { check_cpuid = enforce_cpuid = 1; } else if (!strcmp(featurestr, hv_relaxed)) { -hyperv_enable_relaxed_timing(true); +cpu-hyperv_relaxed_timing = true; } else if (!strcmp(featurestr, hv_vapic)) { -hyperv_enable_vapic_recommended(true); +cpu-hyperv_vapic = true; } else { error_setg(errp, feature string `%s' not in format (+feature| -feature|feature=xyz), featurestr); @@ -2479,6 +2484,7 @@ static void x86_cpu_initfn(Object *obj) x86_cpu_get_feature_words, NULL, NULL, (void *)cpu-filtered_features, NULL); +cpu-hyperv_spinlock_attempts = HYPERV_SPINLOCK_NEVER_RETRY; env-cpuid_apic_id = x86_cpu_apic_id_from_index(cs-cpu_index); /* init various static tables used in TCG mode */ diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 2d005b3..6c3eb86 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -549,6 +549,10 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS]; #define CPUID_MWAIT_IBE (1 1) /* Interrupts 
can exit capability */ #define CPUID_MWAIT_EMX (1 << 0) /* enumeration
[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=60271 Fabian Zimmermann dev@googlemail.com changed: What|Removed |Added Summary|Kernelpanic since 3.9.6 |Kernelpanic since 3.9.8 |with qemu-kvm and |with qemu-kvm and |pci-passthrough |pci-passthrough -- You are receiving this mail because: You are watching the assignee of the bug. 
[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=60271 --- Comment #6 from Fabian Zimmermann dev@googlemail.com --- * reversed both above patches - problem still there * disabled radeon-module (in kernel-config) - problem still there Attached you will find the dmesg.txt (netconsole-output of panic). Don't hesitate to ask if I can provide further information 
[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=60271 --- Comment #7 from Fabian Zimmermann dev@googlemail.com --- Created attachment 106838 -- https://bugzilla.kernel.org/attachment.cgi?id=106838&action=edit netconsole / dmesg of panic 
[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=60271 Fabian Zimmermann dev@googlemail.com changed: What|Removed |Added Kernel Version|3.9.8 |3.9.8, 3.9.9 --- Comment #8 from Fabian Zimmermann dev@googlemail.com --- 3.9.9 is affected, too. 
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 28.06.2013, at 11:20, Mihai Caraman wrote: lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify the source of an exception originating from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked? caused by lwepx's data TLB miss handled in the host and the TODO for TLB eviction and execute-but-not-read entries. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list for a while. arch/powerpc/include/asm/mmu-book3e.h |6 ++- arch/powerpc/kvm/booke.c |6 +++ arch/powerpc/kvm/booke.h |2 + arch/powerpc/kvm/bookehv_interrupts.S | 32 ++- arch/powerpc/kvm/e500.c |4 ++ arch/powerpc/kvm/e500mc.c | 69 + 6 files changed, 91 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h index 99d43e0..32e470e 100644 --- a/arch/powerpc/include/asm/mmu-book3e.h +++ b/arch/powerpc/include/asm/mmu-book3e.h @@ -40,7 +40,10 @@ /* MAS registers bit definitions */ -#define MAS0_TLBSEL(x) (((x) << 28) & 0x30000000) +#define MAS0_TLBSEL_MASK 0x30000000 +#define MAS0_TLBSEL_SHIFT 28 +#define MAS0_TLBSEL(x) (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK) +#define MAS0_GET_TLBSEL(mas0) (((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT) #define MAS0_ESEL_MASK 0x0FFF0000 #define MAS0_ESEL_SHIFT 16 #define MAS0_ESEL(x) (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK) @@ -58,6 +61,7 @@ #define MAS1_TSIZE_MASK 0x00000f80 #define MAS1_TSIZE_SHIFT 7 #define MAS1_TSIZE(x) (((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK) +#define MAS1_GET_TSIZE(mas1) 
(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT) #define MAS2_EPN (~0xFFFUL) #define MAS2_X0 0x00000040 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 1020119..6764a8e 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); + /* + * The exception type can change at this point, such as if the TLB entry + * for the emulated instruction has been evicted. + */ + kvmppc_prepare_for_emulation(vcpu, &exit_nr); Please model this the same way as book3s. Check out kvmppc_get_last_inst() as a starting point. + /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr); diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index 5fd1ba6..a0d0fea 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu); void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu); +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr); + enum int_class { INT_CLASS_NONCRIT, INT_CLASS_CRIT, diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index 20c7a54..0538ab9 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -120,37 +120,20 @@ .if \flags & NEED_EMU /* - * This assumes you have external PID support. - * To support a bookehv CPU without external PID, you'll - * need to look up the TLB entry and create a temporary mapping. - * - * FIXME: we don't currently handle if the lwepx faults. PR-mode - * booke doesn't handle it either. 
Since Linux doesn't use - * broadcast tlbivax anymore, the only way this should happen is - * if the guest maps its memory execute-but-not-read, or if we - * somehow take a TLB miss in the middle of this entry code and - * evict the relevant entry. On e500mc, all kernel lowmem is - * bolted into TLB1 large page mappings, and we don't use - * broadcast invalidates, so we should not take a TLB miss here. - * - * Later we'll need to deal with faults here. Disallowing guest - * mappings that are execute-but-not-read could be an option on - * e500mc, but not on chips with an LRAT if it is used.
[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough
https://bugzilla.kernel.org/show_bug.cgi?id=60271 Michael S. Tsirkin m.s.tsir...@gmail.com changed: What|Removed |Added CC||m.s.tsir...@gmail.com --- Comment #9 from Michael S. Tsirkin m.s.tsir...@gmail.com --- can you pls try disabling zero copy tx in vhost_net? it's a module parameter for this module 
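For anyone else hitting this: zero-copy TX is controlled by the vhost_net module parameter experimental_zcopytx on kernels of this vintage. One possible way to disable it, assuming no guest is currently using vhost so the module can be reloaded:

```shell
# Stop guests using vhost-net first, then reload the module with
# zero-copy transmit disabled.
modprobe -r vhost_net
modprobe vhost_net experimental_zcopytx=0

# Verify the parameter took effect (should print 0).
cat /sys/module/vhost_net/parameters/experimental_zcopytx
```

This is a config/ops fragment requiring root; making the change persistent would need a modprobe.d options file.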
Re: [PATCH -V3 1/4] mm/cma: Move dma contiguous changes into a separate config
On 02.07.2013, at 07:45, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com We want to use CMA for allocating the hash page table and real mode area for PPC64. Hence move the DMA contiguous related changes into a separate config so that ppc64 can enable CMA without requiring DMA contiguous. Acked-by: Michal Nazarewicz min...@mina86.com Acked-by: Paul Mackerras pau...@samba.org Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Thanks, applied all to kvm-ppc-queue. Please provide a cover letter next time :). Alex 
RE: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
-Original Message- From: Gleb Natapov [mailto:g...@redhat.com] Sent: Monday, July 08, 2013 8:38 PM To: Zhang, Yang Z Cc: Jan Kiszka; Paolo Bonzini; Nakajima, Jun; kvm@vger.kernel.org Subject: Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 On Thu, Jul 04, 2013 at 08:42:53AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-07-02: On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote: On 2013-07-02 17:15, Gleb Natapov wrote: On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote: On 2013-07-02 15:59, Gleb Natapov wrote: On Tue, Jul 02, 2013 at 03:01:24AM +, Zhang, Yang Z wrote: Since this series is pending in mail list for long time. And it's really a big feature for Nested. Also, I doubt the original authors(Jun and Nahav)should not have enough time to continue it. So I will pick it up. :) See comments below: Paolo Bonzini wrote on 2013-05-20: Il 19/05/2013 06:52, Jun Nakajima ha scritto: From: Nadav Har'El n...@il.ibm.com Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 switch the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. 
Signed-off-by: Nadav Har'El n...@il.ibm.com Signed-off-by: Jun Nakajima jun.nakaj...@intel.com Signed-off-by: Xinhao Xu xinhao...@intel.com --- arch/x86/kvm/vmx.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 260a919..fb9cae5 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2192,7 +2192,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #else nested_vmx_exit_ctls_high = 0; #endif - nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR; + nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | + VM_EXIT_LOAD_IA32_EFER); /* entry controls */ rdmsr(MSR_IA32_VMX_ENTRY_CTLS, @@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; nested_vmx_entry_ctls_high &= VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE; - nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; - + nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | + VM_ENTRY_LOAD_IA32_EFER); /* cpu-based controls */ rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask; vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits); - /* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */ - vmcs_write32(VM_EXIT_CONTROLS, - vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl); -vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls | + /* L2->L1 exit controls are emulated - the hardware exit is +to L0 so + * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER +* bits are further modified by vmx_set_efer() below. + */ + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); This is wrong. We cannot use L0 exit control directly. 
LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT, ACK_INTR_ON_EXIT should use host's exit control. But others, still need use (vmcs12|host). I do not see why. We always intercept DR7/PAT/EFER, so save is emulated too. Host address space size always come from L0 and preemption timer is not supported for nested IIRC and when it will be host will have to save it on exit anyway for correct emulation. Preemption timer is already supported and works fine as far as I tested. KVM doesn't use it for L1, so we do not need to save/restore it - IIRC. So what
guests not shutting down when host shuts down
Hi, i have a SLES 11 SP2 64bit host with three guests: - Windows XP 32 - Ubuntu 12.04 LTS 64bit - SLES 11 SP2 64bit The SLES guest shuts down with the host shutdown. The others not. When i shutdown these two guests with the virt-manager, they shutdown fine. ACPI is activated in virt-manager for both of them. When the host shuts down, the two guests get a signal (excerpt from the log of the host:) === 2013-07-07 16:39:51.674: starting up LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1025 -smp 1,sockets=1,cores=1,threads=1 -name greensql_2 -uuid 2cfbac9c-dbb2-c4bf-4aba-2d18dc49d18e -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/greensql_2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/var/lib/kvm/images/greensql_2/disk0.raw,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=20 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:37:92:a9,bus=pci.0,addr=0x3 -usb -vnc 127.0.0.1:2 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 Domain id=3 is tainted: high-privileges qemu: terminating on signal 15 from pid 24958 2013-07-08 13:58:29.651: starting up == I'm a bit astonished about no-shutdown in the commandline, but the sles guest also has it in its commandline, so it should not bother. Thanks for any help. 
Bernd -- Bernd Lentes Systemadministration Institut für Entwicklungsgenetik Gebäude 35.34 - Raum 208 HelmholtzZentrum münchen bernd.len...@helmholtz-muenchen.de phone: +49 89 3187 1241 fax: +49 89 3187 2294 http://www.helmholtz-muenchen.de/idg Wer nichts verdient außer Geld verdient nichts außer Geld Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) Ingolstädter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe Geschäftsführer: Prof. Dr. Günther Wess Dr. Nikolaus Blum Dr. Alfons Enhsen Registergericht: Amtsgericht München HRB 6466 USt-IdNr: DE 129521671 
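The "terminating on signal 15" in the log suggests the libvirt-guests service is killing the remaining domains rather than requesting an ACPI shutdown. A plausible fix is to configure libvirt-guests to shut guests down cleanly (the file path below is the SUSE/Red Hat sysconfig location; adjust for your distro):

```shell
# /etc/sysconfig/libvirt-guests
# Send an ACPI shutdown request to each running guest at host shutdown,
# instead of suspending or killing it, and give slow guests (e.g. the
# Windows XP domain) time to react.
ON_SHUTDOWN=shutdown
SHUTDOWN_TIMEOUT=120

# On SysV-init systems such as SLES 11 the service must also be enabled:
#   chkconfig libvirt-guests on
```

Guests that still ignore the request need working ACPI handling inside the guest (e.g. acpid on Ubuntu, and on Windows XP a logged-in session or the "allow shutdown without logon" policy).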
Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
On Mon, Jul 08, 2013 at 02:28:15PM +, Zhang, Yang Z wrote: -Original Message- From: Gleb Natapov [mailto:g...@redhat.com] Sent: Monday, July 08, 2013 8:38 PM To: Zhang, Yang Z Cc: Jan Kiszka; Paolo Bonzini; Nakajima, Jun; kvm@vger.kernel.org Subject: Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 On Thu, Jul 04, 2013 at 08:42:53AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-07-02: On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote: On 2013-07-02 17:15, Gleb Natapov wrote: On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote: On 2013-07-02 15:59, Gleb Natapov wrote: On Tue, Jul 02, 2013 at 03:01:24AM +, Zhang, Yang Z wrote: Since this series is pending in mail list for long time. And it's really a big feature for Nested. Also, I doubt the original authors(Jun and Nahav)should not have enough time to continue it. So I will pick it up. :) See comments below: Paolo Bonzini wrote on 2013-05-20: Il 19/05/2013 06:52, Jun Nakajima ha scritto: From: Nadav Har'El n...@il.ibm.com Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 switch the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. 
Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. Signed-off-by: Nadav Har'El n...@il.ibm.com Signed-off-by: Jun Nakajima jun.nakaj...@intel.com Signed-off-by: Xinhao Xu xinhao...@intel.com --- arch/x86/kvm/vmx.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 260a919..fb9cae5 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2192,7 +2192,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #else nested_vmx_exit_ctls_high = 0; #endif -nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR; +nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | +VM_EXIT_LOAD_IA32_EFER); /* entry controls */ rdmsr(MSR_IA32_VMX_ENTRY_CTLS, @@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; nested_vmx_entry_ctls_high = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE; -nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; - +nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | + VM_ENTRY_LOAD_IA32_EFER); /* cpu-based controls */ rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vcpu-arch.cr0_guest_owned_bits = ~vmcs12-cr0_guest_host_mask; vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu-arch.cr0_guest_owned_bits); -/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */ -vmcs_write32(VM_EXIT_CONTROLS, - vmcs12-vm_exit_controls | vmcs_config.vmexit_ctrl); - vmcs_write32(VM_ENTRY_CONTROLS, vmcs12-vm_entry_controls | +/* L2-L1 exit controls are emulated - the hardware exit is +to L0 so + * we should use its exit controls. 
Note that IA32_MODE, LOAD_IA32_EFER + * bits are further modified by vmx_set_efer() below. + */ +vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); This is wrong. We cannot use L0 exit control directly. LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFER, LOAD_HOST_PAT, ACK_INTR_ON_EXIT should use the host's exit control. But others still need to use (vmcs12|host). I do not see why. We always intercept DR7/PAT/EFER, so the save is emulated too. The host address space size always comes from L0, and the preemption timer is not supported for nested IIRC, and
[GIT PULL] VFIO for v3.11
Hi Linus, The following changes since commit 7d132055814ef17a6c7b69f342244c410a5e000f: Linux 3.10-rc6 (2013-06-15 11:51:07 -1000) are available in the git repository at: git://github.com/awilliam/linux-vfio.git tags/vfio-v3.11 for you to fetch changes up to 8d38ef1948bd415a5cb653a5c0ec16f3402aaca1: vfio/type1: Fix leak on error path (2013-07-01 08:28:58 -0600) vfio Updates for v3.11 Largely hugepage support for vfio/type1 iommu and surrounding cleanups and fixes. Alex Williamson (6): vfio: Convert type1 iommu to use rbtree vfio: hugepage support for vfio_iommu_type1 vfio: Provide module option to disable vfio_iommu_type1 hugepage support vfio/type1: Fix missed frees and zero sized removes vfio: Limit group opens vfio/type1: Fix leak on error path Alexey Kardashevskiy (1): vfio: fix documentation Documentation/vfio.txt | 6 +- drivers/vfio/vfio.c | 14 +++ drivers/vfio/vfio_iommu_type1.c | 626 +--- include/uapi/linux/vfio.h | 8 +- 4 files changed, 424 insertions(+), 230 deletions(-) 
Re: Registers need to recover when emulating L2 vmexit
On Mon, Jul 08, 2013 at 07:50:45PM +0800, Arthur Chunqi Li wrote: Hi Gleb and Paolo, From the current KVM code, when L2 causes a VMEXIT or L1 fails to enter L2, host VMX will execute nested_vmx_vmexit() and nested_vmx_entry_failure(). Both of them call load_vmcs12_host_state(), which loads vmcs12's HOST fields as vmcs01's GUEST fields. But the HOST and GUEST fields do not correspond exactly, e.g. GUEST_CS/ES..._BASE/LIMIT/AR. What will these MSRs be set to? This is not MSRs, but VMCS fields. Currently they are set to whatever value they had in vmcs01 when L1 executed VMLAUNCH, but this is incorrect. They should be set according to section 27.5.2 Loading Host Segment and Descriptor-Table Registers of the SDM. -- Gleb. 
Re: [PATCH 1/2] KVM: PPC: Fix kvm_exit_names array
On 03.07.2013, at 15:30, Mihai Caraman wrote: Some exit IDs were left out from the kvm_exit_names array. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/kvm/timing.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c index 07b6110..c392d26 100644 --- a/arch/powerpc/kvm/timing.c +++ b/arch/powerpc/kvm/timing.c @@ -135,7 +135,9 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = { [USR_PR_INST] = "USR_PR_INST", [FP_UNAVAIL] = "FP_UNAVAIL", [DEBUG_EXITS] = "DEBUG", - [TIMEINGUEST] = "TIMEINGUEST" + [TIMEINGUEST] = "TIMEINGUEST", + [DBELL_EXITS] = "DBELL", + [GDBELL_EXITS] = "GDBELL" Please add a comma at the end here, so that we don't have to uselessly touch the entry next time again. Alex 
Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction
On 03.07.2013, at 15:30, Mihai Caraman wrote: Some guests are making use of the return from machine check instruction to do crazy things even though the 64-bit kernel doesn't yet handle this interrupt. Emulate the MCSRR0/1 SPRs and the rfmci instruction accordingly. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/include/asm/kvm_host.h |1 + arch/powerpc/kvm/booke_emulate.c| 25 + arch/powerpc/kvm/timing.c |1 + 3 files changed, 27 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index af326cd..0466789 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -148,6 +148,7 @@ enum kvm_exit_types { EMULATED_TLBWE_EXITS, EMULATED_RFI_EXITS, EMULATED_RFCI_EXITS, + EMULATED_RFMCI_EXITS, I would quite frankly prefer to see us abandon the whole exit timing framework in the kernel and instead use trace points. Then we don't have to maintain all of this randomly exercised code. FWIW I think in this case however, treating RFMCI the same as RFI or random instruction emulation shouldn't hurt. This whole table is only about timing measurements. If you want to know for real what's going on, use trace points. Otherwise looks good. 
Alex DEC_EXITS, EXT_INTR_EXITS, HALT_WAKEUP, diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 27a4b28..aaff1b7 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -23,6 +23,7 @@ #include "booke.h" +#define OP_19_XOP_RFMCI 38 #define OP_19_XOP_RFI 50 #define OP_19_XOP_RFCI 51 @@ -43,6 +44,12 @@ static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) kvmppc_set_msr(vcpu, vcpu->arch.csrr1); } +static void kvmppc_emul_rfmci(struct kvm_vcpu *vcpu) +{ + vcpu->arch.pc = vcpu->arch.mcsrr0; + kvmppc_set_msr(vcpu, vcpu->arch.mcsrr1); +} + int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance) { @@ -65,6 +72,12 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, *advance = 0; break; + case OP_19_XOP_RFMCI: + kvmppc_emul_rfmci(vcpu); + kvmppc_set_exit_type(vcpu, EMULATED_RFMCI_EXITS); + *advance = 0; + break; + default: emulated = EMULATE_FAIL; break; @@ -138,6 +151,12 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) case SPRN_DBCR1: vcpu->arch.dbg_reg.dbcr1 = spr_val; break; + case SPRN_MCSRR0: + vcpu->arch.mcsrr0 = spr_val; + break; + case SPRN_MCSRR1: + vcpu->arch.mcsrr1 = spr_val; + break; case SPRN_DBSR: vcpu->arch.dbsr &= ~spr_val; break; @@ -284,6 +303,12 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val) case SPRN_DBCR1: *spr_val = vcpu->arch.dbg_reg.dbcr1; break; + case SPRN_MCSRR0: + *spr_val = vcpu->arch.mcsrr0; + break; + case SPRN_MCSRR1: + *spr_val = vcpu->arch.mcsrr1; + break; case SPRN_DBSR: *spr_val = vcpu->arch.dbsr; break; diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c index c392d26..670f63d 100644 --- a/arch/powerpc/kvm/timing.c +++ b/arch/powerpc/kvm/timing.c @@ -129,6 +129,7 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = { [EMULATED_TLBSX_EXITS] = "EMUL_TLBSX", [EMULATED_TLBWE_EXITS] = "EMUL_TLBWE", [EMULATED_RFI_EXITS] = "EMUL_RFI", + 
[EMULATED_RFMCI_EXITS] = "EMUL_RFMCI", [DEC_EXITS] = "DEC", [EXT_INTR_EXITS] = "EXTINT", [HALT_WAKEUP] = "HALT", -- 1.7.3.4 
Re: KVM: x86: stop IO emulation cycle if instruction pointer is modified
On Sat, Jul 06, 2013 at 10:41:12AM +0300, Gleb Natapov wrote: On Fri, Jul 05, 2013 at 04:16:55PM -0300, Marcelo Tosatti wrote: MMIO/PIO emulation should be interrupted if the system is restarted. Otherwise in-progress IO emulation continues at the instruction pointer, even after the vcpus' IP has been modified by KVM_SET_REGS. Use an IP change as an indicator to reset MMIO/PIO emulation state. Userspace has to return to the kernel to complete a pending IO operation. This is documented in Documentation/virtual/kvm/api.txt. If this is not what the program does it is a bug. What userspace do you see the problem with? You're right, this patch should not be necessary. 
Re: [PATCH 3/8] vfio: add external user support
On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote: VFIO is designed to be used via ioctls on file descriptors returned by VFIO. However in some situations support for an external user is required. The first user is KVM on PPC64 (SPAPR TCE protocol), which is going to use the existing VFIO groups for exclusive access in real/virtual mode on a host to avoid passing map/unmap requests to the user space, which would make things pretty slow. The proposed protocol includes: 1. do normal VFIO init stuff such as opening a new container, attaching group(s) to it, setting an IOMMU driver for a container. When an IOMMU is set for a container, all groups in it are considered ready to use by an external user. 2. pass a fd of the group we want to accelerate to KVM. KVM calls vfio_group_get_external_user() to verify that the group is initialized and an IOMMU is set for it, and to increment the container user counter to prevent the VFIO group from disposal prior to KVM exit. The current TCE IOMMU driver marks the whole IOMMU table as busy when an IOMMU is set for a container, which prevents other DMA users from allocating from it, so it is safe to grant user space access to it. 3. KVM calls vfio_external_user_iommu_id() to obtain an IOMMU ID which KVM uses to get an iommu_group struct for later use. 4. When KVM is finished, it calls vfio_group_put_external_user() to release the VFIO group by decrementing the container user counter. Everything gets released. The vfio: Limit group opens patch is also required for consistency. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index c488da5..57aa191 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1370,6 +1370,62 @@ static const struct file_operations vfio_device_fops = { }; /** + * External user API, exported by symbols to be linked dynamically. + * + * The protocol includes: + * 1. 
do normal VFIO init operation: + * - opening a new container; + * - attaching group(s) to it; + * - setting an IOMMU driver for a container. + * When IOMMU is set for a container, all groups in it are + * considered ready to use by an external user. + * + * 2. The user space passed a group fd which we want to accelerate in + * KVM. KVM uses vfio_group_get_external_user() to verify that: + * - the group is initialized; + * - IOMMU is set for it. + * Then vfio_group_get_external_user() increments the container user + * counter to prevent the VFIO group from disposal prior to KVM exit. + * + * 3. KVM calls vfio_external_user_iommu_id() to know an IOMMU ID which + * KVM uses to get an iommu_group struct for later use. + * + * 4. When KVM is finished, it calls vfio_group_put_external_user() to + * release the VFIO group by decrementing the container user counter. nit, the interface is for any external user, not just kvm. + */ +struct vfio_group *vfio_group_get_external_user(struct file *filep) +{ + struct vfio_group *group = filep->private_data; + + if (filep->f_op != &vfio_group_fops) + return NULL; ERR_PTR(-EINVAL) There also needs to be a vfio_group_get(group) here and put in error cases. 
+ + if (!atomic_inc_not_zero(&group->container_users)) + return NULL; ERR_PTR(-EINVAL) + + if (!group->container->iommu_driver || + !vfio_group_viable(group)) { + atomic_dec(&group->container_users); + return NULL; ERR_PTR(-EINVAL) + } + + return group; +} +EXPORT_SYMBOL_GPL(vfio_group_get_external_user); + +void vfio_group_put_external_user(struct vfio_group *group) +{ + vfio_group_try_dissolve_container(group); And a vfio_group_put(group) here +} +EXPORT_SYMBOL_GPL(vfio_group_put_external_user); + +int vfio_external_user_iommu_id(struct vfio_group *group) +{ + return iommu_group_id(group->iommu_group); +} +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id); + +/** * Module/class support */ static char *vfio_devnode(struct device *dev, umode_t *mode) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ac8d488..24579a0 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver( TYPE tmp; \ offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \ +/* + * External user API + */ +extern struct vfio_group *vfio_group_get_external_user(struct file *filep); +extern void vfio_group_put_external_user(struct vfio_group *group); +extern int vfio_external_user_iommu_id(struct vfio_group *group); + #endif /* VFIO_H */
Re: [PATCH 3/8] vfio: add external user support
On 07/09/2013 07:52 AM, Alex Williamson wrote: On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote: VFIO is designed to be used via ioctls on file descriptors returned by VFIO. However in some situations support for an external user is required. The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to use the existing VFIO groups for exclusive access in real/virtual mode on a host to avoid passing map/unmap requests to the user space which would made things pretty slow. The proposed protocol includes: 1. do normal VFIO init stuff such as opening a new container, attaching group(s) to it, setting an IOMMU driver for a container. When IOMMU is set for a container, all groups in it are considered ready to use by an external user. 2. pass a fd of the group we want to accelerate to KVM. KVM calls vfio_group_get_external_user() to verify if the group is initialized, IOMMU is set for it and increment the container user counter to prevent the VFIO group from disposal prior to KVM exit. The current TCE IOMMU driver marks the whole IOMMU table as busy when IOMMU is set for a container what prevents other DMA users from allocating from it so it is safe to grant user space access to it. 3. KVM calls vfio_external_user_iommu_id() to obtian an IOMMU ID which KVM uses to get an iommu_group struct for later use. 4. When KVM is finished, it calls vfio_group_put_external_user() to release the VFIO group by decrementing the container user counter. Everything gets released. The vfio: Limit group opens patch is also required for the consistency. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index c488da5..57aa191 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1370,6 +1370,62 @@ static const struct file_operations vfio_device_fops = { }; /** + * External user API, exported by symbols to be linked dynamically. + * + * The protocol includes: + * 1. 
do normal VFIO init operation: + * - opening a new container; + * - attaching group(s) to it; + * - setting an IOMMU driver for a container. + * When IOMMU is set for a container, all groups in it are + * considered ready to use by an external user. + * + * 2. The user space passed a group fd which we want to accelerate in + * KVM. KVM uses vfio_group_get_external_user() to verify that: + * - the group is initialized; + * - IOMMU is set for it. + * Then vfio_group_get_external_user() increments the container user + * counter to prevent the VFIO group from disposal prior to KVM exit. + * + * 3. KVM calls vfio_external_user_iommu_id() to know an IOMMU ID which + * KVM uses to get an iommu_group struct for later use. + * + * 4. When KVM is finished, it calls vfio_group_put_external_user() to + * release the VFIO group by decrementing the container user counter. nit, the interface is for any external user, not just kvm. s/KVM/An external user/ ? Or add the description below uses KVM just as an example of an external user? + */ +struct vfio_group *vfio_group_get_external_user(struct file *filep) +{ +struct vfio_group *group = filep-private_data; + +if (filep-f_op != vfio_group_fops) +return NULL; ERR_PTR(-EINVAL) There also needs to be a vfio_group_get(group) here and put in error cases. Is that because I do not hold a reference to the file anymore? 
+ +if (!atomic_inc_not_zero(group-container_users)) +return NULL; ERR_PTR(-EINVAL) + +if (!group-container-iommu_driver || +!vfio_group_viable(group)) { +atomic_dec(group-container_users); +return NULL; ERR_PTR(-EINVAL) +} + +return group; +} +EXPORT_SYMBOL_GPL(vfio_group_get_external_user); + +void vfio_group_put_external_user(struct vfio_group *group) +{ +vfio_group_try_dissolve_container(group); And a vfio_group_put(group) here +} +EXPORT_SYMBOL_GPL(vfio_group_put_external_user); + +int vfio_external_user_iommu_id(struct vfio_group *group) +{ +return iommu_group_id(group-iommu_group); +} +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id); + +/** * Module/class support */ static char *vfio_devnode(struct device *dev, umode_t *mode) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ac8d488..24579a0 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver( TYPE tmp; \ offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \ +/* + * External user API + */ +extern struct vfio_group *vfio_group_get_external_user(struct file *filep); +extern void vfio_group_put_external_user(struct vfio_group *group); +extern int vfio_external_user_iommu_id(struct vfio_group *group); + #endif /* VFIO_H */ -- Alexey -- To unsubscribe from this list: send the
[PATCH 1/2] KVM: PPC: Book3S HV: Correct tlbie usage
This corrects the usage of the tlbie (TLB invalidate entry) instruction in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On the PPC970, the bit to select large vs. small page is in the instruction, not in the RB register value. This changes the code to use the correct form on PPC970. On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower) field of the RB value incorrectly for 64k pages. This fixes it. Since we now have several cases to handle for the tlbie instruction, this factors out the code to do a sequence of tlbies into a new function, do_tlbies(), and calls that from the various places where the code was doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove() use the same global_invalidates() function for determining whether to do local or global TLB invalidations as is used in other places, for consistency, and also to make sure that kvm-arch.need_tlb_flush gets updated properly. Signed-off-by: Paul Mackerras pau...@samba.org Cc: sta...@vger.kernel.org --- arch/powerpc/include/asm/kvm_book3s_64.h | 2 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 139 ++- 2 files changed, 82 insertions(+), 59 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 9c1ff33..dc6b84a 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -100,7 +100,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, /* (masks depend on page size) */ rb |= 0x1000; /* page encoding in LP field */ rb |= (va_low 0x7f) 16; /* 7b of VA in AVA/LP field */ - rb |= (va_low 0xfe); /* AVAL field (P7 doesn't seem to care) */ + rb |= ((va_low 4) 0xf0); /* AVAL field (P7 doesn't seem to care) */ } } else { /* 4kB page */ diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 6dcbb49..105b00f 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ 
-385,6 +385,80 @@ static inline int try_lock_tlbie(unsigned int *lock) return old == 0; } +/* + * tlbie/tlbiel is a bit different on the PPC970 compared to later + * processors such as POWER7; the large page bit is in the instruction + * not RB, and the top 16 bits and the bottom 12 bits of the VA + * in RB must be 0. + */ +static void do_tlbies_970(struct kvm *kvm, unsigned long *rbvalues, + long npages, int global, bool need_sync) +{ + long i; + + if (global) { + while (!try_lock_tlbie(kvm-arch.tlbie_lock)) + cpu_relax(); + if (need_sync) + asm volatile(ptesync : : : memory); + for (i = 0; i npages; ++i) { + unsigned long rb = rbvalues[i]; + + if (rb 1) /* large page */ + asm volatile(tlbie %0,1 : : +r (rb 0xf000ul)); + else + asm volatile(tlbie %0,0 : : +r (rb 0xf000ul)); + } + asm volatile(eieio; tlbsync; ptesync : : : memory); + kvm-arch.tlbie_lock = 0; + } else { + if (need_sync) + asm volatile(ptesync : : : memory); + for (i = 0; i npages; ++i) { + unsigned long rb = rbvalues[i]; + + if (rb 1) /* large page */ + asm volatile(tlbiel %0,1 : : +r (rb 0xf000ul)); + else + asm volatile(tlbiel %0,0 : : +r (rb 0xf000ul)); + } + asm volatile(ptesync : : : memory); + } +} + +static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues, + long npages, int global, bool need_sync) +{ + long i; + + if (cpu_has_feature(CPU_FTR_ARCH_201)) { + /* PPC970 tlbie instruction is a bit different */ + do_tlbies_970(kvm, rbvalues, npages, global, need_sync); + return; + } + if (global) { + while (!try_lock_tlbie(kvm-arch.tlbie_lock)) + cpu_relax(); + if (need_sync) + asm volatile(ptesync : : : memory); + for (i = 0; i npages; ++i) + asm volatile(PPC_TLBIE(%1,%0) : : +r (rbvalues[i]), r (kvm-arch.lpid)); + asm volatile(eieio; tlbsync; ptesync
[PATCH 2/2] KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers
The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S can contain negative values, if some of the handlers end up before the table in the vmlinux binary. Thus we need to use a sign-extending load to read the values in the table rather than a zero-extending load. Without this, the host crashes when the guest does one of the hcalls with negative offsets, due to jumping to a bogus address. Signed-off-by: Paul Mackerras pau...@samba.org Cc: sta...@vger.kernel.org --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index b02f91e..60dce5b 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1381,7 +1381,7 @@ hcall_try_real_mode: cmpldi r3,hcall_real_table_end - hcall_real_table bge guest_exit_cont LOAD_REG_ADDR(r4, hcall_real_table) - lwzx r3,r3,r4 + lwax r3,r3,r4 cmpwi r3,0 beq guest_exit_cont add r3,r3,r4 -- 1.8.3.1
Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
On 28.06.2013, at 11:20, Mihai Caraman wrote: lwepx faults need to be handled by KVM and this implies additional code in the DO_KVM macro to identify the source of the exception originated from host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmapping it. This fixes an infinite loop What's the difference in speed for this? Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets invoked? caused by lwepx's data TLB miss handled in the host and the TODO for TLB eviction and execute-but-not-read entries. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list for a while. arch/powerpc/include/asm/mmu-book3e.h |6 ++- arch/powerpc/kvm/booke.c |6 +++ arch/powerpc/kvm/booke.h |2 + arch/powerpc/kvm/bookehv_interrupts.S | 32 ++- arch/powerpc/kvm/e500.c |4 ++ arch/powerpc/kvm/e500mc.c | 69 + 6 files changed, 91 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h index 99d43e0..32e470e 100644 --- a/arch/powerpc/include/asm/mmu-book3e.h +++ b/arch/powerpc/include/asm/mmu-book3e.h @@ -40,7 +40,10 @@ /* MAS registers bit definitions */ -#define MAS0_TLBSEL(x) (((x) << 28) & 0x3000) +#define MAS0_TLBSEL_MASK 0x3000 +#define MAS0_TLBSEL_SHIFT 28 +#define MAS0_TLBSEL(x) (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK) +#define MAS0_GET_TLBSEL(mas0) (((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT) #define MAS0_ESEL_MASK 0x0FFF #define MAS0_ESEL_SHIFT 16 #define MAS0_ESEL(x) (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK) @@ -58,6 +61,7 @@ #define MAS1_TSIZE_MASK 0x0f80 #define MAS1_TSIZE_SHIFT 7 #define MAS1_TSIZE(x) (((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK) +#define MAS1_GET_TSIZE(mas1) 
(((mas1) MAS1_TSIZE_MASK) MAS1_TSIZE_SHIFT) #define MAS2_EPN (~0xFFFUL) #define MAS2_X0 0x0040 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 1020119..6764a8e 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); + /* + * The exception type can change at this point, such as if the TLB entry + * for the emulated instruction has been evicted. + */ + kvmppc_prepare_for_emulation(vcpu, exit_nr); Please model this the same way as book3s. Check out kvmppc_get_last_inst() as a starting point. + /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr); diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index 5fd1ba6..a0d0fea 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu); void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu); +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr); + enum int_class { INT_CLASS_NONCRIT, INT_CLASS_CRIT, diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index 20c7a54..0538ab9 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -120,37 +120,20 @@ .if \flags NEED_EMU /* - * This assumes you have external PID support. - * To support a bookehv CPU without external PID, you'll - * need to look up the TLB entry and create a temporary mapping. - * - * FIXME: we don't currently handle if the lwepx faults. PR-mode - * booke doesn't handle it either. 
Since Linux doesn't use - * broadcast tlbivax anymore, the only way this should happen is - * if the guest maps its memory execute-but-not-read, or if we - * somehow take a TLB miss in the middle of this entry code and - * evict the relevant entry. On e500mc, all kernel lowmem is - * bolted into TLB1 large page mappings, and we don't use - * broadcast invalidates, so we should not take a TLB miss here. - * - * Later we'll need to deal with faults here. Disallowing guest - * mappings that are execute-but-not-read could be an option on - * e500mc, but not on chips with an LRAT if it is used.
Re: [PATCH -V3 1/4] mm/cma: Move dma contiguous changes into a seperate config
On 02.07.2013, at 07:45, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com We want to use CMA for allocating the hash page table and real mode area for PPC64. Hence move the DMA contiguous related changes into a separate config so that ppc64 can enable CMA without requiring DMA contiguous. Acked-by: Michal Nazarewicz min...@mina86.com Acked-by: Paul Mackerras pau...@samba.org Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Thanks, applied all to kvm-ppc-queue. Please provide a cover letter next time :). Alex
Re: [PATCH 1/2] KVM: PPC: Fix kvm_exit_names array
On 03.07.2013, at 15:30, Mihai Caraman wrote: Some exit ids were left out from the kvm_exit_names array. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/kvm/timing.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c index 07b6110..c392d26 100644 --- a/arch/powerpc/kvm/timing.c +++ b/arch/powerpc/kvm/timing.c @@ -135,7 +135,9 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = { [USR_PR_INST] = USR_PR_INST, [FP_UNAVAIL] = FP_UNAVAIL, [DEBUG_EXITS] = DEBUG, - [TIMEINGUEST] = TIMEINGUEST + [TIMEINGUEST] = TIMEINGUEST, + [DBELL_EXITS] = DBELL, + [GDBELL_EXITS] = GDBELL Please add a comma at the end here, so that we don't have to uselessly touch the entry next time again. Alex
Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction
On 03.07.2013, at 15:30, Mihai Caraman wrote: Some guests are making use of return from machine check instruction to do crazy things even though the 64-bit kernel doesn't handle yet this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/include/asm/kvm_host.h |1 + arch/powerpc/kvm/booke_emulate.c| 25 + arch/powerpc/kvm/timing.c |1 + 3 files changed, 27 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index af326cd..0466789 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -148,6 +148,7 @@ enum kvm_exit_types { EMULATED_TLBWE_EXITS, EMULATED_RFI_EXITS, EMULATED_RFCI_EXITS, + EMULATED_RFMCI_EXITS, I would quite frankly prefer to see us abandon the whole exit timing framework in the kernel and instead use trace points. Then we don't have to maintain all of this randomly exercised code. FWIW I think in this case however, treating RFMCI the same as RFI or random instruction emulation shouldn't hurt. This whole table is only about timing measurements. If you want to know for real what's going on, use trace points. Otherwise looks good. 
Alex DEC_EXITS, EXT_INTR_EXITS, HALT_WAKEUP, diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 27a4b28..aaff1b7 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -23,6 +23,7 @@ #include booke.h +#define OP_19_XOP_RFMCI 38 #define OP_19_XOP_RFI 50 #define OP_19_XOP_RFCI51 @@ -43,6 +44,12 @@ static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) kvmppc_set_msr(vcpu, vcpu-arch.csrr1); } +static void kvmppc_emul_rfmci(struct kvm_vcpu *vcpu) +{ + vcpu-arch.pc = vcpu-arch.mcsrr0; + kvmppc_set_msr(vcpu, vcpu-arch.mcsrr1); +} + int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance) { @@ -65,6 +72,12 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, *advance = 0; break; + case OP_19_XOP_RFMCI: + kvmppc_emul_rfmci(vcpu); + kvmppc_set_exit_type(vcpu, EMULATED_RFMCI_EXITS); + *advance = 0; + break; + default: emulated = EMULATE_FAIL; break; @@ -138,6 +151,12 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) case SPRN_DBCR1: vcpu-arch.dbg_reg.dbcr1 = spr_val; break; + case SPRN_MCSRR0: + vcpu-arch.mcsrr0 = spr_val; + break; + case SPRN_MCSRR1: + vcpu-arch.mcsrr1 = spr_val; + break; case SPRN_DBSR: vcpu-arch.dbsr = ~spr_val; break; @@ -284,6 +303,12 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val) case SPRN_DBCR1: *spr_val = vcpu-arch.dbg_reg.dbcr1; break; + case SPRN_MCSRR0: + *spr_val = vcpu-arch.mcsrr0; + break; + case SPRN_MCSRR1: + *spr_val = vcpu-arch.mcsrr1; + break; case SPRN_DBSR: *spr_val = vcpu-arch.dbsr; break; diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c index c392d26..670f63d 100644 --- a/arch/powerpc/kvm/timing.c +++ b/arch/powerpc/kvm/timing.c @@ -129,6 +129,7 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = { [EMULATED_TLBSX_EXITS] =EMUL_TLBSX, [EMULATED_TLBWE_EXITS] =EMUL_TLBWE, [EMULATED_RFI_EXITS] = EMUL_RFI, + 
[EMULATED_RFMCI_EXITS] = EMUL_RFMCI, [DEC_EXITS] = DEC, [EXT_INTR_EXITS] = EXTINT, [HALT_WAKEUP] = HALT, -- 1.7.3.4