Re: [User Question] How to create a backup of an LVM-based machine without wasting space
On 15.10.2012 05:06, Javier Guerra Giraldez wrote:
> On Sat, Oct 13, 2012 at 5:25 PM, Lukas Laukamp <lu...@laukamp.me> wrote:
>> I have backed up the data within the machine with partimage and
>> fsarchiver. But it would be great to have a better way than doing this
>> over a live system.
>
> Make no mistake: the absolute best way is from within the VM. It's the
> most consistent, safe and efficient method. Doing it from the outside is
> attractive, but it's a hack, and in some cases you have to jump through
> several hoops to make it safe.

Because this is a problem, I booted the VM from a live CD and backed up the
important filesystems with the live system running inside the VM.

Best Regards

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: do not de-cache cr4 bits needlessly
Signed-off-by: Gleb Natapov <g...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9c83b1..3df12c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -635,7 +635,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	}
 
 	if (is_long_mode(vcpu)) {
-		if (kvm_read_cr4(vcpu) & X86_CR4_PCIDE) {
+		if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
 			if (cr3 & CR3_PCID_ENABLED_RESERVED_BITS)
 				return 1;
 		} else
--
Gleb.
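Why this matters: in the KVM code of this era, kvm_read_cr4() asks for every bit. It therefore has to refresh any guest-owned CR4 bits from hardware before returning. kvm_read_cr4_bits() only does that refresh when the requested mask overlaps those bits, so asking only for PCIDE avoids the refresh unless PCIDE itself is guest-owned.

The following is a simplified user-space model of that caching rule, not KVM code. The model_* names, the counter standing in for the expensive hardware read, and the choice of PGE as the only guest-owned bit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR4 bit positions; treating only PGE as guest-owned is an
 * assumption made for this model. */
#define MODEL_CR4_PGE   (UINT64_C(1) << 7)
#define MODEL_CR4_PCIDE (UINT64_C(1) << 17)

/* Hypothetical vCPU state: a software copy of CR4 plus a mask of bits the
 * guest may change without exiting, so the copy can be stale for them. */
struct model_vcpu {
	uint64_t cr4_cache;         /* software copy of CR4 */
	uint64_t cr4_hw;            /* current value as held by "hardware" */
	uint64_t guest_owned_bits;  /* bits that may be stale in cr4_cache */
	unsigned int decache_count; /* number of expensive hardware reads */
};

/* Pull the guest-owned bits from "hardware" into the software copy. */
static void model_decache_cr4_guest_bits(struct model_vcpu *v)
{
	uint64_t owned = v->guest_owned_bits;

	v->cr4_cache = (v->cr4_cache & ~owned) | (v->cr4_hw & owned);
	v->decache_count++;
}

/* Refresh only when the caller asks about a possibly stale bit. */
static uint64_t model_read_cr4_bits(struct model_vcpu *v, uint64_t mask)
{
	if (mask & v->guest_owned_bits)
		model_decache_cr4_guest_bits(v);
	return v->cr4_cache & mask;
}

/* Reading the whole register overlaps every guest-owned bit. */
static uint64_t model_read_cr4(struct model_vcpu *v)
{
	return model_read_cr4_bits(v, ~UINT64_C(0));
}
```

In the model, testing PCIDE through model_read_cr4_bits() leaves decache_count at 0, while model_read_cr4() performs the refresh.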
Re: Shared IRQ with PCI Passthrough?
Jan Kiszka <jan.kiszka at siemens.com> writes:
> Nope, there is no IRQ sharing support for assigned devices in any public
> version so far. I'm on it, but some issues remain to be solved.
>
> Jan

Hi, any news on this? I own an Intel DQ67OW that has the same issue: no PCI
passthrough is possible with KVM while USB is active.

Marco
[PATCH 1/1] vhost-blk: Add vhost-blk support v4
vhost-blk is an in-kernel virtio-blk device accelerator.

Due to the lack of a proper in-kernel AIO interface, this version converts
the guest's I/O requests to bios and uses submit_bio() to submit I/O
directly. So this version only supports a raw block device as the guest's
disk image, e.g. /dev/sda, /dev/ram0. We can add file-based image support
to vhost-blk once we have an in-kernel AIO interface. There is some work in
progress on an in-kernel AIO interface from Dave Kleikamp and Zach Brown:

   http://marc.info/?l=linux-fsdevel&m=133312234313122

Performance evaluation:
-----------------------
1) LKVM
Fio with libaio ioengine on Fusion IO device using kvm tool
IOPS        Before   After   Improvement
seq-read       107     121   +13.0%
seq-write      130     179   +37.6%
rnd-read       102     122   +19.6%
rnd-write      125     159   +27.0%

2) QEMU
Fio with libaio ioengine on Fusion IO device using QEMU
IOPS        Before   After   Improvement
seq-read        76     123   +61.8%
seq-write      139     173   +24.4%
rnd-read        73     120   +64.3%
rnd-write       75     156   +108.0%

Userspace bits:
-----------------------
1) LKVM
The latest vhost-blk userspace bits for kvm tool can be found here:
g...@github.com:asias/linux-kvm.git blk.vhost-blk

2) QEMU
The latest vhost-blk userspace prototype for QEMU can be found here:
g...@github.com:asias/qemu.git blk.vhost-blk

Changes in v4:
- Mark req->status as userspace pointer
- Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status()
- Add if (need_resched()) schedule() in blk thread
- Kill vhost_blk_stop_vq() and move it into vhost_blk_stop()
- Use vq_err() instead of pr_warn()
- Fail on unsupported requests
- Add flush in vhost_blk_set_features()

Changes in v3:
- Send REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
- Check that the file passed by the user is a raw block device file

Signed-off-by: Asias He <as...@redhat.com>
---
 drivers/vhost/Kconfig     |   1 +
 drivers/vhost/Kconfig.blk |  10 ++++++++++
 drivers/vhost/Makefile    |   2 ++
 drivers/vhost/blk.c       | 677 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/blk.h       |   8 ++++++++
 5 files changed, 698 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.blk
 create mode 100644 drivers/vhost/blk.c
 create mode 100644 drivers/vhost/blk.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..acd8038 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -11,4 +11,5 @@ config VHOST_NET
 
 if STAGING
 source "drivers/vhost/Kconfig.tcm"
+source "drivers/vhost/Kconfig.blk"
 endif
diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
new file mode 100644
index 000..ff8ab76
--- /dev/null
+++ b/drivers/vhost/Kconfig.blk
@@ -0,0 +1,10 @@
+config VHOST_BLK
+	tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
+	depends on BLOCK && EXPERIMENTAL && m
+	---help---
+	  This kernel module can be loaded in host kernel to accelerate
+	  guest block with virtio_blk. Not to be confused with virtio_blk
+	  module itself which needs to be loaded in guest kernel.
+
+	  To compile this driver as a module, choose M here: the module will
+	  be called vhost_blk.
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1a8a4a5 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
 
 obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
+vhost_blk-y := blk.o
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
new file mode 100644
index 000..5c2b790
--- /dev/null
+++ b/drivers/vhost/blk.c
@@ -0,0 +1,677 @@
+/*
+ * Copyright (C) 2011 Taobao, Inc.
+ * Author: Liu Yuan <tailai...@taobao.com>
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * virtio-blk server in host kernel.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/vhost.h>
+#include <linux/virtio_blk.h>
+#include <linux/mutex.h>
+#include <linux/file.h>
+#include <linux/kthread.h>
+#include <linux/blkdev.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "blk.h"
+
+/* The block header is in the first and separate buffer. */
+#define BLK_HDR	0
+
+static DEFINE_IDA(vhost_blk_index_ida);
+
+enum {
+	VHOST_BLK_VQ_REQ = 0,
+	VHOST_BLK_VQ_MAX = 1,
+};
+
+struct req_page_list {
+	struct page **pages;
+	int pages_nr;
+};
+
+struct vhost_blk_req {
+	struct llist_node llnode;
+	struct req_page_list *pl;
+	struct vhost_blk *blk;
+
+	struct iovec *iov;
+	int iov_nr;
+
+	struct bio **bio;
+	atomic_t bio_nr;
+
+	sector_t sector;
+	int write;
+	u16 head;
+	long len;
+
+	u8 __user *status;
+};
+
+struct vhost_blk {
+	struct task_struct *host_kick;
+	struct
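The v4 changelog says unsupported requests now fail instead of being processed. The request handling itself is cut off in this excerpt, so what follows is only a hedged sketch of what such a dispatch step typically looks like, not the patch's code.

It mirrors the request header layout and the type and status codes from <linux/virtio_blk.h>: IN = 0, OUT = 1, FLUSH = 4, and status UNSUPP = 2. Every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from <linux/virtio_blk.h>. */
#define MODEL_VIRTIO_BLK_T_IN     0u
#define MODEL_VIRTIO_BLK_T_OUT    1u
#define MODEL_VIRTIO_BLK_T_FLUSH  4u
#define MODEL_VIRTIO_BLK_S_UNSUPP 2u

/* Same field layout as struct virtio_blk_outhdr. */
struct model_blk_outhdr {
	uint32_t type;
	uint32_t ioprio;
	uint64_t sector; /* always in 512-byte units */
};

enum model_blk_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_FLUSH, MODEL_OP_REJECT };

/* Decide what to submit for a request. For rejected requests the status
 * byte is set immediately; otherwise it is left for I/O completion. */
static enum model_blk_op model_classify(const struct model_blk_outhdr *hdr,
					uint8_t *status)
{
	switch (hdr->type) {
	case MODEL_VIRTIO_BLK_T_IN:
		return MODEL_OP_READ;
	case MODEL_VIRTIO_BLK_T_OUT:
		return MODEL_OP_WRITE;
	case MODEL_VIRTIO_BLK_T_FLUSH:
		return MODEL_OP_FLUSH; /* per the v3 notes: sent as a REQ_FLUSH bio */
	default:
		*status = MODEL_VIRTIO_BLK_S_UNSUPP;
		return MODEL_OP_REJECT;
	}
}

/* Byte offset on the backing device for a read or write request. */
static uint64_t model_byte_offset(const struct model_blk_outhdr *hdr)
{
	return hdr->sector << 9;
}
```

Rejecting anything unrecognized with an explicit status keeps the guest from waiting on a request that will never complete.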
KVM call agenda for 2012-10-16
Hi

Please send in any agenda topics you are interested in.

Later, Juan.
Re: Shared IRQ with PCI Passthrough?
On 2012-10-15 11:07, Marco wrote:
> Jan Kiszka <jan.kiszka at siemens.com> writes:
>> Nope, there is no IRQ sharing support for assigned devices in any public
>> version so far. I'm on it, but some issues remain to be solved.
>>
>> Jan
>
> Hi, any news on this? I own an Intel DQ67OW that has the same issue. No PCI
> passthrough possible with KVM when USB is active.

Supported by qemu-kvm-1.2 and Linux >= 3.4. But not all devices play well
with it, so your mileage may vary.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
[PATCH 0/2] KVM: PPC: Support ioeventfd
In order to support vhost, we need to be able to support ioeventfd.

This patch set adds support for ioeventfd to PPC and makes it possible to
do so without implementing irqfd along the way, as it requires an in-kernel
irqchip which we don't have yet.

Alex

Alexander Graf (2):
  KVM: Distangle eventfd code from irqchip
  KVM: PPC: Support eventfd

 arch/powerpc/kvm/Kconfig   |    1 +
 arch/powerpc/kvm/Makefile  |    4 +++-
 arch/powerpc/kvm/powerpc.c |   17 ++++++++++++++++-
 include/linux/kvm_host.h   |   12 +++++++++++-
 virt/kvm/eventfd.c         |    6 ++++++
 5 files changed, 37 insertions(+), 3 deletions(-)
[PATCH 2/2] KVM: PPC: Support eventfd
In order to support the generic eventfd infrastructure on PPC, we need to
call into the generic KVM in-kernel device mmio code.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/Kconfig   |    1 +
 arch/powerpc/kvm/Makefile  |    4 +++-
 arch/powerpc/kvm/powerpc.c |   17 ++++++++++++++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 71f0cd9..4730c95 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
 	bool
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	select HAVE_KVM_EVENTFD
 
 config KVM_BOOK3S_HANDLER
 	bool
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index c2a0863..cd89658 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -6,7 +6,8 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 
-common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o \
+		eventfd.o)
 
 CFLAGS_44x_tlb.o  := -I.
 CFLAGS_e500_tlb.o := -I.
@@ -76,6 +77,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
 
 kvm-book3s_64-module-objs := \
 	../../../virt/kvm/kvm_main.o \
+	../../../virt/kvm/eventfd.o \
 	powerpc.o \
 	emulate.o \
 	book3s.o \
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index deb0d59..900d8fc 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -314,6 +314,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PPC_IRQ_LEVEL:
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_ONE_REG:
+	case KVM_CAP_IOEVENTFD:
 		r = 1;
 		break;
 #ifndef CONFIG_KVM_BOOK3S_64_HV
@@ -613,6 +614,13 @@ int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	vcpu->mmio_is_write = 0;
 	vcpu->arch.mmio_sign_extend = 0;
 
+	if (!kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			     bytes, &run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+		return EMULATE_DONE;
+	}
+
 	return EMULATE_DO_MMIO;
 }
 
@@ -622,8 +630,8 @@ int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu,
 {
 	int r;
 
-	r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
 	vcpu->arch.mmio_sign_extend = 1;
+	r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
 
 	return r;
 }
@@ -661,6 +669,13 @@ int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		}
 	}
 
+	if (!kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			      bytes, &run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+		return EMULATE_DONE;
+	}
+
 	return EMULATE_DO_MMIO;
 }
 
--
1.6.0.2
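The new blocks in kvmppc_handle_load() and kvmppc_handle_store() follow one pattern:

1. Offer the access to devices registered on the in-kernel KVM_MMIO_BUS, where ioeventfds are registered.
2. Treat a zero return as "handled" and finish with EMULATE_DONE.
3. Otherwise, exit to userspace with EMULATE_DO_MMIO.

Below is a self-contained model of that control flow for the store side. The single-range device and the model_* names are hypothetical and are not KVM APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_emulate { MODEL_EMULATE_DONE, MODEL_EMULATE_DO_MMIO };

/* Hypothetical in-kernel device covering [base, base + len), standing in
 * for something like an ioeventfd registered on the MMIO bus. */
struct model_dev {
	uint64_t base;
	uint64_t len;
	uint64_t last_value; /* value of the most recent write it claimed */
	unsigned int writes; /* how many writes it claimed */
};

/* Return 0 if a device claimed the write, nonzero otherwise (the patch
 * tests kvm_io_bus_write() with the same "zero means handled" convention). */
static int model_bus_write(struct model_dev *devs, size_t n,
			   uint64_t addr, unsigned int bytes, uint64_t value)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= devs[i].base &&
		    addr + bytes <= devs[i].base + devs[i].len) {
			devs[i].last_value = value;
			devs[i].writes++;
			return 0;
		}
	}
	return -1;
}

/* Store path: in-kernel device first, userspace exit as the fallback. */
static enum model_emulate model_handle_store(struct model_dev *devs, size_t n,
					     uint64_t addr, unsigned int bytes,
					     uint64_t value)
{
	if (!model_bus_write(devs, n, addr, bytes, value))
		return MODEL_EMULATE_DONE;
	return MODEL_EMULATE_DO_MMIO;
}
```

A write that hits a registered range never leaves the kernel, which is what makes ioeventfd useful for vhost. Any other address still takes the usual userspace exit.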
[PATCH 1/2] KVM: Distangle eventfd code from irqchip
The current eventfd code assumes that when we have eventfd, we also have
irqfd for in-kernel interrupt delivery. This is not necessarily true. On
PPC we don't have an in-kernel irqchip yet, but we can still easily
support eventfd.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 include/linux/kvm_host.h |   12 +++++++++++-
 virt/kvm/eventfd.c       |    6 ++++++
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..f2f5880 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -884,10 +884,20 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
 void kvm_eventfd_init(struct kvm *kvm);
+int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+
+#ifdef CONFIG_HAVE_KVM_IRQCHIP
 int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
-int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+#else
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
+{
+	return -EINVAL;
+}
+
+static inline void kvm_irqfd_release(struct kvm *kvm) {}
+#endif
 
 #else
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9718e98..d7424c8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
 
 #include "iodev.h"
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  *
  * irqfd: Allows an fd to be used to inject an interrupt to the guest
@@ -425,17 +426,21 @@ fail:
 	kfree(irqfd);
 	return ret;
 }
+#endif
 
 void
 kvm_eventfd_init(struct kvm *kvm)
 {
+#ifdef __KVM_HAVE_IOAPIC
 	spin_lock_init(&kvm->irqfds.lock);
 	INIT_LIST_HEAD(&kvm->irqfds.items);
 	INIT_LIST_HEAD(&kvm->irqfds.resampler_list);
 	mutex_init(&kvm->irqfds.resampler_lock);
+#endif
 	INIT_LIST_HEAD(&kvm->ioeventfds);
 }
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * shutdown any irqfd's that match fd+gsi
  */
@@ -555,6 +560,7 @@ static void __exit irqfd_module_exit(void)
 
 module_init(irqfd_module_init);
 module_exit(irqfd_module_exit);
+#endif
 
 /*
  *
--
1.6.0.2
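The header change relies on a common kernel idiom. When a feature is compiled out, a static inline stub with the same signature is provided, so callers build unchanged and simply get an error (here -EINVAL) at run time.

Below is a standalone sketch of the idiom. MODEL_HAVE_IRQCHIP is a hypothetical stand-in for CONFIG_HAVE_KVM_IRQCHIP, and the model_* names are not kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for CONFIG_HAVE_KVM_IRQCHIP. Left undefined here to
 * model an architecture without an in-kernel irqchip. */
/* #define MODEL_HAVE_IRQCHIP 1 */

struct model_irqfd_args {
	int fd;
	unsigned int gsi;
};

#ifdef MODEL_HAVE_IRQCHIP
static int model_irqfd(const struct model_irqfd_args *args)
{
	/* A real implementation would route args->fd to interrupt args->gsi. */
	(void)args;
	return 0;
}
#else
/* Same signature as the real function, so callers compile unchanged. */
static inline int model_irqfd(const struct model_irqfd_args *args)
{
	(void)args;
	return -EINVAL;
}
#endif

/* ioeventfd does not need an irqchip, so it stays outside the #ifdef. */
static int model_ioeventfd_available(void)
{
	return 1;
}
```

The design choice is that the ioctl dispatcher does not need its own #ifdef. Requesting an irqfd on such a build fails cleanly, while ioeventfd keeps working.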
Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 10/11/2012 01:06 AM, Andrew Theurer wrote:
> On Wed, 2012-10-10 at 23:24 +0530, Raghavendra K T wrote:
>> On 10/10/2012 08:29 AM, Andrew Theurer wrote:
>>> On Wed, 2012-10-10 at 00:21 +0530, Raghavendra K T wrote:
>>>> * Avi Kivity <a...@redhat.com> [2012-10-04 17:00:28]:
>>>>> On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
>>>>>> On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
>>> [...]
>>> A big concern I have (if this is 1x overcommit) for ebizzy is that it
>>> has just terrible scalability to begin with. I do not think we should
>>> try to optimize such a bad workload.
>>
>> I think my way of running dbench has some flaw, so I went to ebizzy.
>> Could you let me know how you generally run dbench?
>
> I mount a tmpfs and then specify that mount for dbench to run on. This
> eliminates all IO. I use a 300 second run time and number of threads is
> equal to number of vcpus. All of the VMs of course need to have a
> synchronized start.
>
> I would also make sure you are using a recent kernel for dbench, where
> the dcache scalability is much improved. Without any lock-holder
> preemption, the time in spin_lock should be very low:
>
>  21.54%  78016  dbench  [kernel.kallsyms]  [k] copy_user_generic_unrolled
>   3.51%  12723  dbench  libc-2.12.so       [.] __strchr_sse42
>   2.81%  10176  dbench  dbench             [.] child_run
>   2.54%   9203  dbench  [kernel.kallsyms]  [k] _raw_spin_lock
>   2.33%   8423  dbench  dbench             [.] next_token
>   2.02%   7335  dbench  [kernel.kallsyms]  [k] __d_lookup_rcu
>   1.89%   6850  dbench  libc-2.12.so       [.] __strstr_sse42
>   1.53%   5537  dbench  libc-2.12.so       [.] __memset_sse2
>   1.47%   5337  dbench  [kernel.kallsyms]  [k] link_path_walk
>   1.40%   5084  dbench  [kernel.kallsyms]  [k] kmem_cache_alloc
>   1.38%   5009  dbench  libc-2.12.so       [.] memmove
>   1.24%   4496  dbench  libc-2.12.so       [.] vfprintf
>   1.15%   4169  dbench  [kernel.kallsyms]  [k] __audit_syscall_exit

Hi Andrew,
I ran the test with dbench on tmpfs. I do not see any improvement in
dbench for a 16k ple window.

So it seems that, apart from ebizzy, no workload benefited from it, and I
agree that it may not be good to optimize for ebizzy. I shall drop the
change to a 16k default window and continue with the rest of the original
patch series. I need to experiment with the latest kernel.

(PS: Thanks for pointing me towards perf in the latest kernel. It works
fine.)

Results:
dbench run for 120 sec, 30 sec warmup, 8 iterations, using tmpfs
base = 3.6.0-rc5 with ple handler optimization patch.

x = base + ple_window = 4k
+ = base + ple_window = 16k
* = base + ple_gap = 0

dbench 1x overcommit case
=========================
    N        Min        Max     Median        Avg     Stddev
x   8     5322.5    5519.05    5482.71  5461.0962  63.522276
+   8    5255.45    5530.55    5496.94  5455.2137  93.070363
*   8    5350.85    5477.81   5408.065  5418.4338  44.762697

dbench 2x overcommit case
=========================
    N        Min        Max     Median        Avg     Stddev
x   8    3054.32    3194.47    3137.33   3132.625  54.491615
+   8     3040.8    3148.87   3088.615  3088.1887  32.862336
*   8    3031.51    3171.99     3083.6  3097.4612  50.526977
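To put the two tables in proportion, each configuration's average can be expressed relative to the 4k baseline (x). Below is a small helper for that arithmetic; the function name is ours, and the inputs are the Avg column above.

```c
#include <assert.h>

/* Percent change of a configuration's mean relative to the baseline mean;
 * a negative result means lower throughput than the baseline. */
static double pct_change(double baseline_avg, double avg)
{
	assert(baseline_avg != 0.0);
	return (avg - baseline_avg) / baseline_avg * 100.0;
}
```

For the 16k window this gives:

- about -0.1% at 1x (5455.2137 vs. 5461.0962)
- about -1.4% at 2x (3088.1887 vs. 3132.625)

In both cases the absolute gap is smaller than the baseline's standard deviation, which is consistent with seeing no improvement.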
Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On Mon, 2012-10-15 at 17:40 +0530, Raghavendra K T wrote:
> On 10/11/2012 01:06 AM, Andrew Theurer wrote:
> [...]
> Hi Andrew,
> I ran the test with dbench with tmpfs. I do not see any improvements in
> dbench for 16k ple window. So it seems apart from ebizzy no workload
> benefited by that. and I agree that, it may not be good to optimize for
> ebizzy. I shall drop changing to 16k default window and continue with
> other original patch series. Need to experiment with latest kernel.

Thanks for running this again. I do believe there are some workloads that,
when run at 1x overcommit, would benefit from a larger ple_window [with the
current ple handling code], but I also do not want to potentially degrade
1x with a larger window. I do, however, think there may be another option.
I have not fully worked this out, but I think I am on to something.

I decided to go back to just a yield() instead of a yield_to(). My
motivation was that yield_to() [for large VMs] is like a dog chasing its
tail: round and round we go. Just yield(), in particular a yield() which
results in yielding to something -other- than the current VM's vcpus,
helps synchronize the execution of sibling vcpus by deferring them until
the lock holder vcpu is running again. The more we can do to get all vcpus
running at the same time, the less we have to deal with the preemption
problem. The other benefit is that yield() has far, far lower overhead
than yield_to().

This does assume that vcpus from the same VM do not share the same
runqueues. Yielding to a sibling vcpu with yield() is not productive for
larger VMs in the same way that yield_to() is not. My recent results
include restricting vcpu placement so that sibling vcpus do not get to run
on the same runqueue. I do believe we could implement an initial placement
and load balance policy to strive for this restriction (making it purely
optional, but I bet it could also help user apps which use spin locks).

For 1x VMs which still vm_exit due to PLE, I believe we could probably just
leave the ple_window alone, as long as we mostly use yield() instead of
yield_to(). The problem with the unneeded exits in this case has been the
overhead in the routines leading up to yield_to() and in yield_to() itself.
If we use yield() most of the time, this overhead will go away.
Here is a comparison of yield_to() and yield():

dbench with 20-way VMs, 8 of them on an 80-way host:

no PLE                    426 +/- 11.03%
no PLE w/ gangsched     32001 +/-   .37%
PLE with yield()        29207 +/-   .28%
PLE with yield_to()      8175 +/-  1.37%

yield() is far and away better than yield_to() here and almost approaches
the gang sched result. Here is a link for the perf sched map bitmap:

https://docs.google.com/open?id=0B6tfUNlZ-14weXBfVnFFZGw1akU

The thrashing is way down and sibling vcpus tend to run together,
Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary
On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
> Currently, kdump just makes all the logical processors leave VMX
> operation by executing VMXOFF instruction, so any VMCSs active on the
> logical processors may be corrupted. But, sometimes, we need the VMCSs
> to debug guest images contained in the host vmcore. To prevent the
> corruption, we should VMCLEAR the VMCSs before executing the VMXOFF
> instruction.

How have you verified that VMXOFF doesn't flush cached VMCSs already?

> The patch set provides an alternative way to clear VMCSs related to
> guests on all cpus when host is doing kdump.

I'm not sure the sysctl is really necessary. The only reason to turn it
off is if the corruption is so severe that the loaded vmcs list itself
causes a crash. I think it should be rare enough that we can do it
unconditionally.

--
error compiling committee.c: too many arguments to function
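The ordering proposed in the patch description (VMCLEAR every loaded VMCS, then VMXOFF) can be illustrated with a toy model. In it, a loaded VMCS may hold state inside the CPU that has not yet reached its memory image.

Whether VMXOFF already flushes that state is exactly the open question above, so the model simply assumes it does not. The structure and function names are hypothetical. Real VMCLEAR and VMXOFF are privileged instructions that cannot be exercised like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy VMCS: "cached" is state the CPU may hold internally, "memory" is what
 * a vmcore dump would show. Assumes leaving VMX mode does not flush it. */
struct model_vmcs {
	int cached;
	int memory;
	bool loaded;
};

/* Models VMCLEAR: write cached state back to memory, mark the VMCS inactive. */
static void model_vmclear(struct model_vmcs *v)
{
	v->memory = v->cached;
	v->loaded = false;
}

/* Crash path for one CPU: clear every VMCS still loaded before VMXOFF. */
static void model_crash_vmclear_loaded(struct model_vmcs *loaded[], size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (loaded[i]->loaded)
			model_vmclear(loaded[i]);
	}
}
```

After model_crash_vmclear_loaded() runs, each VMCS's memory field matches its cached state, which is the property the vmcore needs.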
Re: KVM call agenda for 2012-10-16
CPU as DEVICE
http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg00719.html
latest known tree for testing:
https://github.com/ehabkost/qemu-hacks/commits/work/cpu-devicestate-qdev-core
Maybe we could agree on the proposed RFC.
[PATCH for-3.7] vhost: fix mergeable bufs on BE hosts
We copy the head count to a 16-bit field. This works by chance on LE hosts,
but on BE hosts the guest gets 0. Fix it up.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Tested-by: Alexander Graf <ag...@suse.de>
Cc: sta...@kernel.org
---
 drivers/vhost/net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9ab6d47..2bb463c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -448,7 +448,8 @@ static void handle_rx(struct vhost_net *net)
 		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
 	size_t total_len = 0;
-	int err, headcount, mergeable;
+	int err, mergeable;
+	s16 headcount;
 	size_t vhost_hlen, sock_hlen;
 	size_t vhost_len, sock_len;
 	/* TODO: check that we are running from vhost_worker? */
--
MST
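The commit message is terse, so here is the byte-level picture. Assume, as the message implies, that sizeof(u16) bytes are copied starting at the address of headcount. A 32-bit int then exposes its low-order bytes first on a little-endian host, but its high-order bytes (zero for any small count) on a big-endian host.

The sketch below simulates both byte orders explicitly, so it gives the same answers on any machine. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 32-bit value in memory order for the chosen endianness. */
static void store_u32(uint8_t out[4], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;
		out[i] = (uint8_t)(v >> shift);
	}
}

/* Interpret the first two bytes of buf as a 16-bit value. */
static uint16_t load_u16(const uint8_t buf[2], int big_endian)
{
	return big_endian ? (uint16_t)((buf[0] << 8) | buf[1])
			  : (uint16_t)(buf[0] | (buf[1] << 8));
}

/* The bug: copy only two bytes starting at the address of a 32-bit int. */
static uint16_t copy_int_prefix_to_u16(uint32_t headcount, int big_endian)
{
	uint8_t bytes[4];

	store_u32(bytes, headcount, big_endian);
	return load_u16(bytes, big_endian);
}
```

copy_int_prefix_to_u16(3, 0) yields 3, while copy_int_prefix_to_u16(3, 1) yields 0, matching the reported symptom. Declaring headcount as a 16-bit type, as the patch does, makes those two bytes the whole value on either byte order.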
[PATCH] qemu: Update Linux headers
Based on v3.7-rc1-3-g29bb4cc

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

Trying to get KVM_IRQFD_FLAG_RESAMPLE and friends for vfio-pci

 linux-headers/asm-x86/kvm.h         |   17 +++++++++++++++++
 linux-headers/linux/kvm.h           |   25 +++++++++++++++++++++----
 linux-headers/linux/kvm_para.h      |    6 +++---
 linux-headers/linux/vfio.h          |    6 +++---
 linux-headers/linux/virtio_config.h |    6 +++---
 linux-headers/linux/virtio_ring.h   |    6 +++---
 6 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 246617e..a65ec29 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -9,6 +9,22 @@
 #include <linux/types.h>
 #include <linux/ioctl.h>
 
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
@@ -25,6 +41,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_READONLY_MEM
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4b9e575..81d2feb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
-/* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
-#define KVM_MEMSLOT_INVALID      (1UL << 1)
+/*
+ * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
+ * other bits are reserved for kvm internal use which are defined in
+ * include/linux/kvm_host.h.
+ */
+#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
+#define KVM_MEM_READONLY	(1UL << 1)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#ifdef __KVM_HAVE_READONLY_MEM
+#define KVM_CAP_READONLY_MEM 81
+#endif
+#define KVM_CAP_IRQFD_RESAMPLE 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/*
+ * Available with KVM_CAP_IRQFD_RESAMPLE
+ *
+ * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
+ * the irqfd to operate in resampling mode for level triggered interrupt
+ * emlation.  See Documentation/virtual/kvm/api.txt.
+ */
+#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
 
 struct kvm_irqfd {
 	__u32 fd;
 	__u32 gsi;
 	__u32 flags;
-	__u8  pad[20];
+	__u32 resamplefd;
+	__u8  pad[16];
 };
 
 struct kvm_clock_data {
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index 7bdcf93..cea2c5c 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -1,5 +1,5 @@
-#ifndef __LINUX_KVM_PARA_H
-#define __LINUX_KVM_PARA_H
+#ifndef _UAPI__LINUX_KVM_PARA_H
+#define _UAPI__LINUX_KVM_PARA_H
 
 /*
  * This header file provides a method for making a hypercall to the host
@@ -25,4 +25,4 @@
  */
 #include <asm/kvm_para.h>
 
-#endif /* __LINUX_KVM_PARA_H */
+#endif /* _UAPI__LINUX_KVM_PARA_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index f787b72..4758d1b 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h
index 4f51d8f..b7cda39 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -1,5 +1,5 @@
-#ifndef _LINUX_VIRTIO_CONFIG_H
-#define _LINUX_VIRTIO_CONFIG_H
+#ifndef _UAPI_LINUX_VIRTIO_CONFIG_H
+#define _UAPI_LINUX_VIRTIO_CONFIG_H
 /* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
  * anyone can use the definitions to implement compatible drivers/servers.
  *
@@ -51,4 +51,4 @@
  * suppressed them? */
 #define VIRTIO_F_NOTIFY_ON_EMPTY	24
 
-#endif /* _LINUX_VIRTIO_CONFIG_H */
+#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/linux-headers/linux/virtio_ring.h b/linux-headers/linux/virtio_ring.h
index 1b333e2..921694a 100644
--- a/linux-headers/linux/virtio_ring.h
+++
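In the updated header, resamplefd is carved out of the existing padding of struct kvm_irqfd: pad[20] becomes a __u32 plus pad[16]. The structure therefore keeps its 32-byte size and the position of every existing field.

Below is a compile-time check of that claim. It uses hypothetical copies of the old and new layouts, with <stdint.h> types in place of __u32 and __u8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies of the old and new struct kvm_irqfd layouts (names are ours). */
struct model_kvm_irqfd_old {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint8_t  pad[20];
};

struct model_kvm_irqfd_new {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint32_t resamplefd; /* taken from the first 4 bytes of the old padding */
	uint8_t  pad[16];
};

_Static_assert(sizeof(struct model_kvm_irqfd_old) == 32, "old size");
_Static_assert(sizeof(struct model_kvm_irqfd_new) == 32, "new size");
_Static_assert(offsetof(struct model_kvm_irqfd_new, flags) ==
	       offsetof(struct model_kvm_irqfd_old, flags), "flags unchanged");

/* Flag values as in the header: (1 << 0) and (1 << 1). */
#define MODEL_KVM_IRQFD_FLAG_DEASSIGN (1u << 0)
#define MODEL_KVM_IRQFD_FLAG_RESAMPLE (1u << 1)

/* Build a request for a resampling irqfd; members not named in the
 * initializer (including pad) are zeroed. */
static struct model_kvm_irqfd_new model_resample_request(uint32_t fd,
							 uint32_t gsi,
							 uint32_t resamplefd)
{
	struct model_kvm_irqfd_new r = {
		.fd = fd,
		.gsi = gsi,
		.flags = MODEL_KVM_IRQFD_FLAG_RESAMPLE,
		.resamplefd = resamplefd,
	};
	return r;
}
```

Because flags stays at the same offset and the old padding was expected to be zero, a caller that never sets KVM_IRQFD_FLAG_RESAMPLE sees no difference.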
[PATCH] vfio-pci: Add KVM INTx acceleration
This makes use of the new level irqfd support enabling bypass of qemu
userspace both on INTx injection and unmask.  This significantly boosts
the performance of devices making use of legacy interrupts.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

My INTx routing workaround below will probably raise some eyebrows,
but I don't feel it's worth subjecting users to core dumps if they
want to try vfio-pci on new platforms.  INTx routing is part of some
larger plan, but until that plan materializes we have to try to avoid
the API unless we think there's a good chance it might be there.
I'll accept the maintenance of updating a whitelist in the interim.
Thanks,

Alex

 hw/vfio_pci.c |  224 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 639371e..777a5f8 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -154,6 +154,53 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
 
 /*
+ * PCI code refuses to make it possible to probe whether the chipset
+ * supports pci_device_route_intx_to_irq() and booby traps the call
+ * to assert if doesn't.  For us, this is just an optimization, so
+ * only enable it when we know it's present.  Unfortunately PCIBus is
+ * private, so we can't just look at the function pointer.
+ */
+static bool vfio_pci_bus_has_intx_route(PCIDevice *pdev)
+{
+#ifdef CONFIG_KVM
+    BusState *bus = qdev_get_parent_bus(&pdev->qdev);
+    DeviceState *dev;
+
+    if (!kvm_irqchip_in_kernel() ||
+        !kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
+        return false;
+    }
+
+    for (; bus->parent; bus = qdev_get_parent_bus(dev)) {
+
+        dev = bus->parent;
+
+        if (!strncmp("i440FX-pcihost", object_get_typename(OBJECT(dev)), 14)) {
+            return true;
+        }
+    }
+
+    error_report("vfio-pci: VM chipset does not support INTx routing, "
+                 "using slow INTx mode\n");
+#endif
+    return false;
+}
+
+static PCIINTxRoute vfio_pci_device_route_intx_to_irq(PCIDevice *pdev, int pin)
+{
+    if (!vfio_pci_bus_has_intx_route(pdev)) {
+        return (PCIINTxRoute) { .mode = PCI_INTX_DISABLED, .irq = -1 };
+    }
+
+    return pci_device_route_intx_to_irq(pdev, pin);
+}
+
+static bool vfio_pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new)
+{
+    return old->mode != new->mode || old->irq != new->irq;
+}
+
+/*
  * Common VFIO interrupt disable
  */
 static void vfio_disable_irqindex(VFIODevice *vdev, int index)
@@ -185,6 +232,21 @@ static void vfio_unmask_intx(VFIODevice *vdev)
     ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
+#ifdef CONFIG_KVM
+static void vfio_mask_intx(VFIODevice *vdev)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+#endif
+
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
  * also be a huge overhead.  We try to get the best of both worlds by
@@ -248,6 +310,161 @@ static void vfio_eoi(VFIODevice *vdev)
     vfio_unmask_intx(vdev);
 }
 
+static void vfio_enable_intx_kvm(VFIODevice *vdev)
+{
+#ifdef CONFIG_KVM
+    struct kvm_irqfd irqfd = {
+        .fd = event_notifier_get_fd(&vdev->intx.interrupt),
+        .gsi = vdev->intx.route.irq,
+        .flags = KVM_IRQFD_FLAG_RESAMPLE,
+    };
+    struct vfio_irq_set *irq_set;
+    int ret, argsz;
+    int32_t *pfd;
+
+    if (!kvm_irqchip_in_kernel() ||
+        vdev->intx.route.mode != PCI_INTX_ENABLED ||
+        !kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
+        return;
+    }
+
+    /* Get to a known interrupt state */
+    qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
+    vfio_mask_intx(vdev);
+    vdev->intx.pending = false;
+    qemu_set_irq(vdev->pdev.irq[vdev->intx.pin], 0);
+
+    /* Get an eventfd for resample/unmask */
+    if (event_notifier_init(&vdev->intx.unmask, 0)) {
+        error_report("vfio: Error: event_notifier_init failed eoi\n");
+        goto fail;
+    }
+
+    /* KVM triggers it, VFIO listens for it */
+    irqfd.resamplefd = event_notifier_get_fd(&vdev->intx.unmask);
+
+    if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+        error_report("vfio: Error: Failed to setup resample irqfd: %m\n");
+        goto fail_irqfd;
+    }
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+    irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+
+    *pfd = irqfd.resamplefd;
+
+    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (ret) {
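With KVM_IRQFD_FLAG_RESAMPLE, a single eventfd links two parties; as the patch comment puts it, "KVM triggers it, VFIO listens for it". According to the KVM API documentation referenced in the header, KVM signals the resamplefd when the irqchip is resampled, for example on the guest's EOI of the level-triggered interrupt. VFIO responds by unmasking the physical INTx without a trip through QEMU.

The sketch below shows only the eventfd mechanics of that handoff, with both roles played by one process. It is Linux-specific and uses glibc's eventfd(), eventfd_write() and eventfd_read(). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Simulate `eois` resample notifications followed by one consumer read.
 * Returns the counter value the consumer observed (0 if nothing was pending
 * or on error). EFD_NONBLOCK keeps the read from blocking when eois == 0. */
static uint64_t model_resample_handshake(unsigned int eois)
{
	eventfd_t count = 0;
	int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);

	if (efd < 0)
		return 0;

	/* "KVM" side: one notification per resample event. */
	for (unsigned int i = 0; i < eois; i++) {
		if (eventfd_write(efd, 1) != 0) {
			close(efd);
			return 0;
		}
	}

	/* "VFIO" side: a single read returns the accumulated count and resets it. */
	if (eventfd_read(efd, &count) != 0)
		count = 0;

	close(efd);
	return count;
}
```

model_resample_handshake(3) returns 3, because notifications that arrive before the listener runs are coalesced into a single read.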
Re: [PATCH] qemu: Update Linux headers
Alex Williamson <alex.william...@redhat.com> writes:

> Based on v3.7-rc1-3-g29bb4cc

Normally this would go through qemu-kvm/uq/master but since this is from Linus' tree, it's less of a concern. Nonetheless, I'd prefer we did it from v3.7-rc1 instead of a random git snapshot.

Regards,

Anthony Liguori

> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
>
> Trying to get KVM_IRQFD_FLAG_RESAMPLE and friends for vfio-pci
>
>  linux-headers/asm-x86/kvm.h         |   17 +++++++++++++++++
>  linux-headers/linux/kvm.h           |   25 +++++++++++++++++++++----
>  linux-headers/linux/kvm_para.h      |    6 +++---
>  linux-headers/linux/vfio.h          |    6 +++---
>  linux-headers/linux/virtio_config.h |    6 +++---
>  linux-headers/linux/virtio_ring.h   |    6 +++---
>  6 files changed, 50 insertions(+), 16 deletions(-)
>
> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> index 246617e..a65ec29 100644
> --- a/linux-headers/asm-x86/kvm.h
> +++ b/linux-headers/asm-x86/kvm.h
> @@ -9,6 +9,22 @@
>  #include <linux/types.h>
>  #include <linux/ioctl.h>
>  
> +#define DE_VECTOR 0
> +#define DB_VECTOR 1
> +#define BP_VECTOR 3
> +#define OF_VECTOR 4
> +#define BR_VECTOR 5
> +#define UD_VECTOR 6
> +#define NM_VECTOR 7
> +#define DF_VECTOR 8
> +#define TS_VECTOR 10
> +#define NP_VECTOR 11
> +#define SS_VECTOR 12
> +#define GP_VECTOR 13
> +#define PF_VECTOR 14
> +#define MF_VECTOR 16
> +#define MC_VECTOR 18
> +
>  /* Select x86 specific features in <linux/kvm.h> */
>  #define __KVM_HAVE_PIT
>  #define __KVM_HAVE_IOAPIC
> @@ -25,6 +41,7 @@
>  #define __KVM_HAVE_DEBUGREGS
>  #define __KVM_HAVE_XSAVE
>  #define __KVM_HAVE_XCRS
> +#define __KVM_HAVE_READONLY_MEM
>  
>  /* Architectural interrupt line count.
>  */
>  #define KVM_NR_INTERRUPTS 256
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 4b9e575..81d2feb 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
>  	__u64 userspace_addr; /* start of the userspace allocated memory */
>  };
>  
> -/* for kvm_memory_region::flags */
> -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
> -#define KVM_MEMSLOT_INVALID      (1UL << 1)
> +/*
> + * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
> + * other bits are reserved for kvm internal use which are defined in
> + * include/linux/kvm_host.h.
> + */
> +#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> +#define KVM_MEM_READONLY	(1UL << 1)
>  
>  /* for KVM_IRQ_LINE */
>  struct kvm_irq_level {
> @@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_GET_SMMU_INFO 78
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#ifdef __KVM_HAVE_READONLY_MEM
> +#define KVM_CAP_READONLY_MEM 81
> +#endif
> +#define KVM_CAP_IRQFD_RESAMPLE 82
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
>  #endif
>  
>  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/*
> + * Available with KVM_CAP_IRQFD_RESAMPLE
> + *
> + * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
> + * the irqfd to operate in resampling mode for level triggered interrupt
> + * emlation.  See Documentation/virtual/kvm/api.txt.
> + */
> +#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
>  
>  struct kvm_irqfd {
>  	__u32 fd;
>  	__u32 gsi;
>  	__u32 flags;
> -	__u8  pad[20];
> +	__u32 resamplefd;
> +	__u8  pad[16];
>  };
>  
>  struct kvm_clock_data {
> diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
> index 7bdcf93..cea2c5c 100644
> --- a/linux-headers/linux/kvm_para.h
> +++ b/linux-headers/linux/kvm_para.h
> @@ -1,5 +1,5 @@
> -#ifndef __LINUX_KVM_PARA_H
> -#define __LINUX_KVM_PARA_H
> +#ifndef _UAPI__LINUX_KVM_PARA_H
> +#define _UAPI__LINUX_KVM_PARA_H
>  
>  /*
>   * This header file provides a method for making a hypercall to the host
> @@ -25,4 +25,4 @@
>   */
>  #include <asm/kvm_para.h>
>  
> -#endif /* __LINUX_KVM_PARA_H */
> +#endif /* _UAPI__LINUX_KVM_PARA_H */
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index f787b72..4758d1b 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -8,8 +8,8 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> -#ifndef VFIO_H
> -#define VFIO_H
> +#ifndef _UAPIVFIO_H
> +#define _UAPIVFIO_H
>  
>  #include <linux/types.h>
>  #include <linux/ioctl.h>
> @@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
>  
>  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
>  
> -#endif /* VFIO_H */
> +#endif /* _UAPIVFIO_H */
> diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h
> index 4f51d8f..b7cda39 100644
> --- a/linux-headers/linux/virtio_config.h
> +++ b/linux-headers/linux/virtio_config.h
> @@ -1,5 +1,5 @@
> -#ifndef _LINUX_VIRTIO_CONFIG_H
> -#define _LINUX_VIRTIO_CONFIG_H
> +#ifndef _UAPI_LINUX_VIRTIO_CONFIG_H
> +#define _UAPI_LINUX_VIRTIO_CONFIG_H
>  /* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
>   * anyone can
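The kvm_irqfd change carves the new resamplefd field out of the existing padding, so the structure stays 32 bytes. That matters because KVM_IRQFD is defined with _IOW() over this struct, so its size is part of the ioctl number. The sketch below keeps local copies of the old and new layouts (the _old/_new names and the <stdint.h> types in place of __u32/__u8 are assumptions made so it builds without kernel headers) and lets the compiler check the invariant; demo_resample_irqfd() is a hypothetical helper that only fills in the struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of struct kvm_irqfd before and after the header update. */
struct kvm_irqfd_old {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;
    uint8_t pad[20];
};

struct kvm_irqfd_new {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;
    uint32_t resamplefd;   /* taken from the first 4 bytes of the old pad */
    uint8_t pad[16];
};

/* Flag bits as defined in the updated linux/kvm.h */
#define DEMO_IRQFD_FLAG_DEASSIGN (1u << 0)
#define DEMO_IRQFD_FLAG_RESAMPLE (1u << 1)

/* A size change would alter the ioctl ABI, so check it at compile time. */
_Static_assert(sizeof(struct kvm_irqfd_old) == sizeof(struct kvm_irqfd_new),
               "kvm_irqfd size changed");
_Static_assert(offsetof(struct kvm_irqfd_new, resamplefd) == 12,
               "resamplefd must directly follow flags");

/* Fill in a resampling irqfd request; no ioctl is issued here. */
static struct kvm_irqfd_new demo_resample_irqfd(uint32_t fd, uint32_t gsi,
                                                uint32_t resamplefd)
{
    struct kvm_irqfd_new irqfd = {
        .fd = fd,
        .gsi = gsi,
        .flags = DEMO_IRQFD_FLAG_RESAMPLE,
        .resamplefd = resamplefd,   /* remaining pad bytes are zeroed */
    };
    return irqfd;
}
```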
[PATCH v2] qemu: Update Linux headers
Based on v3.7-rc1

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

Using tag v3.7-rc1 instead of random HEAD, although the patch turns out identical to v1.

 linux-headers/asm-x86/kvm.h         |   17 +++++++++++++++++
 linux-headers/linux/kvm.h           |   25 +++++++++++++++++++++----
 linux-headers/linux/kvm_para.h      |    6 +++---
 linux-headers/linux/vfio.h          |    6 +++---
 linux-headers/linux/virtio_config.h |    6 +++---
 linux-headers/linux/virtio_ring.h   |    6 +++---
 6 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 246617e..a65ec29 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -9,6 +9,22 @@
 #include <linux/types.h>
 #include <linux/ioctl.h>
 
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
@@ -25,6 +41,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_READONLY_MEM
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4b9e575..81d2feb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
-/* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
-#define KVM_MEMSLOT_INVALID      (1UL << 1)
+/*
+ * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
+ * other bits are reserved for kvm internal use which are defined in
+ * include/linux/kvm_host.h.
+ */
+#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
+#define KVM_MEM_READONLY	(1UL << 1)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#ifdef __KVM_HAVE_READONLY_MEM
+#define KVM_CAP_READONLY_MEM 81
+#endif
+#define KVM_CAP_IRQFD_RESAMPLE 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/*
+ * Available with KVM_CAP_IRQFD_RESAMPLE
+ *
+ * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
+ * the irqfd to operate in resampling mode for level triggered interrupt
+ * emlation.  See Documentation/virtual/kvm/api.txt.
+ */
+#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
 
 struct kvm_irqfd {
 	__u32 fd;
 	__u32 gsi;
 	__u32 flags;
-	__u8  pad[20];
+	__u32 resamplefd;
+	__u8  pad[16];
 };
 
 struct kvm_clock_data {
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index 7bdcf93..cea2c5c 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -1,5 +1,5 @@
-#ifndef __LINUX_KVM_PARA_H
-#define __LINUX_KVM_PARA_H
+#ifndef _UAPI__LINUX_KVM_PARA_H
+#define _UAPI__LINUX_KVM_PARA_H
 
 /*
  * This header file provides a method for making a hypercall to the host
@@ -25,4 +25,4 @@
  */
 #include <asm/kvm_para.h>
 
-#endif /* __LINUX_KVM_PARA_H */
+#endif /* _UAPI__LINUX_KVM_PARA_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index f787b72..4758d1b 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h
index 4f51d8f..b7cda39 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -1,5 +1,5 @@
-#ifndef _LINUX_VIRTIO_CONFIG_H
-#define _LINUX_VIRTIO_CONFIG_H
+#ifndef _UAPI_LINUX_VIRTIO_CONFIG_H
+#define _UAPI_LINUX_VIRTIO_CONFIG_H
 /* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
  * anyone can use the definitions to implement compatible drivers/servers.
  *
@@ -51,4 +51,4 @@
  * suppressed them? */
 #define VIRTIO_F_NOTIFY_ON_EMPTY	24
 
-#endif /* _LINUX_VIRTIO_CONFIG_H */
+#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/linux-headers/linux/virtio_ring.h b/linux-headers/linux/virtio_ring.h
index 1b333e2..921694a 100644
--- a/linux-headers/linux/virtio_ring.h
+++
Re: [PATCH] qemu: Update Linux headers
On Mon, 2012-10-15 at 15:54 -0500, Anthony Liguori wrote:
> Alex Williamson <alex.william...@redhat.com> writes:
>
> > Based on v3.7-rc1-3-g29bb4cc
>
> Normally this would go through qemu-kvm/uq/master but since this is
> from Linus' tree, it's less of a concern. Nonetheless, I'd prefer we
> did it from v3.7-rc1 instead of a random git snapshot.

Resent against v3.7-rc1, which ends up just being a changelog change; there are no header changes since rc1.

Thanks,
Alex
Re: [Qemu-devel] [PATCH qom-cpu v2 4/7] cpus: Pass CPUState to qemu_cpu_is_self()
On Fri, 12 Oct 2012 03:26:40 +0200
Andreas Färber <afaer...@suse.de> wrote:

> Change return type to bool, move to include/qemu/cpu.h and
> add documentation.
>
> Signed-off-by: Andreas Färber <afaer...@suse.de>
> ---
>  cpus.c             |   10 ++++------
>  exec.c             |    3 ++-
>  hw/apic.c          |    6 ++++--
>  include/qemu/cpu.h |   10 ++++++++++
>  kvm-all.c          |    4 +++-
>  qemu-common.h      |    1 -
>  target-i386/kvm.c  |    6 ++++--
>  7 files changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 750a76f..849ea8a 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -638,9 +638,10 @@ void qemu_init_cpu_loop(void)
>  
>  void run_on_cpu(CPUArchState *env, void (*func)(void *data), void *data)
>  {
> +    CPUState *cpu = ENV_GET_CPU(env);
>      struct qemu_work_item wi;
>  
> -    if (qemu_cpu_is_self(env)) {
> +    if (qemu_cpu_is_self(cpu)) {
>          func(data);
>          return;
>      }
> @@ -855,7 +856,7 @@ static void qemu_cpu_kick_thread(CPUArchState *env)
>          exit(1);
>      }
>  #else /* _WIN32 */
> -    if (!qemu_cpu_is_self(env)) {
> +    if (!qemu_cpu_is_self(cpu)) {
>          SuspendThread(cpu->hThread);
>          cpu_signal(0);
>          ResumeThread(cpu->hThread);
> @@ -890,11 +891,8 @@ void qemu_cpu_kick_self(void)
>  #endif
>  }
>  
> -int qemu_cpu_is_self(void *_env)
> +bool qemu_cpu_is_self(CPUState *cpu)
>  {
> -    CPUArchState *env = _env;
> -    CPUState *cpu = ENV_GET_CPU(env);
> -
>      return qemu_thread_is_self(cpu->thread);
>  }
>  
> diff --git a/exec.c b/exec.c
> index 7899042..e21be32 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1685,6 +1685,7 @@ static void cpu_unlink_tb(CPUArchState *env)
>  /* mask must never be zero, except for A20 change call */
>  static void tcg_handle_interrupt(CPUArchState *env, int mask)
>  {
> +    CPUState *cpu = ENV_GET_CPU(env);

Is there any chance to get rid of the expensive cast on this call path?

>      int old_mask;
>  
>      old_mask = env->interrupt_request;
> @@ -1694,7 +1695,7 @@ static void tcg_handle_interrupt(CPUArchState *env, int mask)
>       * If called from iothread context, wake the target cpu in
>       * case its halted.
>       */
> -    if (!qemu_cpu_is_self(env)) {
> +    if (!qemu_cpu_is_self(cpu)) {
>          qemu_cpu_kick(env);
>          return;
>      }
> diff --git a/hw/apic.c b/hw/apic.c
> index ccf2819..1b4cd2f 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -107,7 +107,7 @@ static void apic_sync_vapic(APICCommonState *s, int sync_type)
>          length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>  
>      if (sync_type & SYNC_TO_VAPIC) {
> -        assert(qemu_cpu_is_self(&s->cpu->env));
> +        assert(qemu_cpu_is_self(CPU(s->cpu)));
>  
>          vapic_state.tpr = s->tpr;
>          vapic_state.enabled = 1;
> @@ -363,10 +363,12 @@ static int apic_irq_pending(APICCommonState *s)
>  /* signal the CPU if an irq is pending */
>  static void apic_update_irq(APICCommonState *s)
>  {
> +    CPUState *cpu = CPU(s->cpu);
> +
>      if (!(s->spurious_vec & APIC_SV_ENABLE)) {
>          return;
>      }
> -    if (!qemu_cpu_is_self(&s->cpu->env)) {
> +    if (!qemu_cpu_is_self(cpu)) {
>          cpu_interrupt(&s->cpu->env, CPU_INTERRUPT_POLL);
>      } else if (apic_irq_pending(s) > 0) {
>          cpu_interrupt(&s->cpu->env, CPU_INTERRUPT_HARD);
> diff --git a/include/qemu/cpu.h b/include/qemu/cpu.h
> index ad706a6..7be983d 100644
> --- a/include/qemu/cpu.h
> +++ b/include/qemu/cpu.h
> @@ -78,5 +78,15 @@ struct CPUState {
>   */
>  void cpu_reset(CPUState *cpu);
>  
> +/**
> + * qemu_cpu_is_self:
> + * @cpu: The vCPU to check against.
> + *
> + * Checks whether the caller is executing on the vCPU thread.
> + *
> + * Returns: %true if called from @cpu's thread, %false otherwise.
> + */
> +bool qemu_cpu_is_self(CPUState *cpu);
> +
>  #endif
> diff --git a/kvm-all.c b/kvm-all.c
> index 92a7137..db01aeb 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -854,9 +854,11 @@ static MemoryListener kvm_memory_listener = {
>  
>  static void kvm_handle_interrupt(CPUArchState *env, int mask)
>  {
> +    CPUState *cpu = ENV_GET_CPU(env);
> +
>      env->interrupt_request |= mask;
>  
> -    if (!qemu_cpu_is_self(env)) {
> +    if (!qemu_cpu_is_self(cpu)) {
>          qemu_cpu_kick(env);
>      }
>  }
> diff --git a/qemu-common.h b/qemu-common.h
> index b54612b..2094742 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -326,7 +326,6 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id);
>  /* Unblock cpu */
>  void qemu_cpu_kick(void *env);
>  void qemu_cpu_kick_self(void);
> -int qemu_cpu_is_self(void *env);
>  
>  /* work queue */
>  struct qemu_work_item {
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 5b18383..cf3d2f1 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1552,9 +1552,10 @@ static int kvm_get_debugregs(CPUX86State *env)
>  
>  int kvm_arch_put_registers(CPUX86State *env, int level)
>  {
> +    CPUState *cpu = ENV_GET_CPU(env);
>      int ret;
>  
> -    assert(cpu_is_stopped(env) || qemu_cpu_is_self(env));
> +    assert(cpu_is_stopped(env) ||
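After this change qemu_cpu_is_self() takes a CPUState and only asks whether the caller is running on that vCPU's thread, via qemu_thread_is_self(cpu->thread). The sketch below shows the same check with plain POSIX threads; DemoCPU, demo_cpu_is_self() and demo_probe_thread() are hypothetical names, and pthread_self()/pthread_equal() stand in for QEMU's QemuThread wrapper rather than reproducing it.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU: only the thread that runs it. */
typedef struct DemoCPU {
    pthread_t thread;
} DemoCPU;

/* True if the caller is executing on @cpu's thread. */
static bool demo_cpu_is_self(const DemoCPU *cpu)
{
    return pthread_equal(pthread_self(), cpu->thread) != 0;
}

/* Lets a second thread record what demo_cpu_is_self() reports there. */
struct demo_probe {
    const DemoCPU *cpu;
    bool is_self;
};

static void *demo_probe_thread(void *arg)
{
    struct demo_probe *p = arg;

    p->is_self = demo_cpu_is_self(p->cpu);
    return NULL;
}
```

Callers use it exactly like the asserts in the patch: act directly when already on the vCPU thread, otherwise kick or queue work for that thread.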
[PATCH 8/8] KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
This fixes an error in the inline asm in try_lock_hpte() where we were erroneously using a register number as an immediate operand. The bug only affects an error path, and in fact the code will still work as long as the compiler chooses some register other than r0 for the "bits" variable. Nevertheless it should still be fixed.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0dd1d86..1472a5b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -60,7 +60,7 @@ static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 		     "	ori	%0,%0,%4\n"
 		     "	stdcx.	%0,0,%2\n"
 		     "	beq+	2f\n"
-		     "	li	%1,%3\n"
+		     "	mr	%1,%3\n"
 		     "2:	isync"
 		     : "=&r" (tmp), "=&r" (old)
 		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-- 
1.7.10.4
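For readers less familiar with PowerPC assembly: li loads an immediate, so `li %1,%3` placed the register *number* the compiler picked for bits into old, whereas `mr %1,%3` copies that register's contents. If bits happened to live in r0, the failed-store path would leave old equal to 0, the same value the success path produces. The portable C11 sketch below restates the intended semantics: one load/test/store attempt, like a single ldarx/stdcx. pass, with a lost race treated as failure. The helper name is hypothetical and DEMO_HPTE_V_HVLOCK's value is an assumption for illustration, not taken from the kernel header.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative lock bit; the real HPTE_V_HVLOCK is defined by the kernel. */
#define DEMO_HPTE_V_HVLOCK 0x40UL

/*
 * Fail if any of @bits is already set; otherwise try exactly once to set
 * the lock bit. Acquire ordering on success plays the role of isync.
 */
static bool demo_try_lock_hpte(_Atomic unsigned long *hpte, unsigned long bits)
{
    unsigned long old = atomic_load_explicit(hpte, memory_order_relaxed);

    if (old & bits) {
        return false;   /* busy: the "and." / "bne 2f" path */
    }
    /* single store attempt, like one stdcx.; losing a race is a failure */
    return atomic_compare_exchange_strong_explicit(
        hpte, &old, old | DEMO_HPTE_V_HVLOCK,
        memory_order_acquire, memory_order_relaxed);
}
```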
[PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet
This is a set of 8 patches of which the first 7 have been posted previously and have had no comments. The 8th is new, but is quite trivial. They fix a series of issues with HV-style KVM on ppc. They only touch code that is specific to Book3S HV KVM. The patches are against the next branch of the kvm tree.

The overall diffstat is:

 arch/powerpc/include/asm/kvm_asm.h       |    1 +
 arch/powerpc/include/asm/kvm_book3s_64.h |    2 +-
 arch/powerpc/include/asm/kvm_host.h      |   17 +-
 arch/powerpc/include/asm/smp.h           |    8 +
 arch/powerpc/kernel/smp.c                |   46 +
 arch/powerpc/kvm/book3s_hv.c             |  316 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   11 +-
 7 files changed, 293 insertions(+), 108 deletions(-)

Please apply.

Thanks,
Paul.
[PATCH 7/8] KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
Commit 55b665b026 ("KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas") includes a check on the length of the dispatch trace log (DTL) to make sure the buffer is at least one entry long. This is appropriate when registering a buffer, but the interface also allows for any existing buffer to be unregistered by specifying a zero address. In this case the length check is not appropriate. This makes the check conditional on the address being non-zero.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8b3c470..812764c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -811,9 +811,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
 		addr = val->vpaval.addr;
 		len = val->vpaval.length;
 		r = -EINVAL;
-		if (len < sizeof(struct dtl_entry))
-			break;
-		if (addr && !vcpu->arch.vpa.next_gpa)
+		if (addr && (len < sizeof(struct dtl_entry) ||
+			     !vcpu->arch.vpa.next_gpa))
 			break;
 		len -= len % sizeof(struct dtl_entry);
 		r = set_vpa(vcpu, &vcpu->arch.dtl, addr, len);
-- 
1.7.10.4
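The corrected condition only enforces the minimum length, and the requirement that a VPA is already registered, when a non-zero address is being registered; a zero address now unregisters regardless of length. The sketch below isolates that decision. DEMO_DTL_ENTRY_SIZE stands in for sizeof(struct dtl_entry) and its value of 48 is an assumption for illustration, as is the helper name; have_vpa plays the role of `vcpu->arch.vpa.next_gpa != 0`, and the round-down mirrors `len -= len % sizeof(struct dtl_entry)`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for sizeof(struct dtl_entry); value assumed for illustration. */
#define DEMO_DTL_ENTRY_SIZE 48UL

/*
 * Reject only when registering (addr != 0) a buffer shorter than one
 * entry, or when no VPA is registered yet. On success, round *len down
 * to a whole number of entries.
 */
static int demo_check_dtl(uint64_t addr, unsigned long *len, bool have_vpa)
{
    if (addr && (*len < DEMO_DTL_ENTRY_SIZE || !have_vpa)) {
        return -EINVAL;
    }
    *len -= *len % DEMO_DTL_ENTRY_SIZE;
    return 0;
}
```

Before the patch, the first case below (address 0, length 0) was rejected by the unconditional length check.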
[PATCH 1/8] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
When a Book3S HV KVM guest is running, we need the host to be in single-thread mode, that is, all of the cores (or at least all of the cores where the KVM guest could run) to be running only one active hardware thread. This is because of the hardware restriction in POWER processors that all of the hardware threads in the core must be in the same logical partition. Complying with this restriction is much easier if, from the host kernel's point of view, only one hardware thread is active.

This adds two hooks in the SMP hotplug code to allow the KVM code to make sure that secondary threads (i.e. hardware threads other than thread 0) cannot come online while any KVM guest exists. The KVM code still has to check that any core where it runs a guest has the secondary threads offline, but having done that check it can now be sure that they will not come online while the guest is running.

Signed-off-by: Paul Mackerras <pau...@samba.org>
Acked-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
---
 arch/powerpc/include/asm/smp.h |    8 +++
 arch/powerpc/kernel/smp.c      |   46
 arch/powerpc/kvm/book3s_hv.c   |   12 +-
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ebc24dc..b625a1a 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
 void generic_mach_cpu_die(void);
 void generic_set_cpu_dead(unsigned int cpu);
 int generic_check_cpu_restart(unsigned int cpu);
+
+extern void inhibit_secondary_onlining(void);
+extern void uninhibit_secondary_onlining(void);
+
+#else /* HOTPLUG_CPU */
+static inline void inhibit_secondary_onlining(void) {}
+static inline void uninhibit_secondary_onlining(void) {}
+
 #endif
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8d4214a..c4f420c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -417,6 +417,45 @@ int
generic_check_cpu_restart(unsigned int cpu)
 {
 	return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
 }
+
+static atomic_t secondary_inhibit_count;
+
+/*
+ * Don't allow secondary CPU threads to come online
+ */
+void inhibit_secondary_onlining(void)
+{
+	/*
+	 * This makes secondary_inhibit_count stable during cpu
+	 * online/offline operations.
+	 */
+	get_online_cpus();
+
+	atomic_inc(&secondary_inhibit_count);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
+
+/*
+ * Allow secondary CPU threads to come online again
+ */
+void uninhibit_secondary_onlining(void)
+{
+	get_online_cpus();
+	atomic_dec(&secondary_inhibit_count);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
+
+static int secondaries_inhibited(void)
+{
+	return atomic_read(&secondary_inhibit_count);
+}
+
+#else /* HOTPLUG_CPU */
+
+#define secondaries_inhibited()		0
+
 #endif
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
@@ -435,6 +474,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int rc, c;
 
+	/*
+	 * Don't allow secondary threads to come online if inhibited
+	 */
+	if (threads_per_core > 1 && secondaries_inhibited() &&
+	    cpu % threads_per_core != 0)
+		return -EBUSY;
+
 	if (smp_ops == NULL ||
 	    (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
 		return -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9a15da7..c5ddf04 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -47,6 +47,7 @@
 #include <asm/page.h>
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
+#include <asm/smp.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -1016,8 +1017,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Make sure we are running on thread 0, and that
 	 * secondary threads are offline.
-	 * XXX we should also block attempts to bring any
-	 * secondary threads online.
 	 */
 	if (threads_per_core > 1 && !on_primary_thread()) {
 		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
@@ -1730,11 +1729,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
 	spin_lock_init(&kvm->arch.slot_phys_lock);
+
+	/*
+	 * Don't allow secondary CPU threads to come online
+	 * while any KVM VMs exist.
+	 */
+	inhibit_secondary_onlining();
+
 	return 0;
 }
 
 void kvmppc_core_destroy_vm(struct kvm *kvm)
 {
+	uninhibit_secondary_onlining();
+
 	if (kvm->arch.rma) {
 		kvm_release_rma(kvm->arch.rma);
 		kvm->arch.rma = NULL;
-- 
1.7.10.4
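The hotplug hook boils down to a counter (one increment per live VM, from kvmppc_core_init_vm()) and a check at the top of __cpu_up(). The user-space sketch below models just that logic with C11 atomics; the demo_ names are hypothetical, threads_per_core is passed in as a parameter, and the get_online_cpus()/put_online_cpus() bracketing that keeps the count stable against concurrent hotplug in the kernel has no equivalent here.

```c
#include <errno.h>
#include <stdatomic.h>

/* Outstanding inhibit requests (one per existing VM in the patch). */
static atomic_int demo_secondary_inhibit_count;

static void demo_inhibit_secondary_onlining(void)
{
    atomic_fetch_add(&demo_secondary_inhibit_count, 1);
}

static void demo_uninhibit_secondary_onlining(void)
{
    atomic_fetch_sub(&demo_secondary_inhibit_count, 1);
}

/* Same decision as the new check at the top of __cpu_up(). */
static int demo_cpu_up_allowed(unsigned int cpu, unsigned int threads_per_core)
{
    if (threads_per_core > 1 &&
        atomic_load(&demo_secondary_inhibit_count) &&
        cpu % threads_per_core != 0) {
        return -EBUSY;   /* a secondary thread while a VM exists */
    }
    return 0;
}
```

Thread 0 of each core (cpu % threads_per_core == 0) is never refused, and single-threaded cores are unaffected.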
[PATCH 4/8] KVM: PPC: Book3S HV: Fixes for late-joining threads
If a thread in a virtual core becomes runnable while other threads in the same virtual core are already running in the guest, it is possible for the latecomer to join the others on the core without first pulling them all out of the guest. Currently this only happens rarely, when a vcpu is first started. This fixes some bugs and omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make a DTL entry for it. Secondly, if it comes along while the master vcpu is doing a VPA update, we don't need to do anything since the master will pick it up in kvmppc_run_core. To handle this correctly we introduce a new vcore state, VCORE_STARTING. Thirdly, there is a race because we currently clear the hardware thread's hwthread_req before waiting to see it get to nap. A latecomer thread could have its hwthread_req cleared before it gets to test it, and therefore never increment the nap_count, leading to messages about wait_for_nap timeouts.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    7 ++++---
 arch/powerpc/kvm/book3s_hv.c        |   14 +++++++++++---
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 68f5a30..218534d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -289,9 +289,10 @@ struct kvmppc_vcore {
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
-#define VCORE_RUNNING	1
-#define VCORE_EXITING	2
-#define VCORE_SLEEPING	3
+#define VCORE_SLEEPING	1
+#define VCORE_STARTING	2
+#define VCORE_RUNNING	3
+#define VCORE_EXITING	4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a737a4..89995fa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -336,6 +336,11 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap)
 
 static void
kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 {
+	if (!(vcpu->arch.vpa.update_pending ||
+	      vcpu->arch.slb_shadow.update_pending ||
+	      vcpu->arch.dtl.update_pending))
+		return;
+
 	spin_lock(&vcpu->arch.vpa_update_lock);
 	if (vcpu->arch.vpa.update_pending) {
 		kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
@@ -1009,7 +1014,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->n_woken = 0;
 	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
-	vc->vcore_state = VCORE_RUNNING;
+	vc->vcore_state = VCORE_STARTING;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
 
@@ -1062,6 +1067,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 		kvmppc_create_dtl_entry(vcpu, vc);
 	}
 
+	vc->vcore_state = VCORE_RUNNING;
 	preempt_disable();
 	spin_unlock(&vc->lock);
 
@@ -1070,8 +1076,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
 
 	__kvmppc_vcore_entry(NULL, vcpu0);
-	for (i = 0; i < threads_per_core; ++i)
-		kvmppc_release_hwthread(vc->pcpu + i);
 
 	spin_lock(&vc->lock);
 	/* disable sending of IPIs on virtual external irqs */
@@ -1080,6 +1084,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/* wait for secondary threads to finish writing their state to memory */
 	if (vc->nap_count < vc->n_woken)
 		kvmppc_wait_for_nap(vc);
+	for (i = 0; i < threads_per_core; ++i)
+		kvmppc_release_hwthread(vc->pcpu + i);
 	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
 	vc->vcore_state = VCORE_EXITING;
 	spin_unlock(&vc->lock);
@@ -1170,6 +1176,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	kvm_run->exit_reason = 0;
 	vcpu->arch.ret = RESUME_GUEST;
 	vcpu->arch.trap = 0;
+	kvmppc_update_vpas(vcpu);
 
 	/*
 	 * Synchronize with other threads in this virtual core
@@ -1193,6 +1200,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	if (vc->vcore_state == VCORE_RUNNING &&
 	    VCORE_EXIT_COUNT(vc) == 0) {
 		vcpu->arch.ptid = vc->n_runnable - 1;
+		kvmppc_create_dtl_entry(vcpu, vc);
 		kvmppc_start_thread(vcpu);
 	}
 
-- 
1.7.10.4
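The new VCORE_STARTING state separates "the runner is still preparing guest entry" from "threads are in the guest". In kvmppc_run_vcpu() a latecomer only starts its own hardware thread when the vcore is RUNNING and no thread has begun exiting; while the vcore is STARTING, the commit message notes that the runner will pick the latecomer up in kvmppc_run_core(). The sketch restates just that decision; the enum reuses the patch's new numbering, while the demo_ names and the exit_count parameter (standing in for VCORE_EXIT_COUNT(vc)) are illustrative.

```c
#include <stdbool.h>

/* vcore_state values after the patch (numbering from kvm_host.h). */
enum demo_vcore_state {
    DEMO_VCORE_INACTIVE = 0,
    DEMO_VCORE_SLEEPING = 1,
    DEMO_VCORE_STARTING = 2,
    DEMO_VCORE_RUNNING  = 3,
    DEMO_VCORE_EXITING  = 4,
};

/*
 * Should a newly runnable vcpu start its hardware thread immediately?
 * exit_count stands in for VCORE_EXIT_COUNT(vc).
 */
static bool demo_latecomer_starts_now(enum demo_vcore_state state,
                                      unsigned int exit_count)
{
    return state == DEMO_VCORE_RUNNING && exit_count == 0;
}
```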
[PATCH 6/8] KVM: PPC: Book3S HV: Fix accounting of stolen time
Currently the code that accounts stolen time tends to overestimate the stolen time, and will sometimes report more stolen time in a DTL (dispatch trace log) entry than has elapsed since the last DTL entry. This can cause guests to underflow the user or system time measured for some tasks, leading to ridiculous CPU percentages and total runtimes being reported by top and other utilities.

In addition, the current code was designed for the previous policy where a vcore would only run when all the vcpus in it were runnable, and so only counted stolen time on a per-vcore basis. Now that a vcore can run while some of the vcpus in it are doing other things in the kernel (e.g. handling a page fault), we need to count the time when a vcpu task is preempted while it is not running as part of a vcore as stolen also. To do this, we bring back the BUSY_IN_HOST vcpu state and extend the vcpu_load/put functions to count preemption time while the vcpu is in that state. Handling the transitions between the RUNNING and BUSY_IN_HOST states requires checking and updating two variables (accumulated time stolen and time last preempted), so we add a new spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu stolen/preempt-time variables, and the per-vcore variables while this vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time. The task could be executing in userspace on behalf of the vcpu, or it could be preempted, or the vcpu could be genuinely stopped. Since we have no way of dividing up the time between these cases, we don't count any of it as stolen.
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    5 ++
 arch/powerpc/kvm/book3s_hv.c        |  127 ++-
 2 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1e8cbd1..3093896 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -559,12 +559,17 @@ struct kvm_vcpu_arch {
 	unsigned long dtl_index;
 	u64 stolen_logged;
 	struct kvmppc_vpa slb_shadow;
+
+	spinlock_t tbacct_lock;
+	u64 busy_stolen;
+	u64 busy_preempt;
 #endif
 };
 
 /* Values for vcpu->arch.state */
 #define KVMPPC_VCPU_NOTREADY		0
 #define KVMPPC_VCPU_RUNNABLE		1
+#define KVMPPC_VCPU_BUSY_IN_HOST	2
 
 /* Values for vcpu->arch.io_gpr */
 #define KVM_MMIO_REG_MASK	0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61d2934..8b3c470 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -60,23 +60,74 @@
 /* Used to indicate that a guest page fault needs to be handled */
 #define RESUME_PAGE_FAULT	(RESUME_GUEST | RESUME_FLAG_ARCH1)
 
+/* Used as a null value for timebase values */
+#define TB_NIL	(~(u64)0)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+/*
+ * We use the vcpu_load/put functions to measure stolen time.
+ * Stolen time is counted as time when either the vcpu is able to
+ * run as part of a virtual core, but the task running the vcore
+ * is preempted or sleeping, or when the vcpu needs something done
+ * in the kernel by the task running the vcpu, but that task is
+ * preempted or sleeping.  Those two things have to be counted
+ * separately, since one of the vcpu tasks will take on the job
+ * of running the core, and the other vcpu tasks in the vcore will
+ * sleep waiting for it to do that, but that sleep shouldn't count
+ * as stolen time.
+ *
+ * Hence we accumulate stolen time when the vcpu can run as part of
+ * a vcore using vc->stolen_tb, and the stolen time when the vcpu
+ * needs its task to do other things in the kernel (for example,
+ * service a page fault) in busy_stolen.  We don't accumulate
+ * stolen time for a vcore when it is inactive, or for a vcpu
+ * when it is in state RUNNING or NOTREADY.  NOTREADY is a bit of
+ * a misnomer; it means that the vcpu task is not executing in
+ * the KVM_VCPU_RUN ioctl, i.e. it is in userspace or elsewhere in
+ * the kernel.  We don't have any way of dividing up that time
+ * between time that the vcpu is genuinely stopped, time that
+ * the task is actively working on behalf of the vcpu, and time
+ * that the task is preempted, so we don't count any of it as
+ * stolen.
+ *
+ * Updates to busy_stolen are protected by arch.tbacct_lock;
+ * updates to vc->stolen_tb are protected by the arch.tbacct_lock
+ * of the vcpu that has taken responsibility for running the vcore
+ * (i.e. vc->runner).  The stolen times are measured in units of
+ * timebase ticks.  (Note that the != TB_NIL checks below are
+ * purely defensive; they should never fail.)
+ */
+
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	if (vc->runner == vcpu
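The busy_stolen/busy_preempt pair described in the comment can be modelled as "stamp the timebase when the task is preempted in BUSY_IN_HOST, add the elapsed ticks when it is loaded again", with TB_NIL meaning "not currently preempted". The sketch below is a single-threaded model under those assumptions only: the timebase is passed in instead of read with mftb(), tbacct_lock is omitted, and the struct and function names are hypothetical. It is not the patch's kvmppc_core_vcpu_load()/put() code, which is truncated above.

```c
#include <stdint.h>

#define DEMO_TB_NIL (~(uint64_t)0)   /* "no preemption in progress" */

enum demo_vcpu_state { DEMO_NOTREADY, DEMO_RUNNABLE, DEMO_BUSY_IN_HOST };

struct demo_vcpu {
    enum demo_vcpu_state state;
    uint64_t busy_stolen;    /* accumulated stolen timebase ticks */
    uint64_t busy_preempt;   /* timebase at last preemption, or DEMO_TB_NIL */
};

/* Task switched out: remember when, but only while busy in the host. */
static void demo_vcpu_put(struct demo_vcpu *v, uint64_t now)
{
    if (v->state == DEMO_BUSY_IN_HOST) {
        v->busy_preempt = now;
    }
}

/* Task switched back in: charge the gap as stolen time. */
static void demo_vcpu_load(struct demo_vcpu *v, uint64_t now)
{
    if (v->state == DEMO_BUSY_IN_HOST && v->busy_preempt != DEMO_TB_NIL) {
        v->busy_stolen += now - v->busy_preempt;
        v->busy_preempt = DEMO_TB_NIL;
    }
}
```

Time that passes while the vcpu is NOTREADY (e.g. back in userspace) is never charged, matching the policy in the commit message.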
[PATCH 2/8] KVM: PPC: Book3S HV: Fix some races in starting secondary threads
Subsequent patches implementing in-kernel XICS emulation will make it possible for IPIs to arrive at secondary threads at arbitrary times. This fixes some races in how we start the secondary threads, which if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads, and verified that they are no longer in the kernel, before we start any thread, (b) the secondary thread loads its vcpu pointer after clearing the IPI that woke it up (so we don't miss a wakeup), and (c) the secondary thread clears its vcpu pointer before incrementing the nap count. It also removes unnecessary setting of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            |   41 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   11 ++---
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c5ddf04..77dec0f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -64,8 +64,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	local_paca->kvm_hstate.kvm_vcpu = vcpu;
-	local_paca->kvm_hstate.kvm_vcore = vc;
 	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
 		vc->stolen_tb += mftb() - vc->preempt_tb;
 }
@@ -880,6 +878,7 @@ static int kvmppc_grab_hwthread(int cpu)
 
 	/* Ensure the thread won't go into the kernel if it wakes */
 	tpaca->kvm_hstate.hwthread_req = 1;
+	tpaca->kvm_hstate.kvm_vcpu = NULL;
 
 	/*
 	 * If the thread is already executing in the kernel (e.g.
handling @@ -929,7 +928,6 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) smp_wmb(); #if defined(CONFIG_PPC_ICP_NATIVE) defined(CONFIG_SMP) if (vcpu-arch.ptid) { - kvmppc_grab_hwthread(cpu); xics_wake_cpu(cpu); ++vc-n_woken; } @@ -955,7 +953,8 @@ static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc) /* * Check that we are on thread 0 and that any other threads in - * this core are off-line. + * this core are off-line. Then grab the threads so they can't + * enter the kernel. */ static int on_primary_thread(void) { @@ -967,6 +966,17 @@ static int on_primary_thread(void) while (++thr threads_per_core) if (cpu_online(cpu + thr)) return 0; + + /* Grab all hw threads so they can't go into the kernel */ + for (thr = 1; thr threads_per_core; ++thr) { + if (kvmppc_grab_hwthread(cpu + thr)) { + /* Couldn't grab one; let the others go */ + do { + kvmppc_release_hwthread(cpu + thr); + } while (--thr 0); + return 0; + } + } return 1; } @@ -1015,16 +1025,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc) } /* -* Make sure we are running on thread 0, and that -* secondary threads are offline. -*/ - if (threads_per_core 1 !on_primary_thread()) { - list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) - vcpu-arch.ret = -EBUSY; - goto out; - } - - /* * Assign physical thread IDs, first to non-ceded vcpus * and then to ceded ones. */ @@ -1043,15 +1043,22 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc) if (vcpu-arch.ceded) vcpu-arch.ptid = ptid++; + /* +* Make sure we are running on thread 0, and that +* secondary threads are offline. 
+*/ + if (threads_per_core 1 !on_primary_thread()) { + list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) + vcpu-arch.ret = -EBUSY; + goto out; + } + vc-stolen_tb += mftb() - vc-preempt_tb; vc-pcpu = smp_processor_id(); list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) { kvmppc_start_thread(vcpu); kvmppc_create_dtl_entry(vcpu, vc); } - /* Grab any remaining hw threads so they can't go into the kernel */ - for (i = ptid; i threads_per_core; ++i) - kvmppc_grab_hwthread(vc-pcpu + i); preempt_disable(); spin_unlock(vc-lock); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 44b72fe..1e90ef6 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -134,8 +134,11 @@ kvm_start_guest: 27:/* XXX should handle hypervisor maintenance interrupts etc. here */ + /* reload vcpu pointer after
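The new on_primary_thread() logic grabs every secondary hardware thread and, if any grab fails, releases the ones already taken before reporting failure. The sketch below isolates that all-or-nothing pattern. `grab_thread()` and `release_thread()` are hypothetical stand-ins for kvmppc_grab_hwthread() and kvmppc_release_hwthread(), modelled as flags in an array, and a per-thread `busy` flag simulates a thread that cannot be grabbed. Unlike the loop in the patch, which starts releasing at the thread whose grab failed, this version releases only the threads it actually grabbed.

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_CORE 4

/* Hypothetical model of one core's hardware threads. */
struct core {
	bool grabbed[THREADS_PER_CORE]; /* we have claimed this thread */
	bool busy[THREADS_PER_CORE];    /* thread cannot be claimed right now */
};

/* Returns 0 on success, -1 if the thread could not be claimed. */
static int grab_thread(struct core *c, int thr)
{
	if (c->busy[thr])
		return -1;
	c->grabbed[thr] = true;
	return 0;
}

static void release_thread(struct core *c, int thr)
{
	c->grabbed[thr] = false;
}

/*
 * Claim secondary threads 1..THREADS_PER_CORE-1.  Either all of them
 * end up grabbed (returns 1) or none of them do (returns 0).
 */
static int grab_all_secondaries(struct core *c)
{
	int thr;

	for (thr = 1; thr < THREADS_PER_CORE; ++thr) {
		if (grab_thread(c, thr)) {
			/* Roll back threads thr-1 down to 1, which we did grab. */
			while (--thr > 0)
				release_thread(c, thr);
			return 0;
		}
	}
	return 1;
}
```

Doing the grab before any thread is started is what guarantees point (a) of the commit message: no secondary is woken until all of them are known to be out of the kernel.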
[PATCH 3/8] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
There were a few places where we were traversing the list of runnable threads in a virtual core, i.e. vc->runnable_threads, without holding the vcore spinlock.  This extends the places where we hold the vcore spinlock to cover everywhere that we traverse that list.

Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault, this moves the call of it from kvmppc_handle_exit out to kvmppc_vcpu_run, where we don't hold the vcore lock.

In kvmppc_vcore_blocked, we don't actually need to check whether all vcpus are ceded and don't have any pending exceptions, since the caller has already done that.  The caller (kvmppc_run_vcpu) wasn't actually checking for pending exceptions, so we add that.

The change of "if" to "while" in kvmppc_run_vcpu is to make sure that we never call kvmppc_remove_runnable() when the vcore state is RUNNING or EXITING.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_asm.h |    1 +
 arch/powerpc/kvm/book3s_hv.c       |   67 ++--
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 76fdcfe..aabcdba 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -118,6 +118,7 @@

 #define RESUME_FLAG_NV          (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST        (1<<1)  /* Resume host? */
+#define RESUME_FLAG_ARCH1	(1<<2)

 #define RESUME_GUEST            0
 #define RESUME_GUEST_NV         RESUME_FLAG_NV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 77dec0f..3a737a4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -57,6 +57,9 @@
 /* #define EXIT_DEBUG_SIMPLE */
 /* #define EXIT_DEBUG_INT */

+/* Used to indicate that a guest page fault needs to be handled */
+#define RESUME_PAGE_FAULT	(RESUME_GUEST | RESUME_FLAG_ARCH1)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);

@@ -431,7 +434,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			      struct task_struct *tsk)
 {
 	int r = RESUME_HOST;
-	int srcu_idx;

 	vcpu->stat.sum_exits++;

@@ -491,16 +493,12 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * have been handled already.
 	 */
 	case BOOK3S_INTERRUPT_H_DATA_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		r = RESUME_PAGE_FAULT;
 		break;
 	case BOOK3S_INTERRUPT_H_INST_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-				kvmppc_get_pc(vcpu), 0);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
+		vcpu->arch.fault_dsisr = 0;
+		r = RESUME_PAGE_FAULT;
 		break;
 	/*
 	 * This occurs if the guest executes an illegal instruction.
@@ -984,22 +982,24 @@ static int on_primary_thread(void)
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
-static int kvmppc_run_core(struct kvmppc_vcore *vc)
+static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
 	struct kvm_vcpu *vcpu, *vcpu0, *vnext;
 	long ret;
 	u64 now;
 	int ptid, i, need_vpa_update;
 	int srcu_idx;
+	struct kvm_vcpu *vcpus_to_update[threads_per_core];

 	/* don't start if any threads have a signal pending */
 	need_vpa_update = 0;
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		if (signal_pending(vcpu->arch.run_task))
-			return 0;
-		need_vpa_update |= vcpu->arch.vpa.update_pending |
-			vcpu->arch.slb_shadow.update_pending |
-			vcpu->arch.dtl.update_pending;
+			return;
+		if (vcpu->arch.vpa.update_pending ||
+		    vcpu->arch.slb_shadow.update_pending ||
+		    vcpu->arch.dtl.update_pending)
+			vcpus_to_update[need_vpa_update++] = vcpu;
 	}

 	/*
@@ -1019,8 +1019,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	if (need_vpa_update) {
 		spin_unlock(&vc->lock);
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			kvmppc_update_vpas(vcpu);
+		for (i = 0; i < need_vpa_update; ++i)
+			kvmppc_update_vpas(vcpus_to_update[i]);
 		spin_lock(&vc->lock);
 	}

@@ -1037,8 +1037,10 @@ static int
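The patch moves the possibly-sleeping page-fault handler out of the code that runs under the vcore lock: kvmppc_handle_exit() now only records the fault and returns RESUME_PAGE_FAULT, and the caller deals with it after the lock is dropped. The sketch below shows that "classify under the lock, act outside it" pattern. Only the flag values are taken from kvm_asm.h; the exit reasons, `handle_exit()`, `handle_page_fault()` and `run_once()` are hypothetical simplifications, and the lock is modelled as a boolean so the sketch can assert which context each step runs in.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in kvm_asm.h after this patch. */
#define RESUME_FLAG_HOST   (1 << 1)
#define RESUME_FLAG_ARCH1  (1 << 2)
#define RESUME_GUEST       0
#define RESUME_HOST        RESUME_FLAG_HOST
#define RESUME_PAGE_FAULT  (RESUME_GUEST | RESUME_FLAG_ARCH1)

/* Hypothetical exit reasons for this sketch. */
enum exit_reason { EXIT_HCALL, EXIT_DATA_STORAGE, EXIT_TO_USER };

static bool vcore_locked;   /* stand-in for holding vc->lock */
static int faults_handled;

static void vcore_lock(void)   { assert(!vcore_locked); vcore_locked = true; }
static void vcore_unlock(void) { assert(vcore_locked); vcore_locked = false; }

/* Runs under the vcore lock, so it must not sleep: it only classifies. */
static int handle_exit(enum exit_reason why)
{
	assert(vcore_locked);
	switch (why) {
	case EXIT_DATA_STORAGE:
		return RESUME_PAGE_FAULT;
	case EXIT_TO_USER:
		return RESUME_HOST;
	default:
		return RESUME_GUEST;
	}
}

/* Stand-in for the fault handler, which may sleep in the real code. */
static int handle_page_fault(void)
{
	assert(!vcore_locked);
	faults_handled++;
	return RESUME_GUEST;
}

static int run_once(enum exit_reason why)
{
	int r;

	vcore_lock();
	r = handle_exit(why);
	vcore_unlock();

	if (r == RESUME_PAGE_FAULT)
		r = handle_page_fault();   /* lock already dropped here */
	return r;
}
```

Because RESUME_GUEST is 0, RESUME_PAGE_FAULT is just the architecture-private flag bit (value 4), so it cannot be confused with any of the generic resume codes.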
[PATCH 5/8] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
Currently the Book3S HV code implements a policy on multi-threaded processors (i.e. POWER7) that requires all of the active vcpus in a virtual core to be ready to run before we run the virtual core.  However, that causes problems on reset, because reset stops all vcpus except vcpu 0, and can also reduce throughput since all four threads in a virtual core have to wait whenever any one of them hits a hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish between them.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    5 +--
 arch/powerpc/kvm/book3s_hv.c        |   74 ++-
 2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 218534d..1e8cbd1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -563,9 +563,8 @@ struct kvm_vcpu_arch {
 };

 /* Values for vcpu->arch.state */
-#define KVMPPC_VCPU_STOPPED		0
-#define KVMPPC_VCPU_BUSY_IN_HOST	1
-#define KVMPPC_VCPU_RUNNABLE		2
+#define KVMPPC_VCPU_NOTREADY		0
+#define KVMPPC_VCPU_RUNNABLE		1

 /* Values for vcpu->arch.io_gpr */
 #define KVM_MMIO_REG_MASK	0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 89995fa..61d2934 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -776,10 +776,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)

 	kvmppc_mmu_book3s_hv_init(vcpu);

-	/*
-	 * We consider the vcpu stopped until we see the first run ioctl for it.
-	 */
-	vcpu->arch.state = KVMPPC_VCPU_STOPPED;
+	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;

 	init_waitqueue_head(&vcpu->arch.cpu_run);

@@ -866,9 +863,8 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 {
 	if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
 		return;
-	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
 	--vc->n_runnable;
-	++vc->n_busy;
 	list_del(&vcpu->arch.run_list);
 }

@@ -1169,7 +1165,6 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int n_ceded;
-	int prev_state;
 	struct kvmppc_vcore *vc;
 	struct kvm_vcpu *v, *vn;

@@ -1186,7 +1181,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	vcpu->arch.ceded = 0;
 	vcpu->arch.run_task = current;
 	vcpu->arch.kvm_run = kvm_run;
-	prev_state = vcpu->arch.state;
 	vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
 	list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads);
 	++vc->n_runnable;
@@ -1196,35 +1190,26 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	 * If the vcore is already running, we may be able to start
 	 * this thread straight away and have it join in.
 	 */
-	if (prev_state == KVMPPC_VCPU_STOPPED) {
+	if (!signal_pending(current)) {
 		if (vc->vcore_state == VCORE_RUNNING &&
 		    VCORE_EXIT_COUNT(vc) == 0) {
 			vcpu->arch.ptid = vc->n_runnable - 1;
 			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
+		} else if (vc->vcore_state == VCORE_SLEEPING) {
+			wake_up(&vc->wq);
 		}
-	} else if (prev_state == KVMPPC_VCPU_BUSY_IN_HOST)
-		--vc->n_busy;
+	}

 	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
 	       !signal_pending(current)) {
-		if (vc->n_busy || vc->vcore_state != VCORE_INACTIVE) {
+		if (vc->vcore_state != VCORE_INACTIVE) {
 			spin_unlock(&vc->lock);
 			kvmppc_wait_for_exec(vcpu, TASK_INTERRUPTIBLE);
 			spin_lock(&vc->lock);
 			continue;
 		}
-		vc->runner = vcpu;
-		n_ceded = 0;
-		list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
-			if (!v->arch.pending_exceptions)
-				n_ceded += v->arch.ceded;
-		if (n_ceded == vc->n_runnable)
-			kvmppc_vcore_blocked(vc);
-		else
-			kvmppc_run_core(vc);
-
 		list_for_each_entry_safe(v, vn, &vc->runnable_threads,
 					 arch.run_list) {
 			kvmppc_core_prepare_to_enter(v);
@@ -1236,23 +1221,40 @@ static int
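The policy change comes down to the condition under which a runnable vcpu that finds the vcore idle may go ahead and start it. The sketch below contrasts the two conditions as they appear in the wait loop of kvmppc_run_vcpu() before and after the patch. `struct vcore_snapshot`, `old_can_start()` and `new_can_start()` are hypothetical names, and everything else about scheduling is left out.

```c
#include <assert.h>
#include <stdbool.h>

enum vcore_state { VCORE_INACTIVE, VCORE_RUNNING };

/* Hypothetical snapshot of the fields the wait loop looks at. */
struct vcore_snapshot {
	enum vcore_state state;
	int n_busy;   /* old scheme only: vcpus counted as busy in the host */
};

/* Before the patch: also wait while any vcpu is busy in the host. */
static bool old_can_start(const struct vcore_snapshot *vc)
{
	return vc->n_busy == 0 && vc->state == VCORE_INACTIVE;
}

/* After the patch: only the vcore state matters. */
static bool new_can_start(const struct vcore_snapshot *vc)
{
	return vc->state == VCORE_INACTIVE;
}
```

With the other three vcpus left out of KVM_RUN after a reset, the old condition never lets vcpu 0 start the vcore, which is the reset problem described in the commit message.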
[PATCH 3/5] KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs
This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified.  We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest.

The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |    6 ++++++
 arch/powerpc/include/asm/kvm_host.h      |    1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   25 ++++++++++++++++++++++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 1472a5b..4ca4f25 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -50,6 +50,12 @@ extern int kvm_hpt_order;		/* order of preallocated HPTs */
 #define HPTE_V_HVLOCK	0x40UL
 #define HPTE_V_ABSENT	0x20UL

+/*
+ * We use this bit in the guest_rpte field of the revmap entry
+ * to indicate a modified HPTE.
+ */
+#define HPTE_GR_MODIFIED	(1ul << 62)
+
 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
 	unsigned long tmp, old;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3093896..58c7264 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -248,6 +248,7 @@ struct kvm_arch {
 	atomic_t vcpus_running;
 	unsigned long hpt_npte;
 	unsigned long hpt_mask;
+	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	unsigned short last_vcpu[NR_CPUS];
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 3233587..c83c0ca 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -66,6 +66,18 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 }
 EXPORT_SYMBOL_GPL(kvmppc_add_revmap_chain);

+/*
+ * Note modification of an HPTE; set the HPTE modified bit
+ * if it wasn't modified before and anyone is interested.
+ */
+static inline void note_hpte_modification(struct kvm *kvm,
+					  struct revmap_entry *rev)
+{
+	if (!(rev->guest_rpte & HPTE_GR_MODIFIED) &&
+	    atomic_read(&kvm->arch.hpte_mod_interest))
+		rev->guest_rpte |= HPTE_GR_MODIFIED;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
@@ -287,8 +299,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	rev = &kvm->arch.revmap[pte_index];
 	if (realmode)
 		rev = real_vmalloc_addr(rev);
-	if (rev)
+	if (rev) {
 		rev->guest_rpte = g_ptel;
+		note_hpte_modification(kvm, rev);
+	}

 	/* Link HPTE into reverse-map chain */
 	if (pteh & HPTE_V_VALID) {
@@ -392,7 +406,8 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 		/* Read PTE low word after tlbie to get final R/C values */
 		remove_revmap_chain(kvm, pte_index, rev, v, hpte[1]);
 	}
-	r = rev->guest_rpte;
+	r = rev->guest_rpte & ~HPTE_GR_MODIFIED;
+	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);

 	vcpu->arch.gpr[4] = v;
@@ -466,6 +481,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)

 			args[j] = ((0x80 | flags) << 56) + pte_index;
 			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+			note_hpte_modification(kvm, rev);

 			if (!(hp[0] & HPTE_V_VALID)) {
 				/* insert R and C bits from PTE */
@@ -555,6 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
 		rev->guest_rpte = r;
+		note_hpte_modification(kvm, rev);
 	}
 	r = (hpte[1] & ~mask) | bits;

@@ -606,8 +623,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 			v &= ~HPTE_V_ABSENT;
 			v |= HPTE_V_VALID;
 		}
-		if (v & HPTE_V_VALID)
+		if (v & HPTE_V_VALID) {
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
+			r &= ~HPTE_GR_MODIFIED;
+		}
 		vcpu->arch.gpr[4 + i * 2] = v;
 		vcpu->arch.gpr[5 + i * 2] = r;
 	}
-- 
1.7.10.4
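The bookkeeping added here has two halves: set a private bit when an entry changes (only if someone has registered interest), and always mask that bit out of anything handed back to the guest. A minimal userspace sketch of both halves follows. `struct revmap_entry_model`, the plain `hpte_mod_interest` integer and `guest_visible_rpte()` are hypothetical simplifications, and the sketch uses `1ull << 62` so the constant is 64 bits wide on any host, where the kernel relies on `unsigned long` being 64 bits on ppc64.

```c
#include <assert.h>
#include <stdint.h>

#define HPTE_GR_MODIFIED (1ull << 62)

/* Hypothetical stand-in for the revmap entry's guest view. */
struct revmap_entry_model {
	uint64_t guest_rpte;
};

/* Stands in for the atomic_t kvm->arch.hpte_mod_interest. */
static int hpte_mod_interest;

/* Set the modified bit if not already set and anyone is interested. */
static void note_hpte_modification(struct revmap_entry_model *rev)
{
	if (!(rev->guest_rpte & HPTE_GR_MODIFIED) && hpte_mod_interest)
		rev->guest_rpte |= HPTE_GR_MODIFIED;
}

/* Value returned to the guest: the bookkeeping bit is always hidden. */
static uint64_t guest_visible_rpte(const struct revmap_entry_model *rev)
{
	return rev->guest_rpte & ~HPTE_GR_MODIFIED;
}
```

Testing the bit before setting it mirrors the kernel helper and avoids a needless store when the entry is already marked.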
[PATCH 2/5] KVM: PPC: Book3S HV: Restructure HPT entry creation code
This restructures the code that creates HPT (hashed page table) entries so that it can be called in situations where we don't have a struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug where kvmppc_map_vrma() would corrupt the guest R4 value.

Now, most of the work of kvmppc_virtmode_h_enter is done by a new function, kvmppc_virtmode_do_h_enter, which itself calls another new function, kvmppc_do_h_enter, which contains most of the old kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments for the place to return the HPTE index, the Linux page tables to use, and whether it is being called in real mode, thus removing the need for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area) HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily to handle H_ENTER hcalls from the guest that need to pin a page of memory.  Since H_ENTER returns the index of the created HPTE in R4, kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4 in the case when it gets called from kvmppc_map_vrma on the first VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls kvmppc_virtmode_do_h_enter with the address of a dummy word as the place to store the HPTE index, thus avoiding corrupting the guest R4.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |    5 +++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   36 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   27 ++--
 3 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index ab73800..199b7fd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -157,8 +157,9 @@ extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
 extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel);
-extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-			long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
+			long pte_index, unsigned long pteh, unsigned long ptel,
+			pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
 			struct kvm_memory_slot *memslot, unsigned long *map);

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7a4aae9..351f2ac 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -41,6 +41,10 @@
 /* Power architecture requires HPT is at least 256kB */
 #define PPC_MIN_HPT_ORDER	18

+static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+				long pte_index, unsigned long pteh,
+				unsigned long ptel, unsigned long *pte_idx_ret);
+
 long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 {
 	unsigned long hpt;
@@ -185,6 +189,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	unsigned long addr, hash;
 	unsigned long psize;
 	unsigned long hp0, hp1;
+	unsigned long idx_ret;
 	long ret;
 	struct kvm *kvm = vcpu->kvm;

@@ -216,7 +221,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 		hash = (hash << 3) + 7;
 		hp_v = hp0 | ((addr >> 16) & ~0x7fUL);
 		hp_r = hp1 | addr;
-		ret = kvmppc_virtmode_h_enter(vcpu, H_EXACT, hash, hp_v, hp_r);
+		ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, hash, hp_v, hp_r,
+						 &idx_ret);
 		if (ret != H_SUCCESS) {
 			pr_err("KVM: map_vrma at %lx failed, ret=%ld\n",
 			       addr, ret);
@@ -354,15 +360,10 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 	return err;
 }

-/*
- * We come here on a H_ENTER call from the guest when we are not
- * using mmu notifiers and we don't have the requested page pinned
- * already.
- */
-long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-			long pte_index, unsigned long pteh, unsigned long ptel)
+long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+				long pte_index, unsigned long pteh,
+				unsigned long ptel, unsigned long *pte_idx_ret)
 {
-	struct kvm *kvm = vcpu->kvm;
 	unsigned long psize, gpa, gfn;
 	struct kvm_memory_slot *memslot;
 	long ret;
@@ -390,8 +391,8 @@ long
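The R4 fix comes down to giving the HPTE-insertion routine an explicit output pointer instead of letting it write the guest register itself. Below is a minimal sketch of that refactoring. All names (`struct vm_model`, `do_h_enter`, `h_enter_hcall`, `map_vrma_entry`, `ERR_FULL`) are hypothetical, and a trivial "first free slot" search stands in for the real HPTE insertion.

```c
#include <assert.h>

#define NSLOTS    8
#define H_SUCCESS 0
#define ERR_FULL  (-1)   /* error code used only in this sketch */

struct vm_model {
	unsigned long slots[NSLOTS];   /* 0 means free */
	unsigned long gpr[32];         /* guest registers of one vcpu */
};

/* Core routine: reports the chosen index only through idx_ret. */
static long do_h_enter(struct vm_model *vm, unsigned long pteh,
		       unsigned long *idx_ret)
{
	for (unsigned long i = 0; i < NSLOTS; i++) {
		if (vm->slots[i] == 0) {
			vm->slots[i] = pteh;
			*idx_ret = i;
			return H_SUCCESS;
		}
	}
	return ERR_FULL;
}

/* Guest hcall path: H_ENTER returns the index in R4. */
static long h_enter_hcall(struct vm_model *vm, unsigned long pteh)
{
	return do_h_enter(vm, pteh, &vm->gpr[4]);
}

/* Host-internal path (like kvmppc_map_vrma): write to a dummy word. */
static long map_vrma_entry(struct vm_model *vm, unsigned long pteh)
{
	unsigned long idx_ret;   /* result deliberately ignored */

	return do_h_enter(vm, pteh, &idx_ret);
}
```

Only the hcall wrapper decides that the index belongs in R4, so a host-internal caller can no longer clobber guest state by accident.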
[PATCH 0/5] KVM: PPC: Book3S HV: HPT read/write functions for userspace
This series of patches provides an interface by which userspace can read and write the hashed page table (HPT) of a Book3S HV guest.  The interface is an ioctl which provides a file descriptor which can be accessed with the read() and write() system calls.  The data read and written is the guest view of the HPT, in which the second doubleword of each HPTE (HPT entry) contains a guest physical address, as distinct from the real HPT that the hardware accesses, where the second doubleword of each HPTE contains a real address.

Because the HPT is divided into groups (HPTEGs) of 8 entries each, where each HPTEG usually only contains a few valid entries, or none, the data format that we use does run-length encoding of the invalid entries, so in fact the invalid entries take up no space in the stream.

The interface also provides for doing multiple passes over the HPT, where the first pass provides information on all HPTEs, and subsequent passes only return the HPTEs that have changed since the previous pass.

I have implemented a read/write interface rather than an mmap-based interface because the data is not stored contiguously anywhere in kernel memory.  Of each 16-byte HPTE, the first 8 bytes come from the real HPT and the second 8 bytes come from the parallel vmalloc'd array where we store the guest view of the guest physical address, permissions, accessed/dirty bits etc.  Thus a mmap-based interface would not be practicable (not without doubling the size of the parallel array, typically requiring an extra 8MB of kernel memory per guest).  This is also why I have not used the memslot interface for this.

This implements the interface for HV-style KVM but not for PR-style KVM.  Userspace does not need any additional interface with PR-style KVM because userspace maintains the guest HPT already in that case, and has an image of the guest view of the HPT in its address space.
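To make the run-length encoding concrete, here is a hedged sketch of how a writer could turn a per-entry validity map into header records of the form documented in the 5/5 patch (a 32-bit index followed by 16-bit `n_valid` and `n_invalid` counts). `encode_htab()` is a hypothetical helper, not the kernel's implementation; it computes only the headers and the resulting stream size, not the HPTE contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Header layout as documented for KVM_PPC_GET_HTAB_FD (8 bytes). */
struct kvm_get_htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE 16   /* each valid HPTE carries 16 bytes of data */

/*
 * Encode a validity map of 'n' HPTEs as header records: each record
 * covers a run of valid entries followed by a run of invalid ones.
 * Returns the number of headers written; *bytes gets the stream size.
 */
static size_t encode_htab(const bool *valid, size_t n,
			  struct kvm_get_htab_header *hdrs, size_t max_hdrs,
			  size_t *bytes)
{
	size_t i = 0, nh = 0;

	*bytes = 0;
	while (i < n && nh < max_hdrs) {
		struct kvm_get_htab_header *h = &hdrs[nh++];

		h->index = (uint32_t)i;
		h->n_valid = 0;
		h->n_invalid = 0;
		while (i < n && valid[i] && h->n_valid < UINT16_MAX) {
			h->n_valid++;
			i++;
		}
		while (i < n && !valid[i] && h->n_invalid < UINT16_MAX) {
			h->n_invalid++;
			i++;
		}
		/* Invalid entries cost nothing beyond the header itself. */
		*bytes += sizeof(*h) + HPTE_SIZE * (size_t)h->n_valid;
	}
	return nh;
}
```

For two HPTEGs (16 entries) in which only entries 0, 1 and 15 are valid, this yields two records, {0, 2, 13} and {15, 1, 0}, for a 64-byte stream instead of the 256 bytes a raw dump of all 16 entries would need.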
This series is against the next branch of the kvm tree plus my recently-posted set of 8 patches ("Various Book3s HV fixes that haven't been picked up yet").  The overall diffstat is:

 Documentation/virtual/kvm/api.txt        |   53 +
 arch/powerpc/include/asm/kvm.h           |   24 ++
 arch/powerpc/include/asm/kvm_book3s.h    |    8 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |   24 ++
 arch/powerpc/include/asm/kvm_host.h      |    1 +
 arch/powerpc/include/asm/kvm_ppc.h       |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  380 +-
 arch/powerpc/kvm/book3s_hv.c             |   12 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   71 --
 arch/powerpc/kvm/powerpc.c               |   17 ++
 include/linux/kvm.h                      |    3 +
 include/linux/kvm_host.h                 |   11 +-
 12 files changed, 559 insertions(+), 47 deletions(-)
[PATCH 1/5] KVM: Provide mmu notifier retry test based on struct kvm
The mmu_notifier_retry() function, used to test whether any page invalidations are in progress, currently takes a vcpu pointer, though the code only needs the VM's struct kvm pointer.  Forthcoming patches to the powerpc Book3S HV code will need to test for retry within a VM ioctl, where a struct kvm pointer is available but a struct vcpu pointer isn't.  Therefore this creates a variant of mmu_notifier_retry called kvm_mmu_notifier_retry that takes a struct kvm pointer, and implements mmu_notifier_retry in terms of it.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 include/linux/kvm_host.h |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..1cc1e1d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -841,9 +841,9 @@ extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;

 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
+static inline int kvm_mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
 {
-	if (unlikely(vcpu->kvm->mmu_notifier_count))
+	if (unlikely(kvm->mmu_notifier_count))
 		return 1;
 	/*
 	 * Ensure the read of mmu_notifier_count happens before the read
@@ -856,10 +856,15 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
 	 * can't rely on kvm->mmu_lock to keep things ordered.
 	 */
 	smp_rmb();
-	if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
+	if (kvm->mmu_notifier_seq != mmu_seq)
 		return 1;
 	return 0;
 }
+
+static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
+{
+	return kvm_mmu_notifier_retry(vcpu->kvm, mmu_seq);
+}
 #endif

 #ifdef KVM_CAP_IRQ_ROUTING
-- 
1.7.10.4
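The retry test is a sequence-count check: a caller snapshots mmu_notifier_seq, does its page lookup, and then treats the result as stale if an invalidation is in progress or has completed since the snapshot. Below is a userspace sketch of that check with C11 atomics. `struct kvm_model` and `struct vcpu_model` are hypothetical stand-ins for struct kvm and struct kvm_vcpu, and the acquire fence plays roughly the role of smp_rmb() in the original. The two functions mirror the split made by the patch: the kvm-based test does the work and the vcpu-based one forwards to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the check reads. */
struct kvm_model {
	atomic_long mmu_notifier_count;   /* invalidations in progress */
	atomic_ulong mmu_notifier_seq;    /* bumped when one completes */
};

struct vcpu_model {
	struct kvm_model *kvm;
};

/* True if work based on the 'mmu_seq' snapshot must be redone. */
static bool kvm_mmu_notifier_retry(struct kvm_model *kvm, unsigned long mmu_seq)
{
	if (atomic_load_explicit(&kvm->mmu_notifier_count, memory_order_relaxed))
		return true;
	/* Order the count read before the seq read (cf. smp_rmb()). */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&kvm->mmu_notifier_seq,
				    memory_order_relaxed) != mmu_seq;
}

/* The vcpu-based variant simply forwards to the kvm-based one. */
static bool mmu_notifier_retry(struct vcpu_model *vcpu, unsigned long mmu_seq)
{
	return kvm_mmu_notifier_retry(vcpu->kvm, mmu_seq);
}
```

A caller with only a VM pointer (the situation the commit message anticipates for a VM ioctl) uses the first function directly; existing vcpu-based callers keep working unchanged through the wrapper.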
[PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT.  There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags.  The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the bolted entries (those with the bolted bit, 0x10, set in the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for live migration.  Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs).  When the first pass reaches the end of the HPT, it returns from the read.  Subsequent reads only return information about HPTEs that have changed since they were last read.  A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 Documentation/virtual/kvm/api.txt        |   53 +
 arch/powerpc/include/asm/kvm.h           |   24 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_ppc.h       |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  344 ++
 arch/powerpc/kvm/book3s_hv.c             |   12 --
 arch/powerpc/kvm/powerpc.c               |   17 ++
 include/linux/kvm.h                      |    3 +
 8 files changed, 461 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4258180..8df3e53 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2071,6 +2071,59 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
 Note that the vcpu ioctl is asynchronous to vcpu execution.
+4.78 KVM_PPC_GET_HTAB_FD
+
+Capability: KVM_CAP_PPC_HTAB_FD
+Architectures: powerpc
+Type: vm ioctl
+Parameters: Pointer to struct kvm_get_htab_fd (in)
+Returns: file descriptor number (>= 0) on success, -1 on error
+
+This returns a file descriptor that can be used either to read out the
+entries in the guest's hashed page table (HPT), or to write entries to
+initialize the HPT.  The returned fd can only be written to if the
+KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
+can only be read if that bit is clear.  The argument struct looks like
+this:
+
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+	__u64	flags;
+	__u64	start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
+#define KVM_GET_HTAB_WRITE		((__u64)0x2)
+
+The `start_index' field gives the index in the HPT of the entry at
+which to start reading.  It is ignored when writing.
+
+Reads on the fd will initially supply information about all
+interesting HPT entries.  Interesting entries are those with the
+bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
+all entries.  When the end of the HPT is reached, the read() will
+return.  If read() is called again on the fd, it will start again from
+the beginning of the HPT, but will only return HPT entries that have
+changed since they were last read.
+
+Data read or written is structured as a header (8 bytes) followed by a
+series of valid HPT entries (16 bytes) each.  The header indicates how
+many valid HPT entries there are and how many invalid entries follow
+the valid entries.  The invalid entries are not represented explicitly
+in the stream.  The header format is:
+
+struct kvm_get_htab_header {
+	__u32	index;
+	__u16	n_valid;
+	__u16	n_invalid;
+};
+
+Writes to the fd create HPT entries starting at the index given in the
+header; first `n_valid' valid entries with contents from the data
+written, then `n_invalid' invalid entries, invalidating any previously
+valid entries found.
+
 5. The kvm_run structure
diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index b89ae4d..6518e38 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -331,6 +331,30 @@ struct kvm_book3e_206_tlb_params {
 	__u32 reserved[8];
 };

+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+	__u64	flags;
+	__u64	start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
+#define KVM_GET_HTAB_WRITE		((__u64)0x2)
+
+/*
+ * Data read on the file descriptor is formatted as a series of
+ * records, each consisting of a header followed by a series of
+ * `n_valid' HPTEs (16 bytes each), which are all valid.  Following
+ * those valid HPTEs there are `n_invalid' invalid HPTEs, which
+ * are not represented explicitly in the stream.  The same format
+ * is used for writing.
+ */
+struct
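On the consuming side, a reader of the fd needs only the header layout documented above to walk the stream. The sketch below assumes a buffer already filled by read() on the fd; obtaining the fd with KVM_PPC_GET_HTAB_FD is not shown. `parse_htab_buf()` is a hypothetical helper that checks record boundaries and tallies valid and invalid entries, with a comment marking where HPTE data would be consumed. The header struct is restated with `<stdint.h>` types in place of `__u32`/`__u16`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct kvm_get_htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE 16

/*
 * Walk a buffer read from the HTAB fd.  Counts the valid and invalid
 * HPTEs it describes; returns -1 if a record is truncated, else 0.
 */
static int parse_htab_buf(const unsigned char *buf, size_t len,
			  unsigned long *n_valid, unsigned long *n_invalid)
{
	size_t off = 0;

	*n_valid = *n_invalid = 0;
	while (off < len) {
		struct kvm_get_htab_header hdr;
		size_t payload;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));  /* buf may be unaligned */
		off += sizeof(hdr);

		payload = (size_t)hdr.n_valid * HPTE_SIZE;
		if (len - off < payload)
			return -1;
		/* Entries hdr.index .. hdr.index + n_valid - 1 would be
		 * consumed from buf + off here. */
		off += payload;

		*n_valid += hdr.n_valid;
		*n_invalid += hdr.n_invalid;
	}
	return 0;
}
```

Copying the header out with memcpy() avoids assuming any alignment of records within the buffer, since a record's size depends on how many valid entries precede it.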
[PATCH 4/5] KVM: PPC: Book3S HV: Make a HPTE removal function available
This makes a HPTE removal function, kvmppc_do_h_remove(), available outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing code.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |    3 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   19 +--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 199b7fd..4ac1c67 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -160,6 +160,9 @@ extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel,
 			pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
+extern long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+			unsigned long pte_index, unsigned long avpn,
+			unsigned long *hpret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
 			struct kvm_memory_slot *memslot, unsigned long *map);

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c83c0ca..505548a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -364,11 +364,10 @@ static inline int try_lock_tlbie(unsigned int *lock)
 	return old == 0;
 }

-long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
-		     unsigned long pte_index, unsigned long avpn,
-		     unsigned long va)
+long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+			unsigned long pte_index, unsigned long avpn,
+			unsigned long *hpret)
 {
-	struct kvm *kvm = vcpu->kvm;
 	unsigned long *hpte;
 	unsigned long v, r, rb;
 	struct revmap_entry *rev;
@@ -410,10 +409,18 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);

-	vcpu->arch.gpr[4] = v;
-	vcpu->arch.gpr[5] = r;
+	hpret[0] = v;
+	hpret[1] = r;
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_remove);
+
+long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
+		     unsigned long pte_index, unsigned long avpn)
+{
+	return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn,
+				  &vcpu->arch.gpr[4]);
+}

 long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 {
-- 
1.7.10.4
[PATCH] MAINTAINERS: Add git tree link for PPC KVM
Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 MAINTAINERS |    1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e73060f..32dc107 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4244,6 +4244,7 @@ KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
 M:	Alexander Graf <ag...@suse.de>
 L:	kvm-...@vger.kernel.org
 W:	http://kvm.qumranet.com
+T:	git git://github.com/agraf/linux-2.6.git
 S:	Supported
 F:	arch/powerpc/include/asm/kvm*
 F:	arch/powerpc/kvm/
-- 
1.7.9.5
[PATCH 8/8] KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the bits variable.  Nevertheless it should still be fixed.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0dd1d86..1472a5b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -60,7 +60,7 @@ static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 		     "	ori	%0,%0,%4\n"
 		     "	stdcx.	%0,0,%2\n"
 		     "	beq+	2f\n"
-		     "	li	%1,%3\n"
+		     "	mr	%1,%3\n"
 		     "2:	isync"
 		     : "=&r" (tmp), "=&r" (old)
 		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-- 
1.7.10.4
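A portable C model helps show what the fixed asm is meant to do. If any of `bits` is already set, fail. Otherwise make one attempt to set the lock bit atomically, and report failure if that store does not succeed. The fixed `mr %1,%3` makes `old` nonzero on that path. With the buggy `li %1,%3`, `old` received the register number instead, which is 0 when the compiler picks r0. The sketch below is not the kernel code: the ldarx/stdcx. pair is modelled as a single C11 compare-and-swap, and `HVLOCK_BIT` is an assumed value for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define HVLOCK_BIT 0x40UL	/* assumed lock-bit value, illustration only */

/* Single-attempt try-lock: no retry loop, like the asm. */
static bool try_lock_model(_Atomic unsigned long *hpte, unsigned long bits)
{
	unsigned long old = atomic_load(hpte);

	if (old & bits)
		return false;	/* already locked/busy: the "bne 2f" path */
	/* Store-conditional modelled as CAS: on failure we return false,
	 * which is what "mr %1,%3" guarantees (assuming bits != 0). */
	return atomic_compare_exchange_strong(hpte, &old, old | HVLOCK_BIT);
}
```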
[PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet
This is a set of 8 patches of which the first 7 have been posted previously and have had no comments. The 8th is new, but is quite trivial. They fix a series of issues with HV-style KVM on ppc. They only touch code that is specific to Book3S HV KVM. Please apply.

Thanks,
Paul.
[PATCH 5/8] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    5 +--
 arch/powerpc/kvm/book3s_hv.c        |   74 ++-
 2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 218534d..1e8cbd1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -563,9 +563,8 @@ struct kvm_vcpu_arch {
 };
 
 /* Values for vcpu->arch.state */
-#define KVMPPC_VCPU_STOPPED		0
-#define KVMPPC_VCPU_BUSY_IN_HOST	1
-#define KVMPPC_VCPU_RUNNABLE		2
+#define KVMPPC_VCPU_NOTREADY		0
+#define KVMPPC_VCPU_RUNNABLE		1
 
 /* Values for vcpu->arch.io_gpr */
 #define KVM_MMIO_REG_MASK	0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 89995fa..61d2934 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -776,10 +776,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 
 	kvmppc_mmu_book3s_hv_init(vcpu);
 
-	/*
-	 * We consider the vcpu stopped until we see the first run ioctl for it.
-	 */
-	vcpu->arch.state = KVMPPC_VCPU_STOPPED;
+	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
 
 	init_waitqueue_head(&vcpu->arch.cpu_run);
 
@@ -866,9 +863,8 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 {
 	if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
 		return;
-	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
 	--vc->n_runnable;
-	++vc->n_busy;
 	list_del(&vcpu->arch.run_list);
 }
 
@@ -1169,7 +1165,6 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int n_ceded;
-	int prev_state;
 	struct kvmppc_vcore *vc;
 	struct kvm_vcpu *v, *vn;
 
@@ -1186,7 +1181,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	vcpu->arch.ceded = 0;
 	vcpu->arch.run_task = current;
 	vcpu->arch.kvm_run = kvm_run;
-	prev_state = vcpu->arch.state;
 	vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
 	list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads);
 	++vc->n_runnable;
@@ -1196,35 +1190,26 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	 * If the vcore is already running, we may be able to start
 	 * this thread straight away and have it join in.
 	 */
-	if (prev_state == KVMPPC_VCPU_STOPPED) {
+	if (!signal_pending(current)) {
 		if (vc->vcore_state == VCORE_RUNNING &&
 		    VCORE_EXIT_COUNT(vc) == 0) {
 			vcpu->arch.ptid = vc->n_runnable - 1;
 			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
+		} else if (vc->vcore_state == VCORE_SLEEPING) {
+			wake_up(&vc->wq);
 		}
-	} else if (prev_state == KVMPPC_VCPU_BUSY_IN_HOST)
-		--vc->n_busy;
+	}
 
 	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
 	       !signal_pending(current)) {
-		if (vc->n_busy || vc->vcore_state != VCORE_INACTIVE) {
+		if (vc->vcore_state != VCORE_INACTIVE) {
 			spin_unlock(&vc->lock);
 			kvmppc_wait_for_exec(vcpu, TASK_INTERRUPTIBLE);
 			spin_lock(&vc->lock);
 			continue;
 		}
-		vc->runner = vcpu;
-		n_ceded = 0;
-		list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
-			if (!v->arch.pending_exceptions)
-				n_ceded += v->arch.ceded;
-		if (n_ceded == vc->n_runnable)
-			kvmppc_vcore_blocked(vc);
-		else
-			kvmppc_run_core(vc);
-
 		list_for_each_entry_safe(v, vn, &vc->runnable_threads,
 					 arch.run_list) {
 			kvmppc_core_prepare_to_enter(v);
@@ -1236,23 +1221,40 @@ static int
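A simplified model of the policy change follows. It is not the kernel logic. Under the old rule the vcore waited until every active vcpu was ready. Under the new rule any single runnable vcpu is enough. `struct vcore_model`, `n_active` and both helper functions are hypothetical; only the state values mirror the new `KVMPPC_VCPU_*` numbering in this patch.

```c
#include <stdbool.h>

/* Mirrors the patch's new numbering: STOPPED and BUSY_IN_HOST merged. */
enum vcpu_state { VCPU_NOTREADY = 0, VCPU_RUNNABLE = 1 };

struct vcore_model {
	int n_active;	/* hypothetical: vcpus that are not stopped */
	int n_runnable;	/* vcpus currently in VCPU_RUNNABLE */
};

/* Old policy (roughly): every active vcpu must be ready. */
static bool can_run_old(const struct vcore_model *vc)
{
	return vc->n_runnable > 0 && vc->n_runnable == vc->n_active;
}

/* New policy: one runnable vcpu is enough to run the vcore. */
static bool can_run_new(const struct vcore_model *vc)
{
	return vc->n_runnable > 0;
}
```

The reset case from the description corresponds to `n_active == 4, n_runnable == 1`. The old rule would keep the vcore idle in that situation, while the new one lets vcpu 0 run.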
[PATCH 6/8] KVM: PPC: Book3S HV: Fix accounting of stolen time
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total
runtimes being reported by top and other utilities.

In addition, the current code was designed for the previous policy
where a vcore would only run when all the vcpus in it were runnable,
and so only counted stolen time on a per-vcore basis.  Now that a
vcore can run while some of the vcpus in it are doing other things in
the kernel (e.g. handling a page fault), we also need to count as
stolen the time during which a vcpu task is preempted while it is not
running as part of a vcore.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is in
that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or it
could be preempted, or the vcpu could be genuinely stopped.  Since we
have no way of dividing up the time between these cases, we don't
count any of it as stolen.
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    5 ++
 arch/powerpc/kvm/book3s_hv.c        |  127 ++-
 2 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1e8cbd1..3093896 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -559,12 +559,17 @@ struct kvm_vcpu_arch {
 	unsigned long dtl_index;
 	u64 stolen_logged;
 	struct kvmppc_vpa slb_shadow;
+
+	spinlock_t tbacct_lock;
+	u64 busy_stolen;
+	u64 busy_preempt;
 #endif
 };
 
 /* Values for vcpu->arch.state */
 #define KVMPPC_VCPU_NOTREADY		0
 #define KVMPPC_VCPU_RUNNABLE		1
+#define KVMPPC_VCPU_BUSY_IN_HOST	2
 
 /* Values for vcpu->arch.io_gpr */
 #define KVM_MMIO_REG_MASK	0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61d2934..8b3c470 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -60,23 +60,74 @@
 /* Used to indicate that a guest page fault needs to be handled */
 #define RESUME_PAGE_FAULT	(RESUME_GUEST | RESUME_FLAG_ARCH1)
 
+/* Used as a null value for timebase values */
+#define TB_NIL	(~(u64)0)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+/*
+ * We use the vcpu_load/put functions to measure stolen time.
+ * Stolen time is counted as time when either the vcpu is able to
+ * run as part of a virtual core, but the task running the vcore
+ * is preempted or sleeping, or when the vcpu needs something done
+ * in the kernel by the task running the vcpu, but that task is
+ * preempted or sleeping.  Those two things have to be counted
+ * separately, since one of the vcpu tasks will take on the job
+ * of running the core, and the other vcpu tasks in the vcore will
+ * sleep waiting for it to do that, but that sleep shouldn't count
+ * as stolen time.
+ *
+ * Hence we accumulate stolen time when the vcpu can run as part of
+ * a vcore using vc->stolen_tb, and the stolen time when the vcpu
+ * needs its task to do other things in the kernel (for example,
+ * service a page fault) in busy_stolen.  We don't accumulate
+ * stolen time for a vcore when it is inactive, or for a vcpu
+ * when it is in state RUNNING or NOTREADY.  NOTREADY is a bit of
+ * a misnomer; it means that the vcpu task is not executing in
+ * the KVM_VCPU_RUN ioctl, i.e. it is in userspace or elsewhere in
+ * the kernel.  We don't have any way of dividing up that time
+ * between time that the vcpu is genuinely stopped, time that
+ * the task is actively working on behalf of the vcpu, and time
+ * that the task is preempted, so we don't count any of it as
+ * stolen.
+ *
+ * Updates to busy_stolen are protected by arch.tbacct_lock;
+ * updates to vc->stolen_tb are protected by the arch.tbacct_lock
+ * of the vcpu that has taken responsibility for running the vcore
+ * (i.e. vc->runner).  The stolen times are measured in units of
+ * timebase ticks.  (Note that the != TB_NIL checks below are
+ * purely defensive; they should never fail.)
+ */
+
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	if (vc->runner == vcpu
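The per-vcpu half of the scheme works like this. vcpu_put records a timestamp when the task is scheduled out while in BUSY_IN_HOST, and vcpu_load turns the gap into stolen time and resets the timestamp to TB_NIL. The single-threaded sketch below is an illustration under stated assumptions: the timebase is passed in as a parameter instead of being read with mftb(), tbacct_lock and the vcore-level `vc->stolen_tb` path are omitted, and `struct vcpu_acct`, `acct_put` and `acct_load` are hypothetical names.

```c
#include <stdint.h>

#define TB_NIL (~(uint64_t)0)	/* "no timestamp" marker, as in the patch */

enum { NOTREADY = 0, RUNNABLE = 1, BUSY_IN_HOST = 2 };

struct vcpu_acct {
	int state;
	uint64_t busy_stolen;	/* accumulated stolen timebase ticks */
	uint64_t busy_preempt;	/* timebase when scheduled out, or TB_NIL */
};

/* Task scheduled out at timebase 'now'. */
static void acct_put(struct vcpu_acct *v, uint64_t now)
{
	if (v->state == BUSY_IN_HOST)
		v->busy_preempt = now;
}

/* Task scheduled back in at timebase 'now'. */
static void acct_load(struct vcpu_acct *v, uint64_t now)
{
	if (v->state == BUSY_IN_HOST && v->busy_preempt != TB_NIL) {
		v->busy_stolen += now - v->busy_preempt;
		v->busy_preempt = TB_NIL;
	}
}
```

Time spent in NOTREADY, which includes userspace, never reaches `busy_stolen`. That matches the last paragraph of the description.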
[PATCH 2/8] KVM: PPC: Book3S HV: Fix some races in starting secondary threads
Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads, which
if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count.  It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            |   41 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   11 ++---
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c5ddf04..77dec0f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -64,8 +64,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	local_paca->kvm_hstate.kvm_vcpu = vcpu;
-	local_paca->kvm_hstate.kvm_vcore = vc;
 	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
 		vc->stolen_tb += mftb() - vc->preempt_tb;
 }
@@ -880,6 +878,7 @@ static int kvmppc_grab_hwthread(int cpu)
 
 	/* Ensure the thread won't go into the kernel if it wakes */
 	tpaca->kvm_hstate.hwthread_req = 1;
+	tpaca->kvm_hstate.kvm_vcpu = NULL;
 
 	/*
 	 * If the thread is already executing in the kernel (e.g. handling
@@ -929,7 +928,6 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
 	smp_wmb();
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
 	if (vcpu->arch.ptid) {
-		kvmppc_grab_hwthread(cpu);
 		xics_wake_cpu(cpu);
 		++vc->n_woken;
 	}
@@ -955,7 +953,8 @@ static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
 
 /*
  * Check that we are on thread 0 and that any other threads in
- * this core are off-line.
+ * this core are off-line.  Then grab the threads so they can't
+ * enter the kernel.
  */
 static int on_primary_thread(void)
 {
@@ -967,6 +966,17 @@ static int on_primary_thread(void)
 	while (++thr < threads_per_core)
 		if (cpu_online(cpu + thr))
 			return 0;
+
+	/* Grab all hw threads so they can't go into the kernel */
+	for (thr = 1; thr < threads_per_core; ++thr) {
+		if (kvmppc_grab_hwthread(cpu + thr)) {
+			/* Couldn't grab one; let the others go */
+			do {
+				kvmppc_release_hwthread(cpu + thr);
+			} while (--thr > 0);
+			return 0;
+		}
+	}
 	return 1;
 }
 
@@ -1015,16 +1025,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	}
 
 	/*
-	 * Make sure we are running on thread 0, and that
-	 * secondary threads are offline.
-	 */
-	if (threads_per_core > 1 && !on_primary_thread()) {
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			vcpu->arch.ret = -EBUSY;
-		goto out;
-	}
-
-	/*
 	 * Assign physical thread IDs, first to non-ceded vcpus
 	 * and then to ceded ones.
 	 */
@@ -1043,15 +1043,22 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 		if (vcpu->arch.ceded)
 			vcpu->arch.ptid = ptid++;
 
+	/*
+	 * Make sure we are running on thread 0, and that
+	 * secondary threads are offline.
+	 */
+	if (threads_per_core > 1 && !on_primary_thread()) {
+		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+			vcpu->arch.ret = -EBUSY;
+		goto out;
+	}
+
 	vc->stolen_tb += mftb() - vc->preempt_tb;
 	vc->pcpu = smp_processor_id();
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		kvmppc_start_thread(vcpu);
 		kvmppc_create_dtl_entry(vcpu, vc);
 	}
-	/* Grab any remaining hw threads so they can't go into the kernel */
-	for (i = ptid; i < threads_per_core; ++i)
-		kvmppc_grab_hwthread(vc->pcpu + i);
 
 	preempt_disable();
 	spin_unlock(&vc->lock);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 44b72fe..1e90ef6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -134,8 +134,11 @@ kvm_start_guest:
 27:	/* XXX should handle hypervisor maintenance interrupts etc. here */
+	/* reload vcpu pointer after
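The new code in on_primary_thread() uses an all-or-nothing pattern. It grabs secondary threads 1..n-1 in order, and if any grab fails it walks back down to thread 1 releasing each one (including the thread that failed, as the patch does) before reporting failure. Below is a self-contained sketch of that pattern. `NTHREADS`, `grabbed[]`, `fail_at`, `grab()` and `release()` are hypothetical stand-ins for threads_per_core, the paca state and kvmppc_grab/release_hwthread().

```c
#include <stdbool.h>

#define NTHREADS 4		/* assumed threads_per_core */

static bool grabbed[NTHREADS];	/* hypothetical per-thread state */
static int fail_at = -1;	/* thread whose grab fails, or -1 */

static int grab(int thr)
{
	if (thr == fail_at)
		return -1;	/* e.g. thread still executing in the kernel */
	grabbed[thr] = true;
	return 0;
}

static void release(int thr)
{
	grabbed[thr] = false;
}

/* Returns 1 if all secondaries were grabbed, 0 (with none held) otherwise. */
static int grab_all_secondaries(void)
{
	for (int thr = 1; thr < NTHREADS; ++thr) {
		if (grab(thr)) {
			do {
				release(thr);
			} while (--thr > 0);
			return 0;
		}
	}
	return 1;
}
```

Doing every grab before any thread is started is what gives property (a) from the description.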
[PATCH 7/8] KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
Commit 55b665b026 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8b3c470..812764c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -811,9 +811,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
 		addr = val->vpaval.addr;
 		len = val->vpaval.length;
 		r = -EINVAL;
-		if (len < sizeof(struct dtl_entry))
-			break;
-		if (addr && !vcpu->arch.vpa.next_gpa)
+		if (addr && (len < sizeof(struct dtl_entry) ||
+			     !vcpu->arch.vpa.next_gpa))
 			break;
 		len -= len % sizeof(struct dtl_entry);
 		r = set_vpa(vcpu, &vcpu->arch.dtl, addr, len);
-- 
1.7.10.4
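The fixed validation rule can be isolated in a small function. A zero address means "unregister" and skips both checks. A non-zero address needs at least one whole entry and an already registered VPA, and the length is then rounded down to whole entries. This is a sketch, not the kernel function: `check_dtl` and `vpa_registered` are hypothetical names, and `DTL_ENTRY_SIZE` is an assumed stand-in for sizeof(struct dtl_entry).

```c
#include <errno.h>

#define DTL_ENTRY_SIZE 48UL	/* assumed; the kernel uses sizeof(struct dtl_entry) */

/* Returns the usable length (whole entries only) or -EINVAL. */
static long check_dtl(unsigned long addr, unsigned long len,
		      int vpa_registered)
{
	if (addr && (len < DTL_ENTRY_SIZE || !vpa_registered))
		return -EINVAL;
	return (long)(len - len % DTL_ENTRY_SIZE);
}
```

For example, `check_dtl(0, 0, 0)` now succeeds (unregister), whereas the old unconditional length test would have rejected it.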
[PATCH 4/8] KVM: PPC: Book3S HV: Fixes for late-joining threads
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest.  Currently this only happens
rarely, when a vcpu is first started.  This fixes some bugs and
omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make
a DTL entry for it.  Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core.  To handle this correctly
we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap.  A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    7 ---
 arch/powerpc/kvm/book3s_hv.c        |   14 +++---
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 68f5a30..218534d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -289,9 +289,10 @@ struct kvmppc_vcore {
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
-#define VCORE_RUNNING	1
-#define VCORE_EXITING	2
-#define VCORE_SLEEPING	3
+#define VCORE_SLEEPING	1
+#define VCORE_STARTING	2
+#define VCORE_RUNNING	3
+#define VCORE_EXITING	4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a737a4..89995fa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -336,6 +336,11 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap)
 
 static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 {
+	if (!(vcpu->arch.vpa.update_pending ||
+	      vcpu->arch.slb_shadow.update_pending ||
+	      vcpu->arch.dtl.update_pending))
+		return;
+
 	spin_lock(&vcpu->arch.vpa_update_lock);
 	if (vcpu->arch.vpa.update_pending) {
 		kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
@@ -1009,7 +1014,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->n_woken = 0;
 	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
-	vc->vcore_state = VCORE_RUNNING;
+	vc->vcore_state = VCORE_STARTING;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
 
@@ -1062,6 +1067,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 		kvmppc_create_dtl_entry(vcpu, vc);
 	}
 
+	vc->vcore_state = VCORE_RUNNING;
 	preempt_disable();
 	spin_unlock(&vc->lock);
 
@@ -1070,8 +1076,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
 
 	__kvmppc_vcore_entry(NULL, vcpu0);
-	for (i = 0; i < threads_per_core; ++i)
-		kvmppc_release_hwthread(vc->pcpu + i);
 
 	spin_lock(&vc->lock);
 	/* disable sending of IPIs on virtual external irqs */
@@ -1080,6 +1084,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/* wait for secondary threads to finish writing their state to memory */
 	if (vc->nap_count < vc->n_woken)
 		kvmppc_wait_for_nap(vc);
+	for (i = 0; i < threads_per_core; ++i)
+		kvmppc_release_hwthread(vc->pcpu + i);
 	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
 	vc->vcore_state = VCORE_EXITING;
 	spin_unlock(&vc->lock);
@@ -1170,6 +1176,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	kvm_run->exit_reason = 0;
 	vcpu->arch.ret = RESUME_GUEST;
 	vcpu->arch.trap = 0;
+	kvmppc_update_vpas(vcpu);
 
 	/*
 	 * Synchronize with other threads in this virtual core
@@ -1193,6 +1200,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		if (vc->vcore_state == VCORE_RUNNING &&
 		    VCORE_EXIT_COUNT(vc) == 0) {
 			vcpu->arch.ptid = vc->n_runnable - 1;
+			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
 		}
 
-- 
1.7.10.4
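The effect of VCORE_STARTING on a latecomer can be reduced to a small decision function. A latecomer joins immediately only when the vcore is fully RUNNING and nothing has begun exiting. In STARTING, the master is still inside kvmppc_run_core() (for example, doing VPA updates with the lock dropped), and it will pick the latecomer up from the runnable list. This is a simplified sketch: `latecomer_action` and `exit_count` are hypothetical stand-ins, and only the state values are copied from this patch.

```c
/* State values as renumbered by this patch. */
enum vcore_state {
	VCORE_INACTIVE = 0,
	VCORE_SLEEPING = 1,
	VCORE_STARTING = 2,
	VCORE_RUNNING  = 3,
	VCORE_EXITING  = 4,
};

enum join_action { WAIT_FOR_MASTER, JOIN_NOW };

/* exit_count stands in for VCORE_EXIT_COUNT(vc). */
static enum join_action latecomer_action(enum vcore_state s, int exit_count)
{
	if (s == VCORE_RUNNING && exit_count == 0)
		return JOIN_NOW;	/* real code also makes a DTL entry here */
	return WAIT_FOR_MASTER;		/* STARTING: master will include us */
}
```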
[PATCH 1/8] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, all of the cores (or at least all of
the cores where the KVM guest could run) to be running only one
active hardware thread.  This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition.  Complying with this
restriction is much easier if, from the host kernel's point of view,
only one hardware thread is active.

This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists.  The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.

Signed-off-by: Paul Mackerras <pau...@samba.org>
Acked-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
---
 arch/powerpc/include/asm/smp.h |    8 +++
 arch/powerpc/kernel/smp.c      |   46
 arch/powerpc/kvm/book3s_hv.c   |   12 +-
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ebc24dc..b625a1a 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
 void generic_mach_cpu_die(void);
 void generic_set_cpu_dead(unsigned int cpu);
 int generic_check_cpu_restart(unsigned int cpu);
+
+extern void inhibit_secondary_onlining(void);
+extern void uninhibit_secondary_onlining(void);
+
+#else /* HOTPLUG_CPU */
+static inline void inhibit_secondary_onlining(void) {}
+static inline void uninhibit_secondary_onlining(void) {}
+
 #endif
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8d4214a..c4f420c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -417,6 +417,45 @@ int generic_check_cpu_restart(unsigned int cpu)
 {
 	return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
 }
+
+static atomic_t secondary_inhibit_count;
+
+/*
+ * Don't allow secondary CPU threads to come online
+ */
+void inhibit_secondary_onlining(void)
+{
+	/*
+	 * This makes secondary_inhibit_count stable during cpu
+	 * online/offline operations.
+	 */
+	get_online_cpus();
+
+	atomic_inc(&secondary_inhibit_count);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
+
+/*
+ * Allow secondary CPU threads to come online again
+ */
+void uninhibit_secondary_onlining(void)
+{
+	get_online_cpus();
+	atomic_dec(&secondary_inhibit_count);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
+
+static int secondaries_inhibited(void)
+{
+	return atomic_read(&secondary_inhibit_count);
+}
+
+#else /* HOTPLUG_CPU */
+
+#define secondaries_inhibited()		0
+
 #endif
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
@@ -435,6 +474,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int rc, c;
 
+	/*
+	 * Don't allow secondary threads to come online if inhibited
+	 */
+	if (threads_per_core > 1 && secondaries_inhibited() &&
+	    cpu % threads_per_core != 0)
+		return -EBUSY;
+
 	if (smp_ops == NULL ||
 	    (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
 		return -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9a15da7..c5ddf04 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -47,6 +47,7 @@
 #include <asm/page.h>
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
+#include <asm/smp.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -1016,8 +1017,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Make sure we are running on thread 0, and that
 	 * secondary threads are offline.
-	 * XXX we should also block attempts to bring any
-	 * secondary threads online.
 	 */
 	if (threads_per_core > 1 && !on_primary_thread()) {
 		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
@@ -1730,11 +1729,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 
 	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
 	spin_lock_init(&kvm->arch.slot_phys_lock);
+
+	/*
+	 * Don't allow secondary CPU threads to come online
+	 * while any KVM VMs exist.
+	 */
+	inhibit_secondary_onlining();
+
 	return 0;
 }
 
 void kvmppc_core_destroy_vm(struct kvm *kvm)
 {
+	uninhibit_secondary_onlining();
+
 	if (kvm->arch.rma) {
 		kvm_release_rma(kvm->arch.rma);
 		kvm->arch.rma = NULL;
-- 
1.7.10.4
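The two hooks amount to a reference count. Each VM creation increments it, each destruction decrements it, and CPU bring-up refuses any thread that is not thread 0 of its core while the count is nonzero. The user-space sketch below models that with C11 atomics. `inhibit`, `uninhibit` and `cpu_up_allowed` are hypothetical names, `threads_per_core` is an assumed value, and the get_online_cpus()/put_online_cpus() bracketing from the kernel version is omitted.

```c
#include <errno.h>
#include <stdatomic.h>

static int threads_per_core = 4;		/* assumed SMT4 core */
static atomic_int secondary_inhibit_count;	/* zero-initialized */

static void inhibit(void)   { atomic_fetch_add(&secondary_inhibit_count, 1); }
static void uninhibit(void) { atomic_fetch_sub(&secondary_inhibit_count, 1); }

/* Same test as the one added at the top of __cpu_up(). */
static int cpu_up_allowed(unsigned int cpu)
{
	if (threads_per_core > 1 && atomic_load(&secondary_inhibit_count) &&
	    cpu % threads_per_core != 0)
		return -EBUSY;
	return 0;
}
```

Using a counter rather than a flag lets several VMs coexist. Secondaries can come online again only after the last VM is destroyed.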
[PATCH 3/8] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock.  This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.

Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.

In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.

The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_asm.h |    1 +
 arch/powerpc/kvm/book3s_hv.c       |   67 ++--
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 76fdcfe..aabcdba 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -118,6 +118,7 @@
 #define RESUME_FLAG_NV          (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST        (1<<1)  /* Resume host? */
+#define RESUME_FLAG_ARCH1	(1<<2)
 
 #define RESUME_GUEST            0
 #define RESUME_GUEST_NV         RESUME_FLAG_NV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 77dec0f..3a737a4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -57,6 +57,9 @@
 /* #define EXIT_DEBUG_SIMPLE */
 /* #define EXIT_DEBUG_INT */
 
+/* Used to indicate that a guest page fault needs to be handled */
+#define RESUME_PAGE_FAULT	(RESUME_GUEST | RESUME_FLAG_ARCH1)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
@@ -431,7 +434,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			      struct task_struct *tsk)
 {
 	int r = RESUME_HOST;
-	int srcu_idx;
 
 	vcpu->stat.sum_exits++;
 
@@ -491,16 +493,12 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * have been handled already.
 	 */
 	case BOOK3S_INTERRUPT_H_DATA_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		r = RESUME_PAGE_FAULT;
 		break;
 	case BOOK3S_INTERRUPT_H_INST_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-				kvmppc_get_pc(vcpu), 0);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
+		vcpu->arch.fault_dsisr = 0;
+		r = RESUME_PAGE_FAULT;
 		break;
 	/*
 	 * This occurs if the guest executes an illegal instruction.
@@ -984,22 +982,24 @@ static int on_primary_thread(void)
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
-static int kvmppc_run_core(struct kvmppc_vcore *vc)
+static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
 	struct kvm_vcpu *vcpu, *vcpu0, *vnext;
 	long ret;
 	u64 now;
 	int ptid, i, need_vpa_update;
 	int srcu_idx;
+	struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
 	/* don't start if any threads have a signal pending */
 	need_vpa_update = 0;
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		if (signal_pending(vcpu->arch.run_task))
-			return 0;
-		need_vpa_update |= vcpu->arch.vpa.update_pending |
-			vcpu->arch.slb_shadow.update_pending |
-			vcpu->arch.dtl.update_pending;
+			return;
+		if (vcpu->arch.vpa.update_pending ||
+		    vcpu->arch.slb_shadow.update_pending ||
+		    vcpu->arch.dtl.update_pending)
+			vcpus_to_update[need_vpa_update++] = vcpu;
 	}
 
 	/*
@@ -1019,8 +1019,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	if (need_vpa_update) {
 		spin_unlock(&vc->lock);
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			kvmppc_update_vpas(vcpu);
+		for (i = 0; i < need_vpa_update; ++i)
+			kvmppc_update_vpas(vcpus_to_update[i]);
 		spin_lock(&vc->lock);
 	}
 
@@ -1037,8 +1037,10 @@ static int
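This patch uses a deferral pattern. The exit handler runs under the vcore lock, so it must not sleep. It therefore returns a distinct code, RESUME_PAGE_FAULT, and the outer run loop performs the possibly-sleeping fault handling after the lock has been dropped. Because the new code is built as RESUME_GUEST plus an arch-private flag and carries no RESUME_FLAG_HOST bit, code that only tests for the host flag still treats it as "resume the guest". The sketch below is a simplified user-space model. The flag values are copied from this patch's kvm_asm.h hunk; `handle_exit`, `vcpu_run_once` and `faults_handled` are hypothetical.

```c
#include <stdbool.h>

#define RESUME_FLAG_NV		(1 << 0)
#define RESUME_FLAG_HOST	(1 << 1)
#define RESUME_FLAG_ARCH1	(1 << 2)
#define RESUME_GUEST		0
#define RESUME_PAGE_FAULT	(RESUME_GUEST | RESUME_FLAG_ARCH1)

static int faults_handled;	/* counts deferred fault handling */

/* Called with the (modelled) vcore lock held: only classifies the exit. */
static int handle_exit(bool is_hpte_fault)
{
	return is_hpte_fault ? RESUME_PAGE_FAULT : RESUME_GUEST;
}

/* Outer loop body: lock taken and released around handle_exit(), then
 * the sleeping work (kvmppc_book3s_hv_page_fault() in the real code,
 * inside an SRCU read-side section) runs unlocked. */
static int vcpu_run_once(bool is_hpte_fault)
{
	/* spin_lock(&vc->lock) would go here */
	int r = handle_exit(is_hpte_fault);
	/* spin_unlock(&vc->lock) would go here */

	if (r == RESUME_PAGE_FAULT) {
		faults_handled++;	/* stands in for the fault handler */
		r = RESUME_GUEST;
	}
	return r;
}
```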
Re: [PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet
On 15.10.2012, at 13:14, Paul Mackerras wrote:

> This is a set of 8 patches of which the first 7 have been posted
> previously and have had no comments.  The 8th is new, but is quite
> trivial.  They fix a series of issues with HV-style KVM on ppc.
> They only touch code that is specific to Book3S HV KVM.
> Please apply.

Sorry, I can't accept patches that haven't shown up on kvm@vger. Please send this patch set again with CC to kvm@vger.


Alex
[PATCH 0/2] KVM: PPC: Support ioeventfd
In order to support vhost, we need to be able to support ioeventfd.

This patch set adds support for ioeventfd to PPC and makes it possible to
do so without implementing irqfd along the way, as it requires an in-kernel
irqchip which we don't have yet.

Alex

Alexander Graf (2):
  KVM: Distangle eventfd code from irqchip
  KVM: PPC: Support eventfd

 arch/powerpc/kvm/Kconfig   |    1 +
 arch/powerpc/kvm/Makefile  |    4 +++-
 arch/powerpc/kvm/powerpc.c |   17 -
 include/linux/kvm_host.h   |   12 +++-
 virt/kvm/eventfd.c         |    6 ++
 5 files changed, 37 insertions(+), 3 deletions(-)
[PATCH 2/2] KVM: PPC: Support eventfd
In order to support the generic eventfd infrastructure on PPC, we need to call into the generic KVM in-kernel device mmio code.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/Kconfig   |    1 +
 arch/powerpc/kvm/Makefile  |    4 +++-
 arch/powerpc/kvm/powerpc.c |   17 -
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 71f0cd9..4730c95 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
 	bool
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	select HAVE_KVM_EVENTFD
 
 config KVM_BOOK3S_HANDLER
 	bool
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index c2a0863..cd89658 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -6,7 +6,8 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 
-common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o \
+		eventfd.o)
 
 CFLAGS_44x_tlb.o := -I.
 CFLAGS_e500_tlb.o := -I.
@@ -76,6 +77,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
 
 kvm-book3s_64-module-objs := \
 	../../../virt/kvm/kvm_main.o \
+	../../../virt/kvm/eventfd.o \
 	powerpc.o \
 	emulate.o \
 	book3s.o \
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index deb0d59..900d8fc 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -314,6 +314,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PPC_IRQ_LEVEL:
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_ONE_REG:
+	case KVM_CAP_IOEVENTFD:
 		r = 1;
 		break;
 #ifndef CONFIG_KVM_BOOK3S_64_HV
@@ -613,6 +614,13 @@ int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	vcpu->mmio_is_write = 0;
 	vcpu->arch.mmio_sign_extend = 0;
 
+	if (!kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			     bytes, &run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+		return EMULATE_DONE;
+	}
+
 	return EMULATE_DO_MMIO;
 }
 
@@ -622,8 +630,8 @@ int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu,
 {
 	int r;
 
-	r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
 	vcpu->arch.mmio_sign_extend = 1;
+	r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
 
 	return r;
 }
@@ -661,6 +669,13 @@ int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		}
 	}
 
+	if (!kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			      bytes, &run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+		return EMULATE_DONE;
+	}
+
 	return EMULATE_DO_MMIO;
 }
 
-- 
1.6.0.2
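The behavioural change in kvmppc_handle_load()/kvmppc_handle_store() is a dispatch decision: offer the access to in-kernel devices on the MMIO bus first, and exit to userspace only if none claims it. Below is a minimal userspace model of that decision, assuming a flat table of address ranges; `struct mmio_dev`, `io_bus_write()`, `handle_store()` and the `EMU_*` values are hypothetical stand-ins, not the kernel's kvm_io_bus API. Only the "0 means handled" convention mirrors how the patch tests `!kvm_io_bus_write(...)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-kernel device: claims writes inside [base, base + len). */
struct mmio_dev {
	uint64_t base;
	uint32_t len;
	unsigned int hits;	/* number of writes this device has handled */
};

enum emu_result { EMU_DONE, EMU_DO_MMIO };

/* Return 0 if some device claimed the access, -1 otherwise. */
static int io_bus_write(struct mmio_dev *devs, size_t ndevs,
			uint64_t addr, uint32_t bytes)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (addr >= devs[i].base &&
		    addr + bytes <= devs[i].base + devs[i].len) {
			devs[i].hits++;
			return 0;
		}
	}
	return -1;
}

/* Same shape as the patched store path: finish in the kernel when a
 * device claims the access, otherwise hand it to userspace. */
static enum emu_result handle_store(struct mmio_dev *devs, size_t ndevs,
				    uint64_t addr, uint32_t bytes)
{
	assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
	if (!io_bus_write(devs, ndevs, addr, bytes))
		return EMU_DONE;
	return EMU_DO_MMIO;
}
```

In this sketch an access that straddles the end of a device's range is deliberately left to userspace rather than split.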
[PATCH 1/2] KVM: Distangle eventfd code from irqchip
The current eventfd code assumes that when we have eventfd, we also have irqfd for in-kernel interrupt delivery. This is not necessarily true. On PPC we don't have an in-kernel irqchip yet, but we can still easily support eventfd.

Signed-off-by: Alexander Graf ag...@suse.de
---
 include/linux/kvm_host.h |   12 +++-
 virt/kvm/eventfd.c       |    6 ++
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..f2f5880 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -884,10 +884,20 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
 void kvm_eventfd_init(struct kvm *kvm);
+int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+
+#ifdef CONFIG_HAVE_KVM_IRQCHIP
 int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
-int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+#else
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
+{
+	return -EINVAL;
+}
+
+static inline void kvm_irqfd_release(struct kvm *kvm) {}
+#endif
 
 #else
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9718e98..d7424c8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
 
 #include "iodev.h"
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * --------------------------------------------------------------------
  * irqfd: Allows an fd to be used to inject an interrupt to the guest
@@ -425,17 +426,21 @@ fail:
 	kfree(irqfd);
 	return ret;
 }
+#endif
 
 void
 kvm_eventfd_init(struct kvm *kvm)
 {
+#ifdef __KVM_HAVE_IOAPIC
 	spin_lock_init(&kvm->irqfds.lock);
 	INIT_LIST_HEAD(&kvm->irqfds.items);
 	INIT_LIST_HEAD(&kvm->irqfds.resampler_list);
 	mutex_init(&kvm->irqfds.resampler_lock);
+#endif
 	INIT_LIST_HEAD(&kvm->ioeventfds);
 }
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * shutdown any irqfd's that match fd+gsi
  */
@@ -555,6 +560,7 @@ static void __exit irqfd_module_exit(void)
 
 module_init(irqfd_module_init);
 module_exit(irqfd_module_exit);
+#endif
 
 /*
  * --------------------------------------------------------------------
-- 
1.6.0.2
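The kvm_host.h hunk uses a standard kernel idiom: when a feature is configured out, its entry points become `static inline` stubs, so generic callers compile unchanged and simply see an error (here, `kvm_irqfd()` returns `-EINVAL`). A self-contained sketch of the same idiom follows; `HAVE_IRQCHIP`, `struct vm`, `vm_irqfd()`, `vm_irqfd_release()` and `vm_setup()` are hypothetical placeholders for `CONFIG_HAVE_KVM_IRQCHIP` and the real KVM types and functions.

```c
#include <assert.h>
#include <errno.h>

struct vm { int irqfds; };	/* hypothetical stand-in for struct kvm */

/* Build with -DHAVE_IRQCHIP to get the "real" implementation. */
#ifdef HAVE_IRQCHIP
static int vm_irqfd(struct vm *vm, int fd)
{
	assert(fd >= 0);
	vm->irqfds++;		/* pretend to register the irqfd */
	return 0;
}

static void vm_irqfd_release(struct vm *vm) { vm->irqfds = 0; }
#else
/* No irqchip: callers still compile, but the request is rejected. */
static inline int vm_irqfd(struct vm *vm, int fd)
{
	(void)vm;
	(void)fd;
	return -EINVAL;
}

static inline void vm_irqfd_release(struct vm *vm) { (void)vm; }
#endif

/* Generic caller that does not care which variant it was built with. */
static int vm_setup(struct vm *vm, int fd)
{
	int r = vm_irqfd(vm, fd);

	if (r)
		vm_irqfd_release(vm);
	return r;
}
```

Building the same file with `-DHAVE_IRQCHIP` switches `vm_setup()` to the real path without touching the caller.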
[PATCH] kvm/powerpc: Handle errors in secondary thread grabbing
In the Book3s HV code, kvmppc_run_core() has logic to grab the secondary threads of the physical core.

If for some reason a thread is stuck, kvmppc_grab_hwthread() can fail, but currently we ignore the failure and continue into the guest. If the stuck thread is in the kernel, badness ensues.

Instead we should check for failure and bail out.

I've moved the grabbing prior to the startup of runnable threads, to simplify the error case. AFAICS this is harmless, but I could be missing something subtle.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---

Or we could just BUG_ON()?

---
 arch/powerpc/kvm/book3s_hv.c |   22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 721d460..55925cd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -884,16 +884,30 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 		if (vcpu->arch.ceded)
 			vcpu->arch.ptid = ptid++;
 
+	/*
+	 * Grab any remaining hw threads so they can't go into the kernel.
+	 * Do this early to simplify the cleanup path if it fails.
+	 */
+	for (i = ptid; i < threads_per_core; ++i) {
+		int j, rc = kvmppc_grab_hwthread(vc->pcpu + i);
+		if (rc) {
+			for (j = i - 1; j ; j--)
+				kvmppc_release_hwthread(vc->pcpu + j);
+
+			list_for_each_entry(vcpu, &vc->runnable_threads,
+					    arch.run_list)
+				vcpu->arch.ret = -EBUSY;
+
+			goto out;
+		}
+	}
+
 	vc->stolen_tb += mftb() - vc->preempt_tb;
 	vc->pcpu = smp_processor_id();
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		kvmppc_start_thread(vcpu);
 		kvmppc_create_dtl_entry(vcpu, vc);
 	}
-	/* Grab any remaining hw threads so they can't go into the kernel */
-	for (i = ptid; i < threads_per_core; ++i)
-		kvmppc_grab_hwthread(vc->pcpu + i);
-
 	preempt_disable();
 	spin_unlock(&vc->lock);
 
-- 
1.7.9.5
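The change follows the common "acquire all or roll back" pattern: grab resources in order and, on the first failure, release what this call has already grabbed and report the error instead of carrying on. The sketch below is a minimal userspace model of that pattern. The slot arrays, `try_grab()`, `release()` and `grab_remaining()` are hypothetical and stand in for the paca-based kvmppc_grab_hwthread()/kvmppc_release_hwthread(). In this sketch the rollback covers exactly the slots acquired so far (indices `first` to `i - 1`).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NTHREADS 8

static bool held[NTHREADS];	/* hypothetical: true while we own the slot */
static bool stuck[NTHREADS];	/* hypothetical: slots that refuse to be grabbed */

static int try_grab(int slot)
{
	if (stuck[slot])
		return -EBUSY;
	held[slot] = true;
	return 0;
}

static void release(int slot)
{
	held[slot] = false;
}

/*
 * Grab slots first..NTHREADS-1. On the first failure, release every slot
 * grabbed by this call, in reverse order, and return the error so the
 * caller can bail out instead of entering the guest.
 */
static int grab_remaining(int first)
{
	assert(first >= 0 && first <= NTHREADS);
	for (int i = first; i < NTHREADS; i++) {
		int rc = try_grab(i);

		if (rc) {
			for (int j = i - 1; j >= first; j--)
				release(j);
			return rc;
		}
	}
	return 0;
}
```

After a failed call, no slot is left held, so the error path does not have to remember how far the loop got.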
Re: [PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet
On Mon, Oct 15, 2012 at 02:00:54PM +0200, Alexander Graf wrote:

> Sorry, I can't accept patches that haven't shown up on kvm@vger. Please send this patch set again with CC to kvm@vger.

Done; I didn't cc kvm-ppc this time since the patches haven't changed.

By the way, what is the purpose of kvm-ppc@vger.kernel.org?

Paul.
Re: [PATCH] kvm/powerpc: Handle errors in secondary thread grabbing
Michael,

On Tue, Oct 16, 2012 at 11:15:50AM +1100, Michael Ellerman wrote:

> In the Book3s HV code, kvmppc_run_core() has logic to grab the secondary threads of the physical core. If for some reason a thread is stuck, kvmppc_grab_hwthread() can fail, but currently we ignore the failure and continue into the guest. If the stuck thread is in the kernel, badness ensues. Instead we should check for failure and bail out. I've moved the grabbing prior to the startup of runnable threads, to simplify the error case. AFAICS this is harmless, but I could be missing something subtle.

Thanks for looking at this, but in fact this is fixed by my patch entitled "KVM: PPC: Book3S HV: Fix some races in starting secondary threads", submitted back on August 28.

Regards,
Paul.
[PATCH 1/5] KVM: Provide mmu notifier retry test based on struct kvm
The mmu_notifier_retry() function, used to test whether any page invalidations are in progress, currently takes a vcpu pointer, though the code only needs the VM's struct kvm pointer. Forthcoming patches to the powerpc Book3S HV code will need to test for retry within a VM ioctl, where a struct kvm pointer is available but a struct vcpu pointer isn't. Therefore this creates a variant of mmu_notifier_retry called kvm_mmu_notifier_retry that takes a struct kvm pointer, and implements mmu_notifier_retry in terms of it.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 include/linux/kvm_host.h |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..1cc1e1d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -841,9 +841,9 @@ extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
+static inline int kvm_mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
 {
-	if (unlikely(vcpu->kvm->mmu_notifier_count))
+	if (unlikely(kvm->mmu_notifier_count))
 		return 1;
 	/*
 	 * Ensure the read of mmu_notifier_count happens before the read
@@ -856,10 +856,15 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
 	 * can't rely on kvm->mmu_lock to keep things ordered.
 	 */
 	smp_rmb();
-	if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
+	if (kvm->mmu_notifier_seq != mmu_seq)
 		return 1;
 	return 0;
 }
+
+static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
+{
+	return kvm_mmu_notifier_retry(vcpu->kvm, mmu_seq);
+}
 #endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
-- 
1.7.10.4
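kvm_mmu_notifier_retry() is the check half of a sequence-count protocol. A fault path snapshots `mmu_notifier_seq`, does its work without holding `mmu_lock`, and then retries if an invalidation is in progress (count non-zero) or has completed since the snapshot (seq changed). Below is a single-threaded userspace model of that check using C11 atomics. `struct notifier_state` and the helper names are hypothetical, and the acquire fence is only a stand-in for the `smp_rmb()` in the patch, not an exact equivalent.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two fields in struct kvm. */
struct notifier_state {
	atomic_ulong count;	/* invalidations currently in progress */
	atomic_ulong seq;	/* bumped when an invalidation completes */
};

static void invalidate_begin(struct notifier_state *s)
{
	atomic_fetch_add(&s->count, 1);
}

static void invalidate_end(struct notifier_state *s)
{
	assert(atomic_load(&s->count) > 0);	/* catch unbalanced calls */
	atomic_fetch_add(&s->seq, 1);		/* publish the new sequence */
	atomic_fetch_sub(&s->count, 1);
}

/* Same shape as kvm_mmu_notifier_retry(): non-zero means "redo the work". */
static int notifier_retry(struct notifier_state *s, unsigned long snap)
{
	if (atomic_load_explicit(&s->count, memory_order_relaxed))
		return 1;
	/* Keep the count read ahead of the seq read (cf. smp_rmb()). */
	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&s->seq, memory_order_relaxed) != snap)
		return 1;
	return 0;
}
```

A caller takes `snap = atomic_load(&s.seq)` before starting, and commits its result only if `notifier_retry(&s, snap)` returns 0.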
[PATCH 3/5] KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs
This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified. We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest. The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |    6 ++
 arch/powerpc/include/asm/kvm_host.h      |    1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   25 ++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 1472a5b..4ca4f25 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -50,6 +50,12 @@ extern int kvm_hpt_order;		/* order of preallocated HPTs */
 #define HPTE_V_HVLOCK	0x40UL
 #define HPTE_V_ABSENT	0x20UL
 
+/*
+ * We use this bit in the guest_rpte field of the revmap entry
+ * to indicate a modified HPTE.
+ */
+#define HPTE_GR_MODIFIED	(1ul << 62)
+
 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
 	unsigned long tmp, old;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3093896..58c7264 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -248,6 +248,7 @@ struct kvm_arch {
 	atomic_t vcpus_running;
 	unsigned long hpt_npte;
 	unsigned long hpt_mask;
+	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	unsigned short last_vcpu[NR_CPUS];
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 3233587..c83c0ca 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -66,6 +66,18 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 }
 EXPORT_SYMBOL_GPL(kvmppc_add_revmap_chain);
 
+/*
+ * Note modification of an HPTE; set the HPTE modified bit
+ * if it wasn't modified before and anyone is interested.
+ */
+static inline void note_hpte_modification(struct kvm *kvm,
+					  struct revmap_entry *rev)
+{
+	if (!(rev->guest_rpte & HPTE_GR_MODIFIED) &&
+	    atomic_read(&kvm->arch.hpte_mod_interest))
+		rev->guest_rpte |= HPTE_GR_MODIFIED;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
@@ -287,8 +299,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	rev = &kvm->arch.revmap[pte_index];
 	if (realmode)
 		rev = real_vmalloc_addr(rev);
-	if (rev)
+	if (rev) {
 		rev->guest_rpte = g_ptel;
+		note_hpte_modification(kvm, rev);
+	}
 
 	/* Link HPTE into reverse-map chain */
 	if (pteh & HPTE_V_VALID) {
@@ -392,7 +406,8 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 		/* Read PTE low word after tlbie to get final R/C values */
 		remove_revmap_chain(kvm, pte_index, rev, v, hpte[1]);
 	}
-	r = rev->guest_rpte;
+	r = rev->guest_rpte & ~HPTE_GR_MODIFIED;
+	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
 	vcpu->arch.gpr[4] = v;
@@ -466,6 +481,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
 			args[j] = ((0x80 | flags) << 56) + pte_index;
 			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+			note_hpte_modification(kvm, rev);
 
 			if (!(hp[0] & HPTE_V_VALID)) {
 				/* insert R and C bits from PTE */
@@ -555,6 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
 		rev->guest_rpte = r;
+		note_hpte_modification(kvm, rev);
 	}
 	r = (hpte[1] & ~mask) | bits;
 
@@ -606,8 +623,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 			v &= ~HPTE_V_ABSENT;
 			v |= HPTE_V_VALID;
 		}
-		if (v & HPTE_V_VALID)
+		if (v & HPTE_V_VALID) {
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
+			r &= ~HPTE_GR_MODIFIED;
+		}
 		vcpu->arch.gpr[4 + i * 2] = v;
 		vcpu->arch.gpr[5 + i * 2] = r;
 	}
-- 
1.7.10.4
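The scheme has three parts. Every update sets a reserved bit while someone has registered interest, anything handed back to the guest has that bit masked out, and a consumer later harvests and clears the bits. Below is a userspace model that uses the same bit position as `HPTE_GR_MODIFIED` (bit 62). `struct rev_entry`, the plain `int` interest flag and `harvest_modified()` are hypothetical simplifications, and the sketch ignores locking and real mode entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GR_MODIFIED (1ULL << 62)	/* same bit as HPTE_GR_MODIFIED */

/* Hypothetical stand-in for struct revmap_entry. */
struct rev_entry {
	uint64_t guest_rpte;
};

static int mod_interest;	/* non-zero while a reader wants changes */

static void note_modification(struct rev_entry *rev)
{
	if (!(rev->guest_rpte & GR_MODIFIED) && mod_interest)
		rev->guest_rpte |= GR_MODIFIED;
}

/* Guest-initiated update: store the new value, then record the change. */
static void update_entry(struct rev_entry *rev, uint64_t val)
{
	assert(!(val & GR_MODIFIED));	/* the bit is reserved */
	rev->guest_rpte = val;
	note_modification(rev);
}

/* Value as the guest may see it: the bookkeeping bit is always hidden. */
static uint64_t guest_view(const struct rev_entry *rev)
{
	return rev->guest_rpte & ~GR_MODIFIED;
}

/* Consumer pass: count modified entries and clear the bit on each. */
static size_t harvest_modified(struct rev_entry *revs, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		if (revs[i].guest_rpte & GR_MODIFIED) {
			revs[i].guest_rpte &= ~GR_MODIFIED;
			changed++;
		}
	}
	return changed;
}
```

Updates made while nobody is interested leave no trace, which keeps the common case (no migration in progress) free of extra bookkeeping.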
[PATCH 2/5] KVM: PPC: Book3S HV: Restructure HPT entry creation code
This restructures the code that creates HPT (hashed page table) entries so that it can be called in situations where we don't have a struct vcpu pointer, only a struct kvm pointer. It also fixes a bug where kvmppc_map_vrma() would corrupt the guest R4 value.

Now, most of the work of kvmppc_virtmode_h_enter is done by a new function, kvmppc_virtmode_do_h_enter, which itself calls another new function, kvmppc_do_h_enter, which contains most of the old kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments for the place to return the HPTE index, the Linux page tables to use, and whether it is being called in real mode, thus removing the need for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area) HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily to handle H_ENTER hcalls from the guest that need to pin a page of memory. Since H_ENTER returns the index of the created HPTE in R4, kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4 in the case when it gets called from kvmppc_map_vrma on the first VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls kvmppc_virtmode_do_h_enter with the address of a dummy word as the place to store the HPTE index, thus avoiding corrupting the guest R4.
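The R4 fix comes from splitting the handler into a core that reports the HPTE index through an explicit pointer and a thin vcpu wrapper that points it at the guest's R4; host-internal callers point it at a dummy local instead. Below is a minimal sketch of that split. `struct vcpu`, `core_enter()`, `hcall_enter()`, `host_enter()` and the trivial index choice are hypothetical simplifications, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

struct vcpu {
	unsigned long gpr[32];	/* hypothetical guest register file */
};

/* Core: needs no vcpu; the index goes wherever the caller points it. */
static long core_enter(long pte_index, unsigned long *idx_ret)
{
	assert(idx_ret != NULL);
	*idx_ret = (unsigned long)pte_index;	/* pretend this slot was chosen */
	return 0;				/* H_SUCCESS stand-in */
}

/* Guest H_ENTER path: the architecture returns the index in R4. */
static long hcall_enter(struct vcpu *vcpu, long pte_index)
{
	return core_enter(pte_index, &vcpu->gpr[4]);
}

/* Host-internal path (like kvmppc_map_vrma): the index goes to a dummy
 * local, so no guest register is touched. */
static long host_enter(long pte_index)
{
	unsigned long idx_ret;

	return core_enter(pte_index, &idx_ret);
}
```

Passing the destination explicitly makes the side effect visible at each call site, which is what stops the internal caller from clobbering guest state by accident.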
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s.h |    5 +++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   36 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   27 -
 3 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index ab73800..199b7fd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -157,8 +157,9 @@ extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
 extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel);
-extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-			long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
+			long pte_index, unsigned long pteh, unsigned long ptel,
+			pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
 			struct kvm_memory_slot *memslot, unsigned long *map);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7a4aae9..351f2ac 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -41,6 +41,10 @@
 /* Power architecture requires HPT is at least 256kB */
 #define PPC_MIN_HPT_ORDER	18
 
+static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+				long pte_index, unsigned long pteh,
+				unsigned long ptel, unsigned long *pte_idx_ret);
+
 long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 {
 	unsigned long hpt;
@@ -185,6 +189,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	unsigned long addr, hash;
 	unsigned long psize;
 	unsigned long hp0, hp1;
+	unsigned long idx_ret;
 	long ret;
 	struct kvm *kvm = vcpu->kvm;
 
@@ -216,7 +221,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 		hash = (hash << 3) + 7;
 		hp_v = hp0 | ((addr >> 16) & ~0x7fUL);
 		hp_r = hp1 | addr;
-		ret = kvmppc_virtmode_h_enter(vcpu, H_EXACT, hash, hp_v, hp_r);
+		ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, hash, hp_v, hp_r,
+						 &idx_ret);
 		if (ret != H_SUCCESS) {
 			pr_err("KVM: map_vrma at %lx failed, ret=%ld\n",
 			       addr, ret);
@@ -354,15 +360,10 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 	return err;
 }
 
-/*
- * We come here on a H_ENTER call from the guest when we are not
- * using mmu notifiers and we don't have the requested page pinned
- * already.
- */
-long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-			long pte_index, unsigned long pteh, unsigned long ptel)
+long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+				long pte_index, unsigned long pteh,
+				unsigned long ptel, unsigned long *pte_idx_ret)
 {
-	struct kvm *kvm = vcpu->kvm;
 	unsigned long psize, gpa, gfn;
 	struct kvm_memory_slot *memslot;
 	long ret;
@@ -390,8 +391,8 @@ long
[PATCH 0/5] KVM: PPC: Book3S HV: HPT read/write functions for userspace
This series of patches provides an interface by which userspace can read and write the hashed page table (HPT) of a Book3S HV guest. The interface is an ioctl that returns a file descriptor, which can be accessed with the read() and write() system calls. The data read and written is the guest view of the HPT, in which the second doubleword of each HPTE (HPT entry) contains a guest physical address, as distinct from the real HPT that the hardware accesses, where the second doubleword of each HPTE contains a real address.

Because the HPT is divided into groups (HPTEGs) of 8 entries each, where each HPTEG usually only contains a few valid entries, or none, the data format that we use does run-length encoding of the invalid entries, so in fact the invalid entries take up no space in the stream.

The interface also provides for doing multiple passes over the HPT, where the first pass provides information on all HPTEs, and subsequent passes only return the HPTEs that have changed since the previous pass.

I have implemented a read/write interface rather than an mmap-based interface because the data is not stored contiguously anywhere in kernel memory. Of each 16-byte HPTE, the first 8 bytes come from the real HPT and the second 8 bytes come from the parallel vmalloc'd array where we store the guest view of the guest physical address, permissions, accessed/dirty bits etc. Thus an mmap-based interface would not be practicable (not without doubling the size of the parallel array, typically requiring an extra 8MB of kernel memory per guest). This is also why I have not used the memslot interface for this.

This implements the interface for HV-style KVM but not for PR-style KVM. Userspace does not need any additional interface with PR-style KVM because userspace maintains the guest HPT already in that case, and has an image of the guest view of the HPT in its address space.
This series is against the next branch of the kvm tree plus my recently-posted set of 8 patches ("Various Book3s HV fixes that haven't been picked up yet"). The overall diffstat is:

 Documentation/virtual/kvm/api.txt        |   53 +
 arch/powerpc/include/asm/kvm.h           |   24 ++
 arch/powerpc/include/asm/kvm_book3s.h    |    8 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |   24 ++
 arch/powerpc/include/asm/kvm_host.h      |    1 +
 arch/powerpc/include/asm/kvm_ppc.h       |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  380 +-
 arch/powerpc/kvm/book3s_hv.c             |   12 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |   71 --
 arch/powerpc/kvm/powerpc.c               |   17 ++
 include/linux/kvm.h                      |    3 +
 include/linux/kvm_host.h                 |   11 +-
 12 files changed, 559 insertions(+), 47 deletions(-)
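The run-length scheme described in the cover letter can be sketched as an encoder that walks a table of HPTEs and emits one record per run: a header, the run of valid entries, and a count of the invalid entries that follow them. A decoder then needs only the headers to account for every entry. This is a simplified userspace model. The header layout follows `struct kvm_get_htab_header` from patch 5/5, but `struct hpte` with its `valid` flag, the host byte order, and the assumption that `buf` is large enough are illustrative choices, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header layout as given for struct kvm_get_htab_header (8 bytes). */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

/* Hypothetical simplified HPTE: two doublewords plus a validity flag. */
struct hpte {
	uint64_t v, r;
	int valid;
};

/*
 * Encode entries [0, n) into buf. Each record is a header followed by
 * n_valid 16-byte entries; the n_invalid entries after them take no
 * space in the stream. Returns the number of bytes written.
 */
static size_t htab_encode(const struct hpte *t, size_t n, unsigned char *buf)
{
	size_t i = 0, off = 0;

	while (i < n) {
		struct htab_header h = { (uint32_t)i, 0, 0 };
		size_t hdr_off = off;

		off += sizeof(h);
		while (i < n && t[i].valid && h.n_valid < UINT16_MAX) {
			memcpy(buf + off, &t[i].v, 8);
			memcpy(buf + off + 8, &t[i].r, 8);
			off += 16;
			h.n_valid++;
			i++;
		}
		while (i < n && !t[i].valid && h.n_invalid < UINT16_MAX) {
			h.n_invalid++;
			i++;
		}
		memcpy(buf + hdr_off, &h, sizeof(h));
	}
	return off;
}

/* Walk the stream and count how many HPT entries it describes. */
static size_t htab_count_entries(const unsigned char *buf, size_t len)
{
	size_t off = 0, total = 0;

	while (off + sizeof(struct htab_header) <= len) {
		struct htab_header h;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h) + (size_t)h.n_valid * 16;
		total += (size_t)h.n_valid + h.n_invalid;
	}
	assert(off == len);	/* stream must end on a record boundary */
	return total;
}
```

For a group such as valid, valid, invalid, invalid, invalid, valid, invalid, invalid, the encoder emits two records (64 bytes in total) instead of 128 bytes of raw entries.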
[PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table); writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the bolted entries (those with the bolted bit, 0x10, set in the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt        |   53 +
 arch/powerpc/include/asm/kvm.h           |   24 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_ppc.h       |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  344 ++
 arch/powerpc/kvm/book3s_hv.c             |   12 --
 arch/powerpc/kvm/powerpc.c               |   17 ++
 include/linux/kvm.h                      |    3 +
 8 files changed, 461 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4258180..8df3e53 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2071,6 +2071,59 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
 
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
+4.78 KVM_PPC_GET_HTAB_FD
+
+Capability: KVM_CAP_PPC_HTAB_FD
+Architectures: powerpc
+Type: vm ioctl
+Parameters: Pointer to struct kvm_get_htab_fd (in)
+Returns: file descriptor number (>= 0) on success, -1 on error
+
+This returns a file descriptor that can be used either to read out the
+entries in the guest's hashed page table (HPT), or to write entries to
+initialize the HPT.  The returned fd can only be written to if the
+KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
+can only be read if that bit is clear.  The argument struct looks like
+this:
+
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+	__u64	flags;
+	__u64	start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
+#define KVM_GET_HTAB_WRITE		((__u64)0x2)
+
+The `start_index' field gives the index in the HPT of the entry at
+which to start reading.  It is ignored when writing.
+
+Reads on the fd will initially supply information about all
+interesting HPT entries.  Interesting entries are those with the
+bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
+all entries.  When the end of the HPT is reached, the read() will
+return.  If read() is called again on the fd, it will start again from
+the beginning of the HPT, but will only return HPT entries that have
+changed since they were last read.
+
+Data read or written is structured as a header (8 bytes) followed by a
+series of valid HPT entries (16 bytes) each.  The header indicates how
+many valid HPT entries there are and how many invalid entries follow
+the valid entries.  The invalid entries are not represented explicitly
+in the stream.  The header format is:
+
+struct kvm_get_htab_header {
+	__u32	index;
+	__u16	n_valid;
+	__u16	n_invalid;
+};
+
+Writes to the fd create HPT entries starting at the index given in the
+header; first `n_valid' valid entries with contents from the data
+written, then `n_invalid' invalid entries, invalidating any previously
+valid entries found.
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index b89ae4d..6518e38 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -331,6 +331,30 @@ struct kvm_book3e_206_tlb_params {
 	__u32 reserved[8];
 };
 
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+	__u64	flags;
+	__u64	start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
+#define KVM_GET_HTAB_WRITE		((__u64)0x2)
+
+/*
+ * Data read on the file descriptor is formatted as a series of
+ * records, each consisting of a header followed by a series of
+ * `n_valid' HPTEs (16 bytes each), which are all valid.  Following
+ * those valid HPTEs there are `n_invalid' invalid HPTEs, which
+ * are not represented explicitly in the stream.  The same format
+ * is used for writing.
+ */
+struct
[PATCH 4/5] KVM: PPC: Book3S HV: Make a HPTE removal function available
This makes an HPTE removal function, kvmppc_do_h_remove(), available outside book3s_hv_rm_mmu.c. This will be used by the HPT writing code.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s.h |    3 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   19 +--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 199b7fd..4ac1c67 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -160,6 +160,9 @@ extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel,
 			pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
+extern long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+			unsigned long pte_index, unsigned long avpn,
+			unsigned long *hpret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
 			struct kvm_memory_slot *memslot, unsigned long *map);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c83c0ca..505548a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -364,11 +364,10 @@ static inline int try_lock_tlbie(unsigned int *lock)
 	return old == 0;
 }
 
-long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
-		     unsigned long pte_index, unsigned long avpn,
-		     unsigned long va)
+long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+			unsigned long pte_index, unsigned long avpn,
+			unsigned long *hpret)
 {
-	struct kvm *kvm = vcpu->kvm;
 	unsigned long *hpte;
 	unsigned long v, r, rb;
 	struct revmap_entry *rev;
@@ -410,10 +409,18 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
-	vcpu->arch.gpr[4] = v;
-	vcpu->arch.gpr[5] = r;
+	hpret[0] = v;
+	hpret[1] = r;
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_remove);
+
+long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
+		     unsigned long pte_index, unsigned long avpn)
+{
+	return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn,
+				  &vcpu->arch.gpr[4]);
+}
 
 long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 {
-- 
1.7.10.4
[PATCH] MAINTAINERS: Add git tree link for PPC KVM
Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 MAINTAINERS |    1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e73060f..32dc107 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4244,6 +4244,7 @@ KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
 M:	Alexander Graf ag...@suse.de
 L:	kvm-ppc@vger.kernel.org
 W:	http://kvm.qumranet.com
+T:	git git://github.com/agraf/linux-2.6.git
 S:	Supported
 F:	arch/powerpc/include/asm/kvm*
 F:	arch/powerpc/kvm/
-- 
1.7.9.5