Re: [User Question] How to create a backup of an LVM-based machine without wasting space

2012-10-15 Thread Lukas Laukamp

On 15.10.2012 05:06, Javier Guerra Giraldez wrote:

On Sat, Oct 13, 2012 at 5:25 PM, Lukas Laukamp <lu...@laukamp.me> wrote:

I have backed up the data within the machine with partimage and fsarchiver.
But it would be great to have a better way than doing this over a live
system.

Make no mistake, the absolute best way is from within the VM.  It's
the most consistent, safe and efficient method.

Doing it from the outside is attractive, but it's a hack, and in
some cases you have to jump through several hoops to make it safe.




Because this is a problem, I booted the VM with a live CD and made a
backup of the important filesystems from the live system running inside
the VM.


Best Regards


[PATCH] KVM: do not de-cache cr4 bits needlessly

2012-10-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov <g...@redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9c83b1..3df12c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -635,7 +635,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
}
 
if (is_long_mode(vcpu)) {
-   if (kvm_read_cr4(vcpu) & X86_CR4_PCIDE) {
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
if (cr3 & CR3_PCID_ENABLED_RESERVED_BITS)
return 1;
} else
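
For context, kvm_read_cr4_bits() only has to decache from hardware when one
of the requested bits is actually guest-owned, which is what makes it
cheaper than a full kvm_read_cr4() here. A rough sketch of the helper,
paraphrased from arch/x86/kvm/kvm_cache_regs.h of this era (not part of the
patch itself):

static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
{
	ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS;

	/* Only touch hardware state if a requested bit is guest-owned. */
	if (tmask & vcpu->arch.cr4_guest_owned_bits)
		kvm_x86_ops->decache_cr4_guest_bits(vcpu);
	return vcpu->arch.cr4 & mask;
}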
--
Gleb.


Re: Shared IRQ with PCI Passthrough?

2012-10-15 Thread Marco
Jan Kiszka jan.kiszka at siemens.com writes:

 Nope, there is no IRQ sharing support for assigned devices in any public
 version so far. I'm on it, but some issues remain to be solved.

 Jan


Hi, any news on this? I own an Intel DQ67OW that has the same issue. No PCI 
passthrough possible with KVM when USB is active.

Marco



[PATCH 1/1] vhost-blk: Add vhost-blk support v4

2012-10-15 Thread Asias He
vhost-blk is an in-kernel virtio-blk device accelerator.

Due to the lack of a proper in-kernel AIO interface, this version converts
the guest's I/O requests to bios and uses submit_bio() to submit I/O
directly. So this version only supports raw block devices as the guest's
disk image, e.g. /dev/sda, /dev/ram0. We can add file-based image support
to vhost-blk once we have an in-kernel AIO interface. There is work in
progress on an in-kernel AIO interface from Dave Kleikamp and Zach Brown:

   http://marc.info/?l=linux-fsdevel&m=133312234313122
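
To illustrate the bio path described above, here is a hypothetical sketch;
vhost_blk_submit() and its signature are invented for illustration, and the
patch's real code is in drivers/vhost/blk.c below:

static int vhost_blk_submit(struct block_device *bdev, int rw,
			    sector_t sector, struct page **pages,
			    int nr_pages, bio_end_io_t *endio, void *priv)
{
	struct bio *bio = bio_alloc(GFP_KERNEL, nr_pages);
	int i;

	if (!bio)
		return -ENOMEM;

	bio->bi_sector  = sector;	/* offset on the raw block device */
	bio->bi_bdev    = bdev;
	bio->bi_end_io  = endio;	/* completion eventually signals the guest */
	bio->bi_private = priv;

	/* Map the guest's pinned pages straight into the bio. */
	for (i = 0; i < nr_pages; i++)
		if (!bio_add_page(bio, pages[i], PAGE_SIZE, 0))
			break;		/* device limit hit; real code must split */

	submit_bio(rw, bio);		/* bypasses VFS and page cache entirely */
	return 0;
}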

Performance evaluation:
-----------------------
1) LKVM
Fio with libaio ioengine on Fusion IO device using kvm tool
IOPS   Before   After   Improvement
seq-read   107  121 +13.0%
seq-write  130  179 +37.6%
rnd-read   102  122 +19.6%
rnd-write  125  159 +27.0%

2) QEMU
Fio with libaio ioengine on Fusion IO device using QEMU
IOPS   Before   After   Improvement
seq-read   76   123 +61.8%
seq-write  139  173 +24.4%
rnd-read   73   120 +64.3%
rnd-write  75   156 +108.0%

Userspace bits:
---------------
1) LKVM
The latest vhost-blk userspace bits for kvm tool can be found here:
git@github.com:asias/linux-kvm.git blk.vhost-blk

2) QEMU
The latest vhost-blk userspace prototype for QEMU can be found here:
git@github.com:asias/qemu.git blk.vhost-blk

Changes in v4:
- Mark req->status as userspace pointer
- Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status()
- Add if (need_resched()) schedule() in blk thread
- Kill vhost_blk_stop_vq() and move it into vhost_blk_stop()
- Use vq_err() instead of pr_warn()
- Fail on unsupported requests
- Add flush in vhost_blk_set_features()

Changes in v3:
- Sending REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
- Check file passed by user is a raw block device file

Signed-off-by: Asias He as...@redhat.com
---
 drivers/vhost/Kconfig |   1 +
 drivers/vhost/Kconfig.blk |  10 +
 drivers/vhost/Makefile|   2 +
 drivers/vhost/blk.c   | 677 ++
 drivers/vhost/blk.h   |   8 +
 5 files changed, 698 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.blk
 create mode 100644 drivers/vhost/blk.c
 create mode 100644 drivers/vhost/blk.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..acd8038 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -11,4 +11,5 @@ config VHOST_NET
 
 if STAGING
 source "drivers/vhost/Kconfig.tcm"
+source "drivers/vhost/Kconfig.blk"
 endif
diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
new file mode 100644
index 0000000..ff8ab76
--- /dev/null
+++ b/drivers/vhost/Kconfig.blk
@@ -0,0 +1,10 @@
+config VHOST_BLK
+   tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
+   depends on BLOCK && EXPERIMENTAL && m
+   ---help---
+ This kernel module can be loaded in host kernel to accelerate
+ guest block with virtio_blk. Not to be confused with virtio_blk
+ module itself which needs to be loaded in guest kernel.
+
+ To compile this driver as a module, choose M here: the module will
+ be called vhost_blk.
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1a8a4a5 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
 
 obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
+vhost_blk-y := blk.o
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
new file mode 100644
index 0000000..5c2b790
--- /dev/null
+++ b/drivers/vhost/blk.c
@@ -0,0 +1,677 @@
+/*
+ * Copyright (C) 2011 Taobao, Inc.
+ * Author: Liu Yuan tailai...@taobao.com
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Author: Asias He as...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * virtio-blk server in host kernel.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/vhost.h>
+#include <linux/virtio_blk.h>
+#include <linux/mutex.h>
+#include <linux/file.h>
+#include <linux/kthread.h>
+#include <linux/blkdev.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "blk.h"
+
+/* The block header is in the first and separate buffer. */
+#define BLK_HDR	0
+
+static DEFINE_IDA(vhost_blk_index_ida);
+
+enum {
+   VHOST_BLK_VQ_REQ = 0,
+   VHOST_BLK_VQ_MAX = 1,
+};
+
+struct req_page_list {
+   struct page **pages;
+   int pages_nr;
+};
+
+struct vhost_blk_req {
+   struct llist_node llnode;
+   struct req_page_list *pl;
+   struct vhost_blk *blk;
+
+   struct iovec *iov;
+   int iov_nr;
+
+   struct bio **bio;
+   atomic_t bio_nr;
+
+   sector_t sector;
+   int write;
+   u16 head;
+   long len;
+
+   u8 __user *status;
+};
+
+struct vhost_blk {
+   struct task_struct *host_kick;
+   struct 

KVM call agenda for 2012-10-16

2012-10-15 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.

Later, Juan.


Re: Shared IRQ with PCI Passthrough?

2012-10-15 Thread Jan Kiszka
On 2012-10-15 11:07, Marco wrote:
 Jan Kiszka jan.kiszka at siemens.com writes:

 Nope, there is no IRQ sharing support for assigned devices in any public
 version so far. I'm on it, but some issues remain to be solved.

 Jan

 Hi, any news on this? I own an Intel DQ67OW that has the same issue. No PCI
 passthrough possible with KVM when USB is active.

Supported by qemu-kvm-1.2 and Linux >= 3.4. But not all devices play
well with it, so your mileage may vary.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


[PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-15 Thread Alexander Graf
In order to support vhost, we need to be able to support ioeventfd.

This patch set adds support for ioeventfd to PPC and makes it possible to
do so without implementing irqfd along the way, as irqfd requires an
in-kernel irqchip, which we don't have yet.

Alex

Alexander Graf (2):
  KVM: Disentangle eventfd code from irqchip
  KVM: PPC: Support eventfd

 arch/powerpc/kvm/Kconfig   |1 +
 arch/powerpc/kvm/Makefile  |4 +++-
 arch/powerpc/kvm/powerpc.c |   17 -
 include/linux/kvm_host.h   |   12 +++-
 virt/kvm/eventfd.c |6 ++
 5 files changed, 37 insertions(+), 3 deletions(-)
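
For reference, once this is in place userspace can wire an eventfd to a
guest MMIO doorbell roughly as follows (an illustrative sketch; vm_fd and
the doorbell address are assumptions, not part of this series):

#include <linux/kvm.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* vm_fd: an open KVM VM fd; addr: guest-physical doorbell address. */
static int wire_ioeventfd(int vm_fd, __u64 addr)
{
	int efd = eventfd(0, 0);
	struct kvm_ioeventfd ioev = {
		.addr  = addr,
		.len   = 4,	/* match 4-byte guest writes */
		.fd    = efd,
		.flags = 0,	/* MMIO (not PIO), no datamatch */
	};

	if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0)
		return -1;
	return efd;	/* e.g. hand this fd to vhost as its kick fd */
}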



[PATCH 2/2] KVM: PPC: Support eventfd

2012-10-15 Thread Alexander Graf
In order to support the generic eventfd infrastructure on PPC, we need
to call into the generic KVM in-kernel device mmio code.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/Kconfig   |1 +
 arch/powerpc/kvm/Makefile  |4 +++-
 arch/powerpc/kvm/powerpc.c |   17 -
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 71f0cd9..4730c95 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
bool
select PREEMPT_NOTIFIERS
select ANON_INODES
+   select HAVE_KVM_EVENTFD
 
 config KVM_BOOK3S_HANDLER
bool
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index c2a0863..cd89658 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -6,7 +6,8 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 
-common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o \
+   eventfd.o)
 
 CFLAGS_44x_tlb.o  := -I.
 CFLAGS_e500_tlb.o := -I.
@@ -76,6 +77,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
 
 kvm-book3s_64-module-objs := \
../../../virt/kvm/kvm_main.o \
+   ../../../virt/kvm/eventfd.o \
powerpc.o \
emulate.o \
book3s.o \
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index deb0d59..900d8fc 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -314,6 +314,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PPC_IRQ_LEVEL:
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_ONE_REG:
+   case KVM_CAP_IOEVENTFD:
r = 1;
break;
 #ifndef CONFIG_KVM_BOOK3S_64_HV
@@ -613,6 +614,13 @@ int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
	vcpu->mmio_is_write = 0;
	vcpu->arch.mmio_sign_extend = 0;
 
+	if (!kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			     bytes, &run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+		return EMULATE_DONE;
+	}
+
return EMULATE_DO_MMIO;
 }
 
@@ -622,8 +630,8 @@ int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu,
 {
int r;
 
-	r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
	vcpu->arch.mmio_sign_extend = 1;
+	r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
 
return r;
 }
@@ -661,6 +669,13 @@ int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
}
}
 
+	if (!kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			      bytes, &run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+		return EMULATE_DONE;
+	}
+
return EMULATE_DO_MMIO;
 }
 
-- 
1.6.0.2



[PATCH 1/2] KVM: Disentangle eventfd code from irqchip

2012-10-15 Thread Alexander Graf
The current eventfd code assumes that when we have eventfd, we also have
irqfd for in-kernel interrupt delivery. This is not necessarily true. On
PPC we don't have an in-kernel irqchip yet, but we can still easily
support eventfd.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 include/linux/kvm_host.h |   12 +++-
 virt/kvm/eventfd.c   |6 ++
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..f2f5880 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -884,10 +884,20 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
 void kvm_eventfd_init(struct kvm *kvm);
+int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+
+#ifdef CONFIG_HAVE_KVM_IRQCHIP
 int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
-int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+#else
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
+{
+   return -EINVAL;
+}
+
+static inline void kvm_irqfd_release(struct kvm *kvm) {}
+#endif
 
 #else
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9718e98..d7424c8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
 
 #include "iodev.h"
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * 
  * irqfd: Allows an fd to be used to inject an interrupt to the guest
@@ -425,17 +426,21 @@ fail:
kfree(irqfd);
return ret;
 }
+#endif
 
 void
 kvm_eventfd_init(struct kvm *kvm)
 {
+#ifdef __KVM_HAVE_IOAPIC
	spin_lock_init(&kvm->irqfds.lock);
	INIT_LIST_HEAD(&kvm->irqfds.items);
	INIT_LIST_HEAD(&kvm->irqfds.resampler_list);
	mutex_init(&kvm->irqfds.resampler_lock);
+#endif
	INIT_LIST_HEAD(&kvm->ioeventfds);
 }
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * shutdown any irqfd's that match fd+gsi
  */
@@ -555,6 +560,7 @@ static void __exit irqfd_module_exit(void)
 
 module_init(irqfd_module_init);
 module_exit(irqfd_module_exit);
+#endif
 
 /*
  * 
-- 
1.6.0.2



Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-15 Thread Raghavendra K T

On 10/11/2012 01:06 AM, Andrew Theurer wrote:

On Wed, 2012-10-10 at 23:24 +0530, Raghavendra K T wrote:

On 10/10/2012 08:29 AM, Andrew Theurer wrote:

On Wed, 2012-10-10 at 00:21 +0530, Raghavendra K T wrote:

* Avi Kivity <a...@redhat.com> [2012-10-04 17:00:28]:


On 10/04/2012 03:07 PM, Peter Zijlstra wrote:

On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:



[...]

A big concern I have (if this is 1x overcommit) for ebizzy is that it
has just terrible scalability to begin with.  I do not think we should
try to optimize such a bad workload.



I think my way of running dbench has some flaw, so I went to ebizzy.
Could you let me know how you generally run dbench?


I mount a tmpfs and then specify that mount for dbench to run on.  This
eliminates all IO.  I use a 300 second run time and number of threads is
equal to number of vcpus.  All of the VMs of course need to have a
synchronized start.

I would also make sure you are using a recent kernel for dbench, where
the dcache scalability is much improved.  Without any lock-holder
preemption, the time in spin_lock should be very low:



 21.54%  78016 dbench  [kernel.kallsyms]   [k] copy_user_generic_unrolled
  3.51%  12723 dbench  libc-2.12.so[.] __strchr_sse42
  2.81%  10176 dbench  dbench  [.] child_run
  2.54%   9203 dbench  [kernel.kallsyms]   [k] _raw_spin_lock
  2.33%   8423 dbench  dbench  [.] next_token
  2.02%   7335 dbench  [kernel.kallsyms]   [k] __d_lookup_rcu
  1.89%   6850 dbench  libc-2.12.so[.] __strstr_sse42
  1.53%   5537 dbench  libc-2.12.so[.] __memset_sse2
  1.47%   5337 dbench  [kernel.kallsyms]   [k] link_path_walk
  1.40%   5084 dbench  [kernel.kallsyms]   [k] kmem_cache_alloc
  1.38%   5009 dbench  libc-2.12.so[.] memmove
  1.24%   4496 dbench  libc-2.12.so[.] vfprintf
  1.15%   4169 dbench  [kernel.kallsyms]   [k] __audit_syscall_exit




Hi Andrew,
I ran the test with dbench on tmpfs. I do not see any improvements in
dbench for the 16k PLE window.

So it seems that, apart from ebizzy, no workload benefited from it, and I
agree that it may not be good to optimize for ebizzy. I shall drop the
change to a 16k default window and continue with the original patch
series. I need to experiment with the latest kernel.

(PS: Thanks for pointing me towards perf in the latest kernel. It works fine.)

Results:
dbench run for 120 sec, 30 sec warmup, 8 iterations, using tmpfs.
base = 3.6.0-rc5 with the PLE handler optimization patch.

x = base + ple_window = 4k
+ = base + ple_window = 16k
* = base + ple_gap = 0

dbench 1x overcommit case
=========================
    N   Min       Max       Median    Avg         Stddev
x   8   5322.5    5519.05   5482.71   5461.0962   63.522276
+   8   5255.45   5530.55   5496.94   5455.2137   93.070363
*   8   5350.85   5477.81   5408.065  5418.4338   44.762697


dbench 2x overcommit case
=========================
    N   Min       Max       Median    Avg         Stddev
x   8   3054.32   3194.47   3137.33   3132.625    54.491615
+   8   3040.8    3148.87   3088.615  3088.1887   32.862336
*   8   3031.51   3171.99   3083.6    3097.4612   50.526977



Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-15 Thread Andrew Theurer
On Mon, 2012-10-15 at 17:40 +0530, Raghavendra K T wrote:
 [...]
 Hi Andrew,
 I ran the test with dbench on tmpfs. I do not see any improvements in
 dbench for the 16k PLE window.

 So it seems that, apart from ebizzy, no workload benefited from it, and I
 agree that it may not be good to optimize for ebizzy. I shall drop the
 change to a 16k default window and continue with the original patch
 series. I need to experiment with the latest kernel.

Thanks for running this again.  I do believe there are some workloads that,
when run at 1x overcommit, would benefit from a larger ple_window [with
the current PLE handling code], but I do not want to potentially
degrade 1x with a larger window.  I do, however, think there may be
another option.  I have not fully worked this out, but I think I am on
to something.

I decided to revert back to just a yield() instead of a yield_to().  My
motivation was that yield_to() [for large VMs] is like a dog chasing its
tail: round and round we go.  Just yield(), in particular a yield()
which results in yielding to something -other- than the current VM's
vcpus, helps synchronize the execution of sibling vcpus by deferring
them until the lock-holder vcpu is running again.  The more we can do to
get all vcpus running at the same time, the less we deal with the
preemption problem.  The other benefit is that yield() is far, far lower
overhead than yield_to().

This does assume that vcpus from the same VM do not share runqueues.
Yielding to a sibling vcpu with yield() is not productive for larger VMs,
in the same way that yield_to() is not.  My recent results include
restricting vcpu placement so that sibling vcpus do not get to run on
the same runqueue.  I do believe we could implement an initial placement
and load-balance policy to strive for this restriction (making it purely
optional, but I bet it could also help user apps which use spin locks).

For 1x VMs which still vm_exit due to PLE, I believe we could probably
just leave the ple_window alone, as long as we mostly use yield()
instead of yield_to().  The problem with the unneeded exits in this case
has been the overhead in routines leading up to yield_to() and the
yield_to() itself.  If we use yield() most of the time, this overhead
will go away.
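
As a minimal sketch of that idea (hypothetical; the in-tree PLE path is
kvm_vcpu_on_spin(), and this is not a drop-in replacement for it):

/*
 * On a PLE exit, defer the spinning vcpu with a plain yield() rather
 * than scanning the VM's vcpu list for a yield_to() target.  With
 * sibling vcpus kept on separate runqueues, whatever runs instead is
 * some other task, ideally the preempted lock-holder vcpu.
 */
static void ple_exit_yield(struct kvm_vcpu *vcpu)
{
	yield();
}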

Here is a comparison of yield_to() and yield():

dbench with 20-way VMs, 8 of them on 80-way host:

no PLE                426 +/- 11.03%
no PLE w/ gangsched 32001 +/- .37%
PLE with yield()    29207 +/- .28%
PLE with yield_to()  8175 +/- 1.37%

Yield() is far and way better than yield_to() here and almost approaches
gang sched result.  Here is a link for the perf sched map bitmap:

https://docs.google.com/open?id=0B6tfUNlZ-14weXBfVnFFZGw1akU

The thrashing is way down and sibling vcpus tend to run together,

Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

2012-10-15 Thread Avi Kivity
On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
 Currently, kdump just makes all the logical processors leave VMX operation
 by executing the VMXOFF instruction, so any VMCSs active on the logical
 processors may be corrupted. But, sometimes, we need the VMCSs to debug
 guest images contained in the host vmcore. To prevent the corruption, we
 should VMCLEAR the VMCSs before executing the VMXOFF instruction.

How have you verified that VMXOFF doesn't flush cached VMCSs already?

 
 The patch set provides an alternative way to clear VMCSs related to guests
 on all cpus when host is doing kdump.
 

I'm not sure the sysctl is really necessary.  The only reason to turn it
off is if the corruption is so severe that the loaded vmcs list itself
causes a crash.  I think it should be rare enough that we can do it
unconditionally.

-- 
error compiling committee.c: too many arguments to function


Re: KVM call agenda for 2012-10-16

2012-10-15 Thread Igor Mammedov
CPU as DEVICE
http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg00719.html
latest known tree for testing:

https://github.com/ehabkost/qemu-hacks/commits/work/cpu-devicestate-qdev-core
Maybe we could agree on the proposed RFC.


[PATCH for-3.7] vhost: fix mergeable bufs on BE hosts

2012-10-15 Thread Michael S. Tsirkin
We copy the head count to a 16-bit field; this works by chance on LE,
but on BE the guest gets 0.  Fix it up.
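
To illustrate the failure mode, here is a stand-alone example (not the
driver code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	int headcount = 3;	/* what vhost-net computed */
	uint16_t num_buffers;	/* what the virtio header field stores */

	/*
	 * Copying the first two bytes of a 32-bit int: on LE the
	 * low-order bytes come first, so this yields 3.  On BE the
	 * high-order zero bytes come first, so the guest saw 0.
	 */
	memcpy(&num_buffers, &headcount, sizeof(num_buffers));
	printf("%u\n", num_buffers);
	return 0;
}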

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Tested-by: Alexander Graf <ag...@suse.de>
Cc: sta...@kernel.org

---
 drivers/vhost/net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9ab6d47..2bb463c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -448,7 +448,8 @@ static void handle_rx(struct vhost_net *net)
.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
};
size_t total_len = 0;
-   int err, headcount, mergeable;
+   int err, mergeable;
+   s16 headcount;
size_t vhost_hlen, sock_hlen;
size_t vhost_len, sock_len;
/* TODO: check that we are running from vhost_worker? */
-- 
MST


[PATCH] qemu: Update Linux headers

2012-10-15 Thread Alex Williamson
Based on v3.7-rc1-3-g29bb4cc

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 Trying to get KVM_IRQFD_FLAG_RESAMPLE and friends for vfio-pci

 linux-headers/asm-x86/kvm.h |   17 +
 linux-headers/linux/kvm.h   |   25 +
 linux-headers/linux/kvm_para.h  |6 +++---
 linux-headers/linux/vfio.h  |6 +++---
 linux-headers/linux/virtio_config.h |6 +++---
 linux-headers/linux/virtio_ring.h   |6 +++---
 6 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 246617e..a65ec29 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -9,6 +9,22 @@
 #include <linux/types.h>
 #include <linux/ioctl.h>
 
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
 /* Select x86 specific features in linux/kvm.h */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
@@ -25,6 +41,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_READONLY_MEM
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4b9e575..81d2feb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
-/* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
-#define KVM_MEMSLOT_INVALID  (1UL << 1)
+/*
+ * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
+ * other bits are reserved for kvm internal use which are defined in
+ * include/linux/kvm_host.h.
+ */
+#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
+#define KVM_MEM_READONLY	(1UL << 1)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#ifdef __KVM_HAVE_READONLY_MEM
+#define KVM_CAP_READONLY_MEM 81
+#endif
+#define KVM_CAP_IRQFD_RESAMPLE 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/*
+ * Available with KVM_CAP_IRQFD_RESAMPLE
+ *
+ * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
+ * the irqfd to operate in resampling mode for level triggered interrupt
+ * emulation.  See Documentation/virtual/kvm/api.txt.
+ */
+#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
 
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
__u32 flags;
-   __u8  pad[20];
+   __u32 resamplefd;
+   __u8  pad[16];
 };
 
 struct kvm_clock_data {
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index 7bdcf93..cea2c5c 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -1,5 +1,5 @@
-#ifndef __LINUX_KVM_PARA_H
-#define __LINUX_KVM_PARA_H
+#ifndef _UAPI__LINUX_KVM_PARA_H
+#define _UAPI__LINUX_KVM_PARA_H
 
 /*
  * This header file provides a method for making a hypercall to the host
@@ -25,4 +25,4 @@
  */
 #include <asm/kvm_para.h>
 
-#endif /* __LINUX_KVM_PARA_H */
+#endif /* _UAPI__LINUX_KVM_PARA_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index f787b72..4758d1b 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h
index 4f51d8f..b7cda39 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -1,5 +1,5 @@
-#ifndef _LINUX_VIRTIO_CONFIG_H
-#define _LINUX_VIRTIO_CONFIG_H
+#ifndef _UAPI_LINUX_VIRTIO_CONFIG_H
+#define _UAPI_LINUX_VIRTIO_CONFIG_H
 /* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
  * anyone can use the definitions to implement compatible drivers/servers.
  *
@@ -51,4 +51,4 @@
  * suppressed them? */
 #define VIRTIO_F_NOTIFY_ON_EMPTY   24
 
-#endif /* _LINUX_VIRTIO_CONFIG_H */
+#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/linux-headers/linux/virtio_ring.h b/linux-headers/linux/virtio_ring.h
index 1b333e2..921694a 100644
--- a/linux-headers/linux/virtio_ring.h
+++ 

[PATCH] vfio-pci: Add KVM INTx acceleration

2012-10-15 Thread Alex Williamson
This makes use of the new level irqfd support enabling bypass of
qemu userspace both on INTx injection and unmask.  This significantly
boosts the performance of devices making use of legacy interrupts.
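
At the KVM API level the fast path boils down to registering a resampling
irqfd pair, roughly like this (an illustrative userspace sketch; the fds
and GSI are assumptions, and the patch's actual QEMU code follows below):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * trigger_fd: eventfd that VFIO signals when the device interrupts.
 * unmask_fd:  eventfd that KVM signals on guest EOI; VFIO consumes
 *             it to unmask the still-masked INTx line.
 */
static int enable_kvm_intx(int vm_fd, int gsi, int trigger_fd, int unmask_fd)
{
	struct kvm_irqfd irqfd = {
		.fd         = trigger_fd,
		.gsi        = gsi,
		.flags      = KVM_IRQFD_FLAG_RESAMPLE,
		.resamplefd = unmask_fd,
	};

	return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}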

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

My INTx routing workaround below will probably raise some eyebrows,
but I don't feel it's worth subjecting users to core dumps if they
want to try vfio-pci on new platforms.  INTx routing is part of some
larger plan, but until that plan materializes we have to try to avoid
the API unless we think there's a good chance it might be there.
I'll accept the maintenance of updating a whitelist in the interim.
Thanks,

Alex

 hw/vfio_pci.c |  224 +
 1 file changed, 224 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 639371e..777a5f8 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -154,6 +154,53 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
 
 /*
+ * PCI code refuses to make it possible to probe whether the chipset
+ * supports pci_device_route_intx_to_irq() and booby traps the call
+ * to assert if it doesn't.  For us, this is just an optimization, so
+ * only enable it when we know it's present.  Unfortunately PCIBus is
+ * private, so we can't just look at the function pointer.
+ */
+static bool vfio_pci_bus_has_intx_route(PCIDevice *pdev)
+{
+#ifdef CONFIG_KVM
+BusState *bus = qdev_get_parent_bus(&pdev->qdev);
+DeviceState *dev;
+
+if (!kvm_irqchip_in_kernel() ||
+!kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
+   return false;
+}
+
+for (; bus->parent; bus = qdev_get_parent_bus(dev)) {
+
+dev = bus->parent;
+
+if (!strncmp("i440FX-pcihost", object_get_typename(OBJECT(dev)), 14)) {
+return true;
+}
+}
+
+error_report("vfio-pci: VM chipset does not support INTx routing, "
+             "using slow INTx mode\n");
+#endif
+return false;
+}
+
+static PCIINTxRoute vfio_pci_device_route_intx_to_irq(PCIDevice *pdev, int pin)
+{
+if (!vfio_pci_bus_has_intx_route(pdev)) {
+return (PCIINTxRoute) { .mode = PCI_INTX_DISABLED, .irq = -1 };
+}
+
+return pci_device_route_intx_to_irq(pdev, pin);
+}
+
+static bool vfio_pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new)
+{
+return old->mode != new->mode || old->irq != new->irq;
+}
+
+/*
  * Common VFIO interrupt disable
  */
 static void vfio_disable_irqindex(VFIODevice *vdev, int index)
@@ -185,6 +232,21 @@ static void vfio_unmask_intx(VFIODevice *vdev)
 ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
+#ifdef CONFIG_KVM
+static void vfio_mask_intx(VFIODevice *vdev)
+{
+struct vfio_irq_set irq_set = {
+.argsz = sizeof(irq_set),
+.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+.index = VFIO_PCI_INTX_IRQ_INDEX,
+.start = 0,
+.count = 1,
+};
+
+ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+#endif
+
 /*
  * Disabling BAR mmapping can be slow, but toggling it around INTx can
  * also be a huge overhead.  We try to get the best of both worlds by
@@ -248,6 +310,161 @@ static void vfio_eoi(VFIODevice *vdev)
 vfio_unmask_intx(vdev);
 }
 
+static void vfio_enable_intx_kvm(VFIODevice *vdev)
+{
+#ifdef CONFIG_KVM
+struct kvm_irqfd irqfd = {
+.fd = event_notifier_get_fd(&vdev->intx.interrupt),
+.gsi = vdev->intx.route.irq,
+.flags = KVM_IRQFD_FLAG_RESAMPLE,
+};
+struct vfio_irq_set *irq_set;
+int ret, argsz;
+int32_t *pfd;
+
+if (!kvm_irqchip_in_kernel() ||
+vdev->intx.route.mode != PCI_INTX_ENABLED ||
+!kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
+return;
+}
+
+/* Get to a known interrupt state */
+qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
+vfio_mask_intx(vdev);
+vdev->intx.pending = false;
+qemu_set_irq(vdev->pdev.irq[vdev->intx.pin], 0);
+
+/* Get an eventfd for resample/unmask */
+if (event_notifier_init(&vdev->intx.unmask, 0)) {
+error_report("vfio: Error: event_notifier_init failed eoi\n");
+goto fail;
+}
+
+/* KVM triggers it, VFIO listens for it */
+irqfd.resamplefd = event_notifier_get_fd(&vdev->intx.unmask);
+
+if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+error_report("vfio: Error: Failed to setup resample irqfd: %m\n");
+goto fail_irqfd;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)irq_set->data;
+
+*pfd = irqfd.resamplefd;
+
+ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+g_free(irq_set);
+if (ret) {
+

Re: [PATCH] qemu: Update Linux headers

2012-10-15 Thread Anthony Liguori
Alex Williamson <alex.william...@redhat.com> writes:

 Based on v3.7-rc1-3-g29bb4cc

Normally this would go through qemu-kvm/uq/master but since this is from
Linus' tree, it's less of a concern.

Nonetheless, I'd prefer we did it from v3.7-rc1 instead of a random git
snapshot.

Regards,

Anthony Liguori



[PATCH v2] qemu: Update Linux headers

2012-10-15 Thread Alex Williamson
Based on v3.7-rc1

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

Using tag v3.7-rc1 instead of random HEAD, although the patch turns
out identical to v1.

 linux-headers/asm-x86/kvm.h |   17 +
 linux-headers/linux/kvm.h   |   25 +
 linux-headers/linux/kvm_para.h  |6 +++---
 linux-headers/linux/vfio.h  |6 +++---
 linux-headers/linux/virtio_config.h |6 +++---
 linux-headers/linux/virtio_ring.h   |6 +++---
 6 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 246617e..a65ec29 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -9,6 +9,22 @@
 #include <linux/types.h>
 #include <linux/ioctl.h>
 
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
 /* Select x86 specific features in linux/kvm.h */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
@@ -25,6 +41,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_READONLY_MEM
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4b9e575..81d2feb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
-/* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
-#define KVM_MEMSLOT_INVALID  (1UL << 1)
+/*
+ * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
+ * other bits are reserved for kvm internal use which are defined in
+ * include/linux/kvm_host.h.
+ */
+#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
+#define KVM_MEM_READONLY	(1UL << 1)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#ifdef __KVM_HAVE_READONLY_MEM
+#define KVM_CAP_READONLY_MEM 81
+#endif
+#define KVM_CAP_IRQFD_RESAMPLE 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/*
+ * Available with KVM_CAP_IRQFD_RESAMPLE
+ *
+ * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
+ * the irqfd to operate in resampling mode for level triggered interrupt
+ * emulation.  See Documentation/virtual/kvm/api.txt.
+ */
+#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
 
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
__u32 flags;
-   __u8  pad[20];
+   __u32 resamplefd;
+   __u8  pad[16];
 };
 
 struct kvm_clock_data {
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index 7bdcf93..cea2c5c 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -1,5 +1,5 @@
-#ifndef __LINUX_KVM_PARA_H
-#define __LINUX_KVM_PARA_H
+#ifndef _UAPI__LINUX_KVM_PARA_H
+#define _UAPI__LINUX_KVM_PARA_H
 
 /*
  * This header file provides a method for making a hypercall to the host
@@ -25,4 +25,4 @@
  */
 #include <asm/kvm_para.h>
 
-#endif /* __LINUX_KVM_PARA_H */
+#endif /* _UAPI__LINUX_KVM_PARA_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index f787b72..4758d1b 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h
index 4f51d8f..b7cda39 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -1,5 +1,5 @@
-#ifndef _LINUX_VIRTIO_CONFIG_H
-#define _LINUX_VIRTIO_CONFIG_H
+#ifndef _UAPI_LINUX_VIRTIO_CONFIG_H
+#define _UAPI_LINUX_VIRTIO_CONFIG_H
 /* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
  * anyone can use the definitions to implement compatible drivers/servers.
  *
@@ -51,4 +51,4 @@
  * suppressed them? */
 #define VIRTIO_F_NOTIFY_ON_EMPTY   24
 
-#endif /* _LINUX_VIRTIO_CONFIG_H */
+#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/linux-headers/linux/virtio_ring.h b/linux-headers/linux/virtio_ring.h
index 1b333e2..921694a 100644
--- a/linux-headers/linux/virtio_ring.h
+++ 

Re: [PATCH] qemu: Update Linux headers

2012-10-15 Thread Alex Williamson
On Mon, 2012-10-15 at 15:54 -0500, Anthony Liguori wrote:
 Alex Williamson <alex.william...@redhat.com> writes:
 
  Based on v3.7-rc1-3-g29bb4cc
 
 Normally this would go through qemu-kvm/uq/master but since this is from
 Linus' tree, it's less of a concern.
 
 Nonetheless, I'd prefer we did it from v3.7-rc1 instead of a random git
 snapshot.

Resent against v3.7-rc1, which ends up just being a changelog change, no
header changes since rc1.  Thanks,

Alex



Re: [Qemu-devel] [PATCH qom-cpu v2 4/7] cpus: Pass CPUState to qemu_cpu_is_self()

2012-10-15 Thread Igor Mammedov
On Fri, 12 Oct 2012 03:26:40 +0200
Andreas Färber <afaer...@suse.de> wrote:

 Change return type to bool, move to include/qemu/cpu.h and
 add documentation.
 
 Signed-off-by: Andreas Färber <afaer...@suse.de>
 ---
  cpus.c |   10 --
  exec.c |3 ++-
  hw/apic.c  |6 --
  include/qemu/cpu.h |   10 ++
  kvm-all.c  |4 +++-
  qemu-common.h  |1 -
  target-i386/kvm.c  |6 --
  7 files changed, 27 insertions(+), 13 deletions(-)
 
 diff --git a/cpus.c b/cpus.c
 index 750a76f..849ea8a 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -638,9 +638,10 @@ void qemu_init_cpu_loop(void)
  
  void run_on_cpu(CPUArchState *env, void (*func)(void *data), void *data)
  {
 +CPUState *cpu = ENV_GET_CPU(env);
  struct qemu_work_item wi;
  
 -if (qemu_cpu_is_self(env)) {
 +if (qemu_cpu_is_self(cpu)) {
  func(data);
  return;
  }
 @@ -855,7 +856,7 @@ static void qemu_cpu_kick_thread(CPUArchState *env)
  exit(1);
  }
  #else /* _WIN32 */
 -if (!qemu_cpu_is_self(env)) {
 +if (!qemu_cpu_is_self(cpu)) {
   SuspendThread(cpu->hThread);
   cpu_signal(0);
   ResumeThread(cpu->hThread);
 @@ -890,11 +891,8 @@ void qemu_cpu_kick_self(void)
  #endif
  }
  
 -int qemu_cpu_is_self(void *_env)
 +bool qemu_cpu_is_self(CPUState *cpu)
  {
 -CPUArchState *env = _env;
 -CPUState *cpu = ENV_GET_CPU(env);
 -
  return qemu_thread_is_self(cpu-thread);
  }
  
 diff --git a/exec.c b/exec.c
 index 7899042..e21be32 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1685,6 +1685,7 @@ static void cpu_unlink_tb(CPUArchState *env)
  /* mask must never be zero, except for A20 change call */
  static void tcg_handle_interrupt(CPUArchState *env, int mask)
  {
 +CPUState *cpu = ENV_GET_CPU(env);
Is there any chance to get rid of expensive cast on this call path?

  int old_mask;
  
   old_mask = env->interrupt_request;
  @@ -1694,7 +1695,7 @@ static void tcg_handle_interrupt(CPUArchState *env, int mask)
   * If called from iothread context, wake the target cpu in
   * case its halted.
   */
 -if (!qemu_cpu_is_self(env)) {
 +if (!qemu_cpu_is_self(cpu)) {
  qemu_cpu_kick(env);
  return;
  }
 diff --git a/hw/apic.c b/hw/apic.c
 index ccf2819..1b4cd2f 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
  @@ -107,7 +107,7 @@ static void apic_sync_vapic(APICCommonState *s, int sync_type)
  length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
  
  if (sync_type  SYNC_TO_VAPIC) {
  -assert(qemu_cpu_is_self(s->cpu->env));
  +assert(qemu_cpu_is_self(CPU(s->cpu)));

   vapic_state.tpr = s->tpr;
  vapic_state.enabled = 1;
 @@ -363,10 +363,12 @@ static int apic_irq_pending(APICCommonState *s)
  /* signal the CPU if an irq is pending */
  static void apic_update_irq(APICCommonState *s)
  {
  +CPUState *cpu = CPU(s->cpu);
  +
   if (!(s->spurious_vec & APIC_SV_ENABLE)) {
   return;
   }
  -if (!qemu_cpu_is_self(s->cpu->env)) {
  +if (!qemu_cpu_is_self(cpu)) {
   cpu_interrupt(s->cpu->env, CPU_INTERRUPT_POLL);
   } else if (apic_irq_pending(s) > 0) {
   cpu_interrupt(s->cpu->env, CPU_INTERRUPT_HARD);
 diff --git a/include/qemu/cpu.h b/include/qemu/cpu.h
 index ad706a6..7be983d 100644
 --- a/include/qemu/cpu.h
 +++ b/include/qemu/cpu.h
 @@ -78,5 +78,15 @@ struct CPUState {
   */
  void cpu_reset(CPUState *cpu);
  
 +/**
 + * qemu_cpu_is_self:
 + * @cpu: The vCPU to check against.
 + *
 + * Checks whether the caller is executing on the vCPU thread.
 + *
 + * Returns: %true if called from @cpu's thread, %false otherwise.
 + */
 +bool qemu_cpu_is_self(CPUState *cpu);
 +
  
  #endif
 diff --git a/kvm-all.c b/kvm-all.c
 index 92a7137..db01aeb 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -854,9 +854,11 @@ static MemoryListener kvm_memory_listener = {
  
  static void kvm_handle_interrupt(CPUArchState *env, int mask)
  {
 +CPUState *cpu = ENV_GET_CPU(env);
 +
   env->interrupt_request |= mask;
  
 -if (!qemu_cpu_is_self(env)) {
 +if (!qemu_cpu_is_self(cpu)) {
  qemu_cpu_kick(env);
  }
  }
 diff --git a/qemu-common.h b/qemu-common.h
 index b54612b..2094742 100644
 --- a/qemu-common.h
 +++ b/qemu-common.h
 @@ -326,7 +326,6 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id);
  /* Unblock cpu */
  void qemu_cpu_kick(void *env);
  void qemu_cpu_kick_self(void);
 -int qemu_cpu_is_self(void *env);
  
  /* work queue */
  struct qemu_work_item {
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 5b18383..cf3d2f1 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1552,9 +1552,10 @@ static int kvm_get_debugregs(CPUX86State *env)
  
  int kvm_arch_put_registers(CPUX86State *env, int level)
  {
 +CPUState *cpu = ENV_GET_CPU(env);
  int ret;
  
 -assert(cpu_is_stopped(env) || qemu_cpu_is_self(env));
 +assert(cpu_is_stopped(env) || 

[PATCH 8/8] KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()

2012-10-15 Thread Paul Mackerras
This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the bits variable.  Nevertheless it should still be fixed.
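
To make the distinction concrete, a stand-alone illustration (not the
kernel code): with an "r" constraint the operand lives in a register, so
"li", which expects an immediate, would encode the register's number
rather than its value, while "mr" copies the value:

unsigned long copy_bits(unsigned long bits)
{
	unsigned long out;

	/* Correct: a register-to-register move. */
	asm("mr %0,%1" : "=r" (out) : "r" (bits));

	/*
	 * Wrong: asm("li %0,%1" : "=r" (out) : "r" (bits)) would
	 * assemble to something like "li rX,9" if bits happened to
	 * live in r9, loading the register number, not its value.
	 */
	return out;
}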

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0dd1d86..1472a5b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -60,7 +60,7 @@ static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
		     "ori	%0,%0,%4\n"
		     "stdcx.	%0,0,%2\n"
		     "beq+	2f\n"
-		     "li	%1,%3\n"
+		     "mr	%1,%3\n"
		     "2:	isync"
		     : "=&r" (tmp), "=&r" (old)
		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-- 
1.7.10.4



[PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet

2012-10-15 Thread Paul Mackerras
This is a set of 8 patches of which the first 7 have been posted
previously and have had no comments.  The 8th is new, but is quite
trivial.  They fix a series of issues with HV-style KVM on ppc.
They only touch code that is specific to Book3S HV KVM.
The patches are against the next branch of the kvm tree.

The overall diffstat is:

 arch/powerpc/include/asm/kvm_asm.h   |1 +
 arch/powerpc/include/asm/kvm_book3s_64.h |2 +-
 arch/powerpc/include/asm/kvm_host.h  |   17 +-
 arch/powerpc/include/asm/smp.h   |8 +
 arch/powerpc/kernel/smp.c|   46 +
 arch/powerpc/kvm/book3s_hv.c |  316 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   11 +-
 7 files changed, 293 insertions(+), 108 deletions(-)

Please apply.

Thanks,
Paul.


[PATCH 7/8] KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0

2012-10-15 Thread Paul Mackerras
Commit 55b665b026 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8b3c470..812764c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -811,9 +811,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
	addr = val->vpaval.addr;
	len = val->vpaval.length;
	r = -EINVAL;
-	if (len < sizeof(struct dtl_entry))
-		break;
-	if (addr && !vcpu->arch.vpa.next_gpa)
+	if (addr && (len < sizeof(struct dtl_entry) ||
+		     !vcpu->arch.vpa.next_gpa))
		break;
	len -= len % sizeof(struct dtl_entry);
	r = set_vpa(vcpu, &vcpu->arch.dtl, addr, len);
-- 
1.7.10.4



[PATCH 1/8] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online

2012-10-15 Thread Paul Mackerras
When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, all of the cores (or at least all of
the cores where the KVM guest could run) to be running only one
active hardware thread.  This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition.  Complying with this restriction
is much easier if, from the host kernel's point of view, only one
hardware thread is active.

This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists.  The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.

Signed-off-by: Paul Mackerras <pau...@samba.org>
Acked-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
---
 arch/powerpc/include/asm/smp.h |8 +++
 arch/powerpc/kernel/smp.c  |   46 
 arch/powerpc/kvm/book3s_hv.c   |   12 +--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ebc24dc..b625a1a 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
 void generic_mach_cpu_die(void);
 void generic_set_cpu_dead(unsigned int cpu);
 int generic_check_cpu_restart(unsigned int cpu);
+
+extern void inhibit_secondary_onlining(void);
+extern void uninhibit_secondary_onlining(void);
+
+#else /* HOTPLUG_CPU */
+static inline void inhibit_secondary_onlining(void) {}
+static inline void uninhibit_secondary_onlining(void) {}
+
 #endif
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8d4214a..c4f420c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -417,6 +417,45 @@ int generic_check_cpu_restart(unsigned int cpu)
 {
return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
 }
+
+static atomic_t secondary_inhibit_count;
+
+/*
+ * Don't allow secondary CPU threads to come online
+ */
+void inhibit_secondary_onlining(void)
+{
+   /*
+* This makes secondary_inhibit_count stable during cpu
+* online/offline operations.
+*/
+   get_online_cpus();
+
+	atomic_inc(&secondary_inhibit_count);
+   put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
+
+/*
+ * Allow secondary CPU threads to come online again
+ */
+void uninhibit_secondary_onlining(void)
+{
+   get_online_cpus();
+	atomic_dec(&secondary_inhibit_count);
+   put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
+
+static int secondaries_inhibited(void)
+{
+	return atomic_read(&secondary_inhibit_count);
+}
+
+#else /* HOTPLUG_CPU */
+
+#define secondaries_inhibited()	0
+
 #endif
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
@@ -435,6 +474,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
int rc, c;
 
+   /*
+* Don't allow secondary threads to come online if inhibited
+*/
+	if (threads_per_core > 1 && secondaries_inhibited() &&
+	    cpu % threads_per_core != 0)
+   return -EBUSY;
+
if (smp_ops == NULL ||
 	    (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
return -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9a15da7..c5ddf04 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -47,6 +47,7 @@
 #include <asm/page.h>
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
+#include <asm/smp.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -1016,8 +1017,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Make sure we are running on thread 0, and that
 * secondary threads are offline.
-* XXX we should also block attempts to bring any
-* secondary threads online.
 */
 	if (threads_per_core > 1 && !on_primary_thread()) {
 		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
@@ -1730,11 +1729,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 
 	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
 	spin_lock_init(&kvm->arch.slot_phys_lock);
+
+   /*
+* Don't allow secondary CPU threads to come online
+* while any KVM VMs exist.
+*/
+   inhibit_secondary_onlining();
+
return 0;
 }
 
 void kvmppc_core_destroy_vm(struct kvm *kvm)
 {
+   uninhibit_secondary_onlining();
+
 	if (kvm->arch.rma) {
 		kvm_release_rma(kvm->arch.rma);
 		kvm->arch.rma = NULL;
-- 
1.7.10.4


[PATCH 4/8] KVM: PPC: Book3S HV: Fixes for late-joining threads

2012-10-15 Thread Paul Mackerras
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest.  Currently this only happens
rarely, when a vcpu is first started.  This fixes some bugs and
omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make
a DTL entry for it.  Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core.  To handle this correctly
we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap.  A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.
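
For reference, the latecomer's decision against the new vcore states can
be sketched like this (simplified from the code in this patch):

	switch (vc->vcore_state) {
	case VCORE_STARTING:	/* master is still preparing (VPA updates) */
		break;		/* do nothing; the master picks us up */
	case VCORE_RUNNING:	/* core is in the guest; join it directly */
		kvmppc_create_dtl_entry(vcpu, vc);
		kvmppc_start_thread(vcpu);
		break;
	default:		/* INACTIVE/SLEEPING/EXITING: normal path */
		break;
	}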

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |7 ---
 arch/powerpc/kvm/book3s_hv.c|   14 +++---
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 68f5a30..218534d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -289,9 +289,10 @@ struct kvmppc_vcore {
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
-#define VCORE_RUNNING  1
-#define VCORE_EXITING  2
-#define VCORE_SLEEPING 3
+#define VCORE_SLEEPING 1
+#define VCORE_STARTING 2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a737a4..89995fa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -336,6 +336,11 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, 
struct kvmppc_vpa *vpap)
 
 static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 {
+	if (!(vcpu->arch.vpa.update_pending ||
+	      vcpu->arch.slb_shadow.update_pending ||
+	      vcpu->arch.dtl.update_pending))
+   return;
+
 	spin_lock(&vcpu->arch.vpa_update_lock);
 	if (vcpu->arch.vpa.update_pending) {
 		kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
@@ -1009,7 +1014,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->n_woken = 0;
 	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
-	vc->vcore_state = VCORE_RUNNING;
+	vc->vcore_state = VCORE_STARTING;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
 
@@ -1062,6 +1067,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvmppc_create_dtl_entry(vcpu, vc);
}
 
+	vc->vcore_state = VCORE_RUNNING;
 	preempt_disable();
 	spin_unlock(&vc->lock);
 
@@ -1070,8 +1076,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
 
__kvmppc_vcore_entry(NULL, vcpu0);
-	for (i = 0; i < threads_per_core; ++i)
-		kvmppc_release_hwthread(vc->pcpu + i);
 
 	spin_lock(&vc->lock);
/* disable sending of IPIs on virtual external irqs */
@@ -1080,6 +1084,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/* wait for secondary threads to finish writing their state to memory */
 	if (vc->nap_count < vc->n_woken)
 		kvmppc_wait_for_nap(vc);
+	for (i = 0; i < threads_per_core; ++i)
+		kvmppc_release_hwthread(vc->pcpu + i);
/* prevent other vcpu threads from doing kvmppc_start_thread() now */
 	vc->vcore_state = VCORE_EXITING;
 	spin_unlock(&vc->lock);
@@ -1170,6 +1176,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 	kvm_run->exit_reason = 0;
 	vcpu->arch.ret = RESUME_GUEST;
 	vcpu->arch.trap = 0;
+   kvmppc_update_vpas(vcpu);
 
/*
 * Synchronize with other threads in this virtual core
@@ -1193,6 +1200,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 	if (vc->vcore_state == VCORE_RUNNING &&
 	    VCORE_EXIT_COUNT(vc) == 0) {
 		vcpu->arch.ptid = vc->n_runnable - 1;
+   kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
}
 
-- 
1.7.10.4



[PATCH 6/8] KVM: PPC: Book3S HV: Fix accounting of stolen time

2012-10-15 Thread Paul Mackerras
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu-arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.
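
The accounting that implements this can be condensed to the following
sketch (a simplification of the vcpu_load/put hunks below; busy_preempt
holds the timebase value at which preemption began, with TB_NIL meaning
"not currently preempted"):

	/* vcpu_put: a busy-in-host vcpu is being preempted now */
	spin_lock(&vcpu->arch.tbacct_lock);
	if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
		vcpu->arch.busy_preempt = mftb();
	spin_unlock(&vcpu->arch.tbacct_lock);

	/* vcpu_load: preemption ends; fold the interval into busy_stolen */
	spin_lock(&vcpu->arch.tbacct_lock);
	if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST &&
	    vcpu->arch.busy_preempt != TB_NIL) {
		vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt;
		vcpu->arch.busy_preempt = TB_NIL;
	}
	spin_unlock(&vcpu->arch.tbacct_lock);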

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |5 ++
 arch/powerpc/kvm/book3s_hv.c|  127 ++-
 2 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1e8cbd1..3093896 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -559,12 +559,17 @@ struct kvm_vcpu_arch {
unsigned long dtl_index;
u64 stolen_logged;
struct kvmppc_vpa slb_shadow;
+
+   spinlock_t tbacct_lock;
+   u64 busy_stolen;
+   u64 busy_preempt;
 #endif
 };
 
 /* Values for vcpu-arch.state */
 #define KVMPPC_VCPU_NOTREADY   0
 #define KVMPPC_VCPU_RUNNABLE   1
+#define KVMPPC_VCPU_BUSY_IN_HOST   2
 
 /* Values for vcpu-arch.io_gpr */
 #define KVM_MMIO_REG_MASK  0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61d2934..8b3c470 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -60,23 +60,74 @@
 /* Used to indicate that a guest page fault needs to be handled */
 #define RESUME_PAGE_FAULT  (RESUME_GUEST | RESUME_FLAG_ARCH1)
 
+/* Used as a null value for timebase values */
+#define TB_NIL (~(u64)0)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+/*
+ * We use the vcpu_load/put functions to measure stolen time.
+ * Stolen time is counted as time when either the vcpu is able to
+ * run as part of a virtual core, but the task running the vcore
+ * is preempted or sleeping, or when the vcpu needs something done
+ * in the kernel by the task running the vcpu, but that task is
+ * preempted or sleeping.  Those two things have to be counted
+ * separately, since one of the vcpu tasks will take on the job
+ * of running the core, and the other vcpu tasks in the vcore will
+ * sleep waiting for it to do that, but that sleep shouldn't count
+ * as stolen time.
+ *
+ * Hence we accumulate stolen time when the vcpu can run as part of
+ * a vcore using vc->stolen_tb, and the stolen time when the vcpu
+ * needs its task to do other things in the kernel (for example,
+ * service a page fault) in busy_stolen.  We don't accumulate
+ * stolen time for a vcore when it is inactive, or for a vcpu
+ * when it is in state RUNNING or NOTREADY.  NOTREADY is a bit of
+ * a misnomer; it means that the vcpu task is not executing in
+ * the KVM_VCPU_RUN ioctl, i.e. it is in userspace or elsewhere in
+ * the kernel.  We don't have any way of dividing up that time
+ * between time that the vcpu is genuinely stopped, time that
+ * the task is actively working on behalf of the vcpu, and time
+ * that the task is preempted, so we don't count any of it as
+ * stolen.
+ *
+ * Updates to busy_stolen are protected by arch.tbacct_lock;
+ * updates to vc->stolen_tb are protected by the arch.tbacct_lock
+ * of the vcpu that has taken responsibility for running the vcore
+ * (i.e. vc->runner).  The stolen times are measured in units of
+ * timebase ticks.  (Note that the != TB_NIL checks below are
+ * purely defensive; they should never fail.)
+ */
+
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)

[PATCH 2/8] KVM: PPC: Book3S HV: Fix some races in starting secondary threads

2012-10-15 Thread Paul Mackerras
Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads, which
if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count.  It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
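
Point (b) is the classic lost-wakeup ordering rule; in pseudo-C the
secondary's nap loop has to look like this (a sketch of the logic, not
the actual rmhandlers assembly):

	for (;;) {
		clear_ipi();				/* ack the doorbell first... */
		vcpu = local_paca->kvm_hstate.kvm_vcpu;	/* ...then sample the mailbox */
		if (vcpu)
			break;
		nap();
	}

With the order reversed, a vcpu pointer stored between the load and the
IPI clear would be missed, and the thread would nap indefinitely.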

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv.c|   41 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   11 ++---
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c5ddf04..77dec0f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -64,8 +64,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	local_paca->kvm_hstate.kvm_vcpu = vcpu;
-	local_paca->kvm_hstate.kvm_vcore = vc;
 	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
 		vc->stolen_tb += mftb() - vc->preempt_tb;
 }
@@ -880,6 +878,7 @@ static int kvmppc_grab_hwthread(int cpu)
 
/* Ensure the thread won't go into the kernel if it wakes */
 	tpaca->kvm_hstate.hwthread_req = 1;
+	tpaca->kvm_hstate.kvm_vcpu = NULL;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -929,7 +928,6 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
smp_wmb();
 #if defined(CONFIG_PPC_ICP_NATIVE)  defined(CONFIG_SMP)
 	if (vcpu->arch.ptid) {
-   kvmppc_grab_hwthread(cpu);
xics_wake_cpu(cpu);
 		++vc->n_woken;
}
@@ -955,7 +953,8 @@ static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
 
 /*
  * Check that we are on thread 0 and that any other threads in
- * this core are off-line.
+ * this core are off-line.  Then grab the threads so they can't
+ * enter the kernel.
  */
 static int on_primary_thread(void)
 {
@@ -967,6 +966,17 @@ static int on_primary_thread(void)
 	while (++thr < threads_per_core)
if (cpu_online(cpu + thr))
return 0;
+
+   /* Grab all hw threads so they can't go into the kernel */
+	for (thr = 1; thr < threads_per_core; ++thr) {
+   if (kvmppc_grab_hwthread(cpu + thr)) {
+   /* Couldn't grab one; let the others go */
+   do {
+   kvmppc_release_hwthread(cpu + thr);
+			} while (--thr > 0);
+   return 0;
+   }
+   }
return 1;
 }
 
@@ -1015,16 +1025,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
}
 
/*
-* Make sure we are running on thread 0, and that
-* secondary threads are offline.
-*/
-	if (threads_per_core > 1 && !on_primary_thread()) {
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			vcpu->arch.ret = -EBUSY;
-   goto out;
-   }
-
-   /*
 * Assign physical thread IDs, first to non-ceded vcpus
 * and then to ceded ones.
 */
@@ -1043,15 +1043,22 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 		if (vcpu->arch.ceded)
 			vcpu->arch.ptid = ptid++;
 
+   /*
+* Make sure we are running on thread 0, and that
+* secondary threads are offline.
+*/
+	if (threads_per_core > 1 && !on_primary_thread()) {
+		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+			vcpu->arch.ret = -EBUSY;
+   goto out;
+   }
+
 	vc->stolen_tb += mftb() - vc->preempt_tb;
 	vc->pcpu = smp_processor_id();
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
kvmppc_start_thread(vcpu);
kvmppc_create_dtl_entry(vcpu, vc);
}
-   /* Grab any remaining hw threads so they can't go into the kernel */
-	for (i = ptid; i < threads_per_core; ++i)
-		kvmppc_grab_hwthread(vc->pcpu + i);
 
preempt_disable();
 	spin_unlock(&vc->lock);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 44b72fe..1e90ef6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -134,8 +134,11 @@ kvm_start_guest:
 
 27:/* XXX should handle hypervisor maintenance interrupts etc. here */
 
+   /* reload vcpu pointer after 

[PATCH 3/8] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock

2012-10-15 Thread Paul Mackerras
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock.  This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.

Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.

In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.

The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.
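
The resulting locking rule is easy to state: every traversal of
vc->runnable_threads now follows the pattern

	spin_lock(&vc->lock);
	list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
		/* ... no sleeping here ... */
	}
	spin_unlock(&vc->lock);

and anything that may sleep, such as kvmppc_book3s_hv_page_fault, is
hoisted out to a caller (kvmppc_vcpu_run) that runs without the lock.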

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_asm.h |1 +
 arch/powerpc/kvm/book3s_hv.c   |   67 ++--
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index 76fdcfe..aabcdba 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -118,6 +118,7 @@
 
 #define RESUME_FLAG_NV          (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST        (1<<1)  /* Resume host? */
+#define RESUME_FLAG_ARCH1	(1<<2)
 
 #define RESUME_GUEST            0
 #define RESUME_GUEST_NV         RESUME_FLAG_NV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 77dec0f..3a737a4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -57,6 +57,9 @@
 /* #define EXIT_DEBUG_SIMPLE */
 /* #define EXIT_DEBUG_INT */
 
+/* Used to indicate that a guest page fault needs to be handled */
+#define RESUME_PAGE_FAULT  (RESUME_GUEST | RESUME_FLAG_ARCH1)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
@@ -431,7 +434,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
  struct task_struct *tsk)
 {
int r = RESUME_HOST;
-   int srcu_idx;
 
 	vcpu->stat.sum_exits++;
 
@@ -491,16 +493,12 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * have been handled already.
 */
case BOOK3S_INTERRUPT_H_DATA_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-			vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+   r = RESUME_PAGE_FAULT;
break;
case BOOK3S_INTERRUPT_H_INST_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-			kvmppc_get_pc(vcpu), 0);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
+		vcpu->arch.fault_dsisr = 0;
+   r = RESUME_PAGE_FAULT;
break;
/*
 * This occurs if the guest executes an illegal instruction.
@@ -984,22 +982,24 @@ static int on_primary_thread(void)
  * Run a set of guest threads on a physical core.
  * Called with vc-lock held.
  */
-static int kvmppc_run_core(struct kvmppc_vcore *vc)
+static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
struct kvm_vcpu *vcpu, *vcpu0, *vnext;
long ret;
u64 now;
int ptid, i, need_vpa_update;
int srcu_idx;
+   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
/* don't start if any threads have a signal pending */
need_vpa_update = 0;
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		if (signal_pending(vcpu->arch.run_task))
-			return 0;
-		need_vpa_update |= vcpu->arch.vpa.update_pending |
-			vcpu->arch.slb_shadow.update_pending |
-			vcpu->arch.dtl.update_pending;
+			return;
+		if (vcpu->arch.vpa.update_pending ||
+		    vcpu->arch.slb_shadow.update_pending ||
+		    vcpu->arch.dtl.update_pending)
+			vcpus_to_update[need_vpa_update++] = vcpu;
}
 
/*
@@ -1019,8 +1019,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if (need_vpa_update) {
 		spin_unlock(&vc->lock);
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			kvmppc_update_vpas(vcpu);
+		for (i = 0; i < need_vpa_update; ++i)
+			kvmppc_update_vpas(vcpus_to_update[i]);
 		spin_lock(&vc->lock);
}
 
@@ -1037,8 +1037,10 @@ static int 

[PATCH 5/8] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run

2012-10-15 Thread Paul Mackerras
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |5 +--
 arch/powerpc/kvm/book3s_hv.c|   74 ++-
 2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 218534d..1e8cbd1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -563,9 +563,8 @@ struct kvm_vcpu_arch {
 };
 
 /* Values for vcpu-arch.state */
-#define KVMPPC_VCPU_STOPPED		0
-#define KVMPPC_VCPU_BUSY_IN_HOST	1
-#define KVMPPC_VCPU_RUNNABLE		2
+#define KVMPPC_VCPU_NOTREADY		0
+#define KVMPPC_VCPU_RUNNABLE		1
 
 /* Values for vcpu-arch.io_gpr */
 #define KVM_MMIO_REG_MASK  0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 89995fa..61d2934 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -776,10 +776,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, 
unsigned int id)
 
kvmppc_mmu_book3s_hv_init(vcpu);
 
-   /*
-* We consider the vcpu stopped until we see the first run ioctl for it.
-*/
-	vcpu->arch.state = KVMPPC_VCPU_STOPPED;
+	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
 
 	init_waitqueue_head(&vcpu->arch.cpu_run);
 
@@ -866,9 +863,8 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 {
 	if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
 		return;
-	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
 	--vc->n_runnable;
-	++vc->n_busy;
 	list_del(&vcpu->arch.run_list);
 }
 
@@ -1169,7 +1165,6 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int n_ceded;
-   int prev_state;
struct kvmppc_vcore *vc;
struct kvm_vcpu *v, *vn;
 
@@ -1186,7 +1181,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 	vcpu->arch.ceded = 0;
 	vcpu->arch.run_task = current;
 	vcpu->arch.kvm_run = kvm_run;
-	prev_state = vcpu->arch.state;
 	vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
 	list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads);
 	++vc->n_runnable;
@@ -1196,35 +1190,26 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * If the vcore is already running, we may be able to start
 * this thread straight away and have it join in.
 */
-   if (prev_state == KVMPPC_VCPU_STOPPED) {
+   if (!signal_pending(current)) {
 		if (vc->vcore_state == VCORE_RUNNING &&
 		    VCORE_EXIT_COUNT(vc) == 0) {
 			vcpu->arch.ptid = vc->n_runnable - 1;
 			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
+		} else if (vc->vcore_state == VCORE_SLEEPING) {
+			wake_up(&vc->wq);
}
 
-   } else if (prev_state == KVMPPC_VCPU_BUSY_IN_HOST)
-		--vc->n_busy;
+   }
 
 	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
 	       !signal_pending(current)) {
-		if (vc->n_busy || vc->vcore_state != VCORE_INACTIVE) {
+		if (vc->vcore_state != VCORE_INACTIVE) {
 			spin_unlock(&vc->lock);
 			kvmppc_wait_for_exec(vcpu, TASK_INTERRUPTIBLE);
 			spin_lock(&vc->lock);
 			continue;
}
-		vc->runner = vcpu;
-		n_ceded = 0;
-		list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
-			if (!v->arch.pending_exceptions)
-				n_ceded += v->arch.ceded;
-		if (n_ceded == vc->n_runnable)
-			kvmppc_vcore_blocked(vc);
-		else
-			kvmppc_run_core(vc);
-
 		list_for_each_entry_safe(v, vn, &vc->runnable_threads,
 					 arch.run_list) {
 			kvmppc_core_prepare_to_enter(v);
@@ -1236,23 +1221,40 @@ static int 

[PATCH 3/5] KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs

2012-10-15 Thread Paul Mackerras
This uses a bit in our record of the guest view of the HPTE to record
when the HPTE gets modified.  We use a reserved bit for this, and ensure
that this bit is always cleared in HPTE values returned to the guest.
The recording of modified HPTEs is only done if other code indicates
its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
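
A consumer (such as the HPT read interface later in this series) would
declare its interest roughly like this (sketch only):

	/* start tracking: HPTE changes will now set HPTE_GR_MODIFIED */
	atomic_inc(&kvm->arch.hpte_mod_interest);
	/* make the increment visible before depending on the tracking */
	smp_mb__after_atomic_inc();

	/* ... read out changed entries ... */

	/* done (e.g. migration finished): stop paying the tracking cost */
	atomic_dec(&kvm->arch.hpte_mod_interest);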

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |6 ++
 arch/powerpc/include/asm/kvm_host.h  |1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   25 ++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 1472a5b..4ca4f25 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -50,6 +50,12 @@ extern int kvm_hpt_order;/* order of 
preallocated HPTs */
 #define HPTE_V_HVLOCK  0x40UL
 #define HPTE_V_ABSENT  0x20UL
 
+/*
+ * We use this bit in the guest_rpte field of the revmap entry
+ * to indicate a modified HPTE.
+ */
+#define HPTE_GR_MODIFIED	(1ul << 62)
+
 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
unsigned long tmp, old;
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3093896..58c7264 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -248,6 +248,7 @@ struct kvm_arch {
atomic_t vcpus_running;
unsigned long hpt_npte;
unsigned long hpt_mask;
+   atomic_t hpte_mod_interest;
spinlock_t slot_phys_lock;
unsigned short last_vcpu[NR_CPUS];
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 3233587..c83c0ca 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -66,6 +66,18 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct 
revmap_entry *rev,
 }
 EXPORT_SYMBOL_GPL(kvmppc_add_revmap_chain);
 
+/*
+ * Note modification of an HPTE; set the HPTE modified bit
+ * if it wasn't modified before and anyone is interested.
+ */
+static inline void note_hpte_modification(struct kvm *kvm,
+ struct revmap_entry *rev)
+{
+	if (!(rev->guest_rpte & HPTE_GR_MODIFIED) &&
+	    atomic_read(&kvm->arch.hpte_mod_interest))
+		rev->guest_rpte |= HPTE_GR_MODIFIED;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
struct revmap_entry *rev,
@@ -287,8 +299,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long 
flags,
 	rev = &kvm->arch.revmap[pte_index];
 	if (realmode)
 		rev = real_vmalloc_addr(rev);
-	if (rev)
+	if (rev) {
 		rev->guest_rpte = g_ptel;
+		note_hpte_modification(kvm, rev);
+   }
 
/* Link HPTE into reverse-map chain */
 	if (pteh & HPTE_V_VALID) {
@@ -392,7 +406,8 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long 
flags,
/* Read PTE low word after tlbie to get final R/C values */
remove_revmap_chain(kvm, pte_index, rev, v, hpte[1]);
}
-	r = rev->guest_rpte;
+	r = rev->guest_rpte & ~HPTE_GR_MODIFIED;
+	note_hpte_modification(kvm, rev);
unlock_hpte(hpte, 0);
 
 	vcpu->arch.gpr[4] = v;
@@ -466,6 +481,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
 			args[j] = ((0x80 | flags) << 56) + pte_index;
 			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+			note_hpte_modification(kvm, rev);
 
 			if (!(hp[0] & HPTE_V_VALID)) {
/* insert R and C bits from PTE */
@@ -555,6 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long 
flags,
if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
 		rev->guest_rpte = r;
+		note_hpte_modification(kvm, rev);
}
 	r = (hpte[1] & ~mask) | bits;
 
@@ -606,8 +623,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long 
flags,
 			v &= ~HPTE_V_ABSENT;
 			v |= HPTE_V_VALID;
}
-		if (v & HPTE_V_VALID)
+		if (v & HPTE_V_VALID) {
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
+			r &= ~HPTE_GR_MODIFIED;
+		}
 		vcpu->arch.gpr[4 + i * 2] = v;
 		vcpu->arch.gpr[5 + i * 2] = r;
}
-- 
1.7.10.4



[PATCH 2/5] KVM: PPC: Book3S HV: Restructure HPT entry creation code

2012-10-15 Thread Paul Mackerras
This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.

Now, most of the work of kvmppc_virtmode_h_enter is done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory.  Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s.h |5 +++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   36 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   27 -
 3 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index ab73800..199b7fd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -157,8 +157,9 @@ extern void *kvmppc_pin_guest_page(struct kvm *kvm, 
unsigned long addr,
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
 extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel);
-extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-   long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
+   long pte_index, unsigned long pteh, unsigned long ptel,
+   pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7a4aae9..351f2ac 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -41,6 +41,10 @@
 /* Power architecture requires HPT is at least 256kB */
 #define PPC_MIN_HPT_ORDER  18
 
+static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+   long pte_index, unsigned long pteh,
+   unsigned long ptel, unsigned long *pte_idx_ret);
+
 long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 {
unsigned long hpt;
@@ -185,6 +189,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
unsigned long addr, hash;
unsigned long psize;
unsigned long hp0, hp1;
+   unsigned long idx_ret;
long ret;
struct kvm *kvm = vcpu-kvm;
 
@@ -216,7 +221,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
 		hash = (hash << 3) + 7;
 		hp_v = hp0 | ((addr >> 16) & ~0x7fUL);
 		hp_r = hp1 | addr;
-		ret = kvmppc_virtmode_h_enter(vcpu, H_EXACT, hash, hp_v, hp_r);
+		ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, hash, hp_v, hp_r,
+						 &idx_ret);
 		if (ret != H_SUCCESS) {
 			pr_err("KVM: map_vrma at %lx failed, ret=%ld\n",
   addr, ret);
@@ -354,15 +360,10 @@ static long kvmppc_get_guest_page(struct kvm *kvm, 
unsigned long gfn,
return err;
 }
 
-/*
- * We come here on a H_ENTER call from the guest when we are not
- * using mmu notifiers and we don't have the requested page pinned
- * already.
- */
-long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-   long pte_index, unsigned long pteh, unsigned long ptel)
+long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+   long pte_index, unsigned long pteh,
+   unsigned long ptel, unsigned long *pte_idx_ret)
 {
-	struct kvm *kvm = vcpu->kvm;
unsigned long psize, gpa, gfn;
struct kvm_memory_slot *memslot;
long ret;
@@ -390,8 +391,8 @@ long 

[PATCH 0/5] KVM: PPC: Book3S HV: HPT read/write functions for userspace

2012-10-15 Thread Paul Mackerras
This series of patches provides an interface by which userspace can
read and write the hashed page table (HPT) of a Book3S HV guest.
The interface is an ioctl which provides a file descriptor which can
be accessed with the read() and write() system calls.  The data read
and written is the guest view of the HPT, in which the second
doubleword of each HPTE (HPT entry) contains a guest physical address,
as distinct from the real HPT that the hardware accesses, where the
second doubleword of each HPTE contains a real address.

Because the HPT is divided into groups (HPTEGs) of 8 entries each,
where each HPTEG usually only contains a few valid entries, or none,
the data format that we use does run-length encoding of the invalid
entries, so in fact the invalid entries take up no space in the
stream.
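
Concretely, the stream is a sequence of records, each a small header
followed by `n_valid' 16-byte HPTEs; the `n_invalid' entries that
logically follow occupy no bytes at all.  A userspace reader can be
sketched as follows (error handling elided; save_hpte() is a
hypothetical callback, and the header layout is the one defined in
patch 5/5):

	char buf[65536];
	ssize_t len;

	while ((len = read(fd, buf, sizeof(buf))) > 0) {
		char *p = buf;

		while (p + sizeof(struct kvm_get_htab_header) <= buf + len) {
			struct kvm_get_htab_header *hdr = (void *)p;
			__u64 *hpte = (__u64 *)(hdr + 1);
			int j;

			for (j = 0; j < hdr->n_valid; ++j)
				save_hpte(hdr->index + j,
					  hpte[2 * j], hpte[2 * j + 1]);
			/* the next hdr->n_invalid HPTEs are simply absent */
			p += sizeof(*hdr) + hdr->n_valid * 16;
		}
	}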

The interface also provides for doing multiple passes over the HPT,
where the first pass provides information on all HPTEs, and subsequent
passes only return the HPTEs that have changed since the previous pass.

I have implemented a read/write interface rather than an mmap-based
interface because the data is not stored contiguously anywhere in
kernel memory.  Of each 16-byte HPTE, the first 8 bytes come from the
real HPT and the second 8 bytes come from the parallel vmalloc'd array
where we store the guest view of the guest physical address,
permissions, accessed/dirty bits etc.  Thus a mmap-based interface
would not be practicable (not without doubling the size of the
parallel array, typically requiring an extra 8MB of kernel memory per
guest).  This is also why I have not used the memslot interface for
this.

This implements the interface for HV-style KVM but not for PR-style
KVM.  Userspace does not need any additional interface with PR-style
KVM because userspace maintains the guest HPT already in that case,
and has an image of the guest view of the HPT in its address space.

This series is against the next branch of the kvm tree plus my
recently-posted set of 8 patches (Various Book3s HV fixes that
haven't been picked up yet).  The overall diffstat is:

 Documentation/virtual/kvm/api.txt|   53 +
 arch/powerpc/include/asm/kvm.h   |   24 ++
 arch/powerpc/include/asm/kvm_book3s.h|8 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |   24 ++
 arch/powerpc/include/asm/kvm_host.h  |1 +
 arch/powerpc/include/asm/kvm_ppc.h   |2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  380 +-
 arch/powerpc/kvm/book3s_hv.c |   12 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   71 --
 arch/powerpc/kvm/powerpc.c   |   17 ++
 include/linux/kvm.h  |3 +
 include/linux/kvm_host.h |   11 +-
 12 files changed, 559 insertions(+), 47 deletions(-)


[PATCH 1/5] KVM: Provide mmu notifier retry test based on struct kvm

2012-10-15 Thread Paul Mackerras
The mmu_notifier_retry() function, used to test whether any page
invalidations are in progress, currently takes a vcpu pointer, though
the code only needs the VM's struct kvm pointer.  Forthcoming patches
to the powerpc Book3S HV code will need to test for retry within a VM
ioctl, where a struct kvm pointer is available but a struct vcpu
pointer isn't.  Therefore this creates a variant of mmu_notifier_retry
called kvm_mmu_notifier_retry that takes a struct kvm pointer, and
implements mmu_notifier_retry in terms of it.
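
The usage pattern is unchanged; the new variant simply lets VM-level
code follow the same discipline without a vcpu (sketch):

	mmu_seq = kvm->mmu_notifier_seq;
	smp_rmb();	/* read the sequence before doing the page lookup */

	/* ... translate the gfn, find the host page ... */

	spin_lock(&kvm->mmu_lock);
	if (kvm_mmu_notifier_retry(kvm, mmu_seq)) {
		/* an invalidation raced with us; start over */
		spin_unlock(&kvm->mmu_lock);
		goto retry;
	}
	/* safe to install the translation while holding mmu_lock */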

Signed-off-by: Paul Mackerras pau...@samba.org
---
 include/linux/kvm_host.h |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..1cc1e1d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -841,9 +841,9 @@ extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long 
mmu_seq)
+static inline int kvm_mmu_notifier_retry(struct kvm *kvm, unsigned long 
mmu_seq)
 {
-	if (unlikely(vcpu->kvm->mmu_notifier_count))
+	if (unlikely(kvm->mmu_notifier_count))
return 1;
/*
 * Ensure the read of mmu_notifier_count happens before the read
@@ -856,10 +856,15 @@ static inline int mmu_notifier_retry(struct kvm_vcpu 
*vcpu, unsigned long mmu_se
 	 * can't rely on kvm->mmu_lock to keep things ordered.
 */
smp_rmb();
-	if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
+	if (kvm->mmu_notifier_seq != mmu_seq)
return 1;
return 0;
 }
+
+static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long 
mmu_seq)
+{
+	return kvm_mmu_notifier_retry(vcpu->kvm, mmu_seq);
+}
 #endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
-- 
1.7.10.4



[PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-15 Thread Paul Mackerras
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT.  There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags.  The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the bolted entries (those with the bolted bit, 0x10, set in
the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for
live migration.  Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
end of the HPT, it returns from the read.  Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.
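
From userspace, the intended calling sequence looks roughly like this
(sketch; error handling elided, vmfd obtained from KVM_CREATE_VM as
usual):

	struct kvm_get_htab_fd ghf = {
		.flags = 0,		/* read, all entries */
		.start_index = 0,
	};
	int htab_fd = ioctl(vmfd, KVM_PPC_GET_HTAB_FD, &ghf);

	/* first read(): full dump; later read()s: changed entries only */
	ssize_t len = read(htab_fd, buf, sizeof(buf));
	...
	close(htab_fd);

For the restore direction, KVM_GET_HTAB_WRITE is set in ghf.flags and
the same record format is written back with write().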

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt|   53 +
 arch/powerpc/include/asm/kvm.h   |   24 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_ppc.h   |2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  344 ++
 arch/powerpc/kvm/book3s_hv.c |   12 --
 arch/powerpc/kvm/powerpc.c   |   17 ++
 include/linux/kvm.h  |3 +
 8 files changed, 461 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 4258180..8df3e53 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2071,6 +2071,59 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; 
source cpu in parm
 
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
+4.78 KVM_PPC_GET_HTAB_FD
+
+Capability: KVM_CAP_PPC_HTAB_FD
+Architectures: powerpc
+Type: vm ioctl
+Parameters: Pointer to struct kvm_get_htab_fd (in)
+Returns: file descriptor number (= 0) on success, -1 on error
+
+This returns a file descriptor that can be used either to read out the
+entries in the guest's hashed page table (HPT), or to write entries to
+initialize the HPT.  The returned fd can only be written to if the
+KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
+can only be read if that bit is clear.  The argument struct looks like
+this:
+
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+   __u64   flags;
+   __u64   start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY   ((__u64)0x1)
+#define KVM_GET_HTAB_WRITE ((__u64)0x2)
+
+The `start_index' field gives the index in the HPT of the entry at
+which to start reading.  It is ignored when writing.
+
+Reads on the fd will initially supply information about all
+interesting HPT entries.  Interesting entries are those with the
+bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
+all entries.  When the end of the HPT is reached, the read() will
+return.  If read() is called again on the fd, it will start again from
+the beginning of the HPT, but will only return HPT entries that have
+changed since they were last read.
+
+Data read or written is structured as a header (8 bytes) followed by a
+series of valid HPT entries (16 bytes) each.  The header indicates how
+many valid HPT entries there are and how many invalid entries follow
+the valid entries.  The invalid entries are not represented explicitly
+in the stream.  The header format is:
+
+struct kvm_get_htab_header {
+   __u32   index;
+   __u16   n_valid;
+   __u16   n_invalid;
+};
+
+Writes to the fd create HPT entries starting at the index given in the
+header; first `n_valid' valid entries with contents from the data
+written, then `n_invalid' invalid entries, invalidating any previously
+valid entries found.
+
 
 5. The kvm_run structure
 
diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index b89ae4d..6518e38 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -331,6 +331,30 @@ struct kvm_book3e_206_tlb_params {
__u32 reserved[8];
 };
 
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+   __u64   flags;
+   __u64   start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY   ((__u64)0x1)
+#define KVM_GET_HTAB_WRITE ((__u64)0x2)
+
+/*
+ * Data read on the file descriptor is formatted as a series of
+ * records, each consisting of a header followed by a series of
+ * `n_valid' HPTEs (16 bytes each), which are all valid.  Following 
+ * those valid HPTEs there are `n_invalid' invalid HPTEs, which
+ * are not represented explicitly in the stream.  The same format
+ * is used for writing.
+ */
+struct 

[PATCH 4/5] KVM: PPC: Book3S HV: Make a HPTE removal function available

2012-10-15 Thread Paul Mackerras
This makes a HPTE removal function, kvmppc_do_h_remove(), available
outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
code.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s.h |3 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   19 +--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 199b7fd..4ac1c67 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -160,6 +160,9 @@ extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, 
unsigned long flags,
 extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel,
pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
+extern long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+   unsigned long pte_index, unsigned long avpn,
+   unsigned long *hpret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c83c0ca..505548a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -364,11 +364,10 @@ static inline int try_lock_tlbie(unsigned int *lock)
return old == 0;
 }
 
-long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
-unsigned long pte_index, unsigned long avpn,
-unsigned long va)
+long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+   unsigned long pte_index, unsigned long avpn,
+   unsigned long *hpret)
 {
-	struct kvm *kvm = vcpu->kvm;
unsigned long *hpte;
unsigned long v, r, rb;
struct revmap_entry *rev;
@@ -410,10 +409,18 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long 
flags,
note_hpte_modification(kvm, rev);
unlock_hpte(hpte, 0);
 
-	vcpu->arch.gpr[4] = v;
-	vcpu->arch.gpr[5] = r;
+   hpret[0] = v;
+   hpret[1] = r;
return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_remove);
+
+long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
+unsigned long pte_index, unsigned long avpn)
+{
+	return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn,
+				  &vcpu->arch.gpr[4]);
+}
 
 long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 {
-- 
1.7.10.4



[PATCH] MAINTAINERS: Add git tree link for PPC KVM

2012-10-15 Thread Michael Ellerman
Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 MAINTAINERS |1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e73060f..32dc107 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4244,6 +4244,7 @@ KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
 M: Alexander Graf ag...@suse.de
 L: kvm-...@vger.kernel.org
 W: http://kvm.qumranet.com
+T: git git://github.com/agraf/linux-2.6.git
 S: Supported
 F: arch/powerpc/include/asm/kvm*
 F: arch/powerpc/kvm/
-- 
1.7.9.5



[PATCH 8/8] KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()

2012-10-15 Thread Paul Mackerras
This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the bits variable.  Nevertheless it should still be fixed.
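
The point is the constraint letters: %3 is declared "r" (bits), i.e. a
GPR, so it needs a register-to-register move.  A minimal illustration
of the two forms (not from the patch):

	asm("li %0,%1" : "=r" (x) : "i" (42));	/* li takes an immediate */
	asm("mr %0,%1" : "=r" (x) : "r" (y));	/* mr takes a register   */

With "li %1,%3" the register number itself was assembled as the
immediate, which happened to give a non-zero (hence still correct)
result for any register other than r0.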

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0dd1d86..1472a5b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -60,7 +60,7 @@ static inline long try_lock_hpte(unsigned long *hpte, 
unsigned long bits)
 		     "	ori	%0,%0,%4\n"
 		     "	stdcx.	%0,0,%2\n"
 		     "	beq+	2f\n"
-		     "	li	%1,%3\n"
+		     "	mr	%1,%3\n"
 		     "2:	isync"
 		     : "=&r" (tmp), "=&r" (old)
 		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
-- 
1.7.10.4



[PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet

2012-10-15 Thread Paul Mackerras
This is a set of 8 patches of which the first 7 have been posted
previously and have had no comments.  The 8th is new, but is quite
trivial.  They fix a series of issues with HV-style KVM on ppc.
They only touch code that is specific to Book3S HV KVM.

Please apply.

Thanks,
Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/8] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run

2012-10-15 Thread Paul Mackerras
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |5 +--
 arch/powerpc/kvm/book3s_hv.c|   74 ++-
 2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 218534d..1e8cbd1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -563,9 +563,8 @@ struct kvm_vcpu_arch {
 };
 
 /* Values for vcpu-arch.state */
-#define KVMPPC_VCPU_STOPPED0
-#define KVMPPC_VCPU_BUSY_IN_HOST   1
-#define KVMPPC_VCPU_RUNNABLE   2
+#define KVMPPC_VCPU_NOTREADY   0
+#define KVMPPC_VCPU_RUNNABLE   1
 
 /* Values for vcpu-arch.io_gpr */
 #define KVM_MMIO_REG_MASK  0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 89995fa..61d2934 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -776,10 +776,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, 
unsigned int id)
 
kvmppc_mmu_book3s_hv_init(vcpu);
 
-   /*
-* We consider the vcpu stopped until we see the first run ioctl for it.
-*/
-   vcpu-arch.state = KVMPPC_VCPU_STOPPED;
+   vcpu-arch.state = KVMPPC_VCPU_NOTREADY;
 
init_waitqueue_head(vcpu-arch.cpu_run);
 
@@ -866,9 +863,8 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 {
if (vcpu-arch.state != KVMPPC_VCPU_RUNNABLE)
return;
-   vcpu-arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+   vcpu-arch.state = KVMPPC_VCPU_NOTREADY;
--vc-n_runnable;
-   ++vc-n_busy;
list_del(vcpu-arch.run_list);
 }
 
@@ -1169,7 +1165,6 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int n_ceded;
-   int prev_state;
struct kvmppc_vcore *vc;
struct kvm_vcpu *v, *vn;
 
@@ -1186,7 +1181,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
vcpu-arch.ceded = 0;
vcpu-arch.run_task = current;
vcpu-arch.kvm_run = kvm_run;
-   prev_state = vcpu-arch.state;
vcpu-arch.state = KVMPPC_VCPU_RUNNABLE;
list_add_tail(vcpu-arch.run_list, vc-runnable_threads);
++vc-n_runnable;
@@ -1196,35 +1190,26 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * If the vcore is already running, we may be able to start
 * this thread straight away and have it join in.
 */
-   if (prev_state == KVMPPC_VCPU_STOPPED) {
+   if (!signal_pending(current)) {
if (vc-vcore_state == VCORE_RUNNING 
VCORE_EXIT_COUNT(vc) == 0) {
vcpu-arch.ptid = vc-n_runnable - 1;
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
+   } else if (vc-vcore_state == VCORE_SLEEPING) {
+   wake_up(vc-wq);
}
 
-   } else if (prev_state == KVMPPC_VCPU_BUSY_IN_HOST)
-   --vc-n_busy;
+   }
 
while (vcpu-arch.state == KVMPPC_VCPU_RUNNABLE 
   !signal_pending(current)) {
-   if (vc-n_busy || vc-vcore_state != VCORE_INACTIVE) {
+   if (vc-vcore_state != VCORE_INACTIVE) {
spin_unlock(vc-lock);
kvmppc_wait_for_exec(vcpu, TASK_INTERRUPTIBLE);
spin_lock(vc-lock);
continue;
}
-   vc-runner = vcpu;
-   n_ceded = 0;
-   list_for_each_entry(v, vc-runnable_threads, arch.run_list)
-   if (!v-arch.pending_exceptions)
-   n_ceded += v-arch.ceded;
-   if (n_ceded == vc-n_runnable)
-   kvmppc_vcore_blocked(vc);
-   else
-   kvmppc_run_core(vc);
-
list_for_each_entry_safe(v, vn, vc-runnable_threads,
 arch.run_list) {
kvmppc_core_prepare_to_enter(v);
@@ -1236,23 +1221,40 @@ static int 

[PATCH 6/8] KVM: PPC: Book3S HV: Fix accounting of stolen time

2012-10-15 Thread Paul Mackerras
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu-arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |5 ++
 arch/powerpc/kvm/book3s_hv.c|  127 ++-
 2 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1e8cbd1..3093896 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -559,12 +559,17 @@ struct kvm_vcpu_arch {
unsigned long dtl_index;
u64 stolen_logged;
struct kvmppc_vpa slb_shadow;
+
+   spinlock_t tbacct_lock;
+   u64 busy_stolen;
+   u64 busy_preempt;
 #endif
 };
 
 /* Values for vcpu-arch.state */
 #define KVMPPC_VCPU_NOTREADY   0
 #define KVMPPC_VCPU_RUNNABLE   1
+#define KVMPPC_VCPU_BUSY_IN_HOST   2
 
 /* Values for vcpu-arch.io_gpr */
 #define KVM_MMIO_REG_MASK  0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61d2934..8b3c470 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -60,23 +60,74 @@
 /* Used to indicate that a guest page fault needs to be handled */
 #define RESUME_PAGE_FAULT  (RESUME_GUEST | RESUME_FLAG_ARCH1)
 
+/* Used as a null value for timebase values */
+#define TB_NIL (~(u64)0)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+/*
+ * We use the vcpu_load/put functions to measure stolen time.
+ * Stolen time is counted as time when either the vcpu is able to
+ * run as part of a virtual core, but the task running the vcore
+ * is preempted or sleeping, or when the vcpu needs something done
+ * in the kernel by the task running the vcpu, but that task is
+ * preempted or sleeping.  Those two things have to be counted
+ * separately, since one of the vcpu tasks will take on the job
+ * of running the core, and the other vcpu tasks in the vcore will
+ * sleep waiting for it to do that, but that sleep shouldn't count
+ * as stolen time.
+ *
+ * Hence we accumulate stolen time when the vcpu can run as part of
+ * a vcore using vc-stolen_tb, and the stolen time when the vcpu
+ * needs its task to do other things in the kernel (for example,
+ * service a page fault) in busy_stolen.  We don't accumulate
+ * stolen time for a vcore when it is inactive, or for a vcpu
+ * when it is in state RUNNING or NOTREADY.  NOTREADY is a bit of
+ * a misnomer; it means that the vcpu task is not executing in
+ * the KVM_VCPU_RUN ioctl, i.e. it is in userspace or elsewhere in
+ * the kernel.  We don't have any way of dividing up that time
+ * between time that the vcpu is genuinely stopped, time that
+ * the task is actively working on behalf of the vcpu, and time
+ * that the task is preempted, so we don't count any of it as
+ * stolen.
+ *
+ * Updates to busy_stolen are protected by arch.tbacct_lock;
+ * updates to vc-stolen_tb are protected by the arch.tbacct_lock
+ * of the vcpu that has taken responsibility for running the vcore
+ * (i.e. vc-runner).  The stolen times are measured in units of
+ * timebase ticks.  (Note that the != TB_NIL checks below are
+ * purely defensive; they should never fail.)
+ */
+
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	if (vc->runner == vcpu &&

[PATCH 2/8] KVM: PPC: Book3S HV: Fix some races in starting secondary threads

2012-10-15 Thread Paul Mackerras
Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads, which
if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count.  It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv.c|   41 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   11 ++---
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c5ddf04..77dec0f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -64,8 +64,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	local_paca->kvm_hstate.kvm_vcpu = vcpu;
-	local_paca->kvm_hstate.kvm_vcore = vc;
	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
		vc->stolen_tb += mftb() - vc->preempt_tb;
 }
@@ -880,6 +878,7 @@ static int kvmppc_grab_hwthread(int cpu)
 
/* Ensure the thread won't go into the kernel if it wakes */
	tpaca->kvm_hstate.hwthread_req = 1;
+	tpaca->kvm_hstate.kvm_vcpu = NULL;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -929,7 +928,6 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
smp_wmb();
 #if defined(CONFIG_PPC_ICP_NATIVE)  defined(CONFIG_SMP)
	if (vcpu->arch.ptid) {
-		kvmppc_grab_hwthread(cpu);
		xics_wake_cpu(cpu);
		++vc->n_woken;
}
@@ -955,7 +953,8 @@ static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
 
 /*
  * Check that we are on thread 0 and that any other threads in
- * this core are off-line.
+ * this core are off-line.  Then grab the threads so they can't
+ * enter the kernel.
  */
 static int on_primary_thread(void)
 {
@@ -967,6 +966,17 @@ static int on_primary_thread(void)
	while (++thr < threads_per_core)
if (cpu_online(cpu + thr))
return 0;
+
+   /* Grab all hw threads so they can't go into the kernel */
+	for (thr = 1; thr < threads_per_core; ++thr) {
+		if (kvmppc_grab_hwthread(cpu + thr)) {
+			/* Couldn't grab one; let the others go */
+			do {
+				kvmppc_release_hwthread(cpu + thr);
+			} while (--thr > 0);
+   return 0;
+   }
+   }
return 1;
 }
 
@@ -1015,16 +1025,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
}
 
/*
-* Make sure we are running on thread 0, and that
-* secondary threads are offline.
-*/
-	if (threads_per_core > 1 && !on_primary_thread()) {
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			vcpu->arch.ret = -EBUSY;
-   goto out;
-   }
-
-   /*
 * Assign physical thread IDs, first to non-ceded vcpus
 * and then to ceded ones.
 */
@@ -1043,15 +1043,22 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
	if (vcpu->arch.ceded)
		vcpu->arch.ptid = ptid++;
 
+   /*
+* Make sure we are running on thread 0, and that
+* secondary threads are offline.
+*/
+	if (threads_per_core > 1 && !on_primary_thread()) {
+		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+			vcpu->arch.ret = -EBUSY;
+		goto out;
+	}
+
	vc->stolen_tb += mftb() - vc->preempt_tb;
	vc->pcpu = smp_processor_id();
	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
		kvmppc_start_thread(vcpu);
		kvmppc_create_dtl_entry(vcpu, vc);
	}
-	/* Grab any remaining hw threads so they can't go into the kernel */
-	for (i = ptid; i < threads_per_core; ++i)
-		kvmppc_grab_hwthread(vc->pcpu + i);
 
preempt_disable();
	spin_unlock(&vc->lock);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 44b72fe..1e90ef6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -134,8 +134,11 @@ kvm_start_guest:
 
 27:/* XXX should handle hypervisor maintenance interrupts etc. here */
 
+   /* reload vcpu pointer after 

[PATCH 7/8] KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0

2012-10-15 Thread Paul Mackerras
Commit 55b665b026 (KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas) includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8b3c470..812764c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -811,9 +811,8 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union 
kvmppc_one_reg *val)
		addr = val->vpaval.addr;
		len = val->vpaval.length;
		r = -EINVAL;
-		if (len < sizeof(struct dtl_entry))
-			break;
-		if (addr && !vcpu->arch.vpa.next_gpa)
+		if (addr && (len < sizeof(struct dtl_entry) ||
+			     !vcpu->arch.vpa.next_gpa))
			break;
		len -= len % sizeof(struct dtl_entry);
		r = set_vpa(vcpu, &vcpu->arch.dtl, addr, len);
-- 
1.7.10.4



[PATCH 4/8] KVM: PPC: Book3S HV: Fixes for late-joining threads

2012-10-15 Thread Paul Mackerras
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest.  Currently this only happens
rarely, when a vcpu is first started.  This fixes some bugs and
omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make
a DTL entry for it.  Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core.  To handle this correctly
we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap.  A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |7 ---
 arch/powerpc/kvm/book3s_hv.c|   14 +++---
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 68f5a30..218534d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -289,9 +289,10 @@ struct kvmppc_vcore {
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
-#define VCORE_RUNNING  1
-#define VCORE_EXITING  2
-#define VCORE_SLEEPING 3
+#define VCORE_SLEEPING 1
+#define VCORE_STARTING 2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a737a4..89995fa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -336,6 +336,11 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, 
struct kvmppc_vpa *vpap)
 
 static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 {
+	if (!(vcpu->arch.vpa.update_pending ||
+	      vcpu->arch.slb_shadow.update_pending ||
+	      vcpu->arch.dtl.update_pending))
+		return;
+
	spin_lock(&vcpu->arch.vpa_update_lock);
	if (vcpu->arch.vpa.update_pending) {
		kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
@@ -1009,7 +1014,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
	vc->n_woken = 0;
	vc->nap_count = 0;
	vc->entry_exit_count = 0;
-	vc->vcore_state = VCORE_RUNNING;
+	vc->vcore_state = VCORE_STARTING;
	vc->in_guest = 0;
	vc->napping_threads = 0;
 
@@ -1062,6 +1067,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvmppc_create_dtl_entry(vcpu, vc);
}
 
+	vc->vcore_state = VCORE_RUNNING;
	preempt_disable();
	spin_unlock(&vc->lock);
 
@@ -1070,8 +1076,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
	srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
 
	__kvmppc_vcore_entry(NULL, vcpu0);
-	for (i = 0; i < threads_per_core; ++i)
-		kvmppc_release_hwthread(vc->pcpu + i);
 
	spin_lock(&vc->lock);
	/* disable sending of IPIs on virtual external irqs */
@@ -1080,6 +1084,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
	/* wait for secondary threads to finish writing their state to memory */
	if (vc->nap_count < vc->n_woken)
		kvmppc_wait_for_nap(vc);
+	for (i = 0; i < threads_per_core; ++i)
+		kvmppc_release_hwthread(vc->pcpu + i);
	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
	vc->vcore_state = VCORE_EXITING;
	spin_unlock(&vc->lock);
@@ -1170,6 +1176,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
	kvm_run->exit_reason = 0;
	vcpu->arch.ret = RESUME_GUEST;
	vcpu->arch.trap = 0;
+   kvmppc_update_vpas(vcpu);
 
/*
 * Synchronize with other threads in this virtual core
@@ -1193,6 +1200,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
		if (vc->vcore_state == VCORE_RUNNING &&
		    VCORE_EXIT_COUNT(vc) == 0) {
			vcpu->arch.ptid = vc->n_runnable - 1;
+   kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
}
 
-- 
1.7.10.4



[PATCH 1/8] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online

2012-10-15 Thread Paul Mackerras
When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, all of the cores (or at least all of
the cores where the KVM guest could run) to be running only one
active hardware thread.  This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition.  Complying with this restriction
is much easier if, from the host kernel's point of view, only one
hardware thread is active.

This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists.  The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.

Signed-off-by: Paul Mackerras pau...@samba.org
Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/include/asm/smp.h |8 +++
 arch/powerpc/kernel/smp.c  |   46 
 arch/powerpc/kvm/book3s_hv.c   |   12 +--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ebc24dc..b625a1a 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
 void generic_mach_cpu_die(void);
 void generic_set_cpu_dead(unsigned int cpu);
 int generic_check_cpu_restart(unsigned int cpu);
+
+extern void inhibit_secondary_onlining(void);
+extern void uninhibit_secondary_onlining(void);
+
+#else /* HOTPLUG_CPU */
+static inline void inhibit_secondary_onlining(void) {}
+static inline void uninhibit_secondary_onlining(void) {}
+
 #endif
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8d4214a..c4f420c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -417,6 +417,45 @@ int generic_check_cpu_restart(unsigned int cpu)
 {
return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
 }
+
+static atomic_t secondary_inhibit_count;
+
+/*
+ * Don't allow secondary CPU threads to come online
+ */
+void inhibit_secondary_onlining(void)
+{
+   /*
+* This makes secondary_inhibit_count stable during cpu
+* online/offline operations.
+*/
+   get_online_cpus();
+
+	atomic_inc(&secondary_inhibit_count);
+   put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
+
+/*
+ * Allow secondary CPU threads to come online again
+ */
+void uninhibit_secondary_onlining(void)
+{
+   get_online_cpus();
+	atomic_dec(&secondary_inhibit_count);
+   put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
+
+static int secondaries_inhibited(void)
+{
+	return atomic_read(&secondary_inhibit_count);
+}
+
+#else /* HOTPLUG_CPU */
+
+#define secondaries_inhibited()	0
+
 #endif
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
@@ -435,6 +474,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct 
task_struct *tidle)
 {
int rc, c;
 
+   /*
+* Don't allow secondary threads to come online if inhibited
+*/
+	if (threads_per_core > 1 && secondaries_inhibited() &&
+	    cpu % threads_per_core != 0)
+   return -EBUSY;
+
if (smp_ops == NULL ||
	    (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
return -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9a15da7..c5ddf04 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -47,6 +47,7 @@
 #include <asm/page.h>
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
+#include <asm/smp.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -1016,8 +1017,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Make sure we are running on thread 0, and that
 * secondary threads are offline.
-* XXX we should also block attempts to bring any
-* secondary threads online.
 */
	if (threads_per_core > 1 && !on_primary_thread()) {
		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
@@ -1730,11 +1729,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 
	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
	spin_lock_init(&kvm->arch.slot_phys_lock);
+
+   /*
+* Don't allow secondary CPU threads to come online
+* while any KVM VMs exist.
+*/
+   inhibit_secondary_onlining();
+
return 0;
 }
 
 void kvmppc_core_destroy_vm(struct kvm *kvm)
 {
+   uninhibit_secondary_onlining();
+
	if (kvm->arch.rma) {
		kvm_release_rma(kvm->arch.rma);
		kvm->arch.rma = NULL;
-- 
1.7.10.4


[PATCH 3/8] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock

2012-10-15 Thread Paul Mackerras
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock.  This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.

Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.

In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.

The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_asm.h |1 +
 arch/powerpc/kvm/book3s_hv.c   |   67 ++--
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index 76fdcfe..aabcdba 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -118,6 +118,7 @@
 
 #define RESUME_FLAG_NV  (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST   (1<<1)  /* Resume host? */
+#define RESUME_FLAG_ARCH1  (1<<2)
 
 #define RESUME_GUEST0
 #define RESUME_GUEST_NV RESUME_FLAG_NV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 77dec0f..3a737a4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -57,6 +57,9 @@
 /* #define EXIT_DEBUG_SIMPLE */
 /* #define EXIT_DEBUG_INT */
 
+/* Used to indicate that a guest page fault needs to be handled */
+#define RESUME_PAGE_FAULT  (RESUME_GUEST | RESUME_FLAG_ARCH1)
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
@@ -431,7 +434,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
  struct task_struct *tsk)
 {
int r = RESUME_HOST;
-   int srcu_idx;
 
	vcpu->stat.sum_exits++;
 
@@ -491,16 +493,12 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * have been handled already.
 */
case BOOK3S_INTERRUPT_H_DATA_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-			vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		r = RESUME_PAGE_FAULT;
		break;
	case BOOK3S_INTERRUPT_H_INST_STORAGE:
-		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = kvmppc_book3s_hv_page_fault(run, vcpu,
-			kvmppc_get_pc(vcpu), 0);
-		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
+		vcpu->arch.fault_dsisr = 0;
+   r = RESUME_PAGE_FAULT;
break;
/*
 * This occurs if the guest executes an illegal instruction.
@@ -984,22 +982,24 @@ static int on_primary_thread(void)
  * Run a set of guest threads on a physical core.
  * Called with vc-lock held.
  */
-static int kvmppc_run_core(struct kvmppc_vcore *vc)
+static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
struct kvm_vcpu *vcpu, *vcpu0, *vnext;
long ret;
u64 now;
int ptid, i, need_vpa_update;
int srcu_idx;
+   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
/* don't start if any threads have a signal pending */
need_vpa_update = 0;
	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
		if (signal_pending(vcpu->arch.run_task))
-			return 0;
-		need_vpa_update |= vcpu->arch.vpa.update_pending |
-			vcpu->arch.slb_shadow.update_pending |
-			vcpu->arch.dtl.update_pending;
+			return;
+		if (vcpu->arch.vpa.update_pending ||
+		    vcpu->arch.slb_shadow.update_pending ||
+		    vcpu->arch.dtl.update_pending)
+   vcpus_to_update[need_vpa_update++] = vcpu;
}
 
/*
@@ -1019,8 +1019,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if (need_vpa_update) {
		spin_unlock(&vc->lock);
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
-			kvmppc_update_vpas(vcpu);
+		for (i = 0; i < need_vpa_update; ++i)
+			kvmppc_update_vpas(vcpus_to_update[i]);
		spin_lock(&vc->lock);
}
 
@@ -1037,8 +1037,10 @@ static int 

Re: [PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet

2012-10-15 Thread Alexander Graf

On 15.10.2012, at 13:14, Paul Mackerras wrote:

 This is a set of 8 patches of which the first 7 have been posted
 previously and have had no comments.  The 8th is new, but is quite
 trivial.  They fix a series of issues with HV-style KVM on ppc.
 They only touch code that is specific to Book3S HV KVM.
 
 Please apply.

Sorry, I can't accept patches that haven't shown up on kvm@vger. Please send 
this patch set again with CC to kvm@vger.


Alex



[PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-15 Thread Alexander Graf
In order to support vhost, we need to be able to support ioeventfd.

This patch set adds support for ioeventfd to PPC and makes it possible to
do so without implementing irqfd along the way, as it requires an in-kernel
irqchip which we don't have yet.
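
For illustration, a minimal userspace sketch of how a VMM would then
register an ioeventfd with the generic KVM_IOEVENTFD vm ioctl; vmfd
and addr are placeholders for the caller's VM fd and the MMIO address
to trap:

	#include <err.h>
	#include <stdint.h>
	#include <sys/eventfd.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* signal an eventfd on 4-byte guest stores to a guest-physical addr */
	static int wire_ioeventfd(int vmfd, uint64_t addr)
	{
		int efd = eventfd(0, 0);
		struct kvm_ioeventfd ioefd = {
			.addr  = addr,
			.len   = 4,
			.fd    = efd,
			.flags = 0,	/* MMIO bus, no datamatch */
		};

		if (efd < 0 || ioctl(vmfd, KVM_IOEVENTFD, &ioefd) < 0)
			err(1, "KVM_IOEVENTFD");
		/* guest stores to addr now signal efd (e.g. a vhost worker)
		 * instead of taking a heavyweight exit to userspace */
		return efd;
	}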

Alex

Alexander Graf (2):
  KVM: Distangle eventfd code from irqchip
  KVM: PPC: Support eventfd

 arch/powerpc/kvm/Kconfig   |1 +
 arch/powerpc/kvm/Makefile  |4 +++-
 arch/powerpc/kvm/powerpc.c |   17 -
 include/linux/kvm_host.h   |   12 +++-
 virt/kvm/eventfd.c |6 ++
 5 files changed, 37 insertions(+), 3 deletions(-)



[PATCH 2/2] KVM: PPC: Support eventfd

2012-10-15 Thread Alexander Graf
In order to support the generic eventfd infrastructure on PPC, we need
to call into the generic KVM in-kernel device mmio code.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/Kconfig   |1 +
 arch/powerpc/kvm/Makefile  |4 +++-
 arch/powerpc/kvm/powerpc.c |   17 -
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 71f0cd9..4730c95 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
bool
select PREEMPT_NOTIFIERS
select ANON_INODES
+   select HAVE_KVM_EVENTFD
 
 config KVM_BOOK3S_HANDLER
bool
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index c2a0863..cd89658 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -6,7 +6,8 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 
-common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o \
+   eventfd.o)
 
 CFLAGS_44x_tlb.o  := -I.
 CFLAGS_e500_tlb.o := -I.
@@ -76,6 +77,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
 
 kvm-book3s_64-module-objs := \
../../../virt/kvm/kvm_main.o \
+   ../../../virt/kvm/eventfd.o \
powerpc.o \
emulate.o \
book3s.o \
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index deb0d59..900d8fc 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -314,6 +314,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PPC_IRQ_LEVEL:
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_ONE_REG:
+   case KVM_CAP_IOEVENTFD:
r = 1;
break;
 #ifndef CONFIG_KVM_BOOK3S_64_HV
@@ -613,6 +614,13 @@ int kvmppc_handle_load(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
	vcpu->mmio_is_write = 0;
	vcpu->arch.mmio_sign_extend = 0;
 
+	if (!kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			     bytes, run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+   return EMULATE_DONE;
+   }
+
return EMULATE_DO_MMIO;
 }
 
@@ -622,8 +630,8 @@ int kvmppc_handle_loads(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 {
int r;
 
-   r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
	vcpu->arch.mmio_sign_extend = 1;
+   r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
 
return r;
 }
@@ -661,6 +669,13 @@ int kvmppc_handle_store(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
}
}
 
+	if (!kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+			      bytes, run->mmio.data)) {
+		kvmppc_complete_mmio_load(vcpu, run);
+		vcpu->mmio_needed = 0;
+   return EMULATE_DONE;
+   }
+
return EMULATE_DO_MMIO;
 }
 
-- 
1.6.0.2



[PATCH 1/2] KVM: Distangle eventfd code from irqchip

2012-10-15 Thread Alexander Graf
The current eventfd code assumes that when we have eventfd, we also have
irqfd for in-kernel interrupt delivery. This is not necessarily true. On
PPC we don't have an in-kernel irqchip yet, but we can still easily
support eventfd.

Signed-off-by: Alexander Graf ag...@suse.de
---
 include/linux/kvm_host.h |   12 +++-
 virt/kvm/eventfd.c   |6 ++
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..f2f5880 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -884,10 +884,20 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) 
{}
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
 void kvm_eventfd_init(struct kvm *kvm);
+int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+
+#ifdef CONFIG_HAVE_KVM_IRQCHIP
 int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
-int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+#else
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
+{
+   return -EINVAL;
+}
+
+static inline void kvm_irqfd_release(struct kvm *kvm) {}
+#endif
 
 #else
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9718e98..d7424c8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
 
 #include "iodev.h"
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * 
  * irqfd: Allows an fd to be used to inject an interrupt to the guest
@@ -425,17 +426,21 @@ fail:
kfree(irqfd);
return ret;
 }
+#endif
 
 void
 kvm_eventfd_init(struct kvm *kvm)
 {
+#ifdef __KVM_HAVE_IOAPIC
	spin_lock_init(&kvm->irqfds.lock);
	INIT_LIST_HEAD(&kvm->irqfds.items);
	INIT_LIST_HEAD(&kvm->irqfds.resampler_list);
	mutex_init(&kvm->irqfds.resampler_lock);
+#endif
	INIT_LIST_HEAD(&kvm->ioeventfds);
 }
 
+#ifdef __KVM_HAVE_IOAPIC
 /*
  * shutdown any irqfd's that match fd+gsi
  */
@@ -555,6 +560,7 @@ static void __exit irqfd_module_exit(void)
 
 module_init(irqfd_module_init);
 module_exit(irqfd_module_exit);
+#endif
 
 /*
  * 
-- 
1.6.0.2



[PATCH] kvm/powerpc: Handle errors in secondary thread grabbing

2012-10-15 Thread Michael Ellerman
In the Book3s HV code, kvmppc_run_core() has logic to grab the secondary
threads of the physical core.

If for some reason a thread is stuck, kvmppc_grab_hwthread() can fail,
but currently we ignore the failure and continue into the guest. If the
stuck thread is in the kernel badness ensues.

Instead we should check for failure and bail out.

I've moved the grabbing prior to the startup of runnable threads, to simplify
the error case. AFAICS this is harmless, but I could be missing something
subtle.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---

Or we could just BUG_ON() ?
---
 arch/powerpc/kvm/book3s_hv.c |   22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 721d460..55925cd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -884,16 +884,30 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
	if (vcpu->arch.ceded)
		vcpu->arch.ptid = ptid++;
 
+	/*
+	 * Grab any remaining hw threads so they can't go into the kernel.
+	 * Do this early to simplify the cleanup path if it fails.
+	 */
+	for (i = ptid; i < threads_per_core; ++i) {
+		int j, rc = kvmppc_grab_hwthread(vc->pcpu + i);
+		if (rc) {
+			for (j = i - 1; j; j--)
+				kvmppc_release_hwthread(vc->pcpu + j);
+
+			list_for_each_entry(vcpu, &vc->runnable_threads,
+					arch.run_list)
+				vcpu->arch.ret = -EBUSY;
+
+			goto out;
+		}
+	}
+
	vc->stolen_tb += mftb() - vc->preempt_tb;
	vc->pcpu = smp_processor_id();
	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
		kvmppc_start_thread(vcpu);
		kvmppc_create_dtl_entry(vcpu, vc);
	}
-	/* Grab any remaining hw threads so they can't go into the kernel */
-	for (i = ptid; i < threads_per_core; ++i)
-		kvmppc_grab_hwthread(vc->pcpu + i);
-
	preempt_disable();
	spin_unlock(&vc->lock);
 
-- 
1.7.9.5



Re: [PATCH 0/8] Various Book3s HV fixes that haven't been picked up yet

2012-10-15 Thread Paul Mackerras
On Mon, Oct 15, 2012 at 02:00:54PM +0200, Alexander Graf wrote:
 
 Sorry, I can't accept patches that haven't shown up on kvm@vger. Please send 
 this patch set again with CC to kvm@vger.

Done; I didn't cc kvm-ppc this time since the patches haven't changed.

By the way, what is the purpose of kvm-ppc@vger.kernel.org?

Paul.


Re: [PATCH] kvm/powerpc: Handle errors in secondary thread grabbing

2012-10-15 Thread Paul Mackerras
Michael,

On Tue, Oct 16, 2012 at 11:15:50AM +1100, Michael Ellerman wrote:
 In the Book3s HV code, kvmppc_run_core() has logic to grab the secondary
 threads of the physical core.
 
 If for some reason a thread is stuck, kvmppc_grab_hwthread() can fail,
 but currently we ignore the failure and continue into the guest. If the
 stuck thread is in the kernel badness ensues.
 
 Instead we should check for failure and bail out.
 
 I've moved the grabbing prior to the startup of runnable threads, to simplify
 the error case. AFAICS this is harmless, but I could be missing something
 subtle.

Thanks for looking at this - but in fact this is fixed by my patch
entitled "KVM: PPC: Book3S HV: Fix some races in starting secondary
threads", submitted back on August 28.

Regards,
Paul.


[PATCH 1/5] KVM: Provide mmu notifier retry test based on struct kvm

2012-10-15 Thread Paul Mackerras
The mmu_notifier_retry() function, used to test whether any page
invalidations are in progress, currently takes a vcpu pointer, though
the code only needs the VM's struct kvm pointer.  Forthcoming patches
to the powerpc Book3S HV code will need to test for retry within a VM
ioctl, where a struct kvm pointer is available but a struct vcpu
pointer isn't.  Therefore this creates a variant of mmu_notifier_retry
called kvm_mmu_notifier_retry that takes a struct kvm pointer, and
implements mmu_notifier_retry in terms of it.
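
For illustration, the intended usage pattern from a VM ioctl then
looks like this sketch (mirroring how mmu_notifier_retry() is used
on the vcpu paths; the page-lookup step is elided):

	unsigned long mmu_seq;

 retry:
	mmu_seq = kvm->mmu_notifier_seq;
	smp_rmb();	/* read the seq before doing the page lookup */

	/* ... find and map the host page backing the guest address ... */

	spin_lock(&kvm->mmu_lock);
	if (kvm_mmu_notifier_retry(kvm, mmu_seq)) {
		/* an invalidation ran concurrently; drop the page and retry */
		spin_unlock(&kvm->mmu_lock);
		goto retry;
	}
	/* now safe to install the translation */
	spin_unlock(&kvm->mmu_lock);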

Signed-off-by: Paul Mackerras pau...@samba.org
---
 include/linux/kvm_host.h |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6afc5be..1cc1e1d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -841,9 +841,9 @@ extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
 
 #if defined(CONFIG_MMU_NOTIFIER)  defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long 
mmu_seq)
+static inline int kvm_mmu_notifier_retry(struct kvm *kvm, unsigned long 
mmu_seq)
 {
-	if (unlikely(vcpu->kvm->mmu_notifier_count))
+	if (unlikely(kvm->mmu_notifier_count))
return 1;
/*
 * Ensure the read of mmu_notifier_count happens before the read
@@ -856,10 +856,15 @@ static inline int mmu_notifier_retry(struct kvm_vcpu 
*vcpu, unsigned long mmu_se
	 * can't rely on kvm->mmu_lock to keep things ordered.
 */
smp_rmb();
-	if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
+	if (kvm->mmu_notifier_seq != mmu_seq)
return 1;
return 0;
 }
+
+static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long 
mmu_seq)
+{
+	return kvm_mmu_notifier_retry(vcpu->kvm, mmu_seq);
+}
 #endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
-- 
1.7.10.4



[PATCH 3/5] KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs

2012-10-15 Thread Paul Mackerras
This uses a bit in our record of the guest view of the HPTE to record
when the HPTE gets modified.  We use a reserved bit for this, and ensure
that this bit is always cleared in HPTE values returned to the guest.
The recording of modified HPTEs is only done if other code indicates
its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |6 ++
 arch/powerpc/include/asm/kvm_host.h  |1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   25 ++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 1472a5b..4ca4f25 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -50,6 +50,12 @@ extern int kvm_hpt_order;/* order of 
preallocated HPTs */
 #define HPTE_V_HVLOCK  0x40UL
 #define HPTE_V_ABSENT  0x20UL
 
+/*
+ * We use this bit in the guest_rpte field of the revmap entry
+ * to indicate a modified HPTE.
+ */
+#define HPTE_GR_MODIFIED	(1ul << 62)
+
 static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 {
unsigned long tmp, old;
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3093896..58c7264 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -248,6 +248,7 @@ struct kvm_arch {
atomic_t vcpus_running;
unsigned long hpt_npte;
unsigned long hpt_mask;
+   atomic_t hpte_mod_interest;
spinlock_t slot_phys_lock;
unsigned short last_vcpu[NR_CPUS];
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 3233587..c83c0ca 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -66,6 +66,18 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct 
revmap_entry *rev,
 }
 EXPORT_SYMBOL_GPL(kvmppc_add_revmap_chain);
 
+/*
+ * Note modification of an HPTE; set the HPTE modified bit
+ * if it wasn't modified before and anyone is interested.
+ */
+static inline void note_hpte_modification(struct kvm *kvm,
+ struct revmap_entry *rev)
+{
+	if (!(rev->guest_rpte & HPTE_GR_MODIFIED) &&
+	    atomic_read(&kvm->arch.hpte_mod_interest))
+		rev->guest_rpte |= HPTE_GR_MODIFIED;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
struct revmap_entry *rev,
@@ -287,8 +299,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long 
flags,
	rev = &kvm->arch.revmap[pte_index];
	if (realmode)
		rev = real_vmalloc_addr(rev);
-	if (rev)
+	if (rev) {
		rev->guest_rpte = g_ptel;
+   note_hpte_modification(kvm, rev);
+   }
 
/* Link HPTE into reverse-map chain */
	if (pteh & HPTE_V_VALID) {
@@ -392,7 +406,8 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long 
flags,
/* Read PTE low word after tlbie to get final R/C values */
remove_revmap_chain(kvm, pte_index, rev, v, hpte[1]);
}
-	r = rev->guest_rpte;
+	r = rev->guest_rpte & ~HPTE_GR_MODIFIED;
+	note_hpte_modification(kvm, rev);
	unlock_hpte(hpte, 0);
 
	vcpu->arch.gpr[4] = v;
@@ -466,6 +481,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
		args[j] = ((0x80 | flags) << 56) + pte_index;
		rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+   note_hpte_modification(kvm, rev);
 
		if (!(hp[0] & HPTE_V_VALID)) {
/* insert R and C bits from PTE */
@@ -555,6 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long 
flags,
if (rev) {
		r = (rev->guest_rpte & ~mask) | bits;
		rev->guest_rpte = r;
+		note_hpte_modification(kvm, rev);
	}
	r = (hpte[1] & ~mask) | bits;
 
@@ -606,8 +623,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long 
flags,
		v &= ~HPTE_V_ABSENT;
		v |= HPTE_V_VALID;
	}
-	if (v & HPTE_V_VALID)
+	if (v & HPTE_V_VALID) {
		r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
+		r &= ~HPTE_GR_MODIFIED;
+	}
	vcpu->arch.gpr[4 + i * 2] = v;
	vcpu->arch.gpr[5 + i * 2] = r;
}
-- 
1.7.10.4



[PATCH 2/5] KVM: PPC: Book3S HV: Restructure HPT entry creation code

2012-10-15 Thread Paul Mackerras
This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.

Now, most of the work of kvmppc_virtmode_h_enter is done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory.  Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s.h |5 +++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   36 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   27 -
 3 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index ab73800..199b7fd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -157,8 +157,9 @@ extern void *kvmppc_pin_guest_page(struct kvm *kvm, 
unsigned long addr,
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
 extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel);
-extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-   long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
+   long pte_index, unsigned long pteh, unsigned long ptel,
+   pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7a4aae9..351f2ac 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -41,6 +41,10 @@
 /* Power architecture requires HPT is at least 256kB */
 #define PPC_MIN_HPT_ORDER  18
 
+static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+   long pte_index, unsigned long pteh,
+   unsigned long ptel, unsigned long *pte_idx_ret);
+
 long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 {
unsigned long hpt;
@@ -185,6 +189,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
unsigned long addr, hash;
unsigned long psize;
unsigned long hp0, hp1;
+   unsigned long idx_ret;
long ret;
	struct kvm *kvm = vcpu->kvm;
 
@@ -216,7 +221,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
		hash = (hash << 3) + 7;
		hp_v = hp0 | ((addr >> 16) & ~0x7fUL);
		hp_r = hp1 | addr;
-		ret = kvmppc_virtmode_h_enter(vcpu, H_EXACT, hash, hp_v, hp_r);
+		ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, hash, hp_v, hp_r,
+						 &idx_ret);
if (ret != H_SUCCESS) {
			pr_err("KVM: map_vrma at %lx failed, ret=%ld\n",
   addr, ret);
@@ -354,15 +360,10 @@ static long kvmppc_get_guest_page(struct kvm *kvm, 
unsigned long gfn,
return err;
 }
 
-/*
- * We come here on a H_ENTER call from the guest when we are not
- * using mmu notifiers and we don't have the requested page pinned
- * already.
- */
-long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
-   long pte_index, unsigned long pteh, unsigned long ptel)
+long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+   long pte_index, unsigned long pteh,
+   unsigned long ptel, unsigned long *pte_idx_ret)
 {
-	struct kvm *kvm = vcpu->kvm;
unsigned long psize, gpa, gfn;
struct kvm_memory_slot *memslot;
long ret;
@@ -390,8 +391,8 @@ long 

[PATCH 0/5] KVM: PPC: Book3S HV: HPT read/write functions for userspace

2012-10-15 Thread Paul Mackerras
This series of patches provides an interface by which userspace can
read and write the hashed page table (HPT) of a Book3S HV guest.
The interface is an ioctl which provides a file descriptor which can
be accessed with the read() and write() system calls.  The data read
and written is the guest view of the HPT, in which the second
doubleword of each HPTE (HPT entry) contains a guest physical address,
as distinct from the real HPT that the hardware accesses, where the
second doubleword of each HPTE contains a real address.

Because the HPT is divided into groups (HPTEGs) of 8 entries each,
where each HPTEG usually only contains a few valid entries, or none,
the data format that we use does run-length encoding of the invalid
entries, so in fact the invalid entries take up no space in the
stream.

The interface also provides for doing multiple passes over the HPT,
where the first pass provides information on all HPTEs, and subsequent
passes only return the HPTEs that have changed since the previous pass.
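
For illustration, decoding one buffer returned by read() on the fd
looks roughly like this in userspace, using the header defined in
patch 5/5 (buf/nread and handle_hpte() are placeholders for the
consumer's buffer and per-entry handling):

	struct kvm_get_htab_header *hdr;
	char *p = buf, *end = buf + nread;
	int i;

	while (p + sizeof(*hdr) <= end) {
		hdr = (struct kvm_get_htab_header *)p;
		__u64 *hpte = (__u64 *)(hdr + 1);	/* 2 doublewords per HPTE */

		for (i = 0; i < hdr->n_valid; ++i)
			handle_hpte(hdr->index + i, hpte[2 * i], hpte[2 * i + 1]);
		/* the hdr->n_invalid invalid entries that follow take up
		 * no space at all in the stream */
		p = (char *)&hpte[2 * hdr->n_valid];
	}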

I have implemented a read/write interface rather than an mmap-based
interface because the data is not stored contiguously anywhere in
kernel memory.  Of each 16-byte HPTE, the first 8 bytes come from the
real HPT and the second 8 bytes come from the parallel vmalloc'd array
where we store the guest view of the guest physical address,
permissions, accessed/dirty bits etc.  Thus a mmap-based interface
would not be practicable (not without doubling the size of the
parallel array, typically requiring an extra 8MB of kernel memory per
guest).  This is also why I have not used the memslot interface for
this.

This implements the interface for HV-style KVM but not for PR-style
KVM.  Userspace does not need any additional interface with PR-style
KVM because userspace maintains the guest HPT already in that case,
and has an image of the guest view of the HPT in its address space.

This series is against the next branch of the kvm tree plus my
recently-posted set of 8 patches (Various Book3s HV fixes that
haven't been picked up yet).  The overall diffstat is:

 Documentation/virtual/kvm/api.txt|   53 +
 arch/powerpc/include/asm/kvm.h   |   24 ++
 arch/powerpc/include/asm/kvm_book3s.h|8 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |   24 ++
 arch/powerpc/include/asm/kvm_host.h  |1 +
 arch/powerpc/include/asm/kvm_ppc.h   |2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  380 +-
 arch/powerpc/kvm/book3s_hv.c |   12 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |   71 --
 arch/powerpc/kvm/powerpc.c   |   17 ++
 include/linux/kvm.h  |3 +
 include/linux/kvm_host.h |   11 +-
 12 files changed, 559 insertions(+), 47 deletions(-)


[PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-15 Thread Paul Mackerras
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT.  There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags.  The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the bolted entries (those with the bolted bit, 0x10, set in
the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for
live migration.  Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
end of the HPT, it returns from the read.  Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.
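
For illustration, a hedged sketch of how userspace (for example
qemu's savevm code) might drive this; vmfd and decode_records() are
placeholders, and the structures are those defined below:

	struct kvm_get_htab_fd ghf = {
		.flags = 0,		/* reading all entries; bolted-only
					 * would set KVM_GET_HTAB_BOLTED_ONLY */
		.start_index = 0,
	};
	int fd = ioctl(vmfd, KVM_PPC_GET_HTAB_FD, &ghf);
	char buf[65536];
	ssize_t n;

	/* the first pass returns every entry of interest and then stops;
	 * calling read() again returns only entries changed since they
	 * were last read, or 0 bytes if nothing changed */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		decode_records(buf, n);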

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt|   53 +
 arch/powerpc/include/asm/kvm.h   |   24 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
 arch/powerpc/include/asm/kvm_ppc.h   |2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  344 ++
 arch/powerpc/kvm/book3s_hv.c |   12 --
 arch/powerpc/kvm/powerpc.c   |   17 ++
 include/linux/kvm.h  |3 +
 8 files changed, 461 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 4258180..8df3e53 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2071,6 +2071,59 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; 
source cpu in parm
 
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
+4.78 KVM_PPC_GET_HTAB_FD
+
+Capability: KVM_CAP_PPC_HTAB_FD
+Architectures: powerpc
+Type: vm ioctl
+Parameters: Pointer to struct kvm_get_htab_fd (in)
+Returns: file descriptor number (= 0) on success, -1 on error
+
+This returns a file descriptor that can be used either to read out the
+entries in the guest's hashed page table (HPT), or to write entries to
+initialize the HPT.  The returned fd can only be written to if the
+KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
+can only be read if that bit is clear.  The argument struct looks like
+this:
+
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+   __u64   flags;
+   __u64   start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY   ((__u64)0x1)
+#define KVM_GET_HTAB_WRITE ((__u64)0x2)
+
+The `start_index' field gives the index in the HPT of the entry at
+which to start reading.  It is ignored when writing.
+
+Reads on the fd will initially supply information about all
+interesting HPT entries.  Interesting entries are those with the
+bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
+all entries.  When the end of the HPT is reached, the read() will
+return.  If read() is called again on the fd, it will start again from
+the beginning of the HPT, but will only return HPT entries that have
+changed since they were last read.
+
+Data read or written is structured as a header (8 bytes) followed by a
+series of valid HPT entries (16 bytes) each.  The header indicates how
+many valid HPT entries there are and how many invalid entries follow
+the valid entries.  The invalid entries are not represented explicitly
+in the stream.  The header format is:
+
+struct kvm_get_htab_header {
+   __u32   index;
+   __u16   n_valid;
+   __u16   n_invalid;
+};
+
+Writes to the fd create HPT entries starting at the index given in the
+header; first `n_valid' valid entries with contents from the data
+written, then `n_invalid' invalid entries, invalidating any previously
+valid entries found.
+
 
 5. The kvm_run structure
 
diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index b89ae4d..6518e38 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -331,6 +331,30 @@ struct kvm_book3e_206_tlb_params {
__u32 reserved[8];
 };
 
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+   __u64   flags;
+   __u64   start_index;
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY   ((__u64)0x1)
+#define KVM_GET_HTAB_WRITE ((__u64)0x2)
+
+/*
+ * Data read on the file descriptor is formatted as a series of
+ * records, each consisting of a header followed by a series of
+ * `n_valid' HPTEs (16 bytes each), which are all valid.  Following 
+ * those valid HPTEs there are `n_invalid' invalid HPTEs, which
+ * are not represented explicitly in the stream.  The same format
+ * is used for writing.
+ */
+struct 

[PATCH 4/5] KVM: PPC: Book3S HV: Make a HPTE removal function available

2012-10-15 Thread Paul Mackerras
This makes a HPTE removal function, kvmppc_do_h_remove(), available
outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
code.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s.h |3 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   19 +--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 199b7fd..4ac1c67 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -160,6 +160,9 @@ extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, 
unsigned long flags,
 extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel,
pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
+extern long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+   unsigned long pte_index, unsigned long avpn,
+   unsigned long *hpret);
 extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot, unsigned long *map);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c83c0ca..505548a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -364,11 +364,10 @@ static inline int try_lock_tlbie(unsigned int *lock)
return old == 0;
 }
 
-long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
-unsigned long pte_index, unsigned long avpn,
-unsigned long va)
+long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+   unsigned long pte_index, unsigned long avpn,
+   unsigned long *hpret)
 {
-	struct kvm *kvm = vcpu->kvm;
unsigned long *hpte;
unsigned long v, r, rb;
struct revmap_entry *rev;
@@ -410,10 +409,18 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long 
flags,
note_hpte_modification(kvm, rev);
unlock_hpte(hpte, 0);
 
-	vcpu->arch.gpr[4] = v;
-	vcpu->arch.gpr[5] = r;
+   hpret[0] = v;
+   hpret[1] = r;
return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_remove);
+
+long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
+unsigned long pte_index, unsigned long avpn)
+{
+	return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn,
+				  &vcpu->arch.gpr[4]);
+}
 
 long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 {
-- 
1.7.10.4



[PATCH] MAINTAINERS: Add git tree link for PPC KVM

2012-10-15 Thread Michael Ellerman
Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 MAINTAINERS |1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e73060f..32dc107 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4244,6 +4244,7 @@ KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
 M: Alexander Graf ag...@suse.de
 L: kvm-ppc@vger.kernel.org
 W: http://kvm.qumranet.com
+T: git git://github.com/agraf/linux-2.6.git
 S: Supported
 F: arch/powerpc/include/asm/kvm*
 F: arch/powerpc/kvm/
-- 
1.7.9.5
