Got bash 'Resource temporarily unavailable' when testing linux_s3 (kvm)

2009-10-02 Thread Amos Kong

Can anybody help with linux_s3 (kvm autotest)?

When running the linux_s3 testcase against kvm, we consistently get
'bash: echo: write error: Resource temporarily unavailable'.

This testcase can be found at
autotest/client/tests/kvm/tests/linux_s3.py
(or autotest/client/tests/kvm/kvm_tests.py in the old version).

The testing command is:
'chvt %s && echo mem > /sys/power/state && chvt %s' % (dst_tty,
src_tty)
e.g.
'chvt 1 && echo mem > /sys/power/state && chvt 7'


Is there any chance that the 'echo' command is executed before 'chvt
1' has taken full effect? (Just my wild guess.)


I'd appreciate your help.


Regards,

-- 
Amos Kong
Quality Engineer
Raycom Office (Beijing), Red Hat Inc.
Phone: +86-10-62608183


avoid soft lockups

2009-10-02 Thread James Brackinshaw
Hello,

If I suspend and resume a guest under kvm, or I migrate it from one
host to another, I sometimes get a soft lockup.

Is there a timer mode to prevent or reduce the likelihood of these?

Thanks

JB


Re: avoid soft lockups

2009-10-02 Thread Marcelo Tosatti
On Fri, Oct 02, 2009 at 01:54:22PM +0200, James Brackinshaw wrote:
 Hello,
 
 If I suspend and resume a guest under kvm, or I migrate it from one
 host to another, I sometimes get a soft lockup.
 
 Is there a timer mode to prevent or reduce the likelihood of these?

Not yet. For now you can either disable softlockup in the guest, or live
with the spurious warnings.




Re: Got bash 'Resource temporarily unavailable' when testing linux_s3 (kvm)

2009-10-02 Thread Marcelo Tosatti
On Fri, Oct 02, 2009 at 05:44:35PM +0800, Amos Kong wrote:
 
 Can anybody help with linux_s3 (kvm autotest)?
 
 When running the linux_s3 testcase against kvm, we consistently get
 'bash: echo: write error: Resource temporarily unavailable'.
 
 This testcase can be found at
 autotest/client/tests/kvm/tests/linux_s3.py
 (or autotest/client/tests/kvm/kvm_tests.py in the old version).
 
 The testing command is:
 'chvt %s && echo mem > /sys/power/state && chvt %s' % (dst_tty,
 src_tty)
 e.g.
 'chvt 1 && echo mem > /sys/power/state && chvt 7'
 
 
 Is there any chance that the 'echo' command is executed before 'chvt
 1' has taken full effect? (Just my wild guess.)
 
 
 I'd appreciate your help.

There was a bug in virtio-balloon, fixed in 2.6.30, that prevented proper
suspend-to-RAM.

Can you share dmesg output after the failure?




Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting

2009-10-02 Thread Marcelo Tosatti
On Wed, Sep 30, 2009 at 01:22:49PM -0300, Marcelo Tosatti wrote:
 On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote:
  Avi,
  I modified it according to your comments. The only thing I want to keep is
  the module params ple_gap/window.  Although they are not per-guest, they
  can be used to find the right value, and to disable PLE for debugging.
 
  Thanks,
 
 
  Avi Kivity wrote:
  On 09/28/2009 11:33 AM, Zhai, Edwin wrote:

  Avi Kivity wrote:
  
  +#define KVM_VMX_DEFAULT_PLE_GAP    41
  +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
  +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
  +module_param(ple_gap, int, S_IRUGO);
  +
  +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
  +module_param(ple_window, int, S_IRUGO);
 
  Shouldn't be __read_mostly since they're read very rarely  
  (__read_mostly should be for variables that are very often read, 
  and rarely written).

  In general, they are read-only, except that an experienced user may try
  different parameters for perf tuning.
  
 
 
  __read_mostly doesn't just mean it's read mostly.  It also means it's  
  read often.  Otherwise it's just wasting space in hot cachelines.
 

  I'm not even sure they should be parameters.

  For different spinlocks in different OSes, and for different workloads,
  we need different parameters for tuning. It's similar to
  enable_ept.
  
 
  No, global parameters don't work for tuning workloads and guests since  
  they cannot be modified on a per-guest basis.  enable_ept is only 
  useful for debugging and testing.
 

  +set_current_state(TASK_INTERRUPTIBLE);
  +schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
  +
  
  Please add a tracepoint for this (since it can cause significant  
  change in behaviour),   
  Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE  
  vmexit from other vmexits.
  
 
  Right.  I thought of the software spinlock detector, but that's another 
  problem.
 
  I think you can drop the sleep_time parameter, it can be part of the  
  function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.   
  Please call it kvm_vcpu_on_spin() or something (since that's what the  
  guest is doing).
 
 kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending
 interrupt wakes it up immediately).

Updated version (also please send it separately from the vmx.c patch):

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 894a56e..43125dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -231,6 +231,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4d0dd39..e788d70 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1479,6 +1479,21 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+{
+   ktime_t expires;
+   DEFINE_WAIT(wait);
+
+   prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+   /* Sleep for 100 us, and hope lock-holder got scheduled */
+   expires = ktime_add_ns(ktime_get(), 100000UL);
+   schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+
+   finish_wait(&vcpu->wq, &wait);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
+
 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
	struct kvm_vcpu *vcpu = vma->vm_file->private_data;


Re: INFO: task journal:337 blocked for more than 120 seconds

2009-10-02 Thread Jeremy Fitzhardinge
On 09/30/09 14:11, Shirley Ma wrote:
 Has anybody seen this problem before? I keep hitting this issue with a
 2.6.31 guest kernel, even with a simple network test.

 INFO: task kjournal:337 blocked for more than 120 seconds.
 echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.

 kjournald D 0041  0   337 2 0x

 My test is totally being blocked.

I'm assuming from the lists you've posted to that this is under KVM? 
What disk drivers are you using (virtio or emulated)?

Can you get a full stack backtrace of kjournald?

Kevin Bowling submitted a RH bug against Xen with apparently the same
symptoms (https://bugzilla.redhat.com/show_bug.cgi?id=526627).  I'm
wondering if there's a core kernel bug here, which is perhaps more
easily triggered by the changed timing in a virtual machine.

Thanks,
J


Re: INFO: task journal:337 blocked for more than 120 seconds

2009-10-02 Thread Jeremy Fitzhardinge
On 10/02/09 12:06, Shirley Ma wrote:
 On Fri, 2009-10-02 at 11:30 -0700, Jeremy Fitzhardinge wrote:
   
 I'm assuming from the lists you've posted to that this is under KVM? 
 What disk drivers are you using (virtio or emulated)?

 Can you get a full stack backtrace of kjournald?
 
 Yes, it's under KVM, and the disk driver is virtio. Since the I/O has
 issues, the stack can't be saved to disk. I have the image file attached
 here.
   

Ah, thank you.  The backtrace does indeed look very similar.

(BTW, you could get a serial console with qemu-kvm -nographic -append
console=ttyS0 ...)

J


[PATCH][retry 2] Support Pause Filter in AMD processors

2009-10-02 Thread Mark Langsdorf
From 66741f741da741e58e8162ef7809dd7d6f8e01cf Mon Sep 17 00:00:00 2001
From: Mark Langsdorf mark.langsd...@amd.com
Date: Fri, 2 Oct 2009 10:32:33 -0500
Subject: [PATCH] Support Pause Filter in AMD processors

New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature.  This feature creates a new field in the VMCB
called Pause Filter Count.  If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Count to 3000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm that caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.

This patch depends on the changes to the kvm code from
KVM:VMX: Add support for Pause Loop Exiting
http://www.mail-archive.com/kvm@vger.kernel.org/msg23089.html

-Mark Langsdorf
Operating System Research Center
AMD
---
 arch/x86/include/asm/svm.h |3 ++-
 arch/x86/kvm/svm.c |   16 
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u16 intercept_dr_write;
u32 intercept_exceptions;
u64 intercept;
-   u8 reserved_1[44];
+   u8 reserved_1[42];
+   u16 pause_filter_count;
u64 iopm_base_pa;
u64 msrpm_base_pa;
u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9a4daca..d5d2e03 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -46,6 +46,7 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_PAUSE_FILTER (1 << 10)
 
 #define NESTED_EXIT_HOST   0   /* Exit handled on host level */
 #define NESTED_EXIT_DONE   1   /* Exit caused nested vmexit  */
@@ -659,6 +660,11 @@ static void init_vmcb(struct vcpu_svm *svm)
svm-nested.vmcb = 0;
svm-vcpu.arch.hflags = 0;
 
+   if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
+   control->pause_filter_count = 3000;
+   control->intercept |= (1ULL << INTERCEPT_PAUSE);
+   }
+
enable_gif(svm);
 }
 
@@ -2270,6 +2276,15 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
return 1;
 }
 
+static int pause_interception(struct vcpu_svm *svm)
+{
+   static int pause_count = 0;
+
+   kvm_vcpu_on_spin(&(svm->vcpu));
+printk(KERN_ERR "MJLL pause intercepted %d\n", ++pause_count);
+   return 1;
+}
+
 static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = emulate_on_interception,
[SVM_EXIT_READ_CR3] = emulate_on_interception,
@@ -2305,6 +2320,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_CPUID]= cpuid_interception,
[SVM_EXIT_IRET] = iret_interception,
[SVM_EXIT_INVD] = emulate_on_interception,
+   [SVM_EXIT_PAUSE]= pause_interception,
[SVM_EXIT_HLT]  = halt_interception,
[SVM_EXIT_INVLPG]   = invlpg_interception,
[SVM_EXIT_INVLPGA]  = invlpga_interception,
-- 
1.6.0.2




docs on storage pools?

2009-10-02 Thread Richard Wurman
So far I've been using files and/or LVM partitions for my VMs --
basically by using virt-manager and modifying existing XML configs and
just copying my VM files to be reused.

I'm wondering how KVM storage pools work -- at first I thought it was
something like KVM's version of LVM where you can just dump all your
VMs in one space... but it looks like it really just means different
places you want to store your VMs:

- dir: Filesystem Directory
- disk: Physical Disk Device
- fs: Pre-Formatted Block Device
- iscsi: iSCSI Target
- logical: LVM Volume Group
- netfs: Network exported directory

I understand things like LVM and storing VMs in a filesystem
directory... but what real difference does going through the
GUI make? I suppose none. Maybe I'm overthinking this -- it's just a
frontend to where you store your VMs?


[PATCH v2 0/4] KVM: xinterface

2009-10-02 Thread Gregory Haskins
(Applies to kvm.git/master:083e9e10)

For details, please read the patch headers.

[ Changelog:

 v2:
*) We now re-use the vmfd as the binding token, instead of creating
   a new separate namespace
*) Added support for switch_to(mm), which is much faster
*) Added support for memslot-cache for exploiting slot locality
*) Added support for scatter-gather access
*) Added support for xioevent interface

  v1:
*) Initial release
]

This series is included in upstream AlacrityVM and is well tested and
known to work properly.  Comments?

Kind Regards,
-Greg

---

Gregory Haskins (4):
  KVM: add scatterlist support to xinterface
  KVM: add io services to xinterface
  KVM: introduce xinterface API for external interaction with guests
  mm: export use_mm() and unuse_mm() to modules


 arch/x86/kvm/Makefile  |2 
 include/linux/kvm_host.h   |3 
 include/linux/kvm_xinterface.h |  165 +++
 kernel/fork.c  |1 
 mm/mmu_context.c   |3 
 virt/kvm/kvm_main.c|   24 ++
 virt/kvm/xinterface.c  |  587 
 7 files changed, 784 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/kvm_xinterface.h
 create mode 100644 virt/kvm/xinterface.c

-- 
Signature


[PATCH v2 1/4] mm: export use_mm() and unuse_mm() to modules

2009-10-02 Thread Gregory Haskins
We want to use these functions from within KVM, which may be built as
a module.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 mm/mmu_context.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index ded9081..f31ba20 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -6,6 +6,7 @@
 #include <linux/mm.h>
 #include <linux/mmu_context.h>
 #include <linux/sched.h>
+#include <linux/module.h>
 
 #include <asm/mmu_context.h>
 
@@ -37,6 +38,7 @@ void use_mm(struct mm_struct *mm)
if (active_mm != mm)
mmdrop(active_mm);
 }
+EXPORT_SYMBOL_GPL(use_mm);
 
 /*
  * unuse_mm
@@ -56,3 +58,4 @@ void unuse_mm(struct mm_struct *mm)
enter_lazy_tlb(mm, tsk);
task_unlock(tsk);
 }
+EXPORT_SYMBOL_GPL(unuse_mm);



[PATCH v2 2/4] KVM: introduce xinterface API for external interaction with guests

2009-10-02 Thread Gregory Haskins
What: xinterface is a mechanism that allows kernel modules external to
the kvm.ko proper to interface with a running guest.  It accomplishes
this by creating an abstracted interface which does not expose any
private details of the guest or its related KVM structures, and provides
a mechanism to find and bind to this interface at run-time.

Why: There are various subsystems that would like to interact with a KVM
guest which are ideally suited to exist outside the domain of the kvm.ko
core logic. For instance, external pci-passthrough, virtual-bus, and
virtio-net modules are currently under development.  In order for these
modules to successfully interact with the guest, they need, at the very
least, various interfaces for signaling IO events, pointer translation,
and possibly memory mapping.

The signaling case is covered by the recent introduction of the
irqfd/ioeventfd mechanisms.  This patch provides a mechanism to cover the
other cases.  Note that today we only expose pointer-translation related
functions, but more could be added at a future date as needs arise.

Example usage: QEMU instantiates a guest, and an external module foo
that desires the ability to interface with the guest (say via
open("/dev/foo")).  QEMU may then pass the kvmfd to foo via an
ioctl, such as: ioctl(foofd, FOO_SET_VMID, kvmfd).  Upon receipt, the
foo module can issue kvm_xinterface_bind(kvmfd) to acquire
the proper context.  Internally, the struct kvm* and associated
struct module* will remain pinned at least until the foo module calls
kvm_xinterface_put().
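
As a rough sketch of that flow (not part of this patch -- struct foo_ctx,
the ioctl plumbing, and the assumption that copy_from() returns the number
of bytes left uncopied, like copy_from_user(), are all illustrative):

#include <linux/err.h>
#include <linux/kvm_xinterface.h>

/* Hypothetical per-instance state for the foo module. */
struct foo_ctx {
	struct kvm_xinterface *intf;
};

/* FOO_SET_VMID handler: bind to the guest behind kvmfd. */
static int foo_set_vmid(struct foo_ctx *ctx, int kvmfd)
{
	struct kvm_xinterface *intf = kvm_xinterface_bind(kvmfd);

	if (IS_ERR(intf))
		return PTR_ERR(intf);

	/* pins struct kvm and kvm.ko until kvm_xinterface_put() */
	ctx->intf = intf;
	return 0;
}

/* Copy len bytes out of guest memory at guest-physical address gpa. */
static int foo_read_guest(struct foo_ctx *ctx, unsigned long gpa,
			  void *buf, unsigned long len)
{
	if (ctx->intf->ops->copy_from(ctx->intf, buf, gpa, len))
		return -EFAULT;
	return 0;
}

/* Drop the binding again. */
static void foo_detach(struct foo_ctx *ctx)
{
	kvm_xinterface_put(ctx->intf);
}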

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 arch/x86/kvm/Makefile  |2 
 include/linux/kvm_host.h   |3 
 include/linux/kvm_xinterface.h |  114 +++
 kernel/fork.c  |1 
 virt/kvm/kvm_main.c|   24 ++
 virt/kvm/xinterface.c  |  409 
 6 files changed, 552 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/kvm_xinterface.h
 create mode 100644 virt/kvm/xinterface.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..0449d6e 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o xinterface.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b985a29..7cc1afb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -362,6 +362,9 @@ void kvm_arch_sync_events(struct kvm *kvm);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
+struct kvm_xinterface *
+kvm_xinterface_alloc(struct kvm *kvm, struct module *owner);
+
 int kvm_is_mmio_pfn(pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
diff --git a/include/linux/kvm_xinterface.h b/include/linux/kvm_xinterface.h
new file mode 100644
index 000..01f092b
--- /dev/null
+++ b/include/linux/kvm_xinterface.h
@@ -0,0 +1,114 @@
+#ifndef __KVM_XINTERFACE_H
+#define __KVM_XINTERFACE_H
+
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kref.h>
+#include <linux/module.h>
+#include <linux/file.h>
+
+struct kvm_xinterface;
+struct kvm_xvmap;
+
+struct kvm_xinterface_ops {
+   unsigned long (*copy_to)(struct kvm_xinterface *intf,
+unsigned long gpa, const void *src,
+unsigned long len);
+   unsigned long (*copy_from)(struct kvm_xinterface *intf, void *dst,
+  unsigned long gpa, unsigned long len);
+   struct kvm_xvmap* (*vmap)(struct kvm_xinterface *intf,
+ unsigned long gpa,
+ unsigned long len);
+   void (*release)(struct kvm_xinterface *);
+};
+
+struct kvm_xinterface {
+   struct module   *owner;
+   struct kref  kref;
+   const struct kvm_xinterface_ops *ops;
+};
+
+static inline void
+kvm_xinterface_get(struct kvm_xinterface *intf)
+{
+   kref_get(&intf->kref);
+}
+
+static inline void
+_kvm_xinterface_release(struct kref *kref)
+{
+   struct kvm_xinterface *intf;
+   struct module *owner;
+
+   intf = container_of(kref, struct kvm_xinterface, kref);
+
+   owner = intf->owner;
+   rmb();
+
+   intf->ops->release(intf);
+   module_put(owner);
+}
+
+static inline void
+kvm_xinterface_put(struct kvm_xinterface *intf)
+{
+   kref_put(&intf->kref, _kvm_xinterface_release);
+}
+
+struct kvm_xvmap_ops {
+   void (*release)(struct 

[PATCH v2 3/4] KVM: add io services to xinterface

2009-10-02 Thread Gregory Haskins
We want to add a more efficient way to get PIO signals out of the guest,
so we add an xioevent interface.  This allows a client to register
for notifications when a specific MMIO/PIO address is touched by
the guest.  This is an alternative interface to ioeventfd, which is
performance limited by io-bus scaling and eventfd wait-queue based
notification mechanism.  This also has the advantage of retaining
the full PIO data payload and passing it to the recipient.
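
For illustration, a client could hook a guest PIO register roughly as in
the sketch below, continuing the hypothetical foo_ctx from the 2/4 example
(the port 0xc200, the handler, the ioevent field, and the assumption that
the ioevent op returns ERR_PTR() on failure are all made up; only the ops
and types added by this patch are taken from the patch itself):

/* Hypothetical handler: receives the full PIO data payload. */
static void foo_pio_signal(struct kvm_xioevent *ioevent, const void *val)
{
	u16 v = *(const u16 *)val;

	pr_info("foo: guest wrote 0x%04x\n", v);
}

/* Watch 2-byte guest writes to PIO port 0xc200. */
static int foo_watch_pio(struct foo_ctx *ctx)
{
	struct kvm_xioevent *ioevent;

	ioevent = ctx->intf->ops->ioevent(ctx->intf, 0xc200, 2,
					  KVM_XIOEVENT_FLAG_PIO);
	if (IS_ERR(ioevent))
		return PTR_ERR(ioevent);

	ioevent->signal = foo_pio_signal;
	ctx->ioevent = ioevent;	/* released via kvm_xioevent_deassign() */
	return 0;
}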

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 include/linux/kvm_xinterface.h |   47 ++
 virt/kvm/xinterface.c  |  106 
 2 files changed, 153 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_xinterface.h b/include/linux/kvm_xinterface.h
index 01f092b..684b6f8 100644
--- a/include/linux/kvm_xinterface.h
+++ b/include/linux/kvm_xinterface.h
@@ -12,6 +12,16 @@
 
 struct kvm_xinterface;
 struct kvm_xvmap;
+struct kvm_xioevent;
+
+enum {
+   kvm_xioevent_flag_nr_pio,
+   kvm_xioevent_flag_nr_max,
+};
+
+#define KVM_XIOEVENT_FLAG_PIO   (1 << kvm_xioevent_flag_nr_pio)
+
+#define KVM_XIOEVENT_VALID_FLAG_MASK  ((1 << kvm_xioevent_flag_nr_max) - 1)
 
 struct kvm_xinterface_ops {
unsigned long (*copy_to)(struct kvm_xinterface *intf,
@@ -22,6 +32,10 @@ struct kvm_xinterface_ops {
struct kvm_xvmap* (*vmap)(struct kvm_xinterface *intf,
  unsigned long gpa,
  unsigned long len);
+   struct kvm_xioevent* (*ioevent)(struct kvm_xinterface *intf,
+   u64 addr,
+   unsigned long len,
+   unsigned long flags);
void (*release)(struct kvm_xinterface *);
 };
 
@@ -109,6 +123,39 @@ kvm_xvmap_put(struct kvm_xvmap *vmap)
	kref_put(&vmap->kref, _kvm_xvmap_release);
 }
 
+struct kvm_xioevent_ops {
+   void (*deassign)(struct kvm_xioevent *ioevent);
+};
+
+struct kvm_xioevent {
+   const struct kvm_xioevent_ops *ops;
+   struct kvm_xinterface *intf;
+   void (*signal)(struct kvm_xioevent *ioevent, const void *val);
+   void  *priv;
+};
+
+static inline void
+kvm_xioevent_init(struct kvm_xioevent *ioevent,
+ const struct kvm_xioevent_ops *ops,
+ struct kvm_xinterface *intf)
+{
+   memset(ioevent, 0, sizeof(*ioevent));
+   ioevent->ops = ops;
+   ioevent->intf = intf;
+
+   kvm_xinterface_get(intf);
+}
+
+static inline void
+kvm_xioevent_deassign(struct kvm_xioevent *ioevent)
+{
+   struct kvm_xinterface *intf = ioevent->intf;
+   rmb();
+
+   ioevent->ops->deassign(ioevent);
+   kvm_xinterface_put(intf);
+}
+
 struct kvm_xinterface *kvm_xinterface_bind(int fd);
 
 #endif /* __KVM_XINTERFACE_H */
diff --git a/virt/kvm/xinterface.c b/virt/kvm/xinterface.c
index 3b586c5..c356835 100644
--- a/virt/kvm/xinterface.c
+++ b/virt/kvm/xinterface.c
@@ -28,6 +28,8 @@
 #include <linux/kvm_host.h>
 #include <linux/kvm_xinterface.h>
 
+#include "iodev.h"
+
 struct _xinterface {
struct kvm *kvm;
struct task_struct *task;
@@ -42,6 +44,14 @@ struct _xvmap {
struct kvm_xvmap   vmap;
 };
 
+struct _ioevent {
+   u64   addr;
+   int   length;
+   struct kvm_io_bus*bus;
+   struct kvm_io_device  dev;
+   struct kvm_xioevent   ioevent;
+};
+
 static struct _xinterface *
 to_intf(struct kvm_xinterface *intf)
 {
@@ -362,6 +372,101 @@ fail:
return ERR_PTR(ret);
 }
 
+/* MMIO/PIO writes trigger an event if the addr/val match */
+static int
+ioevent_write(struct kvm_io_device *dev, gpa_t addr, int len, const void *val)
+{
+   struct _ioevent *p = container_of(dev, struct _ioevent, dev);
+   struct kvm_xioevent *ioevent = &p->ioevent;
+
+   if (!(addr == p->addr && len == p->length))
+   return -EOPNOTSUPP;
+
+   if (!ioevent->signal)
+   return 0;
+
+   ioevent->signal(ioevent, val);
+   return 0;
+}
+
+static const struct kvm_io_device_ops ioevent_device_ops = {
+   .write = ioevent_write,
+};
+
+static void
+ioevent_deassign(struct kvm_xioevent *ioevent)
+{
+   struct _ioevent    *p = container_of(ioevent, struct _ioevent, ioevent);
+   struct _xinterface *_intf = to_intf(ioevent->intf);
+   struct kvm *kvm = _intf->kvm;
+
+   kvm_io_bus_unregister_dev(kvm, p->bus, &p->dev);
+   kfree(p);
+}
+
+static const struct kvm_xioevent_ops ioevent_intf_ops = {
+   .deassign = ioevent_deassign,
+};
+
+static struct kvm_xioevent*
+xinterface_ioevent(struct kvm_xinterface *intf,
+  u64 addr,
+  unsigned long len,
+  unsigned long flags)
+{
+   struct _xinterface *_intf = to_intf(intf);
+   struct kvm *kvm = _intf->kvm;
+   int pio = flags 

[PATCH v2 4/4] KVM: add scatterlist support to xinterface

2009-10-02 Thread Gregory Haskins
This allows a scatter-gather approach to IO, which will be useful for
building high performance interfaces, like zero-copy and low-latency
copy (avoiding multiple calls to copy_to/from).

The interface is based on the existing scatterlist infrastructure.  The
caller is expected to pass in a scatterlist with its dma field
populated with valid GPAs.  The xinterface will then populate each
entry by translating the GPA to a page*.

The caller signifies completion by simply performing a put_page() on
each page returned in the list.
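
For illustration, a caller exercising only the sgmap op and the conventions
described above might look like the following sketch (foo_ctx carries over
from the earlier examples; the zero flags argument and single-entry list are
assumptions; error handling trimmed):

/* Map one page-aligned guest buffer, use it, and signal completion. */
static int foo_touch_gpa(struct foo_ctx *ctx, u64 gpa, unsigned int len)
{
	struct scatterlist sg;
	long ret;

	sg_init_table(&sg, 1);
	sg_dma_address(&sg) = gpa;	/* input: guest-physical address */
	sg_dma_len(&sg) = len;		/* must stay within one page */

	ret = ctx->intf->ops->sgmap(ctx->intf, &sg, 1, 0);
	if (ret)
		return ret;

	/* ... access the data via sg_page(&sg) and sg.offset ... */

	put_page(sg_page(&sg));		/* signifies completion */
	return 0;
}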

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 include/linux/kvm_xinterface.h |4 ++
 virt/kvm/xinterface.c  |   72 
 2 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_xinterface.h b/include/linux/kvm_xinterface.h
index 684b6f8..eefb575 100644
--- a/include/linux/kvm_xinterface.h
+++ b/include/linux/kvm_xinterface.h
@@ -9,6 +9,7 @@
 #include linux/kref.h
 #include linux/module.h
 #include linux/file.h
+#include <linux/scatterlist.h>
 
 struct kvm_xinterface;
 struct kvm_xvmap;
@@ -36,6 +37,9 @@ struct kvm_xinterface_ops {
u64 addr,
unsigned long len,
unsigned long flags);
+   unsigned long (*sgmap)(struct kvm_xinterface *intf,
+  struct scatterlist *sgl, int nents,
+  unsigned long flags);
void (*release)(struct kvm_xinterface *);
 };
 
diff --git a/virt/kvm/xinterface.c b/virt/kvm/xinterface.c
index c356835..16729f6 100644
--- a/virt/kvm/xinterface.c
+++ b/virt/kvm/xinterface.c
@@ -467,6 +467,77 @@ fail:
 
 }
 
+static unsigned long
+xinterface_sgmap(struct kvm_xinterface *intf,
+struct scatterlist *sgl, int nents,
+unsigned long flags)
+{
+   struct _xinterface *_intf   = to_intf(intf);
+   struct task_struct *p       = _intf->task;
+   struct mm_struct   *mm      = _intf->mm;
+   struct kvm         *kvm     = _intf->kvm;
+   struct kvm_memory_slot *memslot = NULL;
+   bool                kthread = !current->mm;
+   int ret;
+   struct scatterlist *sg;
+   int i;
+
+   down_read(&kvm->slots_lock);
+
+   if (kthread)
+   use_mm(_intf->mm);
+
+   for_each_sg(sgl, sg, nents, i) {
+       unsigned long   gpa    = sg_dma_address(sg);
+       unsigned long   len    = sg_dma_len(sg);
+       unsigned long   gfn    = gpa >> PAGE_SHIFT;
+       off_t           offset = offset_in_page(gpa);
+   unsigned long   hva;
+   struct page*pg;
+
+   /* ensure that we do not have more than one page per entry */
+       if ((PAGE_ALIGN(len + offset) >> PAGE_SHIFT) != 1) {
+   ret = -EINVAL;
+   break;
+   }
+
+   /* check for a memslot-cache miss */
+   if (!memslot
+           || gfn < memslot->base_gfn
+           || gfn >= memslot->base_gfn + memslot->npages) {
+   memslot = gfn_to_memslot(kvm, gfn);
+   if (!memslot) {
+   ret = -EFAULT;
+   break;
+   }
+   }
+
+       hva = (memslot->userspace_addr +
+              (gfn - memslot->base_gfn) * PAGE_SIZE);
+
+       if (kthread || current->mm == mm)
+           ret = get_user_pages_fast(hva, 1, 1, &pg);
+       else
+           ret = get_user_pages(p, mm, hva, 1, 1, 0, &pg, NULL);
+
+   if (ret != 1) {
+           if (ret >= 0)
+   ret = -EFAULT;
+   break;
+   }
+
+   sg_set_page(sg, pg, len, offset);
+   ret = 0;
+   }
+
+   if (kthread)
+       unuse_mm(_intf->mm);
+
+   up_read(&kvm->slots_lock);
+
+   return ret;
+}
+
 static void
 xinterface_release(struct kvm_xinterface *intf)
 {
@@ -483,6 +554,7 @@ struct kvm_xinterface_ops _xinterface_ops = {
.copy_from   = xinterface_copy_from,
.vmap= xinterface_vmap,
.ioevent = xinterface_ioevent,
+   .sgmap   = xinterface_sgmap,
.release = xinterface_release,
 };
 



Re: KVM: VMX: flush TLB with INVEPT on cpu migration

2009-10-02 Thread Ram Pai

On Thu, 2009-10-01 at 19:16 -0300, Marcelo Tosatti wrote:
 It is possible that stale EPTP-tagged mappings are used, if a 
 vcpu migrates to a different pcpu.
 
 Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load, when switching pcpus, which
 will invalidate both VPID and EPT mappings on the next vm-entry.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index e86f1a6..97f4265 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -708,7 +708,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  if (vcpu->cpu != cpu) {
  vcpu_clear(vmx);
  kvm_migrate_timers(vcpu);
 - vpid_sync_vcpu_all(vmx);
 + set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
  local_irq_disable();
  list_add(&vmx->local_vcpus_link,
&per_cpu(vcpus_on_cpu, cpu));
 --

This patch fixes my ept misconfig problem, seen every so often while
installing a sles11 guest.

thanks,
RP

 



[PATCH 1/2] KVM: x86: Refactor guest debug IOCTL handling

2009-10-02 Thread Jan Kiszka
Much of the so-far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 arch/x86/include/asm/kvm_host.h |4 ++--
 arch/x86/kvm/svm.c  |   14 ++
 arch/x86/kvm/vmx.c  |   18 +-
 arch/x86/kvm/x86.c  |   28 +---
 4 files changed, 26 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 295c7c4..e7f8708 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -475,8 +475,8 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
-   int (*set_guest_debug)(struct kvm_vcpu *vcpu,
-  struct kvm_guest_debug *dbg);
+   void (*set_guest_debug)(struct kvm_vcpu *vcpu,
+   struct kvm_guest_debug *dbg);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 02a4269..279a2ae 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1065,26 +1065,16 @@ static void update_db_intercept(struct kvm_vcpu *vcpu)
	vcpu->guest_debug = 0;
 }
 
-static int svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
+static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 {
-   int old_debug = vcpu->guest_debug;
	struct vcpu_svm *svm = to_svm(vcpu);
 
-   vcpu->guest_debug = dbg->control;
-
-   update_db_intercept(vcpu);
-
	if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
		svm->vmcb->save.dr7 = dbg->arch.debugreg[7];
	else
		svm->vmcb->save.dr7 = vcpu->arch.dr7;
 
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-       svm->vmcb->save.rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
-   else if (old_debug & KVM_GUESTDBG_SINGLESTEP)
-       svm->vmcb->save.rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-
-   return 0;
+   update_db_intercept(vcpu);
 }
 
 static void load_host_msrs(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 97f4265..70020e5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1096,30 +1096,14 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
}
 }
 
-static int set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
+static void set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 {
-   int old_debug = vcpu->guest_debug;
-   unsigned long flags;
-
-   vcpu->guest_debug = dbg->control;
-   if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
-       vcpu->guest_debug = 0;
-
	if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
		vmcs_writel(GUEST_DR7, dbg->arch.debugreg[7]);
	else
		vmcs_writel(GUEST_DR7, vcpu->arch.dr7);
 
-   flags = vmcs_readl(GUEST_RFLAGS);
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-       flags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
-   else if (old_debug & KVM_GUESTDBG_SINGLESTEP)
-       flags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-   vmcs_writel(GUEST_RFLAGS, flags);
-
update_exception_bitmap(vcpu);
-
-   return 0;
 }
 
 static __init int cpu_has_kvm_support(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffccb5c..aa5d574 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4470,12 +4470,19 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
 {
-   int i, r;
+   unsigned long rflags;
+   int old_debug;
+   int i;
 
vcpu_load(vcpu);
 
-   if ((dbg->control & (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP)) ==
-       (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP)) {
+   old_debug = vcpu->guest_debug;
+
+   vcpu->guest_debug = dbg->control;
+   if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
+       vcpu->guest_debug = 0;
+
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
		for (i = 0; i < KVM_NR_DB_REGS; ++i)
			vcpu->arch.eff_db[i] = dbg->arch.debugreg[i];
		vcpu->arch.switch_db_regs =
@@ -4486,16 +4493,23 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
		vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK);
}
 
-   r = kvm_x86_ops->set_guest_debug(vcpu, dbg);
+   rflags = kvm_x86_ops->get_rflags(vcpu);
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+   

[PATCH 2/2] KVM: x86: Preserve guest single-stepping on register

2009-10-02 Thread Jan Kiszka
Give user space more flexibility /wrt its IOCTL order. So far updating
the rflags via KVM_SET_REGS ignored potentially set single-step flags.
Now they will be kept.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 arch/x86/kvm/x86.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa5d574..9fbb4c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3853,6 +3853,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+   unsigned long rflags;
+
vcpu_load(vcpu);
 
	kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax);
@@ -3876,8 +3878,11 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 #endif
 
	kvm_rip_write(vcpu, regs->rip);
-   kvm_x86_ops->set_rflags(vcpu, regs->rflags);
 
+   rflags = regs->rflags;
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+       rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
+   kvm_x86_ops->set_rflags(vcpu, rflags);
 
	vcpu->arch.exception.pending = false;
 


Re: [PATCH][retry 2] Support Pause Filter in AMD processors

2009-10-02 Thread Joerg Roedel
On Fri, Oct 02, 2009 at 02:49:59PM -0500, Mark Langsdorf wrote:
 +static int pause_interception(struct vcpu_svm *svm)
 +{
 + static int pause_count = 0;
 +
 + kvm_vcpu_on_spin(&(svm->vcpu));
 +printk(KERN_ERR "MJLL pause intercepted %d\n", ++pause_count);

Debugging leftover?

 + return 1;
 +}


Re: [Qemu-devel] Release plan for 0.12.0

2009-10-02 Thread TAKEDA, toshiya
Anthony Liguori wrote:
Hi,

Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.

I'd like to do a few things different this time around.  I don't think 
the -rc process went very well as I don't think we got more testing out 
of it.  I'd like to shorten the timeline for 0.12.0 a good bit.  The 
0.10 stable tree got pretty difficult to maintain toward the end of the 
cycle.  We also had a pretty huge amount of change between 0.10 and 0.11 
so I think a shorter cycle is warranted.

I think aiming for early to mid-December would give us roughly a 3 month 
cycle and would align well with some of the Linux distribution cycles.  
I'd like to limit things to a single -rc that lasted only for about a 
week.  This is enough time to fix most of the obvious issues I think.

I'd also like to try to enumerate some features for this release.  
Here's a short list of things I expect to see for this release 
(target-i386 centric).  Please add or comment on items that you'd either 
like to see in the release or are planning on working on.

 o VMState conversion -- I expect most of the pc target to be completed
 o qdev conversion -- I hope that we'll get most of the pc target 
completely converted to qdev
 o storage live migration
 o switch to SeaBIOS (need to finish porting features from Bochs)
 o switch to gPXE (need to resolve slirp tftp server issue)
 o KSM integration
 o in-kernel APIC support for KVM
 o guest SMP support for KVM
 o updates to the default pc machine type

Please add to this list and I'll collect it all and post it somewhere.

o NEC PC-9821 family support on target-i386

With the latest patch, MS-DOS can boot on QEMU.
I think I can add support for the NIC (LGY-98) and IDE in 0.12.0,
and I hope I can boot FreeBSD/pc98 on it.

PS.
I will repost the v3 patch next week; in the meantime, please review the v2
patch I posted on Oct. 1.

Thanks,
TAKEDA, toshiya

Thanks!

-- 
Regards,

Anthony Liguori







(no subject)

2009-10-02 Thread debmail_5f63g

subscribe kvm