Re: [PATCH v2 3/7] KVM-HV: KVM Steal time implementation

2011-06-20 Thread Avi Kivity

On 06/20/2011 05:53 AM, Glauber Costa wrote:





+static void record_steal_time(struct kvm_vcpu *vcpu)
+{
+ u64 delta;
+
+ if (vcpu->arch.st.stime && vcpu->arch.st.this_time_out) {


0 is a valid value for stime.



how exactly? stime is a guest physical address...


0 is a valid physical address.
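The point here is that a guest physical address of 0 is legal, so it cannot double as a "not registered" sentinel. A minimal sketch of the alternative, with hypothetical field names (not the actual kvm_vcpu_arch layout):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: track registration with an explicit flag instead
 * of treating gpa == 0 as "unset", since 0 is a valid guest physical
 * address. */
struct steal_time_area {
	uint64_t gpa;    /* guest physical address of the steal-time data */
	bool enabled;    /* set when the guest registers the area */
};

static bool steal_time_area_active(const struct steal_time_area *st)
{
	return st->enabled;  /* correct even when st->gpa == 0 */
}
```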



@@ -2158,6 +2206,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_migrate_timers(vcpu);
vcpu->cpu = cpu;
}
+
+ record_steal_time(vcpu);
}


This records time spent in userspace in the vcpu thread as steal time.
Is this what we want? Or just time preempted away?


There are arguments either way.

Right now, the way it is, it does account our iothread as steal time,
which is not 100% accurate if we think of steal time as whatever takes
time away from our VM. I tend to think of it as whatever takes time
away from this CPU, which includes other cpus in the same VM. So,
thinking this way, in a 1-1 phys-to-virt cpu mapping, if the iothread
is taking 80% cpu for whatever reason, we have 80% steal time on the
cpu that is sharing the physical cpu with the iothread.


I'm not talking about the iothread, rather the vcpu thread while running 
in userspace.




Maybe we could account that as iotime?
Questions like that are one of the reasons behind me leaving extra
fields in the steal time structure. We could do more fine-grained
accounting and differentiate between the multiple entities that can do
work (of various kinds) on our behalf.



What do other architectures do (xen, s390)?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/7] KVM-GST: KVM Steal time accounting

2011-06-20 Thread Avi Kivity

On 06/20/2011 05:38 AM, Glauber Costa wrote:

On 06/19/2011 07:04 AM, Avi Kivity wrote:

On 06/17/2011 01:20 AM, Glauber Costa wrote:

This patch accounts steal time in kernel/sched.
I kept it from last proposal, because I still see advantages
in it: Doing it here will give us easier access from scheduler
variables such as the cpu rq. The next patch shows an example of
usage for it.

Since functions like account_idle_time() can be called from
multiple places, not only account_process_tick(), steal time
grabbing is repeated in each account function separately.

/*
+ * We have to flush steal time information every time something else
+ * is accounted. Since the accounting functions are all visible to the rest
+ * of the kernel, it gets tricky to do them in one place. This helper function
+ * helps us.
+ *
+ * When the system is idle, the concept of steal time does not apply. We just
+ * tell the underlying hypervisor that we grabbed the data, but skip steal time
+ * accounting.
+ */
+static inline bool touch_steal_time(int is_idle)
+{
+ u64 steal, st = 0;
+
+ if (static_branch(&paravirt_steal_enabled)) {
+
+ steal = paravirt_steal_clock(smp_processor_id());
+
+ steal -= this_rq()->prev_steal_time;
+ if (is_idle) {
+ this_rq()->prev_steal_time += steal;
+ return false;
+ }
+
+ while (steal >= TICK_NSEC) {
+ /*
+ * Inline assembly required to prevent the compiler
+ * optimising this loop into a divmod call.
+ * See __iter_div_u64_rem() for another example of this.
+ */


Why not use said function?


because here we want to do work during each loop iteration. The said function
would have to be adapted for that, possibly using a macro, to run
arbitrary code during each iteration, in a way that I don't think
is worth it given the current number of callers (2, counting this new
one).


You mean adding to prev_steal_time?  That can be done outside the loop.




+ asm("" : "+rm" (steal));
+
+ steal -= TICK_NSEC;
+ this_rq()->prev_steal_time += TICK_NSEC;
+ st++;


Suppose a live migration or SIGSTOP causes lots of steal time. How long
will we spend here?


Silly me. I actually used this same argument with Peter to cap it with
delta in the next patch in this series. So I think you are 100%
right. Here, however, we do want to account all that time, I believe.


How about we do a slow division if we're > 10 sec (unlikely), and
account everything as steal time in this scenario?


Okay.  Division would be faster for a lot less than 10s though.
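The difference being discussed can be sketched as follows. This is an illustration of the suggestion, not the merged kernel code; `TICK_NSEC` here is an illustrative constant (the kernel derives it from HZ):

```c
#include <stdint.h>

#define TICK_NSEC 1000000ULL  /* illustrative tick length */

/* After a long pause (live migration, SIGSTOP) the per-tick subtract
 * loop runs once per missed tick; a single division handles any
 * backlog in constant time, keeping the sub-tick remainder for the
 * next accounting pass. */
static uint64_t steal_to_ticks(uint64_t steal_ns, uint64_t *remainder_ns)
{
	uint64_t ticks = steal_ns / TICK_NSEC;  /* one div, not O(ticks) */

	*remainder_ns = steal_ns - ticks * TICK_NSEC;
	return ticks;
}
```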

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation

2011-06-20 Thread Gleb Natapov
On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
 On 06/19/2011 03:59 PM, Gleb Natapov wrote:
 On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
   On 06/15/2011 12:09 PM, Gleb Natapov wrote:
   
  Actually, I'd expect most read/writes to benefit from caching, no?
  So why don't we just rename kvm_write_guest_cached() to
  kvm_write_guest(), and the few places - if any - that need to force
traversing of the gfn mappings, get renamed to
  kvm_write_guest_uncached ?
   
   Good idea. I do not see any places where kvm_write_guest_uncached is
   needed from a brief look. Avi?
   
 
   kvm_write_guest_cached() needs something to supply the cache, and
   needs recurring writes to the same location.  Neither of these are
   common (for example, instruction emulation doesn't have either).
 
 Correct. Missed that. So what about changing steal time to use
 kvm_write_guest_cached()?
 
 Makes sense, definitely.  Want to post read_guest_cached() as well?
 
Glauber can you write read_guest_cached() as part of your series (should
be trivial), or do you want me to do it? I do not have a code to test it
with though :)

--
Gleb.
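The mechanism both cached helpers rely on can be sketched like this; the field and function names are illustrative, not the exact kernel ones:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the idea behind kvm_{write,read}_guest_cached:
 * a gfn->hva translation is cached together with the memslot generation
 * it was made under, and must be re-initialized whenever the generation
 * has moved on (i.e. the memslot layout changed). */
struct hva_cache {
	uint64_t gpa;         /* guest physical address being cached */
	uint64_t hva;         /* cached host virtual address */
	uint32_t generation;  /* memslot generation at init time */
};

static bool hva_cache_valid(const struct hva_cache *c,
			    uint32_t slots_generation)
{
	return c->generation == slots_generation;
}
```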


KVM call agenda for June 21

2011-06-20 Thread Juan Quintela

Please send in any agenda items you are interested in covering.

thanks,
-juan



Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation

2011-06-20 Thread Avi Kivity

On 06/20/2011 10:21 AM, Gleb Natapov wrote:

On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
  On 06/19/2011 03:59 PM, Gleb Natapov wrote:
  On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
 On 06/15/2011 12:09 PM, Gleb Natapov wrote:
 
 Actually, I'd expect most read/writes to benefit from caching, no?
 So why don't we just rename kvm_write_guest_cached() to
 kvm_write_guest(), and the few places - if any - that need to 
force
 traversing of the gfn mappings, get renamed to
 kvm_write_guest_uncached ?
 
 Good idea. I do not see any places where kvm_write_guest_uncached is
 needed from a brief look. Avi?
 
  
 kvm_write_guest_cached() needs something to supply the cache, and
 needs recurring writes to the same location.  Neither of these are
 common (for example, instruction emulation doesn't have either).
  
  Correct. Missed that. So what about changing steal time to use
  kvm_write_guest_cached()?

  Makes sense, definitely.  Want to post read_guest_cached() as well?

Glauber can you write read_guest_cached() as part of your series (should
be trivial), or do you want me to do it? I do not have a code to test it
with though :)


Yes.

(you can write it, and Glauber can include it in the series)

--
error compiling committee.c: too many arguments to function



Re: Why doesn't Intel e1000 NIC work correctly in Windows XP?

2011-06-20 Thread Dor Laor
Not sure it will help, but IIRC the original e1000 driver for Windows
had some bugs that were fixed in the most recent driver available
from Intel's site. This was the case for the fully emulated e1000 qemu
device and might help here too.


On 06/19/2011 03:29 PM, Flypen CloudMe wrote:

Hi,

Here are the command line:

/usr/bin/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp
2,sockets=1,cores=2,threads=1 \
-name winxp -uuid 23cd2751-8a30-dd34-db47-bfc8c76ccadb -nodefconfig
-nodefaults \
-chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/winxp.monitor,server,nowait
-mon chardev=monitor,mode=readline \
-rtc base=localtime -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x5
-device lsi,id=scsi1,bus=pci.0,addr=0x6 \
-device lsi,id=scsi2,bus=pci.0,addr=0x7 -device
lsi,id=scsi3,bus=pci.0,addr=0x8 \
-drive 
file=/mnt/vmdisk/winxp.disk,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none
\
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-drive 
file=/mnt/vmdisk/virtio-win-1.1.16.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none\
-global isa-fdc.driveA=drive-fdc0-0-0 -drive
file=/dev/sd1,if=none,id=drive-scsi0-0-0,format=raw,cache=none \
-device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 \
-drive file=/dev/sdb,if=none,id=drive-scsi0-0-1,format=raw,cache=none \
-device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 \
-drive file=/dev/sdc,if=none,id=drive-scsi0-0-2,format=raw,cache=none \
-device scsi-disk,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 \
-drive file=/dev/sdd,if=none,id=drive-scsi0-0-3,format=raw,cache=none \
-device scsi-disk,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 \
-drive file=/dev/sde,if=none,id=drive-scsi0-0-4,format=raw,cache=none \
-device scsi-disk,bus=scsi0.0,scsi-id=4,drive=drive-scsi0-0-4,id=scsi0-0-4 \
-drive file=/dev/sdf,if=none,id=drive-scsi3-0-0,format=raw,cache=none \
-device scsi-disk,bus=scsi3.0,scsi-id=0,drive=drive-scsi3-0-0,id=scsi3-0-0 \
-drive file=/mnt/vmdisk/D/1,if=none,id=drive-scsi0-0-6,format=raw,cache=none \
-device scsi-disk,bus=scsi0.0,scsi-id=6,drive=drive-scsi0-0-6,id=scsi0-0-6 \
-chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb \
-vnc 0.0.0.0:0 -k en-us -vga vmware -device
pci-assign,host=02:00.0,id=hostdev0,configfd=18,bus=pci.0,addr=0x3 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

The NIC and one SCSI controller (slot 7) have the same IRQ. The
performance in XP is really bad. When writing traffic to the drive,
the NIC can't be accessed, and pings also time out.
If I give the NIC a different IRQ number, then everything is OK.
Is it related to the INTx model for XP?

We rebuilt QEMU and added the LSI SCSI controller support back. Why does
RHEL6 remove its support? Is this controller too old? Are there any
emulated SCSI devices to replace it?

Thanks,
flypen


On Thu, Jun 16, 2011 at 2:42 AM, Alex Williamson
alex.william...@redhat.com  wrote:


On Wed, 2011-06-15 at 11:31 +0200, Jan Kiszka wrote:

On 2011-06-15 10:04, Jan Kiszka wrote:

On 2011-06-15 02:54, Alex Williamson wrote:

On Tue, 2011-06-14 at 16:11 +0800, Flypen CloudMe wrote:

Hi,

I use Redhat Enterprise Linux 6, and use the KVM that is released by
Redhat officially. The kernel version is 2.6.32-71.el6.x86_64.

It seems that the IRQs conflict after reboot. The NIC and the
SCSI controller have the same IRQ number. If I re-install the NIC
driver, the IRQ number of the NIC is assigned another value, and then
it works normally. Do we have a way to make the NIC and the SCSI
controller get different IRQ numbers in the VM?


Hmm, I'm still confused here.  I went back and double checked, and as I
thought, we disable the LSI SCSI controller in the RHEL6 KVM.  So I'm
curious what this device is.  Is it an assigned SCSI controller or is
there another one that we forgot to disable in RHEL or is this a
different version of KVM?  The config file or command line would be
handy here.


I'll see if I can reproduce and figure anything out.  Windows XP isn't a
guest we concentrate on, especially with device assignment.  Are you
using an AMD or Intel host system?  Does the same thing happen if you
run the XP guest on an IDE controller?  It would be helpful to post the
guest configuration, command line used or libvirt xml.  Also, you might
try latest upstream qemu-kvm to see if the problem still exists.


I tested with an 82578DM e1000e NIC on an Intel host system, and it
surprisingly worked just fine on the RHEL6.0 base.  This is with a 32bit
Windows XP SP3 install.  The device supports MSI, but windows only seems
to use it with INTx.  I did have to remove the emulated rtl8139 or else
I couldn't even boot due to BSODs in the guest.



Nonsense, it can't make a difference, as the PIIX3 resets the routing to
disabled - which device assignment does not deal with, but that's unrelated.


Yep, someone has to write it at some point and device assignment will
catch that.


Try assigning a 

Re: [PATCH 3/3] KVM: MMU: Use helpers to clean up walk_addr_generic()

2011-06-20 Thread Avi Kivity

On 06/14/2011 08:03 PM, Takuya Yoshikawa wrote:

From: Takuya Yoshikawayoshikawa.tak...@oss.ntt.co.jp

Introduce two new helpers: set_accessed_bit() and is_last_gpte().

These names were suggested by Ingo and Avi.

Cc: Ingo Molnarmi...@elte.hu
Signed-off-by: Takuya Yoshikawayoshikawa.tak...@oss.ntt.co.jp
---
  arch/x86/kvm/paging_tmpl.h |   57 ---
  1 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 92fe275..d655a4b6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -113,6 +113,43 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
return access;
  }

+static int FNAME(set_accessed_bit)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+  gfn_t table_gfn, unsigned index,
+  pt_element_t __user *ptep_user,
+  pt_element_t *ptep)
+{
+   int ret;
+
+   trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(*ptep));
+   ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
+ *ptep, *ptep|PT_ACCESSED_MASK);
+   if (unlikely(ret))
+   return ret;
+
+   mark_page_dirty(vcpu->kvm, table_gfn);
+   *ptep |= PT_ACCESSED_MASK;
+
+   return 0;
+}



I don't think this one is worthwhile, it takes 7 parameters!  If there's 
so much communication between caller and callee, it means they are too 
heavily tied up.




+
+static bool FNAME(is_last_gpte)(struct guest_walker *walker,
+   struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+   pt_element_t gpte)
+{
+   if (walker->level == PT_PAGE_TABLE_LEVEL)
+   return true;
+
+   if ((walker->level == PT_DIRECTORY_LEVEL) && is_large_pte(gpte) &&
+   (PTTYPE == 64 || is_pse(vcpu)))
+   return true;
+
+   if ((walker->level == PT_PDPE_LEVEL) && is_large_pte(gpte) &&
+   (mmu->root_level == PT64_ROOT_LEVEL))
+   return true;
+
+   return false;
+}
+


This one is much better.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation

2011-06-20 Thread Glauber Costa

On 06/20/2011 05:02 AM, Avi Kivity wrote:

On 06/20/2011 10:21 AM, Gleb Natapov wrote:

On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
 On 06/19/2011 03:59 PM, Gleb Natapov wrote:
 On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
  On 06/15/2011 12:09 PM, Gleb Natapov wrote:
  
   Actually, I'd expect most read/writes to benefit from caching,
no?
   So why don't we just rename kvm_write_guest_cached() to
   kvm_write_guest(), and the few places - if any - that need to
force
   traversing of the gfn mappings, get renamed to
   kvm_write_guest_uncached ?
  
  Good idea. I do not see any places where
kvm_write_guest_uncached is
  needed from a brief look. Avi?
  
 
  kvm_write_guest_cached() needs something to supply the cache, and
  needs recurring writes to the same location. Neither of these are
  common (for example, instruction emulation doesn't have either).
 
 Correct. Missed that. So what about changing steal time to use
 kvm_write_guest_cached()?

 Makes sense, definitely. Want to post read_guest_cached() as well?

Glauber can you write read_guest_cached() as part of your series (should
be trivial), or do you want me to do it? I do not have a code to test it
with though :)


Yes.

(you can write it, and Glauber can include it in the series)


Write it, hand me the patch, and I'll include it and test it.



Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation

2011-06-20 Thread Gleb Natapov
On Mon, Jun 20, 2011 at 09:42:31AM -0300, Glauber Costa wrote:
 On 06/20/2011 05:02 AM, Avi Kivity wrote:
 On 06/20/2011 10:21 AM, Gleb Natapov wrote:
 On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
  On 06/19/2011 03:59 PM, Gleb Natapov wrote:
  On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
   On 06/15/2011 12:09 PM, Gleb Natapov wrote:
   
Actually, I'd expect most read/writes to benefit from caching,
 no?
So why don't we just rename kvm_write_guest_cached() to
kvm_write_guest(), and the few places - if any - that need to
 force
traversing of the gfn mappings, get renamed to
kvm_write_guest_uncached ?
   
   Good idea. I do not see any places where
 kvm_write_guest_uncached is
   needed from a brief look. Avi?
   
  
   kvm_write_guest_cached() needs something to supply the cache, and
   needs recurring writes to the same location. Neither of these are
   common (for example, instruction emulation doesn't have either).
  
  Correct. Missed that. So what about changing steal time to use
  kvm_write_guest_cached()?
 
  Makes sense, definitely. Want to post read_guest_cached() as well?
 
 Glauber can you write read_guest_cached() as part of your series (should
 be trivial), or do you want me to do it? I do not have a code to test it
 with though :)
 
 Yes.
 
 (you can write it, and Glauber can include it in the series)
 
 Write it, hand me the patch, and I'll include it and test it.

Only compile tested.

===
Introduce kvm_read_guest_cached() function in addition to write one we
already have.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fa2321a..bf62c76 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1414,6 +1414,26 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
 
+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   int r;
+
+   if (slots->generation != ghc->generation)
+   kvm_gfn_to_hva_cache_init(kvm, ghc, ghc->gpa);
+
+   if (kvm_is_error_hva(ghc->hva))
+   return -EFAULT;
+
+   r = __copy_from_user(data, (void __user *)ghc->hva, len);
+   if (r)
+   return -EFAULT;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
return kvm_write_guest_page(kvm, gfn, (const void *) empty_zero_page,
--
Gleb.


[PATCH 1/2] Introduce panic hypercall KVM_HC_PANIC (host)

2011-06-20 Thread Daniel Gollub
Introduce a panic hypercall (KVM_HC_PANIC) on the host end to signal
that the guest crashed/panicked. This is signaled to userspace
via the KVM API: the KVM_RUN ioctl exits with exit_reason KVM_EXIT_PANIC.

Signed-off-by: Daniel Gollub gol...@b1-systems.de
---
 arch/x86/kvm/x86.c   |9 +
 include/linux/kvm.h  |1 +
 include/linux/kvm_host.h |1 +
 include/linux/kvm_para.h |1 +
 4 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d88de56..bbe91fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5103,6 +5103,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_MMU_OP:
r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
break;
+   case KVM_HC_PANIC:
+   set_bit(KVM_REQ_PANIC, &vcpu->requests);
+   ret = 0;
+   break;
default:
ret = -KVM_ENOSYS;
break;
@@ -5431,6 +5435,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 1;
goto out;
}
+   if (kvm_check_request(KVM_REQ_PANIC, vcpu)) {
+   vcpu->run->exit_reason = KVM_EXIT_PANIC;
+   r = 0;
+   goto out;
+   }
}
 
r = kvm_mmu_reload(vcpu);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 55ef181..8a8b609 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_PANIC19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b9c3299..1819414 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -47,6 +47,7 @@
 #define KVM_REQ_DEACTIVATE_FPU10
 #define KVM_REQ_EVENT 11
 #define KVM_REQ_APF_HALT  12
+#define KVM_REQ_PANIC 13
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 47a070b..5cdf61b 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_PANIC   5
 
 /*
  * hypercalls use architecture specific
-- 
1.7.1



[PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Daniel Gollub
Introduce a panic hypercall to enable a crashing guest to notify the
host. This enables the host to run some actions as soon as a guest
crashes (kernel panic).

This patch series introduces the panic hypercall at the host end,
as well as the hypercall for KVM paravirtualized Linux guests, by
registering the hypercall with the panic_notifier_list.

The basic idea is to create a KVM crashdump automatically as soon as the
guest panics and to power-cycle the VM (e.g. via libvirt's on_crash setting).


Daniel Gollub (2):
  Introduce panic hypercall KVM_HC_PANIC (host)
  Call KVM_HC_PANIC if guest panics

 arch/x86/kernel/kvm.c|   16 
 arch/x86/kvm/x86.c   |9 +
 include/linux/kvm.h  |1 +
 include/linux/kvm_host.h |1 +
 include/linux/kvm_para.h |1 +
 5 files changed, 28 insertions(+), 0 deletions(-)



[PATCH 2/2] Call KVM_HC_PANIC if guest panics

2011-06-20 Thread Daniel Gollub
Call the KVM hypercall KVM_HC_PANIC if the guest kernel calls panic(), to signal
the host that the guest panicked.

Depends on CONFIG_KVM_GUEST being set.

Signed-off-by: Daniel Gollub gol...@b1-systems.de
---
 arch/x86/kernel/kvm.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 33c07b0..f3c7d34 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -534,6 +534,20 @@ static void __init kvm_apf_trap_init(void)
set_intr_gate(14, async_page_fault);
 }
 
+static int kvm_guest_panic(struct notifier_block *nb, unsigned long l, void *p)
+{
+   kvm_hypercall0(KVM_HC_PANIC);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_guest_paniced = {
+   .notifier_call = kvm_guest_panic
+};
+
+static void kvm_guest_panic_handler_init(void) {
+   atomic_notifier_chain_register(&panic_notifier_list,
+  &kvm_guest_paniced);
+}
+
 void __init kvm_guest_init(void)
 {
int i;
@@ -541,6 +555,8 @@ void __init kvm_guest_init(void)
if (!kvm_para_available())
return;
 
+   kvm_guest_panic_handler_init();
+
paravirt_ops_setup();
register_reboot_notifier(kvm_pv_reboot_nb);
for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++)
-- 
1.7.1



[PATCH 0/2] qemu-kvm: Introduce KVM panic hypercall support

2011-06-20 Thread Daniel Gollub
Introduce KVM panic hypercall support to make QEMU aware of a
crashed guest. These patches are specific to a KVM paravirtualized
guest, which needs to call KVM_HC_PANIC on a crash/panic.

The basic idea of this implementation and of the QMP PANIC event
is to be able to create a crashdump via the hypervisor instead of
kexec/kdump as soon as the guest crashes.

Initial libvirt support for the panic QMP event is in progress:
http://people.b1-systems.de/~gollub/kvm/hypercall-panic/libvirt/


Daniel Gollub (2):
  Handle KVM hypercall panic on guest crash
  QMP: Introduce QEVENT_PANIC

 QMP/qmp-events.txt  |   13 +
 kvm-all.c   |5 +
 kvm/include/linux/kvm.h |1 +
 monitor.c   |   11 +--
 monitor.h   |1 +
 sysemu.h|3 +++
 vl.c|   20 
 7 files changed, 52 insertions(+), 2 deletions(-)



[PATCH 1/2] Handle KVM hypercall panic on guest crash

2011-06-20 Thread Daniel Gollub
If the guest crashes and its crash/panic handler calls
the KVM panic hypercall, the KVM API notifies userspace of this with
KVM_EXIT_PANIC.

The VM status gets extended with "panic" so that
this status can be obtained via the QEMU monitor.
---
 kvm-all.c   |4 
 kvm/include/linux/kvm.h |1 +
 monitor.c   |8 ++--
 sysemu.h|1 +
 vl.c|2 ++
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 629f727..9771f91 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1029,6 +1029,10 @@ int kvm_cpu_exec(CPUState *env)
 qemu_system_reset_request();
 ret = EXCP_INTERRUPT;
 break;
+case KVM_EXIT_PANIC:
+panic = 1;
+ret = 1;
+break;
 case KVM_EXIT_UNKNOWN:
fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
(uint64_t)run->hw.hardware_exit_reason);
diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..207871c 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_PANIC19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
diff --git a/monitor.c b/monitor.c
index 59a3e76..fd6a881 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2599,13 +2599,17 @@ static void do_info_status_print(Monitor *mon, const QObject *data)
monitor_printf(mon, "paused");
 }
 
+if (qdict_get_bool(qdict, "panic")) {
+monitor_printf(mon, " (panic)");
+}
+
monitor_printf(mon, "\n");
 }
 
 static void do_info_status(Monitor *mon, QObject **ret_data)
 {
-*ret_data = qobject_from_jsonf("{ 'running': %i, 'singlestep': %i }",
-vm_running, singlestep);
+*ret_data = qobject_from_jsonf("{ 'running': %i, 'singlestep': %i, 'panic': %i }",
+vm_running, singlestep, panic);
 }
 
 static qemu_acl *find_acl(Monitor *mon, const char *name)
diff --git a/sysemu.h b/sysemu.h
index a42d83f..8ab0168 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -12,6 +12,7 @@
 extern const char *bios_name;
 
 extern int vm_running;
+extern int panic;
 extern const char *qemu_name;
 extern uint8_t qemu_uuid[];
 int qemu_uuid_parse(const char *str, uint8_t *uuid);
diff --git a/vl.c b/vl.c
index e0191e1..1d9a068 100644
--- a/vl.c
+++ b/vl.c
@@ -185,6 +185,7 @@ int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int vm_running;
+int panic = 0;
 int autostart;
 int incoming_expected; /* Started with -incoming and waiting for incoming */
 static int rtc_utc = 1;
@@ -1407,6 +1408,7 @@ static void main_loop(void)
 pause_all_vcpus();
 cpu_synchronize_all_states();
 qemu_system_reset();
+panic = 0;
 resume_all_vcpus();
 }
 if (qemu_powerdown_requested()) {
-- 
1.7.1



[PATCH 2/2] QMP: Introduce QEVENT_PANIC

2011-06-20 Thread Daniel Gollub
Emitted when the guest panics.
For now this is emitted only when KVM_EXIT_PANIC is triggered.

Signed-off-by: Daniel Gollub gol...@b1-systems.de
---
 QMP/qmp-events.txt |   13 +
 kvm-all.c  |3 ++-
 monitor.c  |3 +++
 monitor.h  |1 +
 sysemu.h   |2 ++
 vl.c   |   18 ++
 6 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 0ce5d4e..96e4307 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -264,3 +264,16 @@ Example:
 
 Note: If action is reset, shutdown, or pause the WATCHDOG event is
 followed respectively by the RESET, SHUTDOWN, or STOP events.
+
+
+PANIC
+-
+
+Emitted when the guest panics.
+
+Data: None.
+
+Example:
+
+{ "timestamp": { "seconds": 1308569038, "microseconds": 918147 },
+  "event": "PANIC" }
diff --git a/kvm-all.c b/kvm-all.c
index 9771f91..9fdda69 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1030,7 +1030,8 @@ int kvm_cpu_exec(CPUState *env)
 ret = EXCP_INTERRUPT;
 break;
 case KVM_EXIT_PANIC:
-panic = 1;
+DPRINTF("panic\n");
+qemu_system_panic_request();
 ret = 1;
 break;
 case KVM_EXIT_UNKNOWN:
diff --git a/monitor.c b/monitor.c
index fd6a881..5b337f2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -468,6 +468,9 @@ void monitor_protocol_event(MonitorEvent event, QObject *data)
 case QEVENT_SPICE_DISCONNECTED:
event_name = "SPICE_DISCONNECTED";
break;
+case QEVENT_PANIC:
+event_name = "PANIC";
+break;
 default:
 abort();
 break;
diff --git a/monitor.h b/monitor.h
index 4f2d328..8b045df 100644
--- a/monitor.h
+++ b/monitor.h
@@ -35,6 +35,7 @@ typedef enum MonitorEvent {
 QEVENT_SPICE_CONNECTED,
 QEVENT_SPICE_INITIALIZED,
 QEVENT_SPICE_DISCONNECTED,
+QEVENT_PANIC,
 QEVENT_MAX,
 } MonitorEvent;
 
diff --git a/sysemu.h b/sysemu.h
index 8ab0168..30744b0 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -43,11 +43,13 @@ void qemu_system_shutdown_request(void);
 void qemu_system_powerdown_request(void);
 void qemu_system_debug_request(void);
 void qemu_system_vmstop_request(int reason);
+void qemu_system_panic_request(void);
 int qemu_shutdown_requested_get(void);
 int qemu_reset_requested_get(void);
 int qemu_shutdown_requested(void);
 int qemu_reset_requested(void);
 int qemu_powerdown_requested(void);
+int qemu_panic_requested(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_kill_report(void);
 extern qemu_irq qemu_system_powerdown;
diff --git a/vl.c b/vl.c
index 1d9a068..d997c36 100644
--- a/vl.c
+++ b/vl.c
@@ -1173,6 +1173,7 @@ static pid_t shutdown_pid;
 static int powerdown_requested;
 static int debug_requested;
 static int vmstop_requested;
+static int panic_requested;
 
 int qemu_shutdown_requested_get(void)
 {
@@ -1235,6 +1236,13 @@ static int qemu_vmstop_requested(void)
 return r;
 }
 
+int qemu_panic_requested(void)
+{
+int r = panic_requested;
+panic_requested = 0;
+return r;
+}
+
 void qemu_register_reset(QEMUResetHandler *func, void *opaque)
 {
 QEMUResetEntry *re = qemu_mallocz(sizeof(QEMUResetEntry));
@@ -1311,6 +1319,13 @@ void qemu_system_vmstop_request(int reason)
 qemu_notify_event();
 }
 
+void qemu_system_panic_request(void)
+{
+panic = 1;
+panic_requested = 1;
+qemu_notify_event();
+}
+
 void main_loop_wait(int nonblocking)
 {
 fd_set rfds, wfds, xfds;
@@ -1418,6 +1433,9 @@ static void main_loop(void)
 if ((r = qemu_vmstop_requested())) {
 vm_stop(r);
 }
+if (qemu_panic_requested()) {
+monitor_protocol_event(QEVENT_PANIC, NULL);
+}
 }
 bdrv_close_all();
 pause_all_vcpus();
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] KVM: MMU: Clean up the error handling of walk_addr_generic()

2011-06-20 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Avoid a two-step jump to the error handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar mi...@elte.hu
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 v2-v3: only changelog update

 arch/x86/kvm/paging_tmpl.h |   64 +++
 1 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 1caeb4d..137aa45 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -125,18 +125,17 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn_t table_gfn;
 	unsigned index, pt_access, uninitialized_var(pte_access);
 	gpa_t pte_gpa;
-	bool eperm, present, rsvd_fault;
-	int offset, write_fault, user_fault, fetch_fault;
-
-	write_fault = access & PFERR_WRITE_MASK;
-	user_fault = access & PFERR_USER_MASK;
-	fetch_fault = access & PFERR_FETCH_MASK;
+	bool eperm;
+	int offset;
+	const int write_fault = access & PFERR_WRITE_MASK;
+	const int user_fault  = access & PFERR_USER_MASK;
+	const int fetch_fault = access & PFERR_FETCH_MASK;
+	u16 errcode = 0;
 
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
 walk:
-	present = true;
-	eperm = rsvd_fault = false;
+	eperm = false;
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
 
@@ -145,7 +144,7 @@ walk:
 		pte = kvm_pdptr_read_mmu(vcpu, mmu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte)) {
-			present = false;
+			errcode |= PFERR_PRESENT_MASK;
 			goto error;
 		}
 		--walker->level;
@@ -171,34 +170,34 @@ walk:
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
 					      PFERR_USER_MASK|PFERR_WRITE_MASK);
 		if (unlikely(real_gfn == UNMAPPED_GVA)) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 		real_gfn = gpa_to_gfn(real_gfn);
 
 		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
 		if (unlikely(kvm_is_error_hva(host_addr))) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 
 		ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
 		if (unlikely(__copy_from_user(&pte, ptep_user, sizeof(pte)))) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 
 		trace_kvm_mmu_paging_element(pte, walker->level);
 
 		if (unlikely(!is_present_gpte(pte))) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 
 		if (unlikely(is_rsvd_bits_set(&vcpu->arch.mmu, pte,
 					      walker->level))) {
-			rsvd_fault = true;
-			break;
+			errcode |= PFERR_RSVD_MASK;
+			goto error;
 		}
 
 		if (unlikely(write_fault && !is_writable_pte(pte)
@@ -213,16 +212,15 @@ walk:
 			eperm = true;
 #endif
 
-		if (!eperm && !rsvd_fault &&
-		    unlikely(!(pte & PT_ACCESSED_MASK))) {
+		if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
 			int ret;
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index,
 						       sizeof(pte));
 			ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
 						  pte, pte|PT_ACCESSED_MASK);
 			if (unlikely(ret < 0)) {
-				present = false;
-				break;
+				errcode |= PFERR_PRESENT_MASK;
+				goto error;
 			} else if (ret)
 				goto walk;
 
@@ -276,7 +274,7 @@ walk:
 		--walker->level;
 	}
 
-	if (unlikely(!present || eperm || rsvd_fault))
+	if (unlikely(eperm))
 		goto error;
   

[PATCH v3 2/3] KVM: MMU: Rename the walk label in walk_addr_generic()

2011-06-20 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

The current name does not explain the meaning well.  So give it a better
name retry_walk to show that we are trying the walk again.

This was suggested by Ingo Molnar.

Cc: Ingo Molnar mi...@elte.hu
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 v2-v3: only changelog update

 arch/x86/kvm/paging_tmpl.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 137aa45..92fe275 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -134,7 +134,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
-walk:
+retry_walk:
 	eperm = false;
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
@@ -222,7 +222,7 @@ walk:
errcode |= PFERR_PRESENT_MASK;
goto error;
} else if (ret)
-   goto walk;
+   goto retry_walk;
 
mark_page_dirty(vcpu-kvm, table_gfn);
pte |= PT_ACCESSED_MASK;
@@ -287,7 +287,7 @@ walk:
errcode |= PFERR_PRESENT_MASK;
goto error;
} else if (ret)
-   goto walk;
+   goto retry_walk;
 
mark_page_dirty(vcpu-kvm, table_gfn);
pte |= PT_DIRTY_MASK;
-- 
1.7.4.1



Re: Why doesn't Intel e1000 NIC work correctly in Windows XP?

2011-06-20 Thread Alex Williamson
On Sun, 2011-06-19 at 20:29 +0800, Flypen CloudMe wrote:
 Hi,
 
 Here are the command line:
 
 /usr/bin/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp
 2,sockets=1,cores=2,threads=1 \
 -name winxp -uuid 23cd2751-8a30-dd34-db47-bfc8c76ccadb -nodefconfig
 -nodefaults \
 -chardev 
 socket,id=monitor,path=/var/lib/libvirt/qemu/winxp.monitor,server,nowait
 -mon chardev=monitor,mode=readline \
 -rtc base=localtime -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x5
 -device lsi,id=scsi1,bus=pci.0,addr=0x6 \
 -device lsi,id=scsi2,bus=pci.0,addr=0x7 -device
 lsi,id=scsi3,bus=pci.0,addr=0x8 \
 -drive 
 file=/mnt/vmdisk/winxp.disk,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none
 \
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
 -drive 
 file=/mnt/vmdisk/virtio-win-1.1.16.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none\
 -global isa-fdc.driveA=drive-fdc0-0-0 -drive
 file=/dev/sd1,if=none,id=drive-scsi0-0-0,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 \
 -drive file=/dev/sdb,if=none,id=drive-scsi0-0-1,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 \
 -drive file=/dev/sdc,if=none,id=drive-scsi0-0-2,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 \
 -drive file=/dev/sdd,if=none,id=drive-scsi0-0-3,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 \
 -drive file=/dev/sde,if=none,id=drive-scsi0-0-4,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=4,drive=drive-scsi0-0-4,id=scsi0-0-4 \
 -drive file=/dev/sdf,if=none,id=drive-scsi3-0-0,format=raw,cache=none \
 -device scsi-disk,bus=scsi3.0,scsi-id=0,drive=drive-scsi3-0-0,id=scsi3-0-0 \
 -drive file=/mnt/vmdisk/D/1,if=none,id=drive-scsi0-0-6,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=6,drive=drive-scsi0-0-6,id=scsi0-0-6 \
 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb \
 -vnc 0.0.0.0:0 -k en-us -vga vmware -device
 pci-assign,host=02:00.0,id=hostdev0,configfd=18,bus=pci.0,addr=0x3 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

That's a lot of SCSI controllers.  Why are you creating 4 separate lsi
SCSI controller devices, but only using 2 of them?  Can you reduce the
problem by just using 1?  If so, then you might be able to move the
assigned device and lsi device addr around so the guest will use
different INTx interrupts for these (or at least move them until the
assigned device gets an interrupt in the guest exclusively).  Is the
guest Windows XP 32bit or 64bit?  A 64bit Windows is probably more
likely to enable MSI interrupts (which hopefully your assigned device
supports), which would also eliminate INTx sharing problems.

 The NIC and one SCSI controller (slot 7) has the same IRQ. The
 performance in XP is really bad. When writing traffic to the drive,
 the NIC can't be accessed, and ping will be also timeout.
 If I let the NIC has the different IRQ number, then everything is OK.
 Is it related to INTx model for XP?

Maybe so.  Most of the guest/device combinations we test for device
assignment make use of MSI/X interrupts, which are more efficient, and
avoid these sorts of problems.

 We rebuild the QEMU, and add the LSI SCSI controller support. Why does
 RHEL6 removes its support? Is this controller too old? Are there any
 emulated SCSI devices to replace it?

We remove it because it's not well used or tested and we don't want to
support it.  Virtio-blk is the alternative we'd typically recommend for
guests with supported drivers.  Thanks,

Alex



[PATCH v3 3/3] KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic()

2011-06-20 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Suggested by Ingo and Avi.

Cc: Ingo Molnar mi...@elte.hu
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 v2-v3: dropped set_accessed_bit()

 arch/x86/kvm/paging_tmpl.h |   26 +++---
 1 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 92fe275..e9243c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -113,6 +113,24 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 	return access;
 }
 
+static bool FNAME(is_last_gpte)(struct guest_walker *walker,
+				struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				pt_element_t gpte)
+{
+	if (walker->level == PT_PAGE_TABLE_LEVEL)
+		return true;
+
+	if ((walker->level == PT_DIRECTORY_LEVEL) && is_large_pte(gpte) &&
+	    (PTTYPE == 64 || is_pse(vcpu)))
+		return true;
+
+	if ((walker->level == PT_PDPE_LEVEL) && is_large_pte(gpte) &&
+	    (mmu->root_level == PT64_ROOT_LEVEL))
+		return true;
+
+	return false;
+}
+
 /*
  * Fetch a guest pte for a guest virtual address
  */
@@ -232,13 +250,7 @@ retry_walk:
 
 	walker->ptes[walker->level - 1] = pte;
 
-	if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
-	    ((walker->level == PT_DIRECTORY_LEVEL) &&
-			is_large_pte(pte) &&
-			(PTTYPE == 64 || is_pse(vcpu))) ||
-	    ((walker->level == PT_PDPE_LEVEL) &&
-			is_large_pte(pte) &&
-			mmu->root_level == PT64_ROOT_LEVEL)) {
+	if (FNAME(is_last_gpte)(walker, vcpu, mmu, pte)) {
int lvl = walker-level;
gpa_t real_gpa;
gfn_t gfn;
-- 
1.7.4.1



Re: [PATCH 6/7] KVM-GST: adjust scheduler cpu power

2011-06-20 Thread Peter Zijlstra
On Tue, 2011-06-14 at 22:26 -0300, Glauber Costa wrote:
 On 06/14/2011 07:42 AM, Peter Zijlstra wrote:
  On Mon, 2011-06-13 at 19:31 -0400, Glauber Costa wrote:
  @@ -1981,12 +1987,29 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
   rq->prev_irq_time += irq_delta;
   delta -= irq_delta;
  +#endif
  +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
  +	if (static_branch((&paravirt_steal_rq_enabled))) {
 
  Why is that a different variable from the touch_steal_time() one?
 
 because they track different things, touch_steal_time() and 
 update_rq_clock() are
 called from different places at different situations.
 
 If we advance prev_steal_time in touch_steal_time(), and later on call 
 update_rq_clock_task(), we won't discount the time already flushed from 
 the rq_clock. Conversely, if we call update_rq_clock_task(), and only 
 then arrive at touch_steal_time, we won't account steal time properly.

But that's about prev_steal_time vs prev_steal_time_acc, I agree those
should be different.

 update_rq_clock_task() is called whenever update_rq_clock() is called.
 touch_steal_time is called every tick. If there is a causal relation 
 between them that would allow us to track it in a single location, I 
 fail to realize.

Both are steal time muck, I was wondering why we'd want to do one and
not the other when we have a high res stealtime clock.

  +
  +	steal = paravirt_steal_clock(cpu_of(rq));
  +	steal -= rq->prev_steal_time_acc;
  +
  +	rq->prev_steal_time_acc += steal;
 
  You have this addition in the wrong place, when you clip:
 
 I begin by disagreeing
  +	if (steal > delta)
  +		steal = delta;
 
  you just lost your steal delta, so the addition to prev_steal_time_acc
  needs to be after the clip.

 Unlike irq time, steal time can be extremely huge. Just think of a 
 virtual machine that got interrupted for a very long time. We'd have 
 steal > delta, leading to steal == delta for a big number of iterations.
 That would affect cpu power for an extended period of time, not
 reflecting present situation, just the past. So I like to think of delta
 as a hard cap for steal time.
 
 Obviously, I am open to debate.

I'm failing to see how this would happen, if the virtual machine wasn't
scheduled for a long long while, delta would be huge too. But suppose it
does happen, wouldn't it be likely that the virtual machine would
receive similar bad service in the near future? Making the total
accounting relevant.


Re: [Qemu-devel] [PATCH] qemu-img: Add cache command line option

2011-06-20 Thread Kevin Wolf
Am 16.06.2011 16:43, schrieb Kevin Wolf:
 Am 16.06.2011 16:28, schrieb Christoph Hellwig:
 On Wed, Jun 15, 2011 at 09:46:10AM -0400, Federico Simoncelli wrote:
 qemu-img currently writes disk images using writeback and filling
 up the cache buffers which are then flushed by the kernel preventing
 other processes from accessing the storage.
 This is particularly bad in cluster environments where time-based
 algorithms might be in place and accessing the storage within
 certain timeouts is critical.
 This patch adds the option to choose a cache method when writing
 disk images.

 Allowing to chose the mode is of course fine, but what about also
 choosing a good default?  writethrough doesn't really make any sense
 for qemu-img, given that we can trivially flush the cache at the end
 of the operations.  I'd also say that using the buffer cache doesn't
 make sense either, as there is little point in caching these operations.
 
 Right, we need to keep the defaults as they are. That is, for convert
 'unsafe' and for everything else 'writeback'. The patch seems to make
 'writeback' the default for everything.

Federico, are you going to fix this in a v4?

Kevin


Re: [RFC] virtio: Support releasing lock during kick

2011-06-20 Thread Stefan Hajnoczi
On Sun, Jun 19, 2011 at 8:14 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, Jun 23, 2010 at 10:24:02PM +0100, Stefan Hajnoczi wrote:
 The virtio block device holds a lock during I/O request processing.
 Kicking the virtqueue while the lock is held results in long lock hold
 times and increases contention for the lock.

 This patch modifies virtqueue_kick() to optionally release a lock while
 notifying the host.  Virtio block is modified to pass in its lock.  This
 allows other vcpus to queue I/O requests during the time spent servicing
 the virtqueue notify in the host.

 The virtqueue_kick() function is modified to know about locking because
 it changes the state of the virtqueue and should execute with the lock
 held (it would not be correct for virtio block to release the lock
 before calling virtqueue_kick()).

 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

 While the optimization makes sense, the API's pretty hairy IMHO.
 Why don't we split the kick functionality instead?
 E.g.
        /* Report whether host notification is necessary. */
        bool virtqueue_kick_prepare(struct virtqueue *vq)
        /* Can be done in parallel with add_buf/get_buf */
        void virtqueue_kick_notify(struct virtqueue *vq)

This is a nice idea, it makes the code cleaner.  I am testing patches
that implement this and after Khoa has measured the performance I will
send them out.

Stefan


Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Avi Kivity

On 06/20/2011 04:38 PM, Daniel Gollub wrote:

Introduce a panic hypercall to enable a crashing guest to notify the
host. This enables the host to run some actions as soon as a guest
crashes (kernel panic).

This patch series introduces the panic hypercall at the host end.
As well as the hypercall for KVM paravirtuliazed Linux guests, by
registering the hypercall to the panic_notifier_list.

The basic idea is to create a KVM crashdump automatically as soon as the
guest panics and power-cycle the VM (e.g. libvirt <on_crash/>).


This would be more easily done via a panic device (I/O port or 
memory-mapped address) that the guest hits.  It would be intercepted by 
qemu without any new code in kvm.


However, I'm not sure I see the gain.  Most enterprisey guests already 
contain in-guest crash dumpers which provide more information than a 
qemu memory dump could, since they know exact load addresses etc. and 
are integrated with crash analysis tools.  What do you have in mind?


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Daniel P. Berrange
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote:
 On 06/20/2011 04:38 PM, Daniel Gollub wrote:
 Introduce panic hypercall to enable the crashing guest to notify the
 host. This enables the host to run some actions as soon a guest
 crashed (kernel panic).
 
 This patch series introduces the panic hypercall at the host end.
 As well as the hypercall for KVM paravirtuliazed Linux guests, by
 registering the hypercall to the panic_notifier_list.
 
 The basic idea is to create KVM crashdump automatically as soon the
 guest paniced and power-cycle the VM (e.g. libvirt <on_crash/>).
 
 This would be more easily done via a panic device (I/O port or
 memory-mapped address) that the guest hits.  It would be intercepted
 by qemu without any new code in kvm.
 
 However, I'm not sure I see the gain.  Most enterprisey guests
 already contain in-guest crash dumpers which provide more
 information than a qemu memory dump could, since they know exact
 load addresses etc. and are integrated with crash analysis tools.
 What do you have in mind?

Well libvirt can capture a core file by doing 'virsh dump $GUESTNAME'.
This actually uses the QEMU monitor migration command to capture the
entire of QEMU memory. The 'crash' command line tool actually knows
how to analyse this data format as it would a normal kernel crashdump.

I think having a way for a guest OS to notify the host that it has
crashed would be useful. libvirt could automatically do a crash
dump of the QEMU memory, or at least pause the guest CPUs and notify
the management app of the crash, which can then decide what todo.
You can also use tools like 'virt-dmesg' which uses libvirt to peek
into guest memory to extract the most recent kernel dmesg logs (even
if the guest OS itself is crashed & didn't manage to send them out
via netconsole or something else).

This series does need to introduce a QMP event notification upon
crash, so that the crash notification can be propagated to mgmt
layers above QEMU.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Avi Kivity

On 06/20/2011 06:38 PM, Daniel P. Berrange wrote:

On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote:
  On 06/20/2011 04:38 PM, Daniel Gollub wrote:
  Introduce panic hypercall to enable the crashing guest to notify the
  host. This enables the host to run some actions as soon a guest
  crashed (kernel panic).
  
  This patch series introduces the panic hypercall at the host end.
  As well as the hypercall for KVM paravirtuliazed Linux guests, by
  registering the hypercall to the panic_notifier_list.
  
  The basic idea is to create KVM crashdump automatically as soon the
  guest paniced and power-cycle the VM (e.g. libvirt <on_crash/>).

  This would be more easily done via a panic device (I/O port or
  memory-mapped address) that the guest hits.  It would be intercepted
  by qemu without any new code in kvm.

  However, I'm not sure I see the gain.  Most enterprisey guests
  already contain in-guest crash dumpers which provide more
  information than a qemu memory dump could, since they know exact
  load addresses etc. and are integrated with crash analysis tools.
  What do you have in mind?

Well libvirt can capture a core file by doing 'virsh dump $GUESTNAME'.
This actually uses the QEMU monitor migration command to capture the
entire of QEMU memory. The 'crash' command line tool actually knows
how to analyse this data format as it would a normal kernel crashdump.


Interesting.


I think having a way for a guest OS to notify the host that it has
crashed would be useful. libvirt could automatically do a crash
dump of the QEMU memory, or at least pause the guest CPUs and notify
the management app of the crash, which can then decide what todo.
You can also use tools like 'virt-dmesg' which uses libvirt to peek
into guest memory to extract the most recent kernel dmesg logs (even
if the guest OS itself is crashed & didn't manage to send them out
via netconsole or something else).


I agree.  But let's do this via a device, this way kvm need not be changed.

Do ILO cards / IPMI support something like this?  We could follow their 
lead in that case.



This series does need to introduce a QMP event notification upon
crash, so that the crash notification can be propagated to mgmt
layers above QEMU.


Yes.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Jan Kiszka
On 2011-06-20 17:45, Avi Kivity wrote:
 This series does need to introduce a QMP event notification upon
 crash, so that the crash notification can be propagated to mgmt
 layers above QEMU.
 
 Yes.

I think the best way to deal with that is to stop the VM on guest panic.
There is already WIP to signal stop reasons via QMP. Maybe we need to
differentiate between hypervisor and guest triggered panics
(VMSTOP_GUEST_PANIC?), but the rest should come for free.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: Why doesn't Intel e1000 NIC work correctly in Windows XP?

2011-06-20 Thread Jan Kiszka
On 2011-06-20 16:32, Alex Williamson wrote:
 On Sun, 2011-06-19 at 20:29 +0800, Flypen CloudMe wrote:
 Hi,

 Here are the command line:

 /usr/bin/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp
 2,sockets=1,cores=2,threads=1 \
 -name winxp -uuid 23cd2751-8a30-dd34-db47-bfc8c76ccadb -nodefconfig
 -nodefaults \
 -chardev 
 socket,id=monitor,path=/var/lib/libvirt/qemu/winxp.monitor,server,nowait
 -mon chardev=monitor,mode=readline \
 -rtc base=localtime -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x5
 -device lsi,id=scsi1,bus=pci.0,addr=0x6 \
 -device lsi,id=scsi2,bus=pci.0,addr=0x7 -device
 lsi,id=scsi3,bus=pci.0,addr=0x8 \
 -drive 
 file=/mnt/vmdisk/winxp.disk,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none
 \
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
 -drive 
 file=/mnt/vmdisk/virtio-win-1.1.16.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none\
 -global isa-fdc.driveA=drive-fdc0-0-0 -drive
 file=/dev/sd1,if=none,id=drive-scsi0-0-0,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 \
 -drive file=/dev/sdb,if=none,id=drive-scsi0-0-1,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 \
 -drive file=/dev/sdc,if=none,id=drive-scsi0-0-2,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 \
 -drive file=/dev/sdd,if=none,id=drive-scsi0-0-3,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 \
 -drive file=/dev/sde,if=none,id=drive-scsi0-0-4,format=raw,cache=none \
 -device scsi-disk,bus=scsi0.0,scsi-id=4,drive=drive-scsi0-0-4,id=scsi0-0-4 \
 -drive file=/dev/sdf,if=none,id=drive-scsi3-0-0,format=raw,cache=none \
 -device scsi-disk,bus=scsi3.0,scsi-id=0,drive=drive-scsi3-0-0,id=scsi3-0-0 \
 -drive file=/mnt/vmdisk/D/1,if=none,id=drive-scsi0-0-6,format=raw,cache=none 
 \
 -device scsi-disk,bus=scsi0.0,scsi-id=6,drive=drive-scsi0-0-6,id=scsi0-0-6 \
 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb \
 -vnc 0.0.0.0:0 -k en-us -vga vmware -device
 pci-assign,host=02:00.0,id=hostdev0,configfd=18,bus=pci.0,addr=0x3 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
 
 That's a lot of SCSI controllers.  Why are you creating 4 separate lsi
 SCSI controller devices, but only using 2 of them?  Can you reduce the
 problem by just using 1?  If so, then you might be able to move the
 assigned device and lsi device addr around so the guest will use
 different INTx interrupts for these (or at least move them until the
 assigned device gets an interrupt in the guest exclusively).  Is the
 guest Windows XP 32bit or 64bit?  A 64bit Windows is probably more
 likely to enable MSI interrupts (which hopefully your assigned device
 supports), which would also eliminate INTx sharing problems.
 

I tend to believe there is some problem with the IRQ routing information
provided to the BIOS or what the BIOS makes out of it. See how info
pci looks like on a qemu-syste-x86_64 -device e1000 -device e1000 VM
after the BIOS is done:

[...]
  Bus  0, device   3, function 0:
Ethernet controller: PCI device 8086:100e
  IRQ 11.
  BAR0: 32 bit memory at 0xf202 [0xf203].
  BAR1: I/O at 0xc040 [0xc07f].
  BAR6: 32 bit memory at 0x [0x0001fffe].
  id 
  Bus  0, device   4, function 0:
Ethernet controller: PCI device 8086:100e
  IRQ 11.
  BAR0: 32 bit memory at 0xf206 [0xf207].
  BAR1: I/O at 0xc080 [0xc0bf].
  BAR6: 32 bit memory at 0x [0x0001fffe].
  id 
  Bus  0, device   5, function 0:
Ethernet controller: PCI device 8086:100e
  IRQ 10.
  BAR0: 32 bit memory at 0xf20a [0xf20b].
  BAR1: I/O at 0xc0c0 [0xc0ff].
  BAR6: 32 bit memory at 0x [0x0001fffe].
  id 

Slot 3 & 4 on IRQ 11, but slot 5 on 10? That confuses Windows XP here -
at least until you reboot it after the device installation.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] qemu-kvm: Fix kvm-disabled build

2011-06-20 Thread Marcelo Tosatti
On Fri, Jun 03, 2011 at 04:38:40PM +0200, Jan Kiszka wrote:
 Minor fallout from recent refactorings.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  kvm-stub.c |   11 +++
  qemu-kvm.h |4 ++--
  2 files changed, 5 insertions(+), 10 deletions(-)

Applied, thanks.



Re: [PATCH] qemu-kvm: Remove kvm_set_boot_cpu_id

2011-06-20 Thread Marcelo Tosatti
On Fri, Jun 03, 2011 at 04:39:55PM +0200, Jan Kiszka wrote:
 Upstream just as well as qemu-kvm only support CPU 0 as boot CPU. And
 that is also the KVM ABI default if the user does not issue any
 KVM_SET_BOOT_CPU_ID.
 
 So let's drop this redundancy. It can be re-introduced via upstream once
 we support something more sophisticated.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  qemu-kvm.c|   11 ---
  qemu-kvm.h|2 --
  target-i386/kvm.c |5 -
  3 files changed, 0 insertions(+), 18 deletions(-)

Applied, thanks.



Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-20 Thread Marcelo Tosatti
On Tue, Jun 07, 2011 at 09:00:30PM +0800, Xiao Guangrong wrote:
 If the page fault is caused by MMIO, we can cache the MMIO info; later, we do
 not need to walk the guest page table and can quickly know it is an MMIO fault
 while we emulate the MMIO instruction
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 ---
  arch/x86/include/asm/kvm_host.h |5 +++
  arch/x86/kvm/mmu.c  |   21 +--
  arch/x86/kvm/mmu.h  |   23 +
  arch/x86/kvm/paging_tmpl.h  |   21 ++-
  arch/x86/kvm/x86.c  |   52 ++
  arch/x86/kvm/x86.h  |   36 +++
  6 files changed, 126 insertions(+), 32 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index d167039..326af42 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
   u64 mcg_ctl;
   u64 *mce_banks;
  
 + /* Cache MMIO info */
 + u64 mmio_gva;
 + unsigned access;
 + gfn_t mmio_gfn;
 +
   /* used for guest single stepping over the given code position */
   unsigned long singlestep_rip;
  

Why you're not implementing the original idea to cache the MMIO
attribute of an address into the spte?

That solution is wider reaching than a one-entry cache, and was proposed
to overcome large number of memslots.

If the access pattern switches between different addresses this one
solution is doomed.



Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-20 Thread Marcelo Tosatti
On Mon, Jun 20, 2011 at 01:14:32PM -0300, Marcelo Tosatti wrote:
 On Tue, Jun 07, 2011 at 09:00:30PM +0800, Xiao Guangrong wrote:
  If the page fault is caused by mmio, we can cache the mmio info, later, we 
  do
  not need to walk guest page table and quickly know it is a mmio fault while 
  we
  emulate the mmio instruction
  
  Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
  ---
   arch/x86/include/asm/kvm_host.h |5 +++
   arch/x86/kvm/mmu.c  |   21 +--
   arch/x86/kvm/mmu.h  |   23 +
   arch/x86/kvm/paging_tmpl.h  |   21 ++-
   arch/x86/kvm/x86.c  |   52 
  ++
   arch/x86/kvm/x86.h  |   36 +++
   6 files changed, 126 insertions(+), 32 deletions(-)
  
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index d167039..326af42 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
  u64 mcg_ctl;
  u64 *mce_banks;
   
  +   /* Cache MMIO info */
  +   u64 mmio_gva;
  +   unsigned access;
  +   gfn_t mmio_gfn;
  +
  /* used for guest single stepping over the given code position */
  unsigned long singlestep_rip;
   
 
 Why are you not implementing the original idea of caching the MMIO
 attribute of an address into the spte?
 
 That solution is wider reaching than a one-entry cache, and was proposed
 to overcome a large number of memslots.
 
 If the access pattern switches between different addresses, this
 one-entry solution is doomed.

Never mind, it's later in the series.



Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Daniel Gollub
On Monday, June 20, 2011 05:45:36 pm Avi Kivity wrote:
However, I'm not sure I see the gain.  Most enterprisey guests
already contain in-guest crash dumpers which provide more
information than a qemu memory dump could, since they know exact
load addresses etc. and are integrated with crash analysis tools.
What do you have in mind?

Right, kexec/kdump already works perfectly inside the guest. But:

 - in the field a lot of people still manage to set up VM guests without
   kexec/kdump properly configured (even though most enterprisey distributions
   try hard to set this up out-of-the-box .. still people manage to not have
   kexec/kdump loaded once they run into a crash).

 - you don't have to reserve disk space for a crashdump for each guest
   e.g. if you run 4 guests with 60 GB of memory each you would lose
   some 4*60 GB of space ... just for the (rare) case that each of those
   guests could write a crashdump, uncompressed ...

 - legacy distribution - no or buggy kexec

 - maybe writing a crashdump+reboot with QEMU/libvirt is faster than
   with in-guest kexec/kdump? (haven't tested yet)

 - single place on the VM-host to collect coredumps


  
  Well libvirt can capture a core file by doing 'virsh dump $GUESTNAME'.
  This actually uses the QEMU monitor migration command to capture the
  entire of QEMU memory. The 'crash' command line tool actually knows
  how to analyse this data format as it would a normal kernel crashdump.
 
 Interesting.

Right. I'm using the kvmdump support of the crash utility now and then ... it
could be more often. But unfortunately the people who run KVM in a production
environment with some strict service-level agreement often just reboot, due to
time pressure, or run out of disk space in the guest, or simply forget that they
were told to always do virsh dump on a freeze or crash.


 
  I think having a way for a guest OS to notify the host that is has
  crashed would be useful. libvirt could automatically do a crash
  dump of the QEMU memory, or at least pause the guest CPUs and notify
  the management app of the crash, which can then decide what todo.
  You can also use tools like 'virt-dmesg' which uses libvirt to peek
  into guest memory to extract the most recent kernel dmesg logs (even
  if the guest OS itself is crashed and didn't manage to send them out
  via netconsole or something else).
 
 I agree.  But let's do this via a device, this way kvm need not be changed.

Is a device reliable enough if the guest kernel crashes?
Do you mean something like a hardware watchdog?

 
 Do ILO cards / IPMI support something like this?  We could follow their 
 lead in that case.

The only two things which came to my mind are:

 * NMI (aka. ipmitool diag) - already available in qemu/kvm - but requires
   in-guest kexec/kdump
 * Hardware-Watchdog (also available in qemu/libvirt)


lguest and xen have something similar. They also have a hypercall which gets
called by a function registered in the panic_notifier_list. Not quite sure if
you want to follow their lead.

Something I forgot to mention: This panic hypercall could also sit within an
external kernel module ... to support (legacy) distributions.

 
  This series does need to introduce a QMP event notification upon
  crash, so that the crash notification can be propagated to mgmt
  layers above QEMU.
 
 Yes.

Already done. I posted the QEMU relevant changes as a separated series to the 
KVM list ... since the initial implementation is KVM specific (KVM hypercall)

Best Regards,
Daniel

-- 
Daniel Gollub
Linux Consultant  Developer
Tel.: +49-160 47 73 970 
Mail: gol...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


signature.asc
Description: This is a digitally signed message part.


Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Avi Kivity

On 06/20/2011 07:26 PM, Daniel Gollub wrote:


  I agree.  But let's do this via a device, this way kvm need not be changed.

Is a device reliable enough if the guest kernel crashes?
Do you mean something like a hardware watchdog?


I'm proposing a 1:1 equivalent.  Instead of issuing a hypercall that 
tells the host about the panic, write to an I/O port that tells the host 
about the panic.




  Do ILO cards / IPMI support something like this?  We could follow their
  lead in that case.

The only two things which came to my mind are:

  * NMI (aka. ipmitool diag) - already available in qemu/kvm - but requires
in-guest kexec/kdump
  * Hardware-Watchdog (also available in qemu/libvirt)


A watchdog has the advantage that it also detects lockups.

In fact you could implement the panic device via the existing 
watchdogs.  Simply program the timer for the minimum interval and 
*don't* service the interrupt.  This would work for non-virt setups as 
well as another way to issue a reset.



lguest and xen have something similar. They also have an hypercall which get
called by a function registered in the panic_notifier_list. Not quite sure if
you want to follow their lead.


We could do the same, except s/hypercall/writel/.


Something I forgot to mention: This panic hypercall could also sit within an
external kernel module ... to support (legacy) distribution.


Yes.



This series does need to introduce a QMP event notification upon
crash, so that the crash notification can be propagated to mgmt
layers above QEMU.

  Yes.

Already done. I posted the QEMU relevant changes as a separated series to the
KVM list ... since the initial implementation is KVM specific (KVM hypercall)


--
error compiling committee.c: too many arguments to function



[PATCHv4] qemu-img: Add cache command line option

2011-06-20 Thread Federico Simoncelli
qemu-img currently writes disk images using writeback and filling
up the cache buffers which are then flushed by the kernel preventing
other processes from accessing the storage.
This is particularly bad in cluster environments where time-based
algorithms might be in place and accessing the storage within
certain timeouts is critical.
This patch adds the option to choose a cache method when writing
disk images.

Signed-off-by: Federico Simoncelli fsimo...@redhat.com
---
 qemu-img-cmds.hx |6 ++--
 qemu-img.c   |   80 +-
 2 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 3072d38..2b70618 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,13 +22,13 @@ STEXI
 ETEXI
 
 DEF("commit", img_commit,
-    "commit [-f fmt] filename")
+    "commit [-f fmt] [-t cache] filename")
 STEXI
 @item commit [-f @var{fmt}] @var{filename}
 ETEXI
 
 DEF("convert", img_convert,
-    "convert [-c] [-p] [-f fmt] [-O output_fmt] [-o options] [-s snapshot_name] filename [filename2 [...]] output_filename")
+    "convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] filename [filename2 [...]] output_filename")
 STEXI
 @item convert [-c] [-f @var{fmt}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
@@ -46,7 +46,7 @@ STEXI
 ETEXI
 
 DEF("rebase", img_rebase,
-    "rebase [-f fmt] [-p] [-u] -b backing_file [-F backing_fmt] filename")
+    "rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename")
 STEXI
 @item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename}
 ETEXI
diff --git a/qemu-img.c b/qemu-img.c
index 4f162d1..f904e32 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -40,6 +40,7 @@ typedef struct img_cmd_t {
 
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. */
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
+#define BDRV_DEFAULT_CACHE "writeback"
 
 static void format_print(void *opaque, const char *name)
 {
@@ -64,6 +65,8 @@ static void help(void)
            "Command parameters:\n"
            "  'filename' is a disk image filename\n"
            "  'fmt' is the disk image format. It is guessed automatically in most cases\n"
+           "  'cache' is the cache mode used to write the output disk image, the valid\n"
+           "    options are: 'none', 'writeback' (default), 'writethrough' and 'unsafe'\n"
            "  'size' is the disk image size in bytes. Optional suffixes\n"
            "    'k' or 'K' (kilobyte, 1024), 'M' (megabyte, 1024k), 'G' (gigabyte, 1024M)\n"
            "    and T (terabyte, 1024G) are supported. 'b' is ignored.\n"
@@ -180,6 +183,27 @@ static int read_password(char *buf, int buf_size)
 }
 #endif
 
+static int set_cache_flag(const char *mode, int *flags)
+{
+    *flags &= ~BDRV_O_CACHE_MASK;
+
+    if (!strcmp(mode, "none") || !strcmp(mode, "off")) {
+        *flags |= BDRV_O_CACHE_WB;
+        *flags |= BDRV_O_NOCACHE;
+    } else if (!strcmp(mode, "writeback")) {
+        *flags |= BDRV_O_CACHE_WB;
+    } else if (!strcmp(mode, "unsafe")) {
+        *flags |= BDRV_O_CACHE_WB;
+        *flags |= BDRV_O_NO_FLUSH;
+    } else if (!strcmp(mode, "writethrough")) {
+        /* this is the default */
+    } else {
+        return -1;
+    }
+
+    return 0;
+}
+
 static int print_block_option_help(const char *filename, const char *fmt)
 {
 BlockDriver *drv, *proto_drv;
@@ -441,13 +465,14 @@ static int img_check(int argc, char **argv)
 
 static int img_commit(int argc, char **argv)
 {
-    int c, ret;
-    const char *filename, *fmt;
+    int c, ret, flags;
+    const char *filename, *fmt, *cache;
     BlockDriverState *bs;
 
     fmt = NULL;
+    cache = BDRV_DEFAULT_CACHE;
     for(;;) {
-        c = getopt(argc, argv, "f:h");
+        c = getopt(argc, argv, "f:ht:");
         if (c == -1) {
             break;
         }
@@ -459,6 +484,9 @@ static int img_commit(int argc, char **argv)
         case 'f':
             fmt = optarg;
             break;
+        case 't':
+            cache = optarg;
+            break;
         }
     }
     if (optind >= argc) {
@@ -466,7 +494,14 @@ static int img_commit(int argc, char **argv)
     }
     filename = argv[optind++];
 
-    bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
+    flags = BDRV_O_RDWR;
+    ret = set_cache_flag(cache, &flags);
+    if (ret < 0) {
+        error_report("Invalid cache option: %s\n", cache);
+        return -1;
+    }
+
+    bs = bdrv_new_open(filename, fmt, flags);
     if (!bs) {
         return 1;
     }
@@ -591,8 +626,8 @@ static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
 static int img_convert(int argc, char **argv)
 {
     int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
-    int progress = 0;
-    const char *fmt, *out_fmt, *out_baseimg, *out_filename;
+    int progress = 0, flags;
+  

Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Jan Kiszka
On 2011-06-20 18:34, Avi Kivity wrote:
 
   Do ILO cards / IPMI support something like this?  We could follow
 their
   lead in that case.

 The only two things which came to my mind are:

   * NMI (aka. ipmitool diag) - already available in qemu/kvm - but
 requires
 in-guest kexec/kdump
   * Hardware-Watchdog (also available in qemu/libvirt)
 
  A watchdog has the advantage that it also detects lockups.
 
 In fact you could implement the panic device via the existing
 watchdogs.  Simply program the timer for the minimum interval and
 *don't* service the interrupt.  This would work for non-virt setups as
 well as another way to issue a reset.

If you manage to bring down the other guest CPUs fast enough. Otherwise,
they may corrupt your crashdump before the host had a chance to collect
all pieces. Synchronous signaling to the hypervisor is a bit safer.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Avi Kivity

On 06/20/2011 08:13 PM, Jan Kiszka wrote:

  A watchdog has the advantage that it also detects lockups.

  In fact you could implement the panic device via the existing
  watchdogs.  Simply program the timer for the minimum interval and
  *don't* service the interrupt.  This would work for non-virt setups as
  well as another way to issue a reset.

If you manage to bring down the other guest CPUs fast enough. Otherwise,
they may corrupt your crashdump before the host had a chance to collect
all pieces. Synchronous signaling to the hypervisor is a bit safer.


You could NMI-IPI them.  But I agree a synchronous signal is better 
(note it's not race-free itself).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent

2011-06-20 Thread Marcelo Tosatti
On Tue, Jun 07, 2011 at 08:59:25PM +0800, Xiao Guangrong wrote:
 Set slot bitmap only if the spte is present
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmu.c |   15 +++
  1 files changed, 7 insertions(+), 8 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index cda666a..125f78d 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
 gfn_t gfn)
   struct kvm_mmu_page *sp;
   unsigned long *rmapp;
  
 - if (!is_rmap_spte(*spte))
 - return 0;
 -

Not sure if this is safe: what if the spte is set as nonpresent but the
rmap is not removed?

BTW i don't see what patch 1 and this have to do with the goal
of the series.



Re: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent

2011-06-20 Thread Xiao Guangrong
On 06/21/2011 12:28 AM, Marcelo Tosatti wrote:
 On Tue, Jun 07, 2011 at 08:59:25PM +0800, Xiao Guangrong wrote:
 Set slot bitmap only if the spte is present

 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmu.c |   15 +++
  1 files changed, 7 insertions(+), 8 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index cda666a..125f78d 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
 gfn_t gfn)
  struct kvm_mmu_page *sp;
  unsigned long *rmapp;
  
 -if (!is_rmap_spte(*spte))
 -return 0;
 -
 
 Not sure if this is safe, what if the spte is set as nonpresent but
 rmap not removed?

It cannot happen: when we set the spte as nonpresent, we always use
drop_spte to remove the rmap; we also do it in set_spte()

 
 BTW i don't see what patch 1 and this have to do with the goal
 of the series.
 


These are the preparation patches for the mmio page fault:
- Patch 1 fixes a bug in walking the shadow page table, so we can safely use
  it to locklessly walk shadow pages
- Patch 2 avoids adding rmap for the mmio spte :-)



Re: [PATCH 10/15] KVM: MMU: lockless walking shadow page table

2011-06-20 Thread Xiao Guangrong
On 06/21/2011 12:37 AM, Marcelo Tosatti wrote:

 +	if (atomic_read(&kvm->arch.reader_counter)) {
 +		free_mmu_pages_unlock_parts(invalid_list);
 +		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
 +		list_del_init(invalid_list);
 +		call_rcu(&sp->rcu, free_invalid_pages_rcu);
 +		return;
 +	}
 
 This is probably wrong, the caller wants the page to be zapped by the 
 time the function returns, not scheduled sometime in the future.
 

It can be freed soon and KVM does not reuse these pages anymore...
it is not too bad, no?

 +
 	do {
 		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
 		WARN_ON(!sp->role.invalid || sp->root_count);
 @@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
 
  
 +int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
 +				      u64 sptes[4])
 +{
 +	struct kvm_shadow_walk_iterator iterator;
 +	int nr_sptes = 0;
 +
 +	rcu_read_lock();
 +
 +	atomic_inc(&vcpu->kvm->arch.reader_counter);
 +	/* Increase the counter before walking shadow page table */
 +	smp_mb__after_atomic_inc();
 +
 +	for_each_shadow_entry(vcpu, addr, iterator) {
 +		sptes[iterator.level-1] = *iterator.sptep;
 +		nr_sptes++;
 +		if (!is_shadow_present_pte(*iterator.sptep))
 +			break;
 +	}
 
 Why is lockless access needed for the MMIO optimization? Note the spte
 contents copied to the array here are used for debugging purposes
 only; their contents are potentially stale.
 

Um, we can use it to check whether the mmio page fault is a real mmio access or
a bug of KVM; i discussed it with Avi:

===

 Yes, it is, i just want to detect BUGs for KVM, it helps us to know if ept
 misconfig is the real MMIO or a BUG. I noticed some ept misconfig BUGs were
 reported before, so i think doing this is necessary, and i think it is not
 too bad, since walking the spte hierarchy is lockless, it is really fast.

Okay.  We can later see if it shows up on profiles.
===

And it is really fast, i will attach the 'perf result' when the v2 is posted.

Yes, their contents are potentially stale; we just use them to check mmio.
After all, if we get a stale spte, we will call the page fault path to fix it.




Re: [PATCH] KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB

2011-06-20 Thread Marcelo Tosatti
On Sun, Jun 12, 2011 at 06:25:00PM +0300, Avi Kivity wrote:
 kvm_set_cr0() and kvm_set_cr4(), and possible other functions,
 assume that kvm_mmu_reset_context() flushes the guest TLB.  However,
 it does not.

TLB flush should be done lazily during guest entry, in
kvm_mmu_load(). Don't see why this patch is needed.

 
 Fix by flushing the tlb (and syncing the new root as well).
 
 Signed-off-by: Avi Kivity a...@redhat.com



Re: [PATCH 00/12] [uq/master] Import linux headers and some cleanups

2011-06-20 Thread Marcelo Tosatti
On Wed, Jun 08, 2011 at 04:10:54PM +0200, Jan Kiszka wrote:
 Licensing of the virtio headers is now clarified. So we can finally
 resolve the clumsy and constantly buggy #ifdef'ery around old KVM and
 virtio headers. Recent example: current qemu-kvm does not build against
 2.6.32 headers.
 
 This series introduces an import mechanism for all required Linux
 headers so that the appropriate versions can be kept safely inside the
 QEMU tree. I've incorporated all the valuable review comments on the
 first version and rebased the result over current uq/master after
 rebasing that one over current QEMU master.
 
 Please note that I had no chance to test-build PPC or s390.
 
 Beside the header topic, this series also includes a few assorted KVM
 cleanup patches so that my queue is empty again.

Applied all, thanks.



Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Anthony Liguori

On 06/20/2011 10:31 AM, Avi Kivity wrote:

On 06/20/2011 04:38 PM, Daniel Gollub wrote:

Introduce panic hypercall to enable the crashing guest to notify the
host. This enables the host to run some actions as soon a guest
crashed (kernel panic).

This patch series introduces the panic hypercall at the host end.
As well as the hypercall for KVM paravirtuliazed Linux guests, by
registering the hypercall to the panic_notifier_list.

The basic idea is to create a KVM crashdump automatically as soon as the
guest panicked, and power-cycle the VM (e.g. libvirt's on_crash handling).


This would be more easily done via a panic device (I/O port or
memory-mapped address) that the guest hits. It would be intercepted by
qemu without any new code in kvm.

However, I'm not sure I see the gain. Most enterprisey guests already
contain in-guest crash dumpers which provide more information than a
qemu memory dump could, since they know exact load addresses etc. and
are integrated with crash analysis tools. What do you have in mind?


FYI, s390 has this functionality.  It's useful because there's no use in 
having a guest just spin in a panic loop.  Crash dump integration is 
much more complicated and requires functioning networking or some 
paravirt channel.


Regards,

Anthony Liguori




Re: [PATCH v2 3/7] KVM-HV: KVM Steal time implementation

2011-06-20 Thread Marcelo Tosatti
On Sun, Jun 19, 2011 at 12:57:53PM +0300, Avi Kivity wrote:
 On 06/17/2011 01:20 AM, Glauber Costa wrote:
 To implement steal time, we need the hypervisor to pass the guest information
 about how much time was spent running other processes outside the VM.
 This is per-vcpu, and using the kvmclock structure for that is an abuse
 we decided not to make.
 
 In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that
 holds the memory area address containing information about steal time
 
 This patch contains the hypervisor part for it. I am keeping it separate from
 the headers to facilitate backports to people who wants to backport the 
 kernel
 part but not the hypervisor, or the other way around.
 
 
 
 +#define KVM_STEAL_ALIGNMENT_BITS 5
 +#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
 +#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
 
 Clumsy, but okay.
 
 +static void record_steal_time(struct kvm_vcpu *vcpu)
 +{
 +	u64 delta;
 +
 +	if (vcpu->arch.st.stime && vcpu->arch.st.this_time_out) {
 
 0 is a valid value for stime.
 
 +
 +		if (unlikely(kvm_read_guest(vcpu->kvm, vcpu->arch.st.stime,
 +			&vcpu->arch.st.steal, sizeof(struct kvm_steal_time)))) {
 +
 +			vcpu->arch.st.stime = 0;
 +			return;
 +		}
 +
 +		delta = (get_kernel_ns() - vcpu->arch.st.this_time_out);
 +
 +		vcpu->arch.st.steal.steal += delta;
 +		vcpu->arch.st.steal.version += 2;
 +
 +		if (unlikely(kvm_write_guest(vcpu->kvm, vcpu->arch.st.stime,
 +			&vcpu->arch.st.steal, sizeof(struct kvm_steal_time)))) {
 +
 +			vcpu->arch.st.stime = 0;
 +			return;
 +		}
 +	}
 +
 +}
 +
 
 @@ -2158,6 +2206,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_migrate_timers(vcpu);
 		vcpu->cpu = cpu;
 	}
 +
 +	record_steal_time(vcpu);
  }
 
 This records time spent in userspace in the vcpu thread as steal
 time.  Is this what we want?  Or just time preempted away?

It also accounts halt time (kvm_vcpu_block) as steal time. Glauber, you
could instead use the runnable-state-but-waiting-in-runqueue field of
SCHEDSTATS, i forgot the exact name.



Re: Unable to unload kvm-intel module

2011-06-20 Thread AP
On Sun, Jun 19, 2011 at 2:06 AM, Jan Kiszka jan.kis...@web.de wrote:
 On 2011-06-17 20:04, AP wrote:
 I tried that and it did not give me any warning. Here is the compilation
 output:

 make -C /lib/modules/2.6.38-8-generic/build M=`pwd` \
               LINUXINCLUDE=-I`pwd`/include -Iinclude \
                        -Iarch/x86/include \
                       -I`pwd`/include-compat -I`pwd`/x86 \
                       -include  include/generated/autoconf.h \
                       -include `pwd`/x86/external-module-compat.h \
               $@
 make[1]: Entering directory `/usr/src/linux-headers-2.6.38-8-generic'
   CC [M]  /home/ap/dev/kvm/kvm-kmod/x86/vmx.o
   LD [M]  /home/ap/dev/kvm/kvm-kmod/x86/kvm.o
   LD [M]  /home/ap/dev/kvm/kvm-kmod/x86/kvm-intel.o
   LD [M]  /home/ap/dev/kvm/kvm-kmod/x86/kvm-amd.o
   Building modules, stage 2.
   MODPOST 3 modules
   LD [M]  /home/ap/dev/kvm/kvm-kmod/x86/kvm-amd.ko
   CC      /home/ap/dev/kvm/kvm-kmod/x86/kvm-intel.mod.o
   LD [M]  /home/ap/dev/kvm/kvm-kmod/x86/kvm-intel.ko
   LD [M]  /home/ap/dev/kvm/kvm-kmod/x86/kvm.ko
 make[1]: Leaving directory `/usr/src/linux-headers-2.6.38-8-generic'

 Do you install the built modules and then do a modprobe, or how do you
 load them? Also try via

    insmod /home/ap/dev/kvm/kvm-kmod/x86/kvm.ko

I don't modprobe them. I use the insmod command above after I rmmod
the existing drivers.

I tried doing a make install and modprobe. No luck!

 I don't know. Something must be broken with your Ubuntu installation.

This is looking very likely at this point. Thanks for all the help.

AP


[PATCH] client tools: Fix rebase bug on cd_hash.py

2011-06-20 Thread Lucas Meneghel Rodrigues
I really thought I had fixed this one. cd_hash makes reference
to a KvmLoggingConfig class that existed prior to the refactor.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tools/cd_hash.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tools/cd_hash.py b/client/tools/cd_hash.py
index c658447..3db1e47 100755
--- a/client/tools/cd_hash.py
+++ b/client/tools/cd_hash.py
@@ -16,7 +16,7 @@ if __name__ == __main__:
 parser = optparse.OptionParser(usage: %prog [options] [filenames])
 options, args = parser.parse_args()
 
-logging_manager.configure_logging(virt_utils.KvmLoggingConfig())
+logging_manager.configure_logging(virt_utils.VirtLoggingConfig())
 
 if args:
 filenames = args
-- 
1.7.5.4



Re: current qemu-kvm doesn't work with vhost

2011-06-20 Thread Georg Hopp

On 19.06.2011 10:58:20, Jan Kiszka wrote:

On 2011-06-17 22:31, Georg Hopp wrote:
 On 17.06.2011 09:29:41, Jan Kiszka wrote:
 On 2011-06-17 09:10, Georg Hopp wrote:
  Jan Kiszka jan.kiszka at web.de writes:
 
  On 2011-06-10 05:08, Amos Kong wrote:
  host kernel: 2.6.39-rc2+
  qemu-kvm : 05f1737582ab6c075476bde931c5eafbc62a9349
 
  (gdb) r -monitor stdio -m 800 ~/RHEL-Server-6.0-64-virtio.qcow2 -snapshot
  -device virtio-net-pci,netdev=he -netdev tap,vhost=on,id=he
 
 
  I already came across that symptom in a different context. Fixed by
  the patch below.
 
  However, the real issue is related to an upstream cleanup of the
  virtio-pci build. That reveals some unneeded build dependencies in
  qemu-kvm. Will post a fix.
 
  Jan
 
  FYI
 
  I encountered the same problem and applied the patch.
 
  Well this results in the following error while starting the guest:
 
   qemu-system-x86_64: unable to start vhost net: 38:
  falling back on userspace virtio
 
  and i have no network at all. I will disable vhost=on for now.

 Hmm, works fine for me. The vhost-net module is loaded (though I got a
 different message when I forgot to load it)?

 Jan


 Generally it works for me until git revision b2146d8bd.

You mean including that commit, right?


 I have compiled vhost-net directly in my kernel so I have definitely not
 forgotten to load it...
 As I use gentoo I made an ebuild that installs exactly this revision.
 If I find the time to do some debugging I will do so, but actually I am
 very busy with my job & family and the use of kvm is just a sparetime
 thing. :D

 If it would be of some help I can set a breakpoint just before the path
 and see what causes the message.

Let's start with double-checking that you are on ce5f0a588b, did a make
clean && make, and then actually used that result. I suspect an
inconsistent build as ce5f0a588b makes the difference between ENOSYS
(38) and working vhost support here.

Jan




Hi,

sorry for the long wait. Had a hard weekend. ;)

at revision: ce5f0a588b - check
applied patch   - check
clean make  - check

And now everything works as expected... well, at least i got no error at all.

Thanks! Anything else i can do?

Georg


Re: current qemu-kvm doesn't work with vhost

2011-06-20 Thread Georg Hopp

On 19.06.2011 10:58:20, Jan Kiszka wrote:

[...]
Hi again,

tried the current HEAD without any patches and vhost works again for me
and with much better performance than before.


Client connecting to host, TCP port 5001
TCP window size: 16.0 KByte (default)

[  3] local 192.168.100.4 port 48053 connected with 192.168.100.1 port 5001

[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  7.63 GBytes  6.55 Gbits/sec

Thanks for the great work!

Greets
   Georg