Re: [PATCH v2 3/7] KVM-HV: KVM Steal time implementation
On 06/20/2011 05:53 AM, Glauber Costa wrote:
>>> +static void record_steal_time(struct kvm_vcpu *vcpu)
>>> +{
>>> +	u64 delta;
>>> +
>>> +	if (vcpu->arch.st.stime && vcpu->arch.st.this_time_out) {
>>
>> 0 is a valid value for stime.
>
> how exactly? stime is a guest physical address...

0 is a valid physical address.

>>> @@ -2158,6 +2206,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>> 			kvm_migrate_timers(vcpu);
>>> 		vcpu->cpu = cpu;
>>> 	}
>>> +
>>> +	record_steal_time(vcpu);
>>> }
>>
>> This records time spent in userspace in the vcpu thread as steal
>> time.  Is this what we want?  Or just time preempted away?
>
> There are arguments either way. Right now, the way it is, it does
> account our iothread as steal time, which is not 100% accurate if we
> think of steal time as whatever takes time away from our VM.
>
> I tend to think of it as whatever takes time away from this CPU, which
> includes other cpus in the same VM. Thinking this way, in a 1-1
> phys-to-virt cpu mapping, if the iothread is taking 80% cpu for
> whatever reason, we have 80% steal time on the cpu that is sharing the
> physical cpu with the iothread.

I'm not talking about the iothread, rather the vcpu thread while
running in userspace.

> Maybe we could account that as iotime?
>
> Questions like that are one of the reasons behind me leaving extra
> fields in the steal time structure. We could do more fine grained
> accounting and differentiate between the multiple entities that can do
> work (of various kinds) on our behalf.

What do other architectures do (xen, s390)?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/7] KVM-GST: KVM Steal time accounting
On 06/20/2011 05:38 AM, Glauber Costa wrote:
> On 06/19/2011 07:04 AM, Avi Kivity wrote:
>> On 06/17/2011 01:20 AM, Glauber Costa wrote:
>>> This patch accounts steal time in kernel/sched. I kept it from the
>>> last proposal, because I still see advantages in it: doing it here
>>> gives us easier access to scheduler variables such as the cpu rq.
>>> The next patch shows an example of usage for it.
>>>
>>> Since functions like account_idle_time() can be called from multiple
>>> places, not only account_process_tick(), steal time grabbing is
>>> repeated in each account function separately.
>>>
>>>  /*
>>> + * We have to flush steal time information every time something else
>>> + * is accounted. Since the accounting functions are all visible to the rest
>>> + * of the kernel, it gets tricky to do them in one place. This helper function
>>> + * helps us.
>>> + *
>>> + * When the system is idle, the concept of steal time does not apply. We just
>>> + * tell the underlying hypervisor that we grabbed the data, but skip steal time
>>> + * accounting
>>> + */
>>> +static inline bool touch_steal_time(int is_idle)
>>> +{
>>> +	u64 steal, st = 0;
>>> +
>>> +	if (static_branch(&paravirt_steal_enabled)) {
>>> +
>>> +		steal = paravirt_steal_clock(smp_processor_id());
>>> +
>>> +		steal -= this_rq()->prev_steal_time;
>>> +		if (is_idle) {
>>> +			this_rq()->prev_steal_time += steal;
>>> +			return false;
>>> +		}
>>> +
>>> +		while (steal >= TICK_NSEC) {
>>> +			/*
>>> +			 * Inline assembly required to prevent the compiler
>>> +			 * optimising this loop into a divmod call.
>>> +			 * See __iter_div_u64_rem() for another example of this.
>>> +			 */
>>
>> Why not use said function?
>
> Because here we want to do work during each loop iteration. The said
> function would have to be adapted for that, possibly using a macro, to
> run arbitrary code during each iteration, in a way that I don't think
> is worth it given the current number of callers (2 counting this new
> one).

You mean adding to prev_steal_time?  That can be done outside the loop.

>>> +			asm("" : "+rm" (steal));
>>> +
>>> +			steal -= TICK_NSEC;
>>> +			this_rq()->prev_steal_time += TICK_NSEC;
>>> +			st++;
>>
>> Suppose a live migration or SIGSTOP causes lots of steal time.  How
>> long will we spend here?
>
> Silly me. I actually used this same argument with Peter to cap it with
> delta in the next patch in this series. So I think you are 100% right.
> Here, however, we do want to account all that time, I believe. How
> about we do a slow division if we're over 10 sec (unlikely), and
> account everything as steal time in this scenario?

Okay.  Division would be faster for a lot less than 10s though.
Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation
On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
> On 06/19/2011 03:59 PM, Gleb Natapov wrote:
>> On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
>>> On 06/15/2011 12:09 PM, Gleb Natapov wrote:
>>>>> Actually, I'd expect most read/writes to benefit from caching, no?
>>>>> So why don't we just rename kvm_write_guest_cached() to
>>>>> kvm_write_guest(), and the few places - if any - that need to force
>>>>> traversing of the gfn mappings, get renamed to
>>>>> kvm_write_guest_uncached?
>>>>
>>>> Good idea. I do not see any places where kvm_write_guest_uncached is
>>>> needed from a brief look. Avi?
>>>
>>> kvm_write_guest_cached() needs something to supply the cache, and
>>> needs recurring writes to the same location.  Neither of these are
>>> common (for example, instruction emulation doesn't have either).
>>
>> Correct. Missed that. So what about changing steal time to use
>> kvm_write_guest_cached()?
>
> Makes sense, definitely.  Want to post read_guest_cached() as well?

Glauber, can you write read_guest_cached() as part of your series
(should be trivial), or do you want me to do it? I do not have code to
test it with though :)

--
			Gleb.
KVM call agenda for June 21
Please send in any agenda items you are interested in covering.

thanks,
-juan
Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation
On 06/20/2011 10:21 AM, Gleb Natapov wrote:
> On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
>> On 06/19/2011 03:59 PM, Gleb Natapov wrote:
>>> On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
>>>> On 06/15/2011 12:09 PM, Gleb Natapov wrote:
>>>>>> Actually, I'd expect most read/writes to benefit from caching,
>>>>>> no? So why don't we just rename kvm_write_guest_cached() to
>>>>>> kvm_write_guest(), and the few places - if any - that need to
>>>>>> force traversing of the gfn mappings, get renamed to
>>>>>> kvm_write_guest_uncached?
>>>>>
>>>>> Good idea. I do not see any places where kvm_write_guest_uncached
>>>>> is needed from a brief look. Avi?
>>>>
>>>> kvm_write_guest_cached() needs something to supply the cache, and
>>>> needs recurring writes to the same location.  Neither of these are
>>>> common (for example, instruction emulation doesn't have either).
>>>
>>> Correct. Missed that. So what about changing steal time to use
>>> kvm_write_guest_cached()?
>>
>> Makes sense, definitely.  Want to post read_guest_cached() as well?
>
> Glauber, can you write read_guest_cached() as part of your series
> (should be trivial), or do you want me to do it? I do not have code to
> test it with though :)

Yes.  (you can write it, and Glauber can include it in the series)

--
error compiling committee.c: too many arguments to function
Re: Why doesn't Intel e1000 NIC work correctly in Windows XP?
Not sure it might help, but IIRC the original e1000 driver for Windows
had some bugs that were fixed if you download the most recent driver
from the Intel site. This was the case for the fully emulated e1000
qemu device and might help here too.

On 06/19/2011 03:29 PM, Flypen CloudMe wrote:
> Hi,
>
> Here is the command line:
>
> /usr/bin/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 \
>  -name winxp -uuid 23cd2751-8a30-dd34-db47-bfc8c76ccadb -nodefconfig -nodefaults \
>  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/winxp.monitor,server,nowait -mon chardev=monitor,mode=readline \
>  -rtc base=localtime -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x5 -device lsi,id=scsi1,bus=pci.0,addr=0x6 \
>  -device lsi,id=scsi2,bus=pci.0,addr=0x7 -device lsi,id=scsi3,bus=pci.0,addr=0x8 \
>  -drive file=/mnt/vmdisk/winxp.disk,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none \
>  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
>  -drive file=/mnt/vmdisk/virtio-win-1.1.16.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none \
>  -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/dev/sd1,if=none,id=drive-scsi0-0-0,format=raw,cache=none \
>  -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 \
>  -drive file=/dev/sdb,if=none,id=drive-scsi0-0-1,format=raw,cache=none \
>  -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 \
>  -drive file=/dev/sdc,if=none,id=drive-scsi0-0-2,format=raw,cache=none \
>  -device scsi-disk,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 \
>  -drive file=/dev/sdd,if=none,id=drive-scsi0-0-3,format=raw,cache=none \
>  -device scsi-disk,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 \
>  -drive file=/dev/sde,if=none,id=drive-scsi0-0-4,format=raw,cache=none \
>  -device scsi-disk,bus=scsi0.0,scsi-id=4,drive=drive-scsi0-0-4,id=scsi0-0-4 \
>  -drive file=/dev/sdf,if=none,id=drive-scsi3-0-0,format=raw,cache=none \
>  -device scsi-disk,bus=scsi3.0,scsi-id=0,drive=drive-scsi3-0-0,id=scsi3-0-0 \
>  -drive file=/mnt/vmdisk/D/1,if=none,id=drive-scsi0-0-6,format=raw,cache=none \
>  -device scsi-disk,bus=scsi0.0,scsi-id=6,drive=drive-scsi0-0-6,id=scsi0-0-6 \
>  -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb \
>  -vnc 0.0.0.0:0 -k en-us -vga vmware -device pci-assign,host=02:00.0,id=hostdev0,configfd=18,bus=pci.0,addr=0x3 \
>  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
>
> The NIC and one SCSI controller (slot 7) have the same IRQ. The
> performance in XP is really bad. When writing traffic to the drive,
> the NIC can't be accessed, and ping will also time out. If I give the
> NIC a different IRQ number, then everything is OK. Is it related to
> the INTx model for XP?
>
> We rebuilt QEMU and added the LSI SCSI controller support. Why does
> RHEL6 remove its support? Is this controller too old? Are there any
> emulated SCSI devices to replace it?
>
> Thanks,
> flypen
>
> On Thu, Jun 16, 2011 at 2:42 AM, Alex Williamson
> <alex.william...@redhat.com> wrote:
>> On Wed, 2011-06-15 at 11:31 +0200, Jan Kiszka wrote:
>>> On 2011-06-15 10:04, Jan Kiszka wrote:
>>>> On 2011-06-15 02:54, Alex Williamson wrote:
>>>>> On Tue, 2011-06-14 at 16:11 +0800, Flypen CloudMe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I use Redhat Enterprise Linux 6, and use the KVM that is released
>>>>>> by Redhat officially. The kernel version is 2.6.32-71.el6.x86_64.
>>>>>> It seems that the IRQs are conflicted after reboot. The NIC and
>>>>>> the SCSI controller have the same IRQ number. If I re-install the
>>>>>> NIC driver, the IRQ number of the NIC will be assigned another
>>>>>> value, then it can work normally. Do we have a way to let the NIC
>>>>>> and the SCSI controller have different IRQ numbers in the VM?
>>>>>
>>>>> Hmm, I'm still confused here. I went back and double checked, and
>>>>> as I thought, we disable the LSI SCSI controller in the RHEL6 KVM.
>>>>> So I'm curious what this device is. Is it an assigned SCSI
>>>>> controller, or is there another one that we forgot to disable in
>>>>> RHEL, or is this a different version of KVM? The config file or
>>>>> command line would be handy here.
>>>>>
>>>>> I'll see if I can reproduce and figure anything out. Windows XP
>>>>> isn't a guest we concentrate on, especially with device
>>>>> assignment. Are you using an AMD or Intel host system? Does the
>>>>> same thing happen if you run the XP guest on an IDE controller? It
>>>>> would be helpful to post the guest configuration, command line
>>>>> used or libvirt xml. Also, you might try latest upstream qemu-kvm
>>>>> to see if the problem still exists.
>>>>>
>>>>> I tested with an 82578DM e1000e NIC on an Intel host system, and
>>>>> it surprisingly worked just fine on the RHEL6.0 base. This is with
>>>>> a 32bit Windows XP SP3 install. The device supports MSI, but
>>>>> windows only seems to use it with INTx. I did have to remove the
>>>>> emulated rtl8139 or else I couldn't even boot due to BSODs in the
>>>>> guest.
>>>
>>> Nonsense, can't make a difference as the PIIX3 resets the routing to
>>> disable - which device-assignment does not deal with, but that's
>>> unrelated.
>>
>> Yep, someone has to write it at some point and device assignment will
>> catch that.  Try assigning a
Re: [PATCH 3/3] KVM: MMU: Use helpers to clean up walk_addr_generic()
On 06/14/2011 08:03 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
>
> Introduce two new helpers: set_accessed_bit() and is_last_gpte().
>
> These names were suggested by Ingo and Avi.
>
> Cc: Ingo Molnar <mi...@elte.hu>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
> ---
>  arch/x86/kvm/paging_tmpl.h |   57 ++++++++++++++++++++++++++++++++------------
>  1 files changed, 42 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 92fe275..d655a4b6 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -113,6 +113,43 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
>  	return access;
>  }
>
> +static int FNAME(set_accessed_bit)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +				   gfn_t table_gfn, unsigned index,
> +				   pt_element_t __user *ptep_user,
> +				   pt_element_t *ptep)
> +{
> +	int ret;
> +
> +	trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(*ptep));
> +	ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
> +				  *ptep, *ptep|PT_ACCESSED_MASK);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	mark_page_dirty(vcpu->kvm, table_gfn);
> +	*ptep |= PT_ACCESSED_MASK;
> +
> +	return 0;
> +}

I don't think this one is worthwhile, it takes 7 parameters!  If
there's so much communication between caller and callee, it means they
are too heavily tied up.

> +
> +static bool FNAME(is_last_gpte)(struct guest_walker *walker,
> +				struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +				pt_element_t gpte)
> +{
> +	if (walker->level == PT_PAGE_TABLE_LEVEL)
> +		return true;
> +
> +	if ((walker->level == PT_DIRECTORY_LEVEL) && is_large_pte(gpte) &&
> +	    (PTTYPE == 64 || is_pse(vcpu)))
> +		return true;
> +
> +	if ((walker->level == PT_PDPE_LEVEL) && is_large_pte(gpte) &&
> +	    (mmu->root_level == PT64_ROOT_LEVEL))
> +		return true;
> +
> +	return false;
> +}
> +

This one is much better.
Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation
On 06/20/2011 05:02 AM, Avi Kivity wrote:
> On 06/20/2011 10:21 AM, Gleb Natapov wrote:
>> On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
>>> On 06/19/2011 03:59 PM, Gleb Natapov wrote:
>>>> On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
>>>>> On 06/15/2011 12:09 PM, Gleb Natapov wrote:
>>>>>>> Actually, I'd expect most read/writes to benefit from caching,
>>>>>>> no? So why don't we just rename kvm_write_guest_cached() to
>>>>>>> kvm_write_guest(), and the few places - if any - that need to
>>>>>>> force traversing of the gfn mappings, get renamed to
>>>>>>> kvm_write_guest_uncached?
>>>>>>
>>>>>> Good idea. I do not see any places where kvm_write_guest_uncached
>>>>>> is needed from a brief look. Avi?
>>>>>
>>>>> kvm_write_guest_cached() needs something to supply the cache, and
>>>>> needs recurring writes to the same location.  Neither of these are
>>>>> common (for example, instruction emulation doesn't have either).
>>>>
>>>> Correct. Missed that. So what about changing steal time to use
>>>> kvm_write_guest_cached()?
>>>
>>> Makes sense, definitely.  Want to post read_guest_cached() as well?
>>
>> Glauber, can you write read_guest_cached() as part of your series
>> (should be trivial), or do you want me to do it? I do not have code
>> to test it with though :)
>
> Yes.  (you can write it, and Glauber can include it in the series)

Write it, hand me the patch, I'll include it and test it.
Re: [PATCH 3/7] KVM-HV: KVM Steal time implementation
On Mon, Jun 20, 2011 at 09:42:31AM -0300, Glauber Costa wrote:
> On 06/20/2011 05:02 AM, Avi Kivity wrote:
>> On 06/20/2011 10:21 AM, Gleb Natapov wrote:
>>> On Sun, Jun 19, 2011 at 04:02:22PM +0300, Avi Kivity wrote:
>>>> On 06/19/2011 03:59 PM, Gleb Natapov wrote:
>>>>> On Sun, Jun 19, 2011 at 03:35:58PM +0300, Avi Kivity wrote:
>>>>>> On 06/15/2011 12:09 PM, Gleb Natapov wrote:
>>>>>>>> Actually, I'd expect most read/writes to benefit from caching,
>>>>>>>> no? So why don't we just rename kvm_write_guest_cached() to
>>>>>>>> kvm_write_guest(), and the few places - if any - that need to
>>>>>>>> force traversing of the gfn mappings, get renamed to
>>>>>>>> kvm_write_guest_uncached?
>>>>>>>
>>>>>>> Good idea. I do not see any places where
>>>>>>> kvm_write_guest_uncached is needed from a brief look. Avi?
>>>>>>
>>>>>> kvm_write_guest_cached() needs something to supply the cache, and
>>>>>> needs recurring writes to the same location.  Neither of these
>>>>>> are common (for example, instruction emulation doesn't have
>>>>>> either).
>>>>>
>>>>> Correct. Missed that. So what about changing steal time to use
>>>>> kvm_write_guest_cached()?
>>>>
>>>> Makes sense, definitely.  Want to post read_guest_cached() as well?
>>>
>>> Glauber, can you write read_guest_cached() as part of your series
>>> (should be trivial), or do you want me to do it? I do not have code
>>> to test it with though :)
>>
>> Yes.  (you can write it, and Glauber can include it in the series)
>
> Write it, hand me the patch, I'll include it and test it.

Only compile tested.

===

Introduce kvm_read_guest_cached() function in addition to the write one
we already have.

Signed-off-by: Gleb Natapov <g...@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fa2321a..bf62c76 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1414,6 +1414,26 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
 
+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+			  void *data, unsigned long len)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	int r;
+
+	if (slots->generation != ghc->generation)
+		kvm_gfn_to_hva_cache_init(kvm, ghc, ghc->gpa);
+
+	if (kvm_is_error_hva(ghc->hva))
+		return -EFAULT;
+
+	r = __copy_from_user(data, (void __user *)ghc->hva, len);
+	if (r)
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
 	return kvm_write_guest_page(kvm, gfn, (const void *) empty_zero_page,

--
			Gleb.
[PATCH 1/2] Introduce panic hypercall KVM_HC_PANIC (host)
Introduce the panic hypercall (KVM_HC_PANIC) on the host end to signal
that the guest crashed/panicked. This gets signalled to userspace via
the KVM API ioctl KVM_RUN with exit_reason KVM_EXIT_PANIC.

Signed-off-by: Daniel Gollub <gol...@b1-systems.de>
---
 arch/x86/kvm/x86.c       |    9 +++++++++
 include/linux/kvm.h      |    1 +
 include/linux/kvm_host.h |    1 +
 include/linux/kvm_para.h |    1 +
 4 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d88de56..bbe91fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5103,6 +5103,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_MMU_OP:
 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
 		break;
+	case KVM_HC_PANIC:
+		set_bit(KVM_REQ_PANIC, &vcpu->requests);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -5431,6 +5435,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = 1;
 			goto out;
 		}
+		if (kvm_check_request(KVM_REQ_PANIC, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_PANIC;
+			r = 0;
+			goto out;
+		}
 	}
 
 	r = kvm_mmu_reload(vcpu);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 55ef181..8a8b609 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI              16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI              18
+#define KVM_EXIT_PANIC            19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b9c3299..1819414 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -47,6 +47,7 @@
 #define KVM_REQ_DEACTIVATE_FPU    10
 #define KVM_REQ_EVENT             11
 #define KVM_REQ_APF_HALT          12
+#define KVM_REQ_PANIC             13
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID	0
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 47a070b..5cdf61b 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_PANIC			5
 
 /*
  * hypercalls use architecture specific
--
1.7.1
[PATCH 0/2] Introduce panic hypercall
Introduce a panic hypercall to enable a crashing guest to notify the
host. This enables the host to run some action as soon as a guest
crashes (kernel panic).

This patch series introduces the panic hypercall at the host end, as
well as the hypercall for KVM paravirtualized Linux guests, by
registering the hypercall with the panic_notifier_list.

The basic idea is to create a KVM crashdump automatically as soon as
the guest panics, and to power-cycle the VM (e.g. libvirt <on_crash/>).

Daniel Gollub (2):
  Introduce panic hypercall KVM_HC_PANIC (host)
  Call KVM_HC_PANIC if guest panics

 arch/x86/kernel/kvm.c    |   16 ++++++++++++++++
 arch/x86/kvm/x86.c       |    9 +++++++++
 include/linux/kvm.h      |    1 +
 include/linux/kvm_host.h |    1 +
 include/linux/kvm_para.h |    1 +
 5 files changed, 28 insertions(+), 0 deletions(-)
[PATCH 2/2] Call KVM_HC_PANIC if guest panics
Call the KVM hypercall KVM_HC_PANIC if the guest kernel calls panic(),
to signal to the host that the guest panicked. Depends on
CONFIG_KVM_GUEST being set.

Signed-off-by: Daniel Gollub <gol...@b1-systems.de>
---
 arch/x86/kernel/kvm.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 33c07b0..f3c7d34 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -534,6 +534,20 @@ static void __init kvm_apf_trap_init(void)
 	set_intr_gate(14, &async_page_fault);
 }
 
+static int kvm_guest_panic(struct notifier_block *nb, unsigned long l, void *p)
+{
+	kvm_hypercall0(KVM_HC_PANIC);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_guest_paniced = {
+	.notifier_call = kvm_guest_panic
+};
+
+static void kvm_guest_panic_handler_init(void)
+{
+	atomic_notifier_chain_register(&panic_notifier_list,
+				       &kvm_guest_paniced);
+}
+
 void __init kvm_guest_init(void)
 {
 	int i;
@@ -541,6 +555,8 @@ void __init kvm_guest_init(void)
 	if (!kvm_para_available())
 		return;
 
+	kvm_guest_panic_handler_init();
+
 	paravirt_ops_setup();
 	register_reboot_notifier(&kvm_pv_reboot_nb);
 	for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++)
--
1.7.1
[PATCH 0/2] qemu-kvm: Introduce KVM panic hypercall support
Introduce KVM panic hypercall support to make QEMU aware of a crashed
guest. These patches are specific to a KVM paravirtualized guest, which
needs to call KVM_HC_PANIC on a crash/panic.

The basic idea of this implementation, and of the QMP PANIC event, is
to be able to create a crashdump via the hypervisor instead of
kexec/kdump as soon as the guest crashes.

Initial panic-QMP-event-enabled libvirt is in progress:
http://people.b1-systems.de/~gollub/kvm/hypercall-panic/libvirt/

Daniel Gollub (2):
  Handle KVM hypercall panic on guest crash
  QMP: Introduce QEVENT_PANIC

 QMP/qmp-events.txt      |   13 +++++++++++++
 kvm-all.c               |    5 +++++
 kvm/include/linux/kvm.h |    1 +
 monitor.c               |   11 +++++++++--
 monitor.h               |    1 +
 sysemu.h                |    3 +++
 vl.c                    |   20 ++++++++++++++++++++
 7 files changed, 52 insertions(+), 2 deletions(-)
[PATCH 1/2] Handle KVM hypercall panic on guest crash
If the guest crashes and the crash/panic handler calls the KVM panic
hypercall, the KVM API reports this with KVM_EXIT_PANIC. The VM status
is extended with "panic" so this status can be obtained via the QEMU
monitor.
---
 kvm-all.c               |    4 ++++
 kvm/include/linux/kvm.h |    1 +
 monitor.c               |    8 ++++++--
 sysemu.h                |    1 +
 vl.c                    |    2 ++
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 629f727..9771f91 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1029,6 +1029,10 @@ int kvm_cpu_exec(CPUState *env)
             qemu_system_reset_request();
             ret = EXCP_INTERRUPT;
             break;
+        case KVM_EXIT_PANIC:
+            panic = 1;
+            ret = 1;
+            break;
         case KVM_EXIT_UNKNOWN:
             fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
                     (uint64_t)run->hw.hardware_exit_reason);
diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..207871c 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI              16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI              18
+#define KVM_EXIT_PANIC            19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
diff --git a/monitor.c b/monitor.c
index 59a3e76..fd6a881 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2599,13 +2599,17 @@ static void do_info_status_print(Monitor *mon, const QObject *data)
         monitor_printf(mon, "paused");
     }
 
+    if (qdict_get_bool(qdict, "panic")) {
+        monitor_printf(mon, " (panic)");
+    }
+
     monitor_printf(mon, "\n");
 }
 
 static void do_info_status(Monitor *mon, QObject **ret_data)
 {
-    *ret_data = qobject_from_jsonf("{ 'running': %i, 'singlestep': %i }",
-                                   vm_running, singlestep);
+    *ret_data = qobject_from_jsonf("{ 'running': %i, 'singlestep': %i, 'panic': %i }",
+                                   vm_running, singlestep, panic);
 }
 
 static qemu_acl *find_acl(Monitor *mon, const char *name)
diff --git a/sysemu.h b/sysemu.h
index a42d83f..8ab0168 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -12,6 +12,7 @@
 extern const char *bios_name;
 
 extern int vm_running;
+extern int panic;
 extern const char *qemu_name;
 extern uint8_t qemu_uuid[];
 int qemu_uuid_parse(const char *str, uint8_t *uuid);
diff --git a/vl.c b/vl.c
index e0191e1..1d9a068 100644
--- a/vl.c
+++ b/vl.c
@@ -185,6 +185,7 @@ int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int vm_running;
+int panic = 0;
 int autostart;
 int incoming_expected; /* Started with -incoming and waiting for incoming */
 static int rtc_utc = 1;
@@ -1407,6 +1408,7 @@ static void main_loop(void)
             pause_all_vcpus();
             cpu_synchronize_all_states();
             qemu_system_reset();
+            panic = 0;
             resume_all_vcpus();
         }
         if (qemu_powerdown_requested()) {
--
1.7.1
[PATCH 2/2] QMP: Introduce QEVENT_PANIC
Emitted when the guest panics. For now only if KVM_EXIT_PANIC got
triggered.

Signed-off-by: Daniel Gollub <gol...@b1-systems.de>
---
 QMP/qmp-events.txt |   13 +++++++++++++
 kvm-all.c          |    3 ++-
 monitor.c          |    3 +++
 monitor.h          |    1 +
 sysemu.h           |    2 ++
 vl.c               |   18 ++++++++++++++++++
 6 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 0ce5d4e..96e4307 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -264,3 +264,16 @@ Example:
 
 Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
 followed respectively by the RESET, SHUTDOWN, or STOP events.
+
+
+PANIC
+-----
+
+Emitted when the guest panics.
+
+Data: None.
+
+Example:
+
+{ "timestamp": {"seconds": 1308569038, "microseconds": 918147},
+  "event": "PANIC"}
diff --git a/kvm-all.c b/kvm-all.c
index 9771f91..9fdda69 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1030,7 +1030,8 @@ int kvm_cpu_exec(CPUState *env)
             ret = EXCP_INTERRUPT;
             break;
         case KVM_EXIT_PANIC:
-            panic = 1;
+            DPRINTF("panic\n");
+            qemu_system_panic_request();
             ret = 1;
             break;
         case KVM_EXIT_UNKNOWN:
diff --git a/monitor.c b/monitor.c
index fd6a881..5b337f2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -468,6 +468,9 @@ void monitor_protocol_event(MonitorEvent event, QObject *data)
         case QEVENT_SPICE_DISCONNECTED:
             event_name = "SPICE_DISCONNECTED";
             break;
+        case QEVENT_PANIC:
+            event_name = "PANIC";
+            break;
         default:
             abort();
             break;
diff --git a/monitor.h b/monitor.h
index 4f2d328..8b045df 100644
--- a/monitor.h
+++ b/monitor.h
@@ -35,6 +35,7 @@ typedef enum MonitorEvent {
     QEVENT_SPICE_CONNECTED,
     QEVENT_SPICE_INITIALIZED,
     QEVENT_SPICE_DISCONNECTED,
+    QEVENT_PANIC,
     QEVENT_MAX,
 } MonitorEvent;
 
diff --git a/sysemu.h b/sysemu.h
index 8ab0168..30744b0 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -43,11 +43,13 @@ void qemu_system_shutdown_request(void);
 void qemu_system_powerdown_request(void);
 void qemu_system_debug_request(void);
 void qemu_system_vmstop_request(int reason);
+void qemu_system_panic_request(void);
 int qemu_shutdown_requested_get(void);
 int qemu_reset_requested_get(void);
 int qemu_shutdown_requested(void);
 int qemu_reset_requested(void);
 int qemu_powerdown_requested(void);
+int qemu_panic_requested(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_kill_report(void);
 extern qemu_irq qemu_system_powerdown;
diff --git a/vl.c b/vl.c
index 1d9a068..d997c36 100644
--- a/vl.c
+++ b/vl.c
@@ -1173,6 +1173,7 @@ static pid_t shutdown_pid;
 static int powerdown_requested;
 static int debug_requested;
 static int vmstop_requested;
+static int panic_requested;
 
 int qemu_shutdown_requested_get(void)
 {
@@ -1235,6 +1236,13 @@ static int qemu_vmstop_requested(void)
     return r;
 }
 
+int qemu_panic_requested(void)
+{
+    int r = panic_requested;
+    panic_requested = 0;
+    return r;
+}
+
 void qemu_register_reset(QEMUResetHandler *func, void *opaque)
 {
     QEMUResetEntry *re = qemu_mallocz(sizeof(QEMUResetEntry));
@@ -1311,6 +1319,13 @@ void qemu_system_vmstop_request(int reason)
     qemu_notify_event();
 }
 
+void qemu_system_panic_request(void)
+{
+    panic = 1;
+    panic_requested = 1;
+    qemu_notify_event();
+}
+
 void main_loop_wait(int nonblocking)
 {
     fd_set rfds, wfds, xfds;
@@ -1418,6 +1433,9 @@ static void main_loop(void)
         if ((r = qemu_vmstop_requested())) {
             vm_stop(r);
         }
+        if (qemu_panic_requested()) {
+            monitor_protocol_event(QEVENT_PANIC, NULL);
+        }
     }
 
     bdrv_close_all();
     pause_all_vcpus();
--
1.7.1
[PATCH v3 1/3] KVM: MMU: Clean up the error handling of walk_addr_generic()
From: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>

Avoid the two step jump to the error handling part.  This eliminates
the use of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mi...@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 v2->v3: only changelog update

 arch/x86/kvm/paging_tmpl.h |   64 ++++++++++++++++++++------------------------
 1 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 1caeb4d..137aa45 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -125,18 +125,17 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn_t table_gfn;
 	unsigned index, pt_access, uninitialized_var(pte_access);
 	gpa_t pte_gpa;
-	bool eperm, present, rsvd_fault;
-	int offset, write_fault, user_fault, fetch_fault;
-
-	write_fault = access & PFERR_WRITE_MASK;
-	user_fault = access & PFERR_USER_MASK;
-	fetch_fault = access & PFERR_FETCH_MASK;
+	bool eperm;
+	int offset;
+	const int write_fault = access & PFERR_WRITE_MASK;
+	const int user_fault = access & PFERR_USER_MASK;
+	const int fetch_fault = access & PFERR_FETCH_MASK;
+	u16 errcode = 0;
 
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
 walk:
-	present = true;
-	eperm = rsvd_fault = false;
+	eperm = false;
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
 
@@ -145,7 +144,7 @@ walk:
 		pte = kvm_pdptr_read_mmu(vcpu, mmu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
 		if (!is_present_gpte(pte)) {
-			present = false;
+			errcode |= PFERR_PRESENT_MASK;
 			goto error;
 		}
 		--walker->level;
@@ -171,34 +170,34 @@ walk:
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
 					      PFERR_USER_MASK|PFERR_WRITE_MASK);
 		if (unlikely(real_gfn == UNMAPPED_GVA)) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 		real_gfn = gpa_to_gfn(real_gfn);
 
 		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
 		if (unlikely(kvm_is_error_hva(host_addr))) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 
 		ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
 		if (unlikely(__copy_from_user(&pte, ptep_user, sizeof(pte)))) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 
 		trace_kvm_mmu_paging_element(pte, walker->level);
 
 		if (unlikely(!is_present_gpte(pte))) {
-			present = false;
-			break;
+			errcode |= PFERR_PRESENT_MASK;
+			goto error;
 		}
 
 		if (unlikely(is_rsvd_bits_set(&vcpu->arch.mmu, pte,
 					      walker->level))) {
-			rsvd_fault = true;
-			break;
+			errcode |= PFERR_RSVD_MASK;
+			goto error;
 		}
 
 		if (unlikely(write_fault && !is_writable_pte(pte)
@@ -213,16 +212,15 @@ walk:
 			eperm = true;
 #endif
 
-		if (!eperm && !rsvd_fault
-		    && unlikely(!(pte & PT_ACCESSED_MASK))) {
+		if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
 			int ret;
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index,
 						       sizeof(pte));
 			ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
 						  pte, pte|PT_ACCESSED_MASK);
 			if (unlikely(ret < 0)) {
-				present = false;
-				break;
+				errcode |= PFERR_PRESENT_MASK;
+				goto error;
 			} else if (ret)
 				goto walk;
 
@@ -276,7 +274,7 @@ walk:
 		--walker->level;
 	}
 
-	if (unlikely(!present || eperm || rsvd_fault))
+	if (unlikely(eperm))
 		goto error;
[PATCH v3 2/3] KVM: MMU: Rename the walk label in walk_addr_generic()
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

The current name does not explain the meaning well. So give it a better name, retry_walk, to show that we are trying the walk again. This was suggested by Ingo Molnar.

Cc: Ingo Molnar mi...@elte.hu
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 v2->v3: only changelog update

 arch/x86/kvm/paging_tmpl.h | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 137aa45..92fe275 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -134,7 +134,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 				     fetch_fault);
-walk:
+retry_walk:
 	eperm = false;
 	walker->level = mmu->root_level;
 	pte = mmu->get_cr3(vcpu);
@@ -222,7 +222,7 @@ walk:
 				errcode |= PFERR_PRESENT_MASK;
 				goto error;
 			} else if (ret)
-				goto walk;
+				goto retry_walk;
 
 			mark_page_dirty(vcpu->kvm, table_gfn);
 			pte |= PT_ACCESSED_MASK;
@@ -287,7 +287,7 @@ walk:
 				errcode |= PFERR_PRESENT_MASK;
 				goto error;
 			} else if (ret)
-				goto walk;
+				goto retry_walk;
 
 			mark_page_dirty(vcpu->kvm, table_gfn);
 			pte |= PT_DIRTY_MASK;
--
1.7.4.1
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why doesn't Intel e1000 NIC work correctly in Windows XP?
On Sun, 2011-06-19 at 20:29 +0800, Flypen CloudMe wrote: Hi, Here are the command line: /usr/bin/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 \ -name winxp -uuid 23cd2751-8a30-dd34-db47-bfc8c76ccadb -nodefconfig -nodefaults \ -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/winxp.monitor,server,nowait -mon chardev=monitor,mode=readline \ -rtc base=localtime -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x5 -device lsi,id=scsi1,bus=pci.0,addr=0x6 \ -device lsi,id=scsi2,bus=pci.0,addr=0x7 -device lsi,id=scsi3,bus=pci.0,addr=0x8 \ -drive file=/mnt/vmdisk/winxp.disk,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ -drive file=/mnt/vmdisk/virtio-win-1.1.16.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none\ -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/dev/sd1,if=none,id=drive-scsi0-0-0,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 \ -drive file=/dev/sdb,if=none,id=drive-scsi0-0-1,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 \ -drive file=/dev/sdc,if=none,id=drive-scsi0-0-2,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 \ -drive file=/dev/sdd,if=none,id=drive-scsi0-0-3,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 \ -drive file=/dev/sde,if=none,id=drive-scsi0-0-4,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=4,drive=drive-scsi0-0-4,id=scsi0-0-4 \ -drive file=/dev/sdf,if=none,id=drive-scsi3-0-0,format=raw,cache=none \ -device scsi-disk,bus=scsi3.0,scsi-id=0,drive=drive-scsi3-0-0,id=scsi3-0-0 \ -drive file=/mnt/vmdisk/D/1,if=none,id=drive-scsi0-0-6,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=6,drive=drive-scsi0-0-6,id=scsi0-0-6 \ -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb \ -vnc 0.0.0.0:0 
-k en-us -vga vmware -device pci-assign,host=02:00.0,id=hostdev0,configfd=18,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

That's a lot of SCSI controllers. Why are you creating 4 separate lsi SCSI controller devices, but only using 2 of them? Can you reduce the problem by just using 1? If so, then you might be able to move the assigned device and lsi device addrs around so the guest will use different INTx interrupts for these (or at least move them until the assigned device gets an interrupt in the guest exclusively). Is the guest Windows XP 32-bit or 64-bit? A 64-bit Windows is probably more likely to enable MSI interrupts (which hopefully your assigned device supports), which would also eliminate INTx sharing problems.

The NIC and one SCSI controller (slot 7) have the same IRQ. The performance in XP is really bad. While traffic is being written to the drive, the NIC can't be accessed, and pings time out. If I give the NIC a different IRQ number, then everything is OK. Is this related to the INTx model in XP?

Maybe so. Most of the guest/device combinations we test for device assignment make use of MSI/X interrupts, which are more efficient and avoid these sorts of problems.

We rebuilt QEMU and added the LSI SCSI controller support. Why does RHEL6 remove its support? Is this controller too old? Are there any emulated SCSI devices to replace it?

We removed it because it's not well used or tested and we don't want to support it. Virtio-blk is the alternative we'd typically recommend for guests with supported drivers.

Thanks,
Alex
[PATCH v3 3/3] KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic()
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Suggested by Ingo and Avi.

Cc: Ingo Molnar mi...@elte.hu
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 v2->v3: dropped set_accessed_bit()

 arch/x86/kvm/paging_tmpl.h | 26 +++---
 1 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 92fe275..e9243c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -113,6 +113,24 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 	return access;
 }
 
+static bool FNAME(is_last_gpte)(struct guest_walker *walker,
+				struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				pt_element_t gpte)
+{
+	if (walker->level == PT_PAGE_TABLE_LEVEL)
+		return true;
+
+	if ((walker->level == PT_DIRECTORY_LEVEL) && is_large_pte(gpte) &&
+	    (PTTYPE == 64 || is_pse(vcpu)))
+		return true;
+
+	if ((walker->level == PT_PDPE_LEVEL) && is_large_pte(gpte) &&
+	    (mmu->root_level == PT64_ROOT_LEVEL))
+		return true;
+
+	return false;
+}
+
 /*
  * Fetch a guest pte for a guest virtual address
  */
@@ -232,13 +250,7 @@ retry_walk:
 
 		walker->ptes[walker->level - 1] = pte;
 
-		if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
-		    ((walker->level == PT_DIRECTORY_LEVEL) &&
-		     is_large_pte(pte) &&
-		     (PTTYPE == 64 || is_pse(vcpu))) ||
-		    ((walker->level == PT_PDPE_LEVEL) &&
-		     is_large_pte(pte) &&
-		     mmu->root_level == PT64_ROOT_LEVEL)) {
+		if (FNAME(is_last_gpte)(walker, vcpu, mmu, pte)) {
 			int lvl = walker->level;
 			gpa_t real_gpa;
 			gfn_t gfn;
--
1.7.4.1
Re: [PATCH 6/7] KVM-GST: adjust scheduler cpu power
On Tue, 2011-06-14 at 22:26 -0300, Glauber Costa wrote:
On 06/14/2011 07:42 AM, Peter Zijlstra wrote:
On Mon, 2011-06-13 at 19:31 -0400, Glauber Costa wrote:

@@ -1981,12 +1987,29 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 	rq->prev_irq_time += irq_delta;
 	delta -= irq_delta;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	if (static_branch((paravirt_steal_rq_enabled))) {

Why is that a different variable from the touch_steal_time() one?

Because they track different things. touch_steal_time() and update_rq_clock() are called from different places in different situations. If we advance prev_steal_time in touch_steal_time(), and later on call update_rq_clock_task(), we won't discount the time already flushed from the rq_clock. Conversely, if we call update_rq_clock_task(), and only then arrive at touch_steal_time(), we won't account steal time properly.

But that's about prev_steal_time vs prev_steal_time_acc; I agree those should be different.

update_rq_clock_task() is called whenever update_rq_clock() is called. touch_steal_time() is called every tick. If there is a causal relation between them that would allow us to track it in a single location, I fail to see it.

Both are steal time muck; I was wondering why we'd want to do one and not the other when we have a high-res steal time clock.

+
+		steal = paravirt_steal_clock(cpu_of(rq));
+		steal -= rq->prev_steal_time_acc;
+
+		rq->prev_steal_time_acc += steal;

You have this addition in the wrong place, when you clip:

I begin by disagreeing.

+		if (steal > delta)
+			steal = delta;

you just lost your steal > delta, so the addition to prev_steal_time_acc needs to be after the clip.

Unlike irq time, steal time can be extremely huge. Just think of a virtual machine that got interrupted for a very long time. We'd have steal > delta, leading to steal == delta for a big number of iterations. That would affect cpu power for an extended period of time, not reflecting the present situation, just the past.
So I like to think of delta as a hard cap for steal time. Obviously, I am open to debate.

I'm failing to see how this would happen: if the virtual machine wasn't scheduled for a long, long while, delta would be huge too. But suppose it does happen — wouldn't it be likely that the virtual machine would receive similarly bad service in the near future? Making the total accounting relevant.
Re: [Qemu-devel] [PATCH] qemu-img: Add cache command line option
Am 16.06.2011 16:43, schrieb Kevin Wolf:
Am 16.06.2011 16:28, schrieb Christoph Hellwig:
On Wed, Jun 15, 2011 at 09:46:10AM -0400, Federico Simoncelli wrote:

qemu-img currently writes disk images using writeback, filling up the cache buffers which are then flushed by the kernel, preventing other processes from accessing the storage. This is particularly bad in cluster environments where time-based algorithms might be in place and accessing the storage within certain timeouts is critical. This patch adds the option to choose a cache method when writing disk images.

Allowing the mode to be chosen is of course fine, but what about also choosing a good default? writethrough doesn't really make any sense for qemu-img, given that we can trivially flush the cache at the end of the operations. I'd also say that using the buffer cache doesn't make sense either, as there is little point in caching these operations.

Right, we need to keep the defaults as they are. That is, unsafe for convert and writeback for everything else. The patch seems to make writeback the default for everything.

Federico, are you going to fix this in a v4?

Kevin
Re: [RFC] virtio: Support releasing lock during kick
On Sun, Jun 19, 2011 at 8:14 AM, Michael S. Tsirkin m...@redhat.com wrote:
On Wed, Jun 23, 2010 at 10:24:02PM +0100, Stefan Hajnoczi wrote:

The virtio block device holds a lock during I/O request processing. Kicking the virtqueue while the lock is held results in long lock hold times and increases contention for the lock. This patch modifies virtqueue_kick() to optionally release a lock while notifying the host. Virtio block is modified to pass in its lock. This allows other vcpus to queue I/O requests during the time spent servicing the virtqueue notify in the host. The virtqueue_kick() function is modified to know about locking because it changes the state of the virtqueue and should execute with the lock held (it would not be correct for virtio block to release the lock before calling virtqueue_kick()).

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

While the optimization makes sense, the API's pretty hairy IMHO. Why don't we split the kick functionality instead? E.g.

/* Report whether host notification is necessary. */
bool virtqueue_kick_prepare(struct virtqueue *vq)

/* Can be done in parallel with add_buf/get_buf */
void virtqueue_kick_notify(struct virtqueue *vq)

This is a nice idea, it makes the code cleaner. I am testing patches that implement this, and after Khoa has measured the performance I will send them out.

Stefan
Re: [PATCH 0/2] Introduce panic hypercall
On 06/20/2011 04:38 PM, Daniel Gollub wrote:

Introduce a panic hypercall to enable a crashing guest to notify the host. This enables the host to run some actions as soon as a guest crashes (kernel panic). This patch series introduces the panic hypercall at the host end, as well as the hypercall for KVM paravirtualized Linux guests, by registering the hypercall to the panic_notifier_list. The basic idea is to create a KVM crashdump automatically as soon as the guest panics and power-cycle the VM (e.g. via libvirt's <on_crash/>).

This would be more easily done via a panic device (I/O port or memory-mapped address) that the guest hits. It would be intercepted by qemu without any new code in kvm.

However, I'm not sure I see the gain. Most enterprisey guests already contain in-guest crash dumpers which provide more information than a qemu memory dump could, since they know exact load addresses etc. and are integrated with crash analysis tools. What do you have in mind?

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/2] Introduce panic hypercall
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote:
On 06/20/2011 04:38 PM, Daniel Gollub wrote:

Introduce a panic hypercall to enable a crashing guest to notify the host. This enables the host to run some actions as soon as a guest crashes (kernel panic). This patch series introduces the panic hypercall at the host end, as well as the hypercall for KVM paravirtualized Linux guests, by registering the hypercall to the panic_notifier_list. The basic idea is to create a KVM crashdump automatically as soon as the guest panics and power-cycle the VM (e.g. via libvirt's <on_crash/>).

This would be more easily done via a panic device (I/O port or memory-mapped address) that the guest hits. It would be intercepted by qemu without any new code in kvm. However, I'm not sure I see the gain. Most enterprisey guests already contain in-guest crash dumpers which provide more information than a qemu memory dump could, since they know exact load addresses etc. and are integrated with crash analysis tools. What do you have in mind?

Well, libvirt can capture a core file by doing 'virsh dump $GUESTNAME'. This actually uses the QEMU monitor migration command to capture the entirety of QEMU memory. The 'crash' command line tool actually knows how to analyse this data format as it would a normal kernel crashdump.

I think having a way for a guest OS to notify the host that it has crashed would be useful. libvirt could automatically do a crash dump of the QEMU memory, or at least pause the guest CPUs and notify the management app of the crash, which can then decide what to do. You can also use tools like 'virt-dmesg', which uses libvirt to peek into guest memory to extract the most recent kernel dmesg logs (even if the guest OS itself crashed and didn't manage to send them out via netconsole or something else).

This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU.
Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [PATCH 0/2] Introduce panic hypercall
On 06/20/2011 06:38 PM, Daniel P. Berrange wrote:
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote:
On 06/20/2011 04:38 PM, Daniel Gollub wrote:

Introduce a panic hypercall to enable a crashing guest to notify the host. This enables the host to run some actions as soon as a guest crashes (kernel panic). This patch series introduces the panic hypercall at the host end, as well as the hypercall for KVM paravirtualized Linux guests, by registering the hypercall to the panic_notifier_list. The basic idea is to create a KVM crashdump automatically as soon as the guest panics and power-cycle the VM (e.g. via libvirt's <on_crash/>).

This would be more easily done via a panic device (I/O port or memory-mapped address) that the guest hits. It would be intercepted by qemu without any new code in kvm. However, I'm not sure I see the gain. Most enterprisey guests already contain in-guest crash dumpers which provide more information than a qemu memory dump could, since they know exact load addresses etc. and are integrated with crash analysis tools. What do you have in mind?

Well, libvirt can capture a core file by doing 'virsh dump $GUESTNAME'. This actually uses the QEMU monitor migration command to capture the entirety of QEMU memory. The 'crash' command line tool actually knows how to analyse this data format as it would a normal kernel crashdump.

Interesting.

I think having a way for a guest OS to notify the host that it has crashed would be useful. libvirt could automatically do a crash dump of the QEMU memory, or at least pause the guest CPUs and notify the management app of the crash, which can then decide what to do. You can also use tools like 'virt-dmesg', which uses libvirt to peek into guest memory to extract the most recent kernel dmesg logs (even if the guest OS itself crashed and didn't manage to send them out via netconsole or something else).

I agree. But let's do this via a device; this way kvm need not be changed. Do iLO cards / IPMI support something like this?
We could follow their lead in that case.

This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU.

Yes.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/2] Introduce panic hypercall
On 2011-06-20 17:45, Avi Kivity wrote:

This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU.

Yes.

I think the best way to deal with that is to stop the VM on guest panic. There is already WIP to signal stop reasons via QMP. Maybe we need to differentiate between hypervisor- and guest-triggered panics (VMSTOP_GUEST_PANIC?), but the rest should come for free.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: Why doesn't Intel e1000 NIC work correctly in Windows XP?
On 2011-06-20 16:32, Alex Williamson wrote: On Sun, 2011-06-19 at 20:29 +0800, Flypen CloudMe wrote: Hi, Here are the command line: /usr/bin/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 \ -name winxp -uuid 23cd2751-8a30-dd34-db47-bfc8c76ccadb -nodefconfig -nodefaults \ -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/winxp.monitor,server,nowait -mon chardev=monitor,mode=readline \ -rtc base=localtime -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x5 -device lsi,id=scsi1,bus=pci.0,addr=0x6 \ -device lsi,id=scsi2,bus=pci.0,addr=0x7 -device lsi,id=scsi3,bus=pci.0,addr=0x8 \ -drive file=/mnt/vmdisk/winxp.disk,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ -drive file=/mnt/vmdisk/virtio-win-1.1.16.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none\ -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/dev/sd1,if=none,id=drive-scsi0-0-0,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 \ -drive file=/dev/sdb,if=none,id=drive-scsi0-0-1,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 \ -drive file=/dev/sdc,if=none,id=drive-scsi0-0-2,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 \ -drive file=/dev/sdd,if=none,id=drive-scsi0-0-3,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 \ -drive file=/dev/sde,if=none,id=drive-scsi0-0-4,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=4,drive=drive-scsi0-0-4,id=scsi0-0-4 \ -drive file=/dev/sdf,if=none,id=drive-scsi3-0-0,format=raw,cache=none \ -device scsi-disk,bus=scsi3.0,scsi-id=0,drive=drive-scsi3-0-0,id=scsi3-0-0 \ -drive file=/mnt/vmdisk/D/1,if=none,id=drive-scsi0-0-6,format=raw,cache=none \ -device scsi-disk,bus=scsi0.0,scsi-id=6,drive=drive-scsi0-0-6,id=scsi0-0-6 \ -chardev pty,id=serial0 -device 
isa-serial,chardev=serial0 -usb \ -vnc 0.0.0.0:0 -k en-us -vga vmware -device pci-assign,host=02:00.0,id=hostdev0,configfd=18,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

That's a lot of SCSI controllers. Why are you creating 4 separate lsi SCSI controller devices, but only using 2 of them? Can you reduce the problem by just using 1? If so, then you might be able to move the assigned device and lsi device addrs around so the guest will use different INTx interrupts for these (or at least move them until the assigned device gets an interrupt in the guest exclusively). Is the guest Windows XP 32-bit or 64-bit? A 64-bit Windows is probably more likely to enable MSI interrupts (which hopefully your assigned device supports), which would also eliminate INTx sharing problems.

I tend to believe there is some problem with the IRQ routing information provided to the BIOS, or with what the BIOS makes out of it. See what info pci looks like on a qemu-system-x86_64 -device e1000 -device e1000 VM after the BIOS is done:

[...]
  Bus 0, device 3, function 0:
    Ethernet controller: PCI device 8086:100e
      IRQ 11.
      BAR0: 32 bit memory at 0xf202 [0xf203].
      BAR1: I/O at 0xc040 [0xc07f].
      BAR6: 32 bit memory at 0x [0x0001fffe].
      id
  Bus 0, device 4, function 0:
    Ethernet controller: PCI device 8086:100e
      IRQ 11.
      BAR0: 32 bit memory at 0xf206 [0xf207].
      BAR1: I/O at 0xc080 [0xc0bf].
      BAR6: 32 bit memory at 0x [0x0001fffe].
      id
  Bus 0, device 5, function 0:
    Ethernet controller: PCI device 8086:100e
      IRQ 10.
      BAR0: 32 bit memory at 0xf20a [0xf20b].
      BAR1: I/O at 0xc0c0 [0xc0ff].
      BAR6: 32 bit memory at 0x [0x0001fffe].
      id

Slots 3 & 4 on IRQ 11, but slot 5 on 10? That confuses Windows XP here - at least until you reboot it after the device installation.
Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH] qemu-kvm: Fix kvm-disabled build
On Fri, Jun 03, 2011 at 04:38:40PM +0200, Jan Kiszka wrote:

Minor fallout from recent refactorings.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-stub.c | 11 +++
 qemu-kvm.h |  4 ++--
 2 files changed, 5 insertions(+), 10 deletions(-)

Applied, thanks.
Re: [PATCH] qemu-kvm: Remove kvm_set_boot_cpu_id
On Fri, Jun 03, 2011 at 04:39:55PM +0200, Jan Kiszka wrote:

Upstream, just like qemu-kvm, only supports CPU 0 as boot CPU. And that is also the KVM ABI default if the user does not issue any KVM_SET_BOOT_CPU_ID. So let's drop this redundancy. It can be re-introduced via upstream once we support something more sophisticated.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 qemu-kvm.c        | 11 ---
 qemu-kvm.h        |  2 --
 target-i386/kvm.c |  5 -
 3 files changed, 0 insertions(+), 18 deletions(-)

Applied, thanks.
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On Tue, Jun 07, 2011 at 09:00:30PM +0800, Xiao Guangrong wrote:

If the page fault is caused by mmio, we can cache the mmio info; later, we do not need to walk the guest page table and can quickly know it is a mmio fault while we emulate the mmio instruction.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  5 +++
 arch/x86/kvm/mmu.c              | 21 +--
 arch/x86/kvm/mmu.h              | 23 +
 arch/x86/kvm/paging_tmpl.h      | 21 ++-
 arch/x86/kvm/x86.c              | 52 ++
 arch/x86/kvm/x86.h              | 36 +++
 6 files changed, 126 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d167039..326af42 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
 	u64 mcg_ctl;
 	u64 *mce_banks;
 
+	/* Cache MMIO info */
+	u64 mmio_gva;
+	unsigned access;
+	gfn_t mmio_gfn;
+
 	/* used for guest single stepping over the given code position */
 	unsigned long singlestep_rip;

Why are you not implementing the original idea to cache the MMIO attribute of an address into the spte? That solution is wider reaching than a one-entry cache, and was proposed to overcome the large number of memslots. If the access pattern switches between different addresses, this one-entry solution is doomed.
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On Mon, Jun 20, 2011 at 01:14:32PM -0300, Marcelo Tosatti wrote:
On Tue, Jun 07, 2011 at 09:00:30PM +0800, Xiao Guangrong wrote:

If the page fault is caused by mmio, we can cache the mmio info; later, we do not need to walk the guest page table and can quickly know it is a mmio fault while we emulate the mmio instruction.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  5 +++
 arch/x86/kvm/mmu.c              | 21 +--
 arch/x86/kvm/mmu.h              | 23 +
 arch/x86/kvm/paging_tmpl.h      | 21 ++-
 arch/x86/kvm/x86.c              | 52 ++
 arch/x86/kvm/x86.h              | 36 +++
 6 files changed, 126 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d167039..326af42 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
 	u64 mcg_ctl;
 	u64 *mce_banks;
 
+	/* Cache MMIO info */
+	u64 mmio_gva;
+	unsigned access;
+	gfn_t mmio_gfn;
+
 	/* used for guest single stepping over the given code position */
 	unsigned long singlestep_rip;

Why are you not implementing the original idea to cache the MMIO attribute of an address into the spte? That solution is wider reaching than a one-entry cache, and was proposed to overcome the large number of memslots. If the access pattern switches between different addresses, this one-entry solution is doomed.

Nevermind, it's later in the series.
Re: [PATCH 0/2] Introduce panic hypercall
On Monday, June 20, 2011 05:45:36 pm Avi Kivity wrote:

However, I'm not sure I see the gain. Most enterprisey guests already contain in-guest crash dumpers which provide more information than a qemu memory dump could, since they know exact load addresses etc. and are integrated with crash analysis tools. What do you have in mind?

Right, kexec/kdump already works perfectly inside the guest. But:

- in the field, a lot of people still manage to set up VM guests without kexec/kdump properly configured (even though most enterprisey distributions try hard to set this up out-of-the-box, people still manage to not have kexec/kdump loaded once they run into a crash).
- you don't have to reserve disk space for a crashdump for each guest. E.g. if you run 4 guests with 60 GB of memory each, you would lose roughly 4*60 GB of space ... just for the (rare) case that each of those guests could write a crashdump, uncompressed ...
- legacy distributions
- no or buggy kexec
- maybe writing a crashdump+reboot with QEMU/libvirt is faster than with in-guest kexec/kdump? (haven't tested yet)
- single place on the VM host to collect coredumps

Well, libvirt can capture a core file by doing 'virsh dump $GUESTNAME'. This actually uses the QEMU monitor migration command to capture the entirety of QEMU memory. The 'crash' command line tool actually knows how to analyse this data format as it would a normal kernel crashdump.

Interesting.

Right. I'm using the kvmdump support of the crash utility now and then ... it could be more often. But unfortunately the people who run KVM in a productive environment with some strict service-level agreement often just reboot, due to time pressure, or run out of disk space in the guest, or simply forget that they were told to always do virsh dump on a freeze or crash.

I think having a way for a guest OS to notify the host that it has crashed would be useful.
libvirt could automatically do a crash dump of the QEMU memory, or at least pause the guest CPUs and notify the management app of the crash, which can then decide what to do. You can also use tools like 'virt-dmesg', which uses libvirt to peek into guest memory to extract the most recent kernel dmesg logs (even if the guest OS itself crashed and didn't manage to send them out via netconsole or something else).

I agree. But let's do this via a device; this way kvm need not be changed.

Is a device reliable enough if the guest kernel crashes? Do you mean something like a hardware watchdog?

Do iLO cards / IPMI support something like this? We could follow their lead in that case.

The only two things which came to my mind are:

* NMI (aka. ipmitool diag) - already available in qemu/kvm - but requires in-guest kexec/kdump
* Hardware watchdog (also available in qemu/libvirt)

lguest and xen have something similar. They also have a hypercall which gets called by a function registered in the panic_notifier_list. Not quite sure if you want to follow their lead.

Something I forgot to mention: this panic hypercall could also sit within an external kernel module ... to support (legacy) distributions.

This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU.

Yes.

Already done. I posted the QEMU-relevant changes as a separate series to the KVM list ... since the initial implementation is KVM specific (KVM hypercall).

Best Regards,
Daniel

--
Daniel Gollub
Linux Consultant & Developer
Tel.: +49-160 47 73 970
Mail: gol...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Re: [PATCH 0/2] Introduce panic hypercall
On 06/20/2011 07:26 PM, Daniel Gollub wrote: I agree. But let's do this via a device, this way kvm need not be changed. Is a device reliable enough if the guest kernel crashes? Do you mean something like a hardware watchdog?

I'm proposing a 1:1 equivalent. Instead of issuing a hypercall that tells the host about the panic, write to an I/O port that tells the host about the panic.

Do ILO cards / IPMI support something like this? We could follow their lead in that case. The only two things which come to my mind are: * NMI (aka. ipmitool diag) - already available in qemu/kvm - but requires in-guest kexec/kdump * Hardware watchdog (also available in qemu/libvirt)

A watchdog has the advantage that it also detects lockups. In fact you could implement the panic device via the existing watchdogs: simply program the timer for the minimum interval and *don't* service the interrupt. This would work for non-virt setups as well, as another way to issue a reset.

lguest and xen have something similar. They also have a hypercall which gets called by a function registered in the panic_notifier_list. Not quite sure if you want to follow their lead.

We could do the same, except s/hypercall/writel/.

Something I forgot to mention: this panic hypercall could also sit within an external kernel module ... to support (legacy) distributions.

Yes.

This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU.

Yes. Already done. I posted the QEMU-relevant changes as a separate series to the KVM list ... since the initial implementation is KVM specific (KVM hypercall). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
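[Editorial note: Avi's proposal replaces the hypercall with a plain I/O-port write from the panic notifier. The following is a minimal user-space sketch of that protocol, not code from the series; the port number, magic value, and the simulated outb() are all illustrative assumptions.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port number and magic value -- purely illustrative,
 * not taken from any real QEMU or kernel interface. */
#define PANIC_PORT  0x505
#define PANIC_MAGIC 0x42

/* Stand-in for the x86 outb() instruction: in a real guest this write
 * would trap to QEMU, which could then dump memory, pause the VM, or
 * emit a QMP event -- with no new code needed in kvm itself. */
static uint8_t last_port_write[0x10000];

static void outb(uint8_t value, uint16_t port)
{
    last_port_write[port] = value;
}

/* The guest-side hook: the callback that would be registered on
 * panic_notifier_list writes one byte instead of issuing a hypercall. */
static int panic_notify(void *unused)
{
    (void)unused;
    outb(PANIC_MAGIC, PANIC_PORT);
    return 0;
}
```

The point of the sketch is only that the guest side shrinks to a single trapped write, which is why the device variant needs no KVM changes.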
[PATCHv4] qemu-img: Add cache command line option
qemu-img currently writes disk images using writeback, filling up the cache buffers, which are then flushed by the kernel, preventing other processes from accessing the storage. This is particularly bad in cluster environments where time-based algorithms might be in place and accessing the storage within certain timeouts is critical. This patch adds the option to choose a cache method when writing disk images.

Signed-off-by: Federico Simoncelli fsimo...@redhat.com
---
 qemu-img-cmds.hx |  6 ++--
 qemu-img.c       | 80 +-
 2 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 3072d38..2b70618 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,13 +22,13 @@ STEXI
 ETEXI
 
 DEF(commit, img_commit,
-    "commit [-f fmt] filename")
+    "commit [-f fmt] [-t cache] filename")
 STEXI
 @item commit [-f @var{fmt}] @var{filename}
 ETEXI
 
 DEF(convert, img_convert,
-    "convert [-c] [-p] [-f fmt] [-O output_fmt] [-o options] [-s snapshot_name] filename [filename2 [...]] output_filename")
+    "convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] filename [filename2 [...]] output_filename")
 STEXI
 @item convert [-c] [-f @var{fmt}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
@@ -46,7 +46,7 @@ STEXI
 ETEXI
 
 DEF(rebase, img_rebase,
-    "rebase [-f fmt] [-p] [-u] -b backing_file [-F backing_fmt] filename")
+    "rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename")
 STEXI
 @item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename}
 ETEXI

diff --git a/qemu-img.c b/qemu-img.c
index 4f162d1..f904e32 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -40,6 +40,7 @@ typedef struct img_cmd_t {
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. */
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
+#define BDRV_DEFAULT_CACHE "writeback"
 
 static void format_print(void *opaque, const char *name)
 {
@@ -64,6 +65,8 @@ static void help(void)
            "Command parameters:\n"
            "  'filename' is a disk image filename\n"
            "  'fmt' is the disk image format. It is guessed automatically in most cases\n"
+           "  'cache' is the cache mode used to write the output disk image, the valid\n"
+           "    options are: 'none', 'writeback' (default), 'writethrough' and 'unsafe'\n"
            "  'size' is the disk image size in bytes. Optional suffixes\n"
            "    'k' or 'K' (kilobyte, 1024), 'M' (megabyte, 1024k), 'G' (gigabyte, 1024M)\n"
            "    and T (terabyte, 1024G) are supported. 'b' is ignored.\n"
@@ -180,6 +183,27 @@ static int read_password(char *buf, int buf_size)
 }
 #endif
 
+static int set_cache_flag(const char *mode, int *flags)
+{
+    *flags &= ~BDRV_O_CACHE_MASK;
+
+    if (!strcmp(mode, "none") || !strcmp(mode, "off")) {
+        *flags |= BDRV_O_CACHE_WB;
+        *flags |= BDRV_O_NOCACHE;
+    } else if (!strcmp(mode, "writeback")) {
+        *flags |= BDRV_O_CACHE_WB;
+    } else if (!strcmp(mode, "unsafe")) {
+        *flags |= BDRV_O_CACHE_WB;
+        *flags |= BDRV_O_NO_FLUSH;
+    } else if (!strcmp(mode, "writethrough")) {
+        /* this is the default */
+    } else {
+        return -1;
+    }
+
+    return 0;
+}
+
 static int print_block_option_help(const char *filename, const char *fmt)
 {
     BlockDriver *drv, *proto_drv;
@@ -441,13 +465,14 @@ static int img_check(int argc, char **argv)
 
 static int img_commit(int argc, char **argv)
 {
-    int c, ret;
-    const char *filename, *fmt;
+    int c, ret, flags;
+    const char *filename, *fmt, *cache;
     BlockDriverState *bs;
 
     fmt = NULL;
+    cache = BDRV_DEFAULT_CACHE;
     for(;;) {
-        c = getopt(argc, argv, "f:h");
+        c = getopt(argc, argv, "f:ht:");
         if (c == -1) {
             break;
         }
@@ -459,6 +484,9 @@ static int img_commit(int argc, char **argv)
         case 'f':
             fmt = optarg;
             break;
+        case 't':
+            cache = optarg;
+            break;
         }
     }
     if (optind >= argc) {
    }
     filename = argv[optind++];
 
-    bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
+    flags = BDRV_O_RDWR;
+    ret = set_cache_flag(cache, &flags);
+    if (ret < 0) {
+        error_report("Invalid cache option: %s\n", cache);
+        return -1;
+    }
+
+    bs = bdrv_new_open(filename, fmt, flags);
     if (!bs) {
         return 1;
     }
@@ -591,8 +626,8 @@ static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
 
 static int img_convert(int argc, char **argv)
 {
     int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
-    int progress = 0;
-    const char *fmt, *out_fmt, *out_baseimg, *out_filename;
+    int progress = 0, flags;
+
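[Editorial note: the mode-to-flags mapping in the patch's set_cache_flag() can be exercised in isolation. Below is a self-contained sketch of that same logic; the BDRV_O_* values are illustrative stand-ins, not QEMU's actual definitions from block.h.]

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-ins for QEMU's block-layer open flags; the real
 * values live in block.h and differ from these. */
#define BDRV_O_RDWR       0x0002
#define BDRV_O_NOCACHE    0x0020
#define BDRV_O_CACHE_WB   0x0040
#define BDRV_O_NO_FLUSH   0x0200
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)

/* Same logic as the patch: translate a cache-mode name into open flags,
 * clearing any previously set cache bits first so modes compose cleanly
 * with non-cache flags such as BDRV_O_RDWR. */
static int set_cache_flag(const char *mode, int *flags)
{
    *flags &= ~BDRV_O_CACHE_MASK;

    if (!strcmp(mode, "none") || !strcmp(mode, "off")) {
        *flags |= BDRV_O_CACHE_WB | BDRV_O_NOCACHE;   /* bypass host cache */
    } else if (!strcmp(mode, "writeback")) {
        *flags |= BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "unsafe")) {
        *flags |= BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH;  /* never flush */
    } else if (!strcmp(mode, "writethrough")) {
        /* no cache bits set: every write goes straight through */
    } else {
        return -1;                                    /* unknown mode */
    }
    return 0;
}
```

This is exactly why cache=none helps the cluster case the commit message describes: the NOCACHE bit keeps qemu-img from filling the host page cache, so storage stays responsive for other processes.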
Re: [PATCH 0/2] Introduce panic hypercall
On 2011-06-20 18:34, Avi Kivity wrote: Do ILO cards / IPMI support something like this? We could follow their lead in that case. The only two things which come to my mind are: * NMI (aka. ipmitool diag) - already available in qemu/kvm - but requires in-guest kexec/kdump * Hardware watchdog (also available in qemu/libvirt)

A watchdog has the advantage that it also detects lockups. In fact you could implement the panic device via the existing watchdogs. Simply program the timer for the minimum interval and *don't* service the interrupt. This would work for non-virt setups as well as another way to issue a reset.

If you manage to bring down the other guest CPUs fast enough. Otherwise, they may corrupt your crashdump before the host had a chance to collect all the pieces. Synchronous signaling to the hypervisor is a bit safer.

Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [PATCH 0/2] Introduce panic hypercall
On 06/20/2011 08:13 PM, Jan Kiszka wrote: A watchdog has the advantage that it also detects lockups. In fact you could implement the panic device via the existing watchdogs. Simply program the timer for the minimum interval and *don't* service the interrupt. This would work for non-virt setups as well as another way to issue a reset.

If you manage to bring down the other guest CPUs fast enough. Otherwise, they may corrupt your crashdump before the host had a chance to collect all the pieces. Synchronous signaling to the hypervisor is a bit safer.

You could NMI-IPI them. But I agree a synchronous signal is better (note it's not race-free itself).
Re: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent
On Tue, Jun 07, 2011 at 08:59:25PM +0800, Xiao Guangrong wrote: Set slot bitmap only if the spte is present

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cda666a..125f78d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 	struct kvm_mmu_page *sp;
 	unsigned long *rmapp;
 
-	if (!is_rmap_spte(*spte))
-		return 0;
-

Not sure if this is safe: what if the spte is set as nonpresent but the rmap is not removed?

BTW, I don't see what patch 1 and this one have to do with the goal of the series.
Re: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent
On 06/21/2011 12:28 AM, Marcelo Tosatti wrote: On Tue, Jun 07, 2011 at 08:59:25PM +0800, Xiao Guangrong wrote: Set slot bitmap only if the spte is present

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cda666a..125f78d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 	struct kvm_mmu_page *sp;
 	unsigned long *rmapp;
 
-	if (!is_rmap_spte(*spte))
-		return 0;
-

Not sure if this is safe, what if the spte is set as nonpresent but rmap not removed?

It can not happen: when we set the spte as nonpresent, we always use drop_spte to remove the rmap; we also do it in set_spte().

BTW i don't see what patch 1 and this have to do with the goal of the series.

These are the preparation work for mmio page fault:
- Patch 1 fixes the bug in walking shadow pages, so we can safely use it to walk shadow pages locklessly
- Patch 2 avoids adding rmap for the mmio spte :-)
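[Editorial note: the invariant Xiao appeals to -- an spte is only ever cleared through drop_spte(), which removes the rmap along with it -- can be stated as a toy model. The following is an illustration only; the data structures are not KVM's.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the invariant: a gfn has an rmap entry if and only if
 * its spte is present, because every path that clears an spte goes
 * through drop_spte(), which removes the rmap too. */
#define NGFN 4
static bool spte_present[NGFN];
static bool has_rmap[NGFN];

static void set_spte(int gfn)
{
    spte_present[gfn] = true;
    has_rmap[gfn] = true;       /* rmap_add() runs for present sptes */
}

static void drop_spte(int gfn)
{
    spte_present[gfn] = false;
    has_rmap[gfn] = false;      /* rmap removed together with the spte */
}

/* The situation Marcelo worried about -- a nonpresent spte with a live
 * rmap -- can only arise if some path clears an spte without going
 * through drop_spte(). */
static bool invariant_holds(void)
{
    for (int i = 0; i < NGFN; i++)
        if (!spte_present[i] && has_rmap[i])
            return false;
    return true;
}
```

Under this invariant, rmap_add() never sees a nonpresent spte, which is why the removed is_rmap_spte() check was redundant.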
Re: [PATCH 10/15] KVM: MMU: lockless walking shadow page table
On 06/21/2011 12:37 AM, Marcelo Tosatti wrote:

+	if (atomic_read(&kvm->arch.reader_counter)) {
+		free_mmu_pages_unlock_parts(invalid_list);
+		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+		list_del_init(invalid_list);
+		call_rcu(&sp->rcu, free_invalid_pages_rcu);
+		return;
+	}

This is probably wrong, the caller wants the page to be zapped by the time the function returns, not scheduled sometime in the future.

It can be freed soon and KVM does not reuse these pages anymore... it is not too bad, no?

+	do {
 		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
 		WARN_ON(!sp->role.invalid || sp->root_count);
@@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
 }

+int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
+				      u64 sptes[4])
+{
+	struct kvm_shadow_walk_iterator iterator;
+	int nr_sptes = 0;
+
+	rcu_read_lock();
+
+	atomic_inc(&vcpu->kvm->arch.reader_counter);
+	/* Increase the counter before walking shadow page table */
+	smp_mb__after_atomic_inc();
+
+	for_each_shadow_entry(vcpu, addr, iterator) {
+		sptes[iterator.level-1] = *iterator.sptep;
+		nr_sptes++;
+		if (!is_shadow_present_pte(*iterator.sptep))
+			break;
+	}

Why is lockless access needed for the MMIO optimization? Note the spte contents copied to the array here are used for debugging purposes only; their contents are potentially stale.

Um, we can use it to check whether the mmio page fault is a real mmio access or a bug of KVM, i discussed it with Avi: === Yes, it is, i just want to detect BUGs for KVM, it helps us to know if ept misconfig is the real MMIO or a BUG. I noticed some ept misconfig BUGs were reported before, so i think doing this is necessary, and i think it is not too bad, since walking the spte hierarchy is lockless, it is really fast. Okay. We can later see if it shows up on profiles. === And it is really fast, i will attach the 'perf result' when the v2 is posted.
Yes, their contents are potentially stale; we just use them to check mmio. After all, if we get a stale spte, we will call the page fault path to fix it.
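[Editorial note: the reader_counter pattern in the patch -- lockless walkers advertise themselves before touching the page tables, and the zap path defers the actual free via call_rcu() whenever a walker might still hold a pointer -- can be modelled in user space. This is a sketch with C11 atomics; all names and counters are illustrative, not KVM's.]

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model of the reader_counter scheme from the patch. */
static atomic_int reader_counter;
static int deferred_frees;   /* stands in for call_rcu() in the kernel */
static int immediate_frees;

static void walk_begin(void)
{
    /* Increase the counter before walking the shadow page table; the
     * seq_cst atomic op plays the role of the barrier the patch adds
     * with smp_mb__after_atomic_inc(). */
    atomic_fetch_add(&reader_counter, 1);
}

static void walk_end(void)
{
    atomic_fetch_sub(&reader_counter, 1);
}

static void zap_page(void)
{
    if (atomic_load(&reader_counter) > 0)
        deferred_frees++;    /* a lockless reader may still see the page */
    else
        immediate_frees++;   /* no readers: safe to free right away */
}
```

This also makes Marcelo's objection concrete: when the counter is nonzero the page is only scheduled for freeing, not freed by the time the zap function returns.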
Re: [PATCH] KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB
On Sun, Jun 12, 2011 at 06:25:00PM +0300, Avi Kivity wrote: kvm_set_cr0() and kvm_set_cr4(), and possibly other functions, assume that kvm_mmu_reset_context() flushes the guest TLB. However, it does not.

The TLB flush should be done lazily during guest entry, in kvm_mmu_load(). Don't see why this patch is needed.

Fix by flushing the tlb (and syncing the new root as well).

Signed-off-by: Avi Kivity a...@redhat.com
Re: [PATCH 00/12] [uq/master] Import linux headers and some cleanups
On Wed, Jun 08, 2011 at 04:10:54PM +0200, Jan Kiszka wrote: Licensing of the virtio headers is now clarified. So we can finally resolve the clumsy and constantly buggy #ifdef'ery around old KVM and virtio headers. Recent example: current qemu-kvm does not build against 2.6.32 headers. This series introduces an import mechanism for all required Linux headers so that the appropriate versions can be kept safely inside the QEMU tree. I've incorporated all the valuable review comments on the first version and rebased the result over current uq/master after rebasing that one over current QEMU master. Please note that I had no chance to test-build PPC or s390. Besides the header topic, this series also includes a few assorted KVM cleanup patches so that my queue is empty again.

Applied all, thanks.
Re: [PATCH 0/2] Introduce panic hypercall
On 06/20/2011 10:31 AM, Avi Kivity wrote: On 06/20/2011 04:38 PM, Daniel Gollub wrote: Introduce a panic hypercall to enable the crashing guest to notify the host. This enables the host to run some actions as soon as a guest crashed (kernel panic). This patch series introduces the panic hypercall at the host end, as well as the hypercall for KVM paravirtualized Linux guests, by registering the hypercall to the panic_notifier_list. The basic idea is to create a KVM crashdump automatically as soon as the guest panicked and power-cycle the VM (e.g. libvirt <on_crash/>).

This would be more easily done via a panic device (I/O port or memory-mapped address) that the guest hits. It would be intercepted by qemu without any new code in kvm.

However, I'm not sure I see the gain. Most enterprisey guests already contain in-guest crash dumpers which provide more information than a qemu memory dump could, since they know exact load addresses etc. and are integrated with crash analysis tools. What do you have in mind?

FYI, s390 has this functionality. It's useful because there's no use in having a guest just spin in a panic loop. Crash dump integration is much more complicated and requires functioning networking or some paravirt channel.

Regards, Anthony Liguori
Re: [PATCH v2 3/7] KVM-HV: KVM Steal time implementation
On Sun, Jun 19, 2011 at 12:57:53PM +0300, Avi Kivity wrote: On 06/17/2011 01:20 AM, Glauber Costa wrote: To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time. This patch contains the hypervisor part for it. I am keeping it separate from the headers to facilitate backports to people who want to backport the kernel part but not the hypervisor, or the other way around.

+#define KVM_STEAL_ALIGNMENT_BITS 5
+#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
+#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1) << 1)

Clumsy, but okay.

+static void record_steal_time(struct kvm_vcpu *vcpu)
+{
+	u64 delta;
+
+	if (vcpu->arch.st.stime && vcpu->arch.st.this_time_out) {

0 is a valid value for stime.

+
+		if (unlikely(kvm_read_guest(vcpu->kvm, vcpu->arch.st.stime,
+			&vcpu->arch.st.steal, sizeof(struct kvm_steal_time)))) {
+
+			vcpu->arch.st.stime = 0;
+			return;
+		}
+
+		delta = (get_kernel_ns() - vcpu->arch.st.this_time_out);
+
+		vcpu->arch.st.steal.steal += delta;
+		vcpu->arch.st.steal.version += 2;
+
+		if (unlikely(kvm_write_guest(vcpu->kvm, vcpu->arch.st.stime,
+			&vcpu->arch.st.steal, sizeof(struct kvm_steal_time)))) {
+
+			vcpu->arch.st.stime = 0;
+			return;
+		}
+	}
+
+}
+
@@ -2158,6 +2206,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_migrate_timers(vcpu);
 		vcpu->cpu = cpu;
 	}
+
+	record_steal_time(vcpu);
 }

This records time spent in userspace in the vcpu thread as steal time. Is this what we want? Or just time preempted away?

It also accounts halt time (kvm_vcpu_block) as steal time. Glauber, you could instead use the runnable-state-but-waiting-in-runqueue field of SCHEDSTATS, I forgot the exact name.
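[Editorial note: the `version += 2` in record_steal_time() suggests an even/odd publication protocol like the one kvmclock uses, so the guest can detect a torn read of the shared area. The sketch below models a guest-side read loop against a host-side update; the struct layout and field names are illustrative, taken loosely from the thread, not from the actual patches.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the shared per-vcpu area the new MSR points at. */
struct kvm_steal_time {
    uint32_t version;   /* even = stable, odd = update in progress */
    uint64_t steal;     /* nanoseconds stolen from this vcpu */
};

/* Host-side update: make the counter odd, update the payload, make it
 * even again -- a net increment of 2 per update, matching the patch's
 * version += 2. */
static void host_add_steal(struct kvm_steal_time *st, uint64_t delta)
{
    st->version += 1;          /* now odd: readers must retry */
    st->steal += delta;
    st->version += 1;          /* even again, two higher in total */
}

/* Guest-side read: retry until the version is even and unchanged across
 * the read, guaranteeing a consistent snapshot of the steal value. */
static uint64_t guest_read_steal(const struct kvm_steal_time *st)
{
    uint32_t v;
    uint64_t steal;

    do {
        v = st->version;
        steal = st->steal;
    } while ((v & 1) || v != st->version);

    return steal;
}
```

In a real guest the read loop would also need compiler/memory barriers around the version loads; they are omitted here since the model is single-threaded.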
Re: Unable to unload kvm-intel module
On Sun, Jun 19, 2011 at 2:06 AM, Jan Kiszka jan.kis...@web.de wrote: On 2011-06-17 20:04, AP wrote: I tried that and it did not give me any warning. Here is the compilation output: make -C /lib/modules/2.6.38-8-generic/build M=`pwd` \ LINUXINCLUDE=-I`pwd`/include -Iinclude \ -Iarch/x86/include \ -I`pwd`/include-compat -I`pwd`/x86 \ -include include/generated/autoconf.h \ -include `pwd`/x86/external-module-compat.h \ $@ make[1]: Entering directory `/usr/src/linux-headers-2.6.38-8-generic' CC [M] /home/ap/dev/kvm/kvm-kmod/x86/vmx.o LD [M] /home/ap/dev/kvm/kvm-kmod/x86/kvm.o LD [M] /home/ap/dev/kvm/kvm-kmod/x86/kvm-intel.o LD [M] /home/ap/dev/kvm/kvm-kmod/x86/kvm-amd.o Building modules, stage 2. MODPOST 3 modules LD [M] /home/ap/dev/kvm/kvm-kmod/x86/kvm-amd.ko CC /home/ap/dev/kvm/kvm-kmod/x86/kvm-intel.mod.o LD [M] /home/ap/dev/kvm/kvm-kmod/x86/kvm-intel.ko LD [M] /home/ap/dev/kvm/kvm-kmod/x86/kvm.ko make[1]: Leaving directory `/usr/src/linux-headers-2.6.38-8-generic'

Do you install the built modules and then do a modprobe, or how do you load them? Also try via insmod /home/ap/dev/kvm/kvm-kmod/x86/kvm.ko

I don't modprobe them. I use the insmod command above after I rmmod the existing drivers. I tried doing a make install and modprobe. No luck!

I don't know. Something must be broken with your Ubuntu installation.

This is looking very likely at this point. Thanks for all the help. AP
[PATCH] client tools: Fix rebase bug on cd_hash.py
I really thought I had fixed this one. cd_hash makes reference to a KvmLoggingConfig class that existed prior to the refactor.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tools/cd_hash.py | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tools/cd_hash.py b/client/tools/cd_hash.py
index c658447..3db1e47 100755
--- a/client/tools/cd_hash.py
+++ b/client/tools/cd_hash.py
@@ -16,7 +16,7 @@ if __name__ == __main__:
     parser = optparse.OptionParser(usage: %prog [options] [filenames])
     options, args = parser.parse_args()
 
-    logging_manager.configure_logging(virt_utils.KvmLoggingConfig())
+    logging_manager.configure_logging(virt_utils.VirtLoggingConfig())
 
     if args:
         filenames = args
-- 
1.7.5.4
AW: current qemu-kvm doesn't work with vhost
On 19.06.2011 10:58:20, Jan Kiszka wrote: On 2011-06-17 22:31, Georg Hopp wrote: On 17.06.2011 09:29:41, Jan Kiszka wrote: On 2011-06-17 09:10, Georg Hopp wrote: Jan Kiszka jan.kiszka at web.de writes: On 2011-06-10 05:08, Amos Kong wrote: host kernel: 2.6.39-rc2+ qemu-kvm : 05f1737582ab6c075476bde931c5eafbc62a9349 (gdb) r -monitor stdio -m 800 ~/RHEL-Server-6.0-64-virtio.qcow2 -snapshot -device virtio-net-pci,netdev=he -netdev tap,vhost=on,id=he

I already came across that symptom in a different context. Fixed by the patch below. However, the real issue is related to an upstream cleanup of the virtio-pci build. That reveals some unneeded build dependencies in qemu-kvm. Will post a fix. Jan

FYI I encountered the same problem and applied the patch. Well, this results in the following error while starting the guest: qemu-system-x86_64: unable to start vhost net: 38: falling back on userspace virtio and I have no network at all. I will disable vhost=on for now.

Hmm, works fine for me. The vhost-net module is loaded (though I got a different message when I forgot to load it)? Jan

Generally it works for me until git revision b2146d8bd.

You mean including that commit, right?

I have compiled vhost-net directly into my kernel, so I have definitely not forgotten to load it... As I use gentoo, I made an ebuild that installs exactly this revision. If I find the time to do some debugging I will do so, but actually I am very busy with my job and family, and the use of kvm is just a spare-time thing. :D If it would be of some help I can set a breakpoint just before the patch and see what causes the message.

Let's start with double-checking that you are on ce5f0a588b, did a make clean && make, and then actually used that result. I suspect an inconsistent build, as ce5f0a588b makes the difference between ENOSYS (38) and working vhost support here. Jan

Hi, sorry for the long wait. Had a hard weekend.
;)

at revision: ce5f0a588b - check
applied patch - check
clean make - check

And now everything works as expected... well, at least I got no error at all. Thanks! Anything else I can do?

Georg
AW: current qemu-kvm doesn't work with vhost
On 19.06.2011 10:58:20, Jan Kiszka wrote: On 2011-06-17 22:31, Georg Hopp wrote: On 17.06.2011 09:29:41, Jan Kiszka wrote: On 2011-06-17 09:10, Georg Hopp wrote: Jan Kiszka jan.kiszka at web.de writes: On 2011-06-10 05:08, Amos Kong wrote: host kernel: 2.6.39-rc2+ qemu-kvm : 05f1737582ab6c075476bde931c5eafbc62a9349 (gdb) r -monitor stdio -m 800 ~/RHEL-Server-6.0-64-virtio.qcow2 -snapshot -device virtio-net-pci,netdev=he -netdev tap,vhost=on,id=he

I already came across that symptom in a different context. Fixed by the patch below. However, the real issue is related to an upstream cleanup of the virtio-pci build. That reveals some unneeded build dependencies in qemu-kvm. Will post a fix. Jan

FYI I encountered the same problem and applied the patch. Well, this results in the following error while starting the guest: qemu-system-x86_64: unable to start vhost net: 38: falling back on userspace virtio and I have no network at all. I will disable vhost=on for now.

Hmm, works fine for me. The vhost-net module is loaded (though I got a different message when I forgot to load it)? Jan

Generally it works for me until git revision b2146d8bd.

You mean including that commit, right?

I have compiled vhost-net directly into my kernel, so I have definitely not forgotten to load it... As I use gentoo, I made an ebuild that installs exactly this revision. If I find the time to do some debugging I will do so, but actually I am very busy with my job and family, and the use of kvm is just a spare-time thing. :D If it would be of some help I can set a breakpoint just before the patch and see what causes the message.

Let's start with double-checking that you are on ce5f0a588b, did a make clean && make, and then actually used that result. I suspect an inconsistent build, as ce5f0a588b makes the difference between ENOSYS (38) and working vhost support here. Jan

Hi again, tried the current HEAD without any patches and vhost works again for me, and with much better performance than before.
Client connecting to host, TCP port 5001
TCP window size: 16.0 KByte (default)
[  3] local 192.168.100.4 port 48053 connected with 192.168.100.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.63 GBytes  6.55 Gbits/sec

Thanks for the great work!

Greets, Georg