Re: [PATCH] KVM: x86 emulator: emulate RETF imm
On 9/8/2013 at 07:13 AM, Gleb Natapov g...@redhat.com wrote:
On Tue, Sep 03, 2013 at 01:42:09PM -0600, Bruce Rogers wrote:

Opcode CA. This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/emulate.c | 23 ++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2bc1e81..aee238a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2025,6 +2025,26 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	return rc;
 }

+static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
+{
+	int rc;
+	unsigned long cs;
+
+	rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	if (ctxt->op_bytes == 4)
+		ctxt->_eip = (u32)ctxt->_eip;
+	rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	rsp_increment(ctxt, ctxt->src.val);
+	return X86EMUL_CONTINUE;
+}
+

Why not:

static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
{
	int rc;

	rc = em_ret_far(ctxt);
	if (rc != X86EMUL_CONTINUE)
		return rc;
	rsp_increment(ctxt, ctxt->src.val);
	return X86EMUL_CONTINUE;
}

--
Gleb.

Yes, that does seem better. Ack.

Bruce
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86 emulator: emulate RETF imm
On 9/9/2013 at 07:10 AM, Gleb Natapov g...@redhat.com wrote:
On Mon, Sep 09, 2013 at 07:09:15AM -0600, Bruce Rogers wrote:
On 9/8/2013 at 07:13 AM, Gleb Natapov g...@redhat.com wrote:
On Tue, Sep 03, 2013 at 01:42:09PM -0600, Bruce Rogers wrote:

[...]

Yes, that does seem better. Ack.

Somebody still needs to write a proper patch :) Can you do it please?

Sure, will do.

Bruce
[PATCH v2] KVM: x86 emulator: emulate RETF imm
Opcode CA. This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/emulate.c | 14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2bc1e81..ddc3f3d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2025,6 +2025,17 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	return rc;
 }

+static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
+{
+	int rc;
+
+	rc = em_ret_far(ctxt);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	rsp_increment(ctxt, ctxt->src.val);
+	return X86EMUL_CONTINUE;
+}
+
 static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
 {
 	/* Save real source value, then compare EAX against destination. */
@@ -3763,7 +3774,8 @@ static const struct opcode opcode_table[256] = {
 	G(ByteOp, group11), G(0, group11),
 	/* 0xC8 - 0xCF */
 	I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
-	N, I(ImplicitOps | Stack, em_ret_far),
+	I(ImplicitOps | Stack | SrcImmU16, em_ret_far_imm),
+	I(ImplicitOps | Stack, em_ret_far),
 	D(ImplicitOps), DI(SrcImmByte, intn),
 	D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
 	/* 0xD0 - 0xD7 */
--
1.7.7
[PATCH kvm-unit-tests] realmode: test RETF imm
Signed-off-by: Bruce Rogers brog...@suse.com
---
 x86/realmode.c | 7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index 3546771..c57e033 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -481,6 +481,9 @@ void test_io(void)
 asm ("retf: lretw");
 extern void retf();

+asm ("retf_imm: lretw $10");
+extern void retf_imm();
+
 void test_call(void)
 {
 	u32 esp[16];
@@ -503,6 +506,7 @@ void test_call(void)
 	MK_INSN(call_far1, "lcallw *(%ebx)\n\t");
 	MK_INSN(call_far2, "lcallw $0, $retf\n\t");
 	MK_INSN(ret_imm, "sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
+	MK_INSN(retf_imm, "sub $10, %sp; lcallw $0, $retf_imm");

 	exec_in_big_real_mode(&insn_call1);
 	report("call 1", R_AX, outregs.eax == 0x1234);
@@ -523,6 +527,9 @@ void test_call(void)
 	exec_in_big_real_mode(&insn_ret_imm);
 	report("ret imm 1", 0, 1);
+
+	exec_in_big_real_mode(&insn_retf_imm);
+	report("retf imm 1", 0, 1);
 }

 void test_jcc_short(void)
--
1.7.7
[PATCH] KVM: x86 emulator: emulate RETF imm
Opcode CA. This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/emulate.c | 23 ++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2bc1e81..aee238a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2025,6 +2025,26 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	return rc;
 }

+static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
+{
+	int rc;
+	unsigned long cs;
+
+	rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	if (ctxt->op_bytes == 4)
+		ctxt->_eip = (u32)ctxt->_eip;
+	rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	rsp_increment(ctxt, ctxt->src.val);
+	return X86EMUL_CONTINUE;
+}
+
 static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
 {
 	/* Save real source value, then compare EAX against destination. */
@@ -3763,7 +3783,8 @@ static const struct opcode opcode_table[256] = {
 	G(ByteOp, group11), G(0, group11),
 	/* 0xC8 - 0xCF */
 	I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
-	N, I(ImplicitOps | Stack, em_ret_far),
+	I(ImplicitOps | Stack | SrcImmU16, em_ret_far_imm),
+	I(ImplicitOps | Stack, em_ret_far),
 	D(ImplicitOps), DI(SrcImmByte, intn),
 	D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
 	/* 0xD0 - 0xD7 */
--
1.7.7
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
On 7/11/2013 at 03:36 AM, Zhanghaoyu (A) haoyu.zh...@huawei.com wrote:

hi all,

I met a similar problem to these, while performing live migration or save-restore tests on the kvm platform (qemu: 1.4.0, host: suse11sp2, guest: suse11sp2), running a tele-communication software suite in the guest:

https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, which resulted in throughput degradation of this process.

oprofile report on this process in the guest, pre live migration:

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice

post live migration:

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
1586     42.2370  libc.so.6        memcpy
271       7.2170  no-vmlinux       (no symbols)
83        2.2104  libc.so.6        memset
41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu         SipMmBufMemAlloc
29        0.7723  cscf.scu         DLM_AllocSlice
28        0.7457  libpthread.so.0  pthread_mutex_lock
23        0.6125  cscf.scu         SipApiGetNextIe
17        0.4527  cscf.scu         SipSmCopyString
16        0.4261  cscf.scu         receivemessage
15        0.3995  cscf.scu         SipcMsgStatHandle
14        0.3728  cscf.scu         Urilex
12        0.3196  cscf.scu         DBFI_DATA_Search
12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
12        0.3196  cscf.scu         SipSmGetDataFromRefString

So, memcpy costs many more CPU cycles after live migration. Then I restarted the process, and this problem disappeared. save-restore has the same problem.

perf report on the vcpu thread in the host, pre live migration:

Performance counter stats for thread id '21082':

          0  page-faults
          0  minor-faults
          0  major-faults
      31616  cs
        506  migrations
          0  alignment-faults
          0  emulation-faults
 5075957539  L1-dcache-loads                                            [21.32%]
  324685106  L1-dcache-load-misses      #  6.40% of all L1-dcache hits  [21.85%]
 3681777120  L1-dcache-stores                                           [21.65%]
   65251823  L1-dcache-store-misses     #  1.77%                        [22.78%]
          0  L1-dcache-prefetches                                       [22.84%]
          0  L1-dcache-prefetch-misses                                  [22.32%]
 9321652613  L1-icache-loads                                            [22.60%]
 1353418869  L1-icache-load-misses      # 14.52% of all L1-icache hits  [21.92%]
  169126969  LLC-loads                                                  [21.87%]
   12583605  LLC-load-misses            #  7.44% of all LL-cache hits   [ 5.84%]
  132853447  LLC-stores                                                 [ 6.61%]
   10601171  LLC-store-misses           #  7.9%                         [ 5.01%]
   25309497  LLC-prefetches             #   30%                         [ 4.96%]
    7723198  LLC-prefetch-misses                                        [ 6.04%]
 4954075817  dTLB-loads                                                 [11.56%]
   26753106  dTLB-load-misses           #  0.54% of all dTLB cache hits [16.80%]
 3553702874  dTLB-stores                                                [22.37%]
    4720313  dTLB-store-misses          #  0.13%                        [21.46%]
not counted  dTLB-prefetches
not counted  dTLB-prefetch-misses
Re: [Qemu-devel] qemu-kvm: remove boot=on|off drive parameter compatibility
On 10/1/2012 at 07:19 AM, Anthony Liguori anth...@codemonkey.ws wrote:
Jan Kiszka jan.kis...@siemens.com writes:
On 2012-10-01 11:31, Marcelo Tosatti wrote:

It's not just about default configs. We need to validate whether the migration formats are truly compatible (qemu-kvm -> QEMU; the other way around definitely not).

For the command line switches, we could provide a wrapper script that translates them into the upstream format or simply ignores them. That should be harmless to carry upstream.

qemu-kvm has:

-no-kvm
-no-kvm-irqchip
-no-kvm-pit
-no-kvm-pit-reinjection
-tdf - does nothing

There are replacements for all of the above. If we need to add them to qemu.git, it's no big deal to add them.

-drive ...,boot= - this is ignored

cpu_set command for CPU hotplug, which is known broken in qemu-kvm.

testdev, which is nice but only used for development.

Default nic is rtl8139 vs. e1000.

Some logic to change the default VGA ram size to 16mb for pc-1.2 (QEMU uses 16mb by default now too).

I think at this point none of this matters, but I added the various distro maintainers to the thread. I think it's time for the distros to drop qemu-kvm and just ship qemu.git. Is there anything else that needs to happen to make that switch?

We are seriously considering moving to qemu.git for our SP3 release of SUSE SLES 11. There are just a handful of patches that provide the backwards compatibility we need to maintain (default to kvm, default nic model, vga ram size), so assuming there is a 100% commitment to fully supporting kvm in qemu going forward (which I don't doubt), I think this is a good time for us to make that switch.

Bruce
[PATCH] handle device help before accelerator set up
A command line device probe using just -device ? gets processed after qemu-kvm initializes the accelerator. If /dev/kvm is not present, the accelerator check will fail (kvm is defaulted to on), which causes libvirt to not be set up to handle qemu guests. Moving the device help handling before the accelerator set up allows the device probe to work in this configuration, and libvirt succeeds in setting up for a qemu hypervisor mode.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 vl.c | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/vl.c b/vl.c
index 1a46d2d..5b75cf9 100644
--- a/vl.c
+++ b/vl.c
@@ -3380,6 +3380,9 @@ int main(int argc, char **argv, char **envp)
         ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
     }

+    if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0) != 0)
+        exit(0);
+
     configure_accelerator();

     qemu_init_cpu_loop();
@@ -3535,9 +3538,6 @@ int main(int argc, char **argv, char **envp)
     }
     select_vgahw(vga_model);

-    if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0) != 0)
-        exit(0);
-
     if (watchdog) {
         i = select_watchdog(watchdog);
         if (i < 0)
--
1.7.7
Re: [PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time
On 8/1/2012 at 02:21 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
On Mon, Jul 23, 2012 at 09:44:54PM -0300, Marcelo Tosatti wrote:
On Fri, Jul 20, 2012 at 10:44:24AM -0600, Bruce Rogers wrote:

When a guest migrates to a new host, the system time difference from the previous host is used in the updates to the kvmclock system time visible to the guest, resulting in a continuation of correct kvmclock based guest timekeeping.

The wall clock component of the kvmclock provided time is currently not updated with this same time offset. Since the Linux guest caches the wall clock based time, this discrepancy is not noticed until the guest is rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in correct guest time after a reboot.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/x86.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..14c290d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 	 */
 	getboottime(&boot);

+	if (kvm->arch.kvmclock_offset) {
+		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+		boot = timespec_sub(boot, ts);
+	}

kvmclock_offset is signed (both directions). Must check the sign and use _sub and _add_safe accordingly.

Your patch is correct, sorry (applied to master). Patch 2 still makes no sense.

I'm fine with dropping the second patch.

Thanks,
Bruce
[PATCH 2/2] kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero
When a guest is migrated, a time offset is generated in order to maintain the correct kvmclock based time for the guest. Detect when all kvmclock time pages are deleted, so that the kvmclock offset can be safely reset to zero.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..112415c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -524,6 +524,7 @@ struct kvm_arch {
 	unsigned long irq_sources_bitmap;
 	s64 kvmclock_offset;
+	unsigned int n_time_pages;
 	raw_spinlock_t tsc_write_lock;
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14c290d..350c51b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1511,6 +1511,8 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.time_page) {
 		kvm_release_page_dirty(vcpu->arch.time_page);
 		vcpu->arch.time_page = NULL;
+		if (--vcpu->kvm->arch.n_time_pages == 0)
+			vcpu->kvm->arch.kvmclock_offset = 0;
 	}
 }

@@ -1624,7 +1626,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (is_error_page(vcpu->arch.time_page)) {
 			kvm_release_page_clean(vcpu->arch.time_page);
 			vcpu->arch.time_page = NULL;
-		}
+		} else
+			vcpu->kvm->arch.n_time_pages++;
 		break;
 	}
 	case MSR_KVM_ASYNC_PF_EN:
--
1.7.7
[PATCH 0/2] kvm: kvmclock: fix kvmclock reboot after migrate issues
When a Linux guest live migrates to a new host and subsequently reboots, the guest no longer has the correct time. This is due to a failure to apply the kvmclock offset to the wall clock time. The first patch addresses this failure directly, while the second patch detects when the offset is no longer needed and zeroes the offset as a matter of cleaning up migration state which is no longer relevant. Both patches address the issue, but in different ways.

Bruce Rogers (2):
  kvm: kvmclock: apply kvmclock offset to guest wall clock time
  kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero

 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 9 +-
 2 files changed, 9 insertions(+), 1 deletions(-)
--
1.7.7
[PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time
When a guest migrates to a new host, the system time difference from the previous host is used in the updates to the kvmclock system time visible to the guest, resulting in a continuation of correct kvmclock based guest timekeeping.

The wall clock component of the kvmclock provided time is currently not updated with this same time offset. Since the Linux guest caches the wall clock based time, this discrepancy is not noticed until the guest is rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in correct guest time after a reboot.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/x86.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..14c290d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 	 */
 	getboottime(&boot);

+	if (kvm->arch.kvmclock_offset) {
+		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+		boot = timespec_sub(boot, ts);
+	}
 	wc.sec = boot.tv_sec;
 	wc.nsec = boot.tv_nsec;
 	wc.version = version;
--
1.7.7
[PATCH 2/3][STABLE] KVM: indicate oom if add_buf fails
This patch is a subset of an already upstream patch, but this portion is useful in earlier releases. Please consider it for stable.

If the add_buf operation fails, indicate failure to the caller.

Signed-off-by: Bruce Rogers brog...@novell.com

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -318,6 +318,7 @@ static bool try_fill_recv_maxbufs(struct
 			skb_unlink(skb, &vi->recv);
 			trim_pages(vi, skb);
 			kfree_skb(skb);
+			oom = true;
 			break;
 		}
 		vi->num++;
@@ -368,6 +369,7 @@ static bool try_fill_recv(struct virtnet
 		if (err < 0) {
 			skb_unlink(skb, &vi->recv);
 			kfree_skb(skb);
+			oom = true;
 			break;
 		}
 		vi->num++;
[PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call
virtio_net: Add schedule check to napi_enable call

Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable.

Signed-off-by: Bruce Rogers brog...@novell.com
Signed-off-by: Olaf Kirch o...@suse.de

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -388,6 +388,20 @@ static void skb_recv_done(struct virtque
 	}
 }

+static void virtnet_napi_enable(struct virtnet_info *vi)
+{
+	napi_enable(&vi->napi);
+
+	/* If all buffers were filled by other side before we napi_enabled, we
+	 * won't get another interrupt, so process any outstanding packets
+	 * now. virtnet_poll wants re-enable the queue, so we disable here.
+	 * We synchronize against interrupts via NAPI_STATE_SCHED */
+	if (napi_schedule_prep(&vi->napi)) {
+		vi->rvq->vq_ops->disable_cb(vi->rvq);
+		__napi_schedule(&vi->napi);
+	}
+}
+
 static void refill_work(struct work_struct *work)
 {
 	struct virtnet_info *vi;
@@ -397,7 +411,7 @@ static void refill_work(struct work_stru
 	napi_disable(&vi->napi);
 	try_fill_recv(vi, GFP_KERNEL);
 	still_empty = (vi->num == 0);
-	napi_enable(&vi->napi);
+	virtnet_napi_enable(vi);

 	/* In theory, this can happen: if we don't get any buffers in
 	 * we will *never* try to fill again. */
@@ -589,16 +603,7 @@ static int virtnet_open(struct net_devic
 {
 	struct virtnet_info *vi = netdev_priv(dev);

-	napi_enable(&vi->napi);
-
-	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now. virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&vi->napi)) {
-		vi->rvq->vq_ops->disable_cb(vi->rvq);
-		__napi_schedule(&vi->napi);
-	}
+	virtnet_napi_enable(vi);

 	return 0;
 }
[PATCH 0/3][STABLE] KVM: Various issues in virtio_net
These are patches which we have found useful for our 2.6.32 based SLES 11 SP1 release.

The first patch is already upstream, but should be included in stable. The second patch is a subset of another upstream patch. Again, stable material. The third patch solves the last remaining issue we saw when testing kvm configurations with the SUSE certification test suite. Under heavy load, we observed rx stalls (with the first two patches applied), and this third patch was crafted to address the issue. Please apply to stable. I assume this last problem also exists in more recent kernels than 2.6.32, but I haven't validated that.

With these 3 patches applied we no longer see any issues with virtio networking using our certification test suite.

Signed-off-by: Bruce Rogers brog...@novell.com
[PATCH 1/3][STABLE] KVM: fix delayed refill checking
Please consider this for stable:

commit 39d321577405e8e269fd238b278aaf2425fa788a
Author: Herbert Xu herb...@gondor.apana.org.au
Date: Mon Jan 25 15:51:01 2010 -0800

    virtio_net: Make delayed refill more reliable

    I have seen RX stalls on a machine that experienced a suspected OOM.
    After the stall, the RX buffer is empty on the guest side and there
    are exactly 16 entries available on the host side. As the number of
    entries is less than that required by a maximal skb, the host cannot
    proceed.

    The guest did not have a refill job scheduled.

    My diagnosis is that an OOM had occurred, with the delayed refill job
    scheduled. The job was able to allocate at least one skb, but not
    enough to overcome the minimum required by the host to proceed. As
    the refill job would only reschedule itself if it failed completely
    to allocate any skbs, this would lead to an RX stall.

    The following patch removes this stall possibility by always
    rescheduling the refill job until the ring is totally refilled.

    Testing has shown that the RX stall no longer occurs, whereas
    previously it would occur within a day.

    Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
    Acked-by: Rusty Russell ru...@rustcorp.com.au
    Signed-off-by: David S. Miller da...@davemloft.net

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c708ecc..9ead30b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -395,8 +395,7 @@ static void refill_work(struct work_struct *work)

 	vi = container_of(work, struct virtnet_info, refill.work);
 	napi_disable(&vi->napi);
-	try_fill_recv(vi, GFP_KERNEL);
-	still_empty = (vi->num == 0);
+	still_empty = !try_fill_recv(vi, GFP_KERNEL);
 	napi_enable(&vi->napi);

 	/* In theory, this can happen: if we don't get any buffers in
Re: [stable] [PATCH 2/3][STABLE] KVM: indicate oom if add_buf fails
On 6/3/2010 at 03:02 PM, Greg KH g...@kroah.com wrote:

What is the git commit id of the upstream patch?

9ab86bbcf8be755256f0a5e994e0b38af6b4d399

I grabbed this from: git://git.kernel.org/pub/scm/virt/kvm/kvm.git

I need that for all stable patches to be accepted, thanks. Also, all KVM stuff needs to get acked by Avi; I can't take them until he says they are ok.

Understood.

Oh, and what -stable trees do you want these patches in? .27, .32, .33, or .34? I have a bunch of them going at the moment...

All 3 in 2.6.32, only #2 and #3 in 2.6.33, and only #3 in 2.6.34.

Thanks,
Bruce
Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call
On 6/3/2010 at 03:03 PM, Greg KH g...@kroah.com wrote:
On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote:

virtio_net: Add schedule check to napi_enable call

Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable.

Signed-off-by: Bruce Rogers brog...@novell.com
Signed-off-by: Olaf Kirch o...@suse.de

I need a git commit id for this one as well.

This one is not upstream.

Bruce
Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call
On 6/3/2010 at 04:51 PM, Greg KH g...@kroah.com wrote: On Thu, Jun 03, 2010 at 04:17:34PM -0600, Bruce Rogers wrote: On 6/3/2010 at 03:03 PM, Greg KH g...@kroah.com wrote: On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote: virtio_net: Add schedule check to napi_enable call Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers brog...@novell.com Signed-off-by: Olaf Kirch o...@suse.de I need a git commit id for this one as well. This one is not upstream. Then I can't include it in the -stable tree, so why are you sending it to me? :) thanks, greg k-h Good point! Sorry about the confusion. Bruce
[PATCH] document boot option to -drive parameter
The boot option is missing from the documentation for the -drive parameter. If there is a better way to describe it, I'm all ears. Signed-off-by: Bruce Rogers brog...@novell.com diff --git a/qemu-options.hx b/qemu-options.hx index c5a160c..fbcf61e 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -160,6 +160,8 @@ an untrusted format header. This option specifies the serial number to assign to the device. @item ad...@var{addr} Specify the controller's PCI address (if=virtio only). +...@item bo...@var{boot} +...@var{boot} is on or off and allows for booting from non-traditional interfaces, such as virtio. @end table By default, writethrough caching is used for all block device. This means that
[PATCH 2/2] make help output be a little more self-consistent
This is the part which applies to qemu-kvm. Signed-off-by: Bruce Rogers brog...@novell.com --- qemu-options.hx | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 788d849..fdd5884 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1938,7 +1938,7 @@ DEF(readconfig, HAS_ARG, QEMU_OPTION_readconfig, -readconfig file\n) DEF(writeconfig, HAS_ARG, QEMU_OPTION_writeconfig, -writeconfig file\n -read/write config file) +read/write config file\n) DEF(no-kvm, 0, QEMU_OPTION_no_kvm, -no-kvm disable KVM hardware virtualization\n) @@ -1947,26 +1947,27 @@ DEF(no-kvm-irqchip, 0, QEMU_OPTION_no_kvm_irqchip, DEF(no-kvm-pit, 0, QEMU_OPTION_no_kvm_pit, -no-kvm-pit disable KVM kernel mode PIT\n) DEF(no-kvm-pit-reinjection, 0, QEMU_OPTION_no_kvm_pit_reinjection, --no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection\n) +-no-kvm-pit-reinjection\n +disable KVM kernel mode PIT interrupt reinjection\n) #if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(TARGET_IA64) || defined(__linux__) DEF(pcidevice, HAS_ARG, QEMU_OPTION_pcidevice, -pcidevice host=bus:dev.func[,dma=none][,name=string]\n -expose a PCI device to the guest OS.\n +expose a PCI device to the guest OS\n dma=none: don't perform any dma translations (default is to use an iommu)\n -'string' is used in log output.\n) +'string' is used in log output\n) #endif DEF(enable-nesting, 0, QEMU_OPTION_enable_nesting, -enable-nesting enable support for running a VM inside the VM (AMD only)\n) DEF(nvram, HAS_ARG, QEMU_OPTION_nvram, --nvram FILE provide ia64 nvram contents\n) +-nvram FILE provide ia64 nvram contents\n) DEF(tdf, 0, QEMU_OPTION_tdf, --tdf enable guest time drift compensation\n) +-tdfenable guest time drift compensation\n) DEF(kvm-shadow-memory, HAS_ARG, QEMU_OPTION_kvm_shadow_memory, -kvm-shadow-memory MEGABYTES\n - allocate MEGABYTES for kvm mmu shadowing\n) +allocate MEGABYTES for kvm mmu shadowing\n) DEF(mem-path, HAS_ARG, 
QEMU_OPTION_mempath, --mem-path FILE provide backing storage for guest RAM\n) +-mem-path FILE provide backing storage for guest RAM\n) #ifdef MAP_POPULATE DEF(mem-prealloc, 0, QEMU_OPTION_mem_prealloc, --mem-preallocpreallocate guest memory (use with -mempath)\n) +-mem-prealloc preallocate guest memory (use with -mempath)\n) #endif
[PATCH 1/2] [RESEND] make help output be a little more self-consistent
This is the part which applies to the base qemu. btw: it was sent to qemu-de...@nongnu.org yesterday.) Signed-off-by: Bruce Rogers --- qemu-options.hx | 39 --- 1 files changed, 20 insertions(+), 19 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index ecd50eb..20b696d 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp, -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n set the number of CPUs to 'n' [default=1]\n maxcpus= maximum number of total cpus, including\n - offline CPUs for hotplug etc.\n +offline CPUs for hotplug, etc\n cores= number of CPU cores on one socket\n threads= number of threads on one CPU core\n sockets= number of discrete sockets in the system\n) @@ -405,8 +405,9 @@ ETEXI DEF(device, HAS_ARG, QEMU_OPTION_device, -device driver[,options] add device\n) DEF(name, HAS_ARG, QEMU_OPTION_name, --name string1[,process=string2]set the name of the guest\n -string1 sets the window title and string2 the process name (on Linux)\n) +-name string1[,process=string2]\n +set the name of the guest\n +string1 sets the window title and string2 the process name (on Linux)\n) STEXI @item -name @var{name} Sets the @var{name} of the guest. 
@@ -483,7 +484,7 @@ ETEXI #ifdef CONFIG_SDL DEF(ctrl-grab, 0, QEMU_OPTION_ctrl_grab, --ctrl-grab use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n) +-ctrl-grab use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n) #endif STEXI @item -ctrl-grab @@ -756,12 +757,12 @@ ETEXI #ifdef TARGET_I386 DEF(smbios, HAS_ARG, QEMU_OPTION_smbios, -smbios file=binary\n -Load SMBIOS entry from binary file\n +load SMBIOS entry from binary file\n -smbios type=0[,vendor=str][,version=str][,date=str][,release=%%d.%%d]\n -Specify SMBIOS type 0 fields\n +specify SMBIOS type 0 fields\n -smbios type=1[,manufacturer=str][,product=str][,version=str][,serial=str]\n [,uuid=uuid][,sku=str][,family=str]\n -Specify SMBIOS type 1 fields\n) +specify SMBIOS type 1 fields\n) #endif STEXI @item -smbios fi...@var{binary} @@ -816,13 +817,13 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, -net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n connect the host TAP network interface to VLAN 'n' and use the\n network scripts 'file' (default=%s)\n -and 'dfile' (default=%s);\n -use '[down]script=no' to disable script execution;\n +and 'dfile' (default=%s)\n +use '[down]script=no' to disable script execution\n use 'fd=h' to connect to an already opened TAP interface\n -use 'sndbuf=nbytes' to limit the size of the send buffer; the\n -default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n -use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag; use\n -vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n +use 'sndbuf=nbytes' to limit the size of the send buffer (the\n +default of 'sndbuf=1048576' can be disabled using 'sndbuf=0')\n +use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n +use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n #endif -net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n connect the vlan 'n' to another VLAN using a socket connection\n 
@@ -837,7 +838,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, #endif -net dump[,vlan=n][,file=f][,len=n]\n dump traffic on vlan 'n' to file 'f' (max n bytes per packet)\n --net none use it alone to have zero network devices; if no -net option\n +-net none use it alone to have zero network devices. If no -net option\n is provided, the default is '-net nic -net user'\n) DEF(netdev, HAS_ARG, QEMU_OPTION_netdev, -netdev [ @@ -1589,7 +1590,7 @@ The default device is @code{vc} in graphical mode and @code{stdio} in non graphical mode. ETEXI DEF(qmp, HAS_ARG, QEMU_OPTION_qmp, \ --qmp devlike -monitor but opens in 'control' mode.\n) +-qmp devlike -monitor but opens in 'control' mode\n) DEF(mon, HAS_ARG, QEMU_OPTION_mon, \ -mon chardev=[name][,mode=readline|control][,default]\n) @@ -1607,7 +1608,7 @@ from a script. ETEXI DEF(singlestep
[PATCH] make help output be a little more self-consistent
Signed-off-by: Bruce Rogers brog...@novell.com --- qemu-options.hx | 58 -- 1 files changed, 30 insertions(+), 28 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 812d067..fdd5884 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp, -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n set the number of CPUs to 'n' [default=1]\n maxcpus= maximum number of total cpus, including\n - offline CPUs for hotplug etc.\n +offline CPUs for hotplug, etc\n cores= number of CPU cores on one socket\n threads= number of threads on one CPU core\n sockets= number of discrete sockets in the system\n) @@ -406,8 +406,9 @@ ETEXI DEF(device, HAS_ARG, QEMU_OPTION_device, -device driver[,options] add device\n) DEF(name, HAS_ARG, QEMU_OPTION_name, --name string1[,process=string2]set the name of the guest\n -string1 sets the window title and string2 the process name (on Linux)\n) +-name string1[,process=string2]\n +set the name of the guest\n +string1 sets the window title and string2 the process name (on Linux)\n) STEXI @item -name @var{name} Sets the @var{name} of the guest. 
@@ -484,7 +485,7 @@ ETEXI #ifdef CONFIG_SDL DEF(ctrl-grab, 0, QEMU_OPTION_ctrl_grab, --ctrl-grab use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n) +-ctrl-grab use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n) #endif STEXI @item -ctrl-grab @@ -757,12 +758,12 @@ ETEXI #ifdef TARGET_I386 DEF(smbios, HAS_ARG, QEMU_OPTION_smbios, -smbios file=binary\n -Load SMBIOS entry from binary file\n +load SMBIOS entry from binary file\n -smbios type=0[,vendor=str][,version=str][,date=str][,release=%%d.%%d]\n -Specify SMBIOS type 0 fields\n +specify SMBIOS type 0 fields\n -smbios type=1[,manufacturer=str][,product=str][,version=str][,serial=str]\n [,uuid=uuid][,sku=str][,family=str]\n -Specify SMBIOS type 1 fields\n) +specify SMBIOS type 1 fields\n) #endif STEXI @item -smbios fi...@var{binary} @@ -817,13 +818,13 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, -net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n connect the host TAP network interface to VLAN 'n' and use the\n network scripts 'file' (default=%s)\n -and 'dfile' (default=%s);\n -use '[down]script=no' to disable script execution;\n +and 'dfile' (default=%s)\n +use '[down]script=no' to disable script execution\n use 'fd=h' to connect to an already opened TAP interface\n -use 'sndbuf=nbytes' to limit the size of the send buffer; the\n -default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n -use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag; use\n -vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n +use 'sndbuf=nbytes' to limit the size of the send buffer (the\n +default of 'sndbuf=1048576' can be disabled using 'sndbuf=0')\n +use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n +use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n #endif -net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n connect the vlan 'n' to another VLAN using a socket connection\n 
@@ -838,7 +839,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, #endif -net dump[,vlan=n][,file=f][,len=n]\n dump traffic on vlan 'n' to file 'f' (max n bytes per packet)\n --net none use it alone to have zero network devices; if no -net option\n +-net none use it alone to have zero network devices. If no -net option\n is provided, the default is '-net nic -net user'\n) DEF(netdev, HAS_ARG, QEMU_OPTION_netdev, -netdev [ @@ -1590,7 +1591,7 @@ The default device is @code{vc} in graphical mode and @code{stdio} in non graphical mode. ETEXI DEF(qmp, HAS_ARG, QEMU_OPTION_qmp, \ --qmp devlike -monitor but opens in 'control' mode.\n) +-qmp devlike -monitor but opens in 'control' mode\n) DEF(mon, HAS_ARG, QEMU_OPTION_mon, \ -mon chardev=[name][,mode=readline|control][,default]\n) @@ -1608,7 +1609,7 @@ from a script. ETEXI DEF(singlestep, 0, QEMU_OPTION_singlestep, \ --singlestep always run in singlestep
[PATCH] kvm: allocate correct size for dirty bitmap
The dirty bitmap copied out to userspace is stored in a long array, and gets filled in by the kernel accordingly. This patch accounts for that correctly. Currently I'm seeing kvm crashing due to writing beyond the end of the alloc'd dirty bitmap memory, because the buffer has the wrong size. Signed-off-by: Bruce Rogers --- qemu-kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 6511cb6..ee5db76 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -702,7 +702,7 @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr, for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) { if ((slots[i].len && (uint64_t) slots[i].phys_addr >= phys_addr) && ((uint64_t) slots[i].phys_addr + slots[i].len <= end_addr)) { -buf = qemu_malloc((slots[i].len / 4096 + 7) / 8 + 2); +buf = qemu_malloc(BITMAP_SIZE(slots[i].len)); r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf); if (r) { qemu_free(buf); --
Re: kvm scaling question
On 9/11/2009 at 9:46 AM, Javier Guerra jav...@guerrag.com wrote: On Fri, Sep 11, 2009 at 10:36 AM, Bruce Rogers brog...@novell.com wrote: Also, when I did a simple experiment with vcpu overcommitment, I was surprised how quickly performance suffered (just bringing a Linux vm up), since I would have assumed the additional vcpus would have been halted the vast majority of the time. On a 2 proc box, overcommitment to 8 vcpus in a guest (I know this isn't a good usage scenario, but does provide some insights) caused the boot time to increase dramatically. At 16 vcpus, it took hours to just reach the gui login prompt. I'd guess (and hope!) that having many 1- or 2-cpu guests won't kill performance as sharply as having a single guest with more vcpus than the physical cpus available. have you tested that? -- Javier Yes, but not empirically. I'll certainly be doing that, but wanted to see what perspective there was on the results I was seeing. And I've gotten the response that explains why overcommitment is performing so poorly in another email. Bruce
Re: kvm scaling question
On 9/11/2009 at 3:53 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Sep 11, 2009 at 09:36:10AM -0600, Bruce Rogers wrote: I am wondering if anyone has investigated how well kvm scales when supporting many guests, or many vcpus, or both. I'll do some investigations into the per vm memory overhead and play with bumping the max vcpu limit way beyond 16, but hopefully someone can comment on issues such as locking problems that are known to exist and need to be addressed to increase parallelism, general overhead percentages which can help set consolidation expectations, etc. I suppose it depends on the guest and workload. With an EPT host and 16-way Linux guest doing kernel compilations, on a recent kernel, I see: # Samples: 98703304 # # Overhead Command Shared Object Symbol # 97.15% sh [kernel] [k] vmx_vcpu_run 0.27% sh [kernel] [k] kvm_arch_vcpu_ioctl_ 0.12% sh [kernel] [k] default_send_IPI_mas 0.09% sh [kernel] [k] _spin_lock_irq Which is pretty good. Without EPT/NPT the mmu_lock seems to be the major bottleneck to parallelism. Also, when I did a simple experiment with vcpu overcommitment, I was surprised how quickly performance suffered (just bringing a Linux vm up), since I would have assumed the additional vcpus would have been halted the vast majority of the time. On a 2 proc box, overcommitment to 8 vcpus in a guest (I know this isn't a good usage scenario, but does provide some insights) caused the boot time to increase dramatically. At 16 vcpus, it took hours to just reach the gui login prompt. One probable reason for that is that vcpus which hold spinlocks in the guest are scheduled out in favour of vcpus which spin on that same lock. I suspected it might be a whole lot of spinning happening. That does seem most likely. I was just surprised how bad the behavior was.
Bruce
Re: kvm scaling question
On 9/11/2009 at 5:02 PM, Andre Przywara andre.przyw...@amd.com wrote: Marcelo Tosatti wrote: On Fri, Sep 11, 2009 at 09:36:10AM -0600, Bruce Rogers wrote: I am wondering if anyone has investigated how well kvm scales when supporting many guests, or many vcpus, or both. I'll do some investigations into the per vm memory overhead and play with bumping the max vcpu limit way beyond 16, but hopefully someone can comment on issues such as locking problems that are known to exist and need to be addressed to increase parallelism, general overhead percentages which can help set consolidation expectations, etc. I suppose it depends on the guest and workload. With an EPT host and 16-way Linux guest doing kernel compilations, on a recent kernel, I see: ... Also, when I did a simple experiment with vcpu overcommitment, I was surprised how quickly performance suffered (just bringing a Linux vm up), since I would have assumed the additional vcpus would have been halted the vast majority of the time. On a 2 proc box, overcommitment to 8 vcpus in a guest (I know this isn't a good usage scenario, but does provide some insights) caused the boot time to increase dramatically. At 16 vcpus, it took hours to just reach the gui login prompt. One probable reason for that is that vcpus which hold spinlocks in the guest are scheduled out in favour of vcpus which spin on that same lock. We encountered this issue some time ago in Xen. Ticket spinlocks make this even worse. More detailed info can be found here: http://www.amd64.org/research/virtualization.html#Lock_holder_preemption Have you tried using paravirtualized spinlocks in the guest kernel? http://lkml.indiana.edu/hypermail/linux/kernel/0807.0/2808.html I'll give that a try. Thanks for the tips. Bruce
kvm scaling question
I am wondering if anyone has investigated how well kvm scales when supporting many guests, or many vcpus, or both. I'll do some investigations into the per vm memory overhead and play with bumping the max vcpu limit way beyond 16, but hopefully someone can comment on issues such as locking problems that are known to exist and need to be addressed to increase parallelism, general overhead percentages which can help set consolidation expectations, etc. Also, when I did a simple experiment with vcpu overcommitment, I was surprised how quickly performance suffered (just bringing a Linux vm up), since I would have assumed the additional vcpus would have been halted the vast majority of the time. On a 2 proc box, overcommitment to 8 vcpus in a guest (I know this isn't a good usage scenario, but does provide some insights) caused the boot time to increase dramatically. At 16 vcpus, it took hours to just reach the gui login prompt. Any perspective you can offer would be appreciated. Bruce
[PATCH] handle -smp > 16 more cleanly
The x86 kvm kernel module limits guest cpu count to 16, but the userspace pc definition still says 255, so kvm_create_vcpu will fail for that reason with -smp > 16 specified. This patch causes qemu-kvm to exit in that case. Without this patch, other errors get reported down the road and finally a segfault occurs. Bruce Signed-off-by: Bruce Rogers brog...@novell.com diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c index ed76367..b6d6d5e 100644 --- a/qemu/qemu-kvm.c +++ b/qemu/qemu-kvm.c @@ -417,12 +417,18 @@ static void *ap_main_loop(void *_env) CPUState *env = _env; sigset_t signals; struct ioperm_data *data = NULL; +int r; current_env = env; env->thread_id = kvm_get_thread_id(); sigfillset(&signals); sigprocmask(SIG_BLOCK, &signals, NULL); -kvm_create_vcpu(kvm_context, env->cpu_index); +r = kvm_create_vcpu(kvm_context, env->cpu_index); +if (r) +{ +fprintf(stderr, "error creating vcpu: %d\n", r); +exit(1); +} kvm_qemu_init_env(env); #ifdef USE_KVM_DEVICE_ASSIGNMENT --