Re: [PATCH] KVM: x86 emulator: emulate RETF imm

2013-09-09 Thread Bruce Rogers
  On 9/8/2013 at 07:13 AM, Gleb Natapov g...@redhat.com wrote: 
 On Tue, Sep 03, 2013 at 01:42:09PM -0600, Bruce Rogers wrote:
 Opcode CA
 
 This gets used by a DOS based NetWare guest.
 
 Signed-off-by: Bruce Rogers brog...@suse.com
 ---
  arch/x86/kvm/emulate.c |   23 ++-
  1 files changed, 22 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 2bc1e81..aee238a 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2025,6 +2025,26 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
  return rc;
  }
  
 +static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
 +{
 +	int rc;
 +	unsigned long cs;
 +
 +	rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
 +	if (rc != X86EMUL_CONTINUE)
 +		return rc;
 +	if (ctxt->op_bytes == 4)
 +		ctxt->_eip = (u32)ctxt->_eip;
 +	rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
 +	if (rc != X86EMUL_CONTINUE)
 +		return rc;
 +	rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
 +	if (rc != X86EMUL_CONTINUE)
 +		return rc;
 +	rsp_increment(ctxt, ctxt->src.val);
 +	return X86EMUL_CONTINUE;
 +}
 +
 Why not:
 
 static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
 {
 	int rc;
 
 	rc = em_ret_far(ctxt);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 	rsp_increment(ctxt, ctxt->src.val);
 	return X86EMUL_CONTINUE;
 }
 
 --
   Gleb.

Yes, that does seem better. Ack.

Bruce
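
For reference, RETF imm16 (opcode 0xCA) pops the return IP, then CS, and then
discards imm16 additional bytes of stack, which is why calling em_ret_far()
followed by rsp_increment() gives the same architectural result as the
open-coded version above. A minimal toy model of that stack effect (made-up
types and values, nothing from the emulator):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of a 16-bit real-mode stack; not KVM code. */
    struct toy_cpu {
            uint16_t ip, cs, sp;
            uint8_t  stack[0x10000];      /* SS-relative memory */
    };

    static uint16_t toy_pop16(struct toy_cpu *c)
    {
            uint16_t v = c->stack[c->sp] | (uint16_t)(c->stack[c->sp + 1] << 8);
            c->sp += 2;
            return v;
    }

    /* RETF imm16: pop IP, pop CS, then discard imm16 more bytes of stack. */
    static void toy_retf_imm(struct toy_cpu *c, uint16_t imm)
    {
            c->ip  = toy_pop16(c);        /* em_ret_far() pops the return IP...     */
            c->cs  = toy_pop16(c);        /* ...and reloads CS                      */
            c->sp += imm;                 /* rsp_increment() accounts for the imm16 */
    }

    int main(void)
    {
            struct toy_cpu c = { .sp = 0x0ff6 };

            /* pretend a far caller pushed CS=0x1234 and IP=0x5678 */
            c.stack[0x0ff6] = 0x78; c.stack[0x0ff7] = 0x56;
            c.stack[0x0ff8] = 0x34; c.stack[0x0ff9] = 0x12;

            toy_retf_imm(&c, 10);
            printf("ip=%04x cs=%04x sp=%04x\n", c.ip, c.cs, c.sp);  /* 5678 1234 1004 */
            return 0;
    }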



Re: [PATCH] KVM: x86 emulator: emulate RETF imm

2013-09-09 Thread Bruce Rogers
  On 9/9/2013 at 07:10 AM, Gleb Natapov g...@redhat.com wrote: 
 On Mon, Sep 09, 2013 at 07:09:15AM -0600, Bruce Rogers wrote:
   On 9/8/2013 at 07:13 AM, Gleb Natapov g...@redhat.com wrote: 
  On Tue, Sep 03, 2013 at 01:42:09PM -0600, Bruce Rogers wrote:
  Opcode CA
  
  This gets used by a DOS based NetWare guest.
  
  Signed-off-by: Bruce Rogers brog...@suse.com
  ---
   arch/x86/kvm/emulate.c |   23 ++-
   1 files changed, 22 insertions(+), 1 deletions(-)
  
  diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
  index 2bc1e81..aee238a 100644
  --- a/arch/x86/kvm/emulate.c
  +++ b/arch/x86/kvm/emulate.c
  @@ -2025,6 +2025,26 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
  	return rc;
   }
   
  +static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
  +{
  +	int rc;
  +	unsigned long cs;
  +
  +	rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
  +	if (rc != X86EMUL_CONTINUE)
  +		return rc;
  +	if (ctxt->op_bytes == 4)
  +		ctxt->_eip = (u32)ctxt->_eip;
  +	rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
  +	if (rc != X86EMUL_CONTINUE)
  +		return rc;
  +	rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
  +	if (rc != X86EMUL_CONTINUE)
  +		return rc;
  +	rsp_increment(ctxt, ctxt->src.val);
  +	return X86EMUL_CONTINUE;
  +}
  +
  Why not:
  
  static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
  {
  	int rc;
  
  	rc = em_ret_far(ctxt);
  	if (rc != X86EMUL_CONTINUE)
  		return rc;
  	rsp_increment(ctxt, ctxt->src.val);
  	return X86EMUL_CONTINUE;
  }
  
  --
 Gleb.
 
 Yes, that does seem better. Ack.
 
 Somebody still needs to write a proper patch :) Can you do it please?

Sure, will do.

Bruce




[PATCH v2] KVM: x86 emulator: emulate RETF imm

2013-09-09 Thread Bruce Rogers
Opcode CA

This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/emulate.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2bc1e81..ddc3f3d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2025,6 +2025,17 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
return rc;
 }
 
+static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
+{
+	int rc;
+
+	rc = em_ret_far(ctxt);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+	rsp_increment(ctxt, ctxt->src.val);
+	return X86EMUL_CONTINUE;
+}
+
 static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
 {
/* Save real source value, then compare EAX against destination. */
@@ -3763,7 +3774,8 @@ static const struct opcode opcode_table[256] = {
G(ByteOp, group11), G(0, group11),
/* 0xC8 - 0xCF */
I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
-   N, I(ImplicitOps | Stack, em_ret_far),
+   I(ImplicitOps | Stack | SrcImmU16, em_ret_far_imm),
+   I(ImplicitOps | Stack, em_ret_far),
D(ImplicitOps), DI(SrcImmByte, intn),
D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
/* 0xD0 - 0xD7 */
-- 
1.7.7



[PATCH kvm-unit-tests] realmode: test RETF imm

2013-09-04 Thread Bruce Rogers
Signed-off-by: Bruce Rogers brog...@suse.com
---
 x86/realmode.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index 3546771..c57e033 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -481,6 +481,9 @@ void test_io(void)
 asm ("retf: lretw");
 extern void retf();
 
+asm ("retf_imm: lretw $10");
+extern void retf_imm();
+
 void test_call(void)
 {
u32 esp[16];
@@ -503,6 +506,7 @@ void test_call(void)
 	MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
 	MK_INSN(call_far2,  "lcallw $0, $retf\n\t");
 	MK_INSN(ret_imm,    "sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
+	MK_INSN(retf_imm,   "sub $10, %sp; lcallw $0, $retf_imm");
 
 	exec_in_big_real_mode(&insn_call1);
 	report("call 1", R_AX, outregs.eax == 0x1234);
@@ -523,6 +527,9 @@ void test_call(void)
 
 	exec_in_big_real_mode(&insn_ret_imm);
 	report("ret imm 1", 0, 1);
+
+	exec_in_big_real_mode(&insn_retf_imm);
+	report("retf imm 1", 0, 1);
 }
 
 void test_jcc_short(void)
-- 
1.7.7
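
The new test keeps SP balanced across the far call: "sub $10, %sp" reserves the
ten bytes that "lretw $10" later releases, and the lcallw/lretw pair pushes and
then pops CS:IP, so the harness sees an unchanged stack pointer when the
emulation is correct. A rough arithmetic check of that bookkeeping (toy code,
not part of the test suite):

    #include <assert.h>
    #include <stdint.h>

    /* Toy 16-bit stack-pointer model of the retf_imm test snippet. */
    int main(void)
    {
            uint16_t sp = 0x1000;          /* arbitrary starting SP             */
            uint16_t start = sp;

            sp -= 10;                      /* sub $10, %sp                      */
            sp -= 4;                       /* lcallw pushes CS and IP (2+2)     */
            sp += 4;                       /* lretw pops IP and CS              */
            sp += 10;                      /* ...and discards the $10 immediate */

            assert(sp == start);           /* emulated RETF imm must restore SP */
            return 0;
    }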



[PATCH] KVM: x86 emulator: emulate RETF imm

2013-09-03 Thread Bruce Rogers
Opcode CA

This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/emulate.c |   23 ++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2bc1e81..aee238a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2025,6 +2025,26 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
return rc;
 }
 
+static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
+{
+int rc;
+unsigned long cs;
+
+rc = emulate_pop(ctxt, ctxt-_eip, ctxt-op_bytes);
+if (rc != X86EMUL_CONTINUE)
+return rc;
+if (ctxt-op_bytes == 4)
+ctxt-_eip = (u32)ctxt-_eip;
+rc = emulate_pop(ctxt, cs, ctxt-op_bytes);
+if (rc != X86EMUL_CONTINUE)
+return rc;
+rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
+if (rc != X86EMUL_CONTINUE)
+return rc;
+rsp_increment(ctxt, ctxt-src.val);
+return X86EMUL_CONTINUE;
+}
+
 static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
 {
/* Save real source value, then compare EAX against destination. */
@@ -3763,7 +3783,8 @@ static const struct opcode opcode_table[256] = {
G(ByteOp, group11), G(0, group11),
/* 0xC8 - 0xCF */
I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
-   N, I(ImplicitOps | Stack, em_ret_far),
+   I(ImplicitOps | Stack | SrcImmU16, em_ret_far_imm),
+   I(ImplicitOps | Stack, em_ret_far),
D(ImplicitOps), DI(SrcImmByte, intn),
D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
/* 0xD0 - 0xD7 */
-- 
1.7.7



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Bruce Rogers
  On 7/11/2013 at 03:36 AM, Zhanghaoyu (A) haoyu.zh...@huawei.com wrote: 
 hi all,
 
 I met similar problem to these, while performing live migration or 
 save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
 guest:suse11sp2), running tele-communication software suite in guest,
 https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
 http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
 https://bugzilla.kernel.org/show_bug.cgi?id=58771
 
 After live migration or virsh restore [savefile], one process's CPU 
 utilization went up by about 30%, resulting in throughput degradation of this 
 process.
 oprofile report on this process in guest,
 pre live migration:
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
 samples  %        app name         symbol name
 248      12.3016  no-vmlinux       (no symbols)
 78        3.8690  libc.so.6        memset
 68        3.3730  libc.so.6        memcpy
 30        1.4881  cscf.scu         SipMmBufMemAlloc
 29        1.4385  libpthread.so.0  pthread_mutex_lock
 26        1.2897  cscf.scu         SipApiGetNextIe
 25        1.2401  cscf.scu         DBFI_DATA_Search
 20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
 16        0.7937  cscf.scu         DLM_FreeSlice
 16        0.7937  cscf.scu         receivemessage
 15        0.7440  cscf.scu         SipSmCopyString
 14        0.6944  cscf.scu         DLM_AllocSlice
 
 post live migration:
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
 samples  %        app name         symbol name
 1586     42.2370  libc.so.6        memcpy
 271       7.2170  no-vmlinux       (no symbols)
 83        2.2104  libc.so.6        memset
 41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
 35        0.9321  cscf.scu         SipMmBufMemAlloc
 29        0.7723  cscf.scu         DLM_AllocSlice
 28        0.7457  libpthread.so.0  pthread_mutex_lock
 23        0.6125  cscf.scu         SipApiGetNextIe
 17        0.4527  cscf.scu         SipSmCopyString
 16        0.4261  cscf.scu         receivemessage
 15        0.3995  cscf.scu         SipcMsgStatHandle
 14        0.3728  cscf.scu         Urilex
 12        0.3196  cscf.scu         DBFI_DATA_Search
 12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
 12        0.3196  cscf.scu         SipSmGetDataFromRefString
 
 So, memcpy costs many more cpu cycles after live migration. When I restarted 
 the process, the problem disappeared. save-restore has a similar problem.
 
 perf report on vcpu thread in host,
 pre live migration:
 Performance counter stats for thread id '21082':
 
           0  page-faults
           0  minor-faults
           0  major-faults
       31616  cs
         506  migrations
           0  alignment-faults
           0  emulation-faults
  5075957539  L1-dcache-loads                                            [21.32%]
   324685106  L1-dcache-load-misses     #  6.40% of all L1-dcache hits   [21.85%]
  3681777120  L1-dcache-stores                                           [21.65%]
    65251823  L1-dcache-store-misses    #  1.77%                         [22.78%]
           0  L1-dcache-prefetches                                       [22.84%]
           0  L1-dcache-prefetch-misses                                  [22.32%]
  9321652613  L1-icache-loads                                            [22.60%]
  1353418869  L1-icache-load-misses     # 14.52% of all L1-icache hits   [21.92%]
   169126969  LLC-loads                                                  [21.87%]
    12583605  LLC-load-misses           #  7.44% of all LL-cache hits    [ 5.84%]
   132853447  LLC-stores                                                 [ 6.61%]
    10601171  LLC-store-misses          #  7.9%                          [ 5.01%]
    25309497  LLC-prefetches            # 30%                            [ 4.96%]
     7723198  LLC-prefetch-misses                                        [ 6.04%]
  4954075817  dTLB-loads                                                 [11.56%]
    26753106  dTLB-load-misses          #  0.54% of all dTLB cache hits  [16.80%]
  3553702874  dTLB-stores                                                [22.37%]
     4720313  dTLB-store-misses         #  0.13%                         [21.46%]
   not counted  dTLB-prefetches
   not counted  dTLB-prefetch-misses
 
   

Re: [Qemu-devel] qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-01 Thread Bruce Rogers
  On 10/1/2012 at 07:19 AM, Anthony Liguori anth...@codemonkey.ws wrote: 
 Jan Kiszka jan.kis...@siemens.com writes:
 
 On 2012-10-01 11:31, Marcelo Tosatti wrote:

 It's not just about default configs. We need to validate if the
 migration formats are truly compatible (qemu-kvm -> QEMU, the other way
 around definitely not). For the command line switches, we could provide
 a wrapper script that translates them into upstream format or simply
 ignores them. That should be harmless to carry upstream.
 
 qemu-kvm has:
 
  -no-kvm
  -no-kvm-irqchip
  -no-kvm-pit
  -no-kvm-pit-reinjection
  -tdf - does nothing
 
 There are replacements for all of the above.  If we need to add them to
 qemu.git, it's not big deal to add them.
 
  -drive ...,boot= - this is ignored
 
 cpu_set command for CPU hotplug which is known broken in qemu-kvm.
 
 testdev which is nice but only used for development
 
 Default nic is rtl8139 vs. e1000.
 
 Some logic to change the default VGA ram size to 16mb for pc-1.2
 (QEMU uses 16mb by default now too).
 
 I think at this point, none of this matters but I added the various
 distro maintainers to the thread.
 
 I think it's time for the distros to drop qemu-kvm and just ship
 qemu.git.  Is there anything else that needs to happen to make that
 switch?

We are seriously considering moving to qemu.git for our SP3 release of
SUSE SLES 11. There are just a handful of patches that provide the backwards
compatibility we need to maintain (default to kvm, default nic model,
vga ram size), so assuming there is a 100% commitment to fully supporting
kvm in qemu going forward (which I don't doubt) I think this is a good time
for us to make that switch.

Bruce



[PATCH] handle device help before accelerator set up

2012-08-08 Thread Bruce Rogers
A command line device probe using just -device ? gets processed
after qemu-kvm initializes the accelerator. If /dev/kvm is not
present, the accelerator check will fail (kvm is defaulted to on),
which causes libvirt to not be set up to handle qemu guests.

Moving the device help handling before the accelerator set up allows
the device probe to work in this configuration and libvirt succeeds
in setting up for a qemu hypervisor mode.

Signed-off-by: Bruce Rogers brog...@suse.com
---
 vl.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/vl.c b/vl.c
index 1a46d2d..5b75cf9 100644
--- a/vl.c
+++ b/vl.c
@@ -3380,6 +3380,9 @@ int main(int argc, char **argv, char **envp)
 ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
 }
 
+    if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0) != 0)
+        exit(0);
+
 configure_accelerator();
 
 qemu_init_cpu_loop();
@@ -3535,9 +3538,6 @@ int main(int argc, char **argv, char **envp)
 }
 select_vgahw(vga_model);
 
-    if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0) != 0)
-        exit(0);
-
 if (watchdog) {
 i = select_watchdog(watchdog);
         if (i < 0)
-- 
1.7.7



Re: [PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time

2012-08-01 Thread Bruce Rogers
  On 8/1/2012 at 02:21 PM, Marcelo Tosatti mtosa...@redhat.com wrote: 
 On Mon, Jul 23, 2012 at 09:44:54PM -0300, Marcelo Tosatti wrote:
 On Fri, Jul 20, 2012 at 10:44:24AM -0600, Bruce Rogers wrote:
  When a guest migrates to a new host, the system time difference from the
  previous host is used in the updates to the kvmclock system time visible
  to the guest, resulting in a continuation of correct kvmclock based guest
  timekeeping.
  
  The wall clock component of the kvmclock provided time is currently not
  updated with this same time offset. Since the Linux guest caches the
  wall clock based time, this discrepancy is not noticed until the guest is
  rebooted. After reboot the guest's time calculations are off.
  
  This patch adjusts the wall clock by the kvmclock_offset, resulting in
  correct guest time after a reboot.
  
  Cc: Glauber Costa glom...@redhat.com
  Cc: Zachary Amsden zams...@gmail.com
  Signed-off-by: Bruce Rogers brog...@suse.com
  ---
   arch/x86/kvm/x86.c |4 
   1 files changed, 4 insertions(+), 0 deletions(-)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index be6d549..14c290d 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
 @@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
  	 */
  	getboottime(&boot);
  
  +	if (kvm->arch.kvmclock_offset) {
  +		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
  +		boot = timespec_sub(boot, ts);
  +	}
 
 kvmclock_offset is signed (both directions). Must check the sign and use
 _sub and _add_safe accordingly.
 
 Your patch is correct, sorry (applied to master).
 
 Patch 2 still makes no sense.

I'm fine with dropping the second patch.

Thanks

Bruce



[PATCH 2/2] kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero

2012-07-20 Thread Bruce Rogers
When a guest is migrated, a time offset is generated in order to
maintain the correct kvmclock based time for the guest. Detect when
all kvmclock time pages are deleted so that the kvmclock offset can
be safely reset to zero.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |5 -
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..112415c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -524,6 +524,7 @@ struct kvm_arch {
 
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
+   unsigned int n_time_pages;
raw_spinlock_t tsc_write_lock;
u64 last_tsc_nsec;
u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14c290d..350c51b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1511,6 +1511,8 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.time_page) {
 		kvm_release_page_dirty(vcpu->arch.time_page);
 		vcpu->arch.time_page = NULL;
+		if (--vcpu->kvm->arch.n_time_pages == 0)
+			vcpu->kvm->arch.kvmclock_offset = 0;
 	}
 }
 
@@ -1624,7 +1626,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (is_error_page(vcpu->arch.time_page)) {
 			kvm_release_page_clean(vcpu->arch.time_page);
 			vcpu->arch.time_page = NULL;
-		}
+		} else
+			vcpu->kvm->arch.n_time_pages++;
break;
}
case MSR_KVM_ASYNC_PF_EN:
-- 
1.7.7



[PATCH 0/2] kvm: kvmclock: fix kvmclock reboot after migrate issues

2012-07-20 Thread Bruce Rogers
When a linux guest live migrates to a new host and subsequently
reboots, the guest no longer has the correct time. This is due
to a failure to apply the kvmclock offset to the wall clock time.

The first patch addresses this failure directly, while the second
patch detects when the offset is no longer needed, and zeroes the
offset as a matter of cleaning up migration state which is no longer
relevant. Both patches address the issue, but in different ways. 


Bruce Rogers (2):
  kvm: kvmclock: apply kvmclock offset to guest wall clock time
  kvm: kvmclock: eliminate kvmclock offset when time page count goes to
zero

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |9 -
 2 files changed, 9 insertions(+), 1 deletions(-)

-- 
1.7.7




[PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time

2012-07-20 Thread Bruce Rogers
When a guest migrates to a new host, the system time difference from the
previous host is used in the updates to the kvmclock system time visible
to the guest, resulting in a continuation of correct kvmclock based guest
timekeeping.

The wall clock component of the kvmclock provided time is currently not
updated with this same time offset. Since the Linux guest caches the
wall clock based time, this discrepancy is not noticed until the guest is
rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in
correct guest time after a reboot.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/x86.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..14c290d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 	 */
 	getboottime(&boot);
 
+	if (kvm->arch.kvmclock_offset) {
+		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+		boot = timespec_sub(boot, ts);
+	}
wc.sec = boot.tv_sec;
wc.nsec = boot.tv_nsec;
wc.version = version;
-- 
1.7.7
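
For context, the fix amounts to shifting the reported boot time by
kvm->arch.kvmclock_offset before wc.sec/wc.nsec are filled in. A minimal sketch
of that arithmetic with plain integers standing in for the kernel's timespec
helpers (illustrative only, not kernel code):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of the wall-clock adjustment: times in nanoseconds, with the
     * offset playing the role of kvm->arch.kvmclock_offset. */
    struct toy_wall_clock {
            uint32_t sec;
            uint32_t nsec;
    };

    static struct toy_wall_clock toy_write_wall_clock(int64_t boot_ns,
                                                      int64_t kvmclock_offset_ns)
    {
            struct toy_wall_clock wc;
            /* mirrors: boot = timespec_sub(boot, ns_to_timespec(kvmclock_offset)) */
            int64_t adjusted = boot_ns - kvmclock_offset_ns;

            wc.sec  = (uint32_t)(adjusted / 1000000000LL);
            wc.nsec = (uint32_t)(adjusted % 1000000000LL);
            return wc;
    }

    int main(void)
    {
            /* guest migrated with a 5 s kvmclock offset: wall clock shifts to match */
            struct toy_wall_clock wc = toy_write_wall_clock(100000000000LL, 5000000000LL);
            printf("%u.%09u\n", wc.sec, wc.nsec);   /* 95.000000000 */
            return 0;
    }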




[PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time

2012-07-20 Thread Bruce Rogers
When a guest migrates to a new host, the system time difference from the
previous host is used in the updates to the kvmclock system time visible
to the guest, resulting in a continuation of correct kvmclock based guest
timekeeping.

The wall clock component of the kvmclock provided time is currently not
updated with this same time offset. Since the Linux guest caches the
wall clock based time, this discrepancy is not noticed until the guest is
rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in
correct guest time after a reboot.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@redhat.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/x86.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..14c290d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 	 */
 	getboottime(&boot);
 
+	if (kvm->arch.kvmclock_offset) {
+		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+		boot = timespec_sub(boot, ts);
+	}
wc.sec = boot.tv_sec;
wc.nsec = boot.tv_nsec;
wc.version = version;
-- 
1.7.7



[PATCH 2/2] kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero

2012-07-20 Thread Bruce Rogers
When a guest is migrated, a time offset is generated in order to
maintain the correct kvmclock based time for the guest. Detect when
all kvmclock time pages are deleted so that the kvmclock offset can
be safely reset to zero.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@redhat.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |5 -
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..112415c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -524,6 +524,7 @@ struct kvm_arch {
 
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
+   unsigned int n_time_pages;
raw_spinlock_t tsc_write_lock;
u64 last_tsc_nsec;
u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14c290d..350c51b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1511,6 +1511,8 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.time_page) {
 		kvm_release_page_dirty(vcpu->arch.time_page);
 		vcpu->arch.time_page = NULL;
+		if (--vcpu->kvm->arch.n_time_pages == 0)
+			vcpu->kvm->arch.kvmclock_offset = 0;
 	}
 }
 
@@ -1624,7 +1626,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (is_error_page(vcpu->arch.time_page)) {
 			kvm_release_page_clean(vcpu->arch.time_page);
 			vcpu->arch.time_page = NULL;
-		}
+		} else
+			vcpu->kvm->arch.n_time_pages++;
break;
}
case MSR_KVM_ASYNC_PF_EN:
-- 
1.7.7



[PATCH 0/2] kvm: kvmclock: fix kvmclock reboot after migrate issues

2012-07-20 Thread Bruce Rogers
When a linux guest live migrates to a new host and subsequently
reboots, the guest no longer has the correct time. This is due
to a failure to apply the kvmclock offset to the wall clock time.

The first patch addresses this failure directly, while the second
patch detects when the offset is no longer needed, and zeroes the
offset as a matter of cleaning up migration state which is no longer
relevant. Both patches address the issue, but in different ways. 


Bruce Rogers (2):
  kvm: kvmclock: apply kvmclock offset to guest wall clock time
  kvm: kvmclock: eliminate kvmclock offset when time page count goes to
zero

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |9 -
 2 files changed, 9 insertions(+), 1 deletions(-)

-- 
1.7.7



[PATCH 2/3][STABLE] KVM: indicate oom if add_buf fails

2010-06-03 Thread Bruce Rogers
This patch is a subset of an already upstream patch, but this portion is useful 
in earlier releases.
Please consider for stable.

If the add_buf operation fails, indicate failure to the caller.
Signed-off-by: Bruce Rogers brog...@novell.com

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c

@@ -318,6 +318,7 @@ static bool try_fill_recv_maxbufs(struct
 			skb_unlink(skb, &vi->recv);
 			trim_pages(vi, skb);
 			kfree_skb(skb);
+			oom = true;
 			break;
 		}
 		vi->num++;
@@ -368,6 +369,7 @@ static bool try_fill_recv(struct virtnet
 		if (err < 0) {
 			skb_unlink(skb, &vi->recv);
 			kfree_skb(skb);
+			oom = true;
 			break;
 		}
 		vi->num++;

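The point of the new flag is to let the caller see that the ring could not be
topped up, so it can retry from the delayed refill path instead of stalling. A
hedged sketch of that caller-side pattern; the names below are invented and
only approximate the driver:

    #include <stdbool.h>
    #include <stdio.h>

    static bool toy_try_fill_recv(void)
    {
            /* stand-in for try_fill_recv(): pretend add_buf ran out of memory */
            bool oom = true;
            return !oom;              /* the patch makes the failure visible here */
    }

    static void toy_schedule_refill(void)
    {
            /* stand-in for scheduling the delayed refill work */
            printf("refill rescheduled\n");
    }

    int main(void)
    {
            if (!toy_try_fill_recv())
                    toy_schedule_refill();   /* ring not full: try again later */
            return 0;
    }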



[PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call

2010-06-03 Thread Bruce Rogers
virtio_net: Add schedule check to napi_enable call
Under harsh testing conditions, including low memory, the guest would
stop receiving packets. With this patch applied we no longer see any
problems in the driver while performing these tests for extended periods
of time.

Make sure napi is scheduled subsequent to each napi_enable.

Signed-off-by: Bruce Rogers brog...@novell.com
Signed-off-by: Olaf Kirch o...@suse.de

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -388,6 +388,20 @@ static void skb_recv_done(struct virtque
}
 }

+static void virtnet_napi_enable(struct virtnet_info *vi)
+{
+	napi_enable(&vi->napi);
+
+	/* If all buffers were filled by other side before we napi_enabled, we
+	 * won't get another interrupt, so process any outstanding packets
+	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
+	 * We synchronize against interrupts via NAPI_STATE_SCHED */
+	if (napi_schedule_prep(&vi->napi)) {
+		vi->rvq->vq_ops->disable_cb(vi->rvq);
+		__napi_schedule(&vi->napi);
+	}
+}
+
 static void refill_work(struct work_struct *work)
 {
struct virtnet_info *vi;
@@ -397,7 +411,7 @@ static void refill_work(struct work_stru
 	napi_disable(&vi->napi);
 	try_fill_recv(vi, GFP_KERNEL);
 	still_empty = (vi->num == 0);
-	napi_enable(&vi->napi);
+	virtnet_napi_enable(vi);

/* In theory, this can happen: if we don't get any buffers in
 * we will *never* try to fill again. */
@@ -589,16 +603,7 @@ static int virtnet_open(struct net_devic
 {
struct virtnet_info *vi = netdev_priv(dev);

-	napi_enable(&vi->napi);
-
-	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&vi->napi)) {
-		vi->rvq->vq_ops->disable_cb(vi->rvq);
-		__napi_schedule(&vi->napi);
-	}
+	virtnet_napi_enable(vi);
return 0;
 }





[PATCH 0/3][STABLE] KVM: Various issues in virtio_net

2010-06-03 Thread Bruce Rogers
These are patches which we have found useful for our 2.6.32 based SLES 11 SP1 
release. 

The first patch is already upstream, but should be included in stable.

The second patch is a subset of another upstream patch. Again, stable material.

The third patch solves the last remaining issue we saw when testing kvm 
configurations with the SUSE certification test suite. Under heavy load, we 
observed rx stalls (first two patches applied), and this third patch was 
crafted to address the issue. Please apply to stable.
I assume this last problem also exists in more recent kernels than 2.6.32, but 
I haven't validated that.

With these 3 patches applied we no longer see any issues with virtio networking 
using our certification test suite.

Signed-off-by: Bruce Rogers brog...@novell.com




[PATCH 1/3][STABLE] KVM: fix delayed refill checking

2010-06-03 Thread Bruce Rogers
Please consider this for stable:

commit 39d321577405e8e269fd238b278aaf2425fa788a
Author: Herbert Xu herb...@gondor.apana.org.au
Date:   Mon Jan 25 15:51:01 2010 -0800

virtio_net: Make delayed refill more reliable

I have seen RX stalls on a machine that experienced a suspected
OOM.  After the stall, the RX buffer is empty on the guest side
and there are exactly 16 entries available on the host side.  As
the number of entries is less than that required by a maximal
skb, the host cannot proceed.

The guest did not have a refill job scheduled.

 My diagnosis is that an OOM had occurred, with the delayed refill
job scheduled.  The job was able to allocate at least one skb, but
not enough to overcome the minimum required by the host to proceed.

As the refill job would only reschedule itself if it failed completely
to allocate any skbs, this would lead to an RX stall.

The following patch removes this stall possibility by always
rescheduling the refill job until the ring is totally refilled.

Testing has shown that the RX stall no longer occurs whereas
previously it would occur within a day.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
Acked-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: David S. Miller da...@davemloft.net

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c708ecc..9ead30b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -395,8 +395,7 @@ static void refill_work(struct work_struct *work)

 	vi = container_of(work, struct virtnet_info, refill.work);
 	napi_disable(&vi->napi);
-	try_fill_recv(vi, GFP_KERNEL);
-	still_empty = (vi->num == 0);
+	still_empty = !try_fill_recv(vi, GFP_KERNEL);
 	napi_enable(&vi->napi);

/* In theory, this can happen: if we don't get any buffers in

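The behavioral difference is in when refill_work() rearms itself: the old code
only rescheduled when nothing at all had been allocated, while the fixed code
reschedules whenever the ring is not completely refilled. A toy comparison of
the two conditions (names are made up, not driver code):

    #include <stdbool.h>
    #include <stdio.h>

    /* 'allocated' is how many buffers this pass added; 'full' is whether the
     * ring ended up completely topped up. */
    static bool old_still_empty(int allocated) { return allocated == 0; } /* vi->num == 0        */
    static bool new_still_empty(bool full)     { return !full; }          /* !try_fill_recv(...) */

    int main(void)
    {
            /* partial fill: a few skbs allocated, but fewer than the host needs */
            int  allocated = 3;
            bool full = false;

            printf("old logic reschedules: %d\n", old_still_empty(allocated)); /* 0 -> RX stall       */
            printf("new logic reschedules: %d\n", new_still_empty(full));      /* 1 -> keeps refilling */
            return 0;
    }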



Re: [stable] [PATCH 2/3][STABLE] KVM: indicate oom if add_buf fails

2010-06-03 Thread Bruce Rogers
  On 6/3/2010 at 03:02 PM, Greg KH g...@kroah.com wrote: 

 
 What is the git commit id of the upstream patch?
 

9ab86bbcf8be755256f0a5e994e0b38af6b4d399
I grabbed this from:
git://git.kernel.org/pub/scm/virt/kvm/kvm.git

 I need that for all stable patches to be accepted, thanks.
 
 Also, all KVM stuff needs to get acked by Avi, I can't take them until
 he says they are ok.

Understood.

 
 Oh, and what -stable trees do you want these patches in?  .27, .32, .33,
 or .34?  I have a bunch of them going at the moment...

All 3 in 2.6.32, only #2 and #3 in 2.6.33, and only #3 in 2.6.34

Thanks,
Bruce



Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call

2010-06-03 Thread Bruce Rogers
  On 6/3/2010 at 03:03 PM, Greg KH g...@kroah.com wrote: 
 On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote:
 virtio_net: Add schedule check to napi_enable call
 Under harsh testing conditions, including low memory, the guest would
 stop receiving packets. With this patch applied we no longer see any
 problems in the driver while performing these tests for extended periods
 of time.
 
 Make sure napi is scheduled subsequent to each napi_enable.
 
 Signed-off-by: Bruce Rogers brog...@novell.com
 Signed-off-by: Olaf Kirch o...@suse.de
 
 I need a git commit id for this one as well.
 

This one is not upstream.

Bruce



Re: [stable] [PATCH 3/3][STABLE] KVM: add schedule check to napi_enable call

2010-06-03 Thread Bruce Rogers
  On 6/3/2010 at 04:51 PM, Greg KH g...@kroah.com wrote: 
 On Thu, Jun 03, 2010 at 04:17:34PM -0600, Bruce Rogers wrote:
   On 6/3/2010 at 03:03 PM, Greg KH g...@kroah.com wrote: 
  On Thu, Jun 03, 2010 at 01:38:31PM -0600, Bruce Rogers wrote:
  virtio_net: Add schedule check to napi_enable call
  Under harsh testing conditions, including low memory, the guest would
  stop receiving packets. With this patch applied we no longer see any
  problems in the driver while performing these tests for extended 
 periods
  of time.
  
  Make sure napi is scheduled subsequent to each napi_enable.
  
  Signed-off-by: Bruce Rogers brog...@novell.com
  Signed-off-by: Olaf Kirch o...@suse.de
  
  I need a git commit id for this one as well.
  
 
 This one is not upstream.
 
 Then I can't include it in the -stable tree, so why are you sending it
 to me?  :)
 
 thanks,
 
 greg k-h

Good point!
Sorry about the confusion.
Bruce



[PATCH] document boot option to -drive parameter

2010-04-16 Thread Bruce Rogers
The boot option is missing from the documentation for the -drive parameter.

If there is a better way to describe it, I'm all ears.

Signed-off-by: Bruce Rogers brog...@novell.com

diff --git a/qemu-options.hx b/qemu-options.hx
index c5a160c..fbcf61e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -160,6 +160,8 @@ an untrusted format header.
 This option specifies the serial number to assign to the device.
 @item addr=@var{addr}
 Specify the controller's PCI address (if=virtio only).
+@item boot=@var{boot}
+@var{boot} is "on" or "off" and allows for booting from non-traditional
+interfaces, such as virtio.
 @end table
 
 By default, writethrough caching is used for all block device.  This means that



[PATCH 2/2] make help output be a little more self-consistent

2010-01-07 Thread Bruce Rogers

This is the part which applies to qemu-kvm. 

Signed-off-by: Bruce Rogers brog...@novell.com 
---
 qemu-options.hx |   19 ++-
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 788d849..fdd5884 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1938,7 +1938,7 @@ DEF(readconfig, HAS_ARG, QEMU_OPTION_readconfig,
 -readconfig file\n)
 DEF(writeconfig, HAS_ARG, QEMU_OPTION_writeconfig,
 -writeconfig file\n
-read/write config file)
+read/write config file\n)
 
 DEF(no-kvm, 0, QEMU_OPTION_no_kvm,
 -no-kvm disable KVM hardware virtualization\n)
@@ -1947,26 +1947,27 @@ DEF(no-kvm-irqchip, 0, QEMU_OPTION_no_kvm_irqchip,
 DEF(no-kvm-pit, 0, QEMU_OPTION_no_kvm_pit,
 -no-kvm-pit disable KVM kernel mode PIT\n)
 DEF(no-kvm-pit-reinjection, 0, QEMU_OPTION_no_kvm_pit_reinjection,
--no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt 
reinjection\n)
+-no-kvm-pit-reinjection\n
+disable KVM kernel mode PIT interrupt reinjection\n)
 #if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(TARGET_IA64) || 
defined(__linux__)
 DEF(pcidevice, HAS_ARG, QEMU_OPTION_pcidevice,
 -pcidevice host=bus:dev.func[,dma=none][,name=string]\n
-expose a PCI device to the guest OS.\n
+expose a PCI device to the guest OS\n
 dma=none: don't perform any dma translations (default is 
to use an iommu)\n
-'string' is used in log output.\n)
+'string' is used in log output\n)
 #endif
 DEF(enable-nesting, 0, QEMU_OPTION_enable_nesting,
 -enable-nesting enable support for running a VM inside the VM (AMD 
only)\n)
 DEF(nvram, HAS_ARG, QEMU_OPTION_nvram,
--nvram FILE  provide ia64 nvram contents\n)
+-nvram FILE provide ia64 nvram contents\n)
 DEF(tdf, 0, QEMU_OPTION_tdf,
--tdf enable guest time drift compensation\n)
+-tdfenable guest time drift compensation\n)
 DEF(kvm-shadow-memory, HAS_ARG, QEMU_OPTION_kvm_shadow_memory,
 -kvm-shadow-memory MEGABYTES\n
- allocate MEGABYTES for kvm mmu shadowing\n)
+allocate MEGABYTES for kvm mmu shadowing\n)
 DEF(mem-path, HAS_ARG, QEMU_OPTION_mempath,
--mem-path FILE   provide backing storage for guest RAM\n)
+-mem-path FILE  provide backing storage for guest RAM\n)
 #ifdef MAP_POPULATE
 DEF(mem-prealloc, 0, QEMU_OPTION_mem_prealloc,
--mem-preallocpreallocate guest memory (use with -mempath)\n)
+-mem-prealloc   preallocate guest memory (use with -mempath)\n)
 #endif




[PATCH 1/2] [RESEND] make help output be a little more self-consistent

2010-01-07 Thread Bruce Rogers
This is the part which applies to the base qemu. 
(btw: it was sent to qemu-de...@nongnu.org yesterday.)

Signed-off-by: Bruce Rogers 
---
 qemu-options.hx |   39 ---
 1 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index ecd50eb..20b696d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp,
 -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n
 set the number of CPUs to 'n' [default=1]\n
 maxcpus= maximum number of total cpus, including\n
-  offline CPUs for hotplug etc.\n
+offline CPUs for hotplug, etc\n
 cores= number of CPU cores on one socket\n
 threads= number of threads on one CPU core\n
 sockets= number of discrete sockets in the system\n)
@@ -405,8 +405,9 @@ ETEXI
 DEF(device, HAS_ARG, QEMU_OPTION_device,
 -device driver[,options]  add device\n)
 DEF(name, HAS_ARG, QEMU_OPTION_name,
--name string1[,process=string2]set the name of the guest\n
-string1 sets the window title and string2 the process name 
(on Linux)\n)
+-name string1[,process=string2]\n
+set the name of the guest\n
+string1 sets the window title and string2 the process 
name (on Linux)\n)
 STEXI
 @item -name @var{name}
 Sets the @var{name} of the guest.
@@ -483,7 +484,7 @@ ETEXI
 
 #ifdef CONFIG_SDL
 DEF(ctrl-grab, 0, QEMU_OPTION_ctrl_grab,
--ctrl-grab   use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n)
+-ctrl-grab  use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n)
 #endif
 STEXI
 @item -ctrl-grab
@@ -756,12 +757,12 @@ ETEXI
 #ifdef TARGET_I386
 DEF(smbios, HAS_ARG, QEMU_OPTION_smbios,
 -smbios file=binary\n
-Load SMBIOS entry from binary file\n
+load SMBIOS entry from binary file\n
 -smbios type=0[,vendor=str][,version=str][,date=str][,release=%%d.%%d]\n
-Specify SMBIOS type 0 fields\n
+specify SMBIOS type 0 fields\n
 -smbios 
type=1[,manufacturer=str][,product=str][,version=str][,serial=str]\n
   [,uuid=uuid][,sku=str][,family=str]\n
-Specify SMBIOS type 1 fields\n)
+specify SMBIOS type 1 fields\n)
 #endif
 STEXI
 @item -smbios fi...@var{binary}
@@ -816,13 +817,13 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 -net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n
 connect the host TAP network interface to VLAN 'n' and 
use the\n
 network scripts 'file' (default=%s)\n
-and 'dfile' (default=%s);\n
-use '[down]script=no' to disable script execution;\n
+and 'dfile' (default=%s)\n
+use '[down]script=no' to disable script execution\n
 use 'fd=h' to connect to an already opened TAP 
interface\n
-use 'sndbuf=nbytes' to limit the size of the send buffer; 
the\n
-default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0'\n
-use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag; use\n
-vnet_hdr=on to make the lack of IFF_VNET_HDR support an 
error condition\n
+use 'sndbuf=nbytes' to limit the size of the send buffer 
(the\n
+default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0')\n
+use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n
+use vnet_hdr=on to make the lack of IFF_VNET_HDR support 
an error condition\n
 #endif
 -net 
socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n
 connect the vlan 'n' to another VLAN using a socket 
connection\n
@@ -837,7 +838,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 #endif
 -net dump[,vlan=n][,file=f][,len=n]\n
 dump traffic on vlan 'n' to file 'f' (max n bytes per 
packet)\n
--net none   use it alone to have zero network devices; if no -net 
option\n
+-net none   use it alone to have zero network devices. If no -net 
option\n
 is provided, the default is '-net nic -net user'\n)
 DEF(netdev, HAS_ARG, QEMU_OPTION_netdev,
 -netdev [
@@ -1589,7 +1590,7 @@ The default device is @code{vc} in graphical mode and 
@code{stdio} in
 non graphical mode.
 ETEXI
 DEF(qmp, HAS_ARG, QEMU_OPTION_qmp, \
--qmp devlike -monitor but opens in 'control' mode.\n)
+-qmp devlike -monitor but opens in 'control' mode\n)
 
 DEF(mon, HAS_ARG, QEMU_OPTION_mon, \
 -mon chardev=[name][,mode=readline|control][,default]\n)
@@ -1607,7 +1608,7 @@ from a script.
 ETEXI
 
 DEF(singlestep

[PATCH] make help output be a little more self-consistent

2010-01-06 Thread Bruce Rogers
Signed-off-by: Bruce Rogers brog...@novell.com
---
 qemu-options.hx |   58 --
 1 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 812d067..fdd5884 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp,
 -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n
 set the number of CPUs to 'n' [default=1]\n
 maxcpus= maximum number of total cpus, including\n
-  offline CPUs for hotplug etc.\n
+offline CPUs for hotplug, etc\n
 cores= number of CPU cores on one socket\n
 threads= number of threads on one CPU core\n
 sockets= number of discrete sockets in the system\n)
@@ -406,8 +406,9 @@ ETEXI
 DEF(device, HAS_ARG, QEMU_OPTION_device,
 -device driver[,options]  add device\n)
 DEF(name, HAS_ARG, QEMU_OPTION_name,
--name string1[,process=string2]set the name of the guest\n
-string1 sets the window title and string2 the process name 
(on Linux)\n)
+-name string1[,process=string2]\n
+set the name of the guest\n
+string1 sets the window title and string2 the process 
name (on Linux)\n)
 STEXI
 @item -name @var{name}
 Sets the @var{name} of the guest.
@@ -484,7 +485,7 @@ ETEXI
 
 #ifdef CONFIG_SDL
 DEF(ctrl-grab, 0, QEMU_OPTION_ctrl_grab,
--ctrl-grab   use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n)
+-ctrl-grab  use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n)
 #endif
 STEXI
 @item -ctrl-grab
@@ -757,12 +758,12 @@ ETEXI
 #ifdef TARGET_I386
 DEF(smbios, HAS_ARG, QEMU_OPTION_smbios,
 -smbios file=binary\n
-Load SMBIOS entry from binary file\n
+load SMBIOS entry from binary file\n
 -smbios type=0[,vendor=str][,version=str][,date=str][,release=%%d.%%d]\n
-Specify SMBIOS type 0 fields\n
+specify SMBIOS type 0 fields\n
 -smbios 
type=1[,manufacturer=str][,product=str][,version=str][,serial=str]\n
   [,uuid=uuid][,sku=str][,family=str]\n
-Specify SMBIOS type 1 fields\n)
+specify SMBIOS type 1 fields\n)
 #endif
 STEXI
 @item -smbios fi...@var{binary}
@@ -817,13 +818,13 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 -net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n
 connect the host TAP network interface to VLAN 'n' and 
use the\n
 network scripts 'file' (default=%s)\n
-and 'dfile' (default=%s);\n
-use '[down]script=no' to disable script execution;\n
+and 'dfile' (default=%s)\n
+use '[down]script=no' to disable script execution\n
 use 'fd=h' to connect to an already opened TAP 
interface\n
-use 'sndbuf=nbytes' to limit the size of the send buffer; 
the\n
-default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0'\n
-use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag; use\n
-vnet_hdr=on to make the lack of IFF_VNET_HDR support an 
error condition\n
+use 'sndbuf=nbytes' to limit the size of the send buffer 
(the\n
+default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0')\n
+use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n
+use vnet_hdr=on to make the lack of IFF_VNET_HDR support 
an error condition\n
 #endif
 -net 
socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n
 connect the vlan 'n' to another VLAN using a socket 
connection\n
@@ -838,7 +839,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 #endif
 -net dump[,vlan=n][,file=f][,len=n]\n
 dump traffic on vlan 'n' to file 'f' (max n bytes per 
packet)\n
--net none   use it alone to have zero network devices; if no -net 
option\n
+-net none   use it alone to have zero network devices. If no -net 
option\n
 is provided, the default is '-net nic -net user'\n)
 DEF(netdev, HAS_ARG, QEMU_OPTION_netdev,
 -netdev [
@@ -1590,7 +1591,7 @@ The default device is @code{vc} in graphical mode and 
@code{stdio} in
 non graphical mode.
 ETEXI
 DEF(qmp, HAS_ARG, QEMU_OPTION_qmp, \
--qmp devlike -monitor but opens in 'control' mode.\n)
+-qmp devlike -monitor but opens in 'control' mode\n)
 
 DEF(mon, HAS_ARG, QEMU_OPTION_mon, \
 -mon chardev=[name][,mode=readline|control][,default]\n)
@@ -1608,7 +1609,7 @@ from a script.
 ETEXI
 
 DEF(singlestep, 0, QEMU_OPTION_singlestep, \
--singlestep   always run in singlestep

[PATCH] kvm: allocate correct size for dirty bitmap

2009-09-23 Thread Bruce Rogers
The dirty bitmap copied out to userspace is stored in an array of longs and is 
copied out in units of that size.  This patch accounts for that correctly.  
Currently I'm seeing kvm crash due to writing beyond the end of the allocated 
dirty bitmap memory, because the buffer has the wrong size.

Signed-off-by: Bruce Rogers 
---
 qemu-kvm.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 6511cb6..ee5db76 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -702,7 +702,7 @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
     for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
         if ((slots[i].len && (uint64_t) slots[i].phys_addr >= phys_addr)
             && ((uint64_t) slots[i].phys_addr + slots[i].len <= end_addr)) {
-            buf = qemu_malloc((slots[i].len / 4096 + 7) / 8 + 2);
+            buf = qemu_malloc(BITMAP_SIZE(slots[i].len));
             r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf);
             if (r) {
                 qemu_free(buf);

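The underlying issue is that the kernel fills the dirty log in units of
unsigned long, so the userspace buffer has to be rounded up to whole longs
rather than to bytes plus a little slack. A small sketch of that size
calculation, assuming 4 KiB pages; this is not the actual BITMAP_SIZE macro
from qemu-kvm:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define TOY_PAGE_SIZE      4096
    #define TOY_BITS_PER_LONG  (8 * sizeof(unsigned long))

    /* Bytes needed for a dirty bitmap covering 'len' bytes of guest memory,
     * rounded up to whole unsigned longs (what the kernel writes in). */
    size_t toy_dirty_bitmap_size(uint64_t len)
    {
            uint64_t pages = (len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
            uint64_t longs = (pages + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;
            return (size_t)(longs * sizeof(unsigned long));
    }

    int main(void)
    {
            /* A slot covering 8 pages needs one whole long (8 bytes on a 64-bit
             * host), while the old byte-rounded formula allocated only 3 bytes. */
            printf("%zu\n", toy_dirty_bitmap_size(8 * 4096));
            return 0;
    }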



Re: kvm scaling question

2009-09-14 Thread Bruce Rogers
 On 9/11/2009 at 9:46 AM, Javier Guerra jav...@guerrag.com wrote:
 On Fri, Sep 11, 2009 at 10:36 AM, Bruce Rogers brog...@novell.com wrote:
 Also, when I did a simple experiment with vcpu overcommitment, I was 
 surprised how quickly performance suffered (just bringing a Linux vm up), 
 since I would have assumed the additional vcpus would have been halted the 
 vast majority of the time.  On a 2 proc box, overcommitment to 8 vcpus in a 
 guest (I know this isn't a good usage scenario, but does provide some 
 insights) caused the boot time to increase to almost exponential levels. At 
 16 vcpus, it took hours to just reach the gui login prompt.
 
 I'd guess (and hope!) that having many 1- or 2-cpu guests won't kill
 performance as sharply as having a single guest with more vcpus than
 the physical cpus available.  have you tested that?
 
 -- 
 Javier

Yes, but not empirically.  I'll certainly be doing that, but wanted to see what 
perspective there was on the results I was seeing.
And I've gotten the response that explains why overcommitment is performing so 
poorly in another email.

Bruce





Re: kvm scaling question

2009-09-14 Thread Bruce Rogers
 On 9/11/2009 at 3:53 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Fri, Sep 11, 2009 at 09:36:10AM -0600, Bruce Rogers wrote:
 I am wondering if anyone has investigated how well kvm scales when 
 supporting many guests, or many vcpus or both.
 
 I'll do some investigations into the per vm memory overhead and
 play with bumping the max vcpu limit way beyond 16, but hopefully
 someone can comment on issues such as locking problems that are known
 to exist and need to be addressed to increase parallelism,
 general overhead percentages which can help provide consolidation
 expectations, etc.
 
 I suppose it depends on the guest and workload. With an EPT host and
 16-way Linux guest doing kernel compilations, on recent kernel, i see:
 
 # Samples: 98703304
 #
 # Overhead  Command  Shared Object  Symbol
 #  ........  .......  .............  ......
 #
     97.15%   sh  [kernel]  [k] vmx_vcpu_run
      0.27%   sh  [kernel]  [k] kvm_arch_vcpu_ioctl_
      0.12%   sh  [kernel]  [k] default_send_IPI_mas
      0.09%   sh  [kernel]  [k] _spin_lock_irq
 
 Which is pretty good. Without EPT/NPT the mmu_lock seems to be the major
 bottleneck to parallelism.
 
 Also, when I did a simple experiment with vcpu overcommitment, I was
 surprised how quickly performance suffered (just bringing a Linux vm
 up), since I would have assumed the additional vcpus would have been
 halted the vast majority of the time. On a 2 proc box, overcommitment
 to 8 vcpus in a guest (I know this isn't a good usage scenario, but
 does provide some insights) caused the boot time to increase to almost
 exponential levels. At 16 vcpus, it took hours to just reach the gui
 login prompt.
 
 One probable reason for that is that vcpus which hold spinlocks in the guest
 are scheduled out in favour of vcpus which spin on that same lock.

I suspected it might be a whole lot of spinning happening. That does seem most 
likely. I was just surprised how bad the behavior was.

Bruce



Re: kvm scaling question

2009-09-14 Thread Bruce Rogers
 On 9/11/2009 at 5:02 PM, Andre Przywara andre.przyw...@amd.com wrote:
 Marcelo Tosatti wrote:
 On Fri, Sep 11, 2009 at 09:36:10AM -0600, Bruce Rogers wrote:
 I am wondering if anyone has investigated how well kvm scales when 
 supporting many guests, or many vcpus or both.

 I'll do some investigations into the per vm memory overhead and
 play with bumping the max vcpu limit way beyond 16, but hopefully
 someone can comment on issues such as locking problems that are known
 to exist and needing to be addressed to increased parallellism,
 general overhead percentages which can help provide consolidation
 expectations, etc.
 
 I suppose it depends on the guest and workload. With an EPT host and
 16-way Linux guest doing kernel compilations, on recent kernel, i see:
   ...
 
 Also, when I did a simple experiment with vcpu overcommitment, I was
 surprised how quickly performance suffered (just bringing a Linux vm
 up), since I would have assumed the additional vcpus would have been
 halted the vast majority of the time. On a 2 proc box, overcommitment
 to 8 vcpus in a guest (I know this isn't a good usage scenario, but
 does provide some insights) caused the boot time to increase to almost
 exponential levels. At 16 vcpus, it took hours to just reach the gui
 login prompt.
 
 One probable reason for that is that vcpus which hold spinlocks in the guest
 are scheduled out in favour of vcpus which spin on that same lock.
 We have encountered this issue some time ago in Xen. Ticket spinlocks 
 make this even worse. More detailed info can be found here:
 http://www.amd64.org/research/virtualization.html#Lock_holder_preemption 
 
 Have you tried using paravirtualized spinlock in the guest kernel?
 http://lkml.indiana.edu/hypermail/linux/kernel/0807.0/2808.html 


I'll try to give that a try.  Thanks for the tips.

Bruce




kvm scaling question

2009-09-11 Thread Bruce Rogers
I am wondering if anyone has investigated how well kvm scales when supporting 
many guests, or many vcpus or both.

I'll do some investigations into the per vm memory overhead and play with 
bumping the max vcpu limit way beyond 16, but hopefully someone can comment on 
issues such as locking problems that are known to exist and need to be 
addressed to increase parallelism, general overhead percentages which can 
help provide consolidation expectations, etc.

Also, when I did a simple experiment with vcpu overcommitment, I was surprised 
how quickly performance suffered (just bringing a Linux vm up), since I would 
have assumed the additional vcpus would have been halted the vast majority of 
the time.  On a 2 proc box, overcommitment to 8 vcpus in a guest (I know this 
isn't a good usage scenario, but does provide some insights) caused the boot 
time to increase to almost exponential levels. At 16 vcpus, it took hours to 
just reach the gui login prompt.

Any perspective you can offer would be appreciated.

Bruce



[PATCH] handle -smp 16 more cleanly

2009-04-09 Thread Bruce Rogers
The x86 kvm kernel module limits guest cpu count to 16, but the userspace pc 
definition still says 255, so kvm_create_vcpu will fail for that reason when 
-smp > 16 is specified.  This patch causes qemu-kvm to exit in that case.  Without 
this patch, other errors get reported down the road and finally a segfault 
occurs.

Bruce

Signed-off-by: Bruce Rogers brog...@novell.com

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index ed76367..b6d6d5e 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -417,12 +417,18 @@ static void *ap_main_loop(void *_env)
     CPUState *env = _env;
     sigset_t signals;
     struct ioperm_data *data = NULL;
+    int r;
 
     current_env = env;
     env->thread_id = kvm_get_thread_id();
     sigfillset(&signals);
     sigprocmask(SIG_BLOCK, &signals, NULL);
-    kvm_create_vcpu(kvm_context, env->cpu_index);
+    r = kvm_create_vcpu(kvm_context, env->cpu_index);
+    if (r)
+    {
+        fprintf(stderr, "error creating vcpu: %d\n", r);
+        exit(1);
+    }
     kvm_qemu_init_env(env);

 #ifdef USE_KVM_DEVICE_ASSIGNMENT

