RE: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
-----Original Message-----
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
Sent: Saturday, January 09, 2010 3:30 AM
To: Alexander Graf
Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu
Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly

>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 338baf9..e283e44 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
>>  	set_bit(priority, &vcpu->arch.pending_exceptions);
>>  }
>>
>> -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
>> +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>>  {
>> +	/* BookE does flags in ESR, so ignore those we get here */
>>  	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
>>  }
>
> Actually, I think Book E prematurely sets ESR, since it's done before
> the program interrupt is actually delivered. Architecturally, I'm not
> sure if it's a problem, but philosophically I've always wanted it to
> work the way you've just implemented for Book S.

ESR is updated not only by program but by data_tlb, data_storage, etc.
Should we rearrange them all?

Also DEAR has the same situation as ESR. Should it be updated when we
decide to inject the interrupt into the guest?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: Fix kvm_coalesced_mmio_ring duplicate allocation
Commit 0953ca73 "KVM: Simplify coalesced mmio initialization" allocates
kvm_coalesced_mmio_ring in kvm_coalesced_mmio_init(), but didn't discard
the original allocation...

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 virt/kvm/kvm_main.c | 17 -----------------
 1 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7c5c873..2b0974a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -371,9 +371,6 @@ static struct kvm *kvm_create_vm(void)
 {
 	int r = 0, i;
 	struct kvm *kvm = kvm_arch_create_vm();
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	struct page *page;
-#endif
 
 	if (IS_ERR(kvm))
 		goto out;
@@ -402,23 +399,9 @@ static struct kvm *kvm_create_vm(void)
 		}
 	}
 
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!page) {
-		cleanup_srcu_struct(&kvm->srcu);
-		goto out_err;
-	}
-
-	kvm->coalesced_mmio_ring =
-			(struct kvm_coalesced_mmio_ring *)page_address(page);
-#endif
-
 	r = kvm_init_mmu_notifier(kvm);
 	if (r) {
 		cleanup_srcu_struct(&kvm->srcu);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-		put_page(page);
-#endif
 		goto out_err;
 	}
--
1.5.4.5
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/20/2010 07:18 PM, Rik van Riel wrote:
> On 01/20/2010 07:00 AM, Avi Kivity wrote:
>> On 01/20/2010 12:02 PM, Gleb Natapov wrote:
>>> I can inject the event as a HW interrupt on a vector greater than 32,
>>> but not go through the APIC, so EOI will not be required. This sounds
>>> non-architectural, and I am not sure the kernel has entry point code
>>> for this kind of event; it has one for exceptions and one for
>>> interrupts that go through __do_IRQ(), which assumes that interrupts
>>> should be ACKed.
>> Further, we start to interact with the TPR; Linux doesn't use the TPR
>> or cr8, but if it does one day we don't want it interfering with apf.
> That's not an issue, is it? The guest will tell the host what vector
> to use for pseudo page faults.

And kill 15 other vectors?

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/20/2010 08:45 PM, H. Peter Anvin wrote:
> On 01/20/2010 04:00 AM, Avi Kivity wrote:
>> On 01/20/2010 12:02 PM, Gleb Natapov wrote:
>>> I can inject the event as a HW interrupt on a vector greater than 32,
>>> but not go through the APIC, so EOI will not be required. This sounds
>>> non-architectural, and I am not sure the kernel has entry point code
>>> for this kind of event; it has one for exceptions and one for
>>> interrupts that go through __do_IRQ(), which assumes that interrupts
>>> should be ACKed.
>> Further, we start to interact with the TPR; Linux doesn't use the TPR
>> or cr8, but if it does one day we don't want it interfering with apf.
> I don't think the TPR would be involved unless you involve the APIC
> (which you absolutely don't want to do.) What I'm trying to figure out
> is if you could inject this vector as an external interrupt and still
> have it deliver if IF=0, or if it would cause any other funnies.

No, and it poses problems further down the line if the hardware
virtualizes more and more of the APIC, as seems likely to happen.

External interrupts are asynchronous events, so they're likely not
guaranteed to be delivered on an instruction boundary like exceptions
are. Things like the interrupt shadow will affect them as well.

> At that point, you do not want to go through the do_IRQ path but
> rather through your own exception vector entry point (it would be an
> entry point which doesn't get an error code, like #UD.)

An error code would actually be useful.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/20/2010 07:43 PM, H. Peter Anvin wrote:
> On 01/20/2010 02:02 AM, Gleb Natapov wrote:
>>> You can have the guest OS take an exception on a vector above 31
>>> just fine; you just need it to tell the hypervisor which vector it,
>>> the OS, assigned for this purpose.
>> VMX doesn't allow injecting a hardware exception with a vector
>> greater than 31. SDM 3B section 23.2.1.3.
> OK, you're right. I had missed that... I presume it was done for
> implementation reasons.

My expectation is that it was done for forward compatibility reasons.

>> I can inject the event as a HW interrupt on a vector greater than 32,
>> but not go through the APIC, so EOI will not be required. This sounds
>> non-architectural, and I am not sure the kernel has entry point code
>> for this kind of event; it has one for exceptions and one for
>> interrupts that go through __do_IRQ(), which assumes that interrupts
>> should be ACKed.
> You can also just emulate the state transition -- since you know
> you're dealing with a flat protected-mode or long-mode OS (and just
> make that a condition of enabling the feature) you don't have to deal
> with all the strange combinations of directions that an unrestricted
> x86 event can take.

Since it's an exception, it is unconditional.

Do you mean create the stack frame manually? I'd really like to avoid
that for many reasons, one of which is performance (need to do all the
virt-to-phys walks manually); the other is that we're certain to end up
with something horribly underspecified. I'd really like to keep as
close as possible to the hardware. For the alternative approach, see
Xen.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On Thu, Jan 21, 2010 at 11:02:19AM +0200, Avi Kivity wrote:
> On 01/20/2010 07:43 PM, H. Peter Anvin wrote:
>> On 01/20/2010 02:02 AM, Gleb Natapov wrote:
>>>> You can have the guest OS take an exception on a vector above 31
>>>> just fine; you just need it to tell the hypervisor which vector it,
>>>> the OS, assigned for this purpose.
>>> VMX doesn't allow injecting a hardware exception with a vector
>>> greater than 31. SDM 3B section 23.2.1.3.
>> OK, you're right. I had missed that... I presume it was done for
>> implementation reasons.
> My expectation is that it was done for forward compatibility reasons.
>>> I can inject the event as a HW interrupt on a vector greater than
>>> 32, but not go through the APIC, so EOI will not be required. This
>>> sounds non-architectural, and I am not sure the kernel has entry
>>> point code for this kind of event; it has one for exceptions and one
>>> for interrupts that go through __do_IRQ(), which assumes that
>>> interrupts should be ACKed.
>> You can also just emulate the state transition -- since you know
>> you're dealing with a flat protected-mode or long-mode OS (and just
>> make that a condition of enabling the feature) you don't have to deal
>> with all the strange combinations of directions that an unrestricted
>> x86 event can take.
> Since it's an exception, it is unconditional.
>
> Do you mean create the stack frame manually? I'd really like to avoid
> that for many reasons, one of which is performance (need to do all the
> virt-to-phys walks manually); the other is that we're certain to end
> up with something horribly underspecified. I'd really like to keep as
> close as possible to the hardware. For the alternative approach, see
> Xen.

That and our event injection path can't play with guest memory right
now since it is done from atomic context.

--
			Gleb.
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/21/2010 11:04 AM, Gleb Natapov wrote:
>> Do you mean create the stack frame manually? I'd really like to avoid
>> that for many reasons, one of which is performance (need to do all
>> the virt-to-phys walks manually); the other is that we're certain to
>> end up with something horribly underspecified. I'd really like to
>> keep as close as possible to the hardware. For the alternative
>> approach, see Xen.
> That and our event injection path can't play with guest memory right
> now since it is done from atomic context.

That's true (I'd like to fix that though, for the real mode stuff).

--
error compiling committee.c: too many arguments to function
[PATCH] kvm: Flush coalesced MMIO buffer periodically
The default action of coalesced MMIO is to cache writes in the buffer until:

1. The buffer is full.
2. Or an exit to QEmu happens for another reason.

But this can result in a very late write when:

1. Each MMIO write is small.
2. The writing interval is big.
3. There is no need for input or frequent access to other devices.

This issue was observed in an experimental embedded system. The test
image simply prints "test" every 1 second. The output in QEmu meets
expectations, but the output in KVM is delayed for seconds.

Per Avi's suggestion, I add periodic flushing of the coalesced MMIO
buffer in the QEmu IO thread. This way, we don't need an explicit vcpu
exit to QEmu to handle this issue. The current synchronization rate is
1/25 s.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++--
 qemu-kvm.h |  2 ++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..38f890c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -463,6 +463,12 @@ static void kvm_create_vcpu(CPUState *env, int id)
         goto err_fd;
     }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state->coalesced_mmio && !kvm_state->coalesced_mmio_ring)
+        kvm_state->coalesced_mmio_ring = (void *) env->kvm_run +
+            kvm_state->coalesced_mmio * PAGE_SIZE;
+#endif
+
     return;
 err_fd:
     close(env->kvm_fd);
@@ -927,8 +933,7 @@ int kvm_run(CPUState *env)
 
 #if defined(KVM_CAP_COALESCED_MMIO)
     if (kvm_state->coalesced_mmio) {
-        struct kvm_coalesced_mmio_ring *ring =
-            (void *) run + kvm_state->coalesced_mmio * PAGE_SIZE;
+        struct kvm_coalesced_mmio_ring *ring = kvm_state->coalesced_mmio_ring;
         while (ring->first != ring->last) {
             cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
                                    &ring->coalesced_mmio[ring->first].data[0],
@@ -2073,6 +2078,29 @@ static void io_thread_wakeup(void *opaque)
     }
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+
+/* flush interval is 1/25 second */
+#define KVM_COALESCED_MMIO_FLUSH_INTERVAL 4000LL
+
+static void flush_coalesced_mmio_buffer(void *opaque)
+{
+    if (kvm_state->coalesced_mmio_ring) {
+        struct kvm_coalesced_mmio_ring *ring =
+            kvm_state->coalesced_mmio_ring;
+        while (ring->first != ring->last) {
+            cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
+                                   &ring->coalesced_mmio[ring->first].data[0],
+                                   ring->coalesced_mmio[ring->first].len, 1);
+            smp_wmb();
+            ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
+        }
+    }
+    qemu_mod_timer(kvm_state->coalesced_mmio_timer,
+                   qemu_get_clock(host_clock) + KVM_COALESCED_MMIO_FLUSH_INTERVAL);
+}
+#endif
+
 int kvm_main_loop(void)
 {
     int fds[2];
@@ -2117,6 +2145,15 @@ int kvm_main_loop(void)
     io_thread_sigfd = sigfd;
     cpu_single_env = NULL;
 
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state->coalesced_mmio) {
+        kvm_state->coalesced_mmio_timer =
+            qemu_new_timer(host_clock, flush_coalesced_mmio_buffer, NULL);
+        qemu_mod_timer(kvm_state->coalesced_mmio_timer,
+            qemu_get_clock(host_clock) + KVM_COALESCED_MMIO_FLUSH_INTERVAL);
+    }
+#endif
+
     while (1) {
         main_loop_wait(1000);
         if (qemu_shutdown_requested()) {
@@ -2135,6 +2172,12 @@ int kvm_main_loop(void)
         }
     }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state->coalesced_mmio) {
+        qemu_del_timer(kvm_state->coalesced_mmio_timer);
+        qemu_free_timer(kvm_state->coalesced_mmio_timer);
+    }
+#endif
     pause_all_threads();
     pthread_mutex_unlock(&qemu_mutex);

diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6b3e5a1..17f9d1b 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1144,6 +1144,8 @@ typedef struct KVMState {
     int fd;
     int vmfd;
     int coalesced_mmio;
+    struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
+    struct QEMUTimer *coalesced_mmio_timer;
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
--
1.5.4.5
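The flush loop above drains a single-producer/single-consumer ring by walking `first` toward `last` with wraparound. Below is a standalone sketch of that drain logic; the toy `struct ring` and `ring_flush()` are hypothetical stand-ins for `struct kvm_coalesced_mmio_ring` and the flush function, with the MMIO write replaced by copying entries out so the behavior is observable:

```c
#include <assert.h>
#include <stdint.h>

#define RING_MAX 8  /* stands in for KVM_COALESCED_MMIO_MAX */

/* Toy model of the coalesced-MMIO ring: the producer (kernel) advances
 * 'last' after filling an entry; the consumer (QEMU) advances 'first'
 * after replaying one. Layout is simplified from the real struct. */
struct ring {
    uint32_t first, last;
    uint64_t entries[RING_MAX];
};

/* Drain everything pending, returning how many entries were flushed.
 * The wraparound arithmetic mirrors the flush loop in the patch. */
static int ring_flush(struct ring *r, uint64_t *out)
{
    int n = 0;
    while (r->first != r->last) {
        out[n++] = r->entries[r->first];
        /* in the real code a write barrier (smp_wmb) publishes the
         * consumed entry before 'first' is advanced */
        r->first = (r->first + 1) % RING_MAX;
    }
    return n;
}
```

The modulo step is what lets `first` chase `last` across the end of the array, which is why the loop condition is inequality rather than `first < last`.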
Re: [PATCH] kvm: Flush coalesced MMIO buffer periodically
On 01/21/2010 11:37 AM, Sheng Yang wrote:
> The default action of coalesced MMIO is to cache writes in the buffer
> until:
> 1. The buffer is full.
> 2. Or an exit to QEmu happens for another reason.
>
> But this can result in a very late write when:
> 1. Each MMIO write is small.
> 2. The writing interval is big.
> 3. There is no need for input or frequent access to other devices.
>
> This issue was observed in an experimental embedded system. The test
> image simply prints "test" every 1 second. The output in QEmu meets
> expectations, but the output in KVM is delayed for seconds.
>
> Per Avi's suggestion, I add periodic flushing of the coalesced MMIO
> buffer in the QEmu IO thread. This way, we don't need an explicit vcpu
> exit to QEmu to handle this issue. The current synchronization rate is
> 1/25 s.

I'm not sure that a new timer is needed. If the only problem case is
the display, maybe we can flush coalesced mmio from the vga refresh
timer. That ensures that we flush exactly when needed, and don't have
extra timers.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/5] Debug register emulation fixes and optimizations (reloaded)
On 01/20/2010 07:20 PM, Jan Kiszka wrote:
> Major parts of this series were already posted a while ago during the
> debug register switch optimizations. This version now comes with an
> additional fix for VMX (patch 1) and a rework of mov dr emulation for
> SVM.

Looks good.

--
error compiling committee.c: too many arguments to function
[PATCH] kvm-s390: fix potential array overrun in intercept handling
Avi, Marcelo,

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2)
entries, but we only checked for code > 0x48, which would cause an
off-by-one array overflow if code == 0x48. Since the table is only
populated up to (0x28 >> 2), we can reduce the jump table size while
fixing the off-by-one.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
(patch was refreshed with -U8 to see the full jump table.)

 arch/s390/kvm/intercept.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/s390/kvm/intercept.c
===================================================================
--- linux-2.6.orig/arch/s390/kvm/intercept.c
+++ linux-2.6/arch/s390/kvm/intercept.c
@@ -208,32 +208,32 @@ static int handle_instruction_and_prog(s
 	if (rc == -ENOTSUPP)
 		vcpu->arch.sie_block->icptcode = 0x04;
 	if (rc)
 		return rc;
 	return rc2;
 }
 
-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[(0x28 >> 2) + 1] = {
 	[0x00 >> 2] = handle_noop,
 	[0x04 >> 2] = handle_instruction,
 	[0x08 >> 2] = handle_prog,
 	[0x0C >> 2] = handle_instruction_and_prog,
 	[0x10 >> 2] = handle_noop,
 	[0x14 >> 2] = handle_noop,
 	[0x1C >> 2] = kvm_s390_handle_wait,
 	[0x20 >> 2] = handle_validity,
 	[0x28 >> 2] = handle_stop,
 };
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
 	intercept_handler_t func;
 	u8 code = vcpu->arch.sie_block->icptcode;
 
-	if (code & 3 || code > 0x48)
+	if (code & 3 || code > 0x28)
 		return -ENOTSUPP;
 	func = intercept_funcs[code >> 2];
 	if (func)
 		return func(vcpu);
 	return -ENOTSUPP;
 }
[PATCHv2] kvm-s390: fix potential array overrun in intercept handling
v2: apply Avi's suggestions about ARRAY_SIZE.

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2)
entries, but we only checked for code > 0x48, which would cause an
off-by-one array overflow if code == 0x48. Use the compiler and
ARRAY_SIZE to automatically set the limits.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
(patch was refreshed with -U8 to see the full jump table.)

 arch/s390/kvm/intercept.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/s390/kvm/intercept.c
===================================================================
--- linux-2.6.orig/arch/s390/kvm/intercept.c
+++ linux-2.6/arch/s390/kvm/intercept.c
@@ -208,32 +208,32 @@ static int handle_instruction_and_prog(s
 	if (rc == -ENOTSUPP)
 		vcpu->arch.sie_block->icptcode = 0x04;
 	if (rc)
 		return rc;
 	return rc2;
 }
 
-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[] = {
 	[0x00 >> 2] = handle_noop,
 	[0x04 >> 2] = handle_instruction,
 	[0x08 >> 2] = handle_prog,
 	[0x0C >> 2] = handle_instruction_and_prog,
 	[0x10 >> 2] = handle_noop,
 	[0x14 >> 2] = handle_noop,
 	[0x1C >> 2] = kvm_s390_handle_wait,
 	[0x20 >> 2] = handle_validity,
 	[0x28 >> 2] = handle_stop,
 };
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
 	intercept_handler_t func;
 	u8 code = vcpu->arch.sie_block->icptcode;
 
-	if (code & 3 || code > 0x48)
+	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
 		return -ENOTSUPP;
 	func = intercept_funcs[code >> 2];
 	if (func)
 		return func(vcpu);
 	return -ENOTSUPP;
 }
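The v2 pattern — letting the compiler size the table from its highest designated initializer and bounds-checking with ARRAY_SIZE — can be shown in isolation. The handler names and return values below are simplified stand-ins for the s390 intercept handlers, not the real ones:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef int (*handler_t)(void);

static int handle_noop(void) { return 0; }
static int handle_stop(void) { return 1; }

/* Sparse jump table indexed by (code >> 2). With no explicit bound,
 * the compiler sizes it from the highest initialized index: here
 * (0x28 >> 2) + 1 == 11 entries, holes initialized to NULL. */
static const handler_t handlers[] = {
    [0x00 >> 2] = handle_noop,
    [0x28 >> 2] = handle_stop,
};

/* Bounds-check against ARRAY_SIZE so the table can grow or shrink
 * without the range check going stale -- the bug the patch fixed. */
static int dispatch(unsigned code)
{
    if ((code & 3) || (code >> 2) >= ARRAY_SIZE(handlers))
        return -1;                  /* misaligned or out of range */
    if (!handlers[code >> 2])
        return -1;                  /* hole in the table */
    return handlers[code >> 2]();
}
```

The original v1 bug is exactly the kind this removes: a table declared `[0x48 >> 2]` has valid indices 0..(0x48 >> 2) - 1, so accepting `code == 0x48` reads one past the end, while `>= ARRAY_SIZE(...)` stays correct for any table length.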
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
> -	if (code & 3 || code > 0x48)
> +	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
> 		return -ENOTSUPP;

Not that it matters for this patch, but -ENOTSUPP should not leak to
userspace. Not sure if it does somewhere, but it is used all over the
place within arch/s390/kvm... Use -EOPNOTSUPP or something similar
instead.
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
On Thursday, 21 January 2010 12:24:18, Heiko Carstens wrote:
>> -	if (code & 3 || code > 0x48)
>> +	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
>> 		return -ENOTSUPP;
> Not that it matters for this patch, but -ENOTSUPP should not leak to
> userspace. Not sure if it does somewhere, but it is used all over the
> place within arch/s390/kvm... Use -EOPNOTSUPP or something similar
> instead.

AFAICS it does not leak to userspace; ENOTSUPP is an internal code.
See kvm_arch_vcpu_ioctl_run:
[...]
	if (rc == -ENOTSUPP) {
		/* intercept cannot be handled in-kernel, prepare kvm-run */
		kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
		kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
		kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa;
		kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb;
		rc = 0;
	}
[...]
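The kvm_arch_vcpu_ioctl_run() snippet quoted above is the boundary where the internal code gets translated before reaching userspace. A minimal model of that translation — the ENOTSUPP value (kernel-internal, not in userspace errno.h) and the exit-reason constant are stand-ins here, not the authoritative definitions:

```c
#include <assert.h>

#define ENOTSUPP 524            /* kernel-internal code; illustrative here */
#define KVM_EXIT_S390_SIEIC 13  /* illustrative exit-reason value */

/* Model of the run-loop epilogue: an internal -ENOTSUPP from the
 * intercept dispatcher is converted into a userspace exit reason,
 * and the ioctl itself then returns success. Any other error is
 * passed through unchanged. */
static int finish_run(int rc, int *exit_reason)
{
    if (rc == -ENOTSUPP) {
        *exit_reason = KVM_EXIT_S390_SIEIC; /* let userspace handle it */
        rc = 0;
    }
    return rc;
}
```

This is why the internal code never leaks: every path back to the ioctl return value funnels through this translation.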
[PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that static dsdt get run-time madt info like checksum address, lapic address, max cpu numbers, with least hardcode magic number (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching madt, vcpu number and add/remove action were found, then by notify control method, it will notify OS acpi driver. Signed-off-by: Liu, Jinsong jinsong@intel.com --- src/acpi-dsdt.dsl | 131 - src/acpi-dsdt.hex | 441 ++--- src/acpi.c|7 + src/biosvar.h | 14 ++ src/post.c| 13 ++ 5 files changed, 582 insertions(+), 24 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index cc31112..ed78489 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -700,8 +700,11 @@ DefinitionBlock ( Return (0x01) } +/* + * _L02 method for CPU notification + */ Method(_L02) { -Return(0x01) +Return(\_PR.PRSC()) } Method(_L03) { Return(0x01) @@ -744,4 +747,130 @@ DefinitionBlock ( } } + +Scope (\_PR) +{ +/* BIOS_INFO_PHYSICAL_ADDRESS == 0xEA000 */ +OperationRegion(BIOS, SystemMemory, 0xEA000, 16) +Field(BIOS, DwordAcc, NoLock, Preserve) +{ +MSUA, 32, /* MADT checksum address */ +MAPA, 32, /* MADT LAPIC0 address */ +PBYT, 32, /* bytes of max vcpus bitmap */ +PBIT, 32 /* bits of last byte of max vcpus bitmap */ +} + +OperationRegion(MSUM, SystemMemory, MSUA, 1) +Field(MSUM, ByteAcc, NoLock, Preserve) +{ +MSU, 8/* MADT checksum */ +} + +#define gen_processor(nr, name) \ +Processor (C##name, nr, 0xb010, 0x06) { \ +Name (_HID, ACPI0007) \ +OperationRegion(MATR, SystemMemory, Add(MAPA, Multiply(nr,8)), 8) \ +Field (MATR, ByteAcc, NoLock, Preserve) \ +{ \ +MAT, 64 \ +} \ +Field 
(MATR, ByteAcc, NoLock, Preserve) \ +{ \ +Offset(4),\ +FLG, 1\ +} \ +Method(_MAT, 0) { \ +Return(ToBuffer(MAT)) \ +} \ +Method (_STA) { \ +If (FLG) { Return(0xF) } Else { Return(0x9) } \ +} \ +Method (_EJ0, 1, NotSerialized) { \ +Sleep (0xC8) \ +} \ +} \ + +gen_processor(0, 0) +gen_processor(1, 1) +gen_processor(2, 2) +gen_processor(3, 3) +gen_processor(4, 4) +gen_processor(5, 5) +gen_processor(6, 6) +gen_processor(7, 7) +gen_processor(8, 8) +gen_processor(9, 9) +gen_processor(10, A) +gen_processor(11, B) +gen_processor(12, C) +gen_processor(13, D) +gen_processor(14, E) + + +Method (NTFY, 2) { +#define gen_ntfy(nr)\ +If (LEqual(Arg0, 0x##nr)) { \ +If (LNotEqual(Arg1, \_PR.C##nr.FLG)) { \ +Store (Arg1, \_PR.C##nr.FLG)\ +If (LEqual(Arg1, 1)) { \ +Notify(C##nr, 1)\ +Subtract(\_PR.MSU, 1, \_PR.MSU) \ +} Else {
[PATCH] Debug vcpu add
From 479e84d9ce9d7d78d845f438071a4b1a44aca0bb Mon Sep 17 00:00:00 2001
From: Liu, Jinsong jinsong@intel.com
Date: Fri, 22 Jan 2010 03:30:33 +0800
Subject: [PATCH] Debug vcpu add

Add a 'kvm_vcpu_inited' check so that adding a vcpu will not cause a
segmentation fault. This is especially necessary when a vcpu is
hot-added after the guest OS is ready.

Signed-off-by: Liu, Jinsong jinsong@intel.com
---
 qemu-kvm.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..bdf90b4 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1618,7 +1618,7 @@ static void kvm_do_load_mpstate(void *_env)
 
 void kvm_load_mpstate(CPUState *env)
 {
-    if (kvm_enabled() && qemu_system_ready)
+    if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
         on_vcpu(env, kvm_do_load_mpstate, env);
 }
--
1.6.5.6
vcpu hotplug support
Avi,

I just sent 2 patches for KVM vcpu hotplug support.
1 is a seabios patch: Setup vcpu add/remove infrastructure, including madt bios_info and dsdt
2 is a qemu-kvm patch: Debug vcpu add

Thanks,
Jinsong
Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
On 21.01.2010, at 09:09, Liu Yu-B13201 wrote:

> -----Original Message-----
> From: kvm-ppc-ow...@vger.kernel.org
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
> Sent: Saturday, January 09, 2010 3:30 AM
> To: Alexander Graf
> Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu
> Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
>
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index 338baf9..e283e44 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
>>>  	set_bit(priority, &vcpu->arch.pending_exceptions);
>>>  }
>>>
>>> -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
>>> +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>>>  {
>>> +	/* BookE does flags in ESR, so ignore those we get here */
>>>  	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
>>>  }
>>
>> Actually, I think Book E prematurely sets ESR, since it's done before
>> the program interrupt is actually delivered. Architecturally, I'm not
>> sure if it's a problem, but philosophically I've always wanted it to
>> work the way you've just implemented for Book S.
>
> ESR is updated not only by program but by data_tlb, data_storage, etc.
> Should we rearrange them all?
>
> Also DEAR has the same situation as ESR. Should it be updated when we
> decide to inject the interrupt into the guest?

If that's what the hardware does, then yes. I'm good with taking small
steps though. So if you don't have the time to convert all of the
handlers, you can easily start off with program interrupts.

Alex
Re: vcpu hotplug support
On 01/21/2010 01:54 PM, Liu, Jinsong wrote:
> Avi,
>
> I just sent 2 patches for KVM vcpu hotplug support.
> 1 is a seabios patch: Setup vcpu add/remove infrastructure, including madt bios_info and dsdt
> 2 is a qemu-kvm patch: Debug vcpu add

The patches look reasonable (of course I'd like to see Gleb review
them), but please send the seabios patch to the seabios mailing list
(seab...@seabios.org) so we don't have to diverge.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Debug vcpu add
On Thursday 21 January 2010 19:50:17 Liu, Jinsong wrote:
> From 479e84d9ce9d7d78d845f438071a4b1a44aca0bb Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong jinsong@intel.com
> Date: Fri, 22 Jan 2010 03:30:33 +0800
> Subject: [PATCH] Debug vcpu add

Jinsong, this name is pretty strange... I think something like "Fix
vcpu hot add feature" would be more proper...

--
regards
Yang, Sheng

> Add 'kvm_vcpu_inited' check so that adding a vcpu will not cause a
> segmentation fault. This is especially necessary when a vcpu is
> hot-added after the guest OS is ready.
>
> Signed-off-by: Liu, Jinsong jinsong@intel.com
> ---
>  qemu-kvm.c | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 599c3d6..bdf90b4 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1618,7 +1618,7 @@ static void kvm_do_load_mpstate(void *_env)
>
>  void kvm_load_mpstate(CPUState *env)
>  {
> -    if (kvm_enabled() && qemu_system_ready)
> +    if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
>          on_vcpu(env, kvm_do_load_mpstate, env);
>  }
Re: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote: From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that static dsdt get run-time madt info like checksum address, lapic address, max cpu numbers, with least hardcode magic number (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching madt, vcpu number and add/remove action were found, then by notify control method, it will notify OS acpi driver. Signed-off-by: Liu, Jinsong jinsong@intel.com It looks like AML code is a port of what we had in BOCHS bios with minor changes. Can you detail what is changed and why for easy review please? And this still doesn't work with Windows I assume. --- src/acpi-dsdt.dsl | 131 - src/acpi-dsdt.hex | 441 ++--- src/acpi.c|7 + src/biosvar.h | 14 ++ src/post.c| 13 ++ 5 files changed, 582 insertions(+), 24 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index cc31112..ed78489 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -700,8 +700,11 @@ DefinitionBlock ( Return (0x01) } +/* + * _L02 method for CPU notification + */ Method(_L02) { -Return(0x01) +Return(\_PR.PRSC()) } Method(_L03) { Return(0x01) @@ -744,4 +747,130 @@ DefinitionBlock ( } } + +Scope (\_PR) +{ +/* BIOS_INFO_PHYSICAL_ADDRESS == 0xEA000 */ +OperationRegion(BIOS, SystemMemory, 0xEA000, 16) +Field(BIOS, DwordAcc, NoLock, Preserve) +{ +MSUA, 32, /* MADT checksum address */ +MAPA, 32, /* MADT LAPIC0 address */ +PBYT, 32, /* bytes of max vcpus bitmap */ +PBIT, 32 /* bits of last byte of max vcpus bitmap */ Why do you need PBYT/PBIT? Adds complexity for no apparent reason. 
+} + +OperationRegion(MSUM, SystemMemory, MSUA, 1) +Field(MSUM, ByteAcc, NoLock, Preserve) +{ +MSU, 8/* MADT checksum */ +} + +#define gen_processor(nr, name) \ +Processor (C##name, nr, 0xb010, 0x06) { \ +Name (_HID, ACPI0007) \ +OperationRegion(MATR, SystemMemory, Add(MAPA, Multiply(nr,8)), 8) \ +Field (MATR, ByteAcc, NoLock, Preserve) \ +{ \ +MAT, 64 \ +} \ +Field (MATR, ByteAcc, NoLock, Preserve) \ +{ \ +Offset(4), \ +FLG, 1 \ +} \ +Method(_MAT, 0) { \ +Return(ToBuffer(MAT)) \ +} \ +Method (_STA) { \ +If (FLG) { Return(0xF) } Else { Return(0x9) } \ +} \ +Method (_EJ0, 1, NotSerialized) { \ +Sleep (0xC8) \ +} \ Why _EJ0 is needed? +} \ + +gen_processor(0, 0) +gen_processor(1, 1) +gen_processor(2, 2) +gen_processor(3, 3) +gen_processor(4, 4) +gen_processor(5, 5) +gen_processor(6, 6) +gen_processor(7, 7) +gen_processor(8, 8) +gen_processor(9, 9) +gen_processor(10, A) +gen_processor(11, B) +gen_processor(12, C) +
[PATCH] fix checking of cr0 validity
The "Move to/from Control Registers" chapter of the Intel SDM says: "Reserved bits in CR0 remain clear after any load of those registers; attempts to set them have no impact." The "Control Registers" chapter says: "Bits 63:32 of CR0 are reserved and must be written with zeros. Writing a nonzero value to any of the upper 32 bits results in a general-protection exception, #GP(0)." This patch tries to implement this twisted logic. Signed-off-by: Gleb Natapov g...@redhat.com Reported-by: Lorenzo Martignoni martig...@gmail.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 47c6e23..1df691d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -430,12 +430,16 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { cr0 |= X86_CR0_ET; - if (cr0 & CR0_RESERVED_BITS) { +#ifdef CONFIG_X86_64 + if (cr0 & 0xffffffff00000000ul) { printk(KERN_DEBUG "set_cr0: 0x%lx #GP, reserved bits 0x%lx\n", cr0, kvm_read_cr0(vcpu)); kvm_inject_gp(vcpu, 0); return; } +#endif + + cr0 &= ~CR0_RESERVED_BITS; if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) { printk(KERN_DEBUG "set_cr0: #GP, CD == 0 && NW == 1\n"); -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/8] cr0/cr4/efer/fpu miscellaneous bits
Mostly trivial cleanups, with the exception of a patch activating the fpu on clts. Avi Kivity (8): KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active KVM: Drop kvm_{load,put}_guest_fpu() exports KVM: Activate fpu on clts KVM: Add a helper for checking if the guest is in protected mode KVM: Move cr0/cr4/efer related helpers to x86.h KVM: Rename vcpu->shadow_efer to efer KVM: Optimize kvm_read_cr[04]_bits() KVM: trace guest fpu loads and unloads arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kvm/emulate.c | 10 -- arch/x86/kvm/kvm_cache_regs.h | 9 +++-- arch/x86/kvm/mmu.c | 3 ++- arch/x86/kvm/mmu.h | 24 arch/x86/kvm/svm.c | 20 +--- arch/x86/kvm/vmx.c | 19 ++- arch/x86/kvm/x86.c | 31 --- arch/x86/kvm/x86.h | 30 ++ include/trace/events/kvm.h | 19 +++ 10 files changed, 103 insertions(+), 65 deletions(-)
[PATCH 1/8] KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active
This allows accessing the guest fpu from the instruction emulator, as well as being symmetric with kvm_put_guest_fpu(). Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 47c6e23..e3145d5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4251,7 +4251,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) preempt_disable(); kvm_x86_ops->prepare_guest_switch(vcpu); - kvm_load_guest_fpu(vcpu); + if (vcpu->fpu_active) + kvm_load_guest_fpu(vcpu); local_irq_disable(); @@ -5297,7 +5298,7 @@ EXPORT_SYMBOL_GPL(fx_init); void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) { - if (!vcpu->fpu_active || vcpu->guest_fpu_loaded) + if (vcpu->guest_fpu_loaded) return; vcpu->guest_fpu_loaded = 1; -- 1.6.5.3
[PATCH 3/8] KVM: Activate fpu on clts
Assume that if the guest executes clts, it knows what it's doing, and load the guest fpu to prevent an #NM exception. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm.c | 8 +++- arch/x86/kvm/vmx.c | 1 + arch/x86/kvm/x86.c | 1 + 4 files changed, 10 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a1f0b5d..bf3ec76 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -512,6 +512,7 @@ struct kvm_x86_ops { void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg); unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); + void (*fpu_activate)(struct kvm_vcpu *vcpu); void (*fpu_deactivate)(struct kvm_vcpu *vcpu); void (*tlb_flush)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 8d7cb62..0f3738a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1259,12 +1259,17 @@ static int ud_interception(struct vcpu_svm *svm) return 1; } -static int nm_interception(struct vcpu_svm *svm) +static void svm_fpu_activate(struct kvm_vcpu *vcpu) { + struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR); svm->vcpu.fpu_active = 1; update_cr0_intercept(svm); +} +static int nm_interception(struct vcpu_svm *svm) +{ + svm_fpu_activate(&svm->vcpu); return 1; } @@ -2971,6 +2976,7 @@ static struct kvm_x86_ops svm_x86_ops = { .cache_reg = svm_cache_reg, .get_rflags = svm_get_rflags, .set_rflags = svm_set_rflags, + .fpu_activate = svm_fpu_activate, .fpu_deactivate = svm_fpu_deactivate, .tlb_flush = svm_flush_tlb, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 7375ae1..372bc38 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3011,6 +3011,7 @@ static int handle_cr(struct kvm_vcpu *vcpu) vmcs_writel(CR0_READ_SHADOW, kvm_read_cr0(vcpu)); trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
skip_emulated_instruction(vcpu); + vmx_fpu_activate(vcpu); return 1; case 1: /*mov from cr*/ switch (cr) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index feca59f..09207ba 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3266,6 +3266,7 @@ int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address) int emulate_clts(struct kvm_vcpu *vcpu) { kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); + kvm_x86_ops->fpu_activate(vcpu); return X86EMUL_CONTINUE; } -- 1.6.5.3
[PATCH 4/8] KVM: Add a helper for checking if the guest is in protected mode
Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/emulate.c | 9 - arch/x86/kvm/vmx.c | 4 ++-- arch/x86/kvm/x86.c | 7 +++ arch/x86/kvm/x86.h | 6 ++ 4 files changed, 15 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 0f89e32..e46f276 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -32,6 +32,7 @@ #include <linux/module.h> #include <asm/kvm_emulate.h> +#include "x86.h" #include "mmu.h" /* for is_long_mode() */ /* @@ -1515,7 +1516,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt) /* syscall is not available in real mode */ if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL - || !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) + || !is_protmode(ctxt->vcpu)) return -1; setup_syscalls_segments(ctxt, &cs, &ss); @@ -1568,8 +1569,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt) return -1; /* inject #GP if in real mode or paging is disabled */ - if (ctxt->mode == X86EMUL_MODE_REAL || - !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) { + if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) { kvm_inject_gp(ctxt->vcpu, 0); return -1; } @@ -1634,8 +1634,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt) return -1; /* inject #GP if in real mode or paging is disabled */ - if (ctxt->mode == X86EMUL_MODE_REAL - || !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) { + if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) { kvm_inject_gp(ctxt->vcpu, 0); return -1; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 372bc38..cd78049 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1853,7 +1853,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu, static int vmx_get_cpl(struct kvm_vcpu *vcpu) { - if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) /* if real mode */ + if (!is_protmode(vcpu)) return 0; if (vmx_get_rflags(vcpu) & X86_EFLAGS_VM) /* if virtual 8086 */ @@ -2108,7 +2108,7 @@ static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu) static bool guest_state_valid(struct kvm_vcpu *vcpu) { /* real mode guest
state checks */ - if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) { + if (!is_protmode(vcpu)) { if (!rmode_segment_valid(vcpu, VCPU_SREG_CS)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_SS)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 09207ba..6cdead0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3798,8 +3798,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) * hypercall generates UD from non zero cpl and real mode * per HYPER-V spec */ - if (kvm_x86_ops->get_cpl(vcpu) != 0 || - !kvm_read_cr0_bits(vcpu, X86_CR0_PE)) { + if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) { kvm_queue_exception(vcpu, UD_VECTOR); return 0; } @@ -4763,7 +4762,7 @@ int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, { struct kvm_segment kvm_seg; - if (is_vm86_segment(vcpu, seg) || !(kvm_read_cr0_bits(vcpu, X86_CR0_PE))) + if (is_vm86_segment(vcpu, seg) || !is_protmode(vcpu)) return kvm_load_realmode_segment(vcpu, selector, seg); if (load_segment_descriptor_to_kvm_desct(vcpu, selector, &kvm_seg)) return 1; @@ -5115,7 +5114,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, /* Older userspace won't unhalt the vcpu on reset.
*/ if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && - !(kvm_read_cr0_bits(vcpu, X86_CR0_PE))) + !is_protmode(vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; vcpu_put(vcpu); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 5eadea5..f783d8f 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -2,6 +2,7 @@ #define ARCH_X86_KVM_X86_H #include <linux/kvm_host.h> +#include "kvm_cache_regs.h" static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) { @@ -35,4 +36,9 @@ static inline bool kvm_exception_is_soft(unsigned int nr) struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); +static inline bool is_protmode(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr0_bits(vcpu, X86_CR0_PE); +} + #endif -- 1.6.5.3
[PATCH 2/8] KVM: Drop kvm_{load,put}_guest_fpu() exports
Not used anymore. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e3145d5..feca59f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5305,7 +5305,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) kvm_fx_save(&vcpu->arch.host_fx_image); kvm_fx_restore(&vcpu->arch.guest_fx_image); } -EXPORT_SYMBOL_GPL(kvm_load_guest_fpu); void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) { @@ -5318,7 +5317,6 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) kvm_fx_restore(&vcpu->arch.host_fx_image); ++vcpu->stat.fpu_reload; set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); } -EXPORT_SYMBOL_GPL(kvm_put_guest_fpu); void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) { -- 1.6.5.3
[PATCH 6/8] KVM: Rename vcpu-shadow_efer to efer
None of the other registers have the shadow_ prefix. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu.c | 2 +- arch/x86/kvm/svm.c | 12 ++-- arch/x86/kvm/vmx.c | 14 +++--- arch/x86/kvm/x86.c | 14 +++--- arch/x86/kvm/x86.h | 2 +- 6 files changed, 23 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index bf3ec76..76bf686 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -277,7 +277,7 @@ struct kvm_vcpu_arch { unsigned long cr8; u32 hflags; u64 pdptrs[4]; /* pae */ - u64 shadow_efer; + u64 efer; u64 apic_base; struct kvm_lapic *apic; /* kernel irqchip context */ int32_t apic_arb_prio; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6f7158f..599c422 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -237,7 +237,7 @@ static int is_cpuid_PSE36(void) static int is_nx(struct kvm_vcpu *vcpu) { - return vcpu->arch.shadow_efer & EFER_NX; + return vcpu->arch.efer & EFER_NX; } static int is_shadow_present_pte(u64 pte) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0f3738a..0242fdd 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -231,7 +231,7 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer) efer &= ~EFER_LME; to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME; - vcpu->arch.shadow_efer = efer; + vcpu->arch.efer = efer; } static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, @@ -990,14 +990,14 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) struct vcpu_svm *svm = to_svm(vcpu); #ifdef CONFIG_X86_64 - if (vcpu->arch.shadow_efer & EFER_LME) { + if (vcpu->arch.efer & EFER_LME) { if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { - vcpu->arch.shadow_efer |= EFER_LMA; + vcpu->arch.efer |= EFER_LMA; svm->vmcb->save.efer |= EFER_LMA | EFER_LME; } if (is_paging(vcpu) && !(cr0 & X86_CR0_PG)) { - vcpu->arch.shadow_efer &= ~EFER_LMA; + vcpu->arch.efer &= ~EFER_LMA; svm->vmcb->save.efer &= ~(EFER_LMA |
EFER_LME); } } @@ -1361,7 +1361,7 @@ static int vmmcall_interception(struct vcpu_svm *svm) static int nested_svm_check_permissions(struct vcpu_svm *svm) { - if (!(svm->vcpu.arch.shadow_efer & EFER_SVME) + if (!(svm->vcpu.arch.efer & EFER_SVME) || !is_paging(&svm->vcpu)) { kvm_queue_exception(&svm->vcpu, UD_VECTOR); return 1; } @@ -1764,7 +1764,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm) hsave->save.ds = vmcb->save.ds; hsave->save.gdtr = vmcb->save.gdtr; hsave->save.idtr = vmcb->save.idtr; - hsave->save.efer = svm->vcpu.arch.shadow_efer; + hsave->save.efer = svm->vcpu.arch.efer; hsave->save.cr0 = kvm_read_cr0(&svm->vcpu); hsave->save.cr4 = svm->vcpu.arch.cr4; hsave->save.rflags = vmcb->save.rflags; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index cd78049..d4a6260 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -618,7 +618,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) u64 guest_efer; u64 ignore_bits; - guest_efer = vmx->vcpu.arch.shadow_efer; + guest_efer = vmx->vcpu.arch.efer; /* * NX is emulated; LMA and LME handled by hardware; SCE meaningless @@ -963,7 +963,7 @@ static void setup_msrs(struct vcpu_vmx *vmx) * if efer.sce is enabled. */ index = __find_msr_index(vmx, MSR_K6_STAR); - if ((index >= 0) && (vmx->vcpu.arch.shadow_efer & EFER_SCE)) + if ((index >= 0) && (vmx->vcpu.arch.efer & EFER_SCE)) move_msr_up(vmx, index, save_nmsrs++); } #endif @@ -1608,7 +1608,7 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer) * of this msr depends on is_long_mode().
*/ vmx_load_host_state(to_vmx(vcpu)); - vcpu->arch.shadow_efer = efer; + vcpu->arch.efer = efer; if (!msr) return; if (efer & EFER_LMA) { @@ -1640,13 +1640,13 @@ static void enter_lmode(struct kvm_vcpu *vcpu) (guest_tr_ar & ~AR_TYPE_MASK) | AR_TYPE_BUSY_64_TSS); } - vcpu->arch.shadow_efer |= EFER_LMA; - vmx_set_efer(vcpu, vcpu->arch.shadow_efer); + vcpu->arch.efer |= EFER_LMA; + vmx_set_efer(vcpu, vcpu->arch.efer); } static void exit_lmode(struct kvm_vcpu *vcpu) { - vcpu->arch.shadow_efer &= ~EFER_LMA; + vcpu->arch.efer &= ~EFER_LMA;
[PATCH 8/8] KVM: trace guest fpu loads and unloads
Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 2 ++ include/trace/events/kvm.h | 19 +++ 2 files changed, 21 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8b42c19..06a03c1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5304,6 +5304,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) vcpu->guest_fpu_loaded = 1; kvm_fx_save(&vcpu->arch.host_fx_image); kvm_fx_restore(&vcpu->arch.guest_fx_image); + trace_kvm_fpu(1); } void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) @@ -5316,6 +5317,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) kvm_fx_restore(&vcpu->arch.host_fx_image); ++vcpu->stat.fpu_reload; set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); + trace_kvm_fpu(0); } void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index dbe1084..8abdc12 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -145,6 +145,25 @@ TRACE_EVENT(kvm_mmio, __entry->len, __entry->gpa, __entry->val) ); +#define kvm_fpu_load_symbol \ + {0, "unload"}, \ + {1, "load"} + +TRACE_EVENT(kvm_fpu, + TP_PROTO(int load), + TP_ARGS(load), + + TP_STRUCT__entry( + __field(u32, load) + ), + + TP_fast_assign( + __entry->load = load; + ), + + TP_printk("%s", __print_symbolic(__entry->load, kvm_fpu_load_symbol)) +); + #endif /* _TRACE_KVM_MAIN_H */ /* This part must be outside protection */ -- 1.6.5.3
[PATCH 5/8] KVM: Move cr0/cr4/efer related helpers to x86.h
They have more general scope than the mmu. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/emulate.c | 1 - arch/x86/kvm/mmu.c | 1 + arch/x86/kvm/mmu.h | 24 arch/x86/kvm/x86.h | 24 4 files changed, 25 insertions(+), 25 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e46f276..a2adec8 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -33,7 +33,6 @@ #include <asm/kvm_emulate.h> #include "x86.h" -#include "mmu.h" /* for is_long_mode() */ /* * Opcode effective-address decode tables. diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ff2b2e8..6f7158f 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -18,6 +18,7 @@ */ #include "mmu.h" +#include "x86.h" #include "kvm_cache_regs.h" #include <linux/kvm_host.h> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 599159f..61ef5a6 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -58,30 +58,6 @@ static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu) return kvm_mmu_load(vcpu); } -static inline int is_long_mode(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_X86_64 - return vcpu->arch.shadow_efer & EFER_LMA; -#else - return 0; -#endif -} - -static inline int is_pae(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); -} - -static inline int is_pse(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); -} - -static inline int is_paging(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr0_bits(vcpu, X86_CR0_PG); -} - static inline int is_present_gpte(unsigned long pte) { return pte & PT_PRESENT_MASK; } diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index f783d8f..2dc24a7 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -41,4 +41,28 @@ static inline bool is_protmode(struct kvm_vcpu *vcpu) return kvm_read_cr0_bits(vcpu, X86_CR0_PE); } +static inline int is_long_mode(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_X86_64 + return vcpu->arch.shadow_efer & EFER_LMA; +#else + return 0; +#endif +} + +static inline int is_pae(struct
kvm_vcpu *vcpu) +{ + return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); +} + +static inline int is_pse(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); +} + +static inline int is_paging(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr0_bits(vcpu, X86_CR0_PG); +} + #endif -- 1.6.5.3
[PATCH 7/8] KVM: Optimize kvm_read_cr[04]_bits()
'mask' is always a constant, so we can check whether it includes a bit that might be owned by the guest very cheaply, and avoid the decache call. Saves a few hundred bytes of module text. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/kvm_cache_regs.h | 9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 6b419a3..5a109c6 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -1,6 +1,9 @@ #ifndef ASM_KVM_CACHE_REGS_H #define ASM_KVM_CACHE_REGS_H +#define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS +#define KVM_POSSIBLE_CR4_GUEST_BITS X86_CR4_PGE + static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, enum kvm_reg reg) { @@ -40,7 +43,8 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index) static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask) { - if (mask & vcpu->arch.cr0_guest_owned_bits) + ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS; + if (tmask & vcpu->arch.cr0_guest_owned_bits) kvm_x86_ops->decache_cr0_guest_bits(vcpu); return vcpu->arch.cr0 & mask; } @@ -52,7 +56,8 @@ static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu) static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask) { - if (mask & vcpu->arch.cr4_guest_owned_bits) + ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS; + if (tmask & vcpu->arch.cr4_guest_owned_bits) kvm_x86_ops->decache_cr4_guest_bits(vcpu); return vcpu->arch.cr4 & mask; } -- 1.6.5.3
How to debug Ubuntu 8.04 LTS guest crash during install?
Hello: I am using kvm on a CentOS 5.4 server. I am trying to install the TurnkeyLinux Core appliance found here: http://www.turnkeylinux.org/core I downloaded the ISO file from the web site. Then, I used this command to install it: virt-install -n tkl-core -r 512 --vcpus=1 --check-cpu --os-type=linux --os-variant=ubuntuhardy -v --accelerate -c /tmp/turnkey-core-2009.10-hardy-x86.iso -f /var/lib/libvirt/images/tkl-core.img -s 15 -b br0 --vnc --noautoconsole When I connect to the VNC console, I get the Turnkey linux options screen. I select "Install to hard disk" from there and it seems to start the install, but it crashes during the installer startup. This is repeatable, so there has to be a way to debug it. I tried turning on the debug option for virt-install but that did not give me any useful info. Any ideas how to debug this? Thanks, Neil -- Neil Aggarwal, (281)846-8957, http://UnmeteredVPS.net/cpanel cPanel/WHM preinstalled on a virtual server for only $40/month! No overage charges, 7 day free trial, PayPal, Google Checkout
Luvalley-5 has been released (with whitepaper!): enables arbitrary OS to run VMs without any modification
Luvalley is a lightweight type-1 Virtual Machine Monitor (VMM). Part of its source code is derived from KVM, to virtualize CPU instructions and the memory management unit (MMU). However, its overall architecture is completely different from KVM and somewhat like Xen: Luvalley runs outside of Linux, just like Xen. Any operating system, including Linux, can be used as Luvalley's scheduler, memory manager, physical device driver provider and virtual IO device emulator. Currently, Luvalley supports Linux and Windows. That is to say, one may run Luvalley to boot a Linux or Windows, and then run multiple virtualized operating systems on top of that Linux or Windows. If you are interested in the Luvalley project, you may download the source code as well as the whitepaper from http://sourceforge.net/projects/luvalley/ The main changes of this release (Luvalley-5) are: * The derived code is updated from KVM-83 to KVM-88 * Supports both Intel and AMD CPUs * Automatically identifies Intel and AMD CPUs This release (Luvalley-5) includes: * Luvalley whitepaper (the first edition) * Luvalley binary and source code tarball * Readme, changelog and release notes files
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme & document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. But ultimately we just need three unique tags which ideally have some relation to their associated architectures. The diatribes available from /proc/cpuinfo, while generally accurate, don't really offer any more of a clue to the model group, and in their unmodified form are rather unwieldy as command line flags. I agree. I'd underline that this patch is for migration purposes only, so you don't want to specify an exact CPU, but more like a class of CPUs. If you look into the available CPUID features in each CPU, you will find that there are only a few groups, with currently three for each vendor being a good guess. /proc/cpuinfo just prints out marketing names, which have only a mild relationship to a feature-related technical CPU model. Maybe we can use a generation approach like the AMD Opteron ones for Intel, too. These G1/G2/G3 names are just arbitrary and have no roots within AMD. I think that an exact CPU model specification is out of scope for this patch and maybe even for QEMU. One could create a database with CPU names and associated CPUID flags and provide an external tool to generate a QEMU command line out of this. Keeping this database up-to-date (especially for desktop CPU models) is a burden that the QEMU project does not want to bear.
This is from an EVC kb article[1]: Here is a pointer to a more detailed version: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212 We probably should also add an option to dump out the full set of qemu-side cpuid flags for the benefit of users and upper level tools. You mean like this one? http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01228.html Resending this patch set is on my plan for next week. What is the state of this patch? Will it go in soon? Then I'd rebase my patch set on top of it. Regards, Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
On 01/20/2010 07:18 PM, john cooper wrote: Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme & document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable, which means that a downstream can make correctness decisions without forcing those decisions on upstream. In this case, the idea would be to introduce a new option, say something like -cpu-def. The syntax would be: -cpu-def name=coreduo,level=10,family=6,model=14,stepping=8,features=+vme+mtrr+clflush+mca+sse3+monitor,xlevel=0x80000008,model_id=Genuine Intel(R) CPU T2600 @ 2.16GHz Which is not that exciting since it just lets you do -cpu coreduo in a much more complex way. However, if we take advantage of the current config support, you can have: [cpu-def] name=coreduo level=10 family=6 model=14 stepping=8 features=+vme+mtrr+clflush+mca+sse3.. model_id=Genuine Intel... And that can be stored in a config file. We should then parse /etc/qemu/target-<targetname>.conf by default. We'll move the current x86_defs table into this config file and then downstreams/users can define whatever compatibility classes they want.
With this feature, I'd be inclined to take correct compatibility classes like Nehalem as part of the default qemurc that we install, because it's easily overridden by a user. It then becomes just a suggestion on our part versus a guarantee. It should just be a matter of adding qemu_cpudefs_opts to qemu-config.[ch], taking a new command line option that parses the argument via QemuOpts, then passing the parsed options to a target-specific function that builds the table of supported cpus. Regards, Anthony Liguori
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/21/2010 01:02 AM, Avi Kivity wrote: You can also just emulate the state transition -- since you know you're dealing with a flat protected-mode or long-mode OS (and just make that a condition of enabling the feature) you don't have to deal with all the strange combinations of directions that an unrestricted x86 event can take. Since it's an exception, it is unconditional. Do you mean create the stack frame manually? I'd really like to avoid that for many reasons, one of which is performance (need to do all the virt-to-phys walks manually), the other is that we're certain to end up with something horribly underspecified. I'd really like to keep as close as possible to the hardware. For the alternative approach, see Xen. I obviously didn't mean to do something which didn't look like a hardware-delivered exception. That by itself provides a tight spec. The performance issue is real, of course. Obviously, the design of VT-x was before my time at Intel, so I'm not familiar with why the tradeoffs were made the way they were. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
[PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a way to atomically remove itself from the eventfd ->poll() wait queue head, in order to correctly handle its IRQfd deassign operation. This patch introduces such an API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Avi, Davidel, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of the time unmask is done isn't a very good idea IMO ... fs/eventfd.c | 35 +++ include/linux/eventfd.h | 9 + 2 files changed, 44 insertions(+), 0 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 8b47e42..ea9c18a 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -135,6 +135,41 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait) return events; } +static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt) +{ + *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count; + ctx->count -= *cnt; +} + +/** + * eventfd_ctx_remove_wait_queue - Read the current counter and removes wait queue. + * @ctx: [in] Pointer to eventfd context. + * @wait: [in] Wait queue to be removed. + * @cnt: [out] Pointer to the 64bit counter value. + * + * Returns zero if successful, or the following error codes: + * + * -EAGAIN : The operation would have blocked. + * + * This is used to atomically remove a wait queue entry from the eventfd wait + * queue head, and read/reset the counter value.
+ */ +int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait, + __u64 *cnt) +{ + unsigned long flags; + + spin_lock_irqsave(&ctx->wqh.lock, flags); + eventfd_ctx_do_read(ctx, cnt); + __remove_wait_queue(&ctx->wqh, wait); + if (*cnt != 0 && waitqueue_active(&ctx->wqh)) + wake_up_locked_poll(&ctx->wqh, POLLOUT); + spin_unlock_irqrestore(&ctx->wqh.lock, flags); + + return *cnt != 0 ? 0 : -EAGAIN; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue); + static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 94dd103..85eac48 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -10,6 +10,7 @@ #include <linux/fcntl.h> #include <linux/file.h> +#include <linux/wait.h> /* * CAREFUL: Check include/asm-generic/fcntl.h when defining @@ -34,6 +35,8 @@ struct file *eventfd_fget(int fd); struct eventfd_ctx *eventfd_ctx_fdget(int fd); struct eventfd_ctx *eventfd_ctx_fileget(struct file *file); int eventfd_signal(struct eventfd_ctx *ctx, int n); +int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait, + __u64 *cnt); #else /* CONFIG_EVENTFD */ @@ -61,6 +64,12 @@ static inline void eventfd_ctx_put(struct eventfd_ctx *ctx) } +static inline int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, + wait_queue_t *wait, __u64 *cnt) +{ + return -ENOSYS; +} + #endif #endif /* _LINUX_EVENTFD_H */ -- 1.6.6.144.g5c3af
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Anthony Liguori wrote: On 01/20/2010 07:18 PM, john cooper wrote: I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable which means that a downstream can make correctness decisions without forcing those decisions on upstream. Conceptually I agree with such a malleable approach -- actually I prefer it. I thought however it was too much infrastructure to foist on the problem just to add a few more models into the mix. The only reservation which comes to mind is that of logistics. This may ruffle the code some and impact others such as Andre who seem to have existing patches relative to the current structure. Anyone have strong objections to this approach before I have a look at an implementation? Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, 21 Jan 2010, Michael S. Tsirkin wrote: This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a wait to atomically remove themselves from the eventfd -poll() wait queue head, in order to handle correctly their IRQfd deassign operation. This patch introduces such API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Avi, Davidel, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of times unmask is done isn't a very good idea IMO ... It's the same thing. Unless there are *real* problems in KVM due to the spurious ints, I still think this is .33 material. - Davide -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
On Thu, Jan 21, 2010 at 2:39 PM, Andre Przywara andre.przyw...@amd.com wrote: john cooper wrote: Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. But ultimately we just need three unique tags which ideally have some relation to their associated architectures. The diatribes available from /proc/cpuinfo while generally accurate don't really offer any more of a clue to the model group, and in their unmodified form are rather unwieldy as command line flags. I agree. I'd underline that this patch is for migration purposes only, so you don't want to specify an exact CPU, but more like a class of CPUs. If you look into the available CPUID features in each CPU, you will find that there are only a few groups, with currently three for each vendor being a good guess. /proc/cpuinfo just prints out marketing names, which have only a mild relationship to a feature-related technical CPU model. Maybe we can use a generation approach like the AMD Opteron ones for Intel, too. These G1/G2/G3 names are just arbitrary and have no roots within AMD. I think that an exact CPU model specification is out of scope for this patch and maybe even for QEMU. One could create a database with CPU names and associated CPUID flags and provide an external tool to generate a QEMU command line out of this. 
Keeping this database up-to-date (especially for desktop CPU models) is a burden that the QEMU project does not want to bear. This is from an EVC kb article[1]: Here is a pointer to a more detailed version: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212 We probably should also add an option to dump out the full set of qemu-side cpuid flags for the benefit of users and upper level tools. You mean like this one? http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01228.html Resending this patch set is on my plan for next week. What is the state of this patch? Will it go in soon? Then I'd rebase my patch set on top of it. FYI, a similar CPU flag mechanism has been implemented for Sparc and x86, unifying these would be cool. 
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 06:58 PM, Davide Libenzi wrote: On Thu, 21 Jan 2010, Michael S. Tsirkin wrote: This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a way to atomically remove itself from the eventfd ->poll() wait queue head, in order to handle correctly its IRQfd deassign operation. This patch introduces such an API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Avi, Davide, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of the time unmask is done isn't a very good idea IMO ... It's the same thing. Unless there are *real* problems in KVM due to the spurious ints, I still think this is .33 material. I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd->inject work, which is in process context. The new stuff is only needed for lockless clearing, no? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. 
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:13 PM, Avi Kivity wrote: But I think we can solve this in another way in .32: we can clear the eventfd from irqfd->inject work, which is in process context. The new stuff is only needed for lockless clearing, no? I meant atomic clearing, when we inject interrupts from the irqfd atomic context. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. 
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:13:13PM +0200, Avi Kivity wrote: On 01/21/2010 06:58 PM, Davide Libenzi wrote: On Thu, 21 Jan 2010, Michael S. Tsirkin wrote: This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a wait to atomically remove themselves from the eventfd -poll() wait queue head, in order to handle correctly their IRQfd deassign operation. This patch introduces such API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkinm...@redhat.com --- Avi, Davidel, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of times unmask is done isn't a very good idea IMO ... It's the same thing. Unless there are *real* problems in KVM due to the spurious ints, I still think this is .33 material. I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd-inject work, which is in process context. The new stuff is only needed for lockless clearing, no? No, AFAIK there's no way to clear the counter from kernel without this patch. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:23 PM, Michael S. Tsirkin wrote: I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd-inject work, which is in process context. The new stuff is only needed for lockless clearing, no? No, AFAIK there's no way to clear the counter from kernel without this patch. Can't you read from the file? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
repeatable hang with loop mount and heavy IO in guest
I've tried various guests, including most recent Fedora12 kernels, custom 2.6.32.x. All of them hang around the same point (~1GB written) when I do heavy IO write inside the guest. I have waited 30 minutes to see if the guest would recover, but it just sits there, not writing back any data, not doing anything - but certainly not allowing any new IO writes. The host has some load on it, but nothing heavy enough to completely hang a guest for that long. mount -o loop some_image.fs ./somewhere dd if=/dev/zero of=/somewhere/zero bs=512 then after ~1GB: sync Host is running: 2.6.31.4 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) Guests are booted with elevator=noop as the filesystems are stored as files, accessed as virtio disks. The hung backtraces always look similar to these: [ 361.460136] INFO: task loop0:2097 blocked for more than 120 seconds. [ 361.460139] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460142] loop0 D 88000b92c848 0 2097 2 0x0080 [ 361.460148] 88000b92c5d0 0046 880008c1f810 880009829fd8 [ 361.460153] 880009829fd8 880009829fd8 88000a21ee80 88000b92c5d0 [ 361.460157] 880009829610 8181b768 880001af33b0 0002 [ 361.460161] Call Trace: [ 361.460216] [8105bf12] ? sync_page+0x0/0x43 [ 361.460253] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460257] [8105bf50] ? sync_page+0x3e/0x43 [ 361.460261] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460264] [8105c092] ? wait_on_page_bit+0x6a/0x70 [ 361.460283] [810385a7] ? wake_bit_function+0x0/0x23 [ 361.460287] [81064975] ? shrink_page_list+0x3e5/0x61e [ 361.460291] [81513992] ? schedule_timeout+0xa3/0xbe [ 361.460305] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460308] [8106538f] ? shrink_zone+0x7e1/0xaf6 [ 361.460310] [81061725] ? determine_dirtyable_memory+0xd/0x17 [ 361.460314] [810637da] ? isolate_pages_global+0xa3/0x216 [ 361.460316] [81062712] ? mark_page_accessed+0x2a/0x39 [ 361.460335] [810a61db] ? __find_get_block+0x13b/0x15c [ 361.460337] [81065ed4] ?
try_to_free_pages+0x1ab/0x2c9 [ 361.460340] [81063737] ? isolate_pages_global+0x0/0x216 [ 361.460343] [81060baf] ? __alloc_pages_nodemask+0x394/0x564 [ 361.460350] [8108250c] ? __slab_alloc+0x137/0x44f [ 361.460371] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460374] [81082a08] ? kmem_cache_alloc+0x5d/0x88 [ 361.460376] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460379] [8105c0b5] ? add_to_page_cache_locked+0x1d/0xf1 [ 361.460381] [8105c1b0] ? add_to_page_cache_lru+0x27/0x57 [ 361.460384] [8105c25a] ? grab_cache_page_write_begin+0x7a/0xa0 [ 361.460399] [81104620] ? ext3_write_begin+0x7e/0x201 [ 361.460417] [8134648f] ? do_lo_send_aops+0xa1/0x174 [ 361.460420] [81081948] ? virt_to_head_page+0x9/0x2a [ 361.460422] [8134686b] ? loop_thread+0x309/0x48a [ 361.460425] [813463ee] ? do_lo_send_aops+0x0/0x174 [ 361.460427] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460430] [81346562] ? loop_thread+0x0/0x48a [ 361.460432] [8103819b] ? kthread+0x78/0x80 [ 361.460441] [810238df] ? finish_task_switch+0x2b/0x78 [ 361.460454] [81002f6a] ? child_rip+0xa/0x20 [ 361.460460] [81012ac3] ? native_pax_close_kernel+0x0/0x32 [ 361.460463] [81038123] ? kthread+0x0/0x80 [ 361.460469] [81002f60] ? child_rip+0x0/0x20 [ 361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds. [ 361.460473] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460474] kjournald D 88000b92e558 0 2098 2 0x0080 [ 361.460477] 88000b92e2e0 0046 88000aad9840 88000983ffd8 [ 361.460480] 88000983ffd8 88000983ffd8 81808e00 88000b92e2e0 [ 361.460483] 88000983fcf0 8181b768 880001af3c40 0002 [ 361.460486] Call Trace: [ 361.460488] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460491] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460494] [810a6b4e] ? sync_buffer+0x38/0x3c [ 361.460496] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460499] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460501] [81513ac4] ? out_of_line_wait_on_bit+0x6a/0x76 [ 361.460504] [810385a7] ? 
wake_bit_function+0x0/0x23 [ 361.460514] [8113edad] ? journal_commit_transaction+0x769/0xbb8 [ 361.460517] [810238df] ? finish_task_switch+0x2b/0x78 [ 361.460519] [815137d9] ? thread_return+0x40/0x79 [ 361.460522] [8114162d] ? kjournald+0xc7/0x1cb [ 361.460525] [81038579] ? autoremove_wake_function+0x0/0x2e [
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:33:02PM +0200, Avi Kivity wrote: On 01/21/2010 07:23 PM, Michael S. Tsirkin wrote: I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd-inject work, which is in process context. The new stuff is only needed for lockless clearing, no? No, AFAIK there's no way to clear the counter from kernel without this patch. Can't you read from the file? IMO no, the read could block. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:47:40PM +0200, Avi Kivity wrote: On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, 21 Jan 2010, Avi Kivity wrote: On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. Can you control the eventfd flags? Because if yes, O_NONBLOCK will never block. - Davide -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: kvm itself can modify flags exported from qemu to a guest. I would hope for an option to request that qemu doesn't run if the guest won't get the cpuid flags requested on the command line. -- Jamie -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 09:50:34AM -0800, Davide Libenzi wrote: On Thu, 21 Jan 2010, Avi Kivity wrote: On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. Can you control the eventfd flags? Because if yes, O_NONBLOCK will never block. - Davide Userspace can but kvm can't. 
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: I foresee wanting to iterate over the models and pick the latest one which a host supports - on the grounds that you have done the hard work of ensuring it is a reasonably good performer, while probably working on another host of similar capability when a new host is made available. That's a fairly close use case to that of safe migration which was one of the primary motivations to identify the models being discussed. Although presentation and administration of such was considered the domain of management tools. My hypothetical script which iterates over models in that way is a management tool, and would use qemu to help do its job. Do you mean that more powerful management tools to support safe migration will maintain _their own_ processor model tables, and perform their calculations using their own tables instead of querying qemu, and therefore not have any need of qemu's built in table? If so, I favour more strongly Anthony's suggestion that the processor model table lives in a config file (eventually), as that file could be shared between management tools and qemu itself without duplication. -- Jamie -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
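[Editorial note: to make the shared-config-file idea concrete, such a table might look something like the fragment below. The syntax and field names here are purely hypothetical, invented for illustration -- not an existing QEMU format:]

```
# hypothetical cpu-models.conf, shared between qemu and management tools
[cpudef "Nehalem"]
   level    = 11
   vendor   = "GenuineIntel"
   family   = 6
   model    = 26
   stepping = 3
   features = "sse3 ssse3 sse4.1 sse4.2 popcnt"
```

The point of such a file is exactly the one made above: qemu and the management layer would consult the same definitions, so neither has to duplicate (or risk diverging from) the other's model table.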
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote: But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. Ah yes, I forgot. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:56 PM, Avi Kivity wrote: On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote: But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. Ah yes, I forgot. Well, you can poll it and then read it... this introduces a new race (if userspace does a read in parallel) but it's limited to kvm and buggy userspace. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use compile_prog as rest of configure
On Wed, Jan 20, 2010 at 12:46:28PM +0100, Juan Quintela wrote: This substitution got missed somehow Signed-off-by: Juan Quintela quint...@redhat.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Fix kvm_coalesced_mmio_ring duplicate allocation
On Thu, Jan 21, 2010 at 04:20:04PM +0800, Sheng Yang wrote: The commit 0953ca73 KVM: Simplify coalesced mmio initialization allocate kvm_coalesced_mmio_ring in the kvm_coalesced_mmio_init(), but didn't discard the original allocation... Signed-off-by: Sheng Yang sh...@linux.intel.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] qemu-kvm: Use kvm-kmod headers if available
On Tue, Jan 12, 2010 at 10:21:27PM +0100, Jan Kiszka wrote: Since kvm-kmod-2.6.32.2 we have an alternative source for recent KVM kernel headers. Use it when available and not overruled by --kerneldir. If there is no kvm-kmod and no --kerneldir, we continue to fall back to the qemu-kvm's kernel headers. Applied both, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
On Thu, Jan 21, 2010 at 12:19:07PM +0100, Christian Borntraeger wrote: v2: apply Avi's suggestions about ARRAY_SIZE. kvm_handle_sie_intercept uses a jump table to get the intercept handler for a SIE intercept. Static code analysis revealed a potential problem: the intercept_funcs jump table was defined to contain (0x48 >> 2) entries, but we only checked for code > 0x48, which would cause an off-by-one array overflow if code == 0x48. Use the compiler and ARRAY_SIZE to automatically set the limits. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Applied and queued for .33, CC: stable, thanks. 
Re: [RFC] [PATCH] Use macros for x86_emulate_ops to avoid future mistakes
On Wed, Jan 20, 2010 at 04:47:21PM +0900, Takuya Yoshikawa wrote: The return values from x86_emulate_ops are defined in kvm_emulate.h as macros X86EMUL_*. But in emulate.c, we are comparing the return values from these ops with 0 to check if they're X86EMUL_CONTINUE or not: X86EMUL_CONTINUE is defined as 0 now. To avoid possible mistakes in the future, this patch substitutes X86EMUL_CONTINUE for 0 that are being compared with the return values from x86_emulate_ops. We think that there are more places we should use these macros, but the meanings of rc values in x86_emulate_insn() were not so clear at a glance. If we use proper macros in this function, we would be able to follow the flow of each emulation more easily and, maybe, more securely. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH] pci passthrough: zap option rom scanning.
On Wed, Jan 20, 2010 at 11:58:48AM +0100, Gerd Hoffmann wrote: Nowdays (qemu 0.12) seabios loads option roms from pci rom bars. So there is no need any more to scan for option roms and have qemu load them. Zap the code. Signed-off-by: Gerd Hoffmann kra...@redhat.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:57:22PM +0200, Avi Kivity wrote: On 01/21/2010 07:56 PM, Avi Kivity wrote: On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote: But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. Ah yes, I forgot. Well, you can poll it and then read it... this introduces a new race (if userspace does a read in parallel) but it's limited to kvm and buggy userspace. I would rather not require that userspace never reads this fd. You are right that it does not now, but adding this as requirement looks like exporting an implementation bug to userspace. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: I can appreciate the argument above, however the goal was choosing names with some basis in reality. These were recommended by our contacts within Intel, are used by VmWare to describe their similar cpu models, and arguably have fallen to defacto usage as evidenced by such sources as: http://en.wikipedia.org/wiki/Conroe_(microprocessor) http://en.wikipedia.org/wiki/Penryn_(microprocessor) http://en.wikipedia.org/wiki/Nehalem_(microarchitecture) (Aside: I can confirm they haven't fallen into de facto usage anywhere in my vicinity :-) I wonder if the contact within Intel are living in a bit of a bubble where these names are more familiar than the outside world.) I think we can all agree that there is no point looking for a familiar -cpu naming scheme because there aren't any familiar and meaningful names these days. used by VmWare to describe their similar cpu models If the same names are being used, I see some merit in qemu's list matching VMware's cpu models *exactly* (in capabilities, not id strings), to aid migration from VMware. Is that feasible? Do they match already? I suspect whatever we choose of reasonable length as a model tag for -cpu some further detail is going to be required. That was the motivation to augment the table as above with an instance of a LCD for that associated class. I'm not a typical user: I know quite a lot about x86 architecture; I just haven't kept up to date enough to know the code/model names. Typical users will know less about them. Understood. One thought I had to further clarify what is going on under the hood was to dump the cpuid flags for each model as part of (or in addition to) the above table. But this seems a bit extreme and kvm itself can modify flags exported from qemu to a guest. Here's another idea. It would be nice if qemu could tell the user which of the built-in -cpu choices is the most featureful subset of their own host. With -cpu host implemented, finding that is probably quite easy. 
Users with multiple hosts will get a better feel for what the -cpu names mean that way, probably better than any documentation would give them, because they probably have not much idea what CPU families they have anyway. (cat /proc/cpuinfo doesn't clarify, as I found). And it would give a simple, effective, quick indication of what they must choose if they want a VM image that runs on more than one of their hosts without a management tool. -- Jamie 
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: Do you mean that more powerful management tools to support safe migration will maintain _their own_ processor model tables, and perform their calculations using their own tables instead of querying qemu, and therefore not have any need of qemu's built in table? I would expect so. IIRC that is what the libvirt folks have in mind for example. But we're also trying to simplify the use case of the lonesome user at one with the qemu CLI. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: I think we can all agree that there is no point looking for a familiar -cpu naming scheme because there aren't any familiar and meaningful names these days. Even if we dismiss the Intel coined names as internal code names, there is still VMW's use of them in this space which we can either align with or attempt to displace. All considered I don't see any motivation nor gain in doing the latter. Anyway it doesn't appear likely we're going to resolve this to our collective satisfaction with a hard-wired naming scheme. It would be nice if qemu could tell the user which of the built-in -cpu choices is the most featureful subset of their own host. With -cpu host implemented, finding that is probably quite easy. This should be doable although it may not be as simple as traversing a hierarchy of features and picking one with the most host flags present. In any case this should be fairly detachable from settling the immediate issue. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Virtual CPU time profiling
Hi All, Is there a way in KVM to measure the real physical (CPU) time consumed by each running Virtual CPU? (I want to do time profiling of the virtual machines running on host system) Also, is there an explanation somewhere on how Virtual CPU scheduling is achieved in KVM? Thanks Abhishek -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
On 01/21/2010 10:43 AM, john cooper wrote: Anthony Liguori wrote: On 01/20/2010 07:18 PM, john cooper wrote: I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable which means that a downstream can make correctness decisions without forcing those decisions on upstream. Conceptually I agree with such a malleable approach -- actually I prefer it. I thought however it was too much infrastructure to foist on the problem just to add a few more models into the mix. See list for patches. I didn't do the cpu bits but it should be very obvious how to do that now. Regards, Anthony Liguori The only reservation which comes to mind is that of logistics. This may ruffle the code some and impact others such as Andre who seem to have existing patches relative to the current structure. Anyone have strong objections to this approach before I have a look at an implementation? Thanks, -john
Re: repeatable hang with loop mount and heavy IO in guest
Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert On 01/21/10 18:26, Antoine Martin wrote: I've tried various guests, including most recent Fedora12 kernels, custom 2.6.32.x All of them hang around the same point (~1GB written) when I do heavy IO write inside the guest. I have waited 30 minutes to see if the guest would recover, but it just sits there, not writing back any data, not doing anything - but certainly not allowing any new IO writes. The host has some load on it, but nothing heavy enough to completely hand a guest for that long. mount -o loop some_image.fs ./somewhere bs=512 dd if=/dev/zero of=/somewhere/zero then after ~1GB: sync Host is running: 2.6.31.4 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) Guests are booted with elevator=noop as the filesystems are stored as files, accessed as virtio disks. The hung backtraces always look similar to these: [ 361.460136] INFO: task loop0:2097 blocked for more than 120 seconds. [ 361.460139] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460142] loop0 D 88000b92c848 0 2097 2 0x0080 [ 361.460148] 88000b92c5d0 0046 880008c1f810 880009829fd8 [ 361.460153] 880009829fd8 880009829fd8 88000a21ee80 88000b92c5d0 [ 361.460157] 880009829610 8181b768 880001af33b0 0002 [ 361.460161] Call Trace: [ 361.460216] [8105bf12] ? sync_page+0x0/0x43 [ 361.460253] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460257] [8105bf50] ? sync_page+0x3e/0x43 [ 361.460261] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460264] [8105c092] ? wait_on_page_bit+0x6a/0x70 [ 361.460283] [810385a7] ? wake_bit_function+0x0/0x23 [ 361.460287] [81064975] ? shrink_page_list+0x3e5/0x61e [ 361.460291] [81513992] ? schedule_timeout+0xa3/0xbe [ 361.460305] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460308] [8106538f] ? shrink_zone+0x7e1/0xaf6 [ 361.460310] [81061725] ? 
determine_dirtyable_memory+0xd/0x17 [ 361.460314] [810637da] ? isolate_pages_global+0xa3/0x216 [ 361.460316] [81062712] ? mark_page_accessed+0x2a/0x39 [ 361.460335] [810a61db] ? __find_get_block+0x13b/0x15c [ 361.460337] [81065ed4] ? try_to_free_pages+0x1ab/0x2c9 [ 361.460340] [81063737] ? isolate_pages_global+0x0/0x216 [ 361.460343] [81060baf] ? __alloc_pages_nodemask+0x394/0x564 [ 361.460350] [8108250c] ? __slab_alloc+0x137/0x44f [ 361.460371] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460374] [81082a08] ? kmem_cache_alloc+0x5d/0x88 [ 361.460376] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460379] [8105c0b5] ? add_to_page_cache_locked+0x1d/0xf1 [ 361.460381] [8105c1b0] ? add_to_page_cache_lru+0x27/0x57 [ 361.460384] [8105c25a] ? grab_cache_page_write_begin+0x7a/0xa0 [ 361.460399] [81104620] ? ext3_write_begin+0x7e/0x201 [ 361.460417] [8134648f] ? do_lo_send_aops+0xa1/0x174 [ 361.460420] [81081948] ? virt_to_head_page+0x9/0x2a [ 361.460422] [8134686b] ? loop_thread+0x309/0x48a [ 361.460425] [813463ee] ? do_lo_send_aops+0x0/0x174 [ 361.460427] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460430] [81346562] ? loop_thread+0x0/0x48a [ 361.460432] [8103819b] ? kthread+0x78/0x80 [ 361.460441] [810238df] ? finish_task_switch+0x2b/0x78 [ 361.460454] [81002f6a] ? child_rip+0xa/0x20 [ 361.460460] [81012ac3] ? native_pax_close_kernel+0x0/0x32 [ 361.460463] [81038123] ? kthread+0x0/0x80 [ 361.460469] [81002f60] ? child_rip+0x0/0x20 [ 361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds. [ 361.460473] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460474] kjournald D 88000b92e558 0 2098 2 0x0080 [ 361.460477] 88000b92e2e0 0046 88000aad9840 88000983ffd8 [ 361.460480] 88000983ffd8 88000983ffd8 81808e00 88000b92e2e0 [ 361.460483] 88000983fcf0 8181b768 880001af3c40 0002 [ 361.460486] Call Trace: [ 361.460488] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460491] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460494] [810a6b4e] ? 
sync_buffer+0x38/0x3c [ 361.460496] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460499] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460501] [81513ac4] ? out_of_line_wait_on_bit+0x6a/0x76 [ 361.460504] [810385a7] ? wake_bit_function+0x0/0x23 [ 361.460514] [8113edad] ?
Re: repeatable hang with loop mount and heavy IO in guest
On Thursday 21 January 2010 21:08:38 RW wrote: Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert @Robert: I've been using noop on all of my KVMs and didn't have any problems so far, never had any crash too. Do you have any performance data or comparisons between noop and deadline io schedulers? Cheers, Thomas
Re: repeatable hang with loop mount and heavy IO in guest
No sorry, I don't have any performance data with noop. I haven't even had a crash. BUT I've experienced severe I/O degradation with noop. Once I've written a big chunk of data (e.g. a simple rsync -av /usr /opt) with noop it works for a while and after a few seconds I saw heavy writes which made the VM virtually unusable. As far as I remember it was kjournald which caused the writes. I've written a mail to the list some months ago with some benchmarks: http://article.gmane.org/gmane.comp.emulators.kvm.devel/41112/match=benchmark There're some I/O benchmarks in there. You can't get the graphs currently since tauceti.net is offline until Monday. I haven't tested noop in these benchmarks because of the problems mentioned above. But it compares deadline and cfq a little bit on a HP DL 380 G6 server. Robert On 01/21/10 22:08, Thomas Beinicke wrote: On Thursday 21 January 2010 21:08:38 RW wrote: Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert @Robert: I've been using noop on all of my KVMs and didn't have any problems so far, never had any crash too. Do you have any performance data or comparisons between noop and deadline io schedulers? Cheers, Thomas
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
On 21.01.2010, at 18:36, Marcelo Tosatti wrote: On Thu, Jan 21, 2010 at 12:19:07PM +0100, Christian Borntraeger wrote: v2: apply Avi's suggestions about ARRAY_SIZE. kvm_handle_sie_intercept uses a jump table to get the intercept handler for a SIE intercept. Static code analysis revealed a potential problem: the intercept_funcs jump table was defined to contain (0x48 >> 2) entries, but we only checked for code > 0x48, which would cause an off-by-one array overflow if code == 0x48. Use the compiler and ARRAY_SIZE to automatically set the limits. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Applied and queued for .33, CC: stable, thanks. Yes. Christian, please get this into 2.6.32-stable. Alex
RE: Some keys don't repeat in 64 bit Windows 7 kvm guest
I am now running qemu-kvm 0.11.1: $ kvm -h | head -1 QEMU PC emulator version 0.11.1 (qemu-kvm-0.11.1), Copyright (c) 2003-2008 Fabrice Bellard My Windows 7 guest detected a lot of new hardware, but I still have the same key repeating problem. I think I will just leave this alone for now since I am going to be away from my office (and this machine) for several weeks. When I return, I plan on doing a clean install of everything. If I still have this issue, I will report back. Thanks to everyone for your help. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Jimmy Crossley Sent: Saturday, January 16, 2010 21:33 To: 'Jim Paris' Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: RE: Some keys don't repeat in 64 bit Widows 7 kvm guest From: j...@jim.sh [mailto:j...@jim.sh] On Behalf Of Jim Paris Sent: Saturday, January 16, 2010 20:40 To: Jimmy Crossley Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: Re: Some keys don't repeat in 64 bit Widows 7 kvm guest Jimmy Crossley wrote: Thanks for the quick response, Gleb. You are right - we should not spend our time troubleshooting an issue with something this old. I'll try downloading all the sources and headers I need to build kvm-88. I think I'll need another Debian install, since this is a production machine and I don't want to destabilize it. Go ahead and laugh - I ran Debian stable for years before finally deciding I could risk running testing. Debian testing still has the kvm package at version 72, but the new package name qemu-kvm is at version 0.11.0 which is quite a bit newer. -jim It looks like I need to switch to qemu-kvm. That kvm package that I have Installed (72+dfsg=5+squeeze1) is not in the squeeze repositories any more. It sure is hard to keep up with everything. Thanks, Jim. 
Jimmy Crossley CoNetrix 5214 68th Street Suite 200 Lubbock TX 79424 jcross...@conetrix.com http://www.conetrix.com tel: 806-687-8600 800-356-6568 fax: 806-687-8511 This e-mail message (and attachments) may contain confidential CoNetrix information. If you are not the intended recipient, you cannot use, distribute or copy the message or attachments. In such a case, please notify the sender by return e-mail immediately and erase all copies of the message and attachments. Opinions, conclusions and other information in this message and attachments that do not relate to official business are neither given nor endorsed by CoNetrix.
qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled
Hi, With this small disk image: http://psy.jim.sh/~jim/tmp/diskimage.gz and the new qemu-kvm-0.12.2: $ kvm --version QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 2003-2008 Fabrice Bellard I can successfully boot to a grub prompt with: $ kvm -drive file=diskimage,boot=on However, if kvm gets disabled: $ kvm -no-kvm -drive file=diskimage,boot=on then the boot hangs at GRUB Loading, please wait... and consumes 100% CPU. -jim
Re: [PATCH 0/5] Debug register emulation fixes and optimizations (reloaded)
On Wed, Jan 20, 2010 at 06:20:20PM +0100, Jan Kiszka wrote: Major parts of this series were already posted a while ago during the debug register switch optimizations. This version now comes with an additional fix for VMX (patch 1) and a rework of mov dr emulation for SVM. Find this series also at git://git.kiszka.org/linux-kvm.git queues/debugregs Jan Kiszka (5): KVM: VMX: Fix exceptions of mov to dr KVM: VMX: Fix emulation of DR4 and DR5 KVM: VMX: Clean up DR6 emulation KVM: SVM: Clean up and enhance mov dr emulation KVM: SVM: Trap all debug register accesses arch/x86/include/asm/kvm_host.h |5 +- arch/x86/kvm/svm.c | 78 +-- arch/x86/kvm/vmx.c | 67 +++-- arch/x86/kvm/x86.c | 19 + 4 files changed, 84 insertions(+), 85 deletions(-) Applied, thanks.
RE: vcpu hotplug support
Avi Kivity wrote: On 01/21/2010 01:54 PM, Liu, Jinsong wrote: Avi, I just sent 2 patches for KVM vcpu hotplug support. 1 is a seabios patch: Setup vcpu add/remove infrastructure, including madt bios_info and dsdt 2 is a qemu-kvm patch: Debug vcpu add The patches look reasonable (of course I'd like to see Gleb review it), but please send the seabios patch to the seabios mailing list (seab...@seabios.org) so we don't have to diverge. Thanks for the reminder! I have sent it to seabios. Jinsong
RE: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
Gleb Natapov wrote: On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote: From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that the static dsdt gets run-time madt info like checksum address, lapic address, and max cpu number, with the least hard-coded magic numbers (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching the madt, the vcpu number and add/remove action are found, then via the notify control method the OS acpi driver is notified. Signed-off-by: Liu, Jinsong jinsong@intel.com It looks like the AML code is a port of what we had in the BOCHS bios with minor changes. Can you detail what is changed and why for easy review please? And this still doesn't work with Windows I assume. Yes, my work is based on the BOCHS infrastructure, thanks BOCHS :) I just changed some minor points: 1. explicitly define the return value of '_MAT' as 'buffer', otherwise some linux acpi drivers (i.e. linux 2.6.30) would hit a parse error, since they handle it as 'integer' not 'buffer'; 2. keep a correct 'checksum' of the madt on vcpu add/remove, otherwise a 'checksum error' is reported when using acpi tools to get madt info after we add/remove a vcpu; 3. add '_EJ0' so that linux has an acpi obj under /sys/devices/LNXSYSTM:00, which is needed for vcpu remove; 4. in Method(PRSC, 0), just scan the 'xxx' vcpus that qemu gets from the cmdline para 'maxcpus=xxx', not all 256 vcpus, otherwise under some dsdt processor defines it will result in errors; 5. use 1 hard-coded bios_info structure address to replace '0x514', so that it can transfer more madt info to dsdt; Thanks, Jinsong
[PATCH] kvm: Flush coalesced MMIO buffer periodically
The default action of coalesced MMIO is, cache the writing in buffer, until: 1. The buffer is full. 2. Or the exit to QEmu due to other reasons. But this would result in a very late writing in some condition. 1. The each time write to MMIO content is small. 2. The writing interval is big. 3. No need for input or accessing other devices frequently. This issue was observed in a experimental embbed system. The test image simply print test every 1 seconds. The output in QEmu meets expectation, but the output in KVM is delayed for seconds. Per Avi's suggestion, I hooked a flushing for coalesced MMIO buffer in VGA update handler. By this way, We don't need vcpu explicit exit to QEmu to handle this issue. Signed-off-by: Sheng Yang sh...@linux.intel.com --- Like this? qemu-kvm.c | 26 -- qemu-kvm.h |6 ++ vl.c |2 ++ 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 599c3d6..a9b5107 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -463,6 +463,12 @@ static void kvm_create_vcpu(CPUState *env, int id) goto err_fd; } +#ifdef KVM_CAP_COALESCED_MMIO +if (kvm_state-coalesced_mmio !kvm_state-coalesced_mmio_ring) +kvm_state-coalesced_mmio_ring = (void *) env-kvm_run + + kvm_state-coalesced_mmio * PAGE_SIZE; +#endif + return; err_fd: close(env-kvm_fd); @@ -927,8 +933,7 @@ int kvm_run(CPUState *env) #if defined(KVM_CAP_COALESCED_MMIO) if (kvm_state-coalesced_mmio) { -struct kvm_coalesced_mmio_ring *ring = -(void *) run + kvm_state-coalesced_mmio * PAGE_SIZE; +struct kvm_coalesced_mmio_ring *ring = kvm_state-coalesced_mmio_ring; while (ring-first != ring-last) { cpu_physical_memory_rw(ring-coalesced_mmio[ring-first].phys_addr, ring-coalesced_mmio[ring-first].data[0], @@ -2073,6 +2078,23 @@ static void io_thread_wakeup(void *opaque) } } +#ifdef KVM_CAP_COALESCED_MMIO +void kvm_flush_coalesced_mmio_buffer(void) +{ +if (kvm_state-coalesced_mmio_ring) { +struct kvm_coalesced_mmio_ring *ring = +kvm_state-coalesced_mmio_ring; +while (ring-first != 
ring-last) { +cpu_physical_memory_rw(ring-coalesced_mmio[ring-first].phys_addr, + ring-coalesced_mmio[ring-first].data[0], + ring-coalesced_mmio[ring-first].len, 1); +smp_wmb(); +ring-first = (ring-first + 1) % KVM_COALESCED_MMIO_MAX; +} +} +} +#endif + int kvm_main_loop(void) { int fds[2]; diff --git a/qemu-kvm.h b/qemu-kvm.h index 6b3e5a1..8188ff6 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -1125,6 +1125,11 @@ static inline int kvm_set_migration_log(int enable) return kvm_physical_memory_set_dirty_tracking(enable); } +#ifdef KVM_CAP_COALESCED_MMIO +void kvm_flush_coalesced_mmio_buffer(void); +#else +void kvm_flush_coalesced_mmio_buffer(void) {} +#endif int kvm_irqchip_in_kernel(void); #ifdef CONFIG_KVM @@ -1144,6 +1149,7 @@ typedef struct KVMState { int fd; int vmfd; int coalesced_mmio; +struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; int broken_set_mem_region; int migration_log; int vcpu_events; diff --git a/vl.c b/vl.c index 9edea10..64902f2 100644 --- a/vl.c +++ b/vl.c @@ -3235,6 +3235,7 @@ static void gui_update(void *opaque) interval = dcl-gui_timer_interval; dcl = dcl-next; } +kvm_flush_coalesced_mmio_buffer(); qemu_mod_timer(ds-gui_timer, interval + qemu_get_clock(rt_clock)); } @@ -3242,6 +3243,7 @@ static void nographic_update(void *opaque) { uint64_t interval = GUI_REFRESH_INTERVAL; +kvm_flush_coalesced_mmio_buffer(); qemu_mod_timer(nographic_timer, interval + qemu_get_clock(rt_clock)); } -- 1.5.4.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM Virtual CPU time profiling
On Friday 22 January 2010 02:41:35 Saksena, Abhishek wrote: Hi All, Is there a way in KVM to measure the real physical (CPU) time consumed by each running Virtual CPU? (I want to do time profiling of the virtual machines running on host system) Also, is there an explanation somewhere on how Virtual CPU scheduling is achieved in KVM? Thanks Each VM is a QEmu process, and each vcpu is a thread of it (but not all the threads are vcpus). Currently the KVM related scheduling algorithm is the same as for other host threads/processes. You can get the thread_id of each vcpu in the QEmu monitor, by: (qemu) info cpus Then, you can do anything you want with it, e.g. using top to get each thread/vcpu's CPU time. :) -- regards Yang, Sheng
Re: Unable to single-step in kvm, always results in a resume
So now I can step instructions but my breakpoints do not work. I have verified that disabling kvm restores the breakpoint functionality. Any suggestions? Thanks, Nicholas Jan Kiszka wrote: Hi Nicholas, please don't drop CCs on reply. Nicholas Amon wrote: Hi Jan, Thanks for responding. Yes, I am able to step instructions when I disable kvm w/ the no-kvm option. My host kernel is 64bit 2.6.27 and the program that I am debugging is 32 bit but starts in real mode. But the KVM module I am running is from kvm-88. Is there any way I can check the version definitively? kvm modules issue a message when being loaded, check your kernel log. qemu-kvm gives you the version via -version. OK, the problems you see are likely related to the very old versions you use. Update to recent kvm-kmod (2.6.32 series) and qemu-kvm (0.12 series) and retry. Jan Thanks, Nicholas Jan Kiszka wrote: Jan Kiszka wrote: Nicholas Amon wrote: Hi All, I am trying to single-step through my kernel using qemu and kvm. I have run qemu via: qemu-system-x86_64 -s -S -hda /home/nickamon/lab1/obj/kernel.img and also connected to the process using gdb. Problem is that whenever I try and step instruction, it seems to resume my kernel rather than allowing me to progress instruction by instruction. I have built the kvm snapshot from git and still no luck. Tried following the code for a few hours and have no luck. Any suggestions? What's your host kernel or kvm-kmod version? ...and does -no-kvm make any difference (except that it's much slower)? Jan -- Nicholas Amon Senior Software Engineer Xceedium Inc. Office: 201-536-1000 x127 Cell: 732-236-7698 na...@xceedium.com
PCI Passthrough Problem
I'm trying once again to get PCI passthrough working (KVM 84 on Ubuntu 9.10), and I'm getting this error : LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin /usr/bin/kvm -S -M pc-0.11 -m 4096 -smp 4 -name mailserver -uuid 76a83471-e94a-3658-fa61-8eceaa74ffc2 -monitor unix:/var/run/libvirt/qemu/mailserver.monitor,server,nowait -localtime -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/var/lib/libvirt/images/mailserver.img,if=virtio,index=0,boot=on -drive file=/var/lib/libvirt/images/mailserver-2.img,if=virtio,index=1 -net nic,macaddr=54:52:00:1b:b2:56,vlan=0,model=virtio,name=virtio.0 -net tap,fd=17,vlan=0,name=tap.0 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-us -vga cirrus -pcidevice host=0a:01.0 char device redirected to /dev/pts/0 get_real_device: /sys/bus/pci/devices/:0a:01.0/config: Permission denied init_assigned_device: Error: Couldn't get real device (0a:01.0)! Failed to initialize assigned device host=0a:01.0 Any thoughts? -- Aaron Clausen mightymartia...@gmail.com
Re: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
On Fri, Jan 22, 2010 at 10:15:44AM +0800, Liu, Jinsong wrote: Gleb Natapov wrote: On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote: From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that static dsdt get run-time madt info like checksum address, lapic address, max cpu numbers, with least hardcode magic number (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching madt, vcpu number and add/remove action were found, then by notify control method, it will notify OS acpi driver. Signed-off-by: Liu, Jinsong jinsong@intel.com It looks like AML code is a port of what we had in BOCHS bios with minor changes. Can you detail what is changed and why for easy review please? And this still doesn't work with Windows I assume. Yes, my work is based on BOCHS infrastructure, thanks BOCHS :) I just change some minor points: 1. explicitly define returen value of '_MAT' as 'buffer', otherwise some linux acpi driver (i.e. linux 2.6.30) would parse error which will handle it as 'integer' not 'buffer'; 2. keep correct 'checksum' of madt when vcpu add/remove, otherwise it will report 'checksum error' when using acpi tools to get madt info if we add/remove vcpu; 3. add '_EJ0' so that linux has acpi obj under /sys/devices/LNXSYSTM:00, which is need for vcpu remove; 4. on Method(PRSC, 0), just scan 'xxx' vcpus that qemu get from cmdline para 'maxcpus=xxx', not all 256 vcpus, otherwise under some dsdt processor define, it will result error; What kind of errors? Qemu should never set bit over maxcpus in PRS. 5. 
use 1 hard-coded bios_info structure address to replace '0x514', so that it can transfer more madt info to dsdt; Thanks, Jinsong -- Gleb.
Re: PCI Passthrough Problem
On Thu, Jan 21, 2010 at 09:24:36PM -0800, Aaron Clausen wrote: I'm trying once again to get PCI passthrough working (KVM 84 on Ubuntu 9.10), and I'm getting this error : LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin /usr/bin/kvm -S -M pc-0.11 -m 4096 -smp 4 -name mailserver -uuid 76a83471-e94a-3658-fa61-8eceaa74ffc2 -monitor unix:/var/run/libvirt/qemu/mailserver.monitor,server,nowait -localtime -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/var/lib/libvirt/images/mailserver.img,if=virtio,index=0,boot=on -drive file=/var/lib/libvirt/images/mailserver-2.img,if=virtio,index=1 -net nic,macaddr=54:52:00:1b:b2:56,vlan=0,model=virtio,name=virtio.0 -net tap,fd=17,vlan=0,name=tap.0 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-us -vga cirrus -pcidevice host=0a:01.0 char device redirected to /dev/pts/0 get_real_device: /sys/bus/pci/devices/:0a:01.0/config: Permission denied init_assigned_device: Error: Couldn't get real device (0a:01.0)! Failed to initialize assigned device host=0a:01.0 It seems libvirt had a problem initializing the PCI device; you could manually unbind this device from the host kernel driver and try the above command again. To unbind the device please refer to: http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Any thoughts? -- Aaron Clausen mightymartia...@gmail.com
[PATCH 1/2] KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks
vcpu->arch.mce_banks is allocated in kvm_arch_vcpu_init(), but never freed anywhere, which may cause a memory leak. So this patch fixes it by freeing it in kvm_arch_vcpu_uninit(). Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com --- arch/x86/kvm/x86.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f25b52e..1ddcad4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5089,6 +5089,7 @@ fail: void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) { + kfree(vcpu->arch.mce_banks); kvm_free_lapic(vcpu); down_read(&vcpu->kvm->slots_lock); kvm_mmu_destroy(vcpu); -- 1.6.2.2
[PATCH 2/2] KVM: x86: Fix leak of lapic data in kvm_arch_vcpu_init()
In kvm_arch_vcpu_init(), if the memory allocation for vcpu->arch.mce_banks fails, the error path does not free the lapic data. This patch fixes it. Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com --- arch/x86/kvm/x86.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6651dbf..f25b52e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5072,12 +5072,13 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) GFP_KERNEL); if (!vcpu->arch.mce_banks) { r = -ENOMEM; - goto fail_mmu_destroy; + goto fail_free_lapic; } vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS; return 0; - +fail_free_lapic: + kvm_free_lapic(vcpu); fail_mmu_destroy: kvm_mmu_destroy(vcpu); fail_free_pio_data: -- 1.6.2.2
[PATCH 1/2 v2] KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks
vcpu->arch.mce_banks is allocated in kvm_arch_vcpu_init(), but never freed anywhere, which may cause a memory leak. So this patch fixes it by freeing it in kvm_arch_vcpu_uninit(). Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com --- arch/x86/kvm/x86.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 56a90a6..c27ebb1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5470,6 +5470,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) { int idx; + kfree(vcpu->arch.mce_banks); kvm_free_lapic(vcpu); idx = srcu_read_lock(&vcpu->kvm->srcu); kvm_mmu_destroy(vcpu);
Re: Some keys don't repeat in 64 bit Windows 7 kvm guest
On Thu, Jan 21, 2010 at 05:35:08PM -0600, Jimmy Crossley wrote: I am now running qemu-kvm 0.11.1: $ kvm -h | head -1 QEMU PC emulator version 0.11.1 (qemu-kvm-0.11.1), Copyright (c) 2003-2008 Fabrice Bellard My Windows 7 guest detected a lot of new hardware, but I still have the same key repeating problem. I think I will just leave this alone for now since I am going to be away from my office (and this machine) for several weeks. When I return, I plan on doing a clean install of everything. If I still have this issue, I will report back. qemu-kvm-0.11.1 is still pretty old. The latest version is qemu-kvm-0.12 and you need to update your kernel modules too. A similar-sounding problem was fixed by kernel changes a while ago. Thanks to everyone for your help. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Jimmy Crossley Sent: Saturday, January 16, 2010 21:33 To: 'Jim Paris' Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: RE: Some keys don't repeat in 64 bit Windows 7 kvm guest From: j...@jim.sh [mailto:j...@jim.sh] On Behalf Of Jim Paris Sent: Saturday, January 16, 2010 20:40 To: Jimmy Crossley Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: Re: Some keys don't repeat in 64 bit Windows 7 kvm guest Jimmy Crossley wrote: Thanks for the quick response, Gleb. You are right - we should not spend our time troubleshooting an issue with something this old. I'll try downloading all the sources and headers I need to build kvm-88. I think I'll need another Debian install, since this is a production machine and I don't want to destabilize it. Go ahead and laugh - I ran Debian stable for years before finally deciding I could risk running testing. Debian testing still has the kvm package at version 72, but the new package name qemu-kvm is at version 0.11.0 which is quite a bit newer. -jim It looks like I need to switch to qemu-kvm. 
The kvm package that I have installed (72+dfsg-5+squeeze1) is not in the squeeze repositories any more. It sure is hard to keep up with everything. Thanks, Jim. Jimmy Crossley CoNetrix 5214 68th Street Suite 200 Lubbock TX 79424 jcross...@conetrix.com http://www.conetrix.com tel: 806-687-8600 800-356-6568 fax: 806-687-8511 -- Gleb.
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On Thu, Jan 21, 2010 at 07:47:22AM -0800, H. Peter Anvin wrote: On 01/21/2010 01:02 AM, Avi Kivity wrote: You can also just emulate the state transition -- since you know you're dealing with a flat protected-mode or long-mode OS (and just make that a condition of enabling the feature), you don't have to deal with all the strange combinations of directions that an unrestricted x86 event can take. Since it's an exception, it is unconditional. Do you mean create the stack frame manually? I'd really like to avoid that for many reasons, one of which is performance (we'd need to do all the virt-to-phys walks manually); the other is that we're certain to end up with something horribly underspecified. I'd really like to keep as close as possible to the hardware. For the alternative approach, see Xen. I obviously didn't mean to do something which didn't look like a hardware-delivered exception. That by itself provides a tight spec. The performance issue is real, of course. Obviously, the design of VT-x was before my time at Intel, so I'm not familiar with why the tradeoffs were made the way they were. Is it so out of the question to reserve an exception below 32 for PV use? -- Gleb.
Re: repeatable hang with loop mount and heavy IO in guest
Antoine Martin wrote: I've tried various guests, including most recent Fedora 12 kernels and custom 2.6.32.x builds. All of them hang around the same point (~1GB written) when I do heavy IO write inside the guest. [...] Host is running: 2.6.31.4 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) Please update to the latest version and repeat. kvm-88 is ancient, and _lots_ of stuff has been fixed and changed since that time; I doubt anyone here will try to dig into kvm-88 problems. Current kvm is qemu-kvm-0.12.2, released yesterday. /mjt
Re: [PATCH 3/3] kvmppc/e500: fix tlbcfg emulation
On 21.01.2010, at 04:22, Liu Yu-B13201 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, January 20, 2010 6:47 PM To: Liu Yu-B13201 Cc: kvm-ppc@vger.kernel.org; a...@redhat.com; hol...@penguinppc.org Subject: Re: [PATCH 3/3] kvmppc/e500: fix tlbcfg emulation Importance: High On 20.01.2010, at 09:03, Liu Yu wrote: Signed-off-by: Liu Yu yu@freescale.com --- arch/powerpc/kvm/e500_emulate.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index 95f8ec8..97337dd 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -165,7 +165,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) case SPRN_TLB0CFG: { - ulong tmp = SPRN_TLB0CFG; + ulong tmp = mfspr(SPRN_TLB0CFG); Does this SPR value change? I hope not :-). If not, better read it once on init and then use it from there. Out of curiosity, is reading it once meant to get better performance? If so, I would think a read from a register is faster than a read from memory. Well, performance and clean structure. Nothing should keep us from having different parameters in the guest than we have in the host. Also, as soon as nesting comes into play, reads from memory are definitely faster. But if you think it's not worth the effort, keep it as it is. Alex
Re: [PATCH 2/3] kvmppc/e500: Add PVR/PIR init for E500
On 21.01.2010, at 04:30, Liu Yu-B13201 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, January 20, 2010 6:45 PM To: Liu Yu-B13201 Cc: kvm-ppc@vger.kernel.org; a...@redhat.com; hol...@penguinppc.org Subject: Re: [PATCH 2/3] kvmppc/e500: Add PVR/PIR init for E500 Importance: High On 20.01.2010, at 09:03, Liu Yu wrote: Signed-off-by: Liu Yu yu@freescale.com --- arch/powerpc/kvm/e500.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index 64949ee..fd3683d 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -60,6 +60,10 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu) kvmppc_e500_tlb_setup(vcpu_e500); + /* Registers init */ + vcpu->arch.pvr = mfspr(SPRN_PVR); + vcpu->vcpu_id = mfspr(SPRN_PIR); Is this correct? IIUC this should be the number of the vcpu. So if you virtualize a 2-core system, but both vcpu init functions run on core 1, this will break, right? Since kvm booke doesn't support virtualizing more than one core, can we put a comment here for now? Sure. I'll need to do something clever about it on Book3S as well anyways. Also, do you really need to set vcpu_id? If you just don't touch it it'll be 0. Shouldn't that be enough if you're only running a single guest core? Alex
Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
On 21.01.2010, at 09:09, Liu Yu-B13201 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard Sent: Saturday, January 09, 2010 3:30 AM To: Alexander Graf Cc: k...@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 338baf9..e283e44 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, set_bit(priority, &vcpu->arch.pending_exceptions); } -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu) +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags) { + /* BookE does flags in ESR, so ignore those we get here */ kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM); } Actually, I think Book E prematurely sets ESR, since it's done before the program interrupt is actually delivered. Architecturally, I'm not sure if it's a problem, but philosophically I've always wanted it to work the way you've just implemented for Book S. ESR is updated not only by program but also by data_tlb, data_storage, etc. Should we rearrange them all? Also, DEAR has the same situation as ESR. Should it be updated when we decide to inject the interrupt into the guest? If that's what the hardware does, then yes. I'm good with taking small steps though. So if you don't have the time to convert all of the handlers, you can easily start off with program interrupts. Alex