Re: [PATCH v2 6/10] KVM MMU: don't write-protect if have new mapping to unsync page

2010-04-26 Thread Avi Kivity

On 04/26/2010 06:58 AM, Xiao Guangrong wrote:


Avi Kivity wrote:
   

On 04/25/2010 10:00 AM, Xiao Guangrong wrote:
 

Two cases may occur in the kvm_mmu_get_page() function:

- In the first case, the target sp is already in the cache. If the sp is
unsync, we only need to update it to ensure that this mapping is valid; we
neither mark it sync nor write-protect sp->gfn, since this does not break the
unsync rule (one shadow page per gfn).

- In the other case, the target sp does not exist and we need to create a new
sp for the gfn, i.e. the gfn may have another shadow page. To keep the unsync
rule, we should sync (mark sync and write-protect) the gfn's unsync shadow page.
After enabling multiple unsync shadows, we sync those shadow pages
only when the new sp is not allowed to become unsync (also for the unsync
rule; the new rule is: allow all pte pages to become unsync).

   

Another interesting case is to create new shadow pages in the unsync
state.  That can help when the guest starts a short lived process: we
can avoid write protecting its pagetables completely.  Even if we do
sync them, we can sync them in a batch instead of one by one, saving IPIs.
 

An IPI is needed when rmap_write_protect() changes mappings from writable to
read-only, so while we sync all of the gfn's unsync pages, only one IPI is
needed.
   


I meant, we can write protect all pages, then use one IPI to drop the 
tlbs for all of them.



And another problem is that we call rmap_write_protect()/flush-local-tlb many
times when syncing a gfn's unsync pages; the same problem exists in the
mmu_sync_children() function. Could you allow me to improve it after this
patchset? :-)
   


Of course, this is more than enough to chew on.  Just suggesting an idea...
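
The batching idea discussed above can be illustrated with a toy model. This is a sketch only and not KVM code: `FakeMmu`, `write_protect()` and `flush_remote_tlbs()` are hypothetical names, and the model merely counts how many remote TLB flushes (IPIs) each strategy would issue.

```python
# Toy model, not KVM code: every name here is hypothetical.
class FakeMmu:
    def __init__(self):
        self.ipis = 0            # number of remote TLB flushes issued
        self.read_only = set()   # gfns whose mappings are write-protected

    def write_protect(self, gfn):
        """Make the mapping read-only; return True if it was writable."""
        if gfn in self.read_only:
            return False
        self.read_only.add(gfn)
        return True

    def flush_remote_tlbs(self):
        self.ipis += 1


def sync_one_by_one(mmu, gfns):
    # Flush after every page whose mapping actually changed.
    for gfn in gfns:
        if mmu.write_protect(gfn):
            mmu.flush_remote_tlbs()


def sync_batched(mmu, gfns):
    # Write-protect everything first, then flush once if anything changed.
    need_flush = False
    for gfn in gfns:
        need_flush |= mmu.write_protect(gfn)
    if need_flush:
        mmu.flush_remote_tlbs()
```

In this model both strategies leave the same set of pages write-protected; the batched variant simply defers the flush until all mappings have been changed.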


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-26 Thread Zhang, Yanmin
On Fri, 2010-04-23 at 13:50 +0300, Avi Kivity wrote:
 On 04/22/2010 01:27 PM, Liu Yu-B13201 wrote:
 
  I met this error when built kernel. Anything wrong?
 
 CC  init/main.o
  In file included from include/linux/ftrace_event.h:8,
from include/trace/syscall.h:6,
from include/linux/syscalls.h:75,
from init/main.c:16:
  include/linux/perf_event.h: In function 
  'perf_register_guest_info_callbacks':
  include/linux/perf_event.h:1019: error: parameter name omitted
  include/linux/perf_event.h: In function 
  'perf_unregister_guest_info_callbacks':
  include/linux/perf_event.h:1021: error: parameter name omitted
  make[1]: *** [init/main.o] Error 1
  make: *** [init] Error 2
 
 
 I merged tip/perf/code which may fix this.  Find it in kvm.git next branch.
I downloaded the latest kvm.git tree and compilation is OK, both with FTRACE
enabled and with FTRACE disabled.

Yanmin




Re: VIA Nano support

2010-04-26 Thread Avi Kivity

On 04/26/2010 04:14 AM, Rusty Burchfield wrote:

When trying to boot installer iso's on this platform I get the
following error about 40 times per second in my kern.log.
kernel: [ 4857.828875] handle_exception: unexpected, vectoring info
0x880e intr info 0x8b0d

These options have no effect: -no-kvm-irqchip or -no-kvm-pit
QEMU is working fine.

Interestingly, specifying the -uuid option causes it to freeze
immediately.  Without that option it gets to the iso's boot screen and
freezes after attempting to continue past there.

Is this an expected condition for this platform / is this a platform
anyone is interested in supporting? ;-)
   


It's a known bug in the Nano's vmx implementation.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH v2 1/10] KVM MMU: fix for calculating gpa in invlpg code

2010-04-26 Thread Avi Kivity

On 04/26/2010 06:10 AM, Xiao Guangrong wrote:


Avi Kivity wrote:
   

On 04/25/2010 10:00 AM, Xiao Guangrong wrote:
 

If the guest is 32-bit, we should use 'quadrant' to adjust gpa
offset

Changelog v2:
- when level is PT_DIRECTORY_LEVEL, the 'offset' should be
'role.quadrant << 8', thanks to Avi for pointing it out

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
   arch/x86/kvm/paging_tmpl.h |   13 +++--
   1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d0cc07e..83cc72f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -478,9 +478,18 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		    ((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
 		    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
 			struct kvm_mmu_page *sp = page_header(__pa(sptep));
-
+			int offset = 0;
+
+			if (PTTYPE == 32) {
+				if (level == PT_DIRECTORY_LEVEL)
+					offset = PAGE_SHIFT - 4;
+				else
+					offset = PT64_LEVEL_BITS;
+				offset = sp->role.quadrant << offset;
+			}

   

The calculation is really

   shift = (PT32_LEVEL_BITS - PT64_LEVEL_BITS) * level;

 

So, the offset is q << (PAGE_SHIFT - (PT32_LEVEL_BITS - PT64_LEVEL_BITS) * level - 2)?
   


Ugh, what I meant was

  shift = PAGE_SHIFT - (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
  offset  = q << shift

so,

  pte_gpa = (sp->gfn << PAGE_SHIFT) + offset + (spte - sp->spt) * sizeof(pt_element_t)


No magic numbers please.  Note it could work without the 'if (PTTYPE == 32)'.
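
The formula above can be checked with a few lines of arithmetic. The following is only a sketch: the function names are made up for illustration, and the constants are the usual x86 values (4 KiB pages, 9 index bits per 64-bit/PAE table level, 10 per 32-bit table level). As far as I can tell, the reason the 'if' is unnecessary is that for 64-bit guests PT_LEVEL_BITS equals PT64_LEVEL_BITS and the quadrant is zero, so the offset comes out as zero anyway.

```python
PAGE_SHIFT = 12          # 4 KiB pages on x86
PT64_LEVEL_BITS = 9      # 512 entries per 64-bit/PAE (shadow) page table
PT32_LEVEL_BITS = 10     # 1024 entries per 32-bit guest page table


def quadrant_offset(quadrant, level, pt_level_bits):
    """Byte offset into the guest page table covered by this shadow page.

    Hypothetical helper; pt_level_bits plays the role of PT_LEVEL_BITS.
    """
    shift = PAGE_SHIFT - (pt_level_bits - PT64_LEVEL_BITS) * level
    return quadrant << shift


def pte_gpa(gfn, quadrant, level, index, pt_level_bits, pte_size):
    """Guest physical address of the gpte behind spte number `index`."""
    offset = quadrant_offset(quadrant, level, pt_level_bits)
    return (gfn << PAGE_SHIFT) + offset + index * pte_size


# 32-bit guest, level 1: the second quadrant starts half a page in.
print(hex(quadrant_offset(1, 1, PT32_LEVEL_BITS)))
# 32-bit guest, level 2: each of the four quadrants covers a quarter page.
print(hex(quadrant_offset(3, 2, PT32_LEVEL_BITS)))
```

For a 32-bit guest this gives a shift of 11 at level 1 and 10 at level 2, without any hard-coded numbers in the calculation itself.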



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: KVM_SET_MP_STATE is undocumented

2010-04-26 Thread Pekka Enberg

Hi Avi,

Avi Kivity wrote:
I noticed that QEMU uses KVM_SET_MP_STATE but the ioctl() is 
completely undocumented.  I assume it has something to do with 
multiprocessor but I am unable to work out the details unless I take a 
peek at arch/x86/kvm.


Patch sent.


Two more interesting but undocumented ioctls:

 - KVM_SET_IDENTITY_MAP_ADDR
 - KVM_SET_BOOT_CPU_ID

Little background: we're debugging a KVM_EXIT_UNKNOWN problem for the 
largest bug-free kernel on Core i5 machine.  I've been looking at 
plain QEMU sources but it seems qemu-kvm that the person is using does 
much more during initialization. Do we have a known good list of 
mandatory steps required to properly initialize KVM on all CPUs?


Pekka


[PATCH 0/5] mmu root page cleanups

2010-04-26 Thread Avi Kivity
I tried to separate direct and indirect page allocation to reduce memory
usage with tdp, but that turned out too messy (and not much of a win - we
allocate 8K per page outside the struct kvm_mmu_page).  However the cleanups
on the way are worthwhile.

Avi Kivity (5):
  KVM: MMU: Rearrange struct kvm_mmu_page
  KVM: MMU: use 16 bits for root_count
  KVM: MMU: Unify 32-pae and single-root mmu setup
  KVM: MMU: Use correct root gfn for direct maps
  KVM: MMU: Fix check for cr3 outside guest memory

 arch/x86/include/asm/kvm_host.h |   15 ++-
 arch/x86/kvm/mmu.c  |   53 +++
 2 files changed, 34 insertions(+), 34 deletions(-)



[PATCH 2/5] KVM: MMU: use 16 bits for root_count

2010-04-26 Thread Avi Kivity
This is incremented by a maximum of 4 for every vcpu, so we are far from
overflow.  16 bits improve struct packing.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cdaaedc..f38007d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -192,7 +192,7 @@ struct kvm_mmu_page {
 	 */
 	gfn_t gfn;
 	union kvm_mmu_page_role role;
-	int root_count;          /* Currently serving as active root */
+	short root_count;        /* Currently serving as active root */
 	bool multimapped;         /* More than one parent_pte? */
 	bool unsync;
 	union {
-- 
1.7.0.4
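
The overflow claim in this patch can be checked with a little arithmetic. The sketch below assumes a signed 16-bit `short` and takes the "maximum of 4 for every vcpu" figure from the commit message; the function names and the vcpu counts used in the example are hypothetical and not taken from the kernel.

```python
SHORT_MAX = 2**15 - 1        # largest value of a signed 16-bit short
ROOTS_PER_VCPU = 4           # "incremented by a maximum of 4 for every vcpu"


def max_root_count(nr_vcpus):
    """Upper bound on root_count if every vcpu roots the same shadow page."""
    return ROOTS_PER_VCPU * nr_vcpus


def vcpus_before_overflow():
    """Largest vcpu count whose worst case still fits in a short."""
    return SHORT_MAX // ROOTS_PER_VCPU


# Example with a made-up guest size of 64 vcpus.
print(max_root_count(64), "<=", SHORT_MAX)
print("worst case fits for up to", vcpus_before_overflow(), "vcpus")
```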



[PATCH 1/5] KVM: MMU: Rearrange struct kvm_mmu_page

2010-04-26 Thread Avi Kivity
Put all members required for direct pages together, to reduce cache footprint
for those pages.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..cdaaedc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -192,6 +192,13 @@ struct kvm_mmu_page {
 	 */
 	gfn_t gfn;
 	union kvm_mmu_page_role role;
+	int root_count;          /* Currently serving as active root */
+	bool multimapped;         /* More than one parent_pte? */
+	bool unsync;
+	union {
+		u64 *parent_pte;               /* !multimapped */
+		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
+	};
 
 	u64 *spt;
 	/* hold the gfn of each spte inside spt */
@@ -201,14 +208,8 @@ struct kvm_mmu_page {
 	 * in this shadow page.
 	 */
 	DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
-	bool multimapped;         /* More than one parent_pte? */
-	bool unsync;
-	int root_count;          /* Currently serving as active root */
+
 	unsigned int unsync_children;
-	union {
-		u64 *parent_pte;               /* !multimapped */
-		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
-	};
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
 };
 
-- 
1.7.0.4



[PATCH 3/5] KVM: MMU: Unify 32-pae and single-root mmu setup

2010-04-26 Thread Avi Kivity
Reduce code duplication.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |   44 +---
 1 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..4da4ff1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2052,41 +2052,36 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu_page *sp;
 	int direct = 0;
 	u64 pdptr;
+	hpa_t *rootp, root, root_flags;
+	int nr_roots;
 
-	root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+	direct = tdp_enabled || !is_paging(vcpu);
 
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
-		hpa_t root = vcpu->arch.mmu.root_hpa;
-
-		ASSERT(!VALID_PAGE(root));
-		if (tdp_enabled)
-			direct = 1;
-		if (mmu_check_root(vcpu, root_gfn))
-			return 1;
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
-				      PT64_ROOT_LEVEL, direct,
-				      ACC_ALL, NULL);
-		root = __pa(sp->spt);
-		++sp->root_count;
-		vcpu->arch.mmu.root_hpa = root;
-		return 0;
+		rootp = &vcpu->arch.mmu.root_hpa;
+		nr_roots = 1;
+		root_flags = 0;
+	} else {
+		rootp = vcpu->arch.mmu.pae_root;
+		nr_roots = 4;
+		root_flags = PT_PRESENT_MASK;
 	}
-	direct = !is_paging(vcpu);
-	if (tdp_enabled)
-		direct = 1;
-	for (i = 0; i < 4; ++i) {
-		hpa_t root = vcpu->arch.mmu.pae_root[i];
 
+	for (i = 0; i < nr_roots; ++i) {
+		root = rootp[i];
 		ASSERT(!VALID_PAGE(root));
 		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 			pdptr = kvm_pdptr_read(vcpu, i);
 			if (!is_present_gpte(pdptr)) {
-				vcpu->arch.mmu.pae_root[i] = 0;
+				rootp[i] = 0;
 				continue;
 			}
 			root_gfn = pdptr >> PAGE_SHIFT;
 		} else if (vcpu->arch.mmu.root_level == 0)
 			root_gfn = 0;
+		else
+			root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+
 		if (mmu_check_root(vcpu, root_gfn))
 			return 1;
 		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
@@ -2094,9 +2089,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 				      ACC_ALL, NULL);
 		root = __pa(sp->spt);
 		++sp->root_count;
-		vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
+		rootp[i] = root | root_flags;
 	}
-	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
+
+	if (nr_roots == 4)
+		vcpu->arch.mmu.root_hpa = __pa(rootp);
+
 	return 0;
 }
 
-- 
1.7.0.4



[PATCH 4/5] KVM: MMU: Use correct root gfn for direct maps

2010-04-26 Thread Avi Kivity
We currently use cr3, which is a random number for direct maps.  Use zero
instead (the start of the linear range mapped by the root).

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4da4ff1..6e925b3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2070,16 +2070,17 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	for (i = 0; i < nr_roots; ++i) {
 		root = rootp[i];
 		ASSERT(!VALID_PAGE(root));
-		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
+
+		if (direct)
+			root_gfn = 0;
+		else if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 			pdptr = kvm_pdptr_read(vcpu, i);
 			if (!is_present_gpte(pdptr)) {
 				rootp[i] = 0;
 				continue;
 			}
 			root_gfn = pdptr >> PAGE_SHIFT;
-		} else if (vcpu->arch.mmu.root_level == 0)
-			root_gfn = 0;
-		else
+		} else
 			root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
 
 		if (mmu_check_root(vcpu, root_gfn))
-- 
1.7.0.4



[PATCH 5/5] KVM: MMU: Fix check for cr3 outside guest memory

2010-04-26 Thread Avi Kivity
For direct maps, root_gfn or cr3 are meaningless.  This hasn't bitten us
because no sane guest sets cr3 outside its own memory even in real mode.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e925b3..e3acc9e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2083,7 +2083,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 		} else
 			root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
 
-		if (mmu_check_root(vcpu, root_gfn))
+		if (!direct && mmu_check_root(vcpu, root_gfn))
 			return 1;
 		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, direct,
-- 
1.7.0.4
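
The root gfn selection that results from this patch and the previous one can be summarised in a small model. This is a sketch only, under the assumption that the hunks are read correctly: `pick_root_gfn()` and `root_needs_check()` are hypothetical names, the arguments stand in for the vcpu state the kernel code reads, and the handling of non-present PDPTEs (which the real loop skips) is left out.

```python
PAGE_SHIFT = 12   # 4 KiB pages on x86


def pick_root_gfn(direct, pae_guest, cr3, pdptr=None):
    """Model of how a root gfn is chosen for one root (hypothetical helper)."""
    if direct:
        return 0                       # start of the linear range
    if pae_guest:
        return pdptr >> PAGE_SHIFT     # one root per present PDPTE
    return cr3 >> PAGE_SHIFT           # ordinary shadowed root


def root_needs_check(direct):
    """Only shadowed (non-direct) roots are validated against guest memory."""
    return not direct


# A direct map ignores whatever happens to be in cr3.
print(pick_root_gfn(direct=True, pae_guest=False, cr3=0xdeadb000))
```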



[PATCH] KVM: Minor MMU documentation edits

2010-04-26 Thread Avi Kivity
Reported by Andrew Jones.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 Documentation/kvm/mmu.txt |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index da04671..0cc28fb 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -75,8 +75,8 @@ direct mode; otherwise it operates in shadow mode (see below).
 Memory
 ======
 
-Guest memory (gpa) is part of user address space of the process that is using
-kvm.  Userspace defines the translation between guest addresses and user
+Guest memory (gpa) is part of the user address space of the process that is
+using kvm.  Userspace defines the translation between guest addresses and user
 addresses (gpa->hva); note that two gpas may alias to the same gva, but not
 vice versa.
 
@@ -111,7 +111,7 @@ is not related to a translation directly.  It points to other shadow pages.
 
 A leaf spte corresponds to either one or two translations encoded into
 one paging structure entry.  These are always the lowest level of the
-translation stack, with an optional higher level translations left to NPT/EPT.
+translation stack, with optional higher level translations left to NPT/EPT.
 Leaf ptes point at guest pages.
 
 The following table shows translations encoded by leaf ptes, with higher-level
@@ -167,7 +167,7 @@ Shadow pages contain the following information:
 Either the guest page table containing the translations shadowed by this
 page, or the base page frame for linear translations.  See role.direct.
   spt:
-A pageful of 64-bit sptes containig the translations for this page.
+A pageful of 64-bit sptes containing the translations for this page.
 Accessed by both kvm and hardware.
 The page pointed to by spt will have its page-private pointing back
 at the shadow page structure.
@@ -235,7 +235,7 @@ the amount of emulation we have to do when the guest modifies multiple gptes,
 or when the a guest page is no longer used as a page table and is used for
 random guest data.
 
-As a side effect we have resynchronize all reachable unsynchronized shadow
+As a side effect we have to resynchronize all reachable unsynchronized shadow
 pages on a tlb flush.
 
 
-- 
1.7.0.4



Re: [PATCH 0/5] mmu root page cleanups

2010-04-26 Thread Avi Kivity

On 04/26/2010 11:48 AM, Avi Kivity wrote:

I tried to separate direct and indirect page allocation to reduce memory
usage with tdp, but that turned out too messy (and not much of a win - we
allocate 8K per page outside the struct kvm_mmu_page).  However the cleanups
on the way are worthwhile.
   


Breaks during testing, please don't apply.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes

2010-04-26 Thread Avi Kivity

On 04/23/2010 02:11 PM, Avi Kivity wrote:



All feedback would be welcome, since I'm new to this system!  A
strawman patch follows.



Patch is correct, but I have already fixed this in a more extensive 
patch set (that also folds the 32-bit and 64-bit cases together, etc.) 
and I'm too lazy to rebase on top of yours.


Well my patchset is broken.  Marcelo, please apply this and I'll rebase 
and fix.


--
error compiling committee.c: too many arguments to function



[PATCH 0/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-04-26 Thread Takuya Yoshikawa
Hi Avi,

I would like you to look at this patch before we discuss our patch set.

I believe this patch is worthwhile on its own, and it shows how much improvement
we can expect from our dirty bitmap work.

Note: this will not conflict with our future work!

Thanks,
  Takuya

** Simple test **
1. What we did
I measured the time needed for the get dirty log ioctl while playing with the
Ubuntu installer, in which VGA is logging, and compared the result to that
of the original version.

2. Test environment
Intel(R) Core(TM)2 Duo CPU P8700  @ 2.53GHz
(No EPT support)
With latest qemu-kvm.git

This test is for clarifying the micro performance, and is not intended to
verify what we can expect on enterprise servers, so I used my laptop.

3. Results
I picked three typical parts for comparison.

- TYPE1
original

slot=6, slot.len= 32768, usec=   65
slot=7, slot.len= 32768, usec=   24
slot=6, slot.len= 32768, usec=   26
slot=7, slot.len= 32768, usec=   23
slot=6, slot.len= 32768, usec=   24
slot=7, slot.len= 32768, usec=   24
slot=6, slot.len= 32768, usec=   25
slot=7, slot.len= 32768, usec=   25

with my patch

slot=6, slot.len= 32768, usec=3
slot=7, slot.len= 32768, usec=3
slot=6, slot.len= 32768, usec=3
slot=7, slot.len= 32768, usec=2
slot=6, slot.len= 32768, usec=3
slot=7, slot.len= 32768, usec=2
slot=6, slot.len= 32768, usec=3
slot=7, slot.len= 32768, usec=3


- TYPE2
original

slot=6, slot.len= 32768, usec=  158
slot=7, slot.len= 32768, usec=   26
slot=6, slot.len= 32768, usec=  157
slot=7, slot.len= 32768, usec=   26
slot=6, slot.len= 32768, usec=  157
slot=7, slot.len= 32768, usec=   26
slot=6, slot.len= 32768, usec=  158
slot=7, slot.len= 32768, usec=   27

with my patch

slot=6, slot.len= 32768, usec=  117
slot=7, slot.len= 32768, usec=2
slot=6, slot.len= 32768, usec=  124
slot=7, slot.len= 32768, usec=1
slot=6, slot.len= 32768, usec=  121
slot=7, slot.len= 32768, usec=1
slot=6, slot.len= 32768, usec=   72
slot=7, slot.len= 32768, usec=2


- TYPE3
original

slot=5, slot.len=  16777216, usec=9
slot=6, slot.len= 32768, usec=6
slot=7, slot.len= 32768, usec=7
slot=5, slot.len=  16777216, usec=   11
slot=6, slot.len= 32768, usec=7
slot=7, slot.len= 32768, usec=7
slot=5, slot.len=  16777216, usec=   14
slot=6, slot.len= 32768, usec=8
slot=7, slot.len= 32768, usec=8
slot=5, slot.len=  16777216, usec=   11
slot=6, slot.len= 32768, usec=9
slot=7, slot.len= 32768, usec=9

with my patch

slot=5, slot.len=  16777216, usec=2
slot=6, slot.len= 32768, usec=2
slot=7, slot.len= 32768, usec=1
slot=5, slot.len=  16777216, usec=2
slot=6, slot.len= 32768, usec=2
slot=7, slot.len= 32768, usec=2
slot=5, slot.len=  16777216, usec=2
slot=6, slot.len= 32768, usec=1
slot=7, slot.len= 32768, usec=1
slot=5, slot.len=  16777216, usec=3
slot=6, slot.len= 32768, usec=2
slot=7, slot.len= 32768, usec=2



[PATCH 1/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-04-26 Thread Takuya Yoshikawa
Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
when the memslot is clean it is only used as a zero source for
copy_to_user() and is freed right after that. This patch uses clear_user()
instead of doing this unnecessary zero-source allocation.

Performance improvement: as one would expect, the time needed to
allocate a bitmap is eliminated for clean slots. In my test, the improved
ioctl was about 4 to 10 times faster than the original one for clean slots.
Furthermore, the reduced allocations seem to have a good effect in
other cases too. In fact, I observed that the time for the ioctl was
more stable than with the original one, and the average time for dirty
slots was also reduced to some extent.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c |   36 ++--
 1 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..0086d64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
 	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,29 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
-
 	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 		is_dirty = memslot->dirty_bitmap[i];
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
+		unsigned long *dirty_bitmap;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots)
-			goto out_free;
+		r = -ENOMEM;
+		dirty_bitmap = vmalloc(n);
+		if (!dirty_bitmap)
+			goto out;
+		memset(dirty_bitmap, 0, n);
 
+		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+		if (!slots) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2789,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		synchronize_srcu_expedited(&kvm->srcu);
 		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
+
+		r = -EFAULT;
+		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
+		vfree(dirty_bitmap);
+	} else {
+		r = -EFAULT;
+		if (clear_user(log->dirty_bitmap, n))
+			goto out;
 	}
 
 	r = 0;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-		r = -EFAULT;
-out_free:
-	vfree(dirty_bitmap);
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
-- 
1.6.3.3
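
The control flow of the patch can be modelled in user space as follows. This is a rough sketch and not the kernel code: `get_dirty_log()` here is a hypothetical Python function, plain lists stand in for the memslot bitmap and for user memory, and locking, SRCU and error handling are ignored. It only shows that a temporary zeroed bitmap is needed in the dirty case, while a clean slot can zero the destination directly, which is what clear_user() does in the patch.

```python
def get_dirty_log(slot_bitmap, user_buf):
    """Copy a slot's dirty bitmap into user_buf and reset the slot.

    slot_bitmap: list of ints (one per long), modified in place.
    user_buf:    list of the same length standing in for user memory.
    Returns True if a temporary bitmap had to be allocated.
    """
    if any(slot_bitmap):
        # Dirty slot: hand the old contents to the caller and install
        # a fresh zeroed bitmap (the only case that needs an allocation).
        fresh = [0] * len(slot_bitmap)
        user_buf[:] = slot_bitmap
        slot_bitmap[:] = fresh
        return True
    # Clean slot: zero the destination directly, no allocation needed.
    for i in range(len(user_buf)):
        user_buf[i] = 0
    return False


slot = [0, 4, 0]
buf = [9, 9, 9]
print(get_dirty_log(slot, buf), buf, slot)
```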



[PATCH 0/9] Make use of the redirection of guest serial

2010-04-26 Thread Jason Wang
The guest console is useful for troubleshooting failures, especially
those that produce a call trace. A session through it is also useful
for networking-related test cases (consider a test that could shut
down or crash the guest network). So this patchset first redirects the
guest serial to a unix domain socket; the guest serial can then be
logged by a thread or used as a session method. The last patch makes
all Linux guests use serial as their console through the unattended files.

---

Jason Wang (9):
  KVM test: Introduce the prompt assist
  KVM test: Add the ability to send the username in remote_login()
  KVM test: Make the login re suitable for serial console
  KVM test: Redirect the serial to the unix domain socket
  KVM test: Log the content from guest serial console
  KVM test: Raise error when met unknown type in kvm_vm.remote_login().
  KVM test: Introduce the local_login()
  KVM test: Create the background threads before calling process()
  KVM test: Redirect the console to serial for all linux guests


 client/tests/kvm/kvm_preprocessing.py|   64 --
 client/tests/kvm/kvm_utils.py|   23 +++--
 client/tests/kvm/kvm_vm.py   |   48 
 client/tests/kvm/tests_base.cfg.sample   |1 
 client/tests/kvm/unattended/Fedora-10.ks |2 -
 client/tests/kvm/unattended/Fedora-11.ks |2 -
 client/tests/kvm/unattended/Fedora-12.ks |2 -
 client/tests/kvm/unattended/Fedora-8.ks  |2 -
 client/tests/kvm/unattended/Fedora-9.ks  |2 -
 client/tests/kvm/unattended/OpenSUSE-11.xml  |1 
 client/tests/kvm/unattended/RHEL-3-series.ks |2 -
 client/tests/kvm/unattended/RHEL-4-series.ks |2 -
 client/tests/kvm/unattended/RHEL-5-series.ks |2 -
 client/tests/kvm/unattended/SLES-11.xml  |1 
 14 files changed, 128 insertions(+), 26 deletions(-)

-- 
Signature


[PATCH 1/9] KVM test: Introduce the prompt assist

2010-04-26 Thread Jason Wang
Sometimes we need to send an assist string to a session in order to
get the prompt, especially when re-connecting to an already logged-in
serial session. This patch sends the assist string before doing the
pattern matching of remote_login.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_utils.py |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 25f3c8c..9adbaee 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -451,7 +451,8 @@ def check_kvm_source_dir(source_dir):
 # The following are functions used for SSH, SCP and Telnet communication with
 # guests.
 
-def remote_login(command, password, prompt, linesep="\n", timeout=10):
+def remote_login(command, password, prompt, linesep="\n", timeout=10,
+                 prompt_assist = None):
     """
     Log into a remote host (guest) using SSH or Telnet. Run the given command
     using kvm_spawn and provide answers to the questions asked. If timeout
@@ -468,7 +469,8 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10):
     @param timeout: The maximal time duration (in seconds) to wait for each
             step of the login procedure (i.e. the "Are you sure" prompt, the
             password prompt, the shell prompt, etc)
-
+    @prarm prompt_assist: An assistant string sent before the pattern
+            matching in order to get the prompt for some kinds of shell_client.
     @return Return the kvm_spawn object on success and None on failure.
     """
     sub = kvm_subprocess.kvm_shell_session(command,
@@ -479,6 +481,9 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10):
 
     logging.debug("Trying to login with command '%s'" % command)
 
+    if prompt_assist is not None:
+        sub.sendline(prompt_assist)
+
     while True:
         (match, text) = sub.read_until_last_line_matches(
                 [r"[Aa]re you sure", r"[Pp]assword:\s*$", r"^\s*[Ll]ogin:\s*$",



[PATCH 2/9] KVM test: Add the ability to send the username in remote_login()

2010-04-26 Thread Jason Wang
In order to let the serial console work, we must let the
remote_login() send the username when needed.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_utils.py |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 9adbaee..1ea0852 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -452,7 +452,7 @@ def check_kvm_source_dir(source_dir):
 # guests.
 
 def remote_login(command, password, prompt, linesep="\n", timeout=10,
-                 prompt_assist = None):
+                 prompt_assist = None, username = None):
     """
     Log into a remote host (guest) using SSH or Telnet. Run the given command
     using kvm_spawn and provide answers to the questions asked. If timeout
@@ -471,6 +471,8 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10,
             password prompt, the shell prompt, etc)
     @prarm prompt_assist: An assistant string sent before the pattern
             matching in order to get the prompt for some kinds of shell_client.
+    @param username: The user name used to log into the session
+
     @return Return the kvm_spawn object on success and None on failure.
     """
     sub = kvm_subprocess.kvm_shell_session(command,
@@ -505,9 +507,13 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10,
                 sub.close()
                 return None
         elif match == 2:  # "login:"
-            logging.debug("Got unexpected login prompt")
-            sub.close()
-            return None
+            if username is None:
+                logging.debug("Got unexpected login prompt")
+                sub.close()
+                return None
+            else:
+                sub.sendline(username)
+                continue
         elif match == 3:  # "Connection closed"
             logging.debug("Got 'Connection closed'")
             sub.close()
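
The combined effect of prompt_assist and username on the login loop can be tried out against a scripted session. The following is a simplified sketch and not the autotest code: `FakeSession`, `login()` and the pattern list are hypothetical stand-ins, and the real function matches more patterns and works with timeouts.

```python
import re

# Simplified pattern list: password prompt, login prompt, shell prompt.
PATTERNS = [r"[Pp]assword:\s*$", r"[Ll]ogin:", r"[#$]\s*$"]


class FakeSession:
    """Hypothetical stand-in for a shell session: replays scripted output."""

    def __init__(self, script):
        self.script = list(script)   # lines the "guest" will print, in order
        self.sent = []               # everything the login code sent

    def sendline(self, text):
        self.sent.append(text)

    def read_last_line(self):
        return self.script.pop(0) if self.script else None


def login(session, password, username=None, prompt_assist=None):
    """Return True once a shell prompt is seen, False on failure."""
    if prompt_assist is not None:
        session.sendline(prompt_assist)   # nudge an idle console
    while True:
        line = session.read_last_line()
        if line is None:
            return False                  # stands in for a timeout
        if re.search(PATTERNS[0], line):
            session.sendline(password)
        elif re.search(PATTERNS[1], line):
            if username is None:
                return False              # unexpected login prompt
            session.sendline(username)
        elif re.search(PATTERNS[2], line):
            return True


console = FakeSession(["guest login: ", "Password: ", "[root@guest ~]# "])
print(login(console, "secret", username="root"), console.sent)
```

Without a username the same script fails at the first line, which corresponds to the "Got unexpected login prompt" branch that the patch keeps.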



[PATCH 3/9] KVM test: Make the login re suitable for serial console

2010-04-26 Thread Jason Wang
The current matching re "^\s*[Ll]ogin:\s*$" is not suitable for the serial
console, so change it to "[Ll]ogin:".

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_utils.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 1ea0852..bb42314 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -488,7 +488,7 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10,
 
 while True:
 (match, text) = sub.read_until_last_line_matches(
-[r[Aa]re you sure, r[Pp]assword:\s*$, r^\s*[Ll]ogin:\s*$,
+[r[Aa]re you sure, r[Pp]assword:\s*$, r[Ll]ogin:,
  r[Cc]onnection.*closed, r[Cc]onnection.*refused,
  r[Pp]lease wait, prompt],
  timeout=timeout, internal_timeout=0.5)
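
As a rough illustration of why the anchored expression fails on a serial console, the sketch below runs both patterns from the patch against two sample last lines. The sample strings (a bare "login: " prompt and a getty-style "localhost login: " prompt) are assumptions chosen for illustration, not output captured from a real guest, and the helper name matches() is hypothetical.

```python
import re

# Patterns taken from the patch: the old anchored one and the new relaxed one.
OLD_LOGIN_RE = r"^\s*[Ll]ogin:\s*$"
NEW_LOGIN_RE = r"[Ll]ogin:"

# Hypothetical last lines; a serial getty usually prefixes the host name.
samples = ["login: ", "localhost login: "]


def matches(pattern, line):
    """Return True if the pattern is found anywhere in the line."""
    return re.search(pattern, line) is not None


for line in samples:
    print(repr(line), matches(OLD_LOGIN_RE, line), matches(NEW_LOGIN_RE, line))
```

The relaxed pattern also matches any last line that merely contains "login:", such as a "Last login:" banner, so it trades some precision for coverage.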



[PATCH 4/9] KVM test: Redirect the serial to the unix domain socket

2010-04-26 Thread Jason Wang
This patch redirects the guest serial console to a Unix domain socket,
which the following patches use to dump its content or to open a
session.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_vm.py |   19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 6bc7987..8f4753f 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -116,17 +116,20 @@ class VM:
 self.address_cache = address_cache
 self.pci_assignable = None
 
-# Find available monitor filename
+# Find available filenames for monitor and guest serial redirection
 while True:
-# The monitor filename should be unique
+# The filenames should be unique
 self.instance = (time.strftime(%Y%m%d-%H%M%S-) +
  kvm_utils.generate_random_string(4))
-self.monitor_file_name = os.path.join(/tmp,
-  monitor- + self.instance)
-if not os.path.exists(self.monitor_file_name):
-break
-
 
+names = [os.path.join(/tmp, type + self.instance) for type in
+ monitor-, serial-]
+if True in [os.path.exists(file) for file in names]:
+continue
+else:
+[self.monitor_file_name, self.serial_file_name] = names
+break
+ 
 def clone(self, name=None, params=None, root_dir=None, address_cache=None):
 
 Return a clone of the VM object with optionally modified parameters.
@@ -316,6 +319,8 @@ class VM:
 for pci_id in self.pa_pci_ids:
 qemu_cmd +=  -pcidevice host=%s % pci_id
 
+qemu_cmd +=  -serial unix:%s,server,nowait % self.serial_file_name
+
 return qemu_cmd
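
The naming scheme in this patch can be sketched in isolation as follows. This is a minimal Python 3 sketch, not the autotest code: pick_socket_names() is a hypothetical helper, and random.choice() stands in for kvm_utils.generate_random_string(). Only the "monitor-"/"serial-" prefixes and the -serial argument are taken from the patch.

```python
import os
import random
import string
import tempfile
import time


def pick_socket_names(directory, prefixes=("monitor-", "serial-")):
    """Return (instance, paths) such that none of the paths exists yet."""
    while True:
        # Timestamp plus a short random suffix, as in the patch.
        suffix = "".join(random.choice(string.ascii_letters) for _ in range(4))
        instance = time.strftime("%Y%m%d-%H%M%S-") + suffix
        paths = [os.path.join(directory, p + instance) for p in prefixes]
        # Retry with a new instance string if any candidate is already taken.
        if not any(os.path.exists(p) for p in paths):
            return instance, paths


instance, (monitor_path, serial_path) = pick_socket_names(tempfile.gettempdir())
# The serial socket path is then appended to the qemu command line.
serial_arg = " -serial unix:%s,server,nowait" % serial_path
print(serial_arg)
```

Checking for existence and creating the socket later leaves a small window in which another process could claim the same name; the random suffix makes that unlikely but does not rule it out.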
 
 



[PATCH 5/9] KVM test: Log the content from guest serial console

2010-04-26 Thread Jason Wang
This patch reads the content of the guest serial console and logs it
into the debug directory of the test case. A dedicated thread does the
work; it is created during preprocessing and ended during
postprocessing. The serial_mode parameter must be set to "dump" to
enable this feature.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_preprocessing.py  |   59 +++-
 client/tests/kvm/tests_base.cfg.sample |1 +
 2 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index 4b9290c..50d0e35 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -1,4 +1,5 @@
 import sys, os, time, commands, re, logging, signal, glob, threading, shutil
+import socket, select
 from autotest_lib.client.bin import test, utils
 from autotest_lib.client.common_lib import error
 import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils
@@ -13,7 +14,8 @@ except ImportError:
 
 _screendump_thread = None
 _screendump_thread_termination_event = None
-
+_serialdump_thread = None
+_serialdump_thread_termination_event = None
 
 def preprocess_image(test, params):
 
@@ -267,6 +269,16 @@ def preprocess(test, params, env):
   args=(test, params, env))
 _screendump_thread.start()
 
+# Start the serial dump thread
+if params.get(serial_mode) == dump:
+logging.debug(Starting serialdump thread)
+global _serialdump_thread, _serialdump_thread_termination_event
+_serialdump_thread_termination_event = threading.Event()
+_serialdump_thread = threading.Thread(target=_dump_serial_console,
+  args=(test, params, env))
+_serialdump_thread.start()
+
+
 
 def postprocess(test, params, env):
 
@@ -286,6 +298,13 @@ def postprocess(test, params, env):
 _screendump_thread_termination_event.set()
 _screendump_thread.join(10)
 
+# Terminate the serialdump thread
+global _serialdump_thread, _serialdump_thread_termination_event
+if _serialdump_thread:
+logging.debug(Terminating serialdump thread...)
+_serialdump_thread_termination_event.set()
+_serialdump_thread.join(10)
+
 # Warn about corrupt PPM files
 for f in glob.glob(os.path.join(test.debugdir, *.ppm)):
 if not ppm_utils.image_verify_ppm_file(f):
@@ -450,3 +469,41 @@ def _take_screendumps(test, params, env):
 if _screendump_thread_termination_event.isSet():
 break
 _screendump_thread_termination_event.wait(delay)
+
+def _dump_serial_console(test, params, env):
+    global _serialdump_thread_termination_event
+    rs = []
+    files = {}
+
+    while True:
+        for vm in kvm_utils.env_get_all_vms(env):
+            if not files.has_key(vm):
+                try:
+                    serial_socket = socket.socket(socket.AF_UNIX,
+                                                  socket.SOCK_STREAM)
+                    serial_socket.setblocking(False)
+                    serial_socket.connect(vm.serial_file_name)
+                except:
+                    logging.debug("Could not connect to serial socket for %s" %
+                                  vm.name)
+                    continue
+                rs.append(serial_socket)
+                serial_dump_filename = os.path.join(test.debugdir,
+                                                    "serial-%s" % vm.name)
+                files[vm] = [serial_socket, file(serial_dump_filename, "a+")]
+
+        r, w, x = select.select(rs, [], [], 0.5)
+        for vm in files.keys():
+            [s ,d] = files[vm]
+            if s in r:
+                data = s.recv(16384)
+                if len(data) == 0:
+                    rs.remove(s)
+                    files.pop(vm)
+                else:
+                    d.write(data)
+
+        if _serialdump_thread_termination_event.isSet():
+            break
+
+
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 9f82ffb..169a69e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -13,6 +13,7 @@ start_vm = yes
 kill_vm = no
 kill_vm_gracefully = yes
 kill_unresponsive_vms = yes
+serial_mode = dump
 
 # Screendump specific stuff
 convert_ppm_files_to_png_on_error = yes
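
The dump loop added by this patch can be tried outside autotest with a small Python 3 sketch. The assumptions are loud ones: dump_sockets() is a hypothetical helper, socket.socketpair() stands in for qemu's Unix domain socket, an in-memory buffer stands in for the log file, and the "guest" output is made-up sample data. Unlike the patch, the sketch also stops once every source has closed.

```python
import io
import select
import socket
import threading


def dump_sockets(sources, stop_event, poll_interval=0.5):
    """Copy data from each socket in `sources` to its paired binary file object."""
    active = dict(sources)
    while active and not stop_event.is_set():
        readable, _, _ = select.select(list(active), [], [], poll_interval)
        for sock in readable:
            data = sock.recv(16384)
            if data:
                active[sock].write(data)
            else:
                # An empty read means the peer closed the connection.
                del active[sock]


# socketpair() plays the role of the guest serial socket (assumption).
guest_end, host_end = socket.socketpair()
log = io.BytesIO()
stop = threading.Event()
worker = threading.Thread(target=dump_sockets, args=({host_end: log}, stop))
worker.start()

guest_end.sendall(b"Linux version 2.6.32\n")  # sample data, not real guest output
guest_end.close()                              # EOF lets the worker finish
worker.join(10)
stop.set()
host_end.close()
print(log.getvalue())
```

Using select() with a timeout keeps the thread responsive to the termination event even when the guest prints nothing.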



[PATCH 6/9] KVM test: Raise error when met unknown type in kvm_vm.remote_login().

2010-04-26 Thread Jason Wang
Raise an error when kvm_vm.remote_login() meets an unknown type of
shell_client, in order to avoid a traceback.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_vm.py |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 8f4753f..0cdf925 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -806,7 +806,9 @@ class VM:
 elif client == nc:
 session = kvm_utils.netcat(address, port, username, password,
prompt, linesep, timeout)
-
+else:
+raise error.TestError(Unknown shell_client type %s % client)
+
 if session:
 session.set_status_test_command(self.params.get(status_test_
 command, ))



[PATCH 7/9] KVM test: Introduce the local_login()

2010-04-26 Thread Jason Wang
This patch introduces a new method which is used to log into the guest
through the guest serial console. The serial_mode must be set to
session in order to make use of this patch.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 0cdf925..a22893b 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -814,7 +814,32 @@ class VM:
 command, ))
 return session
 
+    def local_login(self, timeout=240):
+        """
+        Log into the guest via serial console
+        If timeout expires while waiting for output from the guest (e.g. a
+        password prompt or a shell prompt) -- fail.
+        """
+
+        serial_mode = self.params.get("serial_mode")
+        username = self.params.get("username", "")
+        password = self.params.get("password", "")
+        prompt = self.params.get("shell_prompt", "[\#\$]")
+        linesep = eval("'%s'" % self.params.get("shell_linesep", r"\n"))
 
+        if serial_mode != "session":
+            logging.debug("serial_mode is not session")
+            return None
+        else:
+            command = "nc -U %s " % self.serial_file_name
+            assist = self.params.get("prompt_assist")
+            session = kvm_utils.remote_login(command, password, prompt, linesep,
+                                             timeout, "", username)
+            if session:
+                session.set_status_test_command(self.params.get("status_test_"
+                                                                "command", ""))
+            return session
+
 def copy_files_to(self, local_path, remote_path, nic_index=0, timeout=300):
 
 Transfer files to the guest.



[PATCH 8/9] KVM test: Create the background threads before calling process()

2010-04-26 Thread Jason Wang
If the screendump and serialdump threads are created after
process(), we may lose track of the guest's progress while it shuts
down. So this patch creates them before calling process() in preprocess().

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_preprocessing.py |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index 50d0e35..73e835a 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -257,9 +257,6 @@ def preprocess(test, params, env):
 int(params.get(pre_command_timeout, 600)),
 params.get(pre_command_noncritical) == yes)
 
-# Preprocess all VMs and images
-process(test, params, env, preprocess_image, preprocess_vm)
-
 # Start the screendump thread
 if params.get(take_regular_screendumps) == yes:
 logging.debug(Starting screendump thread)
@@ -278,6 +275,8 @@ def preprocess(test, params, env):
   args=(test, params, env))
 _serialdump_thread.start()
 
+# Preprocess all VMs and images
+process(test, params, env, preprocess_image, preprocess_vm)
 
 
 def postprocess(test, params, env):



[PATCH 9/9] KVM test: Redirect the console to serial for all linux guests

2010-04-26 Thread Jason Wang
As we have the ability to dump the content from serial console or use
a session through it, we need to redirect the console to serial
through unattended files to make use of it.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/unattended/Fedora-10.ks |2 +-
 client/tests/kvm/unattended/Fedora-11.ks |2 +-
 client/tests/kvm/unattended/Fedora-12.ks |2 +-
 client/tests/kvm/unattended/Fedora-8.ks  |2 +-
 client/tests/kvm/unattended/Fedora-9.ks  |2 +-
 client/tests/kvm/unattended/OpenSUSE-11.xml  |1 +
 client/tests/kvm/unattended/RHEL-3-series.ks |2 +-
 client/tests/kvm/unattended/RHEL-4-series.ks |2 +-
 client/tests/kvm/unattended/RHEL-5-series.ks |2 +-
 client/tests/kvm/unattended/SLES-11.xml  |1 +
 10 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/client/tests/kvm/unattended/Fedora-10.ks 
b/client/tests/kvm/unattended/Fedora-10.ks
index 61e59d7..8628036 100644
--- a/client/tests/kvm/unattended/Fedora-10.ks
+++ b/client/tests/kvm/unattended/Fedora-10.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/Fedora-11.ks 
b/client/tests/kvm/unattended/Fedora-11.ks
index 0be7d06..5ce8ee2 100644
--- a/client/tests/kvm/unattended/Fedora-11.ks
+++ b/client/tests/kvm/unattended/Fedora-11.ks
@@ -10,7 +10,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 zerombr
 
 clearpart --all --initlabel
diff --git a/client/tests/kvm/unattended/Fedora-12.ks 
b/client/tests/kvm/unattended/Fedora-12.ks
index 0be7d06..5ce8ee2 100644
--- a/client/tests/kvm/unattended/Fedora-12.ks
+++ b/client/tests/kvm/unattended/Fedora-12.ks
@@ -10,7 +10,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 zerombr
 
 clearpart --all --initlabel
diff --git a/client/tests/kvm/unattended/Fedora-8.ks 
b/client/tests/kvm/unattended/Fedora-8.ks
index f4a872d..884d556 100644
--- a/client/tests/kvm/unattended/Fedora-8.ks
+++ b/client/tests/kvm/unattended/Fedora-8.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/Fedora-9.ks 
b/client/tests/kvm/unattended/Fedora-9.ks
index f4a872d..884d556 100644
--- a/client/tests/kvm/unattended/Fedora-9.ks
+++ b/client/tests/kvm/unattended/Fedora-9.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/OpenSUSE-11.xml 
b/client/tests/kvm/unattended/OpenSUSE-11.xml
index 7dd44fa..2a4fec0 100644
--- a/client/tests/kvm/unattended/OpenSUSE-11.xml
+++ b/client/tests/kvm/unattended/OpenSUSE-11.xml
@@ -50,6 +50,7 @@
 moduleedd/module
   /initrd_module
 /initrd_modules
+appendconsole=ttyS0,115200/append
 loader_typegrub/loader_type
 sections config:type=list/
   /bootloader
diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks 
b/client/tests/kvm/unattended/RHEL-3-series.ks
index 884b386..26b1130 100644
--- a/client/tests/kvm/unattended/RHEL-3-series.ks
+++ b/client/tests/kvm/unattended/RHEL-3-series.ks
@@ -10,7 +10,7 @@ rootpw 123456
 firewall --enabled --ssh
 timezone America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 clearpart --all --initlabel
 autopart
 reboot
diff --git a/client/tests/kvm/unattended/RHEL-4-series.ks 
b/client/tests/kvm/unattended/RHEL-4-series.ks
index ce4a430..f2f934f 100644
--- a/client/tests/kvm/unattended/RHEL-4-series.ks
+++ b/client/tests/kvm/unattended/RHEL-4-series.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/RHEL-5-series.ks 
b/client/tests/kvm/unattended/RHEL-5-series.ks
index f4a872d..884d556 100644
--- a/client/tests/kvm/unattended/RHEL-5-series.ks
+++ b/client/tests/kvm/unattended/RHEL-5-series.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr 

Re: Synchronized time with kvm_clock

2010-04-26 Thread Alex Hermann
On Monday 26 April 2010, I wrote:
 host:
 cat /sys/devices/system/clocksource/clocksource0/current_clocksource
 tsc
 
 guest:
 cat /sys/devices/system/clocksource/clocksource0/current_clocksource
 kvm-clock

Forgotten some info which might be essential:

Kernel (host and guest): 2.6.32-trunk-amd64
qemu-kvm: 0.12.3+dfsg-4


Please keep me on the cc, I'm not on the list.
-- 
Greetings,

Alex Hermann



[PATCH 1/3] KVM test: Use customized command to get the version of kvm and its userspace

2010-04-26 Thread Jason Wang
The current method may or may not work on every distribution, so this
patch adds the ability to use customized commands to get the version
of kvm and its userspace. kvm_ver_cmd is used for the kvm version and
kvm_userspace_ver_cmd for its userspace.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_preprocessing.py |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index 4b9290c..16200ab 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -225,10 +225,10 @@ def preprocess(test, params, env):
 # Get the KVM kernel module version and write it as a keyval
 logging.debug(Fetching KVM module version...)
 if os.path.exists(/dev/kvm):
-try:
-kvm_version = open(/sys/module/kvm/version).read().strip()
-except:
-kvm_version = os.uname()[2]
+kvm_ver_cmd = params.get(kvm_ver_cmd, cat /sys/module/kvm/version)
+s, kvm_version = commands.getstatusoutput(kvm_ver_cmd)
+if s != 0:
+kvm_version = Unknown
 else:
 kvm_version = Unknown
 logging.debug(KVM module not loaded)
@@ -239,11 +239,11 @@ def preprocess(test, params, env):
 logging.debug(Fetching KVM userspace version...)
 qemu_path = kvm_utils.get_path(test.bindir, params.get(qemu_binary,
qemu))
-version_line = commands.getoutput(%s -help | head -n 1 % qemu_path)
-matches = re.findall([Vv]ersion .*?,, version_line)
-if matches:
-kvm_userspace_version =  .join(matches[0].split()[1:]).strip(,)
-else:
+def_qemu_ver_cmd = %s -help | head -n 1 | awk '{ print $5}' % qemu_path
+kvm_userspace_ver_cmd = params.get(kvm_userspace_ver_cmd,
+   def_qemu_ver_cmd)
+s, kvm_userspace_version = commands.getstatusoutput(kvm_userspace_ver_cmd)
+if s != 0:
 kvm_userspace_version = Unknown
 logging.debug(Could not fetch KVM userspace version)
 logging.debug(KVM userspace version: %s % kvm_userspace_version)
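
The lookup pattern introduced here (take the command from the params, fall back to a default, report "Unknown" on a non-zero exit status) can be sketched in Python 3 as below. get_version() is a hypothetical helper, the "echo" override stands in for a distribution-specific command, and subprocess.getstatusoutput() replaces the Python 2 commands module used in the patch.

```python
import subprocess


def get_version(params, key, default_cmd):
    """Run the command configured under `key`; return its output or "Unknown"."""
    cmd = params.get(key, default_cmd)
    status, output = subprocess.getstatusoutput(cmd)
    return output if status == 0 else "Unknown"


# Hypothetical override, standing in for a distribution-specific command.
params = {"kvm_ver_cmd": "echo 2.6.32"}
print(get_version(params, "kvm_ver_cmd", "cat /sys/module/kvm/version"))

# A failing command falls back to "Unknown".
print(get_version({}, "kvm_ver_cmd", "false"))
```

One point to keep in mind with the default userspace command: for a pipeline such as "... | head -n 1 | awk ...", the shell reports the exit status of the last stage, so a failure of the first stage may not be caught by the status check alone.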



[PATCH 2/3] KVM test: Create ksm scanner through pre_command

2010-04-26 Thread Jason Wang
KSM may have different control interfaces on different distributions,
so this patch launches ksm through pre_command instead of the
hard-coded bits in the test. Users may specify their own suitable
commands or parameters.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/tests/ksm_overcommit.py |   15 ---
 client/tests/kvm/tests_base.cfg.sample   |2 ++
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/client/tests/kvm/tests/ksm_overcommit.py 
b/client/tests/kvm/tests/ksm_overcommit.py
index 2dd46c4..4aa6deb 100644
--- a/client/tests/kvm/tests/ksm_overcommit.py
+++ b/client/tests/kvm/tests/ksm_overcommit.py
@@ -412,21 +412,6 @@ def run_ksm_overcommit(test, params, env):
  (3100 - 64.0)))
 mem = int(math.floor(host_mem * overcommit / vmsc))
 
-logging.debug(Checking KSM status...)
-ksm_flag = 0
-for line in os.popen('ksmctl info').readlines():
-if line.startswith('flags'):
-ksm_flag = int(line.split(' ')[1].split(',')[0])
-if int(ksm_flag) != 1:
-logging.info(KSM module is not loaded! Trying to load module and 
- start ksmctl...)
-try:
-utils.run(modprobe ksm)
-utils.run(ksmctl start 5000 100)
-except error.CmdError, e:
-raise error.TestFail(Failed to load KSM: %s % e)
-logging.debug(KSM module loaded and ksmctl started)
-
 swap = int(utils.read_from_meminfo(SwapTotal)) / 1024
 
 logging.debug(Overcommit = %f, overcommit)
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index e73ba44..2db0d2c 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -285,6 +285,8 @@ variants:
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
 
 - ksm_overcommit:
+pre_command = [ -e /dev/ksm ] && true || modprobe ksm && ksmctl start 5000 50
+pre_command_critical = yes
 # Don't preprocess any vms as we need to change its params
 vms = ''
 image_snapshot = yes



[PATCH 3/3] KVM test: Remove the duplicated KERNEL parameters in the pxe configuration file

2010-04-26 Thread Jason Wang
Remove the duplicated KERNEL vmlinuz in unattended.py

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/scripts/unattended.py |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/scripts/unattended.py 
b/client/tests/kvm/scripts/unattended.py
index e41bc86..fdadd03 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -209,7 +209,6 @@ class UnattendedInstall(object):
 pxe_config.write('PROMPT 0\n')
 pxe_config.write('LABEL pxeboot\n')
 pxe_config.write(' KERNEL vmlinuz\n')
-pxe_config.write(' KERNEL vmlinuz\n')
 pxe_config.write(' APPEND initrd=initrd.img %s\n' %
  self.kernel_args)
 pxe_config.close()



Synchronized time with kvm_clock

2010-04-26 Thread Alex Hermann
Hello all,

from the various sources Google comes up with, it seems kvm_clock is now the 
preferred clock source for kvm guests. I was hoping it would keep the time 
synced with the host, but apparently it doesn't. I'm seeing a pretty steady 1 
sec diff between host and guest. The host is synced with external time sources 
by means of ntp.

Am I supposed to run ntp inside the guest when using kvm_clock as a 
clocksource? Otherwise, what clocksource should be used in the guest
1) to automatically get synced time with the host? or
2) which can be used alongside ntpd in the guest?


host:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

guest:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock


Please cc me, I'm not on the list.
-- 
Greetings,

Alex Hermann



Re: [RFC PATCH 04/20] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2010-04-26 Thread Yoshiaki Tamura

Avi Kivity wrote:

On 04/23/2010 12:59 PM, Yoshiaki Tamura wrote:

Avi Kivity wrote:

On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:

Currently buf size is fixed at 32KB. It would be useful if it could
be flexible.



Why is this needed? The real buffering is in the kernel anyways; this is
only used to reduce the number of write() syscalls.


This was introduced to buffer the transferred guest image
transactionally on the receiver side. The sender doesn't use it.
If the transfer ends in an intermediate state, we just discard this buffer.


How large can it grow?


It really depends on what workload is running on the guest, but it should be as 
large as the guest ram size in the worst case.



What's wrong with applying it (perhaps partially) to the guest state?
The next state transfer will overwrite it completely, no?


AFAIK, the answer is no.
qemu_loadvm_state() calls load handlers of each device emulator, and they will 
update its state directly, which means even if the transaction was not complete, 
it's impossible to recover the previous state if we don't make a buffer.


I guess your concern is about consuming a large amount of RAM, and I think having an 
option for writing the transaction to a temporary disk image should be effective.
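
To make the argument above concrete, here is a toy Python sketch of a receiver that stages an incoming transaction and touches live state only on commit. It is not QEMU code: TransactionBuffer, the apply callback and the key names are all hypothetical, and the "state" is just a dictionary.

```python
class TransactionBuffer:
    """Collect incoming chunks and apply them only when the transaction completes."""

    def __init__(self, apply):
        self._apply = apply          # callback that updates the live state
        self._chunks = []

    def receive(self, chunk):
        # In the worst case this grows to roughly the size of guest RAM.
        self._chunks.append(chunk)

    def commit(self):
        # Transaction complete: now it is safe to update the live state.
        for chunk in self._chunks:
            self._apply(chunk)
        self._chunks = []

    def discard(self):
        # Transfer ended in an intermediate state: live state stays untouched.
        self._chunks = []


state = {}
buf = TransactionBuffer(lambda kv: state.update([kv]))

buf.receive(("ram_page_0", b"A"))
buf.receive(("cpu", b"regs-1"))
buf.commit()

buf.receive(("ram_page_0", b"B"))
buf.discard()                        # sender failed mid-transfer
print(state)
```

Without the staging step, the second, incomplete transaction would already have overwritten ram_page_0, and the previous consistent state could not be recovered.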





Re: [RFC PATCH 07/20] Introduce qemu_put_vector() and qemu_put_vector_prepare() to use put_vector() in QEMUFile.

2010-04-26 Thread Yoshiaki Tamura

Anthony Liguori wrote:

On 04/22/2010 11:02 PM, Yoshiaki Tamura wrote:

Anthony Liguori wrote:

On 04/21/2010 12:57 AM, Yoshiaki Tamura wrote:

As a foolproofing measure, qemu_put_vector_prepare should be called
before qemu_put_vector. Then, if any qemu_put_* function other than this
one is called after qemu_put_vector_prepare, the program will abort().

Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>


I don't get it. What's this protecting against?


This was introduced to prevent mixing the order of normal write and
vector write, and flush QEMUFile buffer before handling vectors.
While qemu_put_buffer copies data to QEMUFile buffer,
qemu_put_vector() will bypass that buffer.

It's just a foolproofing measure for what we encountered at the beginning,
and if the user of qemu_put_vector() is careful enough, we can remove
qemu_put_vector_prepare(). While writing this message, I started to
think that just calling qemu_fflush() in qemu_put_vector() would be
enough...


I definitely think removing the vector stuff in the first version would
simplify the process of getting everything merged. I'd prefer not to
have two apis so if vector operations were important from a performance
perspective, I'd want to see everything converted to a vector API.


I agree with your opinion.
I will measure the effect of introducing vector stuff, and post the data later.
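
The ordering problem discussed in this thread (buffered writes versus vector writes that bypass the buffer) can be shown with a toy Python model. BufferedFile and its methods are hypothetical names, not QEMU's API; the list called sink simply records the order in which data would reach the wire.

```python
class BufferedFile:
    """Toy writer with a staging buffer and a vector path that bypasses it."""

    def __init__(self):
        self.sink = []               # what reaches the "wire", in order
        self.buf = bytearray()

    def put_buffer(self, data):
        self.buf += data             # buffered, not yet on the wire

    def flush(self):
        if self.buf:
            self.sink.append(bytes(self.buf))
            self.buf.clear()

    def put_vector(self, vectors, flush_first=True):
        if flush_first:
            self.flush()             # keep earlier buffered data ahead of the vectors
        self.sink.extend(vectors)    # bypasses self.buf


ordered = BufferedFile()
ordered.put_buffer(b"hdr")
ordered.put_vector([b"p1", b"p2"])
ordered.flush()

reordered = BufferedFile()
reordered.put_buffer(b"hdr")
reordered.put_vector([b"p1", b"p2"], flush_first=False)
reordered.flush()

print(ordered.sink)
print(reordered.sink)
```

Flushing inside the vector call, as suggested at the end of the message above, preserves ordering without a separate prepare step.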


Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().

2010-04-26 Thread Yoshiaki Tamura

Anthony Liguori wrote:

On 04/22/2010 10:37 PM, Yoshiaki Tamura wrote:

Anthony Liguori wrote:

On 04/21/2010 12:57 AM, Yoshiaki Tamura wrote:

QEMUFile currently doesn't support writev(). For sending multiple
data, such as pages, using writev() should be more efficient.

Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>


Is there performance data that backs this up? Since QEMUFile uses a
linear buffer for most operations that's limited to 16k, I suspect you
wouldn't be able to observe a difference in practice.


I currently don't have data, but I'll prepare it.
There were two things I wanted to avoid.

1. Pages to be copied to QEMUFile buf through qemu_put_buffer.
2. Calling write() every time even when we want to send multiple pages
at once.

I think 2 may be negligible.
But 1 seems to be problematic if we want to make the latency as small
as possible, no?


Copying often has strange CPU characteristics depending on whether the
data is already in cache. It's better to drive these sort of
optimizations through performance measurement because changes are not
always obvious.


I agree.
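
For readers unfamiliar with writev(), the following Python 3 sketch sends three page-sized buffers with a single system call, without first concatenating them into one staging buffer. It uses a pipe as the destination and assumes a Unix platform where os.writev() is available; the page contents are made-up data, and the sketch says nothing about whether this is faster in practice.

```python
import os

# Three page-sized buffers standing in for guest pages (sample data).
pages = [b"a" * 4096, b"b" * 4096, b"c" * 4096]

r, w = os.pipe()
# One system call for all pages; no concatenation into a single buffer first.
written = os.writev(w, pages)
os.close(w)

received = b""
while True:
    chunk = os.read(r, 65536)
    if not chunk:
        break
    received += chunk
os.close(r)

print(written, len(received))
```

Whether avoiding the copy matters is exactly the kind of question the measurements requested above would answer.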



Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-26 Thread Yoshiaki Tamura

Anthony Liguori wrote:

On 04/22/2010 08:53 PM, Yoshiaki Tamura wrote:

Anthony Liguori wrote:

On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:

2010/4/22 Dor Laor <dl...@redhat.com>:

On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:

Dor Laor wrote:

On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:

Hi all,

We have been implementing the prototype of Kemari for KVM, and we're sending
this message to share what we have now and TODO lists. Hopefully, we would like
to get early feedback to keep us in the right direction. Although advanced
approaches in the TODO lists are fascinating, we would like to run this project
step by step while absorbing comments from the community. The current code is
based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.

For those who are new to Kemari for KVM, please take a look at the
following RFC which we posted last year.

http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html

The transmission/transaction protocol, and most of the control logic is
implemented in QEMU. However, we needed a hack in KVM to prevent rip from
proceeding before synchronizing VMs. It may also need some plumbing in the
kernel side to guarantee replayability of certain events and instructions,
integrate the RAS capabilities of newer x86 hardware with the HA stack, as well
as for optimization purposes, for example.

[ snap]


The rest of this message describes TODO lists grouped by each topic.

=== event tapping ===

Event tapping is the core component of Kemari, and it decides on which event the
primary should synchronize with the secondary. The basic assumption here is
that outgoing I/O operations are idempotent, which is usually true for disk I/O
and reliable network protocols such as TCP.

IMO any type of network event should be stalled too. What if the VM runs a
non-TCP protocol and the packet that the master node sent reached some
remote client, and before the sync to the slave the master failed?

In current implementation, it is actually stalling any type of network
that goes through virtio-net.

However, if the application was using unreliable protocols, it should
have its own recovering mechanism, or it should be completely stateless.

Why do you treat tcp differently? You can damage the entire VM this way -
think of dhcp request that was dropped on the moment you switched between
the master and the slave?

I'm not trying to say that we should treat tcp differently, but just
it's severe.
In case of dhcp request, the client would have a chance to retry after
failover, correct?
BTW, in current implementation,


I'm slightly confused about the current implementation vs. my
recollection of the original paper with Xen. I had thought that all disk
and network I/O was buffered in such a way that at each checkpoint, the
I/O operations would be released in a burst. Otherwise, you would have
to synchronize after every I/O operation which is what it seems the
current implementation does.


Yes, you're almost right.
It's synchronizing before QEMU starts emulating I/O at each device model.


If NodeA is the master and NodeB is the slave, if NodeA sends a network
packet, you'll checkpoint before the packet is actually sent, and then
if a failure occurs before the next checkpoint, won't that result in
both NodeA and NodeB sending out a duplicate version of the packet?


Yes.  But I think it's better than taking checkpoint after.

If we checkpoint after sending the packet, let's say it sent a TCP ACK to the 
client, and if a hardware failure occurred on NodeA during the transaction *but 
the client received the TCP ACK*, NodeB will resume from the previous state, and 
it may need to receive some data from the client. However, because the client 
has already received the TCP ACK, it won't resend the data to NodeB. It looks 
like this data is going to be dropped.


Anyway, I've just started planning to move the sync point to the network/block 
layer, and I will post the result for discussion again.
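
The trade-off argued above can be spelled out with a toy Python simulation. It is a deliberately simplified model, not Kemari code: one received segment ("data-1"), one ACK, and a primary failure immediately after the ACK has left. The function name run() and the policy strings are hypothetical.

```python
def run(policy):
    """Simulate one received segment, one ACK, and a primary failure after the ACK."""
    primary = ["data-1"]             # the primary has received data-1 from the client
    secondary = []                   # last state synchronized to the secondary
    acks_seen_by_client = []

    if policy == "sync_before_output":
        secondary = list(primary)               # checkpoint first
        acks_seen_by_client.append("ack-1")     # then emit the ACK
    else:                                       # "sync_after_output"
        acks_seen_by_client.append("ack-1")     # the ACK leaves first
        # The primary fails here, before the checkpoint is taken.

    # Failover: the secondary resumes from its last checkpoint.
    if "data-1" in secondary:
        acks_seen_by_client.append("ack-1")     # replayed output, a duplicate
    client_will_resend = "ack-1" not in acks_seen_by_client
    data_lost = "data-1" not in secondary and not client_will_resend
    return acks_seen_by_client, data_lost


print(run("sync_before_output"))
print(run("sync_after_output"))
```

In this model, synchronizing before output costs a duplicate ACK, which TCP tolerates, while synchronizing after output can silently lose acknowledged data.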



Re: [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-26 Thread Yoshiaki Tamura

Avi Kivity wrote:

On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:

Kemari starts synchronizing VMs when QEMU handles I/O requests.
Without this patch, the VCPU state has already advanced before
synchronization, and after failover to the VM on the receiver, it
hangs because of this.


We discussed moving the barrier to the actual output device, instead of
the I/O port. This allows you to complete the I/O transaction before
starting synchronization.

Does it not work for some reason?


Sorry, I've just started working on that.
I've posted this series to share what I have done so far.
Thanks for looking.


[no subject]

2010-04-26 Thread Anthony Davis

subscribe kvm


Networking - Static NATs

2010-04-26 Thread Anthony Davis

Hi,

I wonder if anyone is able to help me?

I am trying to get a system set up that consists of:

1x CentOS 5.4 host machine
1x KVM virtual guest, which will be a build server with DHCP running.
x number of guests, built from the build virtual machine.

the problem I have is that kvm currently has dhcp running and setting  
up NATs etc...


I need to stop this, but still allow my current virtual machines  
access out, how would be the best way to do this?


Kind Regards

Tony


Re: Networking - Static NATs

2010-04-26 Thread Javier Guerra Giraldez
On Mon, Apr 26, 2010 at 7:10 AM, Anthony Davis
t...@specialistdevelopment.com wrote:
> the problem I have is that kvm currently has dhcp running and setting up
> NATs etc...
>
> I need to stop this, but still allow my current virtual machines access out,
> how would be the best way to do this?

use bridged networking

-- 
Javier


Re: Networking - Static NATs

2010-04-26 Thread Anthony Davis

Quoting Javier Guerra Giraldez jav...@guerrag.com:


On Mon, Apr 26, 2010 at 7:10 AM, Anthony Davis
t...@specialistdevelopment.com wrote:

the problem I have is that kvm currently has dhcp running and setting up
NATs etc...

I need to stop this, but still allow my current virtual machines access out,
how would be the best way to do this?


use bridged networking


I have read about this, but there are so many methods and out of  
date ways to set it up, and also none of them describe the best method  
to remove the current dnsmasq setup.


Tony




--
Javier






Re: KVM_SET_MP_STATE is undocumented

2010-04-26 Thread Avi Kivity

On 04/26/2010 10:48 AM, Pekka Enberg wrote:

Two more interesting but undocumented ioctls:

 - KVM_SET_IDENTITY_MAP_ADDR
 - KVM_SET_BOOT_CPU_ID


I'll post patches.

Little background: we're debugging a KVM_EXIT_UNKNOWN problem for the 
largest bug-free kernel on Core i5 machine.  I've been looking at 
plain QEMU sources but it seems qemu-kvm that the person is using 
does much more during initialization. Do we have a known good list of 
mandatory steps required to properly initialize KVM on all CPUs?


KVM_GET_API_VERSION (unless you're sure you aren't running on 2.6.20 or 
2.6.21)

KVM_CREATE_VM
KVM_SET_USER_MEMORY_REGION
KVM_CREATE_VCPU
KVM_SET_TSS_ADDR
KVM_SET_IDENTITY_MAP_ADDR (really only needed on EPT machines, but 
recommended to invoke on all hosts)

KVM_CREATE_IRQCHIP (optional; if you want in-kernel lapic/ioapic/pic)
KVM_SET_CPUID2
KVM_RUN

qemu also initializes all the vcpu state from its own values and has 
elaborate memory setup.


--
error compiling committee.c: too many arguments to function



[PATCH] qemu-kvm: fix crash on reboot with vhost-net

2010-04-26 Thread Michael S. Tsirkin
When vhost-net is disabled on reboot, we set msix mask notifier
to NULL to disable further mask/unmask notifications.
Code currently tries to pass this NULL to notifier,
leading to a crash.  The right thing to do is:
- if vector is masked, we don't need to notify backend,
  just disable future notifications
- if vector is unmasked, invoke callback to unassign backend,
  then disable future notifications

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Note: this patch is for qemu-kvm, the code in question
is not in qemu.git.

 hw/msix.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3ec8805..43361b5 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -613,9 +613,18 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
 
-    if (dev->msix_mask_notifier)
-        r = dev->msix_mask_notifier(dev, vector, opaque,
-                                    msix_is_masked(dev, vector));
+    if (dev->msix_mask_notifier && !msix_is_masked(dev, vector)) {
+        /* Switching notifiers while vector is unmasked:
+         * mask the old one, unmask the new one. */
+        if (dev->msix_mask_notifier_opaque[vector]) {
+            r = dev->msix_mask_notifier(dev, vector,
+                                        dev->msix_mask_notifier_opaque[vector],
+                                        1);
+        }
+        if (r >= 0 && opaque) {
+            r = dev->msix_mask_notifier(dev, vector, opaque, 0);
+        }
+    }
     if (r >= 0)
         dev->msix_mask_notifier_opaque[vector] = opaque;
     return r;
-- 
1.7.1.rc1.22.g3163


Re: [uq/master patch 2/5] kvm: add logging count to slots

2010-04-26 Thread Marcelo Tosatti
On Sun, Apr 25, 2010 at 05:17:55PM +0300, Avi Kivity wrote:
 On 04/25/2010 04:57 PM, Jan Kiszka wrote:
 
 It's still a good idea.  The current API assumes that there will be only
 one slot-based client (or that multiple clients will keep the refcount
 themselves).
 
 After the bytemap -> multiple bitmaps conversion this can be extended to
 each client getting its own bitmap (and therefore, s/refcount/list of
 bitmaps/ and s/!refcount/list_empty()/).
 
 No concerns if
   - there is an existing use case for multiple clients, at least in
 qemu-kvm
 
 There isn't.  But I don't like hidden breakage.
 
   - the logging API is consistently converted, not just extended
 (IOW, migration_log is converted to logging_count)
 
 migration_log needs to remain global, since we want hotplug memory
 to autostart logging.
 
   - someone signs he checked that current use of start/stop in qemu is
 completely symmetrical (I think to remember this used to be not the
 case, but I might be wrong)
 
 I remember this too.  Marcelo?

Don't see any guarantee that it is symmetrical. Anyway, will drop 
the patch from the series.



Jumbo frames with virtio net

2010-04-26 Thread carlopmart

Hi all

 Is it possible to configure jumbo frames (mtu=9000) on a kvm guest
using virtio net drivers?

Thanks.


--
CL Martinez
carlopmart {at} gmail {d0t} com


Re: Synchronized time with kvm_clock

2010-04-26 Thread John Buswell

Alex,

You don't need to run ntp on each guest. You can enable rtc support in 
the guest kernel and on the hypervisor. Run ntp client on the hypervisor 
via cron, and use hwclock -w on the hypervisor after you run ntp, to 
sync the hardware clock to the system clock (which is now updated by 
ntpdate). On the guests, periodically run hwclock -s to set the system 
clock from the hw clock.


This seems to work extremely well, the clocksource on the guests as 
kvm_clock, and as long as you have the clocksource as hpet or acpi_pm on 
the hypervisor, there doesn't seem to be any problems with keeping time.


The only thing I've noticed is that when you reboot, the very first 
guest will have the wrong time on boot, so the uptime is messed up.


Regards

Alex Hermann wrote:

On Monday 26 April 2010, I wrote:
  

host:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

guest:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock



Forgotten some info which might be essential:

Kernel (host and guest): 2.6.32-trunk-amd64
qemu-kvm: 0.12.3+dfsg-4


Please keep me on the cc, I'm not on the list.
  



--
John Buswell
CEO, Carbon Mountain LLC
http://www.carbonmountain.com



Re: Synchronized time with kvm_clock

2010-04-26 Thread Athanasius
On Mon, Apr 26, 2010 at 10:32:51AM -0400, John Buswell wrote:
 You don't need to run ntp on each guest. You can enable rtc support in  
 the guest kernel and on the hypervisor. Run ntp client on the hypervisor  
 via cron, and use hwclock -w on the hypervisor after you run ntp, to  
 sync the hardware clock to the system clock (which is now updated by  
 ntpdate). On the guests, periodically run hwclock -s to set the system  
 clock from the hw clock.

  What a *horribly* hacky way to do it, meaning you'll get time warps
all over the place, admittedly of short intervals if you run those cron
jobs often enough.  It seems much simpler to me to simply run ntpd in
all the guests.  It's not like the extra CPU or bandwidth is going to be
a problem.  At the very least you want to run ntpd, not ntpdate out of
cron, in the hypervisor, and only use cron for those hwclock -w's.

 This seems to work extremely well, the clocksource on the guests as  
 kvm_clock, and as long as you have the clocksource as hpet or acpi_pm on  
 the hypervisor, there doesn't seem to be any problems with keeping time.

 The only thing I've noticed is that when you reboot, the very first  
 guest will have the wrong time on boot, so the uptime is messed up.

  And I think many people would find this unacceptable.

  Really, I appreciate that keeping the time sync'd via ntpd on the
hypervisor and having it passed accurately to the guests has a certain
elegant simplicity about it.  But if you achieve the latter by
periodically resyncing against what the guest sees as its hardware clock
you've lost that elegance again.  It really needs to 'just work' via KVM
code in the guest kernel using the exact same time as the hypervisor
kernel is supplying.

-- 
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
  Finger athan(at)fysh.org for PGP key
   And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence. Paula Cole - ME




Re: Synchronized time with kvm_clock

2010-04-26 Thread John Buswell

Athanasius wrote:

On Mon, Apr 26, 2010 at 10:32:51AM -0400, John Buswell wrote:
  
You don't need to run ntp on each guest. You can enable rtc support in  
the guest kernel and on the hypervisor. Run ntp client on the hypervisor  
via cron, and use hwclock -w on the hypervisor after you run ntp, to  
sync the hardware clock to the system clock (which is now updated by  
ntpdate). On the guests, periodically run hwclock -s to set the system  
clock from the hw clock.



  What a *horribly* hacky way to do it, meaning you'll get time warps
all over the place, admittedly of short intervals if you run those cron
jobs often enough.  It seems much simpler to me to simply run ntpd in
all the guests.  It's not like the extra CPU or bandwidth is going to be
a problem.  At the very least you want to run ntpd, not ntpdate out of
cron, in the hypervisor, and only use cron for those hwclock -w's.

Not really. You don't get time warps at all; the only place you get a
time warp is on the initial guest, and that's not a problem with the
workaround I suggested. It seems to be an issue with the clock on the
initial guest. There is no point wasting resources by running ntpd on
each guest when you don't have to.
  
This seems to work extremely well, the clocksource on the guests as  
kvm_clock, and as long as you have the clocksource as hpet or acpi_pm on  
the hypervisor, there doesn't seem to be any problems with keeping time.


The only thing I've noticed is that when you reboot, the very first  
guest will have the wrong time on boot, so the uptime is messed up.



  And I think many people would find this unacceptable.
  
This particular problem has nothing to do with what I suggested above. 
This is some kind of issue with kvm_clock on the first guest starting up.

  Really, I appreciate that keeping the time sync'd via ntpd on the
hypervisor and having it passed accurately to the guests has a certain
elegant simplicity about it.  But if you achieve the latter by
periodically resyncing against what the guest sees as its hardware clock
you've lost that elegance again.  It really needs to 'just work' via KVM
code in the guest kernel using the exact same time as the hypervisor
kernel is supplying.

  
I agree. Unfortunately, kvm_clock doesn't seem to be quite there yet. So 
using rtc0 as a comparison, and keeping the hypervisor clock in sync 
with reality, is a good way to avoid having to run N+1 copies of ntpd on 
the guests :)


--
John Buswell
CEO, Carbon Mountain LLC
http://www.carbonmountain.com



Re: [PATCH] document boot option to -drive parameter

2010-04-26 Thread Marcelo Tosatti
On Fri, Apr 16, 2010 at 02:41:37PM -0600, Bruce Rogers wrote:
 The boot option is missing from the documentation for the -drive parameter.
 
 If there is a better way to describe it, I'm all ears.
 
 Signed-off-by: Bruce Rogers brog...@novell.com
 
 diff --git a/qemu-options.hx b/qemu-options.hx
 index c5a160c..fbcf61e 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
 @@ -160,6 +160,8 @@ an untrusted format header.
  This option specifies the serial number to assign to the device.
  @item addr=@var{addr}
  Specify the controller's PCI address (if=virtio only).
 +@item boot=@var{boot}
 +@var{boot} is on or off and allows for booting from non-traditional interfaces, such as virtio.
  @end table
  
  By default, writethrough caching is used for all block device.  This means that

Applied, thanks.



KVM call agenda for Apr 27

2010-04-26 Thread Chris Wright
Please send in any agenda items you are interested in covering.

While I don't expect it to be the case this week, if we have a 
lack of agenda items I'll cancel the week's call.

thanks,
-chris


Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.

2010-04-26 Thread Marcelo Tosatti
On Fri, Apr 23, 2010 at 01:58:22PM +0800, Gui Jianfeng wrote:
 Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is
 performed after calling kvm_mmu_zap_page() regardless of whether the
 page is actually reclaimed. However, a root sp won't be reclaimed by
 kvm_mmu_zap_page(), so making kvm_mmu_zap_page() return the total
 number of reclaimed sp makes more sense. A new flag is put into
 kvm_mmu_zap_page() to indicate whether the top page is reclaimed.
 
 Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmu.c |   53 ++++++++++++++++++++++++++++++++++++-----------------
  1 files changed, 36 insertions(+), 17 deletions(-)

Gui, 

There will be only a few pinned roots, and there is no need for
kvm_mmu_change_mmu_pages to be precise at that level (pages will be
reclaimed through kvm_unmap_hva eventually).



[PATCH 0/6] pvclock fixes

2010-04-26 Thread Glauber Costa
Hi,

This is the last series I've sent, with comments from you merged.
The first 5 patches are the same, only with the suggested fixes.
I am leaving documentation out, since the basics won't change, and
we're still discussing the details.

Patch 6 is new, and is the guest side of the skipping of updates
Avi asked for. I haven't yet done any HV work on this
(especially because I am not convinced exactly where it is safe to
do).

Let me know what you think.

Thanks

Glauber Costa (6):
  Enable pvclock flags in vcpu_time_info structure
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  export new cpuid KVM_CAP
  Try using new kvm clock msrs
  don't compute pvclock adjustments if we trust the tsc

 arch/x86/include/asm/kvm_para.h|   13 
 arch/x86/include/asm/pvclock-abi.h |4 ++-
 arch/x86/include/asm/pvclock.h |1 +
 arch/x86/kernel/kvmclock.c |   59 +++-
 arch/x86/kernel/pvclock.c  |   37 ++
 arch/x86/kvm/x86.c |   13 +++-
 include/linux/kvm.h|1 +
 7 files changed, 105 insertions(+), 23 deletions(-)



[PATCH 2/6] Add a global synchronization point for pvclock

2010-04-26 Thread Glauber Costa
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5 million warps a minute in some systems.
(To be fair, it wasn't that bad in most of them.) Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in their tsc flags,
especially on multi-socket ones.

Two reads of the same kernel timestamp at approximately the same time will likely
have the tsc timestamped on different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading the clock at a later occasion.

Some adjustments on the host could make this window less likely to happen,
but still, it is pretty much an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the clock's
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detect that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This does cause contention on big
boxes (but big here means *BIG*), but it seems like a good trade-off, since
it provides us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.
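
The lock-free scheme described above can be sketched in userspace C as
follows. This is a minimal sketch, assuming C11 <stdatomic.h> in place of
the kernel's atomic64_t; monotonic_clamp() is a hypothetical name, and the
retry loop is written in the usual compare-and-swap style rather than as a
line-for-line copy of the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global high-water mark: the latest timestamp handed out by any reader. */
static _Atomic uint64_t last_value;

/* Return a timestamp that is never behind one that was already returned. */
uint64_t monotonic_clamp(uint64_t ret)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (ret < last)
            return last;   /* another reader already published a later time */
        /* On failure the call stores the current global value into `last`,
         * so the comparison above is redone against fresh data. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, ret));

    return ret;            /* we published the new high-water mark */
}
```

A reader that is slightly behind simply returns the published value, and a
reader that is ahead retries only when another one raced with it, so no
lock is taken on either path.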

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
CC: Avi Kivity a...@redhat.com
CC: Marcelo Tosatti mtosa...@redhat.com
CC: Zachary Amsden zams...@redhat.com
---
 arch/x86/kernel/pvclock.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 8f4af7b..6cf6dec 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
return pv_tsc_khz;
 }
 
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+   u64 last;
 
do {
version = pvclock_get_time_values(&shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
barrier();
} while (version != src->version);
 
+   /*
+* Assumption here is that last_value, a global accumulator, always goes
+* forward. If we are less than that, we should not be much smaller.
+* We assume there is an error margin we're inside, and then the correction
+* does not sacrifice accuracy.
+*
+* For reads: global may have changed between test and return,
+* but this means someone else updated poked the clock at a later time.
+* We just need to make sure we are not seeing a backwards event.
+*
+* For updates: last_value = ret is not enough, since two vcpus could be
+* updating at the same time, and one of them could be slightly behind,
+* making the assumption that last_value always go forward fail to hold.
+*/
+   last = atomic64_read(&last_value);
+   do {
+   if (ret < last)
+   return last;
+   last = atomic64_cmpxchg(&last_value, last, ret);
+   } while (unlikely(last != ret));
+
return ret;
 }
 
-- 
1.6.2.2



[PATCH 1/6] Enable pvclock flags in vcpu_time_info structure

2010-04-26 Thread Glauber Costa
This patch removes one padding byte and transforms it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
is usually determined via HV negotiation.
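
The layout change and the opt-in masking can be sketched as a standalone C
fragment. It is only an illustration: the first three fields are not
visible in the hunk below and are filled in from the mainline header, and
set_valid_flags() and effective_flags() are hypothetical names, not
functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the structure, for illustration only. */
struct vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;          /* formerly the first of three padding bytes */
    uint8_t  pad[2];
} __attribute__((__packed__));

/* Reusing a padding byte must not move any field or change the ABI size. */
_Static_assert(sizeof(struct vcpu_time_info) == 32, "ABI size changed");
_Static_assert(offsetof(struct vcpu_time_info, flags) == 29, "flags moved");

/* Flags the guest has agreed to honour; none until it opts in. */
static uint8_t valid_flags;

void set_valid_flags(uint8_t flags)
{
    valid_flags = flags;
}

/* Only flags the guest opted into are ever acted upon. */
uint8_t effective_flags(const struct vcpu_time_info *ti)
{
    return ti->flags & valid_flags;
}
```

An old guest that never opts in sees every flag as clear, which is what
keeps the new byte backward compatible.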

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
---
 arch/x86/include/asm/pvclock-abi.h |    3 ++-
 arch/x86/include/asm/pvclock.h     |    1 +
 arch/x86/kernel/pvclock.c          |    9 +++++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
u64   system_time;
u32   tsc_to_system_mul;
s8    tsc_shift;
-   u8    pad[3];
+   u8    flags;
+   u8    pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
 struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..c50823f 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_valid_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..8f4af7b 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
u32 tsc_to_nsec_mul;
int tsc_shift;
u32 version;
+   u8  flags;
 };
 
+static u8 valid_flags = 0;
+
+void pvclock_valid_flags(u8 flags)
+{
+   valid_flags = flags;
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
dst->system_timestamp  = src->system_time;
dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
dst->tsc_shift = src->tsc_shift;
+   dst->flags = src->flags;
rmb();  /* test version after fetching data */
} while ((src->version & 1) || (dst->version != src->version));
 
-- 
1.6.2.2



[PATCH 6/6] don't compute pvclock adjustments if we trust the tsc

2010-04-26 Thread Glauber Costa
If the HV told us we can fully trust the TSC, skip any
correction

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h    |    5 +++++
 arch/x86/include/asm/pvclock-abi.h |    1 +
 arch/x86/kernel/kvmclock.c         |    3 +++
 arch/x86/kernel/pvclock.c          |    4 ++++
 4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..615ebb1 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2   3
 
+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC 0xf8
+
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index ec5c41a..b123bd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -39,5 +39,6 @@ struct pvclock_wall_clock {
u32   nsec;
 } __attribute__((__packed__));
 
+#define PVCLOCK_STABLE_TSC (1 << 0)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index f2f6aee..aca2d3c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -218,4 +218,7 @@ void __init kvmclock_init(void)
clocksource_register(&kvm_clock);
pv_info.paravirt_enabled = 1;
pv_info.name = "KVM";
+
+   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_TSC))
+   pvclock_valid_flags(PVCLOCK_STABLE_TSC);
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 6cf6dec..43ae8d5 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
barrier();
} while (version != src->version);
 
+   if ((valid_flags & PVCLOCK_STABLE_TSC) &&
+   (shadow->flags & PVCLOCK_STABLE_TSC))
+   return ret;
+
/*
 * Assumption here is that last_value, a global accumulator, always goes
 * forward. If we are less than that, we should not be much smaller.
-- 
1.6.2.2



[PATCH 5/6] Try using new kvm clock msrs

2010-04-26 Thread Glauber Costa
We have now added a new set of clock-related msrs to replace the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a nonexistent MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.
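
The selection logic can be sketched in isolation like this. It is a
simplified sketch that leaves out the no-kvmclock command line check; the
feature bit numbers are the ones used in this series, while the two _NEW
MSR values are an assumption taken from mainline Linux, since patch 3/6 is
not quoted here. struct kvmclock_msrs and select_kvmclock_msrs() are
hypothetical names.

```c
#include <stdint.h>

#define KVM_FEATURE_CLOCKSOURCE   0
#define KVM_FEATURE_CLOCKSOURCE2  3

#define MSR_KVM_WALL_CLOCK        0x11
#define MSR_KVM_SYSTEM_TIME       0x12
/* Assumed values (mainline Linux); not taken from this thread. */
#define MSR_KVM_WALL_CLOCK_NEW    0x4b564d00
#define MSR_KVM_SYSTEM_TIME_NEW   0x4b564d01

struct kvmclock_msrs {
    uint32_t wall_clock;
    uint32_t system_time;
};

/* `features` is the bitmask advertised by the hypervisor through cpuid.
 * Returns 0 and fills *out, or -1 if no kvmclock is advertised at all. */
int select_kvmclock_msrs(uint32_t features, struct kvmclock_msrs *out)
{
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE2)) {
        /* Prefer the new MSR set whenever the host offers it. */
        out->wall_clock  = MSR_KVM_WALL_CLOCK_NEW;
        out->system_time = MSR_KVM_SYSTEM_TIME_NEW;
        return 0;
    }
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE)) {
        /* Fall back to the old, deprecated MSR numbers. */
        out->wall_clock  = MSR_KVM_WALL_CLOCK;
        out->system_time = MSR_KVM_SYSTEM_TIME;
        return 0;
    }
    return -1;
}
```

Deciding from the advertised bits means no MSR is written until the guest
knows it exists, which is the point of avoiding the early #GP.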

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kernel/kvmclock.c |   56 +++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..f2f6aee 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,8 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -41,6 +43,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -54,7 +57,8 @@ static unsigned long kvm_get_wallclock(void)
 
low = (int)__pa_symbol(&wall_clock);
high = ((u64)__pa_symbol(&wall_clock) >> 32);
-   native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
+
+   native_write_msr_safe(msr_kvm_wall_clock, low, high);
 
vcpu_time = &get_cpu_var(hv_clock);
pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
@@ -130,7 +134,8 @@ static int kvm_register_clock(char *txt)
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
   cpu, high, low, txt);
-   return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+   return native_write_msr_safe(msr_kvm_system_time, low, high);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -165,14 +170,15 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #ifdef CONFIG_KEXEC
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
-   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+
+   native_write_msr(msr_kvm_system_time, 0, 0);
native_machine_crash_shutdown(regs);
 }
 #endif
 
 static void kvm_shutdown(void)
 {
-   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   native_write_msr(msr_kvm_system_time, 0, 0);
native_machine_shutdown();
 }
 
@@ -181,27 +187,35 @@ void __init kvmclock_init(void)
if (!kvm_para_available())
return;
 
-   if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
-   if (kvm_register_clock("boot clock"))
-   return;
-   pv_time_ops.sched_clock = kvm_clock_read;
-   x86_platform.calibrate_tsc = kvm_get_tsc_khz;
-   x86_platform.get_wallclock = kvm_get_wallclock;
-   x86_platform.set_wallclock = kvm_set_wallclock;
+   if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+   msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
+   msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
+   }
+   else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+   return;
+
+   printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
+   msr_kvm_system_time, msr_kvm_wall_clock);
+
+   if (kvm_register_clock("boot clock"))
+   return;
+   pv_time_ops.sched_clock = kvm_clock_read;
+   x86_platform.calibrate_tsc = kvm_get_tsc_khz;
+   x86_platform.get_wallclock = kvm_get_wallclock;
+   x86_platform.set_wallclock = kvm_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-   x86_cpuinit.setup_percpu_clockev =
-   kvm_setup_secondary_clock;
+   x86_cpuinit.setup_percpu_clockev =
+   kvm_setup_secondary_clock;
 #endif
 #ifdef CONFIG_SMP
-   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 #endif
-   machine_ops.shutdown  = kvm_shutdown;
+   machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
-   machine_ops.crash_shutdown  = kvm_crash_shutdown;
+   machine_ops.crash_shutdown  = kvm_crash_shutdown;
 #endif
-   kvm_get_preset_lpj();
-   clocksource_register(&kvm_clock);
-   pv_info.paravirt_enabled = 1;
-   pv_info.name = "KVM";
-   }
+   kvm_get_preset_lpj();
+   clocksource_register(&kvm_clock);
+   pv_info.paravirt_enabled = 1;
+   pv_info.name = "KVM";
 }
-- 
1.6.2.2


[PATCH 4/6] export new cpuid KVM_CAP

2010-04-26 Thread Glauber Costa
Since we're changing the msrs kvmclock uses, we have to communicate
that to the guest, through cpuid. We can add a new KVM_CAP to the
hypervisor, and then patch userspace to recognize it.

And if we ever add a new cpuid bit in the future, we have to do that again,
which creates some complexity and delays feature adoption.

Instead, what I'm proposing in this patch is a new capability, called
KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the list of features
currently supported by the hypervisor. If we ever want to add or remove
a feature, we only need to tweak the HV, leaving userspace untouched.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |4 
 arch/x86/kvm/x86.c  |6 ++
 include/linux/kvm.h |1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE 0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE2 3
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2ead7f..04f04aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_MCE:
r = KVM_MAX_MCE_BANKS;
break;
+   case KVM_CAP_X86_CPUID_FEATURE_LIST:
+   r = (1 << KVM_FEATURE_CLOCKSOURCE) |
+   (1 << KVM_FEATURE_NOP_IO_DELAY) |
+   (1 << KVM_FEATURE_MMU_OP) |
+   (1 << KVM_FEATURE_CLOCKSOURCE2);
+   break;
default:
r = 0;
break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..1ce124f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -507,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_DEBUGREGS 50
 #endif
 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
+#define KVM_CAP_X86_CPUID_FEATURE_LIST 52
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.6.2.2
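
To illustrate the idea, here is a minimal sketch of how userspace might
consume such a feature list. hv_feature_list() merely mirrors the value the
patch returns for KVM_CAP_X86_CPUID_FEATURE_LIST, and filter_features() is a
hypothetical userspace helper; neither name exists in the patch or in qemu.

```c
#include <stdint.h>

/* Feature bit numbers from kvm_para.h as of this patch. */
#define KVM_FEATURE_CLOCKSOURCE   0
#define KVM_FEATURE_NOP_IO_DELAY  1
#define KVM_FEATURE_MMU_OP        2
#define KVM_FEATURE_CLOCKSOURCE2  3

/* Mirrors the bitmask built in kvm_dev_ioctl_check_extension(). */
uint32_t hv_feature_list(void)
{
    return (1u << KVM_FEATURE_CLOCKSOURCE) |
           (1u << KVM_FEATURE_NOP_IO_DELAY) |
           (1u << KVM_FEATURE_MMU_OP) |
           (1u << KVM_FEATURE_CLOCKSOURCE2);
}

/*
 * Hypothetical userspace side: expose to the guest only the requested bits
 * that the hypervisor reports as supported. Userspace needs no knowledge of
 * what any individual bit means.
 */
uint32_t filter_features(uint32_t requested, uint32_t supported)
{
    return requested & supported;
}
```

With this shape, adding a fifth feature bit later only changes the mask
returned by the hypervisor; the filtering code stays the same.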

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] change msr numbers for kvmclock

2010-04-26 Thread Glauber Costa
Avi pointed out a while ago that those MSRs fall into the Pentium
PMU range. So the idea here is to add new ones and, after a while,
deprecate the old ones.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |4 
 arch/x86/kvm/x86.c  |7 ++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..9734808 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,10 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
+#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
+
 #define KVM_MAX_MMU_OP_BATCH   32
 
 /* Operations for KVM_HC_MMU_OP */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8824b73..a2ead7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -575,9 +575,10 @@ static inline u32 bit(int bitno)
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN 5
+#define KVM_SAVE_MSRS_BEGIN 7
 static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+   MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1099,10 +1100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case MSR_IA32_MISC_ENABLE:
vcpu->arch.ia32_misc_enable_msr = data;
break;
+   case MSR_KVM_WALL_CLOCK_NEW:
case MSR_KVM_WALL_CLOCK:
vcpu->kvm->arch.wall_clock = data;
kvm_write_wall_clock(vcpu->kvm, data);
break;
+   case MSR_KVM_SYSTEM_TIME_NEW:
case MSR_KVM_SYSTEM_TIME: {
if (vcpu->arch.time_page) {
kvm_release_page_dirty(vcpu->arch.time_page);
@@ -1374,9 +1377,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
data = vcpu->arch.efer;
break;
case MSR_KVM_WALL_CLOCK:
+   case MSR_KVM_WALL_CLOCK_NEW:
data = vcpu->kvm->arch.wall_clock;
break;
case MSR_KVM_SYSTEM_TIME:
+   case MSR_KVM_SYSTEM_TIME_NEW:
data = vcpu->arch.time;
break;
case MSR_IA32_P5_MC_ADDR:
-- 
1.6.2.2
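
As a side note on the chosen range, the upper three bytes of 0x4b564d00 are
the ASCII codes for "KVM". The sketch below shows a range check and that
decoding. KVM_MSR_RANGE_FIRST, KVM_MSR_RANGE_LAST, msr_in_kvm_range() and
msr_range_tag() are hypothetical names used only for illustration; the patch
itself only defines the two MSR numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range quoted in the kvm_para.h comment; the macro names are made up here. */
#define KVM_MSR_RANGE_FIRST 0x4b564d00u
#define KVM_MSR_RANGE_LAST  0x4b564dffu

/* True if the MSR index lies inside the custom KVM range. */
bool msr_in_kvm_range(uint32_t msr)
{
    return msr >= KVM_MSR_RANGE_FIRST && msr <= KVM_MSR_RANGE_LAST;
}

/* Copy the upper three bytes of the MSR index into a NUL-terminated tag. */
void msr_range_tag(uint32_t msr, char tag[4])
{
    tag[0] = (char)((msr >> 24) & 0xff);
    tag[1] = (char)((msr >> 16) & 0xff);
    tag[2] = (char)((msr >> 8) & 0xff);
    tag[3] = '\0';
}
```

The legacy indices 0x11 and 0x12 fall outside this range, which is the
reason for moving away from them.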



Re: KVM call agenda for Apr 27

2010-04-26 Thread Anthony Liguori

On 04/26/2010 12:26 PM, Chris Wright wrote:

> Please send in any agenda items you are interested in covering.
>
> While I don't expect it to be the case this week, if we have a
> lack of agenda items I'll cancel the week's call.


- qemu management interface (and libvirt)
- stable tree policy (push vs. pull and call for stable volunteers)

Regards,

Anthony Liguori


> thanks,
> -chris




Re: [PATCH 2/3] KVM MMU: fix sp->unsync type error in trace event definition.

2010-04-26 Thread Marcelo Tosatti
On Thu, Apr 22, 2010 at 05:33:57PM +0800, Gui Jianfeng wrote:
 sp->unsync is bool now, so update trace event declaration.
 
 Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmutrace.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
 index 3851f1f..9966e80 100644
 --- a/arch/x86/kvm/mmutrace.h
 +++ b/arch/x86/kvm/mmutrace.h
 @@ -11,7 +11,7 @@
   __field(__u64, gfn) \
   __field(__u32, role) \
   __field(__u32, root_count) \
 - __field(__u32, unsync)
 + __field(bool, unsync)
  
  #define KVM_MMU_PAGE_ASSIGN(sp)   \
   __entry->gfn = sp->gfn;  \
 -- 
 1.6.5.2

Applied, thanks.



2.6.32.12: Build warning due to 78ce64a384 / missing in 2.6.33?

2010-04-26 Thread Jan Kiszka
Gleb,

I'm getting a build warning with latest 2.6.32.12 due to "Fix segment
descriptor loading". load_segment_descriptor_to_kvm_desct is unused
after that patch. I assume it's just forgotten code and did not
accidentally become unused, right?

The fact that 2.6.33.3 does not generate this makes me wonder why it
obviously lacks the above patch. Not required or not yet queued?

Jan





[PATCH 08/10] introduce leul_to_cpu

2010-04-26 Thread Marcelo Tosatti
To be used by next patch.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 bswap.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/bswap.h b/bswap.h
index aace9b7..956f3fa 100644
--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif
 
 #undef le_bswap
-- 
1.6.6.1
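
One thing worth checking in the big-endian branch: operands of ## are not
macro-expanded, so le ## HOST_LONG_BITS ## _to_cpu would paste to the literal
identifier leHOST_LONG_BITS_to_cpu rather than le32_to_cpu or le64_to_cpu.
A minimal sketch of the usual two-level workaround is below. PASTE3 and
PASTE3_ are hypothetical helper names, HOST_LONG_BITS is fixed at 64 purely
for illustration, and le64_to_cpu() here is a portable stand-in, not the
implementation in bswap.h.

```c
#include <stdint.h>

#define HOST_LONG_BITS 64   /* assumed value for this sketch */

/*
 * Stand-in for the real helper: interpret the bytes of v as a little-endian
 * 64-bit value, regardless of host byte order.
 */
uint64_t le64_to_cpu(uint64_t v)
{
    const unsigned char *p = (const unsigned char *)&v;
    uint64_t r = 0;
    int i;

    for (i = 7; i >= 0; i--)
        r = (r << 8) | p[i];
    return r;
}

/* Inner macro pastes; outer macro lets its arguments expand first. */
#define PASTE3_(a, b, c) a ## b ## c
#define PASTE3(a, b, c)  PASTE3_(a, b, c)

/* Expands to le64_to_cpu(v) because HOST_LONG_BITS is expanded to 64. */
#define leul_to_cpu(v) PASTE3(le, HOST_LONG_BITS, _to_cpu)(v)
```

The extra level of indirection only matters on hosts where the big-endian
branch is actually compiled and the macro is used.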



[PATCH 04/10] kvm: allow qemu to set EPT identity mapping address

2010-04-26 Thread Marcelo Tosatti
From: Sheng Yang sh...@linux.intel.com

If we use a BIOS image larger than the current 256KB, we need to move the
reserved TSS and EPT identity mapping pages. Currently TSS supports this,
but EPT does not.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/kvm.c |   26 +-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6dafa..f73b47b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -326,6 +326,25 @@ static int kvm_has_msr_star(CPUState *env)
 return 0;
 }
 
+static int kvm_init_identity_map_page(KVMState *s)
+{
+#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
+int ret;
+uint64_t addr = 0xfffbc000;
+
+if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
+return 0;
+}
+
+ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr);
+if (ret < 0) {
+fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
+return ret;
+}
+#endif
+return 0;
+}
+
 int kvm_arch_init(KVMState *s, int smp_cpus)
 {
 int ret;
@@ -353,7 +372,12 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
 perror("e820_add_entry() table is full");
 exit(1);
 }
-return kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
+ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
+if (ret < 0) {
+return ret;
+}
+
+return kvm_init_identity_map_page(s);
 }
 
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
-- 
1.6.6.1
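
A small remark on the error path, assuming kvm_vm_ioctl() follows the
negative-errno convention (returning -errno on failure): strerror() expects
a positive errno value, so the message would normally be built from -ret.
The sketch below illustrates that convention only. fake_vm_ioctl() and
ioctl_error_string() are hypothetical stand-ins, not qemu functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a wrapper that returns -errno on failure. */
int fake_vm_ioctl(int should_fail)
{
    return should_fail ? -EINVAL : 0;
}

/*
 * Translate a negative-errno return value into a message, or NULL when the
 * call succeeded. Note the sign flip before calling strerror().
 */
const char *ioctl_error_string(int ret)
{
    return ret < 0 ? strerror(-ret) : NULL;
}
```

Whether the patch needs the same sign flip depends on what kvm_vm_ioctl()
returns in this tree, which is not shown in the diff.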



[PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu

2010-04-26 Thread Marcelo Tosatti
This is now done via the initialization's qemu_system_reset call.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm-all.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9c8aa7d..eabb097 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env)
 ret = kvm_arch_init_vcpu(env);
 if (ret == 0) {
 qemu_register_reset(kvm_reset_vcpu, env);
-kvm_arch_reset_vcpu(env);
 }
 err:
 return ret;
-- 
1.6.6.1



[PATCH 09/10] kvm: port qemu-kvm's bitmap scanning

2010-04-26 Thread Marcelo Tosatti
Which is significantly faster.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 kvm-all.c |   53 +
 1 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index eabb097..ef45418 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -26,6 +26,7 @@
 #include "hw/hw.h"
 #include "gdbstub.h"
 #include "kvm.h"
+#include "bswap.h"
 
 /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
 #define PAGE_SIZE TARGET_PAGE_SIZE
@@ -282,11 +283,41 @@ static int kvm_set_migration_log(int enable)
 return 0;
 }
 
-static int test_le_bit(unsigned long nr, unsigned char *addr)
+/* get kvm's dirty pages bitmap and update qemu's */
+static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
+ unsigned long *bitmap,
+ unsigned long offset,
+ unsigned long mem_size)
 {
-return (addr[nr >> 3] >> (nr & 7)) & 1;
+unsigned int i, j;
+unsigned long page_number, addr, addr1, c;
+ram_addr_t ram_addr;
+unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+HOST_LONG_BITS;
+
+/*
+ * bitmap-traveling is faster than memory-traveling (for addr...)
+ * especially when most of the memory is not dirty.
+ */
+for (i = 0; i < len; i++) {
+if (bitmap[i] != 0) {
+c = leul_to_cpu(bitmap[i]);
+do {
+j = ffsl(c) - 1;
+c &= ~(1ul << j);
+page_number = i * HOST_LONG_BITS + j;
+addr1 = page_number * TARGET_PAGE_SIZE;
+addr = offset + addr1;
+ram_addr = cpu_get_physical_page_desc(addr);
+cpu_physical_memory_set_dirty(ram_addr);
+} while (c != 0);
+}
+}
+return 0;
 }
 
+#define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
+
 /**
  * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
  * This function updates qemu's dirty bitmap using 
cpu_physical_memory_set_dirty().
@@ -300,8 +331,6 @@ static int 
kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 {
 KVMState *s = kvm_state;
 unsigned long size, allocated_size = 0;
-target_phys_addr_t phys_addr;
-ram_addr_t addr;
 KVMDirtyLog d;
 KVMSlot *mem;
 int ret = 0;
@@ -313,7 +342,7 @@ static int 
kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 break;
 }
 
-size = ((mem->memory_size >> TARGET_PAGE_BITS) + 7) / 8;
+size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8;
 if (!d.dirty_bitmap) {
 d.dirty_bitmap = qemu_malloc(size);
 } else if (size > allocated_size) {
@@ -330,17 +359,9 @@ static int 
kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 break;
 }
 
-for (phys_addr = mem->start_addr, addr = mem->phys_offset;
- phys_addr < mem->start_addr + mem->memory_size;
- phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
-unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
-
-if (test_le_bit(nr, bitmap)) {
-cpu_physical_memory_set_dirty(addr);
-}
-}
-start_addr = phys_addr;
+kvm_get_dirty_pages_log_range(mem->start_addr, d.dirty_bitmap,
+  mem->start_addr, mem->memory_size);
+start_addr = mem->start_addr + mem->memory_size;
 }
 qemu_free(d.dirty_bitmap);
 
-- 
1.6.6.1



[PATCH 01/10] KVM: x86: Add debug register saving and restoring

2010-04-26 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Make use of the new KVM_GET/SET_DEBUGREGS to save/restore the x86 debug
registers.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 kvm-all.c |   11 ++
 kvm.h |1 +
 target-i386/kvm.c |   55 +
 3 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2ede4b9..d050115 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -64,6 +64,7 @@ struct KVMState
 int migration_log;
 int vcpu_events;
 int robust_singlestep;
+int debugregs;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
 struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
@@ -664,6 +665,11 @@ int kvm_init(int smp_cpus)
 kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
 #endif
 
+s->debugregs = 0;
+#ifdef KVM_CAP_DEBUGREGS
+s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
+#endif
+
 ret = kvm_arch_init(s, smp_cpus);
 if (ret < 0)
 goto err;
@@ -939,6 +945,11 @@ int kvm_has_robust_singlestep(void)
 return kvm_state->robust_singlestep;
 }
 
+int kvm_has_debugregs(void)
+{
+return kvm_state->debugregs;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
diff --git a/kvm.h b/kvm.h
index ae87d85..70bfbf8 100644
--- a/kvm.h
+++ b/kvm.h
@@ -40,6 +40,7 @@ int kvm_init(int smp_cpus);
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
+int kvm_has_debugregs(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5513472..bb6dafa 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -874,6 +874,53 @@ static int kvm_guest_debug_workarounds(CPUState *env)
 return ret;
 }
 
+static int kvm_put_debugregs(CPUState *env)
+{
+#ifdef KVM_CAP_DEBUGREGS
+struct kvm_debugregs dbgregs;
+int i;
+
+if (!kvm_has_debugregs()) {
+return 0;
+}
+
+for (i = 0; i < 4; i++) {
+dbgregs.db[i] = env->dr[i];
+}
+dbgregs.dr6 = env->dr[6];
+dbgregs.dr7 = env->dr[7];
+dbgregs.flags = 0;
+
+return kvm_vcpu_ioctl(env, KVM_SET_DEBUGREGS, &dbgregs);
+#else
+return 0;
+#endif
+}
+
+static int kvm_get_debugregs(CPUState *env)
+{
+#ifdef KVM_CAP_DEBUGREGS
+struct kvm_debugregs dbgregs;
+int i, ret;
+
+if (!kvm_has_debugregs()) {
+return 0;
+}
+
+ret = kvm_vcpu_ioctl(env, KVM_GET_DEBUGREGS, &dbgregs);
+if (ret < 0) {
+   return ret;
+}
+for (i = 0; i < 4; i++) {
+env->dr[i] = dbgregs.db[i];
+}
+env->dr[4] = env->dr[6] = dbgregs.dr6;
+env->dr[5] = env->dr[7] = dbgregs.dr7;
+#endif
+
+return 0;
+}
+
 int kvm_arch_put_registers(CPUState *env, int level)
 {
 int ret;
@@ -909,6 +956,10 @@ int kvm_arch_put_registers(CPUState *env, int level)
 if (ret < 0)
 return ret;
 
+ret = kvm_put_debugregs(env);
+if (ret < 0)
+return ret;
+
 return 0;
 }
 
@@ -940,6 +991,10 @@ int kvm_arch_get_registers(CPUState *env)
 if (ret < 0)
 return ret;
 
+ret = kvm_get_debugregs(env);
+if (ret < 0)
+return ret;
+
 return 0;
 }
 
-- 
1.6.6.1



[PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Marcelo Tosatti
Which allows drivers to register an mmaped region into ram block mappings.
To be used by device assignment driver.

CC: Cam Macdonell c...@cs.ualberta.ca
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 cpu-common.h |1 +
 exec.c   |   28 
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index b24cecc..2dfde6f 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -40,6 +40,7 @@ static inline void 
cpu_register_physical_memory(target_phys_addr_t start_addr,
 }
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_map(ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 14d1fd7..648a9c9 100644
--- a/exec.c
+++ b/exec.c
@@ -2789,6 +2789,34 @@ static void *file_ram_alloc(ram_addr_t memory, const 
char *path)
 }
 #endif
 
+ram_addr_t qemu_ram_map(ram_addr_t size, void *host)
+{
+RAMBlock *new_block;
+
+size = TARGET_PAGE_ALIGN(size);
+new_block = qemu_malloc(sizeof(*new_block));
+
+new_block->host = host;
+
+new_block->offset = last_ram_offset;
+new_block->length = size;
+
+new_block->next = ram_blocks;
+ram_blocks = new_block;
+
+phys_ram_dirty = qemu_realloc(phys_ram_dirty,
+(last_ram_offset + size) >> TARGET_PAGE_BITS);
+memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS),
+   0xff, size >> TARGET_PAGE_BITS);
+
+last_ram_offset += size;
+
+if (kvm_enabled())
+kvm_setup_guest_memory(new_block->host, size);
+
+return new_block->offset;
+}
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size)
 {
 RAMBlock *new_block;
-- 
1.6.6.1



[PATCH 07/10] vga: fix typo in length passed to kvm_log_stop

2010-04-26 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 hw/vga.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vga.c b/hw/vga.c
index 845dbcc..db72115 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -1618,8 +1618,8 @@ void vga_dirty_log_stop(VGACommonState *s)
kvm_log_stop(s->map_addr, s->map_end - s->map_addr);
 
 if (kvm_enabled() && s->lfb_vram_mapped) {
-   kvm_log_stop(isa_mem_base + 0xa0000, 0x80000);
-   kvm_log_stop(isa_mem_base + 0xa8000, 0x80000);
+   kvm_log_stop(isa_mem_base + 0xa0000, 0x8000);
+   kvm_log_stop(isa_mem_base + 0xa8000, 0x8000);
 }
 
 #ifdef CONFIG_BOCHS_VBE
-- 
1.6.6.1



[PATCH 00/10] [PULL] qemu-kvm.git uq/master queue

2010-04-26 Thread Marcelo Tosatti
The following changes since commit a303f9e37b87ced34e966dc2c0b7f86bc5e74035:
  Blue Swirl (1):
sh4: remove dead assignments, spotted by clang analyzer

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Jan Kiszka (1):
  KVM: x86: Add debug register saving and restoring

Marcelo Tosatti (8):
  target-i386: print EFER in cpu_dump_state
  kvm: handle internal error
  kvm_init_vcpu requires global lock held
  kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu
  vga: fix typo in length passed to kvm_log_stop
  introduce leul_to_cpu
  kvm: port qemu-kvm's bitmap scanning
  introduce qemu_ram_map

Sheng Yang (1):
  kvm: allow qemu to set EPT identity mapping address

 bswap.h  |2 +
 cpu-common.h |1 +
 cpus.c   |2 +-
 exec.c   |   28 ++
 hw/vga.c |4 +-
 kvm-all.c|   96 +-
 kvm.h|1 +
 target-i386/helper.c |1 +
 target-i386/kvm.c|   81 +-
 9 files changed, 195 insertions(+), 21 deletions(-)


[PATCH 02/10] target-i386: print EFER in cpu_dump_state

2010-04-26 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/helper.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 3835835..c9508a8 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -356,6 +356,7 @@ void cpu_dump_state(CPUState *env, FILE *f,
 cc_op_name);
 }
 }
+cpu_fprintf(f, "EFER=%016" PRIx64 "\n", env->efer);
 if (flags & X86_DUMP_FPU) {
 int fptag;
 fptag = 0;
-- 
1.6.6.1



[PATCH 05/10] kvm_init_vcpu requires global lock held

2010-04-26 Thread Marcelo Tosatti
Since it accesses data protected by the lock.

Signed-off-by: Avi Kivity a...@redhat.com
---
 cpus.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8450ee4..2bf87d2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -401,6 +401,7 @@ static void *kvm_cpu_thread_fn(void *arg)
 {
 CPUState *env = arg;
 
+qemu_mutex_lock(&qemu_global_mutex);
 qemu_thread_self(env->thread);
 if (kvm_enabled())
 kvm_init_vcpu(env);
@@ -408,7 +409,6 @@ static void *kvm_cpu_thread_fn(void *arg)
 kvm_block_io_signals(env);
 
 /* signal CPU creation */
-qemu_mutex_lock(&qemu_global_mutex);
 env->created = 1;
 qemu_cond_signal(&qemu_cpu_cond);
 
-- 
1.6.6.1



[PATCH 03/10] kvm: handle internal error

2010-04-26 Thread Marcelo Tosatti
Port qemu-kvm's KVM_EXIT_INTERNAL_ERROR handling to upstream.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm-all.c |   31 +++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index d050115..9c8aa7d 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -730,6 +730,32 @@ static int kvm_handle_io(uint16_t port, void *data, int 
direction, int size,
 return 1;
 }
 
+#ifdef KVM_CAP_INTERNAL_ERROR_DATA
+static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
+{
+
+if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
+int i;
+
+fprintf(stderr, "KVM internal error. Suberror: %d\n",
+run->internal.suberror);
+
+for (i = 0; i < run->internal.ndata; ++i) {
+fprintf(stderr, "extra data[%d]: %"PRIx64"\n",
+i, (uint64_t)run->internal.data[i]);
+}
+}
+cpu_dump_state(env, stderr, fprintf, 0);
+if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
+fprintf(stderr, "emulation failure\n");
+}
+/* FIXME: Should trigger a qmp message to let management know
+ * something went wrong.
+ */
+vm_stop(0);
+}
+#endif
+
 void kvm_flush_coalesced_mmio_buffer(void)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
@@ -845,6 +871,11 @@ int kvm_cpu_exec(CPUState *env)
 case KVM_EXIT_EXCEPTION:
 DPRINTF("kvm_exit_exception\n");
 break;
+#ifdef KVM_CAP_INTERNAL_ERROR_DATA
+case KVM_EXIT_INTERNAL_ERROR:
+kvm_handle_internal_error(env, run);
+break;
+#endif
 case KVM_EXIT_DEBUG:
 DPRINTF("kvm_exit_debug\n");
 #ifdef KVM_CAP_SET_GUEST_DEBUG
-- 
1.6.6.1



Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure

2010-04-26 Thread Jeremy Fitzhardinge
On 04/26/2010 10:46 AM, Glauber Costa wrote:
 This patch removes one padding byte and transform it into a flags
 field. New versions of guests using pvclock will query these flags
 upon each read.
   

Is this necessary?  Why not just make the pvclock driver maintain a
local flag set, and have the HV backend call into it to update it.  Why
does it need to be part of the pvclock structure?

J

 Flags, however, will only be interpreted when the guest decides to.
 It uses the pvclock_valid_flags function to signal that a specific
 set of flags should be taken into consideration. Which flags are valid
 are usually devised via HV negotiation.

 Signed-off-by: Glauber Costa glom...@redhat.com
 CC: Jeremy Fitzhardinge jer...@goop.org
 ---
  arch/x86/include/asm/pvclock-abi.h |3 ++-
  arch/x86/include/asm/pvclock.h |1 +
  arch/x86/kernel/pvclock.c  |9 +
  3 files changed, 12 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/include/asm/pvclock-abi.h 
 b/arch/x86/include/asm/pvclock-abi.h
 index 6d93508..ec5c41a 100644
 --- a/arch/x86/include/asm/pvclock-abi.h
 +++ b/arch/x86/include/asm/pvclock-abi.h
 @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
   u64   system_time;
   u32   tsc_to_system_mul;
   s8    tsc_shift;
 - u8    pad[3];
 + u8    flags;
 + u8    pad[2];
  } __attribute__((__packed__)); /* 32 bytes */
  
  struct pvclock_wall_clock {
 diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
 index 53235fd..c50823f 100644
 --- a/arch/x86/include/asm/pvclock.h
 +++ b/arch/x86/include/asm/pvclock.h
 @@ -6,6 +6,7 @@
  
  /* some helper functions for xen and kvm pv clock sources */
  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
 +void pvclock_valid_flags(u8 flags);
  unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
  void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
   struct pvclock_vcpu_time_info *vcpu,
 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 03801f2..8f4af7b 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -31,8 +31,16 @@ struct pvclock_shadow_time {
   u32 tsc_to_nsec_mul;
   int tsc_shift;
   u32 version;
 + u8  flags;
  };
  
 +static u8 valid_flags = 0;
 +
 +void pvclock_valid_flags(u8 flags)
 +{
 + valid_flags = flags;
 +}
 +
  /*
   * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
   * yielding a 64-bit result.
 @@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
   dst->system_timestamp  = src->system_time;
   dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
   dst->tsc_shift = src->tsc_shift;
 + dst->flags = src->flags;
   rmb();  /* test version after fetching data */
   } while ((src->version & 1) || (dst->version != src->version));
  
   



Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Anthony Liguori

On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:

Which allows drivers to register an mmaped region into ram block mappings.
To be used by device assignment driver.

   


This is not kvm specific and not required by this pull request so it 
shouldn't really be part of the pull.  Something like this should only 
be added when there's an actual consumer.


Regards,

Anthony Liguori


CC: Cam Macdonell <c...@cs.ualberta.ca>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
---
  cpu-common.h |1 +
  exec.c   |   28 
  2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index b24cecc..2dfde6f 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -40,6 +40,7 @@ static inline void 
cpu_register_physical_memory(target_phys_addr_t start_addr,
  }

  ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_map(ram_addr_t size, void *host);
  ram_addr_t qemu_ram_alloc(ram_addr_t);
  void qemu_ram_free(ram_addr_t addr);
  /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 14d1fd7..648a9c9 100644
--- a/exec.c
+++ b/exec.c
@@ -2789,6 +2789,34 @@ static void *file_ram_alloc(ram_addr_t memory, const 
char *path)
  }
  #endif

+ram_addr_t qemu_ram_map(ram_addr_t size, void *host)
+{
+RAMBlock *new_block;
+
+size = TARGET_PAGE_ALIGN(size);
+new_block = qemu_malloc(sizeof(*new_block));
+
+new_block->host = host;
+
+new_block->offset = last_ram_offset;
+new_block->length = size;
+
+new_block->next = ram_blocks;
+ram_blocks = new_block;
+
+phys_ram_dirty = qemu_realloc(phys_ram_dirty,
+(last_ram_offset + size) >> TARGET_PAGE_BITS);
+memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS),
+   0xff, size >> TARGET_PAGE_BITS);
+
+last_ram_offset += size;
+
+if (kvm_enabled())
+kvm_setup_guest_memory(new_block->host, size);
+
+return new_block->offset;
+}
+
  ram_addr_t qemu_ram_alloc(ram_addr_t size)
  {
  RAMBlock *new_block;
   




Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure

2010-04-26 Thread Glauber Costa
On Mon, Apr 26, 2010 at 11:11:57AM -0700, Jeremy Fitzhardinge wrote:
> On 04/26/2010 10:46 AM, Glauber Costa wrote:
> > This patch removes one padding byte and transform it into a flags
> > field. New versions of guests using pvclock will query these flags
> > upon each read.
>
> Is this necessary?  Why not just make the pvclock driver maintain a
> local flag set, and have the HV backend call into it to update it.  Why
> does it need to be part of the pvclock structure?

Because it is already there, and we have plenty of space left?

There are obviously other ways, but I don't see any of them being simpler.
If we go by the method you suggested, we'd have, for instance, to register
the memory area where these flags live, which would duplicate the
infrastructure already present in kvmclock.
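
To make the "plenty of space left" point concrete, here is a sketch of the
structure layout after the change. The fields before system_time are not
shown in the quoted diff and are filled in from the kernel header as I
remember it, so treat them as an assumption. The _sketch suffix and
pvclock_effective_flags() are hypothetical names, and the packed attribute
is the GCC/Clang spelling.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout sketch; leading fields are assumed, trailing ones are from the diff. */
struct pvclock_vcpu_time_info_sketch {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;     /* previously the first byte of pad[3] */
    uint8_t  pad[2];
} __attribute__((__packed__));

/* Reusing a padding byte keeps the ABI size at 32 bytes. */
_Static_assert(sizeof(struct pvclock_vcpu_time_info_sketch) == 32,
               "structure size must stay 32 bytes");

/*
 * Hypothetical reader: only honor the flag bits the guest has declared
 * valid, in the spirit of pvclock_valid_flags().
 */
uint8_t pvclock_effective_flags(const struct pvclock_vcpu_time_info_sketch *src,
                                uint8_t valid_flags)
{
    return src->flags & valid_flags;
}
```

Since the flags travel in the same shared area as the time values, no extra
registration step is needed to get them to the guest.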




Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Marcelo Tosatti
On Mon, Apr 26, 2010 at 01:27:30PM -0500, Anthony Liguori wrote:
> On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
> > Which allows drivers to register an mmaped region into ram block mappings.
> > To be used by device assignment driver.
>
> This doesn't make much sense to me.
>
> Do you use this like:
>
> qemu_ram_map(64k, ptr);
> assert(qemu_ram_alloc(64k) == ptr);

No. hw/device-assignment.c in qemu-kvm mmaps
/sys/bus/pci/devices/x:y:z/resourcen (the PCI device's memory regions) to
the guest.

> If so, I think this is not the best API.  I'd rather see
> qemu_ram_map() register a symbolic name for the region and for there
> to be a qemu_ram_alloc() variant that allocated by name.
>
> Regards,
>
> Anthony Liguori
 


Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Marcelo Tosatti
On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
> On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
> > Which allows drivers to register an mmaped region into ram block mappings.
> > To be used by device assignment driver.
>
> This is not kvm specific and not required by this pull request so it
> shouldn't really be part of the pull.  Something like this should
> only be added when there's an actual consumer.

The user will be hw/device-assignment.c in qemu-kvm. And also Cam has
the need for a similar interface for shared memory drivers.

Index: qemu-kvm/hw/device-assignment.c
===
--- qemu-kvm.orig/hw/device-assignment.c2010-04-22 16:21:30.0 
-0400
+++ qemu-kvm/hw/device-assignment.c 2010-04-22 17:36:57.0 -0400
@@ -256,10 +256,7 @@
 AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
 AssignedDevRegion *region = r_dev-v_addrs[region_num];
 PCIRegion *real_region = r_dev-real_device.regions[region_num];
-pcibus_t old_ephys = region-e_physbase;
-pcibus_t old_esize = region-e_size;
-int first_map = (region-e_size == 0);
-int ret = 0;
+int ret = 0, flags = 0;
 
 DEBUG(e_phys=%08 FMT_PCIBUS  r_virt=%p type=%d len=%08 FMT_PCIBUS  
region_num=%d \n,
   e_phys, region-u.r_virtbase, type, e_size, region_num);
@@ -267,30 +264,22 @@
 region-e_physbase = e_phys;
 region-e_size = e_size;
 
-if (!first_map)
-   kvm_destroy_phys_mem(kvm_context, old_ephys,
- TARGET_PAGE_ALIGN(old_esize));
-
 if (e_size  0) {
+
+if (region_num == PCI_ROM_SLOT)
+flags |= IO_MEM_ROM;
+
+cpu_register_physical_memory(e_phys, e_size, region-memory_index | 
flags);
+
 /* deal with MSI-X MMIO page */
 if (real_region-base_addr = r_dev-msix_table_addr 
 real_region-base_addr + real_region-size =
 r_dev-msix_table_addr) {
 int offset = r_dev-msix_table_addr - real_region-base_addr;
-ret = munmap(region-u.r_virtbase + offset, TARGET_PAGE_SIZE);
-if (ret == 0)
-DEBUG(munmap done, virt_base 0x%p\n,
-region-u.r_virtbase + offset);
-else {
-fprintf(stderr, %s: fail munmap msix table!\n, __func__);
-exit(1);
-}
+
 cpu_register_physical_memory(e_phys + offset,
 TARGET_PAGE_SIZE, r_dev-mmio_index);
 }
-   ret = kvm_register_phys_mem(kvm_context, e_phys,
-region-u.r_virtbase,
-TARGET_PAGE_ALIGN(e_size), 0);
 }
 
 if (ret != 0) {
@@ -539,6 +528,15 @@
 pci_dev-v_addrs[i].u.r_virtbase +=
 (cur_region-base_addr  0xFFF);
 
+
+if (!slow_map) {
+void *virtbase = pci_dev-v_addrs[i].u.r_virtbase;
+
+pci_dev-v_addrs[i].memory_index = 
qemu_ram_map(cur_region-size,
+virtbase);
+} else
+pci_dev-v_addrs[i].memory_index = 0;
+
 pci_register_bar((PCIDevice *) pci_dev, i,
  cur_region-size, t,
  slow_map ? assigned_dev_iomem_map_slow
@@ -726,10 +724,6 @@
 kvm_remove_ioperm_data(region-u.r_baseport, region-r_size);
 continue;
 } else if (pci_region-type  IORESOURCE_MEM) {
-if (region-e_size  0)
-kvm_destroy_phys_mem(kvm_context, region-e_physbase,
- TARGET_PAGE_ALIGN(region-e_size));
-
 if (region-u.r_virtbase) {
 int ret = munmap(region-u.r_virtbase,
  (pci_region-size + 0xFFF)  0xF000);
Index: qemu-kvm/hw/device-assignment.h
===
--- qemu-kvm.orig/hw/device-assignment.h2010-04-22 16:21:30.0 
-0400
+++ qemu-kvm/hw/device-assignment.h 2010-04-22 16:24:32.0 -0400
@@ -63,7 +63,7 @@
 
 typedef struct {
 pcibus_t e_physbase;
-uint32_t memory_index;
+ram_addr_t memory_index;
 union {
 void *r_virtbase;/* mmapped access address for memory regions */
 uint32_t r_baseport; /* the base guest port for I/O regions */


Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Anthony Liguori

On 04/26/2010 01:49 PM, Marcelo Tosatti wrote:

On Mon, Apr 26, 2010 at 01:27:30PM -0500, Anthony Liguori wrote:
   

On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
 

Which allows drivers to register an mmaped region into ram block mappings.
To be used by device assignment driver.
   

This doesn't make much sense to me.

Do you use this like:

qemu_ram_map(64k, ptr);
assert(qemu_ram_alloc(64k) == ptr);
 

No. hw/device-assignment.c in qemu-kvm mmaps
/sys/bus/pci/devices/x:y:z/resourcen (the PCI device's memory regions) to
the guest.
   


I understand, but how do you use qemu_ram_map() to actually map that 
memory to a given PCI device resource?  I assume you rely on it getting 
put on the front of the list so that the next qemu_ram_alloc() will be 
at that location.


Regards,

Anthony Liguori


If so, I think this is not the best API.  I'd rather see
qemu_ram_map() register a symbolic name for the region and for there
to be a qemu_ram_alloc() variant that allocated by name.

Regards,

Anthony Liguori

 




Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Anthony Liguori

On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:

On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
   

On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
 

Which allows drivers to register an mmaped region into ram block mappings.
To be used by device assignment driver.

   

This is not kvm specific and not required by this pull request so it
shouldn't really be part of the pull.  Something like this should
only be added when there's an actual consumer.
 

The user will be hw/device-assignment.c in qemu-kvm. And also Cam has
the need for a similar interface for shared memory drivers.
   


It should be part of one of those submissions.


Index: qemu-kvm/hw/device-assignment.c
===
--- qemu-kvm.orig/hw/device-assignment.c2010-04-22 16:21:30.0 
-0400
+++ qemu-kvm/hw/device-assignment.c 2010-04-22 17:36:57.0 -0400
@@ -256,10 +256,7 @@
  AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
  AssignedDevRegion *region =r_dev-v_addrs[region_num];
  PCIRegion *real_region =r_dev-real_device.regions[region_num];
-pcibus_t old_ephys = region-e_physbase;
-pcibus_t old_esize = region-e_size;
-int first_map = (region-e_size == 0);
-int ret = 0;
+int ret = 0, flags = 0;

  DEBUG(e_phys=%08 FMT_PCIBUS  r_virt=%p type=%d len=%08 FMT_PCIBUS  
region_num=%d \n,
e_phys, region-u.r_virtbase, type, e_size, region_num);
@@ -267,30 +264,22 @@
  region-e_physbase = e_phys;
  region-e_size = e_size;

-if (!first_map)
-   kvm_destroy_phys_mem(kvm_context, old_ephys,
- TARGET_PAGE_ALIGN(old_esize));
-
  if (e_size  0) {
+
+if (region_num == PCI_ROM_SLOT)
+flags |= IO_MEM_ROM;
+
+cpu_register_physical_memory(e_phys, e_size, region-memory_index | 
flags);
+
  /* deal with MSI-X MMIO page */
  if (real_region-base_addr= r_dev-msix_table_addr
  real_region-base_addr + real_region-size=
  r_dev-msix_table_addr) {
  int offset = r_dev-msix_table_addr - real_region-base_addr;
-ret = munmap(region-u.r_virtbase + offset, TARGET_PAGE_SIZE);
-if (ret == 0)
-DEBUG(munmap done, virt_base 0x%p\n,
-region-u.r_virtbase + offset);
-else {
-fprintf(stderr, %s: fail munmap msix table!\n, __func__);
-exit(1);
-}
+
  cpu_register_physical_memory(e_phys + offset,
  TARGET_PAGE_SIZE, r_dev-mmio_index);
  }
-   ret = kvm_register_phys_mem(kvm_context, e_phys,
-region-u.r_virtbase,
-TARGET_PAGE_ALIGN(e_size), 0);
  }

  if (ret != 0) {
@@ -539,6 +528,15 @@
  pci_dev-v_addrs[i].u.r_virtbase +=
  (cur_region-base_addr  0xFFF);

+
+if (!slow_map) {
+void *virtbase = pci_dev-v_addrs[i].u.r_virtbase;
+
+pci_dev-v_addrs[i].memory_index = 
qemu_ram_map(cur_region-size,
+virtbase);
+} else
+pci_dev-v_addrs[i].memory_index = 0;
+
  pci_register_bar((PCIDevice *) pci_dev, i,
   cur_region-size, t,
   slow_map ? assigned_dev_iomem_map_slow
@@ -726,10 +724,6 @@
  kvm_remove_ioperm_data(region-u.r_baseport, region-r_size);
  continue;
  } else if (pci_region-type  IORESOURCE_MEM) {
-if (region-e_size  0)
-kvm_destroy_phys_mem(kvm_context, region-e_physbase,
- TARGET_PAGE_ALIGN(region-e_size));
-
  if (region-u.r_virtbase) {
  int ret = munmap(region-u.r_virtbase,
   (pci_region-size + 0xFFF)  0xF000);
   


How does hot unplug get dealt with?

Regards,

Anthony Liguori


Index: qemu-kvm/hw/device-assignment.h
===
--- qemu-kvm.orig/hw/device-assignment.h2010-04-22 16:21:30.0 
-0400
+++ qemu-kvm/hw/device-assignment.h 2010-04-22 16:24:32.0 -0400
@@ -63,7 +63,7 @@

  typedef struct {
  pcibus_t e_physbase;
-uint32_t memory_index;
+ram_addr_t memory_index;
  union {
  void *r_virtbase;/* mmapped access address for memory regions */
  uint32_t r_baseport; /* the base guest port for I/O regions */
   




Re: [PATCH] KVM: Document KVM_GET_MP_STATE and KVM_SET_MP_STATE

2010-04-26 Thread Marcelo Tosatti
On Sun, Apr 25, 2010 at 03:51:46PM +0300, Avi Kivity wrote:
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  Documentation/kvm/api.txt |   44 
  1 files changed, 44 insertions(+), 0 deletions(-)

Applied, thanks.



Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Marcelo Tosatti
On Mon, Apr 26, 2010 at 01:57:37PM -0500, Anthony Liguori wrote:
 On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:
 On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
 On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
 Which allows drivers to register an mmaped region into ram block mappings.
 To be used by device assignment driver.
 
 This is not kvm specific and not required by this pull request so it
 shouldn't really be part of the pull.  Something like this should
 only be added when there's an actual consumer.
 The user will be hw/device-assignment.c in qemu-kvm. And also Cam has
 the need for a similar interface for shared memory drivers.
 
 It should be part of one of those submissions.

OK

 @@ -726,10 +724,6 @@
   kvm_remove_ioperm_data(region-u.r_baseport, 
  region-r_size);
   continue;
   } else if (pci_region-type  IORESOURCE_MEM) {
 -if (region-e_size  0)
 -kvm_destroy_phys_mem(kvm_context, region-e_physbase,
 - TARGET_PAGE_ALIGN(region-e_size));
 -
   if (region-u.r_virtbase) {
   int ret = munmap(region-u.r_virtbase,
(pci_region-size + 0xFFF)  
  0xF000);
 
 How does hot unplug get dealt with?

The regions will have such mappings unmapped from QEMU (and KVM) via
cpu_register_physical_memory(IO_MEM_UNASSIGNED) via
pci_unregister_io_regions.

Just pushed a new tree without the patch, please pull if you
are OK with the other changes.



Re: [Qemu-devel] Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Anthony Liguori

On 04/26/2010 02:14 PM, Marcelo Tosatti wrote:

On Mon, Apr 26, 2010 at 01:57:37PM -0500, Anthony Liguori wrote:
   

On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:
 

On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
   

On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
 

Which allows drivers to register an mmaped region into ram block mappings.
To be used by device assignment driver.

   

This is not kvm specific and not required by this pull request so it
shouldn't really be part of the pull.  Something like this should
only be added when there's an actual consumer.
 

The user will be hw/device-assignment.c in qemu-kvm. And also Cam has
the need for a similar interface for shared memory drivers.
   

It should be part of one of those submissions.
 

OK

   

@@ -726,10 +724,6 @@
  kvm_remove_ioperm_data(region-u.r_baseport, region-r_size);
  continue;
  } else if (pci_region-type   IORESOURCE_MEM) {
-if (region-e_size   0)
-kvm_destroy_phys_mem(kvm_context, region-e_physbase,
- TARGET_PAGE_ALIGN(region-e_size));
-
  if (region-u.r_virtbase) {
  int ret = munmap(region-u.r_virtbase,
   (pci_region-size + 0xFFF)   
0xF000);
   

How does hot unplug get dealt with?
 

The regions will have such mappings unmapped from QEMU (and KVM) via
cpu_register_physical_memory(IO_MEM_UNASSIGNED) via
pci_unregister_io_regions.
   


But how do you qemu_ram_unmap()?  I see you munmap() that address but it 
looks like the qemu ram region gets leaked pointing to an invalid pointer.


Regards,

Anthony Liguori


Just pushed a new tree without the patch, please pull if you
are OK with the other changes.
   


Yes, I am.


   




Re: [RFC][PATCH v4 01/18] Add a new struct for device to manipulate external buffer.

2010-04-26 Thread Andy Fleming
On Sun, Apr 25, 2010 at 4:19 AM,  xiaohui@intel.com wrote:
 From: Xin Xiaohui xiaohui@intel.com

 Signed-off-by: Xin Xiaohui xiaohui@intel.com
 Signed-off-by: Zhao Yu yzha...@gmail.com
 Reviewed-by: Jeff Dike jd...@linux.intel.com
 ---
  include/linux/netdevice.h |   19 ++-
  1 files changed, 18 insertions(+), 1 deletions(-)

 diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
 index c79a88b..bf79756 100644
 --- a/include/linux/netdevice.h
 +++ b/include/linux/netdevice.h
 @@ -530,6 +530,22 @@ struct netdev_queue {
        unsigned long           tx_dropped;
  } cacheline_aligned_in_smp;

 +/* Add a structure in structure net_device, the new field is
 + * named as mp_port. It's for mediate passthru (zero-copy).
 + * It contains the capability for the net device driver,
 + * a socket, and an external buffer creator, external means
 + * skb buffer belongs to the device may not be allocated from
 + * kernel space.
 + */
 +struct mpassthru_port  {
 +       int             hdr_len;
 +       int             data_len;
 +       int             npages;
 +       unsigned        flags;
 +       struct socket   *sock;
 +       struct skb_external_page *(*ctor)(struct mpassthru_port *,
 +                               struct sk_buff *, int);
 +};


I tried searching around, but couldn't find where struct
skb_external_page is declared.  Where is it?

Andy


[PATCH v6] Add mergeable rx buffer support to vhost_net

2010-04-26 Thread David L Stevens
This patch adds mergeable receive buffer support to vhost_net.

+-DLS

Signed-off-by: David L Stevens dlstev...@us.ibm.com

diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v6/drivers/vhost/net.c
--- net-next-v0/drivers/vhost/net.c 2010-04-24 21:36:54.0 -0700
+++ net-next-v6/drivers/vhost/net.c 2010-04-26 01:13:04.0 -0700
@@ -109,7 +109,7 @@ static void handle_tx(struct vhost_net *
};
size_t len, total_len = 0;
int err, wmem;
-   size_t hdr_size;
+   size_t vhost_hlen;
struct socket *sock = rcu_dereference(vq-private_data);
if (!sock)
return;
@@ -128,13 +128,13 @@ static void handle_tx(struct vhost_net *
 
if (wmem  sock-sk-sk_sndbuf / 2)
tx_poll_stop(net);
-   hdr_size = vq-hdr_size;
+   vhost_hlen = vq-vhost_hlen;
 
for (;;) {
-   head = vhost_get_vq_desc(net-dev, vq, vq-iov,
-ARRAY_SIZE(vq-iov),
-out, in,
-NULL, NULL);
+   head = vhost_get_desc(net-dev, vq, vq-iov,
+ ARRAY_SIZE(vq-iov),
+ out, in,
+ NULL, NULL);
/* Nothing new?  Wait for eventfd to tell us they refilled. */
if (head == vq-num) {
wmem = atomic_read(sock-sk-sk_wmem_alloc);
@@ -155,20 +155,20 @@ static void handle_tx(struct vhost_net *
break;
}
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq-iov, vq-hdr, hdr_size, out);
+   s = move_iovec_hdr(vq-iov, vq-hdr, vhost_hlen, out);
msg.msg_iovlen = out;
len = iov_length(vq-iov, out);
/* Sanity check */
if (!len) {
vq_err(vq, Unexpected header len for TX: 
   %zd expected %zd\n,
-  iov_length(vq-hdr, s), hdr_size);
+  iov_length(vq-hdr, s), vhost_hlen);
break;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock-ops-sendmsg(NULL, sock, msg, len);
if (unlikely(err  0)) {
-   vhost_discard_vq_desc(vq);
+   vhost_discard_desc(vq, 1);
tx_poll_start(net, sock);
break;
}
@@ -187,12 +187,25 @@ static void handle_tx(struct vhost_net *
unuse_mm(net-dev.mm);
 }
 
+static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
+{
+   struct sk_buff *head;
+   int len = 0;
+
+   lock_sock(sk);
+   head = skb_peek(&sk->sk_receive_queue);
+   if (head)
+   len = head->len + vq->sock_hlen;
+   release_sock(sk);
+   return len;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_RX];
-   unsigned head, out, in, log, s;
+   unsigned in, log, s;
struct vhost_log *vq_log;
struct msghdr msg = {
.msg_name = NULL,
@@ -203,14 +216,14 @@ static void handle_rx(struct vhost_net *
.msg_flags = MSG_DONTWAIT,
};
 
-   struct virtio_net_hdr hdr = {
-   .flags = 0,
-   .gso_type = VIRTIO_NET_HDR_GSO_NONE
+   struct virtio_net_hdr_mrg_rxbuf hdr = {
+   .hdr.flags = 0,
+   .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
};
 
size_t len, total_len = 0;
-   int err;
-   size_t hdr_size;
+   int err, headcount, datalen;
+   size_t vhost_hlen;
struct socket *sock = rcu_dereference(vq-private_data);
if (!sock || skb_queue_empty(sock-sk-sk_receive_queue))
return;
@@ -218,18 +231,19 @@ static void handle_rx(struct vhost_net *
use_mm(net-dev.mm);
mutex_lock(vq-mutex);
vhost_disable_notify(vq);
-   hdr_size = vq-hdr_size;
+   vhost_hlen = vq-vhost_hlen;
 
vq_log = unlikely(vhost_has_feature(net-dev, VHOST_F_LOG_ALL)) ?
vq-log : NULL;
 
-   for (;;) {
-   head = vhost_get_vq_desc(net-dev, vq, vq-iov,
-ARRAY_SIZE(vq-iov),
-out, in,
-vq_log, log);
+   while ((datalen = vhost_head_len(vq, sock-sk))) {
+   headcount = vhost_get_desc_n(vq, vq-heads,
+datalen + vhost_hlen,
+in, vq_log, log);
+   if 
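
With mergeable receive buffers, one packet may span several guest
buffers, and the virtio-net header's num_buffers field records how many
were used. The counting step can be sketched as below. This is a
standalone illustration, not the patch's vhost_get_desc_n();
demo_heads_needed() and its arguments are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many receive buffers, taken in order, are needed to hold
 * `total` bytes (packet data plus header). Returns -1 if the buffers
 * on offer cannot hold the packet.
 */
static int demo_heads_needed(const size_t *buf_len, int nbufs, size_t total)
{
    size_t covered = 0;

    assert(buf_len != NULL || nbufs == 0);
    for (int i = 0; i < nbufs; i++) {
        covered += buf_len[i];
        if (covered >= total)
            return i + 1;   /* this many buffers are consumed */
    }
    return -1;              /* not enough space; the caller must wait */
}
```

A small packet still fits in one buffer, so the single-buffer case is just
the result 1.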

Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes

2010-04-26 Thread Marcelo Tosatti
On Thu, Apr 22, 2010 at 02:15:14PM -0700, Eric Northup wrote:
 I've been reading the x86's mmu.c recently and had been wondering
 about something.  Avi's recent mmu documentation (thanks!) seems to
 have confirmed my understanding of how the shadow paging is supposed
 to be working.  In TDP mode, when mmu_alloc_roots() calls
 kvm_mmu_get_page(), why does it pass (vcpu->arch.cr3 >> PAGE_SHIFT) or
 (vcpu->arch.mmu.pae_root[i]) as gfn?
 
 It seems to me that in TDP mode, gfn should be either zero for the
 root page table, or 0/1GB/2GB/3GB (for PAE page tables).
 
 The existing behavior can lead to multiple, semantically-identical TDP
 roots being created by mmu_alloc_roots, depending on the VCPU's CR3 at
 the time that mmu_alloc_roots was called.  But the nested page tables
 should be* independent of the VCPU state. That wastes some memory and
 causes extra page faults while populating the extra copies of the page
 tables.
 
 *assuming that we aren't modeling per-VCPU state that might change the
 physical address map as seen by that VCPU, such as setting the APIC
 base to an address overlapping RAM.
 
 All feedback would be welcome, since I'm new to this system!  A
 strawman patch follows.
 
 thanks,
 -Eric
 
 --
 
 For TDP mode, avoid creating multiple page table roots for the single
 guest-to-host physical address map by fixing the inputs used for the
 shadow page table hash in mmu_alloc_roots().
 
 Signed-off-by: Eric Northup digitale...@google.com
 ---
  arch/x86/kvm/mmu.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index ddfa865..9696d65 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -2059,10 +2059,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 hpa_t root = vcpu->arch.mmu.root_hpa;
 
 ASSERT(!VALID_PAGE(root));
 -   if (tdp_enabled)
 -   direct = 1;
 if (mmu_check_root(vcpu, root_gfn))
 return 1;
 +   if (tdp_enabled) {
 +   direct = 1;
 +   root_gfn = 0;
 +   }
 sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
   PT64_ROOT_LEVEL, direct,
   ACC_ALL, NULL);
 @@ -2072,8 +2074,6 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 return 0;
 }
 direct = !is_paging(vcpu);
 -   if (tdp_enabled)
 -   direct = 1;
 for (i = 0; i < 4; ++i) {
 hpa_t root = vcpu->arch.mmu.pae_root[i];
 
 @@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 root_gfn = 0;
 if (mmu_check_root(vcpu, root_gfn))
 return 1;
 +   if (tdp_enabled) {
 +   direct = 1;
 +   root_gfn = i << 30;
 +   }
 sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
   PT32_ROOT_LEVEL, direct,
   ACC_ALL, NULL);

There is no need to allocate 4 different roots for TDP tables if
kvm_x86_ops->get_tdp_level() == PT64_ROOT_LEVEL.
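
The duplicate-root effect described in the quoted report can be shown
with a toy lookup table. This is a simplified sketch, not the kernel's
shadow page hash: demo_get_page(), demo_alloc_root() and the `normalize`
switch are hypothetical, and only the gfn and the direct bit are used as
the key:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_PAGES 16

/* Toy stand-in for a shadow page, keyed by (gfn, direct). */
struct demo_sp {
    uint64_t gfn;
    int direct;
};

static struct demo_sp demo_pages[DEMO_MAX_PAGES];
static int demo_nr_pages;

/* Return the existing page for this key, or create a new one. */
static struct demo_sp *demo_get_page(uint64_t gfn, int direct)
{
    for (int i = 0; i < demo_nr_pages; i++) {
        if (demo_pages[i].gfn == gfn && demo_pages[i].direct == direct)
            return &demo_pages[i];
    }
    assert(demo_nr_pages < DEMO_MAX_PAGES);
    demo_pages[demo_nr_pages] = (struct demo_sp){ gfn, direct };
    return &demo_pages[demo_nr_pages++];
}

/*
 * Allocate a 64-bit root. With `normalize` set, TDP roots always use
 * gfn 0, so every guest CR3 value maps to the same root page.
 */
static struct demo_sp *demo_alloc_root(uint64_t cr3, int tdp_enabled,
                                       int normalize)
{
    uint64_t root_gfn = cr3 >> 12;   /* 4 KiB pages assumed */
    int direct = 0;

    if (tdp_enabled) {
        direct = 1;
        if (normalize)
            root_gfn = 0;
    }
    return demo_get_page(root_gfn, direct);
}
```

Without normalization, two different CR3 values produce two table entries
for what is semantically the same guest-physical map; with it, both
lookups hit the same entry.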




Re: [Qemu-devel] Re: [PATCH 10/10] introduce qemu_ram_map

2010-04-26 Thread Marcelo Tosatti
On Mon, Apr 26, 2010 at 02:20:42PM -0500, Anthony Liguori wrote:
 On 04/26/2010 02:14 PM, Marcelo Tosatti wrote:
 On Mon, Apr 26, 2010 at 01:57:37PM -0500, Anthony Liguori wrote:
 On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:
 On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
 On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
 Which allows drivers to register an mmaped region into ram block 
 mappings.
 To be used by device assignment driver.
 
 This is not kvm specific and not required by this pull request so it
 shouldn't really be part of the pull.  Something like this should
 only be added when there's an actual consumer.
 The user will be hw/device-assignment.c in qemu-kvm. And also Cam has
 the need for a similar interface for shared memory drivers.
 It should be part of one of those submissions.
 OK
 
 @@ -726,10 +724,6 @@
   kvm_remove_ioperm_data(region-u.r_baseport, 
  region-r_size);
   continue;
   } else if (pci_region-type   IORESOURCE_MEM) {
 -if (region-e_size   0)
 -kvm_destroy_phys_mem(kvm_context, region-e_physbase,
 - 
 TARGET_PAGE_ALIGN(region-e_size));
 -
   if (region-u.r_virtbase) {
   int ret = munmap(region-u.r_virtbase,
(pci_region-size + 0xFFF)   
  0xF000);
 How does hot unplug get dealt with?
 The regions will have such mappings unmapped from QEMU (and KVM) via
 cpu_register_physical_memory(IO_MEM_UNASSIGNED) via
 pci_unregister_io_regions.
 
 But how do you qemu_ram_unmap()?  I see you munmap() that address
 but it looks like the qemu ram region gets leaked pointing to an
 invalid pointer.

Yes, qemu_ram_free() is not implemented. last_ram_offset always moves
forward. But there should be no references to the memory mapping
anymore, after the device is hot-unplugged.
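
The behaviour described above, where the offset only ever grows, can be
sketched as a bump allocator with no free operation. The names below
(demo_ram_map(), demo_hot_unplug(), demo_last_ram_offset) are
hypothetical and only mirror the idea, not QEMU's actual code:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long demo_ram_addr_t;

/* Next free offset in the (hypothetical) ram block address space. */
static demo_ram_addr_t demo_last_ram_offset;

/* Hand out the next range; offsets are never reused. */
static demo_ram_addr_t demo_ram_map(size_t size)
{
    demo_ram_addr_t offset = demo_last_ram_offset;

    assert(size > 0);
    demo_last_ram_offset += size;
    return offset;
}

/*
 * Hot-unplug without a free operation: the host mapping may be gone,
 * but the range stays reserved, so the address space leaks.
 */
static void demo_hot_unplug(demo_ram_addr_t offset, size_t size)
{
    (void)offset;
    (void)size;
}
```

Repeated plug and unplug cycles therefore consume new offsets each time,
even though no live references to the old ranges remain.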



Re: [PATCH v6] Add mergeable rx buffer support to vhost_net

2010-04-26 Thread Michael S. Tsirkin
On Mon, Apr 26, 2010 at 02:20:52PM -0700, David L Stevens wrote:
 This patch adds mergeable receive buffer support to vhost_net.
 
   +-DLS
 
 Signed-off-by: David L Stevens dlstev...@us.ibm.com

OK, looks good. I still think iovec handling is a bit off,
as commented below.

 diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v6/drivers/vhost/net.c
 --- net-next-v0/drivers/vhost/net.c   2010-04-24 21:36:54.0 -0700
 +++ net-next-v6/drivers/vhost/net.c   2010-04-26 01:13:04.0 -0700
 @@ -109,7 +109,7 @@ static void handle_tx(struct vhost_net *
   };
   size_t len, total_len = 0;
   int err, wmem;
 - size_t hdr_size;
 + size_t vhost_hlen;
   struct socket *sock = rcu_dereference(vq-private_data);
   if (!sock)
   return;
 @@ -128,13 +128,13 @@ static void handle_tx(struct vhost_net *
  
   if (wmem  sock-sk-sk_sndbuf / 2)
   tx_poll_stop(net);
 - hdr_size = vq-hdr_size;
 + vhost_hlen = vq-vhost_hlen;
  
   for (;;) {
 - head = vhost_get_vq_desc(net-dev, vq, vq-iov,
 -  ARRAY_SIZE(vq-iov),
 -  out, in,
 -  NULL, NULL);
 + head = vhost_get_desc(net-dev, vq, vq-iov,
 +   ARRAY_SIZE(vq-iov),
 +   out, in,
 +   NULL, NULL);
   /* Nothing new?  Wait for eventfd to tell us they refilled. */
   if (head == vq-num) {
   wmem = atomic_read(sock-sk-sk_wmem_alloc);
 @@ -155,20 +155,20 @@ static void handle_tx(struct vhost_net *
   break;
   }
   /* Skip header. TODO: support TSO. */
 - s = move_iovec_hdr(vq-iov, vq-hdr, hdr_size, out);
 + s = move_iovec_hdr(vq-iov, vq-hdr, vhost_hlen, out);
   msg.msg_iovlen = out;
   len = iov_length(vq-iov, out);
   /* Sanity check */
   if (!len) {
   vq_err(vq, Unexpected header len for TX: 
  %zd expected %zd\n,
 -iov_length(vq-hdr, s), hdr_size);
 +iov_length(vq-hdr, s), vhost_hlen);
   break;
   }
   /* TODO: Check specific error and bomb out unless ENOBUFS? */
   err = sock-ops-sendmsg(NULL, sock, msg, len);
   if (unlikely(err  0)) {
 - vhost_discard_vq_desc(vq);
 + vhost_discard_desc(vq, 1);
   tx_poll_start(net, sock);
   break;
   }
 @@ -187,12 +187,25 @@ static void handle_tx(struct vhost_net *
   unuse_mm(net-dev.mm);
  }
  
 +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
 +{
 + struct sk_buff *head;
 + int len = 0;
 +
 + lock_sock(sk);
 + head = skb_peek(&sk->sk_receive_queue);
 + if (head)
 + len = head->len + vq->sock_hlen;
 + release_sock(sk);
 + return len;
 +}
 +
  /* Expects to be always run from workqueue - which acts as
   * read-size critical section for our kind of RCU. */
  static void handle_rx(struct vhost_net *net)
  {
   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_RX];
 - unsigned head, out, in, log, s;
 + unsigned in, log, s;
   struct vhost_log *vq_log;
   struct msghdr msg = {
   .msg_name = NULL,
 @@ -203,14 +216,14 @@ static void handle_rx(struct vhost_net *
   .msg_flags = MSG_DONTWAIT,
   };
  
 - struct virtio_net_hdr hdr = {
 - .flags = 0,
 - .gso_type = VIRTIO_NET_HDR_GSO_NONE
 + struct virtio_net_hdr_mrg_rxbuf hdr = {
 + .hdr.flags = 0,
 + .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
   };
  
   size_t len, total_len = 0;
 - int err;
 - size_t hdr_size;
 + int err, headcount, datalen;
 + size_t vhost_hlen;
   struct socket *sock = rcu_dereference(vq-private_data);
   if (!sock || skb_queue_empty(sock-sk-sk_receive_queue))
   return;
 @@ -218,18 +231,19 @@ static void handle_rx(struct vhost_net *
   use_mm(net-dev.mm);
   mutex_lock(vq-mutex);
   vhost_disable_notify(vq);
 - hdr_size = vq-hdr_size;
 + vhost_hlen = vq-vhost_hlen;
  
   vq_log = unlikely(vhost_has_feature(net-dev, VHOST_F_LOG_ALL)) ?
   vq-log : NULL;
  
 - for (;;) {
 - head = vhost_get_vq_desc(net-dev, vq, vq-iov,
 -  ARRAY_SIZE(vq-iov),
 -  out, in,
 -  vq_log, log);
 + while ((datalen = vhost_head_len(vq, sock-sk))) {
 + headcount = vhost_get_desc_n(vq, vq-heads,
 +  datalen + vhost_hlen,
 +  

Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes

2010-04-26 Thread Marcelo Tosatti
On Mon, Apr 26, 2010 at 06:30:00PM -0300, Marcelo Tosatti wrote:
  @@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
  root_gfn = 0;
  if (mmu_check_root(vcpu, root_gfn))
  return 1;
  +   if (tdp_enabled) {
  +   direct = 1;
  +   root_gfn = i << 30;
  +   }
  sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
PT32_ROOT_LEVEL, direct,
ACC_ALL, NULL);
 
 There is no need to allocate 4 different roots for TDP tables if
 kvm_x86_ops->get_tdp_level() == PT64_ROOT_LEVEL.

Doh, and your patch does not. But it does not apply to kvm.git -next 
branch, can you regenerate please?



Re: KVM call agenda for Apr 27

2010-04-26 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 04/26/2010 12:26 PM, Chris Wright wrote:
 Please send in any agenda items you are interested in covering.
 
 While I don't expect it to be the case this week, if we have a
 lack of agenda items I'll cancel the week's call.
 
 - qemu management interface (and libvirt)
 - stable tree policy (push vs. pull and call for stable volunteers)

block plug in (follow-on from qmp block watermark)


Re: [PATCH] KVM: Minor MMU documentation edits

2010-04-26 Thread Marcelo Tosatti
On Mon, Apr 26, 2010 at 11:59:21AM +0300, Avi Kivity wrote:
 Reported by Andrew Jones.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  Documentation/kvm/mmu.txt |   10 +-
  1 files changed, 5 insertions(+), 5 deletions(-)

Applied, thanks.



Re: KVM call agenda for Apr 27

2010-04-26 Thread Anthony Liguori

On 04/26/2010 05:12 PM, Chris Wright wrote:

* Anthony Liguori (anth...@codemonkey.ws) wrote:
   

On 04/26/2010 12:26 PM, Chris Wright wrote:
 

Please send in any agenda items you are interested in covering.

While I don't expect it to be the case this week, if we have a
lack of agenda items I'll cancel the week's call.
   

- qemu management interface (and libvirt)
- stable tree policy (push vs. pull and call for stable volunteers)
 

block plug in (follow-on from qmp block watermark)
   


A few comments:

1) The problem was not block watermark itself but generating a 
notification on the watermark threshold.  It's a heuristic and should be 
implemented based on polling block stats.  Otherwise, we'll be adding 
tons of events to qemu that we'll struggle to maintain.


2) A block plugin doesn't solve the problem if it's just at the 
BlockDriverState level because it can't interact with qcow2.


3) For general block plugins, it's probably better to tackle userspace 
block devices.  We have CUSE and FUSE already, a BUSE is a logical 
conclusion.
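
The polling approach from point 1 might be sketched as follows. The
demo_block_stats structure and demo_below_watermark() are hypothetical
and do not correspond to any existing QEMU or QMP interface; a management
tool would fetch the two numbers periodically and apply the heuristic
itself, so no new event is needed on the qemu side:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of per-device block statistics. */
struct demo_block_stats {
    uint64_t highest_written;   /* highest offset written by the guest */
    uint64_t allocated;         /* bytes currently backing the image */
};

/* Return 1 when the free headroom has dropped below `watermark` bytes. */
static int demo_below_watermark(const struct demo_block_stats *st,
                                uint64_t watermark)
{
    assert(st->highest_written <= st->allocated);
    return st->allocated - st->highest_written < watermark;
}
```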


Regards,

Anthony Liguori



Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes

2010-04-26 Thread Eric Northup
On Mon, Apr 26, 2010 at 2:46 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 Doh, and your patch does not. But it does not apply to kvm.git -next
 branch, can you regenerate please?

--

For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().

Signed-off-by: Eric Northup digitale...@google.com
---

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..9696d65 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2059,10 +2059,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
hpa_t root = vcpu->arch.mmu.root_hpa;

ASSERT(!VALID_PAGE(root));
-   if (tdp_enabled)
-   direct = 1;
if (mmu_check_root(vcpu, root_gfn))
return 1;
+   if (tdp_enabled) {
+   direct = 1;
+   root_gfn = 0;
+   }
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
  PT64_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
@@ -2072,8 +2074,6 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
return 0;
}
direct = !is_paging(vcpu);
-   if (tdp_enabled)
-   direct = 1;
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu.pae_root[i];

@@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
root_gfn = 0;
if (mmu_check_root(vcpu, root_gfn))
return 1;
+   if (tdp_enabled) {
+   direct = 1;
+   root_gfn = i << 30;
+   }
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
  PT32_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
--


Re: [Qemu-devel] Re: KVM call agenda for Apr 27

2010-04-26 Thread Luiz Capitulino
On Mon, 26 Apr 2010 12:51:08 -0500
Anthony Liguori anth...@codemonkey.ws wrote:

 On 04/26/2010 12:26 PM, Chris Wright wrote:
  Please send in any agenda items you are interested in covering.
 
  While I don't expect it to be the case this week, if we have a
  lack of agenda items I'll cancel the week's call.
 
 
 - qemu management interface (and libvirt)
 - stable tree policy (push vs. pull and call for stable volunteers)

 What do you mean by push vs. pull?

 Anyway, Aurelien was working on a stable release last week, maybe
he's interested in helping with the stables (or not :)).


GSoC: Accepted students announced

2010-04-26 Thread Luiz Capitulino

 Hi there,

 The following students have been accepted to work with QEMU for GSoC 2010:

Student: Corentin Chary
Title: Add more sophisticated encodings to VNC server
Mentor: Anthony Liguori

Student: Eduard - Gabriel Munteanu
Title: AMD IOMMU emulation
Mentor: Joerg Roedel

Student: Miguel Di Ciurcio Filho
Title: Converting Monitor interface functions to QMP
Mentor: Luiz Capitulino

Student: Mohammed Gamal
Title: Completing Big Real Mode Support for KVM
Mentor: Avi Kivity

Student: Roland Elek
Title: AHCI emulation
Mentor: Alexander Graf

 Congratulations to you all!

 If you're a student and your proposal wasn't accepted, remember that
it's always possible to work outside the scope of GSoC. Just contact the
mentor of the project you are interested in and check with him/her.

 Thanks to everyone who applied.



Re: [PATCH 0/6] pvclock fixes

2010-04-26 Thread Zachary Amsden

On 04/26/2010 07:46 AM, Glauber Costa wrote:

Hi,

This is the last series I've sent, with comments from you merged.
The first 5 patches are the same, only with the suggested fixes.
I am leaving documentation out, since the basics won't change, and
we're still discussing the details.

Patch 6 is new, and is the guest side of the skipping updates
avi asked for. I haven't yet done any HV work on this
(specially because I am not convinced exactly where it is safe to
do).

Let me know what you think.
   


I'm rebasing my patches on top of this series to address the host side 
issues.  I noticed a couple issues patching against Avi's tree, not sure 
if those issues are also present in trunk.


Can you keep me CC'd on any updates to this series?

Thanks,

Zach


[PATCH] KVM: Fix mmu shrinker error

2010-04-26 Thread Gui Jianfeng
kvm_mmu_remove_one_alloc_mmu_page() assumes that kvm_mmu_zap_page() reclaims
only one sp, but that's not the case. This causes the mmu shrinker to return
a wrong number. This patch fixes the counting error.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7a17db1..c97368e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2914,13 +2914,13 @@ restart:
kvm_flush_remote_tlbs(kvm);
 }
 
-static void kvm_mmu_remove_one_alloc_mmu_page(struct kvm *kvm)
+static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
 {
struct kvm_mmu_page *page;
 
page = container_of(kvm->arch.active_mmu_pages.prev,
struct kvm_mmu_page, link);
-   kvm_mmu_zap_page(kvm, page);
+   return kvm_mmu_zap_page(kvm, page) + 1;
 }
 
 static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
@@ -2932,7 +2932,7 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
spin_lock(&kvm_lock);
 
list_for_each_entry(kvm, &vm_list, vm_list) {
-   int npages, idx;
+   int npages, idx, freed_pages;
 
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
@@ -2940,8 +2940,8 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
 kvm->arch.n_free_mmu_pages;
cache_count += npages;
if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
-   kvm_mmu_remove_one_alloc_mmu_page(kvm);
-   cache_count--;
+   freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm);
+   cache_count -= freed_pages;
kvm_freed = kvm;
}
nr_to_scan--;
-- 
1.6.5.2
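
The accounting change can be sketched in isolation. This is a toy model, not the kernel code: it assumes, as the "+ 1" in the patch suggests, that the zap routine returns the number of additional pages it reclaimed besides the one passed in. The toy_* names and the nr_children field are hypothetical.

```c
#include <assert.h>

/* Hypothetical toy page: zapping it also zaps nr_children other pages. */
struct toy_page {
    int nr_children;
};

/* Assumed contract: returns how many *extra* pages went away. */
static int toy_zap_page(const struct toy_page *page)
{
    assert(page->nr_children >= 0);
    return page->nr_children;
}

/* Total pages reclaimed: the page itself plus everything zapped with it. */
static int toy_remove_some_pages(const struct toy_page *page)
{
    return toy_zap_page(page) + 1;
}

/* Shrinker-style accounting: report what is left in the cache. */
static int toy_shrink(int cache_count, const struct toy_page *victim)
{
    int freed_pages = toy_remove_some_pages(victim);

    return cache_count - freed_pages;
}
```

With a victim that has three children, toy_shrink(10, &victim) yields 6, whereas an unconditional decrement by one would have reported 9.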




Re: [Qemu-devel] Re: KVM call agenda for Apr 27

2010-04-26 Thread Aurelien Jarno
On Mon, Apr 26, 2010 at 10:15:58PM -0300, Luiz Capitulino wrote:
 On Mon, 26 Apr 2010 12:51:08 -0500
 Anthony Liguori anth...@codemonkey.ws wrote:
 
  On 04/26/2010 12:26 PM, Chris Wright wrote:
   Please send in any agenda items you are interested in covering.
  
   While I don't expect it to be the case this week, if we have a
   lack of agenda items I'll cancel the week's call.
  
  
  - qemu management interface (and libvirt)
  - stable tree policy (push vs. pull and call for stable volunteers)
 
  What do you mean by push vs. pull?
 
  Anyway, Aurelien was working on a stable release last week, maybe
 he's interested in helping with the stables (or not :)).
 

I didn't find the time to do the stable release, but we should be very
close now.

I am interested in having stable releases, but if someone else wants to
work on that, I am fine with it.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


Re: [PATCH 09/10] kvm: port qemu-kvm's bitmap scanning

2010-04-26 Thread Yoshiaki Tamura
Hi,

This patch may conflict with the patch I posted on April 19.

http://www.mail-archive.com/qemu-de...@nongnu.org/msg29941.html

If Marcelo's patch is going to be merged, I need to rebase the above on top of it.
It would be helpful if you could tell me the plan.

Thanks,

Yoshi

2010/4/27 Marcelo Tosatti mtosa...@redhat.com:
 Which is significantly faster.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 ---
  kvm-all.c |   53 +
  1 files changed, 37 insertions(+), 16 deletions(-)

 diff --git a/kvm-all.c b/kvm-all.c
 index eabb097..ef45418 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -26,6 +26,7 @@
  #include "hw/hw.h"
  #include "gdbstub.h"
  #include "kvm.h"
 +#include "bswap.h"

  /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
  #define PAGE_SIZE TARGET_PAGE_SIZE
 @@ -282,11 +283,41 @@ static int kvm_set_migration_log(int enable)
     return 0;
  }

 -static int test_le_bit(unsigned long nr, unsigned char *addr)
 +/* get kvm's dirty pages bitmap and update qemu's */
 +static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
 +                                         unsigned long *bitmap,
 +                                         unsigned long offset,
 +                                         unsigned long mem_size)
  {
 -    return (addr[nr >> 3] >> (nr & 7)) & 1;
 +    unsigned int i, j;
 +    unsigned long page_number, addr, addr1, c;
 +    ram_addr_t ram_addr;
 +    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
 +        HOST_LONG_BITS;
 +
 +    /*
 +     * bitmap-traveling is faster than memory-traveling (for addr...)
 +     * especially when most of the memory is not dirty.
 +     */
 +    for (i = 0; i < len; i++) {
 +        if (bitmap[i] != 0) {
 +            c = leul_to_cpu(bitmap[i]);
 +            do {
 +                j = ffsl(c) - 1;
 +                c &= ~(1ul << j);
 +                page_number = i * HOST_LONG_BITS + j;
 +                addr1 = page_number * TARGET_PAGE_SIZE;
 +                addr = offset + addr1;
 +                ram_addr = cpu_get_physical_page_desc(addr);
 +                cpu_physical_memory_set_dirty(ram_addr);
 +            } while (c != 0);
 +        }
 +    }
 +    return 0;
  }

 +#define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 +
  /**
  * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
  * This function updates qemu's dirty bitmap using cpu_physical_memory_set_dirty().
 @@ -300,8 +331,6 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
  {
     KVMState *s = kvm_state;
     unsigned long size, allocated_size = 0;
 -    target_phys_addr_t phys_addr;
 -    ram_addr_t addr;
     KVMDirtyLog d;
     KVMSlot *mem;
     int ret = 0;
 @@ -313,7 +342,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             break;
         }

 -        size = ((mem->memory_size >> TARGET_PAGE_BITS) + 7) / 8;
 +        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8;
         if (!d.dirty_bitmap) {
             d.dirty_bitmap = qemu_malloc(size);
         } else if (size > allocated_size) {
 @@ -330,17 +359,9 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             break;
         }

 -        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
 -             phys_addr < mem->start_addr + mem->memory_size;
 -             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
 -            unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
 -            unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
 -
 -            if (test_le_bit(nr, bitmap)) {
 -                cpu_physical_memory_set_dirty(addr);
 -            }
 -        }
 -        start_addr = phys_addr;
 +        kvm_get_dirty_pages_log_range(mem->start_addr, d.dirty_bitmap,
 +                                      mem->start_addr, mem->memory_size);
 +        start_addr = mem->start_addr + mem->memory_size;
     }
     qemu_free(d.dirty_bitmap);

 --
 1.6.6.1
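
For readers following the quoted patch, the word-at-a-time scan can be sketched on its own. This is a simplified illustration, not the qemu code: it assumes the bitmap is already in host byte order (the patch converts with leul_to_cpu()), it uses the GCC/Clang builtin __builtin_ctzl() where the patch uses ffsl(), and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Visit every set bit in a bitmap of nwords words and record the bit
 * (page) numbers in out[]. Returns how many set bits were found.
 */
static size_t toy_scan_dirty(const unsigned long *bitmap, size_t nwords,
                             unsigned long *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long c = bitmap[i];

        /* A zero word is skipped with one comparison, no per-page test. */
        while (c != 0) {
            unsigned int j = (unsigned int)__builtin_ctzl(c);

            c &= ~(1ul << j); /* clear the bit that was just handled */
            assert(found < out_cap);
            out[found++] = i * TOY_LONG_BITS + j;
        }
    }
    return found;
}

/* Bytes needed for a bitmap of nbits bits, rounded up to whole longs. */
static size_t toy_bitmap_bytes(size_t nbits)
{
    size_t words = (nbits + TOY_LONG_BITS - 1) / TOY_LONG_BITS;

    return words * sizeof(unsigned long);
}
```

The second helper shows why the allocation size in the patch is rounded up to a multiple of the long size: the scan reads whole words, so the buffer must not end in the middle of one.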