Re: [PATCH v2 6/10] KVM MMU: don't write-protect if have new mapping to unsync page
On 04/26/2010 06:58 AM, Xiao Guangrong wrote:
> Avi Kivity wrote:
>> On 04/25/2010 10:00 AM, Xiao Guangrong wrote:
>>> Two cases may happen in the kvm_mmu_get_page() function:
>>>
>>> - one case is that the goal sp is already in the cache; if the sp is
>>>   unsync, we only need to update it to ensure this mapping is valid,
>>>   but not mark it sync and not write-protect sp->gfn, since that does
>>>   not break the unsync rule (one shadow page for a gfn)
>>>
>>> - the other case is that the goal sp does not exist and we need to
>>>   create a new sp for the gfn, i.e., the gfn may have another shadow
>>>   page; to keep the unsync rule, we should sync (mark sync and
>>>   write-protect) the gfn's unsync shadow page. After enabling multiple
>>>   unsync shadows, we sync those shadow pages only when the new sp is
>>>   not allowed to become unsync (also for the unsync rule; the new rule
>>>   is: allow all pte pages to become unsync)
>>
>> Another interesting case is to create new shadow pages in the unsync
>> state. That can help when the guest starts a short lived process: we
>> can avoid write protecting its pagetables completely. Even if we do
>> sync them, we can sync them in a batch instead of one by one, saving
>> IPIs.
>
> An IPI is needed when rmap_write_protect() changes mappings from writable
> to read-only, so while we sync all of the gfn's unsync pages, only one
> IPI is needed.

I meant, we can write protect all pages, then use one IPI to drop the tlbs for all of them.

> And, another problem is that we call rmap_write_protect()/flush-local-tlb
> many times when syncing a gfn's unsync pages; the same problem is in the
> mmu_sync_children() function. Could you allow me to improve it after this
> patchset? :-)

Of course, this is more than enough to chew on. Just suggesting an idea...

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
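The batching idea in this exchange can be illustrated with a toy model. This is only a sketch in Python, not kernel code: ShadowPage, write_protect(), sync_one_by_one() and sync_batched() are hypothetical names, and a "flush" here merely stands in for the remote TLB flush (IPI) discussed above.

```python
class ShadowPage:
    """Hypothetical stand-in for an unsync shadow page."""

    def __init__(self, writable_mappings):
        self.writable_mappings = writable_mappings


def write_protect(page):
    """Drop write access; return True if any mapping actually changed."""
    changed = page.writable_mappings > 0
    page.writable_mappings = 0
    return changed


def sync_one_by_one(pages):
    """Flush after every page whose mappings changed."""
    flushes = 0
    for page in pages:
        if write_protect(page):
            flushes += 1
    return flushes


def sync_batched(pages):
    """Write-protect everything first, then flush at most once."""
    need_flush = False
    for page in pages:
        if write_protect(page):
            need_flush = True
    return 1 if need_flush else 0
```

In this model the one-by-one variant costs one flush per page that had writable mappings, while the batched variant costs at most one flush regardless of the number of pages.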
Re: [PATCH V5 1/3] perf kvm: Enhance perf to collect KVM guest os statistics from host side
On Fri, 2010-04-23 at 13:50 +0300, Avi Kivity wrote:
> On 04/22/2010 01:27 PM, Liu Yu-B13201 wrote:
>> I met this error when built kernel. Anything wrong?
>>
>>   CC      init/main.o
>> In file included from include/linux/ftrace_event.h:8,
>>                  from include/trace/syscall.h:6,
>>                  from include/linux/syscalls.h:75,
>>                  from init/main.c:16:
>> include/linux/perf_event.h: In function 'perf_register_guest_info_callbacks':
>> include/linux/perf_event.h:1019: error: parameter name omitted
>> include/linux/perf_event.h: In function 'perf_unregister_guest_info_callbacks':
>> include/linux/perf_event.h:1021: error: parameter name omitted
>> make[1]: *** [init/main.o] Error 1
>> make: *** [init] Error 2
>
> I merged tip/perf/code which may fix this. Find it in kvm.git next branch.

I downloaded the latest kvm.git tree and compilation is ok, including both enabling FTRACE and disabling FTRACE.

Yanmin
Re: VIA Nano support
On 04/26/2010 04:14 AM, Rusty Burchfield wrote:
> When trying to boot installer iso's on this platform I get the
> following error about 40 times per second in my kern.log.
>
> kernel: [ 4857.828875] handle_exception: unexpected, vectoring info 0x880e intr info 0x8b0d
>
> These options have no effect: -no-kvm-irqchip or -no-kvm-pit
>
> QEMU is working fine. Interestingly, specifying the -uuid option
> causes it to freeze immediately. Without that option it gets to the
> iso's boot screen and freezes after attempting to continue past there.
>
> Is this an expected condition for this platform / is this a platform
> anyone is interested in supporting? ;-)

It's a known bug in the Nano's vmx implementation.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH v2 1/10] KVM MMU: fix for calculating gpa in invlpg code
On 04/26/2010 06:10 AM, Xiao Guangrong wrote:
> Avi Kivity wrote:
>> On 04/25/2010 10:00 AM, Xiao Guangrong wrote:
>>> If the guest is 32-bit, we should use 'quadrant' to adjust the gpa offset
>>>
>>> Changelog v2:
>>> - when level is PT_DIRECTORY_LEVEL, the 'offset' should be
>>>   'role.quadrant << 8', thanks Avi for pointing it out
>>>
>>> Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
>>> ---
>>>  arch/x86/kvm/paging_tmpl.h |   13 +++--
>>>  1 files changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>>> index d0cc07e..83cc72f 100644
>>> --- a/arch/x86/kvm/paging_tmpl.h
>>> +++ b/arch/x86/kvm/paging_tmpl.h
>>> @@ -478,9 +478,18 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
>>>  		    ((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
>>>  		    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
>>>  			struct kvm_mmu_page *sp = page_header(__pa(sptep));
>>> -
>>> +			int offset = 0;
>>> +
>>> +			if (PTTYPE == 32) {
>>> +				if (level == PT_DIRECTORY_LEVEL)
>>> +					offset = PAGE_SHIFT - 4;
>>> +				else
>>> +					offset = PT64_LEVEL_BITS;
>>> +				offset = sp->role.quadrant << offset;
>>> +			}
>>
>> The calculation is really
>>
>>   shift = (PT32_LEVEL_BITS - PT64_LEVEL_BITS) * level;
>
> So, the offset is
> q << (PAGE_SHIFT - (PT32_LEVEL_BITS - PT64_LEVEL_BITS) * level - 2)?

Ugh, what I meant was

  shift = PAGE_SHIFT - (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
  offset = q << shift

so,

  pte_gpa = (sp->gfn << PAGE_SHIFT) + offset + (spte - sp->spt) * sizeof(pt_element_t)

No magic numbers please. Note it could work without the 'if (PTTYPE == 32)'.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
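The final formula in this reply can be checked with a little arithmetic. The following is a minimal sketch in Python, for illustration only (the kernel code is C). It assumes the usual x86 values PAGE_SHIFT = 12, PT64_LEVEL_BITS = 9 and PT32_LEVEL_BITS = 10, and a pt_element_t of 4 bytes for 32-bit guests and 8 bytes otherwise; the helper name pte_gpa() and its parameter list are hypothetical.

```python
PAGE_SHIFT = 12
PT64_LEVEL_BITS = 9    # 512 entries per 64-bit/PAE page table
PT32_LEVEL_BITS = 10   # 1024 entries per 32-bit page table


def pte_gpa(gfn, quadrant, index, level, pttype):
    """Guest physical address of the guest pte behind shadow entry `index`.

    gfn      -- sp->gfn, the guest frame of the shadowed page table
    quadrant -- sp->role.quadrant
    index    -- spte - sp->spt
    level    -- 1 for a page table, 2 for a page directory
    pttype   -- 32 for a 32-bit guest, 64 otherwise
    """
    pt_level_bits = PT32_LEVEL_BITS if pttype == 32 else PT64_LEVEL_BITS
    pt_element_size = 4 if pttype == 32 else 8
    # shift = PAGE_SHIFT - (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level
    shift = PAGE_SHIFT - (pt_level_bits - PT64_LEVEL_BITS) * level
    offset = quadrant << shift
    return (gfn << PAGE_SHIFT) + offset + index * pt_element_size
```

Under these assumptions a 32-bit guest gets shift = 11 at level 1 and shift = 10 at level 2. For other guests PT_LEVEL_BITS equals PT64_LEVEL_BITS, so the shift is always PAGE_SHIFT, and as far as I understand the quadrant is always zero there, which would be why the 'if (PTTYPE == 32)' is not needed.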
Re: KVM_SET_MP_STATE is undocumented
Hi Avi,

Avi Kivity wrote:
>> I noticed that QEMU uses KVM_SET_MP_STATE but the ioctl() is completely
>> undocumented. I assume it has something to do with multiprocessor but I
>> am unable to work out the details unless I take a peek at arch/x86/kvm.
>
> Patch sent.

Two more interesting but undocumented ioctls:

- KVM_SET_IDENTITY_MAP_ADDR
- KVM_SET_BOOT_CPU_ID

Little background: we're debugging a KVM_EXIT_UNKNOWN problem for the largest bug-free kernel on a Core i5 machine. I've been looking at plain QEMU sources but it seems that qemu-kvm, which the person is using, does much more during initialization. Do we have a known good list of mandatory steps required to properly initialize KVM on all CPUs?

Pekka
[PATCH 0/5] mmu root page cleanups
I tried to separate direct and indirect page allocation to reduce memory usage with tdp, but that turned out too messy (and not much of a win - we allocate 8K per page outside the struct kvm_mmu_page). However the cleanups on the way are worthwhile.

Avi Kivity (5):
  KVM: MMU: Rearrange struct kvm_mmu_page
  KVM: MMU: use 16 bits for root_count
  KVM: MMU: Unify 32-pae and single-root mmu setup
  KVM: MMU: Use correct root gfn for direct maps
  KVM: MMU: Fix check for cr3 outside guest memory

 arch/x86/include/asm/kvm_host.h |   15 ++-
 arch/x86/kvm/mmu.c              |   53 +++
 2 files changed, 34 insertions(+), 34 deletions(-)
[PATCH 2/5] KVM: MMU: use 16 bits for root_count
This is incremented by a maximum of 4 for every vcpu, so we are far from overflow. 16 bits improve struct packing.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cdaaedc..f38007d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -192,7 +192,7 @@ struct kvm_mmu_page {
 	 */
 	gfn_t gfn;
 	union kvm_mmu_page_role role;
-	int root_count;          /* Currently serving as active root */
+	short root_count;        /* Currently serving as active root */
 	bool multimapped;        /* More than one parent_pte? */
 	bool unsync;
 	union {
-- 
1.7.0.4
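The packing claim can be illustrated with a reduced model of the three adjacent fields. This is only a sketch using Python's ctypes: the Before and After classes are hypothetical and contain just root_count, multimapped and unsync, so the real saving in struct kvm_mmu_page depends on the neighbouring members and on the compiler's ABI.

```python
import ctypes


class Before(ctypes.Structure):
    """root_count as a 32-bit int, followed by the two bools."""
    _fields_ = [
        ("root_count", ctypes.c_int),
        ("multimapped", ctypes.c_bool),
        ("unsync", ctypes.c_bool),
    ]


class After(ctypes.Structure):
    """root_count as a 16-bit short, followed by the two bools."""
    _fields_ = [
        ("root_count", ctypes.c_short),
        ("multimapped", ctypes.c_bool),
        ("unsync", ctypes.c_bool),
    ]


if __name__ == "__main__":
    print("before:", ctypes.sizeof(Before), "bytes")
    print("after: ", ctypes.sizeof(After), "bytes")
```

On common ABIs the first layout needs padding after the bools to stay 4-byte aligned, while the second one fits the three fields into a single 4-byte word.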
[PATCH 1/5] KVM: MMU: Rearrange struct kvm_mmu_page
Put all members required for direct pages together, to reduce cache footprint for those pages.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..cdaaedc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -192,6 +192,13 @@ struct kvm_mmu_page {
 	 */
 	gfn_t gfn;
 	union kvm_mmu_page_role role;
+	int root_count;          /* Currently serving as active root */
+	bool multimapped;        /* More than one parent_pte? */
+	bool unsync;
+	union {
+		u64 *parent_pte;               /* !multimapped */
+		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
+	};
 
 	u64 *spt;
 	/* hold the gfn of each spte inside spt */
@@ -201,14 +208,8 @@ struct kvm_mmu_page {
 	 * in this shadow page.
 	 */
 	DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
-	bool multimapped;        /* More than one parent_pte? */
-	bool unsync;
-	int root_count;          /* Currently serving as active root */
+
 	unsigned int unsync_children;
-	union {
-		u64 *parent_pte;               /* !multimapped */
-		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
-	};
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
 };
-- 
1.7.0.4
[PATCH 3/5] KVM: MMU: Unify 32-pae and single-root mmu setup
Reduce code duplication.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |   44 +---
 1 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..4da4ff1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2052,41 +2052,36 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu_page *sp;
 	int direct = 0;
 	u64 pdptr;
+	hpa_t *rootp, root, root_flags;
+	int nr_roots;
 
-	root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+	direct = tdp_enabled || !is_paging(vcpu);
 
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
-		hpa_t root = vcpu->arch.mmu.root_hpa;
-
-		ASSERT(!VALID_PAGE(root));
-		if (tdp_enabled)
-			direct = 1;
-		if (mmu_check_root(vcpu, root_gfn))
-			return 1;
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
-				      PT64_ROOT_LEVEL, direct,
-				      ACC_ALL, NULL);
-		root = __pa(sp->spt);
-		++sp->root_count;
-		vcpu->arch.mmu.root_hpa = root;
-		return 0;
+		rootp = &vcpu->arch.mmu.root_hpa;
+		nr_roots = 1;
+		root_flags = 0;
+	} else {
+		rootp = vcpu->arch.mmu.pae_root;
+		nr_roots = 4;
+		root_flags = PT_PRESENT_MASK;
 	}
-	direct = !is_paging(vcpu);
-	if (tdp_enabled)
-		direct = 1;
-	for (i = 0; i < 4; ++i) {
-		hpa_t root = vcpu->arch.mmu.pae_root[i];
+	for (i = 0; i < nr_roots; ++i) {
+		root = rootp[i];
 
 		ASSERT(!VALID_PAGE(root));
 		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 			pdptr = kvm_pdptr_read(vcpu, i);
 			if (!is_present_gpte(pdptr)) {
-				vcpu->arch.mmu.pae_root[i] = 0;
+				rootp[i] = 0;
 				continue;
 			}
 			root_gfn = pdptr >> PAGE_SHIFT;
 		} else if (vcpu->arch.mmu.root_level == 0)
 			root_gfn = 0;
+		else
+			root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+
 		if (mmu_check_root(vcpu, root_gfn))
 			return 1;
 		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
@@ -2094,9 +2089,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 				      ACC_ALL, NULL);
 		root = __pa(sp->spt);
 		++sp->root_count;
-		vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
+		rootp[i] = root | root_flags;
 	}
-	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
+
+	if (nr_roots == 4)
+		vcpu->arch.mmu.root_hpa = __pa(rootp);
+
 	return 0;
 }
-- 
1.7.0.4
[PATCH 4/5] KVM: MMU: Use correct root gfn for direct maps
We currently use cr3, which is a random number for direct maps. Use zero instead (the start of the linear range mapped by the root).

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |    9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4da4ff1..6e925b3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2070,16 +2070,17 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 	for (i = 0; i < nr_roots; ++i) {
 		root = rootp[i];
 
 		ASSERT(!VALID_PAGE(root));
-		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
+
+		if (direct)
+			root_gfn = 0;
+		else if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 			pdptr = kvm_pdptr_read(vcpu, i);
 			if (!is_present_gpte(pdptr)) {
 				rootp[i] = 0;
 				continue;
 			}
 			root_gfn = pdptr >> PAGE_SHIFT;
-		} else if (vcpu->arch.mmu.root_level == 0)
-			root_gfn = 0;
-		else
+		} else
 			root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
 
 		if (mmu_check_root(vcpu, root_gfn))
-- 
1.7.0.4
[PATCH 5/5] KVM: MMU: Fix check for cr3 outside guest memory
For direct maps, root_gfn or cr3 are meaningless. This hasn't bitten us because no sane guest sets cr3 outside its own memory even in real mode.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e925b3..e3acc9e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2083,7 +2083,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 		} else
 			root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
 
-		if (mmu_check_root(vcpu, root_gfn))
+		if (!direct && mmu_check_root(vcpu, root_gfn))
 			return 1;
 		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, direct,
-- 
1.7.0.4
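Taken together with patch 4/5, the per-root decision after this patch can be modelled as below. This is a hedged sketch in Python rather than the kernel code: choose_root() and its return convention are hypothetical, and the constants assume PAGE_SHIFT = 12, PT_PRESENT_MASK = 1 and PT32E_ROOT_LEVEL = 3.

```python
PAGE_SHIFT = 12
PT_PRESENT_MASK = 1
PT32E_ROOT_LEVEL = 3


def choose_root(direct, root_level, cr3, pdptr=0):
    """Return (root_gfn, needs_check), or None when a PAE root is skipped.

    needs_check says whether the gfn should be validated against guest
    memory (the role mmu_check_root() plays in the patch).
    """
    if direct:
        # Direct maps start at the beginning of the linear range, and
        # neither cr3 nor root_gfn means anything, so nothing is checked.
        return 0, False
    if root_level == PT32E_ROOT_LEVEL:
        if not pdptr & PT_PRESENT_MASK:
            return None  # corresponds to "rootp[i] = 0; continue;"
        return pdptr >> PAGE_SHIFT, True
    return cr3 >> PAGE_SHIFT, True
```

For example, choose_root(True, 0, 0xdeadb000) ignores the cr3 value entirely, which is the point of both patches.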
[PATCH] KVM: Minor MMU documentation edits
Reported by Andrew Jones.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 Documentation/kvm/mmu.txt |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index da04671..0cc28fb 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -75,8 +75,8 @@ direct mode; otherwise it operates in shadow mode (see below).
 Memory
 ======
 
-Guest memory (gpa) is part of user address space of the process that is using
-kvm. Userspace defines the translation between guest addresses and user
+Guest memory (gpa) is part of the user address space of the process that is
+using kvm. Userspace defines the translation between guest addresses and user
 addresses (gpa->hva); note that two gpas may alias to the same gva, but not
 vice versa.
 
@@ -111,7 +111,7 @@ is not related to a translation directly. It points to other shadow pages.
 
 A leaf spte corresponds to either one or two translations encoded into
 one paging structure entry. These are always the lowest level of the
-translation stack, with an optional higher level translations left to NPT/EPT.
+translation stack, with optional higher level translations left to NPT/EPT.
 Leaf ptes point at guest pages.
 
 The following table shows translations encoded by leaf ptes, with higher-level
@@ -167,7 +167,7 @@ Shadow pages contain the following information:
     Either the guest page table containing the translations shadowed by this
     page, or the base page frame for linear translations. See role.direct.
   spt:
-    A pageful of 64-bit sptes containig the translations for this page.
+    A pageful of 64-bit sptes containing the translations for this page.
     Accessed by both kvm and hardware.
     The page pointed to by spt will have its page->private pointing back
     at the shadow page structure.
@@ -235,7 +235,7 @@ the amount of emulation we have to do when the guest modifies multiple gptes,
 or when the a guest page is no longer used as a page table and is used for
 random guest data.
 
-As a side effect we have resynchronize all reachable unsynchronized shadow
+As a side effect we have to resynchronize all reachable unsynchronized shadow
 pages on a tlb flush.
-- 
1.7.0.4
Re: [PATCH 0/5] mmu root page cleanups
On 04/26/2010 11:48 AM, Avi Kivity wrote:
> I tried to separate direct and indirect page allocation to reduce
> memory usage with tdp, but that turned out too messy (and not much of
> a win - we allocate 8K per page outside the struct kvm_mmu_page).
> However the cleanups on the way are worthwhile.

Breaks during testing, please don't apply.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes
On 04/23/2010 02:11 PM, Avi Kivity wrote:
>> All feedback would be welcome, since I'm new to this system! A
>> strawman patch follows.
>
> Patch is correct, but I have already fixed this in a more extensive
> patch set (that also folds the 32-bit and 64-bit cases together, etc.)
> and I'm too lazy to rebase on top of yours.

Well my patchset is broken. Marcelo, please apply this and I'll rebase and fix.

-- 
error compiling committee.c: too many arguments to function
[PATCH 0/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Hi Avi,

I would like you to look at this patch before we discuss our patch set.

I believe this patch is worthwhile by itself, and it shows how much improvement we can expect from our dirty bitmap work.

Note: this will not conflict with our future work!

Thanks,
  Takuya

** Simple test **

1. What we did

I measured the time needed for the get dirty log ioctl while playing with the Ubuntu installer, in which VGA is logging, and compared the result to that of the original version.

2. Test environment

Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz (No EPT support)
With latest qemu-kvm.git

This test is for clarifying the micro performance, and is not intended to verify what we can expect on enterprise servers: so I used my laptop.

3. Results

I picked three typical parts for comparison.

- TYPE1

original
  slot=6, slot.len=    32768, usec=  65
  slot=7, slot.len=    32768, usec=  24
  slot=6, slot.len=    32768, usec=  26
  slot=7, slot.len=    32768, usec=  23
  slot=6, slot.len=    32768, usec=  24
  slot=7, slot.len=    32768, usec=  24
  slot=6, slot.len=    32768, usec=  25
  slot=7, slot.len=    32768, usec=  25

with my patch
  slot=6, slot.len=    32768, usec=   3
  slot=7, slot.len=    32768, usec=   3
  slot=6, slot.len=    32768, usec=   3
  slot=7, slot.len=    32768, usec=   2
  slot=6, slot.len=    32768, usec=   3
  slot=7, slot.len=    32768, usec=   2
  slot=6, slot.len=    32768, usec=   3
  slot=7, slot.len=    32768, usec=   3

- TYPE2

original
  slot=6, slot.len=    32768, usec= 158
  slot=7, slot.len=    32768, usec=  26
  slot=6, slot.len=    32768, usec= 157
  slot=7, slot.len=    32768, usec=  26
  slot=6, slot.len=    32768, usec= 157
  slot=7, slot.len=    32768, usec=  26
  slot=6, slot.len=    32768, usec= 158
  slot=7, slot.len=    32768, usec=  27

with my patch
  slot=6, slot.len=    32768, usec= 117
  slot=7, slot.len=    32768, usec=   2
  slot=6, slot.len=    32768, usec= 124
  slot=7, slot.len=    32768, usec=   1
  slot=6, slot.len=    32768, usec= 121
  slot=7, slot.len=    32768, usec=   1
  slot=6, slot.len=    32768, usec=  72
  slot=7, slot.len=    32768, usec=   2

- TYPE3

original
  slot=5, slot.len= 16777216, usec=   9
  slot=6, slot.len=    32768, usec=   6
  slot=7, slot.len=    32768, usec=   7
  slot=5, slot.len= 16777216, usec=  11
  slot=6, slot.len=    32768, usec=   7
  slot=7, slot.len=    32768, usec=   7
  slot=5, slot.len= 16777216, usec=  14
  slot=6, slot.len=    32768, usec=   8
  slot=7, slot.len=    32768, usec=   8
  slot=5, slot.len= 16777216, usec=  11
  slot=6, slot.len=    32768, usec=   9
  slot=7, slot.len=    32768, usec=   9

with my patch
  slot=5, slot.len= 16777216, usec=   2
  slot=6, slot.len=    32768, usec=   2
  slot=7, slot.len=    32768, usec=   1
  slot=5, slot.len= 16777216, usec=   2
  slot=6, slot.len=    32768, usec=   2
  slot=7, slot.len=    32768, usec=   2
  slot=5, slot.len= 16777216, usec=   2
  slot=6, slot.len=    32768, usec=   1
  slot=7, slot.len=    32768, usec=   1
  slot=5, slot.len= 16777216, usec=   3
  slot=6, slot.len=    32768, usec=   2
  slot=7, slot.len=    32768, usec=   2
[PATCH 1/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Although we always allocate a new dirty bitmap in x86's get_dirty_log(), when the memslot is clean it is only used as a zero source for copy_to_user() and freed right after that. This patch uses clear_user() instead of doing this unnecessary zero-source allocation.

Performance improvement: as one would expect, the time needed to allocate a bitmap is eliminated entirely. In my test, the improved ioctl was about 4 to 10 times faster than the original one for clean slots. Furthermore, the reduced allocations seem to have a good effect on other cases too. Actually, I observed that the time for the ioctl was more stable than with the original one, and the average time for dirty slots was also reduced to some extent.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c |   36 ++--
 1 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..0086d64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
 	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,29 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
-
 	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 		is_dirty = memslot->dirty_bitmap[i];
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
+		unsigned long *dirty_bitmap;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots)
-			goto out_free;
+		r = -ENOMEM;
+		dirty_bitmap = vmalloc(n);
+		if (!dirty_bitmap)
+			goto out;
+		memset(dirty_bitmap, 0, n);
 
+		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+		if (!slots) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2789,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		synchronize_srcu_expedited(&kvm->srcu);
 		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
+
+		r = -EFAULT;
+		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
+		vfree(dirty_bitmap);
+	} else {
+		r = -EFAULT;
+		if (clear_user(log->dirty_bitmap, n))
+			goto out;
 	}
 
 	r = 0;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-		r = -EFAULT;
-out_free:
-	vfree(dirty_bitmap);
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
-- 
1.6.3.3
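The control flow this patch arrives at can be modelled in a few lines. This is a rough Python sketch, not the kernel implementation: Memslot, get_dirty_log() and the stats counter are hypothetical, plain lists stand in for the kernel and user bitmaps, and locking, SRCU and error paths are left out.

```python
class Memslot:
    """Hypothetical stand-in for a memslot with a dirty bitmap."""

    def __init__(self, words):
        self.dirty_bitmap = list(words)


def get_dirty_log(memslot, user_buf, stats):
    """Report the slot's dirty bitmap into user_buf and reset it."""
    n = len(memslot.dirty_bitmap)
    if any(memslot.dirty_bitmap):
        # Dirty slot: only now allocate a fresh zeroed bitmap, swap it
        # in, and hand the old contents to the caller.
        fresh = [0] * n
        stats["allocations"] += 1
        old, memslot.dirty_bitmap = memslot.dirty_bitmap, fresh
        user_buf[:n] = old
    else:
        # Clean slot: zero the caller's buffer directly, which is the
        # role clear_user() plays in the patch. No bitmap is allocated.
        for i in range(n):
            user_buf[i] = 0
```

Calling get_dirty_log() on a clean slot leaves the allocation counter untouched, which corresponds to the saving reported for clean slots in the cover letter.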
[PATCH 0/9] Make use of the redirection of guest serial
The guest console is useful for troubleshooting failure cases, especially those that produce a call trace. A session through it is also useful for networking-related test cases (consider a test which could shut down or crash the guest network). So this patchset first redirects the guest serial to a unix domain socket, and then can either log the guest serial through a thread or use it as a session method. The last patch makes all linux guests use serial as their console through the unattended files.

---

Jason Wang (9):
  KVM test: Introduce the prompt assist
  KVM test: Add the ability to send the username in remote_login()
  KVM test: Make the login re suitable for serial console
  KVM test: Redirect the serial to the unix domain socket
  KVM test: Log the content from guest serial console
  KVM test: Raise error when met unknown type in kvm_vm.remote_login().
  KVM test: Introduce the local_login()
  KVM test: Create the background threads before calling process()
  KVM test: Redirect the console to serial for all linux guests

 client/tests/kvm/kvm_preprocessing.py        |   64 --
 client/tests/kvm/kvm_utils.py                |   23 +++--
 client/tests/kvm/kvm_vm.py                   |   48 
 client/tests/kvm/tests_base.cfg.sample       |    1 
 client/tests/kvm/unattended/Fedora-10.ks     |    2 -
 client/tests/kvm/unattended/Fedora-11.ks     |    2 -
 client/tests/kvm/unattended/Fedora-12.ks     |    2 -
 client/tests/kvm/unattended/Fedora-8.ks      |    2 -
 client/tests/kvm/unattended/Fedora-9.ks      |    2 -
 client/tests/kvm/unattended/OpenSUSE-11.xml  |    1 
 client/tests/kvm/unattended/RHEL-3-series.ks |    2 -
 client/tests/kvm/unattended/RHEL-4-series.ks |    2 -
 client/tests/kvm/unattended/RHEL-5-series.ks |    2 -
 client/tests/kvm/unattended/SLES-11.xml      |    1 
 14 files changed, 128 insertions(+), 26 deletions(-)

-- 
Signature
[PATCH 1/9] KVM test: Introduce the prompt assist
Sometimes we need to send an assist string to a session in order to get the prompt, especially when re-connecting to an already logged-in serial session. This patch sends the assist string before doing the pattern matching in remote_login().

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_utils.py |    9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 25f3c8c..9adbaee 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -451,7 +451,8 @@ def check_kvm_source_dir(source_dir):
 # The following are functions used for SSH, SCP and Telnet communication with
 # guests.
 
-def remote_login(command, password, prompt, linesep="\n", timeout=10):
+def remote_login(command, password, prompt, linesep="\n", timeout=10,
+                 prompt_assist = None):
     """
     Log into a remote host (guest) using SSH or Telnet. Run the given command
     using kvm_spawn and provide answers to the questions asked. If timeout
@@ -468,7 +469,8 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10):
     @param timeout: The maximal time duration (in seconds) to wait for each
             step of the login procedure (i.e. the "Are you sure" prompt, the
             password prompt, the shell prompt, etc)
-
+    @prarm prompt_assist: An assistant string sent before the pattern
+            matching in order to get the prompt for some kinds of shell_client.
     @return Return the kvm_spawn object on success and None on failure.
     """
     sub = kvm_subprocess.kvm_shell_session(command,
@@ -479,6 +481,9 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10):
 
     logging.debug("Trying to login with command '%s'" % command)
 
+    if prompt_assist is not None:
+        sub.sendline(prompt_assist)
+
     while True:
         (match, text) = sub.read_until_last_line_matches(
                 [r"[Aa]re you sure", r"[Pp]assword:\s*$", r"^\s*[Ll]ogin:\s*$",
[PATCH 2/9] KVM test: Add the ability to send the username in remote_login()
For the serial console to work, remote_login() must be able to send the username when needed.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_utils.py |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 9adbaee..1ea0852 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -452,7 +452,7 @@ def check_kvm_source_dir(source_dir):
 # guests.
 
 def remote_login(command, password, prompt, linesep="\n", timeout=10,
-                 prompt_assist = None):
+                 prompt_assist = None, username = None):
     """
     Log into a remote host (guest) using SSH or Telnet. Run the given command
     using kvm_spawn and provide answers to the questions asked. If timeout
@@ -471,6 +471,8 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10,
             password prompt, the shell prompt, etc)
     @prarm prompt_assist: An assistant string sent before the pattern
             matching in order to get the prompt for some kinds of shell_client.
+    @param username: The user name used to log into the session
+
     @return Return the kvm_spawn object on success and None on failure.
     """
     sub = kvm_subprocess.kvm_shell_session(command,
@@ -505,9 +507,13 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10,
             sub.close()
             return None
         elif match == 2: # login:
-            logging.debug("Got unexpected login prompt")
-            sub.close()
-            return None
+            if username is None:
+                logging.debug("Got unexpected login prompt")
+                sub.close()
+                return None
+            else:
+                sub.sendline(username)
+                continue
         elif match == 3: # Connection closed
             logging.debug("Got 'Connection closed'")
             sub.close()
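The behaviour of the login loop with the prompt_assist and username parameters can be tried out against a scripted session. The following is a simplified sketch, not the autotest code: FakeSession, login() and the three patterns are hypothetical stand-ins, and the real remote_login() handles more cases (timeouts, "Are you sure", closed or refused connections).

```python
import re

# Simplified patterns; the shell prompt pattern is an assumption.
PASSWORD_RE = r"[Pp]assword:\s*$"
LOGIN_RE = r"[Ll]ogin:"
PROMPT_RE = r"[#\$]\s*$"


class FakeSession:
    """Hypothetical stand-in for a kvm_subprocess shell session."""

    def __init__(self, script):
        self.script = list(script)  # lines the "guest" will print
        self.sent = []              # lines we sent to the "guest"

    def read_line(self):
        return self.script.pop(0) if self.script else None

    def sendline(self, text):
        self.sent.append(text)


def login(session, password, username=None, prompt_assist=None):
    """Return True once a shell prompt is seen, False on failure."""
    if prompt_assist is not None:
        session.sendline(prompt_assist)   # nudge an idle console
    while True:
        line = session.read_line()
        if line is None:
            return False                  # nothing more to read
        if re.search(LOGIN_RE, line):
            if username is None:
                return False              # unexpected login prompt
            session.sendline(username)
        elif re.search(PASSWORD_RE, line):
            session.sendline(password)
        elif re.search(PROMPT_RE, line):
            return True
```

With a script of a login prompt, a password prompt and a shell prompt, login() answers with the username and then the password; without a username it gives up at the login prompt, as the code did before this patch.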
[PATCH 3/9] KVM test: Make the login re suitable for serial console
The current matching regexp "^\s*[Ll]ogin:\s*$" is not suitable for the serial console, so change it to "[Ll]ogin:".

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_utils.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 1ea0852..bb42314 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -488,7 +488,7 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10,
 
     while True:
         (match, text) = sub.read_until_last_line_matches(
-                [r"[Aa]re you sure", r"[Pp]assword:\s*$", r"^\s*[Ll]ogin:\s*$",
+                [r"[Aa]re you sure", r"[Pp]assword:\s*$", r"[Ll]ogin:",
                  r"[Cc]onnection.*closed", r"[Cc]onnection.*refused",
                  r"[Pp]lease wait", prompt],
                  timeout=timeout, internal_timeout=0.5)
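The patch description does not say why the old expression fails on a serial console. One plausible reason, shown here as an assumption, is that a getty on the serial line prefixes the prompt with the hostname. The sample lines below are made up for illustration, and matching with re.search() is likewise an assumption about how the patterns are applied.

```python
import re

OLD_LOGIN_RE = r"^\s*[Ll]ogin:\s*$"
NEW_LOGIN_RE = r"[Ll]ogin:"

# Hypothetical prompt lines, not taken from a real guest.
telnet_line = "login: "
serial_line = "localhost.localdomain login: "

for name, line in (("telnet", telnet_line), ("serial", serial_line)):
    old = re.search(OLD_LOGIN_RE, line) is not None
    new = re.search(NEW_LOGIN_RE, line) is not None
    print("%-6s old=%s new=%s" % (name, old, new))
```

The relaxed expression also matches any other line containing "login:", so it trades strictness for coverage.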
[PATCH 4/9] KVM test: Redirect the serial to the unix domain socket
This patch redirects the guest serial to a unix domain socket, which will be used by the following patches to dump its content or to use it as a session.

Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 client/tests/kvm/kvm_vm.py |   19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 6bc7987..8f4753f 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -116,17 +116,20 @@ class VM:
         self.address_cache = address_cache
         self.pci_assignable = None
 
-        # Find available monitor filename
+        # Find available filenames for monitor and guest serial redirection
         while True:
-            # The monitor filename should be unique
+            # The filenames should be unique
             self.instance = (time.strftime("%Y%m%d-%H%M%S-") +
                              kvm_utils.generate_random_string(4))
-            self.monitor_file_name = os.path.join("/tmp",
-                                                  "monitor-" + self.instance)
-            if not os.path.exists(self.monitor_file_name):
-                break
-
+            names = [os.path.join("/tmp", type + self.instance) for type in
+                     "monitor-", "serial-"]
+            if True in [os.path.exists(file) for file in names]:
+                continue
+            else:
+                [self.monitor_file_name, self.serial_file_name] = names
+                break
+
     def clone(self, name=None, params=None, root_dir=None, address_cache=None):
         """
         Return a clone of the VM object with optionally modified parameters.
@@ -316,6 +319,8 @@ class VM:
             for pci_id in self.pa_pci_ids:
                 qemu_cmd += " -pcidevice host=%s" % pci_id
 
+        qemu_cmd += " -serial unix:%s,server,nowait" % self.serial_file_name
+
         return qemu_cmd
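The name-picking loop in this patch can be written as a standalone helper. This is a sketch under assumptions, not the autotest code: pick_socket_names() and serial_option() are hypothetical names, and the random suffix is generated inline instead of calling kvm_utils.generate_random_string().

```python
import os
import random
import string
import tempfile
import time


def pick_socket_names(directory=None, prefixes=("monitor-", "serial-")):
    """Return one unused path per prefix, all sharing the same suffix."""
    directory = directory or tempfile.gettempdir()
    chars = string.ascii_letters + string.digits
    while True:
        instance = time.strftime("%Y%m%d-%H%M%S-") + "".join(
            random.choice(chars) for _ in range(4))
        names = [os.path.join(directory, p + instance) for p in prefixes]
        # Retry with a new suffix if any candidate already exists.
        if not any(os.path.exists(name) for name in names):
            return names


def serial_option(serial_file_name):
    """Build the qemu option string added by the patch."""
    return " -serial unix:%s,server,nowait" % serial_file_name


if __name__ == "__main__":
    monitor_name, serial_name = pick_socket_names()
    print(monitor_name)
    print(serial_option(serial_name))
```

As in the patch, the existence check and the later creation of the socket are separate steps, so two processes could in principle pick the same name; the timestamp plus random suffix makes that unlikely rather than impossible.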
[PATCH 5/9] KVM test: Log the content from guest serial console
This patch tries to get the content of guest serial and log it into the debug directoy of the testcase through a dedicated thread which is created in the preprocessing and ended in the postprocessing. The params of serial_mode must be set to dump in order to make use of this feature. Signed-off-by: Jason Wang jasow...@redhat.com --- client/tests/kvm/kvm_preprocessing.py | 59 +++- client/tests/kvm/tests_base.cfg.sample |1 + 2 files changed, 59 insertions(+), 1 deletions(-) diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py index 4b9290c..50d0e35 100644 --- a/client/tests/kvm/kvm_preprocessing.py +++ b/client/tests/kvm/kvm_preprocessing.py @@ -1,4 +1,5 @@ import sys, os, time, commands, re, logging, signal, glob, threading, shutil +import socket, select from autotest_lib.client.bin import test, utils from autotest_lib.client.common_lib import error import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils @@ -13,7 +14,8 @@ except ImportError: _screendump_thread = None _screendump_thread_termination_event = None - +_serialdump_thread = None +_serialdump_thread_termination_event = None def preprocess_image(test, params): @@ -267,6 +269,16 @@ def preprocess(test, params, env): args=(test, params, env)) _screendump_thread.start() +# Start the serial dump thread +if params.get(serial_mode) == dump: +logging.debug(Starting serialdump thread) +global _serialdump_thread, _serialdump_thread_termination_event +_serialdump_thread_termination_event = threading.Event() +_serialdump_thread = threading.Thread(target=_dump_serial_console, + args=(test, params, env)) +_serialdump_thread.start() + + def postprocess(test, params, env): @@ -286,6 +298,13 @@ def postprocess(test, params, env): _screendump_thread_termination_event.set() _screendump_thread.join(10) +# Terminate the serialdump thread +global _serialdump_thread, _serialdump_thread_termination_event +if _serialdump_thread: +logging.debug(Terminating serialdump thread...) 
+_serialdump_thread_termination_event.set() +_serialdump_thread.join(10) + # Warn about corrupt PPM files for f in glob.glob(os.path.join(test.debugdir, *.ppm)): if not ppm_utils.image_verify_ppm_file(f): @@ -450,3 +469,41 @@ def _take_screendumps(test, params, env): if _screendump_thread_termination_event.isSet(): break _screendump_thread_termination_event.wait(delay) + +def _dump_serial_console(test, params, env): +global _serialdump_thread_termination_event +rs = [] +files = {} + +while True: +for vm in kvm_utils.env_get_all_vms(env): +if not files.has_key(vm): +try: +serial_socket = socket.socket(socket.AF_UNIX, + socket.SOCK_STREAM) +serial_socket.setblocking(False) +serial_socket.connect(vm.serial_file_name) +except: +logging.debug(Could not connect to serial socket for %s % + vm.name) +continue +rs.append(serial_socket) +serial_dump_filename = os.path.join(test.debugdir, +serial-%s % vm.name) +files[vm] = [serial_socket, file(serial_dump_filename, a+)] + +r, w, x = select.select(rs, [], [], 0.5) +for vm in files.keys(): +[s ,d] = files[vm] +if s in r: +data = s.recv(16384) +if len(data) == 0: +rs.remove(s) +files.pop(vm) +else: +d.write(data) + +if _serialdump_thread_termination_event.isSet(): +break + + diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index 9f82ffb..169a69e 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -13,6 +13,7 @@ start_vm = yes kill_vm = no kill_vm_gracefully = yes kill_unresponsive_vms = yes +serial_mode = dump # Screendump specific stuff convert_ppm_files_to_png_on_error = yes -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
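The core of the dump thread above is a select() loop that copies whatever the serial sockets produce into per-VM log files. A minimal Python 3 sketch of one loop iteration follows. The function name drain_ready_sockets() is hypothetical, and a local socket pair stands in for the guest serial socket, so this shows the technique only, not the patch itself.

```python
import io
import select
import socket


def drain_ready_sockets(sinks, timeout=0.5):
    """Copy pending data from each readable socket to its paired sink.

    sinks maps a socket to a writable binary file-like object. Sockets
    whose peer has closed are removed from the mapping. Returns the
    number of bytes copied.
    """
    if not sinks:
        return 0
    readable, _, _ = select.select(list(sinks), [], [], timeout)
    copied = 0
    for sock in readable:
        data = sock.recv(16384)
        if data:
            sinks[sock].write(data)
            copied += len(data)
        else:
            # An empty read means the peer closed the connection.
            del sinks[sock]
            sock.close()
    return copied


# Usage: guest_end plays the role of the guest writing to its serial port.
guest_end, host_end = socket.socketpair()
log = io.BytesIO()
sinks = {host_end: log}
guest_end.sendall(b"localhost login: ")
copied = drain_ready_sockets(sinks, timeout=1.0)
guest_end.close()
drain_ready_sockets(sinks, timeout=1.0)  # notices EOF and drops the socket
```

Keying the mapping by socket keeps the readable list and the sink lookup in one structure, which avoids having to keep two containers in sync.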
[PATCH 6/9] KVM test: Raise error when met unknown type in kvm_vm.remote_login().
We need to raise an error when an unknown shell_client type is met in kvm_vm.remote_login(), in order to avoid a traceback.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 8f4753f..0cdf925 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -806,7 +806,9 @@ class VM:
         elif client == "nc":
             session = kvm_utils.netcat(address, port, username, password,
                                        prompt, linesep, timeout)
-
+        else:
+            raise error.TestError("Unknown shell_client type %s" % client)
+
         if session:
             session.set_status_test_command(self.params.get("status_test_"
                                                             "command", ""))
[PATCH 7/9] KVM test: Introduce the local_login()
This patch introduces a new method which is used to log into the guest through the guest serial console. serial_mode must be set to "session" in order to make use of this patch.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 0cdf925..a22893b 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -814,7 +814,32 @@ class VM:
                                                             "command", ""))
         return session

+    def local_login(self, timeout=240):
+        """
+        Log into the guest via serial console
+        If timeout expires while waiting for output from the guest (e.g. a
+        password prompt or a shell prompt) -- fail.
+        """
+
+        serial_mode = self.params.get("serial_mode")
+        username = self.params.get("username", "")
+        password = self.params.get("password", "")
+        prompt = self.params.get("shell_prompt", "[\#\$]")
+        linesep = eval("'%s'" % self.params.get("shell_linesep", r"\n"))
+        if serial_mode != "session":
+            logging.debug("serial_mode is not session")
+            return None
+        else:
+            command = "nc -U %s" % self.serial_file_name
+            assist = self.params.get("prompt_assist")
+            session = kvm_utils.remote_login(command, password, prompt, linesep,
+                                             timeout, "", username)
+            if session:
+                session.set_status_test_command(self.params.get("status_test_"
+                                                                "command", ""))
+            return session
+

     def copy_files_to(self, local_path, remote_path, nic_index=0, timeout=300):
         """
         Transfer files to the guest.
[PATCH 8/9] KVM test: Create the background threads before calling process()
If the screendump and serialdump threads are created after process(), we may lose the progress tracking of the guest shutting down. So this patch creates them before calling process() in preprocess.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 50d0e35..73e835a 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -257,9 +257,6 @@ def preprocess(test, params, env):
                     int(params.get("pre_command_timeout", 600)),
                     params.get("pre_command_noncritical") == "yes")

-    # Preprocess all VMs and images
-    process(test, params, env, preprocess_image, preprocess_vm)
-
     # Start the screendump thread
     if params.get("take_regular_screendumps") == "yes":
         logging.debug("Starting screendump thread")
@@ -278,6 +275,8 @@ def preprocess(test, params, env):
                                               args=(test, params, env))
         _serialdump_thread.start()

+    # Preprocess all VMs and images
+    process(test, params, env, preprocess_image, preprocess_vm)


 def postprocess(test, params, env):
[PATCH 9/9] KVM test: Redirect the console to serial for all linux guests
As we have the ability to dump the content from serial console or use a session through it, we need to redirect the console to serial through unattended files to make use of it. Signed-off-by: Jason Wang jasow...@redhat.com --- client/tests/kvm/unattended/Fedora-10.ks |2 +- client/tests/kvm/unattended/Fedora-11.ks |2 +- client/tests/kvm/unattended/Fedora-12.ks |2 +- client/tests/kvm/unattended/Fedora-8.ks |2 +- client/tests/kvm/unattended/Fedora-9.ks |2 +- client/tests/kvm/unattended/OpenSUSE-11.xml |1 + client/tests/kvm/unattended/RHEL-3-series.ks |2 +- client/tests/kvm/unattended/RHEL-4-series.ks |2 +- client/tests/kvm/unattended/RHEL-5-series.ks |2 +- client/tests/kvm/unattended/SLES-11.xml |1 + 10 files changed, 10 insertions(+), 8 deletions(-) diff --git a/client/tests/kvm/unattended/Fedora-10.ks b/client/tests/kvm/unattended/Fedora-10.ks index 61e59d7..8628036 100644 --- a/client/tests/kvm/unattended/Fedora-10.ks +++ b/client/tests/kvm/unattended/Fedora-10.ks @@ -11,7 +11,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 zerombr clearpart --all --initlabel autopart diff --git a/client/tests/kvm/unattended/Fedora-11.ks b/client/tests/kvm/unattended/Fedora-11.ks index 0be7d06..5ce8ee2 100644 --- a/client/tests/kvm/unattended/Fedora-11.ks +++ b/client/tests/kvm/unattended/Fedora-11.ks @@ -10,7 +10,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 zerombr clearpart --all --initlabel diff --git a/client/tests/kvm/unattended/Fedora-12.ks b/client/tests/kvm/unattended/Fedora-12.ks index 0be7d06..5ce8ee2 100644 --- a/client/tests/kvm/unattended/Fedora-12.ks +++ b/client/tests/kvm/unattended/Fedora-12.ks @@ -10,7 +10,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York 
firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 zerombr clearpart --all --initlabel diff --git a/client/tests/kvm/unattended/Fedora-8.ks b/client/tests/kvm/unattended/Fedora-8.ks index f4a872d..884d556 100644 --- a/client/tests/kvm/unattended/Fedora-8.ks +++ b/client/tests/kvm/unattended/Fedora-8.ks @@ -11,7 +11,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 zerombr clearpart --all --initlabel autopart diff --git a/client/tests/kvm/unattended/Fedora-9.ks b/client/tests/kvm/unattended/Fedora-9.ks index f4a872d..884d556 100644 --- a/client/tests/kvm/unattended/Fedora-9.ks +++ b/client/tests/kvm/unattended/Fedora-9.ks @@ -11,7 +11,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 zerombr clearpart --all --initlabel autopart diff --git a/client/tests/kvm/unattended/OpenSUSE-11.xml b/client/tests/kvm/unattended/OpenSUSE-11.xml index 7dd44fa..2a4fec0 100644 --- a/client/tests/kvm/unattended/OpenSUSE-11.xml +++ b/client/tests/kvm/unattended/OpenSUSE-11.xml @@ -50,6 +50,7 @@ moduleedd/module /initrd_module /initrd_modules +appendconsole=ttyS0,115200/append loader_typegrub/loader_type sections config:type=list/ /bootloader diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks b/client/tests/kvm/unattended/RHEL-3-series.ks index 884b386..26b1130 100644 --- a/client/tests/kvm/unattended/RHEL-3-series.ks +++ b/client/tests/kvm/unattended/RHEL-3-series.ks @@ -10,7 +10,7 @@ rootpw 123456 firewall --enabled --ssh timezone America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 clearpart --all --initlabel autopart reboot diff --git a/client/tests/kvm/unattended/RHEL-4-series.ks 
b/client/tests/kvm/unattended/RHEL-4-series.ks index ce4a430..f2f934f 100644 --- a/client/tests/kvm/unattended/RHEL-4-series.ks +++ b/client/tests/kvm/unattended/RHEL-4-series.ks @@ -11,7 +11,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr --append=console=ttyS0,115200 zerombr clearpart --all --initlabel autopart diff --git a/client/tests/kvm/unattended/RHEL-5-series.ks b/client/tests/kvm/unattended/RHEL-5-series.ks index f4a872d..884d556 100644 --- a/client/tests/kvm/unattended/RHEL-5-series.ks +++ b/client/tests/kvm/unattended/RHEL-5-series.ks @@ -11,7 +11,7 @@ firewall --enabled --ssh selinux --enforcing timezone --utc America/New_York firstboot --disable -bootloader --location=mbr +bootloader --location=mbr
Re: Synchronized time with kvm_clock
On Monday 26 April 2010, I wrote:
> host: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc
> guest: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> kvm-clock

I forgot some info which might be essential:

Kernel (host and guest): 2.6.32-trunk-amd64
qemu-kvm: 0.12.3+dfsg-4

Please keep me on the cc, I'm not on the list.
--
Greetings,
Alex Hermann
[PATCH 1/3] KVM test: Use customized command to get the version of kvm and its userspace
The current method may or may not work on the various distributions, so this patch adds the ability to use customized commands to get the version of kvm and its userspace. kvm_ver_cmd is used for the kvm version and kvm_userspace_ver_cmd for its userspace.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 4b9290c..16200ab 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -225,10 +225,10 @@ def preprocess(test, params, env):
     # Get the KVM kernel module version and write it as a keyval
     logging.debug("Fetching KVM module version...")
     if os.path.exists("/dev/kvm"):
-        try:
-            kvm_version = open("/sys/module/kvm/version").read().strip()
-        except:
-            kvm_version = os.uname()[2]
+        kvm_ver_cmd = params.get("kvm_ver_cmd", "cat /sys/module/kvm/version")
+        s, kvm_version = commands.getstatusoutput(kvm_ver_cmd)
+        if s != 0:
+            kvm_version = "Unknown"
     else:
         kvm_version = "Unknown"
         logging.debug("KVM module not loaded")
@@ -239,11 +239,11 @@ def preprocess(test, params, env):
     logging.debug("Fetching KVM userspace version...")
     qemu_path = kvm_utils.get_path(test.bindir,
                                    params.get("qemu_binary", "qemu"))
-    version_line = commands.getoutput("%s -help | head -n 1" % qemu_path)
-    matches = re.findall("[Vv]ersion .*?,", version_line)
-    if matches:
-        kvm_userspace_version = " ".join(matches[0].split()[1:]).strip(",")
-    else:
+    def_qemu_ver_cmd = "%s -help | head -n 1 | awk '{ print $5}'" % qemu_path
+    kvm_userspace_ver_cmd = params.get("kvm_userspace_ver_cmd",
+                                       def_qemu_ver_cmd)
+    s, kvm_userspace_version = commands.getstatusoutput(kvm_userspace_ver_cmd)
+    if s != 0:
         kvm_userspace_version = "Unknown"
         logging.debug("Could not fetch KVM userspace version")
     logging.debug("KVM userspace version: %s" % kvm_userspace_version)
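The "configurable command with an Unknown fallback" idea in this patch can be sketched in Python 3 with subprocess.getstatusoutput(), the successor of the commands module used above. The helper get_version() is hypothetical. The extra check for empty output is an addition of this sketch and is not in the patch; it is there because a shell pipeline reports the exit status of its last command only.

```python
import subprocess


def get_version(params, key, default_cmd):
    """Run the command configured under key (or default_cmd).

    Returns the stripped output, or "Unknown" if the command exits with a
    non-zero status or prints nothing.
    """
    cmd = params.get(key, default_cmd)
    status, output = subprocess.getstatusoutput(cmd)
    output = output.strip()
    if status != 0 or not output:
        return "Unknown"
    return output


# Usage with harmless shell commands standing in for the real ones.
default_version = get_version({}, "kvm_ver_cmd", "echo 1.2.3")
failed_version = get_version({"kvm_ver_cmd": "exit 3"}, "kvm_ver_cmd", "echo x")
# "false | cat" exits 0 because only the status of cat is reported.
piped_version = get_version({}, "kvm_ver_cmd", "false | cat")
```

With a default such as "qemu -help | head -n 1 | awk ...", a missing qemu binary would still yield status 0 from awk, so checking the status alone may not be enough to detect failure.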
[PATCH 2/3] KVM test: Create ksm scanner through pre_command
KSM may have different control interfaces on different distributions, so this patch launches KSM through pre_command instead of the hard-coded bits in the test. Users may specify their own suitable commands or parameters.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/tests/ksm_overcommit.py |   15 ---------------
 client/tests/kvm/tests_base.cfg.sample   |    2 ++
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/client/tests/kvm/tests/ksm_overcommit.py b/client/tests/kvm/tests/ksm_overcommit.py
index 2dd46c4..4aa6deb 100644
--- a/client/tests/kvm/tests/ksm_overcommit.py
+++ b/client/tests/kvm/tests/ksm_overcommit.py
@@ -412,21 +412,6 @@ def run_ksm_overcommit(test, params, env):
                                                 (3100 - 64.0)))
         mem = int(math.floor(host_mem * overcommit / vmsc))

-    logging.debug("Checking KSM status...")
-    ksm_flag = 0
-    for line in os.popen('ksmctl info').readlines():
-        if line.startswith('flags'):
-            ksm_flag = int(line.split(' ')[1].split(',')[0])
-    if int(ksm_flag) != 1:
-        logging.info("KSM module is not loaded! Trying to load module and "
-                     "start ksmctl...")
-        try:
-            utils.run("modprobe ksm")
-            utils.run("ksmctl start 5000 100")
-        except error.CmdError, e:
-            raise error.TestFail("Failed to load KSM: %s" % e)
-        logging.debug("KSM module loaded and ksmctl started")
-
     swap = int(utils.read_from_meminfo("SwapTotal")) / 1024

     logging.debug("Overcommit = %f", overcommit)
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index e73ba44..2db0d2c 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -285,6 +285,8 @@ variants:
         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'

     - ksm_overcommit:
+        pre_command = [ -e /dev/ksm ] && true || modprobe ksm && ksmctl start 5000 50
+        pre_command_critical = yes
         # Don't preprocess any vms as we need to change its params
         vms = ''
         image_snapshot = yes
[PATCH 3/3] KVM test: Remove the duplicated KERNEL paramters in the pxe configuration file
Remove the duplicated KERNEL vmlinuz line in unattended.py.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/scripts/unattended.py |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
index e41bc86..fdadd03 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -209,7 +209,6 @@ class UnattendedInstall(object):
         pxe_config.write('PROMPT 0\n')
         pxe_config.write('LABEL pxeboot\n')
         pxe_config.write(' KERNEL vmlinuz\n')
-        pxe_config.write(' KERNEL vmlinuz\n')
         pxe_config.write(' APPEND initrd=initrd.img %s\n' %
                          self.kernel_args)
         pxe_config.close()
Synchronized time with kvm_clock
Hello all,

From the various sources Google comes up with, it seems kvm_clock is now the preferred clock source for kvm guests. I was hoping it would keep the time synced with the host, but apparently it doesn't. I'm seeing a pretty steady 1 sec diff between host and guest. The host is synced with external time sources by means of ntp.

Am I supposed to run ntp inside the guest when using kvm_clock as a clocksource? Otherwise, what clocksource should be used in the guest
1) to automatically get synced time with the host? or
2) which can be used alongside ntpd in the guest?

host: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
guest: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock

Please cc me, I'm not on the list.
--
Greetings,
Alex Hermann
Re: [RFC PATCH 04/20] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().
Avi Kivity wrote:
> On 04/23/2010 12:59 PM, Yoshiaki Tamura wrote:
>> Avi Kivity wrote:
>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>> Currently buf size is fixed at 32KB. It would be useful if it could be flexible.
>>>
>>> Why is this needed? The real buffering is in the kernel anyways; this is only used to reduce the number of write() syscalls.
>>
>> This was introduced to buffer the transferred guest image transactionally on the receiver side. The sender doesn't use it. In case of intermediate state, we just discard this buffer.
>
> How large can it grow?

It really depends on what workload is running on the guest, but it should be as large as the guest RAM size in the worst case.

> What's wrong with applying it (perhaps partially) to the guest state? The next state transfer will overwrite it completely, no?

AFAIK, the answer is no. qemu_loadvm_state() calls the load handlers of each device emulator, and they update their state directly, which means that even if the transaction was not complete, it's impossible to recover the previous state if we don't keep a buffer.

I guess your concern is about consuming a large amount of RAM, and I think having an option for writing the transaction to a temporary disk image should be effective.
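The receiver-side argument above can be illustrated with a small Python sketch: incoming chunks are held in a buffer and applied to the live state only when the whole transaction has arrived, so an interrupted transfer leaves the previous state intact. The TransactionBuffer class and the dictionary used as "guest state" are hypothetical and only model the idea; they are not QEMUFile code.

```python
class TransactionBuffer:
    """Hold chunks of a state transfer until the sender marks it complete."""

    def __init__(self):
        self._chunks = []

    def receive(self, chunk):
        # Nothing touches the live state while chunks are still arriving.
        self._chunks.append(bytes(chunk))

    def discard(self):
        # Intermediate state from a failed transfer is simply dropped.
        self._chunks.clear()

    def commit(self, apply):
        # Only a complete transaction is handed to the loader.
        data = b"".join(self._chunks)
        self._chunks.clear()
        apply(data)
        return len(data)


# Usage: a failed transfer followed by a complete one.
state = {"ram": b"old"}
buf = TransactionBuffer()
buf.receive(b"ne")
buf.discard()
after_failed = state["ram"]
buf.receive(b"new-")
buf.receive(b"state")
committed = buf.commit(lambda data: state.update(ram=data))
```

The cost of this design is the one raised in the thread: the buffer can grow as large as the data being transferred, which is why spilling it to a temporary disk image is suggested as an option.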
Re: [RFC PATCH 07/20] Introduce qemu_put_vector() and qemu_put_vector_prepare() to use put_vector() in QEMUFile.
Anthony Liguori wrote:
> On 04/22/2010 11:02 PM, Yoshiaki Tamura wrote:
>> Anthony Liguori wrote:
>>> On 04/21/2010 12:57 AM, Yoshiaki Tamura wrote:
>>>> As a foolproofing measure, qemu_put_vector_prepare() should be called before qemu_put_vector(). Then, if any qemu_put_* function other than this one is called after qemu_put_vector_prepare(), the program will abort().
>>>>
>>>> Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
>>>
>>> I don't get it. What's this protecting against?
>>
>> This was introduced to prevent mixing the order of normal writes and vector writes, and to flush the QEMUFile buffer before handling vectors. While qemu_put_buffer() copies data to the QEMUFile buffer, qemu_put_vector() bypasses that buffer.
>>
>> It's just a foolproofing measure for what we encountered at the beginning, and if the user of qemu_put_vector() is careful enough, we can remove qemu_put_vector_prepare(). While writing this message, I started to think that just calling qemu_fflush() in qemu_put_vector() would be enough...
>
> I definitely think removing the vector stuff in the first version would simplify the process of getting everything merged. I'd prefer not to have two APIs, so if vector operations were important from a performance perspective, I'd want to see everything converted to a vector API.

I agree with your opinion. I will measure the effect of introducing the vector stuff, and post the data later.
Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
Anthony Liguori wrote:
> On 04/22/2010 10:37 PM, Yoshiaki Tamura wrote:
>> Anthony Liguori wrote:
>>> On 04/21/2010 12:57 AM, Yoshiaki Tamura wrote:
>>>> QEMUFile currently doesn't support writev(). For sending multiple pieces of data, such as pages, using writev() should be more efficient.
>>>>
>>>> Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
>>>
>>> Is there performance data that backs this up? Since QEMUFile uses a linear buffer for most operations that's limited to 16k, I suspect you wouldn't be able to observe a difference in practice.
>>
>> I currently don't have data, but I'll prepare it. There were two things I wanted to avoid.
>>
>> 1. Pages being copied to the QEMUFile buf through qemu_put_buffer().
>> 2. Calling write() every time even when we want to send multiple pages at once.
>>
>> I think 2 may be negligible. But 1 seems to be problematic if we want to make the latency as small as possible, no?
>
> Copying often has strange CPU characteristics depending on whether the data is already in cache. It's better to drive these sorts of optimizations through performance measurement because changes are not always obvious.

I agree.
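The two paths being compared in this thread can be shown side by side with Python's os.write() and os.writev(), which wrap the same system calls on Unix. This is only a sketch of the difference in data movement: the function names are hypothetical, a pipe stands in for the migration socket, and nothing here measures performance, which the thread agrees is the open question.

```python
import os


def send_pages_copy(fd, pages):
    """Copy every page into one staging buffer, then issue one write()."""
    staging = b"".join(pages)  # extra user-space copy of every page
    return os.write(fd, staging)


def send_pages_vector(fd, pages):
    """Hand the list of pages to the kernel with a single writev()."""
    return os.writev(fd, pages)  # no user-space staging copy


# Usage: two 4 KiB "pages" sent through a pipe.
pages = [b"a" * 4096, b"b" * 4096]
r, w = os.pipe()
sent = send_pages_vector(w, pages)
received = os.read(r, sent)
os.close(r)
os.close(w)
```

Both functions issue one system call for the whole batch; the vector version differs only in skipping the staging copy, which corresponds to point 1 in the quoted list.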
Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
Anthony Liguori wrote:
> On 04/22/2010 08:53 PM, Yoshiaki Tamura wrote:
>> Anthony Liguori wrote:
>>> On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:
>>>> 2010/4/22 Dor Laor dl...@redhat.com:
>>>>> On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:
>>>>>> Dor Laor wrote:
>>>>>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We have been implementing the prototype of Kemari for KVM, and we're sending this message to share what we have now and TODO lists. Hopefully, we would like to get early feedback to keep us in the right direction. Although advanced approaches in the TODO lists are fascinating, we would like to run this project step by step while absorbing comments from the community. The current code is based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.
>>>>>>>>
>>>>>>>> For those who are new to Kemari for KVM, please take a look at the following RFC which we posted last year.
>>>>>>>>
>>>>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html
>>>>>>>>
>>>>>>>> The transmission/transaction protocol, and most of the control logic is implemented in QEMU. However, we needed a hack in KVM to prevent rip from proceeding before synchronizing VMs. It may also need some plumbing in the kernel side to guarantee replayability of certain events and instructions, integrate the RAS capabilities of newer x86 hardware with the HA stack, as well as for optimization purposes, for example.
>>>>>>>
>>>>>>> [snip]
>>>>>>>
>>>>>>>> The rest of this message describes TODO lists grouped by each topic.
>>>>>>>>
>>>>>>>> === event tapping ===
>>>>>>>>
>>>>>>>> Event tapping is the core component of Kemari, and it decides on which event the primary should synchronize with the secondary. The basic assumption here is that outgoing I/O operations are idempotent, which is usually true for disk I/O and reliable network protocols such as TCP.
>>>>>>>
>>>>>>> IMO any type of network event should be stalled too. What if the VM runs a non-TCP protocol and the packet that the master node sent reached some remote client, and before the sync to the slave the master failed?
>>>>>>
>>>>>> In the current implementation, it is actually stalling any type of network traffic that goes through virtio-net. However, if the application was using unreliable protocols, it should have its own recovery mechanism, or it should be completely stateless.
>>>>>
>>>>> Why do you treat tcp differently? You can damage the entire VM this way - think of a dhcp request that was dropped at the moment you switched between the master and the slave?
>>>>
>>>> I'm not trying to say that we should treat tcp differently, but just that it's severe. In the case of a dhcp request, the client would have a chance to retry after failover, correct?
>>>>
>>>> BTW, in current implementation,
>>>
>>> I'm slightly confused about the current implementation vs. my recollection of the original paper with Xen. I had thought that all disk and network I/O was buffered in such a way that at each checkpoint, the I/O operations would be released in a burst. Otherwise, you would have to synchronize after every I/O operation, which is what it seems the current implementation does.
>>
>> Yes, you're almost right. It's synchronizing before QEMU starts emulating I/O at each device model.
>
> If NodeA is the master and NodeB is the slave, if NodeA sends a network packet, you'll checkpoint before the packet is actually sent, and then if a failure occurs before the next checkpoint, won't that result in both NodeA and NodeB sending out a duplicate version of the packet?

Yes. But I think it's better than taking the checkpoint after. If we checkpoint after sending the packet, let's say it sent a TCP ACK to the client, and if a hardware failure occurred on NodeA during the transaction *but the client received the TCP ACK*, NodeB will resume from the previous state, and it may need to receive some data from the client. However, because the client has already received the TCP ACK, it won't resend the data to NodeB. It looks like this data is going to be dropped.

Anyway, I've just started planning to move the sync point to the network/block layer, and I will post the result for discussion again.
Re: [RFC PATCH 00/20] Kemari for KVM v0.1
Avi Kivity wrote:
> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>> Kemari starts synchronizing VMs when QEMU handles I/O requests. Without this patch VCPU state is already proceeded before synchronization, and after failover to the VM on the receiver, it hangs because of this.
>
> We discussed moving the barrier to the actual output device, instead of the I/O port. This allows you to complete the I/O transaction before starting synchronization. Does it not work for some reason?

Sorry, I've just started working on that. I've posted this series to share what I have done so far. Thanks for looking.
[no subject]
subscribe kvm
Networking - Static NATs
Hi,

I wonder if anyone is able to help me?

I am trying to get a system set up that consists of:

1x CentOS 5.4 host machine
1x kvm virtual guest, which will be a build server with dhcp running
x number of guests, built from the build virtual machine

The problem I have is that kvm currently has dhcp running and is setting up NATs etc. I need to stop this, but still allow my current virtual machines access out. What would be the best way to do this?

Kind Regards
Tony
Re: Networking - Static NATs
On Mon, Apr 26, 2010 at 7:10 AM, Anthony Davis t...@specialistdevelopment.com wrote:
> the problem I have is that kvm currently has dhcp running and setting up NATs etc... I need to stop this, but still allow my current virtual machines access out, how would be the best way to do this?

use bridged networking

--
Javier
Re: Networking - Static NATs
Quoting Javier Guerra Giraldez jav...@guerrag.com:
> On Mon, Apr 26, 2010 at 7:10 AM, Anthony Davis t...@specialistdevelopment.com wrote:
>> the problem I have is that kvm currently has dhcp running and setting up NATs etc... I need to stop this, but still allow my current virtual machines access out, how would be the best way to do this?
>
> use bridged networking

I have read about this, but there are so many methods and out-of-date ways to set it up, and none of them describes the best method to remove the current dnsmasq setup.

Tony

> --
> Javier
Re: KVM_SET_MP_STATE is undocumented
On 04/26/2010 10:48 AM, Pekka Enberg wrote:
> Two more interesting but undocumented ioctls:
>
> - KVM_SET_IDENTITY_MAP_ADDR
> - KVM_SET_BOOT_CPU_ID

I'll post patches.

> Little background: we're debugging a KVM_EXIT_UNKNOWN problem for the largest bug-free kernel on Core i5 machine. I've been looking at plain QEMU sources but it seems qemu-kvm that the person is using does much more during initialization. Do we have a known good list of mandatory steps required to properly initialize KVM on all CPUs?

KVM_GET_API_VERSION (unless you're sure you aren't running on 2.6.20 or 2.6.21)
KVM_CREATE_VM
KVM_SET_USER_MEMORY_REGION
KVM_CREATE_VCPU
KVM_SET_TSS_ADDR
KVM_SET_IDENTITY_MAP_ADDR (really only needed on EPT machines, but recommended to invoke on all hosts)
KVM_CREATE_IRQCHIP (optional; if you want in-kernel lapic/ioapic/pic)
KVM_SET_CPUID2
KVM_RUN

qemu also initializes all the vcpu state from its own values and has elaborate memory setup.

--
error compiling committee.c: too many arguments to function
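As a rough illustration of the first step in the list above, the Python sketch below records the ioctl order as data and issues only KVM_GET_API_VERSION. It assumes an x86 Linux host, where _IO(KVMIO, nr) encodes to (0xAE << 8) | nr; other architectures encode _IO differently. The names INIT_STEPS and probe_kvm() are hypothetical, and the remaining steps need structures that are out of scope for a short sketch.

```python
import fcntl
import os

KVMIO = 0xAE


def _io(nr):
    """_IO(KVMIO, nr) as encoded on x86 Linux: no direction bits, no size."""
    return (KVMIO << 8) | nr


KVM_GET_API_VERSION = _io(0x00)

# The order given in the message above, kept as plain data.
INIT_STEPS = [
    "KVM_GET_API_VERSION",
    "KVM_CREATE_VM",
    "KVM_SET_USER_MEMORY_REGION",
    "KVM_CREATE_VCPU",
    "KVM_SET_TSS_ADDR",
    "KVM_SET_IDENTITY_MAP_ADDR",
    "KVM_CREATE_IRQCHIP",  # optional
    "KVM_SET_CPUID2",
    "KVM_RUN",
]


def probe_kvm(path="/dev/kvm"):
    """Return the KVM API version, or None if path cannot be used."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)


print(probe_kvm())
```

On a host without /dev/kvm, or without permission to open it, probe_kvm() returns None instead of raising, which makes it usable as a simple availability check.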
[PATCH] qemu-kvm: fix crash on reboot with vhost-net
When vhost-net is disabled on reboot, we set msix mask notifier to NULL to disable further mask/unmask notifications. Code currently tries to pass this NULL to notifier, leading to a crash. The right thing to do is: - if vector is masked, we don't need to notify backend, just disable future notifications - if vector is unmasked, invoke callback to unassign backend, then disable future notifications Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Note: this patch is for qemu-kvm, the code in question is not in qemu.git. hw/msix.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index 3ec8805..43361b5 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -613,9 +613,18 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector]) return 0; -if (dev-msix_mask_notifier) -r = dev-msix_mask_notifier(dev, vector, opaque, -msix_is_masked(dev, vector)); +if (dev-msix_mask_notifier !msix_is_masked(dev, vector)) { +/* Switching notifiers while vector is unmasked: + * mask the old one, unmask the new one. */ +if (dev-msix_mask_notifier_opaque[vector]) { +r = dev-msix_mask_notifier(dev, vector, +dev-msix_mask_notifier_opaque[vector], +1); +} +if (r = 0 opaque) { +r = dev-msix_mask_notifier(dev, vector, opaque, 0); +} +} if (r = 0) dev-msix_mask_notifier_opaque[vector] = opaque; return r; -- 1.7.1.rc1.22.g3163 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [uq/master patch 2/5] kvm: add logging count to slots
On Sun, Apr 25, 2010 at 05:17:55PM +0300, Avi Kivity wrote: On 04/25/2010 04:57 PM, Jan Kiszka wrote: It's still a good idea. The current API assumes that there will be only one slot-based client (or that multiple clients will keep the refcount themselves). After the bytemap - multiple bitmaps conversion this can be extended to each client getting its own bitmap (and therefore, s/refcount/list of bitmaps/ and s/!refcount/list_empty()/). No concerns if - there is an existing use case for multiple clients, at least in qemu-kvm There isn't. But I don't like hidden breakage. - the logging API is consistently converted, not just extended (IOW, migration_log is converted to logging_count) migration_log needs to remain global, since we want hotplug memory to autostart logging. - someone signs he checked that current use of start/stop in qemu is completely symmetrical (I think to remember this used to be not the case, but I might be wrong) I remember this too. Marcelo? Don't see any guarantee that it is symmetrical. Anyway, will drop the patch from the series. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jumbo frames with virtio net
Hi all Is it possible to configure jumbo frames (mtu=9000) on a kvm guest using virtio net drivers? Thanks. -- CL Martinez carlopmart {at} gmail {d0t} com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Synchronized time with kvm_clock
Alex, You don't need to run ntp on each guest. You can enable rtc support in the guest kernel and on the hypervisor. Run ntp client on the hypervisor via cron, and use hwclock -w on the hypervisor after you run ntp, to sync the hardware clock to the system clock (which is now updated by ntpdate). On the guests, periodically run hwclock -s to set the system clock from the hw clock. This seems to work extremely well, the clocksource on the guests as kvm_clock, and as long as you have the clocksource as hpet or acpi_pm on the hypervisor, there doesn't seem to be any problems with keeping time. The only thing I've noticed is that when you reboot, the very first guest will have the wrong time on boot, so the uptime is messed up. Regards Alex Hermann wrote: On Monday 26 April 2010, I wrote: host: cat /sys/devices/system/clocksource/clocksource0/current_clocksource tsc guest: cat /sys/devices/system/clocksource/clocksource0/current_clocksource kvm-clock Forgotten some info which might be essential: Kernel (host and guest): 2.6.32-trunk-amd64 qemu-kvm: 0.12.3+dfsg-4 Please keep me on the cc, I'm not on the list. -- John Buswell CEO, Carbon Mountain LLC http://www.carbonmountain.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Synchronized time with kvm_clock
On Mon, Apr 26, 2010 at 10:32:51AM -0400, John Buswell wrote: You don't need to run ntp on each guest. You can enable rtc support in the guest kernel and on the hypervisor. Run ntp client on the hypervisor via cron, and use hwclock -w on the hypervisor after you run ntp, to sync the hardware clock to the system clock (which is now updated by ntpdate). On the guests, periodically run hwclock -s to set the system clock from the hw clock. What a *horribly* hacky way to do it, meaning you'll get time warps all over the place, admittedly of short intervals if you run those cron jobs often enough. It seems much simpler to me to simply run ntpd in all the guests. It's not like the extra CPU or bandwidth is going to be a problem. At the very least you want to run ntpd, not ntpdate out of cron, in the hypervisor, and only use cron for those hwclock -w's. This seems to work extremely well, the clocksource on the guests as kvm_clock, and as long as you have the clocksource as hpet or acpi_pm on the hypervisor, there doesn't seem to be any problems with keeping time. The only thing I've noticed is that when you reboot, the very first guest will have the wrong time on boot, so the uptime is messed up. And I think many people would find this unacceptable. Really, I appreciate that keep the time sync'd via ntpd on the hypervisor and have it passed accurately to the guests has a certain elegant simplicity about it. But if you achieve the latter by periodically resyncing against what the guest sees as its hardware clock you've lost that elegance again. It really needs to 'just work' via KVM code in the guest kernel using the exact same time as the hypervisor kernel is supplying. -- - Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/ Finger athan(at)fysh.org for PGP key And it's me who is my enemy. Me who beats me up. Me who makes the monsters. Me who strips my confidence. Paula Cole - ME signature.asc Description: Digital signature
Re: Synchronized time with kvm_clock
Athanasius wrote: On Mon, Apr 26, 2010 at 10:32:51AM -0400, John Buswell wrote: You don't need to run ntp on each guest. You can enable rtc support in the guest kernel and on the hypervisor. Run ntp client on the hypervisor via cron, and use hwclock -w on the hypervisor after you run ntp, to sync the hardware clock to the system clock (which is now updated by ntpdate). On the guests, periodically run hwclock -s to set the system clock from the hw clock. What a *horribly* hacky way to do it, meaning you'll get time warps all over the place, admittedly of short intervals if you run those cron jobs often enough. It seems much simpler to me to simply run ntpd in all the guests. It's not like the extra CPU or bandwidth is going to be a problem. At the very least you want to run ntpd, not ntpdate out of cron, in the hypervisor, and only use cron for those hwclock -w's. Not really. You don't get time warps at all, the only place you get a time warp is on the initial guest, and thats not a problem with the workaround I suggested. It seems to be an issue with the clock on the initial guest. There is no point wasting resources by running ntpd on each guest when you don't have to. This seems to work extremely well, the clocksource on the guests as kvm_clock, and as long as you have the clocksource as hpet or acpi_pm on the hypervisor, there doesn't seem to be any problems with keeping time. The only thing I've noticed is that when you reboot, the very first guest will have the wrong time on boot, so the uptime is messed up. And I think many people would find this unacceptable. This particular problem has nothing to do with what I suggested above. This is some kind of issue with kvm_clock on the first guest starting up. Really, I appreciate that keep the time sync'd via ntpd on the hypervisor and have it passed accurately to the guests has a certain elegant simplicity about it. 
But if you achieve the latter by periodically resyncing against what the guest sees as its hardware clock you've lost that elegance again. It really needs to 'just work' via KVM code in the guest kernel using the exact same time as the hypervisor kernel is supplying. I agree. Unfortunately, kvm_clock doesn't seem to be quite there yet. So using rtc0 as a comparison, and keeping the hypervisor clock in sync with reality, is a good way to avoid having to run N+1 copies of ntpd on the guests :) -- John Buswell CEO, Carbon Mountain LLC http://www.carbonmountain.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] document boot option to -drive parameter
On Fri, Apr 16, 2010 at 02:41:37PM -0600, Bruce Rogers wrote: The boot option is missing from the documentation for the -drive parameter. If there is a better way to descibe it, I'm all ears. Signed-off-by: Bruce Rogers brog...@novell.com diff --git a/qemu-options.hx b/qemu-options.hx index c5a160c..fbcf61e 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -160,6 +160,8 @@ an untrusted format header. This option specifies the serial number to assign to the device. @item ad...@var{addr} Specify the controller's PCI address (if=virtio only). +...@item bo...@var{boot} +...@var{boot} is on or off and allows for booting from non-traditional interfaces, such as virtio. @end table By default, writethrough caching is used for all block device. This means that Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM call agenda for Apr 27
Please send in any agenda items you are interested in covering. While I don't expect it to be the case this week, if we have a lack of agenda items I'll cancel the week's call. thanks, -chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.
On Fri, Apr 23, 2010 at 01:58:22PM +0800, Gui Jianfeng wrote: Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is performed after calling kvm_mmu_zap_page() in spite of that whether page is actually reclaimed. Because root sp won't be reclaimed by kvm_mmu_zap_page(). So making kvm_mmu_zap_page() return total number of reclaimed sp makes more sense. A new flag is put into kvm_mmu_zap_page() to indicate whether the top page is reclaimed. Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 53 +++ 1 files changed, 36 insertions(+), 17 deletions(-) Gui, There will be only a few pinned roots, and there is no need for kvm_mmu_change_mmu_pages to be precise at that level (pages will be reclaimed through kvm_unmap_hva eventually). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] pvclock fixes
Hi, This is the last series I've sent, with comments from you merged. The first 5 patches are the same, only with the suggested fixes. I am leaving documentation out, since the basics won't change, and we're still discussing the details. Patch 6 is new, and is the guest side of the skipping updates avi asked for. I haven't yet done any HV work on this (specially because I am not convinced exactly where it is safe to do). Let me know what you think. Thanks Glauber Costa (6): Enable pvclock flags in vcpu_time_info structure Add a global synchronization point for pvclock change msr numbers for kvmclock export new cpuid KVM_CAP Try using new kvm clock msrs don't compute pvclock adjustments if we trust the tsc arch/x86/include/asm/kvm_para.h| 13 arch/x86/include/asm/pvclock-abi.h |4 ++- arch/x86/include/asm/pvclock.h |1 + arch/x86/kernel/kvmclock.c | 59 +++- arch/x86/kernel/pvclock.c | 37 ++ arch/x86/kvm/x86.c | 13 +++- include/linux/kvm.h|1 + 7 files changed, 105 insertions(+), 23 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] Add a global synchronization point for pvclock
In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5mi warps a minute in some systems. (To be fair, it wasn't that bad in most of them.)

Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in their tsc flags, especially on multi-socket ones.

Two reads of the same kernel timestamp at approximately the same time will likely have the tsc timestamped on different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading the clock on a later occasion.

Some adjustments on the host could make this window less likely to happen, but it still pretty much poses an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the clock's last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detect that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new reading. This does cause contention on big boxes (but big here means *BIG*), but it seems like a good trade-off, since it provides us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines where I saw them before.
Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
CC: Avi Kivity a...@redhat.com
CC: Marcelo Tosatti mtosa...@redhat.com
CC: Zachary Amsden zams...@redhat.com
---
 arch/x86/kernel/pvclock.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 8f4af7b..6cf6dec 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 	return pv_tsc_khz;
 }
 
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
 	unsigned version;
 	cycle_t ret, offset;
+	u64 last;
 
 	do {
 		version = pvclock_get_time_values(&shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		barrier();
 	} while (version != src->version);
 
+	/*
+	 * Assumption here is that last_value, a global accumulator, always goes
+	 * forward. If we are less than that, we should not be much smaller.
+	 * We assume there is an error marging we're inside, and then the correction
+	 * does not sacrifice accuracy.
+	 *
+	 * For reads: global may have changed between test and return,
+	 * but this means someone else updated poked the clock at a later time.
+	 * We just need to make sure we are not seeing a backwards event.
+	 *
+	 * For updates: last_value = ret is not enough, since two vcpus could be
+	 * updating at the same time, and one of them could be slightly behind,
+	 * making the assumption that last_value always go forward fail to hold.
+	 */
+	last = atomic64_read(&last_value);
+	do {
+		if (ret < last)
+			return last;
+		last = atomic64_cmpxchg(&last_value, last, ret);
+	} while (unlikely(last != ret));
+
 	return ret;
 }
 
--
1.6.2.2
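The lock-free clamp described in this patch can be tried outside the kernel with C11 atomics. The following is a userspace sketch only, assuming <stdatomic.h> in place of the kernel's atomic64_t API; the function name monotonic_clamp is made up, and the loop is shaped around atomic_compare_exchange_weak, so it is not a line-by-line copy of the hunk above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Highest timestamp handed out so far, shared by all readers. */
static _Atomic uint64_t last_value;

/* Return `raw` unless some other reader already published a later value,
 * in which case return that later value so time never appears to go back. */
uint64_t monotonic_clamp(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (raw < last)
            return last;    /* we are behind: report the newer global value */
        /* On failure the CAS reloads `last` with the current global value. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, raw));

    return raw;             /* we published `raw` as the new maximum */
}
```

A reader that loses the race either retries with a fresher `last` or returns the value the winner published, which is the forward-only property the commit message argues for.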
[PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
This patch removes one padding byte and transforms it into a flags field. New versions of guests using pvclock will query these flags upon each read.

Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid is usually determined via HV negotiation.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
---
 arch/x86/include/asm/pvclock-abi.h |    3 ++-
 arch/x86/include/asm/pvclock.h     |    1 +
 arch/x86/kernel/pvclock.c          |    9 +++++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
 	u64   system_time;
 	u32   tsc_to_system_mul;
 	s8    tsc_shift;
-	u8    pad[3];
+	u8    flags;
+	u8    pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
 struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..c50823f 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_valid_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
 			    struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..8f4af7b 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
 	u32 tsc_to_nsec_mul;
 	int tsc_shift;
 	u32 version;
+	u8  flags;
 };
 
+static u8 valid_flags = 0;
+
+void pvclock_valid_flags(u8 flags)
+{
+	valid_flags = flags;
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
 		dst->system_timestamp  = src->system_time;
 		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
 		dst->tsc_shift         = src->tsc_shift;
+		dst->flags             = src->flags;
 		rmb();		/* test version after fetching data */
 	} while ((src->version & 1) || (dst->version != src->version));
 
--
1.6.2.2
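Because the flags byte is carved out of existing padding, the ABI size must not change. A small userspace sketch can check that at compile time. The three fields before system_time are not visible in the hunk and are written here from my recollection of the kernel header, so treat them as an assumption; pvclock_flag_in_effect is a hypothetical helper that only illustrates the "host set it and guest opted in" rule from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the layout after this patch. Fields above
 * system_time are assumed; they are not part of the posted hunk. */
struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;             /* was the first of three pad bytes */
    uint8_t  pad[2];
} __attribute__((__packed__));

/* The structure is shared with the hypervisor, so its size is fixed. */
_Static_assert(sizeof(struct pvclock_vcpu_time_info) == 32,
               "pvclock ABI structure must stay 32 bytes");
_Static_assert(offsetof(struct pvclock_vcpu_time_info, flags) == 29,
               "flags must reuse the first former pad byte");

/* A flag is honoured only if the host reported it AND the guest
 * registered it as valid (cf. pvclock_valid_flags in the patch). */
int pvclock_flag_in_effect(uint8_t host_flags, uint8_t valid_flags, uint8_t flag)
{
    return (host_flags & valid_flags & flag) != 0;
}
```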
[PATCH 6/6] don't compute pvclock adjustments if we trust the tsc
If the HV told us we can fully trust the TSC, skip any correction Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h|5 + arch/x86/include/asm/pvclock-abi.h |1 + arch/x86/kernel/kvmclock.c |3 +++ arch/x86/kernel/pvclock.c |4 4 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index f019f8c..615ebb1 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -21,6 +21,11 @@ */ #define KVM_FEATURE_CLOCKSOURCE23 +/* The last 8 bits are used to indicate how to interpret the flags field + * in pvclock structure. If no bits are set, all flags are ignored. + */ +#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC 0xf8 + #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index ec5c41a..b123bd7 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -39,5 +39,6 @@ struct pvclock_wall_clock { u32 nsec; } __attribute__((__packed__)); +#define PVCLOCK_STABLE_TSC (1 0) #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PVCLOCK_ABI_H */ diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index f2f6aee..aca2d3c 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -218,4 +218,7 @@ void __init kvmclock_init(void) clocksource_register(kvm_clock); pv_info.paravirt_enabled = 1; pv_info.name = KVM; + + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_TSC)) + pvclock_valid_flags(PVCLOCK_STABLE_TSC); } diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 6cf6dec..43ae8d5 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) barrier(); } while (version != src-version); + if ((valid_flags PVCLOCK_STABLE_TSC) + (shadow-flags PVCLOCK_STABLE_TSC)) + return ret; + /* * 
Assumption here is that last_value, a global accumulator, always goes * forward. If we are less than that, we should not be much smaller. -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] Try using new kvm clock msrs
We now added a new set of clock-related msrs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a non-existant MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of msrs are supported. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/kernel/kvmclock.c | 56 +++ 1 files changed, 35 insertions(+), 21 deletions(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index feaeb0d..f2f6aee 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -29,6 +29,8 @@ #define KVM_SCALE 22 static int kvmclock = 1; +static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME; +static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK; static int parse_no_kvmclock(char *arg) { @@ -41,6 +43,7 @@ early_param(no-kvmclock, parse_no_kvmclock); static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock); static struct pvclock_wall_clock wall_clock; + /* * The wallclock is the time of day when we booted. Since then, some time may * have elapsed since the hypervisor wrote the data. 
So we try to account for @@ -54,7 +57,8 @@ static unsigned long kvm_get_wallclock(void) low = (int)__pa_symbol(wall_clock); high = ((u64)__pa_symbol(wall_clock) 32); - native_write_msr(MSR_KVM_WALL_CLOCK, low, high); + + native_write_msr_safe(msr_kvm_wall_clock, low, high); vcpu_time = get_cpu_var(hv_clock); pvclock_read_wallclock(wall_clock, vcpu_time, ts); @@ -130,7 +134,8 @@ static int kvm_register_clock(char *txt) high = ((u64)__pa(per_cpu(hv_clock, cpu)) 32); printk(KERN_INFO kvm-clock: cpu %d, msr %x:%x, %s\n, cpu, high, low, txt); - return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high); + + return native_write_msr_safe(msr_kvm_system_time, low, high); } #ifdef CONFIG_X86_LOCAL_APIC @@ -165,14 +170,15 @@ static void __init kvm_smp_prepare_boot_cpu(void) #ifdef CONFIG_KEXEC static void kvm_crash_shutdown(struct pt_regs *regs) { - native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0); + + native_write_msr(msr_kvm_system_time, 0, 0); native_machine_crash_shutdown(regs); } #endif static void kvm_shutdown(void) { - native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0); + native_write_msr(msr_kvm_system_time, 0, 0); native_machine_shutdown(); } @@ -181,27 +187,35 @@ void __init kvmclock_init(void) if (!kvm_para_available()) return; - if (kvmclock kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { - if (kvm_register_clock(boot clock)) - return; - pv_time_ops.sched_clock = kvm_clock_read; - x86_platform.calibrate_tsc = kvm_get_tsc_khz; - x86_platform.get_wallclock = kvm_get_wallclock; - x86_platform.set_wallclock = kvm_set_wallclock; + if (kvmclock kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { + msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; + msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; + } + else if (!(kvmclock kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))) + return; + + printk(KERN_INFO kvm-clock: Using msrs %x and %x, + msr_kvm_system_time, msr_kvm_wall_clock); + + if (kvm_register_clock(boot clock)) + return; + pv_time_ops.sched_clock = kvm_clock_read; + 
x86_platform.calibrate_tsc = kvm_get_tsc_khz; + x86_platform.get_wallclock = kvm_get_wallclock; + x86_platform.set_wallclock = kvm_set_wallclock; #ifdef CONFIG_X86_LOCAL_APIC - x86_cpuinit.setup_percpu_clockev = - kvm_setup_secondary_clock; + x86_cpuinit.setup_percpu_clockev = + kvm_setup_secondary_clock; #endif #ifdef CONFIG_SMP - smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; + smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; #endif - machine_ops.shutdown = kvm_shutdown; + machine_ops.shutdown = kvm_shutdown; #ifdef CONFIG_KEXEC - machine_ops.crash_shutdown = kvm_crash_shutdown; + machine_ops.crash_shutdown = kvm_crash_shutdown; #endif - kvm_get_preset_lpj(); - clocksource_register(kvm_clock); - pv_info.paravirt_enabled = 1; - pv_info.name = KVM; - } + kvm_get_preset_lpj(); + clocksource_register(kvm_clock); + pv_info.paravirt_enabled = 1; + pv_info.name = KVM; } -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to
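The selection logic in kvmclock_init() from this patch can be modelled in isolation. This is a sketch under assumptions: the feature word is passed in as a plain bitmask instead of being read through cpuid, the constants are copied from patches 3/6 and 4/6 of this series, and pick_system_time_msr and pick_wall_clock_msr are made-up names that return 0 when no clock source is offered.

```c
#include <stdint.h>

/* Values as posted in patches 3/6 and 4/6 of this series. */
#define KVM_FEATURE_CLOCKSOURCE     0
#define KVM_FEATURE_CLOCKSOURCE2    3
#define MSR_KVM_WALL_CLOCK          0x11
#define MSR_KVM_SYSTEM_TIME         0x12
#define MSR_KVM_WALL_CLOCK_NEW      0x4b564d00
#define MSR_KVM_SYSTEM_TIME_NEW     0x4b564d01

static int has_feature(uint32_t features, unsigned bit)
{
    return (features >> bit) & 1u;
}

/* Prefer the new MSR, fall back to the old one, 0 if neither is offered. */
uint32_t pick_system_time_msr(uint32_t features)
{
    if (has_feature(features, KVM_FEATURE_CLOCKSOURCE2))
        return MSR_KVM_SYSTEM_TIME_NEW;
    if (has_feature(features, KVM_FEATURE_CLOCKSOURCE))
        return MSR_KVM_SYSTEM_TIME;
    return 0;
}

uint32_t pick_wall_clock_msr(uint32_t features)
{
    if (has_feature(features, KVM_FEATURE_CLOCKSOURCE2))
        return MSR_KVM_WALL_CLOCK_NEW;
    if (has_feature(features, KVM_FEATURE_CLOCKSOURCE))
        return MSR_KVM_WALL_CLOCK;
    return 0;
}
```

Deciding from the feature bits rather than probing the MSR matches the reasoning in the commit message: a write to a non-existent MSR this early in boot would fault before the IDT handlers are in place.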
[PATCH 4/6] export new cpuid KVM_CAP
Since we're changing the msrs kvmclock uses, we have to communicate that to the guest, through cpuid. We can add a new KVM_CAP to the hypervisor, and then patch userspace to recognize it. And if we ever add a new cpuid bit in the future, we have to do that again, which create some complexity and delay in feature adoption. Instead, what I'm proposing in this patch is a new capability, called KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list currently supported by the hypervisor. If we ever want to add or remove some feature, we only need to tweak into the HV, leaving userspace untouched. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h |4 arch/x86/kvm/x86.c |6 ++ include/linux/kvm.h |1 + 3 files changed, 11 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 9734808..f019f8c 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -16,6 +16,10 @@ #define KVM_FEATURE_CLOCKSOURCE0 #define KVM_FEATURE_NOP_IO_DELAY 1 #define KVM_FEATURE_MMU_OP 2 +/* This indicates that the new set of kvmclock msrs + * are available. 
The use of 0x11 and 0x12 is deprecated + */ +#define KVM_FEATURE_CLOCKSOURCE23 #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a2ead7f..04f04aa 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_MCE: r = KVM_MAX_MCE_BANKS; break; + case KVM_CAP_X86_CPUID_FEATURE_LIST: + r = (1 KVM_FEATURE_CLOCKSOURCE) | + (1 KVM_FEATURE_NOP_IO_DELAY) | + (1 KVM_FEATURE_MMU_OP) | + (1 KVM_FEATURE_CLOCKSOURCE2); + break; default: r = 0; break; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ce28767..1ce124f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -507,6 +507,7 @@ struct kvm_ioeventfd { #define KVM_CAP_DEBUGREGS 50 #endif #define KVM_CAP_X86_ROBUST_SINGLESTEP 51 +#define KVM_CAP_X86_CPUID_FEATURE_LIST 52 #ifdef KVM_CAP_IRQ_ROUTING -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
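For reference, the value the proposed KVM_CAP_X86_CPUID_FEATURE_LIST case would return can be worked out by hand. The sketch below uses the bit numbers from this patch; the function names are illustrative and not part of any KVM interface.

```c
#include <stdint.h>

/* Bit numbers as defined in kvm_para.h by this patch. */
#define KVM_FEATURE_CLOCKSOURCE     0
#define KVM_FEATURE_NOP_IO_DELAY    1
#define KVM_FEATURE_MMU_OP          2
#define KVM_FEATURE_CLOCKSOURCE2    3

/* Mirrors the expression in the new kvm_dev_ioctl_check_extension() case. */
uint32_t kvm_feature_list(void)
{
    return (1u << KVM_FEATURE_CLOCKSOURCE) |
           (1u << KVM_FEATURE_NOP_IO_DELAY) |
           (1u << KVM_FEATURE_MMU_OP) |
           (1u << KVM_FEATURE_CLOCKSOURCE2);
}

/* What userspace would do with the returned mask. */
int feature_listed(uint32_t list, unsigned bit)
{
    return (list >> bit) & 1u;
}
```

With the four features above the mask is 0xf, so userspace could copy it into the guest's cpuid leaf without knowing what each bit means.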
[PATCH 3/6] change msr numbers for kvmclock
Avi pointed out a while ago that those MSRs falls into the pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h |4 arch/x86/kvm/x86.c |7 ++- 2 files changed, 10 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index ffae142..9734808 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -20,6 +20,10 @@ #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 +/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */ +#define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00 +#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01 + #define KVM_MAX_MMU_OP_BATCH 32 /* Operations for KVM_HC_MMU_OP */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8824b73..a2ead7f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -575,9 +575,10 @@ static inline u32 bit(int bitno) * kvm-specific. Those are put in the beginning of the list. 
*/ -#define KVM_SAVE_MSRS_BEGIN5 +#define KVM_SAVE_MSRS_BEGIN7 static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, + MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, @@ -1099,10 +1100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) case MSR_IA32_MISC_ENABLE: vcpu-arch.ia32_misc_enable_msr = data; break; + case MSR_KVM_WALL_CLOCK_NEW: case MSR_KVM_WALL_CLOCK: vcpu-kvm-arch.wall_clock = data; kvm_write_wall_clock(vcpu-kvm, data); break; + case MSR_KVM_SYSTEM_TIME_NEW: case MSR_KVM_SYSTEM_TIME: { if (vcpu-arch.time_page) { kvm_release_page_dirty(vcpu-arch.time_page); @@ -1374,9 +1377,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) data = vcpu-arch.efer; break; case MSR_KVM_WALL_CLOCK: + case MSR_KVM_WALL_CLOCK_NEW: data = vcpu-kvm-arch.wall_clock; break; case MSR_KVM_SYSTEM_TIME: + case MSR_KVM_SYSTEM_TIME_NEW: data = vcpu-arch.time; break; case MSR_IA32_P5_MC_ADDR: -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
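A side note on the new range: the upper three bytes of 0x4b564d00 are the ASCII codes for "KVM", which makes the numbers easy to spot in a trace. The sketch below shows that, plus a range check; KVM_CUSTOM_MSR_BASE and both function names are made up for illustration and do not exist in the kernel.

```c
#include <stdint.h>

#define KVM_CUSTOM_MSR_BASE 0x4b564d00u   /* hypothetical name for the range base */

/* True if `msr` lies in 0x4b564d00-0x4b564dff, the range named in the patch. */
int is_kvm_custom_msr(uint32_t msr)
{
    return (msr & 0xffffff00u) == KVM_CUSTOM_MSR_BASE;
}

/* Byte `idx` of the MSR number counted from the most significant end:
 * idx 0..2 give the ASCII prefix, idx 3 is the index within the range. */
char msr_byte_from_top(uint32_t msr, unsigned idx)
{
    return (char)((msr >> (24 - 8 * idx)) & 0xffu);
}
```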
Re: KVM call agenda for Apr 27
On 04/26/2010 12:26 PM, Chris Wright wrote: Please send in any agenda items you are interested in covering. While I don't expect it to be the case this week, if we have a lack of agenda items I'll cancel the week's call. - qemu management interface (and libvirt) - stable tree policy (push vs. pull and call for stable volunteers) Regards, Anthony Liguori thanks, -chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM MMU: fix sp->unsync type error in trace event definition.
On Thu, Apr 22, 2010 at 05:33:57PM +0800, Gui Jianfeng wrote:
sp->unsync is bool now, so update trace event declaration.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmutrace.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 3851f1f..9966e80 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -11,7 +11,7 @@
 	__field(__u64, gfn) \
 	__field(__u32, role) \
 	__field(__u32, root_count) \
-	__field(__u32, unsync)
+	__field(bool, unsync)

 #define KVM_MMU_PAGE_ASSIGN(sp) \
 	__entry->gfn = sp->gfn; \
-- 
1.6.5.2

Applied, thanks.
2.6.32.12: Build warning due to 78ce64a384 / missing in 2.6.33?
Gleb, I'm getting a build warning with the latest 2.6.32.12 due to "Fix segment descriptor loading": load_segment_descriptor_to_kvm_desct is unused after that patch. I assume it's just forgotten code and did not accidentally become unused, right? The fact that 2.6.33.3 does not generate this warning makes me wonder why it apparently lacks the above patch. Not required, or not yet queued? Jan
[PATCH 08/10] introduce leul_to_cpu
To be used by next patch.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 bswap.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/bswap.h b/bswap.h
index aace9b7..956f3fa 100644
--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)

 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif

 #undef le_bswap
-- 
1.6.6.1
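As a rough illustration of what a "little-endian unsigned long to CPU order" conversion has to do, here is a minimal sketch. It is not the bswap.h macro: le_ulong_load is a hypothetical name, and it decodes byte by byte so that it behaves the same on big- and little-endian hosts. It deliberately avoids token pasting, since `##` pastes its operands before they are macro-expanded and a real macro would need an extra level of expansion to turn HOST_LONG_BITS into 32 or 64.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of bswap.h): decode an unsigned long that is
 * stored in little-endian byte order, regardless of the host's endianness.
 */
static unsigned long le_ulong_load(const unsigned char *p)
{
    unsigned long v = 0;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < sizeof(unsigned long); i++) {
        /* byte i carries bits [8*i, 8*i + 7] of the value */
        v |= (unsigned long)p[i] << (i * CHAR_BIT);
    }
    return v;
}
```

On a little-endian host this is equivalent to reading the word directly, which is why the non-big-endian branch of the patch can define the macro as (v).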
[PATCH 04/10] kvm: allow qemu to set EPT identity mapping address
From: Sheng Yang sh...@linux.intel.com If we use larger BIOS image than current 256KB, we would need move reserved TSS and EPT identity mapping pages. Currently TSS support this, but not EPT. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- target-i386/kvm.c | 26 +- 1 files changed, 25 insertions(+), 1 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index bb6dafa..f73b47b 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -326,6 +326,25 @@ static int kvm_has_msr_star(CPUState *env) return 0; } +static int kvm_init_identity_map_page(KVMState *s) +{ +#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR +int ret; +uint64_t addr = 0xfffbc000; + +if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) { +return 0; +} + +ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, addr); +if (ret 0) { +fprintf(stderr, kvm_set_identity_map_addr: %s\n, strerror(ret)); +return ret; +} +#endif +return 0; +} + int kvm_arch_init(KVMState *s, int smp_cpus) { int ret; @@ -353,7 +372,12 @@ int kvm_arch_init(KVMState *s, int smp_cpus) perror(e820_add_entry() table is full); exit(1); } -return kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000); +ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000); +if (ret 0) { +return ret; +} + +return kvm_init_identity_map_page(s); } static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs) -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
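For orientation, the two addresses in the patch above (0xfffbc000 for the identity map page and 0xfffbd000 for the TSS) are consistent with a layout in which the reserved pages sit directly below a 256KB BIOS image that ends at 4GB. The sketch below only reproduces that arithmetic. The helper names are hypothetical, and both the "directly below the BIOS" placement and the page counts (three pages for the TSS region, one for the identity map) are assumptions stated here, not something the patch spells out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ        0x1000ull
#define FOUR_GIB       0x100000000ull
#define TSS_PAGES      3   /* assumed size of the KVM_SET_TSS_ADDR region */
#define IDENTITY_PAGES 1   /* assumed size of the identity map region */

/* Hypothetical helpers: where the reserved pages land for a given BIOS size. */
static uint64_t bios_base(uint64_t bios_size)
{
    assert(bios_size % PAGE_SZ == 0);
    return FOUR_GIB - bios_size;            /* BIOS image ends at 4GB */
}

static uint64_t tss_base(uint64_t bios_size)
{
    return bios_base(bios_size) - TSS_PAGES * PAGE_SZ;
}

static uint64_t identity_map_base(uint64_t bios_size)
{
    return tss_base(bios_size) - IDENTITY_PAGES * PAGE_SZ;
}
```

With a 256KB BIOS these helpers yield the constants used in the patch; a larger image would push both regions further down, which is the situation the commit message is preparing for.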
[PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu
This is now done via the initialization's qemu_system_reset call.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm-all.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9c8aa7d..eabb097 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env)
     ret = kvm_arch_init_vcpu(env);
     if (ret == 0) {
         qemu_register_reset(kvm_reset_vcpu, env);
-        kvm_arch_reset_vcpu(env);
     }
 err:
     return ret;
-- 
1.6.6.1
[PATCH 09/10] kvm: port qemu-kvm's bitmap scanning
Which is significantly faster. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- kvm-all.c | 53 + 1 files changed, 37 insertions(+), 16 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index eabb097..ef45418 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -26,6 +26,7 @@ #include hw/hw.h #include gdbstub.h #include kvm.h +#include bswap.h /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */ #define PAGE_SIZE TARGET_PAGE_SIZE @@ -282,11 +283,41 @@ static int kvm_set_migration_log(int enable) return 0; } -static int test_le_bit(unsigned long nr, unsigned char *addr) +/* get kvm's dirty pages bitmap and update qemu's */ +static int kvm_get_dirty_pages_log_range(unsigned long start_addr, + unsigned long *bitmap, + unsigned long offset, + unsigned long mem_size) { -return (addr[nr 3] (nr 7)) 1; +unsigned int i, j; +unsigned long page_number, addr, addr1, c; +ram_addr_t ram_addr; +unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / +HOST_LONG_BITS; + +/* + * bitmap-traveling is faster than memory-traveling (for addr...) + * especially when most of the memory is not dirty. + */ +for (i = 0; i len; i++) { +if (bitmap[i] != 0) { +c = leul_to_cpu(bitmap[i]); +do { +j = ffsl(c) - 1; +c = ~(1ul j); +page_number = i * HOST_LONG_BITS + j; +addr1 = page_number * TARGET_PAGE_SIZE; +addr = offset + addr1; +ram_addr = cpu_get_physical_page_desc(addr); +cpu_physical_memory_set_dirty(ram_addr); +} while (c != 0); +} +} +return 0; } +#define ALIGN(x, y) (((x)+(y)-1) ~((y)-1)) + /** * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space * This function updates qemu's dirty bitmap using cpu_physical_memory_set_dirty(). 
@@ -300,8 +331,6 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, { KVMState *s = kvm_state; unsigned long size, allocated_size = 0; -target_phys_addr_t phys_addr; -ram_addr_t addr; KVMDirtyLog d; KVMSlot *mem; int ret = 0; @@ -313,7 +342,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, break; } -size = ((mem-memory_size TARGET_PAGE_BITS) + 7) / 8; +size = ALIGN(((mem-memory_size) TARGET_PAGE_BITS), HOST_LONG_BITS) / 8; if (!d.dirty_bitmap) { d.dirty_bitmap = qemu_malloc(size); } else if (size allocated_size) { @@ -330,17 +359,9 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, break; } -for (phys_addr = mem-start_addr, addr = mem-phys_offset; - phys_addr mem-start_addr + mem-memory_size; - phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) { -unsigned char *bitmap = (unsigned char *)d.dirty_bitmap; -unsigned nr = (phys_addr - mem-start_addr) TARGET_PAGE_BITS; - -if (test_le_bit(nr, bitmap)) { -cpu_physical_memory_set_dirty(addr); -} -} -start_addr = phys_addr; +kvm_get_dirty_pages_log_range(mem-start_addr, d.dirty_bitmap, + mem-start_addr, mem-memory_size); +start_addr = mem-start_addr + mem-memory_size; } qemu_free(d.dirty_bitmap); -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
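The speedup in the patch above comes from walking the bitmap one word at a time and skipping zero words, instead of testing every page. A minimal sketch of that loop follows. The names scan_dirty_bitmap and dirty_fn are hypothetical, the callback stands in for cpu_physical_memory_set_dirty(), the words are assumed to be in host byte order already (the patch converts them with leul_to_cpu first), and __builtin_ctzl is a GCC/Clang builtin used here in place of ffsl() - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical callback type: called once per dirty page number. */
typedef void (*dirty_fn)(unsigned long page_number, void *opaque);

/*
 * Visit every set bit in a bitmap of host-order words and return how many
 * bits were set. Zero words cost a single comparison.
 */
static size_t scan_dirty_bitmap(const unsigned long *bitmap, size_t nwords,
                                dirty_fn fn, void *opaque)
{
    size_t i, count = 0;

    assert(bitmap != NULL);
    for (i = 0; i < nwords; i++) {
        unsigned long c = bitmap[i];

        while (c != 0) {
            /* index of the lowest set bit in this word */
            unsigned int j = (unsigned int)__builtin_ctzl(c);

            c &= ~(1ul << j);   /* clear it so the loop terminates */
            if (fn) {
                fn(i * BITS_PER_LONG + j, opaque);
            }
            count++;
        }
    }
    return count;
}
```

The work done is proportional to the number of words plus the number of dirty pages, which is why it helps most when the bitmap is mostly clean.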
[PATCH 01/10] KVM: x86: Add debug register saving and restoring
From: Jan Kiszka jan.kis...@siemens.com Make use of the new KVM_GET/SET_DEBUGREGS to save/restore the x86 debug registers. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- kvm-all.c | 11 ++ kvm.h |1 + target-i386/kvm.c | 55 + 3 files changed, 67 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 2ede4b9..d050115 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -64,6 +64,7 @@ struct KVMState int migration_log; int vcpu_events; int robust_singlestep; +int debugregs; #ifdef KVM_CAP_SET_GUEST_DEBUG struct kvm_sw_breakpoint_head kvm_sw_breakpoints; #endif @@ -664,6 +665,11 @@ int kvm_init(int smp_cpus) kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP); #endif +s-debugregs = 0; +#ifdef KVM_CAP_DEBUGREGS +s-debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS); +#endif + ret = kvm_arch_init(s, smp_cpus); if (ret 0) goto err; @@ -939,6 +945,11 @@ int kvm_has_robust_singlestep(void) return kvm_state-robust_singlestep; } +int kvm_has_debugregs(void) +{ +return kvm_state-debugregs; +} + void kvm_setup_guest_memory(void *start, size_t size) { if (!kvm_has_sync_mmu()) { diff --git a/kvm.h b/kvm.h index ae87d85..70bfbf8 100644 --- a/kvm.h +++ b/kvm.h @@ -40,6 +40,7 @@ int kvm_init(int smp_cpus); int kvm_has_sync_mmu(void); int kvm_has_vcpu_events(void); int kvm_has_robust_singlestep(void); +int kvm_has_debugregs(void); #ifdef NEED_CPU_H int kvm_init_vcpu(CPUState *env); diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 5513472..bb6dafa 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -874,6 +874,53 @@ static int kvm_guest_debug_workarounds(CPUState *env) return ret; } +static int kvm_put_debugregs(CPUState *env) +{ +#ifdef KVM_CAP_DEBUGREGS +struct kvm_debugregs dbgregs; +int i; + +if (!kvm_has_debugregs()) { +return 0; +} + +for (i = 0; i 4; i++) { +dbgregs.db[i] = env-dr[i]; +} +dbgregs.dr6 = env-dr[6]; +dbgregs.dr7 = env-dr[7]; +dbgregs.flags = 0; + +return kvm_vcpu_ioctl(env, 
KVM_SET_DEBUGREGS, dbgregs); +#else +return 0; +#endif +} + +static int kvm_get_debugregs(CPUState *env) +{ +#ifdef KVM_CAP_DEBUGREGS +struct kvm_debugregs dbgregs; +int i, ret; + +if (!kvm_has_debugregs()) { +return 0; +} + +ret = kvm_vcpu_ioctl(env, KVM_GET_DEBUGREGS, dbgregs); +if (ret 0) { + return ret; +} +for (i = 0; i 4; i++) { +env-dr[i] = dbgregs.db[i]; +} +env-dr[4] = env-dr[6] = dbgregs.dr6; +env-dr[5] = env-dr[7] = dbgregs.dr7; +#endif + +return 0; +} + int kvm_arch_put_registers(CPUState *env, int level) { int ret; @@ -909,6 +956,10 @@ int kvm_arch_put_registers(CPUState *env, int level) if (ret 0) return ret; +ret = kvm_put_debugregs(env); +if (ret 0) +return ret; + return 0; } @@ -940,6 +991,10 @@ int kvm_arch_get_registers(CPUState *env) if (ret 0) return ret; +ret = kvm_get_debugregs(env); +if (ret 0) +return ret; + return 0; } -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/10] introduce qemu_ram_map
Which allows drivers to register an mmaped region into ram block mappings. To be used by device assignment driver. CC: Cam Macdonell c...@cs.ualberta.ca Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- cpu-common.h |1 + exec.c | 28 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/cpu-common.h b/cpu-common.h index b24cecc..2dfde6f 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -40,6 +40,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr, } ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr); +ram_addr_t qemu_ram_map(ram_addr_t size, void *host); ram_addr_t qemu_ram_alloc(ram_addr_t); void qemu_ram_free(ram_addr_t addr); /* This should only be used for ram local to a device. */ diff --git a/exec.c b/exec.c index 14d1fd7..648a9c9 100644 --- a/exec.c +++ b/exec.c @@ -2789,6 +2789,34 @@ static void *file_ram_alloc(ram_addr_t memory, const char *path) } #endif +ram_addr_t qemu_ram_map(ram_addr_t size, void *host) +{ +RAMBlock *new_block; + +size = TARGET_PAGE_ALIGN(size); +new_block = qemu_malloc(sizeof(*new_block)); + +new_block-host = host; + +new_block-offset = last_ram_offset; +new_block-length = size; + +new_block-next = ram_blocks; +ram_blocks = new_block; + +phys_ram_dirty = qemu_realloc(phys_ram_dirty, +(last_ram_offset + size) TARGET_PAGE_BITS); +memset(phys_ram_dirty + (last_ram_offset TARGET_PAGE_BITS), + 0xff, size TARGET_PAGE_BITS); + +last_ram_offset += size; + +if (kvm_enabled()) +kvm_setup_guest_memory(new_block-host, size); + +return new_block-offset; +} + ram_addr_t qemu_ram_alloc(ram_addr_t size) { RAMBlock *new_block; -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
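The bookkeeping in qemu_ram_map() boils down to appending a block at the current end of the RAM offset space and growing the per-page dirty array to cover it, with the new pages marked fully dirty. The sketch below shows only that part, under stated assumptions: struct ram_state and ram_map_append are hypothetical names, one dirty byte per page is assumed, and plain realloc() replaces qemu_realloc(), so allocation failure is reported to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BITS 12
#define PAGE_SIZE_BYTES ((size_t)1 << PAGE_BITS)

/* Hypothetical stand-in for the phys_ram_dirty / last_ram_offset globals. */
struct ram_state {
    unsigned char *dirty;   /* one byte of dirty flags per page */
    size_t last_offset;     /* end of the RAM offset space so far */
};

/* Append a page-aligned region; return 0 and its offset, or -1 on failure. */
static int ram_map_append(struct ram_state *s, size_t size, size_t *offset_out)
{
    size_t old_pages, new_pages;
    unsigned char *p;

    assert(s != NULL && offset_out != NULL);
    assert(size > 0 && size % PAGE_SIZE_BYTES == 0);

    old_pages = s->last_offset >> PAGE_BITS;
    new_pages = (s->last_offset + size) >> PAGE_BITS;

    p = realloc(s->dirty, new_pages);
    if (p == NULL) {
        return -1;
    }
    /* mark only the newly added pages as dirty in every flag bit */
    memset(p + old_pages, 0xff, new_pages - old_pages);

    s->dirty = p;
    *offset_out = s->last_offset;
    s->last_offset += size;
    return 0;
}
```

The returned offset plays the role of the ram_addr_t that the patch hands back to the caller.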
[PATCH 07/10] vga: fix typo in length passed to kvm_log_stop
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- hw/vga.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/vga.c b/hw/vga.c index 845dbcc..db72115 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -1618,8 +1618,8 @@ void vga_dirty_log_stop(VGACommonState *s) kvm_log_stop(s-map_addr, s-map_end - s-map_addr); if (kvm_enabled() s-lfb_vram_mapped) { - kvm_log_stop(isa_mem_base + 0xa, 0x8); - kvm_log_stop(isa_mem_base + 0xa8000, 0x8); + kvm_log_stop(isa_mem_base + 0xa, 0x8000); + kvm_log_stop(isa_mem_base + 0xa8000, 0x8000); } #ifdef CONFIG_BOCHS_VBE -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/10] [PULL] qemu-kvm.git uq/master queue
The following changes since commit a303f9e37b87ced34e966dc2c0b7f86bc5e74035: Blue Swirl (1): sh4: remove dead assignments, spotted by clang analyzer are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master Jan Kiszka (1): KVM: x86: Add debug register saving and restoring Marcelo Tosatti (8): target-i386: print EFER in cpu_dump_state kvm: handle internal error kvm_init_vcpu requires global lock held kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu vga: fix typo in length passed to kvm_log_stop introduce leul_to_cpu kvm: port qemu-kvm's bitmap scanning introduce qemu_ram_map Sheng Yang (1): kvm: allow qemu to set EPT identity mapping address bswap.h |2 + cpu-common.h |1 + cpus.c |2 +- exec.c | 28 ++ hw/vga.c |4 +- kvm-all.c| 96 +- kvm.h|1 + target-i386/helper.c |1 + target-i386/kvm.c| 81 +- 9 files changed, 195 insertions(+), 21 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/10] target-i386: print EFER in cpu_dump_state
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- target-i386/helper.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/target-i386/helper.c b/target-i386/helper.c index 3835835..c9508a8 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -356,6 +356,7 @@ void cpu_dump_state(CPUState *env, FILE *f, cc_op_name); } } +cpu_fprintf(f, EFER=%016 PRIx64 \n, env-efer); if (flags X86_DUMP_FPU) { int fptag; fptag = 0; -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/10] kvm_init_vcpu requires global lock held
Since it accesses data protected by the lock.

Signed-off-by: Avi Kivity a...@redhat.com
---
 cpus.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8450ee4..2bf87d2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -401,6 +401,7 @@ static void *kvm_cpu_thread_fn(void *arg)
 {
     CPUState *env = arg;

+    qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_self(env->thread);
     if (kvm_enabled())
         kvm_init_vcpu(env);
@@ -408,7 +409,6 @@ static void *kvm_cpu_thread_fn(void *arg)
     kvm_block_io_signals(env);

     /* signal CPU creation */
-    qemu_mutex_lock(&qemu_global_mutex);
     env->created = 1;
     qemu_cond_signal(&qemu_cpu_cond);

-- 
1.6.6.1
[PATCH 03/10] kvm: handle internal error
Port qemu-kvm's KVM_EXIT_INTERNAL_ERROR handling to upstream. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- kvm-all.c | 31 +++ 1 files changed, 31 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index d050115..9c8aa7d 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -730,6 +730,32 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size, return 1; } +#ifdef KVM_CAP_INTERNAL_ERROR_DATA +static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run) +{ + +if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) { +int i; + +fprintf(stderr, KVM internal error. Suberror: %d\n, +run-internal.suberror); + +for (i = 0; i run-internal.ndata; ++i) { +fprintf(stderr, extra data[%d]: %PRIx64\n, +i, (uint64_t)run-internal.data[i]); +} +} +cpu_dump_state(env, stderr, fprintf, 0); +if (run-internal.suberror == KVM_INTERNAL_ERROR_EMULATION) { +fprintf(stderr, emulation failure\n); +} +/* FIXME: Should trigger a qmp message to let management know + * something went wrong. + */ +vm_stop(0); +} +#endif + void kvm_flush_coalesced_mmio_buffer(void) { #ifdef KVM_CAP_COALESCED_MMIO @@ -845,6 +871,11 @@ int kvm_cpu_exec(CPUState *env) case KVM_EXIT_EXCEPTION: DPRINTF(kvm_exit_exception\n); break; +#ifdef KVM_CAP_INTERNAL_ERROR_DATA +case KVM_EXIT_INTERNAL_ERROR: +kvm_handle_internal_error(env, run); +break; +#endif case KVM_EXIT_DEBUG: DPRINTF(kvm_exit_debug\n); #ifdef KVM_CAP_SET_GUEST_DEBUG -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
On 04/26/2010 10:46 AM, Glauber Costa wrote: This patch removes one padding byte and transform it into a flags field. New versions of guests using pvclock will query these flags upon each read. Is this necessary? Why not just make the pvclock driver maintain a local flag set, and have the HV backend call into it to update it. Why does it need to be part of the pvclock structure? J Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa glom...@redhat.com CC: Jeremy Fitzhardinge jer...@goop.org --- arch/x86/include/asm/pvclock-abi.h |3 ++- arch/x86/include/asm/pvclock.h |1 + arch/x86/kernel/pvclock.c |9 + 3 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 6d93508..ec5c41a 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info { u64 system_time; u32 tsc_to_system_mul; s8tsc_shift; - u8pad[3]; + u8flags; + u8pad[2]; } __attribute__((__packed__)); /* 32 bytes */ struct pvclock_wall_clock { diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h index 53235fd..c50823f 100644 --- a/arch/x86/include/asm/pvclock.h +++ b/arch/x86/include/asm/pvclock.h @@ -6,6 +6,7 @@ /* some helper functions for xen and kvm pv clock sources */ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src); +void pvclock_valid_flags(u8 flags); unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src); void pvclock_read_wallclock(struct pvclock_wall_clock *wall, struct pvclock_vcpu_time_info *vcpu, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 03801f2..8f4af7b 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -31,8 +31,16 @@ 
struct pvclock_shadow_time { u32 tsc_to_nsec_mul; int tsc_shift; u32 version; + u8 flags; }; +static u8 valid_flags = 0; + +void pvclock_valid_flags(u8 flags) +{ + valid_flags = flags; +} + /* * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, * yielding a 64-bit result. @@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst, dst-system_timestamp = src-system_time; dst-tsc_to_nsec_mul = src-tsc_to_system_mul; dst-tsc_shift = src-tsc_shift; + dst-flags = src-flags; rmb(); /* test version after fetching data */ } while ((src-version 1) || (dst-version != src-version)); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
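To make the discussion concrete, here is a minimal sketch of how a guest might take a consistent snapshot that includes the new flags byte and then honor only the flags it has declared valid. This is not the kernel code: struct time_info, set_valid_flags and read_time_info are hypothetical names, C11 fences stand in for rmb(), and masking at read time is an assumption about how the valid set would be applied (this patch only stores it).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields discussed above; not the ABI header. */
struct time_info {
    uint32_t version;       /* odd while the hypervisor is mid-update */
    uint64_t system_time;
    uint8_t  flags;
};

static uint8_t valid_flags_mask;    /* flags the guest has agreed to honor */

static void set_valid_flags(uint8_t flags)
{
    valid_flags_mask = flags;
}

/* Snapshot system_time and flags; return only the flags marked valid. */
static uint8_t read_time_info(const volatile struct time_info *src,
                              uint64_t *time_out)
{
    uint32_t version;
    uint8_t flags;

    assert(src != NULL && time_out != NULL);
    do {
        version = src->version;
        atomic_thread_fence(memory_order_acquire);  /* version before data */
        *time_out = src->system_time;
        flags = src->flags;
        atomic_thread_fence(memory_order_acquire);  /* data before recheck */
    } while ((version & 1) || version != src->version);

    return flags & valid_flags_mask;
}
```

Because the flags travel inside the versioned structure, they are read under the same retry loop as the time values and need no separate registration of shared memory.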
Re: [PATCH 10/10] introduce qemu_ram_map
On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: Which allows drivers to register an mmaped region into ram block mappings. To be used by device assignment driver. This is not kvm specific and not required by this pull request so it shouldn't really be part of the pull. Something like this should only be added when there's an actual consumer. Regards, Anthony Liguori CC: Cam Macdonellc...@cs.ualberta.ca Signed-off-by: Marcelo Tosattimtosa...@redhat.com --- cpu-common.h |1 + exec.c | 28 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/cpu-common.h b/cpu-common.h index b24cecc..2dfde6f 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -40,6 +40,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr, } ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr); +ram_addr_t qemu_ram_map(ram_addr_t size, void *host); ram_addr_t qemu_ram_alloc(ram_addr_t); void qemu_ram_free(ram_addr_t addr); /* This should only be used for ram local to a device. */ diff --git a/exec.c b/exec.c index 14d1fd7..648a9c9 100644 --- a/exec.c +++ b/exec.c @@ -2789,6 +2789,34 @@ static void *file_ram_alloc(ram_addr_t memory, const char *path) } #endif +ram_addr_t qemu_ram_map(ram_addr_t size, void *host) +{ +RAMBlock *new_block; + +size = TARGET_PAGE_ALIGN(size); +new_block = qemu_malloc(sizeof(*new_block)); + +new_block-host = host; + +new_block-offset = last_ram_offset; +new_block-length = size; + +new_block-next = ram_blocks; +ram_blocks = new_block; + +phys_ram_dirty = qemu_realloc(phys_ram_dirty, +(last_ram_offset + size) TARGET_PAGE_BITS); +memset(phys_ram_dirty + (last_ram_offset TARGET_PAGE_BITS), + 0xff, size TARGET_PAGE_BITS); + +last_ram_offset += size; + +if (kvm_enabled()) +kvm_setup_guest_memory(new_block-host, size); + +return new_block-offset; +} + ram_addr_t qemu_ram_alloc(ram_addr_t size) { RAMBlock *new_block; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More 
majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] Enable pvclock flags in vcpu_time_info structure
On Mon, Apr 26, 2010 at 11:11:57AM -0700, Jeremy Fitzhardinge wrote:
On 04/26/2010 10:46 AM, Glauber Costa wrote:
This patch removes one padding byte and transforms it into a flags field. New versions of guests using pvclock will query these flags upon each read.

Is this necessary? Why not just make the pvclock driver maintain a local flag set, and have the HV backend call into it to update it? Why does it need to be part of the pvclock structure?

Because it is already there, and we have plenty of space left? There are obviously other ways, but I don't see any of them being simpler. If we go by the method you suggested, we'd have, for instance, to register the memory area where these flags live, which duplicates the infrastructure already present in kvmclock.
Re: [PATCH 10/10] introduce qemu_ram_map
On Mon, Apr 26, 2010 at 01:27:30PM -0500, Anthony Liguori wrote:
On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
Which allows drivers to register an mmapped region into ram block mappings. To be used by device assignment driver.

This doesn't make much sense to me. Do you use this like: qemu_ram_map(64k, ptr); assert(qemu_ram_alloc(64k) == ptr);

No. hw/device-assignment.c in qemu-kvm mmaps /sys/bus/pci/devices/x:y:z/resourcen (the PCI device's memory regions) to the guest.

If so, I think this is not the best API. I'd rather see qemu_ram_map() register a symbolic name for the region and for there to be a qemu_ram_alloc() variant that allocates by name. Regards, Anthony Liguori
Re: [PATCH 10/10] introduce qemu_ram_map
On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote: On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: Which allows drivers to register an mmaped region into ram block mappings. To be used by device assignment driver. This is not kvm specific and not required by this pull request so it shouldn't really be part of the pull. Something like this should only be added when there's an actual consumer. The user will be hw/device-assignment.c in qemu-kvm. And also Cam has the need for a similar interface for shared memory drivers. Index: qemu-kvm/hw/device-assignment.c === --- qemu-kvm.orig/hw/device-assignment.c2010-04-22 16:21:30.0 -0400 +++ qemu-kvm/hw/device-assignment.c 2010-04-22 17:36:57.0 -0400 @@ -256,10 +256,7 @@ AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); AssignedDevRegion *region = r_dev-v_addrs[region_num]; PCIRegion *real_region = r_dev-real_device.regions[region_num]; -pcibus_t old_ephys = region-e_physbase; -pcibus_t old_esize = region-e_size; -int first_map = (region-e_size == 0); -int ret = 0; +int ret = 0, flags = 0; DEBUG(e_phys=%08 FMT_PCIBUS r_virt=%p type=%d len=%08 FMT_PCIBUS region_num=%d \n, e_phys, region-u.r_virtbase, type, e_size, region_num); @@ -267,30 +264,22 @@ region-e_physbase = e_phys; region-e_size = e_size; -if (!first_map) - kvm_destroy_phys_mem(kvm_context, old_ephys, - TARGET_PAGE_ALIGN(old_esize)); - if (e_size 0) { + +if (region_num == PCI_ROM_SLOT) +flags |= IO_MEM_ROM; + +cpu_register_physical_memory(e_phys, e_size, region-memory_index | flags); + /* deal with MSI-X MMIO page */ if (real_region-base_addr = r_dev-msix_table_addr real_region-base_addr + real_region-size = r_dev-msix_table_addr) { int offset = r_dev-msix_table_addr - real_region-base_addr; -ret = munmap(region-u.r_virtbase + offset, TARGET_PAGE_SIZE); -if (ret == 0) -DEBUG(munmap done, virt_base 0x%p\n, -region-u.r_virtbase + offset); -else { -fprintf(stderr, %s: fail munmap msix table!\n, __func__); -exit(1); -} + 
cpu_register_physical_memory(e_phys + offset, TARGET_PAGE_SIZE, r_dev-mmio_index); } - ret = kvm_register_phys_mem(kvm_context, e_phys, -region-u.r_virtbase, -TARGET_PAGE_ALIGN(e_size), 0); } if (ret != 0) { @@ -539,6 +528,15 @@ pci_dev-v_addrs[i].u.r_virtbase += (cur_region-base_addr 0xFFF); + +if (!slow_map) { +void *virtbase = pci_dev-v_addrs[i].u.r_virtbase; + +pci_dev-v_addrs[i].memory_index = qemu_ram_map(cur_region-size, +virtbase); +} else +pci_dev-v_addrs[i].memory_index = 0; + pci_register_bar((PCIDevice *) pci_dev, i, cur_region-size, t, slow_map ? assigned_dev_iomem_map_slow @@ -726,10 +724,6 @@ kvm_remove_ioperm_data(region-u.r_baseport, region-r_size); continue; } else if (pci_region-type IORESOURCE_MEM) { -if (region-e_size 0) -kvm_destroy_phys_mem(kvm_context, region-e_physbase, - TARGET_PAGE_ALIGN(region-e_size)); - if (region-u.r_virtbase) { int ret = munmap(region-u.r_virtbase, (pci_region-size + 0xFFF) 0xF000); Index: qemu-kvm/hw/device-assignment.h === --- qemu-kvm.orig/hw/device-assignment.h2010-04-22 16:21:30.0 -0400 +++ qemu-kvm/hw/device-assignment.h 2010-04-22 16:24:32.0 -0400 @@ -63,7 +63,7 @@ typedef struct { pcibus_t e_physbase; -uint32_t memory_index; +ram_addr_t memory_index; union { void *r_virtbase;/* mmapped access address for memory regions */ uint32_t r_baseport; /* the base guest port for I/O regions */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] introduce qemu_ram_map
On 04/26/2010 01:49 PM, Marcelo Tosatti wrote:
On Mon, Apr 26, 2010 at 01:27:30PM -0500, Anthony Liguori wrote:
On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
Which allows drivers to register an mmapped region into ram block mappings. To be used by device assignment driver.

This doesn't make much sense to me. Do you use this like: qemu_ram_map(64k, ptr); assert(qemu_ram_alloc(64k) == ptr);

No. hw/device-assignment.c in qemu-kvm mmaps /sys/bus/pci/devices/x:y:z/resourcen (the PCI device's memory regions) to the guest.

I understand, but how do you use qemu_ram_map() to actually map that memory to a given PCI device resource? I assume you rely on it getting put on the front of the list so that the next qemu_ram_alloc() will be at that location. Regards, Anthony Liguori

If so, I think this is not the best API. I'd rather see qemu_ram_map() register a symbolic name for the region and for there to be a qemu_ram_alloc() variant that allocates by name. Regards, Anthony Liguori
Re: [PATCH 10/10] introduce qemu_ram_map
On 04/26/2010 01:50 PM, Marcelo Tosatti wrote: On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote: On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: Which allows drivers to register an mmaped region into ram block mappings. To be used by device assignment driver. This is not kvm specific and not required by this pull request so it shouldn't really be part of the pull. Something like this should only be added when there's an actual consumer. The user will be hw/device-assignment.c in qemu-kvm. And also Cam has the need for a similar interface for shared memory drivers. It should be part of one of those submissions. Index: qemu-kvm/hw/device-assignment.c === --- qemu-kvm.orig/hw/device-assignment.c2010-04-22 16:21:30.0 -0400 +++ qemu-kvm/hw/device-assignment.c 2010-04-22 17:36:57.0 -0400 @@ -256,10 +256,7 @@ AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); AssignedDevRegion *region =r_dev-v_addrs[region_num]; PCIRegion *real_region =r_dev-real_device.regions[region_num]; -pcibus_t old_ephys = region-e_physbase; -pcibus_t old_esize = region-e_size; -int first_map = (region-e_size == 0); -int ret = 0; +int ret = 0, flags = 0; DEBUG(e_phys=%08 FMT_PCIBUS r_virt=%p type=%d len=%08 FMT_PCIBUS region_num=%d \n, e_phys, region-u.r_virtbase, type, e_size, region_num); @@ -267,30 +264,22 @@ region-e_physbase = e_phys; region-e_size = e_size; -if (!first_map) - kvm_destroy_phys_mem(kvm_context, old_ephys, - TARGET_PAGE_ALIGN(old_esize)); - if (e_size 0) { + +if (region_num == PCI_ROM_SLOT) +flags |= IO_MEM_ROM; + +cpu_register_physical_memory(e_phys, e_size, region-memory_index | flags); + /* deal with MSI-X MMIO page */ if (real_region-base_addr= r_dev-msix_table_addr real_region-base_addr + real_region-size= r_dev-msix_table_addr) { int offset = r_dev-msix_table_addr - real_region-base_addr; -ret = munmap(region-u.r_virtbase + offset, TARGET_PAGE_SIZE); -if (ret == 0) -DEBUG(munmap done, virt_base 0x%p\n, -region-u.r_virtbase + offset); -else { 
-fprintf(stderr, %s: fail munmap msix table!\n, __func__); -exit(1); -} + cpu_register_physical_memory(e_phys + offset, TARGET_PAGE_SIZE, r_dev-mmio_index); } - ret = kvm_register_phys_mem(kvm_context, e_phys, -region-u.r_virtbase, -TARGET_PAGE_ALIGN(e_size), 0); } if (ret != 0) { @@ -539,6 +528,15 @@ pci_dev-v_addrs[i].u.r_virtbase += (cur_region-base_addr 0xFFF); + +if (!slow_map) { +void *virtbase = pci_dev-v_addrs[i].u.r_virtbase; + +pci_dev-v_addrs[i].memory_index = qemu_ram_map(cur_region-size, +virtbase); +} else +pci_dev-v_addrs[i].memory_index = 0; + pci_register_bar((PCIDevice *) pci_dev, i, cur_region-size, t, slow_map ? assigned_dev_iomem_map_slow @@ -726,10 +724,6 @@ kvm_remove_ioperm_data(region-u.r_baseport, region-r_size); continue; } else if (pci_region-type IORESOURCE_MEM) { -if (region-e_size 0) -kvm_destroy_phys_mem(kvm_context, region-e_physbase, - TARGET_PAGE_ALIGN(region-e_size)); - if (region-u.r_virtbase) { int ret = munmap(region-u.r_virtbase, (pci_region-size + 0xFFF) 0xF000); How does hot unplug get dealt with? Regards, Anthony Liguori Index: qemu-kvm/hw/device-assignment.h === --- qemu-kvm.orig/hw/device-assignment.h2010-04-22 16:21:30.0 -0400 +++ qemu-kvm/hw/device-assignment.h 2010-04-22 16:24:32.0 -0400 @@ -63,7 +63,7 @@ typedef struct { pcibus_t e_physbase; -uint32_t memory_index; +ram_addr_t memory_index; union { void *r_virtbase;/* mmapped access address for memory regions */ uint32_t r_baseport; /* the base guest port for I/O regions */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Document KVM_GET_MP_STATE and KVM_SET_MP_STATE
On Sun, Apr 25, 2010 at 03:51:46PM +0300, Avi Kivity wrote:
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>  Documentation/kvm/api.txt | 44
>  1 files changed, 44 insertions(+), 0 deletions(-)

Applied, thanks.
Re: [PATCH 10/10] introduce qemu_ram_map
On Mon, Apr 26, 2010 at 01:57:37PM -0500, Anthony Liguori wrote:
> On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:
>> On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
>>> On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
>>>> Which allows drivers to register an mmapped region into ram block mappings.
>>>> To be used by device assignment driver.
>>>
>>> This is not kvm specific and not required by this pull request so it shouldn't really be part of the pull. Something like this should only be added when there's an actual consumer.
>>
>> The user will be hw/device-assignment.c in qemu-kvm. And also Cam has the need for a similar interface for shared memory drivers.
>
> It should be part of one of those submissions.

OK

>> @@ -726,10 +724,6 @@
>>              kvm_remove_ioperm_data(region->u.r_baseport, region->r_size);
>>              continue;
>>          } else if (pci_region->type & IORESOURCE_MEM) {
>> -            if (region->e_size > 0)
>> -                kvm_destroy_phys_mem(kvm_context, region->e_physbase,
>> -                                     TARGET_PAGE_ALIGN(region->e_size));
>> -
>>              if (region->u.r_virtbase) {
>>                  int ret = munmap(region->u.r_virtbase,
>>                                   (pci_region->size + 0xFFF) & 0xFFFFF000);
>
> How does hot unplug get dealt with?

The regions will have such mappings unmapped from QEMU (and KVM) via cpu_register_physical_memory(IO_MEM_UNASSIGNED) via pci_unregister_io_regions.

Just pushed a new tree without the patch, please pull if you are OK with the other changes.
Re: [Qemu-devel] Re: [PATCH 10/10] introduce qemu_ram_map
On 04/26/2010 02:14 PM, Marcelo Tosatti wrote:
> On Mon, Apr 26, 2010 at 01:57:37PM -0500, Anthony Liguori wrote:
>> On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:
>>> On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
>>>> On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
>>>>> Which allows drivers to register an mmapped region into ram block mappings.
>>>>> To be used by device assignment driver.
>>>>
>>>> This is not kvm specific and not required by this pull request so it shouldn't really be part of the pull. Something like this should only be added when there's an actual consumer.
>>>
>>> The user will be hw/device-assignment.c in qemu-kvm. And also Cam has the need for a similar interface for shared memory drivers.
>>
>> It should be part of one of those submissions.
>
> OK
>
>>> @@ -726,10 +724,6 @@
>>>              kvm_remove_ioperm_data(region->u.r_baseport, region->r_size);
>>>              continue;
>>>          } else if (pci_region->type & IORESOURCE_MEM) {
>>> -            if (region->e_size > 0)
>>> -                kvm_destroy_phys_mem(kvm_context, region->e_physbase,
>>> -                                     TARGET_PAGE_ALIGN(region->e_size));
>>> -
>>>              if (region->u.r_virtbase) {
>>>                  int ret = munmap(region->u.r_virtbase,
>>>                                   (pci_region->size + 0xFFF) & 0xFFFFF000);
>>
>> How does hot unplug get dealt with?
>
> The regions will have such mappings unmapped from QEMU (and KVM) via cpu_register_physical_memory(IO_MEM_UNASSIGNED) via pci_unregister_io_regions.

But how do you qemu_ram_unmap()? I see you munmap() that address but it looks like the qemu ram region gets leaked pointing to an invalid pointer.

Regards,

Anthony Liguori

> Just pushed a new tree without the patch, please pull if you are OK with the other changes.

Yes, I am.
Re: [RFC][PATCH v4 01/18] Add a new struct for device to manipulate external buffer.
On Sun, Apr 25, 2010 at 4:19 AM, xiaohui@intel.com wrote:
> From: Xin Xiaohui xiaohui@intel.com
>
> Signed-off-by: Xin Xiaohui xiaohui@intel.com
> Signed-off-by: Zhao Yu yzha...@gmail.com
> Reviewed-by: Jeff Dike jd...@linux.intel.com
> ---
>  include/linux/netdevice.h | 19 ++-
>  1 files changed, 18 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index c79a88b..bf79756 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -530,6 +530,22 @@ struct netdev_queue {
>          unsigned long tx_dropped;
>  } ____cacheline_aligned_in_smp;
>
> +/* Add a structure in structure net_device, the new field is
> + * named as mp_port. It's for mediate passthru (zero-copy).
> + * It contains the capability for the net device driver,
> + * a socket, and an external buffer creator, external means
> + * skb buffer belongs to the device may not be allocated from
> + * kernel space.
> + */
> +struct mpassthru_port {
> +        int hdr_len;
> +        int data_len;
> +        int npages;
> +        unsigned flags;
> +        struct socket *sock;
> +        struct skb_external_page *(*ctor)(struct mpassthru_port *,
> +                                          struct sk_buff *, int);
> +};

I tried searching around, but couldn't find where struct skb_external_page is declared. Where is it?

Andy
[PATCH v6] Add mergeable rx buffer support to vhost_net
This patch adds mergeable receive buffer support to vhost_net.

+-DLS

Signed-off-by: David L Stevens dlstev...@us.ibm.com

diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v6/drivers/vhost/net.c
--- net-next-v0/drivers/vhost/net.c 2010-04-24 21:36:54.0 -0700
+++ net-next-v6/drivers/vhost/net.c 2010-04-26 01:13:04.0 -0700
@@ -109,7 +109,7 @@ static void handle_tx(struct vhost_net *
         };
         size_t len, total_len = 0;
         int err, wmem;
-        size_t hdr_size;
+        size_t vhost_hlen;
         struct socket *sock = rcu_dereference(vq->private_data);
         if (!sock)
                 return;
@@ -128,13 +128,13 @@ static void handle_tx(struct vhost_net *
         if (wmem < sock->sk->sk_sndbuf / 2)
                 tx_poll_stop(net);
-        hdr_size = vq->hdr_size;
+        vhost_hlen = vq->vhost_hlen;

         for (;;) {
-                head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-                                         ARRAY_SIZE(vq->iov),
-                                         &out, &in,
-                                         NULL, NULL);
+                head = vhost_get_desc(&net->dev, vq, vq->iov,
+                                      ARRAY_SIZE(vq->iov),
+                                      &out, &in,
+                                      NULL, NULL);
                 /* Nothing new? Wait for eventfd to tell us they refilled. */
                 if (head == vq->num) {
                         wmem = atomic_read(&sock->sk->sk_wmem_alloc);
@@ -155,20 +155,20 @@ static void handle_tx(struct vhost_net *
                         break;
                 }
                 /* Skip header. TODO: support TSO. */
-                s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
+                s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out);
                 msg.msg_iovlen = out;
                 len = iov_length(vq->iov, out);
                 /* Sanity check */
                 if (!len) {
                         vq_err(vq, "Unexpected header len for TX: "
                                "%zd expected %zd\n",
-                               iov_length(vq->hdr, s), hdr_size);
+                               iov_length(vq->hdr, s), vhost_hlen);
                         break;
                 }
                 /* TODO: Check specific error and bomb out unless ENOBUFS? */
                 err = sock->ops->sendmsg(NULL, sock, &msg, len);
                 if (unlikely(err < 0)) {
-                        vhost_discard_vq_desc(vq);
+                        vhost_discard_desc(vq, 1);
                         tx_poll_start(net, sock);
                         break;
                 }
@@ -187,12 +187,25 @@ static void handle_tx(struct vhost_net *
         unuse_mm(net->dev.mm);
 }

+static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
+{
+        struct sk_buff *head;
+        int len = 0;
+
+        lock_sock(sk);
+        head = skb_peek(&sk->sk_receive_queue);
+        if (head)
+                len = head->len + vq->sock_hlen;
+        release_sock(sk);
+        return len;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
         struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-        unsigned head, out, in, log, s;
+        unsigned in, log, s;
         struct vhost_log *vq_log;
         struct msghdr msg = {
                 .msg_name = NULL,
@@ -203,14 +216,14 @@ static void handle_rx(struct vhost_net *
                 .msg_flags = MSG_DONTWAIT,
         };

-        struct virtio_net_hdr hdr = {
-                .flags = 0,
-                .gso_type = VIRTIO_NET_HDR_GSO_NONE
+        struct virtio_net_hdr_mrg_rxbuf hdr = {
+                .hdr.flags = 0,
+                .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
         };

         size_t len, total_len = 0;
-        int err;
-        size_t hdr_size;
+        int err, headcount, datalen;
+        size_t vhost_hlen;
         struct socket *sock = rcu_dereference(vq->private_data);
         if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
                 return;
@@ -218,18 +231,19 @@ static void handle_rx(struct vhost_net *
         use_mm(net->dev.mm);
         mutex_lock(&vq->mutex);
         vhost_disable_notify(vq);
-        hdr_size = vq->hdr_size;
+        vhost_hlen = vq->vhost_hlen;

         vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
                 vq->log : NULL;

-        for (;;) {
-                head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-                                         ARRAY_SIZE(vq->iov),
-                                         &out, &in,
-                                         vq_log, &log);
+        while ((datalen = vhost_head_len(vq, sock->sk))) {
+                headcount = vhost_get_desc_n(vq, vq->heads,
+                                             datalen + vhost_hlen,
+                                             &in, vq_log, &log);
+                if
Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes
On Thu, Apr 22, 2010 at 02:15:14PM -0700, Eric Northup wrote:
> I've been reading the x86's mmu.c recently and had been wondering about something. Avi's recent mmu documentation (thanks!) seems to have confirmed my understanding of how the shadow paging is supposed to be working. In TDP mode, when mmu_alloc_roots() calls kvm_mmu_get_page(), why does it pass (vcpu->arch.cr3 >> PAGE_SHIFT) or (vcpu->arch.mmu.pae_root[i]) as gfn?
>
> It seems to me that in TDP mode, gfn should be either zero for the root page table, or 0/1GB/2GB/3GB (for PAE page tables).
>
> The existing behavior can lead to multiple, semantically-identical TDP roots being created by mmu_alloc_roots, depending on the VCPU's CR3 at the time that mmu_alloc_roots was called. But the nested page tables should be* independent of the VCPU state. That wastes some memory and causes extra page faults while populating the extra copies of the page tables.
>
> *assuming that we aren't modeling per-VCPU state that might change the physical address map as seen by that VCPU, such as setting the APIC base to an address overlapping RAM.
>
> All feedback would be welcome, since I'm new to this system! A strawman patch follows.
>
> thanks,
> -Eric
>
> --
>
> For TDP mode, avoid creating multiple page table roots for the single guest-to-host physical address map by fixing the inputs used for the shadow page table hash in mmu_alloc_roots().
>
> Signed-off-by: Eric Northup digitale...@google.com
> ---
>  arch/x86/kvm/mmu.c | 12
>  1 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ddfa865..9696d65 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2059,10 +2059,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>                  hpa_t root = vcpu->arch.mmu.root_hpa;
>
>                  ASSERT(!VALID_PAGE(root));
> -                if (tdp_enabled)
> -                        direct = 1;
>                  if (mmu_check_root(vcpu, root_gfn))
>                          return 1;
> +                if (tdp_enabled) {
> +                        direct = 1;
> +                        root_gfn = 0;
> +                }
>                  sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
>                                        PT64_ROOT_LEVEL, direct,
>                                        ACC_ALL, NULL);
> @@ -2072,8 +2074,6 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>                  return 0;
>          }
>          direct = !is_paging(vcpu);
> -        if (tdp_enabled)
> -                direct = 1;
>          for (i = 0; i < 4; ++i) {
>                  hpa_t root = vcpu->arch.mmu.pae_root[i];
>
> @@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>                          root_gfn = 0;
>                  if (mmu_check_root(vcpu, root_gfn))
>                          return 1;
> +                if (tdp_enabled) {
> +                        direct = 1;
> +                        root_gfn = i << 30;
> +                }
>                  sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
>                                        PT32_ROOT_LEVEL, direct,
>                                        ACC_ALL, NULL);

There is no need to allocate 4 different roots for TDP tables if kvm_x86_ops->get_tdp_level() == PT64_ROOT_LEVEL.
Re: [Qemu-devel] Re: [PATCH 10/10] introduce qemu_ram_map
On Mon, Apr 26, 2010 at 02:20:42PM -0500, Anthony Liguori wrote:
> On 04/26/2010 02:14 PM, Marcelo Tosatti wrote:
>> On Mon, Apr 26, 2010 at 01:57:37PM -0500, Anthony Liguori wrote:
>>> On 04/26/2010 01:50 PM, Marcelo Tosatti wrote:
>>>> On Mon, Apr 26, 2010 at 01:29:06PM -0500, Anthony Liguori wrote:
>>>>> On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
>>>>>> Which allows drivers to register an mmapped region into ram block mappings.
>>>>>> To be used by device assignment driver.
>>>>>
>>>>> This is not kvm specific and not required by this pull request so it shouldn't really be part of the pull. Something like this should only be added when there's an actual consumer.
>>>>
>>>> The user will be hw/device-assignment.c in qemu-kvm. And also Cam has the need for a similar interface for shared memory drivers.
>>>
>>> It should be part of one of those submissions.
>>
>> OK
>>
>>>> @@ -726,10 +724,6 @@
>>>>              kvm_remove_ioperm_data(region->u.r_baseport, region->r_size);
>>>>              continue;
>>>>          } else if (pci_region->type & IORESOURCE_MEM) {
>>>> -            if (region->e_size > 0)
>>>> -                kvm_destroy_phys_mem(kvm_context, region->e_physbase,
>>>> -                                     TARGET_PAGE_ALIGN(region->e_size));
>>>> -
>>>>              if (region->u.r_virtbase) {
>>>>                  int ret = munmap(region->u.r_virtbase,
>>>>                                   (pci_region->size + 0xFFF) & 0xFFFFF000);
>>>
>>> How does hot unplug get dealt with?
>>
>> The regions will have such mappings unmapped from QEMU (and KVM) via cpu_register_physical_memory(IO_MEM_UNASSIGNED) via pci_unregister_io_regions.
>
> But how do you qemu_ram_unmap()? I see you munmap() that address but it looks like the qemu ram region gets leaked pointing to an invalid pointer.

Yes, qemu_ram_free() is not implemented. last_ram_offset always moves forward. But there should be no references to the memory mapping anymore, after the device is hot-unplugged.
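To make the leak being discussed concrete, here is a minimal sketch of an allocator whose offset only moves forward. It is not QEMU's code: `ram_alloc`, `ram_block`, `ram_map`, and `ram_stale_blocks` are hypothetical names invented for illustration, and the fixed-size table is an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a RAM block list entry; not QEMU's structure. */
struct ram_block {
    void *host;        /* host virtual address backing the block */
    uint64_t offset;   /* offset handed out by the allocator */
    uint64_t size;
};

struct ram_alloc {
    struct ram_block blocks[8];
    size_t nblocks;
    uint64_t last_offset;   /* only ever grows; nothing hands space back */
};

/* Register an already-mapped host region and return the offset it was given. */
static uint64_t ram_map(struct ram_alloc *a, void *host, uint64_t size)
{
    assert(a->nblocks < 8);
    struct ram_block *b = &a->blocks[a->nblocks++];
    b->host = host;
    b->offset = a->last_offset;
    b->size = size;
    a->last_offset += size;
    return b->offset;
}

/*
 * With no "free" operation, unmapping the host memory leaves the entry
 * in place. Count how many entries still point at that host address.
 */
static size_t ram_stale_blocks(const struct ram_alloc *a, const void *host)
{
    size_t n = 0;
    for (size_t i = 0; i < a->nblocks; i++)
        if (a->blocks[i].host == host)
            n++;
    return n;
}
```

In this model a caller that unmaps `host` after `ram_map()` still finds one entry referring to it, which is the stale-pointer situation raised in the thread. Whether that matters in practice depends on nothing dereferencing the entry after hot unplug, as the reply argues.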
Re: [PATCH v6] Add mergeable rx buffer support to vhost_net
On Mon, Apr 26, 2010 at 02:20:52PM -0700, David L Stevens wrote:
> This patch adds mergeable receive buffer support to vhost_net.
>
> +-DLS
>
> Signed-off-by: David L Stevens dlstev...@us.ibm.com

OK, looks good. I still think iovec handling is a bit off, as commented below.

> diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v6/drivers/vhost/net.c
> --- net-next-v0/drivers/vhost/net.c 2010-04-24 21:36:54.0 -0700
> +++ net-next-v6/drivers/vhost/net.c 2010-04-26 01:13:04.0 -0700
> @@ -109,7 +109,7 @@ static void handle_tx(struct vhost_net *
>          };
>          size_t len, total_len = 0;
>          int err, wmem;
> -        size_t hdr_size;
> +        size_t vhost_hlen;
>          struct socket *sock = rcu_dereference(vq->private_data);
>          if (!sock)
>                  return;
> @@ -128,13 +128,13 @@ static void handle_tx(struct vhost_net *
>          if (wmem < sock->sk->sk_sndbuf / 2)
>                  tx_poll_stop(net);
> -        hdr_size = vq->hdr_size;
> +        vhost_hlen = vq->vhost_hlen;
>
>          for (;;) {
> -                head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> -                                         ARRAY_SIZE(vq->iov),
> -                                         &out, &in,
> -                                         NULL, NULL);
> +                head = vhost_get_desc(&net->dev, vq, vq->iov,
> +                                      ARRAY_SIZE(vq->iov),
> +                                      &out, &in,
> +                                      NULL, NULL);
>                  /* Nothing new? Wait for eventfd to tell us they refilled. */
>                  if (head == vq->num) {
>                          wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> @@ -155,20 +155,20 @@ static void handle_tx(struct vhost_net *
>                          break;
>                  }
>                  /* Skip header. TODO: support TSO. */
> -                s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
> +                s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out);
>                  msg.msg_iovlen = out;
>                  len = iov_length(vq->iov, out);
>                  /* Sanity check */
>                  if (!len) {
>                          vq_err(vq, "Unexpected header len for TX: "
>                                 "%zd expected %zd\n",
> -                               iov_length(vq->hdr, s), hdr_size);
> +                               iov_length(vq->hdr, s), vhost_hlen);
>                          break;
>                  }
>                  /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                  err = sock->ops->sendmsg(NULL, sock, &msg, len);
>                  if (unlikely(err < 0)) {
> -                        vhost_discard_vq_desc(vq);
> +                        vhost_discard_desc(vq, 1);
>                          tx_poll_start(net, sock);
>                          break;
>                  }
> @@ -187,12 +187,25 @@ static void handle_tx(struct vhost_net *
>          unuse_mm(net->dev.mm);
>  }
>
> +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
> +{
> +        struct sk_buff *head;
> +        int len = 0;
> +
> +        lock_sock(sk);
> +        head = skb_peek(&sk->sk_receive_queue);
> +        if (head)
> +                len = head->len + vq->sock_hlen;
> +        release_sock(sk);
> +        return len;
> +}
> +
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_rx(struct vhost_net *net)
>  {
>          struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
> -        unsigned head, out, in, log, s;
> +        unsigned in, log, s;
>          struct vhost_log *vq_log;
>          struct msghdr msg = {
>                  .msg_name = NULL,
> @@ -203,14 +216,14 @@ static void handle_rx(struct vhost_net *
>                  .msg_flags = MSG_DONTWAIT,
>          };
>
> -        struct virtio_net_hdr hdr = {
> -                .flags = 0,
> -                .gso_type = VIRTIO_NET_HDR_GSO_NONE
> +        struct virtio_net_hdr_mrg_rxbuf hdr = {
> +                .hdr.flags = 0,
> +                .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
>          };
>
>          size_t len, total_len = 0;
> -        int err;
> -        size_t hdr_size;
> +        int err, headcount, datalen;
> +        size_t vhost_hlen;
>          struct socket *sock = rcu_dereference(vq->private_data);
>          if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
>                  return;
> @@ -218,18 +231,19 @@ static void handle_rx(struct vhost_net *
>          use_mm(net->dev.mm);
>          mutex_lock(&vq->mutex);
>          vhost_disable_notify(vq);
> -        hdr_size = vq->hdr_size;
> +        vhost_hlen = vq->vhost_hlen;
>
>          vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
>                  vq->log : NULL;
>
> -        for (;;) {
> -                head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> -                                         ARRAY_SIZE(vq->iov),
> -                                         &out, &in,
> -                                         vq_log, &log);
> +        while ((datalen = vhost_head_len(vq, sock->sk))) {
> +                headcount = vhost_get_desc_n(vq, vq->heads,
> +                                             datalen + vhost_hlen,
> +
Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes
On Mon, Apr 26, 2010 at 06:30:00PM -0300, Marcelo Tosatti wrote:
>> @@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>>                          root_gfn = 0;
>>                  if (mmu_check_root(vcpu, root_gfn))
>>                          return 1;
>> +                if (tdp_enabled) {
>> +                        direct = 1;
>> +                        root_gfn = i << 30;
>> +                }
>>                  sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
>>                                        PT32_ROOT_LEVEL, direct,
>>                                        ACC_ALL, NULL);
>
> There is no need to allocate 4 different roots for TDP tables if kvm_x86_ops->get_tdp_level() == PT64_ROOT_LEVEL.

Doh, and your patch does not. But it does not apply to kvm.git -next branch, can you regenerate please?
Re: KVM call agenda for Apr 27
* Anthony Liguori (anth...@codemonkey.ws) wrote:
> On 04/26/2010 12:26 PM, Chris Wright wrote:
>> Please send in any agenda items you are interested in covering.
>>
>> While I don't expect it to be the case this week, if we have a lack of agenda items I'll cancel the week's call.
>
> - qemu management interface (and libvirt)
> - stable tree policy (push vs. pull and call for stable volunteers)

block plug in (follow-on from qmp block watermark)
Re: [PATCH] KVM: Minor MMU documentation edits
On Mon, Apr 26, 2010 at 11:59:21AM +0300, Avi Kivity wrote:
> Reported by Andrew Jones.
>
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>  Documentation/kvm/mmu.txt | 10 +-
>  1 files changed, 5 insertions(+), 5 deletions(-)

Applied, thanks.
Re: KVM call agenda for Apr 27
On 04/26/2010 05:12 PM, Chris Wright wrote:
> * Anthony Liguori (anth...@codemonkey.ws) wrote:
>> On 04/26/2010 12:26 PM, Chris Wright wrote:
>>> Please send in any agenda items you are interested in covering.
>>>
>>> While I don't expect it to be the case this week, if we have a lack of agenda items I'll cancel the week's call.
>>
>> - qemu management interface (and libvirt)
>> - stable tree policy (push vs. pull and call for stable volunteers)
>
> block plug in (follow-on from qmp block watermark)

A few comments:

1) The problem was not block watermark itself but generating a notification on the watermark threshold. It's a heuristic and should be implemented based on polling block stats. Otherwise, we'll be adding tons of events to qemu that we'll struggle to maintain.

2) A block plugin doesn't solve the problem if it's just at the BlockDriverState level because it can't interact with qcow2.

3) For general block plugins, it's probably better to tackle userspace block devices. We have CUSE and FUSE already, a BUSE is a logical conclusion.

Regards,

Anthony Liguori
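As a rough illustration of comment 1, a polling client could apply the watermark heuristic itself rather than waiting for an event. This is only a sketch under stated assumptions: `block_stats`, its two fields, and `needs_grow` are hypothetical and do not correspond to any existing QEMU or QMP interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what one poll of block stats might report. */
struct block_stats {
    uint64_t highest_offset;   /* highest offset written so far */
    uint64_t allocated_size;   /* space currently provisioned for the image */
};

/*
 * The heuristic lives in the poller: report true once the free headroom
 * falls below a threshold the management tool picked for itself.
 */
static bool needs_grow(const struct block_stats *s, uint64_t min_headroom)
{
    assert(s->highest_offset <= s->allocated_size);
    return s->allocated_size - s->highest_offset < min_headroom;
}
```

The design point is that the threshold is the caller's policy, so changing it needs no new event type on the emulator side.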
Re: [PATCH RFC] KVM MMU: fix hashing for TDP and non-paging modes
On Mon, Apr 26, 2010 at 2:46 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
> Doh, and your patch does not. But it does not apply to kvm.git -next branch, can you regenerate please?

--

For TDP mode, avoid creating multiple page table roots for the single guest-to-host physical address map by fixing the inputs used for the shadow page table hash in mmu_alloc_roots().

Signed-off-by: Eric Northup digitale...@google.com
---
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..9696d65 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2059,10 +2059,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
                 hpa_t root = vcpu->arch.mmu.root_hpa;

                 ASSERT(!VALID_PAGE(root));
-                if (tdp_enabled)
-                        direct = 1;
                 if (mmu_check_root(vcpu, root_gfn))
                         return 1;
+                if (tdp_enabled) {
+                        direct = 1;
+                        root_gfn = 0;
+                }
                 sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
                                       PT64_ROOT_LEVEL, direct,
                                       ACC_ALL, NULL);
@@ -2072,8 +2074,6 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
                 return 0;
         }
         direct = !is_paging(vcpu);
-        if (tdp_enabled)
-                direct = 1;
         for (i = 0; i < 4; ++i) {
                 hpa_t root = vcpu->arch.mmu.pae_root[i];

@@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
                         root_gfn = 0;
                 if (mmu_check_root(vcpu, root_gfn))
                         return 1;
+                if (tdp_enabled) {
+                        direct = 1;
+                        root_gfn = i << 30;
+                }
                 sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
                                       PT32_ROOT_LEVEL, direct,
                                       ACC_ALL, NULL);
--
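The effect the patch is after can be shown with a toy lookup table. The sketch below is not the KVM code: `sp_cache`, `get_page`, and `root_key` are hypothetical, the cache is a linear array rather than a hash, and a page shift of 12 is assumed. It only illustrates that keying a direct-mapped root on the guest CR3 creates one root per CR3 value, while a fixed key yields a single shared root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical, simplified shadow-page record: looked up by (gfn, direct). */
struct sp {
    gfn_t gfn;
    int direct;
};

struct sp_cache {
    struct sp pages[16];
    size_t count;
};

/* Return the cached page for this key, creating it on a miss. */
static struct sp *get_page(struct sp_cache *c, gfn_t gfn, int direct)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->pages[i].gfn == gfn && c->pages[i].direct == direct)
            return &c->pages[i];
    assert(c->count < 16);
    c->pages[c->count].gfn = gfn;
    c->pages[c->count].direct = direct;
    return &c->pages[c->count++];
}

/*
 * Choose the lookup key for a root. With TDP and the fix applied the
 * guest CR3 is ignored, because the table maps guest-physical addresses
 * and does not depend on what the guest loaded into CR3.
 */
static gfn_t root_key(uint64_t cr3, int tdp_enabled, int fix_applied)
{
    if (tdp_enabled && fix_applied)
        return 0;
    return cr3 >> 12;   /* PAGE_SHIFT assumed to be 12 */
}
```

Requesting roots for two different CR3 values through `root_key(..., 1, 0)` leaves two entries in the cache; doing the same with `root_key(..., 1, 1)` leaves one.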
Re: [Qemu-devel] Re: KVM call agenda for Apr 27
On Mon, 26 Apr 2010 12:51:08 -0500 Anthony Liguori anth...@codemonkey.ws wrote:
> On 04/26/2010 12:26 PM, Chris Wright wrote:
>> Please send in any agenda items you are interested in covering.
>>
>> While I don't expect it to be the case this week, if we have a lack of agenda items I'll cancel the week's call.
>
> - qemu management interface (and libvirt)
> - stable tree policy (push vs. pull and call for stable volunteers)

What do you mean by push vs. pull?

Anyway, Aurelien was working on a stable release last week, maybe he's interested in helping with the stables (or not :)).
GSoC: Accepted students announced
Hi there,

The following students have been accepted to work with QEMU for GSoC 2010:

Student: Corentin Chary
Title: Add more sophisticated encodings to VNC server
Mentor: Anthony Liguori

Student: Eduard - Gabriel Munteanu
Title: AMD IOMMU emulation
Mentor: Joerg Roedel

Student: Miguel Di Ciurcio Filho
Title: Converting Monitor interface functions to QMP
Mentor: Luiz Capitulino

Student: Mohammed Gamal
Title: Completing Big Real Mode Support for KVM
Mentor: Avi Kivity

Student: Roland Elek
Title: AHCI emulation
Mentor: Alexander Graf

Congratulations to you all!

If you're a student and your proposal wasn't accepted, remember that it's always possible to work outside the scope of GSoC. Just contact the mentor of the project you are interested in and check with him/her.

Thanks to everyone who applied.
Re: [PATCH 0/6] pvclock fixes
On 04/26/2010 07:46 AM, Glauber Costa wrote:
> Hi,
>
> This is the last series I've sent, with comments from you merged. The first 5 patches are the same, only with the suggested fixes. I am leaving documentation out, since the basics won't change, and we're still discussing the details.
>
> Patch 6 is new, and is the guest side of the skipping updates Avi asked for. I haven't yet done any HV work on this (especially because I am not convinced exactly where it is safe to do).
>
> Let me know what you think.

I'm rebasing my patches on top of this series to address the host side issues. I noticed a couple of issues patching against Avi's tree, not sure if those issues are also present in trunk. Can you keep me CC'd on any updates to this series?

Thanks,

Zach
[PATCH] KVM: Fix mmu shrinker error
kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() reclaims only one sp, but that's not the case. This causes the mmu shrinker to return a wrong number. This patch fixes the counting error.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7a17db1..c97368e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2914,13 +2914,13 @@ restart:
         kvm_flush_remote_tlbs(kvm);
 }

-static void kvm_mmu_remove_one_alloc_mmu_page(struct kvm *kvm)
+static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
 {
         struct kvm_mmu_page *page;

         page = container_of(kvm->arch.active_mmu_pages.prev,
                             struct kvm_mmu_page, link);
-        kvm_mmu_zap_page(kvm, page);
+        return kvm_mmu_zap_page(kvm, page) + 1;
 }

 static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
@@ -2932,7 +2932,7 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
         spin_lock(&kvm_lock);

         list_for_each_entry(kvm, &vm_list, vm_list) {
-                int npages, idx;
+                int npages, idx, freed_pages;

                 idx = srcu_read_lock(&kvm->srcu);
                 spin_lock(&kvm->mmu_lock);
@@ -2940,8 +2940,8 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
                          kvm->arch.n_free_mmu_pages;
                 cache_count += npages;
                 if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
-                        kvm_mmu_remove_one_alloc_mmu_page(kvm);
-                        cache_count--;
+                        freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm);
+                        cache_count -= freed_pages;
                         kvm_freed = kvm;
                 }
                 nr_to_scan--;
--
1.6.5.2
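The counting issue can be reproduced in a small model. The sketch below is hypothetical and not the kernel code: it assumes, as the `+ 1` in the patch implies, that the zap routine returns the number of additional pages it freed beyond the one it was asked to zap. `page`, `zap_page`, and `remove_some_pages` are invented names.

```c
#include <assert.h>

/* Hypothetical model: a page may own children that are zapped along with it. */
struct page {
    int nr_children;
};

/*
 * Zap one page plus its children and return how many *other* pages were
 * freed. The caller's allocation counter drops by the full amount.
 */
static int zap_page(struct page *p, int *allocated)
{
    int extra = p->nr_children;
    *allocated -= extra + 1;
    p->nr_children = 0;
    return extra;
}

/* Report the total number of pages freed, including the page itself. */
static int remove_some_pages(struct page *p, int *allocated)
{
    return zap_page(p, allocated) + 1;
}
```

With a page that owns three children, decrementing a cached count by one leaves it three too high, while subtracting the return value of `remove_some_pages()` keeps it equal to the real allocation count.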
Re: [Qemu-devel] Re: KVM call agenda for Apr 27
On Mon, Apr 26, 2010 at 10:15:58PM -0300, Luiz Capitulino wrote:
> On Mon, 26 Apr 2010 12:51:08 -0500 Anthony Liguori anth...@codemonkey.ws wrote:
>> On 04/26/2010 12:26 PM, Chris Wright wrote:
>>> Please send in any agenda items you are interested in covering.
>>>
>>> While I don't expect it to be the case this week, if we have a lack of agenda items I'll cancel the week's call.
>>
>> - qemu management interface (and libvirt)
>> - stable tree policy (push vs. pull and call for stable volunteers)
>
> What do you mean by push vs. pull?
>
> Anyway, Aurelien was working on a stable release last week, maybe he's interested in helping with the stables (or not :)).

I didn't find the time to do the stable release, but we should be very close now. I am interested in having stable releases, but if someone else wants to work on that, I am fine with it.

--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Re: [PATCH 09/10] kvm: port qemu-kvm's bitmap scanning
Hi,

This patch may conflict with the patch I posted on April 19.

http://www.mail-archive.com/qemu-de...@nongnu.org/msg29941.html

If Marcelo's is going to be merged, I need to rebase the above to it. It would be helpful if you could tell me the plan.

Thanks,

Yoshi

2010/4/27 Marcelo Tosatti mtosa...@redhat.com:
> Which is significantly faster.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> ---
>  kvm-all.c | 53 +
>  1 files changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index eabb097..ef45418 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -26,6 +26,7 @@
>  #include "hw/hw.h"
>  #include "gdbstub.h"
>  #include "kvm.h"
> +#include "bswap.h"
>
>  /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
>  #define PAGE_SIZE TARGET_PAGE_SIZE
> @@ -282,11 +283,41 @@ static int kvm_set_migration_log(int enable)
>      return 0;
>  }
>
> -static int test_le_bit(unsigned long nr, unsigned char *addr)
> +/* get kvm's dirty pages bitmap and update qemu's */
> +static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
> +                                         unsigned long *bitmap,
> +                                         unsigned long offset,
> +                                         unsigned long mem_size)
>  {
> -    return (addr[nr >> 3] >> (nr & 7)) & 1;
> +    unsigned int i, j;
> +    unsigned long page_number, addr, addr1, c;
> +    ram_addr_t ram_addr;
> +    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
> +        HOST_LONG_BITS;
> +
> +    /*
> +     * bitmap-traveling is faster than memory-traveling (for addr...)
> +     * especially when most of the memory is not dirty.
> +     */
> +    for (i = 0; i < len; i++) {
> +        if (bitmap[i] != 0) {
> +            c = leul_to_cpu(bitmap[i]);
> +            do {
> +                j = ffsl(c) - 1;
> +                c &= ~(1ul << j);
> +                page_number = i * HOST_LONG_BITS + j;
> +                addr1 = page_number * TARGET_PAGE_SIZE;
> +                addr = offset + addr1;
> +                ram_addr = cpu_get_physical_page_desc(addr);
> +                cpu_physical_memory_set_dirty(ram_addr);
> +            } while (c != 0);
> +        }
> +    }
> +    return 0;
>  }
>
> +#define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
> +
>  /**
>   * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
>   * This function updates qemu's dirty bitmap using cpu_physical_memory_set_dirty().
> @@ -300,8 +331,6 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>  {
>      KVMState *s = kvm_state;
>      unsigned long size, allocated_size = 0;
> -    target_phys_addr_t phys_addr;
> -    ram_addr_t addr;
>      KVMDirtyLog d;
>      KVMSlot *mem;
>      int ret = 0;
> @@ -313,7 +342,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>              break;
>          }
>
> -        size = ((mem->memory_size >> TARGET_PAGE_BITS) + 7) / 8;
> +        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8;
>          if (!d.dirty_bitmap) {
>              d.dirty_bitmap = qemu_malloc(size);
>          } else if (size > allocated_size) {
> @@ -330,17 +359,9 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>              break;
>          }
>
> -        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
> -             phys_addr < mem->start_addr + mem->memory_size;
> -             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
> -            unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
> -            unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
> -
> -            if (test_le_bit(nr, bitmap)) {
> -                cpu_physical_memory_set_dirty(addr);
> -            }
> -        }
> -        start_addr = phys_addr;
> +        kvm_get_dirty_pages_log_range(mem->start_addr, d.dirty_bitmap,
> +                                      mem->start_addr, mem->memory_size);
> +        start_addr = mem->start_addr + mem->memory_size;
>      }
>      qemu_free(d.dirty_bitmap);
>
> --
> 1.6.6.1
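The word-at-a-time scan in the quoted patch can be tried in isolation. The following is a standalone sketch, not the QEMU function: `collect_dirty` and `lowest_bit` are hypothetical names, `lowest_bit` is a portable stand-in for `ffsl(c) - 1`, and the little-endian conversion done by `leul_to_cpu()` in the patch is left out, so the bitmap is assumed to already be in host byte order.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit; the caller guarantees w != 0. */
static unsigned lowest_bit(unsigned long w)
{
    unsigned j = 0;
    assert(w != 0);
    while (!(w & 1ul)) {
        w >>= 1;
        j++;
    }
    return j;
}

/*
 * Visit every set bit, skipping all-zero words in one comparison each.
 * Page numbers are written to out; the return value is how many were found.
 */
static size_t collect_dirty(const unsigned long *bitmap, size_t nwords,
                            size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++) {
        unsigned long c = bitmap[i];
        while (c != 0 && n < max_out) {
            out[n++] = i * BITS_PER_LONG + lowest_bit(c);
            c &= c - 1;   /* clear the bit that was just reported */
        }
    }
    return n;
}
```

The cost is proportional to the number of words plus the number of dirty bits, instead of one test per page, which is why it pays off when most of memory is clean.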