Re: [GIT PULL] KVM fixes for 2.6.26-rc7
> You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.
>
> You should upgrade the guest kernel to the git tree, kvm clock changes
> break compatibility with older kernels.

For completeness: older -rc kernels only, mixing -rc8 with 2.6.25 and
older is fine.

cheers,
  Gerd

-- 
http://kraxel.fedorapeople.org/xenner/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: update gitignore
The new coalesced_mmio.[ch] files need to be ignored in the userspace
kernel/ directory.

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
---
 .gitignore |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index 48edae1..fee15ed 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,6 +50,8 @@ kernel/mmu.h
 kernel/modules.order
 kernel/tss.h
 kernel/x86.c
+kernel/coalesced_mmio.c
+kernel/coalesced_mmio.h
 qemu/pc-bios/extboot.bin
 qemu/qemu-doc.html
 qemu/*.1
-- 
1.5.4.3
Re: KVM: pvmmu breakage with gcc 4.3.0
Marcelo Tosatti wrote:
> Some pvmmu functions store their commands on stack, and newer GCC
> versions conclude that these commands are unused.
>
> So stick an inline asm statement to convince the compiler otherwise.

I think a better fix is to add a "memory" clobber to the hypercalls.
This isn't really a GCC bug, since GCC cannot know that hypercalls can
touch memory.

See the attached patch.  Avi: please push this for 2.6.26 if possible.

Regards,

Anthony Liguori

> Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8b7a3cf..c892752 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -55,6 +55,12 @@ static void kvm_mmu_op(void *buffer, unsigned len)
>  	int r;
>  	unsigned long a1, a2;
>  
> +	/*
> +	 * GCC 4.3.0 concludes that on-stack kvm_mmu_op* is unused and
> +	 * optimizes its initialization away.
> +	 */
> +	asm ("" : : "p" (buffer));
> +
>  	do {
>  		a1 = __pa(buffer);
>  		a2 = 0;   /* on i386 __pa() always returns <4G */

hypercall-memory-clobber.patch
Description: application/mbox
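For readers following along, here is a minimal sketch of what a "memory"
clobber buys. It is not the attached patch: the struct and function
names are hypothetical, the asm body is empty so it builds on any
GCC/Clang target, and a real hypercall would put the vmcall instruction
in the template instead.

```c
#include <stdint.h>

/* Hypothetical command layout; the real pvmmu structs differ. */
struct demo_mmu_op {
    uint32_t opcode;
    uint64_t arg;
};

/*
 * Stand-in for a hypercall wrapper. The asm template is empty, but the
 * "memory" clobber tells the compiler the statement may read or write
 * any memory, so earlier stores to *buffer must be emitted before it
 * and cannot be discarded as dead.
 */
static inline long demo_hypercall(void *buffer, unsigned long len)
{
    long ret = 0;

    __asm__ volatile("" : "+r"(ret) : "r"(buffer), "r"(len) : "memory");
    return ret;
}

/* Builds the command on the stack, as the pvmmu helpers do. */
static long demo_issue_op(uint32_t opcode, uint64_t arg)
{
    struct demo_mmu_op op = { .opcode = opcode, .arg = arg };

    return demo_hypercall(&op, sizeof(op));
}
```

With the clobber on the wrapper itself, every caller is covered and no
per-call-site workaround is needed.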
KVM: pvmmu breakage with gcc 4.3.0
Some pvmmu functions store their commands on stack, and newer GCC
versions conclude that these commands are unused.

So stick an inline asm statement to convince the compiler otherwise.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8b7a3cf..c892752 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -55,6 +55,12 @@ static void kvm_mmu_op(void *buffer, unsigned len)
 	int r;
 	unsigned long a1, a2;
 
+	/*
+	 * GCC 4.3.0 concludes that on-stack kvm_mmu_op* is unused and
+	 * optimizes its initialization away.
+	 */
+	asm ("" : : "p" (buffer));
+
 	do {
 		a1 = __pa(buffer);
 		a2 = 0;   /* on i386 __pa() always returns <4G */
Re: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb
On Wednesday 25 June 2008 20:02:17 Avi Kivity wrote:
> Yang, Sheng wrote:
> > From 54dc26e44f1c0aa460bef409b799f36dae56a911 Mon Sep 17 00:00:00 2001
> > From: Sheng Yang <[EMAIL PROTECTED]>
> > Date: Wed, 18 Jun 2008 11:23:13 +0800
> > Subject: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb
> >
> > Fix a potential issue caused by kvm_mmu_slot_remove_write_access().
> > The old behavior doesn't sync the EPT TLB with the modified EPT entry,
> > which results in inconsistent content of the EPT TLB and the EPT table.
> >
> >
> > @@ -1407,6 +1408,8 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
> >  static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
> >  {
> >  	vpid_sync_vcpu_all(to_vmx(vcpu));
> > +	if (vm_need_ept())
> > +		ept_sync_context(to_vmx(vcpu));
> >  }
>
> So we're flushing both the vpid tlb and the ept context?  What does an
> ept context flush mean exactly?  tlb entries for gpa->hpa?

Yeah, the entries for gpa->hpa. So if we don't do this, the cpu may see
an rw entry rather than ro, then write to it directly rather than fall
into KVM.

-- 
Thanks
Yang, Sheng
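The stale-entry problem described above can be illustrated with a toy
model. This is only a sketch with made-up names (demo_*); it models a
cache of gpa->hpa translations in plain C and has nothing to do with the
actual VMX instructions.

```c
#include <stdbool.h>

/* One translation entry: guest-physical page -> host-physical page. */
struct demo_entry {
    unsigned long hpa;
    bool writable;
};

#define DEMO_PAGES 4

static struct demo_entry demo_table[DEMO_PAGES]; /* the "EPT table" */
static struct demo_entry demo_tlb[DEMO_PAGES];   /* cached copies   */
static bool demo_tlb_valid[DEMO_PAGES];

/* Hardware-style lookup: use the cached copy if present, else refill. */
static struct demo_entry demo_translate(unsigned gpa_page)
{
    if (!demo_tlb_valid[gpa_page]) {
        demo_tlb[gpa_page] = demo_table[gpa_page];
        demo_tlb_valid[gpa_page] = true;
    }
    return demo_tlb[gpa_page];
}

/* Write-protect in the table only; the cached copy is untouched. */
static void demo_remove_write_access(unsigned gpa_page)
{
    demo_table[gpa_page].writable = false;
}

/* Analogue of the context sync: drop every cached translation. */
static void demo_sync_context(void)
{
    for (unsigned i = 0; i < DEMO_PAGES; i++)
        demo_tlb_valid[i] = false;
}
```

In this model, a lookup made after demo_remove_write_access() but before
demo_sync_context() still reports the page as writable, which is the
"rw entry rather than ro" case.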
[PATCH 3 of 3] mmu-notifier-core
From: Andrea Arcangeli <[EMAIL PROTECTED]>

With KVM/GRU/XPMEM there isn't just the primary CPU MMU pointing to
pages. There are secondary MMUs (with secondary sptes and secondary
tlbs) too. sptes in the kvm case are shadow pagetables, but when I say
spte in mmu-notifier context, I mean "secondary pte". In the GRU case
there's no actual secondary pte and there's only a secondary tlb
because the GRU secondary MMU has no knowledge about sptes and every
secondary tlb miss event in the MMU always generates a page fault that
has to be resolved by the CPU (this is not the case of KVM where a
secondary tlb miss will walk sptes in hardware and it will refill the
secondary tlb transparently to software if the corresponding spte is
present). The same way zap_page_range has to invalidate the pte before
freeing the page, the spte (and secondary tlb) must also be
invalidated before any page is freed and reused.

Currently we take a page_count pin on every page mapped by sptes, but
that means the pages can't be swapped whenever they're mapped by any
spte because they're part of the guest working set. Furthermore a spte
unmap event can immediately lead to a page being freed when the pin is
released (so requiring the same complex and relatively slow tlb_gather
smp safe logic we have in zap_page_range and that can be avoided
completely if the spte unmap event doesn't require an unpin of the
page previously mapped in the secondary MMU).

The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and
know when the VM is swapping or freeing or doing anything on the
primary MMU so that the secondary MMU code can drop sptes before the
pages are freed, avoiding all page pinning and allowing 100% reliable
swapping of guest physical address space. Furthermore it spares the
code that tears down the mappings of the secondary MMU from having to
implement logic like tlb_gather in zap_page_range, which would require
many IPIs to flush other cpu tlbs for each fixed number of sptes
unmapped.

To make an example: if what happens on the primary MMU is a protection
downgrade (from writeable to wrprotect) the secondary MMU mappings
will be invalidated, and the next secondary-mmu-page-fault will call
get_user_pages and trigger a do_wp_page through get_user_pages if it
called get_user_pages with write=1, and it'll re-establish an updated
spte or secondary-tlb-mapping on the copied page. Or it will set up a
readonly spte or readonly tlb mapping if it's a guest-read, if it
calls get_user_pages with write=0. This is just an example.

This allows mapping any page pointed to by any pte (and in turn visible
in the primary CPU MMU) into a secondary MMU (be it a pure tlb like
GRU, or a full MMU with both sptes and secondary-tlb like the
shadow-pagetable layer with kvm), or a remote DMA in software like
XPMEM (hence the need to schedule in XPMEM code to send the invalidate
to the remote node, while there is no need to schedule in kvm/gru as
it's an immediate event like invalidating a primary-mmu pte).

At least for KVM without this patch it's impossible to swap guests
reliably. And having this feature and removing the page pin allows
several other optimizations that simplify life considerably.

Dependencies:

1) mm_take_all_locks() to register the mmu notifier when the whole VM
   isn't doing anything with "mm". This allows mmu notifier users to
   keep track of whether the VM is in the middle of the
   invalidate_range_begin/end critical section with an atomic counter
   increased in range_begin and decreased in range_end. No secondary
   MMU page fault is allowed to map any spte or secondary tlb
   reference while the VM is in the middle of range_begin/end, as any
   page returned by get_user_pages in that critical section could
   later immediately be freed without any further ->invalidate_page
   notification (invalidate_range_begin/end works on ranges and
   ->invalidate_page isn't called immediately before freeing the
   page). To stop all page freeing and pagetable overwrites the
   mmap_sem must be taken in write mode and all other anon_vma/i_mmap
   locks must be taken too.

2) It'd be a waste to add branches in the VM if nobody could possibly
   run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only be
   enabled if CONFIG_KVM=m/y. In the current kernel kvm won't yet take
   advantage of mmu notifiers, but this already allows compiling a KVM
   external module against a kernel with mmu notifiers enabled and
   from the next pull from kvm.git we'll start using them. And
   GRU/XPMEM will also be able to continue the development by enabling
   KVM=m in their config, until they submit all GRU/XPMEM GPLv2 code
   to the mainline kernel. Then they can also enable MMU_NOTIFIERS in
   the same way KVM does it (even if KVM=n). This guarantees nobody
   selects MMU_NOTIFIER=y if KVM and GRU and XPMEM are all =n.

The mmu_notifier_register call can fail because mm_take_all_locks may
be interrupted by a signal and return -EINTR. Because
mmu_notifier_register is used when a driver
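As an illustration of dependency 1), the range_begin/end counter and the
rule that no secondary fault may map anything while a range is being
invalidated could be sketched roughly as below. This is a userspace
sketch with C11 atomics and hypothetical names (demo_*), not the code in
this patch. The sequence number is an assumption added here so that a
fault can also detect an invalidate that started and finished while it
was looking the page up, and the locking a real driver needs around the
final check is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mm state kept by a secondary-MMU driver. */
struct demo_secondary_mmu {
    atomic_int  range_active;   /* > 0 while inside range_begin/end */
    atomic_uint invalidate_seq; /* bumped each time a range ends    */
};

static void demo_range_begin(struct demo_secondary_mmu *m)
{
    atomic_fetch_add(&m->range_active, 1);
    /* ... drop sptes / secondary tlb entries covering the range ... */
}

static void demo_range_end(struct demo_secondary_mmu *m)
{
    atomic_fetch_add(&m->invalidate_seq, 1);
    atomic_fetch_sub(&m->range_active, 1);
}

/*
 * Secondary page fault: the caller samples invalidate_seq, looks the
 * page up (the get_user_pages step, elided here), then calls this.
 * Mapping is allowed only if no invalidate is running and none has
 * completed since the sample. Returns false when the fault must be
 * retried.
 */
static bool demo_try_map(struct demo_secondary_mmu *m, unsigned seq_before)
{
    if (atomic_load(&m->range_active) > 0)
        return false;
    if (atomic_load(&m->invalidate_seq) != seq_before)
        return false;
    /* ... install the spte here, under the driver's own lock ... */
    return true;
}
```

A fault that gets false simply drops its page reference and starts over,
so a page handed out by get_user_pages inside the critical section never
ends up in an spte.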
[PATCH 1 of 3] list_del_init_rcu
From: Andrea Arcangeli <[EMAIL PROTECTED]>

Introduces list_del_init_rcu and documents it (fixes a comment for
list_del_rcu too).

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
---

diff -r 98f755616212 -r 5e8c41d283cc include/linux/list.h
--- a/include/linux/list.h	Tue Jun 24 11:23:35 2008 -0700
+++ b/include/linux/list.h	Wed Jun 25 03:34:11 2008 +0200
@@ -747,7 +747,7 @@ static inline void hlist_del(struct hlis
  * or hlist_del_rcu(), running on this same list.
  * However, it is perfectly legal to run concurrently with
  * the _rcu list-traversal primitives, such as
- * hlist_for_each_entry().
+ * hlist_for_each_entry_rcu().
  */
 static inline void hlist_del_rcu(struct hlist_node *n)
 {
@@ -760,6 +760,34 @@ static inline void hlist_del_init(struct
 	if (!hlist_unhashed(n)) {
 		__hlist_del(n);
 		INIT_HLIST_NODE(n);
+	}
+}
+
+/**
+ * hlist_del_init_rcu - deletes entry from hash list with re-initialization
+ * @n: the element to delete from the hash list.
+ *
+ * Note: list_unhashed() on the node return true after this. It is
+ * useful for RCU based read lockfree traversal if the writer side
+ * must know if the list entry is still hashed or already unhashed.
+ *
+ * In particular, it means that we can not poison the forward pointers
+ * that may still be used for walking the hash list and we can only
+ * zero the pprev pointer so list_unhashed() will return true after
+ * this.
+ *
+ * The caller must take whatever precautions are necessary (such as
+ * holding appropriate locks) to avoid racing with another
+ * list-mutation primitive, such as hlist_add_head_rcu() or
+ * hlist_del_rcu(), running on this same list.  However, it is
+ * perfectly legal to run concurrently with the _rcu list-traversal
+ * primitives, such as hlist_for_each_entry_rcu().
+ */
+static inline void hlist_del_init_rcu(struct hlist_node *n)
+{
+	if (!hlist_unhashed(n)) {
+		__hlist_del(n);
+		n->pprev = NULL;
 	}
 }
 
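To see why only pprev is cleared, here is a small standalone sketch of
the same idea outside the kernel. The demo_* types and helpers are
hypothetical stand-ins for hlist_node/hlist_head, and the sketch is
single-threaded: it has no RCU and no memory barriers, it only shows
which pointers survive the delete.

```c
#include <stddef.h>

struct demo_hnode {
    struct demo_hnode *next;
    struct demo_hnode **pprev;  /* address of the pointer pointing at us */
};

struct demo_hhead {
    struct demo_hnode *first;
};

static int demo_unhashed(const struct demo_hnode *n)
{
    return n->pprev == NULL;
}

static void demo_add_head(struct demo_hnode *n, struct demo_hhead *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/*
 * Unlink n but leave n->next intact, so a reader currently standing on
 * n can still walk to the rest of the list. Only pprev is cleared,
 * which is what makes demo_unhashed() report true afterwards.
 */
static void demo_del_init_keep_next(struct demo_hnode *n)
{
    if (!demo_unhashed(n)) {
        struct demo_hnode *next = n->next;

        *n->pprev = next;
        if (next)
            next->pprev = n->pprev;
        n->pprev = NULL;
    }
}
```

Calling the delete a second time on the same node is a no-op, which is
the property a writer relies on when it cannot tell whether the entry
was already unhashed.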
[PATCH 2 of 3] mm_take_all_locks
From: Andrea Arcangeli <[EMAIL PROTECTED]>

mm_take_all_locks holds off reclaim from an entire mm_struct. This
allows mmu notifiers to register into the mm at any time with the
guarantee that no mmu operation is in progress on the mm.

This operation locks against the VM for all pte/vma/mm related
operations that could ever happen on a certain mm. This includes
vmtruncate, try_to_unmap, and all page faults.

The caller must take the mmap_sem in write mode before calling
mm_take_all_locks(). The caller isn't allowed to release the mmap_sem
until mm_drop_all_locks() returns.

mmap_sem in write mode is required in order to block all operations
that could modify pagetables and free pages without need of altering
the vma layout (for example populate_range() with nonlinear vmas).
It's also needed in write mode to prevent new anon_vmas from being
associated with existing vmas.

A single task can't take more than one mm_take_all_locks() in a row or
it would deadlock.

mm_take_all_locks() and mm_drop_all_locks() are expensive operations
that may have to take thousands of locks.

mm_take_all_locks() can fail if it's interrupted by signals.

When mmu_notifier_register returns, we must be sure that the driver is
notified if some task is in the middle of a vmtruncate for the 'mm'
where the mmu notifier was registered
(mmu_notifier_invalidate_range_start/end is run around the vmtruncation
but mmu_notifier_register can run after
mmu_notifier_invalidate_range_start and before
mmu_notifier_invalidate_range_end). Same problem for rmap paths. And
we have to remove page pinning to avoid replicating the tlb_gather
logic inside KVM (and GRU doesn't work well with page pinning
regardless of needing tlb_gather), so without mm_take_all_locks when
vmtruncate frees the page, kvm would have no way to notice that it
mapped into sptes a page that is going into the freelist without a
chance of any further mmu_notifier notification.
Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
---

diff -r 5e8c41d283cc -r 167f154fa536 include/linux/mm.h
--- a/include/linux/mm.h	Wed Jun 25 03:34:11 2008 +0200
+++ b/include/linux/mm.h	Wed Jun 25 03:34:14 2008 +0200
@@ -1068,6 +1068,9 @@ extern struct vm_area_struct *copy_vma(s
 	unsigned long addr, unsigned long len, pgoff_t pgoff);
 extern void exit_mmap(struct mm_struct *);
 
+extern int mm_take_all_locks(struct mm_struct *mm);
+extern void mm_drop_all_locks(struct mm_struct *mm);
+
 #ifdef CONFIG_PROC_FS
 /* From fs/proc/base.c. callers must _not_ hold the mm's exe_file_lock */
 extern void added_exe_file_vma(struct mm_struct *mm);
diff -r 5e8c41d283cc -r 167f154fa536 include/linux/pagemap.h
--- a/include/linux/pagemap.h	Wed Jun 25 03:34:11 2008 +0200
+++ b/include/linux/pagemap.h	Wed Jun 25 03:34:14 2008 +0200
@@ -19,6 +19,7 @@
  */
 #define	AS_EIO		(__GFP_BITS_SHIFT + 0)	/* IO error on async write */
 #define AS_ENOSPC	(__GFP_BITS_SHIFT + 1)	/* ENOSPC on async write */
+#define AS_MM_ALL_LOCKS	(__GFP_BITS_SHIFT + 2)	/* under mm_take_all_locks() */
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
 {
diff -r 5e8c41d283cc -r 167f154fa536 include/linux/rmap.h
--- a/include/linux/rmap.h	Wed Jun 25 03:34:11 2008 +0200
+++ b/include/linux/rmap.h	Wed Jun 25 03:34:14 2008 +0200
@@ -26,6 +26,14 @@
  */
 struct anon_vma {
 	spinlock_t lock;	/* Serialize access to vma list */
+	/*
+	 * NOTE: the LSB of the head.next is set by
+	 * mm_take_all_locks() _after_ taking the above lock. So the
+	 * head must only be read/written after taking the above lock
+	 * to be sure to see a valid next pointer. The LSB bit itself
+	 * is serialized by a system wide lock only visible to
+	 * mm_take_all_locks() (mm_all_locks_mutex).
+	 */
 	struct list_head head;	/* List of private "related" vmas */
 };
 
diff -r 5e8c41d283cc -r 167f154fa536 mm/mmap.c
--- a/mm/mmap.c	Wed Jun 25 03:34:11 2008 +0200
+++ b/mm/mmap.c	Wed Jun 25 03:34:14 2008 +0200
@@ -2261,3 +2261,161 @@ int install_special_mapping(struct mm_st
 
 	return 0;
 }
+
+static DEFINE_MUTEX(mm_all_locks_mutex);
+
+static void vm_lock_anon_vma(struct anon_vma *anon_vma)
+{
+	if (!test_bit(0, (unsigned long *) &anon_vma->head.next)) {
+		/*
+		 * The LSB of head.next can't change from under us
+		 * because we hold the mm_all_locks_mutex.
+		 */
+		spin_lock(&anon_vma->lock);
+		/*
+		 * We can safely modify head.next after taking the
+		 * anon_vma->lock. If some other vma in this mm shares
+		 * the same anon_vma we won't take it again.
+		 *
+		 * No need of atomic instructions here, head.next
+		 * can't change from under us thanks to the
+		 * anon_vma->lock.
+		 */
+
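The patch is cut off above, so as a rough illustration of the
"LSB of head.next as an already-locked marker" idea, here is a
standalone sketch. Everything named demo_* is hypothetical, the lock is
replaced by a plain counter, and the unlock side is an assumption about
how the marker would be cleared, since that part of the patch is not
shown. It relies on struct pointers being at least 2-byte aligned so
that bit 0 is free, and on the usual pointer/uintptr_t round trip.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_list {
    struct demo_list *next;
};

struct demo_anon {
    int lock_depth;         /* stand-in for a spinlock: counts acquisitions */
    struct demo_list head;
};

static bool demo_is_marked(const struct demo_anon *a)
{
    return ((uintptr_t)a->head.next & 1u) != 0;
}

/* Take the "lock" once per pass, using bit 0 of head.next as the marker. */
static void demo_lock_once(struct demo_anon *a)
{
    if (!demo_is_marked(a)) {
        a->lock_depth++;    /* spin_lock() in the real code */
        a->head.next =
            (struct demo_list *)((uintptr_t)a->head.next | 1u);
    }
}

/* Clear the marker and release; a no-op for objects never marked. */
static void demo_unlock_once(struct demo_anon *a)
{
    if (demo_is_marked(a)) {
        a->head.next =
            (struct demo_list *)((uintptr_t)a->head.next & ~(uintptr_t)1);
        a->lock_depth--;    /* spin_unlock() in the real code */
    }
}
```

The point of the marker is that several vmas in one mm can share an
anon_vma: walking the vma list calls the lock helper once per vma, but
the shared object is only locked the first time.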
[PATCH 0 of 3] mmu notifier v18 for -mm
Hello,

Christoph suggested that I repost v18 for merging in -mm, to give it
more exposure before the .27 merge window opens.

There's no code change compared to the previous v18 submission (the
only change is the correction in the comment in the mm_take_all_locks
patch rightfully pointed out by Linus).

Full patchset including other XPMEM support patches can be found here:

	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18

Only the three patches of the patchset I'm submitting here by email
are ready for merging. The rest, which you can find on the website, is
not ready for merging yet because of various performance degradations:
many of the XPMEM patches need to be reworked to avoid any slowdown for
the non-XPMEM case. I keep maintaining them to make life easier for
current XPMEM development, and later we can keep working on them to
make them suitable for inclusion without any performance degradation
risk.

(The fourth patch in the series at the above url is not strictly
related to mmu notifiers, but it's good at least for me to keep it in
the same tree to test a pci-passthrough capable guest running on
reserved-ram at the same time as two regular guests swapping heavily
with mmu notifiers, which tends to exercise both spte models at the
same time. If you find this confusing I'll remove it from any later
upload, but xpmem users can totally ignore it, it only touches x86-64
code.)

Thanks a lot.
Andrea
Re: [PATCH 4 of 4] Add initial PowerPC libcflat files & make file changes
On Wed, 2008-06-25 at 15:39 -0500, Jerone Young wrote:
> 4 files changed, 124 insertions(+), 1 deletion(-)
>  user/config-powerpc.mak           |   10 ++-
>  user/test/lib/powerpc/44x/map.c   |   51 +
>  user/test/lib/powerpc/44x/tlbwe.S |   29 +
>  user/test/lib/powerpc/io.c        |   35 +

Without user/test/powerpc/cstart.S (and exit.c which uses it), most of
this code is totally unused. That should at least be an additional patch
in this series; see
http://marc.info/?l=kvm-ppc-devel&m=120043765909206&w=2

-- 
Hollis Blanchard
IBM Linux Technology Center
Re: [PATCH 0 of 4] [kvm-userspace][test] consolidate test libs to libcflat
On Wed, 2008-06-25 at 15:39 -0500, Jerone Young wrote:
> This set of patches are to consolidate test libraries into a single
> library archive. This lib archive is libcflat. This will allow common
> code to be shared among archs.
>
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

I think patches 1-3 should be combined.

By the way, I assume you've built x86 with these changes, but have you
run it too?

-- 
Hollis Blanchard
IBM Linux Technology Center
another kvm-70 compile bug with rhel/centos 5.2
hi,
i'm just trying to recompile kvm-70 with the latest centos-5.2 (aka
rhel-5.2) kernel, but i've got a new compile error:
---
make KDIR=/usr/src/kernels/2.6.18-92.1.1.el5-x86_64
make -C /usr/src/kernels/2.6.18-92.1.1.el5-x86_64 M=`pwd` \
	LINUXINCLUDE="-I`pwd`/include -Iinclude -I`pwd`/include-compat \
	-include include/linux/autoconf.h \
	-include `pwd`/external-module-compat.h"
make[1]: Entering directory `/usr/src/kernels/2.6.18-92.1.1.el5-x86_64'
  LD      /home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/built-in.o
  CC [M]  /home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/svm.o
In file included from :2:
/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/external-module-compat.h:351: error: redefinition of typedef 'bool'
include/linux/types.h:36: error: previous declaration of 'bool' was here
make[2]: *** [/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/svm.o] Error 1
make[1]: *** [_module_/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-92.1.1.el5-x86_64'
make: *** [all] Error 2
---
yours.

-- 
  Levente                               "Si vis pacem para bellum!"
[PATCH 3 of 4] Remove old x86 test libs, that are now a part of libcflat
8 files changed, 411 deletions(-) user/test/x86/lib/apic.h | 14 --- user/test/x86/lib/exit.c |5 - user/test/x86/lib/printf.c | 195 user/test/x86/lib/printf.h |2 user/test/x86/lib/smp.c| 151 -- user/test/x86/lib/smp.h| 16 --- user/test/x86/lib/string.c | 21 user/test/x86/lib/string.h |7 - Signed-off-by: Jerone Young <[EMAIL PROTECTED]> diff --git a/user/test/x86/lib/apic.h b/user/test/x86/lib/apic.h deleted file mode 100644 --- a/user/test/x86/lib/apic.h +++ /dev/null @@ -1,14 +0,0 @@ -#ifndef SILLY_APIC_H -#define SILLY_APIC_H - -#define APIC_BASE 0x1000 -#define APIC_SIZE 0x100 - -#define APIC_REG_NCPU0x00 -#define APIC_REG_ID 0x04 -#define APIC_REG_SIPI_ADDR 0x08 -#define APIC_REG_SEND_SIPI 0x0c -#define APIC_REG_IPI_VECTOR 0x10 -#define APIC_REG_SEND_IPI0x14 - -#endif diff --git a/user/test/x86/lib/exit.c b/user/test/x86/lib/exit.c deleted file mode 100644 --- a/user/test/x86/lib/exit.c +++ /dev/null @@ -1,5 +0,0 @@ - -void exit(int code) -{ - asm volatile("out %0, %1" : : "a"(code), "d"((short)0xf4)); -} diff --git a/user/test/x86/lib/printf.c b/user/test/x86/lib/printf.c deleted file mode 100644 --- a/user/test/x86/lib/printf.c +++ /dev/null @@ -1,195 +0,0 @@ -#include "printf.h" -#include "smp.h" -#include -#include "string.h" - -static struct spinlock lock; - -void print(const char *s); - -typedef struct pstream { -char *buffer; -int remain; -int added; -} pstream_t; - -static void addchar(pstream_t *p, char c) -{ -if (p->remain) { - *p->buffer++ = c; - --p->remain; -} -++p->added; -} - -void print_str(pstream_t *p, const char *s) -{ -while (*s) - addchar(p, *s++); -} - -static char digits[16] = "0123456789abcdef"; - -void print_int(pstream_t *ps, long long n, int base) -{ -char buf[sizeof(long) * 3 + 2], *p = buf; -int s = 0, i; - -if (n < 0) { - n = -n; - s = 1; -} - -while (n) { - *p++ = digits[n % base]; - n /= base; -} - -if (s) - *p++ = '-'; - -if (p == buf) - *p++ = '0'; - -for (i = 0; i < (p - buf) / 2; ++i) { - char tmp; - - tmp = buf[i]; - buf[i] = 
p[-1-i]; - p[-1-i] = tmp; -} - -*p = 0; - -print_str(ps, buf); -} - -void print_unsigned(pstream_t *ps, unsigned long long n, int base) -{ -char buf[sizeof(long) * 3 + 1], *p = buf; -int i; - -while (n) { - *p++ = digits[n % base]; - n /= base; -} - -if (p == buf) - *p++ = '0'; - -for (i = 0; i < (p - buf) / 2; ++i) { - char tmp; - - tmp = buf[i]; - buf[i] = p[-1-i]; - p[-1-i] = tmp; -} - -*p = 0; - -print_str(ps, buf); -} - -int vsnprintf(char *buf, int size, const char *fmt, va_list va) -{ -pstream_t s; - -s.buffer = buf; -s.remain = size - 1; -s.added = 0; -while (*fmt) { - char f = *fmt++; - int nlong = 0; - - if (f != '%') { - addchar(&s, f); - continue; - } -morefmt: - f = *fmt++; - switch (f) { - case '%': - addchar(&s, '%'); - break; - case '\0': - --fmt; - break; - case 'l': - ++nlong; - goto morefmt; - case 'd': - switch (nlong) { - case 0: - print_int(&s, va_arg(va, int), 10); - break; - case 1: - print_int(&s, va_arg(va, long), 10); - break; - default: - print_int(&s, va_arg(va, long long), 10); - break; - } - break; - case 'x': - switch (nlong) { - case 0: - print_unsigned(&s, va_arg(va, unsigned), 16); - break; - case 1: - print_unsigned(&s, va_arg(va, unsigned long), 16); - break; - default: - print_unsigned(&s, va_arg(va, unsigned long long), 16); - break; - } - break; - case 'p': - print_str(&s, "0x"); - print_unsigned(&s, (unsigned long)va_arg(va, void *), 16); - break; - case 's': - print_str(&s, va_arg(va, const char *)); - break; - default: - addchar(&s, f); - break; - } -} -*s.buffer = 0; -++s.added; -return s.added; -} - - -int snprintf(char *buf, int size, const char *fmt, ...) -{ -va_list va; -int r; - -va_start(va, fmt); -r = vsnprintf(buf, size, fmt, va); -va_end(va); -return r; -} - -void print_serial(const char *buf) -{ -unsigned long len = strlen(buf); - -asm volatile ("rep/outsb" : "+S"(buf), "+c"(len) : "d"(0xf1)); -} - -int printf(const char *fmt, ...) -{ -va_list va; -char buf[2000]; -int r; - -va_start(va, f
[PATCH 4 of 4] Add initial PowerPC libcflat files & make file changes
4 files changed, 124 insertions(+), 1 deletion(-) user/config-powerpc.mak | 10 ++- user/test/lib/powerpc/44x/map.c | 51 + user/test/lib/powerpc/44x/tlbwe.S | 29 + user/test/lib/powerpc/io.c| 35 + Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]> Signed-off-by: Jerone Young <[EMAIL PROTECTED]> diff --git a/user/config-powerpc.mak b/user/config-powerpc.mak --- a/user/config-powerpc.mak +++ b/user/config-powerpc.mak @@ -3,6 +3,14 @@ CFLAGS += -I $(KERNELDIR)/include # for some reaons binutils hates tlbsx unless we say we're 405 :( CFLAGS += -Wa,-mregnames,-m405 + +cflatobjs += \ + test/lib/powerpc/io.o \ + test/lib/powerpc/44x/map.o \ + test/lib/powerpc/44x/tlbwe.o + +$(libcflat): LDFLAGS += -nostdlib +$(libcflat): CFLAGS += -ffreestanding -I test/lib -I test/lib/powerpc/44x %.bin: %.o $(OBJCOPY) -O binary $^ $@ @@ -18,7 +26,7 @@ tests := $(addprefix test/powerpc/, $(testobjs)) -all: kvmtrace kvmctl $(tests) +all: kvmtrace kvmctl $(libcflat) $(tests) kvmctl_objs = main-ppc.o iotable.o ../libkvm/libkvm.a diff --git a/user/test/lib/powerpc/44x/map.c b/user/test/lib/powerpc/44x/map.c new file mode 100644 --- /dev/null +++ b/user/test/lib/powerpc/44x/map.c @@ -0,0 +1,51 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 
2008 + * + * Authors: Hollis Blanchard <[EMAIL PROTECTED]> + */ + +#include "libcflat.h" + +#define TLB_SIZE 64 + +extern void tlbwe(unsigned int index, + unsigned char tid, + unsigned int word0, + unsigned int word1, + unsigned int word2); + +unsigned int next_free_index; + +#define PAGE_SHIFT 12 +#define PAGE_MASK (~((1<= TLB_SIZE) + panic("TLB overflow"); + + w0 = (vaddr & PAGE_MASK) | V; + w1 = paddr & PAGE_MASK; + w2 = 0x3; + + tlbwe(next_free_index, 0, w0, w1, w2); +} diff --git a/user/test/lib/powerpc/44x/tlbwe.S b/user/test/lib/powerpc/44x/tlbwe.S new file mode 100644 --- /dev/null +++ b/user/test/lib/powerpc/44x/tlbwe.S @@ -0,0 +1,29 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2008 + * + * Authors: Hollis Blanchard <[EMAIL PROTECTED]> + */ + +#define SPRN_MMUCR 0x3b2 + +/* tlbwe(uint index, uint8_t tid, uint word0, uint word1, uint word2) */ +.global tlbwe +tlbwe: + mtspr SPRN_MMUCR, r4 + tlbwe r5, r3, 0 + tlbwe r6, r3, 1 + tlbwe r7, r3, 2 + blr diff --git a/user/test/lib/powerpc/io.c b/user/test/lib/powerpc/io.c new file mode 100644 --- /dev/null +++ b/user/test/lib/powerpc/io.c @@ -0,0 +1,35 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2008 + * + * Authors: Hollis Blanchard <[EMAIL PROTECTED]> + */ + +#include "libcflat.h" + +#define BASE 0xf000 +#define _putc ((volatile char *)(BASE)) +#define _exit ((volatile char *)(BASE+1)) + +void puts(const char *s) +{ + while (*s != '\0') + *_putc = *s++; +} + +void exit(int code) +{ + *_exit = code; +} -
[PATCH 1 of 4] Consolidate libcflat for x86 to single lib for all archs
8 files changed, 453 insertions(+) user/test/lib/libcflat.h | 37 + user/test/lib/panic.c| 13 +++ user/test/lib/printf.c | 179 ++ user/test/lib/string.c | 21 + user/test/lib/x86/apic.h | 14 +++ user/test/lib/x86/io.c | 23 + user/test/lib/x86/smp.c | 150 ++ user/test/lib/x86/smp.h | 16 This patch lays the ground work for a sinlge libcflat library that can be used for x86. But this allows for other archs to share common code, and build a single archive for tests to use libcflat functions. Signed-off-by: Jerone Young <[EMAIL PROTECTED]> diff --git a/user/test/lib/libcflat.h b/user/test/lib/libcflat.h new file mode 100644 --- /dev/null +++ b/user/test/lib/libcflat.h @@ -0,0 +1,37 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 
2008 + * + * Authors: Hollis Blanchard <[EMAIL PROTECTED]> + */ + +#ifndef __LIBCFLAT_H +#define __LIBCFLAT_H + +#include + +extern int main(void); +extern void exit(int code); +extern void panic(char *fmt, ...); + +extern unsigned long strlen(const char *buf); +extern char *strcat(char *dest, const char *src); + +extern int printf(const char *fmt, ...); +extern int vsnprintf(char *buf, int size, const char *fmt, va_list va); + +extern void puts(const char *s); + +#endif diff --git a/user/test/lib/panic.c b/user/test/lib/panic.c new file mode 100644 --- /dev/null +++ b/user/test/lib/panic.c @@ -0,0 +1,13 @@ +#include "libcflat.h" + +void panic(char *fmt, ...) +{ + va_list va; + char buf[2000]; + + va_start(va, fmt); + vsnprintf(buf, sizeof(buf), fmt, va); + va_end(va); + puts(buf); + exit(-1); +} diff --git a/user/test/lib/printf.c b/user/test/lib/printf.c new file mode 100644 --- /dev/null +++ b/user/test/lib/printf.c @@ -0,0 +1,179 @@ +#include "libcflat.h" + +typedef struct pstream { +char *buffer; +int remain; +int added; +} pstream_t; + +static void addchar(pstream_t *p, char c) +{ +if (p->remain) { + *p->buffer++ = c; + --p->remain; +} +++p->added; +} + +void print_str(pstream_t *p, const char *s) +{ +while (*s) + addchar(p, *s++); +} + +static char digits[16] = "0123456789abcdef"; + +void print_int(pstream_t *ps, long long n, int base) +{ +char buf[sizeof(long) * 3 + 2], *p = buf; +int s = 0, i; + +if (n < 0) { + n = -n; + s = 1; +} + +while (n) { + *p++ = digits[n % base]; + n /= base; +} + +if (s) + *p++ = '-'; + +if (p == buf) + *p++ = '0'; + +for (i = 0; i < (p - buf) / 2; ++i) { + char tmp; + + tmp = buf[i]; + buf[i] = p[-1-i]; + p[-1-i] = tmp; +} + +*p = 0; + +print_str(ps, buf); +} + +void print_unsigned(pstream_t *ps, unsigned long long n, int base) +{ +char buf[sizeof(long) * 3 + 1], *p = buf; +int i; + +while (n) { + *p++ = digits[n % base]; + n /= base; +} + +if (p == buf) + *p++ = '0'; + +for (i = 0; i < (p - buf) / 2; ++i) { + char tmp; + + tmp 
= buf[i]; + buf[i] = p[-1-i]; + p[-1-i] = tmp; +} + +*p = 0; + +print_str(ps, buf); +} + +int vsnprintf(char *buf, int size, const char *fmt, va_list va) +{ +pstream_t s; + +s.buffer = buf; +s.remain = size - 1; +s.added = 0; +while (*fmt) { + char f = *fmt++; + int nlong = 0; + + if (f != '%') { + addchar(&s, f); + continue; + } +morefmt: + f = *fmt++; + switch (f) { + case '%': + addchar(&s, '%'); + break; + case '\0': + --fmt; + break; + case 'l': + ++nlong; + goto morefmt; + case 'd': + switch (nlong) { + case 0: + print_int(&s, va_arg(va, int), 10); + break; + case 1: + print_int(&s, va_arg(va, long), 10); + break; + default: + print_int(&s, va_arg(va, long long), 10); + break; + } + break; + case 'x': + switch (nlong) { + case 0: + print_unsigned(&s, va_arg(va, unsigned), 16); + break; + case 1: +
[PATCH 2 of 4] Add Makefile and test changes required for x86 to use libcflat
6 files changed, 25 insertions(+), 15 deletions(-) user/Makefile | 11 ++- user/config-x86-common.mak | 16 ++-- user/main.c|2 +- user/test/x86/port80.c |3 +-- user/test/x86/smptest.c|5 ++--- user/test/x86/tsc.c|3 +-- Signed-off-by: Jerone Young <[EMAIL PROTECTED]> diff --git a/user/Makefile b/user/Makefile --- a/user/Makefile +++ b/user/Makefile @@ -9,6 +9,12 @@ CFLAGS = libgcc := $(shell $(CC) --print-libgcc-file-name) + +libcflat := test/lib/libcflat.a +cflatobjs := \ + test/lib/panic.o \ + test/lib/printf.o \ + test/lib/string.o #include architecure specific make rules include config-$(ARCH).mak @@ -41,10 +47,13 @@ kvmtrace: $(kvmtrace_objs) $(CC) $(LDFLAGS) $^ -o $@ +$(libcflat): $(cflatobjs) + ar rcs $@ $^ + %.o: %.S $(CC) $(CFLAGS) -c -nostdlib -o $@ $^ -include .*.d clean: arch_clean - $(RM) kvmctl kvmtrace *.o *.a .*.d + $(RM) kvmctl kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs) diff --git a/user/config-x86-common.mak b/user/config-x86-common.mak --- a/user/config-x86-common.mak +++ b/user/config-x86-common.mak @@ -5,7 +5,15 @@ kvmctl_objs= main.o iotable.o ../libkvm/libkvm.a balloon_ctl: balloon_ctl.o -FLATLIBS = $(TEST_DIR)/libcflat.a $(libgcc) +cflatobjs += \ + test/lib/x86/io.o \ + test/lib/x86/smp.o + +$(libcflat): LDFLAGS += -nostdlib +$(libcflat): CFLAGS += -ffreestanding -I test/lib + + +FLATLIBS = test/lib/libcflat.a $(libgcc) %.flat: %.o $(FLATLIBS) $(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS) @@ -15,7 +23,7 @@ test_cases: $(tests-common) $(tests) -$(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I$(TEST_DIR)/lib +$(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I test/lib -I test/lib/x86 $(TEST_DIR)/bootstrap: $(TEST_DIR)/bootstrap.o $(CC) -nostdlib -o $@ -Wl,-T,bootstrap.lds $^ @@ -41,10 +49,6 @@ $(TEST_DIR)/tsc.flat: $(cstart.o) $(TEST_DIR)/tsc.o -$(TEST_DIR)/libcflat.a: $(TEST_DIR)/lib/exit.o $(TEST_DIR)/lib/printf.o \ - $(TEST_DIR)/lib/smp.o $(TEST_DIR)/lib/string.o - ar rcs $@ $^ - arch_clean: $(RM) 
$(TEST_DIR)/bootstrap $(TEST_DIR)/*.o $(TEST_DIR)/*.flat \ $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o diff --git a/user/main.c b/user/main.c --- a/user/main.c +++ b/user/main.c @@ -17,7 +17,7 @@ #define _GNU_SOURCE #include -#include "test/x86/lib/apic.h" +#include "test/lib/x86/apic.h" #include "test/x86/ioram.h" #include diff --git a/user/test/x86/port80.c b/user/test/x86/port80.c --- a/user/test/x86/port80.c +++ b/user/test/x86/port80.c @@ -1,5 +1,4 @@ - -#include "printf.h" +#include int main() { diff --git a/user/test/x86/smptest.c b/user/test/x86/smptest.c --- a/user/test/x86/smptest.c +++ b/user/test/x86/smptest.c @@ -1,6 +1,5 @@ - -#include "smp.h" -#include "printf.h" +#include +#include static void ipi_test(void *data) { diff --git a/user/test/x86/tsc.c b/user/test/x86/tsc.c --- a/user/test/x86/tsc.c +++ b/user/test/x86/tsc.c @@ -1,5 +1,4 @@ - -#include "printf.h" +#include typedef unsigned long long u64; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0 of 4] [kvm-userspace][test] consolidate test libs to libcflat
This set of patches consolidates the test libraries into a single library archive, libcflat. This will allow common code to be shared among archs.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

26 files changed, 602 insertions(+), 427 deletions(-)

 user/Makefile                     |  11 +-
 user/config-powerpc.mak           |  10 +
 user/config-x86-common.mak        |  16 +--
 user/main.c                       |   2
 user/test/lib/libcflat.h          |  37 +++
 user/test/lib/panic.c             |  13 ++
 user/test/lib/powerpc/44x/map.c   |  51 +
 user/test/lib/powerpc/44x/tlbwe.S |  29 +
 user/test/lib/powerpc/io.c        |  35 ++
 user/test/lib/printf.c            | 179 +
 user/test/lib/string.c            |  21 +++
 user/test/lib/x86/apic.h          |  14 ++
 user/test/lib/x86/io.c            |  23
 user/test/lib/x86/smp.c           | 150
 user/test/lib/x86/smp.h           |  16 +++
 user/test/x86/lib/apic.h          |  14 --
 user/test/x86/lib/exit.c          |   5
 user/test/x86/lib/printf.c        | 195 -
 user/test/x86/lib/printf.h        |   2
 user/test/x86/lib/smp.c           | 151
 user/test/x86/lib/smp.h           |  16 ---
 user/test/x86/lib/string.c        |  21 ---
 user/test/x86/lib/string.h        |   7 -
 user/test/x86/port80.c            |   3
 user/test/x86/smptest.c           |   5
 user/test/x86/tsc.c               |   3
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> On Wed, Jun 25, 2008 at 04:17:23PM -0300, Marcelo Tosatti wrote:
> > > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does
> > > boot fine.
> >
> > You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.
> >
> > You should upgrade the guest kernel to the git tree, kvm clock changes
> > break compatibility with older kernels.
>
> Err, I mean you have to upgrade the host kernel too.

Ah, unfortunately I can't do this easily :( I'm presently on vacation and the host is my desktop at work; the risk of breaking something remotely is too high. And even when I'm back it is a problem: unfortunately the host is an nvidia system, and dual display only works with the binary-only driver (*grr* why did I have to get a stupid nvidia system :( ).

Thanks,
Bernd
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
On Wed, Jun 25, 2008 at 04:17:23PM -0300, Marcelo Tosatti wrote:
> > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does
> > boot fine.
>
> You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.
>
> You should upgrade the guest kernel to the git tree, kvm clock changes
> break compatibility with older kernels.

Err, I mean you have to upgrade the host kernel too.
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> On Wed, Jun 25, 2008 at 08:54:44PM +0200, Bernd Schubert wrote:
> > On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> > > On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote:
> > > > Avi Kivity wrote:
> > > > > Linus, please pull from the repo and branch at:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> > > > > kvm-updates-2.6.26
> > > >
> > > > I just pulled from Linus and now it stalls to boot at
> > > >
> > > > [0.616031] pnp: PnP ACPI init
> > > > [0.628031] ACPI: bus type pnp registered
> > > > [0.640031] pnp: PnP ACPI: found 7 devices
> > > > [0.652031] ACPI: ACPI bus type pnp unregistered
> > > > [0.660031] SCSI subsystem initialized
> > > > [0.664031] PCI: Using ACPI for IRQ routing
> > > > [0.692031] PCI-GART: No AMD northbridge found.
> > > >
> > > > The kvm process is at 100% time. Taking the many problems I already
> > > > reported about, 2.6.26 probably will be entirely broken regarding kvm
> > > > :(
> > >
> > > Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ?
> > >
> > > There is a known problem with CONFIG_KVM_GUEST being worked on.
> > >
> > > Can you provide these details, and pinpoint which option is the culprit?
> >
> > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does
> > boot fine.
>
> You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.

Oh sorry, this was supposed to be "With CONFIG_KVM_GUEST=y it boots fine". So yes, CONFIG_KVM_CLOCK is the culprit.

> You should upgrade the guest kernel to the git tree, kvm clock changes
> break compatibility with older kernels.

This __is__ after updating to recent git (before that my problem was different). The latest commit is

commit 543cf4cb3fe6f6cae3651ba918b9c56200b257d0
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date: Tue Jun 24 18:58:20 2008 -0700

Linux 2.6.26-rc8

Thanks,
Bernd
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
On Wed, Jun 25, 2008 at 08:54:44PM +0200, Bernd Schubert wrote: > On Wednesday 25 June 2008, Marcelo Tosatti wrote: > > On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote: > > > Avi Kivity wrote: > > > > Linus, please pull from the repo and branch at: > > > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git > > > > kvm-updates-2.6.26 > > > > > > I just pulled from Linus and now it stalls to boot at > > > > > > [0.616031] pnp: PnP ACPI init > > > [0.628031] ACPI: bus type pnp registered > > > [0.640031] pnp: PnP ACPI: found 7 devices > > > [0.652031] ACPI: ACPI bus type pnp unregistered > > > [0.660031] SCSI subsystem initialized > > > [0.664031] PCI: Using ACPI for IRQ routing > > > [0.692031] PCI-GART: No AMD northbridge found. > > > > > > The kvm process is at 100% time. Taking the many problems I already > > > reported about, 2.6.26 probably will be entirely broken regarding kvm :( > > > > Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ? > > > > There is a known problem with CONFIG_KVM_GUEST being worked on. > > > > Can you provide this details, and pinpoint which option is the culprit? > > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does boot > fine. You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose. You should upgrade the guest kernel to the git tree, kvm clock changes break compatibility with older kernels. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
On Wednesday 25 June 2008, Marcelo Tosatti wrote: > On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote: > > Avi Kivity wrote: > > > Linus, please pull from the repo and branch at: > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git > > > kvm-updates-2.6.26 > > > > I just pulled from Linus and now it stalls to boot at > > > > [0.616031] pnp: PnP ACPI init > > [0.628031] ACPI: bus type pnp registered > > [0.640031] pnp: PnP ACPI: found 7 devices > > [0.652031] ACPI: ACPI bus type pnp unregistered > > [0.660031] SCSI subsystem initialized > > [0.664031] PCI: Using ACPI for IRQ routing > > [0.692031] PCI-GART: No AMD northbridge found. > > > > The kvm process is at 100% time. Taking the many problems I already > > reported about, 2.6.26 probably will be entirely broken regarding kvm :( > > Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ? > > There is a known problem with CONFIG_KVM_GUEST being worked on. > > Can you provide this details, and pinpoint which option is the culprit? Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does boot fine. Thanks, Bernd -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote:
> Avi Kivity wrote:
>
> > Linus, please pull from the repo and branch at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> > kvm-updates-2.6.26
>
> I just pulled from Linus and now it stalls to boot at
>
> [0.616031] pnp: PnP ACPI init
> [0.628031] ACPI: bus type pnp registered
> [0.640031] pnp: PnP ACPI: found 7 devices
> [0.652031] ACPI: ACPI bus type pnp unregistered
> [0.660031] SCSI subsystem initialized
> [0.664031] PCI: Using ACPI for IRQ routing
> [0.692031] PCI-GART: No AMD northbridge found.
>
> The kvm process is at 100% time. Taking the many problems I already reported
> about, 2.6.26 probably will be entirely broken regarding kvm :(

Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled?

There is a known problem with CONFIG_KVM_GUEST being worked on.

Can you provide these details and pinpoint which option is the culprit?
Re: [PATCH] Avoid fragment virtio-blk transfers by copying
Avi Kivity wrote:
> Anthony Liguori wrote:
>> A major source of performance loss for virtio-blk has been the fact that we
>> split transfers into multiple requests. This is particularly harmful if you
>> have striped storage beneath your virtual machine.
>>
>> This patch copies the request data into a single contiguous buffer to ensure
>> that we don't split requests. This improves performance from about 80 MB/sec
>> to about 155 MB/sec with my fibre channel link. 185 MB/sec is what we get on
>> native so this gets us pretty darn close.
>
> If the guest issues a request for a terabyte of memory, the host will
> try to allocate it and drop to swap/oom.

An unprivileged user should not be able to OOM the kernel simply by allocating memory. Likewise, while the host will start swapping, that should primarily affect the application allocating the memory (although yes, it consumes IO bandwidth to do the swapping).

As long as we properly handle memory allocation failures, it's my contention that allowing a guest to allocate unbounded amounts of virtual memory is safe.

> So we need to either fragment
> beyond some size, or to avoid copying and thus the need for allocation.

As Marcelo mentioned, there are very practical limits on how much memory can be in flight on any given queue. A malicious guest could construct a nasty queue in which every entry pointed at all of the guest physical memory, but that is still a bounded size.

Regards,

Anthony Liguori
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
Avi Kivity wrote:

> Linus, please pull from the repo and branch at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> kvm-updates-2.6.26

I just pulled from Linus and now it stalls to boot at

[0.616031] pnp: PnP ACPI init
[0.628031] ACPI: bus type pnp registered
[0.640031] pnp: PnP ACPI: found 7 devices
[0.652031] ACPI: ACPI bus type pnp unregistered
[0.660031] SCSI subsystem initialized
[0.664031] PCI: Using ACPI for IRQ routing
[0.692031] PCI-GART: No AMD northbridge found.

The kvm process is at 100% time. Taking the many problems I already reported about, 2.6.26 probably will be entirely broken regarding kvm :(

Thanks,
Bernd
Re: [PATCH] Avoid fragment virtio-blk transfers by copying
On Wed, Jun 25, 2008 at 01:44:35PM +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> A major source of performance loss for virtio-blk has been the fact that we
>> split transfers into multiple requests. This is particularly harmful if you
>> have striped storage beneath your virtual machine.
>>
>> This patch copies the request data into a single contiguous buffer to ensure
>> that we don't split requests. This improves performance from about 80 MB/sec
>> to about 155 MB/sec with my fibre channel link. 185 MB/sec is what we get on
>> native so this gets us pretty darn close.
>
> If the guest issues a request for a terabyte of memory, the host will
> try to allocate it and drop to swap/oom. So we need to either fragment
> beyond some size, or to avoid copying and thus the need for allocation.

The maximum request size for Linux guests is 512K (after tuning the virtio-blk guest driver; the current max is 124K). I'm not sure what the max number of requests is, but I guess it is between 128 and 1024. Anthony?

So with the current configuration your concern is not an issue. BTW, what is the maximum request size for the Windows driver?

The point is that the guest is responsible for limiting the amount of data in flight. With proper memory limits in place, a malicious guest that attempts to DoS the host can only hurt itself.

IMO this issue should not be handled in the virtio-blk backend.
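The figures in the message above give a rough upper bound on how much copied request data one queue could hold at a time. Below is a minimal sketch of that arithmetic, assuming the bound is simply "largest request times number of requests in flight". The function name is made up for illustration, and the 128 to 1024 request range is Marcelo's guess, not a confirmed limit.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case bytes in flight on one queue, assuming every slot carries a
 * request of the maximum size. Hypothetical helper, not part of virtio-blk. */
static uint64_t max_inflight_bytes(uint64_t max_request_bytes,
                                   uint64_t max_requests)
{
    /* Refuse inputs whose product would not fit in 64 bits. */
    assert(max_requests == 0 ||
           max_request_bytes <= UINT64_MAX / max_requests);
    return max_request_bytes * max_requests;
}
```

With the numbers quoted above, 124K requests and 128 slots come to 15.5 MiB, while 512K requests and 1024 slots come to 512 MiB per queue. Both are finite, which is the point being made in the thread.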
Re: [PATCH] [v2] Remove use of bit fields in kvm trace structure
On Tue, 2008-06-24 at 13:54 -0500, Jerone Young wrote:
> 2 files changed, 30 insertions(+), 11 deletions(-)
> include/linux/kvm.h  | 17 ++---
> virt/kvm/kvm_trace.c | 24
>
> *Updates:
> Create global definitions for setting trace records as opposed to
> explicitly setting them inside of a function.
>
> This patch fixes kvmtrace use on big endian systems. When using bit
> fields the compiler will lay data out in the wrong order expected when
> laid down into a file. This fixes it by using one variable instead of
> using bit fields.
>
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

Acked-by: Hollis Blanchard <[EMAIL PROTECTED]>

Avi, if this is OK now, please also apply Jerone's earlier patches:

[PATCH 2 of 3] Move KVM TRACE DEFINITIONS to common header
[PATCH 3 of 3] Add new KVM TRACE events

(and add my Acked-by to those as well).

--
Hollis Blanchard
IBM Linux Technology Center
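The approach described in the patch summary above (one variable instead of bit fields) can be sketched roughly as follows. This is an illustration only: the field names, widths, and bit positions are hypothetical and are not taken from include/linux/kvm.h or kvm_trace.c. The idea is that explicit shifts and masks pin each field to a fixed bit position, whereas the order in which a compiler allocates bit fields within a word is implementation-defined and commonly differs between little-endian and big-endian targets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only (not the real kvm_trace format):
 *   bits  0..27  event id
 *   bit   28     "timestamp present" flag
 *   bits 29..31  number of extra data words
 */
#define REC_EVENT_MASK  0x0fffffffu
#define REC_TSC_SHIFT   28
#define REC_EXTRA_SHIFT 29
#define REC_EXTRA_MASK  0x7u

/* Build the header word with explicit shifts, so the bit positions are
 * fixed by the code and not by the compiler's bit-field allocation order. */
static inline uint32_t rec_pack(uint32_t event, unsigned has_tsc, unsigned extra)
{
    assert(event <= REC_EVENT_MASK);
    assert(extra <= REC_EXTRA_MASK);
    return event
         | ((uint32_t)(has_tsc != 0) << REC_TSC_SHIFT)
         | ((uint32_t)extra << REC_EXTRA_SHIFT);
}

/* Accessors that undo the packing above. */
static inline uint32_t rec_event(uint32_t word)   { return word & REC_EVENT_MASK; }
static inline unsigned rec_has_tsc(uint32_t word) { return (word >> REC_TSC_SHIFT) & 1u; }
static inline unsigned rec_extra(uint32_t word)   { return (word >> REC_EXTRA_SHIFT) & REC_EXTRA_MASK; }
```

Note that a word packed this way is still stored in the host's byte order when written to a file, so a reader on a machine of the other endianness would need to byte-swap the whole word. What it no longer needs to know is how a particular compiler ordered the bit fields.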
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Kevin Wolf wrote: Anthony Liguori schrieb: Kevin Wolf wrote: Anthony Liguori schrieb: Kevin Wolf wrote: Anthony Liguori schrieb: Yes, if it fails, the EINVAL is no surprise. I meant what code path it was using. Obviously we missed something in our patch and I'd like to fix that. Did the error occur on raw images or something like qcow2? It's a raw image and the calls are being made via bdrv_aio_read/bdrv_aio_write. It doesn't occur with a qcow2 but then cache=off doesn't seem to do what it's supposed to with cache=off (I believe the underlying backing file is not opened O_DIRECT?). This is really strange. In raw_aio_read/write there is a check like this: if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) { // emulate it using raw_pread/write which uses // s->aligned_buf for the request then } Something is goofy then. For qcow2 I think O_DIRECT actually is in effect. Otherwise it would have worked even without our patch, and it didn't. And indeed, looking at the code, it passes flags to bdrv_file_open when it opens the image file. Something's broken then. Maybe -snapshot doesn't pick up the O_DIRECT'ness? I'll have to check again. I was definitely seeing page cache behavior with cache=off. Right, qemu seems to drop the flags for the backing file when using BDRV_O_SNAPSHOT (bdrv2_open in block.c opens the file). So O_DIRECT applies only to new data. Have you been using -snapshot when you had trouble with the unaligned buffer, too? I don't think I have tested this one when I made the patch... Nope. I was using a raw image. Actually, an LVM partition. Regards, Anthony Liguori Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Anthony Liguori schrieb: > Kevin Wolf wrote: >> Anthony Liguori schrieb: >> >>> Kevin Wolf wrote: >>> Anthony Liguori schrieb: Yes, if it fails, the EINVAL is no surprise. I meant what code path it was using. Obviously we missed something in our patch and I'd like to fix that. Did the error occur on raw images or something like qcow2? >>> It's a raw image and the calls are being made via >>> bdrv_aio_read/bdrv_aio_write. It doesn't occur with a qcow2 but then >>> cache=off doesn't seem to do what it's supposed to with cache=off (I >>> believe the underlying backing file is not opened O_DIRECT?). >>> >> >> This is really strange. In raw_aio_read/write there is a check like this: >> >> if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) { >> // emulate it using raw_pread/write which uses >> // s->aligned_buf for the request then >> } >> > > Something is goofy then. > >> For qcow2 I think O_DIRECT actually is in effect. Otherwise it would >> have worked even without our patch, and it didn't. And indeed, looking >> at the code, it passes flags to bdrv_file_open when it opens the image >> file. >> > > Something's broken then. Maybe -snapshot doesn't pick up the > O_DIRECT'ness? I'll have to check again. I was definitely seeing page > cache behavior with cache=off. Right, qemu seems to drop the flags for the backing file when using BDRV_O_SNAPSHOT (bdrv2_open in block.c opens the file). So O_DIRECT applies only to new data. Have you been using -snapshot when you had trouble with the unaligned buffer, too? I don't think I have tested this one when I made the patch... Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Kevin Wolf wrote: Anthony Liguori schrieb: Kevin Wolf wrote: Anthony Liguori schrieb: Yes, if it fails, the EINVAL is no surprise. I meant what code path it was using. Obviously we missed something in our patch and I'd like to fix that. Did the error occur on raw images or something like qcow2? It's a raw image and the calls are being made via bdrv_aio_read/bdrv_aio_write. It doesn't occur with a qcow2 but then cache=off doesn't seem to do what it's supposed to with cache=off (I believe the underlying backing file is not opened O_DIRECT?). This is really strange. In raw_aio_read/write there is a check like this: if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) { // emulate it using raw_pread/write which uses // s->aligned_buf for the request then } Something is goofy then. For qcow2 I think O_DIRECT actually is in effect. Otherwise it would have worked even without our patch, and it didn't. And indeed, looking at the code, it passes flags to bdrv_file_open when it opens the image file. Something's broken then. Maybe -snapshot doesn't pick up the O_DIRECT'ness? I'll have to check again. I was definitely seeing page cache behavior with cache=off. Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Laurent Vivier wrote:
> Generally EINVAL with O_DIRECT opened files means there is an alignment
> problem with offset, buffer address or size to read (must be multiple of
> 512).

Apparently the qemu_memalign for the buffer helps, so it's the buffer address in this case.

Kevin
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
On Wednesday 25 June 2008 at 16:15 +0200, Kevin Wolf wrote:
> Anthony Liguori wrote:
> > Kevin Wolf wrote:
> >> Anthony Liguori wrote:
> >>
> >>> I guess the main block code is not as defensive as I thought it was.
> >>> This patch uses qemu_memalign to allocate the buffers for IO so that
> >>> you don't get errors when using O_DIRECT.
> >>
> >> Actually, the block code should be able to deal with unaligned buffers
> >> since qemu rev. 4599. This change seems to be present in current KVM.
> >
> > That was what I thought at first too.
> >
> >> Can you tell exactly which operation failed?
> >
> > The aio requests fail with -22 (EINVAL).
>
> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
> was using. Obviously we missed something in our patch and I'd like to
> fix that. Did the error occur on raw images or something like qcow2?

Generally EINVAL with O_DIRECT opened files means there is an alignment problem with the offset, the buffer address or the size to read (each must be a multiple of 512).

Regards,
Laurent

--
[EMAIL PROTECTED]
"The best way to predict the future is to invent it." - Alan Kay
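The alignment rule Laurent describes can be turned into a small pre-flight check. The sketch below assumes the 512-byte granularity quoted in the thread; the real requirement depends on the filesystem and device. The helper names are invented for illustration and are not QEMU functions; posix_memalign() is the standard POSIX call, used here as a stand-in for qemu_memalign.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

#define SECTOR_SIZE 512 /* assumption: the 512-byte rule quoted in the thread */

/* Returns 1 if the buffer address, the length and the file offset are all
 * multiples of SECTOR_SIZE, i.e. the request should not trip the O_DIRECT
 * alignment check; returns 0 otherwise. */
static int direct_io_aligned(const void *buf, size_t len, off_t offset)
{
    return ((uintptr_t)buf % SECTOR_SIZE) == 0
        && (len % SECTOR_SIZE) == 0
        && (offset % SECTOR_SIZE) == 0;
}

/* Allocate a buffer whose address is SECTOR_SIZE-aligned.
 * Returns NULL on failure; release with free(). */
static void *alloc_sector_aligned(size_t len)
{
    void *p = NULL;

    /* posix_memalign() requires a power-of-two alignment. */
    assert((SECTOR_SIZE & (SECTOR_SIZE - 1)) == 0);
    if (posix_memalign(&p, SECTOR_SIZE, len) != 0)
        return NULL;
    return p;
}
```

A plain malloc() result carries no 512-byte alignment guarantee, so an otherwise valid request could fail the first of the three conditions. That matches the symptom in this thread, where allocating the buffer with an aligned allocator made the EINVAL go away.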
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Anthony Liguori wrote:
> Kevin Wolf wrote:
>> Anthony Liguori wrote:
>>
>> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
>> was using. Obviously we missed something in our patch and I'd like to
>> fix that. Did the error occur on raw images or something like qcow2?
>
> It's a raw image and the calls are being made via
> bdrv_aio_read/bdrv_aio_write. It doesn't occur with a qcow2 but then
> cache=off doesn't seem to do what it's supposed to with cache=off (I
> believe the underlying backing file is not opened O_DIRECT?).

This is really strange. In raw_aio_read/write there is a check like this:

    if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) {
        // emulate it using raw_pread/write which uses
        // s->aligned_buf for the request then
    }

For qcow2 I think O_DIRECT actually is in effect. Otherwise it would have worked even without our patch, and it didn't. And indeed, looking at the code, it passes flags to bdrv_file_open when it opens the image file.

Kevin
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Kevin Wolf wrote: Anthony Liguori schrieb: Kevin Wolf wrote: Anthony Liguori schrieb: I guess the main block code is not as defensive as I thought it was. This patch uses qemu_memalign to allocate the buffers for IO so that you don't get errors when using O_DIRECT. Actually, the block code should be able to deal with unaligned buffers since qemu rev. 4599. This change seems to be present in current KVM. That was what I thought at first too. Can you tell exactly which operation failed? The aio requests fail with -22 (EINVAL). Yes, if it fails, the EINVAL is no surprise. I meant what code path it was using. Obviously we missed something in our patch and I'd like to fix that. Did the error occur on raw images or something like qcow2? It's a raw image and the calls are being made via bdrv_aio_read/bdrv_aio_write. It doesn't occur with a qcow2 but then cache=off doesn't seem to do what it's supposed to with cache=off (I believe the underlying backing file is not opened O_DIRECT?). Regards, Anthony Liguori Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Anthony Liguori wrote:
> Kevin Wolf wrote:
>> Anthony Liguori wrote:
>>
>>> I guess the main block code is not as defensive as I thought it was.
>>> This patch uses qemu_memalign to allocate the buffers for IO so that
>>> you don't get errors when using O_DIRECT.
>>
>> Actually, the block code should be able to deal with unaligned buffers
>> since qemu rev. 4599. This change seems to be present in current KVM.
>
> That was what I thought at first too.
>
>> Can you tell exactly which operation failed?
>
> The aio requests fail with -22 (EINVAL).

Yes, if it fails, the EINVAL is no surprise. I meant what code path it was using. Obviously we missed something in our patch and I'd like to fix that. Did the error occur on raw images or something like qcow2?

Kevin
Re: Memory image of kvm guest
Tomas Kouba wrote:
> Hello,
> is it possible to see and modify guest memory of the guest running under
> kvm?
> For example when I know the address of a kernel symbol, can I read the
> memory of the symbol in my application running on host?
>
> (I am quite new to KVM but similar things are possible in XEN via
> xenctrl library calls).

Even better: inherited from QEMU, KVM provides a full-blown gdb backend. So you can do source-level debugging of your guest very comfortably.

If you just want to get the content of some memory chunk: QEMU monitor, 'x' (as known from gdb, see also qemu/qemu-doc.html). But modification requires a gdb frontend again.

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Kevin Wolf wrote:
> Anthony Liguori wrote:
>> I guess the main block code is not as defensive as I thought it was.
>> This patch uses qemu_memalign to allocate the buffers for IO so that
>> you don't get errors when using O_DIRECT.
>
> Actually, the block code should be able to deal with unaligned buffers
> since qemu rev. 4599. This change seems to be present in current KVM.

That was what I thought at first too.

> Can you tell exactly which operation failed?

The aio requests fail with -22 (EINVAL).

> But apart from that, qemu_memalign is the right thing to do, because
> copying from/into an aligned buffer in the block code costs performance
> (don't know how much, though).
>
> Kevin

Regards,

Anthony Liguori
Memory image of kvm guest
Hello,

is it possible to see and modify guest memory of the guest running under kvm? For example, when I know the address of a kernel symbol, can I read the memory of the symbol in my application running on the host?

(I am quite new to KVM, but similar things are possible in Xen via xenctrl library calls.)

Thank you,
--
Tomas Kouba
[ kvm-Bugs-2001452 ] Restarted Windows 2003 Server guests have disk corruption
Bugs item #2001452, was opened at 2008-06-24 00:27
Message generated for change (Comment added) made by iggy_cav
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: intel
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: gwachs (gerdwachs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Restarted Windows 2003 Server guests have disk corruption

Initial Comment:
I have a number of Windows 2003 32Bit guests. I use them to perform installation and configuration tests of a large software product. During these tests, the guests are restarted. Randomly, the guests produce disk corruption messages after a restart. The following are two examples :
---
Windows Registry Hive Recovered
Registry hive (file): SOFTWARE was corrupted and it has been recovered. Some data might have been lost.
---
The system cannot log on due to the following error: Unable to complete the requested operation because of either a catastrophic media failure or a data structure corruption on the disk.
---

OS : Ubuntu 8.04 x86_64
Kernel : 2.6.24-18-server #1 SMP x86_64 GNU/Linux
KVM: kvm-70
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xt

Start Command :
sudo /usr/local/kvm/bin/qemu-system-x86_64 -hda asit51ascs.img \
 -m 1024 -std-vga -boot c -k sv -usb -usbdevice tablet -snapshot -vnc :51 \
 -net nic,vlan=0,macaddr=00:16:3e:00:51:00 -net tap,vlan=0,script=/etc/qemu-ifup-br0 \
 -net nic,vlan=1,macaddr=00:16:3e:00:51:01 -net tap,vlan=1,script=/etc/qemu-ifup-br1

no-kvm : Cannot do due to the loss of performance. Tests execute time is 7 hours with kvm.

--

Comment By: Brian Jackson (iggy_cav)
Date: 2008-06-25 07:59

Message:
Logged In: YES
user_id=611130
Originator: NO

Can you try to revert (patch -R) the virtio async feature? Someone else in the irc channel that was having fs corruption had luck doing that.
http://people.redhat.com/~mtosatti/virtioblk-async.patch

Otherwise, just stick with the kvm-69 userspace until it's fixed.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-25 02:26

Message:
Logged In: YES
user_id=2122332
Originator: YES

Windows 2003 Guests also have random BSOD

The problems in this bug report put a stop to running Windows 2003 Server on kvm at this point in time.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 03:33

Message:
Logged In: YES
user_id=2122332
Originator: YES

The message :
apic write: bad size=1 fee00030
does not occur when using the option : -no-kvm-irqchip

Will continue testing.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 03:15

Message:
Logged In: YES
user_id=2122332
Originator: YES

The message :
apic write: bad size=1 fee00030
only occurs when the guest is started using kvm. i.e does not occur with the -no-kvm option.

When using the -no-acpi option, the guest does not start kvm or no kvm

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 02:32

Message:
Logged In: YES
user_id=2122332
Originator: YES

Noted that I get the following in the linux console :
apic write: bad size=1 fee00030

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM: Introduce a callback routine for IOAPIC ack handling
On Wednesday 25 June 2008 12:49:39 Amit Shah wrote:
> This will be useful for acking irqs of assigned devices
>
> Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
> ---
>  virt/kvm/ioapic.c |    3 +++
>  virt/kvm/ioapic.h |    1 +
>  2 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index 9d02136..4759d77 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
>  	ent->fields.remote_irr = 0;
>  	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
>  		ioapic_deliver(ioapic, gsi);
> +
> +	if (ioapic->callback)
> +		ioapic->callback(ioapic->kvm, gsi);

I didn't really intend to name this function 'callback', but I'm not sure what to call it instead: ack_notifier, maybe?

Also, guests using the PIC aren't supported yet (and never have been). I do have a patch ready for that, though; it needs testing.
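For readers following the naming question, here is a minimal user-space sketch of the pattern in the hunk above: an optional function pointer that is invoked once the EOI has been processed. It is not the kernel code. `struct ioapic_sketch`, `ack_notifier_fn`, `ioapic_sketch_eoi` and `record_ack` are hypothetical names, and the redelivery step from the real function is left out.

```c
#include <stddef.h>

struct ioapic_sketch;

/* Hypothetical notifier type; the posted patch passes ioapic->kvm instead. */
typedef void (*ack_notifier_fn)(struct ioapic_sketch *ioapic, int gsi);

/* Simplified stand-in for struct kvm_ioapic (assumption, not the real layout). */
struct ioapic_sketch {
    unsigned int remote_irr;      /* one bit per GSI: interrupt awaiting EOI */
    ack_notifier_fn ack_notifier; /* optional; NULL when nobody is listening */
    int last_acked_gsi;           /* written by the example notifier below */
};

/* Called when the guest signals end-of-interrupt for `gsi`. */
static void ioapic_sketch_eoi(struct ioapic_sketch *ioapic, int gsi)
{
    ioapic->remote_irr &= ~(1u << gsi);

    /* Only call out if a notifier was registered, as in the patch. */
    if (ioapic->ack_notifier)
        ioapic->ack_notifier(ioapic, gsi);
}

/* Example notifier: an assigned-device backend might re-enable the host IRQ here. */
static void record_ack(struct ioapic_sketch *ioapic, int gsi)
{
    ioapic->last_acked_gsi = gsi;
}
```

Leaving the pointer NULL by default keeps the EOI path unchanged for guests without assigned devices.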
Re: PCI PT: irq issue
On Wednesday 25 June 2008 15:03:11 Han, Weidong wrote:
> Amit Shah wrote:
> > On Monday 23 June 2008 20:46:18 Han, Weidong wrote:
> >> Amit Shah wrote:
> >>> On Saturday 21 June 2008 09:41:18 Han, Weidong wrote:
> >>>> Amit Shah wrote:
> >>>>> A couple of notes for the VT-d patch:
> >>>>> - The pci_dev struct is now available in the pci_pt kernel
> >>>>>   structure, so just use that information each time you want to add
> >>>>>   a device instead of searching for it each time.
> >>>>> - The kernel with KVM VT-d patches doesn't build on the
> >>>>>   kvm-userspace.git tree. Please fix that.
> >>>>
> >>>> I pulled the latest VT-d branch, and it works fine for me.
> >>>
> >>> I mean the 'vtd' branch in the kernel tree with 'master' branch of
> >>> the userspace tree aren't compatible.
>
> Amit, why do you want them to be compatible? 'vtd' branch of the kernel
> tree and 'vtd' branch of the userspace tree is the combination. right?
> 'master' branch of the userspace may don't have some patches which in
> 'vtd' branch of the userspace.

We need this for compatibility: old kernel with new userspace and new kernel with old userspace should all work fine.

Amit.
Re: qemu: remove overzealous virtio-net printf
Marcelo Tosatti wrote:
When two virtio devices share an interrupt, virtio-net floods the console with the "this should not happen" message. As Anthony points out, this is not a fatal condition: it's possible that the guest consumed all ring elements between the can_receive check and the actual net_receive call.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][REPOST] KVM: VMX: Fix a wrong usage of vmcs_config
Yang, Sheng wrote:
From a1c929709718c015686b0c23046cc08b8bc47a62 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 14:43:38 +0800
Subject: [PATCH] KVM: VMX: Fix a wrong usage of vmcs_config

The function ept_update_paging_mode_cr0() writes to CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's wrong because the variable may not be consistent with the content in the CPU_BASED_VM_EXEC_CONTROL MSR.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb
Yang, Sheng wrote:
From 54dc26e44f1c0aa460bef409b799f36dae56a911 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 11:23:13 +0800
Subject: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb

Fix a potential issue caused by kvm_mmu_slot_remove_write_access(). The old behavior doesn't sync the EPT TLB with the modified EPT entry, which results in inconsistent content of the EPT TLB and the EPT table.

@@ -1407,6 +1408,8 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	vpid_sync_vcpu_all(to_vmx(vcpu));
+	if (vm_need_ept())
+		ept_sync_context(to_vmx(vcpu));
 }

So we're flushing both the vpid tlb and the ept context? What does an ept context flush mean exactly? tlb entries for gpa->hpa?

--
error compiling committee.c: too many arguments to function
Re: [PATCH] configure: remove configure warning against not using gcc3
Carlo Marcelo Arenas Belon wrote:
complement ff5396cfeacf74ad9611a35e882ff100b10126aci, removing the warning printed by ./configure --help which recommended at configure time against using gcc4 as it wasn't supported by dyngen.

@@ -28,8 +28,6 @@ usage() {
     Any additional option is given to qemu's configure verbatim; including:

-    --disable-gcc-check    don't insist on gcc-3.x
-                           CAUTION: this will break running without kvm

You've orphaned the "including:" above!

--
error compiling committee.c: too many arguments to function
Re: [PATCH] irq assignment
(copying qemu-devel)

Xu, Anthony wrote:
Subject: [PATCH] Irq assignment

1. use bimodal _PRT
2. pci device can use irq > 15, reduce interrupt sharing
3. test by running linux guest in kvm-ia64, kvm-i32(w/ wo/ -no-kvm)

+
+static int ioapic_irq_count[IOAPIC_NUM_PINS];
+
 void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
-
+    if( vector >= 16 ){
+        if( level )
+            ioapic_irq_count[vector] += 1;
+        else
+            ioapic_irq_count[vector] -= 1;
+        level = (ioapic_irq_count[vector] != 0);
+    }
+#ifdef KVM_CAP_IRQCHIP
+    if (kvm_enabled())
+        if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
+            return;
+#endif

It's legal to call ioapic_set_irq(vector, 1) twice, which will screw up the level calculation. We need qemu_irq_or(), similar to qemu_irq_invert():

    qemu_irq qemu_irq_or(qemu_irq irqs[], int nr);

Also, this is not the place for doing the or. The ioapic does not know which interrupts are level connected and which are not. This belongs on the pci level (or the mainboard level).

--
error compiling committee.c: too many arguments to function
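To make the objection to the counter concrete, here is a small stand-alone sketch comparing the two approaches. It is not QEMU code and does not implement the proposed qemu_irq_or() itself; `counted_line`, `or_line`, `counted_set`, `or_set` and `MAX_SOURCES` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

#define MAX_SOURCES 4   /* hypothetical number of devices sharing one line */

/* Counting approach, as in the patch: +1 on raise, -1 on lower. */
struct counted_line {
    int count;
};

/* Returns the resulting line level. Goes wrong if one source raises twice. */
static bool counted_set(struct counted_line *line, bool level)
{
    line->count += level ? 1 : -1;
    return line->count != 0;
}

/* OR approach: remember the level of each source separately. */
struct or_line {
    bool src[MAX_SOURCES];
};

/* Returns the resulting line level. Repeated sets from one source are idempotent. */
static bool or_set(struct or_line *line, int source, bool level)
{
    bool out = false;

    line->src[source] = level;
    for (int i = 0; i < MAX_SOURCES; i++)
        out = out || line->src[i];
    return out;
}
```

With the counter, raising twice and lowering once leaves the line asserted. With per-source state the same sequence deasserts it, which is the behaviour a shared level-triggered line needs.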
Re: [PATCH] make qemu compile for kvm/ia64
Xu, Anthony wrote:
Thanks for your comments. Here is the revised one.

From 1a30adfc5ded3608ac2f09499b42234cf7d54a19 Mon Sep 17 00:00:00 2001
From: Anthony Xu <[EMAIL PROTECTED]>
Date: Fri, 20 Jun 2008 10:45:13 -0400
Subject: [PATCH] Make qemu compile for kvm-ia64

Since merging with Qemu upstream, it can't be compiled for kvm-ia64

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Avoid fragment virtio-blk transfers by copying
Anthony Liguori wrote:
A major source of performance loss for virtio-blk has been the fact that we split transfers into multiple requests. This is particularly harmful if you have striped storage beneath your virtual machine.

This patch copies the request data into a single contiguous buffer to ensure that we don't split requests. This improves performance from about 80 MB/sec to about 155 MB/sec with my fibre channel link. 185 MB/sec is what we get on native so this gets us pretty darn close.

If the guest issues a request for a terabyte of memory, the host will try to allocate it and drop to swap/oom. So we need to either fragment beyond some size, or to avoid copying and thus the need for allocation.

--
error compiling committee.c: too many arguments to function
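One way to read "fragment beyond some size" is to cap the bounce buffer and split only requests larger than the cap. The sketch below shows just the chunking arithmetic. It is not taken from the patch, and `MAX_BOUNCE_BYTES` (1 MiB here), `next_chunk_len` and `chunk_count` are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical cap on the contiguous copy buffer; the real limit is a tuning choice. */
#define MAX_BOUNCE_BYTES ((size_t)1 << 20)

/* Number of bytes to copy and submit in the next host request. */
static size_t next_chunk_len(size_t remaining)
{
    return remaining < MAX_BOUNCE_BYTES ? remaining : MAX_BOUNCE_BYTES;
}

/* Number of host requests needed for a guest request of `total` bytes. */
static size_t chunk_count(size_t total)
{
    size_t n = 0;

    while (total > 0) {
        total -= next_chunk_len(total);
        n++;
    }
    return n;
}
```

Requests at or below the cap still go out as a single transfer, so the common case keeps the benefit described in the patch while host allocation stays bounded.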
Re: [PATCH 2/2] x86: Add "virt flag" in /proc/cpuinfo
(copying Ingo) Yang, Sheng wrote: From 54b1bb9fe5d2fe40fc047b43dd4e1a480d41a977 Mon Sep 17 00:00:00 2001 From: Sheng Yang <[EMAIL PROTECTED]> Date: Tue, 24 Jun 2008 17:03:17 +0800 Subject: [PATCH] x86: Add "virt flag" in /proc/cpuinfo The hardware virtualization technology evolves very fast. But currently it's hard to tell if your CPU support a certain kind of HW technology without dig into the source code. The patch add a new item under /proc/cpuinfo, named "virt flag". The "virt flag" got the similar function as "flag". It is used to indicate what features does this CPU supported. It don't cover all features but only the important ones. Ingo, do you prefer this as a separate 'virt flags' line or as addition to the 'flag' line? Current implement just cover Intel VMX side. Signed-off-by: Sheng Yang <[EMAIL PROTECTED]> --- arch/x86/kernel/cpu/proc.c | 28 include/asm-x86/cpufeature.h |9 + 2 files changed, 37 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 0d0d905..03b30d0 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -77,6 +77,31 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c) } #endif +static void show_cpuinfo_vmx_virtflag(struct seq_file *m) +{ + u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2; + + seq_printf(m, "\nvirt flag\t:"); + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); + msr_ctl = 0x & vmx_msr_high | vmx_msr_low; + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW) + seq_printf(m, " tpr_shadow"); + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI) + seq_printf(m, " vnmi"); + if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) { + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, + vmx_msr_low, vmx_msr_high); + msr_ctl2 = 0x & vmx_msr_high | vmx_msr_low; + if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) && + (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)) + seq_printf(m, " flexpriority"); + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) 
+ seq_printf(m, " ept"); + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID) + seq_printf(m, " vpid"); + } +} + static int show_cpuinfo(struct seq_file *m, void *v) { struct cpuinfo_x86 *c = v; @@ -123,6 +148,9 @@ static int show_cpuinfo(struct seq_file *m, void *v) if (cpu_has(c, i) && x86_cap_flags[i] != NULL) seq_printf(m, " %s", x86_cap_flags[i]); + if (cpu_has(c, X86_FEATURE_VMX)) + show_cpuinfo_vmx_virtflag(m); + seq_printf(m, "\nbogomips\t: %lu.%02lu\n", c->loops_per_jiffy/(50/HZ), (c->loops_per_jiffy/(5000/HZ)) % 100); diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h index 0d609c8..87d8084 100644 --- a/include/asm-x86/cpufeature.h +++ b/include/asm-x86/cpufeature.h @@ -84,6 +84,7 @@ #define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */ #define X86_FEATURE_MWAIT (4*32+ 3) /* Monitor/Mwait support */ #define X86_FEATURE_DSCPL (4*32+ 4) /* CPL Qualified Debug Store */ +#define X86_FEATURE_VMX(4*32+ 5) /* Virtual Machine eXtensions */ #define X86_FEATURE_EST(4*32+ 7) /* Enhanced SpeedStep */ #define X86_FEATURE_TM2(4*32+ 8) /* Thermal Monitor 2 */ #define X86_FEATURE_CID(4*32+10) /* Context ID */ @@ -113,6 +114,14 @@ */ #define X86_FEATURE_IDA(7*32+ 0) /* Intel Dynamic Acceleration */ +/* Intel VMX MSR indicated features */ +#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x0020 +#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x0040 +#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x8000 +#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x0001 +#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x0002 +#define X86_VMX_FEATURE_PROC_CTLS2_VPID0x0020 + #if defined(__KERNEL__) && !defined(__ASSEMBLY__) #include -- 1.5.5 -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: VMX: Some defined name fix
Yang, Sheng wrote:
From 0dae764c94f48bd05f796947df1c85028ade59fa Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Tue, 24 Jun 2008 17:02:38 +0800
Subject: [PATCH] KVM: VMX: Some defined name fix

MSR_IA32_FEATURE_LOCKED is in fact just a bit, so it shouldn't be prefixed with MSR_. The same goes for MSR_IA32_FEATURE_VMXON_ENABLED.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
David S. Ahern wrote:
RHEL3 is in Maintenance mode (for an explanation see http://www.redhat.com/security/updates/errata/) which means performance enhancement patches will not make it in.

Scratch that idea, then.

Also, I'm going to be out of the office for a couple of weeks in July, so I will need to put this aside until mid-August or so. I'll reevaluate options then.

One thing I'm looking at is implementing out-of-sync like Xen, which looks like it will obsolete the entire emulate vs flood thing at the cost of making unshadowing a little more expensive and consuming more memory. See http://thread.gmane.org/gmane.comp.emulators.xen.devel/52557 (and 58, 59, 60).

--
error compiling committee.c: too many arguments to function
RE: PCI PT: irq issue
Amit Shah wrote:
> On Monday 23 June 2008 20:46:18 Han, Weidong wrote:
>> Amit Shah wrote:
>>> On Saturday 21 June 2008 09:41:18 Han, Weidong wrote:
>>>> Amit Shah wrote:
>>>>> A couple of notes for the VT-d patch:
>>>>> - The pci_dev struct is now available in the pci_pt kernel
>>>>>   structure, so just use that information each time you want to add
>>>>>   a device instead of searching for it each time.
>>>>> - The kernel with KVM VT-d patches doesn't build on the
>>>>>   kvm-userspace.git tree. Please fix that.
>>>>
>>>> I pulled the latest VT-d branch, and it works fine for me.
>>>
>>> I mean the 'vtd' branch in the kernel tree with 'master' branch of
>>> the userspace tree aren't compatible.

Amit, why do you want them to be compatible? The 'vtd' branch of the kernel tree and the 'vtd' branch of the userspace tree are the intended combination, right? The 'master' branch of the userspace tree may not have some of the patches that are in its 'vtd' branch.

Randy (Weidong)

>> I tried it. It can build. Can you show me the errors?
>
> If you compile kvm as external modules, the userspace tree doesn't
> contain the vtd.o file while linking, but kvm.ko refers to objects
> within it:
>
>   Building modules, stage 2.
>   MODPOST 3 modules
> WARNING: "kvm_intel_iommu_found"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
> WARNING: "kvm_iommu_unmap_guest"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
> WARNING: "kvm_iommu_map_guest"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
> WARNING: "kvm_iommu_map_pages"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
Re: [PATCH] Use qemu_memalign instead of qemu_malloc
Anthony Liguori wrote:
> I guess the main block code is not as defensive as I thought it was. This patch
> uses qemu_memalign to allocate the buffers for IO so that you don't get errors
> when using O_DIRECT.

Actually, the block code should be able to deal with unaligned buffers since qemu rev. 4599. This change seems to be present in current KVM. Can you tell exactly which operation failed?

But apart from that, qemu_memalign is the right thing to do, because copying from/into an aligned buffer in the block code costs performance (don't know how much, though).

Kevin
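For context, a minimal sketch of what an aligned I/O buffer allocation looks like outside QEMU, using the POSIX call posix_memalign(). It is not the implementation of qemu_memalign. `IO_ALIGNMENT` (512 bytes), `alloc_io_buffer` and `is_io_aligned` are assumptions for illustration; the alignment O_DIRECT actually requires depends on the device and filesystem.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* make posix_memalign() visible */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment; check the target device before relying on it. */
#define IO_ALIGNMENT 512

/* Allocate a buffer suitable for aligned I/O; returns NULL on failure. */
static void *alloc_io_buffer(size_t size)
{
    void *buf = NULL;

    if (posix_memalign(&buf, IO_ALIGNMENT, size) != 0)
        return NULL;
    return buf;     /* release with free() */
}

/* True if `p` already satisfies the alignment, so no bounce copy is needed. */
static int is_io_aligned(const void *p)
{
    return ((uintptr_t)p % IO_ALIGNMENT) == 0;
}
```

A check like `is_io_aligned()` is the kind of test that lets a block layer skip the extra copy Kevin mentions when the caller's buffer is already aligned.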
[PATCH][REPOST] KVM: VMX: Add ept_sync_context in flush_tlb
From 54dc26e44f1c0aa460bef409b799f36dae56a911 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 11:23:13 +0800
Subject: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb

Fix a potential issue caused by kvm_mmu_slot_remove_write_access(). The old behavior doesn't sync the EPT TLB with the modified EPT entry, which results in inconsistent content of the EPT TLB and the EPT table.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4278d..5e2a800 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -83,6 +83,7 @@ struct vcpu_vmx {
 		} irq;
 	} rmode;
 	int vpid;
+	u64 eptp;
 };

 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -364,24 +365,24 @@ static inline void ept_sync_global(void)
 		__invept(VMX_EPT_EXTENT_GLOBAL, 0, 0);
 }

-static inline void ept_sync_context(u64 eptp)
+static inline void ept_sync_context(struct vcpu_vmx *vmx)
 {
 	if (vm_need_ept()) {
 		if (cpu_has_vmx_invept_context())
-			__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
+			__invept(VMX_EPT_EXTENT_CONTEXT, vmx->eptp, 0);
 		else
 			ept_sync_global();
 	}
 }

-static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa)
+static inline void ept_sync_individual_addr(struct vcpu_vmx *vmx, gpa_t gpa)
 {
 	if (vm_need_ept()) {
 		if (cpu_has_vmx_invept_individual_addr())
 			__invept(VMX_EPT_EXTENT_INDIVIDUAL_ADDR,
-					eptp, gpa);
+					vmx->eptp, gpa);
 		else
-			ept_sync_context(eptp);
+			ept_sync_context(vmx);
 	}
 }

@@ -1407,6 +1408,8 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	vpid_sync_vcpu_all(to_vmx(vcpu));
+	if (vm_need_ept())
+		ept_sync_context(to_vmx(vcpu));
 }

 static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
@@ -1517,12 +1520,15 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	unsigned long guest_cr3;
 	u64 eptp;
+	struct vcpu_vmx *vmx;

+	vmx = to_vmx(vcpu);
 	guest_cr3 = cr3;
 	if (vm_need_ept()) {
 		eptp = construct_eptp(cr3);
 		vmcs_write64(EPT_POINTER, eptp);
-		ept_sync_context(eptp);
+		vmx->eptp = eptp;
+		ept_sync_context(vmx);
 		ept_load_pdptrs(vcpu);
 		guest_cr3 = is_paging(vcpu) ? vcpu->arch.cr3 :
 			VMX_EPT_IDENTITY_PAGETABLE_ADDR;
--
1.5.5
[PATCH][REPOST] KVM: VMX: Fix a wrong usage of vmcs_config
From a1c929709718c015686b0c23046cc08b8bc47a62 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 14:43:38 +0800
Subject: [PATCH] KVM: VMX: Fix a wrong usage of vmcs_config

The function ept_update_paging_mode_cr0() writes to CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's wrong because the variable may not be consistent with the content in the CPU_BASED_VM_EXEC_CONTROL MSR.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4278d..6a31406 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1438,7 +1438,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
-			     vmcs_config.cpu_based_exec_ctrl |
+			     vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) |
 			     (CPU_BASED_CR3_LOAD_EXITING |
 			      CPU_BASED_CR3_STORE_EXITING));
 		vcpu->arch.cr0 = cr0;
@@ -1448,7 +1448,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	} else if (!is_paging(vcpu)) {
 		/* From nonpaging to paging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
-			     vmcs_config.cpu_based_exec_ctrl &
+			     vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
 			     ~(CPU_BASED_CR3_LOAD_EXITING |
 			       CPU_BASED_CR3_STORE_EXITING));
 		vcpu->arch.cr0 = cr0;
--
1.5.5
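The difference between the two variants can be shown with a small user-space model. This is only a sketch of the read-modify-write idea, not KVM code: `struct vcpu_sketch`, `RUNTIME_BIT` and both helper functions are hypothetical, and the two CR3 bit positions are given for illustration.

```c
#include <stdint.h>

/* Illustrative bit positions for the two CR3 exiting controls. */
#define CR3_LOAD_EXITING   (1u << 15)
#define CR3_STORE_EXITING  (1u << 16)
/* Hypothetical control bit that was changed for this vcpu after setup. */
#define RUNTIME_BIT        (1u << 21)

/* Stand-in for the live per-vcpu control field (assumption for this sketch). */
struct vcpu_sketch {
    uint32_t live_exec_ctrl;
};

/* Old behaviour: rebuild the field from the setup-time template. */
static void enter_nonpaging_from_config(struct vcpu_sketch *v, uint32_t config_template)
{
    v->live_exec_ctrl = config_template | CR3_LOAD_EXITING | CR3_STORE_EXITING;
}

/* Behaviour after the fix: read the live value, change only the CR3 bits. */
static void enter_nonpaging_from_live(struct vcpu_sketch *v)
{
    v->live_exec_ctrl |= CR3_LOAD_EXITING | CR3_STORE_EXITING;
}
```

If the live field has diverged from the template since setup, the first helper silently drops `RUNTIME_BIT` while the second preserves it.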
[PATCH 2/2][REPOST] kvm: user: Add event mask support in kvmtrace.
From 0456679ed9444d153c60e41dfc556b7ca4d6277f Mon Sep 17 00:00:00 2001 From: Sheng Yang <[EMAIL PROTECTED]> Date: Wed, 25 Jun 2008 16:40:56 +0800 Subject: [PATCH] kvm: user: Add event mask support in kvmtrace. Signed-off-by: Feng (Eric) Liu <[EMAIL PROTECTED]> --- user/kvmtrace.c | 56 -- 1 files changed, 53 insertions(+), 3 deletions(-) diff --git a/user/kvmtrace.c b/user/kvmtrace.c index 876ac27..a7b2071 100644 --- a/user/kvmtrace.c +++ b/user/kvmtrace.c @@ -54,7 +54,7 @@ static char kvmtrace_version[] = "0.1"; #define max(a, b) ((a) > (b) ? (a) : (b)) -#define S_OPTS "r:o:w:?Vb:n:D:" +#define S_OPTS "r:o:w:?Vb:n:D:e:" static struct option l_opts[] = { { .name = "relay", @@ -99,6 +99,12 @@ static struct option l_opts[] = { .val = 'D' }, { + .name = "event_mask", + .has_arg = required_argument, + .flag = NULL, + .val = 'e' + }, + { .name = NULL, } }; @@ -154,6 +160,8 @@ static char *output_dir; static int stop_watch; static unsigned long buf_size = BUF_SIZE; static unsigned long buf_nr = BUF_NR; +static int cat_mask = ~0u; +static unsigned long long act_bitmap[16]; static unsigned int page_size; #define for_each_cpu_online(cpu) \ @@ -175,6 +183,13 @@ static void handle_sigint(__attribute__((__unused__)) int sig) done = 1; } +static inline int valid_mask_opt(int x) +{ + return ((1 << KVM_TRC_SHIFT) <= x) && + (x < (1 << (KVM_TRC_CAT_NR_BITS + KVM_TRC_SHIFT))) && + (0 <= KVM_TRC_ACT(x)) && (KVM_TRC_ACT(x) < 64); +} + static int get_lost_records() { int fd; @@ -473,6 +488,8 @@ static int start_trace(void) memset(&kuts, 0, sizeof(kuts)); kuts.buf_size = trace_information.buf_size = buf_size; kuts.buf_nr = trace_information.buf_nr = buf_nr; + kuts.cat_mask = cat_mask; + memcpy(kuts.act_bitmap, act_bitmap, sizeof(act_bitmap)); if (ioctl(trace_information.fd , KVM_TRACE_ENABLE, &kuts) < 0) { perror("KVM_TRACE_ENABLE"); @@ -587,13 +604,21 @@ static void show_stats(void) static char usage_str[] = \ "[ -r debugfs path ] [ -D output dir ] [ -b buffer size ]\n" \ - "[ -n 
number of buffers] [ -o ] [ -w time ] [ -V ]\n\n" \ + "[ -n number of buffers] [ -o ] [ -e event mask ]" \ + "[ -w time ] [ -V ]\n\n" \ "\t-r Path to mounted debugfs, defaults to /sys/kernel/debug\n" \ "\t-o File(s) to send output to\n" \ "\t-D Directory to prepend to output file names\n" \ "\t-w Stop after defined time, in seconds\n" \ "\t-b Sub buffer size in KiB\n" \ "\t-n Number of sub buffers\n" \ + "\t-e Only trace specified categories or actions.\n" \ + "\t kvmtrace defaults to collecting all events can be traced.\n" \ + "\t To limit the events being captured, you can specify filter.\n" \ + "\t if you want to trace all the actions of one category," \ + " set action to zero. \n" \ + "\t eg: -e 0x0001 -e 00020001 trace entryexit and PAGE_FAULT \n" \ + "\t -e 0x00020005 -e 00020006 trace IO_READ and IO_WRITE. \n" \ "\t-V Print program version info\n\n"; static void show_usage(char *prog) @@ -604,7 +629,7 @@ static void show_usage(char *prog) void parse_args(int argc, char **argv) { - int c; + int c, cat_mask_tmp = 0; while ((c = getopt_long(argc, argv, S_OPTS, l_opts, NULL)) >= 0) { switch (c) { @@ -647,6 +672,26 @@ void parse_args(int argc, char **argv) case 'D': output_dir = optarg; break; + case 'e': { + int index, mask; + + if ((sscanf(optarg, "%x", &mask) != 1) || + !valid_mask_opt(mask)) { + fprintf(stderr, + "Invalid event mask (%u)\n", mask); + exit(EXIT_FAILURE); + } + cat_mask_tmp |= KVM_TRC_CAT(mask); + index = ffs(KVM_TRC_CAT(mask)) - 1; + if (KVM_TRC_ACT(mask) == 0) + act_bitmap[index] = ~0ull; + else { + if (act_bitmap[index] == ~0ull) + act_bitmap[index] = 0; + act_bitmap[index] |= 1 << KVM_TRC_ACT(mask); + } + break; + } default: show_usage(argv[0]); } @@ -654,12 +699,17 @@ void parse_args(int argc, char **argv) if (optind < argc || output_name == NULL) show_usage(argv[0]); + +
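As a reading aid for the -e handling above, here is a stand-alone sketch of how one event mask splits into a category and an action and is folded into the filter. The four macros are copied from the companion kernel patch; `struct trace_filter`, `valid_mask` and `add_event_mask` are hypothetical names. The sketch shifts with `1ull` where the posted code uses `1 <<`, which is a deliberate deviation, since action numbers up to 63 are accepted.

```c
#include <stdint.h>

/* Macros as posted in the companion kernel patch (include/linux/kvm.h). */
#define KVM_TRC_SHIFT       16
#define KVM_TRC_CAT_NR_BITS 12
#define KVM_TRC_CAT(evt)    (((evt) >> KVM_TRC_SHIFT) & 0x0fff)
#define KVM_TRC_ACT(evt)    ((evt) & (~0u >> KVM_TRC_SHIFT))

/* Hypothetical container for the two filter fields. */
struct trace_filter {
    uint16_t cat_mask;
    uint64_t act_bitmap[16];
};

/* A mask needs a category bit in range and an action number below 64. */
static int valid_mask(uint32_t x)
{
    return x >= (1u << KVM_TRC_SHIFT) &&
           x < (1u << (KVM_TRC_CAT_NR_BITS + KVM_TRC_SHIFT)) &&
           KVM_TRC_ACT(x) < 64;
}

/* Fold one -e argument into the filter; returns 0 on success, -1 if invalid. */
static int add_event_mask(struct trace_filter *f, uint32_t mask)
{
    unsigned int cat, act, index = 0;

    if (!valid_mask(mask))
        return -1;

    cat = KVM_TRC_CAT(mask);
    act = KVM_TRC_ACT(mask);
    while (!(cat & (1u << index)))      /* lowest set category bit */
        index++;

    f->cat_mask |= (uint16_t)cat;
    if (act == 0) {
        f->act_bitmap[index] = ~0ull;   /* action 0 means the whole category */
    } else {
        if (f->act_bitmap[index] == ~0ull)
            f->act_bitmap[index] = 0;
        f->act_bitmap[index] |= 1ull << act;
    }
    return 0;
}
```

With the values from the usage text, 0x00020005 and 0x00020006 both decode to category bit 1, and set action bits 5 and 6 in `act_bitmap[1]`.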
[PATCH 1/2][REPOST] KVM: trace: Add event mask support.
From 5eb5fb6b40c45b1c140f46b00f5de9f73fc0bab6 Mon Sep 17 00:00:00 2001 From: Sheng Yang <[EMAIL PROTECTED]> Date: Wed, 25 Jun 2008 16:37:31 +0800 Subject: [PATCH] KVM: trace: Add event mask support. Allow user space application to specify one or more filter masks to limit the events being captured via it. Signed-off-by: Feng (Eric) Liu <[EMAIL PROTECTED]> Signed-off-by: Sheng Yang <[EMAIL PROTECTED]> --- include/linux/kvm.h | 7 +++ virt/kvm/kvm_trace.c | 24 2 files changed, 31 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 0ea064c..f1d3d9e 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -18,6 +18,9 @@ struct kvm_user_trace_setup { __u32 buf_size; /* sub_buffer size of each per-cpu */ __u32 buf_nr; /* the number of sub_buffers of each per-cpu */ + __u16 cat_mask; /* the tracing categories are enabled */ + __u16 pad1[3]; + __u64 act_bitmap[16]; /* the actions are enabled for each category */ }; /* for KVM_CREATE_MEMORY_REGION */ @@ -292,6 +295,7 @@ struct kvm_s390_interrupt { }; #define KVM_TRC_SHIFT 16 +#define KVM_TRC_CAT_NR_BITS 12 /* * kvm trace categories */ @@ -305,6 +309,9 @@ struct kvm_s390_interrupt { #define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) #define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) +#define KVM_TRC_CAT(evt) (((evt) >> KVM_TRC_SHIFT) & 0x0fff) +#define KVM_TRC_ACT(evt) ((evt) & (~0u >> KVM_TRC_SHIFT)) + #define KVM_TRC_HEAD_SIZE 12 #define KVM_TRC_CYCLE_SIZE 8 #define KVM_TRC_EXTRA_MAX 7 diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c index 58141f3..179e11f 100644 --- a/virt/kvm/kvm_trace.c +++ b/virt/kvm/kvm_trace.c @@ -26,6 +26,8 @@ struct kvm_trace { int trace_state; + u16 cat_mask; + u64 act_bitmap[16]; struct rchan *rchan; struct dentry *lost_file; atomic_t lost_records; @@ -39,6 +41,23 @@ struct kvm_trace_probe { marker_probe_func *probe_func; }; +static inline int check_event_mask(struct kvm_trace *kt, u32 event) +{ + unsigned long category; + int i; + + category = KVM_TRC_CAT(event); + if (!(category & kt->cat_mask)) + return 1; + + i = find_first_bit(&category, KVM_TRC_CAT_NR_BITS); + + if (!test_bit(KVM_TRC_ACT(event), &kt->act_bitmap[i])) + return 1; + + return 0; +} + static inline int calc_rec_size(int cycle, int extra) { int rec_size = KVM_TRC_HEAD_SIZE; @@ -60,6 +79,9 @@ static void kvm_add_trace(void *probe_private, void *call_data, return; rec.event = va_arg(*args, u32); + if (check_event_mask(kt, rec.event)) + return; + vcpu= va_arg(*args, struct kvm_vcpu *); rec.pid = current->tgid; rec.vcpu_id = vcpu->vcpu_id; @@ -175,6 +197,8 @@ static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts) if (!kt->rchan) goto err; + kt->cat_mask = kuts->cat_mask; + memcpy(kt->act_bitmap, kuts->act_bitmap, sizeof(kuts->act_bitmap)); kvm_trace = kt; for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) { -- 1.5.5
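The two-level filter in check_event_mask() (category mask first, then a per-category action bitmap) can be tried out in user space. The following is only a minimal sketch under stated assumptions: `struct trace_filter` and `event_filtered()` are hypothetical names, plain shifts stand in for the kernel's find_first_bit()/test_bit(), and it assumes at most 64 actions per category (one 64-bit word each), which the patch implies but does not state.

```c
#include <assert.h>
#include <stdint.h>

#define TRC_SHIFT        16   /* KVM_TRC_SHIFT in the patch */
#define TRC_CAT_NR_BITS  12   /* KVM_TRC_CAT_NR_BITS in the patch */

/* Same decomposition as KVM_TRC_CAT() / KVM_TRC_ACT() in the patch. */
#define TRC_CAT(evt) (((evt) >> TRC_SHIFT) & 0x0fffu)
#define TRC_ACT(evt) ((evt) & (~0u >> TRC_SHIFT))

/* Hypothetical user-space stand-in for the mask fields of struct kvm_trace. */
struct trace_filter {
    uint16_t cat_mask;        /* one bit per enabled category */
    uint64_t act_bitmap[16];  /* per category: one bit per enabled action */
};

/* Returns 1 if the event should be dropped, 0 if it should be recorded. */
static int event_filtered(const struct trace_filter *f, uint32_t event)
{
    uint32_t category = TRC_CAT(event);
    uint32_t action = TRC_ACT(event);
    int i;

    if (!(category & f->cat_mask))
        return 1;

    /* Index of the lowest set category bit, like find_first_bit(). */
    for (i = 0; i < TRC_CAT_NR_BITS; i++)
        if (category & (1u << i))
            break;
    assert(i < TRC_CAT_NR_BITS);  /* category is non-zero at this point */

    /* Assumption: at most 64 actions per category (one 64-bit word). */
    if (action >= 64)
        return 1;

    return !((f->act_bitmap[i] >> action) & 1u);
}
```

With `cat_mask = 0x1` and bit 2 set in `act_bitmap[0]`, an event encoded as `(1 << 16) + 0x02` passes the filter, while other actions in that category and all events from other categories are dropped.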
Re: [PATCH] Ignore DEBUGCTL MSRs
Hi Joerg,

On Jun 24, 2008, at 3:40 PM, Joerg Roedel wrote:

> Hi Alex,
>
> On Tue, Jun 24, 2008 at 07:04:45AM +0200, Alexander Graf wrote:
>> Netware writes and reads to the DEBUGCTL and LAST*IP MSRs without further checks and is really confused to receive a #GP during that. To make it happy we should just make them stubs, which is exactly what SVM already does. To support VMX too, I put these in the generic code. Maybe the SVM code could be cleaned up to use generic code too.
>
> I would prefer if you put that into the VMX specific code. We can't move the SVM parts of it into generic code because Barcelona has hardware support to virtualize these registers. Therefore SVM don't need that in generic code.

Hum, I'd actually prefer not to handle this specifically in the VMX code. MSRs should be handled in a generic way if they do not need special treatment from the extension side, as in your case. For example if a third virtualization extension might come to life, I would rather see more things handled in x86.c than in all three targets.

Any objections to this? If not, please apply, as it makes Netware work in KVM.
Alex Signed-off-by: Alexander Graf <[EMAIL PROTECTED]> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fc0721e..02f8490 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -609,6 +609,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) pr_unimpl(vcpu, "%s: MSR_IA32_MCG_CTL 0x%llx, nop\n", __func__, data); break; + case MSR_IA32_DEBUGCTLMSR: + case MSR_IA32_LASTBRANCHFROMIP: + case MSR_IA32_LASTBRANCHTOIP: + case MSR_IA32_LASTINTFROMIP: + case MSR_IA32_LASTINTTOIP: case MSR_IA32_UCODE_REV: case MSR_IA32_UCODE_WRITE: break; @@ -705,6 +710,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_IA32_MC0_MISC+16: case MSR_IA32_UCODE_REV: case MSR_IA32_EBL_CR_POWERON: + case MSR_IA32_DEBUGCTLMSR: + case MSR_IA32_LASTBRANCHFROMIP: + case MSR_IA32_LASTBRANCHTOIP: + case MSR_IA32_LASTINTFROMIP: + case MSR_IA32_LASTINTTOIP: data = 0; break; case MSR_MTRRcap:
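The stub behaviour described in this thread (writes to the debug MSRs are silently accepted, reads return zero, and nothing raises #GP) can be illustrated outside the kernel. This is a rough sketch, not the KVM code: `stub_set_msr()` and `stub_get_msr()` are hypothetical helpers, and the convention that a return value of 1 means "unhandled, caller injects #GP" is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSR indices as listed in the kernel's MSR index header. */
#define MSR_IA32_DEBUGCTLMSR       0x000001d9
#define MSR_IA32_LASTBRANCHFROMIP  0x000001db
#define MSR_IA32_LASTBRANCHTOIP    0x000001dc
#define MSR_IA32_LASTINTFROMIP     0x000001dd
#define MSR_IA32_LASTINTTOIP       0x000001de

/* Hypothetical write handler: 0 = handled, 1 = unhandled (assumed to mean #GP). */
static int stub_set_msr(uint32_t msr, uint64_t data)
{
    (void)data;  /* the written value is deliberately discarded */

    switch (msr) {
    case MSR_IA32_DEBUGCTLMSR:
    case MSR_IA32_LASTBRANCHFROMIP:
    case MSR_IA32_LASTBRANCHTOIP:
    case MSR_IA32_LASTINTFROMIP:
    case MSR_IA32_LASTINTTOIP:
        return 0;  /* accept and ignore the write */
    default:
        return 1;
    }
}

/* Hypothetical read handler: stubbed MSRs always read back as zero. */
static int stub_get_msr(uint32_t msr, uint64_t *pdata)
{
    assert(pdata != NULL);

    switch (msr) {
    case MSR_IA32_DEBUGCTLMSR:
    case MSR_IA32_LASTBRANCHFROMIP:
    case MSR_IA32_LASTBRANCHTOIP:
    case MSR_IA32_LASTINTFROMIP:
    case MSR_IA32_LASTINTTOIP:
        *pdata = 0;
        return 0;
    default:
        return 1;
    }
}
```

A guest that writes DEBUGCTL and then reads a LAST*IP register sees success on the write and zero on the read, which is all Netware appears to need according to the report above.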
[ kvm-Bugs-2001452 ] Restarted Windows 2003 Server guests have disk corruption
Bugs item #2001452, was opened at 2008-06-24 07:27
Message generated for change (Comment added) made by gerdwachs
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: intel
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: gwachs (gerdwachs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Restarted Windows 2003 Server guests have disk corruption

Initial Comment:
I have a number of Windows 2003 32Bit guests. I use them to perform installation and configuration tests of a large software product. During these tests, the guests are restarted. Randomly, the guests produce disk corruption messages after a restart. The following are two examples :
---
Windows Registry Hive Recovered Registry hive (file): SOFTWARE was corrupted and it has been recovered. Some data might have been lost.
---
The system cannot log on due to the following error: Unable to complete the requested operation because of either a catastrophic media failure or a data structure corruption on the disk.
---
OS : Ubuntu 8.04 x86_64
Kernel : 2.6.24-18-server #1 SMP x86_64 GNU/Linux
KVM: kvm-70
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xt
Start Command :
sudo /usr/local/kvm/bin/qemu-system-x86_64 -hda asit51ascs.img \
-m 1024 -std-vga -boot c -k sv -usb -usbdevice tablet -snapshot -vnc :51 \
-net nic,vlan=0,macaddr=00:16:3e:00:51:00 -net tap,vlan=0,script=/etc/qemu-ifup-br0 \
-net nic,vlan=1,macaddr=00:16:3e:00:51:01 -net tap,vlan=1,script=/etc/qemu-ifup-br1
no-kvm : Cannot do due to the loss of performance. Tests execute time is 7 hours with kvm.

--
>Comment By: gwachs (gerdwachs)
Date: 2008-06-25 09:26
Message:
Logged In: YES
user_id=2122332
Originator: YES

Windows 2003 Guests also have random BSOD
The problems in this bug report put a stop to running Windows 2003 Server on kvm at this point in time.

--
Comment By: gwachs (gerdwachs)
Date: 2008-06-24 10:33
Message:
Logged In: YES
user_id=2122332
Originator: YES

The message : apic write: bad size=1 fee00030 does not occur when using the option : -no-kvm-irqchip
Will continue testing.

--
Comment By: gwachs (gerdwachs)
Date: 2008-06-24 10:15
Message:
Logged In: YES
user_id=2122332
Originator: YES

The message : apic write: bad size=1 fee00030 only occurs when the guest is started using kvm. i.e does not occur with the -no-kvm option.
When using the -no-acpi option, the guest does not start kvm or no kvm

--
Comment By: gwachs (gerdwachs)
Date: 2008-06-24 09:32
Message:
Logged In: YES
user_id=2122332
Originator: YES

Noted that I get the following in the linux console : apic write: bad size=1 fee00030
[PATCH 3/3] KVM: Handle device assignment to guests
From: Amit Shah <[EMAIL PROTECTED]>
From: Ben-Ami Yassour <[EMAIL PROTECTED]>
From: Han, Weidong <[EMAIL PROTECTED]>

This patch adds support for handling PCI devices that are assigned to the guest ("PCI passthrough").

The device to be assigned to the guest is registered in the host kernel and interrupt delivery is handled. If a device is already assigned, or the device driver for it is still loaded on the host, the device assignment is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest. There has to be some mechanism of translating guest DMA addresses into machine addresses. This support comes from one of three approaches:

1. If you have recent Intel hardware with VT-d support, you can use the patches in
   git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git vtd
   git.kernel.org/pub/scm/linux/kernel/git/amit/kvm-userspace.git vtd
   These patches are for the host kernel.

2. For paravirtualised Linux guests, you can use the patches in
   git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git pvdma
   git.kernel.org/pub/scm/linux/kernel/git/amit/kvm-userspace.git pvdma
   This kernel tree has patches for host as well as guest kernels.

3. 1-1 mapping of guest in host address space
   The patches to do this are available on the kvm / lkml list archives:
   http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753

Signed-off-by: Amit Shah <[EMAIL PROTECTED]> --- arch/x86/kvm/x86.c | 291 include/asm-x86/kvm_host.h | 38 ++ include/asm-x86/kvm_para.h | 16 +++- include/linux/kvm.h | 3 + virt/kvm/ioapic.c | 12 ++- 5 files changed, 357 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0fbc032..b2f6c78 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4,10 +4,12 @@ * derived from drivers/kvm/kvm_main.c * * Copyright (C) 2006 Qumranet, Inc. + * Copyright (C) 2008 Qumranet, Inc.
* * Authors: * Avi Kivity <[EMAIL PROTECTED]> * Yaniv Kamay <[EMAIL PROTECTED]> + * Amit Shah<[EMAIL PROTECTED]> * * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. @@ -21,8 +23,10 @@ #include "tss.h" #include +#include #include #include +#include #include #include #include @@ -95,6 +99,280 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { NULL } }; +DEFINE_RWLOCK(kvm_pci_pt_lock); + +/* + * Used to find a registered host PCI device (a "passthrough" device) + * during ioctls, interrupts or EOI + */ +struct kvm_pci_pt_dev_list * +kvm_find_pci_pt_dev(struct list_head *head, + struct kvm_pci_pt_info *pt_pci_info, int irq, int source) +{ + struct list_head *ptr; + struct kvm_pci_pt_dev_list *match; + + list_for_each(ptr, head) { + match = list_entry(ptr, struct kvm_pci_pt_dev_list, list); + + switch (source) { + case KVM_PT_SOURCE_IRQ: + /* +* Used to find a registered host device +* during interrupt context on host +*/ + if (match->pt_dev.host.irq == irq) + return match; + break; + case KVM_PT_SOURCE_IRQ_ACK: + /* +* Used to find a registered host device when +* the guest acks an interrupt +*/ + if (match->pt_dev.guest.irq == irq) + return match; + break; + case KVM_PT_SOURCE_UPDATE: + if ((match->pt_dev.host.busnr == pt_pci_info->busnr) && + (match->pt_dev.host.devfn == pt_pci_info->devfn)) + return match; + break; + } + } + return NULL; +} + +static DECLARE_BITMAP(pt_irq_handled, NR_IRQS); + +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ + struct kvm_pci_pt_dev_list *match; + struct kvm_pci_pt_work *int_work; + int source; + unsigned long flags; + int guest_irq; + int host_irq; + + int_work = container_of(work, struct kvm_pci_pt_work, work); + + source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq inside the guest. 
When +* the interrupt injection (or the ioapic code) uses a +* finer-grained lock, update this +*/ + mutex_lock(&int_work->kvm->lock); + read_lock_irqsave(&kvm_pci_pt_lock, flags); + match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL, +
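The lookup performed by kvm_find_pci_pt_dev() selects its match key from the caller's context: host IRQ when a host interrupt fires, guest IRQ when the guest acks, and bus/devfn when an ioctl updates a device. A simplified sketch of that idea follows, assuming an array instead of the kernel's linked list; `struct pt_dev`, `enum pt_source` and `find_pt_dev()` are hypothetical names for illustration, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the KVM_PT_SOURCE_* lookup modes in the patch. */
enum pt_source { PT_SOURCE_IRQ, PT_SOURCE_IRQ_ACK, PT_SOURCE_UPDATE };

/* Hypothetical, flattened view of one assigned device. */
struct pt_dev {
    int host_irq;         /* IRQ of the device on the host */
    int guest_irq;        /* IRQ the guest sees for this device */
    unsigned char busnr;  /* host PCI bus number */
    unsigned char devfn;  /* host PCI device/function */
};

/* Returns the first matching device, or NULL if none is registered. */
static const struct pt_dev *find_pt_dev(const struct pt_dev *devs, size_t n,
                                        enum pt_source source, int irq,
                                        unsigned char busnr,
                                        unsigned char devfn)
{
    size_t i;

    assert(devs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct pt_dev *d = &devs[i];

        switch (source) {
        case PT_SOURCE_IRQ:      /* host interrupt context: match host IRQ */
            if (d->host_irq == irq)
                return d;
            break;
        case PT_SOURCE_IRQ_ACK:  /* guest acked an interrupt: match guest IRQ */
            if (d->guest_irq == irq)
                return d;
            break;
        case PT_SOURCE_UPDATE:   /* ioctl path: match the PCI address */
            if (d->busnr == busnr && d->devfn == devfn)
                return d;
            break;
        }
    }
    return NULL;
}
```

Keeping all three lookups in one function keeps the list walk in one place, at the cost of parameters that are ignored in some modes, as in the original where `pt_pci_info` is passed as NULL from the work function.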
[PATCH 1/3] KVM: Introduce kvm_set_irq to inject interrupts in guests
This function injects an interrupt into the guest given the kvm struct, the (guest) irq number and the interrupt level. Signed-off-by: Amit Shah <[EMAIL PROTECTED]> --- arch/x86/kvm/irq.c | 11 +++ arch/x86/kvm/irq.h |2 ++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 76d736b..0d9e552 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -100,3 +100,14 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu) __kvm_migrate_apic_timer(vcpu); __kvm_migrate_pit_timer(vcpu); } + +/* This should be called with the kvm->lock mutex held */ +void kvm_set_irq(struct kvm *kvm, int irq, int level) +{ + /* Not possible to detect if the guest uses the PIC or the +* IOAPIC. So set the bit in both. The guest will ignore +* writes to the unused one. +*/ + kvm_ioapic_set_irq(kvm->arch.vioapic, irq, level); + kvm_pic_set_irq(pic_irqchip(kvm), irq, level); +} diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h index 2a15be2..ba4e3bf 100644 --- a/arch/x86/kvm/irq.h +++ b/arch/x86/kvm/irq.h @@ -80,6 +80,8 @@ static inline int irqchip_in_kernel(struct kvm *kvm) void kvm_pic_reset(struct kvm_kpic_state *s); +void kvm_set_irq(struct kvm *kvm, int irq, int level); + void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec); void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu); void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu); -- 1.5.4.3
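The approach in kvm_set_irq(), driving the same line on both interrupt controllers because the host cannot tell which one the guest uses, can be modelled with two bitmasks. This is a toy sketch only: `struct toy_vm`, `toy_set_irq()` and the pin counts (16 for the PIC pair, 24 for the IOAPIC) are assumptions of the example, and the range checks are added here for safety rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PIC_PINS     16  /* assumed: two cascaded 8259 controllers */
#define TOY_IOAPIC_PINS  24  /* assumed IOAPIC pin count */

/* Hypothetical stand-ins for the in-kernel PIC and IOAPIC line state. */
struct toy_pic    { uint32_t lines; };
struct toy_ioapic { uint32_t lines; };

struct toy_vm {
    struct toy_pic pic;
    struct toy_ioapic ioapic;
};

/* Raise (level != 0) or lower (level == 0) one line in a bitmask. */
static void set_line(uint32_t *lines, int irq, int level)
{
    if (level)
        *lines |= 1u << irq;
    else
        *lines &= ~(1u << irq);
}

/* Mirrors the idea of kvm_set_irq(): update the line on both chips. */
static void toy_set_irq(struct toy_vm *vm, int irq, int level)
{
    assert(irq >= 0);

    if (irq < TOY_IOAPIC_PINS)
        set_line(&vm->ioapic.lines, irq, level);
    if (irq < TOY_PIC_PINS)
        set_line(&vm->pic.lines, irq, level);
}
```

Raising IRQ 10 sets the bit in both masks, while IRQ 20 only reaches the IOAPIC model because it is outside the PIC's range.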
[PATCH 2/3] KVM: Introduce a callback routine for IOAPIC ack handling
This will be useful for acking irqs of assigned devices Signed-off-by: Amit Shah <[EMAIL PROTECTED]> --- virt/kvm/ioapic.c |3 +++ virt/kvm/ioapic.h |1 + 2 files changed, 4 insertions(+), 0 deletions(-) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 9d02136..4759d77 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi) ent->fields.remote_irr = 0; if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) ioapic_deliver(ioapic, gsi); + + if (ioapic->callback) + ioapic->callback(ioapic->kvm, gsi); } void kvm_ioapic_update_eoi(struct kvm *kvm, int vector) diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h index 7f16675..481740a 100644 --- a/virt/kvm/ioapic.h +++ b/virt/kvm/ioapic.h @@ -58,6 +58,7 @@ struct kvm_ioapic { } redirtbl[IOAPIC_NUM_PINS]; struct kvm_io_device dev; struct kvm *kvm; + void (*callback)(void *opaque, int irq); }; #ifdef DEBUG -- 1.5.4.3
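The hook added here is an optional function pointer that the EOI path calls with an opaque owner pointer and the acked GSI. A minimal stand-alone sketch of that pattern is shown below; `struct toy_ioapic`, `toy_ioapic_eoi()`, `struct ack_log` and `record_ack()` are hypothetical names, and the real EOI handling (remote IRR, redelivery) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced IOAPIC: only the parts needed for the ack hook. */
struct toy_ioapic {
    void *owner;                              /* plays the role of ioapic->kvm */
    void (*callback)(void *opaque, int irq);  /* optional ack notification */
};

/* EOI path: notify the registered callback, if any, about the acked GSI. */
static void toy_ioapic_eoi(struct toy_ioapic *ioapic, int gsi)
{
    assert(ioapic != NULL);

    if (ioapic->callback)
        ioapic->callback(ioapic->owner, gsi);
}

/* Example consumer: remembers the last acked GSI and counts the acks. */
struct ack_log {
    int last_gsi;
    int count;
};

static void record_ack(void *opaque, int irq)
{
    struct ack_log *log = opaque;

    log->last_gsi = irq;
    log->count++;
}
```

Leaving `callback` as NULL keeps the EOI path unchanged for guests without assigned devices, which matches the NULL check in the patch.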
PCI device assignment to guests
This patchset introduces support for assigning PCI devices to guests (kernel part). The main difference from the last version is reserving the PCI device for our exclusive use so that multiple device assignment will fail, as will device assignment when a driver for the device being assigned is loaded. In this version, we also add support for autodetecting the IRQ assigned to the device on the host, so passing the IRQ number on the command line is no longer necessary. With these changes, the kernel support is ready for merge after review. The patches are compile- and run-tested with PVDMA as well as VT-d and are checkpatch-clean. Amit.
Re: [GIT PULL] KVM fixes for 2.6.26-rc7
Linus Torvalds wrote:
> On Wed, 25 Jun 2008, Avi Kivity wrote:
>> Linus, please pull from the repo and branch at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git kvm-updates-2.6.26
>>
>> to receive kvm updates for 2.6.26-rc7. The patches fix host oopses, guest interrupt loss, and total kvm clock borkage.
>
> Avi, you _really_ need to start respecting the merge window. If you can't learn, I will have to just stop pulling from you. This is simply too big for this late in the game.
>
> I pulled, but I'm simply not going to continue doing this dance. I don't care much for virtualization, so I've let it slide, but you need to learn that
>
>   18 files changed, 358 insertions(+), 266 deletions(-)
>
> is simply not acceptable this late.

You're right, I guess I should have disabled kvm clock for 2.6.26 (which makes up the bulk of the changes) and re-enabled it for 2.6.27, but ended up not resisting the temptation.

--
error compiling committee.c: too many arguments to function